OSAN Automation V1
The system runs with PostgreSQL as the central data store, a Next.js reviewer UI
(port 3000), and a Python backend API at 127.0.0.1:8787 (API-only).
Stage 4 assesses each document's eligibility and also fills in target_form.
Human review handles failed cases only (fail-only).
Auto-approved documents can be uploaded directly if the run mode does not require reviewer approval
and the run is not a dry-run. Reviewers are also assisted by an OpenRouter-based Copilot,
with a rule-based fallback if the LLM connection fails.
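The Copilot fallback can be pictured with a minimal sketch. It is an illustration under stated assumptions, not the project's code: `call_llm`, `rule_based_suggestion`, and the `suggestion` key are hypothetical, and only the field names `provider`, `status`, `fallback_reason`, and `form_validation_issues` come from this document.

```python
from typing import Any, Callable, Dict


def rule_based_suggestion(doc: Dict[str, Any]) -> str:
    """Hypothetical rule-based fallback that only looks at validation issues."""
    issues = doc.get("form_validation_issues") or []
    if issues:
        return "Check form fields: " + ", ".join(str(i) for i in issues)
    return "No form issues recorded; inspect the document manually."


def copilot_suggest(doc: Dict[str, Any],
                    call_llm: Callable[[Dict[str, Any]], str]) -> Dict[str, Any]:
    """Try the LLM first; on any failure, fall back to the rule-based path."""
    try:
        text = call_llm(doc)
        return {"provider": "openrouter", "status": "ok",
                "fallback_reason": None, "suggestion": text}
    except Exception as exc:  # e.g. SSL or API errors
        return {"provider": "rule_based", "status": "fallback",
                "fallback_reason": f"{type(exc).__name__}: {exc}",
                "suggestion": rule_based_suggestion(doc)}


def broken_llm(doc: Dict[str, Any]) -> str:
    """Stand-in for an LLM call that fails on certificate verification."""
    raise ConnectionError("certificate verify failed")


result = copilot_suggest({"form_validation_issues": ["category"]}, broken_llm)
print(result["provider"], "-", result["fallback_reason"])
```

Recording the reason for the fallback alongside the suggestion is what lets a UI show why the rule-based path was used.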
0) System Overview
This visual explains the end-to-end flow in non-technical terms: documents come in, are assessed automatically, only exception cases go up to a reviewer, and documents that pass are published. Reviewers can ask Copilot for help, and the final outcome still follows the selected run mode.
Auto-Approve
The document passes LLM scoring and form validation. It can be uploaded directly if the run does not require reviewer approval and is not a dry-run.
Human Review (Fail-Only)
Only problematic documents are checked by a reviewer, including form edits where needed. Reviewers can use the OpenRouter Copilot for suggestions.
Auto-Reject
Duplicate or ineligible documents are stopped automatically. For duplicates, the Stage 4 LLM call can be skipped (status skipped_duplicate).
Reviewer role
Reviewers focus on exception cases, so the majority of documents are still processed automatically.
Quality control
Every decision has an audit trail in the database for tracing and evaluation.
Copilot observability
The UI shows provider, status, fallback_reason, and token/cost usage to monitor LLM quality and cost.
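The three routes can be sketched as a single routing function. This is a rough illustration only: the thresholds, the `Stage4Result` fields, and the exact route strings are assumptions, not values taken from `llm_score_policy.py`.

```python
from dataclasses import dataclass
from typing import Optional

MIN_CONFIDENCE = 0.8  # assumed threshold for auto-approve
LOW_SCORE = 0.3       # assumed threshold for auto-reject


@dataclass
class Stage4Result:
    is_duplicate: bool
    llm_decision: Optional[str]  # "approve", "reject", or None if the LLM call failed
    confidence: float
    score: float
    form_status: str             # "pass", or anything else when the form needs review


def route(r: Stage4Result) -> str:
    """Map a Stage 4 result to one of the three routes."""
    if r.is_duplicate:
        return "auto-reject"     # LLM call can be skipped (skipped_duplicate)
    if r.llm_decision is None:
        return "human-review"    # LLM failure is an exception case
    if (r.llm_decision == "approve"
            and r.confidence >= MIN_CONFIDENCE
            and r.form_status == "pass"):
        return "auto-approve"
    if r.llm_decision == "reject" and r.score <= LOW_SCORE:
        return "auto-reject"
    return "human-review"        # everything ambiguous goes to a reviewer


print(route(Stage4Result(False, "approve", 0.93, 0.88, "pass")))
print(route(Stage4Result(False, "approve", 0.93, 0.88, "needs_review")))
```

Defaulting to `human-review` keeps the design fail-only: a document reaches a reviewer only when no automatic rule applies.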
1) Runtime Architecture
flowchart LR
U[User Browser]
N[Next.js UI\n127.0.0.1:3000\nroutes: / and /review]
A[/api/auth/login\ncredentials from env:\nREVIEW_UI_USERNAME / REVIEW_UI_PASSWORD/]
G[Middleware Auth Gate\nall dashboard routes + /backend/* require login]
P[/backend/* proxy rewrite/]
H[Human Review API\n127.0.0.1:8787\nThreadingHTTPServer]
O[Pipeline Orchestrator\nstage2 -> stage6]
OR[OpenRouter LLM\nfor reviewer Copilot]
S2[ingest_extract.py]
S3[dedup_check.py]
S4[llm_score_policy.py]
S6[upload_worker.py]
DB[(PostgreSQL)]
EV[(pipeline_state + pipeline_events)]
AU[(review_audit)]
RT[(data/runtime)]
TS[Target Site]
SW[(SeaweedFS S3 optional)]
U --> N
N --> A
N --> G --> P --> H
H --> DB
H --> O
H --> OR
O --> S2 --> DB
O --> S3 --> DB
O --> S4 --> DB
O --> S6 --> DB
O --> EV
H --> AU
S6 --> TS
S2 --> SW
S6 --> RT
classDef ui fill:#e9f3ff,stroke:#417fca,color:#133a61;
classDef svc fill:#ebfcf9,stroke:#33a39a,color:#0f514c;
classDef data fill:#fdf5e8,stroke:#ca8a2b,color:#68400a;
class U,N,A,G,P ui;
class H,O,S2,S3,S4,S6 svc;
class DB,EV,AU,RT,TS,SW,OR data;
Active ports
3000 for the Next.js UI, 127.0.0.1:8787 for the local backend API (API-only).
Pipeline control
Pipeline Start/Resume is triggered from the UI; the backend then hands execution to the orchestrator.
Separate UI routes
/ for the pipeline overview, /review for reviewing failed cases.
Outbound LLM note
Copilot calls to OpenRouter need a valid CA cert (SSL_CERT_FILE, REQUESTS_CA_BUNDLE) so they do not fail on SSL.
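A quick pre-flight check of the two CA variables might look like the sketch below. It only verifies that each variable is set and points at an existing file; it does not validate the certificates themselves, and `check_ca_env` is a hypothetical helper, not part of the backend.

```python
import os
from typing import Dict, Mapping

CA_ENV_VARS = ("SSL_CERT_FILE", "REQUESTS_CA_BUNDLE")


def check_ca_env(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return a per-variable status: 'ok', 'unset', or 'missing file'."""
    report: Dict[str, str] = {}
    for name in CA_ENV_VARS:
        path = env.get(name)
        if not path:
            report[name] = "unset"
        elif os.path.isfile(path):
            report[name] = "ok"
        else:
            report[name] = "missing file"
    return report


for name, status in check_ca_env().items():
    print(f"{name}: {status}")
```

Running this before starting the backend gives an early signal when Copilot calls are likely to fall back because of certificate configuration.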
2) Document Lifecycle
Conceptual status of each document, based on stage results and reviewer decisions.
stateDiagram-v2
[*] --> PDF_AVAILABLE
PDF_AVAILABLE --> STAGE2_DONE: metadata saved
STAGE2_DONE --> STAGE3_DONE: dedup_result saved
STAGE3_DONE --> STAGE4_SKIPPED_DUP: dedup = duplicate
STAGE4_SKIPPED_DUP --> AUTO_REJECTED: policy auto-reject (llm skipped_duplicate)
STAGE3_DONE --> STAGE4_DONE: non-duplicate
STAGE4_DONE --> WAITING_REVIEW: llm failed / form review / policy guardrail
STAGE4_DONE --> READY_UPLOAD: approve + sufficient confidence + form pass
STAGE4_DONE --> AUTO_REJECTED: reject + low score
WAITING_REVIEW --> READY_UPLOAD: reviewer approve (may edit target_form)
WAITING_REVIEW --> REJECTED_BY_REVIEWER: reviewer reject
WAITING_REVIEW --> WAITING_REVIEW: reviewer edit / use copilot
READY_UPLOAD --> UPLOADING: full mode + not dry-run + approval gate satisfied
UPLOADING --> UPLOADED: upload_result success
UPLOADING --> UPLOAD_FAILED: upload_result failed
UPLOADING --> SKIPPED_DUPLICATE: duplicate detected
AUTO_REJECTED --> [*]
REJECTED_BY_REVIEWER --> [*]
SKIPPED_DUPLICATE --> [*]
UPLOADED --> [*]
UPLOAD_FAILED --> [*]
Manual gate
By default the review queue contains only the human-review route (failed cases).
Dedup gate
Duplicate documents are mapped to auto-reject as early as the Stage 4 policy.
Result evidence
The final status can be read from the upload_result payload in the documents table.
Meaning of the waiting_review status
This is a run-level status. The latest UI gives a contextual hint: still waiting for review, ready to resume upload, or finished with no uploadable documents.
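The lifecycle diagram can also be written down as a transition table, which makes it easy to check whether a sequence of statuses is legal. This is a sketch for illustration: the statuses are conceptual, and the real system stores per-stage payloads rather than a single status column, so `TRANSITIONS` and the helpers below are hypothetical.

```python
from typing import Dict, List, Set

# Transitions copied from the state diagram; WAITING_REVIEW may loop on itself.
TRANSITIONS: Dict[str, Set[str]] = {
    "PDF_AVAILABLE": {"STAGE2_DONE"},
    "STAGE2_DONE": {"STAGE3_DONE"},
    "STAGE3_DONE": {"STAGE4_SKIPPED_DUP", "STAGE4_DONE"},
    "STAGE4_SKIPPED_DUP": {"AUTO_REJECTED"},
    "STAGE4_DONE": {"WAITING_REVIEW", "READY_UPLOAD", "AUTO_REJECTED"},
    "WAITING_REVIEW": {"READY_UPLOAD", "REJECTED_BY_REVIEWER", "WAITING_REVIEW"},
    "READY_UPLOAD": {"UPLOADING"},
    "UPLOADING": {"UPLOADED", "UPLOAD_FAILED", "SKIPPED_DUPLICATE"},
}

TERMINAL: Set[str] = {
    "AUTO_REJECTED", "REJECTED_BY_REVIEWER", "SKIPPED_DUPLICATE",
    "UPLOADED", "UPLOAD_FAILED",
}


def can_transition(current: str, target: str) -> bool:
    """True if the diagram has an edge from current to target."""
    return target in TRANSITIONS.get(current, set())


def is_valid_path(path: List[str]) -> bool:
    """True if the path starts at PDF_AVAILABLE, follows only listed
    transitions, and ends in a terminal status."""
    if not path or path[0] != "PDF_AVAILABLE" or path[-1] not in TERMINAL:
        return False
    return all(can_transition(a, b) for a, b in zip(path, path[1:]))


duplicate_path = ["PDF_AVAILABLE", "STAGE2_DONE", "STAGE3_DONE",
                  "STAGE4_SKIPPED_DUP", "AUTO_REJECTED"]
print(is_valid_path(duplicate_path))
```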
3) Sequence of Actions from UI to Upload
Endpoint sequence from login, through fail-only review, to resume upload.
sequenceDiagram
autonumber
participant User
participant UI as Next.js 3000
participant Auth as /api/auth/*
participant API as Backend 8787
participant Orch as pipeline_orchestrator
participant PG as PostgreSQL
participant Upl as upload_worker
participant OR as OpenRouter
participant Target as Target Site
User->>UI: Open /login
UI->>Auth: POST /api/auth/login (credentials from env)
Auth-->>UI: Set cookie toba_auth
User->>UI: Open / (Overview) or /review
UI->>API: GET /backend/api/queue
API->>PG: SELECT documents + payload stage
PG-->>API: Queue payload
API-->>UI: Queue list
Note over UI,API: Queue default = policy_route human-review
User->>UI: Start pipeline
UI->>API: POST /backend/api/pipeline/start (mode=full / until_review)
API->>Orch: start run
Orch->>PG: upsert pipeline_state + pipeline_events
Orch->>PG: save stage2, stage3, stage4
User->>UI: Generate / Refresh Copilot
UI->>API: POST /backend/api/doc/{id}/copilot
API->>OR: chat/completions (OpenRouter)
OR-->>API: suggestion + usage
API->>PG: save human_review.copilot_suggestion
Note over API,OR: If the LLM call fails (e.g. SSL/API), fall back to rule_based_copilot
User->>UI: Approve/edit fail-case document
UI->>API: POST /backend/api/doc/{id}/decision\n(+ edited_fields.target_form)
API->>PG: set human_review + insert review_audit
User->>UI: Continue upload
UI->>API: POST /backend/api/pipeline/resume
API->>Orch: resume_upload
Orch->>Upl: run stage6
Upl->>Target: Login + fill form + submit
Upl->>PG: set upload_result
Note over UI,Upl: Auto-approve uploads directly only if require_human_approval=false and upload_dry_run=false
User->>UI: Click Refresh Status
UI->>API: GET /backend/api/pipeline/status + /backend/api/queue
API-->>UI: latest status + queue summary + flow hint
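The upload gate described in the diagram notes can be sketched as a small predicate. The names `RunConfig` and `may_upload` are hypothetical, and the handling of `until_review` (stopping before Stage 6) is an assumption inferred from the mode name rather than confirmed behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunConfig:
    mode: str                     # "full", "until_review", or "resume_upload"
    require_human_approval: bool
    upload_dry_run: bool


def may_upload(cfg: RunConfig, reviewer_approved: bool) -> bool:
    """Decide whether a document that is ready for upload may proceed."""
    if cfg.upload_dry_run:
        return False              # dry-run: no real upload
    if cfg.mode == "until_review":
        return False              # assumption: this mode stops before Stage 6
    if cfg.require_human_approval and not reviewer_approved:
        return False              # approval gate not yet satisfied
    return True


auto = RunConfig(mode="full", require_human_approval=False, upload_dry_run=False)
gated = RunConfig(mode="full", require_human_approval=True, upload_dry_run=False)
print(may_upload(auto, reviewer_approved=False))
print(may_upload(gated, reviewer_approved=False))
```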
Pipeline modes
full, until_review, resume_upload
Reviewer actions
approve, reject, edit
Main endpoints
/api/queue, /api/doc/{id}, /api/doc/{id}/decision, /api/doc/{id}/copilot, /api/pipeline/start, /api/pipeline/resume, /api/pipeline/status.
The document payload includes target_form_suggested, target_form_resolved, form_validation_status, form_validation_issues, and allowed_options.
Login gate
Dashboard routes and requests to /backend/* are accessible only after login.
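As an illustration of how a client might call the decision endpoint through the Next.js proxy, the sketch below builds the request without sending it. The body key `decision`, the `title` form field, and the cookie value format are assumptions; only the path, the `edited_fields.target_form` nesting, and the `toba_auth` cookie name appear in this document.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, Optional

BASE_URL = "http://127.0.0.1:3000/backend"  # UI proxy in front of the 8787 API


def build_decision_request(doc_id: str,
                           decision: str,
                           target_form: Optional[Dict[str, Any]] = None,
                           cookie: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a POST to /api/doc/{id}/decision."""
    body: Dict[str, Any] = {"decision": decision}  # assumed key name
    if target_form is not None:
        body["edited_fields"] = {"target_form": target_form}
    headers = {"Content-Type": "application/json"}
    if cookie:
        headers["Cookie"] = f"toba_auth={cookie}"  # assumed cookie format
    safe_id = urllib.parse.quote(doc_id, safe="")
    return urllib.request.Request(
        f"{BASE_URL}/api/doc/{safe_id}/decision",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


req = build_decision_request("doc-123", "approve", {"title": "Example"})
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing the request to `urllib.request.urlopen` would send it; without a valid login cookie, the auth gate would be expected to reject it.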
4) PostgreSQL Data Model (Actual)
The schema used directly by the stage 2-6 scripts at present.
erDiagram
DOCUMENTS ||--o{ REVIEW_AUDIT : has
DOCUMENTS {
text document_id PK
text source_filename
text source_path
text source_sha256 UK
bigint source_size_bytes
jsonb metadata
jsonb dedup_result
jsonb llm_policy_result
jsonb human_review
jsonb upload_result
timestamptz created_at
timestamptz updated_at
}
REVIEW_AUDIT {
bigserial id PK
text review_id
text document_id FK
timestamptz event_at
jsonb payload
}
PIPELINE_STATE {
text name PK
jsonb state
timestamptz updated_at
}
PIPELINE_EVENTS {
bigserial id PK
text run_id
text stage_id
text event
timestamptz event_at
jsonb payload
}
Simple design
A single main table, documents, stores each stage's payload in JSONB columns.
Reviewer audit
The history of reviewer decisions lives in review_audit as a change trail.
Pipeline monitoring
Run status and stage events are stored in pipeline_state and pipeline_events.
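To make the "update the document, append to the audit table" pattern concrete, here is a small runnable sketch. It deliberately swaps in SQLite's in-memory database as a stand-in for PostgreSQL so it runs without a server; JSONB columns become TEXT holding JSON, only a subset of columns is shown, and `record_decision` plus the payload shape are hypothetical.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE documents (
    document_id   TEXT PRIMARY KEY,
    source_sha256 TEXT UNIQUE,
    human_review  TEXT,   -- JSONB in PostgreSQL
    upload_result TEXT    -- JSONB in PostgreSQL
);
CREATE TABLE review_audit (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    review_id   TEXT,
    document_id TEXT REFERENCES documents(document_id),
    event_at    TEXT DEFAULT CURRENT_TIMESTAMP,
    payload     TEXT      -- JSONB in PostgreSQL
);
""")


def record_decision(db: sqlite3.Connection, document_id: str, decision: str) -> None:
    """Set human_review and append an audit row in one transaction."""
    payload = json.dumps({"decision": decision})  # assumed payload shape
    with db:  # commits on success, rolls back on error
        db.execute("UPDATE documents SET human_review = ? WHERE document_id = ?",
                   (payload, document_id))
        db.execute("INSERT INTO review_audit (review_id, document_id, payload) "
                   "VALUES (?, ?, ?)",
                   (f"rev-{document_id}", document_id, payload))


conn.execute("INSERT INTO documents (document_id, source_sha256) VALUES (?, ?)",
             ("doc-1", "abc"))
record_decision(conn, "doc-1", "approve")

row = conn.execute("SELECT human_review FROM documents WHERE document_id = ?",
                   ("doc-1",)).fetchone()
audit_count = conn.execute("SELECT COUNT(*) FROM review_audit").fetchone()[0]
print(json.loads(row[0]), audit_count)
```

Keeping the current decision on the document row and the history in a separate append-only table means the latest state is cheap to read while earlier decisions remain traceable.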
5) Short Operational Checklist
The minimum sequence for the fail-only flow to run consistently with the current architecture.
Step 1 - Run the backend API
python3 backend/scripts/human_review_ui.py --runtime-dir data/runtime --host 127.0.0.1 --port 8787
Step 2 - Run the frontend
Next.js UI on 3000; access it with an internal login account.
cd frontend/web
npm install
npm run dev
Open / for the overview, then move to /review for reviewer decisions.
Step 3 - Check the target form mapping
Run once at the start, then again if the target form options change.
python3 backend/scripts/upload_worker.py --runtime-dir data/runtime --env-file .env --inspect-form-options
Step 4 - Run Stage 4 (policy backfill/refresh)
Re-processes LLM scoring and policy for all documents in PostgreSQL.
python3 backend/scripts/llm_score_policy.py --env-file .env
Step 5 - Validate the OpenRouter Copilot connection
Make sure the SSL env vars are set so LLM calls do not fall back because of certificate problems.
SSL_CERT_FILE=/path/to/certifi/cacert.pem
REQUESTS_CA_BUNDLE=/path/to/certifi/cacert.pem
Step 6 - Use Refresh Status
On overview/review, the Refresh Status button refreshes the pipeline status, queue, and latest flow hint.
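The flow hint shown after a refresh could be derived along these lines for a run in the waiting_review status. This is a guess at the logic, not the UI's code: the two counters and the function name are hypothetical, and only the three hint meanings come from this document.

```python
def waiting_review_hint(pending_review: int, ready_to_upload: int) -> str:
    """Pick a contextual hint for a run whose status is waiting_review."""
    if pending_review > 0:
        return "still waiting for review"
    if ready_to_upload > 0:
        return "ready to resume upload"
    return "finished with no uploadable documents"


for counts in [(3, 0), (0, 2), (0, 0)]:
    print(counts, "->", waiting_review_hint(*counts))
```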