OSAN Automation V1

The system runs with PostgreSQL as the central data store, a reviewer UI built on Next.js (port 3000), and a Python backend API at 127.0.0.1:8787 (API-only). Stage 4 scores document eligibility and fills in target_form in the same pass. Human review handles only failed cases (fail-only). Auto-approved documents can be uploaded directly when the run mode does not require reviewer approval and is not a dry-run. Reviewers are also assisted by an OpenRouter-based Copilot, with a rule-based fallback when the LLM connection fails.

Frontend: Next.js 3000
Backend API: Python 127.0.0.1:8787
Source of truth: PostgreSQL
Fail-only human review
Copilot Reviewer: OpenRouter + fallback
Runtime logs in data/runtime
Upload target via Playwright

0) System Overview

This visual describes the end-to-end flow in non-technical terms: documents come in, are scored automatically, only exception cases go up to a reviewer, and documents that pass are published. Reviewers can ask the Copilot for help, and the final outcome still follows the selected run mode.

User Access
Uploader -> Reviewer -> Internal Dashboard

Automation Engine
Incoming PDF -> System Reads Content -> Duplicate Check -> LLM Assesses Eligibility -> Form Field Validation

System Decision

Auto-Approve

The document passes LLM scoring and form validation. It can be uploaded directly when the run does not require reviewer approval and is not a dry-run.

Human Review (Fail-Only)

Only problematic documents are checked by a reviewer, including form edits where needed. The reviewer can use the OpenRouter Copilot for suggestions.

Auto-Reject

Duplicate or ineligible documents are stopped automatically. For duplicates, the Stage 4 LLM call can be skipped (status skipped_duplicate).
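
The three outcomes above can be read as a small routing function. The following is a minimal sketch, not the actual policy code: the `Stage4Input` fields, the accepted values for `decision` and `form_status`, and both thresholds are assumptions made for illustration, since this page does not state them.

```python
from dataclasses import dataclass

# Hypothetical thresholds; the real policy values are not given in this document.
MIN_APPROVE_CONFIDENCE = 0.8
MAX_REJECT_SCORE = 0.3


@dataclass
class Stage4Input:
    is_duplicate: bool   # outcome of the duplicate check
    llm_ok: bool         # False when the LLM call failed
    decision: str        # "approve" or "reject" (assumed values)
    confidence: float    # assumed range 0.0 to 1.0
    score: float         # assumed range 0.0 to 1.0
    form_status: str     # "pass" means form validation succeeded (assumed)


def route(doc: Stage4Input) -> str:
    """Return one of: auto-approve, human-review, auto-reject."""
    if doc.is_duplicate:
        # Duplicates are stopped without needing an LLM verdict.
        return "auto-reject"
    if not doc.llm_ok:
        # A failed LLM call is a fail case, so a reviewer takes over.
        return "human-review"
    if (
        doc.decision == "approve"
        and doc.confidence >= MIN_APPROVE_CONFIDENCE
        and doc.form_status == "pass"
    ):
        return "auto-approve"
    if doc.decision == "reject" and doc.score <= MAX_REJECT_SCORE:
        return "auto-reject"
    # Everything in between is an exception case for the reviewer.
    return "human-review"
```

Anything that is neither a clear approve nor a clear reject falls through to human-review, which matches the fail-only idea: the reviewer sees only the cases the rules cannot settle.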

Reviewer Assistant
Generate Copilot (OpenRouter) -> On failure: Rule-Based Fallback -> Reviewer decides approve/reject/edit

Publication
Automatic Upload -> Target Website

Trail & Audit
All results are stored in PostgreSQL
Reviewer decision audits are recorded
Process logs are stored in data/runtime

Reviewer role

Reviewers focus on exception cases, so the majority of documents are still processed automatically.

Quality control

Every decision has an audit trail in the database for tracing and evaluation.

Copilot observability

The UI shows provider, status, fallback_reason, and token/cost usage so that LLM quality and cost can be monitored.
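
One way to picture the Copilot call with its fallback is a wrapper that always returns the same record shape, whichever path produced the suggestion. This is a sketch under assumptions: `generate_copilot`, `rule_based_suggestion`, the `call_llm` callable, and the exact values stored in `provider` and `status` are hypothetical. Only the field names provider, status, fallback_reason, and usage come from this page.

```python
from typing import Callable


def rule_based_suggestion(doc: dict) -> str:
    """Hypothetical fallback: summarize known form issues without an LLM."""
    issues = doc.get("form_validation_issues") or []
    if issues:
        return "Check form fields: " + ", ".join(issues)
    return "No form issues recorded; review the document content manually."


def generate_copilot(doc: dict, call_llm: Callable[[dict], dict]) -> dict:
    """Try the LLM first; on any failure, fall back to the rule-based path."""
    try:
        reply = call_llm(doc)  # assumed to return {"suggestion": ..., "usage": {...}}
        return {
            "provider": "openrouter",
            "status": "ok",
            "fallback_reason": None,
            "suggestion": reply["suggestion"],
            "usage": reply.get("usage", {}),
        }
    except Exception as exc:  # e.g. SSL or API errors
        return {
            "provider": "rule_based_copilot",
            "status": "fallback",
            "fallback_reason": f"{type(exc).__name__}: {exc}",
            "suggestion": rule_based_suggestion(doc),
            "usage": {},  # no tokens were spent on the fallback path
        }
```

Keeping fallback_reason in the record is what lets a UI distinguish "the LLM answered" from "the LLM was unreachable" without reading the logs.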

1) Runtime Architecture

flowchart LR
  U[User Browser]
  N[Next.js UI\n127.0.0.1:3000\nroutes: / and /review]
  A[/api/auth/login\ncredentials from env:\nREVIEW_UI_USERNAME / REVIEW_UI_PASSWORD/]
  G[Middleware Auth Gate\nall dashboards + /backend/* require login]
  P[/backend/* proxy rewrite/]
  H[Human Review API\n127.0.0.1:8787\nThreadingHTTPServer]
  O[Pipeline Orchestrator\nstage2 -> stage6]
  OR[OpenRouter LLM\nfor reviewer copilot]

  S2[ingest_extract.py]
  S3[dedup_check.py]
  S4[llm_score_policy.py]
  S6[upload_worker.py]

  DB[(PostgreSQL)]
  EV[(pipeline_state + pipeline_events)]
  AU[(review_audit)]
  RT[(data/runtime)]
  TS[Target Site]
  SW[(SeaweedFS S3 optional)]

  U --> N
  N --> A
  N --> G --> P --> H
  H --> DB
  H --> O
  H --> OR

  O --> S2 --> DB
  O --> S3 --> DB
  O --> S4 --> DB
  O --> S6 --> DB

  O --> EV
  H --> AU
  S6 --> TS
  S2 --> SW
  S6 --> RT

  classDef ui fill:#e9f3ff,stroke:#417fca,color:#133a61;
  classDef svc fill:#ebfcf9,stroke:#33a39a,color:#0f514c;
  classDef data fill:#fdf5e8,stroke:#ca8a2b,color:#68400a;

  class U,N,A,G,P ui;
  class H,O,S2,S3,S4,S6 svc;
  class DB,EV,AU,RT,TS,SW,OR data;
          

Active ports

3000 for the Next.js UI, 127.0.0.1:8787 for the local backend API (API-only).

Pipeline control

Pipeline Start/Resume is triggered from the UI, and the backend then hands execution over to the orchestrator.

Separate UI routes

/ for the pipeline overview, /review for reviewing failed cases.

Outbound LLM note

Copilot calls to OpenRouter need a valid CA certificate (SSL_CERT_FILE, REQUESTS_CA_BUNDLE) so they do not fail on SSL.

2) Document Lifecycle

Conceptual status of each document, based on stage results and reviewer decisions.

stateDiagram-v2
  [*] --> PDF_AVAILABLE
  PDF_AVAILABLE --> STAGE2_DONE: metadata saved
  STAGE2_DONE --> STAGE3_DONE: dedup_result saved
  STAGE3_DONE --> STAGE4_SKIPPED_DUP: dedup = duplicate
  STAGE4_SKIPPED_DUP --> AUTO_REJECTED: policy auto-reject (llm skipped_duplicate)
  STAGE3_DONE --> STAGE4_DONE: non-duplicate

  STAGE4_DONE --> WAITING_REVIEW: LLM failed / form review / policy guardrail
  STAGE4_DONE --> READY_UPLOAD: approve + sufficient confidence + form pass
  STAGE4_DONE --> AUTO_REJECTED: reject + low score

  WAITING_REVIEW --> READY_UPLOAD: reviewer approve (may edit target_form)
  WAITING_REVIEW --> REJECTED_BY_REVIEWER: reviewer reject
  WAITING_REVIEW --> WAITING_REVIEW: reviewer edit / use copilot

  READY_UPLOAD --> UPLOADING: full mode + not dry-run + approval gate satisfied
  UPLOADING --> UPLOADED: upload_result success
  UPLOADING --> UPLOAD_FAILED: upload_result failed
  UPLOADING --> SKIPPED_DUPLICATE: duplicate detected

  AUTO_REJECTED --> [*]
  REJECTED_BY_REVIEWER --> [*]
  SKIPPED_DUPLICATE --> [*]
  UPLOADED --> [*]
  UPLOAD_FAILED --> [*]
          

Manual gate

By default the review queue contains only documents routed to human-review (failed cases).

Dedup gate

Duplicate documents are mapped to auto-reject as early as the Stage 4 policy.

Evidence of results

The final status can be read from the upload_result payload in the documents table.

Meaning of the waiting_review status

This is a run-level status. The latest UI adds a contextual hint: still waiting for review, ready to resume upload, or finished with no uploadable documents.
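
The lifecycle can also be written as a transition table, which is useful for asserting that a status change is legal. This is a sketch only: the state names are taken from the diagram in this section, while `can_transition`, `may_start_upload`, and the reading of the upload gate (any mode other than until_review may upload) are assumptions rather than the project's actual code.

```python
# Allowed conceptual transitions, copied from the lifecycle diagram.
TRANSITIONS = {
    "PDF_AVAILABLE": {"STAGE2_DONE"},
    "STAGE2_DONE": {"STAGE3_DONE"},
    "STAGE3_DONE": {"STAGE4_SKIPPED_DUP", "STAGE4_DONE"},
    "STAGE4_SKIPPED_DUP": {"AUTO_REJECTED"},
    "STAGE4_DONE": {"WAITING_REVIEW", "READY_UPLOAD", "AUTO_REJECTED"},
    "WAITING_REVIEW": {"READY_UPLOAD", "REJECTED_BY_REVIEWER", "WAITING_REVIEW"},
    "READY_UPLOAD": {"UPLOADING"},
    "UPLOADING": {"UPLOADED", "UPLOAD_FAILED", "SKIPPED_DUPLICATE"},
}

# States with no outgoing transitions.
TERMINAL = {
    "AUTO_REJECTED",
    "REJECTED_BY_REVIEWER",
    "SKIPPED_DUPLICATE",
    "UPLOADED",
    "UPLOAD_FAILED",
}


def can_transition(src: str, dst: str) -> bool:
    """True when the diagram allows moving from src to dst."""
    return dst in TRANSITIONS.get(src, set())


def may_start_upload(
    mode: str,
    upload_dry_run: bool,
    require_human_approval: bool,
    reviewer_approved: bool,
) -> bool:
    """Gate for READY_UPLOAD -> UPLOADING (assumed interpretation)."""
    if mode == "until_review" or upload_dry_run:
        return False
    # The approval gate is satisfied if approval is not required,
    # or if a reviewer has already approved.
    return (not require_human_approval) or reviewer_approved
```

For example, `can_transition("STAGE3_DONE", "READY_UPLOAD")` is False, because a document has to pass through Stage 4 before it can become uploadable.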

3) Sequence of Actions from UI to Upload

Sequence of endpoints from login, through fail-only review, to resuming the upload.

sequenceDiagram
  autonumber
  participant User
  participant UI as Next.js 3000
  participant Auth as /api/auth/*
  participant API as Backend 8787
  participant Orch as pipeline_orchestrator
  participant PG as PostgreSQL
  participant Upl as upload_worker
  participant OR as OpenRouter
  participant Target as Target Site

  User->>UI: Open /login
  UI->>Auth: POST /api/auth/login (credentials from env)
  Auth-->>UI: Set cookie toba_auth

  User->>UI: Open / (Overview) or /review
  UI->>API: GET /backend/api/queue
  API->>PG: SELECT documents + payload stage
  PG-->>API: Queue payload
  API-->>UI: Queue list
  Note over UI,API: Queue default = policy_route human-review

  User->>UI: Start pipeline
  UI->>API: POST /backend/api/pipeline/start (mode=full / until_review)
  API->>Orch: start run
  Orch->>PG: upsert pipeline_state + pipeline_events
  Orch->>PG: save stage2, stage3, stage4

  User->>UI: Generate / Refresh Copilot
  UI->>API: POST /backend/api/doc/{id}/copilot
  API->>OR: chat/completions (OpenRouter)
  OR-->>API: suggestion + usage
  API->>PG: save human_review.copilot_suggestion
  Note over API,OR: If the LLM call fails (e.g. SSL/API), fall back to rule_based_copilot

  User->>UI: Approve/edit fail-case document
  UI->>API: POST /backend/api/doc/{id}/decision\n(+ edited_fields.target_form)
  API->>PG: set human_review + insert review_audit

  User->>UI: Continue upload
  UI->>API: POST /backend/api/pipeline/resume
  API->>Orch: resume_upload
  Orch->>Upl: run stage6
  Upl->>Target: Login + fill form + submit
  Upl->>PG: set upload_result
  Note over UI,Upl: Auto-approve uploads directly only if require_human_approval=false and upload_dry_run=false

  User->>UI: Click Refresh Status
  UI->>API: GET /backend/api/pipeline/status + /backend/api/queue
  API-->>UI: latest status + queue summary + flow hint
          

Pipeline modes

full, until_review, resume_upload

Reviewer actions

approve, reject, edit

Main endpoints

/api/queue, /api/doc/{id}, /api/doc/{id}/decision, /api/doc/{id}/copilot, /api/pipeline/start, /api/pipeline/resume, /api/pipeline/status.

The document payload includes target_form_suggested, target_form_resolved, form_validation_status, form_validation_issues, and allowed_options.
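
To make the endpoint list more concrete, here is a minimal sketch of how a script might build requests for two of them with the standard library. Several things are assumptions: that the backend accepts JSON bodies, that the body keys are named `decision` and `mode`, and that the backend paths are the /api/... paths listed above when called directly on 127.0.0.1:8787. Only edited_fields.target_form and the mode values appear on this page. The functions build the requests without sending them.

```python
import json
from typing import Optional
from urllib.parse import quote
from urllib.request import Request

BACKEND = "http://127.0.0.1:8787"  # local backend API from this document


def _post_json(path: str, body: dict) -> Request:
    """Build (but do not send) a JSON POST request to the backend."""
    return Request(
        BACKEND + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def decision_request(
    doc_id: str, decision: str, target_form: Optional[dict] = None
) -> Request:
    """Reviewer decision for one document; 'decision' key is hypothetical."""
    body: dict = {"decision": decision}
    if target_form is not None:
        # The sequence diagram mentions edited_fields.target_form.
        body["edited_fields"] = {"target_form": target_form}
    return _post_json(f"/api/doc/{quote(doc_id, safe='')}/decision", body)


def start_request(mode: str) -> Request:
    """Start a pipeline run; passing mode in the body is an assumption."""
    if mode not in ("full", "until_review"):
        raise ValueError(f"unsupported start mode: {mode}")
    return _post_json("/api/pipeline/start", {"mode": mode})
```

A caller would pass the result to `urllib.request.urlopen(...)` once the backend is running. A browser session goes through the Next.js /backend/* proxy and its login gate instead.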

Login gate

Dashboard routes and requests to /backend/* are accessible only after login.

4) PostgreSQL Data Model (Current)

The schema currently used directly by the stage 2-6 scripts.

erDiagram
    DOCUMENTS ||--o{ REVIEW_AUDIT : has

    DOCUMENTS {
      text document_id PK
      text source_filename
      text source_path
      text source_sha256 UK
      bigint source_size_bytes
      jsonb metadata
      jsonb dedup_result
      jsonb llm_policy_result
      jsonb human_review
      jsonb upload_result
      timestamptz created_at
      timestamptz updated_at
    }

    REVIEW_AUDIT {
      bigserial id PK
      text review_id
      text document_id FK
      timestamptz event_at
      jsonb payload
    }

    PIPELINE_STATE {
      text name PK
      jsonb state
      timestamptz updated_at
    }

    PIPELINE_EVENTS {
      bigserial id PK
      text run_id
      text stage_id
      text event
      timestamptz event_at
      jsonb payload
    }
          

Simple design

A single main table, documents, stores each stage's payload in JSONB columns.

Reviewer audit

The history of reviewer decisions lives in review_audit as a trail of changes.

Pipeline monitoring

Run status and stage events are stored in pipeline_state and pipeline_events.
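
Because each stage writes its payload into its own JSONB column, the progress of a document can be inferred from which columns are populated. The sketch below assumes a row has already been fetched as a Python dict and that a column stays NULL or empty until its stage has run. The column names come from the schema above, while the stage labels and the `stage_progress` helper are hypothetical.

```python
# (label, column) pairs in pipeline order; labels are illustrative only.
STAGE_COLUMNS = [
    ("stage2_extract", "metadata"),
    ("stage3_dedup", "dedup_result"),
    ("stage4_policy", "llm_policy_result"),
    ("human_review", "human_review"),
    ("stage6_upload", "upload_result"),
]


def stage_progress(row: dict) -> list:
    """Return the labels of stages whose JSONB payload is present in the row."""
    return [label for label, column in STAGE_COLUMNS if row.get(column)]


def last_stage(row: dict):
    """Return the latest populated stage label, or None for a fresh row."""
    done = stage_progress(row)
    return done[-1] if done else None
```

A row with metadata and dedup_result filled but nothing else would report `["stage2_extract", "stage3_dedup"]`, which is the kind of summary the single-table design makes cheap to compute.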

5) Short Operational Checklist

The minimal sequence for the fail-only flow to run consistently with the current architecture.

Step 1 - Start the backend API

python3 backend/scripts/human_review_ui.py --runtime-dir data/runtime --host 127.0.0.1 --port 8787

Step 2 - Start the frontend

The Next.js UI runs on port 3000; sign in with an internal login account.

cd frontend/web
npm install
npm run dev

Open / for the overview, then move to /review to make reviewer decisions.

Step 3 - Check the target form mapping

Run it once at the start, then again whenever the target form options change.

python3 backend/scripts/upload_worker.py --runtime-dir data/runtime --env-file .env --inspect-form-options

Step 4 - Run Stage 4 (backfill/refresh policy)

Reprocesses LLM scoring and policy for all documents in PostgreSQL.

python3 backend/scripts/llm_score_policy.py --env-file .env

Step 5 - Validate the Copilot OpenRouter connection

Make sure the SSL env variables are set so LLM calls do not fall back because of certificate problems.

SSL_CERT_FILE=/path/to/certifi/cacert.pem
REQUESTS_CA_BUNDLE=/path/to/certifi/cacert.pem
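
Before starting the backend, a quick check can confirm that both variables are set and point at a real file. This is a minimal sketch using only the standard library. The `check_ca_env` helper and its status strings are hypothetical, and the check does not prove that the bundle is trusted by OpenRouter, only that the paths resolve.

```python
import os
from pathlib import Path

CA_VARS = ("SSL_CERT_FILE", "REQUESTS_CA_BUNDLE")


def check_ca_env(env=None) -> dict:
    """Report, per variable, whether it is unset, dangling, or usable."""
    env = os.environ if env is None else env
    report = {}
    for name in CA_VARS:
        value = env.get(name)
        if not value:
            report[name] = "unset"
        elif not Path(value).is_file():
            report[name] = "missing file"
        else:
            report[name] = "ok"
    return report


if __name__ == "__main__":
    for name, status in check_ca_env().items():
        print(f"{name}: {status}")
```

If either line reports anything other than ok, Copilot calls are likely to end up on the rule-based fallback path with an SSL-related fallback_reason.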

Step 6 - Use Refresh Status

On the overview and review pages, the Refresh Status button refreshes the pipeline status, the queue, and the latest flow hint.