Resumable crawling from checkpoints

ray
2026-04-27 09:16:07 +08:00
parent 5c1d0f3fad
commit 2e4ce07340
6 changed files with 283 additions and 38 deletions


@@ -59,6 +59,17 @@ npm run sync
npm run bills
```
### Resume bill crawling from a checkpoint
```powershell
npm run bills -- --resume
```
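What it does:
- Automatically reads the latest checkpoint under `data/checkpoints/bills/`.
- Continues crawling from the point after the month and page number recorded in that checkpoint.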
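
The resume behavior described above can be sketched roughly as follows. This is only an illustration, not the project's actual implementation: it assumes checkpoints are JSON files, that "latest" means most recently modified, and that each file carries hypothetical `month` and `page` fields, where `page` is the last page fully fetched.

```python
import json
from pathlib import Path
from typing import Optional, Tuple


def find_latest_checkpoint(directory: Path) -> Optional[Path]:
    """Return the most recently modified *.json file in directory, or None."""
    if not directory.is_dir():
        return None
    candidates = list(directory.glob("*.json"))
    if not candidates:
        return None
    # Assumption: "latest" is decided by modification time.
    return max(candidates, key=lambda p: p.stat().st_mtime)


def resume_position(checkpoint_path: Path) -> Tuple[str, int]:
    """Read a checkpoint and return the (month, page) to fetch next."""
    data = json.loads(checkpoint_path.read_text(encoding="utf-8"))
    # Hypothetical fields: "month" (e.g. "2026-03") and "page",
    # the last page that was fully fetched before the crawl stopped.
    return data["month"], int(data["page"]) + 1


if __name__ == "__main__":
    latest = find_latest_checkpoint(Path("data/checkpoints/bills"))
    if latest is None:
        print("No checkpoint found; a full crawl would start here.")
    else:
        month, page = resume_position(latest)
        print(f"Would resume at month {month}, page {page}")
```

If no checkpoint exists, the sketch falls back to reporting that a full crawl would be needed. What the real `--resume` flag does in that case is not stated here.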
### Start scheduled sync
```powershell
@@ -173,6 +184,12 @@ python aps_db_sync.py --sync-target bills
python aps_db_sync.py --incremental --sync-target bills
```
### Load the latest bills checkpoint directly into the database
```powershell
python aps_db_sync.py --sync-target bills --from-checkpoint
```
### Query the latest bill consumption time in the database
```powershell