Resumable crawling from checkpoints
@@ -59,6 +59,17 @@ npm run sync
npm run bills
```

### Resume crawling bills from a checkpoint

```powershell
npm run bills -- --resume
```

What it does:

- Automatically reads the latest checkpoint under `data/checkpoints/bills/`.
- Resumes crawling after the month and page number recorded in the checkpoint.
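
The resume behavior described above can be sketched roughly as follows. This is an illustration only, not the project's actual implementation: it assumes checkpoints are stored as `*.json` files, that "latest" means most recently modified, and that each file carries hypothetical `month` and `page` fields. The helper names `latest_checkpoint` and `resume_position` are likewise made up for this example.

```python
import json
from pathlib import Path
from typing import Optional, Tuple


def latest_checkpoint(directory: Path) -> Optional[Path]:
    """Return the most recently modified *.json file, or None if there is none.

    Assumption: checkpoints are JSON files and "latest" means newest mtime.
    """
    if not directory.is_dir():
        return None
    files = [p for p in directory.glob("*.json") if p.is_file()]
    return max(files, key=lambda p: p.stat().st_mtime, default=None)


def resume_position(checkpoint: Path) -> Tuple[str, int]:
    """Return (month, next_page), i.e. the first page not yet fetched.

    Assumption: the checkpoint stores the last completed page under the
    hypothetical keys "month" and "page".
    """
    state = json.loads(checkpoint.read_text(encoding="utf-8"))
    return state["month"], int(state["page"]) + 1


if __name__ == "__main__":
    found = latest_checkpoint(Path("data/checkpoints/bills"))
    if found is None:
        print("No checkpoint found; a full crawl would start from the beginning.")
    else:
        month, page = resume_position(found)
        print(f"Would resume at month {month}, page {page}")
```

The sketch continues within the recorded month at the next page. How the real crawler moves on to the following month once a month is exhausted is not shown in this diff.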

### Start scheduled sync

```powershell
@@ -173,6 +184,12 @@ python aps_db_sync.py --sync-target bills
python aps_db_sync.py --incremental --sync-target bills
```

### Load the latest bills checkpoint directly into the database

```powershell
python aps_db_sync.py --sync-target bills --from-checkpoint
```

### Query the latest bill consumption time in the database

```powershell