This commit is contained in:
ray
2026-04-21 21:16:56 +08:00
parent aa67b0e37e
commit 19e8a833ba
9 changed files with 777 additions and 82 deletions

aliyun-sync/COMMANDS.md Normal file

@@ -0,0 +1,304 @@
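# Alibaba Cloud APS Sync Command Reference
This document lists the commands commonly used in this project for crawling, syncing, database ingestion, scheduling, incremental runs, and runtime control.
## 1. Project directories
### Frontend crawler project
```powershell
cd D:\project\python\aliyun-sync\aliyun-aps-sync
```
### Backend database-ingest project
```powershell
cd D:\project\python\aliyun-sync\aps-aliyun-sync
```
## 2. Install dependencies
Run in the frontend crawler project directory:
```powershell
npm install
```
## 3. Log in
```powershell
npm run login
```
What it does:
- Opens a browser.
- Waits for you to complete the Alibaba Cloud / RAM login manually.
- Automatically verifies the "我的客户" (My Customers) and "账单查询" (Bill Query) pages.
- Saves the login state to `.browser/` and `.browser/storage-state.json`.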
## 4. Crawl / sync commands
### Full sync of all modules
```powershell
npm run sync
```
Synced by default:
- customers
- customerDetails
- orders
- orderDetails
- bills
### Crawl bills only
```powershell
npm run bills
```
### Start scheduled sync
```powershell
npm run schedule
```
Default cron:
```env
ALIYUN_APS_CRON=0 6 * * *
```
This runs the sync every day at 6:00 a.m.
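For readers less familiar with cron syntax, the sketch below labels the five fields of the default expression. `describeCron` is a hypothetical helper written only for illustration; it is not part of the project, which relies on `node-cron` for the actual scheduling.

```javascript
// Hypothetical helper (not in the project): label the five fields
// of a standard cron expression such as "0 6 * * *".
function describeCron(expression) {
  const names = ['minute', 'hour', 'dayOfMonth', 'month', 'dayOfWeek'];
  const fields = expression.trim().split(/\s+/);
  if (fields.length !== names.length) {
    throw new Error(`Expected 5 cron fields, got ${fields.length}`);
  }
  // Pair each field name with the value at the same position.
  return Object.fromEntries(names.map((name, i) => [name, fields[i]]));
}

console.log(describeCron('0 6 * * *'));
```

Read this way, minute `0` and hour `6` fix the time of day, while the three `*` fields leave the day, month, and weekday unrestricted.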
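## 5. Incremental sync
### Default incremental sync
Set in `.env`:
```env
ALIYUN_APS_FULL_SYNC=false
```
Then run:
```powershell
npm run sync
```
Default behavior:
- Orders: only yesterday is queried.
- Order details: follow the orders fetched in this run.
- Bills: fetched incrementally, starting from the latest consumption time in the database.
### Specify the incremental start date for orders / order details
One-off command:
```powershell
npm run sync -- --incremental-order-start-date=2026-01-01
```
Persistent `.env` configuration:
```env
ALIYUN_APS_FULL_SYNC=false
ALIYUN_APS_INCREMENTAL_ORDER_START_DATE=2026-01-01
```
Then run:
```powershell
npm run sync
```
Behavior:
- Orders are backfilled from the specified date through today.
- Order details are fetched for those orders.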
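A backfill from a start date through today is naturally split into one query window per calendar month. The sketch below shows one way to do that split; `buildMonthlyWindows` is a hypothetical stand-in, and the project's own `buildMonthlyDateWindows` (whose body is not shown in this commit) may differ in both signature and returned fields.

```javascript
// Hypothetical sketch, not the project's implementation:
// split [startDate, endDate] (both YYYY-MM-DD) into per-month windows.
const pad = (n) => String(n).padStart(2, '0');

// month is 1-based; day 0 of the following month is this month's last day.
function lastDayOfMonth(year, month) {
  return new Date(Date.UTC(year, month, 0)).getUTCDate();
}

function buildMonthlyWindows(startDate, endDate) {
  // ISO dates compare correctly as plain strings.
  if (startDate > endDate) return [];
  const [sy, sm, sd] = startDate.split('-').map(Number);
  const [ey, em, ed] = endDate.split('-').map(Number);
  const windows = [];
  let year = sy;
  let month = sm;
  while (year < ey || (year === ey && month <= em)) {
    const isFirst = year === sy && month === sm;
    const isLast = year === ey && month === em;
    const startDay = isFirst ? sd : 1;
    const endDay = isLast ? ed : lastDayOfMonth(year, month);
    windows.push({
      start: `${year}-${pad(month)}-${pad(startDay)}`,
      end: `${year}-${pad(month)}-${pad(endDay)}`,
    });
    month += 1;
    if (month > 12) {
      month = 1;
      year += 1;
    }
  }
  return windows;
}

console.log(buildMonthlyWindows('2026-01-15', '2026-03-10'));
```

Working on the date strings rather than on local `Date` objects keeps the window boundaries independent of the machine's timezone.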
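### Specify the bill start month
Set in `.env`:
```env
ALIYUN_APS_BILL_START_MONTH=2024-01
```
Then run:
```powershell
npm run bills
```
Note: bill crawling currently supports only a start month. Starting from a specific date would require adding a bill date filter parameter.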
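Because bills are addressed by month, a start month expands into a list of `YYYY-MM` values up to the current month. The following is a minimal sketch of that expansion; `listMonths` is a hypothetical name and is not taken from the project's code.

```javascript
// Hypothetical sketch: enumerate YYYY-MM values from startMonth to endMonth, inclusive.
function listMonths(startMonth, endMonth) {
  const [sy, sm] = startMonth.split('-').map(Number);
  const [ey, em] = endMonth.split('-').map(Number);
  const months = [];
  let year = sy;
  let month = sm;
  while (year < ey || (year === ey && month <= em)) {
    months.push(`${year}-${String(month).padStart(2, '0')}`);
    month += 1;
    if (month > 12) {
      month = 1;
      year += 1;
    }
  }
  return months;
}

console.log(listMonths('2023-11', '2024-02'));
```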
## 6. Backend ingest commands
Go to the backend ingest project directory:
```powershell
cd D:\project\python\aliyun-sync\aps-aliyun-sync
```
### Full ingest
```powershell
python aps_db_sync.py
```
### Incremental ingest
```powershell
python aps_db_sync.py --incremental
```
### Ingest a specific sync target
```powershell
python aps_db_sync.py --sync-target all
python aps_db_sync.py --sync-target customer
python aps_db_sync.py --sync-target order
python aps_db_sync.py --sync-target orderdetails
python aps_db_sync.py --sync-target bills
```
### Incremental ingest of bills only
```powershell
python aps_db_sync.py --incremental --sync-target bills
```
### Query the latest bill consumption time in the database
```powershell
python aps_db_sync.py --latest-bill-consumption-time
```
## 7. Backend scheduler commands
Go to the backend ingest project directory:
```powershell
cd D:\project\python\aliyun-sync\aps-aliyun-sync
```
### Start the scheduler
```powershell
python aps_scheduler.py
```
### Start the scheduler for a specific sync target
```powershell
python aps_scheduler.py --sync-target all
python aps_scheduler.py --sync-target customer
python aps_scheduler.py --sync-target order
python aps_scheduler.py --sync-target orderdetails
python aps_scheduler.py --sync-target bills
```
## 8. Common `.env` examples
File location:
```powershell
D:\project\python\aliyun-sync\aliyun-aps-sync\.env
```
### Full mode
```env
ALIYUN_APS_BASE_URL=https://aps.aliyun.com
ALIYUN_APS_HEADLESS=false
ALIYUN_APS_TIMEZONE=Asia/Shanghai
ALIYUN_APS_CRON=0 6 * * *
ALIYUN_APS_FULL_SYNC=true
ALIYUN_APS_ORDER_START_DATE=2024-01-01
ALIYUN_APS_INCREMENTAL_ORDER_START_DATE=
ALIYUN_APS_BILL_START_MONTH=2024-01
ALIYUN_APS_CLOSE_BROWSER=true
ALIYUN_APS_DB_SYNC_SCRIPT=../aps-aliyun-sync/aps_db_sync.py
```
### Incremental mode: query only yesterday's orders (default)
```env
ALIYUN_APS_FULL_SYNC=false
ALIYUN_APS_INCREMENTAL_ORDER_START_DATE=
```
### Incremental mode: explicit order start date
```env
ALIYUN_APS_FULL_SYNC=false
ALIYUN_APS_INCREMENTAL_ORDER_START_DATE=2026-01-01
```
## 9. Common combinations
### First-time use
```powershell
cd D:\project\python\aliyun-sync\aliyun-aps-sync
npm install
npm run login
npm run sync
```
### Fetch bills only
```powershell
cd D:\project\python\aliyun-sync\aliyun-aps-sync
npm run login
npm run bills
```
### Backfill orders / order details from a specified date
```powershell
cd D:\project\python\aliyun-sync\aliyun-aps-sync
npm run login
npm run sync -- --incremental-order-start-date=2026-01-01
```
### After crawling, ingest bills only
```powershell
cd D:\project\python\aliyun-sync\aps-aliyun-sync
python aps_db_sync.py --sync-target bills
```
### After crawling, ingest bills incrementally
```powershell
cd D:\project\python\aliyun-sync\aps-aliyun-sync
python aps_db_sync.py --incremental --sync-target bills
```
## 10. Clear the login state
If the login state is broken, delete `.browser` and log in again:
```powershell
cd D:\project\python\aliyun-sync\aliyun-aps-sync
Remove-Item -Recurse -Force .browser
npm run login
```
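## 11. Runtime hotkeys
While a script is running, the following keys work in the current terminal:
| Key | Action |
| --- | --- |
| F7 | Pause |
| F8 | Resume |
| F9 | Terminate |
Note: these hotkeys work only inside the current terminal process; they are not system-wide global hotkeys.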
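Pause and terminate requests of this kind are usually cooperative: a key press only flips a flag, and the long-running task honors it the next time it reaches a checkpoint. The sketch below illustrates that pattern without any keyboard handling. `createController` and `runSteps` are hypothetical names, and the 50 ms polling interval is an arbitrary choice for the example.

```javascript
// Hypothetical sketch of cooperative pause/terminate via checkpoints.
function createController() {
  let paused = false;
  let terminated = false;
  const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
  return {
    pause() { paused = true; },
    resume() { paused = false; },
    terminate() { terminated = true; paused = false; },
    // Call between units of work: blocks while paused, throws once terminated.
    async checkpoint(label) {
      if (terminated) throw new Error(`terminated at: ${label}`);
      while (paused) {
        await sleep(50);
        if (terminated) throw new Error(`terminated at: ${label}`);
      }
    },
  };
}

// Runs each step only after passing a checkpoint.
async function runSteps(controller, steps) {
  const done = [];
  for (const step of steps) {
    await controller.checkpoint(step);
    done.push(step);
  }
  return done;
}
```

A consequence of this design is that a terminate request takes effect only at the next checkpoint, so work already in progress (for example, a page load) finishes or times out first.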


@@ -3,4 +3,5 @@ ALIYUN_APS_HEADLESS=false
 ALIYUN_APS_TIMEZONE=Asia/Shanghai
 ALIYUN_APS_CRON=0 6 * * *
 ALIYUN_APS_ORDER_START_DATE=2024-01-01
+ALIYUN_APS_INCREMENTAL_ORDER_START_DATE=
 ALIYUN_APS_BILL_START_MONTH=2024-01


@@ -23,9 +23,10 @@ cp .env.example .env
 ## Configuration
-The two most important time ranges in `.env`:
+The most important time ranges in `.env`:
 - `ALIYUN_APS_ORDER_START_DATE`: start date for order queries; orders are fetched month by month up to today.
+- `ALIYUN_APS_INCREMENTAL_ORDER_START_DATE`: explicit start date for orders / order details in incremental mode; when left empty, only yesterday is queried.
 - `ALIYUN_APS_BILL_START_MONTH`: starting commission month for bill queries; bills are fetched month by month up to the current month.
 ## Usage
@@ -42,6 +43,18 @@ npm run login
 npm run sync
 ```
+To backfill orders and order details from a specified date in incremental mode, configure:
+```bash
+ALIYUN_APS_INCREMENTAL_ORDER_START_DATE=2026-01-01
+```
+Or run a one-off command:
+```bash
+npm run sync -- --incremental-order-start-date=2026-01-01
+```
 3. Resident scheduled sync
 ```bash


@@ -23,6 +23,7 @@ export const config = {
   timezone: process.env.ALIYUN_APS_TIMEZONE || 'Asia/Shanghai',
   cron: process.env.ALIYUN_APS_CRON || '0 6 * * *',
   orderStartDate: process.env.ALIYUN_APS_ORDER_START_DATE || '2024-01-01',
+  incrementalOrderStartDate: process.env.ALIYUN_APS_INCREMENTAL_ORDER_START_DATE || '',
   billStartMonth: process.env.ALIYUN_APS_BILL_START_MONTH || '2024-01',
   smtp: {
     host: process.env.ALIYUN_APS_SMTP_HOST || 'smtp.qq.com',
@@ -38,6 +39,7 @@ export const config = {
   resumeBillPage: Math.max(1, Number.parseInt(process.env.ALIYUN_APS_RESUME_BILL_PAGE || '1', 10) || 1),
   dbSyncScript: process.env.ALIYUN_APS_DB_SYNC_SCRIPT || '../aps-aliyun-sync/aps_db_sync.py',
   userDataDir: ensureDir(path.join(rootDir, '.browser')),
+  storageStateFile: path.join(rootDir, '.browser', 'storage-state.json'),
   dataDir: ensureDir(path.join(rootDir, 'data')),
   downloadDir: ensureDir(path.join(rootDir, 'downloads')),
 };


@@ -1,6 +1,13 @@
-import { login, scheduleSync, syncAll, syncBillsOnly } from './sync.js';
-const command = process.argv[2] || 'sync';
+const args = process.argv.slice(2);
+const command = args[0] || 'sync';
+for (const arg of args.slice(1)) {
+  if (arg.startsWith('--incremental-order-start-date=')) {
+    process.env.ALIYUN_APS_INCREMENTAL_ORDER_START_DATE = arg.split('=').slice(1).join('=');
+  }
+}
+const { login, scheduleSync, syncAll, syncBillsOnly } = await import('./sync.js');
 if (command === 'login') {
   await login();
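
The hunk above copies a `--incremental-order-start-date=` flag into `process.env` before loading `./sync.js` dynamically, presumably so that `config.js` sees the value when it is first evaluated (a static `import` would be evaluated before any of this code runs). The `split('=').slice(1).join('=')` expression keeps everything after the first `=`. The sketch below shows an equivalent prefix-based extraction; `readFlagValue` is a hypothetical helper and is not part of the commit.

```javascript
// Hypothetical helper: return the value of "--<flagName>=<value>" from argv,
// or undefined when the flag is absent. Everything after the first "=" is kept.
function readFlagValue(argv, flagName) {
  const prefix = `--${flagName}=`;
  const match = argv.find((arg) => arg.startsWith(prefix));
  return match === undefined ? undefined : match.slice(prefix.length);
}

const argv = ['sync', '--incremental-order-start-date=2026-01-01'];
console.log(readFlagValue(argv, 'incremental-order-start-date'));
```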


@@ -4,7 +4,7 @@ import { config } from './config.js';
 let lastSentAt = 0;
 const ONE_HOUR_MS = 60 * 60 * 1000;
-export async function sendLoginAlert() {
+export async function sendLoginAlert(loginUrl = '') {
   const now = Date.now();
   if (now - lastSentAt < ONE_HOUR_MS) {
     console.log('[通知] 登录提醒1小时内已发送过跳过重复发送');
@@ -33,7 +33,7 @@ export async function sendLoginAlert() {
     },
   });
-  const url = `${baseUrl}/#/signin`;
+  const url = loginUrl || `${baseUrl}/#/signin`;
   const timestamp = new Date().toISOString();
   const subject = '[APS同步] 登录态已过期,请手动登录';
   const text = [
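
`sendLoginAlert` skips sending when the previous alert is less than an hour old, using `lastSentAt` and `ONE_HOUR_MS`. The sketch below isolates that throttling idea with an injectable clock so it can be exercised without waiting. `createThrottle` is a hypothetical helper, and the point at which the real function updates `lastSentAt` is not visible in this hunk.

```javascript
// Hypothetical sketch: allow an action at most once per intervalMs.
// The clock is injectable so the behavior can be checked deterministically.
function createThrottle(intervalMs, now = () => Date.now()) {
  let lastRunAt = 0;
  return function shouldRun() {
    const current = now();
    if (current - lastRunAt < intervalMs) return false;
    lastRunAt = current;
    return true;
  };
}

const ONE_HOUR_MS = 60 * 60 * 1000;
const shouldSendAlert = createThrottle(ONE_HOUR_MS);
console.log(shouldSendAlert()); // first call in this process is allowed
console.log(shouldSendAlert()); // an immediate second call is suppressed
```

Because the timestamp lives in module memory, a throttle like this resets whenever the process restarts.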


@@ -1,6 +1,8 @@
 import { chromium } from 'playwright';
 import cron from 'node-cron';
+import fs from 'node:fs';
 import path from 'node:path';
+import readline from 'node:readline';
 import { execSync } from 'node:child_process';
 import { config, datasets } from './config.js';
 import { sendLoginAlert } from './notify.js';
@@ -17,6 +19,106 @@ import {
 const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
 let _context = null;
+let _runtimeController = null;
+const AUTH_PAGE_KEYWORDS = [
+  'RAM 用户登录',
+  '主账号登录',
+  '钉钉扫码登录',
+  '用户名',
+  '下一步',
+  '登录并使用 RAM',
+];
+async function closeContextIfNeeded() {
+  if (!_context) return;
+  await _context.close();
+  _context = null;
+}
+function getRuntimeController() {
+  if (_runtimeController) return _runtimeController;
+  let paused = false;
+  let terminated = false;
+  let keypressBound = false;
+  const onKeypress = (_str, key = {}) => {
+    if (key.name === 'f7') {
+      if (!paused) {
+        paused = true;
+        console.log('[控制] 已暂停F7。按 F8 继续,按 F9 终止。');
+      }
+      return;
+    }
+    if (key.name === 'f8') {
+      if (paused) {
+        paused = false;
+        console.log('[控制] 已继续F8。');
+      }
+      return;
+    }
+    if (key.name === 'f9') {
+      terminated = true;
+      paused = false;
+      console.log('[控制] 已请求终止F9将在安全检查点停止。');
+    }
+  };
+  const bind = () => {
+    if (keypressBound || !process.stdin.isTTY) return;
+    readline.emitKeypressEvents(process.stdin);
+    if (typeof process.stdin.setRawMode === 'function') {
+      process.stdin.setRawMode(true);
+    }
+    process.stdin.resume();
+    process.stdin.on('keypress', onKeypress);
+    keypressBound = true;
+    console.log('[控制] 热键已启用F7 暂停 / F8 继续 / F9 终止');
+  };
+  const unbind = () => {
+    if (!keypressBound) return;
+    process.stdin.off('keypress', onKeypress);
+    if (process.stdin.isTTY && typeof process.stdin.setRawMode === 'function') {
+      process.stdin.setRawMode(false);
+    }
+    keypressBound = false;
+  };
+  const waitIfPaused = async (label = '任务') => {
+    if (terminated) {
+      throw new Error(`[控制] 已终止:${label}`);
+    }
+    while (paused) {
+      await sleep(300);
+      if (terminated) {
+        throw new Error(`[控制] 已终止:${label}`);
+      }
+    }
+  };
+  const throwIfTerminated = (label = '任务') => {
+    if (terminated) {
+      throw new Error(`[控制] 已终止:${label}`);
+    }
+  };
+  _runtimeController = {
+    bind,
+    unbind,
+    waitIfPaused,
+    throwIfTerminated,
+  };
+  return _runtimeController;
+}
+async function runtimeCheckpoint(label) {
+  const controller = getRuntimeController();
+  controller.throwIfTerminated(label);
+  await controller.waitIfPaused(label);
+}
 async function getContext() {
   if (_context) return _context;
@@ -26,24 +128,103 @@ async function getContext() {
     acceptDownloads: true,
     downloadsPath: config.downloadDir,
   });
+  await restoreStorageState(_context);
   return _context;
 }
+async function restoreStorageState(context) {
+  if (!fs.existsSync(config.storageStateFile)) {
+    return;
+  }
+  try {
+    const state = JSON.parse(fs.readFileSync(config.storageStateFile, 'utf-8'));
+    if (Array.isArray(state.cookies) && state.cookies.length > 0) {
+      await context.addCookies(state.cookies);
+      console.log(`[storageState] 已恢复 ${state.cookies.length} 个 cookie`);
+    }
+  } catch (error) {
+    console.warn(`[storageState] 恢复失败,继续使用 .browser profile: ${error.message}`);
+  }
+}
+async function saveStorageState(context) {
+  await context.storageState({ path: config.storageStateFile });
+  console.log(`[storageState] 已保存登录态快照: ${config.storageStateFile}`);
+}
+async function getPageBodyPreview(page) {
+  return page
+    .evaluate(() => document.body?.innerText?.substring(0, 500) || '(空)')
+    .catch(() => '(无法获取)');
+}
+function isAuthUrl(url) {
+  return /account\.aliyun\.com|signin\.aliyun\.com/.test(url)
+    || url.includes('login.htm')
+    || url.includes('/#/signin');
+}
+function hasAuthKeywords(text) {
+  return AUTH_PAGE_KEYWORDS.some((keyword) => text.includes(keyword));
+}
+async function detectAuthRedirect(page) {
+  const currentUrl = page.url();
+  const bodyText = await getPageBodyPreview(page);
+  return {
+    currentUrl,
+    bodyText,
+    isAuthPage: isAuthUrl(currentUrl) || hasAuthKeywords(bodyText),
+  };
+}
+async function ensureDatasetAccessible(page, dataset, timeout = 120000, options = {}) {
+  await page.goto(dataset.url, { waitUntil: 'domcontentloaded' });
+  await waitUntilReady(page, dataset.heading, timeout, options);
+}
 export async function login() {
+  const runtimeController = getRuntimeController();
+  runtimeController.bind();
   const context = await getContext();
-  const page = context.pages()[0] || (await context.newPage());
-  await page.goto(datasets.customers.url, { waitUntil: 'domcontentloaded' });
-  console.log('请在打开的浏览器里完成阿里云伙伴中心登录,然后回到终端按 Ctrl+C 结束。');
-  await waitUntilReady(page, datasets.customers.heading, 10 * 60 * 1000);
-  console.log('登录态已写入 .browser 目录,后续可直接执行 npm run sync。');
-  // The context must be closed cleanly, otherwise the login state is not persisted to disk
-  await context.close();
-  _context = null;
+  const cleanupAndExit = async (signal) => {
+    console.log(`[login] 收到 ${signal},正在保存登录态并关闭浏览器...`);
+    await closeContextIfNeeded();
+    process.exit(130);
+  };
+  const onSigint = () => {
+    void cleanupAndExit('SIGINT');
+  };
+  const onSigterm = () => {
+    void cleanupAndExit('SIGTERM');
+  };
+  process.once('SIGINT', onSigint);
+  process.once('SIGTERM', onSigterm);
+  try {
+    const page = context.pages()[0] || (await context.newPage());
+    await page.goto(datasets.customers.url, { waitUntil: 'domcontentloaded' });
+    console.log('请在打开的浏览器里完成阿里云伙伴中心登录。检测到进入“我的客户”和“账单查询”页面后,脚本会自动保存登录态并关闭浏览器。');
+    await waitUntilReady(page, datasets.customers.heading, 10 * 60 * 1000, { allowInteractiveAuth: true });
+    console.log('[login] 我的客户页验证通过,继续验证账单页登录态...');
+    await ensureDatasetAccessible(page, datasets.bills, 60 * 1000, { allowInteractiveAuth: true });
+    await sleep(1000);
+    await saveStorageState(context);
+    console.log('登录态已写入 .browser 目录,且已验证“我的客户”和“账单查询”页面可访问,后续可直接执行 npm run sync 或 npm run bills。');
+  } finally {
+    process.off('SIGINT', onSigint);
+    process.off('SIGTERM', onSigterm);
+    await closeContextIfNeeded();
+    runtimeController.unbind();
+  }
 }
 export async function syncAll() {
+  const runtimeController = getRuntimeController();
+  runtimeController.bind();
   const context = await getContext();
   try {
@@ -67,11 +248,36 @@ export async function syncAll() {
     return summary;
   } finally {
     if (config.closeBrowser) {
-      await context.close();
-      _context = null;
+      await closeContextIfNeeded();
     } else {
       console.log('浏览器保持运行');
     }
+    runtimeController.unbind();
+  }
+}
+export async function syncBillsOnly() {
+  const runtimeController = getRuntimeController();
+  runtimeController.bind();
+  const context = await getContext();
+  try {
+    const summary = { startedAt: new Date().toISOString(), datasets: {} };
+    const page = context.pages()[0] || (await context.newPage());
+    summary.datasets.bills = await syncBills(page);
+    summary.finishedAt = new Date().toISOString();
+    const stamp = nowStamp();
+    saveRunSummary(stamp, summary);
+    return summary;
+  } finally {
+    if (config.closeBrowser) {
+      await closeContextIfNeeded();
+    } else {
+      console.log('浏览器保持运行');
+    }
+    runtimeController.unbind();
   }
 }
@@ -106,6 +312,7 @@ export async function scheduleSync() {
 }
 async function syncCustomers(page) {
+  await runtimeCheckpoint('同步客户');
   const dataset = datasets.customers;
   await page.goto(dataset.url, { waitUntil: 'domcontentloaded' });
   await waitUntilReady(page, dataset.heading);
@@ -115,6 +322,7 @@ async function syncCustomers(page) {
 }
 async function syncCustomerDetails(page) {
+  await runtimeCheckpoint('同步客户详情');
   const dataset = datasets.customerDetails;
   const customersState = loadCurrentState('customers');
   const allAccountIds = collectValidAccountIds(customersState.records || []);
@@ -130,6 +338,7 @@ async function syncCustomerDetails(page) {
     'https://aps.aliyun.com/?spm=5176.12818093.top-nav.ditem-fx.785716d0LKDpKT#/detail/my_customer/~/customer/';
   for (let index = 0; index < allAccountIds.length; index += 1) {
+    await runtimeCheckpoint(`客户详情 ${index + 1}/${allAccountIds.length}`);
     const accountId = allAccountIds[index];
     console.log(`[客户详情] ${index + 1}/${allAccountIds.length} accountId=${accountId}`);
@@ -158,23 +367,20 @@ async function syncCustomerDetails(page) {
 }
 async function syncOrders(page) {
+  await runtimeCheckpoint('同步订单');
   const dataset = datasets.orders;
   let windows;
   if (config.fullSync) {
     windows = buildMonthlyDateWindows(config.orderStartDate);
   } else {
-    // Incremental mode: query only the previous day
-    const yesterday = new Date();
-    yesterday.setDate(yesterday.getDate() - 1);
-    const dateStr = formatDate(yesterday);
-    windows = [{ windowStart: dateStr, windowEnd: dateStr, start: dateStr, end: dateStr }];
-    console.log(`[增量模式] 订单仅查询: ${dateStr}`);
+    windows = buildIncrementalOrderWindows();
   }
   const allRecords = [];
   for (const window of windows) {
+    await runtimeCheckpoint(`订单窗口 ${window.start} ~ ${window.end}`);
     await page.goto(dataset.url, { waitUntil: 'domcontentloaded' });
     await waitUntilReady(page, dataset.heading);
     await setDateRange(page, window.start, window.end);
@@ -187,7 +393,40 @@ async function syncOrders(page) {
   return persistDataset(dataset, dedupeByHash(allRecords), {});
 }
+function buildIncrementalOrderWindows() {
+  const configuredStartDate = normalizeConfiguredDate(config.incrementalOrderStartDate);
+  if (configuredStartDate) {
+    const windows = buildMonthlyDateWindows(configuredStartDate);
+    console.log(`[增量模式] 订单从指定日期开始查询: ${configuredStartDate}`);
+    return windows;
+  }
+  const yesterday = new Date();
+  yesterday.setDate(yesterday.getDate() - 1);
+  const dateStr = formatDate(yesterday);
+  console.log(`[增量模式] 订单仅查询: ${dateStr}`);
+  return [{ windowStart: dateStr, windowEnd: dateStr, start: dateStr, end: dateStr }];
+}
+function normalizeConfiguredDate(value) {
+  const normalized = String(value || '').trim();
+  if (!normalized) {
+    return '';
+  }
+  if (!/^\d{4}-\d{2}-\d{2}$/.test(normalized)) {
+    throw new Error(`ALIYUN_APS_INCREMENTAL_ORDER_START_DATE 格式无效: ${normalized},期望 YYYY-MM-DD`);
+  }
+  const parsed = new Date(`${normalized}T00:00:00+08:00`);
+  if (Number.isNaN(parsed.getTime())) {
+    throw new Error(`ALIYUN_APS_INCREMENTAL_ORDER_START_DATE 不是有效日期: ${normalized}`);
+  }
+  return normalized;
+}
 async function syncBills(page) {
+  await runtimeCheckpoint('同步账单');
   const dataset = datasets.bills;
   let months;
   let latestConsumptionDate = null;
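
`normalizeConfiguredDate` in the hunk above checks the `YYYY-MM-DD` shape and then relies on `Date` parsing to reject invalid values. Whether an impossible date such as `2026-02-31` fails that parse may depend on the JavaScript engine, so one option is an explicit round-trip check. The sketch below is an assumption-labeled alternative, not part of the commit; `isRealCalendarDate` is a hypothetical name.

```javascript
// Hypothetical helper: true only for YYYY-MM-DD strings that name a real calendar day.
function isRealCalendarDate(value) {
  const match = /^(\d{4})-(\d{2})-(\d{2})$/.exec(value);
  if (!match) return false;
  const [year, month, day] = match.slice(1).map(Number);
  // Build the date in UTC and confirm no field rolled over (e.g. Feb 31 -> Mar 3).
  const date = new Date(Date.UTC(year, month - 1, day));
  return date.getUTCFullYear() === year
    && date.getUTCMonth() === month - 1
    && date.getUTCDate() === day;
}

console.log(isRealCalendarDate('2026-01-01'));
console.log(isRealCalendarDate('2026-02-31'));
```

The check compares the components after construction, so it does not depend on how any particular engine treats out-of-range date strings.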
@@ -205,6 +444,7 @@ async function syncBills(page) {
   const allRecords = [];
   for (const month of months) {
+    await runtimeCheckpoint(`账单月份 ${month}`);
     await page.goto(dataset.url, { waitUntil: 'domcontentloaded' });
     await waitUntilReady(page, dataset.heading);
     await setMonthValue(page, month);
@@ -247,6 +487,7 @@ function isAfterLatestConsumptionDate(record, latestConsumptionDate) {
 }
 async function syncOrderDetails(page, cachedOrderIds) {
+  await runtimeCheckpoint('同步订单详情');
   const dataset = datasets.orderDetails;
   // Use the passed-in orderId list (cached before syncOrders overwrites orders.json)
@@ -262,6 +503,7 @@ async function syncOrderDetails(page, cachedOrderIds) {
   const detailBaseUrl = 'https://aps.aliyun.com/?spm=5176.12818093.top-nav.ditem-fx.785716d0LKDpKT#/detail/order/~/costCenter/order/detail/';
   for (let index = 0; index < allOrderIds.length; index += 1) {
+    await runtimeCheckpoint(`订单详情 ${index + 1}/${allOrderIds.length}`);
     const orderId = allOrderIds[index];
     console.log(`[订单详情] ${index + 1}/${allOrderIds.length} orderId=${orderId}`);
@@ -304,11 +546,30 @@ function persistDataset(dataset, records, context) {
   };
 }
-async function waitUntilReady(page, heading, timeout = 120000) {
+async function waitUntilReady(page, heading, timeout = 120000, options = {}) {
+  await runtimeCheckpoint(`等待页面 ${heading}`);
+  const { allowInteractiveAuth = false } = options;
   await page.waitForLoadState('domcontentloaded');
   console.log(`[waitUntilReady] 当前URL: ${page.url()}`);
   console.log(`[waitUntilReady] 等待页面出现: "${heading}"`);
+  const initialState = await detectAuthRedirect(page);
+  if (initialState.isAuthPage) {
+    console.error(`[waitUntilReady] 检测到登录页/鉴权页: ${initialState.currentUrl}`);
+    console.error(`[waitUntilReady] 页面内容前500字: ${initialState.bodyText}`);
+    if (!allowInteractiveAuth && isAuthUrl(initialState.currentUrl)) {
+      try {
+        await sendLoginAlert(initialState.currentUrl);
+      } catch (notifyErr) {
+        console.error('[通知] 发送登录提醒失败:', notifyErr.message);
+      }
+    }
+    if (!allowInteractiveAuth) {
+      throw new Error(`当前页面仍处于登录/鉴权页,无法进入「${heading}」。请重新执行 npm run login并确认该账号对该页面有访问权限。`);
+    }
+    console.log(`[waitUntilReady] 允许交互式登录,等待用户完成认证后进入「${heading}」...`);
+  }
   try {
     await page.waitForFunction(
       (text) => document.body && document.body.innerText.includes(text),
@@ -317,22 +578,26 @@ async function waitUntilReady(page, heading, timeout = 120000) {
     );
   } catch (err) {
     // Print diagnostics on timeout
-    const currentUrl = page.url();
-    const bodyText = await page.evaluate(() => document.body?.innerText?.substring(0, 500) || '(空)').catch(() => '(无法获取)');
+    const { currentUrl, bodyText, isAuthPage } = await detectAuthRedirect(page);
     console.error(`[waitUntilReady] 超时当前URL: ${currentUrl}`);
     console.error(`[waitUntilReady] 页面内容前500字: ${bodyText}`);
-    if (currentUrl.includes('signin')) {
+    if (isAuthPage && !allowInteractiveAuth) {
       try {
-        await sendLoginAlert();
+        await sendLoginAlert(currentUrl);
       } catch (notifyErr) {
         console.error('[通知] 发送登录提醒失败:', notifyErr.message);
       }
+      throw new Error(`当前页面停留在登录/鉴权页,未能进入「${heading}」。请重新执行 npm run login并确认该账号对该页面有访问权限。`);
+    }
+    if (isAuthPage && allowInteractiveAuth) {
+      throw new Error(`交互式登录超时,仍未进入「${heading}」。请确认已在浏览器中完成 RAM/阿里云登录,并且当前账号有访问该页面的权限。`);
     }
     throw err;
   }
-  if ((await page.locator('text=登录').count()) > 0 && page.url().includes('login')) {
-    throw new Error('当前未登录,请先执行 npm run login');
+  const finalState = await detectAuthRedirect(page);
+  if (finalState.isAuthPage && !allowInteractiveAuth) {
+    throw new Error(`当前页面仍处于登录/鉴权页,未成功进入「${heading}」。请重新执行 npm run login并确认该账号对该页面有访问权限。`);
   }
   await sleep(1500);
 }
@@ -342,6 +607,7 @@ async function scrapePagedTable(page, dataset, context) {
   const visited = new Set();
   while (true) {
+    await runtimeCheckpoint(`抓取 ${dataset.name} 分页`);
     await waitForTableRows(page);
     const pageData = await extractTable(page);
     const pageNum = await currentPageNumber(page);
@@ -403,6 +669,7 @@ async function extractTable(page) {
 }
 async function waitForTableRows(page) {
+  await runtimeCheckpoint('等待表格加载');
   await page.waitForFunction(() => document.querySelectorAll('table tbody tr').length > 0, null, { timeout: 120000 });
   await sleep(800);
 }
@@ -414,6 +681,7 @@ async function currentPageNumber(page) {
 }
 async function gotoNextPage(page) {
+  await runtimeCheckpoint('翻页');
   const before = await currentPageNumber(page);
   // Locate the "下一页" (next page) button with a Playwright locator
@@ -439,6 +707,7 @@ async function gotoNextPage(page) {
 }
 async function trySetPageSize(page, pageSize) {
+  await runtimeCheckpoint(`设置每页 ${pageSize}`);
   const input = page.locator('input[aria-label="请选择每页显示几条"]').first();
   if ((await input.count()) === 0) return;
   await input.click().catch(() => null);
@@ -453,6 +722,7 @@ async function trySetPageSize(page, pageSize) {
 }
 async function setDateRange(page, start, end) {
+  await runtimeCheckpoint(`设置订单日期 ${start} ~ ${end}`);
   console.log(`[订单日期] 设置: ${start} ~ ${end}`);
   await _fillDateRange(page, start, end);
@@ -474,6 +744,7 @@ async function setDateRange(page, start, end) {
 }
 async function _fillDateRange(page, start, end, startFirst = false) {
+  await runtimeCheckpoint('填写订单日期');
   const trigger = page.locator('input[placeholder="结束日期"]');
   await trigger.click();
   await sleep(1000);
@@ -520,6 +791,7 @@ async function _fillDateRange(page, start, end, startFirst = false) {
 }
 async function setMonthValue(page, month) {
+  await runtimeCheckpoint(`设置账单月份 ${month}`);
   // First try to match the YYYY-MM format by inputValue
   const inputs = page.locator('input');
   const total = await inputs.count();
@@ -568,6 +840,7 @@ async function setMonthValue(page, month) {
  * Even if the panel pops up, typing quickly and pressing Tab completes the submission and closes the panel before it scrolls.
  */
 async function typeIntoDateInput(locator, value, page) {
+  await runtimeCheckpoint(`填写日期输入 ${value}`);
   // Remove readonly
   await locator.evaluate((node) => node.removeAttribute('readonly'));
@@ -599,6 +872,7 @@ async function typeIntoDateInput(locator, value, page) {
 }
 async function clickQuery(page) {
+  await runtimeCheckpoint('点击查询');
   const button = page.locator('button:has-text("查询")').first();
   await button.click();
   await sleep(1800);


@@ -35,6 +35,7 @@ from pymysql.cursors import DictCursor
JsonDict = dict[str, object] JsonDict = dict[str, object]
JsonList = list[JsonDict] JsonList = list[JsonDict]
StatsDict = dict[str, int] StatsDict = dict[str, int]
SyncTarget = str
class DbConfig(TypedDict): class DbConfig(TypedDict):
@@ -48,7 +49,7 @@ class DbConfig(TypedDict):
# --------------------------------------------------------------------------- # ---------------------------------------------------------------------------
# Config # Config
# --------------------------------------------------------------------------- # ---------------------------------------------------------------------------
DB_CONFIG = { DB_CONFIG: DbConfig = {
"host": "172.27.137.236", "host": "172.27.137.236",
"port": 3306, "port": 3306,
"user": "ray", "user": "ray",
@@ -62,6 +63,19 @@ LOG_FORMAT = "%(asctime)s [%(levelname)s] %(message)s"
logging.basicConfig(level=logging.INFO, format=LOG_FORMAT) logging.basicConfig(level=logging.INFO, format=LOG_FORMAT)
logger = logging.getLogger("aps_sync") logger = logging.getLogger("aps_sync")
SYNC_TARGET_ALL = "all"
SYNC_TARGET_CUSTOMER = "customer"
SYNC_TARGET_ORDER = "order"
SYNC_TARGET_ORDERDETAILS = "orderdetails"
SYNC_TARGET_BILLS = "bills"
VALID_SYNC_TARGETS = {
SYNC_TARGET_ALL,
SYNC_TARGET_CUSTOMER,
SYNC_TARGET_ORDER,
SYNC_TARGET_ORDERDETAILS,
SYNC_TARGET_BILLS,
}
# --------------------------------------------------------------------------- # ---------------------------------------------------------------------------
# Schema DDL # Schema DDL
# --------------------------------------------------------------------------- # ---------------------------------------------------------------------------
@@ -382,6 +396,18 @@ def is_valid_order_id(order_id: str | None) -> bool:
return bool(order_id and order_id.isdigit()) return bool(order_id and order_id.isdigit())
def normalize_sync_target(sync_target: str | None) -> SyncTarget:
if sync_target is None:
return SYNC_TARGET_ALL
normalized = sync_target.strip().lower()
if not normalized:
return SYNC_TARGET_ALL
if normalized not in VALID_SYNC_TARGETS:
valid_targets = ", ".join(sorted(VALID_SYNC_TARGETS))
raise ValueError(f"Invalid sync target: {sync_target}. Expected one of: {valid_targets}")
return normalized
# --------------------------------------------------------------------------- # ---------------------------------------------------------------------------
# Sync logic # Sync logic
# --------------------------------------------------------------------------- # ---------------------------------------------------------------------------
@@ -465,7 +491,7 @@ class APSSyncer:
return [cast(JsonDict, record) for record in data_list if isinstance(record, dict)] return [cast(JsonDict, record) for record in data_list if isinstance(record, dict)]
return [] return []
def resolve_data_files(self, data_dir: str) -> tuple[Path, Path, Path, Path, Path]: def resolve_data_files(self, data_dir: str, sync_target: SyncTarget = SYNC_TARGET_ALL) -> tuple[Path, Path, Path, Path, Path]:
root = Path(data_dir) root = Path(data_dir)
if not root.exists() or not root.is_dir(): if not root.exists() or not root.is_dir():
raise FileNotFoundError(f"Data directory not found: {root}") raise FileNotFoundError(f"Data directory not found: {root}")
@@ -475,11 +501,33 @@ class APSSyncer:
         order_details_file = root / "orderDetails.json"
         bills_file = root / "bills.json"
         customer_details_file = root / "customerDetails.json"
-        for fp in (customers_file, orders_file, order_details_file, bills_file):
+
+        required_files_by_target = {
+            SYNC_TARGET_ALL: (customers_file, orders_file, order_details_file, bills_file),
+            SYNC_TARGET_CUSTOMER: (customers_file,),
+            SYNC_TARGET_ORDER: (orders_file,),
+            SYNC_TARGET_ORDERDETAILS: (order_details_file,),
+            SYNC_TARGET_BILLS: (bills_file,),
+        }
+
+        for fp in required_files_by_target[sync_target]:
             if not fp.exists():
                 raise FileNotFoundError(f"Required JSON file not found: {fp}")
         return customers_file, orders_file, order_details_file, bills_file, customer_details_file
 
+    def fetch_login_to_account_map(self) -> dict[str, str]:
+        conn = self._require_conn()
+        with conn.cursor() as cur:
+            _ = cur.execute("SELECT login_name, account_id FROM aps_customer")
+            rows = cur.fetchall()
+        login_to_account: dict[str, str] = {}
+        for row in rows:
+            login_name = safe_str(row.get("login_name"), 128)
+            account_id = safe_str(row.get("account_id"), 32)
+            if login_name and account_id:
+                login_to_account[login_name] = account_id
+        return login_to_account
+
     @staticmethod
     def normalize_customer_record(raw: JsonDict) -> JsonDict | None:
         account_id = safe_str(raw.get("accountId"), 32)
@@ -906,20 +954,38 @@ class APSSyncer:
             self.stats["bills"] += 1
 
     # ---- Main sync entry ----
-    def sync_from_json(self, data_dir: str, incremental: bool = False) -> StatsDict:
+    def sync_from_json(self, data_dir: str, incremental: bool = False, sync_target: str = SYNC_TARGET_ALL) -> StatsDict:
         start = datetime.now()
-        customers_file, orders_file, order_details_file, bills_file, customer_details_file = self.resolve_data_files(data_dir)
-        logger.info("Loading source files from %s%s", data_dir, " (增量模式)" if incremental else "")
+        normalized_sync_target = normalize_sync_target(sync_target)
+        customers_file, orders_file, order_details_file, bills_file, customer_details_file = self.resolve_data_files(
+            data_dir,
+            normalized_sync_target,
+        )
+        logger.info(
+            "Loading source files from %s%s%s",
+            data_dir,
+            " (增量模式)" if incremental else "",
+            "" if normalized_sync_target == SYNC_TARGET_ALL else f" (sync_target={normalized_sync_target})",
+        )
+
+        raw_customers: JsonList = []
+        raw_orders: JsonList = []
+        raw_order_details: JsonList = []
+        raw_bills: JsonList = []
+        if normalized_sync_target in {SYNC_TARGET_ALL, SYNC_TARGET_CUSTOMER}:
+            raw_customers = self.load_json_records(customers_file)
+        if normalized_sync_target in {SYNC_TARGET_ALL, SYNC_TARGET_ORDER}:
+            raw_orders = self.load_json_records(orders_file)
+        if normalized_sync_target in {SYNC_TARGET_ALL, SYNC_TARGET_ORDERDETAILS}:
+            raw_order_details = self.load_json_records(order_details_file)
+        if normalized_sync_target in {SYNC_TARGET_ALL, SYNC_TARGET_BILLS}:
+            raw_bills = self.load_json_records(bills_file)
 
-        raw_customers = self.load_json_records(customers_file)
-        raw_orders = self.load_json_records(orders_file)
-        raw_order_details = self.load_json_records(order_details_file)
-        raw_bills = self.load_json_records(bills_file)
         raw_customer_details: JsonList = []
         try:
-            if customer_details_file.exists():
+            if normalized_sync_target in {SYNC_TARGET_ALL, SYNC_TARGET_CUSTOMER} and customer_details_file.exists():
                 raw_customer_details = self.load_json_records(customer_details_file)
-            else:
+            elif normalized_sync_target in {SYNC_TARGET_ALL, SYNC_TARGET_CUSTOMER}:
                 logger.info("Optional file missing, skip customer details: %s", customer_details_file)
         except Exception as e:
             logger.warning("Failed to load optional customer details file %s: %s", customer_details_file, e)
@@ -941,27 +1007,33 @@ class APSSyncer:
 
         customers: JsonList = []
         skipped_customers = 0
-        for raw in raw_customers:
-            c = self.normalize_customer_record(raw)
-            if not c:
-                skipped_customers += 1
-                continue
-            customers.append(c)
-            self.upsert_customer(c)
-            self.insert_snapshot(c, billing_month, captured_at)
+        if normalized_sync_target in {SYNC_TARGET_ALL, SYNC_TARGET_CUSTOMER}:
+            for raw in raw_customers:
+                c = self.normalize_customer_record(raw)
+                if not c:
+                    skipped_customers += 1
+                    continue
+                customers.append(c)
+                self.upsert_customer(c)
+                self.insert_snapshot(c, billing_month, captured_at)
 
         if skipped_customers:
             logger.info("Skipped %d invalid customer rows", skipped_customers)
 
         login_to_account = build_login_to_account_map(customers)
+        if not login_to_account and normalized_sync_target in {SYNC_TARGET_ORDER, SYNC_TARGET_BILLS}:
+            login_to_account = self.fetch_login_to_account_map()
         logger.info("Resolved %d customer login_name -> account_id mappings", len(login_to_account))
 
         if raw_customer_details:
             self.update_customer_details(raw_customer_details, billing_month)
 
-        self.upsert_orders(raw_orders, login_to_account)
-        self.upsert_order_details(raw_order_details)
-        self.sync_bills(raw_bills, login_to_account, incremental=incremental)
+        if normalized_sync_target in {SYNC_TARGET_ALL, SYNC_TARGET_ORDER}:
+            self.upsert_orders(raw_orders, login_to_account)
+        if normalized_sync_target in {SYNC_TARGET_ALL, SYNC_TARGET_ORDERDETAILS}:
+            self.upsert_order_details(raw_order_details)
+        if normalized_sync_target in {SYNC_TARGET_ALL, SYNC_TARGET_BILLS}:
+            self.sync_bills(raw_bills, login_to_account, incremental=incremental)
 
         # Log sync
         duration = (datetime.now() - start).total_seconds()
@@ -1051,10 +1123,17 @@ def main() -> None:
         default=False,
         help="仅查询 aps_bill 中最新的 consumption_time 并输出",
     )
+    _ = parser.add_argument(
+        "--sync-target",
+        choices=sorted(VALID_SYNC_TARGETS),
+        default=SYNC_TARGET_ALL,
+        help="选择同步对象: all/customer/order/orderdetails/bills",
+    )
     args = parser.parse_args()
     data_dir = cast(str, args.dir)
     incremental = cast(bool, args.incremental)
     latest_bill_consumption_time = cast(bool, args.latest_bill_consumption_time)
+    sync_target = cast(str, args.sync_target)
 
     syncer = APSSyncer(db_config=DB_CONFIG)
     if latest_bill_consumption_time:
@@ -1066,7 +1145,7 @@ def main() -> None:
             return
         finally:
             syncer.close()
-    _ = syncer.sync_from_json(data_dir, incremental=incremental)
+    _ = syncer.sync_from_json(data_dir, incremental=incremental, sync_target=sync_target)
 
 
 if __name__ == "__main__":
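
The change above gates both file validation and loading on a normalized `--sync-target` value. A minimal standalone sketch of that pattern is shown here. The target values are taken from the `--sync-target` help text; the helper name `required_files` and the file names `customers.json` and `orders.json` are hypothetical, since the diff does not show them.

```python
from pathlib import Path
from typing import Optional

# Assumed values: they mirror the choices listed in the --sync-target help text.
SYNC_TARGET_ALL = "all"
VALID_SYNC_TARGETS = {"all", "customer", "order", "orderdetails", "bills"}


def normalize_sync_target(sync_target: Optional[str]) -> str:
    """Return a lowercase, validated target; None or blank falls back to 'all'."""
    if sync_target is None:
        return SYNC_TARGET_ALL
    normalized = sync_target.strip().lower()
    if not normalized:
        return SYNC_TARGET_ALL
    if normalized not in VALID_SYNC_TARGETS:
        valid = ", ".join(sorted(VALID_SYNC_TARGETS))
        raise ValueError(f"Invalid sync target: {sync_target}. Expected one of: {valid}")
    return normalized


def required_files(root: Path, sync_target: str) -> tuple:
    """Hypothetical helper: list only the JSON files the chosen target needs."""
    files = {
        "customer": root / "customers.json",  # file name assumed
        "order": root / "orders.json",  # file name assumed
        "orderdetails": root / "orderDetails.json",
        "bills": root / "bills.json",
    }
    if sync_target == SYNC_TARGET_ALL:
        return tuple(files.values())
    return (files[sync_target],)
```

With this shape, `required_files(Path("data"), "bills")` yields only `data/bills.json`, so a bills-only run would not fail just because the other exports are absent.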


@@ -12,33 +12,46 @@ import sys
 import signal
 import argparse
 import logging
+import importlib
 from pathlib import Path
 from datetime import datetime
+from typing import Any, cast
 
-from aps_db_sync import APSSyncer, DB_CONFIG, JSON_DIR
+try:
+    from . import aps_db_sync as aps_db_sync_module
+except ImportError:
+    aps_db_sync_module = importlib.import_module("aps_db_sync")
+
+APSSyncer = aps_db_sync_module.APSSyncer
+db_config_default = cast(dict[str, str | int], aps_db_sync_module.DB_CONFIG)
+json_dir = cast(Path, aps_db_sync_module.JSON_DIR)
+default_sync_target = cast(str, aps_db_sync_module.SYNC_TARGET_ALL)
+valid_sync_targets = cast(set[str], aps_db_sync_module.VALID_SYNC_TARGETS)
 
 LOG_FORMAT = "%(asctime)s [%(levelname)s] %(message)s"
 logging.basicConfig(level=logging.INFO, format=LOG_FORMAT)
 logger = logging.getLogger("aps_scheduler")
 
-WATCH_INTERVAL_SECONDS = 30
-PROCESSED_MARKER_DIR = JSON_DIR / ".aps_sync_processed"
+DEFAULT_WATCH_INTERVAL_SECONDS = 30
+watch_interval_seconds = DEFAULT_WATCH_INTERVAL_SECONDS
+PROCESSED_MARKER_DIR = json_dir / ".aps_sync_processed"
 
 
 def _update_watch_interval(value: int):
-    global WATCH_INTERVAL_SECONDS
-    WATCH_INTERVAL_SECONDS = value
+    global watch_interval_seconds
+    watch_interval_seconds = value
 
 
 class SyncScheduler:
-    def __init__(self, db_config: dict = None):
-        self.db_config = db_config or DB_CONFIG
-        self.running = True
+    def __init__(self, db_config: dict[str, str | int] | None = None, sync_target: str = default_sync_target):
+        self.db_config: dict[str, str | int] = db_config or db_config_default
+        self.sync_target: str = sync_target
+        self.running: bool = True
         PROCESSED_MARKER_DIR.mkdir(exist_ok=True)
-        signal.signal(signal.SIGINT, self._shutdown)
-        signal.signal(signal.SIGTERM, self._shutdown)
+        _ = signal.signal(signal.SIGINT, self._shutdown)
+        _ = signal.signal(signal.SIGTERM, self._shutdown)
 
-    def _shutdown(self, signum, frame):
+    def _shutdown(self, signum: int, frame: object | None):
         logger.info("Shutdown signal received, stopping...")
         self.running = False
@@ -55,18 +68,18 @@ class SyncScheduler:
 
     def _mark_processed(self, json_path: Path):
         marker = self._marker_path(json_path)
-        marker.write_text(datetime.now().isoformat())
+        _ = marker.write_text(datetime.now().isoformat())
 
     def find_unprocessed_files(self) -> list[Path]:
         pattern = "aps_aliyun_customers_with_bills_*.json"
-        all_files = sorted(JSON_DIR.glob(pattern), key=lambda p: p.stat().st_mtime)
+        all_files = sorted(json_dir.glob(pattern), key=lambda p: p.stat().st_mtime)
         return [f for f in all_files if not self._is_processed(f)]
 
     def sync_file(self, json_path: Path) -> bool:
         logger.info("Syncing: %s", json_path.name)
         try:
-            syncer = APSSyncer(db_config=self.db_config)
-            syncer.sync_from_json(str(json_path))
+            syncer = APSSyncer(db_config=cast(Any, self.db_config))
+            _ = syncer.sync_from_json(str(json_path), sync_target=self.sync_target)
             self._mark_processed(json_path)
             return True
         except Exception as e:
@@ -86,38 +99,40 @@ class SyncScheduler:
         return count
 
     def run_watch(self):
-        logger.info("Watching %s for new JSON files (interval=%ds)", JSON_DIR, WATCH_INTERVAL_SECONDS)
-        self.run_once()
+        logger.info("Watching %s for new JSON files (interval=%ds)", json_dir, watch_interval_seconds)
+        _ = self.run_once()
 
         while self.running:
-            time.sleep(WATCH_INTERVAL_SECONDS)
+            time.sleep(watch_interval_seconds)
             unprocessed = self.find_unprocessed_files()
             for f in unprocessed:
                 if not self.running:
                     break
-                self.sync_file(f)
+                _ = self.sync_file(f)
 
         logger.info("Watcher stopped")
 
 
 def main():
     parser = argparse.ArgumentParser(description="APS Sync Scheduler")
-    parser.add_argument("--mode", choices=["watch", "cron", "daemon"], default="watch",
+    _ = parser.add_argument("--mode", choices=["watch", "cron", "daemon"], default="watch",
                         help="watch=file watcher, cron=one-shot, daemon=watch with initial sync")
-    parser.add_argument("--host", default=DB_CONFIG["host"])
-    parser.add_argument("--port", type=int, default=DB_CONFIG["port"])
-    parser.add_argument("--user", default=DB_CONFIG["user"])
-    parser.add_argument("--password", default=DB_CONFIG["password"])
-    parser.add_argument("--database", default=DB_CONFIG["database"])
-    parser.add_argument("--interval", type=int, default=WATCH_INTERVAL_SECONDS,
+    _ = parser.add_argument("--host", default=db_config_default["host"])
+    _ = parser.add_argument("--port", type=int, default=db_config_default["port"])
+    _ = parser.add_argument("--user", default=db_config_default["user"])
+    _ = parser.add_argument("--password", default=db_config_default["password"])
+    _ = parser.add_argument("--database", default=db_config_default["database"])
+    _ = parser.add_argument("--interval", type=int, default=watch_interval_seconds,
                         help="Watch interval in seconds")
+    _ = parser.add_argument("--sync-target", choices=sorted(valid_sync_targets), default=default_sync_target,
+                        help="选择同步对象: all/customer/order/orderdetails/bills")
     args = parser.parse_args()
     _update_watch_interval(args.interval)
-    config = {
+    config: dict[str, str | int] = {
         "host": args.host, "port": args.port, "user": args.user,
         "password": args.password, "database": args.database, "charset": "utf8mb4",
     }
-    scheduler = SyncScheduler(db_config=config)
+    scheduler = SyncScheduler(db_config=config, sync_target=args.sync_target)
 
     if args.mode == "cron":
         count = scheduler.run_once()