28 Commits

Author SHA1 Message Date
00df439bc5 Add cache operation logging and fix log viewer to show latest entries
- supervisor: emit structured events for dirty_count/errored_files
  changes, cache size changes >10%, transfer active/idle transitions,
  cache ≥80%/≥95% warnings, and 60s periodic stats snapshots
- supervisor: add parse_size_bytes() helper; structured BwLimit log
- warmup: add per-file debug/info/warn logging with 100-file milestones
  and rule-complete summary
- web/api: fix /api/logs initial load to return most recent entries
  (tail behaviour) instead of oldest entries
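
The tail behaviour described for /api/logs can be sketched as below. This is a minimal illustration, not the actual handler: `tail_lines` and its return shape are hypothetical names chosen here, and the real endpoint reads the log file with from_line/lines pagination.

```rust
/// Returns the 0-based index of the first returned line together with
/// at most `limit` of the most recent lines in `content`.
/// Hypothetical helper; the real /api/logs handler is not shown in this log.
pub fn tail_lines(content: &str, limit: usize) -> (usize, Vec<&str>) {
    let all: Vec<&str> = content.lines().collect();
    // Start `limit` lines before the end, or at 0 when the file is shorter.
    let start = all.len().saturating_sub(limit);
    (start, all[start..].to_vec())
}
```

Returning the starting index lets a client continue incremental polling from the line after the last one it has seen.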

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-19 14:39:41 +08:00
5efef83a90 Add multi_thread_streams/cutoff support and Samba performance tuning
- Add multi_thread_streams (default 4) and multi_thread_cutoff (default "50M")
  fields to ReadConfig, wired into rclone mount args
- Expose both fields in Web UI config editor under Read Tuning section
- Add Samba performance options: TCP_NODELAY, large readwrite, max xmit
- Update config.toml.default with new fields and sftp_connections guidance

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-19 14:15:23 +08:00
078ab4505e Fix portal freeze: release write lock before HTTP IO in update_status
update_status() previously acquired the shared_status write lock on the
first line and then called rc::vfs_stats() and rc::core_stats() (blocking
ureq HTTP) for every share while holding it. With dir-refresh flooding
the rclone RC port, these calls could take seconds, starving all web
handler reads and making the portal completely unresponsive.

Refactor to a two-phase approach: Phase 1 collects all RC stats with no
lock held; Phase 2 applies the results under a short-lived write lock
(pure memory writes, microseconds). Lock hold time drops from seconds to
microseconds regardless of rclone response latency.
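
The two-phase approach can be sketched roughly as follows. This is an illustration under assumptions: `ShareStats` and the `fetch` closure are hypothetical stand-ins for the real DaemonStatus fields and the rc::vfs_stats()/rc::core_stats() calls, which are not shown in this log.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

/// Hypothetical per-share stats; stands in for the real status fields.
#[derive(Debug, Clone, PartialEq)]
pub struct ShareStats {
    pub cache_bytes: u64,
    pub transfers: u32,
}

/// `fetch` stands in for the blocking RC HTTP calls made per share.
pub fn update_status<F>(
    shared: &RwLock<HashMap<String, ShareStats>>,
    shares: &[String],
    mut fetch: F,
) where
    F: FnMut(&str) -> Option<ShareStats>,
{
    // Phase 1: slow I/O with no lock held, so readers are never starved.
    let collected: Vec<(String, ShareStats)> = shares
        .iter()
        .filter_map(|name| fetch(name.as_str()).map(|s| (name.clone(), s)))
        .collect();

    // Phase 2: short-lived write lock, pure memory writes only.
    let mut guard = shared.write().unwrap();
    for (name, stats) in collected {
        guard.insert(name, stats);
    }
}
```

The key property is that nothing that can block on the network runs between acquiring and releasing the write guard.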

Also included in this batch:
- vfs_refresh now reads the response body and surfaces partial failures
- dir-refresh iterates top-level FUSE subdirs instead of refreshing "/"
  (rclone does not accept "/" as a valid vfs/refresh target)
- Track per-share dirs_ok / dirs_failed counts in DaemonStatus and
  expose them through the web API

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-19 13:37:21 +08:00
64d6171ec9 Unify logging to tracing: file appender + unified log viewer
Replace scattered println!/eprintln! with structured tracing macros throughout
supervisor, scheduler, and web modules. Add LogConfig (file + level) to Config
and a new logging module that initialises a stderr + optional non-blocking file
appender on `warpgate run`. Remove the in-memory LogBuffer/LogEntry from
AppState; the web /api/logs endpoint now reads the log file directly with
from_line/lines pagination. `warpgate log` replaces journalctl with `tail`,
and the Logs tab Alpine.js is updated to match the new API response shape.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-19 11:24:06 +08:00
74b0e72549 Add periodic dir-refresh and per-share refresh status display
Introduces a ScheduledTask mechanism that periodically calls rclone RC
vfs/refresh to keep directory listing caches warm (no file downloads),
with two-level config (global default + per-share override). Adds
dir-refresh status badges and timestamps to the web UI shares tab and
CLI status output, following the same pattern as warmup/warmed.

- src/scheduler.rs: New generic ScheduledTask runner with generation-based
  cancellation and parse_interval() helper
- src/rclone/rc.rs: Add vfs_refresh() RC API call
- src/config.rs: Add DirRefreshConfig, per-share dir_refresh_interval
  override, effective_dir_refresh_interval() resolution method
- src/config_diff.rs: Track dir_refresh_changed for hot-reload
- src/daemon.rs: Track per-share last_dir_refresh timestamps (HashMap),
  add dir_refresh_ago_for() helper and format_ago()
- src/supervisor.rs: spawn_dir_refresh() per-share background threads,
  called on startup and config reload
- src/web/api.rs: Expose dir_refresh_active + last_dir_refresh_ago in
  ShareStatusResponse
- src/web/pages.rs: Populate dir_refresh_active + last_dir_refresh_ago
  in ShareView and ShareDetailView
- templates/web/tabs/shares.html: DIR-REFRESH badge (yellow=pending,
  green=N ago) in health column; Dir Refresh row in detail panel
- templates/web/tabs/config.html: Dir Refresh section and per-share
  interval field in interactive config editor
- src/cli/status.rs: Append Dir-Refresh suffix to mount status lines
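
A minimal sketch of the two scheduler pieces named above, generation-based cancellation and parse_interval(). The interval syntax ("30s", "5m", "2h") and the `Generation`/`Token` types are assumptions made for illustration; the actual src/scheduler.rs is not shown in this log.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::time::Duration;

/// Parses "30s", "5m" or "2h" into a Duration (assumed syntax).
pub fn parse_interval(s: &str) -> Option<Duration> {
    let s = s.trim();
    // Split off the last character as the unit, respecting char boundaries.
    let (idx, _) = s.char_indices().last()?;
    let (num, unit) = s.split_at(idx);
    let n: u64 = num.parse().ok()?;
    let secs = match unit {
        "s" => n,
        "m" => n.checked_mul(60)?,
        "h" => n.checked_mul(3600)?,
        _ => return None,
    };
    Some(Duration::from_secs(secs))
}

/// Shared generation counter; bumping it invalidates every older token.
pub struct Generation(Arc<AtomicU64>);

/// Held by a background task, which exits once `is_current()` is false.
pub struct Token {
    id: u64,
    current: Arc<AtomicU64>,
}

impl Generation {
    pub fn new() -> Self {
        Generation(Arc::new(AtomicU64::new(0)))
    }

    /// Starts a new generation and returns the token for it.
    pub fn bump(&self) -> Token {
        let id = self.0.fetch_add(1, Ordering::SeqCst) + 1;
        Token { id, current: Arc::clone(&self.0) }
    }
}

impl Token {
    pub fn is_current(&self) -> bool {
        self.current.load(Ordering::SeqCst) == self.id
    }
}
```

On config reload the supervisor would bump the generation and spawn fresh tasks; tasks from the previous generation notice on their next wake-up and stop without needing a join or a kill.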

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-19 10:54:08 +08:00
15f915fbee Show active transfer count, add SFTP retry resilience, and fix config tab refresh
- Use rclone transferring array to show only active transfers instead of
  cumulative count; zero out speed when no transfers are active
- Add SFTP retry/timeout flags to rclone mount for flaky Tailscale tunnels
- Skip auto-refresh on config tab to prevent editor resets

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 00:21:22 +08:00
16d11aa4ef Fix stale warmup status persisting after rules are removed from config
spawn_warmup() returned early when rules were empty without clearing
status.warmup, leaving orphaned entries visible in web UI and CLI.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 00:01:34 +08:00
e8f1971d63 Add auto-refresh toggle for web UI tabs with localStorage persistence
Periodic client-side refresh for Shares/Config tabs using Alpine.js
setInterval, with toggle and configurable interval (2-30s) in header.
Dashboard (SSE) and Logs (own polling) are excluded. Shares tab
preserves row expansion state across refreshes via ?expand= param.
Adds [x-cloak] CSS rule and conditional x-cloak on detail rows to
prevent flash during content swaps.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-18 19:31:54 +08:00
2432f83914 Re-trigger warmup on config reload and add per-share warmup status tracking
Warmup config changes via the web UI now actually run warmup without requiring
a daemon restart. Adds generation-based warmup tracking with progress reporting
across CLI status, JSON API, SSE live updates, and web UI badges/detail panels.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-18 19:13:04 +08:00
6bb7ec4d27 Web UI overhaul: interactive config editor, SSE live updates, log viewer, and SMB reload fixes
- Replace raw TOML textarea with Alpine.js interactive form editor (10 collapsible
  sections with change-tier badges, dynamic array management for connections/shares/
  warmup rules, proper input controls per field type)
- Add SSE-based live dashboard updates replacing htmx polling
- Add log viewer tab with ring buffer backend and incremental polling
- Fix SMB not seeing new shares after config reload: kill entire smbd process group
  (not just parent PID) so forked workers release port 445
- Add SIGHUP-based smbd config reload for share changes instead of full restart,
  preserving existing client connections
- Generate human-readable commented TOML from config editor instead of bare
  toml::to_string_pretty() output
- Fix Alpine.js 2.x __x.$data calls in dashboard/share templates (now Alpine 3.x)
- Fix toggle switch CSS overlap with field labels
- Fix dashboard going blank on tab switch (remove hx-swap-oob from tab content)
- Add htmx:afterSettle → Alpine.initTree() bridge for robust tab switching

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-18 18:06:52 +08:00
466ea5cfa8 Add pre-mount remote path probe and per-share health status
Before mounting, probe each share's remote path with `rclone lsf`
(10s timeout, parallel execution). Failed shares are skipped — they
never get mounted or exposed to SMB/NFS/WebDAV — preventing the
silent hang that occurred when rclone mounted a nonexistent directory.

- ShareHealth enum: Pending → Probing → Healthy / Failed(reason)
- Supervisor: probe phase between preflight and mount, protocol
  configs generated after probe with only healthy shares
- Web UI: health-aware badges (OK/FAILED/PROBING/PENDING) with
  error messages on dashboard, status partial, and share detail
- JSON API: health + health_message fields on /api/status
- CLI: `warpgate status` queries daemon API first for tri-state
  display (OK/FAILED/DOWN), falls back to direct mount checks
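
The health states and badges listed above might look like this in code. The variant names and badge strings come from the commit message; `from_probe`, `badge` and `mountable` are hypothetical helpers added for illustration.

```rust
/// Pending -> Probing -> Healthy / Failed(reason)
#[derive(Debug, Clone, PartialEq)]
pub enum ShareHealth {
    Pending,
    Probing,
    Healthy,
    Failed(String),
}

impl ShareHealth {
    /// Maps a finished probe result to its terminal state (hypothetical helper).
    pub fn from_probe(result: Result<(), String>) -> ShareHealth {
        match result {
            Ok(()) => ShareHealth::Healthy,
            Err(reason) => ShareHealth::Failed(reason),
        }
    }

    /// Badge text as shown in the web UI.
    pub fn badge(&self) -> &'static str {
        match self {
            ShareHealth::Pending => "PENDING",
            ShareHealth::Probing => "PROBING",
            ShareHealth::Healthy => "OK",
            ShareHealth::Failed(_) => "FAILED",
        }
    }
}

/// Names of shares that may be mounted and exported: healthy ones only.
pub fn mountable(shares: &[(String, ShareHealth)]) -> Vec<&str> {
    shares
        .iter()
        .filter(|(_, health)| matches!(health, ShareHealth::Healthy))
        .map(|(name, _)| name.as_str())
        .collect()
}
```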

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-18 15:28:56 +08:00
ba1cae7f75 Add daemon web UI, JSON API, and config hot-reload engine
- New: axum web server on port 8090 with htmx dashboard
- New: JSON API endpoints (/api/status, /api/config, /api/bwlimit)
- New: config diff engine with 4-tier change classification
- New: tiered config hot-reload (live/protocol/per-share/global)
- Refactor: supervisor loop uses mpsc command channel (recv_timeout)
- Refactor: supervisor updates shared DaemonStatus every poll cycle
- Dependencies: tokio, axum, askama, tower-http

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-18 14:18:20 +08:00
08f8fc4667 Per-share independent mounts: each share gets its own rclone process
Replace the hierarchical single-mount design with independent mounts:
each [[shares]] entry is a (name, remote_path, mount_point) triplet
with its own rclone FUSE mount process and dedicated RC API port
(5572 + index). Remove top-level connection.remote_path and [mount]
section. Auto-warmup now runs in a background thread to avoid blocking
the supervision loop.
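
The per-share RC port assignment (5572 + index) is simple enough to sketch directly. `rc_port` and `RC_BASE_PORT` are hypothetical names, and the overflow handling is an assumption about what a careful implementation would do.

```rust
/// Base RC API port; share N listens on RC_BASE_PORT + N.
pub const RC_BASE_PORT: u16 = 5572;

/// Returns the RC port for the share at `index`, or None if it would
/// exceed the valid port range.
pub fn rc_port(index: usize) -> Option<u16> {
    if index > (u16::MAX - RC_BASE_PORT) as usize {
        None
    } else {
        Some(RC_BASE_PORT + index as u16)
    }
}
```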

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-18 12:32:18 +08:00
46e592c3a4 Flatten project structure: move warpgate/ contents to repo root
Single-crate project doesn't need a subdirectory. Moves Cargo.toml,
src/, templates/ to root for standard Rust project layout. Updates
.gitignore and test harness binary paths accordingly.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-18 11:25:15 +08:00
a2d49137f9 Add comprehensive test suite: 63 integration tests + 110 Rust unit tests
Integration tests (tests/):
- 9 categories covering config, lifecycle, signals, supervision,
  cache, writeback, network faults, crash recovery, and CLI
- Shell-based harness with mock NAS (network namespace + SFTP),
  fault injection (tc netem), and power loss simulation
- TAP format runner (run-all.sh) with proper SKIP detection

Rust unit tests (warpgate/src/):
- 110 tests across 14 modules, all passing in 0.01s
- Config parsing, defaults validation, RestartTracker logic,
  RC API response parsing, rclone arg generation, service
  config generation, CLI output formatting, warmup path logic

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-18 11:21:35 +08:00
e6c48c9bd9 Harden supervisor shutdown: process group isolation, write-back drain
- Spawn all children (rclone, smbd, webdav) in isolated process groups
  so Ctrl+C doesn't reach them directly — supervisor controls shutdown order
- Wait for rclone VFS write-back queue to drain before unmounting (5min cap)
- Prefer fusermount3 over fusermount, skip if already unmounted

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-18 09:56:09 +08:00
960ddd20ce Add incremental warmup with cache check and auto-warmup on startup
Warmup now checks the rclone VFS cache directory before reading each file
through the FUSE mount, skipping already-cached files for fast re-runs.
Also adds WarmupConfig with configurable rules that auto-execute when
the supervisor starts (best-effort, non-blocking).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-18 09:39:58 +08:00
9b37c88cd5 Fix warmup to use VFS cache, dynamic SMB share name, smbd long flags
- warmup: read files through the FUSE mount instead of running rclone copy
  to a temp dir, so files now actually land in the rclone VFS cache on the SSD.
- samba: derive share name from mount point dir name instead of
  hardcoded [nas-photos] (e.g. /mnt/projects → [projects])
- supervisor: use smbd long flags (--foreground, --debug-stdout,
  --no-process-group, --configfile) for compatibility with Samba 4.19

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-18 00:38:42 +08:00
5d8bf52ae9 Add warpgate MVP implementation with hardened supervisor
Full Rust implementation of the warpgate NAS cache proxy:

- CLI: clap-based with subcommands (run, setup, status, cache, warmup,
  bwlimit, speed-test, config-init, log)
- Config: TOML-based with env var override, preset templates
- rclone: VFS mount args builder, config generator, RC API client
- Services: Samba config gen, NFS exports, WebDAV serve args, systemd units
- Deploy: dependency checker, filesystem validation, one-click setup
- Supervisor: single-process tree managing rclone mount + smbd + WebDAV
  as child processes — no systemd dependency for protocol management

Supervisor hardening:
- ProtocolChildren Drop impl prevents orphaned processes on startup failure
- Early rclone exit detection in mount wait loop (fail fast, not 30s timeout)
- Graceful SIGTERM → 3s grace → SIGKILL (prevents orphaned smbd workers)
- RestartTracker with 5-min stability reset and linear backoff (2s/4s/6s)
- Shutdown signal checked during mount wait and before protocol start
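
A rough sketch of the RestartTracker behaviour described above: linear backoff of 2s/4s/6s and a reset after 5 minutes without a crash. The field names, the `on_crash` method and the cap of three restarts are assumptions inferred from the backoff sequence, not the actual implementation.

```rust
use std::time::{Duration, Instant};

pub struct RestartTracker {
    count: u32,
    last_restart: Option<Instant>,
}

impl RestartTracker {
    /// Assumed cap, inferred from the 2s/4s/6s sequence.
    pub const MAX_RESTARTS: u32 = 3;
    /// A child that survives this long is considered stable again.
    pub const STABLE_AFTER: Duration = Duration::from_secs(300);

    pub fn new() -> Self {
        RestartTracker { count: 0, last_restart: None }
    }

    /// Records a crash seen at `now`. Returns the delay to wait before
    /// restarting, or None once the restart budget is exhausted.
    pub fn on_crash(&mut self, now: Instant) -> Option<Duration> {
        if let Some(last) = self.last_restart {
            if now.duration_since(last) >= Self::STABLE_AFTER {
                self.count = 0; // stability reset
            }
        }
        if self.count >= Self::MAX_RESTARTS {
            return None;
        }
        self.count += 1;
        self.last_restart = Some(now);
        // Linear backoff: 2s, 4s, 6s.
        Some(Duration::from_secs(2 * u64::from(self.count)))
    }
}
```

Passing `now` in explicitly keeps the tracker deterministic under test, which fits the unit-test-heavy style of the later commits.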

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-17 23:29:17 +08:00
8f00f86eb4 PRD v4: revert to rclone VFS read-write proxy architecture
Drop the read-only cache + SD Uploader design in favor of rclone VFS
native read-write caching. Key changes:
- SMB shares are now read-write; writes land on the SSD and are written back to the NAS asynchronously
- Remove SD card import/upload, metadata DB, self-built polling
- Simplify remote change detection to rclone --dir-cache-time
- Add dirty file management, write-back config, and related risks

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-17 18:56:12 +08:00
3caddc6370 Simplify architecture: read-only cache + one-way SD upload
Replace OverlayFS + sync daemon with two independent subsystems:
- Read-only cache: rclone --read-only + Samba read only = yes
- SD Uploader: staging → SFTP direct upload to NAS (temp file + rename)

Remove: OverlayFS, sync daemon, three-timestamp model, write-back,
conflict detection, dirty file tracking. Net -299 lines.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-17 14:35:43 +08:00
d40997312b Redesign conflict UX: in-place copies like Dropbox/iCloud (4.16)
Instead of moving conflict files to a separate conflict/ directory,
keep them in the original directory with naming convention:
  {name} (Warpgate Conflict {YYYY-MM-DD HH-mm}).{ext}
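
The naming convention can be illustrated with a small helper. This is a sketch only: `conflict_name` is a hypothetical function, the timestamp is passed in already formatted as YYYY-MM-DD HH-mm, and the handling of files without an extension (or dotfiles) is an assumption the PRD text does not spell out.

```rust
/// Builds "{name} (Warpgate Conflict {stamp}).{ext}" from a file name.
/// Files with no extension, and dotfiles such as ".hidden", get the
/// suffix appended to the whole name (assumed behaviour).
pub fn conflict_name(file_name: &str, stamp: &str) -> String {
    match file_name.rsplit_once('.') {
        Some((stem, ext)) if !stem.is_empty() => {
            format!("{} (Warpgate Conflict {}).{}", stem, stamp, ext)
        }
        _ => format!("{} (Warpgate Conflict {})", file_name, stamp),
    }
}
```

For example, `conflict_name("IMG_0001.CR3", "2026-02-16 21-44")` yields `IMG_0001 (Warpgate Conflict 2026-02-16 21-44).CR3`, which keeps the extension intact for applications that key on it.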

Benefits:
- Lightroom/Finder see both versions side by side
- Preserved extension ensures app compatibility
- Matches Dropbox/iCloud behavior users already know
- Conflict copies auto-sync to NAS via rclone (backed up)

Remote-deleted + local-dirty: file stays in place (no rename),
marked as orphan-conflict, user decides whether to re-upload.

Updated: decision matrix diagrams, scenario walkthroughs,
cache_files lifecycle, CLI commands, config section, directory
structure description.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-16 21:44:51 +08:00
823d20606a Add adaptive throttle for write-back bandwidth (4.14)
Throughput-based congestion detection: when sustained throughput
drops >30% over sliding window with rising RTT, auto-reduce
write-back speed to 50% of current throughput, then probe back up
at +10% every 2 minutes.

- Throttle state visible via `warpgate status`
- User can disable with BW_ADAPTIVE=no
- Only affects write-back uploads, not read fetches
- New config: BW_ADAPTIVE, BW_ADAPTIVE_WINDOW, BW_ADAPTIVE_PROBE_INTERVAL
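
The throttle rules above reduce to three small calculations, sketched here under assumptions: throughput is measured in bytes per second, "rising RTT" is already decided by the caller, and the +10% probe step is taken relative to the current limit (the PRD text does not say what the 10% is relative to). All function names are hypothetical.

```rust
/// Congestion: throughput dropped by more than 30% against the
/// sliding-window baseline while RTT is rising.
pub fn congested(baseline_bps: u64, current_bps: u64, rtt_rising: bool) -> bool {
    // current < 70% of baseline, computed in u128 to avoid overflow.
    rtt_rising && (current_bps as u128) * 10 < (baseline_bps as u128) * 7
}

/// On congestion, cap write-back at 50% of the current throughput.
pub fn throttled_limit(current_bps: u64) -> u64 {
    current_bps / 2
}

/// Every probe interval, raise the limit by 10% (assumed: of the current
/// limit), never exceeding the configured ceiling.
pub fn probe_up(limit_bps: u64, ceiling_bps: u64) -> u64 {
    limit_bps.saturating_add(limit_bps / 10).min(ceiling_bps)
}
```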

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-16 21:42:00 +08:00
7fd1934be5 Clarify metadata.db must be on local filesystem, not FUSE mount
SQLite WAL depends on POSIX file locks and shared memory (-shm),
which FUSE/network filesystems cannot support correctly.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-16 21:38:08 +08:00
aaf947859f Convert all ASCII art diagrams to Mermaid
Replace 20 ASCII box-drawing diagrams with Mermaid equivalents:
- System architecture → flowchart with subgraphs
- Multi-protocol cache → flowchart LR
- Write-back decision matrix → flowchart with branches
- SD card import decision tree → flowchart
- Read cache validation → markdown table (cleaner than ASCII grid)
- 5 scenario walkthroughs → flowcharts with timeline context
- 4-table ER diagram → erDiagram
- Deletion detection flow → flowchart
- Write-back dual-pipeline → flowchart with subgraphs
- Import state machine → stateDiagram-v2
- Tiered polling strategy → flowchart
- NAS agent push → flowchart LR
- Read/write flows → flowcharts
- Cache eviction → flowchart
- Headscale infrastructure → flowchart BT
- Cloud backup → flowchart with subgraph
- TM write-back strategy → flowchart LR

Kept directory tree structure as plain text (standard convention).
Cache protection measures converted to structured markdown list.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-16 21:28:03 +08:00
ddcfb87b36 Add captive portal setup AP as P1 feature (4.11)
Hotel/airport WiFi requires web-based captive portal authentication,
which is impossible on a headless device without this feature.

- New P1 feature 4.11: Setup AP + Captive Portal proxy
  - Box auto-enters setup mode when no network is available
  - Phone connects to temporary AP, completes portal auth via proxy
  - Requires WiFi AP+STA concurrent mode
- Fallback options: USB tethering, mobile hotspot, ethernet, MAC clone
- New CLI commands: warpgate setup-wifi, warpgate clone-mac
- New config section for setup AP parameters
- Updated hardware requirements: WiFi module must support AP+STA
- Updated roadmap v1.5 to include setup AP
- Added risk entry and glossary terms
- Renumbered 4.12-4.23 accordingly

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-16 18:20:42 +08:00
aa2db2bf5f Rename product to Warpgate — Make your NAS feel local
- Rename NAS Cache Proxy → Warpgate throughout PRD
- Update CLI commands: nas-cache → warpgate
- Update paths: /mnt/ssd/nas-cache → /mnt/ssd/warpgate
- Rename file: nas-cache-proxy-prd-v3.md → warpgate-prd-v3.md

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-16 18:05:23 +08:00
c3b458bced Add NAS Cache Proxy PRD v3
Complete product requirements document covering:
- Transparent SMB/NFS/WebDAV cache proxy with rclone VFS
- SD card ingest + auto archive pipeline for photographers
- Three-timestamp consistency model with write-back controller
- Time Machine backup target with independent sparsebundle sync
- Layered SFTP polling for remote change detection
- Cache space protection and import state machine
- Paid services: Headscale + DERP relay, cloud disaster backup
- Hardware appliance roadmap (v1.0 MVP → v3.0 hardware product)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-16 18:00:44 +08:00