Architecture
============

``automation_file`` follows a layered architecture built around five design
patterns:

System overview
---------------

The diagram below shows the full dispatch surface: every caller — CLI, GUI,
HTTP/MCP clients, entry-point plugins — eventually lands in the shared
``ActionRegistry`` that ``build_default_registry()`` populates, and the
registry fans out to local ops, remote backends, reliability / security /
observability helpers, notifications, and the event-driven trigger + cron
dispatchers.

.. mermaid::

   flowchart TD
       CLI["CLI / JSON batch<br/>python -m automation_file"]
       GUIUser["PySide6 GUI<br/>launch_ui"]
       ClientSDK["HTTPActionClient SDK"]
       MCPHost["MCP hosts<br/>Claude Desktop · MCP CLIs"]
       Plugins["Entry-point plugins<br/>automation_file.actions"]

       subgraph Facade["automation_file — facade (__init__.py)"]
           PublicAPI["Public API<br/>execute_action · execute_action_parallel · execute_action_dag<br/>validate_action · driver_instance · s3_instance · azure_blob_instance<br/>dropbox_instance · sftp_instance · ftp_instance · onedrive_instance · box_instance<br/>start_autocontrol_socket_server · start_http_action_server<br/>start_metrics_server · start_web_ui · MCPServer<br/>notification_manager · scheduler · trigger_manager<br/>AutomationConfig · progress_registry · Quota · retry_on_transient"]
       end

       subgraph Core["core"]
           Registry[("ActionRegistry<br/>FA_* commands")]
           Executor["ActionExecutor<br/>serial · parallel · dry-run · validate-first"]
           DAG["dag_executor<br/>topological fan-out"]
           Callback["CallbackExecutor"]
           Loader["PackageLoader<br/>+ entry-point plugins"]
           Queue["ActionQueue"]
           Json["json_store"]
           Sub["substitution<br/>${env:} ${date:} ${uuid}"]
       end

       subgraph Reliability["reliability"]
           Retry["retry<br/>@retry_on_transient"]
           QuotaMod["Quota<br/>bytes + time budget"]
           Breaker["CircuitBreaker"]
           RL["RateLimiter"]
           Locks["FileLock · SQLiteLock"]
       end

       subgraph Observability["observability"]
           Progress["progress<br/>CancellationToken · Reporter"]
           Metrics["metrics<br/>Prometheus counters + histograms"]
           Audit["AuditLog<br/>SQLite"]
           Tracing["tracing<br/>OpenTelemetry spans"]
           FIM["IntegrityMonitor"]
       end

       subgraph Security["security & config"]
           Secrets["Secret providers<br/>Env · File · Chained"]
           Config["AutomationConfig<br/>TOML loader"]
           ConfW["ConfigWatcher<br/>hot reload"]
           Crypto["crypto<br/>AES-256-GCM"]
           Check["checksum / manifest"]
           SafeP["safe_paths<br/>safe_join · is_within"]
           ACL["ActionACL"]
       end

       subgraph Events["event-driven"]
           Trigger["TriggerManager<br/>watchdog file watcher"]
           Sched["Scheduler<br/>5-field cron + overlap guard"]
       end

       subgraph Servers["servers"]
           TCP["TCPActionServer<br/>loopback · AUTH secret"]
           HTTPS["HTTPActionServer<br/>POST /actions · Bearer<br/>/healthz /readyz /progress /openapi.json"]
           MCP["MCPServer<br/>JSON-RPC 2.0 (stdio)"]
           MetSrv["MetricsServer<br/>/metrics"]
           WebUI["WebUIServer<br/>HTMX dashboard"]
       end

       subgraph UI["ui (PySide6)"]
           MainWin["MainWindow<br/>Home · Local · HTTP · Drive · S3 · Azure · Dropbox<br/>SFTP · OneDrive · Box · JSON · Triggers · Scheduler<br/>Progress · Transfer · Servers"]
           Worker["ActionWorker<br/>QRunnable on QThreadPool"]
       end

       subgraph Local["local ops"]
           FileOps["file_ops · dir_ops"]
           Archives["zip_ops · tar_ops · archive_ops"]
           DataOps["data_ops<br/>csv · jsonl · parquet · yaml"]
           TextOps["text_ops · diff_ops<br/>json_edit · templates"]
           Misc["shell_ops · sync_ops · trash<br/>versioning · conditional · mime"]
       end

       subgraph Remote["remote backends"]
           UrlVal["url_validator<br/>SSRF guard"]
           Http["http_download<br/>retry · resume · SHA-256"]
           Drive["google_drive"]
           S3M["s3"]
           Azure["azure_blob"]
           Dropbox["dropbox_api"]
           SFTP["sftp (RejectPolicy)"]
           FTP["ftp / FTPS"]
           OneD["onedrive"]
           Box["box"]
           WebDAV["webdav"]
           SMB["smb / cifs"]
           Fsspec["fsspec_bridge"]
           Cross["cross_backend<br/>local:// s3:// drive:// azure://<br/>dropbox:// sftp:// ftp://"]
       end

       subgraph Notify["notifications"]
           NM["NotificationManager<br/>fanout · dedup · SSRF guard"]
           Sinks["Sinks<br/>Webhook · Slack · Email<br/>Telegram · Discord · Teams · PagerDuty"]
       end

       subgraph Utils["utils / project"]
           Fast["fast_find<br/>mdfind / locate / es.exe"]
           Dedup["find_duplicates"]
           Grep["grep_files"]
           Rotate["rotate_backups"]
           Discovery["file_discovery"]
           Builder["ProjectBuilder + templates"]
       end

       CLI ==> PublicAPI
       GUIUser ==> MainWin
       ClientSDK ==> HTTPS
       MCPHost ==> MCP
       Plugins ==> Loader
       MainWin ==> Worker
       Worker ==> PublicAPI

       PublicAPI ==> Executor
       PublicAPI ==> DAG
       PublicAPI ==> Callback
       PublicAPI ==> Queue
       PublicAPI ==> Config
       PublicAPI ==> NM
       PublicAPI ==> Trigger
       PublicAPI ==> Sched

       TCP ==> Executor
       HTTPS ==> Executor
       MCP ==> Registry
       MetSrv ==> Metrics
       WebUI ==> Registry
       ACL ==> TCP
       ACL ==> HTTPS

       Executor ==> Registry
       Executor ==> Sub
       Executor ==> Retry
       Executor ==> QuotaMod
       Executor ==> Metrics
       Executor ==> Audit
       Executor ==> Tracing
       Executor ==> Json
       DAG ==> Executor
       Callback ==> Registry
       Loader ==> Registry

       Trigger ==> Executor
       Sched ==> Executor
       Trigger -. on failure .-> NM
       Sched -. on failure .-> NM
       FIM -. on drift .-> NM

       ConfW ==> Config
       Config ==> Secrets
       Config ==> NM

       Registry ==> FileOps
       Registry ==> Archives
       Registry ==> DataOps
       Registry ==> TextOps
       Registry ==> Misc
       Registry ==> Http
       Registry ==> Drive
       Registry ==> S3M
       Registry ==> Azure
       Registry ==> Dropbox
       Registry ==> SFTP
       Registry ==> FTP
       Registry ==> OneD
       Registry ==> Box
       Registry ==> WebDAV
       Registry ==> SMB
       Registry ==> Fsspec
       Registry ==> Cross
       Registry ==> Crypto
       Registry ==> Check
       Registry ==> Fast
       Registry ==> Dedup
       Registry ==> Grep
       Registry ==> Rotate
       Registry ==> Discovery
       Registry ==> Builder
       Registry ==> Progress

       FileOps ==> SafeP
       Archives ==> SafeP
       Misc ==> SafeP
       Http ==> UrlVal
       Http ==> Retry
       Http ==> Progress
       Http ==> Check
       S3M ==> Progress
       WebDAV ==> UrlVal
       NM ==> UrlVal
       NM ==> Sinks

       Cross ==> Drive
       Cross ==> S3M
       Cross ==> Azure
       Cross ==> Dropbox
       Cross ==> SFTP
       Cross ==> FTP

       classDef entry fill:#FDEDEC,stroke:#641E16,stroke-width:3px,color:#000,font-weight:bold;
       classDef facade fill:#D6EAF8,stroke:#154360,stroke-width:4px,color:#000,font-weight:bold;
       classDef core fill:#FEF9E7,stroke:#1F3A93,stroke-width:3px,color:#000,font-weight:bold;
       classDef rel fill:#D1F2EB,stroke:#0B5345,stroke-width:3px,color:#000,font-weight:bold;
       classDef obs fill:#FDEBD0,stroke:#9C640C,stroke-width:3px,color:#000,font-weight:bold;
       classDef sec fill:#F5B7B1,stroke:#78281F,stroke-width:3px,color:#000,font-weight:bold;
       classDef event fill:#FCF3CF,stroke:#7D6608,stroke-width:3px,color:#000,font-weight:bold;
       classDef server fill:#FADBD8,stroke:#922B21,stroke-width:3px,color:#000,font-weight:bold;
       classDef ui fill:#AED6F1,stroke:#1B4F72,stroke-width:3px,color:#000,font-weight:bold;
       classDef localOps fill:#E8DAEF,stroke:#512E5F,stroke-width:3px,color:#000,font-weight:bold;
       classDef remote fill:#D5F5E3,stroke:#196F3D,stroke-width:3px,color:#000,font-weight:bold;
       classDef notify fill:#F9E79F,stroke:#7D6608,stroke-width:3px,color:#000,font-weight:bold;
       classDef utils fill:#EAEDED,stroke:#212F3C,stroke-width:3px,color:#000,font-weight:bold;

       class CLI,GUIUser,ClientSDK,MCPHost,Plugins entry;
       class PublicAPI facade;
       class Registry,Executor,DAG,Callback,Loader,Queue,Json,Sub core;
       class Retry,QuotaMod,Breaker,RL,Locks rel;
       class Progress,Metrics,Audit,Tracing,FIM obs;
       class Secrets,Config,ConfW,Crypto,Check,SafeP,ACL sec;
       class Trigger,Sched event;
       class TCP,HTTPS,MCP,MetSrv,WebUI server;
       class MainWin,Worker ui;
       class FileOps,Archives,DataOps,TextOps,Misc localOps;
       class UrlVal,Http,Drive,S3M,Azure,Dropbox,SFTP,FTP,OneD,Box,WebDAV,SMB,Fsspec,Cross remote;
       class NM,Sinks notify;
       class Fast,Dedup,Grep,Rotate,Discovery,Builder utils;

       linkStyle default stroke:#1F2A44,stroke-width:2.5px;

Design patterns
---------------

**Facade**
   :mod:`automation_file` (the top-level ``__init__``) is the only name users
   should need to import. Every public function and singleton is re-exported
   from there.

**Registry + Command**
   :class:`~automation_file.core.action_registry.ActionRegistry` maps an action
   name (a string that appears in a JSON action list) to a Python callable.
   An action is a Command object of shape ``[name]``, ``[name, {kwargs}]``, or
   ``[name, [args]]``.

**Template Method**
   :class:`~automation_file.core.action_executor.ActionExecutor` defines the
   single-action lifecycle: resolve the name, dispatch the call, capture the
   return value or exception. The outer iteration template guarantees that one
   bad action never aborts the batch. The one exception is
   ``validate_first=True``, which rejects a batch containing an unknown action
   name before anything runs.

**Strategy**
   Each ``local/*_ops.py``, ``remote/*_ops.py``, and cloud subpackage is a
   collection of independent strategy functions. Every backend — local, HTTP,
   Google Drive, S3, Azure Blob, Dropbox, SFTP — is auto-registered by
   :func:`automation_file.core.action_registry.build_default_registry`. The
   ``register_*_ops(registry)`` helpers stay exported for callers that
   assemble custom registries.

**Singleton (module-level)**
   ``executor``, ``callback_executor``, ``package_manager``,
   ``driver_instance``, ``s3_instance``, ``azure_blob_instance``,
   ``dropbox_instance``, and ``sftp_instance`` are shared instances wired in
   ``__init__`` so plugins pick up the same state as the CLI.

Module layout
-------------

.. code-block:: text

   automation_file/
   ├── __init__.py           # Facade — every public name
   ├── __main__.py           # CLI with subcommands
   ├── exceptions.py         # FileAutomationException hierarchy
   ├── logging_config.py     # file_automation_logger
   ├── core/
   │   ├── action_registry.py
   │   ├── action_executor.py    # serial, parallel, dry-run, validate-first
   │   ├── dag_executor.py       # topological scheduler with parallel fan-out
   │   ├── callback_executor.py
   │   ├── package_loader.py
   │   ├── plugins.py            # entry-point plugin discovery
   │   ├── json_store.py
   │   ├── retry.py              # @retry_on_transient
   │   ├── quota.py              # Quota(max_bytes, max_seconds)
   │   ├── checksum.py           # file_checksum, verify_checksum
   │   ├── manifest.py           # write_manifest, verify_manifest
   │   ├── config.py             # AutomationConfig (TOML loader + secret resolver)
   │   ├── secrets.py            # Env/File/Chained secret providers
   │   └── progress.py           # CancellationToken, ProgressReporter, progress_registry
   ├── local/
   │   ├── file_ops.py
   │   ├── dir_ops.py
   │   ├── zip_ops.py
   │   ├── sync_ops.py           # rsync-style incremental sync
   │   └── safe_paths.py         # safe_join + is_within
   ├── remote/
   │   ├── url_validator.py      # SSRF guard
   │   ├── http_download.py      # retried HTTP download
   │   ├── google_drive/
   │   ├── s3/                   # auto-registered in build_default_registry()
   │   ├── azure_blob/           # auto-registered in build_default_registry()
   │   ├── dropbox_api/          # auto-registered in build_default_registry()
   │   └── sftp/                 # auto-registered in build_default_registry()
   ├── server/
   │   ├── tcp_server.py         # loopback-only, optional shared-secret
   │   └── http_server.py        # POST /actions, Bearer auth
   ├── trigger/
   │   └── manager.py            # FileWatcher + TriggerManager (watchdog-backed)
   ├── scheduler/
   │   ├── cron.py               # 5-field cron expression parser
   │   └── manager.py            # Scheduler background thread + ScheduledJob
   ├── notify/
   │   ├── sinks.py              # Webhook / Slack / Email sinks
   │   └── manager.py            # NotificationManager (fanout + dedup + auto-notify hook)
   ├── project/
   │   ├── project_builder.py
   │   └── templates.py
   ├── ui/                       # PySide6 GUI
   │   ├── launcher.py           # launch_ui(argv)
   │   ├── main_window.py        # tabbed MainWindow (Home, Local, Transfer,
   │   │                         # Progress, JSON actions, Triggers,
   │   │                         # Scheduler, Servers)
   │   ├── worker.py             # ActionWorker (QRunnable)
   │   ├── log_widget.py         # LogPanel
   │   └── tabs/                 # one tab per backend + JSON runner + servers
   └── utils/
       ├── file_discovery.py
       ├── fast_find.py          # OS-index (mdfind/locate/es) + scandir fallback
       └── deduplicate.py        # size → partial-hash → full-hash dedup pipeline

Execution modes
---------------

The shared executor supports five orthogonal modes:

* ``execute_action(actions)`` — default serial execution; each failure is
  captured and reported without aborting the batch.
* ``execute_action(actions, validate_first=True)`` — resolve every name
  against the registry before running anything. A typo aborts the batch
  up-front instead of after half the actions have already run.
* ``execute_action(actions, dry_run=True)`` — parse each action and log what
  would be called without invoking the underlying function.
* ``execute_action_parallel(actions, max_workers=4)`` — dispatch actions
  concurrently through a thread pool. The caller is responsible for ensuring
  the chosen actions are independent.
* ``execute_action_dag(nodes, max_workers=4, fail_fast=True)`` — Kahn-style
  topological scheduling. Each node is
  ``{"id": str, "action": [...], "depends_on": [id, ...]}``. Independent
  branches run in parallel. A failed node marks its transitive dependents
  ``skipped``; under ``fail_fast=False`` those dependents still run. Cycles,
  unknown dependencies, and duplicate ids are rejected before any node runs.

Reliability utilities
---------------------

* :func:`automation_file.core.retry.retry_on_transient` — decorator that
  retries ``ConnectionError`` / ``TimeoutError`` / ``OSError`` with capped
  exponential back-off. Used by :func:`automation_file.download_file`.
* :class:`automation_file.core.quota.Quota` — dataclass bundling an optional
  ``max_bytes`` size cap and an optional ``max_seconds`` time budget.
* :func:`automation_file.core.checksum.file_checksum` and
  :func:`automation_file.core.checksum.verify_checksum` — streaming file
  hashing (any :mod:`hashlib` algorithm) with constant-time digest comparison.
  :func:`automation_file.download_file` accepts ``expected_sha256=`` to verify
  the target immediately after the HTTP transfer completes.
* Resumable downloads: :func:`automation_file.download_file` accepts
  ``resume=True``, which writes to a ``.part`` file and sends a
  ``Range: bytes=<offset>-`` header, where ``<offset>`` is the number of bytes
  already on disk, so interrupted transfers continue from the existing byte
  count instead of restarting from zero.
* :func:`automation_file.utils.deduplicate.find_duplicates` — three-stage
  size → partial-hash → full-hash pipeline; most files never get hashed
  because unique-size buckets are discarded before any digest is read.
* :func:`automation_file.sync_dir` — incremental directory mirror with
  ``(size, mtime)`` or checksum-based change detection, optional delete of
  extras, and a dry-run mode.
* :func:`automation_file.write_manifest` /
  :func:`automation_file.verify_manifest` — JSON snapshot of every file digest
  under a root, for release-artifact verification and tamper detection.
* :class:`automation_file.core.progress.CancellationToken` and
  :class:`automation_file.core.progress.ProgressReporter` — opt-in
  per-transfer instrumentation. HTTP download and S3 upload/download accept a
  ``progress_name=`` kwarg that wires both primitives into the transfer loop;
  JSON actions ``FA_progress_list`` / ``FA_progress_cancel`` /
  ``FA_progress_clear`` address the central registry.

Event-driven dispatch
---------------------

Two long-running subsystems reuse the shared executor instead of forking their
own dispatch paths:

* :mod:`automation_file.trigger` wraps ``watchdog`` observers. Each
  :class:`~automation_file.trigger.FileWatcher` forwards matching filesystem
  events to an action list dispatched through the shared registry.
  :data:`~automation_file.trigger.trigger_manager` owns the name → watcher map
  so the GUI and JSON actions share one lifecycle.
* :mod:`automation_file.scheduler` runs one background thread that wakes on
  minute boundaries, iterates registered
  :class:`~automation_file.scheduler.ScheduledJob` instances, and dispatches
  every matching job on a short-lived worker thread so a slow action can't
  starve subsequent jobs.

Both dispatchers call :func:`automation_file.notify.manager.notify_on_failure`
when an action list raises
:class:`~automation_file.exceptions.FileAutomationException`. The helper is a
no-op when no sinks are registered, so auto-notification is an opt-in side
effect of registering any :class:`~automation_file.NotificationSink`.

Notifications
-------------

:mod:`automation_file.notify` ships three concrete sinks
(:class:`~automation_file.WebhookSink`, :class:`~automation_file.SlackSink`,
:class:`~automation_file.EmailSink`) behind one
:class:`~automation_file.NotificationManager` fanout. The manager owns:

* Per-sink error isolation — one broken sink never aborts the others.
* Sliding-window dedup keyed on ``(subject, body, level)`` so a stuck trigger
  can't flood a channel.
* A shared module-level singleton
  (:data:`~automation_file.notification_manager`) so CLI, GUI, and
  long-running dispatchers all publish through one state.

Every webhook/Slack URL passes through
:func:`~automation_file.remote.url_validator.validate_http_url`, blocking SSRF
targets. Email sinks never expose the password in ``repr()``.

Configuration and secrets
-------------------------

:class:`automation_file.AutomationConfig` loads an ``automation_file.toml``
document and exposes helpers to materialise sinks / defaults. Secret
placeholders (``${env:NAME}`` / ``${file:NAME}``) resolve at load time through
a :class:`~automation_file.ChainedSecretProvider` built from
:class:`~automation_file.EnvSecretProvider` and/or
:class:`~automation_file.FileSecretProvider`.
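
The chained placeholder resolution described above can be sketched in a few
lines. This is a minimal illustration, not the library's implementation:
``resolve_placeholders``, ``env_provider``, ``make_file_provider`` and
``MissingSecret`` are hypothetical names, and the sketch assumes that a file
provider reads one secret per file from a directory.

.. code-block:: python

    from __future__ import annotations

    import os
    import re
    import tempfile
    from pathlib import Path
    from typing import Callable, Optional, Sequence

    # Hypothetical provider shape: (kind, name) -> value, or None if unknown.
    Provider = Callable[[str, str], Optional[str]]

    _PLACEHOLDER = re.compile(r"\$\{(env|file):([^}]+)\}")


    class MissingSecret(KeyError):
        """Stand-in for the library's 'secret not found' error."""


    def env_provider(kind: str, name: str) -> Optional[str]:
        # Answers only ${env:NAME} placeholders.
        return os.environ.get(name) if kind == "env" else None


    def make_file_provider(root: Path) -> Provider:
        # Assumption: ${file:NAME} maps to the file root/NAME.
        def file_provider(kind: str, name: str) -> Optional[str]:
            if kind != "file":
                return None
            path = root / name
            if not path.is_file():
                return None
            return path.read_text(encoding="utf-8").strip()

        return file_provider


    def resolve_placeholders(text: str, providers: Sequence[Provider]) -> str:
        def replace(match: re.Match) -> str:
            kind, name = match.group(1), match.group(2)
            for provider in providers:  # the first provider with a value wins
                value = provider(kind, name)
                if value is not None:
                    return value
            # Fail loudly instead of substituting an empty string.
            raise MissingSecret(match.group(0))

        return _PLACEHOLDER.sub(replace, text)


    def demo() -> str:
        os.environ["DEMO_TOKEN"] = "s3cr3t"
        with tempfile.TemporaryDirectory() as tmp:
            Path(tmp, "smtp_password").write_text("hunter2\n", encoding="utf-8")
            chain = [env_provider, make_file_provider(Path(tmp))]
            return resolve_placeholders(
                "token=${env:DEMO_TOKEN} pw=${file:smtp_password}", chain
            )

Calling ``demo()`` substitutes both placeholders, while a placeholder that no
provider in the chain can answer raises the stand-in ``MissingSecret`` error.
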
Unresolved references raise :class:`~automation_file.SecretNotFoundException`
so a typo never silently becomes an empty string.

Security boundaries
-------------------

* **SSRF guard**: every outbound HTTP URL passes through
  :func:`automation_file.remote.url_validator.validate_http_url`.
* **Path traversal**: :func:`automation_file.local.safe_paths.safe_join`
  resolves user paths under a caller-specified root and rejects ``..``
  escapes, absolute paths outside the root, and symlinks pointing out of it.
* **TCP / HTTP auth**: both servers accept an optional ``shared_secret``. When
  set, the TCP server requires ``AUTH <secret>\n`` before the payload and the
  HTTP server requires ``Authorization: Bearer <secret>``. Both bind to
  loopback by default and refuse non-loopback binds unless
  ``allow_non_loopback=True`` is passed.
* **SFTP host verification**: the SFTP client uses
  :class:`paramiko.RejectPolicy` and never auto-adds unknown host keys.
* **Plugin loading**:
  :class:`automation_file.core.package_loader.PackageLoader` registers
  arbitrary module members; never expose it to untrusted input. The
  entry-point discovery path
  (:func:`automation_file.core.plugins.load_entry_point_plugins`) is safer,
  because only packages the user has explicitly installed can contribute
  commands. Every plugin still runs with full library privileges, so review
  third-party plugins before installing them.

Entry-point plugins
-------------------

Third-party packages can ship extra actions without ``automation_file`` having
to import them. A plugin advertises itself in its ``pyproject.toml``::

    [project.entry-points."automation_file.actions"]
    my_plugin = "my_plugin:register"

where ``register`` is a zero-argument callable returning a
``Mapping[str, Callable]``, the same shape you would hand to
:func:`automation_file.add_command_to_executor`.
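
A plugin module matching that entry point might look like the sketch below.
The module name ``my_plugin`` and the ``register`` callable come from the
``pyproject.toml`` snippet; the action names ``MY_word_count`` and
``MY_reverse`` and their bodies are hypothetical, chosen only to show the
mapping shape.

.. code-block:: python

    """my_plugin/__init__.py: hypothetical plugin module (illustration only)."""
    from typing import Callable, Mapping


    def word_count(text: str) -> int:
        """Count whitespace-separated words in ``text``."""
        return len(text.split())


    def reverse(text: str) -> str:
        """Return ``text`` with its characters in reverse order."""
        return text[::-1]


    def register() -> Mapping[str, Callable]:
        """Zero-argument factory referenced by the entry point.

        Keys are the action names that would appear in a JSON action list;
        values are the callables those names dispatch to.
        """
        return {
            "MY_word_count": word_count,
            "MY_reverse": reverse,
        }

Prefixing the action names (``MY_`` here) is a convention assumed for this
sketch to reduce the chance of colliding with built-in command names.
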
:func:`automation_file.core.action_registry.build_default_registry` invokes
:func:`automation_file.core.plugins.load_entry_point_plugins` after the
built-ins are wired in, so installed plugins populate every freshly-built
registry automatically. Plugin failures (import errors, factory exceptions,
bad return shape, registry rejection) are logged and swallowed so one broken
plugin does not break the library.

Shared singletons
-----------------

``automation_file/__init__.py`` creates the following process-wide singletons:

* ``executor`` — :class:`ActionExecutor` used by :func:`execute_action`.
* ``callback_executor`` — :class:`CallbackExecutor` bound to
  ``executor.registry``.
* ``package_manager`` — :class:`PackageLoader` bound to the same registry.
* ``driver_instance``, ``s3_instance``, ``azure_blob_instance``,
  ``dropbox_instance``, ``sftp_instance`` — lazy clients for each cloud
  backend.

All executors share one :class:`ActionRegistry` instance, so calling
:func:`add_command_to_executor` (or any ``register_*_ops`` helper) makes the
new command visible to every dispatcher at once.
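
The shared-registry behaviour can be modelled with a toy example.
``ToyRegistry`` and ``ToyExecutor`` are simplified stand-ins written for this
illustration, not the library's classes. They show the three command shapes
(``[name]``, ``[name, {kwargs}]``, ``[name, [args]]``) and how a command
registered once becomes visible to every executor bound to the same registry.
Recording a failure as the exception's ``repr`` is an assumption of the
sketch.

.. code-block:: python

    from typing import Any, Callable, Dict, List


    class ToyRegistry:
        """Maps action names to callables (stand-in for ActionRegistry)."""

        def __init__(self) -> None:
            self._commands: Dict[str, Callable[..., Any]] = {}

        def register(self, name: str, func: Callable[..., Any]) -> None:
            self._commands[name] = func

        def resolve(self, name: str) -> Callable[..., Any]:
            return self._commands[name]  # raises KeyError for unknown names


    class ToyExecutor:
        """Runs a batch serially; one failing action never aborts the rest."""

        def __init__(self, registry: ToyRegistry) -> None:
            self.registry = registry

        def run(self, actions: List[list]) -> List[Any]:
            results: List[Any] = []
            for action in actions:
                try:
                    func = self.registry.resolve(action[0])
                    payload = action[1] if len(action) > 1 else None
                    if isinstance(payload, dict):
                        results.append(func(**payload))  # [name, {kwargs}]
                    elif isinstance(payload, (list, tuple)):
                        results.append(func(*payload))   # [name, [args]]
                    else:
                        results.append(func())           # [name]
                except Exception as error:
                    # Capture the failure and keep going with the batch.
                    results.append(repr(error))
            return results


    registry = ToyRegistry()
    first = ToyExecutor(registry)
    second = ToyExecutor(registry)

    # Registered after both executors exist, yet visible to both of them.
    registry.register("greet", lambda name="world": f"hello {name}")
    registry.register("add", lambda a, b: a + b)

    batch = [["greet"], ["greet", {"name": "plugin"}], ["add", [2, 3]], ["missing"]]
    results_first = first.run(batch)
    results_second = second.run(batch)

Both executors return the same results for the batch, and the unknown
``missing`` action is recorded as an error entry while the three valid
actions still produce their values.
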