Persistency
The persistency module gives a detector a place to remember things about the
events it sees. State is keyed by EventID and survives across training, the
detection loop, and (optionally) restarts on disk.
This page is structured to read top-to-bottom: first the mental model, then a quick start, then the API surface.
Mental model
Persistency has three moving parts. Understanding what each one does makes the rest of the page much easier to follow.
1. Events
Logs are grouped by EventID. Two events with the same ID share a template
but have their own variable values. Persistency stores one independent state
object per event ID, so an EventStabilityTracker for EventID=4733 does
not interfere with one for EventID=4624.
2. Backends (EventDataStructure)
A backend is the thing that actually stores the per-event state. Persistency
owns the dict {event_id: backend}; the backend itself decides how data is
kept.
Two families ship today:
- DataFrame backends (EventDataFrame, ChunkedEventDataFrame) keep the raw rows. Use these when a detector needs to scan history.
- Tracker backends (EventStabilityTracker) keep only derived features (e.g. "this variable has been constant for the last 10k events"). Use these when you only need a summary, not the raw history; they cost a fraction of the memory.
All backends implement the same four-method contract: add_data, get_data,
dump, load. That contract is what EventPersistency and
PersistencySaver rely on — anything you add later only has to follow it.
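As a rough illustration of that contract, the sketch below pairs a toy tracker-style backend with a toy owner of the {event_id: backend} dict. It uses only the standard library. The class names ToyCounterBackend and ToyPersistency, the method signatures, and the JSON dump format are all assumptions made for this example, not the library's actual EventDataStructure interface.

```python
import json
from collections import Counter
from typing import Any, Dict


class ToyCounterBackend:
    """Hypothetical tracker-style backend: keeps value counts, not raw rows."""

    def __init__(self) -> None:
        # variable name -> Counter of observed values
        self._counts: Dict[str, Counter] = {}

    def add_data(self, variables: Dict[str, Any]) -> None:
        for name, value in variables.items():
            self._counts.setdefault(name, Counter())[str(value)] += 1

    def get_data(self) -> Dict[str, Dict[str, int]]:
        return {name: dict(counter) for name, counter in self._counts.items()}

    def dump(self) -> str:
        # Serialise to a JSON string (format assumed for this sketch only).
        return json.dumps(self.get_data())

    def load(self, payload: str) -> None:
        self._counts = {n: Counter(c) for n, c in json.loads(payload).items()}


class ToyPersistency:
    """Hypothetical owner of the {event_id: backend} dict."""

    def __init__(self, backend_class=ToyCounterBackend) -> None:
        self._backend_class = backend_class
        self._events: Dict[str, Any] = {}

    def ingest(self, event_id: str, variables: Dict[str, Any]) -> None:
        # Create one backend per event ID, the first time that ID is seen.
        if event_id not in self._events:
            self._events[event_id] = self._backend_class()
        self._events[event_id].add_data(variables)

    def __getitem__(self, event_id: str):
        return self._events[event_id]


store = ToyPersistency()
store.ingest("4624", {"LogonType": "3"})
store.ingest("4624", {"LogonType": "3"})
store.ingest("4733", {"LogonType": "10"})

# Round-trip one event's state through dump/load.
restored = ToyCounterBackend()
restored.load(store["4624"].dump())
```

Because the owner only ever calls the four contract methods, swapping ToyCounterBackend for a different class needs no change to ToyPersistency, which is the property the contract is meant to provide.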
3. Saver lifecycle (PersistencySaver)
EventPersistency itself is in-memory. To survive a process restart, the
state has to be written somewhere. PersistencySaver wraps an
EventPersistency and:
- writes to disk (or any fsspec URI) on two triggers: a wall-clock interval and an event-count threshold;
- optionally loads previously saved state during construction (auto_load);
- exposes start() / stop() so the background timer can be torn down cleanly. stop() is idempotent and is called automatically when a Component is used as a context manager.
In practice a detector never instantiates PersistencySaver directly: it sets
a persist: block in its config and CoreDetector wires the saver up via
init_persistency.
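To make the lifecycle concrete, here is a minimal standard-library sketch of a saver with the same shape: a timer trigger, an event-count trigger, an idempotent stop() with a final flush, and context-manager support. ToySaver, notify_event, and the save_fn callback are hypothetical names for this illustration; how PersistencySaver is implemented internally is not stated here, so treat this as one possible design rather than the library's.

```python
import threading
from typing import Callable, Optional


class ToySaver:
    """Hypothetical saver: flushes on a timer or after N events."""

    def __init__(self, save_fn: Callable[[], None],
                 interval_seconds: float, events_until_save: int) -> None:
        self._save_fn = save_fn
        self._interval = interval_seconds
        self._threshold = events_until_save
        self._pending = 0
        self._lock = threading.Lock()  # serialises save() across threads
        self._timer: Optional[threading.Timer] = None
        self._running = False

    def save(self) -> None:
        with self._lock:
            self._save_fn()
            self._pending = 0

    def notify_event(self) -> None:
        # Event-count trigger: save once the threshold is reached.
        with self._lock:
            self._pending += 1
            due = self._pending >= self._threshold
        if due:
            self.save()

    def _on_timer(self) -> None:
        # Wall-clock trigger: save, then re-arm the timer.
        self.save()
        self._arm()

    def _arm(self) -> None:
        with self._lock:
            if not self._running:
                return
            self._timer = threading.Timer(self._interval, self._on_timer)
            self._timer.daemon = True
            self._timer.start()

    def start(self) -> None:
        with self._lock:
            if self._running:
                return
            self._running = True
        self._arm()

    def stop(self) -> None:
        # Idempotent: a second call finds _running False and returns.
        with self._lock:
            if not self._running:
                return
            self._running = False
            timer, self._timer = self._timer, None
        if timer is not None:
            timer.cancel()
        self.save()  # final flush

    def __enter__(self) -> "ToySaver":
        self.start()
        return self

    def __exit__(self, *exc_info) -> None:
        self.stop()


saves = []
with ToySaver(lambda: saves.append("saved"), interval_seconds=3600,
              events_until_save=3) as saver:
    for _ in range(3):
        saver.notify_event()  # the third call reaches the threshold
saver.stop()  # no-op: the context manager already stopped the saver
```

In this run the threshold produces one save and leaving the with block produces the final flush; the extra stop() call adds nothing, which is what idempotence means in practice.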
Quick start
from detectmatelibrary.utils import persistency

ep = persistency.EventPersistency(
    event_data_class=persistency.EventStabilityTracker,
)
ep.ingest_event(
    event_id="4624",
    event_template="An account was successfully logged on.",
    named_variables={"AccountName": "alice", "LogonType": "3"},
)
tracker = ep.get_event_data("4624")  # or ep["4624"]
That snippet covers the whole in-memory API: pick a backend class, ingest events, query state.
API reference
EventPersistency
| Parameter | Description |
|---|---|
| event_data_class | An EventDataStructure subclass; one instance is created per event ID. |
| variable_blacklist | Variable names to skip when ingesting. Defaults to ["Content"]. |
| event_data_kwargs | Extra kwargs forwarded to each backend instance. |
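The effect of variable_blacklist can be pictured as a filter applied to the named variables before they reach a backend. The helper below is a hypothetical stand-in written for this page (filter_variables is not a library function), assuming the blacklist matches on exact variable names.

```python
from typing import Any, Dict, Iterable

DEFAULT_BLACKLIST = ("Content",)  # mirrors the documented default


def filter_variables(named_variables: Dict[str, Any],
                     blacklist: Iterable[str] = DEFAULT_BLACKLIST) -> Dict[str, Any]:
    """Hypothetical helper: drop blacklisted variable names before ingest."""
    skipped = set(blacklist)
    return {k: v for k, v in named_variables.items() if k not in skipped}


kept = filter_variables(
    {"AccountName": "alice", "Content": "raw log line", "LogonType": "3"}
)
```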
Common methods:
ep.ingest_event(event_id, event_template, variables=..., named_variables=...)
ep.get_event_data(event_id) # backend for a single event
ep.get_events_data() # dict[event_id -> backend]
ep.get_event_template(event_id)
ep.get_event_templates()
ep.get_events_seen() # all event IDs ever ingested
ep[event_id] # alias for get_event_data
Available backends
| Class | Use when |
|---|---|
| persistency.EventDataFrame | You need history and a Pandas DataFrame is the natural shape. |
| persistency.ChunkedEventDataFrame | High-volume / streaming workloads; Polars-backed with row-retention and automatic compaction. |
| persistency.EventStabilityTracker | You only care about how variables behave over time (STATIC / STABLE / UNSTABLE / RANDOM). Cheapest memory footprint. |
All three are re-exported from the top of the package — persistency.X is the
canonical import; the deeply nested submodules are an implementation detail.
Persisting to disk
saver = persistency.PersistencySaver(
    ep,
    persistency.PersistencySaverConfig(
        path="./state/my-detector",
        save_interval_seconds=300,
        events_until_save=10_000,  # save after this many ingests, too
        auto_load=False,
        storage_options={},  # forwarded to fsspec
    ),
)
saver.start()
# ... detector runs ...
saver.stop()  # final flush, stops the background timer
PersistencySaver.save() is thread-safe, and stop() is idempotent. The two
save triggers (save_interval_seconds and events_until_save) are
independent: a save runs as soon as either one fires.
Restoring state
saver = persistency.PersistencySaver(
    ep,
    persistency.PersistencySaverConfig(path="./state/my-detector", auto_load=True),
)
# ep is now pre-populated from disk
If auto_load=True and no saved state exists, the constructor raises
persistency.PersistencyLoadError immediately — fail-fast rather than
silently starting empty.
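Fail-fast loading pushes the first-run decision onto the caller. The sketch below shows that pattern with standard-library stand-ins: ToyLoadError plays the role of persistency.PersistencyLoadError, and load_state is a hypothetical loader that reads a JSON file. Neither is the library's real code, and the on-disk format is an assumption.

```python
import json
import tempfile
from pathlib import Path


class ToyLoadError(Exception):
    """Stand-in for persistency.PersistencyLoadError in this sketch."""


def load_state(path: Path, auto_load: bool) -> dict:
    # Hypothetical loader: start empty unless auto_load is requested.
    if not auto_load:
        return {}
    if not path.exists():
        # Fail fast instead of silently starting empty.
        raise ToyLoadError(f"no saved state at {path}")
    return json.loads(path.read_text())


with tempfile.TemporaryDirectory() as tmp:
    state_file = Path(tmp) / "state.json"
    try:
        state = load_state(state_file, auto_load=True)
    except ToyLoadError:
        # First run: the caller explicitly chooses to start empty.
        state = load_state(state_file, auto_load=False)

    # Once a save has happened, the same call succeeds.
    state_file.write_text(json.dumps({"4624": {"LogonType": "3"}}))
    restored = load_state(state_file, auto_load=True)
```

Catching the error and retrying without auto_load is only appropriate when an empty start is acceptable; a detector that depends on trained state would more likely let the exception propagate.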
Storage backends (fsspec)
PersistencySaverConfig.path accepts any URI fsspec understands: a local path
(./state), s3://bucket/key, gs://..., az://..., and so on. Provider
credentials and tuning knobs go in storage_options.
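For an S3 target, the values might look like the fragment below. The key and secret entries are parameters of s3fs, the filesystem fsspec uses for s3:// URLs; the bucket name and credential strings are placeholders, and the dict layout is only a convenient way to show the two settings side by side, not a structure the library requires.

```python
# Placeholder bucket and credentials; substitute real values.
s3_settings = {
    "path": "s3://example-bucket/state/my-detector",
    "storage_options": {
        "key": "EXAMPLE_ACCESS_KEY_ID",  # s3fs: access key ID
        "secret": "EXAMPLE_SECRET_KEY",  # s3fs: secret access key
    },
}

# A local path needs no provider options.
local_settings = {
    "path": "./state/my-detector",
    "storage_options": {},
}
```

Other providers take different option names, so check the documentation of the matching fsspec implementation before copying these keys.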
Using persistency inside a detector
The recommended path: declare persist: in the detector's config and let
CoreDetector._register_persistency build the saver for you. See
Saving state (persist) for the config
schema.
In detector code, the pattern is:
from detectmatelibrary.common.detector import CoreDetector
from detectmatelibrary.utils import persistency

# MyDetectorConfig is the detector's own config class, defined elsewhere.

class MyDetector(CoreDetector):
    def __init__(self, name="MyDetector", config=MyDetectorConfig()):
        super().__init__(name=name, config=config)
        self.persistency = persistency.EventPersistency(
            event_data_class=persistency.EventStabilityTracker,
        )
        # Builds the saver if the config has a persist: block.
        self._register_persistency(self.persistency)

    def train(self, input_):
        self.persistency.ingest_event(
            event_id=input_["EventID"],
            event_template=input_["template"],
            named_variables={...},
        )

    def detect(self, input_, output_):
        # .get() returns None for an event ID that was never ingested.
        tracker = self.persistency.get_events_data().get(input_["EventID"])
        # compare against tracker to produce alerts
_register_persistency is a one-line wrapper around
init_persistency; the helper
honours config.persist and returns None (so self.saver stays None)
when persistence is disabled.