Bigram Frequency Detector
The Bigram Frequency Detector raises alerts when a variable's character bigrams (pairs of adjacent characters) appear improbable under a learned per-variable bigram frequency model. Optionally, an English-language bigram table can be consulted as a fallback for bigrams not yet seen during training.
| | Schema | Description |
|---|---|---|
| Input | ParserSchema | Structured log |
| Output | DetectorSchema | Alert / finding |
Description
For each configured variable, the detector walks every observed value character by character, adding a virtual boundary character before the first character and after the last, and updates a bigram frequency table kept per (event, variable) pair. At detect time, it computes the average per-bigram conditional probability of a new value against this table. Values scoring below prob_thresh (default 0.05) are flagged. When default_freqs is enabled, a built-in English bigram table serves as a fallback for bigrams that were not seen during training.
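The scoring idea described above can be sketched in a few lines of plain Python. This is a minimal illustration only, not the library's implementation: the `BigramModel` class, the `bigrams` helper, the `BOUNDARY` marker and the shape of the `fallback` table are all hypothetical names chosen for this example, and the way unseen bigrams are handled is an assumption.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

BOUNDARY = "\x00"  # assumed boundary marker; the library may use a different one
PROB_THRESH = 0.05  # mirrors the documented default of prob_thresh

Bigram = Tuple[str, str]


def bigrams(value: str) -> List[Bigram]:
    """Return adjacent character pairs, padded with a boundary on each side."""
    padded = BOUNDARY + value + BOUNDARY
    return list(zip(padded, padded[1:]))


class BigramModel:
    """Toy table for one (event, variable): counts[first][second] -> occurrences."""

    def __init__(self, fallback: Optional[Dict[Bigram, float]] = None) -> None:
        self.counts = defaultdict(lambda: defaultdict(int))
        # Hypothetical stand-in for a built-in English table (default_freqs).
        self.fallback = fallback or {}

    def train(self, value: str) -> None:
        for first, second in bigrams(value):
            self.counts[first][second] += 1

    def score(self, value: str) -> float:
        """Average of P(second | first) over all bigrams of the value."""
        pairs = bigrams(value)
        total = 0.0
        for first, second in pairs:
            row = self.counts.get(first, {})
            seen = row.get(second, 0)
            if seen:
                total += seen / sum(row.values())
            else:
                # Unseen bigram: use the fallback table, else probability 0.
                total += self.fallback.get((first, second), 0.0)
        return total / len(pairs)


model = BigramModel()
for observed in ("user1", "user2", "user3"):
    model.train(observed)

typical = model.score("user1")  # mostly bigrams seen in training
odd = model.score("#$%!")       # no bigram seen in training
print(typical >= PROB_THRESH, odd < PROB_THRESH)
```

In this sketch a value made entirely of unseen bigrams scores 0 and falls below the threshold, while a value resembling the training data scores well above it. Averaging keeps the score comparable across values of different lengths, at the cost of letting a single rare bigram be diluted in a long value.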
Configuration example
detectors:
    BigramFrequencyDetector:
        method_type: bigram_frequency_detector
        auto_config: False
        params: {}
        events:
            1:
                test:
                    params: {}
                    variables:
                        - pos: 0
                          name: var1
                          params:
                            threshold: 0.
                    header_variables:
                        - pos: level
                          params: {}
Example usage
from detectmatelibrary.detectors.bigram_frequency_detector import BigramFrequencyDetector
import detectmatelibrary.schemas as schemas

# cfg is assumed to hold the configuration from the example above,
# already loaded into Python (it is not defined in this snippet).
detector = BigramFrequencyDetector(name="BigramFrequencyTest", config=cfg)

# A parsed log line for event 1 with a single variable value.
parsed_data = schemas.ParserSchema({
    "parserType": "test",
    "EventID": 1,
    "template": "test template",
    "variables": ["var1"],
    "logID": "1",
    "parsedLogID": "1",
    "parserID": "test_parser",
    "log": "test log message",
    "logFormatVariables": {"timestamp": "123456"}
})

# Run the detector on the parsed log line.
alert = detector.process(parsed_data)