MODEL
NanoMind Security Classifier
On-device Mamba TME (Ternary Mamba Encoder) classifier for AI agent security content. It distinguishes 10 classes (9 attack types plus benign), ships as an 8.3 MB ONNX model, and reaches 98.45% accuracy on the held-out eval set. Published on Hugging Face.
| Metric | Value |
|---|---|
| Eval Accuracy | 98.45% |
| Macro F1 | 0.978 |
| Classes | 10 |
| Model Size | 8.3 MB |
The 10 Classes
Nine attack types plus benign. This is the label set the v0.5.0 classifier emits, taken from the sft-v10 training corpus.
| Class | Description |
|---|---|
| exfiltration | Data forwarding to external endpoints |
| injection | Instruction override, jailbreak |
| privilege_escalation | Unauthorized access elevation |
| persistence | Permanent state manipulation |
| credential_abuse | Credential harvesting, phishing |
| lateral_movement | Remote config, C2 communication |
| social_engineering | Urgency, pressure tactics |
| policy_violation | Governance bypass |
| steganography | Zero-width chars, homoglyphs, BIDI |
| benign | Normal agent behavior |
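
As a rough illustration of how the label set above might be used downstream, the sketch below maps a 10-way output vector to a class name. The index order, the `LABELS` tuple, and the `softmax`, `decode`, and `is_attack` helpers are assumptions made for this example. The published model may order its outputs differently, so confirm against the model's own label mapping before relying on it.

```python
import math

# ASSUMPTION: label order follows the listing above. It is not a published
# index mapping; check the model's own label file before relying on it.
LABELS = (
    "exfiltration",
    "injection",
    "privilege_escalation",
    "persistence",
    "credential_abuse",
    "lateral_movement",
    "social_engineering",
    "policy_violation",
    "steganography",
    "benign",
)


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits):
    """Return (label, probability) for the highest-scoring class."""
    if len(logits) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} logits, got {len(logits)}")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


def is_attack(label):
    """Every class except 'benign' is an attack type."""
    return label != "benign"
```

If the model already emits probabilities rather than logits, the `softmax` step can be skipped and the arg-max taken directly.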
Evaluation (v0.5.0)
Evaluation uses a held-out set of 194 samples plus a 50-fixture oracle (40 malicious and 10 benign hard negatives). Per-class F1 is tracked as a release gate but is not published; macro F1 is the authoritative summary.
| Metric | Value |
|---|---|
| Eval accuracy | 98.45% |
| Macro F1 | 0.978 |
| Eval samples | 194 |
| Oracle recall | 100% |
| Oracle precision | 79.6% |
| Oracle F1 | 0.887 |
| Oracle benign FPR | 9.1% |
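
For readers who want to reproduce metrics of this kind, the following sketch spells out the standard definitions behind the table: precision, recall, F1, false positive rate, and macro F1 as the unweighted mean of per-class F1. The function names and the toy labels in the usage note are hypothetical. This is not the project's evaluation harness or data.

```python
def precision(tp, fp):
    """Share of predicted positives that are truly positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Share of true positives that were detected."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


def false_positive_rate(fp, tn):
    """Share of true negatives (e.g. benign fixtures) that were flagged."""
    return fp / (fp + tn) if (fp + tn) else 0.0


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of one-vs-rest F1 over the given labels."""
    pairs = list(zip(y_true, y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        scores.append(f1(precision(tp, fp), recall(tp, fn)))
    return sum(scores) / len(scores)
```

On a toy example such as `macro_f1(["injection", "injection", "benign", "benign"], ["injection", "benign", "benign", "benign"], ["injection", "benign"])`, the per-class F1 scores are 2/3 and 0.8, so the macro average is about 0.733. Because every class counts equally, a weak rare class pulls macro F1 down more than it pulls accuracy down.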
Version History
| Version | Architecture | Accuracy | Corpus | Status |
|---|---|---|---|---|
| v0.5.0 | Mamba TME + dropout | 98.45% | sft-v10 (3,168) | latest |
| v0.4.0 | Mamba TME | 96.73% | sft-v9 (3,337) | stable |
| v0.2.0 | Mamba TME | 97.01% | v4 (822) | deprecated |
| v0.1.0 | MLP (3 layers) | 86% | v4 (822) | deprecated |
Training Data (sft-v10 corpus)
The corpus has 3,168 training samples and 194 held-out eval samples across 10 classes, with a vocabulary of 6,000. A Claude LLM, acting as chief data scientist, reviews every label. The sources below form the raw pool from which the sft-v10 split is sampled.
| Source | Samples | Type |
|---|---|---|
| OASB benchmark | 4,151 | Real labeled scenarios |
| Registry (pretrain) | 4,885 | Real package descriptions |
| Synthetic | 1,029 | Template edge cases |
| DVAA | 88 | Vulnerable configs |
| AgentPwn | 68 | Real-world captures |
Architecture Details
| Parameter | Value |
|---|---|
| Type | Ternary Mamba Encoder (TME) |
| Blocks | 8 Mamba SSM blocks |
| d_model | 128 |
| d_state | 64 |
| Dropout | 0.1 |
| Pooling | Mean over sequence |
| Output | 10-class softmax |
| Format | ONNX (CPU inference) |
| Training | Apple Silicon MLX |
| Loss | Cross-entropy, class-weighted |
| LR Schedule | Cosine with warmup |
| Early Stopping | Patience 30 on eval loss |
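
The training-side rows above (class-weighted cross-entropy, cosine schedule with warmup, early stopping with patience 30) can be sketched in plain Python as follows. This is a minimal illustration, not the project's MLX training code. The function and class names, the linear warmup shape, the `min_lr` floor, and counting patience in evaluation rounds are all assumptions; only the patience value of 30 comes from the table.

```python
import math


def cosine_with_warmup(step, total_steps, warmup_steps, base_lr, min_lr=0.0):
    """Linear warmup to base_lr, then cosine decay toward min_lr.

    ASSUMPTION: warmup is linear; the source only says "cosine with warmup".
    """
    if warmup_steps > 0 and step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    span = max(1, total_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / span)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


class EarlyStopping:
    """Stop once eval loss has not improved for `patience` evaluations."""

    def __init__(self, patience=30):
        self.patience = patience
        self.best = math.inf
        self.bad_evals = 0

    def update(self, eval_loss):
        """Record one eval loss; return True when training should stop."""
        if eval_loss < self.best:
            self.best = eval_loss
            self.bad_evals = 0
        else:
            self.bad_evals += 1
        return self.bad_evals >= self.patience


def weighted_cross_entropy(probs, target, class_weights):
    """Cross-entropy for one sample, scaled by the weight of its true class."""
    return -class_weights[target] * math.log(max(probs[target], 1e-12))
```

A training loop would call `cosine_with_warmup` once per optimizer step and `EarlyStopping.update` once per evaluation pass, keeping the checkpoint that produced `best`. Class weights are typically chosen so that rarer classes contribute more to the loss, though the weighting scheme used for this model is not stated here.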