This post is a deep-dive, hands‑on guide to turning an experimental ML project idea into a production‑grade service you can deploy, scale, and observe. We will build a real‑time anomaly detection microservice for grid sensor data, expose a friendly UI, containerize both services, deploy them to Kubernetes with autoscaling, and reason about SLOs, performance, and extensibility.
The full codebase lives in this repository: https://github.com/Tomas-Kozak/mlops-series-l1
Contents
- A FastAPI service that serves an IsolationForest model for detecting anomalies in 3 sensor features: voltage, current, frequency.
- A Streamlit UI that simulates grid readings, sends them in batches to the backend, and visualizes anomalies and tail latencies.
- Prometheus metrics for end‑to‑end observability (/metrics).
- Docker images for both services; Docker Compose for local multi‑service dev.
- Kubernetes manifests with a Horizontal Pod Autoscaler (HPA) driven by CPU, deployable to a local kind cluster.
Patterns and code you can reuse: request/response schemas, model artifact handling, latency instrumentation, container hardening, HPA configuration, and a realistic load test harness.
Architecture Overview
Data flow:
1) Simulator emits grid readings with occasional injected anomalies.
2) UI assembles readings into request batches and calls the /predict
endpoint.
3) Backend validates payloads (Pydantic), converts to feature matrix, runs IsolationForest inference.
4) Backend returns per‑reading anomaly flags and scores, while emitting Prometheus metrics.
5) UI renders charts, overlays anomalies, and generates burst/sustained load to trigger autoscaling.
Services and deployment:
- src/backend/*: containerized FastAPI app, scaled by HPA in Kubernetes.
- src/ui/*: containerized Streamlit UI, pointed to the backend via BACKEND_URL.
- docker/compose.yaml: local multi‑service run with a shared network.
- k8s/*: Deployment + Service for both apps and a CPU‑based HorizontalPodAutoscaler for the backend.
Diagram: High‑Level Architecture
+----------------+ HTTP (JSON) +-------------------+
| Streamlit UI | -------------------------> | FastAPI Backend |
| (src/ui) | | (src/backend) |
+--------+-------+ +----+--------------+
^ |
| | Prometheus exposition
| v
| +-----+------+
Simulation (local) | /metrics |
└─ stream_readings() +------------+
In Kubernetes:
+------------------+ +------------------+
| anomaly-ui Pod | ---> Service ---> | anomaly-backend |
| (Deployment) | (ClusterIP) | Pods (HPA) |
+---------+--------+ +---------+--------+
| |
Port-forward HPA
| (CPU utilization target)
v v
http://localhost:8501 Scales 1..N pods
Diagram: Request Path and Timers
UI batch -> HTTP POST /predict -> [FastAPI]
|
+-- parse JSON (Pydantic)
|
+-- as_feature_matrix()
|
+-- [T_infer] IsolationForest.predict
|
+-- build response
|
`-- observe REQUEST_LATENCY (total)
Diagram: Sequence (UI burst)
UI(ThreadPool) Kubernetes SVC Pod A (backend) Pod B (backend)
| | | |
|-- N requests ------->|-- distribute ------>|-- handle (A) |
| | | |
|-- N requests ------->|-- distribute ----------------------------->|-- handle (B)
| | | |
|<- responses (A,B) ---|<--------------------|<---------------------|
Data and Modeling
Core modeling is not the focus of this post, but we’ll briefly go over it to get a sense of what we’re working with. Problem: detect anomalous grid operating points using only point‑in‑time readings for three features: voltage (V), current (A), and frequency (Hz).
We use an IsolationForest (scikit‑learn) trained on synthetically generated “normal” operating points; it assigns higher anomaly scores to outliers (the raw scikit‑learn score is inverted so that larger means more anomalous). This keeps the focus on the MLOps systems work rather than modeling intricacies, but we still make the model deterministic, reproducible, and artifacted.
Feature schema and defaults (source: src/backend/model.py:11):
DEFAULT_FEATURES = ["voltage", "current", "frequency"]
Model config and training (source: src/backend/model.py:15):
@dataclass(frozen=True)
class ModelConfig:
feature_names: Tuple[str, ...] = tuple(DEFAULT_FEATURES)
random_state: int = 42
contamination: float = 0.05
n_estimators: int = 200
def train_isolation_forest(config: ModelConfig, n_samples: int = 5000) -> IsolationForest:
rng = np.random.default_rng(config.random_state)
X = _generate_synthetic_normal(n_samples, rng)
model = IsolationForest(
n_estimators=config.n_estimators,
contamination=config.contamination,
random_state=config.random_state,
)
model.fit(X)
return model
Theory notes:
- IsolationForest isolates points by random splits; outliers require fewer splits to isolate, yielding larger anomaly scores.
- contamination approximates the expected fraction of anomalies. For energy datasets with non‑stationarity, you’d calibrate this offline (ROC/PR curves) and monitor precision/recall drop‑offs over time.
- Determinism and reproducibility come from a fixed random_state and artifacting the trained model to models/iso_forest.joblib.
Advanced modeling guidance tailored to this service:
- If your grid has diurnal/seasonal cycles, consider adding time‑of‑day or temperature context features, or training per‑segment models (substations/regions). Keep feature_names ordered and versioned.
- Track input drift via PSI/KL on voltage/current/frequency; alert when drift exceeds thresholds and kick off retraining jobs (see the PSI sketch below).
- Calibrate a score threshold to a target precision/recall on validation data and make it a dynamic config value.
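For drift tracking, a minimal PSI helper over one feature is enough to get started. This is an illustrative sketch (the function name, equal‑width binning, and thresholds are assumptions, not code from the repo):

import numpy as np

def population_stability_index(reference: np.ndarray, current: np.ndarray, bins: int = 10) -> float:
    """PSI between a training-time reference sample and a recent window of one feature."""
    lo, hi = float(np.min(reference)), float(np.max(reference))
    edges = np.linspace(lo, hi, bins + 1)
    ref_frac = np.histogram(reference, bins=edges)[0] / len(reference)
    cur_frac = np.histogram(np.clip(current, lo, hi), bins=edges)[0] / len(current)
    # Floor the fractions to avoid division by zero / log(0) in sparse bins.
    ref_frac = np.clip(ref_frac, 1e-6, None)
    cur_frac = np.clip(cur_frac, 1e-6, None)
    return float(np.sum((cur_frac - ref_frac) * np.log(cur_frac / ref_frac)))

# Common heuristic: PSI < 0.1 stable, 0.1-0.25 moderate drift, > 0.25 significant drift.
# Run this per feature (voltage, current, frequency) on a schedule and alert above your threshold.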
API Design and Contracts
We model the request/response with Pydantic. The contract is explicit and versionable (good for compatibility testing and schema drift detection).
Schemas (source: src/backend/api.py:1):
class SensorReading(BaseModel):
voltage: float
current: float
frequency: float
timestamp: Optional[str] = None
class BatchPredictRequest(BaseModel):
readings: List[SensorReading]
class Prediction(BaseModel):
anomaly: bool
score: float
class BatchPredictResponse(BaseModel):
predictions: List[Prediction]
anomalies: int
total: int
served_by: str | None = None
Endpoints (source: src/backend/app.py):
- GET /health: quick readiness and instance identity.
- POST /predict: batch inference. Returns per‑reading anomaly booleans and scores, plus metadata.
- GET /metrics: Prometheus exposition for scraping.
Example request:
POST /predict
Content-Type: application/json
{
"readings": [
{"voltage": 230.1, "current": 9.8, "frequency": 50.02},
{"voltage": 271.0, "current": 28.5, "frequency": 50.7}
]
}
Example response:
{
"predictions": [
{"anomaly": false, "score": 0.06},
{"anomaly": true, "score": 0.79}
],
"anomalies": 1,
"total": 2,
"served_by": "pod-xyz:12345"
}
Validation and invariants:
- Pydantic enforces types; empty batches are rejected with 400.
- as_feature_matrix ensures strict feature ordering to match the model’s training order (a sketch of this helper follows below).
- On model unavailability, return 503 to signal readiness gates.
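For intuition, here is a minimal sketch of what a strict‑ordering helper like as_feature_matrix can look like (illustrative only; the repo’s actual implementation may differ):

from typing import Dict, List, Sequence

import numpy as np

def as_feature_matrix(readings: List[Dict[str, float]], feature_names: Sequence[str]) -> np.ndarray:
    """Build an (n_readings, n_features) matrix with columns in the model's training order."""
    rows = []
    for i, reading in enumerate(readings):
        missing = [name for name in feature_names if name not in reading]
        if missing:
            raise ValueError(f"Reading {i} is missing features: {missing}")
        # Column order comes from feature_names, never from dict iteration order.
        rows.append([float(reading[name]) for name in feature_names])
    return np.asarray(rows, dtype=np.float64)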
Backend: Serving, Metrics, and Error Semantics
Core app (abridged; source: src/backend/app.py:1):
REQUEST_COUNT = Counter("requests_total", "Total HTTP requests", ["path", "method", "status"])
REQUEST_LATENCY = Histogram("request_latency_seconds", "Request latency (s)", ["path", "method"])
INFERENCE_LATENCY = Histogram("inference_latency_seconds", "Model inference latency (s)")
ANOMALY_COUNT = Counter("anomalies_total", "Total anomalies predicted")
@app.post("/predict", response_model=BatchPredictResponse)
def predict(req: BatchPredictRequest) -> BatchPredictResponse:
if not req.readings:
raise HTTPException(status_code=400, detail="No readings provided")
if model_instance is None:
raise HTTPException(status_code=503, detail="Model not ready")
start = time.perf_counter()
status_label = "200"
try:
X = as_feature_matrix([r.model_dump() for r in req.readings], model_instance.feature_names)
infer_t0 = time.perf_counter()
flags, scores = model_instance.predict_batch(X)
INFERENCE_LATENCY.observe(time.perf_counter() - infer_t0)
preds = [Prediction(anomaly=bool(flags[i]), score=float(scores[i])) for i in range(len(req.readings))]
anomalies = int(flags.sum())
ANOMALY_COUNT.inc(anomalies)
return BatchPredictResponse(predictions=preds, anomalies=anomalies, total=len(preds), served_by=INSTANCE_ID)
except HTTPException as he:
status_label = str(he.status_code); raise
except Exception:
status_label = "500"; raise
finally:
dt = time.perf_counter() - start
REQUEST_LATENCY.labels(path="/predict", method="POST").observe(dt)
REQUEST_COUNT.labels(path="/predict", method="POST", status=status_label).inc()
Notes on instrumentation and semantics:
- We separate total request latency from model inference latency; this allows you to spot overhead from JSON parsing, data marshaling, or network.
- We label counts by path/method/status to power RED/USE dashboards and simple SLOs (e.g., error rate < 1%).
- A global exception handler ensures 500s are counted even on unexpected code paths.
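A minimal sketch of such a catch‑all handler, assuming the FastAPI app and the Prometheus counters defined above (the repo’s handler may differ in shape):

from fastapi import Request
from fastapi.responses import JSONResponse

@app.exception_handler(Exception)
async def unhandled_exception(request: Request, exc: Exception) -> JSONResponse:
    # Count the failure so error-rate dashboards and SLOs see it, then return a stable body.
    REQUEST_COUNT.labels(path=request.url.path, method=request.method, status="500").inc()
    return JSONResponse(status_code=500, content={"detail": "internal server error"})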
Threading and CPU:
- scikit‑learn inference is CPU‑bound. We scale using Uvicorn workers (multiprocessing) and Kubernetes replicas; the GIL isn’t the bottleneck when workers are processes.
- Suggested env for predictable CPU usage: set OMP_NUM_THREADS=1 and MKL_NUM_THREADS=1 in production to avoid oversubscription when running multiple workers/pods.
Histogram buckets and cardinality:
- Customize histogram buckets to your SLOs to make histogram_quantile stable. Keep label sets small to control time‑series cardinality and Prometheus load.
Example bucket customization:
REQUEST_LATENCY = Histogram(
"request_latency_seconds", "Request latency (s)", ["path", "method"],
buckets=(0.01, 0.02, 0.05, 0.1, 0.2, 0.5, 1.0)
)
Model Artifact Management
Artifacts live under MODEL_DIR (models locally, /models in containers). On first run, the service trains and persists an artifact, making cold‑start deterministic.
Artifact handling (source: src/backend/model.py:39):
def ensure_model(model_path: Path | str, config: ModelConfig | None = None, train_if_missing: bool = True) -> AnomalyModel:
path = Path(model_path)
cfg = config or ModelConfig()
if path.exists():
model = joblib.load(path)
return AnomalyModel(model, cfg)
if not train_if_missing:
raise FileNotFoundError(f"Model artifact not found at {path}")
path.parent.mkdir(parents=True, exist_ok=True)
model = train_isolation_forest(cfg)
joblib.dump(model, path)
return AnomalyModel(model, cfg)
Production variants:
- Swap the local filesystem for a PVC in Kubernetes or a remote artifact store (S3/GCS) with a digest‑addressed path and integrity check (see the sketch below).
- Gate startup on model availability (TRAIN_IF_MISSING=0) and fail fast if the artifact is missing.
- Record model metadata (config, hash, training data hash) next to the artifact for lineage.
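As an example of the remote‑store variant, here is a hedged sketch of pulling a digest‑addressed artifact from S3 and verifying it before loading (bucket, key layout, and function name are assumptions, not part of the repo):

import hashlib
from pathlib import Path

import boto3
import joblib

def fetch_and_verify_artifact(bucket: str, key: str, expected_sha256: str, dest: Path):
    """Download a model artifact and refuse to load it if the digest does not match."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    boto3.client("s3").download_file(bucket, key, str(dest))
    digest = hashlib.sha256(dest.read_bytes()).hexdigest()
    if digest != expected_sha256:
        raise RuntimeError(f"Artifact digest mismatch: got {digest}, expected {expected_sha256}")
    return joblib.load(dest)

# e.g. fetch_and_verify_artifact("ml-artifacts", "iso_forest/<sha256>/iso_forest.joblib",
#                                expected_sha256="<sha256>", dest=Path("/models/iso_forest.joblib"))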
Artifact promotion workflow:
- Train offline; store artifact with immutable digest and metadata JSON (config, data hashes, metrics).
- Canary new artifact with 1 replica; compare error/latency and anomaly base rate vs. baseline.
- Promote by scaling up canary and scaling down baseline or using traffic splitting with Argo Rollouts.
The UI: Streaming, Visualization, and Load Generation
The UI lets you explore the system’s behavior and also stress test it. It simulates readings, batches requests, and displays rolling windows with anomaly overlays and server instance distribution.
Highlights (source: src/ui/main.py):
- Sidebar controls: backend URL, batch size, interval, anomaly rate, and “new connection each request” to improve load distribution across pods (disables HTTP keep‑alive pinning).
- Actions: Generate Batch, Start/Stop Streaming, Inject Extreme Reading, Concurrent Burst, and Sustained Bursts (with workers and cycles).
- Charts: line chart for signals; red triangle markers for anomalies; bar chart of “served_by” instance to visualize request spread across replicas.
Why “new connection each request” matters: HTTP keep‑alive can pin a Streamlit (single process) client to a single pod behind a ClusterIP/Service due to 5‑tuple hashing. Setting Connection: close on each request approximates a fairer fan‑out across pods at the cost of extra TCP overhead. For production, consider a proper load balancer with connection‑level balancing or a service mesh.
Additional UI internals:
- Maintains a rolling deque (WINDOW=200) of recent points with anomaly overlay; aggregates served_by to visualize load distribution.
- Burst features use ThreadPoolExecutor; when pushing very high RPS from one host, watch ephemeral port exhaustion and OS limits (a minimal burst sketch follows below).
- “Inject Extreme Reading” posts a synthetic outlier to validate end‑to‑end behavior and verify alerting/visualization.
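The burst idea boils down to a thread pool plus per‑request Connection: close so requests spread across pods. A self‑contained sketch against the contract above (not the repo’s UI code; the URL and batch shape are assumptions):

from concurrent.futures import ThreadPoolExecutor

import requests

BACKEND_URL = "http://localhost:8000"
BATCH = {"readings": [{"voltage": 230.0, "current": 10.0, "frequency": 50.0}] * 32}

def one_request(_: int) -> str:
    resp = requests.post(
        f"{BACKEND_URL}/predict",
        json=BATCH,
        headers={"Connection": "close"},  # no keep-alive, so each request may land on a different pod
        timeout=5,
    )
    resp.raise_for_status()
    return resp.json().get("served_by", "unknown")

with ThreadPoolExecutor(max_workers=16) as pool:
    served_by = list(pool.map(one_request, range(200)))
print({pod: served_by.count(pod) for pod in set(served_by)})  # request spread across replicas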
Local Development and Tooling
We use uv with pyproject.toml to pin dependencies and streamline local runs.
Key commands:
- Install: uv sync --extra dev
- Run API: uv run python -m src.backend.main or uv run uvicorn src.backend.app:app --host 0.0.0.0 --port 8000
- Run UI: uv run streamlit run src/ui/main.py
- Lint/format: uv run python scripts/dev.py fix (optional helper)
Typing and linting:
- mypy enforces type safety on src/; gradually tighten it in critical paths (API surface, model I/O).
- ruff provides fast lint + formatting; keep a consistent style to reduce PR churn.
Containerization
We build two containers — a full backend image with the ML stack and a lean UI image that depends only on Streamlit + Requests.
Backend Dockerfile (source: docker/Dockerfile.backend):
# syntax=docker/dockerfile:1.7
FROM python:3.13-slim AS base
ENV PYTHONDONTWRITEBYTECODE=1 \
PYTHONUNBUFFERED=1 \
PIP_NO_CACHE_DIR=1 \
PORT=8000 \
MODEL_DIR=/models \
MODEL_NAME=iso_forest.joblib \
TRAIN_IF_MISSING=1 \
VIRTUAL_ENV=/app/.venv \
PATH="/app/.venv/bin:$PATH"
WORKDIR /app
RUN pip install --no-cache-dir --upgrade pip \
&& pip install --no-cache-dir uv
COPY pyproject.toml README.md uv.lock ./
COPY src ./src
RUN uv --version \
&& uv sync --frozen
RUN groupadd -r app && useradd -r -g app app \
&& mkdir -p /models \
&& chown -R app:app /app /models
USER app
EXPOSE 8000
CMD ["sh", "-c", "uvicorn src.backend.app:app --host 0.0.0.0 --port 8000 --workers ${WORKERS:-1}"]
UI Dockerfile (source: docker/Dockerfile.ui):
# syntax=docker/dockerfile:1.7
FROM python:3.13-slim AS base
ENV PYTHONDONTWRITEBYTECODE=1 \
PYTHONUNBUFFERED=1 \
PIP_NO_CACHE_DIR=1 \
VIRTUAL_ENV=/app/.venv \
PATH="/app/.venv/bin:$PATH" \
BACKEND_URL=http://localhost:8000 \
STREAMLIT_SERVER_PORT=8501
WORKDIR /app
RUN pip install --no-cache-dir --upgrade pip \
&& pip install --no-cache-dir uv
COPY src/__init__.py ./src/__init__.py
COPY src/ui ./src/ui
RUN uv pip install --system streamlit==1.49.1 requests==2.32.4
EXPOSE 8501
CMD ["streamlit", "run", "src/ui/main.py", "--server.port=8501", "--server.address=0.0.0.0"]
Notes:
- Non‑root user (app) and a writable /models directory for artifacts in the backend.
- The UI image avoids installing the heavy ML stack, reducing build time and attack surface.
- Use the WORKERS env var to tune Uvicorn worker processes for the available CPU cores.
Supply‑chain and runtime hardening:
- Pin base image digests; generate SBOMs; scan images in CI.
- Set OMP_NUM_THREADS=1 and MKL_NUM_THREADS=1 in the backend image to avoid CPU oversubscription.
- Consider readOnlyRootFilesystem: true and dropping Linux capabilities where possible.
Docker Compose
Local multi‑service development (source: docker/compose.yaml):
services:
backend:
build:
context: ..
dockerfile: docker/Dockerfile.backend
image: anomaly-backend
environment:
- TRAIN_IF_MISSING=1
- MODEL_DIR=/models
- POD_NAME=compose-backend
- WORKERS=2
volumes:
- ../models:/models
ports:
- "8000:8000"
ui:
build:
context: ..
dockerfile: docker/Dockerfile.ui
image: anomaly-ui
environment:
- BACKEND_URL=http://backend:8000
ports:
- "8501:8501"
depends_on:
- backend
networks:
default:
name: anomaly-net
Run both: cd docker && docker compose up --build.
Testing variations locally:
- Scale WORKERS=4 and compare p95 at fixed RPS (a minimal load‑test sketch follows below).
- Delete ./models/iso_forest.joblib to exercise the cold‑start training path.
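The repo ships scripts/load.py for this; the sketch below is a stand‑alone equivalent (not the repo’s script) that reports p50/p95/p99 at a fixed concurrency so you can compare WORKERS settings:

import time
from concurrent.futures import ThreadPoolExecutor

import requests

URL = "http://localhost:8000/predict"
PAYLOAD = {"readings": [{"voltage": 230.0, "current": 10.0, "frequency": 50.0}] * 16}

def timed_call(_: int) -> float:
    t0 = time.perf_counter()
    requests.post(URL, json=PAYLOAD, timeout=5).raise_for_status()
    return time.perf_counter() - t0

with ThreadPoolExecutor(max_workers=8) as pool:
    latencies = sorted(pool.map(timed_call, range(500)))

def pct(p: float) -> float:
    return latencies[min(int(p * len(latencies)), len(latencies) - 1)]

print(f"p50={pct(0.50)*1000:.1f}ms  p95={pct(0.95)*1000:.1f}ms  p99={pct(0.99)*1000:.1f}ms")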
Kubernetes: Deploy, Observe, and Scale
We provide a Deployment + Service for both apps and an HPA for the backend. Apply via Kustomize: kubectl apply -k k8s/.
Backend deployment (source: k8s/backend-deployment.yaml):
apiVersion: apps/v1
kind: Deployment
metadata:
name: anomaly-backend
namespace: anomaly
labels:
app: anomaly-backend
spec:
replicas: 1
selector:
matchLabels:
app: anomaly-backend
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1
maxUnavailable: 0
template:
metadata:
labels:
app: anomaly-backend
spec:
containers:
- name: app
image: anomaly-backend:latest
imagePullPolicy: IfNotPresent
ports:
- containerPort: 8000
name: http
env:
- name: TRAIN_IF_MISSING
value: "1"
- name: MODEL_DIR
value: "/models"
- name: WORKERS
value: "2"
- name: POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: NODE_NAME
valueFrom:
fieldRef:
fieldPath: spec.nodeName
readinessProbe:
httpGet:
path: /health
port: 8000
initialDelaySeconds: 5
periodSeconds: 10
timeoutSeconds: 2
livenessProbe:
httpGet:
path: /health
port: 8000
initialDelaySeconds: 10
periodSeconds: 20
timeoutSeconds: 2
resources:
requests:
cpu: "200m"
memory: "256Mi"
limits:
cpu: "1"
memory: "512Mi"
volumeMounts:
- name: model-store
mountPath: /models
volumes:
- name: model-store
emptyDir: {}
Horizontal Pod Autoscaler (source: k8s/backend-hpa.yaml):
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: anomaly-backend-hpa
namespace: anomaly
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: anomaly-backend
minReplicas: 1
maxReplicas: 5
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 60
Backend service (source: k8s/backend-service.yaml):
apiVersion: v1
kind: Service
metadata:
name: anomaly-backend
namespace: anomaly
labels:
app: anomaly-backend
spec:
type: ClusterIP
selector:
app: anomaly-backend
ports:
- name: http
port: 8000
targetPort: 8000
UI deployment (source: k8s/ui-deployment.yaml):
apiVersion: apps/v1
kind: Deployment
metadata:
name: anomaly-ui
namespace: anomaly
labels:
app: anomaly-ui
spec:
replicas: 1
selector:
matchLabels:
app: anomaly-ui
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1
maxUnavailable: 0
template:
metadata:
labels:
app: anomaly-ui
spec:
containers:
- name: ui
image: anomaly-ui:latest
imagePullPolicy: IfNotPresent
ports:
- containerPort: 8501
name: http
env:
- name: BACKEND_URL
value: http://anomaly-backend:8000
readinessProbe:
httpGet:
path: /
port: 8501
initialDelaySeconds: 5
periodSeconds: 10
timeoutSeconds: 2
livenessProbe:
tcpSocket:
port: 8501
initialDelaySeconds: 10
periodSeconds: 20
timeoutSeconds: 2
resources:
requests:
cpu: "100m"
memory: "128Mi"
limits:
cpu: "500m"
memory: "256Mi"
UI service (source: k8s/ui-service.yaml):
apiVersion: v1
kind: Service
metadata:
name: anomaly-ui
namespace: anomaly
labels:
app: anomaly-ui
spec:
type: ClusterIP
selector:
app: anomaly-ui
ports:
- name: http
port: 8501
targetPort: 8501
Notes and production tips:
- On Kubernetes, prefer a single Uvicorn worker per container and let HPA/KEDA add pods; use multiple workers per pod only with a clear reason (startup cost/bin-packing) and cap OMP_NUM_THREADS/MKL_NUM_THREADS to 1 to avoid CPU oversubscription.
- HPA on CPU is simple and effective for CPU‑bound inference. For request‑driven scaling (QPS or queue length), consider KEDA with event sources or custom metrics (requests in flight).
- Ensure metrics-server is installed; for kind, patch insecure TLS as in the README.
- Tune requests/limits to achieve a useful utilization target. Too‑low limits can cause throttling; too‑high requests can starve scheduling.
- Prefer a readinessProbe on /health over /metrics; treat model availability as a readiness condition.
Diagram: K8s Objects and Flow
+-------------+ +-------------------+ +------------------------+
| Deployment | --> | ReplicaSet (Pods) | --> | HPA observes CPU usage |
+------+------+ +---------+---------+ +-----------+------------+
| | |
v v |
Pods expose :8000 ClusterIP Service |
| | |
+-------> kube-proxy/iptables (hash) <--------------+
Kustomize overlays (source: k8s/kustomization.yaml):
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: anomaly
resources:
- namespace.yaml
- backend-deployment.yaml
- backend-service.yaml
- backend-hpa.yaml
- ui-deployment.yaml
- ui-service.yaml
Namespace (source: k8s/namespace.yaml):
apiVersion: v1
kind: Namespace
metadata:
name: anomaly
labels:
name: anomaly
Observability: Metrics, SLI/SLOs, and PromQL
The backend emits Prometheus metrics at /metrics:
- requests_total{path,method,status}: RED metrics (rate/errors/duration).
- request_latency_seconds_bucket{path,method} (+ _sum, _count): request latency histograms.
- inference_latency_seconds_bucket (+ _sum, _count): pure model time.
- anomalies_total: running count of predicted anomalies.
Example SLOs:
- Availability: /predict error rate below 1%, computed from requests_total by status.
- Latency: p95 of request_latency_seconds on /predict within your chosen budget.
- Capacity rule of thumb: for a CPU‑bound synchronous service, keep average CPU utilization under roughly 70–75% to protect p95/p99 from queuing blow‑ups; the HPA targets 60% here to leave headroom.
Reliability and Failure Modes
- Empty batches return 400; invalid types return 422 (Pydantic).
- A missing model returns 503 until the artifact is present or auto‑trained.
- The global exception handler increments requests_total{status="500"} and returns a stable JSON body for clients.
- Timeouts: the client defaults to 5s; adjust based on batch sizes and latency budgets.
Resilience extensions:
- Add circuit breaking/retries at the client or service mesh layer (a client‑side retry sketch follows the logging example below).
- Expose a lightweight /live probe separate from /health readiness (for faster restarts).
- Emit structured logs (already using structlog); attach instance, pod, and node for correlation.
Structured logging example:
logger.info("predict", instance=INSTANCE_ID, pod=POD_NAME, total=len(req.readings), anomalies=anomalies)
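For the retry side, here is a sketch of client‑side retries with exponential backoff using requests’ standard adapter (a generic pattern, not code from the repo; /predict is effectively idempotent, so retrying POST on transient 5xx is safe here):

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

session = requests.Session()
retry = Retry(
    total=3,                              # at most 3 retries
    backoff_factor=0.2,                   # 0.2s, 0.4s, 0.8s between attempts
    status_forcelist=(502, 503, 504),     # retry transient upstream/readiness failures
    allowed_methods=frozenset({"POST"}),  # opt in, since POST is not retried by default
)
session.mount("http://", HTTPAdapter(max_retries=retry))
resp = session.post(
    "http://anomaly-backend:8000/predict",
    json={"readings": [{"voltage": 230.0, "current": 10.0, "frequency": 50.0}]},
    timeout=5,
)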
Security and Hardening
- Non‑root containers; explicit resource requests/limits; probes.
- Avoid including the ML stack in the UI image.
- For production: ingress with TLS, authentication/authorization, rate limiting, and secrets management for external artifact stores.
- Image scanning and SBOM generation in CI.
Secrets and config management:
- Use a Secret for credentials (e.g., a remote artifact store) and a ConfigMap for non‑sensitive configs (thresholds, feature lists).
- Mount via env vars or volumes; avoid embedding secrets in images.
CI/CD and Testing Strategy
While this project focuses on systems integration, a robust setup would include:
- Unit tests around schema conversions (as_feature_matrix), model scoring sign, and endpoint error paths.
- Contract tests that POST known payloads and assert the response schema and monotonicity of scores (a sketch follows after the testing pyramid below).
- Performance tests gated on p95 thresholds per commit.
- CI pipeline: lint (ruff), typecheck (mypy), test (pytest), build images, push to a registry, deploy to a staging namespace via Kustomize overlays, smoke test, then promote.
Testing pyramid tailored to this service:
- Unit: as_feature_matrix, score polarity, simulator generation bounds.
- API: schema conformance, empty payloads, large batch behavior.
- Load: enforce p95/p99 targets at representative RPS.
- E2E: UI -> backend flow; anomaly injection path.
- Upgrade: canary two backends with different model digests and verify parity.
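A contract‑test sketch using FastAPI’s TestClient, assuming the app is importable as src.backend.app:app (assertions mirror the schema above; adapt the import to the repo layout):

from fastapi.testclient import TestClient

from src.backend.app import app  # assumption: same module uvicorn serves

client = TestClient(app)

def test_empty_batch_is_rejected():
    resp = client.post("/predict", json={"readings": []})
    assert resp.status_code == 400

def test_predict_contract():
    payload = {"readings": [{"voltage": 230.1, "current": 9.8, "frequency": 50.02}]}
    resp = client.post("/predict", json=payload)
    assert resp.status_code == 200
    body = resp.json()
    assert set(body) >= {"predictions", "anomalies", "total"}
    assert body["total"] == 1 and len(body["predictions"]) == 1
    assert isinstance(body["predictions"][0]["anomaly"], bool)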
Extensibility: Beyond the Minimal Service
- Model registry and lineage: back your artifacts by a registry (MLflow, WandB, or custom) with immutable digests and promotion flows.
- Data drift and concept drift: track input feature distributions and anomaly rate; trigger alerts or shadow retraining.
- Event‑driven streaming: swap the synchronous HTTP flow for Kafka/NATS ingestion + consumer workers; scale with KEDA on lag.
- Inference servers: the same contract can be implemented with BentoML, Ray Serve, or LitServe while preserving schema and metrics.
- Real thresholds: convert scores to probabilities via calibration; expose thresholds per segment or dynamic thresholds from control charts.
Streaming architectures:
- Replace synchronous HTTP with Kafka ingestion and consumer workers; scale with KEDA on lag; UI subscribes via WebSocket/SSE.
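A sketch of a consumer worker under that design, using kafka-python (topic, group, broker address, and batching policy are assumptions; today’s repo is HTTP‑only):

import json

import requests
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "grid-readings",                      # assumed topic carrying one reading per message
    bootstrap_servers="kafka:9092",
    group_id="anomaly-workers",
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
    enable_auto_commit=True,
)

buffer = []
for message in consumer:
    buffer.append(message.value)
    if len(buffer) >= 64:                 # micro-batch to amortize per-request overhead
        requests.post("http://anomaly-backend:8000/predict",
                      json={"readings": buffer}, timeout=5)
        buffer.clear()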
End‑to‑End Run
Bare metal (no containers):
- Install deps: uv sync --extra dev.
- Backend: uv run python -m src.backend.main.
- UI: uv run streamlit run src/ui/main.py.
- Generate batches or start streaming from the UI; optionally run uv run python scripts/load.py ... for load.
Docker Compose:
- cd docker && docker compose up --build.
- UI at http://localhost:8501 (points to the backend).
- Optionally run load against http://localhost:8000.
Kubernetes (kind):
- kind create cluster --name anomaly.
- Install metrics‑server (and patch it for kind); verify kubectl top nodes works.
- Build and load images into kind.
- kubectl apply -k k8s/.
- Port‑forward the UI: kubectl -n anomaly port-forward svc/anomaly-ui 8501:8501.
- Optionally port‑forward the backend and run the load generator.
- Watch kubectl -n anomaly get hpa -w and kubectl -n anomaly get pods -w while increasing RPS.
Smoke test checklist:
- /health returns status=ok; features match DEFAULT_FEATURES.
- First run creates the model artifact; subsequent runs load it (no retrain).
- UI shows anomalies with a non‑zero anomaly rate or after “Inject Extreme Reading”.
- /metrics counters and histograms move when generating load (a minimal scripted smoke test follows below).
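A scripted version of the checklist, assuming a local or port‑forwarded backend at localhost:8000 (adjust the URL to your setup):

import requests

BASE = "http://localhost:8000"

health = requests.get(f"{BASE}/health", timeout=5).json()
assert health.get("status") == "ok", health

pred = requests.post(
    f"{BASE}/predict",
    json={"readings": [{"voltage": 271.0, "current": 28.5, "frequency": 50.7}]},
    timeout=5,
).json()
assert pred["total"] == 1, pred

metrics = requests.get(f"{BASE}/metrics", timeout=5).text
assert "requests_total" in metrics and "request_latency_seconds" in metrics
print("smoke test passed:", pred)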
Appendix
Environment variables:
- Backend: WORKERS, MODEL_DIR, MODEL_NAME, TRAIN_IF_MISSING, POD_NAME, NODE_NAME.
- UI: BACKEND_URL, STREAMLIT_SERVER_PORT.
Key files to explore:
- src/backend/app.py
- src/backend/api.py
- src/backend/model.py
- src/backend/sim.py
- src/ui/main.py
- scripts/load.py
- docker/Dockerfile.backend, docker/Dockerfile.ui, docker/compose.yaml
- k8s/*
Closing
This project demonstrates the full lifecycle from a simple notebook‑idea model to a production‑ready microservice with observability and autoscaling. The same patterns scale to richer models and larger fleets: keep contracts explicit, instrument everything, design for scale‑out, and treat artifacts and configuration as first‑class citizens.