Project
FreightSense
A shipment triage tool that combines rule-based risk scoring with LLM recommendations to suggest delay interventions and record overrides.
Problem
When delayed shipments start piling up, operations teams often end up reviewing each one manually. FreightSense was built to shorten that decision loop by combining deterministic scoring with LLM-based reasoning, then keeping a clear audit trail when a human overrides the recommendation.
How It Works
FreightSense uses two layers:
- Rule-based scoring to measure delay risk, financial exposure, and benchmark-based severity.
- LLM reasoning to recommend an intervention and estimate potential savings in a structured format.
The output is meant for triage rather than full automation. A human still sees the recommendation, the reasoning, and any disagreement between the two layers before taking action.
System Overview
```
Browser dashboard
        |
        v
FastAPI service
 |- POST /api/evaluate
 |- POST /api/evaluate/{id}/override
 |- GET  /api/audit
 |- GET  /api/audit/{id}/overrides
 `- GET  /api/meta
        |
        +-- Layer 1: deterministic scoring
        +-- Layer 2: Groq LLM evaluation
        `-- SQLite audit log
```
Decision Logic
| Risk score | Recommendation | Typical trigger |
|---|---|---|
| >= 75 | EXPEDITE | High delay and high exposure |
| 50-74 | DISCOUNT | Moderate delay with retention risk |
| 25-49 | MONITOR | Some delay, not urgent yet |
| < 25 | NO_ACTION | Within acceptable variance |
If the deterministic layer and the LLM disagree, the UI shows both outputs so the operator can make the final call explicitly.
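The threshold mapping in the table above can be sketched as a small function. The function name is hypothetical; the real scoring lives inside the service's deterministic layer:

```python
def recommend(risk_score: float) -> str:
    """Map a 0-100 risk score to an intervention, per the decision table."""
    if risk_score >= 75:
        return "EXPEDITE"   # high delay and high exposure
    if risk_score >= 50:
        return "DISCOUNT"   # moderate delay with retention risk
    if risk_score >= 25:
        return "MONITOR"    # some delay, not urgent yet
    return "NO_ACTION"      # within acceptable variance
```

Keeping this mapping pure and deterministic is what makes disagreement with the LLM layer easy to detect and display.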
API Surface
POST /api/evaluate
Evaluates a shipment and returns the risk score, recommended action, confidence, reasoning, and estimated intervention impact.
```json
{
  "order_id": "ORD-00123",
  "customer_segment": "Corporate",
  "market": "USCA",
  "category_name": "Electronics",
  "shipping_mode": "Standard Class",
  "days_scheduled": 5,
  "days_actual_estimate": 9,
  "order_item_total": 1200.0,
  "profit_ratio": 0.18
}
```
POST /api/evaluate/{id}/override
Stores a human decision against an evaluation, including custom reasoning.
GET /api/audit
Returns the audit log for previous evaluations.
GET /api/audit/{id}/overrides
Returns the full override history for one evaluation.
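The project persists through aiosqlite; the same evaluation/override shape can be sketched synchronously with the stdlib sqlite3 module. Table and column names here are illustrative, not the service's actual schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE evaluations (
    id INTEGER PRIMARY KEY,
    order_id TEXT NOT NULL,
    risk_score REAL NOT NULL,
    recommendation TEXT NOT NULL
);
CREATE TABLE overrides (
    id INTEGER PRIMARY KEY,
    evaluation_id INTEGER NOT NULL REFERENCES evaluations(id),
    decision TEXT NOT NULL,
    reasoning TEXT NOT NULL,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP
);
""")

# Record an evaluation, then a human override against it.
cur = conn.execute(
    "INSERT INTO evaluations (order_id, risk_score, recommendation) VALUES (?, ?, ?)",
    ("ORD-00123", 82.0, "EXPEDITE"),
)
eval_id = cur.lastrowid
conn.execute(
    "INSERT INTO overrides (evaluation_id, decision, reasoning) VALUES (?, ?, ?)",
    (eval_id, "MONITOR", "Customer already contacted; expedite not needed."),
)

# GET /api/audit/{id}/overrides would return rows shaped like these.
rows = conn.execute(
    "SELECT decision, reasoning FROM overrides WHERE evaluation_id = ?", (eval_id,)
).fetchall()
```

Storing overrides as append-only rows against an evaluation id is what gives the audit trail its full history rather than a single "latest decision".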
GET /api/meta
Returns available categories and markets for the dashboard.
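FastAPI would normally express the evaluate payload as a Pydantic model; the stdlib dataclass below sketches the same shape. Field names come from the example request; the `delay_days` helper is an assumption for illustration:

```python
from dataclasses import dataclass

@dataclass
class EvaluateRequest:
    # Fields mirror the POST /api/evaluate example payload.
    order_id: str
    customer_segment: str
    market: str
    category_name: str
    shipping_mode: str
    days_scheduled: int
    days_actual_estimate: int
    order_item_total: float
    profit_ratio: float

    @property
    def delay_days(self) -> int:
        # Estimated slip versus the scheduled transit time (hypothetical helper).
        return self.days_actual_estimate - self.days_scheduled

req = EvaluateRequest(
    order_id="ORD-00123", customer_segment="Corporate", market="USCA",
    category_name="Electronics", shipping_mode="Standard Class",
    days_scheduled=5, days_actual_estimate=9,
    order_item_total=1200.0, profit_ratio=0.18,
)
```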
Benchmark Layer
The deterministic layer uses precomputed benchmark statistics from roughly 180k supply-chain records. Those benchmarks are compiled into data/benchmarks.json and loaded at startup so inference stays simple and fast at request time.
```shell
uv run python scripts/build_benchmarks.py
```
Each benchmark group stores:
- average scheduled days
- average delay days
- late-delivery rate
- average profit ratio
- sample size
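A benchmarks.json entry carrying those fields might look like the sketch below. Only the field list comes from the project; the grouping key and the `severity` calculation are assumptions about how the deterministic layer could use the data:

```python
import json

# Hypothetical slice of data/benchmarks.json, keyed by market + category.
raw = json.dumps({
    "USCA|Electronics": {
        "avg_scheduled_days": 4.8,
        "avg_delay_days": 1.6,
        "late_delivery_rate": 0.41,
        "avg_profit_ratio": 0.12,
        "sample_size": 3120,
    }
})
benchmarks = json.loads(raw)

def severity(group: str, observed_delay_days: float) -> float:
    """Delay severity relative to the group's historical average (illustrative)."""
    b = benchmarks[group]
    return observed_delay_days / max(b["avg_delay_days"], 0.1)

ratio = severity("USCA|Electronics", 4.0)  # 4 days late vs a 1.6-day average
```

Precomputing these aggregates offline means a request-time lookup is a dictionary access rather than a scan over 180k rows.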
Local Setup
Requirements
- Python 3.12+
- uv or pip
- A Groq API key
Install
```shell
git clone https://github.com/VedantAndhale/FreightSense.git
cd FreightSense
uv sync
```
Configure
```shell
cp .env.example .env
```
```
GROQ_API_KEY=gsk_...
DATABASE_URL=./freightsense.db
GROQ_MODEL=llama-3.3-70b-versatile
```
Run
```shell
uv run uvicorn main:app --reload
```
The operational dashboard is served at http://localhost:8000, and the API docs are available at http://localhost:8000/docs.
Deployment
FreightSense is set up for Google Cloud Run with a warm instance so the SQLite-backed audit log stays available between requests.
One-time GCP setup
```shell
GITHUB_ORG=your-org GITHUB_REPO=freightsense bash scripts/setup_gcp.sh
```
The setup script:
- enables required GCP APIs
- creates the Artifact Registry repository
- creates the service account used by GitHub Actions
- configures Workload Identity Federation
- stores GROQ_API_KEY in Secret Manager
Continuous deployment
Pushes to main build the image, push it to Artifact Registry, and deploy a new Cloud Run revision through GitHub Actions.
Project Structure
```
freightsense/
|- main.py
|- service.yaml
|- Dockerfile
|- pyproject.toml
|- app/
|  |- api/
|  |- core/
|  |- db/
|  `- static/
|- data/
|  |- benchmarks.json
|  `- DataCoSupplyChainDataset.csv
|- scripts/
|  |- build_benchmarks.py
|  |- setup_gcp.sh
|  `- test_groq.py
`- .github/workflows/deploy.yml
```
Tech Stack
| Layer | Technology |
|---|---|
| API | FastAPI |
| LLM inference | Groq API |
| Data + scoring | pandas + Python |
| Persistence | SQLite via aiosqlite |
| Containerization | Docker |
| Deployment | Google Cloud Run |
Outcome
FreightSense is a practical example of using an LLM as a decision support layer instead of a standalone answer engine. The value comes from pairing model output with explicit scoring, guardrails, and a human override path.