DBN Production Architecture¶
Status: Approved by DBN · Target Environment: Microsoft Azure · Database: MS SQL Server (Azure SQL)
This page maps every component in the DBN-approved architecture diagram to a concrete technical implementation using our FastAPI RAG application.
Approved Architecture — Full Mermaid Reconstruction¶
The diagram below faithfully reproduces the approved DBN architecture and annotates each node with its implementation technology.
```mermaid
flowchart LR
    subgraph LOCAL["🖥️ Local Computers"]
        FF["📄 Flat Files\n(CSV / Excel)"]
        API_SRC["🔌 API Connection\n(External Systems)"]
    end

    subgraph SECURITY["🛡️ Security Perimeter"]
        FW["🔒 Firewall &\nAnti-Virus\n(Azure Front Door / WAF)"]
    end

    subgraph AZURE["☁️ Microsoft Azure"]
        subgraph INGESTION["Data Ingestion"]
            IE["⚙️ Injection Engine\n(Azure Data Factory)"]
            DBN_DW[("🗄️ DBN DW\n(MS SQL Server\nOn-Premise / Azure SQL)")]
        end
        subgraph STAGING["Staging Environment / SQL Mgt. Studio"]
            STAGE[("🗃️ Staging DB\n(Azure SQL Database\nPremium Tier)")]
        end
        MLE["🤖 Machine Learning Engine\n(Azure Machine Learning)"]
        subgraph ANALYTICS["Analytics Engine"]
            AE[("📊 Analytics DB\n(Azure SQL Database)\nmsmeloans / dimensional models")]
        end
        subgraph API_LAYER["NL Processor / SQL Generator"]
            NLP["⚡ FastAPI App\n(Azure Container Apps)\n/api/v1/chat/ask"]
            RULES[("🟣 Rules Based Engine\n(ChromaDB Vector Store)\nDDL + SQL + Docs")]
            VALIDATOR["✅ SQL Validator\n(read-only role +\nVanna pre-execution check)"]
        end
        subgraph SEC_LAYER["Security Layer"]
            TOKEN["🔑 Anti-Forgery Token & TLS\n(Azure Managed Cert +\nEntra ID JWT Validation)"]
        end
        CLOUD_OUT["🌐 Consumer Interface\n(Dashboard / BI Tool / REST Client)"]
    end

    %% Local → Azure ingestion
    FF --> FW
    API_SRC --> FW
    FW --> IE
    IE --> DBN_DW
    IE --> STAGE

    %% Staging → ML → Analytics
    STAGE -.->|"ML training data"| MLE
    MLE -.->|"pred_default_prob\nrisk_score"| AE
    STAGE --> AE

    %% Analytics → API layer
    AE --> NLP
    NLP --> RULES
    RULES -->|"top-10 similar\ncontext records"| NLP
    NLP --> VALIDATOR
    VALIDATOR -->|"validated T-SQL"| AE

    %% Security → output
    NLP --> TOKEN
    TOKEN --> CLOUD_OUT

    %% Styling
    classDef azure fill:#e8f0fe,stroke:#3F51B5,color:#1a237e
    classDef db fill:#ede7f6,stroke:#7C4DFF,color:#311b92
    classDef security fill:#fff3e0,stroke:#FF6D00,color:#e65100
    classDef api fill:#e8f5e9,stroke:#00C853,color:#1b5e20
    classDef local fill:#f5f5f5,stroke:#9e9e9e,color:#424242

    class IE,MLE,NLP,VALIDATOR azure
    class DBN_DW,STAGE,AE,RULES db
    class FW,TOKEN security
    class CLOUD_OUT api
    class FF,API_SRC local
```
Component-by-Component Implementation¶
Zone 1 — Local Computers (Data Sources)¶
| Approved Component | Implementation | Technology |
|---|---|---|
| Flat Files | CSV/Excel exports from PFI partners | Azure Blob Storage (staging area) |
| API Connection | External loan management system feeds | Azure API Management gateway |
| Firewall / Anti-Virus | Network perimeter protection | Azure Front Door + WAF Policy |
Zone 2 — Injection Engine & DBN DW¶
| Approved Component | Implementation | Technology |
|---|---|---|
| Injection Engine | ETL/ELT orchestration pipeline | Azure Data Factory |
| DBN DW | Existing on-premise or Azure-hosted warehouse | MS SQL Server (on-prem or Azure SQL MI) |
| Staging / SQL Mgt. Studio | Data landing zone and quality checks | Azure SQL Database (Premium) + Azure Data Studio |
Staging Process
Data lands in the Staging Database first where cleansing rules, deduplication, and schema enforcement are applied. Only validated records proceed to the Analytics Engine.
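The cleansing step above can be sketched in plain Python. The column names (`loan_id`, `pfi_name`, `loan_amount`) are illustrative assumptions, not the actual DBN staging schema; the real rules run inside the ADF pipeline:

```python
# Hedged sketch: staging-style cleansing (deduplication + schema enforcement).
# Column names are illustrative, not the actual DBN schema.

REQUIRED_COLUMNS = {"loan_id", "pfi_name", "loan_amount"}

def cleanse(records: list[dict]) -> list[dict]:
    """Drop malformed rows and duplicates, keeping the first occurrence."""
    seen = set()
    valid = []
    for row in records:
        if not REQUIRED_COLUMNS <= row.keys():
            continue  # schema enforcement: reject rows missing required columns
        if row["loan_id"] in seen:
            continue  # deduplication on the business key
        seen.add(row["loan_id"])
        valid.append(row)
    return valid

raw = [
    {"loan_id": 1, "pfi_name": "PFI-A", "loan_amount": 5000.0},
    {"loan_id": 1, "pfi_name": "PFI-A", "loan_amount": 5000.0},  # duplicate
    {"loan_id": 2, "pfi_name": "PFI-B"},                         # missing column
]
clean = cleanse(raw)
print(len(clean))  # 1 — only the first record survives
```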
Zone 3 — Machine Learning Engine¶
| Approved Component | Implementation | Technology |
|---|---|---|
| Machine Learning Engine | Model training + batch inference pipeline | Azure Machine Learning |
The ML Engine produces the following columns that are written back into the Analytics Engine:
| ML-Generated Column | Type | Description |
|---|---|---|
| `pred_default_prob` | FLOAT | Probability of loan default (0.0–1.0) |
| `predicted_default` | BIT | Binary default flag |
| `risk_score` | FLOAT | Composite risk score |
| `risk_category` | VARCHAR | Low Risk / Medium Risk / High Risk |
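To make the relationship between the last two columns concrete, here is a minimal sketch of how `risk_score` could be bucketed into `risk_category`. The 0.33 / 0.66 thresholds are assumptions for illustration, not DBN's actual cut-offs:

```python
# Hedged sketch: deriving risk_category from risk_score.
# Thresholds are illustrative assumptions, not the production model's cut-offs.

def risk_category(risk_score: float) -> str:
    if risk_score < 0.33:
        return "Low Risk"
    if risk_score < 0.66:
        return "Medium Risk"
    return "High Risk"

print(risk_category(0.72))  # High Risk
```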
Zone 4 — Analytics Engine (MS SQL Server)¶
The Analytics Engine is the curated, ML-enriched Azure SQL Database that the FastAPI application queries.
```
-- Connection string format (for services/vanna_service.py)
Driver={ODBC Driver 18 for SQL Server};
Server=tcp:dbn-analytics-server.database.windows.net,1433;
Database=DBNAnalyticsDB;
Authentication=ActiveDirectoryMsi;  -- Uses Managed Identity (no password!)
Encrypt=yes;
TrustServerCertificate=no;
```
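In application code it is often cleaner to assemble this string from parts, for example when loading individual values from configuration. A minimal sketch (the server and database names are the example values above, not production secrets):

```python
# Hedged sketch: assembling the pyodbc connection string shown above.
# Server/database names are the documentation's example values.

parts = {
    "Driver": "{ODBC Driver 18 for SQL Server}",
    "Server": "tcp:dbn-analytics-server.database.windows.net,1433",
    "Database": "DBNAnalyticsDB",
    "Authentication": "ActiveDirectoryMsi",  # Managed Identity: no password stored
    "Encrypt": "yes",
    "TrustServerCertificate": "no",
}
conn_str = ";".join(f"{k}={v}" for k, v in parts.items())
print(conn_str)
```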
Recommended database user setup (read-only):
```sql
-- Create a dedicated API service account
CREATE USER dbn_api_user FROM EXTERNAL PROVIDER;  -- Entra ID managed identity

-- Grant only SELECT privilege — no DML or DDL
ALTER ROLE db_datareader ADD MEMBER dbn_api_user;
```
Zone 5 — NL Processor / SQL Generator (FastAPI)¶
This is the heart of our application. It maps to three distinct sub-components:
5a. FastAPI Application — NL Processor¶
Deployed as an Azure Container Apps service running the Docker image built from our Dockerfile.
5b. ChromaDB Vector Store — Rules Based Engine¶
The vector store acts as the semantic rules engine by holding three categories of knowledge:
```mermaid
pie title ChromaDB Training Data Composition (recommended)
    "DDL Schema Definitions" : 20
    "Verified Question-SQL Pairs" : 60
    "Business Documentation" : 20
```
5c. SQL Validator¶
Before any generated SQL is sent to the Analytics Engine:
- Vanna pre-check: Vanna validates SQL syntax and strips markdown code fences.
- Role enforcement: the database user (`dbn_api_user`) has only `db_datareader`, so `DROP`, `INSERT`, `UPDATE`, and `DELETE` are physically impossible.
- Query timeout: a 30-second timeout prevents runaway queries. Note that the connection string's `Connection Timeout=30` only bounds connection attempts; query runtime is capped via pyodbc's `timeout` attribute on the connection.
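The validator steps above can be sketched as a small application-side guard. This is defense in depth only (the `db_datareader` role is the real enforcement), and the function names are hypothetical, not the actual code in `services/vanna_service.py`:

```python
import re

# Hedged sketch: an application-side read-only guard that fails fast before
# any round-trip to Azure SQL. Function names are hypothetical.

FORBIDDEN = re.compile(
    r"\b(INSERT|UPDATE|DELETE|DROP|ALTER|TRUNCATE|EXEC|MERGE)\b", re.I
)

def strip_fences(sql: str) -> str:
    """Remove markdown code fences an LLM may wrap around generated SQL."""
    return re.sub(r"^```(?:sql)?\s*|\s*```$", "", sql.strip(), flags=re.I)

def validate_readonly(sql: str) -> str:
    sql = strip_fences(sql)
    if not sql.lstrip().upper().startswith(("SELECT", "WITH")):
        raise ValueError("Only SELECT statements are allowed")
    if FORBIDDEN.search(sql):
        raise ValueError("DML/DDL keyword detected")
    return sql

print(validate_readonly("```sql\nSELECT TOP 10 * FROM msmeloans\n```"))
```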
Zone 6 — Security Layer¶
| Security Control | Implementation | Standard |
|---|---|---|
| TLS / HTTPS | Azure Container Apps managed certificate | TLS 1.3 |
| Anti-Forgery Token | JWT Bearer token via Azure Entra ID | OAuth 2.0 / OIDC |
| CORS | FastAPI `CORSMiddleware` (approved origins only) | Same-Origin Policy |
| Secret Management | Azure Key Vault + Managed Identity | Zero secrets in code |
| API Rate Limiting | Azure API Management throttling policy | Configurable per consumer |
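To illustrate the anti-forgery token row, the sketch below inspects a JWT's claims using only the standard library. It deliberately does NOT verify the signature; production validation must check the signature against Entra ID's published JWKS keys via a JOSE library. The `api://dbn` audience value is a made-up placeholder:

```python
import base64
import json
import time

# Hedged sketch: reading JWT claims with the stdlib. This performs NO
# signature verification — real validation must verify against Entra ID's
# JWKS keys. The audience value below is a hypothetical placeholder.

def decode_claims(token: str) -> dict:
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))

def is_expired(claims: dict) -> bool:
    return claims.get("exp", 0) < time.time()

# Build a toy unsigned token just to demonstrate the claim checks
header = base64.urlsafe_b64encode(b'{"alg":"none"}').decode().rstrip("=")
body = base64.urlsafe_b64encode(
    json.dumps({"aud": "api://dbn", "exp": 0}).encode()
).decode().rstrip("=")

claims = decode_claims(f"{header}.{body}.")
print(is_expired(claims))  # True — an exp of 0 is long past
```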
Production Migration Checklist¶
Checklist for transitioning from the SQLite POC to the production MS SQL Server deployment:
- Infrastructure: Provision Azure SQL Database (Premium tier, 4 vCores minimum)
- Networking: Create an Azure Virtual Network; place the Container App and SQL in the same VNet
- Identity: Create a Managed Identity for the Container App; grant the `db_datareader` role
- Key Vault: Store `OPENAI_API_KEY` in Azure Key Vault; reference it via Managed Identity
- Data Migration: Run the ADF pipeline to load `msmeloans` from the DBN DW into the Analytics DB
- Driver Update: Add `msodbcsql18` to the `Dockerfile`; install `pyodbc` via `requirements.txt`
- Code Update: Update `VannaService.__init__` to use a `pyodbc` connection (see below)
- Retrain DDL: Clear old SQLite DDL from ChromaDB; re-train with T-SQL DDL from Azure SQL
- Retrain SQL: Replace `LIMIT` with `TOP` in all SQL training examples (T-SQL dialect)
- Security Review: Restrict CORS `allow_origins` to approved frontend domains
- Load Test: Verify <2 s p95 latency on `/api/v1/chat/ask` under 10 concurrent users
Code: MS SQL Server Integration¶
Updated requirements.txt¶
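The original dependency list is not reproduced here; a minimal sketch of the MS SQL additions (version pins are illustrative, adjust to your own policy):

```
# Additions for Azure SQL support (illustrative version pins)
pyodbc>=5.0
pandas>=2.0
# ...existing dependencies (fastapi, uvicorn, vanna, chromadb, httpx) unchanged
```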
Updated Dockerfile¶
```dockerfile
# Pin the bullseye (Debian 11) base so it matches the debian/11 Microsoft
# package list configured below; plain python:3.10-slim may track a newer Debian.
FROM python:3.10-slim-bullseye

WORKDIR /app

# Install ODBC Driver 18 for SQL Server
RUN apt-get update && apt-get install -y \
    build-essential curl gnupg unixodbc-dev \
    && curl https://packages.microsoft.com/keys/microsoft.asc | apt-key add - \
    && curl https://packages.microsoft.com/config/debian/11/prod.list \
       > /etc/apt/sources.list.d/mssql-release.list \
    && apt-get update \
    && ACCEPT_EULA=Y apt-get install -y msodbcsql18 \
    && rm -rf /var/lib/apt/lists/*

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

EXPOSE 8080
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8080"]
```
Updated services/vanna_service.py¶
```python
import pyodbc
import pandas as pd
from vanna.chromadb import ChromaDB_VectorStore
from vanna.openai import OpenAI_Chat

from core.config import settings


class MyVanna(ChromaDB_VectorStore, OpenAI_Chat):
    def __init__(self, config=None):
        ChromaDB_VectorStore.__init__(self, config=config)
        OpenAI_Chat.__init__(self, config=config)


class VannaService:
    def __init__(self):
        self.vn = MyVanna(config={
            'api_key': settings.openai_api_key,
            'model': settings.vanna_model
        })

        # ── Connect to Azure SQL Database via pyodbc ──────
        self.conn = pyodbc.connect(
            settings.database_url,
            autocommit=True  # (1)
        )

        # Override Vanna's run_sql with a pyodbc-backed version
        def run_sql_mssql(sql: str) -> pd.DataFrame:
            return pd.read_sql_query(sql, self.conn)

        self.vn.run_sql = run_sql_mssql  # (2)
        self.vn.run_sql_is_set = True    # tell Vanna a custom runner is configured
```

1. `autocommit=True` prevents implicit transactions from holding read locks.
2. Vanna's `run_sql` is a pluggable method; overriding it is the official pattern for custom database backends.
Re-Training DDL from Azure SQL¶
After connecting to Azure SQL, run the following to auto-populate ChromaDB with the full schema:
```python
import httpx

BASE = "http://127.0.0.1:8000/api/v1"

# Step 1 — auto-ingest schema (calls INFORMATION_SCHEMA internally)
resp = httpx.post(f"{BASE}/training/from-database")
print(resp.json())

# Step 2 — verify
data = httpx.get(f"{BASE}/training/data").json()
print(f"Total training records: {data['count']}")
```
T-SQL vs SQLite Syntax
After switching to MS SQL Server, all SQL training examples must use T-SQL:
| SQLite (POC) | T-SQL (Production) |
|---|---|
| `SELECT ... LIMIT 10` | `SELECT TOP 10 ...` |
| `strftime('%Y', date)` | `YEAR(date)` |
| `GROUP_CONCAT(col)` | `STRING_AGG(col, ',')` |
| `CAST(x AS TEXT)` | `CAST(x AS NVARCHAR)` |
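For the most mechanical of these rewrites, a small helper can speed up migrating stored training examples. This is a rough aid that only handles a trailing `LIMIT n`, not a full SQL transpiler; each rewritten example should still be reviewed by hand:

```python
import re

# Hedged sketch: rewrite a trailing "LIMIT n" into "SELECT TOP n".
# Handles only the simple case — review every rewritten example manually.

def limit_to_top(sql: str) -> str:
    m = re.search(r"\s+LIMIT\s+(\d+)\s*;?\s*$", sql, re.I)
    if not m:
        return sql  # nothing to rewrite
    n = m.group(1)
    sql = sql[: m.start()]  # drop the LIMIT clause
    return re.sub(r"^(\s*SELECT\b)", rf"\1 TOP {n}", sql, count=1, flags=re.I)

print(limit_to_top(
    "SELECT loan_amount FROM msmeloans ORDER BY loan_amount DESC LIMIT 10"
))  # SELECT TOP 10 loan_amount FROM msmeloans ORDER BY loan_amount DESC
```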