DBN Production Architecture

Status: Approved by DBN · Target Environment: Microsoft Azure · Database: MS SQL Server (Azure SQL)

This page maps every component in the DBN-approved architecture diagram to a concrete technical implementation using our FastAPI RAG application.


Approved Architecture — Full Mermaid Reconstruction

The diagram below faithfully reproduces the approved DBN architecture and annotates each node with its implementation technology.

flowchart LR
    subgraph LOCAL["🖥️  Local Computers"]
        FF["📄 Flat Files\n(CSV / Excel)"]
        API_SRC["🔌 API Connection\n(External Systems)"]
    end

    subgraph SECURITY["🛡️  Security Perimeter"]
        FW["🔒 Firewall &\nAnti-Virus\n(Azure Front Door / WAF)"]
    end

    subgraph AZURE["☁️  Microsoft Azure"]

        subgraph INGESTION["Data Ingestion"]
            IE["⚙️  Injection Engine\n(Azure Data Factory)"]
            DBN_DW[("🗄️  DBN DW\n(MS SQL Server\nOn-Premise / Azure SQL)")]
        end

        subgraph STAGING["Staging Environment / SQL Mgt. Studio"]
            STAGE[("🗃️  Staging DB\n(Azure SQL Database\nPremium Tier)")]
        end

        MLE["🤖 Machine Learning Engine\n(Azure Machine Learning)"]

        subgraph ANALYTICS["Analytics Engine"]
            AE[("📊  Analytics DB\n(Azure SQL Database)\nmsmeloans / dimensional models")]
        end

        subgraph API_LAYER["NL Processor / SQL Generator"]
            NLP["⚡  FastAPI App\n(Azure Container Apps)\n/api/v1/chat/ask"]
            RULES[("🟣  Rules Based Engine\n(ChromaDB Vector Store)\nDDL + SQL + Docs")]
            VALIDATOR["✅  SQL Validator\n(read-only role +\nVanna pre-execution check)"]
        end

        subgraph SEC_LAYER["Security Layer"]
            TOKEN["🔑  Anti-Forgery Token & TLS\n(Azure Managed Cert +\nEntra ID JWT Validation)"]
        end

        CLOUD_OUT["🌐  Consumer Interface\n(Dashboard / BI Tool / REST Client)"]
    end

    %% Local → Azure ingestion
    FF --> FW
    API_SRC --> FW
    FW --> IE
    IE --> DBN_DW
    IE --> STAGE

    %% Staging → ML → Analytics
    STAGE -.->|"ML training data"| MLE
    MLE -.->|"pred_default_prob\nrisk_score"| AE
    STAGE --> AE

    %% Analytics → API layer
    AE --> NLP
    NLP --> RULES
    RULES -->|"top-10 similar\ncontext records"| NLP
    NLP --> VALIDATOR
    VALIDATOR -->|"validated T-SQL"| AE

    %% Security → output
    NLP --> TOKEN
    TOKEN --> CLOUD_OUT

    %% Styling
    classDef azure    fill:#e8f0fe,stroke:#3F51B5,color:#1a237e
    classDef db       fill:#ede7f6,stroke:#7C4DFF,color:#311b92
    classDef security fill:#fff3e0,stroke:#FF6D00,color:#e65100
    classDef api      fill:#e8f5e9,stroke:#00C853,color:#1b5e20
    classDef local    fill:#f5f5f5,stroke:#9e9e9e,color:#424242

    class IE,MLE,NLP,VALIDATOR azure
    class DBN_DW,STAGE,AE,RULES db
    class FW,TOKEN security
    class CLOUD_OUT api
    class FF,API_SRC local

Component-by-Component Implementation

Zone 1 — Local Computers (Data Sources)

| Approved Component | Implementation | Technology |
|---|---|---|
| Flat Files | CSV/Excel exports from PFI partners | Azure Blob Storage (staging area) |
| API Connection | External loan management system feeds | Azure API Management gateway |
| Firewall / Anti-Virus | Network perimeter protection | Azure Front Door + WAF Policy |
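A minimal sketch of how a PFI flat-file export could be pushed into the Blob Storage staging area. The storage account URL, container name, and file path are illustrative placeholders, not part of the approved design; the sketch assumes a Managed Identity or developer credential is available.

# Illustrative only: upload a flat-file export to the Blob Storage staging area.
from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient

account_url = "https://dbnstagingstore.blob.core.windows.net"   # hypothetical account
container = "pfi-flat-files"                                     # hypothetical container

# Managed Identity / developer credential instead of account keys
service = BlobServiceClient(account_url, credential=DefaultAzureCredential())
blob = service.get_blob_client(container=container, blob="loans_2024_q4.csv")

with open("exports/loans_2024_q4.csv", "rb") as fh:
    blob.upload_blob(fh, overwrite=True)   # ADF picks the file up from here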

Zone 2 — Injection Engine & DBN DW

| Approved Component | Implementation | Technology |
|---|---|---|
| Injection Engine | ETL/ELT orchestration pipeline | Azure Data Factory |
| DBN DW | Existing on-premise or Azure-hosted warehouse | MS SQL Server (on-prem or Azure SQL MI) |
| Staging / SQL Mgt. Studio | Data landing zone and quality checks | Azure SQL Database (Premium) + Azure Data Studio |

Staging Process

Data lands in the Staging Database first, where cleansing rules, deduplication, and schema enforcement are applied. Only validated records proceed to the Analytics Engine.
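A minimal sketch of such a quality gate, assuming illustrative column names rather than the actual msmeloans schema. In production this logic would typically live in an ADF data flow or a stored procedure, not a standalone script.

# Illustrative staging-side quality gate; column names are placeholders.
import pandas as pd

def cleanse_staged_batch(df: pd.DataFrame) -> pd.DataFrame:
    # Deduplication on the business key
    df = df.drop_duplicates(subset=["loan_id"])

    # Schema enforcement: coerce types, drop rows that cannot be parsed
    df["loan_amount"] = pd.to_numeric(df["loan_amount"], errors="coerce")
    df["disbursement_date"] = pd.to_datetime(df["disbursement_date"], errors="coerce")
    df = df.dropna(subset=["loan_id", "loan_amount", "disbursement_date"])

    # Basic cleansing rule: amounts must be positive
    return df[df["loan_amount"] > 0]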


Zone 3 — Machine Learning Engine

| Approved Component | Implementation | Technology |
|---|---|---|
| Machine Learning Engine | Model training + batch inference pipeline | Azure Machine Learning |

The ML Engine produces the following columns that are written back into the Analytics Engine:

| ML-Generated Column | Type | Description |
|---|---|---|
| pred_default_prob | FLOAT | Probability of loan default (0.0–1.0) |
| predicted_default | BIT | Binary default flag |
| risk_score | FLOAT | Composite risk score |
| risk_category | VARCHAR | Low Risk / Medium Risk / High Risk |
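An illustrative batch-scoring step showing how these four columns could be derived. The model object, feature columns, decision threshold, and risk_score formula are all assumptions for the sketch; the real pipeline runs inside Azure Machine Learning.

# Illustrative scoring step; model, features, thresholds are placeholders.
import pandas as pd

def score_batch(model, features: pd.DataFrame) -> pd.DataFrame:
    scored = features.copy()
    scored["pred_default_prob"] = model.predict_proba(features)[:, 1]              # 0.0–1.0
    scored["predicted_default"] = (scored["pred_default_prob"] >= 0.5).astype(int) # BIT flag
    scored["risk_score"] = scored["pred_default_prob"] * 100                        # assumed composite formula
    scored["risk_category"] = pd.cut(
        scored["pred_default_prob"],
        bins=[0.0, 0.33, 0.66, 1.0],          # assumed banding
        labels=["Low Risk", "Medium Risk", "High Risk"],
        include_lowest=True,
    )
    return scored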

Zone 4 — Analytics Engine (MS SQL Server)

The Analytics Engine is the curated, ML-enriched Azure SQL Database that the FastAPI application queries.

Connecting Vanna to Azure SQL Database
-- Connection string format (for services/vanna_service.py)
Driver={ODBC Driver 18 for SQL Server};
Server=tcp:dbn-analytics-server.database.windows.net,1433;
Database=DBNAnalyticsDB;
Authentication=ActiveDirectoryMsi;   -- Uses Managed Identity (no password!)
Encrypt=yes;
TrustServerCertificate=no;

Recommended database user setup (read-only):

-- Create a dedicated API service account
CREATE USER dbn_api_user FROM EXTERNAL PROVIDER;  -- Entra ID managed identity

-- Grant only SELECT privilege — no DML or DDL
ALTER ROLE db_datareader ADD MEMBER dbn_api_user;
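A quick smoke test (illustrative, not part of the deployment) to confirm the service account really is read-only. It assumes the connection string above is exposed as settings.database_url and that msmeloans exists in the Analytics DB; SQL Server rejects the write with a permission error that pyodbc surfaces as an exception.

# Illustrative read-only smoke test for dbn_api_user
import pyodbc
from core.config import settings

conn = pyodbc.connect(settings.database_url, autocommit=True)
cur = conn.cursor()

cur.execute("SELECT TOP 1 * FROM msmeloans;")   # should succeed
print(cur.fetchone())

try:
    cur.execute("DELETE FROM msmeloans;")        # should be denied by the role
except pyodbc.Error as exc:
    print("Write blocked as expected:", exc)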

Zone 5 — NL Processor / SQL Generator (FastAPI)

This is the heart of our application. It maps to three distinct sub-components:

5a. FastAPI Application — NL Processor

POST /api/v1/chat/ask   →   VannaService.generate_sql()   →   OpenAI GPT-4o

Deployed as an Azure Container Apps service running the Docker image built from our Dockerfile.
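A simplified sketch of the ask endpoint. VannaService.generate_sql() is the method referenced above; the router module path, request model, and response shape here are illustrative rather than the application's actual code, which adds auth, error handling, and response schemas.

# Simplified sketch of the /api/v1/chat/ask endpoint
from fastapi import APIRouter
from pydantic import BaseModel
from services.vanna_service import VannaService

router = APIRouter(prefix="/api/v1/chat")
vanna_service = VannaService()

class AskRequest(BaseModel):
    question: str

@router.post("/ask")
def ask(req: AskRequest):
    sql = vanna_service.generate_sql(req.question)   # NL -> T-SQL via GPT-4o
    rows = vanna_service.vn.run_sql(sql)             # executes against the Analytics DB
    return {"sql": sql, "rows": rows.to_dict(orient="records")}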

5b. ChromaDB Vector Store — Rules Based Engine

The vector store acts as the semantic rules engine by holding three categories of knowledge:

pie title ChromaDB Training Data Composition (recommended)
    "DDL Schema Definitions" : 20
    "Verified Question-SQL Pairs" : 60
    "Business Documentation" : 20

5c. SQL Validator

Before any generated SQL is sent to the Analytics Engine:

  1. Vanna pre-check: Vanna validates SQL syntax and strips markdown code fences.
  2. Role enforcement: The database user (dbn_api_user) holds only db_datareader, so DROP, INSERT, UPDATE, and DELETE are physically impossible.
  3. Query timeout: Azure SQL enforces a Connection Timeout=30 to prevent runaway queries.
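A minimal application-side guard that could sit in front of query execution, layered on top of the database role. It is not the actual Vanna pre-check, just a sketch of fence stripping plus a statement whitelist.

# Illustrative pre-execution guard (not the Vanna internal check)
import re

FORBIDDEN = re.compile(r"\b(INSERT|UPDATE|DELETE|DROP|ALTER|TRUNCATE|MERGE|EXEC)\b", re.IGNORECASE)

def validate_generated_sql(sql: str) -> str:
    # Strip markdown code fences the LLM may wrap around the statement
    sql = re.sub(r"^```(?:sql)?|```$", "", sql.strip(), flags=re.MULTILINE).strip()

    if not sql.upper().startswith(("SELECT", "WITH")):
        raise ValueError("Only SELECT queries are allowed")
    if FORBIDDEN.search(sql):
        raise ValueError("Statement contains a forbidden keyword")
    return sql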

Zone 6 — Security Layer

| Security Control | Implementation | Standard |
|---|---|---|
| TLS / HTTPS | Azure Container Apps managed certificate | TLS 1.3 |
| Anti-Forgery Token | JWT Bearer token via Azure Entra ID | OAuth 2.0 / OIDC |
| CORS | FastAPI CORSMiddleware (approved origins only) | Same-Origin Policy |
| Secret Management | Azure Key Vault + Managed Identity | Zero secrets in code |
| API Rate Limiting | Azure API Management throttling policy | Configurable per consumer |
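A sketch of the two application-level controls (CORS and bearer token). The allowed origin and the token check are placeholders; real validation verifies the Entra ID signature, issuer, and audience rather than just the header's presence.

# Illustrative CORS + bearer-token wiring; origin and validation are placeholders
from fastapi import Depends, FastAPI, HTTPException
from fastapi.middleware.cors import CORSMiddleware
from fastapi.security import HTTPAuthorizationCredentials, HTTPBearer

app = FastAPI()

app.add_middleware(
    CORSMiddleware,
    allow_origins=["https://dashboard.dbn.example"],   # approved frontend domains only
    allow_methods=["GET", "POST"],
    allow_headers=["Authorization", "Content-Type"],
)

bearer = HTTPBearer()

def require_token(creds: HTTPAuthorizationCredentials = Depends(bearer)):
    # Placeholder: production code validates the Entra ID JWT signature, issuer, audience
    if not creds.credentials:
        raise HTTPException(status_code=401, detail="Missing bearer token")
    return creds.credentials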

Production Migration Checklist

Checklist for transitioning from the SQLite POC to the production MS SQL Server deployment:

  • Infrastructure: Provision Azure SQL Database (Premium tier, 4 vCores minimum)
  • Networking: Create Azure Virtual Network; place Container App and SQL in same VNet
  • Identity: Create Managed Identity for the Container App; grant db_datareader role
  • Key Vault: Store OPENAI_API_KEY in Azure Key Vault; reference via Managed Identity
  • Data Migration: Run ADF pipeline to load msmeloans from DBN DW into Analytics DB
  • Driver Update: Add msodbcsql18 to Dockerfile; install pyodbc in requirements.txt
  • Code Update: Update VannaService.__init__ to use pyodbc connection (see below)
  • Retrain DDL: Clear old SQLite DDL from ChromaDB; re-train with T-SQL DDL from Azure SQL
  • Retrain SQL: Replace LIMIT with TOP in all SQL training examples (T-SQL dialect)
  • Security Review: Restrict CORS allow_origins to approved frontend domains
  • Load Test: Verify <2s p95 latency on /api/v1/chat/ask under 10 concurrent users

Code: MS SQL Server Integration

Updated requirements.txt

pyodbc>=5.1.0

Updated Dockerfile

Dockerfile (MS SQL Server variant)
FROM python:3.10-slim

WORKDIR /app

# Install ODBC Driver 18 for SQL Server
RUN apt-get update && apt-get install -y \
    build-essential curl gnupg unixodbc-dev \
    && curl https://packages.microsoft.com/keys/microsoft.asc | apt-key add - \
    && curl https://packages.microsoft.com/config/debian/11/prod.list \
       > /etc/apt/sources.list.d/mssql-release.list \
    && apt-get update \
    && ACCEPT_EULA=Y apt-get install -y msodbcsql18 \
    && rm -rf /var/lib/apt/lists/*

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

EXPOSE 8080
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8080"]

Updated services/vanna_service.py

services/vanna_service.py — MS SQL Server
import pyodbc
import pandas as pd
from vanna.chromadb import ChromaDB_VectorStore
from vanna.openai import OpenAI_Chat
from core.config import settings

class MyVanna(ChromaDB_VectorStore, OpenAI_Chat):
    def __init__(self, config=None):
        ChromaDB_VectorStore.__init__(self, config=config)
        OpenAI_Chat.__init__(self, config=config)

class VannaService:
    def __init__(self):
        self.vn = MyVanna(config={
            'api_key': settings.openai_api_key,
            'model': settings.vanna_model
        })

        # ── Connect to Azure SQL Database via pyodbc ──────
        self.conn = pyodbc.connect(
            settings.database_url,
            autocommit=True      # (1)
        )

        # Override Vanna's run_sql with pyodbc-backed version
        def run_sql_mssql(sql: str) -> pd.DataFrame:
            return pd.read_sql_query(sql, self.conn)

        self.vn.run_sql = run_sql_mssql  # (2)
  1. autocommit=True prevents implicit transactions from holding read locks.
  2. Vanna's run_sql is a pluggable method — overriding it is the official pattern for custom database backends.

Re-Training DDL from Azure SQL

After connecting to Azure SQL, run the following to auto-populate ChromaDB with the full schema:

scripts/retrain_mssql_ddl.py
import httpx

BASE = "http://127.0.0.1:8000/api/v1"

# Step 1 — auto-ingest schema (calls INFORMATION_SCHEMA internally)
resp = httpx.post(f"{BASE}/training/from-database")
print(resp.json())

# Step 2 — verify
data = httpx.get(f"{BASE}/training/data").json()
print(f"Total training records: {data['count']}")

T-SQL vs SQLite Syntax

After switching to MS SQL Server, all SQL training examples must use T-SQL:

| SQLite (POC) | T-SQL (Production) |
|---|---|
| SELECT ... LIMIT 10 | SELECT TOP 10 ... |
| strftime('%Y', date) | YEAR(date) |
| GROUP_CONCAT(col) | STRING_AGG(col, ',') |
| CAST(x AS TEXT) | CAST(x AS NVARCHAR) |
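As an example of re-adding one verified pair in the T-SQL dialect after the migration, using Vanna's train() directly on the vn instance. The question text and column names are illustrative.

# Illustrative re-training of a question/SQL pair in T-SQL form
vn.train(
    question="Show the 10 largest loans",
    sql="SELECT TOP 10 loan_id, loan_amount FROM msmeloans ORDER BY loan_amount DESC;",  # TOP, not LIMIT
)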