DBN Production Architecture

Status: Approved by DBN · Target Environment: Microsoft Azure · Database: MS SQL Server (Azure SQL)

This page maps every component in the DBN-approved architecture diagram to a concrete technical implementation using our FastAPI RAG application.


Approved Architecture — Full Mermaid Reconstruction

The diagram below faithfully reproduces the approved DBN architecture and annotates each node with its implementation technology.

flowchart LR
    subgraph LOCAL["🖥️  Local Computers"]
        FF["📄 Flat Files\n(CSV / Excel)"]
        API_SRC["🔌 API Connection\n(External Systems)"]
    end

    subgraph SECURITY["🛡️  Security Perimeter"]
        FW["🔒 Firewall &\nAnti-Virus\n(Azure Front Door / WAF)"]
    end

    subgraph AZURE["☁️  Microsoft Azure"]

        subgraph INGESTION["Data Ingestion"]
            IE["⚙️  Injection Engine\n(Azure Data Factory)"]
            DBN_DW[("🗄️  DBN DW\n(MS SQL Server\nOn-Premise / Azure SQL)")]
        end

        subgraph STAGING["Staging Environment / SQL Mgt. Studio"]
            STAGE[("🗃️  Staging DB\n(Azure SQL Database\nPremium Tier)")]
        end

        MLE["🤖 Machine Learning Engine\n(Azure Machine Learning)"]

        subgraph ANALYTICS["Analytics Engine"]
            AE[("📊  Analytics DB\n(Azure SQL Database)\nmsmeloans / dimensional models")]
        end

        subgraph API_LAYER["NL Processor / SQL Generator"]
            NLP["⚡  FastAPI App\n(Azure Container Apps)\n/api/v1/chat/ask"]
            RULES[("🟣  Rules Based Engine\n(ChromaDB Vector Store)\nDDL + SQL + Docs")]
            VALIDATOR["✅  SQL Validator\n(read-only role +\nVanna pre-execution check)"]
        end

        subgraph SEC_LAYER["Security Layer"]
            TOKEN["🔑  Anti-Forgery Token & TLS\n(Azure Managed Cert +\nEntra ID JWT Validation)"]
        end

        CLOUD_OUT["🌐  Consumer Interface\n(Dashboard / BI Tool / REST Client)"]
    end

    %% Local → Azure ingestion
    FF --> FW
    API_SRC --> FW
    FW --> IE
    IE --> DBN_DW
    IE --> STAGE

    %% Staging → ML → Analytics
    STAGE -.->|"ML training data"| MLE
    MLE -.->|"pred_default_prob\nrisk_score"| AE
    STAGE --> AE

    %% Analytics → API layer
    AE --> NLP
    NLP --> RULES
    RULES -->|"top-10 similar\ncontext records"| NLP
    NLP --> VALIDATOR
    VALIDATOR -->|"validated T-SQL"| AE

    %% Security → output
    NLP --> TOKEN
    TOKEN --> CLOUD_OUT

    %% Styling
    classDef azure    fill:#e8f0fe,stroke:#3F51B5,color:#1a237e
    classDef db       fill:#ede7f6,stroke:#7C4DFF,color:#311b92
    classDef security fill:#fff3e0,stroke:#FF6D00,color:#e65100
    classDef api      fill:#e8f5e9,stroke:#00C853,color:#1b5e20
    classDef local    fill:#f5f5f5,stroke:#9e9e9e,color:#424242

    class IE,MLE,NLP,VALIDATOR azure
    class DBN_DW,STAGE,AE,RULES db
    class FW,TOKEN security
    class CLOUD_OUT api
    class FF,API_SRC local

Component-by-Component Implementation

Zone 1 — Local Computers (Data Sources)

| Approved Component | Implementation | Technology |
|---|---|---|
| Flat Files | CSV/Excel exports from PFI partners | Azure Blob Storage (staging area) |
| API Connection | External loan management system feeds | Azure API Management gateway |
| Firewall / Anti-Virus | Network perimeter protection | Azure Front Door + WAF Policy |
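A minimal sketch of how a PFI flat-file export could be pushed into the Blob Storage staging area. The storage account URL, container name, and file path are illustrative placeholders, not part of the approved design; the sketch assumes a Managed Identity or developer credential is available.

# Illustrative only: upload a flat-file export to the Blob Storage staging area.
from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient

account_url = "https://dbnstagingstore.blob.core.windows.net"   # hypothetical account
container = "pfi-flat-files"                                     # hypothetical container

# Managed Identity / developer credential instead of account keys
service = BlobServiceClient(account_url, credential=DefaultAzureCredential())
blob = service.get_blob_client(container=container, blob="loans_2024_q4.csv")

with open("exports/loans_2024_q4.csv", "rb") as fh:
    blob.upload_blob(fh, overwrite=True)   # ADF picks the file up from here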

Zone 2 — Injection Engine & DBN DW

| Approved Component | Implementation | Technology |
|---|---|---|
| Injection Engine | ETL/ELT orchestration pipeline | Azure Data Factory |
| DBN DW | Existing on-premise or Azure-hosted warehouse | MS SQL Server (on-prem or Azure SQL MI) |
| Staging / SQL Mgt. Studio | Data landing zone and quality checks | Azure SQL Database (Premium) + Azure Data Studio |

Staging Process

Data lands in the Staging Database first, where cleansing rules, deduplication, and schema enforcement are applied. Only validated records proceed to the Analytics Engine.
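A minimal sketch of such a quality gate, assuming illustrative column names rather than the actual msmeloans schema. In production this logic would typically live in an ADF data flow or a stored procedure, not a standalone script.

# Illustrative staging-side quality gate; column names are placeholders.
import pandas as pd

def cleanse_staged_batch(df: pd.DataFrame) -> pd.DataFrame:
    # Deduplication on the business key
    df = df.drop_duplicates(subset=["loan_id"])

    # Schema enforcement: coerce types, drop rows that cannot be parsed
    df["loan_amount"] = pd.to_numeric(df["loan_amount"], errors="coerce")
    df["disbursement_date"] = pd.to_datetime(df["disbursement_date"], errors="coerce")
    df = df.dropna(subset=["loan_id", "loan_amount", "disbursement_date"])

    # Basic cleansing rule: amounts must be positive
    return df[df["loan_amount"] > 0]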


Zone 3 — Machine Learning Engine

| Approved Component | Implementation | Technology |
|---|---|---|
| Machine Learning Engine | Model training + batch inference pipeline | Azure Machine Learning |

The ML Engine produces the following columns that are written back into the Analytics Engine:

| ML-Generated Column | Type | Description |
|---|---|---|
| pred_default_prob | FLOAT | Probability of loan default (0.0–1.0) |
| predicted_default | BIT | Binary default flag |
| risk_score | FLOAT | Composite risk score |
| risk_category | VARCHAR | Low Risk / Medium Risk / High Risk |
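An illustrative batch-scoring step showing how these four columns could be derived. The model object, feature columns, decision threshold, and risk_score formula are all assumptions for the sketch; the real pipeline runs inside Azure Machine Learning.

# Illustrative scoring step; model, features, thresholds are placeholders.
import pandas as pd

def score_batch(model, features: pd.DataFrame) -> pd.DataFrame:
    scored = features.copy()
    scored["pred_default_prob"] = model.predict_proba(features)[:, 1]              # 0.0–1.0
    scored["predicted_default"] = (scored["pred_default_prob"] >= 0.5).astype(int) # BIT flag
    scored["risk_score"] = scored["pred_default_prob"] * 100                        # assumed composite formula
    scored["risk_category"] = pd.cut(
        scored["pred_default_prob"],
        bins=[0.0, 0.33, 0.66, 1.0],          # assumed banding
        labels=["Low Risk", "Medium Risk", "High Risk"],
        include_lowest=True,
    )
    return scored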

Zone 4 — Analytics Engine (MS SQL Server)

The Analytics Engine is the curated, ML-enriched Azure SQL Database that the FastAPI application queries.

Connecting Vanna to Azure SQL Database
-- Connection string format (for services/vanna_service.py)
Driver={ODBC Driver 18 for SQL Server};
Server=tcp:dbn-analytics-server.database.windows.net,1433;
Database=DBNAnalyticsDB;
Authentication=ActiveDirectoryMsi;   -- Uses Managed Identity (no password!)
Encrypt=yes;
TrustServerCertificate=no;

Recommended database user setup (read-only):

-- Create a dedicated API service account
CREATE USER dbn_api_user FROM EXTERNAL PROVIDER;  -- Entra ID managed identity

-- Grant only SELECT privilege — no DML or DDL
ALTER ROLE db_datareader ADD MEMBER dbn_api_user;
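A quick smoke test (illustrative, not part of the deployment) to confirm the service account really is read-only. It assumes the connection string above is exposed as settings.database_url and that msmeloans exists in the Analytics DB; SQL Server rejects the write with a permission error that pyodbc surfaces as an exception.

# Illustrative read-only smoke test for dbn_api_user
import pyodbc
from core.config import settings

conn = pyodbc.connect(settings.database_url, autocommit=True)
cur = conn.cursor()

cur.execute("SELECT TOP 1 * FROM msmeloans;")   # should succeed
print(cur.fetchone())

try:
    cur.execute("DELETE FROM msmeloans;")        # should be denied by the role
except pyodbc.Error as exc:
    print("Write blocked as expected:", exc)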

Zone 5 — NL Processor / SQL Generator (FastAPI)

This is the heart of our application. It maps to three distinct sub-components:

5a. FastAPI Application — NL Processor

POST /api/v1/chat/ask   →   VannaService.generate_sql()   →   OpenAI GPT-4o

Deployed as an Azure Container Apps service running the Docker image built from our Dockerfile.
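A simplified sketch of the ask endpoint. VannaService.generate_sql() is the method referenced above; the router module path, request model, and response shape here are illustrative rather than the application's actual code, which adds auth, error handling, and response schemas.

# Simplified sketch of the /api/v1/chat/ask endpoint
from fastapi import APIRouter
from pydantic import BaseModel
from services.vanna_service import VannaService

router = APIRouter(prefix="/api/v1/chat")
vanna_service = VannaService()

class AskRequest(BaseModel):
    question: str

@router.post("/ask")
def ask(req: AskRequest):
    sql = vanna_service.generate_sql(req.question)   # NL -> T-SQL via GPT-4o
    rows = vanna_service.vn.run_sql(sql)             # executes against the Analytics DB
    return {"sql": sql, "rows": rows.to_dict(orient="records")}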

5b. ChromaDB Vector Store — Rules Based Engine

The vector store acts as the semantic rules engine by holding three categories of knowledge:

pie title ChromaDB Training Data Composition (recommended)
    "DDL Schema Definitions" : 20
    "Verified Question-SQL Pairs" : 60
    "Business Documentation" : 20

5c. SQL Validator

Before any generated SQL is sent to the Analytics Engine:

  1. Vanna pre-check: Vanna validates SQL syntax and strips markdown code fences.
  2. Role enforcement: The database user (dbn_api_user) holds only db_datareader, so DROP, INSERT, UPDATE, and DELETE are physically impossible.
  3. Query timeout: Azure SQL enforces a Connection Timeout=30 to prevent runaway queries.
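A minimal application-side guard that could sit in front of query execution, layered on top of the database role. It is not the actual Vanna pre-check, just a sketch of fence stripping plus a statement whitelist.

# Illustrative pre-execution guard (not the Vanna internal check)
import re

FORBIDDEN = re.compile(r"\b(INSERT|UPDATE|DELETE|DROP|ALTER|TRUNCATE|MERGE|EXEC)\b", re.IGNORECASE)

def validate_generated_sql(sql: str) -> str:
    # Strip markdown code fences the LLM may wrap around the statement
    sql = re.sub(r"^```(?:sql)?|```$", "", sql.strip(), flags=re.MULTILINE).strip()

    if not sql.upper().startswith(("SELECT", "WITH")):
        raise ValueError("Only SELECT queries are allowed")
    if FORBIDDEN.search(sql):
        raise ValueError("Statement contains a forbidden keyword")
    return sql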

Zone 6 — Security Layer

| Security Control | Implementation | Standard |
|---|---|---|
| TLS / HTTPS | Azure Container Apps managed certificate | TLS 1.3 |
| Anti-Forgery Token | JWT Bearer token via Azure Entra ID | OAuth 2.0 / OIDC |
| CORS | FastAPI CORSMiddleware (approved origins only) | Same-Origin Policy |
| Secret Management | Azure Key Vault + Managed Identity | Zero secrets in code |
| API Rate Limiting | Azure API Management throttling policy | Configurable per consumer |
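A sketch of the two application-level controls (CORS and bearer token). The allowed origin and the token check are placeholders; real validation verifies the Entra ID signature, issuer, and audience rather than just the header's presence.

# Illustrative CORS + bearer-token wiring; origin and validation are placeholders
from fastapi import Depends, FastAPI, HTTPException
from fastapi.middleware.cors import CORSMiddleware
from fastapi.security import HTTPAuthorizationCredentials, HTTPBearer

app = FastAPI()

app.add_middleware(
    CORSMiddleware,
    allow_origins=["https://dashboard.dbn.example"],   # approved frontend domains only
    allow_methods=["GET", "POST"],
    allow_headers=["Authorization", "Content-Type"],
)

bearer = HTTPBearer()

def require_token(creds: HTTPAuthorizationCredentials = Depends(bearer)):
    # Placeholder: production code validates the Entra ID JWT signature, issuer, audience
    if not creds.credentials:
        raise HTTPException(status_code=401, detail="Missing bearer token")
    return creds.credentials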

Production Migration Checklist

Checklist for transitioning from the SQLite POC to the production MS SQL Server deployment:

  • Infrastructure: Provision Azure SQL Database (Premium tier, 4 vCores minimum)
  • Networking: Create Azure Virtual Network; place Container App and SQL in same VNet
  • Identity: Create Managed Identity for the Container App; grant db_datareader role
  • Key Vault: Store OPENAI_API_KEY in Azure Key Vault; reference via Managed Identity
  • Data Migration: Run ADF pipeline to load msmeloans from DBN DW into Analytics DB
  • Driver Update: Add msodbcsql18 to Dockerfile; install pyodbc in requirements.txt
  • Code Update: Update VannaService.__init__ to use pyodbc connection (see below)
  • Retrain DDL: Clear old SQLite DDL from ChromaDB; re-train with T-SQL DDL from Azure SQL
  • Retrain SQL: Replace LIMIT with TOP in all SQL training examples (T-SQL dialect)
  • Security Review: Restrict CORS allow_origins to approved frontend domains
  • Load Test: Verify <2s p95 latency on /api/v1/chat/ask under 10 concurrent users

Code: MS SQL Server Integration

Updated requirements.txt

pyodbc>=5.1.0

Updated Dockerfile

Dockerfile (MS SQL Server variant)
FROM python:3.10-slim

WORKDIR /app

# Install ODBC Driver 18 for SQL Server
RUN apt-get update && apt-get install -y \
    build-essential curl gnupg unixodbc-dev \
    && curl https://packages.microsoft.com/keys/microsoft.asc | apt-key add - \
    && curl https://packages.microsoft.com/config/debian/11/prod.list \
       > /etc/apt/sources.list.d/mssql-release.list \
    && apt-get update \
    && ACCEPT_EULA=Y apt-get install -y msodbcsql18 \
    && rm -rf /var/lib/apt/lists/*

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

EXPOSE 8080
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8080"]

Updated services/vanna_service.py

services/vanna_service.py — MS SQL Server
import pyodbc
import pandas as pd
from vanna.chromadb import ChromaDB_VectorStore
from vanna.openai import OpenAI_Chat
from core.config import settings

class MyVanna(ChromaDB_VectorStore, OpenAI_Chat):
    def __init__(self, config=None):
        ChromaDB_VectorStore.__init__(self, config=config)
        OpenAI_Chat.__init__(self, config=config)

class VannaService:
    def __init__(self):
        self.vn = MyVanna(config={
            'api_key': settings.openai_api_key,
            'model': settings.vanna_model
        })

        # ── Connect to Azure SQL Database via pyodbc ──────
        self.conn = pyodbc.connect(
            settings.database_url,
            autocommit=True      # (1)
        )

        # Override Vanna's run_sql with pyodbc-backed version
        def run_sql_mssql(sql: str) -> pd.DataFrame:
            return pd.read_sql_query(sql, self.conn)

        self.vn.run_sql = run_sql_mssql  # (2)
  1. autocommit=True prevents implicit transactions from holding read locks.
  2. Vanna's run_sql is a pluggable method — overriding it is the official pattern for custom database backends.

Re-Training DDL from Azure SQL

After connecting to Azure SQL, run the following to auto-populate ChromaDB with the full schema:

scripts/retrain_mssql_ddl.py
import httpx

BASE = "http://127.0.0.1:8000/api/v1"

# Step 1 — auto-ingest schema (calls INFORMATION_SCHEMA internally)
resp = httpx.post(f"{BASE}/training/from-database")
print(resp.json())

# Step 2 — verify
data = httpx.get(f"{BASE}/training/data").json()
print(f"Total training records: {data['count']}")

T-SQL vs SQLite Syntax

After switching to MS SQL Server, all SQL training examples must use T-SQL:

| SQLite (POC) | T-SQL (Production) |
|---|---|
| SELECT ... LIMIT 10 | SELECT TOP 10 ... |
| strftime('%Y', date) | YEAR(date) |
| GROUP_CONCAT(col) | STRING_AGG(col, ',') |
| CAST(x AS TEXT) | CAST(x AS NVARCHAR) |
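As an example of re-adding one verified pair in the T-SQL dialect after the migration, using Vanna's train() directly on the vn instance. The question text and column names are illustrative.

# Illustrative re-training of a question/SQL pair in T-SQL form
vn.train(
    question="Show the 10 largest loans",
    sql="SELECT TOP 10 loan_id, loan_amount FROM msmeloans ORDER BY loan_amount DESC;",  # TOP, not LIMIT
)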