Skip to main content

Databricks Agent Bricks Refactoring - Summary

What Was Done

This system has been refactored to support Databricks Agent Bricks (Mosaic AI Agent Framework), enabling production-ready deployment on Databricks with full governance, monitoring, and scalability.

New Files Created

1. Core Agent Infrastructure

  • agents/mlflow_base.py - MLflow Pyfunc base classes for agents

    • MLflowAgentBase: Base class with tracing, Model Serving compatibility
    • MLflowChainAgent: LangChain integration with automatic logging
    • Automatic signature inference
    • Unity Catalog registration methods
    • Model Serving deployment helpers
  • agents/mlflow_classifier.py - Production classifier agent

    • Hybrid keyword + LLM classification
    • MLflow tracing for all calls
    • Unity Catalog ready
    • Can be deployed to Model Serving
    • Includes registration script

2. Deployment & Operations

  • databricks/deployment.py - Deployment automation

    • AgentDeploymentManager class
    • Register agents to Unity Catalog
    • Deploy to Model Serving endpoints
    • Multi-agent endpoints with traffic splitting
    • Endpoint testing and monitoring
    • Auto-scaling configuration
  • databricks/evaluation.py - Quality assurance

    • AgentEvaluator class
    • Automated evaluation pipelines
    • A/B testing between versions
    • Metrics: accuracy, precision, recall, F1, latency
    • Confusion matrix generation
    • Feedback loop integration with Delta Lake

3. Interactive Development

  • databricks/notebooks/01_agent_bricks_quickstart.py - Databricks notebook

    • Step-by-step deployment guide
    • Local testing examples
    • Unity Catalog registration
    • Model Serving deployment
    • Evaluation examples
    • Delta Lake queries
    • Monitoring and observability
  • databricks/README.md - Comprehensive documentation

    • Architecture diagrams
    • Deployment workflows
    • API usage examples
    • Cost considerations
    • Troubleshooting guide
    • Best practices

4. Dependencies

  • Updated requirements-cpu.txt with:
    • mlflow>=2.10.0 - MLflow tracking and serving
    • databricks-agents>=0.1.0 - Agent Framework
    • databricks-vectorsearch>=0.22.0 - Vector search
    • langgraph>=0.0.20 - Stateful agent graphs
    • databricks-sdk>=0.18.0 - Databricks API client

5. Updated Existing Files

  • README.md - Added Databricks Agent Bricks section
  • install.sh - Detects and uses requirements-cpu.txt

Key Features Added

1. MLflow Integration

✅ Automatic tracing of all agent calls ✅ LLM request/response logging ✅ Metrics tracking (latency, tokens, cost) ✅ Experiment tracking and versioning ✅ Model registry integration

2. Unity Catalog Governance

✅ Centralized model registration ✅ Permissions and access control ✅ Data lineage tracking ✅ Version management ✅ Tag-based organization

3. Model Serving

✅ REST API endpoints ✅ Auto-scaling (scale-to-zero capable) ✅ A/B testing with traffic splitting ✅ Multi-agent pipelines ✅ Monitoring and alerting

4. Evaluation Framework

✅ Automated quality metrics ✅ Regression detection ✅ Version comparison ✅ Confusion matrices ✅ Feedback loop from production

5. Production Ready

✅ CPU-only compatibility (no GPU needed) ✅ Enterprise monitoring ✅ Cost optimization (keyword filtering before LLM) ✅ Error handling and retries ✅ Comprehensive logging

Deployment Architecture

┌──────────────────────────────────────────────────────────┐
│ Databricks Workspace │
│ │
│ ┌──────────────────┐ ┌────────────────────┐ │
│ │ Unity Catalog │◄─────┤ MLflow Tracking │ │
│ │ - Policy Class. │ │ - Experiments │ │
│ │ - Sentiment An. │ │ - Traces │ │
│ │ - Advocacy Gen. │ │ - Metrics │ │
│ └────────┬─────────┘ └────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────┐ │
│ │ Model Serving Endpoints │ │
│ │ ┌────────────┐ ┌─────────────────────┐ │ │
│ │ │ Classifier │ │ Sentiment Analyzer │ │ │
│ │ │ (Small) │ │ (Small) │ │ │
│ │ │ Scale-to-0 │ │ Scale-to-0 │ │ │
│ │ └────────────┘ └─────────────────────┘ │ │
│ └─────────────┬────────────────────────────┘ │
│ │ │
└────────────────┼──────────────────────────────────────────┘


┌────────────────┐
│ External API │
│ FastAPI App │
└────────────────┘

Usage Examples

Register Agent to Unity Catalog

from agents.mlflow_classifier import PolicyClassifierAgent
from databricks.deployment import AgentDeploymentManager

manager = AgentDeploymentManager()

version = manager.register_agent(
agent_class=PolicyClassifierAgent,
agent_name="policy_classifier",
description="Classifies documents for oral health topics",
tags={"team": "advocacy"}
)

Deploy to Model Serving

endpoint_url = manager.deploy_agent(
agent_name="policy_classifier",
endpoint_name="policy-classifier-prod",
workload_size="Small",
scale_to_zero=True
)

Evaluate Agent

from databricks.evaluation import AgentEvaluator

evaluator = AgentEvaluator("policy_classifier")

metrics = evaluator.evaluate_classifier(
model_uri="models:/main.agents.policy_classifier/1",
test_documents=test_docs,
ground_truth=labels
)

print(f"Accuracy: {metrics.accuracy:.2%}")

Invoke via API

curl -X POST https://workspace.cloud.databricks.com/serving-endpoints/policy-classifier-prod/invocations \
-H "Authorization: Bearer $DATABRICKS_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"dataframe_records": [{
"document_id": "doc_001",
"title": "Meeting",
"content": "Fluoride discussion..."
}]
}'

Benefits

Before (Custom Implementation)

  • ❌ Manual deployment and versioning
  • ❌ No built-in observability
  • ❌ Limited scalability
  • ❌ No governance or lineage
  • ❌ Manual evaluation pipelines
  • ❌ Complex monitoring setup

After (Databricks Agent Bricks)

  • ✅ One-command deployment
  • ✅ Automatic tracing and logging
  • ✅ Auto-scaling Model Serving
  • ✅ Unity Catalog governance
  • ✅ Built-in evaluation framework
  • ✅ Enterprise monitoring included

Cost Optimization

The refactored system includes several cost optimizations:

  1. Hybrid Classification: Uses keyword matching before expensive LLM calls
  2. Scale-to-Zero: Endpoints scale down when idle
  3. Batch Processing: Supports bulk document classification
  4. Caching: Frequently requested results can be cached
  5. Small Workloads: Starts with small endpoints, scales on demand

Estimated cost: ~$0.10-0.50/hour for active endpoints (much less with scale-to-zero)

Next Steps

  1. Deploy to Databricks:

    python -m databricks.deployment
  2. Run Evaluation:

    python -m databricks.evaluation
  3. Test in Notebook: Open databricks/notebooks/01_agent_bricks_quickstart.py

  4. Monitor Production: Set up alerts in Databricks UI

  5. Add Feedback Loop: Collect corrections and retrain

Migration Path

For existing users:

  1. Standalone mode still works - No breaking changes to existing code
  2. 🔄 Gradual migration - Can use both modes simultaneously
  3. ☁️ Databricks optional - Only needed for production scale
  4. 🎯 Choose your path:
    • Small projects: Use standalone mode
    • Production/Enterprise: Use Databricks Agent Bricks

Questions?

  • See databricks/README.md for detailed docs
  • Run databricks/notebooks/01_agent_bricks_quickstart.py for hands-on tutorial
  • Check examples in databricks/deployment.py and databricks/evaluation.py

Summary

This refactoring transforms the Oral Health Policy Pulse from a standalone multi-agent system into a production-ready, enterprise-grade application that leverages Databricks' full stack for AI governance, deployment, and monitoring. The system now has:

  • 🏢 Enterprise deployment via Model Serving
  • 📊 Automatic observability with MLflow tracing
  • 🔐 Data governance through Unity Catalog
  • 📈 Quality assurance with evaluation framework
  • 💰 Cost optimization with scale-to-zero and hybrid approach
  • 🚀 Production readiness out of the box

All while maintaining backward compatibility with the standalone mode! 🎉