Quick Start Guide
Installation
Option 1: Automated Installation (Recommended)
Run the installation script:
chmod +x install.sh
./install.sh
This will:
- Create a virtual environment
- Install all dependencies
- Create .env file from template
- Set up the project structure
Option 2: Manual Installation
# Create virtual environment
python3 -m venv venv
# Activate virtual environment
source venv/bin/activate # On Windows: venv\Scripts\activate
# Upgrade pip
pip install --upgrade pip
# Install dependencies
pip install -r requirements.txt
# Create .env file
cp .env.example .env
Option 3: Using Makefile
make install
Configuration
Edit the .env file and add your API keys:
# Required
OPENAI_API_KEY=your_openai_api_key_here
# For production (Databricks)
DATABRICKS_HOST=https://your-workspace.cloud.databricks.com
DATABRICKS_TOKEN=your_databricks_token_here
DATABRICKS_WAREHOUSE_ID=your_warehouse_id_here
# Optional: HuggingFace (for publishing datasets)
HUGGINGFACE_TOKEN=hf_your_write_token_here # Needs Write permissions
HF_ORGANIZATION=YourOrgName # Optional
Running the System
Start the API Server
# Using the virtual environment
source venv/bin/activate
python main.py serve
# Or using make
make run
Visit http://localhost:8000 for the API and http://localhost:8000/docs for interactive documentation.
Run Example Workflow
# Activate venv first
source venv/bin/activate
# Run example
python examples/example_workflow.py
# Or using make
make example
Generate Heatmap
# Activate venv first
source venv/bin/activate
# Generate heatmap
python main.py generate-heatmap --output heatmap.html
# Or using make
make heatmap
Docker Deployment
# Start all services
make docker-up
# Stop all services
make docker-down
This starts:
- API server on http://localhost:8000
- Qdrant vector DB on http://localhost:6333
- Jupyter notebook on http://localhost:8888
Common Commands
# Activate virtual environment (required for all commands)
source venv/bin/activate
# Start API server
python main.py serve
# Run with auto-reload (development)
python main.py serve --reload
# Check system status
python main.py status
# Run tests
pytest
# Or using make
make test
Troubleshooting
"ModuleNotFoundError: No module named 'click'"
You need to activate the virtual environment first:
source venv/bin/activate
"Tesseract binary not found" or OCR errors
The install.sh script automatically installs tesseract-ocr on Linux (via apt) and macOS (via brew). If it failed or you're on a different system, install manually:
Linux (Debian/Ubuntu):
sudo apt-get update && sudo apt-get install -y tesseract-ocr
macOS:
brew install tesseract
Verify installation:
tesseract --version
OCR is optional but enables text extraction from scanned PDFs and images.
"error: externally-managed-environment"
Don't use pip install directly. Use the virtual environment:
# Create venv if not exists
python3 -m venv venv
# Activate it
source venv/bin/activate
# Now install
pip install -r requirements.txt
Permission denied when running install.sh
chmod +x install.sh
./install.sh
Next Steps
- Configure your
.envfile with API keys - Run the example workflow:
make example - Start the API server:
make run - Check out the interactive docs: http://localhost:8000/docs
- Generate a heatmap:
make heatmap
For more details, see the main README.md.