# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

EpicrisisIA is a medical AI system that processes clinical histories, extracts diagnoses and procedures, and automatically assigns ICD-10 (CIE-10) diagnosis codes and CUPS procedure codes. It is built on FastAPI with MongoDB for storage, uses the Groq API for LLM analysis, and uses FAISS for vector similarity search.

## Architecture

### Core Components

- **FastAPI Application**: two entry points
  - `app/main.py`: basic analysis only
  - `app/main_CIE10.py`: full system with ICD-10/CUPS coding (primary application)

- **Authentication System**: JWT-based auth in `app/auth.py` with user management in `app/routes/users.py`

- **Data Processing Pipeline**:
  - PDF text extraction: `modules/data_collection/read_pdf.py`
  - MongoDB storage: `modules/data_collection/mongodb_storage.py`
  - Clinical analysis: `modules/processing/resumen_cie10.py`
  - ICD-10 coding: `modules/processing/cie10/RANGES_HTML.py`
  - CUPS coding: `modules/processing/cie10/RANGES_CUPS.py`

- **Vector Search**: FAISS indices in `faiss_principal_jerarquico/` and `cups_faiss/` directories
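To make the code-assignment step concrete, the sketch below shows what an exact (flat) L2 similarity search computes: for a query embedding, the catalog entries with the smallest squared Euclidean distance. It is plain Python for illustration, not the FAISS API. The toy vectors and the `CODE_A`/`CODE_B`/`CODE_C` labels are made up, and the real indices in `faiss_principal_jerarquico/` and `cups_faiss/` may use a different index type or metric.

```python
from typing import List, Sequence, Tuple


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance, the quantity an exact L2 index ranks by."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def search(
    catalog: List[Tuple[str, Sequence[float]]],
    query: Sequence[float],
    k: int = 1,
) -> List[Tuple[str, float]]:
    """Return the k (code, distance) pairs closest to the query vector."""
    scored = [(code, squared_l2(vec, query)) for code, vec in catalog]
    scored.sort(key=lambda item: item[1])  # smallest distance first
    return scored[:k]


# Hypothetical 3-dimensional embeddings; real ones come from an embedding model.
catalog = [
    ("CODE_A", [1.0, 0.0, 0.0]),
    ("CODE_B", [0.0, 1.0, 0.0]),
    ("CODE_C", [0.0, 0.0, 1.0]),
]

print(search(catalog, [0.9, 0.1, 0.0], k=2))
```

FAISS performs the same ranking with optimized (and optionally approximate) index structures, which matters once the ICD-10 and CUPS catalogs hold thousands of entries.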

### Key Data Flows

1. **PDF Upload**: User uploads clinical history PDF → text extraction → MongoDB storage
2. **Analysis**: Clinical text → Groq LLM analysis → structured HTML output with diagnoses/procedures
3. **Code Assignment**: Extracted diagnoses/procedures → FAISS similarity search → ICD-10/CUPS code assignment
4. **Complete Processing**: `/procesar_ultimo_pdf` endpoint combines all steps for user's latest PDF
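The order of these flows can be sketched as a chain of functions. This is a minimal illustration, not the project's code: every function name is hypothetical, the PDF is treated as UTF-8 text, the LLM step is replaced by parsing made-up `Dx:`/`Px:` lines, the FAISS step by an exact dictionary lookup, and MongoDB persistence is omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AnalysisResult:
    diagnoses: List[str] = field(default_factory=list)
    procedures: List[str] = field(default_factory=list)
    codes: Dict[str, str] = field(default_factory=dict)


def extract_text(pdf_bytes: bytes) -> str:
    # Stand-in for modules/data_collection/read_pdf.py: assume UTF-8 text, not a real PDF.
    return pdf_bytes.decode("utf-8")


def analyze(text: str) -> AnalysisResult:
    # Stand-in for the Groq LLM step: one finding per hypothetical "Dx:" / "Px:" line.
    result = AnalysisResult()
    for line in text.splitlines():
        if line.startswith("Dx:"):
            result.diagnoses.append(line[3:].strip())
        elif line.startswith("Px:"):
            result.procedures.append(line[3:].strip())
    return result


def assign_codes(result: AnalysisResult, lookup: Dict[str, str]) -> AnalysisResult:
    # Stand-in for the FAISS step: exact lookup instead of similarity search.
    for item in result.diagnoses + result.procedures:
        result.codes[item] = lookup.get(item, "UNMATCHED")
    return result


def process_latest(pdf_bytes: bytes, lookup: Dict[str, str]) -> AnalysisResult:
    # Same order as the complete pipeline: extract -> analyze -> assign codes.
    return assign_codes(analyze(extract_text(pdf_bytes)), lookup)
```

For example, `process_latest(b"Dx: neumonia\nPx: radiografia de torax", {"neumonia": "DX_CODE_1"})` yields one coded diagnosis and one procedure marked `UNMATCHED`.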

## Development Commands

### Running the Application

```bash
# Main application with ICD-10/CUPS coding
python app/main_CIE10.py

# Basic analysis only
python app/main.py
```

By default the server listens on `0.0.0.0:7070` (all interfaces); from the same machine, use `http://localhost:7070`.

### Dependencies

```bash
pip install -r requirements.txt
```

Key dependencies: FastAPI, uvicorn, pymongo, groq, faiss-cpu, transformers, langchain_community

## Configuration

- Settings are defined in `app/config.py`
- MongoDB connection: `MONGO_URI`, `DB_NAME`
- Groq API key: `GROQ_API_KEY` (currently hardcoded in `app/config.py`; it should be read from an environment variable)
- Database collections:
  - `HistoriaClinica`: Raw PDF content storage
  - `historias_analizadas`: Processed analysis results
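One way to move these settings out of source code is to read them from the environment and fail fast when one is missing. This is a minimal sketch, assuming nothing about the current layout of `app/config.py`: the `Settings` class, `load_settings` function, and collection-name constants are hypothetical; only the variable and collection names come from this file.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Collection names as documented above.
HISTORIA_COLLECTION = "HistoriaClinica"
ANALYSIS_COLLECTION = "historias_analizadas"

REQUIRED_VARS = ("MONGO_URI", "DB_NAME", "GROQ_API_KEY")


@dataclass(frozen=True)
class Settings:
    mongo_uri: str
    db_name: str
    groq_api_key: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from the environment (or a given mapping), failing fast if any is unset."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing required environment variables: {', '.join(missing)}")
    return Settings(
        mongo_uri=env["MONGO_URI"],
        db_name=env["DB_NAME"],
        groq_api_key=env["GROQ_API_KEY"],
    )
```

Raising at startup surfaces a missing key immediately instead of at the first Groq call, and keeps the secret out of version control.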

## Important Notes

- **Security Issue**: The Groq API key is hardcoded in `app/config.py:13`; read it from the `GROQ_API_KEY` environment variable instead
- **Main Application**: Use `app/main_CIE10.py` for full functionality including medical coding
- **User Authentication**: All PDF processing endpoints require a valid JWT
- **FAISS Indices**: Pre-built indices required for ICD-10/CUPS code assignment functionality
- **MongoDB**: Required for PDF storage and analysis results persistence
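Because code assignment depends on pre-built indices, a preflight check can catch a missing index before a request fails. This is a hypothetical helper, assuming the two directories live at the repository root; the files expected inside them are not documented here, so the check only verifies that each directory exists and is non-empty.

```python
from pathlib import Path
from typing import Iterable, List

# Directory names from this file; assumed to sit at the repository root.
REQUIRED_INDEX_DIRS = ("faiss_principal_jerarquico", "cups_faiss")


def missing_index_dirs(repo_root: Path, names: Iterable[str] = REQUIRED_INDEX_DIRS) -> List[str]:
    """Return the required index directories that are absent or empty."""
    missing = []
    for name in names:
        path = repo_root / name
        if not path.is_dir() or not any(path.iterdir()):
            missing.append(name)
    return missing


def require_indices(repo_root: Path) -> None:
    """Raise if any FAISS index directory is missing or empty."""
    absent = missing_index_dirs(repo_root)
    if absent:
        raise RuntimeError(f"Missing or empty FAISS index directories: {', '.join(absent)}")
```

Calling `require_indices(Path.cwd())` before starting `app/main_CIE10.py` turns a late lookup failure into an immediate, descriptive error.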

## Common Endpoints

- `POST /upload_pdf`: Upload and extract text from clinical history PDFs
- `GET /procesar_ultimo_pdf`: Complete processing pipeline for user's latest PDF
- `POST /analisis_historia`: Analyze clinical text and extract structured information
- `POST /asignar_codigos`: Assign ICD-10 codes to diagnoses
- `GET /mis_historias`: Get user's processed clinical histories
- `GET /historia/{id}`: Get specific clinical history by ID
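A client calling the authenticated endpoints might build its requests as below. This is a minimal sketch with assumptions: it assumes the JWT is sent as an `Authorization: Bearer` header (the exact scheme used by `app/auth.py` is not stated here), assumes the default port, and leaves out how the token is obtained. `authed_request` and `historia_request` are hypothetical helpers; the requests are built but not sent.

```python
import urllib.request
from urllib.parse import quote

BASE_URL = "http://localhost:7070"  # assumes the default port from "Running the Application"


def authed_request(path: str, token: str, method: str = "GET") -> urllib.request.Request:
    """Build (but do not send) a request carrying the JWT as a Bearer token."""
    return urllib.request.Request(
        BASE_URL + path,
        method=method,
        headers={"Authorization": f"Bearer {token}"},
    )


def historia_request(historia_id: str, token: str) -> urllib.request.Request:
    # Percent-encode the ID so it stays a single path segment in /historia/{id}.
    return authed_request(f"/historia/{quote(historia_id, safe='')}", token)


latest = authed_request("/procesar_ultimo_pdf", "<jwt>")
one = historia_request("abc123", "<jwt>")
# urllib.request.urlopen(latest) would send the request to a running server.
```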