Production-Ready NLP API Platform Using FastAPI and Ollama
This project is a FastAPI-based NLP service that integrates with the Ollama API to expose multiple large language model capabilities through clean, well-structured REST endpoints.
The goal of this project was to design a modular, API-first backend system that makes advanced language model capabilities easily accessible to real-world applications such as chat systems, summarization tools, sentiment analysis engines, and translation services.
Instead of building separate scripts for each task, I consolidated everything into a unified, scalable FastAPI application with properly defined endpoints, structured request/response schemas, and interactive API documentation via Swagger UI.
Key Features
- Modular REST API built with FastAPI
- Integration with Ollama (LLaMA-based models)
- Text generation with configurable parameters
- Summarization, classification, and Q&A endpoints
- Chatbot with persistent memory handling
- Translation and simulated text-to-speech
- Token counting endpoint for prompt optimization
Core Architecture
- FastAPI handles routing, validation, and API documentation.
- Ollama acts as the LLM inference backend.
- Each NLP task is exposed as a dedicated endpoint.
- JSON-based request and response structure for clean integration.
- Designed for easy extension with additional LLM capabilities.
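The flow above (FastAPI validates the request, forwards it to the Ollama backend, and returns JSON) can be sketched as a thin wrapper around Ollama's HTTP API. The payload shape below follows Ollama's documented `/api/generate` route, but the helper names and the exact options each endpoint forwards are illustrative assumptions, not this project's actual code:

```python
import json

# Hypothetical helpers mirroring how an endpoint might talk to Ollama's
# /api/generate route; the option names (temperature, top_p, num_predict)
# are Ollama's, the function names are assumed for illustration.
def build_ollama_request(prompt: str,
                         model: str = "llama3.2:3b-instruct-q4_K_M",
                         **options) -> dict:
    """Assemble the JSON body sent to the Ollama inference backend."""
    body = {"model": model, "prompt": prompt, "stream": False}
    if options:
        body["options"] = options  # e.g. temperature, top_p, num_predict
    return body

def parse_ollama_response(raw: str) -> str:
    """Extract the generated text from a non-streaming Ollama reply."""
    return json.loads(raw)["response"]
```

In this sketch, each FastAPI endpoint would validate its Pydantic schema, call `build_ollama_request`, POST the body with an HTTP client, and return the result of `parse_ollama_response`.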
Installation
1. Clone the repository:
git clone https://github.com/akhilsu/ollama-fastapi.git
2. Run the application (Ollama must be running locally with the target model available):
uvicorn main:app --reload
3. Open the interactive Swagger UI at:
http://127.0.0.1:8000/docs
API Endpoints
1. Basic Text Generation
Endpoint: POST /chat/
Request Body Example:
{
  "prompt": "Once upon a time..."
}
Response:
{
  "response": "Generated response text based on the prompt."
}
2. Text Generation with Custom Parameters
Endpoint: POST /generate/
Request Body Example:
{
  "prompt": "Tell me a story.",
  "temperature": 0.8,
  "max_tokens": 200,
  "top_p": 0.9
}
Response:
{
  "response": "Generated response text based on the prompt and parameters."
}
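Server-side, parameters like these are typically validated before being forwarded to the model. The actual service may rely on Pydantic validators for this; the helper and the bounds below are illustrative assumptions:

```python
# Illustrative bounds-checking for generation parameters; the real
# endpoint may enforce different limits via its Pydantic schema.
def clamp_generation_params(temperature=0.8, max_tokens=200, top_p=0.9):
    """Clamp sampling parameters into safe, commonly used ranges."""
    return {
        "temperature": min(max(temperature, 0.0), 2.0),  # 0 = deterministic
        "max_tokens": min(max(int(max_tokens), 1), 4096),  # context-limited
        "top_p": min(max(top_p, 0.0), 1.0),  # nucleus sampling mass
    }
```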
3. Text Summarization
Endpoint: POST /summarize/
Request Body Example:
{
  "text": "The quick brown fox jumps over the lazy dog. The dog wakes up and chases the fox."
}
Response:
{
  "summary": "Shortened version of the text provided."
}
4. Text Classification (Sentiment Analysis)
Endpoint: POST /classify/
Request Body Example:
{
  "text": "I love this place!"
}
Response:
{
  "classification": "positive"
}
5. Question Answering
Endpoint: POST /qa/
Request Body Example:
{
  "context": "The Eiffel Tower is in Paris.",
  "question": "Where is the Eiffel Tower located?"
}
Response:
{
  "answer": "Paris"
}
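Structured endpoints like /qa/ presumably fold their fields into a single prompt before inference. The exact template wording is an assumption; a minimal sketch:

```python
# Hypothetical prompt template for the /qa/ endpoint; the real service's
# wording may differ.
QA_TEMPLATE = (
    "Answer the question using only the given context.\n"
    "Context: {context}\n"
    "Question: {question}\n"
    "Answer:"
)

def build_qa_prompt(context: str, question: str) -> str:
    """Render the /qa/ request body into one LLM prompt string."""
    return QA_TEMPLATE.format(context=context, question=question)
```

The same pattern (one template per task, rendered into a single prompt) would cover summarization, classification, and translation as well.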
6. Chatbot with Memory (Persistent Chat)
Endpoint: POST /chat_with_memory/
Request Body Example:
{
  "prompt": "Hello, how are you?",
  "conversation_history": [{"user": "Hi!"}]
}
Response:
{
  "conversation_history": [
    {"user": "Hi!"},
    {"user": "Hello, how are you?"},
    {"assistant": "I'm doing well, thank you! How can I help?"}
  ]
}
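The memory behaviour amounts to appending each new user/assistant turn to the history and replaying it to the model on the next call. The trimming policy below is an assumption to keep prompts bounded, not necessarily what the endpoint does:

```python
# Sketch of conversation-history handling for /chat_with_memory/; the
# real endpoint's storage and trimming policy may differ.
def advance_conversation(history, user_prompt, assistant_reply, max_turns=20):
    """Append the new user/assistant turns and keep the most recent ones."""
    updated = list(history)  # copy so the caller's list is untouched
    updated.append({"user": user_prompt})
    updated.append({"assistant": assistant_reply})
    return updated[-max_turns:]  # bound memory so prompts stay small
```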
7. Language Translation
Endpoint: POST /translate/
Request Body Example:
{
  "text": "Hello, how are you?",
  "target_language": "es"
}
Response:
{
  "translation": "Hola, ¿cómo estás?"
}
8. Text-to-Speech (Simulated)
Endpoint: POST /text_to_speech/
Request Body Example:
{
  "prompt": "Hello, welcome!"
}
Response:
{
  "audio_file": "Simulated audio for: Hello, welcome!"
}
9. Model Information
Endpoint: GET /model_info/
Response:
{
  "model": "llama3.2:3b-instruct-q4_K_M",
  "version": "v1.0",
  "parameters": {
    "num_layers": 32,
    "hidden_units": 4096
  }
}
10. Token Count
Endpoint: POST /token_count/
Request Body Example:
{
  "text": "Hello, world!"
}
Response:
{
  "token_count": 3
}
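Accurate counts come from the model's own tokenizer, but for quick prompt budgeting a rough word-and-punctuation estimate can be sketched with the standard library. Its output will not exactly match the model tokenizer (it gives 4 for the example above, not 3):

```python
import re

# Naive token estimator; a real /token_count/ endpoint would query the
# model's tokenizer, so this count is only a rough upper-bound heuristic.
def estimate_tokens(text: str) -> int:
    """Count word-like runs and individual punctuation marks as tokens."""
    return len(re.findall(r"\w+|[^\w\s]", text))
```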