Technologies Used

  • Python
  • FastAPI
  • Ollama API
  • LLaMA (large language models served via Ollama)

Production-Ready NLP API Platform Using FastAPI and Ollama

This project is a FastAPI-based NLP service that integrates with the Ollama API to expose multiple large language model capabilities through clean, well-structured REST endpoints.

The goal of this project was to design a modular, API-first backend system that makes advanced language model functionalities easily accessible for real-world applications such as chat systems, summarization tools, sentiment analysis engines, translation services, and more.

Instead of building separate scripts for each task, I consolidated everything into a unified, scalable FastAPI application with properly defined endpoints, structured request/response schemas, and interactive API documentation via Swagger UI.


Key Features

  • Modular REST API built with FastAPI
  • Integration with Ollama (LLaMA-based models)
  • Text generation with configurable parameters
  • Summarization, classification, and Q&A endpoints
  • Chatbot with persistent memory handling
  • Translation and simulated text-to-speech
  • Token counting endpoint for prompt optimization

Core Architecture

  • FastAPI handles routing, validation, and API documentation.
  • Ollama acts as the LLM inference backend.
  • Each NLP task is exposed as a dedicated endpoint.
  • JSON-based request and response structure for clean integration.
  • Designed for easy extension with additional LLM capabilities.

Installation

1. Clone the repository and enter the project directory:

git clone https://github.com/akhilsu/ollama-fastapi.git
cd ollama-fastapi

2. Run the Application

With the dependencies installed (FastAPI and Uvicorn) and a local Ollama instance running, start the server:

uvicorn main:app --reload

The interactive Swagger UI is then available at http://127.0.0.1:8000/docs

API Endpoints

1. Basic Text Generation

Endpoint: POST /chat/

Request Body Example:

{
  "prompt": "Once upon a time..."
}

Response:

{
  "response": "Generated response text based on the prompt."
}
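
A request like the one above can be sent from any HTTP client. As a quick sanity check, here is a small Python client sketch against the local development server (the URL assumes the uvicorn defaults from the installation step):

```python
import requests

API_URL = "http://127.0.0.1:8000/chat/"  # local dev server from the installation step

def ask(prompt: str) -> str:
    """Send a prompt to the /chat/ endpoint and return the generated text."""
    r = requests.post(API_URL, json={"prompt": prompt}, timeout=120)
    r.raise_for_status()
    return r.json()["response"]

if __name__ == "__main__":
    print(ask("Once upon a time..."))
```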

2. Text Generation with Custom Parameters

Endpoint: POST /generate/

Request Body Example:

{
  "prompt": "Tell me a story.",
  "temperature": 0.8,
  "max_tokens": 200,
  "top_p": 0.9
}

Response:

{
  "response": "Generated response text based on the prompt and parameters."
}
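
Under the hood, parameters like these plausibly map onto Ollama's "options" object, where the token limit is called num_predict. A sketch of such a payload builder, assuming that mapping and the model tag from /model_info/:

```python
def build_ollama_payload(prompt, temperature=0.8, max_tokens=200, top_p=0.9,
                         model="llama3.2:3b-instruct-q4_K_M"):
    """Translate the /generate/ request fields into an Ollama generate payload."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        # Ollama groups sampling controls under "options"; max_tokens -> num_predict.
        "options": {
            "temperature": temperature,
            "num_predict": max_tokens,
            "top_p": top_p,
        },
    }
```

Keeping this translation in one helper means every endpoint shares a single, consistent view of the sampling parameters.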

3. Text Summarization

Endpoint: POST /summarize/

Request Body Example:

{
  "text": "The quick brown fox jumps over the lazy dog. The dog wakes up and chases the fox."
}

Response:

{
  "summary": "Shortened version of the text provided."
}

4. Text Classification (Sentiment Analysis)

Endpoint: POST /classify/

Request Body Example:

{
  "text": "I love this place!"
}

Response:

{
  "classification": "positive"
}

5. Question Answering

Endpoint: POST /qa/

Request Body Example:

{
  "context": "The Eiffel Tower is in Paris.",
  "question": "Where is the Eiffel Tower located?"
}

Response:

{
  "answer": "Paris"
}
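
Task endpoints like summarization, classification, and Q&A can be implemented as thin prompt templates over the same generation call. The exact wording below is illustrative, not the project's actual templates:

```python
def summarize_prompt(text: str) -> str:
    return f"Summarize the following text in one or two sentences:\n\n{text}"

def classify_prompt(text: str) -> str:
    return ("Classify the sentiment of the following text as positive, "
            f"negative, or neutral. Reply with one word.\n\n{text}")

def qa_prompt(context: str, question: str) -> str:
    return (f"Answer the question using only the context below.\n\n"
            f"Context: {context}\nQuestion: {question}\nAnswer:")
```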

6. Chatbot with Memory (Persistent Chat)

Endpoint: POST /chat_with_memory/

Request Body Example:

{
  "prompt": "Hello, how are you?",
  "conversation_history": [{"user": "Hi!"}]
}

Response:

{
  "conversation_history": [
    {"user": "Hi!"},
    {"assistant": "Hello, how are you?"}
  ]
}
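
One plausible way to handle the persistent memory is to flatten the conversation history into the prompt and append each new exchange to the returned history. The project's actual bookkeeping may differ; this is a sketch of the shape shown above:

```python
def history_to_prompt(history):
    # Flatten [{"user": ...}, {"assistant": ...}] turns into a plain text prompt.
    lines = []
    for turn in history:
        for role, text in turn.items():
            lines.append(f"{role.capitalize()}: {text}")
    lines.append("Assistant:")  # cue the model to produce the next reply
    return "\n".join(lines)

def advance_conversation(history, user_prompt, reply):
    # Return the updated history in the shape the endpoint responds with.
    return history + [{"user": user_prompt}, {"assistant": reply}]
```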

7. Language Translation

Endpoint: POST /translate/

Request Body Example:

{
  "text": "Hello, how are you?",
  "target_language": "es"
}

Response:

{
  "translation": "Hola, ¿cómo estás?"
}

8. Text-to-Speech (Simulated)

Endpoint: POST /text_to_speech/

Request Body Example:

{
  "prompt": "Hello, welcome!"
}

Response:

{
  "audio_file": "Simulated audio for: Hello, welcome!"
}

9. Model Information

Endpoint: GET /model_info/

Response:

{
  "model": "llama3.2:3b-instruct-q4_K_M",
  "version": "v1.0",
  "parameters": {
    "num_layers": 32,
    "hidden_units": 4096
  }
}

10. Token Count

Endpoint: POST /token_count/

Request Body Example:

{
  "text": "Hello, world!"
}

Response:

{
  "token_count": 3
}