Chat with Multiple PDFs using Google Gemini Pro

Published on April 11, 2025

A Streamlit web application enabling users to chat with multiple PDFs, merge PDFs, extract images, and perform image-to-text conversion, all powered by the Google Gemini Pro model.

Live Demo

GitHub Repository

Tags: Streamlit, Python, Google Gemini Pro, PDF Processing, Langchain, Chat History, Document Management, PDF Merging, Image Extraction, Image to Text

Introduction

The "Chat with Multiple PDFs using Gemini Pro with Advanced Features" application is a Streamlit-based web tool for interacting with and manipulating PDF documents. In addition to natural language chat across multiple PDFs, powered by the Google Gemini Pro model and Langchain, it can merge PDF documents, extract the images embedded in them, and run image-to-text conversion (OCR) on the extracted images.

Key Features

Technologies Used

How It Works

  1. PDF Upload: Users upload one or more PDF files through the Streamlit file uploader.
  2. Document Processing and Indexing: The application processes the uploaded PDFs using Langchain and PyPDF2, splitting the text into chunks and generating vector embeddings for efficient semantic search.
  3. Chat Interaction: Users type their questions into the chat input field.
  4. Intelligent Retrieval: Langchain's retrieval mechanisms identify the most relevant document chunks based on the user's query and the generated embeddings.
  5. Gemini Pro Response Generation: The relevant document context and the user's question are passed to the Google Gemini Pro model, which generates the answer.
  6. Displaying Chat History: The user's questions and the model's answers are displayed chronologically in the chat interface, managed by the streamlit-chat component.
  7. Uploaded Document List: A section displays the names of the currently uploaded PDF documents.
  8. PDF Merging: Users can select multiple uploaded PDFs and initiate a merging process using PyPDF2. The merged PDF can then be downloaded.
  9. Image Extraction: Users can trigger the extraction of all images from the uploaded PDFs. The application uses PyPDF2 and Pillow to identify and save the embedded images.
  10. Image to Text Conversion: Users can select extracted images, and the application uses pytesseract (with Tesseract OCR) to convert the text in those images into machine-readable text, which can then be displayed to the user or, potentially, used in the chat.
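
The indexing and retrieval steps (2, 4, and 5) can be illustrated with a small, self-contained sketch. This is not the application's code: the real project relies on Langchain and embeddings, whereas the sketch below uses a toy bag-of-words vector as a stand-in for an embedding model so that it runs with the standard library alone. The function names (`split_into_chunks`, `toy_embed`, `retrieve`, `build_prompt`), the chunk sizes, and the sample text are all hypothetical.

```python
import math
import re
from collections import Counter


def split_into_chunks(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character chunks that overlap their neighbours."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks


def toy_embed(text: str) -> Counter:
    """Toy stand-in for an embedding model (assumption): lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Return the top_k chunks most similar to the query."""
    query_vec = toy_embed(query)
    ranked = sorted(chunks, key=lambda c: cosine_similarity(query_vec, toy_embed(c)), reverse=True)
    return ranked[:top_k]


def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Combine retrieved context and the question into one prompt string."""
    context = "\n---\n".join(context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Hypothetical page texts, standing in for text extracted from uploaded PDFs.
pages = [
    "Invoices are due within thirty days of receipt.",
    "The warranty covers hardware defects for two years.",
    "Support is available by email on weekdays.",
]
chunks = [c for page in pages for c in split_into_chunks(page, chunk_size=80, overlap=20)]
question = "How long does the warranty last?"
top = retrieve(question, chunks, top_k=1)
prompt = build_prompt(question, top)  # this string would be sent to the model
print(prompt)
```

In a real pipeline the toy vectors would be replaced by model-generated embeddings stored in a vector index, and the assembled prompt would be sent to the Gemini Pro model. The overlap between chunks reduces the chance that a sentence relevant to the query is cut in half at a chunk boundary.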

Getting Started

To run this project locally:

  1. Clone the repository:
    git clone https://github.com/kgurnoor/gemini_multipdf_chat
    cd gemini_multipdf_chat
  2. Install dependencies:
    pip install -r requirements.txt
  3. Install Tesseract OCR:
    • You will need to install the Tesseract OCR engine separately on your system. Refer to the Tesseract documentation for installation instructions specific to your operating system.
  4. Configure your Google Gemini Pro API key:
    • Set your Google Gemini Pro API key as an environment variable (recommended) or within your Python script. Consult the Google Cloud documentation for instructions.
  5. Run the Streamlit application:
    streamlit run app.py
  6. Open your web browser to the address shown in the terminal (typically http://localhost:8501).
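
Before launching the app, it can help to confirm that the two external prerequisites from steps 3 and 4 are in place. The following is a minimal preflight sketch, not part of the repository: the function name `check_environment` is hypothetical, and the environment variable name `GOOGLE_API_KEY` is an assumption, so substitute whatever name the project actually reads.

```python
import os
import shutil


def check_environment(api_key_var: str = "GOOGLE_API_KEY", env=None, which=shutil.which) -> list[str]:
    """Return a list of setup problems; an empty list means both checks passed.

    api_key_var is an assumed variable name, not confirmed by the project.
    env and which can be overridden, which makes the function easy to test.
    """
    env = os.environ if env is None else env
    problems = []
    # The API key must be present and non-blank.
    if not env.get(api_key_var, "").strip():
        problems.append(f"Environment variable {api_key_var} is not set.")
    # pytesseract shells out to the Tesseract binary, so it must be on PATH.
    if which("tesseract") is None:
        problems.append("The 'tesseract' executable was not found on PATH.")
    return problems


for problem in check_environment():
    print("Setup issue:", problem)
```

If the script prints nothing, both the key and the Tesseract executable were found and the Streamlit app can be started as described above.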

Deployment

This Streamlit application can be deployed on Streamlit Community Cloud (formerly Streamlit sharing) or on other Python hosting platforms that support Streamlit. For the image-to-text functionality, make sure the Tesseract OCR engine is available in the deployment environment, either pre-installed or bundled into the application's container image.

Future Enhancements

Contributing

Contributions to the "Chat with Multiple PDFs using Gemini Pro with Advanced Features" project are highly encouraged! Please submit pull requests for bug fixes, new features, and improvements. Feel free to open issues to discuss potential changes or report problems.