InspectRAG Domain-Wide Gmail Integration: Step-by-Step Guide

Overview

This guide provides detailed instructions for setting up and configuring InspectRAG's integration with Google Workspace for domain-wide Gmail access. This integration allows you to index and search emails across all users in your domain while maintaining appropriate access controls.

Prerequisites

Before beginning, ensure you have:

  • Administrator access to your Google Workspace Admin Console
  • Access to Google Cloud Console
  • Ability to create and configure service accounts
  • A running Redis server (used for thread tracking and temporary data storage)
  • InspectRAG application installed and configured

Step 1: Create a Google Cloud Project

  1. Go to the Google Cloud Console
  2. Click on the project dropdown at the top of the page
  3. Click "New Project"
  4. Enter a project name (e.g., "InspectRAG-Integration")
  5. Click "Create"
  6. Wait for the project to be created and select it

Step 2: Enable Required APIs

  1. In your Google Cloud Project, navigate to "APIs & Services" > "Library"
  2. Search for and enable the following APIs:
     • Gmail API
     • Admin SDK API
     • Google Drive API (if you plan to index attachments)

Step 3: Create a Service Account

  1. Navigate to "IAM & Admin" > "Service Accounts"
  2. Click "Create Service Account"
  3. Enter a name (e.g., "inspectrag-gmail-integration")
  4. Add a description: "Service account for domain-wide access to Gmail"
  5. Click "Create and Continue"
  6. Skip role assignment (you'll set up domain-wide delegation instead)
  7. Click "Done"

Step 4: Create and Download Service Account Key

  1. From the Service Accounts list, click on your newly created service account
  2. Go to the "Keys" tab
  3. Click "Add Key" > "Create new key"
  4. Select "JSON" format
  5. Click "Create"
  6. Save the downloaded key file securely

Step 5: Enable Domain-Wide Delegation

  1. Still on your service account page, click on the "Details" tab
  2. Under "Domain-wide delegation," click "Edit"
  3. Check the box for "Enable Google Workspace Domain-wide Delegation"
  4. Add a product name for the consent screen (e.g., "InspectRAG Gmail Integration")
  5. Click "Save"
  6. Note your service account's Client ID for the next step
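
The Client ID is also recorded in the JSON key downloaded in Step 4, so it can be read from the file rather than copied from the console. A minimal sketch, assuming the standard service account key layout (fields such as type, client_email, client_id, and private_key); the helper name read_client_id is hypothetical and not part of InspectRAG:

```python
import json
from pathlib import Path


def read_client_id(key_path):
    """Return (client_email, client_id) from a service account JSON key.

    Hypothetical helper for illustration. Raises ValueError if the file
    does not look like a service account key.
    """
    data = json.loads(Path(key_path).read_text(encoding="utf-8"))
    if data.get("type") != "service_account":
        raise ValueError(f"{key_path} is not a service account key")
    required = ("client_email", "client_id", "private_key")
    missing = [field for field in required if field not in data]
    if missing:
        raise ValueError("key file is missing fields: " + ", ".join(missing))
    return data["client_email"], data["client_id"]
```

The second value returned is the Client ID to enter in the Admin Console in Step 6.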

Step 6: Configure Google Workspace Admin Console

  1. Go to your Google Workspace Admin Console
  2. Navigate to "Security" > "API controls"
  3. In the "Domain-wide Delegation" section, click "Manage Domain-wide Delegation"
  4. Click "Add new"
  5. Enter the Client ID of your service account
  6. Enter the following OAuth scopes:
    https://www.googleapis.com/auth/gmail.readonly
    https://www.googleapis.com/auth/admin.directory.user.readonly
    
  7. Click "Authorize"
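
Once the scopes are authorized, the service account can impersonate any user in the domain. The sketch below shows one way this is commonly done with the google-auth and google-api-python-client packages; it is an illustration of the delegation mechanism, not InspectRAG's internal code, and the function name build_delegated_gmail_service is hypothetical:

```python
# Scopes authorized for the service account in the Admin Console.
SCOPES = [
    "https://www.googleapis.com/auth/gmail.readonly",
    "https://www.googleapis.com/auth/admin.directory.user.readonly",
]


def build_delegated_gmail_service(key_path, subject_email):
    """Return a Gmail API client that acts on behalf of subject_email.

    Assumes google-auth and google-api-python-client are installed. They are
    imported inside the function so this module loads without them.
    """
    from google.oauth2 import service_account
    from googleapiclient.discovery import build

    credentials = service_account.Credentials.from_service_account_file(
        key_path, scopes=SCOPES
    )
    # with_subject() returns a copy of the credentials that impersonates the user.
    delegated = credentials.with_subject(subject_email)
    return build("gmail", "v1", credentials=delegated)
```

A quick check is to call service.users().getProfile(userId="me").execute() on the returned client. If delegation is configured correctly, the response should describe the impersonated mailbox; an authorization error usually points back to the Client ID or scopes entered above.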

Step 7: Set Up Environment Variables in InspectRAG

  1. In your InspectRAG installation directory, locate the environment configuration file (.env)
  2. Configure the following environment variables:
# Enable/disable domain email processing
ENABLE_DOMAIN_EMAILS=true

# Processing interval (in seconds)
DOMAIN_EMAIL_INTERVAL=15     # How frequently to check for new emails

# Domain-wide delegation settings
SERVICE_ACCOUNT_FILE=/path/to/service_accounts_key.json
WORKSPACE_DOMAIN=yourdomain.com
WORKSPACE_ADMIN=admin@yourdomain.com
DOMAIN_EMAIL_QUERY=is:inbox newer_than:1d

# Redis connection (required for thread tracking)
RAG_REDIS_URL=redis://localhost:6379/0
  3. Replace the values with your specific configuration:
     • SERVICE_ACCOUNT_FILE: Full path to your downloaded service account key
     • WORKSPACE_DOMAIN: Your Google Workspace domain name (without @ symbol)
     • WORKSPACE_ADMIN: Email of an administrator in your domain
     • DOMAIN_EMAIL_QUERY: Gmail search query to determine which emails to index
     • RAG_REDIS_URL: Your Redis server connection string

Environment Variable Details

Variable               Description                                   Example
ENABLE_DOMAIN_EMAILS   Toggles the domain Gmail integration on/off   true
DOMAIN_EMAIL_INTERVAL  How often to check for new emails (seconds)   15 for testing, 300 (5 min) for production
SERVICE_ACCOUNT_FILE   Path to your service account key file         /home/user/InspectRAG/service_accounts_key.json
WORKSPACE_DOMAIN       Your Google Workspace domain                  company.com
WORKSPACE_ADMIN        Admin email for domain operations             admin@company.com
DOMAIN_EMAIL_QUERY     Gmail search query for filtering emails       is:inbox newer_than:1d

Note: After changing these environment variables, you'll need to restart the InspectRAG application for the changes to take effect.
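
It can help to validate these variables at startup, before any task runs. A minimal sketch, assuming the values are already loaded into the process environment; load_domain_email_settings and its fallback values (300 seconds, the is:inbox newer_than:1d query) are illustrative assumptions, not InspectRAG's actual loader:

```python
import os


def load_domain_email_settings(env=None):
    """Read and validate the domain email variables from a mapping.

    Hypothetical helper; defaults to os.environ when no mapping is given.
    """
    env = os.environ if env is None else env
    enabled = env.get("ENABLE_DOMAIN_EMAILS", "false").strip().lower() == "true"
    if not enabled:
        return {"enabled": False}

    required = ("SERVICE_ACCOUNT_FILE", "WORKSPACE_DOMAIN",
                "WORKSPACE_ADMIN", "RAG_REDIS_URL")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise ValueError("missing settings: " + ", ".join(missing))

    interval = int(env.get("DOMAIN_EMAIL_INTERVAL", "300"))  # assumed default
    if interval <= 0:
        raise ValueError("DOMAIN_EMAIL_INTERVAL must be a positive number of seconds")
    if "@" in env["WORKSPACE_DOMAIN"]:
        raise ValueError("WORKSPACE_DOMAIN must not contain '@'")

    return {
        "enabled": True,
        "interval": interval,
        "service_account_file": env["SERVICE_ACCOUNT_FILE"],
        "domain": env["WORKSPACE_DOMAIN"],
        "admin": env["WORKSPACE_ADMIN"],
        "query": env.get("DOMAIN_EMAIL_QUERY", "is:inbox newer_than:1d"),
        "redis_url": env["RAG_REDIS_URL"],
    }
```

Failing fast on a missing or malformed value is usually easier to diagnose than an authentication error raised later inside a background task.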

Step 8: Store the Service Account Key

  1. Create a directory for storing service account keys if it doesn't exist:
    sudo mkdir -p /etc/app/service-accounts
    
  2. Set appropriate permissions:
    sudo chmod 700 /etc/app/service-accounts
    
  3. Copy your service account key file to this directory:
    sudo cp /path/to/downloaded-key.json /etc/app/service-accounts/yourdomain-gmail.json
    
  4. Set appropriate file permissions:
    sudo chmod 600 /etc/app/service-accounts/yourdomain-gmail.json
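
The permissions set above can also be verified from Python, for example in a startup check. A minimal sketch using only the standard library on a POSIX system; the helper name key_file_is_private is hypothetical:

```python
import os
import stat


def key_file_is_private(path):
    """Return True if the file grants no permissions to group or others."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0
```

Because the commands above run under sudo, the directory and file will typically be owned by root. If InspectRAG runs as a different user, that user also needs to own (or otherwise be able to read) the key file, so this check is best run as the account that runs the application.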
    

Step 9: Test the Configuration

  1. Run the following Celery task to verify access to your domain:
    from tasks.domain_email_tasks import process_all_domain_users
    
    # Replace with your domain and service account file path
    process_all_domain_users.delay(
        service_account_file="/etc/app/service-accounts/yourdomain-gmail.json",
        domain="yourdomain.com",
        query="is:inbox newer_than:3d"
    )
    
  2. Check the logs to ensure the task is running successfully
  3. Verify that user emails are being discovered and processed

Step 10: Configure Scheduled Tasks

  1. Set up a periodic task to regularly sync emails using Celery Beat
  2. Add the following configuration to your Celery Beat schedule:
# In your Celery configuration (assumes `app` is your existing Celery instance)
from celery.schedules import crontab

app.conf.beat_schedule = {
    'sync-domain-emails-daily': {
        'task': 'tasks.domain_email_tasks.process_all_domain_users',
        'schedule': crontab(hour=1, minute=30),  # Runs at 1:30 AM daily
        'args': (
            '/etc/app/service-accounts/yourdomain-gmail.json',
            'yourdomain.com',
            'is:inbox newer_than:1d'
        )
    },
    'track-domain-threads-daily': {
        'task': 'tasks.domain_email_tasks.track_domain_threads',
        'schedule': crontab(hour=3, minute=0),  # Runs at 3:00 AM daily
        'args': (
            '/etc/app/service-accounts/yourdomain-gmail.json',
            'yourdomain.com'
        )
    },
}
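
If you would rather run the sync at a fixed interval than at fixed times of day, Celery Beat also accepts a plain number of seconds as the schedule. A minimal sketch that builds such an entry; build_beat_schedule is a hypothetical helper, and whether InspectRAG already applies DOMAIN_EMAIL_INTERVAL internally is not covered by this guide:

```python
def build_beat_schedule(key_path, domain, query, interval_seconds):
    """Build a beat_schedule dict that runs the domain sync every interval_seconds."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return {
        "sync-domain-emails": {
            "task": "tasks.domain_email_tasks.process_all_domain_users",
            # Celery Beat treats a plain number as an interval in seconds.
            "schedule": float(interval_seconds),
            # Same positional order as the crontab entry: key file, domain, query.
            "args": (key_path, domain, query),
        },
    }
```

The result can be assigned to app.conf.beat_schedule, or merged into an existing schedule with dict.update().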

Step 11: Fine-Tune Query Parameters

For optimal performance and coverage, you can adjust the Gmail query parameters:

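Query Parameter  Description                                                  Example
newer_than:      Emails newer than the given age (d = days, m = months,       newer_than:7d
                 y = years)
is:inbox         Only inbox emails                                            is:inbox
is:sent          Only sent emails                                             is:sent
is:anywhere      All emails (inbox, sent, archived)                           is:anywhere
has:attachment   Only emails with attachments                                 has:attachment
-label:spam      Exclude spam                                                 -label:spam

Note: Gmail's documentation lists the all-mail operator as in:anywhere, which also matches Spam and Trash. If is:anywhere does not return what you expect, try in:anywhere.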

Common query combinations:

  • is:inbox newer_than:7d -label:spam (new inbox emails, not spam)
  • is:anywhere newer_than:30d has:attachment (all recent emails with attachments)
  • is:sent newer_than:14d (recently sent emails)
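
Queries like these can be assembled programmatically so that each option is toggled in one place. A minimal sketch; build_gmail_query is a hypothetical helper that only joins the operators listed in this step into a single string:

```python
def build_gmail_query(location="inbox", newer_than_days=None,
                      has_attachment=False, exclude_spam=False):
    """Join Gmail search operators into one query string."""
    parts = [f"is:{location}"]
    if newer_than_days is not None:
        if newer_than_days <= 0:
            raise ValueError("newer_than_days must be positive")
        parts.append(f"newer_than:{newer_than_days}d")
    if has_attachment:
        parts.append("has:attachment")
    if exclude_spam:
        parts.append("-label:spam")
    return " ".join(parts)
```

For example, build_gmail_query("inbox", 7, exclude_spam=True) produces the first combination listed in this step, and the result can be passed as the query argument of the tasks in this guide or stored in DOMAIN_EMAIL_QUERY.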

Step 12: Monitor and Troubleshoot

  1. Check Celery logs for errors:

    tail -f celery.log
    

  2. Monitor Redis for thread tracking information:

    redis-cli
    keys email:thread:*
    keys email:domain:thread:*
    

  3. Common issues and solutions:

Issue                  Possible Cause                           Solution
Authentication errors  Incorrect service account configuration  Verify scopes in Google Workspace Admin Console
Task failures          Redis connection issues                  Check Redis server status and connection URL
Missing emails         Restrictive query                        Adjust query parameters to be more inclusive
No users found         Insufficient admin permissions           Verify admin account has directory access
Rate limiting          Too many API requests                    Implement progressive backoff; spread tasks over time
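
The backoff suggested for rate limiting can be sketched as a small retry wrapper. This is a generic illustration using exponential backoff with jitter, not InspectRAG's own retry logic; call_with_backoff and its parameters are hypothetical:

```python
import random
import time


def call_with_backoff(func, max_attempts=5, base_delay=1.0, max_delay=60.0,
                      retry_on=(Exception,), sleep=time.sleep):
    """Call func(), retrying with exponential backoff and jitter on failure."""
    for attempt in range(max_attempts):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Base delay doubles on each attempt, capped at max_delay. Random
            # jitter keeps parallel workers from retrying in lockstep.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))
```

In practice, retry_on would be narrowed to the error your API client raises for rate limiting (for example, googleapiclient.errors.HttpError when using google-api-python-client) rather than every exception.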

Step 13: Securing Sensitive Information

  1. Ensure service account keys are stored securely
  2. Implement access controls to the email index
  3. Set up appropriate user authentication for InspectRAG
  4. Configure Redis to require authentication if exposed to network

Advanced Configuration

Processing Specific Users

To process emails for specific users only:

from tasks.domain_email_tasks import fetch_gmail_domain_wide_task

fetch_gmail_domain_wide_task.delay(
    user_id="yourdomain.com:user@yourdomain.com",  # placeholder address
    service_account_file="/etc/app/service-accounts/yourdomain-gmail.json",
    subject_email="user@yourdomain.com",  # mailbox to process (placeholder)
    query="is:inbox newer_than:7d"
)

Cross-Thread Analysis

To analyze relationships between email threads across users:

from tasks.domain_email_tasks import fetch_domain_threads

fetch_domain_threads.delay(
    service_account_file="/etc/app/service-accounts/yourdomain-gmail.json",
    domain="yourdomain.com",
    admin_email="admin@yourdomain.com",  # placeholder admin address
    query="is:anywhere newer_than:30d"
)

Performance Optimization

For large domains:

  1. Process users in batches
  2. Use different time ranges for initial vs. delta syncs
  3. Schedule jobs during off-peak hours
  4. Adjust Redis expiration settings for thread tracking
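
Batching and the initial-versus-delta distinction can be sketched with two small helpers. Both names are hypothetical, and the 30-day and 1-day windows are illustrative assumptions rather than recommended values:

```python
def batched(items, size):
    """Yield consecutive lists of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def sync_query(initial, initial_days=30, delta_days=1):
    """Use a wide time window for the first sync and a narrow one afterwards."""
    days = initial_days if initial else delta_days
    return f"is:inbox newer_than:{days}d"
```

Each batch could then be dispatched with fetch_gmail_domain_wide_task, using Celery's apply_async(countdown=...) to stagger batches over time instead of queuing every user at once.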

Resources

This guide provides comprehensive instructions for setting up and configuring InspectRAG's domain-wide Gmail integration. Follow these steps carefully to ensure proper configuration and operation.