InspectRAG S3 Bucket Integration Guide
Overview
This guide explains how to configure and use InspectRAG's integration with Amazon S3 buckets. The integration lets InspectRAG index your S3 objects and keep that index synchronized with the bucket, automatically processing files when they are created, updated, or deleted.
Prerequisites
Before configuring the S3 bucket integration, ensure you have:
- An AWS account with appropriate permissions
- Access to create or modify S3 buckets
- AWS credentials (Access Key ID and Secret Access Key)
- InspectRAG application installed and configured
Step 1: Set Up AWS Credentials
- Create an IAM user with programmatic access in your AWS account
- Attach policies that allow S3 bucket access (e.g., AmazonS3ReadOnlyAccess for read-only integration, or more specific permissions as needed)
- Generate and securely store the Access Key ID and Secret Access Key
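As an illustration of "more specific permissions", the following sketch builds a read-only IAM policy document scoped to a single bucket. The choice of s3:ListBucket and s3:GetObject is an assumption about what a read-only integration needs; this guide does not list the exact actions InspectRAG requires, and the helper name is hypothetical.

```python
import json


def read_only_bucket_policy(bucket_name: str) -> dict:
    """Build an IAM policy document granting read-only access to one bucket.

    Assumption: listing and reading objects is sufficient for indexing.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing is granted on the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket_name}",
            },
            {
                # Reading is granted on the objects inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            },
        ],
    }


if __name__ == "__main__":
    # Print JSON that can be pasted into the IAM policy editor.
    print(json.dumps(read_only_bucket_policy("company-documents"), indent=2))
```

Scoping the policy to one bucket is narrower than AmazonS3ReadOnlyAccess, which applies to every bucket in the account.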
Step 2: Configure Environment Variables
In your InspectRAG installation directory, locate the environment configuration file (.env) and add the following variables:
# AWS Credentials
AWS_ACCESS_KEY_ID=your_access_key_id
AWS_SECRET_ACCESS_KEY=your_secret_access_key
AWS_REGION=us-east-1
# S3 Configuration
S3_BUCKET_NAME=your-inspectrag-bucket
S3_DEFAULT_PERMISSIONS=public,authenticated
S3_DEFAULT_CREATOR=s3_webhook
Environment Variable Details
Variable | Description | Example |
---|---|---|
AWS_ACCESS_KEY_ID | Your AWS access key | AKIAIOSFODNN7EXAMPLE |
AWS_SECRET_ACCESS_KEY | Your AWS secret key | wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY |
AWS_REGION | AWS region where your bucket is located | us-east-1 |
S3_BUCKET_NAME | Name of the S3 bucket to integrate | company-documents |
S3_DEFAULT_PERMISSIONS | Comma-separated list of default permissions | public,authenticated |
S3_DEFAULT_CREATOR | Default creator identifier for S3 objects | s3_webhook |
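To show how these variables fit together, here is a minimal sketch that reads them from the process environment and splits the comma-separated permissions into a list. It is not InspectRAG's actual loader: the S3Settings class, the load_s3_settings function, and the fallback defaults are assumptions for illustration, and the sketch assumes the values from .env have already been exported into the environment.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping, Optional


@dataclass(frozen=True)
class S3Settings:
    """Hypothetical container for the settings described in the table."""
    region: str
    bucket_name: str
    default_permissions: List[str]
    default_creator: str


def load_s3_settings(env: Optional[Mapping[str, str]] = None) -> S3Settings:
    """Read the S3 settings, failing early if a required variable is absent."""
    env = os.environ if env is None else env
    required = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "S3_BUCKET_NAME")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise ValueError("Missing required variables: " + ", ".join(missing))

    # "public,authenticated" becomes ["public", "authenticated"].
    raw_permissions = env.get("S3_DEFAULT_PERMISSIONS", "")
    permissions = [p.strip() for p in raw_permissions.split(",") if p.strip()]

    return S3Settings(
        # Assumed fallbacks, mirroring the example values in this guide.
        region=env.get("AWS_REGION", "us-east-1"),
        bucket_name=env["S3_BUCKET_NAME"],
        default_permissions=permissions,
        default_creator=env.get("S3_DEFAULT_CREATOR", "s3_webhook"),
    )
```

Validating at startup surfaces a missing credential immediately instead of as an authentication error on the first S3 call.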
Step 3: Set Up S3 Event Notifications
For real-time updates, you need to configure S3 event notifications to trigger the InspectRAG webhook:
- Go to the AWS Management Console and navigate to the S3 service
- Select your bucket and go to the "Properties" tab
- Scroll down to "Event notifications" and click "Create event notification"
- Configure the notification:
  - Name: InspectRAGIntegration
  - Event types: Select "All object create events" and "All object delete events"
  - Destination: Select "Lambda Function" or "SQS Queue" based on your setup
- If using Lambda, create a Lambda function that forwards events to your InspectRAG webhook endpoint
- If using SQS, configure your InspectRAG application to poll the queue
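For the SQS option, a polling loop might look roughly like the sketch below. This guide does not describe how InspectRAG polls a queue, so the function names and the handle callback are hypothetical. The sketch expects a boto3 SQS client to be passed in and uses its receive_message and delete_message calls; object keys in S3 notifications are URL-encoded, so they are decoded before use.

```python
import json
from urllib.parse import unquote_plus


def records_from_sqs_body(body: str) -> list:
    """Extract (event name, bucket, key) tuples from one SQS message body.

    S3 also sends an s3:TestEvent message without "Records" when the
    notification is first configured; that case yields an empty list.
    """
    payload = json.loads(body)
    return [
        (
            record["eventName"],
            record["s3"]["bucket"]["name"],
            unquote_plus(record["s3"]["object"]["key"]),
        )
        for record in payload.get("Records", [])
    ]


def poll_once(sqs_client, queue_url: str, handle) -> int:
    """Receive one batch of messages, pass each record to `handle`,
    then delete the message. Returns the number of records handled."""
    response = sqs_client.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,  # SQS maximum per call
        WaitTimeSeconds=20,      # long polling
    )
    handled = 0
    for message in response.get("Messages", []):
        for event_name, bucket, key in records_from_sqs_body(message["Body"]):
            handle(event_name, bucket, key)
            handled += 1
        # Delete only after every record in the message was handled.
        sqs_client.delete_message(
            QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"]
        )
    return handled
```

Because the message is deleted only after handle returns, a failure during processing leaves it on the queue to be delivered again.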
Step 3a: Configure AWS Lambda Function (Recommended Approach)
For robust S3 event handling, we recommend using AWS Lambda to forward S3 events to your InspectRAG webhook.
3a.1: Create the Lambda Function
- Navigate to AWS Lambda Console:
  - Go to AWS Management Console → Lambda
  - Click "Create function"
- Function Configuration:
  - Choose "Author from scratch"
  - Function name: InspectRAG-S3-Webhook-Forwarder
  - Runtime: Python 3.9 or later
  - Architecture: x86_64
  - Click "Create function"
3a.2: Configure the Lambda Function Code
- Download the Lambda Function Code:
  - The lambda_function.py file is included with your InspectRAG release package
  - Open the file in any text editor
- Update the Webhook URL:
  - In the lambda_function.py file, locate line 14
  - Replace https://your-rag-api-url with your actual InspectRAG API URL
- Deploy the Function Code:
  - Copy the entire contents of the modified lambda_function.py file
  - In the AWS Lambda console, paste the code into the function editor
  - Click "Deploy" to save your changes
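For orientation only, a forwarder of this kind could be as small as the sketch below. It is not the lambda_function.py shipped with InspectRAG, whose contents are not reproduced in this guide; apart from the webhook_url variable name and the placeholder URL, every name here is an assumption. The sketch uses only the Python standard library, so it needs no extra packaging in Lambda.

```python
import json
import urllib.error
import urllib.request

# Placeholder from this guide; replace with your InspectRAG API URL.
webhook_url = "https://your-rag-api-url"


def build_request(event: dict, url: str) -> urllib.request.Request:
    """Wrap the raw S3 event in a JSON POST request."""
    body = json.dumps(event).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def lambda_handler(event, context):
    """Entry point that Lambda invokes for each S3 notification."""
    request = build_request(event, webhook_url)
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            status = response.status
    except urllib.error.HTTPError as exc:
        # The endpoint answered with a 4xx or 5xx status.
        print(f"Webhook returned HTTP {exc.code}")
        raise
    except urllib.error.URLError as exc:
        # DNS failure, refused connection, or connect timeout.
        print(f"Webhook unreachable: {exc.reason}")
        raise
    print(f"Forwarded {len(event.get('Records', []))} record(s), status {status}")
    return {"statusCode": status}
```

Re-raising the error marks the invocation as failed, so the messages printed above appear in CloudWatch alongside the failure and Lambda's retry behavior for asynchronous invocations can apply.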
3a.3: Connect S3 Bucket to Lambda
- Return to S3 Bucket Configuration:
  - Go to your S3 bucket → Properties → Event notifications
  - Click "Create event notification"
- Event Notification Details:
  - Name: InspectRAG-Event-Notification
  - Event types: Select:
    - ✅ All object create events (s3:ObjectCreated:*)
    - ✅ All object delete events (s3:ObjectRemoved:*)
  - Destination: Lambda function
  - Lambda function: Select InspectRAG-S3-Webhook-Forwarder
- Save Configuration:
  - Click "Save changes"
  - AWS will automatically grant the necessary permissions
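If you prefer to script this step instead of using the console, the same notification can be expressed as a configuration dictionary and applied with boto3's put_bucket_notification_configuration. The sketch below is an assumption-laden illustration: the helper names are hypothetical, the function ARN must be supplied by you, and an S3 client created with boto3 is passed in rather than constructed here.

```python
def lambda_notification_config(
    function_arn: str,
    notification_id: str = "InspectRAG-Event-Notification",
) -> dict:
    """Describe a notification that sends create and delete events to Lambda."""
    return {
        "LambdaFunctionConfigurations": [
            {
                "Id": notification_id,
                "LambdaFunctionArn": function_arn,
                "Events": ["s3:ObjectCreated:*", "s3:ObjectRemoved:*"],
            }
        ]
    }


def apply_notification(s3_client, bucket_name: str, function_arn: str) -> None:
    """Apply the configuration to the bucket.

    Note: this call replaces the bucket's entire notification
    configuration, including any notifications set up earlier.
    """
    s3_client.put_bucket_notification_configuration(
        Bucket=bucket_name,
        NotificationConfiguration=lambda_notification_config(function_arn),
    )
```

Unlike the console flow, a scripted setup does not add the invoke permission for you; the Lambda function's resource policy must separately allow s3.amazonaws.com to invoke it before S3 will accept the configuration.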
3a.4: Test the Integration
- Upload a test file to your S3 bucket
- Check Lambda logs:
  - Go to Lambda → Functions → InspectRAG-S3-Webhook-Forwarder
  - Click "Monitor" → "View logs in CloudWatch"
  - Look for successful webhook forwarding messages
- Verify in InspectRAG:
  - Check that the file appears in your InspectRAG index
3a.5: Troubleshooting Lambda Setup
Issue | Cause | Solution |
---|---|---|
Function not triggered | S3 event notification not configured | Verify event notification is properly set up |
HTTP errors in logs | Incorrect webhook URL | Double-check the webhook URL in lambda_function.py |
Timeout errors | InspectRAG not responding | Verify InspectRAG is accessible and running |
Permission denied | Lambda execution role missing permissions | Check IAM role and policies |
Important Notes
- URL Requirements: Your InspectRAG API must be accessible from the internet
- HTTPS Recommended: Use HTTPS for secure communication
- File Location: The lambda_function.py file is included in your InspectRAG release package
- Only Edit the URL: Only modify the webhook_url variable; don't change other parts of the code
Step 4: Configure the Webhook Endpoint
If you're using a direct webhook connection:
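- Set up a publicly accessible endpoint in your InspectRAG application (e.g., /s3-webhook)
- Ensure this endpoint is secure and validates incoming requests
- Configure AWS to send notifications to this endpoint. S3 event notifications cannot target an HTTP endpoint directly, so route them through a supported destination such as an SNS topic with an HTTPS subscription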
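One common way to validate incoming requests is an HMAC signature computed over the request body with a shared secret. The sketch below illustrates that general technique; this guide does not say which validation scheme InspectRAG's endpoint supports, so the shared secret, the idea of a signature header, and both function names are assumptions.

```python
import hashlib
import hmac


def sign_body(secret: bytes, body: bytes) -> str:
    """Compute a hex-encoded HMAC-SHA256 signature of the request body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def is_valid_request(secret: bytes, body: bytes, received_signature: str) -> bool:
    """Check a signature sent by the caller (for example in a request header).

    compare_digest runs in constant time, which avoids leaking how many
    leading characters of the signature matched.
    """
    expected = sign_body(secret, body).encode("ascii")
    return hmac.compare_digest(expected, received_signature.encode("utf-8"))
```

In this scheme the sender computes sign_body over the exact bytes it posts, and the endpoint rejects any request whose signature does not verify.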
Step 5: Test the Integration
- Upload a file to your S3 bucket
- Check the InspectRAG logs to ensure the file is processed
- Verify the file is indexed by searching for its content in the InspectRAG interface
Step 6: Initial Synchronization
To index existing files in your S3 bucket, you can trigger a full synchronization from the InspectRAG admin interface. This will process all existing objects in the bucket according to your configuration settings.
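Conceptually, a full synchronization has to enumerate every object in the bucket. The sketch below shows how that enumeration can be done with boto3's list_objects_v2 paginator; it is not InspectRAG's implementation, the function name is hypothetical, and skipping keys that end in "/" (the zero-byte placeholders the S3 console creates for folders) is an assumption about what should be indexed.

```python
def iter_bucket_keys(s3_client, bucket_name: str, prefix: str = ""):
    """Yield every object key in the bucket, optionally under a prefix.

    list_objects_v2 returns at most 1000 keys per call, so the paginator
    is used to follow continuation tokens automatically.
    """
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket_name, Prefix=prefix):
        # Pages for empty listings carry no "Contents" entry.
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if key.endswith("/"):
                continue  # assumed: folder placeholders are not documents
            yield key
```

A caller would create the client with boto3 and iterate, for example `for key in iter_bucket_keys(client, "company-documents")`, handing each key to whatever performs the indexing.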
Step 7: Scheduled Consistency Checks
InspectRAG can perform periodic consistency checks between your S3 bucket and its index. This helps identify and fix any objects that may have been missed during normal operations. Configure these checks through the administrative interface.
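The idea behind such a check can be sketched as a comparison of two mappings from object key to ETag, one taken from the bucket listing and one from the index. How InspectRAG actually performs the comparison is not described in this guide; the function name, the use of ETags as a change signal, and the result labels are assumptions.

```python
def compare_bucket_to_index(bucket_objects: dict, indexed_objects: dict) -> dict:
    """Compare {key: etag} mappings and report discrepancies.

    missing: in the bucket but not indexed (a missed create event)
    stale:   indexed but no longer in the bucket (a missed delete event)
    changed: present in both with different ETags (a missed update)
    """
    missing = sorted(k for k in bucket_objects if k not in indexed_objects)
    stale = sorted(k for k in indexed_objects if k not in bucket_objects)
    changed = sorted(
        k
        for k in bucket_objects
        if k in indexed_objects and bucket_objects[k] != indexed_objects[k]
    )
    return {"missing": missing, "stale": stale, "changed": changed}
```

Each category maps to a different repair: index the missing keys, remove the stale ones, and reprocess the changed ones.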
Data Processing
Object Types
The integration processes various file types from your S3 bucket:
File Type | Processing |
---|---|
PDFs | Text extraction with structure preservation |
Documents (Word, Excel, etc.) | Content extraction |
Text files | Direct processing |
Images | OCR processing if configured |
Metadata Capture
Metadata Key | Description | Example |
---|---|---|
file_id | Unique ID for the object, format: s3-{bucket_name}-{object_key} | s3-company-documents-folder/file.txt |
platform | Source platform identifier | s3 |
bucket_name | Name of the S3 bucket | company-documents |
object_key | Object key (path) in the bucket | folder/file.txt |
permissions | List of default permissions applied to the object | ["public", "authenticated"] |
creator | Identifier of the creator (default from S3_DEFAULT_CREATOR) | s3_webhook |
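The metadata fields in the table can be assembled as in the following sketch, which follows the documented file_id format s3-{bucket_name}-{object_key}. The function name and signature are hypothetical; only the keys and the file_id format come from the table.

```python
from typing import List


def build_metadata(
    bucket_name: str,
    object_key: str,
    permissions: List[str],
    creator: str,
) -> dict:
    """Assemble the metadata record described in the table for one object."""
    return {
        # Documented format: s3-{bucket_name}-{object_key}
        "file_id": f"s3-{bucket_name}-{object_key}",
        "platform": "s3",
        "bucket_name": bucket_name,
        "object_key": object_key,
        # Copy so later changes to the caller's list do not leak in.
        "permissions": list(permissions),
        "creator": creator,
    }


if __name__ == "__main__":
    record = build_metadata(
        "company-documents",
        "folder/file.txt",
        ["public", "authenticated"],
        "s3_webhook",
    )
    print(record["file_id"])  # prints: s3-company-documents-folder/file.txt
```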
Troubleshooting
Common Issues
Issue | Possible Causes | Solution |
---|---|---|
Files not being indexed | Incorrect AWS credentials | Verify AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY |
Authentication errors | Insufficient permissions | Check IAM policies for the AWS user |
Missing files in index | Event notifications not configured | Set up bucket event notifications correctly |
Processing errors | Unsupported file types | Check logs for specific file processing errors |
Checking Indexed Files
You can verify if specific S3 objects are indexed using the InspectRAG admin interface. This provides a way to check the status of individual files and reprocess them if necessary.
Manually Re-indexing Files
If you need to re-index specific files, you can do so through the InspectRAG administrative interface. Simply locate the file in question and use the "reprocess" option.
Performance Considerations
- Files are processed in batches to optimize performance
- Large files may take longer to process
- Consider using a prefix strategy for large buckets
- Monitoring Redis and application logs helps identify bottlenecks
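Batch processing can be pictured with a small helper that groups any stream of object keys into fixed-size lists. This is a generic sketch of the technique, not InspectRAG's batching code; the function name and the batch size in the usage note are assumptions.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def batched(items: Iterable, size: int) -> Iterator[List]:
    """Yield consecutive lists of at most `size` items.

    Works on generators as well as lists, so a large bucket listing
    never has to be held in memory all at once.
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch
```

For example, `batched(keys, 100)` would hand the indexer 100 keys at a time. Combined with a prefix strategy, each prefix can be listed and batched independently, which keeps individual runs short on large buckets.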
Security Considerations
- Store AWS credentials securely as environment variables
- Use IAM roles with the minimum required permissions
- Set up appropriate bucket policies
- Consider encrypting sensitive files in S3 (using SSE)
- Implement appropriate access controls in InspectRAG
Related Resources
This guide provides comprehensive instructions for setting up and configuring InspectRAG's S3 bucket integration. Follow these steps carefully to ensure proper configuration and operation.