Vector Storage Providers 
Vector Storage Providers determine where and how embeddings (vector representations) of your datasource content are stored. Cognipeer AI supports multiple vector storage backends, allowing you to choose the best solution for your use case.
Overview 
When you add content to a datasource, Cognipeer AI:
- Generates Embeddings: Converts text into vector representations using AI models
- Stores Vectors: Saves these vectors in your chosen vector storage provider
- Enables Search: Allows semantic search across your content using vector similarity
Different providers offer different features, performance characteristics, and pricing models.
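The search step can be pictured with a plain cosine-similarity scan — a minimal conceptual sketch only, since real providers use optimized approximate-nearest-neighbor indexes rather than comparing the query against every stored vector:

```javascript
// Cosine similarity between two embedding vectors.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Brute-force semantic search: rank stored vectors by similarity to the query.
function searchByVector(queryVector, documents, topK = 3) {
  return documents
    .map(doc => ({ ...doc, score: cosineSimilarity(queryVector, doc.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}
```

A dedicated vector database performs the same ranking, but over millions of vectors with sub-linear lookup.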
Available Providers 
System Default 
Best for: Getting started, testing, small-scale deployments
The System Default provider uses Cognipeer's built-in vector storage infrastructure.
Features:
- ✅ Zero configuration required
- ✅ Automatic setup and management
- ✅ No external services needed
- ✅ Included in your Cognipeer plan
Limitations:
- Limited to Cognipeer's infrastructure
- May have storage quotas based on your plan
- Not suitable for very large datasets (>100k documents)
When to Use:
- Prototyping and testing
- Small to medium datasources
- Simplified deployment without external dependencies
Configuration:
No configuration needed - this is the default option.
Pinecone 
Best for: Production deployments, large-scale applications, high performance
Pinecone is a fully managed vector database service optimized for similarity search.
Features:
- ✅ Serverless vector database
- ✅ High performance and low latency
- ✅ Automatic scaling
- ✅ Advanced filtering capabilities
- ✅ Real-time updates
Requirements:
- Pinecone account
- API key
- Environment name
- Index name
When to Use:
- Production applications with high query volumes
- Large datasources (100k+ documents)
- Need for advanced filtering and metadata search
- Mission-critical applications requiring high availability
Configuration:
- Create a Pinecone account at pinecone.io
- Create an index in your Pinecone dashboard
- Get your API key and environment name
- Configure in datasource settings:
```json
{
  "provider": "pinecone",
  "config": {
    "apiKey": "your-pinecone-api-key",
    "environment": "us-west1-gcp",
    "indexName": "cognipeer-datasource",
    "namespace": "production"
  }
}
```
Best Practices:
- Use separate indexes for different environments (dev/staging/prod)
- Set appropriate dimensions based on your embedding model
- Use namespaces to logically separate different datasources
- Monitor usage through Pinecone dashboard
Pricing:
- Pay-as-you-go based on usage
- Free tier available for testing
- See Pinecone pricing
Qdrant 
Best for: Self-hosted deployments, privacy-focused applications, cost optimization
Qdrant is an open-source vector search engine that can be self-hosted or used as a managed service.
Features:
- ✅ Open source and self-hostable
- ✅ Advanced filtering and payload indexing
- ✅ High performance
- ✅ Rich query capabilities
- ✅ Privacy control (self-hosted)
Requirements:
- Qdrant instance (self-hosted or cloud)
- API endpoint URL
- API key (optional for self-hosted)
- Collection name
When to Use:
- Need full control over data location
- Privacy and compliance requirements
- Cost optimization for large datasets
- On-premise deployments
Configuration:
Self-hosted:
```json
{
  "provider": "qdrant",
  "config": {
    "url": "http://your-qdrant-instance:6333",
    "collectionName": "cognipeer_vectors",
    "apiKey": null
  }
}
```
Qdrant Cloud:
```json
{
  "provider": "qdrant",
  "config": {
    "url": "https://xyz.qdrant.cloud",
    "collectionName": "cognipeer_vectors",
    "apiKey": "your-qdrant-api-key"
  }
}
```
Setup Instructions:
- Self-hosted: Deploy Qdrant using Docker:
```bash
docker run -p 6333:6333 qdrant/qdrant
```
- Qdrant Cloud: Sign up at qdrant.tech
- Create a collection with appropriate vector dimensions
- Configure in datasource settings
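The collection-creation step maps to a single call against Qdrant's REST API (`PUT /collections/{name}` with a vectors config). The sketch below only builds the request; the instance URL and dimension size are placeholders you would replace with your own values:

```javascript
// Build a Qdrant create-collection request for a given vector dimension.
function buildCreateCollectionRequest(baseUrl, collectionName, dimensions) {
  return {
    method: 'PUT',
    url: `${baseUrl}/collections/${collectionName}`,
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      vectors: { size: dimensions, distance: 'Cosine' }
    })
  };
}

// Example matching the self-hosted config above; send with fetch() or curl.
const req = buildCreateCollectionRequest(
  'http://your-qdrant-instance:6333', 'cognipeer_vectors', 1536
);
```

The dimension (1536 here) must match the embedding model used by your datasource.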
Best Practices:
- Use Docker Compose for production self-hosted deployments
- Set appropriate resource limits (memory, CPU)
- Enable authentication for production environments
- Regular backups for self-hosted instances
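For the Docker Compose recommendation, a minimal sketch might look like the following (image tag, ports, volume path, and memory limit are illustrative; tune them to your host):

```yaml
services:
  qdrant:
    image: qdrant/qdrant:latest
    ports:
      - "6333:6333"
    volumes:
      - ./qdrant_storage:/qdrant/storage
    deploy:
      resources:
        limits:
          memory: 4g
    restart: unless-stopped
```

The mounted volume keeps collections on disk across container restarts, which also simplifies backups.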
Pricing:
- Self-hosted: Infrastructure costs only
- Qdrant Cloud: See Qdrant pricing
Elasticsearch 
Best for: Hybrid search (text + vector), existing Elasticsearch infrastructure, production deployments
Elasticsearch provides powerful vector search capabilities combined with traditional full-text search, making it ideal for hybrid search scenarios.
Features:
- ✅ Hybrid search (dense vectors + full-text)
- ✅ Advanced filtering and aggregations
- ✅ Rich query DSL
- ✅ KNN (k-nearest neighbors) search
- ✅ Production-ready with high availability
- ✅ Supports both cloud and self-hosted deployments
Deployment Options:
Cognipeer AI supports three Elasticsearch configurations:
1. Elasticsearch Cloud (Managed) 
Best for: Production use, minimal ops overhead
- Fully managed by Elastic
- Automatic scaling and updates
- Built-in monitoring and alerts
- API key authentication
- Cloud ID-based connection
Requirements:
- Elastic Cloud account
- Cloud ID
- API key or username/password
Configuration:
```json
{
  "provider": "elasticsearch-cloud",
  "config": {
    "cloudId": "deployment:dXMtZWFzdC0xLmF3cy5mb3VuZC5pbyQ...",
    "apiKey": "your-api-key",
    "indexName": "cognipeer-vectors",
    "dimensions": 1536
  }
}
```
Setup Instructions:
- Create an Elastic Cloud deployment at cloud.elastic.co
- Get your Cloud ID from the deployment details
- Generate an API key:
  - Navigate to Stack Management → API Keys
  - Create a new API key with appropriate permissions
- Configure in Cognipeer datasource settings
2. Elasticsearch Self-Hosted 
Best for: On-premise deployments, full control, cost optimization
- Deploy on your own infrastructure
- Full control over configuration
- No vendor lock-in
- Direct node URL connection
Requirements:
- Elasticsearch 8.0+ cluster
- Node URL (http://hostname:port)
- Username/password or API key
Configuration:
```json
{
  "provider": "elasticsearch-self-hosted",
  "config": {
    "node": "http://localhost:9200",
    "username": "elastic",
    "password": "your-password",
    "indexName": "cognipeer-vectors",
    "dimensions": 1536
  }
}
```
With API Key:
```json
{
  "provider": "elasticsearch-self-hosted",
  "config": {
    "node": "http://localhost:9200",
    "apiKey": "your-api-key",
    "indexName": "cognipeer-vectors",
    "dimensions": 1536
  }
}
```
Setup Instructions:
- Deploy Elasticsearch 8.0+ using Docker:
```bash
docker run -d \
  --name elasticsearch \
  -p 9200:9200 \
  -e "discovery.type=single-node" \
  -e "xpack.security.enabled=false" \
  docker.elastic.co/elasticsearch/elasticsearch:8.11.0
```
- For production with security:
```bash
docker run -d \
  --name elasticsearch \
  -p 9200:9200 \
  -e "discovery.type=single-node" \
  -e "ELASTIC_PASSWORD=your-password" \
  docker.elastic.co/elasticsearch/elasticsearch:8.11.0
```
- Verify cluster health:
```bash
curl http://localhost:9200/_cluster/health
```
- Configure in Cognipeer datasource settings
3. Elasticsearch (Legacy) 
The legacy Elasticsearch provider is maintained for backward compatibility. New implementations should use either elasticsearch-cloud or elasticsearch-self-hosted.
When to Use:
- Already-configured Elasticsearch instances on existing infrastructure
- Need for both Cloud ID and node URL support in the same provider
Configuration:
```json
{
  "provider": "elasticsearch",
  "config": {
    "node": "http://localhost:9200",
    "cloudId": "optional-cloud-id",
    "username": "elastic",
    "password": "your-password",
    "indexName": "cognipeer-vectors",
    "dimensions": 1536
  }
}
```
Elasticsearch Best Practices 
Index Configuration:
- Set appropriate number of shards based on data size
- Configure replica shards for high availability
- Use index templates for consistent mapping
Security:
- Always enable authentication in production
- Use API keys instead of basic auth when possible
- Implement IP filtering and TLS/SSL
- Regular security updates
Performance:
- Adjust num_candidates for KNN search quality vs. speed
- Use field filtering before vector search for better performance
- Configure appropriate shard count
- Monitor query latency and adjust settings
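As an example, an Elasticsearch KNN query can apply a metadata filter before the vector comparison and trade recall for speed via num_candidates. The field names below are illustrative, not Cognipeer's actual mapping:

```json
{
  "knn": {
    "field": "embedding",
    "query_vector": [0.12, -0.08, 0.33],
    "k": 10,
    "num_candidates": 100,
    "filter": {
      "term": { "metadata.language": "en" }
    }
  },
  "_source": ["content", "metadata"]
}
```

Raising num_candidates improves recall at the cost of latency; the filter restricts the candidate set before similarity scoring.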
Monitoring:
- Use Elasticsearch monitoring features
- Set up alerts for cluster health
- Track index size and growth
- Monitor query performance
Cost Optimization:
- Use hot-warm-cold architecture for data lifecycle
- Configure index lifecycle management (ILM)
- Right-size your cluster based on usage
- Consider storage tiers for older data
Elasticsearch Limitations 
- Requires Elasticsearch 8.0+ for dense_vector support
- KNN search may be slower than specialized vector databases for very large datasets
- Memory requirements can be high for large vector indices
- Advanced features may require Enterprise license
Pricing:
- Self-hosted: Infrastructure costs only
- Elastic Cloud: See Elastic Cloud pricing
Orama 
Best for: Edge deployments, embedded search, browser-based applications
Orama is a fast, in-memory search engine that can run in browsers and edge environments.
Features:
- ✅ Extremely fast in-memory search
- ✅ Works in browser and Node.js
- ✅ Lightweight and portable
- ✅ Typo-tolerance and fuzzy search
- ✅ No external infrastructure needed
Requirements:
- Configuration for persistence (optional)
- Index configuration
When to Use:
- Edge computing scenarios
- Browser-based search applications
- Low-latency requirements
- Embedded search in applications
- Small to medium datasets that fit in memory
Configuration:
```json
{
  "provider": "orama",
  "config": {
    "persistencePath": "/path/to/storage",
    "schema": {
      "id": "string",
      "content": "string",
      "metadata": "object"
    }
  }
}
```
Best Practices:
- Monitor memory usage for large datasets
- Implement periodic persistence for data durability
- Use for read-heavy workloads
- Consider hybrid approach with server-side indexing
Limitations:
- Dataset size limited by available memory
- Not suitable for very large datasources
- Requires re-indexing on restarts (unless persisted)
S3 Vectors 
Best for: Cost-effective storage, archival, batch processing
S3 Vectors stores vector embeddings in Amazon S3 or S3-compatible storage.
Features:
- ✅ Low-cost storage
- ✅ Unlimited scalability
- ✅ Works with AWS S3, MinIO, DigitalOcean Spaces
- ✅ Good for batch processing
- ✅ Archival and backup
Requirements:
- S3-compatible storage account
- Access key and secret key
- Bucket name
- Region
When to Use:
- Cost optimization for large static datasets
- Archival storage
- Batch processing workflows
- Backup and disaster recovery
Configuration:
AWS S3:
```json
{
  "provider": "s3-vectors",
  "config": {
    "accessKeyId": "your-access-key",
    "secretAccessKey": "your-secret-key",
    "bucket": "cognipeer-vectors",
    "region": "us-east-1",
    "endpoint": null
  }
}
```
MinIO or DigitalOcean Spaces:
```json
{
  "provider": "s3-vectors",
  "config": {
    "accessKeyId": "your-access-key",
    "secretAccessKey": "your-secret-key",
    "bucket": "cognipeer-vectors",
    "region": "us-east-1",
    "endpoint": "https://your-minio-instance.com"
  }
}
```
Best Practices:
- Use S3 lifecycle policies for cost optimization
- Enable versioning for data protection
- Set up bucket policies for access control
- Use S3 Select for efficient querying
- Consider S3 Intelligent-Tiering for automatic cost optimization
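A lifecycle rule for the vector bucket might look like this sketch (the prefix and transition age are placeholders; this is the standard S3 lifecycle configuration shape, not a Cognipeer-specific format):

```json
{
  "Rules": [
    {
      "ID": "archive-old-vectors",
      "Filter": { "Prefix": "vectors/" },
      "Status": "Enabled",
      "Transitions": [
        { "Days": 90, "StorageClass": "GLACIER" }
      ]
    }
  ]
}
```

Objects under the prefix move to cheaper archival storage after 90 days, which suits rarely queried or backup vectors.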
Limitations:
- Higher latency than dedicated vector databases
- Not optimized for real-time search
- Requires additional indexing layer for fast retrieval
Pricing:
- AWS S3: Pay for storage and requests
- See AWS S3 pricing
Choosing a Provider 
Decision Matrix 
| Provider | Best For | Performance | Cost | Complexity | Scale | 
|---|---|---|---|---|---|
| System Default | Testing, small apps | Medium | Included | None | Small-Medium | 
| Pinecone | Production, high scale | Very High | Medium-High | Low | Very Large | 
| Qdrant | Self-hosted, privacy | High | Low (self-hosted) | Medium | Large | 
| Elasticsearch Cloud | Hybrid search, managed | Very High | Medium | Low | Very Large | 
| Elasticsearch Self-Hosted | Hybrid search, control | High | Low | Medium-High | Large | 
| Orama | Edge, in-memory | Very High | None | Low | Small-Medium | 
| S3 Vectors | Cost optimization, archival | Low | Very Low | Medium | Unlimited | 
Use Case Recommendations 
Startup/MVP 
Recommended: System Default or Qdrant Cloud
- Quick to set up
- Scales with growth
- Easy migration path later
Enterprise Production 
Recommended: Pinecone or Self-hosted Qdrant
- High availability
- Advanced features
- Production support
Privacy/Compliance 
Recommended: Self-hosted Qdrant
- Full data control
- On-premise deployment
- Audit capabilities
Cost-Sensitive 
Recommended: S3 Vectors or Self-hosted Qdrant
- Low storage costs
- Pay only for infrastructure
- No vendor lock-in
Real-time Applications 
Recommended: Pinecone or Orama
- Low latency
- High throughput
- Optimized for search
Configuring Vector Providers 
In Datasource Settings 
- Navigate to Datasource → Settings
- Find Vector Storage section
- Select your preferred provider
- Enter required credentials and configuration
- Click Test Connection to verify
- Save settings
Via API 
```
POST /api/v1/datasource/:datasourceId/vector-provider

{
  "provider": "pinecone",
  "config": {
    "apiKey": "your-api-key",
    "environment": "us-west1-gcp",
    "indexName": "cognipeer-datasource"
  }
}
```
Migration Between Providers 
Migration Process 
1. Backup Current Data
   - Export your datasource content
   - Save current vector configurations
2. Configure New Provider
   - Set up the new vector storage provider
   - Test connection and configuration
3. Re-index Content
   - Trigger re-indexing of all content
   - Monitor progress in the datasource dashboard
4. Verify Search Quality
   - Test search queries
   - Compare results with the previous provider
5. Update Production
   - Switch the datasource to the new provider
   - Monitor performance and errors
Migration Example 
```javascript
// 1. Get current datasource
const datasource = await api.getDatasource(datasourceId);

// 2. Update vector provider
await api.updateDatasourceVectorProvider(datasourceId, {
  provider: 'qdrant',
  config: {
    url: 'https://your-qdrant.cloud',
    collectionName: 'cognipeer_vectors',
    apiKey: 'your-key'
  }
});

// 3. Trigger re-indexing
await api.reindexDatasource(datasourceId);

// 4. Monitor progress
const status = await api.getDatasourceIndexingStatus(datasourceId);
console.log(`Progress: ${status.progress}%`);
```
Hybrid Search 
Cognipeer AI supports Hybrid Search, combining:
- Vector Search: Semantic similarity using embeddings
- Keyword Search: Traditional text-based search
- Filtering: Metadata-based filtering
Hybrid Search Configuration 
Enable hybrid search in datasource settings:
```json
{
  "searchConfig": {
    "hybridSearch": {
      "enabled": true,
      "vectorWeight": 0.7,
      "keywordWeight": 0.3,
      "useReranking": true
    }
  }
}
```
Parameters:
- vectorWeight: Weight for semantic search (0.0 - 1.0)
- keywordWeight: Weight for keyword search (0.0 - 1.0)
- useReranking: Apply AI-powered result reranking
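The weighting can be pictured as a linear blend of the two normalized scores — a conceptual sketch only, since Cognipeer's actual fusion and reranking logic is internal:

```javascript
// Blend normalized vector and keyword scores using the configured weights.
function hybridScore(vectorScore, keywordScore, config) {
  const { vectorWeight, keywordWeight } = config;
  return vectorWeight * vectorScore + keywordWeight * keywordScore;
}

// With vectorWeight 0.7 / keywordWeight 0.3, a strong semantic match
// outranks a strong keyword match.
const config = { vectorWeight: 0.7, keywordWeight: 0.3 };
const semanticHit = hybridScore(0.9, 0.2, config);
const keywordHit = hybridScore(0.2, 0.9, config);
```

Shifting weight toward keywordWeight would reverse that ordering for exact-term-heavy content.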
Best Practices:
- Adjust weights based on your content type
- Use higher vector weight for conceptual searches
- Use higher keyword weight for exact term matching
- Enable reranking for improved relevance
Performance Optimization 
Indexing Performance 
- Batch Processing: Index content in batches
- Parallel Indexing: Use multiple workers
- Incremental Updates: Only re-index changed content
- Dimension Reduction: Use smaller embedding dimensions if appropriate
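Batch processing can be as simple as chunking documents before sending them to the provider. In this sketch the batch size and the upsert call are illustrative, not a documented Cognipeer API:

```javascript
// Split documents into fixed-size batches for bulk indexing.
function toBatches(items, batchSize = 100) {
  const batches = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

// Usage: one provider call per batch instead of one call per document.
// for (const batch of toBatches(documents, 100)) {
//   await provider.upsert(batch); // hypothetical provider client
// }
```

Most vector databases accept bulk upserts, so batching cuts per-request overhead substantially.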
Query Performance 
- Caching: Cache frequent queries
- Pre-filtering: Filter by metadata before vector search
- Result Limiting: Request only needed number of results
- Connection Pooling: Reuse connections to vector databases
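Query caching can be sketched with a small in-memory map keyed by query text; the TTL value and key scheme here are illustrative:

```javascript
// Minimal TTL cache for search results.
class QueryCache {
  constructor(ttlMs = 60000) {
    this.ttlMs = ttlMs;
    this.entries = new Map();
  }
  get(query) {
    const entry = this.entries.get(query);
    if (!entry) return undefined;
    if (Date.now() - entry.storedAt > this.ttlMs) {
      this.entries.delete(query); // expired entry
      return undefined;
    }
    return entry.results;
  }
  set(query, results) {
    this.entries.set(query, { results, storedAt: Date.now() });
  }
}
```

Check the cache before hitting the vector provider and store results afterwards; for multi-process deployments, a shared cache such as Redis serves the same role.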
Monitoring 
Track these metrics:
- Index Size: Total vectors stored
- Query Latency: Average search response time
- Index Update Time: Time to index new content
- Storage Usage: Disk/memory consumption
- Error Rate: Failed indexing or query operations
Troubleshooting 
Common Issues 
Slow Search Performance 
Symptoms: Queries take too long to return results
Solutions:
- Check vector provider latency
- Reduce result limit
- Add metadata filters
- Consider upgrading provider tier
- Enable query caching
Indexing Failures 
Symptoms: Content not appearing in search
Solutions:
- Verify provider credentials
- Check provider quota/limits
- Review error logs
- Retry failed documents
- Verify network connectivity
Poor Search Quality 
Symptoms: Irrelevant results returned
Solutions:
- Adjust hybrid search weights
- Enable reranking
- Use better embedding models
- Add more context to queries
- Improve content quality and metadata
Connection Errors 
Symptoms: Cannot connect to vector provider
Solutions:
- Verify credentials and endpoint URL
- Check firewall/network settings
- Confirm provider service status
- Test with provider's native tools
- Review provider documentation
Security Best Practices 
1. Credentials Management
   - Use environment variables for API keys
   - Rotate credentials regularly
   - Never commit credentials to version control
2. Access Control
   - Use the provider's built-in access controls
   - Implement the least privilege principle
   - Separate dev/staging/prod credentials
3. Data Encryption
   - Enable encryption at rest (provider-side)
   - Use TLS/SSL for data in transit
   - Consider client-side encryption for sensitive data
4. Monitoring
   - Track unusual access patterns
   - Set up alerts for errors and anomalies
   - Regularly audit access logs
Cost Optimization 
Tips for Reducing Costs 
1. Right-size Your Provider
   - Start with System Default for testing
   - Upgrade only when needed
   - Consider self-hosted options for high volume
2. Optimize Indexing
   - Deduplicate content before indexing
   - Use smaller embedding dimensions
   - Batch updates instead of real-time
3. Efficient Querying
   - Implement a caching layer
   - Use pre-filtering to reduce vector comparisons
   - Limit result sets appropriately
4. Storage Management
   - Remove obsolete vectors
   - Use the provider's storage optimization features
   - Archive old content
Related Documentation 
- Datasource Management - Complete datasource guide
- Hybrid Search Configuration - Advanced search setup
- Peer Settings - Configuring search in peers
Summary 
Vector Storage Providers are a critical component of Cognipeer AI's search capabilities. Choosing the right provider depends on your:
- Scale and performance requirements
- Budget constraints
- Privacy and compliance needs
- Technical infrastructure
Quick Recommendations:
- 🚀 Getting Started: Use System Default
- 🏢 Production: Use Pinecone or Qdrant Cloud
- 🔒 Privacy/On-premise: Self-host Qdrant
- 💰 Cost-Sensitive: Self-host Qdrant or use S3 Vectors
- ⚡ Low Latency: Use Pinecone or Orama
Start simple, monitor performance, and scale/migrate as needed. All providers support seamless migration paths.

