OpenSearch Integration
This guide shows how to set up OpenSearch 3.1+ and use it with govdoc-scanner to index company data and make it searchable.
Overview
The apps/opensearch/ directory provides complete OpenSearch integration with:
- Development environment: Quick local setup for testing
- Production environment: Secure, scalable deployment with authentication
- Index templates: Pre-configured mappings for company data
- Dashboard setup: Ready-to-use visualizations and index patterns
Prerequisites
- Docker and Docker Compose
- Node.js 18+ (20+ recommended)
Development Setup
For local development and testing:
1. Start Development Cluster
cd apps/opensearch/development
cp .env.template .env
# Edit .env with a strong password (8+ characters)
docker compose up -d
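The cluster may need some time before it accepts requests after docker compose up -d returns. The following is a minimal sketch of a readiness check, assuming the development cluster listens on https://localhost:9200 with a self-signed certificate and that the admin password is the one from your .env file. It polls the standard _cluster/health endpoint; the helper names health_status and wait_for_cluster are illustrative and not part of govdoc-scanner.

```shell
# Sketch only: helper names are hypothetical, not shipped with govdoc-scanner.

# health_status: read a _cluster/health JSON response on stdin and print
# the value of its "status" field (green, yellow, or red).
health_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p'
}

# wait_for_cluster URL USER PASSWORD [ATTEMPTS] [DELAY_SECONDS]
# Polls the health endpoint until it reports green or yellow.
wait_for_cluster() {
  local url="$1" user="$2" pass="$3" attempts="${4:-30}" delay="${5:-2}"
  local i=1 status
  while [ "$i" -le "$attempts" ]; do
    # -k skips certificate verification (assumed self-signed dev certificate).
    status="$(curl -sk -u "$user:$pass" "$url/_cluster/health" 2>/dev/null | health_status)"
    if [ "$status" = "green" ] || [ "$status" = "yellow" ]; then
      echo "cluster is $status"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "cluster did not become ready after $attempts attempts" >&2
  return 1
}
```

For example, wait_for_cluster https://localhost:9200 admin yourAdminPassword could be run before creating the index template. A single-node cluster often reports yellow rather than green, which is why both are accepted here.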
2. Configure Application
Update your root .env file:
OPENSEARCH_PUSH=true
OPENSEARCH_URL=https://localhost:9200
OPENSEARCH_USERNAME=admin
OPENSEARCH_PASSWORD=yourAdminPassword
OPENSEARCH_INSECURE=true
OPENSEARCH_INDEX=govdoc-companies-000001
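A missing or empty variable is easy to overlook, so a small pre-flight check can help. This is a sketch under the assumption that the root .env file contains only plain KEY=value lines that a shell can source; the function names load_env_file and require_env are hypothetical helpers, not part of govdoc-scanner.

```shell
# Sketch only: hypothetical helpers for checking the root .env file.

# load_env_file FILE: source FILE and export every variable it assigns.
# Assumes FILE contains only simple KEY=value lines.
load_env_file() {
  set -a
  . "$1"
  set +a
}

# require_env NAME...: return non-zero if any named variable is unset or empty,
# printing each missing name to stderr.
require_env() {
  local missing=0 name value
  for name in "$@"; do
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

A possible use is load_env_file ./.env followed by require_env OPENSEARCH_URL OPENSEARCH_USERNAME OPENSEARCH_PASSWORD OPENSEARCH_INDEX before running the scanner.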
3. Create Index Template
curl -k -u admin:yourAdminPassword -X PUT "https://localhost:9200/_index_template/govdoc-company-template" \
-H "Content-Type: application/json" \
-d @apps/opensearch/shared/templates/company-index-template.json
4. Create Initial Index
curl -k -u admin:yourAdminPassword -X PUT "https://localhost:9200/govdoc-companies-000001"
Verify setup:
# Check if template was created
curl -k -u admin:yourAdminPassword "https://localhost:9200/_index_template/govdoc-company-template?pretty"
# Check index mappings
curl -k -u admin:yourAdminPassword "https://localhost:9200/govdoc-companies-000001/_mapping?pretty"
5. Test Data Ingestion
npm start govdoc -- --input ./companies.gds --push
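To confirm that documents actually arrived, the index's document count can be read from the standard _count endpoint. The sketch below assumes the development URL and admin credentials used above; parse_count and index_doc_count are illustrative helper names.

```shell
# Sketch only: hypothetical helpers for checking ingestion results.

# parse_count: read a _count JSON response on stdin and print the number
# stored in its "count" field.
parse_count() {
  sed -n 's/.*"count"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# index_doc_count URL USER PASSWORD INDEX: print the document count of INDEX.
index_doc_count() {
  # -k skips certificate verification (assumed self-signed dev certificate).
  curl -sk -u "$2:$3" "$1/$4/_count" | parse_count
}
```

For example, index_doc_count https://localhost:9200 admin yourAdminPassword govdoc-companies-000001 should print a number greater than zero after a successful push.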
Access Dashboards
- URL: http://localhost:5601
- Username: admin
- Password: (from your .env file)
Create index patterns manually:
- Go to Discover -> Create Index Pattern
- Create pattern: govdoc-companies-*
- Set time field: scan_date
- Explore data in Discover tab
Shut down the development containers:
cd apps/opensearch/development
docker compose down
Reset development environment:
cd apps/opensearch/development
docker compose down --volumes --remove-orphans
Production Setup
For production deployments with security and monitoring:
Quick Setup
cd apps/opensearch/production
./setup-production.sh
This automatically:
- Generates secure passwords and certificates
- Creates security configuration (users, roles, mappings)
- Starts production OpenSearch cluster
- Initializes security configuration with proper authentication
- Creates test data to verify bulk operations work
Important: After setup completes, passwords are stored in apps/opensearch/production/.env. Copy the govdoc_ingest password to your root .env file.
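Copying the password by hand is error-prone, so the step can be scripted. This is a sketch only: it assumes both files use plain KEY=value lines, and the key name GOVDOC_INGEST_PASSWORD used in the example is a guess, so check the generated apps/opensearch/production/.env for the actual name. The helpers env_get and env_set are hypothetical.

```shell
# Sketch only: hypothetical helpers for moving a value between .env files.

# env_get FILE KEY: print the value assigned to KEY in a KEY=value file
# (the last assignment wins if KEY appears more than once).
env_get() {
  sed -n "s/^$2=//p" "$1" | tail -n 1
}

# env_set FILE KEY VALUE: replace the line assigning KEY in FILE,
# or append one if KEY is not present yet.
env_set() {
  local file="$1" key="$2" value="$3" tmp
  [ -f "$file" ] || : > "$file"
  tmp="$(mktemp)"
  # Keep every line except an existing assignment of KEY.
  grep -v "^$key=" "$file" > "$tmp" || true
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  mv "$tmp" "$file"
}
```

A possible use, run from the repository root: env_set .env OPENSEARCH_PASSWORD "$(env_get apps/opensearch/production/.env GOVDOC_INGEST_PASSWORD)". Because the rewritten file comes from mktemp, it ends up readable only by its owner.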
Manual Setup
For step-by-step control:
cd apps/opensearch/production
# Step 1: Run security setup (creates .env file automatically)
./scripts/setup-security.sh
# Step 2: Start production cluster
docker compose -f docker-compose.prod.yml up -d
# Step 3: Initialize security configuration (loads YAML files into OpenSearch)
./scripts/initialize-security.sh
# Step 4: Initialize indices and templates
./scripts/initialize-cluster.sh
# Step 5: Setup dashboards
./scripts/setup-dashboards.sh
Security Note: Production uses a dedicated govdoc_ingest user with minimal permissions (only bulk write access to govdoc-companies-* indices). Admin credentials are separate and should be stored securely.
Configure your application by copying the generated values from apps/opensearch/production/.env into the root .env file:
OPENSEARCH_URL=https://localhost:9200
OPENSEARCH_USERNAME=govdoc_ingest
OPENSEARCH_PASSWORD=govdoc_ingest_password
OPENSEARCH_INDEX=govdoc-companies-write
OPENSEARCH_BATCH_SIZE=500
# Set to false when using proper certificates
OPENSEARCH_INSECURE=true
Shut down the production containers:
cd apps/opensearch/production
docker compose -f docker-compose.prod.yml down
Reset production environment:
cd apps/opensearch/production
./cleanup-production.sh
Production Maintenance
Health Monitoring
Check cluster health and status:
cd apps/opensearch/production
./scripts/health-check.sh
This script monitors:
- Cluster status (green/yellow/red) with shard distribution
- Node health, heap memory usage, and JVM statistics
- Index statistics and document counts for govdoc-companies-* indices
- Disk usage (both container and host)
- Recent snapshot status and backup health
- Security configuration (HTTPS and authentication status)
Data Backup
Create backups of your data:
cd apps/opensearch/production
./scripts/backup.sh
Features:
- Creates timestamped snapshots (govdoc-daily-YYYYMMDD-HHMMSS)
- Backs up govdoc-companies-* indices with metadata
- Automatic cleanup of old snapshots (30-day retention)
- Repository verification and integrity checks
- Progress monitoring and detailed reporting
- Support for --list-only, --cleanup-only, --verify-only options
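For unattended runs (for example from a scheduler), it can be useful to keep a log of each backup and preserve the script's exit status. The wrapper below is a sketch that assumes backup.sh reports failure through a non-zero exit code; run_backup is a hypothetical helper, and the log location is up to you.

```shell
# Sketch only: hypothetical wrapper around scripts/backup.sh.

# run_backup PROD_DIR LOG_FILE: run scripts/backup.sh from PROD_DIR,
# append its output to LOG_FILE, and return the script's exit status.
run_backup() {
  local prod_dir="$1" log_file="$2" status=0
  echo "=== backup started $(date -u +%Y-%m-%dT%H:%M:%SZ) ===" >> "$log_file"
  # The subshell keeps the directory change local to this command.
  (cd "$prod_dir" && ./scripts/backup.sh) >> "$log_file" 2>&1 || status=$?
  echo "=== backup finished with status $status ===" >> "$log_file"
  return "$status"
}
```

For example, run_backup apps/opensearch/production /var/log/govdoc-backup.log, where the log path is only an illustration.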
Access Production Dashboards
- URL: http://localhost:5601
- Username: admin
- Password: (shown after setup completion)
Index patterns are automatically created. You can:
- Explore data in Discover
- Create visualizations in Visualize
- Build dashboards in Dashboard
- Monitor health in Stack Management
Troubleshooting
OpenSearch Production Startup Issues
If the production OpenSearch cluster fails to start, check the logs first:
cd apps/opensearch/production
docker compose -f docker-compose.prod.yml logs opensearch
Common Issues:
- Insufficient Memory: The most common issue is insufficient RAM allocation
  - Solution: Reduce memory settings from 4GB to 2GB in docker-compose.prod.yml:
    environment:
      - "OPENSEARCH_JAVA_OPTS=-Xms2g -Xmx2g"
- Port Conflicts: Ports 9200 or 5601 already in use
  - Check: sudo netstat -tulpn | grep :9200
  - Solution: Stop conflicting services or change ports in docker-compose
- Permission Issues: Container cannot write to mounted volumes
  - Solution: Fix ownership: sudo chown -R 1000:1000 apps/opensearch/production/
Environment Differences
| Feature | Development | Production |
|---|---|---|
| Memory | 512MB heap | 4GB heap |
| Security | Basic auth | Full TLS + RBAC |
| Persistence | Docker volumes | Named volumes + backup |
| Monitoring | Basic health checks | Health checks + monitoring |
| Certificates | Auto-generated | Demo certs |
CLI Integration
Interactive Mode
npm start govdoc
# Automatically pushes if OPENSEARCH_PUSH=true
Command Mode with Flags
# Development
npm start govdoc -- --input ./companies.gds \
--push \
--os.endpoint https://localhost:9200 \
--os.username admin \
--os.password yourDevPassword \
--os.index govdoc-companies-000001 \
--os.insecure \
--os.batch-size 500
# Production
npm start govdoc -- --input ./companies.gds \
--push \
--os.endpoint https://localhost:9200 \
--os.username govdoc_ingest \
--os.password yourProdPassword \
--os.index govdoc-companies-write \
--os.insecure \
--os.batch-size 500
Data Model
The index template (apps/opensearch/shared/templates/company-index-template.json) defines:
- Index pattern: govdoc-companies-*
- Dynamic mapping: false (fields not defined in the template are not indexed)
- Document structure: One document per company (gemi_id)
Key Fields:
- gemi_id, company_tax_id (keyword)
- company_name (text + keyword subfield)
- creation_date, scan_date, document_date (date)
- representatives (nested array)
- tracked_changes_history (nested array with company_changes, economic_changes per document)
Query Examples
Search by company name:
curl -k -u admin:yourPassword -X POST "https://localhost:9200/govdoc-companies-000001/_search" \
-H "Content-Type: application/json" \
-d '{
"query": {
"match": { "company_name": "ΤΕΧΝΙΚΗ" }
}
}'
Filter by region and aggregate cities:
curl -k -u admin:yourPassword -X POST "https://localhost:9200/govdoc-companies-000001/_search" \
-H "Content-Type: application/json" \
-d '{
"size": 0,
"query": {
"term": { "region": "ΑΤΤΙΚΗΣ" }
},
"aggs": {
"cities": { "terms": { "field": "city" } }
}
}'
Find active representatives:
curl -k -u admin:yourPassword -X POST "https://localhost:9200/govdoc-companies-000001/_search" \
-H "Content-Type: application/json" \
-d '{
"query": {
"nested": {
"path": "representatives",
"query": {
"bool": {
"must": [
{ "match": { "representatives.name": "ΓΕΩΡΓΙΟΣ" } },
{ "term": { "representatives.is_active": true } }
]
}
}
}
}
}'
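Filter by scan date: because scan_date is mapped as a date, a range query can restrict results to a scan window. The sketch below builds the request body; it assumes the template accepts dates in YYYY-MM-DD form (the default date format does, but a custom format in the template would change this). The function name scan_date_range_query is illustrative.

```shell
# Sketch only: hypothetical helper that prints a search request body.

# scan_date_range_query FROM TO: print a query matching documents whose
# scan_date lies between FROM and TO inclusive (dates as YYYY-MM-DD).
scan_date_range_query() {
  printf '{"query":{"range":{"scan_date":{"gte":"%s","lte":"%s"}}}}\n' "$1" "$2"
}

# Example use (commented out so that sourcing this file sends no request):
# curl -k -u admin:yourPassword -X POST "https://localhost:9200/govdoc-companies-000001/_search" \
#   -H "Content-Type: application/json" \
#   -d "$(scan_date_range_query 2025-01-01 2025-12-31)"
```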
Directory Structure
apps/opensearch/
├── README.md                            # Quick start guide
├── development/                         # Development environment
│   ├── docker-compose.yml               # Dev Docker Compose
│   └── .env.template                    # Environment template
├── production/                          # Production environment
│   ├── docker-compose.prod.yml          # Production Docker Compose
│   ├── setup-production.sh              # One-click setup script
│   ├── cleanup-production.sh            # Reset script
│   ├── config/                          # OpenSearch configuration
│   └── scripts/                         # Setup automation scripts
└── shared/                              # Shared resources
    └── templates/                       # Index templates
        └── company-index-template.json  # Company data mapping