
CLI Examples

This section provides simple examples of using the GovDoc Scanner CLI tool, which is the recommended way to use the project.

The CLI tool runs the complete end-to-end workflow, crawling and then processing documents automatically, and offers both interactive and command-line modes.

Basic Usage

Interactive Mode (Recommended):

npm start govdoc
# Follow interactive prompts:
# 1. Choose input method (file, manual, random)
# 2. Enter or select companies
# 3. Confirm processing
# 4. Watch automated crawling and processing

Command Line Mode for Automation:

# Process from file
npm start govdoc -- --input ./companies.gds

# Process random companies
npm start govdoc -- --company-random 10

# Show help
npm start govdoc -- --help
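
When the CLI is driven from a script, it can help to build the argument list in one place. The sketch below maps the two non-interactive modes shown above to an `npm` argument array. `buildGovdocArgs` and its option names are hypothetical helpers for illustration, not part of GovDoc Scanner.

```javascript
// Hypothetical helper: builds the npm argument list for the two
// non-interactive modes shown above. Not part of GovDoc Scanner itself.
function buildGovdocArgs({ input, randomCount } = {}) {
  const hasInput = typeof input === "string" && input.length > 0;
  const hasRandom = randomCount !== undefined;

  // The two modes are treated as mutually exclusive in this sketch
  if (hasInput === hasRandom) {
    throw new Error("Pass exactly one of `input` or `randomCount`");
  }
  if (hasRandom && (!Number.isInteger(randomCount) || randomCount < 1)) {
    throw new Error("`randomCount` must be a positive integer");
  }

  const base = ["start", "govdoc", "--"];
  return hasInput
    ? [...base, "--input", input]
    : [...base, "--company-random", String(randomCount)];
}

console.log(buildGovdocArgs({ input: "./companies.gds" }).join(" "));
// start govdoc -- --input ./companies.gds
console.log(buildGovdocArgs({ randomCount: 10 }).join(" "));
// start govdoc -- --company-random 10
```

The returned array can be passed as the second argument to `spawn("npm", ...)`, which avoids assembling a shell string by hand.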

Programmatic Usage

Simple Integration:

import { spawn } from "child_process";
import { createLogger } from "./shared/logging/index.mjs";
import { validateConfig, validateApiKey } from "./shared/config/validator.mjs";

const logger = createLogger("GOVDOC-INTEGRATION");

async function processCompanies(inputFilePath) {
  try {
    // Validate configuration before starting
    validateConfig();
    const apiResult = await validateApiKey();
    if (!apiResult.ok) {
      logger.error(`API validation failed: ${apiResult.reason}`);
      throw new Error(`API validation failed: ${apiResult.reason}`);
    }

    logger.info("Starting GovDoc processing...");

    return new Promise((resolve, reject) => {
      const govdoc = spawn("npm", [
        "start",
        "govdoc",
        "--",
        "--input",
        inputFilePath, // JSON file with an array of GEMI IDs, e.g. ["152034008000","175175703000"]
      ]);

      // Without this handler, a spawn failure (e.g. npm not on PATH) would
      // emit an unhandled "error" event and crash the calling process
      govdoc.on("error", (error) => {
        logger.error("Failed to spawn GovDoc process", error);
        reject(error);
      });

      govdoc.stdout.on("data", (data) => {
        logger.debug(`GovDoc output: ${data}`);
      });

      govdoc.stderr.on("data", (data) => {
        logger.warn(`GovDoc stderr: ${data}`);
      });

      // Exit code 0 means success; anything else rejects the promise
      govdoc.on("close", (code) => {
        if (code === 0) {
          logger.info("✅ GovDoc processing completed successfully");
          resolve();
        } else {
          logger.error(`❌ GovDoc processing failed with exit code: ${code}`);
          reject(new Error(`Failed with exit code: ${code}`));
        }
      });
    });
  } catch (error) {
    // Reached only for validation failures; errors from the child process
    // surface through the rejected promise above
    logger.error("Failed to start GovDoc processing", error);
    throw error;
  }
}
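
The comment on `inputFilePath` describes the input as a JSON file containing an array of GEMI IDs. A minimal sketch for producing that content follows. `toInputJson` is a hypothetical helper, and the 12-digit check is an assumption inferred only from the example IDs on this page, not a documented rule.

```javascript
// Hypothetical helper: serializes GEMI IDs into the JSON array form
// described in the comment on inputFilePath above.
// Assumption: IDs are 12-digit strings, inferred from this page's examples.
function toInputJson(ids) {
  const seen = new Set();
  for (const raw of ids) {
    const id = String(raw).trim();
    if (!/^\d{12}$/.test(id)) {
      throw new Error(`Not a 12-digit GEMI ID: ${JSON.stringify(raw)}`);
    }
    seen.add(id); // a Set drops duplicates while keeping first-seen order
  }
  return JSON.stringify([...seen]);
}

console.log(toInputJson(["152034008000", "175175703000", "152034008000"]));
// ["152034008000","175175703000"]
```

The resulting string can be written to disk with Node's `fs.writeFileSync` and the file path then passed to `processCompanies`.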

Output Structure

After processing, you'll find results in the output/ directory:

output/
├── 123204604000/
│   ├── 123204604000_final_metadata.json   # Comprehensive company metadata
│   └── document_downloads/
│       ├── 2019-09-23_90189.pdf
│       └── 2020-11-03_2334237.pdf
└── govdoc-output.json                     # Summary of all processed companies
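
To consume these results from a script, the layout above can be turned into paths and parsed file names. The sketch below is based only on the tree shown here: it assumes downloaded documents are named `<YYYY-MM-DD>_<number>.pdf`, and the meaning of the trailing number is not stated on this page. `companyPaths` and `parseDocumentName` are hypothetical helpers, not part of GovDoc Scanner.

```javascript
// Hypothetical helpers derived from the directory tree above.

// Returns the expected locations for one company's results.
function companyPaths(outputDir, gemiId) {
  const companyDir = `${outputDir}/${gemiId}`;
  return {
    metadata: `${companyDir}/${gemiId}_final_metadata.json`,
    downloads: `${companyDir}/document_downloads`,
  };
}

// Assumption: file names follow <YYYY-MM-DD>_<number>.pdf, as in the tree.
function parseDocumentName(fileName) {
  const match = /^(\d{4}-\d{2}-\d{2})_(\d+)\.pdf$/.exec(fileName);
  if (!match) return null; // unexpected name; let the caller decide
  return { date: match[1], number: match[2] };
}

console.log(companyPaths("output", "123204604000").metadata);
// output/123204604000/123204604000_final_metadata.json
console.log(parseDocumentName("2019-09-23_90189.pdf"));
// { date: '2019-09-23', number: '90189' }
```

Returning `null` for unrecognized names keeps the parser from guessing when a file does not match the assumed pattern.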

Individual Applications

For specific use cases, you can use individual applications:

Next Steps