CLI Examples

This section shows basic examples of the GovDoc Scanner CLI tool, which is the recommended way to use the project.

The CLI runs the complete end-to-end workflow: it crawls and then processes documents automatically, in either interactive or command-line mode.

Basic Usage

Interactive Mode (Recommended):

npm start govdoc
# Follow interactive prompts:
# 1. Choose input method (file, manual, random)
# 2. Enter or select companies
# 3. Confirm processing
# 4. Watch automated crawling and processing

Command Line Mode for Automation:

# Process from file
npm start govdoc -- --input ./companies.gds

# Process random companies
npm start govdoc -- --company-random 10

# Show help
npm start govdoc -- --help
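If the file passed to --input is a JSON array of GEMI IDs (as the comment in the Programmatic Usage example suggests), its contents could be prepared with a small script. The sketch below is illustrative only: buildInputFileContents is a hypothetical helper, not part of GovDoc Scanner, and the 12-digit check is an assumption based on the IDs shown on this page.

```javascript
// Hypothetical helper: not part of GovDoc Scanner.
// Assumption: a GEMI ID is a string of exactly 12 digits, as in the examples on this page.
const GEMI_ID_PATTERN = /^\d{12}$/;

function buildInputFileContents(ids) {
  // Normalise every entry to a trimmed string before validating it.
  const cleaned = ids.map((id) => String(id).trim());
  const invalid = cleaned.filter((id) => !GEMI_ID_PATTERN.test(id));
  if (invalid.length > 0) {
    throw new Error(`Invalid GEMI IDs: ${invalid.join(", ")}`);
  }
  // Drop duplicates while keeping the original order.
  const unique = [...new Set(cleaned)];
  return JSON.stringify(unique, null, 2);
}

const contents = buildInputFileContents([
  "152034008000",
  "175175703000",
  "152034008000", // duplicate, removed from the result
]);
console.log(contents);
```

The returned string can then be written to disk (for example with fs.writeFileSync) and the resulting path passed to --input.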

Programmatic Usage

Simple Integration:

import { spawn } from "child_process";
import { createLogger } from "./shared/logging/index.mjs";
import { validateConfig, validateApiKey } from "./shared/config/validator.mjs";

const logger = createLogger("GOVDOC-INTEGRATION");

async function processCompanies(inputFilePath) {
  try {
    // Validate configuration before starting
    validateConfig();
    const apiResult = await validateApiKey();
    if (!apiResult.ok) {
      logger.error(`API validation failed: ${apiResult.reason}`);
      throw new Error(`API validation failed: ${apiResult.reason}`);
    }

    logger.info("Starting GovDoc processing...");

    return new Promise((resolve, reject) => {
      const govdoc = spawn("npm", [
        "start",
        "govdoc",
        "--",
        "--input",
        inputFilePath, // JSON file with an array of GEMI IDs, e.g. ["152034008000","175175703000"]
      ]);

      govdoc.stdout.on("data", (data) => {
        logger.debug(`GovDoc output: ${data}`);
      });

      govdoc.stderr.on("data", (data) => {
        logger.warn(`GovDoc stderr: ${data}`);
      });

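      // Reject if the process could not be spawned at all (e.g. npm is not on PATH)
      govdoc.on("error", (err) => {
        logger.error("Failed to start GovDoc process", err);
        reject(err);
      });

      govdoc.on("close", (code) => {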
        if (code === 0) {
          logger.info("✅ GovDoc processing completed successfully");
          resolve();
        } else {
          logger.error(`❌ GovDoc processing failed with exit code: ${code}`);
          reject(new Error(`Failed with exit code: ${code}`));
        }
      });
    });
  } catch (error) {
    logger.error("Failed to start GovDoc processing", error);
    throw error;
  }
}
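The integration above hardcodes the --input arguments. To drive the other command-line modes from the same code, the argument list could be built by a small function. This is a minimal sketch: buildGovdocArgs and its option names (input, companyRandom, help) are hypothetical, it uses only the flags shown in this section, and it assumes the two input sources are mutually exclusive.

```javascript
// Hypothetical helper: not part of GovDoc Scanner.
// Only uses the flags shown in this section: --input, --company-random and --help.
function buildGovdocArgs(options = {}) {
  const args = ["start", "govdoc", "--"];

  if (options.help) {
    return [...args, "--help"];
  }

  const hasInput = typeof options.input === "string" && options.input.length > 0;
  const hasRandom = options.companyRandom !== undefined;
  // Assumption: the two input sources are alternatives, so exactly one is required.
  if (hasInput === hasRandom) {
    throw new Error("Specify exactly one of `input` or `companyRandom`");
  }

  if (hasInput) {
    return [...args, "--input", options.input];
  }

  if (!Number.isInteger(options.companyRandom) || options.companyRandom < 1) {
    throw new Error("`companyRandom` must be a positive integer");
  }
  return [...args, "--company-random", String(options.companyRandom)];
}

console.log(buildGovdocArgs({ input: "./companies.gds" }).join(" "));
// start govdoc -- --input ./companies.gds
console.log(buildGovdocArgs({ companyRandom: 10 }).join(" "));
// start govdoc -- --company-random 10
```

The result can be passed straight to spawn, for example spawn("npm", buildGovdocArgs({ input: inputFilePath })).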

Output Structure

After processing, you'll find results in the output/ directory:

output/
├── 123204604000/
│   ├── 123204604000_final_metadata.json  # Comprehensive company metadata
│   └── document_downloads/
│       ├── 2019-09-23_90189.pdf
│       └── 2020-11-03_2334237.pdf
└── govdoc-output.json  # Summary of all processed companies
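When post-processing results, it can help to derive these paths from a GEMI ID instead of hardcoding them. The sketch below assumes the layout shown in the tree above, and that downloaded files are named with a date followed by a number. Both helper names are hypothetical, and the meaning of the number in the file name is not documented here.

```javascript
// Hypothetical helpers: not part of GovDoc Scanner.
// Assumption: every company directory follows the layout in the tree above.
function companyPaths(gemiId, outputDir = "output") {
  const companyDir = `${outputDir}/${gemiId}`;
  return {
    companyDir,
    metadataFile: `${companyDir}/${gemiId}_final_metadata.json`,
    downloadsDir: `${companyDir}/document_downloads`,
  };
}

// Assumption: downloads are named "<YYYY-MM-DD>_<number>.pdf".
// Returns null for any file name that does not match that pattern.
function parseDownloadName(fileName) {
  const match = /^(\d{4}-\d{2}-\d{2})_(\d+)\.pdf$/.exec(fileName);
  if (!match) {
    return null;
  }
  return { date: match[1], number: match[2] };
}

console.log(companyPaths("123204604000").metadataFile);
// output/123204604000/123204604000_final_metadata.json
console.log(parseDownloadName("2019-09-23_90189.pdf"));
// { date: '2019-09-23', number: '90189' }
```

The paths are joined with forward slashes for readability; path.join from Node's path module is the safer choice when the code must also run on Windows.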

Individual Applications

For specific use cases, you can use individual applications:

Crawler Examples - Web scraping and document downloading
Doc-Scanner Examples - Document processing and metadata extraction

Next Steps

Check the Getting Started guide for setup
Explore Development Setup for advanced usage
Review the GSoC 2025 Overview for project roadmap
