Building data processing pipelines for document processing with NLP, using Apache NiFi and related services

CogStack-NiFi

💡 Introduction

This repository proposes a possible next step in the evolution of free-text data processing originally implemented in CogStack-Pipeline, moving towards a more modular, Platform-as-a-Service (PaaS) approach.

CogStack-NiFi demonstrates how to use Apache NiFi as the central data workflow engine for clinical document processing, integrating services such as text extraction and natural language processing (NLP). Each component runs as a standalone service, with NiFi handling data routing between components and data sources/sinks.

All NLP/ML/data services are expected to implement a uniform RESTful API, allowing seamless integration into existing pipelines and making it easy to incorporate any NLP application into the stack.
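
As a rough illustration of what a uniform REST contract makes possible, the sketch below sends a document to an NLP service with `curl`. The endpoint URL (`NLP_URL`), the `/api/process` path, and the JSON envelope are assumptions made for this example and are not specified in this README; consult the documentation of the service you deploy for its actual contract.

```shell
#!/bin/sh
# Sketch only: NLP_URL and the JSON shape are assumptions, not taken from this README.
NLP_URL="${NLP_URL:-http://localhost:5000/api/process}"  # hypothetical endpoint

# Escape backslashes and double quotes so single-line text can sit inside a JSON string.
# Newlines and control characters are NOT handled by this helper.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Wrap the text in the assumed request envelope.
build_payload() {
  printf '{"content":{"text":"%s"}}' "$(json_escape "$1")"
}

# POST the payload to the service and print the raw response.
annotate() {
  curl -s -X POST -H 'Content-Type: application/json' \
    -d "$(build_payload "$1")" "$NLP_URL"
}
```

If every service accepts the same request shape, pointing `NLP_URL` at a different service should be the only change needed, for example `NLP_URL=http://other-host:5000/api/process annotate "Patient reports chest pain."`.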


⚠️ Important Notice

This project is under active development. New features or services may impact existing deployments. Please review the release notes and documentation before upgrading.


💬 Asking Questions

Need help? Feel free to:


🗂️ Project

This table describes the repository layout. For setup and operations, use the deployment and NiFi docs linked below.

| Folder | Description |
|--------|-------------|
| nifi | Custom Apache NiFi Docker image with workflows, configs, drivers, and user resources. |
| security | Scripts for generating SSL certificates and other security-related tools. |
| services | NLP and auxiliary services, each with its own configs and resources. |
| deploy | Example deployment setup, combining NiFi and related services. |
| scripts | Helper scripts (e.g., setup tools, sample DB ingestion, Elasticsearch ingestion). |
| data | Place any test data or data to be ingested here. |
| typings | Type stubs for code linting and type hinting. |

📚 Documentation & Getting Started

Quick Start (5 minutes)

# from repository root
git lfs pull
make -C deploy git-update-submodules
make -C deploy help
make -C deploy start-data-infra

After services start:

  • NiFi: https://localhost:8443
  • Elasticsearch: http://localhost:9200
  • Kibana/OpenSearch Dashboards: https://localhost:5601
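
To confirm that the three endpoints listed above are answering, a small probe along the following lines may help. It is a sketch, not part of the repository: the function names are illustrative, and `-k` (skip certificate verification) assumes the local stack uses self-signed certificates.

```shell
#!/bin/sh
# Sketch: probe the Quick Start endpoints and report whether each one answers.
# URLs come from the list above; function names are illustrative.

# Map an HTTP status code to a short label.
# curl reports 000 when no HTTP response was received at all.
classify_status() {
  case "$1" in
    000) echo "down" ;;
    2??|3??) echo "up" ;;
    401|403) echo "up (auth required)" ;;
    *) echo "responding (HTTP $1)" ;;
  esac
}

# Probe one URL and print "<name> <label>".
# -k skips TLS verification: an assumption that only suits local self-signed certs.
probe() {
  probe_code=$(curl -k -s -o /dev/null -w '%{http_code}' --max-time 5 "$2") || true
  printf '%-14s %s\n' "$1" "$(classify_status "${probe_code:-000}")"
}

check_stack() {
  probe "NiFi" "https://localhost:8443"
  probe "Elasticsearch" "http://localhost:9200"
  probe "Dashboards" "https://localhost:5601"
}
```

Source the file and run `check_stack`. Services can take a while to initialise, so a "down" result shortly after startup does not necessarily indicate a fault.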

Stop the core stack with:

make -C deploy stop-data-infra

Prerequisites:

  • Docker + Docker Compose (mandatory)
  • make
  • git + git-lfs
  • python3.11
  • Basic Linux/UNIX shell familiarity
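
The tool prerequisites above can be checked before running any `make` target. The following is a minimal sketch under a few assumptions: the helper names are made up for this example, the Python interpreter is assumed to be installed under the name `python3.11`, and Docker Compose is accepted either as the `docker compose` plugin or as a standalone `docker-compose` binary.

```shell
#!/bin/sh
# Sketch: verify that the prerequisite tools are on PATH. Helper names are illustrative.

# Print each named command that is not found on PATH, one per line.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

check_prereqs() {
  missing=$(missing_tools docker make git git-lfs python3.11)
  if [ -n "$missing" ]; then
    # Unquoted on purpose so the names print on a single line.
    echo "Missing tools:" $missing
    return 1
  fi
  # Accept the Compose v2 plugin or, failing that, a standalone docker-compose binary.
  if ! docker compose version >/dev/null 2>&1 &&
     [ -n "$(missing_tools docker-compose)" ]; then
    echo "Docker is installed but Docker Compose was not found"
    return 1
  fi
  echo "All prerequisites found"
}
```

Source the file and run `check_prereqs`; a non-zero return status means at least one tool needs installing.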

📖 Official documentation: cogstack-nifi.readthedocs.io

🚀 New to the project? Start with the deployment guide for example setups and workflows.

🐞 For troubleshooting or bug reports, consult the known issues section before opening a ticket.


🛑 Important Updates

Check the release notes section regularly for:

  • Major changes to project structure or configuration
  • Security advisories or vulnerabilities affecting deployments