This repository proposes a possible next step in the evolution of free-text data processing originally implemented in CogStack-Pipeline, moving towards a more modular, Platform-as-a-Service (PaaS) approach.
CogStack-NiFi demonstrates how to use Apache NiFi as the central data workflow engine for clinical document processing, integrating services such as text extraction and natural language processing (NLP). Each component runs as a standalone service, with NiFi handling data routing between components and data sources/sinks.
All NLP/ML/data services are expected to implement a uniform RESTful API, allowing seamless integration into existing pipelines and making it easy to incorporate any NLP application into the stack.
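As a rough illustration of what a uniform API buys you, the sketch below builds the same HTTP request for any service in the stack. The `/api/process` path, the `{"content": {"text": ...}}` payload shape, and the example port are assumptions made for illustration only; check each service's documentation for the actual contract.

```python
import json
from urllib.request import Request

# Assumption: endpoint path is illustrative and not confirmed by this README.
PROCESS_PATH = "/api/process"


def build_process_request(base_url: str, text: str) -> Request:
    """Build a POST request carrying one document for an NLP service.

    The payload shape {"content": {"text": ...}} is a hypothetical example
    of a uniform request body shared by all services.
    """
    payload = {"content": {"text": text}}
    return Request(
        url=base_url.rstrip("/") + PROCESS_PATH,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_requests(service_urls: list[str], text: str) -> list[Request]:
    """Because the API is uniform, one builder serves every service URL."""
    return [build_process_request(url, text) for url in service_urls]
```

Passing a returned request to `urllib.request.urlopen` would send it. Swapping one NLP service for another then only means changing the base URL, which is the property NiFi relies on when routing documents.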
This project is under active development. New features or services may impact existing deployments. Please review the release notes and documentation before upgrading.
Need help? Feel free to:
- Open an issue on the GitHub Issue Tracker
- Start a discussion on our Discourse forum (actively monitored by the dev team)
The table below describes the repository layout. For setup and operations, use the deployment and NiFi docs linked below.
| Folder | Description |
|---|---|
| nifi | Custom Apache NiFi Docker image with workflows, configs, drivers, and user resources. |
| security | Scripts for generating SSL certificates and other security-related tools. |
| services | NLP and auxiliary services, each with its own configs and resources. |
| deploy | Example deployment setup, combining NiFi and related services. |
| scripts | Helper scripts (e.g., setup tools, sample DB ingestion, Elasticsearch ingestion). |
| data | Place test data or any other data to be ingested here. |
| typings | Type stubs for code linting and type hinting. |

    # from repository root
    git lfs pull
    make -C deploy git-update-submodules
    make -C deploy help
    make -C deploy start-data-infra

After services start:
- NiFi: https://localhost:8443
- Elasticsearch: http://localhost:9200
- Kibana/OpenSearch Dashboards: https://localhost:5601
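To confirm the services are reachable, something like the following sketch could be used. It probes the three URLs listed above using only the standard library. It assumes the HTTPS endpoints use locally generated (self-signed) certificates, so certificate verification is disabled; that is only reasonable for a local test deployment. The function names are hypothetical helpers, not part of the repository.

```python
import http.client
import ssl
from urllib.error import HTTPError
from urllib.request import urlopen

# URLs taken from the list above.
ENDPOINTS = {
    "NiFi": "https://localhost:8443",
    "Elasticsearch": "http://localhost:9200",
    "Kibana/OpenSearch Dashboards": "https://localhost:5601",
}


def _local_tls_context() -> ssl.SSLContext:
    """TLS context that skips verification (assumption: self-signed local certs)."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False  # must be disabled before setting CERT_NONE
    ctx.verify_mode = ssl.CERT_NONE
    return ctx


def is_up(url: str, timeout: float = 3.0) -> bool:
    """Return True if anything answers HTTP at `url`, even with an error status."""
    try:
        with urlopen(url, timeout=timeout, context=_local_tls_context()):
            return True
    except HTTPError:
        # The server responded (e.g. 401 Unauthorized), so it is running.
        return True
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, TLS failure, or malformed response.
        return False


def report(endpoints: dict[str, str] = ENDPOINTS) -> dict[str, bool]:
    """Map each service name to whether its endpoint responded."""
    return {name: is_up(url) for name, url in endpoints.items()}
```

Calling `report()` returns a dictionary of service name to reachability. NiFi in particular can take a while to finish starting, so a `False` shortly after startup may just mean the service needs more time.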
Stop the core stack with:

    make -C deploy stop-data-infra

Prerequisites:
- Docker + Docker Compose (mandatory)
- make
- git + git-lfs
- python3.11
- Basic Linux/UNIX shell familiarity
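A quick way to check the command-line prerequisites is to look each tool up on `PATH`. The sketch below is a hypothetical helper, not a script shipped with the repository. The executable names are assumptions about how the tools are installed, and Docker Compose is not checked separately because it is often installed as a `docker` plugin rather than a standalone binary.

```python
import shutil

# Assumed executable names for the prerequisites listed above.
REQUIRED_TOOLS = ("docker", "make", "git", "git-lfs", "python3.11")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools from `tools` that cannot be found on PATH.

    `which` is injectable so the lookup can be replaced in tests.
    """
    return [tool for tool in tools if which(tool) is None]
```

An empty list from `missing_tools()` suggests the basic tooling is in place; any names returned need installing before running the `make` targets.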
📖 Official documentation: cogstack-nifi.readthedocs.io
🚀 New to the project? Start with the deployment guide for example setups and workflows.
🐞 For troubleshooting or bug reports, consult the known issues section before opening a ticket.
Check the release notes section regularly for:
- Major changes to project structure or configuration
- Security advisories or vulnerabilities affecting deployments