35 changes: 35 additions & 0 deletions .github/workflows/ci.yml
@@ -0,0 +1,35 @@
name: CI

on:
  push:
    branches: [main, master]
  pull_request:

jobs:
  test:
    runs-on: ubuntu-latest

    strategy:
      fail-fast: false
      matrix:
        python-version: ["3.10", "3.11"]

    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Setup Python
        uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          python -m pip install -e .[dev]

      - name: Run tests
        run: python -m pytest -v

      - name: Build package
        run: python -m build
39 changes: 39 additions & 0 deletions .github/workflows/publish.yml
@@ -0,0 +1,39 @@
name: Publish Python Package

on:
  release:
    types: [published]

jobs:
  publish:
    if: ${{ !github.event.release.prerelease }}
    runs-on: ubuntu-latest

    permissions:
      contents: write
      id-token: write

    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Setup Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"

      - name: Install build tools
        run: |
          python -m pip install --upgrade pip
          pip install build

      - name: Build distributions
        run: python -m build

      - name: Publish to PyPI
        uses: pypa/gh-action-pypi-publish@release/v1

      - name: Upload dist files to GitHub Release
        uses: softprops/action-gh-release@v2
        with:
          files: dist/*
35 changes: 35 additions & 0 deletions .github/workflows/release.yml
@@ -0,0 +1,35 @@
name: Release Build

on:
  release:
    types: [published]

jobs:
  build-release-artifacts:
    if: ${{ !github.event.release.prerelease }}
    runs-on: ubuntu-latest

    permissions:
      contents: write

    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Setup Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.10"

      - name: Install build dependencies
        run: |
          python -m pip install --upgrade pip
          python -m pip install build

      - name: Build distributions
        run: python -m build

      - name: Upload artifacts to GitHub Release
        uses: softprops/action-gh-release@v2
        with:
          files: dist/*
103 changes: 103 additions & 0 deletions .gitignore
@@ -0,0 +1,103 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
*.egg
MANIFEST

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage / pytest
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache/
.pytest_cache/
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/

# Type check / lint caches
.mypy_cache/
.ruff_cache/
.pyre/
.dmypy.json
dmypy.json

# Virtual environments
.venv/
venv/
env/
ENV/

# Jupyter Notebook
.ipynb_checkpoints/

# IDEs / editors
.vscode/
.idea/

# OS files
.DS_Store
Thumbs.db

# Local environment files
.env
.env.*
*.local

# Logs
*.log

# Temporary files
tmp/
temp/
*.tmp

# Project generated files
output/
temp_uploads/
generated/
reports/

# Excel / export artifacts
*.xlsx

# Database / local data
*.db
*.sqlite3

# Python build metadata
.pybuild/

# Packaging tools
pip-wheel-metadata/

# PyInstaller
*.manifest
*.spec
21 changes: 21 additions & 0 deletions LICENSE
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2026 Daniel Arndt

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
137 changes: 137 additions & 0 deletions README.md
@@ -0,0 +1,137 @@
# Pydf

[PT-BR documentation](docs/README.pt-BR.md) | [English docs](docs/README.en.md)

`pydf` is a lightweight Python library for reading invoice PDFs, extracting metadata with regex, exporting to Excel, and optionally persisting to MySQL.

This version reorganizes the original project as a library and CLI while keeping the script's core idea: **PDF -> extraction -> Excel -> optional MySQL**.

## Quick overview

- Reusable Python library
- Simple CLI for terminal use
- Configurable regex for the invoice number and date
- Export to `.xlsx`
- Optional persistence in MySQL
- Documentation in PT-BR and English
- CI and release workflows for GitHub Actions
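
As a rough illustration of the configurable regex extraction listed above, the sketch below pulls an invoice number and date out of text that has already been extracted from a PDF. The patterns, the `extract_invoice_fields` name, the DD/MM/YYYY date format, and the sample text are all assumptions made for this example; none of them are taken from `pydf`'s source.

```python
import re
from datetime import date

# Hypothetical default patterns; a real configuration would supply its own.
INVOICE_NUMBER_RE = re.compile(
    r"Invoice\s*(?:No\.?|Number|#)\s*:?\s*([A-Z0-9-]+)", re.IGNORECASE
)
INVOICE_DATE_RE = re.compile(r"\b(\d{2})/(\d{2})/(\d{4})\b")


def extract_invoice_fields(text, number_re=INVOICE_NUMBER_RE, date_re=INVOICE_DATE_RE):
    """Return (invoice_number, invoice_date); each is None when not found."""
    number_match = number_re.search(text)
    date_match = date_re.search(text)

    invoice_number = number_match.group(1) if number_match else None
    invoice_date = None
    if date_match:
        # Assumption: dates are written as DD/MM/YYYY.
        day, month, year = (int(part) for part in date_match.groups())
        try:
            invoice_date = date(year, month, day)
        except ValueError:
            invoice_date = None  # e.g. 31/02/2024 is not a real date
    return invoice_number, invoice_date


sample = "ACME Ltda\nInvoice No: INV-2024-0042\nIssued on 15/03/2024\nTotal: 1.250,00"
number, issued = extract_invoice_fields(sample)
print(number, issued)
```

Passing the compiled patterns as parameters is one way to keep the regex configurable per call; the library may expose this differently.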

## Requirements

- Python **3.10 or later**
- `pip`
- Recommended: a virtual environment (`venv`)
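
A script that relies on this requirement could check the interpreter version up front. This is only a sketch: `MIN_PYTHON` and `python_is_supported` are hypothetical names for this example, not part of `pydf`.

```python
import sys

# Minimum version stated in the requirements above.
MIN_PYTHON = (3, 10)


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True when the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


if not python_is_supported():
    print(f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]} or later is required")
```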

## Local installation

From the project root:

```bash
pip install -e .
```

Installation with development dependencies:

```bash
pip install -e .[dev]
```

## Installing the CLI from GitHub

Because GitHub Packages does not offer a Python registry that `pip` supports, the recommended way to install the CLI from GitHub is to install directly from the Git repository.

### Install from the default branch

```bash
pip install "git+https://github.com/DanielArndt0/pydf.git"
```

### Install from a specific tag or release

```bash
pip install "git+https://github.com/DanielArndt0/pydf.git@v1.0.0"
```

After that, the CLI is available as:

```bash
pydf --help
```

## Getting started with venv on Windows

If you have more than one Python version installed, list the available versions:

```powershell
py -0p
```

Create and activate a virtual environment with Python 3.10:

```powershell
py -3.10 -m venv .venv
.venv\Scripts\Activate.ps1
python -m pip install --upgrade pip
python -m pip install -e .[dev]
```

## Running the CLI

```bash
pydf --help
pydf examples/pdf_invoices --output output/invoices.xlsx
```

## Using it as a library

```python
from pydf import InvoiceProcessor, ProcessorConfig

config = ProcessorConfig(
    input_dir="examples/pdf_invoices",
    output_excel="output/invoices.xlsx",
)

result = InvoiceProcessor(config).process()

print(result.output_excel)
for record in result.records:
    print(record.file_name, record.invoice_number, record.invoice_date, record.status)
```
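
Beyond printing each record, a caller may want a quick summary of a run. The sketch below counts records per status and lists files with missing fields. It uses a stand-in `Record` dataclass so that it runs without `pydf` installed: the field names mirror the loop above, but the class itself, the `summarize` helper, and the status values `"ok"` and `"failed"` are assumptions for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    """Stand-in for a pydf record; field names mirror the README loop."""
    file_name: str
    invoice_number: Optional[str]
    invoice_date: Optional[str]
    status: str  # the values used below are hypothetical


def summarize(records):
    """Count records per status and list files missing a number or date."""
    counts = Counter(record.status for record in records)
    incomplete = [
        record.file_name
        for record in records
        if record.invoice_number is None or record.invoice_date is None
    ]
    return counts, incomplete


records = [
    Record("a.pdf", "INV-1", "2024-03-15", "ok"),
    Record("b.pdf", None, None, "failed"),
    Record("c.pdf", "INV-3", None, "ok"),
]
counts, incomplete = summarize(records)
print(dict(counts))
print(incomplete)
```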

## Running the tests

```bash
pytest -v
```

If the environment is not set up yet:

```bash
pip install -e .[dev]
pytest -v
```

## Local build

```bash
python -m build
```

## CI and releases on GitHub

This repository includes three workflows:

- `ci.yml`: runs the tests and the build on pushes to `main` or `master` and on every pull request
- `publish.yml`: builds the distributions, publishes them to PyPI, and attaches `dist/*` to a published, non-prerelease release
- `release.yml`: builds the artifacts and attaches `dist/*` to a manually published, non-prerelease release

Detailed documentation:

- [Main documentation guide](docs/README.pt-BR.md)
- [CLI guide](docs/CLI.pt-BR.md)
- [API guide](docs/API.pt-BR.md)
- [Architecture](docs/ARCHITECTURE.pt-BR.md)
- [CI/CD and Releases](docs/CI-CD.pt-BR.md)
- [Python environment, venv, and troubleshooting](docs/ENVIRONMENT.pt-BR.md)
- [Examples](examples/README.md)
