Speech-to-Text Fine-Tuning

Dataset Preparation Scripts

During the dataset preparation process, various Python scripts were used to format and preprocess the data. Most scripts can be found in the PythonScripts/ directory.

Dataset Format

The dataset is structured in JSON format, where each entry represents an audio sample with metadata. Below is an example:

{"audio_filepath": "./input/wav_format/106_मेडिकल_चोक.wav", "text": " मेडिकल चोक ", "duration": 1.296,  "language_ids": "1", "sample_ids": "0001", "lang": "ne"}
{"audio_filepath": "./input/wav_format/107_एकता_चोक.wav", "text": "  एकता चोक ", "duration": 1.152,  "language_ids": "1", "sample_ids": "0002", "lang": "ne"}
{"audio_filepath": "./input/wav_format/108_चित्रपुर.wav", "text": " चित्रपुर ", "duration": 0.936,  "language_ids": "1", "sample_ids": "0003", "lang": "ne"}
{"audio_filepath": "./input/wav_format/109_Kumaripati.wav", "text": " Kumaripati ", "duration": 1.08,  "language_ids": "1", "sample_ids": "0004", "lang": "ne"}
{"audio_filepath": "./input/wav_format/110_Pulchowk.wav", "text": " Pulchowk ", "duration": 0.96,  "language_ids": "1", "sample_ids": "0005", "lang": "ne"}

Fields Explanation

audio_filepath: Path to the audio file.
text: Transcription of the audio.
duration: Length of the audio in seconds.
language_ids: Language identifier.
sample_ids: Unique identifier for the sample.
lang: Language code (e.g., "ne" for Nepali).

Configuration File

The fine-tuning process is managed using a configuration file:

examples/asr/conf/asr_finetune/speech_to_text_finetune.yaml

This file contains hyperparameters and training settings for fine-tuning the ASR model.

Running the Fine-Tuning Script

To start the fine-tuning process, execute the following bash script:

./finetune_asr.sh

Requirements

The required dependencies are listed in the requirements/ directory. To install them, you can use the provided script:

bash reinstall.sh

Name		Name	Last commit message	Last commit date
Latest commit History 6,309 Commits
.github		.github
PythonScripts		PythonScripts
docs		docs
examples		examples
external		external
nemo		nemo
requirements		requirements
scripts		scripts
tests		tests
tools		tools
tutorials		tutorials
.dockerignore		.dockerignore
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
.readthedocs.yml		.readthedocs.yml
CITATION.cff		CITATION.cff
CONTRIBUTING.md		CONTRIBUTING.md
Dockerfile		Dockerfile
Jenkinsfile		Jenkinsfile
LICENSE		LICENSE
Old_README.rst		Old_README.rst
README.md		README.md
ci.groovy		ci.groovy
delme.py		delme.py
finetune_asr.sh		finetune_asr.sh
pyproject.toml		pyproject.toml
reinstall.sh		reinstall.sh
requirements.txt		requirements.txt
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Speech-to-Text Fine-Tuning

Dataset Format

Fields Explanation

Configuration File

Running the Fine-Tuning Script

Requirements

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Speech-to-Text Fine-Tuning

Dataset Format

Fields Explanation

Configuration File

Running the Fine-Tuning Script

Requirements

About

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages