Dataset Preparation Scripts
During the dataset preparation process, various Python scripts were used to format and preprocess the data. Most scripts can be found in the PythonScripts/ directory.
The dataset is structured in JSON format, where each entry represents an audio sample with metadata. Below is an example:
{"audio_filepath": "./input/wav_format/106_मेडिकल_चोक.wav", "text": " मेडिकल चोक ", "duration": 1.296, "language_ids": "1", "sample_ids": "0001", "lang": "ne"}
{"audio_filepath": "./input/wav_format/107_एकता_चोक.wav", "text": " एकता चोक ", "duration": 1.152, "language_ids": "1", "sample_ids": "0002", "lang": "ne"}
{"audio_filepath": "./input/wav_format/108_चित्रपुर.wav", "text": " चित्रपुर ", "duration": 0.936, "language_ids": "1", "sample_ids": "0003", "lang": "ne"}
{"audio_filepath": "./input/wav_format/109_Kumaripati.wav", "text": " Kumaripati ", "duration": 1.08, "language_ids": "1", "sample_ids": "0004", "lang": "ne"}
{"audio_filepath": "./input/wav_format/110_Pulchowk.wav", "text": " Pulchowk ", "duration": 0.96, "language_ids": "1", "sample_ids": "0005", "lang": "ne"}audio_filepath: Path to the audio file.text: Transcription of the audio.duration: Length of the audio in seconds.language_ids: Language identifier.sample_ids: Unique identifier for the sample.lang: Language code (e.g., "ne" for Nepali).
The fine-tuning process is managed using a configuration file:
examples/asr/conf/asr_finetune/speech_to_text_finetune.yaml
This file contains hyperparameters and training settings for fine-tuning the ASR model.
To start the fine-tuning process, execute the following bash script:
./finetune_asr.shThe required dependencies are listed in the requirements/ directory. To install them, you can use the provided script:
bash reinstall.sh