This project is part of my NLP course assignment to build a transformer-based Large Language Model (LLM) from scratch for text summarization. The model is designed to summarize dialogues; although a decoder-only model would arguably have been more suitable for this task, the assignment required implementing the full encoder-decoder transformer architecture.
The model is trained on the text summarization dataset available at: Text Summarization with Large Language Models
- Type: Abstractive Transformer Model
- Layers: 6 encoder and 6 decoder layers, as in the original paper (see the model sketch after this list)
- Attention Heads: 8
- Positional Encoding: Implemented to preserve input sequence order
- Sequence Length: 128 tokens (due to memory constraints)
- Vocabulary: Created from the provided dataset only (to reduce computational burden)
- Normalization: Applied after each encoder and decoder stack for slight performance improvement
- Tokenization: Word-level tokenization (see the vocabulary-building sketch after this list)
- Batch Size: 32
- Loss Function: Cross-entropy loss for sequence-to-sequence generation
- Optimizer: Adam optimizer with learning-rate scheduling (warm-up and decay; see the training-step sketch after this list)
- Early Stopping: Implemented to prevent overfitting (see the early-stopping sketch after this list)
- Challenges:
  - Limited computational resources led to a reduced sequence length (128 instead of 512)
  - Difficulty capturing context for dialogues longer than 128 tokens
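For reference, the sketch below shows one way the configuration described above could be assembled in PyTorch. The embedding size of 512 and feed-forward size of 2048 are assumptions borrowed from the original paper (they are not stated above), and PyTorch's built-in `nn.Transformer` is used only as a stand-in for the hand-written implementation; conveniently, it applies a final `LayerNorm` after the encoder and decoder stacks, which corresponds to the normalization bullet above.

```python
import math
import torch
import torch.nn as nn

# Stated in the README: 6 encoder / 6 decoder layers, 8 heads, 128-token sequences.
# D_MODEL and FF_DIM are assumptions taken from the original transformer paper.
D_MODEL, N_HEADS, N_LAYERS, FF_DIM, MAX_LEN = 512, 8, 6, 2048, 128


class SinusoidalPositionalEncoding(nn.Module):
    """Adds fixed sine/cosine position information so the model sees token order."""

    def __init__(self, d_model: int, max_len: int = MAX_LEN):
        super().__init__()
        position = torch.arange(max_len, dtype=torch.float).unsqueeze(1)
        div_term = torch.exp(torch.arange(0, d_model, 2, dtype=torch.float)
                             * (-math.log(10000.0) / d_model))
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        self.register_buffer("pe", pe.unsqueeze(0))  # (1, max_len, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.pe[:, : x.size(1)]           # x: (batch, seq_len, d_model)


class SummarizationTransformer(nn.Module):
    """Encoder-decoder transformer for dialogue summarization (configuration sketch)."""

    def __init__(self, vocab_size: int):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, D_MODEL)
        self.pos_enc = SinusoidalPositionalEncoding(D_MODEL)
        # nn.Transformer applies a final LayerNorm after the encoder and decoder
        # stacks, roughly matching the normalization described above.
        self.transformer = nn.Transformer(
            d_model=D_MODEL,
            nhead=N_HEADS,
            num_encoder_layers=N_LAYERS,
            num_decoder_layers=N_LAYERS,
            dim_feedforward=FF_DIM,
            batch_first=True,
        )
        self.generator = nn.Linear(D_MODEL, vocab_size)  # project back to the vocabulary

    def forward(self, src_ids, tgt_ids, src_pad_mask=None, tgt_pad_mask=None):
        src = self.pos_enc(self.embed(src_ids) * math.sqrt(D_MODEL))
        tgt = self.pos_enc(self.embed(tgt_ids) * math.sqrt(D_MODEL))
        causal = self.transformer.generate_square_subsequent_mask(
            tgt_ids.size(1)).to(tgt_ids.device)          # block attention to future tokens
        out = self.transformer(
            src, tgt,
            tgt_mask=causal,
            src_key_padding_mask=src_pad_mask,
            tgt_key_padding_mask=tgt_pad_mask,
        )
        return self.generator(out)                        # (batch, tgt_len, vocab_size)
```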
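The vocabulary and tokenization bullets describe a word-level vocabulary built from the provided dataset only. Below is a minimal sketch of that idea, assuming lowercased whitespace splitting and illustrative special-token names (`<pad>`, `<unk>`, `<bos>`, `<eos>`) that may differ from the actual code.

```python
from collections import Counter

PAD, UNK, BOS, EOS = "<pad>", "<unk>", "<bos>", "<eos>"  # assumed special tokens


def build_vocab(texts, min_freq=2):
    """Build a word-level vocabulary from the training split only."""
    counts = Counter()
    for text in texts:
        counts.update(text.lower().split())  # naive whitespace word tokenization
    itos = [PAD, UNK, BOS, EOS] + [w for w, c in counts.most_common() if c >= min_freq]
    stoi = {w: i for i, w in enumerate(itos)}
    return stoi, itos


def encode(text, stoi, max_len=128):
    """Map words to ids, then truncate/pad to the 128-token limit."""
    ids = [stoi.get(w, stoi[UNK]) for w in text.lower().split()]
    ids = [stoi[BOS]] + ids[: max_len - 2] + [stoi[EOS]]
    return ids + [stoi[PAD]] * (max_len - len(ids))
```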
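For the loss and optimizer bullets, the training-step sketch below combines token-level cross-entropy (assumed here to ignore padding positions) with Adam and the inverse-square-root warm-up/decay schedule from the original paper. The warm-up length, Adam betas, and padding id are illustrative assumptions, not values taken from this project.

```python
import torch
import torch.nn as nn

PAD_ID = 0      # assumed id of the padding token
D_MODEL = 512   # assumed embedding size, matching the model sketch above

# Token-level cross-entropy; padding positions are excluded from the loss.
criterion = nn.CrossEntropyLoss(ignore_index=PAD_ID)


def make_optimizer(model, warmup_steps=4000):
    """Adam with an inverse-square-root warm-up/decay schedule (warmup_steps is assumed)."""
    optimizer = torch.optim.Adam(model.parameters(), lr=1.0, betas=(0.9, 0.98), eps=1e-9)

    def lr_lambda(step):
        step = max(step, 1)
        return (D_MODEL ** -0.5) * min(step ** -0.5, step * warmup_steps ** -1.5)

    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
    return optimizer, scheduler


def training_step(model, batch, optimizer, scheduler):
    src, tgt = batch                          # (batch, src_len), (batch, tgt_len)
    logits = model(src, tgt[:, :-1])          # teacher forcing: feed all but the last target token
    loss = criterion(
        logits.reshape(-1, logits.size(-1)),  # (batch * seq, vocab)
        tgt[:, 1:].reshape(-1),               # targets shifted by one position
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()                          # advance the warm-up/decay schedule
    return loss.item()
```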
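Early stopping is listed above without details; the sketch below assumes a simple patience counter on validation loss with restoration of the best checkpoint. The epoch/patience values and the `run_train_epoch` / `run_validation` callbacks are hypothetical.

```python
def train_with_early_stopping(model, run_train_epoch, run_validation, max_epochs=50, patience=3):
    """Stop once validation loss has not improved for `patience` consecutive epochs."""
    best_val, bad_epochs, best_state = float("inf"), 0, None
    for epoch in range(max_epochs):
        run_train_epoch(model)                 # one pass over the training data
        val_loss = run_validation(model)       # average validation loss
        if val_loss < best_val:
            best_val, bad_epochs = val_loss, 0
            best_state = {k: v.detach().clone() for k, v in model.state_dict().items()}
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break                          # early stop
    if best_state is not None:
        model.load_state_dict(best_state)      # restore the best checkpoint
    return best_val
```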
The model was evaluated using:
- ROUGE Scores (a scoring sketch follows this list):
  - ROUGE-1: 0.3016
  - ROUGE-2: 0.0769
  - ROUGE-L: 0.2350
- Qualitative Evaluation:
  - Relevance: Performs well for sequences around 128 tokens but struggles with longer or extremely short dialogues.
  - Coherence: Inconsistent; sometimes logical, sometimes not.
  - Conciseness: Generally effective at capturing essential information.
- Measured by ROUGE, the model's performance is roughly half of what state-of-the-art models (e.g., T5) achieve on news summarization tasks.
- Both longer and very short dialogues posed significant challenges for context comprehension.
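The ROUGE scores above can be computed with an off-the-shelf scorer. The sketch below uses the `rouge-score` package purely as an illustration; the README does not state which implementation was actually used.

```python
from rouge_score import rouge_scorer


def average_rouge(references, predictions):
    """Average ROUGE-1/2/L F1 over a set of (reference, generated) summary pairs."""
    scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
    totals = {"rouge1": 0.0, "rouge2": 0.0, "rougeL": 0.0}
    for ref, pred in zip(references, predictions):
        scores = scorer.score(ref, pred)          # reference first, prediction second
        for key in totals:
            totals[key] += scores[key].fmeasure
    return {key: value / len(references) for key, value in totals.items()}
```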
- Introduction to PyTorch | Udacity
- The Illustrated Transformer by Jay Alammar
- Transformer from Scratch - YouTube
- Use of pre-trained embeddings to improve generalization
- Implementing subword tokenization (e.g., BPE or WordPiece) in place of word-level tokenization
- Experimenting with longer sequence lengths, paired with better resource optimization
- Exploring decoder-only models for better performance in dialogue summarization
This project involved significant learning and effort, given my lack of prior experience with deep learning and PyTorch. Despite the constraints, the model produced insightful results and highlighted areas for further exploration and improvement.