NLP with PyTorch
Code and slides to accompany the online series of webinars: https://data4sci.com/nlp-with-pytorch by Data For Science.
Natural Language lies at the heart of current developments in Artificial Intelligence, User Interaction, and Information Processing. The combination of unprecedented corpora of written text provided by social media and the massification of computational power has led to increased interest in the development of modern NLP tools based on state-of-the-art Deep Learning tools.
In this course, participants are introduced to the fundamental concepts and algorithms used for Natural Language Processing (NLP) through an in-depth exploration of different examples built using the PyTorch framework for deep learning. Applications to real datasets will be explored in detail.
Schedule
1. Foundations of NLP
- One-Hot Encoding
- TF/IDF and Stemming
- Stopwords
- N-grams
- Working with Word Embeddings
2. Neural Networks with PyTorch
- PyTorch review
- Activation Functions
- Loss Functions
- Training procedures
- Network Architectures
3. Text classification
- Feed Forward Networks
- Convolutional Neural Networks
- Applications
4. Word Embeddings
- Motivations
- Skip-gram and Continuous Bag of words
- Transfer Learning
5. Sequence Modeling
- Recurrent Network Networks
- Gated Recurrent Unit
- Long-Short Term Memory
- Encoder-Decoder Models
- Text Generation
Author
Getting Started
Prerequisites
- Python 3.10 or higher (up to 3.13)
- macOS, Linux, or Windows
Installation
Option 1: Using uv (Recommended)
uv is a fast Python package installer and resolver. If you don't have it installed:
# Install uv curl -LsSf https://astral.sh/uv/install.sh | sh
Then clone and setup the repository:
# Clone the repository git clone https://github.com/DataForScience/AdvancedNLP.git cd AdvancedNLP # Install dependencies uv sync # Run Jupyter uv run jupyter notebook
Option 2: Using pip and venv
# Clone the repository git clone https://github.com/DataForScience/AdvancedNLP.git cd AdvancedNLP # Create virtual environment python -m venv .venv # Activate virtual environment # On macOS/Linux: source .venv/bin/activate # On Windows: .venv\Scripts\activate # Install dependencies pip install -e . # Run Jupyter jupyter notebook
Hardware Acceleration
This project supports hardware acceleration for faster training:
- Apple Silicon (M1/M2/M3): Automatically uses MPS (Metal Performance Shaders) backend
- NVIDIA GPUs: Automatically uses CUDA if available
- CPU: Falls back to CPU if no GPU is available
Running the Notebooks
Once Jupyter is running, open any of the numbered notebooks:
- Foundations of NLP.ipynb
- Neural Networks with PyTorch.ipynb
- Text Classification.ipynb
- Word Embeddings.ipynb
- Sequence Modeling.ipynb