PyCon2006/Tutorials/TextProcessing
Intended Audience
Beginning to intermediate programmers. A basic working knowledge of Python is assumed.
Summary
This tutorial introduces beginning to intermediate programmers to Python's tools and techniques for text and data processing. Topics include regular expressions, filtering data with generators, and parsing.
Outline
- Common data sources needing processing:
  - log files
  - CSV
  - tabular data
  - XML
- Tools & techniques:
  - lists & dictionaries
  - s.join(list) instead of accumulating
  - for line in file
  - filters, large data sources: generators
  - decorate-sort-undecorate
  - StringIO
- Regular expressions:
  - pattern matching
  - filtering
  - substitution
  - splitting
- Parsing:
  - text.split()
  - text.find()
  - regular expressions
  - "real" parsers (including XML)
  - state machines
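As a rough illustration of how several outline items fit together, the sketch below is not taken from the tutorial materials. It combines StringIO, "for line in file", a regular expression, generator filters, decorate-sort-undecorate, and s.join(list). The sample log data, the line layout assumed by LINE_RE, and all function names are hypothetical.

```python
import io
import re

# Hypothetical sample data; StringIO lets a string be read like a file.
SAMPLE_LOG = io.StringIO(
    "2006-02-23 10:15:02 ERROR disk full\n"
    "2006-02-23 10:15:07 INFO retrying\n"
    "2006-02-23 10:14:58 ERROR timeout\n"
    "not a log line\n"
)

# Assumed line layout: date, time, level, free-form message.
LINE_RE = re.compile(r"(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2}) (\w+) (.*)")


def parse_lines(lines):
    """Generator: yield (date, time, level, message) for lines that match."""
    for line in lines:  # works for any iterable of lines, including files
        match = LINE_RE.match(line)
        if match:
            yield match.groups()


def errors_only(records):
    """Generator filter: pass through only ERROR records."""
    for record in records:
        if record[2] == "ERROR":
            yield record


def sorted_by_message_length(records):
    """Decorate-sort-undecorate: sort records by message length."""
    # Decorate with the sort key; the index breaks ties and keeps order stable.
    decorated = [(len(rec[3]), i, rec) for i, rec in enumerate(records)]
    decorated.sort()
    # Undecorate: strip the key and index again.
    return [rec for _, _, rec in decorated]


def report(lines):
    """Build the report with one join instead of repeated string +=."""
    records = sorted_by_message_length(errors_only(parse_lines(lines)))
    return "\n".join(f"{time} {message}" for _, time, _, message in records)


if __name__ == "__main__":
    print(report(SAMPLE_LOG))
```

Because the generators process one line at a time, the same pipeline would work on a large log file opened with open() without loading it all into memory; only the sort step needs the filtered records in a list. In current Python, sorted(records, key=...) is the usual shorthand for the decorate-sort-undecorate step.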
Please send feedback & ideas for further specific topics to the trainer, David Goodger (email, home page).