📄 Html2Json Cog for discord.py
This Discord bot cog allows you to convert HTML chapters from https://prabhupadabooks.com/bg into a structured JSON file, preserving verse numbers, translations, synonyms, and other elements.
✨ Features
- ✅ Accepts .html files directly from Prabhupāda’s Bhagavad-gītā website.
- ✅ Extracts:
- Chapter description
- Verse numbers and text
- Synonyms (standardised and cleaned)
- Translations
- Titles (e.g. ślokas or subheadings)
- ✅ Outputs a clean, readable .json file for further use in apps, studies, or publication pipelines.
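The extraction step can be sketched roughly as follows. This is a minimal illustration, not the cog's actual code: it uses the standard library's html.parser instead of beautifulsoup4 so that it runs without dependencies, and it assumes that each part of a verse sits in a div whose class equals the JSON key (Textnum, Verse-Text and so on) and that every verse starts with its Textnum. The names ChapterParser and parse_chapter are hypothetical.

```python
from html.parser import HTMLParser

# Assumption: each verse part is a <div> whose class equals the JSON key.
VERSE_FIELDS = {"Textnum", "Titles", "Uvaca-line",
                "Synonyms-SA", "Verse-Text", "Translation"}


class ChapterParser(HTMLParser):
    """Collects the text of known <div class="..."> blocks (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.result = {"Chapter-Desc": "", "Verses": []}
        self._field = None   # class of the div currently being read
        self._depth = 0      # nested divs opened inside that div
        self._chunks = []    # text pieces collected so far

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self._field is not None:
            self._depth += 1
            return
        cls = dict(attrs).get("class") or ""
        if cls in VERSE_FIELDS or cls == "Chapter-Desc":
            self._field, self._depth, self._chunks = cls, 0, []

    def handle_data(self, data):
        if self._field is not None:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag != "div" or self._field is None:
            return
        if self._depth > 0:
            self._depth -= 1
            return
        # Collapse all whitespace; a real parser may want to keep line breaks.
        text = " ".join("".join(self._chunks).split())
        field, self._field = self._field, None
        if not text:
            return  # skip empty elements
        if field == "Chapter-Desc":
            self.result["Chapter-Desc"] = text
            return
        # Assumption: a Textnum div opens each new verse.
        if field == "Textnum" or not self.result["Verses"]:
            self.result["Verses"].append({})
        self.result["Verses"][-1][field] = text


def parse_chapter(html: str) -> dict:
    """Parse one saved chapter page into the output structure."""
    parser = ChapterParser()
    parser.feed(html)
    parser.close()
    return parser.result
```

Calling parse_chapter on the text of a saved chapter returns a dict with a Chapter-Desc string and a Verses list. If the real pages use different class names or nesting, VERSE_FIELDS and the verse-splitting rule would need adjusting.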
🛠 Installation
1. Add this Modmail plugin to your cogs/ folder, or use the command in step 3.
2. BS4 (beautifulsoup4) is installed automatically via requirements.txt.
3. Load the cog in your bot:
!plugin add WebKide/modmail-plugins/html2json@master
Adjust the path as needed based on your cog structure.
🧪 Usage
✅ Command:
!html2json
📝 How to Use:
- Open any chapter on https://prabhupadabooks.com/bg.
- Save the entire webpage as an .html file.
- Attach it to your message and use the command !html2json.
💡 Example:
User: (uploads BG_02.html)
User: !html2json
User: hits Enter
Bot: (uploads BG_02.json with parsed verses)
📁 Output Format (JSON)
{
  "Chapter-Desc": "The Supreme Personality of Godhead said...",
  "Verses": [
    {
      "Textnum": "2.1",
      "Titles": "Śrī Bhagavān uvāca",
      "Uvaca-line": "śrī-bhagavān uvāca",
      "Synonyms-SA": "śrī-bhagavān — the Supreme Personality of Godhead; uvāca — said...",
      "Verse-Text": "saṅjaya uvāca...\nkṛipayā parayāviṣṭo...",
      "Translation": "Sañjaya said: Seeing Arjuna full of compassion..."
    },
    ...
  ]
}
🧩 Dependencies
- discord.py (v2.x)
- beautifulsoup4
👨‍💻 Developer Notes
- Filename Check: Only accepts .html files.
- Safe Parsing: Filters empty elements and whitespace.
- Cleaning Logic: Normalises hyphens, em-dashes, and punctuation in Sanskrit synonyms and translations.
- Output Handling: JSON is returned as a downloadable file directly in the Discord chat.
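The notes above could look roughly like this in code. It is a hedged sketch using only the standard library: the function names (is_html_upload, clean_synonyms, to_json_file) and the exact normalisation rules are assumptions for illustration, not the cog's actual implementation.

```python
import io
import json
import re
from pathlib import Path


def is_html_upload(filename: str) -> bool:
    """Filename check: accept only names ending in .html (case-insensitive)."""
    return filename.lower().endswith(".html")


def clean_synonyms(text: str) -> str:
    """Hypothetical cleaning pass for a synonyms string."""
    # En- or em-dash between word and gloss becomes one spaced em-dash.
    text = re.sub(r"\s*[\u2014\u2013]\s*", " \u2014 ", text)
    # No spaces around plain hyphens inside Sanskrit compounds.
    text = re.sub(r"\s*-\s*", "-", text)
    # No space before closing punctuation.
    text = re.sub(r"\s+([;,.])", r"\1", text)
    # Collapse remaining runs of whitespace.
    return " ".join(text.split())


def to_json_file(data: dict, source_name: str) -> tuple[str, io.BytesIO]:
    """Serialise to an in-memory file named after the upload.

    ensure_ascii=False keeps diacritics readable instead of escaping them.
    """
    payload = json.dumps(data, ensure_ascii=False, indent=2).encode("utf-8")
    return Path(source_name).stem + ".json", io.BytesIO(payload)
```

In a discord.py command, the returned name and buffer could then be wrapped as discord.File(buffer, filename=name) and sent back to the channel. Building the file in memory avoids writing temporary files on the bot's host.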
🙏 Credits
Developed for parsing and preserving the teachings of Śrīla A.C. Bhaktivedānta Swāmī Prabhupāda from Bhagavad-gītā As It Is in a structured data format.