GitHub - mysticmind/reversemarkdown-net: ReverseMarkdown.Net is a Html to Markdown converter library in C#. Conversion is very reliable since HtmlAgilityPack (HAP) library is used for traversing the Html DOM

Meet ReverseMarkdown


ReverseMarkdown is an HTML to Markdown converter library in C#. It uses the HtmlAgilityPack (HAP) library to traverse the HTML DOM, which makes conversion reliable even on imperfect markup.

If you have used and benefited from this library, please feel free to sponsor me via GitHub Sponsors!

Features

Core conversion

  • Supports common HTML tags like h1-h6, p, em, strong, i, b, blockquote, code, img, a, hr, li, ol, ul, table, tr, th, td, br, pre, del, strike, sup, dl, dt, dd, div, and span
  • Supports nested lists
  • Improved performance with optimized text writer approach and O(1) ancestor lookups

Markdown flavors

  • GitHub Flavoured Markdown conversion for br, pre, tasklists, and table. Use var config = new ReverseMarkdown.Config { GithubFlavored = true }; By default, tables are always converted to GitHub Flavoured Markdown regardless of this flag
  • Slack Flavoured Markdown conversion. Use var config = new ReverseMarkdown.Config { SlackFlavored = true };
  • CommonMark-focused output with opt-in flags to preserve compatibility. Use var config = new ReverseMarkdown.Config { CommonMark = true }; This mode may emit inline HTML for tricky emphasis/link cases unless you disable CommonMarkUseHtmlInlineTags.
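As a rough illustration of the flavor switches listed above, the sketch below builds one converter per flavor and runs the same input through both. It assumes the ReverseMarkdown NuGet package is referenced and uses only the option names mentioned in this README; the sample HTML is made up for the example, and the exact output is not shown because it depends on the library version.

```csharp
using System;
using ReverseMarkdown;

// Slack-flavoured output (per the option description: * for bold, _ for italic).
var slackConverter = new Converter(new Config { SlackFlavored = true });

// CommonMark-focused output with the inline HTML fallback switched off.
var commonMarkConverter = new Converter(new Config
{
    CommonMark = true,
    CommonMarkUseHtmlInlineTags = false
});

// Illustrative input only.
string html = "<p>Some <strong>bold</strong> and <em>italic</em> text</p>";

Console.WriteLine(slackConverter.Convert(html));
Console.WriteLine(commonMarkConverter.Convert(html));
```

Keeping one Config per flavor, rather than toggling several flavor flags on a single Config, follows the note in the configuration options that CommonMark is best used on its own.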

Tables

  • Support for nested tables (converted as HTML inside markdown)
  • Support for table captions (rendered as paragraph above table)
  • Configurable table header handling

Links and images

  • Smart link handling and URI scheme whitelisting for links and images
  • Base64-encoded image handling with options to include as-is, skip, or save to disk

Extensibility and safety

  • Tag aliasing and unknown tag replacement options for custom conversion behavior
  • Pass-through, bypass, drop, or raise strategies for unknown tags
  • Pre-tidy handling for malformed unclosed script/style tags

Formatting controls

  • Configurable list bullets and default code block language
  • Comment removal and optional whitespace cleanup

Usage

Install the package from NuGet using Install-Package ReverseMarkdown or clone the repository and build it yourself.

var converter = new ReverseMarkdown.Converter();

string html = "This is a sample <strong>paragraph</strong> from <a href=\"http://test.com\">my site</a>";

string result = converter.Convert(html);

Will result in:

This is a sample **paragraph** from [my site](http://test.com)


The conversion can also be customized:

var config = new ReverseMarkdown.Config
{
    // Include unknown tags in the result as-is (this is also the default)
    UnknownTags = Config.UnknownTagsOption.PassThrough,
    // generate GitHub flavoured markdown, supported for BR, PRE and table tags
    GithubFlavored = true,
    // will ignore all comments
    RemoveComments = true,
    // remove markdown output for links where appropriate
    SmartHrefHandling = true
};

var converter = new ReverseMarkdown.Converter(config);
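To see what SmartHrefHandling changes in practice, a small sketch like the following can be run against two links, one whose text equals its href and one whose text differs. The URLs and link text are made up for illustration, and the output is printed rather than asserted because it may vary by library version.

```csharp
using System;
using ReverseMarkdown;

var smartConverter = new Converter(new Config { SmartHrefHandling = true });

// Link text identical to the href: per the option description,
// only the name should be emitted.
string sameText = "<a href=\"http://example.com\">http://example.com</a>";

// Link text differs from the href: regular markdown link syntax is expected.
string differentText = "<a href=\"http://example.com\">Example site</a>";

Console.WriteLine(smartConverter.Convert(sameText));
Console.WriteLine(smartConverter.Convert(differentText));
```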


To treat <pre> (and <pre><code>) content as normal HTML instead of code blocks:

var config = new ReverseMarkdown.Config
{
    ConvertPreContentAsHtml = true
};

var converter = new ReverseMarkdown.Converter(config);

If you need to preserve markdown-like text as literal content (for example # Heading or - Item), either enable EscapeMarkdownLineStarts or use CommonMark:

var config = new ReverseMarkdown.Config
{
    EscapeMarkdownLineStarts = true
    // or CommonMark = true
};

var converter = new ReverseMarkdown.Converter(config);
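One way to check the effect of EscapeMarkdownLineStarts is to convert the same input with and without the option and compare the results. This is only a sketch; the input HTML is invented for the example and the printed output is left for you to inspect.

```csharp
using System;
using ReverseMarkdown;

var defaultConverter = new Converter();
var escapingConverter = new Converter(new Config { EscapeMarkdownLineStarts = true });

// Plain text that happens to look like a markdown heading and a list item.
string html = "<p># Not a heading</p><p>- Not a list item</p>";

Console.WriteLine("Default:");
Console.WriteLine(defaultConverter.Convert(html));

Console.WriteLine("With EscapeMarkdownLineStarts:");
Console.WriteLine(escapingConverter.Convert(html));
```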

Configuration options

  • DefaultCodeBlockLanguage - Sets the default code block language for GitHub style markdown when class-based language markers are not available

  • GithubFlavored - GitHub style markdown for br, pre and table. Default is false

  • SlackFlavored - Slack style markdown formatting. When enabled, uses * for bold, _ for italic, ~ for strikethrough, and • for list bullets. Default is false

  • CommonMark - Enable CommonMark-focused output rules. Default is false

  • CommonMarkUseHtmlInlineTags - When CommonMark is enabled, emit HTML for inline tags (em, strong, a, img) to avoid delimiter edge cases. Default is true

  • CommonMarkIntrawordEmphasisSpacing - When CommonMark is enabled, insert spaces to avoid intraword emphasis. Default is false

    • Note: CommonMark is best used on its own. Combining CommonMark with GithubFlavored can produce mixed output; keep them separate unless you explicitly want that behavior.
  • EscapeMarkdownLineStarts - Escape markdown line starts (headings, lists, block markers) in plain text output. Default is false

    • Note: If you need to preserve markdown-like text as literal content, enable EscapeMarkdownLineStarts or use CommonMark.
  • OutputLineEnding - Output line endings used in generated markdown. Default is Environment.NewLine

  • CleanupUnnecessarySpaces - Cleanup unnecessary spaces in the output. Default is true

  • SuppressDivNewlines - Removes prefixed newlines from div tags. Default is false

  • ConvertPreContentAsHtml - Treat <pre> (and <pre><code>) content as normal HTML instead of a code block. Default is false

  • ListBulletChar - Changes the bullet character used for list items. Default value is -. Some systems expect * rather than -, and this option lets you switch. Note: This option is ignored when SlackFlavored is enabled

  • RemoveComments - Remove HTML comments, including their text. Default is false

  • SmartHrefHandling - How to handle the href attribute of <a> tags

    • false - Outputs [{name}]({href}{title}) even if the name and href are identical. This is the default option.

    • true - If the name and href are equal, outputs just the name. Note that if the URI is not well formed as per Uri.IsWellFormedUriString (for example, a string that is not correctly escaped, such as http://example.com/path/file name.docx), markdown link syntax is used anyway.

      If the href contains the http/https protocol and the name does not, but they are otherwise the same, only the href is output.

      If the href uses the tel: or mailto: scheme and the remainder is identical to the name, only the name is output.

  • UnknownTags - How to handle unknown tags.

    • UnknownTagsOption.PassThrough - Include the unknown tag completely into the result. That is, the tag along with the text will be left in output. This is the default
    • UnknownTagsOption.Drop - Drop the unknown tag and its content
    • UnknownTagsOption.Bypass - Ignore the unknown tag but try to convert its content
    • UnknownTagsOption.Raise - Raise an error to let you know
  • UnknownTagsReplacer - Optional replacements for unknown tags. Key is tag name and value is the markdown wrapper used as prefix/suffix around converted content (example: { ["u"] = "*" }).

  • TagAliases - Optional alias map to treat a tag as another tag during conversion (example: { ["u"] = "em" }).

  • PassThroughTags - Pass a list of tags to pass through as-is without any processing.

  • WhitelistUriSchemes - Specify which schemes (without trailing colon) are allowed for <a> and <img> tags. Others will be bypassed (output text or nothing). By default, everything is allowed.

    If string.Empty is included in the whitelist, links whose href or src scheme cannot be determined are allowed.

    The scheme is determined by the Uri class, except when the URL begins with / (treated as the file scheme) or // (treated as the http scheme)

  • TableWithoutHeaderRowHandling - How to handle tables without a header row

    • TableWithoutHeaderRowHandlingOption.Default - First row will be used as header row (default)
    • TableWithoutHeaderRowHandlingOption.EmptyRow - An empty row will be added as the header row
  • TableHeaderColumnSpanHandling - Controls whether table header cells with column spans are processed. Default is true

  • Base64Images - Control how base64-encoded images (inline data URIs) are handled during conversion

    • Base64ImageHandling.Include - Include base64-encoded images in the markdown output as-is (default behavior)
    • Base64ImageHandling.Skip - Skip/ignore base64-encoded images entirely
    • Base64ImageHandling.SaveToFile - Save base64-encoded images to disk and reference the saved file path in markdown. Requires Base64ImageSaveDirectory to be set
  • Base64ImageSaveDirectory - When Base64Images is set to SaveToFile, specifies the directory path where images should be saved

  • Base64ImageFileNameGenerator - When Base64Images is set to SaveToFile, this function generates a filename for each saved image. The function receives the image index (int) and MIME type (string), and should return a filename without extension. If not specified, images will be named as image_0, image_1, etc.
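Several of the options above can be combined in a single Config. The following is a minimal sketch, not a recommended setup. It assumes that TableWithoutHeaderRowHandlingOption is nested inside Config in the same way as UnknownTagsOption and Base64ImageHandling, and that ListBulletChar is a char; neither detail is stated above, so check both against the version you use. The custom-tag element and the sample HTML are made up for illustration.

```csharp
using System;
using ReverseMarkdown;

var config = new Config
{
    // Ignore unknown tags but still convert their content.
    UnknownTags = Config.UnknownTagsOption.Bypass,
    // Assumption: the enum is nested in Config like the other option enums.
    TableWithoutHeaderRowHandling = Config.TableWithoutHeaderRowHandlingOption.EmptyRow,
    // Assumption: the bullet is a char rather than a string.
    ListBulletChar = '*',
    // Used for code blocks that carry no class-based language marker.
    DefaultCodeBlockLanguage = "csharp",
    GithubFlavored = true
};

var converter = new Converter(config);

// Illustrative input: an unknown tag, a list, and a table without a header row.
string html =
    "<custom-tag>kept text</custom-tag>" +
    "<ul><li>one</li><li>two</li></ul>" +
    "<table><tr><td>a</td><td>b</td></tr></table>";

Console.WriteLine(converter.Convert(html));
```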

Custom converter alias

You can also register a tag to reuse another tag's converter directly:

var converter = new ReverseMarkdown.Converter();
converter.Register("u", new ReverseMarkdown.Converters.AliasConverter(converter, "em"));

Base64 Image Handling Examples

ReverseMarkdown provides flexible options for handling base64-encoded images (inline data URIs) during HTML to Markdown conversion.

Include Base64 Images (Default)

By default, base64-encoded images are included in the markdown output as-is:

var converter = new ReverseMarkdown.Converter();
string html = "<img src=\"data:image/png;base64,iVBORw0KGg...\" alt=\"Sample Image\"/>";
string result = converter.Convert(html);
// Output: ![Sample Image](data:image/png;base64,iVBORw0KGg...)


Skip Base64 Images

To ignore base64-encoded images entirely:

var config = new ReverseMarkdown.Config
{
    Base64Images = Config.Base64ImageHandling.Skip
};
var converter = new ReverseMarkdown.Converter(config);
string html = "<img src=\"data:image/png;base64,iVBORw0KGg...\" alt=\"Sample Image\"/>";
string result = converter.Convert(html);
// Output: (empty - image is skipped)


Save Base64 Images to Disk

To extract and save base64-encoded images to disk:

var config = new ReverseMarkdown.Config
{
    Base64Images = Config.Base64ImageHandling.SaveToFile,
    Base64ImageSaveDirectory = "/path/to/images"
};
var converter = new ReverseMarkdown.Converter(config);
string html = "<img src=\"data:image/png;base64,iVBORw0KGg...\" alt=\"Sample Image\"/>";
string result = converter.Convert(html);
// Output: ![Sample Image](/path/to/images/image_0.png)
// Image file saved to: /path/to/images/image_0.png


Custom Filename Generator

You can provide a custom filename generator for saved images:

var config = new ReverseMarkdown.Config
{
    Base64Images = Config.Base64ImageHandling.SaveToFile,
    Base64ImageSaveDirectory = "/path/to/images",
    Base64ImageFileNameGenerator = (index, mimeType) => 
    {
        var timestamp = DateTime.Now.ToString("yyyyMMdd_HHmmss");
        return $"converted_{timestamp}_{index}";
    }
};
var converter = new ReverseMarkdown.Converter(config);
// Images will be saved as: converted_20260108_143022_0.png, converted_20260108_143022_1.jpg, etc.


Supported Image Formats:

  • PNG (image/png)
  • JPEG (image/jpeg, image/jpg)
  • GIF (image/gif)
  • BMP (image/bmp)
  • TIFF (image/tiff)
  • WebP (image/webp)
  • SVG (image/svg+xml)

Breaking Changes

v5.0.0

Configuration Changes:

  • WhitelistUriSchemes - Changed from string[] to HashSet<string> (read-only property). Use .Add() method to add schemes instead of array assignment
  • PassThroughTags - Changed from string[] to HashSet<string>
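For code migrating from array assignment, the configuration changes above might look roughly like this. The sketch assumes PassThroughTags can be assigned a new set, which is not stated above (if it turns out to be read-only as well, call .Add() on it instead), and the video tag is only an example.

```csharp
using System.Collections.Generic;
using ReverseMarkdown;

var config = new Config
{
    // Assumption: PassThroughTags has a setter; "video" is an illustrative tag.
    PassThroughTags = new HashSet<string> { "video" }
};

// WhitelistUriSchemes is read-only, so schemes are added to the existing set.
config.WhitelistUriSchemes.Add("http");
config.WhitelistUriSchemes.Add("https");

var converter = new Converter(config);
```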

API Changes:

  • IConverter interface signature changed from string Convert(HtmlNode node) to void Convert(TextWriter writer, HtmlNode node). If you have custom converters, you'll need to update them to write to the TextWriter instead of returning a string
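A custom converter written against the new signature could look something like the sketch below. UnderlineConverter is a hypothetical class written for this example, not part of the library. The sketch also assumes that IConverter lives in the ReverseMarkdown.Converters namespace (alongside AliasConverter) and that the interface declares only the method shown above, so verify both against the version you target.

```csharp
using System;
using System.IO;
using HtmlAgilityPack;
using ReverseMarkdown;
using ReverseMarkdown.Converters;

var converter = new Converter();

// Register the hypothetical converter for <u> tags.
converter.Register("u", new UnderlineConverter());

Console.WriteLine(converter.Convert("<p>An <u>underlined</u> word</p>"));

// Hypothetical custom converter: writes its output to the TextWriter
// instead of returning a string, as the v5 signature requires.
public class UnderlineConverter : IConverter
{
    public void Convert(TextWriter writer, HtmlNode node)
    {
        // Markdown has no underline syntax, so keep the tag as inline HTML.
        writer.Write("<u>");
        writer.Write(node.InnerText);
        writer.Write("</u>");
    }
}
```

Writing straight to the shared TextWriter avoids building an intermediate string per node, which appears to be the motivation for the signature change given the performance note in the feature list.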

Target Framework Changes:

  • Removed support for legacy and end-of-life .NET versions. Only actively supported .NET versions are now targeted: .NET 8, .NET 9, and .NET 10.

v2.0.0

  • UnknownTags config has been changed to an enumeration

Acknowledgements

The initial implementation ideas for this library came from the Ruby-based HTML to Markdown converter xijo/reverse_markdown.

Copyright

Copyright © Babu Annamalai

License

ReverseMarkdown is licensed under the MIT license. Refer to the License file for more information.