mod_proxy_html
Note
mod_proxy_html has now been relicensed and incorporated into the core Apache HTTPD distribution at apache.org from HTTPD 2.4. That version is now likely to be more up-to-date than this one.
A one-line bug fix has been added since version 3.1.3. If you downloaded the bundled package from here, you should apply the patch before compiling. The bug won't affect most users and is not a security issue, so if you are already using mod_proxy_html successfully there's nothing to do.
mod_proxy_html is an output filter to rewrite HTML links in a proxy
situation, to ensure that links work for users outside the proxy.
It serves the same purpose as Apache's ProxyPassReverse
directive does for HTTP headers, and is an essential component of a
reverse proxy.
For example, if a company has an application server at appserver.example.com that is only visible from within the company's internal network, and a public webserver www.example.com, they may wish to provide a gateway to the application server at http://www.example.com/appserver/. When the application server links to itself, those links need to be rewritten to work through the gateway. mod_proxy_html serves to rewrite <a href="http://appserver.example.com/foo/bar.html">foobar</a> to <a href="http://www.example.com/appserver/foo/bar.html">foobar</a> making it accessible from outside.
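The gateway in this example could be configured roughly as follows. This is a minimal sketch rather than a recommended setup: it reuses the hostnames above, assumes mod_proxy, mod_proxy_http and mod_proxy_html 3.1 or later are already loaded, and should be checked against the tutorial and configuration guide before use.

```apache
# Forward requests under /appserver/ to the internal application server.
ProxyPass /appserver/ http://appserver.example.com/

<Location /appserver/>
    # Fix up Location headers in redirects issued by the backend.
    ProxyPassReverse http://appserver.example.com/
    # Insert the mod_proxy_html output filter for this location.
    ProxyHTMLEnable On
    # Rewrite absolute links that point at the internal host.
    ProxyHTMLURLMap http://appserver.example.com http://www.example.com/appserver
    # Rewrite root-relative links such as href="/foo/bar.html".
    ProxyHTMLURLMap / /appserver/
</Location>
```

ProxyPassReverse handles the HTTP headers, while the two ProxyHTMLURLMap rules handle links inside the HTML itself.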
April 2006: Added a new FAQ to deal with some of the questions people commonly mail me about.
Origins and History
For more recent history, see updates at apache.org.
The current bugfix release is 3.1.4. The most recent single-line bugfix is dated 2013-02-04.
The current version is 3.1. This builds on the new features in 3.0, but delegates internationalisation/charset support to mod_xml2enc. The previous version 3.0 is here.
The origins and history of mod_proxy_html are documented in a separate page.
Related Modules
- mod_proxy_html is derived originally from mod_accessibility.
- A companion module mod_proxy_xml serves a similar purpose for XML document types, based on extensible namespace processing.
- mod_publisher is the universal markup manipulation filter. It includes the capabilities of mod_proxy_html amongst a broad range of powerful markup processing options.
- mod_xml2enc provides extensive internationalisation support for mod_proxy_html. In the absence of mod_xml2enc, mod_proxy_html will work well on ASCII or UTF-8 (Unicode), but may display other character sets incorrectly (or not at all).
Capabilities
The original capabilities of mod_proxy_html are:
- Parses HTML and XHTML markup, rewriting links according to rules defined by the server admin.
- Optionally converts HTML to XHTML or vice versa.
- Optionally makes some minor corrections to broken HTML.
Important changes from 1.x to 2.x include:
- Support for rewriting URLs in scripts, stylesheets and scripting events
- Support for regular expression match-and-replace
- Improved charset detection, including XML BOM, XML declaration, and HTML META where the information is not available in HTTP.
- Support for converting meta http-equiv HTML elements to real HTTP headers.
- The default FPI (doctype) generated has changed to none, on the grounds that it's better to omit it than declare a bogus doctype. You should now configure it explicitly if your backend generates sane HTML or XHTML.
- A verbose logging option is provided to help with testing your configuration and diagnosing exactly what it is doing.
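As an illustration, several of the 2.x features listed above map onto configuration directives along these lines. This is a sketch only: the numbered backend hostnames in the regular expression are a hypothetical example, and the exact syntax and flags should be confirmed in the configuration guide.

```apache
# Also rewrite URLs inside scripts, stylesheets and scripting events.
ProxyHTMLExtended On

# Regular-expression rule (R flag). The pattern is a hypothetical example
# matching appserver.example.com, appserver1.example.com, and so on.
ProxyHTMLURLMap ^http://appserver[0-9]*\.example\.com /appserver R

# Convert meta http-equiv elements into real HTTP headers.
ProxyHTMLMeta On

# Declare a doctype explicitly, since none is generated by default.
# This form also selects XHTML syntax for the output.
ProxyHTMLDoctype XHTML
```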
Important changes from 2.x to 3.x include:
- Improved Internationalisation support no longer relies on libxml2's useful but limited builtin capabilities, but is delegated to mod_xml2enc.
- Configurable support for proprietary HTML variants.
- Enables fixups for bad HTML, including corrections and workarounds for common problems.
- Flexible configuration using environment variables, interpolation, and conditional execution of rules.
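The 3.x features above might be combined along the following lines. This is a hedged sketch: the BETA_HOST variable, the beta.example.com host and the /appserver/beta/ path are hypothetical names introduced for illustration, and the rule syntax should be verified against the configuration guide for your version.

```apache
# Allow environment variables to be interpolated into rules per request.
ProxyHTMLInterp On

# Hypothetical variable, set only for requests under /appserver/beta/.
SetEnvIf Request_URI ^/appserver/beta/ BETA_HOST=beta.example.com

# V flag: interpolate variables in the from-pattern.
# Fourth argument: apply the rule only when BETA_HOST is set.
ProxyHTMLURLMap http://${BETA_HOST} /appserver/beta V BETA_HOST

# Treat attributes from non-standard HTML variants as links to rewrite.
ProxyHTMLLinks embed src
ProxyHTMLLinks td background
```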
With these new features, mod_proxy_html might find applications outside a
proxy context, and an environment variable PROXY_HTML_FORCE
can be used to force it to process non-proxied pages.
But reverse-proxying remains its primary purpose, while
mod_publisher offers a far wider range
of capabilities for other applications.
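A sketch of forcing the filter onto locally served pages might look like this. The /docs/ location and old.example.com host are hypothetical, and the value given to PROXY_HTML_FORCE is an assumption: this sketch presumes the module only checks that the variable is set.

```apache
# Hypothetical location served directly by this server, not proxied.
<Location /docs/>
    # Ask mod_proxy_html to process these pages even though they are not proxied.
    SetEnv PROXY_HTML_FORCE 1
    ProxyHTMLEnable On
    # Example rule: repoint links that still refer to a retired hostname.
    ProxyHTMLURLMap http://old.example.com http://www.example.com
</Location>
```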
How to use it
Important Note: Configuration has changed:
- Unless you only need to handle Unicode (or ASCII), you should load mod_xml2enc alongside mod_proxy_html.
- Instead of using Apache's filter configuration directives, use the new ProxyHTMLEnable directive, and mod_proxy_html will configure both itself and mod_xml2enc for you.
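Put together, the two points above could look something like the following. The library and module file paths are assumptions that vary between platforms and builds, and /appserver/ is just an example location.

```apache
# libxml2 must be available to both modules (path is platform-dependent).
LoadFile /usr/lib/libxml2.so

# Load mod_proxy_html together with mod_xml2enc for charset support.
LoadModule proxy_html_module modules/mod_proxy_html.so
LoadModule xml2enc_module    modules/mod_xml2enc.so

<Location /appserver/>
    # Replaces manual filter-chain setup with Apache's filter directives.
    ProxyHTMLEnable On
</Location>
```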
We now have three documents: a tutorial on reverse proxying (including basic use of mod_proxy_html), a user guide, and a configuration guide. Please read the tutorial first, unless you already know it all!
Tutorial: Reverse Proxying
A tutorial on reverse proxying with Apache by the author of this module is available at ApacheTutor. This offers an in-depth overview of the problem solved by this module, in context.
Configuration Guide
Reference documentation for the configuration directives implemented by mod_proxy_html.
Technical Guide
This focuses on the module itself, separate from the context of the problem it solves.
Frequently Asked Questions
Some questions come up repeatedly (sometimes in the form of bug reports, feature requests, or patches). Please see the FAQ before mailing anything.
Support
Support is available from the developer for paying clients.
Support is also available on an ad-hoc basis from the Apache community, including the users mailing list and the #httpd channel on irc.freenode.org.
Availability
mod_proxy_html.c source code is available from here under the GNU General Public License (GPL) Version 2.
- Last bundled Version (3.1.2) as bzipped tar (signature) and as zipfile (signature).
- Development Version.
NOTE: If you downloaded the bundle before October 30th, you should upgrade your copy of mod_xml2enc to 1.0.3, which fixes a bug. The current bundle includes the bugfix.
The current version from apache.org has been relicensed under the Apache License. If you are redistributing anything derived from mod_proxy_html, you should use the source whose license best meets your needs. If neither license is appropriate, we can consider relicensing on your own terms for payment.
Errata
Users of the FreeBSD port of mod_proxy_html, please see this fix.