DefaultEncoder / getCanonicalizedURI returns mix encoding for HTML special characters · ESAPI/esapi-java-legacy · Discussion #823
There's not enough data here for me to offer a diagnosis. What's the input URL that's giving you trouble? @krog78 we have to start there or there's no conversation to be had.
The method is designed to allow mixed or multiple encoding depending on what you have set in your ESAPI.properties file, so I would check there first.
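For a quick check of those two settings outside the application, here is a minimal sketch using only `java.util.Properties`. The key names `Encoder.AllowMultipleEncoding` and `Encoder.AllowMixedEncoding` are what I would expect to find in a stock ESAPI.properties, but treat them as an assumption and confirm against your own file. The class `EncoderFlagCheck` and its helpers are hypothetical, not part of ESAPI.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class EncoderFlagCheck {

    /** Parses properties text (e.g. the contents of ESAPI.properties). */
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** Reads a boolean flag; a missing key is treated as false in this sketch. */
    static boolean flag(Properties props, String key) {
        return Boolean.parseBoolean(props.getProperty(key, "false"));
    }

    public static void main(String[] args) {
        // Assumed key names; replace the sample text with your real file contents.
        String sample = "Encoder.AllowMultipleEncoding=false\n"
                      + "Encoder.AllowMixedEncoding=false\n";
        Properties props = parse(sample);
        System.out.println("allowMultiple = " + flag(props, "Encoder.AllowMultipleEncoding"));
        System.out.println("allowMixed    = " + flag(props, "Encoder.AllowMixedEncoding"));
    }
}
```

If either flag prints `true` and you did not intend that, it would explain mixed-encoded input passing through.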
As written, the loop that canonicalizes begins like this:
for (UriSegment seg : set) {
    String value = canonicalize(parseMap.get(seg), allowMultiple, allowMixed);
This iterates over the entire set of segments, including the query if present, and each one gets canonicalized on this line. Unless, of course, you've configured allowMixed and allowMultiple contrary to your intentions. I DO see that I didn't mention in the documentation that those parameters come from ESAPI.properties, and I will update the docs accordingly.
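To make the effect of the two flags concrete, here is a simplified, self-contained sketch of a per-segment canonicalization loop. It is not ESAPI's implementation: the class `CanonicalizeSketch`, the two toy decoders, and the use of `IllegalArgumentException` are all assumptions for illustration, and only the path and query are put in the map. "Multiple" here means one decoder had to run more than once; "mixed" means more than one kind of decoder changed the input.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

public class CanonicalizeSketch {

    /** Decodes one layer of %XX escapes (as Latin-1); malformed escapes pass through. */
    static String percentDecodeOnce(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '%' && i + 2 < s.length()
                    && Character.digit(s.charAt(i + 1), 16) >= 0
                    && Character.digit(s.charAt(i + 2), 16) >= 0) {
                out.append((char) Integer.parseInt(s.substring(i + 1, i + 3), 16));
                i += 2;
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    /** Decodes one layer of a few HTML entities; "&amp;" goes last so "&amp;lt;" yields "&lt;". */
    static String entityDecodeOnce(String s) {
        return s.replace("&lt;", "<").replace("&gt;", ">")
                .replace("&quot;", "\"").replace("&amp;", "&");
    }

    /** Decodes until stable, then rejects multiple or mixed encoding unless allowed. */
    static String canonicalize(String input, boolean allowMultiple, boolean allowMixed) {
        if (input == null) return null;
        String working = input;
        int percentPasses = 0, entityPasses = 0;
        boolean changed = true;
        while (changed) {
            changed = false;
            String p = percentDecodeOnce(working);
            if (!p.equals(working)) { percentPasses++; working = p; changed = true; }
            String e = entityDecodeOnce(working);
            if (!e.equals(working)) { entityPasses++; working = e; changed = true; }
        }
        if (!allowMultiple && (percentPasses > 1 || entityPasses > 1)) {
            throw new IllegalArgumentException("Multiple encoding detected: " + input);
        }
        if (!allowMixed && percentPasses > 0 && entityPasses > 0) {
            throw new IllegalArgumentException("Mixed encoding detected: " + input);
        }
        return working;
    }

    /** Canonicalizes every segment in the map, query included, with the same flags. */
    static Map<String, String> canonicalizeSegments(String url, boolean allowMultiple,
                                                    boolean allowMixed) {
        URI uri = URI.create(url);
        Map<String, String> parseMap = new LinkedHashMap<>();
        parseMap.put("path", uri.getRawPath());
        parseMap.put("query", uri.getRawQuery());
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> seg : parseMap.entrySet()) {
            result.put(seg.getKey(), canonicalize(seg.getValue(), allowMultiple, allowMixed));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(canonicalizeSegments(
                "https://example.com/search?q=%3Cscript%3E", false, false));
    }
}
```

With both flags `false`, a doubly encoded query such as `q=%253Cscript%253E` is rejected in this sketch; with the flags set to `true` it is decoded all the way down, which is the kind of behavior a permissive configuration would produce.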
Further, all of the regression tests in ESAPI were updated to use getCanonicalizedURI. I'm not saying that we couldn't have missed something, but having walked through the code here, I'm not seeing anything obviously wrong.
As for the question:
And the canonicalize is applied to scheme, host, port and also UriSegment.SCHEMSPECIFICPART, is it really relevant?
It would have been nice to have you on the code review back when this was created. I DID NOT think deeply about each URL segment, so this is a bug. Part of what this method was designed to do was avoid the general bad practice of using regex to validate URLs when URLs have their own BNF grammar. Validating against that segment breaks our intent, so this is definitely a bug. It is separate from your original issue, however, which we still have to diagnose.
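As an illustration of parsing by grammar rather than by regex, here is a small sketch that lets `java.net.URI` split a URL into components and deliberately leaves the scheme-specific part out of the map, since it overlaps the authority, path, and query that are already handled individually. The class `UriSegmentSketch` and its map keys are hypothetical names, not ESAPI's `UriSegment` enum, and this is only one reading of why that segment is a problem.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

public class UriSegmentSketch {

    /** Splits a URL into components using java.net.URI's parser instead of a regex. */
    static Map<String, String> segments(String url) {
        URI uri = URI.create(url); // throws IllegalArgumentException on a malformed URI
        Map<String, String> parts = new LinkedHashMap<>();
        parts.put("scheme", uri.getScheme());
        parts.put("host", uri.getHost());
        parts.put("port", uri.getPort() == -1 ? null : String.valueOf(uri.getPort()));
        parts.put("path", uri.getRawPath());
        parts.put("query", uri.getRawQuery());
        parts.put("fragment", uri.getRawFragment());
        // The scheme-specific part is intentionally not added: it repeats the
        // authority, path, and query that are already present above.
        return parts;
    }

    public static void main(String[] args) {
        String url = "https://example.com:8443/a/b?q=%3Cx%3E#top";
        System.out.println(segments(url));
        // Shows the overlap: the scheme-specific part contains the raw query.
        System.out.println(URI.create(url).getRawSchemeSpecificPart());
    }
}
```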
I'm currently running a regression with that piece taken out. Please get back to us with the input, the usage, and your properties file configuration so we can make a determination.