Fix issue with multi-byte utf-8 characters in MD by ZarToK · Pull Request #1221 · simplesamlphp/simplesamlphp
Hi @ZarToK!
Can you provide any more information on this? Why is this necessary? What's the issue you are having, and how to reproduce?
I'm asking because UTF-8 is the default, as far as I see, and after a very quick test, it looks like it works as expected with multibyte characters. So I'm wondering if this could be an issue with your particular deployment...
@tvdijen sorry for not responding to this. I will try to replicate the issue again and describe it better. It could be our environment, but we are running this in a Kubernetes cluster with Docker images, so it's basically a fresh php7.3-apache setup. It could of course be that I introduced an encoding issue in a config file for the SP or something, but the reason for the patch is that the contact information for the SP contained Swedish characters like "åäö", and these would get corrupted, so the XML could not be read by others.
Not a problem at all! Let me know and we'll reopen the issue if necessary.. I've seen the strangest things happen in docker-containers so I'm really not surprised tbh..
Dear @tvdijen,
Italian language has several characters that love UTF-8: à è ì ò ù (for example).
Can I hope that SSP v2.0 will generate XML metadata with the "UTF-8" encoding, so that our special characters are also supported in the metadata values?
Thank you!
Marco
On v2.0.0 I see that the generated metadata doesn't have the UTF-8 encoding.
Can I expect it to be there in v2.0.1?
Thank you @tvdijen
Without the patch proposed by @ZarToK:
With the patch proposed by @ZarToK:
I see an improvement with the patch proposed.
Is it clear now?
I would also add that the line:
<?xml version="1.0" encoding="utf-8"?>
is not present in the metadata downloaded from the location /simplesaml/module.php/saml/idp/metadata.
Thanks
I've checked and can reproduce indeed. The fix by @ZarToK seems to fix it just fine.
Removal of the first line seems to happen by
// make sure to export only the md:EntityDescriptor
$i = strpos($metaxml, '<md:EntityDescriptor');
$metaxml = substr($metaxml, $i ? $i : 0);
// 22 = strlen('</md:EntityDescriptor>')
$i = strrpos($metaxml, '</md:EntityDescriptor>');
$metaxml = substr($metaxml, 0, $i ? $i + 22 : 0);
Not sure why that was added @tvdijen? It seems newly added in f461c7b and was not present in the www-script version.
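To illustrate why that trimming drops the first line, here is a minimal sketch that runs the same string operations on a hand-written metadata string. The sample `$metaxml` value, the entityID and the `$trimmed` / `$hasDeclaration` variable names are assumptions made up for this example; real SimpleSAMLphp output is much larger.

```php
<?php

// Hypothetical metadata string, written by hand for illustration only.
$metaxml = '<?xml version="1.0" encoding="utf-8"?>' . "\n"
    . '<md:EntityDescriptor xmlns:md="urn:oasis:names:tc:SAML:2.0:metadata" entityID="https://sp.example.org">'
    . '<md:ContactPerson contactType="technical"><md:GivenName>åäö</md:GivenName></md:ContactPerson>'
    . '</md:EntityDescriptor>' . "\n";

// Same trimming logic as the snippet quoted above.
$i = strpos($metaxml, '<md:EntityDescriptor');
$trimmed = substr($metaxml, $i ? $i : 0);
// 22 = strlen('</md:EntityDescriptor>')
$i = strrpos($trimmed, '</md:EntityDescriptor>');
$trimmed = substr($trimmed, 0, $i ? $i + 22 : 0);

// Everything before the root element, including the XML declaration, is cut off.
$hasDeclaration = strpos($trimmed, '<?xml') !== false;
var_dump($hasDeclaration);
```

Since the declaration sits before `<md:EntityDescriptor`, the first `substr` call discards it along with anything else that precedes the root element.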
So, maybe it wasn't a good idea to drop DOMDocumentFactory::create @thijskh ? Do we want to default to UTF-8 in this case?
In general I think utf-8 is the only sane choice in the current world. So I think it's best to always create utf-8 documents.
In current SSP it seems the SAML messages are correctly encoded. Only the metadata is not. So this fixes the metadata. Except with the note that the <?xml .. part is stripped off for unknown reasons as per my previous comment.
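As a rough sketch of what "always create utf-8 documents" can look like with PHP's DOM extension, the snippet below passes the encoding to the `DOMDocument` constructor. This is not the actual patch from this PR; the element name and the variable names are made up for the example, and it assumes the source file itself is saved as UTF-8.

```php
<?php

// Create the document with an explicit encoding (requires ext-dom).
$doc = new DOMDocument('1.0', 'utf-8');

// Hypothetical element holding multi-byte characters.
$root = $doc->createElement('GivenName');
$root->appendChild($doc->createTextNode('åäö'));
$doc->appendChild($root);

// The serialized output carries the encoding in its XML declaration
// and keeps the multi-byte characters as literal UTF-8 bytes.
$xml = $doc->saveXML();
echo $xml;
```

When no encoding is set on the document, `saveXML()` may fall back to numeric character references for non-ASCII characters, so setting the encoding explicitly makes the output predictable.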
Can't tell you that.. Need to investigate this a bit more, and there's only 48 hrs in a weekend before life/work takes over again
Forgive me @tvdijen, I didn't want to rush. I can wait without problems.
No offense taken ;)
I tried to trace back my steps, but I cannot recall why this was done.. I'm fine with removing it

