Issue12129
Created on 2011-05-20 22:02 by Kyle.Keating, last changed 2022-04-11 14:57 by admin.

Files:

| File name | Uploaded | Description |
|---|---|---|
| xmlNameVerification.py | jocassid, 2013-07-28 02:49 | Code to validate XML element/attribute names |

Messages (7):

msg136402 - Author: Kyle Keating (Kyle.Keating) - Date: 2011-05-20 22:02
I was doing some tests with this library and noticed that XML element and attribute names can be created that produce malformed XML. Special characters that break well-formedness are neither rejected nor escaped from their literal forms. Only attribute values are escaped, not names.
For example:

```python
import xml.dom
...
doc.createElement("p></p>")
...
```
This simply embeds a pair of p tags in the XML output. I thought the XML spec did not permit <, >, &, \n, etc. in element or attribute names? Could I get some clarification on this? Thanks!

msg137142 - Author: Terry J. Reedy (terry.reedy) - Date: 2011-05-28 18:35
I suspect you are right, but I do not know the rules and have never used the module. There is no particular person maintaining xml.dom.X at present. Could you please fill in the ... after the import to give a complete minimal example that fails? Someone could then test it on 3.2.

msg137487 - Author: Kyle Keating (Kyle.Keating) - Date: 2011-06-02 17:10
This breaks pretty badly. I confirmed it on 3.0, and I'm guessing 3.2 behaves the same.
```python
import xml.dom

doc = xml.dom.getDOMImplementation().createDocument(None, 'xml', None)
doc.firstChild.appendChild(doc.createElement('element00'))

# Attribute value containing markup
element01 = doc.createElement('element01')
element01.setAttribute('attribute', "script><![CDATA[alert('script!');]]></script>")
doc.firstChild.appendChild(element01)

# Element name containing markup: accepted without any check
element02 = doc.createElement("script><![CDATA[alert('script!');]]></script>")
doc.firstChild.appendChild(element02)

# Element name and attribute value containing a space and a newline
element03 = doc.createElement("new line \n")
element03.setAttribute('attribute-name', 'new line \n')
doc.firstChild.appendChild(element03)

print(doc.toprettyxml(indent=" "))
```
Output:

```xml
<?xml version="1.0" ?>
<xml>
<element/>
<element01 attribute="script><![CDATA[alert('script!');]]></script
>"/>
<script><![CDATA[alert('script!');]]></script>/>
<new line
attribute-name="new line
"/>
</xml>
```
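The malformed output can also be confirmed mechanically by re-parsing it. This is a minimal sketch that assumes Python 3's xml.dom.minidom, which is the implementation getDOMImplementation() normally returns. `is_well_formed` is a hypothetical helper and is not part of the library:

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def is_well_formed(text):
    """Return True if text parses as a single XML document (hypothetical helper)."""
    try:
        minidom.parseString(text)
    except ExpatError:
        return False
    return True


doc = minidom.Document()
# createElement() stores the name as-is; nothing checks it against the
# XML Name production, so the markup characters go straight into the output.
element = doc.createElement("p></p>")
serialized = element.toxml()
print(serialized)                  # <p></p>/>
print(is_well_formed(serialized))  # False
print(is_well_formed("<p/>"))      # True
```

Re-parsing only works as a smoke test. It catches this case because the stray `/>` after the root element is not allowed, but it does not replace checking names when they are created.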

msg137488 - Author: Kyle Keating (Kyle.Keating) - Date: 2011-06-02 17:13
Oops, the first XML element in the output should read "<element00/>", not "<element/>". Just a typo, don't get confused!

msg193804 - Author: John Cassidy (jocassid) - Date: 2013-07-28 02:49
I added the line print(str(doc)) after the call to getDOMImplementation and verified that the errors I'm seeing come from the xml.dom.minidom implementation of xml.dom. Checking minidom.py, I did not see any validation of the tagName passed to createElement. http://www.w3.org/TR/xml11/#NT-NameStartChar lists the format of allowed names. Attached is a file containing the functions I was working on. My thinking is that if the tagName is not valid, a ValueError should be raised.
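Here is a sketch of the kind of check proposed above. It is written directly against the NameStartChar and NameChar productions at the linked URL, and the same ranges appear in XML 1.0 Fifth Edition. It is not the attached xmlNameVerification.py, whose contents are not reproduced on this page. `check_xml_name` is a hypothetical helper, and it raises ValueError as suggested. It checks only the plain Name production and does not apply the namespace (QName) rules about colons.

```python
import re

# NameStartChar ranges from the XML 1.1 spec (identical in XML 1.0 5th ed.).
_NAME_START_CHARS = (
    ":A-Z_a-z"
    "\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF"
    "\u0370-\u037D\u037F-\u1FFF\u200C-\u200D"
    "\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF"
    "\uF900-\uFDCF\uFDF0-\uFFFD\U00010000-\U000EFFFF"
)
# NameChar = NameStartChar plus "-", ".", digits, U+00B7 and two extra ranges.
_NAME_CHARS = _NAME_START_CHARS + "\\-.0-9\u00B7\u0300-\u036F\u203F-\u2040"

# Name ::= NameStartChar (NameChar)*
_NAME_RE = re.compile(f"[{_NAME_START_CHARS}][{_NAME_CHARS}]*")


def check_xml_name(name):
    """Return name unchanged if it matches the XML Name production,
    otherwise raise ValueError (hypothetical helper)."""
    if not isinstance(name, str) or _NAME_RE.fullmatch(name) is None:
        raise ValueError(f"invalid XML name: {name!r}")
    return name


print(check_xml_name("element00"))  # element00
try:
    check_xml_name("p></p>")
except ValueError as exc:
    print(exc)                      # invalid XML name: 'p></p>'
```

A wrapper such as `doc.createElement(check_xml_name(tag))` would then refuse the names from the earlier report before they reach the serializer.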

msg258344 - Author: Martin Panter (martin.panter) - Date: 2016-01-16 00:57
My limited understanding is that xml.dom and minidom are supposed to implement particular interfaces. So do these DOM interfaces specify whether this validation should be done? If so, this would be a bug. Or is it just a question of whether Python should do extra validation not specified by the underlying DOM API?
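As far as I can tell, the W3C DOM Level 1/2 Core specs list INVALID_CHARACTER_ERR as the exception raised by `Document.createElement` and `Element.setAttribute` when the name contains an illegal character. `xml.dom` already defines a matching `InvalidCharacterErr` class. The sketch below assumes that reading and adds the check by subclassing minidom's classes. `CheckedDocument`, `CheckedElement` and `_check_name` are hypothetical names, and this is not how minidom itself behaves.

```python
import re
import xml.dom
from xml.dom import minidom

# NameStartChar / NameChar ranges from the XML 1.1 (and XML 1.0 5th ed.) spec.
_START = (":A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D"
          "\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF"
          "\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD\U00010000-\U000EFFFF")
_NAME_RE = re.compile(
    f"[{_START}][{_START}\\-.0-9\u00B7\u0300-\u036F\u203F-\u2040]*"
)


def _check_name(name):
    """Raise the DOM-defined InvalidCharacterErr for anything that is not an XML Name."""
    if not isinstance(name, str) or _NAME_RE.fullmatch(name) is None:
        raise xml.dom.InvalidCharacterErr(f"invalid XML name: {name!r}")


class CheckedElement(minidom.Element):
    """minidom Element whose setAttribute() validates the attribute name."""

    def setAttribute(self, attname, value):
        _check_name(attname)
        super().setAttribute(attname, value)


class CheckedDocument(minidom.Document):
    """minidom Document whose createElement() validates the tag name."""

    def createElement(self, tagName):
        _check_name(tagName)
        element = CheckedElement(tagName)
        element.ownerDocument = self  # same wiring minidom.Document.createElement does
        return element


doc = CheckedDocument()
root = doc.appendChild(doc.createElement("xml"))
item = root.appendChild(doc.createElement("element01"))
item.setAttribute("attribute", "1")
print(doc.toxml())

for bad_name in ["p></p>", "new line \n"]:
    try:
        doc.createElement(bad_name)
    except xml.dom.InvalidCharacterErr as exc:
        print("rejected:", exc)
```

This only covers `createElement` and `setAttribute`. The namespace-aware variants (`createElementNS`, `setAttributeNS`), `createAttribute`, and documents produced by the parser would need the same treatment.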

msg283873 - Author: Pradeep (pdeep5693) - Date: 2016-12-23 08:39
xml.dom.minidom needs extra validation in setAttribute for certain special characters, depending on the attribute name. Attribute values cannot contain special characters like < and >, and cannot be nested, as in the example below:
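```python
import xml.dom

doc = xml.dom.getDOMImplementation().createDocument(None, 'xml', None)
element01 = doc.createElement('element01')
element01.setAttribute('attribute', "script><![CDATA[alert('script!');]]></script>")
doc.firstChild.appendChild(element01)
```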
A script shouldn't be allowed as an attribute value, and I feel it should raise an exception (ValueError). As described above, < and > shouldn't be allowed, since attributes are more like key-value pairs. Could someone tell me if this is right? If it is, minidom.py needs this extra level of validation.
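For attribute values, the XML spec forbids a literal `<` but allows it when written as `&lt;`. In the CPython minidom sources I have looked at, the serializer escapes `&`, `<`, `>` and `"` inside attribute values. That means a value like the one above cannot change the document structure, and it survives a write/parse round trip unchanged. This is a minimal sketch that assumes Python 3's xml.dom.minidom:

```python
from xml.dom import minidom

value = "script><![CDATA[alert('script!');]]></script>"

doc = minidom.getDOMImplementation().createDocument(None, "xml", None)
element01 = doc.createElement("element01")
element01.setAttribute("attribute", value)
doc.documentElement.appendChild(element01)

# The markup characters in the value are written as &lt; / &gt; entities.
serialized = doc.toxml()
print(serialized)

# Parsing the output back yields exactly the string that was stored.
reparsed = minidom.parseString(serialized)
roundtrip = reparsed.getElementsByTagName("element01")[0].getAttribute("attribute")
print(roundtrip == value)  # True
```

Under that behavior, rejecting such values would be a policy choice (for example, refusing script-like content) rather than something required for well-formed output. Names, unlike values, are written out without any escaping.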

History:

| Date | User | Action | Args |
|---|---|---|---|
| 2022-04-11 14:57:17 | admin | set | github: 56338 |
| 2019-04-27 11:39:42 | scoder | unlink | issue5166 dependencies |
| 2016-12-23 08:39:36 | pdeep5693 | set | nosy: + pdeep5693; messages: + msg283873 |
| 2016-01-16 00:57:27 | martin.panter | set | versions: + Python 3.5, Python 3.6; nosy: + martin.panter; messages: + msg258344; components: + XML, - Library (Lib) |
| 2016-01-16 00:44:53 | martin.panter | link | issue5166 dependencies |
| 2013-07-28 02:49:49 | jocassid | set | files: + xmlNameVerification.py; nosy: + jocassid; messages: + msg193804 |
| 2011-06-02 17:13:17 | Kyle.Keating | set | messages: + msg137488 |
| 2011-06-02 17:10:39 | Kyle.Keating | set | messages: + msg137487 |
| 2011-05-28 18:35:29 | terry.reedy | set | nosy: + terry.reedy; messages: + msg137142 |
| 2011-05-20 22:02:10 | Kyle.Keating | create | |
