PostgreSQL PL/Java – XML view of non-XML data
Because Java has a rich ecosystem of APIs and tools for XML processing, and JDBC supports those directly with the SQLXML data type, it may be useful to offer XML “views” of other PostgreSQL data types that are not XML, but are similarly tree-structured.
A preview of such a feature is included in this release, allowing values
of PostgreSQL's pg_node_tree type to be retrieved as if they were XML.
pg_node_tree
The pg_node_tree type is a representation of PostgreSQL internal data
structures, serialized to a text form found in various places in the
system catalogs: default expressions for attributes, types, or function
parameters, constraint and index expressions, trigger and policy qualifiers,
rewrite rule actions, and so on.
To make full use of the information in a pg_node_tree would require access
to all of the PostgreSQL native structure definitions used in it (which could
become feasible in a future PostgreSQL version if some
current work-in-progress is completed and released). On the other
hand, depending on need, some partial information may be usefully extracted
from a pg_node_tree using a simple syntactic transformation and standard
tools for XML querying.
For an example of the current pg_node_tree syntax, here is the
yes_or_no_check constraint in PostgreSQL 12 (on a little-endian machine):
SELECT conbin FROM pg_constraint WHERE conname = 'yes_or_no_check';
{SCALARARRAYOPEXPR :opno 98 :opfuncid 67 :useOr true :inputcollid 100
:args ({RELABELTYPE :arg {COERCETODOMAINVALUE :typeId 1043 :typeMod 7
:collation 100 :location 133} :resulttype 25 :resulttypmod -1
:resultcollid 100 :relabelformat 2 :location -1} {ARRAYCOERCEEXPR
:arg {ARRAY :array_typeid 1015 :array_collid 100 :element_typeid 1043
:elements ({CONST :consttype 1043 :consttypmod -1 :constcollid 100
:constlen -1 :constbyval false :constisnull false :location 143
:constvalue 7 [ 28 0 0 0 89 69 83 ]} {CONST :consttype 1043 :consttypmod -1
:constcollid 100 :constlen -1 :constbyval false :constisnull false :location 150
:constvalue 6 [ 24 0 0 0 78 79 ]}) :multidims false :location -1}
:elemexpr {RELABELTYPE :arg {CASETESTEXPR :typeId 1043 :typeMod -1 :collation 0}
:resulttype 25 :resulttypmod -1 :resultcollid 100 :relabelformat 2 :location -1}
:resulttype 1009 :resulttypmod -1 :resultcollid 100 :coerceformat 2 :location -1
}) :location 139}
A Java function receiving a pg_node_tree as an argument could be declared
this way:
@Function
public static void pgNodeTreeAsXML(@SQLType("pg_node_tree") SQLXML pgt)
{
...
A parameter with the Java type SQLXML would normally lead to a parameter
type of xml in the generated SQL function declaration, but here the
@SQLType annotation is used to change that, declaring a function that accepts
a pg_node_tree in SQL, but presents it to Java as the SQLXML type.
The pljava-examples jar includes just such a function, only declared to
return xml rather than void. In fact, it returns its argument untouched, so
it can be treated as XML by the surrounding query. Its full implementation is:
@Function
public static SQLXML pgNodeTreeAsXML(@SQLType("pg_node_tree") SQLXML pgt)
throws SQLException
{
return pgt;
}
Using that function (and the XQuery serialize function with the indent
option for readability, courtesy of XQuery-based XMLTABLE), the same
node tree can be viewed in a more familiar structured syntax:
SELECT
xmltable.*
FROM
pg_constraint,
LATERAL (SELECT PgNodeTreeAsXML(conbin) AS ".") AS p,
"xmltable"('serialize(., map{"indent":true()})',
passing => p, columns => '{.}') AS (indented text)
WHERE
conname = 'yes_or_no_check';
<SCALARARRAYOPEXPR>
<member name="opno">98</member>
<member name="opfuncid">67</member>
<member name="useOr">true</member>
<member name="inputcollid">100</member>
<member name="args">
<list>
<RELABELTYPE>
<member name="arg">
<COERCETODOMAINVALUE>
<member name="typeId">1043</member>
<member name="typeMod">7</member>
<member name="collation">100</member>
<member name="location">133</member>
</COERCETODOMAINVALUE>
</member>
<member name="resulttype">25</member>
<member name="resulttypmod">-1</member>
<member name="resultcollid">100</member>
<member name="relabelformat">2</member>
<member name="location">-1</member>
</RELABELTYPE>
<ARRAYCOERCEEXPR>
<member name="arg">
<ARRAY>
<member name="array_typeid">1015</member>
<member name="array_collid">100</member>
<member name="element_typeid">1043</member>
<member name="elements">
<list>
<CONST>
<member name="consttype">1043</member>
<member name="consttypmod">-1</member>
<member name="constcollid">100</member>
<member name="constlen">-1</member>
<member name="constbyval">false</member>
<member name="constisnull">false</member>
<member name="location">143</member>
<member name="constvalue" length="7">1C000000594553</member>
</CONST>
<CONST>
<member name="consttype">1043</member>
<member name="consttypmod">-1</member>
<member name="constcollid">100</member>
<member name="constlen">-1</member>
<member name="constbyval">false</member>
<member name="constisnull">false</member>
<member name="location">150</member>
<member name="constvalue" length="6">180000004E4F</member>
</CONST>
</list>
</member>
<member name="multidims">false</member>
<member name="location">-1</member>
</ARRAY>
</member>
<member name="elemexpr">
<RELABELTYPE>
<member name="arg">
<CASETESTEXPR>
<member name="typeId">1043</member>
<member name="typeMod">-1</member>
<member name="collation">0</member>
</CASETESTEXPR>
</member>
<member name="resulttype">25</member>
<member name="resulttypmod">-1</member>
<member name="resultcollid">100</member>
<member name="relabelformat">2</member>
<member name="location">-1</member>
</RELABELTYPE>
</member>
<member name="resulttype">1009</member>
<member name="resulttypmod">-1</member>
<member name="resultcollid">100</member>
<member name="coerceformat">2</member>
<member name="location">-1</member>
</ARRAYCOERCEEXPR>
</list>
</member>
<member name="location">139</member>
</SCALARARRAYOPEXPR>
Although exact interpretation of all that isn't possible without heavy reference
to the PostgreSQL source, it can be eyeballed for a decent idea of what is
going on, and simple queries could extract useful information for some
purposes. For example, this simple XPath would return the Oids of all types
that are used in constants within the expression:
number(//CONST/member[@name = 'consttype'])
Some details of the mapping
- A
<list>either has no attribute, and children that are, recursively,pg_node_treestructures, or it has anallattribute with valueint,oid, orbitand its children all are<v>elements with numeric content representing integers,Oids, or bit numbers in a bit set, respectively. - A
<CONST>representing a typed SQLNULLwill have aconstvaluemember with nolengthattribute and no content. Otherwise, theconstvaluemember will have content of typexs:hexBinaryand alengthattribute indicating how many octets of the binary content are used. For types withconstbyvaltrue, the hex content will always be the full width of aDatum, though thelengthmay be smaller. For types withconstbyvalfalse, thelengthattribute matches the length of the binary content. - A
<CONST>with aconstlenof-1represents a type with avarlenarepresentation, as described under Database Physical Storage. Theconstvaluein such a case is the entirevarlena, including its header.
The two <CONST> elements in the example above have type 1043
(CHARACTER VARYING) and varlena representations, so the constvalue members
consist of a four-octet header followed by the three ASCII characters YES or
the two characters NO, respectively. The one-octet length difference changes
the varlena header value by four (from 18 to 1C) because the two
lowest-order bits of the header (on little-endian hardware) are usurped for
TOAST.
It is possible that a future PostgreSQL version will change the current idiosyncratic syntax, or serialize to JSON instead.
Limits of the current XML view implementation
Implementation of XML views is work in progress. The current implementation has these limitations:
- It is read-only. There is no provision yet for writing an XML-viewable type
by returning
SQLXMLfrom a Java function, or passingSQLXMLto aResultSet,PreparedStatement, orSQLOutput. - A fully-compliant readable
SQLXMLimplementation should supportgetBinaryStream,getCharacterStream,getString, andgetSourcewith any of the four must-support subtypes ofSource. The current XML-view implementation will support onlygetSource(SAXSource.class)orgetSource(null)(which will return aSAXSource). All other cases will throw anSQLFeatureNotSupportedException.