Vor kurzem hat mich ein Kollege um Hilfe beim Parsen eines JSON-Dokumentes gebeten. Dafür gibt es zwar schon einige Beispiele mit dem Package APEX_JSON oder den nativen SQL/JSON Funktionen in Oracle12c. Der Fall lag hier aber etwas anders.
Es geht darum, ein JSON-Dokument zu parsen, dessen Struktur man gar nicht kennt. Egal, ob man das Dokument mit APEX_JSON oder mit den nativen SQL/JSON-Funktionen in Oracle12c parsen möchte - stets muss man wissen, an welchen Attributen man in welchen Stellen des JSON-Dokumentes interessiert ist. Möchte man das JSON jedoch einfach nur "erforschen", so kommt man mit diesen beiden Ansätzen nicht weiter.
Angenommen, es geht darum, das folgende JSON zu parsen und Informationen über alle Knoten zu erhalten:
{ "string": "value", "number": 0, "boolean": true, "array": [ 1,2,3 ], "object": { "string": "value", "array": [ { "number": 0, "string": "value" }, { "number": 1, "string": "value" } ], "object": { "array": [ "v1", "v2", "v3" ] } } }
Man siegt, eine Menge geschachtelter Strukturen. Eine Idee ist es, den in diesem Blog Posting erwähnten Kniff mit APEX_JSON.TO_XMLTYPE zu nutzen: Damit wandelt man das JSON in ein XML-Dokument um ...
select xmlserialize( content apex_json.to_xmltype( '{ "string": "value", "number": 0, "boolean": true, "array": [ 1,2,3 ], "object": { "string": "value", "array": [ { "number": 0, "string": "value" }, { "number": 1, "string": "value" } ], "object": { "array": [ "v1", "v2", "v3" ] } } }' ) indent ) from dual; <?xml version="1.0" encoding="UTF-8"?> <json> <string>value</string> <number>0</number> <boolean>true</boolean> <array> <row>1</row> <row>2</row> <row>3</row> </array> :
... und die SQL/XML-Funktionen bieten einige Möglichkeiten an, die Struktur zu erforschen ...
with xml_from_json as ( select apex_json.to_xmltype( '{ "string": "value", "number": 0, "boolean": true, "array": [ 1,2,3 ], "object": { "string": "value", "array": [ { "number": 0, "string": "value" }, { "number": 1, "string": "value" } ], "object": { "array": [ "v1", "v2", "v3" ] } } }' ) xml from dual ) select tagname, parentname, count(*) occurrences from xml_from_json, xmltable( 'for $x in //* return <r><t>{$x/name()}</t><p>{$x/parent::*/name()}</p></r>' passing xml columns tagname varchar2(15) path '/r/t', parentname varchar2(15) path '/r/p' ) group by tagname, parentname TAGNAME PARENTNAME OCCURRENCES -------- ----------- ------------ object object 1 string object 1 string json 1 object json 1 number row 2 json 1 row array 8 : : :
Unter Umständen kommt man hiermit schon sehr weit - eine Kleinigkeit ist allerdings beim Umwandeln nach XML verlorengegangen; und das sind die Datentypen der JSON-Attribute. JSON kennt hier immerhin numerische, boolean und String-Datentypen - im XML-Dokument ist alles zum String geworden. Möchte man diese Information behalten, so bleibt noch der programmatische Weg mit dem Paket APEX_JSON. Der nun folgende PL/SQL Code implementiert einen "JSON-Explorer" als Table-Function; man gibt ein JSON-Dokument hinein und bekommt eine Liste mit enthaltenen Elementen, Datentypen und Positionen in der Hierarchie zurück.
drop type json_element_ct / drop type json_element_t / create type json_element_t as object ( attr_level number, attr_name varchar2(255), attr_type varchar2(30), attr_value varchar2(255), attr_path varchar2(255), parent_attr_path varchar2(255) ) / create type json_element_ct as table of json_element_t / create or replace function parse_json( p_json in clob ) return json_element_ct is l_parsed_json apex_json.t_values; l_root_children wwv_flow_t_varchar2; l_root_node apex_json.t_value; l_p0 number := 0; l_p1 number := 0; l_p2 number := 0; l_p3 number := 0; l_p4 number := 0; l_result json_element_ct := json_element_ct(); function replace_path( p_path in varchar2 ) return varchar2 is begin return regexp_replace( regexp_replace( regexp_replace( regexp_replace( regexp_replace( p_path, '%d', l_p0, 1, 1 ), '%d', l_p1, 1, 1 ), '%d', l_p2, 1, 1 ), '%d', l_p3, 1, 1 ), '%d', l_p4, 1, 1 ); end replace_path; procedure evaluate_node( p_prefix in varchar2, p_node in apex_json.t_value, p_node_name in varchar2, p_parent_name in varchar2 default null, p_level in number default 0, p_arraylevel in number default 1 ) is l_node apex_json.t_value; l_node_children wwv_flow_t_varchar2; l_type varchar2(30); l_value varchar2(255); begin if p_node.kind = 2 then l_type := 'boolean'; l_value := 'true'; elsif p_node.kind = 3 then l_type := 'boolean'; l_value := 'false'; elsif p_node.kind = 4 then l_type := 'number'; l_value := to_char( p_node.number_value ); elsif p_node.kind = 5 then l_type := 'varchar2'; l_value := p_node.varchar2_value; elsif p_node.kind = 7 then l_type := 'array'; l_value := null; l_result.extend(1); l_result( l_result.count ) := json_element_t ( p_level, nvl( p_node_name, '<no name>'), l_type, l_value, replace_path( p_prefix || p_node_name ), replace_path( p_parent_name ) ); if p_arraylevel = 1 then l_p0 := 0; end if; if p_arraylevel = 2 then l_p1 := 0; end if; if p_arraylevel = 3 then l_p2 := 0; end if; if p_arraylevel = 4 then l_p3 := 0; end if; if p_arraylevel = 5 then l_p4 := 0; end if; for j in 1 .. p_node.number_value loop if p_arraylevel = 1 then l_p0 := l_p0 + 1; end if; if p_arraylevel = 2 then l_p1 := l_p1 + 1; end if; if p_arraylevel = 3 then l_p2 := l_p2 + 1; end if; if p_arraylevel = 4 then l_p3 := l_p3 + 1; end if; if p_arraylevel = 5 then l_p4 := l_p4 + 1; end if; l_node := apex_json.get_value( p_values => l_parsed_json, p_path => p_prefix || p_node_name|| '[%d]', p0 => l_p0, p1 => l_p1, p2 => l_p2, p3 => l_p3, p4 => l_p4 ); evaluate_node( p_prefix => p_prefix || p_node_name || '[%d]', p_node => l_node, p_node_name => null, p_parent_name => p_prefix || p_node_name, p_level => p_level + 1, p_arraylevel => p_arraylevel + 1 ); end loop; elsif p_node.kind = 6 then l_type := 'object'; l_value := null; l_result.extend(1); l_result( l_result.count ) := json_element_t ( p_level, nvl( p_node_name, '<no name>'), l_type, l_value, replace_path( p_prefix || p_node_name ), replace_path( p_parent_name ) ); l_node_children := apex_json.get_members( p_values => l_parsed_json, p_path => p_prefix || p_node_name, p0 => l_p0, p1 => l_p1, p2 => l_p2, p3 => l_p3, p4 => l_p4 ); if l_node_children is not null then for i in 1 .. l_node_children.count loop l_node := apex_json.get_value( p_values => l_parsed_json, p_path => p_prefix || p_node_name || '.' || l_node_children( i ), p0 => l_p0, p1 => l_p1, p2 => l_p2, p3 => l_p3, p4 => l_p4 ); evaluate_node( p_prefix => p_prefix || p_node_name || '.', p_node => l_node, p_node_name => l_node_children( i ), p_parent_name => p_prefix || p_node_name, p_level => p_level + 1, p_arraylevel => p_arraylevel ); end loop; end if; end if; if p_node.kind in ( 2, 3, 4, 5 ) then l_result.extend( 1 ); l_result( l_result.count ) := json_element_t ( p_level, nvl( p_node_name, '<no name>'), l_type, l_value, replace_path( p_prefix || p_node_name ), replace_path( p_parent_name ) ); end if; end; begin apex_json.parse( l_parsed_json, p_json ); l_root_children := apex_json.get_members( p_values => l_parsed_json, p_path => '.' ); l_result.extend( 1 ); l_result( l_result.count ) := json_element_t ( null, '<root>', 'object', null, '<root>', null ); if l_root_children is not null then for i in 1 .. l_root_children.count loop l_root_node := apex_json.get_value( p_values => l_parsed_json, p_path => l_root_children( i ) ); evaluate_node( p_prefix => null, p_node => l_root_node, p_node_name => l_root_children( i ), p_parent_name => '<root>' ); end loop; end if; return l_result; end; / sho err
Ausprobieren ist dann einfach ...
select * from table(parse_json('{ "string": "value", "number": 0, "boolean": true, "array": [ 1,2,3 ], "object": { "string": "value", "array": [ { "number": 0, "string": "value" }, { "number": 1, "string": "value" } ], "object": { "array": [ "v1", "v2", "v3" ] } } } ' )); ATTR_LEVEL ATTR_NAME ATTR_TYPE ATTR_VALUE ATTR_PATH PARENT_ATTR_PATH ----------- ---------- ---------- ----------- ----------------------- ------------------- <root> object <root> 0 string varchar2 value string <root> 0 number number 0 number <root> 0 boolean boolean true boolean <root> 0 array array array <root> 1 <no name> number 1 array[1] array 1 <no name> number 2 array[2] array 1 <no name> number 3 array[3] array 0 object object object <root> 1 string varchar2 value object.string object 1 array array object.array object 2 <no name> object object.array[1] object.array 3 number number 0 object.array[1].number object.array[1] 3 string varchar2 value object.array[1].string object.array[1] : : : : : :
Wer mag, kann das ganze mit hierarischem SQL ein wenig aufbereiten ...
select lpad(' ', level * 4 - 1, ' ') || attr_name, attr_type, attr_value, attr_path from data start with parent_attr_path is null connect by parent_attr_path = prior attr_path / ATTR_NAME ATTR_TYPE ATTR_VALUE ATTR_PATH ------------------------------ ---------- ----------- ------------------------ <root> object <root> array array array <no name> number 1 array[1] <no name> number 2 array[2] <no name> number 3 array[3] boolean boolean true boolean number number 0 number object object object array array object.array <no name> object object.array[1] number number 0 object.array[1].number : : : :
Fertig; viel Spaß beim Ausprobieren. Die Funktion arbeitet sich durch die ganze Hierarchie; allerdings gilt für Arrays, dass die Schachtelungstiefe maximal fünf sein darf. In das sechste, geschachtelte (!) Array kann die Funktion nicht einsteigen, da APEX_JSON hier auf fünf Ebenen begrenzt ist. Für die meisten Anforderungen sollte das aber genug sein. Außerdem habe ich den Fall, dass das JSON-Dokument bereits auf Root-Ebene ein Array ist, nicht berücksichtigt; es muss ein Objekt sein (das erste Zeichen im JSON muss die geöffnete, geschweifte Klammer "{" sein).
Recently, I helped colleague in how to parse a JSON document in PL/SQL - that was a very interesting case, which I did not blog about so far. I already have blog postings about parsing JSON with SQL or PL/SQL: JSON parsing with the APEX_JSON package or JSON parsing using the Oracle12c native SQL functions. But this case was different.
The requirement was to explore a JSON document - it's not about looking up a very specific value; we want to know which attributes and values are in the document. Assumed, we have the following JSON document and we want to explore it programmatically - what's in the document?
{ "string": "value", "number": 0, "boolean": true, "array": [ 1,2,3 ], "object": { "string": "value", "array": [ { "number": 0, "string": "value" }, { "number": 1, "string": "value" } ], "object": { "array": [ "v1", "v2", "v3" ] } } }
We see some nested structures, objects and arrays. One approach would be to use the APEX_JSON.TO_XMLTYPE function as described in the JSON parsing with APEX_JSON blog posting. With that function, we can convert the JSON document to an XMLTYPE ...
select xmlserialize( content apex_json.to_xmltype( '{ "string": "value", "number": 0, "boolean": true, "array": [ 1,2,3 ], "object": { "string": "value", "array": [ { "number": 0, "string": "value" }, { "number": 1, "string": "value" } ], "object": { "array": [ "v1", "v2", "v3" ] } } }' ) indent ) from dual; <?xml version="1.0" encoding="UTF-8"?> <json> <string>value</string> <number>0</number> <boolean>true</boolean> <array> <row>1</row> <row>2</row> <row>3</row> </array> :
... which allows us to use all the XML DB and SQL/XML functions on it.
with xml_from_json as ( select apex_json.to_xmltype( '{ "string": "value", "number": 0, "boolean": true, "array": [ 1,2,3 ], "object": { "string": "value", "array": [ { "number": 0, "string": "value" }, { "number": 1, "string": "value" } ], "object": { "array": [ "v1", "v2", "v3" ] } } }' ) xml from dual ) select tagname, parentname, count(*) occurrences from xml_from_json, xmltable( 'for $x in //* return <r><t>{$x/name()}</t><p>{$x/parent::*/name()}</p></r>' passing xml columns tagname varchar2(15) path '/r/t', parentname varchar2(15) path '/r/p' ) group by tagname, parentname TAGNAME PARENTNAME OCCURRENCES -------- ----------- ------------ object object 1 string object 1 string json 1 object json 1 number row 2 json 1 row array 8 : : :
That is a nice result; we can see the attribute names, parent names and (if we adjust the SQL query) also the attribute values. But the data types of the JSON attributes are lost: JSON attributes are either numeric, boolean or strings - with the conversion to XMLTYPE they all became strings - and we cannot get that information any more.
The APEX_JSON package provides procedures and functions to explore an unknown JSON document. The following code implements a PL/SQL table function which acts as a "JSON explorer": It parses a JSON and returns all found nodes and attributes together with some context information. Here is the code:
drop type json_element_ct / drop type json_element_t / create type json_element_t as object ( attr_level number, attr_name varchar2(255), attr_type varchar2(30), attr_value varchar2(255), attr_path varchar2(255), parent_attr_path varchar2(255) ) / create type json_element_ct as table of json_element_t / create or replace function parse_json( p_json in clob ) return json_element_ct is l_parsed_json apex_json.t_values; l_root_children wwv_flow_t_varchar2; l_root_node apex_json.t_value; l_p0 number := 0; l_p1 number := 0; l_p2 number := 0; l_p3 number := 0; l_p4 number := 0; l_result json_element_ct := json_element_ct(); function replace_path( p_path in varchar2 ) return varchar2 is begin return regexp_replace( regexp_replace( regexp_replace( regexp_replace( regexp_replace( p_path, '%d', l_p0, 1, 1 ), '%d', l_p1, 1, 1 ), '%d', l_p2, 1, 1 ), '%d', l_p3, 1, 1 ), '%d', l_p4, 1, 1 ); end replace_path; procedure evaluate_node( p_prefix in varchar2, p_node in apex_json.t_value, p_node_name in varchar2, p_parent_name in varchar2 default null, p_level in number default 0, p_arraylevel in number default 1 ) is l_node apex_json.t_value; l_node_children wwv_flow_t_varchar2; l_type varchar2(30); l_value varchar2(255); begin if p_node.kind = 2 then l_type := 'boolean'; l_value := 'true'; elsif p_node.kind = 3 then l_type := 'boolean'; l_value := 'false'; elsif p_node.kind = 4 then l_type := 'number'; l_value := to_char( p_node.number_value ); elsif p_node.kind = 5 then l_type := 'varchar2'; l_value := p_node.varchar2_value; elsif p_node.kind = 7 then l_type := 'array'; l_value := null; l_result.extend(1); l_result( l_result.count ) := json_element_t ( p_level, nvl( p_node_name, '<no name>'), l_type, l_value, replace_path( p_prefix || p_node_name ), replace_path( p_parent_name ) ); if p_arraylevel = 1 then l_p0 := 0; end if; if p_arraylevel = 2 then l_p1 := 0; end if; if p_arraylevel = 3 then l_p2 := 0; end if; if p_arraylevel = 4 then l_p3 := 0; end if; if p_arraylevel = 5 then l_p4 := 0; end if; for j in 1 .. p_node.number_value loop if p_arraylevel = 1 then l_p0 := l_p0 + 1; end if; if p_arraylevel = 2 then l_p1 := l_p1 + 1; end if; if p_arraylevel = 3 then l_p2 := l_p2 + 1; end if; if p_arraylevel = 4 then l_p3 := l_p3 + 1; end if; if p_arraylevel = 5 then l_p4 := l_p4 + 1; end if; l_node := apex_json.get_value( p_values => l_parsed_json, p_path => p_prefix || p_node_name|| '[%d]', p0 => l_p0, p1 => l_p1, p2 => l_p2, p3 => l_p3, p4 => l_p4 ); evaluate_node( p_prefix => p_prefix || p_node_name || '[%d]', p_node => l_node, p_node_name => null, p_parent_name => p_prefix || p_node_name, p_level => p_level + 1, p_arraylevel => p_arraylevel + 1 ); end loop; elsif p_node.kind = 6 then l_type := 'object'; l_value := null; l_result.extend(1); l_result( l_result.count ) := json_element_t ( p_level, nvl( p_node_name, '<no name>'), l_type, l_value, replace_path( p_prefix || p_node_name ), replace_path( p_parent_name ) ); l_node_children := apex_json.get_members( p_values => l_parsed_json, p_path => p_prefix || p_node_name, p0 => l_p0, p1 => l_p1, p2 => l_p2, p3 => l_p3, p4 => l_p4 ); if l_node_children is not null then for i in 1 .. l_node_children.count loop l_node := apex_json.get_value( p_values => l_parsed_json, p_path => p_prefix || p_node_name || '.' || l_node_children( i ), p0 => l_p0, p1 => l_p1, p2 => l_p2, p3 => l_p3, p4 => l_p4 ); evaluate_node( p_prefix => p_prefix || p_node_name || '.', p_node => l_node, p_node_name => l_node_children( i ), p_parent_name => p_prefix || p_node_name, p_level => p_level + 1, p_arraylevel => p_arraylevel ); end loop; end if; end if; if p_node.kind in ( 2, 3, 4, 5 ) then l_result.extend( 1 ); l_result( l_result.count ) := json_element_t ( p_level, nvl( p_node_name, '<no name>'), l_type, l_value, replace_path( p_prefix || p_node_name ), replace_path( p_parent_name ) ); end if; end; begin apex_json.parse( l_parsed_json, p_json ); l_root_children := apex_json.get_members( p_values => l_parsed_json, p_path => '.' ); l_result.extend( 1 ); l_result( l_result.count ) := json_element_t ( null, '<root>', 'object', null, '<root>', null ); if l_root_children is not null then for i in 1 .. l_root_children.count loop l_root_node := apex_json.get_value( p_values => l_parsed_json, p_path => l_root_children( i ) ); evaluate_node( p_prefix => null, p_node => l_root_node, p_node_name => l_root_children( i ), p_parent_name => '<root>' ); end loop; end if; return l_result; end; / sho err
We can test it easily ...
select * from table(parse_json('{ "string": "value", "number": 0, "boolean": true, "array": [ 1,2,3 ], "object": { "string": "value", "array": [ { "number": 0, "string": "value" }, { "number": 1, "string": "value" } ], "object": { "array": [ "v1", "v2", "v3" ] } } } ' )); ATTR_LEVEL ATTR_NAME ATTR_TYPE ATTR_VALUE ATTR_PATH PARENT_ATTR_PATH ----------- ---------- ---------- ----------- ----------------------- ------------------- <root> object <root> 0 string varchar2 value string <root> 0 number number 0 number <root> 0 boolean boolean true boolean <root> 0 array array array <root> 1 <no name> number 1 array[1] array 1 <no name> number 2 array[2] array 1 <no name> number 3 array[3] array 0 object object object <root> 1 string varchar2 value object.string object 1 array array object.array object 2 <no name> object object.array[1] object.array 3 number number 0 object.array[1].number object.array[1] 3 string varchar2 value object.array[1].string object.array[1] : : : : : :
And with some hierarchical SQL, we can pretty-print the JSON hierarchy.
select lpad(' ', level * 4 - 1, ' ') || attr_name, attr_type, attr_value, attr_path from data start with parent_attr_path is null connect by parent_attr_path = prior attr_path / ATTR_NAME ATTR_TYPE ATTR_VALUE ATTR_PATH ------------------------------ ---------- ----------- ------------------------ <root> object <root> array array array <no name> number 1 array[1] <no name> number 2 array[2] <no name> number 3 array[3] boolean boolean true boolean number number 0 number object object object array array object.array <no name> object object.array[1] number number 0 object.array[1].number : : : :
That's it - have fun trying this out. Basically the function walks through the complete JSON hierarchy, but due to a restriction in APEX_JSON, it can only access up to five nested(!) arrays. Also I did not take care for the case that the JSON document is an array at the root level (starting with a "["); it only works for JSON documents which are objects at the root level (starting with a "{").