GitHub - jkpr/OdkGraph: Conceptualize an XlsForm as a graph

ODK graph

Description

Analyze your XlsForms as directed graphs. Survey elements, such as select_one ..., calculate, or note become nodes in such a graph. In other words, the nodes of the graph are the individual XlsForm questions (rows in an XlsForm). The edges are dependencies on other questions. If question B depends on question A being answered a specific way (e.g. through ${...} in the relevant column), then an edge points from A to B. A dependency could also be when a label in question D displays the value of survey element C. Here an edge points from C to D.

Installation

All package dependencies, networkx and xlrd, are on PyPI. To install, a single pip call on the command line suffices:

python3 -m pip install https://github.com/jkpr/OdkGraph/zipball/master

Usage

First, make sure the ODK Xlsform converts cleanly to XML.

Import the OdkGraph class with

from odkgraph import OdkGraph

Next, create an OdkGraph object. The __init__ method accepts a path to the file:

odk_graph = OdkGraph('/path/to/odk/xlsform.xlsx')

Access Xlsform nodes

Access nodes through a variety of ways

odk_graph['age']        # Get the ODK survey element (node) named 'age'
odk_graph[0]            # Zero-indexed node access. This example returns the first node
odk_graph.excel_row(2)  # Return the ODK survey element from row 2 in the Excel file

Slicing is also supported.

Learn stuff

Some useful things this code does now that we have an OdkGraph object:

odk_graph.number_edges()            # The number of edges (dependencies)
odk_graph.number_nodes()            # The number of nodes (survey elements)
odk_graph.forward_dependencies()    # The ODK elements that depend on things that are defined after them in the Xlsform
odk_graph.terminal_nodes()          # The ODK elements that depend on other elements, but nothing depends on them
odk_graph.isolates()                # The ODK elements that depend on nothing else, and nothing depends on them
odk_graph.simple_cycles()           # A list of cyclical dependencies

With node(s) in hand, we can do

age = odk_graph['age']
odk_graph.predecessors(age)             # All nodes that 'age' directly depends on
odk_graph.successors(age)               # All nodes that directly depend on 'age'
odk_graph.all_dependencies_of([age])    # All nodes that 'age directly or indirectly depends on
odk_graph.all_nodes_dependent_on([age]) # All nodes that directly or indirectly depend on 'age'

The underlying networkx network (documentation here) can be accessed with

See all methods and attributes on OdkGraph and their docstrings with

or by reading the source code.

Bugs

Submit bug reports to James K. Pringle at jpringleBEAR@jhu.edu minus the bear.