Defines the core metadata model for iSamples.
src/schemas/isamples_core.yaml defines the iSamples core model in LinkML. It references vocabularies contained in isamplesorg/vocabularies/vocabulary which define terms for the Material Type, Sampled Feature, and Material Sample Object Type vocabularies.
Documentation is available at https://isamplesorg.github.io/metadata/
Repository Structure
metadata/
├── src/
│   └── schemas/            # LinkML schema definitions
│       └── isamples_core.yaml
├── examples/               # Example metadata documents from different systems
│   ├── APItesting/
│   ├── GEOME/
│   ├── geoJSON/
│   ├── OpenContext/
│   ├── SESAR/
│   └── smithonsonian/
├── tools/                  # Modified docgen tool and templates for Quarto
├── quarto/                 # Quarto configuration files
├── build/                  # Build output (intermediate docs)
│   └── docs/               # Generated markdown documentation
└── notes/                  # Development notes
Development
LinkML and its associated tools require a Python environment (version 3.9 or newer), and this project uses Poetry for dependency management. Poetry can be installed with pip install poetry.
To work on project contents and run artifact generators, first grab the source and switch to the develop branch:
git clone https://github.com/isamplesorg/metadata.git
cd metadata
git checkout develop
git pull
Set up a virtual environment using Poetry:
poetry shell
poetry install
(To exit the Poetry shell, use exit.)
Artifacts are produced by running make or make all.
Documentation Generation
Documentation is rendered with Quarto rather than the default mkdocs or Sphinx (Quarto offers many additional features for including computed examples). To generate the documentation:
- Install Quarto >= 1.2
- Run make, make all, or make gen-docs
This will generate markdown intermediate files in the build/docs folder, then invoke quarto render to generate HTML documentation.
Note that this project uses a modified version of the LinkML docgen tool and templates to render markdown for Quarto. The modified docgen and templates are located in the tools/ folder.
LinkML Schema Operations
Convert YAML schema to JSON schema
gen-json-schema -t MaterialSampleRecord --not-closed src/schemas/isamples_core.yaml > isamples_core.schema.json
The -t MaterialSampleRecord option makes the "MaterialSampleRecord" class the top-level class in the JSON schema.
Generate JSON-LD context
gen-jsonld-context src/schemas/isamples_core.yaml > isamples_core.jsonld
After generating the JSON-LD context, the enumeration part may need manual modification. For each enumeration, use @type to declare the enumeration type.
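The manual edit described above can also be scripted. The following is a minimal sketch, not a tool shipped with this repository: the function name declare_enum_types is hypothetical, the slot-to-type mapping is copied from the example context in this README, and the code assumes the generated file holds a single JSON object under "@context".

```python
import json

# Slot-to-enumeration mapping taken from the example context in this README.
# Assumption: adjust it to match the enumerations in your generated context.
ENUM_SLOT_TYPES = {
    "hasContextCategory": "contextcategory",
    "hasMaterialCategory": "materialtype",
    "has_sample_object_type": "specimencategory",
}


def declare_enum_types(context_doc, enum_slot_types=ENUM_SLOT_TYPES):
    """Return a copy of context_doc with "@type" set on each listed slot."""
    patched = json.loads(json.dumps(context_doc))  # simple deep copy
    ctx = patched.setdefault("@context", {})
    for slot, enum_type in enum_slot_types.items():
        entry = ctx.get(slot)
        if isinstance(entry, str):
            # A bare IRI term definition becomes an expanded definition.
            entry = {"@id": entry}
        elif not isinstance(entry, dict):
            entry = {}
        entry["@type"] = enum_type
        ctx[slot] = entry
    return patched
```

To apply it, load isamples_core.jsonld with json.load, pass the result through declare_enum_types, and write it back with json.dump. Review the output by hand, since the generator's output may differ between LinkML versions.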
Example modified JSON-LD context
{
  "@context": {
    "dct": "http://purl.org/dc/terms/",
    "isam": "http://resource.isamples.org/schema/",
    "mat": "http://resource.isamples.org/vocabulary/material/",
    "sf": "http://resource.isamples.org/vocabulary/sampledFeature/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "spt": "http://resource.isamples.org/vocabulary/sampleobjecttype/",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "@vocab": "http://resource.isamples.org/schema/",
    "hasContextCategory": {
      "@type": "contextcategory"
    },
    "hasMaterialCategory": {
      "@type": "materialtype"
    },
    "has_sample_object_type": {
      "@type": "specimencategory"
    },
    "id": "@id",
    "latitude": {
      "@type": "xsd:decimal"
    },
    "longitude": {
      "@type": "xsd:decimal"
    },
    "resultTime": {
      "@type": "xsd:date"
    }
  }
}
Validate instance files
linkml-validate -s src/schemas/isamples_core.yaml instance.json
jsonschema -i instance.json isamples_core.schema.json
The first command validates an instance file against the YAML schema. The second command validates against the JSON schema.
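For scripted checks, the two commands above can be wrapped in a small Python helper. This is a minimal sketch rather than part of the repository: the function names are hypothetical, and it assumes both tools are installed on PATH and report success with exit code 0.

```python
import subprocess


def validation_commands(instance,
                        yaml_schema="src/schemas/isamples_core.yaml",
                        json_schema="isamples_core.schema.json"):
    """Build the two validation command lines shown in this README."""
    return [
        ["linkml-validate", "-s", yaml_schema, instance],
        ["jsonschema", "-i", instance, json_schema],
    ]


def run_validators(instance):
    """Run both validators; map each tool name to its exit code.

    A tool that is not installed is recorded as None.
    Assumption: exit code 0 means the instance validated.
    """
    results = {}
    for cmd in validation_commands(instance):
        try:
            completed = subprocess.run(cmd, capture_output=True, text=True)
            results[cmd[0]] = completed.returncode
        except FileNotFoundError:
            results[cmd[0]] = None
    return results
```

Calling run_validators("instance.json") from the repository root returns a dictionary such as one entry per tool, which a CI script could inspect to decide whether to fail the build.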
Docker
The iSamples Metadata Docker container is based on the Docker container from the LinkML project (https://hub.docker.com/r/monarchinitiative/linkml/tags).
Build the image:
docker build -t isamples_linkml .
Run the container (opens a bash shell with the repository mounted at /work):
docker run -a stdin -a stdout -i -t -v `pwd`:/work isamples_linkml
Related iSamples Repositories
| Repo | Purpose | Start Here |
|---|---|---|
| isamplesorg-metadata | Schema definition | src/schemas/isamples_core.yaml |
| isamples-python | Jupyter examples | examples/basic/isamples_explorer.ipynb |
| isamplesorg.github.io | Browser tutorials | tutorials/isamples_explorer.qmd |
| vocabularies | SKOS terms | Material types, context categories |
Sample metadata repository (parquet format)
Wide format (primary) - 280MB, 20M rows
WIDE_URL = "https://pub-a18234d962364c22a50c787b7ca09fa5.r2.dev/isamples_202601_wide.parquet"
Narrow format (advanced) - 850MB, 106M rows
NARROW_URL = "https://pub-a18234d962364c22a50c787b7ca09fa5.r2.dev/isamples_202512_narrow.parquet"
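One way to explore these files without downloading them in full is to query them over HTTPS with DuckDB. The sketch below is an assumption about tooling, not a documented iSamples workflow: it requires the third-party duckdb package (pip install duckdb), the helper names are hypothetical, and it deliberately avoids naming any columns because the file schemas are not described here.

```python
WIDE_URL = "https://pub-a18234d962364c22a50c787b7ca09fa5.r2.dev/isamples_202601_wide.parquet"


def row_count_query(url):
    """SQL that counts the rows of a remote Parquet file."""
    return f"SELECT COUNT(*) AS n FROM read_parquet('{url}')"


def describe_query(url):
    """SQL that lists column names and types without reading all rows."""
    return f"DESCRIBE SELECT * FROM read_parquet('{url}')"


def run_query(sql):
    """Execute sql in an in-memory DuckDB database and return all rows."""
    import duckdb  # third-party dependency, imported lazily (assumption)

    con = duckdb.connect()
    # httpfs enables reading over HTTPS; recent DuckDB versions may load it
    # automatically, so these two statements are a precaution.
    con.execute("INSTALL httpfs")
    con.execute("LOAD httpfs")
    return con.execute(sql).fetchall()
```

For example, run_query(describe_query(WIDE_URL)) should return one row per column, and run_query(row_count_query(WIDE_URL)) a single row holding the row count. Both need network access.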