Welcome to the GitHub project web pages for
IUPAC Project 2019-031-1-024, Development of a Standard for FAIR Data Management for Spectroscopic Data.
At this site we highlight our development of IUPAC FAIRSpec Finding Aids.
See
publications for published and submitted publications.
But first, here are 352 pages retrieved from multiple points on the Wayback Machine,
preserving the CDX/CDXML specification. (Note that some images were not retrievable.)
This specification is important, as it is the basis for one of the most
widely used and accessible formats in the chemistry community for
communicating structural information. The specification is detailed and
extensive, and has been implemented in Jmol, allowing an extractor
to use it as a basis for creating value-added structure representations such as
molecular formulas, InChI strings, SMILES, and MOL-format descriptions,
which can be used both for display and validation. The pages are also available
as a
ZIP file.
NOTE
It is important to note that, while the first two demonstrations, from 2021 and 2023, use their own specialized landing pages,
all of the other demonstrations use identical HTML for their landing page. Only the finding aids are different.
DEMOS
Our first demonstration of a very early prototypical finding aid.
It was generated from a set of supporting information data sets from the
ACS FAIRData pilot study (2020).
This demo just gave an idea of what a minimal extraction of metadata might look like in JSON format.
A more sophisticated demonstration allowing for substructure and text searching, as well as the creation of predicted spectra based on SMILES strings.
This demonstration illustrates how supporting information ZIP files in a variety of formats can be
extracted for metadata.
The page (the link above) provides substructure searching by
utilizing about 400 Java classes transpiled into JavaScript using the Eclipse-based
java2script transpiler,
running the JavaScript SwingJS implementation of Java 8. These JavaScript (né Java) classes provide substructure searching for the page using SMILES strings created from MOL, CDX, and CDXML files associated with the spectra in the collection.
Input uses a hybrid JME (
Java Molecular Editor)/OCL (
OpenChemLib) structure-drawing and analysis interface;
matching is carried
out using a small Jmol-derived SMILES processing package, also transpiled to JavaScript from Java.
The format of the finding aids in this demo is an early (now deprecated) alpha version of the specification, version 0.0.5.
This demonstration features the addition of an IFD_METADATA file within a Bruker dataset to automatically
generate IFDSample objects and display spectra by sample as well as by compound.
The data are from a summer organic chemistry lab at St. Olaf College.
Undergraduate students were assigned the task of determining the structure of an unknown compound.
They used the St. Olaf Bruker Avance 400 NMR instrument, with pre-assigned slots in a 120-position BACS autosampler.
The interface was the remote-access web-based
OleNMR system, which interfaces
directly with IconNMR.
The system required entry of a sample ID.
Students were also encouraged (later in the semester) to provide a proposed structure, drawn
using JSME, also within the OleNMR interface. Sample ID and structure (as "structure.mol") were
automatically added to the primary Bruker dataset directory by OleNMR.
The IFD_METADATA file in this case is just a single line, for example:
sample_id=A5-Ex.6A-230613
Metadata extraction and construction
of an IUPAC FAIRSpec Data Collection and associated IUPAC FAIRSpec Finding Aid were carried out
based on a simple configuration file (
IFD_extract.json)
that indicated, among other things,
the source of the originating sample identifiers:
{"FAIRSpec.extractor.related_metadata" : "IFD_METADATA"},
{"FAIRSpec.extractor.related_metadata_map" : {"sample_id":"IFD.property.dataobject.originating_sample_id"}},
Thus, these two files, IFD_METADATA and structure.mol, provided all the additional bits of metadata
needed for the extractor program (
ExtractorTestSTO.jar)
to create the
IUPAC FAIRSpec Data Collection,
IUPAC FAIRSpec Finding Aid, and
HTML landing page for this demonstration.
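The mapping step described above can be sketched in a few lines of Python. This is only an illustration, not the extractor's actual code: the function names parse_ifd_metadata and apply_metadata_map are hypothetical, and the sketch assumes that IFD_METADATA holds simple key=value lines like the example shown.

```python
def parse_ifd_metadata(text):
    """Parse key=value lines (as in the IFD_METADATA example) into a dict."""
    pairs = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue  # skip blank or malformed lines
        key, value = line.split("=", 1)
        pairs[key.strip()] = value.strip()
    return pairs


def apply_metadata_map(pairs, metadata_map):
    """Rename local keys to the property names given in the map (hypothetical helper)."""
    return {metadata_map[k]: v for k, v in pairs.items() if k in metadata_map}


# The map mirrors the "related_metadata_map" entry in IFD_extract.json.
metadata_map = {"sample_id": "IFD.property.dataobject.originating_sample_id"}
properties = apply_metadata_map(
    parse_ifd_metadata("sample_id=A5-Ex.6A-230613\n"), metadata_map
)
print(properties)
```

Keys that do not appear in the map are simply dropped in this sketch; whether the real extractor ignores or reports them is not stated here.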
A much richer demonstration involving advanced finding aids created using
ExtractorTestACS.java.
The extractor generated the fourteen web pages (thirteen from a set of supporting information datasets from the ACS FAIRData pilot study, and one added from a repository at Cambridge University).
It includes the capability to
search for properties of compounds, structures, and spectra.
This demonstration illustrates
IUPAC FAIRSpec Metadata Object Model Specification, version 0.1.0 (2025.08.15).

This example demonstrates a generic web page loading an IUPAC FAIRSpec Finding Aid
from the URL on this server using the ?url= query.
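As a rough illustration of the ?url= mechanism, the sketch below builds such a link in Python. Both addresses are placeholders on example.org, not the real locations used by this demonstration, and landing_page_url is a hypothetical helper name.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def landing_page_url(page_url, finding_aid_url):
    """Append a percent-encoded ?url= query naming the finding aid to load."""
    return page_url + "?" + urlencode({"url": finding_aid_url})


# Placeholder addresses, for illustration only.
page = "https://example.org/demo/index.htm"
finding_aid = "https://example.org/data/IFD.findingaid.json"

link = landing_page_url(page, finding_aid)
print(link)

# The page can recover the original address from the query string.
recovered = parse_qs(urlsplit(link).query)["url"][0]
print(recovered == finding_aid)
```

Encoding the finding aid address matters because it contains characters (":" and "/") that should not appear raw inside a query value.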
This "metadata crawler" demonstration illustrates how we
can take a single DOI, "10.14469/HPC/14635", and produce a
complete IUPAC FAIRSpec Finding Aid simply by following metadata stored at DataCite.
The data set is from Imperial College London.
The only calls to the actual repository in creating the Finding Aid were
to extract HTTPS HEAD information about each data item. This included preferred file name,
media type (such as "image/png"), and file size in bytes.
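A HEAD request returns only response headers, so these three values can be read without downloading the file itself. The Python sketch below shows one way to do this. It assumes that the repository reports the file name in Content-Disposition, the media type in Content-Type, and the size in Content-Length; the function names are hypothetical and this is not the crawler's actual code.

```python
import urllib.request
from email.message import Message


def summarize_head(headers):
    """Pull file name, media type, and size from a mapping of response headers."""
    msg = Message()
    for name, value in headers.items():
        msg[name] = value
    size = msg.get("Content-Length")
    return {
        "file_name": msg.get_filename(),  # from Content-Disposition, if present
        "media_type": msg.get_content_type() if "Content-Type" in msg else None,
        "size_bytes": int(size) if size is not None else None,
    }


def head(url):
    """Issue a HEAD request (network access required) and summarize the headers."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request) as response:
        return summarize_head(response.headers)


# Offline example with made-up header values.
example = summarize_head({
    "Content-Disposition": 'attachment; filename="spectrum.png"',
    "Content-Type": "image/png",
    "Content-Length": "2048",
})
print(example)
```

Missing headers yield None rather than a guessed value, so a caller can tell which items need a fallback.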
The program used
to create the IUPAC FAIRSpec Finding Aid was DOICrawler.java, subclassed as ICLDOICrawler2
in order to handle a few idiosyncrasies
of that particular repository.
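The general idea of starting from one DOI and following DataCite metadata can be sketched as a small breadth-first crawl. This Python sketch is an assumption-laden illustration, not a port of DOICrawler.java: it assumes records are read from the public DataCite REST API in its JSON form, and that only related identifiers of type DOI with relation type HasPart are followed. Which relations the real crawler follows is not stated here.

```python
import json
import urllib.request

DATACITE_API = "https://api.datacite.org/dois/"


def fetch_datacite_record(doi):
    """Fetch one DataCite record as JSON (network access required)."""
    with urllib.request.urlopen(DATACITE_API + doi) as response:
        return json.load(response)


def related_dois(record, relation_types=("HasPart",)):
    """List DOIs that a record points to via the chosen relation types."""
    attributes = record.get("data", {}).get("attributes", {})
    found = []
    for rel in attributes.get("relatedIdentifiers") or []:
        if (rel.get("relatedIdentifierType") == "DOI"
                and rel.get("relationType") in relation_types):
            found.append(rel["relatedIdentifier"].lower())
    return found


def crawl(start_doi, fetch=fetch_datacite_record):
    """Breadth-first walk over linked records, visiting each DOI once."""
    seen, queue, records = set(), [start_doi.lower()], {}
    while queue:
        doi = queue.pop(0)
        if doi in seen:
            continue
        seen.add(doi)
        record = fetch(doi)
        records[doi] = record
        queue.extend(related_dois(record))
    return records
```

Passing the fetch function as a parameter lets the crawl logic be exercised against saved records, without contacting DataCite.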
This example combines the remote-access idea of Demo 2025.2 with a crawler-based IUPAC FAIRSpec Finding Aid (from Demo 2025.3). Unlike the previous url= example,
here there is no stand-alone IUPAC FAIRSpec Data Collection. Instead, the repository itself serves as the IUPAC FAIRSpec Data Collection
and is called directly for all referenced digital objects. The
finding aid referring to the collection can be located anywhere on the web.
A second demonstration highlighting what can be done
by "crawling" of metadata-rich DataCite records.
The demonstration focuses on the interconnected DataCite
metadata records of a
highly curated collection at
the high-performance computing repository at
Imperial College London. This collection contains
data relating to 57 compounds associated with the
article
Syntheses and Characterization of Main Group, Transition Metal, Lanthanide,
and Actinide Complexes of Bidentate Acylpyrazolone Ligands,
by Thomas Mies, Andrew J. P. White, Henry S. Rzepa,
Luciano Barluzzi, Mohit Devgan, Richard A. Layfield,
and Anthony G. M. Barrett.
The
IUPAC FAIRSpec Finding Aid
accesses 354 DOI-referenced pages in the repository
backed by DataCite metadata records pointing to 1354 distinct digital
items, including 244 CDXML drawings, 209 MOL files,
146 complete Bruker NMR datasets, 144 JCAMP-DX files,
and 375 PNG images.
The sample landing page for the Finding Aid
uses only the information in the DataCite metadata records retrieved from
DataCite by
ICLDOICrawler2.java,
which was initiated using only the DOI string for the main repository page. The command line used was:
java -jar ICLDOICrawler2.jar "10.14469/hpc/10386" "c:/temp/iupac/crawler2" -insitu
For this proof-of-concept, the only files downloaded from the repository
were PNG files (the -insitu flag).
The internalization of all PNG images as dataURIs within the IUPAC FAIRSpec Finding Aid allows
the page to load and display rapidly, without directly accessing the
repository itself for images.
This creates a somewhat larger JSON document (17 MB) -- still considerably
smaller than the 75.7 GB of data in this repository collection.
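For readers unfamiliar with data URIs, the Python sketch below shows how PNG bytes can be embedded as a string inside a JSON document. The function name and the JSON key "image" are hypothetical and are not taken from the finding aid specification.

```python
import base64
import json


def png_data_uri(png_bytes):
    """Encode raw PNG bytes as a data URI that a browser can display directly."""
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return "data:image/png;base64," + encoded


# The first eight bytes of any PNG file stand in for a full image here.
png_signature = b"\x89PNG\r\n\x1a\n"
uri = png_data_uri(png_signature)

# "image" is a made-up key used only for this illustration.
document = json.dumps({"image": uri})
print(document)
```

Base64 encoding grows the data by about one third, which contributes to the larger JSON document noted above.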
The landing page uses
JME-SwingJS
to create
SMARTS
substructure searches and a minimal implementation of
Jmol-SwingJS
to do the SMILES-string searching. Not all compounds in this collection
have fully validated
SMILES strings
due to the inorganic nature of the
compounds. But this does not prevent SMARTS searching of the metal center, ligands, or
associated solvents. In addition, the page provides access to
nmrdb for optional NMR spectrum prediction.
This third demonstration combines crawling of DataCite metadata records with extraction of the files.
The DataCite DOI records are followed, but unlike the v5 example, here we also download all digital objects
into a temporary directory in order to extract their metadata.
The command line used was:
java -jar ICLDOICrawler2.jar "10.14469/hpc/10386" "c:/temp/iupac/crawler2" -insitu -extractspecproperties
The process creates an especially metadata-rich IUPAC FAIRSpec Finding Aid
for a repository, in effect converting the repository from a FAIRSpec-ready data collection
to an IUPAC FAIRSpec Data Collection.
The landing page and associated files are completely portable. You can work with them locally
by downloading and unzipping
v6-crawler2/10.14469_hpc_10386.zip.