ℹ️ This is the new documentation of the EBRAINS KG. It's going to be extended continuously.
If you find any issues or have any comments, please contact kg@ebrains.eu to give us your feedback!

Indexing

The indexing mechanism aggregates the metadata read from the KG, denormalizes it, and populates the underlying ElasticSearch indices accordingly.
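To make "aggregation and denormalization" concrete, here is a minimal Python sketch, not the KG Search implementation. It assumes that raw KG instances arrive as dictionaries referencing each other by id, and it embeds a short summary of every referenced instance so that a card can be rendered from a single document. The sample data, `denormalize` and `populate_index` are hypothetical, and a plain dictionary stands in for an ElasticSearch index.

```python
from typing import Any

# Hypothetical raw KG instances keyed by id; references are plain id strings.
KG_INSTANCES: dict[str, dict[str, Any]] = {
    "ds-1": {"type": "Dataset", "name": "Example dataset", "authors": ["p-1", "p-2"]},
    "p-1": {"type": "Person", "name": "Jane Doe"},
    "p-2": {"type": "Person", "name": "John Doe"},
}


def denormalize(instance_id: str, instances: dict[str, dict[str, Any]]) -> dict[str, Any]:
    """Replace referenced ids by embedded {id, name} summaries so the
    resulting document can be displayed without further lookups."""
    source = instances[instance_id]
    doc: dict[str, Any] = {"id": instance_id, "type": source["type"], "name": source["name"]}
    doc["authors"] = [
        {"id": ref, "name": instances[ref]["name"]}
        for ref in source.get("authors", [])
        if ref in instances  # skip references that cannot be resolved
    ]
    return doc


def populate_index(instances: dict[str, dict[str, Any]], doc_type: str) -> dict[str, dict[str, Any]]:
    """Build an in-memory stand-in for the index of one document type."""
    return {
        iid: denormalize(iid, instances)
        for iid, inst in instances.items()
        if inst["type"] == doc_type
    }


index = populate_index(KG_INSTANCES, "Dataset")
```

The denormalized document duplicates the author names; this trades storage for the ability to answer a search request from one index entry.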

Multi-identifier support

Sometimes, an instance has multiple identifiers by which it should be accessible. Although the default id is the one provided by the EBRAINS KG, additional identifiers can be defined as part of the translation process of an instance so that the resulting card is available under all of them.
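A minimal sketch of the idea, assuming a card is simply registered once per identifier in a lookup table. `Card`, `CardIndex` and the identifiers used below are hypothetical; how the KG Search actually stores alternative identifiers in ElasticSearch is not shown here.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Card:
    kg_id: str  # default identifier provided by the EBRAINS KG
    title: str
    # Hypothetical additional identifiers defined during translation.
    extra_identifiers: list[str] = field(default_factory=list)


class CardIndex:
    """In-memory index resolving every identifier of a card to that card."""

    def __init__(self) -> None:
        self._by_identifier: dict[str, Card] = {}

    def add(self, card: Card) -> None:
        # Register the default KG id as well as every additional identifier.
        for identifier in [card.kg_id, *card.extra_identifiers]:
            self._by_identifier[identifier] = card

    def get(self, identifier: str) -> Optional[Card]:
        return self._by_identifier.get(identifier)


cards = CardIndex()
cards.add(Card("kg-0001", "Example dataset", ["legacy-42"]))
```

Both `cards.get("kg-0001")` and `cards.get("legacy-42")` return the same card object.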

"Searchable" vs. "Non-Searchable" indices

The KG Search distinguishes two main types of indices: "searchable" and "non-searchable". Search requests only operate on the "searchable" indices. This means that if only the newest version of an instance should appear as a search result, while the other versions remain available only by id or by navigating from the result cards, the indexing service has to ensure that only the newest version is registered in the "searchable" index.
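The following sketch illustrates this routing under simplifying assumptions: versions are plain integers, all versions of one instance share a hypothetical `series_id`, and every version (including the newest) is kept in the non-searchable index so that it can be fetched by id. `Version` and `route_versions` are illustrative names only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    instance_id: str
    series_id: str  # hypothetical: shared by all versions of the same instance
    version: int


def route_versions(versions: list[Version]) -> tuple[dict[str, Version], dict[str, Version]]:
    """Return (searchable, non_searchable) indices keyed by instance id."""
    # Assumption: every version stays retrievable by id.
    non_searchable = {v.instance_id: v for v in versions}
    # Keep only the highest version per series for the searchable index.
    newest: dict[str, Version] = {}
    for v in versions:
        current = newest.get(v.series_id)
        if current is None or v.version > current.version:
            newest[v.series_id] = v
    searchable = {v.instance_id: v for v in newest.values()}
    return searchable, non_searchable


searchable, non_searchable = route_versions([
    Version("a1", "ds", 1),
    Version("a2", "ds", 2),
    Version("b1", "other", 1),
])
```

A search would then only ever hit `a2` and `b1`, while `a1` can still be opened by id.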

"Autorelease" vs. "Non-autorelease" indices

Disclaimer: The naming "autorelease" vs. "non-autorelease" exists for legacy reasons and can be slightly misleading. We aim to rename it at some point.

Today, the only real effect of this separation is to set apart those indices which are expensive to generate; they therefore have their own endpoints in the kg-indexing API. For EBRAINS, this means that we maintain, e.g., the "File" representation in an "autorelease" index, which can then be scheduled independently from the "non-autorelease" indices.

Incremental update vs. rebuild of indices

The KG Search supports two modes of updating an index. "Incremental" means that instances are updated individually "on the fly" without any impact on the end user. "Rebuild" means that the index is recreated from scratch. Since the new index is built in the background and then replaces the old index in one go, downtime is minimal but may still be noticeable to the end user. Please note that a rebuild is required to update, e.g., the ElasticSearch mappings of an index (see below).

As a consequence, it is recommended to use the incremental update for regular updates of productive, running instances unless a full rebuild is needed; such rebuilds are usually triggered manually.
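A rebuild that replaces the old index "in one go" is commonly implemented with an alias that readers query while the concrete index behind it is swapped; ElasticSearch supports this pattern through index aliases. Whether the KG Search uses exactly this mechanism is an assumption. The in-memory sketch below only illustrates the difference between the two update modes; `AliasedIndexes` and its methods are hypothetical.

```python
import itertools
from typing import Any, Iterable


class AliasedIndexes:
    """In-memory stand-in for concrete indices behind a stable alias."""

    def __init__(self, alias: str) -> None:
        self.alias = alias
        self._counter = itertools.count(1)
        self.indices: dict[str, dict[str, Any]] = {}
        self.current = self._new_index()

    def _new_index(self) -> str:
        name = f"{self.alias}_{next(self._counter)}"
        self.indices[name] = {}
        return name

    def search(self, doc_id: str) -> Any:
        # Readers always go through the alias, i.e. the current index.
        return self.indices[self.current].get(doc_id)

    def incremental_update(self, doc_id: str, doc: Any) -> None:
        # Update a single document directly in the live index.
        self.indices[self.current][doc_id] = doc

    def rebuild(self, docs: Iterable[tuple[str, Any]]) -> None:
        # Build a fresh index (a real service would do this in the background;
        # a new index is also where changed mappings would take effect) ...
        new_name = self._new_index()
        for doc_id, doc in docs:
            self.indices[new_name][doc_id] = doc
        # ... then point the alias at it in one step and drop the old index.
        old_name, self.current = self.current, new_name
        del self.indices[old_name]


idx = AliasedIndexes("datasets")
idx.incremental_update("ds-1", {"name": "old"})
idx.rebuild([("ds-1", {"name": "new"}), ("ds-2", {"name": "other"})])
```

Readers never see a half-filled index here; they switch from `datasets_1` to `datasets_2` at the moment the alias is repointed.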

Autogeneration of ES mappings, UI settings and sitemap by annotation

Alongside the pure data, other resources are required to make the KG Search work:
- ElasticSearch requires mapping tables for proper indexing. These mappings are autogenerated by the KG Search service based on the model annotations of the target models.
- The UI needs additional directives to properly visualize and lay out the metadata (e.g. where on the screen an item should be shown, which widget to use for visualization, etc.).
- The KG Search service also generates a sitemap for search engines automatically, which helps optimize how its content appears in search engine results.
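The KG Search derives these resources from annotations on its models. As an illustration only, the sketch below uses Python dataclass field metadata in place of those annotations: an `es` entry yields an ElasticSearch mapping body and a `ui` entry yields a layout directive. The annotation keys, the layout values, `DatasetCard`, `es_mapping` and `ui_settings` are all hypothetical; only the `text`, `keyword` and `date` field types are standard ElasticSearch types.

```python
from dataclasses import dataclass, field, fields


@dataclass
class DatasetCard:
    # Hypothetical annotations: "es" drives the mapping, "ui" the layout.
    title: str = field(metadata={"es": {"type": "text"}, "ui": {"layout": "header"}})
    doi: str = field(metadata={"es": {"type": "keyword"}, "ui": {"layout": "summary"}})
    release_date: str = field(metadata={"es": {"type": "date"}, "ui": {"layout": "summary"}})


def es_mapping(model: type) -> dict:
    """Derive an ElasticSearch-style index mapping body from field annotations."""
    properties = {f.name: dict(f.metadata["es"]) for f in fields(model) if "es" in f.metadata}
    return {"mappings": {"properties": properties}}


def ui_settings(model: type) -> dict:
    """Derive UI directives (here only a layout slot) from the same annotations."""
    return {f.name: dict(f.metadata["ui"]) for f in fields(model) if "ui" in f.metadata}


mapping = es_mapping(DatasetCard)
layout = ui_settings(DatasetCard)
```

Generating both outputs from one annotated model keeps them consistent with each other. Since a changed mapping only applies to a newly created index, editing an `es` annotation implies a rebuild.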

Multi-run indexing

Please note that some features require multiple runs of the indexing process. For example, the removal of "dead links" (internal links to non-existing resources) relies on information from a previous indexing run. For the index to settle, at least two indexing runs must therefore be executed in sequence.
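The sketch below shows why a second run is needed. It assumes (this is not taken from the source) that documents are written one at a time, so that while a document is being indexed it is not yet known which ids the current run will produce. Link targets are therefore checked against the ids produced by the previous run, and the first run, having no such information, keeps all links. `index_run` is a hypothetical name.

```python
from typing import Optional


def index_run(
    docs: dict[str, list[str]], previous_ids: Optional[set[str]]
) -> tuple[dict[str, list[str]], set[str]]:
    """One indexing run over docs mapping id -> list of linked ids.

    Returns the indexed documents and the set of ids produced by this run,
    which the next run uses to detect dead links."""
    indexed: dict[str, list[str]] = {}
    for doc_id, links in docs.items():
        if previous_ids is None:
            # No previous run: dead links cannot be detected yet.
            indexed[doc_id] = list(links)
        else:
            # Keep only links whose target existed in the previous run.
            indexed[doc_id] = [link for link in links if link in previous_ids]
    return indexed, set(docs)


docs = {"a": ["b", "missing"], "b": ["a"]}
run1, ids1 = index_run(docs, None)
run2, ids2 = index_run(docs, ids1)
```

The link to `missing` survives the first run and is dropped in the second. A resource added in a given run would likewise only be linked reliably from the following run, which is why the index settles after consecutive runs.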

This open source software code was developed in part or in whole in the Human Brain Project, funded from the European Union's Horizon 2020 Framework Programme for Research and Innovation under Specific Grant Agreements No. 720270, No. 785907 and No. 945539 (Human Brain Project SGA1, SGA2 and SGA3).