If you find any issues or have any comments, please contact email@example.com to give us your feedback!
The KG documentation is collected from the repositories of the various KG components and combined by this aggregator. The goal is to gather documentation elements (Markdown or HTML) from different repositories and to aggregate them in a central place.
One of the most important pieces of the documentation generation is the structure.yml: It defines the navigation hierarchy of the generated documentation.
To be able to map the resources to a hierarchy, naming conventions are applied. For this, every entry in the structure.yml is translated to a "slug" which includes the slugs of its parent entries.
Example: A page called "KG Search" which is a subpage of "The EBRAINS Knowledge Graph" will receive the slug
By default, it's the title of the page which is used for creating the slug - however, if you want to override this behavior, you can do so by following this pattern:
- Your human readable title \#the-explicitly-specified-slug
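The slug convention can be illustrated with a minimal sketch. The exact slug format is not spelled out here, so the lowercasing, the hyphenation, the "/" separator between parent and child, and the tolerance for an escaped "\#" are all assumptions; the function names are hypothetical and not taken from the aggregator.

```python
import re

# Matches "Some title #explicit-slug" (optionally written as "\#explicit-slug").
EXPLICIT_SLUG = re.compile(r"^(?P<title>.*?)\s*\\?#(?P<slug>[\w-]+)\s*$")


def slugify(title: str) -> str:
    """Lowercase the title and collapse non-alphanumeric runs into hyphens (assumed format)."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def entry_slug(entry: str) -> str:
    """Use the explicitly specified slug if the entry has one, else derive it from the title."""
    match = EXPLICIT_SLUG.match(entry)
    if match:
        return match.group("slug")
    return slugify(entry)


def full_slug(parents: list, entry: str, separator: str = "/") -> str:
    """Prefix the entry's slug with the slugs of its parents (separator is an assumption)."""
    return separator.join([entry_slug(p) for p in parents] + [entry_slug(entry)])


if __name__ == "__main__":
    print(full_slug(["The EBRAINS Knowledge Graph"], "KG Search"))
    print(entry_slug("Your human readable title #the-explicitly-specified-slug"))
```

The second call shows the override: the title is ignored and the part after "#" is used verbatim.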
Collection of resources
There are two main sources of resources:
- Local resources within this repository in general_sources. These are typically structures that give the user a broader context, do not fit into the documentation of one specific feature or component, and therefore would not fit into a project repository.
- Sources from the various code repositories involved. The build pipeline (compare gitlab-ci.yml) ensures that all repositories are properly cloned into the project_sources directory. Letting the CI pipeline handle this makes it possible to access public as well as private repositories without burdening the aggregator logic with authentication (this is part of the CI pipeline).

Please note that the name of the directory a repository is checked out into has a meaning and should therefore be chosen carefully.
Filtering of resources
The resources are filtered by patterns defined in config.json. This file allows you to declare inclusionPatterns and to specify ignoredFiles.
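As a rough illustration of such a filter, the sketch below treats inclusionPatterns as shell-style glob patterns and ignoredFiles as plain file names. Only the two key names come from the text above; the shape of their values, the sample entries, and the function name are assumptions.

```python
import fnmatch

# Hypothetical content of config.json, already parsed into a dict.
config = {
    "inclusionPatterns": ["*.md", "*.html"],
    "ignoredFiles": ["CHANGELOG.md"],
}


def is_included(relative_path: str, config: dict) -> bool:
    """Keep a file if it is not ignored by name and matches at least one inclusion pattern."""
    file_name = relative_path.rsplit("/", 1)[-1]
    if file_name in config.get("ignoredFiles", []):
        return False
    return any(
        fnmatch.fnmatchcase(relative_path, pattern)
        for pattern in config.get("inclusionPatterns", [])
    )


if __name__ == "__main__":
    for path in ["kg-search/docs/how-to.md", "kg-search/CHANGELOG.md", "kg-search/logo.png"]:
        print(path, is_included(path, config))
```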
Analysis of resources
The script is mainly interested in markdown and HTML files. In a first step, all those files are analyzed to evaluate an explicit identifier for them.
The logic to determine these identifiers applies the following rules in order:
1. An explicitly defined identifier in the first line of the document, following the convention
[#//IDENTIFIER]:<> in markdown and
<!-- #IDENTIFIER --> in HTML.
2. The first main heading (
# YOUR HEADING in markdown or
<h1>YOUR HEADING</h1> in HTML)
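The two rules above can be sketched as follows. The regular expressions mirror the conventions as written in this document; how the actual script parses them (for example, whether headings inside code blocks are skipped) is not specified, so this is an approximation with hypothetical names.

```python
import re

# Rule 1: explicit identifier on the first line.
MD_EXPLICIT = re.compile(r"^\[#//(?P<id>[^\]]+)\]:<>\s*$")
HTML_EXPLICIT = re.compile(r"^<!--\s*#(?P<id>.+?)\s*-->\s*$")
# Rule 2: first main heading anywhere in the document.
MD_HEADING = re.compile(r"^#[ \t]+(?P<id>.+?)[ \t]*$", re.MULTILINE)
HTML_HEADING = re.compile(r"<h1[^>]*>(?P<id>.*?)</h1>", re.IGNORECASE | re.DOTALL)


def find_identifier(text: str, is_html: bool):
    """Return the explicit identifier if present, else the first main heading, else None."""
    lines = text.splitlines()
    first_line = lines[0].strip() if lines else ""
    explicit = (HTML_EXPLICIT if is_html else MD_EXPLICIT).match(first_line)
    if explicit:
        return explicit.group("id").strip()
    heading = (HTML_HEADING if is_html else MD_HEADING).search(text)
    return heading.group("id").strip() if heading else None


if __name__ == "__main__":
    print(find_identifier("[#//kg-search-how-to]:<>\n# How to", is_html=False))
    print(find_identifier("<html><h1>KG Core</h1></html>", is_html=True))
```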
Additionally, a file can be assigned to a category according to its source - this is typically the directory name at the level just below "general_sources" or "project_sources" (in the latter case, this is equivalent to the place the CI pipeline has checked out the code).
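A minimal sketch of deriving the category from a file path, assuming POSIX-style relative paths. The function name is hypothetical, and returning None for a file that sits directly inside a source root is an assumption about an edge case the text does not cover.

```python
from pathlib import PurePosixPath

SOURCE_ROOTS = ("general_sources", "project_sources")


def category_of(path: str):
    """Return the directory name just below a source root, or None if there is none."""
    parts = PurePosixPath(path).parts
    directories = parts[:-1]  # everything except the file name itself
    for index, part in enumerate(directories):
        if part in SOURCE_ROOTS and index + 1 < len(directories):
            return directories[index + 1]
    return None


if __name__ == "__main__":
    print(category_of("project_sources/kg-search/docs/how-to.md"))
    print(category_of("general_sources/intro.md"))
```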
Mapping of "slugs" by identifier and category
Now that all slugs of the structure.yml are known and we have information about explicitly defined identifiers as well as categories of the source files, it's time to do the mapping. For this, we first determine which slugs are potential matches for the identifier. If exactly one slug applies, we have a match. If there are multiple, we check whether exactly one of the potential matches contains the category information. This makes it possible, for example, to have identically named files such as "how-to" in multiple repositories while still assigning them to the right "product" (such as "KG Search" or "KG Core").
If it is not possible to find an unambiguous slug, this is reported to the console and requires manual action (e.g. explicit definition of identifiers in the file or explicit slugs in the structure.yml).
If no slug can be found by the identifier, the script tries to use the original file name to create the slug and to map it with the same mechanism. If this is not possible either, the file is reported as not being part of the structure.yml; it might need to be added, or the slugs have to be updated.
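The mapping steps described in this section can be sketched as below. What counts as a "potential match" is not defined precisely in the text, so this sketch assumes a slug is a candidate when it ends with the slugified identifier, and that the category narrows the candidates by substring. The slug format, the example slugs, and all names are hypothetical.

```python
import re
from pathlib import PurePosixPath
from typing import List, Optional


class AmbiguousSlug(Exception):
    """Raised when several slugs match and the category cannot narrow them to one."""


def slugify(text: str) -> str:
    """Assumed slug format: lowercase, non-alphanumeric runs collapsed to hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def match_slug(key: str, category: Optional[str], slugs: List[str]) -> Optional[str]:
    """Return the single matching slug, None if nothing matches, or raise if ambiguous."""
    wanted = slugify(key)
    candidates = [s for s in slugs if s.endswith(wanted)]  # assumption: suffix match
    if len(candidates) <= 1:
        return candidates[0] if candidates else None
    if category:
        narrowed = [s for s in candidates if slugify(category) in s]
        if len(narrowed) == 1:
            return narrowed[0]
    raise AmbiguousSlug(f"{key!r} matches {candidates}")


def resolve_slug(identifier: Optional[str], file_name: str,
                 category: Optional[str], slugs: List[str]) -> Optional[str]:
    """Try the identifier first, then fall back to the file name without extension."""
    for key in (identifier, PurePosixPath(file_name).stem):
        if key:
            slug = match_slug(key, category, slugs)
            if slug:
                return slug
    return None  # the caller would report the file as missing from structure.yml


if __name__ == "__main__":
    slugs = [
        "the-ebrains-knowledge-graph/kg-search/how-to",
        "the-ebrains-knowledge-graph/kg-core/how-to",
    ]
    print(resolve_slug("How to", "how-to.md", "kg-search", slugs))
```

Raising an exception for the ambiguous case keeps it distinct from the "no match" case, which matters here because only the latter falls back to the file name.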
Generation of HTML
Once the mapping has been clarified, the HTML generator is triggered. An HTML source is only integrated into the wrapping template. A Markdown source is translated into HTML while applying some tweaks:
- Restructuring of code sections for compatibility with prism code highlighting and mermaidjs
- The convention that italic text following a code block or image is interpreted as a caption

Please note that it is also possible to use Jinja templating syntax inside the files to cover more advanced use cases.
The generated HTML files are stored in the "target" directory according to their "category". Please note that resource files are copied next to the HTML files to ensure that relative paths are maintained.
Special case for 🔐
If an entry in the structure.yml is prefixed with the special character 🔐, it is an internal resource meant for protected access. Such resources are stored in the directory "internal" in the "target" folder. Please note that it is the duty of the deployment context to ensure the access protection (e.g. by protecting the "target" folder with a password or, as in our case, by OAuth on the reverse proxy).
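A small sketch of how the output directory could be chosen from the entry title. The function names are hypothetical, and the layout below "internal" is not described in the text, so this sketch only distinguishes the two top-level locations.

```python
from pathlib import PurePosixPath

INTERNAL_MARKER = "🔐"


def is_internal(entry_title: str) -> bool:
    """An entry is internal if its title starts with the lock marker."""
    return entry_title.lstrip().startswith(INTERNAL_MARKER)


def output_dir(entry_title: str, category: str, target: str = "target") -> PurePosixPath:
    """Internal entries go to target/internal, all others to target/<category>."""
    root = PurePosixPath(target)
    return root / "internal" if is_internal(entry_title) else root / category


if __name__ == "__main__":
    print(output_dir("🔐 Operations handbook", "kg-core"))
    print(output_dir("KG Search", "kg-search"))
```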