YAML configuration¶
This page is a reference for version 2 of the YAML configuration file for Git repositories handled by Arkindex. Version 1 is not supported.
The configuration file is always named .arkindex.yml and should be found at the root of the repository.
Required attributes¶
The following attributes are required in every .arkindex.yml file:
version
- Version of the configuration file in use. An error will occur if the version number is not set to 2.
Example configuration¶
---
version: 2
workers:
  - workers/config.yml
This matches the file workers/config.yml, with the path resolved from the root of the repository.
Worker repository attributes¶
The workers attribute is a list whose items can each be one of the following:
- Paths to a YAML file holding the configuration for a single worker
- Unix-style patterns matching paths to YAML files holding the configuration for a single worker
- The configuration of a single worker embedded directly into the file
Single worker configuration¶
The following describes the attributes of a YAML file configuring one worker, or of the configuration embedded directly in the .arkindex.yml file.
All attributes are optional unless explicitly specified.
name
- Mandatory. Name of the worker, for display purposes.
slug
- Mandatory. Slug of this worker. The slug must be unique across the repository and may only contain alphanumeric characters, underscores or hyphens.
type
- Mandatory. Type of the worker, for display purposes only. Some common values include:
  - classifier
  - recognizer
  - ner
  - dla
  - word-segmenter
  - paragraph-creator
description
- Path to a file containing the worker’s description, stored in the descriptions folder.
gpu_usage
- Whether or not this worker requires or supports GPUs. Defaults to disabled. May take one of the following values:
  required
  - This worker requires a GPU, and will only be run on Ponos agents whose hosts have a GPU.
  supported
  - This worker supports using a GPU, but may run on any available host, including those without GPUs.
  disabled
  - This worker does not support GPUs. It may run on a host that has a GPU, but it will ignore it.
model_usage
- Whether or not this worker requires a model version to run. Defaults to disabled. May take one of the following values:
  required
  - This worker requires a model version, and will only be run on processes with a model.
  supported
  - This worker supports a model version, but may run on any process, including those without a model.
  disabled
  - This worker does not support model versions. It may run on a process that has a model, but it will ignore it.
docker
- Groups Docker-related configuration attributes:
  command
  - Mandatory. Command line to be used when launching the Docker container for this worker.
  shm_size
  - Size of the available shared memory in /dev/shm. The default value is 64M, but an increase might be necessary when training machine learning models. The given value must be either an integer, or an integer followed by a unit (b for bytes, k for kilobytes, m for megabytes and g for gigabytes). If no unit is specified, the default unit is bytes. See the Docker documentation.
configuration
- Mapping holding any string keys and values that can be later accessed in the worker’s Python code. Can be used to define settings on your own worker, such as a file’s location.
user_configuration
- Mapping defining settings on your worker that can be modified by users. See below for details.
secrets
- List of required secret names for that specific worker. For more information, see how to use secrets in workers in the official Arkindex documentation.
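As a minimal sketch of how the attributes above fit together, the following shows what a standalone worker file such as workers/config.yml might contain. The worker name, slug, command and configuration key are hypothetical and only serve as an illustration; the secret path is reused from the example at the end of this page.

```yaml
# Hypothetical standalone worker file, e.g. workers/config.yml.
# All names and values below are illustrative assumptions.
name: Line Recognizer
slug: line-recognizer
type: recognizer
# May run on any host, and uses a GPU when one is available
gpu_usage: supported
# Only runs on processes that provide a model version
model_usage: required
docker:
  # Hypothetical command line
  command: python -m line_recognizer.worker
  # Integer followed by a unit (b, k, m or g); here, one gigabyte
  shm_size: 1g
configuration:
  # String keys and values, readable later from the worker's Python code
  output_type: text_line
secrets:
  - path/to/secret.json
```

Since gpu_usage and model_usage both default to disabled, a worker that needs neither can leave them out entirely.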
Setting up user-configurable parameters¶
The YAML file can define parameters that users will be able to change when they use this worker in a process on Arkindex. These parameters are listed in a user_configuration attribute.
A parameter is defined using the following settings:
title
- Mandatory. The parameter’s title.
type
- Mandatory. A value type. The supported types are:
  - int
  - bool
  - float
  - string
  - enum
  - list
  - dict
  - model
default
- Optional. A default value for the parameter. Must be of the defined parameter type.
required
- Optional. A boolean, defaults to false.
choices
- Optional. A list of options for enum type parameters.
subtype
- Optional. The type of the elements of list type parameters.
multiline
- Optional. Enable multi-line input for string type parameters.
This definition allows for both validation of the input and the display of a form to make configuring workers easy for Arkindex users.
String parameters¶
String-type parameters must be defined using a title and the string type. You can also set a default value for this parameter, which must be a string, as well as make it a required parameter, which prevents users from leaving it blank.
Setting the optional multiline parameter, which is a boolean, to True displays a larger text area to the user, which allows line breaks when filling in the field. It is still possible to use \n in non-multiline string configuration fields. If multiline is not specified, it defaults to False.
For example, a string-type parameter can be defined like this:
subfolder_name:
  title: Created Subfolder Name
  type: string
  default: My Neat Subfolder
  multiline: False
Which will result in the following display for the user:
Integer parameters¶
Integer-type parameters must be defined using a title and the int type. You can also set a default value for this parameter, which must be an integer, as well as make it a required parameter, which prevents users from leaving it blank.
For example, an integer-type parameter can be defined like this:
input_size:
  title: Input Size
  type: int
  default: 768
  required: True
Which will result in the following display for the user:
Float parameters¶
Float-type parameters must be defined using a title and the float type. You can also set a default value for this parameter, which must be a float, as well as make it a required parameter, which prevents users from leaving it blank.
For example, a float-type parameter can be defined like this:
wip:
  title: Word Insertion Penalty
  type: float
  required: True
Which will result in the following display for the user:
Boolean parameters¶
Boolean-type parameters must be defined using a title and the bool type. You can also set a default value for this parameter, which must be a boolean, as well as make it a required parameter, which prevents users from leaving it blank.
In the configuration form, boolean parameters are displayed as toggles.
For example, a boolean-type parameter can be defined like this:
score:
  title: Run Worker in Evaluation Mode
  type: bool
  default: False
Which will result in the following display for the user:
Enum (choices) parameters¶
Enum-type parameters must be defined using a title, the enum type and at least two choices. You cannot define an enum-type parameter without choices. You can also set a default value for this parameter, which must be one of the available choices, as well as make it a required parameter, which prevents users from leaving it blank. Enum-type parameters should be used when you want to limit the users to a given set of options.
In the configuration form, enum parameters are displayed as selects.
For example, an enum-type parameter can be defined like this:
parent_type:
  title: Target Parent Element Type
  type: enum
  default: paragraph
  choices:
    - paragraph
    - text_zone
    - page
Which will result in the following display for the user:
List parameters¶
List-type parameters must be defined using a title, the list type and a subtype for the elements inside the list. You can also set a default value for this parameter, which must be a list containing elements of the given subtype, as well as make it a required parameter, which prevents users from leaving it blank.
The allowed subtypes are int, float and string.
In the configuration form, list parameters are displayed as rows of input fields.
For example, a list-type parameter can be defined like this:
a_list:
  title: A List of Values
  type: list
  subtype: int
  default: [4, 3, 12]
Which will result in the following display for the user:
Dictionary parameters¶
Dictionary-type parameters must be defined using a title and the dict type. You can also set a default value for this parameter, which must be a dictionary, as well as make it a required parameter, which prevents users from leaving it blank. For example, you can use a dictionary parameter to specify a correspondence between the classes predicted by a worker and the elements created on Arkindex from these predictions.
Dictionary-type parameters only accept strings as values.
In the configuration form, dictionary parameters are displayed as a table with one column for keys and one column for values.
For example, a dictionary-type parameter can be defined like this:
classes:
  title: Output Classes to Elements Correspondence
  type: dict
  default:
    a: page
    b: text_line
Which will result in the following display for the user:
Model parameters¶
Model-type parameters must be defined using a title and the model type. You can also set a default value for this parameter, which must be the UUID of an existing Model, and make it a required parameter, which prevents users from leaving it blank. You can use a model parameter to specify the Model to which the Model Version created by a Training process will be attached.
Model-type parameters only accept Model UUIDs as values.
In the configuration form, model parameters are displayed as an input field. Users can select a model from a list of available Models: what they type into the input field filters that list, allowing them to search for a model using its name or UUID.
For example, a model-type parameter can be defined like this:
model_param:
  title: Training Model
  type: model
Which will result in the following display for the user:
Example user_configuration¶
user_configuration:
  vertical_padding:
    type: int
    default: 0
    title: Vertical Padding
  element_base_name:
    type: string
    required: true
    title: Element Base Name
  create_confidence_metadata:
    type: bool
    default: false
    title: Create confidence metadata on elements
  some_other_parameter:
    type: enum
    required: true
    default: 23
    choices:
      - 12
      - 23
      - 56
    title: Another Parameter
  a_model_parameter:
    type: model
    title: Model to train
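The example above does not include multi-line string, list or dict parameters. As a complementary sketch, the fragment below shows how they might be declared. The parameter names, titles and default values are hypothetical. Because dictionary-type parameters only accept strings as values, numeric-looking values are quoted here so that YAML does not parse them as numbers.

```yaml
user_configuration:
  # Multi-line string: displayed as a larger text area (hypothetical parameter)
  prompt_text:
    title: Prompt Text
    type: string
    multiline: true
  # List of strings; the allowed subtypes are int, float and string
  ignored_types:
    title: Ignored Element Types
    type: list
    subtype: string
    default: [word, char]
  # Dictionary values must be strings, hence the quotes around the numbers
  class_thresholds:
    title: Confidence Threshold per Class
    type: dict
    default:
      heading: "0.8"
      text: "0.5"
```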
Fallback to free JSON input¶
If you have defined user-configurable parameters using these specifications, Arkindex users can switch between the form and the free JSON input field using the JSON toggle. If the defined user_configuration contains unsupported parameter types, the frontend will automatically fall back to the free JSON input field. The same is true if you have not defined user-configurable parameters using these specifications.
Example configuration¶
---
version: 2
workers:
  # Path to a single YAML file
  - path/to/worker.yml
  # Pattern matching any YAML file in the configuration folder
  # or in its sub-directories
  - configuration/**/*.yml
  # Configuration embedded directly into this file
  - name: Book of hours
    slug: book_of_hours
    type: classifier
    docker:
      command: python mysuperscript.py --blabla
      shm_size: 128m
    configuration:
      model: path/to/model
      anyKey: anyValue
      classes: [X, Y, Z]
    user_configuration:
      vertical_padding:
        type: int
        default: 0
        title: Vertical Padding
    secrets:
      - path/to/secret.json