Setting up a new worker

This page will guide you through creating a new Arkindex worker locally and preparing a development environment.

This guide assumes you are using Ubuntu 20.04 or later and have root access.

Preparing your environment

This section will guide you through preparing your system to create a new Arkindex worker from our official template.

Installing system dependencies

To retrieve the Arkindex worker template, you will need to have both Git and SSH. Git is a version control system that you will later use to manage multiple versions of your worker. SSH allows secure connections to remote machines, and will be used in our case to retrieve the template from a Git server.

To install system dependencies

  1. Run the following command:
sudo apt install git ssh

Checking your version of Python

Our Arkindex worker template requires Python 3.6 or later. Checking if a compatible version of Python is installed avoids further issues in the setup process.

To check your version of Python

  1. Run the following command: python3 --version

This command will have an output similar to the following:

Python 3.6.9
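The same check can be scripted, which may be handy in a setup script. The sketch below is only an illustration: the `is_supported` helper and the `MIN_VERSION` constant are hypothetical names, and the minimum version is simply the one stated in this guide.

```python
import sys

# Minimum version stated in this guide for the worker template.
MIN_VERSION = (3, 6)


def is_supported(version_info=None, minimum=MIN_VERSION):
    """Return True if a (major, minor, ...) version meets the minimum."""
    if version_info is None:
        # Default to the interpreter running this script.
        version_info = sys.version_info
    # Only the major and minor numbers matter for this comparison.
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "OK" if is_supported() else "too old"
    print(f"Python {current}: {status}")
```

Run it with python3 so that it reports on the same interpreter as the command above.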

Installing Python

If you were unable to check your Python version as stated above because python3 was not found, you will need to install Python 3 on your system.

To install Python on Ubuntu

  1. Run the following command:
sudo apt install python3 python3-pip python3-virtualenv
  2. Check your Python version again, as instructed in the previous section.

Installing Python dependencies

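To bootstrap a new Arkindex worker, some Python dependencies are required:

  • pre-commit will be used to automatically check the syntax of your source code.
  • tox will be used to run unit tests.
  • cookiecutter will be used to generate the worker from the template.
  • virtualenvwrapper will be used to manage Python virtual environments.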

To install Python dependencies

  1. Run the following command:
pip3 install pre-commit tox cookiecutter virtualenvwrapper
  2. Follow the official virtualenvwrapper setup instructions until you are able to run workon.

workon should have an empty output, as no Python virtual environments have been set up yet.
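If one of the installed tools cannot be run afterwards, a quick way to see which one is missing is to look each of them up on your PATH. This is a minimal sketch, not part of the template: `missing_tools` and `REQUIRED_TOOLS` are hypothetical names. workon is left out because virtualenvwrapper defines it as a shell function rather than an executable.

```python
import shutil

# Command-line tools installed by the pip3 command above.
# workon is a shell function, not an executable, so it is not listed here.
REQUIRED_TOOLS = ("pre-commit", "tox", "cookiecutter")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All required tools were found on PATH.")
```

A tool reported as missing right after installation is often installed in a directory that is not on your PATH yet.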

Creating the project

This section will guide you through creating a new worker from our official template and making it available on a GitLab instance.

Creating a GitLab project

For a worker to be accessible from an Arkindex instance, it needs to be pushed to the repository of a GitLab project. A GitLab project will also allow you to manage different versions of a worker and run automated checks on your code.

To create a GitLab project

  1. Open the New project form on GitLab.com or on another GitLab instance

  2. Enter your worker name as the Project name

  3. Define a Project slug related to your worker, e.g.:

    • tesseract for a Tesseract worker
    • opencv-foo for an OpenCV worker related to project Foo
  4. Click on the Create project button

Bootstrapping the project

This section guides you through using our official template to get a basic structure for your worker.

To bootstrap the project

  1. Open a terminal and go to the folder in which you want to create your worker.

  2. Enter this command and fill in the required information:

cookiecutter git@gitlab.teklia.com:workers/base-worker.git

Cookiecutter will ask you for several options:

slug
A slug for the worker. This should use lowercase alphanumeric characters, underscores or hyphens to meet the code formatting requirements that the template automatically enforces via black.
name
A name for the worker, purely used for display purposes.
description
A general description of the worker. This will be used to initialize the README.md of your repository as well as the help command output.
worker_type

An arbitrary string purely used for display purposes. For example:

  • recognizer,

  • classifier,

  • dla,

  • entity-recognizer, etc.

author
A name for the worker’s author. Usually your first and last name.
email
Your e-mail address. This will be used to contact you if any administrative need arises.
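Before answering the slug prompt, you may want to check that your slug only uses the allowed characters. The sketch below assumes the rule is exactly "lowercase alphanumeric characters, underscores or hyphens" as described for the slug option; the regular expression and the `is_valid_slug` helper are illustrative and are not taken from the template.

```python
import re

# Assumed rule: one or more lowercase letters, digits, underscores or hyphens.
SLUG_PATTERN = re.compile(r"[a-z0-9_-]+")


def is_valid_slug(slug):
    """Return True if the whole slug matches the assumed pattern."""
    return SLUG_PATTERN.fullmatch(slug) is not None


if __name__ == "__main__":
    for candidate in ("tesseract", "opencv-foo", "OpenCV Foo"):
        verdict = "valid" if is_valid_slug(candidate) else "invalid"
        print(f"{candidate!r}: {verdict}")
```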

Cookiecutter will also automatically normalize your worker’s slug into two new parameters:

__package
The name of the Python package for your worker, generated by lowercasing the slug and replacing underscores with hyphens.
__module
The name of the Python module for your worker, generated by lowercasing the slug and replacing hyphens with underscores.
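The two normalizations can be pictured with plain string operations. This is only a sketch of the rules as described here, not the template's actual implementation; `package_name` and `module_name` are hypothetical helper names.

```python
def package_name(slug):
    """Lowercase the slug and replace underscores with hyphens."""
    return slug.lower().replace("_", "-")


def module_name(slug):
    """Lowercase the slug and replace hyphens with underscores."""
    return slug.lower().replace("-", "_")


if __name__ == "__main__":
    slug = "opencv-foo"
    print("package:", package_name(slug))
    print("module:", module_name(slug))
```

The module name uses underscores because hyphens are not allowed in Python identifiers, so a module named with hyphens could not be imported with a regular import statement.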

Pushing to GitLab

This section guides you through pushing the newly created worker from your system to the GitLab project’s repository.

This section assumes you have Maintainer or Owner access to the GitLab project.

To push to GitLab

  1. Enter the newly created directory, whose name starts with worker- and ends with your worker’s slug.

  2. Add your GitLab project as a Git remote:

git remote add origin git@my-gitlab-instance.com:path/to/worker.git

You will need to use your own instance’s URL and the path to your own project. For example, a project named hello in the teklia group on gitlab.com will use the following command:

git remote add origin git@gitlab.com:teklia/hello.git
  3. Push the new branch to GitLab:
git push --set-upstream origin master

If you want to push a different branch, you first need to create it. For example, if you want to push to a new branch named bootstrap, you will use:

git checkout -b bootstrap
git push --set-upstream origin bootstrap
  4. Open your GitLab project in a browser.

  5. Click on the blue icon indicating that CI is running on your repository, and wait for it to turn green to confirm everything worked.
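The remote address used when adding the Git remote follows the pattern git@host:path.git. The sketch below builds such an address from a GitLab host name and a project path. It assumes the SSH user is git, as in the examples above; `ssh_remote_url` is a hypothetical helper.

```python
def ssh_remote_url(host, project_path):
    """Build an SSH remote address such as git@gitlab.com:teklia/hello.git."""
    # Accept paths written with surrounding slashes.
    project_path = project_path.strip("/")
    # Accept paths that already end with the .git suffix.
    if project_path.endswith(".git"):
        project_path = project_path[: -len(".git")]
    return f"git@{host}:{project_path}.git"


if __name__ == "__main__":
    print(ssh_remote_url("gitlab.com", "teklia/hello"))
```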

Setting up your development environment

This section guides you through setting up a Python development environment specifically for your worker.

Activating the pre-commit hook

The official template includes code checks, such as detecting trailing whitespace, as well as code formatting using black. These checks run on GitLab as soon as you push new code, but the pre-commit hook can also run them automatically each time you create a commit.

To activate the pre-commit hook

  1. Run pre-commit install.

Setting up the Python virtual environment

To install Python dependencies that are specific to your worker, and prevent other dependencies installed on your system from interfering, it is recommended to use a virtual environment.

To set up a Python virtual environment

  1. Run mkvirtualenv my_worker, where my_worker is any name of your choice.
  2. Install your worker in editable mode: pip install -e .
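To confirm that the pip command ran inside the virtual environment rather than against your system Python, you can ask the interpreter itself. This is a minimal sketch and `in_virtualenv` is a hypothetical helper. It relies on sys.prefix differing from sys.base_prefix inside a virtual environment, and on the sys.real_prefix attribute that older virtualenv releases set instead.

```python
import sys


def in_virtualenv():
    """Return True if the running interpreter belongs to a virtual environment."""
    # Older virtualenv releases mark the environment with sys.real_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    # venv and newer virtualenv releases make sys.prefix differ from sys.base_prefix.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    if in_virtualenv():
        print("Running inside a virtual environment:", sys.prefix)
    else:
        print("Not running inside a virtual environment.")
```

Run it with the python command available after mkvirtualenv or workon my_worker has activated the environment.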