Setting up a new worker¶
This page will guide you through creating a new Arkindex worker locally and preparing a development environment.
This guide assumes you are using Ubuntu 20.04 or later and have root access.
Preparing your environment¶
This section will guide you through preparing your system to create a new Arkindex worker from our official template.
Installing system dependencies¶
To retrieve the Arkindex worker template, you will need to have both Git and SSH. Git is a version control system that you will later use to manage multiple versions of your worker. SSH allows secure connections to remote machines, and will be used in our case to retrieve the template from a Git server.
To install system dependencies¶
- Run the following command:
sudo apt install git ssh
Checking your version of Python¶
Our Arkindex worker template requires Python 3.6 or later. Checking if a compatible version of Python is installed avoids further issues in the setup process.
To check your version of Python¶
- Run the following command:
python3 --version
This command will have an output similar to the following:
Python 3.6.9
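The version check above can also be scripted. The following is a minimal sketch, not part of the official template: the helper name python_version_ok is hypothetical, and it assumes a POSIX shell with awk available.

```shell
# Hypothetical helper (not part of the worker template): succeeds when the
# given version string, such as "3.6.9", is at least 3.6.
python_version_ok() {
    major=${1%%.*}      # text before the first dot
    rest=${1#*.}        # text after the first dot
    minor=${rest%%.*}   # text before the next dot
    [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -ge 6 ]; }
}

# Only query the interpreter when python3 is actually installed.
if command -v python3 >/dev/null 2>&1; then
    # "python3 --version" prints e.g. "Python 3.6.9"; keep the second word.
    version=$(python3 --version 2>&1 | awk '{print $2}')
    if python_version_ok "$version"; then
        echo "Python $version is recent enough"
    else
        echo "Python $version is too old, 3.6 or later is required"
    fi
fi
```

If nothing is printed, python3 was not found on your PATH, which is the case covered in the next section.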
Installing Python¶
If you were unable to check your Python version as stated above because python3 was not found, you will need to install Python 3 on your system.
To install Python on Ubuntu¶
- Run the following command:
sudo apt install python3 python3-pip python3-virtualenv
- Check your Python version again, as instructed in the previous section.
Installing Python dependencies¶
To bootstrap a new Arkindex worker, some Python dependencies will be required:
- pre-commit will be used to automatically check the syntax of your source code.
- tox will be used to run unit tests.
- cookiecutter will be used to bootstrap the project.
- virtualenvwrapper will be used to manage Python virtual environments.
To install Python dependencies¶
- Run the following command:
pip3 install pre-commit tox cookiecutter virtualenvwrapper
- Follow the official virtualenvwrapper setup instructions until you are able to run workon. At this point, workon should have an empty output, as no Python virtual environments have been set up yet.
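Before moving on, it may help to confirm that every tool installed so far can be found on your PATH. This is a small sketch under the assumption of a POSIX shell; the function name missing_tools is hypothetical and is not provided by any of the tools above. The workon command is left out because it is a shell function loaded by virtualenvwrapper, so it should be checked in your interactive shell instead.

```shell
# Hypothetical helper: print the name of each command that cannot be found.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Check the system and Python dependencies mentioned in this guide.
# An empty output means that everything was found.
missing_tools git ssh python3 pip3 pre-commit tox cookiecutter
```

If a tool installed with pip3 is reported as missing, a likely cause is that the directory pip3 installs scripts into is not on your PATH.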
Creating the project¶
This section will guide you through creating a new worker from our official template and making it available on a GitLab instance.
Creating a GitLab project¶
For a worker to be accessible from an Arkindex instance, it needs to be pushed to the repository of a GitLab project. A GitLab project will also allow you to manage different versions of a worker and run automated checks on your code.
To create a GitLab project¶
-
Open the New project form on GitLab.com or on another GitLab instance
-
Enter your worker name as the Project name
-
Define a Project slug related to your worker, e.g.:
  - tesseract for a Tesseract worker
  - opencv-foo for an OpenCV worker related to project Foo
-
Click on the Create project button
Bootstrapping the project¶
This section guides you through using our official template to get a basic structure for your worker.
To bootstrap the project¶
-
Open a terminal and go to a folder in which you will want your worker to be.
-
Enter this command and fill in the required information:
cookiecutter git@gitlab.teklia.com:workers/base-worker.git
Cookiecutter will ask you for several options:
slug
- A slug for the worker. This should use lowercase alphanumeric characters, underscores or hyphens to meet the code formatting requirements that the template automatically enforces via black.
name
- A name for the worker, purely used for display purposes.
description
- A general description of the worker. This will be used to initialize the README.md of your repository as well as the help command output.
worker_type
- An arbitrary string purely used for display purposes. For example: recognizer, classifier, dla, entity-recognizer, etc.
author
- A name for the worker’s author. Usually your first and last name.
email
- Your e-mail address. This will be used to contact you if any administrative need arises.
Cookiecutter will also automatically normalize your worker’s slug into two new parameters:
__package
- The name of the Python package for your worker, generated by lowercasing the slug and replacing underscores with hyphens.
__module
- The name of the Python module for your worker, generated by lowercasing the slug and replacing hyphens with underscores.
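To make the two normalization rules concrete, here is a rough shell equivalent. It only illustrates the rules as described above and is not the template's actual implementation; the function names to_package and to_module are hypothetical.

```shell
# Hypothetical helpers mirroring the normalization rules described above.

# Package name: lowercase the slug, then replace underscores with hyphens.
to_package() {
    printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | tr '_' '-'
}

# Module name: lowercase the slug, then replace hyphens with underscores.
to_module() {
    printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | tr '-' '_'
}

to_package "opencv_foo"   # prints opencv-foo
to_module "opencv-foo"    # prints opencv_foo
```

The hyphenated form suits a package name, while the underscore form is needed for the module because hyphens are not valid in Python identifiers.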
Pushing to GitLab¶
This section guides you through pushing the newly created worker from your system to the GitLab project’s repository.
This section assumes you have Maintainer or Owner access to the GitLab project.
To push to GitLab¶
-
Enter the newly created directory, whose name starts with worker- and ends with your worker’s slug.
-
Add your GitLab project as a Git remote:
git remote add origin git@my-gitlab-instance.com:path/to/worker.git
You will need to use your own instance’s URL and the path to your own project. For example, a project named hello in the teklia group on gitlab.com will use the following command:
git remote add origin git@gitlab.com:teklia/hello.git
- Push the new branch to GitLab:
git push --set-upstream origin master
If you want to push a different branch, you first need to create it. For example, if you want to push to a new branch named bootstrap, you will use:
git checkout -b bootstrap
git push --set-upstream origin bootstrap
-
Open your GitLab project in a browser.
-
Click on the blue icon indicating that CI is running on your repository, and wait for it to turn green to confirm everything worked.
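The remote URL used in the steps above always follows the same pattern: the GitLab host, a colon, the path to the project, and a .git suffix. As a sketch of that pattern, the hypothetical helper below (gitlab_ssh_remote is not a Git or GitLab command) builds the URL from its two parts.

```shell
# Hypothetical helper: build the SSH remote URL of a GitLab project.
# $1 is the GitLab host, $2 is the path to the project (group/project).
gitlab_ssh_remote() {
    printf 'git@%s:%s.git\n' "$1" "$2"
}

# Reproduces the example from the steps above.
gitlab_ssh_remote gitlab.com teklia/hello   # prints git@gitlab.com:teklia/hello.git
```

Once the remote has been added, running git remote -v inside the worker directory lists the URLs that Git will fetch from and push to, which is a quick way to spot a typo in the host or path.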
Setting up your development environment¶
This section guides you through setting up a Python development environment specifically for your worker.
Activating the pre-commit hook¶
The official template includes code style checks, such as detecting trailing whitespace, as well as code formatting using black. These checks run on GitLab as soon as you push new code, but you can also run them automatically whenever you create a commit by using the pre-commit hook.
To activate the pre-commit hook¶
- Run pre-commit install.
Setting up the Python virtual environment¶
To install Python dependencies that are specific to your worker, and prevent other dependencies installed on your system from interfering, it is recommended to use a virtual environment.
To set up a Python virtual environment¶
- Run mkvirtualenv my_worker, where my_worker is any name of your choice.
- Install your worker in editable mode:
pip install -e .
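Installing the worker outside of the virtual environment is an easy mistake to make, for example after opening a new terminal. The sketch below relies on the VIRTUAL_ENV environment variable, which is set while a virtual environment is active; the function name in_virtualenv is hypothetical.

```shell
# Hypothetical helper: succeeds when a Python virtual environment is active.
in_virtualenv() {
    [ -n "${VIRTUAL_ENV:-}" ]
}

if in_virtualenv; then
    echo "Active virtual environment: $VIRTUAL_ENV"
else
    echo "No virtual environment is active, run workon my_worker first"
fi
```

Here my_worker stands for whichever name you passed to mkvirtualenv.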