🧪 Beaker#

Important

To use this integration you should install tango with the “beaker” extra (e.g. pip install 'tango[beaker]') or just install the beaker-py library after the fact (e.g. pip install beaker-py).

Components for Tango integration with Beaker.

Reference#

class tango.integrations.beaker.BeakerWorkspace(beaker_workspace, **kwargs)[source]#

This is a Workspace that stores step artifacts on Beaker.

Tip

Registered as a Workspace under the name “beaker”.

Parameters
  • beaker_workspace (str) – The name or ID of the Beaker workspace to use.

  • kwargs – Additional keyword arguments passed to Beaker.from_env().
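For example, the workspace can be selected in your tango.yml settings file (the workspace name ai2/my-workspace below is a placeholder):

```yaml
workspace:
  type: beaker
  beaker_workspace: ai2/my-workspace
```

Equivalently, you should be able to point the tango run command at the workspace with a URL of the form beaker://ai2/my-workspace.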

class tango.integrations.beaker.BeakerStepCache(beaker_workspace=None, beaker=None)[source]#

This is a StepCache that’s used by BeakerWorkspace. It stores the results of steps on Beaker as datasets.

It also keeps a limited in-memory cache as well as a local backup on disk, so fetching a step’s result on subsequent occasions should be fast.

Tip

Registered as a StepCache under the name “beaker”.

Parameters
  • beaker_workspace (Optional[str], default: None) – The name or ID of the Beaker workspace to use.

  • beaker (Optional[Beaker], default: None) – The Beaker client to use.

class tango.integrations.beaker.BeakerExecutor(workspace, clusters, include_package=None, beaker_workspace=None, github_token=None, beaker_image=None, docker_image=None, datasets=None, env_vars=None, venv_name=None, parallelism=-1, install_cmd=None, **kwargs)[source]#

This is an Executor that runs steps on Beaker. Each step is run as its own Beaker experiment.

Tip

Registered as an Executor under the name “beaker”.

Important

The BeakerExecutor requires that you run Tango from within a GitHub repository and that you push all of your changes prior to each tango run call. It also requires a GitHub personal access token with at least the “repo” scope, set as the GITHUB_TOKEN environment variable (you can also set it using the github_token parameter, see below).

This is because the BeakerExecutor has to be able to clone your code from GitHub onto Beaker.

Important

The BeakerExecutor will try to recreate your Python environment on Beaker every time a step is run, so it’s important that you specify all of your dependencies in a PIP requirements.txt file, setup.py file, or a conda environment.yml file. Alternatively you could provide the install_cmd argument.
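If your project needs a custom install step, install_cmd can be set in the executor section of your settings file. A minimal sketch (the workspace name and the exact install command are placeholders for your own setup):

```yaml
executor:
  type: beaker
  beaker_workspace: ai2/my-workspace
  install_cmd: "pip install '.[all]'"
  clusters:
    - ai2/general-cirrascale
```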

Important

The BeakerExecutor takes no responsibility for saving the results of steps that it runs on Beaker. That’s the job of your workspace. So make sure you’re using the right type of workspace or your results will be lost.

For example, any “remote” workspace (like the BeakerWorkspace) would work, or in some cases you could use a LocalWorkspace on an NFS drive.

Important

If you’re running a step that requires special hardware, e.g. a GPU, you should specify that in the step_resources parameter to the step, or by overriding the step’s .resources() property method.

Parameters
  • workspace (Workspace) – The Workspace to use.

  • clusters (List[str]) – A list of Beaker clusters that the executor may use to run steps.

  • include_package (Optional[Sequence[str]], default: None) – A list of Python packages to import before running steps.

  • beaker_workspace (Optional[str], default: None) – The name or ID of the Beaker workspace to use.

  • github_token (Optional[str], default: None) – You can use this parameter to set a GitHub personal access token instead of using the GITHUB_TOKEN environment variable.

  • beaker_image (Optional[str], default: None) – The name or ID of a Beaker image to use for running steps on Beaker. The image must come with bash and conda installed (Miniconda is okay). This is mutually exclusive with the docker_image parameter. If neither beaker_image nor docker_image is specified, the DEFAULT_BEAKER_IMAGE will be used.

  • docker_image (Optional[str], default: None) – The name of a publicly-available Docker image to use for running steps on Beaker. The image must come with bash and conda installed (Miniconda is okay). This is mutually exclusive with the beaker_image parameter.

  • datasets (Optional[List[DataMount]], default: None) – External data sources to mount into the Beaker job for each step. You could use this to mount an NFS drive, for example.

  • env_vars (Optional[List[EnvVar]], default: None) – Environment variables to set in the Beaker job for each step.

  • venv_name (Optional[str], default: None) – The name of the conda virtual environment to use or create on the image. If you’re using your own image that already has a conda environment you want to use, you should set this variable to the name of that environment. You can also set this to “base” to use the base environment.

  • parallelism (Optional[int], default: -1) – Control the maximum number of steps run in parallel on Beaker.

  • install_cmd (Optional[str], default: None) – Override the command used to install your code and its dependencies in each Beaker job. For example, you could set install_cmd="pip install .[dev]".

  • kwargs – Additional keyword arguments passed to Beaker.from_env().

Attention

Certain parameters should not be included in the executor part of your tango.yml file, namely workspace and include_package. Instead use the top-level workspace and include_package fields, respectively.
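For instance, a settings file that keeps those fields at the top level might look like this (the workspace name and the package name are placeholders):

```yaml
workspace:
  type: beaker
  beaker_workspace: ai2/my-workspace

include_package:
  - my_project.steps

executor:
  type: beaker
  beaker_workspace: ai2/my-workspace
  clusters:
    - ai2/general-cirrascale
```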

Examples

Minimal tango.yml file

You can use this executor by specifying it in your tango.yml settings file:

executor:
  type: beaker
  beaker_workspace: ai2/my-workspace
  clusters:
    - ai2/general-cirrascale

Using GPUs

If you have a step that requires a GPU, there are two things you need to do:

1. First, you’ll need to ensure that the BeakerExecutor can install your dependencies the right way to support the GPU hardware. There are usually two ways to do this: use a Docker image that comes with a proper installation of your hardware-specific dependencies (e.g. PyTorch), or add a conda environment.yml file to your project that specifies the proper version of those dependencies.

If you go with the first option you don’t necessarily need to build your own Docker image. If PyTorch is the only hardware-specific dependency you have, you could just use one of AI2’s pre-built PyTorch images. Just add these lines to your tango.yml file:

 executor:
   type: beaker
   beaker_workspace: ai2/my-workspace
+  docker_image: ghcr.io/allenai/pytorch:1.12.0-cuda11.3-python3.9
+  venv_name: base
   clusters:
     - ai2/general-cirrascale

The venv_name: base line tells the BeakerExecutor to use the existing conda environment called “base” on the image instead of creating a new one.

Alternatively, you could use the default image and just add a conda environment.yml file to the root of your project that looks like this:

name: torch-env
channels:
  - pytorch
dependencies:
  - python=3.9
  - cudatoolkit=11.3
  - numpy
  - pytorch
  - ...

2. And second, you’ll need to specify the GPUs required by each step in the config for that step under the step_resources parameter. For example,

"steps": {
    "train": {
        "type": "torch::train",
        "step_resources": {
            "gpu_count": 1
        }
    }
}
DEFAULT_BEAKER_IMAGE: str = 'ai2/conda'#

The default image. Used if neither beaker_image nor docker_image is set.