🧪 Beaker

Important

To use this integration you should install tango with the "beaker" extra (e.g. `pip install tango[beaker]`) or just install the beaker-py library after the fact (e.g. `pip install beaker-py`).

Components for Tango integration with Beaker.
Reference
- class `tango.integrations.beaker.BeakerWorkspace(workspace, max_workers=None, **kwargs)` [source]

  This is a `Workspace` that stores step artifacts on Beaker.

  Tip

  Registered as a `Workspace` under the name "beaker".

  - Parameters:
    - **workspace** (`str`) – The name or ID of the Beaker workspace to use.
    - **kwargs** – Additional keyword arguments passed to `Beaker.from_env()`.
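Because the workspace is registered under the name "beaker", you can select it from a settings file rather than constructing it in code. A minimal sketch, assuming the standard `tango.yml` schema where the `workspace` block is passed to the registrable `Workspace` (the workspace name `ai2/my-workspace` is a placeholder):

```yaml
workspace:
  type: beaker
  workspace: ai2/my-workspace
```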
- class `tango.integrations.beaker.BeakerStepCache(beaker_workspace=None, beaker=None)` [source]

  This is a `StepCache` that's used by `BeakerWorkspace`. It stores the results of steps on Beaker as datasets.

  It also keeps a limited in-memory cache as well as a local backup on disk, so fetching a step's result multiple times should be fast.

  Tip

  Registered as a `StepCache` under the name "beaker".
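The lookup order described above (memory first, then the local disk backup, then the remote store) is a common tiered-cache pattern. The following is a toy illustration of that pattern only, not the real `BeakerStepCache` implementation; the class and function names here are invented for the sketch:

```python
import json
import tempfile
from pathlib import Path


class TieredCache:
    """Toy stand-in showing the memory -> disk -> remote lookup order."""

    def __init__(self, disk_dir: Path, fetch_remote):
        self._memory = {}
        self._disk = disk_dir
        self._fetch_remote = fetch_remote  # slow path, e.g. downloading a Beaker dataset

    def get(self, key: str):
        if key in self._memory:  # fastest: in-memory cache
            return self._memory[key]
        path = self._disk / f"{key}.json"
        if path.exists():  # local backup on disk
            value = json.loads(path.read_text())
        else:  # fall back to the remote store
            value = self._fetch_remote(key)
            path.write_text(json.dumps(value))  # keep a local backup
        self._memory[key] = value
        return value


calls = []


def fetch_remote(key):
    calls.append(key)
    return {"result": key.upper()}


cache = TieredCache(Path(tempfile.mkdtemp()), fetch_remote)
first = cache.get("step1")   # triggers one remote fetch
second = cache.get("step1")  # served from memory
print(len(calls))  # 1
```

Only the first `get` hits the remote store; subsequent fetches of the same step's result are served locally, which is why repeated fetches should be fast.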
- class `tango.integrations.beaker.BeakerExecutor(workspace, clusters=None, include_package=None, beaker_workspace=None, github_token=None, google_token=None, beaker_image=None, docker_image=None, datasets=None, env_vars=None, venv_name=None, parallelism=None, install_cmd=None, priority=None, allow_dirty=False, scheduler=None, **kwargs)` [source]

  This is an `Executor` that runs steps on Beaker. Each step is run as its own Beaker experiment.

  Tip

  Registered as an `Executor` under the name "beaker".

  Important

  The `BeakerExecutor` requires that you run Tango within a GitHub repository and that you push all of your changes prior to each `tango run` call. It also requires a GitHub personal access token with at least the "repo" scope, set in the environment variable `GITHUB_TOKEN` (you can also set it using the `github_token` parameter, see below). This is because the `BeakerExecutor` has to be able to clone your code from Beaker.

  Important

  The `BeakerExecutor` will try to recreate your Python environment on Beaker every time a step is run, so it's important that you specify all of your dependencies in a PIP `requirements.txt` file, a `setup.py` file, or a conda `environment.yml` file. Alternatively, you could provide the `install_cmd` argument.

  Important

  The `BeakerExecutor` takes no responsibility for saving the results of steps that it runs on Beaker. That's the job of your workspace. So make sure you're using the right type of workspace or your results will be lost. For example, any "remote" workspace (like the `BeakerWorkspace`) would work, or in some cases you could use a `LocalWorkspace` on an NFS drive.

  Important

  If you're running a step that requires special hardware, e.g. a GPU, you should specify that in the `step_resources` parameter to the step, or by overriding the step's `.resources()` property method.

  - Parameters:
    - **clusters** (`Optional[List[str]]`, default: `None`) – A list of Beaker clusters that the executor may use to run steps. If `scheduler` is specified, this argument is ignored.
    - **include_package** (`Optional[Sequence[str]]`, default: `None`) – A list of Python packages to import before running steps.
    - **beaker_workspace** (`Optional[str]`, default: `None`) – The name or ID of the Beaker workspace to use.
    - **github_token** (`Optional[str]`, default: `None`) – You can use this parameter to set a GitHub personal access token instead of using the `GITHUB_TOKEN` environment variable.
    - **google_token** (`Optional[str]`, default: `None`) – You can use this parameter to set a Google Cloud token instead of using the `GOOGLE_TOKEN` environment variable.
    - **beaker_image** (`Optional[str]`, default: `None`) – The name or ID of a Beaker image to use for running steps on Beaker. The image must come with bash and conda installed (Miniconda is okay). This is mutually exclusive with the `docker_image` parameter. If neither `beaker_image` nor `docker_image` is specified, the `DEFAULT_BEAKER_IMAGE` will be used.
    - **docker_image** (`Optional[str]`, default: `None`) – The name of a publicly-available Docker image to use for running steps on Beaker. The image must come with bash and conda installed (Miniconda is okay). This is mutually exclusive with the `beaker_image` parameter.
    - **datasets** (`Optional[List[DataMount]]`, default: `None`) – External data sources to mount into the Beaker job for each step. You could use this to mount an NFS drive, for example.
    - **env_vars** (`Optional[List[EnvVar]]`, default: `None`) – Environment variables to set in the Beaker job for each step.
    - **venv_name** (`Optional[str]`, default: `None`) – The name of the conda virtual environment to use or create on the image. If you're using your own image that already has a conda environment you want to use, you should set this variable to the name of that environment. You can also set this to "base" to use the base environment.
    - **parallelism** (`Optional[int]`, default: `None`) – Control the maximum number of steps run in parallel on Beaker.
    - **install_cmd** (`Optional[str]`, default: `None`) – Override the command used to install your code and its dependencies in each Beaker job. For example, you could set `install_cmd="pip install .[dev]"`.
    - **priority** (`Union[str, Priority, None]`, default: `None`) – The default task priority to assign to jobs run on Beaker. If `scheduler` is specified, this argument is ignored.
    - **scheduler** (`Optional[BeakerScheduler]`, default: `None`) – A `BeakerScheduler` to use for assigning resources to steps. If not specified, the `SimpleBeakerScheduler` is used with the given `clusters` and `priority`.
    - **allow_dirty** (`bool`, default: `False`) – By default, the Beaker executor requires that your git working directory has no uncommitted changes. If you set this to `True`, the check is skipped.
    - **kwargs** – Additional keyword arguments passed to `Beaker.from_env()`.
  Attention

  Certain parameters should not be included in the `executor` part of your `tango.yml` file, namely `workspace` and `include_package`. Instead use the top-level `workspace` and `include_package` fields, respectively.

  Examples:
  Minimal tango.yaml file

  You can use this executor by specifying it in your `tango.yml` settings file:

  ```yaml
  executor:
    type: beaker
    beaker_workspace: ai2/my-workspace
    clusters:
      - ai2/general-cirrascale
  ```
  Using GPUs

  If you have a step that requires a GPU, there are two things you need to do:

  1. First, you'll need to ensure that the `BeakerExecutor` can install your dependencies the right way to support the GPU hardware. There are usually two ways to do this: use a Docker image that comes with a proper installation of your hardware-specific dependencies (e.g. PyTorch), or add a conda `environment.yml` file to your project that specifies the proper version of those dependencies.

     If you go with the first option you don't necessarily need to build your own Docker image. If PyTorch is the only hardware-specific dependency you have, you could just use one of AI2's pre-built PyTorch images. Just add these lines to your `tango.yml` file:

     ```diff
      executor:
        type: beaker
        beaker_workspace: ai2/my-workspace
     +  docker_image: ghcr.io/allenai/pytorch:1.12.0-cuda11.3-python3.9
     +  venv_name: base
        clusters:
          - ai2/general-cirrascale
     ```

     The `venv_name: base` line tells the `BeakerExecutor` to use the existing conda environment called "base" on the image instead of creating a new one.

     Alternatively, you could use the default image and just add a conda `environment.yml` file to the root of your project that looks like this:

     ```yaml
     name: torch-env
     channels:
       - pytorch
     dependencies:
       - python=3.9
       - cudatoolkit=11.3
       - numpy
       - pytorch
       - ...
     ```
  2. And second, you'll need to specify the GPUs required by each step in the config for that step under the `step_resources` parameter. For example:

     ```json
     "steps": {
         "train": {
             "type": "torch::train",
             "step_resources": {
                 "gpu_count": 1
             }
         }
     }
     ```
- class `tango.integrations.beaker.BeakerScheduler` [source]

  A `BeakerScheduler` is responsible for determining which resources and priority to assign to the execution of a step.

  - abstract `schedule(step)` [source]

    Determine the `ResourceAssignment` for a step.

    - Raises:
      **ResourceAssignmentError** – If the scheduler can't find enough free resources at the moment to run the step.

    - Return type:
      `ResourceAssignment`

  - `default_implementation: Optional[str] = 'simple'`

    The default implementation is `SimpleBeakerScheduler`.
- class `tango.integrations.beaker.SimpleBeakerScheduler(clusters, priority)` [source]

  The `SimpleBeakerScheduler` just searches the given clusters for one with enough resources to match what's specified by the step's required resources.
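To make the scheduling contract concrete, here is a toy sketch of the "search the clusters for one with enough resources" idea. All of the classes below (`Cluster`, `StepRequirements`, `NoResourcesError`) are invented stand-ins for illustration; they are not the real `BeakerScheduler`, `ResourceAssignment`, or `ResourceAssignmentError` types, whose fields are not documented here:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cluster:
    """Stand-in for a Beaker cluster with some free capacity."""
    name: str
    free_gpus: int


@dataclass
class StepRequirements:
    """Stand-in for a step's required resources (cf. step_resources)."""
    gpu_count: int = 0


class NoResourcesError(Exception):
    """Raised when no cluster can satisfy the step's requirements,
    analogous to ResourceAssignmentError."""


def schedule(step: StepRequirements, clusters: List[Cluster]) -> Cluster:
    # Mirror the SimpleBeakerScheduler idea: take the first cluster
    # with enough free resources to match the step's requirements.
    for cluster in clusters:
        if cluster.free_gpus >= step.gpu_count:
            return cluster
    raise NoResourcesError("no cluster has enough free GPUs")


clusters = [Cluster("ai2/cpu-only", 0), Cluster("ai2/general-cirrascale", 4)]
chosen = schedule(StepRequirements(gpu_count=1), clusters)
print(chosen.name)  # ai2/general-cirrascale
```

A real scheduler would return a full resource assignment (cluster, priority, etc.) rather than just a cluster, and may raise when resources are temporarily unavailable so the executor can retry later.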