Workspace
Base class
- class tango.workspace.Workspace
A workspace is a place for Tango to put the results of steps, intermediate results, and various other pieces of metadata. If you don’t want to worry about all that, do nothing and Tango will use the default LocalWorkspace that puts everything into a directory of your choosing.
If you want to do fancy things like store results in the cloud, share state across machines, etc., this is your integration point.
If you got here solely because you want to share results between machines, consider that LocalWorkspace works fine on an NFS drive.
- capture_logs_for_run(name)
Should return a context manager that can be used to capture the logs for a run.
By default, this doesn’t do anything.
- Return type:
AbstractContextManager[None]
Examples
The LocalWorkspace implementation uses this method to capture logs to a file in the workspace’s directory using the file_handler() context manager, similar to this:

    from tango.common.logging import file_handler
    from tango.workspace import Workspace

    class MyLocalWorkspace(Workspace):
        def capture_logs_for_run(self, name: str):
            return file_handler("/path/to/workspace/" + name + ".log")
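To illustrate the general idea without depending on Tango, here is a minimal sketch of a log-capturing context manager built only on the standard library. The function name capture_to_file and the logger name "demo_run" are hypothetical, and this is not how file_handler() is implemented; it only shows the shape of object that capture_logs_for_run() is expected to return.

```python
import contextlib
import logging
import os
import tempfile


@contextlib.contextmanager
def capture_to_file(path, logger_name="demo_run"):
    """Hypothetical stand-in for a log-capturing context manager."""
    logger = logging.getLogger(logger_name)
    handler = logging.FileHandler(path)
    logger.addHandler(handler)
    try:
        # Nothing useful to hand back, mirroring AbstractContextManager[None].
        yield None
    finally:
        # Detach and close the handler so later records are not captured.
        logger.removeHandler(handler)
        handler.close()


log_path = os.path.join(tempfile.mkdtemp(), "my-run.log")
run_logger = logging.getLogger("demo_run")
run_logger.setLevel(logging.INFO)

with capture_to_file(log_path):
    run_logger.info("inside the run")
run_logger.info("after the run")  # emitted after the handler was removed

with open(log_path) as f:
    captured = f.read()
```

Only the record emitted inside the with block ends up in the file, which is the behavior a per-run log capture needs.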
- abstract classmethod from_parsed_url(parsed_url)
Subclasses should override this so that they can be initialized from a URL.
- Parameters:
parsed_url (ParseResult) – The parsed URL object.
- Return type:
- classmethod from_url(url)
Initialize a Workspace from a workspace URL or path, e.g. local:///tmp/workspace would give you a LocalWorkspace in the directory /tmp/workspace.
For LocalWorkspace, you can also just pass in a plain path, e.g. /tmp/workspace.
- Return type:
Workspace
Tip
Registered as a workspace constructor under the name “from_url”.
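As a rough illustration of what a workspace URL looks like once parsed, the sketch below uses urllib.parse.urlparse from the standard library. The helper name describe_workspace_url and the fallback to "local" for plain paths are assumptions made for this example, based on the description above; this is not Tango's actual dispatch logic.

```python
from urllib.parse import urlparse


def describe_workspace_url(url):
    """Hypothetical helper: split a workspace URL into (scheme, path)."""
    parsed = urlparse(url)
    # Assumption: a plain path with no scheme is treated as a local workspace.
    scheme = parsed.scheme or "local"
    return scheme, parsed.path


from_url_form = describe_workspace_url("local:///tmp/workspace")
from_path_form = describe_workspace_url("/tmp/workspace")
```

Both forms resolve to the same scheme and directory, which matches the statement that a plain path is accepted for LocalWorkspace. A subclass's from_parsed_url() would receive the ParseResult object and read whichever fields it needs.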
- abstract register_run(targets, name=None)
Register a run in the workspace. A run is a set of target steps that a user wants to execute.
- search_registered_runs(*, sort_by=None, sort_descending=True, match=None, start=0, stop=None)
Search through registered runs in the workspace.
This method is primarily meant to be used to implement a UI, and workspaces don’t necessarily need to implement all sort_by or filter operations. They should only implement those that can be done efficiently.
Note
The data type returned in the list here is RunInfo, which contains a subset of the data in the Run type.
- Parameters:
sort_by (Optional[RunSort], default: None) – The field to sort the results by.
sort_descending (bool, default: True) – Sort the results in descending order of the sort_by field.
match (Optional[str], default: None) – Only return results with a name matching this string.
start (int, default: 0) – Start from a certain index in the results.
stop (Optional[int], default: None) – Stop at a certain index in the results.
- Raises:
NotImplementedError – If a workspace doesn’t support an efficient implementation for the given sorting/filtering criteria.
- Return type:
List[RunInfo]
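The interplay of the keyword arguments can be sketched over an in-memory list. Everything here is an assumption made for illustration: RunRecord is a hypothetical stand-in for RunInfo, sort fields are plain strings rather than RunSort values, and match is treated as a substring test. The sketch also shows the documented escape hatch of raising NotImplementedError for criteria a workspace cannot serve efficiently.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class RunRecord:
    """Hypothetical stand-in for RunInfo."""
    name: str
    start_date: datetime


def search_runs(runs, *, sort_by=None, sort_descending=True,
                match=None, start=0, stop=None):
    # Refuse sort fields this toy workspace does not support.
    if sort_by not in (None, "name", "start_date"):
        raise NotImplementedError(f"cannot sort by {sort_by!r}")
    # Assumption: "matching" means the string occurs in the run name.
    results = [r for r in runs if match is None or match in r.name]
    if sort_by is not None:
        results.sort(key=lambda r: getattr(r, sort_by),
                     reverse=sort_descending)
    # start/stop behave like slice bounds over the filtered, sorted results.
    return results[start:stop]


runs = [
    RunRecord("alpha", datetime(2023, 1, 1)),
    RunRecord("beta", datetime(2023, 1, 3)),
    RunRecord("alpha-2", datetime(2023, 1, 2)),
]
page = search_runs(runs, sort_by="start_date", match="alpha", stop=1)
page_names = [r.name for r in page]
```

Filtering happens before sorting and paging, so start and stop index into the already-filtered results, which is what a paginated UI would typically expect.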
- search_step_info(*, sort_by=None, sort_descending=True, match=None, state=None, start=0, stop=None)
Search through steps in the workspace.
This method is primarily meant to be used to implement a UI, and workspaces don’t necessarily need to implement all sort_by or filter operations. They should only implement those that can be done efficiently.
- Parameters:
sort_by (Optional[StepInfoSort], default: None) – The field to sort the results by.
sort_descending (bool, default: True) – Sort the results in descending order of the sort_by field.
match (Optional[str], default: None) – Only return steps with a unique ID matching this string.
state (Optional[StepState], default: None) – Only return steps that are in the given state.
start (int, default: 0) – Start from a certain index in the results.
stop (Optional[int], default: None) – Stop at a certain index in the results.
- Raises:
NotImplementedError – If a workspace doesn’t support an efficient implementation for the given sorting/filtering criteria.
- Return type:
- abstract step_failed(step, e)
The Step class calls this when a step failed.
- Parameters:
step (Step) – The step that failed.
e (BaseException) – The exception thrown by the step’s Step.run() method.
- Raises:
StepStateError – If the step is in an unexpected state (e.g. RUNNING).
- Return type:
- abstract step_finished(step, result)
The Step class calls this when a step finished running.
- Parameters:
step (Step) – The step that finished.
- Raises:
StepStateError – If the step is in an unexpected state (e.g. RUNNING).
- Return type:
TypeVar(T)
This method is given the result of the step’s Step.run() method. It is expected to return that result. This gives it the opportunity to make changes to the result if necessary. For example, if the Step.run() method returns an iterator, that iterator would be consumed when it’s written to the cache. So this method can handle the situation and return something other than the now-consumed iterator.
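The iterator case can be made concrete with a small sketch. The helper finish_step and the dictionary used as a cache are hypothetical and exist only for this example; a real workspace would write to its step cache instead. The point is that materializing the iterator once lets the method both store the result and hand back something still usable.

```python
from collections.abc import Iterator


def finish_step(cache, step_name, result):
    """Hypothetical sketch of what a step_finished() implementation might do."""
    if isinstance(result, Iterator):
        # Writing an iterator to the cache would exhaust it, so materialize
        # it once and use the list for both the cache and the return value.
        result = list(result)
    cache[step_name] = result
    return result


cache = {}
returned = finish_step(cache, "numbers", iter([1, 2, 3]))
first_pass = list(returned)
second_pass = list(returned)  # still works, unlike a consumed iterator
```

Returning the original iterator here would leave the caller with an empty sequence, which is exactly the situation the method is allowed to correct.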
- step_result(step_name)
Get the result of a step from the latest run with a step by that name.
- abstract step_starting(step)
The Step class calls this when a step is about to start running.
- Parameters:
step (Step) – The step that is about to start.
- Raises:
StepStateError – If the step is in an unexpected state (e.g. RUNNING).
- Return type:
- work_dir(step)
Steps that can be restarted (like a training job that gets interrupted half-way through) must save their state somewhere. A StepCache can help by providing a suitable location in this method.
By default, the step dir is a temporary directory that gets cleaned up after every run. This effectively disables restartability of steps.
- Return type:
- abstract property url: str
Get a URL for the workspace that can be used to instantiate the same workspace using from_url().
Implementations
- class tango.workspaces.LocalWorkspace(dir)
This is a Workspace that keeps all its data in a local directory. This works great for single-machine jobs, or for multiple machines in a cluster if they can all access the same NFS drive.
The directory will have three subdirectories, cache/ for the step cache, runs/ for the runs, and latest/ for the results of the latest run. For the format of the cache/ directory, refer to LocalStepCache. The runs/ directory will contain one subdirectory for each registered run. Each one of those contains a symlink from the name of the step to the results directory in the step cache. Note that LocalWorkspace creates these symlinks even for steps that have not finished yet. You can tell the difference because either the symlink points to a directory that doesn’t exist, or it points to a directory in the step cache that doesn’t contain results.
Tip
Registered as a Workspace under the name “local”.
You can also instantiate this workspace from a URL with the scheme local://. For example, Workspace.from_url("local:///tmp/workspace") gives you a LocalWorkspace in the directory /tmp/workspace.
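The directory layout described above can be mimicked with pathlib to see how an unfinished step might be detected. This is a sketch under assumptions: the run name "my-run", the step name "train", and the cache entry name "train-abc123" are hypothetical, and only the dangling-symlink case is shown. What counts as "contains results" inside a cache directory is defined by LocalStepCache and is not modeled here.

```python
import tempfile
from pathlib import Path

# Build the three top-level subdirectories in a scratch location.
root = Path(tempfile.mkdtemp())
for sub in ("cache", "runs", "latest"):
    (root / sub).mkdir()

# One subdirectory per registered run (hypothetical run name).
run_dir = root / "runs" / "my-run"
run_dir.mkdir()

# Symlink from the step name to its (not yet existing) cache directory.
target = root / "cache" / "train-abc123"  # hypothetical cache entry name
link = run_dir / "train"
link.symlink_to(target, target_is_directory=True)


def link_is_dangling(path):
    # Path.exists() follows symlinks, so a dangling link reports False.
    return path.is_symlink() and not path.exists()


dangling_before = link_is_dangling(link)
target.mkdir()  # simulate the step cache directory being created
dangling_after = link_is_dangling(link)
```

A link that is dangling corresponds to the first of the two "not finished" cases; once the target exists, a tool would still need to check the cache directory for results before treating the step as complete.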
Metadata
- class tango.workspace.Run(name, steps, start_date)
Stores information about a single Tango run.
- class tango.workspace.RunInfo(name, steps=None, start_date=None)
Stores partial data about a run. This is the type that you get back from Workspace.search_registered_runs(). The data here is a subset of the data in the Run type because not all workspaces can fetch all of the data in the Run type efficiently.