Utilities
- class tango.common.DatasetDict(splits, metadata=<factory>)
A generic Mapping class from split names (str) to datasets (Sequence[T]).
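To make the shape of this container concrete, here is a minimal stand-alone sketch. It assumes, from the metadata=<factory> default in the signature, that the class is a dataclass whose metadata field has a default factory. SimpleDatasetDict is a hypothetical name; this is not Tango's implementation.

```python
from dataclasses import dataclass, field
from typing import Any, Generic, Iterator, Mapping, Sequence, TypeVar

T = TypeVar("T")


@dataclass
class SimpleDatasetDict(Mapping[str, Sequence[T]], Generic[T]):
    """Hypothetical sketch: a read-only mapping from split names to datasets."""

    splits: Mapping[str, Sequence[T]]
    metadata: Mapping[str, Any] = field(default_factory=dict)

    # The three abstract methods Mapping needs; keys(), items(), `in`, etc. come for free.
    def __getitem__(self, split: str) -> Sequence[T]:
        return self.splits[split]

    def __iter__(self) -> Iterator[str]:
        return iter(self.splits)

    def __len__(self) -> int:
        return len(self.splits)


data = SimpleDatasetDict(splits={"train": [1, 2, 3], "validation": [4, 5]})
print(list(data), len(data["train"]))  # ['train', 'validation'] 3
```

Implementing only the three abstract methods and inheriting the rest from Mapping keeps the sketch small while still supporting dict-style access to each split.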
- class tango.common.DatasetDictBase(splits, metadata=<factory>)
The base class for DatasetDict and IterableDatasetDict.
- class tango.common.FromParams
Mixin to give a from_params() method to classes. We create a distinct base class for this because sometimes we want non-Registrable classes to be instantiable via from_params.
- classmethod from_params(params_, constructor_to_call=None, constructor_to_inspect=None, **extras)
This is the automatic implementation of from_params. Any class that subclasses FromParams (or Registrable, which itself subclasses FromParams) gets this implementation for free. If you want your class to be instantiated from params in the “obvious” way (pop off parameters and hand them to your constructor with the same names), this provides that functionality.
If you need more complex logic in your from_params method, you’ll have to implement your own method that overrides this one.
The constructor_to_call and constructor_to_inspect arguments deal with a bit of redirection that we do. We allow you to register particular @classmethods on a class as the constructor to use for a registered name. This lets you, e.g., have a single Vocabulary class that can be constructed in two different ways, with different names registered to each constructor. In order to handle this, we need to know not just the class we’re trying to construct (cls), but also what method we should inspect to find its arguments (constructor_to_inspect), and what method to call when we’re done constructing arguments (constructor_to_call). These two methods are the same when you’ve used a @classmethod as your constructor, but they are different when you use the default constructor (because you inspect __init__, but call cls()).
- Return type: TypeVar(T, bound=FromParams)
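The “obvious” strategy described above (inspect __init__, pop matching parameters by name, then call cls()) can be sketched in a few lines. This is a simplified illustration, not Tango's implementation: simple_from_params and Encoder are hypothetical names, and the sketch does not handle type annotations, nested objects, or registered @classmethod constructors.

```python
import inspect
from typing import Any, Dict, Type, TypeVar

T = TypeVar("T")


def simple_from_params(cls: Type[T], params: Dict[str, Any]) -> T:
    """Hypothetical sketch of the "obvious" strategy: match params to __init__ by name."""
    kwargs: Dict[str, Any] = {}
    for name, parameter in inspect.signature(cls.__init__).parameters.items():
        if name == "self" or parameter.kind in (
            inspect.Parameter.VAR_POSITIONAL,
            inspect.Parameter.VAR_KEYWORD,
        ):
            continue
        if name in params:
            kwargs[name] = params.pop(name)  # consume the parameter
        elif parameter.default is inspect.Parameter.empty:
            raise ValueError(f"{cls.__name__} requires parameter {name!r}")
    if params:  # anything left over was not expected by the constructor
        raise ValueError(f"Unexpected parameters for {cls.__name__}: {sorted(params)}")
    return cls(**kwargs)  # inspect __init__, but call cls()


class Encoder:
    def __init__(self, hidden_dim: int, dropout: float = 0.1) -> None:
        self.hidden_dim = hidden_dim
        self.dropout = dropout


encoder = simple_from_params(Encoder, {"hidden_dim": 64})
print(encoder.hidden_dim, encoder.dropout)  # 64 0.1
```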
- class tango.common.IterableDatasetDict(splits, metadata=<factory>)
An “iterable” version of DatasetDict, where the dataset splits have type Iterable[T] instead of Sequence[T]. This is useful for streaming datasets.
- class tango.common.Lazy(constructor, params=None, constructor_extras=None, **kwargs)
This class is for use when constructing objects using FromParams, when an argument to a constructor has a sequential dependency with another argument to the same constructor.
For example, in a Trainer class you might want to take a Model and an Optimizer as arguments, but the Optimizer needs to be constructed using the parameters from the Model. You can give the type annotation Lazy[Optimizer] to the optimizer argument, then inside the constructor call optimizer.construct(parameters=model.parameters).
This is only recommended for use when you have registered a @classmethod as the constructor for your class, instead of using __init__. Having a Lazy[] type annotation on an argument to an __init__ method makes your class completely dependent on being constructed using the FromParams pipeline, which is not a good idea.
The actual implementation here is incredibly simple; the logic that handles the lazy construction is actually found in FromParams, where we have a special case for a Lazy type annotation.
Examples
    @classmethod
    def my_constructor(
        cls,
        some_object: Lazy[MyObject],
        optional_object: Lazy[MyObject] = None,
        # or:
        # optional_object: Optional[Lazy[MyObject]] = None,
        optional_object_with_default: Optional[Lazy[MyObject]] = Lazy(MyObjectDefault),
        required_object_with_default: Lazy[MyObject] = Lazy(MyObjectDefault),
    ) -> MyClass:
        obj1 = some_object.construct()
        obj2 = None if optional_object is None else optional_object.construct()
        obj3 = None if optional_object_with_default is None else optional_object_with_default.construct()
        obj4 = required_object_with_default.construct()
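The Trainer/Optimizer dependency described above can be illustrated without FromParams at all. The sketch below assumes a lazy wrapper only needs to remember a constructor plus the arguments known up front; SimpleLazy, Model, Optimizer, and Trainer.from_partial_objects are hypothetical names, and Tango's Lazy is additionally driven by type annotations inside FromParams, which this sketch does not reproduce. In line with the recommendation above, the lazy argument goes to a @classmethod rather than __init__.

```python
from typing import Any, Callable, Generic, List, TypeVar

T = TypeVar("T")


class SimpleLazy(Generic[T]):
    """Hypothetical stand-in for Lazy: stores a constructor plus the params known up front."""

    def __init__(self, constructor: Callable[..., T], **params: Any) -> None:
        self._constructor = constructor
        self._params = params

    def construct(self, **extras: Any) -> T:
        # Arguments that only exist later (e.g. model parameters) are supplied here.
        return self._constructor(**self._params, **extras)


class Model:
    def __init__(self) -> None:
        self.parameters: List[float] = [0.5, -1.0]


class Optimizer:
    def __init__(self, parameters: List[float], lr: float) -> None:
        self.parameters = parameters
        self.lr = lr


class Trainer:
    def __init__(self, model: Model, optimizer: Optimizer) -> None:
        self.model = model
        self.optimizer = optimizer

    @classmethod
    def from_partial_objects(cls, model: Model, optimizer: SimpleLazy[Optimizer]) -> "Trainer":
        # The optimizer can only be built once the model (and its parameters) exists.
        return cls(model, optimizer.construct(parameters=model.parameters))


trainer = Trainer.from_partial_objects(Model(), SimpleLazy(Optimizer, lr=0.01))
print(trainer.optimizer.lr)  # 0.01
```

Keeping __init__ free of the lazy type means Trainer can still be built directly from ready-made objects in ordinary Python code.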
- class tango.common.Params(params, history='')
A MutableMapping that represents a parameter dictionary with a history, and contains other functionality around parameter passing and validation for AI2 Tango.
There are currently two benefits of a Params object over a plain dictionary for parameter passing:
1. We handle a few kinds of parameter validation, including making sure that parameters representing discrete choices actually have acceptable values, and making sure no extra parameters are passed.
2. We log all parameter reads, including default values. This gives a more complete specification of the actual parameters used than is given in a JSON file, because those may not specify what default values were used, whereas this will log them.
Important
The convention for using a Params object in Tango is that you will consume the parameters as you read them, so that there are none left when you’ve read everything you expect. This lets us easily validate that you didn’t pass in any extra parameters, just by making sure that the parameter dictionary is empty. You should do this when you’re done handling parameters, by calling Params.assert_empty().
- as_dict(quiet=False, infer_type_and_cast=False)
Sometimes we need to just represent the parameters as a dict, for instance when we pass them to PyTorch code.
- as_flat_dict()
Returns the parameters as a flat dictionary from keys to values. Nested structure is collapsed with periods.
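A minimal sketch of collapsing nested structure with periods, written against a plain dict. flatten_params is a hypothetical helper, not Tango's code; the sketch assumes only nested dicts (not lists) are flattened, and empty nested dicts simply disappear.

```python
from typing import Any, Dict


def flatten_params(params: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Hypothetical sketch: collapse nested dicts into dotted keys."""
    flat: Dict[str, Any] = {}
    for key, value in params.items():
        full_key = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten_params(value, full_key))  # recurse into nested objects
        else:
            flat[full_key] = value
    return flat


config = {"model": {"type": "lstm", "encoder": {"hidden_dim": 64}}, "lr": 0.01}
print(flatten_params(config))
# {'model.type': 'lstm', 'model.encoder.hidden_dim': 64, 'lr': 0.01}
```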
- as_ordered_dict(preference_orders=None)
Returns an OrderedDict of Params from a list of partial order preferences.
- Parameters:
preference_orders (Optional[List[List[str]]], default: None) – preference_orders is a list of partial preference orders. [“A”, “B”, “C”] means “A” > “B” > “C”. If there are multiple preference orders, the first one is considered first. Keys not found in an order rank last, with ties broken alphabetically. Default preferences: [["dataset_reader", "iterator", "model", "train_data_path", "validation_data_path", "test_data_path", "trainer", "vocabulary"], ["type"]]
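One way to read the ordering rule is as a sort key: one rank per preference order (earlier orders compared first), a key missing from an order ranking after everything listed in it, and the key itself as an alphabetical tie-break. The sketch below assumes that interpretation and recurses into nested dicts; order_keys is a hypothetical helper, not Tango's implementation.

```python
from collections import OrderedDict
from typing import Any, Dict, List


def order_keys(params: Dict[str, Any], preference_orders: List[List[str]]) -> "OrderedDict[str, Any]":
    """Hypothetical sketch: sort keys (recursively) by a list of partial preference orders."""

    def sort_key(key: str) -> List[Any]:
        # One rank per preference order, earlier orders compared first;
        # a key missing from an order ranks after every key listed in it.
        ranks: List[Any] = [order.index(key) if key in order else len(order) for order in preference_orders]
        return ranks + [key]  # ties broken alphabetically

    result: "OrderedDict[str, Any]" = OrderedDict()
    for key in sorted(params, key=sort_key):
        value = params[key]
        result[key] = order_keys(value, preference_orders) if isinstance(value, dict) else value
    return result


config = {"trainer": {"epochs": 3}, "zeta": 1, "model": {"dim": 8, "type": "lstm"}, "alpha": 2}
ordered = order_keys(config, [["model", "trainer"], ["type"]])
print(list(ordered), list(ordered["model"]))  # ['model', 'trainer', 'alpha', 'zeta'] ['type', 'dim']
```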
- assert_empty(name)
Raises a ConfigurationError if self.params is not empty. We take name as an argument so that the error message gives some idea of where an error happened, if there was one. For example, name could be the name of the calling class that got extra parameters (if there are any).
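The consume-then-verify convention described for Params can be shown with a plain dict: pop every parameter you understand, then check that nothing is left. ConfigurationError, assert_empty, and Encoder below are hypothetical stand-ins written for this sketch, not Tango's classes.

```python
from typing import Any, Dict


class ConfigurationError(Exception):
    """Hypothetical stand-in for Tango's ConfigurationError."""


def assert_empty(params: Dict[str, Any], name: str) -> None:
    # Anything still present was never consumed, so it is unexpected.
    if params:
        raise ConfigurationError(f"Extra parameters passed to {name}: {params}")


class Encoder:
    def __init__(self, params: Dict[str, Any]) -> None:
        self.hidden_dim = params.pop("hidden_dim")  # consume what we understand...
        self.dropout = params.pop("dropout", 0.0)
        assert_empty(params, "Encoder")  # ...then check nothing is left over


Encoder({"hidden_dim": 64, "dropout": 0.1})  # fine
try:
    Encoder({"hidden_dim": 64, "droput": 0.1})  # note the typo
except ConfigurationError as err:
    print(err)  # Extra parameters passed to Encoder: {'droput': 0.1}
```

Because unknown keys are reported rather than silently ignored, a misspelled option fails loudly instead of falling back to a default.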
- duplicate()
Uses copy.deepcopy() to create a duplicate (but fully distinct) copy of these Params.
- classmethod from_file(params_file, params_overrides='', ext_vars=None)
Load a Params object from a configuration file.
- Parameters:
params_file (Union[str, PathLike]) – The path to the configuration file to load. Can be JSON, Jsonnet, or YAML.
params_overrides (Union[str, Dict[str, Any]], default: '') – A dict of overrides that can be applied to the final object, e.g. {"model.embedding_dim": 10} will change the value of “embedding_dim” within the “model” object of the config to 10. If you wanted to override the entire “model” object of the config, you could do {"model": {"type": "other_type", ...}}.
ext_vars (Optional[dict], default: None) – Our config files are Jsonnet, which allows specifying external variables for later substitution. Typically we substitute these using environment variables; however, you can also specify them here, in which case they take priority over environment variables, e.g. {"HOME_DIR": "/Users/allennlp/home"}.
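The dotted override syntax can be sketched as walking (or creating) the intermediate objects and assigning the leaf. apply_overrides is a hypothetical helper operating on plain dicts; it replaces the value at the target key wholesale, whereas Tango may merge nested dicts differently, so treat this only as an illustration of the key syntax.

```python
import copy
from typing import Any, Dict


def apply_overrides(config: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical sketch: apply dotted-key overrides to a copy of a nested config."""
    result = copy.deepcopy(config)  # leave the caller's config untouched
    for dotted_key, value in overrides.items():
        *parents, leaf = dotted_key.split(".")
        node = result
        for part in parents:
            node = node.setdefault(part, {})  # walk (or create) intermediate objects
        node[leaf] = value  # replaces the whole value at this key
    return result


config = {"model": {"type": "lstm", "embedding_dim": 100}, "lr": 0.1}
print(apply_overrides(config, {"model.embedding_dim": 10}))
# {'model': {'type': 'lstm', 'embedding_dim': 10}, 'lr': 0.1}
```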
- get(key, default=<object object>)
Performs the functionality associated with dict.get(key) but also checks for returned dicts and returns a Params object in their place with an updated history.
- get_hash()
Returns a hash code representing the current state of this Params object. We don’t want to implement __hash__ because that has deeper Python implications (and this is a mutable object), but this will give you a representation of the current state. We use zlib.adler32 instead of Python’s builtin hash because the random seed for the latter is reset on each new program invocation, as discussed here: https://stackoverflow.com/questions/27954892/deterministic-hashing-in-python-3.
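A minimal sketch of a run-to-run stable hash using zlib.adler32. It assumes the parameters are JSON-serializable and that serializing with sorted keys is an acceptable canonical form; stable_params_hash is a hypothetical name, and how Tango serializes the params before hashing is an assumption here. Note that adler32 is a fast checksum, not a cryptographic hash, so collisions are possible.

```python
import json
import zlib
from typing import Any, Dict


def stable_params_hash(params: Dict[str, Any]) -> str:
    """Hypothetical sketch: a hash that is identical across interpreter runs."""
    # sort_keys=True makes the serialization independent of key insertion order.
    dumped = json.dumps(params, sort_keys=True)
    return str(zlib.adler32(dumped.encode("utf-8")))


a = stable_params_hash({"model": {"type": "lstm"}, "lr": 0.01})
b = stable_params_hash({"lr": 0.01, "model": {"type": "lstm"}})
print(a == b)  # True
```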
- pop(key, default=<object object>, keep_as_dict=False)
Performs the functionality associated with dict.pop(key), along with checking for returned dictionaries, replacing them with Params objects with an updated history (unless keep_as_dict is True, in which case we leave them as dictionaries).
If key is not present in the dictionary, and no default was specified, we raise a ConfigurationError, instead of the typical KeyError.
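The default=<object object> in the signature suggests a private sentinel object, which lets callers pass default=None explicitly while still detecting “no default given”. The sketch below assumes that pattern; pop_param and ConfigurationError are hypothetical stand-ins, not Tango's code.

```python
from typing import Any, Dict


class ConfigurationError(Exception):
    """Hypothetical stand-in for Tango's ConfigurationError."""


_NO_DEFAULT = object()  # sentinel: lets callers pass default=None explicitly


def pop_param(params: Dict[str, Any], key: str, default: Any = _NO_DEFAULT) -> Any:
    if default is _NO_DEFAULT:
        try:
            return params.pop(key)
        except KeyError:
            # Replace the bare KeyError with a configuration-oriented error.
            raise ConfigurationError(f'key "{key}" is required') from None
    return params.pop(key, default)


params = {"lr": 0.01}
print(pop_param(params, "lr"), pop_param(params, "momentum", None))  # 0.01 None
```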
- pop_choice(key, choices, default_to_first_choice=False, allow_class_names=True)
Gets the value of key in the params dictionary, ensuring that the value is one of the given choices. Note that this pops the key from params, modifying the dictionary, consistent with how parameters are processed in this codebase.
- Parameters:
key (str) – Key to get the value from in the param dictionary.
choices (List[Any]) – A list of valid options for values corresponding to key. For example, if you’re specifying the type of encoder to use for some part of your model, the choices might be the list of encoder classes we know about and can instantiate. If the value we find in the param dictionary is not in choices, we raise a ConfigurationError, because the user specified an invalid value in their parameter file.
default_to_first_choice (bool, default: False) – If this is True, we allow the key to not be present in the parameter dictionary. If the key is not present, we will use the first choice in the choices list as the value. If this is False, we raise a ConfigurationError, because specifying the key is required (e.g., you have to specify your model class when running an experiment, but you can feel free to use default settings for encoders if you want).
allow_class_names (bool, default: True) – If this is True, then we allow unknown choices that look like fully-qualified class names. This is to allow e.g. specifying a model type as my_library.my_model.MyModel and importing it on the fly. Our check for “looks like” is extremely lenient and consists of checking that the value contains a ‘.’.
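The three rules above (pop the key, fall back to the first choice only when allowed, accept dotted names as class names) can be sketched against a plain dict. pop_choice_sketch and ConfigurationError are hypothetical stand-ins, not Tango's implementation.

```python
from typing import Any, Dict, List


class ConfigurationError(Exception):
    """Hypothetical stand-in for Tango's ConfigurationError."""


def pop_choice_sketch(
    params: Dict[str, Any],
    key: str,
    choices: List[Any],
    default_to_first_choice: bool = False,
    allow_class_names: bool = True,
) -> Any:
    if key in params:
        value = params.pop(key)  # pop, so the key is consumed
    elif default_to_first_choice:
        value = choices[0]
    else:
        raise ConfigurationError(f"{key!r} is required; choose one of {choices}")
    # Lenient "looks like a class name" check: the value merely contains a '.'.
    looks_like_class_name = allow_class_names and isinstance(value, str) and "." in value
    if value not in choices and not looks_like_class_name:
        raise ConfigurationError(f"{value!r} is not a valid {key!r}; choose one of {choices}")
    return value


params = {"encoder": "my_library.encoders.FancyEncoder"}
print(pop_choice_sketch(params, "encoder", ["lstm", "gru"]))  # my_library.encoders.FancyEncoder
print(params)  # {}
```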
- class tango.common.Registrable
Any class that inherits from Registrable gains access to a named registry for its subclasses. To register them, just decorate them with the classmethod @BaseClass.register(name).
After that, you can call BaseClass.list_available() to get the keys for the registered subclasses, and BaseClass.by_name(name) to get the corresponding subclass. Note that the registry stores the subclasses themselves, not class instances. In most cases you would then call from_params() on the returned subclass.
You can specify a default by setting BaseClass.default_implementation. If it is set, it will be the first element of list_available().
Note that if you use this class to implement a new Registrable abstract class, you must ensure that all subclasses of the abstract class are loaded when the module is loaded, because the subclasses register themselves in their respective files. You can achieve this by having the abstract class and all subclasses in the __init__.py of the module in which they reside (as this causes any import of either the abstract class or a subclass to load all other subclasses and the abstract class).
- classmethod by_name(name)
Returns a callable that constructs an instance of the registered class. Because you can register particular functions as constructors for specific names, this isn’t necessarily the __init__ method of some class.
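The registry mechanics described above (a per-base-class table of names, a decorator that stores the class itself, and by_name returning either the class or a registered @classmethod) can be sketched as follows. SimpleRegistrable, Vocabulary, and PlainVocabulary are hypothetical names; the sketch is not Tango's implementation and omits default_implementation, from_params, and module searching.

```python
from typing import Callable, Dict, List, Optional, Tuple


class SimpleRegistrable:
    """Hypothetical sketch of a per-base-class registry; not Tango's implementation."""

    # {base class: {name: (registered subclass, constructor method name or None)}}
    _registry: Dict[type, Dict[str, Tuple[type, Optional[str]]]] = {}

    @classmethod
    def register(cls, name: str, constructor: Optional[str] = None, exist_ok: bool = False):
        registry = SimpleRegistrable._registry.setdefault(cls, {})

        def decorator(subclass: type) -> type:
            if name in registry and not exist_ok:
                raise ValueError(f"{name!r} is already registered under {cls.__name__}")
            registry[name] = (subclass, constructor)
            return subclass  # the class itself is stored, not an instance

        return decorator

    @classmethod
    def list_available(cls) -> List[str]:
        return list(SimpleRegistrable._registry.get(cls, {}))

    @classmethod
    def by_name(cls, name: str) -> Callable[..., object]:
        subclass, constructor = SimpleRegistrable._registry[cls][name]
        # Either the registered @classmethod or the class itself (i.e. its __init__).
        return getattr(subclass, constructor) if constructor else subclass


class Vocabulary(SimpleRegistrable):
    def __init__(self, tokens: List[str]) -> None:
        self.tokens = tokens

    @classmethod
    def from_text(cls, text: str) -> "Vocabulary":
        return cls(text.split())


@Vocabulary.register("plain")
@Vocabulary.register("from-text", constructor="from_text")
class PlainVocabulary(Vocabulary):
    pass


print(Vocabulary.list_available())  # ['from-text', 'plain']
vocab = Vocabulary.by_name("from-text")("a b c")
print(type(vocab).__name__, vocab.tokens)  # PlainVocabulary ['a', 'b', 'c']
```

Decorators apply bottom-up, which is why "from-text" is registered (and listed) before "plain".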
- classmethod register(name, constructor=None, exist_ok=False)
Register a class under a particular name.
- Parameters:
name (str) – The name to register the class under.
constructor (Optional[str], default: None) – The name of the method to use on the class to construct the object. If this is given, we will use this method (which must be a @classmethod) instead of the default constructor.
exist_ok (bool, default: False) – If True, overwrites any existing class registered under name. Otherwise, raises an error if a class is already registered under name.
Examples
To use this class, you would typically have a base class that inherits from Registrable:
    class Vocabulary(Registrable):
        ...
Then, if you want to register a subclass, you decorate it like this:
    @Vocabulary.register("my-vocabulary")
    class MyVocabulary(Vocabulary):
        def __init__(self, param1: int, param2: str):
            ...
Registering a class like this will let you instantiate a class from a config file, where you give "type": "my-vocabulary", and keys corresponding to the parameters of the __init__ method (note that for this to work, those parameters must have type annotations).
If you want to have the instantiation from a config file call a method other than the constructor, either because you have several different construction paths that could be taken for the same object (as we do in Vocabulary) or because you have logic you want to happen before you get to the constructor (as we do in Embedding), you can register a specific @classmethod as the constructor to use, like this:
    @Vocabulary.register("my-vocabulary-from-instances", constructor="from_instances")
    @Vocabulary.register("my-vocabulary-from-files", constructor="from_files")
    class MyVocabulary(Vocabulary):
        def __init__(self, some_params):
            ...

        @classmethod
        def from_instances(cls, some_other_params) -> "MyVocabulary":
            ...  # construct some_params from instances
            return cls(some_params)

        @classmethod
        def from_files(cls, still_other_params) -> "MyVocabulary":
            ...  # construct some_params from files
            return cls(some_params)
- classmethod resolve_class_name(name, search_modules=True)
Returns the subclass that corresponds to the given name, along with the name of the method that was registered as a constructor for that name, if any.
This method also allows name to be a fully qualified name (a module path followed by the class name), instead of a name that was already added to the Registry. In that case, you cannot use a separate function as a constructor (as you need to call cls.register() in order to tell us what separate function to use).
If the name given is not in the registry and search_modules is True, it will search for and import modules where the class might be defined according to search_modules().
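Resolving a fully qualified name typically means splitting off the last component, importing the module path, and looking up the attribute. The sketch below assumes that approach; resolve_dotted_name is a hypothetical helper, not Tango's code, and it does not handle nested classes such as module.Outer.Inner.

```python
import importlib
from typing import Any


def resolve_dotted_name(name: str) -> Any:
    """Hypothetical sketch: import "package.module.ClassName" and return the attribute."""
    module_name, _, attr_name = name.rpartition(".")
    if not module_name:
        raise ValueError(f"{name!r} is not a fully qualified name")
    module = importlib.import_module(module_name)  # import the module on the fly
    return getattr(module, attr_name)


print(resolve_dotted_name("collections.OrderedDict"))  # <class 'collections.OrderedDict'>
```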
- class tango.common.RegistrableFunction
A registrable class mimicking a Callable. This is to allow referring to functions by their name in Tango configurations.
- class tango.common.Tqdm
A tqdm wrapper that respects FILE_FRIENDLY_LOGGING and other Tango logging configurations.
- tango.common.make_registrable(name=None, *, exist_ok=False)
A decorator to create a RegistrableFunction from a function.
- tango.common.threaded_generator(g, queue_size=16)
Puts the generating side of a generator into its own thread.
Let’s say you have a generator that reads records from disk, and something that consumes the generator that spends most of its time in PyTorch. Wouldn’t it be great if you could read more records while the PyTorch code runs? If you wrap your record-reading generator with threaded_generator(inner), that’s exactly what happens. The reading code will run in a new thread, while the consuming code runs in the main thread as normal. threaded_generator() uses a queue to hand off items.
- Parameters:
queue_size (int, default: 16) – the maximum queue size for hand-offs between the main thread and the generator thread.
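The producer/consumer hand-off described above can be sketched with threading and a bounded queue.Queue: the background thread iterates the wrapped generator and blocks when the queue is full, while the caller pulls items until an end-of-stream sentinel arrives. simple_threaded_generator is a hypothetical name; this is a sketch under those assumptions, not Tango's implementation. It also forwards exceptions from the producer thread to the consumer, which is a design choice of the sketch.

```python
import queue
import threading
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")

_DONE = object()  # sentinel marking the end of the stream


class _Failure:
    """Wraps an exception raised in the producer thread so the consumer can re-raise it."""

    def __init__(self, exc: BaseException) -> None:
        self.exc = exc


def simple_threaded_generator(g: Iterable[T], queue_size: int = 16) -> Iterator[T]:
    """Hypothetical sketch: run the producing side of `g` in a background thread."""
    q: "queue.Queue[object]" = queue.Queue(maxsize=queue_size)

    def produce() -> None:
        try:
            for item in g:
                q.put(item)  # blocks while the queue is full, bounding read-ahead
        except BaseException as exc:
            q.put(_Failure(exc))
            return
        q.put(_DONE)

    # daemon=True: if the consumer stops early, a blocked producer won't keep the process alive.
    thread = threading.Thread(target=produce, daemon=True)
    thread.start()
    while True:
        item = q.get()
        if item is _DONE:
            break
        if isinstance(item, _Failure):
            raise item.exc
        yield item
    thread.join()


squares = simple_threaded_generator(x * x for x in range(5))
print(list(squares))  # [0, 1, 4, 9, 16]
```

The bounded queue is what limits memory use: the reader can run at most queue_size items ahead of the consumer.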