Utilities

class tango.common.DatasetDict(splits, metadata=<factory>)

A generic Mapping class of split names (str) to datasets (Sequence[T]).

class tango.common.DatasetDictBase(splits, metadata=<factory>)

The base class for DatasetDict and IterableDatasetDict.

keys()

Returns the split names in splits.

metadata: Mapping[str, Any]

Metadata can contain anything you need.

splits: Mapping[str, TypeVar(S)]

A mapping of dataset split names to splits.

class tango.common.FromParams

Mixin to give a from_params() method to classes. We create a distinct base class for this because sometimes we want non-Registrable classes to be instantiable via from_params().

classmethod from_params(params_, constructor_to_call=None, constructor_to_inspect=None, **extras)

This is the automatic implementation of from_params. Any class that subclasses FromParams (or Registrable, which itself subclasses FromParams) gets this implementation for free. If you want your class to be instantiated from params in the “obvious” way – pop off parameters and hand them to your constructor with the same names – this provides that functionality.

If you need more complex logic in your from_params method, you’ll have to implement your own method that overrides this one.

The constructor_to_call and constructor_to_inspect arguments deal with a bit of redirection that we do. We allow you to register particular @classmethods on a class as the constructor to use for a registered name. This lets you, e.g., have a single Vocabulary class that can be constructed in two different ways, with different names registered to each constructor. In order to handle this, we need to know not just the class we’re trying to construct (cls), but also what method we should inspect to find its arguments (constructor_to_inspect), and what method to call when we’re done constructing arguments (constructor_to_call). These two methods are the same when you’ve used a @classmethod as your constructor, but they are different when you use the default constructor (because you inspect __init__, but call cls()).

Return type: TypeVar(T, bound=FromParams)
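The “obvious” construction path, and the split between the method that is inspected and the callable that is invoked, can be approximated in a few lines. This is a minimal sketch and not Tango's implementation: `simple_from_params` and `Encoder` are hypothetical names, a plain dict stands in for Params, and type annotations, nested objects, and registered constructors are all ignored.

```python
import inspect
from typing import Any, Dict


class Encoder:
    """Hypothetical class used only for illustration."""

    def __init__(self, hidden_size: int, dropout: float = 0.1):
        self.hidden_size = hidden_size
        self.dropout = dropout


def simple_from_params(cls, params: Dict[str, Any]):
    """Pop constructor arguments by name and hand them to cls()."""
    # Inspect __init__ to discover the argument names, but call cls() itself.
    signature = inspect.signature(cls.__init__)
    kwargs = {}
    for name, parameter in signature.parameters.items():
        if name == "self":
            continue
        if name in params:
            kwargs[name] = params.pop(name)
        elif parameter.default is inspect.Parameter.empty:
            raise ValueError(f"missing required parameter: {name}")
    # Anything left over was not an argument of the constructor.
    if params:
        raise ValueError(f"unexpected parameters: {sorted(params)}")
    return cls(**kwargs)


encoder = simple_from_params(Encoder, {"hidden_size": 128})
```

Here `dropout` falls back to its default because the key is absent, which mirrors the idea that defaults come from the constructor signature.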

to_params()

Returns a Params object that can be used with .from_params() to recreate an object just like it.

This relies on _to_params(). If you need this in your custom FromParams class, override _to_params(), not this method.

Return type: Params

class tango.common.IterableDatasetDict(splits, metadata=<factory>)

An “iterable” version of DatasetDict, where the dataset splits have type Iterable[T] instead of Sequence[T]. This is useful for streaming datasets.

class tango.common.Lazy(constructor, params=None, constructor_extras=None, **kwargs)

This class is for use when constructing objects using FromParams, when an argument to a constructor has a sequential dependency with another argument to the same constructor.

For example, in a Trainer class you might want to take a Model and an Optimizer as arguments, but the Optimizer needs to be constructed using the parameters from the Model. You can give the type annotation Lazy[Optimizer] to the optimizer argument, then inside the constructor call optimizer.construct(parameters=model.parameters).

This is only recommended for use when you have registered a @classmethod as the constructor for your class, instead of using __init__. Having a Lazy[] type annotation on an argument to an __init__ method makes your class completely dependent on being constructed using the FromParams pipeline, which is not a good idea.

The actual implementation here is incredibly simple; the logic that handles the lazy construction is actually found in FromParams, where we have a special case for a Lazy type annotation.
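To make the Trainer/Optimizer dependency concrete, here is a self-contained sketch of deferred construction. `LazySketch`, `Model`, `Optimizer`, and `Trainer` are hypothetical stand-ins written for this illustration; the real Lazy is wired into the FromParams machinery, which this sketch does not reproduce.

```python
from typing import Any, Callable, Dict, Generic, Optional, TypeVar

T = TypeVar("T")


class LazySketch(Generic[T]):
    """Hypothetical stand-in for Lazy: remembers a constructor and its arguments."""

    def __init__(self, constructor: Callable[..., T], params: Optional[Dict[str, Any]] = None):
        self._constructor = constructor
        self._params = dict(params or {})

    def construct(self, **kwargs: Any) -> T:
        # Arguments supplied now are merged with the ones stored earlier.
        return self._constructor(**{**self._params, **kwargs})


class Model:
    def parameters(self):
        return [1.0, 2.0]


class Optimizer:
    def __init__(self, parameters, lr: float):
        self.parameters = parameters
        self.lr = lr


class Trainer:
    def __init__(self, model: Model, optimizer: Optimizer):
        self.model = model
        self.optimizer = optimizer

    @classmethod
    def from_partial_objects(cls, model: Model, optimizer: LazySketch[Optimizer]) -> "Trainer":
        # The optimizer can only be built once the model exists.
        return cls(model, optimizer.construct(parameters=model.parameters()))


trainer = Trainer.from_partial_objects(Model(), LazySketch(Optimizer, {"lr": 0.01}))
```

Note that the lazy argument appears on a @classmethod constructor while `__init__` takes fully built objects, in line with the recommendation above.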

Examples

@classmethod
def my_constructor(
    cls,
    some_object: Lazy[MyObject],
    optional_object: Lazy[MyObject] = None,
    # or:
    #  optional_object: Optional[Lazy[MyObject]] = None,
    optional_object_with_default: Optional[Lazy[MyObject]] = Lazy(MyObjectDefault),
    required_object_with_default: Lazy[MyObject] = Lazy(MyObjectDefault),
) -> MyClass:
    obj1 = some_object.construct()
    obj2 = None if optional_object is None else optional_object.construct()
    obj3 = None if optional_object_with_default is None else optional_object_with_default.construct()
    obj4 = required_object_with_default.construct()

construct(**kwargs)

Call the constructor to create an instance of T.

Return type: TypeVar(T)

det_hash_object()

Return an object to use for deterministic hashing instead of self.

Return type: Any

class tango.common.Params(params, history='')

A MutableMapping that represents a parameter dictionary with a history, and contains other functionality around parameter passing and validation for AI2 Tango.

There are currently two benefits of a Params object over a plain dictionary for parameter passing:

We handle a few kinds of parameter validation, including making sure that parameters representing discrete choices actually have acceptable values, and making sure no extra parameters are passed.
We log all parameter reads, including default values. This gives a more complete specification of the actual parameters used than is given in a JSON file, because those may not specify what default values were used, whereas this will log them.

Important

The convention for using a Params object in Tango is that you will consume the parameters as you read them, so that there are none left when you’ve read everything you expect. This lets us easily validate that you didn’t pass in any extra parameters, just by making sure that the parameter dictionary is empty. You should do this when you’re done handling parameters, by calling Params.assert_empty().
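The consume-then-check convention can be illustrated without Tango at all. In the sketch below a plain dict stands in for Params and ValueError stands in for ConfigurationError; `build_optimizer_config` is a hypothetical consumer written only for this example.

```python
from typing import Any, Dict


def build_optimizer_config(params: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical consumer: pops every key it understands, then checks for leftovers."""
    config = {
        "lr": float(params.pop("lr")),
        "momentum": float(params.pop("momentum", 0.0)),  # default used when absent
    }
    # Anything still present was not expected by this consumer.
    if params:
        raise ValueError(f"Extra parameters passed: {sorted(params)}")
    return config


ok = build_optimizer_config({"lr": "0.1"})

try:
    build_optimizer_config({"lr": 0.1, "momentun": 0.9})  # misspelled key
    caught_typo = False
except ValueError:
    caught_typo = True
```

Because every recognized key is removed as it is read, a misspelled key is left behind and surfaces as an error instead of being silently ignored.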

as_dict(quiet=False, infer_type_and_cast=False)

Sometimes we need to just represent the parameters as a dict, for instance when we pass them to PyTorch code.

Parameters:

quiet (bool, default: False) – Whether to log the parameters before returning them as a dict.
infer_type_and_cast (bool, default: False) – If True, we infer types and cast (e.g. things that look like floats to floats).

as_flat_dict()

Returns the parameters as a flat dictionary from keys to values. Nested structure is collapsed with periods.

Return type: Dict[str, Any]
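Collapsing nested structure with periods can be sketched as a short recursive function. `flatten` is a hypothetical helper operating on plain dicts, and how Tango treats non-dict containers such as lists is not covered here.

```python
from typing import Any, Dict


def flatten(nested: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Join nested keys with periods, e.g. {"a": {"b": 1}} becomes {"a.b": 1}."""
    flat: Dict[str, Any] = {}
    for key, value in nested.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


flat = flatten({"model": {"type": "lstm", "encoder": {"dim": 10}}, "seed": 1})
```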

as_ordered_dict(preference_orders=None)

Returns an OrderedDict of the parameters, ordered according to a list of partial order preferences.

Parameters:

preference_orders (Optional[List[List[str]]], default: None) – A list of partial preference orders. [“A”, “B”, “C”] means “A” > “B” > “C”. When several preference orders are given, earlier ones are considered first. Keys not found in any order are placed last, in alphabetical order. Default preferences: [["dataset_reader", "iterator", "model", "train_data_path", "validation_data_path", "test_data_path", "trainer", "vocabulary"], ["type"]]

Return type: OrderedDict
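As a rough illustration of the ordering rule, the sketch below sorts the top-level keys of a plain dict using a single preference order, with unlisted keys placed last alphabetically. `order_keys` is a hypothetical helper; combining several preference orders and ordering nested dictionaries are deliberately left out.

```python
from collections import OrderedDict
from typing import Any, Dict, List


def order_keys(params: Dict[str, Any], preference_order: List[str]) -> "OrderedDict[str, Any]":
    """Order keys by their position in preference_order; unknown keys go last, alphabetically."""
    rank = {key: position for position, key in enumerate(preference_order)}
    ordered = sorted(params, key=lambda k: (rank.get(k, len(rank)), k))
    return OrderedDict((k, params[k]) for k in ordered)


result = order_keys(
    {"trainer": 1, "zeta": 2, "model": 3, "alpha": 4},
    ["model", "trainer"],
)
```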

assert_empty(name)

Raises a ConfigurationError if self.params is not empty. We take name as an argument so that the error message gives some idea of where an error happened, if there was one. For example, name could be the name of the calling class that got extra parameters (if there are any).

duplicate()

Uses copy.deepcopy() to create a duplicate (but fully distinct) copy of these Params.

Return type: Params

classmethod from_file(params_file, params_overrides='', ext_vars=None)

Load a Params object from a configuration file.

Parameters:

params_file (Union[str, PathLike]) – The path to the configuration file to load. Can be JSON, Jsonnet, or YAML.
params_overrides (Union[str, Dict[str, Any]], default: '') – A dict of overrides that can be applied to the final object. For example, {"model.embedding_dim": 10} will change the value of “embedding_dim” within the “model” object of the config to 10. If you wanted to override the entire “model” object of the config, you could do {"model": {"type": "other_type", ...}}.
ext_vars (Optional[dict], default: None) – Our config files are Jsonnet, which allows specifying external variables for later substitution. Typically we substitute these using environment variables; however, you can also specify them here, in which case they take priority over environment variables. e.g. {"HOME_DIR": "/Users/allennlp/home"}

Return type: Params
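The dotted-key form of params_overrides can be pictured as walking into the nested config and replacing one leaf. The following sketch works on plain dicts; `apply_overrides` is a hypothetical helper, and Tango's actual merge semantics (for instance, how whole-object overrides interact with existing keys) may differ.

```python
import copy
from typing import Any, Dict


def apply_overrides(config: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of config with each dotted-key override applied."""
    result = copy.deepcopy(config)
    for dotted_key, value in overrides.items():
        *parents, leaf = dotted_key.split(".")
        node = result
        for part in parents:
            # Create intermediate objects if the path does not exist yet.
            node = node.setdefault(part, {})
        node[leaf] = value
    return result


config = {"model": {"type": "lstm", "embedding_dim": 5}}
patched = apply_overrides(config, {"model.embedding_dim": 10})
replaced = apply_overrides(config, {"model": {"type": "other_type"}})
```

In this sketch a key without periods replaces the whole object, while a dotted key changes a single nested value and leaves its siblings alone.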

get(key, default=<object object>)

Performs the functionality associated with dict.get(key) but also checks for returned dicts and returns a Params object in their place with an updated history.

get_hash()

Returns a hash code representing the current state of this Params object. We don’t want to implement __hash__ because that has deeper python implications (and this is a mutable object), but this will give you a representation of the current state. We use zlib.adler32 instead of Python’s builtin hash because the random seed for the latter is reset on each new program invocation, as discussed here: https://stackoverflow.com/questions/27954892/deterministic-hashing-in-python-3.

Return type: str
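The difference between a checksum such as zlib.adler32 and the builtin hash is that the former depends only on the bytes it is given. The sketch below assumes the state is serialized as JSON with sorted keys before hashing; that serialization choice and the `stable_hash` helper are assumptions made for illustration, not a description of what get_hash() does internally.

```python
import json
import zlib
from typing import Any, Dict


def stable_hash(params: Dict[str, Any]) -> str:
    """Hash a plain dict in a way that is stable across program runs."""
    # Sorted keys make equal dictionaries serialize to identical bytes.
    serialized = json.dumps(params, sort_keys=True).encode("utf-8")
    return str(zlib.adler32(serialized))


first = stable_hash({"a": 1, "b": {"c": 2}})
second = stable_hash({"b": {"c": 2}, "a": 1})
```

Both calls produce the same string regardless of key order, and the value does not change between interpreter invocations.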

pop(key, default=<object object>, keep_as_dict=False)

Performs the functionality associated with dict.pop(key), along with checking for returned dictionaries, replacing them with Params objects with an updated history (unless keep_as_dict is True, in which case we leave them as dictionaries).

If key is not present in the dictionary, and no default was specified, we raise a ConfigurationError, instead of the typical KeyError.

Return type: Any

pop_bool(key, default=<object object>)

Performs a pop and coerces to a bool.

Return type: Optional[bool]

pop_choice(key, choices, default_to_first_choice=False, allow_class_names=True)

Gets the value of key in the params dictionary, ensuring that the value is one of the given choices. Note that this pops the key from params, modifying the dictionary, consistent with how parameters are processed in this codebase.

Parameters:

key (str) – Key to get the value from in the param dictionary
choices (List[Any]) – A list of valid options for values corresponding to key. For example, if you’re specifying the type of encoder to use for some part of your model, the choices might be the list of encoder classes we know about and can instantiate. If the value we find in the param dictionary is not in choices, we raise a ConfigurationError, because the user specified an invalid value in their parameter file.
default_to_first_choice (bool, default: False) – If this is True, we allow the key to not be present in the parameter dictionary. If the key is not present, we will return the first choice in the choices list as the value. If this is False, we raise a ConfigurationError, because specifying the key is required (e.g., you have to specify your model class when running an experiment, but you can feel free to use default settings for encoders if you want).
allow_class_names (bool, default: True) – If this is True, then we allow unknown choices that look like fully-qualified class names. This is to allow e.g. specifying a model type as my_library.my_model.MyModel and importing it on the fly. Our check for “looks like” is extremely lenient and consists of checking that the value contains a ‘.’.

Return type: Any
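The interaction of the three options is easier to see in code. The sketch below mimics the documented behavior on a plain dict, with ValueError standing in for ConfigurationError; `pop_choice_sketch` is a hypothetical function, not the library's.

```python
from typing import Any, Dict, List


def pop_choice_sketch(
    params: Dict[str, Any],
    key: str,
    choices: List[Any],
    default_to_first_choice: bool = False,
    allow_class_names: bool = True,
) -> Any:
    if key not in params:
        if default_to_first_choice:
            return choices[0]
        raise ValueError(f"key '{key}' is required")
    value = params.pop(key)  # consuming the key modifies params
    # The "looks like a class name" check is deliberately lenient: any '.' will do.
    looks_like_class_name = allow_class_names and isinstance(value, str) and "." in value
    if value not in choices and not looks_like_class_name:
        raise ValueError(f"{value!r} not in acceptable choices for {key}: {choices}")
    return value


params = {"encoder": "lstm", "model": "my_library.my_model.MyModel"}
encoder = pop_choice_sketch(params, "encoder", ["lstm", "gru"])
model = pop_choice_sketch(params, "model", ["simple"])
pooling = pop_choice_sketch(params, "pooling", ["mean", "max"], default_to_first_choice=True)
```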

pop_float(key, default=<object object>)

Performs a pop and coerces to a float.

Return type: Optional[float]

pop_int(key, default=<object object>)

Performs a pop and coerces to an int.

Return type: Optional[int]

to_file(params_file, preference_orders=None)

Write the params to file.

Return type: None

class tango.common.Registrable

Any class that inherits from Registrable gains access to a named registry for its subclasses. To register them, just decorate them with the classmethod @BaseClass.register(name).

After that, you can call BaseClass.list_available() to get the keys for the registered subclasses, and BaseClass.by_name(name) to get the corresponding subclass. Note that the registry stores the subclasses themselves, not class instances. In most cases you would then call from_params() on the returned subclass.

You can specify a default by setting BaseClass.default_implementation. If it is set, it will be the first element of list_available().

Note that if you use this class to implement a new Registrable abstract class, you must ensure that all subclasses of the abstract class are loaded when the module is loaded, because the subclasses register themselves in their respective files. You can achieve this by having the abstract class and all subclasses in the __init__.py of the module in which they reside (as this causes any import of either the abstract class or a subclass to load all other subclasses and the abstract class).
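The registry mechanics can be sketched in a few lines of plain Python. `RegistrySketch` and `Vocab` are hypothetical names, and the sketch simplifies on purpose: it keeps one shared registry instead of one per base class, has no default implementation, and does not search modules.

```python
from typing import Callable, Dict, List, Optional, Tuple


class RegistrySketch:
    """Hypothetical, simplified stand-in for Registrable."""

    # Maps a registered name to (subclass, optional constructor method name).
    _registry: Dict[str, Tuple[type, Optional[str]]] = {}

    @classmethod
    def register(cls, name: str, constructor: Optional[str] = None, exist_ok: bool = False):
        def decorator(subclass: type) -> type:
            if name in cls._registry and not exist_ok:
                raise ValueError(f"{name} is already registered")
            cls._registry[name] = (subclass, constructor)
            return subclass

        return decorator

    @classmethod
    def by_name(cls, name: str) -> Callable:
        subclass, constructor = cls._registry[name]
        # Without a named constructor, the class itself is the callable.
        return subclass if constructor is None else getattr(subclass, constructor)

    @classmethod
    def list_available(cls) -> List[str]:
        return list(cls._registry)


@RegistrySketch.register("from-words", constructor="from_words")
@RegistrySketch.register("plain")
class Vocab(RegistrySketch):
    def __init__(self, tokens):
        self.tokens = list(tokens)

    @classmethod
    def from_words(cls, text: str) -> "Vocab":
        return cls(text.split())


plain = RegistrySketch.by_name("plain")(["a"])
from_words = RegistrySketch.by_name("from-words")("a b")
```

Registration happens when the decorated class body is executed, which is why the module defining a subclass has to be imported before its name can be looked up.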

classmethod by_name(name)

Returns a callable function that constructs an instance of the registered class. Because you can register particular functions as constructors for specific names, this isn’t necessarily the __init__ method of some class.

Return type: Callable[..., TypeVar(_RegistrableT, bound=Registrable)]

classmethod list_available()

Lists the names of the registered subclasses, with the default first if it exists.

Return type: List[str]

classmethod register(name, constructor=None, exist_ok=False)

Parameters:

name (str) – The name to register the class under.
constructor (Optional[str], default: None) – The name of the method to use on the class to construct the object. If this is given, we will use this method (which must be a @classmethod) instead of the default constructor.
exist_ok (bool, default: False) – If True, overwrites any existing models registered under name. Else, throws an error if a model is already registered under name.

Return type: Callable[[Type[TypeVar(_T)]], Type[TypeVar(_T)]]

Examples

To use this class, you would typically have a base class that inherits from Registrable:

class Vocabulary(Registrable):
    ...

Then, if you want to register a subclass, you decorate it like this:

@Vocabulary.register("my-vocabulary")
class MyVocabulary(Vocabulary):
    def __init__(self, param1: int, param2: str):
        ...

Registering a class like this will let you instantiate a class from a config file, where you give "type": "my-vocabulary", and keys corresponding to the parameters of the __init__ method (note that for this to work, those parameters must have type annotations).

If you want to have the instantiation from a config file call a method other than the constructor, either because you have several different construction paths that could be taken for the same object (as we do in Vocabulary) or because you have logic you want to happen before you get to the constructor (as we do in Embedding), you can register a specific @classmethod as the constructor to use, like this:

@Vocabulary.register("my-vocabulary-from-instances", constructor="from_instances")
@Vocabulary.register("my-vocabulary-from-files", constructor="from_files")
class MyVocabulary(Vocabulary):
    def __init__(self, some_params):
        ...

    @classmethod
    def from_instances(cls, some_other_params) -> "MyVocabulary":
        ...  # construct some_params from instances
        return cls(some_params)

    @classmethod
    def from_files(cls, still_other_params) -> "MyVocabulary":
        ...  # construct some_params from files
        return cls(some_params)

classmethod resolve_class_name(name, search_modules=True)

Returns the subclass that corresponds to the given name, along with the name of the method that was registered as a constructor for that name, if any.

This method also allows name to be a fully-specified module name, instead of a name that was already added to the Registry. In that case, you cannot use a separate function as a constructor (as you need to call cls.register() in order to tell us what separate function to use).

If the name given is not in the registry and search_modules is True, it will search for and import modules where the class might be defined according to search_modules().

Return type: Tuple[Type[TypeVar(_RegistrableT, bound=Registrable)], Optional[str]]

classmethod search_modules(name)

Search for and import modules where name might be registered.

class tango.common.RegistrableFunction

A registrable class mimicking a Callable. This is to allow referring to functions by their name in tango configurations.

class tango.common.Tqdm

A tqdm wrapper that respects FILE_FRIENDLY_LOGGING and other Tango logging configurations.

tango.common.make_registrable(name=None, *, exist_ok=False)

A decorator to create a RegistrableFunction from a function.

Parameters:

name (Optional[str], default: None) – A name to register the function under. By default the name of the function is used.
exist_ok (bool, default: False) – If True, overwrites any existing function registered under the same name. Else, throws an error if a function is already registered under name.

tango.common.threaded_generator(g, queue_size=16)

Puts the generating side of a generator into its own thread.

Let’s say you have a generator that reads records from disk, and something that consumes the generator that spends most of its time in PyTorch. Wouldn’t it be great if you could read more records while the PyTorch code runs? If you wrap your record-reading generator with threaded_generator(inner), that’s exactly what happens. The reading code will run in a new thread, while the consuming code runs in the main thread as normal. threaded_generator() uses a queue to hand off items.

Parameters:

queue_size (int, default: 16) – The maximum queue size for hand-offs between the main thread and the generator thread.
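The queue hand-off described above can be sketched with the standard library alone. `threaded_generator_sketch` is a hypothetical reimplementation for illustration; it does not forward exceptions raised by the producer, and if the consumer stops early the producer thread simply stays blocked until the program exits, both of which a production version would need to handle.

```python
import queue
import threading
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")
_SENTINEL = object()  # marks the end of the stream


def threaded_generator_sketch(g: Iterable[T], queue_size: int = 16) -> Iterator[T]:
    """Run the producing side of g in a background thread, handing items over a bounded queue."""
    q: "queue.Queue" = queue.Queue(maxsize=queue_size)

    def fill_queue() -> None:
        try:
            for item in g:
                q.put(item)  # blocks while the queue is full
        finally:
            q.put(_SENTINEL)

    thread = threading.Thread(target=fill_queue, daemon=True)
    thread.start()
    while True:
        item = q.get()
        if item is _SENTINEL:
            break
        yield item
    thread.join()


squares = list(threaded_generator_sketch((i * i for i in range(5)), queue_size=2))
```

The bounded queue is what keeps the producer from running arbitrarily far ahead of the consumer: once queue_size items are waiting, the producing thread pauses until the main thread takes one.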