Utilities
- class tango.common.DatasetDict(splits, metadata=<factory>)
A generic Mapping class of split names (str) to datasets (Sequence[T]).
- class tango.common.DatasetDictBase(splits, metadata=<factory>)
The base class for DatasetDict and IterableDatasetDict.
- class tango.common.FromParams
Mixin to give a from_params() method to classes. We create a distinct base class for this because sometimes we want non-Registrable classes to be instantiable with from_params.
- classmethod from_params(params_, constructor_to_call=None, constructor_to_inspect=None, **extras)
This is the automatic implementation of from_params. Any class that subclasses from FromParams (or Registrable, which itself subclasses FromParams) gets this implementation for free. If you want your class to be instantiated from params in the “obvious” way – pop off parameters and hand them to your constructor with the same names – this provides that functionality.
If you need more complex logic in your from_params method, you’ll have to implement your own method that overrides this one.
The constructor_to_call and constructor_to_inspect arguments deal with a bit of redirection that we do. We allow you to register particular @classmethods on a class as the constructor to use for a registered name. This lets you, e.g., have a single Vocabulary class that can be constructed in two different ways, with different names registered to each constructor. In order to handle this, we need to know not just the class we’re trying to construct (cls), but also what method we should inspect to find its arguments (constructor_to_inspect), and what method to call when we’re done constructing arguments (constructor_to_call). These two methods are the same when you’ve used a @classmethod as your constructor, but they are different when you use the default constructor (because you inspect __init__, but call cls()).
- Return type:
TypeVar(T, bound= FromParams)
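The “obvious” construction path described above (pop each parameter by name and hand it to the constructor) can be sketched with the standard library alone. This is a minimal illustration, not Tango's implementation: from_params_sketch and Encoder are hypothetical names, a plain dict stands in for Params, and a ValueError stands in for ConfigurationError.

```python
import inspect
from typing import Any, Dict


def from_params_sketch(cls: type, params: Dict[str, Any]) -> Any:
    """Build `cls` by popping one entry from `params` per constructor argument."""
    signature = inspect.signature(cls.__init__)
    kwargs: Dict[str, Any] = {}
    for name, parameter in signature.parameters.items():
        if name == "self":
            continue
        if parameter.default is inspect.Parameter.empty:
            # Required argument: popping raises KeyError if it is missing.
            kwargs[name] = params.pop(name)
        else:
            # Optional argument: fall back to the constructor's default.
            kwargs[name] = params.pop(name, parameter.default)
    if params:
        # Anything left over was not expected by the constructor.
        raise ValueError(f"Extra parameters for {cls.__name__}: {sorted(params)}")
    return cls(**kwargs)


class Encoder:  # hypothetical class, for illustration only
    def __init__(self, hidden_dim: int, dropout: float = 0.1):
        self.hidden_dim = hidden_dim
        self.dropout = dropout


encoder = from_params_sketch(Encoder, {"hidden_dim": 32})
```

The sketch inspects __init__ and calls cls(), which mirrors the default-constructor case described above where the inspected method and the called method differ.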
- class tango.common.IterableDatasetDict(splits, metadata=<factory>)
An “iterable” version of DatasetDict, where the dataset splits have type Iterable[T] instead of Sequence[T]. This is useful for streaming datasets.
- class tango.common.Lazy(constructor, params=None, constructor_extras=None, **kwargs)
This class is for use when constructing objects using FromParams, when an argument to a constructor has a sequential dependency with another argument to the same constructor.
For example, in a Trainer class you might want to take a Model and an Optimizer as arguments, but the Optimizer needs to be constructed using the parameters from the Model. You can give the type annotation Lazy[Optimizer] to the optimizer argument, then inside the constructor call optimizer.construct(parameters=model.parameters).
This is only recommended for use when you have registered a @classmethod as the constructor for your class, instead of using __init__. Having a Lazy[] type annotation on an argument to an __init__ method makes your class completely dependent on being constructed using the FromParams pipeline, which is not a good idea.
The actual implementation here is incredibly simple; the logic that handles the lazy construction is actually found in FromParams, where we have a special case for a Lazy type annotation.
Examples
    @classmethod
    def my_constructor(
        cls,
        some_object: Lazy[MyObject],
        optional_object: Lazy[MyObject] = None,
        # or:
        #  optional_object: Optional[Lazy[MyObject]] = None,
        optional_object_with_default: Optional[Lazy[MyObject]] = Lazy(MyObjectDefault),
        required_object_with_default: Lazy[MyObject] = Lazy(MyObjectDefault),
    ) -> "MyClass":
        obj1 = some_object.construct()
        obj2 = None if optional_object is None else optional_object.construct()
        obj3 = None if optional_object_with_default is None else optional_object_with_default.construct()
        obj4 = required_object_with_default.construct()
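The Trainer scenario described above can be sketched without Tango to show why deferring construction helps. LazySketch is a hypothetical stand-in, not tango.common.Lazy; Model, Optimizer, and Trainer are toy classes invented for the illustration.

```python
from typing import Any, Callable, Generic, List, TypeVar

T = TypeVar("T")


class LazySketch(Generic[T]):
    """Holds a constructor and some arguments until construct() is called."""

    def __init__(self, constructor: Callable[..., T], **kwargs: Any) -> None:
        self._constructor = constructor
        self._kwargs = kwargs

    def construct(self, **extras: Any) -> T:
        # Arguments supplied at construct() time are merged with the stored ones.
        return self._constructor(**{**self._kwargs, **extras})


class Model:
    def __init__(self) -> None:
        self.weights: List[float] = [0.1, 0.2]

    def parameters(self) -> List[float]:
        return self.weights


class Optimizer:
    def __init__(self, parameters: List[float], lr: float) -> None:
        self.parameters = parameters
        self.lr = lr


class Trainer:
    def __init__(self, model: Model, optimizer: LazySketch[Optimizer]) -> None:
        self.model = model
        # The optimizer can only be built once the model exists.
        self.optimizer = optimizer.construct(parameters=model.parameters())


trainer = Trainer(Model(), LazySketch(Optimizer, lr=0.01))
```

The caller fixes the learning rate up front, while the model's parameters are supplied only inside the Trainer, which is the sequential dependency that a Lazy annotation is meant to handle.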
- class tango.common.Params(params, history='')
A MutableMapping that represents a parameter dictionary with a history, and contains other functionality around parameter passing and validation for AI2 Tango.
There are currently two benefits of a Params object over a plain dictionary for parameter passing:
  - We handle a few kinds of parameter validation, including making sure that parameters representing discrete choices actually have acceptable values, and making sure no extra parameters are passed.
  - We log all parameter reads, including default values. This gives a more complete specification of the actual parameters used than is given in a JSON file, because those may not specify what default values were used, whereas this will log them.
Important
The convention for using a Params object in Tango is that you will consume the parameters as you read them, so that there are none left when you’ve read everything you expect. This lets us easily validate that you didn’t pass in any extra parameters, just by making sure that the parameter dictionary is empty. You should do this when you’re done handling parameters, by calling Params.assert_empty().
- as_dict(quiet=False, infer_type_and_cast=False)
Sometimes we need to just represent the parameters as a dict, for instance when we pass them to PyTorch code.
- as_flat_dict()
Returns the parameters as a flat dictionary from keys to values. Nested structure is collapsed with periods.
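To make “collapsed with periods” concrete, here is a small standard-library sketch of that kind of flattening on a plain dict. The function name flatten and the sample config are assumptions for illustration; this is not the Params method itself.

```python
from typing import Any, Dict


def flatten(nested: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Collapse a nested dict into one level, joining keys with periods."""
    flat: Dict[str, Any] = {}
    for key, value in nested.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into sub-dictionaries, extending the dotted path.
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


config = {"model": {"type": "lstm", "encoder": {"dim": 10}}, "seed": 1}
flat_config = flatten(config)
# flat_config == {"model.type": "lstm", "model.encoder.dim": 10, "seed": 1}
```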
- as_ordered_dict(preference_orders=None)
Returns an OrderedDict of Params from a list of partial order preferences.
- Parameters:
preference_orders (Optional[List[List[str]]], default: None) – preference_orders is a list of partial preference orders. [“A”, “B”, “C”] means “A” > “B” > “C”. When multiple preference orders are given, the first is considered first. Keys not found in any order come last, in alphabetical order. Default preferences: [["dataset_reader", "iterator", "model", "train_data_path", "validation_data_path", "test_data_path", "trainer", "vocabulary"], ["type"]]
- Return type:
- assert_empty(name)
Raises a ConfigurationError if self.params is not empty. We take name as an argument so that the error message gives some idea of where an error happened, if there was one. For example, name could be the name of the calling class that got extra parameters (if there are any).
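The consume-then-check convention can be sketched with a plain dict. Everything here is a hypothetical stand-in: LeftoverParamsError plays the role of ConfigurationError, and assert_empty and build_encoder are illustrative functions rather than Tango's API.

```python
from typing import Any, Dict


class LeftoverParamsError(Exception):
    """Hypothetical stand-in for Tango's ConfigurationError."""


def assert_empty(params: Dict[str, Any], name: str) -> None:
    # `name` only serves to make the error message point at the caller.
    if params:
        raise LeftoverParamsError(f"Extra parameters passed to {name}: {params}")


def build_encoder(params: Dict[str, Any]) -> Dict[str, Any]:
    # Consume each expected parameter as it is read.
    hidden_dim = params.pop("hidden_dim")
    dropout = params.pop("dropout", 0.0)
    # Anything still present is a typo or an unsupported option.
    assert_empty(params, "build_encoder")
    return {"hidden_dim": hidden_dim, "dropout": dropout}


encoder = build_encoder({"hidden_dim": 64})
```

Passing a misspelled key such as {"hidden_dim": 64, "dropuot": 0.5} would raise the error instead of silently ignoring the value.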
- duplicate()
Uses copy.deepcopy() to create a duplicate (but fully distinct) copy of these Params.
- Return type:
- classmethod from_file(params_file, params_overrides='', ext_vars=None)
Load a Params object from a configuration file.
- Parameters:
params_file (Union[str, PathLike]) – The path to the configuration file to load. Can be JSON, Jsonnet, or YAML.
params_overrides (Union[str, Dict[str, Any]], default: '') – A dict of overrides that can be applied to the final object, e.g. {"model.embedding_dim": 10} will change the value of “embedding_dim” within the “model” object of the config to 10. If you wanted to override the entire “model” object of the config, you could do {"model": {"type": "other_type", ...}}.
ext_vars (Optional[dict], default: None) – Our config files are Jsonnet, which allows specifying external variables for later substitution. Typically we substitute these using environment variables; however, you can also specify them here, in which case they take priority over environment variables, e.g. {"HOME_DIR": "/Users/allennlp/home"}.
- Return type:
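The dotted-key override semantics described for params_overrides can be illustrated on plain dictionaries. This is a sketch of the idea only; apply_overrides is a hypothetical helper, and Tango's actual merging logic may differ in details such as how whole-object overrides are merged.

```python
import copy
from typing import Any, Dict


def apply_overrides(config: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `config` with each dotted key in `overrides` replaced."""
    result = copy.deepcopy(config)
    for dotted_key, value in overrides.items():
        *parents, leaf = dotted_key.split(".")
        target = result
        for part in parents:
            # Walk down to the nested object, creating it if it is absent.
            target = target.setdefault(part, {})
        target[leaf] = value
    return result


base = {"model": {"type": "lstm", "embedding_dim": 5}}
overridden = apply_overrides(base, {"model.embedding_dim": 10})
replaced = apply_overrides(base, {"model": {"type": "other_type"}})
```

In this sketch a dotted key changes one nested value, while an undotted key such as "model" replaces the whole object.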
- get(key, default=<object object>)
Performs the functionality associated with dict.get(key) but also checks for returned dicts and returns a Params object in their place with an updated history.
- get_hash()
Returns a hash code representing the current state of this Params object. We don’t want to implement __hash__ because that has deeper python implications (and this is a mutable object), but this will give you a representation of the current state. We use zlib.adler32 instead of Python’s builtin hash because the random seed for the latter is reset on each new program invocation, as discussed here: https://stackoverflow.com/questions/27954892/deterministic-hashing-in-python-3.
- Return type:
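A deterministic hash of this kind can be sketched with zlib.adler32 from the standard library. The serialization step (json.dumps with sorted keys) is an assumption made for this example; the text above only states that adler32 is used, not how the parameters are serialized. stable_hash is a hypothetical name.

```python
import json
import zlib
from typing import Any, Dict


def stable_hash(params: Dict[str, Any]) -> int:
    """Hash a JSON-serializable dict identically across interpreter runs."""
    # Sorting keys makes the byte string independent of insertion order
    # (assumed serialization, chosen for this sketch).
    serialized = json.dumps(params, sort_keys=True).encode("utf-8")
    # adler32 has no per-process random seed, unlike the builtin hash of str.
    return zlib.adler32(serialized)


first = stable_hash({"model": "lstm", "dim": 10})
second = stable_hash({"dim": 10, "model": "lstm"})
```

Because the result does not depend on PYTHONHASHSEED, it can be compared between separate program invocations.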
- pop(key, default=<object object>, keep_as_dict=False)
Performs the functionality associated with dict.pop(key), along with checking for returned dictionaries, replacing them with Params objects with an updated history (unless keep_as_dict is True, in which case we leave them as dictionaries).
If key is not present in the dictionary, and no default was specified, we raise a ConfigurationError, instead of the typical KeyError.
- Return type:
- pop_choice(key, choices, default_to_first_choice=False, allow_class_names=True)
Gets the value of key in the params dictionary, ensuring that the value is one of the given choices. Note that this pops the key from params, modifying the dictionary, consistent with how parameters are processed in this codebase.
- Parameters:
key (str) – Key to get the value from in the param dictionary.
choices (List[Any]) – A list of valid options for values corresponding to key. For example, if you’re specifying the type of encoder to use for some part of your model, the choices might be the list of encoder classes we know about and can instantiate. If the value we find in the param dictionary is not in choices, we raise a ConfigurationError, because the user specified an invalid value in their parameter file.
default_to_first_choice (bool, default: False) – If this is True, we allow the key to not be present in the parameter dictionary. If the key is not present, we return the first choice in the choices list as the value. If this is False, we raise a ConfigurationError, because specifying the key is required (e.g., you have to specify your model class when running an experiment, but you can feel free to use default settings for encoders if you want).
allow_class_names (bool, default: True) – If this is True, then we allow unknown choices that look like fully-qualified class names. This is to allow e.g. specifying a model type as my_library.my_model.MyModel and importing it on the fly. Our check for “looks like” is extremely lenient and consists of checking that the value contains a ‘.’.
- Return type:
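The decision logic described by these parameters can be sketched on a plain dict. pop_choice_sketch is a hypothetical function, and ValueError stands in for ConfigurationError; it follows the description above rather than Tango's source.

```python
from typing import Any, Dict, List

_MISSING = object()  # sentinel so that None can still be a legitimate value


def pop_choice_sketch(
    params: Dict[str, Any],
    key: str,
    choices: List[Any],
    default_to_first_choice: bool = False,
    allow_class_names: bool = True,
) -> Any:
    value = params.pop(key, _MISSING)  # pops, so the dictionary is modified
    if value is _MISSING:
        if default_to_first_choice:
            return choices[0]
        raise ValueError(f"required key {key!r} is missing")
    if value in choices:
        return value
    # Lenient "looks like a fully-qualified class name" check: contains a '.'.
    if allow_class_names and isinstance(value, str) and "." in value:
        return value
    raise ValueError(f"{value!r} is not one of {choices} for key {key!r}")


params = {"encoder": "lstm", "model": "my_library.my_model.MyModel"}
encoder = pop_choice_sketch(params, "encoder", ["lstm", "gru"])
model = pop_choice_sketch(params, "model", ["baseline"])
pooling = pop_choice_sketch(params, "pooling", ["mean", "max"], default_to_first_choice=True)
```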
- class tango.common.Registrable
Any class that inherits from Registrable gains access to a named registry for its subclasses. To register them, just decorate them with the classmethod @BaseClass.register(name).
After that, you can call BaseClass.list_available() to get the keys for the registered subclasses, and BaseClass.by_name(name) to get the corresponding subclass. Note that the registry stores the subclasses themselves, not class instances. In most cases you would then call from_params() on the returned subclass.
You can specify a default by setting BaseClass.default_implementation. If it is set, it will be the first element of list_available().
Note that if you use this class to implement a new Registrable abstract class, you must ensure that all subclasses of the abstract class are loaded when the module is loaded, because the subclasses register themselves in their respective files. You can achieve this by having the abstract class and all subclasses in the __init__.py of the module in which they reside (as this causes any import of either the abstract class or a subclass to load all other subclasses and the abstract class).
- classmethod by_name(name)
Returns a callable function that constructs an argument of the registered class. Because you can register particular functions as constructors for specific names, this isn’t necessarily the __init__ method of some class.
- classmethod register(name, constructor=None, exist_ok=False)
Register a class under a particular name.
- Parameters:
name (str) – The name to register the class under.
constructor (Optional[str], default: None) – The name of the method to use on the class to construct the object. If this is given, we will use this method (which must be a @classmethod) instead of the default constructor.
exist_ok (bool, default: False) – If True, overwrites any existing models registered under name. Else, throws an error if a model is already registered under name.
- Return type:
Examples
To use this class, you would typically have a base class that inherits from Registrable:
    class Vocabulary(Registrable):
        ...
Then, if you want to register a subclass, you decorate it like this:
    @Vocabulary.register("my-vocabulary")
    class MyVocabulary(Vocabulary):
        def __init__(self, param1: int, param2: str):
            ...
Registering a class like this will let you instantiate a class from a config file, where you give "type": "my-vocabulary", and keys corresponding to the parameters of the __init__ method (note that for this to work, those parameters must have type annotations).
If you want to have the instantiation from a config file call a method other than the constructor, either because you have several different construction paths that could be taken for the same object (as we do in Vocabulary) or because you have logic you want to happen before you get to the constructor (as we do in Embedding), you can register a specific @classmethod as the constructor to use, like this:
    @Vocabulary.register("my-vocabulary-from-instances", constructor="from_instances")
    @Vocabulary.register("my-vocabulary-from-files", constructor="from_files")
    class MyVocabulary(Vocabulary):
        def __init__(self, some_params):
            ...

        @classmethod
        def from_instances(cls, some_other_params) -> "MyVocabulary":
            ...  # construct some_params from instances
            return cls(some_params)

        @classmethod
        def from_files(cls, still_other_params) -> "MyVocabulary":
            ...  # construct some_params from files
            return cls(some_params)
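For readers who want to see the mechanism behind a named registry, here is a minimal standard-library sketch of a decorator-based registry keyed by base class. RegistrySketch is a hypothetical stand-in and deliberately omits features described above, such as alternate constructors, default_implementation, and module searching; it is not how Registrable is implemented.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class RegistrySketch:
    """Hypothetical stand-in for a Registrable-style base class."""

    # One name -> subclass mapping per base class that calls register().
    _registry: Dict[type, Dict[str, type]] = defaultdict(dict)

    @classmethod
    def register(cls, name: str, exist_ok: bool = False) -> Callable[[type], type]:
        registry = cls._registry[cls]

        def decorator(subclass: type) -> type:
            if name in registry and not exist_ok:
                raise ValueError(f"{name!r} is already registered for {cls.__name__}")
            # The class itself is stored, not an instance of it.
            registry[name] = subclass
            return subclass

        return decorator

    @classmethod
    def by_name(cls, name: str) -> type:
        return cls._registry[cls][name]

    @classmethod
    def list_available(cls) -> List[str]:
        return list(cls._registry[cls])


class Vocabulary(RegistrySketch):
    ...


@Vocabulary.register("my-vocabulary")
class MyVocabulary(Vocabulary):
    def __init__(self, param1: int, param2: str):
        self.param1 = param1
        self.param2 = param2


vocab = Vocabulary.by_name("my-vocabulary")(param1=1, param2="a")
```

Because registration happens when the decorated class body is executed, a subclass only appears in the registry once its module has been imported, which is the loading caveat noted in the class description.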
- classmethod resolve_class_name(name, search_modules=True)
Returns the subclass that corresponds to the given name, along with the name of the method that was registered as a constructor for that name, if any.
This method also allows name to be a fully-specified module name, instead of a name that was already added to the Registry. In that case, you cannot use a separate function as a constructor (as you need to call cls.register() in order to tell us what separate function to use).
If the name given is not in the registry and search_modules is True, it will search for and import modules where the class might be defined according to search_modules().
- class tango.common.RegistrableFunction
A registrable class mimicking a Callable. This is to allow referring to functions by their name in Tango configurations.
- class tango.common.Tqdm
A tqdm wrapper that respects FILE_FRIENDLY_LOGGING and other Tango logging configurations.
- tango.common.make_registrable(name=None, *, exist_ok=False)
A decorator to create a RegistrableFunction from a function.
- Parameters:
- tango.common.threaded_generator(g, queue_size=16)
Puts the generating side of a generator into its own thread.
Let’s say you have a generator that reads records from disk, and something that consumes the generator that spends most of its time in PyTorch. Wouldn’t it be great if you could read more records while the PyTorch code runs? If you wrap your record-reading generator with threaded_generator(inner), that’s exactly what happens. The reading code will run in a new thread, while the consuming code runs in the main thread as normal. threaded_generator() uses a queue to hand off items.
- Parameters:
queue_size (int, default: 16) – the maximum queue size for hand-offs between the main thread and the generator thread
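The queue hand-off described above can be sketched with threading and queue from the standard library. threaded_generator_sketch and read_records are hypothetical names; this is an illustration of the pattern rather than Tango's code, and unlike a production version it does not propagate exceptions raised in the producer thread.

```python
import threading
from queue import Queue
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")
_SENTINEL = object()  # marks the end of the wrapped generator


def threaded_generator_sketch(g: Iterable[T], queue_size: int = 16) -> Iterator[T]:
    q: Queue = Queue(maxsize=queue_size)

    def fill_queue() -> None:
        try:
            for item in g:
                # Blocks when the queue is full, so the producer stays at most
                # queue_size items ahead of the consumer.
                q.put(item)
        finally:
            q.put(_SENTINEL)

    thread = threading.Thread(target=fill_queue, daemon=True)
    thread.start()
    while True:
        item = q.get()
        if item is _SENTINEL:
            break
        yield item
    thread.join()


def read_records() -> Iterator[int]:
    # Stand-in for a generator that reads records from disk.
    for i in range(5):
        yield i * i


results = list(threaded_generator_sketch(read_records(), queue_size=2))
```

A small queue_size bounds memory use, while a larger one lets the reader run further ahead when the consumer is occasionally slow.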