Sequences
This module contains utilities for building sequences out of other sequences. All of them are lazy, so they take minimal time and memory when you create them. They work particularly well when used together. For example, you can concatenate two sequences (ConcatenatedSequence) and then shuffle them (ShuffledSequence).
This module is not dependent on other Tango modules and can be used in isolation.
- class tango.common.sequences.ConcatenatedSequence(*sequences)
Produces a sequence that’s the lazy concatenation of multiple other sequences. It does not copy any of the elements of the original sequences.
This assumes that the inner sequences never change. If they do, the results are undefined.
- Parameters:
sequences (Sequence) – the inner sequences to concatenate
Example:
    from tango.common.sequences import ConcatenatedSequence

    l1 = [1, 2, 3]
    l2 = [4, 5]
    l3 = [6]
    cat_l = ConcatenatedSequence(l1, l2, l3)

    assert len(cat_l) == 6
    for i in cat_l:
        print(i)

This will print the following:

    1
    2
    3
    4
    5
    6
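To make "lazy concatenation without copying" concrete, here is a minimal standard-library sketch of one way such a view could work. The class name `LazyConcat` is hypothetical and this is not Tango's implementation; it only assumes that lookups can be routed to the right inner sequence through cumulative lengths.

```python
import bisect
from collections.abc import Sequence
from itertools import accumulate


class LazyConcat(Sequence):
    """Hypothetical lazy concatenation view (illustration only, not Tango's code)."""

    def __init__(self, *sequences):
        self.sequences = sequences
        # Cumulative end offsets, e.g. lengths 3, 2, 1 -> [3, 5, 6].
        self.ends = list(accumulate(len(s) for s in sequences))

    def __len__(self):
        return self.ends[-1] if self.ends else 0

    def __getitem__(self, i):
        if not isinstance(i, int):
            raise TypeError("this sketch only supports integer indices")
        if i < 0:
            i += len(self)
        if not 0 <= i < len(self):
            raise IndexError(i)
        # Find the first inner sequence whose end offset is greater than i.
        which = bisect.bisect_right(self.ends, i)
        start = self.ends[which - 1] if which > 0 else 0
        return self.sequences[which][i - start]


l1, l2, l3 = [1, 2, 3], [4, 5], [6]
cat_l = LazyConcat(l1, l2, l3)
```

Only the lengths are read at construction time. Elements are fetched from the inner lists on access, which is also why changing an inner sequence afterwards would leave the stored offsets stale.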
- class tango.common.sequences.MappedSequence(fn, inner_sequence)
Produces a sequence that applies a function to every element of another sequence.
This is similar to Python’s map(), but it returns a sequence instead of a map object.
- Parameters:
Example:
    from tango.common.sequences import MappedSequence

    def square(x):
        return x * x

    l = [1, 2, 3, 4]
    map_l = MappedSequence(square, l)

    assert len(map_l) == len(l)
    for i in map_l:
        print(i)

This will print the following:

    1
    4
    9
    16
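The difference from `map()` is that a sequence supports `len()` and indexing, while a `map` object supports neither. The sketch below illustrates the idea with a hypothetical `LazyMap` class built on the standard library. It is an assumption about how such a view could behave, not Tango's code, and the `calls` list exists only to make the laziness observable.

```python
from collections.abc import Sequence


class LazyMap(Sequence):
    """Hypothetical lazy mapped view; fn runs only when an element is read."""

    def __init__(self, fn, inner):
        self.fn = fn
        self.inner = inner

    def __len__(self):
        return len(self.inner)

    def __getitem__(self, i):
        if isinstance(i, slice):
            # Slicing returns another lazy view instead of applying fn eagerly.
            return LazyMap(self.fn, self.inner[i])
        return self.fn(self.inner[i])


calls = []


def square(x):
    calls.append(x)  # record each call so laziness is observable
    return x * x


map_l = LazyMap(square, [1, 2, 3, 4])
length = len(map_l)  # does not call square
third = map_l[2]     # calls square(3) and nothing else
```

After these lines `calls` holds only `3`, since the function ran for the single element that was read.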
- class tango.common.sequences.ShuffledSequence(inner_sequence, indices=None)
Produces a shuffled view of a sequence, such as a list.
This assumes that the inner sequence never changes. If it does, the results are undefined.
- Parameters:
inner_sequence (Sequence) – the inner sequence that’s being shuffled
indices (Optional[Sequence[int]], default: None) – Optionally, you can specify a list of indices here. If you don’t, we’ll just shuffle the inner sequence randomly. If you do specify indices, element n of the output sequence will be inner_sequence[indices[n]]. This gives you great flexibility. You can repeat elements, leave them out completely, or slice the list. A Python slice object is an acceptable input for this parameter, and so are other sequences from this module.
Example:
    from tango.common.sequences import ShuffledSequence

    l = [1, 2, 3, 4, 5, 6, 7, 8, 9]
    shuffled_l = ShuffledSequence(l)

    print(shuffled_l[0])
    print(shuffled_l[1])
    print(shuffled_l[2])
    assert len(shuffled_l) == len(l)

This will print something like the following:

    4
    7
    8
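The rule that element n of the output is `inner_sequence[indices[n]]` can be sketched with the standard library alone. `IndexedView` below is a hypothetical name, and the sketch is an assumption about the general technique rather than Tango's implementation. Unlike the real class, it accepts only index sequences, not slice objects.

```python
import random
from collections.abc import Sequence


class IndexedView(Sequence):
    """Hypothetical view where element n is inner[indices[n]]."""

    def __init__(self, inner, indices=None):
        self.inner = inner
        if indices is None:
            # No indices given: use a random permutation of all positions.
            indices = list(range(len(inner)))
            random.shuffle(indices)
        self.indices = indices

    def __len__(self):
        return len(self.indices)

    def __getitem__(self, n):
        if isinstance(n, slice):
            return IndexedView(self.inner, self.indices[n])
        return self.inner[self.indices[n]]


l = [10, 20, 30, 40]
repeated = list(IndexedView(l, [0, 0, 3]))           # repeat 10, leave out 20 and 30
reversed_l = list(IndexedView(l, range(3, -1, -1)))  # any index sequence works
shuffled = list(IndexedView(l))                      # random order, same elements
```

Because only the index list is permuted, the inner sequence is never copied or reordered.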
- class tango.common.sequences.SlicedSequence(inner_sequence, s)
Produces a sequence that’s a slice into another sequence, without copying the elements.
This assumes that the inner sequence never changes. If it does, the results are undefined.
- Parameters:
Example:
    from tango.common.sequences import SlicedSequence

    l = [1, 2, 3, 4, 5, 6, 7, 8, 9]
    sliced_l = SlicedSequence(l, slice(1, 4))

    print(sliced_l[0])
    print(sliced_l[1])
    print(sliced_l[2])
    assert len(sliced_l) == 3

This will print the following:

    2
    3
    4
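One way to slice without copying is to translate the slice into a `range` of positions and look elements up on demand. The following sketch shows that technique with a hypothetical `SliceView` class. It is not Tango's implementation, and it supports integer indices only.

```python
from collections.abc import Sequence


class SliceView(Sequence):
    """Hypothetical copy-free slice view over another sequence."""

    def __init__(self, inner, s):
        self.inner = inner
        # slice.indices() resolves start/stop/step against the inner length.
        self.positions = range(*s.indices(len(inner)))

    def __len__(self):
        return len(self.positions)

    def __getitem__(self, i):
        # Integer indices only; positions[i] raises IndexError when out of range.
        return self.inner[self.positions[i]]


l = [1, 2, 3, 4, 5, 6, 7, 8, 9]
sliced_l = SliceView(l, slice(1, 4))
every_third_backwards = SliceView(l, slice(None, None, -3))
```

The view stores a `range`, which takes constant memory regardless of how many elements the slice covers.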
- class tango.common.sequences.SqliteSparseSequence(filename, read_only=False)
This is a sparse sequence that pickles elements to a Sqlite database.
When you read from the sequence, elements are retrieved and unpickled lazily. That means creating/opening a sequence is very fast and does not depend on the length of the sequence.
This is a “sparse sequence” because you can set element n before you set element n-1:

    from tango.common.sequences import SqliteSparseSequence

    s = SqliteSparseSequence(filename)
    element = "Big number, small database."
    s[2**32] = element

    assert len(s) == 2**32 + 1
    assert s[2**32] == element
    assert s[1000] is None
    s.close()
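The behavior above (length is the highest index plus one, unset positions read as None) can be reproduced in a few lines with `sqlite3` and `pickle`. The sketch below uses a hypothetical `SparseStore` class with an assumed table layout. It is not Tango's schema or code, and it ignores read-only mode and multi-process access.

```python
import pickle
import sqlite3


class SparseStore:
    """Hypothetical sparse sequence: pickled values keyed by index in SQLite."""

    def __init__(self, filename):
        self.db = sqlite3.connect(filename)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS items (idx INTEGER PRIMARY KEY, value BLOB)"
        )

    def __setitem__(self, i, value):
        with self.db:  # commits the transaction on success
            self.db.execute(
                "INSERT OR REPLACE INTO items (idx, value) VALUES (?, ?)",
                (i, pickle.dumps(value)),
            )

    def __len__(self):
        (max_idx,) = self.db.execute("SELECT MAX(idx) FROM items").fetchone()
        return 0 if max_idx is None else max_idx + 1

    def __getitem__(self, i):
        if not 0 <= i < len(self):
            raise IndexError(i)
        row = self.db.execute(
            "SELECT value FROM items WHERE idx = ?", (i,)
        ).fetchone()
        # Positions that were never written read back as None.
        return None if row is None else pickle.loads(row[0])

    def close(self):
        self.db.close()


s = SparseStore(":memory:")  # in-memory database, for illustration only
s[2**32] = "Big number, small database."
length = len(s)
present = s[2**32]
missing = s[1000]
s.close()
```

Only one row is stored even though the sequence reports a length of 2**32 + 1, which is what keeps the database small.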
You can use a SqliteSparseSequence from multiple processes at the same time. This is useful, for example, if you’re filling out a sequence and you are partitioning ranges to processes.
- Parameters:
- close()
Closes the underlying Sqlite table. Do not use this sequence afterwards!
- Return type:
- copy_to(target)
Make a copy of this sequence at a new location.
This will attempt to make a hardlink, which is very fast, but only works on Linux and if target is on the same drive. If making a hardlink fails, it falls back to making a regular copy. As a result, there is no guarantee whether you will get a hardlink or a copy. If you get a hardlink, future edits in the source sequence will also appear in the target sequence. This is why we recommend not using copy_to() until you are done with the sequence. This is not ideal, but it is a compromise we make for performance.
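The hardlink-then-copy fallback, and the reason later edits can leak into the target, can be illustrated on an ordinary file. The helper `link_or_copy` and the file names below are hypothetical, and this is a sketch of the general pattern, not the body of copy_to().

```python
import os
import shutil
import tempfile


def link_or_copy(source, target):
    """Try a hardlink first; fall back to a regular copy if linking fails."""
    try:
        os.link(source, target)
        return "hardlink"
    except OSError:
        shutil.copy(source, target)
        return "copy"


with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "source.bin")
    target = os.path.join(tmp, "target.bin")
    with open(source, "wb") as f:
        f.write(b"v1")

    kind = link_or_copy(source, target)

    # Edit the source in place after the target was created.
    with open(source, "ab") as f:
        f.write(b"+edit")
    with open(target, "rb") as f:
        target_bytes = f.read()

    # A hardlink shares the file contents; a regular copy does not.
    shares_edits = target_bytes == b"v1+edit"
```

Which branch runs depends on the platform and file system, so `kind` may be either value. In both cases `shares_edits` is true exactly when a hardlink was made, which is the caveat described above.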