Sequences

This module contains utilities for building sequences out of other sequences. All of them are lazy, so creating one takes minimal time and memory. They work particularly well in combination. For example, you can concatenate two sequences (ConcatenatedSequence) and then shuffle the result (ShuffledSequence).

This module is not dependent on other Tango modules and can be used in isolation.
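The composition described above can be sketched with the standard library alone. The classes `LazyConcat` and `IndexedView` below are hypothetical stand-ins written for illustration, not the module's implementation; they only show why wrapping one lazy view in another costs no copying of the underlying elements.

```python
import random
from bisect import bisect_right
from collections.abc import Sequence
from itertools import accumulate


class LazyConcat(Sequence):
    """Hypothetical stand-in: a read-only view over several sequences."""

    def __init__(self, *sequences):
        self.sequences = sequences
        # Cumulative end offsets, e.g. lengths (3, 2) give [3, 5].
        self.ends = list(accumulate(len(s) for s in sequences))

    def __len__(self):
        return self.ends[-1] if self.ends else 0

    def __getitem__(self, i):
        # This sketch supports integer indices only, not slices.
        if not isinstance(i, int):
            raise TypeError("integer index expected")
        if i < 0:
            i += len(self)
        if not 0 <= i < len(self):
            raise IndexError(i)
        which = bisect_right(self.ends, i)  # inner sequence holding index i
        start = self.ends[which - 1] if which > 0 else 0
        return self.sequences[which][i - start]


class IndexedView(Sequence):
    """Hypothetical stand-in: element n is inner[indices[n]]."""

    def __init__(self, inner, indices):
        self.inner = inner
        self.indices = indices

    def __len__(self):
        return len(self.indices)

    def __getitem__(self, n):
        return self.inner[self.indices[n]]


train = [1, 2, 3]
dev = [4, 5]
combined = LazyConcat(train, dev)

# Shuffle a list of positions, never the data itself.
order = list(range(len(combined)))
random.Random(0).shuffle(order)
shuffled = IndexedView(combined, order)

print(list(shuffled))
```

Only the list of positions is materialized here; the elements stay in `train` and `dev` and are looked up on access.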

class tango.common.sequences.ConcatenatedSequence(*sequences)

Produces a sequence that’s the lazy concatenation of multiple other sequences. It does not copy any of the elements of the original sequences.

This assumes that the inner sequences never change. If they do, the results are undefined.

Parameters

sequences (Sequence) – the inner sequences to concatenate

Example:

from tango.common.sequences import ConcatenatedSequence
l1 = [1, 2, 3]
l2 = [4, 5]
l3 = [6]
cat_l = ConcatenatedSequence(l1, l2, l3)

assert len(cat_l) == 6
for i in cat_l:
    print(i)

This will print the following:

1
2
3
4
5
6

class tango.common.sequences.MappedSequence(fn, inner_sequence)

Produces a sequence that applies a function to every element of another sequence.

This is similar to Python’s map(), but it returns a sequence instead of a map object, so the result has a length and can be indexed.

Parameters
  • fn (Callable) – the function to apply to every element of the inner sequence. The function should take one argument.

  • inner_sequence (Sequence) – the inner sequence to map over
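To make the contrast with map() concrete, the sketch below compares a built-in map object with a tiny sequence wrapper. `LazyMap` is a hypothetical name used only for illustration and is not the library's class; it handles integer indices only.

```python
from collections.abc import Sequence


def square(x):
    return x * x


numbers = [1, 2, 3, 4]
lazy_iter = map(square, numbers)

# A map object is a one-shot iterator: it has no len() and no indexing.
try:
    len(lazy_iter)
    has_len = True
except TypeError:
    has_len = False

first_pass = list(lazy_iter)
second_pass = list(lazy_iter)  # empty, because the iterator is exhausted


class LazyMap(Sequence):
    """Hypothetical stand-in: applies fn on access, stores no results."""

    def __init__(self, fn, inner):
        self.fn = fn
        self.inner = inner

    def __len__(self):
        return len(self.inner)

    def __getitem__(self, i):
        # Integer indices only in this sketch.
        return self.fn(self.inner[i])


mapped = LazyMap(square, numbers)
print(len(mapped), mapped[2], list(mapped), list(mapped))
```

Because the wrapper recomputes `fn` on every access, it can be iterated any number of times, at the cost of calling the function again each time.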

Example:

from tango.common.sequences import MappedSequence

def square(x):
    return x * x

l = [1, 2, 3, 4]
map_l = MappedSequence(square, l)

assert len(map_l) == len(l)
for i in map_l:
    print(i)

This will print the following:

1
4
9
16

class tango.common.sequences.ShuffledSequence(inner_sequence, indices=None)

Produces a shuffled view of a sequence, such as a list.

This assumes that the inner sequence never changes. If it does, the results are undefined.

Parameters
  • inner_sequence (Sequence) – the inner sequence that’s being shuffled

  • indices (Optional[Sequence[int]], default: None) – Optionally, you can specify a list of indices here. If you don’t, we’ll just shuffle the inner sequence randomly. If you do specify indices, element n of the output sequence will be inner_sequence[indices[n]]. This gives you great flexibility. You can repeat elements, leave them out completely, or slice the list. A Python slice object is an acceptable input for this parameter, and so are other sequences from this module.
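The rule "element n of the output is inner_sequence[indices[n]]" can be illustrated in plain Python. The helper `view` below is a hypothetical name, and unlike the class documented here it materializes the result eagerly; it is meant only to show what different index lists select.

```python
inner = ["a", "b", "c", "d", "e"]


def view(inner_sequence, indices):
    """Eagerly apply the rule: element n is inner_sequence[indices[n]]."""
    return [inner_sequence[i] for i in indices]


repeated = view(inner, [0, 0, 3])   # an index may appear more than once
subset = view(inner, [4, 1])        # indices that are absent are left out
window = view(inner, range(1, 4))   # a contiguous window, given as indices

print(repeated, subset, window)
```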

Example:

from tango.common.sequences import ShuffledSequence
l = [1, 2, 3, 4, 5, 6, 7, 8, 9]
shuffled_l = ShuffledSequence(l)

print(shuffled_l[0])
print(shuffled_l[1])
print(shuffled_l[2])
assert len(shuffled_l) == len(l)

This will print something like the following:

4
7
8

class tango.common.sequences.SlicedSequence(inner_sequence, s)

Produces a sequence that’s a slice into another sequence, without copying the elements.

This assumes that the inner sequence never changes. If it does, the results are undefined.

Parameters
  • inner_sequence (Sequence) – the inner sequence that’s being sliced

  • s (slice) – the slice to apply to the inner sequence
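One way to slice without copying elements is to resolve the slice into a range of positions and look elements up on demand. The sketch below shows that idea with built-in types only; it is an assumption about how such a view could work, not a description of the library's internals.

```python
inner = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Slicing a range yields another range: constant memory, no element copies.
positions = range(len(inner))[slice(1, 4)]
forward = [inner[i] for i in positions]

# Negative steps resolve the same way.
reverse_positions = range(len(inner))[slice(None, None, -2)]
backward = [inner[i] for i in reverse_positions]

print(positions, forward)
print(reverse_positions, backward)
```

The length of the sliced view is simply `len(positions)`, which a range computes without iterating.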

Example:

from tango.common.sequences import SlicedSequence
l = [1, 2, 3, 4, 5, 6, 7, 8, 9]
sliced_l = SlicedSequence(l, slice(1, 4))

print(sliced_l[0])
print(sliced_l[1])
print(sliced_l[2])
assert len(sliced_l) == 3

This will print the following:

2
3
4

class tango.common.sequences.SqliteSparseSequence(filename, read_only=False)

This is a sparse sequence that pickles elements to a Sqlite database.

When you read from the sequence, elements are retrieved and unpickled lazily. That means creating/opening a sequence is very fast and does not depend on the length of the sequence.

This is a “sparse sequence” because you can set element n before you set element n-1:

s = SqliteSparseSequence(filename)
element = "Big number, small database."
s[2**32] = element
assert len(s) == 2**32 + 1
assert s[2**32] == element
assert s[1000] is None
s.close()

You can use a SqliteSparseSequence from multiple processes at the same time. This is useful, for example, if you’re filling out a sequence and you are partitioning ranges to processes.

Parameters
  • filename (Union[str, PathLike]) – the filename at which to store the data

  • read_only (bool, default: False) – Set this to True if you only want to read.
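The idea of a sparse, pickled sequence backed by SQLite can be sketched with the standard `sqlite3` and `pickle` modules. `TinySparseStore`, its table name, and its column names are all hypothetical; the real class may use a different schema and different locking. The sketch uses an in-memory database so it leaves no file behind.

```python
import pickle
import sqlite3


class TinySparseStore:
    """Hypothetical sketch: index -> pickled value, stored one row per element."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS items (idx INTEGER PRIMARY KEY, value BLOB)"
        )

    def __setitem__(self, i, value):
        blob = pickle.dumps(value)
        with self.conn:  # commit on success, roll back on error
            self.conn.execute(
                "INSERT OR REPLACE INTO items (idx, value) VALUES (?, ?)",
                (i, blob),
            )

    def __len__(self):
        # The length is one past the highest index that has been set.
        (highest,) = self.conn.execute("SELECT MAX(idx) FROM items").fetchone()
        return 0 if highest is None else highest + 1

    def __getitem__(self, i):
        if not 0 <= i < len(self):
            raise IndexError(i)
        row = self.conn.execute(
            "SELECT value FROM items WHERE idx = ?", (i,)
        ).fetchone()
        # Unset positions inside the range read back as None.
        return None if row is None else pickle.loads(row[0])

    def close(self):
        self.conn.close()


store = TinySparseStore()
store[2**32] = "Big number, small database."
length = len(store)
value = store[2**32]
missing = store[1000]
store.close()

print(length, value, missing)
```

Only one row exists in the table even though the reported length exceeds four billion, which is what makes the structure sparse.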

clear()

Clears the entire sequence.

Return type

None

close()

Closes the underlying Sqlite table. Do not use this sequence afterwards!

Return type

None

copy_to(target)

Make a copy of this sequence at a new location.

Parameters

target (Union[str, PathLike]) – the location of the copy

This will attempt to make a hardlink, which is very fast, but only works on Linux and only if target is on the same drive. If making a hardlink fails, it falls back to making a regular copy. As a result, there is no guarantee whether you will get a hardlink or a copy. If you get a hardlink, future edits to the source sequence will also appear in the target sequence. For this reason, we recommend not calling copy_to() until you are done writing to the sequence. This is not ideal, but it is a compromise we make for performance.
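The hardlink-then-copy fallback, and the reason later edits can leak into the target, can be demonstrated with the standard library. `link_or_copy` is a hypothetical helper written for this illustration; it is not the method's actual implementation.

```python
import os
import shutil
import tempfile


def link_or_copy(source, target):
    """Try a hard link first; fall back to a regular copy if that fails."""
    try:
        os.link(source, target)
        return "hardlink"
    except OSError:
        shutil.copy2(source, target)
        return "copy"


with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "source.bin")
    target = os.path.join(tmp, "target.bin")

    with open(source, "wb") as f:
        f.write(b"first")

    method = link_or_copy(source, target)

    # Edit the source after the "copy" was made.
    with open(source, "ab") as f:
        f.write(b" second")

    with open(target, "rb") as f:
        target_bytes = f.read()

    # With a hard link both names refer to the same file, so the edit shows up.
    shares_edits = target_bytes == b"first second"

print(method, shares_edits)
```

Which branch runs depends on the platform and filesystem, so the caller cannot rely on either outcome.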

extend(values)

Extends the sequence by appending the elements of the iterable values.

Return type

None

insert(i, value)

Inserts value before index i.

Return type

None