There are three strategies implemented that can be used to produce repeatable pseudo-random numbers across multiple processes (local or distributed).
SeedSequence
SeedSequence implements an algorithm to process a user-provided seed, typically as an integer of some size, and to convert it into an initial state for a BitGenerator. It uses hashing techniques to ensure that low-quality seeds are turned into high quality initial states (at least, with very high probability).
BitGenerator
For example, MT19937 has a state consisting of 624 uint32 integers. A naive way to take a 32-bit integer seed would be to just set the last element of the state to the 32-bit seed and leave the rest 0s. This is a valid state for MT19937, but not a good one. The Mersenne Twister algorithm suffers if there are too many 0s. Similarly, two adjacent 32-bit integer seeds (i.e. 12345 and 12346) would produce very similar streams.
MT19937
12345
12346
SeedSequence avoids these problems by using successions of integer hashes with good avalanche properties to ensure that flipping any bit in the input input has about a 50% chance of flipping any bit in the output. Two input seeds that are very close to each other will produce initial states that are very far from each other (with very high probability). It is also constructed in such a way that you can provide arbitrary-sized integers or lists of integers. SeedSequence will take all of the bits that you provide and mix them together to produce however many bits the consuming BitGenerator needs to initialize itself.
These properties together mean that we can safely mix together the usual user-provided seed with simple incrementing counters to get BitGenerator states that are (to very high probability) independent of each other. We can wrap this together into an API that is easy to use and difficult to misuse.
from numpy.random import SeedSequence, default_rng ss = SeedSequence(12345) # Spawn off 10 child SeedSequences to pass to child processes. child_seeds = ss.spawn(10) streams = [default_rng(s) for s in child_seeds]
Child SeedSequence objects can also spawn to make grandchildren, and so on. Each SeedSequence has its position in the tree of spawned SeedSequence objects mixed in with the user-provided seed to generate independent (with very high probability) streams.
grandchildren = child_seeds[0].spawn(4) grand_streams = [default_rng(s) for s in grandchildren]
This feature lets you make local decisions about when and how to split up streams without coordination between processes. You do not have to preallocate space to avoid overlapping or request streams from a common global service. This general “tree-hashing” scheme is not unique to numpy but not yet widespread. Python has increasingly-flexible mechanisms for parallelization available, and this scheme fits in very well with that kind of use.
Using this scheme, an upper bound on the probability of a collision can be estimated if one knows the number of streams that you derive. SeedSequence hashes its inputs, both the seed and the spawn-tree-path, down to a 128-bit pool by default. The probability that there is a collision in that pool, pessimistically-estimated (1), will be about
System Message: WARNING/2 (n^2*2^{-128})
latex exited with error [stdout] This is pdfTeX, Version 3.14159265-2.6-1.40.18 (TeX Live 2017/Debian) (preloaded format=latex) restricted \write18 enabled. entering extended mode (./math.tex LaTeX2e <2017-04-15> Babel <3.18> and hyphenation patterns for 3 language(s) loaded. (/usr/share/texlive/texmf-dist/tex/latex/base/article.cls Document Class: article 2014/09/29 v1.4h Standard LaTeX document class (/usr/share/texlive/texmf-dist/tex/latex/base/size12.clo)) (/usr/share/texlive/texmf-dist/tex/latex/base/inputenc.sty (/usr/share/texlive/texmf-dist/tex/latex/base/utf8.def (/usr/share/texlive/texmf-dist/tex/latex/base/t1enc.dfu) (/usr/share/texlive/texmf-dist/tex/latex/base/ot1enc.dfu) (/usr/share/texlive/texmf-dist/tex/latex/base/omsenc.dfu))) (/usr/share/texlive/texmf-dist/tex/latex/amsmath/amsmath.sty For additional information on amsmath, use the `?' option. (/usr/share/texlive/texmf-dist/tex/latex/amsmath/amstext.sty (/usr/share/texlive/texmf-dist/tex/latex/amsmath/amsgen.sty)) (/usr/share/texlive/texmf-dist/tex/latex/amsmath/amsbsy.sty) (/usr/share/texlive/texmf-dist/tex/latex/amsmath/amsopn.sty)) (/usr/share/texlive/texmf-dist/tex/latex/amscls/amsthm.sty) (/usr/share/texlive/texmf-dist/tex/latex/amsfonts/amssymb.sty (/usr/share/texlive/texmf-dist/tex/latex/amsfonts/amsfonts.sty)) ! LaTeX Error: File `anyfontsize.sty' not found. Type X to quit or <RETURN> to proceed, or enter new name. (Default extension: sty) Enter file name: ! Emergency stop. <read *> l.8 \usepackage {bm}^^M No pages of output. Transcript written on math.log.
System Message: WARNING/2 (2^{20})
System Message: WARNING/2 (2^{-88})
The algorithm is carefully designed to eliminate a number of possible ways to collide. For example, if one only does one level of spawning, it is guaranteed that all states will be unique. But it’s easier to estimate the naive upper bound on a napkin and take comfort knowing that the probability is actually lower.
In this calculation, we can ignore the amount of numbers drawn from each stream. Each of the PRNGs we provide has some extra protection built in that avoids overlaps if the SeedSequence pools differ in the slightest bit. PCG64 has
PCG64
System Message: WARNING/2 (2^{127})
System Message: WARNING/2 (2^{128})
Philox
SFC64
System Message: WARNING/2 (2^{64})
Philox is a counter-based RNG based which generates values by encrypting an incrementing counter using weak cryptographic primitives. The seed determines the key that is used for the encryption. Unique keys create unique, independent streams. Philox lets you bypass the seeding algorithm to directly set the 128-bit key. Similar, but different, keys will still create independent streams.
import secrets from numpy.random import Philox # 128-bit number as a seed root_seed = secrets.getrandbits(128) streams = [Philox(key=root_seed + stream_id) for stream_id in range(10)]
This scheme does require that you avoid reusing stream IDs. This may require coordination between the parallel processes.
jumped advances the state of the BitGenerator as-if a large number of random numbers have been drawn, and returns a new instance with this state. The specific number of draws varies by BitGenerator, and ranges from
jumped
Period
Jump Size
Bits
System Message: WARNING/2 (2^{19937})
32
System Message: WARNING/2 (~2^{127})
64
System Message: WARNING/2 (2^{256})
The jump size is
System Message: WARNING/2 ((\phi-1)*2^{128})
System Message: WARNING/2 (\phi)
jumped can be used to produce long blocks which should be long enough to not overlap.
import secrets from numpy.random import PCG64 seed = secrets.getrandbits(128) blocked_rng = [] rng = PCG64(seed) for i in range(10): blocked_rng.append(rng.jumped(i))
When using jumped, one does have to take care not to jump to a stream that was already used. In the above example, one could not later use blocked_rng[0].jumped() as it would overlap with blocked_rng[1]. Like with the independent streams, if the main process here wants to split off 10 more streams by jumping, then it needs to start with range(10, 20), otherwise it would recreate the same streams. On the other hand, if you carefully construct the streams, then you are guaranteed to have streams that do not overlap.
blocked_rng[0].jumped()
blocked_rng[1]
range(10, 20)