IT博客汇 | The Genuine JavaScript Sieve of Eratosthenes

The Genuine JavaScript Sieve of Eratosthenes

reg@braythwayt.com (Reginald Braithwaite)发表于 2016-04-22 00:00:00

melissa e. o’neill

Melissa E. O'Neill lecturing at Stanford

the unfaithful sieve

In the last post, we looked at the Sieve of Eratosthenes. The definition is:

We start with a table of numbers (e.g., 2, 3, 4, 5, . . . ) and progressively cross off numbers in the table until the only numbers left are primes. Specifically, we begin with the first number, p, in the table, and:

Declare p to be prime, and cross off all the multiples of that number in the table, starting from p squared, then;

Find the next number in the table after p that is not yet crossed off and set p to that number; and then repeat from step 1.

The solution given was the most naïve possible mapping from words to code:

// General-Purpose Lazy Operations

function * range (from = 0, to = null) {
  let number = from;

  if (to == null) {
    while (true) {
      yield number++
    }
  }
  else {
    while (from <= to) {
      yield number++;
    }
  }
}

function * take (numberToTake, iterable) {
  const iterator = iterable[Symbol.iterator]();

  for (let i = 0; i < numberToTake; ++i) {
    const { done, value } = iterator.next();
    if (!done) yield value;
  }
}

function * compact (list) {
  for (const element of list) {
    if (element != null) {
      yield element;
    }
  }
}

// The implementation

function * nullEveryNth (skipFirst, n, iterable) {
  const iterator = iterable[Symbol.iterator]();

  yield * take(skipFirst, iterator);

  while (true) {
    yield * take(n - 1, iterator);
    iterator.next();
    yield null;
  }
}

function * sieve (iterable) {
  const iterator = iterable[Symbol.iterator]();
  let n;

  do {
    const { value } = iterator.next();

    n = value;
    yield n;
  } while (n == null);

  yield * sieve(nullEveryNth(n * (n - 2), n, iterator));
}

const Primes = compact(sieve(range(2)));

This is naïve in the sense that it mimics what a child does when the sieve is explained to them for the first time. Given a big table of numbers, they start crossing them out using what we know to be modulo arithmetic: They scan forward number by number, counting as they go:

One TWO (cross out), one TWO (cross out), one TWO (cross out), …

One two THREE (cross out), one two THREE (cross out), one two THREE (cross out), …

One two three four FIVE (cross out), one two three four FIVE (cross out), one two three four FIVE (cross out), …

This conforms to the description given above, but it has the performance characteristics of Trial Division: Every number, whether prime or not, must be ‘touched’ by every prime smaller than it. Melissa O’Neill calls this an “Unfaithful Sieve.”

altering our metaphor

Instead of thinking of “crossing numbers out of a list,” let’s think of the sieve in the title: Let’s build an actual sieve, a data structure that we can use to determine whether a number is prime or not.

Our sieve will be an object with a constructor and two methods of note:

addAll(iterable) adds all the elements of iterable to our sieve. It is required that the elements of iterable be ordered, and that the first element of iterable be larger than the lowest number of any iterable already added.
has(number) tests whether number is present in our sieve. It is required that successive calls to has must provide numbers that increase. In other words, calls to has are also ordered. Since calls to has are ordered by definition, the sieve is free to internally discard number if it returns true.

Given such a data structure, our generator for primes is much simpler:

function * Primes () {
  let prime = 2;
  const composites = new Sieve();

  while (true) {
    yield prime;
    composites.addAll(multiplesOf(prime * prime, prime));

    while (composites.has(++prime)) {
      // do nothing
    }
  }
}

So let’s write a Sieve class. Our first crack will actually use the same “unfaithful” approach. Only instead of “crossing out” numbers in a list, we’ll merge lists of composite numbers together. Here is a generator that takes two ordered lists and merges them naïvely:

function * merge (aIterable, bIterable) {
  const aIterator = aIterable[Symbol.iterator]();
  const bIterator = bIterable[Symbol.iterator]();
  let { done: aDone, value: aValue } = aIterator.next();
  let { done: bDone, value: bValue } = bIterator.next();

  while (true) {
    if (aDone && bDone) {
      return;
    } else if (aDone) {
      yield bValue;
      yield * bIterator;
      return;
    } else if (bDone) {
      yield aValue;
      yield * aIterator;
      return;
    } else if (aValue <= bValue) {
      yield aValue;
      ({ done: aDone, value: aValue } = aIterator.next());
    } else {
      yield bValue;
      ({ done: bDone, value: bValue } = bIterator.next());
    }
  }
}

We want to work with ordered lists that are also unique, so this function eliminates duplicates:

function * unique (iterable) {
  const iterator = iterable[Symbol.iterator]();
  let lastYielded = {};
  let { done, value } = iterator.next();

  while (!done) {
    if (value !== lastYielded) {
      yield value;
      lastYielded = value;
    }
    ({ done, value } = iterator.next());
  }
}

We need to get the first element of a list and the rest of a list on a regular basis. Here’s a helper that works very nicely with JavaScript’s destructuring assignment:

function destructure (iterable) {
  const iterator = iterable[Symbol.iterator]();
  const { done, value } = iterator.next();

  if (!done) {
    return { first: value, rest: iterator }
  }
}

With these in hand, we can write our sieve in the new way. As we collect primes, we create a list of composite numbers by collecting the multiples of each primes, starting with the prime squared. So for two, our composites are 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, .... For three, they’re 9, 12, 15, 18, 21, 24, ..., and for five they’re 25, 30, 35, 40, 45, ....

function * multiplesOf (startingWith, n) {
  let number = startingWith;

  while (true) {
    yield number;
    number = number + n;
  }
}

By successively merging them together, we get a list of numbers that aren’t prime. The merge of the composites above is 4, 6, 8, 9, 10, 12, 12, 14, 15, 16, 18, 18, 20, 21, 22, 24, 24, 25, ..., which we can pass to unique to get 4, 6, 8, 9, 10, 12, 14, 15, 16, 18, 20, 21, 22, 24, 25, ...`.

Here is our MergeSieve class. It implements addAll by merging the new iterator with its existing iterator of composite numbers, and it implements has by checking whether the number provided is equal to the first number in its list. If it is, it removes the first.

class MergeSieve {
  constructor () {
    this._first = undefined;
    this._rest = [];
  }

  addAll (iterable) {
    this._rest = unique(merge(this._rest, iterable));
    if (this._first == null) {
      ({
        first: this._first,
        rest: this._rest
      } = destructure(this._rest));
    }
  }

  has (number) {
    while (this._first < number) {
      this._removeFirst();
    }
    if (number === this._first) {
      this._removeFirst();
      return true;
    }
    else return false;
  }

  _removeFirst () {
    ({
      first: this._first,
      rest: this._rest
    } = destructure(this._rest));
  }
}

function * Primes () {
  let prime = 2;
  const composites = new MergeSieve();

  while (true) {
    yield prime;
    composites.addAll(multiplesOf(prime * prime, prime));

    while (composites.has(++prime)) {
      // do nothing
    }
  }
}

take(100, Primes())
  //=>
    [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47,
     53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103, 107,
     109, 113, 127, 131, 137, 139, 149, 151, 157, 163, 167,
     173, 179, 181, 191, 193, 197, 199, 211, 223, 227, 229,
     233, 239, 241, 251, 257, 263, 269, 271, 277, 281, 283,
     293, 307, 311, 313, 317, 331, 337, 347, 349, 353, 359,
     367, 373, 379, 383, 389, 397, 401, 409, 419, 421, 431,
     433, 439, 443, 449, 457, 461, 463, 467, 479, 487, 491,
     499, 503, 509, 521, 523, 541]

So far so good, but looking at our implementation of merge, we can see that the way it works is that as we take things from a collection of lists merged together, we’re invoking a series of comparisons, one for each list. So every time we come across a composite number, we’re invoking one comparison for each prime less than the composite number.

Therefore, checking a number like 26 requires a comparison for the multiples of 2, 3, 5, 7, 11 and so on up to 23 even though it’s only divisible by 2 and 13. This is roughly equivalent in performance to our naïve implementation from the last post.

The ideal performance of the Sieve of Eratosthenes is that every composite number gets crossed out once for each of its factors less than its square root. Therefore, a number like 26 will get crossed out for 2, but not 3, 5, or any other prime including 13, its other factor.

Nevertheless, this is a step towards a better implementation in a different way: By isolating the composites in their own lazy structure, we can swap out the naïve merge for something faster.

hash checking

As described, the Sieve of Eratosthenes uses a large set of numbers and checks each one off. If we were to faithfully reproduce it as we iterate over prime numbers, we’d require space proportional to the cardinality of the largest prime returned. So if we collect 100 primes, we’d need space proportional to 541, the 100th prime.

We don’t want to do that, we want to maintain space proportional to the number of primes found. So for the 100th prime, we’d require space proportional to 100. The ratio grows as we collect more primes, so this is important.

The first and obvious step is to use a data structure more accommodating of sparse data. Since JavaScript makes hash tables easy, we’ll start with that.

Instead of our naïve lazy merge, let’s put our prime iterators in a hash table, indexed by the next number to extract. Two iterators can have the same next number, so we’ll keep a list of iterators for each number.

When we start, our HashMerge will have one iterable, at index 4. Its remaining numbers will be 6, 8, and d so on. We then add another at 9, with numbers 12, 15, and so on. We try removing 4, and when we do so, we re-merge the iterator for multiples of two, but now it will be at number 6, with remaining numbers 8, 10, and so on.

Thus, _remove is always relocating iterables to their next higher number. When two or more iterators end up at the same index (like 12), all get relocated.

class HashSieve {
  constructor () {
    this._hash = Object.create(null);
  }

  addAll (iterable) {
    const { first, rest } = destructure(iterable);

    if (this._hash[first]) {
      this._hash[first].push(rest);
    }
    else this._hash[first] = [rest];

    return this;
  }

  has (number) {
    if (this._hash[number]) {
      this._remove(number);
      return true;
    }
    else return false;
  }

  _remove (number) {
    const iterables = this._hash[number];

    if (iterables == null) return false;

    delete this._hash[number];
    iterables.forEach((iterable) => this.addAll(iterable));

    return number;
  }
}

function * Primes () {
  let prime = 2;
  const composites = new HashSieve();

  while (true) {
    yield prime;
    composites.addAll(multiplesOf(prime * prime, prime));

    while (composites.has(++prime)) {
      // do nothing
    }
  }
}

take(100, Primes())
  //=>
    [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47,
     53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103, 107,
     109, 113, 127, 131, 137, 139, 149, 151, 157, 163, 167,
     173, 179, 181, 191, 193, 197, 199, 211, 223, 227, 229,
     233, 239, 241, 251, 257, 263, 269, 271, 277, 281, 283,
     293, 307, 311, 313, 317, 331, 337, 347, 349, 353, 359,
     367, 373, 379, 383, 389, 397, 401, 409, 419, 421, 431,
     433, 439, 443, 449, 457, 461, 463, 467, 479, 487, 491,
     499, 503, 509, 521, 523, 541]

This is much better than our MergeSieve. The check for has involves the constant overhead of performing a hash lookup, of course, and that is more expensive than the === test of MergeSieve. But what happens when each sieve removes the no-longer-needed number?

When MergeSieve tests a composite number, it iterates. Because of the way merge is written, iterating requires an if statement for the number of iterables seen so far -1. In other words, every time it hits a composite number, if there are p primes less than the composite number, MergeSieve needs to perform p-1 operations, regardless of how many prime factors the composite number actually has.
When HashSieve tests a composite number, it iterates and relocates each of the iterators at that number. There will be one iterator for each prime factor, and only the prime factors less than the square root of the composite number will be included. However, for each one, there is a large constant factor as we have to perform an insert in the has table. Finally, there is a single remove from the hash table. So HashSieve does fewer operations, but each is more expensive.

As the numbers grow, primes become scarcer, but the total number of primes grows. Therefore, MergeSieve gets slower and slower as it is performing operations for each prime discovered so far. HashSieve catches up and then gets relatively faster and faster.

summary

Not all Sieves of Eratosthenes have the desired performance of time proportional to the number of prime factors for each composite number. Those that don’t may or may not be useful for illustrating a point about mixing lazy and non-lazy operations, but they are “unfaithful” to the Sieve of Eratosthenes’ performance and should not be considered authentic implementations.

(This essay is adapted from The Genuine Sieve of Eratosthenes by Melissa E. O’Neill)

postscript: queues

Our HashSieve is much better than our MergeSieve, and the code looks simpler to boot. That being said, it could be even better. Hash tables are very nice, and fast for most purposes. But they are not perfect. Adding and removing elements involves monkeying around behind the scenes with hashing functions and buckets. Every time we look a number up, we run a hashing function on it.

Also, we only ever check the smallest number in the table with .has. There’s actually no need for a fancy lookup that is actually checking all the numbers. If we knew what the smallest number was, we could check that with straight up ===, just like MergeSieve. The complexity would, of course, involve figuring out what the next smallest number would be when removing iterables and re-merging them.

We’d need a priority queue.