
> I think the intent is to calculate medians on longer running streams of data than that. Why would you even use this algorithm for 5 values?

I don't know if you realize, but these kinds of comments are incredibly frustrating to respond to. It's like you didn't even try to understand the comment before replying. I was not saying "it's terrible because it easily fails on 5 elements". I was saying "it's terrible because it easily fails, and to illustrate, here's an example with 5 elements that will help you understand the problem I'm talking about". Does that make sense? Surely it's not rocket science to see that the number 5 wasn't special here? Surely you can see how, say, if your list starts at 2E9 and goes up to 3E9, the exact same problem would occur for a billion-element list, and hence that my gripe was probably not about 5-element lists?



It's an online algorithm. It's meant to be used on essentially infinite streams of data, such as a live dashboard of latencies for a running server. In that context, providing a 5-element list, or any finite list, as an example seems like a non sequitur.

Your examples do relate to problems the algorithm actually has in this application, but they manifest as things like "extremely long warmup time" and "adjusting to a regime change in the distribution taking time proportional to the size of the change". For instance, if your data is [1000, 1001, 1000, 1001, ...] then it takes 1000 steps to converge, which may be longer than the user has patience for.

However, the algorithm does always converge eventually, as long as the stream is not something pathological like an infinite sequence of consecutive numbers (for which the median is undefined anyhow).
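To make the warmup point concrete, here is a minimal sketch of the kind of 1-unit-memory estimator the thread appears to be discussing (Frugal-1U style); the start value of 0 and the unit step size are assumptions, not taken from the thread:

```python
def frugal_median(stream, estimate=0):
    # 1-unit-memory streaming median sketch: nudge the single
    # stored estimate by 1 toward each incoming value.
    for x in stream:
        if x > estimate:
            estimate += 1
        elif x < estimate:
            estimate -= 1
    return estimate

# The warmup example above: an alternating 1000/1001 stream.
# Starting from 0, every value exceeds the estimate, so it climbs
# by 1 per step and needs ~1000 observations before it is anywhere
# near the true median.
print(frugal_median([1000, 1001] * 600))
```

After the warmup, the estimate just oscillates within a unit of the true median, which is the steady-state behavior the convergence claim refers to.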


If you have a monotonically increasing stream of numbers, no algorithm is going to converge on a meaningful median, even if it's "correct" in a technical sense. This algorithm is intended to give a (very) cheap estimate for values with some kind of organic distribution (probably normal).

It doesn't have to be accurate for all possible edge cases because that's not the point of cheap estimation like this.


What I said has absolutely nothing to do with monotonicity. Pick any permutation of [2N, 2N + 1, ..., 3N - 1] for N as large as you want and you'll see the algorithm estimate the median as N, which is below even the minimum.
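A sketch of this failure mode, again assuming the 1-unit-memory estimator with a start value of 0 (both assumptions; the thread doesn't pin down the exact variant):

```python
import random

def frugal_median(stream, estimate=0):
    # 1-unit-memory estimator: step the estimate by 1 toward each value.
    for x in stream:
        if x > estimate:
            estimate += 1
        elif x < estimate:
            estimate -= 1
    return estimate

# A random permutation of [2N, 2N+1, ..., 3N-1]: N values, all >= 2N.
# The estimate can climb at most 1 per step, and every value stays
# above it, so after N steps it reaches exactly N -- below even the
# minimum of the data, regardless of the permutation chosen.
N = 1000
values = list(range(2 * N, 3 * N))
random.shuffle(values)
print(frugal_median(values))  # prints 1000; the true median is ~2.5N
```

Note the order genuinely doesn't matter here: since the estimate never catches up to the data, every permutation yields the same result, which is why the monotonicity objection misses the point.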


> [2N, 2N + 1, ..., 3N - 1]

This is practically the definition of monotonicity. (https://en.wikipedia.org/wiki/Monotonic_function)

You should read the paper, it doesn't have the problems you think it has when used on real world data.


The parent mentioned a permutation of those numbers. Does that affect your response?

Also note that the parent wasn't commenting on the algorithm in this HN submission, but rather the algorithm described in the top-level comment.


Quote from the paper cited in the post describing this algorithm:

> These algorithms do not perform well with adversarial streams, but we have mathematically analyzed the 1 unit memory algorithm and shown fast approach and stability properties for stochastic streams

Your criticism is pretty strident for pointing out something the authors apparently already understand and agree with.



