Count Distinct

Count-distinct problem is a problem of finding the number of distinct elements in a data set or data stream, within which you might possibly see some repeated elements. For example, [1, 3, 2, 1, 5, 2, 4] has 5 distinct elements [1, 2, 3, 4, 5].


HyperLogLog is an algorithm that can solve Count Distinct problem. It can provide estimated count on a very large data stream.