aprxc: A #Python #CLI tool to approximate the number of distinct values in a file/iterable using Chakraborty/Vinodchandran/Meel's ("coin flip") #algorithm¹.
https://codeberg.org/fa81/ApproxyCount
Vs. `sort | uniq -c | wc -l`: needs slightly more memory, but 5x faster.
Vs. `awk '!a[$0]++' | wc -l`: just as fast, using *much less* memory (20x-150x for large inputs).
At the cost of ~1% inaccuracy (configurable).
Useful? You decide! :)
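
For the curious, here is a minimal sketch of the "coin flip" idea as it is commonly described. This is not aprxc's actual code: the function name `cvm_estimate` and the parameters `buffer_size` and `rng` are made up for illustration, and re-halving a still-full buffer is a simplification of the published algorithm.

```python
import random


def cvm_estimate(items, buffer_size=1000, rng=None):
    """Estimate the number of distinct values in `items`.

    Illustrative sketch only; names and defaults are assumptions,
    not the aprxc API.
    """
    rng = rng or random.Random()
    p = 1.0        # current probability of keeping an item
    kept = set()   # bounded sample of the distinct items seen so far

    for item in items:
        # Forget any earlier decision about this item, then flip a
        # (biased) coin to decide whether to keep it this time.
        kept.discard(item)
        if rng.random() < p:
            kept.add(item)

        # Buffer full: drop each kept item with probability 1/2 and
        # halve the keep probability. Repeat in the unlikely case that
        # nothing was dropped (a simplification of the original).
        while len(kept) >= buffer_size:
            kept = {x for x in kept if rng.random() < 0.5}
            p /= 2

    # Each surviving item stands in for 1/p distinct values.
    return len(kept) / p


if __name__ == "__main__":
    words = "the quick brown fox jumps over the lazy dog the end".split()
    print(cvm_estimate(words))
```

While the input has fewer distinct values than `buffer_size`, the count is exact; beyond that, a larger buffer trades memory for accuracy.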