I want to shuffle a large file with millions of lines of strings in Linux. I tried 'sort -R', but it is very slow (it takes about 50 minutes for a 16M file). Is there a faster utility that I can use in its place?
Millions of lines in a 16 MB file: do you have very short lines? BTW, 16 MB is not big. It will fit in core, and sorting will take less than a second, I guess. – wildplasser Feb 6 '13 at 10:56
@AndersLindahl: What entropy source does shuf use? Is it as random as 'sort -R'? – alpha_cod Feb 6 '13 at 11:05
@wildplasser: Oh... it's a 16-million-line file, not 16 MB. Sorting is quite fast on this file, but 'sort -R' is very slow. – alpha_cod Feb 6 '13 at 11:05
@alpha_cod: I would guess it's /dev/random. You can control the entropy source with --random-source. – Anders Lindahl Feb 6 '13 at 11:33
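For readers following the comments, here is a minimal sketch of the shuf invocation being discussed, assuming GNU coreutils is installed. The temporary files and the generated sample data are placeholders for illustration, not part of the original question, and /dev/urandom is just one possible value for --random-source.

```shell
# Placeholder input and output files (hypothetical; substitute your own paths).
input=$(mktemp)
output=$(mktemp)

# Generate 1000 sample lines so the sketch is self-contained.
seq 1 1000 > "$input"

# Shuffle every line of the input and write the result to the output file.
# --random-source names the file shuf reads its random bytes from.
shuf --random-source=/dev/urandom -o "$output" "$input"
```

The output holds the same lines as the input in a random order. Unlike 'sort -R', which sorts by a random hash of each line, shuf produces a random permutation directly.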