MyTetra Share
Shuffle the lines of a file randomly
Created: 02.02.2017 17:38
Author: http://stackoverflow.com/users/862763/alpha-cod
Tags: linux file shuffle
Section: Linux


I want to shuffle a large file with millions of lines of strings in Linux. I tried 'sort -R', but it is very slow (it takes about 50 minutes for a 16M file). Is there a faster utility that I can use in its place?

linux bash unix

asked Feb 6 '13 at 10:48

alpha_cod

Shuf? en.wikipedia.org/wiki/Shuf – Anders Lindahl Feb 6 '13 at 10:51

Millions of lines in a 16 MB file: do you have very short lines? BTW, 16 MB is not big. It will fit in core, and sorting will take less than a second, I guess. – wildplasser Feb 6 '13 at 10:56

@AndersLindahl: What's the entropy shuf introduces? Is it as random as 'sort -R'? – alpha_cod Feb 6 '13 at 11:05

@wildplasser: Oh... it's a 16-million-line file, not 16 MB. Sorting is quite fast on this file, but 'sort -R' is very slow. – alpha_cod Feb 6 '13 at 11:05

@alpha_cod: I would guess it's /dev/random. You can control the entropy source with --random-source. – Anders Lindahl Feb 6 '13 at 11:33

This is a similar thread: stackoverflow.com/questions/2153882/… – Ifthikhan Feb 6 '13 at 12:10

@AndersLindahl How about suggesting that as an answer? – that other guy Feb 6 '13 at 20:02

3 Answers

Use shuf instead of sort -R (man page).

The slowness of sort -R is probably due to it hashing every line. shuf just does a random permutation, so it doesn't have that problem.
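A small sketch of that difference, assuming GNU coreutils (sort -R and shuf are both GNU tools) and a throwaway sample file created only for the demonstration. With GNU sort, -R orders lines by a random hash of their content, so identical lines always come out next to each other, while shuf permutes line positions without looking at the content.

```shell
# Sample input with duplicate lines (a stand-in for the real file).
input=$(mktemp)
printf '%s\n' a b a b a b > "$input"

# sort -R groups identical lines together, so collapsing adjacent
# repeats with uniq always leaves exactly 2 lines here.
sort -R "$input" | uniq | wc -l

# shuf ignores line content, so repeats may land anywhere; the output
# still holds the same lines as the input, only in a different order.
shuffled=$(mktemp)
shuf "$input" > "$shuffled"
sort "$shuffled" | uniq -c
```

On the real file the call would be something like shuf input.txt > shuffled.txt, where both file names are placeholders.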

(This was suggested in a comment but for some reason not written as an answer by anyone)
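
The --random-source option mentioned in the comments can also make a shuffle repeatable. The sketch below assumes GNU shuf and uses a small generated input; the saved seed file and its 4096-byte size are illustrative choices, not something from the thread. Larger inputs consume more random bytes, so the seed file would need to grow with them.

```shell
# Sample input and a file of saved random bytes (both temporary).
input=$(mktemp)
seed=$(mktemp)
seq 1 100 > "$input"
head -c 4096 /dev/urandom > "$seed"

# Reading randomness from the same file twice gives the same permutation.
first=$(shuf --random-source="$seed" "$input")
second=$(shuf --random-source="$seed" "$input")

if [ "$first" = "$second" ]; then
    echo "same order both times"
fi
```

Pointing --random-source at /dev/urandom instead draws fresh bytes on every run, so each run gives a different order.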
