How to use grep/ack with files in arbitrary encoding?

MyTetra Share

Created: 26.03.2017 16:54

Tags: linux grep luit encoding

Section: Linux

Record: Velonski/mytetra-database/master/base/1490529280ncblyj6sx1/text.html on raw.githubusercontent.com

On my Linux desktop I have a UTF-8 locale. When I try to search some KOI8-R encoded files with grep (ack), it fails. If I manually encode the pattern into KOI8-R and pass that as an argument, it works.

Is it possible to tell grep what encoding to use for the pattern? Or any other tool?

If all the files you're searching in have the same encoding:

LC_CTYPE=ru_RU.KOI8-R luit ack-grep "$(echo 'привет' | iconv -t KOI8-R)" *.txt

or, in bash or zsh, with a here-string instead of echo:

LC_CTYPE=ru_RU.KOI8-R luit ack-grep "$(iconv -t KOI8-R <<<'привет')" *.txt

Or start a child shell in the desired encoding:

$ LC_CTYPE=ru_RU.KOI8-R luit

$ ack-grep 'привет' *.txt

$ exit

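Luit (shipped with XFree86 and X.org) runs the program specified on its command line in the locale given by the LC_CTYPE setting, assuming a UTF-8 terminal. So the command runs in the desired locale, and Luit translates its terminal output to UTF-8. It also translates terminal input from UTF-8 into that locale's encoding, which is why the pattern can be typed directly in the child shell.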
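If Luit isn't installed, roughly the same effect can be had with iconv alone: re-encode the pattern, match raw bytes, and convert the matching lines back. This is only a sketch. The function name koi8_grep is made up for illustration, it assumes a grep that supports -a (GNU and BSD grep do), and it matches the pattern as a fixed string rather than a regex.

```shell
# Hypothetical helper: search KOI8-R files for a pattern given in UTF-8,
# without luit.
# Usage: koi8_grep PATTERN FILE...
koi8_grep() {
    koi8_grep_utf8=$1
    shift
    # Re-encode the pattern from UTF-8 to KOI8-R so its bytes match the files.
    koi8_grep_pat=$(printf '%s' "$koi8_grep_utf8" | iconv -f UTF-8 -t KOI8-R) || return 2
    # LC_ALL=C makes grep compare raw bytes, -a keeps it from treating the
    # files as binary, -F takes the pattern as a fixed string.
    # The matching lines are converted back to UTF-8 for the terminal.
    LC_ALL=C grep -a -F -e "$koi8_grep_pat" -- "$@" | iconv -f KOI8-R -t UTF-8
}
```

Call it as koi8_grep 'привет' *.txt. Note that the whole grep output goes through iconv, so file names containing non-ASCII characters would be garbled in the output, and the exit status is iconv's rather than grep's.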

Another approach, if you have a directory tree with a lot of files in a different encoding, is to mount a view of that directory tree in your preferred encoding. I think the fuseflt filesystem can do this (untested).

mkdir /utf8-view

fuseflt iconv-koi8r-utf8.conf /some/dir /utf8-view

ack-grep 'привет' /utf8-view/*.txt.utf8

fusermount -u /utf8-view

where the configuration file iconv-koi8r-utf8.conf contains

ext_in =

ext_out = *.utf8

flt_in =

flt_out = .utf8

flt_cmd = iconv -f KOI8-R -t UTF-8
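Since the fuseflt setup is untested, here is a plain-shell fallback that gives a similar result without mounting anything: walk the tree and convert each file to UTF-8 on the fly before searching it. Again just a sketch. The name koi8_tree_grep is hypothetical, it assumes every *.txt file under the directory really is KOI8-R, and it assumes a grep with -a.

```shell
# Hypothetical helper: search a tree of KOI8-R *.txt files for a UTF-8
# pattern by converting each file on the fly.
# Usage: koi8_tree_grep PATTERN DIR
koi8_tree_grep() {
    # find starts one small sh per file; inside it, $1 is the pattern
    # and $2 is the file currently being searched.
    find "$2" -type f -name '*.txt' -exec sh -c '
        iconv -f KOI8-R -t UTF-8 < "$2" |
            grep -a -e "$1" |
            while IFS= read -r line; do
                # Prefix each match with the file name, as grep -H would.
                printf "%s:%s\n" "$2" "$line"
            done
    ' sh "$1" {} \;
}
```

For example, koi8_tree_grep 'привет' /some/dir prints each matching line prefixed with its file name. It spawns a shell and an iconv per file, so on a very large tree the mounted view may still be the better option if it works.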
