    [Original] Linux Tips 7: Converting GBK to UTF-8

    Posted by lincyang on 2015-04-30 10:40:11

    Converting the encoding of file contents

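    Java source files edited on Windows can show garbled Chinese characters when opened on Linux. The cause is the file encoding: on Chinese-language Windows, files are usually saved as GBK, while Linux uses UTF-8.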

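    In vim, the `:set fileencoding` command shows the encoding vim detected. A GBK file is often reported as latin1, because GBK is usually not in vim's 'fileencodings' detection list and vim falls back to latin1: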

    // file saved on Linux
    fileencoding=utf-8
    // file saved on Windows
    fileencoding=latin1
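
    The value vim reports is only a guess, so it can help to check a file directly. The following is a minimal sketch, assuming the iconv tool introduced below is installed; the function name is_utf8 is made up for illustration. It relies on iconv exiting with a non-zero status when the input is not valid in the declared source encoding:

```shell
#!/bin/sh
# is_utf8 FILE (hypothetical helper): succeed if FILE decodes cleanly as UTF-8.
# Pure ASCII files also pass, since ASCII is a subset of UTF-8.
is_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

# Example use (the file name is only an illustration):
#   if is_utf8 test.java; then echo "already UTF-8"; else echo "needs converting"; fi
```

    A failed check only says the file is not UTF-8. It does not prove the file is GBK, so the source encoding still has to be known or guessed.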

    The simplest fix is to save the file as UTF-8 on Windows in the first place. On Linux, the iconv tool can convert the encoding instead.

    $ iconv --help
    Usage: iconv [OPTION...] [FILE...]
    Convert encoding of given files from one encoding to another.
    
     Input/Output format specification:
      -f, --from-code=NAME       encoding of original text
      -t, --to-code=NAME         encoding for output
    
     Information:
      -l, --list                 list all known coded character sets
    
     Output control:
      -c                         omit invalid characters from output
      -o, --output=FILE          output file
      -s, --silent               suppress warnings
          --verbose              print progress information
    
      -?, --help                 Give this help list
          --usage                Give a short usage message
      -V, --version              Print program version
    
    $ iconv -f GBK -t UTF-8 test.java -o test2.java

    After the conversion, the garbled Chinese characters are gone.
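
    The command above writes to a second file. If the original should be overwritten instead, one possible sketch is to convert into a temporary file first and only replace the original when iconv succeeds. The function name gbk_to_utf8_inplace is hypothetical, and the sketch assumes the input really is GBK:

```shell
#!/bin/sh
# gbk_to_utf8_inplace FILE (hypothetical helper):
# convert FILE from GBK to UTF-8, overwriting it only if the conversion succeeds.
gbk_to_utf8_inplace() {
    target=$1
    tmpfile=$(mktemp "${target}.XXXXXX") || return 1
    if iconv -f GBK -t UTF-8 "$target" >"$tmpfile"; then
        # Write back through the original path so its permissions are kept.
        cat "$tmpfile" >"$target"
        rm -f "$tmpfile"
    else
        rm -f "$tmpfile"
        echo "conversion failed, left unchanged: $target" >&2
        return 1
    fi
}

# Example use (the file name is only an illustration):
#   gbk_to_utf8_inplace test.java
```

    Running this twice on the same file would treat UTF-8 bytes as GBK the second time and either fail or corrupt the text, so it should only be applied to files known to be GBK.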

    Converting files in bulk

    This borrows the approach from http://blog.csdn.net/deqingguo/article/details/7314558:
    1. Create an identical directory structure:

    $ find com -type d -exec mkdir -p com2/{} \;
    

    2. Convert the files:

    $ find com -type f -exec iconv -f GBK -t UTF-8 {} -o com2/{} \;
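
    The two steps can also be folded into one pass. The sketch below is a variation on them, not the linked article's method: it creates each target directory on demand, copies files that already decode as UTF-8 instead of converting them again, and quotes paths so names with spaces survive. The function name convert_tree is hypothetical; as in the commands above, the output lands under DST/SRC/...:

```shell
#!/bin/sh
# convert_tree SRC DST (hypothetical helper):
# mirror SRC under DST, converting GBK files to UTF-8 along the way.
convert_tree() {
    src=$1
    dst=$2
    find "$src" -type f -exec sh -c '
        dst=$1
        shift
        for f in "$@"; do
            out=$dst/$f
            mkdir -p "$(dirname "$out")"
            if iconv -f UTF-8 -t UTF-8 "$f" >/dev/null 2>&1; then
                # Already valid UTF-8 (or plain ASCII): copy unchanged.
                cp "$f" "$out"
            elif ! iconv -f GBK -t UTF-8 "$f" >"$out"; then
                # Not decodable as GBK either: drop the partial output and report it.
                rm -f "$out"
                echo "could not convert: $f" >&2
            fi
        done
    ' sh "$dst" {} +
}

# Example use, matching the directory names above:
#   convert_tree com com2
```

    Binary files inside the tree (images, jars) would fail the UTF-8 check and be pushed through the GBK conversion, so for mixed trees it is safer to restrict find with something like -name "*.java".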

    Converting file and directory names

    This is a job for the convmv tool.

    $ convmv
    Your Perl version has fleas #22111 #37757 #49830 
    convmv 1.15 - converts filenames from one encoding to another
    Copyright (C) 2003-2011 Bjoern JACKE 
    
     USAGE: convmv [options] FILE(S)
    -f enc     encoding *from* which should be converted
    -t enc     encoding *to* which should be converted
    -r         recursively go through directories
    -i         interactive mode (ask for each action)
    --nfc      target files will be normalization form C for UTF-8 (Linux etc.)
    --nfd      target files will be normalization form D for UTF-8 (OS X etc.)
    --qfrom    be quiet about the "from" of a rename (if it screws up your terminal e.g.)
    --qto      be quiet about the "to" of a rename (if it screws up your terminal e.g.)
    --exec c   execute command instead of rename (use #1 and #2 and see man page)
    --list     list all available encodings
    --lowmem   keep memory footprint low (see man page)
    --nosmart  ignore if files already seem to be UTF-8 and convert if posible
    --notest   actually do rename the files
    --replace  will replace files if they are equal
    --unescape convert%20ugly%20escape%20sequences
    --upper    turn to upper case
    --lower    turn to lower case
    --parsable write a parsable todo list (see man page)
    --help     print this help

    To recursively convert the names of the files and directories under the tech directory:

    $ sudo convmv -f gbk -t utf-8 -r --notest tech/
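
    As the help text says, convmv only renames files when --notest is given; without it, it just prints what it would do. A rough sketch that previews first and renames only after confirmation, using only the flags listed above (rename_tree is a hypothetical helper name, and sudo is left out on the assumption that the current user owns the files):

```shell
#!/bin/sh
# rename_tree DIR (hypothetical helper):
# show the planned GBK -> UTF-8 renames, then apply them if the user agrees.
rename_tree() {
    dir=$1
    if ! command -v convmv >/dev/null 2>&1; then
        echo "convmv is not installed" >&2
        return 127
    fi
    # Dry run: without --notest, convmv only lists the renames.
    convmv -f gbk -t utf-8 -r "$dir" || return 1
    printf 'Apply these renames? [y/N] '
    read -r answer
    if [ "$answer" = y ]; then
        convmv -f gbk -t utf-8 -r --notest "$dir"
    else
        echo "nothing renamed"
    fi
}

# Example use, matching the directory name above:
#   rename_tree tech/
```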

    Also note that zip archives created on Windows can sometimes introduce garbled characters as well.


