Handling Chinese in Python: gb2312 <==> utf8
Author: anonymous   Editor: 左决   Updated: 2008-2-21 6:59:38
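The package is in the attachment; see also http://quijote.blog@bbs.nju.edu.cn
Author: quijote
A rough start, posted in the hope of drawing out better ideas.
These are notes I collected and organized some time ago. They are rather messy, but fairly comprehensive. They cover Windows, Python 2.3, and PyQt. PyGTK and Tkinter are similar to PyQt in that they all use unicode.
I think the best approach would be a library that works directly from a gb18030 code table. I have collected a gb18030 mapping table > 830k, so the two tables for both directions come to > 1.6 M.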
On win2000 + sp3 with python2.2:
    from Tkinter import *
    w = Button(text="中国".decode("mbcs"), font="simhei", command='exit')
    w.pack()
    w.mainloop()
This approach treats the symptom, not the cause. Sometimes I mix up mbcs (GB) strings and unicode strings.
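The mix-up between GB byte strings and unicode strings can be made visible with a small sketch. It is written in Python 3 syntax (an assumption; the post itself targets Python 2.2), and it uses the explicit "gbk" codec in place of "mbcs", assuming a Chinese Windows system where mbcs corresponds to code page 936:

```python
# Python 3 sketch. "gbk" stands in for "mbcs" (assumption: code page 936).
text = "中国"                   # unicode string: 2 characters
gb_bytes = text.encode("gbk")   # GB byte string: 2 bytes per character

# The two objects describe the same text but have different lengths.
print(len(text), len(gb_bytes))

# Decoding with the matching codec recovers the unicode string.
print(gb_bytes.decode("gbk") == text)
```

Keeping text as unicode inside the program and converting only at the boundaries is one way to avoid this kind of confusion.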
The drawback of this approach is that, because of mbcs, it only works on Windows. One solution is to install http://sourceforge.net/projects/python-codecs/
"A SourceForge project working on additional support for Asian codecs for use with Python. They are in the early stages of development at the time of this writing -- look in their FTP area for downloadable files." (see Python Library Reference 4.9)
It can be used after slight modification. Download 4 files:
eucgb23212utf.py (182K), utf2eucgb2321.py (182K) ( http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/python-codecs/practicecodecs/ChineseCodecs/chinesecn/Attic/ )
eucgb2321_cn.py ( http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/python-codecs/practicecodecs/ChineseCodecs/Python/ )
test.py
There was a setup.py, but I did not know how to use it, so I made the changes by hand:
1. Replace EUCGB2321_CN with gb2312, both in the file names and in the file contents;
2. At the end of aliases.py, add one entry: # eucgb2321_cn codec 'gb2312' : 'gb2312',
3. In c:\python22\lib\encodings, create a new directory chinesecn and put three files in it: gb23122utf.py (182K), utf2gb2312.py (182K), and __init__.py (an empty file);
4. Under encodings, place the file gb2312.py (originally named eucgb2321_cn.py ?).
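Steps 2 and 4 come down to making a codec reachable under a given name. The same idea can be sketched without editing aliases.py, using codecs.register from the standard library. This is only an illustration under some assumptions: Python 3 syntax, a Python version that already includes a gb2312 codec, and the names ALIAS and find_codec, which are made up for this example:

```python
import codecs

# Hypothetical alias name, chosen to match the one used in the post.
ALIAS = "eucgb2321_cn"

def find_codec(name):
    """Codec search function: map the hypothetical alias to gb2312."""
    if name == ALIAS:
        return codecs.lookup("gb2312")
    return None  # let other search functions handle every other name

codecs.register(find_codec)

# The alias can now be used wherever an encoding name is expected.
encoded = "大家好".encode(ALIAS)
print(encoded == "大家好".encode("gb2312"))
print(encoded.decode(ALIAS))
```

A search function keeps the mapping inside the program, so the files of the Python installation stay untouched.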
Note (2003.7): EUCGB2321_CN is a Chinese character encoding used on unix.
Direct download: http://bbs1.nju.edu.cn/file/gb2312.rar is all you need.
------------------------------------------------------------------------
Run test.py:
    gbstring = "大家好"
    #print gbstring
    uni = unicode(gbstring, "gb2312")   # gb2312 bytes -> unicode
    gstring = uni.encode("gb2312")      # unicode -> gb2312 bytes
    print "Original gb2312 encoded string:"
    print gbstring
    print "Transcode to Unicode encoding:"
    print repr(uni)
    print "Print as a gb2312 encoded string:"
    print gstring
------------------------------------------------------------
Output:
    Original gb2312 encoded string:
    大家好
    Transcode to Unicode encoding:
    u'\u5927\u5bb6\u597d'
    Print as a gb2312 encoded string:
    大家好
------------------------------------------------------------------------------
The drawbacks of this approach: it is a bit cumbersome (unicode(gbstring, "gb2312")), and it only covers gb2312, not gb18030 (there is no unicode<-->gb18030 table). I have collected a gb18030 mapping table > 830k, so the two tables for both directions come to > 1.6 M.
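The gap between gb2312 and gb18030 can be shown with a character that gb2312 does not contain. The sketch below uses Python 3 syntax and assumes a Python version whose standard library includes both a gb2312 and a gb18030 codec, which was not the case for the Python 2.2 setup described here. The variable names are illustrative only:

```python
# Python 3 sketch; assumes built-in "gb2312" and "gb18030" codecs.
sample = "镕"  # a character outside the GB2312 character set

try:
    sample.encode("gb2312")
    fits_gb2312 = True
except UnicodeEncodeError:
    fits_gb2312 = False

# gb18030 can encode the character and decode it back.
gb18030_bytes = sample.encode("gb18030")

print(fits_gb2312)
print(gb18030_bytes.decode("gb18030") == sample)
```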
The advantage is good portability: it works on both windows and linux, and with Tkinter, pyQT, pyGTK, and wxpython alike.
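Because every conversion goes through unicode, the same pattern also gives the gb2312 <==> utf8 conversion named in the title. A minimal sketch in Python 3 syntax (an assumption; the post uses Python 2), with the helper names gb2312_to_utf8 and utf8_to_gb2312 made up for illustration:

```python
def gb2312_to_utf8(data: bytes) -> bytes:
    """Decode gb2312 bytes to unicode, then encode as utf-8."""
    return data.decode("gb2312").encode("utf-8")

def utf8_to_gb2312(data: bytes) -> bytes:
    """Decode utf-8 bytes to unicode, then encode as gb2312."""
    return data.decode("utf-8").encode("gb2312")

gb_bytes = "大家好".encode("gb2312")
utf8_bytes = gb2312_to_utf8(gb_bytes)

# gb2312 uses 2 bytes per Chinese character, utf-8 uses 3 for these.
print(len(gb_bytes), len(utf8_bytes))
print(utf8_to_gb2312(utf8_bytes) == gb_bytes)
```

The conversion back to gb2312 raises UnicodeEncodeError for characters that gb2312 does not cover, so it is only safe for text known to fit that character set.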
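---------------------------------------------------------------------------
btw, eucgb2321: 2321? 2312? It got me confused ^_^ EUCGB2321_CN is a Chinese character encoding used on unix.
I originally used Mr. 杜文山's Chinese localization package ( http://dohao.org ), but he can no longer keep it up to date, so I had to find another way.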
Advice from a Python developer
From: Martin v. Loewis (martin@v.loewis.de)
Subject: Re: Chinese language support of Python?
Newsgroups: comp.lang.python
Date: 2002-07-07 01:01:02 PST
guidance_shanghai@yahoo.com.cn (Leon Wang) writes:
> But still can not put Chinese directly as string in source, I can not
> live with so much \u... for a whole Chinese sentence/paragraph, it's
> impossible to read and edit them
This is a known problem, and it will be addressed with PEP 263 (http://www.python.org/peps/pep-0263.html).
Meanwhile, you have the following options:
- Don't use IDLE to edit Python source code (but, say, notepad), and only put Chinese text into string literals.
- Set the default encoding in site.py to the encoding you want to use.
- Apply patch http://sourceforge.net/tracker/index.php?func=detail&aid=508973&group_id=9579&atid=309579
  which allows you to declare the source encoding for IDLE.
In either case, you cannot use Chinese in Unicode literals. Instead, you should always use
    unicode("chinese string", "chinese encoding")
For portability, and if your editors support it, I recommend to use UTF-8 as the "chinese encoding".
Regards, Martin
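The source encoding declaration that PEP 263 introduces can be tried out without creating a file on disk. The sketch below is in Python 3 syntax (an assumption; the mail above predates it) and compiles a small module whose bytes are gb2312-encoded and whose first line declares that encoding. The variable names are illustrative only:

```python
# Build the bytes of a tiny module saved in gb2312, with a PEP 263
# declaration on its first line.
source = (
    "# -*- coding: gb2312 -*-\n"
    'greeting = "大家好"\n'
).encode("gb2312")

# compile() accepts bytes and honours the declared source encoding.
namespace = {}
exec(compile(source, "<gb2312 source>", "exec"), namespace)

print(namespace["greeting"] == "大家好")
```

With the declaration in place, Chinese text can be written directly in string literals instead of as \u escapes.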