A PHP function for converting UTF-8 to GB2312 without iconv
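Converting between encodings with iconv() is fairly simple, but many shared hosts do not support that extension. After a long search online, the only thing I found was a method for converting GB2312 to UTF-8, and it could not do the reverse conversion.
That function is as follows: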
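```php
/*******************************
 * GB2312 to UTF-8
 *******************************/
// Each line of gb2312-utf8.table is read as a 6-character GB code, one
// separator character, then a 6-character Unicode code (the layout implied
// by the substr() offsets below).
function gb2utf8($gbstr)
{
    global $CODETABLE;
    if (trim($gbstr) == "") {
        return $gbstr;
    }
    if (empty($CODETABLE)) {
        $filename = dirname(__FILE__) . "/gb2312-utf8.table";
        $fp = fopen($filename, "r");
        while ($l = fgets($fp, 15)) {
            $CODETABLE[hexdec(substr($l, 0, 6))] = substr($l, 7, 6);
        }
        fclose($fp);
    }
    $ret = "";
    $len = strlen($gbstr);
    for ($i = 0; $i < $len; $i++) {
        if (ord($gbstr[$i]) > 127 && $i + 1 < $len) {
            // Two-byte GB2312 character: subtract 0x8080 to get the table key
            $key = hexdec(bin2hex(substr($gbstr, $i, 2))) - 0x8080;
            if (isset($CODETABLE[$key])) {
                $ret .= u2utf8(hexdec($CODETABLE[$key]));
            }
            $i++; // skip the second byte of the pair
        } else {
            $ret .= $gbstr[$i];
        }
    }
    return $ret;
}

// Unicode code point to UTF-8 bytes
function u2utf8($c)
{
    $str = "";
    if ($c < 0x80) {
        $str .= chr($c);
    } elseif ($c < 0x800) {
        $str .= chr(0xC0 | ($c >> 6));
        $str .= chr(0x80 | ($c & 0x3F));
    } elseif ($c < 0x10000) {
        $str .= chr(0xE0 | ($c >> 12));
        $str .= chr(0x80 | (($c >> 6) & 0x3F));
        $str .= chr(0x80 | ($c & 0x3F));
    } elseif ($c < 0x200000) {
        $str .= chr(0xF0 | ($c >> 18));
        $str .= chr(0x80 | (($c >> 12) & 0x3F));
        $str .= chr(0x80 | (($c >> 6) & 0x3F));
        $str .= chr(0x80 | ($c & 0x3F));
    }
    return $str;
}
```

Because every GB2312 character is two bytes, converting to UTF-8 is relatively simple, but the reverse direction is much more troublesome. My first attempt looked like this:

```php
function utf82gb($utfstr)
{
    global $UC2GBTABLE;
    $okstr = "";
    if (trim($utfstr) == "") {
        return $utfstr;
    }
    if (empty($UC2GBTABLE)) {
        // Reverse table: Unicode code point => GB code
        $filename = dirname(__FILE__) . "/gb2312-utf8.table";
        $fp = fopen($filename, "r");
        while ($l = fgets($fp, 15)) {
            $UC2GBTABLE[hexdec(substr($l, 7, 6))] = hexdec(substr($l, 0, 6));
        }
        fclose($fp);
    }
    $ulen = strlen($utfstr);
    for ($i = 0; $i < $ulen; $i++) {
        if (ord($utfstr[$i]) < 0x81) {
            $okstr .= $utfstr[$i];
        } else {
            if ($ulen > $i + 2) {
                // Every non-ASCII byte is assumed to start a three-byte sequence.
                // Note: $i is not advanced past the two trailing bytes, so each
                // of them is examined again on the following iterations, and a
                // failed lookup still yields "8080".
                $utfc = substr($utfstr, $i, 3);
                $c = "";
                @$c = dechex($UC2GBTABLE[utf82u_3($utfc)] + 0x8080);
                if ($c != "") {
                    $okstr .= chr(hexdec($c[0] . $c[1])) . chr(hexdec($c[2] . $c[3]));
                }
            } else {
                $okstr .= $utfstr[$i];
            }
        }
    }
    $okstr = trim($okstr);
    return $okstr;
}

// Three-byte UTF-8 character to Unicode code point
function utf82u_3($c)
{
    $n = (ord($c[0]) & 0x1f) << 12;
    $n += (ord($c[1]) & 0x3f) << 6;
    $n += ord($c[2]) & 0x3f;
    return $n;
}
```

With this method most characters could be converted, but something was always slightly off, so I changed the program to this:

```php
function utf82gb($utfstr)
{
    global $UC2GBTABLE;
    if (trim($utfstr) == "") {
        return $utfstr;
    }
    if (empty($UC2GBTABLE)) {
        $filename = dirname(__FILE__) . "/gb2312-utf8.table";
        $fp = fopen($filename, "r");
        while ($l = fgets($fp, 15)) {
            $UC2GBTABLE[hexdec(substr($l, 7, 6))] = hexdec(substr($l, 0, 6));
        }
        fclose($fp);
    }
    $okstr = "";
    $utfstr = urlencode($utfstr);
    $ulen = strlen($utfstr);
    for ($i = 0; $i < $ulen; $i++) {
        if ($utfstr[$i] == "%") {
            if ($ulen > $i + 2) {
                $hexnext = hexdec(substr($utfstr, $i + 1, 2));
                if ($hexnext < 0x80) {
                    // Percent-encoded ASCII character
                    $okstr .= chr($hexnext);
                    $i += 2;
                } elseif ($ulen >= $i + 9) {
                    // Treated as a three-byte sequence "%XX%XX%XX"
                    $u = url_utf2u(substr($utfstr, $i + 1, 8));
                    if (isset($UC2GBTABLE[$u])) {
                        $c = dechex($UC2GBTABLE[$u] + 0x8080);
                        $okstr .= chr(hexdec($c[0] . $c[1])) . chr(hexdec($c[2] . $c[3]));
                    }
                    $i += 8;
                }
            } else {
                $okstr .= $utfstr[$i];
            }
        } elseif ($utfstr[$i] == "+") {
            // urlencode() writes spaces as "+"
            $okstr .= " ";
        } else {
            $okstr .= $utfstr[$i];
        }
    }
    return trim($okstr);
}

// URL-encoded three-byte UTF-8 character ("XX%XX%XX") to Unicode code point
function url_utf2u($c)
{
    $utfc = "";
    $cs = explode("%", $c);
    for ($i = 0; $i < count($cs); $i++) {
        $utfc .= chr(hexdec($cs[$i]));
    }
    $n = (ord($utfc[0]) & 0x1f) << 12;
    $n += (ord($utfc[1]) & 0x3f) << 6;
    $n += ord($utfc[2]) & 0x3f;
    return $n;
}
```

When I tested it, it worked perfectly, and it was even faster than the previous method. I really cannot figure out why.

Anyone who needs the gb2312-utf8.table file can add me on QQ 2500875 (IT柏拉图) or contact 1877000 (泡泡).

Converting UTF-8 to GB2312 in PHP

Convert a UTF-8 encoded string to GB2312; characters that have no GB2312 equivalent are converted to the &#DEC; form, e.g. ?=>?

Languages: PHP, JavaScript

Scenario: the browser encodes a string (which may contain characters outside GB2312) with JavaScript's encodeURI function and sends it to the server in a GET request. All pages are encoded as GB2312, and the server-side PHP script converts the request data to a GB2312 representation.

Background:
1. iconv on its own can only convert characters that exist in GB2312; foreign characters outside it cannot be converted.
2. There is no ready-made function for this.
3. bindec(): converts a binary string of "0" and "1" digits to a decimal number.
4. decbin(): converts a decimal number to a binary string, e.g. decbin(224) = "11100000".

Approach: UTF-8 uses encodings of 1, 2, 3 or more bytes, and Chinese, Japanese and Korean characters are generally 3 bytes. While processing, the number of bytes is determined from the value of the first byte of each character.
1. First byte below 128: ASCII.
2. 128 to 191: not a valid UTF-8 lead byte; output it as &#ord();.
3. 192 to 223: two-byte UTF-8 encoding.
4. 224 to 239: three-byte encoding.
5. 240 to 247: four-byte encoding.
6. ...
7. For three-byte encodings, try converting to GB2312 with iconv.
8. For multibyte characters that are not in GB2312, convert the UTF-8 bytes to Unicode and take the decimal value of the code point.
9. Bit operations can be used for this, or the bindec() function.

Program:

```php
function GetGB2312String($name)
{
    $tostr = "";
    for ($i = 0; $i < strlen($name); $i++) {
        $curbin = ord(substr($name, $i, 1));
        if ($curbin < 0x80) {
            // ASCII
            $tostr .= substr($name, $i, 1);
        } elseif ($curbin < bindec("11000000")) {
            // Not a lead byte
            $str = substr($name, $i, 1);
            $tostr .= "&#" . ord($str) . ";";
        } elseif ($curbin < bindec("11100000")) {
            $str = substr($name, $i, 2);
            $tostr .= "&#" . GetUnicodeChar($str) . ";";
            $i += 1;
        } elseif ($curbin < bindec("11110000")) {
            // Three bytes: try GB2312 first, fall back to a numeric entity
            $str = substr($name, $i, 3);
            $gstr = @iconv("UTF-8", "GB2312", $str);
            if (!$gstr) {
                $tostr .= "&#" . GetUnicodeChar($str) . ";";
            } else {
                $tostr .= $gstr;
            }
            $i += 2;
        } elseif ($curbin < bindec("11111000")) {
            $str = substr($name, $i, 4);
            $tostr .= "&#" . GetUnicodeChar($str) . ";";
            $i += 3;
        } elseif ($curbin < bindec("11111100")) {
            $str = substr($name, $i, 5);
            $tostr .= "&#" . GetUnicodeChar($str) . ";";
            $i += 4;
        } else {
            $str = substr($name, $i, 6);
            $tostr .= "&#" . GetUnicodeChar($str) . ";";
            $i += 5;
        }
    }
    return $tostr;
}

function GetUnicodeChar($str)
{
    $temp = "";
    // The original loop body was lost in this copy of the article. What follows
    // is a reconstruction that assumes the decbin()/bindec() approach from the
    // list above: collect the payload bits of each byte, then convert them.
    for ($i = 0; $i < strlen($str); $i++) {
        $bin = str_pad(decbin(ord($str[$i])), 8, "0", STR_PAD_LEFT);
        if ($i == 0) {
            // Lead byte of an n-byte sequence: drop n ones and the following zero
            $temp .= substr($bin, strlen($str) + 1);
        } else {
            // Continuation byte: drop the leading "10"
            $temp .= substr($bin, 2);
        }
    }
    return bindec($temp);
}
```

GB2312 to UTF-8 conversion function

What is this function for? Well, once Chinese characters have been converted to UTF-8, they can be used in GD! (Author: sadly)

```php
<?php
// Program written by sadly www.phpx.com
function gb2utf8($gb)
{
    if (!trim($gb)) {
        return $gb;
    }
    $filename = "gb2312.txt";
    $codetable = array();
    foreach (file($filename) as $value) {
        $codetable[hexdec(substr($value, 0, 6))] = substr($value, 7, 6);
    }
    $utf8 = "";
    while ($gb != "") {
        if (ord($gb[0]) > 127) {
            // Two-byte GB2312 character
            $thisW = substr($gb, 0, 2);
            $gb = (string) substr($gb, 2);
            $key = hexdec(bin2hex($thisW)) - 0x8080;
            if (isset($codetable[$key])) {
                $utf8 .= u2utf8(hexdec($codetable[$key]));
            }
        } else {
            // ASCII is identical in UTF-8
            $utf8 .= $gb[0];
            $gb = (string) substr($gb, 1);
        }
    }
    return $utf8;
}

// Unicode code point to UTF-8 bytes
function u2utf8($c)
{
    $str = "";
    if ($c < 0x80) {
        $str .= chr($c);
    } elseif ($c < 0x800) {
        $str .= chr(0xC0 | ($c >> 6));
        $str .= chr(0x80 | ($c & 0x3F));
    } elseif ($c < 0x10000) {
        $str .= chr(0xE0 | ($c >> 12));
        $str .= chr(0x80 | (($c >> 6) & 0x3F));
        $str .= chr(0x80 | ($c & 0x3F));
    } elseif ($c < 0x200000) {
        $str .= chr(0xF0 | ($c >> 18));
        $str .= chr(0x80 | (($c >> 12) & 0x3F));
        $str .= chr(0x80 | (($c >> 6) & 0x3F));
        $str .= chr(0x80 | ($c & 0x3F));
    }
    return $str;
}

header("Content-type: image/gif");
$im = imagecreate(400, 300);
$bkg = imagecolorallocate($im, 0, 0, 0);
$clr = imagecolorallocate($im, 255, 255, 255);
$fnt = "wb.ttf";
//include("gb2utf8.php");
// This script file must itself be saved as GB2312 so that the literal below holds GB2312 bytes
$str = gb2utf8("中国");
imagettftext($im, 20, 0, 10, 20, $clr, $fnt, $str);
imagegif($im);
imagedestroy($im);
?>
```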
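
The UTF-8 to GB2312 notes earlier in this article classify a character by the value of its first byte and mention that bit operations could replace bindec(). The following is only a minimal sketch of that idea, not code from any of the original authors. The helper names utf8_decode_at() and utf8_code_points() are hypothetical, and the fallback for malformed input (returning the raw byte value with length 1) is an assumption chosen to mirror the &#ord(); rule for stray bytes.

```php
<?php
// Hypothetical helper: decode the UTF-8 character that starts at byte
// offset $i of $s using bit masks instead of bindec()/decbin().
// Returns array(code point, byte length). A stray continuation byte, an
// invalid lead byte, or a truncated sequence yields array(byte value, 1).
function utf8_decode_at($s, $i)
{
    $b = ord($s[$i]);
    if ($b < 0x80) {
        return array($b, 1);              // 0xxxxxxx: ASCII
    }
    if (($b & 0xE0) == 0xC0) {            // 110xxxxx: two bytes
        $len = 2;
        $cp = $b & 0x1F;
    } elseif (($b & 0xF0) == 0xE0) {      // 1110xxxx: three bytes
        $len = 3;
        $cp = $b & 0x0F;
    } elseif (($b & 0xF8) == 0xF0) {      // 11110xxx: four bytes
        $len = 4;
        $cp = $b & 0x07;
    } else {
        return array($b, 1);              // not a usable lead byte
    }
    if ($i + $len > strlen($s)) {
        return array($b, 1);              // sequence is cut off
    }
    for ($k = 1; $k < $len; $k++) {
        $c = ord($s[$i + $k]);
        if (($c & 0xC0) != 0x80) {
            return array($b, 1);          // expected 10xxxxxx
        }
        $cp = ($cp << 6) | ($c & 0x3F);   // append six payload bits
    }
    return array($cp, $len);
}

// Hypothetical helper: list the code points of a whole string.
function utf8_code_points($s)
{
    $out = array();
    $n = strlen($s);
    for ($i = 0; $i < $n; ) {
        list($cp, $len) = utf8_decode_at($s, $i);
        $out[] = $cp;
        $i += $len;
    }
    return $out;
}
```

For instance, utf8_code_points("a\xE4\xB8\xAD") gives array(97, 20013), where "\xE4\xB8\xAD" is the UTF-8 form of 中 (U+4E2D). The decimal value is what would go into a &#DEC; entity when a character has no GB2312 equivalent.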