尚学堂.张志宇.乱码分析_01_基础.doc (Shangxuetang / Zhang Zhiyu: Mojibake Analysis, Part 01: Basics)

 2012/1/31 9:22:40  bjrobin  程序员俱乐部 (Programmer's Club)
1. What is ASCII
ASCII (American Standard Code for Information Interchange)
ASCII defines codes for 128 characters (0-127).
Codes 0-31 of the ASCII table, plus 127 (DEL), are assigned to control characters, originally used to control peripheral devices such as printers.
ISO has adopted ASCII as an international standard, known as ISO 646.
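A minimal Java sketch of the points above. It assumes a standard JDK, and the names `AsciiDemo`, `asciiCodes` and `isAsciiControl` are hypothetical. The code prints the ASCII codes of a string and tests the control-character range (0-31 plus 127). Non-ASCII characters are written as `\u` escapes so that the source file itself stays pure ASCII.

```java
import java.nio.charset.StandardCharsets;

public class AsciiDemo {
    // Returns the ASCII codes of s as decimal numbers separated by spaces.
    // String.getBytes replaces characters outside ASCII with '?' (63).
    static String asciiCodes(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.US_ASCII)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(b); // ASCII bytes are 0-127, so never negative
        }
        return sb.toString();
    }

    // True for the ASCII control characters: codes 0-31 plus 127 (DEL).
    static boolean isAsciiControl(int code) {
        return (code >= 0 && code <= 31) || code == 127;
    }

    public static void main(String[] args) {
        System.out.println(asciiCodes("ABC"));     // 65 66 67
        System.out.println(asciiCodes("A\u4E2D")); // 65 63 (U+4E2D is not ASCII)
        System.out.println(isAsciiControl('\n'));  // true: line feed is code 10
    }
}
```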

References:
The contents of ASCII:
http://www.jimprice.com/jim-asc.shtml
Or this chart of the ASCII table:
http://www.asciitable.com/
What each control character means:
http://zh.wikipedia.org/w/index.php?title=ASCII&variant=zh-cn


2. What is ISO/IEC 8859
ISO 8859, formally ISO/IEC 8859, is a series of 8-bit character-set standards developed jointly by the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC). It currently defines 15 character sets (parts 1-16; part 12 was abandoned):
• ISO/IEC 8859-1 (Latin-1) - Western European languages
• ISO/IEC 8859-2 (Latin-2) - Central European languages
• ISO/IEC 8859-3 (Latin-3) - Southern European languages; Esperanto can also be written with this set.
• ISO/IEC 8859-4 (Latin-4) - Northern European languages
• ISO/IEC 8859-5 (Cyrillic) - Slavic languages
• ISO/IEC 8859-6 (Arabic) - Arabic
• ISO/IEC 8859-7 (Greek) - Greek
• ISO/IEC 8859-8 (Hebrew) - Hebrew (visual order)
• ISO 8859-8-I - Hebrew (logical order)
• ISO/IEC 8859-9 (Latin-5 or Turkish) - replaces the Icelandic letters of Latin-1 with Turkish letters.
• ISO/IEC 8859-10 (Latin-6 or Nordic) - North Germanic languages; intended to replace Latin-4.
• ISO/IEC 8859-11 (Thai) - Thai, derived from Thailand's TIS 620 standard.
• ISO/IEC 8859-13 (Latin-7 or Baltic Rim) - Baltic languages
• ISO/IEC 8859-14 (Latin-8 or Celtic) - Celtic languages
• ISO/IEC 8859-15 (Latin-9) - Western European languages. It adds the Finnish letters and French uppercase accented letters that Latin-1 lacks, plus the euro sign (€).
• ISO/IEC 8859-16 (Latin-10) - Southeastern European languages, mainly Romanian; also adds the euro sign.
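The list above means that one byte value stands for different characters depending on which part is in use. A small sketch, assuming the JDK's ISO-8859-1/-5/-7 charsets (present in standard JDKs); `Iso8859Demo` and `decode` are hypothetical names. It decodes the single byte 0xE9 with three parts:

```java
import java.nio.charset.Charset;

public class Iso8859Demo {
    // Decodes one byte with the named ISO-8859 charset.
    static String decode(int b, String charsetName) {
        return new String(new byte[] {(byte) b}, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // Same byte, different character in each part of ISO 8859
        for (String cs : new String[] {"ISO-8859-1", "ISO-8859-5", "ISO-8859-7"}) {
            String s = decode(0xE9, cs);
            System.out.printf("0xE9 in %s = U+%04X%n", cs, (int) s.charAt(0));
        }
        // Expected: U+00E9 (Latin e acute), U+0449 (Cyrillic shcha), U+03B9 (Greek iota)
    }
}
```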
References:
A fairly detailed description:
http://zh.wikipedia.org/w/index.php?title=ISO/IEC_8859&variant=zh-cn

3. Comparison of the fifteen ISO/IEC 8859 character sets
References:
http://zh.wikipedia.org/w/index.php?title=ISO/IEC_8859&variant=zh-cn
Or this one:
http://www.terena.org/activities/multiling/ml-docs/iso-8859.html

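4. What is ISO-8859-1
Within ISO/IEC 8859-n, ISO defines at most 96 characters per character set, in the range 0xA0-0xFF.

ISO-8859-n (with a hyphen between "ISO" and "8859") is the code table that IANA registered on the basis of ISO/IEC 8859-n. Besides the characters of ISO/IEC 8859-n, it also contains the ASCII characters (0x20-0x7E) and 65 control characters (0x00-0x1F and 0x7F-0x9F).
ISO-8859-1 is the case n = 1 (Latin-1). It assigns a character to every byte value 0x00-0xFF, and Unicode took its first 256 code points from it, so each byte decodes to the code point U+0000-U+00FF with the same numeric value.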
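The identity mapping described above can be checked directly. The sketch below assumes only `StandardCharsets.ISO_8859_1`; the names `Latin1Demo` and `isIdentityMapping` are hypothetical. It decodes all 256 byte values and then re-encodes them. Because the mapping is lossless, ISO-8859-1 is often used to carry arbitrary bytes through String APIs.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Latin1Demo {
    // Decodes every byte 0x00-0xFF with ISO-8859-1, checks that byte n becomes
    // the char with the same value n, then checks that re-encoding gives the bytes back.
    static boolean isIdentityMapping() {
        byte[] all = new byte[256];
        for (int i = 0; i < 256; i++) {
            all[i] = (byte) i;
        }
        String decoded = new String(all, StandardCharsets.ISO_8859_1);
        for (int i = 0; i < 256; i++) {
            if (decoded.charAt(i) != i) {
                return false;
            }
        }
        byte[] back = decoded.getBytes(StandardCharsets.ISO_8859_1);
        return Arrays.equals(all, back);
    }

    public static void main(String[] args) {
        System.out.println("byte n <-> U+00nn for all 256 bytes: " + isIdentityMapping());
    }
}
```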

References:
http://zh.wikipedia.org/wiki/ISO/IEC_8859-1
http://wiki.ccw.com.cn/ISO_8859-1


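5. What is Unicode
Unicode's coding scheme corresponds to the Universal Character Set (UCS) of ISO 10646. Early Unicode versions corresponded to UCS-2, a 16-bit code space in which every character takes 2 bytes. Since Unicode 2.0 the code space extends to U+10FFFF, so some characters need more than 16 bits (see supplementary characters below).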

References:
In Chinese:
http://zh.wikipedia.org/w/index.php?title=Unicode&variant=zh-cn

Official site:
http://www.unicode.org/
To download the code charts:
http://www.unicode.org/charts/
Mappings between Unicode and other character sets:
http://www.unicode.org/Public/MAPPINGS/

Complete Unicode code table:
http://zh.wikibooks.org/wiki/Unicode

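6. Supplementary characters

Supplementary characters are the characters in the Unicode standard whose code points lie above U+FFFF.
In other words, they are the characters in the range U+10000 to U+10FFFF, which Unicode's original 16-bit design cannot represent.

In UTF-16, a supplementary character is represented by two 16-bit code units (two Java chars, 4 bytes). The first belongs to the high-surrogate range (\uD800-\uDBFF), the second to the low-surrogate range (\uDC00-\uDFFF).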

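package encodetest;

public class TestChar {
    public static void main(String[] args) {
        // Prints 2: a supplementary code point needs two chars
        System.out.println(Character.charCount(0x10000));
        // Prints true: 0xd87e is in the high-surrogate range
        System.out.println(Character.isHighSurrogate((char) 0xd87e));
        // Prints true: 0xdc1a is in the low-surrogate range
        System.out.println(Character.isLowSurrogate((char) 0xdc1a));
        // Build a String from the supplementary code point U+2F81A
        String s = String.valueOf(Character.toChars(0x2F81A));
        char[] chars = s.toCharArray();
        for (char c : chars) {
            System.out.format("%x", (int) c);
        }
        System.out.println();
        // Output: d87edc1a
        // The character became two char values: 0xd87e is the high surrogate
        // and 0xdc1a is the low surrogate.
    }
}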

References
Supplementary Characters in the Java Platform
http://gceclub.sun.com.cn/developer/technicalArticles/Intl/Supplementary/index_zh_CN.html

7. Big Endian and Little Endian
A character may occupy several bytes, so how are those bytes stored in a computer? Take the character 0xabcd: is it stored as AB CD or as CD AB?
Both are possible, and each has its own name. Storing it as AB CD (most significant byte first) is called Big Endian; storing it as CD AB (least significant byte first) is called Little Endian.

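A short sketch of the two byte orders, using `java.nio.ByteBuffer` (a real JDK class); `EndianDemo` and `bytesOf` are hypothetical names. It writes the 16-bit value 0xABCD in each order and reads back the bytes:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class EndianDemo {
    // Returns the two bytes of a 16-bit value, in the given byte order, as hex.
    static String bytesOf(short value, ByteOrder order) {
        ByteBuffer buf = ByteBuffer.allocate(2).order(order);
        buf.putShort(value);
        return String.format("%02X %02X", buf.get(0) & 0xFF, buf.get(1) & 0xFF);
    }

    public static void main(String[] args) {
        short ch = (short) 0xABCD;
        System.out.println("Big Endian:    " + bytesOf(ch, ByteOrder.BIG_ENDIAN));    // AB CD
        System.out.println("Little Endian: " + bytesOf(ch, ByteOrder.LITTLE_ENDIAN)); // CD AB
    }
}
```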
8. The relationship between ISO/IEC 8859-1:1998 and Unicode
http://www.unicode.org/Public/MAPPINGS/ISO8859/8859-1.TXT
9. The relationship between ASCII and Unicode
http://www.unicode.org/Public/MAPPINGS/VENDORS/MISC/US-ASCII-QUOTES.TXT
10. UCS-2 and UCS-4
• Unicode was created to unify the scripts of all the world's languages. Every character in Unicode corresponds to a value called a code point, usually written in the form U+ABCD. The mapping between characters and code points is UCS-2 (Universal Character Set coded in 2 octets). As the name suggests, UCS-2 uses two bytes per code point, so its range is U+0000 to U+FFFF.
• To represent more characters, UCS-4 was proposed, which uses four bytes per code point. Its range is U+00000000 to U+7FFFFFFF, and the part U+00000000 to U+0000FFFF is identical to UCS-2.
• Note that UCS-2 and UCS-4 only specify the mapping between code points and characters. They do not specify how code points are stored in a computer. The storage formats are called UTF (Unicode Transformation Format), and the most widely used are UTF-16 and UTF-8.
11. UTF-16

• UTF-16 is specified by RFC 2781. It uses two bytes (one 16-bit unit) to represent each code point in U+0000-U+FFFF.
• As you might guess, in that range UTF-16 corresponds directly to UCS-2: each UCS-2 code point is stored as-is, in either Big Endian or Little Endian byte order. There are three variants: UTF-16, UTF-16BE (Big Endian) and UTF-16LE (Little Endian).
• UTF-16BE and UTF-16LE are easy to understand. Plain UTF-16 indicates whether the file is Big Endian or Little Endian with a character at the start of the file called the BOM (Byte Order Mark). The BOM is the character U+FEFF.
• The BOM is a rather clever idea. UCS-2 does not define U+FFFE, so whenever the byte sequence FF FE or FE FF appears at the start, it can be taken as U+FEFF, and the byte order follows from it: FE FF means Big Endian, FF FE means Little Endian.
• For example, here are the three characters "ABC" encoded in each variant:
UTF-16BE                 00 41 00 42 00 43
UTF-16LE                 41 00 42 00 43 00
UTF-16 (Big Endian)      FE FF 00 41 00 42 00 43
UTF-16 (Little Endian)   FF FE 41 00 42 00 43 00
UTF-16 (without BOM)     00 41 00 42 00 43   (treated as Big Endian)
• On Windows, the default "Unicode" encoding is Little Endian UTF-16 (the FF FE 41 00 42 00 43 00 above). To see this, open Notepad, type ABC, save it with the Unicode encoding, and look at the file in a binary editor.
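The "ABC" table above can be reproduced with the JDK's UTF-16 charsets. This is a sketch; `Utf16Demo` and `hex` are hypothetical names. Java's plain "UTF-16" encoder writes a big-endian BOM, so it produces the "UTF-16 (Big Endian)" row.

```java
import java.nio.charset.Charset;

public class Utf16Demo {
    // Encodes s with the named charset and returns the bytes as uppercase hex pairs.
    static String hex(String s, String charsetName) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(Charset.forName(charsetName))) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("UTF-16BE " + hex("ABC", "UTF-16BE")); // 00 41 00 42 00 43
        System.out.println("UTF-16LE " + hex("ABC", "UTF-16LE")); // 41 00 42 00 43 00
        // Java's "UTF-16" encoder writes a big-endian BOM (FE FF) first
        System.out.println("UTF-16   " + hex("ABC", "UTF-16"));   // FE FF 00 41 00 42 00 43
    }
}
```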

• In addition, UTF-16 can represent part of the UCS-4 code points, namely U+10000 to U+10FFFF. The algorithm is more involved; briefly:
• Subtract 0x10000 from the code point U to get U'. This maps U+10000-U+10FFFF onto 0x00000-0xFFFFF.
• Write U' as a 20-bit binary number: U' = yyyyyyyyyyxxxxxxxxxx
• Put the first 10 bits into W1 and the last 10 bits into W2: W1 = 110110yyyyyyyyyy, W2 = 110111xxxxxxxxxx. W1 then lies in D800-DBFF and W2 in DC00-DFFF.
• For example, U+12345 is represented as D8 08 DF 45 (UTF-16BE) or 08 D8 45 DF (UTF-16LE).
• Because of this algorithm, the UCS-2 code points U+D800-U+DFFF are reserved for surrogates and left undefined as characters.
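The steps above can be written out directly and cross-checked against the JDK's `Character.toChars`. In this sketch, `SurrogateDemo` and `toSurrogates` are hypothetical names.

```java
public class SurrogateDemo {
    // Splits a supplementary code point into a UTF-16 surrogate pair:
    // subtract 0x10000, then put the top 10 bits into W1 and the low 10 bits into W2.
    static char[] toSurrogates(int codePoint) {
        if (codePoint < 0x10000 || codePoint > 0x10FFFF) {
            throw new IllegalArgumentException("not a supplementary code point");
        }
        int u = codePoint - 0x10000;             // U' fits in 20 bits
        char w1 = (char) (0xD800 | (u >>> 10));  // 110110yyyyyyyyyy
        char w2 = (char) (0xDC00 | (u & 0x3FF)); // 110111xxxxxxxxxx
        return new char[] {w1, w2};
    }

    public static void main(String[] args) {
        char[] pair = toSurrogates(0x12345);
        System.out.printf("U+12345 -> %04X %04X%n", (int) pair[0], (int) pair[1]); // D808 DF45
        // Cross-check with the JDK's own conversion
        char[] jdk = Character.toChars(0x12345);
        System.out.println(pair[0] == jdk[0] && pair[1] == jdk[1]); // true
    }
}
```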

References: http://www.ietf.org/rfc/rfc2781.txt
12. UTF-32
UTF-32 uses four bytes per code point. That is enough to store any code point directly, without an algorithm like UTF-16's (Unicode restricts UTF-32 to U+0000-U+10FFFF, the same range UTF-16 covers). Like UTF-16, UTF-32 comes in three variants, UTF-32, UTF-32BE and UTF-32LE, and plain UTF-32 also uses a BOM. Using 'ABC' as the example:
UTF-32BE                 00 00 00 41 00 00 00 42 00 00 00 43
UTF-32LE                 41 00 00 00 42 00 00 00 43 00 00 00
UTF-32 (Big Endian)      00 00 FE FF 00 00 00 41 00 00 00 42 00 00 00 43
UTF-32 (Little Endian)   FF FE 00 00 41 00 00 00 42 00 00 00 43 00 00 00
UTF-32 (without BOM)     00 00 00 41 00 00 00 42 00 00 00 43
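A sketch that reproduces the BOM-less rows with the JDK's UTF-32BE and UTF-32LE charsets; `Utf32Demo` and `hex` are hypothetical names. The BOM rows are left out because the BOM behaviour of Java's plain "UTF-32" encoder is not relied on here. A supplementary character is included to show that it also takes exactly four bytes.

```java
import java.nio.charset.Charset;

public class Utf32Demo {
    // Encodes s with the named charset and returns the bytes as uppercase hex pairs.
    static String hex(String s, String charsetName) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(Charset.forName(charsetName))) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(hex("ABC", "UTF-32BE")); // 00 00 00 41 00 00 00 42 00 00 00 43
        System.out.println(hex("ABC", "UTF-32LE")); // 41 00 00 00 42 00 00 00 43 00 00 00
        // U+12345 is one code point, so it is one 4-byte unit in UTF-32
        String supp = new String(Character.toChars(0x12345));
        System.out.println(hex(supp, "UTF-32BE"));  // 00 01 23 45
    }
}
```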
13. UTF-8
A drawback of UTF-16 and UTF-32 is that they use at least two (UTF-16) or always four (UTF-32) bytes per character. A pure ASCII file therefore contains many 00 bytes, which wastes space. UTF-8, defined in RFC 3629, solves this problem.
UTF-8 represents a code point with 1 to 4 bytes, as follows:
UCS-2 (UCS-4)        Bit sequence                           Byte 1    Byte 2    Byte 3    Byte 4
U+0000..U+007F       00000000-0xxxxxxx                      0xxxxxxx
U+0080..U+07FF       00000xxx-xxyyyyyy                      110xxxxx  10yyyyyy
U+0800..U+FFFF       xxxxyyyy-yyzzzzzz                      1110xxxx  10yyyyyy  10zzzzzz
U+10000..U+10FFFF    00000000-000wwwxx-xxxxyyyy-yyzzzzzz    11110www  10xxxxxx  10yyyyyy  10zzzzzz
ASCII characters (U+0000-U+007F) take exactly one byte, which avoids wasting space. UTF-8 also does not need a BOM.
The table also shows how long each sequence is from its first byte: [00-7F] starts a single-byte sequence, [C2-DF] a two-byte sequence, [E0-EF] a three-byte sequence, and [F0-F4] a four-byte sequence. Because the first byte alone gives the length, decoding is much simpler.
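The table above translates almost line by line into an encoder. The sketch below is a hand-written illustration rather than the JDK's implementation, and `Utf8Demo` and `encode` are hypothetical names. It follows RFC 3629 and checks its output against `String.getBytes(UTF_8)`. For brevity, surrogate code points (U+D800-U+DFFF) are not rejected.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8Demo {
    // Encodes one code point following the UTF-8 table (RFC 3629).
    static byte[] encode(int cp) {
        if (cp < 0x80) {                 // 0xxxxxxx
            return new byte[] {(byte) cp};
        } else if (cp < 0x800) {         // 110xxxxx 10yyyyyy
            return new byte[] {(byte) (0xC0 | (cp >>> 6)),
                               (byte) (0x80 | (cp & 0x3F))};
        } else if (cp < 0x10000) {       // 1110xxxx 10yyyyyy 10zzzzzz
            return new byte[] {(byte) (0xE0 | (cp >>> 12)),
                               (byte) (0x80 | ((cp >>> 6) & 0x3F)),
                               (byte) (0x80 | (cp & 0x3F))};
        } else if (cp <= 0x10FFFF) {     // 11110www 10xxxxxx 10yyyyyy 10zzzzzz
            return new byte[] {(byte) (0xF0 | (cp >>> 18)),
                               (byte) (0x80 | ((cp >>> 12) & 0x3F)),
                               (byte) (0x80 | ((cp >>> 6) & 0x3F)),
                               (byte) (0x80 | (cp & 0x3F))};
        }
        throw new IllegalArgumentException("code point out of range");
    }

    public static void main(String[] args) {
        int[] samples = {0x41, 0xE9, 0x4E2D, 0x10000};
        for (int cp : samples) {
            byte[] mine = encode(cp);
            byte[] jdk = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
            System.out.printf("U+%04X: %d byte(s), matches JDK: %b%n",
                    cp, mine.length, Arrays.equals(mine, jdk));
        }
    }
}
```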

14. Java and Unicode
The following passages are from the Java Language Specification (§3.1), which describes how Java uses Unicode.
Programs are written using the Unicode character set. Information about this character set and its associated character encodings may be found at:
http://www.unicode.org
The Java platform tracks the Unicode specification as it evolves. The precise version of Unicode used by a given release is specified in the documentation of the class Character.
Versions of the Java programming language prior to 1.1 used Unicode version 1.1.5. Upgrades to newer versions of the Unicode Standard occurred in JDK 1.1 (to Unicode 2.0), JDK 1.1.7 (to Unicode 2.1), J2SE 1.4 (to Unicode 3.0), and J2SE 5.0 (to Unicode 4.0).
(J2SE 6.0 also uses Unicode 4.0.)
The Unicode standard was originally designed as a fixed-width 16-bit character encoding. It has since been changed to allow for characters whose representation requires more than 16 bits. The range of legal code points is now U+0000 to U+10FFFF, using the hexadecimal U+n notation. Characters whose code points are greater than U+FFFF are called supplementary characters. To represent the complete range of characters using only 16-bit units, the Unicode standard defines an encoding called UTF-16. In this encoding, supplementary characters are represented as pairs of 16-bit code units, the first from the high-surrogates range, (U+D800 to U+DBFF), the second from the low-surrogates range (U+DC00 to U+DFFF). For characters in the range U+0000 to U+FFFF, the values of code points and UTF-16 code units are the same.

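The Java programming language represents text in sequences of 16-bit code units, using the UTF-16 encoding. A few APIs, mainly in the Character class, use 32-bit integers to represent individual code points. The Java platform provides methods to convert between the two representations.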
This book uses the terms code point and UTF-16 code unit where the representation is relevant, and the generic term character where the representation is irrelevant to the discussion.
(In the API, an int variable that represents a code point is usually named codePoint, while a UTF-16 code unit naturally has type char.)

Except for comments (¡ì3.7), identifiers, and the contents of character and string literals (¡ì3.10.4, ¡ì3.10.5), all input elements (¡ì3.5) in a program are formed only from ASCII characters (or Unicode escapes (¡ì3.3) which result in ASCII characters). ASCII (ANSI X3.4) is the American Standard Code for Information Interchange. The first 128 characters of the Unicode character encoding are the ASCII characters.

The following passage is from the API documentation of the java.lang.Character class.
The set of characters from U+0000 to U+FFFF is sometimes referred to as the Basic Multilingual Plane (BMP). Characters whose code points are greater than U+FFFF are called supplementary characters. The Java 2 platform uses the UTF-16 representation in char arrays and in the String and StringBuffer classes. In this representation, supplementary characters are represented as a pair of char values, the first from the high-surrogates range, (\uD800-\uDBFF), the second from the low-surrogates range (\uDC00-\uDFFF).
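The difference between char (UTF-16 code unit) and code point can be seen in a few lines. This sketch uses only standard `String`/`Character` methods; `CodePointDemo` and `SAMPLE` are hypothetical names. It builds a string from "A", the BMP character U+4E2D and the supplementary character U+2F81A.

```java
public class CodePointDemo {
    // "A", the BMP character U+4E2D and the supplementary character U+2F81A
    static final String SAMPLE = "A\u4E2D" + new String(Character.toChars(0x2F81A));

    public static void main(String[] args) {
        System.out.println(SAMPLE.length());                           // 4 chars (code units)
        System.out.println(SAMPLE.codePointCount(0, SAMPLE.length())); // 3 code points
        // Walk the string by code point instead of by char
        for (int i = 0; i < SAMPLE.length(); ) {
            int cp = SAMPLE.codePointAt(i);
            System.out.printf("U+%04X uses %d char(s)%n", cp, Character.charCount(cp));
            i += Character.charCount(cp);
        }
    }
}
```

Iterating with `charAt` would split U+2F81A into its two surrogates. `codePointAt` plus `Character.charCount` keeps each character whole.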



References:
The Java Language Specification on Unicode:
http://java.sun.com/docs/books/jls/third_edition/html/lexical.html#3.1

15. What a char means
A char represents one UTF-16 code unit.
Every char sequence, in whatever form, is interpreted as a UTF-16 sequence.
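16. What is GB2312
GB2312 is the designation of a Chinese character set and its encoding. Its full Chinese name is 信息交换用汉字编码字符集 ("Coded Chinese Character Set for Information Interchange"). It was issued by the State Bureau of Standards of the People's Republic of China and took effect on May 1, 1981. "GB" is the pinyin abbreviation of 国标 (Guóbiāo, "national standard"). The GB2312 character set contains only simplified Chinese characters plus commonly used letters and symbols. It is used mainly in mainland China and Singapore.
GB2312 contains 7445 characters in total: 6763 simplified Chinese characters and 682 letters and symbols.

Introductions to the standard:
http://www.nits.gov.cn/sc2/jishufile1.asp
http://zh.wikipedia.org/w/index.php?title=GB_2312&variant=zh-cn
List of all characters:
http://www.mytju.com/classcode/tools/QuWeiMa_FullList.asp
GB2312 character set and encoding table (Herong's version):
http://cuimingda.com/2008/12/gb2312.html
Herong's Notes - GB2312 character set and encoding table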
http://www.herongyang.com/gb2312_gb/about.html


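17. What is a row-cell code (区位码)
GB2312 divides its characters into 94 rows (区), numbered 01 to 94. Each row holds 94 characters in cells (位) numbered 01 to 94. Every GB2312 character is identified by its unique row number and cell number. For example, the character 啊 is at row 16, cell 01.

Row distribution of the GB2312 character set:

  Row     Count   Category
   01      94    General symbols
   02      72    Numbered symbols
   03      94    Latin letters
   04      83    Japanese hiragana
   05      86    Japanese katakana
   06      48    Greek letters
   07      66    Cyrillic (Russian) letters
   08      63    Pinyin symbols
   09      76    Graphic symbols
10-15            Reserved
16-55    3755    Level-1 Chinese characters, ordered by pinyin
56-87    3008    Level-2 Chinese characters, ordered by strokes
88-94            Reserved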
18. What is the GB2312 encoding
• The original GB2312 encoding represents every character with two bytes. The first ("high") byte is the row number plus 32, and the second ("low") byte is the cell number plus 32. For example, 啊 is at row 16, cell 01, so its high byte is 16 + 32 = 48 (0x30) and its low byte is 01 + 32 = 33 (0x21), giving the code 0x3021. Adding 32 presumably keeps the bytes out of the low range 0x00-0x20, which holds the control characters and the space, so every byte falls in 0x21-0x7E.
• Because the original GB2312 codes overlap with ASCII byte values, the GB2312 encoding in common use adds 128 to each of the two bytes. For example, 啊 (row 16, cell 01) has the original code 0x3021 and the common code 0xB0A1. Unless stated otherwise, "GB2312" usually means this modified encoding (also known as EUC-CN).
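The row/cell arithmetic above as a small sketch. It assumes the JDK's "GB2312" charset (EUC-CN, available in standard JDK builds); `Gb2312Demo`, `rowCellToBytes` and `rowCellToString` are hypothetical names. Row 16, cell 01 should come out as 0xB0A1, which decodes to U+554A (啊).

```java
import java.nio.charset.Charset;

public class Gb2312Demo {
    // Row/cell -> bytes: the raw code is position + 32 (0x20); the common
    // EUC-CN form adds 128 (0x80) more, so each byte is position + 0xA0.
    static byte[] rowCellToBytes(int row, int cell) {
        return new byte[] {(byte) (row + 0xA0), (byte) (cell + 0xA0)};
    }

    static String rowCellToString(int row, int cell) {
        return new String(rowCellToBytes(row, cell), Charset.forName("GB2312"));
    }

    public static void main(String[] args) {
        byte[] b = rowCellToBytes(16, 1);
        System.out.printf("raw 0x%02X%02X, EUC-CN 0x%02X%02X%n",
                16 + 0x20, 1 + 0x20, b[0] & 0xFF, b[1] & 0xFF); // raw 0x3021, EUC-CN 0xB0A1
        System.out.printf("U+%04X%n", (int) rowCellToString(16, 1).charAt(0)); // U+554A
    }
}
```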
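19. What is GBK
• Lineage: Unicode 1.1 → GB 13000.1-93 → GBK
• GBK's full name is 汉字内码扩展规范, in English the Chinese Internal Code Specification. The K is the initial of 扩 in 扩展 (KuoZhan, "extension"). GBK derives from the Chinese national standard GB 13000.1-93.
• In 1993 Unicode 1.1 was released. It contains 20,902 Chinese characters taken from the common character sets of mainland China, Taiwan, Japan and Korea.
• Mainland China then adopted GB 13000.1-93, 信息技术 通用多八位编码字符集（UCS） 第一部分：体系结构与基本多文种平面 ("Information technology - Universal Multiple-Octet Coded Character Set (UCS) - Part 1: Architecture and Basic Multilingual Plane"), which is equivalent to Unicode 1.1.
• GB 2312-80 contains only 6763 Chinese characters, and many characters were missing from it:
  * characters simplified only after GB 2312-80 was published (such as 啰)
  * some characters used in personal names (such as 镕 in the name of former Chinese Premier Zhu Rongji, 朱镕基)
  * the traditional characters used in Taiwan and Hong Kong
  * the Chinese characters used in Japanese and Korean
  Chinese computer vendors therefore used the code space that GB 2312-80 left unused to include every Chinese character in Unicode 1.1 and GB 13000.1-93, and the result was the GBK encoding.
• According to Western sources, GBK began as Microsoft's extension of GB2312, that is, an extension of code page CP936 (the original CP936 was identical to GB 2312-80). It first appeared in the Simplified Chinese edition of Windows 95. Because Windows was popular and widely used in mainland China, the relevant PRC government departments adopted it as a technical specification. Note that GBK is not an official national standard. It is only a "technical specification guidance document" issued by the Standardization Department of the State Bureau of Technical Supervision and the Science, Technology and Quality Supervision Department of the Ministry of Electronics Industry. Although GBK contains all the Chinese characters of Unicode 1.1 and GB 13000.1-93, it encodes them differently from Unicode 1.1 and GB 13000.1-93. It is only a transitional scheme between GB 2312 and GB 13000.1-93.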
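A sketch of the difference described above, assuming the JDK's "GB2312" and "GBK" charsets are available (they ship with standard JDK builds); `GbkDemo` and `canEncode` are hypothetical names. The traditional character U+570B (國) is used as the test case because GB2312 has only the simplified form U+56FD (国). A GB2312 character such as U+554A (啊) keeps the same bytes in GBK.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

public class GbkDemo {
    // True if the named charset can encode the given char.
    static boolean canEncode(String charsetName, char c) {
        return Charset.forName(charsetName).newEncoder().canEncode(c);
    }

    public static void main(String[] args) {
        char guo = '\u570B'; // traditional form; GB2312 has only the simplified U+56FD
        System.out.println("GB2312: " + canEncode("GB2312", guo)); // false
        System.out.println("GBK:    " + canEncode("GBK", guo));    // true
        // A GB2312 character keeps the same bytes in GBK (U+554A -> B0 A1)
        byte[] a1 = "\u554A".getBytes(Charset.forName("GB2312"));
        byte[] a2 = "\u554A".getBytes(Charset.forName("GBK"));
        System.out.println(Arrays.equals(a1, a2)); // true
    }
}
```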

References:
The GBK code table can be found here:
http://www.microsoft.com/globaldev/reference/dbcs/936.mspx

Other:
http://www.nits.gov.cn/sc2/jishufile1-2.asp
http://zh.wikipedia.org/w/index.php?title=GBK&variant=zh-cn

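20. What is GB18030
• GB 18030 is, in full, the national standard GB 18030-2005 信息技术 中文编码字符集 ("Information technology - Chinese coded character set"). At the time of writing it is the newest internal-code character set of the People's Republic of China. It revises GB 18030-2000 信息技术 信息交换用汉字编码字符集 基本集的扩充 ("Information technology - Chinese ideograms coded character set for information interchange - Extension for the basic set"). It is fully compatible with GB 2312-1980, largely compatible with GBK, supports all unified Chinese characters of GB 13000 and Unicode, and contains 70,244 Chinese characters.
• Main features of GB 18030:
    * Multi-byte encoding: each character consists of 1, 2 or 4 bytes.
    * A very large code space: up to about 1.61 million characters can be defined.
    * Supports the scripts of China's ethnic minorities without resorting to the user-defined area.
• Byte structure
• Single byte: values 0x00 to 0x7F, 128 code points
• Two bytes: first byte 0x81 to 0xFE, second byte 0x40 to 0xFE (excluding 0x7F), 23,940 code points
• Four bytes: first byte 0x81 to 0xFE, second byte 0x30 to 0x39, third byte 0x81 to 0xFE, fourth byte 0x30 to 0x39, 1,587,600 code points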
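The three byte lengths above can be observed with the JDK's "GB18030" charset, assumed available as in standard JDK builds; `Gb18030Demo` and `encodedLength` are hypothetical names. The sketch encodes an ASCII letter, a GB2312 character and a supplementary character.

```java
import java.nio.charset.Charset;

public class Gb18030Demo {
    static final Charset GB18030 = Charset.forName("GB18030");

    // Number of bytes GB18030 uses for one code point.
    static int encodedLength(int codePoint) {
        return new String(Character.toChars(codePoint)).getBytes(GB18030).length;
    }

    public static void main(String[] args) {
        System.out.println(encodedLength('A'));     // 1: single byte, same as ASCII
        System.out.println(encodedLength(0x554A));  // 2: B0 A1, same as GB2312
        System.out.println(encodedLength(0x10000)); // 4: supplementary characters use 4 bytes
    }
}
```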

References:
Introduction:
http://zh.wikipedia.org/w/index.php?title=GB_18030&variant=zh-cn

Details of the GB18030-2000 standard:
http://www.foundertype.com.cn/product_oem_2.htm
Advantages of GB18030:
http://www.nits.gov.cn/sc2/jishufile1-3.asp
About the GB18030 Chinese character encoding standard:
http://tech.sina.com.cn/s/2001-07-26/1850.html
21. How to view the GB 18030-2000 characters made of four bytes
See gb18030_4byte.jsp.
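The contents of gb18030_4byte.jsp are not shown here. As a rough stand-in (an assumption, not the author's page), the four-byte sequences can be decoded one by one with the JDK's "GB18030" charset. `Gb18030FourByteDemo` and `decodeFourBytes` are hypothetical names. The first four-byte code, 81 30 81 30, corresponds to U+0080.

```java
import java.nio.charset.Charset;

public class Gb18030FourByteDemo {
    static final Charset GB18030 = Charset.forName("GB18030");

    // Decodes one four-byte GB18030 sequence and returns the resulting code point.
    static int decodeFourBytes(int b1, int b2, int b3, int b4) {
        byte[] bytes = {(byte) b1, (byte) b2, (byte) b3, (byte) b4};
        return new String(bytes, GB18030).codePointAt(0);
    }

    public static void main(String[] args) {
        // The first ten four-byte codes: 81 30 81 30 .. 81 30 81 39
        for (int b4 = 0x30; b4 <= 0x39; b4++) {
            int cp = decodeFourBytes(0x81, 0x30, 0x81, b4);
            System.out.printf("81 30 81 %02X -> U+%04X%n", b4, cp);
        }
    }
}
```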
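22. What is ANSI
• To let computers support more languages, two bytes in the range 0x80-0xFF are commonly used to represent one character. For example, on a Chinese operating system the character 中 is stored as the two bytes [0xD6, 0xD0].
• Different countries and regions established different standards, which produced encodings such as GB2312, BIG5 and JIS. These various extended encodings, which use 2 bytes for one Chinese character, are called ANSI encodings. On a Simplified Chinese system ANSI means GB2312 (in practice code page 936, i.e. GBK). On a Japanese system ANSI means Shift_JIS (code page 932), a JIS-based encoding.
• Different ANSI encodings are incompatible with one another. When information is exchanged internationally, text in two different languages therefore cannot be stored in the same ANSI-encoded text.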
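A sketch of the incompatibility described above. It assumes the JDK's "GBK" and "Shift_JIS" charsets, both present in standard JDK builds; `AnsiDemo` and `gbkBytes` are hypothetical names. 中 (U+4E2D) is encoded the way a Simplified Chinese system stores it, and the same bytes are then read with a Japanese code page.

```java
import java.nio.charset.Charset;

public class AnsiDemo {
    // Encodes s the way a Simplified Chinese "ANSI" system would (code page 936 / GBK).
    static byte[] gbkBytes(String s) {
        return s.getBytes(Charset.forName("GBK"));
    }

    public static void main(String[] args) {
        String zhong = "\u4E2D";
        byte[] b = gbkBytes(zhong);
        System.out.printf("%02X %02X%n", b[0] & 0xFF, b[1] & 0xFF); // D6 D0
        // The same two bytes read with a Japanese code page are different text
        String asJapanese = new String(b, Charset.forName("Shift_JIS"));
        System.out.println(asJapanese.equals(zhong)); // false
    }
}
```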
23. What is a code page
"Code page" is a traditional IBM term. It simply means "a character encoding table".

References:
http://www.microsoft.com/globaldev/reference/dbcs/936.mspx
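24. The relationship between GB2312, GBK and GB18030
GBK extends GB2312, and GB18030 is fully compatible with GB2312 and largely compatible with GBK. The frameset below shows the comparison pages gb2312.jsp, gbk.jsp, gb18030.jsp, htmlcode.jsp and unicode.jsp side by side: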
<html>
<head><title>Frame</title></head>
<frameset cols="30%, *, *, *,*">
<frame name="middle" src="gb2312.jsp">
<frame name="middle" src="gbk.jsp">
<frame name="middle" src="gb18030.jsp">
<frame name="left" src="htmlcode.jsp">
<frame name="left" src="unicode.jsp">
</frameset>
</html>

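25. Converting GB2312 to Unicode
The program below visits every row/cell position in the non-reserved rows of GB2312. For each position it writes the row/cell number, the GB bytes (as the character itself and in hex), the UTF-16BE code and the UTF-8 bytes to the file gb2312.gb. Unused positions are shown as null.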
import java.io.*;
import java.nio.*;
import java.nio.charset.*;
class GB2312Unicode {
   static OutputStream out = null;
   static char hexDigit[] = {'0', '1', '2', '3', '4', '5', '6', '7',
                             '8', '9', 'A', 'B', 'C', 'D', 'E', 'F'};
   static int b_out[] = {201,267,279,293,484,587,625,657,734,782,827,
      874,901,980,5590};
   static int e_out[] = {216,268,280,294,494,594,632,694,748,794,836,
      894,903,994,5594};
   public static void main(String[] args) {
      try {
         out = new FileOutputStream("gb2312.gb");
         writeCode();
         out.close();
      } catch (IOException e) {
         System.out.println(e.toString());
      }
   }
   public static void writeCode() throws IOException {
      boolean reserved = false;
      String name = null;
      // GB2312 was not supported by the author's JDK, so GBK (a superset of GB2312) is used.
      CharsetDecoder gbdc = Charset.forName("GBK").newDecoder();
      CharsetEncoder uxec = Charset.forName("UTF-16BE").newEncoder();
      CharsetEncoder u8ec = Charset.forName("UTF-8").newEncoder();
      ByteBuffer gbbb = null;
      ByteBuffer uxbb = null;
      ByteBuffer u8bb = null;
      CharBuffer cb = null;
      int count = 0;
      for (int i=1; i<=94; i++) {
         // Defining row settings
         if (i>=1 && i<=9) {
            reserved = false;
            name = "Graphic symbols";
         } else if (i>=10 && i<=15) {
            reserved = true;
            name = "Reserved";
         } else if (i>=16 && i<=55) {
            reserved = false;
            name = "Level 1 characters";
         } else if (i>=56 && i<=87) {
            reserved = false;
            name = "Level 2 characters";
         } else if (i>=88 && i<=94) {
            reserved = true;
            name = "Reserved";
         }
         // writing row title
         writeln();
         writeString("");
         writeNumber(i);
         writeString(" Row: "+name);
         writeln();
         writeln();
         writeln();
         if (!reserved) {
            writeln();
            writeHeader();
            // looping through all characters in one row
            for (int j=1; j<=94; j++) {
               byte hi = (byte)(0xA0 + i);
               byte lo = (byte)(0xA0 + j);
               if (validGB(i,j)) {
                  // getting GB, UTF-16BE, UTF-8 codes
                  gbbb = ByteBuffer.wrap(new byte[]{hi,lo});
                  try {
                     cb = gbdc.decode(gbbb);
                     uxbb = uxec.encode(cb);
                     cb.rewind();
                     u8bb = u8ec.encode(cb);
                  } catch (CharacterCodingException e) {
                     cb = null;
                     uxbb = null;
                     u8bb = null;
                  }
               } else {
                  cb = null;
                  uxbb = null;
                  u8bb = null;
               }
               writeNumber(i);
               writeNumber(j);
               writeString(" ");
               if (cb!=null) {
                  writeByte(hi);
                  writeByte(lo);
                  writeString(" ");
                  writeHex(hi);
                  writeHex(lo);
                  count++;
               } else {
                  writeGBSpace();
                  writeString(" null");
               }
               writeString(" ");
               writeByteBuffer(uxbb,2);
               writeString(" ");
               writeByteBuffer(u8bb,3);
               if (j%2 == 0) {
                  writeln();
               } else {
                  writeString("   ");
               }
            }
            writeFooter();
         }
      }
      System.out.println("Number of GB characters written: "+count);
   }
   public static void writeln() throws IOException {
      out.write(0x0D);
      out.write(0x0A);
   }
   public static void writeByte(byte b) throws IOException {
      out.write(b & 0xFF);
   }
   public static void writeByteBuffer(ByteBuffer b, int l)
      throws IOException {
      int i = 0;
      if (b==null) {
         writeString("null");
         i = 2;
      } else {
         for (i=0; i<b.limit(); i++) writeHex(b.get(i));
      }
      for (int j=i; j<l; j++) writeString("  ");
   }
   public static void writeGBSpace() throws IOException {
      out.write(0xA1);
      out.write(0xA1);
   }
   public static void writeString(String s) throws IOException {
      if (s!=null) {
         for (int i=0; i<s.length(); i++) {
            out.write((int) (s.charAt(i) & 0xFF));
         }
      }        
   }
   public static void writeNumber(int i) throws IOException {
      String s = "00" + String.valueOf(i);
      writeString(s.substring(s.length()-2,s.length()));
   }
   public static void writeHex(byte b) throws IOException {
      out.write((int) hexDigit[(b >> 4) & 0x0F]);
      out.write((int) hexDigit[b & 0x0F]);
   }
   public static void writeHeader() throws IOException {
      writeString("<pre>");
      writeln();
      writeString("Q.W. ");
      writeGBSpace();
      writeString(" GB   Uni. UTF-8 ");
      writeString("   ");
      writeString("Q.W. ");
      writeGBSpace();
      writeString(" GB   Uni. UTF-8 ");
      writeln();
      writeln();
   }
   public static void writeFooter() throws IOException {
      writeString("</pre>");
      writeln();
   }
   public static boolean validGB(int i,int j) {
      for (int l=0; l<b_out.length; l++) {
         if (i*100+j>=b_out[l] && i*100+j<=e_out[l]) return false;
      }
      return true;
   }
}


26. How to encode a string
public byte[] getBytes(String charsetName)
                throws UnsupportedEncodingException
Encodes this String into a sequence of bytes using the named charset, storing the result into a new byte array.
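A usage sketch for the method above, assuming a standard JDK; `GetBytesDemo` and `lengths` are hypothetical names. The String-name overload declares the checked `UnsupportedEncodingException`. The `getBytes(Charset)` overload does not declare it, and `StandardCharsets` (Java 7+) provides the common charsets as constants. The same two characters 中文 take a different number of bytes in each encoding.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class GetBytesDemo {
    // Byte counts of s in GBK, UTF-8 and UTF-16BE.
    static int[] lengths(String s) {
        return new int[] {
            s.getBytes(Charset.forName("GBK")).length,
            s.getBytes(StandardCharsets.UTF_8).length,
            s.getBytes(StandardCharsets.UTF_16BE).length
        };
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String s = "\u4E2D\u6587";
        // The String-name overload must handle UnsupportedEncodingException
        byte[] gbk = s.getBytes("GBK");
        System.out.println(gbk.length);                    // 4
        System.out.println(Arrays.toString(lengths(s)));   // [4, 6, 4]
    }
}
```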

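27. How to get the Unicode values of a string
// Requires: import java.io.UnsupportedEncodingException;

// Returns each UTF-16 code unit (char) of s as 4 hex digits, e.g. "4e2d 6587 "
public static String getUnicodeFromStr(String s) {
    StringBuilder retS = new StringBuilder();
    for (int i = 0; i < s.length(); i++) {
        char c = s.charAt(i);
        retS.append(String.format("%04x", (int) c)).append(' ');
    }
    return retS.toString();
}

// Encodes s with Java's "unicode" charset (an alias of UTF-16, which writes a
// big-endian BOM, so the output starts with "fe ff") and returns the bytes in hex.
public static String getUnicodeFromStr2(String s) throws UnsupportedEncodingException {
    StringBuilder retS = new StringBuilder();
    byte[] bytes = s.getBytes("unicode");
    for (int i = 0; i < bytes.length; i++) {
        // Mask with 0xFF: otherwise a negative byte such as 0xFE prints as fffffffe
        retS.append(String.format("%02x", bytes[i] & 0xFF)).append(' ');
    }
    return retS.toString();
}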

28. How many encodings does Java support?
import java.nio.charset.*;
import java.util.*;

public class Encode1 {
    public static void main(String[] args) {
        // availableCharsets() returns a sorted map from canonical name to Charset
        Map<String, Charset> availcs = Charset.availableCharsets();
        for (String name : availcs.keySet()) {
            System.out.println(name);
        }
        // The count depends on the JDK version and vendor
        System.out.println("Total: " + availcs.size());
    }
}

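29. What is Java's default encoding?
import java.io.*;
import java.nio.charset.Charset;

public class Encode2 {
    public static void main(String[] args) throws IOException {
        // FileWriter uses the default charset; getEncoding() returns its
        // historical name (for example "GBK" or "UTF8")
        FileWriter filewriter = new FileWriter("out");
        String encname = filewriter.getEncoding();
        filewriter.close();
        System.out.println("default charset is: " + encname);
        // The same default as a canonical charset name
        System.out.println("Charset.defaultCharset(): " + Charset.defaultCharset());
        // Since JDK 18 (JEP 400) the default is UTF-8 on every platform
    }
}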

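30. The 联通 (China Unicom) problem
Speaking of encodings, here is a well-known oddity. In Windows Notepad, create a new file, type the two characters "联通", save and close it, then open it again. The two characters have vanished, and a few garbled characters appear in their place! Some people joke that this is why 联通 (China Unicom) can't compete with 移动 (China Mobile).
In fact the cause is a collision between the GB2312 encoding and the UTF-8 encoding.
When a program opens a text file, the first thing it has to do is decide which encoding of which character set the text was saved in. Programs generally use three ways to decide:
detect a marker at the start of the file, ask the user to choose, or guess according to some rules.
The most standard way is to examine the first few bytes of the text. The leading bytes and the charset/encoding they indicate are:
EF BB BF       UTF-8
FE FF          UTF-16/UCS-2, big endian
FF FE          UTF-16/UCS-2, little endian
FF FE 00 00    UTF-32/UCS-4, little endian
00 00 FE FF    UTF-32/UCS-4, big endian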
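A minimal BOM check following the table above; `BomDemo`, `detectBom` and `startsWith` are hypothetical names. The longer marks are tested first because FF FE 00 00 also begins with FF FE. When no BOM is present, the method returns null, which is the case where a program has to ask the user or guess.

```java
public class BomDemo {
    // Returns the encoding indicated by a byte-order mark, or null if there is none.
    static String detectBom(byte[] d) {
        if (startsWith(d, 0x00, 0x00, 0xFE, 0xFF)) return "UTF-32BE";
        if (startsWith(d, 0xFF, 0xFE, 0x00, 0x00)) return "UTF-32LE";
        if (startsWith(d, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        if (startsWith(d, 0xFE, 0xFF)) return "UTF-16BE";
        if (startsWith(d, 0xFF, 0xFE)) return "UTF-16LE";
        return null; // no BOM: ask the user or guess
    }

    // True if data begins with the given unsigned byte values.
    static boolean startsWith(byte[] d, int... prefix) {
        if (d.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((d[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] utf8 = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 0x41};
        byte[] utf16le = {(byte) 0xFF, (byte) 0xFE, 0x41, 0x00};
        System.out.println(detectBom(utf8));    // UTF-8
        System.out.println(detectBom(utf16le)); // UTF-16LE
    }
}
```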

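When you create a new text file, Notepad's default encoding is ANSI, the system default encoding, which on a Chinese system is usually one of the GB encodings. Chinese characters typed into an ANSI file are therefore stored in a GB encoding. In this encoding, the internal code of "联通" is: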
c1 1100 0001
aa 1010 1010
cd 1100 1101
a8 1010 1000
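Notice that the first and second bytes, and likewise the third and fourth bytes, begin with "110" and "10". This exactly matches the two-byte template of the UTF-8 rules.
So when the file is opened again, Notepad wrongly concludes that it is a UTF-8 file. Remove the 110 from the first byte and the 10 from the second byte, and we get "00001 101010". Align the bits and pad with leading zeros, and we get "0000 0000 0110 1010", which, sorry to say, is Unicode 006A, the lowercase letter "j". The next two bytes decode under UTF-8 to 0368, a combining mark (U+0368, COMBINING LATIN SMALL LETTER C) rather than an ordinary visible character. That is why a file containing only the two characters "联通" cannot be displayed correctly in Notepad.
If you type a few more characters after "联通", their codes are unlikely to all begin with 110 and 10 as well. When the file is reopened, Notepad no longer insists that it is UTF-8 and reads it as ANSI, and the garbling does not appear.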
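The bit manipulation walked through above can be reproduced in a sketch. It assumes the JDK's "GBK" charset; `LiantongDemo` and `naiveTwoByteUtf8` are hypothetical names. The method deliberately ignores UTF-8's validity rules, as the text does. A strict decoder such as Java's rejects C1 AA, because it is an overlong form that RFC 3629 forbids.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class LiantongDemo {
    // Treats two bytes as the UTF-8 pattern 110xxxxx 10yyyyyy and joins the
    // payload bits, without checking UTF-8 validity rules.
    static int naiveTwoByteUtf8(int b1, int b2) {
        return ((b1 & 0x1F) << 6) | (b2 & 0x3F);
    }

    public static void main(String[] args) {
        byte[] gbk = "\u8054\u901A".getBytes(Charset.forName("GBK"));
        for (byte b : gbk) {
            System.out.printf("%02x %s%n", b & 0xFF, Integer.toBinaryString(b & 0xFF));
        }
        // Read the bytes as two-byte UTF-8 sequences, as in the text
        System.out.printf("U+%04X%n", naiveTwoByteUtf8(gbk[0] & 0xFF, gbk[1] & 0xFF)); // U+006A
        System.out.printf("U+%04X%n", naiveTwoByteUtf8(gbk[2] & 0xFF, gbk[3] & 0xFF)); // U+0368
        // A strict decoder rejects C1 AA and substitutes U+FFFD
        String strict = new String(gbk, StandardCharsets.UTF_8);
        System.out.println(strict.indexOf('\uFFFD') >= 0);
    }
}
```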





Attachment: 尚学堂.张志宇.乱码分析_01_基础.rar (35.9 KB)