
    UTF8<=>UTF16<=>UTF32

    Posted by Chipset on 2021-06-23 06:55:00

    https://github.com/ww898/utf-cpp was written by someone else and is under the MIT License; I am simply keeping a backup copy here: http://www.cppblog.com/Files/Chipset/utf-cpp.7z

    This is a C++11 template-based, header-only library for Windows/Linux/macOS that converts UTF-8/16/32 symbols and strings. The library transparently supports wchar_t as UTF-16 on Windows and as UTF-32 on Linux and macOS.

    In this library, UTF-8 and UTF-32 (UCS-4) both support 31-bit code points [0‥0x7FFFFFFF] with no restriction. UTF-16 supports only Unicode code points [0‥0x10FFFF], and the high [0xD800‥0xDBFF] and low [0xDC00‥0xDFFF] surrogate regions are prohibited as code points.

    The maximum UTF-16 symbol size is 2 words (4 bytes; both words must be in the surrogate region). A UTF-32 (UCS-4) symbol is always 1 word (4 bytes). The maximum UTF-8 symbol size (see the conversion table for details) is:

    • 4 bytes for Unicode code points
    • 6 bytes for 31-bit code points
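As a rough illustration of these size limits, the sketch below computes how many code units a code point needs in UTF-16 and in the extended (up to 31-bit) UTF-8 form. The names `utf16_units` and `utf8_bytes` are made up for this example and are not part of utf-cpp; the thresholds are the standard UTF-8 and UTF-16 range boundaries.

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical helpers for illustration only; not part of the utf-cpp API.

// Number of 16-bit words a code point needs in UTF-16.
inline int utf16_units(std::uint32_t cp) {
    if (cp >= 0xD800u && cp <= 0xDFFFu)
        throw std::invalid_argument("surrogate region is not a valid code point");
    if (cp > 0x10FFFFu)
        throw std::invalid_argument("not representable in UTF-16");
    return cp < 0x10000u ? 1 : 2;  // 2 words means a surrogate pair
}

// Number of bytes a code point needs in UTF-8, using the 6-byte scheme
// that covers the full 31-bit range.
inline int utf8_bytes(std::uint32_t cp) {
    if (cp < 0x80u)       return 1;
    if (cp < 0x800u)      return 2;
    if (cp < 0x10000u)    return 3;
    if (cp < 0x200000u)   return 4;  // includes the Unicode maximum 0x10FFFF
    if (cp < 0x4000000u)  return 5;
    if (cp < 0x80000000u) return 6;
    throw std::invalid_argument("code point wider than 31 bits");
}
```

With these helpers, `utf8_bytes(0x10FFFF)` gives 4 and `utf8_bytes(0x7FFFFFFF)` gives 6, which matches the two bullets above.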

    UTF-16 surrogate decoder:

    High\Low   DC00     DC01     …   DFFF
    D800       010000   010001   …   0103FF
    D801       010400   010401   …   0107FF
    ⋮          ⋮        ⋮        ⋱   ⋮
    DBFF       10FC00   10FC01   …   10FFFF
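The surrogate decoder table above follows from a single formula: each high surrogate selects a block of 0x400 code points and the low surrogate selects the offset inside it. A minimal sketch, assuming a hypothetical helper name `combine_surrogates` that is not part of utf-cpp:

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical helper for illustration only; not part of the utf-cpp API.
// Combines a high and a low surrogate into one Unicode code point.
inline std::uint32_t combine_surrogates(std::uint16_t high, std::uint16_t low) {
    if (high < 0xD800u || high > 0xDBFFu)
        throw std::invalid_argument("invalid high surrogate");
    if (low < 0xDC00u || low > 0xDFFFu)
        throw std::invalid_argument("invalid low surrogate");
    // 10 bits come from each surrogate; the result starts at 0x10000.
    return 0x10000u
         + ((static_cast<std::uint32_t>(high) - 0xD800u) << 10)
         + (static_cast<std::uint32_t>(low) - 0xDC00u);
}
```

For example, `combine_surrogates(0xD801, 0xDC01)` gives 0x010401 and `combine_surrogates(0xDBFF, 0xDFFF)` gives 0x10FFFF, as in the corresponding table cells.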



    UTF-8 Conversion table
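The byte layout that a UTF-8 conversion table describes can also be sketched in code. The function below is a hypothetical helper named `encode_utf8`, not taken from utf-cpp. It assumes the original 1 to 6 byte UTF-8 scheme, which covers 31-bit code points, whereas current Unicode stops at 4 bytes. It does not reject surrogate values.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper for illustration only; not part of the utf-cpp API.
// Encodes one code point (up to 31 bits) as 1 to 6 UTF-8 bytes.
inline std::string encode_utf8(std::uint32_t cp) {
    if (cp > 0x7FFFFFFFu)
        throw std::invalid_argument("code point wider than 31 bits");
    if (cp < 0x80u)
        return std::string(1, static_cast<char>(cp));  // plain ASCII

    // Pick the sequence length and the matching lead-byte marker.
    int len;
    unsigned lead;
    if (cp < 0x800u)          { len = 2; lead = 0xC0u; }
    else if (cp < 0x10000u)   { len = 3; lead = 0xE0u; }
    else if (cp < 0x200000u)  { len = 4; lead = 0xF0u; }
    else if (cp < 0x4000000u) { len = 5; lead = 0xF8u; }
    else                      { len = 6; lead = 0xFCu; }

    std::string out(static_cast<std::size_t>(len), '\0');
    // Fill continuation bytes (10xxxxxx) from the end, 6 bits at a time.
    for (int i = len - 1; i > 0; --i) {
        out[static_cast<std::size_t>(i)] = static_cast<char>(0x80u | (cp & 0x3Fu));
        cp >>= 6;
    }
    // The remaining high bits go into the lead byte.
    out[0] = static_cast<char>(lead | cp);
    return out;
}
```

For instance, `encode_utf8(0x092F)` produces the bytes E0 A4 AF, which is the first character of the sample string used in the library example that follows.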


    // यूनिकोड ("Unicode" in Hindi), written out as UTF-8 bytes
    static char const u8s[] = "\xE0\xA4\xAF\xE0\xA5\x82\xE0\xA4\xA8\xE0\xA4\xBF\xE0\xA4\x95\xE0\xA5\x8B\xE0\xA4\xA1";

    using namespace ww898::utf;

    // UTF-8 (zero-terminated) -> UTF-16
    std::u16string u16;
    convz<utf_selector_t<decltype(*u8s)>, utf16>(u8s, std::back_inserter(u16));

    // UTF-16 (iterator range) -> UTF-32
    std::u32string u32;
    conv<utf16, utf_selector_t<decltype(u32)::value_type>>(u16.begin(), u16.end(), std::back_inserter(u32));

    // UTF-32 (zero-terminated) -> UTF-8
    std::vector<char> u8;
    convz<utf32, utf8>(u32.data(), std::back_inserter(u8));

    // UTF-8 -> wchar_t encoding; note that sizeof(u8s) also counts the terminating '\0'
    std::wstring uw;
    conv<utf8, utfw>(u8s, u8s + sizeof(u8s), std::back_inserter(uw));

    // Overloads that return the converted string
    auto u8r = conv<char>(uw);
    auto u16r = conv<char16_t>(u16);
    auto uwr = convz<wchar_t>(u8s);

    auto u32r = conv<char32_t>(std::string_view(u8r.data(), u8r.size())); // C++17 only

    // char const & and std::vector<char>::value_type select the same UTF type
    static_assert(std::is_same<utf_selector<decltype(*u8s)>, utf_selector<decltype(u8)::value_type>>::value, "Fail");
    // wchar_t matches exactly one of UTF-16 (Windows) or UTF-32 (Linux/macOS)
    static_assert(
        std::is_same<utf_selector_t<decltype(u16)::value_type>, utf_selector_t<decltype(uw)::value_type>>::value !=
        std::is_same<utf_selector_t<decltype(u32)::value_type>, utf_selector_t<decltype(uw)::value_type>>::value, "Fail");


    Chipset 2021-06-23 14:55

