C++: How to read and write files in UTF-8 encoding with fopen

 2016/5/14 5:33:22  aigo  程序员俱乐部
  • Tags: files, C++, encoding

How to write UTF-8 file with fprintf in C++

http://stackoverflow.com/questions/10028750/how-to-write-utf-8-file-with-fprintf-in-c

You shouldn't need to set your locale or set any special modes on the file if you just want to use fprintf. You simply have to use UTF-8 encoded strings.

#include <cstdio>
#include <codecvt>
#include <locale>
#include <string>

int main() {
    // Convert a wide (UTF-16) literal to a UTF-8 encoded narrow string.
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>, wchar_t> convert;
    std::string utf8_string = convert.to_bytes(L"кошка 日本国");

    if (FILE *f = fopen("tmp", "w")) {
        fprintf(f, "%s\n", utf8_string.c_str());
        fclose(f);
    }
}

Save the source file as UTF-8 with signature (BOM) or as UTF-16. Don't use UTF-8 without signature here, otherwise Visual Studio won't produce the right string literal. The file written by the program will contain the UTF-8 version of that string. Or you can do:

#include <cstdio>

int main() {
    if (FILE *f = fopen("tmp", "w")) {
        fprintf(f, "%s\n", "кошка 日本国");
        fclose(f);
    }
}

In this case you must save the file as UTF-8 without signature, because you want the compiler to think the source encoding is the same as the execution encoding. This is a bit of a hack that relies on the compiler's (in my opinion broken) behavior.
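
If depending on how the source file is saved feels too fragile, one possible alternative is to spell out the UTF-8 bytes with hex escapes, so the literal is the same whatever encoding the editor uses. The following is only a sketch and not part of the original answer; the helper names `sample_utf8` and `write_utf8_file` are made up for illustration.

```cpp
#include <cstdio>
#include <string>

// UTF-8 bytes of "кошка 日本国", written as hex escapes so that the
// result does not depend on the encoding of this source file.
std::string sample_utf8() {
    return "\xD0\xBA\xD0\xBE\xD1\x88\xD0\xBA\xD0\xB0"  // кошка
           " "
           "\xE6\x97\xA5\xE6\x9C\xAC\xE5\x9B\xBD";     // 日本国
}

// Hypothetical helper: writes the bytes of `text` plus a newline to `path`.
// Binary mode ("wb") keeps the C runtime from translating the newline.
bool write_utf8_file(const char *path, const std::string &text) {
    FILE *f = std::fopen(path, "wb");
    if (!f) return false;
    bool ok = std::fwrite(text.data(), 1, text.size(), f) == text.size()
              && std::fputc('\n', f) != EOF;
    return std::fclose(f) == 0 && ok;
}
```

Calling `write_utf8_file("tmp", sample_utf8())` should produce the same bytes as the fprintf examples, at the cost of a literal that is harder to read.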

You can do basically the same thing with any of the other APIs for writing narrow characters to a file, but note that none of these methods work for writing UTF-8 to the Windows console. Because the C runtime and/or the console is a bit broken, you can only write UTF-8 directly to the console by calling SetConsoleOutputCP(65001) and then using one of the puts variety of functions.
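
A minimal sketch of that console approach might look like the following. The helper name `print_utf8_line` is hypothetical, the `#ifdef` guard is only there so the snippet also compiles on other platforms, and whether the glyphs actually show up still depends on the console font.

```cpp
#include <cstdio>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: prints a UTF-8 encoded line to the console.
// On Windows it first switches the console output code page to UTF-8
// (65001, also available as CP_UTF8); elsewhere it just calls puts.
bool print_utf8_line(const char *utf8_text) {
#ifdef _WIN32
    if (!SetConsoleOutputCP(CP_UTF8)) return false;
#endif
    return std::puts(utf8_text) >= 0;
}
```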

If you want to use wide characters instead of narrow characters then locale based methods and setting modes on file descriptors could come into play.

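#include <cstdio>
#include <fcntl.h>
#include <io.h>

int main() {
    if (FILE *f = fopen("tmp", "w")) {
        // MSVC-specific: wide output on this stream is converted to UTF-8.
        _setmode(_fileno(f), _O_U8TEXT);
        // %ls always means a wide string; plain %s does so only in MSVC's wide printf.
        fwprintf(f, L"%ls\n", L"кошка 日本国");
        fclose(f);
    }
}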

#include <fstream>
#include <codecvt>
#include <locale>

int main() {
    if(auto f = std::wofstream("tmp")) {
        f.imbue(std::locale(std::locale(),
                new std::codecvt_utf8_utf16<wchar_t>)); // assumes wchar_t is UTF-16
        f << L"кошка 日本国\n";
    }
}

The first example uses wstring_convert from C++11, but any other method of obtaining a UTF-8 encoding works too, e.g. WideCharToMultiByte. The last example uses a C++11 codecvt facet, for which there is no built-in pre-C++11 replacement. The other two examples don't use C++11.
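
As an illustration of "any other method", a UTF-8 encoding can also be produced by hand from Unicode code points. The function below is a rough sketch, not something from the original answer; the name `encode_utf8` is made up, and it takes code points rather than UTF-16 units, so surrogate pairs would have to be combined before calling it.

```cpp
#include <string>

// Sketch: encodes one Unicode code point as UTF-8.
// Returns an empty string for surrogates and for values above U+10FFFF.
std::string encode_utf8(char32_t cp) {
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);                          // 0xxxxxxx
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));            // 110xxxxx
        out += static_cast<char>(0x80 | (cp & 0x3F));          // 10xxxxxx
    } else if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF) return out;          // surrogate
        out += static_cast<char>(0xE0 | (cp >> 12));           // 1110xxxx
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0x10FFFF) {
        out += static_cast<char>(0xF0 | (cp >> 18));           // 11110xxx
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

The resulting std::string can be passed to fprintf with "%s" exactly like utf8_string in the first example.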

How to Read/Write UTF8 text files in C?

http://stackoverflow.com/questions/21737906/how-to-read-write-utf8-text-files-in-c

Instead of

fprintf(fout,"%c ",character);

use

fprintf(fout,"%c",character);

The second fprintf() does not contain a space after %c, and that space is what was causing out.txt to display weird characters. The reason is that fgetc() retrieves a single byte (the same size as an ASCII character), not a whole UTF-8 character, so a space written after every byte lands in the middle of each multi-byte sequence and breaks it. Since UTF-8 is ASCII compatible, English characters are still written to the file just fine.
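
The gap between bytes and characters can be made concrete with a small counting sketch. It is an illustration only: the function name `count_code_points` is hypothetical and the input is assumed to be valid UTF-8.

```cpp
#include <cstddef>
#include <string>

// Counts code points by skipping continuation bytes (10xxxxxx).
// Assumes the input is valid UTF-8; no validation is attempted.
std::size_t count_code_points(const std::string &utf8) {
    std::size_t count = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) != 0x80) ++count;
    }
    return count;
}
```

For the five-letter word "кошка", size() reports 10 bytes while this function reports 5, which is why a loop over fgetc() runs ten times for it.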

putchar(character) outputs the bytes sequentially, without the extra space between every byte, so the original UTF-8 sequence remains intact. To see what I'm talking about, try

while((character=fgetc(fin))!=EOF){
    putchar(character);
    printf(" "); // This mimics what you are doing when you write to out.txt
    fprintf(fout,"%c ",character);
}

If you want to write UTF-8 characters with the space between them to out.txt, you would need to handle the variable length encoding of a UTF-8 character.

#include <stdio.h>
#include <stdlib.h>

/* The first byte of a UTF-8 character
 * indicates how many bytes are in
 * the character, so only check that
 */
int numberOfBytesInChar(unsigned char val) {
    if (val < 128) {
        return 1;
    } else if (val < 224) {
        return 2;
    } else if (val < 240) {
        return 3;
    } else {
        return 4;
    }
}

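int main(){
    FILE *fin;
    FILE *fout;
    int character;
    fin = fopen("in.txt", "r");
    fout = fopen("out.txt","w");
    if (fin == NULL || fout == NULL) {
        fprintf(stderr, "Could not open in.txt or out.txt\n");
        return EXIT_FAILURE;
    }
    while( (character = fgetc(fin)) != EOF) {
        /* Determine the length once, from the first byte only;
         * the loop below overwrites character with later bytes. */
        int length = numberOfBytesInChar((unsigned char)character);
        for (int i = 0; i < length - 1 && character != EOF; i++) {
            putchar(character);
            fprintf(fout, "%c", character);
            character = fgetc(fin);
        }
        if (character == EOF) {
            break; /* truncated sequence at the end of the file */
        }
        putchar(character);
        printf(" ");
        fprintf(fout, "%c ", character);
    }
    fclose(fin);
    fclose(fout);
    printf("\nFile has been created...\n");
    return 0;
}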
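
Note that numberOfBytesInChar() reports 2 for continuation bytes (0x80 to 0xBF), since it only compares against upper bounds. A somewhat stricter variant, sketched here in C++ and working on an in-memory string instead of files, could check the bit patterns of the first byte directly. Both function names are hypothetical, and no validation beyond the first byte is attempted.

```cpp
#include <cstddef>
#include <string>

// Length of the UTF-8 sequence that starts with `lead`, or 0 if `lead`
// is a continuation byte (10xxxxxx) or 0xF8 and above.
// Overlong forms and out-of-range values are not detected.
int utf8_sequence_length(unsigned char lead) {
    if (lead < 0x80) return 1;            // 0xxxxxxx
    if ((lead & 0xE0) == 0xC0) return 2;  // 110xxxxx
    if ((lead & 0xF0) == 0xE0) return 3;  // 1110xxxx
    if ((lead & 0xF8) == 0xF0) return 4;  // 11110xxx
    return 0;
}

// Appends a space after every character of `in`.
// Bytes that cannot start a sequence are copied through one at a time.
std::string space_out_utf8(const std::string &in) {
    std::string out;
    std::size_t i = 0;
    while (i < in.size()) {
        int len = utf8_sequence_length(static_cast<unsigned char>(in[i]));
        if (len == 0) len = 1;  // stray byte: pass it through alone
        // append() clamps the count if the string ends mid-sequence
        out.append(in, i, static_cast<std::size_t>(len));
        out += ' ';
        i += static_cast<std::size_t>(len);
    }
    return out;
}
```

Keeping the whole text in a std::string avoids interleaving fgetc() calls with the length logic, though it does assume the file is small enough to read into memory first.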
