[SOLVED] Poco and UTF-8

Post by petervn » 06 Oct 2009, 21:55

I am looking for a sample showing how to read and write UTF-8 encoded files using Poco.
Last edited by petervn on 13 Oct 2009, 16:31, edited 1 time in total.
petervn
 
Posts: 12
Joined: 06 Oct 2009, 21:39

Re: Poco and UTF-8

Post by alex » 08 Oct 2009, 01:33

Code:
#include <string>
#include "Poco/FileStream.h"

std::string file("testfile.txt");
Poco::FileOutputStream fos(file);
fos << "sometestdata";   // bytes are written unchanged
fos.close();

Poco::FileInputStream fis(file);
std::string read;
fis >> read;             // reads one whitespace-delimited token


But you don't really need POCO if UTF-8 is your only requirement: std::ifstream and std::ofstream will do. POCO has its own wrappers only because the standard file I/O streams cannot deal with Unicode file names on Windows.
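
To illustrate the point about the standard streams, here is a minimal sketch that writes UTF-8 bytes with std::ofstream and reads them back with std::ifstream. The function names writeBytes and readBytes are made up for this example, and opening the file in binary mode is an added assumption (it keeps newline translation on Windows from altering the data).

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: write the bytes of `text` to `path` exactly as they
// are. No transcoding takes place, so UTF-8 stays UTF-8.
bool writeBytes(const std::string& path, const std::string& text)
{
    std::ofstream out(path.c_str(), std::ios::binary);
    out << text;
    out.flush();
    return out.good();
}

// Hypothetical helper: read the whole file back as raw bytes.
std::string readBytes(const std::string& path)
{
    std::ifstream in(path.c_str(), std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

Calling readBytes on a file produced by writeBytes should return the same byte sequence, whether the text is Arabic, Russian, or plain ASCII. The streams never interpret the encoding.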
alex
 
Posts: 1106
Joined: 11 Jul 2006, 16:27
Location: United_States

Re: Poco and UTF-8

Post by petervn » 12 Oct 2009, 20:57

Does the example you've given read UTF-8 encoded Arabic characters? Or Russian? I've poked around the Poco I/O libraries and the examples and found a StreamConverter class that is supposed to automatically convert a UTF-8 representation to some other encoding based on the parameters passed to the class constructor.
Not sure if I am on the right path, though...
Will ifstream work with FTP or HTTP streams? We frequently use Poco streams to download data from FTP sites or HTTP servers, so it is not possible for us to use streams that can only deal with local files.

Re: Poco and UTF-8

Post by alex » 12 Oct 2009, 22:39

petervn wrote: Does the example you've given read UTF-8 encoded Arabic characters? Or Russian?

Yes. But it appears that you say "read" but actually mean "convert".
petervn wrote: I've poked around the Poco I/O libraries and the examples and found a StreamConverter class that is supposed to automatically convert a UTF-8 representation to some other encoding based on the parameters passed to the class constructor.

That's a different issue. If you want to convert the file content to another encoding, then yes, you'll want to use a stream converter, something like this:
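Code:
// Headers: Poco/UTF8Encoding.h, Poco/Latin1Encoding.h,
// Poco/StreamConverter.h, Poco/StreamCopier.h (all in namespace Poco)
UTF8Encoding utf8Encoding;
Latin1Encoding latin1Encoding;

std::ifstream istr(file);   // pre-C++11 compilers need file.c_str() if file is a std::string
std::ostringstream ostr;
InputStreamConverter converter(istr, utf8Encoding, latin1Encoding);   // decode UTF-8, encode Latin-1
StreamCopier::copyStream(converter, ostr);   // ostr.str() now holds the Latin-1 bytes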
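
For readers wondering what such a conversion does to the bytes, here is a hand-written sketch in standard C++ only. It is not Poco's implementation. The function name utf8ToLatin1 is made up for this example, and replacing anything that does not fit in Latin-1 with '?' is an assumption of the sketch. It also does not reject overlong encodings, so it is not a validating decoder.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: decode UTF-8 and re-encode as Latin-1.
// Code points above U+00FF and malformed sequences become '?'.
std::string utf8ToLatin1(const std::string& in)
{
    std::string out;
    std::size_t i = 0;
    while (i < in.size())
    {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        unsigned long cp = 0;
        std::size_t len = 0;

        // The lead byte tells how many bytes the sequence has.
        if (lead < 0x80)                { cp = lead;        len = 1; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; len = 2; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; len = 3; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; len = 4; }
        else { out += '?'; ++i; continue; }      // stray continuation or invalid lead byte

        if (i + len > in.size()) { out += '?'; break; }   // truncated sequence at end of input

        // Each continuation byte (10xxxxxx) contributes six more bits.
        bool ok = true;
        for (std::size_t k = 1; k < len; ++k)
        {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80) { ok = false; break; }
            cp = (cp << 6) | (c & 0x3F);
        }
        if (!ok) { out += '?'; ++i; continue; }

        // Latin-1 covers exactly U+0000 to U+00FF, one byte per code point.
        out += (cp <= 0xFF) ? static_cast<char>(static_cast<unsigned char>(cp)) : '?';
        i += len;
    }
    return out;
}
```

With this sketch, the two-byte UTF-8 sequence for "é" (C3 A9) becomes the single Latin-1 byte E9, while a Cyrillic letter has no Latin-1 equivalent and comes out as '?'. That is why converting Arabic or Russian text to Latin-1 loses information.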


petervn wrote: Not sure if I am on the right path, though...
Will ifstream work with FTP or HTTP streams? We frequently use Poco streams to download data from FTP sites or HTTP servers, so it is not possible for us to use streams that can only deal with local files.


All Poco streams inherit from the std streams, and FTPStreamFactory::open() returns a std::istream pointer, so yes, you should be able to copy and convert between them.
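
Because of that inheritance, code written against std::istream& does not care where the bytes come from. The sketch below uses only the standard library. The function name countCodePoints is made up for this example, and it assumes the input is well-formed UTF-8.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: count UTF-8 code points in any input stream by
// counting the bytes that are not continuation bytes (10xxxxxx).
std::size_t countCodePoints(std::istream& in)
{
    std::size_t count = 0;
    char c;
    while (in.get(c))
    {
        if ((static_cast<unsigned char>(c) & 0xC0) != 0x80)
            ++count;
    }
    return count;
}

// Convenience overload for in-memory text, built on the stream version.
std::size_t countCodePoints(const std::string& text)
{
    std::istringstream in(text);
    return countCodePoints(in);
}
```

The same function should accept a std::ifstream, a Poco::FileInputStream, or the stream returned by a Poco stream factory, since all of them are std::istream objects.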

Re: Poco and UTF-8

Post by petervn » 12 Oct 2009, 23:52

UTF-8 conversion was what I meant, of course. Anyhow, thanks for your replies. After having spent a few hours learning how the stream conversion works, I finally got it working:

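Code:
   Poco::UTF8Encoding utf8;
   Poco::UTF16Encoding utf16;   // native byte order by default
   Poco::FileInputStream fs("someUTF_8_encoded_file.txt");
   Poco::InputStreamConverter conv(fs, utf8, utf16);

   std::streambuf& rdbuf = *conv.rdbuf();
   while (true)
   {
      // sbumpc() bypasses the stream's state flags, so test for EOF directly
      int b0 = rdbuf.sbumpc();
      int b1 = rdbuf.sbumpc();
      if (b0 == EOF || b1 == EOF) break;

      wchar_t wctemp = 0;   // this byte copy assumes a 16-bit wchar_t (Windows)
      char *p = (char*)&wctemp;
      *p++ = (char)b0;
      *p++ = (char)b1;
      wprintf(L"%lc", wctemp);   // %lc is the portable specifier for a wide character
   }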


I've had some issues trying to use operations such as ungetc but I got around it by maintaining my own internal cache.
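
A rough sketch of what such a cache could look like, using only the standard library. The class name Utf16UnitReader is made up for this example, it assumes little-endian UTF-16 input, and it keeps a single pushed-back code unit. It is not the code actually used above.

```cpp
#include <cstdio>      // EOF
#include <sstream>
#include <streambuf>
#include <string>

// Hypothetical helper: reads UTF-16 code units (little-endian assumed) from
// a byte-oriented stream buffer and lets the caller push one unit back.
class Utf16UnitReader
{
public:
    explicit Utf16UnitReader(std::streambuf& buf)
        : _buf(buf), _hasCached(false), _cached(0) {}

    // Stores the next code unit in `unit`; returns false at end of input.
    bool next(unsigned short& unit)
    {
        if (_hasCached)
        {
            unit = _cached;
            _hasCached = false;
            return true;
        }
        const int lo = _buf.sbumpc();
        if (lo == EOF) return false;
        const int hi = _buf.sbumpc();
        if (hi == EOF) return false;   // odd trailing byte: treated as end of input
        unit = static_cast<unsigned short>((hi << 8) | lo);
        return true;
    }

    // Makes `unit` the value returned by the following call to next().
    void unget(unsigned short unit)
    {
        _cached = unit;
        _hasCached = true;
    }

private:
    std::streambuf& _buf;
    bool _hasCached;
    unsigned short _cached;
};
```

Constructed from *conv.rdbuf(), a reader like this would hand out one code unit per call to next(), and unget() would cover the look-ahead case without relying on the stream buffer's own putback support.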



Re: Poco and UTF-8

Post by alex » 13 Oct 2009, 00:08

petervn wrote: I've had some issues trying to use operations such as ungetc but I got around it by maintaining my own internal cache.

It looks like you are targeting Windows only; you may want to take a look at Poco::UnicodeConverter.

