> The XML parser shouldn't do any CRLF conversion. How are you invoking the parser? If you create an istream yourself (e.g., an ifstream), do you make sure that the stream is opened in binary mode?
Thanks for the reply. That was my first thought as well; however, I confirmed that the stream was being passed to the parser correctly. In the function PREFIX(contentTok) in xmltok_impl.c, when a CR is found and is followed by an LF, the pair is explicitly collapsed into a single newline token.
The offending code is:
case BT_CR:
  ptr += MINBPC(enc);
  if (ptr == end)
    return XML_TOK_TRAILING_CR;
  if (BYTE_TYPE(enc, ptr) == BT_LF)
    ptr += MINBPC(enc);
  *nextTokPtr = ptr;
  return XML_TOK_DATA_NEWLINE;
When a CR is encountered, the pointer is advanced past it. If the next byte is an LF, the pointer is advanced again, so both bytes are consumed as one token. The token XML_TOK_DATA_NEWLINE is returned and, as you can see in my original post, only an LF character is passed to the characterDataHandler, rather than a CR followed by an LF.
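To make the effect concrete, here is a rough toy model of that branch in plain C++. It is not the expat code: the function name deliverContent is made up for illustration, it works on single-byte characters only, and it ignores the buffer-boundary handling behind XML_TOK_TRAILING_CR.

```cpp
#include <cstddef>
#include <string>

// Toy model of the unpatched BT_CR / BT_LF handling (hypothetical helper,
// not part of expat or POCO). Returns what a character data handler would
// see for the given input.
std::string deliverContent(const std::string& input)
{
    std::string delivered;
    std::size_t pos = 0;
    while (pos < input.size())
    {
        const char ch = input[pos];
        if (ch == '\r')
        {
            ++pos;                        // step over the CR
            if (pos < input.size() && input[pos] == '\n')
                ++pos;                    // step over the LF that follows it
            delivered += '\n';            // newline token: handler gets 0xA
        }
        else if (ch == '\n')
        {
            ++pos;
            delivered += '\n';            // a lone LF is passed through as LF
        }
        else
        {
            delivered += ch;              // ordinary character data
            ++pos;
        }
    }
    return delivered;
}
```

With this model, deliverContent("a\r\nb") gives "a\nb": the CR never reaches the handler, which matches what I observed.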
I patched my local copy so that CRs are not stripped and the actual values are passed along. This solved the problem: CRLF pairs are no longer converted to a single LF.
The patch is:
Index: xmlparse.cpp
===================================================================
--- xmlparse.cpp (.../trunk/external/poco-1.1.2/XML/src) (revision 10293)
+++ xmlparse.cpp (.../branches/tt6523/external/poco-1.1.2/XML/src) (revision 10293)
@@ -2471,7 +2471,7 @@
         return XML_ERROR_MISPLACED_XML_PI;
     case XML_TOK_DATA_NEWLINE:
       if (characterDataHandler) {
-        XML_Char c = 0xA;
+        XML_Char c = s[0];
         characterDataHandler(handlerArg, &c, 1);
       }
       else if (defaultHandler)
Index: xmltok_impl.c
===================================================================
--- xmltok_impl.c (.../trunk/external/poco-1.1.2/XML/src) (revision 10293)
+++ xmltok_impl.c (.../branches/tt6523/external/poco-1.1.2/XML/src) (revision 10293)
@@ -802,8 +802,6 @@
     ptr += MINBPC(enc);
     if (ptr == end)
       return XML_TOK_TRAILING_CR;
-    if (BYTE_TYPE(enc, ptr) == BT_LF)
-      ptr += MINBPC(enc);
     *nextTokPtr = ptr;
     return XML_TOK_DATA_NEWLINE;
   case BT_LF:
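As a rough sketch of what the patched code delivers, here is a toy model in plain C++. It is not the expat code: tokenizePatched and the CharacterDataHandler alias are made-up names, it handles single-byte characters only, and it assumes that the BT_LF case also returns XML_TOK_DATA_NEWLINE for a lone LF.

```cpp
#include <functional>
#include <string>

// Hypothetical stand-in for a character data callback: (data, length).
using CharacterDataHandler = std::function<void(const char*, int)>;

// Toy model of the patched behaviour (not expat itself). Each CR and each
// LF becomes its own one-byte newline token, and the handler is given the
// actual byte (s[0]) instead of a hard-coded 0xA.
void tokenizePatched(const std::string& input, const CharacterDataHandler& handler)
{
    const char* ptr = input.data();
    const char* end = ptr + input.size();
    while (ptr != end)
    {
        if (*ptr == '\r' || *ptr == '\n')
        {
            const char c = *ptr;          // corresponds to XML_Char c = s[0];
            handler(&c, 1);
            ++ptr;
        }
        else
        {
            // Ordinary data: pass the whole run up to the next CR or LF.
            const char* start = ptr;
            while (ptr != end && *ptr != '\r' && *ptr != '\n')
                ++ptr;
            handler(start, static_cast<int>(ptr - start));
        }
    }
}
```

One consequence worth noting: a CRLF pair now reaches the handler as two separate one-character calls (first the CR, then the LF), so any handler code that assumed one call per line break would need to cope with that.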