Incomplete URL encoding


Postby jbandela » 30 May 2012, 01:49

When using HTMLForm to generate a POST request, the URL encoding does not encode ":" (colon). This is easy to fix by adding ":" to the list of reserved characters in writeUrl. Could this be fixed for the 1.5 release?


void HTMLForm::writeUrl(std::ostream& ostr)
{
   for (NameValueCollection::ConstIterator it = begin(); it != end(); ++it)
   {
      if (it != begin()) ostr << "&";
      std::string name;
      URI::encode(it->first, "=&+;:", name);
      std::string value;
      URI::encode(it->second, "=&+;:", value);
      ostr << name << "=" << value;
   }
}

Re: Incomplete URL encoding

Postby alex » 30 May 2012, 22:50

According to RFC 3986, there's more than just ':' missing:

Code:
reserved    = gen-delims / sub-delims

gen-delims  = ":" / "/" / "?" / "#" / "[" / "]" / "@"

sub-delims  = "!" / "$" / "&" / "'" / "(" / ")"
                  / "*" / "+" / "," / ";" / "="


Or am I missing something?
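
As a rough illustration of the grammar above, the two delimiter groups can be written out as plain strings. This is only a sketch: GEN_DELIMS, SUB_DELIMS and isReserved are hypothetical names chosen for this example and are not part of POCO.

```cpp
#include <string>

// gen-delims and sub-delims exactly as listed in RFC 3986, section 2.2.
const std::string GEN_DELIMS = ":/?#[]@";
const std::string SUB_DELIMS = "!$&'()*+,;=";

// Hypothetical helper (not a POCO function): returns true if c belongs
// to the RFC 3986 reserved set (gen-delims or sub-delims).
bool isReserved(char c)
{
   return GEN_DELIMS.find(c) != std::string::npos
       || SUB_DELIMS.find(c) != std::string::npos;
}
```

Assuming URI::encode keeps its three-argument form, GEN_DELIMS + SUB_DELIMS could presumably be passed as the reserved argument in writeUrl instead of "=&+;:".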

Re: Incomplete URL encoding

Postby jbandela » 01 Jun 2012, 18:37

I agree. After looking at it some more, and reading through the URI documentation and the documentation for the Amazon and Twitter REST APIs, I ended up changing the URI::encode function to encode everything except unreserved characters. This required only minimal changes to the function: I commented out the condition of the else if and the final else.

Code:
void URI::encode(const std::string& str, const std::string& reserved, std::string& encodedStr)
{
   for (std::string::const_iterator it = str.begin(); it != str.end(); ++it)
   {
      char c = *it;
      if ((c >= 'a' && c <= 'z') ||
          (c >= 'A' && c <= 'Z') ||
          (c >= '0' && c <= '9') ||
          c == '-' || c == '_' ||
          c == '.' || c == '~')
      {
         encodedStr += c;
      }
      else //if (c <= 0x20 || c >= 0x7F || ILLEGAL.find(c) != std::string::npos || reserved.find(c) != std::string::npos)
      {
         encodedStr += '%';
         encodedStr += NumberFormatter::formatHex((unsigned) (unsigned char) c, 2);
      }
      //else encodedStr += c;
   }
}



Do you know of any downsides to encoding this way? The upside is that it would be easier to work with REST APIs that require signatures.

Re: Incomplete URL encoding

Postby alex » 03 Jun 2012, 22:00

jbandela wrote: Do you know of any downsides to encoding this way?

I don't. But I'd still go strictly with the RFC.

Re: Incomplete URL encoding

Postby jbandela » 04 Jun 2012, 17:18

I looked into it some more.

RFC 3986, section 2.5, includes:
When a new URI scheme defines a component that represents textual
data consisting of characters from the Universal Character Set [UCS],
the data should first be encoded as octets according to the UTF-8
character encoding [STD63]; then only those octets that do not
correspond to characters in the unreserved set should be percent-
encoded. For example, the character A would be represented as "A",
the character LATIN CAPITAL LETTER A WITH GRAVE would be represented
as "%C3%80", and the character KATAKANA LETTER A would be represented
as "%E3%82%A2".

This implies that encoding only the reserved characters is not enough: every octet outside the unreserved set should be percent-encoded.
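
To make the RFC examples concrete, here is a minimal standalone sketch of that rule. It assumes the input is already UTF-8; encodeUnreservedOnly is a hypothetical name for this example and is not POCO's URI::encode.

```cpp
#include <string>

// Hypothetical helper (not POCO's URI::encode): percent-encodes every octet
// that is not in the RFC 3986 unreserved set (ALPHA / DIGIT / "-" / "." / "_" / "~").
// The input is assumed to be UTF-8 encoded already.
std::string encodeUnreservedOnly(const std::string& utf8)
{
   static const char HEX[] = "0123456789ABCDEF";
   std::string out;
   for (std::string::const_iterator it = utf8.begin(); it != utf8.end(); ++it)
   {
      unsigned char c = static_cast<unsigned char>(*it);
      bool unreserved = (c >= 'a' && c <= 'z') ||
                        (c >= 'A' && c <= 'Z') ||
                        (c >= '0' && c <= '9') ||
                        c == '-' || c == '_' ||
                        c == '.' || c == '~';
      if (unreserved)
      {
         out += static_cast<char>(c);
      }
      else
      {
         // Emit '%' followed by the two uppercase hex digits of the octet.
         out += '%';
         out += HEX[c >> 4];
         out += HEX[c & 0x0F];
      }
   }
   return out;
}
```

With this sketch, "A" stays "A", the UTF-8 octets "\xC3\x80" (LATIN CAPITAL LETTER A WITH GRAVE) become "%C3%80", and "\xE3\x82\xA2" (KATAKANA LETTER A) becomes "%E3%82%A2", matching the RFC text. A colon is encoded as "%3A".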

In addition, the Wikipedia article on percent-encoding seems to support this as well; see section 1.6:
http://en.wikipedia.org/wiki/Percent-en ... t_standard


In addition, the following implementations percent-encode everything that is not unreserved (the unreserved set being the alphanumerics and a few punctuation marks):

libcurl - http://curl.haxx.se/libcurl/c/curl_escape.html

PHP - http://www.php.net/manual/en/function.rawurlencode.php

Perl - http://search.cpan.org/dist/URI/URI/Escape.pm

Java - http://docs.oracle.com/javase/7/docs/ap ... coder.html

Also, as mentioned above, Amazon and Twitter require this type of encoding, as does OAuth in RFC 5849:
http://tools.ietf.org/html/rfc5849#page-29

Finally, the implementation is simpler and will handle UTF-8 encoded strings without modification.

Based on the above, I would recommend that POCO encode everything other than the unreserved characters.

Re: Incomplete URL encoding

Postby alex » 04 Jun 2012, 17:39

jbandela wrote: Based on above, I would recommend that POCO encode everything other than the unreserved characters

OK, can you file a patch on SourceForge?

Re: Incomplete URL encoding

Postby jbandela » 13 Jun 2012, 00:46

Patch submitted to SourceForge.

