[xmlsec] Re: non us-ascii filenames in user locale
Roumen Petrov
xmlsec at roumenpetrov.info
Fri Jun 25 02:46:26 PDT 2004
Aleksey Sanin wrote:
> [SNIP]
>
>> Before calling xmlSecTransformCtxUriExecute(...), when the encoding is
>> not NULL (is that possible?) or UTF-8, we can convert ctx->url from
>> UTF-8 to the "document encoding", temporarily replace ctx->url with the
>> new string, and then call xmlSecTransformCtxUriExecute.
>
> It's a guess. Who said that the document filename is in the document
> locale???
A.) From the libxml "Encodings support" page
(http://www.xmlsoft.org/encoding.html) :
....
for examples when adding a text node to a document, the content would
have to be provided in the document encoding
....
B.) From rfc2396 (http://www.ietf.org/rfc/rfc2396.txt):
....
However, there is currently
no provision within the generic URI syntax to accomplish this
identification. An individual URI scheme may require a single
charset, define a default charset, or provide a way to indicate the
charset used.
It is expected that a systematic treatment of character encoding
within URI will be developed as a future modification of this
specification."
....
C.) From "XML-Signature Syntax and Processing "
(http://www.w3.org/TR/xmldsig-core/)
....
4.3.3.1 The URI Attribute ..."
The URI attribute identifies a data object using a URI-Reference, as
specified by RFC2396 [URI]. The set of allowed characters for URI
attributes is the same as for XML, namely [Unicode]. However, some
Unicode characters are disallowed from URI references including all
non-ASCII characters and the excluded characters listed in RFC2396 [URI,
section 2.4]. However, the number sign (#), percent sign (%), and square
bracket characters re-allowed in RFC 2732 [URI-Literal] are permitted.
Disallowed characters must be escaped as follows:
Each disallowed character is converted to [UTF-8] as one or more octets.
Any octets corresponding to a disallowed character are escaped with the
URI escaping mechanism (that is, converted to %HH, where HH is the
hexadecimal notation of the octet value).
The original character is replaced by the resulting character sequence.
....
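The escaping rule quoted in C. can be sketched in a few lines of C. This is only an illustration: escape_non_ascii_utf8 is a hypothetical helper, not an xmlsec or libxml function, and it assumes the input is already UTF-8. It escapes non-ASCII octets, spaces and control characters only; the other excluded characters from RFC2396 section 2.4 are left out for brevity.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of xmlsec): percent-escape every octet of a
 * UTF-8 string that is non-ASCII, a space, or a control character.
 * Returns a malloc'ed string (caller frees) or NULL on allocation failure. */
char *escape_non_ascii_utf8(const char *utf8)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t len, i, j = 0;
    char *out;

    assert(utf8 != NULL);
    len = strlen(utf8);
    out = malloc(3 * len + 1);      /* worst case: every octet becomes %HH */
    if (out == NULL)
        return NULL;

    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char)utf8[i];
        if (c >= 0x80 || c <= 0x20 || c == 0x7F) {
            out[j++] = '%';
            out[j++] = hex[c >> 4];     /* high nibble */
            out[j++] = hex[c & 0x0F];   /* low nibble */
        } else {
            out[j++] = (char)c;         /* ASCII octets are copied unchanged */
        }
    }
    out[j] = '\0';
    return out;
}
```

For example, a file name whose first character is the Cyrillic letter U+0444 (octets D1 84 in UTF-8) followed by ".xml" would come out as "%D1%84.xml".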
From A. I expect the URI in the Reference node to be in the document encoding.
From B. I see that we are free to use any charset in a URI.
C. defines that we should use UTF-8 encoding.
If the document encoding is not acceptable as the default charset for
"Reference URIs", perhaps we should provide a way in xmlsec "to indicate
the charset used"?
For me the solution is clear. I will create the xmldsig document in the
same encoding as the user locale charmap; the filename (URI) will be
converted from the locale charmap to UTF-8 and escaped. Later I will
unescape the UTF-8 URI and convert it back to the charset given by the
xmldsig document encoding.
When I want to use a UTF-8 URI, I will create the xmldsig document in
UTF-8 encoding.
When I want to use a URI in ISO-8859-1 or CP1251, I will create the
xmldsig document in the corresponding encoding.
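A rough sketch of the conversion steps, using iconv(3) for the charset work. The helper names (convert_charset, unescape_uri, uri_to_document_encoding) are hypothetical and are not xmlsec API. The sketch assumes stateless encodings such as ISO-8859-1, CP1251 or UTF-8, and that the iconv implementation on the system knows the charset names passed in.

```c
#include <assert.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: convert a NUL-terminated string between charsets.
 * Returns a malloc'ed string (caller frees) or NULL on error. */
char *convert_charset(const char *in, const char *from, const char *to)
{
    iconv_t cd;
    size_t inleft, outsize, outleft;
    char *inptr, *out, *outptr;

    assert(in != NULL && from != NULL && to != NULL);
    cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return NULL;                /* charset pair not supported */

    inleft = strlen(in);
    outsize = 4 * inleft + 1;       /* assumption: <= 4 output bytes per input byte */
    out = malloc(outsize);
    if (out == NULL) {
        iconv_close(cd);
        return NULL;
    }
    inptr = (char *)in;             /* iconv() reads but does not modify the input */
    outptr = out;
    outleft = outsize - 1;          /* keep one byte for the terminator */
    if (iconv(cd, &inptr, &inleft, &outptr, &outleft) == (size_t)-1) {
        free(out);
        iconv_close(cd);
        return NULL;                /* invalid or unconvertible input */
    }
    *outptr = '\0';
    iconv_close(cd);
    return out;
}

/* Value of one hexadecimal digit, or -1 if c is not a hex digit. */
static int hex_value(unsigned char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Hypothetical helper: undo %HH escaping; malformed escapes are copied as-is. */
char *unescape_uri(const char *uri)
{
    size_t len, i, j = 0;
    char *out;

    assert(uri != NULL);
    len = strlen(uri);
    out = malloc(len + 1);          /* unescaping never makes the string longer */
    if (out == NULL)
        return NULL;
    for (i = 0; i < len; i++) {
        if (uri[i] == '%' && i + 2 < len) {
            int hi = hex_value((unsigned char)uri[i + 1]);
            int lo = hex_value((unsigned char)uri[i + 2]);
            if (hi >= 0 && lo >= 0) {
                out[j++] = (char)(hi * 16 + lo);
                i += 2;             /* skip the two hex digits */
                continue;
            }
        }
        out[j++] = uri[i];
    }
    out[j] = '\0';
    return out;
}

/* The way back: escaped UTF-8 URI -> file name in the document encoding. */
char *uri_to_document_encoding(const char *escaped_uri, const char *doc_encoding)
{
    char *utf8 = unescape_uri(escaped_uri);
    char *result;

    if (utf8 == NULL)
        return NULL;
    result = convert_charset(utf8, "UTF-8", doc_encoding);
    free(utf8);
    return result;
}
```

The forward direction would be convert_charset(filename, locale_charmap, "UTF-8") followed by escaping; the locale charmap itself could be obtained with nl_langinfo(CODESET) after setlocale(LC_ALL, "").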
Regards,
Roumen Petrov