International characters, Unicode, encodings

A lot can be said about this, but Snakelets tries to support them as much as possible, and seems to succeed in this.

URLs

It is possible to put non-ASCII characters in the URL that your browser submits to the web server. The w3c is vague on this subject. It is not specified how these URLs are encoded when sent to the web server. Internet Explorer and Mozilla-like browsers support a global setting that lets you specify to encode all URLs in UTF-8, which is convenient. If you don't activate this setting, it depends on the system how the URLs are encoded.

If you expect non-ASCII characters in URLs, which can happen if you're using a FORM that is submitted using the GET method, you have to set the expected encoding on the Request object before you extract parameters from the form. That's why the above mentioned browser setting is handy; enable it and you can set request.setEncoding("UTF-8") before getting the parameters from the request.

page source code (html, ypage, snakelet)

For static HTML pages, follow the standard HTML rules concerning HTML source that contains international characters. For snakelets, follow the standard Python rules concerning non-ASCII characters in Python source code (source file encoding tag, etc) For Ypages, use the @inputencoding declaration to set the character encoding that the Ypage file is stored in. This is the Ypage analogy of Python's source file encoding tag. Note: the Ypage can not be stored in a multi-byte character encoding like UTF-16 that uses more than a single byte per normal character! Remember that you have to use unicode string literals for non-ascii strings. Also you probably need to set a page output encoding to be able to show the non-ascii symbols without getting encoding exceptions!

page output

You can use defaultOutputEncoding in the webapp's config to set a global output encoding. You can also do this manually in every page that needs it:
For snakelets: use response.setEncoding(...) to set the desired character for the response data. After that, just write unicode strings to the output. For Ypages: use the @outputencoding declaration. To make it easier to include pages with or without outputencodings in eachother, the following algorithm is used by the Ypage parser, if pages are included in each other:

if parent page has no output encoding: set parent's output encoding to included page's encoding
if parent already has output encoding: do nothing (keep parent's encoding)

If a page redirects to another page, the actual page output encoding is determined as follows:

if current page already has output encoding set, use this;
otherwise: use the output encoding from the page that is being redirected to;
if that's also not specified: use the default output encoding.

submitted FORM data

If you expect non-ASCII characters in your FORM data, you have to set the encoding on the request object before doing anything else with it: request.setEncoding(...). (You can also set a defaultRequestEncoding in the webapp init file). Also: you have to specify the corresponding encoding on the FORM tag in your page source too: use accept-charset="..." on the FORM tag. For Internet Explorer it may also be necessary to send the page containing the form in the same encoding as the accept-charset on the FORM, to make it completely correct.

Snakelets manual - Back to index