MOO-cows Mailing List Archive


Re: Support for alt. language

Kondratiev Dima writes:
> What I am looking for is a 'clean' way to support alternative character sets
> in strings, Cyrillic for example.
> I need to support an alternative language in string properties such as
> object names and descriptions.
> As I understand the problem now (please correct me if I am wrong), the
> parser should pass extended ASCII chars in strings. Do I need to hack the
> parser code to achieve this?

Actually, it has nothing to do with the parser; it has to do with the network
module's input handler.  Right now, for non-binary connections, it discards any
input character that isn't an ASCII printing character, space, or tab.  You
could change that test to allow other characters and things would work fine.
For example, you might use
	if (((c & 0x7F) >= 0x20  &&  (c & 0x7F) != 0x7F)  ||  c == '\t') {...}
if you wanted to allow the eight-bit non-control characters, such as those in
the Latin-1 character set, while still accepting tab.  (This mask also rejects
0xFF, which is the printing character ÿ in Latin-1.)  Of course, the
interpretation of those bytes as characters is partially fixed by the MOO
language, which believes that it knows the identity of most codes
corresponding to printing ASCII characters.  Also, your client program is
going to need to understand the `proper' interpretation of the codes.

However, if your intended character set only *adds* to ASCII, presumably by
giving meaning to the codes in the upper half of the one-byte space, then you
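should be in good shape.  NOTE: you *must not* allow input of the character
codes 0x00 (the null character) or 0x0A (the line-feed/newline character); if
either of those appears in a MOO string, you will definitely lose.  The server
keeps strings as null-terminated C strings, so a null silently truncates them,
and a newline inside a string breaks the server's line-oriented handling,
including the way strings are written to the database file.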
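A minimal check for the two forbidden codes might look like this.  moo_string_ok() is a hypothetical name, and the sketch assumes the candidate text is available as a byte buffer with an explicit length, since a buffer that may contain 0x00 cannot be measured reliably with strlen().

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical check: returns 1 if the LEN bytes at BUF contain neither
 * 0x00 (null) nor 0x0A (newline), the two codes that must never end up in
 * a MOO string, and 0 otherwise.  memchr() scans exactly LEN bytes, so it
 * is not fooled by an embedded null the way strlen() would be. */
int moo_string_ok(const unsigned char *buf, size_t len)
{
    return memchr(buf, 0x00, len) == NULL
        && memchr(buf, 0x0A, len) == NULL;
}
```

The null case is the easy one to see: strlen() on the bytes 'a', 'b', 0x00, 'c', 'd' reports a length of 2, so anything downstream that treats the value as a C string never sees "cd".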

Now, if you're trying to handle multi-byte character sets, like Japanese, then
you have even more problems, such as discovering the true length of a string
in characters rather than bytes, etc.
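To give a feel for the length problem, here is a sketch that counts characters in a UTF-8 string.  UTF-8 is an assumption made for illustration only; Japanese text is often encoded as Shift_JIS or EUC-JP instead, and those need different rules.  The function name utf8_char_count() is hypothetical, and the sketch assumes the input is well-formed.

```c
#include <stddef.h>

/* Hypothetical helper: counts characters, not bytes, in the LEN bytes at S.
 * Assumes well-formed UTF-8, where every character has exactly one lead
 * byte and all continuation bytes have the bit pattern 10xxxxxx, so
 * counting the non-continuation bytes counts the characters. */
size_t utf8_char_count(const unsigned char *s, size_t len)
{
    size_t count = 0;

    for (size_t i = 0; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)  /* not a continuation byte */
            count++;
    }
    return count;
}
```

For example, the six bytes E6 97 A5 E6 9C AC encode the two characters 日本, so a byte count reports 6 while this function returns 2.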


