Fidonet Portal

From: Maurice Kinal (1:261/38.9)
To: All
Date: Thu, 31.05.12 09:50
just checking
Hey Dale!

This might be a more revealing reply. Below is a hexdump of the bogo-Norwegian
utf-8 string:

----- insert of single character hexdump starts

00000000 41 20 4d c3 b8 c3 b8 73 65 20 6f 6e 63 65 20 62 |A M....se once b|
00000010 69 74 20 6d 79 20 73 69 73 74 65 72 20 2e 2e 2e |it my sister ...|

----- insert of single character hexdump ends

Note that on the right hand side (the character column) the two multibyte
characters between the 'M' and the 's' show up as four '.' characters, because
hexdump prints a '.' for every byte that isn't printable ascii and each of these
characters takes two bytes. In reality each one is U+00F8 in Unicode speak (code
point 0xF8, decimal 248), which utf-8 encodes as the two bytes c3 b8. Counting
the byte offsets, they correspond to the "c3 b8 c3 b8" part of the hexdump. If
your editor sees something similar then it is handling them correctly, even if
it can't display the actual characters, which are "LATIN SMALL LETTER O WITH
STROKE" as named on the site I use as a reference for all things utf-8.
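The two-byte utf-8 rule behind "c3 b8" can be worked out by hand: a code point
from U+0080 to U+07FF has 11 significant bits, the top 5 go after a 110 prefix
and the low 6 after a 10 prefix. Here is a minimal Python sketch of just that
rule (the function name utf8_two_byte is made up for illustration), checked
against Python's own encoder:

```python
# Hand-encode a code point in the two-byte utf-8 range (U+0080..U+07FF),
# which covers both 'ø' and the Cyrillic letters in the tagline.

def utf8_two_byte(code_point: int) -> bytes:
    """Encode a code point in U+0080..U+07FF as a two-byte utf-8 sequence."""
    if not 0x80 <= code_point <= 0x7FF:
        raise ValueError("this code point does not use a two-byte sequence")
    lead = 0xC0 | (code_point >> 6)     # 110xxxxx: top 5 of the 11 bits
    trail = 0x80 | (code_point & 0x3F)  # 10xxxxxx: low 6 bits
    return bytes([lead, trail])

o_stroke = ord("ø")  # LATIN SMALL LETTER O WITH STROKE
print(f"U+{o_stroke:04X}")                   # U+00F8
print(utf8_two_byte(o_stroke).hex(" "))      # c3 b8
print(utf8_two_byte(ord("Э")).hex(" "))      # d0 ad
assert utf8_two_byte(o_stroke) == "ø".encode("utf-8")
```

The same arithmetic on 'Э' (U+042D) gives d0 ad, which is how the Russian dump
further down starts.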

Just for fun, here is the hexdump of the Russian characters in the tagline:

----- insert of single character hexdump starts

00000000 d0 ad d1 82 d0 be 20 d0 b2 d1 81 d1 91 20 d0 b4 |...... ...... ..|
00000010 d0 bb d1 8f 20 d0 bc d0 b5 d0 bd d1 8f 20 d0 b3 |.... ........ ..|
00000020 d1 80 d0 b5 d1 87 d0 b5 d1 81 d0 ba d0 b8 d0 b9 |................|
00000030 20 d1 8f d0 b7 d1 8b d0 ba 2e | .........|

----- insert of single character hexdump ends

Other than the spaces and the period at the end (plain ascii), they are all
multibyte utf-8: each Cyrillic letter takes two bytes, starting with d0 or d1.
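For anyone without hexdump handy, dumps like the two above can be approximated
in a few lines of Python. This is a rough sketch that only mimics hexdump -C: it
uses the single-space layout of the pastes here rather than hexdump's exact
column spacing, and prints '.' for any byte outside printable ascii
(hexdump_lines is a made-up helper name).

```python
# Rough approximation of `hexdump -C`: offset, up to 16 hex bytes, then a
# character column where every byte outside printable ASCII shows as '.'.
# Spacing is simplified to single spaces, like the pastes in this message.

def hexdump_lines(data: bytes, width: int = 16) -> list[str]:
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Pad a short final line so the character column stays aligned.
        hex_part = hex_part.ljust(width * 3 - 1)
        text = "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in chunk)
        lines.append(f"{offset:08x} {hex_part} |{text}|")
    return lines

tagline = "Это всё для меня греческий язык."
data = tagline.encode("utf-8")
for line in hexdump_lines(data):
    print(line)

# Each Cyrillic letter is two bytes; the spaces and period are one byte each.
print(len(tagline), "characters,", len(data), "bytes")  # 32 characters, 58 bytes
```

Run on "A Møøse once bit my sister ..." the first line's character column comes
out as |A M....se once b|, with one '.' per byte of the two ø characters.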

Life is good,

... Это всё для меня греческий язык. (It's all Greek to me.)
--- GNU bash, version 4.2.28(2)-release (x86_64-core2-linux-gnu)
* Origin: Pointy Stick Society (1:261/38.9)


