Strange sequences of characters beginning ï will be instantly familiar to some readers as multi-byte utf-8 sequences misinterpreted as iso8859-1 (also known as latin-1) or as a similar encoding such as windows-1252. The three characters ï¿½ correspond to the bytes EF BF BD (in hex), which is the utf-8 encoding of the Unicode character U+FFFD REPLACEMENT CHARACTER.
When decoding sequences of bytes into Unicode characters, a program may encounter a group of bytes that is invalid (i.e. it does not correspond to a Unicode character). In such cases, the program has three choices: it can stop decoding and raise an error, silently skip over the invalid group of bytes, or translate the group of bytes into a special marker. The last of these schemes allows most of the text to be read, but still leaves an indication that something went wrong. The replacement character is this special marker. Thus, the most probable cause of the errors is that the book’s text was read by a program which: erroneously treated it as utf-8, converted each non-ASCII character to U+FFFD, and wrote it out again as utf-8. When subsequently interpreted with the original encoding, the corrupted characters appeared as ï¿½.
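The three decoding choices, and the round trip that probably mangled the book, can be reproduced in a few lines of Python. This is only a sketch under stated assumptions: the sample phrase is made up, the original encoding is assumed to be iso8859-1, and nothing here is known about the software that actually processed the book. Python's `errors` argument to `bytes.decode` happens to offer exactly the three behaviours described: `strict`, `ignore` and `replace`.

```python
# Hypothetical sample text; the book's real contents are not reproduced here.
original = "café au lait"

# Assumption: the publisher's file was stored as iso8859-1.
raw = original.encode("iso8859-1")  # the é becomes the single byte E9

# Choice 1: stop decoding and raise an error.
try:
    raw.decode("utf-8", errors="strict")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# Choice 2: silently skip over the invalid group of bytes.
skipped = raw.decode("utf-8", errors="ignore")

# Choice 3: translate the invalid group into U+FFFD.
replaced = raw.decode("utf-8", errors="replace")

# The suspected pipeline: write the damaged text out again as utf-8 ...
rewritten = replaced.encode("utf-8")  # U+FFFD becomes the bytes EF BF BD

# ... and later interpret those bytes with the original encoding.
seen_by_reader = rewritten.decode("iso8859-1")

print(strict_failed)
print(repr(skipped))
print(repr(replaced))
print(rewritten.hex(" "))
print(seen_by_reader)
```

Note that once the text has been through the `replace` step the original character is gone for good: every non-ASCII character collapses to the same three bytes, so no later re-decoding can recover it.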
My first task after unpacking the Kindle was to download something to read. Its screen is a joy, and the integration with the Kindle Store is as smooth as silk: voracious readers will be penniless in days.