Why Do Certain Characters Display Oddly?

Recently, a client came to us wondering about an "odd character" found in their article. They sent it over and we recognized the problem immediately. The hard part was explaining it concisely, so here is that "concise" reply:

"The character that we noticed was definitely "different" from the other characters, and it appeared in multiple places for some of the same letters. This immediately led us to believe that, since your file came in as a .txt file and was saved by our writers, the text encoding (the UTF form) changed along the way. This has no effect on the reading of the article, and 99.9% of the time, when placed inside your CMS (WordPress, Joomla, Drupal, Wix, etc.) or blog, it will correct itself and display properly in the same encoding as the other characters. This may have happened because most of our clients use a Mac or PC and most of our writers use Linux as their operating system. Without getting too complex here, we are going to publish an article in our Help Center explaining everything in full detail.

Long story short, these "odd characters" will affect nothing about the article.

We hope this helps you out. Here is the link to the article that we have written, fully detailing why this occurred."


Pretty long-winded, wasn't it? Well, it's a small detail, and with small details usually come large explanations (small in coding never means simple). What makes this even harder is that text saved in one encoding, such as UTF-8 (the most common encoding of Unicode), can be opened by the end user's software as if it were saved in a different one, and that mismatch is what shows up as those "odd characters". What is the difference between UTF-8 and Unicode? Find Out Here.
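For the curious, here is a minimal Python sketch of how that mismatch can arise. The sample text is made up, and Windows-1252 is only our assumption about what the reading program might guess; we are not claiming it is what happened to any particular file.

```python
# Hypothetical sample text containing a curly apostrophe and an accented letter.
original = "it’s café"

# The writer's editor saves the text as UTF-8 bytes.
saved_bytes = original.encode("utf-8")

# A program that assumes Windows-1252 (an assumption for this sketch)
# reads the very same bytes and maps them to different characters.
misread = saved_bytes.decode("cp1252")

# Reading the bytes with the encoding they were saved in gives the text back.
read_correctly = saved_bytes.decode("utf-8")

print(misread)         # itâ€™s cafÃ©
print(read_correctly)  # it’s café
```

Nothing in the file changed between the two reads. Only the guess about the encoding did.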

In the end, all you need to know is that the same bytes are being read with a different lookup table, so a different character is displayed in place of the one that was typed. Once again, this does not affect the outcome of the article: its readability, its ability to pass Copyscape, its SEO optimization, or how its characters display once published inside your CMS.
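If you ever want to check for yourself that the underlying text is intact, the misreading can usually be reversed. The sketch below assumes the text was saved as UTF-8 and wrongly read as Windows-1252. The garbled string is a made-up example, and other encoding pairs would need different names in the two calls.

```python
# Hypothetical garbled text, as it might look after a wrong encoding guess.
garbled = "itâ€™s cafÃ©"

# Step 1: turn the wrongly displayed characters back into the original bytes.
raw_bytes = garbled.encode("cp1252")

# Step 2: read those bytes with the encoding they were actually saved in.
repaired = raw_bytes.decode("utf-8")

print(repaired)  # it’s café
```

This only works when the wrong guess kept every byte, which holds for the characters in this example but not for every possible one.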

The following table summarizes some of the properties of each of the UTFs and shows why the same character can appear differently: each form allows a different number of bytes per character.

Property                    UTF-8          UTF-16         UTF-16BE       UTF-16LE       UTF-32         UTF-32BE       UTF-32LE
Smallest code point         0000           0000           0000           0000           0000           0000           0000
Largest code point          10FFFF         10FFFF         10FFFF         10FFFF         10FFFF         10FFFF         10FFFF
Code unit size              8 bits         16 bits        16 bits        16 bits        32 bits        32 bits        32 bits
Byte order                  N/A            <BOM>          big-endian     little-endian  <BOM>          big-endian     little-endian
Fewest bytes per character  1              2              2              2              4              4              4
Most bytes per character    4              4              4              4              4              4              4
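
To see the "fewest" and "most" bytes per character in practice, here is a small Python sketch that counts the bytes a single character takes in three of the forms above. The helper name byte_counts and the sample characters are our own picks for illustration. We use the little-endian variants so that no byte order mark is added to the counts.

```python
# Encodings to compare; the "-le" variants do not prepend a byte order mark.
ENCODINGS = ("utf-8", "utf-16-le", "utf-32-le")

def byte_counts(ch):
    """Return how many bytes one character needs in each encoding."""
    return {enc: len(ch.encode(enc)) for enc in ENCODINGS}

# Sample characters: plain letter, accented letter, euro sign, emoji.
for ch in ("A", "é", "€", "😀"):
    print(ch, byte_counts(ch))

# A {'utf-8': 1, 'utf-16-le': 2, 'utf-32-le': 4}
# é {'utf-8': 2, 'utf-16-le': 2, 'utf-32-le': 4}
# € {'utf-8': 3, 'utf-16-le': 2, 'utf-32-le': 4}
# 😀 {'utf-8': 4, 'utf-16-le': 4, 'utf-32-le': 4}
```

A program that expects one byte per character will show two, three, or four symbols where the writer typed only one, which is exactly the kind of "odd character" described above.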