Today we are going to discuss a topic that sounds fairly technical but, once it is fleshed out, is actually reasonably easy to understand. That topic is charset=UTF-8 vs. ISO-8859-1 and search engine optimization.
What is the Document Character Set?
According to w3.org, “To promote interoperability, SGML requires that each application (including HTML) specify its document character set. A document character set consists of:
- A Repertoire: A set of abstract characters, such as the Latin letter “A”, the Cyrillic letter “I”, the Chinese character meaning “water”, etc.
- Code positions: A set of integer references to characters in the repertoire.”
So, as we can see, what we are really discussing here is the character encoding we use for our web pages, in other words, how their characters are stored as bytes. (Strictly speaking, the document character set for HTML is always Unicode (ISO 10646); the charset value we declare names the encoding.) Whichever encoding we save our pages in, we must tell the browser which one we used so that it can display the text correctly. Note: in this post we will not discuss in detail how to specify the character encoding. If you would like to learn about character set specification, you can do that here.
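To make the W3C definition a little more concrete, here is a minimal Python sketch (Python is just our choice for illustration). It assumes Unicode as the document character set, as HTML uses, so each character's "code position" is simply its Unicode code point. The `repertoire` dictionary and the `code_position` helper are hypothetical names made up for this example.

```python
# The repertoire: the abstract characters from the W3C examples.
# Assumption: the document character set is Unicode, so a character's
# code position is its Unicode code point.
repertoire = {
    "Latin letter A": "A",
    "Cyrillic letter I": "\u0418",   # И
    "Chinese 'water'": "\u6c34",     # 水
}

def code_position(char: str) -> int:
    """Return the integer code position (Unicode code point) of one character."""
    return ord(char)

for name, char in repertoire.items():
    cp = code_position(char)
    print(f"{name}: {char} -> {cp} (U+{cp:04X})")
```

Running it should report 65 for "A", 1048 for the Cyrillic letter and 27700 for the Chinese character meaning "water": each abstract character is paired with one integer.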
Now that we have fleshed that out, let’s take a look at charset=UTF-8 vs. ISO-8859-1 and SEO.
What is charset=UTF-8?
According to UTF8.com, “UTF-8 stands for Unicode Transformation Format-8. It is an octet (8-bit) lossless encoding of Unicode characters.”
OK, the quote above sounds complicated, so please allow me to simplify it. UTF-8 is a character encoding: a way of storing text as bytes that can represent every Unicode character, using between one and four bytes per character. By saving our web pages in UTF-8 and telling web browsers that they are encoded this way, we can be sure that our pages will display correctly.
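To show what "8-bit lossless encoding" means in practice, here is a minimal Python sketch that encodes a few sample characters (chosen only for illustration) as UTF-8 and prints how many bytes each one needs. UTF-8 uses a variable number of bytes per character, and decoding the bytes always gives back the original text. `utf8_bytes` is a hypothetical helper name.

```python
# Sample characters (illustrative): A, e-acute, Cyrillic I, Chinese "water", an emoji.
samples = ["A", "\u00e9", "\u0418", "\u6c34", "\U0001F600"]

def utf8_bytes(char: str) -> bytes:
    """Encode a single character as UTF-8 (hypothetical helper)."""
    return char.encode("utf-8")

for ch in samples:
    encoded = utf8_bytes(ch)
    # Show the code point, the byte count, and the bytes in hex.
    print(f"U+{ord(ch):04X}: {len(encoded)} byte(s) -> {encoded.hex(' ')}")

# "Lossless": decoding the UTF-8 bytes returns exactly the original text.
text = "".join(samples)
assert text.encode("utf-8").decode("utf-8") == text
```

Running it should show byte counts of 1, 2, 2, 3 and 4 for the five samples: plain English letters stay at one byte, which is part of why UTF-8 is so convenient for English-language sites.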
So how do we tell a browser that our web page uses this encoding? It is easy! In an HTML document, we place a meta tag in the head section of the page; in an XML document, we declare the encoding in the XML declaration at the very start of the file. The code looks like this:
UTF-8 Code for HTML
<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
UTF-8 Code for XML
<?xml version="1.0" encoding="UTF-8" ?>
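Declaring UTF-8 only helps if the page is actually saved as UTF-8. As a hedged illustration, the Python sketch below builds a tiny HTML page containing the meta tag for a chosen charset and encodes it with that same charset, so the declaration and the bytes always agree. `render_page` is a hypothetical helper, not part of any library or editor.

```python
from html import escape

def render_page(title: str, body: str, charset: str = "UTF-8") -> bytes:
    """Build a tiny HTML page and encode it with the charset it declares.

    Hypothetical helper for illustration only.
    """
    markup = (
        "<!DOCTYPE html>\n"
        "<html>\n<head>\n"
        f'<meta http-equiv="Content-Type" content="text/html;charset={charset}">\n'
        f"<title>{escape(title)}</title>\n"
        "</head>\n<body>\n"
        f"<p>{escape(body)}</p>\n"
        "</body>\n</html>\n"
    )
    # Encode with the SAME charset named in the meta tag; if the two
    # disagree, browsers may show garbled characters.
    return markup.encode(charset)

page = render_page("Caf\u00e9", "Cr\u00e8me br\u00fbl\u00e9e")
print(page.decode("utf-8"))
```

Passing `charset="ISO-8859-1"` also works for these French words, but raises `UnicodeEncodeError` for text such as 水, because that character simply does not exist in ISO-8859-1.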
What is ISO-8859-1?
According to w3schools.com, “ISO-8859-1 is the default character set in most browsers. The first 128 characters of ISO-8859-1 is the original ASCII character-set (the numbers from 0-9, the uppercase and lowercase English alphabet, and some special characters).”
So ISO-8859-1 is the character set most browsers fall back on when a page does not declare one. Because of this, programs like Dreamweaver will often automatically add a meta tag declaring charset=ISO-8859-1 to the head of a new page.
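A quick way to check the w3schools description is to decode all 256 possible byte values as ISO-8859-1. The minimal Python sketch below (our own illustration) confirms that every byte maps to exactly one character whose code point equals the byte value, and that ASCII-only text produces identical bytes in ASCII, ISO-8859-1 and UTF-8.

```python
# Decode every possible byte value (0-255) as ISO-8859-1.
latin1_chars = bytes(range(256)).decode("iso-8859-1")

# One byte per character, and byte value n becomes code point n.
assert len(latin1_chars) == 256
assert all(ord(c) == n for n, c in enumerate(latin1_chars))

# The first 128 positions are ASCII, so plain English text encodes to
# the same bytes in ASCII, ISO-8859-1 and UTF-8.
sample = "Search Engine Optimization 0123456789"
assert sample.encode("ascii") == sample.encode("iso-8859-1") == sample.encode("utf-8")

# Characters at positions 65 and 233.
print(latin1_chars[65], latin1_chars[233])  # A é
```

Because every byte is a valid ISO-8859-1 character, decoding as ISO-8859-1 never fails outright; it can only produce the wrong characters, which matters for the SEO issues discussed below.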
What is the Difference Between UTF-8 and ISO-8859-1?
This is really the money question: what is the difference between the two? For now, it is largely a matter of preference; if you choose one standard and stick to it, you will be fine. But if you want to get into specifics, here they are:
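- UTF-8 supports far more characters than ISO-8859-1: it can encode every Unicode character, while ISO-8859-1 has only 256 code positions, covering mostly Western European languages. If your content needs characters outside that range (for example Cyrillic, Greek, or Chinese), UTF-8 is a good option.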
- ISO-8859-1 is the default in most browsers, so it has traditionally enjoyed wide support.
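The "more characters" point is easy to test directly. In this minimal Python sketch (sample texts chosen only for illustration), the hypothetical helper `can_encode` tries to encode some text in a given encoding and reports whether every character fits: French text fits in both, while Russian and Chinese text only fit in UTF-8.

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character in text exists in the given encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True

samples = {
    "English": "Search engine optimization",
    "French": "Cr\u00e8me br\u00fbl\u00e9e",                 # Crème brûlée
    "Russian": "\u041f\u0440\u0438\u0432\u0435\u0442",      # Привет
    "Chinese": "\u6c34",                                    # 水
}

for label, text in samples.items():
    print(label, "ISO-8859-1:", can_encode(text, "iso-8859-1"),
          "UTF-8:", can_encode(text, "utf-8"))
```

If every string your site will ever contain passes the ISO-8859-1 check, either choice works; as soon as one fails, UTF-8 is the practical option.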
How Could this Affect SEO?
Choosing UTF-8 or ISO-8859-1 should not have a large effect on your search engine optimization. Problems can arise, however, if you mistakenly declare both standards in the header, or if your page does not actually follow the standard it declares. For example, if your page declares ISO-8859-1 but its text breaks that standard's rules and your characters display incorrectly, a few things could occur, such as:
- People may not want to link to you
- You may experience a high bounce rate
- Search engines may not interpret the data to the best of their ability
While these are all certainly possibilities, the final word from SEO Inc. on the subject, for the moment, is: choose a standard and stick to it. Lately, there has been a trend in web application development towards UTF-8. You can read more about that here.
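One practical way to stick to a standard is to check that each page declares only one. The Python sketch below is a rough, hypothetical checker: it uses a regular expression rather than a real HTML parser, so text that merely mentions "charset=" can fool it. It collects every charset declared in a page's markup and normalizes the names with Python's `codecs.lookup`, so "UTF-8" and "utf8" count as the same encoding. `declared_charsets` and `declares_one_charset` are illustrative names.

```python
from __future__ import annotations

import codecs
import re

# Matches charset=... (meta tags) and encoding="..." (XML declaration).
CHARSET_PATTERN = re.compile(
    r"""(?:charset|encoding)\s*=\s*["']?([A-Za-z0-9._:-]+)""",
    re.IGNORECASE,
)

def declared_charsets(markup: str) -> set[str]:
    """Return the normalized names of every charset declared in the markup."""
    names = set()
    for label in CHARSET_PATTERN.findall(markup):
        try:
            names.add(codecs.lookup(label).name)  # e.g. "UTF-8" -> "utf-8"
        except LookupError:
            names.add(label.lower())  # unknown label: keep it as written
    return names

def declares_one_charset(markup: str) -> bool:
    """True when the page declares exactly one charset."""
    return len(declared_charsets(markup)) == 1

# A header that mistakenly declares both standards.
head = (
    '<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">\n'
    '<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1">\n'
)
print(declared_charsets(head))
print(declares_one_charset(head))  # False
```

A check like this could run over a site's templates before publishing; a real audit would also compare the declaration against the HTTP `Content-Type` header the server sends.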
For more information, visit our home page. SEO Inc. is an SEO company located in San Diego.