Using the Power of the .htaccess File to Improve WordPress SEO

Search engine optimization is a rapidly growing field that seems to become more lucrative by the day, largely because a website’s placement in search results can make or break its revenue and its stream of new readers or subscribers. Many people are familiar with the standard ways to optimize a website for better placement: committing to a regular cycle of publishing new content, focusing on several keywords rather than overemphasizing just one, and resisting the urge to distort a site’s typography or link structure to accommodate those keywords.

These practices are crucial for conveying to Google that a website exists not just to draw visitors toward content and advertisements, but to retain them with high-value, keyword-dense content that is as informative as it is promotional, and that supports the site’s standing in search engine results.

However, optimizing a website’s content is not the only way to attract new visitors through search results. Increasingly, search engine optimization professionals rely on the little-known .htaccess file to improve a site’s standing with Google, Yahoo, and the other major search engines. The file is often neglected because of its odd name and because, as a dotfile, it is frequently hidden from users who access their site through a standard FTP client. It can, however, be seen and edited with the web-based file managers provided by most hosting companies. This requires logging into the cPanel or Plesk administration area, so it should be attempted only by users who understand how those control panels work and where the file manager application is located.

Now that the importance of the .htaccess file has been described, it’s time to learn how to leverage this file and earn a top-notch ranking with the internet’s most popular search providers.

Step 1: Direct the Search Engine Spider Toward the Site’s Sitemap File

An absolutely essential part of optimizing a site for top-ranking Google results is making sure the search engine’s spiders know where to find the XML sitemap file, which describes which of the site’s entries are most recent, most important, and most notable. The file also contains an index of all posts, pages, categories, and tags, allowing the search engine to return more accurate results and increasing the chances that visitors will be satisfied with their experience and click through more of the site’s content. That, in turn, promotes a higher ranking on the search engine that sent the visitor there in the first place.
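
For reference, a minimal sitemap entry follows the standard sitemaps.org format and looks roughly like the sketch below; the URL, date, and priority values are placeholders to be replaced with the site’s own data:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://your-site.com/entry-name-goes-here-as-permalink</loc>
    <lastmod>2015-01-15</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>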

In order to guide the search engine spiders toward the XML sitemap file, a few lines of code must be placed in the .htaccess file; they simply tell all newcomers where to find the XML file that holds the key to the site’s structure. The snippet looks like the example below:

<IfModule mod_alias.c>
RedirectMatch 301 /sitemap\.xml$ http://your-site.com/sitemap.xml
RedirectMatch 301 /sitemap\.xml\.gz$ http://your-site.com/sitemap.xml.gz
</IfModule>

These few lines of code employ an HTTP 301 redirect, known as a permanent redirect, which essentially tells the search engine spiders that the file is located at the URL listed above. It communicates that the file is there permanently, has not moved and is not going to move, and that the spider should reference that URL from now on.
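
As a complementary hint, the sitemaps.org protocol also allows the sitemap’s location to be advertised in the site’s robots.txt file. A single line such as the following, with the URL adjusted to your own domain, does the job:

# Advertise the sitemap location to any crawler that reads robots.txt
Sitemap: http://your-site.com/sitemap.xml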

As an aside, if your site does not currently generate its own XML sitemap for the major search engines, it is missing out on a major source of traffic and a key channel of communication with the wider audience searching for its most important keywords. Several WordPress plugins, led by one titled Google XML Sitemaps, automatically generate a new sitemap every time a post, page, category, or tag is added to the website.

These plugins also notify the search engines when the sitemap has been updated, prompting them to crawl the site for new content and add fresh results to user searches. Don’t subject your site to lower traffic by missing out on this now-essential technology, which has been adopted by all of the biggest players in search.

Step 2: Avoid Displaying Dangerous 404 Errors That Dramatically Drop a Page’s Rank

One of the great mysteries of the universe centers on a quirky WordPress behavior that appears when the software is crawled by search engine spiders and robots. The behavior causes the website to promote URLs that look like the following:

http://your-site.com/category/entry-name-goes-here-as-permalink

This is a completely invalid URL. WordPress has never supported this kind of permalink, and it doesn’t produce it when submitting entries to a sitemap or displaying them on the site. And yet, for reasons still unknown to the universe, search engines insist on indexing these URLs night after night, month after month. That would be fine if 404 errors didn’t affect a website’s ranking at the major search engines, but the simple fact is that these missing pages cause websites to be placed pages and pages further back in Google’s results than they otherwise would be.


To get rid of this phenomenon, a new snippet of code should be added to the .htaccess file that redirects any visitor, whether a search engine robot or not, to the valid standalone entry page. This is done by removing the directory prefix (such as “category” or “tag”) and redirecting to a URL like the following:

http://your-site.com/entry-name-goes-here-as-permalink

It’s done with permanent redirects and a few pattern-matching rules, shown here:

<IfModule mod_alias.c>
RedirectMatch 301 ^/search/$ http://your-site.com/
RedirectMatch 301 ^/tag/$ http://your-site.com/
RedirectMatch 301 ^/category/$ http://your-site.com/
</IfModule>

The permanent redirection code, applied here just as it was in the first step of this process, stops search engine spiders from lingering on these phantom search, tag, and category URLs and sends them back to the home page instead. It should remain in place so that it keeps working as new entries are created; over time, the absence of the 404 errors produced by this WordPress quirk will lead to far better page rankings with Google and the other major search engines, which ding a website’s reputation for serving 404 error pages and missing content.
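
Note that the rules above only catch the bare /search/, /tag/, and /category/ URLs. If deeper URLs such as /category/entry-name-goes-here-as-permalink also show up in the index, a capturing variant along the following lines strips the directory prefix and points spiders at the root-level permalink. This is only a sketch: it assumes the site does not rely on its default /category/ and /tag/ archive pages, because those would be redirected as well.

<IfModule mod_alias.c>
RedirectMatch 301 ^/category/(.+)$ http://your-site.com/$1
RedirectMatch 301 ^/tag/(.+)$ http://your-site.com/$1
</IfModule>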

Step 3: Permanently Redirect an RSS Feed to FeedBurner.com

If your website doesn’t have an RSS feed that lets users subscribe to content updates, it’s likely not ranking very highly as far as Google or Yahoo are concerned. The same is true if your website does have an RSS feed but doesn’t properly promote its URL because the feed lives at an external site like FeedBurner.com. Whereas WordPress includes SEO-optimized code for presenting its own internal RSS feeds, FeedBurner feeds suffer from being outside the software’s purview, and that can lead to major losses in page rank when a site switches to the off-site syndication service.

Luckily, .htaccess allows FeedBurner to be treated essentially like a native WordPress feed, to the point where search engines will not know the difference between a self-hosted feed and one kept off-site. It uses our good friend, the 301 redirect, to accomplish this, and it’s as easy as adding the following lines of code to the .htaccess file:

<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{REQUEST_URI} ^/feed/ [NC]
RewriteCond %{HTTP_USER_AGENT} !(FeedBurner|FeedValidator) [NC]
RewriteRule .* http://feeds.feedburner.com/Your-Site-Username [L,R=301]
</IfModule>

The code above permanently redirects search spiders, as well as any visitors who may have subscribed to the old feed URL, to the new version of the feed hosted at FeedBurner.com. It also makes an exception for FeedBurner itself so that FeedBurner.com doesn’t get stuck in an endless loop, constantly redirecting to itself and then back to the website, over and over again. Failure to include this exception would result in an RSS feed without any content. Finally, the last line of the snippet tells the server where to send search engine spiders and subscribers, and the “301” is the status code it delivers to browsers and spiders as it sends them to the new, off-site feed.

Step 4: Direct Search Engines to the Robots.txt File, and Leave Malicious Spiders Behind

Most website owners and administrators aren’t aware that the internet is full of malicious “spider” applications that endlessly crawl web servers, looking in every single directory for a robots.txt file and its search engine instructions. This perpetual probing slows down a website’s load times and exposes any security weaknesses the site may have, and it’s a huge problem for a large number of unsuspecting WordPress site operators. The problematic “perpetual crawl” can be avoided by using the .htaccess file to state exactly where the site’s robots.txt file is stored, preventing the endless crawling by external spiders. They’ll be told to look in one place, get what they need, and get out. That’s a far better approach. Here’s what it looks like:

<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteCond %{REQUEST_URI} !^/robots\.txt$ [NC]
RewriteCond %{REQUEST_URI} robots\.txt [NC]
RewriteRule .* http://your-site.com/robots.txt [R=301,L]
</IfModule>

Remember that a robots.txt file should always be placed in the root directory of a website; in an FTP client, that typically means the public_html folder. This file controls not only what information can be seen, but also which search engines may crawl the site and which directories should be excluded. It is an essential way of walling off subdomains, add-on domain folders, and other content that should be crawled separately and indexed apart from the main domain that serves the website.
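
For reference, a minimal robots.txt might look like the sketch below; the excluded directories are placeholders and should be adapted to the site’s own structure:

# Apply these rules to all well-behaved crawlers
User-agent: *
# Keep administrative and core directories out of the index
Disallow: /wp-admin/
Disallow: /wp-includes/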

Step 5: Eliminate Favicon Crawling and Exploitation Using .htaccess

The same malicious spiders that crawl through every one of a server’s directories looking for a robots.txt file are also known to do the “perpetual crawl” when hunting for so-called favicon images. A favicon is the small icon a browser displays next to the site’s URL and in bookmarks; it is typically kept in the root directory of a website and usually contains a small version of the site’s logo or another memorable image that identifies the site to a user who has bookmarked it for later reading. To prevent malicious spiders from exploiting security vulnerabilities and slowing down a website’s performance, simply place the following code into the .htaccess file:

<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteCond %{REQUEST_URI} !^/favicon\.ico$ [NC]
RewriteCond %{REQUEST_URI} /favicon?\.?(gif|ico|jpe?g?|png)?$ [NC]
RewriteRule (.*) http://your-site.com/favicon.ico [R=301,L]
</IfModule>

That’s all that needs to be done to protect a website’s security and integrity on this front. Combined with the four steps above, the site will be far better positioned for a higher page ranking with the major search engines, and that means more visitors, more revenue, and greater interaction.

Author:

Vladislav Davidzon is the principal of US-based online marketing consultancy Vladislav Davidzon & Associates, developing integrative solutions through high impact search engine optimized WordPress websites for socially responsible customers of all sizes around the world.
