How to make clean URLs
Clean URLs are prettier, easier to remember, more compact and easier to link to, and they let search engines spider your site.
I guess I should first make clear what a messy URL is and what a clean URL is. Here’s a messy URL:
http://www.desiquintans.com/index.php?page=articles
And here’s a clean URL:
http://www.desiquintans.com/articles/
Notice how much nicer the last one looks. Clean URLs are great for 6 reasons:
And the really big reason to use clean URLs:
Search engine spiders tend to stay away from messy URLs. A script that builds its pages from query strings can often produce an endless number of different URLs for the same content, so a spider could get stuck in a loop, requesting the same pages over and over and draining your bandwidth.
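Anyway, how do we make our URLs clean? Well, you need a server that runs Apache with the mod_rewrite module enabled, and your host has to let .htaccess files use it. This solution only works for Apache servers, so if your site runs on Windows 2000 with IIS, you’ll have to go somewhere else for your kicks.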
The first step is to make an .htaccess file. It doesn’t need to be anything fancy: just create an empty plain-text file named .htaccess, with the leading dot and no other extension.
There’s your .htaccess file.
Now for the actual commands. The .htaccess file is really just a container for the directives you want Apache to follow. Paste this into your file:
RewriteEngine On
That tells Apache to switch on mod_rewrite, its URL-rewriting engine. I don’t know much more about it than that, and you don’t need to either.
Now figure out the script and query string that your site’s messy URLs use. My example was http://www.desiquintans.com/index.php?page=articles, so the part I care about is index.php?page=articles: the script is index.php and the query string is page=articles. This is what I put in my .htaccess:
RewriteRule ^([a-zA-Z0-9]+)/$ index.php?page=$1
Let me explain the components of the above command.
The caret (the ^ symbol) marks the start of the URL path being matched. In an .htaccess file, that path is relative to the directory the file sits in: if you put it in www.x.com/hi/, the pattern is tested against whatever comes after www.x.com/hi/, and if you put it in www.x.com’s public folder, it’s tested against whatever comes after www.x.com/.
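To make the per-directory behaviour of the caret concrete, here is a minimal Python sketch. It assumes Python’s re module treats this simple pattern the same way Apache’s regex engine does, and per_dir_path is a hypothetical helper standing in for the prefix stripping mod_rewrite does in an .htaccess file; it is not part of Apache.

```python
import re

# Sketch: approximates how a RewriteRule pattern in an .htaccess file is
# matched. The .htaccess directory prefix is stripped first, then the pattern
# is tested against what is left. Assumes Python's re behaves like Apache's
# regex engine for this simple pattern.
PATTERN = re.compile(r"^([a-zA-Z0-9]+)/$")


def per_dir_path(url_path: str, htaccess_dir: str) -> str:
    """Hypothetical helper: strip the .htaccess directory prefix from a URL path.

    per_dir_path("/hi/articles/", "/hi/") returns "articles/".
    """
    if not url_path.startswith(htaccess_dir):
        raise ValueError(f"{url_path} is outside {htaccess_dir}")
    return url_path[len(htaccess_dir):]


# .htaccess in the public root: the pattern sees "articles/"
print(bool(PATTERN.search(per_dir_path("/articles/", "/"))))        # True
# .htaccess in /hi/: the pattern also sees "articles/" for /hi/articles/
print(bool(PATTERN.search(per_dir_path("/hi/articles/", "/hi/"))))  # True
# Without stripping, ^ anchors at "hi/" and the match fails
print(bool(PATTERN.search("hi/articles/")))                          # False
```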
([a-zA-Z0-9]+) is a variable set that means “one or more characters that are lowercase letters, uppercase letters or digits.” The square brackets list the allowed characters, the plus sign after them means “one or more of these,” and the parentheses capture whatever matched so it can be reused later as a variable. If I didn’t have the plus sign, each clean URL could only be one character long. A list of common variables is at the bottom of the page.
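Here is a quick way to see what the plus sign changes, again as a Python sketch that assumes re handles these patterns like Apache does. It also shows that a hyphen is rejected, because it isn’t listed inside the square brackets.

```python
import re

WITH_PLUS = re.compile(r"^([a-zA-Z0-9]+)$")    # one or more allowed characters
WITHOUT_PLUS = re.compile(r"^([a-zA-Z0-9])$")  # exactly one allowed character

for path in ["articles", "a", "my-page", ""]:
    print(repr(path), bool(WITH_PLUS.search(path)), bool(WITHOUT_PLUS.search(path)))
# 'articles' True False
# 'a' True True
# 'my-page' False False   (the hyphen is not in the square brackets)
# '' False False          (+ needs at least one character)
```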
$ marks the end of the clean URL. After a space comes the messy URL that Apache is supposed to rewrite it to. Note the slash just before this dollar sign: with it, the rule matches only articles/, and without it, only articles. Since you want both to work, it’s best to write one RewriteRule with the slash and one without.
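This Python sketch (assuming re matches these patterns the way Apache does) compares the slash and no-slash versions, plus a version with the $ left off, which also accepts any longer path that merely starts with a valid name.

```python
import re

NO_SLASH = re.compile(r"^([a-zA-Z0-9]+)$")
WITH_SLASH = re.compile(r"^([a-zA-Z0-9]+)/$")
UNANCHORED = re.compile(r"^([a-zA-Z0-9]+)")  # same pattern with the $ left off

for path in ["articles", "articles/", "articles/extra/stuff"]:
    print(path, bool(NO_SLASH.search(path)), bool(WITH_SLASH.search(path)),
          bool(UNANCHORED.search(path)))
# articles True False True
# articles/ False True True
# articles/extra/stuff False False True
```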
$1 refers back to the first variable set, in order of appearance. Since I have only one variable set ( ([a-zA-Z0-9]+) ), I only need $1. If I had several variable sets, like section/([a-zA-Z0-9]+)/page/([0-9]+), I would use $1 for the first and $2 for the second.
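Here is a two-variable rule simulated in Python. The query parameter names section and page are made up for illustration, and the sketch assumes re behaves like Apache for these patterns. Note that the page number needs its own parentheses to be captured as $2; Python writes $1 and $2 as \1 and \2.

```python
import re

# Hypothetical two-variable rule. In an .htaccess file it might read:
#   RewriteRule ^section/([a-zA-Z0-9]+)/page/([0-9]+)$ index.php?section=$1&page=$2
PATTERN = re.compile(r"^section/([a-zA-Z0-9]+)/page/([0-9]+)$")
TARGET = r"index.php?section=\1&page=\2"  # \1 and \2 play the role of $1 and $2

print(PATTERN.sub(TARGET, "section/articles/page/2"))
# index.php?section=articles&page=2
print(PATTERN.sub(TARGET, "section/articles/page/two"))
# section/articles/page/two  (no match, so the path is left unchanged)
```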
Save the .htaccess file and upload it to your root public directory — the directory with your main index page.
My sample .htaccess file has this:
RewriteEngine On
RewriteRule ^([a-zA-Z0-9]+)$ index.php?page=$1
RewriteRule ^([a-zA-Z0-9]+)/$ index.php?page=$1
Notice that the two RewriteRules let visitors enter a URL with or without a trailing slash. You should allow for both, because there isn’t a person alive who won’t skimp on a slash when they think they’re accessing a directory. A clean URL is not a directory: it’s just a messy URL that Apache rewrites behind the scenes, while the visitor’s address bar keeps showing the clean one.
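As a rough check of the sample file, this Python sketch runs both rules in order and compares them with a single alternative rule where /? makes the trailing slash optional. It assumes re matches these patterns like Apache does, and rewrite is a hypothetical helper that stops at the first matching rule; for these patterns that should give the same result as Apache, because index.php contains a dot and matches neither rule.

```python
import re

# The two rules from the sample .htaccess, in order (Python writes $1 as \1).
RULES = [
    (re.compile(r"^([a-zA-Z0-9]+)$"), r"index.php?page=\1"),
    (re.compile(r"^([a-zA-Z0-9]+)/$"), r"index.php?page=\1"),
]

# Alternative: one rule where "/?" makes the trailing slash optional.
COMBINED = [
    (re.compile(r"^([a-zA-Z0-9]+)/?$"), r"index.php?page=\1"),
]


def rewrite(path, rules=RULES):
    """Hypothetical helper: return the internal target for a per-directory
    path, or the path unchanged if no rule matches. Stops at the first match."""
    for pattern, target in rules:
        if pattern.search(path):
            return pattern.sub(target, path)
    return path


for path in ["articles", "articles/", "index.php", "about/me"]:
    print(f"{path!r:12} -> {rewrite(path)!r}  combined: {rewrite(path, COMBINED)!r}")
```

The combined rule is a matter of taste; two separate rules, as in the sample file, do the same job and are arguably easier to read.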