How to make clean URLs.

I wrote this when I was, like, 15. Tagged as Web Development, Life Skills.

I guess I should first make clear what a messy URL is, and what a clean URL is. Here's a messy URL:

http://www.desiquintans.com/index.php?page=articles

And here's a clean URL:

http://www.desiquintans.com/articles/

Notice how much nicer the last one looks. Clean URLs are great for 6 reasons:

  1. They look prettier
  2. They are easier to remember
  3. They help you save filespace on pages which have many links
  4. They are easier to link to, both for you and for other webbies
  5. They help cut down on typos because there is less confusion about what exactly to type

And the really big reason why you should use clean URLs:

  1. They allow search engines to spider your site.

Why are spiders so picky?

Search engine spiders tend to be wary of messy URLs. A badly written script can serve the same page under endless query-string variations, so the spider could become stuck in a loop, requesting the same pages over and over and draining you of your bandwidth.

.htaccess is your friend

Anyway, how do we make our URLs clean? Well, you need a server that runs Apache with mod_rewrite enabled. This solution only works for Apache servers, so Win2k users have to go somewhere else for their kicks.

The first step is to make an .htaccess file. It doesn’t need to be anything fancy.

  1. Open Notepad or EditPad Lite or whatever the hell you use to make plain text files (that is, a program with no text formatting)
  2. Save a new blank document as .htaccess (in Notepad, set the file type to “All Files” so it doesn’t end up as .htaccess.txt)

There’s your .htaccess file.

.htaccess hackery

Now for the actual commands. The .htaccess is really just a container that holds the commands that you want Apache to execute. Paste this into your file:

RewriteEngine On

That tells Apache to turn mod_rewrite on, but I don’t know anything about that, and you don’t need to either.

Now figure out the query string that your site’s messy URLs use. My example was http://www.desiquintans.com/index.php?page=articles, so my script is index.php and my query string is page=articles. This would be what I put in my .htaccess:

RewriteRule ^([a-zA-Z0-9]+)/$ index.php?page=$1

RewriteRule explained

Let me explain the components of the above command.

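The caret (the ^ symbol) means “the pattern starts here.” Apache first strips off the path of the directory where this particular .htaccess file is located, so the pattern only sees what comes after it. If you put the file in www.x.com/hi/ and someone requests www.x.com/hi/articles/, the pattern is tested against articles/, and if you put it in www.x.com’s public folder, a request for www.x.com/articles/ is tested against articles/ as well.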

([a-zA-Z0-9]+) is a variable set that means, “one or more characters that are lowercase letters, uppercase letters or digits.” The stuff inside the square brackets lists the characters that are allowed, the plus sign outside the square brackets says that one or more of them in a row is fine, and the round brackets capture whatever matched so it can be reused later. If I didn’t have the plus sign I would only be allowed one character in each clean URL. A list of common variable sets is at the bottom of the page.

$ means that this is the end of the clean URL. Whatever follows the space after it is the messy URL that Apache should serve instead. Please note the slash just before this dollar sign: you can either leave it out or keep it, but it’s best to handle both cases, each in its own RewriteRule.

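$1 is the number of a specific variable: it gets replaced with whatever the first pair of round brackets matched. Since I have only one variable set ( ([a-zA-Z0-9]+) ) I only need to specify one variable. If I had two variable sets, like section/([a-zA-Z0-9]+)/page/([0-9]+), then I would have to specify $1 and $2 variables. Note that each variable set needs its own round brackets, otherwise it doesn’t get a number.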
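
To see two variables working together, here is a rough sketch of a full rule. The section and page parameter names are made up for illustration, so swap in whatever your own script actually reads:

```apache
# Hypothetical example: /section/news/page/2 -> index.php?section=news&page=2
# Each $N needs its own pair of round brackets in the pattern.
RewriteRule ^section/([a-zA-Z0-9]+)/page/([0-9]+)$ index.php?section=$1&page=$2
```

$1 is filled from the first pair of round brackets and $2 from the second, counting left to right.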

Save the .htaccess file and upload it to your root public directory (the directory with your main index page).

My .htaccess file

My sample .htaccess file has this:

RewriteEngine On
RewriteRule ^([a-zA-Z0-9]+)$ index.php?page=$1
RewriteRule ^([a-zA-Z0-9]+)/$ index.php?page=$1

Notice that the two RewriteRules allow the user to have a trailing slash or no trailing slash when they enter a URL. You should allow for both of these instances, because there isn’t a person alive who won’t skimp on a slash when they think they’re accessing a directory. A clean URL is not a directory; Apache just quietly rewrites it to the messy URL behind the scenes, and the address in the browser stays clean.
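
If keeping two near-identical rules bothers you, one rule with an optional slash should cover both cases. Treat this as a sketch rather than what the sample file above uses: the question mark makes the slash before it optional, and the [L] flag (an addition here) tells Apache to stop checking further rules once this one matches.

```apache
RewriteEngine On
# Matches both /articles and /articles/
RewriteRule ^([a-zA-Z0-9]+)/?$ index.php?page=$1 [L]
```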

Common variable sets

  • [A-Z] The variable can have uppercase letters
  • [a-z] The variable can have lowercase letters
  • [0-9] The variable can have numbers
  • ([a-zA-Z0-9]+) The variable can have any of the above, must be at least one character long, and has no upper limit on the number of characters.
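
These sets can be mixed inside one pair of square brackets. As an illustration only, assuming you wanted page names with hyphens or underscores in them (something the rules above don’t allow), a set like this should do it:

```apache
# Hypothetical: /my-first_article/ -> index.php?page=my-first_article
# The hyphen goes last inside the brackets so it is read as a plain hyphen, not a range.
RewriteRule ^([a-zA-Z0-9_-]+)/$ index.php?page=$1
```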