How to make clean URLs

Created on Monday, November 8, 2004. I last modified it on Friday, March 4, 2022.
Filed under Life Skills.
 

Clean URLs are prettier, more compact, easier to remember and easier to link to, and they make it easier for search engines to spider your site.

 

Here’s a messy URL:

https://www.desiquintans.com/index.php?page=articles

And here’s a clean URL:

https://www.desiquintans.com/articles/

Notice how much nicer the second one looks. Clean URLs are great because

  1. they’re easy to remember,
  2. they’re easy to type, like if you’re advertising your site on printed material, and
  3. they can stay the same if you change the way your website is organised or even the software it runs on, because you can update them to point to the right page.

.htaccess is your friend

This method uses a .htaccess file to tell the server (one running Apache) what to do. The first step is to make a .htaccess file.

  1. Open Notepad or EditPad Lite or whatever you use to make plain text files (that is, a program with no text formatting).
  2. Save a new blank document as .htaccess.

There’s your .htaccess file.

Help! Even an empty .htaccess file creates a 500 Internal Server Error!

This Server Fault thread talks about some of the things that can cause this, like line breaks being DOS-style instead of Unix-style, or a byte-order mark (BOM) at the start of the file.

Simple solution: Use your FTP program to create and edit the .htaccess file on the server.

.htaccess hackery

The .htaccess file holds commands that you want the server to execute. Paste this into your file:

RewriteEngine On

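That tells Apache to switch on the rewriting engine that mod_rewrite provides (the module itself has to be installed on the server already), but I don’t know much about how that works, and you don’t need to either.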

Now figure out what your site’s messy URLs look like. My example was https://www.desiquintans.com/index.php?page=articles, so the part I need to rewrite to is index.php?page=articles (the script name, followed by the query string page=articles). This would be what I put in my .htaccess:

RewriteRule ^([a-zA-Z0-9]+)/$ index.php?page=$1

RewriteRule explained

Let me explain the components of the above command in a very simplified and in-context way:

The caret (the ^ symbol) marks the start of the part of the URL that gets matched. That part begins wherever this particular .htaccess file is located, so if you put it in www.x.com/hi/ then matching starts right after www.x.com/hi/, and if you put it in www.x.com’s root folder, matching starts right after www.x.com/.
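
As a rough illustration of that, here is a sketch that assumes the .htaccess file sits in a folder called /hi/ on the made-up site www.x.com, with index.php in the same folder. The comments show which part of the URL the pattern actually gets to see:

```apache
# Visitor requests:        https://www.x.com/hi/articles/
# .htaccess file lives in: /hi/
# RewriteRule tests only:  articles/
RewriteEngine On

# Matches "articles/" and serves index.php?page=articles
# from the same folder instead.
RewriteRule ^articles/$ index.php?page=articles
```

The folder name and page name here are only placeholders; swap in whatever your own site uses.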

([a-zA-Z0-9]+) is a variable set that means, “one or more characters that are lowercase letters, uppercase letters or digits.” The stuff inside the square brackets is the list of allowed characters, the plus sign outside the square brackets means “one or more of these”, and the round brackets save whatever was matched so that it can be reused later. If I didn’t have the plus sign I would only be allowed one character in each clean URL. A list of common variables is at the bottom of the page.

$ marks the end of the clean URL. Whatever comes after the space that follows it is the messy URL that Apache is supposed to serve instead. Please note the slash just before this dollar sign: with it, the rule only matches URLs that end in a slash. You can either leave it out or keep it, but it’s clearest to handle both cases, each in its own RewriteRule.

$1 stands for whatever a specific saved variable set matched, and the number says which set. Since I have only one variable set ( ([a-zA-Z0-9]+) ) I only need $1. If I had several variable sets, like section/([a-zA-Z0-9]+)/page/([0-9]+) then I would use $1 for the first and $2 for the second.
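
Here is a minimal sketch of a rule with two variable sets. It assumes a hypothetical index.php that accepts section and page parameters; those parameter names are made up for the example, so replace them with whatever your own script expects:

```apache
RewriteEngine On

# Hypothetical layout: section/news/page/2/ is served by
# index.php?section=news&page=2
# $1 = whatever the first set of round brackets matched (here "news")
# $2 = whatever the second set of round brackets matched (here "2")
RewriteRule ^section/([a-zA-Z0-9]+)/page/([0-9]+)/$ index.php?section=$1&page=$2
```

The variable sets are numbered from left to right, in the order that their opening brackets appear.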

Save the .htaccess file and upload it to the desired directory. This is usually your root public directory — the directory with your main index page.

My .htaccess file

My sample .htaccess file has this:

RewriteEngine On
RewriteRule ^([a-zA-Z0-9]+)$ index.php?page=$1
RewriteRule ^([a-zA-Z0-9]+)/$ index.php?page=$1

Notice that the two RewriteRules allow the user to have a trailing slash or no trailing slash when they enter a URL. You should allow for both of these instances.
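
If you would rather not keep two nearly identical lines, one possible alternative (a sketch, not something my own file uses) is to make the trailing slash optional with a question mark:

```apache
RewriteEngine On

# "/?" means "a slash, zero or one times", so this single rule
# matches both "articles" and "articles/".
# [L] tells Apache not to apply any later rewrite rules once this one matches.
RewriteRule ^([a-zA-Z0-9]+)/?$ index.php?page=$1 [L]
```

Two separate rules are easier to read when you are starting out; the single rule is just more compact.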

Common variable sets

  • [A-Z] The variable can have uppercase letters
  • [a-z] The variable can have lowercase letters
  • [0-9] The variable can have numbers
  • [-_] The variable can have a hyphen - or underscore _
  • ([a-zA-Z0-9]+) The variable can have any letters or numbers (but not the hyphen or underscore), needs at least one character, and has no upper limit to the number of characters.
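
These sets can be combined inside one pair of square brackets. As a sketch, assuming your page names look something like my_first-article, a rule that also allows hyphens and underscores could be written like this:

```apache
RewriteEngine On

# Letters, numbers, underscores and hyphens are all allowed.
# The hyphen goes last inside the square brackets so that it is read
# as a plain hyphen and not as part of a range like a-z.
RewriteRule ^([a-zA-Z0-9_-]+)/$ index.php?page=$1
```
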
That's all there is, there isn't any more.
© Desi Quintans, 2002 – 2022.