404 Error Control in Expression Engine

One of the things that has always concerned me about using Expression Engine is 404 error control. Because the template system is so flexible with segments, includes, etc…, there is a “vulnerability” when it comes to 404 errors. It's not much of an issue for a small EE site, but on large scale applications there could be problems with large numbers of bogus urls returning 200 status codes.

These problems could be caused by template coding errors or, on the darker side of things, exploited by a competitor to damage your search engine rankings or create other problems for you.

They could point a bunch of links at some bogus urls on your EE site and create duplicate content problems or invent new urls for you, like

yoursite.com/services/consulting/WE-WILL-RIP-YOU-OFF/
or perhaps something even nastier than that like
http://expressionengine.com/showcase/interview/ovation_guitars/DOWNLOAD-WORDPRESS-INSTEAD/
You get the picture…

Now I realize that this could also be taken care of by the canonical link tag (rel="canonical"), but that’s a wicked cop out. Plus that canonical tag stuff is pretty new and I’ve been using Expression Engine since it was called pMachine.

For the code examples below let’s assume that you have ditched the index.php from your EE urls. So instead of the out of the box install (yoursite.com/index.php/blog/post-url-title/) you have clean urls like yoursite.com/blog/post-url-title/

For Single Entry pages

On single entry pages, like the ones that display a blog post, you might set up your site so your blog’s index page is blog/index, where blog is also the name of your template group. Your blog/index template displays your blog’s home page, but it also displays individual blog posts via a conditional on the segment_2 variable. So if the page requested is yoursite.com/blog/another-blog-post/, it displays the weblog entry with the url_title “another-blog-post”. That’s the way this blog is set up.

What happens when there is a request for yoursite.com/blog/bogus-post-url/ ? You want this to return a 404 error and not just display your blog/index template.

Use the require_entry parameter in your weblog tag and use the no_results conditional to redirect any bogus urls to your 404 error template.

{if segment_2 == ""}
{!-- This is the blog home page --}
{!-- the code to display the blog home page --}
...
...
{/if}
{if segment_2 != ""}
{!-- This is a single entry blog post --}
{!-- segment_3 is not used in this template, so 404 any request that has one --}
{if segment_3 != ""}{redirect="404"}{/if}
<head>
{exp:weblog:entries weblog="blog" limit="1" require_entry="yes" rdf="off" url_title="{segment_2}"}
{!-- If url_title does not match an existing entry, 404 it --}
{if no_results}{redirect="404"}{/if}
<title>{title} - Your Blog</title>
{!-- the rest of the code to display the rest of this blog post --}
...
...
{/exp:weblog:entries}
{/if}

This way, when a bogus url like yoursite.com/blog/bogus-post-url/ is requested, it redirects to your 404 error template instead of returning a 200 status code and displaying an empty template or your blog/index template. Since we are not using the segment_3 variable in this template, we took care of that as well.

For Pages module pages

Let’s say you have a setup like this:

The About section of your website is yoursite.com/about/ which is also what you named your template group and the template for this index page is about/index.

Instead of making this a “static” page with the Pages module (which isn’t ideal for an index page), you made a template that builds the navigation/site map to all your /about/ pages via a weblog tag that pulls the entries from your pages weblog and sorts them however you need, by category or whatever.

This way when you add more pages to this section they automatically appear on the index page without having to modify a static index page, etc…

When a Pages module page is requested, it is displayed through another template: pages/index

Then in the pages module you set your page urls to be virtual subdirectories off of the /about index page you set up above. So your urls are something like this:

yoursite.com/about/ — the index page for this section

yoursite.com/about/me/interests/ — a page displayed via pages/index

yoursite.com/about/me/hobbies/ — a page displayed via pages/index

yoursite.com/about/company/services/ — a page displayed via pages/index

and so forth…

There are two things going on here when you get a bogus url request like: yoursite.com/about/foo/bar/

First, EE is going to look for a page to display, and it doesn’t find one because you don’t have a page with the url about/foo/bar.

Next, it’s going to look for a template in the url, which it does find (the about/index template), and it appends the /foo/bar segments on to it.

Without any 404 error checking that page and any other bogus urls pointed at your /about section are going to display your about/index template and return a 200 status code.
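To make the two-step lookup above easier to picture, here is a rough Python sketch of it. This is a simplified model for illustration only, not EE's actual routing code; the function name `resolve` and the `pages` and `template_groups` dictionaries are made up for this example and just mirror the /about setup described above.

```python
def resolve(path, pages, template_groups):
    """Return (template, leftover_segments), or None if nothing matches.

    Simplified model for illustration only; not EE's real routing logic.
    """
    segments = [s for s in path.split("/") if s]

    # Step 1: look for a Pages module page with exactly this url.
    page_url = "/".join(segments)
    if page_url in pages:
        return pages[page_url], []

    # Step 2: fall back to template_group/template matching.
    if not segments or segments[0] not in template_groups:
        return None
    group = segments[0]
    if len(segments) > 1 and segments[1] in template_groups[group]:
        return f"{group}/{segments[1]}", segments[2:]

    # No template with that name: the group's index template is used and
    # the remaining segments are simply passed along to it.
    return f"{group}/index", segments[1:]


# Hypothetical site data mirroring the setup described above.
pages = {
    "about/me/interests": "pages/index",
    "about/me/hobbies": "pages/index",
    "about/company/services": "pages/index",
}
template_groups = {"about": {"index"}, "pages": {"index"}}

print(resolve("/about/me/hobbies/", pages, template_groups))
# ('pages/index', [])
print(resolve("/about/foo/bar/", pages, template_groups))
# ('about/index', ['foo', 'bar'])
```

The second result is the problem case: the bogus url still lands on about/index, with foo and bar left over as segment_2 and segment_3, so unless the template checks for them the page renders normally.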

Lock this down by enabling Strict URLs and putting the following code at the top of your about/index template:

{if segment_2 != ""}{redirect="404"}{/if}

What this does is require segment_2 of the url to be a valid template, or in the above example the /foo/ part of about/foo/bar. So if there is no about/foo template in your about template group, it will show a 404 error.

You should use this on every template where there should/will be no segment variables appended on to the url past the segment that calls the template (or the last segment used in that template).

Let’s say you have a template named privacy in your about template group that is accessed by yoursite.com/about/privacy. We already took care of the about/index template above, but you should use the strict url segment conditional above on your about/privacy template which would check for a bogus segment_3:

{if segment_3 != ""}{redirect="404"}{/if}

That way it will give a proper 404 status code if someone requests yoursite.com/about/privacy/does-not-exist/

On your pages/index template (that we set up above to handle the display of the entries in the pages weblog) use the require_entry parameter and the no_results conditional above. It should look something like this

{exp:weblog:entries weblog="pages" limit="1" require_entry="yes" rdf="off"}
{if no_results}{redirect="404"}{/if}
<head>
<title>{title} - Your Site</title>
{!--- the rest of the code to display the rest of this page ----}
...
...
{/exp:weblog:entries}

That way any direct requests to your pages/index template will give a 404.
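If you want to confirm that the bogus urls from this post really come back as 404s, a small script can request them and report anything that doesn't. The following is a minimal Python sketch using only the standard library. The function names `status_of` and `find_soft_404s` are made up for this example, yoursite.com is a placeholder, and it assumes your 404 template is actually served with a 404 status code.

```python
import urllib.error
import urllib.request


def status_of(url, opener=urllib.request.urlopen):
    """Return the HTTP status code for a GET request to url."""
    try:
        with opener(url) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError for 4xx/5xx responses; the code is on it.
        return err.code


def find_soft_404s(urls, opener=urllib.request.urlopen):
    """Return (url, status) pairs for bogus urls that did NOT return 404."""
    offenders = []
    for url in urls:
        code = status_of(url, opener)
        if code != 404:
            offenders.append((url, code))
    return offenders


# Bogus urls taken from the examples in this post.
# Replace yoursite.com with your real host name before running a check.
BOGUS_URLS = [
    "http://yoursite.com/blog/bogus-post-url/",
    "http://yoursite.com/about/foo/bar/",
    "http://yoursite.com/about/privacy/does-not-exist/",
]
```

Calling find_soft_404s(BOGUS_URLS) against your own site should return an empty list once the checks above are in place. Anything it does return is a url that still needs a segment conditional or a require_entry check.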

There are probably other methods for controlling access to bogus urls and serving proper 404 status codes in EE, but these are a few examples of what I have been using over the last several years.

Again this really isn’t a security problem, but it could end up being a pain in the ass.

Keeping track of segments

One helpful tip when building an EE site is to use the following code in your footer_template to keep track of your url segments. I assume that you are building your EE site with includes for the head section and other includable parts of your layout like the navigation, sidebars, footer, etc… Structuring your templates this way makes it much easier to maintain your site going forward.

Assuming that your admin login name is “superuser”, put the following in one of your global template includes like the footer:

{if username == "superuser"}
URL segments:<br />
1 - {segment_1}<br />
2 - {segment_2}<br />
3 - {segment_3}<br />
4 - {segment_4}<br />
5 - {segment_5}<br />
6 - {segment_6}
{/if}

This way, when you are in development mode, you will be able to see the labeled url segments for every section and page of your site. This makes debugging much easier, especially if you are using segment conditionals to control the display of page content. I got this tip a few years ago from the EE forums and it has been a big help.
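If you want to reason about segments away from the site, for example when planning which templates need which segment conditionals, the same mapping is easy to reproduce. Here is a small Python sketch. The function name `url_segments` is made up for this example, and it assumes clean urls with index.php removed and a full url that includes the http:// part.

```python
from urllib.parse import urlsplit


def url_segments(url, max_segments=6):
    """Map a clean url (no index.php) to segment_1..segment_N values.

    Segments that are not present in the url come back as empty strings,
    which is what a {if segment_N != ""} conditional would be testing for.
    """
    parts = [p for p in urlsplit(url).path.split("/") if p]
    return {
        f"segment_{i}": parts[i - 1] if i <= len(parts) else ""
        for i in range(1, max_segments + 1)
    }


for name, value in url_segments("http://yoursite.com/about/privacy/does-not-exist/").items():
    print(f"{name} - {value}")
```

For that url, segment_3 is not empty, so a segment_3 conditional on the about/privacy template would send the request to the 404 template.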

Comments

404 Error Control in Expression Engine — 4 Comments

  1. Useful article. The problem I’ve found, however, is that anything like this in or before the <head> section causes EE to not generate anything afterwards, i.e. an incomplete web page which then doesn’t render properly.

  2. Thanks a lot for this post. In the past I had some difficulties trying to emulate the WordPress behaviour, but now I can take full advantage of EE’s 404 management features. This post is what I was desperately looking for a few months ago, so thank you for sharing. I’ve learnt a very useful method.