ModRewriteWOHyperlink

It is important for dynamic web sites to have URLs which are both easy for humans to read (and copy/paste) and readily spiderable by Googlebot and friends. Unfortunately, most web sites employ URLs which are just nasty looking: they've got "?" marks, variable associations using "=" signs, and strange paths with "cgi-bin" and other nonsense. Compare the two examples below:

Do you see the difference between the two URLs shown above? Do you see their similarities? Both URLs actually take you to the same web page, but the "static" version is more desirable because it is easier for humans to comprehend and more likely to be indexed by a search engine. The remainder of this tutorial shows how to configure the Apache web server and use Jewelry Luv's "ModRewriteWOHyperlink" class to automagically convert all your dynamic URLs into static ones. This is a "plug-and-play" solution which will not require major rewrites to your existing applications. Go ahead and download our modified WOHyperlink class now.

The Problem

In the beginning, all web sites were written by hand, one page at a time. The links which connected the pages were deliberately created, one at a time, by the webmaster. This is a time-intensive process for sites which are updated often.

Soon, clever programmers realized that web pages could be dynamically constructed, on the fly, from values in a database. This idea has taken over the Web, and most major web sites no longer have many physical .html pages. Instead, these sites pull all their structure from the database and dynamically piece together pages as users request them.

Now software decides what URLs should look like. URLs are automatically created and variables are embedded in them. Those variables give the web application the information it needs to pull the right information from the database. For a spider wishing to index the World Wide Web, dynamic web sites pose a real danger: the spider may get stuck in an infinite loop. Since a dynamic site has no "real" pages, there is the possibility that every page the spider visits is slightly different from the next. For this reason, spiders play it safe. They either don't follow dynamic links at all or, if they do follow them, rate those links lower than static ones and go only a few levels deep into the web site.

The Solution

Dynamic web sites are a good thing. They allow a programmer to set up the structure of a web site, then let mere mortals update the site and add new content every day. To get spiders to fully appreciate your dynamic site, you are going to have to make your dynamic URLs indistinguishable from static ones. This requires a bit of wizardry and planning, but in the end, your site will be easier to use by robots and humans alike.

The Webserver

You need a way to rewrite fictitious "static" looking URLs into real dynamic URLs. That's the trick. All of the major webservers provide a way to map a static URL to the true dynamic URL that your web application needs. On Apache, there is a module called "mod_rewrite" which does this task beautifully.

First, if you have not done so already, you should install an administration tool called "WebMin." This is a very powerful web-based interface to many Unix server applications such as Apache. I'd write about how to install it, but Ward Mundy has already done a great job of that in his article titled "ISP-In-A-Box: The $500 Mac mini (Chapter V, WebMin)." Follow the link to Ward's tutorial, then scroll down the page a bit to see the part where he instructs you how to install WebMin.

Using WebMin is pretty intuitive. Still, it will take some time to learn and get acquainted with it. Everyone has a different agenda with their web site, so I won't even attempt to advise you on how to best configure your server. I will, however, point out that Apache has the ability to create "virtual servers." This allows you to serve two or more sites from just one physical computer. You'll need to configure your DNS server to point each of your web domains to this one computer. Then, using WebMin to configure Apache, you can have as many virtual servers as you want. Here is an article which gives an overview of the virtual server setup process.

The mod_rewrite engine: now for the heavy lifting. The mod_rewrite rules do the mapping between the fake static URLs and the true dynamic ones. Each rule uses a "regular expression" to match parts of a fake static URL so that you can map them to variable names. Regular expressions allow for some pretty amazing matching, but you don't need to master the language because a few simple rules will get you going. Use the outline below as a guide:

  • "^": the caret character anchors the start of the matching string
  • "$": the dollar sign character anchors the end of the matching string
  • "(.*)": this series of characters captures one variable (any run of characters)
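
These three symbols behave the same way in most regular expression engines, so you can try them out away from Apache. The following is a minimal Java sketch using java.util.regex. The class name RegexBasics and the page name "AboutUs" are made up for illustration and are not part of the Jewelry Luv code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: RegexBasics is a hypothetical name, not part of the tutorial's download.
class RegexBasics {
    // "^" and "$" pin the pattern to the whole path; "(.*)" captures one variable.
    static final Pattern PAGE = Pattern.compile("^/fashion/pageWithName/(.*)/$");

    // Returns the captured variable, or null when the path does not match.
    static String pageName(String path) {
        Matcher m = PAGE.matcher(path);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // "AboutUs" is an invented page name used only for this demo.
        System.out.println(pageName("/fashion/pageWithName/AboutUs/"));   // matches
        System.out.println(pageName("/fashion/pageWithName/AboutUs"));    // no trailing slash
        System.out.println(pageName("/x/fashion/pageWithName/AboutUs/")); // wrong start
    }
}
```

Only the first path matches. The second fails because the pattern demands a trailing slash before "$", and the third fails because "^" requires the path to begin with "/fashion".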

Example 1 - match your domain to a dynamic start page
For the URL http://www.jewelryluv.com/, we want to transparently redirect users to our dynamic home page at:
http://www.jewelryluv.com/cgi-bin/WebObjects/Jewelry.woa/wa/default
To do that, we want to use the regular expression "^/$" which matches just the single trailing slash after our domain name. So we put the regular expression on the left and the mapping rule on the right like so:
RewriteRule ^/$ /cgi-bin/WebObjects/Jewelry.woa/wa/default [L,PT]

Example 2 - map two variables in a static URL to a dynamic URL
Here we want to map two variables found in the static URL, using the "(.*)" syntax to denote one variable. We put that syntax in two places on the left-hand side. Then, on the right-hand side, we use "$1" and "$2" to denote "the first match" and "the second match" respectively:
RewriteRule ^/fashion/showDetail/(.*)/(.*)/$ /cgi-bin/WebObjects/Jewelry.woa/wa/showDetail?parentMenuIndex=$1&merchandiseID=$2 [L,PT]
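
If you want to see what a rule like this produces before touching Apache, the substitution step can be approximated in Java, whose replaceFirst() happens to use the same "$1" and "$2" notation. This is only a sketch of the matching and substitution, not of mod_rewrite itself. The RewriteDemo class and the sample values 3 and 1042 are invented for the demo.

```java
import java.util.regex.Pattern;

// Illustrative only: RewriteDemo is a hypothetical helper, not an Apache or WebObjects API.
class RewriteDemo {
    // Left-hand side of the RewriteRule from Example 2.
    static final Pattern RULE = Pattern.compile("^/fashion/showDetail/(.*)/(.*)/$");

    // Right-hand side of the same rule; $1 and $2 refer to the two captured variables.
    static final String TARGET =
        "/cgi-bin/WebObjects/Jewelry.woa/wa/showDetail?parentMenuIndex=$1&merchandiseID=$2";

    // Mimics the substitution step; returns the path unchanged when the rule does not match.
    static String rewrite(String path) {
        return RULE.matcher(path).replaceFirst(TARGET);
    }

    public static void main(String[] args) {
        // 3 and 1042 are made-up values for parentMenuIndex and merchandiseID.
        System.out.println(rewrite("/fashion/showDetail/3/1042/"));
    }
}
```

Because "(.*)" will happily match slashes too, a stricter pattern such as "([^/]+)" may be a safer choice if your variables can never contain a slash.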

For your reference, the complete set of rewrite rules for Jewelry Luv is shown below. These were taken directly from our virtual server directives via the WebMin interface:

ServerName www.jewelryluv.com
DirectoryIndex jewelryluv-redirect.html
RewriteEngine ON
RewriteRule ^/$ /cgi-bin/WebObjects/Jewelry.woa/wa/default [L,PT]
RewriteRule ^/fashion/pageWithName/(.*)/$ /cgi-bin/WebObjects/Jewelry.woa/wa/pageWithName?pageName=$1 [L,PT]
RewriteRule ^/fashion/default/$ /cgi-bin/WebObjects/Jewelry.woa/wa/default [L,PT]
RewriteRule ^/fashion/linkToNextPageFromMenu/(.*)/$ /cgi-bin/WebObjects/Jewelry.woa/wa/linkToNextPageFromMenu?link=$1 [L,PT]
RewriteRule ^/fashion/linkToNextPageFromSubMenu/(.*)/(.*)/$ /cgi-bin/WebObjects/Jewelry.woa/wa/linkToNextPageFromSubMenu?title=$1&link=$2 [L,PT]
RewriteRule ^/fashion/showDetail/(.*)/(.*)/$ /cgi-bin/WebObjects/Jewelry.woa/wa/showDetail?parentMenuIndex=$1&merchandiseID=$2 [L,PT]
RewriteRule ^/fashion/showBlowUp/(.*)/(.*)/(.*)/$ /cgi-bin/WebObjects/Jewelry.woa/wa/showBlowUp?parentMenuIndex=$1&merchandiseID=$2&imageID=$3 [L,PT]
RewriteRule ^/fashion/addToCart/(.*)/(.*)/$ /cgi-bin/WebObjects/Jewelry.woa/wa/addToCart?itemCodeStr=$1&pageName=$2 [L,PT]
RewriteRule ^/fashion/miscLinks/(.*)/$ /cgi-bin/WebObjects/Jewelry.woa/wa/miscLinks?relation=$1 [L,PT]
RewriteRule ^/fashion/defaultWithSpecifiedCSS/(.*)/$ /cgi-bin/WebObjects/Jewelry.woa/wa/defaultWithSpecifiedCSS?cssPropFileStr=$1 [L,PT]
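
The [L] flag on each rule tells mod_rewrite to stop processing further rules once that rule has matched, so the rules act as an ordered list where the first match wins. A rough way to check a rule set offline is sketched below. RuleTable is a hypothetical test helper written for this tutorial, it copies only three of the rules listed above, and the sample paths are invented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: RuleTable is a hypothetical test helper, not part of Apache or WebObjects.
class RuleTable {
    private final List<Pattern> patterns = new ArrayList<>();
    private final List<String> targets = new ArrayList<>();

    RuleTable add(String regex, String target) {
        patterns.add(Pattern.compile(regex));
        targets.add(target);
        return this;
    }

    // Applies the first rule that matches (similar to [L]); returns null if no rule matches.
    String apply(String path) {
        for (int i = 0; i < patterns.size(); i++) {
            Matcher m = patterns.get(i).matcher(path);
            if (m.find()) {
                return m.replaceFirst(targets.get(i));
            }
        }
        return null;
    }

    // Three of the rules from the listing above, kept in the same order.
    static RuleTable jewelrySubset() {
        String wa = "/cgi-bin/WebObjects/Jewelry.woa/wa/";
        return new RuleTable()
            .add("^/$", wa + "default")
            .add("^/fashion/showDetail/(.*)/(.*)/$",
                 wa + "showDetail?parentMenuIndex=$1&merchandiseID=$2")
            .add("^/fashion/showBlowUp/(.*)/(.*)/(.*)/$",
                 wa + "showBlowUp?parentMenuIndex=$1&merchandiseID=$2&imageID=$3");
    }

    public static void main(String[] args) {
        RuleTable rules = jewelrySubset();
        System.out.println(rules.apply("/"));
        System.out.println(rules.apply("/fashion/showBlowUp/3/1042/7/"));
        System.out.println(rules.apply("/images/logo.gif")); // no rule applies
    }
}
```

A path that matches no rule is left alone by Apache and served normally; the sketch signals that case by returning null.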


The Web Application

The preceding section showed you what mod_rewrite rules are and how to configure them. It did not tell you how your WebObjects application was going to generate those static URLs. Here is the highlight of this tutorial, where we use the object-oriented design of WebObjects to make your application produce those static URLs with hardly any effort.

If you have not done so already, download our ModRewriteWOHyperlink.java class now. Add it to your WebObjects project alongside Application.java. Be sure to select "Application Server" as your target.

Next, open up Application.java and make a few additions:

1) Add the following line to the list of import statements at the top of the file.

import com.webobjects.appserver._private.*;

2) Add the following method:

/**
 * Sets the class registered for the name <code>className</code> to the given class.
 * Changes the private WebObjects class cache.
 *
 * @param clazz class object
 * @param className name for the class - normally clazz.getName()
 */
public static void setClassForName(Class clazz, String className) {
    _NSUtilities.setClassForName(clazz, className);
}

3) Add the following two lines to the end of the Application() constructor method:

// Use our modified version of WOHyperlink
setClassForName(ModRewriteWOHyperlink.class, "WOHyperlink");

4) Compile your project and then move it out to your production server. Log into the WebObjects "Monitor" application with your Web browser and then click on the "configuration" button. Scroll down to the section labeled "Additional Arguments" and add the following:

-WODirectConnectEnabled NO

5) You are done! View your live application with your web browser and see all the beautiful static URLs it creates. Use the WebMin interface to Apache to add additional mod_rewrite rules as needed. During development, your application will use the true dynamic URLs. When you push your app live, it will automagically start using static URLs. Life is sweet.

Why This Works

Notice that you did very little to hook in our ModRewriteWOHyperlink class. That's the real beauty here. Not only do you not have to go through any troublesome maneuvers to start showing static URLs, but you can go on using WOHyperlinks in WebObjects Builder just as you always have.

What you've done is replace WOHyperlink with a new version that knows when to show static URLs and when to show dynamic URLs. We re-implemented all the methods in WOHyperlink, and added a few more. We would have preferred to extend the existing WOHyperlink class, and not rewrite all the other methods, but we didn't have access to a few important (but private) member variables. In step #3, the call to setClassForName(), you are telling WebObjects to use our new class instead of the default WOHyperlink class.
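
To picture why one call is enough, it may help to think of a name-to-class lookup table that the framework consults whenever it needs to build a component. The sketch below is a simplified model of that idea only. Every name in it (ClassRegistryDemo, Link, DefaultLink, RewritingLink) is hypothetical, and it makes no claim about how WebObjects implements its private class cache.

```java
import java.util.HashMap;
import java.util.Map;

// A simplified model of a name-to-class cache. All names here are hypothetical;
// this is not how WebObjects implements _NSUtilities internally.
class ClassRegistryDemo {
    interface Link { String render(); }

    static class DefaultLink implements Link {
        public String render() { return "dynamic"; }
    }

    static class RewritingLink implements Link {
        public String render() { return "static"; }
    }

    private final Map<String, Class<? extends Link>> cache = new HashMap<>();

    ClassRegistryDemo() {
        cache.put("Link", DefaultLink.class); // the default registration
    }

    // Swap the class that a given name resolves to.
    void setClassForName(Class<? extends Link> clazz, String className) {
        cache.put(className, clazz);
    }

    // The framework side: look the name up, then instantiate whatever is registered.
    Link newInstance(String className) {
        try {
            return cache.get(className).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + className, e);
        }
    }

    public static void main(String[] args) {
        ClassRegistryDemo registry = new ClassRegistryDemo();
        System.out.println(registry.newInstance("Link").render());
        registry.setClassForName(RewritingLink.class, "Link");
        System.out.println(registry.newInstance("Link").render());
    }
}
```

Code that asks for "Link" by name never changes, yet it receives the replacement class after the swap. That is the same reason your existing WOHyperlink bindings keep working untouched.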

modRewritePathStr: this is the name of a public String that you can set using Key-Value Coding. It is located at the top of ModRewriteWOHyperlink.java. The default is "fashion" but you may change it to anything else you want. Since you have the source code, you could even hard code a new value. This shows up in your static URLs right after your domain name. For example: http://www.jewelryluv.com/fashion/.

modRewriteString(): this helper method produces the "static" URL as a String. It is listed in its entirety below (notice it uses modRewritePathStr):

protected String modRewriteString( WOContext wocontext )
{
    String actionStr = computeActionStringInContext(_actionClass, _directActionName, wocontext);
    NSDictionary nsdictionary = computeQueryDictionaryInContext(_actionClass, _directActionName, _queryDictionary, _actionClass != null, _otherQueryAssociations, wocontext);
    String actionVars = computeActionVariables( nsdictionary );
    String fullPath = "/" + modRewritePathStr + "/" + actionStr + "/" + actionVars;
    //System.out.println( "fullPath = " + fullPath );
    return fullPath;
}
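
The body of computeActionVariables() is not shown in this tutorial, so here is a standalone sketch of what the path assembly might look like. It assumes the helper simply joins the query values with slashes, in the same order the rewrite rules expect them. That assumption is inferred from the rules earlier in this tutorial rather than taken from the class itself, and StaticPathDemo, actionVariables() and the sample values are all invented.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: a guess at the shape of the path-building logic, not the real class.
class StaticPathDemo {
    static String modRewritePathStr = "fashion";

    // Hypothetical stand-in for computeActionVariables(): each value, in insertion
    // order, followed by "/". Values are assumed to be URL-safe already.
    static String actionVariables(Map<String, String> query) {
        StringBuilder sb = new StringBuilder();
        for (String value : query.values()) {
            sb.append(value).append('/');
        }
        return sb.toString();
    }

    // Same concatenation as modRewriteString(): "/" + prefix + "/" + action + "/" + variables.
    static String staticPath(String actionName, Map<String, String> query) {
        return "/" + modRewritePathStr + "/" + actionName + "/" + actionVariables(query);
    }

    public static void main(String[] args) {
        Map<String, String> query = new LinkedHashMap<>();
        query.put("parentMenuIndex", "3");    // made-up value
        query.put("merchandiseID", "1042");   // made-up value
        System.out.println(staticPath("showDetail", query));
    }
}
```

Whatever the real method does, the order of the values in the path has to agree with the order of "$1", "$2" and so on in the matching RewriteRule, or the wrong values will land in the wrong variables.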

appendAttributesToResponse(): this method is called when the object oriented design asks a component to print itself. It's like the "toString()" method, except that rather than printing a textual representation of itself, it prints the HTML necessary to render this object. Here it acts like a traffic cop. There are a couple of if-then-else blocks. The first "if" block checks to see if we have an explicitly declared href. Next we check to see if one of three scenarios holds true and, if so, use the origAppendAttributesToResponse() method. Here are the three scenarios:

  1. isDirectConnectEnabled: if true, this means that we are in rapid turnaround mode and actively developing our application. More specifically, if true, we can connect directly to our app and not have to go through Apache. We specifically set this value to "NO" in our live production app. Therefore our production app will always return false for this variable.
  2. isUsingSession: if true, this means that the user has a session, which is special memory created just for them. Most likely they have started using a shopping cart, have logged in, etc. In that case we want to show the dynamic URL, because we need to tack on a wosid variable to identify the user. Also, spiders won't travel a URL that carries a GUID such as a session ID, and you wouldn't want them to anyway.
  3. hasExplicitHref: if true, it means the URL is already defined and we don't need to do any processing.

If none of the three scenarios above holds true, we fall into the "else" block. There we first call "super.appendAttributesToResponse()" to add any misc items like "target=_blank" or "rel=nofollow" to our anchor tag. In WebObjects, you can bind any number of additional attributes such as "target" and "rel" to your components. Finally, we build the "href" attribute itself and call "modRewriteString()" to populate it. Here is the code:

public void appendAttributesToResponse(WOResponse woresponse, 
					WOContext wocontext)
{
    boolean isDirectConnectEnabled = 
	wocontext.component().application().isDirectConnectEnabled();
    String aWOsid = (String) wocontext.request().formValueForKey( "wosid" );
    boolean isUsingSession = 
	wocontext.component().hasSession() || aWOsid != null;
    boolean hasExplicitHref = false;
    if( _href != null )
    {
        Object obj1 = _href.valueInComponent( wocontext.component() );
        if( obj1 != null )
            hasExplicitHref = true;
    }
//System.out.println( "isDirectConnectEnabled = " + isDirectConnectEnabled + "\n" );
//System.out.println( "isUsingSession = " + isUsingSession + "\n" );
//System.out.println( "hasExplicitHref = " + hasExplicitHref + "\n\n" );
    if ( isDirectConnectEnabled || isUsingSession || hasExplicitHref )
        origAppendAttributesToResponse( woresponse, wocontext );
    else
    {
        super.appendAttributesToResponse(woresponse, wocontext);
        //Do mod_rewrite
        woresponse.appendContentCharacter(' ');
        woresponse._appendContentAsciiString("href");
        woresponse.appendContentCharacter('=');
        woresponse.appendContentCharacter('"');
        woresponse._appendContentAsciiString(modRewriteString(wocontext));
        woresponse.appendContentCharacter('"');
    }
}
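
The traffic cop logic above can also be looked at on its own, away from WebObjects, which makes it easy to test. The following sketch restates the decision and the href output as two plain Java methods. LinkModeDemo and its method names are hypothetical and exist only for this illustration.

```java
// Illustrative only: a framework-free restatement of the decision made above.
class LinkModeDemo {
    // True when the original (dynamic) WOHyperlink behavior should be used.
    static boolean useDynamicUrl(boolean directConnectEnabled,
                                 boolean usingSession,
                                 boolean explicitHref) {
        return directConnectEnabled || usingSession || explicitHref;
    }

    // Produces the same characters the "else" block appends: space, href, =, quoted path.
    static String hrefAttribute(String staticPath) {
        return " href=\"" + staticPath + "\"";
    }

    public static void main(String[] args) {
        // Production, no session, no explicit href: the static URL is used.
        if (!useDynamicUrl(false, false, false)) {
            System.out.println("<a" + hrefAttribute("/fashion/default/") + ">Home</a>");
        }
    }
}
```

Any single true flag is enough to fall back to the dynamic URL, so a static URL appears only for anonymous visitors hitting the production app through Apache. Those are exactly the requests a spider makes.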

There you have it! All of your DirectActions will automagically be created as "static" URLs when you move your project to production. While you are in development, you'll be working with the true dynamic URLs. Once in production, all you have to do is add the new mod_rewrite rules to Apache via the WebMin interface.