Wednesday, September 18, 2024

Apache Tricks

Apache, one of the most widely used web servers on earth, is distributed with a large number of modules, some of which are included by default when you compile the package, and some of which aren’t. One of those you can optionally compile in is mod_vhost_alias, which I find particularly useful. It’s been available since Apache version 1.3.7.

Mod_vhost_alias adds dynamic virtual hosting; that is, we don’t need a VirtualHost declaration for each site. Using mod_vhost_alias directives, we can map requests to document paths based on the IP address or the host name of the requested resource, and return documents from the correct site. The advantage of this capability is that you can add any number of web sites without modifying and reloading the Apache configuration time and time again.

The extra directives available to us when mod_vhost_alias is present are: VirtualDocumentRoot, VirtualDocumentRootIP, VirtualScriptAlias, and VirtualScriptAliasIP.

Those ending with ‘IP’ use the IP address of the server the request arrived on, whereas the others use the host name. Since IP addresses are hard to come by these days (at least until IPv6, the 128-bit addressing protocol, becomes the norm), we’ll concentrate on VirtualDocumentRoot and VirtualScriptAlias to maximize the number of sites we host under one IP. If you need the IP versions of these directives, you shouldn’t have trouble figuring them out, since they work exactly the same way but interpolate the IP address instead of the host name.

Let’s start with an example from an imaginary configuration file…

<VirtualHost 208.231.13.91>
CustomLog /www/logs/208.231.13.91.log vcommon
UseCanonicalName Off
VirtualDocumentRoot /www/webdocs/%-2/
VirtualScriptAlias /www/scripts/%-2/
</VirtualHost>

You’ll first notice the “UseCanonicalName Off” core directive. It is mandatory for our purposes: it tells Apache to use the host name supplied by the client (in the Host header) rather than the value of a ServerName directive, or a name Apache derives when ServerName is absent. You’ll also notice that all our sites’ web documents must be in a sub-directory of /www/webdocs and all their scripts in a sub-directory of /www/scripts. For each site you add, you only have to create these sub-directories and a DNS zone that points the domain at 208.231.13.91; the Apache configuration itself stays untouched. Of course, you can use your own paths.
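The per-site directory chores can be scripted. Here is a minimal Python sketch, assuming the /www/webdocs and /www/scripts layout from the example; `provision_site` is a hypothetical helper name, and the DNS zone still has to be added separately:

```python
from pathlib import Path


def provision_site(label: str,
                   webdocs_root: str = "/www/webdocs",
                   scripts_root: str = "/www/scripts") -> tuple[Path, Path]:
    """Create the document and script directories for one hosted site.

    `label` is the part of the host name that %-2 extracts, e.g.
    "xyzcgrf" for www.xyzcgrf.com. Hypothetical helper, not part of Apache.
    """
    # Refuse labels that would escape the two roots.
    if not label or "/" in label or label in (".", ".."):
        raise ValueError(f"bad site label: {label!r}")
    docs = Path(webdocs_root) / label
    scripts = Path(scripts_root) / label
    # exist_ok=True makes re-running the helper for the same site harmless.
    docs.mkdir(parents=True, exist_ok=True)
    scripts.mkdir(parents=True, exist_ok=True)
    return docs, scripts
```

Calling provision_site("xyzcgrf") would create /www/webdocs/xyzcgrf and /www/scripts/xyzcgrf, which is all mod_vhost_alias needs on the file system side.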

More important are the strange %-2 in the paths. This is an interpolation specifier that extracts a part of the host name and uses it to build the document path. The parts of the host name are delimited by the ‘.’ characters it contains, so www.xyzcgrf.com has 3 parts. ‘%-2’ means “extract the second-to-last part of the host name”. Again using our example, suppose we have a request for “http://www.xyzcgrf.com/hello.htm”. Since www.xyzcgrf.com resolves to 208.231.13.91, Apache would use our VirtualHost declaration and would replace ‘%-2’ with ‘xyzcgrf’, as the latter is the second-to-last part of the host name. The document to return would thus be found at /www/webdocs/xyzcgrf/hello.htm.

Continuing with our example, a request for the script “http://www.xyzcgrf.com/cgi-bin/test.cgi” would execute the script found at /www/scripts/xyzcgrf/test.cgi. Note that even if the requested URL omitted the ‘www.’, it would still be rewritten correctly, because xyzcgrf remains the second-to-last part of the host name.

While the above example would serve you well for dot-com and dot-net domains, it would not fare well for country-code domains. A request for http://xfxfvc.co.uk and one for http://ghtrrd.co.uk would both translate to a path of /www/webdocs/co/, which is obviously not what we intended. Fortunately, there’s a whole slew of interpolation specifiers we can use, so we’re not stuck. Here are some of them, using the host name www.bogus.xyzcgrf.com as an example.
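To make the %-2 behaviour concrete, here is a minimal Python sketch that mimics only the “VirtualDocumentRoot /www/webdocs/%-2/” line from our example. `resolve_doc_path` is a hypothetical name, and the sketch ignores everything else Apache does with a request:

```python
def resolve_doc_path(host: str, url_path: str,
                     root: str = "/www/webdocs") -> str:
    """Mimic "VirtualDocumentRoot /www/webdocs/%-2/" for one request.

    Only the %-2 specifier from the example is handled.
    """
    parts = host.split(".")
    # %-2 is the second-to-last dot-separated part; mod_vhost_alias
    # documents a single "_" when there are not enough parts.
    label = parts[-2] if len(parts) >= 2 else "_"
    return f"{root}/{label}{url_path}"
```

With this sketch, www.xyzcgrf.com and xyzcgrf.com both map /hello.htm to /www/webdocs/xyzcgrf/hello.htm, while xfxfvc.co.uk and ghtrrd.co.uk both land in /www/webdocs/co/, which is the collision described for country-code domains.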

%0 : use the whole name [www.bogus.xyzcgrf.com]
%1 : use the first part [www]
%2 : use the second part [bogus]
%3 : use the third part [xyzcgrf]
%-1 : use the last part [com]
%-3 : use the third to last part [bogus]
%2+ : use the second and all subsequent parts [bogus.xyzcgrf.com]
%3+ : use the third and all subsequent parts [xyzcgrf.com]
…etc…
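The selectors in the list follow a simple rule, spelled out in the Python sketch below. It assumes the part forms documented for mod_vhost_alias (0, N, -N, N+ and -N+, plus a single underscore when N exceeds the number of parts); `select_parts` is a hypothetical name, and the %N.M form is left out:

```python
def select_parts(host: str, spec: str) -> str:
    """Interpret one mod_vhost_alias part specifier (the N in %N)."""
    parts = host.split(".")
    plus = spec.endswith("+")
    n = int(spec[:-1] if plus else spec)
    if n == 0:
        return host                      # %0: the whole name
    count = len(parts)
    if abs(n) > count:
        return "_"                       # not enough parts
    # Convert to a 0-based index; negative N counts from the right.
    i = n - 1 if n > 0 else count + n
    if not plus:
        return parts[i]
    if n > 0:
        return ".".join(parts[i:])       # N+: this part and all after it
    return ".".join(parts[: i + 1])      # -N+: this part and all before it
```

For www.bogus.xyzcgrf.com it reproduces every entry in the list, and select_parts("xfxfvc.co.uk", "-3") returns "xfxfvc", which is why counting three parts from the right sorts out the .co.uk case.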

You can also go nuts by extracting a part from a host name, then extracting a part of that part. In the second step, the part is a single character or a sequence of characters. We do this with the format ‘%N.M’, where N selects the part of the host name and M selects characters within it. The ‘.’ is mandatory when M is present, but you omit the ‘%’ before M. For example…

VirtualDocumentRoot /www/webdocs/%-2.2/

…if we put the URL http://www.bogus.xyzcgrf.com through this directive, we get /www/webdocs/y/. That’s because %-2 extracted xyzcgrf, and the ‘2’ gave us its second letter, ‘y’. I never needed this capability, but maybe you’ll find some use for it. If what truly interests you is extreme URL rewriting, another module, mod_rewrite (rewrite_module), lets you slice and dice URLs any which way. We’ll explore its possibilities in the near future.
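The two-step selection can be sketched in a few lines of Python. This minimal version, with the hypothetical name `interpolate_nm`, handles only plain integer N and M (not the ‘+’ forms) and assumes M counts characters the way N counts parts, with 0 meaning “the whole part”:

```python
def interpolate_nm(host: str, n: int, m: int = 0) -> str:
    """Expand %N.M for plain integer N and M (the "+" forms are left out).

    N picks a dot-separated part of the host name (negative counts from
    the right); M then picks one character of that part. M == 0 keeps
    the whole part, which is what a bare %N does.
    """
    parts = host.split(".")
    if n == 0:
        part = host
    elif abs(n) > len(parts):
        return "_"                       # not enough parts
    else:
        part = parts[n - 1] if n > 0 else parts[n]
    if m == 0:
        return part
    if abs(m) > len(part):
        return "_"                       # not enough characters
    return part[m - 1] if m > 0 else part[m]
```

For instance, "/www/webdocs/" + interpolate_nm("www.bogus.xyzcgrf.com", -2, 2) + "/" gives /www/webdocs/y/, matching the result of the %-2.2 directive.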

To rewrite paths correctly for xfxfvc.co.uk-type host names, which our first example fumbled, we need to give them a different IP address and a separate VirtualHost declaration. Our vhost directives in that one would look like this…

VirtualDocumentRoot /www/webdocs/%-3/
VirtualScriptAlias /www/scripts/%-3/

Thus, we’d extract the xfxfvc rather than the ‘co’. I believe in keeping things simple, so I avoid using anything but the %- interpolations: counting from the right is reliable, because the domain name and its suffix always sit at the end of a host name, whereas counting from the left isn’t, since a prefix such as ‘www.’ may or may not be present.

One drawback to using mod_vhost_alias is that you can’t have individual log files for each site sharing the IP; instead, all of them log to the same files. We compensate for that by creating a special logging definition that includes a field recording the host name of each request. Thus, for example, the instruction…

LogFormat "%V %h %l %u %t \"%r\" %s %b" vcommon

…appears somewhere in our configuration file; vcommon is the format name our CustomLog directive refers to. Because UseCanonicalName is Off, the “%V” tells Apache to log the host name requested by the client [See: mod_log_config].

With the LogFormat we’ve just defined, your log might look something like this…

www.rtzxfgh.com 172.128.55.43 - - [03/May/2001:10:21:48 -0400] "GET /bogus.htm HTTP/1.1" 200 0
www.ouwqagxh.com 172.128.55.44 - - [03/May/2001:10:21:49 -0400] "GET / HTTP/1.1" 200 0

To generate individual traffic reports for each site sharing the log file, you need a statistical tool that can parse such a log. We’ll compare various log analyzers in a future issue.
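Until then, even a few lines of Python can split the shared log by site. This minimal sketch assumes every line follows the vcommon format defined above; `traffic_per_site` is a hypothetical name, and a real log analyzer handles far more (escaped quotes in requests, log rotation, reporting):

```python
import re
from collections import Counter

# One line written by: LogFormat "%V %h %l %u %t \"%r\" %s %b" vcommon
# Simplified: assumes the request line contains no escaped quotes.
VCOMMON = re.compile(
    r'^(?P<vhost>\S+) (?P<client>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)$'
)


def traffic_per_site(lines):
    """Return (hits, bytes_sent) Counters keyed by virtual host name."""
    hits, sent = Counter(), Counter()
    for line in lines:
        m = VCOMMON.match(line.rstrip("\n"))
        if not m:
            continue  # skip lines that do not match the vcommon layout
        # Host names are case-insensitive, so fold case before grouping.
        site = m.group("vhost").lower()
        hits[site] += 1
        # %b logs "-" instead of 0 when no body bytes were sent.
        b = m.group("bytes")
        sent[site] += 0 if b == "-" else int(b)
    return hits, sent
```

Because it accepts any iterable of lines, you could pass it an open file such as /www/logs/208.231.13.91.log from our VirtualHost. Fed the two sample entries above (each as a single line), it reports one hit each for www.rtzxfgh.com and www.ouwqagxh.com.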

For more on the subjects discussed today, see the Apache documentation for:

mod_vhost_alias
UseCanonicalName
mod_log_config

