Monday, April 16, 2007

Apache How to

Installing the Unix version takes a bit of extra work. You'll first need to decompress the files using a command like this:

gzip -d apache_1.3.1.tar.gz

That leaves you with a file called apache_1.3.1.tar. To extract the Apache files from this file you need to use the following:

tar xvf apache_1.3.1.tar

Certain versions of Unix may require you to type tar -xvf, though most modern versions of Unix should let you omit the minus sign and use just tar xvf.
This will create a subdirectory called apache_1.3.1. Go to that directory, and then compile Apache.
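If you'd rather not leave the intermediate .tar file on disk, the two steps above can be combined into one pipeline. Here is a minimal sketch; the unpack_tarball name and its arguments are ours for illustration and are not part of the Apache distribution.

```shell
# Decompress and extract a gzipped tarball in one step.
# Usage: unpack_tarball ARCHIVE [DEST_DIR]   (hypothetical helper)
unpack_tarball() {
    archive=$1
    dest=${2:-.}
    mkdir -p "$dest" || return 1
    # gzip -dc writes the decompressed tar stream to stdout and leaves
    # the .tar.gz file untouched; tar reads that stream from stdin ("-").
    gzip -dc "$archive" | (cd "$dest" && tar xf -)
}
```

Calling unpack_tarball apache_1.3.1.tar.gz in the download directory should leave you with the same apache_1.3.1 subdirectory as the two separate commands.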

Compile Apache

There are precompiled versions of Apache available for many Unix platforms. For example, we used Red Hat Linux 5.1 as our Unix test platform and could have installed a prepackaged executable of Apache available from the Red Hat FTP site.
However, most Web builders download the source version of Apache and compile it themselves in order to gain control over the initial configuration. Once you've downloaded the source and extracted it to a directory on your local disk drive, you can change to that new directory and run the Configure program:

./configure

This creates a makefile using the default configuration file, src/Configuration.tmpl. If you want to make changes, such as including a module that's normally commented out of Apache, make that change in the Configuration.tmpl file before running Configure.
Now you need to issue two commands, one to build Apache, and the other to install it in the appropriate directories:

make && make install

This will create an Apache installation in the /usr/local/apache directory, unless you've overridden that default directory during the previous configuration step. For more on customizing Configure, check out the README.configure file that comes with Apache.
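The whole build sequence can be wrapped so that each step runs only if the previous one succeeded. The sketch below assumes the --prefix option described in README.configure; build_apache and its two arguments are names we made up for this example.

```shell
# Configure, build, and install from an unpacked source tree,
# stopping at the first step that fails.
# Usage: build_apache SRC_DIR [PREFIX]   (hypothetical helper)
build_apache() {
    src_dir=$1
    prefix=${2:-/usr/local/apache}
    (
        cd "$src_dir" || exit 1
        # && runs the next command only when the previous one exits with 0,
        # so a failed compile never reaches "make install".
        ./configure --prefix="$prefix" && make && make install
    )
}
```

Running the steps in a subshell keeps your current directory unchanged once the build finishes.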
Once you've installed the Apache binaries, you're ready to learn how to run Apache.

Increase DOS environment space

Compiling Apache 1.3.1 for Windows requires the Visual C++ compiler from Microsoft. The easiest way to run the compilation scripts is from a command prompt, also known in Windows 95 as a DOS window. And the easiest way to run the Visual C++ command-line tools is by setting a number of environment variables. The Visual C++ set-up program normally puts all the environment settings in a special BAT file called vcvars32.bat. All you have to do is find and run that file (normally in the \Program Files\DevStudio\VC\bin directory) to set up your compilation environment.

On most Windows 95 and 98 installations, you don't usually have enough environment space to store all the compiler settings in vcvars32.bat. When you run the batch file, you'll get a bunch of "Out of environment space" warnings.
Digging back to the ancient days of DOS, the way you increase environment space is by adding something like the following line to your config.sys file:
shell=c:\command.com /e:1024 /p

You would replace the value 1024 with however many bytes of environment space you need. That's going to be a bit different from machine to machine, depending on what other software and settings you have, so you may want to play around with that value. Unfortunately, DOS requires you to reboot in order for any changes in config.sys to take effect, so you're in for a couple of reboots before you find the right amount of environment space.
Once you've got the program compiled, you're ready to try running your new Apache server.

Run Apache

Apache for Windows is run just like any other Windows application--click the appropriate shortcut in your Start menu. If you install Apache on Windows NT, you can configure it as an NT Service, which means it will start whenever NT is booted up. The Apache installation program installs a shortcut in your Start menu to let you configure this option.
On Unix, you have two basic options for running Apache. The first is as a standalone server, which is the normal mode of operation. The second is as an inetd server, which is useful if you're experimenting with various Apache configuration settings.
Apache comes preconfigured to run standalone. To double-check this for your copy of Apache, open the httpd.conf file (normally stored in the /usr/local/apache/etc directory). Look for this line:

ServerType standalone

The httpd.conf file is one of three Apache configuration files; the others are srm.conf and access.conf. The ServerType command is called a directive, an instruction that tells Apache how to execute. For now you just want to make sure the ServerType directive is set to standalone mode.
To actually run Apache, you execute the apachectl script. Assuming you used the default directories when you built Apache, the full command line would be:
/usr/local/apache/sbin/apachectl start

You would use apachectl stop to stop the server, and apachectl restart to restart it. If you want Unix to automatically start Apache whenever the computer boots up, copy the apachectl file into the /etc/rc.d/init.d directory.
While standalone mode is the simplest way to run Apache, there are times when you want to run it as an inetd service.

Configure inetd

In a production Web server, you'd most likely run Apache as a standalone server. But there are times when you might prefer to run it as an inetd server--on a test server with limited resources, for example.
Inetd is the Unix Internet daemon, which listens to the TCP/IP connection on your computer. Whenever inetd sees a call to a specific socket, it launches the program that's configured to handle that socket, such as Telnet (port 23) and SMTP (port 25). When the program finishes, inetd shuts it down.
This has the advantage of reusing limited resources--you don't need to have dozens of different programs running simultaneously, because inetd will run them only when needed. On a test server, where you're trying different configuration directives, using inetd means you don't have to manually restart the server each time you alter the configuration.
If you want to try inetd, the first step is to edit the httpd.conf file by changing the ServerType:
ServerType inetd

Now add the following line to the /etc/inetd.conf file, which tells inetd what kind of services it's running and where to find the appropriate executable files:
httpd stream tcp nowait root /usr/local/apache/sbin/httpd -f /usr/local/apache/etc/httpd.conf

This assumes you want to run Apache as the root user. That's not a good idea on a production server, but on a test server it's probably acceptable and definitely simpler.
The last configuration change you need to make is in the /etc/services file, which tells inetd which port to associate with the HTTPD service. We've assumed the default HTTP port, port 80:

httpd 80/tcp httpd

Next, you need to restart inetd. To do that, you'll need to first figure out its process ID (PID):

ps auxw | grep inetd

This runs the Unix ps command, which generates a list of running processes. That list is then piped to grep, which looks for any string that includes inetd. You'll probably get back two process IDs--one for inetd and one for grep inetd.
Finally, use the kill command to send a hangup (HUP) signal to the inetd PID, which makes inetd reread its configuration:

kill -HUP pid
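Picking the right PID out of the ps listing by eye is easy to get wrong, so here is a small sketch that does the filtering for you. It assumes the 11-column layout that ps auxw prints on Linux (PID in column 2, command in column 11); pids_of is a hypothetical helper name, not a standard command.

```shell
# Read "ps auxw"-style lines on stdin and print the PID of every process
# whose command name (ignoring any directory part) is exactly NAME.
# Usage: ps auxw | pids_of NAME   (hypothetical helper)
pids_of() {
    awk -v name="$1" '{
        # Assumption: field 2 is the PID and field 11 is the command path.
        n = split($11, parts, "/")
        if (parts[n] == name) print $2
    }'
}

# Example, run as root:
#   kill -HUP $(ps auxw | pids_of inetd)
```

Because the match is on the command name itself, the grep inetd entry that shows up in the plain ps listing is never selected.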

Once you get Apache running, it's time to figure out how to configure it.
Apache is controlled by a series of configuration files: httpd.conf, access.conf, and srm.conf (there's actually also a mime.types file, but you have to deal with that only when you're adding or removing MIME types from your server, which shouldn't be too often). The files contain instructions, called directives, that tell Apache how to run. Several companies offer GUI-based Apache front-ends, but it's easier to edit the configuration files by hand.
Remember to make back-up copies of all your Apache configuration files, in case one of the changes you make while experimenting renders the Web server inoperable.
Also, remember that configuration changes you make don't take effect until you restart Apache. If you've configured Apache to run as an inetd server, then you don't need to worry about restarting, since inetd will do that for you.
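Making those back-up copies is quick to script. The following is only a sketch: backup_conf is a name we invented, and the .bak suffix and timestamp format are arbitrary choices.

```shell
# Copy each of the three Apache configuration files that exists in
# CONF_DIR to NAME.STAMP.bak in the same directory.
# Usage: backup_conf CONF_DIR [STAMP]   (hypothetical helper)
backup_conf() {
    conf_dir=$1
    # Default stamp is the current date and time, e.g. 20070416-093000.
    stamp=${2:-$(date +%Y%m%d-%H%M%S)}
    for name in httpd.conf access.conf srm.conf; do
        if [ -f "$conf_dir/$name" ]; then
            # -p preserves the original modification time and permissions.
            cp -p "$conf_dir/$name" "$conf_dir/$name.$stamp.bak" || return 1
        fi
    done
}
```

With the default directories used in this story, you would run backup_conf /usr/local/apache/etc before each round of edits.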

Download the reference card

As with other open-source projects, Apache users share a wealth of information on the Web. Possibly the single most useful piece of Apache-related information--apart from the code itself, of course--is a two-page guide created by Andrew Ford.
Called the Apache Quick Reference Card, it's a PDF file (also available in PostScript) generated from a database of Apache directives. There are a lot of directives, and Ford's card gives you a handy reference to them.
While this may not seem like a tip on how to run Apache, it will make your Apache configuration go much smoother because you will have the directives in an easy-to-access format.
One quick note--we found that the PDF page was a bit larger than the printable area of our printer (an HP LaserJet 8000 N). So we set the Acrobat reader to scale-to-fit and the pages printed just fine.

Use one configuration file
The typical Apache user has to maintain three different configuration files--httpd.conf, access.conf, and srm.conf. These files contain the directives to control Apache's behavior.
The tips in this story keep the configuration files separate, since it's a handy way to compartmentalize the different directives. But Apache itself doesn't care--if you have a simple enough configuration or you just want the convenience of editing a single file, then you can place all the configuration directives in one file. That one file should be httpd.conf, since it is the first configuration file that Apache interprets. You'll have to include the following directives in httpd.conf:

AccessConfig /dev/null
ResourceConfig /dev/null

That way, Apache won't cough up an error message about the missing access.conf and srm.conf files. Of course, you'll also need to copy the directives from srm.conf and access.conf into your new httpd.conf file.
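The copying can be done in one pass. This sketch writes a combined file in the order Apache reads the originals (httpd.conf, then srm.conf, then access.conf); merge_conf and its arguments are hypothetical names, and the output file must not be one of the three inputs.

```shell
# Combine the three configuration files in CONF_DIR into OUT_FILE,
# adding the two directives that tell Apache to skip the separate files.
# Usage: merge_conf CONF_DIR OUT_FILE   (hypothetical helper)
merge_conf() {
    conf_dir=$1
    out=$2
    {
        cat "$conf_dir/httpd.conf"
        echo "AccessConfig /dev/null"
        echo "ResourceConfig /dev/null"
        cat "$conf_dir/srm.conf" "$conf_dir/access.conf"
    } > "$out"
}
```

Review the merged file before putting it in place as httpd.conf, since the sketch does nothing about directives that appear in more than one of the originals.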

Restrict access

Say you have document directories or files on your Web server that should be visible only to a select group of computers. One way to protect those pages is by using host-based authentication. In your access.conf file, you would add something like this:

order deny,allow
deny from all
allow from 10.10.64

These lines belong inside a <Directory> section. <Directory> is what's called a sectional directive: it encloses a group of directives that apply to the specified directory. The Apache Quick Reference Card includes a listing of sectional directives.
The above case allows only computers with an IP address starting with 10.10.64 to access the pages in the given directory. You can use the complete IP address, an IP range as shown here, or even use the DNS names. For example, to allow only CNET computers access to a specific file, you might do this in your access.conf file:

order deny,allow
deny from all
allow from .cnet.com

It's important to have that preceding period on the domain name; otherwise Apache allows only the computer that exactly matches cnet.com. If that's what you want, you can restrict to individual IP addresses and fully qualified domain names.
An interesting side-effect of host-based authentication is that if you're using a browser on the Web server machine itself and attempt to access the page through localhost, you'll be denied permission. That's because the localhost IP, 127.0.0.1, will not be in the .cnet.com range. You can easily add localhost to the permission list by putting the appropriate IP on the allow directive:
allow from .cnet.com 127.0.0.1
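
To make the matching rules concrete, here is a simplified model of them as a shell function. It only mirrors the behavior described in this story (partial IP addresses, domain names with a leading period, and exact names); it is not Apache's own matching code, and host_allowed is a name we made up.

```shell
# Return 0 if HOST matches any of the allow patterns, 1 otherwise.
# Usage: host_allowed HOST PATTERN...   (hypothetical helper)
host_allowed() {
    host=$1
    shift
    for pat in "$@"; do
        case $pat in
            .*)
                # Leading period: any host name ending in this domain.
                case $host in
                    *"$pat") return 0 ;;
                esac
                ;;
            *[!0-9.]*)
                # A name without a leading period: exact match only.
                if [ "$host" = "$pat" ]; then return 0; fi
                ;;
            *)
                # Digits and periods only: a full or partial IP address,
                # matched on whole octets.
                case $host in
                    "$pat" | "$pat".*) return 0 ;;
                esac
                ;;
        esac
    done
    return 1
}
```

For example, host_allowed 127.0.0.1 .cnet.com fails, while host_allowed 127.0.0.1 .cnet.com 127.0.0.1 succeeds, which is the localhost situation described above.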
The majority of security measures you will need to take when running a publicly accessible Web site will be set at the operating system level. You will want to make sure write access is restricted in the directories where your Web pages are stored to keep visitors from defacing your site.

Customize error messages

If a user requests a page that doesn't exist or is in a protected directory, Apache returns one of its built-in error messages that say things like Forbidden or Not Found. That's accurate, but not very informative. You may want to give your users more guidance as to what they did wrong, provide an alternative URL to get them back in your site, or at least offer an error page that fits in with your overall site design. With a bit of editing, you can make Apache return a custom error page or run a script to handle the error.

Open the srm.conf file and insert the following:
ErrorDocument 404 /error.html

Your server will now return the error.html page whenever a user requests a page that doesn't exist (which is what the 404 error code means--check out the Apache Quick Reference Card for a list of other HTTP 1.1 status codes). In this example, the destination of the directive is an HTML page, but you could also point to a CGI or even a URL from a different Web site.
Unless you include a full URL, the ErrorDocument directive uses a path relative to the document root of your Web server. So in our example, error.html must reside in the Apache document root. By default that document root is /usr/local/apache/share/htdocs. Also, when Apache actually serves up this error page it does so within the context of the erroneous URL. So if a user requested a nonexistent page (http://www.dummydomain.com/one/two/none.html), Apache returns error.html as if it resided in the /one/two directory. That means you need to be careful and fully qualify any relative paths to images or other pages in the error.html file. Otherwise you might serve an error page that itself contained errors.

Support multiple languages

HTTP 1.1 formally specified a feature called content negotiation, which had actually been around for a while in experimental servers, including early versions of Apache. It's a way to present documents in different languages and formats based on a user's browser configuration.
For example, suppose you're a Canadian company that needs to serve both French and English versions of your Web site. First, you must enable the feature by adding the appropriate directive to your access.conf file.
Open the access.conf file and find or create the <Directory> entry for the directory where you plan to store the multilanguage pages. Then add the Options MultiViews directive to that section. Remember that Options All does not actually mean all--it doesn't turn on MultiViews support. So you must explicitly declare your intention to use MultiViews. For example:
Options MultiViews

Next, you need to edit your srm.conf file to include the languages you want to support and the file extensions associated with each language. The Canadian example calls for English and French, which have the standard identifiers en and fr, respectively. Your srm.conf file should already have these, but if not, add the appropriate lines:
AddLanguage en .en
AddLanguage fr .fr
LanguagePriority en fr

The LanguagePriority directive is used when there's a tie during content negotiation. For example, if Apache can't tell whether the browser prefers English or French, the LanguagePriority directive tells Apache to serve the English version of the page.
For Apache to recognize which pages it should serve, you have to include the proper extension on your file names. If, for example, you want to offer a help file in two languages, you'd create a help.html.en and help.html.fr file with the appropriate language content. Then, when a user requests the http://yourdomain.com/multi/help.html file, Apache will check the browser's language preference and return the correct version.
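
The selection can be pictured as a simple search through the language-specific files. The sketch below is a rough model only: real content negotiation also weighs quality values sent by the browser, and pick_variant and its arguments are hypothetical names.

```shell
# Print the first existing BASE.LANG file, trying the browser's languages
# in order and then the LanguagePriority list as a fallback.
# Usage: pick_variant BASE "BROWSER_LANGS" "PRIORITY_LANGS"   (hypothetical)
pick_variant() {
    base=$1       # e.g. /usr/local/apache/share/htdocs/multi/help.html
    accepted=$2   # browser preference order, e.g. "fr en"
    priority=$3   # LanguagePriority order, e.g. "en fr"
    # The two lists are deliberately unquoted so the shell splits them
    # into one word per language code.
    for lang in $accepted $priority; do
        if [ -f "$base.$lang" ]; then
            echo "$base.$lang"
            return 0
        fi
    done
    return 1
}
```

A browser that asks only for a language you don't offer falls through to the LanguagePriority list, so with the settings above it would get the English page.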

Configure for server-side includes

If you want to take a small step beyond static HTML pages, but you aren't quite ready to dive into writing your own Perl scripts, then you should try server-side includes (SSI). With SSI turned on, Apache will preparse certain HTML files before sending them out, looking for special embedded commands. These commands allow you to do basic things like include the contents from another file or print out an environment variable.
To enable it, you first need to make sure it has been compiled into your version of Apache. Go to the directory where your httpd executable resides, typically /usr/local/apache/sbin, and type ./httpd -l. That should return a list of all the modules included in your build of Apache. Hopefully mod_include.c is in that list. If not, you'll have to rebuild Apache after removing the comment character from the mod_include line in the Configuration.tmpl file.
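
If the module list is long, you can let grep do the looking. This is a minimal sketch that assumes httpd -l prints one module file name per line; has_module is a hypothetical helper name.

```shell
# Read a module listing on stdin and succeed if NAME.c appears on a
# line of its own (leading and trailing spaces are ignored).
# Usage: ./httpd -l | has_module NAME   (hypothetical helper)
has_module() {
    grep -q "^[[:space:]]*$1\.c[[:space:]]*\$"
}

# Example:
#   if /usr/local/apache/sbin/httpd -l | has_module mod_include; then
#       echo "SSI support is compiled in"
#   fi
```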
Once you've determined that mod_include is available, you have to allow the execution of includes and map an appropriate filetype. As with all things Apache, there are about a gazillion ways to do this. Probably the easiest is to enable all the options in one place in your access.conf file:

Options +Includes
AddType text/html .shtml
AddHandler server-parsed .shtml

All files in the /usr/local/apache/share/htdocs/include directory that contain a .shtml extension get parsed by Apache before being sent out to a browser.

In many instances, the AddType and AddHandler directives are already in your srm.conf file, but they're commented out. So you could uncomment those, and in your access.conf file set the Options to allow executing include commands. Note the use of the plus sign in the Options directive--that tells Apache to add this option to any preceding options settings, rather than overriding them. If you want to limit the SSI support to prevent executing potentially dangerous programs, you might want to use Options +IncludesNOEXEC.

To test your settings, create a test.shtml file like this one:
<html>
<head><title>SSI Test</title></head>
<body bgcolor="white">
<h1>SSI Test</h1>
File last modified
<p>
<pre></pre>
<p>
</body>
</html>

Apache will attempt to parse any text that starts with a