A Multithreaded Web Server

This is a pair programming assignment. If you are on a team, this means that you and your partner should be doing the entirety of this assignment side-by-side, on a single computer, where one person is "driving" and the other is "navigating." Switch roles every so often; you should each spend approximately 50% of the time driving.

Introduction

For this project, you will implement a multithreaded web server. This project is designed to give you some practice writing client-server socket programs and writing multithreaded programs, as well as familiarizing you with the HTTP protocol.

Part 1: Sockets

Sockets allow computers to communicate over a network connection. A socket is similar to a file handle, except instead of sending data to and from disk, it sends it over the network.

Setup

If you are working on your own computer, download this zip file and unzip it somewhere. This is a scrape from an old version of the department website, downloaded a couple of years ago. This is the website that your web server will ultimately serve. These files are also available on the lab machines in /Accounts/courses/cs348w16/website, so you don't need to unzip them anywhere if you're working in the lab.

Initial socket code

Go through the Java tutorial on sockets, especially the knock-knock joke example. Get that code running, and ask any questions you have about it.

Your task

Modify the above knock-knock code so that when it tells a knock-knock joke, the punchline is the HTML code for one of the pages in the department website. (I admit, HTML is not very funny as a joke, but this will help you get your web server going.) The location of the website should be specified on the command line when running the server, so we should be able to test your code in two different terminal windows as follows:

javac KnockKnockServer.java
java KnockKnockServer rootDir
javac KnockKnockClient.java
java KnockKnockClient localhost:8888

The command line argument rootDir to the server should indicate the directory where the website is stored, and the command line argument to the client should indicate the address (host and port) of the server. If you are running them both on the same computer, localhost will do for the host. Your server should listen for connections on port 8888.
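
As a rough sketch of the rootDir handling described above, the server might read one page from that directory into a string to use as the punchline. The class name PunchlineLoader, the method loadPunchline, the choice of index.html, and the assumption that the pages are UTF-8 text are all mine for illustration; none of this comes from the Java tutorial code.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of the tutorial's knock-knock code.
final class PunchlineLoader {
    static final int PORT = 8888; // port required by the assignment

    // Reads one page under rootDir and returns its HTML as a single string.
    // Assumes the page is UTF-8 text.
    static String loadPunchline(Path rootDir, String pageName) throws IOException {
        byte[] bytes = Files.readAllBytes(rootDir.resolve(pageName));
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("Usage: java KnockKnockServer rootDir");
            System.exit(1);
        }
        Path rootDir = Paths.get(args[0]);
        System.out.println(loadPunchline(rootDir, "index.html"));
    }
}
```

Keep in mind that an HTML page usually spans many lines, so if your client and server exchange one line at a time you will need to decide how a multi-line punchline gets sent and read.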

Note that compiling KnockKnockServer.java will compile any dependent files as well, so you can use as many files as you need.

Make sure that you pay attention to the software licenses on the sample code you download, and observe them accordingly.

Part 2: Simplistic Single-Threaded Web Server

For this portion of the assignment, you should transform your knock-knock server into an HTTP server. It should be able to serve up pages that are viewable in a web browser; this is a fully-functioning (though limited in capability) web server.

The two aspects of this assignment that you do not need to implement yet are:

  1. Multithreading. That will happen later.
  2. More complicated HTTP requests. For this portion of the assignment, you only need to make simple GET requests work for a specific filename.
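
For a simple GET request, the first line the client sends looks like "GET /index.html HTTP/1.0". Here is a minimal sketch of pulling that line apart; the RequestLine class and its field names are hypothetical, and it assumes the three parts are separated by one or more spaces.

```java
// Hypothetical helper: splits "GET /index.html HTTP/1.0" into its three parts.
final class RequestLine {
    final String method;   // e.g. "GET"
    final String path;     // e.g. "/index.html"
    final String version;  // e.g. "HTTP/1.0"

    private RequestLine(String method, String path, String version) {
        this.method = method;
        this.path = path;
        this.version = version;
    }

    // Returns null when the line is not of the form "METHOD path HTTP/x.y".
    static RequestLine parse(String line) {
        if (line == null) {
            return null;
        }
        String[] parts = line.trim().split("\\s+");
        if (parts.length != 3 || !parts[2].startsWith("HTTP/")) {
            return null;
        }
        return new RequestLine(parts[0], parts[1], parts[2]);
    }
}
```

A null result is a natural place to decide that the request is malformed, rather than letting an exception escape from the parsing code.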

When your work is complete, we should be able to test it by running:

javac WebServer.java
java WebServer rootDir

We should then be able to start a browser on the same computer and visit localhost:8888/index.html. The browser should then display at least a portion of the Carleton Computer Science Department home page. Only the front page was downloaded, so the links probably won't work! Here's a screenshot showing something of what you should expect to see.

HTTP

Read HTTP Made Really Easy by Jim Marshall. At a minimum, read it carefully through the section titled "Sample HTTP Exchange."

One important detail involves whitespace characters. It is very important that you interpret the format of a client request correctly, and that you send correctly formatted responses to clients. Many parts of a correctly formatted message involve sequences of carriage return and newline characters (i.e., \r\n). These are used to signify the end of all or part of a "message". Here is the general format of an HTTP message, which applies to both client requests and server responses:

initial line
Header1: value1
Header2: value2
Header3: value3

(optional message body goes here)

For example, a GET response for a very simple page may look like:

HTTP/1.1 200 OK
Date: Sun, 10 Feb 2013 18:17:43 GMT
Content-Type: text/html
Content-Length: 54

<html><body>
<h1>CS 348 Test Page</h1>
</body></html>

It is very important that each header line ends with a \r\n and that there is a blank line (another \r\n) between the headers and the message body. The message body, however, is sent without a trailing \r\n. Instead, the header Content-Length is used to tell the client the size of the message body in bytes.
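
To make the \r\n rules concrete, here is a minimal sketch that assembles a response as raw bytes. The class name ResponseBuilder and the method build are hypothetical, and the sketch leaves out the Date header to stay short.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper showing where each \r\n goes in a response.
final class ResponseBuilder {
    static final String CRLF = "\r\n";

    // statusLine is something like "HTTP/1.1 200 OK" (without the line ending).
    static byte[] build(String statusLine, String contentType, byte[] body) {
        String head = statusLine + CRLF
                + "Content-Type: " + contentType + CRLF
                + "Content-Length: " + body.length + CRLF
                + CRLF; // blank line separating the headers from the body
        byte[] headBytes = head.getBytes(StandardCharsets.US_ASCII);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(headBytes, 0, headBytes.length);
        out.write(body, 0, body.length); // body is sent as-is, no trailing \r\n
        return out.toByteArray();
    }
}
```

Note that Content-Length is computed from the byte array, not from a String length, since the two can differ for non-ASCII text.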

Web clients

There are many ways of testing your web server, and you may find some of them useful:

  1. telnet server port_num, then type in a GET command (make sure to enter a blank line after the GET command). For example:

    $ telnet localhost 8888
    
    GET /index.html HTTP/1.0
    

    telnet will exit when it detects that your web server has closed its end of the socket (or you can kill it with ctrl-C, or if that doesn't work, use kill or killall: killall telnet).

  2. Firefox/Chrome: Enter the URL of the desired page specifying your web server using its IP:port_num (e.g. http://137.22.4.77:8888/index.php). You can also just use localhost or the host name on our system:

    localhost:8888/index.php
    
  3. wget:

    wget -v localhost:8888/index.html
    

    wget copies the html file returned by your web server into a file with a matching name (index.html) in the directory from which you call wget.

  4. Your client program from part 1, or some modification of it. This might be useful if you want to inspect the data received over the socket more closely, or test your server's response to broken requests.

Transmitting a CSS file

Some browsers (e.g., Chrome, Safari, possibly others) won't properly render a website with CSS files unless the HTTP response for each CSS file has Content-Type set to text/css. The only reliable way I could find to do this was to check the extension of the file being requested: if it ends in .css, I set the Content-Type header accordingly.
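
One way to sketch the extension check is shown below. Only the .css case comes from the description above; the class name ContentTypes, the other extensions, and the fallback type are my own assumptions about what a small server might want.

```java
import java.util.Locale;

// Hypothetical helper mapping a file name to a Content-Type value.
final class ContentTypes {
    static String forFileName(String fileName) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".css")) {
            return "text/css";
        }
        if (lower.endsWith(".html") || lower.endsWith(".htm")) {
            return "text/html";
        }
        if (lower.endsWith(".jpg") || lower.endsWith(".jpeg")) {
            return "image/jpeg";
        }
        if (lower.endsWith(".png")) {
            return "image/png";
        }
        if (lower.endsWith(".gif")) {
            return "image/gif";
        }
        return "application/octet-stream"; // generic fallback for unknown types
    }
}
```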

Reading a file (HTML, JPG, whatever) and transmitting it via socket

A common task is reading the file that has been requested and transmitting it back to the requesting client as part of your HTTP response. Reading the file as straight text, line-by-line, may work for HTML, but won't work for images. Furthermore, if you do a line-by-line read, you may be changing the newlines in the file you transmit. Admittedly, that may not change how the browser renders the page, but your server is still inappropriately changing the structure of the file it has been asked to transmit.

Here is (part of) the approach I used for reading a file in pure binary and transferring it via a socket. I was inspired by this StackOverflow posting. This code is intentionally incomplete, and might even be incorrect. (I copied lines out of my solution without verifying that they run on their own.)

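import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

// Fragment: socket and path are assumed to come from the surrounding code.
OutputStream os = socket.getOutputStream();
ByteArrayOutputStream buffer = new ByteArrayOutputStream();
byte[] data = new byte[1024];
int totRead = 0;
int numRead;  // bytes returned by the most recent read
// Read the file as raw bytes so images and newlines are left untouched.
try (InputStream input = Files.newInputStream(path)) {
    while ((numRead = input.read(data, 0, data.length)) != -1) {
        totRead += numRead;
        buffer.write(data, 0, numRead);
    }
}
String response = "Content length: ";  // not even close to complete
// Send the header bytes first, then the buffered file contents.
os.write(response.getBytes(StandardCharsets.US_ASCII));
buffer.writeTo(os);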

Part 3: Simplistic Multithreaded Web Server

For this part, you should extend part 2 to be multithreaded. Multiple web browsers (or browser windows/tabs) connecting to the server at the same time should launch multiple threads in your server. The knock-knock joke example provided above has a section at the end called "Supporting Multiple Clients," which provides more sample code on going multithreaded. You are welcome to use that as a starting point as well; again, observe the software license that is provided.

When your work is complete, we should be able to test it by running:

javac WebServer.java
java WebServer maxConnections rootDir

maxConnections is an integer greater than or equal to 1 representing the maximum number of client connections (i.e., the number of threads serving web pages) allowed at any given time.

Web server structure

The basic design of your web server should be the following:

  1. Create a server socket on port 8888.
  2. Enter an infinite loop:
    1. Accept the next connection.
    2. If there are already maxConnections open connections, kill the oldest thread by closing its client socket. This will cause the worker thread to receive a SocketException the next time it tries to read from or write to the socket. The worker thread should then exit.
    3. Create a new thread to handle the new client's connection, passing it the client socket returned by accept.
  3. The main server thread should exit only if it encounters an IOException.

The worker thread's run() method should be an infinite loop that only exits if it encounters an IOException or if the socket is closed by the main server thread or by the client. Otherwise, the worker threads continue to handle HTTP requests from the client.

Remember that connections can also be closed on the client-side. In this case the associated worker thread on the server should detect that the socket was closed, clean up any shared state, and exit.

If your solution requires any use of shared state among threads, make sure to use synchronization to coordinate the accesses to this shared state.
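
The list of open connections is one example of such shared state: the main thread adds to it and closes the oldest entry, while worker threads remove themselves when they exit. Below is a minimal sketch of one way to guard it with synchronized methods. The class ConnectionLimiter and its method names are hypothetical, and it is written against Closeable (which java.net.Socket implements) rather than Socket directly.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical helper that tracks open client connections, oldest first.
final class ConnectionLimiter {
    private final int maxConnections;
    private final Deque<Closeable> open = new ArrayDeque<>();

    ConnectionLimiter(int maxConnections) {
        if (maxConnections < 1) {
            throw new IllegalArgumentException("maxConnections must be >= 1");
        }
        this.maxConnections = maxConnections;
    }

    // Called by the main thread after accept(), before starting the worker.
    synchronized void register(Closeable client) {
        while (open.size() >= maxConnections) {
            Closeable oldest = open.removeFirst();
            try {
                oldest.close(); // the worker's next read or write on it will fail
            } catch (IOException e) {
                // The connection is going away anyway; nothing more to do here.
            }
        }
        open.addLast(client);
    }

    // Called by a worker thread when it exits on its own.
    synchronized void unregister(Closeable client) {
        open.remove(client);
    }

    synchronized int size() {
        return open.size();
    }
}
```

Because every method is synchronized on the same object, the main thread and the workers never see the queue in a half-updated state. Other designs (for example, a concurrent collection) would also work.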

Part 4: Better Multi-Threaded Web Server that Handles More HTTP

This part adds to the capability of your web server, adding more parts of the HTTP 1.1 protocol. Specifically:

  • Your server must handle GET and HEAD client requests. It does not need to handle POST or any other request methods.
  • It should return appropriate status codes, including 200, 400, 403, and 404. If the server returns an error code to a client, it should also return headers and a message body with a simple error page. For example:
<html><body>Not Found</body></html>
  • It should support the headers Content-Length, Content-Type, and Date.
  • It does not need to handle any PHP or JavaScript parsing.
  • It should handle paths that start with /. It does not need to handle paths that start with a username, such as /~username/.
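
As a small sketch of the status codes, error page, and Date header listed above: the class HttpStatus and its method names are hypothetical, and the date pattern is my own rendering of the format shown in the sample response earlier (e.g., Sun, 10 Feb 2013 18:17:43 GMT).

```java
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical helper for status lines, error pages, and the Date header.
final class HttpStatus {
    private static final DateTimeFormatter HTTP_DATE =
            DateTimeFormatter.ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.US);

    // Reason phrases for the four codes the assignment names.
    static String reason(int code) {
        switch (code) {
            case 200: return "OK";
            case 400: return "Bad Request";
            case 403: return "Forbidden";
            case 404: return "Not Found";
            default:  return "Unknown";
        }
    }

    static String statusLine(int code) {
        return "HTTP/1.1 " + code + " " + reason(code);
    }

    // A simple error page in the same style as the example above.
    static String errorBody(int code) {
        return "<html><body>" + reason(code) + "</body></html>";
    }

    // Converts the given time to GMT before formatting it.
    static String dateHeader(ZonedDateTime when) {
        return "Date: " + HTTP_DATE.format(when.withZoneSameInstant(ZoneOffset.UTC));
    }
}
```

For a HEAD request, the same status line and headers would be sent, just without the message body.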

GET requests and mapping URLs to files

Directory names in URLs correspond to files named either index.html or index.htm in the named directory. Your web server should first look for a file named index.html, and if that doesn't exist, look for index.htm.

Here are some example GET requests that you need to handle, and their corresponding filename(s):

GET /            HTTP/1.1                   /rootDir/index.html
                                         or /rootDir/index.htm

GET /index.html  HTTP/1.1                   /rootDir/index.html

GET /index.htm   HTTP/1.1                   /rootDir/index.htm

GET /search.html HTTP/1.1                   /rootDir/search.html

GET /cat.jpg     HTTP/1.1                   /rootDir/cat.jpg

GET /courses/    HTTP/1.1                   /rootDir/courses/index.html
                                         or /rootDir/courses/index.htm

You do not need to correctly handle GET requests of the following format (i.e. GET requests with no trailing '/' when the last name corresponds to a directory):

GET /courses  HTTP/1.1

We won't test this case.
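
The mapping rules above could be sketched roughly as follows. The class UrlMapper and the method resolve are hypothetical. The check that the result stays inside rootDir is an extra precaution of mine, not something the assignment asks for, and the sketch does not handle query strings or percent-encoded characters.

```java
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper that maps the path in a GET request to a file under rootDir.
final class UrlMapper {
    // Returns the file to serve, or null if no matching file exists.
    static Path resolve(Path rootDir, String urlPath) {
        Path root = rootDir.toAbsolutePath().normalize();
        // Drop the leading '/' so the path resolves relative to rootDir.
        Path target = root.resolve(urlPath.replaceFirst("^/+", "")).normalize();
        if (!target.startsWith(root)) {
            return null; // e.g. "/../secret" would escape rootDir (my own extra check)
        }
        if (Files.isDirectory(target)) {
            // Directories map to index.html first, then index.htm.
            for (String name : new String[] {"index.html", "index.htm"}) {
                Path candidate = target.resolve(name);
                if (Files.isRegularFile(candidate)) {
                    return candidate;
                }
            }
            return null;
        }
        return Files.isRegularFile(target) ? target : null;
    }
}
```

A null result is where your server would choose an error status code instead of 200.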

Submission

For each part, you should submit two files to Moodle, the first one of which should be anonymized:

  • A zip file containing your Java code.
  • Citations in a text file credits.txt

There is no writeup required for this assignment.

Author: This project was created by Tia Newhall for Swarthmore's CS 87 and adapted to Java by Laura Effinger-Dean. The wording in some sections is partially due to Dan Grossman. Jeannie Albrecht's and Amy Csizmar Dalal's descriptions of similar projects were consulted. Dave Musicant broke it into multiple parts, changed some of the functionality, and reorganized the description.

Created: 2016-02-18 Thu 12:17
