Web Servers(Udacity)

Table of Contents

1 Toolkits

1.1 ncat

In Nmap network testing toolkit.

  1. sudo apt-get install nmap
  2. ncat -l 9999: start a server
  3. ncat localhost 9999: start a client

1.2 host

DNS lookup utility. host www.google.com

  • alternative: nslookup, dig

2 Basic

2.1 URL

  • a URL is a URI for a resource on the network
  • URL parts: https://en.wikipedia.org/wiki/Fish
    • https is the scheme
    • en.wikipedia.org is the hostname
    • /wiki/Fish is the path. A single slash / is a path, called the root.
    • The part of the URI after the # sign is called a fragment. In HTML pages it links to an element by id.
    • ?q=fish is a query part of the URI.
    • Ports

2.2 HTTP

A simple HTTP request

GET / HTTP/1.1
Host: localhost

response example

HTTP/1.0 200 OK
Server: SimpleHTTP/0.6 Python/3.8.8
Date: Thu, 29 Apr 2021 06:23:14 GMT
Content-type: text/html; charset=utf-8
Content-Length: 514

body...

Response

is made up of three parts:

  • status line: status codes descriptions
  • headers
    • Headers are a sort of metadata for the response.
    • the names of HTTP headers are case-insensitive
    • cookies are a web feature using headers. The server sends the Set-Cookie header. The client send the cookie data back in a Cookie header.
  • response body

2.3 GET vs. Post

Why don't we want to use GET for submitting the form?

Imagine if a user did this. They write a message and press the submit button and the message text shows up in their URL bar. If they press reload, it sends the message again.

2.4 Post-Redirect-Get Pattern

There's a very common design pattern for interactive HTTP applications and APIs, called the PRG(Post-Redirect-Get). A client POST s to a server to create or update a resource; on success, the server replies not with a 200 OK but with a 303 redirect. The redirect causes the client to GET the created or updated resource.

2.5 Other Http Methods

  • PUT/DELETE for creating/deleting things
  • PATCH for making changes
  • HEAD/OPTIONS/TRACE for debugging
    • HEAD works just like GET, except the server doesn't return any content — just headers.
    • OPTIONS can be used to find out what features the server supports.
    • TRACE echoes back what the server received from the client — but is often disabled for security reasons.

details: HTTP standards documents

3 Python Utilities

3.1 urllib

from urllib.parse import parse_qs, quote, urlparse

parts = urlparse("https://www.google.com/search?q=gray+squirrel&tbm=isch")
# => ParseResult(scheme='https', netloc='www.google.com', path='/search', params='', query='q=gray+squirrel&tbm=isch', fragment='')

parse_qs(parts.query)
# => {'q': ['gray squirrel'], 'tbm': ['isch']}

quote("gray squirrel")
# => 'gray%20squirrel'

4 Deployment

4.1 heroku

  1. Check your server code into a new local Git repository.
  2. Sign up for a free Heroku account.
  3. Download the Heroku command-line interface (CLI).
  4. Authenticate the Heroku CLI with your account: heroku login
  5. Create configuration files Procfile, requirements.txt, and runtime.txt and check them into your Git repository.
  6. Modify your server to listen on a configurable port.
  7. Create your Heroku app: heroku create your-app-name
  8. Push your code to Heroku with Git: git push heroku master

4.2 Routing and Load Balancing

One thing a specialized web server can do is dispatch requests to the particular backend servers that need to handle each request. There are a lot of names for this, including request routing and reverse proxying.

Splitting requests up among several servers is called load balancing.

4.3 Caching

The server can set HTTP headers indicating that a particular resource is not intended to change quickly, and can safely be cached.

There are a few places that caching usually can happen.

Browser Cache
Browser maintains a browser cache of cacheable resources — such as images from recently-viewed web pages.
Web Proxy
Perform caching on behalf of many users.
Reverse Proxy
Server-side caching.

All HTTP caching is supposed to be governed by cache control headers set by the server.

4.4 Cookies

Cookies are a way that a server can ask a browser to retain a piece of information, and send it back to the server when the browser makes subsequent requests.

What are cookies for?

  • If the server sends each client a unique cookie value, it can use these to tell clients apart. (sessions and login)
  • Cookies are used by analytics and advertising systems to track user activity from site to site.
  • Cookies are also sometimes used to store user preferences for a site.

How cookies happen

The first time the client makes a request to the server, the server sends back the response with a Set-Cookie header. This header contains three things: a cookie name, a value, and some attributes. The server can update cookies, or ask the browser to expire them.

cookie fields

See here

  • Domain and Path, describe the scope of the cookie - that is to say, which queries will include it.

5 Https(Http with TLS)

TLS provides some important guarantees for web security:

  • privacy: keeps the connection private by encrypting everything sent over it.
  • authenticity: lets the browser authenticate the server.
  • integrity: helps protect the integrity of the data sent over that connection.

5.1 Keys and certificates

The server-side configuration for TLS includes two important pieces of data: a private key and a public certificate.

  • The private key is secret; it's held on the server and never leaves there.
  • The certificate is sent to every browser that connects to that server via TLS.

5.2 CA

The server's certificate is issued by an organization called a certificate authority (CA). The certificate authority's job is to make sure that the server really is who it says it is. The role of a certificate authority is kind of like getting a document notarized.

6 Http/2

Improvements:

  • Multiplexing: multiplexs requests and responses over a single connection
  • Server push: allows the server to say, effectively, "If you're asking for index.html, I know you're going to ask for style.css too, so I'm going to send it along as well."
  • Encryption: Most of browsers will only attempt HTTP/2 with a site that is using TLS encryption.