CGI was the radioactive spider for the Peter Parker of the web
The fall of the Berlin Wall in 1989 was historic, but in hindsight the biggest news from 1989 was an invention. When Sir Tim Berners-Lee invented the World Wide Web in 1989, the revolutionary web was rather static. In 1989 and for many years after, the web's superpower was hyperlinks. The first web server was called CERN HTTPd. Early web servers simply picked up files based on the request path and sent those files back in the response.
A usual request/response would look something like this (I haven't included all headers)
GET /TheProject.html HTTP/1.0
HTTP/1.0 200 OK
Date: date
Server: Apache/1.3.27 (Unix)
Content-Type: text/html
Content-Length: 111
** empty line **
Life was simple, but this was nowhere close to the powerful web we know today. How could people sign up, post messages, and make friends on the internet if the web was so static?
🦸 CGI super-powers
In 1993, the first specification for calling command-line programs/executables from web servers was proposed. This was formalized as the Common Gateway Interface (CGI) standard in 1997. Based on the initial CGI specification, Rasmus Lerdorf wrote PHP/FI in 1994, and it instantly became popular among developers who were writing C/C++ programs but liked the simplicity of scripting languages.
Now developers could write programs that received input from browsers. They could also send a different response to the same URL based on the user. I would even say that the CGI standard started the .com boom. AltaVista, the most popular search engine before Google, launched in 1995. Yahoo! had its IPO in 1996, and Hotmail launched the same year.
This is the dynamic web in which I started my professional programming career in 1999. We used the Apache web server to serve both static and dynamic pages.
⚖️ CGI standard
The standard itself was simple and initially addressed 'form processing'. Specific URL extensions (e.g. .cgi, .pl, .php) or specific folders (/cgi-bin, /scripts) were used to tell the web server not to simply pick up the contents of the file and send them to the browser.
Instead, the server treated the file as an executable and passed incoming data, the URL, headers, and so on to the invoked program through its environment. The web server also opened pipes to the program so that a POST request's body could be read by the program from standard input, and anything written to standard output was inspected and sent back to the browser.
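For the curious, here is a rough sketch in C of that per-request dance - fork, wire up pipes, hand the request details over through environment variables, exec the CGI program, and relay its output. It is illustrative only: the script path, query string, and body are made up, and real servers set many more CGI variables and handle errors.

/* sketch of a web server invoking a CGI program: fork, exec, pipe */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
    int to_cgi[2], from_cgi[2];
    pipe(to_cgi);                    /* server -> CGI stdin  */
    pipe(from_cgi);                  /* CGI stdout -> server */

    pid_t pid = fork();
    if (pid == 0) {                  /* child: becomes the CGI program */
        dup2(to_cgi[0], STDIN_FILENO);
        dup2(from_cgi[1], STDOUT_FILENO);
        close(to_cgi[1]); close(from_cgi[0]);

        /* request metadata travels through environment variables */
        setenv("REQUEST_METHOD", "POST", 1);
        setenv("QUERY_STRING", "name=peter&city=ny", 1);
        setenv("CONTENT_LENGTH", "13", 1);

        execl("/cgi-bin/guestbook.cgi", "guestbook.cgi", (char *)NULL);
        _exit(1);                    /* exec failed */
    }

    close(to_cgi[0]); close(from_cgi[1]);

    /* parent (the web server): feed the POST body to the program... */
    write(to_cgi[1], "message=hello", 13);
    close(to_cgi[1]);

    /* ...and read whatever it printed, to be inspected and sent back
       to the browser (here we just echo it to our own stdout) */
    char buf[4096];
    ssize_t n;
    while ((n = read(from_cgi[0], buf, sizeof buf)) > 0)
        fwrite(buf, 1, (size_t)n, stdout);

    waitpid(pid, NULL, 0);
    return 0;
}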
Suddenly, all the contact forms on the web went from triggering a simple mailto: action to collecting more information (including messages).
🦾 CGI was easy, powerful
Dynamic page URLs ended with .cgi by convention. The web server passed parameters to our programs using environment variables and sent any POST body to our programs over standard input (fork, exec, pipe).
We wrote these programs in C, and printing the environment variables was a popular beginner program.
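A beginner's version of that program might have looked something like this (the Content-Type line it prints first is explained in a moment):

/* hello.cgi - dump the environment variables the web server passed us */
#include <stdio.h>

extern char **environ;

int main(void)
{
    /* the header and blank line must come before everything else */
    printf("Content-Type: text/plain\r\n\r\n");

    for (char **env = environ; *env != NULL; env++)
        printf("%s\n", *env);        /* e.g. QUERY_STRING=name=peter */

    return 0;
}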
In those early days we decoded form data by hand: converting plus signs to spaces, then %22 sequences to quotes (and all the other hex-encoded ASCII characters along with them). application/x-www-form-urlencoded and multipart/form-data bodies were parsed differently too.
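The decoding itself was only a few lines; a simplified version (assuming well-formed input) could look like this:

/* decode an x-www-form-urlencoded value in place:
   '+' becomes a space, %XX becomes the byte with hex value XX */
#include <stdio.h>
#include <stdlib.h>

static void url_decode(char *s)
{
    char *out = s;
    while (*s) {
        if (*s == '+') {
            *out++ = ' ';
            s++;
        } else if (*s == '%' && s[1] && s[2]) {
            char hex[3] = { s[1], s[2], '\0' };
            *out++ = (char)strtol(hex, NULL, 16);
            s += 3;
        } else {
            *out++ = *s++;
        }
    }
    *out = '\0';
}

int main(void)
{
    char value[] = "Hello+%22world%22%21";
    url_decode(value);
    printf("%s\n", value);           /* prints: Hello "world"! */
    return 0;
}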
The CGI standard expected a Content-Type header followed by a blank line (Content-Type: text/html\n\n) as the very first output from our programs. Anything else resulted in a 500 internal server error. Program crashed - 500. Program printed something else before the header - 500. Program didn't print anything - 500.
CGI developers had nightmares about 500 internal server errors; in some comics even Spider-Man has nightmares.
😵💫 Why don't we know about CGI?
The CGI standard had found a beautiful way to run programs and scripts on the internet. But running a program for every request was very costly for web servers, especially because the standard fork/exec sequence was highly inefficient at the time.
Most Linux and Unix systems made a full copy of the parent process on fork, even though the child was only going to be replaced by an exec call. (exec doesn't create a new process; it replaces the current one, hence the need to fork first.)
This didn't scale as the dot com boom approached and popular websites were getting millions of visits per day.
My team realized our web servers would have a difficult time scaling, so we wrote an Apache module that emulated FastCGI over TCP sockets, while our application server (written in C++) handled the heavy part of the application. Neither the web server nor our application server created new processes for every request.
We still hoped that Apache's new versions with multi-threading support would be a game-changer. In our applications, we used the Linux system call select to multiplex over multiple sockets, and we were hoping Apache would use it too.
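The pattern we had in mind was the classic select loop - one process watching many sockets, no fork per connection. A minimal sketch (a toy echo server on port 8080, error handling omitted):

/* one process, many sockets: multiplexing with select() */
#include <unistd.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <netinet/in.h>

int main(void)
{
    int listener = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr = { 0 };
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(8080);
    bind(listener, (struct sockaddr *)&addr, sizeof addr);
    listen(listener, 16);

    fd_set all_fds;
    FD_ZERO(&all_fds);
    FD_SET(listener, &all_fds);
    int max_fd = listener;

    for (;;) {
        fd_set ready = all_fds;                 /* select() modifies its set */
        select(max_fd + 1, &ready, NULL, NULL, NULL);

        for (int fd = 0; fd <= max_fd; fd++) {
            if (!FD_ISSET(fd, &ready))
                continue;
            if (fd == listener) {               /* new connection */
                int client = accept(listener, NULL, NULL);
                FD_SET(client, &all_fds);
                if (client > max_fd) max_fd = client;
            } else {                            /* data from an existing client */
                char buf[1024];
                ssize_t n = read(fd, buf, sizeof buf);
                if (n <= 0) {                   /* client went away */
                    close(fd);
                    FD_CLR(fd, &all_fds);
                } else {
                    write(fd, buf, (size_t)n);  /* echo it back */
                }
            }
        }
    }
}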
🍴 Early solutions removed the need to fork/exec by building monoliths
Several solutions, including FastCGI and mod_perl/mod_php, tried to solve this in 1996. mod_perl and mod_php executed Perl and PHP scripts within the web server processes. They were faster than spawning the Perl and PHP interpreter binaries, but not by an order of magnitude.
FastCGI was ahead of its time and would become more popular soon.
🕸️ Nginx - the web server we wanted all along
Nginx made our wishes come true. Launched in 2004, Nginx gave FastCGI a new lease of life: besides UNIX sockets, it also worked over TCP sockets. The combination of PHP-FPM (FastCGI Process Manager) and Nginx improved performance significantly for PHP systems. Other popular languages got their own equivalents: WSGI (Python), Rack (Ruby). Nginx used an improved version of select - epoll - and also used more advanced system calls like sendfile.
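The same toy echo server with epoll shows the difference: descriptors are registered with the kernel once, and only the ready ones come back, so nothing rescans every socket on every iteration. This is a Linux-only sketch with error handling omitted.

/* the epoll version: register once, then ask which fds are ready */
#include <unistd.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <netinet/in.h>

int main(void)
{
    int listener = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr = { 0 };
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(8080);
    bind(listener, (struct sockaddr *)&addr, sizeof addr);
    listen(listener, 16);

    int epfd = epoll_create1(0);
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = listener };
    epoll_ctl(epfd, EPOLL_CTL_ADD, listener, &ev);

    struct epoll_event events[64];
    for (;;) {
        int n = epoll_wait(epfd, events, 64, -1);   /* only ready fds return */
        for (int i = 0; i < n; i++) {
            int fd = events[i].data.fd;
            if (fd == listener) {                   /* new connection */
                int client = accept(listener, NULL, NULL);
                struct epoll_event cev = { .events = EPOLLIN, .data.fd = client };
                epoll_ctl(epfd, EPOLL_CTL_ADD, client, &cev);
            } else {                                /* data from a client */
                char buf[1024];
                ssize_t r = read(fd, buf, sizeof buf);
                if (r <= 0)
                    close(fd);                      /* epoll forgets closed fds */
                else
                    write(fd, buf, (size_t)r);      /* echo it back */
            }
        }
    }
}

sendfile, the other call mentioned above, lets the kernel copy a file's contents straight to a socket without the round trip through user space, which is exactly what serving static files needs.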
Tomcat (1999) and Jetty (1995) took a different approach from day one and supported HTTP directly. Java's servlet standard is still relevant and is the hidden backbone behind popular frameworks like Spring Boot.
Now programming languages support HTTP out of the box; Python's SimpleHTTPServer and Go's net/http come to mind readily.
CGI is the forgotten hero of the web. Without CGI, we probably wouldn't have Twitter or LinkedIn. Even Google wouldn't exist if someone had not thought of connecting the power of links with the power of executables using Unix pipes.