Node.js was designed to make use of asynchronous I/O, whereas many traditional programming languages rely on synchronous I/O. With synchronous code, when several tasks are executed on the same thread, one task must complete before control returns to the thread and the next task can run. To get past this bottleneck, developers can use multiple threads and dispatch certain tasks to certain threads, allowing work to run concurrently. Managing threads is, to understate it, a challenging task.
Node.js sidesteps the complexities of multithreading by allowing for asynchronous code (see the asynchronous section further down) - functions take callbacks (other functions passed as arguments) that can make use of the results of the original function. When the original function finishes its work, the callback is invoked.
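A minimal sketch of the pattern - `fetchNumber` is a made-up function standing in for any asynchronous operation; it follows the Node convention of passing an error first and the result second:

```javascript
// A function that does asynchronous "work" (here, just a timer) and
// invokes its callback with the result when that work completes.
function fetchNumber(callback) {
  setTimeout(() => {
    callback(null, 42); // Node convention: error first, then result
  }, 10);
}

fetchNumber((err, n) => {
  if (err) throw err;
  console.log('got', n); // runs only after fetchNumber's work finishes
});
```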
The catch with writing pure Node.js code is that nesting callbacks within callbacks quickly leads to “callback hell” (think of the pyramid of doom that deeply nested if/else statements produce). We can tame our indented code using promises or a module such as Async.js.
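To make the contrast concrete, here is a sketch using a hypothetical async `step` function - the same three-step pipeline written as nested callbacks, then flattened with promises:

```javascript
// Hypothetical async step, following the Node callback convention.
function step(n, callback) {
  setTimeout(() => callback(null, n + 1), 10);
}

// Callback style: each step nests inside the last - the pyramid begins.
step(0, (errA, a) => {
  step(a, (errB, b) => {
    step(b, (errC, c) => {
      console.log(c); // 3
    });
  });
});

// Promise style: wrap the callback API once, then chain top-to-bottom.
const stepP = (n) => new Promise((resolve) => step(n, (err, r) => resolve(r)));
stepP(0).then(stepP).then(stepP).then((c) => console.log(c)); // 3
```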
Let’s talk about routes - enter Express. Express.js is a lightweight framework built on top of Node.js that leverages its asynchronous, event-driven programming model and makes it easier to build web applications. As a quick example of how it makes life easier: sending a file in an HTTP response in pure Node.js can take quite a few lines of code. After creating a server with the http module, we need to specify the response header data (content type, etc.), create a data stream out of our file, and then pipe it into our response object. In Express.js, we’d just use the “sendFile” function.
Streams and Buffers
What are streams and buffers? In general computer science terms, a buffer is a temporary place to house data while it’s being moved from one place to another. Once data has been collected and stored in a buffer, it can be manipulated in some way (read from, written to, etc.). In Node.js, a Buffer is a structure that allows for manipulation or reading of binary data - like a fixed-length array, a buffer cannot be resized (though its contents can be changed). It allows for much lower level access to data (the raw bytes that compose a string, for example, versus the encoded value of the string itself). Using buffers can buy some performance, since in our string example we can avoid string management functions entirely.
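A quick illustration of that byte-level access:

```javascript
// A Buffer exposes the bytes behind a string. Its length is fixed at
// allocation, but individual bytes can be changed in place.
const buf = Buffer.from('Node', 'utf8');

console.log(buf.length);      // 4 - one byte per ASCII character
console.log(buf[0]);          // 78 - the byte value of 'N'
buf[0] = 110;                 // mutate a single byte ('n')
console.log(buf.toString());  // 'node'
```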
A stream represents a sequence of objects (often bytes) that are accessed in sequential order. Streams are core to I/O processes (file access, networking, inter-process communication). They access data in chunks instead of all at once, and they are event emitters, so developers can register callbacks for when certain things happen involving stream data (encountering an error, receiving data, reaching the end of the data).
In contrast to buffers, streams deliver data piecemeal; a buffer must be fully populated before any action can be taken on the data it contains.
HTTP uses a request/response paradigm, whereas TCP servers utilize bidirectional streams. We can create readable and writable streams using the filesystem and pipe that data into an HTTP response (which is itself a writable stream), or we can pipe an HTTP request (a readable stream) into a writable file stream. TCP sockets are bidirectional, meaning there is an open connection that we can both read from and write to.
With asynchronous code in Node.js, we don’t have to deal with multiple threads - that complexity is abstracted away from us by the event loop. Instead, we take advantage of the asynchronous nature of Node.js to write our software. With synchronous code, if we want tasks to run in parallel, they must be executed on separate cores or threads. With asynchronous code, once an operation begins, we can start another without waiting for the first to complete. We use callback functions to perform operations on the returned data after an operation finishes.
Let’s use reading a file as an example. With asynchronous code, once the file starts being read, we can go do some other task. When the file is finished being read (whenever that may be), our callback function that we wrote earlier handles the results.
Imagine we were to write a program that did the following - [print message to console] -> [read contents of file asynchronously and print contents to console] -> [print end message to console]. In asynchronous code, we don’t know when the contents of the file will actually finish being read. It may very well turn out that our first print statement and our end print statement get logged, THEN the file contents are logged. If we did this synchronously, the first print statement would be logged, then the program would hang while the file was read and the contents were logged, and we would see our end print statement last.
For this reason, being able to serialize asynchronous tasks is an important part of writing code in Node.js. As a very simple illustration of this concept, say you're writing client side code and you need to get data from several APIs - you have the URLs in an array and you want to execute a GET request for each. If your code is asynchronous, you can't be sure of the order in which the requests will complete. You can solve this by treating your URL array as a queue. Shift the first URL off the array, execute a GET request, and add the response data to a new array. If any URLs remain, call the function recursively, shifting the next URL off the array. When the URL array is empty, do something with the data array, which holds each response in the correct order.
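A sketch of that queue approach - `fetchUrl` here is a stand-in for a real GET request (it "responds" after a random delay to mimic network variance), and `fetchInOrder` is a hypothetical helper implementing the recursion described above:

```javascript
// Stand-in for an HTTP GET: responds asynchronously after a random delay.
function fetchUrl(url, callback) {
  setTimeout(() => callback(null, 'response for ' + url), Math.random() * 20);
}

function fetchInOrder(urls, done, results = []) {
  if (urls.length === 0) return done(results); // queue drained: hand back data
  const url = urls.shift();                    // take the next URL off the queue
  fetchUrl(url, (err, data) => {
    results.push(data);                        // responses land in request order
    fetchInOrder(urls, done, results);         // recurse on the remaining URLs
  });
}

fetchInOrder(['a', 'b', 'c'], (results) => console.log(results));
// results arrive in the original URL order: a, b, c
```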
This was just a quick primer on writing some I/O code in Node.js! I'll be including more code as I write a backend for a new mobile app and continue learning more about Node.js and Express.js.