Data Structures - Arrays

In the context of data structures, I'll be writing about them in a language-agnostic fashion. However, I WILL be sure to address how they are used in Cocoa Touch programming (the NS classes that the structures correspond to, differences between Objective-C and other languages, etc). Over the next few blog posts, I'll be going over Arrays, Links, Stacks, Queues, and Hash Tables. We will be discussing some of the main uses of data structures which ultimately determine which is the best to use for your situation: indexing, searching, inserting, deleting, and sorting.

Side note: I finally took the time to find the origins of shell, terminal and console. First of all, most Macs use bash as their shell. The shell is just the program that processes commands and returns output. The terminal a wrapper program that contains and runs the shell program. A console is a special kind of terminal used to communicate with your operating system at a low level (not standard on Mac machines).
Today, we use terminal emulators (the terminal program on a Mac). These are virtual terminals that replace the terminals of years past, which were little more than a keyboard and a monitor used to type in commands, pass them to the shell, process them with the computer, and return an output. Today we more commonly use graphical user interfaces to interact with our computers.

Side note aside, let's begin by discussing a very commonly used data structure, the array.

Arrays are ordered collections of data stored in memory contiguously, which means that a whole block of memory is allocated for an array. For this reason, arrays cannot grow in size easily, since there may not be additional memory RIGHT NEXT to the block of memory already set aside for an array. This leads us to our first point, which is that arrays are immutable (for the first Objective-C specific point on this, see #1 at the bottom of this post).

Arrays can be one-dimensional, two-dimensional (an array of arrays), three-dimensional (an array of two-dimensional arrays), etc. It just depends on how you want to structure your data. They can even be jagged (a two-dimensional array where each array is not necessarily the same length).

Arrays allow for random, or direct, access. This means that we can index right into an array. Our computer already knows the size of an array, so if we say "take array A and go to the 4th index", our computer immediately knows where in memory that array is located, how much space is taken up, and can jump right to the 4th index. This means that indexing in an array is a constant time function, meaning that additional inputs of data in an array structure don't lead to increases in time complexity. The Big O notation for array indexing is O(1).

Searching, on the other hand, is a memory intensive process, as the computer must iterate over every item in the array until a desired object is found. It doesn't know what is being stored in an array, only WHERE in MEMORY it is stored. The best case scenario is that the object we are looking for is at index 0, and the worst case is that it is the last index in the array. Either way, we must assume the worst case scenario, which means iterating over the entire array until we find our object. Searching in an array is a linear time function, meaning that additional inputs result in a corresponding increase in time complexity. The big O notation for array searching is O(n).

We can optimize searching by using a binary search - we sort an array in ascending or descending order, check the middle element, and if that value is higher than our desired value, we search the lower half of the data values. This process continues, halving the data set each time, until we find (or don't find) the desired value.

1. With regard to the Objective-C implementation of arrays, although it is more memory intensive to do so, the language offers the NSMutableArray class (along with NSArray and CFArray. This is an array that DOES have dynamic sizing; however, things change when you're using this kind of data structure. If you are adding objects to a mutable array, there MAY be space in the contiguous memory block for that additional data. However, all the other objects must be shuffled around to keep the indices in place. Additionally, if there is no space in that contiguous block, the entire array must be copied over with the new values to a new contiguous block somewhere else in memory.
As a result, if data is subject to frequent change / growth in size, perhaps there are other data structures available better suited to a developer's needs.

Arrays are good to use when access is prioritized (constant time function). If you are able to insert data into an array (mutable) and you are inserting at the end, THEN insertion is a constant time function (we can jump directly to the last index of the array in memory). However, if end-insertion is not happening, the time complexity is linear (O(n)) since we must iterate through the entire array to find our desired point of insertion.
 

AN ADDITIONAL POINT ON RANDOM/DIRECT ACCESS (INDEXING) OF AN ARRAY BEING CONSTANT-TIME:

The math involved in indexing an array is addition and multiplication - these are both considered to be constant time operations. In other words, adding 1000 + 1000 + 1000 doesn't take much more time than adding 15 + 15 (and the same goes for multiplication). Higher values and more of them don't corresponding to a linear increase in the time required for the computation, so they are considered constant.

A crude representation of an array in a contiguous memory block, with each collection member mapped to it's place in memory

A crude representation of an array in a contiguous memory block, with each collection member mapped to it's place in memory

Let's say we want our program to access the third index in the array (index 2). We know that the first index has a memory address of x004. An address is 2 bytes large in this case, so each address increases by 2 here. To get the index we want, we must compute:
(address of first index) + ((desired index) (2 bytes per index in memory))
This comes out to (x004) + (index 2 * 2) -> (x004 + 4 bytes).
This leads us to the address x008, which holds the values 99 (the value of the third index, or index 2).
As you can see, multiplication and addition are constant operations required to index into an array; this is the reason why array indexing is a constant-time function.

*if you see me using the term deterministic in future posts, take it to mean that if given the same input each time, an algorithm will return the same output.