objective-c

INSTANCE VARIABLES & PROPERTIES

INSTANCE VARIABLES VS PROPERTIES

An instance variable is a variable that exists and holds its value for the life of the object. Instance variables (or ivars, as they're commonly referred to) are available to the entire class that they're declared in - this is in contrast to local variables, which are available only to the method they are declared in. Generally, it is bad practice to directly set the value of an ivar - instead, we abstract this process by using setter and getter methods. The concept of instance variables (and properties) has a lot to do with data encapsulation, which we'll cover towards the end of this post.

Properties are simply variables that are backed by their ivars - by declaring a property, we create an instance variable and automatically generate setter and getter methods. Additionally, properties come with a list of attributes that we can use to help further shape and define our variables - readonly, readwrite, atomic, nonatomic, strong and copy.

A readonly property is when the backing ivars setter method is never synthesized.

A readwrite property generates both getters and setters.

An atomic property guarantees that a fully initialized object is always returned - the object is locked when being set or get so that it is not being simultaneously accessed from another thread. You will never see a "partial write" for an atomic property. Atomicity doesn't imply thread-safety! Consider an example where you have a first and last name, both atomic, both being accessed and set from two threads (one thread sets the initial first and last names, the other changes them). While you can always guarantee that you will be returned a full string for the first name and a full string for the last name (no partial writes), you cannot guarantee which thread is returning the value. You may get the original first name (from the first thread) and the changed last name (from the second thread), which is not the combination of values you perhaps expected.

A nonatomic property makes no guarantees regarding object initialization.

A strong property creates an owning relationship between the property and the assigned value. This is the default for properties. If a Car object has a @property (strong) Person *driver, and in our implementation file we write Car * Honda = [Car new]; and we have Person *John = [Person new]; and Honda.driver = John, then Honda has strong ownership of John. As long as Honda needs its driver, John will exist. 

A weak property creates a non-owning relationship between the property and the assigned value. This can be used to prevent retain cycles. In our previous example, if Person has a property called @property (strong) Car *car, and we have John and say John.car = Honda, we have a problem. Honda has strong ownership of its driver, John, but John also has strong ownership of his car, Honda. These objects, because of the strong relationship, will ALWAYS be owned by another object. If they’re no longer needed, they still can’t be destroyed - this is a retain cycle. A retain cycle is when two objects have a strong relationship to one another and thus can’t be destroyed, and no other objects hold a reference to either of them. The way around this is to use the weak property attribute - if John has weak ownership of his Honda, then we can avoid this retain cycle. If Honda is deallocated, because John has a weak relationship to it, that won’t be a problem - however, John.car will, by default, be set to nil. 

Keep in mind that if we want to use Key Value Observing (KVO), we must user properties in our classes. This is a means of notifying a current class when a property value has been accessed in a different class. It provides a means of communicating between different classes. Additionally, properties don’t HAVE to be global - if we declare them as class extensions (declared in our implementation file) as (readwrite), we can use them as we like in the class file. In the interface file, we can declare those same properties as (readonly) - other classes will not be able to write values to that property, essentially encapsulating that data. 

Data encapsulation encompasses the idea that data should be hidden and confined to a single area of software as much as possible, unless it is necessary that it be available to other classes. There are many, many ways to go about doing this, and it's up to the developer to decide how they want to go about it. You can hide ivars and properties in a class extension, making them available only to that class. You can create a category, use public variables, and import that category into your class for use. 

*As a side note, I almost always use properties instead of explicitly declaring ivars.

If data needs to be publicly available, it is bad practice to make it possible for other classes to change that variables value - in this case, you can declare a readonly property in a class interface file, and redeclare it as readwrite in the implementation file. This makes it so that you can internally change the value of that variable, but no other classes can write new values to it. To make properties "private" (although there is no such thing as a truly private variable in objective-c due to its dynamic runtime) you can simply put them in a class extension - this is my typical encapsulation workflow. dditionally, if you will be using publicly available variables from one class, but you are only using them internally, you can write your import statement for that class ONLY in your implementation file.

Between the @interface and @end tags, we have declared a private property. This is one way of using an extension to house private variables.   After the @implementation tag, we have declared both a private and public ivar between the curly braces. We can still achieve data encapsulation without using extensions, as shown here. 

Between the @interface and @end tags, we have declared a private property. This is one way of using an extension to house private variables. 

After the @implementation tag, we have declared both a private and public ivar between the curly braces. We can still achieve data encapsulation without using extensions, as shown here. 

To recap - 

  1. Try and declare ivars or properties in categories as much as possible - this limits their exposure to the class they are declared in.
  2. If you need to expose a variable to other classes, try and make them readonly properties in your interface file and make them readwrite in your implementation file.
  3. All property variable are backed by ivars - they simply generate setters and getters automatically and come with several attributes you can utilize that deal with atomicity, encapsulation (readonly) and memory management.
  4. NOTE - If you don't want to put your variables in an extension, you can just put them in curly braces right after your @implementation "class name" in your .m file. This also encapsulates your data - I prefer using extensions because I think it makes my code more readable. 

Last but not least, if you want other classes to be able to change the values of a variable you have, you can keep the variable private and still accomplish this - create a public method, declared in your interface file, that internally changes the value of the private variable (see below). This is often used for further encapsulation, as it is bad practice in a large code base to allow other classes to directly change the value of an object. Once again, the way you choose to encapsulate your data depends entirely on your preferences and the data you have at hand - different pieces of software will always have different encapsulation requirements. 

On the left, we have the interface (.h) class file, and in it, we have a public readonly property and a public method. On the right, we have the implementation (.m) file with an extension declared. In the extension, we have redeclared our readonly property to be readwrite - other classes can now READ the variable value directly (it is publicly declared on the left) but cannot directly alter the value of it (since it's publicly readonly). In order to change the value, other classes must go through our publicly declared method, the implementation of which changes the property value.

On the left, we have the interface (.h) class file, and in it, we have a public readonly property and a public method. On the right, we have the implementation (.m) file with an extension declared. In the extension, we have redeclared our readonly property to be readwrite - other classes can now READ the variable value directly (it is publicly declared on the left) but cannot directly alter the value of it (since it's publicly readonly). In order to change the value, other classes must go through our publicly declared method, the implementation of which changes the property value.

Data Structures - Hash Tables

Associative arrays, structures which have meaningful keys associated with values instead of indexes being associated with values, are another abstract data type. The first rule to remember is that keys must be unique, but values can be duplicates. Custom objects can be stored as values, and implementations of associative arrays differ across languages; in Objective-C, we have the NSDictionary class. Underneath the hood of every associative array is a data structure known as a hash table.

A hash table is essentially a pair of arrays, one that stores a key and the other that stores an associated value. Entries in a hash table are called buckets, and hash tables begin with multiple buckets that are waiting to be filled with data. What determines which bucket will be used to store a value from our key value pair? The answer to that lies in understanding a core part of the hash table: the hash function.

   This diagram illustrates the basic idea behind a hash function. It takes in raw data and through some deterministic operation outputs an index.

 

This diagram illustrates the basic idea behind a hash function. It takes in raw data and through some deterministic operation outputs an index.

Imagine we have some raw data, perhaps a custom object, that we are going to store in a hash table. How does a hash function determine where to store that data? Hash functions can be written in many different ways, and some are much better than others. In its most generic sense, a hash function takes some data input (let's focus on the key in our key-value pair here), and performs some operations on it to generate a large integer. This is a way of purposefully representing that data in a shortened way; this is known as hashing, and the integer output is a hash or a hash value. There are two rules to remember when discussing hash functions: 1. Hashing is an irreversible process (it is not invertible) and 2. some information is lost when hashing.

A hash function should be able to generate some data from some input using the same set of rules every single time; we can say that hash functions must be deterministic. While a hash function should produce the same hash for two equal objects (allocated separately but holding duplicate data), two equal hashes do not necessarily come from the same data. When two different pieces of data result in the same hash value, we have what is known as a collision; we will discuss how to manage collisions shortly.

A hash function takes a key from a key-value pair and ultimately maps it to an integer within the range [0, N-1], where N is the capacity of the bucket array of our hash table. The goal is to use h(k), which represents the hash value of a key (where h is our hash function and k is the key), as an index of our bucket array. Depending on the hash function, our hash value for a key may be a very large integer; we must find a way to reduce that number down to a size relative to the size of the hash table. An example of a way to do this is by taking the modulo of the integer with regard to the number of buckets; this final output would represent the hash table bucket array index. We then map to that index of the array and store the value in our key-value pair.

The implication of all of this is that if we need to find a value and we have the key, we can just pass that key to our hash function, get our final hash value output, and index directly into the array using that value. Indexing into an array, remember, is a random access operation (and is a constant time operation). Although similar to arrays in that we have random access, it is unlike both arrays and linked lists in that there is no concept of a linear/binary search or traversing (which are all linear time functions).

Collision Management

Hash functions sometimes result in collisions - this is when a generated hash value isn't unique enough between two different objects, resulting in our function attempting to map two different objects to the same bucket. One way to deal with this is to have a bucket where multiple values are attempting to be stored simply hold an array or a linked list. This way, when you index into that bucket, you can either iterate through an array storing multiple objects, or you can traverse over a linked list to do the same. This solution to collisions is known as separate chaining.

Next time we'll be covering trees and graphs (briefly). After that, it's all about Cocoa Touch.

Data Structures - Stacks & Queues

Today we're going to talk about stacks and queues - two abstract data types that just use data structures (arrays or linked lists) under the hood. For Objective-C support, head near the bottom of the post.

   LIFO (stack) vs FIFO (queue)

 

LIFO (stack) vs FIFO (queue)

To visualize a stack, think of a stack of dishes - you wash the first one, dry it, and place it on a table. Every other dish that is washed and dried is placed on top of this first one, creating a stack. Which dish comes off first when it's time to put them away? The conventional way to do it would be to pull the topmost item off of the stack and put it away. This explains the protocol followed by stacks which is LIFO, or last in, first out. The last item to go on the stack (dish at the top) is the first to come out.

Stacks are simple because you can generally perform three operations - push (add something to the top of the stack), pop (you can take something off the top of the stack), and peek (you can see what item is at the top of the stack without doing anything to it). They're sort of a wrapper around an array or linked list with limitations on what operations can be performed.

The term "stack" can also refer to a concept involving memory allocation - see the bottom of the post for information on this.

Queues can be visualized by thinking of a line in a store - the first person to get on line is the first person to be serviced at the counter. This data type follows the FIFO protocol - first in, last out. The first person in line is the last person to get out of the line. Often times, queues are used in concurrency situations - you can create queues to order processes and control what process runs when.

*In the context of this blog, we are talking about heaps and stacks as they exist within the realms of CPU memory. Keep in mind, however, that heap may also refer to the tree-based data structure, entirely separate from CPU memory. Heaps can be max or min heaps; there is a lot to learn about them, so we will not be covering them as data structures in this blog series.

OBJECTIVE-C SUPPORT:

Stacks are not explicitly supported in Objective-C, but they are simple enough to implement with, say, an NSMutableArray. You can add a category to the NSMutableArray class and add methods for pop, push and peek; it's that simple.

Queues are supported in Objective-C in a few ways: NSOperationQueue/NSOperations, and Grand Central Dispatch (GCD). GCD is lower-level than NSOperationQueue, and is written in C; however, it is important to understand that although it adds another layer of abstraction, NSOperationQueue uses GCD under the hood. You can also implement a queue yourself similarly to how we discussed doing so for a stack using a mutable array.

GCD provides queue implementation for use with many things, concurrency being one of them; it does so in the form of blocks (read up on them!). Both allow for priority queues, meaning you can insert processes to run into the middle of an existing queue by setting its priority.

Whether to use GCD vs NSOperationQueue is a point of contention, but it should be noted that GCD is much more light-weight than NSOperationQueue. It involves no thread-safety locking mechanisms, and is less taxing on the CPU. As such, some developers prefer to use it, as long as it suits their needs and they require none of the higher-level functionality of NSOperationQueue.

*In following convention, UI updates should happen on the main queue, and other processes should be executed on background queues. This ensures that there is minimal (or no) lag for the end-user of an application.

Left for review are hash tables, trees and a brief introduction to graphs. I will try and be brief with the last two so that we can move on to more iOS specific concepts.

 

Perhaps without realizing it, programmers use stacks every day in coding - the stack refers to a place in computer memory (along with the heap). The heap refers to memory on the CPU dedicated to items that are explicitly allocated memory in a program. It is up to the programmer to release the memory holding onto those objects when they're no longer needed. Heap memory is accessed using pointers (or some other reference), and as such, it is somewhat slow to read/write to. The stack, on the other hand, is an abstract data type used to "manage" a region of CPU memory that stores temporary variables which are created by functions. When we refer to "the stack", we are talking about the region of memory (versus just "a stack" which refers to some instance of the data type). Whenever we enter a function, all variables created are pushed onto a stack frame, which itself sits on the stack. A new stack frame exists for each function. When a function exits, the frame is popped off of the top; once a stack variable is freed, that piece of memory becomes available for other stack variables. FOR A LIST OF PROS AND CONS BETWEEN STACKS AND HEAPS, REFER TO THE BOTTOM OF THIS POST.

HEAP

PROS:

  • Variables can be globally accessed
  • There is no limit on memory size (limitations set by CPU)

CONS:

  • Relatively slow access
  • Programmer is in charge of allocating and deallocating variables
  • Efficiency regarding use of memory is not guaranteed.

STACK

PROS:

  • Fast access
  • Don't need to explicitly deallocate variables
  • Memory won't become fragmented from deallocation (efficient use of memory)

CONS:

  • Access to local variables (within function) only
  • There are limitations on size of the stack

 

 

Data Structures - Linked Lists

Before we dive into a new data structure, be aware that there is a great quick guide on Computer Science concepts (data structures and common algorithms) that you can access here.

Now, let's discuss another data structure that builds off of our knowledge of arrays: the Linked List.

The linked list is made up of units called nodes, and each node contains two things: a value, and a pointer to the next node so that we can find it in memory. Unlike arrays, linked lists are collections of data stored that are not stored contiguously. Because of this, linked lists can grow in size very easily; you can simply add on an object to a linked list anywhere in memory, and the previous node will simply contain a pointer to the new node, wherever it may be.

Linked lists, as data structures, are often used to implement other abstract data types; they describe functionality more than implementation. The underlying implementation can be due to data structures such as arrays, hash tables, linked lists, etc. For example, stacks, queues, and associate arrays can be implemented using linked lists; they themselves are abstract data types, but are implemented using data structures.

Another way to differentiate between data structures and abstract data types is that data structures are looked at from the perspective of the person implementing them; an abstract data type is viewed from the perspective of a user, meaning that the data is described by its behavior.

Just as a developer can have variations of a standard array (think dimensions), we can have doubly linked lists, where nodes contain pointers to both the next node and the previous node, and even circular linked lists, where instead of the last node pointing to a "dummy" node, as is sometimes done, the last node contains a pointer to the first one. Linked lists support sequential access; that is, you must traverse from the "head" node through the list until you find your desired object. This means that indexing in a linked list is a linear time function; the Big O notation for linked list access (and also searching) is O(n).

                                This is a diagram illustrating the insertion of a node to an existing linked list

                              

This is a diagram illustrating the insertion of a node to an existing linked list

 

However, insertion and deletion are optimized operations in linked lists - they are both constant time functions (O(1)). If something needs to be inserted, we simply redirect the pointers in the linked list so that the intended previous one points to the new node, and the new node points to the intended next one. The same goes for deletion; we can simply redirect the node pointers so that nothing points to a certain node, and deallocate it from memory.

*note - if you don't already have a pointer to a node that you want to delete, the actual time for deletion would be (search_time) + O(1). The O(1) represents the constant time deletion. The same goes for insertion.

Arrays, at least in my experience, are used far more often than linked lists. There isn't even support for linked lists in Objective-C, although you can implement your own in one of a few ways; check stack overflow for more information on replicating linked list functionality in Objective-C.

One time a linked list may be useful is when using it to implement, for example, a queue. As we will learn, queues are FIFO (first in, last out), which means that the first object to be placed on a queue is the first one to exit the queue. Because insertion is a constant time function for linked lists, inserting to and deletion from the middle of a linked list is optimized. For a priority queue, in which you may break the FIFO rule when something has a high priority flag placed on it, linked lists would work nicely since you can remove directly from themiddle of one.

Next time we'll be covering very simple abstract data types - stacks and queues!