Leaving Amazon

OpenAI is so much fun!

I鈥檇 like to preface this by stating that Amazon is obviously a huge company, and my opinions are just that, one person鈥檚 opinions. There will probably be some people that share my frustrations while others have had a completely different experience.

I interviewed at AWS in early 2020, pre-pandemic. The interview process is grueling and I spent considerable effort preparing for the 5-hours-long pantomime of absurd algorithms trivia and 鈥渢ell me a time when you said no鈥 behavioral questions. COVID-induced visa processing delays pushed my start date forward in time many times. The high-stress interview process and years-spanning wait built up tremendous anticipation. In hindsight I can say I probably had somewhat unrealistic expectations when finally joining the company in late 2021.

Regardless, I was quite frankly shocked after my first couple of weeks, and my first impression was that this kind of work was not for me. As a software engineer, I expected to eventually do some software engineering. I鈥檓 not sure how to describe the work that first team I joined was doing, but I can鈥檛 in good conscious call it software engineering.

Book review - Sapiens

馃摎 Sapiens (2014), a book by Yuval Harari.

Rating: 猸愨瓙猸愨瓙 路 Read: July 2022

Sapiens in an image.

Sapiens has no central point being made; rather there鈥檚 an intricate web of mostly interdependent theories and speculations. This makes for an enjoyable read, however at times it is easy to lose sight of the original premises used to build up on increasingly speculative conclusions.

Book review - Collapse

馃摎 Collapse (2006), a book by Jared Diamond.

Rating: 猸愨瓙猸愨瓙 路 Read: June 2022

Hvalsey church ruins, Greenland. Credit: https://en.wikipedia.org/wiki/User:Number_57

Collapse is a fascinating, if somewhat exhausting, read. The central point of the book is that environmental changes, man-made or not, have been responsible for many a civilization鈥檚 collapse.

DALL路E minis of the future won't be fun

I鈥檝e been playing with dalle-mini the last few weeks. Part of what makes it fun to play with are the bizarre and obtuse outputs. They reached that sweet spot between laughably bad and frighteningly perfect: they鈥檙e good enough to be understood and enjoyed, basically.

I think that incompleteness is part of what makes it so amusing to toy with these things, and conversely what will make future versions much less fun.

Teslas are a dystopia

This could have been a subway.

Since moving to Vancouver metro I鈥檓 continuously surprised with how common electric vehicles have become. Some may see the rise of EVs as an exciting turn towards a futuristic solarpunk utopia. I see the opposite: they are a dystopia of sorts. They are a dead end, a waste of resources in the wrong direction, a false hope.

The proliferation of personal electric vehicles is a strong marker of failure. It is a manifestation in the physical world of our inability as a society to move on from a clearly failed, car-centric way of living. Teslas kind of epitomize this 鈥 despite being just another very expensive car, despite catching fire every now and then, despite bizarre QC issues, despite its nitwit CEO, still they are seen as cool and fashionable and trendy.

Are "digital nomad visas" a thing yet?

Immigration sucks. In addition to the personal toll it takes on anyone, it is also mind-numbingly tedious and baroquely complex. Why aren鈥檛 things better by now?

This reminded me of the so-called 鈥渄igital nomad visas鈥. Searching for that term will get you a thousand clickbaitey Wordpress sites with the 鈥20 best countries with nomad visas鈥 or whatever. But how real are they really?

Botched interviews

Here鈥檚 something I鈥檝e been wanting to write for a while: all the times (the ones I can remember, anyway) I bombed a software engineer job interview. There are so many 鈥渉ow I aced interviewing at X鈥/鈥漢ow to pass X interview鈥 floating around that I thought the opposite story would make for an amusing read.

My first developer job was as an intern at a big tech company in 2012. I think that was one of the worst interviews I鈥檝e had, by the way 鈥 I could barely understand the interviewer over the cellphone, and those were the days of 鈥渉ow many piano players are there in New York鈥-kind of questions. I thought it went terrible, but I got the job somehow. On the other hand I鈥檝e had many interviews I thought I did great but bombed anyway.

Analyzing LinkedIn's data export: what happened in 2021?

I鈥檝e been using LinkedIn basically since I started working as an intern back in 2012. My usage is mostly limited to posting my blog posts, except the couple of times I used the platform to search for a new job. So most of the time, LinkedIn has been pretty slow-paced, with maybe half a dozen random recruiters reaching out per year.

However, since the Covid-19 pandemic started, and particularly in 2021, things seem to have gone a little crazy, with a lot more recruiter activity. I was curious to see just how much things had changed, so I looked at LinkedIn鈥檚 data export.

Onsites considered harmful

A couple of years ago I interviewed at one of the largest Ruby shops out there. Screening went well, and some days later I was invited for an onsite.

These were the good old pre-covid days, so an onsite really meant onsite. You had to travel to the office, wherever that was.

The thing is, an onsite is actually radically different depending on where you live. It follows that onsites introduce further bias into our industry鈥檚 already problematic hiring process. I鈥檇 like to argue that although onsites have some advantages, they鈥檙e mostly a waste of time (and money).

Efficient resource distribution

TLDR A simple metrics-based ranking system is good enough to decide who gets how many resources.

Computational resources 鈥 CPU time, memory usage, network traffic etc 鈥 are limited. This may be more or less of a problem depending on project/company size and so on; if you鈥檙e working on a smaller product with limited traffic, it might not be meaningful at all.

Once past a certain threshold though, expenses with such resources become non-trivial and it begins to make sense to spend some time thinking about how to distribute them as efficiently as possible.

Here鈥檚 the problem that got me thinking about this: at work, we had a computational resource that needed to be consumed by a large fleet of workers (think several thousand concurrent), but each type of worker had different productivity, and that productivity changed over time. How can we decide who gets what?

I replaced Google Analytics with a web server running on my phone

TLDR I built android-analytics, a web analytics tracker running on my phone.

Say you run a blog, personal website, small-time business page or something of the sorts. Say you also want to keep an eye on how many visitors you鈥檙e getting.

The first thing that most people think at this point is 鈥淕oogle Analytics鈥. It mostly works and is free. Its also hosted by Google, which makes it very easy to start using. There aren鈥檛 many competitors that bring those points to the table, so Google Analytics usually wins by WO at this point.

I used to use Google Analytics to track this blog for those same reasons. But after finding out about Termux and writing this post about installing a web server on an Android phone, I started toying with the idea that I had this ARM-based, 2GB RAM, Linux-like device with Internet connectivity which must be more than enough for a simple webcounter-like application. After a few weeks of tinkering, here it is!

Setting up a free HTTPS home server

Try searching for 鈥渇ree dynamic dns https鈥, 鈥渇ree domain with SSL鈥 or anything similar. There won鈥檛 be a lot of meaningful results. Sure, some of the results are pretty close, like this guide on how to get free SSL certification from Cloudflare, or this one on setting up a free dynamic hostname with SSL, but they all assume you already own a domain. If you鈥檙e looking for a completely free domain that you can use for your personal web server that also has verified SSL, there are very few results.

But why was I even looking for this?

I鈥檓 working on a side project. It has a web server that communicates with a static web page hosted on Github Pages. There are a lot of ways of setting that up; in my particular case, I have a local (as in in my house) HTTP web server accepting traffic on a non-standard port (port 80 is blocked by my ISP for commercial reasons 鈥 this detail is of paramount importance, but more on that later). It is accessible through my external IP (which is dynamic), which can be mapped to a dynamic DNS domain.

I wanted to run a simple API on the web server and access it through static pages (like this blog) hosted on Github Pages (which has a verified SSL certificate). I asked the Internet if it is possible to call from a SSL-verified page (in JavaScript) a different server that does not have a verified SSL certificate (that is, my aforementioned webapp running in my home server). It isn鈥檛, so the conclusion was that I needed somehow to get a verified SSL certificate for my dynamic DNS domain.

Having no idea whether this was possible, I started to research.

Communication tips for remote developers

We're all remote -- for now.

Communicating well with your co-workers and managers is supremely important to a software developer, and even more so for the remote one. With a lot more remote workers due to the COVID-19 pandemic, this topic became a lot more relevant.

I鈥檝e seen people hint at this more than a few times over the years, but I didn鈥檛 really 鈥済et it鈥 until I started working as a fully remote engineer. I also find it important to understand not only what we should be doing to achieve efficient communication, but also why we should be doing those things in those ways.

To me, the single most important thing to keep in mind is that people鈥檚 mental resources: time, attention span, etc, like yours, are limited.

Figuring out the Nvidia x Linux puzzle

Ubuntu's power rate over time.

I鈥檝e struggled with some kind of problem with Nvidia graphics cards on Linux since forever.

Most commonly, an external monitor wouldn鈥檛 work or the dedicated card would refuse to power off when it should.

The latter problem 鈥 a power-hogging discrete Nvidia card not turning off when it isn鈥檛 needed, specifically in Optimus-enabled laptops 鈥 has consistently haunted me throughout the years. At least in my experience, this problem is in that sweet spot of things that are definitively annoying and kind of inconvenient, but complicated enough not to be worth the several work-hours needed to definitively solve it.

Repurposing an old Android phone as a Ruby web server

CC-BY Carlos Varela, https://www.flickr.com/photos/c32/7755470064

Do you have an old Android phone? Sure you do! There鈥檚 a mind-blowing amount of electronic waste of all kinds, and with the average person in developed countries discarding their phones every couple of years, discarded smartphones are probably one of the most common forms of e-waste.

I had an old Motorola G5 Cedric gathering dust, so I decided to do something with it 鈥 it is now running a Puma web server with a simple Sinatra webapp.

Now, before going any further, you might be thinking: what is the real, practical use of all this? An old Android phone probably isn鈥檛 going to have a stellar performance, but neither do those t2.nanos, honestly. I鈥檓 yet to deploy any 鈥渞eal鈥 code on an Android, but even the cheaper and older phones do commonly have quad-core or even octa-core CPUs, and at least 2GB RAM, so at least in theory a phone should be close 鈥 ballpark, at least 鈥 to the most modest cloud IaaS offers our there (t2.nano has 512MB for instance). Of course, a phone has an ARM processor while IaaS usually are x86; memory management is entirely different as well, but still 鈥 we鈥檙e talking ballpark estimates here.

Anyway, this is a short tutorial on how to repurpose an Android device as a web server 鈥 or any number of different things, really.

Speeding Up the Backend with Graph Theory

Here at Sensor Tower we handle large volumes of data, so to keep things snappy for our customers we need to think carefully about how we process and serve that data.

Understanding the data we鈥檙e handling is a fundamental part of improving the way we serve it, and by analyzing how an important backend service worked, we were able to speed it up by a factor of four.

My attempt at creating more

I began blogging in the now prehistoric late 2000s.

I鈥檝e done a few blogs about different subjects (computer science, algorithms, web development, short stories and political ramblings). I鈥檝e had blogs on Blogspot, Wordpress and, more recently, Medium.

Those platforms were (or are, I suppose) an easy way to spew your ideas over the Internet while also being nice and comfy for other people to actually read (this last point is important for the CSS-challenged such as yours truly). In other words, those services Got Shit Done鈩.

10 ways not to do a big deploy

Ideally, deploys should be small, concise, easily revertible, fast and with a small or nil footprint on the database. However, no matter how awesome you are, sometimes that is just unattainable and you end up needing to deploy something that is just the opposite: big, messy, hard to revert, painfully slow and rubbing the DB the wrong way. If the deploy messes with a mission-critical part of your software, all the worse for you.

But there are actually many ways you can make those situations even worse. Here are a few bullet points you can follow to guarantee a nightmarish deploy complete with nasty side-effects that will haunt you and your coworkers for days to come.

Halving page sizes with srcset

Web bloat is discussed a lot nowadays. Web pages with fairly straightforward content鈥娾斺妔uch as a Google search results page鈥娾斺奱re substantially bigger today than they were a few decades ago, even though the content itself hasn鈥檛 changed that much. We, web developers, are at least partly to blame: laziness or just bad programming are definitively part of the problem (of course, laziness might stem from a tight or impossible deadline, and bad code might come from inexperienced programmers鈥娾斺妌o judgment going on here).

Working remotely in a non-remote company

We鈥檙e a small team here at Guava, and we鈥檝e always considered ourselves remote friendly. Most of us work remotely every now and then pushed by varied force majeure situations鈥 be it the flu, the need to supervise renovation or construction work at home, flash floods near the office, receiving guests at home or any number of other situations. We鈥檝e also had a few of us working remotely for a few days or weeks while traveling to or back from a conference, or when visiting relatives that live out of town. In other words, remote working has always been a very temporary and circumstantial thing among us.

We have a nice office (with hammocks!), excellent work equipment, great desk space, comfortable chairs, plenty of snacks and comfort food and an infinite supply of coffee. We鈥檙e also easygoing and overall pleasant people (well, most of us are) to work with several hours a day, and some of us are even mildly funny.

The 5 stages of dealing with legacy code

Yes, this article will use the 5 stages of grief as an analogy for something software development-related. There are at least a few thousand other articles with a similar motif (424,000 results for 鈥済rief stages software鈥 according to Google). But bear with me for the next 5 minutes and I promise you鈥檒l get something out of this鈥娾斺奿f nothing else, at least the smirk of those who read their past follies put on text by someone else.

I have been working on a rather big Rails project for the past year and half. The project is nearly 7 years old, and has an all-too-common successful-startup-bought-by-industry-giant background story. In a project with this kind of background, some things are bound to happen: many developers of many skill ranges have come and gone, many software fads (cough, Meteor, cough), and above all else a lot of legacy code that is, well, let鈥檚 put it nicely, not so great. None of this should be taken personally in any way鈥娾斺奿t is just natural for such things to occur in such projects.

Improving spec speed in a huge, old Rails app

We got a 6-year-old Rails app with ~370k LOC and a ~6k-test suite which took 24 minutes to complete. Not good! We took a few days off of the main project to see if we could make things better.

More often than not, test suites are the nasty underbelly of a Rails app. Size and age just aggravate the problem. Tests are seldom a high priority in any project, and speed might not be an issue at all in smaller apps where the whole test suite might take just a few seconds to complete. As the project grows and the CI takes increasingly longer to complete, spec speed suddenly becomes more of an issue.

鈥淪mall鈥 and 鈥渘ew鈥 are not exactly the case for a certain Rails project we鈥檙e working on here at Guava. We鈥檙e talking about a 6-year-old e-commerce portal with ~370k LOC, a couple million customers and a ~6k-test, 300-spec suite which took, on average, a whopping 24 minutes to complete in our CI. Not good! So we took a couple of days off the main project to see if we could make things better鈥娾斺妎r less worse.

How a Unix CLI tool made me care about software feedback

Providing feedback is one of the most important parts of any software. Unfortunately, more often than not we tend to downplay or ignore the very simple yet crucial task of letting the user know what is going on. In this article I鈥檒l use a short cautionary tale of how the lack of proper user feedback (and some laziness, I admit) almost cost me an entire HDD with years of personal data.

Can you tell by the output of dd that the device will be completely and irrevocably wiped out? Hint: while the operation is running (i.e. before hitting CTRL+C), there _is no output.

When Postgres is not enough

What happens when your project鈥檚 RDBMS is just not enough to deal with unexpectedly huge amounts of data?

You could try to de-normalize some tables here and there to avoid unnecessary JOINs, create a few indexes, implement some kind of pagination or even pre-process the data into a more palatable format. However, if you did all that and it still was not enough, the 鈥渘atural impulse鈥 is to give up on the RDBMS altogether and just use Elasticsearch. Sounds like a no-brainer, right?

Don't obsess over code DRYness

Being clever is a good thing for a developer. Ingenuity allows us to write software that solves complex real-world problems. However, 鈥渃lever鈥 code is not always a good thing. In many cases鈥娾斺奍 dare say in most cases鈥娾斺奿t is a very bad thing. I consciously try to avoid writing code that might be seen as 鈥渃lever鈥. The smart thing to do is trying hard not to be smart (yes, very 1984).

Developers tend to see themselves (quite indulgently) as smart people. Not many people understand what we do, and society sees a developer as a kind of modern wizard, writing unreadable magic spells in a small metal box. In reality, though, we are not half as smart as we think: for instance, if you are a developer, you are certainly familiar with the frustration of trying to understand some cryptic piece of code that seemed perfectly reasonable and straightforward when you wrote it a couple of months earlier.

Building a shared library in C and using it in a Python program


Figure 1

How do old-time languages such as C, Fortran and others survive in a world with Python, Ruby and so on?

There is plenty legacy code still around which need maintaining, of course. And there are (will always be?) a few specific applications where low level is needed. But one of the great things with software is building upon old stuff using new tools, which brings us to our topic today: building a shared library containing some of our C stuff and using it in nice and comfy Python. Figure 1 shows an example of what we can achieve by using graphical tools available in Python to improve our existing code鈥檚 text-based output. More on that later on.

For our purposes, we consider shared libraries as a collection of compiled objects condensed into a single file, which may then be called by other software. This is, of course, a simplification. A longer discussion about shared and static libraries can be found in [1].

Trees, part IV - Benchmarking Red-black and AVL trees

In our previous installments we implemented two of the most well-known self-balancing binary search trees: AVL and Red-black trees.

We had a few classes on AVL trees in our basic data structures & algorithms class back in college, which made its implementation far less of a challenge than the Red-black tree. So besides the fundamental guidance of CLRS I had to do quite some googling to get it working. While googling I noticed there were quite a lot of questions about which (AVL or RB) tree was 鈥渂etter鈥 in some sense, be it insertion, search time, deletion time, etc. Most textbooks and articles dismiss this question just by stating the factor differences in either trees鈥 worst case heights, as we briefly mentioned in the past installment. If you鈥檙e anything like me, however, you鈥檒l want to see some comparisons where the trees are actually tested. So I decided to do some simple benchmarking to test those theoretical worst-cases. Here鈥檚 what I found out.

Trees, part III - Red-black tree

In our last installment on trees, we studied and implemented the AVL tree. The AVL tree is one of many self-balancing binary search trees, a special kind of BST that enforces sub-linear operation costs by maintaining tree height close to the theoretical minimum of $latex log_{2}(n)$. This is usually done by what is called tree rotation, which is basically moving around tree nodes (and updating some special node properties).

As you can see in the Wikipedia page鹿, AVL trees guarantee that the tree height is strictly less than $latex \approx 1.44~log_{2}(n)$, while Red-black trees have a slightly worse threshold of $latex \approx 2~log_{2}(n)$; thus, AVL trees will provide significantly better search times than Red-black trees. However, while AVL trees may need to do $latex O(log(n))$ rotations after each insertion, Red-black trees must do at most 2 rotations per insertion. So either one may be your tree of choice depending on the application: if search time is critical but data doesn鈥檛 get updated too often, an AVL tree will perform better; whereas a Red-black tree will perform better in scenarios where data is constantly being changed.

Self-balancing BSTs add some kind of property to tree nodes that make way for tree balancing: with AVL trees, it was the 鈥渂alance factor鈥. With Red-black trees, a 鈥渃olor鈥 property is added to each node. This leads us to the Red-black tree properties:

  1. Every node is either red or black
  2. Every leaf is black
  3. If a node is red, then both its children are black
  4. Every path from a node to any of its descendant leafs contains the same number of black nodes

Ruby DSL & metaprogramming, part II

In the previous installment we built a simple text generator using some Ruby meta-programming tricks. It was still far from being our desired context-free grammar (CFG) generator, though, since it lacked many CFG prerequisites. Most flagrantly, we had no rule recursion and only one production (rule definition) per rule. Here鈥檚 the what a script that would use both features:

  noun 'dog', 'bus'
  verb 'barked', 'parked'
  preposition 'at'

rule 'phrase'
  opt 'The', noun, verb, preposition, 'a', noun
  opt 'Here goes some', phrase, 'recursion.'
  opt 'Meet me', preposition, 'the station.'

grammar phrase: 10

The dictionary section is just as we left it. Let鈥檚 see what changed in the rule section.

Ruby DSL & metaprogramming, part I

I鈥檝e been working with Ruby for nearly a year now, which means I鈥檓 starting to feel the urge to tell people how awesome the language is. One of the most interesting aspects of Ruby to me is metaprogramming, which it seems to have quite a vocation for.

Since college I have a fondness for automata and formal languages theory. One of the topics I particularly like is text generation (if you haven鈥檛 already, check out the excellent SCIgen and the Dada engine), so I thought that building a Context-free grammar (CFG)-like text generator in Ruby would be a nice little exercise and an opportunity to use some of the language鈥檚 coolest features. Also I鈥檝e implemented one of those using Java several years ago, and it was a mess, so I was curious as to how much of an improvement would Ruby offer.

Suppose the following script:

dictionary 'noun', 'dog', 'bus'
dictionary 'verb', 'barked', 'parked'
dictionary 'preposition', 'at'

rule 'phrase', 'noun', 'verb', 'preposition', 'noun'

codex 'phrase'

We鈥檇 like dictionary to store some words according to their classes, and rule to define a specific ordering of words. For now let鈥檚 not worry about codex (it鈥檚 just a collection of rules).

At this point the seasoned programmer is mentally sketching some kind of text parser. It鈥檚 an okay solution, but isn鈥檛 there something nicer we can do? Well, there is: DSLs! In fact, Ruby is quite an excellent tool to build a DSL, and many famed Ruby-powered applications such as Rspec (and many others) define some kind of DSL.

Trees, Part II: AVL Tree

Masters classes started a few weeks ago, taking their toll on my productivity here. Sorry about that!

So we (pardon the nosism, but I think it sounds less egocentric than writing 鈥淚鈥 all the time) hinted at AVL trees back on our Trees, Part I post. Specifically, we learned that:

a binary search tree (BST), provides O(h) time search, insert and delete operations (h is the tree height.

Linear time (O(h)) doesn鈥檛 sound very good - if h is close to n, we鈥檒l have the same performance as a linked list. What if there were a way to bound the tree height to some sub-linear factor? As it turns out, there are several ways to do so, and the general idea of somehow keeping the tree height limited to a certain factor of the number of elements it holds is called height balancing. Ergo we鈥檒l want to look into (height) balanced/self-balancing binary search trees **(BBST). **


                        .   .
                      .       .
                    .           .
                  .               .
                E .                 P .
              .     .                   .
            .         .                   .
          .             .                   .
      D .                 I                   Y

AVL tree

Since binary search trees have at most two children, the best tree height (i.e. smallest) we can achieve is log2 n (n being the number of elements in the tree). There are several self-balancing BSTs developed over the years. It seems that up there in the US college professors tend to prefer the red-black tree when studying BBSTs, whilst over here AVL is preferred. In any case, AVL tree was the first BBST ever devised, so we鈥檒l adopt it as our BBST model.

AVL trees (named after its two Soviet inventors Adelson-Velsky and Landis) use a series of rotations to keep the tree balanced. To keep track of when a certain subtree rooted at some node needs to be rotated, we maintain (or calculate) a balance factor variable for each node, which is the difference between the node鈥檚 left and right children鈥檚 heights, i.e.:

balance_factor(n) = n.left_child.height - n.right_child.height

Shortest path, part I - Dijkstra's algorithm

Now that we have a way to represent graphs, we can discuss one of the most important problems in graph theory: the shortest path problem (SPP). More or less formally, we鈥檒l define SPP as:

Given a weighted graph G(V,E), find the sequence P = {v0, v1, v2, 鈥, v(n-1)}, vi 鈭 V, from vertex V0 to vertex V(n-1), such that the list of edges EP = {(v0,v1), (v1,v2), 鈥 (v(n-2), v(n-1))} exists and the summation of costs of all elements e 鈭 EP is the smallest possible.

In other words, find the less expensive (ergo 鈥渟hortest鈥) path between two vertices.

The trivial solution is using BFS starting at vertex A and stopping when it reaches vertex B. However, BFS doesn鈥檛 look at the edge costs: it calculates the path with least edges, not the path with least total cost.

Although not necessarily the fastest, Dijkstra鈥檚 algorithm is probably the most popular way to solve the shortest path problem due to its simplicity and elegance. The algorithm relies heavily on priority queues, so make sure to take a look at that if you haven鈥檛 already.


dist[from] = 0
for v : G
      if v != source
            dist[v] = infinity
      prev[v] = -1
      PQ.add(v, dist[v])
while PQ.hasNext()
      u = PQ.pop()
      for each neighbor v of u
            alt = dist[u] + length(u, v)
            if alt < dist[v]
                  dist[v] = alt
                  prev[v] = u
return prev

Trees - Part I


render(鈥/image.*鈥, caption: 鈥楤right green tree - Waikat鈥, src: 鈥//upload.wikimedia.org/wikipedia/commons/thumb/f/f6/Bright_green_tree_-Waikato.jpg/512px-Bright_green_tree-Waikato.jpg)](http://commons.wikimedia.org/wiki/File%3ABright_green_tree-_Waikato.jpg)

We used trees to build the heap data structure before, but we didn鈥檛 bother with the theory behind trees, which are abstract and concrete data structures themselves. There鈥檚 a huge range of material to cover so I鈥檒l split this in several posts.

In this first post we鈥檒l cover the basic theory and implement a binary search tree (BST), which provides O(h) time search, insert and delete operations (h is the tree height). First, the basics:

Trees are graphs with a few extra properties and interpretations/conventions. * Trees have height (longest branch length) and depth (distance to root). * The uppermost level consists of at most one node (the tree root). * All nodes may have children. * There are no edges other than parent-child edges.

Trees are classified according to some of those properties above and some others we鈥檒l mention later. Most commonly, there is a constraint to the maximum number of children per node -e.g. the binary tree limits children to 2 per node.


Mathematically, a graph is a set of vertices and edges, thus a graph G is usually written as G(V,E). Besides linking vertices in the graph, edges can also carry a specific value which may be interpreted as cost, weight, distance etc.

graph viewed with BurgerGF

In computer science, we鈥檙e interested in the (abstract) data structure used to implement the graph mathematical concept. Let鈥檚 first discuss the basic elements in a graph - vertices and edges:

typedef struct vertex
 unsigned long id;
 int status;
 double x,y;
 void* data;
} vertex;

Vertices should be able to hold any kind of data, so we鈥檒l just throw in a void pointer for that. Other than that we have an id, status (marked or unmarked - more on that later) and 2D coordinates so we can draw the vertices somewhere.

typedef struct edge
 vertex* from, *to;
 int cost;
} edge;

Edges consist of just pointers to the vertices they link and an optional value used as weight, distance, cost etc. Strictly speaking we could use a void pointer for that value as well, as long as we also defined a comparison function. But let鈥檚 save the hassle and just use an integer instead - most algorithms will be fine with that.

Heap & Priority Queues

Priority queues (PQs) are abstract data types that work just like regular stacks, but the popping order depends on each element鈥檚 priority instead of the sequence they were pushed onto the queue (FIFO or LIFO).

The na茂ve way of implementing a PQ consists of using an unsorted list or array and searching for the highest-priority element at each pop, which takes O(n) time. There are several more efficient implementations, of which the most usual is the heap.

Heaps are complete (i.e. all levels except possibly the last are filled) binary trees that work as PQs by maintaining the following property: children nodes always have a smaller priority than their parent, i.e. for any node A with children B and C, priority(B) < priority(A) && priority(C) < priority(A). Note that there is no assumed relation between siblings or cousins.

max-heap and corresponding array.

Each element of a heap has two pieces of information: a key and a value, hence we call them key-value (KV) pair. The key identifies the specific element, and the value determines the element鈥檚 priority within the heap. Heaps can be min-heaps (low value = high priority) or max-heaps (high value = high priority).

BurgerGFX - simple 2D graphics

sample code and outpu

Several times I find myself wanting to visualize something in 2D, but can鈥檛 bother to fire up OpenGL or other cumbersome API.

So I wrote a simple program which I called BurgerGFX, with 2 core functionalities: draw point and draw line. I find this to be quite enough for simple applications such as viewing a graph.

Setting up the drawing canvas, which I call burger, is simple: call create(width, height), which returns a pointer to the burger. Then simply call the draws, prints and cleans as needed.


Using our implementation of a doubly linked (DL) list, we can very simply build the most basic LIFO (last in, first out) data structure: the stack.


Stacks have two basic operations: push and pop. Push pushes data onto the stack (i.e., end of the DL list) and pop pops data off the list鈥檚 tail, which is only possible because we can set the new tail as tail->prev, since we鈥檙e using a DL list, with previous pointers. Another useful function is peek, which returns a pointer to the stack鈥檚 top.

Doubly linked list

A doubly linked list is like our previously implemented Linked List, but instead of only having pointers to the next element, it also has pointers to the _previous _element:


This property makes the doubly linked list very useful as a base for other data structures such as the stack: having a previous pointer means we can quickly (O(1)) remove objects from the list鈥檚 tail, which would be impossible with a linked list.

We won鈥檛 discuss implementation since it so similar to a linked list. If anything implementation is even simpler than a linked list because of the previous pointer access.


Very simple Vector implementation with add, add_all, get and delete operations using arrays of void pointers.

The downside to this as compared to simply using an array is that here we have an array of pointers, which means the data will most likely be scattered over the memory, not coalesced.


Mergesort is an important sorting algorithm when you don鈥檛 have efficient random memory access, since it doesn鈥檛 rely on that and has good time complexity - O(n logn) specifically.

As a typical divide-and-conquer algorithm, Mergesort has two steps: first it recursively splits the lists in two until each half is unitary, then it recursively mends back the lists until it reaches the original size.

But before we dive into the actual algorithm, we need to make some changes to the linked list algorithm we鈥檒l be using.

Linked List

Here鈥檚 a very simple implementation of the linked list data structure.

A pointer to the head element is enough to define a linked list. Each element consists of one pointer to the subsequent element in the list and one pointer to the element鈥檚 data: