Posted in 2015

Building a shared library in C and using it in a Python program

pathfindin

Figure 1

How do old-time languages such as C, Fortran and others survive in a world with Python, Ruby and so on?

There is plenty legacy code still around which need maintaining, of course. And there are (will always be?) a few specific applications where low level is needed. But one of the great things with software is building upon old stuff using new tools, which brings us to our topic today: building a shared library containing some of our C stuff and using it in nice and comfy Python. Figure 1 shows an example of what we can achieve by using graphical tools available in Python to improve our existing code鈥檚 text-based output. More on that later on.

For our purposes, we consider shared libraries as a collection of compiled objects condensed into a single file, which may then be called by other software. This is, of course, a simplification. A longer discussion about shared and static libraries can be found in [1].

Trees, part IV - Benchmarking Red-black and AVL trees

In our previous installments we implemented two of the most well-known self-balancing binary search trees: AVL and Red-black trees.

We had a few classes on AVL trees in our basic data structures & algorithms class back in college, which made its implementation far less of a challenge than the Red-black tree. So besides the fundamental guidance of CLRS I had to do quite some googling to get it working. While googling I noticed there were quite a lot of questions about which (AVL or RB) tree was 鈥渂etter鈥 in some sense, be it insertion, search time, deletion time, etc. Most textbooks and articles dismiss this question just by stating the factor differences in either trees鈥 worst case heights, as we briefly mentioned in the past installment. If you鈥檙e anything like me, however, you鈥檒l want to see some comparisons where the trees are actually tested. So I decided to do some simple benchmarking to test those theoretical worst-cases. Here鈥檚 what I found out.

Ruby DSL & metaprogramming, part II

In the previous installment we built a simple text generator using some Ruby meta-programming tricks. It was still far from being our desired context-free grammar (CFG) generator, though, since it lacked many CFG prerequisites. Most flagrantly, we had no rule recursion and only one production (rule definition) per rule. Here鈥檚 the what a script that would use both features:

dictionary
  noun 'dog', 'bus'
  verb 'barked', 'parked'
  preposition 'at'

rule 'phrase'
  opt 'The', noun, verb, preposition, 'a', noun
  opt 'Here goes some', phrase, 'recursion.'
  opt 'Meet me', preposition, 'the station.'

grammar phrase: 10

The dictionary section is just as we left it. Let鈥檚 see what changed in the rule section.

Ruby DSL & metaprogramming, part I

I鈥檝e been working with Ruby for nearly a year now, which means I鈥檓 starting to feel the urge to tell people how awesome the language is. One of the most interesting aspects of Ruby to me is metaprogramming, which it seems to have quite a vocation for.

Since college I have a fondness for automata and formal languages theory. One of the topics I particularly like is text generation (if you haven鈥檛 already, check out the excellent SCIgen and the Dada engine), so I thought that building a Context-free grammar (CFG)-like text generator in Ruby would be a nice little exercise and an opportunity to use some of the language鈥檚 coolest features. Also I鈥檝e implemented one of those using Java several years ago, and it was a mess, so I was curious as to how much of an improvement would Ruby offer.

Suppose the following script:

dictionary 'noun', 'dog', 'bus'
dictionary 'verb', 'barked', 'parked'
dictionary 'preposition', 'at'

rule 'phrase', 'noun', 'verb', 'preposition', 'noun'

codex 'phrase'

We鈥檇 like dictionary to store some words according to their classes, and rule to define a specific ordering of words. For now let鈥檚 not worry about codex (it鈥檚 just a collection of rules).

At this point the seasoned programmer is mentally sketching some kind of text parser. It鈥檚 an okay solution, but isn鈥檛 there something nicer we can do? Well, there is: DSLs! In fact, Ruby is quite an excellent tool to build a DSL, and many famed Ruby-powered applications such as Rspec (and many others) define some kind of DSL.

Trees, part III - Red-black tree

In our last installment on trees, we studied and implemented the AVL tree. The AVL tree is one of many self-balancing binary search trees, a special kind of BST that enforces sub-linear operation costs by maintaining tree height close to the theoretical minimum of $latex log_{2}(n)$. This is usually done by what is called tree rotation, which is basically moving around tree nodes (and updating some special node properties).

As you can see in the Wikipedia page鹿, AVL trees guarantee that the tree height is strictly less than $latex \approx 1.44~log_{2}(n)$, while Red-black trees have a slightly worse threshold of $latex \approx 2~log_{2}(n)$; thus, AVL trees will provide significantly better search times than Red-black trees. However, while AVL trees may need to do $latex O(log(n))$ rotations after each insertion, Red-black trees must do at most 2 rotations per insertion. So either one may be your tree of choice depending on the application: if search time is critical but data doesn鈥檛 get updated too often, an AVL tree will perform better; whereas a Red-black tree will perform better in scenarios where data is constantly being changed.

Self-balancing BSTs add some kind of property to tree nodes that make way for tree balancing: with AVL trees, it was the 鈥渂alance factor鈥. With Red-black trees, a 鈥渃olor鈥 property is added to each node. This leads us to the Red-black tree properties:

  1. Every node is either red or black
  2. Every leaf is black
  3. If a node is red, then both its children are black
  4. Every path from a node to any of its descendant leafs contains the same number of black nodes