A Developer's Notebook

Leaving Amazon

2023-12-26T23:30:00Z

OpenAI is so much fun!

I’d like to preface this by stating that Amazon is obviously a huge company, and my opinions are just that, one person’s opinions. There will probably be some people that share my frustrations while others have had a completely different experience.

I interviewed at AWS in early 2020, pre-pandemic. The interview process is grueling and I spent considerable effort preparing for the five-hour pantomime of absurd algorithms trivia and “tell me a time when you said no” behavioral questions. COVID-induced visa processing delays pushed my start date back many times. The high-stress interview process and the years-long wait built up tremendous anticipation. In hindsight I can say I probably had somewhat unrealistic expectations when finally joining the company in late 2021.

Regardless, I was quite frankly shocked after my first couple of weeks, and my first impression was that this kind of work was not for me. As a software engineer, I expected to eventually do some software engineering. I’m not sure how to describe the work the first team I joined was doing, but I can’t in good conscience call it software engineering.

The bulk of the work consisted of updating configuration files and attending meetings with other teams where the driver would recite what was written in project management cards. No one seemed to really deeply know what the team was doing (or maybe I was too thick to understand it!), and quite frankly no one seemed to care very much. Having worked at very product-centric, fast-paced startups before, I found this kind of apathetic, complacent attitude bizarre. If you browse Blind for more than five minutes, though, you will see for yourself that this kind of experience is far from being an exception.

I found my way out of that team as fast as I could. The ability to effortlessly change teams is something truly praiseworthy at Amazon. I got no pushback from my former manager. The whole process is extremely simple and streamlined.

Although the new team was far more interesting from an engineering perspective and my coworkers much more motivated, it was still an Amazon team. Over time I would understand that while Amazon has a great diversity of teams, they’re all playing the same tune. These are things that bothered me, in no particular order:

Big companies have a way of codifying their customs and culture into lists of commandments that are often vague and contradictory. No one has figured out yet how to make true believers out of new hires, but you are expected to at least pay lip service to their moral code. In the day-to-day work life of rank-and-file employees, these dogmas are at best a hurdle that regular folk have to navigate around, and at worst become weapons to justify doing or not doing whatever you want.
Tooling. No matter how good an external solution is, there is an internal tool that is twice as difficult to use and half as good. This isn’t any developer’s fault, by the way, and almost always is a side effect of either legal requirements (software licensing), legacy decisions made long ago, or empire building. More on that later.
It is a large company where decisions flow from top to bottom. These decisions are to be accepted as facts of life. There is no conversation, no room for debate, and no one to appeal to. No exception, no accommodation. Scream against the wind all you want, write a petition with 30,000 signatures - doesn’t matter.
Big ships take a long time to turn around. Everything moves slowly and there is little space for experimentation. Do you have a good idea? Maybe something dozens of others also agree is a good idea? Good luck getting that into any roadmap.
Empire building. This follows almost as a direct result of the previous two points. Higher-level people decide things and those things get built, regardless of whether they are a good idea or not. Promotions are largely politics-driven, and large orgs seem to form around whoever has the most political clout rather than around any technical or business reality.
Apathy towards the craft. I struggled to find passionate, highly motivated and competent engineers at Amazon. I met a few excellent engineers, but most seemed to only punch in and out, with near-zero interest in anything related to programming or technology outside of working hours.
Misaligned incentives with the Leadership Principles. LPs are always the strongest force directing everything at work; engineers are trained to think about how their work artifacts align with the LPs, not about the quality or usefulness of what they are doing, which is usually left as an “if we have time for that” afterthought.
Unstable, untrustworthy leadership. I was truly blessed to work with first-rate direct managers. Higher-level leadership, however, showed some truly bizarre behavior over the last few years, like the poorly communicated, seemingly endless waves of layoffs or the disastrous return-to-office and return-to-hub policies.
Collaboration culture. While at any other job I would expect to be able to find a subject-matter expert on something, contact them and maybe do some quick pair programming, this is highly frowned upon at Amazon. There are good reasons for that: the company is too big for this kind of fast, informal interaction, and SMEs of highly demanded services would quickly be overwhelmed by DMs if this were a thing. Moats are built to prevent that; instead of just asking someone about something, many times you will have to file a ticket and wait a week or more for office hours, which ends up being about as useful as DIY research in the internal platforms.

Some of the above is to be expected in any large company, and I could live with many of those issues. The one thing that convinced me that I needed to leave, though, was the stuck-in-quicksand feeling that working at Amazon was a career terminus. This was the first time in my career I felt that my day job was actively working against me, making me a worse software developer. There’s a choice to be made: keep up with the industry or keep up with Amazon’s internal technologies and tooling.

There was no new technology or innovative engineering process for me to learn. Simple problems become complicated ones because of the internal tooling overhead, leaving little time to work on more interesting things.

Coming full circle, I want to restate that the above is just my experience. I’m sure there are teams, maybe many of them, that contradict some or all of what I listed. I’ve never had demanding on-calls, for instance, which are an extremely common complaint at Amazon (I’ve personally seen friends and acquaintances leave a dinner party after being paged). To those who enjoy working there I wish only the best.

Book review - Sapiens

2022-07-15T17:54:50Z

Sapiens in an image.

Sapiens has no central point being made; rather, there’s an intricate web of mostly interdependent theories and speculations. This makes for an enjoyable read, but at times it is easy to lose sight of the original premises on which the increasingly speculative conclusions are built.

The author’s reflections on early humanity, on what he calls the Cognitive Revolution, and on the unforeseen consequences of the Agricultural Revolution are the most interesting. Homo sapiens’ edge over the competing species was the ability to operate in shared imagined worlds, which eventually developed into modern states, complex economies and religions. These shared worlds allow large numbers of individuals to cooperate on large projects unfeasible for smaller bands. And while this might have hugely increased the human population, it also meant worse living conditions for the average person: grain-based civilizations are more susceptible to famines, droughts and plagues than their hunter-gatherer predecessors.

Less interesting are the speculations on the future of humanity in the later chapters, basically envisioning a very prolonged or eternal life based on vague hopes of medical breakthroughs/transhumanism. It’s really just a very tired rehash of ancient myths. Weirdly, Harari kind of implicitly admits this by citing those same myths and insinuating that his hopes are of the same vein or kin.

Book review - Collapse

2022-07-13T17:54:50Z

Hvalsey church ruins, Greenland. Credit: https://en.wikipedia.org/wiki/User:Number_57

Collapse is a fascinating, if somewhat exhausting, read. The central point of the book is that environmental changes, man-made or not, have been responsible for many a civilization’s collapse.

In the year 2022 that claim might perhaps sound obvious, but the fascinating part is learning how these stories unfold, often with unexpected twists and many unintended consequences. For instance, medieval Scandinavians found in Greenland a scenery not unlike their native northern Europe, and proceeded to build a society over there replicating what they had back home. This worked for a while – centuries, in fact – but proved ultimately unsustainable and disastrous. An overview of modern Montana’s environmental and economic woes sends the message: what happened before might be happening again.

Less brilliant are the chapters regarding the Caribbean, specifically the chapter on Haiti. It glosses over the island’s modern history while failing to mention the single most important historical fact about it: the extortion of Haiti by France, in which France, backed by the United States, demanded an indemnity for the loss of property (including the enslaved Haitians themselves), militarily extorting the tiny island with warships in probably modern history’s greatest heist. The last payment was made only in 1947. Given the minutiae of other historical facts included in the book, which are trivia by comparison, this jarring gap can only be seen as intentional.

As for the previously mentioned exhaustion: the book gets its point across clearly long before its final chapters, which in turn seem a bit redundant and repetitive.

Jared Diamond is a somewhat controversial scholar, often accused of geographic determinism. There is plenty of self-justification to the contrary in Collapse, which makes you wonder.

Be that as it may, and even with the embarrassing failings mentioned above, Collapse is an excellent read.

DALL·E minis of the future won't be fun

2022-07-10T23:30:00Z

I’ve been playing with dalle-mini the last few weeks. Part of what makes it fun to play with are the bizarre and obtuse outputs. They reached that sweet spot between laughably bad and frighteningly perfect: they’re good enough to be understood and enjoyed, basically.

I think that incompleteness is part of what makes it so amusing to toy with these things, and conversely what will make future versions much less fun.

Dalle-mini is, as the name implies, much smaller than dall-e, using 0.4 billion parameters instead of 12 billion. Dall-e isn’t entirely publicly available, so dalle-mini is more of a reconstruction/reverse engineering effort than just a toned-down version (extracted from their website).

Future iterations will be far more impressive. Consider the difference between dall-e and dall-e 2, for instance:

As they inch closer to perfection – perfection being “indistinguishable from human-made”, by the way – these models will surely be widely commercialized, and eventually easily available.

I think playing with a near-perfect “future dalle” will be as fun as comparing baking soda packaging at the grocery store. These models will become just another tool. I can see them being widely used to generate birthday card designs, obliterating the probably small niche of birthday card designers. I can’t see them being the next Bruegel the Elder or Hieronymus Bosch.

As these models become increasingly commoditized, they will blend in with the art industry and eventually be forgotten by the public like all novelties eventually are. An AI-generated corporate décor at the office will be as bland and uninteresting as a human-made one. Eventually the distinction won’t matter. Right now, though, it is hard to mistake a dalle-mini output as being created by a human, and I think that is part of its appeal.

Think about the sample inputs that OpenAI gives their model, like “an astronaut riding a horse in photorealistic style”. We don’t really need a near-perfect AI model to amuse ourselves with that thought – our brains are enough.

Perhaps something similar or analogous happened with videogames. Early 3D games would leave much up to the imagination of the player; the closer games get to perfect graphical realism, however, the more boring and uninteresting they seem (to me). I think the wave of indie games with intentionally “bad” graphics is no accident - there is an undeniable appeal of leaving things out.

Return of the Obra Dinn (https://en.wikipedia.org/wiki/Return_of_the_Obra_Dinn)

This boils down to what I think is a fundamental misconception about AI held by many people. AI doesn’t (and cannot) create anything, it just mixes and matches (in very smart ways) things created by humans. The training sets are immense, and the techniques are increasingly complex, but unless there is some radical, foundational pivot, AI is and will be ultimately derivative of human creations. And derivation is ultimately boring. Right now most people still don’t quite grasp this ontological difference between “real” intelligence, which creates things, and AI, which doesn’t (some will claim there really is no difference and humans must operate in the same way, but I won’t get into that).

This misunderstanding leads to pretty funny situations, like the Google engineer who claimed a chatbot was sentient. It’s not. It mixes texts in a smart way. I think people will eventually catch up to this, and when that happens, the improved “mini-dalle” of the future won’t be nearly as interesting or amusing as its less perfect predecessors.

Teslas are a dystopia

2022-06-12T01:08:41Z

This could have been a subway.

Since moving to Vancouver metro I’m continuously surprised with how common electric vehicles have become. Some may see the rise of EVs as an exciting turn towards a futuristic solarpunk utopia. I see the opposite: they are a dystopia of sorts. They are a dead end, a waste of resources in the wrong direction, a false hope.

The proliferation of personal electric vehicles is a strong marker of failure. It is a manifestation in the physical world of our inability as a society to move on from a clearly failed, car-centric way of living. Teslas kind of epitomize this – despite being just another very expensive car, despite catching fire every now and then, despite bizarre QC issues, despite the company’s nitwit CEO, still they are seen as cool and fashionable and trendy.

EVs are just an incremental improvement on cars. This painful reality hit me when I first heard an EV whooshing down the street: I couldn’t really tell it apart from any other vehicle. Over the years I had read so many hyped articles about how EVs were so quiet that they needed artificial sounds to warn pedestrians that I expected nothing less than library-level quietness from an EV. Instead I heard the same deafening roar as any other car.

It turns out that if you put a tonne of metal on rubber tires going on asphalt, that rubber-on-surface sound is likely the dominant noise source at any appreciable speed:

"Tire-pavement interaction noise (TPIN) dominates for passenger vehicles with the speed of above 40 km/h and for trucks with the speed of 70 km/h."

Literature review of tire-pavement interaction noise and reduction approaches, Tan Li, 2018

As speed increases, it quickly doesn’t matter if you even have an engine at all, as aerodynamic noise also factors in:

Past 15mph or so, propulsion noise is no longer the largest noise source.

So the sound will be the same. So will other things: the parking space it requires is the same, the urban sprawl it stimulates is the same, the accidents are the same (or worse). Everything that matters is basically the same.

For all the distracting “features” of these high-tech gizmos, in essence they are still just cars. No matter how much high tech is embedded into them, they are ontologically the same as the first Model T.

The most obvious improvement of EVs is in energy efficiency/pollution. Ironically this seems to be a very old pro-car argument. In The Death and Life of Great American Cities, written over half a century ago, Jane Jacobs notes that early car proponents defended the personal vehicle as cleaner than its predecessor – horses. Internal combustion engines don’t poop, so the streets were in fact cleaner than before. That is all fair and just; she argued that the issue, however, was that horses were being replaced with far too many cars – in other words, the one-car-per-person model of North America. Essentially the same argument about cleanliness is being repeated now, a century later, with EVs.

Pollution spewing from combustion engines, although a serious issue, was never the only or the greatest problem caused by cars (and EVs still generate plenty of pollution, just not inside the engine, but in power plants and factories). It is disingenuous to pretend otherwise.

The rise of EVs is so disheartening because it is a huge missed opportunity and waste of resources. Even more so because it is being pushed by and within rich Western economies, which tend to lead the way for the rest of the world. Climate change, failing cities and fossil fuel scarcity could have catalyzed a new era of major public transit infrastructure projects and a shift in zoning practices towards sane densification. But those things are hard – culturally and politically – and people will do basically anything to avoid having to do them.

EVs are the perfect kludge that enables us to do nothing. People get a clean conscience thinking they are “doing something”, while not really addressing the underlying issue (and creating some brand new problems as well). Selfish NIMBYism and infinite sprawl can carry on unbothered.

Are "digital nomad visas" a thing yet?

2022-04-13T08:00:00Z

Immigration sucks. In addition to the personal toll it takes on anyone, it is also mind-numbingly tedious and baroquely complex. Why aren’t things better by now?

This reminded me of the so-called “digital nomad visas”. Searching for that term will get you a thousand clickbaity WordPress sites with the “20 best countries with nomad visas” or whatever. But how real are they really?

I’ll admit the term sounds pretty cool. It sounds like something you’ll be doing on your phone, as opposed to over the hundreds of PDF files and scanned documents a normal application will demand.

In fact, the name is so enticing I’m pretty sure that’s the whole point. It is so radically opposite to the idea of traditional visas that people can’t help but think that they are opposites.

The “digital nomad” part leads one to believe that the stamp is somehow catered towards tech workers with remote jobs; thus, being directed at that demographic, it makes sense to assume that the visa has the characteristics that make it more desirable to them than traditional visas – otherwise, what is the point of these visas? In short, I think this is what most people tend to think when encountering the term for the first time:

	traditional visa	digital nomad visa
Requirements	Many	Few
Application process	Hard, time-consuming, complex, lots of paper documents and legalese	Easy, fast, simple, lots of e-documents
Time	Measured in aeons	Measured in days
Path to full residency	Tortuous	Streamlined
Uniqueness	Specific to each country	Same/similar between countries
Time limits	Constrained	Unconstrained
Coolness	Minivan heading to football practice	Convertible doing doughnuts

What people think is going on ☝️

What really goes on is that these digital thingies are almost always just a rebranding of the same old temporary visitor visas. Now this is where things get interesting: who is advertising these visas as “digital nomad visas”? At first I assumed governments were doing the rebranding, but that is not always the case. There are some countries that do use the specific wording “digital nomad” when describing these visas, like Greece’s Law 4825/2021 and more famously Estonia; however, in most cases most of the publicity seems to come from elsewhere.

For instance, take the Portuguese D7 visa, which is very specifically meant for members of religious orders, retirees and people with significant passive income, but is touted as a “digital nomad visa”:

Official D7 requisition form. It literally says it is meant for retirees, members of religious orders and people living off passive income.

Now if you search for “portugal digital nomad visa”, guess which visa you are led to believe is tailored for, well… digital nomads?

I’m not saying that the people above that are connecting “digital nomad” to the D7 visa are wrong in any way – they might be totally correct. In fact, among the search results there are so many law firms, consultancies, blogs and vlogs selling the D7 as exactly that that they are probably right. Have they found a loophole? Am I missing something – perhaps all remote workers are really nuns and priests in disguise? Do I just suck at googling and missed something obvious?

Be that as it may, one thing I would love to see is how many people successfully live as digital nomads with that visa.

Anyway, my point is that the publicity linking that visa to digital nomadism is coming from those very interested parties, and not the government of Portugal. In short, they have skin in the game, and you can bet they are making money off of that connection.

I bet if someone investigated other countries touted as having “digital nomad visas” they would find similar results.

Back to the original question, when are these visas going to be a thing?

In a sense they already are, because these are old visas under new guises – be it the government or third parties doing the rebranding. Now the idea we have of how they ought to be – the happy second column in the table above – that’s something else, and I can’t see it becoming a thing, ever.

No matter what shiny wrapping you put around immigration, it is still immigration. Countries that have painful processes have them for a reason – put simply there are more people willing to go through the process than they need, so they can be picky. That fundamental reality isn’t going to change just because we have this new expectation due to this shiny new term.

All countries have the same basic incentive for getting more of these “digital nomads”: high salaries, which means more taxes and consumption. How important that might be varies between countries, which explains the list of countries that have some kind of “digital nomad” visa – the larger world economies are usually not on those lists, and when some of them eventually are, the requirements are so stringent that they’re not really any different from normal temporary work or rich-person visas.

The closest thing I’ve seen to a “true” digital nomad visa is Estonia’s. They seem truly committed to the same ideals we described, and the process seems fairly straightforward. It’s the exception that proves the rule, sadly.

Botched interviews

2021-12-30T12:34:36Z

Here’s something I’ve been wanting to write for a while: all the times (the ones I can remember, anyway) I bombed a software engineer job interview. There are so many “how I aced interviewing at X”/“how to pass X interview” posts floating around that I thought the opposite story would make for an amusing read.

My first developer job was as an intern at a big tech company in 2012. I think that was one of the worst interviews I’ve had, by the way – I could barely understand the interviewer over the cellphone, and those were the days of “how many piano players are there in New York”-kind of questions. I thought it went terribly, but I got the job somehow. On the other hand I’ve had many interviews I thought I did great in but bombed anyway.

FAANG 1 (2013)

I was just out of school when I got a cold call from a FAANG recruiter asking if I was interested in interviewing there. Needless to say I was naive and didn’t quite know what I was getting myself into.

The first part of the process was a phone screen with a technical recruiter. She asked me something like “What is the fastest way to sum two 32-bit integers?”. I froze like a deer in the headlights.

After gasping in awe at the bizarre question and mumbling some nonsense, I was informed that my answer was not correct. The right answer, the recruiter said, was “using the CPU’s TLB”. The translation lookaside buffer is a small cache of recent virtual-to-physical address translations that sits inside the CPU’s memory management unit. I still had college classes in my somewhat-recent memory at the time, so I kind of knew that this thing existed, but to this day I still don’t know how to sum two integers with it.

FAANG 1 (2014)

A year passed and (the same) FAANG came to my town with a recruitment event. Again I got a call, and instead of a phone screening, I would go straight to the event location and do a quick onsite interview.

They provided recommendations on technical subjects I should refresh my memory about: basically an Algorithms 101 syllabus (sorting algorithms, red/black and AVL trees, A* and Dijkstra, NP-complete problems, etc.), as well as some operating systems topics: processes, threads, mutexes, scheduling algorithms and so on.

At the time I had just started grad school. I was a bit more seasoned than the last interview, but far from having any relevant industry experience.

Interview day. I got to the venue and sat reading my notes on how to balance AVL trees. The interviewers showed up, greeted me and got things started. They gave me pen and paper and asked me how to do some things on a list of integers. I don’t recall the interview being particularly bad or anything, but round one was the end of the line for me. A few days later I got a boilerplate rejection email and that was my last contact with this FAANG.

Mid-sized tech (2014)

Grad school classes were few and far between, so I decided to start looking for a job. This mid-sized tech company had a local office, so I got in touch and scheduled an interview.

I don’t remember any meaningful details from this interview. One thing I do remember is being asked what monthly compensation I expected. The interviewer passed me a scrap of paper and a pen for me to write the number down. I took a few moments to think and wrote down what I thought was an adequate number. In today’s US dollars, that number would be enough to afford a parking spot in San Francisco, but it was an okay salary for a junior hire in my town.

I didn’t get a lot of feedback here other than “we went with someone else”.

FAANG 2 (2019)

A few years had passed since my last failure. I had finished my education, become a Ruby developer and enjoyed a 3.5-year tenure at a great, small local software studio. I had a preferred text editor. I had dotfiles. I felt weathered. So I did what you’re supposed to do: I applied to FAANG.

FAANG responded to my application, and after a couple of months and being ghosted by one of the 5+ recruiters involved in the process, things got on track for the onsite.

I did the Leetcode thing daily. I read Glassdoor tips and talked to college friends that worked at this company. I even went on Blind and found out that you’re basically an idiot if you don’t pass FAANG’s interview, cause it’s so damn easy.

I passed the initial screening rituals and was invited for an onsite – 25 hours and 3 flights away. As I obsessively went through my notes in the hotel, I thought I was finally ready.

There were maybe 5 or 6 rounds, each with one or two interviewers, all very nice and moderately helpful. Pretty much textbook FAANG interview, just like CTCI describes it. I was asked a particularly difficult set of questions. Mid-interview, I got the familiar feeling that a train is steaming towards me and I’ll soon be smashed into smithereens. Looking back, I’d evaluate myself at this interview as Not Good.

Another 25-hour, 2-stop trip back to my town. Feedback came swiftly in the familiar lukewarm rejection phone call.

Big Tech (2019)

This Big Tech company is highly respected in the Ruby community and their culture seemed to align with my own. I tried, and failed, their interview process two times.

They have a fairly straightforward interview process: first a phone screen/short code challenge, then a longer behavioral/technical challenge, typically onsite. First time, I didn’t make it to the onsite.

My second attempt was much smoother and led to the onsite. Like FAANG 2 (2019), this involved multiple flights and 20+ hours of travel. Also like my previous interview, I felt ready. The recruiter was great, and every person I had met so far was extremely nice. Company culture seemed fantastic and I liked the tech stack.

There were two technical rounds, one cultural fit conversation and one technical-but-not-coding round. The coding parts went well, maybe a B+. The culture fit thing was very good. The not-coding part, ironically the one I had prepared the most, was a disaster, although I didn’t see things that way at the time.

I came prepared to talk about one of the projects I led in my current job at the time; evidently I had an NDA in place and had to navigate around it. I thought this was fine – I had already published a post on the same subject on the company’s public blog and never had any complaints about it being too esoteric/abstract. The interviewer was not amused by this at all. Because of the vagueness of the language I was using, he seemed to think I was describing something shady. At one point, he actually asked something like “do your users know you are doing this”! Right now I think that was kind of funny, but I was utterly bewildered at the time. I tried to reassure him that there was nothing fishy going on, but at that point he had probably made up his mind. I might as well have gone straight to the airport and saved everyone some time.

The rejection call came a week later. Upon my request, the recruiter followed up with a very thorough email detailing the reasons for the rejection, which is a fairly unusual thing for these companies to do, and very helpful for the candidate. Although I think the interviewer could have handled the situation better (just ask me to describe another project), I have great respect for the way the company handled the process and gave honest feedback.

The printer in the room

Hiring is the printer of software engineering jobs. It kind of works, but not very well, and everyone seems to agree it should be better at this point. This is in no way a demerit to recruiters – they’re doing their job and are not at fault here. They’re usually pretty good; it’s the framework that isn’t great. There have been some incremental improvements: code sharing platforms are pretty good, which makes remote interviewing very straightforward (and reduces the need for onsites, which suck). There are many services where you build a single profile and apply to many companies at once, which reduces the time wasted filling out the same forms on all the different companies’ websites. The bulk of the process remains more or less the same though.

Interviewing is in part a numbers game, but also not entirely random, which means there is a way to get better at it (that’s the whole point of companies like Leetcode and books like CTCI). Failing an interview you prepared for leaves a sour taste in the mouth, but over time it gets easier to accept as just part of the probability game.

Analyzing LinkedIn's data export: what happened in 2021?

2021-10-07T14:45:00Z

I’ve been using LinkedIn basically since I started working as an intern back in 2012. My usage is mostly limited to posting my blog posts, except the couple of times I used the platform to search for a new job. So most of the time, LinkedIn has been pretty slow-paced, with maybe half a dozen random recruiters reaching out per year.

However, since the Covid-19 pandemic started, and particularly in 2021, things seem to have gone a little crazy, with a lot more recruiter activity. I was curious to see just how much things had changed, so I looked at LinkedIn’s data export.

First I requested my data from LinkedIn:

Messages, Connections and Invitations seem like the most promising sources of data:

➜  Basic_LinkedInDataExport_10-01-2021 ls -lahtS
total 1020K
-rw-rw-r-- 1 leo leo 577K out  1 13:06  messages.csv
-rw-rw-r-- 1 leo leo 146K out  1 13:05  Connections.csv
-rw-rw-r-- 1 leo leo 113K out  1 13:05  Contacts.csv
-rw-rw-r-- 1 leo leo  43K out  1 13:05  Learning.csv
-rw-rw-r-- 1 leo leo  23K out  1 13:05  Invitations.csv

The Connections export is somewhat limited for our purposes: I only actively add people on LinkedIn during a job search.

Messages are a bit more interesting because a lot of recruiters immediately offer a position in their first contact (sometimes even with a pre-scheduled Google calendar event! I wish things were this straightforward back when I was finishing school).

Invites are also a good source, complementary to Messages. After accepting or rejecting an invite, the Invitation is deleted, so there’s no danger of double counting an interaction that started as an Invitation and then evolved into Messaging.

Focusing first on the Messages export, here is some relevant info we might aspire to extract:

Job offers (as in “I have a job I want you to apply for”) per date
Keywords mentioned in messages (“Ruby”, “Rails” etc)

Let’s see if we can extract those.

Job offers per month

Job offers mainly come from messages, and the bulk of my messages come from recruiters. However, I do get a few scattered personal messages from old acquaintances, some professional but not interview-related conversations, etc.

A simple approach to estimate how many messages are actually from someone promoting a job opening is to look for certain job-related terms: in my case, as a Ruby engineer, if a message contains “Ruby” it is probably from a recruiter advertising a Ruby-related position. This is only an estimate: maybe I chatted about Ruby at some point with an acquaintance, which of course is unrelated to our objective here. Those cases are few and far between compared to the recruiter conversations though.

With that in mind, I built a list of terms that are related to job searches:

job offer opportunity ruby developer engineer talent salary relocation position role recruiter talent looking interested oportunidade trabalho vaga software experience tech rails interesse interested company email work senior contato vagas remote working stack backend technical developers skill skills

Most of the messaging I get is in English, but I do get a significant amount of contacts in Portuguese as well, so we have terms in both languages.

With that list of terms, we can simply select all the relevant ones and group those by month:

  def job_messages_per_month
    job_related_messages = @input.select { |row| row["CONTENT"] =~ /#{@relevant_words.join("|")}/i }
    metric_per_month(job_related_messages) { |row| Time.parse(row["DATE"]).to_date }
  end
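The metric_per_month helper is not shown in this post. A minimal sketch of what such a helper could look like follows; the sample rows, the shortened term list and the date format are all made up for illustration, and the pattern is built once with Regexp.union instead of once per row:

```ruby
require "date"

# Hypothetical stand-in for the metric_per_month helper used above:
# counts rows per calendar month. The block extracts a Date from each row.
def metric_per_month(rows)
  rows
    .group_by { |row| yield(row).strftime("%Y-%m") }
    .transform_values(&:size)
    .sort
    .to_h
end

# Shortened, illustrative term list. Regexp.union escapes any special
# characters in the terms; the pattern is then made case-insensitive.
RELEVANT_WORDS = %w[ruby rails vaga].freeze
JOB_PATTERN = Regexp.new(Regexp.union(RELEVANT_WORDS).source, Regexp::IGNORECASE)

# Made-up rows; the real export may format its DATE column differently.
sample_rows = [
  { "CONTENT" => "We have a Ruby position open", "DATE" => "2021-03-02 14:00:00 UTC" },
  { "CONTENT" => "Happy birthday!",              "DATE" => "2021-03-05 09:30:00 UTC" },
  { "CONTENT" => "Nova vaga para Rails",         "DATE" => "2021-04-11 18:45:00 UTC" }
]

job_rows = sample_rows.select { |row| row["CONTENT"] =~ JOB_PATTERN }
counts = metric_per_month(job_rows) { |row| Date.parse(row["DATE"]) }
```

Keying the result by a "YYYY-MM" string keeps the months sortable as plain text, which is convenient when feeding the counts to a chart.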

Now, when I’m not actively looking for another job, I tend not to look at LinkedIn too much, so the Invitations tend to pile up. As already mentioned, accepted/rejected invites get “deleted” from LinkedIn’s data export (which doesn’t seem like a great practice IMO, as they probably still have that data), so only invites that you haven’t acted on either way are available in the CSV export.

Just like with messages, we group the relevant invites (“INCOMING” in the export’s Direction column, meaning someone is adding you, as opposed to outgoing invites where you’re adding someone) by date. I didn’t bother filtering by terms because nearly everyone that adds me is a recruiter these days:

  def recruiters_per_month
    received_invites = @input.select { |row| row["Direction"] == "INCOMING" }
    metric_per_month(received_invites) { |row| Time.strptime(row["Sent At"], "%m/%d/%y").to_date }
  end

Here’s the result of summing both datasets, messages and invites:

There was a spike in early 2019, when I actively pursued a new job. I also gave a conference talk at that time and added a bunch of people on LinkedIn. Thus, this peak in job offers is just a consequence of me actively looking for a job. After that, activity dropped back to lower levels (I also ticked “not looking for a job” on LinkedIn right after I signed the offer at my new job around April 2019).

On the other hand, since late 2020 job offer messaging has grown steadily. I wasn’t actively looking, so this is organic growth. I’m curious to see if other people also had a similar pattern. Perhaps this is a reflection of an increase in demand for some specific skillset (Ruby) or experience level (X years of experience), but I’m guessing it’s part of a general upward trend in the industry since the beginning of the pandemic.

Most common terms

Another interesting piece of information is the most common words mentioned in the conversations.

We could just count how frequently each word pops up, but irrelevant words like “the”, “a” and so on would rank at the top. So first we need to get rid of those words, then look at the cleaned-up text. I’m sure there’s an API that does just that somewhere out there, but I created my own list of non-relevant words manually and used that instead.

  def word_frequencies
    full_text = @input.map { |row| row["CONTENT"] }.compact.join(" ")
    normalize_words(full_text.split(/\s+/))
      .map { |w| w.gsub(/[^a-z]+/, "") }
      .reject { |w| w.size < 3 || @nonrelevant_words.include?(w) }
      .group_count.sort_by{ |a| a[1] }.reverse
  end
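The snippet above relies on helpers (normalize_words, group_count) that are not shown here. As a self-contained sketch of the same idea, assuming that normalization just means downcasing and that counting can be done with the standard Enumerable#tally (Ruby 2.7+), it could look like this; the stop-word list and sample messages are made up:

```ruby
# Hypothetical stop-word list; the real one was compiled by hand.
NONRELEVANT_WORDS = %w[the and for you are with].freeze

# Returns [word, count] pairs, most frequent first.
def word_frequencies(texts, stop_words = NONRELEVANT_WORDS)
  texts
    .compact                                  # skip rows with no content
    .join(" ")
    .downcase
    .split(/\s+/)
    .map { |w| w.gsub(/[^a-z]+/, "") }        # strip punctuation and digits
    .reject { |w| w.size < 3 || stop_words.include?(w) }
    .tally                                    # word => number of occurrences
    .sort_by { |_word, count| -count }
end

frequencies = word_frequencies([
  "Ruby and Rails developer wanted",
  nil,
  "Senior Ruby engineer for the Rails team"
])
```

Note that sort_by is not guaranteed to be stable, so words with the same count may come out in any order.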

Using the MagicCloud gem, here are the plotted results:

No surprises there – terms like “Ruby” and “Rails” are among the most frequent. Other bland job-related terms compose the bulk of the word cloud.

Here are the actual numbers for these terms:

What happened/is happening?

Going back to the original question: what happened in 2021? Did software-related job offers explode during the pandemic? My anecdata points to an obvious Yes. Most articles discussing this question are just opinion pieces along the lines of “engineering demand increased because digital services sharply expanded with the lockdowns”. I couldn’t find any hard data supporting these assumptions (other than the personal analysis presented above).

One of the major impacts of the pandemic, being forced into remote-only work, did have pretty obvious effects on non-US job markets. Before the pandemic, I suspect that many very competent people were hesitant to leave their jobs due to strictly non-remote perks: nice offices, work colleagues, local benefits like healthcare, maybe even specific labor laws regarding vacation time and so on.

Remote jobs based in strong currency countries, especially the US, were already “a thing” long before the pandemic, but with remote work being mandatory rather than an option, local remote vs foreign remote boils down to a huge pay gap in most cases, with US-based software engineering salaries being hard to compete with anywhere else in the world.

I’m very curious to see how this pans out for local software shops. These local companies are really bleeding talent leaving for stronger currencies; if these dynamics go on for too long, I can’t see how most of them will last.

Code

Here’s the repo with all the scripts needed to reproduce these results.

Onsites considered harmful

2021-08-27T21:33:00Z

A couple of years ago I interviewed at one of the largest Ruby shops out there. Screening went well, and some days later I was invited for an onsite.

These were the good old pre-covid days, so an onsite really meant onsite. You had to travel to the office, wherever that was.

The thing is, an onsite is actually radically different depending on where you live. It follows that onsites introduce further bias into our industry’s already problematic hiring process. I’d like to argue that although onsites have some advantages, they’re mostly a waste of time (and money).

If you’re a local, an onsite probably means taking a bus, metro, taxi, walking, whatever. If you’re not local but are at least within the same country that you’re interviewing, it might take a day trip or maybe a short flight. If you’re a foreigner it might take travel visas and a week of travel.

Without getting too philosophical, we all know we have a limited amount of sand grains in our hourglass. Fallacies apart, anyone can feel that the more we pour into something – be it renovating a kitchen or traveling for an onsite – the higher the stakes become.

It’s glaringly obvious that someone who invested a 30-minute bus ride in an onsite will be much more at ease than someone who flew ungodly hours on a red-eye. It doesn’t really matter how much pampering the latter is treated with: exquisite hotels, meal allowances… investing a week of your time will always drive up anxiety a lot more than taking an afternoon off work.

Back to my story: I was interviewing for a company overseas. I happened to have a valid visa for that country, which already puts me in some advantage compared to others. Physically getting to the company building for the interview, however, took some effort: I drove to the local airport, where I arrived more than a couple of hours early, flew down to São Paulo, then took two more flights until I reached my final destination, some thirty hours after I stepped out of my house, then I took another cab to the fancy hotel booked by my not-to-be future company and collapsed for the night.

Next day I had the onsite (which took basically the full business hours), then back to the hotel, collapsed again. On the third day I backtracked through the 30 hours of cabs, airports and flights back home. This was late December, by the way, so airports were packed. A couple of days later, on Christmas eve, I got a very thoughtful happy holidays + no thanks call from the recruiter.

Might I have gotten the job if I had taken a bus instead of multiple planes? Maybe, maybe not (probably not, since someone in the interview panel just didn’t enjoy my parlance).

That isn’t really the point, though, and as far as anecdata goes, I have the opposite story as well: I interviewed twice at the same company, once through a tortuous voyage similar to the one I described above, and another time in my city, with the roles reversed: I left my house and drove for a few minutes to the onsite, while the interviewers were enjoying a fancy beachside hotel after several plane trips. I failed the former and got an offer from the latter. I’m the same person applying for the same job at the same company. Did I just perform better? Did they perform “worse”? Is it all just a coincidence? Are all of these interviews meaningless hazing rituals?

But I digress. Back to the matter at hand: if you think of it, the 60 hours of commuting alone is more than one work week (and as far as effort goes, I’m sure more goes into enduring 60 hours of planes and airports than into programming). If you factor in the actual onsite, then we’re talking about two workweeks of effort put into a no-strings-attached situation. The elapsed wall time is well into a full business week.

There are sensible, rational grounds for an onsite. Recruiters want to know if the candidate hates the cold and is going to churn early winter, or maybe the city is too small, or too big, or whatever random factor might make people want to run away. That said, I find it hard to believe even the most prescient can get a read on any of those thoughts rushing through the candidate (probably even the candidate can’t).

In any case, are those things worth the several thousand dollars of expenses, and more importantly, are they worth excluding a possibly large pool of candidates that aren’t willing to invest a full week of their time on a process with naturally low chances of going forward? At least in my opinion, those very thin pros are outshined by the very real cons like the -8.208527 Sun outshines my laptop screen.

Now let us also remember that the success of job searches depends on arbitrary things like if everyone on the panel likes your face. We all know things shouldn’t be this way; we’re supposed to be unbiased and empathetic, but let’s face it – we humans suck at that.

Even if we consider a utopically unbiased interviewer panel, there’s still all sorts of random noise going on at an interview, like performance anxiety. No matter how great the people interviewing are, and even how great you are, interviewing always has a huge degree of uncertainty:

"The fact that people who are overall pretty strong (e.g. mean ~= 3) can mess up technical interviews as much as 22% of the time shows that there’s definitely room for improvement in the process"
- Technical interview performance is kind of arbitrary. Here’s the data.

My point is: such a volatile thing should never have been tied to multiple plane tickets and 2-night hotel stays in a different continent in the first place.

Never.

Perhaps surprisingly, “onsites” are still a thing during the covid pandemic – they’re just remote, i.e., not really onsites at all.

This is immensely beneficial for everyone involved: the company won’t have to pay for expensive hotels and plane tickets, the planet won’t have to suffer the huge CO2 emissions from this ultimately unnecessary shenanigan, and the candidate won’t have to waste a week of his/her vacation time with something as ethereal as pursuing a software engineering job.

Recruitment in this industry is difficult. This is widely acknowledged by all parties involved. No wonder there are so many books, videos and Discord channels about interviewing for a tech job – not to mention coding prep services, automated third-party code challenges… the list goes on.

This post is specifically about onsites, but it is impossible not to mention the overall sad state of interviewing for a software engineer job. A quick survey of HN posts is enough to glimpse how people feel about this:

Sounds like hiring isn't in a great shape.

Inefficient and biased as they are (or, hopefully, were), physical onsites are nowhere near the worst possible interviewing practice we can observe in the wild.

Why interview for a job in a quiet office full of nerds if you can FIGHT FOR IT IN A TOURNAMENT like a geek gladiator?

Yikes.

Things aren’t any better on the other side of the table – finding skilled developers in 2021 is tough, even if you’re not setting up a pair programming arena for a code to the death contest.

Some recruiters go a step beyond cold calling and start cold referral calling, like this recruiter asking me to pleeeeeeeeeease refer candidates that match their laundry list of requirements:

Pleeeeeeeeeease send me candidates that match my laundry list of skills!

My second point: recruitment is seriously hard for all parties involved. If we are able to, we should try to make it easier, not harder.

Onsites were a significant hassle on top of an already complicated, inefficient, time-consuming, stressful process.

Although the company usually takes on most or all of the financial hit, the time and emotional load were carried by the candidate alone.

Getting rid of physical onsites is fantastic news for everyone – especially people interviewing, but also companies that can now cast a wider net and carry out a faster, more diverse recruitment process. And our planet will also have God-knows-how-many thousand tonnes of CO2 less to deal with each year.

Efficient resource distribution

2021-08-20T11:53:10Z

TLDR A simple metrics-based ranking system is good enough to decide who gets how many resources.

Computational resources – CPU time, memory usage, network traffic etc – are limited. This may be more or less of a problem depending on project/company size and so on; if you’re working on a smaller product with limited traffic, it might not be meaningful at all.

Once past a certain threshold though, expenses with such resources become non-trivial and it begins to make sense to spend some time thinking about how to distribute them as efficiently as possible.

Here’s the problem that got me thinking about this: at work, we had a computational resource that needed to be consumed by a large fleet of workers (think several thousand concurrent), but each type of worker had different productivity, and that productivity changed over time. How can we decide who gets what?

So the problem is: you have a set of consumers that use the same resource, for which you have a static budget. The consumers all solve the same problem, more or less (i.e. have the same output), but come in different types that have different productivities (defined as output per resource consumption). Additionally, although the consumer types solve the same problem, we want consumers to be as diverse as possible – we can’t just pick the best performing one and go with that.

The first thought that comes to mind is that this seems ordinary enough; there must be an easy, well-known solution. There might be, but I couldn’t find any that was simple and effective for this use case. The closest I got were PID controllers, which solve a similar problem but probably don’t solve the entire problem here (and also seem complicated).

I gave the problem some thought and came up with a reasonable solution that has been working well for a year now.

The problem boils down to two parts:

Consistently keeping track of productivity among the different consumer types;
Deciding how to share the resource among the consumers.

The concept that glues both parts is that of the cycle – a repeating time period in which we measure productivity and distribute resources to be shared within that time frame, until the next cycle comes up and everything is recalculated.

Problem 1) boils down to maintaining a time series of how much output per resource each consumer type produced during the latest cycle.

Problem 2) comes almost as a corollary to the former problem: we want the best global output possible, and that can be guessed by using the productivity stats from the previous cycle. This won’t be perfect, because productivity varies over time within each consumer type, but basic statistical intuition says it will be good enough for our purposes.

So the first step of solving 2) is building a ranking of consumers by productivity. We want a diverse set of consumer types, though, so we can’t just pick type #1 and give it 100% of the resources all the time. Also, the ranking might change each cycle, and we don’t want resource distribution to be too volatile – that might become hard to monitor and debug. We want something that is somewhat smooth, stable, convergent, but at the same time that reflects changes in productivity as quickly as possible, and that delivers good global output-per-resource-consumption.

We know that the top tier within the ranking probably deserves more than the rest, while the bottom tier probably deserves less, and that is the gist of the solution to problem 2). We don’t know beforehand how each consumer will perform though, so it makes sense to start with equal resource distribution among them.

Here’s the complete solution I worked out:

Start the system by sharing an equal amount of resources among all consumers: let’s say every consumer has the same weight W₀.

Then, for each cycle:

Build the productivity-per-consumer-class ranking
For the top N% consumers, do W += K (limited to a certain maximum)
For the bottom N% consumers, do W -= K (limited to a certain minimum)
Translate each W to a real-world resource amount (e.g. “1GB RAM” or something). This involves the weights as well as the global resource budget per time cycle, such that we guarantee we won’t exceed the static budget.

The global sum of weights is kept more or less the same because we’re summing and subtracting the same amounts each cycle (although this isn’t perfect because we have min and max values), so the system is kept fairly stable over time while also reacting quickly to changes in productivity. Also, the system is robust, and blowing up the weights store is no big deal – weights will creep back to their previous values over a short time.
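To make the cycle more concrete, here is a minimal sketch of the steps above. Every constant (W₀, K, N, the min and max) and the consumer names are illustrative assumptions, not the values or code used in the real system:

```ruby
# All constants are illustrative assumptions, not production values.
INITIAL_WEIGHT = 10.0   # W0
STEP           = 1.0    # K
MIN_WEIGHT     = 1.0
MAX_WEIGHT     = 20.0
TIER_FRACTION  = 0.25   # N%, should stay at or below 0.5 so tiers don't overlap

# weights:      { consumer_type => current weight }
# productivity: { consumer_type => output per resource unit in the last cycle }
def update_weights(weights, productivity)
  ranking   = productivity.sort_by { |_type, value| -value }.map(&:first)
  tier_size = (ranking.size * TIER_FRACTION).floor
  top       = ranking.first(tier_size)
  bottom    = ranking.last(tier_size)

  weights.to_h do |type, weight|
    new_weight =
      if top.include?(type)
        [weight + STEP, MAX_WEIGHT].min   # reward, capped at the maximum
      elsif bottom.include?(type)
        [weight - STEP, MIN_WEIGHT].max   # penalize, floored at the minimum
      else
        weight
      end
    [type, new_weight]
  end
end

# Translate weights into concrete amounts that add up to the budget.
def allocate(weights, budget)
  total = weights.values.sum
  weights.transform_values { |weight| budget * weight / total }
end

# One cycle with four hypothetical consumer types.
weights = %i[a b c d].to_h { |type| [type, INITIAL_WEIGHT] }
weights = update_weights(weights, { a: 5.0, b: 3.0, c: 2.0, d: 0.5 })
shares  = allocate(weights, 1000.0)
```

Because each share is the budget scaled by weight over total weight, the shares always sum to the budget, regardless of how far the weights have drifted.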

To finalize, here’s a chart showing the weights of different types of consumers over the last few months:

Consumer class weights over time.

I replaced Google Analytics with a web server running on my phone

2020-07-06T13:45:40Z

TLDR I built android-analytics, a web analytics tracker running on my phone.

Say you run a blog, personal website, small-time business page or something of the sorts. Say you also want to keep an eye on how many visitors you’re getting.

The first thing that most people think at this point is “Google Analytics”. It mostly works and is free. It’s also hosted by Google, which makes it very easy to start using. There aren’t many competitors that bring those points to the table, so Google Analytics usually wins by walkover at this point.

I used to use Google Analytics to track this blog for those same reasons. But after finding out about Termux and writing this post about installing a web server on an Android phone, I started toying with the idea that I had this ARM-based, 2GB RAM, Linux-like device with Internet connectivity which must be more than enough for a simple webcounter-like application. After a few weeks of tinkering, here it is!

Motivation
Developing android-analytics
Conclusion

Motivation

Why even keep anything?

Before going into this whole thing, there’s a very reasonable question to be answered: why do I even need to collect this data?

The answer is simple: I really don’t, I just enjoy seeing it. Call it a vanity metric, but I think it’s just plain cool to know that someone halfway across the planet read something I wrote months ago (maybe it was just a crawler; I’ll take it either way).

It should be no surprise, then, that Google Analytics always felt immensely overkill.

It’s heartwarming to know that some nerd from Bhutan read one of my posts in the wee hours of the morning, but that is pretty much all I’m interested in. I couldn’t care less about Acquisition Treemaps, Audience Cohort Analysis or Behavior Flow. I’m not making those up: they’re all real products available on Google Analytics. I have no idea what any of those mean, yet I’m 100% sure I don’t need them.

Visitor counter from the 90s.

What I wanted was closer to the late 90s’ visitor count GIF above (minus the embarrassment of publicity) than to the unsightly “Interstitial online advertising network conglomerate SEO dashboard” feeling of Google Analytics:

Google Analytics dashboard.

In short, I wanted to geek out, not do advertisement arbitrage.

And then there is the data

As mentioned, Google Analytics is great, free, and hosted by Google.

They keep your data. I have no idea of what they do with that data, or even what exactly it is that their tracker is sending to their servers (judging from the number of articles showing how to keep the payload below the cap of 8kb, it must be a lot).

That's a lot of results.

Apparently they often need over 8kb per request to feed their Lovecraftian “Audience Cohort Analysis” line of products. Fair enough, but I’m pretty sure that for my purposes, a several-kb payload is effectively using a sledgehammer to kill a fly.

By using Google Analytics I was willfully sending Google who-knows-what kind of data designed to build up people’s advertising profile. The page views of my blog probably didn’t help Google too much in that aspect, sure, but the principle of the whole thing still bothered me enough to do something about it.

The (lack of) competition

There is a lot of software similar to Google Analytics out there. The most prominent is probably Matomo, often posted on Hacker News. It is free, open source and self-hosted (with cloud offerings for a monthly fee).

I would happily use Matomo, but with it comes a conundrum:

Self-hosting implies I had to have some kind of publicly accessible Linux host, which would likely not be entirely free;
Cloud-hosting comes with a subscription fee.

Those points are trivial if you’re running a lucrative business that needs analytics, but paying for this service sounds ludicrous when all you want is simple visitor stats for a personal blog.

Developing android-analytics

These were the requirements I had for my tracker:

Has to run on an old Android phone I have lying around;
Has to work with Github Pages-hosted sites;
Has a per-page view count;
Nice to have: geo info.

These requirements are deceptively simple, as I quickly learned.

Termux makes it really easy to run many kinds of software on your Android phone, and I had already tinkered with web servers with Termux. For something as simple as a page view, this should be pretty straightforward.

I had also already registered a dynamic DNS subdomain pointing to my phone, so it was ready to accept incoming traffic from the Internet.

The first major roadblock I faced was getting my Android-hosted web server to communicate with Github Pages. After a couple of days of research, I finally learned that it is basically impossible to make a request from an HTTPS website (which Github Pages is) to an HTTP address (my Dynamic DNS’s subdomain). To summarize, you can make that work, but at the cost of having the client browser do something (like actively mark an “allow mixed content” checkbox somewhere in the browser’s flags/advanced options).

This led me down the excruciating path of obtaining and using a verified SSL certificate on my Android phone with a Dynamic DNS subdomain. This took me long enough to want to write a separate blog post about it. The TLDR here is that it is entirely possible to get a verified SSL cert for a Dynamic DNS subdomain – all of it entirely for free. Depending on your ISP, you’ll have different choices of SSL challenges, but if you’re able to receive TCP requests on port 443, it is possible to get the certificate for free.

Once I figured out the SSL thing, the rest was pretty much a breeze.

Fundamentals

I tried out a few different ideas when developing this, but the overall architecture is always the same:

JavaScript code in my tracked page calls the Android host;
Android host saves that information in a database;
Some graphical tool is used to parse that data into something viewable (charts etc).

First iteration: Sinatra webapp

I started with a Sinatra webapp with a single POST endpoint that would receive a request from the tracked page and immediately save it in a Postgres database. I used Nginx as a reverse-proxy that handled traffic before passing it to Sinatra.

This approach had the merit of being simple to understand and reliable. Also, it worked.

But after watching it work for a few days, I realized that the whole webapp part was superfluous. Nginx logs all accesses by default, and the logs contain all the information I need: what page was requested, at what time and from what IP. This led naturally to the second iteration.

Second iteration: Nginx log parser

Nginx provides flexible, per-endpoint logs: logs are activated for the endpoint that I want (/damn_fine_coffee) and deactivated for everything else. This is important because the Internet is full of crawlers that annoyingly hit the root path /, which obviously shouldn’t count as a page view. As I learned, the web is also surprisingly full of smartypants trying to make their way into /tp-link, /admin and so on; I also wanted to just ignore those.

The logs provided all the data I needed, but I still needed to transform that data into useful information. I found out about GoAccess on Hacker News, and, perhaps surprisingly, it worked out of the box with Termux:

GoAccess dashboard with my Android-hosted data.

At this point I could settle for GoAccess, but it didn’t seem to provide any geo info, which I always thought would be a cool feature, so I kept working on my own tool.

I configured Nginx to print CSV-like logs, and wrote a parser that transforms those log entries into DB entries with geographic information provided by the excellent geocoder gem, and also anonymizes the request IPs using MD5 hashing. The final step was adding a cron entry to run the parser regularly.
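The parsing and hashing step could be sketched roughly as follows. The log format (timestamp, IP and path separated by commas) is an assumption made for illustration, since the real field list depends on the Nginx log_format in use; the geo lookup and the database insert are left out:

```ruby
require "digest"
require "time"

# Assumed log format, one line per hit:
#   time_iso8601,remote_addr,request_uri
# The real field list and order depend on the Nginx log_format in use.
def parse_log_line(line)
  timestamp, ip, path = line.strip.split(",", 3)
  {
    visited_at: Time.iso8601(timestamp),
    ip_hash: Digest::MD5.hexdigest(ip),   # keep a hash instead of the raw IP
    path: path
  }
end

# Made-up log line using a documentation-only IP address.
entry = parse_log_line("2020-07-01T12:34:56+00:00,203.0.113.7,/damn_fine_coffee\n")
```

The resulting hash maps directly to one table row: when the hit happened, an opaque visitor identifier, and which page was viewed.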

At this point I was getting regular traffic converted to rows in a Postgresql table. I still needed a more convenient way to look at the data, though.

Third iteration: Adding a viewer

I initially thought about using Grafana as a visualization tool. It’s free, easy to use, flexible and I was already familiar with it. Unfortunately Grafana doesn’t have binaries available for Termux (there’s an issue open in Termux’s repo requesting that), and I wasn’t feeling like trying to compile it manually.

Thankfully I found the blazer gem, which has a very similar concept compared with Grafana: you write SQL queries and it transforms them into charts. That was exactly what I was looking for. The downside is that it requires a full-fledged Rails application to run, but I was okay with that trade-off.

Here’s what the data looks like right now:

blazer gem dashboard.

Fourth iteration: Adding an installation script

So far I was playing by ear; I knew more or less how to reinstall the project on a new device, but I knew that after some time my memory would fade and the process would become a painstaking trial-and-error mess.

I first compiled all the steps needed for this to work in the repo’s README – it took a total of 17 steps to get things running. Noticing that most of these steps could be automated, I wrote a setup script that should do most of the work. I tested it in a separate Android device to make sure it works – hopefully it works for other people as well.

Final architecture

When someone accesses one of my tracked pages, this is roughly what happens:

JavaScript on that page calls my domain (provided for free by DuckDNS);
DuckDNS translates that address to my router’s most recent IP;
My router receives that request and uses the NAT table to redirect it to my Android phone;
On Android, Nginx receives the request and either logs it if the request comes from the right place (my list of tracked pages), or does nothing otherwise;
A scheduled Cron job rotates Nginx logs and converts the “old” log into rows in a Postgresql table;
I open <my-android-local-ip>:3000 on my desktop’s browser and view the charts, maps etc.

This diagram shows those same steps, more or less:

android-analytics diagram.

I named the tool (quite unimaginatively) android-analytics; code and set-up instructions are available on Github.

August 2021 Update

I managed to install Grafana on Termux by using AnLinux; thus, the Viewer part of the project is no longer needed.

Also, by using Ngrok (free tier), the project now works if you’re behind CGNAT, which is my case. No need for dynamic DNS or port forwarding either.

Conclusion

I used the Google Analytics analogy because that’s the tool that most people are familiar with, and most people will immediately understand what this thing is about, which probably wouldn’t happen if instead of saying this was a “simple Google Analytics alternative”, I said it was a “log-based web analytics tool”.

But saying this is a “Google Analytics replacement” is like saying that a bicycle is a replacement for a truck. Although they are both transportation modes, they’re different in every other aspect. The thing is: sometimes you really need a truck, but a lot of times you just need to get from point A to point B, and a bike is more than enough. In fact, it is probably better: it is cheaper, easier to park and carry around, and has a smaller environmental footprint. This project is a bike: for some people, that’s all they will need.

There’s absolutely no need to use a mammoth like Google Analytics for a personal blog or pet project. It’s more than wasteful – you’re offering free data to Google in exchange for a fancy dashboard so you can play I’m-SEO-master-at-Adcorp-LLC. Someone has to keep the data, of course, but I’d argue that a decentralized approach is much safer and probably more ethical than data monopoly by a single huge advertising company.

So what are the alternatives? There are a few competitors – we already discussed that in a previous section. But then we have all this processing power just lying around, free and unused; we might as well make better use of it. Smartphones have amazing processing, networking and storage capabilities, yet for many reasons they turn old very quickly, which translates to getting sold (in the best case); shoved into oblivion in our designated e-junk clutter drawer; or just discarded.

It is just sad that we have these tiny slabs of processing power that could navigate Man to the Moon and back thousands of times over, and we can’t seem to quite find any better occupation for them other than sitting in a dusty drawer for years or getting trashed. That is why even if it takes a little extra effort, I’d rather repurpose and reuse something I already own than subscribe to the fanciest new PaaS.

Setting up a free HTTPS home server

2020-06-27T22:48:21Z

Try searching for “free dynamic dns https”, “free domain with SSL” or anything similar. There won’t be a lot of meaningful results. Sure, some of the results are pretty close, like this guide on how to get free SSL certification from Cloudflare, or this one on setting up a free dynamic hostname with SSL, but they all assume you already own a domain. If you’re looking for a completely free domain that you can use for your personal web server that also has verified SSL, there are very few results.

But why was I even looking for this?

I’m working on a side project. It has a web server that communicates with a static web page hosted on Github Pages. There are a lot of ways of setting that up; in my particular case, I have a local (as in in my house) HTTP web server accepting traffic on a non-standard port (port 80 is blocked by my ISP for commercial reasons – this detail is of paramount importance, but more on that later). It is accessible through my external IP (which is dynamic), which can be mapped to a dynamic DNS domain.

I wanted to run a simple API on the web server and access it through static pages (like this blog) hosted on Github Pages (which has a verified SSL certificate). I asked the Internet whether JavaScript on an SSL-verified page can call a different server that does not have a verified SSL certificate (that is, my aforementioned webapp running on my home server). It can’t, so the conclusion was that I somehow needed to get a verified SSL certificate for my dynamic DNS domain.

Having no idea whether this was possible, I started to research.

Setting up Dynamic DNS

Most ISPs provide dynamic IP addresses for their residential customers, while static IP addresses are usually reserved for the “commercial” or “business” tier. That means your public IP address changes from time to time (how often depends on the ISP), so DNS servers will have to keep track of your changing IP somehow. That kind of service is called Dynamic DNS, or DDNS for short.

Several companies provide DDNS service for free. Some of them also provide a free subdomain, which is useful if you don’t own a domain yourself (I don’t). I’ve tried out most of the free DDNS providers, the most prominent seeming to be Hurricane Electric, No-ip, Dynu and DuckDNS. If you’re up for it there are even several blog posts out there explaining how to set up your own dynamic DNS server.

I wasn’t feeling too adventurous so I decided to set up shop with DuckDNS. It is really easy to set up, comes with a great HTTP API for updating the domain’s TXT, provides free subdomains that don’t expire (No-ip for instance has subdomains that expire after 30 days), and has a valid SSL certificate. They have a page explaining how to set up the actual DDNS service, so I’ll skip that.

Caveat: carrier-grade NAT

One big potential problem in the DDNS setup is being behind a carrier-grade NAT (CGNAT), which some ISPs unfortunately use. In short, being in a CGNAT boils down to not having a public IP address – you’re part of your ISP’s private network, and your router’s “public” IP address is actually a private IP address within that private network, which the ISP translates to and from the Internet.

CGNATs suck, and they essentially make using DDNS impossible. You can find out if you’re behind a CGNAT by comparing your WAN IP address (displayed in the router admin page) and your public IP. If they differ, you’re probably behind a CGNAT.
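That comparison can be sketched in a few lines of Ruby. This is only a heuristic under stated assumptions: the likely_behind_cgnat? name is made up, the two addresses are typed in by hand (WAN IP from the router admin page, public IP from any "what is my IP" service), and the extra range checks rely on 100.64.0.0/10 being the address block reserved for carrier-grade NAT (RFC 6598) plus the usual private ranges, which could also indicate some other NAT layer.

```ruby
require "ipaddr"

# Shared address space reserved for carrier-grade NAT (RFC 6598).
CGNAT_RANGE = IPAddr.new("100.64.0.0/10")
# Private ranges (RFC 1918); a WAN address in one of these means some NAT sits upstream.
PRIVATE_RANGES = [
  IPAddr.new("10.0.0.0/8"),
  IPAddr.new("172.16.0.0/12"),
  IPAddr.new("192.168.0.0/16")
].freeze

# Hypothetical helper: true when the router's WAN address doesn't look like
# a real public address, or simply differs from the observed public IP.
def likely_behind_cgnat?(wan_ip, public_ip)
  wan = IPAddr.new(wan_ip)
  return true if CGNAT_RANGE.include?(wan)
  return true if PRIVATE_RANGES.any? { |range| range.include?(wan) }

  wan != IPAddr.new(public_ip)
end

puts likely_behind_cgnat?("100.72.1.5", "203.0.113.7")
puts likely_behind_cgnat?("203.0.113.7", "203.0.113.7")
```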

Setting up a verified SSL

I had set up the dynamic DNS service, and the next step was finding out if it was even possible to obtain a free valid SSL certificate for my subdomain.

Let’s Encrypt provides free valid SSL certificates, which are usually obtained by using Certbot, a handy tool that will handle most of the complicated SSL verification process for you. There are several other alternative tools that implement the same protocol used by Let’s Encrypt, but I really recommend using Certbot – it has much better out-of-the-box functionality than all the other tools I tried out, and the community is much bigger. The only caveat I could find is that you need sudo access to use it properly.

One thing I wish someone had told me before I spent hours looking for alternatives to Certbot is that it doesn’t have to be executed on the host that is ultimately going to use the SSL certificate. This might be a crucial bit of information if you can’t run as root on that host, which was my case. It is perfectly fine to run Certbot on a separate computer, obtain the SSL certificates and then scp them to the correct host.

Now, as I mentioned, my ISP blocks incoming traffic to port 80 for their residential customers. This is relevant because Let’s Encrypt uses by default the HTTP-01 challenge in the SSL verification process, and it requires port 80 to be reachable from the Internet. However, LE also offers the alternative DNS-01 challenge, which does not require any open ports (but requires the ability to update TXT domain records, which not all DDNS service providers allow – No-ip, for instance, does not). I happened to find out about this by reading this very helpful post from someone in a similar predicament (home server, port 80 not available) saying he used this alternative challenge successfully with DuckDNS (thank you!). In this Server Fault answer, the poster explains how to use Certbot with the DNS-01 challenge (thank you!).

Running Certbot with DNS-01 and DuckDNS

DNS-01 works by confirming that you can modify the DNS TXT record of your domain.

Here’s the command to start SSL verification with Certbot using DNS-01 and a DuckDNS subdomain, and the expected output:

$ sudo certbot -d  my-subdomain.duckdns.org --manual --preferred-challenges dns certonly

Saving debug log to /var/log/letsencrypt/letsencrypt.log
Plugins selected: Authenticator manual, Installer None
Obtaining a new certificate
Performing the following challenges:
dns-01 challenge for my-subdomain.duckdns.org

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
NOTE: The IP of this machine will be publicly logged as having requested this
certificate. If you're running certbot in manual mode on a machine that is not
your server, please ensure you're okay with that.

Are you OK with your IP being logged?
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
(Y)es/(N)o: Y

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Please deploy a DNS TXT record under the name
_acme-challenge.my-subdomain.duckdns.org with the following value:

xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Before continuing, verify the record is deployed.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Press Enter to Continue

At this point you have to do as the program says: update the DNS TXT record. Thankfully, this is exceedingly easy to do with DuckDNS (see their spec page for instructions).
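As a rough illustration of that update, here is a Ruby sketch that builds the request URL. It assumes the update endpoint as I understand it from DuckDNS's spec page (https://www.duckdns.org/update with domains, token and txt query parameters, answering OK or KO); the duckdns_txt_update_uri helper and the placeholder token are made up, and the spec page remains the authority on the exact parameters.

```ruby
require "uri"

# Hypothetical helper. Assumption: DuckDNS accepts a GET on /update with
# `domains` (subdomain name without ".duckdns.org"), `token` and `txt`.
def duckdns_txt_update_uri(subdomain:, token:, txt:)
  query = URI.encode_www_form(domains: subdomain, token: token, txt: txt, verbose: "true")
  URI::HTTPS.build(host: "www.duckdns.org", path: "/update", query: query)
end

uri = duckdns_txt_update_uri(
  subdomain: "my-subdomain",
  token: "your-duckdns-token", # placeholder, not a real token
  txt: "x" * 43                # the value Certbot asked you to deploy
)
puts uri

# To actually send the request (needs network access):
#   require "net/http"
#   puts Net::HTTP.get(uri)
```

Only after the record is in place (and visible through dig, as shown next) should you press Enter in the Certbot prompt.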

You can verify that the TXT was updated by running dig:

$ dig my-subdomain.duckdns.org TXT

; <<>> DiG 9.11.3-1ubuntu1.12-Ubuntu <<>> my-subdomain.duckdns.org TXT
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 21765
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;my-subdomain.duckdns.org. IN  TXT

;; ANSWER SECTION:
my-subdomain.duckdns.org. 59 IN  TXT "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

;; Query time: 335 msec
;; SERVER: 127.0.0.53#53(127.0.0.53)
;; WHEN: Mon Jun 15 18:50:41 -03 2020
;; MSG SIZE  rcvd: 114

Once you’ve confirmed the TXT value and pressed Enter, the remainder of Certbot’s output should be this success message:

Waiting for verification...
Cleaning up challenges

IMPORTANT NOTES:
 - Congratulations! Your certificate and chain have been saved at:
   /etc/letsencrypt/live/my-subdomain.duckdns.org/fullchain.pem
   Your key file has been saved at:
   /etc/letsencrypt/live/my-subdomain.duckdns.org/privkey.pem
   Your cert will expire on 2020-09-13. To obtain a new or tweaked
   version of this certificate in the future, simply run certbot
   again. To non-interactively renew *all* of your certificates, run
   "certbot renew"
 - If you like Certbot, please consider supporting our work by:

   Donating to ISRG / Let's Encrypt:   https://letsencrypt.org/donate
   Donating to EFF:                    https://eff.org/donate-le

All set! You now have a valid SSL certificate. You’ll still need to place it in the right place, which will vary depending on what web server you’re using. For example, if you’re using Nginx, the configuration file might look something like this:


server {
  ssl_certificate /path/to/fullchain.pem;
  ssl_certificate_key /path/to/privkey.pem;
  ...
}
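Let’s Encrypt certificates are short-lived (the one above expires about three months after it was issued), so it can be handy to check how many days a certificate has left before renewing it. Here is a small Ruby sketch of that check using the standard openssl library; the days_until_expiry name is made up, and the path in the usage comment is just the one from the Certbot output above.

```ruby
require "openssl"

# Hypothetical helper: whole days left before the certificate in `pem` expires.
def days_until_expiry(pem, now: Time.now)
  cert = OpenSSL::X509::Certificate.new(pem)
  ((cert.not_after - now) / 86_400).floor
end

# Usage (reading the live certificate usually requires root):
#   pem = File.read("/etc/letsencrypt/live/my-subdomain.duckdns.org/fullchain.pem")
#   puts days_until_expiry(pem)
```

When fullchain.pem holds several certificates, OpenSSL::X509::Certificate.new reads the first one, which is the certificate for your own domain.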

Conclusion

There are quite a lot of shady-looking websites out there offering for a monthly fee the exact same thing as I just wrote about. When researching this, not knowing too much about most of these topics, I was almost fooled into accepting that this just couldn’t be done for free for some unknown technical reason. There had to be a reason why there were no Google results for this – maybe my case was too specific, or maybe other people are less cheap than I am and just pay for a domain and get the SSL stuff for free.

I still have no good explanation as to why the kind of guide I just wrote above didn’t show up in my research. Maybe people don’t care about home servers, or maybe I’m just not too good at searching (probably both). In any case, hopefully this post will make it clear that setting up a DDNS subdomain with SSL for free is not only possible, but really not that complicated.

Communication tips for remote developers

2020-05-30T15:16:00Z

We're all remote -- for now.

Communicating well with your co-workers and managers is supremely important to a software developer, and even more so for the remote one. With a lot more remote workers due to the COVID-19 pandemic, this topic became a lot more relevant.

I’ve seen people hint at this more than a few times over the years, but I didn’t really “get it” until I started working as a fully remote engineer. I also find it important to understand not only what we should be doing to achieve efficient communication, but also why we should be doing those things in those ways.

To me, the single most important thing to keep in mind is that other people’s mental resources (time, attention span, etc.) are limited, just like yours.

That might be obvious, but different management styles might make it seem otherwise – a more hands-on manager might kind of seem like he is acutely aware of what you’re doing all the time, but that is hardly ever the case. Managing styles aside, managers are, after all, humans like the rest of us and have limited time and resources. They can’t possibly have the same insight into each task as the respective engineers working on them.

The corollary to that is that it is your job to keep managers in the loop, providing the right amount of information at the right time through the right channel (text, video call, presentation…). Just like any other skill, this is something you can learn over time.

Similar reasoning applies to co-workers: people are usually deeply involved in whatever it is they’re doing, so they won’t usually know too many details about what you or other co-workers are doing all the time.

A lot of the things you need to do are fairly obvious and well-known: be clear about what you’re saying, keep communicating frequently, etc. Other things aren’t too obvious (at least to me, that is) and are worth sharing in this short blog post.

Don’t write a novel

Reading is hard. People’s availability and attention span vary. Try to get your point across with as few words as possible.

If I’m writing an issue update, pull request, or other technical information, I usually start with a more winding text and then prune as much of it as I can. This can be really simple, like changing “According to #85748, the problem I described started when…” to “The problem began when … (#85748)”.

This can’t be done at the expense of clarity, though. It is better to write or say “I think option B is the way to go” than an ambiguous “sounds good”.

Manage expectations

People don’t like to feel disappointed. To avoid unfulfilled expectations, it is your job to make sure those expectations stay realistic – the more so when the task evolves or unravels into something much more complicated than people originally expected.

As we all know, there’s a lot of uncertainty in this job. Something that seems easy might actually be super easy or might be very hard. Unexpected difficulties are expected, and most people are fine with that as long as they also think that those problems are actually problems.

That’s a huge caveat: if you’re unable to convince other people about the seriousness of the unexpected problems you’re facing, you might as well not say anything at all about them. People usually only believe what they understand, and it is your job to properly communicate that to people less involved in the task than you are.

Tailor to the audience

Different people use different lingo to express the same things. Sales people will use different terms than engineering people.

Adjusting your language to the audience isn’t just about replacing technical words with other words, though, but also about cropping the information in the right way.

Excess information that isn’t relevant to the point you want to get across generates noise and confusion. This might be seen as a broader definition of the first topic: if you can manage to get your point across with less information (whatever that is: spreadsheets, images, etc), then that is certainly desirable. If the person you’re talking or presenting to is only interested in 1 of the 3 columns of a spreadsheet, although the other 2 might be insightful to you, you should probably refrain from showing them at that moment.

And that is the note I’m ending this post on. These three things are probably obvious or second nature to a lot of people, but at least to me, it took a few years of remote work to fully appreciate them. Hopefully this post can be helpful to other like-minded developers.

Figuring out the Nvidia x Linux puzzle

2020-05-16T19:48:00Z

Ubuntu's power rate over time.

I’ve struggled with some kind of problem with Nvidia graphics cards on Linux since forever.

Most commonly, an external monitor wouldn’t work or the dedicated card would refuse to power off when it should.

The latter problem – a power-hogging discrete Nvidia card not turning off when it isn’t needed, specifically in Optimus-enabled laptops – has consistently haunted me throughout the years. At least in my experience, this problem is in that sweet spot of things that are definitely annoying and kind of inconvenient, but complicated enough not to be worth the several work-hours needed to definitively solve it.

I know that I’m not alone here, as other people over the internet have said things like “I’ve been pulling my hair out for the past few hours trying to configure my graphics drivers on my laptop”. I’ve also not been a total sloth about this: I have tried many times in the past to fix it, and every time I found myself thinking “okay, now this is fixed”, only to notice a few hours or days later that my laptop battery had drained in an hour and the problem was back. I actually re-wrote a significant part of this post because when I thought I was finished, my Nvidia card started turning on again and I had to do more research.

Taking advantage of the extra time on my hands due to the Covid-19 city-wide lockdown, I decided to persistently look for a solution to this issue. This guide is just a documentation of this process. I use Ubuntu, but similar steps should be valid with whatever distro you’re using. Also, some or many of the steps might not actually be necessary - they’re just what happened to finally work in my case.

1. Install the proprietary Nvidia drivers

By default, Ubuntu uses the open-source Nouveau driver for Nvidia cards, which doesn’t play well with Optimus-enabled laptops. Let’s install the proprietary Nvidia driver.

First, find out which Nvidia driver is recommended:

$ ubuntu-drivers devices

== /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0 ==
modalias : pci:v000010DEd00002191sv00001462sd00001274bc03sc00i00
vendor   : NVIDIA Corporation
driver   : nvidia-driver-435 - distro non-free
driver   : nvidia-driver-440 - distro non-free recommended
driver   : xserver-xorg-video-nouveau - distro free builtin

== /sys/devices/pci0000:00/0000:00:14.3 ==
modalias : pci:v00008086d0000A370sv00008086sd00000034bc02sc80i00
vendor   : Intel Corporation
manual_install: True
driver   : backport-iwlwifi-dkms - distro free

Then install it:

$ sudo apt install nvidia-driver-440

Another option is to pick the driver in the Additional Drivers tab of the Software & Updates tool:

Nvidia proprietary driver option in Ubuntu's Additional Drivers menu.

Nvidia’s proprietary driver lets you choose if you want to use the dedicated or integrated GPU, which you can try setting:

Nvidia proprietary driver's GPU selection menu.

Now if you’re lucky this might be enough. Check the power usage using Ubuntu’s Power Statistics tool or powertop: if the Nvidia card is successfully turned off, then typical power usage is somewhere between 8 and 14 W. If, like me, this changed nothing in your power usage, read on.

2. Install and configure bbswitch

Although Nvidia’s proprietary driver allows selecting between integrated and dedicated cards, in my experience that setting has had no effect at all, with both cards always being powered on anyway.

bbswitch is a kernel module that lets you power the discrete card on and off on Optimus laptops. Ubuntu has the bbswitch-dkms package available:

$ sudo apt install bbswitch-dkms

Then configure it to always turn off the discrete card by creating the following file:

$ cat /etc/modprobe.d/bbswitch.conf
options bbswitch load_state=0
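Once the module is loaded, bbswitch reports the card's power state through /proc/acpi/bbswitch, as a single line with the card's PCI address followed by ON or OFF. Here is a small Ruby sketch for reading it; the parse_bbswitch_state name is made up, and the sample PCI address is just the one from the ubuntu-drivers output above.

```ruby
# Hypothetical helper: parses a line such as "0000:01:00.0 OFF".
# Returns nil when the text doesn't look like bbswitch output.
def parse_bbswitch_state(text)
  m = text.strip.match(/\A(?<pci>\h{4}:\h{2}:\h{2}\.\h) (?<state>ON|OFF)\z/)
  m && { pci: m[:pci], on: m[:state] == "ON" }
end

# On a machine with bbswitch loaded:
#   state = parse_bbswitch_state(File.read("/proc/acpi/bbswitch"))
state = parse_bbswitch_state("0000:01:00.0 OFF\n")
puts state[:on] ? "discrete card is powered on" : "discrete card is powered off"
```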

3. Blacklist Nouveau driver

According to this Stack Overflow answer, there seem to be at least a couple of bugs that result in Ubuntu trying to load the Nouveau module even if you’re using a proprietary Nvidia driver. When that happens, the discrete Nvidia GPU turns on and starts hogging a lot of power.

Blacklisting the Nouveau module solved this issue for me:

$ sudo bash -c "echo blacklist nouveau > /etc/modprobe.d/blacklist-nvidia-nouveau.conf"
$ sudo bash -c "echo options nouveau modeset=0 >> /etc/modprobe.d/blacklist-nvidia-nouveau.conf"

Restart and confirm that the right driver is loaded:

$ gpu-manager | grep nouveau
Is nouveau loaded? no
Is nouveau blacklisted? yes

4. Blacklist some Nvidia modules

Even after the above, my system kept turning on the nvidia card seemingly at random. I found this post in the Bumblebee issue tracker to be extremely helpful:

“bumblebee can turn the nvidia card off when it starts, but as soon as the nvidia module is loaded, it loads nvidia_drm, which links to drm_kms_helper and then bumblebee can’t remove the nvidia modules. This means that bumblebee can’t turn off the nvidia card when optirun terminates. To fix this, I added “alias nvidia_drm off” and “alias nvidia_modeset off” to my conf file in /etc/modprobe.d.”

This is the file created by the OP:

$ cat /etc/modprobe.d/nvidia.conf

blacklist nvidia
blacklist nvidia_drm
blacklist nvidia_modeset
alias nvidia_drm off
alias nvidia_modeset off

After creating this file and restarting, my system was finally using only the Intel integrated card. Hopefully this time it’ll stay that way.
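Since the card had a habit of coming back on its own, a quick way to keep an eye on it is to check which GPU kernel modules are currently loaded. The Ruby sketch below reads the same information lsmod shows, from /proc/modules (one module per line, name first); the loaded_gpu_modules name and the watch list are my own choices for illustration.

```ruby
# Modules discussed in this post; anything in this list being loaded
# suggests the discrete card may be in use.
WATCHED_MODULES = %w[nouveau nvidia nvidia_drm nvidia_modeset].freeze

# Hypothetical helper: takes the contents of /proc/modules and returns
# the watched modules that are currently loaded.
def loaded_gpu_modules(proc_modules_text)
  loaded = proc_modules_text.each_line.map { |line| line.split.first }.compact
  WATCHED_MODULES & loaded
end

# On a real machine: loaded_gpu_modules(File.read("/proc/modules"))
sample = <<~MODULES
  i915 1945600 18 - Live 0x0000000000000000
  nvidia_drm 45056 4 - Live 0x0000000000000000
  bbswitch 16384 0 - Live 0x0000000000000000
MODULES
puts loaded_gpu_modules(sample).inspect
```

An empty result after a reboot is what you want to see here.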

Results

Here’s a chart of my laptop’s power rate:

Ubuntu's power rate over time.

Using the integrated Intel GPU, the rate fluctuates around 10W. When the Nvidia card kicks in, which is what was going on around the middle of the chart, it jumps to 40W+.

Repurposing an old Android phone as a Ruby web server

2020-02-05T12:24:41Z

CC-BY Carlos Varela, https://www.flickr.com/photos/c32/7755470064

Do you have an old Android phone? Sure you do! There’s a mind-blowing amount of electronic waste of all kinds, and with the average person in developed countries discarding their phones every couple of years, discarded smartphones are probably one of the most common forms of e-waste.

I had an old Motorola G5 Cedric gathering dust, so I decided to do something with it – it is now running a Puma web server with a simple Sinatra webapp.

Now, before going any further, you might be thinking: what is the real, practical use of all this? An old Android phone probably isn’t going to have a stellar performance, but neither do those t2.nanos, honestly. I have yet to deploy any “real” code on an Android, but even the cheaper and older phones do commonly have quad-core or even octa-core CPUs, and at least 2GB RAM, so at least in theory a phone should be close – ballpark, at least – to the most modest cloud IaaS offers out there (t2.nano has 512MB for instance). Of course, a phone has an ARM processor while IaaS usually are x86; memory management is entirely different as well, but still – we’re talking ballpark estimates here.

Anyway, this is a short tutorial on how to repurpose an Android device as a web server – or any number of different things, really.

1. Install Termux

First of all we need a Linux environment on our phone. Termux is a terminal emulator and Linux environment for Android. It’s available on Google Play Store. No additional configuration is needed after installation.

2. Set up SSH

You won’t want to type a lot of commands into a tiny touchscreen, so let’s set up ssh so that we can log into Termux remotely.

There are several ways of doing this, but I’ve found that the easiest way is through Dropbear, a lightweight SSH server:

Run this on Android:

pkg upgrade
pkg install dropbear

You can use password-based authentication or public key authentication. You should use key-based authentication, but for testing purposes password-based is easiest. Run this on Android:

passwd
New password:
Retype new password:
New password was successfully set.

Bonus points: install a terminal multiplexer like tmux or screen. This will make your life much easier when running stuff via ssh:

pkg install tmux

Now go ahead and test the connection on your desktop:

ssh android-ip-address -p 8022

3. Set up static IP address on Android

Go to wifi settings, disable DHCP and assign an IP address for your phone.

This is necessary so that your router won’t assign a new IP address to your phone every few hours/days, which would make configuration a lot harder.

4. Install Ruby, Bundler, Sinatra and Puma

Sinatra is a lightweight web application framework, and Puma is a web server.

Ruby is, well, Ruby!

Of course, Sinatra and Puma are just suggestions – you could even use full-blown Rails on your phone, as described in this neat tutorial. Just don’t use WEBrick, the default Rails development server in older Rails versions – it isn’t designed for production environments (it is fine for small experiments though).

Run this on Android:

pkg install ruby
gem install sinatra puma

5. Install nginx

nginx is a web server, reverse proxy and load balancer. Although most useful in multi-server setups where it is used to distribute requests among different instances, nginx is also a good idea in our setup because of the built-in request rate limiting (handy against basic flooding attempts) and static file serving that it provides.

Run this on Android:

pkg install nginx

Now the slightly tricky part is configuring nginx to work with Puma. This gist is a pretty good start – copy & paste nginx.conf and change appdir to your webapp’s root dir. In my case, for example, that would be /data/data/com.termux/files/home/android-sinatra.

6. Set up port forwarding

You probably want your web server to be accessible through the internet, so you’ll have to set up port forwarding in your router to redirect incoming requests to your public IP address to your brand new Android web server.

How exactly to do this varies depending on your router. Here’s a pretty good tutorial to get you started.

7. Configure a dynamic DNS

Most people have dynamic public IP addresses. In these cases it is useful to set up a dynamic DNS (DDNS) service, which provides you with a static domain name that automatically points to whatever your public IP address is at that moment.

There are few free services that provide DDNS nowadays; I’m using no-ip and it has been okay so far. You do have to “renew” your domain every month though.

After setting up a DDNS, you’ll have to configure your router as well so that it periodically notifies the DDNS service with your current IP address. Again, how exactly to do this depends on your router model.

Hello world!

Puma and nginx running on a Motorola G5.

Under siege

You can simulate real-world usage through siege, an HTTP load-testing tool. Here’s a screenshot of siege running on my setup with 3 concurrent users (real tests would use bigger numbers):

siege running in the foreground; nginx logs and top on remote (android) running in the background terminals.

The numbers in that screenshot don’t matter much because our webapp was serving a simple 100-char response with a timestamp, but it is enough to at least know that the server can handle a few concurrent users.
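For reference, an endpoint of that kind is tiny. The sketch below is not the app from the screenshot; it is a dependency-free stand-in written as a plain Rack-style callable (Sinatra apps are Rack apps under the hood), with a made-up greeting padded to 100 characters. The APP name and the padding are assumptions for illustration.

```ruby
require "time"

# A Rack-style app: any object responding to `call(env)` and returning
# [status, headers, body]. Hypothetical stand-in for the test endpoint.
APP = lambda do |_env|
  body = "Hello from Android at #{Time.now.utc.iso8601}".ljust(100, ".")
  headers = { "content-type" => "text/plain", "content-length" => body.bytesize.to_s }
  [200, headers, [body]]
end

status, _headers, body = APP.call({})
puts status
puts body.first
```

Saved in a config.ru file that ends with `run APP`, this could be served by running puma in the same directory.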

Epilogue: safety

If you’ve watched Mr Robot, you know that the internet can be a dangerous place. That is a lot more true if you have a web server open to the internet.

Within a few hours of opening up the server, it was already being crawled by all sorts of things. Most are innocuous indexing robots, but some are definitely not so nice, like these two requests:

Most of those requests seem fine, but the two in red are probably some kind of attack.

So the headline here is: keep all software updated, keep an eye on access logs and maybe go through nginx safety guides such as this and this.

Speeding Up the Backend with Graph Theory

2019-11-06T22:29:15Z

Here at Sensor Tower we handle large volumes of data, so to keep things snappy for our customers we need to think carefully about how we process and serve that data.

Understanding the data we’re handling is a fundamental part of improving the way we serve it, and by analyzing how an important backend service worked, we were able to speed it up by a factor of four.

This post was originally published on the Sensor Tower blog.

Background

We have many user-facing endpoints in Sensor Tower. So many, in fact, that we have numerous dashboards to keep tabs on how the system behaves.

A few months ago, we noticed that a particular and very important endpoint was very sluggish: While all the other endpoints of the same type had <50ms latencies, this particular service took a leisurely 300 to 500ms to respond. Here’s a diagram of how that looked:

The customer doesn’t look very happy up there! So, we decided to take some time and do an in-depth analysis of that endpoint.

Okay, now to a more serious diagram. Here’s what that endpoint looked like:

The numbered steps in the diagram above perform the following operations:

1. Decode a Protobuf string and build a Ruby object from it;
2. Modify the object;
3. Encode the Ruby object back to Protobuf.

In essence, the endpoint receives Protobuf-encoded strings, does some work on them, and returns a processed version of the Protobuf-encoded string to the client. If you don’t know what Protobuf is, that’s okay; I didn’t either. You can think of it as something similar to JSON: A serialized tree structure encoded using a binary format rather than text.¹

Once we pinpointed that the endpoint slowness was due to Protobuf parsing, the next step was to try and find bottlenecks in the algorithm. The proper way to do this (which is not just to read the code thoroughly) is by using a profiler.

With the help of ruby-prof (generate profile data) and KCacheGrind (view profile data), we were able to identify two methods, #find_all and #encode, that took a large portion of the CPU time:

While a profiler is a useful tool to identify potential problems, it has its limitations. Profiling helps you visualize data from a single run through your code. That might be fine if you’re analyzing a simple algorithm that deals with very homogeneous input, but is not really enough in the case of our back-end service, which receives thousands of very different inputs every hour.

In other words, we also needed to validate the profiler results with more data.

Taking this understanding into account, we opted for benchmarking a few thousand requests. Specifically, we benchmarked the #find_all and #encode methods and found out that, while the time they consumed relative to the total time varied, the sum of those two methods took almost 100 percent of the total time of the entire endpoint. At this point we knew we could focus our attention on these two methods.
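To make the idea concrete, here is a minimal Ruby sketch of that kind of benchmark: it runs a pipeline of named steps over many inputs and reports each step's share of the total wall-clock time. It is not the code we used; the time_shares helper is made up, and the two lambdas are trivial stand-ins for #find_all and #encode.

```ruby
# Monotonic clock, so system clock adjustments don't skew measurements.
def monotonic_now
  Process.clock_gettime(Process::CLOCK_MONOTONIC)
end

# Hypothetical helper: `steps` maps a name to a callable; each input is
# passed through the steps in order. Returns name => fraction of total time.
def time_shares(inputs, steps)
  step_totals = Hash.new(0.0)
  started = monotonic_now
  inputs.each do |input|
    steps.reduce(input) do |value, (name, step)|
      t0 = monotonic_now
      result = step.call(value)
      step_totals[name] += monotonic_now - t0
      result
    end
  end
  total = monotonic_now - started
  step_totals.transform_values { |t| total.zero? ? 0.0 : t / total }
end

# Stand-ins for the real methods, for illustration only.
steps = {
  find_all: ->(str) { str.split(",").reject { |s| s.start_with?("X") } },
  encode: ->(parts) { parts.join(",") }
}
inputs = Array.new(2_000) { "A,B,C,D,X1,F" }
time_shares(inputs, steps).each { |name, share| puts format("%-9s %5.1f%%", name, share * 100) }
```

Measuring over many varied inputs, rather than one profiled run, is what gives confidence that the same two methods dominate across the board.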

Naturally, the first step was to understand what each of those methods did.

#find_all is responsible for decoding and modifying the object (steps one and two in the diagram), while #encode is exactly what the name implies: It re-encodes the modified Ruby object back to Protobuf (step three).

With that said, let’s go through the optimizations performed in both of these methods.

First Optimization: Encode

Before we dive into the first optimization, let’s first explain how exactly the encoding/decoding process works. Here’s an overview of what it does:

*Those Strings aren't really Protobuf -- because it is a binary encoding, it isn't too easy on the eyes, so for the sake of readability we're using this pseudo-JSON representation.

There are two things we need to point out about this decoding/encoding process that might not be obvious:

Both encoding and decoding work recursively, starting at the root and finding their way down to the leaves;
Each node in the Ruby object also contains the original Protobuf string for that node. So for instance the node C in the example above also contains the following string: "{ C: { B: [A]}, F: [D, X1] }"; the node B contains this other string: "{ B: [A] }", and so on.

Now we’re ready to understand the actual optimization.

Let’s take a detailed look at the modification process:

One of the most fundamental steps of any optimization is avoiding repeated work. If we look closely, we can see that the nodes in blue (A, B, D) were not modified: Look at the strings generated by decode (left side, in yellow) and compare them with the ones generated by encode (blue, to the right)—they’re identical! Conversely, nodes in red (C, F) were indeed modified: The strings are different. So, now we know there is some potentially repeated work going on.

The first optimization eliminated this repeated work. Instead of always encoding every single node, we now encode only those nodes that were modified. All the rest of the nodes already have a valid Protobuf string stored as an instance variable, and that string is identical to what we would obtain if we were to run #encode on them.

The actual code change to implement this was quite simple: Just a matter of adding a dirty flag to each node, and marking the node as @dirty = true if it or one of its descendants was modified.
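
To make the idea concrete, here is a minimal sketch of the dirty-flag approach. It is not our production code: the Node class, its pseudo-JSON output, #remove_child, #mark_dirty! and the encode_calls counter are all hypothetical and exist only for illustration. Only #encode and @dirty come from the description above.

```ruby
# Hypothetical node class. The "encoding" is the pseudo-JSON used in this
# post, not real Protobuf.
class Node
  class << self
    attr_accessor :encode_calls # only here to make the saved work visible
  end
  self.encode_calls = 0

  attr_reader :name, :children
  attr_accessor :parent

  def initialize(name, children = [])
    @name = name
    @children = children
    @children.each { |child| child.parent = self }
    @cached = nil # string for this node, as stored after decoding
    @dirty = true # nothing cached yet, so the first #encode does the work
  end

  # Modifying a node marks it and every ancestor as dirty.
  def remove_child(child)
    @children.delete(child)
    mark_dirty!
  end

  def mark_dirty!
    @dirty = true
    parent&.mark_dirty!
  end

  # Clean nodes return their stored string; only dirty nodes are re-encoded.
  def encode
    return @cached unless @dirty

    Node.encode_calls += 1
    encoded_children = children.map(&:encode).join(", ")
    @cached = children.empty? ? name : "#{name}: [#{encoded_children}]"
    @dirty = false
    @cached
  end
end

a  = Node.new("A")
b  = Node.new("B", [a])
d  = Node.new("D")
x1 = Node.new("X1")
f  = Node.new("F", [d, x1])
c  = Node.new("C", [b, f])

c.encode               # stands in for decoding: every node now holds its string
Node.encode_calls = 0
f.remove_child(x1)     # marks F and its ancestor C as dirty
puts c.encode          # prints "C: [B: [A], F: [D]]"
puts Node.encode_calls # prints 2: only C and F were re-encoded
```

In this toy tree, A, B and D are never touched again after the first pass, which mirrors the blue nodes in the diagram.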

This optimization alone reduced the endpoint’s execution time by 30 percent. Here’s the execution time chart right after deploying the optimization:

Second Optimization: Finding a Node

The first optimization worked on the #encode method, so the natural next step was to look at the other time-consuming method, #find_all.

As we briefly mentioned, #find_all is responsible for two things: Decoding the Protobuf string into a Ruby object and modifying the object itself.

Unfortunately, there is no way of knowing beforehand if we’ll need to modify anything or not, so we’ll always have to do the decoding step. But what about the other thing #find_all does, modifying the object?

Before diving in, let’s recall a few things: 1. Protobuf is a tree-based data structure; 2. The trees we receive have no internal order to take advantage of; 3. Our algorithm searches for specific nodes and removes them from the tree; 4. We don’t know what the trees look like beforehand.

Before this optimization, #find_all was running a simple tree traversal to try and find those specific nodes mentioned in item three above. This is an acceptable approach when your input is small or when you’re not too worried about response time, but when you have massive inputs and want to deliver the smallest possible runtime, tree traversals can be a problem: They have linear time complexity (O(n), where n is the number of nodes).

Once we know the path to a node, though, accessing it is very cheap: It can be done in logarithmic time, O(log n). This is possible because of a mathematical property of trees: Tree height is roughly a logarithmic function of the number of nodes (it might degenerate into a linear function as well, but let’s leave those explanations to the textbooks), so the longest path to a node (that is, from the root down to the deepest leaf) is, on average, bound by that same logarithmic constraint.
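
The difference between the two kinds of access can be sketched in a few lines. This is only an illustration under simplifying assumptions: the decoded tree is modeled as nested hashes keyed by node name, and TREE, find_paths and follow_path are hypothetical names, not part of our codebase.

```ruby
# Hypothetical decoded tree: nested hashes keyed by node name.
TREE = {
  "C" => {
    "B" => { "A" => {} },
    "F" => { "D" => {}, "X1" => {} }
  }
}.freeze

# Full traversal: visits every node, so it costs O(n).
# Returns the paths to all nodes named `target`.
def find_paths(tree, target, prefix = [])
  tree.flat_map do |name, children|
    path = prefix + [name]
    matches = name == target ? [path] : []
    matches + find_paths(children, target, path)
  end
end

# Path lookup: one hash access per level, so the cost is bounded by the
# length of the path (at most the tree height). Expects a non-empty path.
def follow_path(tree, path)
  tree.dig(*path)
end

p find_paths(TREE, "X1")        # prints [["C", "F", "X1"]]
p follow_path(TREE, %w[C F X1]) # prints {}
p follow_path(TREE, %w[C B X1]) # prints nil, there is no such node
```

The traversal is what tells us where a node lives; once that path is known, Hash#dig gets there without looking at anything else in the tree.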

So, we started looking closely at which paths we were going through to access those few nodes we wanted to remove. Ideally, there would be a single, universal path found in all the trees we ever encounter. That way, we could store that single path and always be guaranteed to find the nodes we want. Conversely, the worst possible outcome would be that every tree had a unique path to those nodes.

The truth lay somewhere in between those two extremes (thankfully for us, it leaned more towards the former than the latter). Here’s a chart of the number of different paths we found over time (the two curves represent paths to two of the nodes we want to remove):

Without going into too much detail about that chart, just notice that it is very logarithmic! This is excellent for us, because it means that with a relatively small number of paths we can find a very large percentage of the total nodes we want to find (and for the few we don’t, not finding them is okay). The next chart compares what we actually found (logarithmically growing paths) with the worst possible scenario mentioned previously:

So, the second optimization was, in the end, also very simple: We collected a large enough number of different paths and then followed them in logarithmic time to find the nodes we wanted to modify, instead of doing a full tree traversal that takes linear time. This was responsible for a 300 percent speedup:

Caveats

You might have noticed that this method is not perfect in the sense that it doesn’t always find all the nodes that we would’ve found using a complete tree traversal. This is quite true: The optimization comes with an accuracy trade-off. While this might be a deal-breaker for systems where you actually need 100 percent accuracy at all times, this wasn’t really a problem for us; missing a few nodes out of the several thousand we process each hour wasn’t really a big deal.

As time passes, however, and different trees keep coming in, the precision of this approach eventually declines to a level that is significant even for our not-too-strict requirements. This happens slowly, because of how different paths appear (following a logarithmic function), but surely—our accuracy was ever-descending because we had used a fixed number of paths.

After applying our optimized algorithm to the payload and responding to the request, we post-process a subset of the requests in the background and dynamically update the path definitions. This way, we always have a very high success rate on the parsing but keep the latency of responding to a particular request low.
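
A rough sketch of how the two halves could fit together is shown below. Everything in it is an assumption made for illustration: the PathStore class, its method names, the nested-hash tree model and the "every Nth request" sampling rule are hypothetical, and a real system would run the learning step in an actual background job rather than inline.

```ruby
require "set"

# Hypothetical store of known paths to the node we want to remove.
class PathStore
  attr_reader :paths

  def initialize(target, sample_every: 10)
    @target = target
    @sample_every = sample_every
    @paths = Set.new
    @requests = 0
  end

  # Fast step, run while serving the request: try only the known paths.
  # Returns how many nodes were removed.
  def remove_known(tree)
    @paths.count do |path|
      *parent_path, leaf = path
      parent = parent_path.empty? ? tree : tree.dig(*parent_path)
      next false unless parent.is_a?(Hash) && parent.key?(leaf)

      parent.delete(leaf)
      true
    end
  end

  # Slow step, meant to run after the response was sent: on a sample of
  # requests, do a full traversal and remember the paths to any leftovers.
  def learn_from(tree)
    @requests += 1
    return false unless (@requests % @sample_every).zero?

    discover(tree).each { |path| @paths << path }
    true
  end

  private

  def discover(tree, prefix = [])
    tree.flat_map do |name, children|
      path = prefix + [name]
      (name == @target ? [path] : []) + discover(children, path)
    end
  end
end

store = PathStore.new("X1", sample_every: 2)

first = { "C" => { "F" => { "D" => {}, "X1" => {} } } }
store.remove_known(first) # nothing removed, no paths are known yet
store.learn_from(first)   # not sampled

second = { "C" => { "F" => { "X1" => {} } } }
store.remove_known(second)
store.learn_from(second)  # sampled: learns the path C -> F -> X1

third = { "C" => { "F" => { "X1" => {}, "D" => {} } } }
puts store.remove_known(third) # prints 1
```

Running the traversal on the already-processed tree is a deliberate choice in this sketch: whatever target nodes are still there are exactly the ones the known paths missed.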

Final Results

Here’s a chart showing our execution time before and after we rolled out both optimizations:

We effectively reduced execution time from 300-500ms to 80ms with almost no impact to the user.

Notes

Protobuf, or Protocol buffers, is a binary serialization method developed by Google. ↩

My attempt at creating more

2019-08-30T00:00:00Z

I began blogging in the now prehistoric late 2000s.

I’ve done a few blogs about different subjects (computer science, algorithms, web development, short stories and political ramblings). I’ve had blogs on Blogspot, Wordpress and, more recently, Medium.

Those platforms were (or are, I suppose) an easy way to spew your ideas over the Internet while also being nice and comfy for other people to actually read (this last point is important for the CSS-challenged such as yours truly). In other words, those services Got Shit Done™.

Alas, as I opened my eyes to the wonders of web development I started noticing a few things. First, Wordpress is written in PHP, which is gross (just kidding). Second, you don’t really control much: you can pick themes or whatever, but you won’t have the full control you’d have by creating a website from scratch or nearly scratch. Third, and maybe a corollary to the previous point, that stuff is bloated. There’s approximately 3 terabytes of mostly useless JavaScript, ads and all kinds of crap I don’t care about.

But most importantly, I understood the hidden costs of most “free” web services. You don’t really own anything. You provide content, and Wordpress or Google or whoever packages that content into a neat bundle and serves it to your audience together with whatever else (cough, trackers) they see fit.

That’s one of the reasons that pushed me towards a less-walled-garden approach to blogging. But there’s another reason as well.

Like many others, I have fond memories of the late-90s/early-2000s Web 1.0 Internet. There is something warm and fuzzy about those beautifully terrible Geocities pages. They pierced the eyes of the viewer but were wondrous in a way. As I said, I’m not alone: Web 1.0 nostalgia is definitely on the rise.

But why? What is not to like in our world of beautiful walled gardens? Surely it is better than those gross-looking Web 1.0 fan sites about some crappy GameBoy game, right? …Right?

We're soooo cooler than this in 2019.

Wrong.

Well, in many ways the Internet has of course improved over time. It has useful things like search engines and Wikipedia, and convenient subscription-based entertainment like Netflix. It has a whole bunch of nice stuff I could spend hours blabbering about.

But it also has a lot of problems. At this point there are surely many PhD theses about most of them, so I won’t bother. I’m just going to recommend one New York Times article that explains how YouTube indirectly helped elect a buffoon that makes high school-tier penis jokes as president of Brazil.

Now, the specific problem with today’s Internet that I feel is most relevant regarding blogging is how we’re gravitating towards all these “free” services all the time. Medium, for instance, is so nice looking that one doesn’t even think of perhaps using something else. But what happens when everyone uses Medium? First, all the blogs look exactly the same, which is lame. Second, Medium gets all that content and traffic for itself, for free.

Of course, not everyone is skilled enough to build a personal blog from scratch. I am just barely able, as you can see from my lackluster front-end skills (I promise you I’m good on back-end things). So I’m definitely not dismissing the inclusiveness that services like Medium offer.

But as I searched for a way out of the walled gardens and fiddled with Jekyll for a while, I figured I might as well just build something to call my own.

So I built this.

The goals were to create the simplest possible blogging system with as little fluff as possible. It should meet what I defined as basic blogging needs: list posts, show post, use tags, use images etc. And also not have 3 terabytes of JavaScript split in 90 requests just to show a fancy menu button.

So I started messing with Nanoc, an excellent Ruby library for static page generation, and came up with this bad boy.

I won’t pretend these ideas are new. They aren’t! I feel there’s been an increasing amount of Web 1.0 nostalgia going on, and a big part of that is probably fueled by similar sentiments as those I described. The longing for simplicity in a world of trillions of new JS frameworks is also quite widespread these days.

This small project is nothing special. There are much better projects available for free on the Internet done by people that actually know what they’re doing with a CSS file. This here is just a tiny vase with some ugly flowers – it would be ridiculous to compare it to the beautiful walled gardens of Medium or Wordpress. But, ugly as they are, they’re mine!

Halving page sizes with srcset

2018-09-03T00:00:00Z

Web bloat is discussed a lot nowadays. Web pages with fairly straightforward content, such as a Google search results page, are substantially bigger today than they were a few decades ago, even though the content itself hasn’t changed that much. We, web developers, are at least partly to blame: laziness or just bad programming are definitely part of the problem (of course, laziness might stem from a tight or impossible deadline, and bad code might come from inexperienced programmers; no judgment going on here).

But here at Guava we believe that software should not be unnecessarily bloated, even if the bloated version is slightly easier to develop and ship. We believe in delivering high quality production code, and a part of that is not taking the easy way out at the expense of page size.

We frequently have to start working on long-running software that has more than a few coding shortcuts that were probably necessary at the time to ship something quickly to production, but are now aching for optimization. Sometimes the improvements are too time-consuming to be worth our trouble, but sometimes they are an extremely easy win.

Such is the case of separating image assets by pixel density (DPI). As the name implies, DPI (dots per inch) is the number of dots (or pixels, in our case) that fit along one inch of screen. The exact definition varies according to context, so for the sake of readability we’ll say that low DPI means the average desktop or laptop screen and budget smartphones, while high DPI means the average smartphone, tablet or higher-resolution computer screens (e.g. Retina displays and 4k monitors).

Nowadays, smartphone customers are important to most online retail businesses, which means that we should serve high DPI images when necessary. The “when necessary” part is important because the easy way out is to always serve high DPI assets, even though the client device might not need them. The problem with this is that high DPI images are roughly 4 times as big as their low DPI counterparts (at twice the pixel density, an image has twice the width and twice the height, hence four times the pixels), so low DPI devices would be getting unnecessarily big images for nothing at all: web bloat!

Serving different assets according to the client’s DPI was not a trivial task a few years ago, which means that the web is probably filled with pages that still serve high DPI assets by default to all client browsers. But now that HTML5 is widely adopted we can make good use of srcset to do just that. To each their own: srcset lists several versions of the same image and lets each browser pick the most appropriate one. In image-heavy sites such as retail stores this is an excellent tool to optimize average page size and save a good deal of bandwidth, which means saving money. Smaller images also take less time to load, so customers will also see product images faster than before.
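
As an illustration, here is a small Ruby sketch that builds an img tag with a srcset attribute using density descriptors (1x, 2x). The helper name, the "@2x" file naming convention and the paths are assumptions made for the example, not what our project actually uses, and the alt text is assumed to be trusted since the sketch does no HTML escaping.

```ruby
# Hypothetical helper: builds an <img> tag offering one file per pixel
# density. Assumes files are named like "product@1x.jpg", "product@2x.jpg".
def img_with_srcset(base, ext, alt:, densities: [1, 2])
  # Each srcset candidate is "URL descriptor", e.g. "/img/a@2x.jpg 2x".
  candidates = densities.map { |density| "#{base}@#{density}x.#{ext} #{density}x" }
  srcset = candidates.join(", ")
  # src is the fallback for browsers that do not understand srcset.
  fallback = "#{base}@#{densities.first}x.#{ext}"
  %(<img src="#{fallback}" srcset="#{srcset}" alt="#{alt}">)
end

puts img_with_srcset("/images/product", "jpg", alt: "Product photo")
# prints <img src="/images/product@1x.jpg" srcset="/images/product@1x.jpg 1x, /images/product@2x.jpg 2x" alt="Product photo">
```

A low DPI browser downloads only the 1x file, while a high DPI one picks the 2x file, so nobody pays for pixels they cannot display.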

This very simple change allowed us to decrease page sizes in one of our projects by over 50% in some of its most-accessed endpoints, with an overall average page size reduction of 25% for low DPI customers. Considering that some of the pages were 4 or 5MB big, halving those sizes was a great improvement for our customers, even more so considering that some of them might access our site on low-quality mobile networks, which can be excruciatingly slow sometimes. Considering the proportion of low DPI customers we have on an average day, this improvement saved our client some 7.5% of bandwidth.

Now that we’ve got some hindsight, it seems glaringly obvious that we should have been using this feature all along. But more often than not, extremely simple optimizations such as the one we described are overlooked by less experienced teams or worse — deemed “not important” by management because customers nowadays supposedly can spare a few megabytes per page (that may be so, but they don’t want to!).

We think that bloated web pages hurt everyone involved: web developers, customers and businesses. We strive to achieve what we think is good quality web code: that which delivers optimized, slim web pages to all clients.

By Leonardo Brito on January 14, 2019.

Exported from Medium on May 1, 2019.

10 ways not to do a big deploy

2018-09-03T00:00:00Z

Ideally, deploys should be small, concise, easily revertible, fast and with a small or nil footprint on the database. However, no matter how awesome you are, sometimes that is just unattainable and you end up needing to deploy something that is just the opposite: big, messy, hard to revert, painfully slow and rubbing the DB the wrong way. If the deploy messes with a mission-critical part of your software, all the worse for you.

But there are actually many ways you can make those situations even worse. Here are a few bullet points you can follow to guarantee a nightmarish deploy complete with nasty side-effects that will haunt you and your coworkers for days to come.

1. Don’t make a plan

Plans suck. They take time and effort, and don’t add any new features to your software. Planning a deploy requires thinking carefully about what it should do and, more importantly, what it shouldn’t do (but potentially could). A good deploy plan is a step-by-step happy path that is written clearly and concisely, followed by a list of everything nasty that can happen. Making a deploy plan is basically trying to cover as many blind spots as you can before pulling the trigger. But, of course, you and your team are code ninjas or master software crafters or whatever the hippest term is nowadays, and you don’t need a plan! Just wing it. Press the button and solve every problem that might arise in an ad-hoc fashion. What could go wrong?

2. Don’t schedule downtime

Downtime sucks: it usually happens at odd hours, late in the night or early in the morning, when customers are fast asleep (and you would very much like to be as well). Why bother blocking public access and redirecting customers to a nice “scheduled maintenance” page? Why gift you and your team with peace of mind and a clear timeframe to work with if you can feel the rush of breaking stuff in production with live customers? Production debugging is the best kind of debugging! Confuse your customers with inconsistent states and leave them waiting while your team tries to fix those bugs that were definitively fixed last Friday night.

3. Don’t have a great log system

Logs are for buggy software; you won’t need them. Why spend time and possibly money on a great logging-as-a-service (LaaS) platform? Just have your whole team ssh into production and watch the log tails. Or, even better, use a terrible LaaS that is slow, unreliable and has a confusing user interface so everyone can get frustrated trying to find errors during the deploy.

4. Don’t have a bug tracker

See above: just like logs, bug trackers are also lame. Your awesome PR won’t have any bugs, now, will it? Regressions never happen under your watch. Also, who needs to track exceptions with a great, fast, reliable bug tracking platform when you have logs available? Aren’t you hacker enough to grep every single exception that might be raised?

5. Don’t have a staging server

Staging servers are a waste of resources, both time and money. What is the point of having a close-to-exact copy of your production servers, which by this point are radically different from your development environment? Sure, containerization already kind of abstracts many of those differences, but (hopefully) you have network settings, 3rd-party APIs and other stuff that isn’t the same in development, even with containers. So be bold and make the leap from development right to production!

6. Don’t check your env vars

Your project only has like 80 different access tokens, API keys, DB credentials and cache store credentials spread over half a dozen YAMLs. Super easy to keep track of and super hard to mess up with your production, development and (hopefully) staging environments. Don’t triple-check the variables that might have been changed in the deploy, and you’ll secure a few hours of painful debugging in the near future.

7. Don’t guarantee data consistency post-deploy

In a previous step you were already told to make sure that customers can keep using your software mid-deploy, so we’re halfway there to guaranteeing poor data consistency. Make sure you haven’t mapped out all the points where your new code might touch the DB, particularly the DB structure itself. If anything goes wrong, just revert the commit and roll back; don’t ever worry about data becoming orphaned or inconsistent.

8. Don’t prepare for a late rollback

If everything else fails… wait, it won’t! Some problems can surface during the deploy, sure, but we won’t need to roll back after it is done, right? Right? After everything is settled, and you made a plan (which you totally shouldn’t, remember?) and followed it step-by-step, and all went well, you shouldn’t need to roll back. But let’s say it happens, and a few hours (or days) after the deploy you need to go back to the previous commit/tag/whatever you use. New data will have flowed in, which might need to be manually converted back to something manageable by the previous version of your software. Don’t think about it, don’t plan for it: it isn’t likely to happen. And if it does, you will have a heck of a time working on oddball and edge cases late in the night. What is not to love?

9. Don’t communicate efficiently with your team

You already know you should have terrible log and error tracking systems. Add insult to injury and don’t talk to your coworkers in a quick, direct and clear way. Long pauses are great for dramatic effect, especially when your coworkers are waiting for a timely answer. Be vague about what you’re doing. Hit the rollback button and “forget” to tell people about it. In general, just be as confusing and unavailable as possible.

Following all of the points above might lead to a “perfect storm” situation, and making sure you don’t follow them will surely make things easier on you and your team. But even if you have great deploy practices in place, sometimes things just fall apart. There will always be blind spots, and it is in their nature to be more or less unpredictable. That is just the way things are with software development. Which leads us to our 10th and final point in this guide to terrible deploys:

10. Don’t be patient and understanding with your coworkers if everything falls apart!

By Leonardo Brito on September 3, 2018.

Exported from Medium on May 1, 2019.

Working remotely in a non-remote company

2018-06-12T00:00:00Z

We’re a small team here at Guava, and we’ve always considered ourselves remote friendly. Most of us work remotely every now and then, pushed by varied force majeure situations, be it the flu, the need to supervise renovation or construction work at home, flash floods near the office, receiving guests at home or any number of other situations. We’ve also had a few of us working remotely for a few days or weeks while traveling to or back from a conference, or when visiting relatives that live out of town. In other words, remote working has always been a very temporary and circumstantial thing among us.

We have a nice office (with hammocks!), excellent work equipment, great desk space, comfortable chairs, plenty of snacks and comfort food and an infinite supply of coffee. We’re also easygoing and overall pleasant people (well, most of us are) to work with several hours a day, and some of us are even mildly funny.

I bid adieu to my coworkers, the coffee machine, the nice desk and the hammocks and traveled abroad for half a year to try out being a remote worker (some prefer the term digital nomad; to me, it seems a bit preposterous to compare month-long stays in modern urban dwellings with electricity and wireless internet with the traditional nomadic lifestyle). A few weeks before leaving, I read the interesting Remote: Office Not Required, which I warmly recommend to anyone considering working remotely. Some of the challenges I faced during my time as a remote worker were foretold by the book, while others were a complete surprise. Here are a few of the things I learned firsthand about remote work:

It takes time to adjust.

Your mind takes some time to adjust to working remotely. In many ways, working remotely feels like a completely new job — even if you’ve been in the same company and position for years.

Some people have more trouble with this than others, but everyone will take some time to adjust. The important lesson here — for worker and employer — is to have patience. Steep as it might be, the learning curve of adapting to remoteness will eventually plateau out.

It is easier when you’re well acquainted with your team.

Starting a new project — be it a new job or just a new assignment involving different team members — may be daunting. Starting remote work already with good rapport with your coworkers helps tremendously, as you will feel more at ease to talk to people.

It is important to feel comfortable enough to let your team know right away if something is going wrong, for example, as opposed to keeping it to yourself and suffering silently. Good rapport between developers also means it will be easier to understand each other when discussing technical problems.

It is easier when you’re not the only remote person.

Being “the remote guy” is a thing. People tend to forget people they don’t see every day, and you have to be comfortable with the low profile that comes with working remotely.

Having other remote workers in your team helps a bit, creating that nice “we’re all in the same boat” feeling.

You need to be able to communicate very well.

A huge part of working as a developer is being able to communicate well with other developers and with normal human beings. A programming genius that isn’t able to explain what he’s doing to his non-genius co-workers will likely not be a very good developer overall.

Language is one of man’s great achievements as a species, and it carries the weight and complexity of thousands of years of mutation. Expressing yourself verbally is hard enough, but we also have a myriad of non-verbal communication cues that we unconsciously rely on to communicate with one another, which you won’t have as a remote worker (at least most of the time).

Sure, you can occasionally call the HQ and video-chat with someone. But that is just not practical enough most of the time. As a remote worker, I find myself heavily relying on written communication with the rest of the team.

Every challenge brings a chance to learn something. Because of the challenges and limitations of working remotely, the experience helps you grow professionally in some ways:

Guidance from more experienced coworkers or bosses is much rarer, which forces you to exercise self-teaching and pro-activity.
The need to communicate more often through asynchronous text-first chats helps you develop language skills (and patience).
Working away from the office and your coworkers makes you appreciate them more when eventually returning to the HQ.

Of course, there are many other benefits that come to mind when thinking about remote work, such as increased productivity and financial savings, and there are already studies and books that got those covered. Which is not to say that remote work is some kind of panacea: it has fundamental disadvantages when compared with traditional work inside a brick-and-mortar office building, the most obvious and important being the lack of human contact with your fellow workers. The solitude and heavy reliance on written, asynchronous communication that often comes with remote work might not be your cup of tea.

Remote work is neither a universal solution nor something completely out of reach for the average developer. Although it won’t be to everyone’s taste, it is definitely available to everyone (or should be). This semester abroad taught me that it is plainly possible and viable for a developer in a small, non-remote (but remote-friendly) software company to work far away from the HQ, and both sides have a lot to gain from the experience. It is really a win-win scenario, and people should try it more often.

By Leonardo Brito on June 12, 2018.

Exported from Medium on May 1, 2019.
