Michael Hirn

Oppty, Codemod

January 15th, 2024

Foreword: If you are reading this, thank you for taking the time. My goal here is to inform and make it worth your while. If you feel generous, please send me your critical feedback and evaluation (rank/rate) of this opportunity via email, WhatsApp, or book a slot on my calendar.

As some of you may know, I began this year with the goal of finding an under-appreciated, under-explored niche that I could develop to benefit others, while capturing some of the created value through a capital-efficient business. You may have also heard me mention something-something-codemod, and some of you may wonder how this is useful to anyone and whether the world really needs another developer tool.

Of course, codemod, like many other technologies, is what Kenneth Stanley would call a stepping stone, i.e. a means to something useful. For example, vacuum tubes were little more than a nerdy curiosity before Tommy Flowers used them as the defining stepping stone to build Colossus, one of the first programmable electronic computers, and help the Allied forces win WWII. In this sense, codemods, like vacuum tubes once were, are largely a nerdy curiosity, and it is fair to ask why anyone should care.

When Max talked to me about his experience with the codemod space a few weeks ago, I immediately felt that this was an important puzzle piece for a few problems I had encountered. So below I will explain how I think the world would change for the better if we look at codemod through the right lens.

The Problem

Our story is about two characters and their tragic relationship: API vendors and API buyers. Our API vendors are companies and engineers that create libraries, SDKs, APIs, frameworks, toolkits, and anything that is consumed via code by other companies and engineers to build other tools or applications. Those other companies and engineers are our API buyers.

While API vendors now generate $500B in revenue across verticals such as AI (OpenAI, Mistral), Finance (Stripe, Adyen), Cloud (AWS, Cloudflare), DevTooling, Crypto, and OSS, it seems that we are just getting started. A few trends point in that direction.

  1. Today, API buyers are other tech businesses, but the 10x larger market of non-tech businesses is just coming online and is growing fast.
  2. Anyone who is already an API buyer keeps buying from more API vendors. Many small-to-medium projects now have 10-20 critical, external API vendors.
  3. API vendors enter new verticals with native, fast-growing solutions (e.g. Headless everything - CRM, CMS, online shops, ...).

In short, the API ecosystem is booming! But what makes the API vendor-buyer relationship so strong and symbiotic, the deep integration into the code of the buyer, is also what makes it so tragic. Enter the villain of our story: switching costs.

For API buyers the problem starts when the honeymoon phase ends, usually 3-12 months after the API vendor is integrated. In slightly more than 50% of the cases, the buyer realizes that a sub-optimal vendor choice has been made. This is not because buyers run a poor selection and procurement process. The problem is that no selection process can account for markets changing, needs changing, unknowns becoming known, and sometimes even vendors themselves changing - all these things are inherently unknowable up front.

Now this wouldn't be so bad if the API vendor decision could be easily reverted, but it can't. In fact, switching is so costly that the capital outlay is not economical in 4/5 cases, and the sub-optimal choice is accepted with technical debt accruing. API buyers are so mindful of this phenomenon that they created a term for it: vendor lock-in. You want to switch, but the switching costs are so high that you are effectively locked into the sub-optimal choice forever. API buyers also created a term for the only solution they know: vendor-agnostic, which refers to products that abstract over multiple vendors behind one interface. This barely solves the problem though, as you now have even more abstractions and overhead to deal with, which slows you down even in the case where you never end up switching vendors.

For API vendors, the problem varies strongly. For incumbent vendors, switching costs are not a problem but a godsend, for they create a mini-monopoly that allows them to economically extort their consumers. For challenger vendors, switching costs are the opposite of a godsend: they are an existential threat. Challengers have to strike an impossible balance between reducing the switching cost for potential buyers by staying as close as possible to the legacy interfaces, and increasing the value through step-change innovation so that buyers even consider switching. For these vendors, switching costs create long, poorly-converting sales cycles for the most meaningful type of accounts - those that already spend large amounts on a solution and see value in switching.

The larger implication is this: Innovation, progress, and diversity are stifled by switching costs and all of us are worse off. The mission statement here would be something like this: Empower API vendors and buyers with tools to create healthy, interdependent relationships for a growing, innovating, and diverse API ecosystem.

An interesting space where some of the variables are extreme is the contemporary AI space. Currently, AI API vendors become sub-optimal the moment they are integrated as every day new, ground-breaking solutions come on the market. Possibly a good beachhead.

The Solution

In a world without solutions, only tradeoffs, good automation comes as close to a solution as anything I know. And this is where we find our hero: codemod, a.k.a. automated source code modification. A codemod is a program that takes source code and returns a modified version of that source code. It is at the heart of the solution to the above switching problem because, for API buyers, the root cost comes from the fact that the required source code changes are complex and need engineers to make them. US companies spent $6-24B last year on those activities [1]. Codemod promises that we can reliably automate that for any codebase and any API vendor and make switching 10-100x cheaper for API buyers.
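To make the definition concrete, here is a minimal sketch of a codemod in Python, using only the standard library's ast module. It is an illustration under stated assumptions, not the tool I am building: the names old_vendor, new_vendor, complete, and generate are hypothetical placeholders, and a real migration would also have to map arguments, imports, and response handling.

```python
import ast


class RenameVendorCall(ast.NodeTransformer):
    """Rewrite `old_vendor.complete(...)` into `new_vendor.generate(...)`.

    All four names are hypothetical placeholders, not real vendor APIs.
    """

    def visit_Call(self, node: ast.Call) -> ast.AST:
        self.generic_visit(node)  # rewrite nested calls first
        func = node.func
        if (
            isinstance(func, ast.Attribute)
            and func.attr == "complete"
            and isinstance(func.value, ast.Name)
            and func.value.id == "old_vendor"
        ):
            # Swap only the callee; arguments are kept unchanged.
            node.func = ast.Attribute(
                value=ast.Name(id="new_vendor", ctx=ast.Load()),
                attr="generate",
                ctx=ast.Load(),
            )
        return node


def codemod(source: str) -> str:
    """Take source code, return a modified version of that source code."""
    tree = RenameVendorCall().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)  # requires Python 3.9+


before = (
    'reply = old_vendor.complete(prompt="hi", max_tokens=16)\n'
    'label = "old_vendor.complete"\n'
)
after = codemod(before)
print(after)
```

Because the rewrite works on the syntax tree, the call is migrated while the string literal that merely mentions the old name is left alone. One known limitation of this sketch: ast.unparse regenerates the code, so comments and original formatting are lost, which is one reason production codemod tools work on richer syntax trees.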

In the next section, I will talk more about how codemod works and its state today. For now, I want to point out that codemod as I mean it here is not the same thing as recent AI & LLMs. Codemods for our situation need to be reliable and verifiable, ultimately to give API buyers and vendors a great experience and confidence in our solution. I think AI will be an ingredient for codemod (as I will explain below), but not in the way some people may think. For now, I just want you to know that I am not talking about training or prompting an LLM to rewrite the code, as this naive approach is not nearly reliable enough.

Reducing the switching cost problem for API buyers by 10-100x is for all intents and purposes a new capability and will likely have several second-order consequences, many non-obvious. However, there are two obvious ones for our API vendors that I want to draw attention to.

With a tool that dramatically reduces switching costs between vendor solutions, API vendors can grow faster (in $ terms) and cheaper (in CAC terms). While switching costs made sales processes slow and poorly converting, vendors can now offer a self-service solution that allows even the largest clients to get a working implementation PR up in hours and, thanks to CI, see the impact on core metrics before buying. Even under modest assumptions, a Series A API vendor who invests $50k into a solution to reduce switching costs for potential customers breaks even within six months [2]. Also, once vendor switching can be automated, there is less of a reason for API vendors to stay within the restriction of having to support and stay close to legacy interfaces, freeing them up to innovate around new workflows.

My current hunch is that the product is a tool that API vendors implement on their website to let potential customers get an implementation PR up within minutes. Currently, API vendors point to their docs or sales, and have little direct control over activating leads. Solving the switching cost on the vendor side allows for a more scalable and frankly more attractive business model, as you are increasing their top line, compared to the buyer's side, where it would be about reducing their costs.

API vendor example. Potential customers have to pay their own switching costs, which limits the influence and control API vendors have over their activation rates and CACs.

Why now

I want to address two fair and valid objections. One, why has this not been done before? And two, why can it be done now? Specifically, there is one possibility that would answer both: automatically editing arbitrarily complex code to migrate it from one vendor solution to another is simply not possible. I will present some evidence later as to why I think this is probably not true, along with an explanation of why it has not been done before and what has changed now.

To start, I want to give you a brief overview of the people and projects that shaped the space and continue to do so today. We are all standing on the shoulders of giants, and this may be even a little bit more true in the codemod space. You will likely recognize some of the names, and others you may want to learn from - lots of great talent in that space.


Note: I want to flag that the below section likely suffers from survivorship bias. Git (2005) and then GitHub (2008) make it vastly easier to find more recent tools, and other tools that never made the switch to git/GitHub are simply much harder to find. So I do apologize if I am doing injustice to earlier tools. Please ping me if you think the below section deserves any additions or corrections.

Conception (early 70s - late 90s): Codemod's origins, like many great computer concepts, can be traced back to the Bell Labs of the 70s. Codemod-related tools and ideas can be found in Lee McMahon's sed (1974) and Stephen Johnson's lint (1978). As far as I can gather, many emerging languages of the 80s and 90s like Java, Perl, or Python had some home-grown tools that would perform code inspection or transformation-related functions that are reminiscent of automatically modifying source code. However, as many predated the internet and certainly modern version control systems, it is hard to find really good sources here. This is complicated by the fact that the term "refactoring", a key use-case for codemod, was not formalized until William Opdyke's 1992 PhD thesis.

Birth (early 00s - mid 10s): Leaving the swamps, one of the earliest and at the same time most economically successful tools/companies is JetBrains. While you may know them for their IDEs, it is less known that their first product, IntelliJ Renamer (2000), was a graphical tool for automated Java code modification and refactoring and one of the earliest codemod tools I could find. A year later, they included it in the first version of their IDE. In 2021, JetBrains generated $180m in profit on $450m in revenue, and it has not raised equity investment since its founding in 2000. Similarly, IBM released the Eclipse IDE (2001), which in 2003 received similar graphical refactoring capabilities. Both IntelliJ/JetBrains and IBM/Eclipse may have their roots in Rational Software, which employed the IntelliJ founders in the 90s and in 2002 was acquired by IBM.

Another influential tool, which popularized the word codemod itself, was written in 2007-2008 by Justin Rosenstein during his time at Facebook. His former boss and founding Facebook engineer, David Fetterman, released project codemod in Dec 2008. While possibly inspired by early refactoring tools from the Python ecosystem like rope (2006), codemod was much simpler and more accessible, as it was essentially "search & replace" via the command line, which likely helped the concept spread.
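To illustrate the "search & replace" idea (this is a sketch of the general technique, not the actual codemod tool or its interface), here is a regex-based rewrite in Python. The names old_vendor, new_vendor, complete, and generate are hypothetical placeholders.

```python
import re


def regex_codemod(source: str, pattern: str, replacement: str) -> tuple[str, int]:
    """Apply a regex search & replace and report how many matches were rewritten."""
    return re.subn(pattern, replacement, source)


source = (
    "reply = old_vendor.complete(prompt)\n"
    'label = "old_vendor.complete(x)"\n'
)
new_source, count = regex_codemod(
    source,
    r"\bold_vendor\.complete\(",  # hypothetical legacy call
    "new_vendor.generate(",       # hypothetical replacement call
)
print(count)
print(new_source)
```

The simplicity is the appeal: one pattern, one replacement, any language. The cost is that the regex has no notion of syntax, so in this input it rewrites both the real call and the string literal that only mentions it.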

Running completely counter to codemod was Semmle (2007), which was founded by Oxford Professor Oege de Moor and built one of the most sophisticated source code query languages, SemmleCode a.k.a. .QL. With a sophisticated query language (the opposite of Justin's regex-based search & replace), they could create sophisticated code modification programs. As I understand it, they commercialized that via various products and services and were acquired in 2019 by GitHub for an undisclosed amount.

Modern (mid 10s - now): Facebook appears to be very fertile ground for codemod, as it produced two other popular codemod tools: jscodeshift (2015, Felix Kling) and fastmod (2018, Scott Wolchok). Both improved on usability and performance, which made them widely used tools for writing powerful codemod solutions.

Another relevant company is semgrep (2017), which was started by three security engineers from Palantir and in 2020 brought on Yoann Padioleau from Facebook, who wrote another important project there called pfff (2010). Similar to Semmle, semgrep developed a sophisticated code query language to automate many code security-related tasks. To date they have raised close to $100m from Lightspeed, Redpoint, and Sequoia.

There are several other interesting (open-source) tools that I will just link to for now: Coccinelle (C), ast-grep (multi), OpenRewrite (Java), Grit (multi), mogglo (multi), comby (multi).


Ok, let's return to our question above about whether or not this automated vendor switching is possible and why now. Having spent about 50 hours researching and coding (a Rust CLI to migrate OpenAI 3.5 calls to the new Mistral API which is faster/cheaper/better) I found the following:

  1. SemmleCode (now GitHub CodeQL) and semgrep have shown that it is possible to have a language to write arbitrarily complex code transformations.
  2. Since then, two powerful ecosystems, tree-sitter (2018, GitHub) and LSP/LSIF (2016-2019, Microsoft), emerged and brought sophistication and standardization into the space, simplifying what Semmle and semgrep had to build from scratch over many years.
  3. Recent LLM tech is the missing piece to automate the construction of codemods for various vendors at scale. It is essentially codegen (automatic source code generation) for codemod (or codemod-gen). Having coded the CLI, and given my past experience in codegen and LLM-based codegen, I believe this solves the usability and production side of things.

What's next?

Almost everything in the above story is up for debate. Is switching cost really the root problem? Is codemod really the hero in that story? My belief is that the API vendor-buyer relationship is sub-optimal for almost everyone involved. And I have a hunch of why that is, how you can fix that, and what a better relationship would look like. But as mentioned in the beginning, my goal is to be useful to others (i.e. solve a real problem) and build a capital-efficient business, not to be right about my hunches and opinions. So at this stage, I am speaking to PMs, DevRels, and Sales at API vendors, and to Engineers and CXO/VP/PMs at API buyers, to learn more about their reality and problems. If you think you can help, please reach out or, even better, book a slot to chat.

As solving this problem and developing the space is a team effort, I want to say thank you to all the people who contributed to this already, and I am excited to see where we can take this.


[1]: There are ca. 28m engineers worldwide who, according to one study, spend about 1 day / week on RMO-related activity. At larger companies, this can go up to 2 days / week. For the US, which has ca. 4.4m engineers with an average comp of $140k, this equates to $120-$240B being spent on code RMO a year. In my experience, a conservative number for how much of that time is spent on core dependency and vendor overhaul would be 5-10%.
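The arithmetic behind footnote [1] can be reproduced with a few lines of Python. The inputs are the figures stated above; the 5-day work week used to turn "1 day / week" into a share of compensation is my assumption, and the variable names are only illustrative.

```python
US_ENGINEERS = 4.4e6       # ca. 4.4m engineers in the US (from the footnote)
AVG_COMP_USD = 140_000     # average comp (from the footnote)
WORK_DAYS_PER_WEEK = 5     # assumption: a 5-day work week

total_comp = US_ENGINEERS * AVG_COMP_USD

# 1 to 2 days per week spent on RMO-related activity.
rmo_low = total_comp * 1 / WORK_DAYS_PER_WEEK
rmo_high = total_comp * 2 / WORK_DAYS_PER_WEEK

# 5-10% of that time goes to core dependency and vendor overhaul.
vendor_low = rmo_low * 0.05
vendor_high = rmo_high * 0.10

for label, value in [
    ("RMO low", rmo_low),
    ("RMO high", rmo_high),
    ("Vendor overhaul low", vendor_low),
    ("Vendor overhaul high", vendor_high),
]:
    print(f"{label}: ${value / 1e9:.1f}B")
```

Under these inputs the RMO range comes out at roughly $123-246B, which rounds to the $120-$240B quoted above, and the vendor overhaul share at roughly $6-25B, in line with the $6-24B used in the main text.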

[2]: TODO: copy Excel math over