I love monorepos! Or at least I love Meta's (Facebook's) monorepo, which is the only real monorepo I have ever worked with. Here is why I think it’s great.
Easy access to all code
Meta's monorepo contains most of the company's code. Any developer working at Meta has access to it. We can search it, read it, and check the commit history. We also can, and frequently do, modify code code managed by other teams.
This easy access to all the code is great for the developer's productivity. Engineers can understand their dependencies deeper, debug issues across the entire stack, and implement features or bug fixes regardless of who manages the code. All this is available at their fingertips. They can hit the ground quickly without talking to other teams, reading their out-of-date wikis, and spending time figuring out how to clone and build their code.
Linear commit history
Meta's monorepo does not use branches, so the commit history is linear. The linear commiit history saves engineers from having to reverse engineer a London Tube Map-like merge history to determine if a given commit's snapshot contains their changes. With linear commit history, answering this question boils down to comparing commit times.
No versioning
Versioning is one of the most complex problems when working with multiple repos. Each repo is independent, and teams are free to decide which versions of dependencies they want to adopt. However, because each repo evolves at its own pace, different repos will inevitably end up with different versions of the same package. These inconsistencies lead to situations where a project may contain more than one version of the same dependency, but no single version works for everyone.
I experienced this firsthand during my time at Amazon. I was working on the Alexa app, which consisted of tens of packages, each pulling in at least a few dependencies. It was a versioning hell: conflicts were common, and resolving them was difficult. For example - one package used an older dependency because a newer version contained a bug. Another package, however, required the latest version because older versions lacked the needed features.
A monorepo solves versioning issues in a simple way: there is no versioning. All code is built together, so each package or project has only one version for a given commit.
Atomic commits
Monorepos allow atomic cross-project commits. Developers can rename classes or change function signatures without breaking code or tests. They just need to fix all the code affected by their change in the same commit.
Doing the same is impossible in a multi-repo environment. Introducing breaking changes is either safe but slow (as it requires multiple commits for a proper migration) or fast but at the expense of broken builds.
This problem plagued the ASP.NET Core project in its early days (ProjectK anyone?). The team was working on getting abstractions right, so the foundational interfaces constantly changed. Many packages (each in its repo) implemented or used these interfaces. Whenever they changed, most repos stopped compiling and needed fixes.
Build
Builds in monorepos are conceptually simple: all code in the repo is built at a given commit.
This approach makes it possible to quickly tell what's included in the build and create bundles where all build artifacts match.
While the idea is simple, building the entire monorepo becomes increasingly challenging as the repository grows. Compiling big monorepos, like Meta's, in a reasonable time is impossible without specialized build tools and massive infra.
Multiple repos make creating a list of matching packages surprisingly hard. I learned this when working on ASP.NET Core. The framework initially consisted of a couple of dozen of repos. Our build servers were constantly grinding because of what we called "build waves." A build wave was initiated by a single commit that triggered a build. When this build finished, it triggered builds in repos depending on it. This process continued until all repos were built. Not only was this process slow and fragile, but with a steady stream of commits across all the repos, producing a list of matching packages was difficult.
The ASP.NET Core team eventually consolidated all the code in a single repository adopting the monorepo approach. This change happened after I left the team, but I believe the challenges behind getting fast and consistent builds were an important reason.
What are the problems with monorepos?
If monorepos are so great, why isn't everyone using them? There are several reasons.
Scale
Scale poses the biggest challenge for monorepos. Meta's repository is counted in terabytes and receives thousands of commits each day. Detecting conflicts and ensuring that all changes are merged correctly and don't break the build without hurting developers' productivity is tough. As most off-the-shelf tools cannot handle this scale, Meta has many dedicated teams that maintain the build infrastructure. Sometimes, they need to go to great lengths to do their job. Here is an example:
Back in 2013, tooling teams ran a simulation that showed that in a few years, basic git commands would take 45 minutes to execute if the repo continued to grow at the rate it did. It was unacceptable, so Facebook engineers turned to Git folks to solve this problem. At that time, Git was uninterested in modifying their SCM (Source Code Management) to support such a big repo. The Mercurial (hg) team, however, was more receptive. With significant contributions from Facebook, it rearchitected Mercurial to meet Facebook's requirements. This is why Meta (a.k.a. Facebook) uses Mercurial (hg) as its source control.
Granular project permissions
Monorepos make accessing any code in the repository easy, which is great for developers' productivity. However, companies often have sensitive code only selected developers should be able to access. This requirement goes against the idea of the monorepo, which aims to make all code easily accessible. As a result, enforcing access to code in a monorepo is problematic. Creating separate repos for sensitive projects is also not ideal, especially if these projects use the common infrastructure the monorepo provides for free.
Release management
A common strategy to maintain multiple releases is to create a branch for each release. Follow-ups (e.g., bug fixes) can be merged to these branches without bringing unrelated changes that could destabilize the release. This strategy won't work in monorepos with a linear history.
I must admit that I don't know how teams that ship their products publicly handle their releases. Our team owns a few services we deploy to production frequently. If we find an issue, we roll back our deployment and fix the bug forward.
A single commit can break the build
Because for monorepos, the entire codebase is built at a given commit, merging a mistake that causes compilation errors will break the build. These situations happen despite the tooling that is supposed to prevent them. In practice, this is only rarely a problem. Developers are only affected if the project that doesn't compile is a dependency. And even then, they can workaround the problem by working off of an older commit until the breakage is fixed.
If you found this useful, please share it with a friend and consider subscribing if you haven’t already.
Thanks for reading!
-Pawel