RSLABBERT.COM

A build tool for the rest of us

2021-04-13

Imagine a hypothetical company with a codebase of anywhere between 0 and 10 million lines of code. This codebase spans a number of different languages such as Go/Java/Node.js servers, JS webapps, and Python data pipelines. Is there an existing tool we can use to provide a unified build system? Maybe, but it's probably not worth the effort to implement. Can we do better?

When we talk about build tools, the conversation generally elicits one of three archetypes:

Given the above, our hypothetical company most likely wants the 3rd option, but these existing tools are fairly complex and have a number of shortcomings for companies that don't need to build billions of lines of code.

Language-specific tool integration

Bazel was initially designed heavily around Google's core languages: C/C++, Java, Python, and pre-npm JavaScript. The thing all four of these technologies had in common (at the time the tool was written) was a lack of good language-centric build tools. Coupled with a preference for vendoring libraries, this design gets stretched thin when we look at how it handles languages such as Rust.

If we acknowledge that most modern development uses a build tool that is already designed for the language in question, it would make sense for our hypothetical polyglot tool to natively support these language-centric tools. With regard to Rust, instead of trying to translate a Cargo.toml file into a generated Bazel BUILD file, our build tool could understand what a Rust project looks like, especially as it relates to how the native tools already handle dependency management.
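One way to picture "understanding what a project looks like" is to recognise a project by the manifest file its native tool already uses. The snippet below is only a minimal sketch: the `MARKERS` table and the `detect_native_tool` function are hypothetical names made up for illustration, and a real tool would need to handle far more cases (workspaces, multiple manifests in one directory, and so on).

```python
from pathlib import Path
from typing import Optional

# Hypothetical mapping from a manifest file to the native tool that owns it.
MARKERS = {
    "Cargo.toml": "cargo",
    "go.mod": "go",
    "package.json": "npm",
    "requirements.txt": "pip",
}


def detect_native_tool(project_dir: Path) -> Optional[str]:
    """Return the native tool for a directory, or None if no manifest is found."""
    for marker, tool in MARKERS.items():
        if (project_dir / marker).is_file():
            return tool
    return None
```

With something like this in place, the tool could decide to delegate to Cargo for a directory containing a Cargo.toml without the developer writing any build file at all.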

For example, consider the following project structure:

src/
    internal/sdk/mylib/Cargo.toml (depends on crates.io libraries)
    team_a/services/myexe/Cargo.toml (depends on mylib)

We should be able to just run ourtool run team_a/services/myexe (in addition to build and test) and it should "just work". That's to say, it should fetch the crates.io libraries for mylib, build mylib, build myexe, and execute it. This is a dramatic departure from the current build tool migration process which generally involves extensive reworking of build scripts and dependency management.
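The "build mylib, then build myexe" step boils down to ordering the in-repo projects so that dependencies come first. Here is a rough sketch of that ordering using Python's standard `graphlib` module. The `LOCAL_DEPS` dictionary is a hand-written stand-in for what a real tool would read out of the Cargo.toml files, and `build_order` is a hypothetical helper name; fetching the crates.io libraries is assumed to be left to Cargo itself.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical in-repo dependency graph: each project maps to the local
# projects it depends on. A real tool would derive this from the manifests.
LOCAL_DEPS = {
    "team_a/services/myexe": ["internal/sdk/mylib"],
    "internal/sdk/mylib": [],
}


def build_order(target: str, deps: dict[str, list[str]]) -> list[str]:
    """Return the target and its transitive local dependencies, dependencies first."""
    needed: dict[str, list[str]] = {}
    stack = [target]
    while stack:
        project = stack.pop()
        if project not in needed:
            needed[project] = deps.get(project, [])
            stack.extend(needed[project])
    # TopologicalSorter expects node -> predecessors and yields predecessors first.
    return list(TopologicalSorter(needed).static_order())


print(build_order("team_a/services/myexe", LOCAL_DEPS))
# → ['internal/sdk/mylib', 'team_a/services/myexe']
```

Only the projects reachable from the requested target are included, so running a single service would not force a build of the whole repository.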

Pants v2 is looking very promising in this space. It currently only supports Python, but already tries to integrate natively with what Python development is today. For example, you can automatically run any Python project without first having to create a build file. In addition, adding third-party dependencies is as simple as adding a requirements.txt file and having Pants push that into the virtual environment (via some Pex magic) for the Python app we're trying to execute. I'm keeping an eye on this to see how it handles the next two languages they seem to be targeting (namely Rust and Go).

Editor / IDE / Language Server Protocol Integration

A follow-on from the previous point is that most development smarts (IntelliSense, refactoring, etc.) rely on the language-centric tools. In the simplest case, if our JavaScript project uses a real package.json file with a corresponding node_modules/.yarn directory (optimally created and managed by our tool), our editor will natively pick up types and libraries.

There are more complicated cases which are also generally solvable. Python, for example, needs a virtual environment or Pex file created, and code generated by tools such as gRPC needs to be written to the same filesystem as the source code (not to a hermetic directory) so that the editor picks it up. Our tool should probably manage .gitignore to ensure these files aren't committed.
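Managing .gitignore for generated files could be as simple as appending any missing entries. The following is a minimal sketch under that assumption; `ignore_generated` is a hypothetical helper, and it only does exact line matching rather than understanding gitignore glob patterns.

```python
from pathlib import Path


def ignore_generated(repo_root: Path, generated: list[str]) -> list[str]:
    """Append missing paths to .gitignore and return the ones that were added."""
    gitignore = repo_root / ".gitignore"
    existing = gitignore.read_text().splitlines() if gitignore.exists() else []
    known = {line.strip() for line in existing}
    # dict.fromkeys drops duplicates in the input while keeping its order.
    added = [path for path in dict.fromkeys(generated) if path not in known]
    if added:
        gitignore.write_text("\n".join(existing + added) + "\n")
    return added
```

Because entries that are already present are skipped, the tool could call this after every code generation step without the file growing on each run.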

Make it easy to integrate new tools

While having our tool natively support language-centric build systems works in some environments (e.g. Rust or Go), there are others where it works slightly less well. JavaScript web development, for example, doesn't have a standardised build tool. Teams might use Webpack, Rollup, Parcel, Snowpack, Vite, Create React App, esbuild, and so on. Trying to support each of these tools individually is a recipe for development failure.

The optimal way to approach this is probably two-fold:

For example, if we wanted to integrate Create React App natively into our build tool, for projects already using npm it should be as simple as:

npm_bin(
    target = "build",
    script = "react-scripts build",
    out = ["./build"]
)

This should be all that's needed for ourtool build ./a/path to "just work".
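To make the idea a little more concrete, here is one way a rule like npm_bin might be backed on the tool's side. This is purely a sketch: the `Target` dataclass and the `TARGETS` registry are hypothetical, and running the script through `npm exec` is an assumption about how the tool would locate the project's installed binaries.

```python
import shlex
from dataclasses import dataclass, field


@dataclass
class Target:
    name: str
    command: list[str]
    outputs: list[str] = field(default_factory=list)


# Hypothetical registry of the targets declared by a project's build file.
TARGETS: dict[str, Target] = {}


def npm_bin(target: str, script: str, out: list[str]) -> Target:
    """Register a target that runs a locally installed npm binary."""
    # Assumption: `npm exec --` resolves the binary from the project's node_modules.
    command = ["npm", "exec", "--", *shlex.split(script)]
    TARGETS[target] = Target(name=target, command=command, outputs=list(out))
    return TARGETS[target]


npm_bin(target="build", script="react-scripts build", out=["./build"])
print(TARGETS["build"].command)
# → ['npm', 'exec', '--', 'react-scripts', 'build']
```

The declared outputs are what the tool would later hash or cache, which is why the rule asks for them up front rather than trying to guess what the script wrote.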

It should be better for single-language projects

Even if we use only a single language, our build tool should add enough on top of the native language tooling to justify it. There are a couple of examples that immediately jump out:

Fast / Correct - Choose 80%

If you're someone who worked on Bazel I hope you're not reading this post, because at this point you'll be thinking "all of these features will break invariants and make other features such as remote caching more difficult/impossible". The thing is, I agree. Fundamentally, though, there is most likely an 80% version of these invariants that will enable most of these features while not requiring the same amount of specificity.

Additionally, there's probably a disable_smarts = true flag that can be conditionally or globally set in situations where an organisation finds the smarts are causing significant enough performance issues. Then again, for these organisations, Bazel already exists.

There are a couple of things we definitely want to keep, however:

Conclusion

What is missing here, and why does this tool not exist? There are a lot of gaps or issues that should immediately jump out to different people. Some of these issues might disqualify a tool like this from existing entirely, but I suspect that most of them can be worked around as long as we're willing to accept that we might not be able to perfectly guarantee certain things (such as pure hermeticism). Is that not still an improvement for teams where Bazel is too complex?