In the software industry, it is widely accepted that an out-of-source build is superior to an in-source build. However, even though mature build systems like CMake and Meson support out-of-source builds out of the box, many projects are still built in-source.
Many agree that in-source builds are more straightforward to set up, while out-of-source builds are “cleaner” and easier to manage. Some projects are never switched because the current setup is sufficient and an out-of-source build is considered a nice-to-have feature.
The situation is similar in code generation. With in-source code generation, the generated code is tracked in the source tree. For example, we generate API code from IDL files in backend projects. With out-of-source code generation, the generated code stays out of the source tree. The macro systems supported by modern languages like Rust are an example of this.
In light of the evolving DevOps landscape, it is essential to re-evaluate the pros and cons of in/out-of-source builds.
Garbage files don’t work well with monorepos
More and more projects, especially large ones, use a monorepo to organize their code. In-source build and code generation systems save additional files in the repository. The build cache is not tracked by the source control system, but the generated code is. This puts more pressure on the source control system.
Ignored files hurt source control performance
To get the status of the repository, the source control system has to stat every file in it. Having more files means that each time you run the “git status” command, Git has more work to do. While compiling, the cache files are updated often, so the source control system has to invalidate its internal cache, which makes the command even slower. For small projects, it doesn’t take long to get the repository status. In a monorepo, however, the performance hit is worse: there are so many files that the cost of invalidating the cache is high. Git also has to read the applicable “.gitignore” files and match every untracked path against their glob-style patterns to determine whether there are new files to track. In fact, in a big monorepo, more than 60% of the “git status” time is spent checking ignored files.
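To get a rough feel for how many ignored entries Git has to deal with in your own repository, you can tally the output of “git status --porcelain --ignored”, where ignored entries are prefixed with “!!” and untracked ones with “??”. The following is only a minimal sketch; the function names are made up for illustration, and it counts entries rather than measuring time.

```python
import subprocess
from collections import Counter


def count_status_entries(porcelain_output: str) -> Counter:
    """Tally porcelain (v1) status lines by their two-character status code."""
    counts = Counter()
    for line in porcelain_output.splitlines():
        # A valid line is "XY <path>": two status characters, a space, a path.
        if len(line) < 4:
            continue
        counts[line[:2]] += 1
    return counts


def ignored_entry_count(repo_path: str = ".") -> int:
    """Ask Git for ignored entries in repo_path and count them.

    Note: Git may report a fully ignored directory as a single entry,
    so this is a lower bound on the number of ignored files.
    """
    result = subprocess.run(
        ["git", "-C", repo_path, "status", "--porcelain", "--ignored"],
        check=True,
        capture_output=True,
        text=True,
    )
    return count_status_entries(result.stdout)["!!"]
```

Calling ignored_entry_count(".") from the repository root before and after a full build gives a quick impression of how much an in-source build adds to the pile.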
In-source generated code makes code review impossible
Generated code is noise during a code review. Nobody reviews code that was generated. In a small project, there are usually only a few lines of generated code, which the reviewer can easily filter out. In a monorepo, the generated code is stored in its respective module, scattered across the repository. This makes it very difficult to filter out during a code review. In that case, most reviewers would simply give up and say “LGTM”.
Generated code in a monorepo is not reviewable.
Hidden ignored files eat up disk space over time
A monorepo is made up of more than one project. Developers may work on one project at a time and switch to another project later. However, the old project’s build caches are not cleaned up. As time goes on, more and more stale build caches pile up in the repository and take up a lot of storage space. The in-source build cache directories are ignored by the source control system and spread all over the repository, so there is no obvious way to know the total size of the build caches. In an out-of-source build project, you can see the size of the build directory in the file browser.
Garbage files don’t play well with predictive coding helpers
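If you are stuck with in-source builds for now, a small script can at least make the hidden caches visible. The sketch below walks a repository and sums the size of every directory whose name looks like a build cache. The list of directory names is an assumption for illustration; adjust it to whatever your build tools actually produce.

```python
import os

# Hypothetical cache directory names; adapt to your own build tools.
CACHE_DIR_NAMES = {"build", "target", "node_modules", "__pycache__"}


def dir_size(path: str) -> int:
    """Return the total size in bytes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


def find_cache_dirs(repo_root: str, names=CACHE_DIR_NAMES) -> dict:
    """Map each cache directory found under repo_root to its size in bytes."""
    sizes = {}
    for root, dirs, _files in os.walk(repo_root):
        # Never descend into Git's own metadata.
        if ".git" in dirs:
            dirs.remove(".git")
        for name in list(dirs):
            if name in names:
                full_path = os.path.join(root, name)
                sizes[full_path] = dir_size(full_path)
                # Already measured; do not walk into the cache directory again.
                dirs.remove(name)
    return sizes
```

Sorting the result by size, for example with sorted(find_cache_dirs(".").items(), key=lambda item: item[1], reverse=True), shows which forgotten projects are eating the most disk space.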
In recent years, the popularity of coding tools that use predictive models has grown quickly. Even if you don’t use a predictive model-based coding helper, other people will use it.
In-source build caches and generated code pollute the context
You should not edit the caches or generated code manually. Changing these files in the wrong way will probably cause the build to fail and will lead to hard-to-find bugs. But your predictive model-based coding helpers don’t know that! They will treat build caches and generated code as regular source files, adding them to the current context. If the models have a shorter context length, the input is cut off, and useful information may be lost. Even on models with a long context, the output becomes less accurate as the context gets longer.
In-source builds increase the background noise level.
In-source build caches and generated code make searching slower and reduce the efficiency of coding helpers
Code search is important for both developers and coding helpers. Developers depend on code search when refactoring. Coding helpers use code search to better understand the code base and improve the quality of the code they generate. Build caches and generated code in the source tree make searching slower and add noise to the search results. In the worst case, coding helpers may add code to the wrong files.
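To illustrate the kind of filtering a coding helper would need before it builds its context, here is a minimal sketch that drops files living in cache directories or carrying a “generated” marker near the top. Both the directory names and the marker strings are assumptions chosen for illustration (“DO NOT EDIT” and “@generated” are common conventions, not a standard), and real tools may work quite differently.

```python
# Hypothetical cache directory names; adapt to your own build tools.
CACHE_DIR_NAMES = {"build", "target", "node_modules", "__pycache__"}

# Assumed marker strings that generators commonly put in a file header.
GENERATED_MARKERS = ("DO NOT EDIT", "@generated")


def in_cache_dir(path: str) -> bool:
    """True if any parent directory of path is a known cache directory."""
    parts = path.replace("\\", "/").split("/")
    return any(part in CACHE_DIR_NAMES for part in parts[:-1])


def looks_generated(text: str, head_lines: int = 5) -> bool:
    """True if one of the first head_lines lines contains a generated marker."""
    head = text.splitlines()[:head_lines]
    return any(marker in line for line in head for marker in GENERATED_MARKERS)


def select_context_files(files: dict) -> list:
    """Given a path -> contents mapping, return the sorted paths worth keeping."""
    return sorted(
        path
        for path, text in files.items()
        if not in_cache_dir(path) and not looks_generated(text)
    )
```

With an out-of-source setup none of this guessing is necessary, because the caches and generated files are simply not in the tree that the helper indexes.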