
At my current job, while reorganizing our company's root git monorepo, we decided to adopt a .gitignore allowlist pattern.

I had previously experimented with this approach on raconn, but this monorepo is significantly larger. The complexity required a dedicated script to generate the .gitignore file effectively.

Why use an allowlist?

Pros:

  • Pure state: You know exactly what is being tracked and must explicitly write tracking patterns.
  • No unintended files: Prevents accidental commits of secrets, large binaries, or build artifacts.

Cons:

  • Noise: Spurious files aren't listed as untracked, making them harder to spot.
  • Maintenance: The .gitignore must be regenerated whenever a new project or file type needs to be tracked.

Git gotchas

By default, every line in a .gitignore file is a pattern to exclude. Files matching these patterns cannot be added to version control without forcing it (git add --force), and they do not appear as untracked in git status.

The Git documentation states the following regarding negation:

An optional prefix "!" which negates the pattern; any matching file excluded by a previous pattern will become included again. It is not possible to re-include a file if a parent directory of that file is excluded. Git doesn’t list excluded directories for performance reasons, so any patterns on contained files have no effect, no matter where they are defined.

Starting point

To build an allowlist, we start by ignoring everything with /**/* and then selectively adding file extensions. For a Rust project, a naive .gitignore might look like this:

# ignore everything
/**/*

!/.gitignore
!**/*.rs
!**/Cargo.toml

If your repository looks like this:

$ tree
.
├── .gitignore
└── crate-a
    β”œβ”€β”€ Cargo.toml
    └── src
        └── main.rs

Running git status will unexpectedly show only the .gitignore file:

$ git status
On branch main
Untracked files:
        .gitignore

This happens because /**/* also matches directories, and git does not descend into excluded directories for performance reasons, so the patterns for files inside them are never evaluated. To fix this, we must "unignore" all directories so git can traverse them to find allowed files:

# ignore everything
/**/*
# allow all directories
!**/
# allow specific files
!**/*.rs
...
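The difference between the two versions can be reproduced in a throwaway repository. This is a minimal sketch, assuming git and mktemp are installed; the layout mirrors the crate-a example, and the `untracked` helper is a hypothetical name introduced only for this illustration.

```shell
#!/usr/bin/env bash
set -o errexit -o nounset -o pipefail

# Build a scratch repository with the same layout as the example.
repo=$(mktemp -d)
git init --quiet "$repo"
mkdir -p "$repo/crate-a/src"
touch "$repo/crate-a/Cargo.toml" "$repo/crate-a/src/main.rs"

# Hypothetical helper: untracked files that git is willing to show.
untracked() { git -C "$repo" ls-files --others --exclude-standard; }

# Naive allowlist: directories stay ignored, so git never descends into them.
printf '%s\n' '/**/*' '!/.gitignore' '!**/*.rs' '!**/Cargo.toml' > "$repo/.gitignore"
naive=$(untracked)

# Same allowlist with directories re-included right after the catch-all.
printf '%s\n' '/**/*' '!**/' '!/.gitignore' '!**/*.rs' '!**/Cargo.toml' > "$repo/.gitignore"
fixed=$(untracked)

printf 'naive:\n%s\n\nfixed:\n%s\n' "$naive" "$fixed"
rm -rf "$repo"
```

The first listing should contain only .gitignore, as described above, while the second should also include the files under crate-a.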

It is best to explicitly tell git not to look into commonly ignored directories (like target/ or node_modules/), as they might contain allowed file extensions that we don't want tracked.
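As a rough illustration of why this matters, the sketch below puts a .rs file inside a target/ directory and checks that it stays hidden once target/ is denied again after the "allow all directories" rule. It assumes git and mktemp are available; the file names (generated.rs in particular) are made up for the demonstration.

```shell
#!/usr/bin/env bash
set -o errexit -o nounset -o pipefail

repo=$(mktemp -d)
git init --quiet "$repo"
mkdir -p "$repo/crate-a/src" "$repo/crate-a/target/debug"
# generated.rs is a hypothetical build artifact with an allowed extension.
touch "$repo/crate-a/src/main.rs" "$repo/crate-a/target/debug/generated.rs"

# Deny target/ after re-including directories; git then never looks inside it,
# so the later "!**/*.rs" rule cannot re-include anything under target/.
printf '%s\n' '/**/*' '!**/' '/crate-a/target' '!**/*.rs' > "$repo/.gitignore"

visible=$(git -C "$repo" ls-files --others --exclude-standard)
printf 'untracked and visible:\n%s\n' "$visible"
rm -rf "$repo"
```

Only the file under src/ should be listed; removing the /crate-a/target line would let the artifact show up as well.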

A script is born

Managing this manually is complex, especially when mixing multiple projects and programming languages in a single monorepo. To alleviate some of the pain, I created a generation script.

I used some fun bash idioms to keep it concise.

println() { printf "%s\n" "$@"; } prints each of its parameters on its own line.

"${@/#/param}" inside a function expands to "paramA" "paramB" "paramC" when the function is called as func A B C.

Some git commands were a great help while searching for missing patterns that should be allowed. git ls-files --others --ignored --exclude-standard lists all files that are currently ignored; combined with a fresh clone of the repo, this makes it easier to spot files that are tracked but ignored.
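A complementary check, not part of the workflow described here and offered only as a sketch: git ls-files --cached --ignored --exclude-standard lists files that are already in the index yet match an ignore pattern. The scratch repository and the file names below are hypothetical, and git and mktemp are assumed to be installed.

```shell
#!/usr/bin/env bash
set -o errexit -o nounset -o pipefail

repo=$(mktemp -d)
git init --quiet "$repo"

# Stage two files before any allowlist exists (hypothetical file names).
touch "$repo/main.rs" "$repo/notes.txt"
git -C "$repo" add main.rs notes.txt

# An allowlist that forgets about *.txt.
printf '%s\n' '/**/*' '!**/' '!/.gitignore' '!**/*.rs' > "$repo/.gitignore"

# Tracked files that the current patterns would ignore.
missing=$(git -C "$repo" ls-files --cached --ignored --exclude-standard)
printf 'tracked but ignored:\n%s\n' "$missing"
rm -rf "$repo"
```

Any path printed here is a candidate for a new allow pattern, or for removal from the repository.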

With these helpers, we can go up the abstraction ladder: allow and deny come first, then files and exts are built on top of them to create language-specific allowlists.
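To make the layering concrete, here is a small trace of how the prefix substitution composes. The function bodies are the same as in the script; the arguments passed to them are only illustrative.

```shell
#!/usr/bin/env bash
set -o errexit -o nounset -o pipefail

# Print every argument on its own line.
println() { printf "%s\n" "$@"; }

# ${@/#/PREFIX}: "#" anchors an empty pattern at the start of each argument,
# so the substitution simply prepends PREFIX to every argument.
allow() { println "${@/#/!/}"; }

# files and exts prepend a base path (plus "*." for extensions), then defer to allow.
files() { local p=$1; shift; allow "${@/#/${p}/}"; }
exts()  { local p=$1; shift; allow "${@/#/${p}/*.}"; }

# Illustrative calls only.
lines=$(println one two)
by_name=$(files crate-a Cargo.toml Cargo.lock)
by_ext=$(exts "crate-a/src/**" rs)

printf '%s\n' "$lines" "$by_name" "$by_ext"
```

Each layer only adds a prefix, so files crate-a Cargo.toml Cargo.lock ends up as !/crate-a/Cargo.toml and !/crate-a/Cargo.lock, and the exts call yields !/crate-a/src/**/*.rs.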

Here is the script:

#!/usr/bin/env -S bash -o nounset -o pipefail -o errexit
# https://blog.izissise.net/posts/gitignoreallowlist/

# helpers
println() { printf "%s\n" "$@"; }
header()  {
    println \
        "# This file is generated by '$0' do NOT modify ($(date "+%Y-%m-%d"))" \
        "# Commit '$(git rev-list --max-count=1 HEAD)'"
}
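denyallfiles() { println "/**/*"; }      # ignore everything
allowalldirs() { println "!**/"; }       # re-include directories so git can traverse them
allow()        { println "${@/#/!/}"; }  # prefix every argument with "!/"
deny()         { println "${@/#//}"; }   # prefix every argument with "/"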

files() { # allow files, usage: files BASE_PATH [FILES]...
    local p=$1; shift 1;
    allow "${@/#/${p}/}"
}

exts() { # allow extensions, usage: exts BASE_PATH [EXTS]...
    local p=$1; shift 1;
    allow "${@/#/${p}/*.}"
}

css()    { for p in "${@}"; do exts "$p" css scss coffee less; done; }
fonts()  { for p in "${@}"; do exts "$p" ttf woff woff2 otf eot; done; }
shell()  { for p in "${@}"; do exts "$p" sh bash; done; }
python() {
    for p in "${@}"; do
        deny "${p}/**/.venv/" "${p}/**/venv/"
        files "$p" requirements.txt
        exts "${p}/**" py
    done
}
golang() {
    for p in "${@}"; do
        files "$p" go.mod go.sum
        exts "${p}/**" go
    done
}
rust()   {
    for p in "${@}"; do
        deny "${p}/target"
        exts "${p}/src/**" rs
        exts "${p}/examples/**" rs
        exts "${p}/tests/**" rs
        files "$p" \
            build.rs Cargo.toml Cargo.lock \
            rustfmt.toml clippy.toml .cargo/config.toml
    done
}

##############################
##############################

header
denyallfiles
allowalldirs

# root
allow \
    .editorconfig .gitignore .mailmap \
    "**/README.md" "**/.keep" "**/LICENSE"

rust crate-a
shell "**"

To update the allowlist, simply run: ./gengitignore.sh > .gitignore
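Because the generated header embeds the date and the commit hash, a plain diff against the committed file will always differ. One possible way to detect a stale or hand-edited .gitignore is to compare only the pattern lines. This is a sketch under that assumption: strip_comments, is_up_to_date and demo_generator are hypothetical names, and demo_generator stands in for ./gengitignore.sh.

```shell
#!/usr/bin/env bash
set -o errexit -o nounset -o pipefail

# Drop comment lines; "|| true" keeps an all-comment input from failing the pipeline.
strip_comments() { grep -v '^#' "$@" || true; }

# usage: is_up_to_date GITIGNORE GENERATOR_COMMAND...
# Succeeds when the generator output and the file agree on every pattern line.
is_up_to_date() {
    local file=$1; shift
    diff <("$@" | strip_comments) <(strip_comments "$file") > /dev/null
}

# Hypothetical stand-in for the real generator script.
demo_generator() { printf '%s\n' "# generated on $(date "+%Y-%m-%d")" '/**/*' '!**/'; }

tmp=$(mktemp)
printf '%s\n' '# generated on 1970-01-01' '/**/*' '!**/' > "$tmp"
if is_up_to_date "$tmp" demo_generator; then fresh=yes; else fresh=no; fi

printf '%s\n' '!**/*.rs' >> "$tmp"   # simulate a manual edit
if is_up_to_date "$tmp" demo_generator; then edited=yes; else edited=no; fi

echo "fresh=$fresh edited=$edited"
rm -f "$tmp"
```

In a real repository the call would look like is_up_to_date .gitignore ./gengitignore.sh, for instance from a CI job or a pre-commit hook.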

This example generates the following:
# This should work well for crate-a
/**/*
!**/
!/.editorconfig
!/.gitignore
!/.mailmap
!/**/README.md
!/**/.keep
!/**/LICENSE
/crate-a/target
!/crate-a/src/**/*.rs
!/crate-a/examples/**/*.rs
!/crate-a/tests/**/*.rs
!/crate-a/build.rs
!/crate-a/Cargo.toml
!/crate-a/Cargo.lock
!/crate-a/rustfmt.toml
!/crate-a/clippy.toml
!/crate-a/.cargo/config.toml
!/**/*.sh
!/**/*.bash

That’s all for now!