From: Sebastian Schuberth <sschuberth@gmail.com>
To: Jeff King <peff@peff.net>
Cc: Git Mailing List <git@vger.kernel.org>
Subject: Re: Clean up stale .gitignore and .gitattribute patterns
Date: Mon, 26 Jun 2023 09:51:27 +0200
Message-ID: <CAHGBnuPO63Hi8mfA+MkAGES-gs0eNCDPG2FcPZT=YsnVzKd30A@mail.gmail.com>
In-Reply-To: <20230624011234.GA95358@coredump.intra.peff.net>

Thanks, Peff, for the suggestion. I ended up scripting something via
JGit [1], as we're using it anyway as part of our Gradle build system.

PS: As a future idea, it might be good if "git mv" gave a hint about
updating .gitattributes when files matching a .gitattributes pattern
are moved.

[1]: https://github.com/oss-review-toolkit/ort/pull/7195/commits/e01945d41012db2d0bc2e53d7be4abd513888ba6

--
Sebastian Schuberth

On Sat, Jun 24, 2023 at 3:12 AM Jeff King <peff@peff.net> wrote:
>
> On Fri, Jun 23, 2023 at 05:29:42PM +0200, Sebastian Schuberth wrote:
>
> > is there a command to easily check patterns in .gitignore and
> > .gitattributes to still match something? I'd like to remove / correct
> > patterns that don't match anything anymore due to (re)moved files.
>
> I don't think there's a solution that matches "easily", but you can do a
> bit with some scripting. See below.
>
> For checking .gitignore, I don't think you can ever say (at the git
> level) that a certain pattern is useless, because it is inherently about
> matching things that are not tracked, and hence generated elsewhere. So if
> you have a "*.foo" pattern, you can check if it matches anything
> _currently_ in your working tree, but if it doesn't that may mean that
> you simply did not trigger the build rule that makes the garbage ".foo"
> file.
>
> So with that caveat, we can ask Git which rules _do_ have a match, and
> then eliminate them as "definitely useful", and print the others. The
> logic is sufficiently tricky that I turned to perl:
>
> -- >8 show-unmatched-ignore.pl 8< --
> #!/usr/bin/perl
>
> # The general idea here is to read "filename:linenr ..." output from
> # "check-ignore -v". For each filename we learn about, we'll load the
> # complete set of lines into an array and then "cross them off" as
> # check-ignore tells us they were used.
> #
> # Note that we'd fail to mention an ignore file which matches nothing.
> # Probably the list of filenames could be generated independently. I'll
> # leave that as an exercise for the reader.
> while (<>) {
>     /^(.*?):(\d+):/
>         or die "puzzling input: $_";
>     if (!defined $files{$1}) {
>         $files{$1} = do {
>             open(my $fh, '<', $1)
>                 or die "unable to open $1: $!";
>             [<$fh>]
>         };
>     }
>     # check-ignore reports 1-based line numbers; the array is 0-based.
>     $files{$1}->[$2 - 1] = undef;
> }
>
> # With that done, whatever is left is unmatched. Print them.
> for my $fn (sort keys(%files)) {
>     my $lines = $files{$fn};
>     for my $nr (1..@$lines) {
>         my $line = $lines->[$nr-1];
>         print "$fn:$nr $line" if defined $line;
>     }
> }
> -- >8 --
>
> And you'd use it something like:
>
> git ls-files -o |
> git check-ignore --stdin -v |
> perl show-unmatched-ignore.pl
>
> Pretty clunky, but it works OK in git.git (and shows that there are many
> "not matched but probably still useful" entries; e.g., "*.dll" will
> never match for me on Linux, but is probably something we still want to
> keep). So I wouldn't use it as an automated tool, but it might give a
> starting point for a human looking to clean things up manually.
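
For anyone who would rather not reach for Perl, the cross-off step can
also be sketched with awk. This is only a rough sketch under
assumptions: the sample .gitignore and the canned "check-ignore -v"
lines below are made up for illustration, and it handles a single
ignore file whose name contains no colon.

```shell
#!/bin/sh
# Hypothetical sample data in a scratch directory (not from a real repo).
tmp=$(mktemp -d)
printf '%s\n' '*.o' '*.dll' 'build/' >"$tmp/.gitignore"

# "check-ignore -v" prints "<source>:<linenr>:<pattern><TAB><path>".
# These two lines are canned stand-ins for its real output.
used=$(printf '.gitignore:1:*.o\tfoo.o\n.gitignore:3:build/\tbuild/out.txt\n')

# Remember every "source:linenr" pair that was reported as used, then
# walk the ignore file and print the lines whose pair was never seen.
unmatched=$(cd "$tmp" && printf '%s\n' "$used" |
    awk -F: -v file=".gitignore" '
        { seen[$1 ":" $2] = 1 }
        END {
            nr = 0
            while ((getline line < file) > 0) {
                nr++
                key = file ":" nr
                if (!(key in seen))
                    print key " " line
            }
        }')

echo "$unmatched"    # prints ".gitignore:2 *.dll"
rm -rf "$tmp"
```

In a real run, "$used" would instead be filled from
"git ls-files -o | git check-ignore --stdin -v", and the same caveat
applies: an unmatched pattern is a candidate for review, not proof that
it is useless.
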
>
> For attributes, I think the situation is better; we only need them to
> match tracked files (though technically speaking, you may want to keep
> attributes around for historical files as we use the checked-out
> attributes during "git log", etc). Unfortunately we don't have an
> equivalent of "-v" for check-attr. It might be possible to add that, but
> in the meantime, the best I could come up with is to munge each pattern
> to add a sentinel attribute, and see if it matches anything.
>
> Something like:
>
> # Maybe also pipe in .git/info/attributes and core.attributesFile
> # if you want to check those.
> git ls-files '.gitattributes' '**/.gitattributes' |
> while read -r fn; do
>     lines=$(wc -l <"$fn")
>     mv "$fn" "$fn.orig"
>     nr=1
>     while test $nr -le $lines; do
>         sed "${nr}s/$/ is-matched/" <"$fn.orig" >"$fn"
>         git ls-files | git check-attr --stdin is-matched |
>         grep -q "is-matched: set" ||
>         echo "$fn:$nr $(sed -n "${nr}p" "$fn.orig")"
>         nr=$((nr+1))
>     done
>     mv "$fn.orig" "$fn"
> done
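
To make the munging step concrete, here is a small sketch of what the
sed call does to one line of an attributes file. The file name and its
two patterns are made-up sample data, and nothing here touches a real
repository; it only shows the rewrite that the loop performs for a
single value of "nr".

```shell
#!/bin/sh
# Hypothetical sample attributes file in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' '*.sh eol=lf' '*.png binary' >"$tmp/attrs.orig"

# Append the sentinel attribute to line 2 only, as the loop does when
# nr=2. Every other line passes through unchanged.
nr=2
munged=$(sed "${nr}s/$/ is-matched/" <"$tmp/attrs.orig")

echo "$munged"
# prints:
#   *.sh eol=lf
#   *.png binary is-matched
rm -rf "$tmp"
```

With the sentinel attached to exactly one pattern, any tracked path
that check-attr reports as "is-matched: set" must have matched that
pattern, which is why the loop can test the patterns one at a time.
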
>
> It produces no output in git.git (we are using all of our attributes),
> but you can add a useless one like:
>
> echo '*.c -diff' >>Documentation/.gitattributes
>
> and then the loop yields:
>
> Documentation/.gitattributes:2 *.c -diff
>
> So I definitely wouldn't call any of that "easy", but it may help you.
>
> -Peff