* Clean up stale .gitignore and .gitattribute patterns
@ 2023-06-23 15:29 Sebastian Schuberth
2023-06-23 17:26 ` Junio C Hamano
2023-06-24 1:12 ` Jeff King
0 siblings, 2 replies; 9+ messages in thread
From: Sebastian Schuberth @ 2023-06-23 15:29 UTC (permalink / raw)
To: Git Mailing List
Hi,
Is there a command to easily check whether the patterns in .gitignore and
.gitattributes still match anything? I'd like to remove / correct
patterns that no longer match anything due to (re)moved files.
--
Sebastian Schuberth
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Clean up stale .gitignore and .gitattribute patterns
2023-06-23 15:29 Clean up stale .gitignore and .gitattribute patterns Sebastian Schuberth
@ 2023-06-23 17:26 ` Junio C Hamano
2023-06-24 1:16 ` Jeff King
2023-06-24 1:12 ` Jeff King
1 sibling, 1 reply; 9+ messages in thread
From: Junio C Hamano @ 2023-06-23 17:26 UTC (permalink / raw)
To: Sebastian Schuberth; +Cc: Git Mailing List
Sebastian Schuberth <sschuberth@gmail.com> writes:
> Is there a command to easily check whether the patterns in .gitignore and
> .gitattributes still match anything? I'd like to remove / correct
> patterns that no longer match anything due to (re)moved files.
I guess "git check-attr --stdin" and "git check-ignore --stdin" will
be part of the solution to your problem, but I do not know what the
other parts would be.
Feeding "ls-files" output to "check-ignore --stdin" feels like
something of an oxymoron, because by definition the output from
"ls-files" cannot contain any ignored paths.
* Re: Clean up stale .gitignore and .gitattribute patterns
2023-06-23 15:29 Clean up stale .gitignore and .gitattribute patterns Sebastian Schuberth
2023-06-23 17:26 ` Junio C Hamano
@ 2023-06-24 1:12 ` Jeff King
2023-06-26 7:51 ` Sebastian Schuberth
1 sibling, 1 reply; 9+ messages in thread
From: Jeff King @ 2023-06-24 1:12 UTC (permalink / raw)
To: Sebastian Schuberth; +Cc: Git Mailing List
On Fri, Jun 23, 2023 at 05:29:42PM +0200, Sebastian Schuberth wrote:
> Is there a command to easily check whether the patterns in .gitignore and
> .gitattributes still match anything? I'd like to remove / correct
> patterns that no longer match anything due to (re)moved files.
I don't think there's a solution that matches "easily", but you can do a
bit with some scripting. See below.
For checking .gitignore, I don't think you can ever say (at the git
level) that a certain pattern is useless, because it is inherently about
matching things that are not tracked, and hence generated elsewhere. So if
you have a "*.foo" pattern, you can check if it matches anything
_currently_ in your working tree, but if it doesn't, that may mean that
you simply did not trigger the build rule that makes the garbage ".foo"
file.
So with that caveat, we can ask Git which rules _do_ have a match, and
then eliminate them as "definitely useful", and print the others. The
logic is sufficiently tricky that I turned to perl:
-- >8 show-unmatched-ignore.pl 8< --
#!/usr/bin/perl

use strict;
use warnings;

# The general idea here is to read "filename:linenr ..." output from
# "check-ignore -v". For each filename we learn about, we'll load the
# complete set of lines into an array and then "cross them off" as
# check-ignore tells us they were used.
#
# Note that we'd fail to mention an ignore file which matches nothing.
# Probably the list of filenames could be generated independently. I'll
# leave that as an exercise for the reader.
my %files;
while (<>) {
	my ($fn, $nr) = /^(.*?):(\d+):/
		or die "puzzling input: $_";
	if (!defined $files{$fn}) {
		$files{$fn} = do {
			open(my $fh, '<', $fn)
				or die "unable to open $fn: $!";
			[<$fh>]
		};
	}
	# check-ignore reports 1-based line numbers; the array is 0-based.
	$files{$fn}->[$nr - 1] = undef;
}

# With that done, whatever is left is unmatched. Print them.
for my $fn (sort keys(%files)) {
	my $lines = $files{$fn};
	for my $nr (1..@$lines) {
		my $line = $lines->[$nr-1];
		print "$fn:$nr $line" if defined $line;
	}
}
-- >8 --
And you'd use it something like:
git ls-files -o |
git check-ignore --stdin -v |
perl show-unmatched-ignore.pl
Pretty clunky, but it works OK in git.git (and shows that there are many
"not matched but probably still useful" entries; e.g., "*.dll" will
never match for me on Linux, but is probably something we still want to
keep). So I wouldn't use it as an automated tool, but it might give a
starting point for a human looking to clean things up manually.
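The script's comment leaves "an ignore file which matches nothing" as an
exercise for the reader. One possible shell sketch, with the same caveat
that it only reflects what is currently in the working tree. It assumes
it is run from the top level of the work tree, that only tracked
.gitignore files are of interest (not .git/info/exclude or
core.excludesFile), and that no path contains a colon. The function name
"unused_ignore_files" is made up for illustration:

```sh
# Hypothetical helper, not part of Git: print tracked .gitignore files
# that supplied no match at all for the untracked files currently in
# the working tree. Run it from the top level of the work tree; paths
# containing ":" would confuse the "cut" below.
unused_ignore_files() {
	matched=$(mktemp) && all=$(mktemp) || return
	# Ignore files (the "source" field of check-ignore -v) that matched
	# at least one untracked path.
	git ls-files -o | git check-ignore --stdin -v |
		cut -d: -f1 | sort -u >"$matched"
	# Every tracked .gitignore; in a default pathspec "*" also matches "/".
	git ls-files '.gitignore' '*/.gitignore' | sort >"$all"
	# Keep only the lines unique to $all, i.e. files that never matched.
	comm -23 "$all" "$matched"
	rm -f "$matched" "$all"
}
```

As with the Perl script, a listed file is only a candidate: it may simply
belong to a build step that has not been run in this working tree.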
For attributes, I think the situation is better; we only need them to
match tracked files (though technically speaking, you may want to keep
attributes around for historical files as we use the checked-out
attributes during "git log", etc). Unfortunately we don't have an
equivalent of "-v" for check-attr. It might be possible to add that, but
in the meantime, the best I could come up with is to munge each pattern
to add a sentinel attribute, and see if it matches anything.
Something like:
# Maybe also pipe in .git/info/attributes and core.attributesFile
# if you want to check those.
git ls-files '.gitattributes' '**/.gitattributes' |
while IFS= read -r fn; do
	lines=$(wc -l <"$fn")
	mv "$fn" "$fn.orig"
	nr=1
	while test $nr -le $lines; do
		# Append the sentinel attribute to line $nr only.
		sed "${nr}s/$/ is-matched/" <"$fn.orig" >"$fn"
		git ls-files | git check-attr --stdin is-matched |
		grep -q "is-matched: set" ||
		echo "$fn:$nr $(sed -n "${nr}p" "$fn.orig")"
		nr=$((nr+1))
	done
	mv "$fn.orig" "$fn"
done
It produces no output in git.git (we are using all of our attributes),
but you can add a useless one like:
echo '*.c -diff' >>Documentation/.gitattributes
and then the loop yields:
Documentation/.gitattributes:2 *.c -diff
So I definitely wouldn't call any of that "easy", but it may help you.
-Peff
* Re: Clean up stale .gitignore and .gitattribute patterns
2023-06-23 17:26 ` Junio C Hamano
@ 2023-06-24 1:16 ` Jeff King
0 siblings, 0 replies; 9+ messages in thread
From: Jeff King @ 2023-06-24 1:16 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Sebastian Schuberth, Git Mailing List
On Fri, Jun 23, 2023 at 10:26:23AM -0700, Junio C Hamano wrote:
> Sebastian Schuberth <sschuberth@gmail.com> writes:
>
> > Is there a command to easily check whether the patterns in .gitignore and
> > .gitattributes still match anything? I'd like to remove / correct
> > patterns that no longer match anything due to (re)moved files.
>
> I guess "git check-attr --stdin" and "git check-ignore --stdin" will
> be part of the solution to your problem, but I do not know what the
> other parts would be.
>
> Feeding "ls-files" output to "check-ignore --stdin" feels like
> something of an oxymoron, because by definition the output from
> "ls-files" cannot contain any ignored paths.
You can feed "ls-files -o" (since without --exclude-standard it lists
every untracked file in the working tree), but note that this is
inherently incomplete. Any solution like this can only tell you which
ones are unused by what's in your current working tree, not what might
be possible if you ran "make foo" or whatever.
It can be wrong the other way, too. You might have "file.foo" sitting
around from a build last year (or even sightseeing an old commit), even
though support for building ".foo" is long gone from the code base.
So you'd really want to start with a fresh clone, then run any build
commands that might possibly put cruft in the working tree (if that's
even possible on a single platform), and then do your analysis (and see
my other mail in the thread for some hacky scripting there).
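The "fresh clone, then build, then analyze" procedure above could be
wrapped in a small shell function like the following sketch. The
function name is hypothetical, and the build step is whatever command
you pass in (for example "make"); nothing here knows how your project
is actually built:

```sh
# Hypothetical helper, not part of Git: clone a repository into a scratch
# directory, optionally run a build command there so generated cruft
# exists, then print check-ignore's "source:line:pattern<TAB>path" line
# for every untracked file that some ignore rule matches.
ignore_matches_in_fresh_clone() {
	fresh_src=$1
	shift
	fresh_tmp=$(mktemp -d) || return
	if ! git clone --quiet "$fresh_src" "$fresh_tmp/repo"; then
		rm -rf "$fresh_tmp"
		return 1
	fi
	(
		cd "$fresh_tmp/repo" || exit
		# Remaining arguments, if any, are the (assumed) build command.
		# A failing build may still leave partial cruft worth checking.
		if [ "$#" -gt 0 ]; then
			"$@" || echo "build command failed; checking what exists" >&2
		fi
		git ls-files -o | git check-ignore --stdin -v
	)
	rm -rf "$fresh_tmp"
}
```

For example, "ignore_matches_in_fresh_clone /path/to/repo make". Because
the scratch clone is deleted afterwards, any further analysis that needs
to open the ignore files themselves has to run inside the subshell.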
-Peff
* Re: Clean up stale .gitignore and .gitattribute patterns
2023-06-24 1:12 ` Jeff King
@ 2023-06-26 7:51 ` Sebastian Schuberth
2023-06-26 15:55 ` Junio C Hamano
0 siblings, 1 reply; 9+ messages in thread
From: Sebastian Schuberth @ 2023-06-26 7:51 UTC (permalink / raw)
To: Jeff King; +Cc: Git Mailing List
Thanks Peff for the suggestion. I ended up scripting something via
JGit [1], as we're already using it as part of our Gradle build system.
PS: As a future idea, it might be good if "git mv" gave a hint about
updating .gitattributes if files matching a .gitattributes pattern are
moved.
[1]: https://github.com/oss-review-toolkit/ort/pull/7195/commits/e01945d41012db2d0bc2e53d7be4abd513888ba6
--
Sebastian Schuberth
* Re: Clean up stale .gitignore and .gitattribute patterns
2023-06-26 7:51 ` Sebastian Schuberth
@ 2023-06-26 15:55 ` Junio C Hamano
2023-06-26 16:42 ` Sebastian Schuberth
0 siblings, 1 reply; 9+ messages in thread
From: Junio C Hamano @ 2023-06-26 15:55 UTC (permalink / raw)
To: Sebastian Schuberth; +Cc: Jeff King, Git Mailing List
Sebastian Schuberth <sschuberth@gmail.com> writes:
> Thanks Peff for the suggestion. I ended up scripting something via
> JGit [1], as we're already using it as part of our Gradle build system.
>
> PS: As a future idea, it might be good if "git mv" gave a hint about
> updating .gitattributes if files matching a .gitattributes pattern are
> moved.
Interesting. "git mv hello.jpg hello.jpeg" would suggest updating
a "*.jpg <list of attribute definitions>" line in the .gitattributes
to begin with "*.jpeg"?
* Re: Clean up stale .gitignore and .gitattribute patterns
2023-06-26 15:55 ` Junio C Hamano
@ 2023-06-26 16:42 ` Sebastian Schuberth
2023-06-27 6:51 ` Jeff King
0 siblings, 1 reply; 9+ messages in thread
From: Sebastian Schuberth @ 2023-06-26 16:42 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Jeff King, Git Mailing List
On Mon, Jun 26, 2023 at 5:55 PM Junio C Hamano <gitster@pobox.com> wrote:
> > PS: As a future idea, it might be good if "git mv" gave a hint about
> > updating .gitattributes if files matching a .gitattributes pattern are
> > moved.
>
> Interesting. "git mv hello.jpg hello.jpeg" would suggest updating
> a "*.jpg <list of attribute definitions>" line in the .gitattributes
> to begin with "*.jpeg"?
Yes, right. Or as a simpler variant to start with (as patterns might
match files in different directories, and not all of the matching
files might be moved), just say that a specific .gitattributes line
needs updating (or needs to be duplicated / generalized in case files
in both the old and new location match).
--
Sebastian Schuberth
* Re: Clean up stale .gitignore and .gitattribute patterns
2023-06-26 16:42 ` Sebastian Schuberth
@ 2023-06-27 6:51 ` Jeff King
2023-06-27 6:55 ` Sebastian Schuberth
0 siblings, 1 reply; 9+ messages in thread
From: Jeff King @ 2023-06-27 6:51 UTC (permalink / raw)
To: Sebastian Schuberth; +Cc: Junio C Hamano, Git Mailing List
On Mon, Jun 26, 2023 at 06:42:17PM +0200, Sebastian Schuberth wrote:
> On Mon, Jun 26, 2023 at 5:55 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> > > PS: As a future idea, it might be good if "git mv" gave a hint about
> > > updating .gitattributes if files matching a .gitattributes pattern are
> > > moved.
> >
> > Interesting. "git mv hello.jpg hello.jpeg" would suggest updating
> > a "*.jpg <list of attribute definitions>" line in the .gitattributes
> > to begin with "*.jpeg"?
>
> Yes, right. Or as a simpler variant to start with (as patterns might
> match files in different directories, and not all of the matching
> files might be moved), just say that a specific .gitattributes line
> needs updating (or needs to be duplicated / generalized in case files
> in both the old and new location match).
Yeah, I don't think we could ever do anything automated here; a human
needs to judge the intent and how the patterns should be adapted.
But perhaps something like:
1. When git-commit makes a new commit that removes paths (whether they
were totally removed, or renamed), find all gitattribute lines
whose patterns match those paths.
2. For each such pattern, see if it still matches anything in the
resulting tree.
3. If not, print advise() lines showing the file/line of the pattern
which is no longer used.
Doing so naively (by checking matches for each file in the tree) would
be a little expensive, but maybe OK in practice. It could perhaps be
done more efficiently with specialized code, but it might be tricky to
get right (and you still end up O(size of tree) in the worst case, because
something like "*.jpg" needs to be compared against every entry).
Of course on the way there you should end up with a decent tool for
"which patterns are not currently used?". And you could just
periodically run that manually if you want to clean up (or even from a
post-commit hook).
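As a rough starting point for the post-commit hook idea, here is a shell
sketch. It only approximates step 1: since check-attr has no "-v", it
lists the paths removed by HEAD together with the attributes they still
resolve to, rather than the .gitattributes lines responsible, and a
pattern behind a listed attribute may well still match other files. The
function name is hypothetical; you would call it from
.git/hooks/post-commit:

```sh
# Hypothetical post-commit helper, not part of Git. Prints
# "<path>: <attribute>: <info>" for each path removed by HEAD that still
# resolves to some attribute. diff-tree does no rename detection by
# default, so the old side of a rename shows up as a deletion too.
# Paths are assumed to need no quoting.
report_removed_paths_with_attrs() {
	git diff-tree --no-commit-id --name-only -r --diff-filter=D HEAD |
		git check-attr --stdin -a
}
```

A human would then decide, for example with the sentinel loop from the
earlier mail, whether the pattern behind each listed attribute is still
needed.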
Re-reading your email, though, I wonder if you meant something a little
simpler, like:
1. When a path is moved via git-mv, see if the attributes before/after
are the same.
2. If not, then mention which ones matched the old path via advise().
That is probably easier to write, though it does not help the "git rm"
case (where attributes may become obsolete).
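A minimal sketch of that simpler git-mv variant, written as shell
functions around "git mv" rather than as a change to Git itself. The
names "attrs_of" and "mv_with_attr_hint" are hypothetical, and the sketch
relies on check-attr's "-z" output format of
"<path> NUL <attribute> NUL <info> NUL":

```sh
# Hypothetical helpers, not part of Git.
attrs_of() {
	# Keep only "<attribute>: <info>" so that results for two different
	# paths can be compared; sort so the comparison ignores ordering.
	git check-attr -z -a -- "$1" | tr '\0' '\n' |
		awk 'NR % 3 == 2 { a = $0 } NR % 3 == 0 { print a ": " $0 }' |
		sort
}

mv_with_attr_hint() {
	# check-attr works on path names, so the old path can be checked
	# even though the file is about to disappear.
	before=$(attrs_of "$1")
	git mv "$1" "$2" || return
	after=$(attrs_of "$2")
	if [ "$before" != "$after" ]; then
		echo "hint: attributes of '$2' differ from those of '$1';" >&2
		echo "hint: a .gitattributes pattern may need updating." >&2
	fi
}
```

With a "*.jpg -diff" line in .gitattributes, "mv_with_attr_hint
hello.jpg hello.jpeg" moves the file and prints both hint lines, because
"hello.jpg" resolves to "diff: unset" while "hello.jpeg" has no
attributes at all.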
-Peff
* Re: Clean up stale .gitignore and .gitattribute patterns
2023-06-27 6:51 ` Jeff King
@ 2023-06-27 6:55 ` Sebastian Schuberth
0 siblings, 0 replies; 9+ messages in thread
From: Sebastian Schuberth @ 2023-06-27 6:55 UTC (permalink / raw)
To: Jeff King; +Cc: Junio C Hamano, Git Mailing List
On Tue, Jun 27, 2023 at 8:51 AM Jeff King <peff@peff.net> wrote:
> Re-reading your email, though, I wonder if you meant something a little
> simpler, like:
Indeed, I only had the "git mv" case in mind, and meant to advise() at
the time that command is run, instead of advise()'ing at "git
commit" time.
--
Sebastian Schuberth
end of thread
Thread overview: 9+ messages
2023-06-23 15:29 Clean up stale .gitignore and .gitattribute patterns Sebastian Schuberth
2023-06-23 17:26 ` Junio C Hamano
2023-06-24 1:16 ` Jeff King
2023-06-24 1:12 ` Jeff King
2023-06-26 7:51 ` Sebastian Schuberth
2023-06-26 15:55 ` Junio C Hamano
2023-06-26 16:42 ` Sebastian Schuberth
2023-06-27 6:51 ` Jeff King
2023-06-27 6:55 ` Sebastian Schuberth