* Clean up stale .gitignore and .gitattribute patterns
From: Sebastian Schuberth @ 2023-06-23 15:29 UTC
To: Git Mailing List

Hi,

is there a command to easily check patterns in .gitignore and
.gitattributes to still match something? I'd like to remove / correct
patterns that don't match anything anymore due to (re)moved files.

--
Sebastian Schuberth

* Re: Clean up stale .gitignore and .gitattribute patterns
From: Junio C Hamano @ 2023-06-23 17:26 UTC
To: Sebastian Schuberth; +Cc: Git Mailing List

Sebastian Schuberth <sschuberth@gmail.com> writes:

> is there a command to easily check patterns in .gitignore and
> .gitattributes to still match something? I'd like to remove / correct
> patterns that don't match anything anymore due to (re)moved files.

I guess "git check-attr --stdin" and "git check-ignore --stdin" will
be part of the solution to your problem, but I do not know what the
other parts would be.

Feeding "ls-files" output to "check-ignore --stdin" feels sort-of
oxymoron because by definition the output from "ls-files" cannot
contain any ignored paths.

* Re: Clean up stale .gitignore and .gitattribute patterns
From: Jeff King @ 2023-06-24 1:16 UTC
To: Junio C Hamano; +Cc: Sebastian Schuberth, Git Mailing List

On Fri, Jun 23, 2023 at 10:26:23AM -0700, Junio C Hamano wrote:

> Sebastian Schuberth <sschuberth@gmail.com> writes:
>
> > is there a command to easily check patterns in .gitignore and
> > .gitattributes to still match something? I'd like to remove / correct
> > patterns that don't match anything anymore due to (re)moved files.
>
> I guess "git check-attr --stdin" and "git check-ignore --stdin" will
> be part of the solution to your problem, but I do not know what the
> other parts would be.
>
> Feeding "ls-files" output to "check-ignore --stdin" feels sort-of
> oxymoron because by definition the output from "ls-files" cannot
> contain any ignored paths.

You can feed "ls-files -o" (since without --exclude-standard it lists
every untracked file in the working tree), but note that this is
inherently incomplete. Any solution like this can only tell you which
ones are unused by what's in your current working tree, not what might
be possible if you ran "make foo" or whatever.

It can be wrong the other way, too. You might have "file.foo" sitting
around from a build last year (or even sightseeing an old commit), even
though support for building ".foo" is long gone from the code base.

So you'd really want to start with a fresh clone, then run any build
commands that might possibly put cruft in the working tree (if that's
even possible on a single platform), and then do your analysis (and see
my other mail in the thread for some hacky scripting there).

-Peff

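As a rough illustration of the analysis step described above, the sketch below counts how many untracked paths each ignore rule matched. It assumes the documented "check-ignore -v" output format, "<source>:<linenum>:<pattern><TAB><pathname>"; the helper name summarize_ignore_hits is made up for this example and is not part of Git. Rules that never appear in its output are only candidates for review, for the reasons given in the message.

```shell
# Summarize `git check-ignore -v` output read from stdin.
# Each input line is "<source>:<linenum>:<pattern><TAB><pathname>";
# the output is "<hits> <source>:<linenum>:<pattern>", one line per rule.
# (summarize_ignore_hits is a hypothetical helper name.)
summarize_ignore_hits() {
    awk -F '\t' '
        { hits[$1]++ }
        END { for (rule in hits) print hits[rule], rule }
    ' | LC_ALL=C sort -k2
}

# Intended use, from the top of a fresh clone after running the build:
#   git ls-files -o | git check-ignore --stdin -v | summarize_ignore_hits
```

Keeping the parsing separate from the Git calls means the helper can be tried on saved output from another platform, which matters for rules like "*.dll" that never match locally.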
* Re: Clean up stale .gitignore and .gitattribute patterns
From: Jeff King @ 2023-06-24 1:12 UTC
To: Sebastian Schuberth; +Cc: Git Mailing List

On Fri, Jun 23, 2023 at 05:29:42PM +0200, Sebastian Schuberth wrote:

> is there a command to easily check patterns in .gitignore and
> .gitattributes to still match something? I'd like to remove / correct
> patterns that don't match anything anymore due to (re)moved files.

I don't think there's a solution that matches "easily", but you can do a
bit with some scripting. See below.

For checking .gitignore, I don't think you can ever say (at the git
level) that a certain pattern is useless, because it is inherently about
matching things that are not tracked, and hence generated elsewhere. So
if you have a "*.foo" pattern, you can check if it matches anything
_currently_ in your working tree, but if it doesn't that may mean that
you simply did not trigger the build rule that makes the garbage ".foo"
file.

So with that caveat, we can ask Git which rules _do_ have a match, and
then eliminate them as "definitely useful", and print the others. The
logic is sufficiently tricky that I turned to perl:

-- >8 show-unmatched-ignore.pl 8< --
#!/usr/bin/perl
use strict;
use warnings;

# The general idea here is to read "filename:linenr ..." output from
# "check-ignore -v". For each filename we learn about, we'll load the
# complete set of lines into an array and then "cross them off" as
# check-ignore tells us they were used.
#
# Note that we'd fail to mention an ignore file which matches nothing.
# Probably the list of filenames could be generated independently. I'll
# leave that as an exercise for the reader.
my %files;
while (<>) {
  /^(.*?):(\d+):/
    or die "puzzling input: $_";
  if (!defined $files{$1}) {
    $files{$1} = do {
      open(my $fh, '<', $1)
        or die "unable to open $1: $!";
      [<$fh>]
    };
  }
  # check-ignore numbers lines from 1, but the array is 0-based.
  $files{$1}->[$2 - 1] = undef;
}

# With that done, whatever is left is unmatched. Print them.
for my $fn (sort keys(%files)) {
  my $lines = $files{$fn};
  for my $nr (1..@$lines) {
    my $line = $lines->[$nr-1];
    print "$fn:$nr $line" if defined $line;
  }
}
-- >8 --

And you'd use it something like:

  git ls-files -o |
  git check-ignore --stdin -v |
  perl show-unmatched-ignore.pl

Pretty clunky, but it works OK in git.git (and shows that there are many
"not matched but probably still useful" entries; e.g., "*.dll" will
never match for me on Linux, but is probably something we still want to
keep). So I wouldn't use it as an automated tool, but it might give a
starting point for a human looking to clean things up manually.

For attributes, I think the situation is better; we only need them to
match tracked files (though technically speaking, you may want to keep
attributes around for historical files as we use the checked-out
attributes during "git log", etc). Unfortunately we don't have an
equivalent of "-v" for check-attr. It might be possible to add that, but
in the meantime, the best I could come up with is to munge each pattern
to add a sentinel attribute, and see if it matches anything.

Something like:

  # Maybe also pipe in .git/info/attributes and core.attributesFile
  # if you want to check those.
  git ls-files '.gitattributes' '**/.gitattributes' |
  while read fn; do
    lines=$(wc -l <"$fn")
    mv "$fn" "$fn.orig"
    nr=1
    while test $nr -le $lines; do
      sed "${nr}s/$/ is-matched/" <"$fn.orig" >"$fn"
      git ls-files | git check-attr --stdin is-matched |
        grep -q "is-matched: set" ||
        echo "$fn:$nr $(sed -n ${nr}p "$fn.orig")"
      nr=$((nr+1))
    done
    mv "$fn.orig" "$fn"
  done

It produces no output in git.git (we are using all of our attributes),
but you can add a useless one like:

  echo '*.c -diff' >>Documentation/.gitattributes

and then the loop yields:

  Documentation/.gitattributes:2 *.c -diff

So I definitely wouldn't call any of that "easy", but it may help you.

-Peff

* Re: Clean up stale .gitignore and .gitattribute patterns
From: Sebastian Schuberth @ 2023-06-26 7:51 UTC
To: Jeff King; +Cc: Git Mailing List

Thanks Peff for the suggestion. I ended up scripting something via
JGit [1], as we're anyway using it as part of our Gradle build system.

PS: As a future idea, it might be good if "git mv" gives a hint about
updating .gitattributes if files matching .gitattributes pattern are
moved.

[1]: https://github.com/oss-review-toolkit/ort/pull/7195/commits/e01945d41012db2d0bc2e53d7be4abd513888ba6

--
Sebastian Schuberth

* Re: Clean up stale .gitignore and .gitattribute patterns
From: Junio C Hamano @ 2023-06-26 15:55 UTC
To: Sebastian Schuberth; +Cc: Jeff King, Git Mailing List

Sebastian Schuberth <sschuberth@gmail.com> writes:

> Thanks Peff for the suggestion. I ended up scripting something via
> JGit [1], as we're anyway using it as part of our Gradle build system.
>
> PS: As a future idea, it might be good if "git mv" gives a hint about
> updating .gitattributes if files matching .gitattributes pattern are
> moved.

Interesting. "git mv hello.jpg hello.jpeg" would suggest updating
a "*.jpg <list of attribute definitions>" line in the .gitattributes
to begin with "*.jpeg"?

* Re: Clean up stale .gitignore and .gitattribute patterns
From: Sebastian Schuberth @ 2023-06-26 16:42 UTC
To: Junio C Hamano; +Cc: Jeff King, Git Mailing List

On Mon, Jun 26, 2023 at 5:55 PM Junio C Hamano <gitster@pobox.com> wrote:

> > PS: As a future idea, it might be good if "git mv" gives a hint about
> > updating .gitattributes if files matching .gitattributes pattern are
> > moved.
>
> Interesting. "git mv hello.jpg hello.jpeg" would suggest updating
> a "*.jpg <list of attribute definitions>" line in the .gitattributes
> to begin with "*.jpeg"?

Yes, right. Or as a simpler variant to start with (as patterns might
match files in different directories, and not all of the matching
files might be moved), just say that a specific .gitattributes line
needs updating (or needs to be duplicated / generalized in case files
in both the old and new location match).

--
Sebastian Schuberth

* Re: Clean up stale .gitignore and .gitattribute patterns
From: Jeff King @ 2023-06-27 6:51 UTC
To: Sebastian Schuberth; +Cc: Junio C Hamano, Git Mailing List

On Mon, Jun 26, 2023 at 06:42:17PM +0200, Sebastian Schuberth wrote:

> On Mon, Jun 26, 2023 at 5:55 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> > > PS: As a future idea, it might be good if "git mv" gives a hint about
> > > updating .gitattributes if files matching .gitattributes pattern are
> > > moved.
> >
> > Interesting. "git mv hello.jpg hello.jpeg" would suggest updating
> > a "*.jpg <list of attribute definitions>" line in the .gitattributes
> > to begin with "*.jpeg"?
>
> Yes, right. Or as a simpler variant to start with (as patterns might
> match files in different directories, and not all of the matching
> files might be moved), just say that a specific .gitattributes line
> needs updating (or needs to be duplicated / generalized in case files
> in both the old and new location match).

Yeah, I don't think we could ever do anything automated here; a human
needs to judge the intent and how the patterns should be adapted. But
perhaps something like:

  1. When git-commit makes a new commit that removes paths (whether they
     were totally removed, or renamed), find all gitattribute lines
     whose patterns match those paths.

  2. For each such pattern, see if it still matches anything in the
     resulting tree.

  3. If not, print advise() lines showing the file/line of the pattern
     which is no longer used.

Doing so naively (by checking matches for each file in the tree) would
be a little expensive, but maybe OK in practice. It could perhaps be
done more efficiently with specialized code, but it might be tricky to
get right (and you still end up O(size of tree) in the worst case,
because something like "*.jpg" needs to be compared against every
entry).

Of course on the way there you should end up with a decent tool for
"which patterns are not currently used?". And you could just
periodically run that manually if you want to clean up (or even from a
post-commit hook).

Re-reading your email, though, I wonder if you meant something a little
simpler, like:

  1. When a path is moved via git-mv, see if the attributes before/after
     are the same.

  2. If not, then mention which ones matched the old path via advise().

That is probably easier to write, though it does not help the "git rm"
case (where attributes may become obsolete).

-Peff

* Re: Clean up stale .gitignore and .gitattribute patterns
From: Sebastian Schuberth @ 2023-06-27 6:55 UTC
To: Jeff King; +Cc: Junio C Hamano, Git Mailing List

On Tue, Jun 27, 2023 at 8:51 AM Jeff King <peff@peff.net> wrote:

> Re-reading your email, though, I wonder if you meant something a little
> simpler, like:

Indeed, I only had the "git mv" case in mind, with the advise() shown
when that command is run rather than at "git commit" time.

--
Sebastian Schuberth
