* [PATCH] ls-files: add an --exclude-links option
@ 2023-06-21 6:08 Guido Martínez via GitGitGadget
2023-06-21 8:17 ` Jeff King
0 siblings, 1 reply; 7+ messages in thread
From: Guido Martínez via GitGitGadget @ 2023-06-21 6:08 UTC (permalink / raw)
To: git; +Cc: Guido Martínez, Guido Martínez
From: =?UTF-8?q?Guido=20Mart=C3=ADnez?= <mtzguido@gmail.com>
Add an option to exclude symlinks from the listed files. This is useful
in case we are listing the files in order to process the contents,
for instance to do some text replacement with `sed -i`. In that case,
there is no point in processing the links, and it could even be
counterproductive as some tools (like sed) will replace the link with a
fresh regular file.
This option enables a straightforward implementation of a `git sed`:
#!/bin/bash
git ls-files --exclude-links -z | xargs -0 -P $(nproc) -- sed -i -e "$@"
Signed-off-by: Guido Martínez <mtzguido@gmail.com>
---
ls-files: add an --exclude-links option
Hi, not sure if this is desirable, but I found it very useful to
implement the git sed shown in the commit message to do some
replacements in a big repo. I placed the option right after
--exclude-standard and (I think) updated the relevant docs. The flag
applies to all modes: -c, -d, -m and -o, though I think it's mostly
useful on the default -c.
Also is there interest in a standard git sed?
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1549%2Fmtzguido%2Fls_files_exclude_links-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1549/mtzguido/ls_files_exclude_links-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1549
Documentation/git-ls-files.txt | 5 ++++-
builtin/ls-files.c | 10 ++++++++++
2 files changed, 14 insertions(+), 1 deletion(-)
diff --git a/Documentation/git-ls-files.txt b/Documentation/git-ls-files.txt
index 1bc0328bb78..138ba49bd8c 100644
--- a/Documentation/git-ls-files.txt
+++ b/Documentation/git-ls-files.txt
@@ -18,7 +18,7 @@ SYNOPSIS
[-x <pattern>|--exclude=<pattern>]
[-X <file>|--exclude-from=<file>]
[--exclude-per-directory=<file>]
- [--exclude-standard]
+ [--exclude-standard] [--exclude-links]
[--error-unmatch] [--with-tree=<tree-ish>]
[--full-name] [--recurse-submodules]
[--abbrev[=<n>]] [--format=<format>] [--] [<file>...]
@@ -126,6 +126,9 @@ OPTIONS
Add the standard Git exclusions: .git/info/exclude, .gitignore
in each directory, and the user's global exclusion file.
+--exclude-links::
+ Do not list symbolic links.
+
--error-unmatch::
If any <file> does not appear in the index, treat this as an
error (return 1).
diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index 72012c0f0f7..0826e89a496 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -46,6 +46,7 @@ static int show_eol;
static int recurse_submodules;
static int skipping_duplicates;
static int show_sparse_dirs;
+static int exclude_links;
static const char *prefix;
static int max_prefix_len;
@@ -171,6 +172,11 @@ static void show_other_files(struct index_state *istate,
struct dir_entry *ent = dir->entries[i];
if (!index_name_is_other(istate, ent->name, ent->len))
continue;
+ if (exclude_links) {
+ struct stat st;
+ if (!lstat(ent->name, &st) && S_ISLNK(st.st_mode))
+ continue;
+ }
show_dir_entry(istate, tag_other, ent);
}
}
@@ -451,6 +457,8 @@ static void show_files(struct repository *repo, struct dir_struct *dir)
continue;
if (ce->ce_flags & CE_UPDATE)
continue;
+ if (exclude_links && S_ISLNK(ce->ce_mode))
+ continue;
if ((show_cached || show_stage) &&
(!show_unmerged || ce_stage(ce))) {
show_ce(repo, dir, ce, fullname.buf,
@@ -780,6 +788,8 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
N_("add the standard git exclusions"),
PARSE_OPT_NOARG | PARSE_OPT_NONEG,
option_parse_exclude_standard),
+ OPT_BOOL(0, "exclude-links", &exclude_links,
+ N_("do not print symbolic links")),
OPT_SET_INT_F(0, "full-name", &prefix_len,
N_("make the output relative to the project top directory"),
0, PARSE_OPT_NONEG),
base-commit: 6640c2d06d112675426cf436f0594f0e8c614848
--
gitgitgadget
^ permalink raw reply related [flat|nested] 7+ messages in thread* Re: [PATCH] ls-files: add an --exclude-links option
2023-06-21 6:08 [PATCH] ls-files: add an --exclude-links option Guido Martínez via GitGitGadget
@ 2023-06-21 8:17 ` Jeff King
2023-06-21 15:19 ` Guido Martínez
2023-06-21 16:08 ` Junio C Hamano
0 siblings, 2 replies; 7+ messages in thread
From: Jeff King @ 2023-06-21 8:17 UTC (permalink / raw)
To: Guido Martínez via GitGitGadget; +Cc: git, Guido Martínez
On Wed, Jun 21, 2023 at 06:08:04AM +0000, Guido Martínez via GitGitGadget wrote:
> From: =?UTF-8?q?Guido=20Mart=C3=ADnez?= <mtzguido@gmail.com>
>
> Add an option to exclude symlinks from the listed files. This is useful
> in case we are listing the files in order to process the contents,
> for instance to do some text replacement with `sed -i`. In that case,
> there is no point in processing the links, and it could even be
> counterproductive as some tools (like sed) will replace the link with a
> fresh regular file.
>
> This option enables a straightforward implementation of a `git sed`:
>
> #!/bin/bash
> git ls-files --exclude-links -z | xargs -0 -P $(nproc) -- sed -i -e "$@"
This invocation would likewise have a problem with gitlink entries (for
submodules). I think what you really want is not "exclude symlinks" but
"show only regular files". There is no option for that, but you can do
it by grepping modes from "-s", like:
git ls-files -s | grep '^100[67]' | ...
You do unfortunately have to then pull the filename out of the rest of
the line, and since we didn't use "-z", it will be quoted (and using
"-z" makes it hard to use tools like grep and sed). A mild application
of perl works, though:
git ls-files -s |
perl -0ne 'print if s/^100(644|755).*?\t//' |
xargs -0 ...
So I dunno. That is not exactly pretty, but if you were hiding it in a
"git sed" alias or script, it's not so bad.
If we were to add an option to ls-files, it would make more sense to me
to allow the user to include/exclude by mode or possibly naming the type
(e.g., "f" for regular files, "l" for symbolic links, etc, which matches
"find"). And then allow inclusion/exclusion similar to the way that
git-diff's --diff-filter option works.
-Peff
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] ls-files: add an --exclude-links option
2023-06-21 8:17 ` Jeff King
@ 2023-06-21 15:19 ` Guido Martínez
2023-06-21 16:08 ` Junio C Hamano
1 sibling, 0 replies; 7+ messages in thread
From: Guido Martínez @ 2023-06-21 15:19 UTC (permalink / raw)
To: Jeff King; +Cc: Guido Martínez via GitGitGadget, git
Hi Jeff, thanks for reviewing!
On Wed, Jun 21, 2023 at 1:17 AM Jeff King <peff@peff.net> wrote:
>
> On Wed, Jun 21, 2023 at 06:08:04AM +0000, Guido Martínez via GitGitGadget wrote:
>
> > From: =?UTF-8?q?Guido=20Mart=C3=ADnez?= <mtzguido@gmail.com>
> >
> > Add an option to exclude symlinks from the listed files. This is useful
> > in case we are listing the files in order to process the contents,
> > for instance to do some text replacement with `sed -i`. In that case,
> > there is no point in processing the links, and it could even be
> > counterproductive as some tools (like sed) will replace the link with a
> > fresh regular file.
> >
> > This option enables a straightforward implementation of a `git sed`:
> >
> > #!/bin/bash
> > git ls-files --exclude-links -z | xargs -0 -P $(nproc) -- sed -i -e "$@"
>
> This invocation would likewise have a problem with gitlink entries (for
> submodules). I think what you really want is not "exclude symlinks" but
> "show only regular files".
Ah, thanks! I hadn't realized that. I suppose my patch should also
come with a few tests, including one with submodules.
> There is no option for that, but you can do
> it by grepping modes from "-s", like:
>
> git ls-files -s | grep '^100[67]' | ...
>
> You do unfortunately have to then pull the filename out of the rest of
> the line, and since we didn't use "-z", it will be quoted (and using
> "-z" makes it hard to use tools like grep and sed). A mild application
> of perl works, though:
>
> git ls-files -s |
> perl -0ne 'print if s/^100(644|755).*?\t//' |
> xargs -0 ...
My perl-foo is not as strong, so I didn't realize perl could do this
kind of thing :) Still it may be desirable for ls-files to do the
filtering itself?
> So I dunno. That is not exactly pretty, but if you were hiding it in a
> "git sed" alias or script, it's not so bad.
>
> If we were to add an option to ls-files, it would make more sense to me
> to allow the user to include/exclude by mode or possibly naming the type
> (e.g., "f" for regular files, "l" for symbolic links, etc, which matches
> "find"). And then allow inclusion/exclusion similar to the way that
> git-diff's --diff-filter option works.
Yes, this is totally reasonable. I'll try to get that working here and resubmit.
Thanks,
Guido
>
> -Peff
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] ls-files: add an --exclude-links option
2023-06-21 8:17 ` Jeff King
2023-06-21 15:19 ` Guido Martínez
@ 2023-06-21 16:08 ` Junio C Hamano
2023-06-21 17:54 ` Guido Martínez
1 sibling, 1 reply; 7+ messages in thread
From: Junio C Hamano @ 2023-06-21 16:08 UTC (permalink / raw)
To: Jeff King; +Cc: Guido Martínez via GitGitGadget, git, Guido Martínez
Jeff King <peff@peff.net> writes:
>> This option enables a straightforward implementation of a `git sed`:
>>
>> #!/bin/bash
>> git ls-files --exclude-links -z | xargs -0 -P $(nproc) -- sed -i -e "$@"
Unrelated nit.
I think -i -e above is iffy, as it does not distribute -e across
"$@" and your users may not always want to edit the files. It is
better to leave them to the callers.
"sed" is also something the caller can easily pass from their
command line, for that matter ;-). Passing the entire command
part run under xargs from the command line of the wrapper,
$ git for-all-paths grep -e pattern
would also work just fine, for example.
> ... A mild application of perl works, though:
>
> git ls-files -s -z |
> perl -0ne 'print if s/^100(644|755).*?\t//' |
> xargs -0 ...
>
> So I dunno. That is not exactly pretty, but if you were hiding it in a
> "git sed" alias or script, it's not so bad.
Yes, the above would be a perfectly reasonable implementation of
"git for-all-paths", especially if you do not hardcode anything in
the ... part and instead use something like xargs -0 "$@" there.
What is somewhat unsatisfactory is that we cannot pass pathspec to
the "ls-files" so that the command does not have to be for-all-paths
but can be usable as "git do-for-paths -c '<command>' <pathspec>".
^ permalink raw reply [flat|nested] 7+ messages in thread* Re: [PATCH] ls-files: add an --exclude-links option
2023-06-21 16:08 ` Junio C Hamano
@ 2023-06-21 17:54 ` Guido Martínez
2023-06-21 19:47 ` Junio C Hamano
0 siblings, 1 reply; 7+ messages in thread
From: Guido Martínez @ 2023-06-21 17:54 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Jeff King, Guido Martínez via GitGitGadget, git
Hi Junio, thanks for taking a look!
On Wed, Jun 21, 2023 at 9:08 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> Jeff King <peff@peff.net> writes:
>
> >> This option enables a straightforward implementation of a `git sed`:
> >>
> >> #!/bin/bash
> >> git ls-files --exclude-links -z | xargs -0 -P $(nproc) -- sed -i -e "$@"
>
> Unrelated nit.
>
> I think -i -e above is iffy, as it does not distribute -e across
> "$@" and your users may not always want to edit the files. It is
> better to leave them to the callers.
>
> "sed" is also something the caller can easily pass from their
> command line, for that matter ;-).
I should have known my one-liner would be torn to shreds :-). But yes, agreed!
> Passing the entire command
> part run under xargs from the command line of the wrapper,
>
> $ git for-all-paths grep -e pattern
>
> would also work just fine, for example.
>
> > ... A mild application of perl works, though:
> >
> > git ls-files -s -z |
> > perl -0ne 'print if s/^100(644|755).*?\t//' |
> > xargs -0 ...
> >
> > So I dunno. That is not exactly pretty, but if you were hiding it in a
> > "git sed" alias or script, it's not so bad.
>
> Yes, the above would be a perfectly reasonable implementation of
> "git for-all-paths", especially if you do not hardcode anything in
> the ... part and instead use something like xargs -0 "$@" there.
>
> What is somewhat unsatisfactory is that we cannot pass pathspec to
> the "ls-files" so that the command does not have to be for-all-paths
> but can be usable as "git do-for-paths -c '<command>' <pathspec>".
Indeed, a command like that would be great, and doing `git
do-for-paths -c "sed -i 's/old/new/g'"` is still extremely convenient.
I suppose this command should take options -n and -P to pass to xargs.
In the case of sed, running in batches is _much_ faster than 1-by-1,
and it trivially parallelizes.
So would it make sense to
1- Add the file type filter to ls-files
2- Use that to implement a proper git-do-for-paths script/binary,
which can take pathspecs, filetype filters, -n, -P, and maybe more
?
Thanks again!
Guido
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] ls-files: add an --exclude-links option
2023-06-21 17:54 ` Guido Martínez
@ 2023-06-21 19:47 ` Junio C Hamano
2023-06-21 20:02 ` Guido Martínez
0 siblings, 1 reply; 7+ messages in thread
From: Junio C Hamano @ 2023-06-21 19:47 UTC (permalink / raw)
To: Guido Martínez; +Cc: Jeff King, Guido Martínez via GitGitGadget, git
Guido Martínez <mtzguido@gmail.com> writes:
> So would it make sense to
> 1- Add the file type filter to ls-files
I think it definitely makes sense to extend the filtering criteria
supported by ls-files beyond what we currently support (i.e.
pathspec).
I also wonder if "file type filter" could just be implemented as the
magic pathspec. For example, we can already use the magic pathspec
'attr' (read on "pathspec" in "git help glossary") this way:
$ git ls-files ":(attr:eol=crlf)"
to list only those files for which the 'eol' attribute is set to
'crlf' (i.e. they must be checked out for DOS no matter what your
platform actually is). That is even more flexible than the
hardcoded "is it a regular file? is it a symlink? is it a
submodule" file types. And the magic pathspec is understood not
everywhere but by git subcommands other than "ls-files".
We can either invent a new pathspec magic "filetype" and express
them this way,
$ git ls-files ":(filetype:regular)" # 100644 and 100755
$ git ls-files ":(filetype:symbolic-link)" # 120000
$ git ls-files ":(filetype:submodule)" # 160000
or we invent a magic attribute "filetype" that is automatically
given to every path, and express the above more like so:
$ git ls-files ":(attr:filetype=regular)" # 100644 and 100755
$ git ls-files ":(attr:filetype=symbolic-link)" # 120000
$ git ls-files ":(attr:filetype=submodule)" # 160000
may be even better, as there are git subcommands other than ls-files
that supports magic pathspec. For example, it might be even useful
to do something like
$ git diff v1.0 v2.0 -- ":(attr:filetype=executable)"
instead of saying
$ git diff v1.0 v2.0 -- \*.sh \*.perl \*.py \*.bat
So, yeah, whether it is done via the magic pathspec or "ls-files"
specific option, teaching "ls-files" to support more filtering
criteria would make sense.
> 2- Use that to implement a proper git-do-for-paths script/binary,
> which can take pathspecs, filetype filters, -n, -P, and maybe more
> ?
The primary obstacle was you'd need a custom perl script to filter
"ls-files -z" output, but once that need is gone, I actually do not
think it buys us a lot to have such a wrapper. Treat the improved
ls-files as what it is, i.e. a plumbing command that can be used as
a building block of your workflow, and piping its output to xargs
would just be fine.
Thanks.
^ permalink raw reply [flat|nested] 7+ messages in thread* Re: [PATCH] ls-files: add an --exclude-links option
2023-06-21 19:47 ` Junio C Hamano
@ 2023-06-21 20:02 ` Guido Martínez
0 siblings, 0 replies; 7+ messages in thread
From: Guido Martínez @ 2023-06-21 20:02 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Jeff King, Guido Martínez via GitGitGadget, git
On Wed, Jun 21, 2023 at 12:47 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Guido Martínez <mtzguido@gmail.com> writes:
>
> > So would it make sense to
> > 1- Add the file type filter to ls-files
>
> I think it definitely makes sense to extend the filtering criteria
> supported by ls-files beyond what we currently support (i.e.
> pathspec).
>
> I also wonder if "file type filter" could just be implemented as the
> magic pathspec. For example, we can already use the magic pathspec
> 'attr' (read on "pathspec" in "git help glossary") this way:
>
> $ git ls-files ":(attr:eol=crlf)"
>
> to list only those files for which the 'eol' attribute is set to
> 'crlf' (i.e. they must be checked out for DOS no matter what your
> platform actually is). That is even more flexible than the
> hardcoded "is it a regular file? is it a symlink? is it a
> submodule" file types. And the magic pathspec is understood not
> everywhere but by git subcommands other than "ls-files".
>
> We can either invent a new pathspec magic "filetype" and express
> them this way,
>
> $ git ls-files ":(filetype:regular)" # 100644 and 100755
> $ git ls-files ":(filetype:symbolic-link)" # 120000
> $ git ls-files ":(filetype:submodule)" # 160000
>
> or we invent a magic attribute "filetype" that is automatically
> given to every path, and express the above more like so:
>
> $ git ls-files ":(attr:filetype=regular)" # 100644 and 100755
> $ git ls-files ":(attr:filetype=symbolic-link)" # 120000
> $ git ls-files ":(attr:filetype=submodule)" # 160000
>
> may be even better, as there are git subcommands other than ls-files
> that supports magic pathspec. For example, it might be even useful
> to do something like
>
> $ git diff v1.0 v2.0 -- ":(attr:filetype=executable)"
>
> instead of saying
>
> $ git diff v1.0 v2.0 -- \*.sh \*.perl \*.py \*.bat
>
> So, yeah, whether it is done via the magic pathspec or "ls-files"
> specific option, teaching "ls-files" to support more filtering
> criteria would make sense.
Thanks, I was totally unaware of this magic pathspec, I'll see what it
takes to add attr:filetype.
> > 2- Use that to implement a proper git-do-for-paths script/binary,
> > which can take pathspecs, filetype filters, -n, -P, and maybe more
> > ?
>
> The primary obstacle was you'd need a custom perl script to filter
> "ls-files -z" output, but once that need is gone, I actually do not
> think it buys us a lot to have such a wrapper. Treat the improved
> ls-files as what it is, i.e. a plumbing command that can be used as
> a building block of your workflow, and piping its output to xargs
> would just be fine.
Yup, totally fine by me. I'll just keep that script locally then.
Thanks again,
Guido
>
> Thanks.
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2023-06-21 20:02 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-06-21 6:08 [PATCH] ls-files: add an --exclude-links option Guido Martínez via GitGitGadget
2023-06-21 8:17 ` Jeff King
2023-06-21 15:19 ` Guido Martínez
2023-06-21 16:08 ` Junio C Hamano
2023-06-21 17:54 ` Guido Martínez
2023-06-21 19:47 ` Junio C Hamano
2023-06-21 20:02 ` Guido Martínez
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).