* [RFC PATCH] builtin/shortlog: explicitly set hash algo when there is no repo @ 2024-10-11 18:34 Wolfgang Müller 2024-10-15 9:33 ` Wolfgang Müller ` (3 more replies) 0 siblings, 4 replies; 36+ messages in thread From: Wolfgang Müller @ 2024-10-11 18:34 UTC (permalink / raw) To: git; +Cc: Wolfgang Müller Whilst git-shortlog(1) does not explicitly need any repository information when run without reference to one, it still parses some of its arguments with parse_revision_opt() which assumes that the hash algorithm is set. However, in c8aed5e8da (repository: stop setting SHA1 as the default object hash, 2024-05-07) we stopped setting up a default hash algorithm and instead require commands to set it up explicitly. This was done for most other commands like in ab274909d4 (builtin/diff: explicitly set hash algo when there is no repo, 2024-05-07) but was missed for builtin/shortlog, making git-shortlog(1) segfault outside of a repository when given arguments like --author that trigger a call to parse_revision_opt(). Fix this for now by explicitly setting the hash algorithm to SHA1. Signed-off-by: Wolfgang Müller <wolf@oriole.systems> --- I think there is a much cleaner patch here if one would look at disentangling the parsing machinery split between cmd_shortlog() and parse_revision_opt() such that the latter is only called if there is an actual repository, but I'm not at all familiar enough with the codebase to do that. Like the commit says some other commands were fixed like this as well, so I thought to send this patch your way. builtin/shortlog.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/builtin/shortlog.c b/builtin/shortlog.c index 3ed5c46078..6422f088b2 100644 --- a/builtin/shortlog.c +++ b/builtin/shortlog.c @@ -387,6 +387,14 @@ int cmd_shortlog(int argc, struct rev_info rev; int nongit = !startup_info->have_repository; + /* + * Later on we'll call parse_revision_opt which relies on the hash + * algorithm being set but since we are operating outside of a Git + * repository we cannot determine one. For now default to SHA1. + */ + if (nongit && !the_hash_algo) + repo_set_hash_algo(the_repository, GIT_HASH_SHA1); + const struct option options[] = { OPT_BIT('c', "committer", &log.groups, N_("group by committer rather than author"), base-commit: 777489f9e09c8d0dd6b12f9d90de6376330577a2 -- 2.47.0 ^ permalink raw reply related [flat|nested] 36+ messages in thread
* Re: [RFC PATCH] builtin/shortlog: explicitly set hash algo when there is no repo 2024-10-11 18:34 [RFC PATCH] builtin/shortlog: explicitly set hash algo when there is no repo Wolfgang Müller @ 2024-10-15 9:33 ` Wolfgang Müller 2024-10-15 9:47 ` Kristoffer Haugsbakk 2024-10-15 19:54 ` Taylor Blau 2024-10-15 11:48 ` [RFC PATCH v2] " Wolfgang Müller ` (2 subsequent siblings) 3 siblings, 2 replies; 36+ messages in thread From: Wolfgang Müller @ 2024-10-15 9:33 UTC (permalink / raw) To: git On 2024-10-11 20:34, Wolfgang Müller wrote: > Whilst git-shortlog(1) does not explicitly need any repository > information when run without reference to one, it still parses some of > its arguments with parse_revision_opt() which assumes that the hash > algorithm is set. However, in c8aed5e8da (repository: stop setting SHA1 > as the default object hash, 2024-05-07) we stopped setting up a default > hash algorithm and instead require commands to set it up explicitly. > > This was done for most other commands like in ab274909d4 (builtin/diff: > explicitly set hash algo when there is no repo, 2024-05-07) but was > missed for builtin/shortlog, making git-shortlog(1) segfault outside of > a repository when given arguments like --author that trigger a call to > parse_revision_opt(). > > Fix this for now by explicitly setting the hash algorithm to SHA1. > > Signed-off-by: Wolfgang Müller <wolf@oriole.systems> > --- > I think there is a much cleaner patch here if one would look at > disentangling the parsing machinery split between cmd_shortlog() and > parse_revision_opt() such that the latter is only called if there is an > actual repository, but I'm not at all familiar enough with the codebase > to do that. Like the commit says some other commands were fixed like > this as well, so I thought to send this patch your way. Was this topic maybe missed? It's not a critical fix but would solve a segfault that is pretty easy for a user to trigger. Thanks! -- Wolf ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFC PATCH] builtin/shortlog: explicitly set hash algo when there is no repo 2024-10-15 9:33 ` Wolfgang Müller @ 2024-10-15 9:47 ` Kristoffer Haugsbakk 2024-10-15 19:54 ` Taylor Blau 1 sibling, 0 replies; 36+ messages in thread From: Kristoffer Haugsbakk @ 2024-10-15 9:47 UTC (permalink / raw) To: Wolfgang Müller, git On Tue, Oct 15, 2024, at 11:33, Wolfgang Müller wrote: > > […] > > Was this topic maybe missed? It's not a critical fix but would solve a > segfault that is pretty easy for a user to trigger. I can confirm that this gives me a segfault: ``` # Directory which is not tracked cd ~ git shortlog --author=Kristoffer ``` Maybe a few things could make the change more ready to be included: 1. A regression test for that segfault 2. A quickfix (which you indicate below the three-dash/hyphens) can be documented in the comment above the code with a `NEEDSWORK` -- Kristoffer Haugsbakk ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFC PATCH] builtin/shortlog: explicitly set hash algo when there is no repo 2024-10-15 9:33 ` Wolfgang Müller 2024-10-15 9:47 ` Kristoffer Haugsbakk @ 2024-10-15 19:54 ` Taylor Blau 2024-10-15 23:28 ` Taylor Blau 1 sibling, 1 reply; 36+ messages in thread From: Taylor Blau @ 2024-10-15 19:54 UTC (permalink / raw) To: git On Tue, Oct 15, 2024 at 11:33:35AM +0200, Wolfgang Müller wrote: > On 2024-10-11 20:34, Wolfgang Müller wrote: > > Whilst git-shortlog(1) does not explicitly need any repository > > information when run without reference to one, it still parses some of > > its arguments with parse_revision_opt() which assumes that the hash > > algorithm is set. However, in c8aed5e8da (repository: stop setting SHA1 > > as the default object hash, 2024-05-07) we stopped setting up a default > > hash algorithm and instead require commands to set it up explicitly. > > > > This was done for most other commands like in ab274909d4 (builtin/diff: > > explicitly set hash algo when there is no repo, 2024-05-07) but was > > missed for builtin/shortlog, making git-shortlog(1) segfault outside of > > a repository when given arguments like --author that trigger a call to > > parse_revision_opt(). > > > > Fix this for now by explicitly setting the hash algorithm to SHA1. > > > > Signed-off-by: Wolfgang Müller <wolf@oriole.systems> > > --- > > I think there is a much cleaner patch here if one would look at > > disentangling the parsing machinery split between cmd_shortlog() and > > parse_revision_opt() such that the latter is only called if there is an > > actual repository, but I'm not at all familiar enough with the codebase > > to do that. Like the commit says some other commands were fixed like > > this as well, so I thought to send this patch your way. > > Was this topic maybe missed? It's not a critical fix but would solve a > segfault that is pretty easy for a user to trigger. Yeah. It looks like this was sent out on Friday, which was Junio's last working day before vacation. I think we probably both assumed the other picked it up ;-). Thanks, Taylor ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFC PATCH] builtin/shortlog: explicitly set hash algo when there is no repo 2024-10-15 19:54 ` Taylor Blau @ 2024-10-15 23:28 ` Taylor Blau 2024-10-16 8:15 ` Wolfgang Müller 0 siblings, 1 reply; 36+ messages in thread From: Taylor Blau @ 2024-10-15 23:28 UTC (permalink / raw) To: git On Tue, Oct 15, 2024 at 03:54:51PM -0400, Taylor Blau wrote: > On Tue, Oct 15, 2024 at 11:33:35AM +0200, Wolfgang Müller wrote: > > On 2024-10-11 20:34, Wolfgang Müller wrote: > > > Whilst git-shortlog(1) does not explicitly need any repository > > > information when run without reference to one, it still parses some of > > > its arguments with parse_revision_opt() which assumes that the hash > > > algorithm is set. However, in c8aed5e8da (repository: stop setting SHA1 > > > as the default object hash, 2024-05-07) we stopped setting up a default > > > hash algorithm and instead require commands to set it up explicitly. > > > > > > This was done for most other commands like in ab274909d4 (builtin/diff: > > > explicitly set hash algo when there is no repo, 2024-05-07) but was > > > missed for builtin/shortlog, making git-shortlog(1) segfault outside of > > > a repository when given arguments like --author that trigger a call to > > > parse_revision_opt(). > > > > > > Fix this for now by explicitly setting the hash algorithm to SHA1. > > > > > > Signed-off-by: Wolfgang Müller <wolf@oriole.systems> > > > --- > > > I think there is a much cleaner patch here if one would look at > > > disentangling the parsing machinery split between cmd_shortlog() and > > > parse_revision_opt() such that the latter is only called if there is an > > > actual repository, but I'm not at all familiar enough with the codebase > > > to do that. Like the commit says some other commands were fixed like > > > this as well, so I thought to send this patch your way. > > > > Was this topic maybe missed? It's not a critical fix but would solve a > > segfault that is pretty easy for a user to trigger. > > Yeah. It looks like this was sent out on Friday, which was Junio's last > working day before vacation. I think we probably both assumed the other > picked it up ;-). I picked this up today and applied it to 'seen', but ejected it before pushing the result out, as this appears to break CI, presumably due to the mixed declaration and code in the patch. Thanks, Taylor ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFC PATCH] builtin/shortlog: explicitly set hash algo when there is no repo 2024-10-15 23:28 ` Taylor Blau @ 2024-10-16 8:15 ` Wolfgang Müller 2024-10-16 18:28 ` Taylor Blau 0 siblings, 1 reply; 36+ messages in thread From: Wolfgang Müller @ 2024-10-16 8:15 UTC (permalink / raw) To: Taylor Blau; +Cc: git On 2024-10-15 19:28, Taylor Blau wrote: > I picked this up today and applied it to 'seen', but ejected it before > pushing the result out, as this appears to break CI, presumably due to > the mixed declaration and code in the patch. Oh, I had missed setting DEVELOPER=1 in my builds so I didn't see this failure myself. Will fix in the upcoming version. -- Wolf ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFC PATCH] builtin/shortlog: explicitly set hash algo when there is no repo 2024-10-16 8:15 ` Wolfgang Müller @ 2024-10-16 18:28 ` Taylor Blau 0 siblings, 0 replies; 36+ messages in thread From: Taylor Blau @ 2024-10-16 18:28 UTC (permalink / raw) To: git On Wed, Oct 16, 2024 at 10:15:31AM +0200, Wolfgang Müller wrote: > On 2024-10-15 19:28, Taylor Blau wrote: > > I picked this up today and applied it to 'seen', but ejected it before > > pushing the result out, as this appears to break CI, presumably due to > > the mixed declaration and code in the patch. > > Oh, I had missed setting DEVELOPER=1 in my builds so I didn't see this > failure myself. Will fix in the upcoming version. Thanks for fixing. Thanks, Taylor ^ permalink raw reply [flat|nested] 36+ messages in thread
* [RFC PATCH v2] builtin/shortlog: explicitly set hash algo when there is no repo 2024-10-11 18:34 [RFC PATCH] builtin/shortlog: explicitly set hash algo when there is no repo Wolfgang Müller 2024-10-15 9:33 ` Wolfgang Müller @ 2024-10-15 11:48 ` Wolfgang Müller 2024-10-15 17:20 ` Eric Sunshine 2024-10-16 5:32 ` Patrick Steinhardt 2024-10-16 18:21 ` [PATCH v3 0/2] " Wolfgang Müller 2024-10-17 9:35 ` [PATCH v4] " Wolfgang Müller 3 siblings, 2 replies; 36+ messages in thread From: Wolfgang Müller @ 2024-10-15 11:48 UTC (permalink / raw) To: git; +Cc: Wolfgang Müller Whilst git-shortlog(1) does not explicitly need any repository information when run without reference to one, it still parses some of its arguments with parse_revision_opt() which assumes that the hash algorithm is set. However, in c8aed5e8da (repository: stop setting SHA1 as the default object hash, 2024-05-07) we stopped setting up a default hash algorithm and instead require commands to set it up explicitly. This was done for most other commands like in ab274909d4 (builtin/diff: explicitly set hash algo when there is no repo, 2024-05-07) but was missed for builtin/shortlog, making git-shortlog(1) segfault outside of a repository when given arguments like --author that trigger a call to parse_revision_opt(). Fix this for now by explicitly setting the hash algorithm to SHA1. Also add a regression test for the segfault. Signed-off-by: Wolfgang Müller <wolf@oriole.systems> --- builtin/shortlog.c | 12 ++++++++++++ t/t4201-shortlog.sh | 5 +++++ 2 files changed, 17 insertions(+) diff --git a/builtin/shortlog.c b/builtin/shortlog.c index 3ed5c46078..0fa35202ed 100644 --- a/builtin/shortlog.c +++ b/builtin/shortlog.c @@ -387,6 +387,18 @@ int cmd_shortlog(int argc, struct rev_info rev; int nongit = !startup_info->have_repository; + /* + * NEEDSWORK: Later on we'll call parse_revision_opt which relies on + * the hash algorithm being set but since we are operating outside of a + * Git repository we cannot determine one. This is only needed because + * parse_revision_opt expects hexsz for --abbrev which is irrelevant + * for shortlog outside of a git repository. For now explicitly set + * SHA1, but ideally the parsing machinery would be split between + * git/nongit so that we do not have to do this. + */ + if (nongit && !the_hash_algo) + repo_set_hash_algo(the_repository, GIT_HASH_SHA1); + const struct option options[] = { OPT_BIT('c', "committer", &log.groups, N_("group by committer rather than author"), diff --git a/t/t4201-shortlog.sh b/t/t4201-shortlog.sh index c20c885724..ed39c67ba1 100755 --- a/t/t4201-shortlog.sh +++ b/t/t4201-shortlog.sh @@ -143,6 +143,11 @@ fuzz() test_grep "too many arguments" out ' +test_expect_success 'shortlog --author from non-git directory does not segfault' ' + git log --no-expand-tabs HEAD >log && + env GIT_DIR=non-existing git shortlog --author=author <log 2>out +' + test_expect_success 'shortlog should add newline when input line matches wraplen' ' cat >expect <<\EOF && A U Thor (2): Range-diff against v1: 1: 42516cc02d ! 1: d3047a0291 builtin/shortlog: explicitly set hash algo when there is no repo @@ Commit message a repository when given arguments like --author that trigger a call to parse_revision_opt(). - Fix this for now by explicitly setting the hash algorithm to SHA1. + Fix this for now by explicitly setting the hash algorithm to SHA1. Also + add a regression test for the segfault. Signed-off-by: Wolfgang Müller <wolf@oriole.systems> @@ builtin/shortlog.c: int cmd_shortlog(int argc, int nongit = !startup_info->have_repository; + /* -+ * Later on we'll call parse_revision_opt which relies on the hash -+ * algorithm being set but since we are operating outside of a Git -+ * repository we cannot determine one. For now default to SHA1. ++ * NEEDSWORK: Later on we'll call parse_revision_opt which relies on ++ * the hash algorithm being set but since we are operating outside of a ++ * Git repository we cannot determine one. This is only needed because ++ * parse_revision_opt expects hexsz for --abbrev which is irrelevant ++ * for shortlog outside of a git repository. For now explicitly set ++ * SHA1, but ideally the parsing machinery would be split between ++ * git/nongit so that we do not have to do this. + */ + if (nongit && !the_hash_algo) + repo_set_hash_algo(the_repository, GIT_HASH_SHA1); @@ builtin/shortlog.c: int cmd_shortlog(int argc, const struct option options[] = { OPT_BIT('c', "committer", &log.groups, N_("group by committer rather than author"), + + ## t/t4201-shortlog.sh ## +@@ t/t4201-shortlog.sh: fuzz() + test_grep "too many arguments" out + ' + ++test_expect_success 'shortlog --author from non-git directory does not segfault' ' ++ git log --no-expand-tabs HEAD >log && ++ env GIT_DIR=non-existing git shortlog --author=author <log 2>out ++' ++ + test_expect_success 'shortlog should add newline when input line matches wraplen' ' + cat >expect <<\EOF && + A U Thor (2): -- 2.47.0 ^ permalink raw reply related [flat|nested] 36+ messages in thread
* Re: [RFC PATCH v2] builtin/shortlog: explicitly set hash algo when there is no repo 2024-10-15 11:48 ` [RFC PATCH v2] " Wolfgang Müller @ 2024-10-15 17:20 ` Eric Sunshine 2024-10-15 17:51 ` Wolfgang Müller 2024-10-16 5:32 ` Patrick Steinhardt 1 sibling, 1 reply; 36+ messages in thread From: Eric Sunshine @ 2024-10-15 17:20 UTC (permalink / raw) To: Wolfgang Müller; +Cc: git, Patrick Steinhardt On Tue, Oct 15, 2024 at 7:48 AM Wolfgang Müller <wolf@oriole.systems> wrote: > Whilst git-shortlog(1) does not explicitly need any repository > information when run without reference to one, it still parses some of > its arguments with parse_revision_opt() which assumes that the hash > algorithm is set. However, in c8aed5e8da (repository: stop setting SHA1 > as the default object hash, 2024-05-07) we stopped setting up a default > hash algorithm and instead require commands to set it up explicitly. > > This was done for most other commands like in ab274909d4 (builtin/diff: > explicitly set hash algo when there is no repo, 2024-05-07) but was > missed for builtin/shortlog, making git-shortlog(1) segfault outside of > a repository when given arguments like --author that trigger a call to > parse_revision_opt(). > > Fix this for now by explicitly setting the hash algorithm to SHA1. Also > add a regression test for the segfault. > > Signed-off-by: Wolfgang Müller <wolf@oriole.systems> Cc:'ing Patrick since I believe he's been patching similar cases. > diff --git a/t/t4201-shortlog.sh b/t/t4201-shortlog.sh > @@ -143,6 +143,11 @@ fuzz() > +test_expect_success 'shortlog --author from non-git directory does not segfault' ' > + git log --no-expand-tabs HEAD >log && > + env GIT_DIR=non-existing git shortlog --author=author <log 2>out > +' t/test-lib-functions.sh has a handy nongit() function for running a command in a non-Git directory: nongit git shortlog --author=author <log 2>out By the way, what is the purpose of capturing output to file "out"? That file is never consulted. Also, can the original crash be reproduced without having to invoke the additional `git log`? In my tests, this is sufficient: echo | nongit git shortlog --author=author ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFC PATCH v2] builtin/shortlog: explicitly set hash algo when there is no repo 2024-10-15 17:20 ` Eric Sunshine @ 2024-10-15 17:51 ` Wolfgang Müller 0 siblings, 0 replies; 36+ messages in thread From: Wolfgang Müller @ 2024-10-15 17:51 UTC (permalink / raw) To: Eric Sunshine; +Cc: git, Patrick Steinhardt On 2024-10-15 13:20, Eric Sunshine wrote: > t/test-lib-functions.sh has a handy nongit() function for running a > command in a non-Git directory: > > nongit git shortlog --author=author <log 2>out Ah, nice, I'll be using that instead, then. > By the way, what is the purpose of capturing output to file "out"? > That file is never consulted. Oh, my bad, that was a leftover from local testing. > Also, can the original crash be reproduced without having to invoke > the additional `git log`? In my tests, this is sufficient: > > echo | nongit git shortlog --author=author Yeah, same as above. I'll take that into account for the next version, thanks! Thinking more about this, any opinion on the specific test for --author? It is a succinct test but also relies on the fact that it "just happens" to be one of the flags handled in parse_revision_opt(). In fact, any such flag (even non-existing ones, like "--foo") would crash shortlog right now. I'm wondering whether there's a more universal way of testing this. -- Wolf ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFC PATCH v2] builtin/shortlog: explicitly set hash algo when there is no repo 2024-10-15 11:48 ` [RFC PATCH v2] " Wolfgang Müller 2024-10-15 17:20 ` Eric Sunshine @ 2024-10-16 5:32 ` Patrick Steinhardt 2024-10-16 8:47 ` Wolfgang Müller 1 sibling, 1 reply; 36+ messages in thread From: Patrick Steinhardt @ 2024-10-16 5:32 UTC (permalink / raw) To: Wolfgang Müller; +Cc: git On Tue, Oct 15, 2024 at 01:48:26PM +0200, Wolfgang Müller wrote: > + /* > + * NEEDSWORK: Later on we'll call parse_revision_opt which relies on > + * the hash algorithm being set but since we are operating outside of a > + * Git repository we cannot determine one. This is only needed because > + * parse_revision_opt expects hexsz for --abbrev which is irrelevant > + * for shortlog outside of a git repository. For now explicitly set > + * SHA1, but ideally the parsing machinery would be split between > + * git/nongit so that we do not have to do this. > + */ > + if (nongit && !the_hash_algo) > + repo_set_hash_algo(the_repository, GIT_HASH_SHA1); > + Makes sense. As you say, git-shortlog(1) should ideally recognize the hash function used by the input. The next-best thing would be to provide a command line option to switch it. But punting on that should be fine for now. > const struct option options[] = { > OPT_BIT('c', "committer", &log.groups, > N_("group by committer rather than author"), > diff --git a/t/t4201-shortlog.sh b/t/t4201-shortlog.sh > index c20c885724..ed39c67ba1 100755 > --- a/t/t4201-shortlog.sh > +++ b/t/t4201-shortlog.sh > @@ -143,6 +143,11 @@ fuzz() > test_grep "too many arguments" out > ' > > +test_expect_success 'shortlog --author from non-git directory does not segfault' ' > + git log --no-expand-tabs HEAD >log && > + env GIT_DIR=non-existing git shortlog --author=author <log 2>out > +' > + I'd like to see another testcase added that exercises behaviour when git-shortlog(1) is passed SHA256 output outside of a repo, as I'm curious how it'll behave in that scenario. Thanks! Patrick ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFC PATCH v2] builtin/shortlog: explicitly set hash algo when there is no repo 2024-10-16 5:32 ` Patrick Steinhardt @ 2024-10-16 8:47 ` Wolfgang Müller 2024-10-16 8:57 ` Patrick Steinhardt 0 siblings, 1 reply; 36+ messages in thread From: Wolfgang Müller @ 2024-10-16 8:47 UTC (permalink / raw) To: Patrick Steinhardt; +Cc: git On 2024-10-16 07:32, Patrick Steinhardt wrote: > On Tue, Oct 15, 2024 at 01:48:26PM +0200, Wolfgang Müller wrote: > > diff --git a/t/t4201-shortlog.sh b/t/t4201-shortlog.sh > > index c20c885724..ed39c67ba1 100755 > > --- a/t/t4201-shortlog.sh > > +++ b/t/t4201-shortlog.sh > > @@ -143,6 +143,11 @@ fuzz() > > test_grep "too many arguments" out > > ' > > > > +test_expect_success 'shortlog --author from non-git directory does not segfault' ' > > + git log --no-expand-tabs HEAD >log && > > + env GIT_DIR=non-existing git shortlog --author=author <log 2>out > > +' > > + > > I'd like to see another testcase added that exercises behaviour when > git-shortlog(1) is passed SHA256 output outside of a repo, as I'm > curious how it'll behave in that scenario. I had a look at this in builtin/shortlog.c's read_from_stdin() and am pretty sure that git-shortlog(1), when reading from stdin, simply ignores anything but either the "Author:" or "Commit:" lines, depending on the value given by --group. The --group=format: option is not supported when reading from stdin. Neither is --format handled at all. So I don't think there is actually a way to make git-shortlog(1) encounter and handle a commit hash when reading from stdin; the hash algorithm seems completely meaningless for its user-facing behaviour. As far as I have seen the closest it comes to getting into contact with a hash (or more specifically, hexsz) is when cmd_shortlog() sets: log.abbrev = rev.abbrev; This relies on the parsing machinery in parse_revision_opt() - the one this patch is for. Technically --abbrev is honored by git-shortlog(1) when reading from stdin, but its value goes unused because of the difference in code paths when reading from stdin. Do take this with a grain of salt, however, this is my first look at the inner workings of git-shortlog(1). As for the test, I'd be happy to provide one if this is still deemed necessary after considering the above. There's two questions I have: 1) Is this already covered by GIT_TEST_DEFAULT_HASH=sha256? Running the t4201-shortlog testsuite with that passes. 2) I've already experimented with setting up a test for this and am unsure how to cleanly set up a sha256 repository. Ordinarily it should be a simple init/add (perhaps with test_commit), but t4201-shortlog is already running within a git repository if I understand the setup step correctly. Is there a clean way to escape from there, or would it simply be fine creating another repository within that one? Thanks! -- Wolf ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFC PATCH v2] builtin/shortlog: explicitly set hash algo when there is no repo 2024-10-16 8:47 ` Wolfgang Müller @ 2024-10-16 8:57 ` Patrick Steinhardt 2024-10-16 9:07 ` Wolfgang Müller 2024-10-16 9:48 ` Wolfgang Müller 0 siblings, 2 replies; 36+ messages in thread From: Patrick Steinhardt @ 2024-10-16 8:57 UTC (permalink / raw) To: git On Wed, Oct 16, 2024 at 10:47:03AM +0200, Wolfgang Müller wrote: > On 2024-10-16 07:32, Patrick Steinhardt wrote: > > On Tue, Oct 15, 2024 at 01:48:26PM +0200, Wolfgang Müller wrote: > > > diff --git a/t/t4201-shortlog.sh b/t/t4201-shortlog.sh > > > index c20c885724..ed39c67ba1 100755 > > > --- a/t/t4201-shortlog.sh > > > +++ b/t/t4201-shortlog.sh > > > @@ -143,6 +143,11 @@ fuzz() > > > test_grep "too many arguments" out > > > ' > > > > > > +test_expect_success 'shortlog --author from non-git directory does not segfault' ' > > > + git log --no-expand-tabs HEAD >log && > > > + env GIT_DIR=non-existing git shortlog --author=author <log 2>out > > > +' > > > + > > > > I'd like to see another testcase added that exercises behaviour when > > git-shortlog(1) is passed SHA256 output outside of a repo, as I'm > > curious how it'll behave in that scenario. > > I had a look at this in builtin/shortlog.c's read_from_stdin() and am > pretty sure that git-shortlog(1), when reading from stdin, simply > ignores anything but either the "Author:" or "Commit:" lines, depending > on the value given by --group. The --group=format: option is not > supported when reading from stdin. Neither is --format handled at all. > > So I don't think there is actually a way to make git-shortlog(1) > encounter and handle a commit hash when reading from stdin; the hash > algorithm seems completely meaningless for its user-facing behaviour. As > far as I have seen the closest it comes to getting into contact with a > hash (or more specifically, hexsz) is when cmd_shortlog() sets: > > log.abbrev = rev.abbrev; > > This relies on the parsing machinery in parse_revision_opt() - the one > this patch is for. Technically --abbrev is honored by git-shortlog(1) > when reading from stdin, but its value goes unused because of the > difference in code paths when reading from stdin. Do take this with a > grain of salt, however, this is my first look at the inner workings of > git-shortlog(1). Okay, thanks for the explanation. Given that we do set `log.abbrev` I think we should be hitting code paths in git-shortlog(1) that use it. `git shortlog --format=%h` for example would use `log.abbrev`, wouldn't it? It would be nice to figure out whether this can be made to misbehave based on which object hash we have in the input. > As for the test, I'd be happy to provide one if this is still deemed > necessary after considering the above. There's two questions I have: > > 1) Is this already covered by GIT_TEST_DEFAULT_HASH=sha256? Running the > t4201-shortlog testsuite with that passes. I think it doesn't hurt to have an explicit test for this scenario, even if it just demonstrates that things don't crash or behave in weird ways. > 2) I've already experimented with setting up a test for this and am > unsure how to cleanly set up a sha256 repository. Ordinarily it should > be a simple init/add (perhaps with test_commit), but t4201-shortlog is > already running within a git repository if I understand the setup step > correctly. Is there a clean way to escape from there, or would it simply > be fine creating another repository within that one? You can take e.g. b2dbf97f47 (builtin/index-pack: fix segfaults when running outside of a repo, 2024-09-04) as inspiration for how to achieve this. Patrick ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFC PATCH v2] builtin/shortlog: explicitly set hash algo when there is no repo 2024-10-16 8:57 ` Patrick Steinhardt @ 2024-10-16 9:07 ` Wolfgang Müller 2024-10-16 18:52 ` Taylor Blau 2024-10-16 9:48 ` Wolfgang Müller 1 sibling, 1 reply; 36+ messages in thread From: Wolfgang Müller @ 2024-10-16 9:07 UTC (permalink / raw) To: Patrick Steinhardt; +Cc: git On 2024-10-16 10:57, Patrick Steinhardt wrote: > Given that we do set `log.abbrev` I think we should be hitting code > paths in git-shortlog(1) that use it. `git shortlog --format=%h` for > example would use `log.abbrev`, wouldn't it? It would be nice to figure > out whether this can be made to misbehave based on which object hash we > have in the input. I did try this, but like I said, --format seems not supported when git-shortlog(1) is reading from stdin. It always outputs the same summary format, either grouped on authors or committers. This is not explicitly documented anywhere - the manual only says that "git shortlog will output a summary of the log read from standard input" and then goes on to document all the options with no mention of a difference in behaviour when reading from stdin. So I'm still not entirely convinced that this is impossible to trigger (also given the complexity of the argument parsing machinery), but I have not been able to find a way. > I think it doesn't hurt to have an explicit test for this scenario, even > if it just demonstrates that things don't crash or behave in weird ways. Sure, that makes sense. > You can take e.g. b2dbf97f47 (builtin/index-pack: fix segfaults when > running outside of a repo, 2024-09-04) as inspiration for how to achieve > this. Thanks much for the pointer, I'll have a look! -- Wolf ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFC PATCH v2] builtin/shortlog: explicitly set hash algo when there is no repo 2024-10-16 9:07 ` Wolfgang Müller @ 2024-10-16 18:52 ` Taylor Blau 2024-10-16 19:01 ` Wolfgang Müller 2024-10-17 5:04 ` Patrick Steinhardt 0 siblings, 2 replies; 36+ messages in thread From: Taylor Blau @ 2024-10-16 18:52 UTC (permalink / raw) To: Patrick Steinhardt, git On Wed, Oct 16, 2024 at 11:07:02AM +0200, Wolfgang Müller wrote: > On 2024-10-16 10:57, Patrick Steinhardt wrote: > > Given that we do set `log.abbrev` I think we should be hitting code > > paths in git-shortlog(1) that use it. `git shortlog --format=%h` for > > example would use `log.abbrev`, wouldn't it? It would be nice to figure > > out whether this can be made to misbehave based on which object hash we > > have in the input. > > I did try this, but like I said, --format seems not supported when > git-shortlog(1) is reading from stdin. It always outputs the same > summary format, either grouped on authors or committers. This is not > explicitly documented anywhere - the manual only says that "git shortlog > will output a summary of the log read from standard input" and then goes > on to document all the options with no mention of a difference in > behaviour when reading from stdin. So I'm still not entirely convinced > that this is impossible to trigger (also given the complexity of the > argument parsing machinery), but I have not been able to find a way. Yeah, I think this is correct. For the purposes of this series, I think that what Woflgang has done is sufficient on the testing front. As a nice piece of #leftoverbits, it would be appreciated to have a patch that amends git-shortlog(1) to indicate that '--format' is ignored when reading input from stdin. Thanks, Taylor ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFC PATCH v2] builtin/shortlog: explicitly set hash algo when there is no repo 2024-10-16 18:52 ` Taylor Blau @ 2024-10-16 19:01 ` Wolfgang Müller 2024-10-17 5:04 ` Patrick Steinhardt 1 sibling, 0 replies; 36+ messages in thread From: Wolfgang Müller @ 2024-10-16 19:01 UTC (permalink / raw) To: Taylor Blau; +Cc: Patrick Steinhardt, git On 2024-10-16 14:52, Taylor Blau wrote: > As a nice piece of #leftoverbits, it would be appreciated to have a > patch that amends git-shortlog(1) to indicate that '--format' is > ignored when reading input from stdin. Think I shall give that a look in the coming days, then, this is something I'd like to see fixed as well. Immediately what comes to my mind is that not only --format is ignored (--author is too if I'm not mistaken), so I'll see whether there's a nice way to indicate which specific options are valid when git-shortlog(1) reads from stdin. -- Wolf ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFC PATCH v2] builtin/shortlog: explicitly set hash algo when there is no repo 2024-10-16 18:52 ` Taylor Blau 2024-10-16 19:01 ` Wolfgang Müller @ 2024-10-17 5:04 ` Patrick Steinhardt 1 sibling, 0 replies; 36+ messages in thread From: Patrick Steinhardt @ 2024-10-17 5:04 UTC (permalink / raw) To: Taylor Blau; +Cc: git On Wed, Oct 16, 2024 at 02:52:56PM -0400, Taylor Blau wrote: > On Wed, Oct 16, 2024 at 11:07:02AM +0200, Wolfgang Müller wrote: > > On 2024-10-16 10:57, Patrick Steinhardt wrote: > > > Given that we do set `log.abbrev` I think we should be hitting code > > > paths in git-shortlog(1) that use it. `git shortlog --format=%h` for > > > example would use `log.abbrev`, wouldn't it? It would be nice to figure > > > out whether this can be made to misbehave based on which object hash we > > > have in the input. > > > > I did try this, but like I said, --format seems not supported when > > git-shortlog(1) is reading from stdin. It always outputs the same > > summary format, either grouped on authors or committers. This is not > > explicitly documented anywhere - the manual only says that "git shortlog > > will output a summary of the log read from standard input" and then goes > > on to document all the options with no mention of a difference in > > behaviour when reading from stdin. So I'm still not entirely convinced > > that this is impossible to trigger (also given the complexity of the > > argument parsing machinery), but I have not been able to find a way. > > Yeah, I think this is correct. For the purposes of this series, I think > that what Woflgang has done is sufficient on the testing front. > > As a nice piece of #leftoverbits, it would be appreciated to have a > patch that amends git-shortlog(1) to indicate that '--format' is ignored > when reading input from stdin. Yeah, I'm happy with the in-depth explanation that he gave and am fine with not adding a test. Patrick ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFC PATCH v2] builtin/shortlog: explicitly set hash algo when there is no repo 2024-10-16 8:57 ` Patrick Steinhardt 2024-10-16 9:07 ` Wolfgang Müller @ 2024-10-16 9:48 ` Wolfgang Müller 2024-10-16 19:01 ` Taylor Blau 1 sibling, 1 reply; 36+ messages in thread From: Wolfgang Müller @ 2024-10-16 9:48 UTC (permalink / raw) To: Patrick Steinhardt; +Cc: git On 2024-10-16 10:57, Patrick Steinhardt wrote: > Given that we do set `log.abbrev` I think we should be hitting code > paths in git-shortlog(1) that use it. `git shortlog --format=%h` for > example would use `log.abbrev`, wouldn't it? It would be nice to > figure out whether this can be made to misbehave based on which object > hash we have in the input. I dove into the code again and now I'm fairly sure custom formatting is only ever done when in a repository. shortlog_output() itself, called at the end of cmd_shortlog(), doesn't do any formatting, only possibly wrapping the lines already present in the shortlog struct. That struct is filled either by read_from_stdin() or get_from_rev(). The latter is only ever called when in a repository: > if (!nongit && !rev.pending.nr && isatty(0)) > add_head_to_pending(&rev); > if (rev.pending.nr == 0) { > if (isatty(0)) > fprintf(stderr, _("(reading log message from standard input)\n")); > read_from_stdin(&log); > } > else > get_from_rev(&rev, &log); get_from_rev() handles formatting through shortlog_add_commit(), directly formatting the commit into the oneline buffer that is later added to the shortlog struct: > if (!log->summary) { > if (log->user_format) > pretty_print_commit(&ctx, commit, &oneline); > else > repo_format_commit_message(the_repository, commit, > "%s", &oneline, &ctx); > } > oneline_str = oneline.len ? oneline.buf : "<none>"; > > insert_records_from_trailers(log, &dups, commit, &ctx, oneline_str); > insert_records_from_format(log, &dups, commit, &ctx, oneline_str); read_from_stdin(), in comparison, skips all lines from stdin not matching either "Author: " or "Commit: ", and then adds only the author/committer name and the subject to its log: > const char *v; > if (!skip_prefix(ident.buf, match[0], &v) && > !skip_prefix(ident.buf, match[1], &v)) > continue; > while (strbuf_getline_lf(&oneline, stdin) != EOF && > oneline.len) > ; /* discard headers */ > while (strbuf_getline_lf(&oneline, stdin) != EOF && > !oneline.len) > ; /* discard blanks */ > > strbuf_reset(&mapped_ident); > if (parse_ident(log, &mapped_ident, v) < 0) > continue; > > insert_one_record(log, mapped_ident.buf, oneline.buf); So whilst we parse all the relevant options like --abbrev and --format, we take a shortcut through read_from_stdin() and never get to apply a custom format. Commit hashes from stdin are discarded. I'm not sure a test case for different hash algorithms would test anything meaningful here, unless the plan in the future is to have git-shortlog(1) support formatting when reading from stdin. -- Wolf ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFC PATCH v2] builtin/shortlog: explicitly set hash algo when there is no repo 2024-10-16 9:48 ` Wolfgang Müller @ 2024-10-16 19:01 ` Taylor Blau 2024-10-16 19:14 ` Wolfgang Müller 0 siblings, 1 reply; 36+ messages in thread From: Taylor Blau @ 2024-10-16 19:01 UTC (permalink / raw) To: Patrick Steinhardt, git On Wed, Oct 16, 2024 at 11:48:24AM +0200, Wolfgang Müller wrote: > On 2024-10-16 10:57, Patrick Steinhardt wrote: > > Given that we do set `log.abbrev` I think we should be hitting code > > paths in git-shortlog(1) that use it. `git shortlog --format=%h` for > > example would use `log.abbrev`, wouldn't it? It would be nice to > > figure out whether this can be made to misbehave based on which object > > hash we have in the input. > > I dove into the code again and now I'm fairly sure custom formatting is > only ever done when in a repository. shortlog_output() itself, called at > the end of cmd_shortlog(), doesn't do any formatting, only possibly > wrapping the lines already present in the shortlog struct. > > That struct is filled either by read_from_stdin() or get_from_rev(). The > latter is only ever called when in a repository: > > [...] Thanks; I agree with your analysis here. > So whilst we parse all the relevant options like --abbrev and --format, > we take a shortcut through read_from_stdin() and never get to apply a > custom format. Commit hashes from stdin are discarded. > > I'm not sure a test case for different hash algorithms would test > anything meaningful here, unless the plan in the future is to have > git-shortlog(1) support formatting when reading from stdin. I think that in general it would be difficult to support the full range of --format specifiers when operating outside of a repository, because we don't have all of the information necessary to assemble all of the possible formatting options. For instance, let's say I want to take Patrick's example to test 'git shortlog' with '--format="%H"' outside of the repository. There's no way to disambiguate whether, say, a SHA-256 hash is either (a) a correctly formatted SHA-256 hash, or (b) a corrupted / too-long SHA-1 hash. So that means that '%H', '%h', '%T', and '%t' are off the table. '%an' and '%ae' seem reasonable to implement, but '%aN' and '%aE' less so, because we don't have a .mailmap file to read. The same goes for the committer variants of all of those. I don't think there is any reasonable interpretation of '%d'/'%D', and likewise for '%(decorate)' as well as '%(describe)'. We could probably go on, but I am getting tired of looking through the 'PRETTY FORMATS' section of git-log(1) and trying to figure out how they'd work (or not) without a repository ;-). In any event, my feeling is that while we could probably implement a handful of these formatting options, that it would likely be kind of awkward to do so. Not to mention the user-visible awkwardness of supporting some '--format' specifiers but not others[^1]. So I think that the best course of action would be to document the limitation and move on ;-). Thanks, Taylor [^1]: Playing devil's advocate, though, perhaps it is OK to document well which formatting options do and don't work, and accept that a user asking for '--format="%(describe)"' (etc.) outside of a repository is nonsensical and warn / return nothing appropriately. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFC PATCH v2] builtin/shortlog: explicitly set hash algo when there is no repo 2024-10-16 19:01 ` Taylor Blau @ 2024-10-16 19:14 ` Wolfgang Müller 0 siblings, 0 replies; 36+ messages in thread From: Wolfgang Müller @ 2024-10-16 19:14 UTC (permalink / raw) To: Taylor Blau; +Cc: Patrick Steinhardt, git On 2024-10-16 15:01, Taylor Blau wrote: > So that means that '%H', '%h', '%T', and '%t' are off the table. '%an' > and '%ae' seem reasonable to implement, but '%aN' and '%aE' less so, > because we don't have a .mailmap file to read. The same goes for the > committer variants of all of those. An interesting tidbit here is that git-shortlog(1) actually does read a .mailmap file when processing stdin, but it takes it from the current directory instead. Having read the full manual today, it's one of the only times it mentions a difference in behaviour regarding standard input. -- Wolf ^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH v3 0/2] builtin/shortlog: explicitly set hash algo when there is no repo 2024-10-11 18:34 [RFC PATCH] builtin/shortlog: explicitly set hash algo when there is no repo Wolfgang Müller 2024-10-15 9:33 ` Wolfgang Müller 2024-10-15 11:48 ` [RFC PATCH v2] " Wolfgang Müller @ 2024-10-16 18:21 ` Wolfgang Müller 2024-10-16 18:21 ` [PATCH v3 1/2] " Wolfgang Müller ` (2 more replies) 2024-10-17 9:35 ` [PATCH v4] " Wolfgang Müller 3 siblings, 3 replies; 36+ messages in thread From: Wolfgang Müller @ 2024-10-16 18:21 UTC (permalink / raw) To: git; +Cc: Wolfgang Müller Here's v3 of this series. CI should now pass as the code has been moved below any declarations. The regression test has been simplified, thanks to Eric's input. I added another commit with a test for git-shortlog(1) that makes sure it can process logs obtained from a repo with --object-format=sha256. Thanks to Patrick for the suggestion and pointers. Wolfgang Müller (2): builtin/shortlog: explicitly set hash algo when there is no repo shortlog: Test reading a log from a SHA256 repo in a non-git directory builtin/shortlog.c | 12 ++++++++++++ t/t4201-shortlog.sh | 12 ++++++++++++ 2 files changed, 24 insertions(+) Range-diff against v2: 1: d3047a0291 ! 1: 4813b458ac builtin/shortlog: explicitly set hash algo when there is no repo @@ Commit message add a regression test for the segfault. Signed-off-by: Wolfgang Müller <wolf@oriole.systems> + Thanks-to: Eric Sunshine <sunshine@sunshineco.com> ## builtin/shortlog.c ## @@ builtin/shortlog.c: int cmd_shortlog(int argc, - struct rev_info rev; - int nongit = !startup_info->have_repository; + + struct parse_opt_ctx_t ctx; + /* + * NEEDSWORK: Later on we'll call parse_revision_opt which relies on @@ builtin/shortlog.c: int cmd_shortlog(int argc, + if (nongit && !the_hash_algo) + repo_set_hash_algo(the_repository, GIT_HASH_SHA1); + - const struct option options[] = { - OPT_BIT('c', "committer", &log.groups, - N_("group by committer rather than author"), + git_config(git_default_config, NULL); + shortlog_init(&log); + repo_init_revisions(the_repository, &rev, prefix); ## t/t4201-shortlog.sh ## @@ t/t4201-shortlog.sh: fuzz() @@ t/t4201-shortlog.sh: fuzz() ' +test_expect_success 'shortlog --author from non-git directory does not segfault' ' -+ git log --no-expand-tabs HEAD >log && -+ env GIT_DIR=non-existing git shortlog --author=author <log 2>out ++ echo | nongit git shortlog --author=author +' + test_expect_success 'shortlog should add newline when input line matches wraplen' ' -: ---------- > 2: 9dfdc7510f shortlog: Test reading a log from a SHA256 repo in a non-git directory -- 2.47.0 ^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH v3 1/2] builtin/shortlog: explicitly set hash algo when there is no repo 2024-10-16 18:21 ` [PATCH v3 0/2] " Wolfgang Müller @ 2024-10-16 18:21 ` Wolfgang Müller 2024-10-16 19:22 ` Taylor Blau 2024-10-16 18:21 ` [PATCH v3 2/2] shortlog: Test reading a log from a SHA256 repo in a non-git directory Wolfgang Müller 2024-10-16 19:32 ` [PATCH v3 0/2] builtin/shortlog: explicitly set hash algo when there is no repo Taylor Blau 2 siblings, 1 reply; 36+ messages in thread From: Wolfgang Müller @ 2024-10-16 18:21 UTC (permalink / raw) To: git; +Cc: Wolfgang Müller Whilst git-shortlog(1) does not explicitly need any repository information when run without reference to one, it still parses some of its arguments with parse_revision_opt() which assumes that the hash algorithm is set. However, in c8aed5e8da (repository: stop setting SHA1 as the default object hash, 2024-05-07) we stopped setting up a default hash algorithm and instead require commands to set it up explicitly. This was done for most other commands like in ab274909d4 (builtin/diff: explicitly set hash algo when there is no repo, 2024-05-07) but was missed for builtin/shortlog, making git-shortlog(1) segfault outside of a repository when given arguments like --author that trigger a call to parse_revision_opt(). Fix this for now by explicitly setting the hash algorithm to SHA1. Also add a regression test for the segfault. Signed-off-by: Wolfgang Müller <wolf@oriole.systems> Thanks-to: Eric Sunshine <sunshine@sunshineco.com> --- builtin/shortlog.c | 12 ++++++++++++ t/t4201-shortlog.sh | 4 ++++ 2 files changed, 16 insertions(+) diff --git a/builtin/shortlog.c b/builtin/shortlog.c index 3ed5c46078..c86b75d981 100644 --- a/builtin/shortlog.c +++ b/builtin/shortlog.c @@ -407,6 +407,18 @@ int cmd_shortlog(int argc, struct parse_opt_ctx_t ctx; + /* + * NEEDSWORK: Later on we'll call parse_revision_opt which relies on + * the hash algorithm being set but since we are operating outside of a + * Git repository we cannot determine one. This is only needed because + * parse_revision_opt expects hexsz for --abbrev which is irrelevant + * for shortlog outside of a git repository. For now explicitly set + * SHA1, but ideally the parsing machinery would be split between + * git/nongit so that we do not have to do this. + */ + if (nongit && !the_hash_algo) + repo_set_hash_algo(the_repository, GIT_HASH_SHA1); + git_config(git_default_config, NULL); shortlog_init(&log); repo_init_revisions(the_repository, &rev, prefix); diff --git a/t/t4201-shortlog.sh b/t/t4201-shortlog.sh index c20c885724..0cc2aa5089 100755 --- a/t/t4201-shortlog.sh +++ b/t/t4201-shortlog.sh @@ -143,6 +143,10 @@ fuzz() test_grep "too many arguments" out ' +test_expect_success 'shortlog --author from non-git directory does not segfault' ' + echo | nongit git shortlog --author=author +' + test_expect_success 'shortlog should add newline when input line matches wraplen' ' cat >expect <<\EOF && A U Thor (2): -- 2.47.0 ^ permalink raw reply related [flat|nested] 36+ messages in thread
* Re: [PATCH v3 1/2] builtin/shortlog: explicitly set hash algo when there is no repo 2024-10-16 18:21 ` [PATCH v3 1/2] " Wolfgang Müller @ 2024-10-16 19:22 ` Taylor Blau 2024-10-16 19:37 ` Wolfgang Müller 2024-10-17 11:58 ` Patrick Steinhardt 0 siblings, 2 replies; 36+ messages in thread From: Taylor Blau @ 2024-10-16 19:22 UTC (permalink / raw) To: Wolfgang Müller; +Cc: git On Wed, Oct 16, 2024 at 08:21:23PM +0200, Wolfgang Müller wrote: > Whilst git-shortlog(1) does not explicitly need any repository > information when run without reference to one, it still parses some of > its arguments with parse_revision_opt() which assumes that the hash > algorithm is set. However, in c8aed5e8da (repository: stop setting SHA1 > as the default object hash, 2024-05-07) we stopped setting up a default > hash algorithm and instead require commands to set it up explicitly. > > This was done for most other commands like in ab274909d4 (builtin/diff: > explicitly set hash algo when there is no repo, 2024-05-07) but was > missed for builtin/shortlog, making git-shortlog(1) segfault outside of > a repository when given arguments like --author that trigger a call to > parse_revision_opt(). Good analysis. > Fix this for now by explicitly setting the hash algorithm to SHA1. Also > add a regression test for the segfault. Makes sense. > Signed-off-by: Wolfgang Müller <wolf@oriole.systems> > Thanks-to: Eric Sunshine <sunshine@sunshineco.com> In the future, please ensure that your Signed-off-by is the last trailer in the section to indicate that you have certified everything above it (which in that case would include your Thanks-to here). > @@ -143,6 +143,10 @@ fuzz() > test_grep "too many arguments" out > ' > > +test_expect_success 'shortlog --author from non-git directory does not segfault' ' > + echo | nongit git shortlog --author=author > +' > + Instead of 'echo |', I wonder if it would be just as good to write this test as: nongit git shortlog --author=author </dev/null Thanks, Taylor ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v3 1/2] builtin/shortlog: explicitly set hash algo when there is no repo 2024-10-16 19:22 ` Taylor Blau @ 2024-10-16 19:37 ` Wolfgang Müller 2024-10-17 11:58 ` Patrick Steinhardt 1 sibling, 0 replies; 36+ messages in thread From: Wolfgang Müller @ 2024-10-16 19:37 UTC (permalink / raw) To: Taylor Blau; +Cc: git > Instead of 'echo |', I wonder if it would be just as good to write this > test as: > > nongit git shortlog --author=author </dev/null Existing tests seem to prefer </dev/null overwhelmingly, so I think I'll change that. Shouldn't make a difference to the test itself. -- Wolf ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v3 1/2] builtin/shortlog: explicitly set hash algo when there is no repo 2024-10-16 19:22 ` Taylor Blau 2024-10-16 19:37 ` Wolfgang Müller @ 2024-10-17 11:58 ` Patrick Steinhardt 2024-10-17 12:09 ` Wolfgang Müller 1 sibling, 1 reply; 36+ messages in thread From: Patrick Steinhardt @ 2024-10-17 11:58 UTC (permalink / raw) To: Taylor Blau; +Cc: Wolfgang Müller, git On Wed, Oct 16, 2024 at 03:22:04PM -0400, Taylor Blau wrote: > On Wed, Oct 16, 2024 at 08:21:23PM +0200, Wolfgang Müller wrote: > > Whilst git-shortlog(1) does not explicitly need any repository > > information when run without reference to one, it still parses some of > > its arguments with parse_revision_opt() which assumes that the hash > > algorithm is set. However, in c8aed5e8da (repository: stop setting SHA1 > > as the default object hash, 2024-05-07) we stopped setting up a default > > hash algorithm and instead require commands to set it up explicitly. > > > > This was done for most other commands like in ab274909d4 (builtin/diff: > > explicitly set hash algo when there is no repo, 2024-05-07) but was > > missed for builtin/shortlog, making git-shortlog(1) segfault outside of > > a repository when given arguments like --author that trigger a call to > > parse_revision_opt(). > > Good analysis. > > > Fix this for now by explicitly setting the hash algorithm to SHA1. Also > > add a regression test for the segfault. > > Makes sense. > > > Signed-off-by: Wolfgang Müller <wolf@oriole.systems> > > Thanks-to: Eric Sunshine <sunshine@sunshineco.com> > > In the future, please ensure that your Signed-off-by is the last trailer > in the section to indicate that you have certified everything above it > (which in that case would include your Thanks-to here). I think it's also more common to use "Helped-by" instead of "Thanks-to". I see you have the same trailer in v4, but don't necessarily think that it is a good-enough reason to reroll. Patrick ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v3 1/2] builtin/shortlog: explicitly set hash algo when there is no repo 2024-10-17 11:58 ` Patrick Steinhardt @ 2024-10-17 12:09 ` Wolfgang Müller 2024-10-17 12:11 ` Patrick Steinhardt 0 siblings, 1 reply; 36+ messages in thread From: Wolfgang Müller @ 2024-10-17 12:09 UTC (permalink / raw) To: Patrick Steinhardt; +Cc: Taylor Blau, git On 2024-10-17 13:58, Patrick Steinhardt wrote: > I think it's also more common to use "Helped-by" instead of "Thanks-to". > I see you have the same trailer in v4, but don't necessarily think that > it is a good-enough reason to reroll. That is good to know, thanks. Is there by chance any document that outlines usage guidelines for when to use which trailer or is it more of an undocumented custom? -- Wolf ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v3 1/2] builtin/shortlog: explicitly set hash algo when there is no repo 2024-10-17 12:09 ` Wolfgang Müller @ 2024-10-17 12:11 ` Patrick Steinhardt 0 siblings, 0 replies; 36+ messages in thread From: Patrick Steinhardt @ 2024-10-17 12:11 UTC (permalink / raw) To: Taylor Blau, git On Thu, Oct 17, 2024 at 02:09:12PM +0200, Wolfgang Müller wrote: > On 2024-10-17 13:58, Patrick Steinhardt wrote: > > I think it's also more common to use "Helped-by" instead of "Thanks-to". > > I see you have the same trailer in v4, but don't necessarily think that > > it is a good-enough reason to reroll. > > That is good to know, thanks. Is there by chance any document that > outlines usage guidelines for when to use which trailer or is it more of > an undocumented custom? We have Documentation/SubmittingPatches, which has a section named "commit-trailers". Patrick ^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH v3 2/2] shortlog: Test reading a log from a SHA256 repo in a non-git directory 2024-10-16 18:21 ` [PATCH v3 0/2] " Wolfgang Müller 2024-10-16 18:21 ` [PATCH v3 1/2] " Wolfgang Müller @ 2024-10-16 18:21 ` Wolfgang Müller 2024-10-16 19:25 ` Taylor Blau 2024-10-16 19:32 ` [PATCH v3 0/2] builtin/shortlog: explicitly set hash algo when there is no repo Taylor Blau 2 siblings, 1 reply; 36+ messages in thread From: Wolfgang Müller @ 2024-10-16 18:21 UTC (permalink / raw) To: git; +Cc: Wolfgang Müller git-shortlog(1) might be used outside of a git repository to process a log obtained from a repository initialized with --object-format=sha256. Since git-shortlog(1) has no information on the hash algorithm, make sure that it can still successfully process the log regardless. Signed-off-by: Wolfgang Müller <wolf@oriole.systems> Thanks-to: Patrick Steinhardt <ps@pks.im> --- t/t4201-shortlog.sh | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/t/t4201-shortlog.sh b/t/t4201-shortlog.sh index 0cc2aa5089..50d987cbe4 100755 --- a/t/t4201-shortlog.sh +++ b/t/t4201-shortlog.sh @@ -147,6 +147,14 @@ fuzz() echo | nongit git shortlog --author=author ' +test_expect_success 'shortlog from non-git directory reads log from SHA256 repository' ' + test_when_finished "rm -rf repo" && + git init --object-format=sha256 repo && + test_commit -C repo initial && + git -C repo log HEAD >log && + nongit git shortlog <log +' + test_expect_success 'shortlog should add newline when input line matches wraplen' ' cat >expect <<\EOF && A U Thor (2): -- 2.47.0 ^ permalink raw reply related [flat|nested] 36+ messages in thread
* Re: [PATCH v3 2/2] shortlog: Test reading a log from a SHA256 repo in a non-git directory 2024-10-16 18:21 ` [PATCH v3 2/2] shortlog: Test reading a log from a SHA256 repo in a non-git directory Wolfgang Müller @ 2024-10-16 19:25 ` Taylor Blau 2024-10-16 19:35 ` Wolfgang Müller 0 siblings, 1 reply; 36+ messages in thread From: Taylor Blau @ 2024-10-16 19:25 UTC (permalink / raw) To: Wolfgang Müller; +Cc: git On Wed, Oct 16, 2024 at 08:21:24PM +0200, Wolfgang Müller wrote: > git-shortlog(1) might be used outside of a git repository to process a > log obtained from a repository initialized with --object-format=sha256. > Since git-shortlog(1) has no information on the hash algorithm, make > sure that it can still successfully process the log regardless. > > Signed-off-by: Wolfgang Müller <wolf@oriole.systems> > Thanks-to: Patrick Steinhardt <ps@pks.im> Same note here about ordering Signed-off-by: and Thanks-to: lines as in the previous patch. > @@ -147,6 +147,14 @@ fuzz() > echo | nongit git shortlog --author=author > ' > > +test_expect_success 'shortlog from non-git directory reads log from SHA256 repository' ' > + test_when_finished "rm -rf repo" && > + git init --object-format=sha256 repo && I wondered why we needed a separate repo for this test, but I understand now that we need to have a SHA-256 repository to test this on. I'm still dubious as to the value of this test overall, but I think it may be nice to s/repo/sha256-repo/ or something here to indicate its purpose more clearly in the test itself. I don't feel strongly about it, though. Thanks, Taylor ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v3 2/2] shortlog: Test reading a log from a SHA256 repo in a non-git directory 2024-10-16 19:25 ` Taylor Blau @ 2024-10-16 19:35 ` Wolfgang Müller 2024-10-16 19:45 ` Taylor Blau 0 siblings, 1 reply; 36+ messages in thread From: Wolfgang Müller @ 2024-10-16 19:35 UTC (permalink / raw) To: Taylor Blau; +Cc: git On 2024-10-16 15:25, Taylor Blau wrote: > Same note here about ordering Signed-off-by: and Thanks-to: lines as in > the previous patch. Shall keep in mind, thanks! > > @@ -147,6 +147,14 @@ fuzz() > > echo | nongit git shortlog --author=author > > ' > > > > +test_expect_success 'shortlog from non-git directory reads log from SHA256 repository' ' > > + test_when_finished "rm -rf repo" && > > + git init --object-format=sha256 repo && > > I wondered why we needed a separate repo for this test, but I understand > now that we need to have a SHA-256 repository to test this on. > > I'm still dubious as to the value of this test overall, but I think it > may be nice to s/repo/sha256-repo/ or something here to indicate its > purpose more clearly in the test itself. > > I don't feel strongly about it, though. I personally lean towards skipping out on this test, but I can also see value in it coming from a point of view of future changes to git-shortlog(1) behaviour. Then again, perhaps it would be better then to add a relevant test only if new behaviour is added. No strong feelings here either, I'd defer to what regular maintainers and developers here prefer :-) -- Wolf ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v3 2/2] shortlog: Test reading a log from a SHA256 repo in a non-git directory 2024-10-16 19:35 ` Wolfgang Müller @ 2024-10-16 19:45 ` Taylor Blau 0 siblings, 0 replies; 36+ messages in thread From: Taylor Blau @ 2024-10-16 19:45 UTC (permalink / raw) To: git On Wed, Oct 16, 2024 at 09:35:46PM +0200, Wolfgang Müller wrote: > > > @@ -147,6 +147,14 @@ fuzz() > > > echo | nongit git shortlog --author=author > > > ' > > > > > > +test_expect_success 'shortlog from non-git directory reads log from SHA256 repository' ' > > > + test_when_finished "rm -rf repo" && > > > + git init --object-format=sha256 repo && > > > > I wondered why we needed a separate repo for this test, but I understand > > now that we need to have a SHA-256 repository to test this on. > > > > I'm still dubious as to the value of this test overall, but I think it > > may be nice to s/repo/sha256-repo/ or something here to indicate its > > purpose more clearly in the test itself. > > > > I don't feel strongly about it, though. > > I personally lean towards skipping out on this test, but I can also see > value in it coming from a point of view of future changes to > git-shortlog(1) behaviour. Then again, perhaps it would be better then > to add a relevant test only if new behaviour is added. No strong > feelings here either, I'd defer to what regular maintainers and > developers here prefer :-) I have a vague preference towards just getting rid of it, but I wouldn't be sad to see it stay, either. Thanks, Taylor ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v3 0/2] builtin/shortlog: explicitly set hash algo when there is no repo 2024-10-16 18:21 ` [PATCH v3 0/2] " Wolfgang Müller 2024-10-16 18:21 ` [PATCH v3 1/2] " Wolfgang Müller 2024-10-16 18:21 ` [PATCH v3 2/2] shortlog: Test reading a log from a SHA256 repo in a non-git directory Wolfgang Müller @ 2024-10-16 19:32 ` Taylor Blau 2024-10-16 19:38 ` Wolfgang Müller 2 siblings, 1 reply; 36+ messages in thread From: Taylor Blau @ 2024-10-16 19:32 UTC (permalink / raw) To: Wolfgang Müller; +Cc: git On Wed, Oct 16, 2024 at 08:21:22PM +0200, Wolfgang Müller wrote: > Here's v3 of this series. CI should now pass as the code has been moved > below any declarations. The regression test has been simplified, thanks > to Eric's input. > > I added another commit with a test for git-shortlog(1) that makes sure > it can process logs obtained from a repo with --object-format=sha256. > Thanks to Patrick for the suggestion and pointers. > > Wolfgang Müller (2): > builtin/shortlog: explicitly set hash algo when there is no repo > shortlog: Test reading a log from a SHA256 repo in a non-git directory Thanks. I took a look at this new round and it is looking close to me. I had a couple of small-ish comments that I think would be good to address in a new round. But if you feel strongly that the current round is good as-is, I'm happy to hear arguments in that direction, too ;-). Thanks, Taylor ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v3 0/2] builtin/shortlog: explicitly set hash algo when there is no repo 2024-10-16 19:32 ` [PATCH v3 0/2] builtin/shortlog: explicitly set hash algo when there is no repo Taylor Blau @ 2024-10-16 19:38 ` Wolfgang Müller 0 siblings, 0 replies; 36+ messages in thread From: Wolfgang Müller @ 2024-10-16 19:38 UTC (permalink / raw) To: Taylor Blau; +Cc: git On 2024-10-16 15:32, Taylor Blau wrote: > I had a couple of small-ish comments that I think would be good to > address in a new round. But if you feel strongly that the current round > is good as-is, I'm happy to hear arguments in that direction, too ;-). New round sounds perfectly fine to me, I'll send it along once there is some consensus on whether or not to keep the additional test. -- Wolf ^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH v4] builtin/shortlog: explicitly set hash algo when there is no repo 2024-10-11 18:34 [RFC PATCH] builtin/shortlog: explicitly set hash algo when there is no repo Wolfgang Müller ` (2 preceding siblings ...) 2024-10-16 18:21 ` [PATCH v3 0/2] " Wolfgang Müller @ 2024-10-17 9:35 ` Wolfgang Müller 2024-10-17 20:10 ` Taylor Blau 3 siblings, 1 reply; 36+ messages in thread From: Wolfgang Müller @ 2024-10-17 9:35 UTC (permalink / raw) To: git Whilst git-shortlog(1) does not explicitly need any repository information when run without reference to one, it still parses some of its arguments with parse_revision_opt() which assumes that the hash algorithm is set. However, in c8aed5e8da (repository: stop setting SHA1 as the default object hash, 2024-05-07) we stopped setting up a default hash algorithm and instead require commands to set it up explicitly. This was done for most other commands like in ab274909d4 (builtin/diff: explicitly set hash algo when there is no repo, 2024-05-07) but was missed for builtin/shortlog, making git-shortlog(1) segfault outside of a repository when given arguments like --author that trigger a call to parse_revision_opt(). Fix this for now by explicitly setting the hash algorithm to SHA1. Also add a regression test for the segfault. Thanks-to: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Wolfgang Müller <wolf@oriole.systems> --- Here's v4 with the trailers fixed and a small improvement to the test. The commit including the additional test has been dropped. Thanks! builtin/shortlog.c | 12 ++++++++++++ t/t4201-shortlog.sh | 4 ++++ 2 files changed, 16 insertions(+) diff --git a/builtin/shortlog.c b/builtin/shortlog.c index 3ed5c46078..c86b75d981 100644 --- a/builtin/shortlog.c +++ b/builtin/shortlog.c @@ -407,6 +407,18 @@ int cmd_shortlog(int argc, struct parse_opt_ctx_t ctx; + /* + * NEEDSWORK: Later on we'll call parse_revision_opt which relies on + * the hash algorithm being set but since we are operating outside of a + * Git repository we cannot determine one. This is only needed because + * parse_revision_opt expects hexsz for --abbrev which is irrelevant + * for shortlog outside of a git repository. For now explicitly set + * SHA1, but ideally the parsing machinery would be split between + * git/nongit so that we do not have to do this. + */ + if (nongit && !the_hash_algo) + repo_set_hash_algo(the_repository, GIT_HASH_SHA1); + git_config(git_default_config, NULL); shortlog_init(&log); repo_init_revisions(the_repository, &rev, prefix); diff --git a/t/t4201-shortlog.sh b/t/t4201-shortlog.sh index c20c885724..5b5d3b637c 100755 --- a/t/t4201-shortlog.sh +++ b/t/t4201-shortlog.sh @@ -143,6 +143,10 @@ fuzz() test_grep "too many arguments" out ' +test_expect_success 'shortlog --author from non-git directory does not segfault' ' + nongit git shortlog --author=author </dev/null +' + test_expect_success 'shortlog should add newline when input line matches wraplen' ' cat >expect <<\EOF && A U Thor (2): Range-diff against v3: 1: 4813b458ac ! 1: 1a2959c0de builtin/shortlog: explicitly set hash algo when there is no repo @@ Commit message Fix this for now by explicitly setting the hash algorithm to SHA1. Also add a regression test for the segfault. - Signed-off-by: Wolfgang Müller <wolf@oriole.systems> Thanks-to: Eric Sunshine <sunshine@sunshineco.com> + Signed-off-by: Wolfgang Müller <wolf@oriole.systems> ## builtin/shortlog.c ## @@ builtin/shortlog.c: int cmd_shortlog(int argc, @@ t/t4201-shortlog.sh: fuzz() ' +test_expect_success 'shortlog --author from non-git directory does not segfault' ' -+ echo | nongit git shortlog --author=author ++ nongit git shortlog --author=author </dev/null +' + test_expect_success 'shortlog should add newline when input line matches wraplen' ' 2: 9dfdc7510f < -: ---------- shortlog: Test reading a log from a SHA256 repo in a non-git directory -- 2.47.0 ^ permalink raw reply related [flat|nested] 36+ messages in thread
* Re: [PATCH v4] builtin/shortlog: explicitly set hash algo when there is no repo 2024-10-17 9:35 ` [PATCH v4] " Wolfgang Müller @ 2024-10-17 20:10 ` Taylor Blau 2024-10-17 22:02 ` Wolfgang Müller 0 siblings, 1 reply; 36+ messages in thread From: Taylor Blau @ 2024-10-17 20:10 UTC (permalink / raw) To: Wolfgang Müller; +Cc: git On Thu, Oct 17, 2024 at 11:35:28AM +0200, Wolfgang Müller wrote: > builtin/shortlog.c | 12 ++++++++++++ > t/t4201-shortlog.sh | 4 ++++ > 2 files changed, 16 insertions(+) This version looks great. Thanks, Wolfgang -- let's start merging this one down. Thanks, Taylor ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v4] builtin/shortlog: explicitly set hash algo when there is no repo 2024-10-17 20:10 ` Taylor Blau @ 2024-10-17 22:02 ` Wolfgang Müller 0 siblings, 0 replies; 36+ messages in thread From: Wolfgang Müller @ 2024-10-17 22:02 UTC (permalink / raw) To: Taylor Blau; +Cc: git On 2024-10-17 16:10, Taylor Blau wrote: > This version looks great. Thanks, Wolfgang -- let's start merging this > one down. Thanks! And thanks to the community as well for the help and feedback, as well as the thorough discussion. It was fun to contribute, and I feel like I learned a lot :-) -- Wolf ^ permalink raw reply [flat|nested] 36+ messages in thread
end of thread, other threads:[~2024-10-17 22:02 UTC | newest] Thread overview: 36+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2024-10-11 18:34 [RFC PATCH] builtin/shortlog: explicitly set hash algo when there is no repo Wolfgang Müller 2024-10-15 9:33 ` Wolfgang Müller 2024-10-15 9:47 ` Kristoffer Haugsbakk 2024-10-15 19:54 ` Taylor Blau 2024-10-15 23:28 ` Taylor Blau 2024-10-16 8:15 ` Wolfgang Müller 2024-10-16 18:28 ` Taylor Blau 2024-10-15 11:48 ` [RFC PATCH v2] " Wolfgang Müller 2024-10-15 17:20 ` Eric Sunshine 2024-10-15 17:51 ` Wolfgang Müller 2024-10-16 5:32 ` Patrick Steinhardt 2024-10-16 8:47 ` Wolfgang Müller 2024-10-16 8:57 ` Patrick Steinhardt 2024-10-16 9:07 ` Wolfgang Müller 2024-10-16 18:52 ` Taylor Blau 2024-10-16 19:01 ` Wolfgang Müller 2024-10-17 5:04 ` Patrick Steinhardt 2024-10-16 9:48 ` Wolfgang Müller 2024-10-16 19:01 ` Taylor Blau 2024-10-16 19:14 ` Wolfgang Müller 2024-10-16 18:21 ` [PATCH v3 0/2] " Wolfgang Müller 2024-10-16 18:21 ` [PATCH v3 1/2] " Wolfgang Müller 2024-10-16 19:22 ` Taylor Blau 2024-10-16 19:37 ` Wolfgang Müller 2024-10-17 11:58 ` Patrick Steinhardt 2024-10-17 12:09 ` Wolfgang Müller 2024-10-17 12:11 ` Patrick Steinhardt 2024-10-16 18:21 ` [PATCH v3 2/2] shortlog: Test reading a log from a SHA256 repo in a non-git directory Wolfgang Müller 2024-10-16 19:25 ` Taylor Blau 2024-10-16 19:35 ` Wolfgang Müller 2024-10-16 19:45 ` Taylor Blau 2024-10-16 19:32 ` [PATCH v3 0/2] builtin/shortlog: explicitly set hash algo when there is no repo Taylor Blau 2024-10-16 19:38 ` Wolfgang Müller 2024-10-17 9:35 ` [PATCH v4] " Wolfgang Müller 2024-10-17 20:10 ` Taylor Blau 2024-10-17 22:02 ` Wolfgang Müller
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).