* [PATCH 0/7] shortlog: introduce `--group=<format>`
@ 2022-10-11  0:33 Taylor Blau
  2022-10-11  0:34 ` [PATCH 1/7] Documentation: extract date-options.txt Taylor Blau
                   ` (9 more replies)
  0 siblings, 10 replies; 72+ messages in thread
From: Taylor Blau @ 2022-10-11  0:33 UTC (permalink / raw)
  To: git; +Cc: jacob, peff, gitster
This series is based on a suggestion[1] from Junio in response to
Jacob Stopak's series to add committer month/year options as shortlog
groups.
This series pursues a more flexible approach, where arbitrary pretty
formats can be used to generate group names in shortlog. For example, to
group by committer month/year, you can do the following:
    $ git.compile shortlog -s --group='%cd' --date='format:%Y-%m' v2.37.0..
       117  2022-06
       274  2022-07
       325  2022-08
       271  2022-09
        17  2022-10
, which is cute. It can also be used to reimplement existing
functionality in shortlog, which is what the final three patches in this
series do. Author and committer options are implemented under the hood
as if they were shorthands for `--group='%aN <%aE>'` (or `--group='%cN
<%cE>'`, respectively).
The `--group=trailer:<key>` option is also reimplemented, but it is
slightly trickier than the rest. See the final patch for more details
there.
Thanks in advance for your review.
[1]: https://lore.kernel.org/git/xmqqillevzeh.fsf@gitster.g/
Jeff King (1):
  shortlog: accept `--date`-related options
Taylor Blau (6):
  Documentation: extract date-options.txt
  shortlog: extract `--group` fragment for translation
  shortlog: support arbitrary commit format `--group`s
  shortlog: implement `--group=author` in terms of `--group=<format>`
  shortlog: implement `--group=committer` in terms of `--group=<format>`
  shortlog: implement `--group=trailer` in terms of `--group=<format>`
 Documentation/date-options.txt     |  66 +++++++++++++
 Documentation/git-shortlog.txt     |   4 +
 Documentation/rev-list-options.txt |  67 +------------
 builtin/log.c                      |   1 +
 builtin/shortlog.c                 | 145 +++++++++++++++--------------
 shortlog.h                         |   6 +-
 t/t4201-shortlog.sh                |   9 ++
 7 files changed, 163 insertions(+), 135 deletions(-)
 create mode 100644 Documentation/date-options.txt
-- 
2.37.0.1.g1379af2e9d
^ permalink raw reply	[flat|nested] 72+ messages in thread
* [PATCH 1/7] Documentation: extract date-options.txt
  2022-10-11  0:33 [PATCH 0/7] shortlog: introduce `--group=<format>` Taylor Blau
@ 2022-10-11  0:34 ` Taylor Blau
  2022-10-11  0:54   ` Jeff King
  2022-10-11  0:34 ` [PATCH 2/7] shortlog: accept `--date`-related options Taylor Blau
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 72+ messages in thread
From: Taylor Blau @ 2022-10-11  0:34 UTC (permalink / raw)
  To: git; +Cc: jacob, peff, gitster
A future commit will want to include `--date`-related options in the
documentation for `git-shortlog(1)`, but without some of the additional
baggage in the usual rev-list-options.txt.
Extract those options to a separate file in Documentation and include it
from its original source in rev-list-options.txt.
This patch does not modify the contents of the `--date`-options section
of Documentation/rev-list-options.txt.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/date-options.txt     | 66 +++++++++++++++++++++++++++++
 Documentation/rev-list-options.txt | 67 +-----------------------------
 2 files changed, 67 insertions(+), 66 deletions(-)
 create mode 100644 Documentation/date-options.txt
diff --git a/Documentation/date-options.txt b/Documentation/date-options.txt
new file mode 100644
index 0000000000..65896e4b95
--- /dev/null
+++ b/Documentation/date-options.txt
@@ -0,0 +1,66 @@
+--relative-date::
+	Synonym for `--date=relative`.
+
+--date=<format>::
+	Only takes effect for dates shown in human-readable format, such
+	as when using `--pretty`. `log.date` config variable sets a default
+	value for the log command's `--date` option. By default, dates
+	are shown in the original time zone (either committer's or
+	author's). If `-local` is appended to the format (e.g.,
+	`iso-local`), the user's local time zone is used instead.
++
+--
+`--date=relative` shows dates relative to the current time,
+e.g. ``2 hours ago''. The `-local` option has no effect for
+`--date=relative`.
+
+`--date=local` is an alias for `--date=default-local`.
+
+`--date=iso` (or `--date=iso8601`) shows timestamps in a ISO 8601-like format.
+The differences to the strict ISO 8601 format are:
+
+	- a space instead of the `T` date/time delimiter
+	- a space between time and time zone
+	- no colon between hours and minutes of the time zone
+
+`--date=iso-strict` (or `--date=iso8601-strict`) shows timestamps in strict
+ISO 8601 format.
+
+`--date=rfc` (or `--date=rfc2822`) shows timestamps in RFC 2822
+format, often found in email messages.
+
+`--date=short` shows only the date, but not the time, in `YYYY-MM-DD` format.
+
+`--date=raw` shows the date as seconds since the epoch (1970-01-01
+00:00:00 UTC), followed by a space, and then the timezone as an offset
+from UTC (a `+` or `-` with four digits; the first two are hours, and
+the second two are minutes). I.e., as if the timestamp were formatted
+with `strftime("%s %z")`).
+Note that the `-local` option does not affect the seconds-since-epoch
+value (which is always measured in UTC), but does switch the accompanying
+timezone value.
+
+`--date=human` shows the timezone if the timezone does not match the
+current time-zone, and doesn't print the whole date if that matches
+(ie skip printing year for dates that are "this year", but also skip
+the whole date itself if it's in the last few days and we can just say
+what weekday it was).  For older dates the hour and minute is also
+omitted.
+
+`--date=unix` shows the date as a Unix epoch timestamp (seconds since
+1970).  As with `--raw`, this is always in UTC and therefore `-local`
+has no effect.
+
+`--date=format:...` feeds the format `...` to your system `strftime`,
+except for %s, %z, and %Z, which are handled internally.
+Use `--date=format:%c` to show the date in your system locale's
+preferred format.  See the `strftime` manual for a complete list of
+format placeholders. When using `-local`, the correct syntax is
+`--date=format-local:...`.
+
+`--date=default` is the default format, and is similar to
+`--date=rfc2822`, with a few exceptions:
+--
+	- there is no comma after the day-of-week
+
+	- the time zone is omitted when the local time zone is used
diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt
index 1837509566..1cb91dfb9c 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -1033,72 +1033,7 @@ endif::git-rev-list[]
 
 include::pretty-options.txt[]
 
---relative-date::
-	Synonym for `--date=relative`.
-
---date=<format>::
-	Only takes effect for dates shown in human-readable format, such
-	as when using `--pretty`. `log.date` config variable sets a default
-	value for the log command's `--date` option. By default, dates
-	are shown in the original time zone (either committer's or
-	author's). If `-local` is appended to the format (e.g.,
-	`iso-local`), the user's local time zone is used instead.
-+
---
-`--date=relative` shows dates relative to the current time,
-e.g. ``2 hours ago''. The `-local` option has no effect for
-`--date=relative`.
-
-`--date=local` is an alias for `--date=default-local`.
-
-`--date=iso` (or `--date=iso8601`) shows timestamps in a ISO 8601-like format.
-The differences to the strict ISO 8601 format are:
-
-	- a space instead of the `T` date/time delimiter
-	- a space between time and time zone
-	- no colon between hours and minutes of the time zone
-
-`--date=iso-strict` (or `--date=iso8601-strict`) shows timestamps in strict
-ISO 8601 format.
-
-`--date=rfc` (or `--date=rfc2822`) shows timestamps in RFC 2822
-format, often found in email messages.
-
-`--date=short` shows only the date, but not the time, in `YYYY-MM-DD` format.
-
-`--date=raw` shows the date as seconds since the epoch (1970-01-01
-00:00:00 UTC), followed by a space, and then the timezone as an offset
-from UTC (a `+` or `-` with four digits; the first two are hours, and
-the second two are minutes). I.e., as if the timestamp were formatted
-with `strftime("%s %z")`).
-Note that the `-local` option does not affect the seconds-since-epoch
-value (which is always measured in UTC), but does switch the accompanying
-timezone value.
-
-`--date=human` shows the timezone if the timezone does not match the
-current time-zone, and doesn't print the whole date if that matches
-(ie skip printing year for dates that are "this year", but also skip
-the whole date itself if it's in the last few days and we can just say
-what weekday it was).  For older dates the hour and minute is also
-omitted.
-
-`--date=unix` shows the date as a Unix epoch timestamp (seconds since
-1970).  As with `--raw`, this is always in UTC and therefore `-local`
-has no effect.
-
-`--date=format:...` feeds the format `...` to your system `strftime`,
-except for %s, %z, and %Z, which are handled internally.
-Use `--date=format:%c` to show the date in your system locale's
-preferred format.  See the `strftime` manual for a complete list of
-format placeholders. When using `-local`, the correct syntax is
-`--date=format-local:...`.
-
-`--date=default` is the default format, and is similar to
-`--date=rfc2822`, with a few exceptions:
---
-	- there is no comma after the day-of-week
-
-	- the time zone is omitted when the local time zone is used
+include::date-options.txt[]
 
 ifdef::git-rev-list[]
 --header::
-- 
2.37.0.1.g1379af2e9d
^ permalink raw reply related	[flat|nested] 72+ messages in thread
* [PATCH 2/7] shortlog: accept `--date`-related options
  2022-10-11  0:33 [PATCH 0/7] shortlog: introduce `--group=<format>` Taylor Blau
  2022-10-11  0:34 ` [PATCH 1/7] Documentation: extract date-options.txt Taylor Blau
@ 2022-10-11  0:34 ` Taylor Blau
  2022-10-11  0:48   ` Jeff King
  2022-10-11  0:34 ` [PATCH 3/7] shortlog: extract `--group` fragment for translation Taylor Blau
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 72+ messages in thread
From: Taylor Blau @ 2022-10-11  0:34 UTC (permalink / raw)
  To: git; +Cc: jacob, peff, gitster
From: Jeff King <peff@peff.net>
Prepare for the future patch which will introduce arbitrary pretty
formats via the `--group` argument.
To allow additional customizability (for example, to support something
like `git shortlog -s --group='%aD' --date='format:%Y-%m' ...` (which
groups commits by the datestring 'YYYY-mm' according to author date), we
must store off the `--date` parsed from calling `parse_revision_opt()`.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/git-shortlog.txt | 2 ++
 builtin/shortlog.c             | 3 ++-
 shortlog.h                     | 2 ++
 3 files changed, 6 insertions(+), 1 deletion(-)
diff --git a/Documentation/git-shortlog.txt b/Documentation/git-shortlog.txt
index f64e77047b..4982ceee21 100644
--- a/Documentation/git-shortlog.txt
+++ b/Documentation/git-shortlog.txt
@@ -108,6 +108,8 @@ options or the revision range, when confusion arises.
 :git-shortlog: 1
 include::rev-list-options.txt[]
 
+include::date-options.txt[]
+
 MAPPING AUTHORS
 ---------------
 
diff --git a/builtin/shortlog.c b/builtin/shortlog.c
index 7a1e1fe7c0..53c379a51d 100644
--- a/builtin/shortlog.c
+++ b/builtin/shortlog.c
@@ -211,7 +211,7 @@ void shortlog_add_commit(struct shortlog *log, struct commit *commit)
 	ctx.fmt = CMIT_FMT_USERFORMAT;
 	ctx.abbrev = log->abbrev;
 	ctx.print_email_subject = 1;
-	ctx.date_mode.type = DATE_NORMAL;
+	ctx.date_mode = log->date_mode;
 	ctx.output_encoding = get_log_output_encoding();
 
 	if (!log->summary) {
@@ -407,6 +407,7 @@ int cmd_shortlog(int argc, const char **argv, const char *prefix)
 	log.user_format = rev.commit_format == CMIT_FMT_USERFORMAT;
 	log.abbrev = rev.abbrev;
 	log.file = rev.diffopt.file;
+	log.date_mode = rev.date_mode;
 
 	if (!log.groups)
 		log.groups = SHORTLOG_GROUP_AUTHOR;
diff --git a/shortlog.h b/shortlog.h
index 3f7e9aabca..dc388dd459 100644
--- a/shortlog.h
+++ b/shortlog.h
@@ -2,6 +2,7 @@
 #define SHORTLOG_H
 
 #include "string-list.h"
+#include "date.h"
 
 struct commit;
 
@@ -15,6 +16,7 @@ struct shortlog {
 	int in2;
 	int user_format;
 	int abbrev;
+	struct date_mode date_mode;
 
 	enum {
 		SHORTLOG_GROUP_AUTHOR = (1 << 0),
-- 
2.37.0.1.g1379af2e9d
^ permalink raw reply related	[flat|nested] 72+ messages in thread
* [PATCH 3/7] shortlog: extract `--group` fragment for translation
  2022-10-11  0:33 [PATCH 0/7] shortlog: introduce `--group=<format>` Taylor Blau
  2022-10-11  0:34 ` [PATCH 1/7] Documentation: extract date-options.txt Taylor Blau
  2022-10-11  0:34 ` [PATCH 2/7] shortlog: accept `--date`-related options Taylor Blau
@ 2022-10-11  0:34 ` Taylor Blau
  2022-10-11 10:55   ` Ævar Arnfjörð Bjarmason
  2022-10-11  0:34 ` [PATCH 4/7] shortlog: support arbitrary commit format `--group`s Taylor Blau
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 72+ messages in thread
From: Taylor Blau @ 2022-10-11  0:34 UTC (permalink / raw)
  To: git; +Cc: jacob, peff, gitster
The subsequent commit will add another unhandled case in
`read_from_stdin()` which will want to use the same message as with
`--group=trailer`.
Extract the "--group=trailer" part from this message so the same
translation key can be used for both cases.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/shortlog.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/builtin/shortlog.c b/builtin/shortlog.c
index 53c379a51d..051c97bd5a 100644
--- a/builtin/shortlog.c
+++ b/builtin/shortlog.c
@@ -132,7 +132,7 @@ static void read_from_stdin(struct shortlog *log)
 		match = committer_match;
 		break;
 	case SHORTLOG_GROUP_TRAILER:
-		die(_("using --group=trailer with stdin is not supported"));
+		die(_("using %s with stdin is not supported"), "--group=trailer");
 	default:
 		BUG("unhandled shortlog group");
 	}
-- 
2.37.0.1.g1379af2e9d
^ permalink raw reply related	[flat|nested] 72+ messages in thread
* [PATCH 4/7] shortlog: support arbitrary commit format `--group`s
  2022-10-11  0:33 [PATCH 0/7] shortlog: introduce `--group=<format>` Taylor Blau
                   ` (2 preceding siblings ...)
  2022-10-11  0:34 ` [PATCH 3/7] shortlog: extract `--group` fragment for translation Taylor Blau
@ 2022-10-11  0:34 ` Taylor Blau
  2022-10-11  1:12   ` Jeff King
  2022-10-11  0:34 ` [PATCH 5/7] shortlog: implement `--group=author` in terms of `--group=<format>` Taylor Blau
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 72+ messages in thread
From: Taylor Blau @ 2022-10-11  0:34 UTC (permalink / raw)
  To: git; +Cc: jacob, peff, gitster
In addition to generating a shortlog based on committer, author, or the
identity in one or more specified trailers, it can be useful to generate
a shortlog based on an arbitrary commit format.
This can be used, for example, to generate a distribution of commit
activity over time, like so:
    $ git shortlog --group='%cd' --date='format:%Y-%m' -s v2.37.0..
       117  2022-06
       274  2022-07
       324  2022-08
       263  2022-09
         7  2022-10
Arbitrary commit formats can be used. In fact, `git shortlog`'s default
behavior (to count by commit authors) can be emulated as follows:
    $ git shortlog --group='%aN <%aE>' ...
and future patches will make the default behavior (as well as
`--committer`, and `--group=trailer:<trailer>`) special cases of the
more flexible `--group` option.
Note also that the SHORTLOG_GROUP_FORMAT enum value is used only to
designate that `--group:<format>` is in use when in stdin mode to
declare that the combination is invalid.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/git-shortlog.txt |  2 ++
 builtin/shortlog.c             | 33 ++++++++++++++++++++++++++++++++-
 shortlog.h                     |  2 ++
 t/t4201-shortlog.sh            |  9 +++++++++
 4 files changed, 45 insertions(+), 1 deletion(-)
diff --git a/Documentation/git-shortlog.txt b/Documentation/git-shortlog.txt
index 4982ceee21..ca15643f45 100644
--- a/Documentation/git-shortlog.txt
+++ b/Documentation/git-shortlog.txt
@@ -59,6 +59,8 @@ OPTIONS
    example, if your project uses `Reviewed-by` trailers, you might want
    to see who has been reviewing with
    `git shortlog -ns --group=trailer:reviewed-by`.
+ - `<format>`, any string accepted by the `--format` option of 'git log'.
+   (See the "PRETTY FORMATS" section of linkgit:git-log[1].)
 +
 Note that commits that do not include the trailer will not be counted.
 Likewise, commits with multiple trailers (e.g., multiple signoffs) may
diff --git a/builtin/shortlog.c b/builtin/shortlog.c
index 051c97bd5a..f708d96558 100644
--- a/builtin/shortlog.c
+++ b/builtin/shortlog.c
@@ -133,6 +133,8 @@ static void read_from_stdin(struct shortlog *log)
 		break;
 	case SHORTLOG_GROUP_TRAILER:
 		die(_("using %s with stdin is not supported"), "--group=trailer");
+	case SHORTLOG_GROUP_FORMAT:
+		die(_("using %s with stdin is not supported"), "--group=format");
 	default:
 		BUG("unhandled shortlog group");
 	}
@@ -200,6 +202,27 @@ static void insert_records_from_trailers(struct shortlog *log,
 	unuse_commit_buffer(commit, commit_buffer);
 }
 
+static void insert_records_from_format(struct shortlog *log,
+				       struct strset *dups,
+				       struct commit *commit,
+				       struct pretty_print_context *ctx,
+				       const char *oneline)
+{
+	struct strbuf buf = STRBUF_INIT;
+	struct string_list_item *item;
+
+	for_each_string_list_item(item, &log->format) {
+		strbuf_reset(&buf);
+
+		format_commit_message(commit, item->string, &buf, ctx);
+
+		if (strset_add(dups, buf.buf))
+			insert_one_record(log, buf.buf, oneline);
+	}
+
+	strbuf_release(&buf);
+}
+
 void shortlog_add_commit(struct shortlog *log, struct commit *commit)
 {
 	struct strbuf ident = STRBUF_INIT;
@@ -243,6 +266,7 @@ void shortlog_add_commit(struct shortlog *log, struct commit *commit)
 	if (log->groups & SHORTLOG_GROUP_TRAILER) {
 		insert_records_from_trailers(log, &dups, commit, &ctx, oneline_str);
 	}
+	insert_records_from_format(log, &dups, commit, &ctx, oneline_str);
 
 	strset_clear(&dups);
 	strbuf_release(&ident);
@@ -314,6 +338,7 @@ static int parse_group_option(const struct option *opt, const char *arg, int uns
 	if (unset) {
 		log->groups = 0;
 		string_list_clear(&log->trailers, 0);
+		string_list_clear(&log->format, 0);
 	} else if (!strcasecmp(arg, "author"))
 		log->groups |= SHORTLOG_GROUP_AUTHOR;
 	else if (!strcasecmp(arg, "committer"))
@@ -321,8 +346,12 @@ static int parse_group_option(const struct option *opt, const char *arg, int uns
 	else if (skip_prefix(arg, "trailer:", &field)) {
 		log->groups |= SHORTLOG_GROUP_TRAILER;
 		string_list_append(&log->trailers, field);
-	} else
+	} else if (strchrnul(arg, '%')) {
+		log->groups |= SHORTLOG_GROUP_FORMAT;
+		string_list_append(&log->format, arg);
+	} else {
 		return error(_("unknown group type: %s"), arg);
+	}
 
 	return 0;
 }
@@ -340,6 +369,7 @@ void shortlog_init(struct shortlog *log)
 	log->in2 = DEFAULT_INDENT2;
 	log->trailers.strdup_strings = 1;
 	log->trailers.cmp = strcasecmp;
+	log->format.strdup_strings = 1;
 }
 
 int cmd_shortlog(int argc, const char **argv, const char *prefix)
@@ -480,4 +510,5 @@ void shortlog_output(struct shortlog *log)
 	log->list.strdup_strings = 1;
 	string_list_clear(&log->list, 1);
 	clear_mailmap(&log->mailmap);
+	string_list_clear(&log->format, 0);
 }
diff --git a/shortlog.h b/shortlog.h
index dc388dd459..4850a8c30f 100644
--- a/shortlog.h
+++ b/shortlog.h
@@ -22,8 +22,10 @@ struct shortlog {
 		SHORTLOG_GROUP_AUTHOR = (1 << 0),
 		SHORTLOG_GROUP_COMMITTER = (1 << 1),
 		SHORTLOG_GROUP_TRAILER = (1 << 2),
+		SHORTLOG_GROUP_FORMAT = (1 << 3),
 	} groups;
 	struct string_list trailers;
+	struct string_list format;
 
 	int email;
 	struct string_list mailmap;
diff --git a/t/t4201-shortlog.sh b/t/t4201-shortlog.sh
index 3095b1b2ff..2417ac8896 100755
--- a/t/t4201-shortlog.sh
+++ b/t/t4201-shortlog.sh
@@ -237,6 +237,15 @@ test_expect_success 'shortlog --group=trailer:signed-off-by' '
 	test_cmp expect actual
 '
 
+test_expect_success 'shortlog --group=<format>' '
+	git shortlog -s --date="format:%Y" --group="%cN (%cd)" HEAD >actual &&
+	cat >expect <<-\EOF &&
+	     4	C O Mitter (2005)
+	     1	Sin Nombre (2005)
+	EOF
+	test_cmp expect actual
+'
+
 test_expect_success 'trailer idents are split' '
 	cat >expect <<-\EOF &&
 	     2	C O Mitter
-- 
2.37.0.1.g1379af2e9d
^ permalink raw reply related	[flat|nested] 72+ messages in thread
* [PATCH 5/7] shortlog: implement `--group=author` in terms of `--group=<format>`
  2022-10-11  0:33 [PATCH 0/7] shortlog: introduce `--group=<format>` Taylor Blau
                   ` (3 preceding siblings ...)
  2022-10-11  0:34 ` [PATCH 4/7] shortlog: support arbitrary commit format `--group`s Taylor Blau
@ 2022-10-11  0:34 ` Taylor Blau
  2022-10-11  1:34   ` Jeff King
  2022-10-11 10:57   ` Ævar Arnfjörð Bjarmason
  2022-10-11  0:34 ` [PATCH 6/7] shortlog: implement `--group=committer` " Taylor Blau
                   ` (4 subsequent siblings)
  9 siblings, 2 replies; 72+ messages in thread
From: Taylor Blau @ 2022-10-11  0:34 UTC (permalink / raw)
  To: git; +Cc: jacob, peff, gitster
Instead of handling SHORTLOG_GROUP_AUTHOR separately, reimplement it as
a special case of the new `--group=<format>` mode, where the author mode
is a shorthand for `--group='%aN <%aE>'.
Note that we still need to keep the SHORTLOG_GROUP_AUTHOR enum since it
has a different meaning in `read_from_stdin()`, where it is still used
for a different purpose.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/log.c      |  1 +
 builtin/shortlog.c | 23 ++++++++++++-----------
 shortlog.h         |  1 +
 3 files changed, 14 insertions(+), 11 deletions(-)
diff --git a/builtin/log.c b/builtin/log.c
index ee19dc5d45..6b77e520b5 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -1334,6 +1334,7 @@ static void make_cover_letter(struct rev_info *rev, int use_separate_file,
 	log.in2 = 4;
 	log.file = rev->diffopt.file;
 	log.groups = SHORTLOG_GROUP_AUTHOR;
+	shortlog_init_group(&log);
 	for (i = 0; i < nr; i++)
 		shortlog_add_commit(&log, list[i]);
 
diff --git a/builtin/shortlog.c b/builtin/shortlog.c
index f708d96558..aac8c7afa4 100644
--- a/builtin/shortlog.c
+++ b/builtin/shortlog.c
@@ -245,15 +245,6 @@ void shortlog_add_commit(struct shortlog *log, struct commit *commit)
 	}
 	oneline_str = oneline.len ? oneline.buf : "<none>";
 
-	if (log->groups & SHORTLOG_GROUP_AUTHOR) {
-		strbuf_reset(&ident);
-		format_commit_message(commit,
-				      log->email ? "%aN <%aE>" : "%aN",
-				      &ident, &ctx);
-		if (!HAS_MULTI_BITS(log->groups) ||
-		    strset_add(&dups, ident.buf))
-			insert_one_record(log, ident.buf, oneline_str);
-	}
 	if (log->groups & SHORTLOG_GROUP_COMMITTER) {
 		strbuf_reset(&ident);
 		format_commit_message(commit,
@@ -372,6 +363,16 @@ void shortlog_init(struct shortlog *log)
 	log->format.strdup_strings = 1;
 }
 
+void shortlog_init_group(struct shortlog *log)
+{
+	if (!log->groups)
+		log->groups = SHORTLOG_GROUP_AUTHOR;
+
+	if (log->groups & SHORTLOG_GROUP_AUTHOR)
+		string_list_append(&log->format,
+				   log->email ? "%aN <%aE>" : "%aN");
+}
+
 int cmd_shortlog(int argc, const char **argv, const char *prefix)
 {
 	struct shortlog log = { STRING_LIST_INIT_NODUP };
@@ -439,8 +440,8 @@ int cmd_shortlog(int argc, const char **argv, const char *prefix)
 	log.file = rev.diffopt.file;
 	log.date_mode = rev.date_mode;
 
-	if (!log.groups)
-		log.groups = SHORTLOG_GROUP_AUTHOR;
+	shortlog_init_group(&log);
+
 	string_list_sort(&log.trailers);
 
 	/* assume HEAD if from a tty */
diff --git a/shortlog.h b/shortlog.h
index 4850a8c30f..e52f001fb7 100644
--- a/shortlog.h
+++ b/shortlog.h
@@ -33,6 +33,7 @@ struct shortlog {
 };
 
 void shortlog_init(struct shortlog *log);
+void shortlog_init_group(struct shortlog *log);
 
 void shortlog_add_commit(struct shortlog *log, struct commit *commit);
 
-- 
2.37.0.1.g1379af2e9d
^ permalink raw reply related	[flat|nested] 72+ messages in thread
* [PATCH 6/7] shortlog: implement `--group=committer` in terms of `--group=<format>`
  2022-10-11  0:33 [PATCH 0/7] shortlog: introduce `--group=<format>` Taylor Blau
                   ` (4 preceding siblings ...)
  2022-10-11  0:34 ` [PATCH 5/7] shortlog: implement `--group=author` in terms of `--group=<format>` Taylor Blau
@ 2022-10-11  0:34 ` Taylor Blau
  2022-10-11  1:35   ` Jeff King
  2022-10-11  0:34 ` [PATCH 7/7] shortlog: implement `--group=trailer` " Taylor Blau
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 72+ messages in thread
From: Taylor Blau @ 2022-10-11  0:34 UTC (permalink / raw)
  To: git; +Cc: jacob, peff, gitster
In the same spirit as the previous commit, reimplement
`--group=committer` as a special case of `--group=<format>`, too.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/shortlog.c | 14 +++-----------
 1 file changed, 3 insertions(+), 11 deletions(-)
diff --git a/builtin/shortlog.c b/builtin/shortlog.c
index aac8c7afa4..b46df179fe 100644
--- a/builtin/shortlog.c
+++ b/builtin/shortlog.c
@@ -225,7 +225,6 @@ static void insert_records_from_format(struct shortlog *log,
 
 void shortlog_add_commit(struct shortlog *log, struct commit *commit)
 {
-	struct strbuf ident = STRBUF_INIT;
 	struct strbuf oneline = STRBUF_INIT;
 	struct strset dups = STRSET_INIT;
 	struct pretty_print_context ctx = {0};
@@ -245,22 +244,12 @@ void shortlog_add_commit(struct shortlog *log, struct commit *commit)
 	}
 	oneline_str = oneline.len ? oneline.buf : "<none>";
 
-	if (log->groups & SHORTLOG_GROUP_COMMITTER) {
-		strbuf_reset(&ident);
-		format_commit_message(commit,
-				      log->email ? "%cN <%cE>" : "%cN",
-				      &ident, &ctx);
-		if (!HAS_MULTI_BITS(log->groups) ||
-		    strset_add(&dups, ident.buf))
-			insert_one_record(log, ident.buf, oneline_str);
-	}
 	if (log->groups & SHORTLOG_GROUP_TRAILER) {
 		insert_records_from_trailers(log, &dups, commit, &ctx, oneline_str);
 	}
 	insert_records_from_format(log, &dups, commit, &ctx, oneline_str);
 
 	strset_clear(&dups);
-	strbuf_release(&ident);
 	strbuf_release(&oneline);
 }
 
@@ -371,6 +360,9 @@ void shortlog_init_group(struct shortlog *log)
 	if (log->groups & SHORTLOG_GROUP_AUTHOR)
 		string_list_append(&log->format,
 				   log->email ? "%aN <%aE>" : "%aN");
+	if (log->groups & SHORTLOG_GROUP_COMMITTER)
+		string_list_append(&log->format,
+				   log->email ? "%cN <%cE>" : "%cN");
 }
 
 int cmd_shortlog(int argc, const char **argv, const char *prefix)
-- 
2.37.0.1.g1379af2e9d
^ permalink raw reply related	[flat|nested] 72+ messages in thread
* [PATCH 7/7] shortlog: implement `--group=trailer` in terms of `--group=<format>`
  2022-10-11  0:33 [PATCH 0/7] shortlog: introduce `--group=<format>` Taylor Blau
                   ` (5 preceding siblings ...)
  2022-10-11  0:34 ` [PATCH 6/7] shortlog: implement `--group=committer` " Taylor Blau
@ 2022-10-11  0:34 ` Taylor Blau
  2022-10-11  1:50   ` Jeff King
  2022-10-21  3:11 ` [PATCH v2 0/6] shortlog: introduce `--group=<format>` Taylor Blau
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 72+ messages in thread
From: Taylor Blau @ 2022-10-11  0:34 UTC (permalink / raw)
  To: git; +Cc: jacob, peff, gitster
In the same spirit as the previous commit, reimplement
`--group=trailer:<key>` as a special case of `--group=<format>`, too.
Unsurprisingly, this reimplementation is a little bit more complicated
than the previous two. The complexity stems from having to enumerate
each of the trailers one-by-one, as well as delegating control to
`parse_ident()` to handle whether or not to show an individual's email.
To enumerate each trailer in a commit, we set the separator used in the
pretty format to be a NUL byte. This hack allows us to treat the strbuf
from `format_commit_message()` as an array of strings.
Since this handling is unique to the `--group:trailer` mode, use `util`
bit in the `string_list_entry` to signal which `<format>`s came from
`--group:trailer` arguments (and thus need special treatment).
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/shortlog.c | 66 ++++++++++++++++++----------------------------
 shortlog.h         |  1 -
 2 files changed, 25 insertions(+), 42 deletions(-)
diff --git a/builtin/shortlog.c b/builtin/shortlog.c
index b46df179fe..e8cf727342 100644
--- a/builtin/shortlog.c
+++ b/builtin/shortlog.c
@@ -162,44 +162,30 @@ static void read_from_stdin(struct shortlog *log)
 	strbuf_release(&oneline);
 }
 
-static void insert_records_from_trailers(struct shortlog *log,
-					 struct strset *dups,
-					 struct commit *commit,
-					 struct pretty_print_context *ctx,
-					 const char *oneline)
+static void insert_records_from_trailer(struct shortlog *log,
+					struct strset *dups,
+					struct strbuf *buf,
+					struct pretty_print_context *ctx,
+					const char *oneline)
 {
-	struct trailer_iterator iter;
-	const char *commit_buffer, *body;
-	struct strbuf ident = STRBUF_INIT;
+	struct strbuf ibuf = STRBUF_INIT;
+	char *bol = buf->buf;
+	char *value;
 
-	/*
-	 * Using format_commit_message("%B") would be simpler here, but
-	 * this saves us copying the message.
-	 */
-	commit_buffer = logmsg_reencode(commit, NULL, ctx->output_encoding);
-	body = strstr(commit_buffer, "\n\n");
-	if (!body)
-		return;
+	while (bol < buf->buf + buf->len) {
+		strbuf_reset(&ibuf);
 
-	trailer_iterator_init(&iter, body);
-	while (trailer_iterator_advance(&iter)) {
-		const char *value = iter.val.buf;
+		value = bol;
+		if (!parse_ident(log, &ibuf, bol))
+			value = ibuf.buf;
 
-		if (!string_list_has_string(&log->trailers, iter.key.buf))
-			continue;
+		if (strset_add(dups, value))
+			insert_one_record(log, value, oneline);
 
-		strbuf_reset(&ident);
-		if (!parse_ident(log, &ident, value))
-			value = ident.buf;
-
-		if (!strset_add(dups, value))
-			continue;
-		insert_one_record(log, value, oneline);
+		bol += strlen(bol) + 1;
 	}
-	trailer_iterator_release(&iter);
 
-	strbuf_release(&ident);
-	unuse_commit_buffer(commit, commit_buffer);
+	strbuf_release(&ibuf);
 }
 
 static void insert_records_from_format(struct shortlog *log,
@@ -216,7 +202,10 @@ static void insert_records_from_format(struct shortlog *log,
 
 		format_commit_message(commit, item->string, &buf, ctx);
 
-		if (strset_add(dups, buf.buf))
+		if (item->util)
+			insert_records_from_trailer(log, dups, &buf, ctx,
+						    oneline);
+		else if (strset_add(dups, buf.buf))
 			insert_one_record(log, buf.buf, oneline);
 	}
 
@@ -244,9 +233,6 @@ void shortlog_add_commit(struct shortlog *log, struct commit *commit)
 	}
 	oneline_str = oneline.len ? oneline.buf : "<none>";
 
-	if (log->groups & SHORTLOG_GROUP_TRAILER) {
-		insert_records_from_trailers(log, &dups, commit, &ctx, oneline_str);
-	}
 	insert_records_from_format(log, &dups, commit, &ctx, oneline_str);
 
 	strset_clear(&dups);
@@ -317,15 +303,17 @@ static int parse_group_option(const struct option *opt, const char *arg, int uns
 
 	if (unset) {
 		log->groups = 0;
-		string_list_clear(&log->trailers, 0);
 		string_list_clear(&log->format, 0);
 	} else if (!strcasecmp(arg, "author"))
 		log->groups |= SHORTLOG_GROUP_AUTHOR;
 	else if (!strcasecmp(arg, "committer"))
 		log->groups |= SHORTLOG_GROUP_COMMITTER;
 	else if (skip_prefix(arg, "trailer:", &field)) {
+		struct strbuf buf = STRBUF_INIT;
 		log->groups |= SHORTLOG_GROUP_TRAILER;
-		string_list_append(&log->trailers, field);
+		strbuf_addf(&buf, "%%(trailers:key=%s,valueonly=true,separator=%%x00)", field);
+		string_list_append(&log->format, buf.buf)->util = (void*)1;
+		strbuf_release(&buf);
 	} else if (strchrnul(arg, '%')) {
 		log->groups |= SHORTLOG_GROUP_FORMAT;
 		string_list_append(&log->format, arg);
@@ -347,8 +335,6 @@ void shortlog_init(struct shortlog *log)
 	log->wrap = DEFAULT_WRAPLEN;
 	log->in1 = DEFAULT_INDENT1;
 	log->in2 = DEFAULT_INDENT2;
-	log->trailers.strdup_strings = 1;
-	log->trailers.cmp = strcasecmp;
 	log->format.strdup_strings = 1;
 }
 
@@ -434,8 +420,6 @@ int cmd_shortlog(int argc, const char **argv, const char *prefix)
 
 	shortlog_init_group(&log);
 
-	string_list_sort(&log.trailers);
-
 	/* assume HEAD if from a tty */
 	if (!nongit && !rev.pending.nr && isatty(0))
 		add_head_to_pending(&rev);
diff --git a/shortlog.h b/shortlog.h
index e52f001fb7..ac844394a8 100644
--- a/shortlog.h
+++ b/shortlog.h
@@ -24,7 +24,6 @@ struct shortlog {
 		SHORTLOG_GROUP_TRAILER = (1 << 2),
 		SHORTLOG_GROUP_FORMAT = (1 << 3),
 	} groups;
-	struct string_list trailers;
 	struct string_list format;
 
 	int email;
-- 
2.37.0.1.g1379af2e9d
^ permalink raw reply related	[flat|nested] 72+ messages in thread
* Re: [PATCH 2/7] shortlog: accept `--date`-related options
  2022-10-11  0:34 ` [PATCH 2/7] shortlog: accept `--date`-related options Taylor Blau
@ 2022-10-11  0:48   ` Jeff King
  2022-10-11  1:14     ` Taylor Blau
  0 siblings, 1 reply; 72+ messages in thread
From: Jeff King @ 2022-10-11  0:48 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, jacob, gitster
On Mon, Oct 10, 2022 at 08:34:05PM -0400, Taylor Blau wrote:
> From: Jeff King <peff@peff.net>
> 
> Prepare for the future patch which will introduce arbitrary pretty
> formats via the `--group` argument.
> 
> To allow additional customizability (for example, to support something
> like `git shortlog -s --group='%aD' --date='format:%Y-%m' ...` (which
> groups commits by the datestring 'YYYY-mm' according to author date), we
> must store off the `--date` parsed from calling `parse_revision_opt()`.
Sorry, I haven't had a chance yet to look carefully at the rest of the
shortlog discussion, but I wanted to note here: this patch also affects
custom output formats. So:
  - we probably want to note that in the commit message as a limitation
    of this, versus something like %(authordate:format=...). I think
    that's fine, as the same limitation already applies to multiple
    dates in a single format string.
  - it's arguably a bug-fix, and can be tested in isolation like:
       git shortlog --format='%ad %s' --date=short
    which currently ignores the --date option entirely.
-Peff
^ permalink raw reply	[flat|nested] 72+ messages in thread
* Re: [PATCH 1/7] Documentation: extract date-options.txt
  2022-10-11  0:34 ` [PATCH 1/7] Documentation: extract date-options.txt Taylor Blau
@ 2022-10-11  0:54   ` Jeff King
  2022-10-11  1:02     ` Taylor Blau
  0 siblings, 1 reply; 72+ messages in thread
From: Jeff King @ 2022-10-11  0:54 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, jacob, gitster
On Mon, Oct 10, 2022 at 08:34:02PM -0400, Taylor Blau wrote:
> A future commit will want to include `--date`-related options in the
> documentation for `git-shortlog(1)`, but without some of the additional
> baggage in the usual rev-list-options.txt.
> 
> Extract those options to a separate file in Documentation and include it
> from its original source in rev-list-options.txt.
> 
> This patch does not modify the contents of the `--date`-options section
> of Documentation/rev-list-options.txt.
I think git-shortlog.txt already includes rev-list-options, but the date
parts are omitted because there's an ifndef::git-shortlog[] around the
whole "commit formatting" section.
Should we just be loosening the ifndef here?
I think we _could_ include more of the "formatting" section overall, but
it looks like we have a custom mention of "--format" in
git-shortlog.txt, which I think is a better solution anyway. Maybe we
should just do the same with `--date`?
-Peff
^ permalink raw reply	[flat|nested] 72+ messages in thread
* Re: [PATCH 1/7] Documentation: extract date-options.txt
  2022-10-11  0:54   ` Jeff King
@ 2022-10-11  1:02     ` Taylor Blau
  0 siblings, 0 replies; 72+ messages in thread
From: Taylor Blau @ 2022-10-11  1:02 UTC (permalink / raw)
  To: Jeff King; +Cc: Taylor Blau, git, jacob, gitster
On Mon, Oct 10, 2022 at 08:54:11PM -0400, Jeff King wrote:
> On Mon, Oct 10, 2022 at 08:34:02PM -0400, Taylor Blau wrote:
>
> > A future commit will want to include `--date`-related options in the
> > documentation for `git-shortlog(1)`, but without some of the additional
> > baggage in the usual rev-list-options.txt.
> >
> > Extract those options to a separate file in Documentation and include it
> > from its original source in rev-list-options.txt.
> >
> > This patch does not modify the contents of the `--date`-options section
> > of Documentation/rev-list-options.txt.
>
> I think git-shortlog.txt already includes rev-list-options, but the date
> parts are omitted because there's an ifndef::git-shortlog[] around the
> whole "commit formatting" section.
>
> Should we just be loosening the ifndef here?
That would do the trick. When I originally wrote it, I thought that
chopping the section up would be trickier than it actually was, which is
really just limited to the following:
--- 8< ---
diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt
index 1cb91dfb9c..d5fbde6c80 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -1033,8 +1033,12 @@ endif::git-rev-list[]
 include::pretty-options.txt[]
+endif::git-shortlog[]
+
 include::date-options.txt[]
+ifndef::git-shortlog[]
+
 ifdef::git-rev-list[]
 --header::
 	Print the contents of the commit in raw-format; each record is
--- >8 ---
(Obviously reverting this patch to replace the include of
date-options.txt with the old contents that was there before this
patch).
> I think we _could_ include more of the "formatting" section overall, but
> it looks like we have a custom mention of "--format" in
> git-shortlog.txt, which I think is a better solution anyway. Maybe we
> should just do the same with `--date`?
That works for me, too. Let's do that ;-).
Thanks,
Taylor
^ permalink raw reply related	[flat|nested] 72+ messages in thread
* Re: [PATCH 4/7] shortlog: support arbitrary commit format `--group`s
  2022-10-11  0:34 ` [PATCH 4/7] shortlog: support arbitrary commit format `--group`s Taylor Blau
@ 2022-10-11  1:12   ` Jeff King
  2022-10-11  1:24     ` Taylor Blau
  0 siblings, 1 reply; 72+ messages in thread
From: Jeff King @ 2022-10-11  1:12 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, jacob, gitster
On Mon, Oct 10, 2022 at 08:34:12PM -0400, Taylor Blau wrote:
> diff --git a/Documentation/git-shortlog.txt b/Documentation/git-shortlog.txt
> index 4982ceee21..ca15643f45 100644
> --- a/Documentation/git-shortlog.txt
> +++ b/Documentation/git-shortlog.txt
> @@ -59,6 +59,8 @@ OPTIONS
>     example, if your project uses `Reviewed-by` trailers, you might want
>     to see who has been reviewing with
>     `git shortlog -ns --group=trailer:reviewed-by`.
> + - `<format>`, any string accepted by the `--format` option of 'git log'.
> +   (See the "PRETTY FORMATS" section of linkgit:git-log[1].)
I have a slight preference here to call this:
  --group=format:<format>
and then DWIM:
  --group=<format>
as long as <format> contains a percent. This is similar to how --pretty
works, but leaves us more room for changing things later if we need to.
> +static void insert_records_from_format(struct shortlog *log,
> +				       struct strset *dups,
> +				       struct commit *commit,
> +				       struct pretty_print_context *ctx,
> +				       const char *oneline)
> +{
> +	struct strbuf buf = STRBUF_INIT;
> +	struct string_list_item *item;
> +
> +	for_each_string_list_item(item, &log->format) {
> +		strbuf_reset(&buf);
> +
> +		format_commit_message(commit, item->string, &buf, ctx);
> +
> +		if (strset_add(dups, buf.buf))
> +			insert_one_record(log, buf.buf, oneline);
> +	}
> +
> +	strbuf_release(&buf);
> +}
Versus the earlier patch we discussed, this is one identity per --group,
so it matches how --group=trailer works. Good.
> @@ -243,6 +266,7 @@ void shortlog_add_commit(struct shortlog *log, struct commit *commit)
>  	if (log->groups & SHORTLOG_GROUP_TRAILER) {
>  		insert_records_from_trailers(log, &dups, commit, &ctx, oneline_str);
>  	}
> +	insert_records_from_format(log, &dups, commit, &ctx, oneline_str);
Hmm. I see why we don't need to guard this with:
  if (log->groups & SHORTLOG_GROUP_FORMAT)
since our helper function is a noop if log->format is empty. But I
wonder if:
  - it's more clear to match the others (although it looks like they are
    going away later, so that potentially becomes a non-issue)
  - it's useful to have a conditional that lets us skip any setup work.
    For trailers, in particular, we call logmsg_reencode(), which is
    potentially expensive. OTOH, it would be easy for the helper
    function to just return early when log->format.nr is 0.
So I'll reserve judgement until seeing the end state.
> diff --git a/shortlog.h b/shortlog.h
> index dc388dd459..4850a8c30f 100644
> --- a/shortlog.h
> +++ b/shortlog.h
> @@ -22,8 +22,10 @@ struct shortlog {
>  		SHORTLOG_GROUP_AUTHOR = (1 << 0),
>  		SHORTLOG_GROUP_COMMITTER = (1 << 1),
>  		SHORTLOG_GROUP_TRAILER = (1 << 2),
> +		SHORTLOG_GROUP_FORMAT = (1 << 3),
>  	} groups;
>  	struct string_list trailers;
> +	struct string_list format;
Given that "shortlog --format" does a totally different thing, I wonder
if this needs a more descriptive name like group_format or something.
Probably not. That other format feature is all in rev_info, and it's not
like "trailers" above doesn't have a similar issue.
> diff --git a/t/t4201-shortlog.sh b/t/t4201-shortlog.sh
> index 3095b1b2ff..2417ac8896 100755
> --- a/t/t4201-shortlog.sh
> +++ b/t/t4201-shortlog.sh
> @@ -237,6 +237,15 @@ test_expect_success 'shortlog --group=trailer:signed-off-by' '
>  	test_cmp expect actual
>  '
>  
> +test_expect_success 'shortlog --group=<format>' '
> +	git shortlog -s --date="format:%Y" --group="%cN (%cd)" HEAD >actual &&
> +	cat >expect <<-\EOF &&
> +	     4	C O Mitter (2005)
> +	     1	Sin Nombre (2005)
> +	EOF
> +	test_cmp expect actual
> +'
Makes sense. Two other tests that might be worth including:
  - "shortlog --group=bogus" generates an error (we might already have
    such a test; I didn't check)
  - since the multiple-option behavior is so subtle, maybe show a case
    where two formats partially overlap. A plausible one is "--group=%aN
    --group=%cN", but the test setup might need tweaked to cover both.
    There's an existing "multiple groups" test that might come in handy.
-Peff
^ permalink raw reply	[flat|nested] 72+ messages in thread
* Re: [PATCH 2/7] shortlog: accept `--date`-related options
  2022-10-11  0:48   ` Jeff King
@ 2022-10-11  1:14     ` Taylor Blau
  0 siblings, 0 replies; 72+ messages in thread
From: Taylor Blau @ 2022-10-11  1:14 UTC (permalink / raw)
  To: Jeff King; +Cc: Taylor Blau, git, jacob, gitster
On Mon, Oct 10, 2022 at 08:48:49PM -0400, Jeff King wrote:
>   - it's arguably a bug-fix, and can be tested in isolation like:
>
>        git shortlog --format='%ad %s' --date=short
>
>     which currently ignores the --date option entirely.
Good call-out, and I agree with this reasoning. I updated the patch
accordingly, thanks!
Thanks,
Taylor
^ permalink raw reply	[flat|nested] 72+ messages in thread
* Re: [PATCH 4/7] shortlog: support arbitrary commit format `--group`s
  2022-10-11  1:12   ` Jeff King
@ 2022-10-11  1:24     ` Taylor Blau
  2022-10-11  1:28       ` Taylor Blau
  2022-10-11  2:02       ` Jeff King
  0 siblings, 2 replies; 72+ messages in thread
From: Taylor Blau @ 2022-10-11  1:24 UTC (permalink / raw)
  To: Jeff King; +Cc: Taylor Blau, git, jacob, gitster
On Mon, Oct 10, 2022 at 09:12:46PM -0400, Jeff King wrote:
> On Mon, Oct 10, 2022 at 08:34:12PM -0400, Taylor Blau wrote:
>
> > diff --git a/Documentation/git-shortlog.txt b/Documentation/git-shortlog.txt
> > index 4982ceee21..ca15643f45 100644
> > --- a/Documentation/git-shortlog.txt
> > +++ b/Documentation/git-shortlog.txt
> > @@ -59,6 +59,8 @@ OPTIONS
> >     example, if your project uses `Reviewed-by` trailers, you might want
> >     to see who has been reviewing with
> >     `git shortlog -ns --group=trailer:reviewed-by`.
> > + - `<format>`, any string accepted by the `--format` option of 'git log'.
> > +   (See the "PRETTY FORMATS" section of linkgit:git-log[1].)
>
> I have a slight preference here to call this:
>
>   --group=format:<format>
That seems very reasonable, thanks!
> > @@ -243,6 +266,7 @@ void shortlog_add_commit(struct shortlog *log, struct commit *commit)
> >  	if (log->groups & SHORTLOG_GROUP_TRAILER) {
> >  		insert_records_from_trailers(log, &dups, commit, &ctx, oneline_str);
> >  	}
> > +	insert_records_from_format(log, &dups, commit, &ctx, oneline_str);
>
> Hmm. I see why we don't need to guard this with:
>
>   if (log->groups & SHORTLOG_GROUP_FORMAT)
>
> since our helper function is a noop if log->format is empty. But I
> wonder if:
>
>   - it's more clear to match the others (although it looks like they are
>     going away later, so that potentially becomes a non-issue)
>
>   - it's useful to have a conditional that lets us skip any setup work.
>     For trailers, in particular, we call logmsg_reencode(), which is
>     potentially expensive. OTOH, it would be easy for the helper
>     function to just return early when log->format.nr is 0.
In this case, `insert_records_from_format()` is cheap when
log->format.nr is 0. It is limited to setting up a strbuf to
STRBUF_INIT, and then calling strbuf_release() on it before returning.
And, indeed, the remaining conditionals go away by the final patch, so
you may want to decide then if it looks good to you or not.
> > diff --git a/t/t4201-shortlog.sh b/t/t4201-shortlog.sh
> > index 3095b1b2ff..2417ac8896 100755
> > --- a/t/t4201-shortlog.sh
> > +++ b/t/t4201-shortlog.sh
> > @@ -237,6 +237,15 @@ test_expect_success 'shortlog --group=trailer:signed-off-by' '
> >  	test_cmp expect actual
> >  '
> >
> > +test_expect_success 'shortlog --group=<format>' '
> > +	git shortlog -s --date="format:%Y" --group="%cN (%cd)" HEAD >actual &&
> > +	cat >expect <<-\EOF &&
> > +	     4	C O Mitter (2005)
> > +	     1	Sin Nombre (2005)
> > +	EOF
> > +	test_cmp expect actual
> > +'
>
> Makes sense. Two other tests that might be worth including:
>
>   - "shortlog --group=bogus" generates an error (we might already have
>     such a test; I didn't check)
We didn't have such a test before, so adding one is a good call, thanks!
>   - since the multiple-option behavior is so subtle, maybe show a case
>     where two formats partially overlap. A plausible one is "--group=%aN
>     --group=%cN", but the test setup might need tweaked to cover both.
>     There's an existing "multiple groups" test that might come in handy.
Interesting. I was starting to write that test up, but then realized
that this will be covered by the end of the series, since the
`--group=trailer` machinery is reimplemented in terms of the new format
group.
So if the existing "shortlog can match multiple groups" test keeps
passing, we did our job correctly ;-).
Thanks,
Taylor
^ permalink raw reply	[flat|nested] 72+ messages in thread
* Re: [PATCH 4/7] shortlog: support arbitrary commit format `--group`s
  2022-10-11  1:24     ` Taylor Blau
@ 2022-10-11  1:28       ` Taylor Blau
  2022-10-11  2:03         ` Jeff King
  2022-10-11  2:02       ` Jeff King
  1 sibling, 1 reply; 72+ messages in thread
From: Taylor Blau @ 2022-10-11  1:28 UTC (permalink / raw)
  To: Jeff King; +Cc: git, jacob, gitster
On Mon, Oct 10, 2022 at 09:24:32PM -0400, Taylor Blau wrote:
> > Makes sense. Two other tests that might be worth including:
> >
> >   - "shortlog --group=bogus" generates an error (we might already have
> >     such a test; I didn't check)
>
> We didn't have such a test before, so adding one is a good call, thanks!
Just writing back to say that this was a really good suggestion, since
it caught my mistake in writing:
    } else if (strchrnul(arg, '%')) {
since we'll always get back a non-NULL answer from calling `strchrnul`
instead of `strchr`.
Thanks ;-).
Thanks,
Taylor
^ permalink raw reply	[flat|nested] 72+ messages in thread
* Re: [PATCH 5/7] shortlog: implement `--group=author` in terms of `--group=<format>`
  2022-10-11  0:34 ` [PATCH 5/7] shortlog: implement `--group=author` in terms of `--group=<format>` Taylor Blau
@ 2022-10-11  1:34   ` Jeff King
  2022-10-11  1:48     ` Taylor Blau
  2022-10-11 10:57   ` Ævar Arnfjörð Bjarmason
  1 sibling, 1 reply; 72+ messages in thread
From: Jeff King @ 2022-10-11  1:34 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, jacob, gitster
On Mon, Oct 10, 2022 at 08:34:15PM -0400, Taylor Blau wrote:
> Instead of handling SHORTLOG_GROUP_AUTHOR separately, reimplement it as
> a special case of the new `--group=<format>` mode, where the author mode
> is a shorthand for `--group='%aN <%aE>'.
OK, this should be a nice cleanup.
> diff --git a/builtin/log.c b/builtin/log.c
> index ee19dc5d45..6b77e520b5 100644
> --- a/builtin/log.c
> +++ b/builtin/log.c
> @@ -1334,6 +1334,7 @@ static void make_cover_letter(struct rev_info *rev, int use_separate_file,
>  	log.in2 = 4;
>  	log.file = rev->diffopt.file;
>  	log.groups = SHORTLOG_GROUP_AUTHOR;
> +	shortlog_init_group(&log);
>  	for (i = 0; i < nr; i++)
>  		shortlog_add_commit(&log, list[i]);
In another caller you drop the assignment of log.groups, since
shortlog_init_group() already does so if log.groups is 0 (which it will
be, since shortlog_init() will zero-initialize).
Should we do the same here? Or maybe leaving it is more obvious. It
would be more obvious still if we made the helper take the type, like:
  shortlog_init_group(&log, SHORTLOG_GROUP_AUTHOR);
But I guess that is not accurate, as we'd eventually use this in
shortlog.c to turn _any_ bits we've accumulated in log.group into their
correct formats.
I think the name of the helper function puzzled me a bit. It is really
"finish up any setup for the shortlog struct". We could lazily do it in
shortlog_add_commit(), but that's pretty subtle. But could we maybe call
it:
  shortlog_finish_setup();
or something? I dunno. I might be nit-picking, but I actually had to
scratch my head quite a bit to understand what was going on here.
> diff --git a/builtin/shortlog.c b/builtin/shortlog.c
> index f708d96558..aac8c7afa4 100644
> --- a/builtin/shortlog.c
> +++ b/builtin/shortlog.c
> @@ -245,15 +245,6 @@ void shortlog_add_commit(struct shortlog *log, struct commit *commit)
>  	}
>  	oneline_str = oneline.len ? oneline.buf : "<none>";
>  
> -	if (log->groups & SHORTLOG_GROUP_AUTHOR) {
> -		strbuf_reset(&ident);
> -		format_commit_message(commit,
> -				      log->email ? "%aN <%aE>" : "%aN",
> -				      &ident, &ctx);
> -		if (!HAS_MULTI_BITS(log->groups) ||
> -		    strset_add(&dups, ident.buf))
> -			insert_one_record(log, ident.buf, oneline_str);
> -	}
This loses the HAS_MULTI_BITS() optimization. The idea there is that if
you have a single group-by that can't produce multiple outputs, then
there's no need to do duplicate detection.
The equivalent in an all-formats world is something like:
  log.format.nr > 1 && !log.trailers.nr
(because trailers are special in that one trailer key can produce
multiple idents for a single commit).
> +void shortlog_init_group(struct shortlog *log)
> +{
> +	if (!log->groups)
> +		log->groups = SHORTLOG_GROUP_AUTHOR;
> +
> +	if (log->groups & SHORTLOG_GROUP_AUTHOR)
> +		string_list_append(&log->format,
> +				   log->email ? "%aN <%aE>" : "%aN");
> +}
Regardless of the naming suggestion I made, I think things would be more
obvious if this top conditional remained in cmd_shortlog(). And then the
explicit assignment of "log.group" in make_cover_letter() would remain.
> @@ -439,8 +440,8 @@ int cmd_shortlog(int argc, const char **argv, const char *prefix)
>  	log.file = rev.diffopt.file;
>  	log.date_mode = rev.date_mode;
>  
> -	if (!log.groups)
> -		log.groups = SHORTLOG_GROUP_AUTHOR;
> +	shortlog_init_group(&log);
> +
>  	string_list_sort(&log.trailers);
Now that we have a "finish the shortlog init" function, probably this
trailer sort should go into it, too. The current caller doesn't need it,
but it removes on more gotcha from using the shortlog API.
-Peff
^ permalink raw reply	[flat|nested] 72+ messages in thread
* Re: [PATCH 6/7] shortlog: implement `--group=committer` in terms of `--group=<format>`
  2022-10-11  0:34 ` [PATCH 6/7] shortlog: implement `--group=committer` " Taylor Blau
@ 2022-10-11  1:35   ` Jeff King
  0 siblings, 0 replies; 72+ messages in thread
From: Jeff King @ 2022-10-11  1:35 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, jacob, gitster
On Mon, Oct 10, 2022 at 08:34:18PM -0400, Taylor Blau wrote:
> In the same spirit as the previous commit, reimplement
> `--group=committer` as a special case of `--group=<format>`, too.
Makes sense. This all looks as expected, but the same HAS_MULTI_BITS()
comment applies.
-Peff
^ permalink raw reply	[flat|nested] 72+ messages in thread
* Re: [PATCH 5/7] shortlog: implement `--group=author` in terms of `--group=<format>`
  2022-10-11  1:34   ` Jeff King
@ 2022-10-11  1:48     ` Taylor Blau
  2022-10-11  2:15       ` Jeff King
  2022-10-21  2:02       ` Taylor Blau
  0 siblings, 2 replies; 72+ messages in thread
From: Taylor Blau @ 2022-10-11  1:48 UTC (permalink / raw)
  To: Jeff King; +Cc: Taylor Blau, git, jacob, gitster
On Mon, Oct 10, 2022 at 09:34:42PM -0400, Jeff King wrote:
> On Mon, Oct 10, 2022 at 08:34:15PM -0400, Taylor Blau wrote:
>
> I think the name of the helper function puzzled me a bit. It is really
> "finish up any setup for the shortlog struct". We could lazily do it in
> shortlog_add_commit(), but that's pretty subtle. But could we maybe call
> it:
>
>   shortlog_finish_setup();
>
> or something? I dunno. I might be nit-picking, but I actually had to
> scratch my head quite a bit to understand what was going on here.
I think that shortlog_finish_setup() is probably the most reasonable
choice (and I changed it to that locally). I spent a day or so thinking
on and off about this while writing the series, and didn't come up with
any satisfying names.
I think it points to something deeper about the API that doesn't quite
sit right with me. But shortlog_finish_setup() is the least-unsatisfying
name so far, so let's go with that ;-).
> > diff --git a/builtin/shortlog.c b/builtin/shortlog.c
> > index f708d96558..aac8c7afa4 100644
> > --- a/builtin/shortlog.c
> > +++ b/builtin/shortlog.c
> > @@ -245,15 +245,6 @@ void shortlog_add_commit(struct shortlog *log, struct commit *commit)
> >  	}
> >  	oneline_str = oneline.len ? oneline.buf : "<none>";
> >
> > -	if (log->groups & SHORTLOG_GROUP_AUTHOR) {
> > -		strbuf_reset(&ident);
> > -		format_commit_message(commit,
> > -				      log->email ? "%aN <%aE>" : "%aN",
> > -				      &ident, &ctx);
> > -		if (!HAS_MULTI_BITS(log->groups) ||
> > -		    strset_add(&dups, ident.buf))
> > -			insert_one_record(log, ident.buf, oneline_str);
> > -	}
>
> This loses the HAS_MULTI_BITS() optimization. The idea there is that if
> you have a single group-by that can't produce multiple outputs, then
> there's no need to do duplicate detection.
>
> The equivalent in an all-formats world is something like:
>
>   log.format.nr > 1 && !log.trailers.nr
>
> (because trailers are special in that one trailer key can produce
> multiple idents for a single commit).
Hmm. I suspect that this is going to become more complicated by the time
that you read the final patch ;-). Let's wait until then and see what
you think.
> > +void shortlog_init_group(struct shortlog *log)
> > +{
> > +	if (!log->groups)
> > +		log->groups = SHORTLOG_GROUP_AUTHOR;
> > +
> > +	if (log->groups & SHORTLOG_GROUP_AUTHOR)
> > +		string_list_append(&log->format,
> > +				   log->email ? "%aN <%aE>" : "%aN");
> > +}
>
> Regardless of the naming suggestion I made, I think things would be more
> obvious if this top conditional remained in cmd_shortlog(). And then the
> explicit assignment of "log.group" in make_cover_letter() would remain.
Yep, works for me.
> > @@ -439,8 +440,8 @@ int cmd_shortlog(int argc, const char **argv, const char *prefix)
> >  	log.file = rev.diffopt.file;
> >  	log.date_mode = rev.date_mode;
> >
> > -	if (!log.groups)
> > -		log.groups = SHORTLOG_GROUP_AUTHOR;
> > +	shortlog_init_group(&log);
> > +
> >  	string_list_sort(&log.trailers);
>
> Now that we have a "finish the shortlog init" function, probably this
> trailer sort should go into it, too. The current caller doesn't need it,
> but it removes on more gotcha from using the shortlog API.
We'll drop this list by the end of the series, too, so it probably isn't
worth moving it into shortlog_finish_setup() in the interim.
> -Peff
Thanks,
Taylor
^ permalink raw reply	[flat|nested] 72+ messages in thread
* Re: [PATCH 7/7] shortlog: implement `--group=trailer` in terms of `--group=<format>`
  2022-10-11  0:34 ` [PATCH 7/7] shortlog: implement `--group=trailer` " Taylor Blau
@ 2022-10-11  1:50   ` Jeff King
  2022-10-11  1:57     ` Jeff King
  2022-10-21  1:38     ` Taylor Blau
  0 siblings, 2 replies; 72+ messages in thread
From: Jeff King @ 2022-10-11  1:50 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, jacob, gitster
On Mon, Oct 10, 2022 at 08:34:21PM -0400, Taylor Blau wrote:
> In the same spirit as the previous commit, reimplement
> `--group=trailer:<key>` as a special case of `--group=<format>`, too.
> 
> Unsurprisingly, this reimplementation is a little bit more complicated
> than the previous two. The complexity stems from having to enumerate
> each of the trailers one-by-one, as well as delegating control to
> `parse_ident()` to handle whether or not to show an individual's email.
> 
> To enumerate each trailer in a commit, we set the separator used in the
> pretty format to be a NUL byte. This hack allows us to treat the strbuf
> from `format_commit_message()` as an array of strings.
Hmph. So that's _probably_ OK, but I have to wonder if this is going too
far. The resulting code is not that much shorter, and IMHO is actually a
little more complicated, because of this hack and the extra util bit.
I also would be curious if there is any speed difference between the
two.
I spent a fair bit of time optimizing the regular author/committer paths
(and I think if we restore the skip-the-dedup logic I mentioned earlier
for those, they should be equivalent, as they already use
format_commit_message()). But the internals of the trailer code are
already so slow I doubt it makes much of a difference either way here.
The one thing that could hurt is that multiple trailers now require
multiple format calls, which may load the commit object for each one. I
think _probably_ not because we'd hopefully have it cached in the commit
slab via the save_commit_buffer mechanism. Though I guess if we used the
commit graph in the first place, we would need to load it. We'd also
likely reencode, but that's a noop on utf8 commits, so it's probably not
worth worrying about.
-Peff
PS That last paragraph gives me the suspicion that there's some
   low-hanging optimization fruit here: we want save_commit_buffer on in
   general, but after we visit each commit, we probably should call
   free_commit_buffer() on it to get rid of it, like we do for git-log.
   Probably shortlog is keeping all of the commit messages in memory at
   once (though of course not if the commit-graph is in use).
   If the commit-graph _is_ in use, there's probably the opposite
   optimization we could be doing: we should make sure to load it once,
   then do all of the formats, and then unuse it.
   Those don't have to be part of your series, but might be worth
   looking into while you're in the area.
^ permalink raw reply	[flat|nested] 72+ messages in thread
* Re: [PATCH 7/7] shortlog: implement `--group=trailer` in terms of `--group=<format>`
  2022-10-11  1:50   ` Jeff King
@ 2022-10-11  1:57     ` Jeff King
  2022-10-21  1:38     ` Taylor Blau
  1 sibling, 0 replies; 72+ messages in thread
From: Jeff King @ 2022-10-11  1:57 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, jacob, gitster
On Mon, Oct 10, 2022 at 09:50:42PM -0400, Jeff King wrote:
> I spent a fair bit of time optimizing the regular author/committer paths
> (and I think if we restore the skip-the-dedup logic I mentioned earlier
> for those, they should be equivalent, as they already use
> format_commit_message()). But the internals of the trailer code are
> already so slow I doubt it makes much of a difference either way here.
> 
> The one thing that could hurt is that multiple trailers now require
> multiple format calls, which may load the commit object for each one. I
> think _probably_ not because we'd hopefully have it cached in the commit
> slab via the save_commit_buffer mechanism. Though I guess if we used the
> commit graph in the first place, we would need to load it. We'd also
> likely reencode, but that's a noop on utf8 commits, so it's probably not
> worth worrying about.
Looking at it more carefully, your new code also does an individual
trailer parse for each trailer. And that code is also really slow, and
better if we call it only once.
-Peff
^ permalink raw reply	[flat|nested] 72+ messages in thread
* Re: [PATCH 4/7] shortlog: support arbitrary commit format `--group`s
  2022-10-11  1:24     ` Taylor Blau
  2022-10-11  1:28       ` Taylor Blau
@ 2022-10-11  2:02       ` Jeff King
  2022-10-21  2:39         ` Taylor Blau
  1 sibling, 1 reply; 72+ messages in thread
From: Jeff King @ 2022-10-11  2:02 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, jacob, gitster
On Mon, Oct 10, 2022 at 09:24:32PM -0400, Taylor Blau wrote:
> > Hmm. I see why we don't need to guard this with:
> >
> >   if (log->groups & SHORTLOG_GROUP_FORMAT)
> >
> > since our helper function is a noop if log->format is empty. But I
> > wonder if:
> >
> >   - it's more clear to match the others (although it looks like they are
> >     going away later, so that potentially becomes a non-issue)
> >
> >   - it's useful to have a conditional that lets us skip any setup work.
> >     For trailers, in particular, we call logmsg_reencode(), which is
> >     potentially expensive. OTOH, it would be easy for the helper
> >     function to just return early when log->format.nr is 0.
> 
> In this case, `insert_records_from_format()` is cheap when
> log->format.nr is 0. It is limited to setting up a strbuf to
> STRBUF_INIT, and then calling strbuf_release() on it before returning.
> 
> And, indeed, the remaining conditionals go away by the final patch, so
> you may want to decide then if it looks good to you or not.
Right. Having read to the end, a few new thoughts:
  - I'm skeptical on the conversion of --group=trailer to use the
    formats. It seems more complicated. The others seem fine.
    As a result, I do think it's awkward to check bits for log.format
    (since now there are many), and we should just enter the helper
    function unconditionally. I do think it's a little weird to check
    the TRAILER bit just before it though (assuming we'd leave that in
    place). But it would be natural for us to just return early and skip
    any setup work in insert_records_from_trailers(). I.e., something
    like this:
    diff --git a/builtin/shortlog.c b/builtin/shortlog.c
    index 086dfee45a..19d953c26a 100644
    --- a/builtin/shortlog.c
    +++ b/builtin/shortlog.c
    @@ -170,6 +170,9 @@ static void insert_records_from_trailers(struct shortlog *log,
     	const char *commit_buffer, *body;
     	struct strbuf ident = STRBUF_INIT;
     
    +	if (!log->trailers.nr)
    +		return;
    +
     	/*
     	 * Using format_commit_message("%B") would be simpler here, but
     	 * this saves us copying the message.
    @@ -240,9 +243,7 @@ void shortlog_add_commit(struct shortlog *log, struct commit *commit)
     		    strset_add(&dups, ident.buf))
     			insert_one_record(log, ident.buf, oneline_str);
     	}
    -	if (log->groups & SHORTLOG_GROUP_TRAILER) {
    -		insert_records_from_trailers(log, &dups, commit, &ctx, oneline_str);
    -	}
    +	insert_records_from_trailers(log, &dups, commit, &ctx, oneline_str);
     
     	strset_clear(&dups);
     	strbuf_release(&ident);
  - I mentioned possible setup work. And indeed, I think it may be
    useful for insert_records_from_format() to load and cache the commit
    message once via get_commit_buffer(), since that result could then
    be used by all formats. But that is not really specific to
    --group=format! The existing code would benefit for any multi-group
    invocation. So any such setup should probably be at the top of
    shortlog_add_commit() anyway.
> >   - since the multiple-option behavior is so subtle, maybe show a case
> >     where two formats partially overlap. A plausible one is "--group=%aN
> >     --group=%cN", but the test setup might need tweaked to cover both.
> >     There's an existing "multiple groups" test that might come in handy.
> 
> Interesting. I was starting to write that test up, but then realized
> that this will be covered by the end of the series, since the
> `--group=trailer` machinery is reimplemented in terms of the new format
> group.
True, if we follow through on that. ;)
-Peff
^ permalink raw reply	[flat|nested] 72+ messages in thread
* Re: [PATCH 4/7] shortlog: support arbitrary commit format `--group`s
  2022-10-11  1:28       ` Taylor Blau
@ 2022-10-11  2:03         ` Jeff King
  0 siblings, 0 replies; 72+ messages in thread
From: Jeff King @ 2022-10-11  2:03 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, jacob, gitster
On Mon, Oct 10, 2022 at 09:28:24PM -0400, Taylor Blau wrote:
> On Mon, Oct 10, 2022 at 09:24:32PM -0400, Taylor Blau wrote:
> > > Makes sense. Two other tests that might be worth including:
> > >
> > >   - "shortlog --group=bogus" generates an error (we might already have
> > >     such a test; I didn't check)
> >
> > We didn't have such a test before, so adding one is a good call, thanks!
> 
> Just writing back to say that this was a really good suggestion, since
> it caught my mistake in writing:
> 
>     } else if (strchrnul(arg, '%')) {
> 
> since we'll always get back a non-NULL answer from calling `strchrnul`
> instead of `strchr`.
I'm going to pretend that I noticed that bug and was gently leading you
to it, rather than that I was oblivious and lucky.
-Peff
^ permalink raw reply	[flat|nested] 72+ messages in thread
* Re: [PATCH 5/7] shortlog: implement `--group=author` in terms of `--group=<format>`
  2022-10-11  1:48     ` Taylor Blau
@ 2022-10-11  2:15       ` Jeff King
  2022-10-21  2:03         ` Taylor Blau
  2022-10-21  2:02       ` Taylor Blau
  1 sibling, 1 reply; 72+ messages in thread
From: Jeff King @ 2022-10-11  2:15 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, jacob, gitster
On Mon, Oct 10, 2022 at 09:48:47PM -0400, Taylor Blau wrote:
> I think that shortlog_finish_setup() is probably the most reasonable
> choice (and I changed it to that locally). I spent a day or so thinking
> on and off about this while writing the series, and didn't come up with
> any satisfying names.
> 
> I think it points to something deeper about the API that doesn't quite
> sit right with me. But shortlog_finish_setup() is the least-unsatisfying
> name so far, so let's go with that ;-).
Yeah, touching that block in make_cover_letter() definitely tickled my
memory that there was some awkwardness there. That's when I added
shortlog_init() at all (which is good, because otherwise you'd have had
an uninitialized log.format string-list).
I think the general pattern of:
  foo_init();
  foo.option = whatever;
  foo_finish_setup();
  foo_do_the_thing();
is OK. It's a little complicated, but it gives callers the freedom to
tweak options with a lot of flexibility (e.g., based on command line
config, command line options, etc). We have a similar pattern with
diff_setup_done().
The other option is for any option-tweaking to go through methods which
maintain invariants (like if GROUP_AUTHOR is set, then the "%an" format
has been added). That's usually the "right" object-oriented way to do
it. But I think even in this simple case it gets awkward, because the
correct format can't be known until you see whether log->email is set.
So it really has to be a "finalize" step after both log->email and
log->group are set.
> > This loses the HAS_MULTI_BITS() optimization. The idea there is that if
> > you have a single group-by that can't produce multiple outputs, then
> > there's no need to do duplicate detection.
> >
> > The equivalent in an all-formats world is something like:
> >
> >   log.format.nr > 1 && !log.trailers.nr
> >
> > (because trailers are special in that one trailer key can produce
> > multiple idents for a single commit).
> 
> Hmm. I suspect that this is going to become more complicated by the time
> that you read the final patch ;-). Let's wait until then and see what
> you think.
Yes, it's pretty gross with the trailer util thing. You'd probably want
to do something like:
  static int needs_dedup(const struct string_list *formats)
  {
	struct string_list_item *item;
	if (formats->nr > 1)
		return 1;
	for_each_string_list_item(item, formats) {
		if (item->util)
			return 1;
	}
	return 0;
  }
and perhaps call it only once and cache the result, rather than
iterating for each commit/format.
But if we leave trailers as is, then the logic is much easier. And
implementing the optimization for --group=format should fall out pretty
naturally (and that both preserves it for --group=author, and extends it
to --group=format, which I should have noticed when reviewing patch 4).
> > > @@ -439,8 +440,8 @@ int cmd_shortlog(int argc, const char **argv, const char *prefix)
> > >  	log.file = rev.diffopt.file;
> > >  	log.date_mode = rev.date_mode;
> > >
> > > -	if (!log.groups)
> > > -		log.groups = SHORTLOG_GROUP_AUTHOR;
> > > +	shortlog_init_group(&log);
> > > +
> > >  	string_list_sort(&log.trailers);
> >
> > Now that we have a "finish the shortlog init" function, probably this
> > trailer sort should go into it, too. The current caller doesn't need it,
> > but it removes on more gotcha from using the shortlog API.
> 
> We'll drop this list by the end of the series, too, so it probably isn't
> worth moving it into shortlog_finish_setup() in the interim.
Ah, right. Well, see my other comments. :)
-Peff
^ permalink raw reply	[flat|nested] 72+ messages in thread
* Re: [PATCH 3/7] shortlog: extract `--group` fragment for translation
  2022-10-11  0:34 ` [PATCH 3/7] shortlog: extract `--group` fragment for translation Taylor Blau
@ 2022-10-11 10:55   ` Ævar Arnfjörð Bjarmason
  2022-10-11 13:24     ` Jeff King
  0 siblings, 1 reply; 72+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-10-11 10:55 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, jacob, peff, gitster
On Mon, Oct 10 2022, Taylor Blau wrote:
> The subsequent commit will add another unhandled case in
> `read_from_stdin()` which will want to use the same message as with
> `--group=trailer`.
>
> Extract the "--group=trailer" part from this message so the same
> translation key can be used for both cases.
>
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> ---
>  builtin/shortlog.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/builtin/shortlog.c b/builtin/shortlog.c
> index 53c379a51d..051c97bd5a 100644
> --- a/builtin/shortlog.c
> +++ b/builtin/shortlog.c
> @@ -132,7 +132,7 @@ static void read_from_stdin(struct shortlog *log)
>  		match = committer_match;
>  		break;
>  	case SHORTLOG_GROUP_TRAILER:
> -		die(_("using --group=trailer with stdin is not supported"));
> +		die(_("using %s with stdin is not supported"), "--group=trailer");
>  	default:
>  		BUG("unhandled shortlog group");
>  	}
Rather than add another translation that you can use in 2x places here
(with 4/7) instead do:
diff --git a/builtin/shortlog.c b/builtin/shortlog.c
index 7a1e1fe7c0e..59aef24f637 100644
--- a/builtin/shortlog.c
+++ b/builtin/shortlog.c
@@ -132,7 +132,8 @@ static void read_from_stdin(struct shortlog *log)
 		match = committer_match;
 		break;
 	case SHORTLOG_GROUP_TRAILER:
-		die(_("using --group=trailer with stdin is not supported"));
+		die(_("options '%s' and '%s' cannot be used together"),
+		    "--group=<trailer>", "--stdin");
 	default:
 		BUG("unhandled shortlog group");
 	}
Which as you can see with "git grep 'cannot be used together'" we
already use in a lot of places.
We should probably extract this to a:
	die_opt_2_incompatible("--group=<trailer>", "--stdin");
But that can all be done in some hypothetical future, but for now the
status quo is that we know of these common strings, and copy/paste them
in various files.
^ permalink raw reply related	[flat|nested] 72+ messages in thread
* Re: [PATCH 5/7] shortlog: implement `--group=author` in terms of `--group=<format>`
  2022-10-11  0:34 ` [PATCH 5/7] shortlog: implement `--group=author` in terms of `--group=<format>` Taylor Blau
  2022-10-11  1:34   ` Jeff King
@ 2022-10-11 10:57   ` Ævar Arnfjörð Bjarmason
  2022-10-11 11:07     ` Ævar Arnfjörð Bjarmason
  1 sibling, 1 reply; 72+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-10-11 10:57 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, jacob, peff, gitster
On Mon, Oct 10 2022, Taylor Blau wrote:
> Instead of handling SHORTLOG_GROUP_AUTHOR separately, reimplement it as
> a special case of the new `--group=<format>` mode, where the author mode
> is a shorthand for `--group='%aN <%aE>'.
>
> Note that we still need to keep the SHORTLOG_GROUP_AUTHOR enum since it
> has a different meaning in `read_from_stdin()`, where it is still used
> for a different purpose.
>
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> ---
>  builtin/log.c      |  1 +
>  builtin/shortlog.c | 23 ++++++++++++-----------
>  shortlog.h         |  1 +
>  3 files changed, 14 insertions(+), 11 deletions(-)
>
> diff --git a/builtin/log.c b/builtin/log.c
> index ee19dc5d45..6b77e520b5 100644
> --- a/builtin/log.c
> +++ b/builtin/log.c
> @@ -1334,6 +1334,7 @@ static void make_cover_letter(struct rev_info *rev, int use_separate_file,
>  	log.in2 = 4;
>  	log.file = rev->diffopt.file;
>  	log.groups = SHORTLOG_GROUP_AUTHOR;
> +	shortlog_init_group(&log);
>  	for (i = 0; i < nr; i++)
>  		shortlog_add_commit(&log, list[i]);
>  
> diff --git a/builtin/shortlog.c b/builtin/shortlog.c
> index f708d96558..aac8c7afa4 100644
> --- a/builtin/shortlog.c
> +++ b/builtin/shortlog.c
> @@ -245,15 +245,6 @@ void shortlog_add_commit(struct shortlog *log, struct commit *commit)
>  	}
>  	oneline_str = oneline.len ? oneline.buf : "<none>";
>  
> -	if (log->groups & SHORTLOG_GROUP_AUTHOR) {
> -		strbuf_reset(&ident);
> -		format_commit_message(commit,
> -				      log->email ? "%aN <%aE>" : "%aN",
> -				      &ident, &ctx);
> -		if (!HAS_MULTI_BITS(log->groups) ||
> -		    strset_add(&dups, ident.buf))
> -			insert_one_record(log, ident.buf, oneline_str);
> -	}
>  	if (log->groups & SHORTLOG_GROUP_COMMITTER) {
>  		strbuf_reset(&ident);
>  		format_commit_message(commit,
> @@ -372,6 +363,16 @@ void shortlog_init(struct shortlog *log)
>  	log->format.strdup_strings = 1;
>  }
>  
> +void shortlog_init_group(struct shortlog *log)
> +{
> +	if (!log->groups)
> +		log->groups = SHORTLOG_GROUP_AUTHOR;
> +
> +	if (log->groups & SHORTLOG_GROUP_AUTHOR)
Nit (easier reading):
	if (!x)
	...
	else if (x & FLAG)
	...
^ permalink raw reply	[flat|nested] 72+ messages in thread
* Re: [PATCH 5/7] shortlog: implement `--group=author` in terms of `--group=<format>`
  2022-10-11 10:57   ` Ævar Arnfjörð Bjarmason
@ 2022-10-11 11:07     ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 72+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-10-11 11:07 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, jacob, peff, gitster
On Tue, Oct 11 2022, Ævar Arnfjörð Bjarmason wrote:
> On Mon, Oct 10 2022, Taylor Blau wrote:
>> +void shortlog_init_group(struct shortlog *log)
>> +{
>> +	if (!log->groups)
>> +		log->groups = SHORTLOG_GROUP_AUTHOR;
>> +
>> +	if (log->groups & SHORTLOG_GROUP_AUTHOR)
>
> Nit (easier reading):
>
> 	if (!x)
> 	...
> 	else if (x & FLAG)
> 	...
Urgh, sorry about the noise. That's an obvious logic error, I don't know
what I was thinking...
^ permalink raw reply	[flat|nested] 72+ messages in thread
* Re: [PATCH 3/7] shortlog: extract `--group` fragment for translation
  2022-10-11 10:55   ` Ævar Arnfjörð Bjarmason
@ 2022-10-11 13:24     ` Jeff King
  0 siblings, 0 replies; 72+ messages in thread
From: Jeff King @ 2022-10-11 13:24 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: Taylor Blau, git, jacob, gitster
On Tue, Oct 11, 2022 at 12:55:10PM +0200, Ævar Arnfjörð Bjarmason wrote:
> >  	case SHORTLOG_GROUP_TRAILER:
> > -		die(_("using --group=trailer with stdin is not supported"));
> > +		die(_("using %s with stdin is not supported"), "--group=trailer");
> >  	default:
> >  		BUG("unhandled shortlog group");
> >  	}
> 
> Rather than add another translation that you can use in 2x places here
> (with 4/7) instead do:
> 
> diff --git a/builtin/shortlog.c b/builtin/shortlog.c
> index 7a1e1fe7c0e..59aef24f637 100644
> --- a/builtin/shortlog.c
> +++ b/builtin/shortlog.c
> @@ -132,7 +132,8 @@ static void read_from_stdin(struct shortlog *log)
>  		match = committer_match;
>  		break;
>  	case SHORTLOG_GROUP_TRAILER:
> -		die(_("using --group=trailer with stdin is not supported"));
> +		die(_("options '%s' and '%s' cannot be used together"),
> +		    "--group=<trailer>", "--stdin");
>  	default:
>  		BUG("unhandled shortlog group");
>  	}
It's not usually --stdin, though. Generally you'd just not provide any
revisions to traverse, and it defaults to stdin. So "with stdin" is IMHO
a more accurate message, as it covers all of the cases.
-Peff
^ permalink raw reply	[flat|nested] 72+ messages in thread
* Re: [PATCH 7/7] shortlog: implement `--group=trailer` in terms of `--group=<format>`
  2022-10-11  1:50   ` Jeff King
  2022-10-11  1:57     ` Jeff King
@ 2022-10-21  1:38     ` Taylor Blau
  1 sibling, 0 replies; 72+ messages in thread
From: Taylor Blau @ 2022-10-21  1:38 UTC (permalink / raw)
  To: Jeff King; +Cc: git, jacob, gitster
On Mon, Oct 10, 2022 at 09:50:42PM -0400, Jeff King wrote:
> Hmph. So that's _probably_ OK, but I have to wonder if this is going too
> far. The resulting code is not that much shorter, and IMHO is actually a
> little more complicated, because of this hack and the extra util bit.
Thanks for a careful analysis, as always. "Additional complexity" and
"slower" are both usually not worth it in isolation, but it's definitely
difficult to advocate for doing both at the same time ;-).
Let's drop this patch and let the `--group=trailer` bits continue to
benefit from your optimizations.
Thanks,
Taylor
^ permalink raw reply	[flat|nested] 72+ messages in thread
* Re: [PATCH 5/7] shortlog: implement `--group=author` in terms of `--group=<format>`
  2022-10-11  1:48     ` Taylor Blau
  2022-10-11  2:15       ` Jeff King
@ 2022-10-21  2:02       ` Taylor Blau
  2022-10-21  5:03         ` Jeff King
  1 sibling, 1 reply; 72+ messages in thread
From: Taylor Blau @ 2022-10-21  2:02 UTC (permalink / raw)
  To: Jeff King; +Cc: git, jacob, gitster
On Mon, Oct 10, 2022 at 09:48:47PM -0400, Taylor Blau wrote:
> > > diff --git a/builtin/shortlog.c b/builtin/shortlog.c
> > > index f708d96558..aac8c7afa4 100644
> > > --- a/builtin/shortlog.c
> > > +++ b/builtin/shortlog.c
> > > @@ -245,15 +245,6 @@ void shortlog_add_commit(struct shortlog *log, struct commit *commit)
> > >  	}
> > >  	oneline_str = oneline.len ? oneline.buf : "<none>";
> > >
> > > -	if (log->groups & SHORTLOG_GROUP_AUTHOR) {
> > > -		strbuf_reset(&ident);
> > > -		format_commit_message(commit,
> > > -				      log->email ? "%aN <%aE>" : "%aN",
> > > -				      &ident, &ctx);
> > > -		if (!HAS_MULTI_BITS(log->groups) ||
> > > -		    strset_add(&dups, ident.buf))
> > > -			insert_one_record(log, ident.buf, oneline_str);
> > > -	}
> >
> > This loses the HAS_MULTI_BITS() optimization. The idea there is that if
> > you have a single group-by that can't produce multiple outputs, then
> > there's no need to do duplicate detection.
> >
> > The equivalent in an all-formats world is something like:
> >
> >   log.format.nr > 1 && !log.trailers.nr
> >
> > (because trailers are special in that one trailer key can produce
> > multiple idents for a single commit).
Hmm. Shouldn't that "&& !log.trailers.nr" be an "|| log.trailers.nr"? It
doesn't seem to make sense to say "there are things that could produce
multiple outputs" if there's more than one format _and_ no trailers.
The logic should read "there are things that could produce multiple
outputs if there is more than one format *or* at least one trailer".
So I think the right change would be:
--- >8 ---
diff --git a/builtin/shortlog.c b/builtin/shortlog.c
index 95ceab7649..7e1b56e2aa 100644
--- a/builtin/shortlog.c
+++ b/builtin/shortlog.c
@@ -216,7 +216,8 @@ static void insert_records_from_format(struct shortlog *log,
 		format_commit_message(commit, item->string, &buf, ctx);
-		if (strset_add(dups, buf.buf))
+		if (!(log->format.nr > 1 || log->trailers.nr) ||
+		    strset_add(dups, buf.buf))
 			insert_one_record(log, buf.buf, oneline);
 	}
--- 8< ---
Yeah?
Thanks,
Taylor
^ permalink raw reply related	[flat|nested] 72+ messages in thread
* Re: [PATCH 5/7] shortlog: implement `--group=author` in terms of `--group=<format>`
  2022-10-11  2:15       ` Jeff King
@ 2022-10-21  2:03         ` Taylor Blau
  0 siblings, 0 replies; 72+ messages in thread
From: Taylor Blau @ 2022-10-21  2:03 UTC (permalink / raw)
  To: Jeff King; +Cc: git, jacob, gitster
On Mon, Oct 10, 2022 at 10:15:32PM -0400, Jeff King wrote:
> > > > @@ -439,8 +440,8 @@ int cmd_shortlog(int argc, const char **argv, const char *prefix)
> > > >  	log.file = rev.diffopt.file;
> > > >  	log.date_mode = rev.date_mode;
> > > >
> > > > -	if (!log.groups)
> > > > -		log.groups = SHORTLOG_GROUP_AUTHOR;
> > > > +	shortlog_init_group(&log);
> > > > +
> > > >  	string_list_sort(&log.trailers);
> > >
> > > Now that we have a "finish the shortlog init" function, probably this
> > > trailer sort should go into it, too. The current caller doesn't need it,
> > > but it removes on more gotcha from using the shortlog API.
> >
> > We'll drop this list by the end of the series, too, so it probably isn't
> > worth moving it into shortlog_finish_setup() in the interim.
>
> Ah, right. Well, see my other comments. :)
Seen ;-). Now that this series no longer touches the log.trailers bits,
we should move this around. Done locally, will send a reroll shortly or
tomorrow.
Thanks,
Taylor
^ permalink raw reply	[flat|nested] 72+ messages in thread
* Re: [PATCH 4/7] shortlog: support arbitrary commit format `--group`s
  2022-10-11  2:02       ` Jeff King
@ 2022-10-21  2:39         ` Taylor Blau
  2022-10-21  5:21           ` Jeff King
  0 siblings, 1 reply; 72+ messages in thread
From: Taylor Blau @ 2022-10-21  2:39 UTC (permalink / raw)
  To: Jeff King; +Cc: git, jacob, gitster
On Mon, Oct 10, 2022 at 10:02:13PM -0400, Jeff King wrote:
> > >   - since the multiple-option behavior is so subtle, maybe show a case
> > >     where two formats partially overlap. A plausible one is "--group=%aN
> > >     --group=%cN", but the test setup might need tweaked to cover both.
> > >     There's an existing "multiple groups" test that might come in handy.
> >
> > Interesting. I was starting to write that test up, but then realized
> > that this will be covered by the end of the series, since the
> > `--group=trailer` machinery is reimplemented in terms of the new format
> > group.
>
> True, if we follow through on that. ;)
So, obviously we ended up not following through on that ;-).
I tried to leverage the existing test as much as possible, to the point
that the new test is pretty hacky. But I think that the result is cute,
and it does save us from having to modify the downstream tests (or
create unreachable commits, initialize another repository, etc).
It looks something like this:
--- 8< ---
diff --git a/t/t4201-shortlog.sh b/t/t4201-shortlog.sh
index 0abe5354fc..566d581e1b 100755
--- a/t/t4201-shortlog.sh
+++ b/t/t4201-shortlog.sh
@@ -356,6 +356,19 @@ test_expect_success 'shortlog can match multiple groups' '
 	test_cmp expect actual
 '
+test_expect_success 'shortlog can match multiple format groups' '
+	cat >expect <<-\EOF &&
+	     2	User B <b@example.com>
+	     1	A U Thor <author@example.com>
+	     1	User A <a@example.com>
+	EOF
+	git shortlog -ns \
+		--group="%(trailers:valueonly,separator=%0x00,key=some-trailer)" \
+		--group="%(trailers:valueonly,separator=%0x00,key=another-trailer)" \
+		-2 HEAD >actual &&
+	test_cmp expect actual
+'
+
 test_expect_success 'set up option selection tests' '
 	git commit --allow-empty -F - <<-\EOF
 	subject
--- >8 ---
The gross bit there really is the 'separator=%0x00' thing, since the
"trailers" pretty format gives us a NL character already. I suppose that
you could do something like this on top instead:
--- >8 ---
diff --git a/t/t4201-shortlog.sh b/t/t4201-shortlog.sh
index 566d581e1b..13ac0bac64 100755
--- a/t/t4201-shortlog.sh
+++ b/t/t4201-shortlog.sh
@@ -363,9 +363,10 @@ test_expect_success 'shortlog can match multiple format groups' '
 	     1	User A <a@example.com>
 	EOF
 	git shortlog -ns \
-		--group="%(trailers:valueonly,separator=%0x00,key=some-trailer)" \
-		--group="%(trailers:valueonly,separator=%0x00,key=another-trailer)" \
-		-2 HEAD >actual &&
+		--group="%(trailers:valueonly,key=some-trailer)" \
+		--group="%(trailers:valueonly,key=another-trailer)" \
+		-2 HEAD >actual.raw &&
+	grep -v "^$" actual.raw >actual &&
 	test_cmp expect actual
 '
--- 8< ---
which I think I prefer slightly.
If this is all too hacky for your (or anybody's!) taste, let me know.
But I think ultimately this results in a smaller, more easily digestible
diff overall.
Thanks,
Taylor
^ permalink raw reply related	[flat|nested] 72+ messages in thread
* [PATCH v2 0/6] shortlog: introduce `--group=<format>`
  2022-10-11  0:33 [PATCH 0/7] shortlog: introduce `--group=<format>` Taylor Blau
                   ` (6 preceding siblings ...)
  2022-10-11  0:34 ` [PATCH 7/7] shortlog: implement `--group=trailer` " Taylor Blau
@ 2022-10-21  3:11 ` Taylor Blau
  2022-10-21  3:11   ` [PATCH v2 1/6] shortlog: accept `--date`-related options Taylor Blau
                     ` (6 more replies)
  2022-10-21 22:25 ` [PATCH v3 0/7] " Taylor Blau
  2022-10-24 18:55 ` [PATCH " Taylor Blau
  9 siblings, 7 replies; 72+ messages in thread
From: Taylor Blau @ 2022-10-21  3:11 UTC (permalink / raw)
  To: git; +Cc: jacob, peff, gitster, avarab
Here is a reroll of my series to implement arbitrary pretty formats as
shortlog `--group`'s, based on a suggestion from Jacob Stopak.
The changes are somewhat minimal, including a rebase onto the current
tip of master. Less minimal, however, is dropping the reimplementation
of `--group=trailer:<key>` in terms of the format group, since this
ended up being more trouble than it was worth.
There are also a handful of small tweaks throughout based on feedback
from Peff.
Thanks in advance for your review.
Jeff King (1):
  shortlog: accept `--date`-related options
Taylor Blau (5):
  shortlog: make trailer insertion a noop when appropriate
  shortlog: extract `--group` fragment for translation
  shortlog: support arbitrary commit format `--group`s
  shortlog: implement `--group=author` in terms of `--group=<format>`
  shortlog: implement `--group=committer` in terms of `--group=<format>`
 Documentation/git-shortlog.txt |  8 ++++
 builtin/log.c                  |  1 +
 builtin/shortlog.c             | 83 +++++++++++++++++++++++-----------
 shortlog.h                     |  5 ++
 t/t4201-shortlog.sh            | 51 +++++++++++++++++++++
 5 files changed, 121 insertions(+), 27 deletions(-)
Range-diff against v1:
1:  eaec59daa1 < -:  ---------- Documentation: extract date-options.txt
2:  b587b8ea4a ! 1:  58baccbaa8 shortlog: accept `--date`-related options
    @@ Metadata
      ## Commit message ##
         shortlog: accept `--date`-related options
     
    -    Prepare for the future patch which will introduce arbitrary pretty
    -    formats via the `--group` argument.
    +    Prepare for a future patch which will introduce arbitrary pretty formats
    +    via the `--group` argument.
     
         To allow additional customizability (for example, to support something
         like `git shortlog -s --group='%aD' --date='format:%Y-%m' ...` (which
         groups commits by the datestring 'YYYY-mm' according to author date), we
         must store off the `--date` parsed from calling `parse_revision_opt()`.
     
    +    Note that this also affects custom output `--format` strings in `git
    +    shortlog`. Though this is a behavior change, this is arguably fixing a
    +    long-standing bug (ie., that `--format` strings are not affected by
    +    `--date` specifiers as they should be).
    +
         Signed-off-by: Jeff King <peff@peff.net>
         Signed-off-by: Taylor Blau <me@ttaylorr.com>
     
      ## Documentation/git-shortlog.txt ##
    -@@ Documentation/git-shortlog.txt: options or the revision range, when confusion arises.
    - :git-shortlog: 1
    - include::rev-list-options.txt[]
    +@@ Documentation/git-shortlog.txt: OPTIONS
      
    -+include::date-options.txt[]
    + 	Each pretty-printed commit will be rewrapped before it is shown.
    + 
    ++--date=<format>::
    ++	With a `--group=format:<format>`, show dates formatted
    ++	according to the given date string. (See the `--date` options
    ++	in the "COMMIT FORMATTING" section of linkgit:git-log[1].)
     +
    - MAPPING AUTHORS
    - ---------------
    - 
    + --group=<type>::
    + 	Group commits based on `<type>`. If no `--group` option is
    + 	specified, the default is `author`. `<type>` is one of:
     
      ## builtin/shortlog.c ##
     @@ builtin/shortlog.c: void shortlog_add_commit(struct shortlog *log, struct commit *commit)
    @@ shortlog.h: struct shortlog {
      
      	enum {
      		SHORTLOG_GROUP_AUTHOR = (1 << 0),
    +
    + ## t/t4201-shortlog.sh ##
    +@@ t/t4201-shortlog.sh: test_expect_success 'pretty format' '
    + 	test_cmp expect log.predictable
    + '
    + 
    ++test_expect_success 'pretty format (with --date)' '
    ++	sed "s/SUBJECT/2005-04-07 OBJECT_NAME/" expect.template >expect &&
    ++	git shortlog --format="%ad %H" --date=short HEAD >log &&
    ++	fuzz log >log.predictable &&
    ++	test_cmp expect log.predictable
    ++'
    ++
    + test_expect_success '--abbrev' '
    + 	sed s/SUBJECT/OBJID/ expect.template >expect &&
    + 	git shortlog --format="%h" --abbrev=35 HEAD >log &&
-:  ---------- > 2:  7decccad7c shortlog: make trailer insertion a noop when appropriate
3:  c3f50465cb = 3:  3665488fc9 shortlog: extract `--group` fragment for translation
4:  6f38990cc2 ! 4:  4a36c0ca4e shortlog: support arbitrary commit format `--group`s
    @@ Documentation/git-shortlog.txt: OPTIONS
         example, if your project uses `Reviewed-by` trailers, you might want
         to see who has been reviewing with
         `git shortlog -ns --group=trailer:reviewed-by`.
    -+ - `<format>`, any string accepted by the `--format` option of 'git log'.
    -+   (See the "PRETTY FORMATS" section of linkgit:git-log[1].)
    ++ - `format:<format>`, any string accepted by the `--format` option of
    ++   'git log'. (See the "PRETTY FORMATS" section of
    ++   linkgit:git-log[1].)
      +
      Note that commits that do not include the trailer will not be counted.
      Likewise, commits with multiple trailers (e.g., multiple signoffs) may
    @@ builtin/shortlog.c: static void insert_records_from_trailers(struct shortlog *lo
     +
     +		format_commit_message(commit, item->string, &buf, ctx);
     +
    -+		if (strset_add(dups, buf.buf))
    ++		if (!(log->format.nr > 1 || log->trailers.nr) ||
    ++		    strset_add(dups, buf.buf))
     +			insert_one_record(log, buf.buf, oneline);
     +	}
     +
    @@ builtin/shortlog.c: static void insert_records_from_trailers(struct shortlog *lo
      {
      	struct strbuf ident = STRBUF_INIT;
     @@ builtin/shortlog.c: void shortlog_add_commit(struct shortlog *log, struct commit *commit)
    - 	if (log->groups & SHORTLOG_GROUP_TRAILER) {
    - 		insert_records_from_trailers(log, &dups, commit, &ctx, oneline_str);
    + 			insert_one_record(log, ident.buf, oneline_str);
      	}
    + 	insert_records_from_trailers(log, &dups, commit, &ctx, oneline_str);
     +	insert_records_from_format(log, &dups, commit, &ctx, oneline_str);
      
      	strset_clear(&dups);
    @@ builtin/shortlog.c: static int parse_group_option(const struct option *opt, cons
      		log->groups |= SHORTLOG_GROUP_TRAILER;
      		string_list_append(&log->trailers, field);
     -	} else
    -+	} else if (strchrnul(arg, '%')) {
    ++	} else if (skip_prefix(arg, "format:", &field)) {
    ++		log->groups |= SHORTLOG_GROUP_FORMAT;
    ++		string_list_append(&log->format, field);
    ++	} else if (strchr(arg, '%')) {
     +		log->groups |= SHORTLOG_GROUP_FORMAT;
     +		string_list_append(&log->format, arg);
     +	} else {
    @@ t/t4201-shortlog.sh: test_expect_success 'shortlog --group=trailer:signed-off-by
      	test_cmp expect actual
      '
      
    -+test_expect_success 'shortlog --group=<format>' '
    ++test_expect_success 'shortlog --group=format' '
    ++	git shortlog -s --date="format:%Y" --group="format:%cN (%cd)" \
    ++		HEAD >actual &&
    ++	cat >expect <<-\EOF &&
    ++	     4	C O Mitter (2005)
    ++	     1	Sin Nombre (2005)
    ++	EOF
    ++	test_cmp expect actual
    ++'
    ++
    ++test_expect_success 'shortlog --group=<format> DWIM' '
     +	git shortlog -s --date="format:%Y" --group="%cN (%cd)" HEAD >actual &&
    ++	test_cmp expect actual
    ++'
    ++
    ++test_expect_success 'shortlog multiple --group=format' '
    ++	git shortlog -s --date="format:%Y" --group="format:%cN (%cd)" \
    ++		HEAD >actual &&
     +	cat >expect <<-\EOF &&
     +	     4	C O Mitter (2005)
     +	     1	Sin Nombre (2005)
     +	EOF
     +	test_cmp expect actual
     +'
    ++
    ++test_expect_success 'shortlog bogus --group' '
    ++	test_must_fail git shortlog --group=bogus HEAD 2>err &&
    ++	grep "unknown group type" err
    ++'
     +
      test_expect_success 'trailer idents are split' '
      	cat >expect <<-\EOF &&
      	     2	C O Mitter
    +@@ t/t4201-shortlog.sh: test_expect_success 'shortlog can match multiple groups' '
    + 	test_cmp expect actual
    + '
    + 
    ++test_expect_success 'shortlog can match multiple format groups' '
    ++	cat >expect <<-\EOF &&
    ++	     2	User B <b@example.com>
    ++	     1	A U Thor <author@example.com>
    ++	     1	User A <a@example.com>
    ++	EOF
    ++	git shortlog -ns \
    ++		--group="%(trailers:valueonly,key=some-trailer)" \
    ++		--group="%(trailers:valueonly,key=another-trailer)" \
    ++		-2 HEAD >actual.raw &&
    ++	grep -v "^$" actual.raw >actual &&
    ++	test_cmp expect actual
    ++'
    ++
    + test_expect_success 'set up option selection tests' '
    + 	git commit --allow-empty -F - <<-\EOF
    + 	subject
5:  55a6ef7bc0 ! 5:  f0044682be shortlog: implement `--group=author` in terms of `--group=<format>`
    @@ builtin/log.c: static void make_cover_letter(struct rev_info *rev, int use_separ
      	log.in2 = 4;
      	log.file = rev->diffopt.file;
      	log.groups = SHORTLOG_GROUP_AUTHOR;
    -+	shortlog_init_group(&log);
    ++	shortlog_finish_setup(&log);
      	for (i = 0; i < nr; i++)
      		shortlog_add_commit(&log, list[i]);
      
    @@ builtin/shortlog.c: void shortlog_init(struct shortlog *log)
      	log->format.strdup_strings = 1;
      }
      
    -+void shortlog_init_group(struct shortlog *log)
    ++void shortlog_finish_setup(struct shortlog *log)
     +{
    -+	if (!log->groups)
    -+		log->groups = SHORTLOG_GROUP_AUTHOR;
    -+
     +	if (log->groups & SHORTLOG_GROUP_AUTHOR)
     +		string_list_append(&log->format,
     +				   log->email ? "%aN <%aE>" : "%aN");
    ++
    ++	string_list_sort(&log->trailers);
     +}
     +
      int cmd_shortlog(int argc, const char **argv, const char *prefix)
      {
      	struct shortlog log = { STRING_LIST_INIT_NODUP };
     @@ builtin/shortlog.c: int cmd_shortlog(int argc, const char **argv, const char *prefix)
    - 	log.file = rev.diffopt.file;
    - 	log.date_mode = rev.date_mode;
      
    --	if (!log.groups)
    --		log.groups = SHORTLOG_GROUP_AUTHOR;
    -+	shortlog_init_group(&log);
    -+
    - 	string_list_sort(&log.trailers);
    + 	if (!log.groups)
    + 		log.groups = SHORTLOG_GROUP_AUTHOR;
    +-	string_list_sort(&log.trailers);
    ++	shortlog_finish_setup(&log);
      
      	/* assume HEAD if from a tty */
    + 	if (!nongit && !rev.pending.nr && isatty(0))
     
      ## shortlog.h ##
     @@ shortlog.h: struct shortlog {
      };
      
      void shortlog_init(struct shortlog *log);
    -+void shortlog_init_group(struct shortlog *log);
    ++void shortlog_finish_setup(struct shortlog *log);
      
      void shortlog_add_commit(struct shortlog *log, struct commit *commit);
      
6:  814ee4b8d8 ! 6:  6a9e204177 shortlog: implement `--group=committer` in terms of `--group=<format>`
    @@ builtin/shortlog.c: void shortlog_add_commit(struct shortlog *log, struct commit
     -		    strset_add(&dups, ident.buf))
     -			insert_one_record(log, ident.buf, oneline_str);
     -	}
    - 	if (log->groups & SHORTLOG_GROUP_TRAILER) {
    - 		insert_records_from_trailers(log, &dups, commit, &ctx, oneline_str);
    - 	}
    + 	insert_records_from_trailers(log, &dups, commit, &ctx, oneline_str);
      	insert_records_from_format(log, &dups, commit, &ctx, oneline_str);
      
      	strset_clear(&dups);
    @@ builtin/shortlog.c: void shortlog_add_commit(struct shortlog *log, struct commit
      	strbuf_release(&oneline);
      }
      
    -@@ builtin/shortlog.c: void shortlog_init_group(struct shortlog *log)
    +@@ builtin/shortlog.c: void shortlog_finish_setup(struct shortlog *log)
      	if (log->groups & SHORTLOG_GROUP_AUTHOR)
      		string_list_append(&log->format,
      				   log->email ? "%aN <%aE>" : "%aN");
     +	if (log->groups & SHORTLOG_GROUP_COMMITTER)
     +		string_list_append(&log->format,
     +				   log->email ? "%cN <%cE>" : "%cN");
    - }
      
    - int cmd_shortlog(int argc, const char **argv, const char *prefix)
    + 	string_list_sort(&log->trailers);
    + }
7:  02adc297e7 < -:  ---------- shortlog: implement `--group=trailer` in terms of `--group=<format>`
-- 
2.38.0.16.g393fd4c6db
^ permalink raw reply	[flat|nested] 72+ messages in thread
* [PATCH v2 1/6] shortlog: accept `--date`-related options
  2022-10-21  3:11 ` [PATCH v2 0/6] shortlog: introduce `--group=<format>` Taylor Blau
@ 2022-10-21  3:11   ` Taylor Blau
  2022-10-21  5:32     ` Jeff King
  2022-10-21  3:11   ` [PATCH v2 2/6] shortlog: make trailer insertion a noop when appropriate Taylor Blau
                     ` (5 subsequent siblings)
  6 siblings, 1 reply; 72+ messages in thread
From: Taylor Blau @ 2022-10-21  3:11 UTC (permalink / raw)
  To: git; +Cc: jacob, peff, gitster, avarab
From: Jeff King <peff@peff.net>
Prepare for a future patch which will introduce arbitrary pretty formats
via the `--group` argument.
To allow additional customizability (for example, to support something
like `git shortlog -s --group='%aD' --date='format:%Y-%m' ...` (which
groups commits by the datestring 'YYYY-mm' according to author date), we
must store off the `--date` parsed from calling `parse_revision_opt()`.
Note that this also affects custom output `--format` strings in `git
shortlog`. Though this is a behavior change, this is arguably fixing a
long-standing bug (ie., that `--format` strings are not affected by
`--date` specifiers as they should be).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/git-shortlog.txt | 5 +++++
 builtin/shortlog.c             | 3 ++-
 shortlog.h                     | 2 ++
 t/t4201-shortlog.sh            | 7 +++++++
 4 files changed, 16 insertions(+), 1 deletion(-)
diff --git a/Documentation/git-shortlog.txt b/Documentation/git-shortlog.txt
index f64e77047b..311c041c06 100644
--- a/Documentation/git-shortlog.txt
+++ b/Documentation/git-shortlog.txt
@@ -47,6 +47,11 @@ OPTIONS
 
 	Each pretty-printed commit will be rewrapped before it is shown.
 
+--date=<format>::
+	With a `--group=format:<format>`, show dates formatted
+	according to the given date string. (See the `--date` options
+	in the "COMMIT FORMATTING" section of linkgit:git-log[1].)
+
 --group=<type>::
 	Group commits based on `<type>`. If no `--group` option is
 	specified, the default is `author`. `<type>` is one of:
diff --git a/builtin/shortlog.c b/builtin/shortlog.c
index 7a1e1fe7c0..53c379a51d 100644
--- a/builtin/shortlog.c
+++ b/builtin/shortlog.c
@@ -211,7 +211,7 @@ void shortlog_add_commit(struct shortlog *log, struct commit *commit)
 	ctx.fmt = CMIT_FMT_USERFORMAT;
 	ctx.abbrev = log->abbrev;
 	ctx.print_email_subject = 1;
-	ctx.date_mode.type = DATE_NORMAL;
+	ctx.date_mode = log->date_mode;
 	ctx.output_encoding = get_log_output_encoding();
 
 	if (!log->summary) {
@@ -407,6 +407,7 @@ int cmd_shortlog(int argc, const char **argv, const char *prefix)
 	log.user_format = rev.commit_format == CMIT_FMT_USERFORMAT;
 	log.abbrev = rev.abbrev;
 	log.file = rev.diffopt.file;
+	log.date_mode = rev.date_mode;
 
 	if (!log.groups)
 		log.groups = SHORTLOG_GROUP_AUTHOR;
diff --git a/shortlog.h b/shortlog.h
index 3f7e9aabca..dc388dd459 100644
--- a/shortlog.h
+++ b/shortlog.h
@@ -2,6 +2,7 @@
 #define SHORTLOG_H
 
 #include "string-list.h"
+#include "date.h"
 
 struct commit;
 
@@ -15,6 +16,7 @@ struct shortlog {
 	int in2;
 	int user_format;
 	int abbrev;
+	struct date_mode date_mode;
 
 	enum {
 		SHORTLOG_GROUP_AUTHOR = (1 << 0),
diff --git a/t/t4201-shortlog.sh b/t/t4201-shortlog.sh
index 3095b1b2ff..7547da539d 100755
--- a/t/t4201-shortlog.sh
+++ b/t/t4201-shortlog.sh
@@ -83,6 +83,13 @@ test_expect_success 'pretty format' '
 	test_cmp expect log.predictable
 '
 
+test_expect_success 'pretty format (with --date)' '
+	sed "s/SUBJECT/2005-04-07 OBJECT_NAME/" expect.template >expect &&
+	git shortlog --format="%ad %H" --date=short HEAD >log &&
+	fuzz log >log.predictable &&
+	test_cmp expect log.predictable
+'
+
 test_expect_success '--abbrev' '
 	sed s/SUBJECT/OBJID/ expect.template >expect &&
 	git shortlog --format="%h" --abbrev=35 HEAD >log &&
-- 
2.38.0.16.g393fd4c6db
^ permalink raw reply related	[flat|nested] 72+ messages in thread
* [PATCH v2 2/6] shortlog: make trailer insertion a noop when appropriate
  2022-10-21  3:11 ` [PATCH v2 0/6] shortlog: introduce `--group=<format>` Taylor Blau
  2022-10-21  3:11   ` [PATCH v2 1/6] shortlog: accept `--date`-related options Taylor Blau
@ 2022-10-21  3:11   ` Taylor Blau
  2022-10-21  5:38     ` Jeff King
  2022-10-21  3:11   ` [PATCH v2 3/6] shortlog: extract `--group` fragment for translation Taylor Blau
                     ` (4 subsequent siblings)
  6 siblings, 1 reply; 72+ messages in thread
From: Taylor Blau @ 2022-10-21  3:11 UTC (permalink / raw)
  To: git; +Cc: jacob, peff, gitster, avarab
When there are no trailers to insert, it is natural that
insert_records_from_trailers() should return without having done any
work.
But instead we guard this call unnecessarily by first checking whether
`log->groups` has the `SHORTLOG_GROUP_TRAILER` bit set.
Prepare to match a similar pattern in the future where a function which
inserts records of a certain type does no work when no specifiers
matching that type are given.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/shortlog.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/builtin/shortlog.c b/builtin/shortlog.c
index 53c379a51d..18f0800c82 100644
--- a/builtin/shortlog.c
+++ b/builtin/shortlog.c
@@ -170,6 +170,9 @@ static void insert_records_from_trailers(struct shortlog *log,
 	const char *commit_buffer, *body;
 	struct strbuf ident = STRBUF_INIT;
 
+	if (!log->trailers.nr)
+		return;
+
 	/*
 	 * Using format_commit_message("%B") would be simpler here, but
 	 * this saves us copying the message.
@@ -240,9 +243,7 @@ void shortlog_add_commit(struct shortlog *log, struct commit *commit)
 		    strset_add(&dups, ident.buf))
 			insert_one_record(log, ident.buf, oneline_str);
 	}
-	if (log->groups & SHORTLOG_GROUP_TRAILER) {
-		insert_records_from_trailers(log, &dups, commit, &ctx, oneline_str);
-	}
+	insert_records_from_trailers(log, &dups, commit, &ctx, oneline_str);
 
 	strset_clear(&dups);
 	strbuf_release(&ident);
-- 
2.38.0.16.g393fd4c6db
^ permalink raw reply related	[flat|nested] 72+ messages in thread
* [PATCH v2 3/6] shortlog: extract `--group` fragment for translation
  2022-10-21  3:11 ` [PATCH v2 0/6] shortlog: introduce `--group=<format>` Taylor Blau
  2022-10-21  3:11   ` [PATCH v2 1/6] shortlog: accept `--date`-related options Taylor Blau
  2022-10-21  3:11   ` [PATCH v2 2/6] shortlog: make trailer insertion a noop when appropriate Taylor Blau
@ 2022-10-21  3:11   ` Taylor Blau
  2022-10-21  5:40     ` Jeff King
  2022-10-21  3:11   ` [PATCH v2 4/6] shortlog: support arbitrary commit format `--group`s Taylor Blau
                     ` (3 subsequent siblings)
  6 siblings, 1 reply; 72+ messages in thread
From: Taylor Blau @ 2022-10-21  3:11 UTC (permalink / raw)
  To: git; +Cc: jacob, peff, gitster, avarab
The subsequent commit will add another unhandled case in
`read_from_stdin()` which will want to use the same message as with
`--group=trailer`.
Extract the "--group=trailer" part from this message so the same
translation key can be used for both cases.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/shortlog.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/builtin/shortlog.c b/builtin/shortlog.c
index 18f0800c82..d0645769d7 100644
--- a/builtin/shortlog.c
+++ b/builtin/shortlog.c
@@ -132,7 +132,7 @@ static void read_from_stdin(struct shortlog *log)
 		match = committer_match;
 		break;
 	case SHORTLOG_GROUP_TRAILER:
-		die(_("using --group=trailer with stdin is not supported"));
+		die(_("using %s with stdin is not supported"), "--group=trailer");
 	default:
 		BUG("unhandled shortlog group");
 	}
-- 
2.38.0.16.g393fd4c6db
^ permalink raw reply related	[flat|nested] 72+ messages in thread
* [PATCH v2 4/6] shortlog: support arbitrary commit format `--group`s
  2022-10-21  3:11 ` [PATCH v2 0/6] shortlog: introduce `--group=<format>` Taylor Blau
                     ` (2 preceding siblings ...)
  2022-10-21  3:11   ` [PATCH v2 3/6] shortlog: extract `--group` fragment for translation Taylor Blau
@ 2022-10-21  3:11   ` Taylor Blau
  2022-10-21  5:48     ` Jeff King
  2022-10-21  3:11   ` [PATCH v2 5/6] shortlog: implement `--group=author` in terms of `--group=<format>` Taylor Blau
                     ` (2 subsequent siblings)
  6 siblings, 1 reply; 72+ messages in thread
From: Taylor Blau @ 2022-10-21  3:11 UTC (permalink / raw)
  To: git; +Cc: jacob, peff, gitster, avarab
In addition to generating a shortlog based on committer, author, or the
identity in one or more specified trailers, it can be useful to generate
a shortlog based on an arbitrary commit format.
This can be used, for example, to generate a distribution of commit
activity over time, like so:
    $ git shortlog --group='%cd' --date='format:%Y-%m' -s v2.37.0..
       117  2022-06
       274  2022-07
       324  2022-08
       263  2022-09
         7  2022-10
Arbitrary commit formats can be used. In fact, `git shortlog`'s default
behavior (to count by commit authors) can be emulated as follows:
    $ git shortlog --group='%aN <%aE>' ...
and future patches will make the default behavior (as well as
`--committer`, and `--group=trailer:<trailer>`) special cases of the
more flexible `--group` option.
Note also that the SHORTLOG_GROUP_FORMAT enum value is used only to
designate that `--group:<format>` is in use when in stdin mode to
declare that the combination is invalid.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/git-shortlog.txt |  3 +++
 builtin/shortlog.c             | 37 +++++++++++++++++++++++++++-
 shortlog.h                     |  2 ++
 t/t4201-shortlog.sh            | 44 ++++++++++++++++++++++++++++++++++
 4 files changed, 85 insertions(+), 1 deletion(-)
diff --git a/Documentation/git-shortlog.txt b/Documentation/git-shortlog.txt
index 311c041c06..8cf5a2e3c7 100644
--- a/Documentation/git-shortlog.txt
+++ b/Documentation/git-shortlog.txt
@@ -64,6 +64,9 @@ OPTIONS
    example, if your project uses `Reviewed-by` trailers, you might want
    to see who has been reviewing with
    `git shortlog -ns --group=trailer:reviewed-by`.
+ - `format:<format>`, any string accepted by the `--format` option of
+   'git log'. (See the "PRETTY FORMATS" section of
+   linkgit:git-log[1].)
 +
 Note that commits that do not include the trailer will not be counted.
 Likewise, commits with multiple trailers (e.g., multiple signoffs) may
diff --git a/builtin/shortlog.c b/builtin/shortlog.c
index d0645769d7..0358397a7b 100644
--- a/builtin/shortlog.c
+++ b/builtin/shortlog.c
@@ -133,6 +133,8 @@ static void read_from_stdin(struct shortlog *log)
 		break;
 	case SHORTLOG_GROUP_TRAILER:
 		die(_("using %s with stdin is not supported"), "--group=trailer");
+	case SHORTLOG_GROUP_FORMAT:
+		die(_("using %s with stdin is not supported"), "--group=format");
 	default:
 		BUG("unhandled shortlog group");
 	}
@@ -203,6 +205,28 @@ static void insert_records_from_trailers(struct shortlog *log,
 	unuse_commit_buffer(commit, commit_buffer);
 }
 
+static void insert_records_from_format(struct shortlog *log,
+				       struct strset *dups,
+				       struct commit *commit,
+				       struct pretty_print_context *ctx,
+				       const char *oneline)
+{
+	struct strbuf buf = STRBUF_INIT;
+	struct string_list_item *item;
+
+	for_each_string_list_item(item, &log->format) {
+		strbuf_reset(&buf);
+
+		format_commit_message(commit, item->string, &buf, ctx);
+
+		if (!(log->format.nr > 1 || log->trailers.nr) ||
+		    strset_add(dups, buf.buf))
+			insert_one_record(log, buf.buf, oneline);
+	}
+
+	strbuf_release(&buf);
+}
+
 void shortlog_add_commit(struct shortlog *log, struct commit *commit)
 {
 	struct strbuf ident = STRBUF_INIT;
@@ -244,6 +268,7 @@ void shortlog_add_commit(struct shortlog *log, struct commit *commit)
 			insert_one_record(log, ident.buf, oneline_str);
 	}
 	insert_records_from_trailers(log, &dups, commit, &ctx, oneline_str);
+	insert_records_from_format(log, &dups, commit, &ctx, oneline_str);
 
 	strset_clear(&dups);
 	strbuf_release(&ident);
@@ -315,6 +340,7 @@ static int parse_group_option(const struct option *opt, const char *arg, int uns
 	if (unset) {
 		log->groups = 0;
 		string_list_clear(&log->trailers, 0);
+		string_list_clear(&log->format, 0);
 	} else if (!strcasecmp(arg, "author"))
 		log->groups |= SHORTLOG_GROUP_AUTHOR;
 	else if (!strcasecmp(arg, "committer"))
@@ -322,8 +348,15 @@ static int parse_group_option(const struct option *opt, const char *arg, int uns
 	else if (skip_prefix(arg, "trailer:", &field)) {
 		log->groups |= SHORTLOG_GROUP_TRAILER;
 		string_list_append(&log->trailers, field);
-	} else
+	} else if (skip_prefix(arg, "format:", &field)) {
+		log->groups |= SHORTLOG_GROUP_FORMAT;
+		string_list_append(&log->format, field);
+	} else if (strchr(arg, '%')) {
+		log->groups |= SHORTLOG_GROUP_FORMAT;
+		string_list_append(&log->format, arg);
+	} else {
 		return error(_("unknown group type: %s"), arg);
+	}
 
 	return 0;
 }
@@ -341,6 +374,7 @@ void shortlog_init(struct shortlog *log)
 	log->in2 = DEFAULT_INDENT2;
 	log->trailers.strdup_strings = 1;
 	log->trailers.cmp = strcasecmp;
+	log->format.strdup_strings = 1;
 }
 
 int cmd_shortlog(int argc, const char **argv, const char *prefix)
@@ -481,4 +515,5 @@ void shortlog_output(struct shortlog *log)
 	log->list.strdup_strings = 1;
 	string_list_clear(&log->list, 1);
 	clear_mailmap(&log->mailmap);
+	string_list_clear(&log->format, 0);
 }
diff --git a/shortlog.h b/shortlog.h
index dc388dd459..4850a8c30f 100644
--- a/shortlog.h
+++ b/shortlog.h
@@ -22,8 +22,10 @@ struct shortlog {
 		SHORTLOG_GROUP_AUTHOR = (1 << 0),
 		SHORTLOG_GROUP_COMMITTER = (1 << 1),
 		SHORTLOG_GROUP_TRAILER = (1 << 2),
+		SHORTLOG_GROUP_FORMAT = (1 << 3),
 	} groups;
 	struct string_list trailers;
+	struct string_list format;
 
 	int email;
 	struct string_list mailmap;
diff --git a/t/t4201-shortlog.sh b/t/t4201-shortlog.sh
index 7547da539d..13ac0bac64 100755
--- a/t/t4201-shortlog.sh
+++ b/t/t4201-shortlog.sh
@@ -244,6 +244,36 @@ test_expect_success 'shortlog --group=trailer:signed-off-by' '
 	test_cmp expect actual
 '
 
+test_expect_success 'shortlog --group=format' '
+	git shortlog -s --date="format:%Y" --group="format:%cN (%cd)" \
+		HEAD >actual &&
+	cat >expect <<-\EOF &&
+	     4	C O Mitter (2005)
+	     1	Sin Nombre (2005)
+	EOF
+	test_cmp expect actual
+'
+
+test_expect_success 'shortlog --group=<format> DWIM' '
+	git shortlog -s --date="format:%Y" --group="%cN (%cd)" HEAD >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'shortlog multiple --group=format' '
+	git shortlog -s --date="format:%Y" --group="format:%cN (%cd)" \
+		HEAD >actual &&
+	cat >expect <<-\EOF &&
+	     4	C O Mitter (2005)
+	     1	Sin Nombre (2005)
+	EOF
+	test_cmp expect actual
+'
+
+test_expect_success 'shortlog bogus --group' '
+	test_must_fail git shortlog --group=bogus HEAD 2>err &&
+	grep "unknown group type" err
+'
+
 test_expect_success 'trailer idents are split' '
 	cat >expect <<-\EOF &&
 	     2	C O Mitter
@@ -326,6 +356,20 @@ test_expect_success 'shortlog can match multiple groups' '
 	test_cmp expect actual
 '
 
+test_expect_success 'shortlog can match multiple format groups' '
+	cat >expect <<-\EOF &&
+	     2	User B <b@example.com>
+	     1	A U Thor <author@example.com>
+	     1	User A <a@example.com>
+	EOF
+	git shortlog -ns \
+		--group="%(trailers:valueonly,key=some-trailer)" \
+		--group="%(trailers:valueonly,key=another-trailer)" \
+		-2 HEAD >actual.raw &&
+	grep -v "^$" actual.raw >actual &&
+	test_cmp expect actual
+'
+
 test_expect_success 'set up option selection tests' '
 	git commit --allow-empty -F - <<-\EOF
 	subject
-- 
2.38.0.16.g393fd4c6db
^ permalink raw reply related	[flat|nested] 72+ messages in thread
* [PATCH v2 5/6] shortlog: implement `--group=author` in terms of `--group=<format>`
  2022-10-21  3:11 ` [PATCH v2 0/6] shortlog: introduce `--group=<format>` Taylor Blau
                     ` (3 preceding siblings ...)
  2022-10-21  3:11   ` [PATCH v2 4/6] shortlog: support arbitrary commit format `--group`s Taylor Blau
@ 2022-10-21  3:11   ` Taylor Blau
  2022-10-21  5:50     ` Jeff King
  2022-10-21  3:11   ` [PATCH v2 6/6] shortlog: implement `--group=committer` " Taylor Blau
  2022-10-21  5:53   ` [PATCH v2 0/6] shortlog: introduce `--group=<format>` Jeff King
  6 siblings, 1 reply; 72+ messages in thread
From: Taylor Blau @ 2022-10-21  3:11 UTC (permalink / raw)
  To: git; +Cc: jacob, peff, gitster, avarab
Instead of handling SHORTLOG_GROUP_AUTHOR separately, reimplement it as
a special case of the new `--group=<format>` mode, where the author mode
is a shorthand for `--group='%aN <%aE>'.
Note that we still need to keep the SHORTLOG_GROUP_AUTHOR enum since it
has a different meaning in `read_from_stdin()`, where it is still used
for a different purpose.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/log.c      |  1 +
 builtin/shortlog.c | 20 ++++++++++----------
 shortlog.h         |  1 +
 3 files changed, 12 insertions(+), 10 deletions(-)
diff --git a/builtin/log.c b/builtin/log.c
index ee19dc5d45..b4d5420217 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -1334,6 +1334,7 @@ static void make_cover_letter(struct rev_info *rev, int use_separate_file,
 	log.in2 = 4;
 	log.file = rev->diffopt.file;
 	log.groups = SHORTLOG_GROUP_AUTHOR;
+	shortlog_finish_setup(&log);
 	for (i = 0; i < nr; i++)
 		shortlog_add_commit(&log, list[i]);
 
diff --git a/builtin/shortlog.c b/builtin/shortlog.c
index 0358397a7b..fceb84f12f 100644
--- a/builtin/shortlog.c
+++ b/builtin/shortlog.c
@@ -249,15 +249,6 @@ void shortlog_add_commit(struct shortlog *log, struct commit *commit)
 	}
 	oneline_str = oneline.len ? oneline.buf : "<none>";
 
-	if (log->groups & SHORTLOG_GROUP_AUTHOR) {
-		strbuf_reset(&ident);
-		format_commit_message(commit,
-				      log->email ? "%aN <%aE>" : "%aN",
-				      &ident, &ctx);
-		if (!HAS_MULTI_BITS(log->groups) ||
-		    strset_add(&dups, ident.buf))
-			insert_one_record(log, ident.buf, oneline_str);
-	}
 	if (log->groups & SHORTLOG_GROUP_COMMITTER) {
 		strbuf_reset(&ident);
 		format_commit_message(commit,
@@ -377,6 +368,15 @@ void shortlog_init(struct shortlog *log)
 	log->format.strdup_strings = 1;
 }
 
+void shortlog_finish_setup(struct shortlog *log)
+{
+	if (log->groups & SHORTLOG_GROUP_AUTHOR)
+		string_list_append(&log->format,
+				   log->email ? "%aN <%aE>" : "%aN");
+
+	string_list_sort(&log->trailers);
+}
+
 int cmd_shortlog(int argc, const char **argv, const char *prefix)
 {
 	struct shortlog log = { STRING_LIST_INIT_NODUP };
@@ -446,7 +446,7 @@ int cmd_shortlog(int argc, const char **argv, const char *prefix)
 
 	if (!log.groups)
 		log.groups = SHORTLOG_GROUP_AUTHOR;
-	string_list_sort(&log.trailers);
+	shortlog_finish_setup(&log);
 
 	/* assume HEAD if from a tty */
 	if (!nongit && !rev.pending.nr && isatty(0))
diff --git a/shortlog.h b/shortlog.h
index 4850a8c30f..28d04f951a 100644
--- a/shortlog.h
+++ b/shortlog.h
@@ -33,6 +33,7 @@ struct shortlog {
 };
 
 void shortlog_init(struct shortlog *log);
+void shortlog_finish_setup(struct shortlog *log);
 
 void shortlog_add_commit(struct shortlog *log, struct commit *commit);
 
-- 
2.38.0.16.g393fd4c6db
^ permalink raw reply related	[flat|nested] 72+ messages in thread
* [PATCH v2 6/6] shortlog: implement `--group=committer` in terms of `--group=<format>`
  2022-10-21  3:11 ` [PATCH v2 0/6] shortlog: introduce `--group=<format>` Taylor Blau
                     ` (4 preceding siblings ...)
  2022-10-21  3:11   ` [PATCH v2 5/6] shortlog: implement `--group=author` in terms of `--group=<format>` Taylor Blau
@ 2022-10-21  3:11   ` Taylor Blau
  2022-10-21  5:51     ` Jeff King
  2022-10-21  5:53   ` [PATCH v2 0/6] shortlog: introduce `--group=<format>` Jeff King
  6 siblings, 1 reply; 72+ messages in thread
From: Taylor Blau @ 2022-10-21  3:11 UTC (permalink / raw)
  To: git; +Cc: jacob, peff, gitster, avarab
In the same spirit as the previous commit, reimplement
`--group=committer` as a special case of `--group=<format>`, too.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/shortlog.c | 14 +++-----------
 1 file changed, 3 insertions(+), 11 deletions(-)
diff --git a/builtin/shortlog.c b/builtin/shortlog.c
index fceb84f12f..b9f5af22d3 100644
--- a/builtin/shortlog.c
+++ b/builtin/shortlog.c
@@ -229,7 +229,6 @@ static void insert_records_from_format(struct shortlog *log,
 
 void shortlog_add_commit(struct shortlog *log, struct commit *commit)
 {
-	struct strbuf ident = STRBUF_INIT;
 	struct strbuf oneline = STRBUF_INIT;
 	struct strset dups = STRSET_INIT;
 	struct pretty_print_context ctx = {0};
@@ -249,20 +248,10 @@ void shortlog_add_commit(struct shortlog *log, struct commit *commit)
 	}
 	oneline_str = oneline.len ? oneline.buf : "<none>";
 
-	if (log->groups & SHORTLOG_GROUP_COMMITTER) {
-		strbuf_reset(&ident);
-		format_commit_message(commit,
-				      log->email ? "%cN <%cE>" : "%cN",
-				      &ident, &ctx);
-		if (!HAS_MULTI_BITS(log->groups) ||
-		    strset_add(&dups, ident.buf))
-			insert_one_record(log, ident.buf, oneline_str);
-	}
 	insert_records_from_trailers(log, &dups, commit, &ctx, oneline_str);
 	insert_records_from_format(log, &dups, commit, &ctx, oneline_str);
 
 	strset_clear(&dups);
-	strbuf_release(&ident);
 	strbuf_release(&oneline);
 }
 
@@ -373,6 +362,9 @@ void shortlog_finish_setup(struct shortlog *log)
 	if (log->groups & SHORTLOG_GROUP_AUTHOR)
 		string_list_append(&log->format,
 				   log->email ? "%aN <%aE>" : "%aN");
+	if (log->groups & SHORTLOG_GROUP_COMMITTER)
+		string_list_append(&log->format,
+				   log->email ? "%cN <%cE>" : "%cN");
 
 	string_list_sort(&log->trailers);
 }
-- 
2.38.0.16.g393fd4c6db
^ permalink raw reply related	[flat|nested] 72+ messages in thread
* Re: [PATCH 5/7] shortlog: implement `--group=author` in terms of `--group=<format>`
  2022-10-21  2:02       ` Taylor Blau
@ 2022-10-21  5:03         ` Jeff King
  2022-10-21 22:05           ` Taylor Blau
  0 siblings, 1 reply; 72+ messages in thread
From: Jeff King @ 2022-10-21  5:03 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, jacob, gitster
On Thu, Oct 20, 2022 at 10:02:45PM -0400, Taylor Blau wrote:
> > > > -	if (log->groups & SHORTLOG_GROUP_AUTHOR) {
> > > > -		strbuf_reset(&ident);
> > > > -		format_commit_message(commit,
> > > > -				      log->email ? "%aN <%aE>" : "%aN",
> > > > -				      &ident, &ctx);
> > > > -		if (!HAS_MULTI_BITS(log->groups) ||
> > > > -		    strset_add(&dups, ident.buf))
> > > > -			insert_one_record(log, ident.buf, oneline_str);
> > > > -	}
> > >
> > > This loses the HAS_MULTI_BITS() optimization. The idea there is that if
> > > you have a single group-by that can't produce multiple outputs, then
> > > there's no need to do duplicate detection.
> > >
> > > The equivalent in an all-formats world is something like:
> > >
> > >   log.format.nr > 1 && !log.trailers.nr
> > >
> > > (because trailers are special in that one trailer key can produce
> > > multiple idents for a single commit).
> 
> Hmm. Shouldn't that "&& !log.trailers.nr" be an "|| log.trailers.nr"? It
> doesn't seem to make sense to say "there are things that could produce
> multiple outputs" if there's more than one format _and_ no trailers.
Yeah. I was thinking of it as "is it OK to not de-dup", but of course it
is the other way around because of the "!". And regardless of which way
you write the conditional, the two sides must agree. ;)
> The logic should read "there are things that could produce multiple
> outputs if there is more than one format *or* at least one trailer".
> 
> So I think the right change would be:
> 
> --- >8 ---
> diff --git a/builtin/shortlog.c b/builtin/shortlog.c
> index 95ceab7649..7e1b56e2aa 100644
> --- a/builtin/shortlog.c
> +++ b/builtin/shortlog.c
> @@ -216,7 +216,8 @@ static void insert_records_from_format(struct shortlog *log,
> 
>  		format_commit_message(commit, item->string, &buf, ctx);
> 
> -		if (strset_add(dups, buf.buf))
> +		if (!(log->format.nr > 1 || log->trailers.nr) ||
> +		    strset_add(dups, buf.buf))
>  			insert_one_record(log, buf.buf, oneline);
>  	}
> --- 8< ---
> 
> Yeah?
Right. I wondered if it might be a little clearer to drop the outer "!",
which yields:
  if ((log->format.nr <= 1 && !log->trailers.nr) ||
      strset_add(dups, buf.buf))
but it is not really any less confusing. If we gave that first part of
the conditional a name, like:
  if (!needs_dedup(log) || strset_add(dups, buf.buf))
maybe that is better. I dunno.
Regardless, what you wrote above is correct. Thanks for catching it.
-Peff
^ permalink raw reply	[flat|nested] 72+ messages in thread
* Re: [PATCH 4/7] shortlog: support arbitrary commit format `--group`s
  2022-10-21  2:39         ` Taylor Blau
@ 2022-10-21  5:21           ` Jeff King
  2022-10-21 22:12             ` Taylor Blau
  0 siblings, 1 reply; 72+ messages in thread
From: Jeff King @ 2022-10-21  5:21 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, jacob, gitster
On Thu, Oct 20, 2022 at 10:39:54PM -0400, Taylor Blau wrote:
> --- 8< ---
> diff --git a/t/t4201-shortlog.sh b/t/t4201-shortlog.sh
> index 0abe5354fc..566d581e1b 100755
> --- a/t/t4201-shortlog.sh
> +++ b/t/t4201-shortlog.sh
> @@ -356,6 +356,19 @@ test_expect_success 'shortlog can match multiple groups' '
>  	test_cmp expect actual
>  '
> 
> +test_expect_success 'shortlog can match multiple format groups' '
> +	cat >expect <<-\EOF &&
> +	     2	User B <b@example.com>
> +	     1	A U Thor <author@example.com>
> +	     1	User A <a@example.com>
> +	EOF
> +	git shortlog -ns \
> +		--group="%(trailers:valueonly,separator=%0x00,key=some-trailer)" \
> +		--group="%(trailers:valueonly,separator=%0x00,key=another-trailer)" \
> +		-2 HEAD >actual &&
> +	test_cmp expect actual
> +'
> +
>  test_expect_success 'set up option selection tests' '
>  	git commit --allow-empty -F - <<-\EOF
>  	subject
> --- >8 ---
> 
> The gross bit there really is the 'separator=%0x00' thing, since the
> "trailers" pretty format gives us a NL character already. I suppose that
> you could do something like this on top instead:
IMHO the new test should avoid using trailers entirely, to avoid the
notion that they are necessary to create the duplicate situation. In a
normal repository, the most obvious one is just asking about both
authors and committers:
  git shortlog --group=format:%cn --group=format:%an
Most commits will have the same value for both, but we want to credit
the commit to them only once.
Of course in our fake test-lib world, authors and committers are
different by default. :-/
But it's easy enough to make more normal-looking commits, perhaps like:
diff --git a/t/t4201-shortlog.sh b/t/t4201-shortlog.sh
index 0abe5354fc..f0ff8a1714 100755
--- a/t/t4201-shortlog.sh
+++ b/t/t4201-shortlog.sh
@@ -356,6 +356,20 @@ test_expect_success 'shortlog can match multiple groups' '
 	test_cmp expect actual
 '
 
+test_expect_success 'shortlog can match multiple format groups' '
+	GIT_COMMITTER_NAME=$GIT_AUTHOR_NAME \
+		git commit --allow-empty -m "identical names" &&
+	cat >expect <<-\EOF &&
+	     2	A U Thor
+	     1	C O Mitter
+	EOF
+	git shortlog -ns \
+		--group=%cn \
+		--group=%an \
+		-2 HEAD >actual &&
+	test_cmp expect actual
+'
+
 test_expect_success 'set up option selection tests' '
 	git commit --allow-empty -F - <<-\EOF
 	subject
You could also get fancier by dropping "-s" and making sure that the
commits were attributed as expected, but I think the count covers
things.
-Peff
^ permalink raw reply related	[flat|nested] 72+ messages in thread
* Re: [PATCH v2 1/6] shortlog: accept `--date`-related options
  2022-10-21  3:11   ` [PATCH v2 1/6] shortlog: accept `--date`-related options Taylor Blau
@ 2022-10-21  5:32     ` Jeff King
  2022-10-21 21:55       ` Taylor Blau
  0 siblings, 1 reply; 72+ messages in thread
From: Jeff King @ 2022-10-21  5:32 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, jacob, gitster, avarab
On Thu, Oct 20, 2022 at 11:11:29PM -0400, Taylor Blau wrote:
> diff --git a/Documentation/git-shortlog.txt b/Documentation/git-shortlog.txt
> index f64e77047b..311c041c06 100644
> --- a/Documentation/git-shortlog.txt
> +++ b/Documentation/git-shortlog.txt
> @@ -47,6 +47,11 @@ OPTIONS
>  
>  	Each pretty-printed commit will be rewrapped before it is shown.
>  
> +--date=<format>::
> +	With a `--group=format:<format>`, show dates formatted
> +	according to the given date string. (See the `--date` options
> +	in the "COMMIT FORMATTING" section of linkgit:git-log[1].)
I like this much better than including the whole date-formats section.
Two nits:
  - s/options/option/ ? I guess you could say there are "options" for
    formatting the date, but with "--date" it seems like you'd mean the
    singular option.
  - Because "Commit Formatting" is a subsection in git-log(1), the
    manual will format it with title caps, not all-caps.
> diff --git a/builtin/shortlog.c b/builtin/shortlog.c
> index 7a1e1fe7c0..53c379a51d 100644
> --- a/builtin/shortlog.c
> +++ b/builtin/shortlog.c
> @@ -211,7 +211,7 @@ void shortlog_add_commit(struct shortlog *log, struct commit *commit)
>  	ctx.fmt = CMIT_FMT_USERFORMAT;
>  	ctx.abbrev = log->abbrev;
>  	ctx.print_email_subject = 1;
> -	ctx.date_mode.type = DATE_NORMAL;
> +	ctx.date_mode = log->date_mode;
>  	ctx.output_encoding = get_log_output_encoding();
>  
>  	if (!log->summary) {
After the discussion in the last round of whether that funky caller in
make_cover_letter() properly initialized the "struct shortlog", I
wondered briefly if this would do the right thing for that caller. And
it does, because DATE_NORMAL is intentionally set to "0" to allow for
zero-initializing (which is what shortlog_init does).
Not that it matters in practice, because that caller will not group by
nor format using a date, but I was curious if I had inadvertently
introduced a gotcha for future callers. But it is good.
> diff --git a/t/t4201-shortlog.sh b/t/t4201-shortlog.sh
> index 3095b1b2ff..7547da539d 100755
> --- a/t/t4201-shortlog.sh
> +++ b/t/t4201-shortlog.sh
> @@ -83,6 +83,13 @@ test_expect_success 'pretty format' '
>  	test_cmp expect log.predictable
>  '
>  
> +test_expect_success 'pretty format (with --date)' '
> +	sed "s/SUBJECT/2005-04-07 OBJECT_NAME/" expect.template >expect &&
> +	git shortlog --format="%ad %H" --date=short HEAD >log &&
> +	fuzz log >log.predictable &&
> +	test_cmp expect log.predictable
> +'
And this looks sensible. If you dropped "%H" then you wouldn't need to
fuzz, and are still testing the interesting part. But just following the
whole template/fuzz thing of the surrounding tests is reasonable.
-Peff
^ permalink raw reply	[flat|nested] 72+ messages in thread
* Re: [PATCH v2 2/6] shortlog: make trailer insertion a noop when appropriate
  2022-10-21  3:11   ` [PATCH v2 2/6] shortlog: make trailer insertion a noop when appropriate Taylor Blau
@ 2022-10-21  5:38     ` Jeff King
  2022-10-21 21:57       ` Taylor Blau
  0 siblings, 1 reply; 72+ messages in thread
From: Jeff King @ 2022-10-21  5:38 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, jacob, gitster, avarab
On Thu, Oct 20, 2022 at 11:11:32PM -0400, Taylor Blau wrote:
> When there are no trailers to insert, it is natural that
> insert_records_from_trailers() should return without having done any
> work.
> 
> But instead we guard this call unnecessarily by first checking whether
> `log->groups` has the `SHORTLOG_GROUP_TRAILER` bit set.
> 
> Prepare to match a similar pattern in the future where a function which
> inserts records of a certain type does no work when no specifiers
> matching that type are given.
The patch looks good. And knowing what the rest of the series looks
like, this is clearly the right thing to do. But I wonder if the
rationale would be easier if it came at the end. Then rather than
"prepare to match a similar pattern in the future", you can just say
"it's weird that we check the bit for SHORTLOG_GROUP_TRAILER but nothing
else; let's make them consistent".
Obviously this is a minor nit, and I don't care much either way, as the
end result is the same. I just think in general that "this is bad, let's
make it better" makes commit messages easier to write than ones which
posit some hypothetical future state. ;)
-Peff
^ permalink raw reply	[flat|nested] 72+ messages in thread
* Re: [PATCH v2 3/6] shortlog: extract `--group` fragment for translation
  2022-10-21  3:11   ` [PATCH v2 3/6] shortlog: extract `--group` fragment for translation Taylor Blau
@ 2022-10-21  5:40     ` Jeff King
  0 siblings, 0 replies; 72+ messages in thread
From: Jeff King @ 2022-10-21  5:40 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, jacob, gitster, avarab
On Thu, Oct 20, 2022 at 11:11:35PM -0400, Taylor Blau wrote:
> The subsequent commit will add another unhandled case in
> `read_from_stdin()` which will want to use the same message as with
> `--group=trailer`.
> 
> Extract the "--group=trailer" part from this message so the same
> translation key can be used for both cases.
Continuing my thoughts from the previous patch, this one actually makes
more sense to me in the order you have it. I.e., making it more generic
ahead of time, instead of afterwards saying "hey, these two messages are
very similar".
I'm not sure what makes me see them differently. It's possible I'm
simply not consistent and you should ignore me. :)
-Peff
^ permalink raw reply	[flat|nested] 72+ messages in thread
* Re: [PATCH v2 4/6] shortlog: support arbitrary commit format `--group`s
  2022-10-21  3:11   ` [PATCH v2 4/6] shortlog: support arbitrary commit format `--group`s Taylor Blau
@ 2022-10-21  5:48     ` Jeff King
  2022-10-21 22:07       ` Taylor Blau
  0 siblings, 1 reply; 72+ messages in thread
From: Jeff King @ 2022-10-21  5:48 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, jacob, gitster, avarab
On Thu, Oct 20, 2022 at 11:11:38PM -0400, Taylor Blau wrote:
> +static void insert_records_from_format(struct shortlog *log,
> +				       struct strset *dups,
> +				       struct commit *commit,
> +				       struct pretty_print_context *ctx,
> +				       const char *oneline)
> +{
> +	struct strbuf buf = STRBUF_INIT;
> +	struct string_list_item *item;
> +
> +	for_each_string_list_item(item, &log->format) {
> +		strbuf_reset(&buf);
> +
> +		format_commit_message(commit, item->string, &buf, ctx);
> +
> +		if (!(log->format.nr > 1 || log->trailers.nr) ||
> +		    strset_add(dups, buf.buf))
> +			insert_one_record(log, buf.buf, oneline);
> +	}
Just to be clear, since I talked about this conditional in the other
thread: what you have here is correct, and now seeing it again in
context, I think it's OK in terms of readability.
> diff --git a/t/t4201-shortlog.sh b/t/t4201-shortlog.sh
> index 7547da539d..13ac0bac64 100755
> --- a/t/t4201-shortlog.sh
> +++ b/t/t4201-shortlog.sh
> @@ -244,6 +244,36 @@ test_expect_success 'shortlog --group=trailer:signed-off-by' '
>  	test_cmp expect actual
>  '
>  
> +test_expect_success 'shortlog --group=format' '
> +	git shortlog -s --date="format:%Y" --group="format:%cN (%cd)" \
> +		HEAD >actual &&
> +	cat >expect <<-\EOF &&
> +	     4	C O Mitter (2005)
> +	     1	Sin Nombre (2005)
> +	EOF
> +	test_cmp expect actual
> +'
OK, so this is a basic test of the feature, and makes sure the --date
support works.
> +test_expect_success 'shortlog --group=<format> DWIM' '
> +	git shortlog -s --date="format:%Y" --group="%cN (%cd)" HEAD >actual &&
> +	test_cmp expect actual
> +'
And this is the same thing but without "format:". Good...
> +test_expect_success 'shortlog multiple --group=format' '
> +	git shortlog -s --date="format:%Y" --group="format:%cN (%cd)" \
> +		HEAD >actual &&
> +	cat >expect <<-\EOF &&
> +	     4	C O Mitter (2005)
> +	     1	Sin Nombre (2005)
> +	EOF
> +	test_cmp expect actual
> +'
...but this one seems redundant. It is not using multiple formats. And
you test that later, in the "can match multiple format groups" test. Is
this one just leftover from development?
> +test_expect_success 'shortlog bogus --group' '
> +	test_must_fail git shortlog --group=bogus HEAD 2>err &&
> +	grep "unknown group type" err
> +'
This one is a nice inclusion. I was surprised we didn't have it already. :)
> +test_expect_success 'shortlog can match multiple format groups' '
> +	cat >expect <<-\EOF &&
> +	     2	User B <b@example.com>
> +	     1	A U Thor <author@example.com>
> +	     1	User A <a@example.com>
> +	EOF
> +	git shortlog -ns \
> +		--group="%(trailers:valueonly,key=some-trailer)" \
> +		--group="%(trailers:valueonly,key=another-trailer)" \
> +		-2 HEAD >actual.raw &&
> +	grep -v "^$" actual.raw >actual &&
> +	test_cmp expect actual
> +'
And this one is correct, though I think avoiding trailers is more to the
point; see the other mail I sent.
-Peff
^ permalink raw reply	[flat|nested] 72+ messages in thread
* Re: [PATCH v2 5/6] shortlog: implement `--group=author` in terms of `--group=<format>`
  2022-10-21  3:11   ` [PATCH v2 5/6] shortlog: implement `--group=author` in terms of `--group=<format>` Taylor Blau
@ 2022-10-21  5:50     ` Jeff King
  0 siblings, 0 replies; 72+ messages in thread
From: Jeff King @ 2022-10-21  5:50 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, jacob, gitster, avarab
On Thu, Oct 20, 2022 at 11:11:41PM -0400, Taylor Blau wrote:
> Instead of handling SHORTLOG_GROUP_AUTHOR separately, reimplement it as
> a special case of the new `--group=<format>` mode, where the author mode
> is a shorthand for `--group='%aN <%aE>'.
> 
> Note that we still need to keep the SHORTLOG_GROUP_AUTHOR enum since it
> has a different meaning in `read_from_stdin()`, where it is still used
> for a different purpose.
Yep, makes sense. The implementation all looks correct to me, although:
> diff --git a/builtin/log.c b/builtin/log.c
> index ee19dc5d45..b4d5420217 100644
> --- a/builtin/log.c
> +++ b/builtin/log.c
> @@ -1334,6 +1334,7 @@ static void make_cover_letter(struct rev_info *rev, int use_separate_file,
>  	log.in2 = 4;
>  	log.file = rev->diffopt.file;
>  	log.groups = SHORTLOG_GROUP_AUTHOR;
> +	shortlog_finish_setup(&log);
>  	for (i = 0; i < nr; i++)
>  		shortlog_add_commit(&log, list[i]);
I'd probably have pulled out this "finish_setup()" stuff into its own
patch. I can live with it either way, though. The advantage of having it
here is that it becomes obvious why this caller needs it (whereas if
done as a preparatory patch, it would just do the trailer sorting, which
make_cover_letter() obviously does not need).
-Peff
^ permalink raw reply	[flat|nested] 72+ messages in thread
* Re: [PATCH v2 6/6] shortlog: implement `--group=committer` in terms of `--group=<format>`
  2022-10-21  3:11   ` [PATCH v2 6/6] shortlog: implement `--group=committer` " Taylor Blau
@ 2022-10-21  5:51     ` Jeff King
  0 siblings, 0 replies; 72+ messages in thread
From: Jeff King @ 2022-10-21  5:51 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, jacob, gitster, avarab
On Thu, Oct 20, 2022 at 11:11:44PM -0400, Taylor Blau wrote:
> In the same spirit as the previous commit, reimplement
> `--group=committer` as a special case of `--group=<format>`, too.
> 
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> ---
>  builtin/shortlog.c | 14 +++-----------
>  1 file changed, 3 insertions(+), 11 deletions(-)
And this one is nice and simple, the AUTHOR one having dealt with all of
the shortlog_finish_setup() parts for us. :)
-Peff
^ permalink raw reply	[flat|nested] 72+ messages in thread
* Re: [PATCH v2 0/6] shortlog: introduce `--group=<format>`
  2022-10-21  3:11 ` [PATCH v2 0/6] shortlog: introduce `--group=<format>` Taylor Blau
                     ` (5 preceding siblings ...)
  2022-10-21  3:11   ` [PATCH v2 6/6] shortlog: implement `--group=committer` " Taylor Blau
@ 2022-10-21  5:53   ` Jeff King
  6 siblings, 0 replies; 72+ messages in thread
From: Jeff King @ 2022-10-21  5:53 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, jacob, gitster, avarab
On Thu, Oct 20, 2022 at 11:11:25PM -0400, Taylor Blau wrote:
> Here is a reroll of my series to implement arbitrary pretty formats as
> shortlog `--group`'s, based on a suggestion from Jacob Stopak.
> 
> The changes are somewhat minimal, including a rebase onto the current
> tip of master. Less minimal, however, is dropping the reimplementation
> of `--group=trailer:<key>` in terms of the format group, since this
> ended up being more trouble than it was worth.
> 
> There are also a handful of small tweaks throughout based on feedback
> from Peff.
Thanks. This mostly looks good to me, though I think there are a few doc
and test nits that make it worth one more reroll. I had various style
and ordering suggestions, too. I'll leave it to you whether you think
it's worth taking any of them during the re-roll, or just ignoring them. :)
-Peff
^ permalink raw reply	[flat|nested] 72+ messages in thread
* Re: [PATCH v2 1/6] shortlog: accept `--date`-related options
  2022-10-21  5:32     ` Jeff King
@ 2022-10-21 21:55       ` Taylor Blau
  0 siblings, 0 replies; 72+ messages in thread
From: Taylor Blau @ 2022-10-21 21:55 UTC (permalink / raw)
  To: Jeff King; +Cc: git, jacob, gitster, avarab
On Fri, Oct 21, 2022 at 01:32:07AM -0400, Jeff King wrote:
> On Thu, Oct 20, 2022 at 11:11:29PM -0400, Taylor Blau wrote:
>
> > diff --git a/Documentation/git-shortlog.txt b/Documentation/git-shortlog.txt
> > index f64e77047b..311c041c06 100644
> > --- a/Documentation/git-shortlog.txt
> > +++ b/Documentation/git-shortlog.txt
> > @@ -47,6 +47,11 @@ OPTIONS
> >
> >  	Each pretty-printed commit will be rewrapped before it is shown.
> >
> > +--date=<format>::
> > +	With a `--group=format:<format>`, show dates formatted
> > +	according to the given date string. (See the `--date` options
> > +	in the "COMMIT FORMATTING" section of linkgit:git-log[1].)
>
> I like this much better than including the whole date-formats section.
> Two nits:
Thanks, both make sense.
Thanks,
Taylor
^ permalink raw reply	[flat|nested] 72+ messages in thread
* Re: [PATCH v2 2/6] shortlog: make trailer insertion a noop when appropriate
  2022-10-21  5:38     ` Jeff King
@ 2022-10-21 21:57       ` Taylor Blau
  0 siblings, 0 replies; 72+ messages in thread
From: Taylor Blau @ 2022-10-21 21:57 UTC (permalink / raw)
  To: Jeff King; +Cc: git, jacob, gitster, avarab
On Fri, Oct 21, 2022 at 01:38:28AM -0400, Jeff King wrote:
> On Thu, Oct 20, 2022 at 11:11:32PM -0400, Taylor Blau wrote:
>
> > When there are no trailers to insert, it is natural that
> > insert_records_from_trailers() should return without having done any
> > work.
> >
> > But instead we guard this call unnecessarily by first checking whether
> > `log->groups` has the `SHORTLOG_GROUP_TRAILER` bit set.
> >
> > Prepare to match a similar pattern in the future where a function which
> > inserts records of a certain type does no work when no specifiers
> > matching that type are given.
>
> The patch looks good. And knowing what the rest of the series looks
> like, this is clearly the right thing to do. But I wonder if the
> rationale would be easier if it came at the end. Then rather than
> "prepare to match a similar pattern in the future", you can just say
> "it's weird that we check the bit for SHORTLOG_GROUP_TRAILER but nothing
> else; let's make them consistent".
Hmm. I see what you're saying, but in fact by this point in the series
there are two other spots in shortlog_add_commit() that check whether
individual bits are set (one for SHORTLOG_GROUP_AUTHOR, and another for
SHORTLOG_GROUP_COMMITTER).
So I think that the semi-awkward commit message may in fact be the least
awkward thing to do here ;-).
Thanks,
Taylor
^ permalink raw reply	[flat|nested] 72+ messages in thread
* Re: [PATCH 5/7] shortlog: implement `--group=author` in terms of `--group=<format>`
  2022-10-21  5:03         ` Jeff King
@ 2022-10-21 22:05           ` Taylor Blau
  0 siblings, 0 replies; 72+ messages in thread
From: Taylor Blau @ 2022-10-21 22:05 UTC (permalink / raw)
  To: Jeff King; +Cc: git, jacob, gitster
On Fri, Oct 21, 2022 at 01:03:40AM -0400, Jeff King wrote:
> Right. I wondered if it might be a little clearer to drop the outer "!",
> which yields:
>
>   if ((log->format.nr <= 1 && !log->trailers.nr) ||
>       strset_add(dups, buf.buf))
>
> but it is not really any less confusing. If we gave that first part of
> the conditional a name, like:
>
>   if (!needs_dedup(log) || strset_add(dups, buf.buf))
>
> maybe that is better. I dunno.
Yeah, I think that this is best. I briefly wondered whether it would be
better to invert the check _and_ name it, but `if
(does_not_need_dedup(log))` is too wordy.
> Regardless, what you wrote above is correct. Thanks for catching it.
No problem, it was an honest mistake.
Thanks,
Taylor
^ permalink raw reply	[flat|nested] 72+ messages in thread
* Re: [PATCH v2 4/6] shortlog: support arbitrary commit format `--group`s
  2022-10-21  5:48     ` Jeff King
@ 2022-10-21 22:07       ` Taylor Blau
  0 siblings, 0 replies; 72+ messages in thread
From: Taylor Blau @ 2022-10-21 22:07 UTC (permalink / raw)
  To: Jeff King; +Cc: git, jacob, gitster, avarab
On Fri, Oct 21, 2022 at 01:48:22AM -0400, Jeff King wrote:
> > +test_expect_success 'shortlog multiple --group=format' '
> > +	git shortlog -s --date="format:%Y" --group="format:%cN (%cd)" \
> > +		HEAD >actual &&
> > +	cat >expect <<-\EOF &&
> > +	     4	C O Mitter (2005)
> > +	     1	Sin Nombre (2005)
> > +	EOF
> > +	test_cmp expect actual
> > +'
>
> ...but this one seems redundant. It is not using multiple formats. And
> you test that later, in the "can match multiple format groups" test. Is
> this one just leftover from development?
Oops, yes. Thanks for catching it, I'll drop this one.
> > +test_expect_success 'shortlog bogus --group' '
> > +	test_must_fail git shortlog --group=bogus HEAD 2>err &&
> > +	grep "unknown group type" err
> > +'
>
> This one is a nice inclusion. I was surprised we didn't have it already. :)
:-).
Thanks,
Taylor
^ permalink raw reply	[flat|nested] 72+ messages in thread
* Re: [PATCH 4/7] shortlog: support arbitrary commit format `--group`s
  2022-10-21  5:21           ` Jeff King
@ 2022-10-21 22:12             ` Taylor Blau
  0 siblings, 0 replies; 72+ messages in thread
From: Taylor Blau @ 2022-10-21 22:12 UTC (permalink / raw)
  To: Jeff King; +Cc: git, jacob, gitster
On Fri, Oct 21, 2022 at 01:21:30AM -0400, Jeff King wrote:
> On Thu, Oct 20, 2022 at 10:39:54PM -0400, Taylor Blau wrote:
>
> > --- 8< ---
> > diff --git a/t/t4201-shortlog.sh b/t/t4201-shortlog.sh
> > index 0abe5354fc..566d581e1b 100755
> > --- a/t/t4201-shortlog.sh
> > +++ b/t/t4201-shortlog.sh
> > @@ -356,6 +356,19 @@ test_expect_success 'shortlog can match multiple groups' '
> >  	test_cmp expect actual
> >  '
> >
> > +test_expect_success 'shortlog can match multiple format groups' '
> > +	cat >expect <<-\EOF &&
> > +	     2	User B <b@example.com>
> > +	     1	A U Thor <author@example.com>
> > +	     1	User A <a@example.com>
> > +	EOF
> > +	git shortlog -ns \
> > +		--group="%(trailers:valueonly,separator=%0x00,key=some-trailer)" \
> > +		--group="%(trailers:valueonly,separator=%0x00,key=another-trailer)" \
> > +		-2 HEAD >actual &&
> > +	test_cmp expect actual
> > +'
> > +
> >  test_expect_success 'set up option selection tests' '
> >  	git commit --allow-empty -F - <<-\EOF
> >  	subject
> > --- >8 ---
> >
> > The gross bit there really is the 'separator=%0x00' thing, since the
> > "trailers" pretty format gives us a NL character already. I suppose that
> > you could do something like this on top instead:
>
> IMHO the new test should avoid using trailers entirely, to avoid the
> notion that they are necessary to create the duplicate situation. In a
> normal repository, the most obvious one is just asking about both
> authors and committers:
>
>   git shortlog --group=format:%cn --group=format:%an
Yeah, that's fair. I was worried enough about whether or not this was
going to cause significant test fallout, but it ended up being extremely
straightforward and simplified the tests nicely. Thanks!
Thanks,
Taylor
^ permalink raw reply	[flat|nested] 72+ messages in thread
* [PATCH v3 0/7] shortlog: introduce `--group=<format>`
  2022-10-11  0:33 [PATCH 0/7] shortlog: introduce `--group=<format>` Taylor Blau
                   ` (7 preceding siblings ...)
  2022-10-21  3:11 ` [PATCH v2 0/6] shortlog: introduce `--group=<format>` Taylor Blau
@ 2022-10-21 22:25 ` Taylor Blau
  2022-10-21 22:25   ` [PATCH v3 1/7] shortlog: accept `--date`-related options Taylor Blau
                     ` (7 more replies)
  2022-10-24 18:55 ` [PATCH " Taylor Blau
  9 siblings, 8 replies; 72+ messages in thread
From: Taylor Blau @ 2022-10-21 22:25 UTC (permalink / raw)
  To: git; +Cc: jacob, peff, gitster, avarab
Here is a (likely final) reroll of my series to implement arbitrary
pretty formats as shortlog `--group`'s, based on a suggestion from Jacob
Stopak.
The changes are cosmetic, based on the latest round of review from Peff.
There aren't any substantive changes from the last round.
Thanks in advance for your review.
Jeff King (1):
  shortlog: accept `--date`-related options
Taylor Blau (6):
  shortlog: make trailer insertion a noop when appropriate
  shortlog: extract `--group` fragment for translation
  shortlog: support arbitrary commit format `--group`s
  shortlog: extract `shortlog_finish_setup()`
  shortlog: implement `--group=author` in terms of `--group=<format>`
  shortlog: implement `--group=committer` in terms of `--group=<format>`
 Documentation/git-shortlog.txt |  8 ++++
 builtin/log.c                  |  1 +
 builtin/shortlog.c             | 87 +++++++++++++++++++++++-----------
 shortlog.h                     |  5 ++
 t/t4201-shortlog.sh            | 39 +++++++++++++++
 5 files changed, 113 insertions(+), 27 deletions(-)
Range-diff against v2:
1:  58baccbaa8 ! 1:  add457f319 shortlog: accept `--date`-related options
    @@ Documentation/git-shortlog.txt: OPTIONS
      	Each pretty-printed commit will be rewrapped before it is shown.
      
     +--date=<format>::
    -+	With a `--group=format:<format>`, show dates formatted
    -+	according to the given date string. (See the `--date` options
    -+	in the "COMMIT FORMATTING" section of linkgit:git-log[1].)
    ++	Show dates formatted according to the given date string. (See
    ++	the `--date` option in the "Commit Formatting" section of
    ++	linkgit:git-log[1]).
     +
      --group=<type>::
      	Group commits based on `<type>`. If no `--group` option is
2:  7decccad7c = 2:  25cdc28215 shortlog: make trailer insertion a noop when appropriate
3:  3665488fc9 = 3:  cf84f829aa shortlog: extract `--group` fragment for translation
4:  4a36c0ca4e ! 4:  f75a952421 shortlog: support arbitrary commit format `--group`s
    @@ Commit message
         Signed-off-by: Taylor Blau <me@ttaylorr.com>
     
      ## Documentation/git-shortlog.txt ##
    +@@ Documentation/git-shortlog.txt: OPTIONS
    + --date=<format>::
    + 	Show dates formatted according to the given date string. (See
    + 	the `--date` option in the "Commit Formatting" section of
    +-	linkgit:git-log[1]).
    ++	linkgit:git-log[1]). Useful with `--group=format:<format>`.
    + 
    + --group=<type>::
    + 	Group commits based on `<type>`. If no `--group` option is
     @@ Documentation/git-shortlog.txt: OPTIONS
         example, if your project uses `Reviewed-by` trailers, you might want
         to see who has been reviewing with
    @@ builtin/shortlog.c: static void insert_records_from_trailers(struct shortlog *lo
      	unuse_commit_buffer(commit, commit_buffer);
      }
      
    ++static int shortlog_needs_dedup(const struct shortlog *log)
    ++{
    ++	return log->format.nr > 1 || log->trailers.nr;
    ++}
    ++
     +static void insert_records_from_format(struct shortlog *log,
     +				       struct strset *dups,
     +				       struct commit *commit,
    @@ builtin/shortlog.c: static void insert_records_from_trailers(struct shortlog *lo
     +
     +		format_commit_message(commit, item->string, &buf, ctx);
     +
    -+		if (!(log->format.nr > 1 || log->trailers.nr) ||
    -+		    strset_add(dups, buf.buf))
    ++		if (!shortlog_needs_dedup(log) || strset_add(dups, buf.buf))
     +			insert_one_record(log, buf.buf, oneline);
     +	}
     +
    @@ t/t4201-shortlog.sh: test_expect_success 'shortlog --group=trailer:signed-off-by
     +	test_cmp expect actual
     +'
     +
    -+test_expect_success 'shortlog multiple --group=format' '
    -+	git shortlog -s --date="format:%Y" --group="format:%cN (%cd)" \
    -+		HEAD >actual &&
    -+	cat >expect <<-\EOF &&
    -+	     4	C O Mitter (2005)
    -+	     1	Sin Nombre (2005)
    -+	EOF
    -+	test_cmp expect actual
    -+'
    -+
     +test_expect_success 'shortlog bogus --group' '
     +	test_must_fail git shortlog --group=bogus HEAD 2>err &&
     +	grep "unknown group type" err
    @@ t/t4201-shortlog.sh: test_expect_success 'shortlog can match multiple groups' '
      '
      
     +test_expect_success 'shortlog can match multiple format groups' '
    ++	GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" \
    ++		git commit --allow-empty -m "identical names" &&
    ++	test_tick &&
     +	cat >expect <<-\EOF &&
    -+	     2	User B <b@example.com>
    -+	     1	A U Thor <author@example.com>
    -+	     1	User A <a@example.com>
    ++	     2	A U Thor
    ++	     1	C O Mitter
     +	EOF
    -+	git shortlog -ns \
    -+		--group="%(trailers:valueonly,key=some-trailer)" \
    -+		--group="%(trailers:valueonly,key=another-trailer)" \
    -+		-2 HEAD >actual.raw &&
    -+	grep -v "^$" actual.raw >actual &&
    ++	git shortlog -ns --group="%cn" --group="%an" -2 HEAD >actual &&
     +	test_cmp expect actual
     +'
     +
5:  f0044682be ! 5:  8dd69edcf8 shortlog: implement `--group=author` in terms of `--group=<format>`
    @@ Metadata
     Author: Taylor Blau <me@ttaylorr.com>
     
      ## Commit message ##
    -    shortlog: implement `--group=author` in terms of `--group=<format>`
    +    shortlog: extract `shortlog_finish_setup()`
     
    -    Instead of handling SHORTLOG_GROUP_AUTHOR separately, reimplement it as
    -    a special case of the new `--group=<format>` mode, where the author mode
    -    is a shorthand for `--group='%aN <%aE>'.
    +    Extract a function which finishes setting up the shortlog struct for
    +    use. The caller in `make_cover_letter()` does not care about trailer
    +    sorting, so it isn't strictly necessary to add a call there in this
    +    patch.
     
    -    Note that we still need to keep the SHORTLOG_GROUP_AUTHOR enum since it
    -    has a different meaning in `read_from_stdin()`, where it is still used
    -    for a different purpose.
    +    But the next patch will add additional functionality to the new
    +    `shortlog_finish_setup()` function, which the caller in
    +    `make_cover_letter()` will care about.
     
         Signed-off-by: Taylor Blau <me@ttaylorr.com>
     
    @@ builtin/log.c: static void make_cover_letter(struct rev_info *rev, int use_separ
      
     
      ## builtin/shortlog.c ##
    -@@ builtin/shortlog.c: void shortlog_add_commit(struct shortlog *log, struct commit *commit)
    - 	}
    - 	oneline_str = oneline.len ? oneline.buf : "<none>";
    - 
    --	if (log->groups & SHORTLOG_GROUP_AUTHOR) {
    --		strbuf_reset(&ident);
    --		format_commit_message(commit,
    --				      log->email ? "%aN <%aE>" : "%aN",
    --				      &ident, &ctx);
    --		if (!HAS_MULTI_BITS(log->groups) ||
    --		    strset_add(&dups, ident.buf))
    --			insert_one_record(log, ident.buf, oneline_str);
    --	}
    - 	if (log->groups & SHORTLOG_GROUP_COMMITTER) {
    - 		strbuf_reset(&ident);
    - 		format_commit_message(commit,
     @@ builtin/shortlog.c: void shortlog_init(struct shortlog *log)
      	log->format.strdup_strings = 1;
      }
      
     +void shortlog_finish_setup(struct shortlog *log)
     +{
    -+	if (log->groups & SHORTLOG_GROUP_AUTHOR)
    -+		string_list_append(&log->format,
    -+				   log->email ? "%aN <%aE>" : "%aN");
    -+
     +	string_list_sort(&log->trailers);
     +}
     +
-:  ---------- > 6:  f4f233fb48 shortlog: implement `--group=author` in terms of `--group=<format>`
6:  6a9e204177 = 7:  6ce06a054e shortlog: implement `--group=committer` in terms of `--group=<format>`
-- 
2.38.0.16.g393fd4c6db
^ permalink raw reply	[flat|nested] 72+ messages in thread
* [PATCH v3 1/7] shortlog: accept `--date`-related options
  2022-10-21 22:25 ` [PATCH v3 0/7] " Taylor Blau
@ 2022-10-21 22:25   ` Taylor Blau
  2022-10-21 22:25   ` [PATCH v3 2/7] shortlog: make trailer insertion a noop when appropriate Taylor Blau
                     ` (6 subsequent siblings)
  7 siblings, 0 replies; 72+ messages in thread
From: Taylor Blau @ 2022-10-21 22:25 UTC (permalink / raw)
  To: git; +Cc: jacob, peff, gitster, avarab
From: Jeff King <peff@peff.net>
Prepare for a future patch which will introduce arbitrary pretty formats
via the `--group` argument.
To allow additional customizability (for example, to support something
like `git shortlog -s --group='%aD' --date='format:%Y-%m' ...` (which
groups commits by the datestring 'YYYY-mm' according to author date), we
must store off the `--date` parsed from calling `parse_revision_opt()`.
Note that this also affects custom output `--format` strings in `git
shortlog`. Though this is a behavior change, this is arguably fixing a
long-standing bug (ie., that `--format` strings are not affected by
`--date` specifiers as they should be).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/git-shortlog.txt | 5 +++++
 builtin/shortlog.c             | 3 ++-
 shortlog.h                     | 2 ++
 t/t4201-shortlog.sh            | 7 +++++++
 4 files changed, 16 insertions(+), 1 deletion(-)
diff --git a/Documentation/git-shortlog.txt b/Documentation/git-shortlog.txt
index f64e77047b..9ed9d6a9e7 100644
--- a/Documentation/git-shortlog.txt
+++ b/Documentation/git-shortlog.txt
@@ -47,6 +47,11 @@ OPTIONS
 
 	Each pretty-printed commit will be rewrapped before it is shown.
 
+--date=<format>::
+	Show dates formatted according to the given date string. (See
+	the `--date` option in the "Commit Formatting" section of
+	linkgit:git-log[1]).
+
 --group=<type>::
 	Group commits based on `<type>`. If no `--group` option is
 	specified, the default is `author`. `<type>` is one of:
diff --git a/builtin/shortlog.c b/builtin/shortlog.c
index 7a1e1fe7c0..53c379a51d 100644
--- a/builtin/shortlog.c
+++ b/builtin/shortlog.c
@@ -211,7 +211,7 @@ void shortlog_add_commit(struct shortlog *log, struct commit *commit)
 	ctx.fmt = CMIT_FMT_USERFORMAT;
 	ctx.abbrev = log->abbrev;
 	ctx.print_email_subject = 1;
-	ctx.date_mode.type = DATE_NORMAL;
+	ctx.date_mode = log->date_mode;
 	ctx.output_encoding = get_log_output_encoding();
 
 	if (!log->summary) {
@@ -407,6 +407,7 @@ int cmd_shortlog(int argc, const char **argv, const char *prefix)
 	log.user_format = rev.commit_format == CMIT_FMT_USERFORMAT;
 	log.abbrev = rev.abbrev;
 	log.file = rev.diffopt.file;
+	log.date_mode = rev.date_mode;
 
 	if (!log.groups)
 		log.groups = SHORTLOG_GROUP_AUTHOR;
diff --git a/shortlog.h b/shortlog.h
index 3f7e9aabca..dc388dd459 100644
--- a/shortlog.h
+++ b/shortlog.h
@@ -2,6 +2,7 @@
 #define SHORTLOG_H
 
 #include "string-list.h"
+#include "date.h"
 
 struct commit;
 
@@ -15,6 +16,7 @@ struct shortlog {
 	int in2;
 	int user_format;
 	int abbrev;
+	struct date_mode date_mode;
 
 	enum {
 		SHORTLOG_GROUP_AUTHOR = (1 << 0),
diff --git a/t/t4201-shortlog.sh b/t/t4201-shortlog.sh
index 3095b1b2ff..7547da539d 100755
--- a/t/t4201-shortlog.sh
+++ b/t/t4201-shortlog.sh
@@ -83,6 +83,13 @@ test_expect_success 'pretty format' '
 	test_cmp expect log.predictable
 '
 
+test_expect_success 'pretty format (with --date)' '
+	sed "s/SUBJECT/2005-04-07 OBJECT_NAME/" expect.template >expect &&
+	git shortlog --format="%ad %H" --date=short HEAD >log &&
+	fuzz log >log.predictable &&
+	test_cmp expect log.predictable
+'
+
 test_expect_success '--abbrev' '
 	sed s/SUBJECT/OBJID/ expect.template >expect &&
 	git shortlog --format="%h" --abbrev=35 HEAD >log &&
-- 
2.38.0.16.g393fd4c6db
^ permalink raw reply related	[flat|nested] 72+ messages in thread
* [PATCH v3 2/7] shortlog: make trailer insertion a noop when appropriate
  2022-10-21 22:25 ` [PATCH v3 0/7] " Taylor Blau
  2022-10-21 22:25   ` [PATCH v3 1/7] shortlog: accept `--date`-related options Taylor Blau
@ 2022-10-21 22:25   ` Taylor Blau
  2022-10-21 22:25   ` [PATCH v3 3/7] shortlog: extract `--group` fragment for translation Taylor Blau
                     ` (5 subsequent siblings)
  7 siblings, 0 replies; 72+ messages in thread
From: Taylor Blau @ 2022-10-21 22:25 UTC (permalink / raw)
  To: git; +Cc: jacob, peff, gitster, avarab
When there are no trailers to insert, it is natural that
insert_records_from_trailers() should return without having done any
work.
But instead we guard this call unnecessarily by first checking whether
`log->groups` has the `SHORTLOG_GROUP_TRAILER` bit set.
Prepare to match a similar pattern in the future where a function which
inserts records of a certain type does no work when no specifiers
matching that type are given.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/shortlog.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/builtin/shortlog.c b/builtin/shortlog.c
index 53c379a51d..18f0800c82 100644
--- a/builtin/shortlog.c
+++ b/builtin/shortlog.c
@@ -170,6 +170,9 @@ static void insert_records_from_trailers(struct shortlog *log,
 	const char *commit_buffer, *body;
 	struct strbuf ident = STRBUF_INIT;
 
+	if (!log->trailers.nr)
+		return;
+
 	/*
 	 * Using format_commit_message("%B") would be simpler here, but
 	 * this saves us copying the message.
@@ -240,9 +243,7 @@ void shortlog_add_commit(struct shortlog *log, struct commit *commit)
 		    strset_add(&dups, ident.buf))
 			insert_one_record(log, ident.buf, oneline_str);
 	}
-	if (log->groups & SHORTLOG_GROUP_TRAILER) {
-		insert_records_from_trailers(log, &dups, commit, &ctx, oneline_str);
-	}
+	insert_records_from_trailers(log, &dups, commit, &ctx, oneline_str);
 
 	strset_clear(&dups);
 	strbuf_release(&ident);
-- 
2.38.0.16.g393fd4c6db
^ permalink raw reply related	[flat|nested] 72+ messages in thread
* [PATCH v3 3/7] shortlog: extract `--group` fragment for translation
  2022-10-21 22:25 ` [PATCH v3 0/7] " Taylor Blau
  2022-10-21 22:25   ` [PATCH v3 1/7] shortlog: accept `--date`-related options Taylor Blau
  2022-10-21 22:25   ` [PATCH v3 2/7] shortlog: make trailer insertion a noop when appropriate Taylor Blau
@ 2022-10-21 22:25   ` Taylor Blau
  2022-10-21 22:25   ` [PATCH v3 4/7] shortlog: support arbitrary commit format `--group`s Taylor Blau
                     ` (4 subsequent siblings)
  7 siblings, 0 replies; 72+ messages in thread
From: Taylor Blau @ 2022-10-21 22:25 UTC (permalink / raw)
  To: git; +Cc: jacob, peff, gitster, avarab
The subsequent commit will add another unhandled case in
`read_from_stdin()` which will want to use the same message as with
`--group=trailer`.
Extract the "--group=trailer" part from this message so the same
translation key can be used for both cases.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/shortlog.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/builtin/shortlog.c b/builtin/shortlog.c
index 18f0800c82..d0645769d7 100644
--- a/builtin/shortlog.c
+++ b/builtin/shortlog.c
@@ -132,7 +132,7 @@ static void read_from_stdin(struct shortlog *log)
 		match = committer_match;
 		break;
 	case SHORTLOG_GROUP_TRAILER:
-		die(_("using --group=trailer with stdin is not supported"));
+		die(_("using %s with stdin is not supported"), "--group=trailer");
 	default:
 		BUG("unhandled shortlog group");
 	}
-- 
2.38.0.16.g393fd4c6db
^ permalink raw reply related	[flat|nested] 72+ messages in thread
* [PATCH v3 4/7] shortlog: support arbitrary commit format `--group`s
  2022-10-21 22:25 ` [PATCH v3 0/7] " Taylor Blau
                     ` (2 preceding siblings ...)
  2022-10-21 22:25   ` [PATCH v3 3/7] shortlog: extract `--group` fragment for translation Taylor Blau
@ 2022-10-21 22:25   ` Taylor Blau
  2022-10-22  0:28     ` Jeff King
  2022-10-21 22:25   ` [PATCH v3 5/7] shortlog: extract `shortlog_finish_setup()` Taylor Blau
                     ` (3 subsequent siblings)
  7 siblings, 1 reply; 72+ messages in thread
From: Taylor Blau @ 2022-10-21 22:25 UTC (permalink / raw)
  To: git; +Cc: jacob, peff, gitster, avarab
In addition to generating a shortlog based on committer, author, or the
identity in one or more specified trailers, it can be useful to generate
a shortlog based on an arbitrary commit format.
This can be used, for example, to generate a distribution of commit
activity over time, like so:
    $ git shortlog --group='%cd' --date='format:%Y-%m' -s v2.37.0..
       117  2022-06
       274  2022-07
       324  2022-08
       263  2022-09
         7  2022-10
Arbitrary commit formats can be used. In fact, `git shortlog`'s default
behavior (to count by commit authors) can be emulated as follows:
    $ git shortlog --group='%aN <%aE>' ...
and future patches will make the default behavior (as well as
`--committer`, and `--group=trailer:<trailer>`) special cases of the
more flexible `--group` option.
Note also that the SHORTLOG_GROUP_FORMAT enum value is used only to
designate that `--group:<format>` is in use when in stdin mode to
declare that the combination is invalid.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/git-shortlog.txt |  5 ++++-
 builtin/shortlog.c             | 41 +++++++++++++++++++++++++++++++++-
 shortlog.h                     |  2 ++
 t/t4201-shortlog.sh            | 32 ++++++++++++++++++++++++++
 4 files changed, 78 insertions(+), 2 deletions(-)
diff --git a/Documentation/git-shortlog.txt b/Documentation/git-shortlog.txt
index 9ed9d6a9e7..7d0277d033 100644
--- a/Documentation/git-shortlog.txt
+++ b/Documentation/git-shortlog.txt
@@ -50,7 +50,7 @@ OPTIONS
 --date=<format>::
 	Show dates formatted according to the given date string. (See
 	the `--date` option in the "Commit Formatting" section of
-	linkgit:git-log[1]).
+	linkgit:git-log[1]). Useful with `--group=format:<format>`.
 
 --group=<type>::
 	Group commits based on `<type>`. If no `--group` option is
@@ -64,6 +64,9 @@ OPTIONS
    example, if your project uses `Reviewed-by` trailers, you might want
    to see who has been reviewing with
    `git shortlog -ns --group=trailer:reviewed-by`.
+ - `format:<format>`, any string accepted by the `--format` option of
+   'git log'. (See the "PRETTY FORMATS" section of
+   linkgit:git-log[1].)
 +
 Note that commits that do not include the trailer will not be counted.
 Likewise, commits with multiple trailers (e.g., multiple signoffs) may
diff --git a/builtin/shortlog.c b/builtin/shortlog.c
index d0645769d7..a52288acab 100644
--- a/builtin/shortlog.c
+++ b/builtin/shortlog.c
@@ -133,6 +133,8 @@ static void read_from_stdin(struct shortlog *log)
 		break;
 	case SHORTLOG_GROUP_TRAILER:
 		die(_("using %s with stdin is not supported"), "--group=trailer");
+	case SHORTLOG_GROUP_FORMAT:
+		die(_("using %s with stdin is not supported"), "--group=format");
 	default:
 		BUG("unhandled shortlog group");
 	}
@@ -203,6 +205,32 @@ static void insert_records_from_trailers(struct shortlog *log,
 	unuse_commit_buffer(commit, commit_buffer);
 }
 
+static int shortlog_needs_dedup(const struct shortlog *log)
+{
+	return log->format.nr > 1 || log->trailers.nr;
+}
+
+static void insert_records_from_format(struct shortlog *log,
+				       struct strset *dups,
+				       struct commit *commit,
+				       struct pretty_print_context *ctx,
+				       const char *oneline)
+{
+	struct strbuf buf = STRBUF_INIT;
+	struct string_list_item *item;
+
+	for_each_string_list_item(item, &log->format) {
+		strbuf_reset(&buf);
+
+		format_commit_message(commit, item->string, &buf, ctx);
+
+		if (!shortlog_needs_dedup(log) || strset_add(dups, buf.buf))
+			insert_one_record(log, buf.buf, oneline);
+	}
+
+	strbuf_release(&buf);
+}
+
 void shortlog_add_commit(struct shortlog *log, struct commit *commit)
 {
 	struct strbuf ident = STRBUF_INIT;
@@ -244,6 +272,7 @@ void shortlog_add_commit(struct shortlog *log, struct commit *commit)
 			insert_one_record(log, ident.buf, oneline_str);
 	}
 	insert_records_from_trailers(log, &dups, commit, &ctx, oneline_str);
+	insert_records_from_format(log, &dups, commit, &ctx, oneline_str);
 
 	strset_clear(&dups);
 	strbuf_release(&ident);
@@ -315,6 +344,7 @@ static int parse_group_option(const struct option *opt, const char *arg, int uns
 	if (unset) {
 		log->groups = 0;
 		string_list_clear(&log->trailers, 0);
+		string_list_clear(&log->format, 0);
 	} else if (!strcasecmp(arg, "author"))
 		log->groups |= SHORTLOG_GROUP_AUTHOR;
 	else if (!strcasecmp(arg, "committer"))
@@ -322,8 +352,15 @@ static int parse_group_option(const struct option *opt, const char *arg, int uns
 	else if (skip_prefix(arg, "trailer:", &field)) {
 		log->groups |= SHORTLOG_GROUP_TRAILER;
 		string_list_append(&log->trailers, field);
-	} else
+	} else if (skip_prefix(arg, "format:", &field)) {
+		log->groups |= SHORTLOG_GROUP_FORMAT;
+		string_list_append(&log->format, field);
+	} else if (strchr(arg, '%')) {
+		log->groups |= SHORTLOG_GROUP_FORMAT;
+		string_list_append(&log->format, arg);
+	} else {
 		return error(_("unknown group type: %s"), arg);
+	}
 
 	return 0;
 }
@@ -341,6 +378,7 @@ void shortlog_init(struct shortlog *log)
 	log->in2 = DEFAULT_INDENT2;
 	log->trailers.strdup_strings = 1;
 	log->trailers.cmp = strcasecmp;
+	log->format.strdup_strings = 1;
 }
 
 int cmd_shortlog(int argc, const char **argv, const char *prefix)
@@ -481,4 +519,5 @@ void shortlog_output(struct shortlog *log)
 	log->list.strdup_strings = 1;
 	string_list_clear(&log->list, 1);
 	clear_mailmap(&log->mailmap);
+	string_list_clear(&log->format, 0);
 }
diff --git a/shortlog.h b/shortlog.h
index dc388dd459..4850a8c30f 100644
--- a/shortlog.h
+++ b/shortlog.h
@@ -22,8 +22,10 @@ struct shortlog {
 		SHORTLOG_GROUP_AUTHOR = (1 << 0),
 		SHORTLOG_GROUP_COMMITTER = (1 << 1),
 		SHORTLOG_GROUP_TRAILER = (1 << 2),
+		SHORTLOG_GROUP_FORMAT = (1 << 3),
 	} groups;
 	struct string_list trailers;
+	struct string_list format;
 
 	int email;
 	struct string_list mailmap;
diff --git a/t/t4201-shortlog.sh b/t/t4201-shortlog.sh
index 7547da539d..8e4effebdb 100755
--- a/t/t4201-shortlog.sh
+++ b/t/t4201-shortlog.sh
@@ -244,6 +244,26 @@ test_expect_success 'shortlog --group=trailer:signed-off-by' '
 	test_cmp expect actual
 '
 
+test_expect_success 'shortlog --group=format' '
+	git shortlog -s --date="format:%Y" --group="format:%cN (%cd)" \
+		HEAD >actual &&
+	cat >expect <<-\EOF &&
+	     4	C O Mitter (2005)
+	     1	Sin Nombre (2005)
+	EOF
+	test_cmp expect actual
+'
+
+test_expect_success 'shortlog --group=<format> DWIM' '
+	git shortlog -s --date="format:%Y" --group="%cN (%cd)" HEAD >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'shortlog bogus --group' '
+	test_must_fail git shortlog --group=bogus HEAD 2>err &&
+	grep "unknown group type" err
+'
+
 test_expect_success 'trailer idents are split' '
 	cat >expect <<-\EOF &&
 	     2	C O Mitter
@@ -326,6 +346,18 @@ test_expect_success 'shortlog can match multiple groups' '
 	test_cmp expect actual
 '
 
+test_expect_success 'shortlog can match multiple format groups' '
+	GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" \
+		git commit --allow-empty -m "identical names" &&
+	test_tick &&
+	cat >expect <<-\EOF &&
+	     2	A U Thor
+	     1	C O Mitter
+	EOF
+	git shortlog -ns --group="%cn" --group="%an" -2 HEAD >actual &&
+	test_cmp expect actual
+'
+
 test_expect_success 'set up option selection tests' '
 	git commit --allow-empty -F - <<-\EOF
 	subject
-- 
2.38.0.16.g393fd4c6db
^ permalink raw reply related	[flat|nested] 72+ messages in thread
* [PATCH v3 5/7] shortlog: extract `shortlog_finish_setup()`
  2022-10-21 22:25 ` [PATCH v3 0/7] " Taylor Blau
                     ` (3 preceding siblings ...)
  2022-10-21 22:25   ` [PATCH v3 4/7] shortlog: support arbitrary commit format `--group`s Taylor Blau
@ 2022-10-21 22:25   ` Taylor Blau
  2022-10-21 22:25   ` [PATCH v3 6/7] shortlog: implement `--group=author` in terms of `--group=<format>` Taylor Blau
                     ` (2 subsequent siblings)
  7 siblings, 0 replies; 72+ messages in thread
From: Taylor Blau @ 2022-10-21 22:25 UTC (permalink / raw)
  To: git; +Cc: jacob, peff, gitster, avarab
Extract a function which finishes setting up the shortlog struct for
use. The caller in `make_cover_letter()` does not care about trailer
sorting, so it isn't strictly necessary to add a call there in this
patch.
But the next patch will add additional functionality to the new
`shortlog_finish_setup()` function, which the caller in
`make_cover_letter()` will care about.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/log.c      | 1 +
 builtin/shortlog.c | 7 ++++++-
 shortlog.h         | 1 +
 3 files changed, 8 insertions(+), 1 deletion(-)
diff --git a/builtin/log.c b/builtin/log.c
index ee19dc5d45..b4d5420217 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -1334,6 +1334,7 @@ static void make_cover_letter(struct rev_info *rev, int use_separate_file,
 	log.in2 = 4;
 	log.file = rev->diffopt.file;
 	log.groups = SHORTLOG_GROUP_AUTHOR;
+	shortlog_finish_setup(&log);
 	for (i = 0; i < nr; i++)
 		shortlog_add_commit(&log, list[i]);
 
diff --git a/builtin/shortlog.c b/builtin/shortlog.c
index a52288acab..ebf2bf3d3b 100644
--- a/builtin/shortlog.c
+++ b/builtin/shortlog.c
@@ -381,6 +381,11 @@ void shortlog_init(struct shortlog *log)
 	log->format.strdup_strings = 1;
 }
 
+void shortlog_finish_setup(struct shortlog *log)
+{
+	string_list_sort(&log->trailers);
+}
+
 int cmd_shortlog(int argc, const char **argv, const char *prefix)
 {
 	struct shortlog log = { STRING_LIST_INIT_NODUP };
@@ -450,7 +455,7 @@ int cmd_shortlog(int argc, const char **argv, const char *prefix)
 
 	if (!log.groups)
 		log.groups = SHORTLOG_GROUP_AUTHOR;
-	string_list_sort(&log.trailers);
+	shortlog_finish_setup(&log);
 
 	/* assume HEAD if from a tty */
 	if (!nongit && !rev.pending.nr && isatty(0))
diff --git a/shortlog.h b/shortlog.h
index 4850a8c30f..28d04f951a 100644
--- a/shortlog.h
+++ b/shortlog.h
@@ -33,6 +33,7 @@ struct shortlog {
 };
 
 void shortlog_init(struct shortlog *log);
+void shortlog_finish_setup(struct shortlog *log);
 
 void shortlog_add_commit(struct shortlog *log, struct commit *commit);
 
-- 
2.38.0.16.g393fd4c6db
^ permalink raw reply related	[flat|nested] 72+ messages in thread
* [PATCH v3 6/7] shortlog: implement `--group=author` in terms of `--group=<format>`
  2022-10-21 22:25 ` [PATCH v3 0/7] " Taylor Blau
                     ` (4 preceding siblings ...)
  2022-10-21 22:25   ` [PATCH v3 5/7] shortlog: extract `shortlog_finish_setup()` Taylor Blau
@ 2022-10-21 22:25   ` Taylor Blau
  2022-10-21 22:25   ` [PATCH v3 7/7] shortlog: implement `--group=committer` " Taylor Blau
  2022-10-22  0:31   ` [PATCH v3 0/7] shortlog: introduce `--group=<format>` Jeff King
  7 siblings, 0 replies; 72+ messages in thread
From: Taylor Blau @ 2022-10-21 22:25 UTC (permalink / raw)
  To: git; +Cc: jacob, peff, gitster, avarab
Instead of handling SHORTLOG_GROUP_AUTHOR separately, reimplement it as
a special case of the new `--group=<format>` mode, where the author mode
is a shorthand for `--group='%aN <%aE>'.
Note that we still need to keep the SHORTLOG_GROUP_AUTHOR enum since it
has a different meaning in `read_from_stdin()`, where it is still used
for a different purpose.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/shortlog.c | 13 ++++---------
 1 file changed, 4 insertions(+), 9 deletions(-)
diff --git a/builtin/shortlog.c b/builtin/shortlog.c
index ebf2bf3d3b..4b95820bca 100644
--- a/builtin/shortlog.c
+++ b/builtin/shortlog.c
@@ -253,15 +253,6 @@ void shortlog_add_commit(struct shortlog *log, struct commit *commit)
 	}
 	oneline_str = oneline.len ? oneline.buf : "<none>";
 
-	if (log->groups & SHORTLOG_GROUP_AUTHOR) {
-		strbuf_reset(&ident);
-		format_commit_message(commit,
-				      log->email ? "%aN <%aE>" : "%aN",
-				      &ident, &ctx);
-		if (!HAS_MULTI_BITS(log->groups) ||
-		    strset_add(&dups, ident.buf))
-			insert_one_record(log, ident.buf, oneline_str);
-	}
 	if (log->groups & SHORTLOG_GROUP_COMMITTER) {
 		strbuf_reset(&ident);
 		format_commit_message(commit,
@@ -383,6 +374,10 @@ void shortlog_init(struct shortlog *log)
 
 void shortlog_finish_setup(struct shortlog *log)
 {
+	if (log->groups & SHORTLOG_GROUP_AUTHOR)
+		string_list_append(&log->format,
+				   log->email ? "%aN <%aE>" : "%aN");
+
 	string_list_sort(&log->trailers);
 }
 
-- 
2.38.0.16.g393fd4c6db
^ permalink raw reply related	[flat|nested] 72+ messages in thread
* [PATCH v3 7/7] shortlog: implement `--group=committer` in terms of `--group=<format>`
  2022-10-21 22:25 ` [PATCH v3 0/7] " Taylor Blau
                     ` (5 preceding siblings ...)
  2022-10-21 22:25   ` [PATCH v3 6/7] shortlog: implement `--group=author` in terms of `--group=<format>` Taylor Blau
@ 2022-10-21 22:25   ` Taylor Blau
  2022-10-22  0:31   ` [PATCH v3 0/7] shortlog: introduce `--group=<format>` Jeff King
  7 siblings, 0 replies; 72+ messages in thread
From: Taylor Blau @ 2022-10-21 22:25 UTC (permalink / raw)
  To: git; +Cc: jacob, peff, gitster, avarab
In the same spirit as the previous commit, reimplement
`--group=committer` as a special case of `--group=<format>`, too.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/shortlog.c | 14 +++-----------
 1 file changed, 3 insertions(+), 11 deletions(-)
diff --git a/builtin/shortlog.c b/builtin/shortlog.c
index 4b95820bca..27b057940d 100644
--- a/builtin/shortlog.c
+++ b/builtin/shortlog.c
@@ -233,7 +233,6 @@ static void insert_records_from_format(struct shortlog *log,
 
 void shortlog_add_commit(struct shortlog *log, struct commit *commit)
 {
-	struct strbuf ident = STRBUF_INIT;
 	struct strbuf oneline = STRBUF_INIT;
 	struct strset dups = STRSET_INIT;
 	struct pretty_print_context ctx = {0};
@@ -253,20 +252,10 @@ void shortlog_add_commit(struct shortlog *log, struct commit *commit)
 	}
 	oneline_str = oneline.len ? oneline.buf : "<none>";
 
-	if (log->groups & SHORTLOG_GROUP_COMMITTER) {
-		strbuf_reset(&ident);
-		format_commit_message(commit,
-				      log->email ? "%cN <%cE>" : "%cN",
-				      &ident, &ctx);
-		if (!HAS_MULTI_BITS(log->groups) ||
-		    strset_add(&dups, ident.buf))
-			insert_one_record(log, ident.buf, oneline_str);
-	}
 	insert_records_from_trailers(log, &dups, commit, &ctx, oneline_str);
 	insert_records_from_format(log, &dups, commit, &ctx, oneline_str);
 
 	strset_clear(&dups);
-	strbuf_release(&ident);
 	strbuf_release(&oneline);
 }
 
@@ -377,6 +366,9 @@ void shortlog_finish_setup(struct shortlog *log)
 	if (log->groups & SHORTLOG_GROUP_AUTHOR)
 		string_list_append(&log->format,
 				   log->email ? "%aN <%aE>" : "%aN");
+	if (log->groups & SHORTLOG_GROUP_COMMITTER)
+		string_list_append(&log->format,
+				   log->email ? "%cN <%cE>" : "%cN");
 
 	string_list_sort(&log->trailers);
 }
-- 
2.38.0.16.g393fd4c6db
^ permalink raw reply related	[flat|nested] 72+ messages in thread
* Re: [PATCH v3 4/7] shortlog: support arbitrary commit format `--group`s
  2022-10-21 22:25   ` [PATCH v3 4/7] shortlog: support arbitrary commit format `--group`s Taylor Blau
@ 2022-10-22  0:28     ` Jeff King
  0 siblings, 0 replies; 72+ messages in thread
From: Jeff King @ 2022-10-22  0:28 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, jacob, gitster, avarab
On Fri, Oct 21, 2022 at 06:25:49PM -0400, Taylor Blau wrote:
> @@ -203,6 +205,32 @@ static void insert_records_from_trailers(struct shortlog *log,
>  	unuse_commit_buffer(commit, commit_buffer);
>  }
>  
> +static int shortlog_needs_dedup(const struct shortlog *log)
> +{
> +	return log->format.nr > 1 || log->trailers.nr;
> +}
So this is obviously much nicer to read. But I think there might be a
weird logic error during the series. Since we haven't yet folded
SHORTLOG_GROUP_AUTHOR, etc, into the "format" code, I think at this
point it still needs:
  HAS_MULTI_BITS(log->groups)
at the beginning of the OR-chain. Otherwise "shortlog --group=author
--format=%an" would fail to dedup, I think.
It's hard to care too much, since the end result of the series is
correct, and you'd end up just removing that part of the line in the
final patch. So I could go either way on re-rolling.
Sorry for not catching this in the last round.
-Peff
^ permalink raw reply	[flat|nested] 72+ messages in thread
* Re: [PATCH v3 0/7] shortlog: introduce `--group=<format>`
  2022-10-21 22:25 ` [PATCH v3 0/7] " Taylor Blau
                     ` (6 preceding siblings ...)
  2022-10-21 22:25   ` [PATCH v3 7/7] shortlog: implement `--group=committer` " Taylor Blau
@ 2022-10-22  0:31   ` Jeff King
  7 siblings, 0 replies; 72+ messages in thread
From: Jeff King @ 2022-10-22  0:31 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, jacob, gitster, avarab
On Fri, Oct 21, 2022 at 06:25:37PM -0400, Taylor Blau wrote:
> Here is a (likely final) reroll of my series to implement arbitrary
> pretty formats as shortlog `--group`'s, based on a suggestion from Jacob
> Stopak.
> 
> The changes are cosmetic, based on the latest round of review from Peff.
> There aren't any substantive changes from the last round.
I looked over the range-diff and gave another careful read of patch 4.
It mostly looks good, though I did find an oddity in patch 4. It's a
really subtle breakage that goes away by the end, so I'm not entirely
sure it's worth dealing with.
So I'd be happy enough with this version, or if you want to do a final
re-roll that addresses that, I'm happy to review it.
-Peff
^ permalink raw reply	[flat|nested] 72+ messages in thread
* [PATCH 0/7] shortlog: introduce `--group=<format>`
  2022-10-11  0:33 [PATCH 0/7] shortlog: introduce `--group=<format>` Taylor Blau
                   ` (8 preceding siblings ...)
  2022-10-21 22:25 ` [PATCH v3 0/7] " Taylor Blau
@ 2022-10-24 18:55 ` Taylor Blau
  2022-10-24 18:55   ` [PATCH 1/7] shortlog: accept `--date`-related options Taylor Blau
                     ` (8 more replies)
  9 siblings, 9 replies; 72+ messages in thread
From: Taylor Blau @ 2022-10-24 18:55 UTC (permalink / raw)
  To: git
  Cc: Jacob Stopak, Jeff King, Junio C Hamano,
	Ævar Arnfjörð Bjarmason
Here is one more very tiny reroll of my series to implement arbitrary
pretty formats as shortlog `--group`'s, based on a suggestion from Jacob
Stopak.
There are only a couple of changes from last time: one to rebase onto
the current tip of 'master', and another to address a bug in 4/7 (which
was resolved by the end of the series, but now works consistently
throughout the series).
This was pointed out by Peff in [1], and he indicated there:
> It's hard to care too much, since the end result of the series is
> correct, and you'd end up just removing that part of the line in
> the final patch. So I could go either way on re-rolling.
...but not re-rolling would be somewhat unsatisfying. So here is a
reroll that I think should be good to go.
Thanks in advance for your review.
[1]: https://lore.kernel.org/git/Y1M5OH%2FFiWNaKO6p@coredump.intra.peff.net/
Jeff King (1):
  shortlog: accept `--date`-related options
Taylor Blau (6):
  shortlog: make trailer insertion a noop when appropriate
  shortlog: extract `--group` fragment for translation
  shortlog: support arbitrary commit format `--group`s
  shortlog: extract `shortlog_finish_setup()`
  shortlog: implement `--group=author` in terms of `--group=<format>`
  shortlog: implement `--group=committer` in terms of `--group=<format>`
 Documentation/git-shortlog.txt |  8 ++++
 builtin/log.c                  |  1 +
 builtin/shortlog.c             | 87 +++++++++++++++++++++++-----------
 shortlog.h                     |  5 ++
 t/t4201-shortlog.sh            | 39 +++++++++++++++
 5 files changed, 113 insertions(+), 27 deletions(-)
-- 
2.38.0.16.g393fd4c6db
^ permalink raw reply	[flat|nested] 72+ messages in thread
* [PATCH 1/7] shortlog: accept `--date`-related options
  2022-10-24 18:55 ` [PATCH " Taylor Blau
@ 2022-10-24 18:55   ` Taylor Blau
  2022-10-24 18:55   ` [PATCH 2/7] shortlog: make trailer insertion a noop when appropriate Taylor Blau
                     ` (7 subsequent siblings)
  8 siblings, 0 replies; 72+ messages in thread
From: Taylor Blau @ 2022-10-24 18:55 UTC (permalink / raw)
  To: git
  Cc: Jacob Stopak, Jeff King, Junio C Hamano,
	Ævar Arnfjörð Bjarmason
From: Jeff King <peff@peff.net>
Prepare for a future patch which will introduce arbitrary pretty formats
via the `--group` argument.
To allow additional customizability (for example, to support something
like `git shortlog -s --group='%aD' --date='format:%Y-%m' ...` (which
groups commits by the datestring 'YYYY-mm' according to author date), we
must store off the `--date` parsed from calling `parse_revision_opt()`.
Note that this also affects custom output `--format` strings in `git
shortlog`. Though this is a behavior change, this is arguably fixing a
long-standing bug (ie., that `--format` strings are not affected by
`--date` specifiers as they should be).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/git-shortlog.txt | 5 +++++
 builtin/shortlog.c             | 3 ++-
 shortlog.h                     | 2 ++
 t/t4201-shortlog.sh            | 7 +++++++
 4 files changed, 16 insertions(+), 1 deletion(-)
diff --git a/Documentation/git-shortlog.txt b/Documentation/git-shortlog.txt
index f64e77047b..9ed9d6a9e7 100644
--- a/Documentation/git-shortlog.txt
+++ b/Documentation/git-shortlog.txt
@@ -47,6 +47,11 @@ OPTIONS
 
 	Each pretty-printed commit will be rewrapped before it is shown.
 
+--date=<format>::
+	Show dates formatted according to the given date string. (See
+	the `--date` option in the "Commit Formatting" section of
+	linkgit:git-log[1]).
+
 --group=<type>::
 	Group commits based on `<type>`. If no `--group` option is
 	specified, the default is `author`. `<type>` is one of:
diff --git a/builtin/shortlog.c b/builtin/shortlog.c
index 7a1e1fe7c0..53c379a51d 100644
--- a/builtin/shortlog.c
+++ b/builtin/shortlog.c
@@ -211,7 +211,7 @@ void shortlog_add_commit(struct shortlog *log, struct commit *commit)
 	ctx.fmt = CMIT_FMT_USERFORMAT;
 	ctx.abbrev = log->abbrev;
 	ctx.print_email_subject = 1;
-	ctx.date_mode.type = DATE_NORMAL;
+	ctx.date_mode = log->date_mode;
 	ctx.output_encoding = get_log_output_encoding();
 
 	if (!log->summary) {
@@ -407,6 +407,7 @@ int cmd_shortlog(int argc, const char **argv, const char *prefix)
 	log.user_format = rev.commit_format == CMIT_FMT_USERFORMAT;
 	log.abbrev = rev.abbrev;
 	log.file = rev.diffopt.file;
+	log.date_mode = rev.date_mode;
 
 	if (!log.groups)
 		log.groups = SHORTLOG_GROUP_AUTHOR;
diff --git a/shortlog.h b/shortlog.h
index 3f7e9aabca..dc388dd459 100644
--- a/shortlog.h
+++ b/shortlog.h
@@ -2,6 +2,7 @@
 #define SHORTLOG_H
 
 #include "string-list.h"
+#include "date.h"
 
 struct commit;
 
@@ -15,6 +16,7 @@ struct shortlog {
 	int in2;
 	int user_format;
 	int abbrev;
+	struct date_mode date_mode;
 
 	enum {
 		SHORTLOG_GROUP_AUTHOR = (1 << 0),
diff --git a/t/t4201-shortlog.sh b/t/t4201-shortlog.sh
index 3095b1b2ff..7547da539d 100755
--- a/t/t4201-shortlog.sh
+++ b/t/t4201-shortlog.sh
@@ -83,6 +83,13 @@ test_expect_success 'pretty format' '
 	test_cmp expect log.predictable
 '
 
+test_expect_success 'pretty format (with --date)' '
+	sed "s/SUBJECT/2005-04-07 OBJECT_NAME/" expect.template >expect &&
+	git shortlog --format="%ad %H" --date=short HEAD >log &&
+	fuzz log >log.predictable &&
+	test_cmp expect log.predictable
+'
+
 test_expect_success '--abbrev' '
 	sed s/SUBJECT/OBJID/ expect.template >expect &&
 	git shortlog --format="%h" --abbrev=35 HEAD >log &&
-- 
2.38.0.16.g393fd4c6db
^ permalink raw reply related	[flat|nested] 72+ messages in thread
* [PATCH 2/7] shortlog: make trailer insertion a noop when appropriate
  2022-10-24 18:55 ` [PATCH " Taylor Blau
  2022-10-24 18:55   ` [PATCH 1/7] shortlog: accept `--date`-related options Taylor Blau
@ 2022-10-24 18:55   ` Taylor Blau
  2022-10-24 18:55   ` [PATCH 3/7] shortlog: extract `--group` fragment for translation Taylor Blau
                     ` (6 subsequent siblings)
  8 siblings, 0 replies; 72+ messages in thread
From: Taylor Blau @ 2022-10-24 18:55 UTC (permalink / raw)
  To: git
  Cc: Jacob Stopak, Jeff King, Junio C Hamano,
	Ævar Arnfjörð Bjarmason
When there are no trailers to insert, it is natural that
insert_records_from_trailers() should return without having done any
work.
But instead we guard this call unnecessarily by first checking whether
`log->groups` has the `SHORTLOG_GROUP_TRAILER` bit set.
Prepare to match a similar pattern in the future where a function which
inserts records of a certain type does no work when no specifiers
matching that type are given.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/shortlog.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/builtin/shortlog.c b/builtin/shortlog.c
index 53c379a51d..18f0800c82 100644
--- a/builtin/shortlog.c
+++ b/builtin/shortlog.c
@@ -170,6 +170,9 @@ static void insert_records_from_trailers(struct shortlog *log,
 	const char *commit_buffer, *body;
 	struct strbuf ident = STRBUF_INIT;
 
+	if (!log->trailers.nr)
+		return;
+
 	/*
 	 * Using format_commit_message("%B") would be simpler here, but
 	 * this saves us copying the message.
@@ -240,9 +243,7 @@ void shortlog_add_commit(struct shortlog *log, struct commit *commit)
 		    strset_add(&dups, ident.buf))
 			insert_one_record(log, ident.buf, oneline_str);
 	}
-	if (log->groups & SHORTLOG_GROUP_TRAILER) {
-		insert_records_from_trailers(log, &dups, commit, &ctx, oneline_str);
-	}
+	insert_records_from_trailers(log, &dups, commit, &ctx, oneline_str);
 
 	strset_clear(&dups);
 	strbuf_release(&ident);
-- 
2.38.0.16.g393fd4c6db
^ permalink raw reply related	[flat|nested] 72+ messages in thread
* [PATCH 3/7] shortlog: extract `--group` fragment for translation
  2022-10-24 18:55 ` [PATCH " Taylor Blau
  2022-10-24 18:55   ` [PATCH 1/7] shortlog: accept `--date`-related options Taylor Blau
  2022-10-24 18:55   ` [PATCH 2/7] shortlog: make trailer insertion a noop when appropriate Taylor Blau
@ 2022-10-24 18:55   ` Taylor Blau
  2022-10-24 18:55   ` [PATCH 4/7] shortlog: support arbitrary commit format `--group`s Taylor Blau
                     ` (5 subsequent siblings)
  8 siblings, 0 replies; 72+ messages in thread
From: Taylor Blau @ 2022-10-24 18:55 UTC (permalink / raw)
  To: git
  Cc: Jacob Stopak, Jeff King, Junio C Hamano,
	Ævar Arnfjörð Bjarmason
The subsequent commit will add another unhandled case in
`read_from_stdin()` which will want to use the same message as with
`--group=trailer`.
Extract the "--group=trailer" part from this message so the same
translation key can be used for both cases.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/shortlog.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/builtin/shortlog.c b/builtin/shortlog.c
index 18f0800c82..d0645769d7 100644
--- a/builtin/shortlog.c
+++ b/builtin/shortlog.c
@@ -132,7 +132,7 @@ static void read_from_stdin(struct shortlog *log)
 		match = committer_match;
 		break;
 	case SHORTLOG_GROUP_TRAILER:
-		die(_("using --group=trailer with stdin is not supported"));
+		die(_("using %s with stdin is not supported"), "--group=trailer");
 	default:
 		BUG("unhandled shortlog group");
 	}
-- 
2.38.0.16.g393fd4c6db
^ permalink raw reply related	[flat|nested] 72+ messages in thread
* [PATCH 4/7] shortlog: support arbitrary commit format `--group`s
  2022-10-24 18:55 ` [PATCH " Taylor Blau
                     ` (2 preceding siblings ...)
  2022-10-24 18:55   ` [PATCH 3/7] shortlog: extract `--group` fragment for translation Taylor Blau
@ 2022-10-24 18:55   ` Taylor Blau
  2022-10-24 18:55   ` [PATCH 5/7] shortlog: extract `shortlog_finish_setup()` Taylor Blau
                     ` (4 subsequent siblings)
  8 siblings, 0 replies; 72+ messages in thread
From: Taylor Blau @ 2022-10-24 18:55 UTC (permalink / raw)
  To: git
  Cc: Jacob Stopak, Jeff King, Junio C Hamano,
	Ævar Arnfjörð Bjarmason
In addition to generating a shortlog based on committer, author, or the
identity in one or more specified trailers, it can be useful to generate
a shortlog based on an arbitrary commit format.
This can be used, for example, to generate a distribution of commit
activity over time, like so:
    $ git shortlog --group='%cd' --date='format:%Y-%m' -s v2.37.0..
       117  2022-06
       274  2022-07
       324  2022-08
       263  2022-09
         7  2022-10
Arbitrary commit formats can be used. In fact, `git shortlog`'s default
behavior (to count by commit authors) can be emulated as follows:
    $ git shortlog --group='%aN <%aE>' ...
and future patches will make the default behavior (as well as
`--committer`, and `--group=trailer:<trailer>`) special cases of the
more flexible `--group` option.
Note also that the SHORTLOG_GROUP_FORMAT enum value is used only to
designate that `--group:<format>` is in use when in stdin mode to
declare that the combination is invalid.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/git-shortlog.txt |  5 ++++-
 builtin/shortlog.c             | 41 +++++++++++++++++++++++++++++++++-
 shortlog.h                     |  2 ++
 t/t4201-shortlog.sh            | 32 ++++++++++++++++++++++++++
 4 files changed, 78 insertions(+), 2 deletions(-)
diff --git a/Documentation/git-shortlog.txt b/Documentation/git-shortlog.txt
index 9ed9d6a9e7..7d0277d033 100644
--- a/Documentation/git-shortlog.txt
+++ b/Documentation/git-shortlog.txt
@@ -50,7 +50,7 @@ OPTIONS
 --date=<format>::
 	Show dates formatted according to the given date string. (See
 	the `--date` option in the "Commit Formatting" section of
-	linkgit:git-log[1]).
+	linkgit:git-log[1]). Useful with `--group=format:<format>`.
 
 --group=<type>::
 	Group commits based on `<type>`. If no `--group` option is
@@ -64,6 +64,9 @@ OPTIONS
    example, if your project uses `Reviewed-by` trailers, you might want
    to see who has been reviewing with
    `git shortlog -ns --group=trailer:reviewed-by`.
+ - `format:<format>`, any string accepted by the `--format` option of
+   'git log'. (See the "PRETTY FORMATS" section of
+   linkgit:git-log[1].)
 +
 Note that commits that do not include the trailer will not be counted.
 Likewise, commits with multiple trailers (e.g., multiple signoffs) may
diff --git a/builtin/shortlog.c b/builtin/shortlog.c
index d0645769d7..f3b237c5ff 100644
--- a/builtin/shortlog.c
+++ b/builtin/shortlog.c
@@ -133,6 +133,8 @@ static void read_from_stdin(struct shortlog *log)
 		break;
 	case SHORTLOG_GROUP_TRAILER:
 		die(_("using %s with stdin is not supported"), "--group=trailer");
+	case SHORTLOG_GROUP_FORMAT:
+		die(_("using %s with stdin is not supported"), "--group=format");
 	default:
 		BUG("unhandled shortlog group");
 	}
@@ -203,6 +205,32 @@ static void insert_records_from_trailers(struct shortlog *log,
 	unuse_commit_buffer(commit, commit_buffer);
 }
 
+static int shortlog_needs_dedup(const struct shortlog *log)
+{
+	return HAS_MULTI_BITS(log->groups) || log->format.nr > 1 || log->trailers.nr;
+}
+
+static void insert_records_from_format(struct shortlog *log,
+				       struct strset *dups,
+				       struct commit *commit,
+				       struct pretty_print_context *ctx,
+				       const char *oneline)
+{
+	struct strbuf buf = STRBUF_INIT;
+	struct string_list_item *item;
+
+	for_each_string_list_item(item, &log->format) {
+		strbuf_reset(&buf);
+
+		format_commit_message(commit, item->string, &buf, ctx);
+
+		if (!shortlog_needs_dedup(log) || strset_add(dups, buf.buf))
+			insert_one_record(log, buf.buf, oneline);
+	}
+
+	strbuf_release(&buf);
+}
+
 void shortlog_add_commit(struct shortlog *log, struct commit *commit)
 {
 	struct strbuf ident = STRBUF_INIT;
@@ -244,6 +272,7 @@ void shortlog_add_commit(struct shortlog *log, struct commit *commit)
 			insert_one_record(log, ident.buf, oneline_str);
 	}
 	insert_records_from_trailers(log, &dups, commit, &ctx, oneline_str);
+	insert_records_from_format(log, &dups, commit, &ctx, oneline_str);
 
 	strset_clear(&dups);
 	strbuf_release(&ident);
@@ -315,6 +344,7 @@ static int parse_group_option(const struct option *opt, const char *arg, int uns
 	if (unset) {
 		log->groups = 0;
 		string_list_clear(&log->trailers, 0);
+		string_list_clear(&log->format, 0);
 	} else if (!strcasecmp(arg, "author"))
 		log->groups |= SHORTLOG_GROUP_AUTHOR;
 	else if (!strcasecmp(arg, "committer"))
@@ -322,8 +352,15 @@ static int parse_group_option(const struct option *opt, const char *arg, int uns
 	else if (skip_prefix(arg, "trailer:", &field)) {
 		log->groups |= SHORTLOG_GROUP_TRAILER;
 		string_list_append(&log->trailers, field);
-	} else
+	} else if (skip_prefix(arg, "format:", &field)) {
+		log->groups |= SHORTLOG_GROUP_FORMAT;
+		string_list_append(&log->format, field);
+	} else if (strchr(arg, '%')) {
+		log->groups |= SHORTLOG_GROUP_FORMAT;
+		string_list_append(&log->format, arg);
+	} else {
 		return error(_("unknown group type: %s"), arg);
+	}
 
 	return 0;
 }
@@ -341,6 +378,7 @@ void shortlog_init(struct shortlog *log)
 	log->in2 = DEFAULT_INDENT2;
 	log->trailers.strdup_strings = 1;
 	log->trailers.cmp = strcasecmp;
+	log->format.strdup_strings = 1;
 }
 
 int cmd_shortlog(int argc, const char **argv, const char *prefix)
@@ -481,4 +519,5 @@ void shortlog_output(struct shortlog *log)
 	log->list.strdup_strings = 1;
 	string_list_clear(&log->list, 1);
 	clear_mailmap(&log->mailmap);
+	string_list_clear(&log->format, 0);
 }
diff --git a/shortlog.h b/shortlog.h
index dc388dd459..4850a8c30f 100644
--- a/shortlog.h
+++ b/shortlog.h
@@ -22,8 +22,10 @@ struct shortlog {
 		SHORTLOG_GROUP_AUTHOR = (1 << 0),
 		SHORTLOG_GROUP_COMMITTER = (1 << 1),
 		SHORTLOG_GROUP_TRAILER = (1 << 2),
+		SHORTLOG_GROUP_FORMAT = (1 << 3),
 	} groups;
 	struct string_list trailers;
+	struct string_list format;
 
 	int email;
 	struct string_list mailmap;
diff --git a/t/t4201-shortlog.sh b/t/t4201-shortlog.sh
index 7547da539d..8e4effebdb 100755
--- a/t/t4201-shortlog.sh
+++ b/t/t4201-shortlog.sh
@@ -244,6 +244,26 @@ test_expect_success 'shortlog --group=trailer:signed-off-by' '
 	test_cmp expect actual
 '
 
+test_expect_success 'shortlog --group=format' '
+	git shortlog -s --date="format:%Y" --group="format:%cN (%cd)" \
+		HEAD >actual &&
+	cat >expect <<-\EOF &&
+	     4	C O Mitter (2005)
+	     1	Sin Nombre (2005)
+	EOF
+	test_cmp expect actual
+'
+
+test_expect_success 'shortlog --group=<format> DWIM' '
+	git shortlog -s --date="format:%Y" --group="%cN (%cd)" HEAD >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'shortlog bogus --group' '
+	test_must_fail git shortlog --group=bogus HEAD 2>err &&
+	grep "unknown group type" err
+'
+
 test_expect_success 'trailer idents are split' '
 	cat >expect <<-\EOF &&
 	     2	C O Mitter
@@ -326,6 +346,18 @@ test_expect_success 'shortlog can match multiple groups' '
 	test_cmp expect actual
 '
 
+test_expect_success 'shortlog can match multiple format groups' '
+	GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" \
+		git commit --allow-empty -m "identical names" &&
+	test_tick &&
+	cat >expect <<-\EOF &&
+	     2	A U Thor
+	     1	C O Mitter
+	EOF
+	git shortlog -ns --group="%cn" --group="%an" -2 HEAD >actual &&
+	test_cmp expect actual
+'
+
 test_expect_success 'set up option selection tests' '
 	git commit --allow-empty -F - <<-\EOF
 	subject
-- 
2.38.0.16.g393fd4c6db
^ permalink raw reply related	[flat|nested] 72+ messages in thread
* [PATCH 5/7] shortlog: extract `shortlog_finish_setup()`
  2022-10-24 18:55 ` [PATCH " Taylor Blau
                     ` (3 preceding siblings ...)
  2022-10-24 18:55   ` [PATCH 4/7] shortlog: support arbitrary commit format `--group`s Taylor Blau
@ 2022-10-24 18:55   ` Taylor Blau
  2022-10-24 18:55   ` [PATCH 6/7] shortlog: implement `--group=author` in terms of `--group=<format>` Taylor Blau
                     ` (3 subsequent siblings)
  8 siblings, 0 replies; 72+ messages in thread
From: Taylor Blau @ 2022-10-24 18:55 UTC (permalink / raw)
  To: git
  Cc: Jacob Stopak, Jeff King, Junio C Hamano,
	Ævar Arnfjörð Bjarmason
Extract a function which finishes setting up the shortlog struct for
use. The caller in `make_cover_letter()` does not care about trailer
sorting, so it isn't strictly necessary to add a call there in this
patch.
But the next patch will add additional functionality to the new
`shortlog_finish_setup()` function, which the caller in
`make_cover_letter()` will care about.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/log.c      | 1 +
 builtin/shortlog.c | 7 ++++++-
 shortlog.h         | 1 +
 3 files changed, 8 insertions(+), 1 deletion(-)
diff --git a/builtin/log.c b/builtin/log.c
index ee19dc5d45..b4d5420217 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -1334,6 +1334,7 @@ static void make_cover_letter(struct rev_info *rev, int use_separate_file,
 	log.in2 = 4;
 	log.file = rev->diffopt.file;
 	log.groups = SHORTLOG_GROUP_AUTHOR;
+	shortlog_finish_setup(&log);
 	for (i = 0; i < nr; i++)
 		shortlog_add_commit(&log, list[i]);
 
diff --git a/builtin/shortlog.c b/builtin/shortlog.c
index f3b237c5ff..808bae9baa 100644
--- a/builtin/shortlog.c
+++ b/builtin/shortlog.c
@@ -381,6 +381,11 @@ void shortlog_init(struct shortlog *log)
 	log->format.strdup_strings = 1;
 }
 
+void shortlog_finish_setup(struct shortlog *log)
+{
+	string_list_sort(&log->trailers);
+}
+
 int cmd_shortlog(int argc, const char **argv, const char *prefix)
 {
 	struct shortlog log = { STRING_LIST_INIT_NODUP };
@@ -450,7 +455,7 @@ int cmd_shortlog(int argc, const char **argv, const char *prefix)
 
 	if (!log.groups)
 		log.groups = SHORTLOG_GROUP_AUTHOR;
-	string_list_sort(&log.trailers);
+	shortlog_finish_setup(&log);
 
 	/* assume HEAD if from a tty */
 	if (!nongit && !rev.pending.nr && isatty(0))
diff --git a/shortlog.h b/shortlog.h
index 4850a8c30f..28d04f951a 100644
--- a/shortlog.h
+++ b/shortlog.h
@@ -33,6 +33,7 @@ struct shortlog {
 };
 
 void shortlog_init(struct shortlog *log);
+void shortlog_finish_setup(struct shortlog *log);
 
 void shortlog_add_commit(struct shortlog *log, struct commit *commit);
 
-- 
2.38.0.16.g393fd4c6db
^ permalink raw reply related	[flat|nested] 72+ messages in thread
* [PATCH 6/7] shortlog: implement `--group=author` in terms of `--group=<format>`
  2022-10-24 18:55 ` [PATCH " Taylor Blau
                     ` (4 preceding siblings ...)
  2022-10-24 18:55   ` [PATCH 5/7] shortlog: extract `shortlog_finish_setup()` Taylor Blau
@ 2022-10-24 18:55   ` Taylor Blau
  2022-10-24 18:55   ` [PATCH 7/7] shortlog: implement `--group=committer` " Taylor Blau
                     ` (2 subsequent siblings)
  8 siblings, 0 replies; 72+ messages in thread
From: Taylor Blau @ 2022-10-24 18:55 UTC (permalink / raw)
  To: git
  Cc: Jacob Stopak, Jeff King, Junio C Hamano,
	Ævar Arnfjörð Bjarmason
Instead of handling SHORTLOG_GROUP_AUTHOR separately, reimplement it as
a special case of the new `--group=<format>` mode, where the author mode
is a shorthand for `--group='%aN <%aE>'.
Note that we still need to keep the SHORTLOG_GROUP_AUTHOR enum since it
has a different meaning in `read_from_stdin()`, where it is still used
for a different purpose.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/shortlog.c | 13 ++++---------
 1 file changed, 4 insertions(+), 9 deletions(-)
diff --git a/builtin/shortlog.c b/builtin/shortlog.c
index 808bae9baa..f6032c6328 100644
--- a/builtin/shortlog.c
+++ b/builtin/shortlog.c
@@ -253,15 +253,6 @@ void shortlog_add_commit(struct shortlog *log, struct commit *commit)
 	}
 	oneline_str = oneline.len ? oneline.buf : "<none>";
 
-	if (log->groups & SHORTLOG_GROUP_AUTHOR) {
-		strbuf_reset(&ident);
-		format_commit_message(commit,
-				      log->email ? "%aN <%aE>" : "%aN",
-				      &ident, &ctx);
-		if (!HAS_MULTI_BITS(log->groups) ||
-		    strset_add(&dups, ident.buf))
-			insert_one_record(log, ident.buf, oneline_str);
-	}
 	if (log->groups & SHORTLOG_GROUP_COMMITTER) {
 		strbuf_reset(&ident);
 		format_commit_message(commit,
@@ -383,6 +374,10 @@ void shortlog_init(struct shortlog *log)
 
 void shortlog_finish_setup(struct shortlog *log)
 {
+	if (log->groups & SHORTLOG_GROUP_AUTHOR)
+		string_list_append(&log->format,
+				   log->email ? "%aN <%aE>" : "%aN");
+
 	string_list_sort(&log->trailers);
 }
 
-- 
2.38.0.16.g393fd4c6db
^ permalink raw reply related	[flat|nested] 72+ messages in thread
* [PATCH 7/7] shortlog: implement `--group=committer` in terms of `--group=<format>`
  2022-10-24 18:55 ` [PATCH " Taylor Blau
                     ` (5 preceding siblings ...)
  2022-10-24 18:55   ` [PATCH 6/7] shortlog: implement `--group=author` in terms of `--group=<format>` Taylor Blau
@ 2022-10-24 18:55   ` Taylor Blau
  2022-10-24 21:59   ` [PATCH 0/7] shortlog: introduce `--group=<format>` Junio C Hamano
  2022-10-24 23:45   ` Jeff King
  8 siblings, 0 replies; 72+ messages in thread
From: Taylor Blau @ 2022-10-24 18:55 UTC (permalink / raw)
  To: git
  Cc: Jacob Stopak, Jeff King, Junio C Hamano,
	Ævar Arnfjörð Bjarmason
In the same spirit as the previous commit, reimplement
`--group=committer` as a special case of `--group=<format>`, too.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/shortlog.c | 14 +++-----------
 1 file changed, 3 insertions(+), 11 deletions(-)
diff --git a/builtin/shortlog.c b/builtin/shortlog.c
index f6032c6328..27a87167e1 100644
--- a/builtin/shortlog.c
+++ b/builtin/shortlog.c
@@ -233,7 +233,6 @@ static void insert_records_from_format(struct shortlog *log,
 
 void shortlog_add_commit(struct shortlog *log, struct commit *commit)
 {
-	struct strbuf ident = STRBUF_INIT;
 	struct strbuf oneline = STRBUF_INIT;
 	struct strset dups = STRSET_INIT;
 	struct pretty_print_context ctx = {0};
@@ -253,20 +252,10 @@ void shortlog_add_commit(struct shortlog *log, struct commit *commit)
 	}
 	oneline_str = oneline.len ? oneline.buf : "<none>";
 
-	if (log->groups & SHORTLOG_GROUP_COMMITTER) {
-		strbuf_reset(&ident);
-		format_commit_message(commit,
-				      log->email ? "%cN <%cE>" : "%cN",
-				      &ident, &ctx);
-		if (!HAS_MULTI_BITS(log->groups) ||
-		    strset_add(&dups, ident.buf))
-			insert_one_record(log, ident.buf, oneline_str);
-	}
 	insert_records_from_trailers(log, &dups, commit, &ctx, oneline_str);
 	insert_records_from_format(log, &dups, commit, &ctx, oneline_str);
 
 	strset_clear(&dups);
-	strbuf_release(&ident);
 	strbuf_release(&oneline);
 }
 
@@ -377,6 +366,9 @@ void shortlog_finish_setup(struct shortlog *log)
 	if (log->groups & SHORTLOG_GROUP_AUTHOR)
 		string_list_append(&log->format,
 				   log->email ? "%aN <%aE>" : "%aN");
+	if (log->groups & SHORTLOG_GROUP_COMMITTER)
+		string_list_append(&log->format,
+				   log->email ? "%cN <%cE>" : "%cN");
 
 	string_list_sort(&log->trailers);
 }
-- 
2.38.0.16.g393fd4c6db
^ permalink raw reply related	[flat|nested] 72+ messages in thread
* Re: [PATCH 0/7] shortlog: introduce `--group=<format>`
  2022-10-24 18:55 ` [PATCH " Taylor Blau
                     ` (6 preceding siblings ...)
  2022-10-24 18:55   ` [PATCH 7/7] shortlog: implement `--group=committer` " Taylor Blau
@ 2022-10-24 21:59   ` Junio C Hamano
  2022-10-24 23:45   ` Jeff King
  8 siblings, 0 replies; 72+ messages in thread
From: Junio C Hamano @ 2022-10-24 21:59 UTC (permalink / raw)
  To: Taylor Blau
  Cc: git, Jacob Stopak, Jeff King,
	Ævar Arnfjörð Bjarmason
Taylor Blau <me@ttaylorr.com> writes:
> Here is one more very tiny reroll of my series to implement arbitrary
> pretty formats as shortlog `--group`'s, based on a suggestion from Jacob
> Stopak.
>
> There are only a couple of changes from last time: one to rebase onto
> the current tip of 'master', and another to address a bug in 4/7 (which
> was resolved by the end of the series, but now works consistently
> throughout the series).
>
> This was pointed out by Peff in [1], and he indicated there:
>
>> It's hard to care too much, since the end result of the series is
>> correct, and you'd end up just removing that part of the line in
>> the final patch. So I could go either way on re-rolling.
One "huh?" is that the final patch in this round hasn't changed from
the last round, so the check for HAS_MULTI_BITS(log->groups) remains
in the final result.
IOW, after replacing the previous series with this round,
    $ git diff @{1}
    diff --git c/builtin/shortlog.c w/builtin/shortlog.c
    index 27b057940d..27a87167e1 100644
    --- c/builtin/shortlog.c
    +++ w/builtin/shortlog.c
    @@ -207,7 +207,7 @@ static void insert_records_from_trailers(struct shortlog *log,
     static int shortlog_needs_dedup(const struct shortlog *log)
     {
    -	return log->format.nr > 1 || log->trailers.nr;
    +	return HAS_MULTI_BITS(log->groups) || log->format.nr > 1 || log->trailers.nr;
     }
     static void insert_records_from_format(struct shortlog *log,
is the difference in the end result.
^ permalink raw reply	[flat|nested] 72+ messages in thread
* Re: [PATCH 0/7] shortlog: introduce `--group=<format>`
  2022-10-24 18:55 ` [PATCH " Taylor Blau
                     ` (7 preceding siblings ...)
  2022-10-24 21:59   ` [PATCH 0/7] shortlog: introduce `--group=<format>` Junio C Hamano
@ 2022-10-24 23:45   ` Jeff King
  8 siblings, 0 replies; 72+ messages in thread
From: Jeff King @ 2022-10-24 23:45 UTC (permalink / raw)
  To: Taylor Blau
  Cc: git, Jacob Stopak, Junio C Hamano,
	Ævar Arnfjörð Bjarmason
On Mon, Oct 24, 2022 at 02:55:26PM -0400, Taylor Blau wrote:
> There are only a couple of changes from last time: one to rebase onto
> the current tip of 'master', and another to address a bug in 4/7 (which
> was resolved by the end of the series, but now works consistently
> throughout the series).
> 
> This was pointed out by Peff in [1], and he indicated there:
> 
> > It's hard to care too much, since the end result of the series is
> > correct, and you'd end up just removing that part of the line in
> > the final patch. So I could go either way on re-rolling.
> 
> ...but not re-rolling would be somewhat unsatisfying. So here is a
> reroll that I think should be good to go.
OK, good. It offended my sensibilities, too, but I didn't want to force
extra work on you because of them. I'm glad you agreed. ;)
This isn't mark v4, but the range-diff from v3 is:
1:  31487229e6 = 1:  e79db4b987 shortlog: accept `--date`-related options
2:  be2c6c0f4c = 2:  81e91f7049 shortlog: make trailer insertion a noop when appropriate
3:  bd38ac66f2 = 3:  cde611e3b0 shortlog: extract `--group` fragment for translation
4:  277ffe92ce ! 4:  020a2175cb shortlog: support arbitrary commit format `--group`s
    @@ builtin/shortlog.c: static void insert_records_from_trailers(struct shortlog *lo
      
     +static int shortlog_needs_dedup(const struct shortlog *log)
     +{
    -+	return log->format.nr > 1 || log->trailers.nr;
    ++	return HAS_MULTI_BITS(log->groups) || log->format.nr > 1 || log->trailers.nr;
     +}
     +
     +static void insert_records_from_format(struct shortlog *log,
5:  6c02a2daab = 5:  13a1f1b8c8 shortlog: extract `shortlog_finish_setup()`
6:  969bdaae39 = 6:  8af036c9f8 shortlog: implement `--group=author` in terms of `--group=<format>`
7:  bad7a2bc68 = 7:  09d138353b shortlog: implement `--group=committer` in terms of `--group=<format>`
I had assumed you would drop the HAS_MULTI_BITS() part from this
function after patch 7, as we know that everything is either a format or
a trailer. But actually, I like keeping it. It future-proofs us against
adding more non-format group types (though I have trouble imagining what
those would be), and it's not like it's expensive to keep around.
So this iteration looks good to me. Thanks!
-Peff
^ permalink raw reply	[flat|nested] 72+ messages in thread
end of thread, other threads:[~2022-10-25  0:57 UTC | newest]
Thread overview: 72+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-10-11  0:33 [PATCH 0/7] shortlog: introduce `--group=<format>` Taylor Blau
2022-10-11  0:34 ` [PATCH 1/7] Documentation: extract date-options.txt Taylor Blau
2022-10-11  0:54   ` Jeff King
2022-10-11  1:02     ` Taylor Blau
2022-10-11  0:34 ` [PATCH 2/7] shortlog: accept `--date`-related options Taylor Blau
2022-10-11  0:48   ` Jeff King
2022-10-11  1:14     ` Taylor Blau
2022-10-11  0:34 ` [PATCH 3/7] shortlog: extract `--group` fragment for translation Taylor Blau
2022-10-11 10:55   ` Ævar Arnfjörð Bjarmason
2022-10-11 13:24     ` Jeff King
2022-10-11  0:34 ` [PATCH 4/7] shortlog: support arbitrary commit format `--group`s Taylor Blau
2022-10-11  1:12   ` Jeff King
2022-10-11  1:24     ` Taylor Blau
2022-10-11  1:28       ` Taylor Blau
2022-10-11  2:03         ` Jeff King
2022-10-11  2:02       ` Jeff King
2022-10-21  2:39         ` Taylor Blau
2022-10-21  5:21           ` Jeff King
2022-10-21 22:12             ` Taylor Blau
2022-10-11  0:34 ` [PATCH 5/7] shortlog: implement `--group=author` in terms of `--group=<format>` Taylor Blau
2022-10-11  1:34   ` Jeff King
2022-10-11  1:48     ` Taylor Blau
2022-10-11  2:15       ` Jeff King
2022-10-21  2:03         ` Taylor Blau
2022-10-21  2:02       ` Taylor Blau
2022-10-21  5:03         ` Jeff King
2022-10-21 22:05           ` Taylor Blau
2022-10-11 10:57   ` Ævar Arnfjörð Bjarmason
2022-10-11 11:07     ` Ævar Arnfjörð Bjarmason
2022-10-11  0:34 ` [PATCH 6/7] shortlog: implement `--group=committer` " Taylor Blau
2022-10-11  1:35   ` Jeff King
2022-10-11  0:34 ` [PATCH 7/7] shortlog: implement `--group=trailer` " Taylor Blau
2022-10-11  1:50   ` Jeff King
2022-10-11  1:57     ` Jeff King
2022-10-21  1:38     ` Taylor Blau
2022-10-21  3:11 ` [PATCH v2 0/6] shortlog: introduce `--group=<format>` Taylor Blau
2022-10-21  3:11   ` [PATCH v2 1/6] shortlog: accept `--date`-related options Taylor Blau
2022-10-21  5:32     ` Jeff King
2022-10-21 21:55       ` Taylor Blau
2022-10-21  3:11   ` [PATCH v2 2/6] shortlog: make trailer insertion a noop when appropriate Taylor Blau
2022-10-21  5:38     ` Jeff King
2022-10-21 21:57       ` Taylor Blau
2022-10-21  3:11   ` [PATCH v2 3/6] shortlog: extract `--group` fragment for translation Taylor Blau
2022-10-21  5:40     ` Jeff King
2022-10-21  3:11   ` [PATCH v2 4/6] shortlog: support arbitrary commit format `--group`s Taylor Blau
2022-10-21  5:48     ` Jeff King
2022-10-21 22:07       ` Taylor Blau
2022-10-21  3:11   ` [PATCH v2 5/6] shortlog: implement `--group=author` in terms of `--group=<format>` Taylor Blau
2022-10-21  5:50     ` Jeff King
2022-10-21  3:11   ` [PATCH v2 6/6] shortlog: implement `--group=committer` " Taylor Blau
2022-10-21  5:51     ` Jeff King
2022-10-21  5:53   ` [PATCH v2 0/6] shortlog: introduce `--group=<format>` Jeff King
2022-10-21 22:25 ` [PATCH v3 0/7] " Taylor Blau
2022-10-21 22:25   ` [PATCH v3 1/7] shortlog: accept `--date`-related options Taylor Blau
2022-10-21 22:25   ` [PATCH v3 2/7] shortlog: make trailer insertion a noop when appropriate Taylor Blau
2022-10-21 22:25   ` [PATCH v3 3/7] shortlog: extract `--group` fragment for translation Taylor Blau
2022-10-21 22:25   ` [PATCH v3 4/7] shortlog: support arbitrary commit format `--group`s Taylor Blau
2022-10-22  0:28     ` Jeff King
2022-10-21 22:25   ` [PATCH v3 5/7] shortlog: extract `shortlog_finish_setup()` Taylor Blau
2022-10-21 22:25   ` [PATCH v3 6/7] shortlog: implement `--group=author` in terms of `--group=<format>` Taylor Blau
2022-10-21 22:25   ` [PATCH v3 7/7] shortlog: implement `--group=committer` " Taylor Blau
2022-10-22  0:31   ` [PATCH v3 0/7] shortlog: introduce `--group=<format>` Jeff King
2022-10-24 18:55 ` [PATCH " Taylor Blau
2022-10-24 18:55   ` [PATCH 1/7] shortlog: accept `--date`-related options Taylor Blau
2022-10-24 18:55   ` [PATCH 2/7] shortlog: make trailer insertion a noop when appropriate Taylor Blau
2022-10-24 18:55   ` [PATCH 3/7] shortlog: extract `--group` fragment for translation Taylor Blau
2022-10-24 18:55   ` [PATCH 4/7] shortlog: support arbitrary commit format `--group`s Taylor Blau
2022-10-24 18:55   ` [PATCH 5/7] shortlog: extract `shortlog_finish_setup()` Taylor Blau
2022-10-24 18:55   ` [PATCH 6/7] shortlog: implement `--group=author` in terms of `--group=<format>` Taylor Blau
2022-10-24 18:55   ` [PATCH 7/7] shortlog: implement `--group=committer` " Taylor Blau
2022-10-24 21:59   ` [PATCH 0/7] shortlog: introduce `--group=<format>` Junio C Hamano
2022-10-24 23:45   ` Jeff King
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).