git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/4] Speed up git-notes show
@ 2024-02-05 20:49 Maarten Bosmans
  2024-02-05 20:49 ` [PATCH 1/4] notes: print note blob to stdout directly Maarten Bosmans
                   ` (5 more replies)
  0 siblings, 6 replies; 36+ messages in thread
From: Maarten Bosmans @ 2024-02-05 20:49 UTC (permalink / raw)
  To: git; +Cc: Teng Long

First time contributor here, trying my first git.git patch series.

BACKGROUND
  We have a script that runs a range of build tests for all new commits in the
  repository and adds a line to the commit note with the result from the test.
  Something along the lines of:
      occa-build-jit-gnu-cuda-develop: PASSED (<hostname>, 2024-01-01 00:00:00+01:00)
  Pretty useful to quickly check that all commits at least build, not only for
  master, but also in progress feature branches.  (a passing test suite is
  generally only required at the merge point)
    
PROBLEM
  The bash script loops over all remote refs and lists the commits newer than
  <N> days ago.  For each commit its note is read and grep'ed for an existing
  test name to see whether the build test needs to run again.  The `git note show`
  command that is in this loop nest only takes 14ms to execute, but as it is in
  a loop, those times add up.

ANALYSIS
  When asked to show a note for a specific commit, git looks up the blob hash
  for the note and executes `git show` with that hash.  That of course adds
  the child process overhead, but also causes the initialization of a lot of
  log related configuration, such as for decorations or the mailmap.  Simply
  outputting the blob directly in the main process reduces the run time by
  almost halve.

When looking through the git show implementation for useful stuff that command
does that should also be done when showing a note, I could only find the
`setup_pager()` call. All other git show functionality was related to showing
commits or other non-blob objects.

The only thing I was not 100% sure of was the textconv_object stuff.  From what
I could deduce was that this is only ever used on blobs that represent files in
a tree, not on blobs that represent note objects.  So I did not include any
textconv calls in the git notes code.

The first commit is the main one fixing performance. The other three are just
eliminating some overhead I noticed when going through the git notes code.


Maarten Bosmans (4):
  notes: print note blob to stdout directly
  notes: use exisisting function stream_blob_to_fd
  notes: do not clean up right before calling die()
  notes: use strbuf_attach to take ownership of the object contents

 builtin/notes.c | 37 ++++++++++---------------------------
 1 file changed, 10 insertions(+), 27 deletions(-)

-- 
2.35.3


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH 1/4] notes: print note blob to stdout directly
  2024-02-05 20:49 [PATCH 0/4] Speed up git-notes show Maarten Bosmans
@ 2024-02-05 20:49 ` Maarten Bosmans
  2024-02-06  3:44   ` Junio C Hamano
  2024-02-05 20:49 ` [PATCH 2/4] notes: use exisisting function stream_blob_to_fd Maarten Bosmans
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 36+ messages in thread
From: Maarten Bosmans @ 2024-02-05 20:49 UTC (permalink / raw)
  To: git; +Cc: Teng Long, Maarten Bosmans

From: Maarten Bosmans <mkbosmans@gmail.com>

From: Maarten Bosmans <maarten.bosmans@vortech.nl>

Avoid the need to launch a subprocess by calling stream_blob_to_fd
directly.  This does not only get rid of the overhead of a separate
child process, but also avoids the initalization of the whole log
machinery that `git show` does.  That is needed for example to show
decorated commits and applying the mailmap.  For simply displaying
a blob however, the only useful thing show does is enabling the pager.

Signed-off-by: Maarten Bosmans <maarten.bosmans@vortech.nl>
---
 builtin/notes.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/builtin/notes.c b/builtin/notes.c
index e65cae0bcf..8efe9809b2 100644
--- a/builtin/notes.c
+++ b/builtin/notes.c
@@ -20,7 +20,8 @@
 #include "repository.h"
 #include "pretty.h"
 #include "refs.h"
-#include "exec-cmd.h"
+#include "pager.h"
+#include "streaming.h"
 #include "run-command.h"
 #include "parse-options.h"
 #include "string-list.h"
@@ -751,7 +752,7 @@ static int show(int argc, const char **argv, const char *prefix)
 	struct notes_tree *t;
 	struct object_id object;
 	const struct object_id *note;
-	int retval;
+	int retval = 0;
 	struct option options[] = {
 		OPT_END()
 	};
@@ -776,8 +777,9 @@ static int show(int argc, const char **argv, const char *prefix)
 		retval = error(_("no note found for object %s."),
 			       oid_to_hex(&object));
 	else {
-		const char *show_args[3] = {"show", oid_to_hex(note), NULL};
-		retval = execv_git_cmd(show_args);
+		setup_pager();
+		if (stream_blob_to_fd(1, note, NULL, 0))
+			die(_("object %s is not a blob"), oid_to_hex(note));
 	}
 	free_notes(t);
 	return retval;
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 2/4] notes: use exisisting function stream_blob_to_fd
  2024-02-05 20:49 [PATCH 0/4] Speed up git-notes show Maarten Bosmans
  2024-02-05 20:49 ` [PATCH 1/4] notes: print note blob to stdout directly Maarten Bosmans
@ 2024-02-05 20:49 ` Maarten Bosmans
  2024-02-05 22:00   ` Eric Sunshine
  2024-02-05 20:49 ` [PATCH 3/4] notes: do not clean up right before calling die() Maarten Bosmans
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 36+ messages in thread
From: Maarten Bosmans @ 2024-02-05 20:49 UTC (permalink / raw)
  To: git; +Cc: Teng Long, Maarten Bosmans

From: Maarten Bosmans <mkbosmans@gmail.com>

From: Maarten Bosmans <maarten.bosmans@vortech.nl>

Use functionality from streaming.c and remove the copy_obj_to_fd()
function local to notes.c.

Streaming the blob to stdout instead of copying through an
intermediate buffer might also be more efficient, but at the
size a typical note is, this is unlikely to matter a lot.

Signed-off-by: Maarten Bosmans <maarten.bosmans@vortech.nl>
---
 builtin/notes.c | 14 +-------------
 1 file changed, 1 insertion(+), 13 deletions(-)

diff --git a/builtin/notes.c b/builtin/notes.c
index 8efe9809b2..048b8c559c 100644
--- a/builtin/notes.c
+++ b/builtin/notes.c
@@ -149,18 +149,6 @@ static int list_each_note(const struct object_id *object_oid,
 	return 0;
 }
 
-static void copy_obj_to_fd(int fd, const struct object_id *oid)
-{
-	unsigned long size;
-	enum object_type type;
-	char *buf = repo_read_object_file(the_repository, oid, &type, &size);
-	if (buf) {
-		if (size)
-			write_or_die(fd, buf, size);
-		free(buf);
-	}
-}
-
 static void write_commented_object(int fd, const struct object_id *object)
 {
 	struct child_process show = CHILD_PROCESS_INIT;
@@ -205,7 +193,7 @@ static void prepare_note_data(const struct object_id *object, struct note_data *
 		if (d->given)
 			write_or_die(fd, d->buf.buf, d->buf.len);
 		else if (old_note)
-			copy_obj_to_fd(fd, old_note);
+			stream_blob_to_fd(fd, old_note, NULL, 0);
 
 		strbuf_addch(&buf, '\n');
 		strbuf_add_commented_lines(&buf, "\n", strlen("\n"), comment_line_char);
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 3/4] notes: do not clean up right before calling die()
  2024-02-05 20:49 [PATCH 0/4] Speed up git-notes show Maarten Bosmans
  2024-02-05 20:49 ` [PATCH 1/4] notes: print note blob to stdout directly Maarten Bosmans
  2024-02-05 20:49 ` [PATCH 2/4] notes: use exisisting function stream_blob_to_fd Maarten Bosmans
@ 2024-02-05 20:49 ` Maarten Bosmans
  2024-02-05 20:49 ` [PATCH 4/4] notes: use strbuf_attach to take ownership of the object contents Maarten Bosmans
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 36+ messages in thread
From: Maarten Bosmans @ 2024-02-05 20:49 UTC (permalink / raw)
  To: git; +Cc: Teng Long, Maarten Bosmans

From: Maarten Bosmans <mkbosmans@gmail.com>

From: Maarten Bosmans <maarten.bosmans@vortech.nl>

For consistency with the other uses of die() in the same function.

Signed-off-by: Maarten Bosmans <maarten.bosmans@vortech.nl>
---
 builtin/notes.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/builtin/notes.c b/builtin/notes.c
index 048b8c559c..0e98a820dd 100644
--- a/builtin/notes.c
+++ b/builtin/notes.c
@@ -310,12 +310,8 @@ static int parse_reuse_arg(const struct option *opt, const char *arg, int unset)
 		die(_("failed to resolve '%s' as a valid ref."), arg);
 	if (!(value = repo_read_object_file(the_repository, &object, &type, &len)))
 		die(_("failed to read object '%s'."), arg);
-	if (type != OBJ_BLOB) {
-		strbuf_release(&msg->buf);
-		free(value);
-		free(msg);
+	if (type != OBJ_BLOB)
 		die(_("cannot read note data from non-blob object '%s'."), arg);
-	}
 
 	strbuf_add(&msg->buf, value, len);
 	free(value);
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 4/4] notes: use strbuf_attach to take ownership of the object contents
  2024-02-05 20:49 [PATCH 0/4] Speed up git-notes show Maarten Bosmans
                   ` (2 preceding siblings ...)
  2024-02-05 20:49 ` [PATCH 3/4] notes: do not clean up right before calling die() Maarten Bosmans
@ 2024-02-05 20:49 ` Maarten Bosmans
  2024-02-06  7:08 ` [PATCH 0/4] Speed up git-notes show Kristoffer Haugsbakk
  2024-02-18 19:59 ` [PATCH v2 0/5] " Maarten Bosmans
  5 siblings, 0 replies; 36+ messages in thread
From: Maarten Bosmans @ 2024-02-05 20:49 UTC (permalink / raw)
  To: git; +Cc: Teng Long, Maarten Bosmans

From: Maarten Bosmans <mkbosmans@gmail.com>

From: Maarten Bosmans <maarten.bosmans@vortech.nl>

Avoid an extra allocation in the strbuf when pushing the string into it.

Signed-off-by: Maarten Bosmans <maarten.bosmans@vortech.nl>
---
 builtin/notes.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/builtin/notes.c b/builtin/notes.c
index 0e98a820dd..71f36667f3 100644
--- a/builtin/notes.c
+++ b/builtin/notes.c
@@ -313,10 +313,8 @@ static int parse_reuse_arg(const struct option *opt, const char *arg, int unset)
 	if (type != OBJ_BLOB)
 		die(_("cannot read note data from non-blob object '%s'."), arg);
 
-	strbuf_add(&msg->buf, value, len);
-	free(value);
+	strbuf_attach(&msg->buf, value, len, len + 1);
 
-	msg->buf.len = len;
 	ALLOC_GROW_BY(d->messages, d->msg_nr, 1, d->msg_alloc);
 	d->messages[d->msg_nr - 1] = msg;
 	msg->stripspace = NO_STRIPSPACE;
@@ -702,12 +700,11 @@ static int append_edit(int argc, const char **argv, const char *prefix)
 		char *prev_buf = repo_read_object_file(the_repository, note, &type, &size);
 
 		if (prev_buf && size)
-			strbuf_add(&buf, prev_buf, size);
+			strbuf_attach(&buf, prev_buf, size, size + 1);
 		if (d.buf.len && prev_buf && size)
 			append_separator(&buf);
 		strbuf_insert(&d.buf, 0, buf.buf, buf.len);
 
-		free(prev_buf);
 		strbuf_release(&buf);
 	}
 
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH 2/4] notes: use exisisting function stream_blob_to_fd
  2024-02-05 20:49 ` [PATCH 2/4] notes: use exisisting function stream_blob_to_fd Maarten Bosmans
@ 2024-02-05 22:00   ` Eric Sunshine
  0 siblings, 0 replies; 36+ messages in thread
From: Eric Sunshine @ 2024-02-05 22:00 UTC (permalink / raw)
  To: Maarten Bosmans; +Cc: git, Teng Long, Maarten Bosmans

On Mon, Feb 5, 2024 at 4:53 PM Maarten Bosmans <mkbosmans@gmail.com> wrote:
> notes: use exisisting function stream_blob_to_fd

s/exisisting/existing/

> Use functionality from streaming.c and remove the copy_obj_to_fd()
> function local to notes.c.
>
> Streaming the blob to stdout instead of copying through an
> intermediate buffer might also be more efficient, but at the
> size a typical note is, this is unlikely to matter a lot.
>
> Signed-off-by: Maarten Bosmans <maarten.bosmans@vortech.nl>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/4] notes: print note blob to stdout directly
  2024-02-05 20:49 ` [PATCH 1/4] notes: print note blob to stdout directly Maarten Bosmans
@ 2024-02-06  3:44   ` Junio C Hamano
  2024-02-06  9:55     ` Maarten Bosmans
  2024-02-06 13:55     ` Kristoffer Haugsbakk
  0 siblings, 2 replies; 36+ messages in thread
From: Junio C Hamano @ 2024-02-06  3:44 UTC (permalink / raw)
  To: Maarten Bosmans; +Cc: git, Teng Long, Maarten Bosmans

Maarten Bosmans <mkbosmans@gmail.com> writes:

> From: Maarten Bosmans <mkbosmans@gmail.com>
>
> From: Maarten Bosmans <maarten.bosmans@vortech.nl>

Which one of you are you?  Please make up your mind and use only one
;-)  IOW, the first one is unneeded, as the latter matches what you
have on the S-o-b line.

> Avoid the need to launch a subprocess by calling stream_blob_to_fd
> directly.  This does not only get rid of the overhead of a separate
> child process, but also avoids the initalization of the whole log
> machinery that `git show` does.  That is needed for example to show
> decorated commits and applying the mailmap.  For simply displaying
> a blob however, the only useful thing show does is enabling the pager.
>
> Signed-off-by: Maarten Bosmans <maarten.bosmans@vortech.nl>
> ---
>  builtin/notes.c | 10 ++++++----
>  1 file changed, 6 insertions(+), 4 deletions(-)

I am not sure if we want to accept an approach that feels somewhat
narrow/short sighted, like this patch.  When "git show" learns an
improved way to show blob objects, who will update the code this
patch touches to teach it to use the same improved way to show the
notes?

I actually was hoping, after seeing the use case description in the
cover letter, that the series would be introducing a batch mode
interface to allow callers to ask notes for many objects and have
the command respond notes for these objects in a way that which
piece of output corresponds to which object in the request, reducing
the process overhead amortised over many objects.


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 0/4] Speed up git-notes show
  2024-02-05 20:49 [PATCH 0/4] Speed up git-notes show Maarten Bosmans
                   ` (3 preceding siblings ...)
  2024-02-05 20:49 ` [PATCH 4/4] notes: use strbuf_attach to take ownership of the object contents Maarten Bosmans
@ 2024-02-06  7:08 ` Kristoffer Haugsbakk
  2024-02-06  8:51   ` Maarten Bosmans
  2024-02-18 19:59 ` [PATCH v2 0/5] " Maarten Bosmans
  5 siblings, 1 reply; 36+ messages in thread
From: Kristoffer Haugsbakk @ 2024-02-06  7:08 UTC (permalink / raw)
  To: Maarten Bosmans; +Cc: Teng Long, git

On Mon, Feb 5, 2024, at 21:49, Maarten Bosmans wrote:
> BACKGROUND
>   We have a script that runs a range of build tests for all new
>   commits in the repository and adds a line to the commit note with
>   the result from the test.

Nice. I’ve seen a few tools that do that:

• https://github.com/spotify/git-testhttps://github.com/mhagger/git-test

Thanks for improving this use-case.

-- 
Kristoffer Haugsbakk

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 0/4] Speed up git-notes show
  2024-02-06  7:08 ` [PATCH 0/4] Speed up git-notes show Kristoffer Haugsbakk
@ 2024-02-06  8:51   ` Maarten Bosmans
  0 siblings, 0 replies; 36+ messages in thread
From: Maarten Bosmans @ 2024-02-06  8:51 UTC (permalink / raw)
  To: Kristoffer Haugsbakk; +Cc: git

Op di 6 feb 2024 om 08:08 schreef Kristoffer Haugsbakk <code@khaugsbakk.name>:
> Nice. I’ve seen a few tools that do that:
>
> • https://github.com/spotify/git-test
> • https://github.com/mhagger/git-test

Cool to see others are doing something similar.
The first git-test stores the test results in a local cache though,
not in git notes.
The second git-test stores the results in a note attached to the tree
instead of the commit. This is a clever idea, as you avoid running
tests twice on the same tree (e.g. the merge commit of a `git merge
--no-ff` merged branch that was already rebased to be current). Down
side is that git log and can display commit notes after the message,
but cannot display tree notes.

In our setup there's one CI server running these tests and pushing the
notes to the repo. This way everybody (developer, reviewer) can see
the state of in-progress work without having to run all build test
variations themselves.
Anyway, good to see wider usage of the git notes feature as
inspiration for new possibilities.

Maarten

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/4] notes: print note blob to stdout directly
  2024-02-06  3:44   ` Junio C Hamano
@ 2024-02-06  9:55     ` Maarten Bosmans
  2024-02-06 17:52       ` Junio C Hamano
  2024-02-06 13:55     ` Kristoffer Haugsbakk
  1 sibling, 1 reply; 36+ messages in thread
From: Maarten Bosmans @ 2024-02-06  9:55 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Teng Long, Maarten Bosmans

Op di 6 feb 2024 om 04:44 schreef Junio C Hamano <gitster@pobox.com>:
>
> Maarten Bosmans <mkbosmans@gmail.com> writes:
> > Avoid the need to launch a subprocess by calling stream_blob_to_fd
> > directly.  This does not only get rid of the overhead of a separate
> > child process, but also avoids the initalization of the whole log
> > machinery that `git show` does.  That is needed for example to show
> > decorated commits and applying the mailmap.  For simply displaying
> > a blob however, the only useful thing show does is enabling the pager.
> >
> > Signed-off-by: Maarten Bosmans <maarten.bosmans@vortech.nl>
> > ---
> >  builtin/notes.c | 10 ++++++----
> >  1 file changed, 6 insertions(+), 4 deletions(-)
>
> I am not sure if we want to accept an approach that feels somewhat
> narrow/short sighted, like this patch.  When "git show" learns an
> improved way to show blob objects, who will update the code this
> patch touches to teach it to use the same improved way to show the
> notes?

Yes, you would loose some of the flexibility by just calling out to
another git command. But whether that is actually an issue depends on
the way git show would be extended. As I mentioned in the cover
letter, there is some handling of textconv
being done in git show for the blob case. I though it would not be
applicable to note blobs, but am not entirely sure.

To address your concern and the textconv case, instead of calling
stream_blob_to_fd() directly, show_blob_object() could be used. Right
now that is a static function in building/log.c I guess that needs to
be moved then to log.c (or show.c)?

I'm still not sure that is a good approach though, as the notes
edit/append subcommands use repo_read_object_file() directly to fill
the current note contents. So anything fancy that git show would do to
blob output is not reflected when editing a note.

> I actually was hoping, after seeing the use case description in the
> cover letter, that the series would be introducing a batch mode
> interface to allow callers to ask notes for many objects and have
> the command respond notes for these objects in a way that which
> piece of output corresponds to which object in the request, reducing
> the process overhead amortised over many objects.

That is also a cool idea. That would probably use the functionality of
the cat-file batch mode, right? In that case you loose any future
changes to git show anyway. At least the current behavior of git show
when fed multiple blob hashes is to simply output them directly after
another, without any way of identifying the parts.

My concern would be that this additional functionality would not get
used much, as it requires a bit more than just a quick-n-dirty bash
script with some loops and command substitutions. If that approach is
reasonably fast (the object of this patch series), then there is not
much to be gained in a practical sense by a batch mode. To me it seems
the git notes code is already a bit undermaintained currently, so I'd
rather keep it generic, instead of customizing it to more specific use
cases.

Maarten

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/4] notes: print note blob to stdout directly
  2024-02-06  3:44   ` Junio C Hamano
  2024-02-06  9:55     ` Maarten Bosmans
@ 2024-02-06 13:55     ` Kristoffer Haugsbakk
  1 sibling, 0 replies; 36+ messages in thread
From: Kristoffer Haugsbakk @ 2024-02-06 13:55 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Teng Long, Maarten Bosmans, Maarten Bosmans

On Tue, Feb 6, 2024, at 04:44, Junio C Hamano wrote:
> I actually was hoping, after seeing the use case description in the
> cover letter, that the series would be introducing a batch mode
> interface to allow callers to ask notes for many objects and have
> the command respond notes for these objects in a way that which
> piece of output corresponds to which object in the request, reducing
> the process overhead amortised over many objects.

That sounds great. I imagine that would be very useful.

-- 
Kristoffer Haugsbakk

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/4] notes: print note blob to stdout directly
  2024-02-06  9:55     ` Maarten Bosmans
@ 2024-02-06 17:52       ` Junio C Hamano
  2024-02-13  8:00         ` Jeff King
  2024-02-15  7:41         ` Maarten Bosmans
  0 siblings, 2 replies; 36+ messages in thread
From: Junio C Hamano @ 2024-02-06 17:52 UTC (permalink / raw)
  To: Maarten Bosmans; +Cc: git, Teng Long, Maarten Bosmans

Maarten Bosmans <mkbosmans@gmail.com> writes:

>> I am not sure if we want to accept an approach that feels somewhat
>> narrow/short sighted, like this patch.  When "git show" learns an
>> improved way to show blob objects, who will update the code this
>> patch touches to teach it to use the same improved way to show the
>> notes?

Another thing I forgot to mention was that I suspected the use of
"git show" was so that we can deal with notes trees whose leaves are
*not* blob objects.  As our "git notes" machinery, including how the
initial contents are populated with "git notes add" and how existing
notes on the same object are merged, all assume and depend on that
the leaves are blobs, we can say "we'll just dump to stdout with
write_in_full()", but who knows what kind of custom crap random
third-party tools and IDEs try to use notes trees to store.

Even if we limit ourselves to blobs, they may not be suitable to be
dumped to tty (or a pager, for that matter), and I can see how
textconv could be used as a way out ("detect that the notes payload
is an image and spawn display" kind of hack in a repository full of
images and notes used to store thumbnails, perhaps).

> That is also a cool idea. That would probably use the functionality of
> the cat-file batch mode, right?

Not really.  I was hoping that "git show" that can take multiple
objects from its command line would directly be used, or with a new
option that gives a separator between these objects.

Perhaps we want a new option, e.g., "git notes show --text" that
passes the contents of the leaf blob object to write_in_full(),
bypassing all the things "git show" does, while in a rare case when
the leaf we find is not a blob to invoke "git show".  That might be
a safe approach to move forward, if we wanted to do this.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/4] notes: print note blob to stdout directly
  2024-02-06 17:52       ` Junio C Hamano
@ 2024-02-13  8:00         ` Jeff King
  2024-02-13 17:35           ` Junio C Hamano
  2024-02-15  7:46           ` Maarten Bosmans
  2024-02-15  7:41         ` Maarten Bosmans
  1 sibling, 2 replies; 36+ messages in thread
From: Jeff King @ 2024-02-13  8:00 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Maarten Bosmans, git, Teng Long, Maarten Bosmans

On Tue, Feb 06, 2024 at 09:52:38AM -0800, Junio C Hamano wrote:

> > That is also a cool idea. That would probably use the functionality of
> > the cat-file batch mode, right?
> 
> Not really.  I was hoping that "git show" that can take multiple
> objects from its command line would directly be used, or with a new
> option that gives a separator between these objects.

How about:

  cat some_commit_ids |
  git show --stdin -s -z --format='%H%n%N'

?

-Peff

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/4] notes: print note blob to stdout directly
  2024-02-13  8:00         ` Jeff King
@ 2024-02-13 17:35           ` Junio C Hamano
  2024-02-15  5:26             ` Jeff King
  2024-02-15  7:46           ` Maarten Bosmans
  1 sibling, 1 reply; 36+ messages in thread
From: Junio C Hamano @ 2024-02-13 17:35 UTC (permalink / raw)
  To: Jeff King; +Cc: Maarten Bosmans, git, Teng Long, Maarten Bosmans

Jeff King <peff@peff.net> writes:

> On Tue, Feb 06, 2024 at 09:52:38AM -0800, Junio C Hamano wrote:
>
>> > That is also a cool idea. That would probably use the functionality of
>> > the cat-file batch mode, right?
>> 
>> Not really.  I was hoping that "git show" that can take multiple
>> objects from its command line would directly be used, or with a new
>> option that gives a separator between these objects.
>
> How about:
>
>   cat some_commit_ids |
>   git show --stdin -s -z --format='%H%n%N'

Yeah, that is more in line with what I was hoping to see, instead of
invoking "git show %s" for a note object one by one.

Thanks.

Remind me if you can if we (1) had plans to allow non-blob objects
as notes, or (2) actively support to have non-text blob objects as
notes.  I _think_ we do assume people may try to add non-text blob
as notes (while they accept that they cannot merge two such notes on
the same object), but I do not recall if we planned to allow them to
store trees and commits.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/4] notes: print note blob to stdout directly
  2024-02-13 17:35           ` Junio C Hamano
@ 2024-02-15  5:26             ` Jeff King
  2024-02-16  6:25               ` Junio C Hamano
  0 siblings, 1 reply; 36+ messages in thread
From: Jeff King @ 2024-02-15  5:26 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Maarten Bosmans, git, Teng Long, Maarten Bosmans

On Tue, Feb 13, 2024 at 09:35:28AM -0800, Junio C Hamano wrote:

> Remind me if you can if we (1) had plans to allow non-blob objects
> as notes, or (2) actively support to have non-text blob objects as
> notes.  I _think_ we do assume people may try to add non-text blob
> as notes (while they accept that they cannot merge two such notes on
> the same object), but I do not recall if we planned to allow them to
> store trees and commits.

Short answer: no for (1) and yes for (2).

In my head non-blob notes were always something we'd eventually allow.
But I don't think the "git notes" tool really helps you at all (it
insists on being fed message content and makes the blob itself, rather
than letting you specify an oid).

I wondered what the display side might do here (like, say, segfault).
But it looks like the notes code does not even consider a subtree like
this to be a note. It ends up in the "non_note" list of the notes struct
(and format_note() on the display side explicitly ignores non-blobs
anyway).

So it looks like it did not ever work, and nobody can even be using it
accidentally (of course you can put whatever garbage you like under
refs/notes/*, but if neither git-notes nor the display machinery
works with it, I think we can discount that entirely).

Non-text blob objects, on the other hand, should be easy to do with "git
notes add -F". Interestingly, the display code (in format_note() again)
converts embedded NULs into newlines. I think this is an accidental
behavior due to the use of strchrnul().

Of course it's reasonable to also store notes that aren't meant to be
displayed via git-log, etc, at all. The textconv-caching machinery
stores its results in a separate notes ref. Those should generally be
text (since the point is to come up with something diff-able). But I
think it would be perfectly reasonable for another mechanism to make use
of notes in the same way and store binary goo.

(And of course we can store notes on any object, not just commits, as
the textconv cache demonstrates. But I think that is orthogonal to what
you're asking).

-Peff

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/4] notes: print note blob to stdout directly
  2024-02-06 17:52       ` Junio C Hamano
  2024-02-13  8:00         ` Jeff King
@ 2024-02-15  7:41         ` Maarten Bosmans
  1 sibling, 0 replies; 36+ messages in thread
From: Maarten Bosmans @ 2024-02-15  7:41 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Teng Long

Op di 6 feb 2024 om 18:52 schreef Junio C Hamano <gitster@pobox.com>:
> Maarten Bosmans <mkbosmans@gmail.com> writes:
> > That is also a cool idea. That would probably use the functionality of
> > the cat-file batch mode, right?
>
> Not really.  I was hoping that "git show" that can take multiple
> objects from its command line would directly be used, or with a new
> option that gives a separator between these objects.

OK, I need some guidance here. Should I submit a new version of the
patch that addresses the comments about the code, or should I this
approach of making a single `git notes show` faster altogether?
In my view it is worthwhile to eliminate the extra `git show`
subprocess launch (while still accounting for non-text blob notes, but
not non-blob notes) and focus on batching later.

Maarten

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/4] notes: print note blob to stdout directly
  2024-02-13  8:00         ` Jeff King
  2024-02-13 17:35           ` Junio C Hamano
@ 2024-02-15  7:46           ` Maarten Bosmans
  2024-02-15 15:04             ` Jeff King
  1 sibling, 1 reply; 36+ messages in thread
From: Maarten Bosmans @ 2024-02-15  7:46 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git, Teng Long

Op di 13 feb 2024 om 09:00 schreef Jeff King <peff@peff.net>:
>
> On Tue, Feb 06, 2024 at 09:52:38AM -0800, Junio C Hamano wrote:
>
> > > That is also a cool idea. That would probably use the functionality of
> > > the cat-file batch mode, right?
> >
> > Not really.  I was hoping that "git show" that can take multiple
> > objects from its command line would directly be used, or with a new
> > option that gives a separator between these objects.
>
> How about:
>
>   cat some_commit_ids |
>   git show --stdin -s -z --format='%H%n%N'
>
> ?
>
> -Peff

Wouldn't that fail horribly with non-text blobs?

Are there examples other than cat-file that show how batching is done in git?
I'm afraid of adding something to git-notes with slightly different
syntax from other batch commands. The git cli is already confusing
enough with all the little inconsistencies.

Maarten

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/4] notes: print note blob to stdout directly
  2024-02-15  7:46           ` Maarten Bosmans
@ 2024-02-15 15:04             ` Jeff King
  2024-02-17 12:45               ` Maarten Bosmans
  0 siblings, 1 reply; 36+ messages in thread
From: Jeff King @ 2024-02-15 15:04 UTC (permalink / raw)
  To: Maarten Bosmans; +Cc: Junio C Hamano, git, Teng Long

On Thu, Feb 15, 2024 at 08:46:02AM +0100, Maarten Bosmans wrote:

> > How about:
> >
> >   cat some_commit_ids |
> >   git show --stdin -s -z --format='%H%n%N'
> >
> Wouldn't that fail horribly with non-text blobs?

Yes, if you have NULs embedded in your notes then it won't work. Any
batch output format would require byte counts, then. If we wanted to add
a feature to support that, I would suggest one of:

  - teach the pretty-print formatter a new placeholder to output the
    number of bytes in an element. Then you could do something like
    "%H %(size:%N)%n%N", but it would be generally useful for other
    cases, too.

  - teach the pretty-print formatter a variant of %N that outputs only
    the oid of the note, note the note content itself. And then you
    could do something like:

      git log --format='%(note:oid) %H' |
      git cat-file --batch='%(objectname) %(objectsize) %(rest)'

    to get the usual cat-file output of each note blob, but associated
    with the commit it's attached to (the "%(rest)" placeholder for
    cat-file just relays any text found after the object name of each
    line). You might need to do some scripting between the two to handle
    commits with no note.

Of the two, I'd guess that the second one is a lot less work to
implement (on the Git side; on the reading side it's a little more
involved, but still should be a constant number of processes).

One variant of the second one is to use "git notes list". For example,
you can get all notes via cat-file like this right now:

  git notes list |
  git cat-file --batch='%(objectname) %(objectsize) %(rest)'

You can get individual notes by asking for "git notes list <commit>",
but it will only take one at a time. So another easy patch would be
something like (indentation left funny to make the diff more readable):

diff --git a/builtin/notes.c b/builtin/notes.c
index e65cae0bcf..5fdad5fb8f 100644
--- a/builtin/notes.c
+++ b/builtin/notes.c
@@ -446,22 +446,22 @@ static int list(int argc, const char **argv, const char *prefix)
 		argc = parse_options(argc, argv, prefix, options,
 				     git_notes_list_usage, 0);
 
-	if (1 < argc) {
-		error(_("too many arguments"));
-		usage_with_options(git_notes_list_usage, options);
-	}
-
 	t = init_notes_check("list", 0);
 	if (argc) {
-		if (repo_get_oid(the_repository, argv[0], &object))
-			die(_("failed to resolve '%s' as a valid ref."), argv[0]);
+		retval = 0;
+		while (*++argv) {
+		if (repo_get_oid(the_repository, *argv, &object))
+			die(_("failed to resolve '%s' as a valid ref."), *argv);
 		note = get_note(t, &object);
 		if (note) {
-			puts(oid_to_hex(note));
-			retval = 0;
+			if (argc > 1)
+				printf("%s %s\n", oid_to_hex(note), oid_to_hex(&object));
+			else
+				puts(oid_to_hex(note));
 		} else
-			retval = error(_("no note found for object %s."),
+			retval |= error(_("no note found for object %s."),
 				       oid_to_hex(&object));
+		}
 	} else
 		retval = for_each_note(t, 0, list_each_note, NULL);
 

That would allow:

  git rev-list ... |
  xargs git notes list |
  git cat-file --batch='%(objectname) %(objectsize) %(rest)'

We could even add a "--stdin" mode to avoid the use of xargs.

-Peff

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/4] notes: print note blob to stdout directly
  2024-02-15  5:26             ` Jeff King
@ 2024-02-16  6:25               ` Junio C Hamano
  2024-02-17  5:16                 ` Jeff King
  0 siblings, 1 reply; 36+ messages in thread
From: Junio C Hamano @ 2024-02-16  6:25 UTC (permalink / raw)
  To: Jeff King; +Cc: Maarten Bosmans, git, Teng Long, Maarten Bosmans

Jeff King <peff@peff.net> writes:

> On Tue, Feb 13, 2024 at 09:35:28AM -0800, Junio C Hamano wrote:
>
>> Remind me if you can if we (1) had plans to allow non-blob objects
>> as notes, or (2) actively support to have non-text blob objects as
>> notes.  I _think_ we do assume people may try to add non-text blob
>> as notes (while they accept that they cannot merge two such notes on
>> the same object), but I do not recall if we planned to allow them to
>> store trees and commits.
>
> Short answer: no for (1) and yes for (2).
>
> In my head non-blob notes were always something we'd eventually allow.
> But I don't think the "git notes" tool really helps you at all (it
> insists on being fed message content and makes the blob itself, rather
> than letting you specify an oid).

OK, that makes my worry about dropping "git show" from the codepath
quite a lot.  At least we only have to worry about blobs.

> Non-text blob objects, on the other hand, should be easy to do with "git
> notes add -F". Interestingly, the display code (in format_note() again)
> converts embedded NULs into newlines. I think this is an accidental
> behavior due to the use of strchrnul().
>
> Of course it's reasonable to also store notes that aren't meant to be
> displayed via git-log, etc, at all. The textconv-caching machinery
> stores its results in a separate notes ref. Those should generally be
> text (since the point is to come up with something diff-able). But I
> think it would be perfectly reasonable for another mechanism to make use
> of notes in the same way and store binary goo.

Yup.  

The question is if our current use of "git show" allows creative use
of such binary data attached as notes.  The patch in question will
break such users, as it would become impossible once we start
bypassing the "git show" and emitting the binary data directly to
the standard output stream.

Because the pathname a note blob is stored at is unpredictable and
not under end-user control, it would not be practical to define
different smudge filters per note object using the attribute
mechanism, but if you limit the types of binary data you store in
your notes (e.g., refs/notes/thumbnail may be a note ref whose
contents are all jpeg thumbnail images, where your main history is
your photo archive), you might be able to convince the "git show"
invocation we use to show the note object to show that thumbnail
image by using something involving "display" (from imagemagick---of
course you could use other programs that allows you to view the
image on different platforms) in your smudge filter.  Bypassing "git
show" and sending the blob contents directly to the standard output
would be a grave regression for such a user.


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/4] notes: print note blob to stdout directly
  2024-02-16  6:25               ` Junio C Hamano
@ 2024-02-17  5:16                 ` Jeff King
  2024-02-17  5:56                   ` Junio C Hamano
  0 siblings, 1 reply; 36+ messages in thread
From: Jeff King @ 2024-02-17  5:16 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Maarten Bosmans, git, Teng Long, Maarten Bosmans

On Thu, Feb 15, 2024 at 10:25:45PM -0800, Junio C Hamano wrote:

> > Of course it's reasonable to also store notes that aren't meant to be
> > displayed via git-log, etc, at all. The textconv-caching machinery
> > stores its results in a separate notes ref. Those should generally be
> > text (since the point is to come up with something diff-able). But I
> > think it would be perfectly reasonable for another mechanism to make use
> > of notes in the same way and store binary goo.
> 
> Yup.  
> 
> The question is if our current use of "git show" allows creative use
> of such binary data attached as notes.  The patch in question will
> break such users, as it would become impossible once we start
> bypassing the "git show" and emitting the binary data directly to
> the standard output stream.

Hmm, good question. I can't think offhand of a way that the user could
convince "git show <oid>" to do anything different than just dumping the
literal contents. It is not even handed a path that could trigger
.gitattributes or similar.

The only difference I can think of is that "git show" triggers a pager
by default.  Probably "git notes show" should be doing the same (and
honestly, that would be less confusing to me; the fact that configuring
"pager.show" would affect "git notes show" is surprising to me).

Digging in the history, it looks like we use "git show" at all because
this was adapted from a shell script (though that shell script probably
ought to have been using cat-file in the first place; maybe back then we
thought we'd support non-blobs ;) ).

> Because the pathname a note blob is stored at is unpredictable and
> not under end-user control, it would not be practical to define
> different smudge filters per note object using the attribute
> mechanism, but if you limit the types of binary data you store in
> your notes (e.g., refs/notes/thumbnail may be a note ref whose
> contents are all jpeg thumbnail images, where your main history is
> your photo archive), you might be able to convince the "git show"
> invocation we use to show the note object to show that thumbnail
> image by using something involving "display" (from imagemagick---of
> course you could use other programs that allows you to view the
> image on different platforms) in your smudge filter.  Bypassing "git
> show" and sending the blob contents directly to the standard output
> would be a grave regression for such a user.

Like I said, I do not think there is a way to convince "git show" in
that way. Unless perhaps something like:

  git -c pager.show='f() { cat >tmp.jpg; display tmp; }; f' \
    notes show --ref thumbnail

But that's pretty awful. Why would you do that instead of just "git
notes show >tmp; display tmp" yourself? And again, even if we wanted to
support such a monstrosity, I feel like setting pager.notes.show would
be much more intuitive than piggy-backing on "git show".

Sometimes, of course, we have to support weird stuff anyway because it
has existed for a long time and we don't want to break users. But this
is really pushing my gut-feeling limit of what is reasonable / plausible
for somebody to be doing.

Of course I may be missing some other case where "show" behaves in a
useful way that is different than a straight dump of the blob. But if
not, I'd almost say that getting rid of the extra "show" call now is a
good thing, because it locks in the simple behavior. ;)

-Peff

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/4] notes: print note blob to stdout directly
  2024-02-17  5:16                 ` Jeff King
@ 2024-02-17  5:56                   ` Junio C Hamano
  2024-02-17  6:09                     ` Jeff King
  0 siblings, 1 reply; 36+ messages in thread
From: Junio C Hamano @ 2024-02-17  5:56 UTC (permalink / raw)
  To: Jeff King; +Cc: Maarten Bosmans, git, Teng Long, Maarten Bosmans

Jeff King <peff@peff.net> writes:

> Digging in the history, it looks like we use "git show" at all because
> this was adapted from a shell script (though that shell script probably
> ought to have been using cat-file in the first place; maybe back then we
> thought we'd support non-blobs ;) ).

Yup, that is why I suspected that we planned to store non-blobs.

> Hmm, good question. I can't think offhand of a way that the user could
> convince "git show <oid>" to do anything different than just dumping the
> literal contents. It is not even handed a path that could trigger
> .gitattributes or similar.

show_blob_object() directly calls stream_blob_to_fd() without any
filter, as the hardcoded invocation of "git show <oid>" in the note
codepath does not allow --textconv at all, so we probably are safe
to assume that the contents will appear as-is, not even going
through EOL conversion (which is a bit puzzling, to be honest,
though).  Lack of path I am not worried about, as you can easily
declare with '*' wildcard that everything in the notes tree is of
the same type.  But if the stream_blob_to_fd() interface does not
have anywhere smudge filters or textconv filters can hook into
without some command line option, we do not have to worry about it.

> Sometimes, of course, we have to support weird stuff anyway because it
> has existed for a long time and we don't want to break users. But this
> is really pushing my gut-feeling limit of what is reasonable / plausible
> for somebody to be doing.
>
> Of course I may be missing some other case where "show" behaves in a
> useful way that is different than a straight dump of the blob. But if
> not, I'd almost say that getting rid of the extra "show" call now is a
> good thing, because it locks in the simple behavior. ;)

Yes, of course.  It would be great if we can just stream the blob
contents out in-process.

Thanks.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/4] notes: print note blob to stdout directly
  2024-02-17  5:56                   ` Junio C Hamano
@ 2024-02-17  6:09                     ` Jeff King
  0 siblings, 0 replies; 36+ messages in thread
From: Jeff King @ 2024-02-17  6:09 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Maarten Bosmans, git, Teng Long, Maarten Bosmans

On Fri, Feb 16, 2024 at 09:56:14PM -0800, Junio C Hamano wrote:

> > Hmm, good question. I can't think offhand of a way that the user could
> > convince "git show <oid>" to do anything different than just dumping the
> > literal contents. It is not even handed a path that could trigger
> > .gitattributes or similar.
> 
> show_blob_object() directly calls stream_blob_to_fd() without any
> filter, as the hardcoded invocation of "git show <oid>" in the note
> codepath does not allow --textconv at all, so we probably are safe
> to assume that the contents will appear as-is, not even going
> through EOL conversion (which is a bit puzzling, to be honest,
> though).  Lack of path I am not worried about, as you can easily
> declare with '*' wildcard that everything in the notes tree is of
> the same type.  But if the stream_blob_to_fd() interface does not
> have anywhere smudge filters or textconv filters can hook into
> without some command line option, we do not have to worry about it.

I think we are mostly on the same page, but just to be pedantic: I do
not think even adding a "*" attribute in the notes tree could ever do
anything. The "git show" invocation is not "git show notes_ref:<path>".
We resolve the note blob to an oid, and then pass that oid's hex to "git
show".

So the "git show" command itself has no context at all, and does not
even know this blob came from a tree in the first place.

-Peff

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/4] notes: print note blob to stdout directly
  2024-02-15 15:04             ` Jeff King
@ 2024-02-17 12:45               ` Maarten Bosmans
  2024-02-20  1:51                 ` Jeff King
  0 siblings, 1 reply; 36+ messages in thread
From: Maarten Bosmans @ 2024-02-17 12:45 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git, Teng Long

Op do 15 feb 2024 om 16:04 schreef Jeff King <peff@peff.net>:
>
> On Thu, Feb 15, 2024 at 08:46:02AM +0100, Maarten Bosmans wrote:
>
> > > How about:
> > >
> > >   cat some_commit_ids |
> > >   git show --stdin -s -z --format='%H%n%N'
> > >
> > Wouldn't that fail horribly with non-text blobs?
>
> Yes, if you have NULs embedded in your notes then it won't work. Any
> batch output format would require byte counts, then. If we wanted to add
> a feature to support that, I would suggest one of:
>
>   - teach the pretty-print formatter a new placeholder to output the
>     number of bytes in an element. Then you could do something like
>     "%H %(size:%N)%n%N", but it would be generally useful for other
>     cases, too.
>
>   - teach the pretty-print formatter a variant of %N that outputs only
>     the oid of the note, note the note content itself. And then you
>     could do something like:
>
>       git log --format='%(note:oid) %H' |
>       git cat-file --batch='%(objectname) %(objectsize) %(rest)'
>
>     to get the usual cat-file output of each note blob, but associated
>     with the commit it's attached to (the "%(rest)" placeholder for
>     cat-file just relays any text found after the object name of each
>     line). You might need to do some scripting between the two to handle
>     commits with no note.
>
> Of the two, I'd guess that the second one is a lot less work to
> implement (on the Git side; on the reading side it's a little more
> involved, but still should be a constant number of processes).

The second one is attractive for another reason than implementation
simplicity. While the first one offers more flexibility, the second
reuses the existing cat-file batch format, so the interface between
git and scripts is familiar and consistent.

> One variant of the second one is to use "git notes list". For example,
> you can get all notes via cat-file like this right now:
>
>   git notes list |
>   git cat-file --batch='%(objectname) %(objectsize) %(rest)'

So the cat-file batch output is suitable for blobs containing newline
or NUL characters. But I struggle a bit with what would be an easy way
of using this format in a shell script. Something with multiple read
and read -N commands reading from the output, I guess.
The git codebase has `extract_batch_output()` in t1006. This uses a
separate perl invocation to parse the cat-file output, which confirms
my suspicion there isn't a straight-forward way to do this in e.g.
just a bash script.

That was why my first steps were to accept that a launching a separate
process per note in a bash loop is a pretty clear and well understood
idiom in shell scripts and try to make the git part of that a bit more
efficient.

> You can get individual notes by asking for "git notes list <commit>",
> but it will only take one at a time. So another easy patch would be
> something like (indentation left funny to make the diff more readable):
>
> diff --git a/builtin/notes.c b/builtin/notes.c
> index e65cae0bcf..5fdad5fb8f 100644
> --- a/builtin/notes.c
> +++ b/builtin/notes.c
> @@ -446,22 +446,22 @@ static int list(int argc, const char **argv, const char *prefix)
>                 argc = parse_options(argc, argv, prefix, options,
>                                      git_notes_list_usage, 0);
>
> -       if (1 < argc) {
> -               error(_("too many arguments"));
> -               usage_with_options(git_notes_list_usage, options);
> -       }
> -
>         t = init_notes_check("list", 0);
>         if (argc) {
> -               if (repo_get_oid(the_repository, argv[0], &object))
> -                       die(_("failed to resolve '%s' as a valid ref."), argv[0]);
> +               retval = 0;
> +               while (*++argv) {
> +               if (repo_get_oid(the_repository, *argv, &object))
> +                       die(_("failed to resolve '%s' as a valid ref."), *argv);
>                 note = get_note(t, &object);
>                 if (note) {
> -                       puts(oid_to_hex(note));
> -                       retval = 0;
> +                       if (argc > 1)
> +                               printf("%s %s\n", oid_to_hex(note), oid_to_hex(&object));
> +                       else
> +                               puts(oid_to_hex(note));
>                 } else
> -                       retval = error(_("no note found for object %s."),
> +                       retval |= error(_("no note found for object %s."),
>                                        oid_to_hex(&object));
> +               }
>         } else
>                 retval = for_each_note(t, 0, list_each_note, NULL);
>
>
> That would allow:
>
>   git rev-list ... |
>   xargs git notes list |
>   git cat-file --batch='%(objectname) %(objectsize) %(rest)'
>
> We could even add a "--stdin" mode to avoid the use of xargs.

Yes, a --stdin mode for `git notes list` would be a useful building
block for scripting indeed.
I'll probably give it a try when this patch series succeeds.

Maarten

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v2 0/5] Speed up git-notes show
  2024-02-05 20:49 [PATCH 0/4] Speed up git-notes show Maarten Bosmans
                   ` (4 preceding siblings ...)
  2024-02-06  7:08 ` [PATCH 0/4] Speed up git-notes show Kristoffer Haugsbakk
@ 2024-02-18 19:59 ` Maarten Bosmans
  2024-02-18 19:59   ` [PATCH v2 1/5] log: Move show_blob_object() to log.c Maarten Bosmans
                     ` (4 more replies)
  5 siblings, 5 replies; 36+ messages in thread
From: Maarten Bosmans @ 2024-02-18 19:59 UTC (permalink / raw)
  To: git

First time contributor here, trying my first git.git patch series.

BACKGROUND
  We have a script that runs a range of build tests for all new commits in the
  repository and adds a line to the commit note with the result from the test.
  Something along the lines of:
      occa-build-jit-gnu-cuda-develop: PASSED (<hostname>, 2024-01-01 00:00:00+01:00)
  Pretty useful to quickly check that all commits at least build, not only for
  master, but also in progress feature branches.  (a passing test suite is
  generally only required at the merge point)

PROBLEM
  The bash script loops over all remote refs and lists the commits newer than
  <N> days ago.  For each commit its note is read and grep'ed for an existing
  test name to see whether the build test needs to run again.  The `git note show`
  command that is in this loop nest only takes 14ms to execute, but as it is in
  a loop, those times add up.

ANALYSIS
  When asked to show a note for a specific commit, git looks up the blob hash
  for the note and executes `git show` with that hash.  That of course adds
  the child process overhead, but also causes the initialization of a lot of
  log related configuration, such as for decorations or the mailmap.  Simply
  outputting the blob directly in the main process reduces the run time by
  almost halve.

When looking through the git show implementation for useful stuff that command
does that should also be done when showing a note, I could only find the
`setup_pager()` call and some optional textconv stuff.

The second commit is the main one fixing performance. The others are just
eliminating some overhead I noticed when going through the git notes code.


CHANGES WRT V1
  Sharing of the show_blob_object() function from log.c. The intention here is
  to have `git notes show` behave the same as `git show` when in the future the
  latter might be changed to have more sophisticated behaviour for some blobs,
  e.g. launching an image viewer for PNG blobs.  Non-blob notes are still not
  handled, just like in V1.  Current master does handle that, but that is not
  deemed a case worth handling.


Maarten Bosmans (5):
  log: Move show_blob_object() to log.c
  notes: avoid launching a child process to show a note blob
  notes: use existing function stream_blob_to_fd
  notes: do not clean up right before calling die()
  notes: use strbuf_attach to take ownership of the object contents

 Makefile        |  1 +
 builtin/log.c   | 39 +++++----------------------------------
 builtin/notes.c | 38 +++++++++++---------------------------
 log.c           | 41 +++++++++++++++++++++++++++++++++++++++++
 log.h           | 11 +++++++++++
 5 files changed, 69 insertions(+), 61 deletions(-)
 create mode 100644 log.c
 create mode 100644 log.h

-- 
2.35.3


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v2 1/5] log: Move show_blob_object() to log.c
  2024-02-18 19:59 ` [PATCH v2 0/5] " Maarten Bosmans
@ 2024-02-18 19:59   ` Maarten Bosmans
  2024-02-20  1:22     ` Junio C Hamano
  2024-02-18 19:59   ` [PATCH v2 2/5] notes: avoid launching a child process to show a note blob Maarten Bosmans
                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 36+ messages in thread
From: Maarten Bosmans @ 2024-02-18 19:59 UTC (permalink / raw)
  To: git; +Cc: Maarten Bosmans

From: Maarten Bosmans <mkbosmans@gmail.com>

From: Maarten Bosmans <maarten.bosmans@vortech.nl>

This way it can be used outside of builtin/log.c.
The next commit will make builtin/notes.c use it.

Signed-off-by: Maarten Bosmans <maarten.bosmans@vortech.nl>
---
 Makefile      |  1 +
 builtin/log.c | 39 +++++----------------------------------
 log.c         | 41 +++++++++++++++++++++++++++++++++++++++++
 log.h         | 11 +++++++++++
 4 files changed, 58 insertions(+), 34 deletions(-)
 create mode 100644 log.c
 create mode 100644 log.h

diff --git a/Makefile b/Makefile
index 78e874099d..1c19d5c0f3 100644
--- a/Makefile
+++ b/Makefile
@@ -1059,6 +1059,7 @@ LIB_OBJS += list-objects-filter-options.o
 LIB_OBJS += list-objects-filter.o
 LIB_OBJS += list-objects.o
 LIB_OBJS += lockfile.o
+LIB_OBJS += log.o
 LIB_OBJS += log-tree.o
 LIB_OBJS += ls-refs.o
 LIB_OBJS += mailinfo.o
diff --git a/builtin/log.c b/builtin/log.c
index db1808d7c1..587a4c374d 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -32,7 +32,6 @@
 #include "parse-options.h"
 #include "line-log.h"
 #include "branch.h"
-#include "streaming.h"
 #include "version.h"
 #include "mailmap.h"
 #include "progress.h"
@@ -42,7 +41,7 @@
 #include "range-diff.h"
 #include "tmp-objdir.h"
 #include "tree.h"
-#include "write-or-die.h"
+#include "log.h"
 
 #define MAIL_DEFAULT_WRAP 72
 #define COVER_FROM_AUTO_MAX_SUBJECT_LEN 100
@@ -653,37 +652,6 @@ static void show_tagger(const char *buf, struct rev_info *rev)
 	strbuf_release(&out);
 }
 
-static int show_blob_object(const struct object_id *oid, struct rev_info *rev, const char *obj_name)
-{
-	struct object_id oidc;
-	struct object_context obj_context;
-	char *buf;
-	unsigned long size;
-
-	fflush(rev->diffopt.file);
-	if (!rev->diffopt.flags.textconv_set_via_cmdline ||
-	    !rev->diffopt.flags.allow_textconv)
-		return stream_blob_to_fd(1, oid, NULL, 0);
-
-	if (get_oid_with_context(the_repository, obj_name,
-				 GET_OID_RECORD_PATH,
-				 &oidc, &obj_context))
-		die(_("not a valid object name %s"), obj_name);
-	if (!obj_context.path ||
-	    !textconv_object(the_repository, obj_context.path,
-			     obj_context.mode, &oidc, 1, &buf, &size)) {
-		free(obj_context.path);
-		return stream_blob_to_fd(1, oid, NULL, 0);
-	}
-
-	if (!buf)
-		die(_("git show %s: bad file"), obj_name);
-
-	write_or_die(1, buf, size);
-	free(obj_context.path);
-	return 0;
-}
-
 static int show_tag_object(const struct object_id *oid, struct rev_info *rev)
 {
 	unsigned long size;
@@ -770,7 +738,10 @@ int cmd_show(int argc, const char **argv, const char *prefix)
 		const char *name = rev.pending.objects[i].name;
 		switch (o->type) {
 		case OBJ_BLOB:
-			ret = show_blob_object(&o->oid, &rev, name);
+			fflush(rev.diffopt.file);
+			bool do_textconv = rev.diffopt.flags.textconv_set_via_cmdline &&
+				rev.diffopt.flags.allow_textconv;
+			ret = show_blob_object(&o->oid, name, do_textconv);
 			break;
 		case OBJ_TAG: {
 			struct tag *t = (struct tag *)o;
diff --git a/log.c b/log.c
new file mode 100644
index 0000000000..5c77707385
--- /dev/null
+++ b/log.c
@@ -0,0 +1,41 @@
+#include "git-compat-util.h"
+#include "gettext.h"
+#include "diff.h"
+#include "log.h"
+#include "notes.h"
+#include "object-name.h"
+#include "repository.h"
+#include "streaming.h"
+#include "write-or-die.h"
+
+/*
+ * Print blob contents to stdout.
+ */
+int show_blob_object(const struct object_id *oid, const char *obj_name, bool do_textconv)
+{
+	struct object_id oidc;
+	struct object_context obj_context;
+	char *buf;
+	unsigned long size;
+
+	if (!do_textconv)
+		return stream_blob_to_fd(1, oid, NULL, 0);
+
+	if (get_oid_with_context(the_repository, obj_name,
+				 GET_OID_RECORD_PATH,
+				 &oidc, &obj_context))
+		die(_("not a valid object name %s"), obj_name);
+	if (!obj_context.path ||
+	    !textconv_object(the_repository, obj_context.path,
+			     obj_context.mode, &oidc, 1, &buf, &size)) {
+		free(obj_context.path);
+		return stream_blob_to_fd(1, oid, NULL, 0);
+	}
+
+	if (!buf)
+		die(_("git show %s: bad file"), obj_name);
+
+	write_or_die(1, buf, size);
+	free(obj_context.path);
+	return 0;
+}
diff --git a/log.h b/log.h
new file mode 100644
index 0000000000..464cca52ff
--- /dev/null
+++ b/log.h
@@ -0,0 +1,11 @@
+#ifndef LOG_H
+#define LOG_H
+
+struct object_id;
+
+/*
+ * Print blob contents to stdout.
+ */
+int show_blob_object(const struct object_id *oid, const char *obj_name, bool do_textconv);
+
+#endif
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v2 2/5] notes: avoid launching a child process to show a note blob
  2024-02-18 19:59 ` [PATCH v2 0/5] " Maarten Bosmans
  2024-02-18 19:59   ` [PATCH v2 1/5] log: Move show_blob_object() to log.c Maarten Bosmans
@ 2024-02-18 19:59   ` Maarten Bosmans
  2024-02-18 19:59   ` [PATCH v2 3/5] notes: use existing function stream_blob_to_fd Maarten Bosmans
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 36+ messages in thread
From: Maarten Bosmans @ 2024-02-18 19:59 UTC (permalink / raw)
  To: git; +Cc: Maarten Bosmans

From: Maarten Bosmans <mkbosmans@gmail.com>

From: Maarten Bosmans <maarten.bosmans@vortech.nl>

Avoid the need to launch a subprocess by calling stream_blob_to_fd
directly.  This does not only get rid of the overhead of a separate
child process, but also avoids the initalization of the whole log
machinery that `git show` does.  That is needed for example to show
decorated commits and applying the mailmap.  For simply displaying
a blob however, the only useful thing show does is enabling the pager.

This locks in the expectation that a note oid refers to a blob, and
not a tree or something else.  To still keep the option open that the
blob might not be a text blob and in the future we might handle such
notes differently, the show_blob_object() function is called in order
to have the same behaviour as though `git show <note-oid>` was called.

Signed-off-by: Maarten Bosmans <maarten.bosmans@vortech.nl>
---
 builtin/notes.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/builtin/notes.c b/builtin/notes.c
index caf20fd5bd..2a31da6c97 100644
--- a/builtin/notes.c
+++ b/builtin/notes.c
@@ -20,7 +20,8 @@
 #include "repository.h"
 #include "pretty.h"
 #include "refs.h"
-#include "exec-cmd.h"
+#include "pager.h"
+#include "log.h"
 #include "run-command.h"
 #include "parse-options.h"
 #include "string-list.h"
@@ -753,7 +754,7 @@ static int show(int argc, const char **argv, const char *prefix)
 	struct notes_tree *t;
 	struct object_id object;
 	const struct object_id *note;
-	int retval;
+	int retval = 0;
 	struct option options[] = {
 		OPT_END()
 	};
@@ -778,8 +779,9 @@ static int show(int argc, const char **argv, const char *prefix)
 		retval = error(_("no note found for object %s."),
 			       oid_to_hex(&object));
 	else {
-		const char *show_args[3] = {"show", oid_to_hex(note), NULL};
-		retval = execv_git_cmd(show_args);
+		setup_pager();
+		if (show_blob_object(note, NULL, false))
+			die(_("object %s is not a blob"), oid_to_hex(note));
 	}
 	free_notes(t);
 	return retval;
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v2 3/5] notes: use existing function stream_blob_to_fd
  2024-02-18 19:59 ` [PATCH v2 0/5] " Maarten Bosmans
  2024-02-18 19:59   ` [PATCH v2 1/5] log: Move show_blob_object() to log.c Maarten Bosmans
  2024-02-18 19:59   ` [PATCH v2 2/5] notes: avoid launching a child process to show a note blob Maarten Bosmans
@ 2024-02-18 19:59   ` Maarten Bosmans
  2024-02-18 19:59   ` [PATCH v2 4/5] notes: do not clean up right before calling die() Maarten Bosmans
  2024-02-18 19:59   ` [PATCH v2 5/5] notes: use strbuf_attach to take ownership of the object contents Maarten Bosmans
  4 siblings, 0 replies; 36+ messages in thread
From: Maarten Bosmans @ 2024-02-18 19:59 UTC (permalink / raw)
  To: git; +Cc: Maarten Bosmans

From: Maarten Bosmans <mkbosmans@gmail.com>

From: Maarten Bosmans <maarten.bosmans@vortech.nl>

Use functionality from streaming.c and remove the copy_obj_to_fd()
function local to notes.c.

Streaming the blob to stdout instead of copying through an
intermediate buffer might also be more efficient, but at the
size a typical note is, this is unlikely to matter a lot.

Signed-off-by: Maarten Bosmans <maarten.bosmans@vortech.nl>
---
 builtin/notes.c | 15 ++-------------
 1 file changed, 2 insertions(+), 13 deletions(-)

diff --git a/builtin/notes.c b/builtin/notes.c
index 2a31da6c97..184a92d0c1 100644
--- a/builtin/notes.c
+++ b/builtin/notes.c
@@ -22,6 +22,7 @@
 #include "refs.h"
 #include "pager.h"
 #include "log.h"
+#include "streaming.h"
 #include "run-command.h"
 #include "parse-options.h"
 #include "string-list.h"
@@ -149,18 +150,6 @@ static int list_each_note(const struct object_id *object_oid,
 	return 0;
 }
 
-static void copy_obj_to_fd(int fd, const struct object_id *oid)
-{
-	unsigned long size;
-	enum object_type type;
-	char *buf = repo_read_object_file(the_repository, oid, &type, &size);
-	if (buf) {
-		if (size)
-			write_or_die(fd, buf, size);
-		free(buf);
-	}
-}
-
 static void write_commented_object(int fd, const struct object_id *object)
 {
 	struct child_process show = CHILD_PROCESS_INIT;
@@ -205,7 +194,7 @@ static void prepare_note_data(const struct object_id *object, struct note_data *
 		if (d->given)
 			write_or_die(fd, d->buf.buf, d->buf.len);
 		else if (old_note)
-			copy_obj_to_fd(fd, old_note);
+			stream_blob_to_fd(fd, old_note, NULL, 0);
 
 		strbuf_addch(&buf, '\n');
 		strbuf_add_commented_lines(&buf, "\n", strlen("\n"), comment_line_char);
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v2 4/5] notes: do not clean up right before calling die()
  2024-02-18 19:59 ` [PATCH v2 0/5] " Maarten Bosmans
                     ` (2 preceding siblings ...)
  2024-02-18 19:59   ` [PATCH v2 3/5] notes: use existing function stream_blob_to_fd Maarten Bosmans
@ 2024-02-18 19:59   ` Maarten Bosmans
  2024-02-18 19:59   ` [PATCH v2 5/5] notes: use strbuf_attach to take ownership of the object contents Maarten Bosmans
  4 siblings, 0 replies; 36+ messages in thread
From: Maarten Bosmans @ 2024-02-18 19:59 UTC (permalink / raw)
  To: git; +Cc: Maarten Bosmans

From: Maarten Bosmans <mkbosmans@gmail.com>

From: Maarten Bosmans <maarten.bosmans@vortech.nl>

For consistency with the other uses of die() in the same function.

Signed-off-by: Maarten Bosmans <maarten.bosmans@vortech.nl>
---
 builtin/notes.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/builtin/notes.c b/builtin/notes.c
index 184a92d0c1..6863935d03 100644
--- a/builtin/notes.c
+++ b/builtin/notes.c
@@ -311,12 +311,8 @@ static int parse_reuse_arg(const struct option *opt, const char *arg, int unset)
 		die(_("failed to resolve '%s' as a valid ref."), arg);
 	if (!(value = repo_read_object_file(the_repository, &object, &type, &len)))
 		die(_("failed to read object '%s'."), arg);
-	if (type != OBJ_BLOB) {
-		strbuf_release(&msg->buf);
-		free(value);
-		free(msg);
+	if (type != OBJ_BLOB)
 		die(_("cannot read note data from non-blob object '%s'."), arg);
-	}
 
 	strbuf_add(&msg->buf, value, len);
 	free(value);
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v2 5/5] notes: use strbuf_attach to take ownership of the object contents
  2024-02-18 19:59 ` [PATCH v2 0/5] " Maarten Bosmans
                     ` (3 preceding siblings ...)
  2024-02-18 19:59   ` [PATCH v2 4/5] notes: do not clean up right before calling die() Maarten Bosmans
@ 2024-02-18 19:59   ` Maarten Bosmans
  2024-02-20  2:12     ` Jeff King
  4 siblings, 1 reply; 36+ messages in thread
From: Maarten Bosmans @ 2024-02-18 19:59 UTC (permalink / raw)
  To: git; +Cc: Maarten Bosmans

From: Maarten Bosmans <mkbosmans@gmail.com>

From: Maarten Bosmans <maarten.bosmans@vortech.nl>

Avoid an extra allocation in the strbuf when pushing the string into it.

Signed-off-by: Maarten Bosmans <maarten.bosmans@vortech.nl>
---
 builtin/notes.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/builtin/notes.c b/builtin/notes.c
index 6863935d03..5be24a9c58 100644
--- a/builtin/notes.c
+++ b/builtin/notes.c
@@ -314,10 +314,8 @@ static int parse_reuse_arg(const struct option *opt, const char *arg, int unset)
 	if (type != OBJ_BLOB)
 		die(_("cannot read note data from non-blob object '%s'."), arg);
 
-	strbuf_add(&msg->buf, value, len);
-	free(value);
+	strbuf_attach(&msg->buf, value, len, len + 1);
 
-	msg->buf.len = len;
 	ALLOC_GROW_BY(d->messages, d->msg_nr, 1, d->msg_alloc);
 	d->messages[d->msg_nr - 1] = msg;
 	msg->stripspace = NO_STRIPSPACE;
@@ -705,12 +703,11 @@ static int append_edit(int argc, const char **argv, const char *prefix)
 		if (!prev_buf)
 			die(_("unable to read %s"), oid_to_hex(note));
 		if (size)
-			strbuf_add(&buf, prev_buf, size);
+			strbuf_attach(&buf, prev_buf, size, size + 1);
 		if (d.buf.len && size)
 			append_separator(&buf);
 		strbuf_insert(&d.buf, 0, buf.buf, buf.len);
 
-		free(prev_buf);
 		strbuf_release(&buf);
 	}
 
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 1/5] log: Move show_blob_object() to log.c
  2024-02-18 19:59   ` [PATCH v2 1/5] log: Move show_blob_object() to log.c Maarten Bosmans
@ 2024-02-20  1:22     ` Junio C Hamano
  2024-02-20  1:59       ` Jeff King
  0 siblings, 1 reply; 36+ messages in thread
From: Junio C Hamano @ 2024-02-20  1:22 UTC (permalink / raw)
  To: Maarten Bosmans; +Cc: git, Maarten Bosmans

Maarten Bosmans <mkbosmans@gmail.com> writes:

> Subject: Re: [PATCH v2 1/5] log: Move show_blob_object() to log.c

"Move" -> "move".

> From: Maarten Bosmans <mkbosmans@gmail.com>
>
> From: Maarten Bosmans <maarten.bosmans@vortech.nl>

The earlier one is not needed.  Please do not include such a line.

>
> This way it can be used outside of builtin/log.c.
> The next commit will make builtin/notes.c use it.

The resulting file is only about a small part of the implementation
detail of "show", and has very little to do with "log".

"show.c" that happens to house show_blob_object(), a "canonical" way
to display a blob object, with an anticipation that somebody may
want to expand it in the future to house the "canonical" way to
display a tag, a tree or a commit object, would be OK, though.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/4] notes: print note blob to stdout directly
  2024-02-17 12:45               ` Maarten Bosmans
@ 2024-02-20  1:51                 ` Jeff King
  0 siblings, 0 replies; 36+ messages in thread
From: Jeff King @ 2024-02-20  1:51 UTC (permalink / raw)
  To: Maarten Bosmans; +Cc: Junio C Hamano, git, Teng Long

On Sat, Feb 17, 2024 at 01:45:39PM +0100, Maarten Bosmans wrote:

> > Of the two, I'd guess that the second one is a lot less work to
> > implement (on the Git side; on the reading side it's a little more
> > involved, but still should be a constant number of processes).
> 
> The second one is attractive for another reason than implementation
> simplicity. While the first one offers more flexibility, the second
> reuses the existing cat-file batch format, so the interface between
> git and scripts is familiar and consistent.

Agreed.

> > One variant of the second one is to use "git notes list". For example,
> > you can get all notes via cat-file like this right now:
> >
> >   git notes list |
> >   git cat-file --batch='%(objectname) %(objectsize) %(rest)'
> 
> So the cat-file batch output is suitable for blobs containing newline
> or NUL characters. But I struggle a bit with what would be an easy way
> of using this format in a shell script. Something with multiple read
> and read -N commands reading from the output, I guess.
> The git codebase has `extract_batch_output()` in t1006. This uses a
> separate perl invocation to parse the cat-file output, which confirms
> my suspicion there isn't a straight-forward way to do this in e.g.
> just a bash script.

Yes, you could perhaps do it with "read -N". But note that it is a
bash-ism, and other POSIX shells may not support it. That's one of the
reasons we do not use it in t1006.

Also, I suspect that even bash does not retain NUL bytes in shell
variables. E.g.:

  $ printf '\0foo\0bar\0' | cat -A
  ^@foo^@bar^@

  $ printf '\0foo\0bar\0' | (read -r -N 9 var; echo "var=$var" | cat -A)
  var=foobar$

So pure shell is probably a losing battle if you want to be binary
clean. You might be able to do something like using "read" to get the
header line, and then another tool like "head -c" to read those bytes.
But now you're back to one process per object. Plus depending on the
implementation of "head", it might or might not read more bytes from the
pipe as part of its stdio buffering.

So really, you are probably better off handling the output with a more
capable language (though again, there's not much Git can do here; the
complications are from handling binary data, and not Git's output format
choices).

> That was why my first steps were to accept that a launching a separate
> process per note in a bash loop is a pretty clear and well understood
> idiom in shell scripts and try to make the git part of that a bit more
> efficient.

Yes, I agree it's a simple idiom. And I certainly don't mind any
speedups we can get there, if they don't come with a cost (and your
patches looked pretty reasonable to me, though I didn't read them too
carefully). But I do think anytime you are invoking N+1 processes in a
shell script, it's kind of a losing battle for performance. :)

-Peff

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 1/5] log: Move show_blob_object() to log.c
  2024-02-20  1:22     ` Junio C Hamano
@ 2024-02-20  1:59       ` Jeff King
  2024-02-20  3:03         ` Junio C Hamano
  2024-02-20 11:40         ` Maarten Bosmans
  0 siblings, 2 replies; 36+ messages in thread
From: Jeff King @ 2024-02-20  1:59 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Maarten Bosmans, git, Maarten Bosmans

On Mon, Feb 19, 2024 at 05:22:44PM -0800, Junio C Hamano wrote:

> > This way it can be used outside of builtin/log.c.
> > The next commit will make builtin/notes.c use it.
> 
> The resulting file is only about a small part of the implementation
> detail of "show", and has very little to do with "log".
> 
> "show.c" that happens to house show_blob_object(), a "canonical" way
> to display a blob object, with an anticipation that somebody may
> want to expand it in the future to house the "canonical" way to
> display a tag, a tree or a commit object, would be OK, though.

I'm not sure this function (or its siblings in builtin/log.c) counts as
"canonical", since it is deeply connected to "struct rev_info". So it is
appropriate for log, show, etc, but not for other commands.

It felt a little funny to me to make a new file just for this one
function. The related functions are all in log-tree.c. And as a bonus,
the name is _already_ horribly confusing because it says "tree" but the
functions are really about showing commits. ;)

All that said, I'm not sure based on our previous discussion why we
can't just call stream_blob_to_fd(). Looking at show_blob_object(), most
of the logic is about recording the tree-context of the given name and
using it for textconv. But since we know we are feeding a bare oid,
that would never kick in. So I don't know if there's any value in
sharing this function more widely in the first place.

-Peff

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 5/5] notes: use strbuf_attach to take ownership of the object contents
  2024-02-18 19:59   ` [PATCH v2 5/5] notes: use strbuf_attach to take ownership of the object contents Maarten Bosmans
@ 2024-02-20  2:12     ` Jeff King
  2024-02-20  7:42       ` Maarten Bosmans
  0 siblings, 1 reply; 36+ messages in thread
From: Jeff King @ 2024-02-20  2:12 UTC (permalink / raw)
  To: Maarten Bosmans; +Cc: git, Maarten Bosmans

On Sun, Feb 18, 2024 at 08:59:38PM +0100, Maarten Bosmans wrote:

> @@ -705,12 +703,11 @@ static int append_edit(int argc, const char **argv, const char *prefix)
>  		if (!prev_buf)
>  			die(_("unable to read %s"), oid_to_hex(note));
>  		if (size)
> -			strbuf_add(&buf, prev_buf, size);
> +			strbuf_attach(&buf, prev_buf, size, size + 1);
>  		if (d.buf.len && size)
>  			append_separator(&buf);
>  		strbuf_insert(&d.buf, 0, buf.buf, buf.len);
>  
> -		free(prev_buf);
>  		strbuf_release(&buf);
>  	}

Is it possible for "size" to be 0, but prev_buf to be non-NULL? I assume
it is so if the previous note is the empty object (and anyway, we'd have
died earlier if prev_buf was NULL). In that case your patch introduces a
leak (we do not attach prev_buf to buf, but we no longer free prev_buf).

I'm a little skeptical that this is actually increasing the speed of the
command in a measurable way, though. It's one allocation/copy, right
next to a big old strbuf_insert() that is going to splice into an
existing array.

-Peff

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 1/5] log: Move show_blob_object() to log.c
  2024-02-20  1:59       ` Jeff King
@ 2024-02-20  3:03         ` Junio C Hamano
  2024-02-20 11:40         ` Maarten Bosmans
  1 sibling, 0 replies; 36+ messages in thread
From: Junio C Hamano @ 2024-02-20  3:03 UTC (permalink / raw)
  To: Jeff King; +Cc: Maarten Bosmans, git, Maarten Bosmans

Jeff King <peff@peff.net> writes:

> All that said, I'm not sure based on our previous discussion why we
> can't just call stream_blob_to_fd(). Looking at show_blob_object(), most
> of the logic is about recording the tree-context of the given name and
> using it for textconv. But since we know we are feeding a bare oid,
> that would never kick in. So I don't know if there's any value in
> sharing this function more widely in the first place.

It is very nice that we do not need to touch this code move at all.
Thanks, as usual, for a dose of sanity.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 5/5] notes: use strbuf_attach to take ownership of the object contents
  2024-02-20  2:12     ` Jeff King
@ 2024-02-20  7:42       ` Maarten Bosmans
  0 siblings, 0 replies; 36+ messages in thread
From: Maarten Bosmans @ 2024-02-20  7:42 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Op di 20 feb 2024 om 03:12 schreef Jeff King <peff@peff.net>:
>
> On Sun, Feb 18, 2024 at 08:59:38PM +0100, Maarten Bosmans wrote:
>
> > @@ -705,12 +703,11 @@ static int append_edit(int argc, const char **argv, const char *prefix)
> >               if (!prev_buf)
> >                       die(_("unable to read %s"), oid_to_hex(note));
> >               if (size)
> > -                     strbuf_add(&buf, prev_buf, size);
> > +                     strbuf_attach(&buf, prev_buf, size, size + 1);
> >               if (d.buf.len && size)
> >                       append_separator(&buf);
> >               strbuf_insert(&d.buf, 0, buf.buf, buf.len);
> >
> > -             free(prev_buf);
> >               strbuf_release(&buf);
> >       }
>
> Is it possible for "size" to be 0, but prev_buf to be non-NULL? I assume
> it is so if the previous note is the empty object (and anyway, we'd have
> died earlier if prev_buf was NULL). In that case your patch introduces a
> leak (we do not attach prev_buf to buf, but we no longer free prev_buf).

You are right. I think the `if (size)` is not needed and removing it
would remove the potential for a leak.

> I'm a little skeptical that this is actually increasing the speed of the
> command in a measurable way, though. It's one allocation/copy, right
> next to a big old strbuf_insert() that is going to splice into an
> existing array.

Yeah, I was doubting this patch a bit too. The simple idiom of
starting with an empty strbuf and appending strings to it seems pretty
nice and clear, so may be there's value in leaving it at that.
The speed increase is not measurable of course. I was simply operating
in full on lets-eliminate-all-sources-of-overhead mode while profiling
the notes show code.

I'll drop the patch, in order to keep focus in this series.

Maarten

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 1/5] log: Move show_blob_object() to log.c
  2024-02-20  1:59       ` Jeff King
  2024-02-20  3:03         ` Junio C Hamano
@ 2024-02-20 11:40         ` Maarten Bosmans
  1 sibling, 0 replies; 36+ messages in thread
From: Maarten Bosmans @ 2024-02-20 11:40 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git

Op di 20 feb 2024 om 02:59 schreef Jeff King <peff@peff.net>:
> All that said, I'm not sure based on our previous discussion why we
> can't just call stream_blob_to_fd(). Looking at show_blob_object(), most
> of the logic is about recording the tree-context of the given name and
> using it for textconv. But since we know we are feeding a bare oid,
> that would never kick in. So I don't know if there's any value in
> sharing this function more widely in the first place.

Indeed. My original analysis of what `git show` does when invoked by
`git notes show` led to the conclusion that the only thing worthwhile
to keep is the `setup_pager()` call. Thanks for confirming this.

I'll reroll in a couple of days with the V1 approach back.

Maarten

^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2024-02-20 11:40 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-02-05 20:49 [PATCH 0/4] Speed up git-notes show Maarten Bosmans
2024-02-05 20:49 ` [PATCH 1/4] notes: print note blob to stdout directly Maarten Bosmans
2024-02-06  3:44   ` Junio C Hamano
2024-02-06  9:55     ` Maarten Bosmans
2024-02-06 17:52       ` Junio C Hamano
2024-02-13  8:00         ` Jeff King
2024-02-13 17:35           ` Junio C Hamano
2024-02-15  5:26             ` Jeff King
2024-02-16  6:25               ` Junio C Hamano
2024-02-17  5:16                 ` Jeff King
2024-02-17  5:56                   ` Junio C Hamano
2024-02-17  6:09                     ` Jeff King
2024-02-15  7:46           ` Maarten Bosmans
2024-02-15 15:04             ` Jeff King
2024-02-17 12:45               ` Maarten Bosmans
2024-02-20  1:51                 ` Jeff King
2024-02-15  7:41         ` Maarten Bosmans
2024-02-06 13:55     ` Kristoffer Haugsbakk
2024-02-05 20:49 ` [PATCH 2/4] notes: use exisisting function stream_blob_to_fd Maarten Bosmans
2024-02-05 22:00   ` Eric Sunshine
2024-02-05 20:49 ` [PATCH 3/4] notes: do not clean up right before calling die() Maarten Bosmans
2024-02-05 20:49 ` [PATCH 4/4] notes: use strbuf_attach to take ownership of the object contents Maarten Bosmans
2024-02-06  7:08 ` [PATCH 0/4] Speed up git-notes show Kristoffer Haugsbakk
2024-02-06  8:51   ` Maarten Bosmans
2024-02-18 19:59 ` [PATCH v2 0/5] " Maarten Bosmans
2024-02-18 19:59   ` [PATCH v2 1/5] log: Move show_blob_object() to log.c Maarten Bosmans
2024-02-20  1:22     ` Junio C Hamano
2024-02-20  1:59       ` Jeff King
2024-02-20  3:03         ` Junio C Hamano
2024-02-20 11:40         ` Maarten Bosmans
2024-02-18 19:59   ` [PATCH v2 2/5] notes: avoid launching a child process to show a note blob Maarten Bosmans
2024-02-18 19:59   ` [PATCH v2 3/5] notes: use existing function stream_blob_to_fd Maarten Bosmans
2024-02-18 19:59   ` [PATCH v2 4/5] notes: do not clean up right before calling die() Maarten Bosmans
2024-02-18 19:59   ` [PATCH v2 5/5] notes: use strbuf_attach to take ownership of the object contents Maarten Bosmans
2024-02-20  2:12     ` Jeff King
2024-02-20  7:42       ` Maarten Bosmans

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).