* [PATCH 0/4] Speed up git-notes show @ 2024-02-05 20:49 Maarten Bosmans 2024-02-05 20:49 ` [PATCH 1/4] notes: print note blob to stdout directly Maarten Bosmans ` (5 more replies) 0 siblings, 6 replies; 36+ messages in thread From: Maarten Bosmans @ 2024-02-05 20:49 UTC (permalink / raw) To: git; +Cc: Teng Long First time contributor here, trying my first git.git patch series. BACKGROUND We have a script that runs a range of build tests for all new commits in the repository and adds a line to the commit note with the result from the test. Something along the lines of: occa-build-jit-gnu-cuda-develop: PASSED (<hostname>, 2024-01-01 00:00:00+01:00) Pretty useful to quickly check that all commits at least build, not only for master, but also in progress feature branches. (a passing test suite is generally only required at the merge point) PROBLEM The bash script loops over all remote refs and lists the commits newer than <N> days ago. For each commit its note is read and grep'ed for an existing test name to see whether the build test needs to run again. The `git note show` command that is in this loop nest only takes 14ms to execute, but as it is in a loop, those times add up. ANALYSIS When asked to show a note for a specific commit, git looks up the blob hash for the note and executes `git show` with that hash. That of course adds the child process overhead, but also causes the initialization of a lot of log related configuration, such as for decorations or the mailmap. Simply outputting the blob directly in the main process reduces the run time by almost halve. When looking through the git show implementation for useful stuff that command does that should also be done when showing a note, I could only find the `setup_pager()` call. All other git show functionality was related to showing commits or other non-blob objects. The only thing I was not 100% sure of was the textconv_object stuff. From what I could deduce was that this is only ever used on blobs that represent files in a tree, not on blobs that represent note objects. So I did not include any textconv calls in the git notes code. The first commit is the main one fixing performance. The other three are just eliminating some overhead I noticed when going through the git notes code. Maarten Bosmans (4): notes: print note blob to stdout directly notes: use exisisting function stream_blob_to_fd notes: do not clean up right before calling die() notes: use strbuf_attach to take ownership of the object contents builtin/notes.c | 37 ++++++++++--------------------------- 1 file changed, 10 insertions(+), 27 deletions(-) -- 2.35.3 ^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH 1/4] notes: print note blob to stdout directly 2024-02-05 20:49 [PATCH 0/4] Speed up git-notes show Maarten Bosmans @ 2024-02-05 20:49 ` Maarten Bosmans 2024-02-06 3:44 ` Junio C Hamano 2024-02-05 20:49 ` [PATCH 2/4] notes: use exisisting function stream_blob_to_fd Maarten Bosmans ` (4 subsequent siblings) 5 siblings, 1 reply; 36+ messages in thread From: Maarten Bosmans @ 2024-02-05 20:49 UTC (permalink / raw) To: git; +Cc: Teng Long, Maarten Bosmans From: Maarten Bosmans <mkbosmans@gmail.com> From: Maarten Bosmans <maarten.bosmans@vortech.nl> Avoid the need to launch a subprocess by calling stream_blob_to_fd directly. This does not only get rid of the overhead of a separate child process, but also avoids the initalization of the whole log machinery that `git show` does. That is needed for example to show decorated commits and applying the mailmap. For simply displaying a blob however, the only useful thing show does is enabling the pager. Signed-off-by: Maarten Bosmans <maarten.bosmans@vortech.nl> --- builtin/notes.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/builtin/notes.c b/builtin/notes.c index e65cae0bcf..8efe9809b2 100644 --- a/builtin/notes.c +++ b/builtin/notes.c @@ -20,7 +20,8 @@ #include "repository.h" #include "pretty.h" #include "refs.h" -#include "exec-cmd.h" +#include "pager.h" +#include "streaming.h" #include "run-command.h" #include "parse-options.h" #include "string-list.h" @@ -751,7 +752,7 @@ static int show(int argc, const char **argv, const char *prefix) struct notes_tree *t; struct object_id object; const struct object_id *note; - int retval; + int retval = 0; struct option options[] = { OPT_END() }; @@ -776,8 +777,9 @@ static int show(int argc, const char **argv, const char *prefix) retval = error(_("no note found for object %s."), oid_to_hex(&object)); else { - const char *show_args[3] = {"show", oid_to_hex(note), NULL}; - retval = execv_git_cmd(show_args); + setup_pager(); + if (stream_blob_to_fd(1, note, NULL, 0)) + die(_("object %s is not a blob"), oid_to_hex(note)); } free_notes(t); return retval; -- 2.35.3 ^ permalink raw reply related [flat|nested] 36+ messages in thread
* Re: [PATCH 1/4] notes: print note blob to stdout directly 2024-02-05 20:49 ` [PATCH 1/4] notes: print note blob to stdout directly Maarten Bosmans @ 2024-02-06 3:44 ` Junio C Hamano 2024-02-06 9:55 ` Maarten Bosmans 2024-02-06 13:55 ` Kristoffer Haugsbakk 0 siblings, 2 replies; 36+ messages in thread From: Junio C Hamano @ 2024-02-06 3:44 UTC (permalink / raw) To: Maarten Bosmans; +Cc: git, Teng Long, Maarten Bosmans Maarten Bosmans <mkbosmans@gmail.com> writes: > From: Maarten Bosmans <mkbosmans@gmail.com> > > From: Maarten Bosmans <maarten.bosmans@vortech.nl> Which one of you are you? Please make up your mind and use only one ;-) IOW, the first one is unneeded, as the latter matches what you have on the S-o-b line. > Avoid the need to launch a subprocess by calling stream_blob_to_fd > directly. This does not only get rid of the overhead of a separate > child process, but also avoids the initalization of the whole log > machinery that `git show` does. That is needed for example to show > decorated commits and applying the mailmap. For simply displaying > a blob however, the only useful thing show does is enabling the pager. > > Signed-off-by: Maarten Bosmans <maarten.bosmans@vortech.nl> > --- > builtin/notes.c | 10 ++++++---- > 1 file changed, 6 insertions(+), 4 deletions(-) I am not sure if we want to accept an approach that feels somewhat narrow/short sighted, like this patch. When "git show" learns an improved way to show blob objects, who will update the code this patch touches to teach it to use the same improved way to show the notes? I actually was hoping, after seeing the use case description in the cover letter, that the series would be introducing a batch mode interface to allow callers to ask notes for many objects and have the command respond notes for these objects in a way that which piece of output corresponds to which object in the request, reducing the process overhead amortised over many objects. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/4] notes: print note blob to stdout directly 2024-02-06 3:44 ` Junio C Hamano @ 2024-02-06 9:55 ` Maarten Bosmans 2024-02-06 17:52 ` Junio C Hamano 2024-02-06 13:55 ` Kristoffer Haugsbakk 1 sibling, 1 reply; 36+ messages in thread From: Maarten Bosmans @ 2024-02-06 9:55 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, Teng Long, Maarten Bosmans Op di 6 feb 2024 om 04:44 schreef Junio C Hamano <gitster@pobox.com>: > > Maarten Bosmans <mkbosmans@gmail.com> writes: > > Avoid the need to launch a subprocess by calling stream_blob_to_fd > > directly. This does not only get rid of the overhead of a separate > > child process, but also avoids the initalization of the whole log > > machinery that `git show` does. That is needed for example to show > > decorated commits and applying the mailmap. For simply displaying > > a blob however, the only useful thing show does is enabling the pager. > > > > Signed-off-by: Maarten Bosmans <maarten.bosmans@vortech.nl> > > --- > > builtin/notes.c | 10 ++++++---- > > 1 file changed, 6 insertions(+), 4 deletions(-) > > I am not sure if we want to accept an approach that feels somewhat > narrow/short sighted, like this patch. When "git show" learns an > improved way to show blob objects, who will update the code this > patch touches to teach it to use the same improved way to show the > notes? Yes, you would loose some of the flexibility by just calling out to another git command. But whether that is actually an issue depends on the way git show would be extended. As I mentioned in the cover letter, there is some handling of textconv being done in git show for the blob case. I though it would not be applicable to note blobs, but am not entirely sure. To address your concern and the textconv case, instead of calling stream_blob_to_fd() directly, show_blob_object() could be used. Right now that is a static function in building/log.c I guess that needs to be moved then to log.c (or show.c)? I'm still not sure that is a good approach though, as the notes edit/append subcommands use repo_read_object_file() directly to fill the current note contents. So anything fancy that git show would do to blob output is not reflected when editing a note. > I actually was hoping, after seeing the use case description in the > cover letter, that the series would be introducing a batch mode > interface to allow callers to ask notes for many objects and have > the command respond notes for these objects in a way that which > piece of output corresponds to which object in the request, reducing > the process overhead amortised over many objects. That is also a cool idea. That would probably use the functionality of the cat-file batch mode, right? In that case you loose any future changes to git show anyway. At least the current behavior of git show when fed multiple blob hashes is to simply output them directly after another, without any way of identifying the parts. My concern would be that this additional functionality would not get used much, as it requires a bit more than just a quick-n-dirty bash script with some loops and command substitutions. If that approach is reasonably fast (the object of this patch series), then there is not much to be gained in a practical sense by a batch mode. To me it seems the git notes code is already a bit undermaintained currently, so I'd rather keep it generic, instead of customizing it to more specific use cases. Maarten ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/4] notes: print note blob to stdout directly 2024-02-06 9:55 ` Maarten Bosmans @ 2024-02-06 17:52 ` Junio C Hamano 2024-02-13 8:00 ` Jeff King 2024-02-15 7:41 ` Maarten Bosmans 0 siblings, 2 replies; 36+ messages in thread From: Junio C Hamano @ 2024-02-06 17:52 UTC (permalink / raw) To: Maarten Bosmans; +Cc: git, Teng Long, Maarten Bosmans Maarten Bosmans <mkbosmans@gmail.com> writes: >> I am not sure if we want to accept an approach that feels somewhat >> narrow/short sighted, like this patch. When "git show" learns an >> improved way to show blob objects, who will update the code this >> patch touches to teach it to use the same improved way to show the >> notes? Another thing I forgot to mention was that I suspected the use of "git show" was so that we can deal with notes trees whose leaves are *not* blob objects. As our "git notes" machinery, including how the initial contents are populated with "git notes add" and how existing notes on the same object are merged, all assume and depend on that the leaves are blobs, we can say "we'll just dump to stdout with write_in_full()", but who knows what kind of custom crap random third-party tools and IDEs try to use notes trees to store. Even if we limit ourselves to blobs, they may not be suitable to be dumped to tty (or a pager, for that matter), and I can see how textconv could be used as a way out ("detect that the notes payload is an image and spawn display" kind of hack in a repository full of images and notes used to store thumbnails, perhaps). > That is also a cool idea. That would probably use the functionality of > the cat-file batch mode, right? Not really. I was hoping that "git show" that can take multiple objects from its command line would directly be used, or with a new option that gives a separator between these objects. Perhaps we want a new option, e.g., "git notes show --text" that passes the contents of the leaf blob object to write_in_full(), bypassing all the things "git show" does, while in a rare case when the leaf we find is not a blob to invoke "git show". That might be a safe approach to move forward, if we wanted to do this. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/4] notes: print note blob to stdout directly 2024-02-06 17:52 ` Junio C Hamano @ 2024-02-13 8:00 ` Jeff King 2024-02-13 17:35 ` Junio C Hamano 2024-02-15 7:46 ` Maarten Bosmans 2024-02-15 7:41 ` Maarten Bosmans 1 sibling, 2 replies; 36+ messages in thread From: Jeff King @ 2024-02-13 8:00 UTC (permalink / raw) To: Junio C Hamano; +Cc: Maarten Bosmans, git, Teng Long, Maarten Bosmans On Tue, Feb 06, 2024 at 09:52:38AM -0800, Junio C Hamano wrote: > > That is also a cool idea. That would probably use the functionality of > > the cat-file batch mode, right? > > Not really. I was hoping that "git show" that can take multiple > objects from its command line would directly be used, or with a new > option that gives a separator between these objects. How about: cat some_commit_ids | git show --stdin -s -z --format='%H%n%N' ? -Peff ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/4] notes: print note blob to stdout directly 2024-02-13 8:00 ` Jeff King @ 2024-02-13 17:35 ` Junio C Hamano 2024-02-15 5:26 ` Jeff King 2024-02-15 7:46 ` Maarten Bosmans 1 sibling, 1 reply; 36+ messages in thread From: Junio C Hamano @ 2024-02-13 17:35 UTC (permalink / raw) To: Jeff King; +Cc: Maarten Bosmans, git, Teng Long, Maarten Bosmans Jeff King <peff@peff.net> writes: > On Tue, Feb 06, 2024 at 09:52:38AM -0800, Junio C Hamano wrote: > >> > That is also a cool idea. That would probably use the functionality of >> > the cat-file batch mode, right? >> >> Not really. I was hoping that "git show" that can take multiple >> objects from its command line would directly be used, or with a new >> option that gives a separator between these objects. > > How about: > > cat some_commit_ids | > git show --stdin -s -z --format='%H%n%N' Yeah, that is more in line with what I was hoping to see, instead of invoking "git show %s" for a note object one by one. Thanks. Remind me if you can if we (1) had plans to allow non-blob objects as notes, or (2) actively support to have non-text blob objects as notes. I _think_ we do assume people may try to add non-text blob as notes (while they accept that they cannot merge two such notes on the same object), but I do not recall if we planned to allow them to store trees and commits. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/4] notes: print note blob to stdout directly 2024-02-13 17:35 ` Junio C Hamano @ 2024-02-15 5:26 ` Jeff King 2024-02-16 6:25 ` Junio C Hamano 0 siblings, 1 reply; 36+ messages in thread From: Jeff King @ 2024-02-15 5:26 UTC (permalink / raw) To: Junio C Hamano; +Cc: Maarten Bosmans, git, Teng Long, Maarten Bosmans On Tue, Feb 13, 2024 at 09:35:28AM -0800, Junio C Hamano wrote: > Remind me if you can if we (1) had plans to allow non-blob objects > as notes, or (2) actively support to have non-text blob objects as > notes. I _think_ we do assume people may try to add non-text blob > as notes (while they accept that they cannot merge two such notes on > the same object), but I do not recall if we planned to allow them to > store trees and commits. Short answer: no for (1) and yes for (2). In my head non-blob notes were always something we'd eventually allow. But I don't think the "git notes" tool really helps you at all (it insists on being fed message content and makes the blob itself, rather than letting you specify an oid). I wondered what the display side might do here (like, say, segfault). But it looks like the notes code does not even consider a subtree like this to be a note. It ends up in the "non_note" list of the notes struct (and format_note() on the display side explicitly ignores non-blobs anyway). So it looks like it did not ever work, and nobody can even be using it accidentally (of course you can put whatever garbage you like under refs/notes/*, but if neither git-notes nor the display machinery works with it, I think we can discount that entirely). Non-text blob objects, on the other hand, should be easy to do with "git notes add -F". Interestingly, the display code (in format_note() again) converts embedded NULs into newlines. I think this is an accidental behavior due to the use of strchrnul(). Of course it's reasonable to also store notes that aren't meant to be displayed via git-log, etc, at all. The textconv-caching machinery stores its results in a separate notes ref. Those should generally be text (since the point is to come up with something diff-able). But I think it would be perfectly reasonable for another mechanism to make use of notes in the same way and store binary goo. (And of course we can store notes on any object, not just commits, as the textconv cache demonstrates. But I think that is orthogonal to what you're asking). -Peff ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/4] notes: print note blob to stdout directly 2024-02-15 5:26 ` Jeff King @ 2024-02-16 6:25 ` Junio C Hamano 2024-02-17 5:16 ` Jeff King 0 siblings, 1 reply; 36+ messages in thread From: Junio C Hamano @ 2024-02-16 6:25 UTC (permalink / raw) To: Jeff King; +Cc: Maarten Bosmans, git, Teng Long, Maarten Bosmans Jeff King <peff@peff.net> writes: > On Tue, Feb 13, 2024 at 09:35:28AM -0800, Junio C Hamano wrote: > >> Remind me if you can if we (1) had plans to allow non-blob objects >> as notes, or (2) actively support to have non-text blob objects as >> notes. I _think_ we do assume people may try to add non-text blob >> as notes (while they accept that they cannot merge two such notes on >> the same object), but I do not recall if we planned to allow them to >> store trees and commits. > > Short answer: no for (1) and yes for (2). > > In my head non-blob notes were always something we'd eventually allow. > But I don't think the "git notes" tool really helps you at all (it > insists on being fed message content and makes the blob itself, rather > than letting you specify an oid). OK, that makes my worry about dropping "git show" from the codepath quite a lot. At least we only have to worry about blobs. > Non-text blob objects, on the other hand, should be easy to do with "git > notes add -F". Interestingly, the display code (in format_note() again) > converts embedded NULs into newlines. I think this is an accidental > behavior due to the use of strchrnul(). > > Of course it's reasonable to also store notes that aren't meant to be > displayed via git-log, etc, at all. The textconv-caching machinery > stores its results in a separate notes ref. Those should generally be > text (since the point is to come up with something diff-able). But I > think it would be perfectly reasonable for another mechanism to make use > of notes in the same way and store binary goo. Yup. The question is if our current use of "git show" allows creative use of such binary data attached as notes. The patch in question will break such users, as it would become impossible once we start bypassing the "git show" and emitting the binary data directly to the standard output stream. Because the pathname a note blob is stored at is unpredictable and not under end-user control, it would not be practical to define different smudge filters per note object using the attribute mechanism, but if you limit the types of binary data you store in your notes (e.g., refs/notes/thumbnail may be a note ref whose contents are all jpeg thumbnail images, where your main history is your photo archive), you might be able to convince the "git show" invocation we use to show the note object to show that thumbnail image by using something involving "display" (from imagemagick---of course you could use other programs that allows you to view the image on different platforms) in your smudge filter. Bypassing "git show" and sending the blob contents directly to the standard output would be a grave regression for such a user. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/4] notes: print note blob to stdout directly 2024-02-16 6:25 ` Junio C Hamano @ 2024-02-17 5:16 ` Jeff King 2024-02-17 5:56 ` Junio C Hamano 0 siblings, 1 reply; 36+ messages in thread From: Jeff King @ 2024-02-17 5:16 UTC (permalink / raw) To: Junio C Hamano; +Cc: Maarten Bosmans, git, Teng Long, Maarten Bosmans On Thu, Feb 15, 2024 at 10:25:45PM -0800, Junio C Hamano wrote: > > Of course it's reasonable to also store notes that aren't meant to be > > displayed via git-log, etc, at all. The textconv-caching machinery > > stores its results in a separate notes ref. Those should generally be > > text (since the point is to come up with something diff-able). But I > > think it would be perfectly reasonable for another mechanism to make use > > of notes in the same way and store binary goo. > > Yup. > > The question is if our current use of "git show" allows creative use > of such binary data attached as notes. The patch in question will > break such users, as it would become impossible once we start > bypassing the "git show" and emitting the binary data directly to > the standard output stream. Hmm, good question. I can't think offhand of a way that the user could convince "git show <oid>" to do anything different than just dumping the literal contents. It is not even handed a path that could trigger .gitattributes or similar. The only difference I can think of is that "git show" triggers a pager by default. Probably "git notes show" should be doing the same (and honestly, that would be less confusing to me; the fact that configuring "pager.show" would affect "git notes show" is surprising to me). Digging in the history, it looks like we use "git show" at all because this was adapted from a shell script (though that shell script probably ought to have been using cat-file in the first place; maybe back then we thought we'd support non-blobs ;) ). > Because the pathname a note blob is stored at is unpredictable and > not under end-user control, it would not be practical to define > different smudge filters per note object using the attribute > mechanism, but if you limit the types of binary data you store in > your notes (e.g., refs/notes/thumbnail may be a note ref whose > contents are all jpeg thumbnail images, where your main history is > your photo archive), you might be able to convince the "git show" > invocation we use to show the note object to show that thumbnail > image by using something involving "display" (from imagemagick---of > course you could use other programs that allows you to view the > image on different platforms) in your smudge filter. Bypassing "git > show" and sending the blob contents directly to the standard output > would be a grave regression for such a user. Like I said, I do not think there is a way to convince "git show" in that way. Unless perhaps something like: git -c pager.show='f() { cat >tmp.jpg; display tmp; }; f' \ notes show --ref thumbnail But that's pretty awful. Why would you do that instead of just "git notes show >tmp; display tmp" yourself? And again, even if we wanted to support such a monstrosity, I feel like setting pager.notes.show would be much more intuitive than piggy-backing on "git show". Sometimes, of course, we have to support weird stuff anyway because it has existed for a long time and we don't want to break users. But this is really pushing my gut-feeling limit of what is reasonable / plausible for somebody to be doing. Of course I may be missing some other case where "show" behaves in a useful way that is different than a straight dump of the blob. But if not, I'd almost say that getting rid of the extra "show" call now is a good thing, because it locks in the simple behavior. ;) -Peff ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/4] notes: print note blob to stdout directly 2024-02-17 5:16 ` Jeff King @ 2024-02-17 5:56 ` Junio C Hamano 2024-02-17 6:09 ` Jeff King 0 siblings, 1 reply; 36+ messages in thread From: Junio C Hamano @ 2024-02-17 5:56 UTC (permalink / raw) To: Jeff King; +Cc: Maarten Bosmans, git, Teng Long, Maarten Bosmans Jeff King <peff@peff.net> writes: > Digging in the history, it looks like we use "git show" at all because > this was adapted from a shell script (though that shell script probably > ought to have been using cat-file in the first place; maybe back then we > thought we'd support non-blobs ;) ). Yup, that is why I suspected that we planned to store non-blobs. > Hmm, good question. I can't think offhand of a way that the user could > convince "git show <oid>" to do anything different than just dumping the > literal contents. It is not even handed a path that could trigger > .gitattributes or similar. show_blob_object() directly calls stream_blob_to_fd() without any filter, as the hardcoded invocation of "git show <oid>" in the note codepath does not allow --textconv at all, so we probably are safe to assume that the contents will appear as-is, not even going through EOL conversion (which is a bit puzzling, to be honest, though). Lack of path I am not worried about, as you can easily declare with '*' wildcard that everything in the notes tree is of the same type. But if the stream_blob_to_fd() interface does not have anywhere smudge filters or textconv filters can hook into without some command line option, we do not have to worry about it. > Sometimes, of course, we have to support weird stuff anyway because it > has existed for a long time and we don't want to break users. But this > is really pushing my gut-feeling limit of what is reasonable / plausible > for somebody to be doing. > > Of course I may be missing some other case where "show" behaves in a > useful way that is different than a straight dump of the blob. But if > not, I'd almost say that getting rid of the extra "show" call now is a > good thing, because it locks in the simple behavior. ;) Yes, of course. It would be great if we can just stream the blob contents out in-process. Thanks. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/4] notes: print note blob to stdout directly 2024-02-17 5:56 ` Junio C Hamano @ 2024-02-17 6:09 ` Jeff King 0 siblings, 0 replies; 36+ messages in thread From: Jeff King @ 2024-02-17 6:09 UTC (permalink / raw) To: Junio C Hamano; +Cc: Maarten Bosmans, git, Teng Long, Maarten Bosmans On Fri, Feb 16, 2024 at 09:56:14PM -0800, Junio C Hamano wrote: > > Hmm, good question. I can't think offhand of a way that the user could > > convince "git show <oid>" to do anything different than just dumping the > > literal contents. It is not even handed a path that could trigger > > .gitattributes or similar. > > show_blob_object() directly calls stream_blob_to_fd() without any > filter, as the hardcoded invocation of "git show <oid>" in the note > codepath does not allow --textconv at all, so we probably are safe > to assume that the contents will appear as-is, not even going > through EOL conversion (which is a bit puzzling, to be honest, > though). Lack of path I am not worried about, as you can easily > declare with '*' wildcard that everything in the notes tree is of > the same type. But if the stream_blob_to_fd() interface does not > have anywhere smudge filters or textconv filters can hook into > without some command line option, we do not have to worry about it. I think we are mostly on the same page, but just to be pedantic: I do not think even adding a "*" attribute in the notes tree could ever do anything. The "git show" invocation is not "git show notes_ref:<path>". We resolve the note blob to an oid, and then pass that oid's hex to "git show". So the "git show" command itself has no context at all, and does not even know this blob came from a tree in the first place. -Peff ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/4] notes: print note blob to stdout directly 2024-02-13 8:00 ` Jeff King 2024-02-13 17:35 ` Junio C Hamano @ 2024-02-15 7:46 ` Maarten Bosmans 2024-02-15 15:04 ` Jeff King 1 sibling, 1 reply; 36+ messages in thread From: Maarten Bosmans @ 2024-02-15 7:46 UTC (permalink / raw) To: Jeff King; +Cc: Junio C Hamano, git, Teng Long Op di 13 feb 2024 om 09:00 schreef Jeff King <peff@peff.net>: > > On Tue, Feb 06, 2024 at 09:52:38AM -0800, Junio C Hamano wrote: > > > > That is also a cool idea. That would probably use the functionality of > > > the cat-file batch mode, right? > > > > Not really. I was hoping that "git show" that can take multiple > > objects from its command line would directly be used, or with a new > > option that gives a separator between these objects. > > How about: > > cat some_commit_ids | > git show --stdin -s -z --format='%H%n%N' > > ? > > -Peff Wouldn't that fail horribly with non-text blobs? Are there examples other than cat-file that show how batching is done in git? I'm afraid of adding something to git-notes with slightly different syntax from other batch commands. The git cli is already confusing enough with all the little inconsistencies. Maarten ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/4] notes: print note blob to stdout directly 2024-02-15 7:46 ` Maarten Bosmans @ 2024-02-15 15:04 ` Jeff King 2024-02-17 12:45 ` Maarten Bosmans 0 siblings, 1 reply; 36+ messages in thread From: Jeff King @ 2024-02-15 15:04 UTC (permalink / raw) To: Maarten Bosmans; +Cc: Junio C Hamano, git, Teng Long On Thu, Feb 15, 2024 at 08:46:02AM +0100, Maarten Bosmans wrote: > > How about: > > > > cat some_commit_ids | > > git show --stdin -s -z --format='%H%n%N' > > > Wouldn't that fail horribly with non-text blobs? Yes, if you have NULs embedded in your notes then it won't work. Any batch output format would require byte counts, then. If we wanted to add a feature to support that, I would suggest one of: - teach the pretty-print formatter a new placeholder to output the number of bytes in an element. Then you could do something like "%H %(size:%N)%n%N", but it would be generally useful for other cases, too. - teach the pretty-print formatter a variant of %N that outputs only the oid of the note, note the note content itself. And then you could do something like: git log --format='%(note:oid) %H' | git cat-file --batch='%(objectname) %(objectsize) %(rest)' to get the usual cat-file output of each note blob, but associated with the commit it's attached to (the "%(rest)" placeholder for cat-file just relays any text found after the object name of each line). You might need to do some scripting between the two to handle commits with no note. Of the two, I'd guess that the second one is a lot less work to implement (on the Git side; on the reading side it's a little more involved, but still should be a constant number of processes). One variant of the second one is to use "git notes list". For example, you can get all notes via cat-file like this right now: git notes list | git cat-file --batch='%(objectname) %(objectsize) %(rest)' You can get individual notes by asking for "git notes list <commit>", but it will only take one at a time. So another easy patch would be something like (indentation left funny to make the diff more readable): diff --git a/builtin/notes.c b/builtin/notes.c index e65cae0bcf..5fdad5fb8f 100644 --- a/builtin/notes.c +++ b/builtin/notes.c @@ -446,22 +446,22 @@ static int list(int argc, const char **argv, const char *prefix) argc = parse_options(argc, argv, prefix, options, git_notes_list_usage, 0); - if (1 < argc) { - error(_("too many arguments")); - usage_with_options(git_notes_list_usage, options); - } - t = init_notes_check("list", 0); if (argc) { - if (repo_get_oid(the_repository, argv[0], &object)) - die(_("failed to resolve '%s' as a valid ref."), argv[0]); + retval = 0; + while (*++argv) { + if (repo_get_oid(the_repository, *argv, &object)) + die(_("failed to resolve '%s' as a valid ref."), *argv); note = get_note(t, &object); if (note) { - puts(oid_to_hex(note)); - retval = 0; + if (argc > 1) + printf("%s %s\n", oid_to_hex(note), oid_to_hex(&object)); + else + puts(oid_to_hex(note)); } else - retval = error(_("no note found for object %s."), + retval |= error(_("no note found for object %s."), oid_to_hex(&object)); + } } else retval = for_each_note(t, 0, list_each_note, NULL); That would allow: git rev-list ... | xargs git notes list | git cat-file --batch='%(objectname) %(objectsize) %(rest)' We could even add a "--stdin" mode to avoid the use of xargs. -Peff ^ permalink raw reply related [flat|nested] 36+ messages in thread
* Re: [PATCH 1/4] notes: print note blob to stdout directly 2024-02-15 15:04 ` Jeff King @ 2024-02-17 12:45 ` Maarten Bosmans 2024-02-20 1:51 ` Jeff King 0 siblings, 1 reply; 36+ messages in thread From: Maarten Bosmans @ 2024-02-17 12:45 UTC (permalink / raw) To: Jeff King; +Cc: Junio C Hamano, git, Teng Long Op do 15 feb 2024 om 16:04 schreef Jeff King <peff@peff.net>: > > On Thu, Feb 15, 2024 at 08:46:02AM +0100, Maarten Bosmans wrote: > > > > How about: > > > > > > cat some_commit_ids | > > > git show --stdin -s -z --format='%H%n%N' > > > > > Wouldn't that fail horribly with non-text blobs? > > Yes, if you have NULs embedded in your notes then it won't work. Any > batch output format would require byte counts, then. If we wanted to add > a feature to support that, I would suggest one of: > > - teach the pretty-print formatter a new placeholder to output the > number of bytes in an element. Then you could do something like > "%H %(size:%N)%n%N", but it would be generally useful for other > cases, too. > > - teach the pretty-print formatter a variant of %N that outputs only > the oid of the note, note the note content itself. And then you > could do something like: > > git log --format='%(note:oid) %H' | > git cat-file --batch='%(objectname) %(objectsize) %(rest)' > > to get the usual cat-file output of each note blob, but associated > with the commit it's attached to (the "%(rest)" placeholder for > cat-file just relays any text found after the object name of each > line). You might need to do some scripting between the two to handle > commits with no note. > > Of the two, I'd guess that the second one is a lot less work to > implement (on the Git side; on the reading side it's a little more > involved, but still should be a constant number of processes). The second one is attractive for another reason than implementation simplicity. While the first one offers more flexibility, the second reuses the existing cat-file batch format, so the interface between git and scripts is familiar and consistent. > One variant of the second one is to use "git notes list". For example, > you can get all notes via cat-file like this right now: > > git notes list | > git cat-file --batch='%(objectname) %(objectsize) %(rest)' So the cat-file batch output is suitable for blobs containing newline or NUL characters. But I struggle a bit with what would be an easy way of using this format in a shell script. Something with multiple read and read -N commands reading from the output, I guess. The git codebase has `extract_batch_output()` in t1006. This uses a separate perl invocation to parse the cat-file output, which confirms my suspicion there isn't a straight-forward way to do this in e.g. just a bash script. That was why my first steps were to accept that a launching a separate process per note in a bash loop is a pretty clear and well understood idiom in shell scripts and try to make the git part of that a bit more efficient. > You can get individual notes by asking for "git notes list <commit>", > but it will only take one at a time. So another easy patch would be > something like (indentation left funny to make the diff more readable): > > diff --git a/builtin/notes.c b/builtin/notes.c > index e65cae0bcf..5fdad5fb8f 100644 > --- a/builtin/notes.c > +++ b/builtin/notes.c > @@ -446,22 +446,22 @@ static int list(int argc, const char **argv, const char *prefix) > argc = parse_options(argc, argv, prefix, options, > git_notes_list_usage, 0); > > - if (1 < argc) { > - error(_("too many arguments")); > - usage_with_options(git_notes_list_usage, options); > - } > - > t = init_notes_check("list", 0); > if (argc) { > - if (repo_get_oid(the_repository, argv[0], &object)) > - die(_("failed to resolve '%s' as a valid ref."), argv[0]); > + retval = 0; > + while (*++argv) { > + if (repo_get_oid(the_repository, *argv, &object)) > + die(_("failed to resolve '%s' as a valid ref."), *argv); > note = get_note(t, &object); > if (note) { > - puts(oid_to_hex(note)); > - retval = 0; > + if (argc > 1) > + printf("%s %s\n", oid_to_hex(note), oid_to_hex(&object)); > + else > + puts(oid_to_hex(note)); > } else > - retval = error(_("no note found for object %s."), > + retval |= error(_("no note found for object %s."), > oid_to_hex(&object)); > + } > } else > retval = for_each_note(t, 0, list_each_note, NULL); > > > That would allow: > > git rev-list ... | > xargs git notes list | > git cat-file --batch='%(objectname) %(objectsize) %(rest)' > > We could even add a "--stdin" mode to avoid the use of xargs. Yes, a --stdin mode for `git notes list` would be a useful building block for scripting indeed. I'll probably give it a try when this patch series succeeds. Maarten ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/4] notes: print note blob to stdout directly 2024-02-17 12:45 ` Maarten Bosmans @ 2024-02-20 1:51 ` Jeff King 0 siblings, 0 replies; 36+ messages in thread From: Jeff King @ 2024-02-20 1:51 UTC (permalink / raw) To: Maarten Bosmans; +Cc: Junio C Hamano, git, Teng Long On Sat, Feb 17, 2024 at 01:45:39PM +0100, Maarten Bosmans wrote: > > Of the two, I'd guess that the second one is a lot less work to > > implement (on the Git side; on the reading side it's a little more > > involved, but still should be a constant number of processes). > > The second one is attractive for another reason than implementation > simplicity. While the first one offers more flexibility, the second > reuses the existing cat-file batch format, so the interface between > git and scripts is familiar and consistent. Agreed. > > One variant of the second one is to use "git notes list". For example, > > you can get all notes via cat-file like this right now: > > > > git notes list | > > git cat-file --batch='%(objectname) %(objectsize) %(rest)' > > So the cat-file batch output is suitable for blobs containing newline > or NUL characters. But I struggle a bit with what would be an easy way > of using this format in a shell script. Something with multiple read > and read -N commands reading from the output, I guess. > The git codebase has `extract_batch_output()` in t1006. This uses a > separate perl invocation to parse the cat-file output, which confirms > my suspicion there isn't a straight-forward way to do this in e.g. > just a bash script. Yes, you could perhaps do it with "read -N". But note that it is a bash-ism, and other POSIX shells may not support it. That's one of the reasons we do not use it in t1006. Also, I suspect that even bash does not retain NUL bytes in shell variables. E.g.: $ printf '\0foo\0bar\0' | cat -A ^@foo^@bar^@ $ printf '\0foo\0bar\0' | (read -r -N 9 var; echo "var=$var" | cat -A) var=foobar$ So pure shell is probably a losing battle if you want to be binary clean. You might be able to do something like using "read" to get the header line, and then another tool like "head -c" to read those bytes. But now you're back to one process per object. Plus depending on the implementation of "head", it might or might not read more bytes from the pipe as part of its stdio buffering. So really, you are probably better off handling the output with a more capable language (though again, there's not much Git can do here; the complications are from handling binary data, and not Git's output format choices). > That was why my first steps were to accept that a launching a separate > process per note in a bash loop is a pretty clear and well understood > idiom in shell scripts and try to make the git part of that a bit more > efficient. Yes, I agree it's a simple idiom. And I certainly don't mind any speedups we can get there, if they don't come with a cost (and your patches looked pretty reasonable to me, though I didn't read them too carefully). But I do think anytime you are invoking N+1 processes in a shell script, it's kind of a losing battle for performance. :) -Peff ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/4] notes: print note blob to stdout directly 2024-02-06 17:52 ` Junio C Hamano 2024-02-13 8:00 ` Jeff King @ 2024-02-15 7:41 ` Maarten Bosmans 1 sibling, 0 replies; 36+ messages in thread From: Maarten Bosmans @ 2024-02-15 7:41 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, Teng Long Op di 6 feb 2024 om 18:52 schreef Junio C Hamano <gitster@pobox.com>: > Maarten Bosmans <mkbosmans@gmail.com> writes: > > That is also a cool idea. That would probably use the functionality of > > the cat-file batch mode, right? > > Not really. I was hoping that "git show" that can take multiple > objects from its command line would directly be used, or with a new > option that gives a separator between these objects. OK, I need some guidance here. Should I submit a new version of the patch that addresses the comments about the code, or should I this approach of making a single `git notes show` faster altogether? In my view it is worthwhile to eliminate the extra `git show` subprocess launch (while still accounting for non-text blob notes, but not non-blob notes) and focus on batching later. Maarten ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/4] notes: print note blob to stdout directly 2024-02-06 3:44 ` Junio C Hamano 2024-02-06 9:55 ` Maarten Bosmans @ 2024-02-06 13:55 ` Kristoffer Haugsbakk 1 sibling, 0 replies; 36+ messages in thread From: Kristoffer Haugsbakk @ 2024-02-06 13:55 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, Teng Long, Maarten Bosmans, Maarten Bosmans On Tue, Feb 6, 2024, at 04:44, Junio C Hamano wrote: > I actually was hoping, after seeing the use case description in the > cover letter, that the series would be introducing a batch mode > interface to allow callers to ask notes for many objects and have > the command respond notes for these objects in a way that which > piece of output corresponds to which object in the request, reducing > the process overhead amortised over many objects. That sounds great. I imagine that would be very useful. -- Kristoffer Haugsbakk ^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH 2/4] notes: use exisisting function stream_blob_to_fd 2024-02-05 20:49 [PATCH 0/4] Speed up git-notes show Maarten Bosmans 2024-02-05 20:49 ` [PATCH 1/4] notes: print note blob to stdout directly Maarten Bosmans @ 2024-02-05 20:49 ` Maarten Bosmans 2024-02-05 22:00 ` Eric Sunshine 2024-02-05 20:49 ` [PATCH 3/4] notes: do not clean up right before calling die() Maarten Bosmans ` (3 subsequent siblings) 5 siblings, 1 reply; 36+ messages in thread From: Maarten Bosmans @ 2024-02-05 20:49 UTC (permalink / raw) To: git; +Cc: Teng Long, Maarten Bosmans From: Maarten Bosmans <mkbosmans@gmail.com> From: Maarten Bosmans <maarten.bosmans@vortech.nl> Use functionality from streaming.c and remove the copy_obj_to_fd() function local to notes.c. Streaming the blob to stdout instead of copying through an intermediate buffer might also be more efficient, but at the size a typical note is, this is unlikely to matter a lot. Signed-off-by: Maarten Bosmans <maarten.bosmans@vortech.nl> --- builtin/notes.c | 14 +------------- 1 file changed, 1 insertion(+), 13 deletions(-) diff --git a/builtin/notes.c b/builtin/notes.c index 8efe9809b2..048b8c559c 100644 --- a/builtin/notes.c +++ b/builtin/notes.c @@ -149,18 +149,6 @@ static int list_each_note(const struct object_id *object_oid, return 0; } -static void copy_obj_to_fd(int fd, const struct object_id *oid) -{ - unsigned long size; - enum object_type type; - char *buf = repo_read_object_file(the_repository, oid, &type, &size); - if (buf) { - if (size) - write_or_die(fd, buf, size); - free(buf); - } -} - static void write_commented_object(int fd, const struct object_id *object) { struct child_process show = CHILD_PROCESS_INIT; @@ -205,7 +193,7 @@ static void prepare_note_data(const struct object_id *object, struct note_data * if (d->given) write_or_die(fd, d->buf.buf, d->buf.len); else if (old_note) - copy_obj_to_fd(fd, old_note); + stream_blob_to_fd(fd, old_note, NULL, 0); strbuf_addch(&buf, '\n'); strbuf_add_commented_lines(&buf, "\n", strlen("\n"), comment_line_char); -- 2.35.3 ^ permalink raw reply related [flat|nested] 36+ messages in thread
* Re: [PATCH 2/4] notes: use exisisting function stream_blob_to_fd 2024-02-05 20:49 ` [PATCH 2/4] notes: use exisisting function stream_blob_to_fd Maarten Bosmans @ 2024-02-05 22:00 ` Eric Sunshine 0 siblings, 0 replies; 36+ messages in thread From: Eric Sunshine @ 2024-02-05 22:00 UTC (permalink / raw) To: Maarten Bosmans; +Cc: git, Teng Long, Maarten Bosmans On Mon, Feb 5, 2024 at 4:53 PM Maarten Bosmans <mkbosmans@gmail.com> wrote: > notes: use exisisting function stream_blob_to_fd s/exisisting/existing/ > Use functionality from streaming.c and remove the copy_obj_to_fd() > function local to notes.c. > > Streaming the blob to stdout instead of copying through an > intermediate buffer might also be more efficient, but at the > size a typical note is, this is unlikely to matter a lot. > > Signed-off-by: Maarten Bosmans <maarten.bosmans@vortech.nl> ^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH 3/4] notes: do not clean up right before calling die() 2024-02-05 20:49 [PATCH 0/4] Speed up git-notes show Maarten Bosmans 2024-02-05 20:49 ` [PATCH 1/4] notes: print note blob to stdout directly Maarten Bosmans 2024-02-05 20:49 ` [PATCH 2/4] notes: use exisisting function stream_blob_to_fd Maarten Bosmans @ 2024-02-05 20:49 ` Maarten Bosmans 2024-02-05 20:49 ` [PATCH 4/4] notes: use strbuf_attach to take ownership of the object contents Maarten Bosmans ` (2 subsequent siblings) 5 siblings, 0 replies; 36+ messages in thread From: Maarten Bosmans @ 2024-02-05 20:49 UTC (permalink / raw) To: git; +Cc: Teng Long, Maarten Bosmans From: Maarten Bosmans <mkbosmans@gmail.com> From: Maarten Bosmans <maarten.bosmans@vortech.nl> For consistency with the other uses of die() in the same function. Signed-off-by: Maarten Bosmans <maarten.bosmans@vortech.nl> --- builtin/notes.c | 6 +----- 1 file changed, 1 insertion(+), 5 deletions(-) diff --git a/builtin/notes.c b/builtin/notes.c index 048b8c559c..0e98a820dd 100644 --- a/builtin/notes.c +++ b/builtin/notes.c @@ -310,12 +310,8 @@ static int parse_reuse_arg(const struct option *opt, const char *arg, int unset) die(_("failed to resolve '%s' as a valid ref."), arg); if (!(value = repo_read_object_file(the_repository, &object, &type, &len))) die(_("failed to read object '%s'."), arg); - if (type != OBJ_BLOB) { - strbuf_release(&msg->buf); - free(value); - free(msg); + if (type != OBJ_BLOB) die(_("cannot read note data from non-blob object '%s'."), arg); - } strbuf_add(&msg->buf, value, len); free(value); -- 2.35.3 ^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH 4/4] notes: use strbuf_attach to take ownership of the object contents 2024-02-05 20:49 [PATCH 0/4] Speed up git-notes show Maarten Bosmans ` (2 preceding siblings ...) 2024-02-05 20:49 ` [PATCH 3/4] notes: do not clean up right before calling die() Maarten Bosmans @ 2024-02-05 20:49 ` Maarten Bosmans 2024-02-06 7:08 ` [PATCH 0/4] Speed up git-notes show Kristoffer Haugsbakk 2024-02-18 19:59 ` [PATCH v2 0/5] " Maarten Bosmans 5 siblings, 0 replies; 36+ messages in thread From: Maarten Bosmans @ 2024-02-05 20:49 UTC (permalink / raw) To: git; +Cc: Teng Long, Maarten Bosmans From: Maarten Bosmans <mkbosmans@gmail.com> From: Maarten Bosmans <maarten.bosmans@vortech.nl> Avoid an extra allocation in the strbuf when pushing the string into it. Signed-off-by: Maarten Bosmans <maarten.bosmans@vortech.nl> --- builtin/notes.c | 7 ++----- 1 file changed, 2 insertions(+), 5 deletions(-) diff --git a/builtin/notes.c b/builtin/notes.c index 0e98a820dd..71f36667f3 100644 --- a/builtin/notes.c +++ b/builtin/notes.c @@ -313,10 +313,8 @@ static int parse_reuse_arg(const struct option *opt, const char *arg, int unset) if (type != OBJ_BLOB) die(_("cannot read note data from non-blob object '%s'."), arg); - strbuf_add(&msg->buf, value, len); - free(value); + strbuf_attach(&msg->buf, value, len, len + 1); - msg->buf.len = len; ALLOC_GROW_BY(d->messages, d->msg_nr, 1, d->msg_alloc); d->messages[d->msg_nr - 1] = msg; msg->stripspace = NO_STRIPSPACE; @@ -702,12 +700,11 @@ static int append_edit(int argc, const char **argv, const char *prefix) char *prev_buf = repo_read_object_file(the_repository, note, &type, &size); if (prev_buf && size) - strbuf_add(&buf, prev_buf, size); + strbuf_attach(&buf, prev_buf, size, size + 1); if (d.buf.len && prev_buf && size) append_separator(&buf); strbuf_insert(&d.buf, 0, buf.buf, buf.len); - free(prev_buf); strbuf_release(&buf); } -- 2.35.3 ^ permalink raw reply related [flat|nested] 36+ messages in thread
* Re: [PATCH 0/4] Speed up git-notes show 2024-02-05 20:49 [PATCH 0/4] Speed up git-notes show Maarten Bosmans ` (3 preceding siblings ...) 2024-02-05 20:49 ` [PATCH 4/4] notes: use strbuf_attach to take ownership of the object contents Maarten Bosmans @ 2024-02-06 7:08 ` Kristoffer Haugsbakk 2024-02-06 8:51 ` Maarten Bosmans 2024-02-18 19:59 ` [PATCH v2 0/5] " Maarten Bosmans 5 siblings, 1 reply; 36+ messages in thread From: Kristoffer Haugsbakk @ 2024-02-06 7:08 UTC (permalink / raw) To: Maarten Bosmans; +Cc: Teng Long, git On Mon, Feb 5, 2024, at 21:49, Maarten Bosmans wrote: > BACKGROUND > We have a script that runs a range of build tests for all new > commits in the repository and adds a line to the commit note with > the result from the test. Nice. I’ve seen a few tools that do that: • https://github.com/spotify/git-test • https://github.com/mhagger/git-test Thanks for improving this use-case. -- Kristoffer Haugsbakk ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 0/4] Speed up git-notes show 2024-02-06 7:08 ` [PATCH 0/4] Speed up git-notes show Kristoffer Haugsbakk @ 2024-02-06 8:51 ` Maarten Bosmans 0 siblings, 0 replies; 36+ messages in thread From: Maarten Bosmans @ 2024-02-06 8:51 UTC (permalink / raw) To: Kristoffer Haugsbakk; +Cc: git Op di 6 feb 2024 om 08:08 schreef Kristoffer Haugsbakk <code@khaugsbakk.name>: > Nice. I’ve seen a few tools that do that: > > • https://github.com/spotify/git-test > • https://github.com/mhagger/git-test Cool to see others are doing something similar. The first git-test stores the test results in a local cache though, not in git notes. The second git-test stores the results in a note attached to the tree instead of the commit. This is a clever idea, as you avoid running tests twice on the same tree (e.g. the merge commit of a `git merge --no-ff` merged branch that was already rebased to be current). Down side is that git log and can display commit notes after the message, but cannot display tree notes. In our setup there's one CI server running these tests and pushing the notes to the repo. This way everybody (developer, reviewer) can see the state of in-progress work without having to run all build test variations themselves. Anyway, good to see wider usage of the git notes feature as inspiration for new possibilities. Maarten ^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH v2 0/5] Speed up git-notes show 2024-02-05 20:49 [PATCH 0/4] Speed up git-notes show Maarten Bosmans ` (4 preceding siblings ...) 2024-02-06 7:08 ` [PATCH 0/4] Speed up git-notes show Kristoffer Haugsbakk @ 2024-02-18 19:59 ` Maarten Bosmans 2024-02-18 19:59 ` [PATCH v2 1/5] log: Move show_blob_object() to log.c Maarten Bosmans ` (4 more replies) 5 siblings, 5 replies; 36+ messages in thread From: Maarten Bosmans @ 2024-02-18 19:59 UTC (permalink / raw) To: git First time contributor here, trying my first git.git patch series. BACKGROUND We have a script that runs a range of build tests for all new commits in the repository and adds a line to the commit note with the result from the test. Something along the lines of: occa-build-jit-gnu-cuda-develop: PASSED (<hostname>, 2024-01-01 00:00:00+01:00) Pretty useful to quickly check that all commits at least build, not only for master, but also in progress feature branches. (a passing test suite is generally only required at the merge point) PROBLEM The bash script loops over all remote refs and lists the commits newer than <N> days ago. For each commit its note is read and grep'ed for an existing test name to see whether the build test needs to run again. The `git note show` command that is in this loop nest only takes 14ms to execute, but as it is in a loop, those times add up. ANALYSIS When asked to show a note for a specific commit, git looks up the blob hash for the note and executes `git show` with that hash. That of course adds the child process overhead, but also causes the initialization of a lot of log related configuration, such as for decorations or the mailmap. Simply outputting the blob directly in the main process reduces the run time by almost halve. When looking through the git show implementation for useful stuff that command does that should also be done when showing a note, I could only find the `setup_pager()` call and some optional textconv stuff. The second commit is the main one fixing performance. The others are just eliminating some overhead I noticed when going through the git notes code. CHANGES WRT V1 Sharing of the show_blob_object() function from log.c. The intention here is to have `git notes show` behave the same as `git show` when in the future the latter might be changed to have more sophisticated behaviour for some blobs, e.g. launching an image viewer for PNG blobs. Non-blob notes are still not handled, just like in V1. Current master does handle that, but that is not deemed a case worth handling. Maarten Bosmans (5): log: Move show_blob_object() to log.c notes: avoid launching a child process to show a note blob notes: use existing function stream_blob_to_fd notes: do not clean up right before calling die() notes: use strbuf_attach to take ownership of the object contents Makefile | 1 + builtin/log.c | 39 +++++---------------------------------- builtin/notes.c | 38 +++++++++++--------------------------- log.c | 41 +++++++++++++++++++++++++++++++++++++++++ log.h | 11 +++++++++++ 5 files changed, 69 insertions(+), 61 deletions(-) create mode 100644 log.c create mode 100644 log.h -- 2.35.3 ^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH v2 1/5] log: Move show_blob_object() to log.c 2024-02-18 19:59 ` [PATCH v2 0/5] " Maarten Bosmans @ 2024-02-18 19:59 ` Maarten Bosmans 2024-02-20 1:22 ` Junio C Hamano 2024-02-18 19:59 ` [PATCH v2 2/5] notes: avoid launching a child process to show a note blob Maarten Bosmans ` (3 subsequent siblings) 4 siblings, 1 reply; 36+ messages in thread From: Maarten Bosmans @ 2024-02-18 19:59 UTC (permalink / raw) To: git; +Cc: Maarten Bosmans From: Maarten Bosmans <mkbosmans@gmail.com> From: Maarten Bosmans <maarten.bosmans@vortech.nl> This way it can be used outside of builtin/log.c. The next commit will make builtin/notes.c use it. Signed-off-by: Maarten Bosmans <maarten.bosmans@vortech.nl> --- Makefile | 1 + builtin/log.c | 39 +++++---------------------------------- log.c | 41 +++++++++++++++++++++++++++++++++++++++++ log.h | 11 +++++++++++ 4 files changed, 58 insertions(+), 34 deletions(-) create mode 100644 log.c create mode 100644 log.h diff --git a/Makefile b/Makefile index 78e874099d..1c19d5c0f3 100644 --- a/Makefile +++ b/Makefile @@ -1059,6 +1059,7 @@ LIB_OBJS += list-objects-filter-options.o LIB_OBJS += list-objects-filter.o LIB_OBJS += list-objects.o LIB_OBJS += lockfile.o +LIB_OBJS += log.o LIB_OBJS += log-tree.o LIB_OBJS += ls-refs.o LIB_OBJS += mailinfo.o diff --git a/builtin/log.c b/builtin/log.c index db1808d7c1..587a4c374d 100644 --- a/builtin/log.c +++ b/builtin/log.c @@ -32,7 +32,6 @@ #include "parse-options.h" #include "line-log.h" #include "branch.h" -#include "streaming.h" #include "version.h" #include "mailmap.h" #include "progress.h" @@ -42,7 +41,7 @@ #include "range-diff.h" #include "tmp-objdir.h" #include "tree.h" -#include "write-or-die.h" +#include "log.h" #define MAIL_DEFAULT_WRAP 72 #define COVER_FROM_AUTO_MAX_SUBJECT_LEN 100 @@ -653,37 +652,6 @@ static void show_tagger(const char *buf, struct rev_info *rev) strbuf_release(&out); } -static int show_blob_object(const struct object_id *oid, struct rev_info *rev, const char *obj_name) -{ - struct object_id oidc; - struct object_context obj_context; - char *buf; - unsigned long size; - - fflush(rev->diffopt.file); - if (!rev->diffopt.flags.textconv_set_via_cmdline || - !rev->diffopt.flags.allow_textconv) - return stream_blob_to_fd(1, oid, NULL, 0); - - if (get_oid_with_context(the_repository, obj_name, - GET_OID_RECORD_PATH, - &oidc, &obj_context)) - die(_("not a valid object name %s"), obj_name); - if (!obj_context.path || - !textconv_object(the_repository, obj_context.path, - obj_context.mode, &oidc, 1, &buf, &size)) { - free(obj_context.path); - return stream_blob_to_fd(1, oid, NULL, 0); - } - - if (!buf) - die(_("git show %s: bad file"), obj_name); - - write_or_die(1, buf, size); - free(obj_context.path); - return 0; -} - static int show_tag_object(const struct object_id *oid, struct rev_info *rev) { unsigned long size; @@ -770,7 +738,10 @@ int cmd_show(int argc, const char **argv, const char *prefix) const char *name = rev.pending.objects[i].name; switch (o->type) { case OBJ_BLOB: - ret = show_blob_object(&o->oid, &rev, name); + fflush(rev.diffopt.file); + bool do_textconv = rev.diffopt.flags.textconv_set_via_cmdline && + rev.diffopt.flags.allow_textconv; + ret = show_blob_object(&o->oid, name, do_textconv); break; case OBJ_TAG: { struct tag *t = (struct tag *)o; diff --git a/log.c b/log.c new file mode 100644 index 0000000000..5c77707385 --- /dev/null +++ b/log.c @@ -0,0 +1,41 @@ +#include "git-compat-util.h" +#include "gettext.h" +#include "diff.h" +#include "log.h" +#include "notes.h" +#include "object-name.h" +#include "repository.h" +#include "streaming.h" +#include "write-or-die.h" + +/* + * Print blob contents to stdout. + */ +int show_blob_object(const struct object_id *oid, const char *obj_name, bool do_textconv) +{ + struct object_id oidc; + struct object_context obj_context; + char *buf; + unsigned long size; + + if (!do_textconv) + return stream_blob_to_fd(1, oid, NULL, 0); + + if (get_oid_with_context(the_repository, obj_name, + GET_OID_RECORD_PATH, + &oidc, &obj_context)) + die(_("not a valid object name %s"), obj_name); + if (!obj_context.path || + !textconv_object(the_repository, obj_context.path, + obj_context.mode, &oidc, 1, &buf, &size)) { + free(obj_context.path); + return stream_blob_to_fd(1, oid, NULL, 0); + } + + if (!buf) + die(_("git show %s: bad file"), obj_name); + + write_or_die(1, buf, size); + free(obj_context.path); + return 0; +} diff --git a/log.h b/log.h new file mode 100644 index 0000000000..464cca52ff --- /dev/null +++ b/log.h @@ -0,0 +1,11 @@ +#ifndef LOG_H +#define LOG_H + +struct object_id; + +/* + * Print blob contents to stdout. + */ +int show_blob_object(const struct object_id *oid, const char *obj_name, bool do_textconv); + +#endif -- 2.35.3 ^ permalink raw reply related [flat|nested] 36+ messages in thread
* Re: [PATCH v2 1/5] log: Move show_blob_object() to log.c 2024-02-18 19:59 ` [PATCH v2 1/5] log: Move show_blob_object() to log.c Maarten Bosmans @ 2024-02-20 1:22 ` Junio C Hamano 2024-02-20 1:59 ` Jeff King 0 siblings, 1 reply; 36+ messages in thread From: Junio C Hamano @ 2024-02-20 1:22 UTC (permalink / raw) To: Maarten Bosmans; +Cc: git, Maarten Bosmans Maarten Bosmans <mkbosmans@gmail.com> writes: > Subject: Re: [PATCH v2 1/5] log: Move show_blob_object() to log.c "Move" -> "move". > From: Maarten Bosmans <mkbosmans@gmail.com> > > From: Maarten Bosmans <maarten.bosmans@vortech.nl> The earlier one is not needed. Please do not include such a line. > > This way it can be used outside of builtin/log.c. > The next commit will make builtin/notes.c use it. The resulting file is only about a small part of the implementation detail of "show", and has very little to do with "log". "show.c" that happens to house show_blob_object(), a "canonical" way to display a blob object, with an anticipation that somebody may want to expand it in the future to house the "canonical" way to display a tag, a tree or a commit object, would be OK, though. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v2 1/5] log: Move show_blob_object() to log.c 2024-02-20 1:22 ` Junio C Hamano @ 2024-02-20 1:59 ` Jeff King 2024-02-20 3:03 ` Junio C Hamano 2024-02-20 11:40 ` Maarten Bosmans 0 siblings, 2 replies; 36+ messages in thread From: Jeff King @ 2024-02-20 1:59 UTC (permalink / raw) To: Junio C Hamano; +Cc: Maarten Bosmans, git, Maarten Bosmans On Mon, Feb 19, 2024 at 05:22:44PM -0800, Junio C Hamano wrote: > > This way it can be used outside of builtin/log.c. > > The next commit will make builtin/notes.c use it. > > The resulting file is only about a small part of the implementation > detail of "show", and has very little to do with "log". > > "show.c" that happens to house show_blob_object(), a "canonical" way > to display a blob object, with an anticipation that somebody may > want to expand it in the future to house the "canonical" way to > display a tag, a tree or a commit object, would be OK, though. I'm not sure this function (or its siblings in builtin/log.c) counts as "canonical", since it is deeply connected to "struct rev_info". So it is appropriate for log, show, etc, but not for other commands. It felt a little funny to me to make a new file just for this one function. The related functions are all in log-tree.c. And as a bonus, the name is _already_ horribly confusing because it says "tree" but the functions are really about showing commits. ;) All that said, I'm not sure based on our previous discussion why we can't just call stream_blob_to_fd(). Looking at show_blob_object(), most of the logic is about recording the tree-context of the given name and using it for textconv. But since we know we are feeding a bare oid, that would never kick in. So I don't know if there's any value in sharing this function more widely in the first place. -Peff ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v2 1/5] log: Move show_blob_object() to log.c 2024-02-20 1:59 ` Jeff King @ 2024-02-20 3:03 ` Junio C Hamano 2024-02-20 11:40 ` Maarten Bosmans 1 sibling, 0 replies; 36+ messages in thread From: Junio C Hamano @ 2024-02-20 3:03 UTC (permalink / raw) To: Jeff King; +Cc: Maarten Bosmans, git, Maarten Bosmans Jeff King <peff@peff.net> writes: > All that said, I'm not sure based on our previous discussion why we > can't just call stream_blob_to_fd(). Looking at show_blob_object(), most > of the logic is about recording the tree-context of the given name and > using it for textconv. But since we know we are feeding a bare oid, > that would never kick in. So I don't know if there's any value in > sharing this function more widely in the first place. It is very nice that we do not need to touch this code move at all. Thanks, as usual, for a dose of sanity. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v2 1/5] log: Move show_blob_object() to log.c 2024-02-20 1:59 ` Jeff King 2024-02-20 3:03 ` Junio C Hamano @ 2024-02-20 11:40 ` Maarten Bosmans 1 sibling, 0 replies; 36+ messages in thread From: Maarten Bosmans @ 2024-02-20 11:40 UTC (permalink / raw) To: Jeff King; +Cc: Junio C Hamano, git Op di 20 feb 2024 om 02:59 schreef Jeff King <peff@peff.net>: > All that said, I'm not sure based on our previous discussion why we > can't just call stream_blob_to_fd(). Looking at show_blob_object(), most > of the logic is about recording the tree-context of the given name and > using it for textconv. But since we know we are feeding a bare oid, > that would never kick in. So I don't know if there's any value in > sharing this function more widely in the first place. Indeed. My original analysis of what `git show` does when invoked by `git notes show` led to the conclusion that the only thing worthwhile to keep is the `setup_pager()` call. Thanks for confirming this. I'll reroll in a couple of days with the V1 approach back. Maarten ^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH v2 2/5] notes: avoid launching a child process to show a note blob 2024-02-18 19:59 ` [PATCH v2 0/5] " Maarten Bosmans 2024-02-18 19:59 ` [PATCH v2 1/5] log: Move show_blob_object() to log.c Maarten Bosmans @ 2024-02-18 19:59 ` Maarten Bosmans 2024-02-18 19:59 ` [PATCH v2 3/5] notes: use existing function stream_blob_to_fd Maarten Bosmans ` (2 subsequent siblings) 4 siblings, 0 replies; 36+ messages in thread From: Maarten Bosmans @ 2024-02-18 19:59 UTC (permalink / raw) To: git; +Cc: Maarten Bosmans From: Maarten Bosmans <mkbosmans@gmail.com> From: Maarten Bosmans <maarten.bosmans@vortech.nl> Avoid the need to launch a subprocess by calling stream_blob_to_fd directly. This does not only get rid of the overhead of a separate child process, but also avoids the initalization of the whole log machinery that `git show` does. That is needed for example to show decorated commits and applying the mailmap. For simply displaying a blob however, the only useful thing show does is enabling the pager. This locks in the expectation that a note oid refers to a blob, and not a tree or something else. To still keep the option open that the blob might not be a text blob and in the future we might handle such notes differently, the show_blob_object() function is called in order to have the same behaviour as though `git show <note-oid>` was called. Signed-off-by: Maarten Bosmans <maarten.bosmans@vortech.nl> --- builtin/notes.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/builtin/notes.c b/builtin/notes.c index caf20fd5bd..2a31da6c97 100644 --- a/builtin/notes.c +++ b/builtin/notes.c @@ -20,7 +20,8 @@ #include "repository.h" #include "pretty.h" #include "refs.h" -#include "exec-cmd.h" +#include "pager.h" +#include "log.h" #include "run-command.h" #include "parse-options.h" #include "string-list.h" @@ -753,7 +754,7 @@ static int show(int argc, const char **argv, const char *prefix) struct notes_tree *t; struct object_id object; const struct object_id *note; - int retval; + int retval = 0; struct option options[] = { OPT_END() }; @@ -778,8 +779,9 @@ static int show(int argc, const char **argv, const char *prefix) retval = error(_("no note found for object %s."), oid_to_hex(&object)); else { - const char *show_args[3] = {"show", oid_to_hex(note), NULL}; - retval = execv_git_cmd(show_args); + setup_pager(); + if (show_blob_object(note, NULL, false)) + die(_("object %s is not a blob"), oid_to_hex(note)); } free_notes(t); return retval; -- 2.35.3 ^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH v2 3/5] notes: use existing function stream_blob_to_fd 2024-02-18 19:59 ` [PATCH v2 0/5] " Maarten Bosmans 2024-02-18 19:59 ` [PATCH v2 1/5] log: Move show_blob_object() to log.c Maarten Bosmans 2024-02-18 19:59 ` [PATCH v2 2/5] notes: avoid launching a child process to show a note blob Maarten Bosmans @ 2024-02-18 19:59 ` Maarten Bosmans 2024-02-18 19:59 ` [PATCH v2 4/5] notes: do not clean up right before calling die() Maarten Bosmans 2024-02-18 19:59 ` [PATCH v2 5/5] notes: use strbuf_attach to take ownership of the object contents Maarten Bosmans 4 siblings, 0 replies; 36+ messages in thread From: Maarten Bosmans @ 2024-02-18 19:59 UTC (permalink / raw) To: git; +Cc: Maarten Bosmans From: Maarten Bosmans <mkbosmans@gmail.com> From: Maarten Bosmans <maarten.bosmans@vortech.nl> Use functionality from streaming.c and remove the copy_obj_to_fd() function local to notes.c. Streaming the blob to stdout instead of copying through an intermediate buffer might also be more efficient, but at the size a typical note is, this is unlikely to matter a lot. Signed-off-by: Maarten Bosmans <maarten.bosmans@vortech.nl> --- builtin/notes.c | 15 ++------------- 1 file changed, 2 insertions(+), 13 deletions(-) diff --git a/builtin/notes.c b/builtin/notes.c index 2a31da6c97..184a92d0c1 100644 --- a/builtin/notes.c +++ b/builtin/notes.c @@ -22,6 +22,7 @@ #include "refs.h" #include "pager.h" #include "log.h" +#include "streaming.h" #include "run-command.h" #include "parse-options.h" #include "string-list.h" @@ -149,18 +150,6 @@ static int list_each_note(const struct object_id *object_oid, return 0; } -static void copy_obj_to_fd(int fd, const struct object_id *oid) -{ - unsigned long size; - enum object_type type; - char *buf = repo_read_object_file(the_repository, oid, &type, &size); - if (buf) { - if (size) - write_or_die(fd, buf, size); - free(buf); - } -} - static void write_commented_object(int fd, const struct object_id *object) { struct child_process show = CHILD_PROCESS_INIT; @@ -205,7 +194,7 @@ static void prepare_note_data(const struct object_id *object, struct note_data * if (d->given) write_or_die(fd, d->buf.buf, d->buf.len); else if (old_note) - copy_obj_to_fd(fd, old_note); + stream_blob_to_fd(fd, old_note, NULL, 0); strbuf_addch(&buf, '\n'); strbuf_add_commented_lines(&buf, "\n", strlen("\n"), comment_line_char); -- 2.35.3 ^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH v2 4/5] notes: do not clean up right before calling die() 2024-02-18 19:59 ` [PATCH v2 0/5] " Maarten Bosmans ` (2 preceding siblings ...) 2024-02-18 19:59 ` [PATCH v2 3/5] notes: use existing function stream_blob_to_fd Maarten Bosmans @ 2024-02-18 19:59 ` Maarten Bosmans 2024-02-18 19:59 ` [PATCH v2 5/5] notes: use strbuf_attach to take ownership of the object contents Maarten Bosmans 4 siblings, 0 replies; 36+ messages in thread From: Maarten Bosmans @ 2024-02-18 19:59 UTC (permalink / raw) To: git; +Cc: Maarten Bosmans From: Maarten Bosmans <mkbosmans@gmail.com> From: Maarten Bosmans <maarten.bosmans@vortech.nl> For consistency with the other uses of die() in the same function. Signed-off-by: Maarten Bosmans <maarten.bosmans@vortech.nl> --- builtin/notes.c | 6 +----- 1 file changed, 1 insertion(+), 5 deletions(-) diff --git a/builtin/notes.c b/builtin/notes.c index 184a92d0c1..6863935d03 100644 --- a/builtin/notes.c +++ b/builtin/notes.c @@ -311,12 +311,8 @@ static int parse_reuse_arg(const struct option *opt, const char *arg, int unset) die(_("failed to resolve '%s' as a valid ref."), arg); if (!(value = repo_read_object_file(the_repository, &object, &type, &len))) die(_("failed to read object '%s'."), arg); - if (type != OBJ_BLOB) { - strbuf_release(&msg->buf); - free(value); - free(msg); + if (type != OBJ_BLOB) die(_("cannot read note data from non-blob object '%s'."), arg); - } strbuf_add(&msg->buf, value, len); free(value); -- 2.35.3 ^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH v2 5/5] notes: use strbuf_attach to take ownership of the object contents 2024-02-18 19:59 ` [PATCH v2 0/5] " Maarten Bosmans ` (3 preceding siblings ...) 2024-02-18 19:59 ` [PATCH v2 4/5] notes: do not clean up right before calling die() Maarten Bosmans @ 2024-02-18 19:59 ` Maarten Bosmans 2024-02-20 2:12 ` Jeff King 4 siblings, 1 reply; 36+ messages in thread From: Maarten Bosmans @ 2024-02-18 19:59 UTC (permalink / raw) To: git; +Cc: Maarten Bosmans From: Maarten Bosmans <mkbosmans@gmail.com> From: Maarten Bosmans <maarten.bosmans@vortech.nl> Avoid an extra allocation in the strbuf when pushing the string into it. Signed-off-by: Maarten Bosmans <maarten.bosmans@vortech.nl> --- builtin/notes.c | 7 ++----- 1 file changed, 2 insertions(+), 5 deletions(-) diff --git a/builtin/notes.c b/builtin/notes.c index 6863935d03..5be24a9c58 100644 --- a/builtin/notes.c +++ b/builtin/notes.c @@ -314,10 +314,8 @@ static int parse_reuse_arg(const struct option *opt, const char *arg, int unset) if (type != OBJ_BLOB) die(_("cannot read note data from non-blob object '%s'."), arg); - strbuf_add(&msg->buf, value, len); - free(value); + strbuf_attach(&msg->buf, value, len, len + 1); - msg->buf.len = len; ALLOC_GROW_BY(d->messages, d->msg_nr, 1, d->msg_alloc); d->messages[d->msg_nr - 1] = msg; msg->stripspace = NO_STRIPSPACE; @@ -705,12 +703,11 @@ static int append_edit(int argc, const char **argv, const char *prefix) if (!prev_buf) die(_("unable to read %s"), oid_to_hex(note)); if (size) - strbuf_add(&buf, prev_buf, size); + strbuf_attach(&buf, prev_buf, size, size + 1); if (d.buf.len && size) append_separator(&buf); strbuf_insert(&d.buf, 0, buf.buf, buf.len); - free(prev_buf); strbuf_release(&buf); } -- 2.35.3 ^ permalink raw reply related [flat|nested] 36+ messages in thread
* Re: [PATCH v2 5/5] notes: use strbuf_attach to take ownership of the object contents 2024-02-18 19:59 ` [PATCH v2 5/5] notes: use strbuf_attach to take ownership of the object contents Maarten Bosmans @ 2024-02-20 2:12 ` Jeff King 2024-02-20 7:42 ` Maarten Bosmans 0 siblings, 1 reply; 36+ messages in thread From: Jeff King @ 2024-02-20 2:12 UTC (permalink / raw) To: Maarten Bosmans; +Cc: git, Maarten Bosmans On Sun, Feb 18, 2024 at 08:59:38PM +0100, Maarten Bosmans wrote: > @@ -705,12 +703,11 @@ static int append_edit(int argc, const char **argv, const char *prefix) > if (!prev_buf) > die(_("unable to read %s"), oid_to_hex(note)); > if (size) > - strbuf_add(&buf, prev_buf, size); > + strbuf_attach(&buf, prev_buf, size, size + 1); > if (d.buf.len && size) > append_separator(&buf); > strbuf_insert(&d.buf, 0, buf.buf, buf.len); > > - free(prev_buf); > strbuf_release(&buf); > } Is it possible for "size" to be 0, but prev_buf to be non-NULL? I assume it is so if the previous note is the empty object (and anyway, we'd have died earlier if prev_buf was NULL). In that case your patch introduces a leak (we do not attach prev_buf to buf, but we no longer free prev_buf). I'm a little skeptical that this is actually increasing the speed of the command in a measurable way, though. It's one allocation/copy, right next to a big old strbuf_insert() that is going to splice into an existing array. -Peff ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v2 5/5] notes: use strbuf_attach to take ownership of the object contents 2024-02-20 2:12 ` Jeff King @ 2024-02-20 7:42 ` Maarten Bosmans 0 siblings, 0 replies; 36+ messages in thread From: Maarten Bosmans @ 2024-02-20 7:42 UTC (permalink / raw) To: Jeff King; +Cc: git Op di 20 feb 2024 om 03:12 schreef Jeff King <peff@peff.net>: > > On Sun, Feb 18, 2024 at 08:59:38PM +0100, Maarten Bosmans wrote: > > > @@ -705,12 +703,11 @@ static int append_edit(int argc, const char **argv, const char *prefix) > > if (!prev_buf) > > die(_("unable to read %s"), oid_to_hex(note)); > > if (size) > > - strbuf_add(&buf, prev_buf, size); > > + strbuf_attach(&buf, prev_buf, size, size + 1); > > if (d.buf.len && size) > > append_separator(&buf); > > strbuf_insert(&d.buf, 0, buf.buf, buf.len); > > > > - free(prev_buf); > > strbuf_release(&buf); > > } > > Is it possible for "size" to be 0, but prev_buf to be non-NULL? I assume > it is so if the previous note is the empty object (and anyway, we'd have > died earlier if prev_buf was NULL). In that case your patch introduces a > leak (we do not attach prev_buf to buf, but we no longer free prev_buf). You are right. I think the `if (size)` is not needed and removing it would remove the potential for a leak. > I'm a little skeptical that this is actually increasing the speed of the > command in a measurable way, though. It's one allocation/copy, right > next to a big old strbuf_insert() that is going to splice into an > existing array. Yeah, I was doubting this patch a bit too. The simple idiom of starting with an empty strbuf and appending strings to it seems pretty nice and clear, so may be there's value in leaving it at that. The speed increase is not measurable of course. I was simply operating in full on lets-eliminate-all-sources-of-overhead mode while profiling the notes show code. I'll drop the patch, in order to keep focus in this series. Maarten ^ permalink raw reply [flat|nested] 36+ messages in thread
end of thread, other threads:[~2024-02-20 11:40 UTC | newest] Thread overview: 36+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2024-02-05 20:49 [PATCH 0/4] Speed up git-notes show Maarten Bosmans 2024-02-05 20:49 ` [PATCH 1/4] notes: print note blob to stdout directly Maarten Bosmans 2024-02-06 3:44 ` Junio C Hamano 2024-02-06 9:55 ` Maarten Bosmans 2024-02-06 17:52 ` Junio C Hamano 2024-02-13 8:00 ` Jeff King 2024-02-13 17:35 ` Junio C Hamano 2024-02-15 5:26 ` Jeff King 2024-02-16 6:25 ` Junio C Hamano 2024-02-17 5:16 ` Jeff King 2024-02-17 5:56 ` Junio C Hamano 2024-02-17 6:09 ` Jeff King 2024-02-15 7:46 ` Maarten Bosmans 2024-02-15 15:04 ` Jeff King 2024-02-17 12:45 ` Maarten Bosmans 2024-02-20 1:51 ` Jeff King 2024-02-15 7:41 ` Maarten Bosmans 2024-02-06 13:55 ` Kristoffer Haugsbakk 2024-02-05 20:49 ` [PATCH 2/4] notes: use exisisting function stream_blob_to_fd Maarten Bosmans 2024-02-05 22:00 ` Eric Sunshine 2024-02-05 20:49 ` [PATCH 3/4] notes: do not clean up right before calling die() Maarten Bosmans 2024-02-05 20:49 ` [PATCH 4/4] notes: use strbuf_attach to take ownership of the object contents Maarten Bosmans 2024-02-06 7:08 ` [PATCH 0/4] Speed up git-notes show Kristoffer Haugsbakk 2024-02-06 8:51 ` Maarten Bosmans 2024-02-18 19:59 ` [PATCH v2 0/5] " Maarten Bosmans 2024-02-18 19:59 ` [PATCH v2 1/5] log: Move show_blob_object() to log.c Maarten Bosmans 2024-02-20 1:22 ` Junio C Hamano 2024-02-20 1:59 ` Jeff King 2024-02-20 3:03 ` Junio C Hamano 2024-02-20 11:40 ` Maarten Bosmans 2024-02-18 19:59 ` [PATCH v2 2/5] notes: avoid launching a child process to show a note blob Maarten Bosmans 2024-02-18 19:59 ` [PATCH v2 3/5] notes: use existing function stream_blob_to_fd Maarten Bosmans 2024-02-18 19:59 ` [PATCH v2 4/5] notes: do not clean up right before calling die() Maarten Bosmans 2024-02-18 19:59 ` [PATCH v2 5/5] notes: use strbuf_attach to take ownership of the object contents Maarten Bosmans 2024-02-20 2:12 ` Jeff King 2024-02-20 7:42 ` Maarten Bosmans
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).