* [BUGREPORT] git diff-tree --cc SEGFAUTs
@ 2025-01-03 19:28 Wink Saville
2025-01-03 20:46 ` Jeff King
0 siblings, 1 reply; 38+ messages in thread
From: Wink Saville @ 2025-01-03 19:28 UTC (permalink / raw)
To: Git List
[-- Attachment #1: Type: text/plain, Size: 115 bytes --]
`git diff-tree --cc` SEGFAUTs after adding trace_printf to diff_tree_combined.
Details in attached git-bugreport.
[-- Attachment #2: git-bugreport-2025-01-02-1731.txt --]
[-- Type: text/plain, Size: 10602 bytes --]
Thank you for filling out a Git bug report!
Please answer the following questions to help us understand your issue.
What did you do before the bug happened? (Steps to reproduce your issue)
I'm learning some inner workings of git so I've added `trace_printf` statements
to the `diff_tree_combined` function and others. While running ./git I ran into a
problem where `git diff-tree --cc` fails with a SEGFAULT.
This bug report is a "minimal" change to `diff_tree_combined` of the `git@github.com:git/git`
repo that reproduces the problem. The change adds an additional for loop inside the loop that counts
the number of "surving paths" and it prints the `struct combined_diff_path` fields so I can see
the list of paths that make up merge commits with changes.
Below is the diff which is applied to the `next` branch, it is also available on my fork on github:
https://github.com/winksaville/git/tree/wink-segfault-with-minimal-changes
```
wink@fwlaptop 25-01-03T18:43:04.330Z:~/prgs/forks/git (wink-segfault-with-minimal-changes)
$ git --no-pager diff next
diff --git a/combine-diff.c b/combine-diff.c
index 641bc92dbd..455bc19087 100644
--- a/combine-diff.c
+++ b/combine-diff.c
@@ -20,6 +20,8 @@
#include "oid-array.h"
#include "revision.h"
+#include "trace.h"
+
static int compare_paths(const struct combine_diff_path *one,
const struct diff_filespec *two)
{
@@ -1595,8 +1597,16 @@ void diff_tree_combined(const struct object_id *oid,
}
/* find out number of surviving paths */
- for (num_paths = 0, p = paths; p; p = p->next)
+ trace_printf("Wink diff_tree_combined: find number of surviving paths num_parent=%d\n", num_parent);
+ for (num_paths = 0, p = paths; p; p = p->next) {
+ trace_printf("Wink diff_tree_combined: num_paths=%d &p=%p mode=%0x, oid=%s path=%s\n", num_paths, p, p->mode, oid_to_hex(&p->oid), p->path);
+ for (i = 0; i < num_parent; i++) {
+ trace_printf("Wink diff_tree_combined: &p->parent[%d]=%p status=%c mode=%x oid=%s path.buf=%p contents path.buf=%s\n",
+ i, &p->parent[i], p->parent[i].status, p->parent[i].mode, oid_to_hex(&p->parent[i].oid), p->parent[i].path.buf, p->parent[i].path.buf);
+ }
num_paths++;
+ }
+ trace_printf("Wink diff_tree_combined: found %d surviving paths\n", num_paths);
/* order paths according to diffcore_order */
if (opt->orderfile && num_paths) {
wink@fwlaptop 25-01-03T18:43:18.695Z:~/prgs/forks/git (wink-segfault-with-minimal-changes)
```
I made those changes on the `next` branch of git/git repo as of yesterday, 1/2/25,
here are the the relavant logs:
```
wink@fwlaptop 25-01-03T18:43:18.695Z:~/prgs/forks/git (wink-segfault-with-minimal-changes)
$ git log -2 --oneline
edf34e3ab4 (HEAD -> wink-segfault-with-minimal-changes, origin/wink-segfault-with-minimal-changes) bug: SEGFAULT with minimal changes:
6c04ab211c (upstream/next, origin/next, next) Sync with 'master'
wink@fwlaptop 25-01-03T18:44:13.976Z:~/prgs/forks/git (wink-segfault-with-minimal-changes)
```
What did you expect to happen? (Expected behavior)
When I use `git diff-tree --cc` on a merge commit with changes I expect to see
my trace_printf statements print the `struct combined_diff_path` members and
also the output diff-tree -cc results with the "combined changes" as can be
seen below:
```
wink@fwlaptop 25-01-03T18:46:58.472Z:~/prgs/forks/git (wink-segfault-with-minimal-changes)
$ GIT_TRACE=1 ./git diff-tree --cc 6f8ae955bda8ad246cc1f5f7a15f1c3b1c04696a
10:47:04.259504 git.c:476 trace: built-in: git diff-tree --cc 6f8ae955bda8ad246cc1f5f7a15f1c3b1c04696a
6f8ae955bda8ad246cc1f5f7a15f1c3b1c04696a
10:47:04.278220 combine-diff.c:1600 Wink diff_tree_combined: find number of surviving paths num_parent=2
10:47:04.278245 combine-diff.c:1602 Wink diff_tree_combined: num_paths=0 &p=0x5dd590883f60 mode=81a4, oid=0f41b2fd4a6b679a1cfcaa9a584c382068146212 path=refs.c
10:47:04.278253 combine-diff.c:1604 Wink diff_tree_combined: &p->parent[0]=0x5dd590883f98 status=M mode=81a4 oid=7dd5e9fa3323111f06303674b213ae24ed2d04b6 path.buf=(nil) contents path.buf=(null)
10:47:04.278260 combine-diff.c:1604 Wink diff_tree_combined: &p->parent[1]=0x5dd590883fe0 status=M mode=81a4 oid=c55583986940d8ef1e1c839364c03cd92d4f7114 path.buf=(nil) contents path.buf=(null)
10:47:04.278265 combine-diff.c:1602 Wink diff_tree_combined: num_paths=1 &p=0x5dd59088b450 mode=81a4, oid=a0cdd99250e8286b55808b697b0a94afac5d8319 path=refs.h
10:47:04.278270 combine-diff.c:1604 Wink diff_tree_combined: &p->parent[0]=0x5dd59088b488 status=M mode=81a4 oid=09be47afbee51e99f4ae49588cd65596ccfcb07e path.buf=(nil) contents path.buf=(null)
10:47:04.278274 combine-diff.c:1604 Wink diff_tree_combined: &p->parent[1]=0x5dd59088b4d0 status=M mode=81a4 oid=b0dfc65ed2e59c4b66967840339f81e7746a96d3 path.buf=(nil) contents path.buf=(null)
10:47:04.278278 combine-diff.c:1602 Wink diff_tree_combined: num_paths=2 &p=0x5dd59088b530 mode=81a4, oid=5cfb8b7ca8678e171b8e8a7ad6daf1af74a81b59 path=refs/files-backend.c
10:47:04.278283 combine-diff.c:1604 Wink diff_tree_combined: &p->parent[0]=0x5dd59088b568 status=M mode=81a4 oid=467fe347fa7e7d82ed7a2836e43ea749bb90ad7d path.buf=(nil) contents path.buf=(null)
10:47:04.278287 combine-diff.c:1604 Wink diff_tree_combined: &p->parent[1]=0x5dd59088b5b0 status=M mode=81a4 oid=8953d1c6d37b13b0db701888b3db92fd87a68aaa path.buf=(nil) contents path.buf=(null)
10:47:04.278291 combine-diff.c:1602 Wink diff_tree_combined: num_paths=3 &p=0x5dd59088b620 mode=81a4, oid=16550862d3ebe3b357c52254088b143c7ba000d6 path=refs/refs-internal.h
10:47:04.278296 combine-diff.c:1604 Wink diff_tree_combined: &p->parent[0]=0x5dd59088b658 status=M mode=81a4 oid=66e66e0fc1e812ebebd1d4b0119899c84bf1c0ae path.buf=(nil) contents path.buf=(null)
10:47:04.278300 combine-diff.c:1604 Wink diff_tree_combined: &p->parent[1]=0x5dd59088b6a0 status=M mode=81a4 oid=79b287c5ec5c7d8f759869cf93cda405640186dc path.buf=(nil) contents path.buf=(null)
10:47:04.278305 combine-diff.c:1602 Wink diff_tree_combined: num_paths=4 &p=0x5dd59088b710 mode=81a4, oid=00d95a9a2f42ce74c5cb4a42175b0953287851a6 path=refs/reftable-backend.c
10:47:04.278309 combine-diff.c:1604 Wink diff_tree_combined: &p->parent[0]=0x5dd59088b748 status=M mode=81a4 oid=8a2a5b847c3d86332e319da69bfb5c8a56a10e86 path.buf=(nil) contents path.buf=(null)
10:47:04.278314 combine-diff.c:1604 Wink diff_tree_combined: &p->parent[1]=0x5dd59088b790 status=M mode=81a4 oid=bec5962debea7b62572d08f6fa8fd38ab4cd8af6 path.buf=(nil) contents path.buf=(null)
10:47:04.278318 combine-diff.c:1609 Wink diff_tree_combined: found 5 surviving paths
diff --cc refs/files-backend.c
index 467fe347fa,8953d1c6d3..5cfb8b7ca8
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@@ -2533,9 -2539,15 +2543,15 @@@ static int check_old_oid(struct ref_upd
oid_to_hex(oid),
oid_to_hex(&update->old_oid));
- return -1;
+ return ret;
}
+ struct files_transaction_backend_data {
+ struct ref_transaction *packed_transaction;
+ int packed_refs_locked;
+ struct strmap ref_locks;
+ };
+
/*
* Prepare for carrying out update:
* - Lock the reference referred to by update.
wink@fwlaptop 25-01-03T18:47:04.295Z:~/prgs/forks/git (wink-segfault-with-minimal-changes)
```
The above works because it has a hack fix I introduced and thus it works. The hack is to always
use `find_paths_generic` rather than `find_paths_multitree`. To do this I added
`need_generic_pathscan = true;` on top of the above change to have it run properly:
```
wink@fwlaptop 25-01-03T18:47:04.295Z:~/prgs/forks/git (wink-segfault-with-minimal-changes)
$ git diff
diff --git a/combine-diff.c b/combine-diff.c
index 455bc19087..f03ff6f820 100644
--- a/combine-diff.c
+++ b/combine-diff.c
@@ -1563,7 +1563,7 @@ void diff_tree_combined(const struct object_id *oid,
(opt->pickaxe_opts &
(DIFF_PICKAXE_KINDS_MASK & ~DIFF_PICKAXE_KIND_OBJFIND)) ||
opt->filter;
-
+ need_generic_pathscan = true;
if (need_generic_pathscan) {
/*
* NOTE generic case also handles --stat, as it computes
wink@fwlaptop 25-01-03T18:49:56.900Z:~/prgs/forks/git (wink-segfault-with-minimal-changes)
```
What happened instead? (Actual behavior)
If I don't use my hack a SEGFAULT occurs:
```
wink@fwlaptop 25-01-03T18:51:32.242Z:~/prgs/forks/git (wink-segfault-with-minimal-changes)
$ GIT_TRACE=1 ./git diff-tree --cc 6f8ae955bda8ad246cc1f5f7a15f1c3b1c04696a
10:51:37.211992 git.c:476 trace: built-in: git diff-tree --cc 6f8ae955bda8ad246cc1f5f7a15f1c3b1c04696a
6f8ae955bda8ad246cc1f5f7a15f1c3b1c04696a
10:51:37.216500 combine-diff.c:1600 Wink diff_tree_combined: find number of surviving paths num_parent=2
10:51:37.216516 combine-diff.c:1602 Wink diff_tree_combined: num_paths=0 &p=0x6325add91f60 mode=81a4, oid=0f41b2fd4a6b679a1cfcaa9a584c382068146212 path=refs.c
10:51:37.216524 combine-diff.c:1604 Wink diff_tree_combined: &p->parent[0]=0x6325add91f98 status=M mode=81a4 oid=7dd5e9fa3323111f06303674b213ae24ed2d04b6 path.buf=(nil) contents path.buf=(null)
10:51:37.216530 combine-diff.c:1604 Wink diff_tree_combined: &p->parent[1]=0x6325add91fe0 status=M mode=81a4 oid=c55583986940d8ef1e1c839364c03cd92d4f7114 path.buf=0x632588e43a00 contents path.buf=�C%c
10:51:37.216536 combine-diff.c:1602 Wink diff_tree_combined: num_paths=1 &p=0x6325add990c0 mode=81a4, oid=a0cdd99250e8286b55808b697b0a94afac5d8319 path=refs.h
Segmentation fault (core dumped)
wink@fwlaptop 25-01-03T18:51:37.350Z:~/prgs/forks/git (wink-segfault-with-minimal-changes)
```
What's different between what you expected and what actually happened?
There should be no SEGFAULT
Anything else you want to add:
I have a guess on what the problem might be; that `find_paths_multitree` is not properly
initializing path.buf. I determined this because, in my limited testing, if I always use
`find_paths_generic` we see that all the pointers are NULL and we don't SEGFAULT.
Please review the rest of the bug report below.
You can delete any lines you don't wish to share.
[System Info]
git version:
git version 2.48.0.rc1.242.gedf34e3ab4
cpu: x86_64
no commit associated with this build
sizeof-long: 8
sizeof-size_t: 8
shell-path: /bin/sh
libcurl: 8.11.0
OpenSSL: OpenSSL 3.4.0 22 Oct 2024
zlib: 1.3.1
uname: Linux 6.12.6-arch1-1 #1 SMP PREEMPT_DYNAMIC Thu, 19 Dec 2024 21:29:01 +0000 x86_64
compiler info: gnuc: 14.2
libc info: glibc: 2.40
$SHELL (typically, interactive shell): /bin/bash
[Enabled Hooks]
^ permalink raw reply related [flat|nested] 38+ messages in thread
* Re: [BUGREPORT] git diff-tree --cc SEGFAUTs
2025-01-03 19:28 [BUGREPORT] git diff-tree --cc SEGFAUTs Wink Saville
@ 2025-01-03 20:46 ` Jeff King
2025-01-03 23:34 ` Wink Saville
0 siblings, 1 reply; 38+ messages in thread
From: Jeff King @ 2025-01-03 20:46 UTC (permalink / raw)
To: Wink Saville; +Cc: Git List
On Fri, Jan 03, 2025 at 11:28:47AM -0800, Wink Saville wrote:
> `git diff-tree --cc` SEGFAUTs after adding trace_printf to diff_tree_combined.
Hmm, is it really a bug in Git if you had to add new code which contains
the bug? :)
> @@ -1595,8 +1597,16 @@ void diff_tree_combined(const struct object_id *oid,
> }
>
> /* find out number of surviving paths */
> - for (num_paths = 0, p = paths; p; p = p->next)
> + trace_printf("Wink diff_tree_combined: find number of surviving paths num_parent=%d\n", num_parent);
> + for (num_paths = 0, p = paths; p; p = p->next) {
> + trace_printf("Wink diff_tree_combined: num_paths=%d &p=%p mode=%0x, oid=%s path=%s\n", num_paths, p, p->mode, oid_to_hex(&p->oid), p->path);
> + for (i = 0; i < num_parent; i++) {
> + trace_printf("Wink diff_tree_combined: &p->parent[%d]=%p status=%c mode=%x oid=%s path.buf=%p contents path.buf=%s\n",
> + i, &p->parent[i], p->parent[i].status, p->parent[i].mode, oid_to_hex(&p->parent[i].oid), p->parent[i].path.buf, p->parent[i].path.buf);
> + }
The parent "path" strbufs are only initialized in intersect_paths() if
combined_all_paths is set, and if there was an actual path change (a
copy or rename).
So you'd probably need something like this:
diff --git a/combine-diff.c b/combine-diff.c
index 455bc19087..1e58809c4e 100644
--- a/combine-diff.c
+++ b/combine-diff.c
@@ -1601,8 +1601,11 @@ void diff_tree_combined(const struct object_id *oid,
for (num_paths = 0, p = paths; p; p = p->next) {
trace_printf("Wink diff_tree_combined: num_paths=%d &p=%p mode=%0x, oid=%s path=%s\n", num_paths, p, p->mode, oid_to_hex(&p->oid), p->path);
for (i = 0; i < num_parent; i++) {
+ const char *path = rev->combine_all_paths &&
+ filename_changed(p->parent[i].status) ?
+ p->parent[i].path.buf : NULL;
trace_printf("Wink diff_tree_combined: &p->parent[%d]=%p status=%c mode=%x oid=%s path.buf=%p contents path.buf=%s\n",
- i, &p->parent[i], p->parent[i].status, p->parent[i].mode, oid_to_hex(&p->parent[i].oid), p->parent[i].path.buf, p->parent[i].path.buf);
+ i, &p->parent[i], p->parent[i].status, p->parent[i].mode, oid_to_hex(&p->parent[i].oid), path, path);
}
num_paths++;
}
-Peff
^ permalink raw reply related [flat|nested] 38+ messages in thread
* Re: [BUGREPORT] git diff-tree --cc SEGFAUTs
2025-01-03 20:46 ` Jeff King
@ 2025-01-03 23:34 ` Wink Saville
2025-01-04 0:31 ` Jeff King
0 siblings, 1 reply; 38+ messages in thread
From: Wink Saville @ 2025-01-03 23:34 UTC (permalink / raw)
To: Jeff King; +Cc: Git List
On Fri, Jan 3, 2025 at 12:46 PM Jeff King <peff@peff.net> wrote:
>
> On Fri, Jan 03, 2025 at 11:28:47AM -0800, Wink Saville wrote:
>
> > `git diff-tree --cc` SEGFAUTs after adding trace_printf to diff_tree_combined.
>
> Hmm, is it really a bug in Git if you had to add new code which contains
> the bug? :)
>
> > @@ -1595,8 +1597,16 @@ void diff_tree_combined(const struct object_id *oid,
> > }
> >
> > /* find out number of surviving paths */
> > - for (num_paths = 0, p = paths; p; p = p->next)
> > + trace_printf("Wink diff_tree_combined: find number of surviving paths num_parent=%d\n", num_parent);
> > + for (num_paths = 0, p = paths; p; p = p->next) {
> > + trace_printf("Wink diff_tree_combined: num_paths=%d &p=%p mode=%0x, oid=%s path=%s\n", num_paths, p, p->mode, oid_to_hex(&p->oid), p->path);
> > + for (i = 0; i < num_parent; i++) {
> > + trace_printf("Wink diff_tree_combined: &p->parent[%d]=%p status=%c mode=%x oid=%s path.buf=%p contents path.buf=%s\n",
> > + i, &p->parent[i], p->parent[i].status, p->parent[i].mode, oid_to_hex(&p->parent[i].oid), p->parent[i].path.buf, p->parent[i].path.buf);
> > + }
>
> The parent "path" strbufs are only initialized in intersect_paths() if
> combined_all_paths is set, and if there was an actual path change (a
> copy or rename).
>
> So you'd probably need something like this:
>
> diff --git a/combine-diff.c b/combine-diff.c
> index 455bc19087..1e58809c4e 100644
> --- a/combine-diff.c
> +++ b/combine-diff.c
> @@ -1601,8 +1601,11 @@ void diff_tree_combined(const struct object_id *oid,
> for (num_paths = 0, p = paths; p; p = p->next) {
> trace_printf("Wink diff_tree_combined: num_paths=%d &p=%p mode=%0x, oid=%s path=%s\n", num_paths, p, p->mode, oid_to_hex(&p->oid), p->path);
> for (i = 0; i < num_parent; i++) {
> + const char *path = rev->combine_all_paths &&
> + filename_changed(p->parent[i].status) ?
> + p->parent[i].path.buf : NULL;
> trace_printf("Wink diff_tree_combined: &p->parent[%d]=%p status=%c mode=%x oid=%s path.buf=%p contents path.buf=%s\n",
> - i, &p->parent[i], p->parent[i].status, p->parent[i].mode, oid_to_hex(&p->parent[i].oid), p->parent[i].path.buf, p->parent[i].path.buf);
> + i, &p->parent[i], p->parent[i].status, p->parent[i].mode, oid_to_hex(&p->parent[i].oid), path, path);
> }
> num_paths++;
> }
>
> -Peff
TYVM!
That worked but changed the name and fixed a typo in `combined_all_paths`:
```
wink@3900x 25-01-03T23:06:08.344Z:~/data/prgs/forks/git
(wink-segfault-with-minimal-changes)
$ git diff
diff --git a/combine-diff.c b/combine-diff.c
index 455bc19087..70394c3350 100644
--- a/combine-diff.c
+++ b/combine-diff.c
@@ -1601,8 +1601,9 @@ void diff_tree_combined(const struct object_id *oid,
for (num_paths = 0, p = paths; p; p = p->next) {
trace_printf("Wink diff_tree_combined: num_paths=%d
&p=%p mode=%0x, oid=%s path=%s\n", num_paths, p, p->mode,
oid_to_hex(&p->oid), p->path);
for (i = 0; i < num_parent; i++) {
+ const char *parent_path =
rev->combined_all_paths && filename_changed(p->parent[i].status) ?
p->parent[i].path.buf : NULL;
trace_printf("Wink diff_tree_combined:
&p->parent[%d]=%p status=%c mode=%x oid=%s path.buf=%p contents
path.buf=%s\n",
- i, &p->parent[i],
p->parent[i].status, p->parent[i].mode, oid_to_hex(&p->parent[i].oid),
p->parent[i].path.buf, p->parent[i].path.buf);
+ i, &p->parent[i],
p->parent[i].status, p->parent[i].mode, oid_to_hex(&p->parent[i].oid),
parent_path, parent_path);
}
num_paths++;
}
```
But having to protect yourself is unobvious and especially if it isn't necessary
when using the `fetch_paths_generic`.
In addition, from strbuf.h `buf` is never NULL:
"
* strbufs have some invariants that are very important to keep in mind:
*
* - The `buf` member is never NULL, so it can be used in any usual C
* string operations safely. strbufs _have_ to be initialized either by
* `strbuf_init()` or by `= STRBUF_INIT` before the invariants, though.
*
"
So I'd say this could be considered a bug in git at least in how
combine_diff_path
is being managed. I assume you agree that neither find_paths_generic or
find_paths_multitree are adhering to at least that strbuf invariant and I wonder
if the other strbuf invariants are being upheld.
So, should this bug be "closed" and a new one "created"?
Actually, using the mailing list to identify bugs and initially discuss
them, seems fine. But is there a place where there is a list of current bugs and
their state?
-- wink
^ permalink raw reply related [flat|nested] 38+ messages in thread
* Re: [BUGREPORT] git diff-tree --cc SEGFAUTs
2025-01-03 23:34 ` Wink Saville
@ 2025-01-04 0:31 ` Jeff King
2025-01-04 2:55 ` Junio C Hamano
0 siblings, 1 reply; 38+ messages in thread
From: Jeff King @ 2025-01-04 0:31 UTC (permalink / raw)
To: Wink Saville; +Cc: Git List
On Fri, Jan 03, 2025 at 03:34:58PM -0800, Wink Saville wrote:
> But having to protect yourself is unobvious and especially if it isn't necessary
> when using the `fetch_paths_generic`.
>
> In addition, from strbuf.h `buf` is never NULL:
>
> "
> * strbufs have some invariants that are very important to keep in mind:
> *
> * - The `buf` member is never NULL, so it can be used in any usual C
> * string operations safely. strbufs _have_ to be initialized either by
> * `strbuf_init()` or by `= STRBUF_INIT` before the invariants, though.
> *
> "
>
> So I'd say this could be considered a bug in git at least in how
> combine_diff_path
> is being managed. I assume you agree that neither find_paths_generic or
> find_paths_multitree are adhering to at least that strbuf invariant and I wonder
> if the other strbuf invariants are being upheld.
The strbuf invariant can only be held on strbufs which have been
initialized, and this one has not. I don't think it's wrong to have
variables which not (yet) been initialized. It can make for a fragile
interface, though, if uninitialized struct members are exposed widely.
I'm not sure how wide this case is. It's mostly an internal combine-diff
data structure, though it looks like it gets exposed to other code in a
few spots (though nobody outside of combine-diff.c currently looks at
the parent paths at all).
So I wouldn't call it a bug, as the internals of Git are not part of the
public interface and there is no user-visible behavior problem without
patching. But I doubt anybody would object to a patch making the API
less fragile if it can be done cheaply and easily. And strbufs are
designed to be cheap to initialize. So something like (completely
untested):
diff --git a/combine-diff.c b/combine-diff.c
index 641bc92dbd..452b5f5beb 100644
--- a/combine-diff.c
+++ b/combine-diff.c
@@ -67,9 +67,9 @@ static struct combine_diff_path *intersect_paths(
p->parent[n].mode = q->queue[i]->one->mode;
p->parent[n].status = q->queue[i]->status;
+ strbuf_init(&p->parent[n].path, 0);
if (combined_all_paths &&
filename_changed(p->parent[n].status)) {
- strbuf_init(&p->parent[n].path, 0);
strbuf_addstr(&p->parent[n].path,
q->queue[i]->one->path);
}
@@ -92,9 +92,7 @@ static struct combine_diff_path *intersect_paths(
/* p->path not in q->queue[]; drop it */
*tail = p->next;
for (j = 0; j < num_parent; j++)
- if (combined_all_paths &&
- filename_changed(p->parent[j].status))
- strbuf_release(&p->parent[j].path);
+ strbuf_release(&p->parent[j].path);
free(p);
continue;
}
@@ -1645,9 +1643,7 @@ void diff_tree_combined(const struct object_id *oid,
struct combine_diff_path *tmp = paths;
paths = paths->next;
for (i = 0; i < num_parent; i++)
- if (rev->combined_all_paths &&
- filename_changed(tmp->parent[i].status))
- strbuf_release(&tmp->parent[i].path);
+ strbuf_release(&tmp->parent[i].path);
free(tmp);
}
might help the uninitialized-pointer issue. OTOH it is not really
solving the more fundamental problem, which is that p->parent[i].path is
only sometimes useful (we do not fill it in if it would just be the same
as p->path, so the patch only changes it from uninitialized memory into
an empty strbuf).
And that is probably not something we want to change, as allocating
duplicates of each path may be expensive. Probably we'd be better to
encapsulate it in a function which falls back to p->path automatically.
But then, AFAICT there are only two sites (both inside combine-diff.c)
which look at it, so it would mostly be hypothetical future-proofing. I
dunno.
> So, should this bug be "closed" and a new one "created"?
>
> Actually, using the mailing list to identify bugs and initially discuss
> them, seems fine. But is there a place where there is a list of current bugs and
> their state?
No, there's no bug tracker for the project[1]. Discussion may lead to a
patch or not, which may be applied or not, but there is no formal
classification of "open" or "fixed" or "won't fix".
-Peff
[1] Some folks seem to be using:
https://git.issues.gerritcodereview.com/issues?q=status:open
but I don't know how active it is. I never use it and had to dig out
the link to https://crbug.com, which now redirects there, from a
message from 2017. There's some recent activity, but I wouldn't
count on opening something there to get wide attention.
^ permalink raw reply related [flat|nested] 38+ messages in thread
* Re: [BUGREPORT] git diff-tree --cc SEGFAUTs
2025-01-04 0:31 ` Jeff King
@ 2025-01-04 2:55 ` Junio C Hamano
2025-01-04 3:32 ` Jeff King
0 siblings, 1 reply; 38+ messages in thread
From: Junio C Hamano @ 2025-01-04 2:55 UTC (permalink / raw)
To: Jeff King; +Cc: Wink Saville, Git List
Jeff King <peff@peff.net> writes:
> ... OTOH it is not really
> solving the more fundamental problem, which is that p->parent[i].path is
> only sometimes useful (we do not fill it in if it would just be the same
> as p->path, so the patch only changes it from uninitialized memory into
> an empty strbuf).
>
> And that is probably not something we want to change, as allocating
> duplicates of each path may be expensive.
Nicely said. I reached the same conclusion after looking at the
existing code, even though I have to admit that I am not a huge fan
of the more recent part of combine-diff.c and its data structures.
Thanks.
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [BUGREPORT] git diff-tree --cc SEGFAUTs
2025-01-04 2:55 ` Junio C Hamano
@ 2025-01-04 3:32 ` Jeff King
2025-01-04 18:09 ` Wink Saville
2025-01-09 8:27 ` [PATCH 0/14] combine-diff cleanups Jeff King
0 siblings, 2 replies; 38+ messages in thread
From: Jeff King @ 2025-01-04 3:32 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Wink Saville, Git List
On Fri, Jan 03, 2025 at 06:55:18PM -0800, Junio C Hamano wrote:
> Jeff King <peff@peff.net> writes:
>
> > ... OTOH it is not really
> > solving the more fundamental problem, which is that p->parent[i].path is
> > only sometimes useful (we do not fill it in if it would just be the same
> > as p->path, so the patch only changes it from uninitialized memory into
> > an empty strbuf).
> >
> > And that is probably not something we want to change, as allocating
> > duplicates of each path may be expensive.
>
> Nicely said. I reached the same conclusion after looking at the
> existing code, even though I have to admit that I am not a huge fan
> of the more recent part of combine-diff.c and its data structures.
I poked at this a little bit more, so here are a few tidbits:
- the patch I showed earlier is not sufficient! There are lots of
other spots that create combine_diff_path structs but don't bother
to put anything in the parent paths at all. It works now because
they also don't set a status that triggers filename_changed(). But
what I showed earlier was wrong, because it was assuming in the
cleanup functions that the strbufs were always initialized.
- there's really no need for a strbuf at all here. It is always
uninitialized/empty, or contains a direct copy of a path string. So
a raw pointer with xstrdup() is plenty. And then we can use NULL to
mean "it was not set".
Which would Just Work for all those other spots if they bothered to
zero the memory they allocated, but they don't. So we have to update
them to set it to NULL anyway. That patch is below.
- it is not at all clear to me that we need to be allocating at all.
We always copy a string from the diff_queue. Do our
combine_diff_path structs persist beyond then? I'm not sure. It is
probably asking for trouble to just point to them directly without
copying, as it creates a dependency (that even if it is not needed
now, is a trap for somebody later). But it would drop some
allocation/cleanup code, and we could just have p->parent[i].path
fall back to p->path naturally.
diff --git a/combine-diff.c b/combine-diff.c
index 641bc92dbd..0d9d344c4e 100644
--- a/combine-diff.c
+++ b/combine-diff.c
@@ -66,13 +66,9 @@ static struct combine_diff_path *intersect_paths(
oidcpy(&p->parent[n].oid, &q->queue[i]->one->oid);
p->parent[n].mode = q->queue[i]->one->mode;
p->parent[n].status = q->queue[i]->status;
-
- if (combined_all_paths &&
- filename_changed(p->parent[n].status)) {
- strbuf_init(&p->parent[n].path, 0);
- strbuf_addstr(&p->parent[n].path,
- q->queue[i]->one->path);
- }
+ p->parent[n].path = combined_all_paths &&
+ filename_changed(p->parent[n].status) ?
+ xstrdup(q->queue[i]->one->path) : NULL;
*tail = p;
tail = &p->next;
}
@@ -92,9 +88,7 @@ static struct combine_diff_path *intersect_paths(
/* p->path not in q->queue[]; drop it */
*tail = p->next;
for (j = 0; j < num_parent; j++)
- if (combined_all_paths &&
- filename_changed(p->parent[j].status))
- strbuf_release(&p->parent[j].path);
+ free(p->parent[j].path);
free(p);
continue;
}
@@ -108,10 +102,9 @@ static struct combine_diff_path *intersect_paths(
oidcpy(&p->parent[n].oid, &q->queue[i]->one->oid);
p->parent[n].mode = q->queue[i]->one->mode;
p->parent[n].status = q->queue[i]->status;
- if (combined_all_paths &&
- filename_changed(p->parent[n].status))
- strbuf_addstr(&p->parent[n].path,
- q->queue[i]->one->path);
+ p->parent[n].path = combined_all_paths &&
+ filename_changed(p->parent[n].status) ?
+ xstrdup(q->queue[i]->one->path) : NULL;
tail = &p->next;
i++;
@@ -996,8 +989,9 @@ static void show_combined_header(struct combine_diff_path *elem,
if (rev->combined_all_paths) {
for (i = 0; i < num_parent; i++) {
- char *path = filename_changed(elem->parent[i].status)
- ? elem->parent[i].path.buf : elem->path;
+ const char *path = elem->parent[i].path ?
+ elem->parent[i].path :
+ elem->path;
if (elem->parent[i].status == DIFF_STATUS_ADDED)
dump_quoted_path("--- ", "", "/dev/null",
line_prefix, c_meta, c_reset);
@@ -1278,12 +1272,10 @@ static void show_raw_diff(struct combine_diff_path *p, int num_parent, struct re
for (i = 0; i < num_parent; i++)
if (rev->combined_all_paths) {
- if (filename_changed(p->parent[i].status))
- write_name_quoted(p->parent[i].path.buf, stdout,
- inter_name_termination);
- else
- write_name_quoted(p->path, stdout,
- inter_name_termination);
+ const char *path = p->parent[i].path ?
+ p->parent[i].path :
+ p->path;
+ write_name_quoted(path, stdout, inter_name_termination);
}
write_name_quoted(p->path, stdout, line_termination);
}
@@ -1645,9 +1637,7 @@ void diff_tree_combined(const struct object_id *oid,
struct combine_diff_path *tmp = paths;
paths = paths->next;
for (i = 0; i < num_parent; i++)
- if (rev->combined_all_paths &&
- filename_changed(tmp->parent[i].status))
- strbuf_release(&tmp->parent[i].path);
+ free(tmp->parent[i].path);
free(tmp);
}
diff --git a/diff-lib.c b/diff-lib.c
index c6d3bc4d37..88a5aed736 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -417,9 +417,11 @@ static int show_modified(struct rev_info *revs,
memset(p->parent, 0, 2 * sizeof(struct combine_diff_parent));
p->parent[0].status = DIFF_STATUS_MODIFIED;
p->parent[0].mode = new_entry->ce_mode;
+ p->parent[0].path = NULL;
oidcpy(&p->parent[0].oid, &new_entry->oid);
p->parent[1].status = DIFF_STATUS_MODIFIED;
p->parent[1].mode = old_entry->ce_mode;
+ p->parent[1].path = NULL;
oidcpy(&p->parent[1].oid, &old_entry->oid);
show_combined_diff(p, 2, revs);
free(p);
diff --git a/diff.h b/diff.h
index 6e6007c17b..3157faeabb 100644
--- a/diff.h
+++ b/diff.h
@@ -480,7 +480,7 @@ struct combine_diff_path {
char status;
unsigned int mode;
struct object_id oid;
- struct strbuf path;
+ char *path;
} parent[FLEX_ARRAY];
};
#define combine_diff_path_size(n, l) \
diff --git a/tree-diff.c b/tree-diff.c
index d9237ffd9b..57af377c2b 100644
--- a/tree-diff.c
+++ b/tree-diff.c
@@ -272,6 +272,7 @@ static struct combine_diff_path *emit_path(struct combine_diff_path *p,
}
p->parent[i].mode = mode_i;
+ p->parent[i].path = NULL;
oidcpy(&p->parent[i].oid, oid_i);
}
^ permalink raw reply related [flat|nested] 38+ messages in thread
* Re: [BUGREPORT] git diff-tree --cc SEGFAUTs
2025-01-04 3:32 ` Jeff King
@ 2025-01-04 18:09 ` Wink Saville
2025-01-05 22:13 ` Wink Saville
2025-01-09 8:27 ` [PATCH 0/14] combine-diff cleanups Jeff King
1 sibling, 1 reply; 38+ messages in thread
From: Wink Saville @ 2025-01-04 18:09 UTC (permalink / raw)
To: Jeff King; +Cc: Junio C Hamano, Git List
On Fri, Jan 3, 2025 at 7:32 PM Jeff King <peff@peff.net> wrote:
>
> On Fri, Jan 03, 2025 at 06:55:18PM -0800, Junio C Hamano wrote:
>
> > Jeff King <peff@peff.net> writes:
> >
> > > ... OTOH it is not really
> > > solving the more fundamental problem, which is that p->parent[i].path is
> > > only sometimes useful (we do not fill it in if it would just be the same
> > > as p->path, so the patch only changes it from uninitialized memory into
> > > an empty strbuf).
> > >
> > > And that is probably not something we want to change, as allocating
> > > duplicates of each path may be expensive.
> >
> > Nicely said. I reached the same conclusion after looking at the
> > existing code, even though I have to admit that I am not a huge fan
> > of the more recent part of combine-diff.c and its data structures.
>
> I poked at this a little bit more, so here are a few tidbits:
>
> - the patch I showed earlier is not sufficient! There are lots of
> other spots that create combine_diff_path structs but don't bother
> to put anything in the parent paths at all. It works now because
> they also don't set a status that triggers filename_changed(). But
> what I showed earlier was wrong, because it was assuming in the
> cleanup functions that the strbufs were always initialized.
>
> - there's really no need for a strbuf at all here. It is always
> uninitialized/empty, or contains a direct copy of a path string. So
> a raw pointer with xstrdup() is plenty. And then we can use NULL to
> mean "it was not set".
>
> Which would Just Work for all those other spots if they bothered to
> zero the memory they allocated, but they don't. So we have to update
> them to set it to NULL anyway. That patch is below.
>
> - it is not at all clear to me that we need to be allocating at all.
> We always copy a string from the diff_queue. Do our
> combine_diff_path structs persist beyond then? I'm not sure. It is
> probably asking for trouble to just point to them directly without
> copying, as it creates a dependency (that even if it is not needed
> now, is a trap for somebody later). But it would drop some
> allocation/cleanup code, and we could just have p->parent[i].path
> fall back to p->path naturally.
>
> diff --git a/combine-diff.c b/combine-diff.c
> index 641bc92dbd..0d9d344c4e 100644
> --- a/combine-diff.c
> +++ b/combine-diff.c
> @@ -66,13 +66,9 @@ static struct combine_diff_path *intersect_paths(
> oidcpy(&p->parent[n].oid, &q->queue[i]->one->oid);
> p->parent[n].mode = q->queue[i]->one->mode;
> p->parent[n].status = q->queue[i]->status;
> -
> - if (combined_all_paths &&
> - filename_changed(p->parent[n].status)) {
> - strbuf_init(&p->parent[n].path, 0);
> - strbuf_addstr(&p->parent[n].path,
> - q->queue[i]->one->path);
> - }
> + p->parent[n].path = combined_all_paths &&
> + filename_changed(p->parent[n].status) ?
> + xstrdup(q->queue[i]->one->path) : NULL;
> *tail = p;
> tail = &p->next;
> }
> @@ -92,9 +88,7 @@ static struct combine_diff_path *intersect_paths(
> /* p->path not in q->queue[]; drop it */
> *tail = p->next;
> for (j = 0; j < num_parent; j++)
> - if (combined_all_paths &&
> - filename_changed(p->parent[j].status))
> - strbuf_release(&p->parent[j].path);
> + free(p->parent[j].path);
> free(p);
> continue;
> }
> @@ -108,10 +102,9 @@ static struct combine_diff_path *intersect_paths(
> oidcpy(&p->parent[n].oid, &q->queue[i]->one->oid);
> p->parent[n].mode = q->queue[i]->one->mode;
> p->parent[n].status = q->queue[i]->status;
> - if (combined_all_paths &&
> - filename_changed(p->parent[n].status))
> - strbuf_addstr(&p->parent[n].path,
> - q->queue[i]->one->path);
> + p->parent[n].path = combined_all_paths &&
> + filename_changed(p->parent[n].status) ?
> + xstrdup(q->queue[i]->one->path) : NULL;
>
> tail = &p->next;
> i++;
> @@ -996,8 +989,9 @@ static void show_combined_header(struct combine_diff_path *elem,
>
> if (rev->combined_all_paths) {
> for (i = 0; i < num_parent; i++) {
> - char *path = filename_changed(elem->parent[i].status)
> - ? elem->parent[i].path.buf : elem->path;
> + const char *path = elem->parent[i].path ?
> + elem->parent[i].path :
> + elem->path;
> if (elem->parent[i].status == DIFF_STATUS_ADDED)
> dump_quoted_path("--- ", "", "/dev/null",
> line_prefix, c_meta, c_reset);
> @@ -1278,12 +1272,10 @@ static void show_raw_diff(struct combine_diff_path *p, int num_parent, struct re
>
> for (i = 0; i < num_parent; i++)
> if (rev->combined_all_paths) {
> - if (filename_changed(p->parent[i].status))
> - write_name_quoted(p->parent[i].path.buf, stdout,
> - inter_name_termination);
> - else
> - write_name_quoted(p->path, stdout,
> - inter_name_termination);
> + const char *path = p->parent[i].path ?
> + p->parent[i].path :
> + p->path;
> + write_name_quoted(path, stdout, inter_name_termination);
> }
> write_name_quoted(p->path, stdout, line_termination);
> }
> @@ -1645,9 +1637,7 @@ void diff_tree_combined(const struct object_id *oid,
> struct combine_diff_path *tmp = paths;
> paths = paths->next;
> for (i = 0; i < num_parent; i++)
> - if (rev->combined_all_paths &&
> - filename_changed(tmp->parent[i].status))
> - strbuf_release(&tmp->parent[i].path);
> + free(tmp->parent[i].path);
> free(tmp);
> }
>
> diff --git a/diff-lib.c b/diff-lib.c
> index c6d3bc4d37..88a5aed736 100644
> --- a/diff-lib.c
> +++ b/diff-lib.c
> @@ -417,9 +417,11 @@ static int show_modified(struct rev_info *revs,
> memset(p->parent, 0, 2 * sizeof(struct combine_diff_parent));
> p->parent[0].status = DIFF_STATUS_MODIFIED;
> p->parent[0].mode = new_entry->ce_mode;
> + p->parent[0].path = NULL;
> oidcpy(&p->parent[0].oid, &new_entry->oid);
> p->parent[1].status = DIFF_STATUS_MODIFIED;
> p->parent[1].mode = old_entry->ce_mode;
> + p->parent[1].path = NULL;
> oidcpy(&p->parent[1].oid, &old_entry->oid);
> show_combined_diff(p, 2, revs);
> free(p);
> diff --git a/diff.h b/diff.h
> index 6e6007c17b..3157faeabb 100644
> --- a/diff.h
> +++ b/diff.h
> @@ -480,7 +480,7 @@ struct combine_diff_path {
> char status;
> unsigned int mode;
> struct object_id oid;
> - struct strbuf path;
> + char *path;
> } parent[FLEX_ARRAY];
> };
> #define combine_diff_path_size(n, l) \
> diff --git a/tree-diff.c b/tree-diff.c
> index d9237ffd9b..57af377c2b 100644
> --- a/tree-diff.c
> +++ b/tree-diff.c
> @@ -272,6 +272,7 @@ static struct combine_diff_path *emit_path(struct combine_diff_path *p,
> }
>
> p->parent[i].mode = mode_i;
> + p->parent[i].path = NULL;
> oidcpy(&p->parent[i].oid, oid_i);
> }
The above LGTM and hopefully it can be accepted.
With that change I can revert my trace_printfs of combine_diff_path
back to something simple:
```
$ git diff HEAD^
diff --git a/combine-diff.c b/combine-diff.c
index 5e0b7919bc..4764383f20 100644
--- a/combine-diff.c
+++ b/combine-diff.c
@@ -1593,8 +1593,8 @@ void diff_tree_combined(const struct object_id *oid,
for (num_paths = 0, p = paths; p; p = p->next) {
trace_printf("Wink diff_tree_combined: num_paths=%d
&p=%p mode=%0x, oid=%s path=%s\n", num_paths, p, p->mode,
oid_to_hex(&p->oid), p->path);
for (i = 0; i < num_parent; i++) {
- trace_printf("Wink diff_tree_combined:
&p->parent[%d]=%p status=%c mode=%x oid=%s path.buf=%p contents
path.buf=%s\n",
- i, &p->parent[i],
p->parent[i].status, p->parent[i].mode, oid_to_hex(&p->parent[i].oid),
p->parent[i].path.buf, p->parent[i].path.buf);
+ trace_printf("Wink diff_tree_combined:
&p->parent[%d]=%p status=%c mode=%x oid=%s path=%s\n",
+ i, &p->parent[i],
p->parent[i].status, p->parent[i].mode, oid_to_hex(&p->parent[i].oid),
p->parent[i].path);
}
num_paths++;
}
```
And the output doesn't SEGFAULT :)
```
$ GIT_TRACE=1 ./git diff-tree --cc 6f8ae955bda8ad246cc1f5f7a15f1c3b1c04696a
10:06:11.716284 git.c:476 trace: built-in: git diff-tree
--cc 6f8ae955bda8ad246cc1f5f7a15f1c3b1c04696a
6f8ae955bda8ad246cc1f5f7a15f1c3b1c04696a
10:06:11.718102 combine-diff.c:1592 Wink diff_tree_combined: find
number of surviving paths num_parent=2
10:06:11.718108 combine-diff.c:1594 Wink diff_tree_combined:
num_paths=0 &p=0x643ac70f7ef0 mode=81a4,
oid=0f41b2fd4a6b679a1cfcaa9a584c382068146212 path=refs.c
10:06:11.718112 combine-diff.c:1596 Wink diff_tree_combined:
&p->parent[0]=0x643ac70f7f28 status=M mode=81a4
oid=7dd5e9fa3323111f06303674b213ae24ed2d04b6 path=(null)
10:06:11.718116 combine-diff.c:1596 Wink diff_tree_combined:
&p->parent[1]=0x643ac70f7f60 status=M mode=81a4
oid=c55583986940d8ef1e1c839364c03cd92d4f7114 path=(null)
10:06:11.718120 combine-diff.c:1594 Wink diff_tree_combined:
num_paths=1 &p=0x643ac70f7fb0 mode=81a4,
oid=a0cdd99250e8286b55808b697b0a94afac5d8319 path=refs.h
10:06:11.718123 combine-diff.c:1596 Wink diff_tree_combined:
&p->parent[0]=0x643ac70f7fe8 status=M mode=81a4
oid=09be47afbee51e99f4ae49588cd65596ccfcb07e path=(null)
10:06:11.718126 combine-diff.c:1596 Wink diff_tree_combined:
&p->parent[1]=0x643ac70f8020 status=M mode=81a4
oid=b0dfc65ed2e59c4b66967840339f81e7746a96d3 path=(null)
10:06:11.718129 combine-diff.c:1594 Wink diff_tree_combined:
num_paths=2 &p=0x643ac70f8900 mode=81a4,
oid=5cfb8b7ca8678e171b8e8a7ad6daf1af74a81b59 path=refs/files-backend.c
10:06:11.718132 combine-diff.c:1596 Wink diff_tree_combined:
&p->parent[0]=0x643ac70f8938 status=M mode=81a4
oid=467fe347fa7e7d82ed7a2836e43ea749bb90ad7d path=(null)
10:06:11.718135 combine-diff.c:1596 Wink diff_tree_combined:
&p->parent[1]=0x643ac70f8970 status=M mode=81a4
oid=8953d1c6d37b13b0db701888b3db92fd87a68aaa path=(null)
10:06:11.718138 combine-diff.c:1594 Wink diff_tree_combined:
num_paths=3 &p=0x643ac70f89d0 mode=81a4,
oid=16550862d3ebe3b357c52254088b143c7ba000d6 path=refs/refs-internal.h
10:06:11.718142 combine-diff.c:1596 Wink diff_tree_combined:
&p->parent[0]=0x643ac70f8a08 status=M mode=81a4
oid=66e66e0fc1e812ebebd1d4b0119899c84bf1c0ae path=(null)
10:06:11.718162 combine-diff.c:1596 Wink diff_tree_combined:
&p->parent[1]=0x643ac70f8a40 status=M mode=81a4
oid=79b287c5ec5c7d8f759869cf93cda405640186dc path=(null)
10:06:11.718181 combine-diff.c:1594 Wink diff_tree_combined:
num_paths=4 &p=0x643ac70f8aa0 mode=81a4,
oid=00d95a9a2f42ce74c5cb4a42175b0953287851a6
path=refs/reftable-backend.c
10:06:11.718184 combine-diff.c:1596 Wink diff_tree_combined:
&p->parent[0]=0x643ac70f8ad8 status=M mode=81a4
oid=8a2a5b847c3d86332e319da69bfb5c8a56a10e86 path=(null)
10:06:11.718188 combine-diff.c:1596 Wink diff_tree_combined:
&p->parent[1]=0x643ac70f8b10 status=M mode=81a4
oid=bec5962debea7b62572d08f6fa8fd38ab4cd8af6 path=(null)
10:06:11.718192 combine-diff.c:1601 Wink diff_tree_combined: found
5 surviving paths
diff --cc refs/files-backend.c
index 467fe347fa,8953d1c6d3..5cfb8b7ca8
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@@ -2533,9 -2539,15 +2543,15 @@@ static int check_old_oid(struct ref_upd
oid_to_hex(oid),
oid_to_hex(&update->old_oid));
- return -1;
+ return ret;
}
+ struct files_transaction_backend_data {
+ struct ref_transaction *packed_transaction;
+ int packed_refs_locked;
+ struct strmap ref_locks;
+ };
+
/*
* Prepare for carrying out update:
* - Lock the reference referred to by update.
```
^ permalink raw reply related [flat|nested] 38+ messages in thread
* Re: [BUGREPORT] git diff-tree --cc SEGFAUTs
2025-01-04 18:09 ` Wink Saville
@ 2025-01-05 22:13 ` Wink Saville
0 siblings, 0 replies; 38+ messages in thread
From: Wink Saville @ 2025-01-05 22:13 UTC (permalink / raw)
To: Jeff King; +Cc: Junio C Hamano, Git List
[-- Attachment #1: Type: text/plain, Size: 293 bytes --]
I'd like to suggest one minor tweak to Peff's change. Rather than just
changing the type of combine_diff_parent::path to `char *` I like to suggest
changing the name and adding a comment. My inclination is to use
`changed_path` as the field name.
The resulting diff against next is attached.
[-- Attachment #2: wink-changed_path.diff --]
[-- Type: text/x-patch, Size: 4492 bytes --]
diff --git a/combine-diff.c b/combine-diff.c
index 641bc92dbd..be5df18d75 100644
--- a/combine-diff.c
+++ b/combine-diff.c
@@ -66,13 +66,9 @@ static struct combine_diff_path *intersect_paths(
oidcpy(&p->parent[n].oid, &q->queue[i]->one->oid);
p->parent[n].mode = q->queue[i]->one->mode;
p->parent[n].status = q->queue[i]->status;
-
- if (combined_all_paths &&
- filename_changed(p->parent[n].status)) {
- strbuf_init(&p->parent[n].path, 0);
- strbuf_addstr(&p->parent[n].path,
- q->queue[i]->one->path);
- }
+ p->parent[n].changed_path = combined_all_paths &&
+ filename_changed(p->parent[n].status) ?
+ xstrdup(q->queue[i]->one->path) : NULL;
*tail = p;
tail = &p->next;
}
@@ -92,9 +88,7 @@ static struct combine_diff_path *intersect_paths(
/* p->path not in q->queue[]; drop it */
*tail = p->next;
for (j = 0; j < num_parent; j++)
- if (combined_all_paths &&
- filename_changed(p->parent[j].status))
- strbuf_release(&p->parent[j].path);
+ free(p->parent[j].changed_path);
free(p);
continue;
}
@@ -108,10 +102,9 @@ static struct combine_diff_path *intersect_paths(
oidcpy(&p->parent[n].oid, &q->queue[i]->one->oid);
p->parent[n].mode = q->queue[i]->one->mode;
p->parent[n].status = q->queue[i]->status;
- if (combined_all_paths &&
- filename_changed(p->parent[n].status))
- strbuf_addstr(&p->parent[n].path,
- q->queue[i]->one->path);
+ p->parent[n].changed_path = combined_all_paths &&
+ filename_changed(p->parent[n].status) ?
+ xstrdup(q->queue[i]->one->path) : NULL;
tail = &p->next;
i++;
@@ -996,8 +989,9 @@ static void show_combined_header(struct combine_diff_path *elem,
if (rev->combined_all_paths) {
for (i = 0; i < num_parent; i++) {
- char *path = filename_changed(elem->parent[i].status)
- ? elem->parent[i].path.buf : elem->path;
+ const char *path = elem->parent[i].changed_path ?
+ elem->parent[i].changed_path :
+ elem->path;
if (elem->parent[i].status == DIFF_STATUS_ADDED)
dump_quoted_path("--- ", "", "/dev/null",
line_prefix, c_meta, c_reset);
@@ -1278,12 +1272,10 @@ static void show_raw_diff(struct combine_diff_path *p, int num_parent, struct re
for (i = 0; i < num_parent; i++)
if (rev->combined_all_paths) {
- if (filename_changed(p->parent[i].status))
- write_name_quoted(p->parent[i].path.buf, stdout,
- inter_name_termination);
- else
- write_name_quoted(p->path, stdout,
- inter_name_termination);
+ const char *path = p->parent[i].changed_path ?
+ p->parent[i].changed_path :
+ p->path;
+ write_name_quoted(path, stdout, inter_name_termination);
}
write_name_quoted(p->path, stdout, line_termination);
}
@@ -1645,9 +1637,7 @@ void diff_tree_combined(const struct object_id *oid,
struct combine_diff_path *tmp = paths;
paths = paths->next;
for (i = 0; i < num_parent; i++)
- if (rev->combined_all_paths &&
- filename_changed(tmp->parent[i].status))
- strbuf_release(&tmp->parent[i].path);
+ free(tmp->parent[i].changed_path);
free(tmp);
}
diff --git a/diff-lib.c b/diff-lib.c
index c6d3bc4d37..602ae0c84b 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -417,9 +417,11 @@ static int show_modified(struct rev_info *revs,
memset(p->parent, 0, 2 * sizeof(struct combine_diff_parent));
p->parent[0].status = DIFF_STATUS_MODIFIED;
p->parent[0].mode = new_entry->ce_mode;
+ p->parent[0].changed_path = NULL;
oidcpy(&p->parent[0].oid, &new_entry->oid);
p->parent[1].status = DIFF_STATUS_MODIFIED;
p->parent[1].mode = old_entry->ce_mode;
+ p->parent[1].changed_path = NULL;
oidcpy(&p->parent[1].oid, &old_entry->oid);
show_combined_diff(p, 2, revs);
free(p);
diff --git a/diff.h b/diff.h
index 6e6007c17b..d13be142dd 100644
--- a/diff.h
+++ b/diff.h
@@ -480,7 +480,7 @@ struct combine_diff_path {
char status;
unsigned int mode;
struct object_id oid;
- struct strbuf path;
+ char *changed_path; // NULL unless status == 'R' or 'C', see filename_changed()
} parent[FLEX_ARRAY];
};
#define combine_diff_path_size(n, l) \
diff --git a/tree-diff.c b/tree-diff.c
index d9237ffd9b..85f1d2a4a6 100644
--- a/tree-diff.c
+++ b/tree-diff.c
@@ -272,6 +272,7 @@ static struct combine_diff_path *emit_path(struct combine_diff_path *p,
}
p->parent[i].mode = mode_i;
+ p->parent[i].changed_path = NULL;
oidcpy(&p->parent[i].oid, oid_i);
}
^ permalink raw reply related [flat|nested] 38+ messages in thread
* [PATCH 0/14] combine-diff cleanups
2025-01-04 3:32 ` Jeff King
2025-01-04 18:09 ` Wink Saville
@ 2025-01-09 8:27 ` Jeff King
2025-01-09 8:28 ` [PATCH 01/14] run_diff_files(): delay allocation of combine_diff_path Jeff King
` (13 more replies)
1 sibling, 14 replies; 38+ messages in thread
From: Jeff King @ 2025-01-09 8:27 UTC (permalink / raw)
To: Git List; +Cc: Junio C Hamano, Wink Saville
Since Wink successfully nerd-sniped me into digging into the
combine-diff code, and since I had such a hard time figuring out some of
its logic, I spent a little time trying to put that puzzling to good use
to make it more readable.
Aside from a minor leak fix in the first patch, I didn't find any bugs.
So arguably this whole thing could be discarded as churn. But I hope at
least some of it is worthwhile, and I tried to order it to keep the less
controversial bits near the top.
The series can be split into a few sections:
[01/14]: run_diff_files(): delay allocation of combine_diff_path
[02/14]: combine-diff: add combine_diff_path_new()
[03/14]: tree-diff: clear parent array in path_appendnew()
[04/14]: combine-diff: use pointer for parent paths
[05/14]: diff: add a comment about combine_diff_path.parent.path
[06/14]: run_diff_files(): de-mystify the size of combine_diff_path struct
These first six clean up most of the allocation and initialization
confusion that started this thread. They can't go all the way
because of the scariness in path_appendnew().
[07/14]: tree-diff: drop path_appendnew() alloc optimization
[08/14]: tree-diff: pass whole path string to path_appendnew()
[09/14]: tree-diff: inline path_appendnew()
[10/14]: combine-diff: drop public declaration of combine_diff_path_size()
And these ones take it further, but at the cost of losing an
optimization in patch 07. I don't think it was doing much (and I
gave some timings there). But it's a judgement call on whether the
cleaner code is worthwhile.
[11/14]: tree-diff: drop list-tail argument to diff_tree_paths()
[12/14]: tree-diff: use the name "tail" to refer to list tail
[13/14]: tree-diff: simplify emit_path() list management
[14/14]: tree-diff: make list tail-passing more explicit
And these last four fix some confusion I had while reading the
functions. I think they _could_ be done independent of 7-14,
but there'd be some kinks to work out in emit_path().
The final one is probably a matter of taste, and I'm not sure if
people find it easier to understand than the original or not. If
not, it can easily be dropped.
combine-diff.c | 80 +++++++++++++-------------
diff-lib.c | 36 ++++--------
diff.h | 18 ++++--
tree-diff.c | 152 ++++++++++++-------------------------------------
4 files changed, 102 insertions(+), 184 deletions(-)
-Peff
^ permalink raw reply [flat|nested] 38+ messages in thread
* [PATCH 01/14] run_diff_files(): delay allocation of combine_diff_path
2025-01-09 8:27 ` [PATCH 0/14] combine-diff cleanups Jeff King
@ 2025-01-09 8:28 ` Jeff King
2025-01-09 17:57 ` Junio C Hamano
2025-01-09 8:32 ` [PATCH 02/14] combine-diff: add combine_diff_path_new() Jeff King
` (12 subsequent siblings)
13 siblings, 1 reply; 38+ messages in thread
From: Jeff King @ 2025-01-09 8:28 UTC (permalink / raw)
To: Git List; +Cc: Junio C Hamano, Wink Saville
While looping over the index entries, when we see a higher level stage
the first thing we do is allocate a combine_diff_path struct for it. But
this can leak; if check_removed() returns an error, we'll continue to
the next iteration of the loop without cleaning up.
We can fix this by just delaying the allocation by a few lines.
I don't think this leak is triggered in the test suite, but it's pretty
easy to see by inspection. My ulterior motive here is that the delayed
allocation means we have all of the data needed to initialize "dpath" at
the time of malloc, making it easier to factor out a constructor
function.
Signed-off-by: Jeff King <peff@peff.net>
---
diff-lib.c | 24 ++++++++++++------------
1 file changed, 12 insertions(+), 12 deletions(-)
diff --git a/diff-lib.c b/diff-lib.c
index c6d3bc4d37..85b8f1fa59 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -156,18 +156,6 @@ void run_diff_files(struct rev_info *revs, unsigned int option)
size_t path_len;
struct stat st;
- path_len = ce_namelen(ce);
-
- dpath = xmalloc(combine_diff_path_size(5, path_len));
- dpath->path = (char *) &(dpath->parent[5]);
-
- dpath->next = NULL;
- memcpy(dpath->path, ce->name, path_len);
- dpath->path[path_len] = '\0';
- oidclr(&dpath->oid, the_repository->hash_algo);
- memset(&(dpath->parent[0]), 0,
- sizeof(struct combine_diff_parent)*5);
-
changed = check_removed(ce, &st);
if (!changed)
wt_mode = ce_mode_from_stat(ce, st.st_mode);
@@ -178,7 +166,19 @@ void run_diff_files(struct rev_info *revs, unsigned int option)
}
wt_mode = 0;
}
+
+ path_len = ce_namelen(ce);
+
+ dpath = xmalloc(combine_diff_path_size(5, path_len));
+ dpath->path = (char *) &(dpath->parent[5]);
+
+ dpath->next = NULL;
+ memcpy(dpath->path, ce->name, path_len);
+ dpath->path[path_len] = '\0';
+ oidclr(&dpath->oid, the_repository->hash_algo);
dpath->mode = wt_mode;
+ memset(&(dpath->parent[0]), 0,
+ sizeof(struct combine_diff_parent)*5);
while (i < entries) {
struct cache_entry *nce = istate->cache[i];
--
2.48.0.rc2.413.gc1c80375a3
^ permalink raw reply related [flat|nested] 38+ messages in thread
* [PATCH 02/14] combine-diff: add combine_diff_path_new()
2025-01-09 8:27 ` [PATCH 0/14] combine-diff cleanups Jeff King
2025-01-09 8:28 ` [PATCH 01/14] run_diff_files(): delay allocation of combine_diff_path Jeff King
@ 2025-01-09 8:32 ` Jeff King
2025-01-09 18:05 ` Junio C Hamano
2025-01-13 15:40 ` Patrick Steinhardt
2025-01-09 8:33 ` [PATCH 03/14] tree-diff: clear parent array in path_appendnew() Jeff King
` (11 subsequent siblings)
13 siblings, 2 replies; 38+ messages in thread
From: Jeff King @ 2025-01-09 8:32 UTC (permalink / raw)
To: Git List; +Cc: Junio C Hamano, Wink Saville
The combine_diff_path struct has variable size, since it embeds both the
memory allocation for the path field as well as a variable-sized parent
array. This makes allocating one a bit tricky.
We have a helper to compute the required size, but it's up to individual
sites to actually initialize all of the fields. Let's provide a
constructor function to make that a little nicer. Besides being shorter,
it also hides away tricky bits like the computation of the "path"
pointer (which is right after the "parent" flex array).
As a bonus, using the same constructor everywhere means that we'll
consistently initialize all parts of the struct. A few code paths left
the parent array unitialized. This didn't cause any bugs, but we'll be
able to simplify some code in the next few patches knowing that the
parent fields have all been zero'd.
This also gets rid of some questionable uses of "int" to store buffer
lengths. Though we do use them to allocate, I don't think there are any
integer overflow vulnerabilities here (the allocation helper promotes
them to size_t and checks arithmetic for overflow, and the actual memcpy
of the bytes is done using the possibly-truncated "int" value).
Sadly we can't use the FLEX_* macros to simplify the allocation here,
because there are two variable-sized parts to the struct (and those
macros only handle one).
Nor can we get stop publicly declaring combine_diff_path_size(). This
patch does not touch the code in path_appendnew() at all, which is not
ready to be moved to our new constructor for a few reasons:
- path_appendnew() has a memory-reuse optimization where it tries to
reuse combine_diff_path structs rather than freeing and
reallocating.
- path_appendnew() does not create the struct from a single path
string, but rather allocates and copies into the buffer from
multiple sources.
These can be addressed by some refactoring, but let's leave it as-is for
now.
Signed-off-by: Jeff King <peff@peff.net>
---
combine-diff.c | 40 ++++++++++++++++++++++++++--------------
diff-lib.c | 29 ++++++-----------------------
diff.h | 5 +++++
3 files changed, 37 insertions(+), 37 deletions(-)
diff --git a/combine-diff.c b/combine-diff.c
index 641bc92dbd..45548fd438 100644
--- a/combine-diff.c
+++ b/combine-diff.c
@@ -47,22 +47,13 @@ static struct combine_diff_path *intersect_paths(
if (!n) {
for (i = 0; i < q->nr; i++) {
- int len;
- const char *path;
if (diff_unmodified_pair(q->queue[i]))
continue;
- path = q->queue[i]->two->path;
- len = strlen(path);
- p = xmalloc(combine_diff_path_size(num_parent, len));
- p->path = (char *) &(p->parent[num_parent]);
- memcpy(p->path, path, len);
- p->path[len] = 0;
- p->next = NULL;
- memset(p->parent, 0,
- sizeof(p->parent[0]) * num_parent);
-
- oidcpy(&p->oid, &q->queue[i]->two->oid);
- p->mode = q->queue[i]->two->mode;
+ p = combine_diff_path_new(q->queue[i]->two->path,
+ strlen(q->queue[i]->two->path),
+ q->queue[i]->two->mode,
+ &q->queue[i]->two->oid,
+ num_parent);
oidcpy(&p->parent[n].oid, &q->queue[i]->one->oid);
p->parent[n].mode = q->queue[i]->one->mode;
p->parent[n].status = q->queue[i]->status;
@@ -1667,3 +1658,24 @@ void diff_tree_combined_merge(const struct commit *commit,
diff_tree_combined(&commit->object.oid, &parents, rev);
oid_array_clear(&parents);
}
+
+struct combine_diff_path *combine_diff_path_new(const char *path,
+ size_t path_len,
+ unsigned int mode,
+ const struct object_id *oid,
+ size_t num_parents)
+{
+ struct combine_diff_path *p;
+
+ p = xmalloc(combine_diff_path_size(num_parents, path_len));
+ p->path = (char *)&(p->parent[num_parents]);
+ memcpy(p->path, path, path_len);
+ p->path[path_len] = 0;
+ p->next = NULL;
+ p->mode = mode;
+ oidcpy(&p->oid, oid);
+
+ memset(p->parent, 0, sizeof(p->parent[0]) * num_parents);
+
+ return p;
+}
diff --git a/diff-lib.c b/diff-lib.c
index 85b8f1fa59..471ef99614 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -153,7 +153,6 @@ void run_diff_files(struct rev_info *revs, unsigned int option)
struct diff_filepair *pair;
unsigned int wt_mode = 0;
int num_compare_stages = 0;
- size_t path_len;
struct stat st;
changed = check_removed(ce, &st);
@@ -167,18 +166,8 @@ void run_diff_files(struct rev_info *revs, unsigned int option)
wt_mode = 0;
}
- path_len = ce_namelen(ce);
-
- dpath = xmalloc(combine_diff_path_size(5, path_len));
- dpath->path = (char *) &(dpath->parent[5]);
-
- dpath->next = NULL;
- memcpy(dpath->path, ce->name, path_len);
- dpath->path[path_len] = '\0';
- oidclr(&dpath->oid, the_repository->hash_algo);
- dpath->mode = wt_mode;
- memset(&(dpath->parent[0]), 0,
- sizeof(struct combine_diff_parent)*5);
+ dpath = combine_diff_path_new(ce->name, ce_namelen(ce),
+ wt_mode, null_oid(), 5);
while (i < entries) {
struct cache_entry *nce = istate->cache[i];
@@ -405,16 +394,10 @@ static int show_modified(struct rev_info *revs,
if (revs->combine_merges && !cached &&
(!oideq(oid, &old_entry->oid) || !oideq(&old_entry->oid, &new_entry->oid))) {
struct combine_diff_path *p;
- int pathlen = ce_namelen(new_entry);
-
- p = xmalloc(combine_diff_path_size(2, pathlen));
- p->path = (char *) &p->parent[2];
- p->next = NULL;
- memcpy(p->path, new_entry->name, pathlen);
- p->path[pathlen] = 0;
- p->mode = mode;
- oidclr(&p->oid, the_repository->hash_algo);
- memset(p->parent, 0, 2 * sizeof(struct combine_diff_parent));
+
+ p = combine_diff_path_new(new_entry->name,
+ ce_namelen(new_entry),
+ mode, null_oid(), 2);
p->parent[0].status = DIFF_STATUS_MODIFIED;
p->parent[0].mode = new_entry->ce_mode;
oidcpy(&p->parent[0].oid, &new_entry->oid);
diff --git a/diff.h b/diff.h
index 6e6007c17b..5cddd5a870 100644
--- a/diff.h
+++ b/diff.h
@@ -486,6 +486,11 @@ struct combine_diff_path {
#define combine_diff_path_size(n, l) \
st_add4(sizeof(struct combine_diff_path), (l), 1, \
st_mult(sizeof(struct combine_diff_parent), (n)))
+struct combine_diff_path *combine_diff_path_new(const char *path,
+ size_t path_len,
+ unsigned int mode,
+ const struct object_id *oid,
+ size_t num_parents);
void show_combined_diff(struct combine_diff_path *elem, int num_parent,
struct rev_info *);
--
2.48.0.rc2.413.gc1c80375a3
^ permalink raw reply related [flat|nested] 38+ messages in thread
* [PATCH 03/14] tree-diff: clear parent array in path_appendnew()
2025-01-09 8:27 ` [PATCH 0/14] combine-diff cleanups Jeff King
2025-01-09 8:28 ` [PATCH 01/14] run_diff_files(): delay allocation of combine_diff_path Jeff King
2025-01-09 8:32 ` [PATCH 02/14] combine-diff: add combine_diff_path_new() Jeff King
@ 2025-01-09 8:33 ` Jeff King
2025-01-09 18:28 ` Junio C Hamano
2025-01-09 8:42 ` [PATCH 04/14] combine-diff: use pointer for parent paths Jeff King
` (10 subsequent siblings)
13 siblings, 1 reply; 38+ messages in thread
From: Jeff King @ 2025-01-09 8:33 UTC (permalink / raw)
To: Git List; +Cc: Junio C Hamano, Wink Saville
All of the other functions which allocate a combine_diff_path struct
zero out the parent array, but this code path does not. There's no bug,
since our caller will fill in most of the fields. But leaving the unused
fields (like combine_diff_parent.path) uninitialized makes working with
the struct more error-prone than it needs to be.
Let's just zero the parent field to be consistent with the
combine_diff_path_new() allocator.
Signed-off-by: Jeff King <peff@peff.net>
---
tree-diff.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tree-diff.c b/tree-diff.c
index d9237ffd9b..24f7b5912c 100644
--- a/tree-diff.c
+++ b/tree-diff.c
@@ -151,8 +151,6 @@ static int emit_diff_first_parent_only(struct diff_options *opt, struct combine_
* process(p);
* p = pprev;
* ; don't forget to free tail->next in the end
- *
- * p->parent[] remains uninitialized.
*/
static struct combine_diff_path *path_appendnew(struct combine_diff_path *last,
int nparent, const struct strbuf *base, const char *path, int pathlen,
@@ -187,6 +185,8 @@ static struct combine_diff_path *path_appendnew(struct combine_diff_path *last,
p->mode = mode;
oidcpy(&p->oid, oid ? oid : null_oid());
+ memset(p->parent, 0, sizeof(p->parent[0]) * nparent);
+
return p;
}
--
2.48.0.rc2.413.gc1c80375a3
^ permalink raw reply related [flat|nested] 38+ messages in thread
* [PATCH 04/14] combine-diff: use pointer for parent paths
2025-01-09 8:27 ` [PATCH 0/14] combine-diff cleanups Jeff King
` (2 preceding siblings ...)
2025-01-09 8:33 ` [PATCH 03/14] tree-diff: clear parent array in path_appendnew() Jeff King
@ 2025-01-09 8:42 ` Jeff King
2025-01-09 18:49 ` Junio C Hamano
2025-01-09 8:42 ` [PATCH 05/14] diff: add a comment about combine_diff_path.parent.path Jeff King
` (9 subsequent siblings)
13 siblings, 1 reply; 38+ messages in thread
From: Jeff King @ 2025-01-09 8:42 UTC (permalink / raw)
To: Git List; +Cc: Junio C Hamano, Wink Saville
Commit d76ce4f734 (log,diff-tree: add --combined-all-paths option,
2019-02-07) added a "path" field to each combine_diff_parent struct.
It's defined as a strbuf, but this is overkill. We never manipulate the
buffer beyond inserting a single string into it.
And in fact there's a small bug: we zero the parent structs, including
the path strbufs. For the 0th parent, we strbuf_init() the strbuf before
adding to it. But for subsequent parents, we never do the init. This is
technically violating the strbuf API, though the code there is resilient
enough to handle this zero'd state.
This patch switches us to just store an allocated string pointer.
Zeroing it is enough to properly initialize it there (modulo the usual
assumption we make that a NULL pointer is all-zeroes).
And as a bonus, we can just check for a non-NULL value to see if it is
present, rather than repeating the combined_all_paths logic at each
site.
Signed-off-by: Jeff King <peff@peff.net>
---
combine-diff.c | 30 +++++++++++-------------------
diff.h | 2 +-
2 files changed, 12 insertions(+), 20 deletions(-)
diff --git a/combine-diff.c b/combine-diff.c
index 45548fd438..ae3cbfc699 100644
--- a/combine-diff.c
+++ b/combine-diff.c
@@ -60,9 +60,7 @@ static struct combine_diff_path *intersect_paths(
if (combined_all_paths &&
filename_changed(p->parent[n].status)) {
- strbuf_init(&p->parent[n].path, 0);
- strbuf_addstr(&p->parent[n].path,
- q->queue[i]->one->path);
+ p->parent[n].path = xstrdup(q->queue[i]->one->path);
}
*tail = p;
tail = &p->next;
@@ -83,9 +81,7 @@ static struct combine_diff_path *intersect_paths(
/* p->path not in q->queue[]; drop it */
*tail = p->next;
for (j = 0; j < num_parent; j++)
- if (combined_all_paths &&
- filename_changed(p->parent[j].status))
- strbuf_release(&p->parent[j].path);
+ free(p->parent[j].path);
free(p);
continue;
}
@@ -101,8 +97,7 @@ static struct combine_diff_path *intersect_paths(
p->parent[n].status = q->queue[i]->status;
if (combined_all_paths &&
filename_changed(p->parent[n].status))
- strbuf_addstr(&p->parent[n].path,
- q->queue[i]->one->path);
+ p->parent[n].path = xstrdup(q->queue[i]->one->path);
tail = &p->next;
i++;
@@ -987,8 +982,9 @@ static void show_combined_header(struct combine_diff_path *elem,
if (rev->combined_all_paths) {
for (i = 0; i < num_parent; i++) {
- char *path = filename_changed(elem->parent[i].status)
- ? elem->parent[i].path.buf : elem->path;
+ const char *path = elem->parent[i].path ?
+ elem->parent[i].path :
+ elem->path;
if (elem->parent[i].status == DIFF_STATUS_ADDED)
dump_quoted_path("--- ", "", "/dev/null",
line_prefix, c_meta, c_reset);
@@ -1269,12 +1265,10 @@ static void show_raw_diff(struct combine_diff_path *p, int num_parent, struct re
for (i = 0; i < num_parent; i++)
if (rev->combined_all_paths) {
- if (filename_changed(p->parent[i].status))
- write_name_quoted(p->parent[i].path.buf, stdout,
- inter_name_termination);
- else
- write_name_quoted(p->path, stdout,
- inter_name_termination);
+ const char *path = p->parent[i].path ?
+ p->parent[i].path :
+ p->path;
+ write_name_quoted(path, stdout, inter_name_termination);
}
write_name_quoted(p->path, stdout, line_termination);
}
@@ -1636,9 +1630,7 @@ void diff_tree_combined(const struct object_id *oid,
struct combine_diff_path *tmp = paths;
paths = paths->next;
for (i = 0; i < num_parent; i++)
- if (rev->combined_all_paths &&
- filename_changed(tmp->parent[i].status))
- strbuf_release(&tmp->parent[i].path);
+ free(tmp->parent[i].path);
free(tmp);
}
diff --git a/diff.h b/diff.h
index 5cddd5a870..f5f6ea00fb 100644
--- a/diff.h
+++ b/diff.h
@@ -480,7 +480,7 @@ struct combine_diff_path {
char status;
unsigned int mode;
struct object_id oid;
- struct strbuf path;
+ char *path;
} parent[FLEX_ARRAY];
};
#define combine_diff_path_size(n, l) \
--
2.48.0.rc2.413.gc1c80375a3
^ permalink raw reply related [flat|nested] 38+ messages in thread
* [PATCH 05/14] diff: add a comment about combine_diff_path.parent.path
2025-01-09 8:27 ` [PATCH 0/14] combine-diff cleanups Jeff King
` (3 preceding siblings ...)
2025-01-09 8:42 ` [PATCH 04/14] combine-diff: use pointer for parent paths Jeff King
@ 2025-01-09 8:42 ` Jeff King
2025-01-13 15:40 ` Patrick Steinhardt
2025-01-09 8:44 ` [PATCH 06/14] run_diff_files(): de-mystify the size of combine_diff_path struct Jeff King
` (8 subsequent siblings)
13 siblings, 1 reply; 38+ messages in thread
From: Jeff King @ 2025-01-09 8:42 UTC (permalink / raw)
To: Git List; +Cc: Junio C Hamano, Wink Saville
We only fill in the per-parent "path" field when it differs from what's
in combine_diff_path.path (and even then only when the option is
appropriate). Let's document that.
Suggested-by: Wink Saville <wink@saville.com>
Signed-off-by: Jeff King <peff@peff.net>
---
diff.h | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/diff.h b/diff.h
index f5f6ea00fb..60e7db4ad6 100644
--- a/diff.h
+++ b/diff.h
@@ -480,6 +480,12 @@ struct combine_diff_path {
char status;
unsigned int mode;
struct object_id oid;
+ /*
+ * This per-parent path is filled only when doing a combined
+ * diff with revs.combined_all_paths set, and only if the path
+ * differs from the post-image (e.g., a rename or copy).
+ * Otherwise it is left NULL.
+ */
char *path;
} parent[FLEX_ARRAY];
};
--
2.48.0.rc2.413.gc1c80375a3
^ permalink raw reply related [flat|nested] 38+ messages in thread
* [PATCH 06/14] run_diff_files(): de-mystify the size of combine_diff_path struct
2025-01-09 8:27 ` [PATCH 0/14] combine-diff cleanups Jeff King
` (4 preceding siblings ...)
2025-01-09 8:42 ` [PATCH 05/14] diff: add a comment about combine_diff_path.parent.path Jeff King
@ 2025-01-09 8:44 ` Jeff King
2025-01-10 16:40 ` Junio C Hamano
2025-01-09 8:46 ` [PATCH 07/14] tree-diff: drop path_appendnew() alloc optimization Jeff King
` (7 subsequent siblings)
13 siblings, 1 reply; 38+ messages in thread
From: Jeff King @ 2025-01-09 8:44 UTC (permalink / raw)
To: Git List; +Cc: Junio C Hamano, Wink Saville
We allocate a combine_diff_path struct with space for 5 parents. Why 5?
The history is not particularly enlightening. The allocation comes from
b4b1550315 (Don't instantiate structures with FAMs., 2006-06-18), which
just switched to xmalloc from a stack struct with 5 elements. That
struct changed to 5 from 4 in 2454c962fb (combine-diff: show mode
changes as well., 2006-02-06), when we also moved from storing raw sha1
bytes to the combine_diff_parent struct. But no explanation is given.
That 4 comes from the earliest code in ea726d02e9 (diff-files: -c and
--cc options., 2006-01-28).
One might guess it is for the 4 stages we can store in the index. But
this code path only ever diffs the current state against stages 2 and 3.
So we only need two slots.
And it's easy to see this is still the case. We fill the parent slots by
subtracting 2 from the ce_stage() values, ignoring values below 2. And
since ce_stage() is only 2 bits, there are 4 values, and thus we need 2
slots.
Let's use the correct value (saving a tiny bit of memory) and add a
comment explaining what's going on (saving a tiny bit of programmer
brain power).
Arguably we could use:
1 + (STAGEMASK >> STAGESHIFT) - 2
which lets the compiler enforce that we will not go out-of-bounds if we
see an unexpected value from ce_stage(). But that is more confusing to
explain, and the constant "2" is baked into other parts of the function.
It is a fundamental constant, not something where somebody might bump a
macro and forget to update this code.
Signed-off-by: Jeff King <peff@peff.net>
---
diff-lib.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/diff-lib.c b/diff-lib.c
index 471ef99614..353b473ed5 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -166,8 +166,13 @@ void run_diff_files(struct rev_info *revs, unsigned int option)
wt_mode = 0;
}
+ /*
+ * Allocate space for two parents, which will come from
+ * index stages #2 and #3, if present. Below we'll fill
+ * these from (stage - 2).
+ */
dpath = combine_diff_path_new(ce->name, ce_namelen(ce),
- wt_mode, null_oid(), 5);
+ wt_mode, null_oid(), 2);
while (i < entries) {
struct cache_entry *nce = istate->cache[i];
--
2.48.0.rc2.413.gc1c80375a3
^ permalink raw reply related [flat|nested] 38+ messages in thread
* [PATCH 07/14] tree-diff: drop path_appendnew() alloc optimization
2025-01-09 8:27 ` [PATCH 0/14] combine-diff cleanups Jeff King
` (5 preceding siblings ...)
2025-01-09 8:44 ` [PATCH 06/14] run_diff_files(): de-mystify the size of combine_diff_path struct Jeff King
@ 2025-01-09 8:46 ` Jeff King
2025-01-13 15:40 ` Patrick Steinhardt
2025-01-09 8:49 ` [PATCH 08/14] tree-diff: pass whole path string to path_appendnew() Jeff King
` (6 subsequent siblings)
13 siblings, 1 reply; 38+ messages in thread
From: Jeff King @ 2025-01-09 8:46 UTC (permalink / raw)
To: Git List; +Cc: Junio C Hamano, Wink Saville
When we're diffing trees, we create a list of combine_diff_path structs
that represent changed paths. We allocate each struct and add it to the
list with path_appendnew(), which we then feed to opt->pathchange().
That function tells us whether the path is of interest or not; if not,
then we can throw away the struct we allocated.
So there's an optimization to avoid extra allocations: instead of
throwing away the new entry, we try to reuse it. If it was large enough
to store the next path we care about, we can do so. And if not, we fall
back to freeing and re-allocating a new struct.
This comes from 72441af7c4 (tree-diff: rework diff_tree() to generate
diffs for multiparent cases as well, 2014-04-07), where the goal was to
have even the 2-parent diff code use the combine-diff infrastructure,
but without taking a performance hit.
The implementation causes some complexities in the interface (as we
store the allocation length inside the "next" pointer), and prevents us
from using the regular combine_diff_path_new() constructor. The
complexity is mostly contained inside two functions, but it's worth
re-evaluating how much it's helping.
That commit claims it helps ~1% on generating two-parent diffs in
linux.git. Here are the timings I get on the same command today ("old"
is the current tip of master, and "new" has this patch applied):
Benchmark 1: ./git.old log --raw --no-abbrev --no-renames v3.10..v3.11
Time (mean ± σ): 532.9 ms ± 5.8 ms [User: 472.7 ms, System: 59.6 ms]
Range (min … max): 525.9 ms … 543.3 ms 10 runs
Benchmark 2: ./git.new log --raw --no-abbrev --no-renames v3.10..v3.11
Time (mean ± σ): 538.3 ms ± 5.7 ms [User: 478.0 ms, System: 59.7 ms]
Range (min … max): 528.5 ms … 545.3 ms 10 runs
Summary
./git.old log --raw --no-abbrev --no-renames v3.10..v3.11 ran
1.01 ± 0.02 times faster than ./git.new log --raw --no-abbrev --no-renames v3.10..v3.11
So we do end up on average 1% faster, but with 2% of noise. I tried to
focus more on diff performance by running the commit traversal
separately, like:
git rev-list v3.10..v3.11 >in
and then timing just the diffs:
Benchmark 1: ./git.old diff-tree --stdin -r <in
Time (mean ± σ): 415.7 ms ± 5.8 ms [User: 357.7 ms, System: 58.0 ms]
Range (min … max): 410.9 ms … 430.3 ms 10 runs
Benchmark 2: ./git.new diff-tree --stdin -r <in
Time (mean ± σ): 418.5 ms ± 2.1 ms [User: 361.7 ms, System: 56.6 ms]
Range (min … max): 414.9 ms … 421.3 ms 10 runs
Summary
./git.old diff-tree --stdin -r <in ran
1.01 ± 0.02 times faster than ./git.new diff-tree --stdin -r <in
That gets roughly the same result.
Adding in "-c" to do multi-parent diffs doesn't change much:
Benchmark 1: ./git.old diff-tree --stdin -r -c <in
Time (mean ± σ): 525.3 ms ± 6.6 ms [User: 470.0 ms, System: 55.1 ms]
Range (min … max): 508.4 ms … 531.0 ms 10 runs
Benchmark 2: ./git.new diff-tree --stdin -r -c <in
Time (mean ± σ): 532.3 ms ± 6.2 ms [User: 469.0 ms, System: 63.1 ms]
Range (min … max): 520.3 ms … 539.4 ms 10 runs
Summary
./git.old diff-tree --stdin -r -c <in ran
1.01 ± 0.02 times faster than ./git.new diff-tree --stdin -r -c <in
And of course if you add in a lot more work by doing actual
content-level diffs, any difference is lost entirely (here the newer
version is actually faster, but that's really just noise):
Benchmark 1: ./git.old diff-tree --stdin -r --cc <in
Time (mean ± σ): 11.571 s ± 0.064 s [User: 11.287 s, System: 0.283 s]
Range (min … max): 11.497 s … 11.615 s 3 runs
Benchmark 2: ./git.new diff-tree --stdin -r --cc <in
Time (mean ± σ): 11.466 s ± 0.109 s [User: 11.108 s, System: 0.357 s]
Range (min … max): 11.346 s … 11.560 s 3 runs
Summary
./git.new diff-tree --stdin -r --cc <in ran
1.01 ± 0.01 times faster than ./git.old diff-tree --stdin -r --cc <in
So my conclusion is that it probably does help a little, but it's mostly
lost in the noise. I could see an argument for keeping it, as the
complexity is hidden away in functions that do not often need to be
touched. But it does make them more confusing than necessary (despite
some detailed explanations from the author of that commit; it just took
me a while to wrap my head around what was going on) and prevents
further refactoring of the combine_diff_path struct. So let's drop it.
Signed-off-by: Jeff King <peff@peff.net>
---
tree-diff.c | 67 +++++------------------------------------------------
1 file changed, 6 insertions(+), 61 deletions(-)
diff --git a/tree-diff.c b/tree-diff.c
index 24f7b5912c..22fc2d8f8c 100644
--- a/tree-diff.c
+++ b/tree-diff.c
@@ -127,30 +127,6 @@ static int emit_diff_first_parent_only(struct diff_options *opt, struct combine_
/*
* Make a new combine_diff_path from path/mode/sha1
* and append it to paths list tail.
- *
- * Memory for created elements could be reused:
- *
- * - if last->next == NULL, the memory is allocated;
- *
- * - if last->next != NULL, it is assumed that p=last->next was returned
- * earlier by this function, and p->next was *not* modified.
- * The memory is then reused from p.
- *
- * so for clients,
- *
- * - if you do need to keep the element
- *
- * p = path_appendnew(p, ...);
- * process(p);
- * p->next = NULL;
- *
- * - if you don't need to keep the element after processing
- *
- * pprev = p;
- * p = path_appendnew(p, ...);
- * process(p);
- * p = pprev;
- * ; don't forget to free tail->next in the end
*/
static struct combine_diff_path *path_appendnew(struct combine_diff_path *last,
int nparent, const struct strbuf *base, const char *path, int pathlen,
@@ -160,22 +136,8 @@ static struct combine_diff_path *path_appendnew(struct combine_diff_path *last,
size_t len = st_add(base->len, pathlen);
size_t alloclen = combine_diff_path_size(nparent, len);
- /* if last->next is !NULL - it is a pre-allocated memory, we can reuse */
- p = last->next;
- if (p && (alloclen > (intptr_t)p->next)) {
- FREE_AND_NULL(p);
- }
-
- if (!p) {
- p = xmalloc(alloclen);
-
- /*
- * until we go to it next round, .next holds how many bytes we
- * allocated (for faster realloc - we don't need copying old data).
- */
- p->next = (struct combine_diff_path *)(intptr_t)alloclen;
- }
-
+ p = xmalloc(alloclen);
+ p->next = NULL;
last->next = p;
p->path = (char *)&(p->parent[nparent]);
@@ -279,21 +241,11 @@ static struct combine_diff_path *emit_path(struct combine_diff_path *p,
if (opt->pathchange)
keep = opt->pathchange(opt, p);
- /*
- * If a path was filtered or consumed - we don't need to add it
- * to the list and can reuse its memory, leaving it as
- * pre-allocated element on the tail.
- *
- * On the other hand, if path needs to be kept, we need to
- * correct its .next to NULL, as it was pre-initialized to how
- * much memory was allocated.
- *
- * see path_appendnew() for details.
- */
- if (!keep)
+ if (!keep) {
+ free(p);
+ pprev->next = NULL;
p = pprev;
- else
- p->next = NULL;
+ }
}
if (recurse) {
@@ -585,13 +537,6 @@ struct combine_diff_path *diff_tree_paths(
struct strbuf *base, struct diff_options *opt)
{
p = ll_diff_tree_paths(p, oid, parents_oid, nparent, base, opt, 0);
-
- /*
- * free pre-allocated last element, if any
- * (see path_appendnew() for details about why)
- */
- FREE_AND_NULL(p->next);
-
return p;
}
--
2.48.0.rc2.413.gc1c80375a3
^ permalink raw reply related [flat|nested] 38+ messages in thread
* [PATCH 08/14] tree-diff: pass whole path string to path_appendnew()
2025-01-09 8:27 ` [PATCH 0/14] combine-diff cleanups Jeff King
` (6 preceding siblings ...)
2025-01-09 8:46 ` [PATCH 07/14] tree-diff: drop path_appendnew() alloc optimization Jeff King
@ 2025-01-09 8:49 ` Jeff King
2025-01-13 15:40 ` Patrick Steinhardt
2025-01-09 8:49 ` [PATCH 09/14] tree-diff: inline path_appendnew() Jeff King
` (5 subsequent siblings)
13 siblings, 1 reply; 38+ messages in thread
From: Jeff King @ 2025-01-09 8:49 UTC (permalink / raw)
To: Git List; +Cc: Junio C Hamano, Wink Saville
When diffing trees, we'll have a strbuf "base" containing the
slash-separted names of our parent trees, and a "path" string
representing an entry name from the current tree. We pass these
separately to path_appendnew(), which combines them to form a single
path string in the combine_diff_path struct.
Instead, let's append the path string to our base strbuf ourselves, pass
in the result, and then roll it back with strbuf_setlen(). This lets us
simplify path_appendnew() a bit, enabling further refactoring.
And while it might seem like this causes extra wasted allocations, it
does not in practice. We reuse the same strbuf for each tree entry, so
we only have to allocate it to match the largest name. Plus, in a
recursive diff we'll end up doing this same operation to extend the base
for the next level of recursion. So we're really just incurring a small
memcpy().
Signed-off-by: Jeff King <peff@peff.net>
---
tree-diff.c | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/tree-diff.c b/tree-diff.c
index 22fc2d8f8c..d2f8dd14a6 100644
--- a/tree-diff.c
+++ b/tree-diff.c
@@ -129,20 +129,18 @@ static int emit_diff_first_parent_only(struct diff_options *opt, struct combine_
* and append it to paths list tail.
*/
static struct combine_diff_path *path_appendnew(struct combine_diff_path *last,
- int nparent, const struct strbuf *base, const char *path, int pathlen,
+ int nparent, const char *path, size_t len,
unsigned mode, const struct object_id *oid)
{
struct combine_diff_path *p;
- size_t len = st_add(base->len, pathlen);
size_t alloclen = combine_diff_path_size(nparent, len);
p = xmalloc(alloclen);
p->next = NULL;
last->next = p;
p->path = (char *)&(p->parent[nparent]);
- memcpy(p->path, base->buf, base->len);
- memcpy(p->path + base->len, path, pathlen);
+ memcpy(p->path, path, len);
p->path[len] = 0;
p->mode = mode;
oidcpy(&p->oid, oid ? oid : null_oid());
@@ -206,7 +204,10 @@ static struct combine_diff_path *emit_path(struct combine_diff_path *p,
if (emitthis) {
int keep;
struct combine_diff_path *pprev = p;
- p = path_appendnew(p, nparent, base, path, pathlen, mode, oid);
+
+ strbuf_add(base, path, pathlen);
+ p = path_appendnew(p, nparent, base->buf, base->len, mode, oid);
+ strbuf_setlen(base, old_baselen);
for (i = 0; i < nparent; ++i) {
/*
--
2.48.0.rc2.413.gc1c80375a3
^ permalink raw reply related [flat|nested] 38+ messages in thread
* [PATCH 09/14] tree-diff: inline path_appendnew()
2025-01-09 8:27 ` [PATCH 0/14] combine-diff cleanups Jeff King
` (7 preceding siblings ...)
2025-01-09 8:49 ` [PATCH 08/14] tree-diff: pass whole path string to path_appendnew() Jeff King
@ 2025-01-09 8:49 ` Jeff King
2025-01-11 0:41 ` Junio C Hamano
2025-01-09 8:50 ` [PATCH 10/14] combine-diff: drop public declaration of combine_diff_path_size() Jeff King
` (4 subsequent siblings)
13 siblings, 1 reply; 38+ messages in thread
From: Jeff King @ 2025-01-09 8:49 UTC (permalink / raw)
To: Git List; +Cc: Junio C Hamano, Wink Saville
Our path_appendnew() has been simplified to the point that it is mostly
just implementing combine_diff_path_new(), plus setting the "next"
pointer. Since there's only one caller, let's replace it completely with
a call to that helper function.
Signed-off-by: Jeff King <peff@peff.net>
---
tree-diff.c | 31 ++++---------------------------
1 file changed, 4 insertions(+), 27 deletions(-)
diff --git a/tree-diff.c b/tree-diff.c
index d2f8dd14a6..18e5a16716 100644
--- a/tree-diff.c
+++ b/tree-diff.c
@@ -124,32 +124,6 @@ static int emit_diff_first_parent_only(struct diff_options *opt, struct combine_
}
-/*
- * Make a new combine_diff_path from path/mode/sha1
- * and append it to paths list tail.
- */
-static struct combine_diff_path *path_appendnew(struct combine_diff_path *last,
- int nparent, const char *path, size_t len,
- unsigned mode, const struct object_id *oid)
-{
- struct combine_diff_path *p;
- size_t alloclen = combine_diff_path_size(nparent, len);
-
- p = xmalloc(alloclen);
- p->next = NULL;
- last->next = p;
-
- p->path = (char *)&(p->parent[nparent]);
- memcpy(p->path, path, len);
- p->path[len] = 0;
- p->mode = mode;
- oidcpy(&p->oid, oid ? oid : null_oid());
-
- memset(p->parent, 0, sizeof(p->parent[0]) * nparent);
-
- return p;
-}
-
/*
* new path should be added to combine diff
*
@@ -206,7 +180,10 @@ static struct combine_diff_path *emit_path(struct combine_diff_path *p,
struct combine_diff_path *pprev = p;
strbuf_add(base, path, pathlen);
- p = path_appendnew(p, nparent, base->buf, base->len, mode, oid);
+ p = combine_diff_path_new(base->buf, base->len, mode,
+ oid ? oid : null_oid(),
+ nparent);
+ pprev->next = p;
strbuf_setlen(base, old_baselen);
for (i = 0; i < nparent; ++i) {
--
2.48.0.rc2.413.gc1c80375a3
^ permalink raw reply related [flat|nested] 38+ messages in thread
* [PATCH 10/14] combine-diff: drop public declaration of combine_diff_path_size()
2025-01-09 8:27 ` [PATCH 0/14] combine-diff cleanups Jeff King
` (8 preceding siblings ...)
2025-01-09 8:49 ` [PATCH 09/14] tree-diff: inline path_appendnew() Jeff King
@ 2025-01-09 8:50 ` Jeff King
2025-01-09 8:51 ` [PATCH 11/14] tree-diff: drop list-tail argument to diff_tree_paths() Jeff King
` (3 subsequent siblings)
13 siblings, 0 replies; 38+ messages in thread
From: Jeff King @ 2025-01-09 8:50 UTC (permalink / raw)
To: Git List; +Cc: Junio C Hamano, Wink Saville
We want callers to use combine_diff_path_new() to allocate structs,
rather than using combine_diff_path_size() and xmalloc(). That gives us
more consistency over the initialization of the fields.
Now that the final external user of combine_diff_path_size() is gone, we
can stop declaring it publicly. And since our constructor is the only
caller, we can just inline it there.
Breaking the size computation into two parts also lets us reuse the
intermediate multiplication result of the parent length, since we need
to know it to perform our memset(). The result is a little easier to
read.
Signed-off-by: Jeff King <peff@peff.net>
---
combine-diff.c | 5 +++--
diff.h | 3 ---
2 files changed, 3 insertions(+), 5 deletions(-)
diff --git a/combine-diff.c b/combine-diff.c
index ae3cbfc699..f21e1f58ba 100644
--- a/combine-diff.c
+++ b/combine-diff.c
@@ -1658,16 +1658,17 @@ struct combine_diff_path *combine_diff_path_new(const char *path,
size_t num_parents)
{
struct combine_diff_path *p;
+ size_t parent_len = st_mult(sizeof(p->parent[0]), num_parents);
- p = xmalloc(combine_diff_path_size(num_parents, path_len));
+ p = xmalloc(st_add4(sizeof(*p), path_len, 1, parent_len));
p->path = (char *)&(p->parent[num_parents]);
memcpy(p->path, path, path_len);
p->path[path_len] = 0;
p->next = NULL;
p->mode = mode;
oidcpy(&p->oid, oid);
- memset(p->parent, 0, sizeof(p->parent[0]) * num_parents);
+ memset(p->parent, 0, parent_len);
return p;
}
diff --git a/diff.h b/diff.h
index 60e7db4ad6..32ad17fd38 100644
--- a/diff.h
+++ b/diff.h
@@ -489,9 +489,6 @@ struct combine_diff_path {
char *path;
} parent[FLEX_ARRAY];
};
-#define combine_diff_path_size(n, l) \
- st_add4(sizeof(struct combine_diff_path), (l), 1, \
- st_mult(sizeof(struct combine_diff_parent), (n)))
struct combine_diff_path *combine_diff_path_new(const char *path,
size_t path_len,
unsigned int mode,
--
2.48.0.rc2.413.gc1c80375a3
^ permalink raw reply related [flat|nested] 38+ messages in thread
* [PATCH 11/14] tree-diff: drop list-tail argument to diff_tree_paths()
2025-01-09 8:27 ` [PATCH 0/14] combine-diff cleanups Jeff King
` (9 preceding siblings ...)
2025-01-09 8:50 ` [PATCH 10/14] combine-diff: drop public declaration of combine_diff_path_size() Jeff King
@ 2025-01-09 8:51 ` Jeff King
2025-01-18 0:33 ` Junio C Hamano
2025-01-09 8:53 ` [PATCH 12/14] tree-diff: use the name "tail" to refer to list tail Jeff King
` (2 subsequent siblings)
13 siblings, 1 reply; 38+ messages in thread
From: Jeff King @ 2025-01-09 8:51 UTC (permalink / raw)
To: Git List; +Cc: Junio C Hamano, Wink Saville
The internals of the path diffing code, including ll_diff_tree_paths(),
all take an extra combine_diff_path parameter which they use as the tail
of a list of results, appending any new entries to it.
The public-facing diff_tree_paths() takes the same argument, but it just
makes the callers more awkward. They always start with a clean list, and
have to set up a fake head struct to pass in.
Let's keep the public API clean by always returning a new list. That
keeps the fake struct as an implementation detail of tree-diff.c.
Signed-off-by: Jeff King <peff@peff.net>
---
combine-diff.c | 9 +++------
diff.h | 2 +-
tree-diff.c | 14 ++++++++------
3 files changed, 12 insertions(+), 13 deletions(-)
diff --git a/combine-diff.c b/combine-diff.c
index f21e1f58ba..9527f3160d 100644
--- a/combine-diff.c
+++ b/combine-diff.c
@@ -1428,22 +1428,19 @@ static struct combine_diff_path *find_paths_multitree(
{
int i, nparent = parents->nr;
const struct object_id **parents_oid;
- struct combine_diff_path paths_head;
+ struct combine_diff_path *paths;
struct strbuf base;
ALLOC_ARRAY(parents_oid, nparent);
for (i = 0; i < nparent; i++)
parents_oid[i] = &parents->oid[i];
- /* fake list head, so worker can assume it is non-NULL */
- paths_head.next = NULL;
-
strbuf_init(&base, PATH_MAX);
- diff_tree_paths(&paths_head, oid, parents_oid, nparent, &base, opt);
+ paths = diff_tree_paths(oid, parents_oid, nparent, &base, opt);
strbuf_release(&base);
free(parents_oid);
- return paths_head.next;
+ return paths;
}
static int match_objfind(struct combine_diff_path *path,
diff --git a/diff.h b/diff.h
index 32ad17fd38..7831ed1a2b 100644
--- a/diff.h
+++ b/diff.h
@@ -462,7 +462,7 @@ const char *diff_line_prefix(struct diff_options *);
extern const char mime_boundary_leader[];
struct combine_diff_path *diff_tree_paths(
- struct combine_diff_path *p, const struct object_id *oid,
+ const struct object_id *oid,
const struct object_id **parents_oid, int nparent,
struct strbuf *base, struct diff_options *opt);
void diff_tree_oid(const struct object_id *old_oid,
diff --git a/tree-diff.c b/tree-diff.c
index 18e5a16716..e99e40da18 100644
--- a/tree-diff.c
+++ b/tree-diff.c
@@ -510,11 +510,14 @@ static struct combine_diff_path *ll_diff_tree_paths(
}
struct combine_diff_path *diff_tree_paths(
- struct combine_diff_path *p, const struct object_id *oid,
+ const struct object_id *oid,
const struct object_id **parents_oid, int nparent,
struct strbuf *base, struct diff_options *opt)
{
- p = ll_diff_tree_paths(p, oid, parents_oid, nparent, base, opt, 0);
+ struct combine_diff_path head, *p;
+ /* fake list head, so worker can assume it is non-NULL */
+ head.next = NULL;
+ p = ll_diff_tree_paths(&head, oid, parents_oid, nparent, base, opt, 0);
return p;
}
@@ -631,14 +634,13 @@ static void ll_diff_tree_oid(const struct object_id *old_oid,
const struct object_id *new_oid,
struct strbuf *base, struct diff_options *opt)
{
- struct combine_diff_path phead, *p;
+ struct combine_diff_path *paths, *p;
pathchange_fn_t pathchange_old = opt->pathchange;
- phead.next = NULL;
opt->pathchange = emit_diff_first_parent_only;
- diff_tree_paths(&phead, new_oid, &old_oid, 1, base, opt);
+ paths = diff_tree_paths(new_oid, &old_oid, 1, base, opt);
- for (p = phead.next; p;) {
+ for (p = paths; p;) {
struct combine_diff_path *pprev = p;
p = p->next;
free(pprev);
--
2.48.0.rc2.413.gc1c80375a3
^ permalink raw reply related [flat|nested] 38+ messages in thread
* [PATCH 12/14] tree-diff: use the name "tail" to refer to list tail
2025-01-09 8:27 ` [PATCH 0/14] combine-diff cleanups Jeff King
` (10 preceding siblings ...)
2025-01-09 8:51 ` [PATCH 11/14] tree-diff: drop list-tail argument to diff_tree_paths() Jeff King
@ 2025-01-09 8:53 ` Jeff King
2025-01-09 8:54 ` [PATCH 13/14] tree-diff: simplify emit_path() list management Jeff King
2025-01-09 8:57 ` [PATCH 14/14] tree-diff: make list tail-passing more explicit Jeff King
13 siblings, 0 replies; 38+ messages in thread
From: Jeff King @ 2025-01-09 8:53 UTC (permalink / raw)
To: Git List; +Cc: Junio C Hamano, Wink Saville
The ll_diff_tree_paths() function and its helpers all append to a
running list by taking in a pointer to the old tail and returning the
new tail. But they just call this argument "p", which is not very
descriptive.
It gets particularly confusing in emit_path(), where we actually add to
the list, because "p" does double-duty: it is the tail of the list, but
it is also the entry which we add. Except that in some cases we _don't_
add a new entry (or we might even add it and roll it back) if the path
isn't interesting. At first glance, this makes it look like a bug that
we pass "p" on to ll_diff_tree_paths() to recurse; sometimes it is
getting the new entry we made and sometimes not!
But it's not a bug, because ll_diff_tree_paths() does not care about the
entry itself at all. It is only using its "next" pointer as the tail of
the list.
Let's swap out "p" for "tail" to make this obvious. And then in
emit_path() we'll continue to use "p" for our newly allocated entry.
Signed-off-by: Jeff King <peff@peff.net>
---
tree-diff.c | 33 +++++++++++++++++----------------
1 file changed, 17 insertions(+), 16 deletions(-)
diff --git a/tree-diff.c b/tree-diff.c
index e99e40da18..a1a611bef6 100644
--- a/tree-diff.c
+++ b/tree-diff.c
@@ -49,7 +49,7 @@
} while(0)
static struct combine_diff_path *ll_diff_tree_paths(
- struct combine_diff_path *p, const struct object_id *oid,
+ struct combine_diff_path *tail, const struct object_id *oid,
const struct object_id **parents_oid, int nparent,
struct strbuf *base, struct diff_options *opt,
int depth);
@@ -134,7 +134,7 @@ static int emit_diff_first_parent_only(struct diff_options *opt, struct combine_
* t, tp -> path modified/added
* (M for tp[i]=tp[imin], A otherwise)
*/
-static struct combine_diff_path *emit_path(struct combine_diff_path *p,
+static struct combine_diff_path *emit_path(struct combine_diff_path *tail,
struct strbuf *base, struct diff_options *opt, int nparent,
struct tree_desc *t, struct tree_desc *tp,
int imin, int depth)
@@ -177,13 +177,14 @@ static struct combine_diff_path *emit_path(struct combine_diff_path *p,
if (emitthis) {
int keep;
- struct combine_diff_path *pprev = p;
+ struct combine_diff_path *pprev = tail, *p;
strbuf_add(base, path, pathlen);
p = combine_diff_path_new(base->buf, base->len, mode,
oid ? oid : null_oid(),
nparent);
- pprev->next = p;
+ tail->next = p;
+ tail = p;
strbuf_setlen(base, old_baselen);
for (i = 0; i < nparent; ++i) {
@@ -222,7 +223,7 @@ static struct combine_diff_path *emit_path(struct combine_diff_path *p,
if (!keep) {
free(p);
pprev->next = NULL;
- p = pprev;
+ tail = pprev;
}
}
@@ -239,13 +240,13 @@ static struct combine_diff_path *emit_path(struct combine_diff_path *p,
strbuf_add(base, path, pathlen);
strbuf_addch(base, '/');
- p = ll_diff_tree_paths(p, oid, parents_oid, nparent, base, opt,
- depth + 1);
+ tail = ll_diff_tree_paths(tail, oid, parents_oid, nparent, base, opt,
+ depth + 1);
FAST_ARRAY_FREE(parents_oid, nparent);
}
strbuf_setlen(base, old_baselen);
- return p;
+ return tail;
}
static void skip_uninteresting(struct tree_desc *t, struct strbuf *base,
@@ -359,7 +360,7 @@ static inline void update_tp_entries(struct tree_desc *tp, int nparent)
}
static struct combine_diff_path *ll_diff_tree_paths(
- struct combine_diff_path *p, const struct object_id *oid,
+ struct combine_diff_path *tail, const struct object_id *oid,
const struct object_id **parents_oid, int nparent,
struct strbuf *base, struct diff_options *opt,
int depth)
@@ -463,8 +464,8 @@ static struct combine_diff_path *ll_diff_tree_paths(
}
/* D += {δ(t,pi) if pi=p[imin]; "+a" if pi > p[imin]} */
- p = emit_path(p, base, opt, nparent,
- &t, tp, imin, depth);
+ tail = emit_path(tail, base, opt, nparent,
+ &t, tp, imin, depth);
skip_emit_t_tp:
/* t↓, ∀ pi=p[imin] pi↓ */
@@ -475,8 +476,8 @@ static struct combine_diff_path *ll_diff_tree_paths(
/* t < p[imin] */
else if (cmp < 0) {
/* D += "+t" */
- p = emit_path(p, base, opt, nparent,
- &t, /*tp=*/NULL, -1, depth);
+ tail = emit_path(tail, base, opt, nparent,
+ &t, /*tp=*/NULL, -1, depth);
/* t↓ */
update_tree_entry(&t);
@@ -491,8 +492,8 @@ static struct combine_diff_path *ll_diff_tree_paths(
goto skip_emit_tp;
}
- p = emit_path(p, base, opt, nparent,
- /*t=*/NULL, tp, imin, depth);
+ tail = emit_path(tail, base, opt, nparent,
+ /*t=*/NULL, tp, imin, depth);
skip_emit_tp:
/* ∀ pi=p[imin] pi↓ */
@@ -506,7 +507,7 @@ static struct combine_diff_path *ll_diff_tree_paths(
FAST_ARRAY_FREE(tptree, nparent);
FAST_ARRAY_FREE(tp, nparent);
- return p;
+ return tail;
}
struct combine_diff_path *diff_tree_paths(
--
2.48.0.rc2.413.gc1c80375a3
^ permalink raw reply related [flat|nested] 38+ messages in thread
* [PATCH 13/14] tree-diff: simplify emit_path() list management
2025-01-09 8:27 ` [PATCH 0/14] combine-diff cleanups Jeff King
` (11 preceding siblings ...)
2025-01-09 8:53 ` [PATCH 12/14] tree-diff: use the name "tail" to refer to list tail Jeff King
@ 2025-01-09 8:54 ` Jeff King
2025-01-09 8:57 ` [PATCH 14/14] tree-diff: make list tail-passing more explicit Jeff King
13 siblings, 0 replies; 38+ messages in thread
From: Jeff King @ 2025-01-09 8:54 UTC (permalink / raw)
To: Git List; +Cc: Junio C Hamano, Wink Saville
In emit_path() we may append a new combine_diff_path entry to our list,
decide that we don't want it (because opt->pathchange() told us so) and
then roll it back.
Between the addition and the rollback, it doesn't matter if it's in the
list or not (no functions can even tell, since it's a singly-linked list
and we pass around just the tail entry).
So it's much simpler to just wait until opt->pathchange() tells us
whether to keep it, and either attach it (or free it) then. We do still
have to allocate it up front since it's that struct itself which is
passed to the pathchange callback.
Signed-off-by: Jeff King <peff@peff.net>
---
tree-diff.c | 11 +++++------
1 file changed, 5 insertions(+), 6 deletions(-)
diff --git a/tree-diff.c b/tree-diff.c
index a1a611bef6..f5ec19113c 100644
--- a/tree-diff.c
+++ b/tree-diff.c
@@ -177,14 +177,12 @@ static struct combine_diff_path *emit_path(struct combine_diff_path *tail,
if (emitthis) {
int keep;
- struct combine_diff_path *pprev = tail, *p;
+ struct combine_diff_path *p;
strbuf_add(base, path, pathlen);
p = combine_diff_path_new(base->buf, base->len, mode,
oid ? oid : null_oid(),
nparent);
- tail->next = p;
- tail = p;
strbuf_setlen(base, old_baselen);
for (i = 0; i < nparent; ++i) {
@@ -220,10 +218,11 @@ static struct combine_diff_path *emit_path(struct combine_diff_path *tail,
if (opt->pathchange)
keep = opt->pathchange(opt, p);
- if (!keep) {
+ if (keep) {
+ tail->next = p;
+ tail = p;
+ } else {
free(p);
- pprev->next = NULL;
- tail = pprev;
}
}
--
2.48.0.rc2.413.gc1c80375a3
^ permalink raw reply related [flat|nested] 38+ messages in thread
* [PATCH 14/14] tree-diff: make list tail-passing more explicit
2025-01-09 8:27 ` [PATCH 0/14] combine-diff cleanups Jeff King
` (12 preceding siblings ...)
2025-01-09 8:54 ` [PATCH 13/14] tree-diff: simplify emit_path() list management Jeff King
@ 2025-01-09 8:57 ` Jeff King
13 siblings, 0 replies; 38+ messages in thread
From: Jeff King @ 2025-01-09 8:57 UTC (permalink / raw)
To: Git List; +Cc: Junio C Hamano, Wink Saville
The ll_diff_tree_paths() function and its helpers all take a pointer to
a list tail, possibly add to it, and then return the new tail. This
works but has two downsides:
- The top-level caller (diff_tree_paths() in this case) has to make a
fake combine_diff_path struct to act as the list head. This is
especially weird here, as it's a flexible-sized struct which will
have an empty FLEX_ARRAY field. That used to be a portability
problem, though these days it is legal because our FLEX_ARRAY macro
over-allocates if necessary. It's still kind of ugly, though.
- Besides the name "tail", it's not immediately obvious that the entry
we pass around will not be examined by each function. Using a
pointer-to-pointer or similar makes it more obvious we only care
about the pointer itself, not its contents.
We can solve both by passing around a pointer to the tail instead. That
gets rid of the return value entirely, though note that because of the
recursion we actually need a three-star pointer for this to work.
The result is fairly readable, as we only need to dereference the tail
in one spot. If we wanted to make it simpler we could wrap the tail in a
struct, which we pass around.
Another option is to convert combine_diff to use our generic list_head
API. I tried that and found the result became much harder to read
overall. It means that _all_ code that looks at combine_diff_path
structs needs to be modified, since the "next" pointer is now inside a
list_head which has to be dereferenced with list_entry(). And we lose
some type safety, since we're just passing around a list_head struct
everywhere, and everybody who looks at it has to specify the type to
list_entry themselves.
Signed-off-by: Jeff King <peff@peff.net>
---
tree-diff.c | 47 +++++++++++++++++++++--------------------------
1 file changed, 21 insertions(+), 26 deletions(-)
diff --git a/tree-diff.c b/tree-diff.c
index f5ec19113c..60c558c2b5 100644
--- a/tree-diff.c
+++ b/tree-diff.c
@@ -48,8 +48,8 @@
free((x)); \
} while(0)
-static struct combine_diff_path *ll_diff_tree_paths(
- struct combine_diff_path *tail, const struct object_id *oid,
+static void ll_diff_tree_paths(
+ struct combine_diff_path ***tail, const struct object_id *oid,
const struct object_id **parents_oid, int nparent,
struct strbuf *base, struct diff_options *opt,
int depth);
@@ -134,10 +134,10 @@ static int emit_diff_first_parent_only(struct diff_options *opt, struct combine_
* t, tp -> path modified/added
* (M for tp[i]=tp[imin], A otherwise)
*/
-static struct combine_diff_path *emit_path(struct combine_diff_path *tail,
- struct strbuf *base, struct diff_options *opt, int nparent,
- struct tree_desc *t, struct tree_desc *tp,
- int imin, int depth)
+static void emit_path(struct combine_diff_path ***tail,
+ struct strbuf *base, struct diff_options *opt,
+ int nparent, struct tree_desc *t, struct tree_desc *tp,
+ int imin, int depth)
{
unsigned short mode;
const char *path;
@@ -219,8 +219,8 @@ static struct combine_diff_path *emit_path(struct combine_diff_path *tail,
keep = opt->pathchange(opt, p);
if (keep) {
- tail->next = p;
- tail = p;
+ **tail = p;
+ *tail = &p->next;
} else {
free(p);
}
@@ -239,13 +239,12 @@ static struct combine_diff_path *emit_path(struct combine_diff_path *tail,
strbuf_add(base, path, pathlen);
strbuf_addch(base, '/');
- tail = ll_diff_tree_paths(tail, oid, parents_oid, nparent, base, opt,
- depth + 1);
+ ll_diff_tree_paths(tail, oid, parents_oid, nparent, base, opt,
+ depth + 1);
FAST_ARRAY_FREE(parents_oid, nparent);
}
strbuf_setlen(base, old_baselen);
- return tail;
}
static void skip_uninteresting(struct tree_desc *t, struct strbuf *base,
@@ -358,8 +357,8 @@ static inline void update_tp_entries(struct tree_desc *tp, int nparent)
update_tree_entry(&tp[i]);
}
-static struct combine_diff_path *ll_diff_tree_paths(
- struct combine_diff_path *tail, const struct object_id *oid,
+static void ll_diff_tree_paths(
+ struct combine_diff_path ***tail, const struct object_id *oid,
const struct object_id **parents_oid, int nparent,
struct strbuf *base, struct diff_options *opt,
int depth)
@@ -463,8 +462,8 @@ static struct combine_diff_path *ll_diff_tree_paths(
}
/* D += {δ(t,pi) if pi=p[imin]; "+a" if pi > p[imin]} */
- tail = emit_path(tail, base, opt, nparent,
- &t, tp, imin, depth);
+ emit_path(tail, base, opt, nparent,
+ &t, tp, imin, depth);
skip_emit_t_tp:
/* t↓, ∀ pi=p[imin] pi↓ */
@@ -475,8 +474,8 @@ static struct combine_diff_path *ll_diff_tree_paths(
/* t < p[imin] */
else if (cmp < 0) {
/* D += "+t" */
- tail = emit_path(tail, base, opt, nparent,
- &t, /*tp=*/NULL, -1, depth);
+ emit_path(tail, base, opt, nparent,
+ &t, /*tp=*/NULL, -1, depth);
/* t↓ */
update_tree_entry(&t);
@@ -491,8 +490,8 @@ static struct combine_diff_path *ll_diff_tree_paths(
goto skip_emit_tp;
}
- tail = emit_path(tail, base, opt, nparent,
- /*t=*/NULL, tp, imin, depth);
+ emit_path(tail, base, opt, nparent,
+ /*t=*/NULL, tp, imin, depth);
skip_emit_tp:
/* ∀ pi=p[imin] pi↓ */
@@ -505,20 +504,16 @@ static struct combine_diff_path *ll_diff_tree_paths(
free(tptree[i]);
FAST_ARRAY_FREE(tptree, nparent);
FAST_ARRAY_FREE(tp, nparent);
-
- return tail;
}
struct combine_diff_path *diff_tree_paths(
const struct object_id *oid,
const struct object_id **parents_oid, int nparent,
struct strbuf *base, struct diff_options *opt)
{
- struct combine_diff_path head, *p;
- /* fake list head, so worker can assume it is non-NULL */
- head.next = NULL;
- p = ll_diff_tree_paths(&head, oid, parents_oid, nparent, base, opt, 0);
- return p;
+ struct combine_diff_path *head = NULL, **tail = &head;
+ ll_diff_tree_paths(&tail, oid, parents_oid, nparent, base, opt, 0);
+ return head;
}
/*
--
2.48.0.rc2.413.gc1c80375a3
^ permalink raw reply related [flat|nested] 38+ messages in thread
* Re: [PATCH 01/14] run_diff_files(): delay allocation of combine_diff_path
2025-01-09 8:28 ` [PATCH 01/14] run_diff_files(): delay allocation of combine_diff_path Jeff King
@ 2025-01-09 17:57 ` Junio C Hamano
0 siblings, 0 replies; 38+ messages in thread
From: Junio C Hamano @ 2025-01-09 17:57 UTC (permalink / raw)
To: Jeff King; +Cc: Git List, Wink Saville
Jeff King <peff@peff.net> writes:
> While looping over the index entries, when we see a higher level stage
> the first thing we do is allocate a combine_diff_path struct for it. But
> this can leak; if check_removed() returns an error, we'll continue to
> the next iteration of the loop without cleaning up.
>
> We can fix this by just delaying the allocation by a few lines.
>
> I don't think this leak is triggered in the test suite, but it's pretty
> easy to see by inspection. My ulterior motive here is that the delayed
> allocation means we have all of the data needed to initialize "dpath" at
> the time of malloc, making it easier to factor out a constructor
> function.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> diff-lib.c | 24 ++++++++++++------------
> 1 file changed, 12 insertions(+), 12 deletions(-)
Makes sense.
>
> diff --git a/diff-lib.c b/diff-lib.c
> index c6d3bc4d37..85b8f1fa59 100644
> --- a/diff-lib.c
> +++ b/diff-lib.c
> @@ -156,18 +156,6 @@ void run_diff_files(struct rev_info *revs, unsigned int option)
> size_t path_len;
> struct stat st;
>
> - path_len = ce_namelen(ce);
> -
> - dpath = xmalloc(combine_diff_path_size(5, path_len));
> - dpath->path = (char *) &(dpath->parent[5]);
> -
> - dpath->next = NULL;
> - memcpy(dpath->path, ce->name, path_len);
> - dpath->path[path_len] = '\0';
> - oidclr(&dpath->oid, the_repository->hash_algo);
> - memset(&(dpath->parent[0]), 0,
> - sizeof(struct combine_diff_parent)*5);
> -
> changed = check_removed(ce, &st);
> if (!changed)
> wt_mode = ce_mode_from_stat(ce, st.st_mode);
> @@ -178,7 +166,19 @@ void run_diff_files(struct rev_info *revs, unsigned int option)
> }
> wt_mode = 0;
> }
> +
> + path_len = ce_namelen(ce);
> +
> + dpath = xmalloc(combine_diff_path_size(5, path_len));
> + dpath->path = (char *) &(dpath->parent[5]);
> +
> + dpath->next = NULL;
> + memcpy(dpath->path, ce->name, path_len);
> + dpath->path[path_len] = '\0';
> + oidclr(&dpath->oid, the_repository->hash_algo);
> dpath->mode = wt_mode;
> + memset(&(dpath->parent[0]), 0,
> + sizeof(struct combine_diff_parent)*5);
>
> while (i < entries) {
> struct cache_entry *nce = istate->cache[i];
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH 02/14] combine-diff: add combine_diff_path_new()
2025-01-09 8:32 ` [PATCH 02/14] combine-diff: add combine_diff_path_new() Jeff King
@ 2025-01-09 18:05 ` Junio C Hamano
2025-01-13 15:40 ` Patrick Steinhardt
1 sibling, 0 replies; 38+ messages in thread
From: Junio C Hamano @ 2025-01-09 18:05 UTC (permalink / raw)
To: Jeff King; +Cc: Git List, Wink Saville
Jeff King <peff@peff.net> writes:
> +struct combine_diff_path *combine_diff_path_new(const char *path,
> + size_t path_len,
> + unsigned int mode,
> + const struct object_id *oid,
> + size_t num_parents)
> +{
> + struct combine_diff_path *p;
> +
> + p = xmalloc(combine_diff_path_size(num_parents, path_len));
> + p->path = (char *)&(p->parent[num_parents]);
> + memcpy(p->path, path, path_len);
> + p->path[path_len] = 0;
> + p->next = NULL;
> + p->mode = mode;
> + oidcpy(&p->oid, oid);
> +
> + memset(p->parent, 0, sizeof(p->parent[0]) * num_parents);
> +
> + return p;
> +}
OK, I can see how the structure is laid out clearly in this code,
but I have to say it is one ugly hack X-<. At least with the
refactoring, it becomes much easier to see what the caller is doing.
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH 03/14] tree-diff: clear parent array in path_appendnew()
2025-01-09 8:33 ` [PATCH 03/14] tree-diff: clear parent array in path_appendnew() Jeff King
@ 2025-01-09 18:28 ` Junio C Hamano
2025-01-10 10:54 ` Jeff King
0 siblings, 1 reply; 38+ messages in thread
From: Junio C Hamano @ 2025-01-09 18:28 UTC (permalink / raw)
To: Jeff King; +Cc: Git List, Wink Saville
Jeff King <peff@peff.net> writes:
> All of the other functions which allocate a combine_diff_path struct
> zero out the parent array, but this code path does not. There's no bug,
> since our caller will fill in most of the fields. But leaving the unused
> fields (like combine_diff_parent.path) uninitialized makes working with
> the struct more error-prone than it needs to be.
OK. We however will still not use the array at all when we do not
need it, so it would be between accessing uninitialized bytes vs
accessing 0-bytes by mistake? With my devil's advocate hat on, I
wonder if this would lead to more sloppy users saying "I am not
following the pointer; I am merely stopping when I see a NULL
pointer at the end of the array" or something silly like that
without checking the validity of the array itself (which presumably
can be inferred by inspecting some other member in the containing
struct, right?)".
> Let's just zero the parent field to be consistent with the
> combine_diff_path_new() allocator.
But I like the "let's be consistent" reasoning, so I wouldn't
complain ;-)
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> tree-diff.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/tree-diff.c b/tree-diff.c
> index d9237ffd9b..24f7b5912c 100644
> --- a/tree-diff.c
> +++ b/tree-diff.c
> @@ -151,8 +151,6 @@ static int emit_diff_first_parent_only(struct diff_options *opt, struct combine_
> * process(p);
> * p = pprev;
> * ; don't forget to free tail->next in the end
> - *
> - * p->parent[] remains uninitialized.
> */
> static struct combine_diff_path *path_appendnew(struct combine_diff_path *last,
> int nparent, const struct strbuf *base, const char *path, int pathlen,
> @@ -187,6 +185,8 @@ static struct combine_diff_path *path_appendnew(struct combine_diff_path *last,
> p->mode = mode;
> oidcpy(&p->oid, oid ? oid : null_oid());
>
> + memset(p->parent, 0, sizeof(p->parent[0]) * nparent);
> +
> return p;
> }
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH 04/14] combine-diff: use pointer for parent paths
2025-01-09 8:42 ` [PATCH 04/14] combine-diff: use pointer for parent paths Jeff King
@ 2025-01-09 18:49 ` Junio C Hamano
0 siblings, 0 replies; 38+ messages in thread
From: Junio C Hamano @ 2025-01-09 18:49 UTC (permalink / raw)
To: Jeff King; +Cc: Git List, Wink Saville
Jeff King <peff@peff.net> writes:
> Commit d76ce4f734 (log,diff-tree: add --combined-all-paths option,
> 2019-02-07) added a "path" field to each combine_diff_parent struct.
> It's defined as a strbuf, but this is overkill. We never manipulate the
> buffer beyond inserting a single string into it.
>
> And in fact there's a small bug: we zero the parent structs, including
> the path strbufs. For the 0th parent, we strbuf_init() the strbuf before
> adding to it. But for subsequent parents, we never do the init. This is
> technically violating the strbuf API, though the code there is resilient
> enough to handle this zero'd state.
>
> This patch switches us to just store an allocated string pointer.
> Zeroing it is enough to properly initialize it there (modulo the usual
> assumption we make that a NULL pointer is all-zeroes).
Yay! Every time I see an array of strbufs, my skin tingles. Thanks
for cleaning this up.
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH 03/14] tree-diff: clear parent array in path_appendnew()
2025-01-09 18:28 ` Junio C Hamano
@ 2025-01-10 10:54 ` Jeff King
0 siblings, 0 replies; 38+ messages in thread
From: Jeff King @ 2025-01-10 10:54 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Git List, Wink Saville
On Thu, Jan 09, 2025 at 10:28:05AM -0800, Junio C Hamano wrote:
> Jeff King <peff@peff.net> writes:
>
> > All of the other functions which allocate a combine_diff_path struct
> > zero out the parent array, but this code path does not. There's no bug,
> > since our caller will fill in most of the fields. But leaving the unused
> > fields (like combine_diff_parent.path) uninitialized makes working with
> > the struct more error-prone than it needs to be.
>
> OK. We however will still not use the array at all when we do not
> need it, so it would be between accessing uninitialized bytes vs
> accessing 0-bytes by mistake? With my devil's advocate hat on, I
> wonder if this would lead to more sloppy users saying "I am not
> following the pointer; I am merely stopping when I see a NULL
> pointer at the end of the array" or something silly like that
> without checking the validity of the array itself (which presumably
> can be inferred by inspecting some other member in the containing
> struct, right?)".
Yes, code may be equally wrong to look at uninitialized versus zero
bytes, depending on what it's doing. I don't think "stop when you see
NULL" is a danger here; this is an array of structs, one of which now
happens to be NULL (rather than an array of char pointers, which might
imply that NULL is the end).
Some of that sloppiness already exists. For instance, before my series,
check out intersect_paths(). If we are removing an element from the
list, we clean it up like this:
for (j = 0; j < num_parent; j++)
if (combined_all_paths &&
filename_changed(p->parent[j].status))
strbuf_release(&p->parent[j].path);
but if we allocated for 3 parents and have only gotten to the second
pass, all of parent[2] will never have been filled in. We zero
initialize the parents in that function, so there's no memory error. But
it is relying on the fact that filename_changed() will reject a zero
status to avoid calling strbuf_release() on a zero'd strbuf (which
incidentally also works, but violates the strbuf API).
Now in that case we are zero-ing, so it is not one of the uninitialized
cases that Wink ran into. But even if he had tried to be careful with:
if (filename_changed(p->parent[i].status))
/* ok to look at p->parent[i].path */
it would not have worked, because that status would have been
uninitialized, too.
> > Let's just zero the parent field to be consistent with the
> > combine_diff_path_new() allocator.
>
> But I like the "let's be consistent" reasoning, so I wouldn't
> complain ;-)
So yeah. This is the part that I think is really helping new code.
Changing the strbuf to a pointer makes it even simpler (you do not even
have to check the status at all), but this is the commit that is
preventing undefined behavior. ;)
-Peff
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH 06/14] run_diff_files(): de-mystify the size of combine_diff_path struct
2025-01-09 8:44 ` [PATCH 06/14] run_diff_files(): de-mystify the size of combine_diff_path struct Jeff King
@ 2025-01-10 16:40 ` Junio C Hamano
0 siblings, 0 replies; 38+ messages in thread
From: Junio C Hamano @ 2025-01-10 16:40 UTC (permalink / raw)
To: Jeff King; +Cc: Git List, Wink Saville
Jeff King <peff@peff.net> writes:
> That 4 comes from the earliest code in ea726d02e9 (diff-files: -c and
> --cc options., 2006-01-28).
Thanks. I have no idea where that hardcoded constant 4 came from,
but I think you are right that 2 would have been the correct number
ea726d02e9 shoudl have used there.
> + /*
> + * Allocate space for two parents, which will come from
> + * index stages #2 and #3, if present. Below we'll fill
> + * these from (stage - 2).
> + */
> dpath = combine_diff_path_new(ce->name, ce_namelen(ce),
> - wt_mode, null_oid(), 5);
> + wt_mode, null_oid(), 2);
>
> while (i < entries) {
> struct cache_entry *nce = istate->cache[i];
Perfect.
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH 09/14] tree-diff: inline path_appendnew()
2025-01-09 8:49 ` [PATCH 09/14] tree-diff: inline path_appendnew() Jeff King
@ 2025-01-11 0:41 ` Junio C Hamano
0 siblings, 0 replies; 38+ messages in thread
From: Junio C Hamano @ 2025-01-11 0:41 UTC (permalink / raw)
To: Jeff King; +Cc: Git List, Wink Saville
Jeff King <peff@peff.net> writes:
> Our path_appendnew() has been simplified to the point that it is mostly
> just implementing combine_diff_path_new(), plus setting the "next"
> pointer. Since there's only one caller, let's replace it completely with
> a call to that helper function.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> tree-diff.c | 31 ++++---------------------------
> 1 file changed, 4 insertions(+), 27 deletions(-)
Very nice, indeed.
> -static struct combine_diff_path *path_appendnew(struct combine_diff_path *last,
> - int nparent, const char *path, size_t len,
> - unsigned mode, const struct object_id *oid)
> -{
> - struct combine_diff_path *p;
> - size_t alloclen = combine_diff_path_size(nparent, len);
> -
> - p = xmalloc(alloclen);
> - p->next = NULL;
> - last->next = p;
> -
> - p->path = (char *)&(p->parent[nparent]);
> - memcpy(p->path, path, len);
> - p->path[len] = 0;
> - p->mode = mode;
> - oidcpy(&p->oid, oid ? oid : null_oid());
> -
> - memset(p->parent, 0, sizeof(p->parent[0]) * nparent);
> -
> - return p;
> -}
> -
> /*
> * new path should be added to combine diff
> *
> @@ -206,7 +180,10 @@ static struct combine_diff_path *emit_path(struct combine_diff_path *p,
> struct combine_diff_path *pprev = p;
>
> strbuf_add(base, path, pathlen);
> - p = path_appendnew(p, nparent, base->buf, base->len, mode, oid);
> + p = combine_diff_path_new(base->buf, base->len, mode,
> + oid ? oid : null_oid(),
> + nparent);
> + pprev->next = p;
> strbuf_setlen(base, old_baselen);
>
> for (i = 0; i < nparent; ++i) {
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH 08/14] tree-diff: pass whole path string to path_appendnew()
2025-01-09 8:49 ` [PATCH 08/14] tree-diff: pass whole path string to path_appendnew() Jeff King
@ 2025-01-13 15:40 ` Patrick Steinhardt
2025-01-14 9:26 ` Jeff King
0 siblings, 1 reply; 38+ messages in thread
From: Patrick Steinhardt @ 2025-01-13 15:40 UTC (permalink / raw)
To: Jeff King; +Cc: Git List, Junio C Hamano, Wink Saville
On Thu, Jan 09, 2025 at 03:49:07AM -0500, Jeff King wrote:
> diff --git a/tree-diff.c b/tree-diff.c
> index 22fc2d8f8c..d2f8dd14a6 100644
> --- a/tree-diff.c
> +++ b/tree-diff.c
> @@ -129,20 +129,18 @@ static int emit_diff_first_parent_only(struct diff_options *opt, struct combine_
> * and append it to paths list tail.
> */
> static struct combine_diff_path *path_appendnew(struct combine_diff_path *last,
> - int nparent, const struct strbuf *base, const char *path, int pathlen,
> + int nparent, const char *path, size_t len,
Sneaky, you also changed the type of `len` :) You might want to point
that out in the commit message.
> unsigned mode, const struct object_id *oid)
> {
> struct combine_diff_path *p;
> - size_t len = st_add(base->len, pathlen);
> size_t alloclen = combine_diff_path_size(nparent, len);
>
> p = xmalloc(alloclen);
> p->next = NULL;
> last->next = p;
>
> p->path = (char *)&(p->parent[nparent]);
> - memcpy(p->path, base->buf, base->len);
> - memcpy(p->path + base->len, path, pathlen);
> + memcpy(p->path, path, len);
> p->path[len] = 0;
> p->mode = mode;
> oidcpy(&p->oid, oid ? oid : null_oid());
> @@ -206,7 +204,10 @@ static struct combine_diff_path *emit_path(struct combine_diff_path *p,
> if (emitthis) {
> int keep;
> struct combine_diff_path *pprev = p;
> - p = path_appendnew(p, nparent, base, path, pathlen, mode, oid);
> +
> + strbuf_add(base, path, pathlen);
> + p = path_appendnew(p, nparent, base->buf, base->len, mode, oid);
> + strbuf_setlen(base, old_baselen);
>
> for (i = 0; i < nparent; ++i) {
> /*
Makes sense. And there is a single caller of `path_appendnew()`, only,
so no further changes should be required.
Patrick
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH 02/14] combine-diff: add combine_diff_path_new()
2025-01-09 8:32 ` [PATCH 02/14] combine-diff: add combine_diff_path_new() Jeff King
2025-01-09 18:05 ` Junio C Hamano
@ 2025-01-13 15:40 ` Patrick Steinhardt
2025-01-14 9:29 ` Jeff King
1 sibling, 1 reply; 38+ messages in thread
From: Patrick Steinhardt @ 2025-01-13 15:40 UTC (permalink / raw)
To: Jeff King; +Cc: Git List, Junio C Hamano, Wink Saville
On Thu, Jan 09, 2025 at 03:32:36AM -0500, Jeff King wrote:
> The combine_diff_path struct has variable size, since it embeds both the
> memory allocation for the path field as well as a variable-sized parent
> array. This makes allocating one a bit tricky.
>
> We have a helper to compute the required size, but it's up to individual
> sites to actually initialize all of the fields. Let's provide a
> constructor function to make that a little nicer. Besides being shorter,
> it also hides away tricky bits like the computation of the "path"
> pointer (which is right after the "parent" flex array).
>
> As a bonus, using the same constructor everywhere means that we'll
> consistently initialize all parts of the struct. A few code paths left
> the parent array unitialized. This didn't cause any bugs, but we'll be
> able to simplify some code in the next few patches knowing that the
> parent fields have all been zero'd.
>
> This also gets rid of some questionable uses of "int" to store buffer
> lengths. Though we do use them to allocate, I don't think there are any
> integer overflow vulnerabilities here (the allocation helper promotes
> them to size_t and checks arithmetic for overflow, and the actual memcpy
> of the bytes is done using the possibly-truncated "int" value).
>
> Sadly we can't use the FLEX_* macros to simplify the allocation here,
> because there are two variable-sized parts to the struct (and those
> macros only handle one).
>
> Nor can we get stop publicly declaring combine_diff_path_size(). This
s/we get stop/we stop/
> diff --git a/combine-diff.c b/combine-diff.c
> index 641bc92dbd..45548fd438 100644
> --- a/combine-diff.c
> +++ b/combine-diff.c
> @@ -47,22 +47,13 @@ static struct combine_diff_path *intersect_paths(
>
> if (!n) {
> for (i = 0; i < q->nr; i++) {
> - int len;
> - const char *path;
> if (diff_unmodified_pair(q->queue[i]))
> continue;
> - path = q->queue[i]->two->path;
> - len = strlen(path);
> - p = xmalloc(combine_diff_path_size(num_parent, len));
> - p->path = (char *) &(p->parent[num_parent]);
> - memcpy(p->path, path, len);
> - p->path[len] = 0;
> - p->next = NULL;
> - memset(p->parent, 0,
> - sizeof(p->parent[0]) * num_parent);
> -
> - oidcpy(&p->oid, &q->queue[i]->two->oid);
> - p->mode = q->queue[i]->two->mode;
> + p = combine_diff_path_new(q->queue[i]->two->path,
> + strlen(q->queue[i]->two->path),
> + q->queue[i]->two->mode,
> + &q->queue[i]->two->oid,
> + num_parent);
> oidcpy(&p->parent[n].oid, &q->queue[i]->one->oid);
> p->parent[n].mode = q->queue[i]->one->mode;
> p->parent[n].status = q->queue[i]->status;
> @@ -1667,3 +1658,24 @@ void diff_tree_combined_merge(const struct commit *commit,
> diff_tree_combined(&commit->object.oid, &parents, rev);
> oid_array_clear(&parents);
> }
> +
> +struct combine_diff_path *combine_diff_path_new(const char *path,
> + size_t path_len,
> + unsigned int mode,
> + const struct object_id *oid,
> + size_t num_parents)
> +{
> + struct combine_diff_path *p;
> +
> + p = xmalloc(combine_diff_path_size(num_parents, path_len));
> + p->path = (char *)&(p->parent[num_parents]);
> + memcpy(p->path, path, path_len);
> + p->path[path_len] = 0;
> + p->next = NULL;
> + p->mode = mode;
> + oidcpy(&p->oid, oid);
> +
> + memset(p->parent, 0, sizeof(p->parent[0]) * num_parents);
> +
> + return p;
> +}
If I were to write this anew I'd probably use `xcalloc()` instead of
manually `memset()`ing parts of it to zero. But it's a faithful
transplant of the code from `intersect_paths()`, so that's probably
okay.
Patrick
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH 05/14] diff: add a comment about combine_diff_path.parent.path
2025-01-09 8:42 ` [PATCH 05/14] diff: add a comment about combine_diff_path.parent.path Jeff King
@ 2025-01-13 15:40 ` Patrick Steinhardt
0 siblings, 0 replies; 38+ messages in thread
From: Patrick Steinhardt @ 2025-01-13 15:40 UTC (permalink / raw)
To: Jeff King; +Cc: Git List, Junio C Hamano, Wink Saville
On Thu, Jan 09, 2025 at 03:42:48AM -0500, Jeff King wrote:
> We only fill in the per-parent "path" field when it differs from what's
> in combine_diff_path.path (and even then only when the option is
> appropriate). Let's document that.
>
> Suggested-by: Wink Saville <wink@saville.com>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> diff.h | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/diff.h b/diff.h
> index f5f6ea00fb..60e7db4ad6 100644
> --- a/diff.h
> +++ b/diff.h
> @@ -480,6 +480,12 @@ struct combine_diff_path {
> char status;
> unsigned int mode;
> struct object_id oid;
> + /*
> + * This per-parent path is filled only when doing a combined
> + * diff with revs.combined_all_paths set, and only if the path
> + * differs from the post-image (e.g., a rename or copy).
> + * Otherwise it is left NULL.
> + */
> char *path;
> } parent[FLEX_ARRAY];
> };
I feel like this change would've neatly fit into the preceding commit,
but don't mind it much either way.
Patrick
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH 07/14] tree-diff: drop path_appendnew() alloc optimization
2025-01-09 8:46 ` [PATCH 07/14] tree-diff: drop path_appendnew() alloc optimization Jeff King
@ 2025-01-13 15:40 ` Patrick Steinhardt
2025-01-14 10:30 ` Jeff King
0 siblings, 1 reply; 38+ messages in thread
From: Patrick Steinhardt @ 2025-01-13 15:40 UTC (permalink / raw)
To: Jeff King; +Cc: Git List, Junio C Hamano, Wink Saville
On Thu, Jan 09, 2025 at 03:46:49AM -0500, Jeff King wrote:
> So my conclusion is that it probably does help a little, but it's mostly
> lost in the noise. I could see an argument for keeping it, as the
> complexity is hidden away in functions that do not often need to be
> touched. But it does make them more confusing than necessary (despite
> some detailed explanations from the author of that commit; it just took
> me a while to wrap my head around what was going on) and prevents
> further refactoring of the combine_diff_path struct. So let's drop it.
A 1% performance speedup does not feel like a good argument to me, so
I'm perfectly fine with dropping the code, even if most of it is
actually in the form of comments. But that already shows that it needs
quite a bit of explanation.
I wonder though: did you also use e.g. Valgrind to compare the number of
allocations? glibc tends to be heavily optimized with regards to small
allocations, so you typically don't notice the performance impact caused
by them even when the number of saved allocations is significant. So the
effect might be more pronounced with other libcs that aren't optimized
for such usecases, like e.g. musl libc.
Patrick
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH 08/14] tree-diff: pass whole path string to path_appendnew()
2025-01-13 15:40 ` Patrick Steinhardt
@ 2025-01-14 9:26 ` Jeff King
0 siblings, 0 replies; 38+ messages in thread
From: Jeff King @ 2025-01-14 9:26 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: Git List, Junio C Hamano, Wink Saville
On Mon, Jan 13, 2025 at 04:40:00PM +0100, Patrick Steinhardt wrote:
> On Thu, Jan 09, 2025 at 03:49:07AM -0500, Jeff King wrote:
> > diff --git a/tree-diff.c b/tree-diff.c
> > index 22fc2d8f8c..d2f8dd14a6 100644
> > --- a/tree-diff.c
> > +++ b/tree-diff.c
> > @@ -129,20 +129,18 @@ static int emit_diff_first_parent_only(struct diff_options *opt, struct combine_
> > * and append it to paths list tail.
> > */
> > static struct combine_diff_path *path_appendnew(struct combine_diff_path *last,
> > - int nparent, const struct strbuf *base, const char *path, int pathlen,
> > + int nparent, const char *path, size_t len,
>
> Sneaky, you also changed the type of `len` :) You might want to point
> that out in the commit message.
Sort of. The original took a (ptr,size_t) pair in the form of "base",
and then also a (ptr,int) path. That matches what the caller has:
"pathlen" comes from tree_entry(), which returns an int (it should
probably become a size_t in the long run, but it has a lot of ripple
effects if you change it).
Now the caller handles path/pathlen itself here:
> > + strbuf_add(base, path, pathlen);
So there is nothing left to pass in except a (ptr,size_t) pair. We could
have continued passing those in as a strbuf, but calling it "base"
doesn't make sense any more.
The "int" is still there, but it just stays in the caller. In the
original it becomes a size_t via passing to combine_diff_path_size(). In
the new code, it happens when we feed it to strbuf_add().
-Peff
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH 02/14] combine-diff: add combine_diff_path_new()
2025-01-13 15:40 ` Patrick Steinhardt
@ 2025-01-14 9:29 ` Jeff King
0 siblings, 0 replies; 38+ messages in thread
From: Jeff King @ 2025-01-14 9:29 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: Git List, Junio C Hamano, Wink Saville
On Mon, Jan 13, 2025 at 04:40:19PM +0100, Patrick Steinhardt wrote:
> > +struct combine_diff_path *combine_diff_path_new(const char *path,
> > + size_t path_len,
> > + unsigned int mode,
> > + const struct object_id *oid,
> > + size_t num_parents)
> > +{
> > + struct combine_diff_path *p;
> > +
> > + p = xmalloc(combine_diff_path_size(num_parents, path_len));
> > + p->path = (char *)&(p->parent[num_parents]);
> > + memcpy(p->path, path, path_len);
> > + p->path[path_len] = 0;
> > + p->next = NULL;
> > + p->mode = mode;
> > + oidcpy(&p->oid, oid);
> > +
> > + memset(p->parent, 0, sizeof(p->parent[0]) * num_parents);
> > +
> > + return p;
> > +}
>
> If I were to write this anew I'd probably use `xcalloc()` instead of
> manually `memset()`ing parts of it to zero. But it's a faithful
> transplant of the code from `intersect_paths()`, so that's probably
> okay.
Yeah, I actually wrote it that way originally (thinking the issue was
that we were leaving uninitialized fields all over), before realizing
that most callers were explicitly zero-ing the parents. So I went for
the minimal change.
From an efficiency standpoint, I don't know that it matters much between
the two (xcalloc would zero some fields which we're going to assign
anyway, but the zeroing may be more efficient on the backend). xcalloc
means you'd never forget to initialize any part of it, so maybe it's
more readable / less error prone?
We could do a patch on top, but I doubt it's a big deal either way.
-Peff
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH 07/14] tree-diff: drop path_appendnew() alloc optimization
2025-01-13 15:40 ` Patrick Steinhardt
@ 2025-01-14 10:30 ` Jeff King
0 siblings, 0 replies; 38+ messages in thread
From: Jeff King @ 2025-01-14 10:30 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: Git List, Junio C Hamano, Wink Saville
On Mon, Jan 13, 2025 at 04:40:28PM +0100, Patrick Steinhardt wrote:
> On Thu, Jan 09, 2025 at 03:46:49AM -0500, Jeff King wrote:
> > So my conclusion is that it probably does help a little, but it's mostly
> > lost in the noise. I could see an argument for keeping it, as the
> > complexity is hidden away in functions that do not often need to be
> > touched. But it does make them more confusing than necessary (despite
> > some detailed explanations from the author of that commit; it just took
> > me a while to wrap my head around what was going on) and prevents
> > further refactoring of the combine_diff_path struct. So let's drop it.
>
> A 1% performance speedup does not feel like a good argument to me, so
> I'm perfectly fine with dropping the code, even if most of it is
> actually in the form of comments. But that already shows that it needs
> quite a bit of explanation.
>
> I wonder though: did you also use e.g. Valgrind to compare the number of
> allocations? glibc tends to be heavily optimized with regards to small
> allocations, so you typically don't notice the performance impact caused
> by them even when the number of saved allocations is significant. So the
> effect might be more pronounced with other libcs that aren't optimized
> for such usecases, like e.g. musl libc.
I didn't use valgrind, but I did confirm via some hacky printf() calls
that the optimization does kick in. Here's a version with counting:
diff --git a/tree-diff.c b/tree-diff.c
index d9237ffd9b..60db2b2f51 100644
--- a/tree-diff.c
+++ b/tree-diff.c
@@ -154,6 +154,11 @@ static int emit_diff_first_parent_only(struct diff_options *opt, struct combine_
*
* p->parent[] remains uninitialized.
*/
+static int hit, total;
+void show_counter(void)
+{
+ warning("%d / %d\n", hit, total);
+}
static struct combine_diff_path *path_appendnew(struct combine_diff_path *last,
int nparent, const struct strbuf *base, const char *path, int pathlen,
unsigned mode, const struct object_id *oid)
@@ -168,6 +173,11 @@ static struct combine_diff_path *path_appendnew(struct combine_diff_path *last,
FREE_AND_NULL(p);
}
+ if (!total++)
+ atexit(show_counter);
+ if (p)
+ hit++;
+
if (!p) {
p = xmalloc(alloclen);
It seems to kick in about half of the time when running "git log --raw"
on git.git and linux.git. The absolute best case for the optimization is
comparing two trees with all entries of the same size, and all changed,
like:
git init
blob1=$(echo one | git hash-object -w --stdin)
blob2=$(echo two | git hash-object -w --stdin)
mktree() {
perl -e '
printf "100644 blob %s\tpath%08d\n", $ARGV[0], $_ for (1..1000000)
' $1
}
git tag tree1 $(mktree $blob1 | git mktree)
git tag tree2 $(mktree $blob2 | git mktree)
git diff-tree tree1 tree2
In that optimal case I see ~3% speedup on glibc. If somebody on a
platform with a different allocator can show a bigger change, that would
definitely be interesting.
I suspect it won't make that big a difference even with a slower
allocator, though, because each changed path involves other allocations
(like creating a diff_pair).
Running under valgrind with that optimal case, the old code does ~3M
allocations (so 3 per entry). Now we do 4 per entry.
So if we really care about micro-optimizing, I suspect a more productive
path would be getting a better allocator. ;) Here are hyperfine results
for the existing code ("old") versus my series ("new") with the glibc
allocator versus jemalloc:
Benchmark 1: LD_PRELOAD= ./git.old -C repo diff-tree tree1 tree2
Time (mean ± σ): 625.3 ms ± 13.3 ms [User: 547.9 ms, System: 77.3 ms]
Range (min … max): 599.8 ms … 649.9 ms 10 runs
Benchmark 2: LD_PRELOAD= ./git.new -C repo diff-tree tree1 tree2
Time (mean ± σ): 650.8 ms ± 14.5 ms [User: 568.2 ms, System: 82.5 ms]
Range (min … max): 632.2 ms … 673.6 ms 10 runs
Benchmark 3: LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2 ./git.old -C repo diff-tree tree1 tree2
Time (mean ± σ): 563.9 ms ± 9.2 ms [User: 538.4 ms, System: 25.3 ms]
Range (min … max): 545.4 ms … 571.0 ms 10 runs
Benchmark 4: LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2 ./git.new -C repo diff-tree tree1 tree2
Time (mean ± σ): 582.9 ms ± 10.8 ms [User: 545.1 ms, System: 37.7 ms]
Range (min … max): 568.6 ms … 595.5 ms 10 runs
Summary
LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2 ./git.old -C repo diff-tree tree1 tree2 ran
1.03 ± 0.03 times faster than LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2 ./git.new -C repo diff-tree tree1 tree2
1.11 ± 0.03 times faster than LD_PRELOAD= ./git.old -C repo diff-tree tree1 tree2
1.15 ± 0.03 times faster than LD_PRELOAD= ./git.new -C repo diff-tree tree1 tree2
So rather than saving 2-3%, a better allocator gives you 10-15% (again,
these are pretty synthetic numbers because this is a pathological test
case). It is still faster to do fewer allocations with jemalloc, but
both the relative and absolute improvement is smaller.
-Peff
^ permalink raw reply related [flat|nested] 38+ messages in thread
* Re: [PATCH 11/14] tree-diff: drop list-tail argument to diff_tree_paths()
2025-01-09 8:51 ` [PATCH 11/14] tree-diff: drop list-tail argument to diff_tree_paths() Jeff King
@ 2025-01-18 0:33 ` Junio C Hamano
0 siblings, 0 replies; 38+ messages in thread
From: Junio C Hamano @ 2025-01-18 0:33 UTC (permalink / raw)
To: Jeff King; +Cc: Git List, Wink Saville
Jeff King <peff@peff.net> writes:
> The internals of the path diffing code, including ll_diff_tree_paths(),
> all take an extra combine_diff_path parameter which they use as the tail
> of a list of results, appending any new entries to it.
>
> The public-facing diff_tree_paths() takes the same argument, but it just
> makes the callers more awkward. They always start with a clean list, and
> have to set up a fake head struct to pass in.
>
> Let's keep the public API clean by always returning a new list. That
> keeps the fake struct as an implementation detail of tree-diff.c.
Yes, this is much nicer. I've always hated these code paths related
to "multitree" optimization, but these clean-ups make them more
palatable.
Thanks.
^ permalink raw reply [flat|nested] 38+ messages in thread
end of thread, other threads:[~2025-01-18 0:33 UTC | newest]
Thread overview: 38+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-01-03 19:28 [BUGREPORT] git diff-tree --cc SEGFAUTs Wink Saville
2025-01-03 20:46 ` Jeff King
2025-01-03 23:34 ` Wink Saville
2025-01-04 0:31 ` Jeff King
2025-01-04 2:55 ` Junio C Hamano
2025-01-04 3:32 ` Jeff King
2025-01-04 18:09 ` Wink Saville
2025-01-05 22:13 ` Wink Saville
2025-01-09 8:27 ` [PATCH 0/14] combine-diff cleanups Jeff King
2025-01-09 8:28 ` [PATCH 01/14] run_diff_files(): delay allocation of combine_diff_path Jeff King
2025-01-09 17:57 ` Junio C Hamano
2025-01-09 8:32 ` [PATCH 02/14] combine-diff: add combine_diff_path_new() Jeff King
2025-01-09 18:05 ` Junio C Hamano
2025-01-13 15:40 ` Patrick Steinhardt
2025-01-14 9:29 ` Jeff King
2025-01-09 8:33 ` [PATCH 03/14] tree-diff: clear parent array in path_appendnew() Jeff King
2025-01-09 18:28 ` Junio C Hamano
2025-01-10 10:54 ` Jeff King
2025-01-09 8:42 ` [PATCH 04/14] combine-diff: use pointer for parent paths Jeff King
2025-01-09 18:49 ` Junio C Hamano
2025-01-09 8:42 ` [PATCH 05/14] diff: add a comment about combine_diff_path.parent.path Jeff King
2025-01-13 15:40 ` Patrick Steinhardt
2025-01-09 8:44 ` [PATCH 06/14] run_diff_files(): de-mystify the size of combine_diff_path struct Jeff King
2025-01-10 16:40 ` Junio C Hamano
2025-01-09 8:46 ` [PATCH 07/14] tree-diff: drop path_appendnew() alloc optimization Jeff King
2025-01-13 15:40 ` Patrick Steinhardt
2025-01-14 10:30 ` Jeff King
2025-01-09 8:49 ` [PATCH 08/14] tree-diff: pass whole path string to path_appendnew() Jeff King
2025-01-13 15:40 ` Patrick Steinhardt
2025-01-14 9:26 ` Jeff King
2025-01-09 8:49 ` [PATCH 09/14] tree-diff: inline path_appendnew() Jeff King
2025-01-11 0:41 ` Junio C Hamano
2025-01-09 8:50 ` [PATCH 10/14] combine-diff: drop public declaration of combine_diff_path_size() Jeff King
2025-01-09 8:51 ` [PATCH 11/14] tree-diff: drop list-tail argument to diff_tree_paths() Jeff King
2025-01-18 0:33 ` Junio C Hamano
2025-01-09 8:53 ` [PATCH 12/14] tree-diff: use the name "tail" to refer to list tail Jeff King
2025-01-09 8:54 ` [PATCH 13/14] tree-diff: simplify emit_path() list management Jeff King
2025-01-09 8:57 ` [PATCH 14/14] tree-diff: make list tail-passing more explicit Jeff King
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).