* [PATCH 0/2] xdiff: Remove unneeded members from xrecord_t and xdlclass_t
@ 2026-01-26 10:48 Phillip Wood
2026-01-26 10:48 ` [PATCH 1/2] xdiff: remove "line_hash" field from xrecord_t Phillip Wood
` (2 more replies)
0 siblings, 3 replies; 5+ messages in thread
From: Phillip Wood @ 2026-01-26 10:48 UTC (permalink / raw)
To: git; +Cc: Ezekiel Newren
From: Phillip Wood <phillip.wood@dunelm.org.uk>
This series has a couple of cleanups on top of 'en/xdiff-cleanup-2'
that reduce the sizes of the xrecord_t and xdlclass_t. Unfortunately
they conflict with 'en/xdiff-cleanup-3' in seen, in particular with
db8a50ca6b9 (xdiff: don't waste time guessing the number of lines,
2026-01-02). I'm not particularly convinced that moving the call to
xdl_classify_record() out of xdl_prepare_ctx() in that commit is
a good idea, but if we decide that we do want to stop classifying
lines in xdl_prepare_ctx() we can start passing the hashes out in a
separate array rather than wasting space in xrecord_t.
Base-Commit: 1faf5b085a171f9ba9a6d7a446e0de16acccb1dc
Published-As: https://github.com/phillipwood/git/releases/tag/pw%2Fxdiff-cleanup-xrecord_t-and-xdlclass_t%2Fv1
View-Changes-At: https://github.com/phillipwood/git/compare/1faf5b085...0d251dfba
Fetch-It-Via: git fetch https://github.com/phillipwood/git pw/xdiff-cleanup-xrecord_t-and-xdlclass_t/v1
Phillip Wood (2):
xdiff: remove "line_hash" field from xrecord_t
xdiff: remove unused data from xdlclass_t
xdiff/xprepare.c | 20 ++++++++++++--------
xdiff/xtypes.h | 1 -
2 files changed, 12 insertions(+), 9 deletions(-)
--
2.52.0.362.g884e03848a9
^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH 1/2] xdiff: remove "line_hash" field from xrecord_t
2026-01-26 10:48 [PATCH 0/2] xdiff: Remove unneeded members from xrecord_t and xdlclass_t Phillip Wood
@ 2026-01-26 10:48 ` Phillip Wood
2026-01-26 10:48 ` [PATCH 2/2] xdiff: remove unused data from xdlclass_t Phillip Wood
2026-01-26 17:35 ` [PATCH 0/2] xdiff: Remove unneeded members from xrecord_t and xdlclass_t Junio C Hamano
2 siblings, 0 replies; 5+ messages in thread
From: Phillip Wood @ 2026-01-26 10:48 UTC (permalink / raw)
To: git; +Cc: Ezekiel Newren, Phillip Wood
From: Phillip Wood <phillip.wood@dunelm.org.uk>
Prior to commit 6a26019c81 (xdiff: split xrecord_t.ha into line_hash
and minimal_perfect_hash, 2025-11-18) the "ha" field of xrecord_t
initially held the "line_hash" value and once the line had been
interned that field was updated to hold the "minimal_perfect_hash". The
"line_hash" is only used to intern the line so there is no point in
storing it after all the input lines have been interned.
Removing the "line_hash" field from xrecord_t and storing it in
xdlclass_t where it is actually used makes it clearer that it is a
temporary value and it should not be used once we're calculated the
"minimal_perfect_hash". This also reduces the size of xrecord_t by 25%
on 64-bit platforms and 40% on 32-bit platforms. While the struct is
small we create one instance per input line so any saving is welcome.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
xdiff/xprepare.c | 12 +++++++-----
xdiff/xtypes.h | 1 -
2 files changed, 7 insertions(+), 6 deletions(-)
diff --git a/xdiff/xprepare.c b/xdiff/xprepare.c
index 34c82e4f8e1..08e5d3f4dfa 100644
--- a/xdiff/xprepare.c
+++ b/xdiff/xprepare.c
@@ -34,6 +34,7 @@
#define INVESTIGATE 2
typedef struct s_xdlclass {
+ uint64_t line_hash;
struct s_xdlclass *next;
xrecord_t rec;
long idx;
@@ -92,13 +93,14 @@ static void xdl_free_classifier(xdlclassifier_t *cf) {
}
-static int xdl_classify_record(unsigned int pass, xdlclassifier_t *cf, xrecord_t *rec) {
+static int xdl_classify_record(unsigned int pass, xdlclassifier_t *cf, xrecord_t *rec,
+ uint64_t line_hash) {
size_t hi;
xdlclass_t *rcrec;
- hi = XDL_HASHLONG(rec->line_hash, cf->hbits);
+ hi = XDL_HASHLONG(line_hash, cf->hbits);
for (rcrec = cf->rchash[hi]; rcrec; rcrec = rcrec->next)
- if (rcrec->rec.line_hash == rec->line_hash &&
+ if (rcrec->line_hash == line_hash &&
xdl_recmatch((const char *)rcrec->rec.ptr, (long)rcrec->rec.size,
(const char *)rec->ptr, (long)rec->size, cf->flags))
break;
@@ -112,6 +114,7 @@ static int xdl_classify_record(unsigned int pass, xdlclassifier_t *cf, xrecord_t
if (XDL_ALLOC_GROW(cf->rcrecs, cf->count, cf->alloc))
return -1;
cf->rcrecs[rcrec->idx] = rcrec;
+ rcrec->line_hash = line_hash;
rcrec->rec = *rec;
rcrec->len1 = rcrec->len2 = 0;
rcrec->next = cf->rchash[hi];
@@ -158,8 +161,7 @@ static int xdl_prepare_ctx(unsigned int pass, mmfile_t *mf, long narec, xpparam_
crec = &xdf->recs[xdf->nrec++];
crec->ptr = prev;
crec->size = cur - prev;
- crec->line_hash = hav;
- if (xdl_classify_record(pass, cf, crec) < 0)
+ if (xdl_classify_record(pass, cf, crec, hav) < 0)
goto abort;
}
}
diff --git a/xdiff/xtypes.h b/xdiff/xtypes.h
index 979586f20a6..50aee779be3 100644
--- a/xdiff/xtypes.h
+++ b/xdiff/xtypes.h
@@ -41,7 +41,6 @@ typedef struct s_chastore {
typedef struct s_xrecord {
uint8_t const *ptr;
size_t size;
- uint64_t line_hash;
size_t minimal_perfect_hash;
} xrecord_t;
--
2.52.0.362.g884e03848a9
^ permalink raw reply related [flat|nested] 5+ messages in thread
* [PATCH 2/2] xdiff: remove unused data from xdlclass_t
2026-01-26 10:48 [PATCH 0/2] xdiff: Remove unneeded members from xrecord_t and xdlclass_t Phillip Wood
2026-01-26 10:48 ` [PATCH 1/2] xdiff: remove "line_hash" field from xrecord_t Phillip Wood
@ 2026-01-26 10:48 ` Phillip Wood
2026-01-26 17:35 ` [PATCH 0/2] xdiff: Remove unneeded members from xrecord_t and xdlclass_t Junio C Hamano
2 siblings, 0 replies; 5+ messages in thread
From: Phillip Wood @ 2026-01-26 10:48 UTC (permalink / raw)
To: git; +Cc: Ezekiel Newren, Phillip Wood
From: Phillip Wood <phillip.wood@dunelm.org.uk>
Prior to commit 6d507bd41a (xdiff: delete fields ha, line, size
in xdlclass_t in favor of an xrecord_t, 2025-09-26) xdlclass_t
carried a copy of all the fields in xrecord_t. That commit embedded
xrecord_t in xdlclass_t to make it easier to change the types of
the fields in xrecord_t. However commit 6a26019c81 (xdiff: split
xrecord_t.ha into line_hash and minimal_perfect_hash, 2025-11-18)
added the "minimal_perfect_hash" field to xrecord_t which is not
used by xdlclass_t. To avoid wasting space stop copying the whole
of xrecord_t and just copy the pointer and length that we need to
intern the line. Together with the previous commit this effectively
reverts 6d507bd41a.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
xdiff/xprepare.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/xdiff/xprepare.c b/xdiff/xprepare.c
index 08e5d3f4dfa..cd4fc405eb1 100644
--- a/xdiff/xprepare.c
+++ b/xdiff/xprepare.c
@@ -36,7 +36,8 @@
typedef struct s_xdlclass {
uint64_t line_hash;
struct s_xdlclass *next;
- xrecord_t rec;
+ const uint8_t *ptr;
+ size_t size;
long idx;
long len1, len2;
} xdlclass_t;
@@ -101,7 +102,7 @@ static int xdl_classify_record(unsigned int pass, xdlclassifier_t *cf, xrecord_t
hi = XDL_HASHLONG(line_hash, cf->hbits);
for (rcrec = cf->rchash[hi]; rcrec; rcrec = rcrec->next)
if (rcrec->line_hash == line_hash &&
- xdl_recmatch((const char *)rcrec->rec.ptr, (long)rcrec->rec.size,
+ xdl_recmatch((const char *)rcrec->ptr, (long)rcrec->size,
(const char *)rec->ptr, (long)rec->size, cf->flags))
break;
@@ -115,7 +116,8 @@ static int xdl_classify_record(unsigned int pass, xdlclassifier_t *cf, xrecord_t
return -1;
cf->rcrecs[rcrec->idx] = rcrec;
rcrec->line_hash = line_hash;
- rcrec->rec = *rec;
+ rcrec->ptr = rec->ptr;
+ rcrec->size = rec->size;
rcrec->len1 = rcrec->len2 = 0;
rcrec->next = cf->rchash[hi];
cf->rchash[hi] = rcrec;
--
2.52.0.362.g884e03848a9
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH 0/2] xdiff: Remove unneeded members from xrecord_t and xdlclass_t
2026-01-26 10:48 [PATCH 0/2] xdiff: Remove unneeded members from xrecord_t and xdlclass_t Phillip Wood
2026-01-26 10:48 ` [PATCH 1/2] xdiff: remove "line_hash" field from xrecord_t Phillip Wood
2026-01-26 10:48 ` [PATCH 2/2] xdiff: remove unused data from xdlclass_t Phillip Wood
@ 2026-01-26 17:35 ` Junio C Hamano
2026-02-10 20:39 ` Junio C Hamano
2 siblings, 1 reply; 5+ messages in thread
From: Junio C Hamano @ 2026-01-26 17:35 UTC (permalink / raw)
To: Phillip Wood; +Cc: git, Ezekiel Newren
Phillip Wood <phillip.wood123@gmail.com> writes:
> From: Phillip Wood <phillip.wood@dunelm.org.uk>
>
> This series has a couple of cleanups on top of 'en/xdiff-cleanup-2'
> that reduce the sizes of the xrecord_t and xdlclass_t. Unfortunately
> they conflict with 'en/xdiff-cleanup-3' in seen, in particular with
> db8a50ca6b9 (xdiff: don't waste time guessing the number of lines,
> 2026-01-02). I'm not particularly convinced that moving the call to
> xdl_classify_record() out of xdl_prepare_ctx() in that commit is
> a good idea, but if we decide that we do want to stop classifying
> lines in xdl_prepare_ctx() we can start passing the hashes out in a
> separate array rather than wasting space in xrecord_t.
Both patches look well reasoned and sensible.
It is unfortunate that the en/xdiff-cleanup-3 wants to pull these
fields in a different direction, but the topic has been dormant for
quite a while, so let's tentatively kick it out of 'seen' and see
how well this one does, until we decide how to consolidate the two
topics. Thanks.
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH 0/2] xdiff: Remove unneeded members from xrecord_t and xdlclass_t
2026-01-26 17:35 ` [PATCH 0/2] xdiff: Remove unneeded members from xrecord_t and xdlclass_t Junio C Hamano
@ 2026-02-10 20:39 ` Junio C Hamano
0 siblings, 0 replies; 5+ messages in thread
From: Junio C Hamano @ 2026-02-10 20:39 UTC (permalink / raw)
To: Phillip Wood, Ezekiel Newren; +Cc: git
Junio C Hamano <gitster@pobox.com> writes:
> Phillip Wood <phillip.wood123@gmail.com> writes:
>
>> From: Phillip Wood <phillip.wood@dunelm.org.uk>
>>
>> This series has a couple of cleanups on top of 'en/xdiff-cleanup-2'
>> that reduce the sizes of the xrecord_t and xdlclass_t. Unfortunately
>> they conflict with 'en/xdiff-cleanup-3' in seen, in particular with
>> db8a50ca6b9 (xdiff: don't waste time guessing the number of lines,
>> 2026-01-02). I'm not particularly convinced that moving the call to
>> xdl_classify_record() out of xdl_prepare_ctx() in that commit is
>> a good idea, but if we decide that we do want to stop classifying
>> lines in xdl_prepare_ctx() we can start passing the hashes out in a
>> separate array rather than wasting space in xrecord_t.
>
> Both patches look well reasoned and sensible.
I was hoping that these two patches will get reviewed by somebody
else in adddition to mine, but unfortunately nothing happened. I am
inclined to merge it down so that the other topic can have a stable
base to be rebased.
Opinions?
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2026-02-10 20:39 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-01-26 10:48 [PATCH 0/2] xdiff: Remove unneeded members from xrecord_t and xdlclass_t Phillip Wood
2026-01-26 10:48 ` [PATCH 1/2] xdiff: remove "line_hash" field from xrecord_t Phillip Wood
2026-01-26 10:48 ` [PATCH 2/2] xdiff: remove unused data from xdlclass_t Phillip Wood
2026-01-26 17:35 ` [PATCH 0/2] xdiff: Remove unneeded members from xrecord_t and xdlclass_t Junio C Hamano
2026-02-10 20:39 ` Junio C Hamano
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox