public inbox for git@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] xdiff: Remove unneeded members from xrecord_t and xdlclass_t
@ 2026-01-26 10:48 Phillip Wood
  2026-01-26 10:48 ` [PATCH 1/2] xdiff: remove "line_hash" field from xrecord_t Phillip Wood
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Phillip Wood @ 2026-01-26 10:48 UTC (permalink / raw)
  To: git; +Cc: Ezekiel Newren

From: Phillip Wood <phillip.wood@dunelm.org.uk>

This series has a couple of cleanups on top of 'en/xdiff-cleanup-2'
that reduce the sizes of the xrecord_t and xdlclass_t. Unfortunately
they conflict with 'en/xdiff-cleanup-3' in seen, in particular with
db8a50ca6b9 (xdiff: don't waste time guessing the number of lines,
2026-01-02). I'm not particularly convinced that moving the call to
xdl_classify_record() out of xdl_prepare_ctx() in that commit is
a good idea, but if we decide that we do want to stop classifying
lines in xdl_prepare_ctx() we can start passing the hashes out in a
separate array rather than wasting space in xrecord_t.

Base-Commit: 1faf5b085a171f9ba9a6d7a446e0de16acccb1dc
Published-As: https://github.com/phillipwood/git/releases/tag/pw%2Fxdiff-cleanup-xrecord_t-and-xdlclass_t%2Fv1
View-Changes-At: https://github.com/phillipwood/git/compare/1faf5b085...0d251dfba
Fetch-It-Via: git fetch https://github.com/phillipwood/git pw/xdiff-cleanup-xrecord_t-and-xdlclass_t/v1


Phillip Wood (2):
  xdiff: remove "line_hash" field from xrecord_t
  xdiff: remove unused data from xdlclass_t

 xdiff/xprepare.c | 20 ++++++++++++--------
 xdiff/xtypes.h   |  1 -
 2 files changed, 12 insertions(+), 9 deletions(-)

-- 
2.52.0.362.g884e03848a9


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH 1/2] xdiff: remove "line_hash" field from xrecord_t
  2026-01-26 10:48 [PATCH 0/2] xdiff: Remove unneeded members from xrecord_t and xdlclass_t Phillip Wood
@ 2026-01-26 10:48 ` Phillip Wood
  2026-01-26 10:48 ` [PATCH 2/2] xdiff: remove unused data from xdlclass_t Phillip Wood
  2026-01-26 17:35 ` [PATCH 0/2] xdiff: Remove unneeded members from xrecord_t and xdlclass_t Junio C Hamano
  2 siblings, 0 replies; 5+ messages in thread
From: Phillip Wood @ 2026-01-26 10:48 UTC (permalink / raw)
  To: git; +Cc: Ezekiel Newren, Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

Prior to commit 6a26019c81 (xdiff: split xrecord_t.ha into line_hash
and minimal_perfect_hash, 2025-11-18) the "ha" field of xrecord_t
initially held the "line_hash" value and once the line had been
interned that field was updated to hold the "minimal_perfect_hash". The
"line_hash" is only used to intern the line so there is no point in
storing it after all the input lines have been interned.

Removing the "line_hash" field from xrecord_t and storing it in
xdlclass_t where it is actually used makes it clearer that it is a
temporary value and it should not be used once we're calculated the
"minimal_perfect_hash". This also reduces the size of xrecord_t by 25%
on 64-bit platforms and 40% on 32-bit platforms. While the struct is
small we create one instance per input line so any saving is welcome.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 xdiff/xprepare.c | 12 +++++++-----
 xdiff/xtypes.h   |  1 -
 2 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/xdiff/xprepare.c b/xdiff/xprepare.c
index 34c82e4f8e1..08e5d3f4dfa 100644
--- a/xdiff/xprepare.c
+++ b/xdiff/xprepare.c
@@ -34,6 +34,7 @@
 #define INVESTIGATE 2
 
 typedef struct s_xdlclass {
+	uint64_t line_hash;
 	struct s_xdlclass *next;
 	xrecord_t rec;
 	long idx;
@@ -92,13 +93,14 @@ static void xdl_free_classifier(xdlclassifier_t *cf) {
 }
 
 
-static int xdl_classify_record(unsigned int pass, xdlclassifier_t *cf, xrecord_t *rec) {
+static int xdl_classify_record(unsigned int pass, xdlclassifier_t *cf, xrecord_t *rec,
+			       uint64_t line_hash) {
 	size_t hi;
 	xdlclass_t *rcrec;
 
-	hi = XDL_HASHLONG(rec->line_hash, cf->hbits);
+	hi = XDL_HASHLONG(line_hash, cf->hbits);
 	for (rcrec = cf->rchash[hi]; rcrec; rcrec = rcrec->next)
-		if (rcrec->rec.line_hash == rec->line_hash &&
+		if (rcrec->line_hash == line_hash &&
 				xdl_recmatch((const char *)rcrec->rec.ptr, (long)rcrec->rec.size,
 					(const char *)rec->ptr, (long)rec->size, cf->flags))
 			break;
@@ -112,6 +114,7 @@ static int xdl_classify_record(unsigned int pass, xdlclassifier_t *cf, xrecord_t
 		if (XDL_ALLOC_GROW(cf->rcrecs, cf->count, cf->alloc))
 				return -1;
 		cf->rcrecs[rcrec->idx] = rcrec;
+		rcrec->line_hash = line_hash;
 		rcrec->rec = *rec;
 		rcrec->len1 = rcrec->len2 = 0;
 		rcrec->next = cf->rchash[hi];
@@ -158,8 +161,7 @@ static int xdl_prepare_ctx(unsigned int pass, mmfile_t *mf, long narec, xpparam_
 			crec = &xdf->recs[xdf->nrec++];
 			crec->ptr = prev;
 			crec->size = cur - prev;
-			crec->line_hash = hav;
-			if (xdl_classify_record(pass, cf, crec) < 0)
+			if (xdl_classify_record(pass, cf, crec, hav) < 0)
 				goto abort;
 		}
 	}
diff --git a/xdiff/xtypes.h b/xdiff/xtypes.h
index 979586f20a6..50aee779be3 100644
--- a/xdiff/xtypes.h
+++ b/xdiff/xtypes.h
@@ -41,7 +41,6 @@ typedef struct s_chastore {
 typedef struct s_xrecord {
 	uint8_t const *ptr;
 	size_t size;
-	uint64_t line_hash;
 	size_t minimal_perfect_hash;
 } xrecord_t;
 
-- 
2.52.0.362.g884e03848a9


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH 2/2] xdiff: remove unused data from xdlclass_t
  2026-01-26 10:48 [PATCH 0/2] xdiff: Remove unneeded members from xrecord_t and xdlclass_t Phillip Wood
  2026-01-26 10:48 ` [PATCH 1/2] xdiff: remove "line_hash" field from xrecord_t Phillip Wood
@ 2026-01-26 10:48 ` Phillip Wood
  2026-01-26 17:35 ` [PATCH 0/2] xdiff: Remove unneeded members from xrecord_t and xdlclass_t Junio C Hamano
  2 siblings, 0 replies; 5+ messages in thread
From: Phillip Wood @ 2026-01-26 10:48 UTC (permalink / raw)
  To: git; +Cc: Ezekiel Newren, Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

Prior to commit 6d507bd41a (xdiff: delete fields ha, line, size
in xdlclass_t in favor of an xrecord_t, 2025-09-26) xdlclass_t
carried a copy of all the fields in xrecord_t. That commit embedded
xrecord_t in xdlclass_t to make it easier to change the types of
the fields in xrecord_t. However commit 6a26019c81 (xdiff: split
xrecord_t.ha into line_hash and minimal_perfect_hash, 2025-11-18)
added the "minimal_perfect_hash" field to xrecord_t which is not
used by xdlclass_t. To avoid wasting space stop copying the whole
of xrecord_t and just copy the pointer and length that we need to
intern the line. Together with the previous commit this effectively
reverts 6d507bd41a.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 xdiff/xprepare.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/xdiff/xprepare.c b/xdiff/xprepare.c
index 08e5d3f4dfa..cd4fc405eb1 100644
--- a/xdiff/xprepare.c
+++ b/xdiff/xprepare.c
@@ -36,7 +36,8 @@
 typedef struct s_xdlclass {
 	uint64_t line_hash;
 	struct s_xdlclass *next;
-	xrecord_t rec;
+	const uint8_t *ptr;
+	size_t size;
 	long idx;
 	long len1, len2;
 } xdlclass_t;
@@ -101,7 +102,7 @@ static int xdl_classify_record(unsigned int pass, xdlclassifier_t *cf, xrecord_t
 	hi = XDL_HASHLONG(line_hash, cf->hbits);
 	for (rcrec = cf->rchash[hi]; rcrec; rcrec = rcrec->next)
 		if (rcrec->line_hash == line_hash &&
-				xdl_recmatch((const char *)rcrec->rec.ptr, (long)rcrec->rec.size,
+				xdl_recmatch((const char *)rcrec->ptr, (long)rcrec->size,
 					(const char *)rec->ptr, (long)rec->size, cf->flags))
 			break;
 
@@ -115,7 +116,8 @@ static int xdl_classify_record(unsigned int pass, xdlclassifier_t *cf, xrecord_t
 				return -1;
 		cf->rcrecs[rcrec->idx] = rcrec;
 		rcrec->line_hash = line_hash;
-		rcrec->rec = *rec;
+		rcrec->ptr = rec->ptr;
+		rcrec->size = rec->size;
 		rcrec->len1 = rcrec->len2 = 0;
 		rcrec->next = cf->rchash[hi];
 		cf->rchash[hi] = rcrec;
-- 
2.52.0.362.g884e03848a9


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH 0/2] xdiff: Remove unneeded members from xrecord_t and xdlclass_t
  2026-01-26 10:48 [PATCH 0/2] xdiff: Remove unneeded members from xrecord_t and xdlclass_t Phillip Wood
  2026-01-26 10:48 ` [PATCH 1/2] xdiff: remove "line_hash" field from xrecord_t Phillip Wood
  2026-01-26 10:48 ` [PATCH 2/2] xdiff: remove unused data from xdlclass_t Phillip Wood
@ 2026-01-26 17:35 ` Junio C Hamano
  2026-02-10 20:39   ` Junio C Hamano
  2 siblings, 1 reply; 5+ messages in thread
From: Junio C Hamano @ 2026-01-26 17:35 UTC (permalink / raw)
  To: Phillip Wood; +Cc: git, Ezekiel Newren

Phillip Wood <phillip.wood123@gmail.com> writes:

> From: Phillip Wood <phillip.wood@dunelm.org.uk>
>
> This series has a couple of cleanups on top of 'en/xdiff-cleanup-2'
> that reduce the sizes of the xrecord_t and xdlclass_t. Unfortunately
> they conflict with 'en/xdiff-cleanup-3' in seen, in particular with
> db8a50ca6b9 (xdiff: don't waste time guessing the number of lines,
> 2026-01-02). I'm not particularly convinced that moving the call to
> xdl_classify_record() out of xdl_prepare_ctx() in that commit is
> a good idea, but if we decide that we do want to stop classifying
> lines in xdl_prepare_ctx() we can start passing the hashes out in a
> separate array rather than wasting space in xrecord_t.

Both patches look well reasoned and sensible.

It is unfortunate that the en/xdiff-cleanup-3 wants to pull these
fields in a different direction, but the topic has been dormant for
quite a while, so let's tentatively kick it out of 'seen' and see
how well this one does, until we decide how to consolidate the two
topics.  Thanks.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH 0/2] xdiff: Remove unneeded members from xrecord_t and xdlclass_t
  2026-01-26 17:35 ` [PATCH 0/2] xdiff: Remove unneeded members from xrecord_t and xdlclass_t Junio C Hamano
@ 2026-02-10 20:39   ` Junio C Hamano
  0 siblings, 0 replies; 5+ messages in thread
From: Junio C Hamano @ 2026-02-10 20:39 UTC (permalink / raw)
  To: Phillip Wood, Ezekiel Newren; +Cc: git

Junio C Hamano <gitster@pobox.com> writes:

> Phillip Wood <phillip.wood123@gmail.com> writes:
>
>> From: Phillip Wood <phillip.wood@dunelm.org.uk>
>>
>> This series has a couple of cleanups on top of 'en/xdiff-cleanup-2'
>> that reduce the sizes of the xrecord_t and xdlclass_t. Unfortunately
>> they conflict with 'en/xdiff-cleanup-3' in seen, in particular with
>> db8a50ca6b9 (xdiff: don't waste time guessing the number of lines,
>> 2026-01-02). I'm not particularly convinced that moving the call to
>> xdl_classify_record() out of xdl_prepare_ctx() in that commit is
>> a good idea, but if we decide that we do want to stop classifying
>> lines in xdl_prepare_ctx() we can start passing the hashes out in a
>> separate array rather than wasting space in xrecord_t.
>
> Both patches look well reasoned and sensible.

I was hoping that these two patches will get reviewed by somebody
else in adddition to mine, but unfortunately nothing happened.  I am
inclined to merge it down so that the other topic can have a stable
base to be rebased.

Opinions?



^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2026-02-10 20:39 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-01-26 10:48 [PATCH 0/2] xdiff: Remove unneeded members from xrecord_t and xdlclass_t Phillip Wood
2026-01-26 10:48 ` [PATCH 1/2] xdiff: remove "line_hash" field from xrecord_t Phillip Wood
2026-01-26 10:48 ` [PATCH 2/2] xdiff: remove unused data from xdlclass_t Phillip Wood
2026-01-26 17:35 ` [PATCH 0/2] xdiff: Remove unneeded members from xrecord_t and xdlclass_t Junio C Hamano
2026-02-10 20:39   ` Junio C Hamano

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox