Git development
 help / color / mirror / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: "Torsten Bögershausen" <tboegi@web.de>
Cc: Johannes Schindelin <Johannes.Schindelin@gmx.de>,
	 Johannes Schindelin via GitGitGadget <gitgitgadget@gmail.com>,
	 git@vger.kernel.org,  Derrick Stolee <stolee@gmail.com>,
	 Jeff King <peff@peff.net>
Subject: Re: [PATCH v2 01/11] index-pack, unpack-objects: use size_t for object size
Date: Sun, 10 May 2026 11:41:15 +0900	[thread overview]
Message-ID: <xmqqik8w9eb8.fsf@gitster.g> (raw)
In-Reply-To: <20260508190947.GA25792@tb-raspi4> ("Torsten Bögershausen"'s message of "Fri, 8 May 2026 21:09:48 +0200")

Torsten Bögershausen <tboegi@web.de> writes:

> On Fri, May 08, 2026 at 09:36:53AM +0200, Johannes Schindelin wrote:
>> Hi Torsten,
>> 
>> On Tue, 5 May 2026, Torsten Bögershausen wrote:
>> 
>> > On Mon, May 04, 2026 at 05:08:18PM +0000, Johannes Schindelin via GitGitGadget wrote:
>> > > From: Johannes Schindelin <johannes.schindelin@gmx.de>
>> > > 
>> > > [...]
>> > > @@ -524,7 +524,8 @@ static void *unpack_raw_entry(struct object_entry *obj,
>> > >  			      struct object_id *oid)
>> > >  {
>> > >  	unsigned char *p;
>> > > -	unsigned long size, c;
>> > > +	size_t size;
>> > > +	unsigned long c;
>> >
>> > Does this look a little bit strange ?
>> 
>> Good point.
>> 
>> > p points to an unsigned char (better would be *uint8_t)
>> > then it is dereferenced into an "unsigned long".
>> > Then it is masked with 0x7f
>> > In short: should "c" be declared as uint8_t ?
>> 
>> Almost. It should be a `size_t`, so that we don't have to cast it when
>> shifting it. I'll include a fix in the next iteration.
>
> I think I was not very clear here.
> De-referencing a long (or size_t) from any address is something
> I would try to avoid:
> Some processors do not like to read a 32 or 64 bit value from
> an uneven address (and throw an exeption).
> x86 processors just handle it, using more than one bus cycle.
>
> In short: Please keep the up-cast and simply read an uint8_t.

Hmph, I do not think there is "up-cast" to keep.  And we do not
dereference a random pointer that would be suitable for unsigned
char * as if it were "unsigned long *" or "size_t *" in this code.

This came from commit 48fb7deb5bbd87933e7d314b73d7c1b52667f80f

Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Wed Jun 17 17:22:27 2009 -0700

    Fix big left-shifts of unsigned char
    
    Shifting 'unsigned char' or 'unsigned short' left can result in sign
    extension errors, since the C integer promotion rules means that the
    unsigned char/short will get implicitly promoted to a signed 'int' due to
    the shift (or due to other operations).
    
    This normally doesn't matter, but if you shift things up sufficiently, it
    will now set the sign bit in 'int', and a subsequent cast to a bigger type
    (eg 'long' or 'unsigned long') will now sign-extend the value despite the
    original expression being unsigned.
    
    One example of this would be something like
    
            unsigned long size;
            unsigned char c;
    
            size += c << 24;
    
    where despite all the variables being unsigned, 'c << 24' ends up being a
    signed entity, and will get sign-extended when then doing the addition in
    an 'unsigned long' type.


You could rewrite Linus's example to

	unsigned char *cp;
	unsigned long size;
	unsigned char c;

	c = *cp;
	size += ((unsigned long)c) << 24;

While I am sympathetic to that position, I also would not mind

	unsigned char *cp;
	unsigned long size;
	unsigned long c;

	c = *cp;
	size += c << 24;

all that much.  In any case, such a "clean-up" has little to do with
the topic under discussion, and itshould be discussed separately on
its own merit, most likely when the dust settles after this topic
lands.  Let's not contaminate the patches that is "a trivial rewrite
that is so obviously correct to fix the assumption that ulong and
size_t are of the same size everywhere" with unrelated clean-up.

Thanks.


  reply	other threads:[~2026-05-10  2:41 UTC|newest]

Thread overview: 60+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-28 16:26 [PATCH 0/6] Handle cloning of objects larger than 4GB on Windows Johannes Schindelin via GitGitGadget
2026-04-28 16:26 ` [PATCH 1/6] index-pack, unpack-objects: use size_t for object size Johannes Schindelin via GitGitGadget
2026-04-30 14:13   ` Torsten Bögershausen
2026-05-03 14:46     ` Johannes Schindelin
2026-04-28 16:26 ` [PATCH 2/6] git-zlib: handle data streams larger than 4GB Johannes Schindelin via GitGitGadget
2026-04-28 16:26 ` [PATCH 3/6] odb, packfile: use size_t for streaming object sizes Johannes Schindelin via GitGitGadget
2026-04-28 16:26 ` [PATCH 4/6] delta, packfile: use size_t for delta header sizes Johannes Schindelin via GitGitGadget
2026-04-29 13:28   ` Derrick Stolee
2026-05-03 14:49     ` Johannes Schindelin
2026-04-28 16:26 ` [PATCH 5/6] test-tool: add a helper to synthesize large packfiles Johannes Schindelin via GitGitGadget
2026-04-28 16:26 ` [PATCH 6/6] t5608: add regression test for >4GB object clone Johannes Schindelin via GitGitGadget
2026-04-29 13:34   ` Derrick Stolee
2026-05-01  6:38     ` Jeff King
2026-05-01 13:19       ` Derrick Stolee
2026-05-04 17:07         ` Johannes Schindelin
2026-04-29 13:35 ` [PATCH 0/6] Handle cloning of objects larger than 4GB on Windows Derrick Stolee
2026-05-04 17:08 ` [PATCH v2 00/11] " Johannes Schindelin via GitGitGadget
2026-05-04 17:08   ` [PATCH v2 01/11] index-pack, unpack-objects: use size_t for object size Johannes Schindelin via GitGitGadget
2026-05-05 19:11     ` Torsten Bögershausen
2026-05-08  7:36       ` Johannes Schindelin
2026-05-08 19:09         ` Torsten Bögershausen
2026-05-10  2:41           ` Junio C Hamano [this message]
2026-05-10  9:14             ` Torsten Bögershausen
2026-05-04 17:08   ` [PATCH v2 02/11] git-zlib: handle data streams larger than 4GB Johannes Schindelin via GitGitGadget
2026-05-04 17:08   ` [PATCH v2 03/11] odb, packfile: use size_t for streaming object sizes Johannes Schindelin via GitGitGadget
2026-05-05 19:27     ` Torsten Bögershausen
2026-05-08  7:38       ` Johannes Schindelin
2026-05-04 17:08   ` [PATCH v2 04/11] delta, packfile: use size_t for delta header sizes Johannes Schindelin via GitGitGadget
2026-05-04 17:08   ` [PATCH v2 05/11] test-tool: add a helper to synthesize large packfiles Johannes Schindelin via GitGitGadget
2026-05-04 17:08   ` [PATCH v2 06/11] t5608: add regression test for >4GB object clone Johannes Schindelin via GitGitGadget
2026-05-04 17:08   ` [PATCH v2 07/11] test-tool synthesize: use the unsafe hash for speed Johannes Schindelin via GitGitGadget
2026-05-04 17:08   ` [PATCH v2 08/11] test-tool synthesize: precompute pack for 4 GiB + 1 Johannes Schindelin via GitGitGadget
2026-05-04 18:27     ` Derrick Stolee
2026-05-05 20:54       ` Johannes Schindelin
2026-05-04 17:08   ` [PATCH v2 09/11] test-tool synthesize: add precomputed SHA-256 " Johannes Schindelin via GitGitGadget
2026-05-04 17:08   ` [PATCH v2 10/11] t5608: mark >4GB tests as EXPENSIVE Johannes Schindelin via GitGitGadget
2026-05-04 17:08   ` [PATCH v2 11/11] ci: run expensive tests on push builds to integration branches Johannes Schindelin via GitGitGadget
2026-05-04 18:35     ` Derrick Stolee
2026-05-05 12:56       ` Junio C Hamano
2026-05-05 23:07         ` Junio C Hamano
2026-05-06  8:33           ` Johannes Schindelin
2026-05-07  9:18             ` Junio C Hamano
2026-05-07 10:24               ` Patrick Steinhardt
2026-05-08  2:50         ` Junio C Hamano
2026-05-08  8:16   ` [PATCH v3 00/11] Handle cloning of objects larger than 4GB on Windows Johannes Schindelin via GitGitGadget
2026-05-08  8:16     ` [PATCH v3 01/11] index-pack, unpack-objects: use size_t for object size Johannes Schindelin via GitGitGadget
2026-05-08  8:16     ` [PATCH v3 02/11] git-zlib: handle data streams larger than 4GB Johannes Schindelin via GitGitGadget
2026-05-08  8:16     ` [PATCH v3 03/11] odb, packfile: use size_t for streaming object sizes Johannes Schindelin via GitGitGadget
2026-05-08  8:16     ` [PATCH v3 04/11] delta, packfile: use size_t for delta header sizes Johannes Schindelin via GitGitGadget
2026-05-08  8:16     ` [PATCH v3 05/11] test-tool: add a helper to synthesize large packfiles Johannes Schindelin via GitGitGadget
2026-05-08  8:16     ` [PATCH v3 06/11] t5608: add regression test for >4GB object clone Johannes Schindelin via GitGitGadget
2026-05-08  8:16     ` [PATCH v3 07/11] test-tool synthesize: use the unsafe hash for speed Johannes Schindelin via GitGitGadget
2026-05-08  8:16     ` [PATCH v3 08/11] test-tool synthesize: precompute pack for 4 GiB + 1 Johannes Schindelin via GitGitGadget
2026-05-08  8:16     ` [PATCH v3 09/11] test-tool synthesize: add precomputed SHA-256 " Johannes Schindelin via GitGitGadget
2026-05-08  8:16     ` [PATCH v3 10/11] t5608: mark >4GB tests as EXPENSIVE Johannes Schindelin via GitGitGadget
2026-05-08  8:16     ` [PATCH v3 11/11] ci: run expensive tests on push builds to integration branches Johannes Schindelin via GitGitGadget
2026-05-10 23:51       ` [PATCH] ci: enable EXPENSIVE for contributor builds Junio C Hamano
2026-05-11  7:05         ` Patrick Steinhardt
2026-05-11  8:29           ` Junio C Hamano
2026-05-11 10:02             ` Patrick Steinhardt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqik8w9eb8.fsf@gitster.g \
    --to=gitster@pobox.com \
    --cc=Johannes.Schindelin@gmx.de \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=peff@peff.net \
    --cc=stolee@gmail.com \
    --cc=tboegi@web.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox