git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [BUG] git crashes with a SIGBUS on sparc64 during pull
@ 2025-01-17  3:30 Koakuma
  2025-01-17 12:11 ` Jeff King
  0 siblings, 1 reply; 22+ messages in thread
From: Koakuma @ 2025-01-17  3:30 UTC (permalink / raw)
  To: git@vger.kernel.org

Thank you for filling out a Git bug report!
Please answer the following questions to help us understand your issue.

What did you do before the bug happened? (Steps to reproduce your issue)

1. Compile git with clang (this doesn't seem to happen with GCC, oddly)
2. Do a `git pull` on a repository that will trigger the `unpack-objects` code

What did you expect to happen? (Expected behavior)

Pull succeeds

What happened instead? (Actual behavior)

Pull fails with this error:
remote: Enumerating objects: 7, done.
remote: Counting objects: 100% (5/5), done.
error: unpack-objects died of signal 10
fatal: unpack-objects failed

What's different between what you expected and what actually happened?

The pull process crashes with a SIGBUS.

Anything else you want to add:

re: unpack-objects, I don't know how to specifically trigger that code,
but I know that once `unpack-objects` is triggered it will reliably crash
with the error above.

According to gdb, the crash happens on this line:
#0  0x000001000019ca18 in cmd_unpack_objects (argc=<optimized out>, argv=0x1000063a4e0, prefix=<optimized out>, repo=<optimized out>) at builtin/unpack-objects.c:653
653					hdr->hdr_signature = htonl(PACK_SIGNATURE);

Overaligning the `buffer` declaration in the same file to try to get around
possible alignment issues seems to be able to prevent the crash,
but I don't know if it would be a proper fix for it.

Please review the rest of the bug report below.
You can delete any lines you don't wish to share.


[System Info]
git version:
git version 2.48.1.262.g85cc9f2d1e
cpu: sparc64
built from commit: 85cc9f2d1ee4d65cb1edb00d4f56863185a53e0f
sizeof-long: 8
sizeof-size_t: 8
shell-path: /bin/sh
libcurl: 8.10.1
OpenSSL: OpenSSL 3.3.2 3 Sep 2024
zlib: 1.3.1
uname: Linux 6.6.30-sparc64-clang+ #1 SMP Sat Oct 26 21:22:10 WIB 2024 sparc64
compiler info: clang: 19.1.4
libc info: glibc: 2.40
$SHELL (typically, interactive shell): /bin/bash


[Enabled Hooks]


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [BUG] git crashes with a SIGBUS on sparc64 during pull
  2025-01-17  3:30 [BUG] git crashes with a SIGBUS on sparc64 during pull Koakuma
@ 2025-01-17 12:11 ` Jeff King
  2025-01-17 12:52   ` Jeff King
  0 siblings, 1 reply; 22+ messages in thread
From: Jeff King @ 2025-01-17 12:11 UTC (permalink / raw)
  To: Koakuma; +Cc: git@vger.kernel.org

On Fri, Jan 17, 2025 at 03:30:09AM +0000, Koakuma wrote:

> re: unpack-objects, I don't know how to specifically trigger that code,
> but I know that once `unpack-objects` is triggered it will reliably crash
> with the error above.

Perhaps:

  git init repo
  cd repo
  git commit --allow-empty -m foo
  git repack -ad
  pack=$(ls .git/objects/pack/*.pack)
  dd if=$pack of=no-header.pack bs=1 skip=12
  # don't bother parsing the first 12 bytes; we know it is
  # a version 2 pack with 2 objects
  git unpack-objects --pack_header=2,2 <no-header.pack

would be a minimal reproduction?

> According to gdb, the crash happens on this line:
> #0  0x000001000019ca18 in cmd_unpack_objects (argc=<optimized out>, argv=0x1000063a4e0, prefix=<optimized out>, repo=<optimized out>) at builtin/unpack-objects.c:653
> 653					hdr->hdr_signature = htonl(PACK_SIGNATURE);
> 
> Overaligning the `buffer` declaration in the same file to try to get around
> possible alignment issues seems to be able to prevent the crash,
> but I don't know if it would be a proper fix for it.

Interesting. We are pretty cavalier about casting mmap'd buffers to
structs in order to read pack headers (since the format is designed to
be 4-byte aligned).

But in this case we are _writing_ into a buffer through the cast
structure (our caller has already parsed the header, so we are
reconstructing it). So we are relying on the alignment of a static
"unsigned char" in the data section.

Which certainly seems sketchy, though it is kind of interesting that it
has never been a problem before (and the code has been this way for
decades). At any rate, the fix is probably this (and we'd want the same
in index-pack, too, I'd think):

diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c
index 2197d6d933..288cecf98f 100644
--- a/builtin/unpack-objects.c
+++ b/builtin/unpack-objects.c
@@ -645,18 +645,20 @@ int cmd_unpack_objects(int argc,
 				continue;
 			}
 			if (starts_with(arg, "--pack_header=")) {
-				struct pack_header *hdr;
+				unsigned char *hdr = buffer;
 				char *c;
 
-				hdr = (struct pack_header *)buffer;
-				hdr->hdr_signature = htonl(PACK_SIGNATURE);
-				hdr->hdr_version = htonl(strtoul(arg + 14, &c, 10));
+				put_be32(hdr, PACK_SIGNATURE);
+				hdr += 4;
+				put_be32(hdr, strtoul(arg + 14, &c, 10));
+				hdr += 4;
 				if (*c != ',')
 					die("bad %s", arg);
-				hdr->hdr_entries = htonl(strtoul(c + 1, &c, 10));
+				put_be32(hdr, strtoul(c + 1, &c, 10));
+				hdr += 4;
 				if (*c)
 					die("bad %s", arg);
-				len = sizeof(*hdr);
+				len = hdr - buffer;
 				continue;
 			}
 			if (skip_prefix(arg, "--max-input-size=", &arg)) {

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [BUG] git crashes with a SIGBUS on sparc64 during pull
  2025-01-17 12:11 ` Jeff King
@ 2025-01-17 12:52   ` Jeff King
  2025-01-17 12:54     ` [PATCH 1/3] packfile: factor out --pack_header argument parsing Jeff King
                       ` (3 more replies)
  0 siblings, 4 replies; 22+ messages in thread
From: Jeff King @ 2025-01-17 12:52 UTC (permalink / raw)
  To: Koakuma; +Cc: git@vger.kernel.org

On Fri, Jan 17, 2025 at 07:11:21AM -0500, Jeff King wrote:

> Which certainly seems sketchy, though it is kind of interesting that it
> has never been a problem before (and the code has been this way for
> decades). At any rate, the fix is probably this (and we'd want the same
> in index-pack, too, I'd think):

Assuming that helps, here's a series which does that fix plus some
associated cleanups.

I'm curious if it's enough. After we write to this unaligned buffer,
naturally the next thing we'll do is read from it, and the reading
routines will do the same cast (see unpack_all() in unpack-objects).
But maybe your platform allows unaligned reads but not writes? Probably
I am being too optimistic. :)

  [1/3]: packfile: factor out --pack_header argument parsing
  [2/3]: parse_pack_header_option(): avoid unaligned memory writes
  [3/3]: index-pack, unpack-objects: use skip_prefix to avoid magic number

 builtin/index-pack.c     | 18 +++++-------------
 builtin/unpack-objects.c | 18 +++++-------------
 packfile.c               | 22 ++++++++++++++++++++++
 packfile.h               |  6 ++++++
 4 files changed, 38 insertions(+), 26 deletions(-)

-Peff

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 1/3] packfile: factor out --pack_header argument parsing
  2025-01-17 12:52   ` Jeff King
@ 2025-01-17 12:54     ` Jeff King
  2025-01-17 22:45       ` Junio C Hamano
  2025-01-17 12:55     ` [PATCH 2/3] parse_pack_header_option(): avoid unaligned memory writes Jeff King
                       ` (2 subsequent siblings)
  3 siblings, 1 reply; 22+ messages in thread
From: Jeff King @ 2025-01-17 12:54 UTC (permalink / raw)
  To: Koakuma; +Cc: git@vger.kernel.org

Both index-pack and unpack-objects accept a --pack_header argument. This
is an undocumented internal argument used by receive-pack and fetch to
pass along information about the header of the pack, which they've
already read from the incoming stream.

In preparation for a bugfix, let's factor the duplicated code into a
common helper.

The callers are still responsible for identifying the option. While this
could likewise be factored out, it is more flexible this way (e.g., if
they ever started using parse-options and wanted to handle both the
stuck and unstuck forms).

Likewise, the callers are responsible for reporting errors, though they
both just call die(). I've tweaked unpack-objects to match index-pack in
marking the error for translation.

Signed-off-by: Jeff King <peff@peff.net>
---
The generating side of this option is duplicate, too, between
receive-pack and fetch-pack. But it's so much simpler that I didn't
think it was worth factoring out. We could always do it later.

 builtin/index-pack.c     | 14 +++-----------
 builtin/unpack-objects.c | 16 ++++------------
 packfile.c               | 17 +++++++++++++++++
 packfile.h               |  6 ++++++
 4 files changed, 30 insertions(+), 23 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 0b62b2589f..75b84f78f4 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1955,18 +1955,10 @@ int cmd_index_pack(int argc,
 					nr_threads = 1;
 				}
 			} else if (starts_with(arg, "--pack_header=")) {
-				struct pack_header *hdr;
-				char *c;
-
-				hdr = (struct pack_header *)input_buffer;
-				hdr->hdr_signature = htonl(PACK_SIGNATURE);
-				hdr->hdr_version = htonl(strtoul(arg + 14, &c, 10));
-				if (*c != ',')
-					die(_("bad %s"), arg);
-				hdr->hdr_entries = htonl(strtoul(c + 1, &c, 10));
-				if (*c)
+				if (parse_pack_header_option(arg + 14,
+							     input_buffer,
+							     &input_len) < 0)
 					die(_("bad %s"), arg);
-				input_len = sizeof(*hdr);
 			} else if (!strcmp(arg, "-v")) {
 				verbose = 1;
 			} else if (!strcmp(arg, "--progress-title")) {
diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c
index 2197d6d933..cf2bc5c531 100644
--- a/builtin/unpack-objects.c
+++ b/builtin/unpack-objects.c
@@ -18,6 +18,7 @@
 #include "progress.h"
 #include "decorate.h"
 #include "fsck.h"
+#include "packfile.h"
 
 static int dry_run, quiet, recover, has_errors, strict;
 static const char unpack_usage[] = "git unpack-objects [-n] [-q] [-r] [--strict]";
@@ -645,18 +646,9 @@ int cmd_unpack_objects(int argc,
 				continue;
 			}
 			if (starts_with(arg, "--pack_header=")) {
-				struct pack_header *hdr;
-				char *c;
-
-				hdr = (struct pack_header *)buffer;
-				hdr->hdr_signature = htonl(PACK_SIGNATURE);
-				hdr->hdr_version = htonl(strtoul(arg + 14, &c, 10));
-				if (*c != ',')
-					die("bad %s", arg);
-				hdr->hdr_entries = htonl(strtoul(c + 1, &c, 10));
-				if (*c)
-					die("bad %s", arg);
-				len = sizeof(*hdr);
+				if (parse_pack_header_option(arg + 14,
+							     buffer, &len) < 0)
+					die(_("bad %s"), arg);
 				continue;
 			}
 			if (skip_prefix(arg, "--max-input-size=", &arg)) {
diff --git a/packfile.c b/packfile.c
index cc7ab6403a..2bf9e57330 100644
--- a/packfile.c
+++ b/packfile.c
@@ -2315,3 +2315,20 @@ int is_promisor_object(struct repository *r, const struct object_id *oid)
 	}
 	return oidset_contains(&promisor_objects, oid);
 }
+
+int parse_pack_header_option(const char *in, unsigned char *out, unsigned int *len)
+{
+	struct pack_header *hdr;
+	char *c;
+
+	hdr = (struct pack_header *)out;
+	hdr->hdr_signature = htonl(PACK_SIGNATURE);
+	hdr->hdr_version = htonl(strtoul(in, &c, 10));
+	if (*c != ',')
+		return -1;
+	hdr->hdr_entries = htonl(strtoul(c + 1, &c, 10));
+	if (*c)
+		return -1;
+	*len = sizeof(*hdr);
+	return 0;
+}
diff --git a/packfile.h b/packfile.h
index 58104fa009..00ada7a938 100644
--- a/packfile.h
+++ b/packfile.h
@@ -216,4 +216,10 @@ int is_promisor_object(struct repository *r, const struct object_id *oid);
 int load_idx(const char *path, const unsigned int hashsz, void *idx_map,
 	     size_t idx_size, struct packed_git *p);
 
+/*
+ * Parse a --pack_header option as accepted by index-pack and unpack-objects,
+ * turning it into the matching bytes we'd find in a pack.
+ */
+int parse_pack_header_option(const char *in, unsigned char *out, unsigned int *len);
+
 #endif
-- 
2.48.1.438.g7fbf7b046e


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 2/3] parse_pack_header_option(): avoid unaligned memory writes
  2025-01-17 12:52   ` Jeff King
  2025-01-17 12:54     ` [PATCH 1/3] packfile: factor out --pack_header argument parsing Jeff King
@ 2025-01-17 12:55     ` Jeff King
  2025-01-18  1:27       ` Junio C Hamano
  2025-01-17 12:56     ` [PATCH 3/3] index-pack, unpack-objects: use skip_prefix to avoid magic number Jeff King
  2025-01-17 15:55     ` [BUG] git crashes with a SIGBUS on sparc64 during pull Koakuma
  3 siblings, 1 reply; 22+ messages in thread
From: Jeff King @ 2025-01-17 12:55 UTC (permalink / raw)
  To: Koakuma; +Cc: git@vger.kernel.org

In order to recreate a pack header in our in-memory buffer, we cast the
buffer to a "struct pack_header" and assign the individual fields. This
is reported to cause SIGBUS on sparc64 due to alignment issues.

We can work around this by using put_be32() which will write individual
bytes into the buffer.

Reported-by: Koakuma <koachan@protonmail.com>
Signed-off-by: Jeff King <peff@peff.net>
---
Fingers crossed that this is sufficient, and we don't have more
alignment problems.

 packfile.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/packfile.c b/packfile.c
index 2bf9e57330..2d80d80cb3 100644
--- a/packfile.c
+++ b/packfile.c
@@ -2318,17 +2318,20 @@ int is_promisor_object(struct repository *r, const struct object_id *oid)
 
 int parse_pack_header_option(const char *in, unsigned char *out, unsigned int *len)
 {
-	struct pack_header *hdr;
+	unsigned char *hdr;
 	char *c;
 
-	hdr = (struct pack_header *)out;
-	hdr->hdr_signature = htonl(PACK_SIGNATURE);
-	hdr->hdr_version = htonl(strtoul(in, &c, 10));
+	hdr = out;
+	put_be32(hdr, PACK_SIGNATURE);
+	hdr += 4;
+	put_be32(hdr, strtoul(in, &c, 10));
+	hdr += 4;
 	if (*c != ',')
 		return -1;
-	hdr->hdr_entries = htonl(strtoul(c + 1, &c, 10));
+	put_be32(hdr, strtoul(c + 1, &c, 10));
+	hdr += 4;
 	if (*c)
 		return -1;
-	*len = sizeof(*hdr);
+	*len = hdr - out;
 	return 0;
 }
-- 
2.48.1.438.g7fbf7b046e


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 3/3] index-pack, unpack-objects: use skip_prefix to avoid magic number
  2025-01-17 12:52   ` Jeff King
  2025-01-17 12:54     ` [PATCH 1/3] packfile: factor out --pack_header argument parsing Jeff King
  2025-01-17 12:55     ` [PATCH 2/3] parse_pack_header_option(): avoid unaligned memory writes Jeff King
@ 2025-01-17 12:56     ` Jeff King
  2025-01-17 15:55     ` [BUG] git crashes with a SIGBUS on sparc64 during pull Koakuma
  3 siblings, 0 replies; 22+ messages in thread
From: Jeff King @ 2025-01-17 12:56 UTC (permalink / raw)
  To: Koakuma; +Cc: git@vger.kernel.org

When parsing --pack_header=, we manually skip 14 bytes to the data.
Let's use skip_prefix() to do this automatically.

Note that we overwrite our pointer to the front of the string, so we
have to add more context to the error message. We could avoid this by
declaring an extra pointer to hold the value, but I think the modified
message is actually preferable. It should give translators a bit more
context.

Signed-off-by: Jeff King <peff@peff.net>
---
Just a cleanup I noticed in the area. It's possible that I'm overly
annoyed by manual string counting and this is just churn, though. ;)

 builtin/index-pack.c     | 6 +++---
 builtin/unpack-objects.c | 6 +++---
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 75b84f78f4..0561eec00a 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1954,11 +1954,11 @@ int cmd_index_pack(int argc,
 					warning(_("no threads support, ignoring %s"), arg);
 					nr_threads = 1;
 				}
-			} else if (starts_with(arg, "--pack_header=")) {
-				if (parse_pack_header_option(arg + 14,
+			} else if (skip_prefix(arg, "--pack_header=", &arg)) {
+				if (parse_pack_header_option(arg,
 							     input_buffer,
 							     &input_len) < 0)
-					die(_("bad %s"), arg);
+					die(_("bad --pack_header: %s"), arg);
 			} else if (!strcmp(arg, "-v")) {
 				verbose = 1;
 			} else if (!strcmp(arg, "--progress-title")) {
diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c
index cf2bc5c531..06d517dfb6 100644
--- a/builtin/unpack-objects.c
+++ b/builtin/unpack-objects.c
@@ -645,10 +645,10 @@ int cmd_unpack_objects(int argc,
 				fsck_set_msg_types(&fsck_options, arg);
 				continue;
 			}
-			if (starts_with(arg, "--pack_header=")) {
-				if (parse_pack_header_option(arg + 14,
+			if (skip_prefix(arg, "--pack_header=", &arg)) {
+				if (parse_pack_header_option(arg,
 							     buffer, &len) < 0)
-					die(_("bad %s"), arg);
+					die(_("bad --pack_header: %s"), arg);
 				continue;
 			}
 			if (skip_prefix(arg, "--max-input-size=", &arg)) {
-- 
2.48.1.438.g7fbf7b046e

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [BUG] git crashes with a SIGBUS on sparc64 during pull
  2025-01-17 12:52   ` Jeff King
                       ` (2 preceding siblings ...)
  2025-01-17 12:56     ` [PATCH 3/3] index-pack, unpack-objects: use skip_prefix to avoid magic number Jeff King
@ 2025-01-17 15:55     ` Koakuma
  2025-01-18  9:20       ` Jeff King
  2025-01-19 13:12       ` [PATCH v2 0/5] " Jeff King
  3 siblings, 2 replies; 22+ messages in thread
From: Koakuma @ 2025-01-17 15:55 UTC (permalink / raw)
  To: Jeff King; +Cc: git@vger.kernel.org

Jeff King <peff@peff.net> wrote:
> Perhaps:
> 
> git init repo
> cd repo
> git commit --allow-empty -m foo
> git repack -ad
> pack=$(ls .git/objects/pack/*.pack)
> dd if=$pack of=no-header.pack bs=1 skip=12
> # don't bother parsing the first 12 bytes; we know it is
> # a version 2 pack with 2 objects
> git unpack-objects --pack_header=2,2 <no-header.pack
> 
> would be a minimal reproduction?

Thank you! This reliably triggers it every time.

> diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c
> index 2197d6d933..288cecf98f 100644
> --- a/builtin/unpack-objects.c
> +++ b/builtin/unpack-objects.c
> @@ -645,18 +645,20 @@ int cmd_unpack_objects(int argc,
> continue;
> }
> if (starts_with(arg, "--pack_header=")) {
> - struct pack_header *hdr;
> + unsigned char *hdr = buffer;
> char *c;
> 
> - hdr = (struct pack_header *)buffer;
> - hdr->hdr_signature = htonl(PACK_SIGNATURE);
> 
> - hdr->hdr_version = htonl(strtoul(arg + 14, &c, 10));
> 
> + put_be32(hdr, PACK_SIGNATURE);
> + hdr += 4;
> + put_be32(hdr, strtoul(arg + 14, &c, 10));
> + hdr += 4;
> if (*c != ',')
> die("bad %s", arg);
> - hdr->hdr_entries = htonl(strtoul(c + 1, &c, 10));
> 
> + put_be32(hdr, strtoul(c + 1, &c, 10));
> + hdr += 4;
> if (*c)
> die("bad %s", arg);
> - len = sizeof(*hdr);
> + len = hdr - buffer;
> continue;
> }
> if (skip_prefix(arg, "--max-input-size=", &arg)) {

This diff does fix the issue in `cmd_unpack_objects`, however...

> I'm curious if it's enough. After we write to this unaligned buffer,
> naturally the next thing we'll do is read from it, and the reading
> routines will do the same cast (see unpack_all() in unpack-objects).

It crashes in `unpack_all`, just as you guessed:

#0  unpack_all () at builtin/unpack-objects.c:583
583		nr_objects = ntohl(hdr->hdr_entries);

So I suppose the reading part needs to be adjusted as well?

> But maybe your platform allows unaligned reads but not writes? Probably
> I am being too optimistic. :)

As far as I understand, sparc64 traps on any unaligned accesses,
be it reads or writes.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/3] packfile: factor out --pack_header argument parsing
  2025-01-17 12:54     ` [PATCH 1/3] packfile: factor out --pack_header argument parsing Jeff King
@ 2025-01-17 22:45       ` Junio C Hamano
  2025-01-18  9:23         ` Jeff King
  0 siblings, 1 reply; 22+ messages in thread
From: Junio C Hamano @ 2025-01-17 22:45 UTC (permalink / raw)
  To: Jeff King; +Cc: Koakuma, git@vger.kernel.org

Jeff King <peff@peff.net> writes:

>  			} else if (starts_with(arg, "--pack_header=")) {
> -				struct pack_header *hdr;
> -				char *c;
> -
> -				hdr = (struct pack_header *)input_buffer;
> -				hdr->hdr_signature = htonl(PACK_SIGNATURE);
> -				hdr->hdr_version = htonl(strtoul(arg + 14, &c, 10));

Interesting.  So the file-scope static input_buffer[] sits in the
BSS and happens to be well aligned not to cause the problem, but ...

> @@ -645,18 +646,9 @@ int cmd_unpack_objects(int argc,
>  				continue;
>  			}
>  			if (starts_with(arg, "--pack_header=")) {
> -				struct pack_header *hdr;
> -				char *c;
> -
> -				hdr = (struct pack_header *)buffer;
> -				hdr->hdr_signature = htonl(PACK_SIGNATURE);
> -				hdr->hdr_version = htonl(strtoul(arg + 14, &c, 10));

... the same file-scope static buffer[] that also sits in the BSS
was not well aligned by chance?

Otherwise these should be identical code.  Very interesting.

And of course the fix in the [2/3] is absolutely the right thing to
do.

Thanks.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/3] parse_pack_header_option(): avoid unaligned memory writes
  2025-01-17 12:55     ` [PATCH 2/3] parse_pack_header_option(): avoid unaligned memory writes Jeff King
@ 2025-01-18  1:27       ` Junio C Hamano
  2025-01-18  9:36         ` Jeff King
  0 siblings, 1 reply; 22+ messages in thread
From: Junio C Hamano @ 2025-01-18  1:27 UTC (permalink / raw)
  To: Jeff King; +Cc: Koakuma, git@vger.kernel.org

Jeff King <peff@peff.net> writes:

> +	put_be32(hdr, PACK_SIGNATURE);

Tonight's comedy.  PACK_SIGNATURE is defined as 0x5041434b (in pack.h)
In <compat/bswap.h> we want to take advantage of the fact that
assigning any unsigned integer to *(unsigned char *) would assign
the integer's least significant 8 bits.

static inline void put_be32(void *ptr, uint32_t value)
{
	unsigned char *p = ptr;
	p[0] = value >> 24;
	p[1] = value >> 16;
	p[2] = value >>  8;
	p[3] = value >>  0;
}

But sparse seems not to like that.

compat/bswap.h:175:22: error: cast truncates bits from constant value (5041 becomes 41)
compat/bswap.h:176:22: error: cast truncates bits from constant value (504143 becomes 43)
compat/bswap.h:177:22: error: cast truncates bits from constant value (5041434b becomes 4b)

Of course we could do the mask, but should we have to?

I think the real compiler would be clever ehough to produce the
identical binary with the following patch that is only needed to
squelch this error, but I feel dirty after writing this.

By the way, a "git grep" finds 

	put_be32(&hdr_version, INDEX_EXTENSION_VERSION2);

in the fsmonitor.c file, which does not get flagged only because the
CPP macro expands to a small integer (2).  That is doubly insulting.


 compat/bswap.h | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git c/compat/bswap.h w/compat/bswap.h
index 512f6f4b99..b34054f2bd 100644
--- c/compat/bswap.h
+++ w/compat/bswap.h
@@ -171,23 +171,23 @@ static inline uint64_t get_be64(const void *ptr)
 static inline void put_be32(void *ptr, uint32_t value)
 {
 	unsigned char *p = ptr;
-	p[0] = value >> 24;
-	p[1] = value >> 16;
-	p[2] = value >>  8;
-	p[3] = value >>  0;
+	p[0] = (value >> 24) & 0xff;
+	p[1] = (value >> 16) & 0xff;
+	p[2] = (value >>  8) & 0xff;
+	p[3] = (value >>  0) & 0xff;
 }
 
 static inline void put_be64(void *ptr, uint64_t value)
 {
 	unsigned char *p = ptr;
-	p[0] = value >> 56;
-	p[1] = value >> 48;
-	p[2] = value >> 40;
-	p[3] = value >> 32;
-	p[4] = value >> 24;
-	p[5] = value >> 16;
-	p[6] = value >>  8;
-	p[7] = value >>  0;
+	p[0] = (value >> 56) & 0xff;
+	p[1] = (value >> 48) & 0xff;
+	p[2] = (value >> 40) & 0xff;
+	p[3] = (value >> 32) & 0xff;
+	p[4] = (value >> 24) & 0xff;
+	p[5] = (value >> 16) & 0xff;
+	p[6] = (value >>  8) & 0xff;
+	p[7] = (value >>  0) & 0xff;
 }
 
 #endif /* COMPAT_BSWAP_H */


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [BUG] git crashes with a SIGBUS on sparc64 during pull
  2025-01-17 15:55     ` [BUG] git crashes with a SIGBUS on sparc64 during pull Koakuma
@ 2025-01-18  9:20       ` Jeff King
  2025-01-18 16:50         ` Koakuma
  2025-01-19 13:12       ` [PATCH v2 0/5] " Jeff King
  1 sibling, 1 reply; 22+ messages in thread
From: Jeff King @ 2025-01-18  9:20 UTC (permalink / raw)
  To: Koakuma; +Cc: git@vger.kernel.org

On Fri, Jan 17, 2025 at 03:55:53PM +0000, Koakuma wrote:

> This diff does fix the issue in `cmd_unpack_objects`, however...
> 
> > I'm curious if it's enough. After we write to this unaligned buffer,
> > naturally the next thing we'll do is read from it, and the reading
> > routines will do the same cast (see unpack_all() in unpack-objects).
> 
> It crashes in `unpack_all`, just as you guessed:
> 
> #0  unpack_all () at builtin/unpack-objects.c:583
> 583		nr_objects = ntohl(hdr->hdr_entries);
> 
> So I suppose the reading part needs to be adjusted as well?

I guess that's not too surprising. Probably an application of get_be32()
would solve it. But I do wonder if it would be simpler just to make sure
the buffer is aligned. You mentioned that you tried that before and it
worked. How did you do it? With a pragma/attribute, or with a union (as
below)?

The union thing should be portable, I'd think, but unfortunately has a
lot of fallout through the code because the name of "buffer" changes (we
could also declare the storage separately and make "buffer" a pointer to
it, but we'd have to be careful about calls to sizeof()).

diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c
index 2197d6d933..65db435b46 100644
--- a/builtin/unpack-objects.c
+++ b/builtin/unpack-objects.c
@@ -23,7 +23,10 @@ static int dry_run, quiet, recover, has_errors, strict;
 static const char unpack_usage[] = "git unpack-objects [-n] [-q] [-r] [--strict]";
 
 /* We always read in 4kB chunks. */
-static unsigned char buffer[4096];
+static union {
+	struct pack_header hdr;
+	unsigned char bytes[4096];
+} buffer;
 static unsigned int offset, len;
 static off_t consumed_bytes;
 static off_t max_input_size;
@@ -65,24 +68,24 @@ static void add_object_buffer(struct object *object, char *buffer, unsigned long
 static void *fill(int min)
 {
 	if (min <= len)
-		return buffer + offset;
-	if (min > sizeof(buffer))
+		return buffer.bytes + offset;
+	if (min > sizeof(buffer.bytes))
 		die("cannot fill %d bytes", min);
 	if (offset) {
-		the_hash_algo->update_fn(&ctx, buffer, offset);
-		memmove(buffer, buffer + offset, len);
+		the_hash_algo->update_fn(&ctx, buffer.bytes, offset);
+		memmove(buffer.bytes, buffer.bytes + offset, len);
 		offset = 0;
 	}
 	do {
-		ssize_t ret = xread(0, buffer + len, sizeof(buffer) - len);
+		ssize_t ret = xread(0, buffer.bytes + len, sizeof(buffer.bytes) - len);
 		if (ret <= 0) {
 			if (!ret)
 				die("early EOF");
 			die_errno("read error on input");
 		}
 		len += ret;
 	} while (len < min);
-	return buffer;
+	return buffer.bytes;
 }
 
 static void use(int bytes)
@@ -645,18 +648,16 @@ int cmd_unpack_objects(int argc,
 				continue;
 			}
 			if (starts_with(arg, "--pack_header=")) {
-				struct pack_header *hdr;
 				char *c;
 
-				hdr = (struct pack_header *)buffer;
-				hdr->hdr_signature = htonl(PACK_SIGNATURE);
-				hdr->hdr_version = htonl(strtoul(arg + 14, &c, 10));
+				buffer.hdr.hdr_signature = htonl(PACK_SIGNATURE);
+				buffer.hdr.hdr_version = htonl(strtoul(arg + 14, &c, 10));
 				if (*c != ',')
 					die("bad %s", arg);
-				hdr->hdr_entries = htonl(strtoul(c + 1, &c, 10));
+				buffer.hdr.hdr_entries = htonl(strtoul(c + 1, &c, 10));
 				if (*c)
 					die("bad %s", arg);
-				len = sizeof(*hdr);
+				len = sizeof(buffer.hdr);
 				continue;
 			}
 			if (skip_prefix(arg, "--max-input-size=", &arg)) {
@@ -671,7 +672,7 @@ int cmd_unpack_objects(int argc,
 	}
 	the_hash_algo->init_fn(&ctx);
 	unpack_all();
-	the_hash_algo->update_fn(&ctx, buffer, offset);
+	the_hash_algo->update_fn(&ctx, buffer.bytes, offset);
 	the_hash_algo->init_fn(&tmp_ctx);
 	the_hash_algo->clone_fn(&tmp_ctx, &ctx);
 	the_hash_algo->final_oid_fn(&oid, &tmp_ctx);
@@ -686,7 +687,7 @@ int cmd_unpack_objects(int argc,
 	use(the_hash_algo->rawsz);
 
 	/* Write the last part of the buffer to stdout */
-	write_in_full(1, buffer + offset, len);
+	write_in_full(1, buffer.bytes + offset, len);
 
 	/* All done */
 	return has_errors;

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/3] packfile: factor out --pack_header argument parsing
  2025-01-17 22:45       ` Junio C Hamano
@ 2025-01-18  9:23         ` Jeff King
  2025-01-18 16:57           ` Koakuma
  0 siblings, 1 reply; 22+ messages in thread
From: Jeff King @ 2025-01-18  9:23 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Koakuma, git@vger.kernel.org

On Fri, Jan 17, 2025 at 02:45:04PM -0800, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> >  			} else if (starts_with(arg, "--pack_header=")) {
> > -				struct pack_header *hdr;
> > -				char *c;
> > -
> > -				hdr = (struct pack_header *)input_buffer;
> > -				hdr->hdr_signature = htonl(PACK_SIGNATURE);
> > -				hdr->hdr_version = htonl(strtoul(arg + 14, &c, 10));
> 
> Interesting.  So the file-scope static input_buffer[] sits in the
> BSS and happens to be well aligned not to cause the problem, but ...

I suspect it _is_ a problem, but either:

  - The OP's test case was small enough to trigger unpack-objects, not
    index-pack. Possibly:

      git index-pack --stdin --pack_header=2,2 <no-header.pack

    would fail for them.

  - We simply got lucky with alignment based on the other things in BSS,
    the whim of the compiler, etc. But it is an accident waiting to
    happen.

-Peff

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/3] parse_pack_header_option(): avoid unaligned memory writes
  2025-01-18  1:27       ` Junio C Hamano
@ 2025-01-18  9:36         ` Jeff King
  0 siblings, 0 replies; 22+ messages in thread
From: Jeff King @ 2025-01-18  9:36 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Koakuma, git@vger.kernel.org

On Fri, Jan 17, 2025 at 05:27:21PM -0800, Junio C Hamano wrote:

> static inline void put_be32(void *ptr, uint32_t value)
> {
> 	unsigned char *p = ptr;
> 	p[0] = value >> 24;
> 	p[1] = value >> 16;
> 	p[2] = value >>  8;
> 	p[3] = value >>  0;
> }
> 
> But sparse seems not to like that.
> 
> compat/bswap.h:175:22: error: cast truncates bits from constant value (5041 becomes 41)
> compat/bswap.h:176:22: error: cast truncates bits from constant value (504143 becomes 43)
> compat/bswap.h:177:22: error: cast truncates bits from constant value (5041434b becomes 4b)
> 
> Of course we could do the mask, but should we have to?

Cute. I think the above is well defined in terms of the C standard. But
I could see how a linter might want to remind you that you're truncating
a constant.

It is kind of lame that it only flags the call with a constant. If you
want to warn people that they are accidentally truncating, surely it's
obvious in the code above that truncation is _possible_ depending on the
value. It seems like it's either worth flagging as a dangerous
construct, or not; but doing it only for a constant is not super
helpful.

> I think the real compiler would be clever ehough to produce the
> identical binary with the following patch that is only needed to
> squelch this error, but I feel dirty after writing this.

I checked with "gcc -s" and it produces the same asm before and after
your patch, with both -O0 and -O2. So I don't think there's a practical
downside. As far as feeling dirty, I dunno. It is basically telling any
linter "yes, I know we are truncating here". Since it is contained
within put_be32() and won't spread across the code base, I'm not too
offended by it.

I guess the other option is to pass -Wno-cast-truncate to sparse.

> By the way, a "git grep" finds 
> 
> 	put_be32(&hdr_version, INDEX_EXTENSION_VERSION2);
> 
> in the fsmonitor.c file, which does not get flagged only because the
> CPP macro expands to a small integer (2).  That is doubly insulting.

Heh. Yeah, that goes back to my "kind of lame" comment above. ;)

-Peff

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [BUG] git crashes with a SIGBUS on sparc64 during pull
  2025-01-18  9:20       ` Jeff King
@ 2025-01-18 16:50         ` Koakuma
  0 siblings, 0 replies; 22+ messages in thread
From: Koakuma @ 2025-01-18 16:50 UTC (permalink / raw)
  To: Jeff King; +Cc: git@vger.kernel.org

Jeff King <peff@peff.net> wrote:
> The union thing should be portable, I'd think, but unfortunately has a
> lot of fallout through the code because the name of "buffer" changes (we
> could also declare the storage separately and make "buffer" a pointer to
> it, but we'd have to be careful about calls to sizeof()).
> 
> diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c
> index 2197d6d933..65db435b46 100644
> --- a/builtin/unpack-objects.c
> +++ b/builtin/unpack-objects.c
> @@ -23,7 +23,10 @@ static int dry_run, quiet, recover, has_errors, strict;
> static const char unpack_usage[] = "git unpack-objects [-n] [-q] [-r] [--strict]";
> 
> /* We always read in 4kB chunks. */
> -static unsigned char buffer[4096];
> +static union {
> + struct pack_header hdr;
> + unsigned char bytes[4096];
> +} buffer;
> static unsigned int offset, len;
> static off_t consumed_bytes;
> static off_t max_input_size;
> @@ -65,24 +68,24 @@ static void add_object_buffer(struct object *object, char *buffer, unsigned long
> static void *fill(int min)
> {
> if (min <= len)
> - return buffer + offset;
> - if (min > sizeof(buffer))
> 
> + return buffer.bytes + offset;
> + if (min > sizeof(buffer.bytes))
> 
> die("cannot fill %d bytes", min);
> if (offset) {
> - the_hash_algo->update_fn(&ctx, buffer, offset);
> 
> - memmove(buffer, buffer + offset, len);
> + the_hash_algo->update_fn(&ctx, buffer.bytes, offset);
> 
> + memmove(buffer.bytes, buffer.bytes + offset, len);
> offset = 0;
> }
> do {
> - ssize_t ret = xread(0, buffer + len, sizeof(buffer) - len);
> + ssize_t ret = xread(0, buffer.bytes + len, sizeof(buffer.bytes) - len);
> if (ret <= 0) {
> if (!ret)
> die("early EOF");
> die_errno("read error on input");
> }
> len += ret;
> } while (len < min);
> - return buffer;
> + return buffer.bytes;
> }
> 
> static void use(int bytes)
> @@ -645,18 +648,16 @@ int cmd_unpack_objects(int argc,
> continue;
> }
> if (starts_with(arg, "--pack_header=")) {
> - struct pack_header *hdr;
> char *c;
> 
> - hdr = (struct pack_header *)buffer;
> - hdr->hdr_signature = htonl(PACK_SIGNATURE);
> 
> - hdr->hdr_version = htonl(strtoul(arg + 14, &c, 10));
> 
> + buffer.hdr.hdr_signature = htonl(PACK_SIGNATURE);
> + buffer.hdr.hdr_version = htonl(strtoul(arg + 14, &c, 10));
> if (*c != ',')
> die("bad %s", arg);
> - hdr->hdr_entries = htonl(strtoul(c + 1, &c, 10));
> 
> + buffer.hdr.hdr_entries = htonl(strtoul(c + 1, &c, 10));
> if (*c)
> die("bad %s", arg);
> - len = sizeof(*hdr);
> + len = sizeof(buffer.hdr);
> continue;
> }
> if (skip_prefix(arg, "--max-input-size=", &arg)) {
> @@ -671,7 +672,7 @@ int cmd_unpack_objects(int argc,
> }
> the_hash_algo->init_fn(&ctx);
> 
> unpack_all();
> - the_hash_algo->update_fn(&ctx, buffer, offset);
> 
> + the_hash_algo->update_fn(&ctx, buffer.bytes, offset);
> 
> the_hash_algo->init_fn(&tmp_ctx);
> 
> the_hash_algo->clone_fn(&tmp_ctx, &ctx);
> 
> the_hash_algo->final_oid_fn(&oid, &tmp_ctx);
> 
> @@ -686,7 +687,7 @@ int cmd_unpack_objects(int argc,
> use(the_hash_algo->rawsz);
> 
> 
> /* Write the last part of the buffer to stdout /
> - write_in_full(1, buffer + offset, len);
> + write_in_full(1, buffer.bytes + offset, len);
> 
> / All done */
> return has_errors;

This patch works well here, yes.

> I guess that's not too surprising. Probably an application of get_be32()
> would solve it. But I do wonder if it would be simpler just to make sure
> the buffer is aligned. You mentioned that you tried that before and it
> worked. How did you do it? With a pragma/attribute, or with a union (as
> below)?

I did it by adding an attribute, though probably a better implementation would
query for the alignment of `structack_hdr` instead of hardcoding it.

diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c
index 842a90353a..98f270ec0c 100644
--- a/builtin/unpack-objects.c
+++ b/builtin/unpack-objects.c
@@ -23,7 +23,7 @@ static int dry_run, quiet, recover, has_errors, strict;
 static const char unpack_usage[] = "git unpack-objects [-n] [-q] [-r] [--strict]";
 
 /* We always read in 4kB chunks. */
-static unsigned char buffer[4096];
+static unsigned char buffer[4096] __attribute__((aligned(16)));
 static unsigned int offset, len;
 static off_t consumed_bytes;
 static off_t max_input_size;


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/3] packfile: factor out --pack_header argument parsing
  2025-01-18  9:23         ` Jeff King
@ 2025-01-18 16:57           ` Koakuma
  0 siblings, 0 replies; 22+ messages in thread
From: Koakuma @ 2025-01-18 16:57 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git@vger.kernel.org

Jeff King <peff@peff.net> wrote:
> Junio C Hamano wrote:
> > Jeff King peff@peff.net writes:
> > Interesting. So the file-scope static input_buffer[] sits in the
> > BSS and happens to be well aligned not to cause the problem, but ...
> 
> I suspect it is a problem, but either:
> 
> - The OP's test case was small enough to trigger unpack-objects, not
> index-pack. Possibly:
> 
> git index-pack --stdin --pack_header=2,2 <no-header.pack
> 
> would fail for them.
> 
> - We simply got lucky with alignment based on the other things in BSS,
> the whim of the compiler, etc. But it is an accident waiting to
> happen.
> 
> -Peff

That particular command doesn't seem to crash for me right now, but, as you
said, that is probably just a lucky coincidence that the compiler happens to
place the buffer on aligned memory.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v2 0/5] git crashes with a SIGBUS on sparc64 during pull
  2025-01-17 15:55     ` [BUG] git crashes with a SIGBUS on sparc64 during pull Koakuma
  2025-01-18  9:20       ` Jeff King
@ 2025-01-19 13:12       ` Jeff King
  2025-01-19 13:23         ` [PATCH v2 1/5] bswap.h: squelch potential sparse -Wcast-truncate warnings Jeff King
                           ` (5 more replies)
  1 sibling, 6 replies; 22+ messages in thread
From: Jeff King @ 2025-01-19 13:12 UTC (permalink / raw)
  To: Koakuma; +Cc: Junio C Hamano, git@vger.kernel.org

Here's a v2 which fixes the reading side, as well. I think this should
let you get through a full run of unpack-objects, but please confirm. :)

This also includes Junio's put_be32() tweak to silence sparse.

  [1/5]: bswap.h: squelch potential sparse -Wcast-truncate warnings
  [2/5]: packfile: factor out --pack_header argument parsing
  [3/5]: parse_pack_header_option(): avoid unaligned memory writes
  [4/5]: index-pack, unpack-objects: use get_be32() for reading pack header
  [5/5]: index-pack, unpack-objects: use skip_prefix to avoid magic number

 builtin/index-pack.c     | 30 ++++++++++++------------------
 builtin/unpack-objects.c | 31 ++++++++++++-------------------
 compat/bswap.h           | 24 ++++++++++++------------
 pack.h                   |  3 ++-
 packfile.c               | 20 ++++++++++++++++++++
 packfile.h               |  6 ++++++
 6 files changed, 64 insertions(+), 50 deletions(-)

-Peff

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v2 1/5] bswap.h: squelch potential sparse -Wcast-truncate warnings
  2025-01-19 13:12       ` [PATCH v2 0/5] " Jeff King
@ 2025-01-19 13:23         ` Jeff King
  2025-01-19 13:23         ` [PATCH v2 2/5] packfile: factor out --pack_header argument parsing Jeff King
                           ` (4 subsequent siblings)
  5 siblings, 0 replies; 22+ messages in thread
From: Jeff King @ 2025-01-19 13:23 UTC (permalink / raw)
  To: Koakuma; +Cc: Junio C Hamano, git@vger.kernel.org

From: Junio C Hamano <gitster@pobox.com>

In put_be32(), we right-shift a uint32_t value various amounts and then
assign the low 8-bits to individual "unsigned char" bytes, throwing away
the high bits. For shifts smaller than 24 bits, those thrown away bits
will be arbitrary bits from the original uint32_t.

This works exactly as we want, but if you feed a constant, then sparse
complains. For example if we write this (which we plan to do in a future
patch):

  put_be32(hdr, PACK_SIGNATURE);

then "make sparse" produces:

  compat/bswap.h:175:22: error: cast truncates bits from constant value (5041 becomes 41)
  compat/bswap.h:176:22: error: cast truncates bits from constant value (504143 becomes 43)
  compat/bswap.h:177:22: error: cast truncates bits from constant value (5041434b becomes 4b)

And the same issue exists in the other put_be*() functions, when used
with a constant.

We can silence this warning by explicitly masking off the truncated
bits. The compiler is smart enough to know the result is the same, and
the asm generated by gcc (with both -O0 and -O2) is identical.

Curiously this line already exists:

	put_be32(&hdr_version, INDEX_EXTENSION_VERSION2);

in the fsmonitor.c file, but it does not get flagged because the CPP
macro expands to a small integer (2).

Signed-off-by: Jeff King <peff@peff.net>
---
I tweaked the commit message to make more sense in context. Probably
should get Junio's signoff. :)

I noticed that reftable/basics.c has its own version of these macros,
and it also does the masking.

 compat/bswap.h | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/compat/bswap.h b/compat/bswap.h
index 512f6f4b99..b34054f2bd 100644
--- a/compat/bswap.h
+++ b/compat/bswap.h
@@ -171,23 +171,23 @@ static inline uint64_t get_be64(const void *ptr)
 static inline void put_be32(void *ptr, uint32_t value)
 {
 	unsigned char *p = ptr;
-	p[0] = value >> 24;
-	p[1] = value >> 16;
-	p[2] = value >>  8;
-	p[3] = value >>  0;
+	p[0] = (value >> 24) & 0xff;
+	p[1] = (value >> 16) & 0xff;
+	p[2] = (value >>  8) & 0xff;
+	p[3] = (value >>  0) & 0xff;
 }
 
 static inline void put_be64(void *ptr, uint64_t value)
 {
 	unsigned char *p = ptr;
-	p[0] = value >> 56;
-	p[1] = value >> 48;
-	p[2] = value >> 40;
-	p[3] = value >> 32;
-	p[4] = value >> 24;
-	p[5] = value >> 16;
-	p[6] = value >>  8;
-	p[7] = value >>  0;
+	p[0] = (value >> 56) & 0xff;
+	p[1] = (value >> 48) & 0xff;
+	p[2] = (value >> 40) & 0xff;
+	p[3] = (value >> 32) & 0xff;
+	p[4] = (value >> 24) & 0xff;
+	p[5] = (value >> 16) & 0xff;
+	p[6] = (value >>  8) & 0xff;
+	p[7] = (value >>  0) & 0xff;
 }
 
 #endif /* COMPAT_BSWAP_H */
-- 
2.48.1.466.g22b8cfd1fa


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v2 2/5] packfile: factor out --pack_header argument parsing
  2025-01-19 13:12       ` [PATCH v2 0/5] " Jeff King
  2025-01-19 13:23         ` [PATCH v2 1/5] bswap.h: squelch potential sparse -Wcast-truncate warnings Jeff King
@ 2025-01-19 13:23         ` Jeff King
  2025-01-19 13:23         ` [PATCH v2 3/5] parse_pack_header_option(): avoid unaligned memory writes Jeff King
                           ` (3 subsequent siblings)
  5 siblings, 0 replies; 22+ messages in thread
From: Jeff King @ 2025-01-19 13:23 UTC (permalink / raw)
  To: Koakuma; +Cc: Junio C Hamano, git@vger.kernel.org

Both index-pack and unpack-objects accept a --pack_header argument. This
is an undocumented internal argument used by receive-pack and fetch to
pass along information about the header of the pack, which they've
already read from the incoming stream.

In preparation for a bugfix, let's factor the duplicated code into a
common helper.

The callers are still responsible for identifying the option. While this
could likewise be factored out, it is more flexible this way (e.g., if
they ever started using parse-options and wanted to handle both the
stuck and unstuck forms).

Likewise, the callers are responsible for reporting errors, though they
both just call die(). I've tweaked unpack-objects to match index-pack in
marking the error for translation.

Signed-off-by: Jeff King <peff@peff.net>
---
Same as before.

 builtin/index-pack.c     | 14 +++-----------
 builtin/unpack-objects.c | 16 ++++------------
 packfile.c               | 17 +++++++++++++++++
 packfile.h               |  6 ++++++
 4 files changed, 30 insertions(+), 23 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 0b62b2589f..75b84f78f4 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1955,18 +1955,10 @@ int cmd_index_pack(int argc,
 					nr_threads = 1;
 				}
 			} else if (starts_with(arg, "--pack_header=")) {
-				struct pack_header *hdr;
-				char *c;
-
-				hdr = (struct pack_header *)input_buffer;
-				hdr->hdr_signature = htonl(PACK_SIGNATURE);
-				hdr->hdr_version = htonl(strtoul(arg + 14, &c, 10));
-				if (*c != ',')
-					die(_("bad %s"), arg);
-				hdr->hdr_entries = htonl(strtoul(c + 1, &c, 10));
-				if (*c)
+				if (parse_pack_header_option(arg + 14,
+							     input_buffer,
+							     &input_len) < 0)
 					die(_("bad %s"), arg);
-				input_len = sizeof(*hdr);
 			} else if (!strcmp(arg, "-v")) {
 				verbose = 1;
 			} else if (!strcmp(arg, "--progress-title")) {
diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c
index 2197d6d933..cf2bc5c531 100644
--- a/builtin/unpack-objects.c
+++ b/builtin/unpack-objects.c
@@ -18,6 +18,7 @@
 #include "progress.h"
 #include "decorate.h"
 #include "fsck.h"
+#include "packfile.h"
 
 static int dry_run, quiet, recover, has_errors, strict;
 static const char unpack_usage[] = "git unpack-objects [-n] [-q] [-r] [--strict]";
@@ -645,18 +646,9 @@ int cmd_unpack_objects(int argc,
 				continue;
 			}
 			if (starts_with(arg, "--pack_header=")) {
-				struct pack_header *hdr;
-				char *c;
-
-				hdr = (struct pack_header *)buffer;
-				hdr->hdr_signature = htonl(PACK_SIGNATURE);
-				hdr->hdr_version = htonl(strtoul(arg + 14, &c, 10));
-				if (*c != ',')
-					die("bad %s", arg);
-				hdr->hdr_entries = htonl(strtoul(c + 1, &c, 10));
-				if (*c)
-					die("bad %s", arg);
-				len = sizeof(*hdr);
+				if (parse_pack_header_option(arg + 14,
+							     buffer, &len) < 0)
+					die(_("bad %s"), arg);
 				continue;
 			}
 			if (skip_prefix(arg, "--max-input-size=", &arg)) {
diff --git a/packfile.c b/packfile.c
index cc7ab6403a..2bf9e57330 100644
--- a/packfile.c
+++ b/packfile.c
@@ -2315,3 +2315,20 @@ int is_promisor_object(struct repository *r, const struct object_id *oid)
 	}
 	return oidset_contains(&promisor_objects, oid);
 }
+
+int parse_pack_header_option(const char *in, unsigned char *out, unsigned int *len)
+{
+	struct pack_header *hdr;
+	char *c;
+
+	hdr = (struct pack_header *)out;
+	hdr->hdr_signature = htonl(PACK_SIGNATURE);
+	hdr->hdr_version = htonl(strtoul(in, &c, 10));
+	if (*c != ',')
+		return -1;
+	hdr->hdr_entries = htonl(strtoul(c + 1, &c, 10));
+	if (*c)
+		return -1;
+	*len = sizeof(*hdr);
+	return 0;
+}
diff --git a/packfile.h b/packfile.h
index 58104fa009..00ada7a938 100644
--- a/packfile.h
+++ b/packfile.h
@@ -216,4 +216,10 @@ int is_promisor_object(struct repository *r, const struct object_id *oid);
 int load_idx(const char *path, const unsigned int hashsz, void *idx_map,
 	     size_t idx_size, struct packed_git *p);
 
+/*
+ * Parse a --pack_header option as accepted by index-pack and unpack-objects,
+ * turning it into the matching bytes we'd find in a pack.
+ */
+int parse_pack_header_option(const char *in, unsigned char *out, unsigned int *len);
+
 #endif
-- 
2.48.1.466.g22b8cfd1fa


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v2 3/5] parse_pack_header_option(): avoid unaligned memory writes
  2025-01-19 13:12       ` [PATCH v2 0/5] " Jeff King
  2025-01-19 13:23         ` [PATCH v2 1/5] bswap.h: squelch potential sparse -Wcast-truncate warnings Jeff King
  2025-01-19 13:23         ` [PATCH v2 2/5] packfile: factor out --pack_header argument parsing Jeff King
@ 2025-01-19 13:23         ` Jeff King
  2025-01-19 13:25         ` [PATCH v2 4/5] index-pack, unpack-objects: use get_be32() for reading pack header Jeff King
                           ` (2 subsequent siblings)
  5 siblings, 0 replies; 22+ messages in thread
From: Jeff King @ 2025-01-19 13:23 UTC (permalink / raw)
  To: Koakuma; +Cc: Junio C Hamano, git@vger.kernel.org

In order to recreate a pack header in our in-memory buffer, we cast the
buffer to a "struct pack_header" and assign the individual fields. This
is reported to cause SIGBUS on sparc64 due to alignment issues.

We can work around this by using put_be32() which will write individual
bytes into the buffer.

Reported-by: Koakuma <koachan@protonmail.com>
Signed-off-by: Jeff King <peff@peff.net>
---
Same as before.

 packfile.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/packfile.c b/packfile.c
index 2bf9e57330..2d80d80cb3 100644
--- a/packfile.c
+++ b/packfile.c
@@ -2318,17 +2318,20 @@ int is_promisor_object(struct repository *r, const struct object_id *oid)
 
 int parse_pack_header_option(const char *in, unsigned char *out, unsigned int *len)
 {
-	struct pack_header *hdr;
+	unsigned char *hdr;
 	char *c;
 
-	hdr = (struct pack_header *)out;
-	hdr->hdr_signature = htonl(PACK_SIGNATURE);
-	hdr->hdr_version = htonl(strtoul(in, &c, 10));
+	hdr = out;
+	put_be32(hdr, PACK_SIGNATURE);
+	hdr += 4;
+	put_be32(hdr, strtoul(in, &c, 10));
+	hdr += 4;
 	if (*c != ',')
 		return -1;
-	hdr->hdr_entries = htonl(strtoul(c + 1, &c, 10));
+	put_be32(hdr, strtoul(c + 1, &c, 10));
+	hdr += 4;
 	if (*c)
 		return -1;
-	*len = sizeof(*hdr);
+	*len = hdr - out;
 	return 0;
 }
-- 
2.48.1.466.g22b8cfd1fa


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v2 4/5] index-pack, unpack-objects: use get_be32() for reading pack header
  2025-01-19 13:12       ` [PATCH v2 0/5] " Jeff King
                           ` (2 preceding siblings ...)
  2025-01-19 13:23         ` [PATCH v2 3/5] parse_pack_header_option(): avoid unaligned memory writes Jeff King
@ 2025-01-19 13:25         ` Jeff King
  2025-01-19 13:25         ` [PATCH v2 5/5] index-pack, unpack-objects: use skip_prefix to avoid magic number Jeff King
  2025-01-20 15:20         ` [PATCH v2 0/5] git crashes with a SIGBUS on sparc64 during pull Koakuma
  5 siblings, 0 replies; 22+ messages in thread
From: Jeff King @ 2025-01-19 13:25 UTC (permalink / raw)
  To: Koakuma; +Cc: Junio C Hamano, git@vger.kernel.org

Both of these commands read the incoming pack into a static unsigned
char buffer in BSS, and then parse it by casting the start of the buffer
to a struct pack_header. This can result in SIGBUS on some platforms if
the compiler doesn't place the buffer in a position that is properly
aligned for 4-byte integers.

This reportedly happens with unpack-objects (but not index-pack) on
sparc64 when compiled with clang (but not gcc). But we are definitely in
the wrong in both spots; since the buffer's type is unsigned char, we
can't depend on larger alignment. When it works it is only because we
are lucky.

We'll fix this by switching to get_be32() to read the headers (just like
the last few commits similarly switched us to put_be32() for writing
into the same buffer).

It would be nice to factor this out into a common helper function, but
the interface ends up quite awkward. Either the caller needs to hardcode
how many bytes we'll need, or it needs to pass us its fill()/use()
functions as pointers. So I've just fixed both spots in the same way;
this is not code that is likely to be repeated a third time (most of the
pack reading code uses an mmap'd buffer, which should be properly
aligned).

I did make one tweak to the shared code: our pack_version_ok() macro
expects us to pass the big-endian value we'd get by casting. We can
introduce a "native" variant which uses the host integer ordering.

Reported-by: Koakuma <koachan@protonmail.com>
Signed-off-by: Jeff King <peff@peff.net>
---
New in v2.

I did write the function-pointer version, and it's not even _too_ bad to
read. But while trying to write a comment documenting the helper, it
was just too gross to justify.

 builtin/index-pack.c     | 12 +++++++-----
 builtin/unpack-objects.c | 13 +++++++------
 pack.h                   |  3 ++-
 3 files changed, 16 insertions(+), 12 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 75b84f78f4..d6fd4bbde6 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -379,16 +379,18 @@ static const char *open_pack_file(const char *pack_name)
 
 static void parse_pack_header(void)
 {
-	struct pack_header *hdr = fill(sizeof(struct pack_header));
+	unsigned char *hdr = fill(sizeof(struct pack_header));
 
 	/* Header consistency check */
-	if (hdr->hdr_signature != htonl(PACK_SIGNATURE))
+	if (get_be32(hdr) != PACK_SIGNATURE)
 		die(_("pack signature mismatch"));
-	if (!pack_version_ok(hdr->hdr_version))
+	hdr += 4;
+	if (!pack_version_ok_native(get_be32(hdr)))
 		die(_("pack version %"PRIu32" unsupported"),
-			ntohl(hdr->hdr_version));
+		    get_be32(hdr));
+	hdr += 4;
 
-	nr_objects = ntohl(hdr->hdr_entries);
+	nr_objects = get_be32(hdr);
 	use(sizeof(struct pack_header));
 }
 
diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c
index cf2bc5c531..76c6a9031b 100644
--- a/builtin/unpack-objects.c
+++ b/builtin/unpack-objects.c
@@ -579,15 +579,16 @@ static void unpack_one(unsigned nr)
 static void unpack_all(void)
 {
 	int i;
-	struct pack_header *hdr = fill(sizeof(struct pack_header));
+	unsigned char *hdr = fill(sizeof(struct pack_header));
 
-	nr_objects = ntohl(hdr->hdr_entries);
-
-	if (ntohl(hdr->hdr_signature) != PACK_SIGNATURE)
+	if (get_be32(hdr) != PACK_SIGNATURE)
 		die("bad pack file");
-	if (!pack_version_ok(hdr->hdr_version))
+	hdr += 4;
+	if (!pack_version_ok_native(get_be32(hdr)))
 		die("unknown pack file version %"PRIu32,
-			ntohl(hdr->hdr_version));
+		    get_be32(hdr));
+	hdr += 4;
+	nr_objects = get_be32(hdr);
 	use(sizeof(struct pack_header));
 
 	if (!quiet)
diff --git a/pack.h b/pack.h
index a8da040629..78f8d8b213 100644
--- a/pack.h
+++ b/pack.h
@@ -13,7 +13,8 @@ struct repository;
  */
 #define PACK_SIGNATURE 0x5041434b	/* "PACK" */
 #define PACK_VERSION 2
-#define pack_version_ok(v) ((v) == htonl(2) || (v) == htonl(3))
+#define pack_version_ok(v) pack_version_ok_native(ntohl(v))
+#define pack_version_ok_native(v) ((v) == 2 || (v) == 3)
 struct pack_header {
 	uint32_t hdr_signature;
 	uint32_t hdr_version;
-- 
2.48.1.466.g22b8cfd1fa


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v2 5/5] index-pack, unpack-objects: use skip_prefix to avoid magic number
  2025-01-19 13:12       ` [PATCH v2 0/5] " Jeff King
                           ` (3 preceding siblings ...)
  2025-01-19 13:25         ` [PATCH v2 4/5] index-pack, unpack-objects: use get_be32() for reading pack header Jeff King
@ 2025-01-19 13:25         ` Jeff King
  2025-01-20 15:20         ` [PATCH v2 0/5] git crashes with a SIGBUS on sparc64 during pull Koakuma
  5 siblings, 0 replies; 22+ messages in thread
From: Jeff King @ 2025-01-19 13:25 UTC (permalink / raw)
  To: Koakuma; +Cc: Junio C Hamano, git@vger.kernel.org

When parsing --pack_header=, we manually skip 14 bytes to the data.
Let's use skip_prefix() to do this automatically.

Note that we overwrite our pointer to the front of the string, so we
have to add more context to the error message. We could avoid this by
declaring an extra pointer to hold the value, but I think the modified
message is actually preferable; it should give translators a bit more
context.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/index-pack.c     | 6 +++---
 builtin/unpack-objects.c | 6 +++---
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index d6fd4bbde6..c632d9f88b 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1956,11 +1956,11 @@ int cmd_index_pack(int argc,
 					warning(_("no threads support, ignoring %s"), arg);
 					nr_threads = 1;
 				}
-			} else if (starts_with(arg, "--pack_header=")) {
-				if (parse_pack_header_option(arg + 14,
+			} else if (skip_prefix(arg, "--pack_header=", &arg)) {
+				if (parse_pack_header_option(arg,
 							     input_buffer,
 							     &input_len) < 0)
-					die(_("bad %s"), arg);
+					die(_("bad --pack_header: %s"), arg);
 			} else if (!strcmp(arg, "-v")) {
 				verbose = 1;
 			} else if (!strcmp(arg, "--progress-title")) {
diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c
index 76c6a9031b..51a856d823 100644
--- a/builtin/unpack-objects.c
+++ b/builtin/unpack-objects.c
@@ -646,10 +646,10 @@ int cmd_unpack_objects(int argc,
 				fsck_set_msg_types(&fsck_options, arg);
 				continue;
 			}
-			if (starts_with(arg, "--pack_header=")) {
-				if (parse_pack_header_option(arg + 14,
+			if (skip_prefix(arg, "--pack_header=", &arg)) {
+				if (parse_pack_header_option(arg,
 							     buffer, &len) < 0)
-					die(_("bad %s"), arg);
+					die(_("bad --pack_header: %s"), arg);
 				continue;
 			}
 			if (skip_prefix(arg, "--max-input-size=", &arg)) {
-- 
2.48.1.466.g22b8cfd1fa

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 0/5] git crashes with a SIGBUS on sparc64 during pull
  2025-01-19 13:12       ` [PATCH v2 0/5] " Jeff King
                           ` (4 preceding siblings ...)
  2025-01-19 13:25         ` [PATCH v2 5/5] index-pack, unpack-objects: use skip_prefix to avoid magic number Jeff King
@ 2025-01-20 15:20         ` Koakuma
  2025-01-21 16:37           ` Junio C Hamano
  5 siblings, 1 reply; 22+ messages in thread
From: Koakuma @ 2025-01-20 15:20 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git@vger.kernel.org

Jeff King <peff@peff.net> wrote:
> Here's a v2 which fixes the reading side, as well. I think this should
> let you get through a full run of unpack-objects, but please confirm. :)
> 
> This also includes Junio's put_be32() tweak to silence sparse.
> 
> [1/5]: bswap.h: squelch potential sparse -Wcast-truncate warnings
> [2/5]: packfile: factor out --pack_header argument parsing
> [3/5]: parse_pack_header_option(): avoid unaligned memory writes
> [4/5]: index-pack, unpack-objects: use get_be32() for reading pack header
> [5/5]: index-pack, unpack-objects: use skip_prefix to avoid magic number
> 
> builtin/index-pack.c | 30 ++++++++++++------------------
> builtin/unpack-objects.c | 31 ++++++++++++-------------------
> compat/bswap.h | 24 ++++++++++++------------
> pack.h | 3 ++-
> packfile.c | 20 ++++++++++++++++++++
> packfile.h | 6 ++++++
> 6 files changed, 64 insertions(+), 50 deletions(-)
> 
> -Peff

Okay, just tested the patchset here.
Both the testcase from upthread and actual pulls seem to work well now,
without any crashes happening.
Thanks a lot!

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 0/5] git crashes with a SIGBUS on sparc64 during pull
  2025-01-20 15:20         ` [PATCH v2 0/5] git crashes with a SIGBUS on sparc64 during pull Koakuma
@ 2025-01-21 16:37           ` Junio C Hamano
  0 siblings, 0 replies; 22+ messages in thread
From: Junio C Hamano @ 2025-01-21 16:37 UTC (permalink / raw)
  To: Koakuma; +Cc: Jeff King, git@vger.kernel.org

Koakuma <koachan@protonmail.com> writes:

> Jeff King <peff@peff.net> wrote:
>> Here's a v2 which fixes the reading side, as well. I think this should
>> let you get through a full run of unpack-objects, but please confirm. :)
>> ...
>
> Okay, just tested the patchset here.
> Both the testcase from upthread and actual pulls seem to work well now,
> without any crashes happening.
> Thanks a lot!

Thanks, both.  Will queue.

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2025-01-21 16:37 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-01-17  3:30 [BUG] git crashes with a SIGBUS on sparc64 during pull Koakuma
2025-01-17 12:11 ` Jeff King
2025-01-17 12:52   ` Jeff King
2025-01-17 12:54     ` [PATCH 1/3] packfile: factor out --pack_header argument parsing Jeff King
2025-01-17 22:45       ` Junio C Hamano
2025-01-18  9:23         ` Jeff King
2025-01-18 16:57           ` Koakuma
2025-01-17 12:55     ` [PATCH 2/3] parse_pack_header_option(): avoid unaligned memory writes Jeff King
2025-01-18  1:27       ` Junio C Hamano
2025-01-18  9:36         ` Jeff King
2025-01-17 12:56     ` [PATCH 3/3] index-pack, unpack-objects: use skip_prefix to avoid magic number Jeff King
2025-01-17 15:55     ` [BUG] git crashes with a SIGBUS on sparc64 during pull Koakuma
2025-01-18  9:20       ` Jeff King
2025-01-18 16:50         ` Koakuma
2025-01-19 13:12       ` [PATCH v2 0/5] " Jeff King
2025-01-19 13:23         ` [PATCH v2 1/5] bswap.h: squelch potential sparse -Wcast-truncate warnings Jeff King
2025-01-19 13:23         ` [PATCH v2 2/5] packfile: factor out --pack_header argument parsing Jeff King
2025-01-19 13:23         ` [PATCH v2 3/5] parse_pack_header_option(): avoid unaligned memory writes Jeff King
2025-01-19 13:25         ` [PATCH v2 4/5] index-pack, unpack-objects: use get_be32() for reading pack header Jeff King
2025-01-19 13:25         ` [PATCH v2 5/5] index-pack, unpack-objects: use skip_prefix to avoid magic number Jeff King
2025-01-20 15:20         ` [PATCH v2 0/5] git crashes with a SIGBUS on sparc64 during pull Koakuma
2025-01-21 16:37           ` Junio C Hamano

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).