From: Patrick Steinhardt <ps@pks.im>
To: git@vger.kernel.org
Cc: Ghanshyam Thakkar <shyamthakkar001@gmail.com>,
"brian m. carlson" <sandals@crustytoothpaste.net>
Subject: [PATCH v2 04/20] global: ensure that object IDs are always padded
Date: Thu, 13 Jun 2024 08:13:40 +0200 [thread overview]
Message-ID: <fa263d6b0777ba8d157f7364507590c19069ed56.1718259125.git.ps@pks.im> (raw)
In-Reply-To: <cover.1718259125.git.ps@pks.im>
[-- Attachment #1: Type: text/plain, Size: 6452 bytes --]
The `oidcmp()` and `oideq()` functions only compare the prefix length as
specified by the given hash algorithm. This mandates that the object IDs
have a valid hash algorithm set, or otherwise we wouldn't be able to
figure out that prefix. As we do not have a hash algorithm in many
cases, for example when handling null object IDs, this assumption cannot
always be fulfilled. We thus have a fallback in place that instead uses
`the_repository` to derive the hash function. This implicit dependency
is hidden away from callers and can be quite surprising, especially in
contexts where there may be no repository.
In theory, we can adapt those functions to always memcmp(3P) the whole
length of their hash arrays. But there exist a couple of sites where we
populate `struct object_id`s such that only the prefix of its hash that
is actually used by the hash algorithm is populated. The remaining bytes
are left uninitialized. The fact that those bytes are uninitialized also
leads to warnings under Valgrind in some places where we copy those
bytes.
Refactor callsites where we populate object IDs to always initialize all
bytes. This also allows us to get rid of `oidcpy_with_padding()`, for
one because the input is now fully initialized, and because `oidcpy()`
will now always copy the whole hash array.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
hash-ll.h | 2 ++
hash.h | 16 ----------------
hex.c | 6 +++++-
http-push.c | 1 +
notes.c | 2 ++
object-file.c | 2 ++
oidtree.c | 4 ++--
parallel-checkout.c | 8 +-------
8 files changed, 15 insertions(+), 26 deletions(-)
diff --git a/hash-ll.h b/hash-ll.h
index dbb96369fc..b72f84f4ae 100644
--- a/hash-ll.h
+++ b/hash-ll.h
@@ -288,6 +288,8 @@ static inline void oidread(struct object_id *oid, const unsigned char *hash,
const struct git_hash_algo *algop)
{
memcpy(oid->hash, hash, algop->rawsz);
+ if (algop->rawsz < GIT_MAX_RAWSZ)
+ memset(oid->hash + algop->rawsz, 0, GIT_MAX_RAWSZ - algop->rawsz);
oid->algo = hash_algo_by_ptr(algop);
}
diff --git a/hash.h b/hash.h
index 43623a0c86..e43e3d8b5a 100644
--- a/hash.h
+++ b/hash.h
@@ -31,22 +31,6 @@ static inline int is_null_oid(const struct object_id *oid)
return oideq(oid, null_oid());
}
-/* Like oidcpy() but zero-pads the unused bytes in dst's hash array. */
-static inline void oidcpy_with_padding(struct object_id *dst,
- const struct object_id *src)
-{
- size_t hashsz;
-
- if (!src->algo)
- hashsz = the_hash_algo->rawsz;
- else
- hashsz = hash_algos[src->algo].rawsz;
-
- memcpy(dst->hash, src->hash, hashsz);
- memset(dst->hash + hashsz, 0, GIT_MAX_RAWSZ - hashsz);
- dst->algo = src->algo;
-}
-
static inline int is_empty_blob_oid(const struct object_id *oid)
{
return oideq(oid, the_hash_algo->empty_blob);
diff --git a/hex.c b/hex.c
index d42262bdca..bc9e86a978 100644
--- a/hex.c
+++ b/hex.c
@@ -25,8 +25,12 @@ int get_oid_hex_algop(const char *hex, struct object_id *oid,
const struct git_hash_algo *algop)
{
int ret = get_hash_hex_algop(hex, oid->hash, algop);
- if (!ret)
+ if (!ret) {
oid_set_algo(oid, algop);
+ if (algop->rawsz != GIT_MAX_RAWSZ)
+ memset(oid->hash + algop->rawsz, 0,
+ GIT_MAX_RAWSZ - algop->rawsz);
+ }
return ret;
}
diff --git a/http-push.c b/http-push.c
index 86de238b84..a97df4a1fb 100644
--- a/http-push.c
+++ b/http-push.c
@@ -1016,6 +1016,7 @@ static void remote_ls(const char *path, int flags,
/* extract hex from sharded "xx/x{38}" filename */
static int get_oid_hex_from_objpath(const char *path, struct object_id *oid)
{
+ memset(oid->hash, 0, GIT_MAX_RAWSZ);
oid->algo = hash_algo_by_ptr(the_hash_algo);
if (strlen(path) != the_hash_algo->hexsz + 1)
diff --git a/notes.c b/notes.c
index 3a8da92fb9..afe2e2882e 100644
--- a/notes.c
+++ b/notes.c
@@ -427,6 +427,8 @@ static void load_subtree(struct notes_tree *t, struct leaf_node *subtree,
hashsz - prefix_len))
goto handle_non_note; /* entry.path is not a SHA1 */
+ memset(object_oid.hash + hashsz, 0, GIT_MAX_RAWSZ - hashsz);
+
type = PTR_TYPE_NOTE;
} else if (path_len == 2) {
/* This is potentially an internal node */
diff --git a/object-file.c b/object-file.c
index c161e3e2a5..bb97f8a809 100644
--- a/object-file.c
+++ b/object-file.c
@@ -2743,6 +2743,8 @@ int for_each_file_in_obj_subdir(unsigned int subdir_nr,
!hex_to_bytes(oid.hash + 1, de->d_name,
the_hash_algo->rawsz - 1)) {
oid_set_algo(&oid, the_hash_algo);
+ memset(oid.hash + the_hash_algo->rawsz, 0,
+ GIT_MAX_RAWSZ - the_hash_algo->rawsz);
if (obj_cb) {
r = obj_cb(&oid, path->buf, data);
if (r)
diff --git a/oidtree.c b/oidtree.c
index daef175dc7..92d03b52db 100644
--- a/oidtree.c
+++ b/oidtree.c
@@ -42,7 +42,7 @@ void oidtree_insert(struct oidtree *ot, const struct object_id *oid)
* Clear the padding and copy the result in separate steps to
* respect the 4-byte alignment needed by struct object_id.
*/
- oidcpy_with_padding(&k, oid);
+ oidcpy(&k, oid);
memcpy(on->k, &k, sizeof(k));
/*
@@ -60,7 +60,7 @@ int oidtree_contains(struct oidtree *ot, const struct object_id *oid)
struct object_id k;
size_t klen = sizeof(k);
- oidcpy_with_padding(&k, oid);
+ oidcpy(&k, oid);
if (oid->algo == GIT_HASH_UNKNOWN)
klen -= sizeof(oid->algo);
diff --git a/parallel-checkout.c b/parallel-checkout.c
index b5a714c711..08b960aac8 100644
--- a/parallel-checkout.c
+++ b/parallel-checkout.c
@@ -429,13 +429,7 @@ static void send_one_item(int fd, struct parallel_checkout_item *pc_item)
fixed_portion->ident = pc_item->ca.ident;
fixed_portion->name_len = name_len;
fixed_portion->working_tree_encoding_len = working_tree_encoding_len;
- /*
- * We pad the unused bytes in the hash array because, otherwise,
- * Valgrind would complain about passing uninitialized bytes to a
- * write() syscall. The warning doesn't represent any real risk here,
- * but it could hinder the detection of actual errors.
- */
- oidcpy_with_padding(&fixed_portion->oid, &pc_item->ce->oid);
+ oidcpy(&fixed_portion->oid, &pc_item->ce->oid);
variant = data + sizeof(*fixed_portion);
if (working_tree_encoding_len) {
--
2.45.2.457.g8d94cfb545.dirty
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
next prev parent reply other threads:[~2024-06-13 6:13 UTC|newest]
Thread overview: 94+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-06-11 11:57 [PATCH 00/21] Introduce `USE_THE_REPOSITORY_VARIABLE` macro Patrick Steinhardt
2024-06-11 11:57 ` [PATCH 01/21] hash: drop (mostly) unused `is_empty_{blob,tree}_sha1()` functions Patrick Steinhardt
2024-06-11 11:57 ` [PATCH 02/21] hash: require hash algorithm in `hasheq()`, `hashcmp()` and `hashclr()` Patrick Steinhardt
2024-06-11 11:57 ` [PATCH 03/21] hash: require hash algorithm in `oidread()` and `oidclr()` Patrick Steinhardt
2024-06-11 11:57 ` [PATCH 04/21] global: ensure that object IDs are always padded Patrick Steinhardt
2024-06-11 11:57 ` [PATCH 05/21] hash: convert `oidcmp()` and `oideq()` to compare whole hash Patrick Steinhardt
2024-06-11 11:58 ` [PATCH 06/21] hash: make `is_null_oid()` independent of `the_repository` Patrick Steinhardt
2024-06-11 11:58 ` [PATCH 07/21] hash: require hash algorithm in `is_empty_{blob,tree}_oid()` Patrick Steinhardt
2024-06-11 11:58 ` [PATCH 08/21] hash: require hash algorithm in `empty_tree_oid_hex()` Patrick Steinhardt
2024-06-11 11:58 ` [PATCH 09/21] global: introduce `USE_THE_REPOSITORY_VARIABLE` macro Patrick Steinhardt
2024-06-11 11:58 ` [PATCH 10/21] refs: avoid include cycle with "repository.h" Patrick Steinhardt
2024-06-11 11:58 ` [PATCH 11/21] hash-ll: merge with "hash.h" Patrick Steinhardt
2024-06-11 11:58 ` [PATCH 12/21] http-fetch: don't crash when parsing packfile without a repo Patrick Steinhardt
2024-06-11 11:58 ` [PATCH 13/21] oidset: pass hash algorithm when parsing file Patrick Steinhardt
2024-06-11 11:58 ` [PATCH 14/21] protocol-caps: use hash algorithm from passed-in repository Patrick Steinhardt
2024-06-11 11:58 ` [PATCH 15/21] replace-object: " Patrick Steinhardt
2024-06-11 11:58 ` [PATCH 16/21] compat/fsmonitor: remove dependency on `the_repository` in Darwin IPC Patrick Steinhardt
2024-06-11 23:16 ` brian m. carlson
2024-06-12 7:38 ` Patrick Steinhardt
2024-06-11 11:58 ` [PATCH 17/21] t/helper: use correct object hash in partial-clone helper Patrick Steinhardt
2024-06-11 11:58 ` [PATCH 18/21] t/helper: fix segfault in "oid-array" command without repository Patrick Steinhardt
2024-06-11 13:23 ` Ghanshyam Thakkar
2024-06-11 11:59 ` [PATCH 19/21] t/helper: remove dependency on `the_repository` in "oidtree" Patrick Steinhardt
2024-06-11 12:57 ` Ghanshyam Thakkar
2024-06-12 7:38 ` Patrick Steinhardt
2024-06-11 23:17 ` brian m. carlson
2024-06-12 7:38 ` Patrick Steinhardt
2024-06-11 11:59 ` [PATCH 20/21] t/helper: remove dependency on `the_repository` in "proc-receive" Patrick Steinhardt
2024-06-11 11:59 ` [PATCH 21/21] hex: guard declarations with `USE_THE_REPOSITORY_VARIABLE` Patrick Steinhardt
2024-06-11 23:24 ` [PATCH 00/21] Introduce `USE_THE_REPOSITORY_VARIABLE` macro brian m. carlson
2024-06-12 7:37 ` Patrick Steinhardt
2024-06-12 10:12 ` Ghanshyam Thakkar
2024-06-13 6:13 ` [PATCH v2 00/20] " Patrick Steinhardt
2024-06-13 6:13 ` [PATCH v2 01/20] hash: drop (mostly) unused `is_empty_{blob,tree}_sha1()` functions Patrick Steinhardt
2024-06-13 6:13 ` [PATCH v2 02/20] hash: require hash algorithm in `hasheq()`, `hashcmp()` and `hashclr()` Patrick Steinhardt
2024-06-13 6:13 ` [PATCH v2 03/20] hash: require hash algorithm in `oidread()` and `oidclr()` Patrick Steinhardt
2024-06-13 6:13 ` Patrick Steinhardt [this message]
2024-06-13 6:13 ` [PATCH v2 05/20] hash: convert `oidcmp()` and `oideq()` to compare whole hash Patrick Steinhardt
2024-06-13 6:13 ` [PATCH v2 06/20] hash: make `is_null_oid()` independent of `the_repository` Patrick Steinhardt
2024-06-13 6:13 ` [PATCH v2 07/20] hash: require hash algorithm in `is_empty_{blob,tree}_oid()` Patrick Steinhardt
2024-06-13 6:14 ` [PATCH v2 08/20] hash: require hash algorithm in `empty_tree_oid_hex()` Patrick Steinhardt
2024-06-13 10:01 ` Phillip Wood
2024-06-13 15:39 ` Junio C Hamano
2024-06-14 5:23 ` Patrick Steinhardt
2024-06-13 6:14 ` [PATCH v2 09/20] global: introduce `USE_THE_REPOSITORY_VARIABLE` macro Patrick Steinhardt
2024-06-13 6:14 ` [PATCH v2 10/20] refs: avoid include cycle with "repository.h" Patrick Steinhardt
2024-06-13 6:14 ` [PATCH v2 11/20] hash-ll: merge with "hash.h" Patrick Steinhardt
2024-06-13 6:14 ` [PATCH v2 12/20] http-fetch: don't crash when parsing packfile without a repo Patrick Steinhardt
2024-06-13 6:14 ` [PATCH v2 13/20] oidset: pass hash algorithm when parsing file Patrick Steinhardt
2024-06-13 6:14 ` [PATCH v2 14/20] protocol-caps: use hash algorithm from passed-in repository Patrick Steinhardt
2024-06-13 6:14 ` [PATCH v2 15/20] replace-object: " Patrick Steinhardt
2024-06-13 6:14 ` [PATCH v2 16/20] compat/fsmonitor: fix socket path in networked SHA256 repos Patrick Steinhardt
2024-06-13 6:14 ` [PATCH v2 17/20] t/helper: use correct object hash in partial-clone helper Patrick Steinhardt
2024-06-13 6:14 ` [PATCH v2 18/20] t/helper: fix segfault in "oid-array" command without repository Patrick Steinhardt
2024-06-13 6:14 ` [PATCH v2 19/20] t/helper: remove dependency on `the_repository` in "proc-receive" Patrick Steinhardt
2024-06-13 6:14 ` [PATCH v2 20/20] hex: guard declarations with `USE_THE_REPOSITORY_VARIABLE` Patrick Steinhardt
2024-06-13 18:01 ` [PATCH v2 00/20] Introduce `USE_THE_REPOSITORY_VARIABLE` macro Junio C Hamano
2024-06-13 18:48 ` Junio C Hamano
2024-06-13 23:14 ` Ramsay Jones
2024-06-13 23:18 ` Junio C Hamano
2024-06-13 23:55 ` Ramsay Jones
2024-06-14 0:17 ` Junio C Hamano
2024-06-14 5:28 ` Patrick Steinhardt
2024-06-14 15:54 ` Junio C Hamano
2024-06-14 6:49 ` [PATCH v3 " Patrick Steinhardt
2024-06-14 6:49 ` [PATCH v3 01/20] hash: drop (mostly) unused `is_empty_{blob,tree}_sha1()` functions Patrick Steinhardt
2024-06-14 6:49 ` [PATCH v3 02/20] hash: require hash algorithm in `hasheq()`, `hashcmp()` and `hashclr()` Patrick Steinhardt
2024-06-14 6:49 ` [PATCH v3 03/20] hash: require hash algorithm in `oidread()` and `oidclr()` Patrick Steinhardt
2024-06-14 6:49 ` [PATCH v3 04/20] global: ensure that object IDs are always padded Patrick Steinhardt
2024-06-14 6:50 ` [PATCH v3 05/20] hash: convert `oidcmp()` and `oideq()` to compare whole hash Patrick Steinhardt
2024-06-17 9:18 ` Karthik Nayak
2024-06-14 6:50 ` [PATCH v3 06/20] hash: make `is_null_oid()` independent of `the_repository` Patrick Steinhardt
2024-06-14 6:50 ` [PATCH v3 07/20] hash: require hash algorithm in `is_empty_{blob,tree}_oid()` Patrick Steinhardt
2024-06-14 6:50 ` [PATCH v3 08/20] hash: require hash algorithm in `empty_tree_oid_hex()` Patrick Steinhardt
2024-06-14 6:50 ` [PATCH v3 09/20] global: introduce `USE_THE_REPOSITORY_VARIABLE` macro Patrick Steinhardt
2024-06-17 9:30 ` Karthik Nayak
2024-06-18 5:16 ` Patrick Steinhardt
2024-06-18 9:25 ` Karthik Nayak
2024-06-18 15:58 ` Junio C Hamano
2024-06-18 16:56 ` Junio C Hamano
2024-06-18 20:20 ` Karthik Nayak
2024-06-14 6:50 ` [PATCH v3 10/20] refs: avoid include cycle with "repository.h" Patrick Steinhardt
2024-06-14 6:50 ` [PATCH v3 11/20] hash-ll: merge with "hash.h" Patrick Steinhardt
2024-06-14 6:50 ` [PATCH v3 12/20] http-fetch: don't crash when parsing packfile without a repo Patrick Steinhardt
2024-06-14 6:50 ` [PATCH v3 13/20] oidset: pass hash algorithm when parsing file Patrick Steinhardt
2024-06-14 6:50 ` [PATCH v3 14/20] protocol-caps: use hash algorithm from passed-in repository Patrick Steinhardt
2024-06-14 6:50 ` [PATCH v3 15/20] replace-object: " Patrick Steinhardt
2024-06-14 6:50 ` [PATCH v3 16/20] compat/fsmonitor: fix socket path in networked SHA256 repos Patrick Steinhardt
2024-06-14 6:51 ` [PATCH v3 17/20] t/helper: use correct object hash in partial-clone helper Patrick Steinhardt
2024-06-14 6:51 ` [PATCH v3 18/20] t/helper: fix segfault in "oid-array" command without repository Patrick Steinhardt
2024-06-14 6:51 ` [PATCH v3 19/20] t/helper: remove dependency on `the_repository` in "proc-receive" Patrick Steinhardt
2024-06-14 6:51 ` [PATCH v3 20/20] hex: guard declarations with `USE_THE_REPOSITORY_VARIABLE` Patrick Steinhardt
2024-06-18 9:56 ` [PATCH v3 00/20] Introduce `USE_THE_REPOSITORY_VARIABLE` macro Phillip Wood
2024-06-18 20:22 ` Karthik Nayak
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=fa263d6b0777ba8d157f7364507590c19069ed56.1718259125.git.ps@pks.im \
--to=ps@pks.im \
--cc=git@vger.kernel.org \
--cc=sandals@crustytoothpaste.net \
--cc=shyamthakkar001@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).