All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Andrzej Hunt via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: Chris Torek <chris.torek@gmail.com>, Jeff King <peff@peff.net>,
	Andrzej Hunt <andrzej@ahunt.org>
Subject: [PATCH v2 0/3] Fix uninitialised reads found with MSAN
Date: Mon, 14 Jun 2021 15:51:13 +0000	[thread overview]
Message-ID: <pull.1033.v2.git.git.1623685877.gitgitgadget@gmail.com> (raw)
In-Reply-To: <pull.1033.git.git.1623343712.gitgitgadget@gmail.com>

V2 replaces an #if'd memset with some brace initialisation (patch 3/3) as
per review comments.

I've also removed an irrelevant "technically" from commit message 2/3, and
fixed a typo in commit message 3/3.

Andrzej Hunt (3):
  bulk-checkin: make buffer reuse more obvious and safer
  split-index: use oideq instead of memcmp to compare object_id's
  builtin/checkout--worker: zero-initialise struct to avoid MSAN
    complaints

 builtin/checkout--worker.c | 2 +-
 bulk-checkin.c             | 3 +--
 split-index.c              | 3 ++-
 3 files changed, 4 insertions(+), 4 deletions(-)


base-commit: 62a8d224e6203d9d3d2d1d63a01cf5647ec312c9
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1033%2Fahunt%2Fmsan-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1033/ahunt/msan-v2
Pull-Request: https://github.com/git/git/pull/1033

Range-diff vs v1:

 1:  7659d4bf13c2 = 1:  7659d4bf13c2 bulk-checkin: make buffer reuse more obvious and safer
 2:  14b0d5dd7fce ! 2:  6943eb511bee split-index: use oideq instead of memcmp to compare object_id's
     @@ Commit message
          include that field when calling memcmp on a subset of the cache_entry.
          Depending on which hashing algorithm is being used, only part of
          object_id.hash is actually being used, therefore including it in a
     -    memcmp() is technically incorrect. Instead we choose to exclude the
     -    object_id when calling memcmp(), and call oideq() separately.
     +    memcmp() is incorrect. Instead we choose to exclude the object_id when
     +    calling memcmp(), and call oideq() separately.
      
          This issue was found when running t1700-split-index with MSAN, see MSAN
          output below (on my machine, offset 76 corresponds to 4 bytes after the
 3:  cd1e1f6985c7 ! 3:  4bdc0b77f6f2 builtin/checkout--worker: memset struct to avoid MSAN complaints
     @@ Metadata
      Author: Andrzej Hunt <ajrhunt@google.com>
      
       ## Commit message ##
     -    builtin/checkout--worker: memset struct to avoid MSAN complaints
     +    builtin/checkout--worker: zero-initialise struct to avoid MSAN complaints
      
          report_result() sends a struct to the parent process, but that struct
     -    contains unintialised padding bytes. Running this code under MSAN
     -    rightly triggers a warning - but we also don't care about this warning
     -    because we control the receiving code, and we therefore know that those
     -    padding bytes won't be read on the receiving end. Therefore we add a
     -    memset to convince MSAN that this memory is safe to read - but only
     -    when building with MSAN to avoid this cost in normal usage.
     +    would contain uninitialised padding bytes. Running this code under MSAN
     +    rightly triggers a warning - but we don't particularly care about this
     +    warning because we control the receiving code, and we therefore know
     +    that those padding bytes won't be read on the receiving end.
     +
     +    We could simply suppress this warning under MSAN with the approporiate
     +    ifdef'd attributes, but a less intrusive solution is to 0-initialise the
     +    struct, which guarantees that the padding will also be initialised.
      
          Interestingly, in the error-case branch, we only try to copy the first
          two members of pc_item_result, by copying only PC_ITEM_RESULT_BASE_SIZE
     @@ Commit message
          after the end of the second last member. We could avoid doing this by
          redefining PC_ITEM_RESULT_BASE_SIZE as
          'offsetof(second_last_member) + sizeof(second_last_member)', but there's
     -    no huge benefit to doing so (and our memset hack silences the MSAN
     -    warning in this scenario either way).
     +    no huge benefit to doing so (and this patch silences the MSAN warning in
     +    this scenario either way).
      
          MSAN output from t2080 (partially interleaved due to the
          parallel work :) ):
     @@ Commit message
          Signed-off-by: Andrzej Hunt <andrzej@ahunt.org>
      
       ## builtin/checkout--worker.c ##
     -@@ builtin/checkout--worker.c: static void report_result(struct parallel_checkout_item *pc_item)
     - 	struct pc_item_result res;
     +@@ builtin/checkout--worker.c: static void packet_to_pc_item(const char *buffer, int len,
     + 
     + static void report_result(struct parallel_checkout_item *pc_item)
     + {
     +-	struct pc_item_result res;
     ++	struct pc_item_result res = { 0 };
       	size_t size;
       
     -+#if defined(__has_feature)
     -+#  if __has_feature(memory_sanitizer)
     -+	// MSAN workaround: res contains padding bytes, which will remain
     -+	// permanently unintialised. Later, we read all of res in order to send
     -+	// it to the parent process - and MSAN (rightly) complains that we're
     -+	// reading those unintialised padding bytes. By memset'ing res we
     -+	// guarantee that there are no uninitialised bytes.
     -+	memset(&res, 0, sizeof(res));
     -+#endif
     -+#endif
     -+
       	res.id = pc_item->id;
     - 	res.status = pc_item->status;
     - 

-- 
gitgitgadget

  parent reply	other threads:[~2021-06-14 15:52 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-10 16:48 [PATCH 0/3] Fix uninitialised reads found with MSAN Andrzej Hunt via GitGitGadget
2021-06-10 16:48 ` [PATCH 1/3] bulk-checkin: make buffer reuse more obvious and safer Andrzej Hunt via GitGitGadget
2021-06-10 16:48 ` [PATCH 2/3] split-index: use oideq instead of memcmp to compare object_id's Andrzej Hunt via GitGitGadget
2021-06-10 16:48 ` [PATCH 3/3] builtin/checkout--worker: memset struct to avoid MSAN complaints Andrzej Hunt via GitGitGadget
2021-06-11  4:43   ` Chris Torek
2021-06-11  6:28     ` Junio C Hamano
2021-06-11 15:37       ` Andrzej Hunt
2021-06-14  1:04         ` Junio C Hamano
2021-06-11 17:11 ` [PATCH 0/3] Fix uninitialised reads found with MSAN Jeff King
2021-06-14 15:51 ` Andrzej Hunt via GitGitGadget [this message]
2021-06-14 15:51   ` [PATCH v2 1/3] bulk-checkin: make buffer reuse more obvious and safer Andrzej Hunt via GitGitGadget
2021-06-14 15:51   ` [PATCH v2 2/3] split-index: use oideq instead of memcmp to compare object_id's Andrzej Hunt via GitGitGadget
2021-06-14 15:51   ` [PATCH v2 3/3] builtin/checkout--worker: zero-initialise struct to avoid MSAN complaints Andrzej Hunt via GitGitGadget
2021-06-17  9:28 ` [PATCH 0/3] Fix uninitialised reads found with MSAN Philip Oakley
2021-06-20 15:19   ` Andrzej Hunt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=pull.1033.v2.git.git.1623685877.gitgitgadget@gmail.com \
    --to=gitgitgadget@gmail.com \
    --cc=andrzej@ahunt.org \
    --cc=chris.torek@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.