git.vger.kernel.org archive mirror
From: Jeff King <peff@peff.net>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org, Martin Koegler <martin.koegler@chello.at>
Subject: Re: [PATCH] zlib.c: use size_t for size
Date: Fri, 12 Oct 2018 22:38:45 -0400
Message-ID: <20181013023845.GA15595@sigill.intra.peff.net>
In-Reply-To: <xmqqsh1bbq36.fsf@gitster-ct.c.googlers.com>

On Fri, Oct 12, 2018 at 04:07:25PM +0900, Junio C Hamano wrote:

> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index e6316d294d..b9ca04eb8a 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -266,15 +266,15 @@ static void copy_pack_data(struct hashfile *f,
>  		struct packed_git *p,
>  		struct pack_window **w_curs,
>  		off_t offset,
> -		off_t len)
> +		size_t len)
>  {
>  	unsigned char *in;
> -	unsigned long avail;
> +	size_t avail;

I know there were a lot of comments about "maybe this off_t switch is
not good". Let me say something a bit stronger: I think this part of the
change is strictly worse.

copy_pack_data() looks like this right now:

  static void copy_pack_data(struct hashfile *f,
                  struct packed_git *p,
                  struct pack_window **w_curs,
                  off_t offset,
                  off_t len)
  {
          unsigned char *in;
          unsigned long avail;
  
          while (len) {
                  in = use_pack(p, w_curs, offset, &avail);
                  if (avail > len)
                          avail = (unsigned long)len;
                  hashwrite(f, in, avail);
                  offset += avail;
                  len -= avail;
          }
  }

So right now let's imagine that off_t is 64-bit and "unsigned long" is
32-bit (e.g., a 32-bit system, or an IL32P64 model like 64-bit Windows).
We'll repeatedly ask use_pack() for a window, and it will tell us in
"avail" how many bytes we have. Even as a 32-bit value, that just means
we'll process chunks smaller than 4GB, and this is correct (or at least
this part of it is -- hold on). And we will still eventually process the
whole "len" given by the off_t.
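
As a minimal sketch of that argument (not the real code): the function
below is a hypothetical stand-in for the loop in copy_pack_data(), with
fixed-width types modeling a 64-bit off_t and a 32-bit "unsigned long".
The "window" parameter is an assumption that plays the role of the byte
count use_pack() reports, and nothing is actually written anywhere.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of the copy_pack_data() loop. "len" is 64-bit
 * (like off_t), "avail" is 32-bit (like "unsigned long" on the systems
 * discussed above). Returns the total number of bytes "copied" and
 * stores the number of loop iterations in *chunks.
 */
static int64_t copy_in_chunks(int64_t len, uint32_t window, unsigned *chunks)
{
	int64_t copied = 0;

	assert(len >= 0 && window > 0);
	*chunks = 0;
	while (len) {
		uint32_t avail = window;	/* what use_pack() would report */

		if (avail > len)
			avail = (uint32_t)len;	/* safe: len is below avail here */
		copied += avail;		/* stands in for hashwrite() */
		len -= avail;
		(*chunks)++;
	}
	return copied;
}
```

With a 5GB "len" and a 1GB window, this model runs five iterations and
accounts for every byte, even though "avail" never holds more than 32
bits.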

But by switching away from off_t in the function interface, we risk
truncation before we even enter the loop. Because of the switch to
size_t, it actually works on an IL32P64 system (because size_t is 64-bit
there), but it has introduced a bug on a true 32-bit system. If your
off_t really is 64-bit (and it generally is because we #define
_FILE_OFFSET_BITS), the length passed in will be truncated modulo 2^32.
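
To make the truncation concrete, here is a small sketch that models the
call boundary with fixed-width types, so it behaves the same on any
platform. Both function names are hypothetical; uint32_t stands in for
size_t on a true 32-bit system and int64_t for a 64-bit off_t.

```c
#include <assert.h>
#include <stdint.h>

/* Models a callee whose "len" parameter is a 32-bit size_t. */
static uint32_t len_seen_by_callee(uint32_t len)
{
	return len;
}

/* Models a caller that holds the length as a 64-bit off_t. */
static uint32_t pass_length(int64_t len)
{
	assert(len >= 0);
	/*
	 * The cast is spelled out here; in the real call the same
	 * conversion would happen implicitly and silently.
	 */
	return len_seen_by_callee((uint32_t)len);
}
```

In this model a 5GB length arrives as 1GB, and a length of exactly 4GB
arrives as 0, in which case a "while (len)" loop would never run at all.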

Nor will most compilers warn about it without -Wconversion. You can try
it with this on Linux:

  #define _FILE_OFFSET_BITS 64
  #include <unistd.h>
  
  void foo(size_t x);
  void bar(off_t x);
  
  void bar(off_t x)
  {
  
  	foo(x);
  }

That compiles fine with "gcc -c -m32 -Wall -Werror -Wextra" for me.
Adding "-Wconversion" catches it, but our code base is not close to
compiling with that warning enabled.
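
Since the compiler stays quiet, the only way to notice such a narrowing
is an explicit runtime check. The following is a rough sketch of what
that could look like, not something the patch under discussion does; the
helper name and its return convention are assumptions, and fixed-width
types again stand in for a 64-bit off_t and a 32-bit size_t.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: narrow a 64-bit length to a 32-bit size,
 * refusing any value that would not survive the conversion.
 * Returns 0 on success and -1 if "len" does not fit.
 */
static int narrow_checked(int64_t len, uint32_t *out)
{
	assert(out);
	if (len < 0 || len > (int64_t)UINT32_MAX)
		return -1;
	*out = (uint32_t)len;
	return 0;
}
```

Keeping off_t in the interface avoids the need for any such check,
because no narrowing takes place.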

So I don't think this hunk fixes any problem, and it actually
introduces one.

I do in general support moving to size_t over "unsigned long". Switching
avail to size_t makes sense here. It's just the off_t part that is
funny.

-Peff
