git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jonathan Nieder <jrnieder@gmail.com>
To: "Kyle J. McKay" <mackyle@gmail.com>
Cc: Steffen Prohaska <prohaska@zib.de>,
	Junio C Hamano <gitster@pobox.com>,
	git@vger.kernel.org
Subject: Re: [PATCH] xread(): Fix read error when filtering >= 2GB on Mac OS X
Date: Sat, 17 Aug 2013 14:23:14 -0700	[thread overview]
Message-ID: <20130817212314.GC2904@elie.Belkin> (raw)
In-Reply-To: <0EC822B9-E5BB-4FF5-B054-167866EA2075@gmail.com>

Kyle J. McKay wrote:

> According to POSIX [1] for read:
>
> If the value of nbyte is greater than {SSIZE_MAX}, the result is
> implementation-defined.

Sure.

[...]
> Since OS X still supports running 32-bit executables, and SSIZE_MAX is 2GB -
> 1 when running 32-bit it would seem the same limit has been imposed on
> 64-bit executables.  In any case, we should avoid "implementation-defined"
> behavior

Wait --- that's a big leap.

In a 64-bit executable, SSIZE_MAX is 2^63 - 1, so the behavior is not
implementation-defined.  I'm not sure if Steffen's copy of git is
32-bit or 64-bit --- my guess would be 64-bit.  So at first glance
this does look like an XNU-specific bug, not a standards thing.

What about the related case where someone does try to "git add"
a file with a clean filter producing more than SSIZE_MAX and less
than SIZE_MAX bytes?

strbuf_grow() does not directly protect against a strbuf growing to >
SSIZE_MAX bytes, but in practice on most machines realloc() does.  So
in practice we could never read more than SSIZE_MAX characters in the
strbuf_read() codepath, but it might be worth a check for paranoia
anyway.

While we're here, it's easy to wonder: why is git reading into such a
large buffer anyway?  Normally git uses the streaming codepath for
files larger than big_file_threshold (typically 512 MiB).
Unfortunately there are cases where it doesn't.  For example:

  - convert_to_git() has not been taught to stream, so paths
    with a clean filter or requiring crlf conversion are read or
    mapped into memory.

  - deflate_to_pack() relies on seeking backward to retry when
    a pack would grow too large, so "git hash-object --stdin"
    cannot use that codepath.

  - a "clean" filter can make a file bigger.

  Perhaps git needs to learn to write to a temporary file
  when asked to keep track of a blob that is larger than fits
  reasonably in memory.  Or maybe not.

So there is room for related work but the codepaths that read()
indefinitely large files do seem to be needed, at least in the short
term.  Working around this Mac OS X-specific limitation at the read()
level like you've done still sounds like the right thing to do.

Thanks, both, for your work tracking this down.  Hopefully the next
version of the patch will be in good shape and then it can be applied
quickly.

Thanks and hope that helps,
Jonathan

  reply	other threads:[~2013-08-17 21:23 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-08-17 12:40 [PATCH] xread(): Fix read error when filtering >= 2GB on Mac OS X Steffen Prohaska
2013-08-17 15:27 ` John Keeping
2013-08-17 15:56 ` Torsten Bögershausen
2013-08-17 17:16 ` Johannes Sixt
2013-08-17 18:57 ` Jonathan Nieder
2013-08-17 20:25 ` Kyle J. McKay
2013-08-17 21:23   ` Jonathan Nieder [this message]
2013-08-19  6:38 ` [PATCH v2] compat: Fix read() of 2GB and more " Steffen Prohaska
2013-08-19  7:54   ` John Keeping
2013-08-19  8:20     ` Steffen Prohaska
2013-08-19  8:20   ` Johannes Sixt
2013-08-19  8:25     ` Stefan Beller
2013-08-19  8:40       ` Johannes Sixt
2013-08-19  8:28     ` Steffen Prohaska
2013-08-19  8:21   ` [PATCH v3] " Steffen Prohaska
2013-08-19 13:59     ` Eric Sunshine
2013-08-19 16:33       ` Junio C Hamano
2013-08-19 15:41     ` [PATCH v4] " Steffen Prohaska
2013-08-19 16:04       ` Linus Torvalds
2013-08-19 16:37         ` Steffen Prohaska
2013-08-19 17:24           ` Junio C Hamano
2013-08-19 17:16         ` Junio C Hamano
2013-08-19 17:28           ` Linus Torvalds
2013-08-19 21:56           ` Kyle J. McKay
2013-08-19 22:51             ` Linus Torvalds
2013-08-27  4:59               ` Junio C Hamano
2013-08-20  6:43       ` [PATCH v5 0/2] Fix IO of >=2GB on Mac OS X by limiting IO chunks Steffen Prohaska
2013-08-20  6:43         ` [PATCH v5 1/2] xread, xwrite: Limit size of IO, fixing IO of 2GB and more on Mac OS X Steffen Prohaska
2013-08-20 19:37           ` Junio C Hamano
2013-08-21 19:50           ` Torsten Bögershausen
2013-08-20  6:43         ` [PATCH v5 2/2] Revert "compate/clipped-write.c: large write(2) fails on Mac OS X/XNU" Steffen Prohaska
2013-08-21 13:46         ` [PATCH v6 0/2] Fix IO >= 2GB on Mac, fixed typo Steffen Prohaska
2013-08-21 13:46           ` [PATCH v5 1/2] xread, xwrite: Limit size of IO, fixing IO of 2GB and more on Mac OS X Steffen Prohaska
2013-08-21 13:46           ` [PATCH v5 2/2] Revert "compate/clipped-write.c: large write(2) fails on Mac OS X/XNU" Steffen Prohaska
2013-08-21 15:58           ` [PATCH v6 0/2] Fix IO >= 2GB on Mac, fixed typo Junio C Hamano
2013-08-19  8:27   ` [PATCH v2] compat: Fix read() of 2GB and more on Mac OS X Johannes Sixt
2013-08-19 14:41   ` Torsten Bögershausen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130817212314.GC2904@elie.Belkin \
    --to=jrnieder@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=mackyle@gmail.com \
    --cc=prohaska@zib.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).