From: Junio C Hamano <gitster@pobox.com>
To: kusmabite@gmail.com
Cc: git@vger.kernel.org
Subject: Re: [PATCH 6/6] zlib: zlib can only process 4GB at a time
Date: Mon, 13 Jun 2011 04:56:56 -0700
Message-ID: <7vsjreq707.fsf@alter.siamese.dyndns.org>
In-Reply-To: <BANLkTi=sT_LxRaJSM3Cj-QkSwqGan29K7A@mail.gmail.com> (Erik Faye-Lund's message of "Mon, 13 Jun 2011 13:17:00 +0200")
Erik Faye-Lund <kusmabite@gmail.com> writes:
> On Sun, Jun 12, 2011 at 11:33 PM, Junio C Hamano <gitster@pobox.com> wrote:
>> Erik Faye-Lund <kusmabite@gmail.com> writes:
>>
>>> On Fri, Jun 10, 2011 at 10:15 PM, Junio C Hamano <gitster@pobox.com> wrote:
>>>> The size of objects we read from the repository and data we try to put
>>>> into the repository are represented in "unsigned long", so that on larger
>>>> architectures we can handle objects that weigh more than 4GB.
>>>
>>> shouldn't this be "size_t" instead of "unsigned long"?
>>
>> No, this must be unsigned long as that is the internal type we use.

There are two unrelated issues you have to address if your "unsigned long"
is 32-bit and you want to handle more than 4GB of data in git.

When git holds repository data in core, it has always represented it as a
pair of <pointer to the beginning of the memory block that holds the data,
length>, where the length has been "unsigned long" from day one. See
read_sha1_file() in read-cache.c that appears in e83c516 (Initial revision
of "git", the information manager from hell, 2005-04-07). This limits you
to 4GB if your "unsigned long" is 32-bit.

The right type to use in order to enable more platforms to go beyond 4GB
might be uintmax_t, but the series you are commenting on is not about
changing that.
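
To illustrate the first issue, here is a minimal sketch, not code from git.
The names "struct in_core_data" and "set_length()" are made up for this
example. It only shows that a size held in a wider type has to be checked
against ULONG_MAX before it can go into an "unsigned long" length:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the <pointer, length> pair described above. */
struct in_core_data {
	void *buf;
	unsigned long len;
};

/*
 * Record a size in the pair.  Returns -1 when the size does not fit in
 * "unsigned long", which can only happen where uintmax_t is wider.
 */
static int set_length(struct in_core_data *d, uintmax_t size)
{
	assert(d != NULL);
	if (size > ULONG_MAX)
		return -1;	/* would be silently truncated otherwise */
	d->len = (unsigned long)size;
	return 0;
}
```

Where "unsigned long" is 64-bit the check never fires; where it is 32-bit,
anything at or above 4GB is rejected instead of wrapping around.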

We have another problem, stemming from the way in which we incorrectly used
the zlib API, that exists even on a platform where "unsigned long" is
capable of expressing sizes beyond 4GB. In many places, we set up the state
object used by the zlib API (i.e. z_stream) so that its "next_in" field
points at the "pointer to the beginning of memory block" and its "avail_in"
field holds the "length", pass that object around in the callchain, and
expect that by making repeated calls to zlib, "next_in" would eventually
progress to the end of the data we have in core while "avail_in" would fall
to zero when all data is processed. The "avail_in" field the zlib API gives
us, however, is uInt, which is 32-bit, so this expectation is incorrect. If
you have 4G+32 bytes of data, for example, we only feed 32 bytes and stop,
barfing on "corrupt" data.
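
The arithmetic behind the 4G+32 example can be sketched as below. This is
an illustration only and does not touch zlib; "model_uint" and
"store_avail_in()" are names invented here, and uint32_t is assumed to model
zlib's uInt on platforms where "unsigned int" is 32 bits wide:

```c
#include <assert.h>
#include <stdint.h>

/* Models zlib's uInt where "unsigned int" is 32 bits wide (an assumption). */
typedef uint32_t model_uint;

/* Storing a 64-bit length into a 32-bit field keeps only the low 32 bits. */
static model_uint store_avail_in(uint64_t length)
{
	return (model_uint)length;
}

/* 4G+32 bytes in core look like 32 bytes through the narrow field. */
static void check_truncation(void)
{
	uint64_t length = (UINT64_C(1) << 32) + 32;

	assert(store_avail_in(length) == 32);
}
```

The conversion is well defined for unsigned types (it reduces the value
modulo 2^32), which is why nothing warns you at runtime.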

That is the issue this series is about. The approach the series takes
is to wrap zlib's state object with our own, which has its own "avail_in"
field (by the way, the same issue exists in "next_out/avail_out" on the
output side) that uses the same type as the "length" used in other parts of
our system.
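
A rough model of the wrapping idea, under stated assumptions: this is not
the code in the series, it does not call zlib, and every name in it
("narrow_stream", "wide_stream", "wide_step()", "wide_drain()", CHUNK_MAX)
is hypothetical. The 1GB cap per call is an arbitrary choice that fits in
32 bits, and uint64_t stands in for a 64-bit "unsigned long":

```c
#include <assert.h>
#include <stdint.h>

#define CHUNK_MAX UINT32_C(0x40000000)	/* 1GB per call, arbitrary cap */

/* Stand-in for the part of z_stream that matters here. */
struct narrow_stream {
	uint32_t avail_in;	/* models the 32-bit uInt field */
};

/* Hypothetical wrapper that carries the full-width length. */
struct wide_stream {
	struct narrow_stream z;
	uint64_t avail_in;	/* what the rest of the system uses */
	uint64_t total_in;
};

/* Stand-in for inflate()/deflate(): pretends to consume all it is offered. */
static void narrow_consume(struct narrow_stream *z)
{
	z->avail_in = 0;
}

/* One wrapped call: offer a capped chunk, then account for what was used. */
static void wide_step(struct wide_stream *s)
{
	uint32_t offered, used;

	offered = s->avail_in > CHUNK_MAX ? CHUNK_MAX : (uint32_t)s->avail_in;
	s->z.avail_in = offered;
	narrow_consume(&s->z);
	assert(s->z.avail_in <= offered);
	used = offered - s->z.avail_in;
	s->avail_in -= used;
	s->total_in += used;
}

/* Repeat until the full-width counter reaches zero; returns the call count. */
static unsigned wide_drain(struct wide_stream *s)
{
	unsigned calls = 0;

	while (s->avail_in) {
		wide_step(s);
		calls++;
	}
	return calls;
}
```

With 4G+32 bytes pending, this model makes four 1GB calls and a final
32-byte call, and the caller's "avail_in" only reaches zero once everything
has been offered. The output side would need the same treatment.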

The type of the "avail_in" and "avail_out" fields in the wrapper needs to
be updated to match that type when you address the "other" issue by
updating all the internal "length" variables from "unsigned long" to
"uintmax_t", but not before. And updating the rest of the system to
"uintmax_t" is outside the scope of this series.

Thread overview: 16+ messages
2011-06-10 20:15 [PATCH 0/6] zlib only processes 4GB at a time Junio C Hamano
2011-06-10 20:15 ` [PATCH 1/6] zlib: refactor error message formatter Junio C Hamano
2011-06-10 20:15 ` [PATCH 2/6] zlib: wrap remaining calls to direct inflate/inflateEnd Junio C Hamano
2011-06-10 20:15 ` [PATCH 3/6] zlib: wrap inflateInit2 used to accept only for gzip format Junio C Hamano
2011-06-10 20:15 ` [PATCH 4/6] zlib: wrap deflate side of the API Junio C Hamano
2011-06-10 22:23 ` Thiago Farina
2011-06-10 23:00 ` Junio C Hamano
2011-06-10 20:15 ` [PATCH 5/6] zlib: wrap deflateBound() too Junio C Hamano
2011-06-10 20:15 ` [PATCH 6/6] zlib: zlib can only process 4GB at a time Junio C Hamano
2011-06-12 20:43 ` Erik Faye-Lund
2011-06-12 21:33 ` Junio C Hamano
2011-06-12 21:46 ` Matthieu Moy
2011-06-13 11:17 ` Erik Faye-Lund
2011-06-13 11:52 ` Jonathan Nieder
2011-06-13 11:56 ` Junio C Hamano [this message]
2011-06-10 23:47 ` [PATCH 7/6] zlib: allow feeding more than 4GB in one go Junio C Hamano