From: Junio C Hamano <gitster@pobox.com>
To: Martin Koegler <martin.koegler@chello.at>,
Ramsay Jones <ramsay@ramsayjones.plus.com>
Cc: git@vger.kernel.org, Johannes.Schindelin@gmx.de
Subject: Re: [PATCH 1/9] Convert pack-objects to size_t
Date: Mon, 14 Aug 2017 10:08:05 -0700
Message-ID: <xmqqfucuw00a.fsf@gitster.mtv.corp.google.com>
In-Reply-To: <xmqqtw1bw1v6.fsf@gitster.mtv.corp.google.com> (Junio C. Hamano's message of "Sun, 13 Aug 2017 15:15:41 -0700")
Junio C Hamano <gitster@pobox.com> writes:
> One interesting question is which of these two types we should use
> for the size of objects Git uses.
>
> Most of the "interesting" operations done by Git require that the
> thing is in core as a whole before we can do anything (e.g. compare
> two such things to produce delta, have one in core and apply patch),
> so it is tempting that we deal with size_t, but at the lowest level
> to serve as a SCM, i.e. recording the state of a file at each
> version, we actually should be able to exceed the in-core
> limit---both "git add" of a huge file whose contents would not fit
> in-core and "git checkout" of a huge blob whose inflated contents
> would not fit in-core should (in theory, modulo bugs) be able to
> exercise the streaming interface to handle such case without holding
> everything in-core at once. So from that point of view, even size_t
> may not be the "correct" type to use.
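To illustrate the streaming point quoted above, here is a minimal sketch of processing contents chunk by chunk without holding everything in core. It is not Git's streaming interface; `stream_summarize`, the 4096-byte buffer, and the toy checksum are assumptions made purely for illustration.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper, not a Git API: walk a stream in fixed-size
 * chunks, keeping a running byte count and a toy checksum. Only
 * `buf` (4096 bytes) is held in memory at any time, and the total
 * is tracked in uint64_t so that it can exceed what size_t holds
 * on a 32-bit platform. Returns 0 on success, -1 on a read error.
 */
static int stream_summarize(FILE *in, uint64_t *total, uint32_t *sum)
{
	unsigned char buf[4096];
	size_t n, i;

	*total = 0;
	*sum = 0;
	while ((n = fread(buf, 1, sizeof(buf), in)) > 0) {
		for (i = 0; i < n; i++)
			*sum = *sum * 31 + buf[i];
		*total += n;
	}
	return ferror(in) ? -1 : 0;
}
```

The only size that has to fit in size_t here is the chunk length; the object's total size lives in a wider type, which is the distinction the paragraph above draws.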
A few additions to the above observations.
- We have a varint that encodes the distance from the delta
representation of an object to its base object in the packfile.
Both the encoding and decoding sides in the current code use off_t
to represent this offset, so we can already reference a base object
that is far away in the same packfile.
- I think it is OK in practice to limit the size of individual
objects to size_t (i.e. on 32-bit arch, you cannot interact with
a repository with an object whose size exceeds 4GB). Using off_t
would allow occasional ultra-huge objects that can only be added
and checked out via the streaming API on such a platform, but I
suspect that it may become too much of a hassle to maintain.
It may help reduce the maintenance burden if we introduced
obj_size_t that is defined to be size_t for now, so that we can
later swap it to ofs_t or some larger type when we know we do need
to support objects whose size cannot be expressed in size_t, but I
do not offhand know what the pros and cons of such an approach
would look like.
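
The two points above can be sketched in C as follows. This is a rough illustration, not code from git.git: `obj_size_t` is the alias proposed above, `obj_size_from_u64`, `encode_base_offset` and `decode_base_offset` are hypothetical names, and `uint64_t` stands in for off_t so the example does not depend on the platform's off_t width. The varint follows the base-offset encoding as I understand it from the pack format documentation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical alias from the suggestion above: size_t for now,
 * swappable for a wider type later. Not an existing Git type. */
typedef size_t obj_size_t;

/*
 * Hypothetical helper: narrow a 64-bit size to obj_size_t. Returns
 * -1 when the value does not fit (e.g. an object over 4GB on a
 * platform with 32-bit size_t), 0 otherwise.
 */
static int obj_size_from_u64(uint64_t v, obj_size_t *out)
{
	if (v > (uint64_t)SIZE_MAX)
		return -1;
	*out = (obj_size_t)v;
	return 0;
}

/*
 * Encode the distance from a delta to its base as a varint: 7 bits
 * per byte, most significant group first, high bit set on every
 * byte except the last. Each continuation step subtracts one so
 * that no value has two encodings. Returns the number of bytes
 * written; buf must have room for 10 bytes.
 */
static size_t encode_base_offset(uint64_t ofs, unsigned char *buf)
{
	unsigned char tmp[10];
	size_t pos = sizeof(tmp) - 1;
	size_t n, i;

	tmp[pos] = ofs & 127;
	while (ofs >>= 7)
		tmp[--pos] = 128 | (--ofs & 127);
	n = sizeof(tmp) - pos;
	for (i = 0; i < n; i++)
		buf[i] = tmp[pos + i];
	return n;
}

/*
 * Inverse of encode_base_offset(). Stores the number of bytes
 * consumed in *used. This sketch trusts its input and does not
 * check for overflow or truncated data.
 */
static uint64_t decode_base_offset(const unsigned char *buf, size_t *used)
{
	size_t i = 0;
	unsigned char c = buf[i++];
	uint64_t ofs = c & 127;

	while (c & 128) {
		ofs += 1;
		c = buf[i++];
		ofs = (ofs << 7) + (c & 127);
	}
	*used = i;
	return ofs;
}
```

With a 64-bit offset type, a distance such as 5000000000 (beyond 4GB) round-trips through the varint even where size_t is only 32 bits wide, while `obj_size_from_u64` would reject an object size of that magnitude on such a platform.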
Thanks.
Thread overview: 27+ messages
2017-08-12 8:47 [PATCH 1/9] Convert pack-objects to size_t Martin Koegler
2017-08-12 8:47 ` [PATCH 2/9] Convert index-pack " Martin Koegler
2017-08-12 13:51 ` Ramsay Jones
2017-08-12 8:47 ` [PATCH 3/9] Convert unpack-objects " Martin Koegler
2017-08-12 14:07 ` Martin Ågren
2017-08-13 18:25 ` Martin Koegler
2017-08-12 8:47 ` [PATCH 4/9] Convert archive functions " Martin Koegler
2017-08-12 8:47 ` [PATCH 5/9] Convert various things " Martin Koegler
2017-08-12 13:27 ` Martin Ågren
2017-08-13 17:48 ` Martin Koegler
2017-08-12 8:47 ` [PATCH 6/9] Use size_t for config parsing Martin Koegler
2017-08-12 8:47 ` [PATCH 7/9] Convert ref-filter to size_t Martin Koegler
2017-08-12 8:47 ` [PATCH 8/9] Convert tree-walk " Martin Koegler
2017-08-12 8:47 ` [PATCH 9/9] Convert xdiff-interface " Martin Koegler
2017-08-12 9:59 ` [PATCH 1/9] Convert pack-objects " Torsten Bögershausen
2017-08-13 18:27 ` Martin Koegler
2017-08-12 13:47 ` Ramsay Jones
2017-08-13 18:30 ` Martin Koegler
2017-08-13 19:45 ` Ramsay Jones
2017-08-13 22:15 ` Junio C Hamano
2017-08-14 17:08 ` Junio C Hamano [this message]
2017-08-14 19:31 ` Ramsay Jones
2017-08-14 19:58 ` Junio C Hamano
2017-08-14 20:32 ` Torsten Bögershausen
2017-08-15 0:30 ` Ramsay Jones
2017-08-16 20:22 ` Martin Koegler
2017-08-17 10:38 ` Torsten Bögershausen