From: Eric Wong <e@80x24.org>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org, Jeff King <peff@peff.net>,
Patrick Steinhardt <ps@pks.im>
Subject: Re: [PATCH v2 02/10] packfile: allow content-limit for cat-file
Date: Tue, 27 Aug 2024 20:23:59 +0000 [thread overview]
Message-ID: <20240827202359.M464972@dcvr> (raw)
In-Reply-To: <xmqqcylvky69.fsf@gitster.g>
Junio C Hamano <gitster@pobox.com> wrote:
> Eric Wong <e@80x24.org> writes:
> > From: Jeff King <peff@peff.net>
> >
> > Avoid unnecessary round trips to the object store to speed
> > up cat-file contents retrievals. The majority of packed objects
> > don't benefit from the streaming interface at all and we end up
> > having to load them in core anyways to satisfy our streaming
> > API.
>
> What I found missing from the description is something like ...
>
> The new trick used is to teach oid_object_info_extended() that a
> non-NULL oi->contentp that means "grab the contents of the objects
> here" can be told to refrain from grabbing an object that is too
> large.
OK.
> > diff --git a/object-file.c b/object-file.c
> > index 065103be3e..1cc29c3c58 100644
> > --- a/object-file.c
> > +++ b/object-file.c
> > @@ -1492,6 +1492,12 @@ static int loose_object_info(struct repository *r,
> >
> > if (!oi->contentp)
> > break;
> > + if (oi->content_limit && *oi->sizep > oi->content_limit) {
>
> I cannot convince myself enough to say "content limit" is a great
> name. It invites "limited by what? text files are allowed but
> images are not?".
Hmm... naming is a most difficult problem :<
->slurp_max? It could be ->content_slurp_max, but I think
that's too long...
Would welcome other suggestions...
> > diff --git a/object-store-ll.h b/object-store-ll.h
> > index c5f2bb2fc2..b71a15f590 100644
> > --- a/object-store-ll.h
> > +++ b/object-store-ll.h
> > @@ -289,6 +289,7 @@ struct object_info {
> > struct object_id *delta_base_oid;
> > struct strbuf *type_name;
> > void **contentp;
> > + size_t content_limit;
> >
> > /* Response */
> > enum {
> > diff --git a/packfile.c b/packfile.c
> > index 4028763947..c12a0515b3 100644
> > --- a/packfile.c
> > +++ b/packfile.c
> > @@ -1529,7 +1529,7 @@ int packed_object_info(struct repository *r, struct packed_git *p,
> > * We always get the representation type, but only convert it to
> > * a "real" type later if the caller is interested.
> > */
> > - if (oi->contentp) {
> > + if (oi->contentp && !oi->content_limit) {
> > *oi->contentp = cache_or_unpack_entry(r, p, obj_offset, oi->sizep,
> > &type);
> > if (!*oi->contentp)
> > @@ -1555,6 +1555,17 @@ int packed_object_info(struct repository *r, struct packed_git *p,
> > *oi->sizep = size;
> > }
> > }
> > +
> > + if (oi->contentp) {
> > + if (oi->sizep && *oi->sizep < oi->content_limit) {
>
> It happens that with the current code structure, at this point,
> oi->content_limit is _always_ non-zero. But it felt somewhat
> fragile to rely on it, and I would have appreciated if this was
> written with an explicit check for oi->content_limit, just like how
> it is done in loose_object_info() function.
Right. I actually think something like:
assert(oi->content_limit); /* see `if' above */
if (oi->sizep && *oi->sizep < oi->content_limit) {
is good for documentation purposes since this is in the `else'
branch of the `if (oi->contentp && !oi->content_limit) {' condition.
next prev parent reply other threads:[~2024-08-27 20:24 UTC|newest]
Thread overview: 51+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-07-15 0:35 [PATCH v1 00/10] cat-file speedups Eric Wong
2024-07-15 0:35 ` [PATCH v1 01/10] packfile: move sizep computation Eric Wong
2024-07-24 8:35 ` Patrick Steinhardt
2024-07-15 0:35 ` [PATCH v1 02/10] packfile: allow content-limit for cat-file Eric Wong
2024-07-24 8:35 ` Patrick Steinhardt
2024-07-26 7:30 ` Eric Wong
2024-07-15 0:35 ` [PATCH v1 03/10] packfile: fix off-by-one in content_limit comparison Eric Wong
2024-07-24 8:35 ` Patrick Steinhardt
2024-07-26 7:43 ` Eric Wong
2024-07-15 0:35 ` [PATCH v1 04/10] packfile: inline cache_or_unpack_entry Eric Wong
2024-07-15 0:35 ` [PATCH v1 05/10] cat-file: use delta_base_cache entries directly Eric Wong
2024-07-24 8:35 ` Patrick Steinhardt
2024-07-26 7:42 ` Eric Wong
2024-08-18 17:36 ` assert vs BUG [was: [PATCH v1 05/10] cat-file: use delta_base_cache entries directly] Eric Wong
2024-08-19 15:50 ` Junio C Hamano
2024-07-15 0:35 ` [PATCH v1 06/10] packfile: packed_object_info avoids packed_to_object_type Eric Wong
2024-07-24 8:36 ` Patrick Steinhardt
2024-07-26 8:01 ` Eric Wong
2024-07-15 0:35 ` [PATCH v1 07/10] object_info: content_limit only applies to blobs Eric Wong
2024-07-15 0:35 ` [PATCH v1 08/10] cat-file: batch-command uses content_limit Eric Wong
2024-07-15 0:35 ` [PATCH v1 09/10] cat-file: batch_write: use size_t for length Eric Wong
2024-07-15 0:35 ` [PATCH v1 10/10] cat-file: use writev(2) if available Eric Wong
2024-07-24 8:35 ` [PATCH v1 00/10] cat-file speedups Patrick Steinhardt
2024-08-23 22:46 ` [PATCH v2 " Eric Wong
2024-08-23 22:46 ` [PATCH v2 01/10] packfile: move sizep computation Eric Wong
2024-09-17 10:06 ` Taylor Blau
2024-08-23 22:46 ` [PATCH v2 02/10] packfile: allow content-limit for cat-file Eric Wong
2024-08-26 17:10 ` Junio C Hamano
2024-08-27 20:23 ` Eric Wong [this message]
2024-09-17 10:10 ` Taylor Blau
2024-09-17 21:15 ` Junio C Hamano
2024-08-23 22:46 ` [PATCH v2 03/10] packfile: fix off-by-one in content_limit comparison Eric Wong
2024-08-26 16:55 ` Junio C Hamano
2024-09-17 10:11 ` Taylor Blau
2024-08-23 22:46 ` [PATCH v2 04/10] packfile: inline cache_or_unpack_entry Eric Wong
2024-08-26 17:09 ` Junio C Hamano
2024-10-06 17:40 ` Eric Wong
2024-08-23 22:46 ` [PATCH v2 05/10] cat-file: use delta_base_cache entries directly Eric Wong
2024-08-26 21:31 ` Junio C Hamano
2024-08-26 23:05 ` Junio C Hamano
2024-08-23 22:46 ` [PATCH v2 06/10] packfile: packed_object_info avoids packed_to_object_type Eric Wong
2024-08-26 21:50 ` Junio C Hamano
2024-08-23 22:46 ` [PATCH v2 07/10] object_info: content_limit only applies to blobs Eric Wong
2024-08-26 22:02 ` Junio C Hamano
2024-08-23 22:46 ` [PATCH v2 08/10] cat-file: batch-command uses content_limit Eric Wong
2024-08-26 22:13 ` Junio C Hamano
2024-08-23 22:46 ` [PATCH v2 09/10] cat-file: batch_write: use size_t for length Eric Wong
2024-08-27 5:06 ` Junio C Hamano
2024-08-23 22:46 ` [PATCH v2 10/10] cat-file: use writev(2) if available Eric Wong
2024-08-27 5:41 ` Junio C Hamano
2024-08-27 15:43 ` Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240827202359.M464972@dcvr \
--to=e@80x24.org \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=peff@peff.net \
--cc=ps@pks.im \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.