git.vger.kernel.org archive mirror
From: Eric Wong <e@80x24.org>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org, Jeff King <peff@peff.net>,
	Patrick Steinhardt <ps@pks.im>
Subject: Re: [PATCH v2 02/10] packfile: allow content-limit for cat-file
Date: Tue, 27 Aug 2024 20:23:59 +0000
Message-ID: <20240827202359.M464972@dcvr>
In-Reply-To: <xmqqcylvky69.fsf@gitster.g>

Junio C Hamano <gitster@pobox.com> wrote:
> Eric Wong <e@80x24.org> writes:
> > From: Jeff King <peff@peff.net>
> >
> > Avoid unnecessary round trips to the object store to speed
> > up cat-file contents retrievals.  The majority of packed objects
> > don't benefit from the streaming interface at all and we end up
> > having to load them in core anyways to satisfy our streaming
> > API.
> 
> What I found missing from the description is something like ...
> 
>     The new trick used is to teach oid_object_info_extended() that a
>     non-NULL oi->contentp that means "grab the contents of the objects
>     here" can be told to refrain from grabbing an object that is too
>     large.

OK.
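
To make the suggested description concrete, here is a minimal standalone sketch of the idea: a request struct whose non-NULL contentp asks for the contents, plus a limit that lets the callee decline objects that are too large while still reporting their size. This is not the git implementation. `struct sketch_object_info` and `sketch_read_object()` are hypothetical names, the "object" is just an in-memory buffer, and the comparison follows the loose_object_info() hunk quoted below (skip when size > limit).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the relevant fields of struct object_info. */
struct sketch_object_info {
	unsigned long *sizep;  /* request: report the object size here */
	void **contentp;       /* request: store a copy of the contents here */
	size_t content_limit;  /* request: 0 means "no limit" */
};

/*
 * Fill in the requested fields from an in-memory "object".
 * Contents are copied only if no limit is set or the object fits
 * within it; otherwise *contentp is left NULL so the caller can
 * fall back to another mechanism (e.g. streaming).
 * Returns 0 on success, -1 on allocation failure.
 */
static int sketch_read_object(struct sketch_object_info *oi,
			      const void *data, unsigned long size)
{
	assert(oi);
	if (oi->sizep)
		*oi->sizep = size;
	if (!oi->contentp)
		return 0; /* caller did not ask for contents */
	*oi->contentp = NULL;
	if (oi->content_limit && size > oi->content_limit)
		return 0; /* too large: size reported, contents skipped */
	*oi->contentp = malloc(size ? size : 1);
	if (!*oi->contentp)
		return -1;
	memcpy(*oi->contentp, data, size);
	return 0;
}
```

A caller would set content_limit to whatever it is willing to hold in core, then check whether *contentp came back NULL to decide if it needs the slower path.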

> > diff --git a/object-file.c b/object-file.c
> > index 065103be3e..1cc29c3c58 100644
> > --- a/object-file.c
> > +++ b/object-file.c
> > @@ -1492,6 +1492,12 @@ static int loose_object_info(struct repository *r,
> >  
> >  		if (!oi->contentp)
> >  			break;
> > +		if (oi->content_limit && *oi->sizep > oi->content_limit) {
> 
> I cannot convince myself enough to say "content limit" is a great
> name.  It invites "limited by what?  text files are allowed but
> images are not?".

Hmm... naming is a most difficult problem :<

->slurp_max?  It could be ->content_slurp_max, but I think
that's too long...

Would welcome other suggestions...

> > diff --git a/object-store-ll.h b/object-store-ll.h
> > index c5f2bb2fc2..b71a15f590 100644
> > --- a/object-store-ll.h
> > +++ b/object-store-ll.h
> > @@ -289,6 +289,7 @@ struct object_info {
> >  	struct object_id *delta_base_oid;
> >  	struct strbuf *type_name;
> >  	void **contentp;
> > +	size_t content_limit;
> >  
> >  	/* Response */
> >  	enum {
> > diff --git a/packfile.c b/packfile.c
> > index 4028763947..c12a0515b3 100644
> > --- a/packfile.c
> > +++ b/packfile.c
> > @@ -1529,7 +1529,7 @@ int packed_object_info(struct repository *r, struct packed_git *p,
> >  	 * We always get the representation type, but only convert it to
> >  	 * a "real" type later if the caller is interested.
> >  	 */
> > -	if (oi->contentp) {
> > +	if (oi->contentp && !oi->content_limit) {
> >  		*oi->contentp = cache_or_unpack_entry(r, p, obj_offset, oi->sizep,
> >  						      &type);
> >  		if (!*oi->contentp)
> > @@ -1555,6 +1555,17 @@ int packed_object_info(struct repository *r, struct packed_git *p,
> >  				*oi->sizep = size;
> >  			}
> >  		}
> > +
> > +		if (oi->contentp) {
> > +			if (oi->sizep && *oi->sizep < oi->content_limit) {
> 
> It happens that with the current code structure, at this point,
> oi->content_limit is _always_ non-zero.  But it felt somewhat
> fragile to rely on it, and I would have appreciated if this was
> written with an explicit check for oi->content_limit, just like how
> it is done in loose_object_info() function.

Right.  I actually think something like:

		assert(oi->content_limit); /* see `if' above */
		if (oi->sizep && *oi->sizep < oi->content_limit) {

is good for documentation purposes since this is in the `else'
branch of the `if (oi->contentp && !oi->content_limit) {' condition.
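
A rough standalone sketch of how that reads in context, assuming the branch structure from the quoted packed_object_info() hunks. `struct sketch_request`, `enum sketch_action` and `sketch_decide()` are hypothetical and only model the control flow; nothing here unpacks anything. The `<` comparison is copied from the quoted hunk as-is.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical subset of the request fields discussed above. */
struct sketch_request {
	void **contentp;
	unsigned long *sizep;
	size_t content_limit; /* 0 means "no limit" */
};

enum sketch_action {
	SKETCH_NO_CONTENT,     /* caller did not ask for contents */
	SKETCH_UNPACK_EAGERLY, /* no limit: always grab the contents */
	SKETCH_UNPACK_LIMITED, /* limit set and the object is small enough */
	SKETCH_SKIP_TOO_LARGE  /* limit set and the object is too large */
};

static enum sketch_action sketch_decide(const struct sketch_request *oi)
{
	if (oi->contentp && !oi->content_limit)
		return SKETCH_UNPACK_EAGERLY;

	/* everything below models the `else' branch */
	if (!oi->contentp)
		return SKETCH_NO_CONTENT;

	/*
	 * contentp is non-NULL and the first `if' was not taken, so
	 * content_limit must be non-zero here; the assert documents that.
	 */
	assert(oi->content_limit); /* see `if' above */
	if (oi->sizep && *oi->sizep < oi->content_limit)
		return SKETCH_UNPACK_LIMITED;
	return SKETCH_SKIP_TOO_LARGE;
}
```

The assert costs nothing in NDEBUG builds and spells out the invariant for the next reader, which is the documentation purpose mentioned above.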
