From: Junio C Hamano <gitster@pobox.com>
To: Eric Wong <e@80x24.org>
Cc: git@vger.kernel.org, Jeff King <peff@peff.net>,
Patrick Steinhardt <ps@pks.im>
Subject: Re: [PATCH v2 02/10] packfile: allow content-limit for cat-file
Date: Mon, 26 Aug 2024 10:10:22 -0700 [thread overview]
Message-ID: <xmqqcylvky69.fsf@gitster.g> (raw)
In-Reply-To: <20240823224630.1180772-3-e@80x24.org> (Eric Wong's message of "Fri, 23 Aug 2024 22:46:22 +0000")
Eric Wong <e@80x24.org> writes:
> From: Jeff King <peff@peff.net>
>
> Avoid unnecessary round trips to the object store to speed
> up cat-file contents retrievals. The majority of packed objects
> don't benefit from the streaming interface at all and we end up
> having to load them in core anyways to satisfy our streaming
> API.
What I found missing from the description is something like ...
The new trick used is to teach oid_object_info_extended() that a
non-NULL oi->contentp that means "grab the contents of the objects
here" can be told to refrain from grabbing an object that is too
large.
> diff --git a/object-file.c b/object-file.c
> index 065103be3e..1cc29c3c58 100644
> --- a/object-file.c
> +++ b/object-file.c
> @@ -1492,6 +1492,12 @@ static int loose_object_info(struct repository *r,
>
> if (!oi->contentp)
> break;
> + if (oi->content_limit && *oi->sizep > oi->content_limit) {
I cannot convince myself enough to say "content limit" is a great
name. It invites "limited by what? text files are allowed but
images are not?".
> diff --git a/object-store-ll.h b/object-store-ll.h
> index c5f2bb2fc2..b71a15f590 100644
> --- a/object-store-ll.h
> +++ b/object-store-ll.h
> @@ -289,6 +289,7 @@ struct object_info {
> struct object_id *delta_base_oid;
> struct strbuf *type_name;
> void **contentp;
> + size_t content_limit;
>
> /* Response */
> enum {
> diff --git a/packfile.c b/packfile.c
> index 4028763947..c12a0515b3 100644
> --- a/packfile.c
> +++ b/packfile.c
> @@ -1529,7 +1529,7 @@ int packed_object_info(struct repository *r, struct packed_git *p,
> * We always get the representation type, but only convert it to
> * a "real" type later if the caller is interested.
> */
> - if (oi->contentp) {
> + if (oi->contentp && !oi->content_limit) {
> *oi->contentp = cache_or_unpack_entry(r, p, obj_offset, oi->sizep,
> &type);
> if (!*oi->contentp)
> @@ -1555,6 +1555,17 @@ int packed_object_info(struct repository *r, struct packed_git *p,
> *oi->sizep = size;
> }
> }
> +
> + if (oi->contentp) {
> + if (oi->sizep && *oi->sizep < oi->content_limit) {
It happens that with the current code structure, at this point,
oi->content_limit is _always_ non-zero. But it felt somewhat
fragile to rely on it, and I would have appreciated if this was
written with an explicit check for oi->content_limit, just like how
it is done in loose_object_info() function.
Other than that, looking very good.
Thanks.
> + *oi->contentp = cache_or_unpack_entry(r, p, obj_offset,
> + oi->sizep, &type);
> + if (!*oi->contentp)
> + type = OBJ_BAD;
> + } else {
> + *oi->contentp = NULL;
> + }
> + }
> }
>
> if (oi->disk_sizep) {
next prev parent reply other threads:[~2024-08-26 17:10 UTC|newest]
Thread overview: 51+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-07-15 0:35 [PATCH v1 00/10] cat-file speedups Eric Wong
2024-07-15 0:35 ` [PATCH v1 01/10] packfile: move sizep computation Eric Wong
2024-07-24 8:35 ` Patrick Steinhardt
2024-07-15 0:35 ` [PATCH v1 02/10] packfile: allow content-limit for cat-file Eric Wong
2024-07-24 8:35 ` Patrick Steinhardt
2024-07-26 7:30 ` Eric Wong
2024-07-15 0:35 ` [PATCH v1 03/10] packfile: fix off-by-one in content_limit comparison Eric Wong
2024-07-24 8:35 ` Patrick Steinhardt
2024-07-26 7:43 ` Eric Wong
2024-07-15 0:35 ` [PATCH v1 04/10] packfile: inline cache_or_unpack_entry Eric Wong
2024-07-15 0:35 ` [PATCH v1 05/10] cat-file: use delta_base_cache entries directly Eric Wong
2024-07-24 8:35 ` Patrick Steinhardt
2024-07-26 7:42 ` Eric Wong
2024-08-18 17:36 ` assert vs BUG [was: [PATCH v1 05/10] cat-file: use delta_base_cache entries directly] Eric Wong
2024-08-19 15:50 ` Junio C Hamano
2024-07-15 0:35 ` [PATCH v1 06/10] packfile: packed_object_info avoids packed_to_object_type Eric Wong
2024-07-24 8:36 ` Patrick Steinhardt
2024-07-26 8:01 ` Eric Wong
2024-07-15 0:35 ` [PATCH v1 07/10] object_info: content_limit only applies to blobs Eric Wong
2024-07-15 0:35 ` [PATCH v1 08/10] cat-file: batch-command uses content_limit Eric Wong
2024-07-15 0:35 ` [PATCH v1 09/10] cat-file: batch_write: use size_t for length Eric Wong
2024-07-15 0:35 ` [PATCH v1 10/10] cat-file: use writev(2) if available Eric Wong
2024-07-24 8:35 ` [PATCH v1 00/10] cat-file speedups Patrick Steinhardt
2024-08-23 22:46 ` [PATCH v2 " Eric Wong
2024-08-23 22:46 ` [PATCH v2 01/10] packfile: move sizep computation Eric Wong
2024-09-17 10:06 ` Taylor Blau
2024-08-23 22:46 ` [PATCH v2 02/10] packfile: allow content-limit for cat-file Eric Wong
2024-08-26 17:10 ` Junio C Hamano [this message]
2024-08-27 20:23 ` Eric Wong
2024-09-17 10:10 ` Taylor Blau
2024-09-17 21:15 ` Junio C Hamano
2024-08-23 22:46 ` [PATCH v2 03/10] packfile: fix off-by-one in content_limit comparison Eric Wong
2024-08-26 16:55 ` Junio C Hamano
2024-09-17 10:11 ` Taylor Blau
2024-08-23 22:46 ` [PATCH v2 04/10] packfile: inline cache_or_unpack_entry Eric Wong
2024-08-26 17:09 ` Junio C Hamano
2024-10-06 17:40 ` Eric Wong
2024-08-23 22:46 ` [PATCH v2 05/10] cat-file: use delta_base_cache entries directly Eric Wong
2024-08-26 21:31 ` Junio C Hamano
2024-08-26 23:05 ` Junio C Hamano
2024-08-23 22:46 ` [PATCH v2 06/10] packfile: packed_object_info avoids packed_to_object_type Eric Wong
2024-08-26 21:50 ` Junio C Hamano
2024-08-23 22:46 ` [PATCH v2 07/10] object_info: content_limit only applies to blobs Eric Wong
2024-08-26 22:02 ` Junio C Hamano
2024-08-23 22:46 ` [PATCH v2 08/10] cat-file: batch-command uses content_limit Eric Wong
2024-08-26 22:13 ` Junio C Hamano
2024-08-23 22:46 ` [PATCH v2 09/10] cat-file: batch_write: use size_t for length Eric Wong
2024-08-27 5:06 ` Junio C Hamano
2024-08-23 22:46 ` [PATCH v2 10/10] cat-file: use writev(2) if available Eric Wong
2024-08-27 5:41 ` Junio C Hamano
2024-08-27 15:43 ` Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=xmqqcylvky69.fsf@gitster.g \
--to=gitster@pobox.com \
--cc=e@80x24.org \
--cc=git@vger.kernel.org \
--cc=peff@peff.net \
--cc=ps@pks.im \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).