From: Junio C Hamano <gitster@pobox.com>
To: Eric Wong <e@80x24.org>
Cc: git@vger.kernel.org, Jeff King <peff@peff.net>,
Patrick Steinhardt <ps@pks.im>
Subject: Re: [PATCH v2 02/10] packfile: allow content-limit for cat-file
Date: Mon, 26 Aug 2024 10:10:22 -0700 [thread overview]
Message-ID: <xmqqcylvky69.fsf@gitster.g> (raw)
In-Reply-To: <20240823224630.1180772-3-e@80x24.org> (Eric Wong's message of "Fri, 23 Aug 2024 22:46:22 +0000")
Eric Wong <e@80x24.org> writes:
> From: Jeff King <peff@peff.net>
>
> Avoid unnecessary round trips to the object store to speed
> up cat-file contents retrievals. The majority of packed objects
> don't benefit from the streaming interface at all and we end up
> having to load them in core anyways to satisfy our streaming
> API.
What I found missing from the description is something like ...
The new trick used is to teach oid_object_info_extended() that a
non-NULL oi->contentp that means "grab the contents of the objects
here" can be told to refrain from grabbing an object that is too
large.
> diff --git a/object-file.c b/object-file.c
> index 065103be3e..1cc29c3c58 100644
> --- a/object-file.c
> +++ b/object-file.c
> @@ -1492,6 +1492,12 @@ static int loose_object_info(struct repository *r,
>
> if (!oi->contentp)
> break;
> + if (oi->content_limit && *oi->sizep > oi->content_limit) {
I cannot convince myself enough to say "content limit" is a great
name. It invites "limited by what? text files are allowed but
images are not?".
> diff --git a/object-store-ll.h b/object-store-ll.h
> index c5f2bb2fc2..b71a15f590 100644
> --- a/object-store-ll.h
> +++ b/object-store-ll.h
> @@ -289,6 +289,7 @@ struct object_info {
> struct object_id *delta_base_oid;
> struct strbuf *type_name;
> void **contentp;
> + size_t content_limit;
>
> /* Response */
> enum {
> diff --git a/packfile.c b/packfile.c
> index 4028763947..c12a0515b3 100644
> --- a/packfile.c
> +++ b/packfile.c
> @@ -1529,7 +1529,7 @@ int packed_object_info(struct repository *r, struct packed_git *p,
> * We always get the representation type, but only convert it to
> * a "real" type later if the caller is interested.
> */
> - if (oi->contentp) {
> + if (oi->contentp && !oi->content_limit) {
> *oi->contentp = cache_or_unpack_entry(r, p, obj_offset, oi->sizep,
> &type);
> if (!*oi->contentp)
> @@ -1555,6 +1555,17 @@ int packed_object_info(struct repository *r, struct packed_git *p,
> *oi->sizep = size;
> }
> }
> +
> + if (oi->contentp) {
> + if (oi->sizep && *oi->sizep < oi->content_limit) {
It happens that with the current code structure, at this point,
oi->content_limit is _always_ non-zero. But it felt somewhat
fragile to rely on it, and I would have appreciated if this was
written with an explicit check for oi->content_limit, just like how
it is done in loose_object_info() function.
Other than that, looking very good.
Thanks.
> + *oi->contentp = cache_or_unpack_entry(r, p, obj_offset,
> + oi->sizep, &type);
> + if (!*oi->contentp)
> + type = OBJ_BAD;
> + } else {
> + *oi->contentp = NULL;
> + }
> + }
> }
>
> if (oi->disk_sizep) {
next prev parent reply other threads:[~2024-08-26 17:10 UTC|newest]
Thread overview: 51+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-07-15 0:35 [PATCH v1 00/10] cat-file speedups Eric Wong
2024-07-15 0:35 ` [PATCH v1 01/10] packfile: move sizep computation Eric Wong
2024-07-24 8:35 ` Patrick Steinhardt
2024-07-15 0:35 ` [PATCH v1 02/10] packfile: allow content-limit for cat-file Eric Wong
2024-07-24 8:35 ` Patrick Steinhardt
2024-07-26 7:30 ` Eric Wong
2024-07-15 0:35 ` [PATCH v1 03/10] packfile: fix off-by-one in content_limit comparison Eric Wong
2024-07-24 8:35 ` Patrick Steinhardt
2024-07-26 7:43 ` Eric Wong
2024-07-15 0:35 ` [PATCH v1 04/10] packfile: inline cache_or_unpack_entry Eric Wong
2024-07-15 0:35 ` [PATCH v1 05/10] cat-file: use delta_base_cache entries directly Eric Wong
2024-07-24 8:35 ` Patrick Steinhardt
2024-07-26 7:42 ` Eric Wong
2024-08-18 17:36 ` assert vs BUG [was: [PATCH v1 05/10] cat-file: use delta_base_cache entries directly] Eric Wong
2024-08-19 15:50 ` Junio C Hamano
2024-07-15 0:35 ` [PATCH v1 06/10] packfile: packed_object_info avoids packed_to_object_type Eric Wong
2024-07-24 8:36 ` Patrick Steinhardt
2024-07-26 8:01 ` Eric Wong
2024-07-15 0:35 ` [PATCH v1 07/10] object_info: content_limit only applies to blobs Eric Wong
2024-07-15 0:35 ` [PATCH v1 08/10] cat-file: batch-command uses content_limit Eric Wong
2024-07-15 0:35 ` [PATCH v1 09/10] cat-file: batch_write: use size_t for length Eric Wong
2024-07-15 0:35 ` [PATCH v1 10/10] cat-file: use writev(2) if available Eric Wong
2024-07-24 8:35 ` [PATCH v1 00/10] cat-file speedups Patrick Steinhardt
2024-08-23 22:46 ` [PATCH v2 " Eric Wong
2024-08-23 22:46 ` [PATCH v2 01/10] packfile: move sizep computation Eric Wong
2024-09-17 10:06 ` Taylor Blau
2024-08-23 22:46 ` [PATCH v2 02/10] packfile: allow content-limit for cat-file Eric Wong
2024-08-26 17:10 ` Junio C Hamano [this message]
2024-08-27 20:23 ` Eric Wong
2024-09-17 10:10 ` Taylor Blau
2024-09-17 21:15 ` Junio C Hamano
2024-08-23 22:46 ` [PATCH v2 03/10] packfile: fix off-by-one in content_limit comparison Eric Wong
2024-08-26 16:55 ` Junio C Hamano
2024-09-17 10:11 ` Taylor Blau
2024-08-23 22:46 ` [PATCH v2 04/10] packfile: inline cache_or_unpack_entry Eric Wong
2024-08-26 17:09 ` Junio C Hamano
2024-10-06 17:40 ` Eric Wong
2024-08-23 22:46 ` [PATCH v2 05/10] cat-file: use delta_base_cache entries directly Eric Wong
2024-08-26 21:31 ` Junio C Hamano
2024-08-26 23:05 ` Junio C Hamano
2024-08-23 22:46 ` [PATCH v2 06/10] packfile: packed_object_info avoids packed_to_object_type Eric Wong
2024-08-26 21:50 ` Junio C Hamano
2024-08-23 22:46 ` [PATCH v2 07/10] object_info: content_limit only applies to blobs Eric Wong
2024-08-26 22:02 ` Junio C Hamano
2024-08-23 22:46 ` [PATCH v2 08/10] cat-file: batch-command uses content_limit Eric Wong
2024-08-26 22:13 ` Junio C Hamano
2024-08-23 22:46 ` [PATCH v2 09/10] cat-file: batch_write: use size_t for length Eric Wong
2024-08-27 5:06 ` Junio C Hamano
2024-08-23 22:46 ` [PATCH v2 10/10] cat-file: use writev(2) if available Eric Wong
2024-08-27 5:41 ` Junio C Hamano
2024-08-27 15:43 ` Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=xmqqcylvky69.fsf@gitster.g \
--to=gitster@pobox.com \
--cc=e@80x24.org \
--cc=git@vger.kernel.org \
--cc=peff@peff.net \
--cc=ps@pks.im \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.