From: Junio C Hamano <gitster@pobox.com>
To: "Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>
Cc: git@vger.kernel.org, Thomas Rast <trast@inf.ethz.ch>
Subject: Re: [PATCH] pack-objects: no crc check when the cached version is used
Date: Fri, 13 Sep 2013 11:28:05 -0700
Message-ID: <xmqq7gekk24q.fsf@gitster.dls.corp.google.com>
In-Reply-To: <1379070180-15947-1-git-send-email-pclouds@gmail.com> ("Nguyễn Thái Ngọc Duy"'s message of "Fri, 13 Sep 2013 18:03:00 +0700")
Nguyễn Thái Ngọc Duy <pclouds@gmail.com> writes:
> Current code makes pack-objects always do check_pack_crc() in
> unpack_entry() even if right after that we find out there's a cached
> version and pack access is not needed. Swap two code blocks, search
> for cached version first, then check crc.
>
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> ---
Interesting.
This is only triggered inside pack-objects, which would read a lot
of data from existing packs, and the overhead for looking up the
entry from the revindex, faulting in the actual packdata, and
computing and comparing the crc would not be trivial, especially as
the cost is incurred over many objects we need to untangle in the
delta chain. If you have interesting numbers to show how much this
improves the performance, I am curious to see them.
Good spotting ;-)
> sha1_file.c | 20 ++++++++++----------
> 1 file changed, 10 insertions(+), 10 deletions(-)
>
> diff --git a/sha1_file.c b/sha1_file.c
> index 8c2d1ed..4955724 100644
> --- a/sha1_file.c
> +++ b/sha1_file.c
> @@ -2126,6 +2126,16 @@ void *unpack_entry(struct packed_git *p, off_t obj_offset,
>  		int i;
>  		struct delta_base_cache_entry *ent;
>
> +		ent = get_delta_base_cache_entry(p, curpos);
> +		if (eq_delta_base_cache_entry(ent, p, curpos)) {
> +			type = ent->type;
> +			data = ent->data;
> +			size = ent->size;
> +			clear_delta_base_cache_entry(ent);
> +			base_from_cache = 1;
> +			break;
> +		}
> +
>  		if (do_check_packed_object_crc && p->index_version > 1) {
>  			struct revindex_entry *revidx = find_pack_revindex(p, obj_offset);
>  			unsigned long len = revidx[1].offset - obj_offset;
> @@ -2140,16 +2150,6 @@ void *unpack_entry(struct packed_git *p, off_t obj_offset,
>  			}
>  		}
>
> -		ent = get_delta_base_cache_entry(p, curpos);
> -		if (eq_delta_base_cache_entry(ent, p, curpos)) {
> -			type = ent->type;
> -			data = ent->data;
> -			size = ent->size;
> -			clear_delta_base_cache_entry(ent);
> -			base_from_cache = 1;
> -			break;
> -		}
> -
>  		type = unpack_object_header(p, &w_curs, &curpos, &size);
>  		if (type != OBJ_OFS_DELTA && type != OBJ_REF_DELTA)
>  			break;
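
For readers who want to see the effect of the reordering in isolation, here
is a minimal toy sketch. It is not git code: every name in it (toy_pack,
toy_check_crc, toy_unpack_crc_first, toy_unpack_cache_first, and so on) is
hypothetical, the "CRC check" only bumps a counter, and a cache hit leaves
the entry in place rather than clearing it as the real code does. It only
illustrates that consulting the cache first skips the costly check on a hit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy model; none of these names exist in git. */
#define TOY_CACHE_SLOTS 4

struct toy_cache_entry {
    int valid;
    uint64_t offset;  /* pack offset this entry was cached for */
    int value;        /* stands in for the unpacked object data */
};

struct toy_pack {
    struct toy_cache_entry cache[TOY_CACHE_SLOTS];
    unsigned crc_checks;  /* how many times the costly check ran */
};

/* Stand-in for revindex lookup + faulting in pack data + CRC compare. */
int toy_check_crc(struct toy_pack *p, uint64_t offset)
{
    (void)offset;
    p->crc_checks++;
    return 0;  /* 0 means "CRC matches" in this toy */
}

struct toy_cache_entry *toy_cache_slot(struct toy_pack *p, uint64_t offset)
{
    size_t slot = (size_t)(offset % TOY_CACHE_SLOTS);
    assert(slot < TOY_CACHE_SLOTS);
    return &p->cache[slot];
}

void toy_cache_store(struct toy_pack *p, uint64_t offset, int value)
{
    struct toy_cache_entry *ent = toy_cache_slot(p, offset);
    ent->valid = 1;
    ent->offset = offset;
    ent->value = value;
}

/* Old order: pay for the CRC check, then discover the cached copy. */
int toy_unpack_crc_first(struct toy_pack *p, uint64_t offset, int *out)
{
    struct toy_cache_entry *ent;

    if (toy_check_crc(p, offset))
        return -1;
    ent = toy_cache_slot(p, offset);
    if (ent->valid && ent->offset == offset) {
        *out = ent->value;
        return 0;
    }
    *out = (int)(offset * 2);  /* pretend to read the object from the pack */
    return 0;
}

/* Patched order: a cache hit returns before any CRC work is done. */
int toy_unpack_cache_first(struct toy_pack *p, uint64_t offset, int *out)
{
    struct toy_cache_entry *ent = toy_cache_slot(p, offset);

    if (ent->valid && ent->offset == offset) {
        *out = ent->value;
        return 0;
    }
    if (toy_check_crc(p, offset))
        return -1;
    *out = (int)(offset * 2);  /* pretend to read the object from the pack */
    return 0;
}
```

Storing an entry and then calling both variants on a zero-initialized
toy_pack shows the difference in the crc_checks counter: the first variant
counts one check on a hit, the second counts none. How much that saves in
real pack-objects runs is exactly the open question about numbers above.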