From: "Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>
To: git@vger.kernel.org
Cc: "Nicolas Pitre" <nico@fluxnic.net>,
"Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>
Subject: [PATCH 4/4] pack v4: make use of cached v4 trees when unpacking
Date: Thu, 12 Sep 2013 17:38:04 +0700 [thread overview]
Message-ID: <1378982284-7848-4-git-send-email-pclouds@gmail.com> (raw)
In-Reply-To: <1378982284-7848-1-git-send-email-pclouds@gmail.com>
"git rev-list --objects v1.8.4" time is reduced from 29s to 10s with
this patch. But it is still a long way to catch up with v2: 4s.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
The problem I see with decode_entries() is that given n copy
sequences, it re-reads the same base n times. 30+ copy sequences are
not unusual at all with git.git.
I'm thinking of adding a cache to deal with one-base trees, which is
all we have now. If we know in advance what base a tree needs without
parsing the tree, we could unpack from base up like we do with
ref-deltas. Because in this case we know the base is always flat, we
could have a more efficient decode_entries that only goes through the
base once. I want to get the timing down to as close as possible to
v2 before adding v4-aware interface.
Pack cache is an idea being cooked for a while by Jeff. Maybe we
could merge his work to pack v4 or require it when pack v4 is finally
merged to 'next'.
packv4-parse.c | 17 +++++++++++++++--
packv4-parse.h | 2 ++
sha1_file.c | 14 ++++++++++++++
3 files changed, 31 insertions(+), 2 deletions(-)
diff --git a/packv4-parse.c b/packv4-parse.c
index 5002f42..b8855b0 100644
--- a/packv4-parse.c
+++ b/packv4-parse.c
@@ -415,8 +415,20 @@ static int decode_entries(struct packed_git *p, struct pack_window **w_curs,
unsigned int nb_entries;
const unsigned char *src, *scp;
off_t copy_objoffset = 0;
+ const void *cached = NULL;
+ unsigned long cached_size, cached_v4_size;
+
+ if (hdr) /* we need offset point at obj header */
+ cached = get_cached_v4_tree(p, offset,
+ &cached_size, &cached_v4_size);
+
+ if (cached) {
+ src = cached;
+ avail = cached_v4_size;
+ hdr = 0;
+ } else
+ src = use_pack(p, w_curs, offset, &avail);
- src = use_pack(p, w_curs, offset, &avail);
scp = src;
if (hdr) {
@@ -452,7 +464,8 @@ static int decode_entries(struct packed_git *p, struct pack_window **w_curs,
while (count) {
unsigned int what;
- if (avail < 20) {
+ /* fixme: need to put bach the out-of-bound check when cached == 1 */
+ if (!cached && avail < 20) {
src = use_pack(p, w_curs, offset, &avail);
if (avail < 20)
return -1;
diff --git a/packv4-parse.h b/packv4-parse.h
index 647b73c..f584c31 100644
--- a/packv4-parse.h
+++ b/packv4-parse.h
@@ -16,6 +16,8 @@ unsigned long pv4_unpack_object_header_buffer(const unsigned char *base,
unsigned long *sizep);
const unsigned char *get_sha1ref(struct packed_git *p,
const unsigned char **bufp);
+const void *get_cached_v4_tree(struct packed_git *p, off_t base_offset,
+ unsigned long *size, unsigned long *v4_size);
void *pv4_get_commit(struct packed_git *p, struct pack_window **w_curs,
off_t offset, unsigned long size);
diff --git a/sha1_file.c b/sha1_file.c
index b176316..82570be 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1967,6 +1967,20 @@ static int in_delta_base_cache(struct packed_git *p, off_t base_offset)
return eq_delta_base_cache_entry(ent, p, base_offset);
}
+const void *get_cached_v4_tree(struct packed_git *p, off_t base_offset,
+ unsigned long *size, unsigned long *v4_size)
+{
+ struct delta_base_cache_entry *ent;
+ ent = get_delta_base_cache_entry(p, base_offset);
+
+ if (!eq_delta_base_cache_entry(ent, p, base_offset) ||
+ ent->type != OBJ_PV4_TREE)
+ return NULL;
+ *size = ent->size;
+ *v4_size = ent->v4_size;
+ return ent->data;
+}
+
static void clear_delta_base_cache_entry(struct delta_base_cache_entry *ent)
{
ent->data = NULL;
--
1.8.2.83.gc99314b
next prev parent reply other threads:[~2013-09-12 10:37 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-09-12 10:38 [PATCH 1/4] pack v4: avoid strlen() in tree_entry_prefix Nguyễn Thái Ngọc Duy
2013-09-12 10:38 ` [PATCH 2/4] pack v4: add v4_size to struct delta_base_cache_entry Nguyễn Thái Ngọc Duy
2013-09-13 13:27 ` Nicolas Pitre
2013-09-13 13:59 ` Duy Nguyen
2013-09-14 2:06 ` Nicolas Pitre
2013-09-14 4:22 ` Nicolas Pitre
2013-09-15 7:35 ` Duy Nguyen
2013-09-16 4:42 ` Nicolas Pitre
2013-09-16 5:24 ` Duy Nguyen
2013-09-12 10:38 ` [PATCH 3/4] pack v4: cache flattened v4 trees in delta base cache Nguyễn Thái Ngọc Duy
2013-09-12 10:38 ` Nguyễn Thái Ngọc Duy [this message]
2013-09-12 13:29 ` [PATCH 5/4] pack v4: convert v4 tree to canonical format if found in " Nguyễn Thái Ngọc Duy
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1378982284-7848-4-git-send-email-pclouds@gmail.com \
--to=pclouds@gmail.com \
--cc=git@vger.kernel.org \
--cc=nico@fluxnic.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.