git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] pack-objects: no crc check when the cached version is used
@ 2013-09-13 11:03 Nguyễn Thái Ngọc Duy
  2013-09-13 18:28 ` Junio C Hamano
  0 siblings, 1 reply; 5+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2013-09-13 11:03 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy

Current code makes pack-objects always do check_pack_crc() in
unpack_entry() even if right after that we find out there's a cached
version and pack access is not needed. Swap two code blocks, search
for cached version first, then check crc.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 sha1_file.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index 8c2d1ed..4955724 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -2126,6 +2126,16 @@ void *unpack_entry(struct packed_git *p, off_t obj_offset,
 		int i;
 		struct delta_base_cache_entry *ent;
 
+		ent = get_delta_base_cache_entry(p, curpos);
+		if (eq_delta_base_cache_entry(ent, p, curpos)) {
+			type = ent->type;
+			data = ent->data;
+			size = ent->size;
+			clear_delta_base_cache_entry(ent);
+			base_from_cache = 1;
+			break;
+		}
+
 		if (do_check_packed_object_crc && p->index_version > 1) {
 			struct revindex_entry *revidx = find_pack_revindex(p, obj_offset);
 			unsigned long len = revidx[1].offset - obj_offset;
@@ -2140,16 +2150,6 @@ void *unpack_entry(struct packed_git *p, off_t obj_offset,
 			}
 		}
 
-		ent = get_delta_base_cache_entry(p, curpos);
-		if (eq_delta_base_cache_entry(ent, p, curpos)) {
-			type = ent->type;
-			data = ent->data;
-			size = ent->size;
-			clear_delta_base_cache_entry(ent);
-			base_from_cache = 1;
-			break;
-		}
-
 		type = unpack_object_header(p, &w_curs, &curpos, &size);
 		if (type != OBJ_OFS_DELTA && type != OBJ_REF_DELTA)
 			break;
-- 
1.8.2.82.gc24b958

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] pack-objects: no crc check when the cached version is used
  2013-09-13 11:03 [PATCH] pack-objects: no crc check when the cached version is used Nguyễn Thái Ngọc Duy
@ 2013-09-13 18:28 ` Junio C Hamano
  2013-09-13 21:26   ` Thomas Rast
  0 siblings, 1 reply; 5+ messages in thread
From: Junio C Hamano @ 2013-09-13 18:28 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git, Thomas Rast

Nguyễn Thái Ngọc Duy <pclouds@gmail.com> writes:

> Current code makes pack-objects always do check_pack_crc() in
> unpack_entry() even if right after that we find out there's a cached
> version and pack access is not needed. Swap two code blocks, search
> for cached version first, then check crc.
>
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> ---

Interesting.

This is only triggered inside pack-objects, which would read a lot
of data from existing packs, and the overhead for looking up the
entry from the revindex, faulting in the actual packdata, and
computing and comparing the crc would not be trivial, especially as
the cost is incurred over many objects we need to untangle in the
delta chain.  If you have interesting numbers to show how much this
improves the performance, I am curious to see it.

Good spotting ;-)

>  sha1_file.c | 20 ++++++++++----------
>  1 file changed, 10 insertions(+), 10 deletions(-)
>
> diff --git a/sha1_file.c b/sha1_file.c
> index 8c2d1ed..4955724 100644
> --- a/sha1_file.c
> +++ b/sha1_file.c
> @@ -2126,6 +2126,16 @@ void *unpack_entry(struct packed_git *p, off_t obj_offset,
>  		int i;
>  		struct delta_base_cache_entry *ent;
>  
> +		ent = get_delta_base_cache_entry(p, curpos);
> +		if (eq_delta_base_cache_entry(ent, p, curpos)) {
> +			type = ent->type;
> +			data = ent->data;
> +			size = ent->size;
> +			clear_delta_base_cache_entry(ent);
> +			base_from_cache = 1;
> +			break;
> +		}
> +
>  		if (do_check_packed_object_crc && p->index_version > 1) {
>  			struct revindex_entry *revidx = find_pack_revindex(p, obj_offset);
>  			unsigned long len = revidx[1].offset - obj_offset;
> @@ -2140,16 +2150,6 @@ void *unpack_entry(struct packed_git *p, off_t obj_offset,
>  			}
>  		}
>  
> -		ent = get_delta_base_cache_entry(p, curpos);
> -		if (eq_delta_base_cache_entry(ent, p, curpos)) {
> -			type = ent->type;
> -			data = ent->data;
> -			size = ent->size;
> -			clear_delta_base_cache_entry(ent);
> -			base_from_cache = 1;
> -			break;
> -		}
> -
>  		type = unpack_object_header(p, &w_curs, &curpos, &size);
>  		if (type != OBJ_OFS_DELTA && type != OBJ_REF_DELTA)
>  			break;

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] pack-objects: no crc check when the cached version is used
  2013-09-13 18:28 ` Junio C Hamano
@ 2013-09-13 21:26   ` Thomas Rast
  2013-09-14  1:04     ` Duy Nguyen
  0 siblings, 1 reply; 5+ messages in thread
From: Thomas Rast @ 2013-09-13 21:26 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Nguyễn Thái Ngọc Duy, git

Junio C Hamano <gitster@pobox.com> writes:

> Nguyễn Thái Ngọc Duy <pclouds@gmail.com> writes:
>
>> Current code makes pack-objects always do check_pack_crc() in
>> unpack_entry() even if right after that we find out there's a cached
>> version and pack access is not needed. Swap two code blocks, search
>> for cached version first, then check crc.
[...]
>
> Interesting.
>
> This is only triggered inside pack-objects, which would read a lot
> of data from existing packs, and the overhead for looking up the
> entry from the revindex, faulting in the actual packdata, and
> computing and comparing the crc would not be trivial, especially as
> the cost is incurred over many objects we need to untangle in the
> delta chain.  If you have interesting numbers to show how much this
> improves the performance, I am curious to see it.

I can't see anything wrong with the patch, but then I haven't stared too
hard.  (It seems that my conversion around abe601b (sha1_file: remove
recursion in unpack_entry, 2013-03-27) was faithful on this point, the
problem has existed for longer than that.)

I tried the perf script below, but at least for the git repo the only
thing I can see is noise.

--- 8< --- t/perf/p5300-pack-object.sh --- 8< ---
#!/bin/sh

test_description="Tests object packing performance"

. ./perf-lib.sh

test_perf_default_repo

test_perf 'pack-objects on commits in HEAD' '
	git rev-list HEAD |
	git pack-objects --stdout >/dev/null
'

test_perf 'pack-objects on all of HEAD' '
	git rev-list --objects HEAD |
	git pack-objects --stdout >/dev/null
'

test_done

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] pack-objects: no crc check when the cached version is used
  2013-09-13 21:26   ` Thomas Rast
@ 2013-09-14  1:04     ` Duy Nguyen
  2013-09-14  3:18       ` Nicolas Pitre
  0 siblings, 1 reply; 5+ messages in thread
From: Duy Nguyen @ 2013-09-14  1:04 UTC (permalink / raw)
  To: Thomas Rast; +Cc: Junio C Hamano, Git Mailing List

On Sat, Sep 14, 2013 at 4:26 AM, Thomas Rast <trast@inf.ethz.ch> wrote:
> Junio C Hamano <gitster@pobox.com> writes:
>
>> Nguyễn Thái Ngọc Duy <pclouds@gmail.com> writes:
>>
>>> Current code makes pack-objects always do check_pack_crc() in
>>> unpack_entry() even if right after that we find out there's a cached
>>> version and pack access is not needed. Swap two code blocks, search
>>> for cached version first, then check crc.
> [...]
>>
>> Interesting.
>>
>> This is only triggered inside pack-objects, which would read a lot
>> of data from existing packs, and the overhead for looking up the
>> entry from the revindex, faulting in the actual packdata, and
>> computing and comparing the crc would not be trivial, especially as
>> the cost is incurred over many objects we need to untangle in the
>> delta chain.  If you have interesting numbers to show how much this
>> improves the performance, I am curious to see it.

No I don't have any timing numbers. I just updated the code to see how
many times crc is checked and how many times we find a cached version
after crc is checked. The numbers with git.git are 353535 and 113257
respectively. IOW we could reduce the number of crc checks by 30%.

>
> I can't see anything wrong with the patch, but then I haven't stared too
> hard.  (It seems that my conversion around abe601b (sha1_file: remove
> recursion in unpack_entry, 2013-03-27) was faithful on this point, the
> problem has existed for longer than that.)
>
> I tried the perf script below, but at least for the git repo the only
> thing I can see is noise.

--stdout does not set do_check_packed_object_crc, you need to run
pack-objects without --stdout (i.e. the real use case is repack)

>
> --- 8< --- t/perf/p5300-pack-object.sh --- 8< ---
> #!/bin/sh
>
> test_description="Tests object packing performance"
>
> . ./perf-lib.sh
>
> test_perf_default_repo
>
> test_perf 'pack-objects on commits in HEAD' '
>         git rev-list HEAD |
>         git pack-objects --stdout >/dev/null
> '
>
> test_perf 'pack-objects on all of HEAD' '
>         git rev-list --objects HEAD |
>         git pack-objects --stdout >/dev/null
> '
>
> test_done



-- 
Duy

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] pack-objects: no crc check when the cached version is used
  2013-09-14  1:04     ` Duy Nguyen
@ 2013-09-14  3:18       ` Nicolas Pitre
  0 siblings, 0 replies; 5+ messages in thread
From: Nicolas Pitre @ 2013-09-14  3:18 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Thomas Rast, Junio C Hamano, Git Mailing List

On Sat, 14 Sep 2013, Duy Nguyen wrote:

> On Sat, Sep 14, 2013 at 4:26 AM, Thomas Rast <trast@inf.ethz.ch> wrote:
> > I tried the perf script below, but at least for the git repo the only
> > thing I can see is noise.
> 
> --stdout does not set do_check_packed_object_crc, you need to run
> pack-objects without --stdout (i.e. the real use case is repack)

And for those who might wonder why, you can have a look at the 
description for commit 0e8189e2708b.  This was probably one of my best 
commit logs ever.  ;-)

This commit also provides a hint about the cost of over-checking the 
CRC.


Nicolas

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2013-09-14  3:19 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-09-13 11:03 [PATCH] pack-objects: no crc check when the cached version is used Nguyễn Thái Ngọc Duy
2013-09-13 18:28 ` Junio C Hamano
2013-09-13 21:26   ` Thomas Rast
2013-09-14  1:04     ` Duy Nguyen
2013-09-14  3:18       ` Nicolas Pitre

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).