git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 08/13] prevent try_delta from using objects not in pack
@ 2007-04-05 22:35 Dana How
  2007-04-05 23:04 ` Junio C Hamano
  2007-04-06  1:02 ` Nicolas Pitre
  0 siblings, 2 replies; 8+ messages in thread
From: Dana How @ 2007-04-05 22:35 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, danahow

[-- Attachment #1: Type: text/plain, Size: 148 bytes --]

---
 builtin-pack-objects.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

-- 
Dana L. How  danahow@gmail.com  +1 650 804 5991 cell

[-- Attachment #2: 0008-prevent-try_delta-from-using-objects-not-in-pack.patch.txt --]
[-- Type: text/plain, Size: 857 bytes --]

From 7d510f82b1e6acccc25596052abf3c5f1961ebcc Mon Sep 17 00:00:00 2001
From: Dana How <how@deathvalley.cswitch.com>
Date: Thu, 5 Apr 2007 14:04:59 -0700
Subject: [PATCH 08/13] prevent try_delta from using objects not in pack

---
 builtin-pack-objects.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c
index 9eb5fd6..37b0150 100644
--- a/builtin-pack-objects.c
+++ b/builtin-pack-objects.c
@@ -1251,6 +1251,10 @@ static int try_delta(struct unpacked *trg, struct unpacked *src,
 	if (trg_entry->type != src_entry->type)
 		return -1;
 
+	/* Don't try deltas involving already/non written objects */
+	if (trg_entry->no_write || src_entry->no_write)
+		return -1;
+
 	/* We do not compute delta to *create* objects we are not
 	 * going to pack.
 	 */
-- 
1.5.1.rc2.18.g9c88-dirty


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH 08/13] prevent try_delta from using objects not in pack
  2007-04-05 22:35 [PATCH 08/13] prevent try_delta from using objects not in pack Dana How
@ 2007-04-05 23:04 ` Junio C Hamano
  2007-04-06  1:02 ` Nicolas Pitre
  1 sibling, 0 replies; 8+ messages in thread
From: Junio C Hamano @ 2007-04-05 23:04 UTC (permalink / raw)
  To: Dana How; +Cc: Junio C Hamano, git

"Dana How" <danahow@gmail.com> writes:

> Subject: [PATCH 08/13] prevent try_delta from using objects not in pack
>
> ---
>  builtin-pack-objects.c |    4 ++++
>  1 files changed, 4 insertions(+), 0 deletions(-)
>
> diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c
> index 9eb5fd6..37b0150 100644
> --- a/builtin-pack-objects.c
> +++ b/builtin-pack-objects.c
> @@ -1251,6 +1251,10 @@ static int try_delta(struct unpacked *trg, struct unpacked *src,
>  	if (trg_entry->type != src_entry->type)
>  		return -1;
>  
> +	/* Don't try deltas involving already/non written objects */
> +	if (trg_entry->no_write || src_entry->no_write)
> +		return -1;
> +
>  	/* We do not compute delta to *create* objects we are not
>  	 * going to pack.
>  	 */

I think trg_entry->no_write case should return -1 (meaning:
doesn't delta with this source, nor any other sources you would
throw at try_delta()), but shouldn't src_entry->no_write case
return 0 (meaning: does not delta with this source, but do not
give up -- other source candidate you have might be usable)?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 08/13] prevent try_delta from using objects not in pack
  2007-04-05 22:35 [PATCH 08/13] prevent try_delta from using objects not in pack Dana How
  2007-04-05 23:04 ` Junio C Hamano
@ 2007-04-06  1:02 ` Nicolas Pitre
  2007-04-06  2:28   ` Dana How
  1 sibling, 1 reply; 8+ messages in thread
From: Nicolas Pitre @ 2007-04-06  1:02 UTC (permalink / raw)
  To: Dana How; +Cc: Junio C Hamano, git


What is the purpose of this patch?

The try_delta() function is called with all objects before any object is 
written to a pack to find out how to deltify objects upfront.


Nicolas

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 08/13] prevent try_delta from using objects not in pack
  2007-04-06  1:02 ` Nicolas Pitre
@ 2007-04-06  2:28   ` Dana How
  2007-04-06  3:27     ` Nicolas Pitre
  0 siblings, 1 reply; 8+ messages in thread
From: Dana How @ 2007-04-06  2:28 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Junio C Hamano, git, danahow

On 4/5/07, Nicolas Pitre <nico@cam.org> wrote:
> What is the purpose of this patch?
>
> The try_delta() function is called with all objects before any object is
> written to a pack to find out how to deltify objects upfront.

I set no_write for 2 different reasons in the patchset.
(1) When the blob is too big (--blob-limit) and will never be written.
(2) When the blob has been written to a previous, finished pack.

You're correct that this patch will never see condition (2).

I think my repository statistics are a little unusual.
Perhaps I'm getting ahead of myself here,
but I also wanted to experiment with writing all blobs to one set
of packs,  and all trees, commits, and tags to another set
(but probably just one small pack).
I would use no_write for that and it would matter here.
But I haven't run any experiments yet so I don't know
if it makes any difference.  I have seen related discussions
on the mailing list so I know this isn't a new idea.
-- 
Dana L. How  danahow@gmail.com  +1 650 804 5991 cell

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 08/13] prevent try_delta from using objects not in pack
  2007-04-06  2:28   ` Dana How
@ 2007-04-06  3:27     ` Nicolas Pitre
  2007-04-06  3:47       ` Dana How
  0 siblings, 1 reply; 8+ messages in thread
From: Nicolas Pitre @ 2007-04-06  3:27 UTC (permalink / raw)
  To: Dana How; +Cc: Junio C Hamano, git

On Thu, 5 Apr 2007, Dana How wrote:

> On 4/5/07, Nicolas Pitre <nico@cam.org> wrote:
> > What is the purpose of this patch?
> > 
> > The try_delta() function is called with all objects before any object is
> > written to a pack to find out how to deltify objects upfront.
> 
> I set no_write for 2 different reasons in the patchset.
> (1) When the blob is too big (--blob-limit) and will never be written.
> (2) When the blob has been written to a previous, finished pack.
> 
> You're correct that this patch will never see condition (2).

Given that I proposed another way for big blobs in my previous email, 
then (1) should not be needed either.

> I think my repository statistics are a little unusual.
> Perhaps I'm getting ahead of myself here,
> but I also wanted to experiment with writing all blobs to one set
> of packs,  and all trees, commits, and tags to another set
> (but probably just one small pack).
> I would use no_write for that and it would matter here.

Again you should simply _not_ add objects you don't want to the list 
instead of adding them and marking them as unwanted.


Nicolas

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 08/13] prevent try_delta from using objects not in pack
  2007-04-06  3:27     ` Nicolas Pitre
@ 2007-04-06  3:47       ` Dana How
  2007-04-06 14:59         ` Nicolas Pitre
  0 siblings, 1 reply; 8+ messages in thread
From: Dana How @ 2007-04-06  3:47 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Junio C Hamano, git, danahow

On 4/5/07, Nicolas Pitre <nico@cam.org> wrote:
> On Thu, 5 Apr 2007, Dana How wrote:
> > On 4/5/07, Nicolas Pitre <nico@cam.org> wrote:
> > > What is the purpose of this patch?
> > >
> > > The try_delta() function is called with all objects before any object is
> > > written to a pack to find out how to deltify objects upfront.
> >
> > I set no_write for 2 different reasons in the patchset.
> > (1) When the blob is too big (--blob-limit) and will never be written.
> > (2) When the blob has been written to a previous, finished pack.
> >
> > You're correct that this patch will never see condition (2).
>
> Given that I proposed another way for big blobs in my previous email,
> then (1) should not be needed either.
>
> > I think my repository statistics are a little unusual.
> > Perhaps I'm getting ahead of myself here,
> > but I also wanted to experiment with writing all blobs to one set
> > of packs,  and all trees, commits, and tags to another set
> > (but probably just one small pack).
> > I would use no_write for that and it would matter here.
>
> Again you should simply _not_ add objects you don't want to the list
> instead of adding them and marking them as unwanted.

Agreed, the marking just required less code change at the time.
I would like not to have to add nr_skipped and nr_actual.

Currently we have get_object_list -> traverse_commit_list ->
show_{commit,object} -> add_object_entry ,  which is all
called way before get_object_details -> check_object -> sha1_object_info .
Can I safely move the sha1_object_info calll earlier into
add_object_entry so I will know the size for pruning?

In your other email you mention memory consumption due to object_entry.
This structure could benefit somewhat from some attention beyond removing
my no_write.(e.g. int->short for .depth and .delta_limit; int->enum
for .preferred_base).  Perhaps for later.

Thanks,
-- 
Dana L. How  danahow@gmail.com  +1 650 804 5991 cell

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 08/13] prevent try_delta from using objects not in pack
  2007-04-06  3:47       ` Dana How
@ 2007-04-06 14:59         ` Nicolas Pitre
  2007-04-06 18:17           ` Dana How
  0 siblings, 1 reply; 8+ messages in thread
From: Nicolas Pitre @ 2007-04-06 14:59 UTC (permalink / raw)
  To: Dana How; +Cc: Junio C Hamano, git

On Thu, 5 Apr 2007, Dana How wrote:

> Currently we have get_object_list -> traverse_commit_list ->
> show_{commit,object} -> add_object_entry ,  which is all
> called way before get_object_details -> check_object -> sha1_object_info .
> Can I safely move the sha1_object_info calll earlier into
> add_object_entry so I will know the size for pruning?

I think so, yes.  And actually this call to sha1_object_info() has been 
bothering me for a while.

There are currently two ways to get the list of objects to pack: one is 
from stdin where objects are listed with their SHA1's, the other one 
uses the internal revision walking code.  In the later case we _already_ 
have the information that sha1_object_info() later provides making it 
rather wasteful by forcing a second object header parsing.

However... aren't you more interested in the _compressed_ blob size than 
its raw size?

> In your other email you mention memory consumption due to object_entry.
> This structure could benefit somewhat from some attention beyond removing
> my no_write.(e.g. int->short for .depth and .delta_limit; int->enum
> for .preferred_base).  Perhaps for later.

Sure.


Nicolas

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 08/13] prevent try_delta from using objects not in pack
  2007-04-06 14:59         ` Nicolas Pitre
@ 2007-04-06 18:17           ` Dana How
  0 siblings, 0 replies; 8+ messages in thread
From: Dana How @ 2007-04-06 18:17 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Junio C Hamano, git, danahow

On 4/6/07, Nicolas Pitre <nico@cam.org> wrote:
> On Thu, 5 Apr 2007, Dana How wrote:
> > Currently we have get_object_list -> traverse_commit_list ->
> > show_{commit,object} -> add_object_entry ,  which is all
> > called way before get_object_details -> check_object -> sha1_object_info .
> > Can I safely move the sha1_object_info calll earlier into
> > add_object_entry so I will know the size for pruning?
>
> I think so, yes.  And actually this call to sha1_object_info() has been
> bothering me for a while.
>
> There are currently two ways to get the list of objects to pack: one is
> from stdin where objects are listed with their SHA1's, the other one
> uses the internal revision walking code.  In the later case we _already_
> have the information that sha1_object_info() later provides making it
> rather wasteful by forcing a second object header parsing.
Perhaps I'll add the early call (or actually, re-use the info if I can find it),
and change the second call to only happen if the size is zero,
since that's how it starts out in the current add_object_entry.

> However... aren't you more interested in the _compressed_ blob size than
> its raw size?
Yes!  So I am leaning towards disposing of --blob-limit anyway,
which was in part a poor response to a bug in --pack-limit.
Please see the other email I sent out so far today.

Thanks,
-- 
Dana L. How  danahow@gmail.com  +1 650 804 5991 cell

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2007-04-06 18:17 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-04-05 22:35 [PATCH 08/13] prevent try_delta from using objects not in pack Dana How
2007-04-05 23:04 ` Junio C Hamano
2007-04-06  1:02 ` Nicolas Pitre
2007-04-06  2:28   ` Dana How
2007-04-06  3:27     ` Nicolas Pitre
2007-04-06  3:47       ` Dana How
2007-04-06 14:59         ` Nicolas Pitre
2007-04-06 18:17           ` Dana How

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).