* [PATCH 08/13] prevent try_delta from using objects not in pack
@ 2007-04-05 22:35 Dana How
2007-04-05 23:04 ` Junio C Hamano
2007-04-06 1:02 ` Nicolas Pitre
0 siblings, 2 replies; 8+ messages in thread
From: Dana How @ 2007-04-05 22:35 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, danahow
[-- Attachment #1: Type: text/plain, Size: 148 bytes --]
---
builtin-pack-objects.c | 4 ++++
1 files changed, 4 insertions(+), 0 deletions(-)
--
Dana L. How danahow@gmail.com +1 650 804 5991 cell
[-- Attachment #2: 0008-prevent-try_delta-from-using-objects-not-in-pack.patch.txt --]
[-- Type: text/plain, Size: 857 bytes --]
From 7d510f82b1e6acccc25596052abf3c5f1961ebcc Mon Sep 17 00:00:00 2001
From: Dana How <how@deathvalley.cswitch.com>
Date: Thu, 5 Apr 2007 14:04:59 -0700
Subject: [PATCH 08/13] prevent try_delta from using objects not in pack
---
builtin-pack-objects.c | 4 ++++
1 files changed, 4 insertions(+), 0 deletions(-)
diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c
index 9eb5fd6..37b0150 100644
--- a/builtin-pack-objects.c
+++ b/builtin-pack-objects.c
@@ -1251,6 +1251,10 @@ static int try_delta(struct unpacked *trg, struct unpacked *src,
if (trg_entry->type != src_entry->type)
return -1;
+ /* Don't try deltas involving already/non written objects */
+ if (trg_entry->no_write || src_entry->no_write)
+ return -1;
+
/* We do not compute delta to *create* objects we are not
* going to pack.
*/
--
1.5.1.rc2.18.g9c88-dirty
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH 08/13] prevent try_delta from using objects not in pack
2007-04-05 22:35 [PATCH 08/13] prevent try_delta from using objects not in pack Dana How
@ 2007-04-05 23:04 ` Junio C Hamano
2007-04-06 1:02 ` Nicolas Pitre
1 sibling, 0 replies; 8+ messages in thread
From: Junio C Hamano @ 2007-04-05 23:04 UTC (permalink / raw)
To: Dana How; +Cc: Junio C Hamano, git
"Dana How" <danahow@gmail.com> writes:
> Subject: [PATCH 08/13] prevent try_delta from using objects not in pack
>
> ---
> builtin-pack-objects.c | 4 ++++
> 1 files changed, 4 insertions(+), 0 deletions(-)
>
> diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c
> index 9eb5fd6..37b0150 100644
> --- a/builtin-pack-objects.c
> +++ b/builtin-pack-objects.c
> @@ -1251,6 +1251,10 @@ static int try_delta(struct unpacked *trg, struct unpacked *src,
> if (trg_entry->type != src_entry->type)
> return -1;
>
> + /* Don't try deltas involving already/non written objects */
> + if (trg_entry->no_write || src_entry->no_write)
> + return -1;
> +
> /* We do not compute delta to *create* objects we are not
> * going to pack.
> */
I think trg_entry->no_write case should return -1 (meaning:
doesn't delta with this source, nor any other sources you would
throw at try_delta()), but shouldn't src_entry->no_write case
return 0 (meaning: does not delta with this source, but do not
give up -- other source candidate you have might be usable)?
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 08/13] prevent try_delta from using objects not in pack
2007-04-05 22:35 [PATCH 08/13] prevent try_delta from using objects not in pack Dana How
2007-04-05 23:04 ` Junio C Hamano
@ 2007-04-06 1:02 ` Nicolas Pitre
2007-04-06 2:28 ` Dana How
1 sibling, 1 reply; 8+ messages in thread
From: Nicolas Pitre @ 2007-04-06 1:02 UTC (permalink / raw)
To: Dana How; +Cc: Junio C Hamano, git
What is the purpose of this patch?
The try_delta() function is called with all objects before any object is
written to a pack to find out how to deltify objects upfront.
Nicolas
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 08/13] prevent try_delta from using objects not in pack
2007-04-06 1:02 ` Nicolas Pitre
@ 2007-04-06 2:28 ` Dana How
2007-04-06 3:27 ` Nicolas Pitre
0 siblings, 1 reply; 8+ messages in thread
From: Dana How @ 2007-04-06 2:28 UTC (permalink / raw)
To: Nicolas Pitre; +Cc: Junio C Hamano, git, danahow
On 4/5/07, Nicolas Pitre <nico@cam.org> wrote:
> What is the purpose of this patch?
>
> The try_delta() function is called with all objects before any object is
> written to a pack to find out how to deltify objects upfront.
I set no_write for 2 different reasons in the patchset.
(1) When the blob is too big (--blob-limit) and will never be written.
(2) When the blob has been written to a previous, finished pack.
You're correct that this patch will never see condition (2).
I think my repository statistics are a little unusual.
Perhaps I'm getting ahead of myself here,
but I also wanted to experiment with writing all blobs to one set
of packs, and all trees, commits, and tags to another set
(but probably just one small pack).
I would use no_write for that and it would matter here.
But I haven't run any experiments yet so I don't know
if it makes any difference. I have seen related discussions
on the mailing list so I know this isn't a new idea.
--
Dana L. How danahow@gmail.com +1 650 804 5991 cell
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 08/13] prevent try_delta from using objects not in pack
2007-04-06 2:28 ` Dana How
@ 2007-04-06 3:27 ` Nicolas Pitre
2007-04-06 3:47 ` Dana How
0 siblings, 1 reply; 8+ messages in thread
From: Nicolas Pitre @ 2007-04-06 3:27 UTC (permalink / raw)
To: Dana How; +Cc: Junio C Hamano, git
On Thu, 5 Apr 2007, Dana How wrote:
> On 4/5/07, Nicolas Pitre <nico@cam.org> wrote:
> > What is the purpose of this patch?
> >
> > The try_delta() function is called with all objects before any object is
> > written to a pack to find out how to deltify objects upfront.
>
> I set no_write for 2 different reasons in the patchset.
> (1) When the blob is too big (--blob-limit) and will never be written.
> (2) When the blob has been written to a previous, finished pack.
>
> You're correct that this patch will never see condition (2).
Given that I proposed another way for big blobs in my previous email,
then (1) should not be needed either.
> I think my repository statistics are a little unusual.
> Perhaps I'm getting ahead of myself here,
> but I also wanted to experiment with writing all blobs to one set
> of packs, and all trees, commits, and tags to another set
> (but probably just one small pack).
> I would use no_write for that and it would matter here.
Again you should simply _not_ add objects you don't want to the list
instead of adding them and marking them as unwanted.
Nicolas
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 08/13] prevent try_delta from using objects not in pack
2007-04-06 3:27 ` Nicolas Pitre
@ 2007-04-06 3:47 ` Dana How
2007-04-06 14:59 ` Nicolas Pitre
0 siblings, 1 reply; 8+ messages in thread
From: Dana How @ 2007-04-06 3:47 UTC (permalink / raw)
To: Nicolas Pitre; +Cc: Junio C Hamano, git, danahow
On 4/5/07, Nicolas Pitre <nico@cam.org> wrote:
> On Thu, 5 Apr 2007, Dana How wrote:
> > On 4/5/07, Nicolas Pitre <nico@cam.org> wrote:
> > > What is the purpose of this patch?
> > >
> > > The try_delta() function is called with all objects before any object is
> > > written to a pack to find out how to deltify objects upfront.
> >
> > I set no_write for 2 different reasons in the patchset.
> > (1) When the blob is too big (--blob-limit) and will never be written.
> > (2) When the blob has been written to a previous, finished pack.
> >
> > You're correct that this patch will never see condition (2).
>
> Given that I proposed another way for big blobs in my previous email,
> then (1) should not be needed either.
>
> > I think my repository statistics are a little unusual.
> > Perhaps I'm getting ahead of myself here,
> > but I also wanted to experiment with writing all blobs to one set
> > of packs, and all trees, commits, and tags to another set
> > (but probably just one small pack).
> > I would use no_write for that and it would matter here.
>
> Again you should simply _not_ add objects you don't want to the list
> instead of adding them and marking them as unwanted.
Agreed, the marking just required less code change at the time.
I would like not to have to add nr_skipped and nr_actual.
Currently we have get_object_list -> traverse_commit_list ->
show_{commit,object} -> add_object_entry , which is all
called way before get_object_details -> check_object -> sha1_object_info .
Can I safely move the sha1_object_info calll earlier into
add_object_entry so I will know the size for pruning?
In your other email you mention memory consumption due to object_entry.
This structure could benefit somewhat from some attention beyond removing
my no_write.(e.g. int->short for .depth and .delta_limit; int->enum
for .preferred_base). Perhaps for later.
Thanks,
--
Dana L. How danahow@gmail.com +1 650 804 5991 cell
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 08/13] prevent try_delta from using objects not in pack
2007-04-06 3:47 ` Dana How
@ 2007-04-06 14:59 ` Nicolas Pitre
2007-04-06 18:17 ` Dana How
0 siblings, 1 reply; 8+ messages in thread
From: Nicolas Pitre @ 2007-04-06 14:59 UTC (permalink / raw)
To: Dana How; +Cc: Junio C Hamano, git
On Thu, 5 Apr 2007, Dana How wrote:
> Currently we have get_object_list -> traverse_commit_list ->
> show_{commit,object} -> add_object_entry , which is all
> called way before get_object_details -> check_object -> sha1_object_info .
> Can I safely move the sha1_object_info calll earlier into
> add_object_entry so I will know the size for pruning?
I think so, yes. And actually this call to sha1_object_info() has been
bothering me for a while.
There are currently two ways to get the list of objects to pack: one is
from stdin where objects are listed with their SHA1's, the other one
uses the internal revision walking code. In the later case we _already_
have the information that sha1_object_info() later provides making it
rather wasteful by forcing a second object header parsing.
However... aren't you more interested in the _compressed_ blob size than
its raw size?
> In your other email you mention memory consumption due to object_entry.
> This structure could benefit somewhat from some attention beyond removing
> my no_write.(e.g. int->short for .depth and .delta_limit; int->enum
> for .preferred_base). Perhaps for later.
Sure.
Nicolas
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 08/13] prevent try_delta from using objects not in pack
2007-04-06 14:59 ` Nicolas Pitre
@ 2007-04-06 18:17 ` Dana How
0 siblings, 0 replies; 8+ messages in thread
From: Dana How @ 2007-04-06 18:17 UTC (permalink / raw)
To: Nicolas Pitre; +Cc: Junio C Hamano, git, danahow
On 4/6/07, Nicolas Pitre <nico@cam.org> wrote:
> On Thu, 5 Apr 2007, Dana How wrote:
> > Currently we have get_object_list -> traverse_commit_list ->
> > show_{commit,object} -> add_object_entry , which is all
> > called way before get_object_details -> check_object -> sha1_object_info .
> > Can I safely move the sha1_object_info calll earlier into
> > add_object_entry so I will know the size for pruning?
>
> I think so, yes. And actually this call to sha1_object_info() has been
> bothering me for a while.
>
> There are currently two ways to get the list of objects to pack: one is
> from stdin where objects are listed with their SHA1's, the other one
> uses the internal revision walking code. In the later case we _already_
> have the information that sha1_object_info() later provides making it
> rather wasteful by forcing a second object header parsing.
Perhaps I'll add the early call (or actually, re-use the info if I can find it),
and change the second call to only happen if the size is zero,
since that's how it starts out in the current add_object_entry.
> However... aren't you more interested in the _compressed_ blob size than
> its raw size?
Yes! So I am leaning towards disposing of --blob-limit anyway,
which was in part a poor response to a bug in --pack-limit.
Please see the other email I sent out so far today.
Thanks,
--
Dana L. How danahow@gmail.com +1 650 804 5991 cell
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2007-04-06 18:17 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-04-05 22:35 [PATCH 08/13] prevent try_delta from using objects not in pack Dana How
2007-04-05 23:04 ` Junio C Hamano
2007-04-06 1:02 ` Nicolas Pitre
2007-04-06 2:28 ` Dana How
2007-04-06 3:27 ` Nicolas Pitre
2007-04-06 3:47 ` Dana How
2007-04-06 14:59 ` Nicolas Pitre
2007-04-06 18:17 ` Dana How
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).