git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 2/2] Flag and skip over packfiles known to be invalid.
       [not found] <dda240a4adf0511b3e1ab1eb74abdd28821358b0.1170403175.git.spearce@spearce.org>
@ 2007-02-02  8:00 ` Shawn O. Pearce
  2007-02-02  8:34   ` Junio C Hamano
  0 siblings, 1 reply; 5+ messages in thread
From: Shawn O. Pearce @ 2007-02-02  8:00 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

If we've already looked at a packfile and determined it isn't
valid/usable as a pack, we shouldn't try to use it again in
the future either.  This avoids multiple error messages from
the same packfile.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
 cache.h     |    1 +
 sha1_file.c |    5 +++++
 2 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/cache.h b/cache.h
index 9873ee9..64c2d3f 100644
--- a/cache.h
+++ b/cache.h
@@ -357,6 +357,7 @@ extern struct packed_git {
 	off_t pack_size;
 	int pack_fd;
 	int pack_local;
+	unsigned invalid:1;
 	unsigned char sha1[20];
 	/* something like ".git/objects/pack/xxxxx.pack" */
 	char pack_name[FLEX_ARRAY]; /* more */
diff --git a/sha1_file.c b/sha1_file.c
index ba1c799..9f6e94e 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -714,6 +714,7 @@ struct packed_git *add_packed_git(char *path, int path_len, int local)
 	p->next = NULL;
 	p->windows = NULL;
 	p->pack_fd = -1;
+	p->invalid = 0;
 	p->pack_local = local;
 	if ((path_len > 44) && !get_sha1_hex(path + path_len - 44, sha1))
 		hashcpy(p->sha1, sha1);
@@ -746,6 +747,7 @@ struct packed_git *parse_pack_index_file(const unsigned char *sha1, char *idx_pa
 	p->next = NULL;
 	p->windows = NULL;
 	p->pack_fd = -1;
+	p->invalid = 0;
 	hashcpy(p->sha1, sha1);
 	return p;
 }
@@ -1395,6 +1397,8 @@ static int find_pack_entry(const unsigned char *sha1, struct pack_entry *e, cons
 	prepare_packed_git();
 
 	for (p = packed_git; p; p = p->next) {
+		if (p->invalid)
+			continue;
 		if (ignore_packed) {
 			const char **ig;
 			for (ig = ignore_packed; *ig; ig++)
@@ -1418,6 +1422,7 @@ static int find_pack_entry(const unsigned char *sha1, struct pack_entry *e, cons
 					close(p->pack_fd);
 					p->pack_fd = -1;
 				}
+				p->invalid = 1;
 				error("packfile %s cannot be accessed", p->pack_name);
 				continue;
 			}
-- 
1.5.0.rc3.1.ge4b0e

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH 2/2] Flag and skip over packfiles known to be invalid.
  2007-02-02  8:00 ` [PATCH 2/2] Flag and skip over packfiles known to be invalid Shawn O. Pearce
@ 2007-02-02  8:34   ` Junio C Hamano
  2007-02-02  8:51     ` Shawn O. Pearce
  0 siblings, 1 reply; 5+ messages in thread
From: Junio C Hamano @ 2007-02-02  8:34 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: git

"Shawn O. Pearce" <spearce@spearce.org> writes:

> If we've already looked at a packfile and determined it isn't
> valid/usable as a pack, we shouldn't try to use it again in
> the future either.  This avoids multiple error messages from
> the same packfile.

I haven't looked your patch very closely, but two successive
"git-repack -a -d" would produce the packfile under the same
name, but with different parameters (--window or --depth), the
old .idx and new .pack may not match.

Can this be possible (time flows from top to bottom):

	git-repack -a -d		somebody else

	produces a new .pack/.idx

        mv new .pack to objects/pack

					mmap .idx (old)
					finds .pack does not match

	mv new .idx to objects/pack
	
Is this something we care about?

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH 2/2] Flag and skip over packfiles known to be invalid.
  2007-02-02  8:34   ` Junio C Hamano
@ 2007-02-02  8:51     ` Shawn O. Pearce
  2007-02-03  5:44       ` Junio C Hamano
  0 siblings, 1 reply; 5+ messages in thread
From: Shawn O. Pearce @ 2007-02-02  8:51 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

Junio C Hamano <junkio@cox.net> wrote:
> I haven't looked your patch very closely, but two successive
> "git-repack -a -d" would produce the packfile under the same
> name, but with different parameters (--window or --depth), the
> old .idx and new .pack may not match.
> 
> Can this be possible (time flows from top to bottom):
> 
> 	git-repack -a -d		somebody else
> 
> 	produces a new .pack/.idx
> 
>         mv new .pack to objects/pack
> 
> 					mmap .idx (old)
> 					finds .pack does not match
> 
> 	mv new .idx to objects/pack

Gah.  That's sick.  And *so* totally possible.
 	
> Is this something we care about?
 
Yes and no.

I've pushed the idea that `git repack -a -d` is totally safe to
run in a live repository.  Its clearly not.  *Especially* if you
repack with the same set of objects (as you do above) or if you
repack in rapid succession faster than readers can make progress.
I think both of these are just outside of the realm of "good ideas
to do to live things".

Its like taking your cat to the zoo play with a lion.  Maybe not
the best thing for the cat...


In this particular case that you highlight above this patch would
mark the pack as invalid and never permit it to be accessed again
by the 'somebody else'.

But Git has always been broken for this case.  prepare_packed_git_one
would refuse to open the new index, as it already has a packed_git.
Prior to this patch we'd try to access the packfile and die(),
as it did not match the index that we have.  Now we'd at least
try to locate some other source, or die() over a missing object,
but only after listing the idx/pack mismatch error too.

The only fix for the case above is to unlink the invalid packed_git
from packed_git, so that if we fail to find the object and wind up
calling reprepare_packed_git() we can reload the new index.  I did
consider doing that over the invalid flag, but flipped a coin and
added invalid.  FWIW, maybe invalid is the wrong way to go.


We still have a problem though, as the above case could still
happen to us during the fallback search in read_sha1_file().
In other words two searches ain't enough.  As Dscho pointed out in
the original thread, we probably need to loop until no new packs
are identified before giving up.

I almost submitted a patch to do that tonight, but I couldn't decide
on behavior: should we scan known packs, then try for loose, then
scan packs again until no object or no new pack is found?  Probably.

-- 
Shawn.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH 2/2] Flag and skip over packfiles known to be invalid.
  2007-02-02  8:51     ` Shawn O. Pearce
@ 2007-02-03  5:44       ` Junio C Hamano
  2007-02-04  5:02         ` Shawn O. Pearce
  0 siblings, 1 reply; 5+ messages in thread
From: Junio C Hamano @ 2007-02-03  5:44 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: git

"Shawn O. Pearce" <spearce@spearce.org> writes:

> I almost submitted a patch to do that tonight, but I couldn't decide
> on behavior: should we scan known packs, then try for loose, then
> scan packs again until no object or no new pack is found?  Probably.

Hmmm.  Probably.  

But I tend to think that this particular failure scenario is
probably rare enough that plugging this in "the right way" is
not a high priority.  We should definitely revisit it post
1.5.0.

Also if we are adding a bitfield, I think pack_local should also
become one, as it currently wastes a whole word to hold one bit
(on the other hand if we do not want to add a field I think a
different negative value in pack_fd could mean "do not bother to
look at it again").

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH 2/2] Flag and skip over packfiles known to be invalid.
  2007-02-03  5:44       ` Junio C Hamano
@ 2007-02-04  5:02         ` Shawn O. Pearce
  0 siblings, 0 replies; 5+ messages in thread
From: Shawn O. Pearce @ 2007-02-04  5:02 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

Junio C Hamano <junkio@cox.net> wrote:
> "Shawn O. Pearce" <spearce@spearce.org> writes:
> 
> > I almost submitted a patch to do that tonight, but I couldn't decide
> > on behavior: should we scan known packs, then try for loose, then
> > scan packs again until no object or no new pack is found?  Probably.
> 
> Hmmm.  Probably.  
> 
> But I tend to think that this particular failure scenario is
> probably rare enough that plugging this in "the right way" is
> not a high priority.  We should definitely revisit it post
> 1.5.0.

Indeed.  I'll come back to it after 1.5.0 is out.
 
> Also if we are adding a bitfield, I think pack_local should also
> become one, as it currently wastes a whole word to hold one bit
> (on the other hand if we do not want to add a field I think a
> different negative value in pack_fd could mean "do not bother to
> look at it again").

Good point. I forgot about that ~4 byte boolean hanging around.
As a comment on the TDWTF might say, "Yes, No, FileNotFound, 42,
192, 1088, ... these are all valid values for pack_local!"  :-)

-- 
Shawn.

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2007-02-04  5:02 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <dda240a4adf0511b3e1ab1eb74abdd28821358b0.1170403175.git.spearce@spearce.org>
2007-02-02  8:00 ` [PATCH 2/2] Flag and skip over packfiles known to be invalid Shawn O. Pearce
2007-02-02  8:34   ` Junio C Hamano
2007-02-02  8:51     ` Shawn O. Pearce
2007-02-03  5:44       ` Junio C Hamano
2007-02-04  5:02         ` Shawn O. Pearce

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).