git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* repack: handling of .keep files
@ 2007-05-04  9:25 Alex Riesen
  2007-05-04  9:56 ` Junio C Hamano
  0 siblings, 1 reply; 11+ messages in thread
From: Alex Riesen @ 2007-05-04  9:25 UTC (permalink / raw)
  To: Git Mailing List

It does not seem to be consistent with the name:
"git repack -a -d" removes .keep files anyway.

How can I pin down a pack file so that it is never repacked?
(Windows doesn't like big files, they get fragmented heavily
and are hard to read and defragment. So I try keep them
relatively small).

P.S. Experimenting with the .keep-files I had a crash in git-log,
when the pack was renamed into .keep.pack, but the
index was not. git-log complained about two objects it could not
read and than crashed. It's cygwin.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: repack: handling of .keep files
  2007-05-04  9:25 repack: handling of .keep files Alex Riesen
@ 2007-05-04  9:56 ` Junio C Hamano
  2007-05-04 10:42   ` Alex Riesen
  0 siblings, 1 reply; 11+ messages in thread
From: Junio C Hamano @ 2007-05-04  9:56 UTC (permalink / raw)
  To: Alex Riesen; +Cc: Git Mailing List

"Alex Riesen" <raa.lkml@gmail.com> writes:

> ... Experimenting with the .keep-files I had a crash in git-log,
> when the pack was renamed into .keep.pack, but the
> index was not. git-log complained about two objects it could not
> read and than crashed. It's cygwin.

This part makes me suspect you are not even using the .keep
properly.  In addition to pack-[0-9a-f]{40}.(pack|idx), you
would have a corresponding pack-[0-9a-f]{40}.keep file (whose
contents does not matter) to mark that these should not get
repacked.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: repack: handling of .keep files
  2007-05-04  9:56 ` Junio C Hamano
@ 2007-05-04 10:42   ` Alex Riesen
  2007-05-04 16:59     ` Junio C Hamano
  0 siblings, 1 reply; 11+ messages in thread
From: Alex Riesen @ 2007-05-04 10:42 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git Mailing List

On 5/4/07, Junio C Hamano <junkio@cox.net> wrote:
> "Alex Riesen" <raa.lkml@gmail.com> writes:
>
> > ... Experimenting with the .keep-files I had a crash in git-log,
> > when the pack was renamed into .keep.pack, but the
> > index was not. git-log complained about two objects it could not
> > read and than crashed. It's cygwin.
>
> This part makes me suspect you are not even using the .keep
> properly.  In addition to pack-[0-9a-f]{40}.(pack|idx), you
> would have a corresponding pack-[0-9a-f]{40}.keep file (whose
> contents does not matter) to mark that these should not get
> repacked.

This is it (described in git-index-pack.txt, maybe git-repack.txt should
reference it too), thanks.

Still, git-log shouldn't crash (nothing should, of course).
And, the temporary pack is created in working tree, instead of in GIT_DIR
(why not GIT_OBJECT_DIR, btw?)

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: repack: handling of .keep files
  2007-05-04 10:42   ` Alex Riesen
@ 2007-05-04 16:59     ` Junio C Hamano
  2007-05-04 17:24       ` Alex Riesen
  0 siblings, 1 reply; 11+ messages in thread
From: Junio C Hamano @ 2007-05-04 16:59 UTC (permalink / raw)
  To: Alex Riesen; +Cc: Git Mailing List

"Alex Riesen" <raa.lkml@gmail.com> writes:

> Still, git-log shouldn't crash (nothing should, of course).

Honestly, I think that's borderline.  If you "dd if=/dev/random
of=/dev/hda", should the kernel keep going, perhaps gracefully
declining access to the filesystem on that drive?

> And, the temporary pack is created in working tree, instead of in GIT_DIR
> (why not GIT_OBJECT_DIR, btw?)

Not limited to the temporary pack (the detail of exact use
pattern escapes me -- I do not think it was temporary pack that
initiated the use of GIT_DIR for temporary things), I think
trying to not create things in the working tree came after
somebody who had a read-only (to him, not necessarily to
everybody) working tree and read-write GIT_DIR (separate
location, specified with the environment variable) had trouble.
At least the theory was GIT_DIR would be writable as long as you
are doing a "write" oriented operation, regardless of what.

Back then we did not support working in subdirectory of the
project as fully as we do now, so using such configuration was
not much less convenient than you have the normal layout of
having $project/.git directory at the top of the working tree.

While I do not think it is worth to try "supporting" use of
read-only working tree for write oriented operations, which
means that it should be safe to assume that the working tree is
writable and we _could_ create temporary pack in working tree
instead, I do not think the "_could_" means we should.  In the
case of temporary pack I do not think there would be a risk of
filename collisions, I think it makes sense to use either
GIT_DIR or GIT_OBJECT_DIRECTORY instead of the working tree.  

I do not know pros-and-cons between .git/ and .git/objects/;
filesystems tend to cluster nearby things better, so the latter
might be more logical, but packs are about using smaller (much
much much smaller) number of files than you would use otherwise
to store objects _and_ keeping them in use, so I suspect it
would not make much practical difference even if we try to
encourage the filesystem to allocate the disk blocks for new
pack near existing packs by creating the temporary file.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: repack: handling of .keep files
  2007-05-04 16:59     ` Junio C Hamano
@ 2007-05-04 17:24       ` Alex Riesen
  2007-05-04 19:20         ` Junio C Hamano
                           ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Alex Riesen @ 2007-05-04 17:24 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git Mailing List

On 5/4/07, Junio C Hamano <junkio@cox.net> wrote:
> "Alex Riesen" <raa.lkml@gmail.com> writes:
>
> > Still, git-log shouldn't crash (nothing should, of course).
>
> Honestly, I think that's borderline.  If you "dd if=/dev/random
> of=/dev/hda", should the kernel keep going, perhaps gracefully
> declining access to the filesystem on that drive?

e2fsck has a test somewhere which randomly corrupts a partition
and then lets the program fix it.
All kind of corruptions happen, we will have to deal with them.
Especially if this crash is so simple to reproduce.

> case of temporary pack I do not think there would be a risk of
> filename collisions, I think it makes sense to use either
> GIT_DIR or GIT_OBJECT_DIRECTORY instead of the working tree.
>
> I do not know pros-and-cons between .git/ and .git/objects/;

These are settable separately, so theoretically you can end
up with .git and .git/objects being on different filesystems.
Atomic rename wont be possible than.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: repack: handling of .keep files
  2007-05-04 17:24       ` Alex Riesen
@ 2007-05-04 19:20         ` Junio C Hamano
       [not found]           ` <20070507173324.GA3436@steel.home>
  2007-05-04 19:40         ` repack: handling of .keep files Nicolas Pitre
  2007-05-04 21:54         ` [PATCH] Handle return code of parse_commit in revision machinery Alex Riesen
  2 siblings, 1 reply; 11+ messages in thread
From: Junio C Hamano @ 2007-05-04 19:20 UTC (permalink / raw)
  To: Alex Riesen; +Cc: Git Mailing List

"Alex Riesen" <raa.lkml@gmail.com> writes:

> On 5/4/07, Junio C Hamano <junkio@cox.net> wrote:
> ...
>> I do not know pros-and-cons between .git/ and .git/objects/;
>
> These are settable separately, so theoretically you can end
> up with .git and .git/objects being on different filesystems.
> Atomic rename wont be possible than.

Fair enough.

As we've lived without trouble long enough creating the new pack
tempfile in $GIT_DIR, I do not think it is urgent; but feel free
to propose moving it to GIT_OBJECT_DIRECTORY.  We _might_ need
to teach fsck to take notice of such leftover temporary files
from an earlier, aborted run if we do this -- I haven't looked.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: repack: handling of .keep files
  2007-05-04 17:24       ` Alex Riesen
  2007-05-04 19:20         ` Junio C Hamano
@ 2007-05-04 19:40         ` Nicolas Pitre
  2007-05-04 21:54         ` [PATCH] Handle return code of parse_commit in revision machinery Alex Riesen
  2 siblings, 0 replies; 11+ messages in thread
From: Nicolas Pitre @ 2007-05-04 19:40 UTC (permalink / raw)
  To: Alex Riesen; +Cc: Junio C Hamano, Git Mailing List

On Fri, 4 May 2007, Alex Riesen wrote:

> On 5/4/07, Junio C Hamano <junkio@cox.net> wrote:
> > "Alex Riesen" <raa.lkml@gmail.com> writes:
> > 
> > > Still, git-log shouldn't crash (nothing should, of course).
> > 
> > Honestly, I think that's borderline.  If you "dd if=/dev/random
> > of=/dev/hda", should the kernel keep going, perhaps gracefully
> > declining access to the filesystem on that drive?
> 
> e2fsck has a test somewhere which randomly corrupts a partition
> and then lets the program fix it.
> All kind of corruptions happen, we will have to deal with them.
> Especially if this crash is so simple to reproduce.
> 
> > case of temporary pack I do not think there would be a risk of
> > filename collisions, I think it makes sense to use either
> > GIT_DIR or GIT_OBJECT_DIRECTORY instead of the working tree.
> > 
> > I do not know pros-and-cons between .git/ and .git/objects/;
> 
> These are settable separately, so theoretically you can end
> up with .git and .git/objects being on different filesystems.
> Atomic rename wont be possible than.

Other temporary pack/object creation instances already use 
GIT_OBJECT_DIRECTORY.


Nicolas

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH] Handle return code of parse_commit in revision machinery
  2007-05-04 17:24       ` Alex Riesen
  2007-05-04 19:20         ` Junio C Hamano
  2007-05-04 19:40         ` repack: handling of .keep files Nicolas Pitre
@ 2007-05-04 21:54         ` Alex Riesen
  2 siblings, 0 replies; 11+ messages in thread
From: Alex Riesen @ 2007-05-04 21:54 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git Mailing List

This fixes a crash in broken repositories where random commits
suddenly disappear.

Signed-off-by: Alex Riesen <raa.lkml@gmail.com>
---

... For example when loosing one of the packs by renaming it into a
random name while trying to figure out how to protect it from deletion
by "git-repack -a -d". Lesson learned, code fixed, tests passed.

 revision.c |   75 +++++++++++++++++++++++++++++++++++++++++------------------
 revision.h |    2 +-
 2 files changed, 53 insertions(+), 24 deletions(-)

diff --git a/revision.c b/revision.c
index e60a26c..b4c494d 100644
--- a/revision.c
+++ b/revision.c
@@ -318,7 +318,10 @@ static void try_to_simplify_commit(struct rev_info *revs, struct commit *commit)
 	while ((parent = *pp) != NULL) {
 		struct commit *p = parent->item;
 
-		parse_commit(p);
+		if (parse_commit(p) < 0)
+			die("cannot simplify commit %s (because of %s)",
+			    sha1_to_hex(commit->object.sha1),
+			    sha1_to_hex(p->object.sha1));
 		switch (rev_compare_tree(revs, p->tree, commit->tree)) {
 		case REV_TREE_SAME:
 			tree_same = 1;
@@ -347,7 +350,10 @@ static void try_to_simplify_commit(struct rev_info *revs, struct commit *commit)
 				 * IOW, we pretend this parent is a
 				 * "root" commit.
 				 */
-				parse_commit(p);
+				if(parse_commit(p) < 0)
+					die("cannot simplify commit %s (invalid %s)",
+					    sha1_to_hex(commit->object.sha1),
+					    sha1_to_hex(p->object.sha1));
 				p->parents = NULL;
 			}
 		/* fallthrough */
@@ -362,14 +368,14 @@ static void try_to_simplify_commit(struct rev_info *revs, struct commit *commit)
 		commit->object.flags |= TREECHANGE;
 }
 
-static void add_parents_to_list(struct rev_info *revs, struct commit *commit, struct commit_list **list)
+static int add_parents_to_list(struct rev_info *revs, struct commit *commit, struct commit_list **list)
 {
 	struct commit_list *parent = commit->parents;
 	unsigned left_flag;
 	int add, rest;
 
 	if (commit->object.flags & ADDED)
-		return;
+		return 0;
 	commit->object.flags |= ADDED;
 
 	/*
@@ -388,7 +394,8 @@ static void add_parents_to_list(struct rev_info *revs, struct commit *commit, st
 		while (parent) {
 			struct commit *p = parent->item;
 			parent = parent->next;
-			parse_commit(p);
+			if (parse_commit(p) < 0)
+				return -1;
 			p->object.flags |= UNINTERESTING;
 			if (p->parents)
 				mark_parents_uninteresting(p);
@@ -397,7 +404,7 @@ static void add_parents_to_list(struct rev_info *revs, struct commit *commit, st
 			p->object.flags |= SEEN;
 			insert_by_date(p, list);
 		}
-		return;
+		return 0;
 	}
 
 	/*
@@ -409,7 +416,7 @@ static void add_parents_to_list(struct rev_info *revs, struct commit *commit, st
 		revs->prune_fn(revs, commit);
 
 	if (revs->no_walk)
-		return;
+		return 0;
 
 	left_flag = (commit->object.flags & SYMMETRIC_LEFT);
 
@@ -418,7 +425,8 @@ static void add_parents_to_list(struct rev_info *revs, struct commit *commit, st
 		struct commit *p = parent->item;
 
 		parent = parent->next;
-		parse_commit(p);
+		if (parse_commit(p) < 0)
+			return -1;
 		p->object.flags |= left_flag;
 		if (p->object.flags & SEEN)
 			continue;
@@ -426,6 +434,7 @@ static void add_parents_to_list(struct rev_info *revs, struct commit *commit, st
 		if (add)
 			insert_by_date(p, list);
 	}
+	return 0;
 }
 
 static void cherry_pick_list(struct commit_list *list)
@@ -508,7 +517,7 @@ static void cherry_pick_list(struct commit_list *list)
 	free_patch_ids(&ids);
 }
 
-static void limit_list(struct rev_info *revs)
+static int limit_list(struct rev_info *revs)
 {
 	struct commit_list *list = revs->commits;
 	struct commit_list *newlist = NULL;
@@ -524,7 +533,8 @@ static void limit_list(struct rev_info *revs)
 
 		if (revs->max_age != -1 && (commit->date < revs->max_age))
 			obj->flags |= UNINTERESTING;
-		add_parents_to_list(revs, commit, &list);
+		if (add_parents_to_list(revs, commit, &list) < 0)
+			return -1;
 		if (obj->flags & UNINTERESTING) {
 			mark_parents_uninteresting(commit);
 			if (everybody_uninteresting(list))
@@ -539,6 +549,7 @@ static void limit_list(struct rev_info *revs)
 		cherry_pick_list(newlist);
 
 	revs->commits = newlist;
+	return 0;
 }
 
 struct all_refs_cb {
@@ -1227,7 +1238,7 @@ int setup_revisions(int argc, const char **argv, struct rev_info *revs, const ch
 	return left;
 }
 
-void prepare_revision_walk(struct rev_info *revs)
+int prepare_revision_walk(struct rev_info *revs)
 {
 	int nr = revs->pending.nr;
 	struct object_array_entry *e, *list;
@@ -1249,42 +1260,59 @@ void prepare_revision_walk(struct rev_info *revs)
 	free(list);
 
 	if (revs->no_walk)
-		return;
+		return 0;
 	if (revs->limited)
-		limit_list(revs);
+		if (limit_list(revs) < 0)
+			return -1;
 	if (revs->topo_order)
 		sort_in_topological_order_fn(&revs->commits, revs->lifo,
 					     revs->topo_setter,
 					     revs->topo_getter);
+	return 0;
 }
 
-static int rewrite_one(struct rev_info *revs, struct commit **pp)
+enum rewrite_result
+{
+	rewrite_one_ok,
+	rewrite_one_noparents,
+	rewrite_one_error,
+};
+
+static enum rewrite_result rewrite_one(struct rev_info *revs, struct commit **pp)
 {
 	for (;;) {
 		struct commit *p = *pp;
 		if (!revs->limited)
-			add_parents_to_list(revs, p, &revs->commits);
+			if (add_parents_to_list(revs, p, &revs->commits) < 0)
+				return rewrite_one_error;
 		if (p->parents && p->parents->next)
-			return 0;
+			return rewrite_one_ok;
 		if (p->object.flags & (TREECHANGE | UNINTERESTING))
-			return 0;
+			return rewrite_one_ok;
 		if (!p->parents)
-			return -1;
+			return rewrite_one_noparents;
 		*pp = p->parents->item;
 	}
 }
 
-static void rewrite_parents(struct rev_info *revs, struct commit *commit)
+static int rewrite_parents(struct rev_info *revs, struct commit *commit)
 {
 	struct commit_list **pp = &commit->parents;
 	while (*pp) {
 		struct commit_list *parent = *pp;
-		if (rewrite_one(revs, &parent->item) < 0) {
+		switch (rewrite_one(revs, &parent->item))
+		{
+		case rewrite_one_ok:
+			break;
+		case rewrite_one_noparents:
 			*pp = parent->next;
 			continue;
+		case rewrite_one_error:
+			return -1;
 		}
 		pp = &parent->next;
 	}
+	return 0;
 }
 
 static int commit_match(struct commit *commit, struct rev_info *opt)
@@ -1320,7 +1348,8 @@ static struct commit *get_revision_1(struct rev_info *revs)
 			if (revs->max_age != -1 &&
 			    (commit->date < revs->max_age))
 				continue;
-			add_parents_to_list(revs, commit, &revs->commits);
+			if (add_parents_to_list(revs, commit, &revs->commits) < 0)
+				return NULL;
 		}
 		if (commit->object.flags & SHOWN)
 			continue;
@@ -1348,8 +1377,8 @@ static struct commit *get_revision_1(struct rev_info *revs)
 				if (!commit->parents || !commit->parents->next)
 					continue;
 			}
-			if (revs->parents)
-				rewrite_parents(revs, commit);
+			if (revs->parents && rewrite_parents(revs, commit) < 0)
+				return NULL;
 		}
 		return commit;
 	} while (revs->commits);
diff --git a/revision.h b/revision.h
index cdf94ad..2845167 100644
--- a/revision.h
+++ b/revision.h
@@ -113,7 +113,7 @@ extern void init_revisions(struct rev_info *revs, const char *prefix);
 extern int setup_revisions(int argc, const char **argv, struct rev_info *revs, const char *def);
 extern int handle_revision_arg(const char *arg, struct rev_info *revs,int flags,int cant_be_filename);
 
-extern void prepare_revision_walk(struct rev_info *revs);
+extern int prepare_revision_walk(struct rev_info *revs);
 extern struct commit *get_revision(struct rev_info *revs);
 
 extern void mark_parents_uninteresting(struct commit *commit);
-- 
1.5.2.rc1.21.g80e79

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH] Use GIT_OBJECT_DIR for temporary files of pack-objects
       [not found]           ` <20070507173324.GA3436@steel.home>
@ 2007-05-07 17:51             ` Dana How
  2007-05-07 21:33               ` Alex Riesen
  0 siblings, 1 reply; 11+ messages in thread
From: Dana How @ 2007-05-07 17:51 UTC (permalink / raw)
  To: Alex Riesen; +Cc: git, Junio C Hamano, danahow

On 5/7/07, Alex Riesen <raa.lkml@gmail.com> wrote:
> I'm not sure about fsck cleaning up after crashed/killed pack-objects:
> not sure I _can_ detect if the temp files really are just leftovers.

It looks like you create temp file in objects , not objects/pack .

So a rule could be : packs left in the former are crashed/killed,
and packs in the latter are complete?

You should also look at $PACKTMP in git-repack.sh .
In it $GIT_DIR should probably be $GIT_OBJECT_DIRECTORY ?

Junio:
This patch touches the same lines as the --max-pack-size patch.
What do you want to do with the latter?

Thanks,
-- 
Dana L. How  danahow@gmail.com  +1 650 804 5991 cell

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] Use GIT_OBJECT_DIR for temporary files of pack-objects
  2007-05-07 17:51             ` [PATCH] Use GIT_OBJECT_DIR for temporary files of pack-objects Dana How
@ 2007-05-07 21:33               ` Alex Riesen
  2007-05-07 21:59                 ` Dana How
  0 siblings, 1 reply; 11+ messages in thread
From: Alex Riesen @ 2007-05-07 21:33 UTC (permalink / raw)
  To: Dana How; +Cc: git, Junio C Hamano

Dana How, Mon, May 07, 2007 19:51:24 +0200:
> On 5/7/07, Alex Riesen <raa.lkml@gmail.com> wrote:
> >I'm not sure about fsck cleaning up after crashed/killed pack-objects:
> >not sure I _can_ detect if the temp files really are just leftovers.
> 
> It looks like you create temp file in objects , not objects/pack .
> 
> So a rule could be : packs left in the former are crashed/killed,
> and packs in the latter are complete?

How do you know for sure that they are _left_ already?

> You should also look at $PACKTMP in git-repack.sh .
> In it $GIT_DIR should probably be $GIT_OBJECT_DIRECTORY ?

I think it should, but it is already not the working directory, so my
original complaint does not apply

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] Use GIT_OBJECT_DIR for temporary files of pack-objects
  2007-05-07 21:33               ` Alex Riesen
@ 2007-05-07 21:59                 ` Dana How
  0 siblings, 0 replies; 11+ messages in thread
From: Dana How @ 2007-05-07 21:59 UTC (permalink / raw)
  To: Alex Riesen; +Cc: git, Junio C Hamano, danahow

On 5/7/07, Alex Riesen <raa.lkml@gmail.com> wrote:
> Dana How, Mon, May 07, 2007 19:51:24 +0200:
> > On 5/7/07, Alex Riesen <raa.lkml@gmail.com> wrote:
> > >I'm not sure about fsck cleaning up after crashed/killed pack-objects:
> > >not sure I _can_ detect if the temp files really are just leftovers.
> >
> > It looks like you create temp file in objects , not objects/pack .
> >
> > So a rule could be : packs left in the former are crashed/killed,
> > and packs in the latter are complete?
>
> How do you know for sure that they are _left_ already?
Ah, e.g. you're packing and fsck'ing at the same time?
Then you can't know.  Never mind...

> > You should also look at $PACKTMP in git-repack.sh .
> > In it $GIT_DIR should probably be $GIT_OBJECT_DIRECTORY ?
>
> I think it should, but it is already not the working directory, so my
> original complaint does not apply
OK, if I have to further change my --max-pack-size patchset
which touches git-repack.sh, I'll roll this in.

Thanks,
-- 
Dana L. How  danahow@gmail.com  +1 650 804 5991 cell

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2007-05-07 21:59 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-05-04  9:25 repack: handling of .keep files Alex Riesen
2007-05-04  9:56 ` Junio C Hamano
2007-05-04 10:42   ` Alex Riesen
2007-05-04 16:59     ` Junio C Hamano
2007-05-04 17:24       ` Alex Riesen
2007-05-04 19:20         ` Junio C Hamano
     [not found]           ` <20070507173324.GA3436@steel.home>
2007-05-07 17:51             ` [PATCH] Use GIT_OBJECT_DIR for temporary files of pack-objects Dana How
2007-05-07 21:33               ` Alex Riesen
2007-05-07 21:59                 ` Dana How
2007-05-04 19:40         ` repack: handling of .keep files Nicolas Pitre
2007-05-04 21:54         ` [PATCH] Handle return code of parse_commit in revision machinery Alex Riesen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).