git.vger.kernel.org archive mirror
* git-relink status (or bug?)
@ 2008-06-21 10:36 Marc Zonzon
  2008-06-21 19:22 ` Junio C Hamano
  0 siblings, 1 reply; 3+ messages in thread
From: Marc Zonzon @ 2008-06-21 10:36 UTC
  To: git

When trying to use git-relink, I found it quite disappointing when
going over packs. git-relink seems to assume that there is
a unique mapping from object name to object identity, which is of
course acceptable for loose objects, which are named by their SHA-1,
but false for .pack and .idx files: two packs with the same name
contain the same objects, but may not be packed in the same order or
with the same compression.
Moreover, a .idx file cannot be considered alone; it depends on the
associated .pack.

When you happen to have two different packs with the same name
but different sizes, git-relink does not hard-link the .pack files
because their sizes differ, but it does hard-link the .idx files. Your
repository is then corrupted.
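
The failure described above can be illustrated with a minimal sketch,
written in Python rather than the script's Perl purely for
illustration. The file names and byte contents are made-up stand-ins,
and size_only_would_link() and simulate() are hypothetical helpers that
only mimic a size-only test applied to each file independently; this is
not the actual git-relink code.

```python
import os
import tempfile


def size_only_would_link(src, dst):
    """Mimic a size-only test: link when both files have the same size."""
    return os.stat(src).st_size == os.stat(dst).st_size


def simulate(files):
    """files maps a name to (source bytes, destination bytes).

    Returns the names that a size-only check would hard-link.
    """
    linked = []
    with tempfile.TemporaryDirectory() as tmp:
        for name, (src_bytes, dst_bytes) in files.items():
            src = os.path.join(tmp, "src-" + name)
            dst = os.path.join(tmp, "dst-" + name)
            with open(src, "wb") as f:
                f.write(src_bytes)
            with open(dst, "wb") as f:
                f.write(dst_bytes)
            if size_only_would_link(src, dst):
                linked.append(name)
    return linked


# Stand-in data: the two packs differ in size, while the two index
# files have the same size but different contents.
result = simulate({
    "pack-abc.pack": (b"P" * 100, b"Q" * 120),
    "pack-abc.idx": (b"I" * 40, b"J" * 40),
})
print(result)  # ['pack-abc.idx']
```

Only the .idx is selected for linking, so the destination would end up
with the source's index next to its own, different, pack.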

It happens when you clone a repository, repack the clone, and relink
the clone to the original one.

I found very little information about git-relink, but as it appears in
the changelog of v1.5.4 I suppose it is not obsolete.

What is the status of this script?

Marc


* Re: git-relink status (or bug?)
  2008-06-21 10:36 git-relink status (or bug?) Marc Zonzon
@ 2008-06-21 19:22 ` Junio C Hamano
  2008-06-21 20:23   ` marc zonzon
  0 siblings, 1 reply; 3+ messages in thread
From: Junio C Hamano @ 2008-06-21 19:22 UTC
  To: Marc Zonzon; +Cc: git

Marc Zonzon <marc.zonzon+git@gmail.com> writes:

> I found very little information about git-relink, but as it appears in
> the changelog of v1.5.4 I suppose it is not obsolete.

I do not think anybody uses it these days.  Instead they either clone
with --reference (or -s), or perhaps use new-workdir.

Here is a totally untested fix.

The "careful" part can be made much more clever and efficient by learning
implementation details about the .idx file (it has the checksum for itself
and the checksum for its .pack file at the end) but I did not bother.

I do not think this in its current shape is committable, without
improvements and success reports from the list.  Hint, hint...


 git-relink.perl |   26 ++++++++++++++++++--------
 1 files changed, 18 insertions(+), 8 deletions(-)

diff --git a/git-relink.perl b/git-relink.perl
index 15fb932..68e0f0e 100755
--- a/git-relink.perl
+++ b/git-relink.perl
@@ -10,10 +10,11 @@ use 5.006;
 use strict;
 use warnings;
 use Getopt::Long;
+use File::Compare;
 
 sub get_canonical_form($);
 sub do_scan_directory($$$);
-sub compare_two_files($$);
+sub compare_and_link($$$);
 sub usage();
 sub link_two_files($$);
 
@@ -67,6 +68,7 @@ sub do_scan_directory($$$) {
 
 	my $sfulldir = sprintf("%sobjects/%s/",$srcdir,$subdir);
 	my $dfulldir = sprintf("%sobjects/%s/",$dstdir,$subdir);
+	my $careful = ($subdir eq 'pack');
 
 	opendir(S,$sfulldir)
 		or die "Failed to opendir $sfulldir: $!";
@@ -75,14 +77,14 @@ sub do_scan_directory($$$) {
 		my $sfilename = $sfulldir . $file;
 		my $dfilename = $dfulldir . $file;
 
-		compare_two_files($sfilename,$dfilename);
+		compare_and_link($sfilename, $dfilename, $careful);
 
 	}
 	closedir(S);
 }
 
-sub compare_two_files($$) {
-	my ($sfilename, $dfilename) = @_;
+sub compare_and_link($$$) {
+	my ($sfilename, $dfilename, $careful) = @_;
 
 	# Perl's stat returns relevant information as follows:
 	# 0 = dev number
@@ -100,12 +102,20 @@ sub compare_two_files($$) {
 
 	if ( ($sstatinfo[0] == $dstatinfo[0]) &&
 	     ($sstatinfo[1] != $dstatinfo[1])) {
-		if ($sstatinfo[7] == $dstatinfo[7]) {
+		my $differs = undef;
+		if ($sstatinfo[7] != $dstatinfo[7]) {
+			$differs = "size";
+		}
+		if (!$differs && $careful) {
+			if (File::Compare::compare($sfilename, $dfilename)) {
+				$differs = "contents";
+			}
+		}
+		if (!$differs) {
 			link_two_files($sfilename, $dfilename);
-
 		} else {
-			my $err = sprintf("ERROR: File sizes are not the same, cannot relink %s to %s.\n",
-				$sfilename, $dfilename);
+			my $err = sprintf("ERROR: File differs (%s), cannot relink %s to %s.\n",
+					  $differs, $sfilename, $dfilename);
 			if ($fail_on_different_sizes) {
 				die $err;
 			} else {
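
The cheaper "careful" check hinted at above could look roughly like the
following sketch, again in Python for illustration only. It assumes
SHA-1 repositories (20-byte checksums), and assumes that a .pack ends
with the checksum of its own contents while a .idx ends with the
checksum of its .pack followed by the checksum of the .idx itself. The
helper names are hypothetical, and this is not what the patch above
does (the patch compares whole files with File::Compare).

```python
import os

HASH_LEN = 20  # SHA-1 digest size; an assumption about the repository


def read_tail(path, length):
    """Return the last `length` bytes of a file."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        if f.tell() < length:
            raise ValueError("%s is shorter than %d bytes" % (path, length))
        f.seek(-length, os.SEEK_END)
        return f.read(length)


def pack_checksum(pack_path):
    """Trailing checksum of a .pack file."""
    return read_tail(pack_path, HASH_LEN)


def idx_checksums(idx_path):
    """Return (checksum of the .pack, checksum of the .idx itself)."""
    tail = read_tail(idx_path, 2 * HASH_LEN)
    return tail[:HASH_LEN], tail[HASH_LEN:]


def safe_to_link_pair(src_pack, src_idx, dst_pack, dst_idx):
    """True when both packs and both indexes carry matching checksums."""
    src_pack_sum = pack_checksum(src_pack)
    dst_pack_sum = pack_checksum(dst_pack)
    src_idx_pack, src_idx_self = idx_checksums(src_idx)
    dst_idx_pack, dst_idx_self = idx_checksums(dst_idx)
    return (src_pack_sum == dst_pack_sum
            and src_idx_pack == src_pack_sum
            and dst_idx_pack == dst_pack_sum
            and src_idx_self == dst_idx_self)
```

This reads at most 40 bytes from each file instead of comparing whole
packs, but it trusts the stored checksums rather than recomputing them.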


* Re: git-relink status (or bug?)
  2008-06-21 19:22 ` Junio C Hamano
@ 2008-06-21 20:23   ` marc zonzon
  0 siblings, 0 replies; 3+ messages in thread
From: marc zonzon @ 2008-06-21 20:23 UTC
  To: Junio C Hamano; +Cc: git

Thank you for your answer.

On Sat, Jun 21, 2008 at 9:22 PM, Junio C Hamano <gitster@pobox.com> wrote:

>
> I do not think anybody uses it these days.  Instead they either clone
> with --reference (or -s), or perhaps use new-workdir.

The goal of git-relink is analogous to that of the default local clone.
Hard links can be safer than sharing because you don't lose anything
when the origin repository resets a branch.
I notice that git-clone(1) warns about the use of -s but not of
--reference, although they seem identical in this respect.

In many cases you cannot assume that your alternate will keep your
objects forever.
I recently posted such a case study:
http://thread.gmane.org/gmane.comp.version-control.git/85407
I found this bug when trying hard links. It turned out that sharing
was a better solution
(but I could only set it up with the help of Shawn's answer!)

This new-workdir also seems to be a nice script that I had never looked
at before. (But why is there no documentation for these contrib scripts?)

>
> Here is a totally untested fix.
>
> The "careful" part can be made much more clever and efficient by learning
> implementation details about the .idx file (it has the checksum for itself
> and the checksum for its .pack file at the end) but I did not bother.

Thank you.
I see that you only take the safe way: don't hard-link if anything
differs. But there would be a more efficient one: when the packs have
the same name, link them, and link the .idx files as well. If they have
the same name they have the same content (with a fair probability!)

I cannot provide a patch for that, because I'm not a Perl programmer,
and I'm too lazy to rewrite it in C or Python!
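
A rough sketch of that pair-wise idea, in Python, might look like the
following. It rests on the assumption stated above (packs with the same
name hold the same objects) and on both directories being on the same
filesystem. relink_pack_pair() and the ".relink-tmp" suffix are
hypothetical names, not part of git-relink.

```python
import os

PAIR_EXTS = (".pack", ".idx")


def relink_pack_pair(src_pack_dir, dst_pack_dir, base):
    """Replace the destination's <base>.pack and <base>.idx with hard
    links to the source's files, treating the two as one unit.

    Returns False, touching nothing, unless both files exist on both sides.
    """
    pairs = [(os.path.join(src_pack_dir, base + ext),
              os.path.join(dst_pack_dir, base + ext))
             for ext in PAIR_EXTS]
    if not all(os.path.isfile(s) and os.path.isfile(d) for s, d in pairs):
        return False
    for src, dst in pairs:
        if os.path.samefile(src, dst):
            continue  # already the same inode, nothing to do
        tmp = dst + ".relink-tmp"
        if os.path.lexists(tmp):
            os.unlink(tmp)
        os.link(src, tmp)      # new hard link to the source file
        os.replace(tmp, dst)   # swap it in over the old destination file
    return True
```

Each file is replaced atomically, but the pair is not: between the two
replacements the destination briefly holds the source's .pack next to
its own old .idx, so this should not run while the destination
repository is in use.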

> I do not think this in its current shape is committable, without
> improvements and success reports from the list.  Hint, hint...

Being "Perl challenged" I cannot proofread the script, but I can at
least test it, though only on the trivial test cases that make
git-relink fail! (I have only tried to use it once, to solve the
problem cited above.)

Marc
