git.vger.kernel.org archive mirror
* RFE: git relink
@ 2005-06-09 18:35 Jeff Garzik
  2005-06-09 19:15 ` Ryan Anderson
  2005-06-11  3:44 ` Junio C Hamano
  0 siblings, 2 replies; 4+ messages in thread
From: Jeff Garzik @ 2005-06-09 18:35 UTC
  To: Git Mailing List


It would be nice if somebody were motivated enough to create a command 
that functions like:

	git relink repoA repoB repoC repoD... repoX

which would examine

	repoA/.git
	repoB/.git
	repoC/.git
	repoD/.git

and verify (updating, if necessary) that each of the A/B/C/D repos are 
hardlinked to repoX.

	Jeff





* Re: RFE: git relink
  2005-06-09 18:35 RFE: git relink Jeff Garzik
@ 2005-06-09 19:15 ` Ryan Anderson
  2005-06-09 19:29   ` Ryan Anderson
  2005-06-11  3:44 ` Junio C Hamano
  1 sibling, 1 reply; 4+ messages in thread
From: Ryan Anderson @ 2005-06-09 19:15 UTC
  To: Jeff Garzik; +Cc: Git Mailing List

On Thu, Jun 09, 2005 at 02:35:51PM -0400, Jeff Garzik wrote:
> 
> It would be nice if somebody were motivated enough to create a command 
> that functions like:
> 
> 	git relink repoA repoB repoC repoD... repoX
> 
> which would examine
> 
> 	repoA/.git
> 	repoB/.git
> 	repoC/.git
> 	repoD/.git
> 
> and verify (updating, if necessary) that each of the A/B/C/D repos are 
> hardlinked to repoX.

Submitted a while ago, dunno what happened with it.  This only does 2
repositories, but it's trivial to do
	for i in repoA repoB repoC repoD ; do git-relink-script "$i" repoX ; done

Provide a tool to relink two git repositories.

Signed-off-by: Ryan Anderson <ryan@michonline.com>

---
commit a3bcc763d71bdb91a3b48e9105fbaa5e79abb807
tree 2553e2d8befbe0cda3e413616fd4cc7bf04157ad
parent a31c6d022e2435a514fcc8ca57f9995c4376a986
author Ryan Anderson <ryan@mythryan2.(none)> 1115185675 -0400
committer Ryan Anderson <ryan@michonline.com> 1115185675 -0400

Index: Makefile
===================================================================
--- 51a882a2dc62e0d3cdc79e0badc61559fb723481/Makefile  (mode:100644 sha1:99b4753d34879842b972da9b68694c9d0485f216)
+++ 2553e2d8befbe0cda3e413616fd4cc7bf04157ad/Makefile  (mode:100644 sha1:a99665e252a2342caa84238e886a80a5f27ac3c8)
@@ -13,7 +13,7 @@
 AR=ar
 
 SCRIPTS=git-apply-patch-script git-merge-one-file-script git-prune-script \
-	git-pull-script git-tag-script
+	git-pull-script git-tag-script git-relink-script
 
 PROG=   git-update-cache git-diff-files git-init-db git-write-tree \
 	git-read-tree git-commit-tree git-cat-file git-fsck-cache \
Index: git-relink-script
===================================================================
--- /dev/null  (tree:51a882a2dc62e0d3cdc79e0badc61559fb723481)
+++ 2553e2d8befbe0cda3e413616fd4cc7bf04157ad/git-relink-script  (mode:100644 sha1:78c954edcc370d8be951c856bfbfd38975d08348)
@@ -0,0 +1,115 @@
+#!/usr/bin/env perl
+# Copyright 2005, Ryan Anderson <ryan@michonline.com>
+# Distribution permitted under the GPL v2, as distributed
+# by the Free Software Foundation.
+# Later versions of the GPL at the discretion of Linus Torvalds
+#
+# Scan two git object-trees, and hardlink any common objects between them.
+
+use 5.006;
+use strict;
+use warnings;
+
+sub get_canonical_form($);
+sub do_scan_directory($$$);
+sub compare_two_files($$);
+
+# stats
+my $linked = 0;
+my $already = 0;
+
+my ($dir1, $dir2) = @ARGV;
+
+if (!defined $dir1 || !defined $dir2) {
+	print("Usage: $0 <dir1> <dir2>\nBoth dir1 and dir2 should contain a .git/objects/ subdirectory.\n");
+	exit(1);
+}
+
+$dir1 = get_canonical_form($dir1);
+$dir2 = get_canonical_form($dir2);
+
+printf("Searching '%s' and '%s' for common objects and hardlinking them...\n",$dir1,$dir2);
+
+opendir(D,$dir1 . "objects/")
+	or die "Failed to open $dir1/objects/ : $!";
+
+my @hashdirs = grep !/^\.{1,2}$/, readdir(D);
+foreach my $hashdir (@hashdirs) {
+	do_scan_directory($dir1, $hashdir, $dir2);
+}
+
+printf("Linked %d files, %d were already linked.\n",$linked, $already);
+
+
+sub do_scan_directory($$$) {
+	my ($srcdir, $subdir, $dstdir) = @_;
+
+	my $sfulldir = sprintf("%sobjects/%s/",$srcdir,$subdir);
+	my $dfulldir = sprintf("%sobjects/%s/",$dstdir,$subdir);
+
+	opendir(S,$sfulldir)
+		or die "Failed to opendir $sfulldir: $!";
+
+	foreach my $file (grep(!/^\.{1,2}$/, readdir(S))) {
+		my $sfilename = $sfulldir . $file;
+		my $dfilename = $dfulldir . $file;
+
+		compare_two_files($sfilename,$dfilename);
+
+	}
+	closedir(S);
+}
+
+sub compare_two_files($$) {
+	my ($sfilename, $dfilename) = @_;
+
+	# Perl's stat returns relevant information as follows:
+	# 0 = dev number
+	# 1 = inode number
+	# 7 = size
+	my @sstatinfo = stat($sfilename);
+	my @dstatinfo = stat($dfilename);
+
+	if (@sstatinfo == 0 && @dstatinfo == 0) {
+		die sprintf("Stat of both %s and %s failed: %s\n",$sfilename, $dfilename, $!);
+
+	} elsif (@dstatinfo == 0) {
+		return;
+	}
+
+	if ( ($sstatinfo[0] == $dstatinfo[0]) &&
+	     ($sstatinfo[1] != $dstatinfo[1])) {
+		if ($sstatinfo[7] == $dstatinfo[7]) {
+			unlink($dfilename)
+				or die "Unlink of $dfilename failed: $!\n";
+
+			link($sfilename,$dfilename)
+				or die "Failed to link $sfilename to $dfilename: $!\n" .
+					"Git Repository containing $dfilename is probably corrupted, please copy '$sfilename' to '$dfilename' to fix.\n";
+
+			$linked++;
+
+		} else {
+			die sprintf("ERROR: File sizes are not the same, cannot relink %s to %s.\n",
+				$sfilename, $dfilename);
+		}
+
+	} elsif ( ($sstatinfo[0] == $dstatinfo[0]) &&
+	     ($sstatinfo[1] == $dstatinfo[1])) {
+		$already++;
+	}
+}
+
+sub get_canonical_form($) {
+	my $dir = shift;
+	my $original = $dir;
+
+	die "$dir is not a directory." unless -d $dir;
+
+	$dir .= "/" unless $dir =~ m#/$#;
+	$dir .= ".git/" unless $dir =~ m#\.git/$#;
+
+	die "$original does not have a .git/ subdirectory.\n" unless -d $dir;
+
+	return $dir;
+}

-- 

Ryan Anderson
  sometimes Pug Majere


* Re: RFE: git relink
  2005-06-09 19:15 ` Ryan Anderson
@ 2005-06-09 19:29   ` Ryan Anderson
  0 siblings, 0 replies; 4+ messages in thread
From: Ryan Anderson @ 2005-06-09 19:29 UTC
  To: Ryan Anderson; +Cc: Jeff Garzik, Git Mailing List

On Thu, Jun 09, 2005 at 03:15:48PM -0400, Ryan Anderson wrote:
> On Thu, Jun 09, 2005 at 02:35:51PM -0400, Jeff Garzik wrote:
> > 
> > It would be nice if somebody were motivated enough to create a command 
> > that functions like:
> > 
> > 	git relink repoA repoB repoC repoD... repoX
> > 
> > which would examine
> > 
> > 	repoA/.git
> > 	repoB/.git
> > 	repoC/.git
> > 	repoD/.git
> > 
> > and verify (updating, if necessary) that each of the A/B/C/D repos are 
> > hardlinked to repoX.
> 
> Submitted a while ago, dunno what happened with it.  This only does 2
> repositories, but it's trivial to do
> 	for i in repoA repoB repoC repoD ; do git-relink-script "$i" repoX ; done

And of course, I didn't actually look at my code.  What it really
wants is the reference repository (repoX) as the first argument:

 	for i in repoA repoB repoC repoD ; do git-relink-script repoX "$i" ; done

It should be relatively trivial to convert it over to the other behavior
if it matters.
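
A rough sketch of a wrapper that accepts the argument order from the
original request (reference repository last) and turns it into the
pairwise calls shown above.  The function name relink_all and the
RELINK_CMD override are made up for illustration; only
git-relink-script comes from the patch.

```shell
#!/bin/sh
# relink_all repoA repoB ... repoX
# Treats the LAST argument as the reference repository and relinks
# every other repository against it, one pair at a time.
# RELINK_CMD is a hypothetical override (handy for dry runs); it
# defaults to the git-relink-script from the patch.
relink_all() {
	if [ "$#" -lt 2 ]; then
		echo "usage: relink_all <repo>... <reference-repo>" >&2
		return 1
	fi

	# Walk the arguments once; after the loop, ref holds the last one.
	for ref in "$@"; do :; done

	remaining=$(($# - 1))
	for repo in "$@"; do
		# Stop before the final argument, which is the reference itself.
		[ "$remaining" -gt 0 ] || break
		remaining=$((remaining - 1))
		"${RELINK_CMD:-git-relink-script}" "$ref" "$repo" || return 1
	done
}
```

Setting RELINK_CMD=echo before calling relink_all prints the pairs
that would be relinked without touching any repository.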

-- 

Ryan Anderson
  sometimes Pug Majere


* Re: RFE: git relink
  2005-06-09 18:35 RFE: git relink Jeff Garzik
  2005-06-09 19:15 ` Ryan Anderson
@ 2005-06-11  3:44 ` Junio C Hamano
  1 sibling, 0 replies; 4+ messages in thread
From: Junio C Hamano @ 2005-06-11  3:44 UTC
  To: Jeff Garzik; +Cc: git

>>>>> "JG" == Jeff Garzik <jgarzik@pobox.com> writes:

JG> It would be nice if somebody were motivated enough to create a command
JG> that functions like:

JG> 	git relink repoA repoB repoC repoD... repoX

JG> which would examine

JG> 	repoA/.git
JG> 	repoB/.git
JG> 	repoC/.git
JG> 	repoD/.git

JG> and verify (updating, if necessary) that each of the A/B/C/D repos are
JG> hardlinked to repoX.

Whoever is doing this script needs to be a bit careful.

If you end up unlinking a full object and hard-linking a
deltified object representation (delta) in its place, the
repository can get corrupted, because it might not have the
necessary base object for the delta.

There are two strategies to solve this.  Either (1) the relinker
refuses to replace a full object with a delta, or (2) the
relinker notices a delta, and makes an additional hard link to
the base object when replacing a full object with a delta (this
needs to be done recursively until you hit a full base object).

(1) is simpler and cleaner, but does not take full advantage of
the delta compression.  (2) gives you delta compression, but it
may add "unwanted" objects to a repository that happens to
slurp in a delta (fsck would not complain, though).

My knee-jerk vote goes to (1), but in either case the relinker
needs to check if it is dealing with a delta; this cannot be
done with git-cat-file -t AFAIK.

Ryan Anderson's code will notice the delta vs. full object case
most of the time, because it checks that the sizes of the
corresponding files in the two repositories match.  The problem
with the code is that it dies, instead of just skipping the
file, when the sizes differ.  Dying is good for the ordinary case
(two full object representations of the same file should not
have different sizes), but it is not the right thing to do when
one side uses the full object representation and the other side
uses a delta.
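
A minimal sketch of the "skip instead of die" behavior for a single
pair of object files, written in shell rather than as a change to the
Perl script.  The function name relink_one and its return codes are
made up for illustration.  Note that equal sizes are only a heuristic:
this does not actually detect deltas, it merely refuses to touch files
whose sizes disagree.

```shell
#!/bin/sh
# relink_one <src> <dst>
# Replace <dst> with a hard link to <src>, but only when both files
# exist and have the same size.
# Returns 0 if linked (or already linked), 2 if the pair was skipped.
relink_one() {
	src=$1
	dst=$2

	# Nothing to do unless both repositories have the object.
	[ -f "$src" ] && [ -f "$dst" ] || return 2

	# -ef is true when both names already refer to the same inode.
	if [ "$src" -ef "$dst" ]; then
		return 0
	fi

	# Arithmetic expansion strips any padding wc puts around the count.
	src_size=$(($(wc -c < "$src")))
	dst_size=$(($(wc -c < "$dst")))
	if [ "$src_size" -ne "$dst_size" ]; then
		# Possibly a full object on one side and a delta on the other:
		# leave both files alone instead of dying.
		echo "skipping $dst: size differs from $src" >&2
		return 2
	fi

	# ln -f replaces dst with a link to src; it fails (non-zero) if
	# the two files live on different filesystems.
	ln -f "$src" "$dst"
}
```

A caller walking the object directories can treat a return code of 2
as "leave this object alone" and carry on with the next file.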


