git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] cvsimport: skip commits that are too recent
@ 2007-01-08  1:11 Martin Langhoff
  2007-01-08  1:59 ` Junio C Hamano
  0 siblings, 1 reply; 10+ messages in thread
From: Martin Langhoff @ 2007-01-08  1:11 UTC (permalink / raw)
  To: junkio, git; +Cc: Martin Langhoff

With this patch, cvsimport will skip commits made
in the last 10 minutes. The recent-ness test is of
5 minutes + cvsps fuzz window (5 minutes default).

When working with a CVS repository that is in use,
importing commits that are too recent can lead to
partially incorrect trees. This is mainly due to

 - Commits that are within the cvsps fuzz window may later
   be found to have affected more files.

 - When performing incremental imports, clock drift between
   the systems may lead to skipped commits.

This commit helps keep incremental imports of in-use
CVS repositories sane.

Signed-off-by: Martin Langhoff <martin@catalyst.net.nz>
---
 git-cvsimport.perl |   14 ++++++++++++++
 1 files changed, 14 insertions(+), 0 deletions(-)

diff --git a/git-cvsimport.perl b/git-cvsimport.perl
index c5bf2d1..2686775 100755
--- a/git-cvsimport.perl
+++ b/git-cvsimport.perl
@@ -129,6 +129,11 @@ if ($opt_M) {
 	push (@mergerx, qr/$opt_M/);
 }
 
+# Remember UTC of our starting time
+# we'll want to avoid importing commits
+# that are too recent
+our $starttime = time();
+
 select(STDERR); $|=1; select(STDOUT);
 
 
@@ -824,6 +829,15 @@ while (<CVS>) {
 			$state = 11;
 			next;
 		}
+		if ( $starttime - 300 - (defined $opt_z ? $opt_z : 300) <= $date) {
+			# skip if the commit is too recent
+			# that the cvsps default fuzz is 300s, we give ourselves another
+			# 300s just in case -- this also prevents skipping commits
+			# due to server clock drift
+			print "skip patchset $patchset: $date too recent\n" if $opt_v;
+			$state = 11;
+			next;
+		}
 		if (exists $ignorebranch{$branch}) {
 			print STDERR "Skipping $branch\n";
 			$state = 11;
-- 
1.5.0.rc0.g4017-dirty

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] cvsimport: skip commits that are too recent
  2007-01-08  1:11 Martin Langhoff
@ 2007-01-08  1:59 ` Junio C Hamano
  2007-01-08  2:13   ` Martin Langhoff
  0 siblings, 1 reply; 10+ messages in thread
From: Junio C Hamano @ 2007-01-08  1:59 UTC (permalink / raw)
  To: Martin Langhoff; +Cc: junkio, git

Martin Langhoff <martin@catalyst.net.nz> writes:

> With this patch, cvsimport will skip commits made
> in the last 10 minutes. The recent-ness test is of
> 5 minutes + cvsps fuzz window (5 minutes default).
>
> When working with a CVS repository that is in use,
> importing commits that are too recent can lead to
> partially incorrect trees. This is mainly due to
>
>  - Commits that are within the cvsps fuzz window may later
>    be found to have affected more files.
>
>  - When performing incremental imports, clock drift between
>    the systems may lead to skipped commits.

Hmmmmm.  This is good for tracking other people's work, but, I
am not quite sure how well it works with my workflow.

I have a CVS upstream, but I manage my own development with git.
I start my day by running an incremental cvsimport, rebase my
branch on top of whatever at the tip of the cvsimport branch.  I
use cvsexportcommit (actually a moral equivalent of it I've been
using before cvsexportcommit has become part of git) to make
parts of my branch that are ready for other people's consumption
available by making commits on the CVS side.  Almost immediately
after that, I do another incremental cvsimport so that I can
rebase the remainder of my branch on top of what I made public.

I guess there is no negative impact your patch has on me -- this
10 minute window does not mean that I cannot continue working.
I can keep working on my stuff on my (old) branch without the
second cvsimport and rebase and there is nothing lost.  I can do
the second cvsimport and rebase later.

Which means that you did not give me a new excuse to take a
coffee break and work on git stuff instead of my day job project
to my management but that is Ok.  I'll find other ways ;-).

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] cvsimport: skip commits that are too recent
  2007-01-08  1:59 ` Junio C Hamano
@ 2007-01-08  2:13   ` Martin Langhoff
  2007-01-08  2:19     ` Junio C Hamano
  0 siblings, 1 reply; 10+ messages in thread
From: Martin Langhoff @ 2007-01-08  2:13 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Martin Langhoff, git

On 1/8/07, Junio C Hamano <junkio@cox.net> wrote:
>  Almost immediately
> after that, I do another incremental cvsimport so that I can
> rebase the remainder of my branch on top of what I made public.

That probably means I should have added a
--force-fetch-all-and-i-mean-it flag to override the cautious
behaviour. I'll repost ina few hours, with doco too.

 ...

> Which means that you did not give me a new excuse to take a
> coffee break and work on git stuff instead of my day job project
> to my management but that is Ok.  I'll find other ways ;-).

I'll try harder next time ;-)

cheers


martin

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] cvsimport: skip commits that are too recent
  2007-01-08  2:13   ` Martin Langhoff
@ 2007-01-08  2:19     ` Junio C Hamano
  2007-01-08  3:18       ` Martin Langhoff
  0 siblings, 1 reply; 10+ messages in thread
From: Junio C Hamano @ 2007-01-08  2:19 UTC (permalink / raw)
  To: Martin Langhoff; +Cc: git

"Martin Langhoff" <martin.langhoff@gmail.com> writes:

> On 1/8/07, Junio C Hamano <junkio@cox.net> wrote:
>>  Almost immediately
>> after that, I do another incremental cvsimport so that I can
>> rebase the remainder of my branch on top of what I made public.
>
> That probably means I should have added a
> --force-fetch-all-and-i-mean-it flag to override the cautious
> behaviour. I'll repost ina few hours, with doco too.

Well I meant to say that my conclusion is that that flag is not
needed in practice.

>
> ...
>
>> Which means that you did not give me a new excuse to take a
>> coffee break and work on git stuff instead of my day job project
>> to my management but that is Ok.  I'll find other ways ;-).
>
> I'll try harder next time ;-)
>
> cheers
>
>
> martin

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] cvsimport: skip commits that are too recent
  2007-01-08  2:19     ` Junio C Hamano
@ 2007-01-08  3:18       ` Martin Langhoff
  0 siblings, 0 replies; 10+ messages in thread
From: Martin Langhoff @ 2007-01-08  3:18 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On 1/8/07, Junio C Hamano <junkio@cox.net> wrote:
> Well I meant to say that my conclusion is that that flag is not
> needed in practice.

you are getting soft in your old age, man :-)

More seriously, as soon as I sent the patch I thought of the use case
you describe. Not everyone is as patient as you are, and someone may
have things scripted. If the cvs repo is local/fast and under your
control, it doesn't make sense to be _so_ conservative.

Also -- by having the flag we get an excuse to mention it in the doco.


m

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH] cvsimport: skip commits that are too recent
@ 2007-01-08  6:43 Martin Langhoff
  2007-01-08  7:17 ` Martin Langhoff
  2007-01-11  8:22 ` Robin Rosenberg
  0 siblings, 2 replies; 10+ messages in thread
From: Martin Langhoff @ 2007-01-08  6:43 UTC (permalink / raw)
  To: git, junkio; +Cc: Martin Langhoff

With this patch, cvsimport will skip commits made
in the last 10 minutes. The recent-ness test is of
5 minutes + cvsps fuzz window (5 minutes default).

To force recent commits to be imported, pass the
-a(ll) flag.

When working with a CVS repository that is in use,
importing commits that are too recent can lead to
partially incorrect trees. This is mainly due to

 - Commits that are within the cvsps fuzz window may later
   be found to have affected more files.

 - When performing incremental imports, clock drift between
   the systems may lead to skipped commits.

This commit helps keep incremental imports of in-use
CVS repositories sane.

Signed-off-by: Martin Langhoff <martin@catalyst.net.nz>
---
 Documentation/git-cvsimport.txt |    7 ++++++-
 git-cvsimport.perl              |   20 ++++++++++++++++++--
 2 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/Documentation/git-cvsimport.txt b/Documentation/git-cvsimport.txt
index d21d66b..6deee94 100644
--- a/Documentation/git-cvsimport.txt
+++ b/Documentation/git-cvsimport.txt
@@ -90,7 +90,8 @@ If you need to pass multiple options, separate them with a comma.
 	Print a short usage message and exit.
 
 -z <fuzz>::
-        Pass the timestamp fuzz factor to cvsps.
+	Pass the timestamp fuzz factor to cvsps, in seconds. If unset,
+	cvsps defaults to 300s.
 
 -s <subst>::
 	Substitute the character "/" in branch names with <subst>
@@ -99,6 +100,10 @@ If you need to pass multiple options, separate them with a comma.
 	CVS by default uses the unix username when writing its
 	commit logs. Using this option and an author-conv-file
 	in this format
+
+-a::
+	Import all commits, including recent ones. cvsimport by default
+	skips commits that have a timestamp less than 10 minutes ago.
 +
 ---------
 	exon=Andreas Ericsson <ae@op5.se>
diff --git a/git-cvsimport.perl b/git-cvsimport.perl
index c5bf2d1..a75aaa3 100755
--- a/git-cvsimport.perl
+++ b/git-cvsimport.perl
@@ -29,7 +29,7 @@ use IPC::Open2;
 $SIG{'PIPE'}="IGNORE";
 $ENV{'TZ'}="UTC";
 
-our ($opt_h,$opt_o,$opt_v,$opt_k,$opt_u,$opt_d,$opt_p,$opt_C,$opt_z,$opt_i,$opt_P, $opt_s,$opt_m,$opt_M,$opt_A,$opt_S,$opt_L);
+our ($opt_h,$opt_o,$opt_v,$opt_k,$opt_u,$opt_d,$opt_p,$opt_C,$opt_z,$opt_i,$opt_P, $opt_s,$opt_m,$opt_M,$opt_A,$opt_S,$opt_L, $opt_a);
 my (%conv_author_name, %conv_author_email);
 
 sub usage() {
@@ -37,7 +37,7 @@ sub usage() {
 Usage: ${\basename $0}     # fetch/update GIT from CVS
        [-o branch-for-HEAD] [-h] [-v] [-d CVSROOT] [-A author-conv-file]
        [-p opts-for-cvsps] [-C GIT_repository] [-z fuzz] [-i] [-k] [-u]
-       [-s subst] [-m] [-M regex] [-S regex] [CVS_module]
+       [-s subst] [-a] [-m] [-M regex] [-S regex] [CVS_module]
 END
 	exit(1);
 }
@@ -105,6 +105,8 @@ if ($opt_d) {
 }
 $opt_o ||= "origin";
 $opt_s ||= "-";
+$opt_a ||= 0;
+
 my $git_tree = $opt_C;
 $git_tree ||= ".";
 
@@ -129,6 +131,11 @@ if ($opt_M) {
 	push (@mergerx, qr/$opt_M/);
 }
 
+# Remember UTC of our starting time
+# we'll want to avoid importing commits
+# that are too recent
+our $starttime = time();
+
 select(STDERR); $|=1; select(STDOUT);
 
 
@@ -824,6 +831,15 @@ while (<CVS>) {
 			$state = 11;
 			next;
 		}
+		if ( !$opt_a && $starttime - 300 - (defined $opt_z ? $opt_z : 300) <= $date) {
+			# skip if the commit is too recent
+			# that the cvsps default fuzz is 300s, we give ourselves another
+			# 300s just in case -- this also prevents skipping commits
+			# due to server clock drift
+			print "skip patchset $patchset: $date too recent\n" if $opt_v;
+			$state = 11;
+			next;
+		}
 		if (exists $ignorebranch{$branch}) {
 			print STDERR "Skipping $branch\n";
 			$state = 11;
-- 
1.5.0.rc0.g4017-dirty

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] cvsimport: skip commits that are too recent
  2007-01-08  6:43 [PATCH] cvsimport: skip commits that are too recent Martin Langhoff
@ 2007-01-08  7:17 ` Martin Langhoff
  2007-01-08  8:24   ` Martin Langhoff
  2007-01-11  8:22 ` Robin Rosenberg
  1 sibling, 1 reply; 10+ messages in thread
From: Martin Langhoff @ 2007-01-08  7:17 UTC (permalink / raw)
  To: git, junkio

On 1/8/07, Martin Langhoff <martin@catalyst.net.nz> wrote:
> With this patch, cvsimport will skip commits made
> in the last 10 minutes. The recent-ness test is of
> 5 minutes + cvsps fuzz window (5 minutes default).

Here is the repost with appropriate doco and an override ;-)

In related news, I am trying to debug an import that consistently
skips over remote commits... which is bad, bad news. The culprit seems
to be cvsps -- it skips commits it clearly knows about, and I'm not
sure why. I do think those were commits that cvsps saw half-baked in
the first place.

Passing -x to cvsps does bring those commits back, cvsps with -x can
afford to rewrite history a little bit. As long as the history being
rewritten is not too old we are safe. So with this patch, passing -x
is safer, assuming that 10 minutes is enough of a time window for
cvsps to change opinion about the project history.

 (Before you ask: from a data correctness, this is a fine mess.)

For this repo, I'll start running cvsimport with -o ' -x ' and see how
it behaves. Time-wise, the bandwidth usage and cpu times are roughly
similar for me using --cvs-direct. The patch to do it by default in
cvsimport is trivial, but I'm not entirely happy with the concept just
now.

In any case -- this should be a bit of a warning. cvsps is not
particularly reliable (not that cvs data ever is!), and passing -o '
-x' may help.

cheers,


martin

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] cvsimport: skip commits that are too recent
  2007-01-08  7:17 ` Martin Langhoff
@ 2007-01-08  8:24   ` Martin Langhoff
  0 siblings, 0 replies; 10+ messages in thread
From: Martin Langhoff @ 2007-01-08  8:24 UTC (permalink / raw)
  To: git, junkio

On 1/8/07, Martin Langhoff <martin.langhoff@gmail.com> wrote:
> In any case -- this should be a bit of a warning. cvsps is not
> particularly reliable (not that cvs data ever is!), and passing -o '
> -x' may help.

Correction:  it is -p ' -x ' that you need to pass. Things _are_ saner
here with it. YMMV.

cheers,


martin

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] cvsimport: skip commits that are too recent
  2007-01-08  6:43 [PATCH] cvsimport: skip commits that are too recent Martin Langhoff
  2007-01-08  7:17 ` Martin Langhoff
@ 2007-01-11  8:22 ` Robin Rosenberg
  2007-01-11 20:18   ` Martin Langhoff
  1 sibling, 1 reply; 10+ messages in thread
From: Robin Rosenberg @ 2007-01-11  8:22 UTC (permalink / raw)
  To: Martin Langhoff; +Cc: git, junkio

The idea is nice,  but the downside of this patch is that I (and presumably 
others) have to rewrite the scripts to invoke cvsps explicitly now. The fix 
should really be in cvsps, not git-cvsimport (which is the reason I haven't 
fixed this). Running a full cvsps takes two hours and consumes more than a 
gigabyte of memory for me, which makes it impossible to run on all but one 
machine, wheras the incremental import runs in less than five minutes on any 
machine.

Add to that the risk that the buggy nature of cvsps probably increases the 
risk of errors, so please make the old behaviour the default (import all, 
retain cvsps cache) and make the changed behaviour the result of an explicit 
switch.

-- robin

måndag 08 januari 2007 07:43 skrev Martin Langhoff:
> With this patch, cvsimport will skip commits made
> in the last 10 minutes. The recent-ness test is of
> 5 minutes + cvsps fuzz window (5 minutes default).
>
> To force recent commits to be imported, pass the
> -a(ll) flag.
>
> When working with a CVS repository that is in use,
> importing commits that are too recent can lead to
> partially incorrect trees. This is mainly due to
>
>  - Commits that are within the cvsps fuzz window may later
>    be found to have affected more files.
>
>  - When performing incremental imports, clock drift between
>    the systems may lead to skipped commits.
>
> This commit helps keep incremental imports of in-use
> CVS repositories sane.
>
> Signed-off-by: Martin Langhoff <martin@catalyst.net.nz>
> ---
>  Documentation/git-cvsimport.txt |    7 ++++++-
>  git-cvsimport.perl              |   20 ++++++++++++++++++--
>  2 files changed, 24 insertions(+), 3 deletions(-)
>
> diff --git a/Documentation/git-cvsimport.txt
> b/Documentation/git-cvsimport.txt index d21d66b..6deee94 100644
> --- a/Documentation/git-cvsimport.txt
> +++ b/Documentation/git-cvsimport.txt
> @@ -90,7 +90,8 @@ If you need to pass multiple options, separate them with
> a comma. Print a short usage message and exit.
>
>  -z <fuzz>::
> -        Pass the timestamp fuzz factor to cvsps.
> +	Pass the timestamp fuzz factor to cvsps, in seconds. If unset,
> +	cvsps defaults to 300s.
>
>  -s <subst>::
>  	Substitute the character "/" in branch names with <subst>
> @@ -99,6 +100,10 @@ If you need to pass multiple options, separate them
> with a comma. CVS by default uses the unix username when writing its
>  	commit logs. Using this option and an author-conv-file
>  	in this format
> +
> +-a::
> +	Import all commits, including recent ones. cvsimport by default
> +	skips commits that have a timestamp less than 10 minutes ago.
>  +
>  ---------
>  	exon=Andreas Ericsson <ae@op5.se>
> diff --git a/git-cvsimport.perl b/git-cvsimport.perl
> index c5bf2d1..a75aaa3 100755
> --- a/git-cvsimport.perl
> +++ b/git-cvsimport.perl
> @@ -29,7 +29,7 @@ use IPC::Open2;
>  $SIG{'PIPE'}="IGNORE";
>  $ENV{'TZ'}="UTC";
>
> -our
> ($opt_h,$opt_o,$opt_v,$opt_k,$opt_u,$opt_d,$opt_p,$opt_C,$opt_z,$opt_i,$opt
>_P, $opt_s,$opt_m,$opt_M,$opt_A,$opt_S,$opt_L); +our
> ($opt_h,$opt_o,$opt_v,$opt_k,$opt_u,$opt_d,$opt_p,$opt_C,$opt_z,$opt_i,$opt
>_P, $opt_s,$opt_m,$opt_M,$opt_A,$opt_S,$opt_L, $opt_a); my
> (%conv_author_name, %conv_author_email);
>
>  sub usage() {
> @@ -37,7 +37,7 @@ sub usage() {
>  Usage: ${\basename $0}     # fetch/update GIT from CVS
>         [-o branch-for-HEAD] [-h] [-v] [-d CVSROOT] [-A author-conv-file]
>         [-p opts-for-cvsps] [-C GIT_repository] [-z fuzz] [-i] [-k] [-u]
> -       [-s subst] [-m] [-M regex] [-S regex] [CVS_module]
> +       [-s subst] [-a] [-m] [-M regex] [-S regex] [CVS_module]
>  END
>  	exit(1);
>  }
> @@ -105,6 +105,8 @@ if ($opt_d) {
>  }
>  $opt_o ||= "origin";
>  $opt_s ||= "-";
> +$opt_a ||= 0;
> +
>  my $git_tree = $opt_C;
>  $git_tree ||= ".";
>
> @@ -129,6 +131,11 @@ if ($opt_M) {
>  	push (@mergerx, qr/$opt_M/);
>  }
>
> +# Remember UTC of our starting time
> +# we'll want to avoid importing commits
> +# that are too recent
> +our $starttime = time();
> +
>  select(STDERR); $|=1; select(STDOUT);
>
>
> @@ -824,6 +831,15 @@ while (<CVS>) {
>  			$state = 11;
>  			next;
>  		}
> +		if ( !$opt_a && $starttime - 300 - (defined $opt_z ? $opt_z : 300) <=
> $date) { +			# skip if the commit is too recent
> +			# that the cvsps default fuzz is 300s, we give ourselves another
> +			# 300s just in case -- this also prevents skipping commits
> +			# due to server clock drift
> +			print "skip patchset $patchset: $date too recent\n" if $opt_v;
> +			$state = 11;
> +			next;
> +		}
>  		if (exists $ignorebranch{$branch}) {
>  			print STDERR "Skipping $branch\n";
>  			$state = 11;

-- 
TESTMAIL

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] cvsimport: skip commits that are too recent
  2007-01-11  8:22 ` Robin Rosenberg
@ 2007-01-11 20:18   ` Martin Langhoff
  0 siblings, 0 replies; 10+ messages in thread
From: Martin Langhoff @ 2007-01-11 20:18 UTC (permalink / raw)
  To: Robin Rosenberg; +Cc: git, junkio

Robin Rosenberg wrote:
> The idea is nice,  but the downside of this patch is that I (and presumably 
> others) have to rewrite the scripts to invoke cvsps explicitly now. 

This patch did _not_ change how we invoke cvsps at all. It did change
that we now ignore the very recent commits (and pick them up in the next
run), unless you pass -a.

> The fix
> should really be in cvsps, not git-cvsimport (which is the reason I haven't 
> fixed this). Running a full cvsps takes two hours and consumes more than a 
> gigabyte of memory for me, which makes it impossible to run on all but one 
> machine, wheras the incremental import runs in less than five minutes on any 
> machine.

Many things would need fixing in cvsps. This aspect [that commits we do
not know if recent activty belongs to a finished commit or a commit that
is still happening], is not cvsps' fault. It is due to the lack of
atomicity in CVS, combined with its rather bad network protocol.

> Add to that the risk that the buggy nature of cvsps probably increases the 
> risk of errors, so please make the old behaviour the default (import all, 
> retain cvsps cache) and make the changed behaviour the result of an explicit 
> switch.

What seems to concern you is the "retain cvsps cache" -- which we do.

I did comment later in the thread that we should consider rebuilding the
cvsps cache. The reason for that is that I am seeing LESS breakage than
maintaining the cache. Significantly less.

As you say, however, it is a major change, so I'm still evaluating options.

cheers



martin

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2007-01-11 20:18 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-01-08  6:43 [PATCH] cvsimport: skip commits that are too recent Martin Langhoff
2007-01-08  7:17 ` Martin Langhoff
2007-01-08  8:24   ` Martin Langhoff
2007-01-11  8:22 ` Robin Rosenberg
2007-01-11 20:18   ` Martin Langhoff
  -- strict thread matches above, loose matches on Subject: below --
2007-01-08  1:11 Martin Langhoff
2007-01-08  1:59 ` Junio C Hamano
2007-01-08  2:13   ` Martin Langhoff
2007-01-08  2:19     ` Junio C Hamano
2007-01-08  3:18       ` Martin Langhoff

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).