git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Kyle J. McKay" <mackyle@gmail.com>
To: "Eric Wong" <normalperson@yhbt.net>, Mike <ipso@snappymail.ca>,
	Minty <mintywalker@gmail.com>,
	"Nico Schlömer" <nico.schloemer@gmail.com>,
	"Valery Yundin" <yuvalery@gmail.com>
Cc: Junio C Hamano <gitster@pobox.com>,
	Git mailing list <git@vger.kernel.org>
Subject: Re: git svn import failure : write .git/Git_svn_hash_BmjclS: Bad file descriptor
Date: Thu, 26 Feb 2015 05:49:34 -0800	[thread overview]
Message-ID: <cbcb926c1b7edd31548a15ea4966a35@74d39fa044aa309eaea14b9f57fe79c> (raw)
In-Reply-To: <CAK6Z60fhyY3wPG6hx7YJ4y7wZsWb=33JhqejmpVBgZLGdB1XMA@mail.gmail.com>

On Feb 26, 2015, at 01:09, Nico Schl=F6mer wrote:
> All,
>
> I applied Kyle's latest patch
>
>> diff --git a/perl/Git/SVN/Ra.pm b/perl/Git/SVN/Ra.pm
>> index 622535e2..96888228 100644
>> --- a/perl/Git/SVN/Ra.pm
>> +++ b/perl/Git/SVN/Ra.pm
>> @@ -391,6 +391,7 @@ sub longest_common_path {
>> sub gs_fetch_loop_common {
>>        my ($self, $base, $head, $gsv, $globs) =3D @_;
>>        return if ($base > $head);
>> +       $::_repository->_open_cat_blob_if_needed;
>>        my $gpool =3D SVN::Pool->new_default;
>>        my $ra_url =3D $self->url;
>>        my $reload_ra =3D sub {
>
> alone and this fixes the bug for me. Thanks a lot, Kyle!

The updated patch with the additional fix is below.

There are two symptoms it addresses:

1) the 'Git_svn_hash_... bad file descriptor' failure

2) the 'out pipe went bad' failure

While (1) could probably also be addressed by Eric's alternate
"destroy all tempfiles when reloading RA" patch, that would require re-
opening all the temp files every 100 revisions (or whatever the log
window size is changed to).

I noticed that with the default log window size of 100 revisions, each
time the connection was reset at the 100-revision boundary (to reduce
the overall memory usage) it seemed to take approx. 3 seconds to start
up again processing revisions on that gmsh repository.  If you're
cloning 30000 revisions, that adds 15 minutes to the total clone time
already.  Admittedly opening new temp files will be a lot faster and
hardly noticeable, but doing that 300 times over the course of 30000
revisions will probably add at least a little extra delay and since
blowing away all the temp files does not seem to be necessary, why
incur the extra delay?

Also the "destroy all tempfiles when reloading RA" patch isn't going
to fix the cat_blob 'out pipe went bad' problem since that has nothing
to do with the temp files so another change will still be required for
that.

-Kyle

-- 8< --
Subject: [PATCH v2] Git::SVN::*: avoid premature FileHandle closure

Since b19138b (git-svn: Make it incrementally faster by minimizing temp
files, v1.6.0), git-svn has been using the Git.pm temp_acquire and
temp_release mechanism to avoid unnecessary temp file churn and provide
a speed boost.

However, that change introduced a call to temp_acquire inside the
Git::SVN::Fetcher::close_file function for an 'svn_hash' temp file.
Because an SVN::Pool is active at the time this function is called, if
the Git::temp_acquire function ends up actually creating a new
FileHandle for the temp file (which it will the first time it's called
with the name 'svn_hash') that FileHandle will end up in the SVN::Pool
and should that pool have SVN::Pool::clear called on it that FileHandle
will be closed out from under Git::temp_acquire.

Since the only call site to Git::temp_acquire with the name 'svn_hash'
is inside the close_file function, if an 'svn_hash' temp file is ever
created its FileHandle is guaranteed to be created in the active
SVN::Pool.

This has not been a problem in the past because the SVN::Pool was not
being cleared.  However, since dfa72fdb (git-svn: reload RA every
log-window-size, v2.2.0) the pool has been getting cleared periodically
at which point the FileHandle for the 'svn_hash' temp file gets closed.
Any subsequent calls to Git::temp_acquire for 'svn_hash', however,
succeed without creating/opening a new temporary file since it still has
the now invalid FileHandle in its cache.  Callers that then attempt to
use that FileHandle fail with an error.

We avoid this problem by making sure the 'svn_hash' temp file is created
in the same place the 'svn_delta_...' and 'git_blob_...' temp files are
(and then temp_release'd) so that it can be safely used inside the
close_file function without having its FileHandle end up in an SVN::Pool
that gets cleared.

Additionally the Git.pm cat_blob function creates a bidirectional pipe
FileHandle using the IPC::Open2::open2 function.  If that handle is
created too late, it also gets caught up in the SVN::Pool and incorrectly
closed by the SVN::Pool::clear call.  But this only seems to happen with
more recent versions of Perl and svn.

To avoid this problem we add an explicit call to _open_cat_blob_if_needed
before the first call to SVN::Pool->new_default to make sure the open2
handle does not end up in the SVN::Pool.

Signed-off-by: Kyle J. McKay <mackyle@gmail.com>
---
 perl/Git/SVN/Fetcher.pm | 8 ++++++++
 perl/Git/SVN/Ra.pm      | 3 +++
 2 files changed, 11 insertions(+)

diff --git a/perl/Git/SVN/Fetcher.pm b/perl/Git/SVN/Fetcher.pm
index 10edb277..613055a3 100644
--- a/perl/Git/SVN/Fetcher.pm
+++ b/perl/Git/SVN/Fetcher.pm
@@ -322,6 +322,14 @@ sub apply_textdelta {
 	# (but $base does not,) so dup() it for reading in close_file
 	open my $dup, '<&', $fh or croak $!;
 	my $base = $::_repository->temp_acquire("git_blob_${$}_$suffix");
+	# close_file may call temp_acquire on 'svn_hash', but because of the
+	# call chain, if the temp_acquire call from close_file ends up being the
+	# call that first creates the 'svn_hash' temp file, then the FileHandle
+	# that's created as a result will end up in an SVN::Pool that we clear
+	# in SVN::Ra::gs_fetch_loop_common.  Avoid that by making sure the
+	# 'svn_hash' FileHandle is already created before close_file is called.
+	my $tmp_fh = $::_repository->temp_acquire('svn_hash');
+	$::_repository->temp_release($tmp_fh, 1);
 
 	if ($fb->{blob}) {
 		my ($base_is_link, $size);
diff --git a/perl/Git/SVN/Ra.pm b/perl/Git/SVN/Ra.pm
index 622535e2..32e2f19e 100644
--- a/perl/Git/SVN/Ra.pm
+++ b/perl/Git/SVN/Ra.pm
@@ -391,6 +391,9 @@ sub longest_common_path {
 sub gs_fetch_loop_common {
 	my ($self, $base, $head, $gsv, $globs) = @_;
 	return if ($base > $head);
+	# Make sure the cat_blob open2 FileHandle is created before calling
+	# SVN::Pool::new_default so that it does not incorrectly end up in the pool.
+	$::_repository->_open_cat_blob_if_needed;
 	my $gpool = SVN::Pool->new_default;
 	my $ra_url = $self->url;
 	my $reload_ra = sub {
--

  reply	other threads:[~2015-02-26 13:49 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <CAK6Z60ciheWOUGOv1sYcA==B2WR1Rs_eMU+9a=R3FBwc_37CyQ@mail.gmail.com>
2015-02-26  9:09 ` git svn import failure : write .git/Git_svn_hash_BmjclS: Bad file descriptor Nico Schlömer
2015-02-26 13:49   ` Kyle J. McKay [this message]
2015-02-26 21:27     ` Eric Wong
2015-03-23 19:11       ` Junio C Hamano
2015-03-23 19:36         ` Eric Wong
2015-03-23 19:56           ` Junio C Hamano
2015-01-29 22:03 Valery Yundin
2015-01-29 23:34 ` Eric Wong
2015-01-29 23:59   ` Valery Yundin
2015-01-30  0:22     ` Eric Wong
2015-01-30  0:40       ` Valery Yundin
2015-01-30  1:30         ` Eric Wong
2015-01-31 12:51           ` Nico Schlömer
2015-01-31 14:52             ` Valery Yundin
2015-02-12 19:18               ` Eric Wong
2015-02-16 15:10                 ` Nico Schlömer
2015-02-17  6:17                   ` Eric Wong
2015-02-25 10:19                     ` Kyle J. McKay
     [not found]                       ` <CAK6Z60fqEkM_tON6tcnwBqJzBCvLB=eVJdyXSnNb7N1iR_DSsw@mail.gmail.com>
2015-02-25 20:08                         ` Eric Wong
2015-02-25 20:33                         ` Kyle J. McKay
  -- strict thread matches above, loose matches on Subject: below --
2015-01-16 17:13 Mike
2015-01-08 12:43 Minty
2015-01-08 17:51 ` Minty
2015-01-14  1:50 ` Eric Wong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=cbcb926c1b7edd31548a15ea4966a35@74d39fa044aa309eaea14b9f57fe79c \
    --to=mackyle@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=ipso@snappymail.ca \
    --cc=mintywalker@gmail.com \
    --cc=nico.schloemer@gmail.com \
    --cc=normalperson@yhbt.net \
    --cc=yuvalery@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).