Re: [PATCHv3] refs.c: enable large transactions

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Junio C Hamano <gitster@pobox.com>
To: Stefan Beller <sbeller@google.com>
Cc: mhagger@alum.mit.edu, git@vger.kernel.org, peff@peff.net
Subject: Re: [PATCHv3] refs.c: enable large transactions
Date: Thu, 23 Apr 2015 10:56:21 -0700	[thread overview]
Message-ID: <xmqqzj5y3f0a.fsf@gitster.dls.corp.google.com> (raw)
In-Reply-To: <1429738227-2985-1-git-send-email-sbeller@google.com> (Stefan Beller's message of "Wed, 22 Apr 2015 14:30:27 -0700")

Stefan Beller <sbeller@google.com> writes:

> diff --git a/refs.c b/refs.c
> index 4f495bd..7ce7b97 100644
> --- a/refs.c
> +++ b/refs.c
> @@ -3041,6 +3041,13 @@ static int write_ref_sha1(struct ref_lock *lock,
>  		errno = EINVAL;
>  		return -1;
>  	}
> +	if (lock->lk->fd == -1 && reopen_lock_file(lock->lk) == -1) {

Granted, we explicitly assign -1 to lk->fd when we know it is
closed, and the return value from reopen_lock_file() can come only
from the return value from open(2), which is defined to return -1
(i.e. not just any negative value) upon failure, but still, isn't it
customary to check with "< 0" rather than "== -1" for errors?

> +		int save_errno = errno;
> +		error("Couldn't reopen %s", lock->lk->filename.buf);

No need to change this line, but I noticed that we might want to do
something about the first one of the following two:

    $ git grep -e '[ 	]error(_*"[A-Z]' | wc -l
    146
    $ git grep -e '[ 	]error(_*"[a-z]' | wc -l
    390

> +		unlock_ref(lock);
> +		errno = save_errno;
> +		return -1;
> +	}

Is this the only place in the entire codebase where a lockfile,
which may have been closed to save number of open file descriptors,
needs to be reopened?  If I understand correctly, lockfile API is
not for sole use of refs (don't the index and the pack codepaths use
it, too?), so it may give us a better abstraction to have a helper
function in lockfile.[ch] that takes a lock object, i.e.

	int make_lock_fd_valid(struct lock_file *lk)
        {
		if (lk->fd < 0 && reopen_lock_file(lk) < 0) {
			... error ...
                        return -1;
		}
                return 0;
	}

and to call it at this point, i.e.

	if (make_lock_fd_valid(lock->lk) < 0)
        	return -1;

perhaps?

> @@ -3733,6 +3741,20 @@ int ref_transaction_commit(struct ref_transaction *transaction,
>  		return 0;
>  	}
>  
> +	/*
> +	 * We need to open many files in a large transaction, so come up with
> +	 * a reasonable maximum. We still keep some spares for stdin/out and
> +	 * other open files. Experiments determined we need more fds when
> +	 * running inside our test suite than directly in the shell. It's
> +	 * unclear where these fds come from. 25 should be a reasonable large
> +	 * number though.
> +	 */

> +	remaining_fds = get_max_fd_limit();
> +	if (remaining_fds > 25)
> +		remaining_fds -= 25;
> +	else
> +		remaining_fds = 0;

Is the value of pack_open_fds visible from here? I am wondering if
we might want "scratch fds Git can use for its own use" accounting
so that the two subsystems can collectively say "it is still safe
to use one more fd and leave 25 for other people". With the code
structure proposed here, the pack reader can grab all but 25 fds,
which would leave the rest of Git including this code only 25,
and the value in remaining_fds would become totally meaningless.

It certainly can wait to worry about that and we do not have to do
anything about it in this patch, but it may still be a good idea to
leave "NEEDSWORK:" comment here (and point at open_packed_git_1() in
sha1_file.c in the comment).

I do not think the other side needs to know about the fd consumption
in this function, as what is opened in here will be closed before
this function returns.

> @@ -3762,6 +3784,12 @@ int ref_transaction_commit(struct ref_transaction *transaction,
>  				    update->refname);
>  			goto cleanup;
>  		}
> +		if (!(flags & REF_HAVE_NEW) ||
> +		    is_null_sha1(update->new_sha1) ||
> +		    remaining_fds == 0)
> +			close_lock_file(update->lock->lk);
> +		else
> +			remaining_fds--;
>  	}

next prev parent reply	other threads:[~2015-04-23 17:56 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-04-22 21:30 [PATCHv3] refs.c: enable large transactions Stefan Beller
2015-04-23 17:56 ` Junio C Hamano [this message]
2015-04-24  0:21   ` Stefan Beller
2015-04-24  1:37     ` Junio C Hamano
2015-04-24 16:16       ` Stefan Beller
2015-04-24 17:19         ` Jeff King
2015-04-24 18:12         ` Jonathan Nieder
2015-04-24 18:31           ` Stefan Beller
2015-04-24 20:17           ` Jeff King
2015-04-25  4:23             ` Junio C Hamano
2015-04-25  5:00               ` Jeff King
2015-04-25  5:24                 ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqzj5y3f0a.fsf@gitster.dls.corp.google.com \
    --to=gitster@pobox.com \
    --cc=git@vger.kernel.org \
    --cc=mhagger@alum.mit.edu \
    --cc=peff@peff.net \
    --cc=sbeller@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.