* [PATCH 0/2] Retry attempts to acquire the packed-refs lock
@ 2015-05-01 14:52 Michael Haggerty
  2015-05-01 14:52 ` [PATCH 1/2] lockfile: allow file locking to be retried with a timeout Michael Haggerty
  ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread

From: Michael Haggerty @ 2015-05-01 14:52 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Stefan Beller, Jeff King, git, Michael Haggerty

At GitHub we were seeing occasional lock contention over packed-refs.
It wasn't very common, but when you have as much git traffic as we
have, anything that *can* happen *will* happen.

The problem is that Git only tries to acquire locks a single time. If
that attempt fails, the whole process fails. So, for example, if two
processes are trying to delete two different references, one of them
can fail due to inability to acquire the packed-refs lock, even though
it could have succeeded if it had just waited a moment.

This patch series adds a new function,
hold_lock_file_for_update_timeout(), which retries the lock
acquisition for a specified length of time. The retries necessarily
have to use polling, which is done with quadratic backoff plus a
random component. It might be a bit overengineered for this single
purpose, but I wanted it to be usable in other contexts without second
thoughts.

This patch series also changes lock_packed_refs() to call the new
function with a timeout of 1 second (by default; the timeout is
configurable) and adds some tests verifying that the retry and timeout
work.

It might be that there are other call sites that would benefit from
retrying lock acquisition (the index file?), but this patch series
only applies the new functionality to packed-refs.lock.

We have a patch similar to this one running on our servers at GitHub,
with no problems observed so far.

This series applies to master. It is also available from my GitHub
repository:

    https://github.com/mhagger/git branch lockfile-retry

Michael Haggerty (2):
  lockfile: allow file locking to be retried with a timeout
  lock_packed_refs(): allow retries when acquiring the packed-refs lock

 Documentation/config.txt |  6 ++++
 lockfile.c               | 79 ++++++++++++++++++++++++++++++++++++++++++++++--
 lockfile.h               | 16 ++++++++--
 refs.c                   | 12 +++++++-
 t/t3210-pack-refs.sh     | 17 +++++++++++
 5 files changed, 125 insertions(+), 5 deletions(-)

-- 
2.1.4

^ permalink raw reply	[flat|nested] 18+ messages in thread
* [PATCH 1/2] lockfile: allow file locking to be retried with a timeout
  2015-05-01 14:52 [PATCH 0/2] Retry attempts to acquire the packed-refs lock Michael Haggerty
@ 2015-05-01 14:52 ` Michael Haggerty
  2015-05-11 18:04   ` Junio C Hamano
  2015-05-01 14:52 ` [PATCH 2/2] lock_packed_refs(): allow retries when acquiring the packed-refs lock Michael Haggerty
  2015-05-03  2:12 ` [PATCH 0/2] Retry attempts to acquire " Junio C Hamano
  2 siblings, 1 reply; 18+ messages in thread

From: Michael Haggerty @ 2015-05-01 14:52 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Stefan Beller, Jeff King, git, Michael Haggerty

Currently, there is only one attempt to lock a file. If it fails, the
whole operation fails. But it might sometimes be advantageous to try
acquiring a file lock a few times before giving up. So add a new
function, hold_lock_file_for_update_timeout(), that allows a timeout
to be specified. Make hold_lock_file_for_update() a thin wrapper
around the new function.

If timeout_ms is positive, then retry for at least that many
milliseconds to acquire the lock. On each failed attempt, use select()
to wait for a backoff time that increases quadratically (capped at 1
second) and has a random component to prevent two processes from
getting synchronized. If timeout_ms is negative, retry indefinitely.

In a moment we will switch to using the new function when locking
packed-refs.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---

Notes (discussion):
    It would be easy to also add a hold_lock_file_for_append_timeout(),
    but I can't yet think of an application for it.

 lockfile.c | 79 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 lockfile.h | 16 +++++++++++--
 2 files changed, 91 insertions(+), 4 deletions(-)

diff --git a/lockfile.c b/lockfile.c
index 9889277..30e65e9 100644
--- a/lockfile.c
+++ b/lockfile.c
@@ -157,6 +157,80 @@ static int lock_file(struct lock_file *lk, const char *path, int flags)
 	return lk->fd;
 }
 
+static int sleep_microseconds(long us)
+{
+	struct timeval tv;
+	tv.tv_sec = 0;
+	tv.tv_usec = us;
+	return select(0, NULL, NULL, NULL, &tv);
+}
+
+/*
+ * Constants defining the gaps between attempts to lock a file. The
+ * first backoff period is approximately INITIAL_BACKOFF_MS
+ * milliseconds. The longest backoff period is approximately
+ * (BACKOFF_MAX_MULTIPLIER * INITIAL_BACKOFF_MS) milliseconds.
+ */
+#define INITIAL_BACKOFF_MS 1L
+#define BACKOFF_MAX_MULTIPLIER 1000
+
+/*
+ * Try locking path, retrying with quadratic backoff for at least
+ * timeout_ms milliseconds. If timeout_ms is 0, try locking the file
+ * exactly once. If timeout_ms is -1, try indefinitely.
+ */
+static int lock_file_timeout(struct lock_file *lk, const char *path,
+			     int flags, long timeout_ms)
+{
+	int n = 1;
+	int multiplier = 1;
+	long remaining_us = 0;
+	static int random_initialized = 0;
+
+	if (timeout_ms == 0)
+		return lock_file(lk, path, flags);
+
+	if (!random_initialized) {
+		srandom((unsigned int)getpid());
+		random_initialized = 1;
+	}
+
+	if (timeout_ms > 0) {
+		/* avoid overflow */
+		if (timeout_ms <= LONG_MAX / 1000)
+			remaining_us = timeout_ms * 1000;
+		else
+			remaining_us = LONG_MAX;
+	}
+
+	while (1) {
+		long backoff_ms, wait_us;
+		int fd;
+
+		fd = lock_file(lk, path, flags);
+
+		if (fd >= 0)
+			return fd; /* success */
+		else if (errno != EEXIST)
+			return -1; /* failure other than lock held */
+		else if (timeout_ms > 0 && remaining_us <= 0)
+			return -1; /* failure due to timeout */
+
+		backoff_ms = multiplier * INITIAL_BACKOFF_MS;
+		/* back off for between 0.75*backoff_ms and 1.25*backoff_ms */
+		wait_us = (750 + random() % 500) * backoff_ms;
+		sleep_microseconds(wait_us);
+		remaining_us -= wait_us;
+
+		/* Recursion: (n+1)^2 = n^2 + 2n + 1 */
+		multiplier += 2*n + 1;
+		if (multiplier > BACKOFF_MAX_MULTIPLIER)
+			multiplier = BACKOFF_MAX_MULTIPLIER;
+		else
+			n++;
+	}
+}
+
 void unable_to_lock_message(const char *path, int err, struct strbuf *buf)
 {
 	if (err == EEXIST) {
@@ -179,9 +253,10 @@ NORETURN void unable_to_lock_die(const char *path, int err)
 }
 
 /* This should return a meaningful errno on failure */
-int hold_lock_file_for_update(struct lock_file *lk, const char *path, int flags)
+int hold_lock_file_for_update_timeout(struct lock_file *lk, const char *path,
+				      int flags, long timeout_ms)
 {
-	int fd = lock_file(lk, path, flags);
+	int fd = lock_file_timeout(lk, path, flags, timeout_ms);
 	if (fd < 0 && (flags & LOCK_DIE_ON_ERROR))
 		unable_to_lock_die(path, errno);
 	return fd;
diff --git a/lockfile.h b/lockfile.h
index cd2ec95..b4abc61 100644
--- a/lockfile.h
+++ b/lockfile.h
@@ -74,8 +74,20 @@ struct lock_file {
 extern void unable_to_lock_message(const char *path, int err, struct strbuf *buf);
 extern NORETURN void unable_to_lock_die(const char *path, int err);
-extern int hold_lock_file_for_update(struct lock_file *, const char *path, int);
-extern int hold_lock_file_for_append(struct lock_file *, const char *path, int);
+extern int hold_lock_file_for_update_timeout(
+		struct lock_file *lk, const char *path,
+		int flags, long timeout_ms);
+
+static inline int hold_lock_file_for_update(
+		struct lock_file *lk, const char *path,
+		int flags)
+{
+	return hold_lock_file_for_update_timeout(lk, path, flags, 0);
+}
+
+extern int hold_lock_file_for_append(struct lock_file *lk, const char *path,
+				     int flags);
+
 extern FILE *fdopen_lock_file(struct lock_file *, const char *mode);
 extern char *get_locked_file_path(struct lock_file *);
 extern int commit_lock_file_to(struct lock_file *, const char *path);
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 18+ messages in thread
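To see how a caller would use the new entry point, here is a minimal
sketch (not from the patch; the file name, the 500 ms budget, and the
helper function are invented for illustration, and git's internal
headers and helpers such as git_path(), error(), write_in_full(), and
commit_lock_file() are assumed):

	/*
	 * Hypothetical caller: take "some-file.lock" with up to
	 * 500 ms of retrying, write new contents, then commit by
	 * renaming the lockfile into place.
	 */
	static struct lock_file lock;

	static int update_some_file(const char *contents)
	{
		int fd = hold_lock_file_for_update_timeout(
				&lock, git_path("some-file"), 0, 500);
		if (fd < 0)
			return error("could not lock some-file: %s",
				     strerror(errno));
		if (write_in_full(fd, contents, strlen(contents)) < 0) {
			rollback_lock_file(&lock);
			return error("could not write some-file");
		}
		return commit_lock_file(&lock);
	}

Note how existing callers are unaffected: hold_lock_file_for_update()
becomes an inline wrapper that passes a timeout of 0, i.e. a single
attempt, exactly as before.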
* Re: [PATCH 1/2] lockfile: allow file locking to be retried with a timeout
  2015-05-01 14:52 ` [PATCH 1/2] lockfile: allow file locking to be retried with a timeout Michael Haggerty
@ 2015-05-11 18:04   ` Junio C Hamano
  0 siblings, 0 replies; 18+ messages in thread

From: Junio C Hamano @ 2015-05-11 18:04 UTC (permalink / raw)
  To: Michael Haggerty; +Cc: Stefan Beller, Jeff King, git

Michael Haggerty <mhagger@alum.mit.edu> writes:

> Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
> ---
>
> Notes (discussion):
>     It would be easy to also add a hold_lock_file_for_append_timeout(),
>     but I can't yet think of an application for it.
>
>  lockfile.c | 79 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
>  lockfile.h | 16 +++++++++++--
>  2 files changed, 91 insertions(+), 4 deletions(-)
>
> diff --git a/lockfile.c b/lockfile.c
> index 9889277..30e65e9 100644
> --- a/lockfile.c
> +++ b/lockfile.c
> @@ -157,6 +157,80 @@ static int lock_file(struct lock_file *lk, const char *path, int flags)
>  	return lk->fd;
>  }
>  
> +static int sleep_microseconds(long us)
> +{
> +	struct timeval tv;
> +	tv.tv_sec = 0;
> +	tv.tv_usec = us;
> +	return select(0, NULL, NULL, NULL, &tv);
> +}

This is familiar for an old-timer like me.

It seems help.c uses "poll(NULL, 0, timeout)" for "do nothing but
wait" (and we have compat/poll to help those who lack poll() by using
select()). As we do not need microsecond resolution here, either would
be fine ;-)

^ permalink raw reply	[flat|nested] 18+ messages in thread
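For reference, the poll()-based variant Junio alludes to would look
roughly like this (a sketch, not code from the series; it assumes
<poll.h>, with compat/poll filling in on platforms that lack it):

	#include <poll.h>

	/* Sleep for approximately ms milliseconds; no fds are watched. */
	static int sleep_milliseconds(long ms)
	{
		return poll(NULL, 0, ms);
	}

Since the backoff periods are computed in whole milliseconds anyway,
nothing would be lost by trading select()'s microsecond argument for
poll()'s millisecond one.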
* [PATCH 2/2] lock_packed_refs(): allow retries when acquiring the packed-refs lock
  2015-05-01 14:52 [PATCH 0/2] Retry attempts to acquire the packed-refs lock Michael Haggerty
  2015-05-01 14:52 ` [PATCH 1/2] lockfile: allow file locking to be retried with a timeout Michael Haggerty
@ 2015-05-01 14:52 ` Michael Haggerty
  2015-05-01 16:13   ` Johannes Sixt
  2015-05-01 17:51   ` Stefan Beller
  2015-05-03  2:12 ` [PATCH 0/2] Retry attempts to acquire " Junio C Hamano
  2 siblings, 2 replies; 18+ messages in thread

From: Michael Haggerty @ 2015-05-01 14:52 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Stefan Beller, Jeff King, git, Michael Haggerty

Currently, there is only one attempt to acquire any lockfile, and if
the lock is held by another process, the locking attempt fails
immediately.

This is not much of a limitation for loose reference files. First,
they don't take long to rewrite. Second, most reference updates have a
known "old" value, so if another process is updating a reference at
the same moment that we are trying to lock it, then probably the
expected "old" value will no longer be valid, and the update will fail
anyway.

But these arguments do not hold for packed-refs:

* The packed-refs file can be large and take significant time to
  rewrite.

* Many references are stored in a single packed-refs file, so it could
  be that the other process was changing a different reference than
  the one that we are interested in.

Therefore, it is much more likely for there to be spurious lock
conflicts in connection with the packed-refs file, resulting in
unnecessary command failures.

So, if the first attempt to lock the packed-refs file fails, continue
retrying for a configurable length of time before giving up. The
default timeout is 1 second.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---

Notes (discussion):
    At first I wasn't going to make this setting configurable. After all,
    who could object to one second?

    But then I realized that making the length of the timeout configurable
    would make it easier to test. Since sleep(1) is only guaranteed to
    have a 1-second resolution, it is helpful to set the timeout longer
    than 1 second in the test. So I made it configurable, and documented
    the setting.

    I don't have a strong opinion either way.

    By the way, if anybody has a better idea for how to test this, please
    chime in. As written, the two new tests each take about one second to
    run (though they are mostly idle during that time, so they parallelize
    well with other tests).

 Documentation/config.txt |  6 ++++++
 refs.c                   | 12 +++++++++++-
 t/t3210-pack-refs.sh     | 17 +++++++++++++++++
 3 files changed, 34 insertions(+), 1 deletion(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 2e5ceaf..3c24e10 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -622,6 +622,12 @@ core.commentChar::
 	If set to "auto", `git-commit` would select a character that is not
 	the beginning character of any line in existing commit messages.
 
+core.packedRefsTimeout::
+	The length of time, in milliseconds, to retry when trying to
+	lock the `packed-refs` file. Value 0 means not to retry at
+	all; -1 means to try indefinitely. Default is 1000 (i.e.,
+	retry for 1 second).
+
 sequence.editor::
 	Text editor used by `git rebase -i` for editing the rebase instruction file.
 	The value is meant to be interpreted by the shell when it is used.
diff --git a/refs.c b/refs.c
index 47e4e53..3f8ac63 100644
--- a/refs.c
+++ b/refs.c
@@ -2413,9 +2413,19 @@ static int write_packed_entry_fn(struct ref_entry *entry, void *cb_data)
 /* This should return a meaningful errno on failure */
 int lock_packed_refs(int flags)
 {
+	static int timeout_configured = 0;
+	static int timeout_value = 1000;
+
 	struct packed_ref_cache *packed_ref_cache;
 
-	if (hold_lock_file_for_update(&packlock, git_path("packed-refs"), flags) < 0)
+	if (!timeout_configured) {
+		git_config_get_int("core.packedrefstimeout", &timeout_value);
+		timeout_configured = 1;
+	}
+
+	if (hold_lock_file_for_update_timeout(
+			    &packlock, git_path("packed-refs"),
+			    flags, timeout_value) < 0)
 		return -1;
 	/*
 	 * Get the current packed-refs while holding the lock. If the
diff --git a/t/t3210-pack-refs.sh b/t/t3210-pack-refs.sh
index aa9eb3a..d18b726 100755
--- a/t/t3210-pack-refs.sh
+++ b/t/t3210-pack-refs.sh
@@ -187,4 +187,21 @@ test_expect_success 'notice d/f conflict with existing ref' '
 	test_must_fail git branch foo/bar/baz/lots/of/extra/components
 '
 
+test_expect_success 'timeout if packed-refs.lock exists' '
+	LOCK=.git/packed-refs.lock &&
+	>$LOCK &&
+	test_when_finished "rm -f $LOCK" &&
+	test_must_fail git pack-refs --all --prune
+'
+
+test_expect_success 'retry acquiring packed-refs.lock' '
+	LOCK=.git/packed-refs.lock &&
+	>$LOCK &&
+	test_when_finished "rm -f $LOCK" &&
+	{
+		( sleep 1 ; rm -f $LOCK ) &
+	} &&
+	git -c core.packedrefstimeout=3000 pack-refs --all --prune
+'
+
 test_done
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 18+ messages in thread
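For illustration, here is how a user or server operator could exercise
the new knob from the command line (the values are arbitrary examples,
not recommendations):

	# Retry for up to 3 seconds before giving up on packed-refs.lock:
	git config core.packedRefsTimeout 3000

	# Restore the old fail-immediately behavior for a single command:
	git -c core.packedRefsTimeout=0 pack-refs --all --prune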
* Re: [PATCH 2/2] lock_packed_refs(): allow retries when acquiring the packed-refs lock
  2015-05-01 14:52 ` [PATCH 2/2] lock_packed_refs(): allow retries when acquiring the packed-refs lock Michael Haggerty
@ 2015-05-01 16:13   ` Johannes Sixt
  2015-05-02  3:47     ` Michael Haggerty
  1 sibling, 1 reply; 18+ messages in thread

From: Johannes Sixt @ 2015-05-01 16:13 UTC (permalink / raw)
  To: Michael Haggerty; +Cc: Junio C Hamano, Stefan Beller, Jeff King, git

On 2015-05-01 16:52, Michael Haggerty wrote:
> +test_expect_success 'retry acquiring packed-refs.lock' '
> +	LOCK=.git/packed-refs.lock &&
> +	>$LOCK &&
> +	test_when_finished "rm -f $LOCK" &&
> +	{
> +		( sleep 1 ; rm -f $LOCK ) &
> +	} &&

I haven't tested yet, but I think that this will be problematic on
Windows: a directory cannot be removed if it is the current directory
of a process. Here, the sub-shell process is alive for a second. If the
remainder of the test script completes before the process dies, the
test directory cannot be removed.

How about this:

	test_when_finished "wait; rm -f $LOCK" &&
	{ sleep 1 & } &&
	...

> +	git -c core.packedrefstimeout=3000 pack-refs --all --prune
> +'
> +
>  test_done

-- Hannes

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: [PATCH 2/2] lock_packed_refs(): allow retries when acquiring the packed-refs lock
  2015-05-01 16:13   ` Johannes Sixt
@ 2015-05-02  3:47     ` Michael Haggerty
  2015-05-02 21:43       ` Johannes Sixt
  0 siblings, 1 reply; 18+ messages in thread

From: Michael Haggerty @ 2015-05-02 3:47 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Junio C Hamano, Stefan Beller, Jeff King, git

On 05/01/2015 06:13 PM, Johannes Sixt wrote:
> On 2015-05-01 16:52, Michael Haggerty wrote:
>> +test_expect_success 'retry acquiring packed-refs.lock' '
>> +	LOCK=.git/packed-refs.lock &&
>> +	>$LOCK &&
>> +	test_when_finished "rm -f $LOCK" &&
>> +	{
>> +		( sleep 1 ; rm -f $LOCK ) &
>> +	} &&
>
> I haven't tested yet, but I think that this will be problematic on
> Windows: a directory cannot be removed if it is the current directory
> of a process. Here, the sub-shell process is alive for a second. If the
> remainder of the test script completes before the process dies, the
> test directory cannot be removed.
>
> How about this:
>
> 	test_when_finished "wait; rm -f $LOCK" &&
> 	{ sleep 1 & } &&
> 	...
>
>> +	git -c core.packedrefstimeout=3000 pack-refs --all --prune
>> +'
>> +
>>  test_done

Thanks for pointing out this problem. Your suggestion seems good. I
assume that you didn't intend to omit the "rm -f $LOCK" from the
subprocess, because the whole point is for that to happen while "git
pack-refs" is running.

I will include your change in v2.

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: [PATCH 2/2] lock_packed_refs(): allow retries when acquiring the packed-refs lock
  2015-05-02  3:47     ` Michael Haggerty
@ 2015-05-02 21:43       ` Johannes Sixt
  2015-05-11  9:31         ` Michael Haggerty
  0 siblings, 1 reply; 18+ messages in thread

From: Johannes Sixt @ 2015-05-02 21:43 UTC (permalink / raw)
  To: Michael Haggerty; +Cc: Junio C Hamano, Stefan Beller, Jeff King, git

On 2015-05-02 05:47, Michael Haggerty wrote:
> On 05/01/2015 06:13 PM, Johannes Sixt wrote:
>> How about this:
>>
>> 	test_when_finished "wait; rm -f $LOCK" &&
>> 	{ sleep 1 & } &&
>> 	...
>
> Thanks for pointing out this problem. Your suggestion seems good. I
> assume that you didn't intend to omit the "rm -f $LOCK" from the
> subprocess, because the whole point is for that to happen while "git
> pack-refs" is running.

I see. So, if git pack-refs works correctly, it waits for the
sub-process, and the 'wait' in test_when_finished does not buy a lot.
If there is breakage, no attempt is made to remove the trash directory,
so it does not matter that a process potentially occupies it. I think
your version is good then.

-- Hannes

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: [PATCH 2/2] lock_packed_refs(): allow retries when acquiring the packed-refs lock
  2015-05-02 21:43       ` Johannes Sixt
@ 2015-05-11  9:31         ` Michael Haggerty
  0 siblings, 0 replies; 18+ messages in thread

From: Michael Haggerty @ 2015-05-11 9:31 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Junio C Hamano, Stefan Beller, Jeff King, git

On 05/02/2015 11:43 PM, Johannes Sixt wrote:
> On 2015-05-02 05:47, Michael Haggerty wrote:
>> On 05/01/2015 06:13 PM, Johannes Sixt wrote:
>>> How about this:
>>>
>>> 	test_when_finished "wait; rm -f $LOCK" &&
>>> 	{ sleep 1 & } &&
>>> 	...
>>
>> Thanks for pointing out this problem. Your suggestion seems good. I
>> assume that you didn't intend to omit the "rm -f $LOCK" from the
>> subprocess, because the whole point is for that to happen while "git
>> pack-refs" is running.
>
> I see. So, if git pack-refs works correctly, it waits for the
> sub-process, and the 'wait' in test_when_finished does not buy a lot.
> If there is breakage, no attempt is made to remove the trash directory,
> so it does not matter that a process potentially occupies it. I think
> your version is good then.

Well, even if my version is working correctly, there is still a gap
between when the subprocess removes the lockfile and when the
subprocess ends, during which the outer process could theoretically
finish and try to delete the test directory. It may not cause problems
in practice, but let's add the wait anyway to make everything kosher.

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu

^ permalink raw reply	[flat|nested] 18+ messages in thread
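Putting the two points together, the retry test in the re-roll would
presumably end up in roughly this shape (a sketch of the agreed-upon
fix, not the final v2 patch): the subprocess still removes the lock
mid-run, and the 'wait' in the cleanup closes the window Michael
describes above.

	test_expect_success 'retry acquiring packed-refs.lock' '
		LOCK=.git/packed-refs.lock &&
		>$LOCK &&
		test_when_finished "wait; rm -f $LOCK" &&
		{
			( sleep 1 ; rm -f $LOCK ) &
		} &&
		git -c core.packedrefstimeout=3000 pack-refs --all --prune
	'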
* Re: [PATCH 2/2] lock_packed_refs(): allow retries when acquiring the packed-refs lock
  2015-05-01 14:52 ` [PATCH 2/2] lock_packed_refs(): allow retries when acquiring the packed-refs lock Michael Haggerty
  2015-05-01 16:13   ` Johannes Sixt
@ 2015-05-01 17:51   ` Stefan Beller
  2015-05-01 18:22     ` Jeff King
  1 sibling, 1 reply; 18+ messages in thread

From: Stefan Beller @ 2015-05-01 17:51 UTC (permalink / raw)
  To: Michael Haggerty; +Cc: Junio C Hamano, Jeff King, git@vger.kernel.org

On Fri, May 1, 2015 at 7:52 AM, Michael Haggerty <mhagger@alum.mit.edu> wrote:
> Currently, there is only one attempt to acquire any lockfile, and if
> the lock is held by another process, the locking attempt fails
> immediately.
>
> This is not much of a limitation for loose reference files. First,
> they don't take long to rewrite. Second, most reference updates have a
> known "old" value, so if another process is updating a reference at
> the same moment that we are trying to lock it, then probably the
> expected "old" value will no longer be valid, and the update will fail
> anyway.
>
> But these arguments do not hold for packed-refs:
>
> * The packed-refs file can be large and take significant time to
>   rewrite.
>
> * Many references are stored in a single packed-refs file, so it could
>   be that the other process was changing a different reference than
>   the one that we are interested in.
>
> Therefore, it is much more likely for there to be spurious lock
> conflicts in connection with the packed-refs file, resulting in
> unnecessary command failures.
>
> So, if the first attempt to lock the packed-refs file fails, continue
> retrying for a configurable length of time before giving up. The
> default timeout is 1 second.
>
> Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
> ---
>
> Notes (discussion):
>     At first I wasn't going to make this setting configurable. After all,
>     who could object to one second?
>
>     But then I realized that making the length of the timeout configurable
>     would make it easier to test. Since sleep(1) is only guaranteed to
>     have a 1-second resolution, it is helpful to set the timeout longer
>     than 1 second in the test. So I made it configurable, and documented
>     the setting.
>
>     I don't have a strong opinion either way.
>
>     By the way, if anybody has a better idea for how to test this, please
>     chime in. As written, the two new tests each take about one second to
>     run (though they are mostly idle during that time, so they parallelize
>     well with other tests).
>
>  Documentation/config.txt |  6 ++++++
>  refs.c                   | 12 +++++++++++-
>  t/t3210-pack-refs.sh     | 17 +++++++++++++++++
>  3 files changed, 34 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index 2e5ceaf..3c24e10 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -622,6 +622,12 @@ core.commentChar::
>  	If set to "auto", `git-commit` would select a character that is not
>  	the beginning character of any line in existing commit messages.
>
> +core.packedRefsTimeout::
> +	The length of time, in milliseconds, to retry when trying to
> +	lock the `packed-refs` file. Value 0 means not to retry at
> +	all; -1 means to try indefinitely. Default is 1000 (i.e.,
> +	retry for 1 second).
> +
>  sequence.editor::
>  	Text editor used by `git rebase -i` for editing the rebase instruction file.
>  	The value is meant to be interpreted by the shell when it is used.
> diff --git a/refs.c b/refs.c
> index 47e4e53..3f8ac63 100644
> --- a/refs.c
> +++ b/refs.c
> @@ -2413,9 +2413,19 @@ static int write_packed_entry_fn(struct ref_entry *entry, void *cb_data)
>  /* This should return a meaningful errno on failure */
>  int lock_packed_refs(int flags)
>  {
> +	static int timeout_configured = 0;
> +	static int timeout_value = 1000;

I'd personally be happier with a default value of 100 ms or less.
The reason is found in human nature, as humans tend to perceive
anything faster than 100 ms as "instant"[1], while 100 ms is a long
time for computers.

Now a small default time may lead to too few retries, so maybe it's
worth checking once more at the very end of the timeout (ignoring the
computed backoff times). As pushes to $server usually take a while
(connecting, packing packs, writing objects, etc.), this may be
overcautious bikeshedding from my side.

[1] http://stackoverflow.com/questions/536300/what-is-the-shortest-perceivable-application-response-delay

> +
>  	struct packed_ref_cache *packed_ref_cache;
>
> -	if (hold_lock_file_for_update(&packlock, git_path("packed-refs"), flags) < 0)
> +	if (!timeout_configured) {
> +		git_config_get_int("core.packedrefstimeout", &timeout_value);
> +		timeout_configured = 1;
> +	}
> +
> +	if (hold_lock_file_for_update_timeout(
> +			    &packlock, git_path("packed-refs"),
> +			    flags, timeout_value) < 0)
>  		return -1;
>  	/*
>  	 * Get the current packed-refs while holding the lock. If the
> diff --git a/t/t3210-pack-refs.sh b/t/t3210-pack-refs.sh
> index aa9eb3a..d18b726 100755
> --- a/t/t3210-pack-refs.sh
> +++ b/t/t3210-pack-refs.sh
> @@ -187,4 +187,21 @@ test_expect_success 'notice d/f conflict with existing ref' '
>  	test_must_fail git branch foo/bar/baz/lots/of/extra/components
>  '
>
> +test_expect_success 'timeout if packed-refs.lock exists' '
> +	LOCK=.git/packed-refs.lock &&
> +	>$LOCK &&
> +	test_when_finished "rm -f $LOCK" &&
> +	test_must_fail git pack-refs --all --prune
> +'
> +
> +test_expect_success 'retry acquiring packed-refs.lock' '
> +	LOCK=.git/packed-refs.lock &&
> +	>$LOCK &&
> +	test_when_finished "rm -f $LOCK" &&
> +	{
> +		( sleep 1 ; rm -f $LOCK ) &
> +	} &&
> +	git -c core.packedrefstimeout=3000 pack-refs --all --prune
> +'
> +
>  test_done
> --
> 2.1.4
>

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: [PATCH 2/2] lock_packed_refs(): allow retries when acquiring the packed-refs lock
  2015-05-01 17:51   ` Stefan Beller
@ 2015-05-01 18:22     ` Jeff King
  2015-05-02  5:19       ` Michael Haggerty
  0 siblings, 1 reply; 18+ messages in thread

From: Jeff King @ 2015-05-01 18:22 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Michael Haggerty, Junio C Hamano, git@vger.kernel.org

On Fri, May 01, 2015 at 10:51:47AM -0700, Stefan Beller wrote:

> > diff --git a/refs.c b/refs.c
> > index 47e4e53..3f8ac63 100644
> > --- a/refs.c
> > +++ b/refs.c
> > @@ -2413,9 +2413,19 @@ static int write_packed_entry_fn(struct ref_entry *entry, void *cb_data)
> >  /* This should return a meaningful errno on failure */
> >  int lock_packed_refs(int flags)
> >  {
> > +	static int timeout_configured = 0;
> > +	static int timeout_value = 1000;
>
> I'd personally be happier with a default value of 100 ms or less.
> The reason is found in human nature, as humans tend to perceive
> anything faster than 100 ms as "instant"[1], while 100 ms is a long
> time for computers.
>
> Now a small default time may lead to too few retries, so maybe it's
> worth checking once more at the very end of the timeout (ignoring the
> computed backoff times). As pushes to $server usually take a while
> (connecting, packing packs, writing objects, etc.), this may be
> overcautious bikeshedding from my side.

Keep in mind that this 1s is the maximum time to wait. The
lock_file_timeout() code from patch 1 starts off at 1ms, grows
quadratically, and quits as soon as it succeeds. So in most cases, the
user will wait a much smaller amount of time.

The factors that go into this timeout length are really:

  1. If there's a stale lockfile, the user will have to wait the whole
     period. How long do we keep retrying before giving up?

  2. How long do we typically hold the lock for? Aside from absurd
     cases, writing out the packed-refs file isn't that expensive. But
     while holding the packed-refs lock, we may actually be iterating
     the loose refs, which can be rather slow on a cold cache.

If we want to improve _responsiveness_ in the normal case, I think it's
not the max timeout we want to tweak but the resolution of retries.
That's set in patch 1 by the maximum backoff multiplier, which can put
us up to 1s between retries. It might make sense to drop that to 500ms
or even less.

-Peff

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: [PATCH 2/2] lock_packed_refs(): allow retries when acquiring the packed-refs lock
  2015-05-01 18:22     ` Jeff King
@ 2015-05-02  5:19       ` Michael Haggerty
  2015-05-04 17:31         ` Stefan Beller
  2015-05-05 19:21         ` Jeff King
  0 siblings, 2 replies; 18+ messages in thread

From: Michael Haggerty @ 2015-05-02 5:19 UTC (permalink / raw)
  To: Jeff King, Stefan Beller; +Cc: Junio C Hamano, git@vger.kernel.org

On 05/01/2015 08:22 PM, Jeff King wrote:
> On Fri, May 01, 2015 at 10:51:47AM -0700, Stefan Beller wrote:
>
>>> diff --git a/refs.c b/refs.c
>>> index 47e4e53..3f8ac63 100644
>>> --- a/refs.c
>>> +++ b/refs.c
>>> @@ -2413,9 +2413,19 @@ static int write_packed_entry_fn(struct ref_entry *entry, void *cb_data)
>>>  /* This should return a meaningful errno on failure */
>>>  int lock_packed_refs(int flags)
>>>  {
>>> +	static int timeout_configured = 0;
>>> +	static int timeout_value = 1000;
>>
>> I'd personally be happier with a default value of 100 ms or less.
>> The reason is found in human nature, as humans tend to perceive
>> anything faster than 100 ms as "instant"[1], while 100 ms is a long
>> time for computers.
>>
>> Now a small default time may lead to too few retries, so maybe it's
>> worth checking once more at the very end of the timeout (ignoring the
>> computed backoff times). As pushes to $server usually take a while
>> (connecting, packing packs, writing objects, etc.), this may be
>> overcautious bikeshedding from my side.
>
> Keep in mind that this 1s is the maximum time to wait. The
> lock_file_timeout() code from patch 1 starts off at 1ms, grows
> quadratically, and quits as soon as it succeeds. So in most cases, the
> user will wait a much smaller amount of time.
>
> The factors that go into this timeout length are really:
>
>   1. If there's a stale lockfile, the user will have to wait the whole
>      period. How long do we keep retrying before giving up?
>
>   2. How long do we typically hold the lock for? Aside from absurd
>      cases, writing out the packed-refs file isn't that expensive. But
>      while holding the packed-refs lock, we may actually be iterating
>      the loose refs, which can be rather slow on a cold cache.
>
> If we want to improve _responsiveness_ in the normal case, I think it's
> not the max timeout we want to tweak but the resolution of retries.
> That's set in patch 1 by the maximum backoff multiplier, which can put
> us up to 1s between retries. It might make sense to drop that to 500ms
> or even less.

Thanks for the discussion.

100 ms seems to be considered an acceptable delay between the time that
a user, say, clicks a button and the time that the button reacts. What
we are talking about is the time between the release of a lock by one
process and the resumption of another process that was blocked waiting
for the lock. The former is probably not under the control of the user
anyway, and perhaps not even observable by the user. Thus I don't think
that a perceivable delay between that event and the resumption of the
blocked process would be annoying. The more salient delay is between
the time that the user started the blocked command and when that
command completed. Let's look in more detail.
The current code would poll at the following times (in ms), ignoring
the time taken for the actual polling attempt and ignoring the random
component:

  time backoff percent
  ---- ------- -------
     0       1     N/A
     1       4    400%
     5       9    180%
    14      16    114%
    30      25     83%
    55      36     65%
    91      49     54%
   140      64     46%
   204      81     40%
   285     100     35%
   385     121     31%
   506     144     28%
   650     169     26%
   819     196     24%
  1015     225     22%  <- Stop here with the default 1 s timeout
  1240     256     21%
  1496     289     19%
  1785     324     18%
  2109     361     17%
  2470     400     16%
  2870     441     15%
  3311     484     15%
  3795     529     14%
  4324     576     13%
  4900     625     13%
  5525     676     12%
  6201     729     12%
  6930     784     11%
  7714     841     11%
  8555     900     11%
  9455     961     10%
 10416    1000     10%
 11416    1000      9%
 12416    1000      8%

From the table, the first backoff that is longer than 100 ms doesn't
start until 385 ms, and in the worst case, that 121 ms delay would
increase the *total* wait by only 31%. And the longest backoff within
the first second is only 196 ms. The backoff doesn't reach its maximum
value, 1 s, until the process has already been blocked for 10.4 seconds.

Remember, these backoffs are the *maximum* time that the user might
wait between the time that one process is finished and the time that
the second process resumes. The *average* wait time will be half of
that.

And finally, remember that lock contention should be very rare in the
first place, and will mostly occur on servers (because normal users are
unlikely to have lots of parallel processes accessing a single git
repository). What are the most likely scenarios for lock contention in
the real world?

* Occasionally, by chance, under normal operations. In most cases, the
  blocking process will finish up within a few milliseconds, the
  blocked process will resume almost immediately, and nobody will
  notice a thing.

* In a pathological repository that has, say, a million packed
  references, and writing the packed-refs file takes unusually long.
  Here, typical lock contention delays will be long anyway, and adding
  (worst case) 121 ms == 31% to the delay is not unreasonable.

* When the server is struggling with enormous load, or a
  denial-of-service attack, or some other system-wide problem that is
  making processes super slow. In this case it would be
  counterproductive to have *additional* processes waking up every
  100 ms.

* When a packed-refs.lock file fails to get cleaned up for some reason.
  In this case the "contention" will never go away on its own, so the
  polling is wasted effort. (Happily, I've almost never seen this
  happen on our servers.)

It would be trivial to increase or decrease the maximum backoff. But I
think the current value is a reasonable compromise.

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu

^ permalink raw reply	[flat|nested] 18+ messages in thread
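The schedule in Michael's table can be reproduced with a few lines of
standalone C that mirror the loop in patch 1 (a sketch that, like the
table, ignores the random component and the time spent in the locking
attempt itself; it prints "0%" where the table says N/A):

	#include <stdio.h>

	#define INITIAL_BACKOFF_MS 1L
	#define BACKOFF_MAX_MULTIPLIER 1000

	int main(void)
	{
		long time_ms = 0;
		int n = 1, multiplier = 1;

		while (time_ms <= 12416) {
			long backoff_ms = multiplier * INITIAL_BACKOFF_MS;
			printf("%6ld %7ld %6.0f%%\n", time_ms, backoff_ms,
			       time_ms ? 100.0 * backoff_ms / time_ms : 0.0);
			time_ms += backoff_ms;
			/* same recurrence as lock_file_timeout() */
			multiplier += 2 * n + 1;
			if (multiplier > BACKOFF_MAX_MULTIPLIER)
				multiplier = BACKOFF_MAX_MULTIPLIER;
			else
				n++;
		}
		return 0;
	}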
* Re: [PATCH 2/2] lock_packed_refs(): allow retries when acquiring the packed-refs lock
  2015-05-02  5:19       ` Michael Haggerty
@ 2015-05-04 17:31         ` Stefan Beller
  0 siblings, 0 replies; 18+ messages in thread

From: Stefan Beller @ 2015-05-04 17:31 UTC (permalink / raw)
  To: Michael Haggerty; +Cc: Jeff King, Junio C Hamano, git@vger.kernel.org

On Fri, May 1, 2015 at 10:19 PM, Michael Haggerty <mhagger@alum.mit.edu> wrote:
> On 05/01/2015 08:22 PM, Jeff King wrote:
>> On Fri, May 01, 2015 at 10:51:47AM -0700, Stefan Beller wrote:
>>
>>>> diff --git a/refs.c b/refs.c
>>>> index 47e4e53..3f8ac63 100644
>>>> --- a/refs.c
>>>> +++ b/refs.c
>>>> @@ -2413,9 +2413,19 @@ static int write_packed_entry_fn(struct ref_entry *entry, void *cb_data)
>>>>  /* This should return a meaningful errno on failure */
>>>>  int lock_packed_refs(int flags)
>>>>  {
>>>> +	static int timeout_configured = 0;
>>>> +	static int timeout_value = 1000;
>>>
>>> I'd personally be happier with a default value of 100 ms or less.
>>> The reason is found in human nature, as humans tend to perceive
>>> anything faster than 100 ms as "instant"[1], while 100 ms is a long
>>> time for computers.
>>>
>>> Now a small default time may lead to too few retries, so maybe it's
>>> worth checking once more at the very end of the timeout (ignoring the
>>> computed backoff times). As pushes to $server usually take a while
>>> (connecting, packing packs, writing objects, etc.), this may be
>>> overcautious bikeshedding from my side.
>>
>> Keep in mind that this 1s is the maximum time to wait. The
>> lock_file_timeout() code from patch 1 starts off at 1ms, grows
>> quadratically, and quits as soon as it succeeds. So in most cases, the
>> user will wait a much smaller amount of time.

Yes, I forgot about that when forming my opinion. I agree a one-second
timeout is reasonable.

> The current code would poll at the following times (in ms), ignoring
> the time taken for the actual polling attempt and ignoring the random
> component:
>
>   time backoff percent
>   ---- ------- -------
>      0       1     N/A
>      1       4    400%
>      5       9    180%
>     14      16    114%
>     30      25     83%
>     55      36     65%
>     91      49     54%
>    140      64     46%
>    204      81     40%
>    285     100     35%
>    385     121     31%
>    506     144     28%
>    650     169     26%
>    819     196     24%
>   1015     225     22%  <- Stop here with the default 1 s timeout

So a configuration of one second has only about twice as many attempts
as a 100 ms configuration in the worst case, which is nice for the
machine load, and, as I said above, not too long for the user. Thanks
for laying out the numbers here; it's much more understandable now.

>   1240     256     21%
>   1496     289     19%
>   1785     324     18%
>   2109     361     17%
>   2470     400     16%
>   2870     441     15%
>   3311     484     15%
>   3795     529     14%
>   4324     576     13%
>   4900     625     13%
>   5525     676     12%
>   6201     729     12%
>   6930     784     11%
>   7714     841     11%
>   8555     900     11%
>   9455     961     10%
>  10416    1000     10%
>  11416    1000      9%
>  12416    1000      8%
>
> From the table, the first backoff that is longer than 100 ms doesn't
> start until 385 ms, and in the worst case, that 121 ms delay would
> increase the *total* wait by only 31%. And the longest backoff within
> the first second is only 196 ms. The backoff doesn't reach its maximum
> value, 1 s, until the process has already been blocked for 10.4 seconds.
>
> Remember, these backoffs are the *maximum* time that the user might
> wait between the time that one process is finished and the time that
> the second process resumes. The *average* wait time will be half of
> that.
>
> And finally, remember that lock contention should be very rare in the
> first place, and will mostly occur on servers (because normal users are
> unlikely to have lots of parallel processes accessing a single git
> repository). What are the most likely scenarios for lock contention in
> the real world?
>
> * Occasionally, by chance, under normal operations. In most cases, the
>   blocking process will finish up within a few milliseconds, the
>   blocked process will resume almost immediately, and nobody will
>   notice a thing.
>
> * In a pathological repository that has, say, a million packed
>   references, and writing the packed-refs file takes unusually long.
>   Here, typical lock contention delays will be long anyway, and adding
>   (worst case) 121 ms == 31% to the delay is not unreasonable.
>
> * When the server is struggling with enormous load, or a
>   denial-of-service attack, or some other system-wide problem that is
>   making processes super slow. In this case it would be
>   counterproductive to have *additional* processes waking up every
>   100 ms.
>
> * When a packed-refs.lock file fails to get cleaned up for some reason.
>   In this case the "contention" will never go away on its own, so the
>   polling is wasted effort. (Happily, I've almost never seen this
>   happen on our servers.)
>
> It would be trivial to increase or decrease the maximum backoff. But I
> think the current value is a reasonable compromise.
>
> Michael
>
> --
> Michael Haggerty
> mhagger@alum.mit.edu

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: [PATCH 2/2] lock_packed_refs(): allow retries when acquiring the packed-refs lock
  2015-05-02  5:19       ` Michael Haggerty
  2015-05-04 17:31         ` Stefan Beller
@ 2015-05-05 19:21         ` Jeff King
  2015-05-11 10:26           ` Michael Haggerty
  1 sibling, 1 reply; 18+ messages in thread

From: Jeff King @ 2015-05-05 19:21 UTC (permalink / raw)
  To: Michael Haggerty; +Cc: Stefan Beller, Junio C Hamano, git@vger.kernel.org

On Sat, May 02, 2015 at 07:19:28AM +0200, Michael Haggerty wrote:

> 100 ms seems to be considered an acceptable delay between the time that
> a user, say, clicks a button and the time that the button reacts. What
> we are talking about is the time between the release of a lock by one
> process and the resumption of another process that was blocked waiting
> for the lock. The former is probably not under the control of the user
> anyway, and perhaps not even observable by the user. Thus I don't think
> that a perceivable delay between that event and the resumption of the
> blocked process would be annoying. The more salient delay is between
> the time that the user started the blocked command and when that
> command completed. Let's look in more detail.

Yeah, you can't impact when the other process will drop the lock, but
if we assume that it takes on the order of 100ms for the other process
to do its whole operation, then on average we experience half that. And
then tack on to that whatever time we waste in sleep() after the other
guy drops the lock. And that's on average half of our backoff time.

So something like 100ms max backoff makes sense to me, in that it keeps
us in the same order of magnitude as the expected time that the lock is
held.

Of course these numbers are all grossly hand-wavy, and as you point
out, the current formula never even hits 100ms with the current 1s
timeout, anyway. So for the record, I'm fine leaving your patch as-is.

I think for our disgusting 1GB packed-refs files at GitHub, we will end
up bumping the maximum timeout, but by my own argument above, it will
be fine for the backoff to increase at the same time (i.e., they will
remain in the same rough order of magnitude).

-Peff

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: [PATCH 2/2] lock_packed_refs(): allow retries when acquiring the packed-refs lock
  2015-05-05 19:21         ` Jeff King
@ 2015-05-11 10:26           ` Michael Haggerty
  2015-05-11 16:49             ` Jeff King
  0 siblings, 1 reply; 18+ messages in thread

From: Michael Haggerty @ 2015-05-11 10:26 UTC (permalink / raw)
  To: Jeff King; +Cc: Stefan Beller, Junio C Hamano, git@vger.kernel.org

On 05/05/2015 09:21 PM, Jeff King wrote:
> On Sat, May 02, 2015 at 07:19:28AM +0200, Michael Haggerty wrote:
>
>> 100 ms seems to be considered an acceptable delay between the time that
>> a user, say, clicks a button and the time that the button reacts. What
>> we are talking about is the time between the release of a lock by one
>> process and the resumption of another process that was blocked waiting
>> for the lock. The former is probably not under the control of the user
>> anyway, and perhaps not even observable by the user. Thus I don't think
>> that a perceivable delay between that event and the resumption of the
>> blocked process would be annoying. The more salient delay is between
>> the time that the user started the blocked command and when that
>> command completed. Let's look in more detail.
>
> Yeah, you can't impact when the other process will drop the lock, but
> if we assume that it takes on the order of 100ms for the other process
> to do its whole operation, then on average we experience half that. And
> then tack on to that whatever time we waste in sleep() after the other
> guy drops the lock. And that's on average half of our backoff time.
>
> So something like 100ms max backoff makes sense to me, in that it keeps
> us in the same order of magnitude as the expected time that the lock is
> held. [...]

I don't understand your argument. If another process blocks us for on
the order of 100 ms, the backoff time (reading from my table) is less
than half of that. It is only if another process blocks us for longer
that our backoff times grow larger than 100 ms. I don't see the point
of comparing those larger backoff numbers to hypothetical 100 ms
expected blocking times when the larger backoffs *can only happen* for
larger blocking times [1].

But even aside from bikeshedding about which backoff algorithm might be
a tiny bit better than another, let's remember that these locking
conflicts are INCREDIBLY RARE in real life. Current git doesn't have
any retry at all, but users don't seem to be noticeably upset.

In a moment I will submit a re-roll, changing the test case to add the
"wait" that Johannes suggested but leaving the maximum backoff time
unchanged. If anybody feels strongly about changing it, go ahead and do
so (or make it configurable). I like the current setting because I
think it makes more sense for servers, which is the only environment
where lock contention is likely to occur with any measurable frequency.

Michael

[1] For completeness, let's also consider a different scenario: Suppose
the blocking is not being caused by a single long-lived process but
rather by many short-lived processes running one after the other. In
that case the time we spend blocking depends more on the duty cycle of
the other blocking processes, so our backoff time could grow to be
longer than the mean time that any single process holds the lock. But
in this scenario we are throughput-limited rather than latency-limited,
so acquiring the lock sooner would only deprive another process of the
lock, without significantly improving the throughput of the system as a
whole.
(And given that the other processes are probably following the same
rules as we are, the shorter backoff times are just as often helping
them snatch the lock from us as us from them.)

-- 
Michael Haggerty
mhagger@alum.mit.edu

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: [PATCH 2/2] lock_packed_refs(): allow retries when acquiring the packed-refs lock
  2015-05-11 10:26           ` Michael Haggerty
@ 2015-05-11 16:49             ` Jeff King
  2015-05-11 17:49               ` Stefan Beller
  0 siblings, 1 reply; 18+ messages in thread

From: Jeff King @ 2015-05-11 16:49 UTC (permalink / raw)
  To: Michael Haggerty; +Cc: Stefan Beller, Junio C Hamano, git@vger.kernel.org

On Mon, May 11, 2015 at 12:26:23PM +0200, Michael Haggerty wrote:

> > So something like 100ms max backoff makes sense to me, in that it keeps
> > us in the same order of magnitude as the expected time that the lock is
> > held. [...]
>
> I don't understand your argument. If another process blocks us for on
> the order of 100 ms, the backoff time (reading from my table) is less
> than half of that.

I think it is just that I was agreeing with you, but communicated it
badly. I think your series is fine as-is.

-Peff

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: [PATCH 2/2] lock_packed_refs(): allow retries when acquiring the packed-refs lock
  2015-05-11 16:49             ` Jeff King
@ 2015-05-11 17:49               ` Stefan Beller
  2015-05-11 17:50                 ` Stefan Beller
  0 siblings, 1 reply; 18+ messages in thread

From: Stefan Beller @ 2015-05-11 17:49 UTC (permalink / raw)
  To: Jeff King; +Cc: Michael Haggerty, Junio C Hamano, git@vger.kernel.org

On Mon, May 11, 2015 at 9:49 AM, Jeff King <peff@peff.net> wrote:
> On Mon, May 11, 2015 at 12:26:23PM +0200, Michael Haggerty wrote:
>
>> > So something like 100ms max backoff makes sense to me, in that it keeps
>> > us in the same order of magnitude as the expected time that the lock is
>> > held. [...]
>>
>> I don't understand your argument. If another process blocks us for on
>> the order of 100 ms, the backoff time (reading from my table) is less
>> than half of that.
>
> I think it is just that I was agreeing with you, but communicated it
> badly. I think your series is fine as-is.

By now I also think your series is fine as-is.

I am currently implementing something similar for Gerrit, and I realize
that testing time-based things is a royal pain in the butt. The tests
you propose just take wall-clock time and all should be good, though
they slow down the test suite by another second or three in the worst
case.

So for tests in Gerrit I use a dedicated Java class which is
specialized to pretend that different times have passed, such that you
can write:

    doThings();
    pretendTimePassing(1 second);
    checkResultsFromThreads();

but running the tests takes less than 1 second, since no real
wall-clock time passes.

On my machine there is

    /bin/usleep - sleep some number of microseconds

which may be better for running the test suite. I don't know if it's
worth the hassle of checking whether that program is there and
selectively running a faster version of said test, though.

That said, I'm personally fine with the waste of 1 second, as I view
running the test suite as a long waiting game anyway. Maybe Junio as
the maintainer (I assume he runs more tests than all of us?) dislikes
waiting for the extra second all the time?

>
> -Peff

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: [PATCH 2/2] lock_packed_refs(): allow retries when acquiring the packed-refs lock
  2015-05-11 17:49               ` Stefan Beller
@ 2015-05-11 17:50                 ` Stefan Beller
  0 siblings, 0 replies; 18+ messages in thread

From: Stefan Beller @ 2015-05-11 17:50 UTC (permalink / raw)
  To: Jeff King; +Cc: Michael Haggerty, Junio C Hamano, git@vger.kernel.org

On Mon, May 11, 2015 at 10:49 AM, Stefan Beller <sbeller@google.com> wrote:
> On Mon, May 11, 2015 at 9:49 AM, Jeff King <peff@peff.net> wrote:
>> On Mon, May 11, 2015 at 12:26:23PM +0200, Michael Haggerty wrote:
>>
>>> > So something like 100ms max backoff makes sense to me, in that it keeps
>>> > us in the same order of magnitude as the expected time that the lock is
>>> > held. [...]
>>>
>>> I don't understand your argument. If another process blocks us for on
>>> the order of 100 ms, the backoff time (reading from my table) is less
>>> than half of that.
>>
>> I think it is just that I was agreeing with you, but communicated it
>> badly. I think your series is fine as-is.
>
> By now I also think your series is fine as-is.
>
> I am currently implementing something similar for Gerrit, and I realize
> that testing time-based things is a royal pain in the butt. The tests
> you propose just take wall-clock time and all should be good, though
> they slow down the test suite by another second or three in the worst
> case.
>
> So for tests in Gerrit I use a dedicated Java class which is
> specialized to pretend that different times have passed, such that you
> can write:
>
>     doThings();
>     pretendTimePassing(1 second);
>     checkResultsFromThreads();
>
> but running the tests takes less than 1 second, since no real
> wall-clock time passes.
>
> On my machine there is
>
>     /bin/usleep - sleep some number of microseconds

As we're using perl anyway, we may even want to just do

    perl -e "select(undef,undef,undef,0.1);"

as found at
http://serverfault.com/questions/469247/how-do-i-sleep-for-a-millisecond-in-bash-or-ksh

^ permalink raw reply	[flat|nested] 18+ messages in thread
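If the test suite ever wanted sub-second sleeps along those lines, a
tiny helper might look like this (a hypothetical sketch, not an
existing test-lib function; it relies only on perl, which the test
suite already requires):

	# Sleep for $1 milliseconds.
	sleep_ms () {
		perl -e "select(undef, undef, undef, $1 / 1000.0)"
	}

	# e.g., release the lock after ~100 ms instead of a full second:
	( sleep_ms 100; rm -f "$LOCK" ) &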
* Re: [PATCH 0/2] Retry attempts to acquire the packed-refs lock
  2015-05-01 14:52 [PATCH 0/2] Retry attempts to acquire the packed-refs lock Michael Haggerty
  2015-05-01 14:52 ` [PATCH 1/2] lockfile: allow file locking to be retried with a timeout Michael Haggerty
  2015-05-01 14:52 ` [PATCH 2/2] lock_packed_refs(): allow retries when acquiring the packed-refs lock Michael Haggerty
@ 2015-05-03  2:12 ` Junio C Hamano
  2 siblings, 0 replies; 18+ messages in thread

From: Junio C Hamano @ 2015-05-03 2:12 UTC (permalink / raw)
  To: Michael Haggerty; +Cc: Stefan Beller, Jeff King, git

Michael Haggerty <mhagger@alum.mit.edu> writes:

> At GitHub we were seeing occasional lock contention over packed-refs.
> It wasn't very common, but when you have as much git traffic as we
> have, anything that *can* happen *will* happen.
>
> The problem is that Git only tries to acquire locks a single time. If
> that attempt fails, the whole process fails. So, for example, if two
> processes are trying to delete two different references, one of them
> can fail due to inability to acquire the packed-refs lock, even though
> it could have succeeded if it had just waited a moment.

Yeah, try-and-abort may be perfectly fine for Git used interactively
by humans, but it is totally unsuitable for the server side. Thanks
for looking into this.

^ permalink raw reply	[flat|nested] 18+ messages in thread