* Concurrent pushes updating the same ref
From: Marc Branchaud @ 2011-01-06 15:46 UTC
To: Git Mailing List

Hi all,

[ BACKGROUND: I've modified our build system to push a custom ref at the
start of each build.  The aim is to identify in the repo which revision got
built.

For us, an overall "build" consists of creating about a dozen products, all
from the same source tree.  The build system (Hudson) launches each
product's build concurrently on one or more build slaves.  Each of those
individual product builds clones the repo, checks out the appropriate
revision, and pushes up the custom ref.  (I would have liked to make the
Hudson master job push up the ref, instead of all the slave jobs, but I
couldn't find a way to do that.) ]

Usually this works: Each slave is setting the ref to the same value, so the
order of the updates doesn't matter.  But every once in a while, the push
fails with:

  fatal: Unable to create
  '/usr/xiplink/git/public/Main.git/refs/builds/3.3.0-3.lock': File exists.
  If no other git process is currently running, this probably means a
  git process crashed in this repository earlier. Make sure no other git
  process is running and remove the file manually to continue.
  fatal: The remote end hung up unexpectedly

I think the cause is pretty obvious, and in a normal interactive situation
the solution would be to simply try again.  But in a script, trying again
isn't so straightforward.

So I'm wondering if there's any sense or desire to make git a little more
flexible here.  Maybe teach it to wait and try again once or twice when it
sees a lock file.  I presume that normally a ref lock file should disappear
pretty quickly, so there shouldn't be a need to wait very long.

Thoughts?

		M.
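Until git learns to retry on its own, the retrying can live in the build
script itself.  A minimal sketch of that workaround, assuming the script
already has TAG set and that re-pushing the same value is harmless (as it
is in this setup); the retry count and delay here are illustrative, not
tuned:

  #!/bin/sh
  # Retry the build-ref push a few times before giving up.
  n=0
  until git push origin "+HEAD:refs/builds/${TAG}"; do
      n=$((n + 1))
      if [ "$n" -ge 3 ]; then
          echo "push failed after $n attempts" >&2
          exit 1
      fi
      sleep 2
  done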
* Re: Concurrent pushes updating the same ref
From: Jeff King @ 2011-01-06 16:30 UTC
To: Marc Branchaud; +Cc: Git Mailing List

On Thu, Jan 06, 2011 at 10:46:38AM -0500, Marc Branchaud wrote:

> fatal: Unable to create
> '/usr/xiplink/git/public/Main.git/refs/builds/3.3.0-3.lock': File exists.
> If no other git process is currently running, this probably means a
> git process crashed in this repository earlier. Make sure no other git
> process is running and remove the file manually to continue.
> fatal: The remote end hung up unexpectedly
>
> I think the cause is pretty obvious, and in a normal interactive situation
> the solution would be to simply try again.  But in a script, trying again
> isn't so straightforward.
>
> So I'm wondering if there's any sense or desire to make git a little more
> flexible here.  Maybe teach it to wait and try again once or twice when it
> sees a lock file.  I presume that normally a ref lock file should disappear
> pretty quickly, so there shouldn't be a need to wait very long.

Yeah, we probably should try again.  The simplest possible (and untested)
patch is below.  However, a few caveats:

  1. This patch unconditionally retries for all lock files.  Do all
     callers want that?  I wonder if there are any exploratory lock
     acquisitions that would rather return immediately than have some
     delay.

  2. The number of tries and the sleep time are pulled out of a hat.

  3. Even with retries, I don't know if you will get the behavior you
     want.  The lock procedure for refs is:

       1. get the lock
       2. check and remember the sha1
       3. release the lock
       4. do some long-running work (like the actual push)
       5. get the lock
       6. check that the sha1 is the same as the remembered one
       7. update the sha1
       8. release the lock

     Right now you are getting contention on the lock itself.  But might
     you not also run afoul of step (6) above?  That is, one push updates
     the ref from A to B, then the other one, in attempting to go from A
     to B, sees that it has already changed to B under our feet and
     complains?

I can certainly think of a rule around that special case (if we are going
to B, and it already changed to B, silently leave it alone and pretend we
wrote it), but I don't know how often that would be useful in the real
world.

Anyway, the patch (for discussion, not inclusion) is below.

diff --git a/lockfile.c b/lockfile.c
index b0d74cd..3329719 100644
--- a/lockfile.c
+++ b/lockfile.c
@@ -122,7 +122,7 @@ static char *resolve_symlink(char *p, size_t s)
 }
 
-static int lock_file(struct lock_file *lk, const char *path, int flags)
+static int lock_file_single(struct lock_file *lk, const char *path, int flags)
 {
 	if (strlen(path) >= sizeof(lk->filename))
 		return -1;
@@ -155,6 +155,21 @@ static int lock_file(struct lock_file *lk, const char *path, int flags)
 	return lk->fd;
 }
 
+static int lock_file(struct lock_file *lk, const char *path, int flags)
+{
+	int tries;
+	int fd;
+	for (tries = 0; tries < 3; tries++) {
+		fd = lock_file_single(lk, path, flags);
+		if (fd >= 0)
+			return fd;
+		if (errno != EEXIST)
+			return fd;
+		sleep(1);
+	}
+	return fd;
+}
+
 static char *unable_to_lock_message(const char *path, int err)
 {
 	struct strbuf buf = STRBUF_INIT;
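The contention is easy to provoke when exercising a patch like this: race
two pushes of the same ref from two clones.  A rough sketch, with
illustrative clone and ref names:

  # From two clones of the same repository, push one ref concurrently.
  # One side may fail on the .lock file itself, or on the step-(6)
  # sha1 check described above.
  ( cd clone-a && git push origin +HEAD:refs/builds/test ) &
  ( cd clone-b && git push origin +HEAD:refs/builds/test ) &
  wait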
* Re: Concurrent pushes updating the same ref
From: Shawn Pearce @ 2011-01-06 16:48 UTC
To: Jeff King; +Cc: Marc Branchaud, Git Mailing List

On Thu, Jan 6, 2011 at 08:30, Jeff King <peff@peff.net> wrote:
>
> Yeah, we probably should try again.  The simplest possible (and untested)
> patch is below.  However, a few caveats:
>
>   1. This patch unconditionally retries for all lock files.  Do all
>      callers want that?  I wonder if there are any exploratory lock
>      acquisitions that would rather return immediately than have some
>      delay.

I don't see why not.  We shouldn't be exploring to see if a lock is
possible anywhere.

>   2. The number of tries and the sleep time are pulled out of a hat.

FWIW, JGit has started to do some of this stuff for Windows.  We're using
10 retries, with a delay of 100 milliseconds between each.  This was also
pulled out of a hat, but it seems to have resolved the bug reports that
came in on Windows.  We unfortunately have to do retries on directory and
file deletion.

>   3. Even with retries, I don't know if you will get the behavior you
>      want.  The lock procedure for refs is:
>
>        1. get the lock
>        2. check and remember the sha1
>        3. release the lock

Why are we locking the ref to read it?  You can read a ref atomically
without locking.

>        4. do some long-running work (like the actual push)
>        5. get the lock
>        6. check that the sha1 is the same as the remembered one
>        7. update the sha1
>        8. release the lock
>
>      Right now you are getting contention on the lock itself.  But might
>      you not also run afoul of step (6) above?  That is, one push updates
>      the ref from A to B, then the other one, in attempting to go from A
>      to B, sees that it has already changed to B under our feet and
>      complains?

Not if it's a force push. :-)

-- 
Shawn.
* Re: Concurrent pushes updating the same ref
From: Ilari Liusvaara @ 2011-01-06 17:28 UTC
To: Shawn Pearce; +Cc: Jeff King, Marc Branchaud, Git Mailing List

On Thu, Jan 06, 2011 at 08:48:11AM -0800, Shawn Pearce wrote:
> On Thu, Jan 6, 2011 at 08:30, Jeff King <peff@peff.net> wrote:
> >
> > Right now you are getting contention on the lock itself.  But might
> > you not also run afoul of step (6) above?  That is, one push updates
> > the ref from A to B, then the other one, in attempting to go from A
> > to B, sees that it has already changed to B under our feet and
> > complains?
>
> Not if it's a force push. :-)

IIRC, there are no wire-protocol bits to denote a forced push; the force
option only overrides client-side checks.  Thus, even forced pushes can
fail due to race conditions...

-Ilari
* Re: Concurrent pushes updating the same ref
From: Marc Branchaud @ 2011-01-06 17:12 UTC
To: Jeff King; +Cc: Git Mailing List

On 11-01-06 11:30 AM, Jeff King wrote:
> On Thu, Jan 06, 2011 at 10:46:38AM -0500, Marc Branchaud wrote:
>
>> fatal: Unable to create
>> '/usr/xiplink/git/public/Main.git/refs/builds/3.3.0-3.lock': File exists.
>> If no other git process is currently running, this probably means a
>> git process crashed in this repository earlier. Make sure no other git
>> process is running and remove the file manually to continue.
>> fatal: The remote end hung up unexpectedly
>>
>> I think the cause is pretty obvious, and in a normal interactive situation
>> the solution would be to simply try again.  But in a script, trying again
>> isn't so straightforward.
>>
>> So I'm wondering if there's any sense or desire to make git a little more
>> flexible here.  Maybe teach it to wait and try again once or twice when it
>> sees a lock file.  I presume that normally a ref lock file should disappear
>> pretty quickly, so there shouldn't be a need to wait very long.
>
> Yeah, we probably should try again.  The simplest possible (and untested)
> patch is below.  However, a few caveats:
>
>   1. This patch unconditionally retries for all lock files.  Do all
>      callers want that?  I wonder if there are any exploratory lock
>      acquisitions that would rather return immediately than have some
>      delay.
>
>   2. The number of tries and the sleep time are pulled out of a hat.
>
>   3. Even with retries, I don't know if you will get the behavior you
>      want.  The lock procedure for refs is:
>
>        1. get the lock
>        2. check and remember the sha1
>        3. release the lock
>        4. do some long-running work (like the actual push)
>        5. get the lock
>        6. check that the sha1 is the same as the remembered one
>        7. update the sha1
>        8. release the lock
>
>      Right now you are getting contention on the lock itself.  But might
>      you not also run afoul of step (6) above?  That is, one push updates
>      the ref from A to B, then the other one, in attempting to go from A
>      to B, sees that it has already changed to B under our feet and
>      complains?

Could not anything run afoul of step (6)?  Who knows what might happen in
step (4)...

However, in my particular case I'm using a "force" refspec:

	git push origin +HEAD:refs/builds/${TAG}

so (as Shawn says) step (6) shouldn't matter, right?  Plus, all the
concurrent pushes are setting the ref to the same value anyway.  This is
fairly degenerate behaviour, though.

> I can certainly think of a rule around that special case (if we are going
> to B, and it already changed to B, silently leave it alone and pretend we
> wrote it), but I don't know how often that would be useful in the real
> world.

Yes -- useful in my case, but otherwise...  Still, I think it would be
more correct to do that.

		M.
* Re: Concurrent pushes updating the same ref
From: Marc Branchaud @ 2011-01-10 22:14 UTC
To: Jeff King; +Cc: Git Mailing List

On 11-01-06 12:12 PM, Marc Branchaud wrote:
> On 11-01-06 11:30 AM, Jeff King wrote:
>> On Thu, Jan 06, 2011 at 10:46:38AM -0500, Marc Branchaud wrote:
>>
>>> fatal: Unable to create
>>> '/usr/xiplink/git/public/Main.git/refs/builds/3.3.0-3.lock': File exists.
>>> If no other git process is currently running, this probably means a
>>> git process crashed in this repository earlier. Make sure no other git
>>> process is running and remove the file manually to continue.
>>> fatal: The remote end hung up unexpectedly
>>>
>>> I think the cause is pretty obvious, and in a normal interactive situation
>>> the solution would be to simply try again.  But in a script, trying again
>>> isn't so straightforward.
>>>
>>> So I'm wondering if there's any sense or desire to make git a little more
>>> flexible here.  Maybe teach it to wait and try again once or twice when it
>>> sees a lock file.  I presume that normally a ref lock file should disappear
>>> pretty quickly, so there shouldn't be a need to wait very long.
>>
>> Yeah, we probably should try again.  The simplest possible (and untested)
>> patch is below.  However, a few caveats:
>>
>>   1. This patch unconditionally retries for all lock files.  Do all
>>      callers want that?  I wonder if there are any exploratory lock
>>      acquisitions that would rather return immediately than have some
>>      delay.
>>
>>   2. The number of tries and the sleep time are pulled out of a hat.
>>
>>   3. Even with retries, I don't know if you will get the behavior you
>>      want.  The lock procedure for refs is:
>>
>>        1. get the lock
>>        2. check and remember the sha1
>>        3. release the lock
>>        4. do some long-running work (like the actual push)
>>        5. get the lock
>>        6. check that the sha1 is the same as the remembered one
>>        7. update the sha1
>>        8. release the lock
>>
>>      Right now you are getting contention on the lock itself.  But might
>>      you not also run afoul of step (6) above?  That is, one push updates
>>      the ref from A to B, then the other one, in attempting to go from A
>>      to B, sees that it has already changed to B under our feet and
>>      complains?
>
> Could not anything run afoul of step (6)?  Who knows what might happen in
> step (4)...
>
> However, in my particular case I'm using a "force" refspec:
>
> 	git push origin +HEAD:refs/builds/${TAG}
>
> so (as Shawn says) step (6) shouldn't matter, right?  Plus, all the
> concurrent pushes are setting the ref to the same value anyway.

Well, after modifying my build script to ignore failed pushes, I do
occasionally see failures like this:

  remote: fatal: Invalid revision range 0000000000000000000000000000000000000000..1c58dc4c3fdd9475d26d0eb797cc096fb622a594
  error: Ref refs/builds/3.3.0-9 is at 1c58dc4c3fdd9475d26d0eb797cc096fb622a594 but expected 0000000000000000000000000000000000000000
  remote: error: failed to lock refs/builds/3.3.0-9

So I guess even the "force" refspec is getting blocked by step 6.

FYI, the repo receiving the push is running git 1.7.1.

		M.
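In the meantime, a script-level approximation of the "pretend we wrote it"
rule discussed upthread is to treat a failed push as a success whenever the
remote ref already holds the value we tried to write.  A sketch, with the
same illustrative ref name:

  want=$(git rev-parse HEAD)
  if ! git push origin "+HEAD:refs/builds/${TAG}"; then
      # The push may simply have lost the race; if the remote ref
      # already points at the commit we wanted, there is nothing to do.
      have=$(git ls-remote origin "refs/builds/${TAG}" | cut -f1)
      if [ "$have" != "$want" ]; then
          echo "push failed: ref is at '$have', wanted $want" >&2
          exit 1
      fi
  fi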
* Re: Concurrent pushes updating the same ref
From: Junio C Hamano @ 2011-01-06 19:37 UTC
To: Jeff King; +Cc: Marc Branchaud, Git Mailing List

Jeff King <peff@peff.net> writes:

> On Thu, Jan 06, 2011 at 10:46:38AM -0500, Marc Branchaud wrote:
>
>> fatal: Unable to create
>> '/usr/xiplink/git/public/Main.git/refs/builds/3.3.0-3.lock': File exists.
>> If no other git process is currently running, this probably means a
>> git process crashed in this repository earlier. Make sure no other git
>> process is running and remove the file manually to continue.
>> fatal: The remote end hung up unexpectedly
>>
>> I think the cause is pretty obvious, and in a normal interactive situation
>> the solution would be to simply try again.  But in a script, trying again
>> isn't so straightforward.
>>
>> So I'm wondering if there's any sense or desire to make git a little more
>> flexible here.  Maybe teach it to wait and try again once or twice when it
>> sees a lock file.  I presume that normally a ref lock file should disappear
>> pretty quickly, so there shouldn't be a need to wait very long.
>
> Yeah, we probably should try again.  The simplest possible (and untested)
> patch is below.  However, a few caveats:
>
>   1. This patch unconditionally retries for all lock files.  Do all
>      callers want that?

I actually have to say that _no_ caller should want this.  If somebody
earlier crashed, we would want to know about it (and how).  If somebody
else alive is actively holding a lock, why not make it the responsibility
of the calling script to decide if it wants to retry itself, or perhaps
to do something else?
* Re: Concurrent pushes updating the same ref
From: Marc Branchaud @ 2011-01-06 21:51 UTC
To: Junio C Hamano; +Cc: Jeff King, Git Mailing List

On 11-01-06 02:37 PM, Junio C Hamano wrote:
> Jeff King <peff@peff.net> writes:
>
>> On Thu, Jan 06, 2011 at 10:46:38AM -0500, Marc Branchaud wrote:
>>
>>> fatal: Unable to create
>>> '/usr/xiplink/git/public/Main.git/refs/builds/3.3.0-3.lock': File exists.
>>> If no other git process is currently running, this probably means a
>>> git process crashed in this repository earlier. Make sure no other git
>>> process is running and remove the file manually to continue.
>>> fatal: The remote end hung up unexpectedly
>>>
>>> I think the cause is pretty obvious, and in a normal interactive situation
>>> the solution would be to simply try again.  But in a script, trying again
>>> isn't so straightforward.
>>>
>>> So I'm wondering if there's any sense or desire to make git a little more
>>> flexible here.  Maybe teach it to wait and try again once or twice when it
>>> sees a lock file.  I presume that normally a ref lock file should disappear
>>> pretty quickly, so there shouldn't be a need to wait very long.
>>
>> Yeah, we probably should try again.  The simplest possible (and untested)
>> patch is below.  However, a few caveats:
>>
>>   1. This patch unconditionally retries for all lock files.  Do all
>>      callers want that?
>
> I actually have to say that _no_ caller should want this.  If somebody
> earlier crashed, we would want to know about it (and how).  If somebody
> else alive is actively holding a lock, why not make it the responsibility
> of the calling script to decide if it wants to retry itself, or perhaps
> to do something else?

I'm not sure I follow this.  How would retrying a few times prevent us from
finding out about an earlier crash?  It's not like we're overriding the
lock by retrying.  Nobody's going to be able to remove a lock created by a
crashed process, right?

And if someone active doesn't release the lock even after the low-level
code retries a few times, the caller can still decide what to do.  I don't
see how it would even impact that decision -- if the caller wants to try
again, the system can still retry a few times underneath the caller's own
retry.

It seems fine to me.

		M.