From: Ian Campbell
Subject: Re: [OSSTEST PATCH 28/28] Executive: Delay releasing build host shares by 90s
Date: Tue, 22 Sep 2015 16:34:11 +0100
Message-ID: <1442936051.10338.183.camel@citrix.com>
References: <1442934764-8672-1-git-send-email-ian.jackson@eu.citrix.com> <1442934764-8672-8-git-send-email-ian.jackson@eu.citrix.com>
In-Reply-To: <1442934764-8672-8-git-send-email-ian.jackson@eu.citrix.com>
To: Ian Jackson, xen-devel@lists.xenproject.org

On Tue, 2015-09-22 at 16:12 +0100, Ian Jackson wrote:
> When a build job finishes, the same flight may well want to do a
> subsequent build that depends on the first. When this happens, we
> have a race:
>
> On the one hand, we have the flight: after sg-run-job exits,
> sg-execute-flight needs to double-check the job status and search the
> flight for more jobs to run; it will spawn ts-hosts-allocate-Executive
> for the new job, which needs to get its head together, parse its
> arguments, become a client of the queue daemon, and ask to be put in
> the queue.
>
> On the other hand, we have the planning system: currently, as soon as
> sg-run-job exits, the connection to the ownerdaemon closes. The
> ownerdaemon tells the queue daemon, and the planning queue is
> restarted. It might even happen that the planning queue is
> coincidentally about to start anyway.
>
> If the planning system wins the race, another job will pick up the
> newly-freed resource.
> Often this will mean unsharing the build host, which is very wasteful
> if the releasing flight hasn't finished its builds for that
> architecture: it means that the next build job needs to regroove a
> host for builds.
>
> Add a bodge to try to make the race go the other way: after a build
> job completes successfully, do not give up the share for a further 90
> seconds. (We have to use setsid because sg-execute-flight kills the
> process group to clean up stray processes, which this sleep definitely
> is.)
>
> A better solution would be to move the wait-for-referenced-job logic
> from sg-execute-flight to ts-hosts-allocate-*. But that would be much
> more complicated.
>
> Signed-off-by: Ian Jackson

Acked-by: Ian Campbell
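For readers unfamiliar with the setsid trick, here is a minimal Python sketch of the idea, written under stated assumptions. It models the build-host share as an open connection file descriptor to the ownerdaemon, and the share is released when the last copy of that descriptor closes, matching the description above. It is not osstest's code (osstest is Perl and Tcl). The function name release_share_later and its parameters are hypothetical. A detached child inherits the connection and keeps it open for hold_seconds. Before doing so, it calls os.setsid() (the POSIX call behind the setsid utility), which moves it into a new session and process group, so a process-group kill like the cleanup sg-execute-flight performs does not reach it.

```python
import os
import time


def release_share_later(conn_fd: int, hold_seconds: float = 90.0) -> int:
    """Hypothetical sketch: keep conn_fd (standing in for the job's
    connection to the ownerdaemon) open in a detached child for
    hold_seconds, so the share is only released when that child exits.

    The child calls os.setsid(), giving it a new session and a new
    process group, so os.killpg() on the caller's group cannot reach it.
    Returns the child's pid; the caller should then close its own copy
    of conn_fd.
    """
    sync_r, sync_w = os.pipe()
    pid = os.fork()
    if pid == 0:  # child
        try:
            os.close(sync_r)
            os.setsid()               # leave the caller's process group
            os.write(sync_w, b"1")    # tell the parent we have detached
            os.close(sync_w)
            time.sleep(hold_seconds)  # conn_fd stays open meanwhile
        finally:
            os._exit(0)  # exiting closes conn_fd, releasing the share
    # parent: wait until the child has called setsid() before returning,
    # so callers never observe it still in the old process group
    os.close(sync_w)
    os.read(sync_r, 1)
    os.close(sync_r)
    return pid


if __name__ == "__main__":
    # The "ownerdaemon" side reads from r; the job's connection is w.
    r, w = os.pipe()
    start = time.monotonic()
    holder = release_share_later(w, hold_seconds=2.0)
    os.close(w)    # the job itself exits; the holder still has w open
    os.read(r, 1)  # returns b"" (EOF) only once the holder has exited
    print(f"share released after {time.monotonic() - start:.1f}s")
    os.close(r)
    os.waitpid(holder, 0)
```

Handing the connection to the child is what delays the release. The reader of the other end sees end-of-file only when every copy of the write end has been closed, so the job can exit immediately while the share is held for the extra period.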