From mboxrd@z Thu Jan  1 00:00:00 1970
From: Ian Campbell <ian.campbell@citrix.com>
Subject: Re: [OSSTEST PATCH 28/28] Executive: Delay releasing
 build host shares by 90s
Date: Tue, 22 Sep 2015 16:34:11 +0100
Message-ID: <1442936051.10338.183.camel@citrix.com>
References: <1442934764-8672-1-git-send-email-ian.jackson@eu.citrix.com>
	<1442934764-8672-8-git-send-email-ian.jackson@eu.citrix.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Return-path: <xen-devel-bounces@lists.xen.org>
Received: from mail6.bemta5.messagelabs.com ([195.245.231.135])
	by lists.xen.org with esmtp (Exim 4.72)
	(envelope-from <prvs=700593250=Ian.Campbell@citrix.com>)
	id 1ZePaD-0006oT-Rv
	for xen-devel@lists.xenproject.org; Tue, 22 Sep 2015 15:34:17 +0000
In-Reply-To: <1442934764-8672-8-git-send-email-ian.jackson@eu.citrix.com>
List-Unsubscribe: <http://lists.xen.org/cgi-bin/mailman/options/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=unsubscribe>
List-Post: <mailto:xen-devel@lists.xen.org>
List-Help: <mailto:xen-devel-request@lists.xen.org?subject=help>
List-Subscribe: <http://lists.xen.org/cgi-bin/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=subscribe>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Ian Jackson <ian.jackson@eu.citrix.com>, xen-devel@lists.xenproject.org
List-Id: xen-devel@lists.xenproject.org

On Tue, 2015-09-22 at 16:12 +0100, Ian Jackson wrote:
> When a build job finishes, the same flight may well want to do a
> subsequent build that depended on the first.  When this happens, we
> have a race:
> 
> On the one hand, we have the flight: after sg-run-job exits,
> sg-execute-flight needs to double-check the job status, and search the
> flight for more jobs to run; it will spawn ts-hosts-allocate-Executive
> for the new job, which needs to get its head together, parse its
> arguments, become a client of the queue daemon, and ask to be put in
> the queue.
> 
> On the other hand, we have the planning system: currently, as soon as
> sg-run-job exits, the connection to the ownerdaemon closes.  The
> ownerdaemon tells the queue daemon, and the planning queue is
> restarted.  It might even happen that coincidentally the planning
> queue is about to start.
> 
> If the planning system wins the race, another job will pick up the
> newly-freed resource.  Often this will mean unsharing the build host,
> which is very wasteful if the releasing flight hasn't finished its
> builds for that architecture: it means that the next build job needs
> to regroove a host for builds.
> 
> Add a bodge to try to make the race go the other way: after a build
> job completes successfully, do not give up the share for a further 90
> seconds.  (We have to use setsid because sg-execute-flight kills the
> process group to clean up stray processes, which this sleep definitely
> is.)
> 
> A better solution would be to move the wait-for-referenced-job logic
> from sg-execute-flight to ts-hosts-allocate-*.  But that would be much
> more complicated.
> 
> Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>

Acked-by: Ian Campbell <ian.campbell@citrix.com>
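
For anyone reading along, the setsid point in the commit message can be
illustrated with a minimal sketch. This is Python rather than the actual
osstest code, and the names spawn_share_holder and
in_callers_process_group are hypothetical, introduced only for this
illustration. It assumes a POSIX system. The child merely sleeps, standing
in for whatever keeps the share held during the delay; because it is
started in a new session, it also gets its own process group, so a kill
aimed at the parent's process group would not reach it.

```python
import os
import subprocess
import sys


def spawn_share_holder(delay_seconds):
    """Start a child that idles for delay_seconds in its own session.

    Hypothetical stand-in for the delayed share release: the child only
    sleeps. In a new session it is also in a new process group, so a
    kill aimed at the caller's process group does not reach it.
    """
    argv = [
        sys.executable,
        "-c",
        "import sys, time; time.sleep(float(sys.argv[1]))",
        str(delay_seconds),
    ]
    # start_new_session=True makes the child call setsid() before exec.
    return subprocess.Popen(argv, start_new_session=True)


def in_callers_process_group(pid):
    """True if pid would be hit by a kill of the caller's process group."""
    return os.getpgid(pid) == os.getpgid(0)


if __name__ == "__main__":
    holder = spawn_share_holder(0.5)
    survives = not in_callers_process_group(holder.pid)
    print("holder outside caller's process group:", survives)
    holder.wait()
```

Without start_new_session the child would inherit the parent's process
group and be cleaned up along with the other stray processes, which is
exactly what the delay is trying to avoid.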