From: Jeremy Fitzhardinge
Subject: Re: Prepping a bugfix push
Date: Fri, 04 Dec 2009 08:37:10 -0800
Message-ID: <4B193AB6.2080203@goop.org>
References: <4B1810DF.40309@goop.org>
 <20091203193540.GB4228@kremvax.cs.ubc.ca> <4B184830.7070107@goop.org>
 <20091204002406.GB5897@kremvax.cs.ubc.ca> <4B186192.8000201@goop.org>
 <1259912810.31045.175.camel@localhost.localdomain>
 <1259941826.23698.16421.camel@zakaz.uk.xensource.com>
In-Reply-To: <1259941826.23698.16421.camel@zakaz.uk.xensource.com>
To: Ian Campbell
Cc: Brendan Cully, xen-devel@lists.xensource.com
List-Id: xen-devel@lists.xenproject.org

On 12/04/09 07:50, Ian Campbell wrote:
> On Fri, 2009-12-04 at 07:46 +0000, Ian Campbell wrote:
>
>> I've been doing regular suspend/resumes, not checkpoint ones as Brendan
>> is doing. I did try a couple of checkpointed ones yesterday and they
>> failed, IIRC with a similar softlockup to this one.
>>
> So what is happening is that the device event channels are getting torn
> down by the resume handler and never completely reinstated in the
> cancelled suspend (aka checkpoint) case.
>

Hm.

> In 2.6.18 there was a separate ->suspend_cancel() callback for each
> driver, called instead of the ->resume() callback in exactly these
> circumstances. The cancel callback doesn't do any of the teardown; in
> fact, for blkfront it doesn't even exist.
>
> (As a proof of concept, commenting out the entire contents of
> blkfront_resume and netfront_resume makes checkpointing work OK for me,
> at the cost of breaking regular resume, of course.)
>
> pv-ops uses the generic power management infrastructure, which does not
> have a concept of cancelling a suspend. Perhaps it should?
> Otherwise a different solution will be required; I'm not sure what that
> might be yet.
>

Well, the obvious one is to treat it as a full suspend followed by an
immediate resume. That is, just remove all the special-case handling for
checkpoint, and let it do the normal resume work when the hypercall
returns. I think the PM core already copes with a suspend failing partway
through: it just resumes anything that has been suspended so far.

    J
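The approach suggested in this thread (treat a checkpoint as a full suspend followed by an immediate resume, and rely on the PM core to resume already-suspended devices if one of them fails to suspend) can be illustrated with a small standalone C model. This is only a sketch under stated assumptions: `struct model_dev`, `suspend_all`, `resume_all` and `checkpoint` are hypothetical names, the `evtchn_bound` flag merely stands in for a device's event channel, and none of this is the Linux PM core or the Xen frontend driver API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model, not kernel code: suspend tears down a device's
 * event channel and resume rebuilds it. */
struct model_dev {
    const char *name;
    bool fail_suspend;  /* inject a suspend failure for this device */
    bool suspended;
    bool evtchn_bound;  /* stands in for the device's event channel */
};

static void model_resume(struct model_dev *d)
{
    assert(d->suspended);  /* only suspended devices get resumed */
    d->evtchn_bound = true;  /* re-establish the channel */
    d->suspended = false;
}

static bool model_suspend(struct model_dev *d)
{
    if (d->fail_suspend)
        return false;
    d->evtchn_bound = false;  /* tear the channel down */
    d->suspended = true;
    return true;
}

/* Suspend devices in order; if one fails, resume the devices already
 * suspended (in reverse order) and report failure. */
static bool suspend_all(struct model_dev *devs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (!model_suspend(&devs[i])) {
            while (i-- > 0)
                model_resume(&devs[i]);
            return false;
        }
    }
    return true;
}

/* Resume every device, last-suspended first. */
static void resume_all(struct model_dev *devs, size_t n)
{
    for (size_t i = n; i-- > 0; )
        model_resume(&devs[i]);
}

/* Checkpoint as "full suspend, then immediate resume": there is no
 * separate cancel path, so every channel torn down is rebuilt. */
static bool checkpoint(struct model_dev *devs, size_t n)
{
    if (!suspend_all(devs, n))
        return false;
    /* In the real system the suspend hypercall would return here. */
    resume_all(devs, n);
    return true;
}
```

The point of the model is that a checkpoint no longer needs a `->suspend_cancel()`-style special case: the ordinary resume path runs after every successful suspend, and a partial suspend failure is unwound by resuming only the devices that were actually suspended.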