Re: Prepping a bugfix push - Jeremy Fitzhardinge


From: Jeremy Fitzhardinge <jeremy@goop.org>
To: Ian Campbell <Ian.Campbell@citrix.com>
Cc: Brendan Cully <brendan@cs.ubc.ca>,
	"xen-devel@lists.xensource.com" <xen-devel@lists.xensource.com>
Subject: Re: Prepping a bugfix push
Date: Fri, 04 Dec 2009 16:05:48 -0800
Message-ID: <4B19A3DC.1000008@goop.org>
In-Reply-To: <1259945803.2554.8.camel@localhost.localdomain>

On 12/04/09 08:56, Ian Campbell wrote:
> On Fri, 2009-12-04 at 16:37 +0000, Jeremy Fitzhardinge wrote:
>    
>> On 12/04/09 07:50, Ian Campbell wrote:
>>      
>>> On Fri, 2009-12-04 at 07:46 +0000, Ian Campbell wrote:
>>>
>>>        
>>>> I've been doing regular suspend/resumes, not checkpoint ones as Brendan
>>>> is doing. I did try a couple of checkpointed ones yesterday and they
>>>> failed, IIRC with a similar softlockup to this one.
>>>>
>>>>          
>>> So what is happening is that the device event channels are getting torn
>>> down by the resume handler and never completely reinstated in the
>>> cancelled suspend (aka checkpoint) case.
>>>
>>>        
>> Hm.
>>
>>      
>>> In 2.6.18 there was a separate ->suspend_cancel() callback for each
>>> driver, called instead of the ->resume() callback in exactly these
>>> circumstances. The cancel callback doesn't do any of the teardown; in
>>> fact, for blkfront it doesn't even exist.
>>>
>>> (As a proof of concept, commenting out the entire contents of
>>> blkfront_resume and netfront_resume makes checkpointing work OK for me,
>>> at the cost of breaking regular resume, of course)
>>>
>>> pv-ops uses the generic power management infrastructure which does not
>>> have a concept of cancelling a suspend. Perhaps it should? Otherwise a
>>> different solution will be required; I'm not sure what that might be yet.
>>>
>>>        
>> Well, the obvious one is to treat it as a full suspend followed by
>> immediate resume.  That is, just remove all the special case handling
>> for checkpoint, and let it do the normal resume stuff when the hypercall
>> returns.
>>      
> I'm not sure how much that will help: some of the resume stuff relies on
> the domain actually changing underneath. The backends are torn down and
> set up again by the tools and therefore expect a fresh reconnection; the
> hypervisor side of the event channels is implicitly reset (the kernel
> just resets its own state); and so on. None of these things happen
> during a checkpoint. Presumably those who are interested in checkpointing
> would prefer them not to happen, in order to remain fast.
>    

Yes, that's certainly all possible, but with a biggish performance hit...

How about this: if it's a checkpoint, then don't bother calling all the 
resume functions.  We may need to call the device model resume just to 
keep everyone sane and happy, but at the xenbus nodes, filter out the 
calls to the drivers.

>> I think the PM core can fail to suspend; it just resumes anything that
>> has been suspended so far.
>>      
> An optional separate hook for that case (called in preference to
> ->resume) might be acceptable upstream? Adding a parameter to the
> ->resume handler itself might also be acceptable but would involve more
> churn.
>    

The whole area is so fragile and fraught, I don't really want to get 
into it.

     J


Thread overview: 17+ messages
2009-12-03 19:26 Prepping a bugfix push Jeremy Fitzhardinge
2009-12-03 19:35 ` Brendan Cully
2009-12-03 19:38   ` Jeremy Fitzhardinge
2009-12-03 23:22   ` Jeremy Fitzhardinge
2009-12-04  0:24     ` Brendan Cully
2009-12-04  1:10       ` Jeremy Fitzhardinge
2009-12-04  4:29         ` Brendan Cully
2009-12-04  7:46         ` Ian Campbell
2009-12-04 15:50           ` Ian Campbell
2009-12-04 16:37             ` Jeremy Fitzhardinge
2009-12-04 16:56               ` Ian Campbell
2009-12-05  0:05                 ` Jeremy Fitzhardinge [this message]
2009-12-04 17:01               ` Brendan Cully
2009-12-04 17:12                 ` Jeremy Fitzhardinge
2009-12-04 10:45 ` Ian Campbell
2009-12-04 14:10   ` Konrad Rzeszutek Wilk
2009-12-04 16:18     ` Jeremy Fitzhardinge

