From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
To: "Gonglei (Arei)" <arei.gonglei@huawei.com>
Cc: "Zhangbo (Oscar)" <oscar.zhangbo@huawei.com>,
	Hanweidong <hanweidong@huawei.com>,
	Luonengjun <luonengjun@huawei.com>,
	"xen-devel@lists.xen.org" <xen-devel@lists.xen.org>
Subject: Re: pvops: Does PVOPS guest os support online "suspend/resume"
Date: Thu, 8 Aug 2013 15:16:48 -0400
Message-ID: <20130808191648.GA4513@konrad-lan.dumpdata.com>
In-Reply-To: <33183CC9F5247A488A2544077AF1902080BFC3C6@szxeml538-mbx.china.huawei.com>

On Thu, Aug 08, 2013 at 02:23:06PM +0000, Gonglei (Arei) wrote:
> Hi all,
> 
> When we suspend and then resume a running PVOPS guest OS, its block/net I/O gets stuck. Non-PVOPS guest OSes do not have this problem.
> 

With what version of Linux is this? Have you tried with v3.10?

Thanks.
> How reproducible:
> -------------------
> 1/1
> 
> Steps to reproduce:
> ------------------
>   1) Suspend the guest OS.
>      Note: do not migrate or shut down the guest OS.
>   2) Resume the guest OS.
> 
> (Consider rolling back (resume) while core-dumping (suspend) a guest: this problem would leave the guest OS inoperable.)
> 
> ====================================================================
> We found these warning messages in the guest OS:
> --------------------------------------------------------------------
> Aug  2 10:17:34 localhost kernel: [38592.985159] platform pcspkr: resume
> Aug  2 10:17:34 localhost kernel: [38592.989890] platform vesafb.0: resume
> Aug  2 10:17:34 localhost kernel: [38592.996075] input input0: type resume
> Aug  2 10:17:34 localhost kernel: [38593.001330] input input1: type resume
> Aug  2 10:17:34 localhost kernel: [38593.005496] vbd vbd-51712: legacy resume
> Aug  2 10:17:34 localhost kernel: [38593.011506] WARNING: g.e. still in use!
> Aug  2 10:17:34 localhost kernel: [38593.016909] WARNING: leaking g.e. and page still in use!
> Aug  2 10:17:34 localhost kernel: [38593.026204] xen vbd-51760: legacy resume
> Aug  2 10:17:34 localhost kernel: [38593.033070] vif vif-0: legacy resume
> Aug  2 10:17:34 localhost kernel: [38593.039327] WARNING: g.e. still in use!
> Aug  2 10:17:34 localhost kernel: [38593.045304] WARNING: leaking g.e. and page still in use!
> Aug  2 10:17:34 localhost kernel: [38593.052101] WARNING: g.e. still in use!
> Aug  2 10:17:34 localhost kernel: [38593.057965] WARNING: leaking g.e. and page still in use!
> Aug  2 10:17:34 localhost kernel: [38593.066795] serial8250 serial8250: resume
> Aug  2 10:17:34 localhost kernel: [38593.073556] input input2: type resume
> Aug  2 10:17:34 localhost kernel: [38593.079385] platform Fixed MDIO bus.0: resume
> Aug  2 10:17:34 localhost kernel: [38593.086285] usb usb1: type resume
> ------------------------------------------------------
> 
> which means that we touch a grant-table entry (g.e.) while it is still in use.
> 
> The cause appears to be in the suspend/resume code:
> --------------------------------------------------------
> //drivers/xen/manage.c
> static void do_suspend(void)
> {
> 	int err;
> 	struct suspend_info si;
> 
> 	shutting_down = SHUTDOWN_SUSPEND;
> 
> ………………
> 	err = dpm_suspend_start(PMSG_FREEZE);
> ………………
> 	dpm_resume_start(si.cancelled ? PMSG_THAW : PMSG_RESTORE);
> 
> 	if (err) {
> 		pr_err("failed to start xen_suspend: %d\n", err);
> 		si.cancelled = 1;
> 	}
> //NOTE: si.cancelled = 1
> 
> out_resume:
> 	if (!si.cancelled) {
> 		xen_arch_resume();   
> 		xs_resume();
> 	} else
> 		xs_suspend_cancel();
> 
> 	dpm_resume_end(si.cancelled ? PMSG_THAW : PMSG_RESTORE);  //blkfront device got resumed here.
> 
> out_thaw:
> #ifdef CONFIG_PREEMPT
> 	thaw_processes();
> out:
> #endif
> 	shutting_down = SHUTDOWN_INVALID;
> }
> ------------------------------------
> 
> The function dpm_suspend_start() suspends devices, and dpm_resume_end() resumes them.
> However, we found that the blkfront driver has a RESUME method but no SUSPEND method.
> 
> -------------------------------------
> //drivers/block/xen-blkfront.c
> static DEFINE_XENBUS_DRIVER(blkfront, ,
> 	.probe = blkfront_probe,
> 	.remove = blkfront_remove,
> 	.resume = blkfront_resume,  // only RESUME method found here.
> 	.otherend_changed = blkback_changed,
> 	.is_ready = blkfront_is_ready,
> );
> --------------------------------------
> 
> So the blkfront device is resumed although it was never suspended, which causes the problem above.
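
The asymmetry described above can be modelled outside the kernel. The following is a minimal sketch, assuming a generic core that skips a missing suspend callback but still invokes resume; every `toy_*` name is hypothetical and none of this is the kernel's xenbus or PM API.

```c
#include <stddef.h>

/* Hypothetical stand-in for a frontend driver; not the kernel's types. */
struct toy_driver {
    const char *name;
    int (*suspend)(void *state);   /* may be NULL, as for blkfront in the quoted code */
    int (*resume)(void *state);
};

struct toy_state {
    int suspended;          /* 1 only while a suspend callback has run */
    int unpaired_resumes;   /* resumes that had no matching suspend */
};

static int toy_suspend(void *p)
{
    struct toy_state *s = p;
    s->suspended = 1;
    return 0;
}

static int toy_resume(void *p)
{
    struct toy_state *s = p;
    if (!s->suspended)
        s->unpaired_resumes++;   /* resume ran on a device that never quiesced */
    s->suspended = 0;
    return 0;
}

/* Driver with only a resume callback, mirroring the quoted blkfront table. */
static const struct toy_driver toy_blkfront = { "toy-blkfront", NULL, toy_resume };
/* Driver with both callbacks, for contrast. */
static const struct toy_driver toy_paired = { "toy-paired", toy_suspend, toy_resume };

/* One suspend/resume cycle; returns the number of unpaired resumes seen. */
int toy_suspend_resume_cycle(const struct toy_driver *drv, struct toy_state *s)
{
    if (drv->suspend)
        drv->suspend(s);
    if (drv->resume)
        drv->resume(s);
    return s->unpaired_resumes;
}
```

Running one cycle on `toy_blkfront` with a zeroed `struct toy_state` reports one unpaired resume, while `toy_paired` reports none. Whether the real blkfront resume path misbehaves for this reason is the poster's hypothesis, not something this sketch establishes.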
> 
> 
> =========================================
> To check whether the problem lies in PVOPS or in the hypervisor (Xen)/dom0, we suspended and resumed non-PVOPS guest OSes; no such problem occurred.
> 
> Non-PVOPS guests use their own Xen drivers, as shown in https://github.com/jpaton/xen-4.1-LJX1/blob/master/unmodified_drivers/linux-2.6/platform-pci/machine_reboot.c :
> 
> int __xen_suspend(int fast_suspend, void (*resume_notifier)(int))
> {
>     int err, suspend_cancelled, nr_cpus;
>     struct ap_suspend_info info;
> 
>     xenbus_suspend();
> 
> ……………………
>     preempt_enable();
> 
>     if (!suspend_cancelled)
>         xenbus_resume();     // When the guest OS is resumed, suspend_cancelled == 1, so xenbus_resume() is not called here.
>     else
>         xenbus_suspend_cancel();  // Execution gets here, so blkfront is not resumed.
> 
>     return 0;
> }
> 
> 
> Non-PVOPS guest OSes do not have a blkfront SUSPEND method either, but their Xen driver does not resume the blkfront device, so they have no problem after suspend/resume.
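
The difference between the two paths, as the quoted message describes it, reduces to one branch. The sketch below is only a model of that description under stated assumptions: the `toy_*` names are hypothetical, and it does not check how the PVOPS bus code actually handles PMSG_THAW versus PMSG_RESTORE.

```c
/* What happens to a frontend device once the suspend hypercall returns. */
enum toy_action {
    TOY_DEVICE_RESUME,    /* the device's resume callback runs */
    TOY_SUSPEND_CANCEL    /* only the cancel path runs; the device is left alone */
};

/* PVOPS path as read in the quoted do_suspend(): dpm_resume_end() is reached
 * whether or not the suspend was cancelled, and the poster reports that
 * blkfront is resumed there. */
enum toy_action toy_pvops_action(int cancelled)
{
    (void)cancelled;   /* the flag does not gate the call in this model */
    return TOY_DEVICE_RESUME;
}

/* unmodified_drivers path as read in the quoted __xen_suspend(): the resume
 * call is made only when the suspend was not cancelled. */
enum toy_action toy_classic_action(int suspend_cancelled)
{
    return suspend_cancelled ? TOY_SUSPEND_CANCEL : TOY_DEVICE_RESUME;
}
```

With the cancelled flag set to 1, which is the case in the reproduction steps, the two functions return different actions. That is the whole of the behavioural difference the message points at.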
> 
> 
> I'm wondering why the two types of driver (PVOPS and non-PVOPS) differ here.
> Is that because:
> 1) the PVOPS kernel doesn't take this situation into account, and has a bug here?
> or
> 2) PVOPS has other ways to avoid this problem?
> 
> Thank you in advance.
> 
> -Gonglei
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
