* [PATCH 0/2] xenbus: Fix S3 frontend resume when xenstored is not running @ 2013-05-01 12:57 Aurelien Chartier 2013-05-01 12:57 ` [PATCH 1/2] xenbus: save xenstore local status for later use Aurelien Chartier 2013-05-01 12:57 ` [PATCH 2/2] xenbus: bypass xenbus frontend resume if xenstored is not running Aurelien Chartier 0 siblings, 2 replies; 9+ messages in thread From: Aurelien Chartier @ 2013-05-01 12:57 UTC (permalink / raw) To: xen-devel Cc: Aurelien Chartier, stefano.stabellini, ian.campbell, JBeulich, konrad.wilk Hi, This patch series fixes the S3 resume of dom0 or a Xenstore stub domain running a frontend over xenbus (xen-netfront in my use case). As device resume is happening before process resume, the xenbus frontend resume is hanging if xenstored is not running, thus causing a deadlock. This patch series is fixing that issue by bypassing the xenbus frontend resume when we are running in dom0 or a Xenstore stub domain. Aurelien Chartier (2): xenbus: save xenstore local status for later use xenbus: bypass xenbus frontend resume if xenstored is not running drivers/xen/xenbus/xenbus_comms.h | 1 + drivers/xen/xenbus/xenbus_probe.c | 27 ++++++++++++--------------- drivers/xen/xenbus/xenbus_probe.h | 7 +++++++ drivers/xen/xenbus/xenbus_probe_frontend.c | 15 ++++++++++++++- 4 files changed, 34 insertions(+), 16 deletions(-) ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH 1/2] xenbus: save xenstore local status for later use 2013-05-01 12:57 [PATCH 0/2] xenbus: Fix S3 frontend resume when xenstored is not running Aurelien Chartier @ 2013-05-01 12:57 ` Aurelien Chartier 2013-05-01 12:57 ` [PATCH 2/2] xenbus: bypass xenbus frontend resume if xenstored is not running Aurelien Chartier 1 sibling, 0 replies; 9+ messages in thread From: Aurelien Chartier @ 2013-05-01 12:57 UTC (permalink / raw) To: xen-devel Cc: Aurelien Chartier, stefano.stabellini, ian.campbell, JBeulich, konrad.wilk Save the xenstore local status computed in xenbus_init. It can then be used later to check if xenstored is running in that domain. --- drivers/xen/xenbus/xenbus_comms.h | 1 + drivers/xen/xenbus/xenbus_probe.c | 27 ++++++++++++--------------- drivers/xen/xenbus/xenbus_probe.h | 7 +++++++ 3 files changed, 20 insertions(+), 15 deletions(-) diff --git a/drivers/xen/xenbus/xenbus_comms.h b/drivers/xen/xenbus/xenbus_comms.h index c8abd3b..be329a1 100644 --- a/drivers/xen/xenbus/xenbus_comms.h +++ b/drivers/xen/xenbus/xenbus_comms.h @@ -45,6 +45,7 @@ int xb_wait_for_data_to_read(void); int xs_input_avail(void); extern struct xenstore_domain_interface *xen_store_interface; extern int xen_store_evtchn; +extern enum xenstore_init xen_store_domain; extern const struct file_operations xen_xenbus_fops; diff --git a/drivers/xen/xenbus/xenbus_probe.c b/drivers/xen/xenbus/xenbus_probe.c index 3325884..c3375ba 100644 --- a/drivers/xen/xenbus/xenbus_probe.c +++ b/drivers/xen/xenbus/xenbus_probe.c @@ -69,6 +69,9 @@ EXPORT_SYMBOL_GPL(xen_store_evtchn); struct xenstore_domain_interface *xen_store_interface; EXPORT_SYMBOL_GPL(xen_store_interface); +enum xenstore_init xen_store_domain; +EXPORT_SYMBOL_GPL(xen_store_domain); + static unsigned long xen_store_mfn; static BLOCKING_NOTIFIER_HEAD(xenstore_chain); @@ -719,17 +722,11 @@ static int __init xenstored_local_init(void) return err; } -enum xenstore_init { - UNKNOWN, - PV, - HVM, - LOCAL, -}; static int __init xenbus_init(void) { int err = 0; - enum xenstore_init usage = UNKNOWN; uint64_t v = 0; + xen_store_domain = XS_UNKNOWN; if (!xen_domain()) return -ENODEV; @@ -737,29 +734,29 @@ static int __init xenbus_init(void) xenbus_ring_ops_init(); if (xen_pv_domain()) - usage = PV; + xen_store_domain = XS_PV; if (xen_hvm_domain()) - usage = HVM; + xen_store_domain = XS_HVM; if (xen_hvm_domain() && xen_initial_domain()) - usage = LOCAL; + xen_store_domain = XS_LOCAL; if (xen_pv_domain() && !xen_start_info->store_evtchn) - usage = LOCAL; + xen_store_domain = XS_LOCAL; if (xen_pv_domain() && xen_start_info->store_evtchn) xenstored_ready = 1; - switch (usage) { - case LOCAL: + switch (xen_store_domain) { + case XS_LOCAL: err = xenstored_local_init(); if (err) goto out_error; xen_store_interface = mfn_to_virt(xen_store_mfn); break; - case PV: + case XS_PV: xen_store_evtchn = xen_start_info->store_evtchn; xen_store_mfn = xen_start_info->store_mfn; xen_store_interface = mfn_to_virt(xen_store_mfn); break; - case HVM: + case XS_HVM: err = hvm_get_parameter(HVM_PARAM_STORE_EVTCHN, &v); if (err) goto out_error; diff --git a/drivers/xen/xenbus/xenbus_probe.h b/drivers/xen/xenbus/xenbus_probe.h index bb4f92e..146f857 100644 --- a/drivers/xen/xenbus/xenbus_probe.h +++ b/drivers/xen/xenbus/xenbus_probe.h @@ -47,6 +47,13 @@ struct xen_bus_type { struct bus_type bus; }; +enum xenstore_init { + XS_UNKNOWN, + XS_PV, + XS_HVM, + XS_LOCAL, +}; + extern struct device_attribute xenbus_dev_attrs[]; extern int xenbus_match(struct device *_dev, struct device_driver *_drv); ^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH 2/2] xenbus: bypass xenbus frontend resume if xenstored is not running 2013-05-01 12:57 [PATCH 0/2] xenbus: Fix S3 frontend resume when xenstored is not running Aurelien Chartier 2013-05-01 12:57 ` [PATCH 1/2] xenbus: save xenstore local status for later use Aurelien Chartier @ 2013-05-01 12:57 ` Aurelien Chartier 2013-05-02 8:24 ` Ian Campbell 1 sibling, 1 reply; 9+ messages in thread From: Aurelien Chartier @ 2013-05-01 12:57 UTC (permalink / raw) To: xen-devel Cc: Aurelien Chartier, stefano.stabellini, ian.campbell, JBeulich, konrad.wilk If the xenbus frontend is running in a domain running xenstored or in dom0, the device resume is hanging because it is happening before the process resume. This patch adds extra logic to the resume code to check if we are the domain running xenstored or dom0. The frontend will be reconnected later, when the backend resumes from S3. This logic is working when xenstored is running in dom0, but has not been tested with a xenstore stub domain. --- drivers/xen/xenbus/xenbus_probe_frontend.c | 15 ++++++++++++++- 1 file changed, 14 insertions(+), 1 deletion(-) diff --git a/drivers/xen/xenbus/xenbus_probe_frontend.c b/drivers/xen/xenbus/xenbus_probe_frontend.c index 3159a37..8583afe 100644 --- a/drivers/xen/xenbus/xenbus_probe_frontend.c +++ b/drivers/xen/xenbus/xenbus_probe_frontend.c @@ -89,9 +89,22 @@ static void backend_changed(struct xenbus_watch *watch, xenbus_otherend_changed(watch, vec, len, 1); } +static int xenbus_frontend_dev_resume(struct device *dev) +{ + /* + * If xenstored is running in that domain, we cannot access the backend + * state at the moment. If we are running in dom0, the domain running + * xenstored is still suspended at that point + */ + if (xen_initial_domain() || (xen_store_domain == XS_LOCAL)) + return 0; + + return xenbus_dev_resume(dev); +} + static const struct dev_pm_ops xenbus_pm_ops = { .suspend = xenbus_dev_suspend, - .resume = xenbus_dev_resume, + .resume = xenbus_frontend_dev_resume, .freeze = xenbus_dev_suspend, .thaw = xenbus_dev_cancel, .restore = xenbus_dev_resume, ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH 2/2] xenbus: bypass xenbus frontend resume if xenstored is not running 2013-05-01 12:57 ` [PATCH 2/2] xenbus: bypass xenbus frontend resume if xenstored is not running Aurelien Chartier @ 2013-05-02 8:24 ` Ian Campbell 2013-05-02 9:21 ` Jan Beulich 0 siblings, 1 reply; 9+ messages in thread From: Ian Campbell @ 2013-05-02 8:24 UTC (permalink / raw) To: Aurelien Chartier Cc: Stefano Stabellini, xen-devel@lists.xensource.com, JBeulich@suse.com, konrad.wilk@oracle.com On Wed, 2013-05-01 at 13:57 +0100, Aurelien Chartier wrote: > If the xenbus frontend is running in a domain running xenstored or in dom0, > the device resume is hanging because it is happening before the process > resume. This patch adds extra logic to the resume code to check if we are > the domain running xenstored or dom0. > > The frontend will be reconnected later, when the backend resumes from S3. > This logic is working when xenstored is running in dom0, but has not been > tested with a xenstore stub domain. > --- > drivers/xen/xenbus/xenbus_probe_frontend.c | 15 ++++++++++++++- > 1 file changed, 14 insertions(+), 1 deletion(-) > > diff --git a/drivers/xen/xenbus/xenbus_probe_frontend.c b/drivers/xen/xenbus/xenbus_probe_frontend.c > index 3159a37..8583afe 100644 > --- a/drivers/xen/xenbus/xenbus_probe_frontend.c > +++ b/drivers/xen/xenbus/xenbus_probe_frontend.c > @@ -89,9 +89,22 @@ static void backend_changed(struct xenbus_watch *watch, > xenbus_otherend_changed(watch, vec, len, 1); > } > > +static int xenbus_frontend_dev_resume(struct device *dev) > +{ > + /* > + * If xenstored is running in that domain, we cannot access the backend > + * state at the moment. If we are running in dom0, the domain running > + * xenstored is still suspended at that point > + */ > + if (xen_initial_domain() || (xen_store_domain == XS_LOCAL)) > + return 0; > + > + return xenbus_dev_resume(dev); When or where does this eventually get called for the init domain or XS_LOCAL cases? > +} > + > static const struct dev_pm_ops xenbus_pm_ops = { > .suspend = xenbus_dev_suspend, > - .resume = xenbus_dev_resume, > + .resume = xenbus_frontend_dev_resume, > .freeze = xenbus_dev_suspend, > .thaw = xenbus_dev_cancel, > .restore = xenbus_dev_resume, ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 2/2] xenbus: bypass xenbus frontend resume if xenstored is not running 2013-05-02 8:24 ` Ian Campbell @ 2013-05-02 9:21 ` Jan Beulich 2013-05-02 9:24 ` Ian Campbell 0 siblings, 1 reply; 9+ messages in thread From: Jan Beulich @ 2013-05-02 9:21 UTC (permalink / raw) To: Aurelien Chartier, Ian Campbell Cc: xen-devel@lists.xensource.com, konrad.wilk@oracle.com, Stefano Stabellini >>> On 02.05.13 at 10:24, Ian Campbell <Ian.Campbell@citrix.com> wrote: > On Wed, 2013-05-01 at 13:57 +0100, Aurelien Chartier wrote: >> If the xenbus frontend is running in a domain running xenstored or in dom0, >> the device resume is hanging because it is happening before the process >> resume. This patch adds extra logic to the resume code to check if we are >> the domain running xenstored or dom0. >> >> The frontend will be reconnected later, when the backend resumes from S3. >> This logic is working when xenstored is running in dom0, but has not been >> tested with a xenstore stub domain. >> --- >> drivers/xen/xenbus/xenbus_probe_frontend.c | 15 ++++++++++++++- >> 1 file changed, 14 insertions(+), 1 deletion(-) >> >> diff --git a/drivers/xen/xenbus/xenbus_probe_frontend.c > b/drivers/xen/xenbus/xenbus_probe_frontend.c >> index 3159a37..8583afe 100644 >> --- a/drivers/xen/xenbus/xenbus_probe_frontend.c >> +++ b/drivers/xen/xenbus/xenbus_probe_frontend.c >> @@ -89,9 +89,22 @@ static void backend_changed(struct xenbus_watch *watch, >> xenbus_otherend_changed(watch, vec, len, 1); >> } >> >> +static int xenbus_frontend_dev_resume(struct device *dev) >> +{ >> + /* >> + * If xenstored is running in that domain, we cannot access the backend >> + * state at the moment. If we are running in dom0, the domain running >> + * xenstored is still suspended at that point >> + */ >> + if (xen_initial_domain() || (xen_store_domain == XS_LOCAL)) >> + return 0; >> + >> + return xenbus_dev_resume(dev); > > When or where does this eventually get called for the init domain or > XS_LOCAL cases? I was about to ask the same question. Plus I don't think the description here or in the overview mail really makes clear how specifically a deadlock would occur here. That's pretty relevant to understand in the light that so far we had no indication of there being any special treatment necessary here, and resume from S3 had been working quite fine without that (at least as long as xenstored is running in Dom0 and at least with the traditional/ forward-port/non-pvops kernels). Jan ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 2/2] xenbus: bypass xenbus frontend resume if xenstored is not running 2013-05-02 9:21 ` Jan Beulich @ 2013-05-02 9:24 ` Ian Campbell 2013-05-02 10:10 ` Aurélien Chartier 0 siblings, 1 reply; 9+ messages in thread From: Ian Campbell @ 2013-05-02 9:24 UTC (permalink / raw) To: Jan Beulich Cc: Aurelien Chartier, xen-devel@lists.xensource.com, konrad.wilk@oracle.com, Stefano Stabellini On Thu, 2013-05-02 at 10:21 +0100, Jan Beulich wrote: > >>> On 02.05.13 at 10:24, Ian Campbell <Ian.Campbell@citrix.com> wrote: > > On Wed, 2013-05-01 at 13:57 +0100, Aurelien Chartier wrote: > >> If the xenbus frontend is running in a domain running xenstored or in dom0, > >> the device resume is hanging because it is happening before the process > >> resume. This patch adds extra logic to the resume code to check if we are > >> the domain running xenstored or dom0. > >> > >> The frontend will be reconnected later, when the backend resumes from S3. > >> This logic is working when xenstored is running in dom0, but has not been > >> tested with a xenstore stub domain. > >> --- > >> drivers/xen/xenbus/xenbus_probe_frontend.c | 15 ++++++++++++++- > >> 1 file changed, 14 insertions(+), 1 deletion(-) > >> > >> diff --git a/drivers/xen/xenbus/xenbus_probe_frontend.c > > b/drivers/xen/xenbus/xenbus_probe_frontend.c > >> index 3159a37..8583afe 100644 > >> --- a/drivers/xen/xenbus/xenbus_probe_frontend.c > >> +++ b/drivers/xen/xenbus/xenbus_probe_frontend.c > >> @@ -89,9 +89,22 @@ static void backend_changed(struct xenbus_watch *watch, > >> xenbus_otherend_changed(watch, vec, len, 1); > >> } > >> > >> +static int xenbus_frontend_dev_resume(struct device *dev) > >> +{ > >> + /* > >> + * If xenstored is running in that domain, we cannot access the backend > >> + * state at the moment. If we are running in dom0, the domain running > >> + * xenstored is still suspended at that point > >> + */ > >> + if (xen_initial_domain() || (xen_store_domain == XS_LOCAL)) > >> + return 0; > >> + > >> + return xenbus_dev_resume(dev); > > > > When or where does this eventually get called for the init domain or > > XS_LOCAL cases? > > I was about to ask the same question. Plus I don't think the > description here or in the overview mail really makes clear how > specifically a deadlock would occur here. That's pretty relevant to > understand in the light that so far we had no indication of there > being any special treatment necessary here, and resume from S3 > had been working quite fine without that (at least as long as > xenstored is running in Dom0 and at least with the traditional/ > forward-port/non-pvops kernels). I think the unusual feature here is that dom0 has a netfront attached. Netfront resume is therefore hanging because it is trying to talk to the still frozen xenstored process in dom0. Ian. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 2/2] xenbus: bypass xenbus frontend resume if xenstored is not running 2013-05-02 9:24 ` Ian Campbell @ 2013-05-02 10:10 ` Aurélien Chartier 2013-05-02 10:30 ` Ian Campbell 0 siblings, 1 reply; 9+ messages in thread From: Aurélien Chartier @ 2013-05-02 10:10 UTC (permalink / raw) To: Ian Campbell Cc: konrad.wilk@oracle.com, xen-devel@lists.xensource.com, Jan Beulich, Stefano Stabellini On 02/05/13 10:24, Ian Campbell wrote: > On Thu, 2013-05-02 at 10:21 +0100, Jan Beulich wrote: >>>>> On 02.05.13 at 10:24, Ian Campbell <Ian.Campbell@citrix.com> wrote: >>> On Wed, 2013-05-01 at 13:57 +0100, Aurelien Chartier wrote: >>>> If the xenbus frontend is running in a domain running xenstored or in dom0, >>>> the device resume is hanging because it is happening before the process >>>> resume. This patch adds extra logic to the resume code to check if we are >>>> the domain running xenstored or dom0. >>>> >>>> The frontend will be reconnected later, when the backend resumes from S3. >>>> This logic is working when xenstored is running in dom0, but has not been >>>> tested with a xenstore stub domain. >>>> --- >>>> drivers/xen/xenbus/xenbus_probe_frontend.c | 15 ++++++++++++++- >>>> 1 file changed, 14 insertions(+), 1 deletion(-) >>>> >>>> diff --git a/drivers/xen/xenbus/xenbus_probe_frontend.c >>> b/drivers/xen/xenbus/xenbus_probe_frontend.c >>>> index 3159a37..8583afe 100644 >>>> --- a/drivers/xen/xenbus/xenbus_probe_frontend.c >>>> +++ b/drivers/xen/xenbus/xenbus_probe_frontend.c >>>> @@ -89,9 +89,22 @@ static void backend_changed(struct xenbus_watch *watch, >>>> xenbus_otherend_changed(watch, vec, len, 1); >>>> } >>>> >>>> +static int xenbus_frontend_dev_resume(struct device *dev) >>>> +{ >>>> + /* >>>> + * If xenstored is running in that domain, we cannot access the backend >>>> + * state at the moment. If we are running in dom0, the domain running >>>> + * xenstored is still suspended at that point >>>> + */ >>>> + if (xen_initial_domain() || (xen_store_domain == XS_LOCAL)) >>>> + return 0; >>>> + >>>> + return xenbus_dev_resume(dev); >>> When or where does this eventually get called for the init domain or >>> XS_LOCAL cases? >> I was about to ask the same question. Plus I don't think the >> description here or in the overview mail really makes clear how >> specifically a deadlock would occur here. That's pretty relevant to >> understand in the light that so far we had no indication of there >> being any special treatment necessary here, and resume from S3 >> had been working quite fine without that (at least as long as >> xenstored is running in Dom0 and at least with the traditional/ >> forward-port/non-pvops kernels). > I think the unusual feature here is that dom0 has a netfront attached. > Netfront resume is therefore hanging because it is trying to talk to the > still frozen xenstored process in dom0. > > Ian. > Yes, the unusual feature of having a netfront driver in dom0 is triggering the S3 issue I described. Ian made me realize this issue could also happen in Xenstore stub domains. The root cause of the issue is the assomption that a xenstored process is running in another domain when the xenbus frontend is being resumed from S3. This assomption is incorrect if xenstored and the xenbus frontend are running in the same domain. As Linux kernel is waiting for all devices to be resumed before resuming userland tasks, the xenbus frontend resume is blocking the userland process resume, waiting for xenstored (which cannot run as it is a userland process). The xenbus_dev_resume function for frontend devices such as nefront will not be called at all with that patch. I am relying on the fact that the network backend domain will be resumed after dom0 resume is complete. When that resume is happening, it will trigger a call to netback_changed in dom0 netfront. This call will end up resuming xenbus states in netfront. That logic is working for a dom0 netfront, as we can safely rely on the fact that the network backend domain will be resumed after dom0 resume is complete. I don't have a Xen configuration with Xenstore stub domain, but it would probably need some extra logic to reconnect the frontend after xenstored is being resumed. The main goal of this patch is to fix the S3 resume of domains running both a xenbus frontend and xenstored. Aurelien ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 2/2] xenbus: bypass xenbus frontend resume if xenstored is not running 2013-05-02 10:10 ` Aurélien Chartier @ 2013-05-02 10:30 ` Ian Campbell 2013-05-02 12:32 ` Aurelien Chartier 0 siblings, 1 reply; 9+ messages in thread From: Ian Campbell @ 2013-05-02 10:30 UTC (permalink / raw) To: Aurélien Chartier Cc: konrad.wilk@oracle.com, xen-devel@lists.xensource.com, Jan Beulich, Stefano Stabellini On Thu, 2013-05-02 at 11:10 +0100, Aurélien Chartier wrote: > On 02/05/13 10:24, Ian Campbell wrote: > > On Thu, 2013-05-02 at 10:21 +0100, Jan Beulich wrote: > >>>>> On 02.05.13 at 10:24, Ian Campbell <Ian.Campbell@citrix.com> wrote: > >>> On Wed, 2013-05-01 at 13:57 +0100, Aurelien Chartier wrote: > >>>> If the xenbus frontend is running in a domain running xenstored or in dom0, > >>>> the device resume is hanging because it is happening before the process > >>>> resume. This patch adds extra logic to the resume code to check if we are > >>>> the domain running xenstored or dom0. > >>>> > >>>> The frontend will be reconnected later, when the backend resumes from S3. > >>>> This logic is working when xenstored is running in dom0, but has not been > >>>> tested with a xenstore stub domain. > >>>> --- > >>>> drivers/xen/xenbus/xenbus_probe_frontend.c | 15 ++++++++++++++- > >>>> 1 file changed, 14 insertions(+), 1 deletion(-) > >>>> > >>>> diff --git a/drivers/xen/xenbus/xenbus_probe_frontend.c > >>> b/drivers/xen/xenbus/xenbus_probe_frontend.c > >>>> index 3159a37..8583afe 100644 > >>>> --- a/drivers/xen/xenbus/xenbus_probe_frontend.c > >>>> +++ b/drivers/xen/xenbus/xenbus_probe_frontend.c > >>>> @@ -89,9 +89,22 @@ static void backend_changed(struct xenbus_watch *watch, > >>>> xenbus_otherend_changed(watch, vec, len, 1); > >>>> } > >>>> > >>>> +static int xenbus_frontend_dev_resume(struct device *dev) > >>>> +{ > >>>> + /* > >>>> + * If xenstored is running in that domain, we cannot access the backend > >>>> + * state at the moment. If we are running in dom0, the domain running > >>>> + * xenstored is still suspended at that point > >>>> + */ > >>>> + if (xen_initial_domain() || (xen_store_domain == XS_LOCAL)) > >>>> + return 0; > >>>> + > >>>> + return xenbus_dev_resume(dev); > >>> When or where does this eventually get called for the init domain or > >>> XS_LOCAL cases? > >> I was about to ask the same question. Plus I don't think the > >> description here or in the overview mail really makes clear how > >> specifically a deadlock would occur here. That's pretty relevant to > >> understand in the light that so far we had no indication of there > >> being any special treatment necessary here, and resume from S3 > >> had been working quite fine without that (at least as long as > >> xenstored is running in Dom0 and at least with the traditional/ > >> forward-port/non-pvops kernels). > > I think the unusual feature here is that dom0 has a netfront attached. > > Netfront resume is therefore hanging because it is trying to talk to the > > still frozen xenstored process in dom0. > > > > Ian. > > > Yes, the unusual feature of having a netfront driver in dom0 is > triggering the S3 issue I described. Ian made me realize this issue > could also happen in Xenstore stub domains. > > The root cause of the issue is the assomption that a xenstored process > is running in another domain when the xenbus frontend is being resumed > from S3. This assomption is incorrect if xenstored and the xenbus > frontend are running in the same domain. As Linux kernel is waiting for > all devices to be resumed before resuming userland tasks, the xenbus > frontend resume is blocking the userland process resume, waiting for > xenstored (which cannot run as it is a userland process). > > The xenbus_dev_resume function for frontend devices such as nefront will > not be called at all with that patch. I am relying on the fact that the > network backend domain will be resumed after dom0 resume is complete. > When that resume is happening, it will trigger a call to netback_changed > in dom0 netfront. This call will end up resuming xenbus states in netfront. > > That logic is working for a dom0 netfront, as we can safely rely on the > fact that the network backend domain will be resumed after dom0 resume > is complete. I don't have a Xen configuration with Xenstore stub domain, > but it would probably need some extra logic to reconnect the frontend > after xenstored is being resumed. The main goal of this patch is to fix > the S3 resume of domains running both a xenbus frontend and xenstored. Is the assumption that other domains are all suspended over S3 a valid one in the general case? In principal there is nothing stopping the toolstack from leaving domains running over S3, is there? Ian. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 2/2] xenbus: bypass xenbus frontend resume if xenstored is not running 2013-05-02 10:30 ` Ian Campbell @ 2013-05-02 12:32 ` Aurelien Chartier 0 siblings, 0 replies; 9+ messages in thread From: Aurelien Chartier @ 2013-05-02 12:32 UTC (permalink / raw) To: Ian Campbell Cc: konrad.wilk@oracle.com, xen-devel@lists.xensource.com, Jan Beulich, Stefano Stabellini On 02/05/13 11:30, Ian Campbell wrote: > On Thu, 2013-05-02 at 11:10 +0100, Aurélien Chartier wrote: >> On 02/05/13 10:24, Ian Campbell wrote: >>> On Thu, 2013-05-02 at 10:21 +0100, Jan Beulich wrote: >>>>>>> On 02.05.13 at 10:24, Ian Campbell <Ian.Campbell@citrix.com> wrote: >>>>> On Wed, 2013-05-01 at 13:57 +0100, Aurelien Chartier wrote: >>>>>> If the xenbus frontend is running in a domain running xenstored or in dom0, >>>>>> the device resume is hanging because it is happening before the process >>>>>> resume. This patch adds extra logic to the resume code to check if we are >>>>>> the domain running xenstored or dom0. >>>>>> >>>>>> The frontend will be reconnected later, when the backend resumes from S3. >>>>>> This logic is working when xenstored is running in dom0, but has not been >>>>>> tested with a xenstore stub domain. >>>>>> --- >>>>>> drivers/xen/xenbus/xenbus_probe_frontend.c | 15 ++++++++++++++- >>>>>> 1 file changed, 14 insertions(+), 1 deletion(-) >>>>>> >>>>>> diff --git a/drivers/xen/xenbus/xenbus_probe_frontend.c >>>>> b/drivers/xen/xenbus/xenbus_probe_frontend.c >>>>>> index 3159a37..8583afe 100644 >>>>>> --- a/drivers/xen/xenbus/xenbus_probe_frontend.c >>>>>> +++ b/drivers/xen/xenbus/xenbus_probe_frontend.c >>>>>> @@ -89,9 +89,22 @@ static void backend_changed(struct xenbus_watch *watch, >>>>>> xenbus_otherend_changed(watch, vec, len, 1); >>>>>> } >>>>>> >>>>>> +static int xenbus_frontend_dev_resume(struct device *dev) >>>>>> +{ >>>>>> + /* >>>>>> + * If xenstored is running in that domain, we cannot access the backend >>>>>> + * state at the moment. If we are running in dom0, the domain running >>>>>> + * xenstored is still suspended at that point >>>>>> + */ >>>>>> + if (xen_initial_domain() || (xen_store_domain == XS_LOCAL)) >>>>>> + return 0; >>>>>> + >>>>>> + return xenbus_dev_resume(dev); >>>>> When or where does this eventually get called for the init domain or >>>>> XS_LOCAL cases? >>>> I was about to ask the same question. Plus I don't think the >>>> description here or in the overview mail really makes clear how >>>> specifically a deadlock would occur here. That's pretty relevant to >>>> understand in the light that so far we had no indication of there >>>> being any special treatment necessary here, and resume from S3 >>>> had been working quite fine without that (at least as long as >>>> xenstored is running in Dom0 and at least with the traditional/ >>>> forward-port/non-pvops kernels). >>> I think the unusual feature here is that dom0 has a netfront attached. >>> Netfront resume is therefore hanging because it is trying to talk to the >>> still frozen xenstored process in dom0. >>> >>> Ian. >>> >> Yes, the unusual feature of having a netfront driver in dom0 is >> triggering the S3 issue I described. Ian made me realize this issue >> could also happen in Xenstore stub domains. >> >> The root cause of the issue is the assomption that a xenstored process >> is running in another domain when the xenbus frontend is being resumed >> from S3. This assomption is incorrect if xenstored and the xenbus >> frontend are running in the same domain. As Linux kernel is waiting for >> all devices to be resumed before resuming userland tasks, the xenbus >> frontend resume is blocking the userland process resume, waiting for >> xenstored (which cannot run as it is a userland process). >> >> The xenbus_dev_resume function for frontend devices such as nefront will >> not be called at all with that patch. I am relying on the fact that the >> network backend domain will be resumed after dom0 resume is complete. >> When that resume is happening, it will trigger a call to netback_changed >> in dom0 netfront. This call will end up resuming xenbus states in netfront. >> >> That logic is working for a dom0 netfront, as we can safely rely on the >> fact that the network backend domain will be resumed after dom0 resume >> is complete. I don't have a Xen configuration with Xenstore stub domain, >> but it would probably need some extra logic to reconnect the frontend >> after xenstored is being resumed. The main goal of this patch is to fix >> the S3 resume of domains running both a xenbus frontend and xenstored. > Is the assumption that other domains are all suspended over S3 a valid > one in the general case? > > In principal there is nothing stopping the toolstack from leaving > domains running over S3, is there? > That seems a valid assumption for dom0. It is probably not for Xenstore stub domains, even if I fail to see a use case for that. Another solution would be to defer the call to xenbus_dev_resume until userland processes have been resumed. Any opinion on that ? Aurelien _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2013-05-02 12:32 UTC | newest] Thread overview: 9+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2013-05-01 12:57 [PATCH 0/2] xenbus: Fix S3 frontend resume when xenstored is not running Aurelien Chartier 2013-05-01 12:57 ` [PATCH 1/2] xenbus: save xenstore local status for later use Aurelien Chartier 2013-05-01 12:57 ` [PATCH 2/2] xenbus: bypass xenbus frontend resume if xenstored is not running Aurelien Chartier 2013-05-02 8:24 ` Ian Campbell 2013-05-02 9:21 ` Jan Beulich 2013-05-02 9:24 ` Ian Campbell 2013-05-02 10:10 ` Aurélien Chartier 2013-05-02 10:30 ` Ian Campbell 2013-05-02 12:32 ` Aurelien Chartier
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.