* xen-4.6: xenstored crashes during domain->interface access
From: Stefan Bader @ 2016-01-26 10:58 UTC
To: xen-devel@lists.xensource.com; +Cc: Bastian Blank, Ian Campbell
Hi,
While playing around with xen-4.6 I stumbled over an odd problem and am
wondering whether anybody has seen the same. A way to reproduce it relatively
quickly seems to be:
- Start one domU (PV or HVM does not seem to matter)
- Call xenstore-ls repeatedly
I think I never got beyond 10 repeats before the xenstore-ls call suddenly locks
up and xenstored crashes with SIGBUS. In the majority of cases (I think I saw
one that differed), the crash happens while accessing conn->domain->interface
in tools/xenstore/xenstored_domain.c:domain_can_read().
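The reproduction steps above can be sketched as a small loop that reruns a command and stops at the first failure or hang. This is only an illustration: the function name `repeat_command`, the limit of 10 runs, and the 30-second timeout are assumptions, not something taken from the report.

```python
import subprocess


def repeat_command(cmd, max_runs=10, timeout_s=30.0):
    """Run cmd up to max_runs times; stop at the first failure or hang.

    Returns (clean_runs, reason), where reason is "ok", "timeout",
    or "exit <code>".
    """
    for run in range(1, max_runs + 1):
        try:
            result = subprocess.run(
                cmd,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            # A hang is the symptom described above: the client locks up.
            return run - 1, "timeout"
        if result.returncode != 0:
            return run - 1, f"exit {result.returncode}"
    return max_runs, "ok"
```

On a dom0 with one domU running, `repeat_command(["xenstore-ls"])` would report how many calls completed cleanly before the lock-up, assuming xenstore-ls is on the PATH.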
Looking at the core file produced by xenstored, I now have at least one case
where the pointer still matches the previously mapped value, though I think I
also had at least one run (with less debugging added) where it seemed to be
really wrong. There is more info at [1] in case someone is interested.
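One way to gather more data on a still-running xenstored might be to check whether the pointer value falls inside a region listed in /proc/<pid>/maps. The sketch below only parses the text of that file; the function name `find_mapping` is hypothetical, and a mapping being listed does not by itself show that the page behind it is still valid.

```python
def find_mapping(maps_text, address):
    """Return the /proc/<pid>/maps line whose range covers address, or None."""
    for line in maps_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        # The first field has the form "start-end" in hexadecimal;
        # the end address is exclusive.
        start_hex, _, end_hex = fields[0].partition("-")
        start, end = int(start_hex, 16), int(end_hex, 16)
        if start <= address < end:
            return line
    return None
```

For a live process this could be called as `find_mapping(open(f"/proc/{pid}/maps").read(), pointer_value)`, where `pid` and `pointer_value` come from the debugging session.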
I need to repeat this a few more times to see how consistent the whole thing is.
Does this happen for anybody else? Any advice on what I should look at (in the
sense of gathering better data)?
Thanks,
Stefan
[1] https://bugs.launchpad.net/ubuntu/+source/xen/+bug/1538049
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
* Re: xen-4.6: xenstored crashes during domain->interface access
From: Stefan Bader @ 2016-01-28 8:50 UTC
To: xen-devel@lists.xensource.com; +Cc: Bastian Blank, Ian Campbell
On 26.01.2016 11:58, Stefan Bader wrote:
> Hi,
>
> while playing around with xen-4.6 I stumbled over an odd problem and am
> wondering whether anybody has seen the same. A method to relatively quickly
> reproduce this for me seems to:
>
> - Start one domU (PV or HVM does not seem to matter)
> - Repeatedly call xenstore-ls a few times
>
> I think I never got beyond 10 repeats when the xenstore-ls call suddenly locks
> up and xenstored crashes with a SIGBUS error. In the majority of cases (I think
> I saw one different), the crash happens while accessing conn->domain->interface
> in tools/xenstore/xenstored_domain.c:domain_can_read().
> Looking at the corefile produced by xenstored I now got at least one case where
> the pointer still matches the previously mapped value. Though I think I had also
> at least one run (with less debugging added) where it seemed to be really wrong.
> There is more info at [1] in case someone is interested.
>
> I need to repeat a few more times to see how consistent the whole thing is. Does
> this happen for anybody else? Any advice what I should look at (in the sense of
> gathering better data)?
Just as an update and confirmation for Ian and Bastian: Debian testing is fine.
I have not dug into the specifics, but it's not the Xen package side at all.
Something in our 4.3 kernel causes this, unfortunately without any hint in
dmesg. But since we move to 4.4 soon and I cannot reproduce it with the pending
4.4 build, it seems good enough to me.
-Stefan
>
> Thanks,
> Stefan
>
> [1] https://bugs.launchpad.net/ubuntu/+source/xen/+bug/1538049
>
* Re: xen-4.6: xenstored crashes during domain->interface access
From: Ian Campbell @ 2016-01-28 9:39 UTC
To: Stefan Bader, xen-devel@lists.xensource.com; +Cc: Bastian Blank
On Thu, 2016-01-28 at 09:50 +0100, Stefan Bader wrote:
> On 26.01.2016 11:58, Stefan Bader wrote:
> > Hi,
> >
> > while playing around with xen-4.6 I stumbled over an odd problem and am
> > wondering whether anybody has seen the same. A method to relatively quickly
> > reproduce this for me seems to:
> >
> > - Start one domU (PV or HVM does not seem to matter)
> > - Repeatedly call xenstore-ls a few times
> >
> > I think I never got beyond 10 repeats when the xenstore-ls call suddenly locks
> > up and xenstored crashes with a SIGBUS error. In the majority of cases (I think
> > I saw one different), the crash happens while accessing conn->domain->interface
> > in tools/xenstore/xenstored_domain.c:domain_can_read().
> > Looking at the corefile produced by xenstored I now got at least one case where
> > the pointer still matches the previously mapped value. Though I think I had also
> > at least one run (with less debugging added) where it seemed to be really wrong.
> > There is more info at [1] in case someone is interested.
> >
> > I need to repeat a few more times to see how consistent the whole thing is. Does
> > this happen for anybody else? Any advice what I should look at (in the sense of
> > gathering better data)?
>
> Just as an update and confirmation for Ian and Bastian: Debian testing is fine.
> I have not dug into the specifics but its not the Xen package side at all.
> Something in our 4.3 kernel causes this. Unfortunately without any hint in
> dmesg. But since we move to 4.4 soon and I cannot reproduce it with the pending
> 4.4 build it seems good enough to me.
Ah, this is probably fixed by 9c17d96500f78 "xen/gntdev: Grant maps should
not be subject to NUMA balancing" then.
Ian.
* Re: xen-4.6: xenstored crashes during domain->interface access
From: Stefan Bader @ 2016-01-28 10:05 UTC
To: Ian Campbell, xen-devel@lists.xensource.com; +Cc: Bastian Blank
On 28.01.2016 10:39, Ian Campbell wrote:
> On Thu, 2016-01-28 at 09:50 +0100, Stefan Bader wrote:
>> On 26.01.2016 11:58, Stefan Bader wrote:
>>> Hi,
>>>
>>> while playing around with xen-4.6 I stumbled over an odd problem and am
>>> wondering whether anybody has seen the same. A method to relatively quickly
>>> reproduce this for me seems to:
>>>
>>> - Start one domU (PV or HVM does not seem to matter)
>>> - Repeatedly call xenstore-ls a few times
>>>
>>> I think I never got beyond 10 repeats when the xenstore-ls call suddenly locks
>>> up and xenstored crashes with a SIGBUS error. In the majority of cases (I think
>>> I saw one different), the crash happens while accessing conn->domain->interface
>>> in tools/xenstore/xenstored_domain.c:domain_can_read().
>>> Looking at the corefile produced by xenstored I now got at least one case where
>>> the pointer still matches the previously mapped value. Though I think I had also
>>> at least one run (with less debugging added) where it seemed to be really wrong.
>>> There is more info at [1] in case someone is interested.
>>>
>>> I need to repeat a few more times to see how consistent the whole thing is. Does
>>> this happen for anybody else? Any advice what I should look at (in the sense of
>>> gathering better data)?
>>
>> Just as an update and confirmation for Ian and Bastian: Debian testing is fine.
>> I have not dug into the specifics but its not the Xen package side at all.
>> Something in our 4.3 kernel causes this. Unfortunately without any hint in
>> dmesg. But since we move to 4.4 soon and I cannot reproduce it with the pending
>> 4.4 build it seems good enough to me.
>
> Ah, this is probably fixed by 9c17d96500f78 "xen/gntdev: Grant maps should
> not be subject to NUMA balancing" then.
Oh right, that sounds very plausible, maybe paired with balancing being done
even on a non-NUMA system (I saw the same happen on a non-NUMA host, too). I
cannot remember ever having this with 4.2, so 4.3 seems to have introduced
additional (or maybe more aggressive) balancing.
And the outcome pretty much matches what I saw: from one second to the next, the
grant-table page of xenstored for the running domU was invalid, without the
daemon having done any unmap. So yes, the balancing likely got rid of it.
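To look into the balancing theory on an affected host, one could first check whether automatic NUMA balancing is enabled at all. This is a minimal sketch, assuming the kernel exposes the `kernel.numa_balancing` sysctl at /proc/sys/kernel/numa_balancing (the file is absent on kernels built without NUMA balancing support); the function name `numa_balancing_state` is hypothetical.

```python
from pathlib import Path

# Assumed location of the kernel.numa_balancing sysctl.
NUMA_BALANCING_SYSCTL = Path("/proc/sys/kernel/numa_balancing")


def numa_balancing_state(path=NUMA_BALANCING_SYSCTL):
    """Return the sysctl value as an int, or None if it cannot be read."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        # Missing file, unreadable file, or unexpected content.
        return None
```

A return value of 0 means balancing is switched off, a non-zero value means it is on, and None means the kernel does not offer the knob. Comparing the reproduction rate with balancing on and off would be one experiment to try; nobody in this thread reports having run it.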
-Stefan
>
> Ian.
>