From mboxrd@z Thu Jan 1 00:00:00 1970
From: Stefan Bader
Subject: Re: xen-4.6: xenstored crashes during domain->interface access
Date: Thu, 28 Jan 2016 11:05:05 +0100
Message-ID: <56A9E7D1.8090203@canonical.com>
References: <56A7513F.9040504@canonical.com> <56A9D65D.2080700@canonical.com> <1453973957.26591.62.camel@citrix.com>
In-Reply-To: <1453973957.26591.62.camel@citrix.com>
To: Ian Campbell, "xen-devel@lists.xensource.com"
Cc: Bastian Blank
List-Id: xen-devel@lists.xenproject.org

On 28.01.2016 10:39, Ian Campbell wrote:
> On Thu, 2016-01-28 at 09:50 +0100, Stefan Bader wrote:
>> On 26.01.2016 11:58, Stefan Bader wrote:
>>> Hi,
>>>
>>> while playing around with xen-4.6 I stumbled over an odd problem and am
>>> wondering whether anybody has seen the same. A method to relatively
>>> quickly reproduce this for me seems to be:
>>>
>>> - Start one domU (PV or HVM does not seem to matter)
>>> - Repeatedly call xenstore-ls a few times
>>>
>>> I think I never got beyond 10 repeats when the xenstore-ls call
>>> suddenly locks up and xenstored crashes with a SIGBUS error. In the
>>> majority of cases (I think I saw one different), the crash happens
>>> while accessing conn->domain->interface in
>>> tools/xenstore/xenstored_domain.c:domain_can_read().
>>>
>>> Looking at the corefile produced by xenstored I now got at least one
>>> case where the pointer still matches the previously mapped value.
>>> Though I think I had also at least one run (with less debugging added)
>>> where it seemed to be really wrong.
>>> There is more info at [1] in case someone is interested.
>>>
>>> I need to repeat a few more times to see how consistent the whole thing
>>> is. Does this happen for anybody else? Any advice what I should look at
>>> (in the sense of gathering better data)?
>>
>> Just as an update and confirmation for Ian and Bastian: Debian testing
>> is fine. I have not dug into the specifics but it's not the Xen package
>> side at all. Something in our 4.3 kernel causes this. Unfortunately
>> without any hint in dmesg. But since we move to 4.4 soon and I cannot
>> reproduce it with the pending 4.4 build it seems good enough to me.
>
> Ah, this is probably fixed by 9c17d96500f78 "xen/gntdev: Grant maps should
> not be subject to NUMA balancing" then.

Oh right. That sounds very possible. Maybe paired with balancing done even
on a non-NUMA system (because I saw the same happen on a non-NUMA host,
too). And I cannot remember ever having this with 4.2, so 4.3 seems to have
introduced the additional (or maybe more aggressive) balancing.

But the result was pretty much what I saw: from one second to the next, the
grant-table page of xenstored for the running domU was invalid, without the
daemon having done any unmap. So yeah, likely the balancing got rid of it.

-Stefan

>
> Ian.
>
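
The reproduction steps quoted in the thread (start one domU, then call xenstore-ls repeatedly until the call locks up) could be scripted roughly as below. This is only a sketch and not something posted in the thread: the helper name repeat_command, the default of 10 repeats, and the 30 second timeout used to detect a hang are all assumptions chosen for illustration. It reports at which iteration the command first hung or failed; it does not inspect xenstored itself.

```python
import subprocess


def repeat_command(cmd, repeats=10, timeout_s=30.0):
    """Run cmd up to `repeats` times and report the first failure.

    Hypothetical helper, not part of Xen. Returns a (status, iteration)
    tuple where status is "ok", "hang" (timeout expired) or "error"
    (non-zero exit code).
    """
    for i in range(1, repeats + 1):
        try:
            result = subprocess.run(
                cmd,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout_s,  # assumed threshold for "locks up"
            )
        except subprocess.TimeoutExpired:
            # The call did not return in time, which is how the lock-up
            # described in the thread would show up here.
            return ("hang", i)
        if result.returncode != 0:
            return ("error", i)
    return ("ok", repeats)
```

On a dom0 with one domU running, calling repeat_command(["xenstore-ls"]) would then show whether the lock-up appears within the first 10 repeats, as described in the original report.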