From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jeremy Fitzhardinge
Subject: Re: 2.6.32.27 dom0 - BUG: unable to handle kernel paging request
Date: Mon, 31 Jan 2011 14:22:47 -0800
Message-ID: <4D473637.1050804@goop.org>
References: <4D1D0E44.9030807@theshore.net>
 <1294132563.3831.11.camel@zakaz.uk.xensource.com>
 <31258BA9-9301-4144-B8F4-4F799BB4BB74@theshore.net>
 <1294173242.13733.1.camel@localhost.localdomain>
 <4517209A-2F8B-41A1-9727-A0E498181135@theshore.net>
 <20110110185610.GC9837@dumpdata.com>
 <4D2B7EE8.7070309@nuclearfallout.net>
 <614A8802-4406-48BF-83FF-69EAA2A233E1@theshore.net>
 <4D472486.9050205@theshore.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
In-Reply-To: <4D472486.9050205@theshore.net>
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: "Christopher S. Aker"
Cc: Ian Campbell, xen devel
List-Id: xen-devel@lists.xenproject.org

On 01/31/2011 01:07 PM, Christopher S. Aker wrote:
>> Xen: 3.4.4-rc1-pre 64bit (xenbits @ 19986)
>> Dom0: 2.6.32.27-1 PAE (xen/stable-2.6.32.x)
>>
>> We've been running our xen-thrash testsuite on a bunch of hosts
>> against a very recent build, and we've just hit this on one box:
>>
>> BUG: unable to handle kernel paging request at 15555d60
>
> Two additional boxes out of my last test round have also hit this.
> About one a week.
>
> Ian / Jeremy: Where do I go from here?

There seems to be a moderately difficult-to-hit (but still pretty large)
race in pagetable teardown. It *should* be protected by the pgd lock, so
we need to work out where a teardown (or access) is happening without
that lock. I think that's going to be a matter of close code review
rather than any more testing.

The interesting thing is that this problem seems to have come to the
fore since the patch that was explicitly intended to avoid it was put
in :/...
Before that, the race was theoretical; AFAIK it had never been observed
in a pvops kernel (though it was seen in the Citrix product in non-pvops
kernels, which is why we fixed it).

I'll try to stare at it in the next couple of days.

    J