From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jeremy Fitzhardinge
Subject: Re: 2.6.29-rc8 pv_ops dom0 BUG / unable to handle kernel paging request
Date: Sun, 22 Mar 2009 14:21:01 -0700
Message-ID: <49C6ABBD.3010702@goop.org>
References: <20090311212622.GK15052@edu.joroinen.fi>
 <49B84BE8.4020103@goop.org>
 <20090312083242.GM15052@edu.joroinen.fi>
 <49C3DC02.7010903@goop.org>
 <20090321201652.GT15052@edu.joroinen.fi>
 <20090321225031.GU15052@edu.joroinen.fi>
 <20090321231341.GV15052@edu.joroinen.fi>
 <49C5BE87.8040303@goop.org>
 <20090322115151.GW15052@edu.joroinen.fi>
 <20090322170423.GD5528@edu.joroinen.fi>
 <20090322204041.GE5528@edu.joroinen.fi>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
In-Reply-To: <20090322204041.GE5528@edu.joroinen.fi>
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: Pasi Kärkkäinen
Cc: xen-devel@lists.xensource.com
List-Id: xen-devel@lists.xenproject.org

Pasi Kärkkäinen wrote:
> On Sun, Mar 22, 2009 at 07:04:23PM +0200, Pasi Kärkkäinen wrote:
>> On Sun, Mar 22, 2009 at 01:51:51PM +0200, Pasi Kärkkäinen wrote:
>>> On Sat, Mar 21, 2009 at 09:28:55PM -0700, Jeremy Fitzhardinge wrote:
>>>> Pasi Kärkkäinen wrote:
>>>>> On Sun, Mar 22, 2009 at 12:50:31AM +0200, Pasi Kärkkäinen wrote:
>>>>>> On Sat, Mar 21, 2009 at 10:16:52PM +0200, Pasi Kärkkäinen wrote:
>>>>>>>> Also, do you see this problem before you've started any other domains?
>>>>>>>> Or does it only happen once you've run a domU (or only while a domU is
>>>>>>>> running)?
>>>>>>>>
>>>>>>> I'm not running any other domains.. Only dom0 is running.
>>>>>>>
>>>>>>> Steps to reproduce this BUG on my pv_ops dom0 testbox:
>>>>>>>
>>>>>>> 1) Reboot the box to pv_ops dom0 kernel
>>>>>>> 2) Login to dom0 via ssh
>>>>>>> 3) Start kernel compilation on dom0 (make bzImage && make modules)
>>>>>>> 4) Wait some minutes and pv_ops dom0 kernel BUGs
>>>>>>>
>>>>>>> So no other domains has been or is running when this happens..
>>>>>>>
>>>>>>> I'll try disabling CONFIG_HIGHPTE now, and see if that makes any
>>>>>>> difference.
>>>>>>>
>>>>>> CONFIG_HIGHPTE=y and pv_ops dom0 survives up for maybe 30 mins, and then
>>>>>> BUGs (during kernel compilation):
>>>>>> http://pasik.reaktio.net/xen/pv_ops-dom0-debug/pv_ops-dom0-bootlog-22-xen331-linux-2.6.29-rc8-bug-with-highpte.txt
>>>>>>
>>>>>>
>>>>>> CONFIG_HIGHPTE=n and I get BUG during system startup when udev is started:
>>>>>> http://pasik.reaktio.net/xen/pv_ops-dom0-debug/pv_ops-dom0-bootlog-23-xen331-linux-2.6.29-rc8-bug-no-highpte.txt
>>>>>>
>>>>>> Starting udev: BUG: unable to handle kernel paging request at 70007823
>>>>>> IP: [] pdc_common_ops+0x171/0xfffffcfe [sata_promise]
>>>>>> *pdpt = 000000005f781001
>>>>>> Oops: 0002 [#1] SMP
>>>>>>
>>>>>> So yeah.. with CONFIG_HIGHPTE=n it seems to happen when sata_promise is
>>>>>> loaded.. What should I try next?
>>>>>>
>>>>> Actually it's not only sata_promise.
>>>>> I tried 2 more times with the
>>>>> CONFIG_HIGHPTE=n pv_ops dom0 kernel:
>>>>>
>>>>> http://pasik.reaktio.net/xen/pv_ops-dom0-debug/pv_ops-dom0-bootlog-23-xen331-linux-2.6.29-rc8-bug-no-highpte.txt
>>>>> http://pasik.reaktio.net/xen/pv_ops-dom0-debug/pv_ops-dom0-bootlog-23-xen331-linux-2.6.29-rc8-bug-no-highpte-2.txt
>>>>> http://pasik.reaktio.net/xen/pv_ops-dom0-debug/pv_ops-dom0-bootlog-23-xen331-linux-2.6.29-rc8-bug-no-highpte-3.txt
>>>>>
>>>>> BUG: unable to handle kernel paging request at a536462c
>>>>> IP: [] classes+0x688/0xfffffa30 [parport]
>>>>> *pdpt = 000000005f759001
>>>>> Oops: 0002 [#1] SMP
>>>>>
>>>> Hm, OK. Something is clearly drastically amiss. I'll try to repro.
>>>>
>>> Actually it seems CONFIG_HIGHPTE=n kernel fails also on baremetal:
>>> http://pasik.reaktio.net/xen/pv_ops-dom0-debug/pv_ops-dom0-bootlog-24-baremetal-2.6.29-rc8-bug-no-highpte.txt
>>>
>>> Starting udev: invalid opcode: 0000 [#1] SMP
>>>
>>> Summary:
>>> CONFIG_HIGHPTE=n: both dom0 and baremetal fail during system startup when udev is started
>>> CONFIG_HIGHPTE=y: baremetal works OK, dom0 fails with BUG after around 30 mins of kernel compilation
>>>
>> Please ignore this summary, there was something wrong with my kernel builds or
>> something.
>>
>> I'll post new summary soon when I'm finished with testing.
>>
>
> Ok, I did new fresh kernel+modules builds and re-tested everything.
>
> New summary:
> CONFIG_HIGHPTE=n: both dom0 and baremetal work OK, both survive kernel compilation.
> CONFIG_HIGHPTE=y: baremetal works OK and survives kernel compilation, but dom0 fails with BUG after around 20-30 mins of kernel compilation
>

Thanks for getting a consistent test result; the other reports looked,
frankly, scary and I wouldn't want to be on that wild goose chase.
These ones look much more tractable, though I don't really have a theory
for them. I'll have a look next week sometime.

    J