Re: xen: oops at atomic64_read

xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed

* Re: xen: oops at atomic64_read_cx8+0x4
       [not found]   ` <20120607064017.GA2112@hanuman.astro.su.se>
@ 2012-06-07  7:33     ` Jonathan Nieder
  2012-06-07 10:33       ` Andrea Arcangeli
  0 siblings, 1 reply; 6+ messages in thread
From: Jonathan Nieder @ 2012-06-07  7:33 UTC (permalink / raw)
  To: Sergio Gelato; +Cc: Andrea Arcangeli, Andrew Morton, xen-devel, 676360

Sergio Gelato wrote[1]:

>                          That 3.4.1-1~experimental.1 build
> (3.4-trunk-686-pae #1 SMP Wed Jun 6 15:11:31 UTC 2012 i686 GNU/Linux)
> is even less well-behaved under Xen: I'm getting a kernel OOPS at
> EIP: [<c1168e54>] atomic64_read_cx8+0x4/0xc SS:ESP e021:ca853c6c
> The top of the trace message unfortunately scrolled off the console before I
> could see it, and the message doesn't have time to make it to syslog (either
> local or remote).
[...]
> Non-Xen boots proceed normally.

Yeah, apparently[2] that's caused by

	commit 26c191788f18
	Author: Andrea Arcangeli <aarcange@redhat.com>
	Date:   Tue May 29 15:06:49 2012 -0700

	    mm: pmd_read_atomic: fix 32bit PAE pmd walk vs pmd_populate SMP race condition

which was also included in Debian kernel 3.2.19-1.

[1] http://bugs.debian.org/676360
[2] https://bugzilla.redhat.com/show_bug.cgi?id=829016#c4

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: xen: oops at atomic64_read_cx8+0x4
  2012-06-07  7:33     ` xen: oops at atomic64_read_cx8+0x4 Jonathan Nieder
@ 2012-06-07 10:33       ` Andrea Arcangeli
  2012-06-07 12:30         ` Bug#676360: " Sergio Gelato
                           ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Andrea Arcangeli @ 2012-06-07 10:33 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Andrew Morton, Sergio Gelato, xen-devel, 676360

On Thu, Jun 07, 2012 at 02:33:33AM -0500, Jonathan Nieder wrote:
> Sergio Gelato wrote[1]:
> 
> >                          That 3.4.1-1~experimental.1 build
> > (3.4-trunk-686-pae #1 SMP Wed Jun 6 15:11:31 UTC 2012 i686 GNU/Linux)
> > is even less well-behaved under Xen: I'm getting a kernel OOPS at
> > EIP: [<c1168e54>] atomic64_read_cx8+0x4/0xc SS:ESP e021:ca853c6c
> > The top of the trace message unfortunately scrolled off the console before I
> > could see it, and the message doesn't have time to make it to syslog (either
> > local or remote).
> [...]
> > Non-Xen boots proceed normally.
> 
> Yeah, apparently[2] that's caused by
> 
> 	commit 26c191788f18
> 	Author: Andrea Arcangeli <aarcange@redhat.com>
> 	Date:   Tue May 29 15:06:49 2012 -0700
> 
> 	    mm: pmd_read_atomic: fix 32bit PAE pmd walk vs pmd_populate SMP race condition
> 
> which was also included in Debian kernel 3.2.19-1.
> 
> [1] http://bugs.debian.org/676360
> [2] https://bugzilla.redhat.com/show_bug.cgi?id=829016#c4

Oops, sorry I didn't imagine atomic64_read on a pmd would trip.

Unfortunately to support pagetable walking with mmap_sem hold for
reading, we need an atomic read on 32bit PAE if
CONFIG_TRANSPARENT_HUGEPAGE=y.

The only case requiring this is 32bit PAE with
CONFIG_TRANSPARENT_HUGEPAGE=y at build time. If you set
CONFIG_TRANSPARENT_HUGEPAGE=n temporarily you should be able to work
around this as I optimized the code in a way to avoid an expensive
cmpxchg8b.

I guess if Xen can't be updated to handle an atomic64_read on a pmd in
the guest, we can add a pmd_read paravirt op? Or if we don't want to
break the paravirt interface a loop like gup_fast with irq disabled
should also work but looping + local_irq_disable()/enable() sounded
worse and more complex than a atomic64_read (gup fast already disables
irqs because it doesn't hold the mmap_sem so it's a different cost
looping there). AFIK Xen disables THP during boot, so a check on THP
being enabled and falling back in the THP=n version of
pmd_read_atomic, would also be safe, but it's not so nice to do it
with a runtime check.

Thanks,
Andrea

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Bug#676360: xen: oops at atomic64_read_cx8+0x4
  2012-06-07 10:33       ` Andrea Arcangeli
@ 2012-06-07 12:30         ` Sergio Gelato
  2012-06-07 15:56         ` Bug#676360: [Xen-devel] " Konrad Rzeszutek Wilk
  2012-06-07 19:50         ` Jan Beulich
  2 siblings, 0 replies; 6+ messages in thread
From: Sergio Gelato @ 2012-06-07 12:30 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Jonathan Nieder, 676360, Andrew Morton, xen-devel

* Andrea Arcangeli [2012-06-07 12:33:55 +0200]:
> I guess if Xen can't be updated to handle an atomic64_read on a pmd in
> the guest, 

I'm not sure if it makes a difference, but just in case: I observed the
problem in a dom0.

>            we can add a pmd_read paravirt op? Or if we don't want to
> break the paravirt interface a loop like gup_fast with irq disabled
> should also work but looping + local_irq_disable()/enable() sounded
> worse and more complex than a atomic64_read (gup fast already disables
> irqs because it doesn't hold the mmap_sem so it's a different cost
> looping there). AFIK Xen disables THP during boot, so a check on THP
> being enabled and falling back in the THP=n version of
> pmd_read_atomic, would also be safe, but it's not so nice to do it
> with a runtime check.
> 
> Thanks,
> Andrea

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Bug#676360: [Xen-devel] xen: oops at atomic64_read_cx8+0x4
  2012-06-07 10:33       ` Andrea Arcangeli
  2012-06-07 12:30         ` Bug#676360: " Sergio Gelato
@ 2012-06-07 15:56         ` Konrad Rzeszutek Wilk
  2012-06-07 16:17           ` Andrea Arcangeli
  2012-06-07 19:50         ` Jan Beulich
  2 siblings, 1 reply; 6+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-06-07 15:56 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Jonathan Nieder, Andrew Morton, Sergio Gelato, xen-devel, 676360

On Thu, Jun 07, 2012 at 12:33:55PM +0200, Andrea Arcangeli wrote:
> On Thu, Jun 07, 2012 at 02:33:33AM -0500, Jonathan Nieder wrote:
> > Sergio Gelato wrote[1]:
> > 
> > >                          That 3.4.1-1~experimental.1 build
> > > (3.4-trunk-686-pae #1 SMP Wed Jun 6 15:11:31 UTC 2012 i686 GNU/Linux)
> > > is even less well-behaved under Xen: I'm getting a kernel OOPS at
> > > EIP: [<c1168e54>] atomic64_read_cx8+0x4/0xc SS:ESP e021:ca853c6c
> > > The top of the trace message unfortunately scrolled off the console before I
> > > could see it, and the message doesn't have time to make it to syslog (either
> > > local or remote).
> > [...]
> > > Non-Xen boots proceed normally.
> > 
> > Yeah, apparently[2] that's caused by
> > 
> > 	commit 26c191788f18
> > 	Author: Andrea Arcangeli <aarcange@redhat.com>
> > 	Date:   Tue May 29 15:06:49 2012 -0700
> > 
> > 	    mm: pmd_read_atomic: fix 32bit PAE pmd walk vs pmd_populate SMP race condition
> > 
> > which was also included in Debian kernel 3.2.19-1.
> > 
> > [1] http://bugs.debian.org/676360
> > [2] https://bugzilla.redhat.com/show_bug.cgi?id=829016#c4
> 
> Oops, sorry I didn't imagine atomic64_read on a pmd would trip.

Hmm, so it looks like it used to do this:

 pmd = pmd_offset(pud, addr);
 ..
 pmd_t pmdval = *pmd;

but now you do:
 pmd_t ret = (pmd_val)((u32)*tmp);
 ret |= (*tmp+1) << 32;

which would read the low first and then the high one next
(or is the other way around?).  The 'pmd_offset' beforehand
manufactures the pmd using the PFN to MFN lookup tree (so
that there aren't any hypercall or traps).

Hm, with your change, you are still looking at the 'pmd'
and its contents, except that you are reading the low and
then the high part. Why that would trip the hypervisor
is not clear to me. Perhaps in the past it only read the
low bits?

If there was Xen hypervisor log that might give some ideas. Is
there any chance that the Linode folks could send that over?

> 
> Unfortunately to support pagetable walking with mmap_sem hold for
> reading, we need an atomic read on 32bit PAE if
> CONFIG_TRANSPARENT_HUGEPAGE=y.
> 
> The only case requiring this is 32bit PAE with
> CONFIG_TRANSPARENT_HUGEPAGE=y at build time. If you set
> CONFIG_TRANSPARENT_HUGEPAGE=n temporarily you should be able to work
> around this as I optimized the code in a way to avoid an expensive
> cmpxchg8b.

Ah, by just skipping the thing if the low bits are zero.
> 
> I guess if Xen can't be updated to handle an atomic64_read on a pmd in
> the guest, we can add a pmd_read paravirt op? Or if we don't want to
> break the paravirt interface a loop like gup_fast with irq disabled
> should also work but looping + local_irq_disable()/enable() sounded
> worse and more complex than a atomic64_read (gup fast already disables
> irqs because it doesn't hold the mmap_sem so it's a different cost

I am not really sure what is at foot. It sounds like the hypervisor
didn't like somebody reading the high and low bit, but isn't the
pmdval_t still 64-bit ? So I would have thought this would
have been triggered? Or is that the code on pmd_val never actually
read the high bits (before your addition to the atomic_read?)?

> looping there). AFIK Xen disables THP during boot, so a check on THP
> being enabled and falling back in the THP=n version of
> pmd_read_atomic, would also be safe, but it's not so nice to do it
> with a runtime check.

The thing is that I did install a 32-bit PAE guest (a Fedora) on a Fedora
17 dom0. So it looks like this is reading high part is fixed on the newer
hypervisors, but now with the older ones. And the older one is Amazon EC2
so some .. hack to workaround older hypervisors could be added.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: xen: oops at atomic64_read_cx8+0x4
  2012-06-07 15:56         ` Bug#676360: [Xen-devel] " Konrad Rzeszutek Wilk
@ 2012-06-07 16:17           ` Andrea Arcangeli
  0 siblings, 0 replies; 6+ messages in thread
From: Andrea Arcangeli @ 2012-06-07 16:17 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Jonathan Nieder, Andrew Morton, Sergio Gelato, xen-devel, 676360

Hi,

On Thu, Jun 07, 2012 at 11:56:47AM -0400, Konrad Rzeszutek Wilk wrote:
> then the high part. Why that would trip the hypervisor
> is not clear to me. Perhaps in the past it only read the

That is the CONFIG_TRANSPARENT_HUGEPAGE=n case and in fact it doesn't
trip the hypervisor. That was tested too, it should work fine.

The problem is with the atomic64_read version, that one uses cmpxchg8b
to read the contents of the pmdp.

> Ah, by just skipping the thing if the low bits are zero.

Yep.

> didn't like somebody reading the high and low bit, but isn't the
> pmdval_t still 64-bit ? So I would have thought this would

The pmd format is unchanged, that's hardware.

> The thing is that I did install a 32-bit PAE guest (a Fedora) on a Fedora
> 17 dom0. So it looks like this is reading high part is fixed on the newer
> hypervisors, but now with the older ones. And the older one is Amazon EC2
> so some .. hack to workaround older hypervisors could be added.

The insn oopsing is cmpxchg8b and it's not reading the low/high part
in two separate insn but reading it in a single insn, which means the
kernel oopsing was built with CONFIG_TRANSPARENT_HUGEPAGE=y.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: xen: oops at atomic64_read_cx8+0x4
  2012-06-07 10:33       ` Andrea Arcangeli
  2012-06-07 12:30         ` Bug#676360: " Sergio Gelato
  2012-06-07 15:56         ` Bug#676360: [Xen-devel] " Konrad Rzeszutek Wilk
@ 2012-06-07 19:50         ` Jan Beulich
  2 siblings, 0 replies; 6+ messages in thread
From: Jan Beulich @ 2012-06-07 19:50 UTC (permalink / raw)
  To: jrnieder, aarcange; +Cc: akpm, Sergio.Gelato, xen-devel, 676360

>>> Andrea Arcangeli <aarcange@redhat.com> 06/07/12 12:35 PM >>>
>Oops, sorry I didn't imagine atomic64_read on a pmd would trip.

The problem really is that the cmpxchg8b is a write, and hence won't go
through without faulting on a write protected page (which all page table
pages necessarily are).

>I guess if Xen can't be updated to handle an atomic64_read on a pmd in
>the guest, we can add a pmd_read paravirt op?

Xen could certainly be updated to treat cmpxchg8b on a PMD entry as a
simple 8-byte read when compared-to and to-be-stored values are
identical, but the problem would be that hypervisors in the field wouldn't
necessarily get updated.

>Or if we don't want to
>break the paravirt interface a loop like gup_fast with irq disabled
>should also work but looping + local_irq_disable()/enable() sounded
>worse and more complex than a atomic64_read (gup fast already disables
>irqs because it doesn't hold the mmap_sem so it's a different cost
>looping there). AFIK Xen disables THP during boot, so a check on THP
>being enabled and falling back in the THP=n version of
>pmd_read_atomic, would also be safe, but it's not so nice to do it
>with a runtime check.

That would probably nevertheless be the most viable option. If
performance critical(?), maybe this could be hidden with something
like the alternative instruction or paravirt patching mechanisms?

Jan

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2012-06-07 19:50 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20120606132012.GB24320@hanuman.astro.su.se>
     [not found] ` <20120606163000.GD9354@burratino>
     [not found]   ` <20120607064017.GA2112@hanuman.astro.su.se>
2012-06-07  7:33     ` xen: oops at atomic64_read_cx8+0x4 Jonathan Nieder
2012-06-07 10:33       ` Andrea Arcangeli
2012-06-07 12:30         ` Bug#676360: " Sergio Gelato
2012-06-07 15:56         ` Bug#676360: [Xen-devel] " Konrad Rzeszutek Wilk
2012-06-07 16:17           ` Andrea Arcangeli
2012-06-07 19:50         ` Jan Beulich

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).