* Re: [PATCH] mm/x86: Patch out arch_flush_lazy_mmu_mode() when running on bare metal
@ 2013-03-20 13:53 Boris Ostrovsky
2013-03-21 0:08 ` Josh Boyer
0 siblings, 1 reply; 10+ messages in thread
From: Boris Ostrovsky @ 2013-03-20 13:53 UTC (permalink / raw)
To: jwboyer
Cc: mingo, konrad.wilk, tglx, rostedt, kraman, gregkh, stable, bp,
samu.kallio, xen-devel, linux-kernel, hpa
----- jwboyer@redhat.com wrote:
> On Wed, Mar 13, 2013 at 09:25:44AM -0400, Boris Ostrovsky wrote:
> > On 03/01/2013 07:14 AM, Josh Boyer wrote:
> > >On Thu, Feb 28, 2013 at 04:52:20PM -0800, H. Peter Anvin wrote:
> > >>On 02/28/2013 04:42 PM, Josh Boyer wrote:
> > >>>On Fri, Mar 01, 2013 at 01:36:29AM +0100, Borislav Petkov wrote:
> > >>>>On Thu, Feb 28, 2013 at 04:15:45PM -0800, H. Peter Anvin wrote:
> > >>>>>>I'll try to get someone to test this tomorrow.
> > >>>>Btw, you'd need to apply that other patch too
> > >>>>
> > >>>>http://marc.info/?l=xen-devel&m=136206183814547&w=2
> > >>>>
> > >>>>so that arch_flush_lazy_mmu_mode() has at least one caller on
> x86_64.
> > >>>Yeah, we already have that applied. It stops crashes in xen
> > >>>environments so we pulled it in as a bugfix. Thanks though!
> > >>>
> > >>Who are "we"?
> > >Sorry, Fedora. That patch has a link to a bug in it. We applied
> the
> > >patch for that bug. I'll apply Boris' patch on top and get the
> same
> > >people to test it.
> >
> > Josh, have you had a chance to test this?
>
> I've tested it on bare metal for a while now. No problems noticed at
> all. I've not heard back from Krishna who was testing it in the Xen
> environment. Krishna?
Any updates?
Thanks.
-boris
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] mm/x86: Patch out arch_flush_lazy_mmu_mode() when running on bare metal
2013-03-20 13:53 [PATCH] mm/x86: Patch out arch_flush_lazy_mmu_mode() when running on bare metal Boris Ostrovsky
@ 2013-03-21 0:08 ` Josh Boyer
2013-03-22 20:09 ` Konrad Rzeszutek Wilk
0 siblings, 1 reply; 10+ messages in thread
From: Josh Boyer @ 2013-03-21 0:08 UTC (permalink / raw)
To: Boris Ostrovsky
Cc: mingo, konrad.wilk, tglx, rostedt, kraman, gregkh, stable, bp,
samu.kallio, xen-devel, linux-kernel, hpa

On Wed, Mar 20, 2013 at 06:53:55AM -0700, Boris Ostrovsky wrote:
>
> ----- jwboyer@redhat.com wrote:
>
> > On Wed, Mar 13, 2013 at 09:25:44AM -0400, Boris Ostrovsky wrote:
> > > On 03/01/2013 07:14 AM, Josh Boyer wrote:
> > > >On Thu, Feb 28, 2013 at 04:52:20PM -0800, H. Peter Anvin wrote:
> > > >>On 02/28/2013 04:42 PM, Josh Boyer wrote:
> > > >>>On Fri, Mar 01, 2013 at 01:36:29AM +0100, Borislav Petkov wrote:
> > > >>>>On Thu, Feb 28, 2013 at 04:15:45PM -0800, H. Peter Anvin wrote:
> > > >>>>>>I'll try to get someone to test this tomorrow.
> > > >>>>Btw, you'd need to apply that other patch too
> > > >>>>
> > > >>>>http://marc.info/?l=xen-devel&m=136206183814547&w=2
> > > >>>>
> > > >>>>so that arch_flush_lazy_mmu_mode() has at least one caller on
> > x86_64.
> > > >>>Yeah, we already have that applied. It stops crashes in xen
> > > >>>environments so we pulled it in as a bugfix. Thanks though!
> > > >>>
> > > >>Who are "we"?
> > > >Sorry, Fedora. That patch has a link to a bug in it. We applied
> > the
> > > >patch for that bug. I'll apply Boris' patch on top and get the
> > same
> > > >people to test it.
> > >
> > > Josh, have you had a chance to test this?
> >
> > I've tested it on bare metal for a while now. No problems noticed at
> > all. I've not heard back from Krishna who was testing it in the Xen
> > environment. Krishna?
>
>
> Any updates?

No. I've still not heard from Krishna.

At this point I've tested it on bare metal quite a bit, and Konrad has
tested it on both bare metal and Xen. That should already cover the
case Krishna was going to test anyway. I suggest we move forward and
take the patch.

josh
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] mm/x86: Patch out arch_flush_lazy_mmu_mode() when running on bare metal
2013-03-21 0:08 ` Josh Boyer
@ 2013-03-22 20:09 ` Konrad Rzeszutek Wilk
2013-03-22 20:25 ` H. Peter Anvin
0 siblings, 1 reply; 10+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-03-22 20:09 UTC (permalink / raw)
To: Josh Boyer, hpa
Cc: Boris Ostrovsky, mingo, tglx, rostedt, kraman, gregkh, stable,
bp, samu.kallio, xen-devel, linux-kernel, hpa

On Wed, Mar 20, 2013 at 08:08:45PM -0400, Josh Boyer wrote:
> On Wed, Mar 20, 2013 at 06:53:55AM -0700, Boris Ostrovsky wrote:
> >
> > ----- jwboyer@redhat.com wrote:
> >
> > > On Wed, Mar 13, 2013 at 09:25:44AM -0400, Boris Ostrovsky wrote:
> > > > On 03/01/2013 07:14 AM, Josh Boyer wrote:
> > > > >On Thu, Feb 28, 2013 at 04:52:20PM -0800, H. Peter Anvin wrote:
> > > > >>On 02/28/2013 04:42 PM, Josh Boyer wrote:
> > > > >>>On Fri, Mar 01, 2013 at 01:36:29AM +0100, Borislav Petkov wrote:
> > > > >>>>On Thu, Feb 28, 2013 at 04:15:45PM -0800, H. Peter Anvin wrote:
> > > > >>>>>>I'll try to get someone to test this tomorrow.
> > > > >>>>Btw, you'd need to apply that other patch too
> > > > >>>>
> > > > >>>>http://marc.info/?l=xen-devel&m=136206183814547&w=2
> > > > >>>>
> > > > >>>>so that arch_flush_lazy_mmu_mode() has at least one caller on
> > > x86_64.
> > > > >>>Yeah, we already have that applied. It stops crashes in xen
> > > > >>>environments so we pulled it in as a bugfix. Thanks though!
> > > > >>>
> > > > >>Who are "we"?
> > > > >Sorry, Fedora. That patch has a link to a bug in it. We applied
> > > the
> > > > >patch for that bug. I'll apply Boris' patch on top and get the
> > > same
> > > > >people to test it.
> > > >
> > > > Josh, have you had a chance to test this?
> > >
> > > I've tested it on bare metal for a while now. No problems noticed at
> > > all. I've not heard back from Krishna who was testing it in the Xen
> > > environment. Krishna?
> >
> >
> > Any updates?
>
> No. I've still not heard from Krishna.
>
> At this point I've tested it on bare metal quite a bit, and Konrad has
> tested it on both bare metal and Xen. That should already cover the
> case Krishna was going to test anyway. I suggest we move forward and
> take the patch.

Peter?

Would you like me or Boris to clean up the two patches with the
appropriate Acks and send them to you?
>
> josh
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] mm/x86: Patch out arch_flush_lazy_mmu_mode() when running on bare metal
2013-03-22 20:09 ` Konrad Rzeszutek Wilk
@ 2013-03-22 20:25 ` H. Peter Anvin
2013-03-23 13:36 ` [PATCH 1/2] x86: mm: Fix vmalloc_fault oops during lazy MMU updates Konrad Rzeszutek Wilk
0 siblings, 1 reply; 10+ messages in thread
From: H. Peter Anvin @ 2013-03-22 20:25 UTC (permalink / raw)
To: Konrad Rzeszutek Wilk, Josh Boyer
Cc: Boris Ostrovsky, mingo, tglx, rostedt, kraman, gregkh, stable,
bp, samu.kallio, xen-devel, linux-kernel

Sure.

Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:

>On Wed, Mar 20, 2013 at 08:08:45PM -0400, Josh Boyer wrote:
>> On Wed, Mar 20, 2013 at 06:53:55AM -0700, Boris Ostrovsky wrote:
>> >
>> > ----- jwboyer@redhat.com wrote:
>> >
>> > > On Wed, Mar 13, 2013 at 09:25:44AM -0400, Boris Ostrovsky wrote:
>> > > > On 03/01/2013 07:14 AM, Josh Boyer wrote:
>> > > > >On Thu, Feb 28, 2013 at 04:52:20PM -0800, H. Peter Anvin
>wrote:
>> > > > >>On 02/28/2013 04:42 PM, Josh Boyer wrote:
>> > > > >>>On Fri, Mar 01, 2013 at 01:36:29AM +0100, Borislav Petkov
>wrote:
>> > > > >>>>On Thu, Feb 28, 2013 at 04:15:45PM -0800, H. Peter Anvin
>wrote:
>> > > > >>>>>>I'll try to get someone to test this tomorrow.
>> > > > >>>>Btw, you'd need to apply that other patch too
>> > > > >>>>
>> > > > >>>>http://marc.info/?l=xen-devel&m=136206183814547&w=2
>> > > > >>>>
>> > > > >>>>so that arch_flush_lazy_mmu_mode() has at least one caller
>on
>> > > x86_64.
>> > > > >>>Yeah, we already have that applied. It stops crashes in xen
>> > > > >>>environments so we pulled it in as a bugfix. Thanks though!
>> > > > >>>
>> > > > >>Who are "we"?
>> > > > >Sorry, Fedora. That patch has a link to a bug in it. We
>applied
>> > > the
>> > > > >patch for that bug. I'll apply Boris' patch on top and get
>the
>> > > same
>> > > > >people to test it.
>> > > >
>> > > > Josh, have you had a chance to test this?
>> > >
>> > > I've tested it on bare metal for a while now. No problems
>noticed at
>> > > all. I've not heard back from Krishna who was testing it in the
>Xen
>> > > environment. Krishna?
>> >
>> >
>> > Any updates?
>>
>> No. I've still not heard from Krishna.
>>
>> At this point I've tested it on bare metal quite a bit, and Konrad
>has
>> tested it on both bare metal and Xen. That should already cover the
>> case Krishna was going to test anyway. I suggest we move forward and
>> take the patch.
>
>Peter?
>
>Would you like me or Boris to clean up the two patches with the
>appropriate Acks and send them to you?
>>
>> josh

-- 
Sent from my mobile phone. Please excuse brevity and lack of formatting.
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 1/2] x86: mm: Fix vmalloc_fault oops during lazy MMU updates.
2013-03-22 20:25 ` H. Peter Anvin
@ 2013-03-23 13:36 ` Konrad Rzeszutek Wilk
2013-03-23 13:36 ` [PATCH 2/2] mm/x86: Patch out arch_flush_lazy_mmu_mode() when running on bare metal Konrad Rzeszutek Wilk
2013-04-11 0:29 ` [tip:x86/urgent] x86, mm, paravirt: Fix vmalloc_fault oops during lazy MMU updates tip-bot for Samu Kallio
0 siblings, 2 replies; 10+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-03-23 13:36 UTC (permalink / raw)
To: xen-devel, linux-kernel, hpa
Cc: mingo, tglx, rostedt, kraman, gregkh, bp, samu.kallio, stable,
Konrad Rzeszutek Wilk

From: Samu Kallio <samu.kallio@aberdeencloud.com>

In paravirtualized x86_64 kernels, vmalloc_fault may cause an oops
when lazy MMU updates are enabled, because set_pgd effects are being
deferred.

One instance of this problem is during process mm cleanup with memory
cgroups enabled. The chain of events is as follows:

- zap_pte_range enables lazy MMU updates
- zap_pte_range eventually calls mem_cgroup_charge_statistics,
  which accesses the vmalloc'd mem_cgroup per-cpu stat area
- vmalloc_fault is triggered which tries to sync the corresponding
  PGD entry with set_pgd, but the update is deferred
- vmalloc_fault oopses due to a mismatch in the PUD entries

The oops usually looks like this:

------------[ cut here ]------------
kernel BUG at arch/x86/mm/fault.c:396!
invalid opcode: 0000 [#1] SMP
.. snip ..
CPU 1
Pid: 10866, comm: httpd Not tainted 3.6.10-4.fc18.x86_64 #1
RIP: e030:[<ffffffff816271bf>] [<ffffffff816271bf>] vmalloc_fault+0x11f/0x208
.. snip ..
Call Trace:
 [<ffffffff81627759>] do_page_fault+0x399/0x4b0
 [<ffffffff81004f4c>] ? xen_mc_extend_args+0xec/0x110
 [<ffffffff81624065>] page_fault+0x25/0x30
 [<ffffffff81184d03>] ? mem_cgroup_charge_statistics.isra.13+0x13/0x50
 [<ffffffff81186f78>] __mem_cgroup_uncharge_common+0xd8/0x350
 [<ffffffff8118aac7>] mem_cgroup_uncharge_page+0x57/0x60
 [<ffffffff8115fbc0>] page_remove_rmap+0xe0/0x150
 [<ffffffff8115311a>] ? vm_normal_page+0x1a/0x80
 [<ffffffff81153e61>] unmap_single_vma+0x531/0x870
 [<ffffffff81154962>] unmap_vmas+0x52/0xa0
 [<ffffffff81007442>] ? pte_mfn_to_pfn+0x72/0x100
 [<ffffffff8115c8f8>] exit_mmap+0x98/0x170
 [<ffffffff810050d9>] ? __raw_callee_save_xen_pmd_val+0x11/0x1e
 [<ffffffff81059ce3>] mmput+0x83/0xf0
 [<ffffffff810624c4>] exit_mm+0x104/0x130
 [<ffffffff8106264a>] do_exit+0x15a/0x8c0
 [<ffffffff810630ff>] do_group_exit+0x3f/0xa0
 [<ffffffff81063177>] sys_exit_group+0x17/0x20
 [<ffffffff8162bae9>] system_call_fastpath+0x16/0x1b

Calling arch_flush_lazy_mmu_mode immediately after set_pgd makes the
changes visible to the consistency checks.

CC: stable@vger.kernel.org
RedHat-Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=914737
Tested-by: Josh Boyer <jwboyer@redhat.com>
Reported-and-Tested-by: Krishna Raman <kraman@redhat.com>
Signed-off-by: Samu Kallio <samu.kallio@aberdeencloud.com>
Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/mm/fault.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 2b97525..0e88336 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -378,10 +378,12 @@ static noinline __kprobes int vmalloc_fault(unsigned long address)
 	if (pgd_none(*pgd_ref))
 		return -1;

-	if (pgd_none(*pgd))
+	if (pgd_none(*pgd)) {
 		set_pgd(pgd, *pgd_ref);
-	else
+		arch_flush_lazy_mmu_mode();
+	} else {
 		BUG_ON(pgd_page_vaddr(*pgd) != pgd_page_vaddr(*pgd_ref));
+	}

 	/*
 	 * Below here mismatches are bugs because these lower tables
-- 
1.8.0.2
^ permalink raw reply related [flat|nested] 10+ messages in thread
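The chain of events described in the patch above can be modeled in user space. What follows is only a minimal sketch under stated assumptions, not kernel code: every model_* name is hypothetical, the small write queue stands in for the batched page-table updates of a paravirtualized guest, and the flush_after_set flag stands in for the arch_flush_lazy_mmu_mode() call that the patch adds after set_pgd().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
#define MODEL_QUEUE_MAX 8

struct model_pending_write {
	unsigned long *slot;
	unsigned long value;
};

static struct model_pending_write model_queue[MODEL_QUEUE_MAX];
static size_t model_queued;
static bool model_lazy;

/* Apply every queued write, as a batched update would. */
static void model_flush(void)
{
	for (size_t i = 0; i < model_queued; i++)
		*model_queue[i].slot = model_queue[i].value;
	model_queued = 0;
}

static void model_enter_lazy(void)
{
	model_lazy = true;
}

static void model_leave_lazy(void)
{
	model_flush();
	model_lazy = false;
}

/* In lazy mode the write is only queued; otherwise it lands at once. */
static void model_set_entry(unsigned long *slot, unsigned long value)
{
	if (model_lazy && model_queued < MODEL_QUEUE_MAX) {
		model_queue[model_queued].slot = slot;
		model_queue[model_queued].value = value;
		model_queued++;
		return;
	}
	*slot = value;
}

/*
 * Sync an empty entry from a reference value, then re-check it.
 * Returns true when the entry matches the reference afterwards.
 */
static bool model_sync_entry(unsigned long *slot, unsigned long ref,
			     bool flush_after_set)
{
	if (*slot == 0) {
		model_set_entry(slot, ref);
		if (flush_after_set && model_lazy)
			model_flush();
	}
	return *slot == ref;
}
```

Calling model_sync_entry() with flush_after_set set to false while lazy mode is active returns false, because the follow-up check reads the entry before the queued write has landed. That is analogous to the mismatch the commit message describes. With flush_after_set set to true the check passes.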
* [PATCH 2/2] mm/x86: Patch out arch_flush_lazy_mmu_mode() when running on bare metal
2013-03-23 13:36 ` [PATCH 1/2] x86: mm: Fix vmalloc_fault oops during lazy MMU updates Konrad Rzeszutek Wilk
@ 2013-03-23 13:36 ` Konrad Rzeszutek Wilk
2013-04-03 13:26 ` Boris Ostrovsky
2013-04-11 0:30 ` [tip:x86/urgent] x86, mm: " tip-bot for Boris Ostrovsky
2013-04-11 0:29 ` [tip:x86/urgent] x86, mm, paravirt: Fix vmalloc_fault oops during lazy MMU updates tip-bot for Samu Kallio
1 sibling, 2 replies; 10+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-03-23 13:36 UTC (permalink / raw)
To: xen-devel, linux-kernel, hpa
Cc: mingo, tglx, rostedt, kraman, gregkh, bp, samu.kallio,
Boris Ostrovsky, Konrad Rzeszutek Wilk

From: Boris Ostrovsky <boris.ostrovsky@oracle.com>

Invoking arch_flush_lazy_mmu_mode() results in calls to
preempt_enable()/disable() which may have a performance impact.

Since lazy MMU is not used on bare metal we can patch away
arch_flush_lazy_mmu_mode() so that it is never called in such an
environment.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Tested-by: Josh Boyer <jwboyer@redhat.com>
Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/include/asm/paravirt.h       |  5 ++++-
 arch/x86/include/asm/paravirt_types.h |  2 ++
 arch/x86/kernel/paravirt.c            | 25 +++++++++++++------------
 arch/x86/lguest/boot.c                |  1 +
 arch/x86/xen/mmu.c                    |  1 +
 5 files changed, 21 insertions(+), 13 deletions(-)

diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 5edd174..7361e47 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -703,7 +703,10 @@ static inline void arch_leave_lazy_mmu_mode(void)
 	PVOP_VCALL0(pv_mmu_ops.lazy_mode.leave);
 }

-void arch_flush_lazy_mmu_mode(void);
+static inline void arch_flush_lazy_mmu_mode(void)
+{
+	PVOP_VCALL0(pv_mmu_ops.lazy_mode.flush);
+}

 static inline void __set_fixmap(unsigned /* enum fixed_addresses */ idx,
 				phys_addr_t phys, pgprot_t flags)
diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 142236e..b3b0ec1 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -91,6 +91,7 @@ struct pv_lazy_ops {
 	/* Set deferred update mode, used for batching operations. */
 	void (*enter)(void);
 	void (*leave)(void);
+	void (*flush)(void);
 };

 struct pv_time_ops {
@@ -679,6 +680,7 @@ void paravirt_end_context_switch(struct task_struct *next);

 void paravirt_enter_lazy_mmu(void);
 void paravirt_leave_lazy_mmu(void);
+void paravirt_flush_lazy_mmu(void);

 void _paravirt_nop(void);
 u32 _paravirt_ident_32(u32);
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index 17fff18..8bfb335 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -263,6 +263,18 @@ void paravirt_leave_lazy_mmu(void)
 	leave_lazy(PARAVIRT_LAZY_MMU);
 }

+void paravirt_flush_lazy_mmu(void)
+{
+	preempt_disable();
+
+	if (paravirt_get_lazy_mode() == PARAVIRT_LAZY_MMU) {
+		arch_leave_lazy_mmu_mode();
+		arch_enter_lazy_mmu_mode();
+	}
+
+	preempt_enable();
+}
+
 void paravirt_start_context_switch(struct task_struct *prev)
 {
 	BUG_ON(preemptible());
@@ -292,18 +304,6 @@ enum paravirt_lazy_mode paravirt_get_lazy_mode(void)
 	return this_cpu_read(paravirt_lazy_mode);
 }

-void arch_flush_lazy_mmu_mode(void)
-{
-	preempt_disable();
-
-	if (paravirt_get_lazy_mode() == PARAVIRT_LAZY_MMU) {
-		arch_leave_lazy_mmu_mode();
-		arch_enter_lazy_mmu_mode();
-	}
-
-	preempt_enable();
-}
-
 struct pv_info pv_info = {
 	.name = "bare hardware",
 	.paravirt_enabled = 0,
@@ -475,6 +475,7 @@ struct pv_mmu_ops pv_mmu_ops = {
 	.lazy_mode = {
 		.enter = paravirt_nop,
 		.leave = paravirt_nop,
+		.flush = paravirt_nop,
 	},

 	.set_fixmap = native_set_fixmap,
diff --git a/arch/x86/lguest/boot.c b/arch/x86/lguest/boot.c
index 1cbd89c..7114c63 100644
--- a/arch/x86/lguest/boot.c
+++ b/arch/x86/lguest/boot.c
@@ -1334,6 +1334,7 @@ __init void lguest_init(void)
 	pv_mmu_ops.read_cr3 = lguest_read_cr3;
 	pv_mmu_ops.lazy_mode.enter = paravirt_enter_lazy_mmu;
 	pv_mmu_ops.lazy_mode.leave = lguest_leave_lazy_mmu_mode;
+	pv_mmu_ops.lazy_mode.flush = paravirt_flush_lazy_mmu;
 	pv_mmu_ops.pte_update = lguest_pte_update;
 	pv_mmu_ops.pte_update_defer = lguest_pte_update;

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index e8e3493..f4f4105 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -2197,6 +2197,7 @@ static const struct pv_mmu_ops xen_mmu_ops __initconst = {
 	.lazy_mode = {
 		.enter = paravirt_enter_lazy_mmu,
 		.leave = xen_leave_lazy_mmu,
+		.flush = paravirt_flush_lazy_mmu,
 	},

 	.set_fixmap = xen_set_fixmap,
-- 
1.8.0.2
^ permalink raw reply related [flat|nested] 10+ messages in thread
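The design of the patch above, routing the flush through an ops table whose native entry is a no-op, can be sketched in plain C. This is an illustration under assumptions rather than the kernel implementation: all demo_* names are hypothetical, and an ordinary function pointer stands in for the PVOP_VCALL0() call site. Going by the patch title, the kernel patches that call site out on bare metal, whereas this model still pays for an indirect call.

```c
#include <assert.h>

/* Hypothetical stand-in for struct pv_lazy_ops after the patch. */
struct demo_lazy_ops {
	void (*enter)(void);
	void (*leave)(void);
	void (*flush)(void);
};

static int demo_mode;       /* 1 while the modeled lazy mode is active */
static int demo_flush_work; /* counts flushes that did real work */

static void demo_nop(void)
{
}

static void demo_enter(void)
{
	demo_mode = 1;
}

static void demo_leave(void)
{
	demo_mode = 0;
}

/* Same shape as paravirt_flush_lazy_mmu(): leave, then re-enter. */
static void demo_flush(void)
{
	if (demo_mode) {
		demo_leave();
		demo_enter();
		demo_flush_work++;
	}
}

/* Bare-metal table: every lazy-mode hook does nothing. */
static const struct demo_lazy_ops demo_native_ops = {
	.enter = demo_nop,
	.leave = demo_nop,
	.flush = demo_nop,
};

/* Guest table: the flush hook does the leave/enter cycle. */
static const struct demo_lazy_ops demo_guest_ops = {
	.enter = demo_enter,
	.leave = demo_leave,
	.flush = demo_flush,
};

/* Callers no longer care which environment they run in. */
static void demo_arch_flush(const struct demo_lazy_ops *ops)
{
	ops->flush();
}
```

The point of the split is that a caller such as vmalloc_fault() stays the same in both environments, and only the guest table carries the cost of the leave/enter cycle.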
* Re: [PATCH 2/2] mm/x86: Patch out arch_flush_lazy_mmu_mode() when running on bare metal
2013-03-23 13:36 ` [PATCH 2/2] mm/x86: Patch out arch_flush_lazy_mmu_mode() when running on bare metal Konrad Rzeszutek Wilk
@ 2013-04-03 13:26 ` Boris Ostrovsky
2013-04-11 0:30 ` [tip:x86/urgent] x86, mm: " tip-bot for Boris Ostrovsky
1 sibling, 0 replies; 10+ messages in thread
From: Boris Ostrovsky @ 2013-04-03 13:26 UTC (permalink / raw)
To: hpa
Cc: Konrad Rzeszutek Wilk, xen-devel, linux-kernel, mingo, tglx,
rostedt, kraman, gregkh, bp, samu.kallio

On 03/23/2013 09:36 AM, Konrad Rzeszutek Wilk wrote:
> From: Boris Ostrovsky <boris.ostrovsky@oracle.com>
>
> Invoking arch_flush_lazy_mmu_mode() results in calls to
> preempt_enable()/disable() which may have a performance impact.
>
> Since lazy MMU is not used on bare metal we can patch away
> arch_flush_lazy_mmu_mode() so that it is never called in such an
> environment.
>
> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> Tested-by: Josh Boyer <jwboyer@redhat.com>
> Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> Acked-by: Borislav Petkov <bp@suse.de>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Peter, what's the status of these two patches? They are not going into
3.9, right?

Thanks.
-boris

> ---
>  arch/x86/include/asm/paravirt.h       |  5 ++++-
>  arch/x86/include/asm/paravirt_types.h |  2 ++
>  arch/x86/kernel/paravirt.c            | 25 +++++++++++++------------
>  arch/x86/lguest/boot.c                |  1 +
>  arch/x86/xen/mmu.c                    |  1 +
>  5 files changed, 21 insertions(+), 13 deletions(-)
>
> diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
> index 5edd174..7361e47 100644
> --- a/arch/x86/include/asm/paravirt.h
> +++ b/arch/x86/include/asm/paravirt.h
> @@ -703,7 +703,10 @@ static inline void arch_leave_lazy_mmu_mode(void)
>  	PVOP_VCALL0(pv_mmu_ops.lazy_mode.leave);
>  }
>
> -void arch_flush_lazy_mmu_mode(void);
> +static inline void arch_flush_lazy_mmu_mode(void)
> +{
> +	PVOP_VCALL0(pv_mmu_ops.lazy_mode.flush);
> +}
>
>  static inline void __set_fixmap(unsigned /* enum fixed_addresses */ idx,
>  				phys_addr_t phys, pgprot_t flags)
> diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
> index 142236e..b3b0ec1 100644
> --- a/arch/x86/include/asm/paravirt_types.h
> +++ b/arch/x86/include/asm/paravirt_types.h
> @@ -91,6 +91,7 @@ struct pv_lazy_ops {
>  	/* Set deferred update mode, used for batching operations. */
>  	void (*enter)(void);
>  	void (*leave)(void);
> +	void (*flush)(void);
>  };
>
>  struct pv_time_ops {
> @@ -679,6 +680,7 @@ void paravirt_end_context_switch(struct task_struct *next);
>
>  void paravirt_enter_lazy_mmu(void);
>  void paravirt_leave_lazy_mmu(void);
> +void paravirt_flush_lazy_mmu(void);
>
>  void _paravirt_nop(void);
>  u32 _paravirt_ident_32(u32);
> diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
> index 17fff18..8bfb335 100644
> --- a/arch/x86/kernel/paravirt.c
> +++ b/arch/x86/kernel/paravirt.c
> @@ -263,6 +263,18 @@ void paravirt_leave_lazy_mmu(void)
>  	leave_lazy(PARAVIRT_LAZY_MMU);
>  }
>
> +void paravirt_flush_lazy_mmu(void)
> +{
> +	preempt_disable();
> +
> +	if (paravirt_get_lazy_mode() == PARAVIRT_LAZY_MMU) {
> +		arch_leave_lazy_mmu_mode();
> +		arch_enter_lazy_mmu_mode();
> +	}
> +
> +	preempt_enable();
> +}
> +
>  void paravirt_start_context_switch(struct task_struct *prev)
>  {
>  	BUG_ON(preemptible());
> @@ -292,18 +304,6 @@ enum paravirt_lazy_mode paravirt_get_lazy_mode(void)
>  	return this_cpu_read(paravirt_lazy_mode);
>  }
>
> -void arch_flush_lazy_mmu_mode(void)
> -{
> -	preempt_disable();
> -
> -	if (paravirt_get_lazy_mode() == PARAVIRT_LAZY_MMU) {
> -		arch_leave_lazy_mmu_mode();
> -		arch_enter_lazy_mmu_mode();
> -	}
> -
> -	preempt_enable();
> -}
> -
>  struct pv_info pv_info = {
>  	.name = "bare hardware",
>  	.paravirt_enabled = 0,
> @@ -475,6 +475,7 @@ struct pv_mmu_ops pv_mmu_ops = {
>  	.lazy_mode = {
>  		.enter = paravirt_nop,
>  		.leave = paravirt_nop,
> +		.flush = paravirt_nop,
>  	},
>
>  	.set_fixmap = native_set_fixmap,
> diff --git a/arch/x86/lguest/boot.c b/arch/x86/lguest/boot.c
> index 1cbd89c..7114c63 100644
> --- a/arch/x86/lguest/boot.c
> +++ b/arch/x86/lguest/boot.c
> @@ -1334,6 +1334,7 @@ __init void lguest_init(void)
>  	pv_mmu_ops.read_cr3 = lguest_read_cr3;
>  	pv_mmu_ops.lazy_mode.enter = paravirt_enter_lazy_mmu;
>  	pv_mmu_ops.lazy_mode.leave = lguest_leave_lazy_mmu_mode;
> +	pv_mmu_ops.lazy_mode.flush = paravirt_flush_lazy_mmu;
>  	pv_mmu_ops.pte_update = lguest_pte_update;
>  	pv_mmu_ops.pte_update_defer = lguest_pte_update;
>
> diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
> index e8e3493..f4f4105 100644
> --- a/arch/x86/xen/mmu.c
> +++ b/arch/x86/xen/mmu.c
> @@ -2197,6 +2197,7 @@ static const struct pv_mmu_ops xen_mmu_ops __initconst = {
>  	.lazy_mode = {
>  		.enter = paravirt_enter_lazy_mmu,
>  		.leave = xen_leave_lazy_mmu,
> +		.flush = paravirt_flush_lazy_mmu,
>  	},
>
>  	.set_fixmap = xen_set_fixmap,
^ permalink raw reply [flat|nested] 10+ messages in thread
* [tip:x86/urgent] x86, mm: Patch out arch_flush_lazy_mmu_mode() when running on bare metal
2013-03-23 13:36 ` [PATCH 2/2] mm/x86: Patch out arch_flush_lazy_mmu_mode() when running on bare metal Konrad Rzeszutek Wilk
2013-04-03 13:26 ` Boris Ostrovsky
@ 2013-04-11 0:30 ` tip-bot for Boris Ostrovsky
2013-04-11 15:17 ` Boris Ostrovsky
1 sibling, 1 reply; 10+ messages in thread
From: tip-bot for Boris Ostrovsky @ 2013-04-11 0:30 UTC (permalink / raw)
To: linux-tip-commits
Cc: linux-kernel, hpa, mingo, konrad.wilk, boris.ostrovsky, jwboyer,
tglx, hpa, bp

Commit-ID: 511ba86e1d386f671084b5d0e6f110bb30b8eeb2
Gitweb: http://git.kernel.org/tip/511ba86e1d386f671084b5d0e6f110bb30b8eeb2
Author: Boris Ostrovsky <boris.ostrovsky@oracle.com>
AuthorDate: Sat, 23 Mar 2013 09:36:36 -0400
Committer: H. Peter Anvin <hpa@linux.intel.com>
CommitDate: Wed, 10 Apr 2013 11:25:10 -0700

x86, mm: Patch out arch_flush_lazy_mmu_mode() when running on bare metal

Invoking arch_flush_lazy_mmu_mode() results in calls to
preempt_enable()/disable() which may have a performance impact.

Since lazy MMU is not used on bare metal we can patch away
arch_flush_lazy_mmu_mode() so that it is never called in such an
environment.

[ hpa: the previous patch "Fix vmalloc_fault oops during lazy MMU
  updates" may cause a minor performance regression on
  bare metal. This patch resolves that performance regression. It is
  somewhat unclear to me if this is a good -stable candidate. ]

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Link: http://lkml.kernel.org/r/1364045796-10720-2-git-send-email-konrad.wilk@oracle.com
Tested-by: Josh Boyer <jwboyer@redhat.com>
Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: <stable@vger.kernel.org> SEE NOTE ABOVE
---
 arch/x86/include/asm/paravirt.h       |  5 ++++-
 arch/x86/include/asm/paravirt_types.h |  2 ++
 arch/x86/kernel/paravirt.c            | 25 +++++++++++++------------
 arch/x86/lguest/boot.c                |  1 +
 arch/x86/xen/mmu.c                    |  1 +
 5 files changed, 21 insertions(+), 13 deletions(-)

diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 5edd174..7361e47 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -703,7 +703,10 @@ static inline void arch_leave_lazy_mmu_mode(void)
 	PVOP_VCALL0(pv_mmu_ops.lazy_mode.leave);
 }

-void arch_flush_lazy_mmu_mode(void);
+static inline void arch_flush_lazy_mmu_mode(void)
+{
+	PVOP_VCALL0(pv_mmu_ops.lazy_mode.flush);
+}

 static inline void __set_fixmap(unsigned /* enum fixed_addresses */ idx,
 				phys_addr_t phys, pgprot_t flags)
diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 142236e..b3b0ec1 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -91,6 +91,7 @@ struct pv_lazy_ops {
 	/* Set deferred update mode, used for batching operations. */
 	void (*enter)(void);
 	void (*leave)(void);
+	void (*flush)(void);
 };

 struct pv_time_ops {
@@ -679,6 +680,7 @@ void paravirt_end_context_switch(struct task_struct *next);

 void paravirt_enter_lazy_mmu(void);
 void paravirt_leave_lazy_mmu(void);
+void paravirt_flush_lazy_mmu(void);

 void _paravirt_nop(void);
 u32 _paravirt_ident_32(u32);
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index 17fff18..8bfb335 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -263,6 +263,18 @@ void paravirt_leave_lazy_mmu(void)
 	leave_lazy(PARAVIRT_LAZY_MMU);
 }

+void paravirt_flush_lazy_mmu(void)
+{
+	preempt_disable();
+
+	if (paravirt_get_lazy_mode() == PARAVIRT_LAZY_MMU) {
+		arch_leave_lazy_mmu_mode();
+		arch_enter_lazy_mmu_mode();
+	}
+
+	preempt_enable();
+}
+
 void paravirt_start_context_switch(struct task_struct *prev)
 {
 	BUG_ON(preemptible());
@@ -292,18 +304,6 @@ enum paravirt_lazy_mode paravirt_get_lazy_mode(void)
 	return this_cpu_read(paravirt_lazy_mode);
 }

-void arch_flush_lazy_mmu_mode(void)
-{
-	preempt_disable();
-
-	if (paravirt_get_lazy_mode() == PARAVIRT_LAZY_MMU) {
-		arch_leave_lazy_mmu_mode();
-		arch_enter_lazy_mmu_mode();
-	}
-
-	preempt_enable();
-}
-
 struct pv_info pv_info = {
 	.name = "bare hardware",
 	.paravirt_enabled = 0,
@@ -475,6 +475,7 @@ struct pv_mmu_ops pv_mmu_ops = {
 	.lazy_mode = {
 		.enter = paravirt_nop,
 		.leave = paravirt_nop,
+		.flush = paravirt_nop,
 	},

 	.set_fixmap = native_set_fixmap,
diff --git a/arch/x86/lguest/boot.c b/arch/x86/lguest/boot.c
index 1cbd89c..7114c63 100644
--- a/arch/x86/lguest/boot.c
+++ b/arch/x86/lguest/boot.c
@@ -1334,6 +1334,7 @@ __init void lguest_init(void)
 	pv_mmu_ops.read_cr3 = lguest_read_cr3;
 	pv_mmu_ops.lazy_mode.enter = paravirt_enter_lazy_mmu;
 	pv_mmu_ops.lazy_mode.leave = lguest_leave_lazy_mmu_mode;
+	pv_mmu_ops.lazy_mode.flush = paravirt_flush_lazy_mmu;
 	pv_mmu_ops.pte_update = lguest_pte_update;
 	pv_mmu_ops.pte_update_defer = lguest_pte_update;

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 6afbb2c..2f5d687 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -2196,6 +2196,7 @@ static const struct pv_mmu_ops xen_mmu_ops __initconst = {
 	.lazy_mode = {
 		.enter = paravirt_enter_lazy_mmu,
 		.leave = xen_leave_lazy_mmu,
+		.flush = paravirt_flush_lazy_mmu,
 	},

 	.set_fixmap = xen_set_fixmap,
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [tip:x86/urgent] x86, mm: Patch out arch_flush_lazy_mmu_mode() when running on bare metal
2013-04-11 0:30 ` [tip:x86/urgent] x86, mm: " tip-bot for Boris Ostrovsky
@ 2013-04-11 15:17 ` Boris Ostrovsky
0 siblings, 0 replies; 10+ messages in thread
From: Boris Ostrovsky @ 2013-04-11 15:17 UTC (permalink / raw)
To: hpa; +Cc: mingo, linux-kernel, konrad.wilk, boris.ostrovsky, jwboyer, tglx, bp

On 04/10/2013 08:30 PM, tip-bot for Boris Ostrovsky wrote:
> Commit-ID: 511ba86e1d386f671084b5d0e6f110bb30b8eeb2
> Gitweb: http://git.kernel.org/tip/511ba86e1d386f671084b5d0e6f110bb30b8eeb2
> Author: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> AuthorDate: Sat, 23 Mar 2013 09:36:36 -0400
> Committer: H. Peter Anvin <hpa@linux.intel.com>
> CommitDate: Wed, 10 Apr 2013 11:25:10 -0700
>
> x86, mm: Patch out arch_flush_lazy_mmu_mode() when running on bare metal
>
> Invoking arch_flush_lazy_mmu_mode() results in calls to
> preempt_enable()/disable() which may have a performance impact.
>
> Since lazy MMU is not used on bare metal we can patch away
> arch_flush_lazy_mmu_mode() so that it is never called in such an
> environment.
>
> [ hpa: the previous patch "Fix vmalloc_fault oops during lazy MMU
> updates" may cause a minor performance regression on
> bare metal. This patch resolves that performance regression. It is
> somewhat unclear to me if this is a good -stable candidate. ]

I think this patch, https://lkml.org/lkml/2013/2/26/420, was also part
of the lazy MMU set of patches, but it is missing from the latest batch
of commits.

-boris
^ permalink raw reply [flat|nested] 10+ messages in thread
* [tip:x86/urgent] x86, mm, paravirt: Fix vmalloc_fault oops during lazy MMU updates
2013-03-23 13:36 ` [PATCH 1/2] x86: mm: Fix vmalloc_fault oops during lazy MMU updates Konrad Rzeszutek Wilk
2013-03-23 13:36 ` [PATCH 2/2] mm/x86: Patch out arch_flush_lazy_mmu_mode() when running on bare metal Konrad Rzeszutek Wilk
@ 2013-04-11 0:29 ` tip-bot for Samu Kallio
1 sibling, 0 replies; 10+ messages in thread
From: tip-bot for Samu Kallio @ 2013-04-11 0:29 UTC (permalink / raw)
To: linux-tip-commits
Cc: linux-kernel, hpa, mingo, konrad.wilk, samu.kallio, stable,
jwboyer, tglx, hpa, kraman

Commit-ID: 1160c2779b826c6f5c08e5cc542de58fd1f667d5
Gitweb: http://git.kernel.org/tip/1160c2779b826c6f5c08e5cc542de58fd1f667d5
Author: Samu Kallio <samu.kallio@aberdeencloud.com>
AuthorDate: Sat, 23 Mar 2013 09:36:35 -0400
Committer: H. Peter Anvin <hpa@linux.intel.com>
CommitDate: Wed, 10 Apr 2013 11:25:07 -0700

x86, mm, paravirt: Fix vmalloc_fault oops during lazy MMU updates

In paravirtualized x86_64 kernels, vmalloc_fault may cause an oops
when lazy MMU updates are enabled, because set_pgd effects are being
deferred.

One instance of this problem is during process mm cleanup with memory
cgroups enabled. The chain of events is as follows:

- zap_pte_range enables lazy MMU updates
- zap_pte_range eventually calls mem_cgroup_charge_statistics,
  which accesses the vmalloc'd mem_cgroup per-cpu stat area
- vmalloc_fault is triggered which tries to sync the corresponding
  PGD entry with set_pgd, but the update is deferred
- vmalloc_fault oopses due to a mismatch in the PUD entries

The oops usually looks like this:

------------[ cut here ]------------
kernel BUG at arch/x86/mm/fault.c:396!
invalid opcode: 0000 [#1] SMP
.. snip ..
CPU 1
Pid: 10866, comm: httpd Not tainted 3.6.10-4.fc18.x86_64 #1
RIP: e030:[<ffffffff816271bf>] [<ffffffff816271bf>] vmalloc_fault+0x11f/0x208
.. snip ..
Call Trace:
 [<ffffffff81627759>] do_page_fault+0x399/0x4b0
 [<ffffffff81004f4c>] ? xen_mc_extend_args+0xec/0x110
 [<ffffffff81624065>] page_fault+0x25/0x30
 [<ffffffff81184d03>] ? mem_cgroup_charge_statistics.isra.13+0x13/0x50
 [<ffffffff81186f78>] __mem_cgroup_uncharge_common+0xd8/0x350
 [<ffffffff8118aac7>] mem_cgroup_uncharge_page+0x57/0x60
 [<ffffffff8115fbc0>] page_remove_rmap+0xe0/0x150
 [<ffffffff8115311a>] ? vm_normal_page+0x1a/0x80
 [<ffffffff81153e61>] unmap_single_vma+0x531/0x870
 [<ffffffff81154962>] unmap_vmas+0x52/0xa0
 [<ffffffff81007442>] ? pte_mfn_to_pfn+0x72/0x100
 [<ffffffff8115c8f8>] exit_mmap+0x98/0x170
 [<ffffffff810050d9>] ? __raw_callee_save_xen_pmd_val+0x11/0x1e
 [<ffffffff81059ce3>] mmput+0x83/0xf0
 [<ffffffff810624c4>] exit_mm+0x104/0x130
 [<ffffffff8106264a>] do_exit+0x15a/0x8c0
 [<ffffffff810630ff>] do_group_exit+0x3f/0xa0
 [<ffffffff81063177>] sys_exit_group+0x17/0x20
 [<ffffffff8162bae9>] system_call_fastpath+0x16/0x1b

Calling arch_flush_lazy_mmu_mode immediately after set_pgd makes the
changes visible to the consistency checks.

Cc: <stable@vger.kernel.org>
RedHat-Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=914737
Tested-by: Josh Boyer <jwboyer@redhat.com>
Reported-and-Tested-by: Krishna Raman <kraman@redhat.com>
Signed-off-by: Samu Kallio <samu.kallio@aberdeencloud.com>
Link: http://lkml.kernel.org/r/1364045796-10720-1-git-send-email-konrad.wilk@oracle.com
Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
---
 arch/x86/mm/fault.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 2b97525..0e88336 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -378,10 +378,12 @@ static noinline __kprobes int vmalloc_fault(unsigned long address)
 	if (pgd_none(*pgd_ref))
 		return -1;

-	if (pgd_none(*pgd))
+	if (pgd_none(*pgd)) {
 		set_pgd(pgd, *pgd_ref);
-	else
+		arch_flush_lazy_mmu_mode();
+	} else {
 		BUG_ON(pgd_page_vaddr(*pgd) != pgd_page_vaddr(*pgd_ref));
+	}

 	/*
 	 * Below here mismatches are bugs because these lower tables
^ permalink raw reply related [flat|nested] 10+ messages in thread