* vunmap() on large regions may trigger soft lockup warnings
From: David Vrabel @ 2013-12-11 16:58 UTC
To: Andrew Morton
Cc: linux-kernel@vger.kernel.org, Len Brown, Rafael J. Wysocki,
linux-acpi, xen-devel, Dietmar Hahn
Andrew,
Dietmar Hahn reported an issue where calling vunmap() on a large (50 GB)
region would trigger soft lockup warnings.
The following patch would resolve this (by adding a cond_resched() call
to vunmap_pmd_range()). Almost all calls of vunmap() and
unmap_kernel_range() are from process context (as far as I could tell);
the exception is an ACPI driver (drivers/acpi/apei/ghes.c), which calls
unmap_kernel_range_noflush() from interrupt and NMI contexts.
Can you advise on a preferred solution?
For example, an unmap_kernel_page() function (callable from atomic
context) could be provided since the GHES driver only maps/unmaps a
single page.
8<-------------------------
mm/vmalloc: avoid soft lockup warnings when vunmap()'ing large ranges
From: David Vrabel <david.vrabel@citrix.com>
If vunmap() is used to unmap a large (e.g., 50 GB) region, it may take
sufficiently long that it triggers soft lockup warnings.
Add a cond_resched() into vunmap_pmd_range() so the calling task may
be rescheduled after unmapping each PMD entry. This is how
zap_pmd_range() fixes the same problem for userspace mappings.
Reported-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
mm/vmalloc.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 0fdf968..b1b5b39 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -75,6 +75,7 @@ static void vunmap_pmd_range(pud_t *pud, unsigned long addr, unsigned long end)
 		if (pmd_none_or_clear_bad(pmd))
 			continue;
 		vunmap_pte_range(pmd, addr, next);
+		cond_resched();
 	} while (pmd++, addr = next, addr != end);
 }
--
1.7.2.5
* Re: vunmap() on large regions may trigger soft lockup warnings
From: Andrew Morton @ 2013-12-11 21:39 UTC
To: David Vrabel
Cc: linux-kernel@vger.kernel.org, Len Brown, Rafael J. Wysocki,
linux-acpi, xen-devel, Dietmar Hahn
On Wed, 11 Dec 2013 16:58:19 +0000 David Vrabel <david.vrabel@citrix.com> wrote:
> Andrew,
>
> Dietmar Hahn reported an issue where calling vunmap() on a large (50 GB)
> region would trigger soft lockup warnings.
>
> The following patch would resolve this (by adding a cond_resched() call
> to vunmap_pmd_range()). Almost all calls of vunmap() and
> unmap_kernel_range() are from process context (as far as I could tell);
> the exception is an ACPI driver (drivers/acpi/apei/ghes.c), which calls
> unmap_kernel_range_noflush() from interrupt and NMI contexts.
>
> Can you advise on a preferred solution?
>
> For example, an unmap_kernel_page() function (callable from atomic
> context) could be provided since the GHES driver only maps/unmaps a
> single page.
>
> 8<-------------------------
> mm/vmalloc: avoid soft lockup warnings when vunmap()'ing large ranges
>
> From: David Vrabel <david.vrabel@citrix.com>
>
> If vunmap() is used to unmap a large (e.g., 50 GB) region, it may take
> sufficiently long that it triggers soft lockup warnings.
>
> Add a cond_resched() into vunmap_pmd_range() so the calling task may
> be rescheduled after unmapping each PMD entry. This is how
> zap_pmd_range() fixes the same problem for userspace mappings.
>
> ...
>
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -75,6 +75,7 @@ static void vunmap_pmd_range(pud_t *pud, unsigned long
> addr, unsigned long end)
> if (pmd_none_or_clear_bad(pmd))
> continue;
> vunmap_pte_range(pmd, addr, next);
> + cond_resched();
> } while (pmd++, addr = next, addr != end);
> }
Well that's ugly.
We could redo unmap_kernel_range() so it takes an `atomic' flag then
loops around unmapping N MB at a time, doing
if (!atomic)
cond_resched()
each time. But that would require difficult tuning of N.
I suppose we could just do
if (!in_interrupt())
cond_resched();
in vunmap_pmd_range(), but that's pretty specific to ghes.c and doesn't
permit unmap-inside-spinlock.
So I can't immediately think of a suitable fix apart from adding a new
unmap_kernel_range_atomic(). Then add a `bool atomic' arg to
vunmap_page_range() and pass that all the way down.
* Re: vunmap() on large regions may trigger soft lockup warnings
From: David Vrabel @ 2013-12-12 12:50 UTC
To: Andrew Morton
Cc: linux-kernel@vger.kernel.org, Len Brown, Rafael J. Wysocki,
linux-acpi, xen-devel, Dietmar Hahn
On 11/12/13 21:39, Andrew Morton wrote:
> On Wed, 11 Dec 2013 16:58:19 +0000 David Vrabel <david.vrabel@citrix.com> wrote:
>
>> Andrew,
>>
>> Dietmar Hahn reported an issue where calling vunmap() on a large (50 GB)
>> region would trigger soft lockup warnings.
>>
>> The following patch would resolve this (by adding a cond_resched() call
>> to vunmap_pmd_range()). Almost all calls of vunmap() and
>> unmap_kernel_range() are from process context (as far as I could tell);
>> the exception is an ACPI driver (drivers/acpi/apei/ghes.c), which calls
>> unmap_kernel_range_noflush() from interrupt and NMI contexts.
>>
>> Can you advise on a preferred solution?
>>
>> For example, an unmap_kernel_page() function (callable from atomic
>> context) could be provided since the GHES driver only maps/unmaps a
>> single page.
>>
>> 8<-------------------------
>> mm/vmalloc: avoid soft lockup warnings when vunmap()'ing large ranges
>>
>> From: David Vrabel <david.vrabel@citrix.com>
>>
>> If vunmap() is used to unmap a large (e.g., 50 GB) region, it may take
>> sufficiently long that it triggers soft lockup warnings.
>>
>> Add a cond_resched() into vunmap_pmd_range() so the calling task may
>> be rescheduled after unmapping each PMD entry. This is how
>> zap_pmd_range() fixes the same problem for userspace mappings.
>>
>> ...
>>
>> --- a/mm/vmalloc.c
>> +++ b/mm/vmalloc.c
>> @@ -75,6 +75,7 @@ static void vunmap_pmd_range(pud_t *pud, unsigned long
>> addr, unsigned long end)
>> if (pmd_none_or_clear_bad(pmd))
>> continue;
>> vunmap_pte_range(pmd, addr, next);
>> + cond_resched();
>> } while (pmd++, addr = next, addr != end);
>> }
>
> Well that's ugly.
>
> We could redo unmap_kernel_range() so it takes an `atomic' flag then
> loops around unmapping N MB at a time, doing
>
> if (!atomic)
> cond_resched()
>
> each time. But that would require difficult tuning of N.
>
> I suppose we could just do
>
> if (!in_interrupt())
> cond_resched();
>
> in vunmap_pmd_range(), but that's pretty specific to ghes.c and doesn't
> permit unmap-inside-spinlock.
>
> So I can't immediately think of a suitable fix apart from adding a new
> unmap_kernel_range_atomic(). Then add a `bool atomic' arg to
> vunmap_page_range() and pass that all the way down.
That would work for the unmap, but looking at the GHES driver some more,
it looks like its call to ioremap_page_range() is already unsafe -- it
may need to allocate a new PTE page with a non-atomic allocation in
pte_alloc_one_kernel().
Perhaps what's needed here is a pair of ioremap_page_atomic() and
iounmap_page_atomic() calls? With some prep function to ensure the PTE
pages (etc.) are preallocated.
David
* Re: vunmap() on large regions may trigger soft lockup warnings
From: Andrew Morton @ 2013-12-14 8:32 UTC
To: David Vrabel
Cc: linux-kernel@vger.kernel.org, Len Brown, Rafael J. Wysocki,
linux-acpi, xen-devel, Dietmar Hahn
On Thu, 12 Dec 2013 12:50:47 +0000 David Vrabel <david.vrabel@citrix.com> wrote:
> > each time. But that would require difficult tuning of N.
> >
> > I suppose we could just do
> >
> > if (!in_interrupt())
> > cond_resched();
> >
> > in vunmap_pmd_range(), but that's pretty specific to ghes.c and doesn't
> > permit unmap-inside-spinlock.
> >
> > So I can't immediately think of a suitable fix apart from adding a new
> > unmap_kernel_range_atomic(). Then add a `bool atomic' arg to
> > vunmap_page_range() and pass that all the way down.
>
> That would work for the unmap, but looking at the GHES driver some more,
> it looks like its call to ioremap_page_range() is already unsafe -- it
> may need to allocate a new PTE page with a non-atomic allocation in
> pte_alloc_one_kernel().
>
> Perhaps what's needed here is a pair of ioremap_page_atomic() and
> iounmap_page_atomic() calls? With some prep function to ensure the PTE
> pages (etc.) are preallocated.
Is ghes.c the only problem source here? If so then a suitable solution
would be to declare that driver hopelessly busted and proceed as if it
didn't exist :(
Just from a quick look, the thing is doing ioremap() from NMI context!
ioremap has to do a bunch of memory allocations, takes spinlocks etc.
* Re: vunmap() on large regions may trigger soft lockup warnings
From: David Vrabel @ 2013-12-16 12:56 UTC
To: Andrew Morton
Cc: linux-kernel@vger.kernel.org, Len Brown, Rafael J. Wysocki,
linux-acpi, xen-devel, Dietmar Hahn
On 14/12/13 08:32, Andrew Morton wrote:
> On Thu, 12 Dec 2013 12:50:47 +0000 David Vrabel <david.vrabel@citrix.com> wrote:
>
>>> each time. But that would require difficult tuning of N.
>>>
>>> I suppose we could just do
>>>
>>> if (!in_interrupt())
>>> cond_resched();
>>>
>>> in vunmap_pmd_range(), but that's pretty specific to ghes.c and doesn't
>>> permit unmap-inside-spinlock.
>>>
>>> So I can't immediately think of a suitable fix apart from adding a new
>>> unmap_kernel_range_atomic(). Then add a `bool atomic' arg to
>>> vunmap_page_range() and pass that all the way down.
>>
>> That would work for the unmap, but looking at the GHES driver some more,
>> it looks like its call to ioremap_page_range() is already unsafe -- it
>> may need to allocate a new PTE page with a non-atomic allocation in
>> pte_alloc_one_kernel().
>>
>> Perhaps what's needed here is a pair of ioremap_page_atomic() and
>> iounmap_page_atomic() calls? With some prep function to ensure the PTE
>> pages (etc.) are preallocated.
>
> Is ghes.c the only problem source here? If so then a suitable solution
> would be to declare that driver hopelessly busted and proceed as if it
> didn't exist :(
All the other callers do so from non-atomic context. ghes.c is the only
broken caller.
Shall I resend or are you happy to take the patch off the first email in
this thread?
David
* Re: vunmap() on large regions may trigger soft lockup warnings
From: Andrew Morton @ 2013-12-16 22:57 UTC
To: David Vrabel
Cc: linux-kernel@vger.kernel.org, Len Brown, Rafael J. Wysocki,
linux-acpi, xen-devel, Dietmar Hahn, Huang Ying,
Mauro Carvalho Chehab
On Mon, 16 Dec 2013 12:56:13 +0000 David Vrabel <david.vrabel@citrix.com> wrote:
> On 14/12/13 08:32, Andrew Morton wrote:
> > On Thu, 12 Dec 2013 12:50:47 +0000 David Vrabel <david.vrabel@citrix.com> wrote:
> >
> >>> each time. But that would require difficult tuning of N.
> >>>
> >>> I suppose we could just do
> >>>
> >>> if (!in_interrupt())
> >>> cond_resched();
> >>>
> >>> in vunmap_pmd_range(), but that's pretty specific to ghes.c and doesn't
> >>> permit unmap-inside-spinlock.
> >>>
> >>> So I can't immediately think of a suitable fix apart from adding a new
> >>> unmap_kernel_range_atomic(). Then add a `bool atomic' arg to
> >>> vunmap_page_range() and pass that all the way down.
> >>
> >> That would work for the unmap, but looking at the GHES driver some more,
> >> it looks like its call to ioremap_page_range() is already unsafe -- it
> >> may need to allocate a new PTE page with a non-atomic allocation in
> >> pte_alloc_one_kernel().
> >>
> >> Perhaps what's needed here is a pair of ioremap_page_atomic() and
> >> iounmap_page_atomic() calls? With some prep function to ensure the PTE
> >> pages (etc.) are preallocated.
> >
> > Is ghes.c the only problem source here? If so then a suitable solution
> > would be to declare that driver hopelessly busted and proceed as if it
> > didn't exist :(
>
> All the other callers do so from non-atomic context. ghes.c is the only
> broken caller.
>
> Shall I resend or are you happy to take the patch off the first email in
> this thread?
Well first we should attempt to wake up the ghes maintainers and tell
them we're about to break their stuff. (Which I believe is already broken).
The fix won't be easy - presumably ghes will need to punt all its IRQ-
and NMI-context operations up into kernel thread context.