linux-kernel.vger.kernel.org archive mirror
* vunmap() on large regions may trigger soft lockup warnings
@ 2013-12-11 16:58 David Vrabel
  2013-12-11 21:39 ` Andrew Morton
  0 siblings, 1 reply; 6+ messages in thread
From: David Vrabel @ 2013-12-11 16:58 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel@vger.kernel.org, Len Brown, Rafael J. Wysocki,
	linux-acpi, xen-devel, Dietmar Hahn

Andrew,

Dietmar Hahn reported an issue where calling vunmap() on a large (50 GB)
region would trigger soft lockup warnings.

The following patch would resolve this (by adding a cond_resched() call
to vunmap_pmd_range()). Almost all calls of vunmap() and unmap_kernel_range()
are from process context (as far as I could tell), except that an ACPI
driver (drivers/acpi/apei/ghes.c) calls unmap_kernel_range_noflush()
from interrupt and NMI contexts.

Can you advise on a preferred solution?

For example, an unmap_kernel_page() function (callable from atomic
context) could be provided since the GHES driver only maps/unmaps a
single page.
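
To illustrate (untested, and the name is only a suggestion), such a helper
could be little more than a wrapper around the existing noflush unmap,
which neither sleeps nor allocates when tearing down a single page:

        /*
         * Sketch only: unmap one kernel page without sleeping.  The TLB
         * flush is left to the caller (ghes.c already does a CPU-local
         * flush itself), since a global flush may not be NMI-safe.
         */
        static inline void unmap_kernel_page(unsigned long addr)
        {
                /* One page touches a single PTE, so no cond_resched() needed. */
                unmap_kernel_range_noflush(addr, PAGE_SIZE);
        }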

8<-------------------------
mm/vmalloc: avoid soft lockup warnings when vunmap()'ing large ranges

From: David Vrabel <david.vrabel@citrix.com>

If vunmap() is used to unmap a large (e.g., 50 GB) region, it may take
sufficiently long that it triggers soft lockup warnings.

Add a cond_resched() into vunmap_pmd_range() so the calling task may
be rescheduled after unmapping each PMD entry.  This is how
zap_pmd_range() fixes the same problem for userspace mappings.

Reported-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 mm/vmalloc.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 0fdf968..b1b5b39 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -75,6 +75,7 @@ static void vunmap_pmd_range(pud_t *pud, unsigned long addr, unsigned long end)
 		if (pmd_none_or_clear_bad(pmd))
 			continue;
 		vunmap_pte_range(pmd, addr, next);
+		cond_resched();
 	} while (pmd++, addr = next, addr != end);
 }

-- 
1.7.2.5


* Re: vunmap() on large regions may trigger soft lockup warnings
  2013-12-11 16:58 vunmap() on large regions may trigger soft lockup warnings David Vrabel
@ 2013-12-11 21:39 ` Andrew Morton
  2013-12-12 12:50   ` David Vrabel
  0 siblings, 1 reply; 6+ messages in thread
From: Andrew Morton @ 2013-12-11 21:39 UTC (permalink / raw)
  To: David Vrabel
  Cc: linux-kernel@vger.kernel.org, Len Brown, Rafael J. Wysocki,
	linux-acpi, xen-devel, Dietmar Hahn

On Wed, 11 Dec 2013 16:58:19 +0000 David Vrabel <david.vrabel@citrix.com> wrote:

> Andrew,
> 
> Dietmar Hahn reported an issue where calling vunmap() on a large (50 GB)
> region would trigger soft lockup warnings.
> 
> The following patch would resolve this (by adding a cond_resched() call
> to vunmap_pmd_range()). Almost all calls of vunmap() and unmap_kernel_range()
> are from process context (as far as I could tell), except that an ACPI
> driver (drivers/acpi/apei/ghes.c) calls unmap_kernel_range_noflush()
> from interrupt and NMI contexts.
> 
> Can you advise on a preferred solution?
> 
> For example, an unmap_kernel_page() function (callable from atomic
> context) could be provided since the GHES driver only maps/unmaps a
> single page.
> 
> 8<-------------------------
> mm/vmalloc: avoid soft lockup warnings when vunmap()'ing large ranges
> 
> From: David Vrabel <david.vrabel@citrix.com>
> 
> If vunmap() is used to unmap a large (e.g., 50 GB) region, it may take
> sufficiently long that it triggers soft lockup warnings.
> 
> Add a cond_resched() into vunmap_pmd_range() so the calling task may
> be rescheduled after unmapping each PMD entry.  This is how
> zap_pmd_range() fixes the same problem for userspace mappings.
> 
> ...
>
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -75,6 +75,7 @@ static void vunmap_pmd_range(pud_t *pud, unsigned long addr, unsigned long end)
>  		if (pmd_none_or_clear_bad(pmd))
>  			continue;
>  		vunmap_pte_range(pmd, addr, next);
> +		cond_resched();
>  	} while (pmd++, addr = next, addr != end);
>  }

Well that's ugly.

We could redo unmap_kernel_range() so it takes an `atomic' flag then
loops around unmapping N MB at a time, doing

	if (!atomic)
		cond_resched()

each time.  But that would require difficult tuning of N.

I suppose we could just do

	if (!in_interrupt())
		cond_resched();

in vunmap_pmd_range(), but that's pretty specific to ghes.c and doesn't
permit unmap-inside-spinlock.

So I can't immediately think of a suitable fix apart from adding a new
unmap_kernel_range_atomic().  Then add a `bool atomic' arg to
vunmap_page_range() and pass that all the way down.
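
For concreteness, an untested sketch of that plumbing against the code
quoted above (vunmap_pud_range() and vunmap_page_range() would need the
same extra argument):

        static void vunmap_pmd_range(pud_t *pud, unsigned long addr,
                                     unsigned long end, bool atomic)
        {
                pmd_t *pmd;
                unsigned long next;

                pmd = pmd_offset(pud, addr);
                do {
                        next = pmd_addr_end(addr, end);
                        if (pmd_none_or_clear_bad(pmd))
                                continue;
                        vunmap_pte_range(pmd, addr, next);
                        if (!atomic)    /* only yield when the caller may sleep */
                                cond_resched();
                } while (pmd++, addr = next, addr != end);
        }

        /*
         * New entry point for callers in atomic (IRQ/NMI) context; the
         * plain unmap_kernel_range() would pass atomic=false.  Note the
         * cross-CPU TLB flush may itself need care in NMI context.
         */
        void unmap_kernel_range_atomic(unsigned long addr, unsigned long size)
        {
                unsigned long end = addr + size;

                vunmap_page_range(addr, end, true);  /* assumes it gained the flag */
                flush_tlb_kernel_range(addr, end);
        }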





* Re: vunmap() on large regions may trigger soft lockup warnings
  2013-12-11 21:39 ` Andrew Morton
@ 2013-12-12 12:50   ` David Vrabel
  2013-12-14  8:32     ` Andrew Morton
  0 siblings, 1 reply; 6+ messages in thread
From: David Vrabel @ 2013-12-12 12:50 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel@vger.kernel.org, Len Brown, Rafael J. Wysocki,
	linux-acpi, xen-devel, Dietmar Hahn

On 11/12/13 21:39, Andrew Morton wrote:
> On Wed, 11 Dec 2013 16:58:19 +0000 David Vrabel <david.vrabel@citrix.com> wrote:
> 
>> Andrew,
>>
>> Dietmar Hahn reported an issue where calling vunmap() on a large (50 GB)
>> region would trigger soft lockup warnings.
>>
>> The following patch would resolve this (by adding a cond_resched() call
>> to vunmap_pmd_range()). Almost all calls of vunmap() and unmap_kernel_range()
>> are from process context (as far as I could tell), except that an ACPI
>> driver (drivers/acpi/apei/ghes.c) calls unmap_kernel_range_noflush()
>> from interrupt and NMI contexts.
>>
>> Can you advise on a preferred solution?
>>
>> For example, an unmap_kernel_page() function (callable from atomic
>> context) could be provided since the GHES driver only maps/unmaps a
>> single page.
>>
>> 8<-------------------------
>> mm/vmalloc: avoid soft lockup warnings when vunmap()'ing large ranges
>>
>> From: David Vrabel <david.vrabel@citrix.com>
>>
>> If vunmap() is used to unmap a large (e.g., 50 GB) region, it may take
>> sufficiently long that it triggers soft lockup warnings.
>>
>> Add a cond_resched() into vunmap_pmd_range() so the calling task may
>> be rescheduled after unmapping each PMD entry.  This is how
>> zap_pmd_range() fixes the same problem for userspace mappings.
>>
>> ...
>>
>> --- a/mm/vmalloc.c
>> +++ b/mm/vmalloc.c
>> @@ -75,6 +75,7 @@ static void vunmap_pmd_range(pud_t *pud, unsigned long addr, unsigned long end)
>>  		if (pmd_none_or_clear_bad(pmd))
>>  			continue;
>>  		vunmap_pte_range(pmd, addr, next);
>> +		cond_resched();
>>  	} while (pmd++, addr = next, addr != end);
>>  }
> 
> Well that's ugly.
> 
> We could redo unmap_kernel_range() so it takes an `atomic' flag then
> loops around unmapping N MB at a time, doing
> 
> 	if (!atomic)
> 		cond_resched()
> 
> each time.  But that would require difficult tuning of N.
> 
> I suppose we could just do
> 
> 	if (!in_interrupt())
> 		cond_resched();
> 
> in vunmap_pmd_range(), but that's pretty specific to ghes.c and doesn't
> permit unmap-inside-spinlock.
> 
> So I can't immediately think of a suitable fix apart from adding a new
> unmap_kernel_range_atomic().  Then add a `bool atomic' arg to
> vunmap_page_range() and pass that all the way down.

That would work for the unmap, but looking at the GHES driver some more,
it looks like its call to ioremap_page_range() is already unsafe --
it may need to allocate a new PTE page with a non-atomic alloc in
pte_alloc_one_kernel().

Perhaps what's needed here is a pair of ioremap_page_atomic() and
iounmap_page_atomic() calls?  With some prep function to ensure the PTE
pages (etc.) are preallocated.
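
Very roughly, and purely as an untested, x86-flavoured sketch with made-up
names: reserve the virtual page and preallocate its page tables once from
process context, so the atomic paths only ever write a single PTE:

        static unsigned long atomic_io_vaddr;
        static pte_t *atomic_io_pte;

        /* Called once from process context, e.g. at driver init. */
        int ioremap_page_atomic_prepare(void)
        {
                struct vm_struct *area;
                pgd_t *pgd;
                pud_t *pud;
                pmd_t *pmd;

                area = get_vm_area(PAGE_SIZE, VM_IOREMAP);      /* may sleep */
                if (!area)
                        return -ENOMEM;
                atomic_io_vaddr = (unsigned long)area->addr;

                /* Allocate every page-table level down to the PTE now. */
                pgd = pgd_offset_k(atomic_io_vaddr);
                pud = pud_alloc(&init_mm, pgd, atomic_io_vaddr);
                pmd = pud ? pmd_alloc(&init_mm, pud, atomic_io_vaddr) : NULL;
                atomic_io_pte = pmd ? pte_alloc_kernel(pmd, atomic_io_vaddr) : NULL;
                return atomic_io_pte ? 0 : -ENOMEM;
        }

        /* NMI/IRQ-safe: only writes the preallocated PTE, local flush only. */
        void __iomem *ioremap_page_atomic(phys_addr_t paddr)
        {
                set_pte(atomic_io_pte,
                        pfn_pte(paddr >> PAGE_SHIFT, PAGE_KERNEL_IO));
                __flush_tlb_one(atomic_io_vaddr);  /* as ghes.c already does */
                return (void __iomem *)(atomic_io_vaddr + offset_in_page(paddr));
        }

        void iounmap_page_atomic(void __iomem *vaddr)
        {
                pte_clear(&init_mm, atomic_io_vaddr, atomic_io_pte);
                __flush_tlb_one(atomic_io_vaddr);
        }

A real version would need a slot per context (and some locking) so the
NMI path can't trample a mapping the IRQ path is still using.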

David


* Re: vunmap() on large regions may trigger soft lockup warnings
  2013-12-12 12:50   ` David Vrabel
@ 2013-12-14  8:32     ` Andrew Morton
  2013-12-16 12:56       ` David Vrabel
  0 siblings, 1 reply; 6+ messages in thread
From: Andrew Morton @ 2013-12-14  8:32 UTC (permalink / raw)
  To: David Vrabel
  Cc: linux-kernel@vger.kernel.org, Len Brown, Rafael J. Wysocki,
	linux-acpi, xen-devel, Dietmar Hahn

On Thu, 12 Dec 2013 12:50:47 +0000 David Vrabel <david.vrabel@citrix.com> wrote:

> > each time.  But that would require difficult tuning of N.
> > 
> > I suppose we could just do
> > 
> > 	if (!in_interrupt())
> > 		cond_resched();
> > 
> > in vunmap_pmd_range(), but that's pretty specific to ghes.c and doesn't
> > permit unmap-inside-spinlock.
> > 
> > So I can't immediately think of a suitable fix apart from adding a new
> > unmap_kernel_range_atomic().  Then add a `bool atomic' arg to
> > vunmap_page_range() and pass that all the way down.
> 
> > That would work for the unmap, but looking at the GHES driver some more,
> > it looks like its call to ioremap_page_range() is already unsafe --
> it may need to allocate a new PTE page with a non-atomic alloc in
> pte_alloc_one_kernel().
> 
> Perhaps what's needed here is a pair of ioremap_page_atomic() and
> > iounmap_page_atomic() calls?  With some prep function to ensure the PTE
> pages (etc.) are preallocated.

Is ghes.c the only problem source here?  If so then a suitable solution
would be to declare that driver hopelessly busted and proceed as if it
didn't exist :(

Just from a quick look, the thing is doing ioremap() from NMI context! 
ioremap() has to do a bunch of memory allocations, take spinlocks, etc.



* Re: vunmap() on large regions may trigger soft lockup warnings
  2013-12-14  8:32     ` Andrew Morton
@ 2013-12-16 12:56       ` David Vrabel
  2013-12-16 22:57         ` Andrew Morton
  0 siblings, 1 reply; 6+ messages in thread
From: David Vrabel @ 2013-12-16 12:56 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel@vger.kernel.org, Len Brown, Rafael J. Wysocki,
	linux-acpi, xen-devel, Dietmar Hahn

On 14/12/13 08:32, Andrew Morton wrote:
> On Thu, 12 Dec 2013 12:50:47 +0000 David Vrabel <david.vrabel@citrix.com> wrote:
> 
>>> each time.  But that would require difficult tuning of N.
>>>
>>> I suppose we could just do
>>>
>>> 	if (!in_interrupt())
>>> 		cond_resched();
>>>
>>> in vunmap_pmd_range(), but that's pretty specific to ghes.c and doesn't
>>> permit unmap-inside-spinlock.
>>>
>>> So I can't immediately think of a suitable fix apart from adding a new
>>> unmap_kernel_range_atomic().  Then add a `bool atomic' arg to
>>> vunmap_page_range() and pass that all the way down.
>>
>> That would work for the unmap, but looking at the GHES driver some more,
>> it looks like its call to ioremap_page_range() is already unsafe --
>> it may need to allocate a new PTE page with a non-atomic alloc in
>> pte_alloc_one_kernel().
>>
>> Perhaps what's needed here is a pair of ioremap_page_atomic() and
>> iounmap_page_atomic() calls?  With some prep function to ensure the PTE
>> pages (etc.) are preallocated.
> 
> Is ghes.c the only problem source here?  If so then a suitable solution
> would be to declare that driver hopelessly busted and proceed as if it
> didn't exist :(

All the other callers do so from non-atomic context.  ghes.c is the only
broken caller.

Shall I resend or are you happy to take the patch off the first email in
this thread?

David



* Re: vunmap() on large regions may trigger soft lockup warnings
  2013-12-16 12:56       ` David Vrabel
@ 2013-12-16 22:57         ` Andrew Morton
  0 siblings, 0 replies; 6+ messages in thread
From: Andrew Morton @ 2013-12-16 22:57 UTC (permalink / raw)
  To: David Vrabel
  Cc: linux-kernel@vger.kernel.org, Len Brown, Rafael J. Wysocki,
	linux-acpi, xen-devel, Dietmar Hahn, Huang Ying,
	Mauro Carvalho Chehab

On Mon, 16 Dec 2013 12:56:13 +0000 David Vrabel <david.vrabel@citrix.com> wrote:

> On 14/12/13 08:32, Andrew Morton wrote:
> > On Thu, 12 Dec 2013 12:50:47 +0000 David Vrabel <david.vrabel@citrix.com> wrote:
> > 
> >>> each time.  But that would require difficult tuning of N.
> >>>
> >>> I suppose we could just do
> >>>
> >>> 	if (!in_interrupt())
> >>> 		cond_resched();
> >>>
> >>> in vunmap_pmd_range(), but that's pretty specific to ghes.c and doesn't
> >>> permit unmap-inside-spinlock.
> >>>
> >>> So I can't immediately think of a suitable fix apart from adding a new
> >>> unmap_kernel_range_atomic().  Then add a `bool atomic' arg to
> >>> vunmap_page_range() and pass that all the way down.
> >>
> >> That would work for the unmap, but looking at the GHES driver some more,
> >> it looks like its call to ioremap_page_range() is already unsafe --
> >> it may need to allocate a new PTE page with a non-atomic alloc in
> >> pte_alloc_one_kernel().
> >>
> >> Perhaps what's needed here is a pair of ioremap_page_atomic() and
> >> iounmap_page_atomic() calls?  With some prep function to ensure the PTE
> >> pages (etc.) are preallocated.
> > 
> > Is ghes.c the only problem source here?  If so then a suitable solution
> > would be to declare that driver hopelessly busted and proceed as if it
> > didn't exist :(
> 
> All the other callers do so from non-atomic context.  ghes.c is the only
> broken caller.
> 
> Shall I resend or are you happy to take the patch off the first email in
> this thread?

Well first we should attempt to wake up the ghes maintainers and tell
them we're about to break their stuff. (Which I believe is already broken).

The fix won't be easy - presumably ghes will need to punt all its IRQ-
and NMI-context operations up into kernel thread context.
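
For what it's worth, an untested sketch of that punting (names invented;
the real driver would also have to stash the error record somewhere
NMI-safe across the hand-off):

        #include <linux/irq_work.h>
        #include <linux/workqueue.h>

        /* Runs in process context: free to ioremap(), sleep, cond_resched(). */
        static void ghes_deferred_fn(struct work_struct *work)
        {
                /* process the previously saved error record here */
        }
        static DECLARE_WORK(ghes_deferred_work, ghes_deferred_fn);

        /* Runs in IRQ context shortly after the NMI returns. */
        static void ghes_irq_work_fn(struct irq_work *iw)
        {
                schedule_work(&ghes_deferred_work);
        }
        static struct irq_work ghes_irq_work = { .func = ghes_irq_work_fn };

        /* NMI handler does the bare minimum and defers the rest. */
        static void ghes_nmi_sketch(void)
        {
                irq_work_queue(&ghes_irq_work);         /* NMI-safe */
        }

irq_work is the usual bridge out of NMI context; from there a workqueue
item gets all the way to process context, where the existing
ioremap()/iounmap() paths are legal.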



