* [RFC PATCH 1/3] mm: teach mm by current context info to not do I/O during memory allocation
[not found] <1350278059-14904-1-git-send-email-ming.lei@canonical.com>
@ 2012-10-15 5:14 ` Ming Lei
2012-10-15 14:33 ` Alan Stern
2012-10-15 15:47 ` Minchan Kim
0 siblings, 2 replies; 10+ messages in thread
From: Ming Lei @ 2012-10-15 5:14 UTC (permalink / raw)
To: linux-kernel
Cc: Greg Kroah-Hartman, linux-usb, linux-pm, Ming Lei, Alan Stern,
Oliver Neukum, Jiri Kosina, Andrew Morton, Mel Gorman,
Minchan Kim, KAMEZAWA Hiroyuki, Michal Hocko, Ingo Molnar,
Peter Zijlstra, Rafael J. Wysocki, linux-mm
This patch introduces PF_MEMALLOC_NOIO on process flag('flags' field of
'struct task_struct'), so that the flag can be set by one task
to avoid doing I/O inside memory allocation in the task's context.
The patch trys to solve one deadlock problem caused by block device,
and the problem can be occured at least in the below situations:
- during block device runtime resume situation, if memory allocation
with GFP_KERNEL is called inside runtime resume callback of any one
of its ancestors(or the block device itself), the deadlock may be
triggered inside the memory allocation since it might not complete
until the block device becomes active and the involed page I/O finishes.
The situation is pointed out first by Alan Stern. It is not a good
approach to convert all GFP_KERNEL in the path into GFP_NOIO because
several subsystems may be involved(for example, PCI, USB and SCSI may
be involved for usb mass stoarage device)
- during error handling situation of usb mass storage deivce, USB
bus reset will be put on the device, so there shouldn't have any
memory allocation with GFP_KERNEL during USB bus reset, otherwise
the deadlock similar with above may be triggered. Unfortunately, any
usb device may include one mass storage interface in theory, so it
requires all usb interface drivers to handle the situation. In fact,
most usb drivers don't know how to handle bus reset on the device
and don't provide .pre_set() and .post_reset() callback at all, so
USB core has to unbind and bind driver for these devices. So it
is still not practical to resort to GFP_NOIO for solving the problem.
Also the introduced solution can be used by block subsystem or block
drivers too, for example, set the PF_MEMALLOC_NOIO flag before doing
actual I/O transfer.
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Oliver Neukum <oneukum@suse.de>
Cc: Jiri Kosina <jiri.kosina@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Minchan Kim <minchan@kernel.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: linux-mm <linux-mm@kvack.org>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
---
include/linux/sched.h | 5 +++++
mm/page_alloc.c | 2 ++
2 files changed, 7 insertions(+)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index f6961c9..33be290 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1811,6 +1811,7 @@ extern void thread_group_times(struct task_struct *p, cputime_t *ut, cputime_t *
#define PF_FROZEN 0x00010000 /* frozen for system suspend */
#define PF_FSTRANS 0x00020000 /* inside a filesystem transaction */
#define PF_KSWAPD 0x00040000 /* I am kswapd */
+#define PF_MEMALLOC_NOIO 0x00080000 /* Allocating memory without IO involved */
#define PF_LESS_THROTTLE 0x00100000 /* Throttle me less: I clean memory */
#define PF_KTHREAD 0x00200000 /* I am a kernel thread */
#define PF_RANDOMIZE 0x00400000 /* randomize virtual address space */
@@ -1848,6 +1849,10 @@ extern void thread_group_times(struct task_struct *p, cputime_t *ut, cputime_t *
#define tsk_used_math(p) ((p)->flags & PF_USED_MATH)
#define used_math() tsk_used_math(current)
+#define tsk_memalloc_no_io(p) ((p)->flags & PF_MEMALLOC_NOIO)
+#define tsk_memalloc_allow_io(p) do { (p)->flags &= ~PF_MEMALLOC_NOIO; } while (0)
+#define tsk_memalloc_forbid_io(p) do { (p)->flags |= PF_MEMALLOC_NOIO; } while (0)
+
/*
* task->jobctl flags
*/
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 8e1be1c..e15381f 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2596,6 +2596,8 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
int alloc_flags = ALLOC_WMARK_LOW|ALLOC_CPUSET;
gfp_mask &= gfp_allowed_mask;
+ if (unlikely(tsk_memalloc_no_io(current)))
+ gfp_mask &= ~GFP_IOFS;
lockdep_trace_alloc(gfp_mask);
--
1.7.9.5
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [RFC PATCH 1/3] mm: teach mm by current context info to not do I/O during memory allocation
2012-10-15 5:14 ` [RFC PATCH 1/3] mm: teach mm by current context info to not do I/O during memory allocation Ming Lei
@ 2012-10-15 14:33 ` Alan Stern
2012-10-15 14:41 ` Ming Lei
2012-10-15 15:47 ` Minchan Kim
1 sibling, 1 reply; 10+ messages in thread
From: Alan Stern @ 2012-10-15 14:33 UTC (permalink / raw)
To: Ming Lei
Cc: linux-kernel, Greg Kroah-Hartman, linux-usb, linux-pm,
Oliver Neukum, Jiri Kosina, Andrew Morton, Mel Gorman,
Minchan Kim, KAMEZAWA Hiroyuki, Michal Hocko, Ingo Molnar,
Peter Zijlstra, Rafael J. Wysocki, linux-mm
On Mon, 15 Oct 2012, Ming Lei wrote:
> This patch introduces PF_MEMALLOC_NOIO on process flag('flags' field of
> 'struct task_struct'), so that the flag can be set by one task
> to avoid doing I/O inside memory allocation in the task's context.
>
> The patch trys to solve one deadlock problem caused by block device,
> and the problem can be occured at least in the below situations:
>
> - during block device runtime resume situation, if memory allocation
> with GFP_KERNEL is called inside runtime resume callback of any one
> of its ancestors(or the block device itself), the deadlock may be
> triggered inside the memory allocation since it might not complete
> until the block device becomes active and the involed page I/O finishes.
> The situation is pointed out first by Alan Stern. It is not a good
> approach to convert all GFP_KERNEL in the path into GFP_NOIO because
> several subsystems may be involved(for example, PCI, USB and SCSI may
> be involved for usb mass stoarage device)
>
> - during error handling situation of usb mass storage deivce, USB
> bus reset will be put on the device, so there shouldn't have any
> memory allocation with GFP_KERNEL during USB bus reset, otherwise
> the deadlock similar with above may be triggered. Unfortunately, any
> usb device may include one mass storage interface in theory, so it
> requires all usb interface drivers to handle the situation. In fact,
> most usb drivers don't know how to handle bus reset on the device
> and don't provide .pre_set() and .post_reset() callback at all, so
> USB core has to unbind and bind driver for these devices. So it
> is still not practical to resort to GFP_NOIO for solving the problem.
>
> Also the introduced solution can be used by block subsystem or block
> drivers too, for example, set the PF_MEMALLOC_NOIO flag before doing
> actual I/O transfer.
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1811,6 +1811,7 @@ extern void thread_group_times(struct task_struct *p, cputime_t *ut, cputime_t *
> #define PF_FROZEN 0x00010000 /* frozen for system suspend */
> #define PF_FSTRANS 0x00020000 /* inside a filesystem transaction */
> #define PF_KSWAPD 0x00040000 /* I am kswapd */
> +#define PF_MEMALLOC_NOIO 0x00080000 /* Allocating memory without IO involved */
> #define PF_LESS_THROTTLE 0x00100000 /* Throttle me less: I clean memory */
> #define PF_KTHREAD 0x00200000 /* I am a kernel thread */
> #define PF_RANDOMIZE 0x00400000 /* randomize virtual address space */
> @@ -1848,6 +1849,10 @@ extern void thread_group_times(struct task_struct *p, cputime_t *ut, cputime_t *
> #define tsk_used_math(p) ((p)->flags & PF_USED_MATH)
> #define used_math() tsk_used_math(current)
>
> +#define tsk_memalloc_no_io(p) ((p)->flags & PF_MEMALLOC_NOIO)
> +#define tsk_memalloc_allow_io(p) do { (p)->flags &= ~PF_MEMALLOC_NOIO; } while (0)
> +#define tsk_memalloc_forbid_io(p) do { (p)->flags |= PF_MEMALLOC_NOIO; } while (0)
Instead of allow/forbid, the API should be save/restore (like
local_irq_save and local_irq_restore). This makes nesting much easier.
Also, do we really the "p" argument? This is not at all likely to be
used with any task other than the current one.
Alan Stern
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC PATCH 1/3] mm: teach mm by current context info to not do I/O during memory allocation
2012-10-15 14:33 ` Alan Stern
@ 2012-10-15 14:41 ` Ming Lei
0 siblings, 0 replies; 10+ messages in thread
From: Ming Lei @ 2012-10-15 14:41 UTC (permalink / raw)
To: Alan Stern
Cc: linux-kernel, Greg Kroah-Hartman, linux-usb, linux-pm,
Oliver Neukum, Jiri Kosina, Andrew Morton, Mel Gorman,
Minchan Kim, KAMEZAWA Hiroyuki, Michal Hocko, Ingo Molnar,
Peter Zijlstra, Rafael J. Wysocki, linux-mm
On Mon, Oct 15, 2012 at 10:33 PM, Alan Stern <stern@rowland.harvard.edu> wrote:
>
> Instead of allow/forbid, the API should be save/restore (like
> local_irq_save and local_irq_restore). This makes nesting much easier.
Good point.
> Also, do we really the "p" argument? This is not at all likely to be
> used with any task other than the current one.
Yes, only 'current' can be passed now. I keep it because no performance
effect with macro implementation. But that is not good since it may
cause misuse. Will remove the 'p' argument.
Thanks,
--
Ming Lei
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC PATCH 1/3] mm: teach mm by current context info to not do I/O during memory allocation
2012-10-15 5:14 ` [RFC PATCH 1/3] mm: teach mm by current context info to not do I/O during memory allocation Ming Lei
2012-10-15 14:33 ` Alan Stern
@ 2012-10-15 15:47 ` Minchan Kim
2012-10-16 1:56 ` Ming Lei
1 sibling, 1 reply; 10+ messages in thread
From: Minchan Kim @ 2012-10-15 15:47 UTC (permalink / raw)
To: Ming Lei
Cc: linux-kernel, Greg Kroah-Hartman, linux-usb, linux-pm, Alan Stern,
Oliver Neukum, Jiri Kosina, Andrew Morton, Mel Gorman,
KAMEZAWA Hiroyuki, Michal Hocko, Ingo Molnar, Peter Zijlstra,
Rafael J. Wysocki, linux-mm
On Mon, Oct 15, 2012 at 01:14:17PM +0800, Ming Lei wrote:
> This patch introduces PF_MEMALLOC_NOIO on process flag('flags' field of
> 'struct task_struct'), so that the flag can be set by one task
> to avoid doing I/O inside memory allocation in the task's context.
>
> The patch trys to solve one deadlock problem caused by block device,
> and the problem can be occured at least in the below situations:
>
> - during block device runtime resume situation, if memory allocation
> with GFP_KERNEL is called inside runtime resume callback of any one
> of its ancestors(or the block device itself), the deadlock may be
> triggered inside the memory allocation since it might not complete
> until the block device becomes active and the involed page I/O finishes.
> The situation is pointed out first by Alan Stern. It is not a good
> approach to convert all GFP_KERNEL in the path into GFP_NOIO because
> several subsystems may be involved(for example, PCI, USB and SCSI may
> be involved for usb mass stoarage device)
Couldn't we expand pm_restrict_gfp_mask to cover resume path as well as
suspend path?
>
> - during error handling situation of usb mass storage deivce, USB
> bus reset will be put on the device, so there shouldn't have any
> memory allocation with GFP_KERNEL during USB bus reset, otherwise
> the deadlock similar with above may be triggered. Unfortunately, any
> usb device may include one mass storage interface in theory, so it
> requires all usb interface drivers to handle the situation. In fact,
> most usb drivers don't know how to handle bus reset on the device
> and don't provide .pre_set() and .post_reset() callback at all, so
> USB core has to unbind and bind driver for these devices. So it
> is still not practical to resort to GFP_NOIO for solving the problem.
I hope this case could be handled by usb core like usb_restrict_gfp_mask
rather than adding new branch on fast path.
>
> Also the introduced solution can be used by block subsystem or block
> drivers too, for example, set the PF_MEMALLOC_NOIO flag before doing
> actual I/O transfer.
>
> Cc: Alan Stern <stern@rowland.harvard.edu>
> Cc: Oliver Neukum <oneukum@suse.de>
> Cc: Jiri Kosina <jiri.kosina@suse.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Mel Gorman <mel@csn.ul.ie>
> Cc: Minchan Kim <minchan@kernel.org>
> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> Cc: Michal Hocko <mhocko@suse.cz>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
> Cc: linux-mm <linux-mm@kvack.org>
>
> Signed-off-by: Ming Lei <ming.lei@canonical.com>
> ---
> include/linux/sched.h | 5 +++++
> mm/page_alloc.c | 2 ++
> 2 files changed, 7 insertions(+)
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index f6961c9..33be290 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1811,6 +1811,7 @@ extern void thread_group_times(struct task_struct *p, cputime_t *ut, cputime_t *
> #define PF_FROZEN 0x00010000 /* frozen for system suspend */
> #define PF_FSTRANS 0x00020000 /* inside a filesystem transaction */
> #define PF_KSWAPD 0x00040000 /* I am kswapd */
> +#define PF_MEMALLOC_NOIO 0x00080000 /* Allocating memory without IO involved */
> #define PF_LESS_THROTTLE 0x00100000 /* Throttle me less: I clean memory */
> #define PF_KTHREAD 0x00200000 /* I am a kernel thread */
> #define PF_RANDOMIZE 0x00400000 /* randomize virtual address space */
> @@ -1848,6 +1849,10 @@ extern void thread_group_times(struct task_struct *p, cputime_t *ut, cputime_t *
> #define tsk_used_math(p) ((p)->flags & PF_USED_MATH)
> #define used_math() tsk_used_math(current)
>
> +#define tsk_memalloc_no_io(p) ((p)->flags & PF_MEMALLOC_NOIO)
> +#define tsk_memalloc_allow_io(p) do { (p)->flags &= ~PF_MEMALLOC_NOIO; } while (0)
> +#define tsk_memalloc_forbid_io(p) do { (p)->flags |= PF_MEMALLOC_NOIO; } while (0)
> +
> /*
> * task->jobctl flags
> */
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 8e1be1c..e15381f 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2596,6 +2596,8 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
> int alloc_flags = ALLOC_WMARK_LOW|ALLOC_CPUSET;
>
> gfp_mask &= gfp_allowed_mask;
> + if (unlikely(tsk_memalloc_no_io(current)))
> + gfp_mask &= ~GFP_IOFS;
>
> lockdep_trace_alloc(gfp_mask);
>
> --
> 1.7.9.5
>
--
Kind Regards,
Minchan Kim
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC PATCH 1/3] mm: teach mm by current context info to not do I/O during memory allocation
2012-10-15 15:47 ` Minchan Kim
@ 2012-10-16 1:56 ` Ming Lei
2012-10-16 5:49 ` Minchan Kim
0 siblings, 1 reply; 10+ messages in thread
From: Ming Lei @ 2012-10-16 1:56 UTC (permalink / raw)
To: Minchan Kim
Cc: linux-kernel, Greg Kroah-Hartman, linux-usb, linux-pm, Alan Stern,
Oliver Neukum, Jiri Kosina, Andrew Morton, Mel Gorman,
KAMEZAWA Hiroyuki, Michal Hocko, Ingo Molnar, Peter Zijlstra,
Rafael J. Wysocki, linux-mm
On Mon, Oct 15, 2012 at 11:47 PM, Minchan Kim <minchan@kernel.org> wrote:
> On Mon, Oct 15, 2012 at 01:14:17PM +0800, Ming Lei wrote:
>> This patch introduces PF_MEMALLOC_NOIO on process flag('flags' field of
>> 'struct task_struct'), so that the flag can be set by one task
>> to avoid doing I/O inside memory allocation in the task's context.
>>
>> The patch trys to solve one deadlock problem caused by block device,
>> and the problem can be occured at least in the below situations:
>>
>> - during block device runtime resume situation, if memory allocation
>> with GFP_KERNEL is called inside runtime resume callback of any one
>> of its ancestors(or the block device itself), the deadlock may be
>> triggered inside the memory allocation since it might not complete
>> until the block device becomes active and the involed page I/O finishes.
>> The situation is pointed out first by Alan Stern. It is not a good
>> approach to convert all GFP_KERNEL in the path into GFP_NOIO because
>> several subsystems may be involved(for example, PCI, USB and SCSI may
>> be involved for usb mass stoarage device)
>
> Couldn't we expand pm_restrict_gfp_mask to cover resume path as well as
> suspend path?
IMO, we could, but it is not good and might trigger memory allocation problem.
pm_restrict_gfp_mask uses the global variable of gfp_allowed_mask to
avoid allocating page with GFP_IOFS in all contexts during system sleep,
when processes have been frozen.
But during runtime PM, the whole system is running and all processes are
runnable. Also runtime PM is per device and the whole system may have
lots of devices, so taking the global gfp_allowed_mask may keep page
allocation with ~GFP_IOFS for a considerable proportion of system
running time, then alloc_page() will return failure easier.
The above deadlock problem may be fixed by allocating memory with
~GFP_IOFS only in the context of calling runtime_resume, and that is
idea of the patch.
>
>>
>> - during error handling situation of usb mass storage deivce, USB
>> bus reset will be put on the device, so there shouldn't have any
>> memory allocation with GFP_KERNEL during USB bus reset, otherwise
>> the deadlock similar with above may be triggered. Unfortunately, any
>> usb device may include one mass storage interface in theory, so it
>> requires all usb interface drivers to handle the situation. In fact,
>> most usb drivers don't know how to handle bus reset on the device
>> and don't provide .pre_set() and .post_reset() callback at all, so
>> USB core has to unbind and bind driver for these devices. So it
>> is still not practical to resort to GFP_NOIO for solving the problem.
>
> I hope this case could be handled by usb core like usb_restrict_gfp_mask
> rather than adding new branch on fast path.
See above, applying the global gfp_allowed_mask is not good.
Thanks,
--
Ming Lei
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC PATCH 1/3] mm: teach mm by current context info to not do I/O during memory allocation
2012-10-16 1:56 ` Ming Lei
@ 2012-10-16 5:49 ` Minchan Kim
2012-10-16 7:08 ` Ming Lei
0 siblings, 1 reply; 10+ messages in thread
From: Minchan Kim @ 2012-10-16 5:49 UTC (permalink / raw)
To: Ming Lei
Cc: linux-kernel, Greg Kroah-Hartman, linux-usb, linux-pm, Alan Stern,
Oliver Neukum, Jiri Kosina, Andrew Morton, Mel Gorman,
KAMEZAWA Hiroyuki, Michal Hocko, Ingo Molnar, Peter Zijlstra,
Rafael J. Wysocki, linux-mm
On Tue, Oct 16, 2012 at 09:56:48AM +0800, Ming Lei wrote:
> On Mon, Oct 15, 2012 at 11:47 PM, Minchan Kim <minchan@kernel.org> wrote:
> > On Mon, Oct 15, 2012 at 01:14:17PM +0800, Ming Lei wrote:
> >> This patch introduces PF_MEMALLOC_NOIO on process flag('flags' field of
> >> 'struct task_struct'), so that the flag can be set by one task
> >> to avoid doing I/O inside memory allocation in the task's context.
> >>
> >> The patch trys to solve one deadlock problem caused by block device,
> >> and the problem can be occured at least in the below situations:
> >>
> >> - during block device runtime resume situation, if memory allocation
> >> with GFP_KERNEL is called inside runtime resume callback of any one
> >> of its ancestors(or the block device itself), the deadlock may be
> >> triggered inside the memory allocation since it might not complete
> >> until the block device becomes active and the involed page I/O finishes.
> >> The situation is pointed out first by Alan Stern. It is not a good
> >> approach to convert all GFP_KERNEL in the path into GFP_NOIO because
> >> several subsystems may be involved(for example, PCI, USB and SCSI may
> >> be involved for usb mass stoarage device)
> >
> > Couldn't we expand pm_restrict_gfp_mask to cover resume path as well as
> > suspend path?
>
> IMO, we could, but it is not good and might trigger memory allocation problem.
>
> pm_restrict_gfp_mask uses the global variable of gfp_allowed_mask to
> avoid allocating page with GFP_IOFS in all contexts during system sleep,
> when processes have been frozen.
>
> But during runtime PM, the whole system is running and all processes are
> runnable. Also runtime PM is per device and the whole system may have
> lots of devices, so taking the global gfp_allowed_mask may keep page
> allocation with ~GFP_IOFS for a considerable proportion of system
> running time, then alloc_page() will return failure easier.
>
> The above deadlock problem may be fixed by allocating memory with
> ~GFP_IOFS only in the context of calling runtime_resume, and that is
> idea of the patch.
Fair enough but it wouldn't be a good idea that add new unlikely branch
in allocator's fast path. Please move the check into slow path which could
be in __alloc_pages_slowpath.
>
> >
> >>
> >> - during error handling situation of usb mass storage deivce, USB
> >> bus reset will be put on the device, so there shouldn't have any
> >> memory allocation with GFP_KERNEL during USB bus reset, otherwise
> >> the deadlock similar with above may be triggered. Unfortunately, any
> >> usb device may include one mass storage interface in theory, so it
> >> requires all usb interface drivers to handle the situation. In fact,
> >> most usb drivers don't know how to handle bus reset on the device
> >> and don't provide .pre_set() and .post_reset() callback at all, so
> >> USB core has to unbind and bind driver for these devices. So it
> >> is still not practical to resort to GFP_NOIO for solving the problem.
> >
> > I hope this case could be handled by usb core like usb_restrict_gfp_mask
> > rather than adding new branch on fast path.
>
> See above, applying the global gfp_allowed_mask is not good.
>
>
> Thanks,
> --
> Ming Lei
--
Kind Regards,
Minchan Kim
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC PATCH 1/3] mm: teach mm by current context info to not do I/O during memory allocation
2012-10-16 5:49 ` Minchan Kim
@ 2012-10-16 7:08 ` Ming Lei
2012-10-16 13:09 ` Minchan Kim
0 siblings, 1 reply; 10+ messages in thread
From: Ming Lei @ 2012-10-16 7:08 UTC (permalink / raw)
To: Minchan Kim
Cc: linux-kernel, Greg Kroah-Hartman, linux-usb, linux-pm, Alan Stern,
Oliver Neukum, Jiri Kosina, Andrew Morton, Mel Gorman,
KAMEZAWA Hiroyuki, Michal Hocko, Ingo Molnar, Peter Zijlstra,
Rafael J. Wysocki, linux-mm
On Tue, Oct 16, 2012 at 1:49 PM, Minchan Kim <minchan@kernel.org> wrote:
>
> Fair enough but it wouldn't be a good idea that add new unlikely branch
> in allocator's fast path. Please move the check into slow path which could
> be in __alloc_pages_slowpath.
Thanks for your comment.
I have considered to add the branch into gfp_to_alloc_flags() before,
but didn't do it because I see that get_page_from_freelist() may use
the GFP_IO or GFP_FS flag at least in zone_reclaim() path.
So could you make sure it is safe to move the branch into
__alloc_pages_slowpath()? If so, I will add the check into
gfp_to_alloc_flags().
Thanks,
--
Ming Lei
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC PATCH 1/3] mm: teach mm by current context info to not do I/O during memory allocation
2012-10-16 7:08 ` Ming Lei
@ 2012-10-16 13:09 ` Minchan Kim
2012-10-16 13:47 ` Ming Lei
0 siblings, 1 reply; 10+ messages in thread
From: Minchan Kim @ 2012-10-16 13:09 UTC (permalink / raw)
To: Ming Lei
Cc: linux-kernel, Greg Kroah-Hartman, linux-usb, linux-pm, Alan Stern,
Oliver Neukum, Jiri Kosina, Andrew Morton, Mel Gorman,
KAMEZAWA Hiroyuki, Michal Hocko, Ingo Molnar, Peter Zijlstra,
Rafael J. Wysocki, linux-mm
On Tue, Oct 16, 2012 at 03:08:41PM +0800, Ming Lei wrote:
> On Tue, Oct 16, 2012 at 1:49 PM, Minchan Kim <minchan@kernel.org> wrote:
> >
> > Fair enough but it wouldn't be a good idea that add new unlikely branch
> > in allocator's fast path. Please move the check into slow path which could
> > be in __alloc_pages_slowpath.
>
> Thanks for your comment.
>
> I have considered to add the branch into gfp_to_alloc_flags() before,
> but didn't do it because I see that get_page_from_freelist() may use
> the GFP_IO or GFP_FS flag at least in zone_reclaim() path.
Good point. You can check it in __zone_reclaim and change gfp_mask of scan_control
because it's never hot path.
>
> So could you make sure it is safe to move the branch into
> __alloc_pages_slowpath()? If so, I will add the check into
> gfp_to_alloc_flags().
How about this?
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d976957..b3607fa 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2614,10 +2614,16 @@ retry_cpuset:
page = get_page_from_freelist(gfp_mask|__GFP_HARDWALL, nodemask, order,
zonelist, high_zoneidx, alloc_flags,
preferred_zone, migratetype);
- if (unlikely(!page))
+ if (unlikely(!page)) {
+ /*
+ * Resume path can deadlock because block device
+ * isn't active yet.
+ */
+ if (unlikely(tsk_memalloc_no_io(current)))
+ gfp_mask &= ~GFP_IOFS;
page = __alloc_pages_slowpath(gfp_mask, order,
zonelist, high_zoneidx, nodemask,
preferred_zone, migratetype);
+ }
trace_mm_page_alloc(page, order, gfp_mask, migratetype);
diff --git a/mm/vmscan.c b/mm/vmscan.c
index b5e45f4..6c2ccdd 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3290,6 +3290,16 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
};
unsigned long nr_slab_pages0, nr_slab_pages1;
+ if (unlikely(tsk_memalloc_no_io(current))) {
+ sc.gfp_mask &= ~GFP_IOFS;
+ shrink.gfp_mask = sc.gfp_mask;
+ /*
+ * We allow to reclaim only clean pages.
+ * It can affect RECLAIM_SWAP and RECLAIM_WRITE mode
+ * but this is really rare event and allocator can
* fallback to other zones.
+ */
+ sc.may_writepage = 0;
+ sc.may_swap = 0;
+ }
+
cond_resched();
/*
* We need to be able to allocate from the reserves for RECLAIM_SWAP
>
>
> Thanks,
> --
> Ming Lei
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
--
Kind Regards,
Minchan Kim
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [RFC PATCH 1/3] mm: teach mm by current context info to not do I/O during memory allocation
2012-10-16 13:09 ` Minchan Kim
@ 2012-10-16 13:47 ` Ming Lei
2012-10-16 13:53 ` Minchan Kim
0 siblings, 1 reply; 10+ messages in thread
From: Ming Lei @ 2012-10-16 13:47 UTC (permalink / raw)
To: Minchan Kim
Cc: linux-kernel, Greg Kroah-Hartman, linux-usb, linux-pm, Alan Stern,
Oliver Neukum, Jiri Kosina, Andrew Morton, Mel Gorman,
KAMEZAWA Hiroyuki, Michal Hocko, Ingo Molnar, Peter Zijlstra,
Rafael J. Wysocki, linux-mm
On Tue, Oct 16, 2012 at 9:09 PM, Minchan Kim <minchan@kernel.org> wrote:
>
> Good point. You can check it in __zone_reclaim and change gfp_mask of scan_control
> because it's never hot path.
>
>>
>> So could you make sure it is safe to move the branch into
>> __alloc_pages_slowpath()? If so, I will add the check into
>> gfp_to_alloc_flags().
>
> How about this?
It is quite smart change, :-)
Considered that other part(sched.h) of the patch need update, I
will merge your change into -v1 for further review with your
Signed-off-by if you have no objection.
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index d976957..b3607fa 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2614,10 +2614,16 @@ retry_cpuset:
> page = get_page_from_freelist(gfp_mask|__GFP_HARDWALL, nodemask, order,
> zonelist, high_zoneidx, alloc_flags,
> preferred_zone, migratetype);
> - if (unlikely(!page))
> + if (unlikely(!page)) {
> + /*
> + * Resume path can deadlock because block device
> + * isn't active yet.
> + */
Not only resume path, I/O transfer or its error handling path may deadlock too.
> + if (unlikely(tsk_memalloc_no_io(current)))
> + gfp_mask &= ~GFP_IOFS;
> page = __alloc_pages_slowpath(gfp_mask, order,
> zonelist, high_zoneidx, nodemask,
> preferred_zone, migratetype);
> + }
>
> trace_mm_page_alloc(page, order, gfp_mask, migratetype);
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index b5e45f4..6c2ccdd 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -3290,6 +3290,16 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
> };
> unsigned long nr_slab_pages0, nr_slab_pages1;
>
> + if (unlikely(tsk_memalloc_no_io(current))) {
> + sc.gfp_mask &= ~GFP_IOFS;
> + shrink.gfp_mask = sc.gfp_mask;
> + /*
> + * We allow to reclaim only clean pages.
> + * It can affect RECLAIM_SWAP and RECLAIM_WRITE mode
> + * but this is really rare event and allocator can
> * fallback to other zones.
> + */
> + sc.may_writepage = 0;
> + sc.may_swap = 0;
> + }
> +
> cond_resched();
> /*
> * We need to be able to allocate from the reserves for RECLAIM_SWAP
>
Thanks,
--
Ming Lei
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC PATCH 1/3] mm: teach mm by current context info to not do I/O during memory allocation
2012-10-16 13:47 ` Ming Lei
@ 2012-10-16 13:53 ` Minchan Kim
0 siblings, 0 replies; 10+ messages in thread
From: Minchan Kim @ 2012-10-16 13:53 UTC (permalink / raw)
To: Ming Lei
Cc: linux-kernel, Greg Kroah-Hartman, linux-usb, linux-pm, Alan Stern,
Oliver Neukum, Jiri Kosina, Andrew Morton, Mel Gorman,
KAMEZAWA Hiroyuki, Michal Hocko, Ingo Molnar, Peter Zijlstra,
Rafael J. Wysocki, linux-mm
On Tue, Oct 16, 2012 at 09:47:03PM +0800, Ming Lei wrote:
> On Tue, Oct 16, 2012 at 9:09 PM, Minchan Kim <minchan@kernel.org> wrote:
> >
> > Good point. You can check it in __zone_reclaim and change gfp_mask of scan_control
> > because it's never hot path.
> >
> >>
> >> So could you make sure it is safe to move the branch into
> >> __alloc_pages_slowpath()? If so, I will add the check into
> >> gfp_to_alloc_flags().
> >
> > How about this?
>
> It is quite smart change, :-)
>
> Considered that other part(sched.h) of the patch need update, I
> will merge your change into -v1 for further review with your
> Signed-off-by if you have no objection.
No problem. :)
--
Kind Regards,
Minchan Kim
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2012-10-16 13:53 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <1350278059-14904-1-git-send-email-ming.lei@canonical.com>
2012-10-15 5:14 ` [RFC PATCH 1/3] mm: teach mm by current context info to not do I/O during memory allocation Ming Lei
2012-10-15 14:33 ` Alan Stern
2012-10-15 14:41 ` Ming Lei
2012-10-15 15:47 ` Minchan Kim
2012-10-16 1:56 ` Ming Lei
2012-10-16 5:49 ` Minchan Kim
2012-10-16 7:08 ` Ming Lei
2012-10-16 13:09 ` Minchan Kim
2012-10-16 13:47 ` Ming Lei
2012-10-16 13:53 ` Minchan Kim
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).