* Re: + mempool-fix-first-round-failure-behavior.patch added to -mm tree
@ 2011-12-22 16:39 Oleg Nesterov
2011-12-22 16:49 ` Tejun Heo
0 siblings, 1 reply; 3+ messages in thread
From: Oleg Nesterov @ 2011-12-22 16:39 UTC (permalink / raw)
To: Tejun Heo, Andrew Morton; +Cc: linux-kernel
Hello,
> For the initial allocation, mempool passes modified gfp mask to the
> backing allocator so that it doesn't try too hard when there are reserved
> elements waiting in the pool; however, when that allocation fails and pool
> is empty too, it either waits for the pool to be replenished before
> retrying or fails if !__GFP_WAIT.
>
> * If the caller was calling in with GFP_ATOMIC, it never gets to try
> emergency reserve. Allocations which would have succeeded without
> mempool may fail, which is just wrong.
>
> * Allocation which could have succeeded after a bit of reclaim now has
> to wait on the reserved items and it's not like mempool doesn't retry
> with the original gfp mask. It just does that *after* someone returns
> an element, pointlessly delaying things.
>
> Fix it by retrying immediately with the gfp mask requested by the caller
> if the first round of allocation attempts fails with modified mask.
I can't even explain why this (simple!) logic looks confusing to me,
with or without the patch. A couple of questions:
1. Why do we remove __GFP_WAIT unconditionally before the the
very 1st allocation?
2. Why do we always restore it after io_schedule(), even if
we have the reserved items?
IOW, what do you think about the patch below instead?
Oleg.
--- x/mm/mempool.c
+++ x/mm/mempool.c
@@ -212,10 +212,12 @@ void * mempool_alloc(mempool_t *pool, gf
gfp_mask |= __GFP_NORETRY; /* don't loop in __alloc_pages */
gfp_mask |= __GFP_NOWARN; /* failures are OK */
- gfp_temp = gfp_mask & ~(__GFP_WAIT|__GFP_IO);
-
repeat_alloc:
+ gfp_temp = gfp_mask;
+ if (pool->curr_nr)
+ gfp_temp &= ~(__GFP_WAIT|__GFP_IO);
+
element = pool->alloc(gfp_temp, pool->pool_data);
if (likely(element != NULL))
return element;
@@ -229,13 +231,15 @@ repeat_alloc:
}
/* We must not sleep in the GFP_ATOMIC case */
- if (!(gfp_mask & __GFP_WAIT)) {
+ if (!(gfp_temp & __GFP_WAIT)) {
spin_unlock_irqrestore(&pool->lock, flags);
+ /* raced with another mempool_alloc? */
+ if (gfp_mask & __GFP_WAIT)
+ goto repeat_alloc;
return NULL;
}
/* Let's wait for someone else to return an element to @pool */
- gfp_temp = gfp_mask;
init_wait(&wait);
prepare_to_wait(&pool->wait, &wait, TASK_UNINTERRUPTIBLE);
^ permalink raw reply [flat|nested] 3+ messages in thread* Re: + mempool-fix-first-round-failure-behavior.patch added to -mm tree
2011-12-22 16:39 + mempool-fix-first-round-failure-behavior.patch added to -mm tree Oleg Nesterov
@ 2011-12-22 16:49 ` Tejun Heo
0 siblings, 0 replies; 3+ messages in thread
From: Tejun Heo @ 2011-12-22 16:49 UTC (permalink / raw)
To: Oleg Nesterov; +Cc: Andrew Morton, linux-kernel
Hello, Oleg.
On Thu, Dec 22, 2011 at 05:39:00PM +0100, Oleg Nesterov wrote:
> I can't even explain why this (simple!) logic looks confusing to me,
Yeah, gfp_mask and temp confused me pretty good too.
> with or without the patch. A couple of questions:
>
> 1. Why do we remove __GFP_WAIT unconditionally before the the
> very 1st allocation?
To avoid blocking when there's pool sitting around.
> 2. Why do we always restore it after io_schedule(), even if
> we have the reserved items?
No idea.
> @@ -212,10 +212,12 @@ void * mempool_alloc(mempool_t *pool, gf
> gfp_mask |= __GFP_NORETRY; /* don't loop in __alloc_pages */
> gfp_mask |= __GFP_NOWARN; /* failures are OK */
>
> - gfp_temp = gfp_mask & ~(__GFP_WAIT|__GFP_IO);
> -
> repeat_alloc:
>
> + gfp_temp = gfp_mask;
> + if (pool->curr_nr)
> + gfp_temp &= ~(__GFP_WAIT|__GFP_IO);
> +
> element = pool->alloc(gfp_temp, pool->pool_data);
> if (likely(element != NULL))
> return element;
> @@ -229,13 +231,15 @@ repeat_alloc:
> }
>
> /* We must not sleep in the GFP_ATOMIC case */
> - if (!(gfp_mask & __GFP_WAIT)) {
> + if (!(gfp_temp & __GFP_WAIT)) {
> spin_unlock_irqrestore(&pool->lock, flags);
> + /* raced with another mempool_alloc? */
> + if (gfp_mask & __GFP_WAIT)
> + goto repeat_alloc;
> return NULL;
> }
>
> /* Let's wait for someone else to return an element to @pool */
> - gfp_temp = gfp_mask;
> init_wait(&wait);
> prepare_to_wait(&pool->wait, &wait, TASK_UNINTERRUPTIBLE);
Yeah, this one definitely looks better & makes more sense. Andrew,
please feel free to drop mine and take this one.
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 3+ messages in thread
* + mempool-fix-first-round-failure-behavior.patch added to -mm tree
@ 2011-12-22 0:33 akpm
0 siblings, 0 replies; 3+ messages in thread
From: akpm @ 2011-12-22 0:33 UTC (permalink / raw)
To: mm-commits; +Cc: tj
The patch titled
Subject: mempool: fix first round failure behavior
has been added to the -mm tree. Its filename is
mempool-fix-first-round-failure-behavior.patch
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/SubmitChecklist when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Tejun Heo <tj@kernel.org>
Subject: mempool: fix first round failure behavior
For the initial allocation, mempool passes modified gfp mask to the
backing allocator so that it doesn't try too hard when there are reserved
elements waiting in the pool; however, when that allocation fails and pool
is empty too, it either waits for the pool to be replenished before
retrying or fails if !__GFP_WAIT.
* If the caller was calling in with GFP_ATOMIC, it never gets to try
emergency reserve. Allocations which would have succeeded without
mempool may fail, which is just wrong.
* Allocation which could have succeeded after a bit of reclaim now has
to wait on the reserved items and it's not like mempool doesn't retry
with the original gfp mask. It just does that *after* someone returns
an element, pointlessly delaying things.
Fix it by retrying immediately with the gfp mask requested by the caller
if the first round of allocation attempts fails with modified mask.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/mempool.c | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)
diff -puN mm/mempool.c~mempool-fix-first-round-failure-behavior mm/mempool.c
--- a/mm/mempool.c~mempool-fix-first-round-failure-behavior
+++ a/mm/mempool.c
@@ -221,14 +221,24 @@ repeat_alloc:
return element;
}
- /* We must not sleep in the GFP_ATOMIC case */
+ /*
+ * We use modified gfp mask for the first round. If alloc failed
+ * with that and @pool was empty too, immediately retry with the
+ * original gfp mask.
+ */
+ if (gfp_temp != gfp_mask) {
+ gfp_temp = gfp_mask;
+ spin_unlock_irqrestore(&pool->lock, flags);
+ goto repeat_alloc;
+ }
+
+ /* We must not sleep if !__GFP_WAIT */
if (!(gfp_mask & __GFP_WAIT)) {
spin_unlock_irqrestore(&pool->lock, flags);
return NULL;
}
/* Let's wait for someone else to return an element to @pool */
- gfp_temp = gfp_mask;
init_wait(&wait);
prepare_to_wait(&pool->wait, &wait, TASK_UNINTERRUPTIBLE);
_
Subject: Subject: mempool: fix first round failure behavior
Patches currently in -mm which might be from tj@kernel.org are
origin.patch
linux-next.patch
mm-page_alloc-generalize-order-handling-in-__free_pages_bootmem.patch
mm-bootmem-drop-superfluous-range-check-when-freeing-pages-in-bulk.patch
mm-bootmem-try-harder-to-free-pages-in-bulk.patch
mempool-fix-and-document-synchronization-and-memory-barrier-usage.patch
mempool-drop-unnecessary-and-incorrect-bug_on-from-mempool_destroy.patch
mempool-fix-first-round-failure-behavior.patch
procfs-make-proc_get_link-to-use-dentry-instead-of-inode.patch
procfs-introduce-the-proc-pid-map_files-directory.patch
procfs-introduce-the-proc-pid-map_files-directory-make-proc-pid-map_files-being-checkpoint_restore-dependent.patch
workqueue-make-alloc_workqueue-take-printf-fmt-and-args-for-name.patch
workqueue-make-alloc_workqueue-take-printf-fmt-and-args-for-name-fix.patch
sysctl-add-the-kernelns_last_pid-control.patch
scatterlist-new-helper-functions.patch
scatterlist-new-helper-functions-update.patch
scatterlist-new-helper-functions-update-fix.patch
c-r-introduce-checkpoint_restore-symbol.patch
c-r-procfs-add-start_data-end_data-start_brk-members-to-proc-pid-stat-v4.patch
c-r-procfs-add-start_data-end_data-start_brk-members-to-proc-pid-stat-v4-fix.patch
c-r-prctl-add-pr_set_mm-codes-to-set-up-mm_struct-entries.patch
c-r-prctl-add-pr_set_mm-codes-to-set-up-mm_struct-entries-fix.patch
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2011-12-22 16:50 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-12-22 16:39 + mempool-fix-first-round-failure-behavior.patch added to -mm tree Oleg Nesterov
2011-12-22 16:49 ` Tejun Heo
-- strict thread matches above, loose matches on Subject: below --
2011-12-22 0:33 akpm
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.