* [PATCH]: buddy allocator: ext3 failed to alloc with __GFP_NOFAIL
@ 2005-11-08 13:48 Kirill Korotaev
2005-11-09 19:46 ` Kirill Korotaev
0 siblings, 1 reply; 4+ messages in thread
From: Kirill Korotaev @ 2005-11-08 13:48 UTC (permalink / raw)
To: Andrew Morton, xemul, linux-kernel, st
[-- Attachment #1: Type: text/plain, Size: 2718 bytes --]
Hello Andrew,
we had the following ext3 problems once during stress testing (on 2.6.8):
-----------------------------------------------------------------
journal_get_undo_access: No memory for committed data
ext3_free_blocks: aborting transaction: Out of memory in
__ext3_journal_get_undo_access<2>EXT3-fs error (device hda7) in
ext3_free_blocks: Out of memory
Aborting journal on device hda7.
EXT3-fs error (device hda7) in ext3_ordered_commit_write: IO failure
ext3_abort called.
EXT3-fs abort (device hda7): ext3_journal_start: Detected aborted journal
Remounting filesystem read-only
ext3_free_blocks: aborting transaction: Journal has aborted in
__ext3_journal_get_undo_access<2>EXT3-fs error (device hda7) in
ext3_free_blocks: Journal has aborted
ext3_reserve_inode_write: aborting transaction: Journal has aborted in
__ext3_journal_get_write_access<2>EXT3-fs error (device hda7) in
ext3_reserve_inode_write: Journal has aborted
EXT3-fs error (device hda7) in ext3_truncate: Out of memory
ext3_reserve_inode_write: aborting transaction: Journal has aborted in
__ext3_journal_get_write_access<2>EXT3-fs error (device hda7) in
ext3_reserve_inode_write: Journal has aborted
EXT3-fs error (device hda7) in ext3_orphan_del: Journal has aborted
ext3_reserve_inode_write: aborting transaction: Journal has aborted in
__ext3_journal_get_write_access<2>EXT3-fs error (device hda7) in
ext3_reserve_inode_write: Journal has aborted
EXT3-fs error (device hda7) in ext3_delete_inode: Out of memory
__journal_remove_journal_head: freeing b_committed_data
..................
------------------------------------------------------------------
As it is seen from the messages journal_get_undo_access() failed to
allocate some memory with jbd_kmalloc(), which in turn called kmalloc()
with __GFP_NOFAIL flag.
How could it happen?
The only possible reason for this we suppose is a piece of code in
__alloc_pages():
if ((p->flags & (PF_MEMALLOC | PF_MEMDIE)) && !in_interrupt()) {
/* go through the zonelist yet again, ignoring mins */
for (i = 0; zones[i] != NULL; i++) {
struct zone *z = zones[i];
page = buffered_rmqueue(z, order, gfp_mask);
if (page) {
zone_statistics(zonelist, z);
goto got_pg;
}
}
goto nopage; <<<< HERE!!! FAIL...
}
So kswapd (which has PF_MEMALLOC flag) can fail to allocate memory even
when it allocates it with __GFP_NOFAIL flag.
Signed-Off-By: Pavel Emelianov <xemul@sw.ru>
Signed-Off-By: Denis Lunev <den@sw.ru>
Signed-Off-By: Kirill Korotaev <dev@sw.ru>
The attached patch should fix the problem (against 2.6.14).
Kirill
[-- Attachment #2: diff-alloc-nofail --]
[-- Type: text/plain, Size: 931 bytes --]
--- ./mm/page_alloc.c.mmreb 2005-10-28 04:02:08.000000000 +0400
+++ ./mm/page_alloc.c 2005-11-08 16:32:36.000000000 +0300
@@ -867,6 +867,7 @@ zone_reclaim_retry:
/* This allocation should allow future memory freeing. */
+rebalance:
if (((p->flags & PF_MEMALLOC) || unlikely(test_thread_flag(TIF_MEMDIE)))
&& !in_interrupt()) {
if (!(gfp_mask & __GFP_NOMEMALLOC)) {
@@ -879,14 +880,13 @@ zone_reclaim_retry:
goto got_pg;
}
}
- goto nopage;
+ goto empty;
}
/* Atomic allocations - we can't balance anything */
if (!wait)
goto nopage;
-rebalance:
cond_resched();
/* We now go into synchronous reclaim */
@@ -946,6 +946,7 @@ rebalance:
* In this implementation, __GFP_REPEAT means __GFP_NOFAIL for order
* <= 3, but that may not be true in other implementations.
*/
+empty:
do_retry = 0;
if (!(gfp_mask & __GFP_NORETRY)) {
if ((order <= 3) || (gfp_mask & __GFP_REPEAT))
^ permalink raw reply [flat|nested] 4+ messages in thread* Re: [PATCH]: buddy allocator: ext3 failed to alloc with __GFP_NOFAIL
2005-11-08 13:48 [PATCH]: buddy allocator: ext3 failed to alloc with __GFP_NOFAIL Kirill Korotaev
@ 2005-11-09 19:46 ` Kirill Korotaev
2005-11-09 20:20 ` Andrew Morton
2005-11-10 3:18 ` Andrew Morton
0 siblings, 2 replies; 4+ messages in thread
From: Kirill Korotaev @ 2005-11-09 19:46 UTC (permalink / raw)
To: Andrey Savochkin; +Cc: Andrew Morton, xemul, linux-kernel, den
[-- Attachment #1: Type: text/plain, Size: 3077 bytes --]
Hello,
as Andrey Savochkin pointed to me, the previous patch is incorrect since
makes allocations with PF_MEMALLOC to be always success for order <= 3
which is not what we usually want.
I remade the patch to be more explicit about the issue - now it retries
allocation _only_ if __GFP_NOFAIL is set.
------------- original comment below ------------------
we had the following ext3 problems once during stress testing (on 2.6.8):
-----------------------------------------------------------------
journal_get_undo_access: No memory for committed data
ext3_free_blocks: aborting transaction: Out of memory in
__ext3_journal_get_undo_access<2>EXT3-fs error (device hda7) in
ext3_free_blocks: Out of memory
Aborting journal on device hda7.
EXT3-fs error (device hda7) in ext3_ordered_commit_write: IO failure
ext3_abort called.
EXT3-fs abort (device hda7): ext3_journal_start: Detected aborted journal
Remounting filesystem read-only
ext3_free_blocks: aborting transaction: Journal has aborted in
__ext3_journal_get_undo_access<2>EXT3-fs error (device hda7) in
ext3_free_blocks: Journal has aborted
ext3_reserve_inode_write: aborting transaction: Journal has aborted in
__ext3_journal_get_write_access<2>EXT3-fs error (device hda7) in
ext3_reserve_inode_write: Journal has aborted
EXT3-fs error (device hda7) in ext3_truncate: Out of memory
ext3_reserve_inode_write: aborting transaction: Journal has aborted in
__ext3_journal_get_write_access<2>EXT3-fs error (device hda7) in
ext3_reserve_inode_write: Journal has aborted
EXT3-fs error (device hda7) in ext3_orphan_del: Journal has aborted
ext3_reserve_inode_write: aborting transaction: Journal has aborted in
__ext3_journal_get_write_access<2>EXT3-fs error (device hda7) in
ext3_reserve_inode_write: Journal has aborted
EXT3-fs error (device hda7) in ext3_delete_inode: Out of memory
__journal_remove_journal_head: freeing b_committed_data
..................
------------------------------------------------------------------
As it is seen from the messages journal_get_undo_access() failed to
allocate some memory with jbd_kmalloc(), which in turn called kmalloc()
with __GFP_NOFAIL flag.
How could it happen?
The only possible reason for this we suppose is a piece of code in
__alloc_pages():
if ((p->flags & (PF_MEMALLOC | PF_MEMDIE)) && !in_interrupt()) {
/* go through the zonelist yet again, ignoring mins */
for (i = 0; zones[i] != NULL; i++) {
struct zone *z = zones[i];
page = buffered_rmqueue(z, order, gfp_mask);
if (page) {
zone_statistics(zonelist, z);
goto got_pg;
}
}
goto nopage; <<<< HERE!!! FAIL...
}
So kswapd (which has PF_MEMALLOC flag) can fail to allocate memory even
when it allocates it with __GFP_NOFAIL flag.
Signed-Off-By: Pavel Emelianov <xemul@sw.ru>
Signed-Off-By: Denis Lunev <den@sw.ru>
Signed-Off-By: Kirill Korotaev <dev@sw.ru>
The attached patch should fix the problem (against 2.6.14).
Kirill
[-- Attachment #2: diff-alloc-nofail --]
[-- Type: text/plain, Size: 679 bytes --]
--- ./mm/page_alloc.c.alpg 2005-11-09 21:42:50.000000000 +0300
+++ ./mm/page_alloc.c 2005-11-09 21:44:22.000000000 +0300
@@ -870,6 +870,7 @@ zone_reclaim_retry:
if (((p->flags & PF_MEMALLOC) || unlikely(test_thread_flag(TIF_MEMDIE)))
&& !in_interrupt()) {
if (!(gfp_mask & __GFP_NOMEMALLOC)) {
+nofail_alloc:
/* go through the zonelist yet again, ignoring mins */
for (i = 0; (z = zones[i]) != NULL; i++) {
if (!cpuset_zone_allowed(z, gfp_mask))
@@ -878,6 +879,10 @@ zone_reclaim_retry:
if (page)
goto got_pg;
}
+ if (gfp_mask & __GFP_NOFAIL) {
+ blk_congestion_wait(WRITE, HZ/50);
+ goto nofail_alloc;
+ }
}
goto nopage;
}
^ permalink raw reply [flat|nested] 4+ messages in thread* Re: [PATCH]: buddy allocator: ext3 failed to alloc with __GFP_NOFAIL
2005-11-09 19:46 ` Kirill Korotaev
@ 2005-11-09 20:20 ` Andrew Morton
2005-11-10 3:18 ` Andrew Morton
1 sibling, 0 replies; 4+ messages in thread
From: Andrew Morton @ 2005-11-09 20:20 UTC (permalink / raw)
To: Kirill Korotaev; +Cc: saw, xemul, linux-kernel, den
Kirill Korotaev <dev@sw.ru> wrote:
>
> as Andrey Savochkin pointed to me, the previous patch is incorrect since
> makes allocations with PF_MEMALLOC to be always success for order <= 3
> which is not what we usually want.
OK, thanks. This patch was still in my
things-i-need-to-spend-half-an-hour-thinking-about queue anyway.
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH]: buddy allocator: ext3 failed to alloc with __GFP_NOFAIL
2005-11-09 19:46 ` Kirill Korotaev
2005-11-09 20:20 ` Andrew Morton
@ 2005-11-10 3:18 ` Andrew Morton
1 sibling, 0 replies; 4+ messages in thread
From: Andrew Morton @ 2005-11-10 3:18 UTC (permalink / raw)
To: Kirill Korotaev; +Cc: saw, xemul, linux-kernel, den, Nick Piggin
Kirill Korotaev <dev@sw.ru> wrote:
>
> So kswapd (which has PF_MEMALLOC flag) can fail to allocate memory even
> when it allocates it with __GFP_NOFAIL flag.
>
> --- ./mm/page_alloc.c.alpg 2005-11-09 21:42:50.000000000 +0300
> +++ ./mm/page_alloc.c 2005-11-09 21:44:22.000000000 +0300
> @@ -870,6 +870,7 @@ zone_reclaim_retry:
> if (((p->flags & PF_MEMALLOC) || unlikely(test_thread_flag(TIF_MEMDIE)))
> && !in_interrupt()) {
> if (!(gfp_mask & __GFP_NOMEMALLOC)) {
> +nofail_alloc:
> /* go through the zonelist yet again, ignoring mins */
> for (i = 0; (z = zones[i]) != NULL; i++) {
> if (!cpuset_zone_allowed(z, gfp_mask))
> @@ -878,6 +879,10 @@ zone_reclaim_retry:
> if (page)
> goto got_pg;
> }
> + if (gfp_mask & __GFP_NOFAIL) {
> + blk_congestion_wait(WRITE, HZ/50);
> + goto nofail_alloc;
> + }
> }
> goto nopage;
> }
The problem here is that we'll loop if TIF_MEMDIE is set.
But given that the caller has specified __GFP_NOFAIL, I think that's
correct behaviour - __GFP_NOFAIL means "I am lame, and will oops if you
cannot allocate memory". So we just ignore TIF_MEMDIE..
That being said, why do we need another loop here? Would it not
be sufficient to do:
--- devel/mm/page_alloc.c~a 2005-11-09 19:15:03.000000000 -0800
+++ devel-akpm/mm/page_alloc.c 2005-11-09 19:15:32.000000000 -0800
@@ -907,7 +907,8 @@ zone_reclaim_retry:
goto got_pg;
}
}
- goto nopage;
+ if (!(gfp_mask & __GFP_NOFAIL))
+ goto nopage;
}
/* Atomic allocations - we can't balance anything */
_
Answer: because that way we'll go recursive if PF_MEMALLOC is set. Ho-hum.
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2005-11-10 3:22 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2005-11-08 13:48 [PATCH]: buddy allocator: ext3 failed to alloc with __GFP_NOFAIL Kirill Korotaev
2005-11-09 19:46 ` Kirill Korotaev
2005-11-09 20:20 ` Andrew Morton
2005-11-10 3:18 ` Andrew Morton
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.