[PATCH]: buddy allocator: ext3 failed to alloc with __GFP

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH]: buddy allocator: ext3 failed to alloc with __GFP_NOFAIL
@ 2005-11-08 13:48 Kirill Korotaev
  2005-11-09 19:46 ` Kirill Korotaev
  0 siblings, 1 reply; 4+ messages in thread
From: Kirill Korotaev @ 2005-11-08 13:48 UTC (permalink / raw)
  To: Andrew Morton, xemul, linux-kernel, st

[-- Attachment #1: Type: text/plain, Size: 2718 bytes --]

Hello Andrew,

we had the following ext3 problems once during stress testing (on 2.6.8):
-----------------------------------------------------------------
journal_get_undo_access: No memory for committed data
ext3_free_blocks: aborting transaction: Out of memory in 
__ext3_journal_get_undo_access<2>EXT3-fs error (device hda7) in 
ext3_free_blocks: Out of memory
Aborting journal on device hda7.
EXT3-fs error (device hda7) in ext3_ordered_commit_write: IO failure
ext3_abort called.
EXT3-fs abort (device hda7): ext3_journal_start: Detected aborted journal
Remounting filesystem read-only
ext3_free_blocks: aborting transaction: Journal has aborted in 
__ext3_journal_get_undo_access<2>EXT3-fs error (device hda7) in 
ext3_free_blocks: Journal has aborted
ext3_reserve_inode_write: aborting transaction: Journal has aborted in 
__ext3_journal_get_write_access<2>EXT3-fs error (device hda7) in 
ext3_reserve_inode_write: Journal has aborted
EXT3-fs error (device hda7) in ext3_truncate: Out of memory
ext3_reserve_inode_write: aborting transaction: Journal has aborted in 
__ext3_journal_get_write_access<2>EXT3-fs error (device hda7) in 
ext3_reserve_inode_write: Journal has aborted
EXT3-fs error (device hda7) in ext3_orphan_del: Journal has aborted
ext3_reserve_inode_write: aborting transaction: Journal has aborted in 
__ext3_journal_get_write_access<2>EXT3-fs error (device hda7) in 
ext3_reserve_inode_write: Journal has aborted
EXT3-fs error (device hda7) in ext3_delete_inode: Out of memory
__journal_remove_journal_head: freeing b_committed_data
..................
------------------------------------------------------------------

As it is seen from the messages journal_get_undo_access() failed to 
allocate some memory with jbd_kmalloc(), which in turn called kmalloc() 
with __GFP_NOFAIL flag.

How could it happen?
The only possible reason for this we suppose is a piece of code in 
__alloc_pages():

if ((p->flags & (PF_MEMALLOC | PF_MEMDIE)) && !in_interrupt()) {
         /* go through the zonelist yet again, ignoring mins */
         for (i = 0; zones[i] != NULL; i++) {
                 struct zone *z = zones[i];

                 page = buffered_rmqueue(z, order, gfp_mask);
                 if (page) {
                         zone_statistics(zonelist, z);
                         goto got_pg;
                 }
         }
         goto nopage;				<<<< HERE!!! FAIL...
}


So kswapd (which has PF_MEMALLOC flag) can fail to allocate memory even 
when it allocates it with __GFP_NOFAIL flag.

Signed-Off-By: Pavel Emelianov <xemul@sw.ru>
Signed-Off-By: Denis Lunev <den@sw.ru>
Signed-Off-By: Kirill Korotaev <dev@sw.ru>

The attached patch should fix the problem (against 2.6.14).

Kirill

[-- Attachment #2: diff-alloc-nofail --]
[-- Type: text/plain, Size: 931 bytes --]

--- ./mm/page_alloc.c.mmreb	2005-10-28 04:02:08.000000000 +0400
+++ ./mm/page_alloc.c	2005-11-08 16:32:36.000000000 +0300
@@ -867,6 +867,7 @@ zone_reclaim_retry:
 
 	/* This allocation should allow future memory freeing. */
 
+rebalance:
 	if (((p->flags & PF_MEMALLOC) || unlikely(test_thread_flag(TIF_MEMDIE)))
 			&& !in_interrupt()) {
 		if (!(gfp_mask & __GFP_NOMEMALLOC)) {
@@ -879,14 +880,13 @@ zone_reclaim_retry:
 					goto got_pg;
 			}
 		}
-		goto nopage;
+		goto empty;
 	}
 
 	/* Atomic allocations - we can't balance anything */
 	if (!wait)
 		goto nopage;
 
-rebalance:
 	cond_resched();
 
 	/* We now go into synchronous reclaim */
@@ -946,6 +946,7 @@ rebalance:
 	 * In this implementation, __GFP_REPEAT means __GFP_NOFAIL for order
 	 * <= 3, but that may not be true in other implementations.
 	 */
+empty:
 	do_retry = 0;
 	if (!(gfp_mask & __GFP_NORETRY)) {
 		if ((order <= 3) || (gfp_mask & __GFP_REPEAT))

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH]: buddy allocator: ext3 failed to alloc with __GFP_NOFAIL
  2005-11-08 13:48 [PATCH]: buddy allocator: ext3 failed to alloc with __GFP_NOFAIL Kirill Korotaev
@ 2005-11-09 19:46 ` Kirill Korotaev
  2005-11-09 20:20   ` Andrew Morton
  2005-11-10  3:18   ` Andrew Morton
  0 siblings, 2 replies; 4+ messages in thread
From: Kirill Korotaev @ 2005-11-09 19:46 UTC (permalink / raw)
  To: Andrey Savochkin; +Cc: Andrew Morton, xemul, linux-kernel, den

[-- Attachment #1: Type: text/plain, Size: 3077 bytes --]

Hello,

as Andrey Savochkin pointed to me, the previous patch is incorrect since 
makes allocations with PF_MEMALLOC to be always success for order <= 3 
which is not what we usually want.
I remade the patch to be more explicit about the issue - now it retries 
allocation _only_ if __GFP_NOFAIL is set.

------------- original comment below ------------------

we had the following ext3 problems once during stress testing (on 2.6.8):
-----------------------------------------------------------------
journal_get_undo_access: No memory for committed data
ext3_free_blocks: aborting transaction: Out of memory in 
__ext3_journal_get_undo_access<2>EXT3-fs error (device hda7) in 
ext3_free_blocks: Out of memory
Aborting journal on device hda7.
EXT3-fs error (device hda7) in ext3_ordered_commit_write: IO failure
ext3_abort called.
EXT3-fs abort (device hda7): ext3_journal_start: Detected aborted journal
Remounting filesystem read-only
ext3_free_blocks: aborting transaction: Journal has aborted in 
__ext3_journal_get_undo_access<2>EXT3-fs error (device hda7) in 
ext3_free_blocks: Journal has aborted
ext3_reserve_inode_write: aborting transaction: Journal has aborted in 
__ext3_journal_get_write_access<2>EXT3-fs error (device hda7) in 
ext3_reserve_inode_write: Journal has aborted
EXT3-fs error (device hda7) in ext3_truncate: Out of memory
ext3_reserve_inode_write: aborting transaction: Journal has aborted in 
__ext3_journal_get_write_access<2>EXT3-fs error (device hda7) in 
ext3_reserve_inode_write: Journal has aborted
EXT3-fs error (device hda7) in ext3_orphan_del: Journal has aborted
ext3_reserve_inode_write: aborting transaction: Journal has aborted in 
__ext3_journal_get_write_access<2>EXT3-fs error (device hda7) in 
ext3_reserve_inode_write: Journal has aborted
EXT3-fs error (device hda7) in ext3_delete_inode: Out of memory
__journal_remove_journal_head: freeing b_committed_data
..................
------------------------------------------------------------------

As it is seen from the messages journal_get_undo_access() failed to 
allocate some memory with jbd_kmalloc(), which in turn called kmalloc() 
with __GFP_NOFAIL flag.

How could it happen?
The only possible reason for this we suppose is a piece of code in 
__alloc_pages():

if ((p->flags & (PF_MEMALLOC | PF_MEMDIE)) && !in_interrupt()) {
         /* go through the zonelist yet again, ignoring mins */
         for (i = 0; zones[i] != NULL; i++) {
                 struct zone *z = zones[i];

                 page = buffered_rmqueue(z, order, gfp_mask);
                 if (page) {
                         zone_statistics(zonelist, z);
                         goto got_pg;
                 }
         }
         goto nopage;                <<<< HERE!!! FAIL...
}


So kswapd (which has PF_MEMALLOC flag) can fail to allocate memory even 
when it allocates it with __GFP_NOFAIL flag.

Signed-Off-By: Pavel Emelianov <xemul@sw.ru>
Signed-Off-By: Denis Lunev <den@sw.ru>
Signed-Off-By: Kirill Korotaev <dev@sw.ru>

The attached patch should fix the problem (against 2.6.14).

Kirill

[-- Attachment #2: diff-alloc-nofail --]
[-- Type: text/plain, Size: 679 bytes --]

--- ./mm/page_alloc.c.alpg	2005-11-09 21:42:50.000000000 +0300
+++ ./mm/page_alloc.c	2005-11-09 21:44:22.000000000 +0300
@@ -870,6 +870,7 @@ zone_reclaim_retry:
 	if (((p->flags & PF_MEMALLOC) || unlikely(test_thread_flag(TIF_MEMDIE)))
 			&& !in_interrupt()) {
 		if (!(gfp_mask & __GFP_NOMEMALLOC)) {
+nofail_alloc:
 			/* go through the zonelist yet again, ignoring mins */
 			for (i = 0; (z = zones[i]) != NULL; i++) {
 				if (!cpuset_zone_allowed(z, gfp_mask))
@@ -878,6 +879,10 @@ zone_reclaim_retry:
 				if (page)
 					goto got_pg;
 			}
+			if (gfp_mask & __GFP_NOFAIL) {
+				blk_congestion_wait(WRITE, HZ/50);
+				goto nofail_alloc;
+			}
 		}
 		goto nopage;
 	}

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH]: buddy allocator: ext3 failed to alloc with __GFP_NOFAIL
  2005-11-09 19:46 ` Kirill Korotaev
@ 2005-11-09 20:20   ` Andrew Morton
  2005-11-10  3:18   ` Andrew Morton
  1 sibling, 0 replies; 4+ messages in thread
From: Andrew Morton @ 2005-11-09 20:20 UTC (permalink / raw)
  To: Kirill Korotaev; +Cc: saw, xemul, linux-kernel, den

Kirill Korotaev <dev@sw.ru> wrote:
>
> as Andrey Savochkin pointed to me, the previous patch is incorrect since 
>  makes allocations with PF_MEMALLOC to be always success for order <= 3 
>  which is not what we usually want.

OK, thanks.  This patch was still in my
things-i-need-to-spend-half-an-hour-thinking-about queue anyway.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH]: buddy allocator: ext3 failed to alloc with __GFP_NOFAIL
  2005-11-09 19:46 ` Kirill Korotaev
  2005-11-09 20:20   ` Andrew Morton
@ 2005-11-10  3:18   ` Andrew Morton
  1 sibling, 0 replies; 4+ messages in thread
From: Andrew Morton @ 2005-11-10  3:18 UTC (permalink / raw)
  To: Kirill Korotaev; +Cc: saw, xemul, linux-kernel, den, Nick Piggin

Kirill Korotaev <dev@sw.ru> wrote:
>
> So kswapd (which has PF_MEMALLOC flag) can fail to allocate memory even 
>  when it allocates it with __GFP_NOFAIL flag.
> 
>  --- ./mm/page_alloc.c.alpg	2005-11-09 21:42:50.000000000 +0300
>  +++ ./mm/page_alloc.c	2005-11-09 21:44:22.000000000 +0300
>  @@ -870,6 +870,7 @@ zone_reclaim_retry:
>   	if (((p->flags & PF_MEMALLOC) || unlikely(test_thread_flag(TIF_MEMDIE)))
>   			&& !in_interrupt()) {
>   		if (!(gfp_mask & __GFP_NOMEMALLOC)) {
>  +nofail_alloc:
>   			/* go through the zonelist yet again, ignoring mins */
>   			for (i = 0; (z = zones[i]) != NULL; i++) {
>   				if (!cpuset_zone_allowed(z, gfp_mask))
>  @@ -878,6 +879,10 @@ zone_reclaim_retry:
>   				if (page)
>   					goto got_pg;
>   			}
>  +			if (gfp_mask & __GFP_NOFAIL) {
>  +				blk_congestion_wait(WRITE, HZ/50);
>  +				goto nofail_alloc;
>  +			}
>   		}
>   		goto nopage;
>   	}

The problem here is that we'll loop if TIF_MEMDIE is set.

But given that the caller has specified __GFP_NOFAIL, I think that's
correct behaviour - __GFP_NOFAIL means "I am lame, and will oops if you
cannot allocate memory".   So we just ignore TIF_MEMDIE..

That being said, why do we need another loop here?  Would it not
be sufficient to do:

--- devel/mm/page_alloc.c~a	2005-11-09 19:15:03.000000000 -0800
+++ devel-akpm/mm/page_alloc.c	2005-11-09 19:15:32.000000000 -0800
@@ -907,7 +907,8 @@ zone_reclaim_retry:
 					goto got_pg;
 			}
 		}
-		goto nopage;
+		if (!(gfp_mask & __GFP_NOFAIL))
+			goto nopage;
 	}
 
 	/* Atomic allocations - we can't balance anything */
_

Answer: because that way we'll go recursive if PF_MEMALLOC is set.  Ho-hum.

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2005-11-10  3:22 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2005-11-08 13:48 [PATCH]: buddy allocator: ext3 failed to alloc with __GFP_NOFAIL Kirill Korotaev
2005-11-09 19:46 ` Kirill Korotaev
2005-11-09 20:20   ` Andrew Morton
2005-11-10  3:18   ` Andrew Morton

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.