linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: mhocko@kernel.org
To: LKML <linux-kernel@vger.kernel.org>
Cc: <linux-mm@kvack.org>, <linux-fsdevel@vger.kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>,
	Dave Chinner <david@fromorbit.com>,
	"Theodore Ts'o" <tytso@mit.edu>,
	linux-btrfs@vger.kernel.org, linux-ext4@vger.kernel.org,
	Jan Kara <jack@suse.cz>, Michal Hocko <mhocko@suse.com>
Subject: [RFC 5/8] ext4: Do not fail journal due to block allocator
Date: Wed,  5 Aug 2015 11:51:21 +0200	[thread overview]
Message-ID: <1438768284-30927-6-git-send-email-mhocko@kernel.org> (raw)
In-Reply-To: <1438768284-30927-1-git-send-email-mhocko@kernel.org>

From: Michal Hocko <mhocko@suse.com>

Since "mm: page_alloc: do not lock up GFP_NOFS allocations upon OOM"
memory allocator doesn't endlessly loop to satisfy low-order allocations
and instead fails them to allow callers to handle them gracefully.

Some of the callers are not yet prepared for this behavior though. ext4
block allocator relies solely on GFP_NOFS allocation requests and
allocation failures lead to aborting yournal too easily:

[  345.028333] oom-trash: page allocation failure: order:0, mode:0x50
[  345.028336] CPU: 1 PID: 8334 Comm: oom-trash Tainted: G        W       4.0.0-nofs3-00006-gdfe9931f5f68 #588
[  345.028337] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150428_134905-gandalf 04/01/2014
[  345.028339]  0000000000000000 ffff880005a17708 ffffffff81538a54 ffffffff8107a40f
[  345.028341]  0000000000000050 ffff880005a17798 ffffffff810fe854 0000000180000000
[  345.028342]  0000000000000046 0000000000000000 ffffffff81a52100 0000000000000246
[  345.028343] Call Trace:
[  345.028348]  [<ffffffff81538a54>] dump_stack+0x4f/0x7b
[  345.028370]  [<ffffffff810fe854>] warn_alloc_failed+0x12a/0x13f
[  345.028373]  [<ffffffff81101bd2>] __alloc_pages_nodemask+0x7f3/0x8aa
[  345.028375]  [<ffffffff810f9933>] pagecache_get_page+0x12a/0x1c9
[  345.028390]  [<ffffffffa005bc64>] ext4_mb_load_buddy+0x220/0x367 [ext4]
[  345.028414]  [<ffffffffa006014f>] ext4_free_blocks+0x522/0xa4c [ext4]
[  345.028425]  [<ffffffffa0054e14>] ext4_ext_remove_space+0x833/0xf22 [ext4]
[  345.028434]  [<ffffffffa005677e>] ext4_ext_truncate+0x8c/0xb0 [ext4]
[  345.028441]  [<ffffffffa00342bf>] ext4_truncate+0x20b/0x38d [ext4]
[  345.028462]  [<ffffffffa003573c>] ext4_evict_inode+0x32b/0x4c1 [ext4]
[  345.028464]  [<ffffffff8116d04f>] evict+0xa0/0x148
[  345.028466]  [<ffffffff8116dca8>] iput+0x1a1/0x1f0
[  345.028468]  [<ffffffff811697b4>] __dentry_kill+0x136/0x1a6
[  345.028470]  [<ffffffff81169a3e>] dput+0x21a/0x243
[  345.028472]  [<ffffffff81157cda>] __fput+0x184/0x19b
[  345.028473]  [<ffffffff81157d29>] ____fput+0xe/0x10
[  345.028475]  [<ffffffff8105a05f>] task_work_run+0x8a/0xa1
[  345.028477]  [<ffffffff810452f0>] do_exit+0x3c6/0x8dc
[  345.028482]  [<ffffffff8104588a>] do_group_exit+0x4d/0xb2
[  345.028483]  [<ffffffff8104eeeb>] get_signal+0x5b1/0x5f5
[  345.028488]  [<ffffffff81002202>] do_signal+0x28/0x5d0
[...]
[  345.028624] EXT4-fs error (device hdb1) in ext4_free_blocks:4879: Out of memory
[  345.033097] Aborting journal on device hdb1-8.
[  345.036339] EXT4-fs (hdb1): Remounting filesystem read-only
[  345.036344] EXT4-fs error (device hdb1) in ext4_reserve_inode_write:4834: Journal has aborted
[  345.036766] EXT4-fs error (device hdb1) in ext4_reserve_inode_write:4834: Journal has aborted
[  345.038583] EXT4-fs error (device hdb1) in ext4_ext_remove_space:3048: Journal has aborted
[  345.049115] EXT4-fs error (device hdb1) in ext4_ext_truncate:4669: Journal has aborted
[  345.050434] EXT4-fs error (device hdb1) in ext4_reserve_inode_write:4834: Journal has aborted
[  345.053064] EXT4-fs error (device hdb1) in ext4_truncate:3668: Journal has aborted
[  345.053582] EXT4-fs error (device hdb1) in ext4_reserve_inode_write:4834: Journal has aborted
[  345.053946] EXT4-fs error (device hdb1) in ext4_orphan_del:2686: Journal has aborted
[  345.055367] EXT4-fs error (device hdb1) in ext4_reserve_inode_write:4834: Journal has aborted

The failure is really premature because GFP_NOFS allocation context is
very restricted - especially in the fs metadata heavy loads. Before we
go with a more sofisticated solution, let's simply imitate the previous
behavior of non-failing NOFS allocation and use __GFP_NOFAIL for the
buddy block allocator. I wasn't able to trigger the issue with this
patch anymore.

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 fs/ext4/mballoc.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 5b1613a54307..e6361622bfd5 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -992,7 +992,8 @@ static int ext4_mb_get_buddy_page_lock(struct super_block *sb,
 	block = group * 2;
 	pnum = block / blocks_per_page;
 	poff = block % blocks_per_page;
-	page = find_or_create_page(inode->i_mapping, pnum, GFP_NOFS);
+	page = find_or_create_page(inode->i_mapping, pnum,
+				   GFP_NOFS|__GFP_NOFAIL);
 	if (!page)
 		return -ENOMEM;
 	BUG_ON(page->mapping != inode->i_mapping);
@@ -1006,7 +1007,8 @@ static int ext4_mb_get_buddy_page_lock(struct super_block *sb,
 
 	block++;
 	pnum = block / blocks_per_page;
-	page = find_or_create_page(inode->i_mapping, pnum, GFP_NOFS);
+	page = find_or_create_page(inode->i_mapping, pnum,
+				   GFP_NOFS|__GFP_NOFAIL);
 	if (!page)
 		return -ENOMEM;
 	BUG_ON(page->mapping != inode->i_mapping);
@@ -1158,7 +1160,8 @@ ext4_mb_load_buddy(struct super_block *sb, ext4_group_t group,
 			 * wait for it to initialize.
 			 */
 			page_cache_release(page);
-		page = find_or_create_page(inode->i_mapping, pnum, GFP_NOFS);
+		page = find_or_create_page(inode->i_mapping, pnum,
+					   GFP_NOFS|__GFP_NOFAIL);
 		if (page) {
 			BUG_ON(page->mapping != inode->i_mapping);
 			if (!PageUptodate(page)) {
@@ -1194,7 +1197,8 @@ ext4_mb_load_buddy(struct super_block *sb, ext4_group_t group,
 	if (page == NULL || !PageUptodate(page)) {
 		if (page)
 			page_cache_release(page);
-		page = find_or_create_page(inode->i_mapping, pnum, GFP_NOFS);
+		page = find_or_create_page(inode->i_mapping, pnum,
+					   GFP_NOFS|__GFP_NOFAIL);
 		if (page) {
 			BUG_ON(page->mapping != inode->i_mapping);
 			if (!PageUptodate(page)) {
-- 
2.5.0

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  parent reply	other threads:[~2015-08-05  9:51 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-08-05  9:51 [RFC 0/8] Allow GFP_NOFS allocation to fail mhocko
2015-08-05  9:51 ` [RFC 1/8] mm, oom: Give __GFP_NOFAIL allocations access to memory reserves mhocko
2015-08-05  9:51 ` [RFC 2/8] mm: Allow GFP_IOFS for page_cache_read page cache allocation mhocko
2015-08-05  9:51 ` [RFC 3/8] mm: page_alloc: do not lock up GFP_NOFS allocations upon OOM mhocko
2015-08-05  9:51 ` [RFC 4/8] jbd, jbd2: Do not fail journal because of frozen_buffer allocation failure mhocko
2015-08-05 11:42   ` Jan Kara
2015-08-05 16:49   ` Greg Thelen
2015-08-12  9:14     ` Michal Hocko
2015-08-15 13:54       ` Theodore Ts'o
2015-08-18 10:36         ` Michal Hocko
2015-08-24 12:06         ` Michal Hocko
2015-08-18 10:38   ` [RFC -v2 " Michal Hocko
2015-08-05  9:51 ` mhocko [this message]
2015-08-05 11:43   ` [RFC 5/8] ext4: Do not fail journal due to block allocator Jan Kara
2015-08-18 10:39   ` [RFC -v2 " Michal Hocko
2015-08-18 10:55     ` Michal Hocko
2015-08-05  9:51 ` [RFC 6/8] ext3: Do not abort journal prematurely mhocko
2015-08-18 10:39   ` [RFC -v2 " Michal Hocko
2015-08-05  9:51 ` [RFC 7/8] btrfs: Prevent from early transaction abort mhocko
2015-08-05 16:31   ` David Sterba
2015-08-18 10:40   ` [RFC -v2 " Michal Hocko
2015-08-18 11:01     ` Michal Hocko
2015-08-18 17:11     ` Chris Mason
2015-08-18 17:29       ` Michal Hocko
2015-08-19 12:26         ` Michal Hocko
2015-08-05  9:51 ` [RFC 8/8] btrfs: use __GFP_NOFAIL in alloc_btrfs_bio mhocko
2015-08-05 16:32   ` David Sterba
2015-08-18 10:41   ` [RFC -v2 " Michal Hocko
2015-08-05 19:58 ` [RFC 0/8] Allow GFP_NOFS allocation to fail Andreas Dilger
2015-08-06 14:34 ` Michal Hocko
2015-09-07 16:51 ` Tetsuo Handa
2015-09-15 13:16   ` Tetsuo Handa

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1438768284-30927-6-git-send-email-mhocko@kernel.org \
    --to=mhocko@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=david@fromorbit.com \
    --cc=hannes@cmpxchg.org \
    --cc=jack@suse.cz \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=penguin-kernel@I-love.SAKURA.ne.jp \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).