* [PATCH 1/2] btrfs: Prevent from early transaction abort
2015-08-19 12:17 [PATCH 0/2] btrfs: fortification for GFP_NOFS allocations mhocko
@ 2015-08-19 12:17 ` mhocko
2015-08-19 12:17 ` [PATCH 2/2] btrfs: use __GFP_NOFAIL in alloc_btrfs_bio mhocko
2015-08-19 18:17 ` [PATCH 0/2] btrfs: fortification for GFP_NOFS allocations Chris Mason
2 siblings, 0 replies; 6+ messages in thread
From: mhocko @ 2015-08-19 12:17 UTC (permalink / raw)
To: linux-btrfs
Cc: Chris Mason, Josef Bacik, David Sterba, linux-kernel,
Michal Hocko
From: Michal Hocko <mhocko@suse.com>
Btrfs relies on GFP_NOFS allocation when committing the transaction but
this allocation context is rather weak wrt. reclaim capabilities. The
page allocator currently tries hard to not fail these allocations if
they are small (<=PAGE_ALLOC_COSTLY_ORDER) so this is not a problem
currently but there is an attempt to move away from the default no-fail
behavior and allow these allocation to fail more eagerly. And this would
lead to a pre-mature transaction abort as follows:
[ 55.328093] Call Trace:
[ 55.328890] [<ffffffff8154e6f0>] dump_stack+0x4f/0x7b
[ 55.330518] [<ffffffff8108fa28>] ? console_unlock+0x334/0x363
[ 55.332738] [<ffffffff8110873e>] __alloc_pages_nodemask+0x81d/0x8d4
[ 55.334910] [<ffffffff81100752>] pagecache_get_page+0x10e/0x20c
[ 55.336844] [<ffffffffa007d916>] alloc_extent_buffer+0xd0/0x350 [btrfs]
[ 55.338973] [<ffffffffa0059d8c>] btrfs_find_create_tree_block+0x15/0x17 [btrfs]
[ 55.341329] [<ffffffffa004f728>] btrfs_alloc_tree_block+0x18c/0x405 [btrfs]
[ 55.343566] [<ffffffffa003fa34>] split_leaf+0x1e4/0x6a6 [btrfs]
[ 55.345577] [<ffffffffa0040567>] btrfs_search_slot+0x671/0x831 [btrfs]
[ 55.347679] [<ffffffff810682d7>] ? get_parent_ip+0xe/0x3e
[ 55.349434] [<ffffffffa0041cb2>] btrfs_insert_empty_items+0x5d/0xa8 [btrfs]
[ 55.351681] [<ffffffffa004ecfb>] __btrfs_run_delayed_refs+0x7a6/0xf35 [btrfs]
[ 55.353979] [<ffffffffa00512ea>] btrfs_run_delayed_refs+0x6e/0x226 [btrfs]
[ 55.356212] [<ffffffffa0060e21>] ? start_transaction+0x192/0x534 [btrfs]
[ 55.358378] [<ffffffffa0060e21>] ? start_transaction+0x192/0x534 [btrfs]
[ 55.360626] [<ffffffffa0060221>] btrfs_commit_transaction+0x4c/0xaba [btrfs]
[ 55.362894] [<ffffffffa0060e21>] ? start_transaction+0x192/0x534 [btrfs]
[ 55.365221] [<ffffffffa0073428>] btrfs_sync_file+0x29c/0x310 [btrfs]
[ 55.367273] [<ffffffff81186808>] vfs_fsync_range+0x8f/0x9e
[ 55.369047] [<ffffffff81186833>] vfs_fsync+0x1c/0x1e
[ 55.370654] [<ffffffff81186869>] do_fsync+0x34/0x4e
[ 55.372246] [<ffffffff81186ab3>] SyS_fsync+0x10/0x14
[ 55.373851] [<ffffffff81554f97>] system_call_fastpath+0x12/0x6f
[ 55.381070] BTRFS: error (device hdb1) in btrfs_run_delayed_refs:2821: errno=-12 Out of memory
[ 55.382431] BTRFS warning (device hdb1): Skipping commit of aborted transaction.
[ 55.382433] BTRFS warning (device hdb1): cleanup_transaction:1692: Aborting unused transaction(IO failure).
[ 55.384280] ------------[ cut here ]------------
[ 55.384312] WARNING: CPU: 0 PID: 3010 at fs/btrfs/delayed-ref.c:438 btrfs_select_ref_head+0xd9/0xfe [btrfs]()
[...]
[ 55.384337] Call Trace:
[ 55.384353] [<ffffffff8154e6f0>] dump_stack+0x4f/0x7b
[ 55.384357] [<ffffffff8107f717>] ? down_trylock+0x2d/0x37
[ 55.384359] [<ffffffff81046977>] warn_slowpath_common+0xa1/0xbb
[ 55.384398] [<ffffffffa00a1d6b>] ? btrfs_select_ref_head+0xd9/0xfe [btrfs]
[ 55.384400] [<ffffffff81046a34>] warn_slowpath_null+0x1a/0x1c
[ 55.384423] [<ffffffffa00a1d6b>] btrfs_select_ref_head+0xd9/0xfe [btrfs]
[ 55.384446] [<ffffffffa004e5f7>] ? __btrfs_run_delayed_refs+0xa2/0xf35 [btrfs]
[ 55.384455] [<ffffffffa004e600>] __btrfs_run_delayed_refs+0xab/0xf35 [btrfs]
[ 55.384476] [<ffffffffa00512ea>] btrfs_run_delayed_refs+0x6e/0x226 [btrfs]
[ 55.384499] [<ffffffffa0060e21>] ? start_transaction+0x192/0x534 [btrfs]
[ 55.384521] [<ffffffffa0060e21>] ? start_transaction+0x192/0x534 [btrfs]
[ 55.384543] [<ffffffffa0060221>] btrfs_commit_transaction+0x4c/0xaba [btrfs]
[ 55.384565] [<ffffffffa0060e21>] ? start_transaction+0x192/0x534 [btrfs]
[ 55.384588] [<ffffffffa0073428>] btrfs_sync_file+0x29c/0x310 [btrfs]
[ 55.384591] [<ffffffff81186808>] vfs_fsync_range+0x8f/0x9e
[ 55.384592] [<ffffffff81186833>] vfs_fsync+0x1c/0x1e
[ 55.384593] [<ffffffff81186869>] do_fsync+0x34/0x4e
[ 55.384594] [<ffffffff81186ab3>] SyS_fsync+0x10/0x14
[ 55.384595] [<ffffffff81554f97>] system_call_fastpath+0x12/0x6f
[...]
[ 55.384608] ---[ end trace c29799da1d4dd621 ]---
[ 55.437323] BTRFS info (device hdb1): forced readonly
[ 55.438815] BTRFS info (device hdb1): delayed_refs has NO entry
Fix this by being explicit about the no-fail behavior of this allocation
path and use __GFP_NOFAIL.
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
fs/btrfs/extent_io.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index c374e1e71e5f..f4d6eea975d7 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4607,9 +4607,7 @@ __alloc_extent_buffer(struct btrfs_fs_info *fs_info, u64 start,
{
struct extent_buffer *eb = NULL;
- eb = kmem_cache_zalloc(extent_buffer_cache, GFP_NOFS);
- if (eb == NULL)
- return NULL;
+ eb = kmem_cache_zalloc(extent_buffer_cache, GFP_NOFS|__GFP_NOFAIL);
eb->start = start;
eb->len = len;
eb->fs_info = fs_info;
@@ -4867,7 +4865,7 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
return NULL;
for (i = 0; i < num_pages; i++, index++) {
- p = find_or_create_page(mapping, index, GFP_NOFS);
+ p = find_or_create_page(mapping, index, GFP_NOFS|__GFP_NOFAIL);
if (!p)
goto free_eb;
--
2.5.0
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH 2/2] btrfs: use __GFP_NOFAIL in alloc_btrfs_bio
2015-08-19 12:17 [PATCH 0/2] btrfs: fortification for GFP_NOFS allocations mhocko
2015-08-19 12:17 ` [PATCH 1/2] btrfs: Prevent from early transaction abort mhocko
@ 2015-08-19 12:17 ` mhocko
2015-08-19 18:17 ` [PATCH 0/2] btrfs: fortification for GFP_NOFS allocations Chris Mason
2 siblings, 0 replies; 6+ messages in thread
From: mhocko @ 2015-08-19 12:17 UTC (permalink / raw)
To: linux-btrfs
Cc: Chris Mason, Josef Bacik, David Sterba, linux-kernel,
Michal Hocko
From: Michal Hocko <mhocko@suse.com>
alloc_btrfs_bio relies on GFP_NOFS allocation when committing the
transaction but this allocation context is rather weak wrt. reclaim
capabilities. The page allocator currently tries hard to not fail these
allocations if they are small (<=PAGE_ALLOC_COSTLY_ORDER) but it can
still fail if the _current_ process is the OOM killer victim. Moreover
there is an attempt to move away from the default no-fail behavior and
allow these allocation to fail more eagerly. This would lead to:
[ 37.928625] kernel BUG at fs/btrfs/extent_io.c:4045
which is clearly undesirable and the nofail behavior should be explicit
if the allocation failure cannot be tolerated.
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
fs/btrfs/volumes.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 53af23f2c087..42b9949dd71d 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -4914,9 +4914,7 @@ static struct btrfs_bio *alloc_btrfs_bio(int total_stripes, int real_stripes)
* and the stripes
*/
sizeof(u64) * (total_stripes),
- GFP_NOFS);
- if (!bbio)
- return NULL;
+ GFP_NOFS|__GFP_NOFAIL);
atomic_set(&bbio->error, 0);
atomic_set(&bbio->refs, 1);
--
2.5.0
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH 0/2] btrfs: fortification for GFP_NOFS allocations
2015-08-19 12:17 [PATCH 0/2] btrfs: fortification for GFP_NOFS allocations mhocko
2015-08-19 12:17 ` [PATCH 1/2] btrfs: Prevent from early transaction abort mhocko
2015-08-19 12:17 ` [PATCH 2/2] btrfs: use __GFP_NOFAIL in alloc_btrfs_bio mhocko
@ 2015-08-19 18:17 ` Chris Mason
2015-09-09 16:13 ` Vlastimil Babka
2 siblings, 1 reply; 6+ messages in thread
From: Chris Mason @ 2015-08-19 18:17 UTC (permalink / raw)
To: mhocko; +Cc: linux-btrfs, Josef Bacik, David Sterba, linux-kernel
On Wed, Aug 19, 2015 at 02:17:39PM +0200, mhocko@kernel.org wrote:
> Hi,
> these two patches were sent as a part of a larger RFC which aims at
> allowing GFP_NOFS allocations to fail to help sort out memory reclaim
> issues bound to the current behavior
> (http://marc.info/?l=linux-mm&m=143876830616538&w=2).
>
> It is clear that move to the GFP_NOFS behavior change is a long term
> plan but these patches should be good enough even with that change in
> place. It also seems that Chris wasn't opposed and would be willing to
> take them http://marc.info/?l=linux-mm&m=143991792427165&w=2 so here we
> come. I have rephrased the changeslogs to not refer to the patch which
> changes the NOFS behavior.
>
> Just to clarify. These two patches allowed my particular testcase
> (mentioned in the cover referenced above) to survive it doesn't mean
> that the failing GFP_NOFS are OK now. I have seen some other places
> where GFP_NOFS allocation is followed by BUG_ON(ALLOC_FAILED). I have
> not encountered them though.
>
> Let me know if you would prefer other changes.
My plan is to start with these two and take more as required.
-chris
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 0/2] btrfs: fortification for GFP_NOFS allocations
2015-08-19 18:17 ` [PATCH 0/2] btrfs: fortification for GFP_NOFS allocations Chris Mason
@ 2015-09-09 16:13 ` Vlastimil Babka
2015-09-11 8:27 ` Michal Hocko
0 siblings, 1 reply; 6+ messages in thread
From: Vlastimil Babka @ 2015-09-09 16:13 UTC (permalink / raw)
To: Chris Mason, mhocko, linux-btrfs, Josef Bacik, David Sterba,
linux-kernel
On 08/19/2015 08:17 PM, Chris Mason wrote:
> On Wed, Aug 19, 2015 at 02:17:39PM +0200, mhocko@kernel.org wrote:
>> Hi,
>> these two patches were sent as a part of a larger RFC which aims at
>> allowing GFP_NOFS allocations to fail to help sort out memory reclaim
>> issues bound to the current behavior
>> (http://marc.info/?l=linux-mm&m=143876830616538&w=2).
>>
>> It is clear that move to the GFP_NOFS behavior change is a long term
>> plan but these patches should be good enough even with that change in
>> place. It also seems that Chris wasn't opposed and would be willing to
>> take them http://marc.info/?l=linux-mm&m=143991792427165&w=2 so here we
>> come. I have rephrased the changeslogs to not refer to the patch which
>> changes the NOFS behavior.
>>
>> Just to clarify. These two patches allowed my particular testcase
>> (mentioned in the cover referenced above) to survive it doesn't mean
>> that the failing GFP_NOFS are OK now. I have seen some other places
>> where GFP_NOFS allocation is followed by BUG_ON(ALLOC_FAILED). I have
>> not encountered them though.
>>
>> Let me know if you would prefer other changes.
>
> My plan is to start with these two and take more as required.
I've previously noticed in __set_extent_bit() things like:
if (!prealloc && (mask & __GFP_WAIT)) {
prealloc = alloc_extent_state(mask);
BUG_ON(!prealloc);
}
and later:
prealloc = alloc_extent_state_atomic(prealloc);
BUG_ON(!prealloc);
which internally does:
if (!prealloc)
prealloc = alloc_extent_state(GFP_ATOMIC);
The first one could be fixable by adding __GFP_NOFAIL. In fact we've got
an internal bug report for that one already. Even without GFP_NOFS being
allowed to fail, allocation can already fail when the thread is marked
for oom kill, which is likely what happened in that case.
The second case is problematic though, because GFP_ATOMIC | __GFP_NOFAIL
is not allowed. GFP_ATOMIC will give you access to memory reserves,
which reduces the chance of hitting the BUG_ON(), but it's not a
bulletproof solution.
> -chris
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 0/2] btrfs: fortification for GFP_NOFS allocations
2015-09-09 16:13 ` Vlastimil Babka
@ 2015-09-11 8:27 ` Michal Hocko
0 siblings, 0 replies; 6+ messages in thread
From: Michal Hocko @ 2015-09-11 8:27 UTC (permalink / raw)
To: Vlastimil Babka
Cc: Chris Mason, linux-btrfs, Josef Bacik, David Sterba, linux-kernel
On Wed 09-09-15 18:13:39, Vlastimil Babka wrote:
> On 08/19/2015 08:17 PM, Chris Mason wrote:
> >On Wed, Aug 19, 2015 at 02:17:39PM +0200, mhocko@kernel.org wrote:
> >>Hi,
> >>these two patches were sent as a part of a larger RFC which aims at
> >>allowing GFP_NOFS allocations to fail to help sort out memory reclaim
> >>issues bound to the current behavior
> >>(http://marc.info/?l=linux-mm&m=143876830616538&w=2).
> >>
> >>It is clear that move to the GFP_NOFS behavior change is a long term
> >>plan but these patches should be good enough even with that change in
> >>place. It also seems that Chris wasn't opposed and would be willing to
> >>take them http://marc.info/?l=linux-mm&m=143991792427165&w=2 so here we
> >>come. I have rephrased the changeslogs to not refer to the patch which
> >>changes the NOFS behavior.
> >>
> >>Just to clarify. These two patches allowed my particular testcase
> >>(mentioned in the cover referenced above) to survive it doesn't mean
> >>that the failing GFP_NOFS are OK now. I have seen some other places
> >>where GFP_NOFS allocation is followed by BUG_ON(ALLOC_FAILED). I have
> >>not encountered them though.
> >>
> >>Let me know if you would prefer other changes.
> >
> >My plan is to start with these two and take more as required.
>
> I've previously noticed in __set_extent_bit() things like:
>
> if (!prealloc && (mask & __GFP_WAIT)) {
> prealloc = alloc_extent_state(mask);
> BUG_ON(!prealloc);
> }
>
> and later:
>
> prealloc = alloc_extent_state_atomic(prealloc);
> BUG_ON(!prealloc);
Yes. I have noticed also many other places:
$ git grep "BUG_ON.*ENOMEM" -- fs/btrfs/ | wc -l
47
I have talked to David Sterba and he said this is on his todo list.
So this will likely take some more time but it is definitely good to
sort out.
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 6+ messages in thread