[PATCH] btrfs: Avoid BUG_ON()s because of ENOMEM caused by kmalloc() failure

linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH] btrfs: Avoid BUG_ON()s because of ENOMEM caused by kmalloc() failure
@ 2016-02-15  5:38 Satoru Takeuchi
  2016-02-15 17:53 ` David Sterba
  0 siblings, 1 reply; 5+ messages in thread
From: Satoru Takeuchi @ 2016-02-15  5:38 UTC (permalink / raw)
  To: linux-btrfs@vger.kernel.org

There are some BUG_ON()'s after kmalloc() as follows.

=====
foo = kmalloc();
BUG_ON(!foo);	/* -ENOMEM case */
=====

A Docker + memory cgroup user hit these BUG_ON()s.

https://bugzilla.kernel.org/show_bug.cgi?id=112101

Since it's very hard to handle these ENOMEMs properly,
preventing these kmalloc() failures to avoid these
BUG_ON()s for now, are a bit better than the current
implementation anyway.

Signed-off-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com>
---
 fs/btrfs/extent_io.c  | 6 ++----
 fs/btrfs/inode.c      | 6 ++----
 fs/btrfs/relocation.c | 3 +--
 3 files changed, 5 insertions(+), 10 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 2e7c97a..5f92450 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -874,10 +874,8 @@ __set_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,

 	bits |= EXTENT_FIRST_DELALLOC;
 again:
-	if (!prealloc && gfpflags_allow_blocking(mask)) {
-		prealloc = alloc_extent_state(mask);
-		BUG_ON(!prealloc);
-	}
+	if (!prealloc && gfpflags_allow_blocking(mask))
+		prealloc = alloc_extent_state(mask|__GFP_NOFAIL);

 	spin_lock(&tree->lock);
 	if (cached_state && *cached_state) {
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 85afe66..d20e5c5 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -357,8 +357,7 @@ static noinline int add_async_extent(struct async_cow *cow,
 {
 	struct async_extent *async_extent;

-	async_extent = kmalloc(sizeof(*async_extent), GFP_NOFS);
-	BUG_ON(!async_extent); /* -ENOMEM */
+	async_extent = kmalloc(sizeof(*async_extent), GFP_NOFS|__GFP_NOFAIL);
 	async_extent->start = start;
 	async_extent->ram_size = ram_size;
 	async_extent->compressed_size = compressed_size;
@@ -1143,8 +1142,7 @@ static int cow_file_range_async(struct inode *inode, struct page *locked_page,
 	clear_extent_bit(&BTRFS_I(inode)->io_tree, start, end, EXTENT_LOCKED,
 			 1, 0, NULL, GFP_NOFS);
 	while (start < end) {
-		async_cow = kmalloc(sizeof(*async_cow), GFP_NOFS);
-		BUG_ON(!async_cow); /* -ENOMEM */
+		async_cow = kmalloc(sizeof(*async_cow), GFP_NOFS|__GFP_NOFAIL);
 		async_cow->inode = igrab(inode);
 		async_cow->root = root;
 		async_cow->locked_page = locked_page;
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index ef6d8fc..6b9f718 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -1373,8 +1373,7 @@ static struct btrfs_root *create_reloc_root(struct btrfs_trans_handle *trans,
 	u64 last_snap = 0;
 	int ret;

-	root_item = kmalloc(sizeof(*root_item), GFP_NOFS);
-	BUG_ON(!root_item);
+	root_item = kmalloc(sizeof(*root_item), GFP_NOFS|__GFP_NOFAIL);

 	root_key.objectid = BTRFS_TREE_RELOC_OBJECTID;
 	root_key.type = BTRFS_ROOT_ITEM_KEY;
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] btrfs: Avoid BUG_ON()s because of ENOMEM caused by kmalloc() failure
  2016-02-15  5:38 [PATCH] btrfs: Avoid BUG_ON()s because of ENOMEM caused by kmalloc() failure Satoru Takeuchi
@ 2016-02-15 17:53 ` David Sterba
  2016-02-17  5:54   ` Satoru Takeuchi
  0 siblings, 1 reply; 5+ messages in thread
From: David Sterba @ 2016-02-15 17:53 UTC (permalink / raw)
  To: Satoru Takeuchi; +Cc: linux-btrfs@vger.kernel.org

On Mon, Feb 15, 2016 at 02:38:09PM +0900, Satoru Takeuchi wrote:
> There are some BUG_ON()'s after kmalloc() as follows.
> 
> =====
> foo = kmalloc();
> BUG_ON(!foo);	/* -ENOMEM case */
> =====
> 
> A Docker + memory cgroup user hit these BUG_ON()s.
> 
> https://bugzilla.kernel.org/show_bug.cgi?id=112101
> 
> Since it's very hard to handle these ENOMEMs properly,
> preventing these kmalloc() failures to avoid these
> BUG_ON()s for now, are a bit better than the current
> implementation anyway.

Beware that the NOFAIL semantics is can cause deadlocks if it's on the
critical writeback path or and can be reentered from itself through the
reclaim. Unless you're sure that this is not the case, please do not add
them just because it would seemingly fix the allocation failures.

In the docker example, the memory is limited by cgroups so the NOFAIL
mode can exhaust all reserves and just loop endlessly waiting for the
OOM killer to get some memory or just waiting without any chance to
progress.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] btrfs: Avoid BUG_ON()s because of ENOMEM caused by kmalloc() failure
  2016-02-15 17:53 ` David Sterba
@ 2016-02-17  5:54   ` Satoru Takeuchi
  2016-02-17 15:11     ` David Sterba
  0 siblings, 1 reply; 5+ messages in thread
From: Satoru Takeuchi @ 2016-02-17  5:54 UTC (permalink / raw)
  To: dsterba, linux-btrfs@vger.kernel.org

On 2016/02/16 2:53, David Sterba wrote:
> On Mon, Feb 15, 2016 at 02:38:09PM +0900, Satoru Takeuchi wrote:
>> There are some BUG_ON()'s after kmalloc() as follows.
>>
>> =====
>> foo = kmalloc();
>> BUG_ON(!foo);	/* -ENOMEM case */
>> =====
>>
>> A Docker + memory cgroup user hit these BUG_ON()s.
>>
>> https://bugzilla.kernel.org/show_bug.cgi?id=112101
>>
>> Since it's very hard to handle these ENOMEMs properly,
>> preventing these kmalloc() failures to avoid these
>> BUG_ON()s for now, are a bit better than the current
>> implementation anyway.
>
> Beware that the NOFAIL semantics is can cause deadlocks if it's on the
> critical writeback path or and can be reentered from itself through the
> reclaim. Unless you're sure that this is not the case, please do not add
> them just because it would seemingly fix the allocation failures.

About the all cases I changed, kmalloc()s can block
since gfp_flags_allow_blocking() are true. Then no locks
are acquired here and deadlocks don't happen.

Am I missing something?

>
> In the docker example, the memory is limited by cgroups so the NOFAIL
> mode can exhaust all reserves and just loop endlessly waiting for the
> OOM killer to get some memory or just waiting without any chance to
> progress.

I consider triggering OOM killer and killing processes
in a cgroup are better than killing whole system.

About the possibility of endless loop, there are many
such problems in the whole kernel. Of course it can be
said to Btrfs.

==========================================
$ grep -rnH __GFP_NOFAIL fs/btrfs/
fs/btrfs/extent-tree.c:5970: GFP_NOFS | __GFP_NOFAIL);
fs/btrfs/extent-tree.c:6043: bytenr + num_bytes - 1, GFP_NOFS | __GFP_NOFAIL);
fs/btrfs/extent_io.c:4643: eb = kmem_cache_zalloc(extent_buffer_cache, GFP_NOFS|__GFP_NOFAIL);
fs/btrfs/extent_io.c:4909: p = find_or_create_page(mapping, index, GFP_NOFS|__GFP_NOFAIL);
==========================================

I understand fixing these problems cooperate with
memory cgroup guys is the best in the long run.
However, I consider bypassing this problem for now
is better than the current implementation.

Thanks,
Satoru

> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] btrfs: Avoid BUG_ON()s because of ENOMEM caused by kmalloc() failure
  2016-02-17  5:54   ` Satoru Takeuchi
@ 2016-02-17 15:11     ` David Sterba
  2016-02-18  1:59       ` Satoru Takeuchi
  0 siblings, 1 reply; 5+ messages in thread
From: David Sterba @ 2016-02-17 15:11 UTC (permalink / raw)
  To: Satoru Takeuchi; +Cc: dsterba, linux-btrfs@vger.kernel.org

On Wed, Feb 17, 2016 at 02:54:23PM +0900, Satoru Takeuchi wrote:
> On 2016/02/16 2:53, David Sterba wrote:
> > On Mon, Feb 15, 2016 at 02:38:09PM +0900, Satoru Takeuchi wrote:
> >> There are some BUG_ON()'s after kmalloc() as follows.
> >>
> >> =====
> >> foo = kmalloc();
> >> BUG_ON(!foo);	/* -ENOMEM case */
> >> =====
> >>
> >> A Docker + memory cgroup user hit these BUG_ON()s.
> >>
> >> https://bugzilla.kernel.org/show_bug.cgi?id=112101
> >>
> >> Since it's very hard to handle these ENOMEMs properly,
> >> preventing these kmalloc() failures to avoid these
> >> BUG_ON()s for now, are a bit better than the current
> >> implementation anyway.
> >
> > Beware that the NOFAIL semantics is can cause deadlocks if it's on the
> > critical writeback path or and can be reentered from itself through the
> > reclaim. Unless you're sure that this is not the case, please do not add
> > them just because it would seemingly fix the allocation failures.
> 
> About the all cases I changed, kmalloc()s can block
> since gfp_flags_allow_blocking() are true. Then no locks
> are acquired here and deadlocks don't happen.
> 
> Am I missing something?

Waiting as in GFP_WAIT is not the same as GFP_NOFAIL that can wait
indefinetelly. The locking of NOFAIL is implied. The kmalloc callsite
will block until there's memory available. If another thread of btrfs
waits for this code to progress, it will block as well.

> > In the docker example, the memory is limited by cgroups so the NOFAIL
> > mode can exhaust all reserves and just loop endlessly waiting for the
> > OOM killer to get some memory or just waiting without any chance to
> > progress.
> 
> I consider triggering OOM killer and killing processes
> in a cgroup are better than killing whole system.

The action of OOM killer is not a problem. The cgroup memory limit can
be low or all the memory is unreclaimable. At this point btrfs code will
block.

> About the possibility of endless loop, there are many
> such problems in the whole kernel. Of course it can be
> said to Btrfs.

Many? Examples? In this context we're talking about endless loops caused
by non-failing allocations.

> ==========================================
> $ grep -rnH __GFP_NOFAIL fs/btrfs/
> fs/btrfs/extent-tree.c:5970: GFP_NOFS | __GFP_NOFAIL);
> fs/btrfs/extent-tree.c:6043: bytenr + num_bytes - 1, GFP_NOFS | __GFP_NOFAIL);
> fs/btrfs/extent_io.c:4643: eb = kmem_cache_zalloc(extent_buffer_cache, GFP_NOFS|__GFP_NOFAIL);
> fs/btrfs/extent_io.c:4909: p = find_or_create_page(mapping, index, GFP_NOFS|__GFP_NOFAIL);
> ==========================================

I'm aware of the existing NOFAIL allocations. There are two at
extent_buffer allocation time, added recently and provoked by the
intended changes to memory allocator that would fail the GFP_NOFS
allocations.

The other two are from year 2010, set_extent_bit, IMHO added so the
error handling does not get complicated and expressing "we don't want to
fail here". But there are many other calls to set_extent_bit that could
fail. This is inconsistent and should be unified. In a way that we're
sure that we're not introducing potential hangs.

> I understand fixing these problems cooperate with
> memory cgroup guys is the best in the long run.

It's not about cgroups, btrfs needs to ideally handle all memory
allocation failures in a way that uses some sort of fallbacks but still
can lead to a transaction abort if there's simply no memory left.

> However, I consider bypassing this problem for now
> is better than the current implementation.

It's more like replacing one problem with another. With every new
NOFAIL, one has to think about the runtime interactions with the others.
I'd rather see this fixed with a particular call path in mind or class,
eg. the extent_map bit settings, than throwing NOFAIL at places that
somebody accidentally hit.

As a temporary fix we can add __GFP_HIGH to the interesting sites so we
can get access to the emergency reserves, and this is on my list of
things to do after the NOFS -> KERNEL updates are done.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] btrfs: Avoid BUG_ON()s because of ENOMEM caused by kmalloc() failure
  2016-02-17 15:11     ` David Sterba
@ 2016-02-18  1:59       ` Satoru Takeuchi
  0 siblings, 0 replies; 5+ messages in thread
From: Satoru Takeuchi @ 2016-02-18  1:59 UTC (permalink / raw)
  To: dsterba, linux-btrfs@vger.kernel.org

On 2016/02/18 0:11, David Sterba wrote:
> On Wed, Feb 17, 2016 at 02:54:23PM +0900, Satoru Takeuchi wrote:
>> On 2016/02/16 2:53, David Sterba wrote:
>>> On Mon, Feb 15, 2016 at 02:38:09PM +0900, Satoru Takeuchi wrote:
>>>> There are some BUG_ON()'s after kmalloc() as follows.
>>>>
>>>> =====
>>>> foo = kmalloc();
>>>> BUG_ON(!foo);	/* -ENOMEM case */
>>>> =====
>>>>
>>>> A Docker + memory cgroup user hit these BUG_ON()s.
>>>>
>>>> https://bugzilla.kernel.org/show_bug.cgi?id=112101
>>>>
>>>> Since it's very hard to handle these ENOMEMs properly,
>>>> preventing these kmalloc() failures to avoid these
>>>> BUG_ON()s for now, are a bit better than the current
>>>> implementation anyway.
>>>
>>> Beware that the NOFAIL semantics is can cause deadlocks if it's on the
>>> critical writeback path or and can be reentered from itself through the
>>> reclaim. Unless you're sure that this is not the case, please do not add
>>> them just because it would seemingly fix the allocation failures.
>>
>> About the all cases I changed, kmalloc()s can block
>> since gfp_flags_allow_blocking() are true. Then no locks
>> are acquired here and deadlocks don't happen.
>>
>> Am I missing something?
>
> Waiting as in GFP_WAIT is not the same as GFP_NOFAIL that can wait
> indefinetelly. The locking of NOFAIL is implied. The kmalloc callsite
> will block until there's memory available. If another thread of btrfs
> waits for this code to progress, it will block as well.
>
>>> In the docker example, the memory is limited by cgroups so the NOFAIL
>>> mode can exhaust all reserves and just loop endlessly waiting for the
>>> OOM killer to get some memory or just waiting without any chance to
>>> progress.
>>
>> I consider triggering OOM killer and killing processes
>> in a cgroup are better than killing whole system.
>
> The action of OOM killer is not a problem. The cgroup memory limit can
> be low or all the memory is unreclaimable. At this point btrfs code will
> block.
>
>
>> About the possibility of endless loop, there are many
>> such problems in the whole kernel. Of course it can be
>> said to Btrfs.
>
> Many? Examples? In this context we're talking about endless loops caused
> by non-failing allocations.
>
>> ==========================================
>> $ grep -rnH __GFP_NOFAIL fs/btrfs/
>> fs/btrfs/extent-tree.c:5970: GFP_NOFS | __GFP_NOFAIL);
>> fs/btrfs/extent-tree.c:6043: bytenr + num_bytes - 1, GFP_NOFS | __GFP_NOFAIL);
>> fs/btrfs/extent_io.c:4643: eb = kmem_cache_zalloc(extent_buffer_cache, GFP_NOFS|__GFP_NOFAIL);
>> fs/btrfs/extent_io.c:4909: p = find_or_create_page(mapping, index, GFP_NOFS|__GFP_NOFAIL);
>> ==========================================
>
> I'm aware of the existing NOFAIL allocations. There are two at
> extent_buffer allocation time, added recently and provoked by the
> intended changes to memory allocator that would fail the GFP_NOFS
> allocations.
>
> The other two are from year 2010, set_extent_bit, IMHO added so the
> error handling does not get complicated and expressing "we don't want to
> fail here". But there are many other calls to set_extent_bit that could
> fail. This is inconsistent and should be unified. In a way that we're
> sure that we're not introducing potential hangs.
>
>> I understand fixing these problems cooperate with
>> memory cgroup guys is the best in the long run.
>
> It's not about cgroups, btrfs needs to ideally handle all memory
> allocation failures in a way that uses some sort of fallbacks but still
> can lead to a transaction abort if there's simply no memory left.
>
>> However, I consider bypassing this problem for now
>> is better than the current implementation.
>
> It's more like replacing one problem with another. With every new
> NOFAIL, one has to think about the runtime interactions with the others.
> I'd rather see this fixed with a particular call path in mind or class,
> eg. the extent_map bit settings, than throwing NOFAIL at places that
> somebody accidentally hit.
>
> As a temporary fix we can add __GFP_HIGH to the interesting sites so we
> can get access to the emergency reserves, and this is on my list of
> things to do after the NOFS -> KERNEL updates are done.

OK, got it. I'll reconsider how to fix there problem.

Thanks,
Satoru

> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2016-02-18  1:59 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-02-15  5:38 [PATCH] btrfs: Avoid BUG_ON()s because of ENOMEM caused by kmalloc() failure Satoru Takeuchi
2016-02-15 17:53 ` David Sterba
2016-02-17  5:54   ` Satoru Takeuchi
2016-02-17 15:11     ` David Sterba
2016-02-18  1:59       ` Satoru Takeuchi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).