linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com>
To: <dsterba@suse.cz>,
	"linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>
Subject: Re: [PATCH] btrfs: Avoid BUG_ON()s because of ENOMEM caused by kmalloc() failure
Date: Thu, 18 Feb 2016 10:59:22 +0900	[thread overview]
Message-ID: <56C5257A.20603@jp.fujitsu.com> (raw)
In-Reply-To: <20160217151113.GT4374@twin.jikos.cz>

On 2016/02/18 0:11, David Sterba wrote:
> On Wed, Feb 17, 2016 at 02:54:23PM +0900, Satoru Takeuchi wrote:
>> On 2016/02/16 2:53, David Sterba wrote:
>>> On Mon, Feb 15, 2016 at 02:38:09PM +0900, Satoru Takeuchi wrote:
>>>> There are some BUG_ON()'s after kmalloc() as follows.
>>>>
>>>> =====
>>>> foo = kmalloc();
>>>> BUG_ON(!foo);	/* -ENOMEM case */
>>>> =====
>>>>
>>>> A Docker + memory cgroup user hit these BUG_ON()s.
>>>>
>>>> https://bugzilla.kernel.org/show_bug.cgi?id=112101
>>>>
>>>> Since it's very hard to handle these ENOMEMs properly,
>>>> preventing these kmalloc() failures to avoid these
>>>> BUG_ON()s for now, are a bit better than the current
>>>> implementation anyway.
>>>
>>> Beware that the NOFAIL semantics is can cause deadlocks if it's on the
>>> critical writeback path or and can be reentered from itself through the
>>> reclaim. Unless you're sure that this is not the case, please do not add
>>> them just because it would seemingly fix the allocation failures.
>>
>> About the all cases I changed, kmalloc()s can block
>> since gfp_flags_allow_blocking() are true. Then no locks
>> are acquired here and deadlocks don't happen.
>>
>> Am I missing something?
>
> Waiting as in GFP_WAIT is not the same as GFP_NOFAIL that can wait
> indefinetelly. The locking of NOFAIL is implied. The kmalloc callsite
> will block until there's memory available. If another thread of btrfs
> waits for this code to progress, it will block as well.
>
>>> In the docker example, the memory is limited by cgroups so the NOFAIL
>>> mode can exhaust all reserves and just loop endlessly waiting for the
>>> OOM killer to get some memory or just waiting without any chance to
>>> progress.
>>
>> I consider triggering OOM killer and killing processes
>> in a cgroup are better than killing whole system.
>
> The action of OOM killer is not a problem. The cgroup memory limit can
> be low or all the memory is unreclaimable. At this point btrfs code will
> block.
>
>
>> About the possibility of endless loop, there are many
>> such problems in the whole kernel. Of course it can be
>> said to Btrfs.
>
> Many? Examples? In this context we're talking about endless loops caused
> by non-failing allocations.
>
>> ==========================================
>> $ grep -rnH __GFP_NOFAIL fs/btrfs/
>> fs/btrfs/extent-tree.c:5970: GFP_NOFS | __GFP_NOFAIL);
>> fs/btrfs/extent-tree.c:6043: bytenr + num_bytes - 1, GFP_NOFS | __GFP_NOFAIL);
>> fs/btrfs/extent_io.c:4643: eb = kmem_cache_zalloc(extent_buffer_cache, GFP_NOFS|__GFP_NOFAIL);
>> fs/btrfs/extent_io.c:4909: p = find_or_create_page(mapping, index, GFP_NOFS|__GFP_NOFAIL);
>> ==========================================
>
> I'm aware of the existing NOFAIL allocations. There are two at
> extent_buffer allocation time, added recently and provoked by the
> intended changes to memory allocator that would fail the GFP_NOFS
> allocations.
>
> The other two are from year 2010, set_extent_bit, IMHO added so the
> error handling does not get complicated and expressing "we don't want to
> fail here". But there are many other calls to set_extent_bit that could
> fail. This is inconsistent and should be unified. In a way that we're
> sure that we're not introducing potential hangs.
>
>> I understand fixing these problems cooperate with
>> memory cgroup guys is the best in the long run.
>
> It's not about cgroups, btrfs needs to ideally handle all memory
> allocation failures in a way that uses some sort of fallbacks but still
> can lead to a transaction abort if there's simply no memory left.
>
>> However, I consider bypassing this problem for now
>> is better than the current implementation.
>
> It's more like replacing one problem with another. With every new
> NOFAIL, one has to think about the runtime interactions with the others.
> I'd rather see this fixed with a particular call path in mind or class,
> eg. the extent_map bit settings, than throwing NOFAIL at places that
> somebody accidentally hit.
>
> As a temporary fix we can add __GFP_HIGH to the interesting sites so we
> can get access to the emergency reserves, and this is on my list of
> things to do after the NOFS -> KERNEL updates are done.

OK, got it. I'll reconsider how to fix there problem.

Thanks,
Satoru

> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

      reply	other threads:[~2016-02-18  1:59 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-02-15  5:38 [PATCH] btrfs: Avoid BUG_ON()s because of ENOMEM caused by kmalloc() failure Satoru Takeuchi
2016-02-15 17:53 ` David Sterba
2016-02-17  5:54   ` Satoru Takeuchi
2016-02-17 15:11     ` David Sterba
2016-02-18  1:59       ` Satoru Takeuchi [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=56C5257A.20603@jp.fujitsu.com \
    --to=takeuchi_satoru@jp.fujitsu.com \
    --cc=dsterba@suse.cz \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).