Re: [PATCH] bcachefs: Switch to memalloc_flags_do() for vmalloc allocations

linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Yafang Shao <laoar.shao@gmail.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Kent Overstreet <kent.overstreet@linux.dev>,
	Michal Hocko <mhocko@suse.com>,
	 Matthew Wilcox <willy@infradead.org>,
	linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
	 linux-kernel@vger.kernel.org, Dave Chinner <dchinner@redhat.com>
Subject: Re: [PATCH] bcachefs: Switch to memalloc_flags_do() for vmalloc allocations
Date: Fri, 30 Aug 2024 17:14:28 +0800	[thread overview]
Message-ID: <CALOAHbCssCSb7zF6VoKugFjAQcMACmOTtSCzd7n8oGfXdsxNsg@mail.gmail.com> (raw)
In-Reply-To: <ZtCFP5w6yv/aykui@dread.disaster.area>

On Thu, Aug 29, 2024 at 10:29 PM Dave Chinner <david@fromorbit.com> wrote:
>
> On Thu, Aug 29, 2024 at 07:55:08AM -0400, Kent Overstreet wrote:
> > Ergo, if you're not absolutely sure that a GFP_NOFAIL use is safe
> > according to call path and allocation size, you still need to be
> > checking for failure - in the same way that you shouldn't be using
> > BUG_ON() if you cannot prove that the condition won't occur in real wold
> > usage.
>
> We've been using __GFP_NOFAIL semantics in XFS heavily for 30 years
> now. This was the default Irix kernel allocator behaviour (it had a
> forwards progress guarantee and would never fail allocation unless
> told it could do so). We've been using the same "guaranteed not to
> fail" semantics on Linux since the original port started 25 years
> ago via open-coded loops.
>
> IOWs, __GFP_NOFAIL semantics have been production tested for a
> couple of decades on Linux via XFS, and nobody here can argue that
> XFS is unreliable or crashes in low memory scenarios. __GFP_NOFAIL
> as it is used by XFS is reliable and lives up to the "will not fail"
> guarantee that it is supposed to have.
>
> Fundamentally, __GFP_NOFAIL came about to replace the callers doing
>
>         do {
>                 p = kmalloc(size);
>         while (!p);
>
> so that they blocked until memory allocation succeeded. The call
> sites do not check for failure, because -failure never occurs-.
>
> The MM devs want to have visibility of these allocations - they may
> not like them, but having __GFP_NOFAIL means it's trivial to audit
> all the allocations that use these semantics.  IOWs, __GFP_NOFAIL
> was created with an explicit guarantee that it -will not fail- for
> normal allocation contexts so it could replace all the open-coded
> will-not-fail allocation loops..
>
> Given this guarantee, we recently removed these historic allocation
> wrapper loops from XFS, and replaced them with __GFP_NOFAIL at the
> allocation call sites. There's nearly a hundred memory allocation
> locations in XFS that are tagged with __GFP_NOFAIL.
>
> If we're now going to have the "will not fail" guarantee taken away
> from __GFP_NOFAIL, then we cannot use __GFP_NOFAIL in XFS. Nor can
> it be used anywhere else that a "will not fail" guarantee it
> required.
>
> Put simply: __GFP_NOFAIL will be rendered completely useless if it
> can fail due to external scoped memory allocation contexts.  This
> will force us to revert all __GFP_NOFAIL allocations back to
> open-coded will-not-fail loops.
>
> This is not a step forwards for anyone.

Hello Dave,

I've noticed that XFS has increasingly replaced kmem_alloc() with
__GFP_NOFAIL. For example, in kernel 4.19.y, there are 0 instances of
__GFP_NOFAIL under fs/xfs, but in kernel 6.1.y, there are 41
occurrences. In kmem_alloc(), there's an explicit
memalloc_retry_wait() to throttle the allocator under heavy memory
pressure, which aligns with your filesystem design. However, using
__GFP_NOFAIL removes this throttling mechanism, potentially causing
issues when the system is under heavy memory load. I'm concerned that
this shift might not be a beneficial trend.

We have been using XFS for our big data servers for years, and it has
consistently performed well with older kernels like 4.19.y. However,
after upgrading all our servers from 4.19.y to 6.1.y over the past two
years, we have frequently encountered livelock issues caused by memory
exhaustion. To mitigate this, we've had to limit the RSS of
applications, which isn't an ideal solution and represents a worrying
trend.

-- 
Regards
Yafang

next prev parent reply	other threads:[~2024-08-30  9:15 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-08-28 14:06 [PATCH] bcachefs: Switch to memalloc_flags_do() for vmalloc allocations Kent Overstreet
2024-08-28 18:48 ` Matthew Wilcox
2024-08-28 19:11   ` Kent Overstreet
2024-08-28 19:26     ` Michal Hocko
2024-08-28 22:58       ` Kent Overstreet
2024-08-29  7:19         ` Michal Hocko
2024-08-29 11:41           ` Kent Overstreet
2024-08-29 11:08         ` Michal Hocko
2024-08-29 11:55           ` Kent Overstreet
2024-08-29 12:34             ` Michal Hocko
2024-08-29 12:42               ` Kent Overstreet
2024-08-29 14:27             ` Dave Chinner
2024-08-30  3:39               ` Theodore Ts'o
2024-08-31 15:46                 ` Kent Overstreet
2024-08-30  9:14               ` Yafang Shao [this message]
2024-08-30 15:25                 ` Vlastimil Babka
2024-09-02  3:00                   ` Yafang Shao
2024-09-01  3:35                 ` Dave Chinner
2024-09-02  3:02                   ` Yafang Shao
2024-09-02  8:11                     ` Michal Hocko
2024-09-02  9:01                       ` Yafang Shao
2024-09-02  9:09                         ` Michal Hocko
2024-09-03  6:34                           ` Yafang Shao
2024-09-03  7:18                             ` Michal Hocko
2024-09-03 12:44                             ` Theodore Ts'o
2024-09-03 13:15                               ` Yafang Shao
2024-09-03 14:03                                 ` Michal Hocko
2024-09-03 13:30                               ` Michal Hocko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CALOAHbCssCSb7zF6VoKugFjAQcMACmOTtSCzd7n8oGfXdsxNsg@mail.gmail.com \
    --to=laoar.shao@gmail.com \
    --cc=david@fromorbit.com \
    --cc=dchinner@redhat.com \
    --cc=kent.overstreet@linux.dev \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).