linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Michal Hocko <mhocko@suse.com>
To: Theodore Ts'o <tytso@mit.edu>
Cc: Yafang Shao <laoar.shao@gmail.com>,
	Dave Chinner <david@fromorbit.com>,
	Kent Overstreet <kent.overstreet@linux.dev>,
	Matthew Wilcox <willy@infradead.org>,
	linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, Dave Chinner <dchinner@redhat.com>
Subject: Re: [PATCH] bcachefs: Switch to memalloc_flags_do() for vmalloc allocations
Date: Tue, 3 Sep 2024 15:30:52 +0200	[thread overview]
Message-ID: <ZtcPjBeF9TjCe8Sl@tiehlicka> (raw)
In-Reply-To: <20240903124416.GE424729@mit.edu>

On Tue 03-09-24 08:44:16, Theodore Ts'o wrote:
> On Tue, Sep 03, 2024 at 02:34:05PM +0800, Yafang Shao wrote:
> >
> > When setting GFP_NOFAIL, it's important to not only enable direct
> > reclaim but also the OOM killer. In scenarios where swap is off and
> > there is minimal page cache, setting GFP_NOFAIL without __GFP_FS can
> > result in an infinite loop. In other words, GFP_NOFAIL should not be
> > used with GFP_NOFS. Unfortunately, many call sites do combine them.
> > For example:
> > 
> > XFS:
> > 
> > fs/xfs/libxfs/xfs_exchmaps.c: GFP_NOFS | __GFP_NOFAIL
> > fs/xfs/xfs_attr_item.c: GFP_NOFS | __GFP_NOFAIL
> > 
> > EXT4:
> > 
> > fs/ext4/mballoc.c: GFP_NOFS | __GFP_NOFAIL
> > fs/ext4/extents.c: GFP_NOFS | __GFP_NOFAIL
> > 
> > This seems problematic, but I'm not an FS expert. Perhaps Dave or Ted
> > could provide further insight.
> 
> GFP_NOFS is needed because we need to signal to the mm layer to avoid
> recursing into file system layer --- for example, to clean a page by
> writing it back to the FS.  Since we may have taken various file
> system locks, recursing could lead to deadlock, which would make the
> system (and the user) sad.
> 
> If the mm layer wants to OOM kill a process, that should be fine as
> far as the file system is concerned --- this could reclaim anonymous
> pages that don't need to be written back, for example.  And we don't
> need to write back dirty pages before the process killed.  So I'm a
> bit puzzled why (as you imply; I haven't dug into the mm code in
> question) GFP_NOFS implies disabling the OOM killer?

Yes, because there might be a lot of fs pages pinned while performing
NOFS allocation and that could fire the OOM killer way too prematurely.
This has been quite some time ago since this was introduced but I do
remember workloads hitting that. Also there is usually kswapd making
sufficient progress to move forward. There are cases where kswapd is
completely stuck and other __GFP_FS allocations triggering full direct
reclaim or background kworkers freeing some memory and OOM killer
doesn't have good enough picture to make an educated guess the oom
killer is the only available way forward.

A typical example would be a workload that would care is trashing but
still making a slow progress which is acceptable which is acceptable
because the most important workload makes a decent progress (the working
set fits in or is mlocked) and rebuilding the state is more harmfull
than a slow IO.

-- 
Michal Hocko
SUSE Labs

      parent reply	other threads:[~2024-09-03 13:31 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-08-28 14:06 [PATCH] bcachefs: Switch to memalloc_flags_do() for vmalloc allocations Kent Overstreet
2024-08-28 18:48 ` Matthew Wilcox
2024-08-28 19:11   ` Kent Overstreet
2024-08-28 19:26     ` Michal Hocko
2024-08-28 22:58       ` Kent Overstreet
2024-08-29  7:19         ` Michal Hocko
2024-08-29 11:41           ` Kent Overstreet
2024-08-29 11:08         ` Michal Hocko
2024-08-29 11:55           ` Kent Overstreet
2024-08-29 12:34             ` Michal Hocko
2024-08-29 12:42               ` Kent Overstreet
2024-08-29 14:27             ` Dave Chinner
2024-08-30  3:39               ` Theodore Ts'o
2024-08-31 15:46                 ` Kent Overstreet
2024-08-30  9:14               ` Yafang Shao
2024-08-30 15:25                 ` Vlastimil Babka
2024-09-02  3:00                   ` Yafang Shao
2024-09-01  3:35                 ` Dave Chinner
2024-09-02  3:02                   ` Yafang Shao
2024-09-02  8:11                     ` Michal Hocko
2024-09-02  9:01                       ` Yafang Shao
2024-09-02  9:09                         ` Michal Hocko
2024-09-03  6:34                           ` Yafang Shao
2024-09-03  7:18                             ` Michal Hocko
2024-09-03 12:44                             ` Theodore Ts'o
2024-09-03 13:15                               ` Yafang Shao
2024-09-03 14:03                                 ` Michal Hocko
2024-09-03 13:30                               ` Michal Hocko [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZtcPjBeF9TjCe8Sl@tiehlicka \
    --to=mhocko@suse.com \
    --cc=david@fromorbit.com \
    --cc=dchinner@redhat.com \
    --cc=kent.overstreet@linux.dev \
    --cc=laoar.shao@gmail.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=tytso@mit.edu \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).