Re: Freeing page flags - Matthew Wilcox

linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Matthew Wilcox <willy@infradead.org>
To: "Luís Henriques" <lhenriques@suse.de>
Cc: Xiubo Li <xiubli@redhat.com>, Jeff Layton <jlayton@kernel.org>,
	Josef Bacik <josef@toxicpanda.com>,
	linux-fsdevel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: Freeing page flags
Date: Fri, 13 May 2022 13:53:15 +0100	[thread overview]
Message-ID: <Yn5Uu/ZZkNfbdhGA@casper.infradead.org> (raw)
In-Reply-To: <87sfpd22kq.fsf@brahms.olymp>

On Fri, May 13, 2022 at 10:40:05AM +0100, Luís Henriques wrote:
> Matthew Wilcox <willy@infradead.org> writes:
> 
> > On Thu, May 12, 2022 at 10:41:41PM -0400, Josef Bacik wrote:
> >> On Thu, May 12, 2022 at 09:54:59PM +0100, Matthew Wilcox wrote:
> >> > The LWN writeup [1] on merging the MGLRU reminded me that I need to send
> >> > out a plan for removing page flags that we can do without.
> >> > 
> >> > 1. PG_error.  It's basically useless.  If the page was read successfully,
> >> > PG_uptodate is set.  If not, PG_uptodate is clear.  The page cache
> >> > doesn't use PG_error.  Some filesystems do, and we need to transition
> >> > them away from using it.
> >> >
> >> 
> >> What about writes?  A cursory look shows we don't clear Uptodate if we fail to
> >> write, which is correct I think.  The only way to indicate we had a write error
> >> to check later is the page error.
> >
> > On encountering a write error, we're supposed to call mapping_set_error(),
> > not SetPageError().
> >
> >> > 2. PG_private.  This tells us whether we have anything stored at
> >> > page->private.  We can just check if page->private is NULL or not.
> >> > No need to have this extra bit.  Again, there may be some filesystems
> >> > that are a bit wonky here, but I'm sure they're fixable.
> >> > 
> >> 
> >> At least for Btrfs we serialize the page->private with the private_lock, so we
> >> could probably just drop PG_private, but it's kind of nice to check first before
> >> we have to take the spin lock.  I suppose we can just do
> >> 
> >> if (page->private)
> >> 	// do lock and check thingy
> >
> > That's my hope!  I think btrfs is already using folio_attach_private() /
> > attach_page_private(), which makes everything easier.  Some filesystems
> > still manipulate page->private and PagePrivate by hand.
> 
> In ceph we've recently [1] spent a bit of time debugging a bug related
> with ->private not being NULL even though we expected it to be.  The
> solution found was to replace the check for NULL and use
> folio_test_private() instead, but we _may_ have not figured the whole
> thing out.
> 
> We assumed that folios were being recycled and not cleaned-up.  The values
> we were seeing in ->private looked like they were some sort of flags as
> only a few bits were set (e.g. 0x0200000):
> 
> [ 1672.578313] page:00000000e23868c1 refcount:2 mapcount:0 mapping:0000000022e0d3b4 index:0xd8 pfn:0x74e83
> [ 1672.581934] aops:ceph_aops [ceph] ino:10000016c9e dentry name:"faed"
> [ 1672.584457] flags: 0x4000000000000015(locked|uptodate|lru|zone=1)
> [ 1672.586878] raw: 4000000000000015 ffffea0001d3a108 ffffea0001d3a088 ffff888003491948
> [ 1672.589894] raw: 00000000000000d8 0000000000200000 00000002ffffffff 0000000000000000
> [ 1672.592935] page dumped because: VM_BUG_ON_FOLIO(1)
> 
> [1] https://lore.kernel.org/all/20220508061543.318394-1-xiubli@redhat.com/

I remember Jeff asking me about this problem a few days ago.  A folio
passed to you in ->dirty_folio() or ->invalidate_folio() belongs to
your filesystem.  Nobody else should be storing to the ->private field;
there's no race that could lead to it being freed while you see it.
There may, of course, be bugs that are overwriting folio->private, but
it's definitely not supposed to happen.  I agree with you that it looks
like a bit has been set (is it possibly bad RAM?)

We do use page->private in the buddy allocator, but that stores the order
of the page; it wouldn't be storing 1<<21.  PG flag 21 is PG_mlocked,
which seems like a weird one to be setting in the wrong field, so probably
not that.

Is it always bit 21 that gets set?

next prev parent reply	other threads:[~2022-05-13 12:53 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-05-12 20:54 Freeing page flags Matthew Wilcox
2022-05-13  2:41 ` Josef Bacik
2022-05-13  3:39   ` Matthew Wilcox
2022-05-13  9:40     ` Luís Henriques
2022-05-13 12:53       ` Matthew Wilcox [this message]
2022-05-13 13:18         ` Luís Henriques
2022-05-13 13:21         ` Jeff Layton
2022-05-13 13:38           ` Matthew Wilcox
2022-05-13 13:57             ` Jeff Layton
2022-05-17  0:34         ` Xiubo Li
2022-05-13 13:17     ` Josef Bacik
2022-05-13  3:46 ` Yu Zhao
2022-05-14  6:10 ` Eric Biggers
2022-05-23  6:38 ` Mike Rapoport
2022-06-10 17:39   ` David Hildenbrand
2022-05-30 16:54 ` David Hildenbrand

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Yn5Uu/ZZkNfbdhGA@casper.infradead.org \
    --to=willy@infradead.org \
    --cc=jlayton@kernel.org \
    --cc=josef@toxicpanda.com \
    --cc=lhenriques@suse.de \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=xiubli@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).