linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Matthew Wilcox <willy@infradead.org>
To: Hugh Dickins <hughd@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Christoph Hellwig <hch@infradead.org>, Jan Kara <jack@suse.cz>,
	William Kucharski <william.kucharski@oracle.com>,
	linux-fsdevel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH next 3/3] shmem: Fix "Unused swap" messages
Date: Tue, 4 Jan 2022 08:12:25 +0000	[thread overview]
Message-ID: <YdQBaYbQNC9laavZ@casper.infradead.org> (raw)
In-Reply-To: <2fb90c3-5285-56ca-65af-439c4527dbe4@google.com>

On Mon, Jan 03, 2022 at 08:43:13PM -0800, Hugh Dickins wrote:
> On Tue, 4 Jan 2022, Matthew Wilcox wrote:
> > So let me try again.  My concern was that we might be trying to store
> > a 2MB entry which had a non-NULL 'expected' entry which was found in a
> > 4k (ie single-index) slot within the 512 entries (as the first non-NULL
> > entry in that range), and we'd then store the 2MB entry into a
> > single-entry slot.
> 
> Thanks, that sounds much more like how I was imagining it.  And the
> two separate tests much more understandable than twice round a loop.
> 
> > Now, maybe that can't happen for higher-level reasons, and I don't need
> > to worry about it.  But I feel like we should check for that?  Anyway,
> > I think the right fix is this:
> 
> I don't object to (cheaply) excluding a possibility at the low level,
> even if there happen to be high level reasons why it cannot happen at
> present.
> 
> But I don't think your second xas_find_conflict() gives quite as much
> assurance as you're expecting of it.  Since xas_find_conflict() skips
> NULLs, the conflict test would pass if there were 511 NULLs and one
> 4k entry matching the expected entry, but a 2MB entry to be inserted
> (the "small and large folios" case in my earlier ramblings).
> 
> I think xas_find_conflict() is not really up to giving that assurance;
> and maybe better than a second call to xas_find_conflict(), might be
> a VM_BUG_ON earlier, to say that if 'expected' is non-NULL, then the
> range is PAGE_SIZE - or something more folio-friendly like that?
> That would give you the extra assurance you're looking for,
> wouldn't it?

I actually don't need that assurance.  The wretch who wrote the
documentation for xas_find_conflict() needs to be put on a diet of
bread and water, but the promise that it should make is that once it
returns NULL, the xa_state is restored to where it was before the first
call to xas_find_conflict().  So if you're trying to store a 2MB entry
and the only swap entry in the 2MB range is a 4KB entry, the xa_state
gets walked back up to point at the original 512-aligned entry and the
subsequent xas_store() will free the node containing the swap entry.

> For now and the known future, shmem only swaps PAGE_SIZE, out and in;
> maybe someone will want to change that one day, then xas_find_conflict()
> could be enhanced to know more of what's expected.

Good to know.

> > 
> > +++ b/mm/shmem.c
> > @@ -733,11 +733,12 @@ static int shmem_add_to_page_cache(struct page *page,
> >         cgroup_throttle_swaprate(page, gfp);
> > 
> >         do {
> > -               void *entry;
> >                 xas_lock_irq(&xas);
> > -               while ((entry = xas_find_conflict(&xas)) != NULL) {
> > -                       if (entry == expected)
> > -                               continue;
> > +               if (expected != xas_find_conflict(&xas)) {
> > +                       xas_set_err(&xas, -EEXIST);
> > +                       goto unlock;
> > +               }
> > +               if (expected && xas_find_conflict(&xas)) {
> >                         xas_set_err(&xas, -EEXIST);
> >                         goto unlock;
> >                 }
> 
> That also worried me because, if the second xas_find_conflict()
> is to make any sense, the first must have had a side-effect on xas:
> are those side-effects okay for the subsequent xas_store(&xas, page)?
> You'll know that they are, but it's not obvious to the reader.
> 
> > 
> > which says what I mean.  I certainly didn't intend to imply that I
> > was expecting to see 512 consecutive entries which were all identical,
> > which would be the idiomatic way to read the code that was there before.
> > I shouldn't've tried to be so concise.
> > 
> > (If you'd rather I write any of this differently, I'm more than happy
> > to change it)
> 
> No, I'm happy with the style of it, just discontented that the second
> xas_find_conflict() pretends to more than it provides (I think).
> 
> Hugh

      reply	other threads:[~2022-01-04  8:12 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-01-03  1:35 [PATCH next 3/3] shmem: Fix "Unused swap" messages Hugh Dickins
2022-01-03 15:36 ` Matthew Wilcox
2022-01-03 20:10   ` Hugh Dickins
2022-01-04  1:41     ` Matthew Wilcox
2022-01-04  4:43       ` Hugh Dickins
2022-01-04  8:12         ` Matthew Wilcox [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YdQBaYbQNC9laavZ@casper.infradead.org \
    --to=willy@infradead.org \
    --cc=akpm@linux-foundation.org \
    --cc=hch@infradead.org \
    --cc=hughd@google.com \
    --cc=jack@suse.cz \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=william.kucharski@oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).