linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Matthew Wilcox <willy@infradead.org>
To: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Christoph Hellwig <hch@infradead.org>,
	darrick.wong@oracle.com, david@fromorbit.com, yukuai3@huawei.com,
	linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org
Subject: Re: Splitting an iomap_page
Date: Tue, 8 Sep 2020 12:43:28 +0100	[thread overview]
Message-ID: <20200908114328.GE27537@casper.infradead.org> (raw)
In-Reply-To: <20200907113324.2uixo4u5elveoysf@box>

On Mon, Sep 07, 2020 at 02:33:24PM +0300, Kirill A. Shutemov wrote:
> On Fri, Sep 04, 2020 at 04:37:24AM +0100, Matthew Wilcox wrote:
> > Kirill, do I have the handling of split_huge_page() failure correct?
> > It seems reasonable to me -- unlock the page and drop the reference,
> > hoping that somebody else will not have a reference to the page by the
> > next time we try to split it.  Or they will split it for us.  There's a
> > livelock opportunity here, but I'm not sure it's worse than the one in
> > a holepunch scenario.
> 
> The worst case scenario is when the page is referenced (directly or
> indirectly) by the caller. It this case we would end up with endless loop.
> I'm not sure how we can guarantee that this will never happen.

I don't see a way for that to happen at the moment.  We're pretty
careful not to take references on multiple pages at once in these paths.
I've fixed the truncate paths to only take one reference per THP too.

I was thinking that if livelock becomes a problem, we could (ab)use the
THP destructor mechanism somewhat like this:

Add
	[TRANSHUGE_PAGE_SPLIT] = split_transhuge_page,
to the compound_page_dtors array.

New function split_huge_page_wait() which, if !can_split_huge_page()
first checks if the dtor is already set to TRANSHUGE_PAGE_SPLIT.  If so,
it returns to its caller, reporting failure (so that it will put its
reference to the page).  Then it sets the dtor to TRANSHUGE_PAGE_SPLIT
and sets page refcount to 1.  It goes to sleep on the page.

split_transhuge_page() first has to check if the refcount went to 0
due to mapcount being reduced.  If so, it resets the refcount to 1 and
returns to the caller.  If not, it freezes the page and wakes the task
above which is sleeping in split_huge_page_wait().

It's complicated and I don't love it.  But it might solve livelock, should
we need to do it.  It wouldn't prevent us from an indefinite wait if the
caller of split_huge_page_wait() has more than one reference to this page.
That's better than a livelock though.

  reply	other threads:[~2020-09-08 12:29 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-08-21 14:40 Splitting an iomap_page Matthew Wilcox
2020-08-22  6:06 ` Christoph Hellwig
2020-08-24 16:06   ` Darrick J. Wong
2020-09-04  3:37 ` Matthew Wilcox
2020-09-07 11:33   ` Kirill A. Shutemov
2020-09-08 11:43     ` Matthew Wilcox [this message]
2020-09-09 11:57       ` Kirill A. Shutemov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200908114328.GE27537@casper.infradead.org \
    --to=willy@infradead.org \
    --cc=darrick.wong@oracle.com \
    --cc=david@fromorbit.com \
    --cc=hch@infradead.org \
    --cc=kirill@shutemov.name \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=yukuai3@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).