From: Andrea Righi <andrea.righi@canonical.com>
To: Matthew Wilcox <willy@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mm: swap: do not wait for lock_page() in unuse_pte_range()
Date: Wed, 22 Jul 2020 20:48:41 +0200
Message-ID: <20200722184841.GC841369@xps-13>
In-Reply-To: <20200722180425.GP15516@casper.infradead.org>

On Wed, Jul 22, 2020 at 07:04:25PM +0100, Matthew Wilcox wrote:
> On Wed, Jul 22, 2020 at 07:44:36PM +0200, Andrea Righi wrote:
> > Waiting for lock_page() with mm->mmap_sem held in unuse_pte_range() can
> > lead to stalls while running swapoff (i.e., not being able to ssh into
> > the system, inability to execute simple commands like 'ps', etc.).
> >
> > Replace lock_page() with trylock_page() and release mm->mmap_sem if we
> > fail to lock it, giving other tasks a chance to continue and prevent
> > the stall.
>
> I think you've removed the warning at the expense of turning a stall
> into a potential livelock.
>
> > @@ -1977,7 +1977,11 @@ static int unuse_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
> >  			return -ENOMEM;
> >  		}
> >
> > -		lock_page(page);
> > +		if (!trylock_page(page)) {
> > +			ret = -EAGAIN;
> > +			put_page(page);
> > +			goto out;
> > +		}
>
> If you look at the patterns we have elsewhere in the MM for doing
> this kind of thing (eg truncate_inode_pages_range()), we iterate over the
> entire range, take care of the easy cases, then go back and deal with the
> hard cases later.
>
> So that would argue for skipping any page that we can't trylock, but
> continue over at least the VMA, and quite possibly the entire MM until
> we're convinced that we have unused all of the required pages.
>
> Another thing we could do is drop the MM semaphore _here_, sleep on this
> page until it's unlocked, then go around again.
>
> 	if (!trylock_page(page)) {
> 		mmap_read_unlock(mm);
> 		lock_page(page);
> 		unlock_page(page);
> 		put_page(page);
> 		ret = -EAGAIN;
> 		goto out;
> 	}
>
> (I haven't checked the call paths; maybe you can't do this because
> sometimes it's called with the mmap sem held for write)
>
> Also, if we're trying to scale this better, there are some fun
> workloads where readers block writers who block subsequent readers
> and we shouldn't wait for I/O in swapin_readahead(). See patches like
> 6b4c9f4469819a0c1a38a0a4541337e0f9bf6c11 for more on this kind of thing.

Thanks for the review, Matthew. I'll see if I can find a better solution
following your useful hints!

-Andrea
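
For illustration only, the "take care of the easy cases, then go back and
deal with the hard cases" pattern mentioned above could be sketched in
userspace roughly as follows. This is not kernel code: pthread mutexes
stand in for page locks, and struct item, process_easy() and
process_hard() are hypothetical names made up for this example.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a page: a lock plus a "handled" flag. */
struct item {
	pthread_mutex_t lock;
	bool done;	/* only written by the thread running the passes */
};

/*
 * First pass: handle every item whose lock is free, skip the rest.
 * Returns the number of items skipped because trylock failed.
 */
static size_t process_easy(struct item *items, size_t n)
{
	size_t skipped = 0;

	for (size_t i = 0; i < n; i++) {
		if (items[i].done)
			continue;
		if (pthread_mutex_trylock(&items[i].lock) != 0) {
			skipped++;	/* contended: leave it for later */
			continue;
		}
		items[i].done = true;
		pthread_mutex_unlock(&items[i].lock);
	}
	return skipped;
}

/* Second pass: block on each remaining item, one at a time. */
static void process_hard(struct item *items, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		int err;

		if (items[i].done)
			continue;
		err = pthread_mutex_lock(&items[i].lock);
		assert(err == 0);
		(void)err;	/* silence unused warning with NDEBUG */
		items[i].done = true;
		pthread_mutex_unlock(&items[i].lock);
	}
}
```

A caller would run process_easy() over the whole range first and only call
process_hard() if it returned non-zero. In the swapoff case, the blocking
second pass would presumably have to drop the mmap lock before sleeping on
a page, as in the snippet quoted above; that part is not modelled here.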