From: Yu Zhao <yuzhao@google.com>
To: Mauricio Faria de Oliveira <mfo@canonical.com>
Cc: Minchan Kim <minchan@kernel.org>,
	"Huang, Ying" <ying.huang@intel.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Yang Shi <shy828301@gmail.com>, Miaohe Lin <linmiaohe@huawei.com>,
	linux-mm@kvack.org, linux-block@vger.kernel.org
Subject: Re: [PATCH v3] mm: fix race between MADV_FREE reclaim and blkdev direct IO read
Date: Wed, 2 Feb 2022 14:53:22 -0700
Message-ID: <Yfr9UkEtLSHL2qhZ@google.com>
In-Reply-To: <CAO9xwp3DNioiVPJNH9w-eXLxfVmTx9jBpOgq9eatpTFJTTg50Q@mail.gmail.com>

On Wed, Feb 02, 2022 at 06:27:47PM -0300, Mauricio Faria de Oliveira wrote:
> On Wed, Feb 2, 2022 at 4:56 PM Yu Zhao <yuzhao@google.com> wrote:
> >
> > On Mon, Jan 31, 2022 at 08:02:55PM -0300, Mauricio Faria de Oliveira wrote:
> > > Problem:
> > > =======
> >
> > Thanks for the update. A couple of quick questions:
> >
> > > Userspace might read the zero-page instead of actual data from a
> > > direct IO read on a block device if the buffers have been called
> > > madvise(MADV_FREE) on earlier (this is discussed below) due to a
> > > race between page reclaim on MADV_FREE and blkdev direct IO read.
> >
> > 1) would page migration be affected as well?
> 
> Could you please elaborate on the potential problem you considered?
> 
> I checked migrate_pages() -> try_to_migrate() holds the page lock,
> thus shouldn't race with shrink_page_list() -> with try_to_unmap()
> (where the issue with MADV_FREE is), but maybe I didn't get you
> correctly.

Could the race exist between DIO and migration? While DIO is writing
to a page, could migration unmap it and copy the data from this page
to a new page?

> > > @@ -1599,7 +1599,30 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
> > >
> > >                       /* MADV_FREE page check */
> > >                       if (!PageSwapBacked(page)) {
> > > -                             if (!PageDirty(page)) {
> > > +                             int ref_count, map_count;
> > > +
> > > +                             /*
> > > +                              * Synchronize with gup_pte_range():
> > > +                              * - clear PTE; barrier; read refcount
> > > +                              * - inc refcount; barrier; read PTE
> > > +                              */
> > > +                             smp_mb();
> > > +
> > > +                             ref_count = page_count(page);
> > > +                             map_count = page_mapcount(page);
> > > +
> > > +                             /*
> > > +                              * Order reads for page refcount and dirty flag;
> > > +                              * see __remove_mapping().
> > > +                              */
> > > +                             smp_rmb();
> >
> > 2) why does it need to order against __remove_mapping()? It seems to
> >    me that here (called from the reclaim path) it can't race with
> >    __remove_mapping() because both lock the page.
> 
> I'll improve that comment in v4.  The ordering isn't against __remove_mapping(),
> but actually because of an issue described in __remove_mapping()'s comments
> (something else that doesn't hold the page lock, just has a page reference, that
> may clear the page dirty flag then drop the reference; thus check ref,
> then dirty).

Got it. IIRC, get_user_pages() doesn't imply a write barrier. If so,
there should be an smp_wmb() on the other side:

	 * get_user_pages(&page);

	smp_wmb()

	 * SetPageDirty(page);
	 * put_page(page);

(__remove_mapping() doesn't need smp_[rw]mb() on either side because
it relies on page refcnt freeze and retesting.)
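
To make the pairing concrete, here is a minimal userspace sketch using C11
atomics. It is a model under stated assumptions, not kernel code: struct
model_page, dio_side() and may_discard() are hypothetical names, and the C11
fences only stand in for smp_wmb()/smp_rmb(). It covers only the "reference
dropped after dirtying" case; the window before the reference is taken is
what the PTE/refcount pairing with gup_pte_range() handles, and that is not
modeled here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two struct page fields discussed above. */
struct model_page {
	atomic_int refcount;
	atomic_bool dirty;
};

/*
 * Models the side that holds only a page reference (no page lock):
 * get_user_pages(); smp_wmb(); SetPageDirty(); put_page().
 */
static void dio_side(struct model_page *p)
{
	/* get_user_pages(): take an extra reference. */
	int old = atomic_fetch_add_explicit(&p->refcount, 1,
					    memory_order_relaxed);
	assert(old >= 1); /* the page must already be referenced */

	/* Assumption: a release fence stands in for smp_wmb(). */
	atomic_thread_fence(memory_order_release);

	/* SetPageDirty() */
	atomic_store_explicit(&p->dirty, true, memory_order_relaxed);

	/* put_page(): the drop is ordered after the dirty store. */
	atomic_fetch_sub_explicit(&p->refcount, 1, memory_order_release);
}

/*
 * Models the reclaim-side check: read the refcount first, then the
 * dirty flag, with a read barrier in between. If the refcount read
 * observes the drop done by dio_side(), the dirty store is visible.
 */
static bool may_discard(struct model_page *p, int map_count)
{
	int ref_count = atomic_load_explicit(&p->refcount,
					     memory_order_relaxed);

	/* Assumption: an acquire fence stands in for smp_rmb(). */
	atomic_thread_fence(memory_order_acquire);

	bool dirty = atomic_load_explicit(&p->dirty, memory_order_relaxed);

	return ref_count == map_count && !dirty;
}
```

With refcount == mapcount == 1 and a clean page, may_discard() returns true.
Once dio_side() has run, the refcount is back to 1 but the page is dirty, so
the same check returns false. Reading the two fields in the opposite order
would not give that guarantee.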

Thanks.
