From: Yang Shi <shy828301@gmail.com>
To: Naoya Horiguchi <naoya.horiguchi@linux.dev>
Cc: "HORIGUCHI NAOYA(堀口 直也)" <naoya.horiguchi@nec.com>,
"Hugh Dickins" <hughd@google.com>,
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
"Matthew Wilcox" <willy@infradead.org>,
"Peter Xu" <peterx@redhat.com>,
"Oscar Salvador" <osalvador@suse.de>,
"Andrew Morton" <akpm@linux-foundation.org>,
"Linux MM" <linux-mm@kvack.org>,
"Linux FS-devel Mailing List" <linux-fsdevel@vger.kernel.org>,
"Linux Kernel Mailing List" <linux-kernel@vger.kernel.org>
Subject: Re: [RFC v4 PATCH 0/6] Solve silent data loss caused by poisoned page cache (shmem/tmpfs)
Date: Tue, 19 Oct 2021 10:32:04 -0700 [thread overview]
Message-ID: <CAHbLzkpBusb2wmifocodcpWwC5q=1g6Cpn8HHHXZNq2=PD977g@mail.gmail.com> (raw)
In-Reply-To: <20211019055347.GD2268449@u2004>
On Mon, Oct 18, 2021 at 10:53 PM Naoya Horiguchi
<naoya.horiguchi@linux.dev> wrote:
>
> On Thu, Oct 14, 2021 at 12:16:09PM -0700, Yang Shi wrote:
> >
> > When discussing the patch that splits page cache THP in order to offline the
> > poisoned page, Noaya mentioned there is a bigger problem [1] that prevents this
> > from working since the page cache page will be truncated if uncorrectable
> > errors happen. By looking this deeper it turns out this approach (truncating
> > poisoned page) may incur silent data loss for all non-readonly filesystems if
> > the page is dirty. It may be worse for in-memory filesystem, e.g. shmem/tmpfs
> > since the data blocks are actually gone.
> >
> > To solve this problem we could keep the poisoned dirty page in page cache then
> > notify the users on any later access, e.g. page fault, read/write, etc. The
> > clean page could be truncated as is since they can be reread from disk later on.
> >
> > The consequence is the filesystems may find poisoned page and manipulate it as
> > healthy page since all the filesystems actually don't check if the page is
> > poisoned or not in all the relevant paths except page fault. In general, we
> > need make the filesystems be aware of poisoned page before we could keep the
> > poisoned page in page cache in order to solve the data loss problem.
> >
> > To make filesystems be aware of poisoned page we should consider:
> > - The page should be not written back: clearing dirty flag could prevent from
> > writeback.
> > - The page should not be dropped (it shows as a clean page) by drop caches or
> > other callers: the refcount pin from hwpoison could prevent from invalidating
> > (called by cache drop, inode cache shrinking, etc), but it doesn't avoid
> > invalidation in DIO path.
> > - The page should be able to get truncated/hole punched/unlinked: it works as it
> > is.
> > - Notify users when the page is accessed, e.g. read/write, page fault and other
> > paths (compression, encryption, etc).
> >
> > The scope of the last one is huge since almost all filesystems need do it once
> > a page is returned from page cache lookup. There are a couple of options to
> > do it:
> >
> > 1. Check hwpoison flag for every path, the most straightforward way.
> > 2. Return NULL for poisoned page from page cache lookup, the most callsites
> > check if NULL is returned, this should have least work I think. But the
> > error handling in filesystems just return -ENOMEM, the error code will incur
> > confusion to the users obviously.
> > 3. To improve #2, we could return error pointer, e.g. ERR_PTR(-EIO), but this
> > will involve significant amount of code change as well since all the paths
> > need check if the pointer is ERR or not just like option #1.
> >
> > I did prototype for both #1 and #3, but it seems #3 may require more changes
> > than #1. For #3 ERR_PTR will be returned so all the callers need to check the
> > return value otherwise invalid pointer may be dereferenced, but not all callers
> > really care about the content of the page, for example, partial truncate which
> > just sets the truncated range in one page to 0. So for such paths it needs
> > additional modification if ERR_PTR is returned. And if the callers have their
> > own way to handle the problematic pages we need to add a new FGP flag to tell
> > FGP functions to return the pointer to the page.
> >
> > It may happen very rarely, but once it happens the consequence (data corruption)
> > could be very bad and it is very hard to debug. It seems this problem had been
> > slightly discussed before, but seems no action was taken at that time. [2]
> >
> > As the aforementioned investigation, it needs huge amount of work to solve
> > the potential data loss for all filesystems. But it is much easier for
> > in-memory filesystems and such filesystems actually suffer more than others
> > since even the data blocks are gone due to truncating. So this patchset starts
> > from shmem/tmpfs by taking option #1.
>
> Thank you for the work. I have a few comment on todo...
>
> >
> > TODO:
> > * The unpoison has been broken since commit 0ed950d1f281 ("mm,hwpoison: make
> > get_hwpoison_page() call get_any_page()"), and this patch series make
> > refcount check for unpoisoning shmem page fail.
>
> It's OK to leave unpoison unsolved now. I'm working on this now (revising
> v1 patch [1]), but I'm facing some race issue cauisng kernel panic with kernel
> mode page fault, so I need to solve it.
>
> [1] https://lore.kernel.org/linux-mm/20210614021212.223326-1-nao.horiguchi@gmail.com/
Thanks.
>
> > * Expand to other filesystems. But I haven't heard feedback from filesystem
> > developers yet.
>
> I think that hugetlbfs can be a good next target because it's similar to
> shmem in that it's in-memory filesystem.
Yeah, I agree. Will look into it later. Thanks for the suggestion.
>
> Thanks,
> Naoya Horiguchi
prev parent reply other threads:[~2021-10-19 17:33 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-10-14 19:16 [RFC v4 PATCH 0/6] Solve silent data loss caused by poisoned page cache (shmem/tmpfs) Yang Shi
2021-10-14 19:16 ` [v4 PATCH 1/6] mm: hwpoison: remove the unnecessary THP check Yang Shi
2021-10-14 19:16 ` [v4 PATCH 2/6] mm: filemap: check if THP has hwpoisoned subpage for PMD page fault Yang Shi
2021-10-19 5:50 ` Naoya Horiguchi
2021-10-19 17:13 ` Yang Shi
2021-10-14 19:16 ` [v4 PATCH 3/6] mm: filemap: coding style cleanup for filemap_map_pmd() Yang Shi
2021-10-19 5:51 ` Naoya Horiguchi
2021-10-14 19:16 ` [v4 PATCH 4/6] mm: hwpoison: refactor refcount check handling Yang Shi
2021-10-14 19:16 ` [v4 PATCH 5/6] mm: shmem: don't truncate page if memory failure happens Yang Shi
2021-10-19 5:52 ` Naoya Horiguchi
2021-10-19 17:29 ` Yang Shi
2021-10-19 22:30 ` Naoya Horiguchi
2021-10-20 18:32 ` Yang Shi
2021-10-14 19:16 ` [v4 PATCH 6/6] mm: hwpoison: handle non-anonymous THP correctly Yang Shi
2021-10-15 20:28 ` [RFC v4 PATCH 0/6] Solve silent data loss caused by poisoned page cache (shmem/tmpfs) Andrew Morton
2021-10-15 21:48 ` Yang Shi
2021-10-19 5:53 ` Naoya Horiguchi
2021-10-19 17:32 ` Yang Shi [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAHbLzkpBusb2wmifocodcpWwC5q=1g6Cpn8HHHXZNq2=PD977g@mail.gmail.com' \
--to=shy828301@gmail.com \
--cc=akpm@linux-foundation.org \
--cc=hughd@google.com \
--cc=kirill.shutemov@linux.intel.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=naoya.horiguchi@linux.dev \
--cc=naoya.horiguchi@nec.com \
--cc=osalvador@suse.de \
--cc=peterx@redhat.com \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).