From: Naoya Horiguchi <naoya.horiguchi@linux.dev>
To: Yang Shi <shy828301@gmail.com>
Cc: naoya.horiguchi@nec.com, hughd@google.com,
kirill.shutemov@linux.intel.com, willy@infradead.org,
peterx@redhat.com, osalvador@suse.de, akpm@linux-foundation.org,
linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
linux-kernel@vger.kernel.org
Subject: Re: [RFC v4 PATCH 0/6] Solve silent data loss caused by poisoned page cache (shmem/tmpfs)
Date: Tue, 19 Oct 2021 14:53:47 +0900 [thread overview]
Message-ID: <20211019055347.GD2268449@u2004> (raw)
In-Reply-To: <20211014191615.6674-1-shy828301@gmail.com>
On Thu, Oct 14, 2021 at 12:16:09PM -0700, Yang Shi wrote:
>
> When discussing the patch that splits page cache THP in order to offline the
> poisoned page, Noaya mentioned there is a bigger problem [1] that prevents this
> from working since the page cache page will be truncated if uncorrectable
> errors happen. By looking this deeper it turns out this approach (truncating
> poisoned page) may incur silent data loss for all non-readonly filesystems if
> the page is dirty. It may be worse for in-memory filesystem, e.g. shmem/tmpfs
> since the data blocks are actually gone.
>
> To solve this problem we could keep the poisoned dirty page in page cache then
> notify the users on any later access, e.g. page fault, read/write, etc. The
> clean page could be truncated as is since they can be reread from disk later on.
>
> The consequence is the filesystems may find poisoned page and manipulate it as
> healthy page since all the filesystems actually don't check if the page is
> poisoned or not in all the relevant paths except page fault. In general, we
> need make the filesystems be aware of poisoned page before we could keep the
> poisoned page in page cache in order to solve the data loss problem.
>
> To make filesystems be aware of poisoned page we should consider:
> - The page should be not written back: clearing dirty flag could prevent from
> writeback.
> - The page should not be dropped (it shows as a clean page) by drop caches or
> other callers: the refcount pin from hwpoison could prevent from invalidating
> (called by cache drop, inode cache shrinking, etc), but it doesn't avoid
> invalidation in DIO path.
> - The page should be able to get truncated/hole punched/unlinked: it works as it
> is.
> - Notify users when the page is accessed, e.g. read/write, page fault and other
> paths (compression, encryption, etc).
>
> The scope of the last one is huge since almost all filesystems need do it once
> a page is returned from page cache lookup. There are a couple of options to
> do it:
>
> 1. Check hwpoison flag for every path, the most straightforward way.
> 2. Return NULL for poisoned page from page cache lookup, the most callsites
> check if NULL is returned, this should have least work I think. But the
> error handling in filesystems just return -ENOMEM, the error code will incur
> confusion to the users obviously.
> 3. To improve #2, we could return error pointer, e.g. ERR_PTR(-EIO), but this
> will involve significant amount of code change as well since all the paths
> need check if the pointer is ERR or not just like option #1.
>
> I did prototype for both #1 and #3, but it seems #3 may require more changes
> than #1. For #3 ERR_PTR will be returned so all the callers need to check the
> return value otherwise invalid pointer may be dereferenced, but not all callers
> really care about the content of the page, for example, partial truncate which
> just sets the truncated range in one page to 0. So for such paths it needs
> additional modification if ERR_PTR is returned. And if the callers have their
> own way to handle the problematic pages we need to add a new FGP flag to tell
> FGP functions to return the pointer to the page.
>
> It may happen very rarely, but once it happens the consequence (data corruption)
> could be very bad and it is very hard to debug. It seems this problem had been
> slightly discussed before, but seems no action was taken at that time. [2]
>
> As the aforementioned investigation, it needs huge amount of work to solve
> the potential data loss for all filesystems. But it is much easier for
> in-memory filesystems and such filesystems actually suffer more than others
> since even the data blocks are gone due to truncating. So this patchset starts
> from shmem/tmpfs by taking option #1.
Thank you for the work. I have a few comment on todo...
>
> TODO:
> * The unpoison has been broken since commit 0ed950d1f281 ("mm,hwpoison: make
> get_hwpoison_page() call get_any_page()"), and this patch series make
> refcount check for unpoisoning shmem page fail.
It's OK to leave unpoison unsolved now. I'm working on this now (revising
v1 patch [1]), but I'm facing some race issue cauisng kernel panic with kernel
mode page fault, so I need to solve it.
[1] https://lore.kernel.org/linux-mm/20210614021212.223326-1-nao.horiguchi@gmail.com/
> * Expand to other filesystems. But I haven't heard feedback from filesystem
> developers yet.
I think that hugetlbfs can be a good next target because it's similar to
shmem in that it's in-memory filesystem.
Thanks,
Naoya Horiguchi
next prev parent reply other threads:[~2021-10-19 5:53 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-10-14 19:16 [RFC v4 PATCH 0/6] Solve silent data loss caused by poisoned page cache (shmem/tmpfs) Yang Shi
2021-10-14 19:16 ` [v4 PATCH 1/6] mm: hwpoison: remove the unnecessary THP check Yang Shi
2021-10-14 19:16 ` [v4 PATCH 2/6] mm: filemap: check if THP has hwpoisoned subpage for PMD page fault Yang Shi
2021-10-19 5:50 ` Naoya Horiguchi
2021-10-19 17:13 ` Yang Shi
2021-10-14 19:16 ` [v4 PATCH 3/6] mm: filemap: coding style cleanup for filemap_map_pmd() Yang Shi
2021-10-19 5:51 ` Naoya Horiguchi
2021-10-14 19:16 ` [v4 PATCH 4/6] mm: hwpoison: refactor refcount check handling Yang Shi
2021-10-14 19:16 ` [v4 PATCH 5/6] mm: shmem: don't truncate page if memory failure happens Yang Shi
2021-10-19 5:52 ` Naoya Horiguchi
2021-10-19 17:29 ` Yang Shi
2021-10-19 22:30 ` Naoya Horiguchi
2021-10-20 18:32 ` Yang Shi
2021-10-14 19:16 ` [v4 PATCH 6/6] mm: hwpoison: handle non-anonymous THP correctly Yang Shi
2021-10-15 20:28 ` [RFC v4 PATCH 0/6] Solve silent data loss caused by poisoned page cache (shmem/tmpfs) Andrew Morton
2021-10-15 21:48 ` Yang Shi
2021-10-19 5:53 ` Naoya Horiguchi [this message]
2021-10-19 17:32 ` Yang Shi
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20211019055347.GD2268449@u2004 \
--to=naoya.horiguchi@linux.dev \
--cc=akpm@linux-foundation.org \
--cc=hughd@google.com \
--cc=kirill.shutemov@linux.intel.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=naoya.horiguchi@nec.com \
--cc=osalvador@suse.de \
--cc=peterx@redhat.com \
--cc=shy828301@gmail.com \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.