From: Mike Kravetz <mike.kravetz@oracle.com>
To: Jiaqi Yan <jiaqiyan@google.com>, naoya.horiguchi@nec.com
Cc: songmuchun@bytedance.com, shy828301@gmail.com,
linmiaohe@huawei.com, akpm@linux-foundation.org,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
duenwen@google.com, axelrasmussen@google.com,
jthoughton@google.com
Subject: Re: [PATCH v1 1/3] mm/hwpoison: find subpage in hugetlb HWPOISON list
Date: Fri, 19 May 2023 15:42:14 -0700 [thread overview]
Message-ID: <20230519224214.GB3581@monkey> (raw)
In-Reply-To: <CACw3F52zNguJ-MvXOAJuMK+JfreLxorvHDPwO8w_gQdOzWj7eA@mail.gmail.com>
On 05/19/23 13:54, Jiaqi Yan wrote:
> On Wed, May 17, 2023 at 4:53 PM Mike Kravetz <mike.kravetz@oracle.com> wrote:
> >
> > On 05/17/23 16:09, Jiaqi Yan wrote:
> > > Adds the functionality to search a subpage's corresponding raw_hwp_page
> > > in hugetlb page's HWPOISON list. This functionality can also tell if a
> > > subpage is a raw HWPOISON page.
> > >
> > > Exports this functionality to be immediately used in the read operation
> > > for hugetlbfs.
> > >
> > > Signed-off-by: Jiaqi Yan <jiaqiyan@google.com>
> > > ---
> > > include/linux/mm.h | 23 +++++++++++++++++++++++
> > > mm/memory-failure.c | 26 ++++++++++++++++----------
> > > 2 files changed, 39 insertions(+), 10 deletions(-)
> > >
> > > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > > index 27ce77080c79..f191a4119719 100644
> > > --- a/include/linux/mm.h
> > > +++ b/include/linux/mm.h
> >
> > Any reason why you decided to add the following to linux/mm.h instead of
> > linux/hugetlb.h? Since it is hugetlb specific I would have thought
> > hugetlb.h was more appropriate.
> >
> > > @@ -3683,6 +3683,29 @@ enum mf_action_page_type {
> > > */
> > > extern const struct attribute_group memory_failure_attr_group;
> > >
> > > +#ifdef CONFIG_HUGETLB_PAGE
> > > +/*
> > > + * Struct raw_hwp_page represents information about "raw error page",
> > > + * constructing singly linked list from ->_hugetlb_hwpoison field of folio.
> > > + */
> > > +struct raw_hwp_page {
> > > + struct llist_node node;
> > > + struct page *page;
> > > +};
> > > +
> > > +static inline struct llist_head *raw_hwp_list_head(struct folio *folio)
> > > +{
> > > + return (struct llist_head *)&folio->_hugetlb_hwpoison;
> > > +}
> > > +
> > > +/*
> > > + * Given @subpage, a raw page in a hugepage, find its location in @folio's
> > > + * _hugetlb_hwpoison list. Return NULL if @subpage is not in the list.
> > > + */
> > > +struct raw_hwp_page *find_raw_hwp_page(struct folio *folio,
> > > + struct page *subpage);
> > > +#endif
> > > +
> > > #if defined(CONFIG_TRANSPARENT_HUGEPAGE) || defined(CONFIG_HUGETLBFS)
> > > extern void clear_huge_page(struct page *page,
> > > unsigned long addr_hint,
> > > diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> > > index 5b663eca1f29..c49e6c2d1f07 100644
> > > --- a/mm/memory-failure.c
> > > +++ b/mm/memory-failure.c
> > > @@ -1818,18 +1818,24 @@ EXPORT_SYMBOL_GPL(mf_dax_kill_procs);
> > > #endif /* CONFIG_FS_DAX */
> > >
> > > #ifdef CONFIG_HUGETLB_PAGE
> > > -/*
> > > - * Struct raw_hwp_page represents information about "raw error page",
> > > - * constructing singly linked list from ->_hugetlb_hwpoison field of folio.
> > > - */
> > > -struct raw_hwp_page {
> > > - struct llist_node node;
> > > - struct page *page;
> > > -};
> > >
> > > -static inline struct llist_head *raw_hwp_list_head(struct folio *folio)
> > > +struct raw_hwp_page *find_raw_hwp_page(struct folio *folio,
> > > + struct page *subpage)
> > > {
> > > - return (struct llist_head *)&folio->_hugetlb_hwpoison;
> > > + struct llist_node *t, *tnode;
> > > + struct llist_head *raw_hwp_head = raw_hwp_list_head(folio);
> > > + struct raw_hwp_page *hwp_page = NULL;
> > > + struct raw_hwp_page *p;
> > > +
> > > + llist_for_each_safe(tnode, t, raw_hwp_head->first) {
> >
> > IIUC, in rare error cases a hugetlb page can be poisoned WITHOUT a
> > raw_hwp_list. This is indicated by the hugetlb page specific flag
> > RawHwpUnreliable or folio_test_hugetlb_raw_hwp_unreliable().
> >
> > Looks like this routine does not consider that case. Seems like it should
> > always return the passed subpage if folio_test_hugetlb_raw_hwp_unreliable()
> > is true?
>
> Thanks for catching this. I wonder should this routine consider
> RawHwpUnreliable or should the caller do.
>
> find_raw_hwp_page now returns raw_hwp_page* in the llist entry to
> caller (valid one at the moment), but once RawHwpUnreliable is set,
> all the raw_hwp_page in the llist will be kfree(), and the returned
> value becomes dangling pointer to caller (if the caller holds that
> caller long enough). Maybe returning a bool would be safer to the
> caller? If the routine returns bool, then checking RawHwpUnreliable
> can definitely be within the routine.
I think the check for RawHwpUnreliable should be within this routine.
Looking closer at the code, I do not see any way to synchronize this.
It looks like manipulation in the memory-failure code would be
synchronized via the mf_mutex. However, I do not see how traversal and
freeing of the raw_hwp_list called from __update_and_free_hugetlb_folio
is synchronized against memory-failure code modifying the list.
Naoya, can you provide some thoughts?
>
> Another option is, this routine simply doesn one thing: find a
> raw_hwp_page in raw_hwp_list for a subpage. But the caller needs to 1)
> test RawHwpUnreliable before calls into the routine, and 2) test
> RawHwpUnreliable before access returned raw_hwp_page*. I think 2nd
> option will be error-prone and the 1st option is a better one.
>
> Maybe I am over-thinking. What do you think?
I think racing code accessing the raw_hwp_list is very unlikely.
However, it is possible and should be considered.
--
Mike Kravetz
>
> > --
> > Mike Kravetz
> >
> > > + p = container_of(tnode, struct raw_hwp_page, node);
> > > + if (subpage == p->page) {
> > > + hwp_page = p;
> > > + break;
> > > + }
> > > + }
> > > +
> > > + return hwp_page;
> > > }
> > >
> > > static unsigned long __folio_free_raw_hwp(struct folio *folio, bool move_flag)
> > > --
> > > 2.40.1.606.ga4b1b128d6-goog
> > >
next prev parent reply other threads:[~2023-05-19 22:43 UTC|newest]
Thread overview: 30+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-05-17 16:09 [PATCH v1 0/3] Improve hugetlbfs read on HWPOISON hugepages Jiaqi Yan
2023-05-17 16:09 ` [PATCH v1 1/3] mm/hwpoison: find subpage in hugetlb HWPOISON list Jiaqi Yan
2023-05-17 23:53 ` Mike Kravetz
2023-05-19 20:54 ` Jiaqi Yan
2023-05-19 22:42 ` Mike Kravetz [this message]
2023-05-22 4:50 ` HORIGUCHI NAOYA(堀口 直也)
2023-05-22 18:22 ` Jiaqi Yan
2023-05-23 2:43 ` HORIGUCHI NAOYA(堀口 直也)
2023-05-26 0:28 ` Jiaqi Yan
2023-06-10 5:48 ` Jiaqi Yan
2023-06-12 4:19 ` Naoya Horiguchi
2023-06-16 21:19 ` Jiaqi Yan
2023-06-16 23:34 ` Mike Kravetz
2023-06-17 2:18 ` Jiaqi Yan
2023-06-17 22:59 ` Mike Kravetz
2023-06-19 8:23 ` Naoya Horiguchi
2023-06-20 18:05 ` Mike Kravetz
2023-06-20 22:39 ` Mike Kravetz
2023-06-23 0:45 ` Jiaqi Yan
2023-06-23 4:19 ` Mike Kravetz
2023-06-23 16:40 ` Jiaqi Yan
2023-05-17 16:09 ` [PATCH v1 2/3] hugetlbfs: improve read HWPOISON hugepage Jiaqi Yan
2023-05-18 22:18 ` Mike Kravetz
2023-05-19 20:54 ` Jiaqi Yan
2023-05-17 16:09 ` [PATCH v1 3/3] selftests/mm: add tests for HWPOISON hugetlbfs read Jiaqi Yan
2023-05-23 7:35 ` kernel test robot
2023-05-17 23:30 ` [PATCH v1 0/3] Improve hugetlbfs read on HWPOISON hugepages Mike Kravetz
2023-05-18 16:02 ` Jiaqi Yan
2023-05-18 16:10 ` Jiaqi Yan
2023-05-18 22:24 ` Mike Kravetz
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20230519224214.GB3581@monkey \
--to=mike.kravetz@oracle.com \
--cc=akpm@linux-foundation.org \
--cc=axelrasmussen@google.com \
--cc=duenwen@google.com \
--cc=jiaqiyan@google.com \
--cc=jthoughton@google.com \
--cc=linmiaohe@huawei.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=naoya.horiguchi@nec.com \
--cc=shy828301@gmail.com \
--cc=songmuchun@bytedance.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.