linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jane Chu <jane.chu@oracle.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: david <david@fromorbit.com>,
	"Darrick J. Wong" <djwong@kernel.org>,
	Christoph Hellwig <hch@infradead.org>,
	Vishal L Verma <vishal.l.verma@intel.com>,
	Dave Jiang <dave.jiang@intel.com>,
	Alasdair Kergon <agk@redhat.com>,
	Mike Snitzer <snitzer@redhat.com>,
	device-mapper development <dm-devel@redhat.com>,
	"Weiny, Ira" <ira.weiny@intel.com>,
	Matthew Wilcox <willy@infradead.org>,
	Vivek Goyal <vgoyal@redhat.com>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Linux NVDIMM <nvdimm@lists.linux.dev>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	linux-xfs <linux-xfs@vger.kernel.org>, X86 ML <x86@kernel.org>,
	"luto@kernel.org" <luto@kernel.org>,
	"peterz@infradead.org" <peterz@infradead.org>,
	"dave.hansen@intel.com" <dave.hansen@intel.com>
Subject: Re: [PATCH v7 3/6] mce: fix set_mce_nospec to always unmap the whole page
Date: Fri, 15 Apr 2022 16:18:57 +0000	[thread overview]
Message-ID: <b3f0bfcd-9e7c-cc71-e91d-d0f28053dea9@oracle.com> (raw)
In-Reply-To: <CAPcyv4jpwzMPKtzzc=DEbC340+zmzXkj+QtPVxfYbraskLKv8g@mail.gmail.com>

On 4/13/2022 7:32 PM, Dan Williams wrote:
> On Wed, Apr 13, 2022 at 4:36 PM Jane Chu <jane.chu@oracle.com> wrote:
>>
>> On 4/11/2022 4:27 PM, Dan Williams wrote:
>>> On Tue, Apr 5, 2022 at 12:48 PM Jane Chu <jane.chu@oracle.com> wrote:
>>>>
>>>> The set_memory_uc() approach doesn't work well in all cases.
>>>> For example, when "The VMM unmapped the bad page from guest
>>>> physical space and passed the machine check to the guest."
>>>> "The guest gets virtual #MC on an access to that page.
>>>>    When the guest tries to do set_memory_uc() and instructs
>>>>    cpa_flush() to do clean caches that results in taking another
>>>>    fault / exception perhaps because the VMM unmapped the page
>>>>    from the guest."
>>>>
>>>> Since the driver has special knowledge to handle NP or UC,
>>>
>>> I think a patch is needed before this one to make this statement true? I.e.:
>>>
>>> diff --git a/drivers/acpi/nfit/mce.c b/drivers/acpi/nfit/mce.c
>>> index ee8d9973f60b..11641f55025a 100644
>>> --- a/drivers/acpi/nfit/mce.c
>>> +++ b/drivers/acpi/nfit/mce.c
>>> @@ -32,6 +32,7 @@ static int nfit_handle_mce(struct notifier_block
>>> *nb, unsigned long val,
>>>            */
>>>           mutex_lock(&acpi_desc_lock);
>>>           list_for_each_entry(acpi_desc, &acpi_descs, list) {
>>> +               unsigned int align = 1UL << MCI_MISC_ADDR_LSB(mce->misc);
>>>                   struct device *dev = acpi_desc->dev;
>>>                   int found_match = 0;
>>>
>>> @@ -63,8 +64,7 @@ static int nfit_handle_mce(struct notifier_block
>>> *nb, unsigned long val,
>>>
>>>                   /* If this fails due to an -ENOMEM, there is little we can do */
>>>                   nvdimm_bus_add_badrange(acpi_desc->nvdimm_bus,
>>> -                               ALIGN(mce->addr, L1_CACHE_BYTES),
>>> -                               L1_CACHE_BYTES);
>>> +                                       ALIGN(mce->addr, align), align);
>>>                   nvdimm_region_notify(nfit_spa->nd_region,
>>>                                   NVDIMM_REVALIDATE_POISON);
>>>
>>
>> Dan, I tried the above change, and this is what I got after injecting 8
>> back-to-back poisons, then read them and received  SIGBUS/BUS_MCEERR_AR,
>> then repair via the v7 patch which works until this change is added.
>>
>> [ 6240.955331] nfit ACPI0012:00: XXX, align = 100
>> [ 6240.960300] nfit ACPI0012:00: XXX, ALIGN(mce->addr,
>> L1_CACHE_BYTES)=1851600400, L1_CACHE_BYTES=40, ALIGN(mce->addr,
>> align)=1851600400
>> [..]
>> [ 6242.052277] nfit ACPI0012:00: XXX, align = 100
>> [ 6242.057243] nfit ACPI0012:00: XXX, ALIGN(mce->addr,
>> L1_CACHE_BYTES)=1851601000, L1_CACHE_BYTES=40, ALIGN(mce->addr,
>> align)=1851601000
>> [..]
>> [ 6244.917198] nfit ACPI0012:00: XXX, align = 1000
>> [ 6244.922258] nfit ACPI0012:00: XXX, ALIGN(mce->addr,
>> L1_CACHE_BYTES)=1851601200, L1_CACHE_BYTES=40, ALIGN(mce->addr,
>> align)=1851602000
>> [..]
>>
>> All 8 poisons remain uncleared.
>>
>> Without further investigation, I don't know why the failure.
>> Could we mark this change to a follow-on task?
> 
> Perhaps a bit more debug before kicking this can down the road...
> 
> I'm worried that this means that the driver is not accurately tracking
> poison data For example, that last case the hardware is indicating a
> full page clobber, but the old code would only track poison from
> 1851601200 to 1851601400 (i.e. the first 512 bytes from the base error
> address).

I would appear so, but the old code tracking seems to be working 
correctly. For example, injecting 8 back-to-back poison, the
/sys/devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0012:00/ndbus0/region0/badblocks
cpatures all 8 of them, that spans 2K range, right?  I never had issue 
about missing poison in my tests.

> 
> Oh... wait, I think there is a second bug here, that ALIGN should be
> ALIGN_DOWN(). Does this restore the result you expect?

That's it, ALIGN_DOWN fixed the issue, thanks!!
I'll add a patch, suggested-by you.

> 
> diff --git a/drivers/acpi/nfit/mce.c b/drivers/acpi/nfit/mce.c
> index ee8d9973f60b..d7a52238a741 100644
> --- a/drivers/acpi/nfit/mce.c
> +++ b/drivers/acpi/nfit/mce.c
> @@ -63,8 +63,7 @@ static int nfit_handle_mce(struct notifier_block
> *nb, unsigned long val,
> 
>                  /* If this fails due to an -ENOMEM, there is little we can do */
>                  nvdimm_bus_add_badrange(acpi_desc->nvdimm_bus,
> -                               ALIGN(mce->addr, L1_CACHE_BYTES),
> -                               L1_CACHE_BYTES);
> +                                       ALIGN_DOWN(mce->addr, align), align);
>                  nvdimm_region_notify(nfit_spa->nd_region,
>                                  NVDIMM_REVALIDATE_POISON);
> 
> 
>> The driver knows a lot about how to clear poisons besides hardcoding
>> poison alignment to 0x40 bytes.
> 
> It does, but the badblocks tracking should still be reliable, and if
> it's not reliable I expect there are cases where recovery_write() will
> not be triggered because the driver will not fail the
> dax_direct_access() attempt.

thanks!
-jane


  reply	other threads:[~2022-04-15 16:19 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-04-05 19:47 [PATCH v7 0/6] DAX poison recovery Jane Chu
2022-04-05 19:47 ` [PATCH v7 1/6] x86/mm: fix comment Jane Chu
2022-04-11 22:07   ` Dan Williams
2022-04-12  9:53   ` Borislav Petkov
2022-04-14  1:00     ` Jane Chu
2022-04-14  8:44       ` Borislav Petkov
2022-04-14 21:54         ` Jane Chu
2022-04-05 19:47 ` [PATCH v7 2/6] x86/mce: relocate set{clear}_mce_nospec() functions Jane Chu
2022-04-06  5:01   ` Christoph Hellwig
2022-04-11 22:20   ` Dan Williams
2022-04-14  0:56     ` Jane Chu
2022-04-05 19:47 ` [PATCH v7 3/6] mce: fix set_mce_nospec to always unmap the whole page Jane Chu
2022-04-06  5:02   ` Christoph Hellwig
2022-04-11 23:27   ` Dan Williams
2022-04-13 23:36     ` Jane Chu
2022-04-14  2:32       ` Dan Williams
2022-04-15 16:18         ` Jane Chu [this message]
2022-04-12 10:07   ` Borislav Petkov
2022-04-13 23:41     ` Jane Chu
2022-04-05 19:47 ` [PATCH v7 4/6] dax: add DAX_RECOVERY flag and .recovery_write dev_pgmap_ops Jane Chu
2022-04-06  5:19   ` Christoph Hellwig
2022-04-06 17:32     ` Jane Chu
2022-04-06 17:45       ` Jane Chu
2022-04-07  5:30       ` Christoph Hellwig
2022-04-11 23:55         ` Dan Williams
2022-04-14  0:48           ` Jane Chu
2022-04-14  0:47         ` Jane Chu
2022-04-12  0:08   ` Dan Williams
2022-04-14  0:50     ` Jane Chu
2022-04-12  4:57   ` Dan Williams
2022-04-12  5:02     ` Christoph Hellwig
2022-04-14  0:51       ` Jane Chu
2022-04-05 19:47 ` [PATCH v7 5/6] pmem: refactor pmem_clear_poison() Jane Chu
2022-04-06  5:04   ` Christoph Hellwig
2022-04-06 17:34     ` Jane Chu
2022-04-12  4:26   ` Dan Williams
2022-04-14  0:55     ` Jane Chu
2022-04-14  2:02       ` Dan Williams
2022-04-05 19:47 ` [PATCH v7 6/6] pmem: implement pmem_recovery_write() Jane Chu
2022-04-06  5:21   ` Christoph Hellwig
2022-04-06 17:33     ` Jane Chu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=b3f0bfcd-9e7c-cc71-e91d-d0f28053dea9@oracle.com \
    --to=jane.chu@oracle.com \
    --cc=agk@redhat.com \
    --cc=dan.j.williams@intel.com \
    --cc=dave.hansen@intel.com \
    --cc=dave.jiang@intel.com \
    --cc=david@fromorbit.com \
    --cc=djwong@kernel.org \
    --cc=dm-devel@redhat.com \
    --cc=hch@infradead.org \
    --cc=ira.weiny@intel.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=luto@kernel.org \
    --cc=nvdimm@lists.linux.dev \
    --cc=peterz@infradead.org \
    --cc=snitzer@redhat.com \
    --cc=vgoyal@redhat.com \
    --cc=vishal.l.verma@intel.com \
    --cc=willy@infradead.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).