From: Peter Xu <peterx@redhat.com>
To: Jiaqi Yan <jiaqiyan@google.com>
Cc: James Houghton <jthoughton@google.com>,
Axel Rasmussen <axelrasmussen@google.com>,
David Hildenbrand <david@redhat.com>,
Mirsad Todorovac <mirsad.todorovac@alu.unizg.hr>,
linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
Shuah Khan <shuah@kernel.org>,
linux-kselftest@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: BUG selftests/mm]
Date: Tue, 12 Mar 2024 11:38:24 -0400
Message-ID: <ZfB28NIbflrnsqiX@x1n>
In-Reply-To: <CACw3F51vMqPBHmvj4ehSA8PadXw30s3MxCqph1op5dxtB-tV6Q@mail.gmail.com>
On Mon, Mar 11, 2024 at 03:28:28PM -0700, Jiaqi Yan wrote:
> On Mon, Mar 11, 2024 at 2:27 PM James Houghton <jthoughton@google.com> wrote:
> >
> > On Mon, Mar 11, 2024 at 12:28 PM Peter Xu <peterx@redhat.com> wrote:
> > >
> > > On Mon, Mar 11, 2024 at 11:59:59AM -0700, Axel Rasmussen wrote:
> > > > I'd prefer not to require root or CAP_SYS_ADMIN or similar for
> > > > UFFDIO_POISON, because those control access to lots more things
> > > > besides, which we don't necessarily want the process using UFFD to be
> > > > able to do. :/
> >
> > I agree; UFFDIO_POISON should not require CAP_SYS_ADMIN.
>
> +1.
>
>
> >
> > > >
> > > > Ratelimiting seems fairly reasonable to me. I do see the concern about
> > > > dropping some addresses though.
> > >
> > > Do you know how much an admin could rely on such addresses? How
> > > frequently would MCEs normally be generated in a sane system?
> >
> > I'm not sure how much admins rely on the addresses themselves. +cc
> > Jiaqi Yan
>
> I think admins mostly care about MCEs from **real** hardware. For
> example they may choose to perform some maintenance if the number of
> hardware DIMM errors, keyed by PFN, exceeds some threshold. And I
> think mcelog or /sys/devices/system/node/node${X}/memory_failure are
> better tools than dmesg. In the case where all memory errors are emulated
> by the hypervisor after a live migration, these dmesg lines may mislead
> admins into thinking there is a DIMM error on the host when there is not.
> In this sense, silencing the errors emulated by UFFDIO_POISON makes sense
> (if not too complicated to do).

Now we have three types of such errors: (1) PFN poisoned, (2) swapin error,
(3) emulated. Both (1) and (2) deserve a global message dump, while (3)
should be process-internal: nobody should need to care except the process
itself (via the signal + meta info).

If we want to differentiate (2) vs. (3), we may need one more pte marker bit
to show whether such poison is "global" or "local" (as of now, (2) and (3)
share the same PTE_MARKER_POISONED bit). A swapin error can still be seen
as a "global" error: instead of a memory error it can be a disk error, and
the error message still applies, since it describes a corrupted VA.

Another VM_FAULT_* flag is also needed to reflect that locality, so that the
global broadcast can be skipped for "local" poison faults.

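To make the global/local split concrete, here is a rough userspace model of
the idea, not kernel code. MARKER_POISONED stands in for the role that
PTE_MARKER_POISONED plays today; MARKER_POISON_LOCAL and
FAULT_HWPOISON_LOCAL are hypothetical names for the extra marker bit and the
extra VM_FAULT_* flag, and the bit values are arbitrary.

```c
#include <assert.h>
#include <stdbool.h>

/* The three error types from the discussion. */
enum poison_source {
	POISON_PFN,      /* (1) real poisoned PFN */
	POISON_SWAPIN,   /* (2) swapin error */
	POISON_EMULATED, /* (3) emulated, e.g. via UFFDIO_POISON */
};

/* Model marker bits; MARKER_POISON_LOCAL is a hypothetical new bit. */
#define MARKER_POISONED      (1u << 0)
#define MARKER_POISON_LOCAL  (1u << 1)

/* Model fault flags; FAULT_HWPOISON_LOCAL is a hypothetical new flag. */
#define FAULT_HWPOISON        (1u << 0)
#define FAULT_HWPOISON_LOCAL  (1u << 1)

/* Marker installed for the two sources that share the poisoned marker. */
unsigned int marker_for(enum poison_source src)
{
	unsigned int marker = MARKER_POISONED;

	/* In this model, (1) is tracked per PFN and never uses a marker. */
	assert(src == POISON_SWAPIN || src == POISON_EMULATED);

	if (src == POISON_EMULATED)
		marker |= MARKER_POISON_LOCAL;
	return marker;
}

/* Flags the fault path would report when it hits such a marker. */
unsigned int fault_flags_for(unsigned int marker)
{
	unsigned int flags = 0;

	if (marker & MARKER_POISONED) {
		flags |= FAULT_HWPOISON;
		if (marker & MARKER_POISON_LOCAL)
			flags |= FAULT_HWPOISON_LOCAL;
	}
	return flags;
}

/* The signal is always delivered; only the global log line is optional. */
bool should_log_globally(unsigned int fault_flags)
{
	return (fault_flags & FAULT_HWPOISON) &&
	       !(fault_flags & FAULT_HWPOISON_LOCAL);
}
```

With this model a swapin error still produces the global message, while an
emulated poison only reaches the faulting process.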
>
> SIGBUS (and the logged "MCE: Killing %s:%d due to hardware memory
> corruption fault at %lx\n") emitted by the fault handler due to
> UFFDIO_POISON are less useful to admins AFAIK. They are for sure crucial
> to userspace / vmm / hypervisor, but the SIGBUS sent already contains the
> poisoned address (in si_addr from force_sig_mceerr).
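For reference, a minimal sketch of how a userspace process could pick the
poisoned address out of such a SIGBUS, assuming Linux/glibc where si_code is
BUS_MCEERR_AR or BUS_MCEERR_AO for memory-error signals. The function and
variable names are made up for illustration.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>

/* True if this siginfo describes a memory-error SIGBUS. */
bool is_mce_sigbus(const siginfo_t *si)
{
	return si->si_signo == SIGBUS &&
	       (si->si_code == BUS_MCEERR_AR || si->si_code == BUS_MCEERR_AO);
}

/* Poisoned address carried in si_addr, or NULL for any other signal. */
void *poisoned_addr(const siginfo_t *si)
{
	return is_mce_sigbus(si) ? si->si_addr : NULL;
}

/* Last poisoned address seen by the handler (illustrative only). */
static void *volatile last_poison_addr;
static volatile sig_atomic_t got_poison;

static void on_sigbus(int sig, siginfo_t *si, void *ucontext)
{
	void *addr = poisoned_addr(si);

	(void)sig;
	(void)ucontext;

	if (addr) {
		last_poison_addr = addr;
		got_poison = 1;
	}
	/*
	 * A real VMM has to act on the address here (e.g. inject an error
	 * into the guest). Simply returning after BUS_MCEERR_AR re-runs the
	 * faulting access and raises the signal again.
	 */
}

/* Install the handler with SA_SIGINFO so that si_addr is available. */
int install_sigbus_handler(void)
{
	struct sigaction sa = {0};

	sa.sa_sigaction = on_sigbus;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGBUS, &sa, NULL);
}
```

Since the process already gets the address this way, the dmesg line adds
nothing for it.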
>
> >
> > It's possible for a sane hypervisor dealing with a buggy guest / guest
> > userspace to trigger lots of these pr_errs. Consider the case where a
> > guest userspace uses HugeTLB-1G, finds poison (which HugeTLB used to
> > ignore), and then ignores SIGBUS. It will keep getting MCEs /
> > SIGBUSes.
> >
> > The sane hypervisor will use UFFDIO_POISON to prevent the guest from
> > re-accessing *real* poison, but we will still get the pr_err, and we
> > still keep injecting MCEs into the guest. We have observed scenarios
> > like this before.
> >
> > >
> > > > Perhaps we can mitigate that concern by defining our own ratelimit
> > > > interval/burst configuration?
> > >
> > > Any details?
> > >
> > > > Another idea would be to only ratelimit it if !CONFIG_DEBUG_VM or
> > > > similar. Not sure if that's considered valid or not. :)
> > >
> > > This, OTOH, sounds like overkill.
> > >
> > > I just checked the details of the ratelimit code again; by default
> > > it has:
> > >
> > > #define DEFAULT_RATELIMIT_INTERVAL (5 * HZ)
> > > #define DEFAULT_RATELIMIT_BURST 10
> > >
> > > So it allows a burst of 10 rather than 2. IIUC it means that even 10
> > > continuous MCEs won't get suppressed; suppression only starts at the
> > > 11th within a 5-second interval. I think that makes it possibly even
> > > less of a concern to directly use pr_err_ratelimited().
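The burst behaviour described above can be checked with a simplified
userspace model of an interval/burst ratelimit. This is only a sketch and not
the kernel implementation: the struct, the function names and the MODEL_HZ
tick rate are assumptions, and time is passed in explicitly instead of being
read from jiffies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_HZ        1000UL          /* assumed tick rate for the model */
#define MODEL_INTERVAL  (5 * MODEL_HZ)  /* mirrors 5 * HZ */
#define MODEL_BURST     10              /* mirrors the default burst */

struct ratelimit_model {
	unsigned long interval; /* window length in ticks */
	unsigned int burst;     /* messages allowed per window */
	unsigned int printed;   /* messages allowed in the current window */
	unsigned int missed;    /* messages suppressed so far */
	unsigned long begin;    /* start of the current window */
	bool started;
};

void ratelimit_model_init(struct ratelimit_model *rs)
{
	assert(rs != NULL);
	rs->interval = MODEL_INTERVAL;
	rs->burst = MODEL_BURST;
	rs->printed = 0;
	rs->missed = 0;
	rs->begin = 0;
	rs->started = false;
}

/* Returns true if a message at time 'now' would be printed. */
bool ratelimit_model_allow(struct ratelimit_model *rs, unsigned long now)
{
	assert(rs != NULL);

	/* Open a new window on first use or once the old one has expired. */
	if (!rs->started || now - rs->begin > rs->interval) {
		rs->begin = now;
		rs->printed = 0;
		rs->started = true;
	}

	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}
	rs->missed++;
	return false;
}

/* Try 'n' messages at the same instant; returns how many got through. */
unsigned int ratelimit_model_attempt(struct ratelimit_model *rs,
				     unsigned long now, unsigned int n)
{
	unsigned int allowed = 0;

	for (unsigned int i = 0; i < n; i++)
		if (ratelimit_model_allow(rs, now))
			allowed++;
	return allowed;
}
```

In this model, 11 back-to-back messages yield 10 printed and 1 suppressed,
and printing resumes once the 5-second window has passed.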
> >
> > I'm okay with any rate limiting everyone agrees on. IMO, silencing
> > these pr_errs if they came from UFFDIO_POISON (or, perhaps, if they
> > did not come from real hardware MCE events) sounds like the most
> > correct thing to do, but I don't mind. Just don't make UFFDIO_POISON
> > require CAP_SYS_ADMIN. :)
> >
> > Thanks.
>
--
Peter Xu