From: Elliott Mitchell <ehem+xen@m5p.com>
To: "Roger Pau Monné" <roger.pau@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>,
xen-devel@lists.xenproject.org,
Andrew Cooper <andrew.cooper3@citrix.com>, Wei Liu <wl@xen.org>,
Kelly Choi <kelly.choi@cloud.com>
Subject: Re: Serious AMD-Vi(?) issue
Date: Wed, 10 Jul 2024 11:35:19 -0700
Message-ID: <Zo7UZ7l6m0KUl/Gx@mattapan.m5p.com>
In-Reply-To: <ZocdQFCkH7p5nkiz@mattapan.m5p.com>
On Thu, Jul 04, 2024 at 03:08:00PM -0700, Elliott Mitchell wrote:
> On Mon, Jul 01, 2024 at 11:07:57AM -0700, Elliott Mitchell wrote:
> > On Thu, Jun 27, 2024 at 05:18:15PM -0700, Elliott Mitchell wrote:
> >
> > Most processors were mentioned roughly equally. Several had fewer
> > mentions, but not enough to seem significant. I discovered processor 1
> > did NOT show up. Whereas processor 0 had an above average number of
> > occurrences. This seems notable as these 2 processors are both reserved
> > exclusively for domain 0.
>
> All of the patterns continue. There are more reports on processor 0 than
> any other processor, but not enough to look particularly suspicious.
> What *does* look suspicious is the complete absence of reports from
> processor 1.
A bit more work with sort/uniq shows more of a pattern.
Odd-numbered processors (1,3,5) are seeing fewer reports, with CPU1 being
an outlier for having none. Even-numbered processors (0,2,4) are seeing
more reports, with CPU0 displaying the most of any processor. There is
also a pattern of lower-numbered processors seeing more of the reports
and higher-numbered ones seeing fewer (CPU1 being an outlier).

If my reading of `xl dmesg` is correct, then the lower-numbered
processors are on the first die and the higher-numbered processors are on
the second die. My guess is that 0 and 1 are the first conjoined pair,
which share more of their silicon with each other.
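
For anyone wanting to repeat the per-processor tally, here is a rough
Python sketch of the same sort/uniq style of counting. The "CPU<n>" token
it searches for and the sample lines are assumptions made up for
illustration; they are not the actual log format, so the pattern would
need adjusting to whatever the real report lines look like.

```python
import re
from collections import Counter

# Hypothetical pattern: assumes each report line names its processor
# as "CPU<n>". Adjust to match the real log lines.
CPU_RE = re.compile(r"\bCPU(\d+)\b")


def tally_cpus(lines):
    """Count how many lines mention each CPU number."""
    counts = Counter()
    for line in lines:
        match = CPU_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


def parity_totals(counts):
    """Return (even_total, odd_total) summed over CPU numbers."""
    even = sum(n for cpu, n in counts.items() if cpu % 2 == 0)
    odd = sum(n for cpu, n in counts.items() if cpu % 2 == 1)
    return even, odd


# Made-up sample lines, NOT real log output.
sample = [
    "report on CPU0",
    "report on CPU0",
    "report on CPU2",
    "report on CPU3",
]
counts = tally_cpus(sample)
even_total, odd_total = parity_totals(counts)
```

A processor which never appears (as CPU1 does here) simply reads back as
a count of zero, which makes its absence easy to spot.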
> > There have also been a few "spurious 8259A interrupt" lines. So far
> > there haven't been very many of these. The processor and IRQ listed
> > don't yet appear to show any patterns. So far no IRQ has been listed
> > twice.
>
> IRQs 3-7 and 9-15 have each shown up once. 1-2 and 8 haven't shown up
> so far.
#8 has now shown up, so 8259A interrupts 3-15 have now all shown up
*once*. 0-2 haven't shown up at all.
Certain MSI IRQs are showing up. The complete list is:
IRQ70 2
IRQ71 82
IRQ72 368
IRQ73 81
IRQ90 22
IRQ107 27
IRQ108 92
IRQ109 23
IRQ111 29
IRQ117 1
I'm unsure whether this actually works, but looking at /proc/interrupts,
all of these are associated with Xen according to Domain 0. IRQs 68-91
are all listed as "xen-percpu"; 105-120 are listed as "xen-dyn-lateeoi".
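
As a quick cross-check, the list above can be grouped by those two
/proc/interrupts types. This is only a sketch: it reads the second column
of the list as a report count, and takes the two IRQ ranges exactly as
stated above. The names `reports`, `ranges`, `classify` and
`totals_by_type` are made up for illustration.

```python
# Counts copied from the list above (IRQ number -> number of reports).
reports = {70: 2, 71: 82, 72: 368, 73: 81, 90: 22,
           107: 27, 108: 92, 109: 23, 111: 29, 117: 1}

# Inclusive IRQ ranges as listed in Domain 0's /proc/interrupts.
ranges = {"xen-percpu": (68, 91), "xen-dyn-lateeoi": (105, 120)}


def classify(irq):
    """Return the type whose range contains irq, or "unknown"."""
    for name, (low, high) in ranges.items():
        if low <= irq <= high:
            return name
    return "unknown"


def totals_by_type(report_counts):
    """Sum report counts per interrupt type."""
    totals = {}
    for irq, count in report_counts.items():
        kind = classify(irq)
        totals[kind] = totals.get(kind, 0) + count
    return totals


totals = totals_by_type(reports)
```

With the numbers above this comes to 555 reports on "xen-percpu" IRQs and
172 on "xen-dyn-lateeoi" IRQs, with IRQ72 alone accounting for 368.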
*IF* I am understanding this correctly, this *might* be the same problem
as:
https://lists.xenproject.org/archives/html/xen-devel/2024-07/msg00454.html
Domain 0 is reporting plenty of spurious events.
I'm starting to wonder if this isn't a Linux software RAID1 on AMD
hardware issue, but instead a more general issue closer to the core
of Xen's interrupt handling. AMD hardware just gets hit harder.
> Things look different enough to try reenabling Linux software RAID1. I'm
> going to continue monitoring closely, but so far it seems
> "iommu=no-intremap" may in fact mitigate the issue with software RAID1.
At this point I've monitored for problems and not found any for long
enough to declare this a tentative mitigation.
--
(\___(\___(\______ --=> 8-) EHM <=-- ______/)___/)___/)
\BS ( | ehem+sigmsg@m5p.com PGP 87145445 | ) /
\_CS\ | _____ -O #include <stddisclaimer.h> O- _____ | / _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445