Serious AMD-Vi issue


* Serious AMD-Vi issue
From: Elliott Mitchell @ 2024-01-25 20:24 UTC (permalink / raw)
  To: xen-devel; +Cc: Jan Beulich, Andrew Cooper

This was apparently first noticed with Xen 4.14, and more recently I've been
able to reproduce the issue myself:

https://bugs.debian.org/988477

The original report involves MD-RAID1 on a pair of Samsung
SATA-attached flash devices.  The key message shows up in `xl dmesg`:

(XEN) AMD-Vi: IO_PAGE_FAULT: DDDD:bb:dd.f d0 addr ffffff???????000 flags 0x8 I

where the device (DDDD:bb:dd.f) is the SATA controller.  I've since
reproduced this, with some noticeable differences.
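To make reports like this easier to quantify, here is a minimal Python sketch (not part of the original report) that pulls the fields out of such a line. The regular expression is an assumption based only on the message shape shown above (device as segment:bus:device.function, `d<domid>`, hex `addr`, hex `flags`), not on Xen's source; real output may carry a timestamp prefix, so it searches the line rather than anchoring at its start.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Pattern assumed from the message shape quoted above; not taken from Xen's source.
FAULT_RE = re.compile(
    r"AMD-Vi: IO_PAGE_FAULT: "
    r"(?P<sbdf>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]) "
    r"d(?P<domid>\d+) "
    r"addr (?P<addr>[0-9a-f]+) "
    r"flags (?P<flags>0x[0-9a-f]+|0)\b",
    re.IGNORECASE,
)


@dataclass
class PageFault:
    sbdf: str   # segment:bus:device.function of the faulting device
    domid: int  # domain the device is assigned to (d0 = dom0)
    addr: int   # faulting address
    flags: int  # raw event flags


def parse_fault(line: str) -> Optional[PageFault]:
    """Return the parsed fault, or None if the line is not an IO_PAGE_FAULT."""
    m = FAULT_RE.search(line)
    if m is None:
        return None
    return PageFault(
        sbdf=m["sbdf"].lower(),
        domid=int(m["domid"]),
        addr=int(m["addr"], 16),
        flags=int(m["flags"], 16),  # int() accepts a "0x" prefix with base 16
    )


# Invented values for illustration only; the report above elides the real ones.
example = "(XEN) AMD-Vi: IO_PAGE_FAULT: 0000:00:11.0 d0 addr 00000000deadb000 flags 0x8 I"
print(parse_fault(example))
```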

A major goal of RAID is to have different devices fail at different
times.  Hence my initial setup paired a Samsung device with a device from
another reputable flash manufacturer.

I initially noticed this because of messages in domain 0's dmesg about
errors from the SATA device.  It wasn't until rather later that I noticed
the IOMMU warnings in Xen's dmesg (perhaps Xen messages logged after
domain 0 has started should also be duplicated into domain 0's dmesg?).

All of the failures consistently pointed at the Samsung device.  Since I
expected it to fail first (a lower-quality offering with lesser
guarantees), I replaced it with an NVMe device.

With some monitoring I discovered the NVMe device was now triggering
IOMMU errors, though not nearly as many as the Samsung SATA device had.
As such, AMD-Vi plus MD-RAID1 appears to be exposing some sort of IOMMU
issue in Xen.

All I can do is offer speculation about the underlying cause.  There
does seem to be a pattern of higher-performance flash storage devices
being more severely affected.

I initially speculated that the MD-RAID1 driver was abusing Linux's DMA
infrastructure in some fashion.

Upon further consideration, I'm wondering whether this is a latency
issue.  I imagine there is some sort of flush after the IOMMU tables are
modified.  Perhaps the Samsung SATA (and all NVMe) devices were trying to
execute commands before the IOMMU tables had finished reloading.
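The ordering this speculation depends on can be made concrete with a toy model. This is a hypothetical Python sketch, not a description of how Xen or AMD-Vi hardware actually work: the class and method names are invented, and `flush_and_wait` merely stands in for whatever invalidation-plus-completion sequence the real driver uses. It only illustrates the invariant: if a device uses an address before the updated translation is visible to it, the access faults; if the flush completes first, no such window exists.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ToyIommu:
    """Hypothetical model: page-table writes only become visible to the
    device after an explicit flush has completed."""
    page_table: Dict[int, int] = field(default_factory=dict)  # written by software
    visible: Dict[int, int] = field(default_factory=dict)     # what the device sees

    def map(self, dfn: int, mfn: int) -> None:
        self.page_table[dfn] = mfn  # not yet visible to the device

    def flush_and_wait(self) -> None:
        # Stand-in for an IOTLB invalidation followed by waiting for completion.
        self.visible = dict(self.page_table)

    def device_access(self, dfn: int) -> str:
        return "ok" if dfn in self.visible else "IO_PAGE_FAULT"


def submit_dma(iommu: ToyIommu, dfn: int, mfn: int, wait_for_flush: bool = True) -> str:
    """Map a frame, optionally wait for the flush, then let the device use it."""
    iommu.map(dfn, mfn)
    if wait_for_flush:
        iommu.flush_and_wait()
    return iommu.device_access(dfn)


print(submit_dma(ToyIommu(), 0x100, 0x200))                        # -> ok
print(submit_dma(ToyIommu(), 0x100, 0x200, wait_for_flush=False))  # -> IO_PAGE_FAULT
```

As Andrew notes further down the thread, flags 0x8 marks the faulting access as an interrupt request rather than a plain DMA read or write, so this model should not be read as an explanation of the reported faults.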

-- 
(\___(\___(\______          --=> 8-) EHM <=--          ______/)___/)___/)
 \BS (    |         ehem+sigmsg@m5p.com  PGP 87145445         |    )   /
  \_CS\   |  _____  -O #include <stddisclaimer.h> O-   _____  |   /  _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Serious AMD-Vi issue
From: Elliott Mitchell @ 2024-02-12 23:23 UTC (permalink / raw)
  To: xen-devel, Jan Beulich, Andrew Cooper

On Thu, Jan 25, 2024 at 12:24:53PM -0800, Elliott Mitchell wrote:
> Upon further consideration, I'm wondering if this is perhaps a latency
> issue.  I imagine there is some sort of flush after the IOMMU tables are
> modified.  Perhaps the Samsung SATA (and all NVMe) devices were trying to
> execute commands before reloading the IOMMU tables is complete.

Ping!

The recipe seems to be Linux MD RAID1, plus Samsung SATA or any NVMe.

To be explicit: when I tried Crucial SATA + Samsung SATA, the IOMMU
errors matched the Samsung SATA device (and the SATA driver complained a
number of times).

As stated, my speculation is that lower-latency devices start executing
commands before the IOMMU tables have finished reloading.  When this was
originally implemented, fast flash devices were rare.



* Re: Serious AMD-Vi issue
From: Elliott Mitchell @ 2024-03-04 19:56 UTC (permalink / raw)
  To: xen-devel, Jan Beulich, Andrew Cooper

On Mon, Feb 12, 2024 at 03:23:00PM -0800, Elliott Mitchell wrote:
> As stated, I'm speculating lower latency devices starting to execute
> commands before IOMMU tables have finished reloading.  When originally
> implemented fast flash devices were rare.

I guess I'm lucky I ended up with some slightly higher-latency hardware.
This is a very serious issue as data loss can occur.

AMD needs to fund their Xen engineers more; otherwise AMD hardware may
soon no longer be viable with Xen.



* Re: AMD-Vi issue
From: Andrew Cooper @ 2024-03-04 23:55 UTC (permalink / raw)
  To: Elliott Mitchell, xen-devel; +Cc: Jan Beulich

On 25/01/2024 8:24 pm, Elliott Mitchell wrote:
> The original observation features MD-RAID1 using a pair of Samsung
> SATA-attached flash devices.  The main line shows up in `xl dmesg`:
>
> (XEN) AMD-Vi: IO_PAGE_FAULT: DDDD:bb:dd.f d0 addr ffffff???????000 flags 0x8 I

You have a device which is issuing interrupts which have been blocked
due to the IOMMU configuration.
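A small decoder for the `flags` field may make this easier to see. The bit names below are an assumption taken from the AMD IOMMU specification's IO_PAGE_FAULT event layout (bit 0 upward: GN, NX, US, I, PR, RW, PE, RZ, TR); only bit 3 (`I`, interrupt request) is confirmed by the log line quoted above, so the remaining names should be checked against the specification or Xen's source before relying on them.

```python
from typing import List

# Assumed bit layout (bit 0 upward) of the IO_PAGE_FAULT flags, following the
# AMD IOMMU specification; only bit 3 ("I") is confirmed by the log line above.
FLAG_NAMES = ["GN", "NX", "US", "I", "PR", "RW", "PE", "RZ", "TR"]
INTERRUPT_BIT = 1 << FLAG_NAMES.index("I")  # 0x8


def decode_flags(flags: int) -> List[str]:
    """Return the mnemonic for each set bit; unknown high bits are reported raw."""
    names = [name for bit, name in enumerate(FLAG_NAMES) if flags & (1 << bit)]
    unknown = flags & ~((1 << len(FLAG_NAMES)) - 1)
    if unknown:
        names.append(f"?{unknown:#x}")
    return names


def is_interrupt_fault(flags: int) -> bool:
    """True if the faulting transaction was flagged as an interrupt request."""
    return bool(flags & INTERRUPT_BIT)


print(decode_flags(0x8), is_interrupt_fault(0x8))  # ['I'] True
```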

But you've elided all details other than the fact it's assigned to
dom0.  As such, there is nothing we can do to help.

This isn't the first time you've been asked to provide a bare minimum
amount of details such that we might be able to help.  I'm sorry you are
having problems, but continuing to ping a question with no actionable
information is unfair to those of us who are providing support
to the community for free.

~Andrew


* Re: AMD-Vi issue
From: Elliott Mitchell @ 2024-03-05  0:34 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel, Jan Beulich

On Mon, Mar 04, 2024 at 11:55:07PM +0000, Andrew Cooper wrote:
> On 25/01/2024 8:24 pm, Elliott Mitchell wrote:
> > The original observation features MD-RAID1 using a pair of Samsung
> > SATA-attached flash devices.  The main line shows up in `xl dmesg`:
> >
> > (XEN) AMD-Vi: IO_PAGE_FAULT: DDDD:bb:dd.f d0 addr ffffff???????000 flags 0x8 I
> 
> You have a device which is issuing interrupts which have been blocked
> due to the IOMMU configuration.
> 
> But you've elided all details other than the fact it's assigned to
> dom0.  As such, there is nothing we can do to help.

I've provided the details I thought most likely to allow others to
reproduce this.  I've pointed to a report from someone else with somewhat
similar hardware who also encountered this (and had enough hardware to
try several combinations).

Sorry about being reluctant to send exact hardware serial numbers to a
public mailing list.



* Re: Serious AMD-Vi(?) issue
From: Elliott Mitchell @ 2024-03-18 19:41 UTC (permalink / raw)
  To: xen-devel; +Cc: Jan Beulich, Andrew Cooper, Roger Pau Monné, Wei Liu

I sent a ping on this about 2 weeks ago.  Since the plan is to move x86
IOMMU maintenance under general x86, the other x86 maintainers should be
aware of this:

On Mon, Feb 12, 2024 at 03:23:00PM -0800, Elliott Mitchell wrote:
> As stated, I'm speculating lower latency devices starting to execute
> commands before IOMMU tables have finished reloading.  When originally
> implemented fast flash devices were rare.

Both reproductions of this issue that I'm aware of were on systems with
AMD processors.  I doubt that suspicion of flash storage hardware is
unique to owners of AMD systems, so while this /could/ also affect Intel
systems, the lack of reports /suggests/ otherwise.

I've noticed two things when glancing at the original report.  LVM is not
in use here, so it doesn't seem to affect the problem.  The Phenom II
system that the original reporter found not to have the issue might have
lacked proper BIOS support, leaving the IOMMU non-functional.

That this is a latency issue is *speculation*, but it would explain the
pattern of devices being affected.

This is rather serious as it can lead to data loss (phew!  glad I just
barely dodged this outcome).



* Re: Serious AMD-Vi(?) issue
From: Kelly Choi @ 2024-03-22 16:41 UTC (permalink / raw)
  To: Elliott Mitchell
  Cc: xen-devel, Jan Beulich, Andrew Cooper, Roger Pau Monné,
	Wei Liu


Hi Elliott,

I hope you're well.

I'm Kelly, the community manager at the Xen Project.

I can see you've recently engaged with our community about some issues
you'd like help with.
We love the fact you are participating in our project; however, our
developers aren't able to help if you do not provide the specific details.

As an open-source project, our developers are committed to helping and
contributing as much as possible. We welcome you to continue participating;
however, please refrain from requesting help without providing the
necessary details, as it takes up a lot of our community's time to analyze
what is possible and what assistance you might need.

I'd recommend providing logs or specific information so the community can
help you further.

If you'd like to chat more, let me know.

Many thanks,
Kelly Choi

Community Manager
Xen Project



* Re: Serious AMD-Vi(?) issue
From: Elliott Mitchell @ 2024-03-22 19:22 UTC (permalink / raw)
  To: Kelly Choi
  Cc: xen-devel, Jan Beulich, Andrew Cooper, Roger Pau Monné,
	Wei Liu

On Fri, Mar 22, 2024 at 04:41:45PM +0000, Kelly Choi wrote:
> 
> I can see you've recently engaged with our community with some issues you'd
> like help with.
> We love the fact you are participating in our project, however, our
> developers aren't able to help if you do not provide the specific details.

Please point to specific details which have been omitted.  Fairly little
data has been provided because fairly little data is available.  The
primary observation is large numbers of:

(XEN) AMD-Vi: IO_PAGE_FAULT: DDDD:bb:dd.f d0 addr ffffff???????000 flags 0x8 I

lines in Xen's ring buffer.  I recall spotting 3 messages from Linux's
SATA driver (which weren't saved because other causes were suspected at
the time), which would likely correspond to hundreds or thousands of the
above log messages.  I never observed any messages from the NVMe
subsystem during that phase.
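Counting the faults per device and domain is a quick way to check an observation like "all of the failures consistently pointed at the Samsung device". A minimal Python sketch, assuming the ring-buffer contents have already been captured as text (for example by saving `xl dmesg` output to a file); the regular expression assumes only the message shape quoted above.

```python
import re
from collections import Counter
from typing import Iterable

# Matches the device and domain fields of the message shape quoted above.
FAULT_RE = re.compile(r"AMD-Vi: IO_PAGE_FAULT: (\S+) d(\d+) ")


def tally_faults(lines: Iterable[str]) -> Counter:
    """Count IO_PAGE_FAULT lines per (device, domain) pair."""
    counts: Counter = Counter()
    for line in lines:
        m = FAULT_RE.search(line)
        if m:
            counts[(m.group(1).lower(), int(m.group(2)))] += 1
    return counts


def report(lines: Iterable[str]) -> None:
    """Print the busiest devices first."""
    for (device, domid), n in tally_faults(lines).most_common():
        print(f"{device} d{domid}: {n}")
```

On a saved dump this would be used as `report(open("xl-dmesg.txt"))`, where the filename is hypothetical.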

The most overt sign came when I told the Linux kernel to scan for
inconsistencies and it found some.  The domain didn't otherwise appear
to notice any trouble.
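For reference, such a scan presumably corresponds to md's "check" action, which Linux exposes through sysfs: writing `check` to `/sys/block/<array>/md/sync_action` starts it, `sync_action` reads `idle` again once it finishes, and `mismatch_cnt` then reports the number of sectors that would have been rewritten. A minimal Python sketch of that sequence; the array name `md0` is hypothetical, the sysfs root is a parameter only so the code can be tried against a scratch directory, and a real system needs root. A non-zero count on RAID1 is not by itself proof of corruption, as some benign write patterns can leave mirrors differing.

```python
import time
from pathlib import Path
from typing import Optional


def md_attr(array: str, name: str, sysfs_root: str = "/sys/block") -> Path:
    """Path of an md sysfs attribute, e.g. /sys/block/md0/md/sync_action."""
    return Path(sysfs_root) / array / "md" / name


def start_check(array: str, sysfs_root: str = "/sys/block") -> None:
    """Ask md to read and compare all mirrors without repairing anything."""
    md_attr(array, "sync_action", sysfs_root).write_text("check\n")


def wait_until_idle(array: str, sysfs_root: str = "/sys/block",
                    poll_seconds: float = 10.0,
                    timeout: Optional[float] = None) -> bool:
    """Poll sync_action until it reads 'idle'; False if the timeout expires."""
    deadline = None if timeout is None else time.monotonic() + timeout
    while md_attr(array, "sync_action", sysfs_root).read_text().strip() != "idle":
        if deadline is not None and time.monotonic() >= deadline:
            return False
        time.sleep(poll_seconds)
    return True


def mismatch_count(array: str, sysfs_root: str = "/sys/block") -> int:
    """Sectors the last check found inconsistent between mirrors."""
    return int(md_attr(array, "mismatch_cnt", sysfs_root).read_text().strip())
```

On a real system the sequence would be `start_check("md0")`, `wait_until_idle("md0")`, then `mismatch_count("md0")`.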

This is from memory; it would take some time to determine whether any
messages were missed.  The mitigation currently in place stops the
messages from appearing, but the trouble is certainly still lurking.


* Re: Serious AMD-Vi(?) issue
From: Jan Beulich @ 2024-03-25  7:55 UTC (permalink / raw)
  To: Elliott Mitchell
  Cc: xen-devel, Andrew Cooper, Roger Pau Monné, Wei Liu,
	Kelly Choi

On 22.03.2024 20:22, Elliott Mitchell wrote:
> On Fri, Mar 22, 2024 at 04:41:45PM +0000, Kelly Choi wrote:
>>
>> I can see you've recently engaged with our community with some issues you'd
>> like help with.
>> We love the fact you are participating in our project, however, our
>> developers aren't able to help if you do not provide the specific details.
> 
> Please point to specific details which have been omitted.  Fairly little
> data has been provided as fairly little data is available.  The primary
> observation is large numbers of:
> 
> (XEN) AMD-Vi: IO_PAGE_FAULT: DDDD:bb:dd.f d0 addr ffffff???????000 flags 0x8 I
> 
> Lines in Xen's ring buffer.

Yet this is (part of) the problem: By providing only the messages that appear
relevant to you, you imply that you know that no other message is in any way
relevant. That's judgement you'd better leave to people actually trying to
investigate. Unless of course you were proposing an actual code change, with
suitable justification.

In fact when running into trouble, the usual course of action would be to
increase verbosity in both hypervisor and kernel, just to make sure no
potentially relevant message is missed.
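One way to follow this advice without losing messages to Xen's fixed-size console ring (or to dom0's kernel log buffer) is to snapshot both logs periodically. A minimal Python sketch, assuming verbosity has already been raised (for instance with Xen's `loglvl=all guest_loglvl=all` boot options) and that it runs as root in dom0; the output directory is hypothetical.

```python
import subprocess
import time
from pathlib import Path
from typing import List, Sequence


def snapshot(cmd: Sequence[str], out_dir: Path, label: str) -> Path:
    """Run a log-dumping command and save its stdout to a timestamped file."""
    result = subprocess.run(list(cmd), capture_output=True, text=True, check=True)
    out_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%dT%H%M%S")
    path = out_dir / f"{label}-{stamp}.log"
    path.write_text(result.stdout)
    return path


def snapshot_all(out_dir: Path) -> List[Path]:
    """Capture both the hypervisor ring buffer and dom0's kernel log."""
    return [
        snapshot(["xl", "dmesg"], out_dir, "xen"),
        snapshot(["dmesg"], out_dir, "dom0"),
    ]
```

Calling `snapshot_all(Path("/var/tmp/log-snapshots"))` from a timer or cron job keeps a history that survives the ring buffers wrapping.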

>  I recall spotting 3 messages from Linux's
> SATA driver (which weren't saved due to other causes being suspected),
> which would likely be associated with hundreds or thousands of the above
> log messages.  I never observed any messages from the NVMe subsystem
> during that phase.
> 
> The most overt sign was telling the Linux kernel to scan for
> inconsistencies and the kernel finding some.  The domain didn't otherwise
> appear to notice trouble.
> 
> This is from memory, it would take some time to discover whether any
> messages were missed.  Present mitigation action is inhibiting the
> messages, but the trouble is certainly still lurking.

IIRC you were considering whether any of this might be a timing issue. Yet
beyond voicing that suspicion, you didn't provide any technical details as
to why you think so. Such technical details would include taking into
account how IOMMU mappings and associated IOMMU TLB flushing are carried
out. Right now, to me at least, your speculation in this regard fails
basic sanity checking. Therefore the scenario that you're thinking of
would need better describing, imo.

Jan


* Re: Serious AMD-Vi(?) issue
From: Elliott Mitchell @ 2024-03-25 21:43 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel, Andrew Cooper, Roger Pau Monné, Wei Liu,
	Kelly Choi

On Mon, Mar 25, 2024 at 08:55:56AM +0100, Jan Beulich wrote:
> On 22.03.2024 20:22, Elliott Mitchell wrote:
> > On Fri, Mar 22, 2024 at 04:41:45PM +0000, Kelly Choi wrote:
> >>
> >> I can see you've recently engaged with our community with some issues you'd
> >> like help with.
> >> We love the fact you are participating in our project, however, our
> >> developers aren't able to help if you do not provide the specific details.
> > 
> > Please point to specific details which have been omitted.  Fairly little
> > data has been provided as fairly little data is available.  The primary
> > observation is large numbers of:
> > 
> > (XEN) AMD-Vi: IO_PAGE_FAULT: DDDD:bb:dd.f d0 addr ffffff???????000 flags 0x8 I
> > 
> > Lines in Xen's ring buffer.
> 
> Yet this is (part of) the problem: By providing only the messages that appear
> relevant to you, you imply that you know that no other message is in any way
> relevant. That's judgement you'd better leave to people actually trying to
> investigate. Unless of course you were proposing an actual code change, with
> suitable justification.

Honestly, I forgot about the very small number of messages from the SATA
subsystem.  Whether the current mitigations are effective was the bigger
concern, so monitoring `xl dmesg` took priority over looking at SATA
messages, which did not reliably indicate status.

I *thought* I would be able to retrieve those via other slow means, but a
different and possibly overlapping issue has shown up.  Unfortunately
this means those are no longer retrievable.   :-(

> In fact when running into trouble, the usual course of action would be to
> increase verbosity in both hypervisor and kernel, just to make sure no
> potentially relevant message is missed.

More/better information might have been obtained if I'd been engaged
earlier.

> > The most overt sign was telling the Linux kernel to scan for
> > inconsistencies and the kernel finding some.  The domain didn't otherwise
> > appear to notice trouble.
> > 
> > This is from memory, it would take some time to discover whether any
> > messages were missed.  Present mitigation action is inhibiting the
> > messages, but the trouble is certainly still lurking.
> 
> Iirc you were considering whether any of this might be a timing issue. Yet
> beyond voicing that suspicion, you didn't provide any technical details as
> to why you think so. Such technical details would include taking into
> account how IOMMU mappings and associated IOMMU TLB flushing are carried
> out. Right now, to me at least, your speculation in this regard fails
> basic sanity checking. Therefore the scenario that you're thinking of
> would need better describing, imo.

True.  Mostly I'm analyzing the known information and considering what
the patterns suggest.

Presently I'm aware of two reports (Imre Szőllősi and mine).

Both of these feature machines with AMD processors.  It could be that
people with AMD processors are less trusting of flash storage, or it
could be an AMD-only IOMMU issue.  Ideally someone would test and confirm
there is no issue with Linux software RAID1 on flash on an Intel machine.

Both reports feature two flash storage devices being run through Linux
MD RAID1.  It could be that the MD RAID1 subsystem is abusing the DMA
interface in some fashion.  While Imre Szőllősi reported this not
occurring with a single device, the report does not explicitly state
whether that was a degraded RAID1 or a non-RAID setup.  I'm unaware of
any testing with 3 devices in RAID1.

Both reports feature Samsung SATA flash devices.  My case also includes a
Crucial NVMe device.  My case also features a Crucial SATA flash device
for which the problem did NOT occur.  So the question becomes, why did
the problem not occur for this Crucial SATA device?

According to the specifications, the Crucial SATA device is roughly on
par with the Samsung SATA devices in terms of read/write speeds.  The
NVMe device's specifications are massively better.

What comes to mind is that the Crucial SATA device might have higher
latency before executing commands.  The specifications don't mention
command execution latency, so it isn't possible to know whether this is
the issue.

Yes, latency/timing is speculation.  It does seem a good fit for the
pattern, though.

This could be a Linux MD RAID1 bug or a Xen bug.

Unfortunately data loss is a very serious type of bug so I'm highly
reluctant to let go of mitigations without hope for progress.


* Re: Serious AMD-Vi(?) issue
From: Elliott Mitchell @ 2024-03-27 17:27 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel, Andrew Cooper, Roger Pau Monné, Wei Liu,
	Kelly Choi

On Mon, Mar 25, 2024 at 02:43:44PM -0700, Elliott Mitchell wrote:
> On Mon, Mar 25, 2024 at 08:55:56AM +0100, Jan Beulich wrote:
> > On 22.03.2024 20:22, Elliott Mitchell wrote:
> > > On Fri, Mar 22, 2024 at 04:41:45PM +0000, Kelly Choi wrote:
> > >>
> > >> I can see you've recently engaged with our community with some issues you'd
> > >> like help with.
> > >> We love the fact you are participating in our project; however, our
> > >> developers aren't able to help if you do not provide the specific details.
> > > 
> > > Please point to specific details which have been omitted.  Fairly little
> > > data has been provided as fairly little data is available.  The primary
> > > observation is large numbers of:
> > > 
> > > (XEN) AMD-Vi: IO_PAGE_FAULT: DDDD:bb:dd.f d0 addr ffffff???????000 flags 0x8 I
> > > 
> > > Lines in Xen's ring buffer.
> > 
> > Yet this is (part of) the problem: By providing only the messages that appear
> > relevant to you, you imply that you know that no other message is in any way
> > relevant. That's judgement you'd better leave to people actually trying to
> > investigate. Unless of course you were proposing an actual code change, with
> > suitable justification.
> 
> Honestly, I forgot about the very small number of messages from the SATA
> subsystem.  The question of whether the current mitigation actions are
> effective right now was a bigger issue.  As such, monitoring `xl dmesg`
> took priority over looking at SATA messages, which failed to reliably
> indicate status.
> 
> I *thought* I would be able to retrieve those via other slow means, but a
> different and possibly overlapping issue has shown up.  Unfortunately
> this means those are no longer retrievable.   :-(

With some persistence I was able to retrieve them.  There are other
pieces of software with worse UIs than Xen.
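
Most of the evidence so far consists of `IO_PAGE_FAULT` lines in `xl dmesg`.  A small script that tallies them per device and domain can help with that monitoring.  This is a sketch: the regular expression is modelled on the line format quoted in this thread (`DDDD:bb:dd.f d0 addr ... flags 0x8 I`) and may need adjusting for other Xen versions.  The sample lines are made up for illustration.

```python
import re
from collections import Counter
from typing import Iterable

# Modelled on the message quoted in this thread:
#   (XEN) AMD-Vi: IO_PAGE_FAULT: DDDD:bb:dd.f d0 addr ffffff???????000 flags 0x8 I
# The field layout is an assumption; check it against your own `xl dmesg`.
FAULT_RE = re.compile(
    r"AMD-Vi: IO_PAGE_FAULT: "
    r"(?P<sbdf>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]) "
    r"d(?P<domid>\d+) addr (?P<addr>[0-9a-f]+) flags (?P<flags>0x[0-9a-f]+)"
)


def count_faults(lines: Iterable[str]) -> Counter:
    """Count IO_PAGE_FAULT lines per (PCI address, domain id)."""
    counts: Counter = Counter()
    for line in lines:
        m = FAULT_RE.search(line)
        if m:
            counts[(m["sbdf"], int(m["domid"]))] += 1
    return counts


# Hypothetical sample input; on a real system the lines could come from
# subprocess.run(["xl", "dmesg"], capture_output=True, text=True).stdout
sample = [
    "(XEN) AMD-Vi: IO_PAGE_FAULT: 0000:03:00.1 d0 addr fffffff0d2a4b000 flags 0x8 I",
    "(XEN) AMD-Vi: IO_PAGE_FAULT: 0000:03:00.1 d0 addr fffffff0d2a4c000 flags 0x8 I",
    "(XEN) some unrelated message",
]
print(count_faults(sample))
```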

> > In fact when running into trouble, the usual course of action would be to
> > increase verbosity in both hypervisor and kernel, just to make sure no
> > potentially relevant message is missed.
> 
> More/better information might have been obtained if I'd been engaged
> earlier.

This is still true; things are in full mitigation mode and I'd be
quite unhappy to go back to experimenting at this point.


I now see why I left those out.  The messages from the SATA subsystem
were from a kernel built from an LTS branch into which a bad patch had
leaked.  It looks like the SATA subsystem was significantly broken, and
I'm unsure whether any useful information could be retrieved.  Notably,
there is quite a bit of noise from SATA devices not affected by this issue.

Some of the messages /might/ be useful, but the amount of noise is quite
high.  Do messages from a broken kernel interest you?


-- 
(\___(\___(\______          --=> 8-) EHM <=--          ______/)___/)___/)
 \BS (    |         ehem+sigmsg@m5p.com  PGP 87145445         |    )   /
  \_CS\   |  _____  -O #include <stddisclaimer.h> O-   _____  |   /  _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445





* Re: Serious AMD-Vi(?) issue
  2024-03-27 17:27             ` Elliott Mitchell
@ 2024-03-28  6:25               ` Jan Beulich
  2024-03-28 15:22                 ` Elliott Mitchell
  2024-04-11  2:41                 ` Elliott Mitchell
  0 siblings, 2 replies; 37+ messages in thread
From: Jan Beulich @ 2024-03-28  6:25 UTC (permalink / raw)
  To: Elliott Mitchell
  Cc: xen-devel, Andrew Cooper, Roger Pau Monné, Wei Liu,
	Kelly Choi

On 27.03.2024 18:27, Elliott Mitchell wrote:
> On Mon, Mar 25, 2024 at 02:43:44PM -0700, Elliott Mitchell wrote:
>> On Mon, Mar 25, 2024 at 08:55:56AM +0100, Jan Beulich wrote:
>>> On 22.03.2024 20:22, Elliott Mitchell wrote:
>>>> On Fri, Mar 22, 2024 at 04:41:45PM +0000, Kelly Choi wrote:
>>>>>
>>>>> I can see you've recently engaged with our community with some issues you'd
>>>>> like help with.
>>>>> We love the fact you are participating in our project; however, our
>>>>> developers aren't able to help if you do not provide the specific details.
>>>>
>>>> Please point to specific details which have been omitted.  Fairly little
>>>> data has been provided as fairly little data is available.  The primary
>>>> observation is large numbers of:
>>>>
>>>> (XEN) AMD-Vi: IO_PAGE_FAULT: DDDD:bb:dd.f d0 addr ffffff???????000 flags 0x8 I
>>>>
>>>> Lines in Xen's ring buffer.
>>>
>>> Yet this is (part of) the problem: By providing only the messages that appear
>>> relevant to you, you imply that you know that no other message is in any way
>>> relevant. That's judgement you'd better leave to people actually trying to
>>> investigate. Unless of course you were proposing an actual code change, with
>>> suitable justification.
>>
>> Honestly, I forgot about the very small number of messages from the SATA
>> subsystem.  The question of whether the current mitigation actions are
>> effective right now was a bigger issue.  As such, monitoring `xl dmesg`
>> took priority over looking at SATA messages, which failed to reliably
>> indicate status.
>>
>> I *thought* I would be able to retrieve those via other slow means, but a
>> different and possibly overlapping issue has shown up.  Unfortunately
>> this means those are no longer retrievable.   :-(
> 
> With some persistence I was able to retrieve them.  There are other
> pieces of software with worse UIs than Xen.
> 
>>> In fact when running into trouble, the usual course of action would be to
>>> increase verbosity in both hypervisor and kernel, just to make sure no
>>> potentially relevant message is missed.
>>
>> More/better information might have been obtained if I'd been engaged
>> earlier.
> 
> This is still true; things are in full mitigation mode and I'd be
> quite unhappy to go back to experimenting at this point.

Well, it very likely won't work without further experimenting by someone
able to observe the bad behavior. Recall we're on xen-devel here; it is
kind of expected that without clear (and practical) repro instructions
experimenting as well as info collection will remain with the reporter.

> I now see why I left those out.  The messages from the SATA subsystem
> were from a kernel built from an LTS branch into which a bad patch had
> leaked.  It looks like the SATA subsystem was significantly broken, and
> I'm unsure whether any useful information could be retrieved.  Notably,
> there is quite a bit of noise from SATA devices not affected by this issue.
> 
> Some of the messages /might/ be useful, but the amount of noise is quite
> high.  Do messages from a broken kernel interest you?

If this was a less vague (in terms of possible root causes) issue, I'd
probably have answered "yes". But in the case here I'm afraid such might
further confuse things rather than clarifying them.

Jan



* Re: Serious AMD-Vi(?) issue
  2024-03-28  6:25               ` Jan Beulich
@ 2024-03-28 15:22                 ` Elliott Mitchell
  2024-03-28 16:17                   ` Elliott Mitchell
  2024-04-11  2:41                 ` Elliott Mitchell
  1 sibling, 1 reply; 37+ messages in thread
From: Elliott Mitchell @ 2024-03-28 15:22 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel, Andrew Cooper, Roger Pau Monné, Wei Liu,
	Kelly Choi

On Thu, Mar 28, 2024 at 07:25:02AM +0100, Jan Beulich wrote:
> On 27.03.2024 18:27, Elliott Mitchell wrote:
> > On Mon, Mar 25, 2024 at 02:43:44PM -0700, Elliott Mitchell wrote:
> >> On Mon, Mar 25, 2024 at 08:55:56AM +0100, Jan Beulich wrote:
> >>> On 22.03.2024 20:22, Elliott Mitchell wrote:
> >>>> On Fri, Mar 22, 2024 at 04:41:45PM +0000, Kelly Choi wrote:
> >>>>>
> >>>>> I can see you've recently engaged with our community with some issues you'd
> >>>>> like help with.
> >>>>> We love the fact you are participating in our project; however, our
> >>>>> developers aren't able to help if you do not provide the specific details.
> >>>>
> >>>> Please point to specific details which have been omitted.  Fairly little
> >>>> data has been provided as fairly little data is available.  The primary
> >>>> observation is large numbers of:
> >>>>
> >>>> (XEN) AMD-Vi: IO_PAGE_FAULT: DDDD:bb:dd.f d0 addr ffffff???????000 flags 0x8 I
> >>>>
> >>>> Lines in Xen's ring buffer.
> >>>
> >>> Yet this is (part of) the problem: By providing only the messages that appear
> >>> relevant to you, you imply that you know that no other message is in any way
> >>> relevant. That's judgement you'd better leave to people actually trying to
> >>> investigate. Unless of course you were proposing an actual code change, with
> >>> suitable justification.
> >>
> >> Honestly, I forgot about the very small number of messages from the SATA
> >> subsystem.  The question of whether the current mitigation actions are
> >> effective right now was a bigger issue.  As such, monitoring `xl dmesg`
> >> took priority over looking at SATA messages, which failed to reliably
> >> indicate status.
> >>
> >> I *thought* I would be able to retrieve those via other slow means, but a
> >> different and possibly overlapping issue has shown up.  Unfortunately
> >> this means those are no longer retrievable.   :-(
> > 
> > With some persistence I was able to retrieve them.  There are other
> > pieces of software with worse UIs than Xen.
> > 
> >>> In fact when running into trouble, the usual course of action would be to
> >>> increase verbosity in both hypervisor and kernel, just to make sure no
> >>> potentially relevant message is missed.
> >>
> >> More/better information might have been obtained if I'd been engaged
> >> earlier.
> > 
> > This is still true; things are in full mitigation mode and I'd be
> > quite unhappy to go back to experimenting at this point.
> 
> Well, it very likely won't work without further experimenting by someone
> able to observe the bad behavior. Recall we're on xen-devel here; it is
> kind of expected that without clear (and practical) repro instructions
> experimenting as well as info collection will remain with the reporter.

The first reporter (https://bugs.debian.org/988477) gave pretty specific
details about their setups.

While the exact boundary isn't very well defined, that seems enough to
give a pretty good start.  We don't know whether all Samsung SATA devices
are affected, but most of the recent ones (<5 years old) are.  Reproducing
requires a pair of devices in software RAID1.  It likely reproduces better
with AMD AM4/AM5 processors, but almost certainly needs a fully
operational IOMMU.

(ASUS motherboards tend to have properly set up IOMMUs.)

I would be surprised if you don't have all of the hardware on hand.  The
only issue would be finding an appropriate pair of SATA devices, since
those tend to remain in service.  I would look for older devices which
were removed from service for being too small (such as the 128GB 840 PRO
from the first report), or which were retired after too many writes.


> > I now see why I left those out.  The messages from the SATA subsystem
> > were from a kernel built from an LTS branch into which a bad patch had
> > leaked.  It looks like the SATA subsystem was significantly broken, and
> > I'm unsure whether any useful information could be retrieved.  Notably,
> > there is quite a bit of noise from SATA devices not affected by this issue.
> > 
> > Some of the messages /might/ be useful, but the amount of noise is quite
> > high.  Do messages from a broken kernel interest you?
> 
> If this was a less vague (in terms of possible root causes) issue, I'd
> probably have answered "yes". But in the case here I'm afraid such might
> further confuse things rather than clarifying them.

Okay.


-- 
(\___(\___(\______          --=> 8-) EHM <=--          ______/)___/)___/)
 \BS (    |         ehem+sigmsg@m5p.com  PGP 87145445         |    )   /
  \_CS\   |  _____  -O #include <stddisclaimer.h> O-   _____  |   /  _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445





* Re: Serious AMD-Vi(?) issue
  2024-03-28 15:22                 ` Elliott Mitchell
@ 2024-03-28 16:17                   ` Elliott Mitchell
  0 siblings, 0 replies; 37+ messages in thread
From: Elliott Mitchell @ 2024-03-28 16:17 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel, Andrew Cooper, Roger Pau Monné, Wei Liu,
	Kelly Choi

On Thu, Mar 28, 2024 at 08:22:31AM -0700, Elliott Mitchell wrote:
> On Thu, Mar 28, 2024 at 07:25:02AM +0100, Jan Beulich wrote:
> > On 27.03.2024 18:27, Elliott Mitchell wrote:
> > > On Mon, Mar 25, 2024 at 02:43:44PM -0700, Elliott Mitchell wrote:
> > >> On Mon, Mar 25, 2024 at 08:55:56AM +0100, Jan Beulich wrote:
> > >>>
> > >>> In fact when running into trouble, the usual course of action would be to
> > >>> increase verbosity in both hypervisor and kernel, just to make sure no
> > >>> potentially relevant message is missed.
> > >>
> > >> More/better information might have been obtained if I'd been engaged
> > >> earlier.
> > > 
> > > This is still true; things are in full mitigation mode and I'd be
> > > quite unhappy to go back to experimenting at this point.
> > 
> > Well, it very likely won't work without further experimenting by someone
> > able to observe the bad behavior. Recall we're on xen-devel here; it is
> > kind of expected that without clear (and practical) repro instructions
> > experimenting as well as info collection will remain with the reporter.
> 
> The first reporter (https://bugs.debian.org/988477) gave pretty specific
> details about their setups.
> 
> While the exact boundary isn't very well defined, that seems enough to
> give a pretty good start.  We don't know whether all Samsung SATA devices
> are affected, but most of the recent ones (<5 years old) are.  Reproducing
> requires a pair of devices in software RAID1.  It likely reproduces better
> with AMD AM4/AM5 processors, but almost certainly needs a fully
> operational IOMMU.
> 
> (ASUS motherboards tend to have properly set up IOMMUs.)
> 
> I would be surprised if you don't have all of the hardware on hand.  The
> only issue would be finding an appropriate pair of SATA devices, since
> those tend to remain in service.  I would look for older devices which
> were removed from service for being too small (such as the 128GB 840 PRO
> from the first report), or which were retired after too many writes.

Come to think of it, there is one more possible ingredient.  As in the
first report, when the problem occurred the SATA device was plugged into
an on-chipset SATA port, not into the extra controller this motherboard
has.  I don't know whether the performance difference of an off-chipset
controller would influence this, but it might.
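
You can check which controller a given disk sits behind by resolving the disk's sysfs path in domain 0.  The resulting address can then be compared with the `DDDD:bb:dd.f` address in the `IO_PAGE_FAULT` lines.  This is a minimal sketch.  It assumes a Linux domain 0 where `/sys/block/<disk>` is a symlink into the PCI device tree, and that domain 0 sees the same PCI addresses Xen reports.  The function names are hypothetical helpers, not an existing tool.

```python
import os
import re
from typing import Optional

# A PCI address component in sysfs, e.g. 0000:00:11.0 (segment:bus:device.function).
PCI_ADDR_RE = re.compile(r"^[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]$")


def controller_from_path(resolved: str) -> Optional[str]:
    """Return the deepest PCI address in a resolved sysfs path, if any."""
    parts = [p for p in resolved.split("/") if PCI_ADDR_RE.match(p)]
    return parts[-1] if parts else None


def controller_for_disk(disk: str) -> Optional[str]:
    """Look up the PCI function (SATA/NVMe controller) behind e.g. 'sda'."""
    return controller_from_path(os.path.realpath(f"/sys/block/{disk}"))
```

The deepest PCI address is taken because bridges appear earlier in the path than the controller itself.  For example, a hypothetical resolved path `/sys/devices/pci0000:00/0000:00:08.1/0000:03:00.1/ata1/host0/target0:0:0/0:0:0:0/block/sda` yields `0000:03:00.1`.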


-- 
(\___(\___(\______          --=> 8-) EHM <=--          ______/)___/)___/)
 \BS (    |         ehem+sigmsg@m5p.com  PGP 87145445         |    )   /
  \_CS\   |  _____  -O #include <stddisclaimer.h> O-   _____  |   /  _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445





* Re: Serious AMD-Vi(?) issue
  2024-03-28  6:25               ` Jan Beulich
  2024-03-28 15:22                 ` Elliott Mitchell
@ 2024-04-11  2:41                 ` Elliott Mitchell
  2024-04-17 12:40                   ` Jan Beulich
  1 sibling, 1 reply; 37+ messages in thread
From: Elliott Mitchell @ 2024-04-11  2:41 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel, Andrew Cooper, Roger Pau Monné, Wei Liu,
	Kelly Choi

On Thu, Mar 28, 2024 at 07:25:02AM +0100, Jan Beulich wrote:
> On 27.03.2024 18:27, Elliott Mitchell wrote:
> > On Mon, Mar 25, 2024 at 02:43:44PM -0700, Elliott Mitchell wrote:
> >> On Mon, Mar 25, 2024 at 08:55:56AM +0100, Jan Beulich wrote:
> >>>
> >>> In fact when running into trouble, the usual course of action would be to
> >>> increase verbosity in both hypervisor and kernel, just to make sure no
> >>> potentially relevant message is missed.
> >>
> >> More/better information might have been obtained if I'd been engaged
> >> earlier.
> > 
> > This is still true; things are in full mitigation mode and I'd be
> > quite unhappy to go back to experimenting at this point.
> 
> Well, it very likely won't work without further experimenting by someone
> able to observe the bad behavior. Recall we're on xen-devel here; it is
> kind of expected that without clear (and practical) repro instructions
> experimenting as well as info collection will remain with the reporter.

After looking at the situation and considering the issues, I /may/ be
able to set up for more testing.  I guess I should confirm: which of
those criteria do you think the currently provided information fails
to meet?

AMD-IOMMU + Linux MD RAID1 + dual Samsung SATA (or various NVMe) +
dbench seems a pretty specific setup.

I could see this being criticised as impractical if /new/ devices were
required, but the confirmed flash devices are several generations old.
The difficulty is that cheaper candidate devices are being recycled for
their precious-metal content rather than resold as used.
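
The setup listed above (AMD-IOMMU + Linux MD RAID1 + two Samsung SATA or NVMe devices + dbench) could be scripted roughly as follows.  This is a hedged sketch that only builds and prints the commands; it does not run them.  Several details are assumptions not given in this thread: the device paths, mount point, ext4 filesystem, client count and run time.  Note that `mdadm --create` destroys whatever is on the two devices.

```python
import shlex
from typing import List


def repro_commands(dev_a: str, dev_b: str, md: str = "/dev/md0",
                   mountpoint: str = "/mnt/raid1test",
                   clients: int = 8, seconds: int = 600) -> List[List[str]]:
    """Build (but do not run) commands for an MD RAID1 + dbench stress run.

    WARNING: mdadm --create wipes dev_a and dev_b.
    """
    return [
        # Two-device RAID1 array, as in the reports.
        ["mdadm", "--create", md, "--level=1", "--raid-devices=2", dev_a, dev_b],
        # Filesystem choice is an assumption; the reports do not specify one.
        ["mkfs.ext4", md],
        ["mkdir", "-p", mountpoint],
        ["mount", md, mountpoint],
        # dbench: -D sets the working directory, -t the run time in seconds.
        ["dbench", "-D", mountpoint, "-t", str(seconds), str(clients)],
        # Afterwards, look for IO_PAGE_FAULT lines in Xen's ring buffer.
        ["xl", "dmesg"],
    ]


# Hypothetical device names; substitute the real (expendable!) devices.
for cmd in repro_commands("/dev/sdX", "/dev/sdY"):
    print(shlex.join(cmd))
```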


-- 
(\___(\___(\______          --=> 8-) EHM <=--          ______/)___/)___/)
 \BS (    |         ehem+sigmsg@m5p.com  PGP 87145445         |    )   /
  \_CS\   |  _____  -O #include <stddisclaimer.h> O-   _____  |   /  _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445





* Re: Serious AMD-Vi(?) issue
  2024-04-11  2:41                 ` Elliott Mitchell
@ 2024-04-17 12:40                   ` Jan Beulich
  2024-04-18  6:45                     ` Elliott Mitchell
  0 siblings, 1 reply; 37+ messages in thread
From: Jan Beulich @ 2024-04-17 12:40 UTC (permalink / raw)
  To: Elliott Mitchell
  Cc: xen-devel, Andrew Cooper, Roger Pau Monné, Wei Liu,
	Kelly Choi

On 11.04.2024 04:41, Elliott Mitchell wrote:
> On Thu, Mar 28, 2024 at 07:25:02AM +0100, Jan Beulich wrote:
>> On 27.03.2024 18:27, Elliott Mitchell wrote:
>>> On Mon, Mar 25, 2024 at 02:43:44PM -0700, Elliott Mitchell wrote:
>>>> On Mon, Mar 25, 2024 at 08:55:56AM +0100, Jan Beulich wrote:
>>>>>
>>>>> In fact when running into trouble, the usual course of action would be to
>>>>> increase verbosity in both hypervisor and kernel, just to make sure no
>>>>> potentially relevant message is missed.
>>>>
>>>> More/better information might have been obtained if I'd been engaged
>>>> earlier.
>>>
>>> This is still true; things are in full mitigation mode and I'd be
>>> quite unhappy to go back to experimenting at this point.
>>
>> Well, it very likely won't work without further experimenting by someone
>> able to observe the bad behavior. Recall we're on xen-devel here; it is
>> kind of expected that without clear (and practical) repro instructions
>> experimenting as well as info collection will remain with the reporter.
> 
> After looking at the situation and considering the issues, I /may/ be
> able to set up for more testing.  I guess I should confirm: which of
> those criteria do you think the currently provided information fails
> to meet?
> 
> AMD-IOMMU + Linux MD RAID1 + dual Samsung SATA (or various NVMe) +
> dbench seems a pretty specific setup.

Indeed. If that's the only way to observe the issue, it suggests to me
that it'll need to be mainly you to do further testing, and perhaps even
debugging. Which isn't to say we're not available to help, but from all
I have gathered so far we're pretty much in the dark even as to which
component(s) may be to blame. As can still be seen at the top in reply
context, some suggestions were given as to obtaining possible further
information (or confirming the absence thereof).

I'd also like to come back to the vague theory you did voice, in that
you're suspecting flushes to take too long. I continue to have trouble
with this, and I would therefore like to ask that you put this down in
more technical terms, making connections to actual actions taken by
software / hardware.

Jan

> I could see this being criticised as impractical if /new/ devices were
> required, but the confirmed flash devices are several generations old.
> The difficulty is that cheaper candidate devices are being recycled for
> their precious-metal content rather than resold as used.
> 
> 




* Re: Serious AMD-Vi(?) issue
  2024-04-17 12:40                   ` Jan Beulich
@ 2024-04-18  6:45                     ` Elliott Mitchell
  2024-04-18  7:09                       ` Jan Beulich
  0 siblings, 1 reply; 37+ messages in thread
From: Elliott Mitchell @ 2024-04-18  6:45 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel, Andrew Cooper, Roger Pau Monné, Wei Liu,
	Kelly Choi

On Wed, Apr 17, 2024 at 02:40:09PM +0200, Jan Beulich wrote:
> On 11.04.2024 04:41, Elliott Mitchell wrote:
> > On Thu, Mar 28, 2024 at 07:25:02AM +0100, Jan Beulich wrote:
> >> On 27.03.2024 18:27, Elliott Mitchell wrote:
> >>> On Mon, Mar 25, 2024 at 02:43:44PM -0700, Elliott Mitchell wrote:
> >>>> On Mon, Mar 25, 2024 at 08:55:56AM +0100, Jan Beulich wrote:
> >>>>>
> >>>>> In fact when running into trouble, the usual course of action would be to
> >>>>> increase verbosity in both hypervisor and kernel, just to make sure no
> >>>>> potentially relevant message is missed.
> >>>>
> >>>> More/better information might have been obtained if I'd been engaged
> >>>> earlier.
> >>>
> >>> This is still true; things are in full mitigation mode and I'd be
> >>> quite unhappy to go back to experimenting at this point.
> >>
> >> Well, it very likely won't work without further experimenting by someone
> >> able to observe the bad behavior. Recall we're on xen-devel here; it is
> >> kind of expected that without clear (and practical) repro instructions
> >> experimenting as well as info collection will remain with the reporter.
> > 
> > After looking at the situation and considering the issues, I /may/ be
> > able to set up for more testing.  I guess I should confirm: which of
> > those criteria do you think the currently provided information fails
> > to meet?
> > 
> > AMD-IOMMU + Linux MD RAID1 + dual Samsung SATA (or various NVMe) +
> > dbench seems a pretty specific setup.
> 
> Indeed. If that's the only way to observe the issue, it suggests to me
> that it'll need to be mainly you to do further testing, and perhaps even
> debugging. Which isn't to say we're not available to help, but from all
> I have gathered so far we're pretty much in the dark even as to which
> component(s) may be to blame. As can still be seen at the top in reply
> context, some suggestions were given as to obtaining possible further
> information (or confirming the absence thereof).

There may be other ways which haven't yet been found.

I've been left with the suspicion that AMD was to some degree sponsoring
work to ensure Xen works on their hardware.  Given the severity of this
problem, I would kind of expect them not to want a reputation for having
data loss issues.  Even if a suitable pair of devices weren't already on
hand, I would kind of expect acquiring them to be well within their
budget.

> I'd also like to come back to the vague theory you did voice, in that
> you're suspecting flushes to take too long. I continue to have trouble
> with this, and I would therefore like to ask that you put this down in
> more technical terms, making connections to actual actions taken by
> software / hardware.

I'm trying to figure out a pattern.

Nominally all the devices are roughly on par (only a very cheap flash
device will be unable to overwhelm SATA's bandwidth).  Yet why did the
Crucial SATA device /seem/ not to have the issue?  Why did a Crucial NVMe
device demonstrate the issue?

My guess is that the flash controllers Samsung uses may be able to start
executing commands faster than the ones Crucial uses.  Meanwhile, NVMe
has lower overhead and latency than SATA (SATA's overhead isn't an issue
for spinning disks).  Perhaps, when the command arrives, the IOMMU is
still flushing its TLB or hasn't yet loaded the new tables.

I suspect that when MD-RAID1 issues block requests to a pair of devices,
it sends the block to one device and then reuses most or all of the
structures for the second device.  As a result, the second request would
likely get its command to the device rather faster than the first.

Perhaps look into which structures the MD-RAID1 subsystem reuses, then
see whether setting up those structures early triggers the issue?

(okay, I'm deep into speculation here, but this seems the simplest
explanation for what could be occurring)
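
The reuse hypothesis above can be phrased as a toy model.  Everything here is an assumption taken from that speculation, not known MD-RAID1 or IOMMU behaviour:

- The first leg of a mirrored write pays a full setup cost before its command reaches the device.
- The second leg reuses most of that work and pays a smaller cost.
- Each leg's IOMMU mapping becomes usable a fixed time after that leg's submission starts.

The function name and timings are hypothetical.

```python
from typing import List


def legs_that_race(setup_us: float, reuse_us: float,
                   mapping_ready_us: float, legs: int = 2) -> List[bool]:
    """Toy model of a mirrored (RAID1) write.

    For each leg, return True if its command would reach the device before
    that leg's mapping is usable, i.e. inside the window where DMA could fault.
    Leg 0 pays the full setup cost; later legs only pay the reuse cost.
    """
    result = []
    for leg in range(legs):
        time_to_command = setup_us if leg == 0 else reuse_us
        result.append(time_to_command < mapping_ready_us)
    return result


# Hypothetical numbers: slow first leg, fast second leg.
print(legs_that_race(setup_us=100.0, reuse_us=10.0, mapping_ready_us=50.0))
```

With these invented numbers, only the second leg falls inside the window, which is the shape the hypothesis predicts.  Setting up the reused structures early, as suggested above, would be one way to test whether real hardware behaves like this.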

-- 
(\___(\___(\______          --=> 8-) EHM <=--          ______/)___/)___/)
 \BS (    |         ehem+sigmsg@m5p.com  PGP 87145445         |    )   /
  \_CS\   |  _____  -O #include <stddisclaimer.h> O-   _____  |   /  _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445


* Re: Serious AMD-Vi(?) issue
  2024-04-18  6:45                     ` Elliott Mitchell
@ 2024-04-18  7:09                       ` Jan Beulich
  2024-04-19  4:33                         ` Elliott Mitchell
  0 siblings, 1 reply; 37+ messages in thread
From: Jan Beulich @ 2024-04-18  7:09 UTC (permalink / raw)
  To: Elliott Mitchell
  Cc: xen-devel, Andrew Cooper, Roger Pau Monné, Wei Liu,
	Kelly Choi

On 18.04.2024 08:45, Elliott Mitchell wrote:
> On Wed, Apr 17, 2024 at 02:40:09PM +0200, Jan Beulich wrote:
>> On 11.04.2024 04:41, Elliott Mitchell wrote:
>>> On Thu, Mar 28, 2024 at 07:25:02AM +0100, Jan Beulich wrote:
>>>> On 27.03.2024 18:27, Elliott Mitchell wrote:
>>>>> On Mon, Mar 25, 2024 at 02:43:44PM -0700, Elliott Mitchell wrote:
>>>>>> On Mon, Mar 25, 2024 at 08:55:56AM +0100, Jan Beulich wrote:
>>>>>>>
>>>>>>> In fact when running into trouble, the usual course of action would be to
>>>>>>> increase verbosity in both hypervisor and kernel, just to make sure no
>>>>>>> potentially relevant message is missed.
>>>>>>
>>>>>> More/better information might have been obtained if I'd been engaged
>>>>>> earlier.
>>>>>
>>>>> This is still true; things are in full mitigation mode and I'd be
>>>>> quite unhappy to go back to experimenting at this point.
>>>>
>>>> Well, it very likely won't work without further experimenting by someone
>>>> able to observe the bad behavior. Recall we're on xen-devel here; it is
>>>> kind of expected that without clear (and practical) repro instructions
>>>> experimenting as well as info collection will remain with the reporter.
>>>
>>> After looking at the situation and considering the issues, I /may/ be
>>> able to set up for more testing.  I guess I should confirm: which of
>>> those criteria do you think the currently provided information fails
>>> to meet?
>>>
>>> AMD-IOMMU + Linux MD RAID1 + dual Samsung SATA (or various NVMe) +
>>> dbench seems a pretty specific setup.
>>
>> Indeed. If that's the only way to observe the issue, it suggests to me
>> that it'll need to be mainly you to do further testing, and perhaps even
>> debugging. Which isn't to say we're not available to help, but from all
>> I have gathered so far we're pretty much in the dark even as to which
>> component(s) may be to blame. As can still be seen at the top in reply
>> context, some suggestions were given as to obtaining possible further
>> information (or confirming the absence thereof).
> 
> There may be other ways which haven't yet been found.
> 
> I've been left with the suspicion that AMD was to some degree sponsoring
> work to ensure Xen works on their hardware.  Given the severity of this
> problem, I would kind of expect them not to want a reputation for having
> data loss issues.  Even if a suitable pair of devices weren't already on
> hand, I would kind of expect acquiring them to be well within their
> budget.

You've got to talk to AMD then. Plus I assume it's clear to you that
even if the (presumably) necessary hardware was available, it still
would require respective setup, leaving open whether the issue then
could indeed be reproduced.

>> I'd also like to come back to the vague theory you did voice, in that
>> you're suspecting flushes to take too long. I continue to have trouble
>> with this, and I would therefore like to ask that you put this down in
>> more technical terms, making connections to actual actions taken by
>> software / hardware.
> 
> I'm trying to figure out a pattern.
> 
> Nominally all the devices are roughly on par (only a very cheap flash
> device will be unable to overwhelm SATA's bandwidth).  Yet why did the
> Crucial SATA device /seem/ not to have the issue?  Why did a Crucial NVMe
> device demonstrate the issue?
> 
> My guess is that the flash controllers Samsung uses may be able to start
> executing commands faster than the ones Crucial uses.  Meanwhile, NVMe
> has lower overhead and latency than SATA (SATA's overhead isn't an issue
> for spinning disks).  Perhaps, when the command arrives, the IOMMU is
> still flushing its TLB or hasn't yet loaded the new tables.

Which would be an IOMMU issue then, that software at best may be able to
work around.

Jan

> I suspect that when MD-RAID1 issues block requests to a pair of devices,
> it sends the block to one device and then reuses most or all of the
> structures for the second device.  As a result, the second request would
> likely get its command to the device rather faster than the first.
> 
> Perhaps look into which structures the MD-RAID1 subsystem reuses, then
> see whether setting up those structures early triggers the issue?
> 
> (okay, I'm deep into speculation here, but this seems the simplest
> explanation for what could be occurring)
> 
> 




* Re: Serious AMD-Vi(?) issue
  2024-04-18  7:09                       ` Jan Beulich
@ 2024-04-19  4:33                         ` Elliott Mitchell
  2024-05-11  4:09                           ` Elliott Mitchell
  0 siblings, 1 reply; 37+ messages in thread
From: Elliott Mitchell @ 2024-04-19  4:33 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel, Andrew Cooper, Roger Pau Monné, Wei Liu,
	Kelly Choi

On Thu, Apr 18, 2024 at 09:09:51AM +0200, Jan Beulich wrote:
> On 18.04.2024 08:45, Elliott Mitchell wrote:
> > On Wed, Apr 17, 2024 at 02:40:09PM +0200, Jan Beulich wrote:
> >> On 11.04.2024 04:41, Elliott Mitchell wrote:
> >>> On Thu, Mar 28, 2024 at 07:25:02AM +0100, Jan Beulich wrote:
> >>>> On 27.03.2024 18:27, Elliott Mitchell wrote:
> >>>>> On Mon, Mar 25, 2024 at 02:43:44PM -0700, Elliott Mitchell wrote:
> >>>>>> On Mon, Mar 25, 2024 at 08:55:56AM +0100, Jan Beulich wrote:
> >>>>>>>
> >>>>>>> In fact when running into trouble, the usual course of action would be to
> >>>>>>> increase verbosity in both hypervisor and kernel, just to make sure no
> >>>>>>> potentially relevant message is missed.
> >>>>>>
> >>>>>> More/better information might have been obtained if I'd been engaged
> >>>>>> earlier.
> >>>>>
> >>>>> This is still true; things are in full mitigation mode and I'd be
> >>>>> quite unhappy to go back to experimenting at this point.
> >>>>
> >>>> Well, it very likely won't work without further experimenting by someone
> >>>> able to observe the bad behavior. Recall we're on xen-devel here; it is
> >>>> kind of expected that without clear (and practical) repro instructions
> >>>> experimenting as well as info collection will remain with the reporter.
> >>>
> >>> After looking at the situation and considering the issues, I /may/ be
> >>> able to set up for more testing.  I guess I should confirm: which of
> >>> those criteria do you think the currently provided information fails
> >>> to meet?
> >>>
> >>> AMD-IOMMU + Linux MD RAID1 + dual Samsung SATA (or various NVMe) +
> >>> dbench seems a pretty specific setup.
> >>
> >> Indeed. If that's the only way to observe the issue, it suggests to me
> >> that it'll need to be mainly you to do further testing, and perhaps even
> >> debugging. Which isn't to say we're not available to help, but from all
> >> I have gathered so far we're pretty much in the dark even as to which
> >> component(s) may be to blame. As can still be seen at the top in reply
> >> context, some suggestions were given as to obtaining possible further
> >> information (or confirming the absence thereof).
> > 
> > There may be other ways which haven't yet been found.
> > 
> > I've been left with the suspicion AMD was to some degree sponsoring
> > work to ensure Xen works on their hardware.  Given the severity of this
> > problem I would kind of expect them not to want to gain a reputation for
> > having data loss issues.  Assuming a suitable pair of devices weren't
> > already on-hand, I would kind of expect this to be well within their
> > budget.
> 
> You've got to talk to AMD then. Plus I assume it's clear to you that
> even if the (presumably) necessary hardware was available, it still
> would require respective setup, leaving open whether the issue then
> could indeed be reproduced.

I had a vain hope your links to AMD would allow you to say "we've got a
major problem in need of addressing ASAP".

I suspect it will reproduce readily.  The sparsity of reports is likely
due to few people using RAID1 for flash.  Although the initial surveys
suggest flash has a noticeably lower early failure rate, they still point
to a non-negligible number of failures within the first 5 years.

> >> I'd also like to come back to the vague theory you did voice, in that
> >> you're suspecting flushes to take too long. I continue to have trouble
> >> with this, and I would therefore like to ask that you put this down in
> >> more technical terms, making connections to actual actions taken by
> >> software / hardware.
> > 
> > I'm trying to figure out a pattern.
> > 
> > Nominally all the devices are roughly on par (only a very cheap flash
> > device will be unable to overwhelm SATA's bandwidth).  Yet why did the
> > Crucial SATA device /seem/ not to have the issue?  Why did a Crucial NVMe
> > device demonstrate the issue?
> > 
> > My guess is the flash controllers Samsung uses may be able to start
> > executing commands faster than the ones Crucial uses.  Meanwhile NVMe
> > is lower overhead and latency than SATA (SATA's overhead isn't an issue
> > for actual disks).  Perhaps the IOMMU is still flushing its TLB, or
> > hasn't loaded the new tables.
> 
> Which would be an IOMMU issue then, that software at best may be able to
> work around.

Yet even if RAID1 on flash is rare, I would expect this to have already
manifested on Linux without Xen.  That would suggest Linux likely
already has some sort of workaround.

I suspect some step is missing from Xen's IOMMU handling.  Perhaps
something which Linux does during an early DMA setup stage, but the
current Xen implementation does lazily?  Alternatively, some flag
setting or other missing step?

I should be able to do another test approach in a few weeks, but I would
love if something could be found sooner.


-- 
(\___(\___(\______          --=> 8-) EHM <=--          ______/)___/)___/)
 \BS (    |         ehem+sigmsg@m5p.com  PGP 87145445         |    )   /
  \_CS\   |  _____  -O #include <stddisclaimer.h> O-   _____  |   /  _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445





* Re: Serious AMD-Vi(?) issue
  2024-04-19  4:33                         ` Elliott Mitchell
@ 2024-05-11  4:09                           ` Elliott Mitchell
  2024-05-13  8:44                             ` Roger Pau Monné
  0 siblings, 1 reply; 37+ messages in thread
From: Elliott Mitchell @ 2024-05-11  4:09 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel, Andrew Cooper, Roger Pau Monné, Wei Liu,
	Kelly Choi

On Thu, Apr 18, 2024 at 09:33:31PM -0700, Elliott Mitchell wrote:
> 
> I suspect some step is missing from Xen's IOMMU handling.  Perhaps
> something which Linux does during an early DMA setup stage, but the
> current Xen implementation does lazily?  Alternatively, some flag
> setting or other missing step?
> 
> I should be able to do another test approach in a few weeks, but I would
> love if something could be found sooner.

Turned out to be disturbingly easy to get the first entry when it
happened.  Didn't even need `dbench`, it simply showed once the OS was
fully loaded.  I did get some additional data points.

Appears this requires an AMD IOMMUv2.  A test system with known
functioning AMD IOMMUv1 didn't display the issue at all.

(XEN) AMD-Vi: IO_PAGE_FAULT: DDDD:bb:dd.f d0 addr fffffffdf8000000 flags 0x8 I
(XEN) DDDD:bb:dd.f root @ 83b5f5 (3 levels) dfn=fffffffdf8000
(XEN)   L3[1f7] = 0 np
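
A minimal sketch of how the dfn and the "L3[1f7]" index in the walk above follow from the fault address, assuming 4 KiB IOMMU pages and 512-entry (9-bit) table levels. The function names are hypothetical and only illustrate the arithmetic. It also shows that a 3-level table spans only 39 address bits, so, under these assumptions, the high bits of this address lie outside anything such a table could map.

```python
PAGE_SHIFT = 12      # assumption: 4 KiB IOMMU pages
BITS_PER_LEVEL = 9   # assumption: 512 entries per table level


def dfn_of(addr: int) -> int:
    """Device frame number: the address with the page offset dropped."""
    return addr >> PAGE_SHIFT


def level_indices(addr: int, levels: int) -> list:
    """Index used at each table level, top level first (L<levels> ... L1)."""
    dfn = dfn_of(addr)
    mask = (1 << BITS_PER_LEVEL) - 1
    return [(dfn >> (BITS_PER_LEVEL * (lvl - 1))) & mask
            for lvl in range(levels, 0, -1)]


def reachable_bits(levels: int) -> int:
    """Width of the address space a table with this many levels can map."""
    return PAGE_SHIFT + BITS_PER_LEVEL * levels


addr = 0xfffffffdf8000000
print(hex(dfn_of(addr)))                                 # 0xfffffffdf8000
print([hex(i) for i in level_indices(addr, 3)])          # ['0x1f7', '0x1c0', '0x0']
print(reachable_bits(3), addr >> reachable_bits(3) != 0)  # 39 True
```

The top-level index 0x1f7 matches the "L3[1f7]" entry in the log, which is consistent with the walk simply indexing the table with the faulting address.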

I find it surprising this required "iommu=debug" to get this level of
detail.  This amount of output seems more appropriate for "verbose".

I strongly prefer to provide snippets.  There is a fair bit of output,
I'm unsure which portion is most pertinent.


-- 
(\___(\___(\______          --=> 8-) EHM <=--          ______/)___/)___/)
 \BS (    |         ehem+sigmsg@m5p.com  PGP 87145445         |    )   /
  \_CS\   |  _____  -O #include <stddisclaimer.h> O-   _____  |   /  _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445





* Re: Serious AMD-Vi(?) issue
  2024-05-11  4:09                           ` Elliott Mitchell
@ 2024-05-13  8:44                             ` Roger Pau Monné
  2024-05-13 20:11                               ` Elliott Mitchell
                                                 ` (2 more replies)
  0 siblings, 3 replies; 37+ messages in thread
From: Roger Pau Monné @ 2024-05-13  8:44 UTC (permalink / raw)
  To: Elliott Mitchell
  Cc: Jan Beulich, xen-devel, Andrew Cooper, Wei Liu, Kelly Choi

On Fri, May 10, 2024 at 09:09:54PM -0700, Elliott Mitchell wrote:
> On Thu, Apr 18, 2024 at 09:33:31PM -0700, Elliott Mitchell wrote:
> > 
> > I suspect some step is missing from Xen's IOMMU handling.  Perhaps
> > something which Linux does during an early DMA setup stage, but the
> > current Xen implementation does lazily?  Alternatively, some flag
> > setting or other missing step?
> > 
> > I should be able to do another test approach in a few weeks, but I would
> > love if something could be found sooner.
> 
> Turned out to be disturbingly easy to get the first entry when it
> happened.  Didn't even need `dbench`, it simply showed once the OS was
> fully loaded.  I did get some additional data points.
> 
> Appears this requires an AMD IOMMUv2.  A test system with known
> functioning AMD IOMMUv1 didn't display the issue at all.
> 
> (XEN) AMD-Vi: IO_PAGE_FAULT: DDDD:bb:dd.f d0 addr fffffffdf8000000 flags 0x8 I

I would expect the address field to contain more information about the
fault, but I'm not finding any information in the AMD-Vi specification
apart from the fact that it contains the DVA, which makes no sense when the
fault is caused by an interrupt.

> (XEN) DDDD:bb:dd.f root @ 83b5f5 (3 levels) dfn=fffffffdf8000
> (XEN)   L3[1f7] = 0 np

Attempting to print the page table walk for an Interrupt remapping
fault is useless, we should likely avoid that when the I flag is set.
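
The flags value can be decoded bit by bit. The bit names below are an assumption based on how Xen appears to label IO_PAGE_FAULT events (the " I" suffix printed for 0x8 in the log above matches); they should be checked against the AMD IOMMU specification. `should_dump_page_walk` is a hypothetical helper illustrating the suggestion to skip the page-table walk for interrupt faults.

```python
# Assumed IO_PAGE_FAULT flag bits, lowest bit first; verify against the
# AMD IOMMU specification before relying on them.
FAULT_FLAG_NAMES = {
    0x001: "GN",
    0x002: "NX",
    0x004: "US",
    0x008: "I",   # the faulting request was an interrupt, not a DMA access
    0x010: "PR",
    0x020: "RW",
    0x040: "PE",
    0x080: "RZ",
    0x100: "TR",
}


def decode_flags(flags: int) -> list:
    """Names of all known flag bits set in `flags`, lowest bit first."""
    return [name for bit, name in FAULT_FLAG_NAMES.items() if flags & bit]


def should_dump_page_walk(flags: int) -> bool:
    """Hypothetical policy: a DMA page-table walk says nothing useful
    about an interrupt-remapping fault, so skip it when I is set."""
    return not (flags & 0x008)


print(decode_flags(0x8))           # ['I']
print(should_dump_page_walk(0x8))  # False
```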

> 
> I find it surprising this required "iommu=debug" to get this level of
> detail.  This amount of output seems more appropriate for "verbose".

"verbose" should also print this information.

> 
> I strongly prefer to provide snippets.  There is a fair bit of output,
> I'm unsure which portion is most pertinent.

I've already voiced my concern that I think what you are doing is not
fair.  We are debugging this out of interest, and hence you refusing
to provide all information just hampers our ability to debug, and
makes us spend more time than required just thinking what snippets we
need to ask for.

I will ask again, what's there in the Xen or the Linux dmesgs that you
are so worried about leaking? Please provide a specific example.

Why do you mask the device SBDF in the above snippet?  I would really
like to understand what's so privacy relevant in a PCI SBDF number.

Does booting with `iommu=no-intremap` lead to any issues being
reported?

Regards, Roger.


* Re: Serious AMD-Vi(?) issue
  2024-05-13  8:44                             ` Roger Pau Monné
@ 2024-05-13 20:11                               ` Elliott Mitchell
  2024-05-14  8:22                                 ` Jan Beulich
  2024-05-14  8:20                               ` Jan Beulich
  2024-06-28  0:18                               ` Elliott Mitchell
  2 siblings, 1 reply; 37+ messages in thread
From: Elliott Mitchell @ 2024-05-13 20:11 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Jan Beulich, xen-devel, Andrew Cooper, Wei Liu, Kelly Choi

On Mon, May 13, 2024 at 10:44:59AM +0200, Roger Pau Monné wrote:
> On Fri, May 10, 2024 at 09:09:54PM -0700, Elliott Mitchell wrote:
> > On Thu, Apr 18, 2024 at 09:33:31PM -0700, Elliott Mitchell wrote:
> > > 
> > > I suspect some step is missing from Xen's IOMMU handling.  Perhaps
> > > something which Linux does during an early DMA setup stage, but the
> > > current Xen implementation does lazily?  Alternatively, some flag
> > > setting or other missing step?
> > > 
> > > I should be able to do another test approach in a few weeks, but I would
> > > love if something could be found sooner.
> > 
> > Turned out to be disturbingly easy to get the first entry when it
> > happened.  Didn't even need `dbench`, it simply showed once the OS was
> > fully loaded.  I did get some additional data points.
> > 
> > Appears this requires an AMD IOMMUv2.  A test system with known
> > functioning AMD IOMMUv1 didn't display the issue at all.
> > 
> > (XEN) AMD-Vi: IO_PAGE_FAULT: DDDD:bb:dd.f d0 addr fffffffdf8000000 flags 0x8 I
> 
> I would expect the address field to contain more information about the
> fault, but I'm not finding any information in the AMD-Vi specification
> apart from the fact that it contains the DVA, which makes no sense when the
> fault is caused by an interrupt.
> 
> > (XEN) DDDD:bb:dd.f root @ 83b5f5 (3 levels) dfn=fffffffdf8000
> > (XEN)   L3[1f7] = 0 np
> 
> Attempting to print the page table walk for an Interrupt remapping
> fault is useless, we should likely avoid that when the I flag is set.

> > I find it surprising this required "iommu=debug" to get this level of
> > detail.  This amount of output seems more appropriate for "verbose".
> 
> "verbose" should also print this information.

Mostly I've noticed Xen's dmesg seems a bit sparse at default settings.
Confirming IOMMU was recognized and operational had been a challenge.  On
the flip side this does mean less potentially sensitive data gets in.

> > I strongly prefer to provide snippets.  There is a fair bit of output,
> > I'm unsure which portion is most pertinent.
> 
> I've already voiced my concern that I think what you are doing is not
> fair.  We are debugging this out of interest, and hence you refusing
> to provide all information just hampers our ability to debug, and
> makes us spend more time than required just thinking what snippets we
> need to ask for.
> 
> I will ask again, what's there in the Xen or the Linux dmesgs that you
> are so worried about leaking? Please provide a specific example.

I cannot point to specific data in Xen's dmesg which is known to be
sensitive.  On the flip side all the addresses could readily function as
a subliminal channel.

Might only be kernels from certain vendors, but hardware serial numbers
frequently make it into Linux's dmesg.  All the data coming from ACPI
tables could readily hide something.  Worse, data which seems harmless
now might later turn out to reveal things.

The usual approach is everyone has PGP keys and logs are kept private
on request.

> Why do you mask the device SBDF in the above snippet?  I would really
> like to understand what's so privacy relevant in a PCI SBDF number.

I doubt it reveals much.  Simply seems unlikely to help debugging and
therefore I prefer to mask it.  One more Xen dmesg line:

(XEN) AMD-Vi: Setup I/O page table: device id = 0xbbdd, type = 0x1, root table = 0xADDRADDR, domain = 0, paging mode = 3
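
For reference, the 16-bit AMD-Vi device ID is the PCI requester ID: bus in bits 15:8, device in bits 7:3 and function in bits 2:0. That is how the masked "bb:dd.f" in the fault line relates to the masked "0xbbdd" in this line, and why an unmasked SBDF can be cross-referenced between messages. A sketch of the conversion; the value 0x0a10 is purely hypothetical, since the real one was masked.

```python
def device_id_to_sbdf(device_id: int, segment: int = 0) -> str:
    """Format a 16-bit requester ID as SSSS:bb:dd.f."""
    bus = (device_id >> 8) & 0xff
    dev = (device_id >> 3) & 0x1f
    fn = device_id & 0x7
    return f"{segment:04x}:{bus:02x}:{dev:02x}.{fn}"


def sbdf_to_device_id(bus: int, dev: int, fn: int) -> int:
    """Inverse: pack bus/device/function back into a requester ID."""
    return (bus << 8) | (dev << 3) | fn


print(device_id_to_sbdf(0x0a10))  # 0000:0a:02.0 (hypothetical device)
```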

> Does booting with `iommu=no-intremap` lead to any issues being
> reported?

I'll try that next time I restart the system.


Another viable approach.  I imagine one or more of the Xen developers
have computers with AMD processors.  I could send a pair of SATA devices
which are known to exhibit the behavior to someone.

The known reproductions have featured ASUS motherboards.  I doubt this is
a requirement, but if one of the main developers has such a system that
is a better target.  I also note these are plugged into motherboard SATA
ports.  It is possible that add-on card SATA ports would not exhibit
the behavior.

Then you may discover not much log data is being provided simply due to
not much log data being generated.


-- 
(\___(\___(\______          --=> 8-) EHM <=--          ______/)___/)___/)
 \BS (    |         ehem+sigmsg@m5p.com  PGP 87145445         |    )   /
  \_CS\   |  _____  -O #include <stddisclaimer.h> O-   _____  |   /  _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445





* Re: Serious AMD-Vi(?) issue
  2024-05-13  8:44                             ` Roger Pau Monné
  2024-05-13 20:11                               ` Elliott Mitchell
@ 2024-05-14  8:20                               ` Jan Beulich
  2024-06-28  0:18                               ` Elliott Mitchell
  2 siblings, 0 replies; 37+ messages in thread
From: Jan Beulich @ 2024-05-14  8:20 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, Andrew Cooper, Wei Liu, Kelly Choi, Elliott Mitchell

On 13.05.2024 10:44, Roger Pau Monné wrote:
> On Fri, May 10, 2024 at 09:09:54PM -0700, Elliott Mitchell wrote:
>> On Thu, Apr 18, 2024 at 09:33:31PM -0700, Elliott Mitchell wrote:
>>>
>>> I suspect some step is missing from Xen's IOMMU handling.  Perhaps
>>> something which Linux does during an early DMA setup stage, but the
>>> current Xen implementation does lazily?  Alternatively, some flag
>>> setting or other missing step?
>>>
>>> I should be able to do another test approach in a few weeks, but I would
>>> love if something could be found sooner.
>>
>> Turned out to be disturbingly easy to get the first entry when it
>> happened.  Didn't even need `dbench`, it simply showed once the OS was
>> fully loaded.  I did get some additional data points.
>>
>> Appears this requires an AMD IOMMUv2.  A test system with known
>> functioning AMD IOMMUv1 didn't display the issue at all.
>>
>> (XEN) AMD-Vi: IO_PAGE_FAULT: DDDD:bb:dd.f d0 addr fffffffdf8000000 flags 0x8 I
> 
> I would expect the address field to contain more information about the
> fault, but I'm not finding any information in the AMD-Vi specification
> apart from the fact that it contains the DVA, which makes no sense when the
> fault is caused by an interrupt.

Isn't the address above in the "magic" HT range (and hence still meaningful
as an address)?
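
A sketch of that check, under two stated assumptions: that the reserved HyperTransport window is 0xFD_0000_0000 to 0xFF_FFFF_FFFF (to my understanding, the range Linux's AMD IOMMU driver reserves), and that the logged value is a 40-bit address sign-extended to 64 bits, which its digits are consistent with. The helper names are illustrative only.

```python
HT_RANGE_START = 0xFD_0000_0000  # assumption: start of the HT reserved window
HT_RANGE_END = 0xFF_FFFF_FFFF    # assumption: inclusive end (top of 40 bits)


def low_bits(addr: int, width: int = 40) -> int:
    """Keep only the low `width` bits of an address."""
    return addr & ((1 << width) - 1)


def is_sign_extended(addr: int, width: int = 40) -> bool:
    """True if addr equals its low `width` bits sign-extended to 64 bits."""
    low = low_bits(addr, width)
    if low >> (width - 1):  # top bit of the narrow value is set
        low |= ((1 << 64) - 1) ^ ((1 << width) - 1)
    return low == addr


def in_ht_range(addr: int) -> bool:
    return HT_RANGE_START <= low_bits(addr) <= HT_RANGE_END


fault_addr = 0xfffffffdf8000000
print(hex(low_bits(fault_addr)))     # 0xfdf8000000
print(is_sign_extended(fault_addr))  # True
print(in_ht_range(fault_addr))       # True
```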

Jan



* Re: Serious AMD-Vi(?) issue
  2024-05-13 20:11                               ` Elliott Mitchell
@ 2024-05-14  8:22                                 ` Jan Beulich
  2024-05-14 20:51                                   ` Elliott Mitchell
  0 siblings, 1 reply; 37+ messages in thread
From: Jan Beulich @ 2024-05-14  8:22 UTC (permalink / raw)
  To: Elliott Mitchell
  Cc: xen-devel, Andrew Cooper, Wei Liu, Kelly Choi,
	Roger Pau Monné

On 13.05.2024 22:11, Elliott Mitchell wrote:
> On Mon, May 13, 2024 at 10:44:59AM +0200, Roger Pau Monné wrote:
>> Why do you mask the device SBDF in the above snippet?  I would really
>> like to understand what's so privacy relevant in a PCI SBDF number.
> 
> I doubt it reveals much.  Simply seems unlikely to help debugging and
> therefore I prefer to mask it.

SBDF in one place may be matchable against a memory address in another
place. _Any_ hiding of information is hindering analysis. Please can
you finally accept that it needs to be the person doing the analysis
to judge what is or is not relevant to them?

Jan



* Re: Serious AMD-Vi(?) issue
  2024-05-14  8:22                                 ` Jan Beulich
@ 2024-05-14 20:51                                   ` Elliott Mitchell
  2024-05-15 13:40                                     ` Kelly Choi
  0 siblings, 1 reply; 37+ messages in thread
From: Elliott Mitchell @ 2024-05-14 20:51 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel, Andrew Cooper, Wei Liu, Kelly Choi,
	Roger Pau Monné

On Tue, May 14, 2024 at 10:22:51AM +0200, Jan Beulich wrote:
> On 13.05.2024 22:11, Elliott Mitchell wrote:
> > On Mon, May 13, 2024 at 10:44:59AM +0200, Roger Pau Monné wrote:
> >> Why do you mask the device SBDF in the above snippet?  I would really
> >> like to understand what's so privacy relevant in a PCI SBDF number.
> > 
> > I doubt it reveals much.  Simply seems unlikely to help debugging and
> > therefore I prefer to mask it.
> 
> SBDF in one place may be matchable against a memory address in another
> place. _Any_ hiding of information is hindering analysis. Please can
> you finally accept that it needs to be the person doing the analysis
> to judge what is or is not relevant to them?

Not going to happen as I'd accepted this long ago.  The usual approach
is all developers have PGP keys (needed for security issues anyway) and
you don't require all logs to be public.

I've noticed the core of the Xen project appears centered in the EU.  Yet
you're not catering to data privacy at all?  Or is this a service
exclusively provided to people who prove they're EU citizens?


-- 
(\___(\___(\______          --=> 8-) EHM <=--          ______/)___/)___/)
 \BS (    |         ehem+sigmsg@m5p.com  PGP 87145445         |    )   /
  \_CS\   |  _____  -O #include <stddisclaimer.h> O-   _____  |   /  _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445





* Re: Serious AMD-Vi(?) issue
  2024-05-14 20:51                                   ` Elliott Mitchell
@ 2024-05-15 13:40                                     ` Kelly Choi
  2024-05-16  5:21                                       ` Elliott Mitchell
  0 siblings, 1 reply; 37+ messages in thread
From: Kelly Choi @ 2024-05-15 13:40 UTC (permalink / raw)
  To: Elliott Mitchell
  Cc: Jan Beulich, xen-devel, Andrew Cooper, Wei Liu,
	Roger Pau Monné


Hello Elliott,

Most of our developers are based in the EU timezone, however we are a
worldwide community.
The Xen Project is an open source community that everyone contributes to
and we do not divide how we provide help, based on location.

As explained previously, we are happy to help resolve issues and provide
advice where necessary. However, to do this, our developers need the
relevant information to provide accurate resolutions. Given that our
developers have repeatedly voiced their concerns, and are debugging this
out of interest, please help us by providing all the necessary information.

Until we have this information, it will be very difficult to help you
further. Should anything change, we would be glad to assist you.

Many thanks,
Kelly Choi

Community Manager
Xen Project


On Tue, May 14, 2024 at 9:51 PM Elliott Mitchell <ehem+xen@m5p.com> wrote:

> On Tue, May 14, 2024 at 10:22:51AM +0200, Jan Beulich wrote:
> > On 13.05.2024 22:11, Elliott Mitchell wrote:
> > > On Mon, May 13, 2024 at 10:44:59AM +0200, Roger Pau Monné wrote:
> > >> Why do you mask the device SBDF in the above snippet?  I would really
> > >> like to understand what's so privacy relevant in a PCI SBDF number.
> > >
> > > I doubt it reveals much.  Simply seems unlikely to help debugging and
> > > therefore I prefer to mask it.
> >
> > SBDF in one place may be matchable against a memory address in another
> > place. _Any_ hiding of information is hindering analysis. Please can
> > you finally accept that it needs to be the person doing the analysis
> > to judge what is or is not relevant to them?
>
> Not going to happen as I'd accepted this long ago.  The usual approach
> is all developers have PGP keys (needed for security issues anyway) and
> you don't require all logs to be public.
>
> I've noticed the core of the Xen project appears centered in the EU.  Yet
> you're not catering to data privacy at all?  Or is this a service
> exclusively provided to people who prove they're EU citizens?
>
>
> --
> (\___(\___(\______          --=> 8-) EHM <=--          ______/)___/)___/)
>  \BS (    |         ehem+sigmsg@m5p.com  PGP 87145445         |    )   /
>   \_CS\   |  _____  -O #include <stddisclaimer.h> O-   _____  |   /  _/
> 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
>
>
>



* Re: Serious AMD-Vi(?) issue
  2024-05-15 13:40                                     ` Kelly Choi
@ 2024-05-16  5:21                                       ` Elliott Mitchell
  0 siblings, 0 replies; 37+ messages in thread
From: Elliott Mitchell @ 2024-05-16  5:21 UTC (permalink / raw)
  To: Kelly Choi
  Cc: Jan Beulich, xen-devel, Andrew Cooper, Wei Liu,
	Roger Pau Monné

On Wed, May 15, 2024 at 02:40:31PM +0100, Kelly Choi wrote:
> 
> As explained previously, we are happy to help resolve issues and provide
> advice where necessary. However, to do this, our developers need the
> relevant information to provide accurate resolutions. Given that our
> developers have repeatedly voiced their concerns, and are debugging this
> out of interest, please help us by providing all the necessary information.
> 
> Until we have this information, it will be very difficult to help you
> further. Should anything change, we would be glad to assist you.

Usually private submission of logs (PGP) is acceptable.

Note, I am not claiming Xen's `dmesg` contains truly concerning
information.  The issue is there is enough data for problematic
information to unintentionally leak in.  Alternatively, no single piece
may be concerning on its own, yet together they could leak information.

Hopefully neither ACPI table addresses nor table order are affected by
the motherboard serial number, yet those could readily leak information.

So far this is acting like a major bug.  The paucity of reports is likely
due to few people using RAID1 with flash (most people relied on flash's
greater reliability even before the first large studies came out).

-- 
(\___(\___(\______          --=> 8-) EHM <=--          ______/)___/)___/)
 \BS (    |         ehem+sigmsg@m5p.com  PGP 87145445         |    )   /
  \_CS\   |  _____  -O #include <stddisclaimer.h> O-   _____  |   /  _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445


* Re: Serious AMD-Vi(?) issue
  2024-05-13  8:44                             ` Roger Pau Monné
  2024-05-13 20:11                               ` Elliott Mitchell
  2024-05-14  8:20                               ` Jan Beulich
@ 2024-06-28  0:18                               ` Elliott Mitchell
  2024-07-01 18:07                                 ` Elliott Mitchell
  2 siblings, 1 reply; 37+ messages in thread
From: Elliott Mitchell @ 2024-06-28  0:18 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Jan Beulich, xen-devel, Andrew Cooper, Wei Liu, Kelly Choi

I'm rather surprised it was so long before the next system restart.  
Seems a quiet period as far as security updates go.  Good news is I made
several new observations, but I don't know how valuable these are.

On Mon, May 13, 2024 at 10:44:59AM +0200, Roger Pau Monné wrote:
> 
> Does booting with `iommu=no-intremap` lead to any issues being
> reported?

On boot there were in fact fewer messages.  Notably, the "AMD-Vi"
messages haven't shown up at all.  I haven't stressed it very much yet,
but on previous boots a message showed up the moment the MD-RAID1 driver
was loaded.

I am though seeing two different messages now:

(XEN) CPU#: No irq handler for vector # (IRQ -#, LAPIC)
(XEN) IRQ# a=#[#,#] v=#[#] t=PCI-MSI s=#

These appear in pairs.  Multiple values show up for each field, though
each field seems to vary between only 2-3 different values.  There are
thousands of these messages showing up.

I'm unsure whether these can be attributed to the flash devices I had
been trying to use in RAID1 though.  I've got another device being
problematic with interrupts which might instead be the cause of those
messages (this one is lower urgency than the flash devices).

-- 
(\___(\___(\______          --=> 8-) EHM <=--          ______/)___/)___/)
 \BS (    |         ehem+sigmsg@m5p.com  PGP 87145445         |    )   /
  \_CS\   |  _____  -O #include <stddisclaimer.h> O-   _____  |   /  _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445


* Re: Serious AMD-Vi(?) issue
  2024-06-28  0:18                               ` Elliott Mitchell
@ 2024-07-01 18:07                                 ` Elliott Mitchell
  2024-07-04 22:08                                   ` Elliott Mitchell
  0 siblings, 1 reply; 37+ messages in thread
From: Elliott Mitchell @ 2024-07-01 18:07 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Jan Beulich, xen-devel, Andrew Cooper, Wei Liu, Kelly Choi

On Thu, Jun 27, 2024 at 05:18:15PM -0700, Elliott Mitchell wrote:
> I'm rather surprised it was so long before the next system restart.  
> Seems a quiet period as far as security updates go.  Good news is I made
> several new observations, but I don't know how valuable these are.
> 
> On Mon, May 13, 2024 at 10:44:59AM +0200, Roger Pau Monné wrote:
> > 
> > Does booting with `iommu=no-intremap` lead to any issues being
> > reported?
> 
> On boot there were in fact fewer messages.  Notably, the "AMD-Vi"
> messages haven't shown up at all.  I haven't stressed it very much yet,
> but on previous boots a message showed up the moment the MD-RAID1 driver
> was loaded.
> 
> 
> I am though seeing two different messages now:
> 
> (XEN) CPU#: No irq handler for vector # (IRQ -#, LAPIC)
> (XEN) IRQ# a=#[#,#] v=#[#] t=PCI-MSI s=#
> 
> These appear in pairs.  Multiple values show up for each field, though
> each field seems to vary between only 2-3 different values.  There are
> thousands of these messages showing up.

Some lucky timing so I've done some more experimentation and sampling.

The "(XEN) IRQ" line almost always shows up with the "(XEN) CPU" line.
I notice it is possible to generate the first without the second, so this
seems notable.  Every single "(XEN) CPU" line mentioned "LAPIC".

In the small number (20) of cases where "(XEN) IRQ" did not show up, the
"(XEN) CPU" line always ended with "(IRQ -2147483648, LAPIC)".

For the "t=" value out of 316 samples, 94 listed "PCI-MSI" while 222
listed "PCI-MSI/-X".

For the IRQ, 72 occurred 126 times.  71, 73 and 108 occurred roughly 50
times each. 109 and 111 occurred under 10 times.  Almost no other IRQ
values appeared.
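
The IRQ numbers tallied here and the negative numbers in the "(XEN) CPU" lines are probably related. Assuming Xen records a vector released from IRQ n as ~n in its per-CPU vector table, and a never-assigned vector as INT_MIN (my reading of Xen's interrupt code, to be checked against the source), ~INT_MIN is 2147483647, which can never be a valid IRQ. That would explain why the 20 reports ending in "(IRQ -2147483648, LAPIC)" had no "(XEN) IRQ" line. In the sketch, `nr_irqs` and the value -73 are hypothetical.

```python
from typing import Optional

INT_MIN = -(2 ** 31)  # C INT_MIN for a 32-bit int: -2147483648


def former_irq(logged: int, nr_irqs: int) -> Optional[int]:
    """Recover the IRQ behind the negative value in "(IRQ -N, LAPIC)".

    Assumption: the vector table stores ~irq once a vector is released
    and INT_MIN when it was never assigned.  Returns None when ~logged
    cannot be a valid IRQ number.
    """
    if logged >= 0:
        raise ValueError("expected a negative (released/unassigned) entry")
    irq = ~logged  # bitwise NOT undoes the ~irq encoding: ~(-73) == 72
    return irq if irq < nr_irqs else None


print(former_irq(-73, nr_irqs=1024))      # 72
print(~INT_MIN)                           # 2147483647
print(former_irq(INT_MIN, nr_irqs=1024))  # None
```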

The "s=" value was "00000030" slightly more often than "00000010".  No
other values have been observed so far.

The other values didn't show much of a pattern.

Most processors were mentioned roughly equally.  Several had fewer
mentions, but not enough to seem significant.  I discovered processor 1
did NOT show up.  Whereas processor 0 had an above average number of
occurrences.  This seems notable as these 2 processors are both reserved
exclusively for domain 0.
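
Tallies like these can also be produced with a short script instead of ad-hoc sort/uniq pipelines. A minimal sketch, assuming the unmasked lines follow the formats quoted above; the regular expressions and the sample lines are illustrative and made up, not real log output.

```python
import re
from collections import Counter

# Patterns for the two message kinds quoted above (assumed formats).
CPU_RE = re.compile(r"\(XEN\) CPU(\d+): No irq handler for vector (\w+) \(IRQ (-?\d+)")
IRQ_RE = re.compile(r"\(XEN\) IRQ(\d+) .*\bt=(\S+) s=(\w+)")


def tally(lines):
    """Count reports per CPU, and per IRQ, t= type and s= status."""
    cpus, irqs, types, states = Counter(), Counter(), Counter(), Counter()
    for line in lines:
        if m := CPU_RE.search(line):
            cpus[int(m.group(1))] += 1
        elif m := IRQ_RE.search(line):
            irqs[int(m.group(1))] += 1
            types[m.group(2)] += 1
            states[m.group(3)] += 1
    return cpus, irqs, types, states


def silent_cpus(cpus, n_cpus):
    """CPUs in 0..n_cpus-1 that never reported (like processor 1 here)."""
    return [c for c in range(n_cpus) if cpus[c] == 0]


# Synthetic sample lines, made up for illustration only.
sample = [
    "(XEN) CPU0: No irq handler for vector 5a (IRQ -73, LAPIC)",
    "(XEN) IRQ72 a=0001[0001,0000] v=5a[ff] t=PCI-MSI/-X s=00000030",
    "(XEN) CPU2: No irq handler for vector 61 (IRQ -2147483648, LAPIC)",
]
cpus, irqs, types, states = tally(sample)
print(irqs.most_common(), types.most_common(), states.most_common())
print(silent_cpus(cpus, n_cpus=4))  # [1, 3]
```

Feeding it the real `xl dmesg` output (one line per list element) would give the same per-field counts as the manual sampling.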

There have also been a few "spurious 8259A interrupt" lines.  So far
there haven't been very many of these.  The processor and IRQ listed
don't yet appear to show any patterns.  So far no IRQ has been listed
twice.

-- 
(\___(\___(\______          --=> 8-) EHM <=--          ______/)___/)___/)
 \BS (    |         ehem+sigmsg@m5p.com  PGP 87145445         |    )   /
  \_CS\   |  _____  -O #include <stddisclaimer.h> O-   _____  |   /  _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445


* Re: Serious AMD-Vi(?) issue
  2024-07-01 18:07                                 ` Elliott Mitchell
@ 2024-07-04 22:08                                   ` Elliott Mitchell
  2024-07-10 18:35                                     ` Elliott Mitchell
  0 siblings, 1 reply; 37+ messages in thread
From: Elliott Mitchell @ 2024-07-04 22:08 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Jan Beulich, xen-devel, Andrew Cooper, Wei Liu, Kelly Choi

On Mon, Jul 01, 2024 at 11:07:57AM -0700, Elliott Mitchell wrote:
> On Thu, Jun 27, 2024 at 05:18:15PM -0700, Elliott Mitchell wrote:
> > I'm rather surprised it was so long before the next system restart.  
> > Seems a quiet period as far as security updates go.  Good news is I made
> > several new observations, but I don't know how valuable these are.
> > 
> > On Mon, May 13, 2024 at 10:44:59AM +0200, Roger Pau Monné wrote:
> > > 
> > > Does booting with `iommu=no-intremap` lead to any issues being
> > > reported?
> > 
> > On boot there were in fact fewer messages.  Notably, the "AMD-Vi"
> > messages haven't shown up at all.  I haven't stressed it very much yet,
> > but on previous boots a message showed up the moment the MD-RAID1 driver
> > was loaded.
> > 
> > 
> > I am though seeing two different messages now:
> > 
> > (XEN) CPU#: No irq handler for vector # (IRQ -#, LAPIC)
> > (XEN) IRQ# a=#[#,#] v=#[#] t=PCI-MSI s=#
> > 
> > These appear in pairs.  Multiple values show up for each field, though
> > each field seems to vary between only 2-3 different values.  There are
> > thousands of these messages showing up.
> 
> Some lucky timing so I've done some more experimentation and sampling.
> 
> The "(XEN) IRQ" line almost always shows up with the "(XEN) CPU" line.
> I notice it is possible to generate the first without the second, so this
> seems notable.  Every single "(XEN) CPU" line mentioned "LAPIC".
> 
> In the small number (20) of cases where "(XEN) IRQ" did not show up, the
> "(XEN) CPU" line always ended with "(IRQ -2147483648, LAPIC)".
> 
> For the "t=" value out of 316 samples, 94 listed "PCI-MSI" while 222
> listed "PCI-MSI/-X".
> 
> For the IRQ, 72 occurred 126 times.  71, 73 and 108 occurred roughly 50
> times each. 109 and 111 occurred under 10 times.  Almost no other IRQ
> values appeared.
> 
> The "s=" value was "00000030" slightly more often than "00000010".  No
> other values have been observed so far.
> 
> The other values didn't show much of a pattern.
> 
> Most processors were mentioned roughly equally.  Several had fewer
> mentions, but not enough to seem significant.  I discovered processor 1
> did NOT show up.  Whereas processor 0 had an above average number of
> occurrences.  This seems notable as these 2 processors are both reserved
> exclusively for domain 0.

All of the patterns continue.  There are more reports on processor 0 than
any other processor, but not enough to look particularly suspicious.
What *does* look suspicious is the complete absence of reports from
processor 1.

> There have also been a few "spurious 8259A interrupt" lines.  So far
> there haven't been very many of these.  The processor and IRQ listed
> don't yet appear to show any patterns.  So far no IRQ has been listed
> twice.

IRQs 3-7 and 9-15 have each shown up once.  1-2 and 8 haven't shown up
so far.


Things look different enough to try reenabling Linux software RAID1.  I'm
going to continue monitoring closely, but so far it seems
"iommu=no-intremap" may in fact mitigate the issue with software RAID1.

This seems odd, but I'm simply reporting what I observe.  I would have
expected to see problem indications by now, yet there aren't any.


-- 
(\___(\___(\______          --=> 8-) EHM <=--          ______/)___/)___/)
 \BS (    |         ehem+sigmsg@m5p.com  PGP 87145445         |    )   /
  \_CS\   |  _____  -O #include <stddisclaimer.h> O-   _____  |   /  _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445





* Re: Serious AMD-Vi(?) issue
  2024-07-04 22:08                                   ` Elliott Mitchell
@ 2024-07-10 18:35                                     ` Elliott Mitchell
  0 siblings, 0 replies; 37+ messages in thread
From: Elliott Mitchell @ 2024-07-10 18:35 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Jan Beulich, xen-devel, Andrew Cooper, Wei Liu, Kelly Choi

On Thu, Jul 04, 2024 at 03:08:00PM -0700, Elliott Mitchell wrote:
> On Mon, Jul 01, 2024 at 11:07:57AM -0700, Elliott Mitchell wrote:
> > On Thu, Jun 27, 2024 at 05:18:15PM -0700, Elliott Mitchell wrote:
> > 
> > Most processors were mentioned roughly equally.  Several had fewer
> > mentions, but not enough to seem significant.  I discovered processor 1
> > did NOT show up.  Whereas processor 0 had an above average number of
> > occurrences.  This seems notable as these 2 processors are both reserved
> > exclusively for domain 0.
> 
> All of the patterns continue.  There are more reports on processor 0 than
> any other processor, but not enough to look particularly suspicious.
> What *does* look suspicious is the complete absence of reports from
> processor 1.

A bit more work with sort/uniq shows more of a pattern.
Odd-numbered processors (1,3,5) are seeing fewer reports, with CPU1 being
an outlier for having none.  Even-numbered processors (0,2,4) are seeing
more reports, with CPU0 displaying the most of any processor.  There is
also a pattern of lower-numbered processors seeing more of the reports
and higher-numbered ones seeing fewer (CPU1 being an outlier).

If my reading of `xl dmesg` is correct, then the lower-numbered
processors are on the first die and the higher-numbered processors are
on the second die.  My guess is that 0 and 1 are the first conjoined
pair, which share more of their silicon with each other.
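The sort/uniq counting described above can be sketched as follows, assuming the "(XEN) CPU#: No irq handler for vector # (IRQ -#, LAPIC)" lines carry a decimal CPU number after "CPU".  The helper names are hypothetical; the parity split mirrors the even/odd pattern described above, and `silent_cpus` picks out processors such as CPU1 that never report.

```python
import re
from collections import Counter

# Assumed shape, with illustrative numbers in place of the "#" placeholders:
#   (XEN) CPU3: No irq handler for vector 2a (IRQ -2147483648, LAPIC)
NO_HANDLER = re.compile(r"\(XEN\)\s+CPU(\d+): No irq handler for vector")


def tally_by_cpu(lines):
    """Count "No irq handler" reports per CPU number."""
    counts = Counter()
    for line in lines:
        match = NO_HANDLER.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


def split_by_parity(counts):
    """Return (even-numbered total, odd-numbered total) of the per-CPU counts."""
    even = sum(n for cpu, n in counts.items() if cpu % 2 == 0)
    odd = sum(n for cpu, n in counts.items() if cpu % 2 == 1)
    return even, odd


def silent_cpus(counts, ncpus):
    """CPUs in range(ncpus) with no reports at all."""
    return [cpu for cpu in range(ncpus) if counts[cpu] == 0]
```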

> > There have also been a few "spurious 8259A interrupt" lines.  So far
> > there haven't been very many of these.  The processor and IRQ listed
> > don't yet appear to show any patterns.  So far no IRQ has been listed
> > twice.
> 
> IRQs 3-7 and 9-15 have each shown up once.  1-2 and 8 haven't shown up
> so far.

#8 has now shown up, so 8259A interrupts 3-15 have now all shown up
*once*.  0-2 haven't shown up at all.

Certain MSI IRQs are showing up.  The complete list (IRQ, number of reports) is:

IRQ70	2
IRQ71	82
IRQ72	368
IRQ73	81
IRQ90	22
IRQ107	27
IRQ108	92
IRQ109	23
IRQ111	29
IRQ117	1

I'm unsure whether this comparison is valid, but looking at /proc/interrupts,
all of these are associated with Xen according to domain 0: 68-91 are
all listed as "xen-percpu", and 105-120 are listed as "xen-dyn-lateeoi".
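As a worked check, the counts in the table can be summed per /proc/interrupts range.  This assumes, as the paragraph itself hedges, that the IRQ numbers in `xl dmesg` are comparable with domain 0's numbering, which may not hold; `group_reports` is a hypothetical helper.

```python
# Per-IRQ report counts, copied from the table above.
REPORTS = {70: 2, 71: 82, 72: 368, 73: 81, 90: 22,
           107: 27, 108: 92, 109: 23, 111: 29, 117: 1}

# Ranges as domain 0's /proc/interrupts labels them on this machine.
# Assumption: these numbers line up with the IRQs reported by Xen.
RANGES = {
    "xen-percpu": range(68, 92),         # 68-91 inclusive
    "xen-dyn-lateeoi": range(105, 121),  # 105-120 inclusive
}


def group_reports(reports, ranges):
    """Sum report counts per named IRQ range; anything else is 'unclassified'."""
    totals = {name: 0 for name in ranges}
    totals["unclassified"] = 0
    for irq, count in reports.items():
        for name, irq_range in ranges.items():
            if irq in irq_range:
                totals[name] += count
                break
        else:  # no range matched
            totals["unclassified"] += count
    return totals


if __name__ == "__main__":
    print(group_reports(REPORTS, RANGES))
```

With the figures above this gives 555 reports in the "xen-percpu" range, 172 in the "xen-dyn-lateeoi" range, and none outside either.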

*IF* I am understanding this correctly, this *might* be the same problem
as https://lists.xenproject.org/archives/html/xen-devel/2024-07/msg00454.html
Domain 0 is reporting plenty of spurious events.

I'm starting to wonder whether this isn't a Linux software RAID1 on AMD
hardware issue at all, but a more general issue near the core of Xen's
interrupt handling, with AMD hardware simply getting hit harder.

> Things look different enough to try reenabling Linux software RAID1.  I'm
> going to continue monitoring closely, but so far it seems
> "iommu=no-intremap" may in fact mitigate the issue with software RAID1.

At this point I've monitored for problems and not found any for long
enough to declare this a tentative mitigation.



* Re: Serious AMD-Vi issue
  2024-01-25 20:24 Serious AMD-Vi issue Elliott Mitchell
  2024-02-12 23:23 ` Elliott Mitchell
  2024-03-04 23:55 ` AMD-Vi issue Andrew Cooper
@ 2025-01-24 14:31 ` Roger Pau Monné
  2025-01-24 21:26   ` Elliott Mitchell
  2 siblings, 1 reply; 37+ messages in thread
From: Roger Pau Monné @ 2025-01-24 14:31 UTC (permalink / raw)
  To: Elliott Mitchell; +Cc: xen-devel, Jan Beulich, Andrew Cooper

On Thu, Jan 25, 2024 at 12:24:53PM -0800, Elliott Mitchell wrote:
> Apparently this was first noticed with 4.14, but more recently I've been
> able to reproduce the issue:
> 
> https://bugs.debian.org/988477
> 
> The original observation features MD-RAID1 using a pair of Samsung
> SATA-attached flash devices.  The main line shows up in `xl dmesg`:
> 
> (XEN) AMD-Vi: IO_PAGE_FAULT: DDDD:bb:dd.f d0 addr ffffff???????000 flags 0x8 I

I think I've figured out the cause for those faults, and posted a fix
here:

https://lore.kernel.org/xen-devel/20250124120112.56678-1-roger.pau@citrix.com/

Fix is patch 5/5, but you likely want to take them all to avoid
context conflicts.

Can you give it a try and see if it fixes the fault messages, plus
your issues with the disk devices?

Regards, Roger.



* Re: Serious AMD-Vi issue
  2025-01-24 14:31 ` Serious " Roger Pau Monné
@ 2025-01-24 21:26   ` Elliott Mitchell
  2025-01-26  0:24     ` Teddy Astie
  2025-01-27  9:44     ` Roger Pau Monné
  0 siblings, 2 replies; 37+ messages in thread
From: Elliott Mitchell @ 2025-01-24 21:26 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: xen-devel, Jan Beulich, Andrew Cooper

On Fri, Jan 24, 2025 at 03:31:30PM +0100, Roger Pau Monné wrote:
> On Thu, Jan 25, 2024 at 12:24:53PM -0800, Elliott Mitchell wrote:
> > Apparently this was first noticed with 4.14, but more recently I've been
> > able to reproduce the issue:
> > 
> > https://bugs.debian.org/988477
> > 
> > The original observation features MD-RAID1 using a pair of Samsung
> > SATA-attached flash devices.  The main line shows up in `xl dmesg`:
> > 
> > (XEN) AMD-Vi: IO_PAGE_FAULT: DDDD:bb:dd.f d0 addr ffffff???????000 flags 0x8 I
> 
> I think I've figured out the cause for those faults, and posted a fix
> here:
> 
> https://lore.kernel.org/xen-devel/20250124120112.56678-1-roger.pau@citrix.com/
> 
> Fix is patch 5/5, but you likely want to take them all to avoid
> context conflicts.

I haven't tested it yet, but here is some analysis from looking at the series.

This seems a plausible explanation for the interrupt IOMMU messages.  As
such I think there is a good chance the reported messages will disappear.

Nothing in here looks plausible for solving the real problem, that of
RAID1 mirrors diverging (almost certainly getting zeroes during DMA, but
there is a chance stale data is being read).

Worse, since it removes the observed messages, the next person will
almost certainly have severe data loss by the time they realize there is
a problem.  Notably, those messages led me to Debian #988477, so I was
able to take action before things got too bad.
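To tell apart the two failure modes named above (zeroes versus stale data), the divergent blocks of the two mirror halves can be classified.  A minimal sketch, assuming `path_a` and `path_b` are hypothetical, already-aligned images of each member's data area (MD's superblock and data offset are not handled), compared at an assumed 4 KiB granularity.  For a plain yes/no answer, Linux MD's own "check" action (written to /sys/block/mdX/md/sync_action, result in mismatch_cnt) is the standard tool; this only adds a per-block classification.

```python
BLOCK = 4096  # comparison granularity in bytes (assumption, not MD's chunk size)


def classify(a, b):
    """Describe how two differing blocks differ."""
    if len(a) != len(b):
        return "length"   # one image is shorter than the other
    if not any(a):
        return "a_zero"   # side A reads back as all zeroes
    if not any(b):
        return "b_zero"   # side B reads back as all zeroes
    return "other"        # both non-zero but different, e.g. stale data


def diff_mirrors(path_a, path_b, block=BLOCK):
    """Yield (block_index, kind) for every block where the two images differ."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        index = 0
        while True:
            a = fa.read(block)
            b = fb.read(block)
            if not a and not b:
                break
            if a != b:
                yield index, classify(a, b)
            index += 1
```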

I'm not absolutely certain this is a pure Xen bug.  There is a
possibility the RAID1 driver is reusing DMA buffers in a fashion which
violates the DMA interface.  Yet there is also a good chance Xen isn't
implementing its layer properly either.

There is one pattern emerging at this point.  Samsung hardware is badly
affected, while other vendors are either unaffected or mildly affected.
Notably, the estimated age of the devices meant to be handed off to
someone able to diagnose the issue is >10 years.  The Crucial/Micron
SATA device *should* drastically outperform these, yet it is
unaffected.  The Crucial/Micron NVMe is very mildly affected, yet
should be more than an order of magnitude faster.

The simplest explanation is the flash controller on the Samsung devices
is lower latency than the one used by Micron.

Both present reproductions feature AMD processors and ASUS motherboards.
I'm doubtful of this being an ASUS issue.  This seems more likely a case
of people who use RAID with flash tending to go with a motherboard vendor
who reliably supports ECC on all their motherboards.

I don't know whether this is confined to AMD processors, or not.  The
small number of reproductions suggests few people are doing RAID with
flash storage.  In which case no one may have tried RAID1 with flash on
Intel processors.  On Intel hardware the referenced message would be
absent and people might think their problem was distinct from Debian
#988477.

In fact what seems a likely reproduction on Intel hardware is the Intel
sound card issue.  I notice that issue occurs when sound *starts*
playing.  When a sound device starts, its buffers would be empty and the
first DMA request would be turned around with minimal latency.  In that
case, this matches the Samsung SATA devices handling DMA with low
latency.

> Can you give it a try and see if it fixes the fault messages, plus
> your issues with the disk devices?

Ick.  I was hoping to avoid reinstalling the known problematic devices
and simply send them to someone better set up for analyzing x86 problems.

Looking at the series, it seems likely to remove the fault messages and
turn this into silent data loss.  I doubt any AMD processors have an
IOMMU, yet omit cmpxchg16b (older system lacked full IOMMU, yet did have
cmpxchg16b, newer system has both).  Even guests have cmpxchg16b
available.

If you really want this tested, it will be a while before the next
potential downtime window.

Come to think of it, I wonder whether this might fix a particular device
which was having an interrupt problem.  Problem there being it was being
uncooperative with motherboard firmware...



* Re: Serious AMD-Vi issue
  2025-01-24 21:26   ` Elliott Mitchell
@ 2025-01-26  0:24     ` Teddy Astie
  2025-01-27  9:44     ` Roger Pau Monné
  1 sibling, 0 replies; 37+ messages in thread
From: Teddy Astie @ 2025-01-26  0:24 UTC (permalink / raw)
  To: Elliott Mitchell, Roger Pau Monné
  Cc: xen-devel, Jan Beulich, Andrew Cooper

Hello Elliott,

On 24/01/2025 at 22:31, Elliott Mitchell wrote:
> On Fri, Jan 24, 2025 at 03:31:30PM +0100, Roger Pau Monné wrote:
>> On Thu, Jan 25, 2024 at 12:24:53PM -0800, Elliott Mitchell wrote:
>>> Apparently this was first noticed with 4.14, but more recently I've been
>>> able to reproduce the issue:
>>>
>>> https://bugs.debian.org/988477
>>>
>>> The original observation features MD-RAID1 using a pair of Samsung
>>> SATA-attached flash devices.  The main line shows up in `xl dmesg`:
>>>
>>> (XEN) AMD-Vi: IO_PAGE_FAULT: DDDD:bb:dd.f d0 addr ffffff???????000 flags 0x8 I
>>
>> I think I've figured out the cause for those faults, and posted a fix
>> here:
>>
>> https://lore.kernel.org/xen-devel/20250124120112.56678-1-roger.pau@citrix.com/
>>
>> Fix is patch 5/5, but you likely want to take them all to avoid
>> context conflicts.
>
> I haven't tested it yet, but here is some analysis from looking at the series.
>
> This seems a plausible explanation for the interrupt IOMMU messages.  As
> such I think there is a good chance the reported messages will disappear.
>
> Nothing in here looks plausible for solving the real problem, that of
> RAID1 mirrors diverging (almost certainly getting zeroes during DMA, but
> there is a chance stale data is being read).
>

The message shows that something is going wrong, presumably a
lost interrupt. This can lead to data loss, as it breaks the
expectations of Dom0's drivers.

If you still observe data loss after these patches, and these messages
have disappeared, it may be due to something else, but these patches are
not intended to hide the fault.

According to AMD-Vi specification, there appears to be a specific case
where interrupt remapping faults are reported as IO_PAGE_FAULT (which
appears to be what's happening).

The IG bit (bit 133) of the DTE appears to provide an explanation
(SupIOPF can set this behavior globally).

 > IG: ignore unmapped interrupts. 1=Suppress event logging for interrupt
 > messages causing IO_PAGE_FAULT events. 0=creation of event log entries
 > for IO_PAGE_FAULT events is controlled by SupIOPF in the interrupt
 > remapping table entry (see Section 2.2.5 [Interrupt Remapping
 > Tables]).

Note that Xen sets this bit to 0 (and this patch doesn't change that),
which means that faults are reported as IO_PAGE_FAULT events.
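The quoted rule can be written down as a small predicate.  This is a sketch, not Xen code: it assumes the 256-bit DTE is held as four 64-bit words with bit 0 in the lowest word, and that SupIOPF=1 means "suppress" in line with its name; `dte_bit` and `io_page_fault_logged` are hypothetical helpers.

```python
IG_BIT = 133  # DTE bit position of IG, as quoted from the AMD-Vi specification


def dte_bit(dte_qwords, bit):
    """Return one bit of a 256-bit DTE stored as four 64-bit words, lowest first."""
    word, offset = divmod(bit, 64)  # bit 133 -> word 2, offset 5
    return (dte_qwords[word] >> offset) & 1


def io_page_fault_logged(dte_qwords, irte_sup_iopf):
    """Whether an interrupt causing IO_PAGE_FAULT creates an event log entry.

    IG=1 suppresses logging outright; with IG=0 the decision is left to the
    SupIOPF bit of the interrupt remapping table entry (assumed: 1 = suppress).
    """
    if dte_bit(dte_qwords, IG_BIT):
        return False
    return not irte_sup_iopf
```

With IG left at 0, as described above for Xen, `io_page_fault_logged([0, 0, 0, 0], 0)` returns True, consistent with the faults showing up in `xl dmesg`.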

> Worse, since it removes the observed messages, the next person will
> almost certainly have severe data loss by the time they realize there is
> a problem.  Notably, those messages led me to Debian #988477, so I was
> able to take action before things got too bad.
>
>
>
> I'm not absolutely certain this is a pure Xen bug.  There is a
> possibility the RAID1 driver is reusing DMA buffers in a fashion which
> violates the DMA interface.  Yet there is also a good chance Xen isn't
> implementing its layer properly either.
>
>
>
> There is one pattern emerging at this point.  Samsung hardware is badly
> affected, while other vendors are either unaffected or mildly affected.
> Notably, the estimated age of the devices meant to be handed off to
> someone able to diagnose the issue is >10 years.  The Crucial/Micron
> SATA device *should* drastically outperform these, yet it is
> unaffected.  The Crucial/Micron NVMe is very mildly affected, yet
> should be more than an order of magnitude faster.
>
> The simplest explanation is the flash controller on the Samsung devices
> is lower latency than the one used by Micron.
>
>
> Both present reproductions feature AMD processors and ASUS motherboards.
> I'm doubtful of this being an ASUS issue.  This seems more likely a case
> of people who use RAID with flash tending to go with a motherboard vendor
> who reliably supports ECC on all their motherboards.
>
> I don't know whether this is confined to AMD processors, or not.  The
> small number of reproductions suggests few people are doing RAID with
> flash storage.  In which case no one may have tried RAID1 with flash on
> Intel processors.  On Intel hardware the referenced message would be
> absent and people might think their problem was distinct from Debian
> #988477.
>
> In fact what seems a likely reproduction on Intel hardware is the Intel
> sound card issue.  I notice that issue occurs when sound *starts*
> playing.  When a sound device starts, its buffers would be empty and the
> first DMA request would be turned around with minimal latency.  In that
> case, this matches the Samsung SATA devices handling DMA with low
> latency.
>
>
>> Can you give it a try and see if it fixes the fault messages, plus
>> your issues with the disk devices?
>
> Ick.  I was hoping to avoid reinstalling the known problematic devices
> and simply send them to someone better set up for analyzing x86 problems.
>
> Looking at the series, it seems likely to remove the fault messages and
> turn this into silent data loss.  I doubt any AMD processors have an
> IOMMU, yet omit cmpxchg16b (older system lacked full IOMMU, yet did have
> cmpxchg16b, newer system has both).  Even guests have cmpxchg16b
> available.
>
> If you really want this tested, it will be a while before the next
> potential downtime window.
>
> Come to think of it, I wonder whether this might fix a particular device
> which was having an interrupt problem.  Problem there being it was being
> uncooperative with motherboard firmware...
>
>

As it seems to be a specific corner case, I would not be surprised if
it only shows up on very specific hardware setups.

Teddy


Teddy Astie | Vates XCP-ng Developer

XCP-ng & Xen Orchestra - Vates solutions

web: https://vates.tech




* Re: Serious AMD-Vi issue
  2025-01-24 21:26   ` Elliott Mitchell
  2025-01-26  0:24     ` Teddy Astie
@ 2025-01-27  9:44     ` Roger Pau Monné
  2025-02-18  4:05       ` Elliott Mitchell
  2025-04-13 22:08       ` Elliott Mitchell
  1 sibling, 2 replies; 37+ messages in thread
From: Roger Pau Monné @ 2025-01-27  9:44 UTC (permalink / raw)
  To: Elliott Mitchell; +Cc: xen-devel, Jan Beulich, Andrew Cooper

On Fri, Jan 24, 2025 at 01:26:23PM -0800, Elliott Mitchell wrote:
> On Fri, Jan 24, 2025 at 03:31:30PM +0100, Roger Pau Monné wrote:
> > On Thu, Jan 25, 2024 at 12:24:53PM -0800, Elliott Mitchell wrote:
> > > Apparently this was first noticed with 4.14, but more recently I've been
> > > able to reproduce the issue:
> > > 
> > > https://bugs.debian.org/988477
> > > 
> > > The original observation features MD-RAID1 using a pair of Samsung
> > > SATA-attached flash devices.  The main line shows up in `xl dmesg`:
> > > 
> > > (XEN) AMD-Vi: IO_PAGE_FAULT: DDDD:bb:dd.f d0 addr ffffff???????000 flags 0x8 I
> > 
> > I think I've figured out the cause for those faults, and posted a fix
> > here:
> > 
> > https://lore.kernel.org/xen-devel/20250124120112.56678-1-roger.pau@citrix.com/
> > 
> > Fix is patch 5/5, but you likely want to take them all to avoid
> > context conflicts.
> 
> I haven't tested it yet, but here is some analysis from looking at the series.
> 
> This seems a plausible explanation for the interrupt IOMMU messages.  As
> such I think there is a good chance the reported messages will disappear.
> 
> Nothing in here looks plausible for solving the real problem, that of
> RAID1 mirrors diverging (almost certainly getting zeroes during DMA, but
> there is a chance stale data is being read).
> 
> Worse, since it removes the observed messages, the next person will
> almost certainly have severe data loss by the time they realize there is
> a problem.  Notably, those messages led me to Debian #988477, so I was
> able to take action before things got too bad.

I think this is the first time I've had complaints from the reporter of
a bug after attempting to fix it.

Maybe my original message wasn't clear enough.  So far I consider the
IOMMU faults and the disk issues to be different bugs, hence my asking
specifically whether the posted series makes any difference to either of
those issues.

I would be surprised if it also fixed the data loss issue, but wanted
to ask regardless.

> 
> 
> I'm not absolutely certain this is a pure Xen bug.  There is a
> possibility the RAID1 driver is reusing DMA buffers in a fashion which
> violates the DMA interface.  Yet there is also a good chance Xen isn't
> implementing its layer properly either.
> 
> 
> 
> There is one pattern emerging at this point.  Samsung hardware is badly
> affected, while other vendors are either unaffected or mildly affected.
> Notably, the estimated age of the devices meant to be handed off to
> someone able to diagnose the issue is >10 years.  The Crucial/Micron
> SATA device *should* drastically outperform these, yet it is
> unaffected.  The Crucial/Micron NVMe is very mildly affected, yet
> should be more than an order of magnitude faster.
> 
> The simplest explanation is the flash controller on the Samsung devices
> is lower latency than the one used by Micron.
> 
> 
> Both present reproductions feature AMD processors and ASUS motherboards.
> I'm doubtful of this being an ASUS issue.  This seems more likely a case
> of people who use RAID with flash tending to go with a motherboard vendor
> who reliably supports ECC on all their motherboards.
> 
> I don't know whether this is confined to AMD processors, or not.  The
> small number of reproductions suggests few people are doing RAID with
> flash storage.  In which case no one may have tried RAID1 with flash on
> Intel processors.  On Intel hardware the referenced message would be
> absent and people might think their problem was distinct from Debian
> #988477.

As said above - my current hypothesis is that the IOMMU fault message
is just a red herring, and has nothing to do with the underlying data
loss issue that you are seeing.

I expect there will be no similar IOMMU fault message on Intel
hardware, as updating of interrupt remapping entries was already done
atomically on VT-d.
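To illustrate why atomicity matters here: if a 128-bit remapping entry is rewritten as two separate 64-bit stores, the IOMMU can observe a half-old, half-new entry between them, whereas a single 128-bit store (which cmpxchg16b makes possible) exposes no such state.  The model below is a simplification under that assumption; it is not Xen code, does not reflect the real IRTE field layout, and which half is written first is arbitrary.

```python
def non_atomic_update(old, new):
    """Two 64-bit stores; return every (low, high) state an observer could see."""
    states = [old]
    states.append((new[0], old[1]))  # after the first store: low half only
    states.append(new)               # after the second store: both halves
    return states


def atomic_update(old, new):
    """One 128-bit store: the entry goes straight from old to new."""
    return [old, new]


def torn_states(states, old, new):
    """States that are neither the old nor the new entry."""
    return [s for s in states if s != old and s != new]
```

For an update from (0xA, 0xB) to (0xC, 0xD), the two-store path exposes the mixed entry (0xC, 0xB), while the single-store path exposes nothing in between.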

> In fact what seems a likely reproduction on Intel hardware is the Intel
> sound card issue.  I notice that issue occurs when sound *starts*
> playing.  When a sound device starts, its buffers would be empty and the
> first DMA request would be turned around with minimal latency.  In that
> case, this matches the Samsung SATA devices handling DMA with low
> latency.

Can you reproduce the data loss issue without using RAID in Linux?
You can use fio with verify or similar to stress test it.
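The idea behind fio's verify mode is to write blocks whose contents can be derived from their position and check them on readback.  A rough standard-library sketch of that idea, with the block size and function names assumed for illustration.  It goes through the page cache, so unlike fio with direct I/O it may not exercise the device's DMA path on readback unless caches are dropped first.

```python
import hashlib
import os

BLOCK = 4096  # bytes per verified block (an assumption)


def pattern(index, block=BLOCK):
    """Deterministic, position-unique contents for block number `index`."""
    seed = hashlib.sha256(index.to_bytes(8, "little")).digest()  # 32 bytes
    return (seed * (block // len(seed) + 1))[:block]


def write_pattern(path, nblocks, block=BLOCK):
    """Write nblocks patterned blocks and force them out to storage."""
    with open(path, "wb") as f:
        for i in range(nblocks):
            f.write(pattern(i, block))
        f.flush()
        os.fsync(f.fileno())


def verify_pattern(path, nblocks, block=BLOCK):
    """Return (index, kind) for each block that does not read back intact."""
    bad = []
    with open(path, "rb") as f:
        for i in range(nblocks):
            data = f.read(block)
            if data == pattern(i, block):
                continue
            # All-zero readback versus any other wrong or missing contents.
            kind = "zero" if data and not any(data) else "mismatch"
            bad.append((i, kind))
    return bad
```

Because each block's contents depend on its index, a block that reads back data belonging elsewhere (stale or misdirected) is reported as "mismatch" rather than passing.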

Can you reproduce if dom0 is PVH instead of PV?

Can you reproduce with dom0-iommu=strict mode in the Xen command line?

> 
> 
> > Can you give it a try and see if it fixes the fault messages, plus
> > your issues with the disk devices?
> 
> Ick.  I was hoping to avoid reinstalling the known problematic devices
> and simply send them to someone better set up for analyzing x86 problems.
> 
> Looking at the series, it seems likely to remove the fault messages and
> turn this into silent data loss.  I doubt any AMD processors have an
> IOMMU, yet omit cmpxchg16b (older system lacked full IOMMU, yet did have
> cmpxchg16b, newer system has both).  Even guests have cmpxchg16b
> available.

Silent data loss?  Data loss might or might not be there, regardless of
whether IOMMU faults are being reported.  IMO it's unhelpful to make this
kind of comment, as you seem to suggest a preference for leaving the
IOMMU fault bug unfixed, which I'm sure is not the case.

> If you really want this tested, it will be a while before the next
> potential downtime window.

No worries, I already have confirmation that someone else who was
seeing the same IOMMU faults has tested the fix.  I was mostly
wondering whether it would affect your data loss issues in any way, as
for that I have no one else who can reproduce it.

Thanks, Roger.



* Re: Serious AMD-Vi issue
  2025-01-27  9:44     ` Roger Pau Monné
@ 2025-02-18  4:05       ` Elliott Mitchell
  2025-04-13 22:08       ` Elliott Mitchell
  1 sibling, 0 replies; 37+ messages in thread
From: Elliott Mitchell @ 2025-02-18  4:05 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: xen-devel, Jan Beulich, Andrew Cooper

On Mon, Jan 27, 2025 at 10:44:33AM +0100, Roger Pau Monné wrote:
> On Fri, Jan 24, 2025 at 01:26:23PM -0800, Elliott Mitchell wrote:
> > On Fri, Jan 24, 2025 at 03:31:30PM +0100, Roger Pau Monné wrote:
> > > 
> > > I think I've figured out the cause for those faults, and posted a fix
> > > here:
> > > 
> > > https://lore.kernel.org/xen-devel/20250124120112.56678-1-roger.pau@citrix.com/
> > > 
> > > Fix is patch 5/5, but you likely want to take them all to avoid
> > > context conflicts.
> > 
> > I haven't tested it yet, but here is some analysis from looking at the series.
> > 
> > This seems a plausible explanation for the interrupt IOMMU messages.  As
> > such I think there is a good chance the reported messages will disappear.
> > 
> > Nothing in here looks plausible for solving the real problem, that of
> > RAID1 mirrors diverging (almost certainly getting zeroes during DMA, but
> > there is a chance stale data is being read).
> > 
> > Worse, since it removes the observed messages, the next person will
> > almost certainly have severe data loss by the time they realize there is
> > a problem.  Notably, those messages led me to Debian #988477, so I was
> > able to take action before things got too bad.
> 
> I think this is the first time I've had complaints from the reporter of
> a bug after attempting to fix it.
> 
> Maybe my original message wasn't clear enough.  So far I consider the
> IOMMU faults and the disk issues to be different bugs, hence my asking
> specifically whether the posted series makes any difference to either of
> those issues.
> 
> I would be surprised if it also fixed the data loss issue, but wanted
> to ask regardless.

I could readily believe there are two issues here: an interrupt issue
causing the messages, plus a distinct bug causing IOMMU issues.  The one
catch is that, because of the correlation, the interrupt messages have
served as a rendezvous point for reporters.

The problem is that, if so, the IOMMU issue is the *much* more severe
one.  Fixing the interrupt issue is nice, but the interrupt issue isn't
what causes the data loss.

> > There is one pattern emerging at this point.  Samsung hardware is badly
> > affected, while other vendors are either unaffected or mildly affected.
> > Notably, the estimated age of the devices meant to be handed off to
> > someone able to diagnose the issue is >10 years.  The Crucial/Micron
> > SATA device *should* drastically outperform these, yet it is
> > unaffected.  The Crucial/Micron NVMe is very mildly affected, yet
> > should be more than an order of magnitude faster.
> > 
> > The simplest explanation is the flash controller on the Samsung devices
> > is lower latency than the one used by Micron.
> > 
> > 
> > Both present reproductions feature AMD processors and ASUS motherboards.
> > I'm doubtful of this being an ASUS issue.  This seems more likely a case
> > of people who use RAID with flash tending to go with a motherboard vendor
> > who reliably supports ECC on all their motherboards.
> > 
> > I don't know whether this is confined to AMD processors, or not.  The
> > small number of reproductions suggests few people are doing RAID with
> > flash storage.  In which case no one may have tried RAID1 with flash on
> > Intel processors.  On Intel hardware the referenced message would be
> > absent and people might think their problem was distinct from Debian
> > #988477.
> 
> As said above - my current hypothesis is that the IOMMU fault message
> is just a red herring, and has nothing to do with the underlying data
> loss issue that you are seeing.
> 
> I expect there will be no similar IOMMU fault message on Intel
> hardware, as updating of interrupt remapping entries was already done
> atomically on VT-d.

Entirely possible.  Within the past 24 hours I noticed the message:

Message-ID: <1050214476.1105853.1739823581696.JavaMail.zimbra@cert.pl>
Subject: Memory corruption bug with Xen PV Dom0 and BOSS-S1 RAID card

There is a fair bit of similarity, but some distinct differences there.
This may or may not be the exact same bug.

> > In fact what seems a likely reproduction on Intel hardware is the Intel
> > sound card issue.  I notice that issue occurs when sound *starts*
> > playing.  When a sound device starts, its buffers would be empty and the
> > first DMA request would be turned around with minimal latency.  In that
> > case, this matches the Samsung SATA devices handling DMA with low
> > latency.
> 
> Can you reproduce the data loss issue without using RAID in Linux?
> You can use fio with verify or similar to stress test it.

I'm not set up to do this.  The only combination I've found where this
occurs features Linux software RAID.  This doesn't mean that is an
absolute requirement though.

> Can you reproduce if dom0 is PVH instead of PV?

I've tried to boot with PVH domain 0 a few times, but so far never had
any success.  I was planning to retry PVH domain 0 when the next Debian
update comes through.  As a result, I've only reproduced with PV
domain 0.

I notice the other report indicates it only affects PV domain 0.
Hopefully I'll have more success with PVH domain 0 next time.

> Can you reproduce with dom0-iommu=strict mode in the Xen command line?

Not yet tried.

> > > Can you give it a try and see if it fixes the fault messages, plus
> > > your issues with the disk devices?
> > 
> > Ick.  I was hoping to avoid reinstalling the known problematic devices
> > and simply send them to someone better set up for analyzing x86 problems.
> > 
> > Looking at the series, it seems likely to remove the fault messages and
> > turn this into silent data loss.  I doubt any AMD processors have an
> > IOMMU, yet omit cmpxchg16b (older system lacked full IOMMU, yet did have
> > cmpxchg16b, newer system has both).  Even guests have cmpxchg16b
> > available.
> 
> Silent data loss?  Data loss might or might not be there, regardless of
> whether IOMMU faults are being reported.  IMO it's unhelpful to make this
> kind of comment, as you seem to suggest a preference for leaving the
> IOMMU fault bug unfixed, which I'm sure is not the case.

Indeed.

> > If you really want this tested, it will be a while before the next
> > potential downtime window.
> 
> No worries, I already have confirmation that someone else who was
> seeing the same IOMMU faults has tested the fix.  I was mostly
> wondering whether it would affect your data loss issues in any way, as
> for that I have no one else who can reproduce it.

I'm extremely doubtful it will affect that issue.  In the more immediate
time-frame, the question is whether:

Message-ID: <1050214476.1105853.1739823581696.JavaMail.zimbra@cert.pl>
Subject: Memory corruption bug with Xen PV Dom0 and BOSS-S1 RAID card

Is the same issue, or not?  There are enough differences to suspect it
is a distinct issue, but there is enough similarity to suspect it may
be the same issue.

Even if it is the same issue, some of the observations there might be
red herrings too.  I'm unsure whether it is limited to particular block
ranges here (it may or may not be).







* Re: Serious AMD-Vi issue
  2025-01-27  9:44     ` Roger Pau Monné
  2025-02-18  4:05       ` Elliott Mitchell
@ 2025-04-13 22:08       ` Elliott Mitchell
  1 sibling, 0 replies; 37+ messages in thread
From: Elliott Mitchell @ 2025-04-13 22:08 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: xen-devel, Jan Beulich, Andrew Cooper

On Mon, Jan 27, 2025 at 10:44:33AM +0100, Roger Pau Monné wrote:
> On Fri, Jan 24, 2025 at 01:26:23PM -0800, Elliott Mitchell wrote:
> > 
> > In fact what seems a likely reproduction on Intel hardware is the Intel
> > sound card issue.  I notice that issue occurs when sound *starts*
> > playing.  When a sound device starts, its buffers would be empty and the
> > first DMA request would be turned around with minimal latency.  In that
> > case, this matches the Samsung SATA devices handling DMA with low
> > latency.
> 
> Can you reproduce the data loss issue without using RAID in Linux?
> You can use fio with verify or similar to stress test it.

This seems rather unlikely.  The first reporter tried without software
RAID and didn't observe the issue.  If the problem occurred without
software RAID, there would be massive numbers of data loss reports.

> Can you reproduce if dom0 is PVH instead of PV?

Haven't gotten Domain 0 PVH to work yet.  I was planning to try again
when Debian updates to 4.20, but until then this isn't working.

> Can you reproduce with dom0-iommu=strict mode in the Xen command line?

Alas, this is now a definite "yes".  Took a bunch of waiting, but now
confirmed to occur with "dom0-iommu=strict".



Thread overview: 37+ messages
2024-01-25 20:24 Serious AMD-Vi issue Elliott Mitchell
2024-02-12 23:23 ` Elliott Mitchell
2024-03-04 19:56   ` Elliott Mitchell
2024-03-18 19:41   ` Serious AMD-Vi(?) issue Elliott Mitchell
2024-03-22 16:41     ` Kelly Choi
2024-03-22 19:22       ` Elliott Mitchell
2024-03-25  7:55         ` Jan Beulich
2024-03-25 21:43           ` Elliott Mitchell
2024-03-27 17:27             ` Elliott Mitchell
2024-03-28  6:25               ` Jan Beulich
2024-03-28 15:22                 ` Elliott Mitchell
2024-03-28 16:17                   ` Elliott Mitchell
2024-04-11  2:41                 ` Elliott Mitchell
2024-04-17 12:40                   ` Jan Beulich
2024-04-18  6:45                     ` Elliott Mitchell
2024-04-18  7:09                       ` Jan Beulich
2024-04-19  4:33                         ` Elliott Mitchell
2024-05-11  4:09                           ` Elliott Mitchell
2024-05-13  8:44                             ` Roger Pau Monné
2024-05-13 20:11                               ` Elliott Mitchell
2024-05-14  8:22                                 ` Jan Beulich
2024-05-14 20:51                                   ` Elliott Mitchell
2024-05-15 13:40                                     ` Kelly Choi
2024-05-16  5:21                                       ` Elliott Mitchell
2024-05-14  8:20                               ` Jan Beulich
2024-06-28  0:18                               ` Elliott Mitchell
2024-07-01 18:07                                 ` Elliott Mitchell
2024-07-04 22:08                                   ` Elliott Mitchell
2024-07-10 18:35                                     ` Elliott Mitchell
2024-03-04 23:55 ` AMD-Vi issue Andrew Cooper
2024-03-05  0:34   ` Elliott Mitchell
2025-01-24 14:31 ` Serious " Roger Pau Monné
2025-01-24 21:26   ` Elliott Mitchell
2025-01-26  0:24     ` Teddy Astie
2025-01-27  9:44     ` Roger Pau Monné
2025-02-18  4:05       ` Elliott Mitchell
2025-04-13 22:08       ` Elliott Mitchell