* Xen PAT settings vs Linux PAT settings
@ 2024-10-14 18:26 Marek Marczykowski-Górecki
  2024-10-14 20:05 ` Andrew Cooper
  0 siblings, 1 reply; 4+ messages in thread
From: Marek Marczykowski-Górecki @ 2024-10-14 18:26 UTC (permalink / raw)
  To: xen-devel

[-- Attachment #1: Type: text/plain, Size: 1388 bytes --]

Hi,

It looks like we've identified a second buggy driver that somewhere
assumes PAT is configured the way Linux normally does natively - the nvidia
binary one this time[3]. The first one affected was i915, but it turned out
to be a bug in Linux mm. It was eventually fixed[1], but it was quite painful
to debug. This time a proper fix is not known yet. Since the previous
issue, Qubes OS has carried a patch[2] that changes Xen to use the same PAT
as Linux. We recently dropped this patch, since the Linux fix had reached
all the branches we support, but apparently that wasn't all...

Anyway, would it be useful (and acceptable) for upstream Xen to have
a kconfig option (behind UNSUPPORTED or so) to switch this behavior?
Technically, it's a PV ABI violation, and it does break a few things
(PV domUs with passthrough are definitely affected - Xen then considers
them L1TF vulnerable; PV live migration is most likely broken too). But
on the other hand, if one doesn't use the affected features, it allows one
to work around an issue that is otherwise very annoying to debug...
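
For readers who want to see concretely what "a different PAT" means, here is a minimal Python sketch that decodes an IA32_PAT MSR value into its eight memory-type entries and lists the slots where two layouts disagree. The power-on default is the architectural value. The Xen and Linux constants are assumptions recalled from the respective source trees, not taken from this thread, and should be checked against the version in use.

```python
# Memory-type encodings used in each 8-bit PAT entry (low 3 bits).
PAT_TYPES = {0: "UC", 1: "WC", 4: "WT", 5: "WP", 6: "WB", 7: "UC-"}


def decode_pat(msr):
    """Return the memory type of PAT entries 0..7 for a 64-bit MSR value."""
    return [PAT_TYPES.get((msr >> (8 * i)) & 0x7, "reserved") for i in range(8)]


def differing_entries(a, b):
    """Return the PAT indices at which two MSR values select different types."""
    return [i for i, (x, y) in enumerate(zip(decode_pat(a), decode_pat(b))) if x != y]


# Architectural power-on default of IA32_PAT.
POWER_ON_PAT = 0x0007040600070406
# ASSUMPTION: value Xen programs (believed to be XEN_MSR_PAT); verify in your tree.
XEN_PAT = 0x0000050100070406
# ASSUMPTION: value native Linux programs; verify in your tree.
LINUX_PAT = 0x0407050600070106

if __name__ == "__main__":
    print("power-on:", decode_pat(POWER_ON_PAT))
    print("xen:     ", decode_pat(XEN_PAT))
    print("linux:   ", decode_pat(LINUX_PAT))
    print("differing indices:", differing_entries(XEN_PAT, LINUX_PAT))
```

A PTE selects an entry by index (PAT, PCD, PWT bits), so code that hard-codes an index on the assumption of the native Linux layout gets a different cache type wherever the two layouts differ.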


[1] git.kernel.org/torvalds/c/548cb932051fb6232ac983ed6673dae7bdf3cf4c
[2] https://github.com/QubesOS/qubes-vmm-xen/blob/44e9fd9f3b1ebf1cf43674b5a1c2669f7dd253f5/1019-Use-Linux-s-PAT.patch
[3] https://github.com/QubesOS/qubes-issues/issues/9501
-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]


* Re: Xen PAT settings vs Linux PAT settings
  2024-10-14 18:26 Xen PAT settings vs Linux PAT settings Marek Marczykowski-Górecki
@ 2024-10-14 20:05 ` Andrew Cooper
  2024-10-14 21:37   ` Marek Marczykowski-Górecki
  0 siblings, 1 reply; 4+ messages in thread
From: Andrew Cooper @ 2024-10-14 20:05 UTC (permalink / raw)
  To: Marek Marczykowski-Górecki, xen-devel

On 14/10/2024 7:26 pm, Marek Marczykowski-Górecki wrote:
> Hi,
>
> It looks like we've identified a second buggy driver that somewhere
> assumes PAT is configured the way Linux normally does natively - the nvidia
> binary one this time[3]. The first one affected was i915, but it turned out
> to be a bug in Linux mm. It was eventually fixed[1], but it was quite painful
> to debug. This time a proper fix is not known yet. Since the previous
> issue, Qubes OS has carried a patch[2] that changes Xen to use the same PAT
> as Linux. We recently dropped this patch, since the Linux fix had reached
> all the branches we support, but apparently that wasn't all...
>
> Anyway, would it be useful (and acceptable) for upstream Xen to have
> a kconfig option (behind UNSUPPORTED or so) to switch this behavior?

Not UNSUPPORTED - it's bogus and I still want it purged.

But, behind EXPERT, with a suitable description (e.g. "This breaks
various ABIs including migration, and is presented here for debugging PV
driver issues in a single system.  If turning it on fixes a bug, please
contact upstream Xen"), then I think we need to take it.

The fact that I've had to recommend it once already this week for
debugging purposes, and it wasn't even this Nvidia bug, demonstrates how
pervasive the problems are.

> Technically, it's a PV ABI violation, and it does break a few things
> (PV domUs with passthrough are definitely affected - Xen then considers
> them L1TF vulnerable; PV live migration is most likely broken too).

Do you have more information on this?  The PAT bits shouldn't form any
part of L1TF considerations.

~Andrew



* Re: Xen PAT settings vs Linux PAT settings
  2024-10-14 20:05 ` Andrew Cooper
@ 2024-10-14 21:37   ` Marek Marczykowski-Górecki
  2024-10-14 22:21     ` Andrew Cooper
  0 siblings, 1 reply; 4+ messages in thread
From: Marek Marczykowski-Górecki @ 2024-10-14 21:37 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel

[-- Attachment #1: Type: text/plain, Size: 1900 bytes --]

On Mon, Oct 14, 2024 at 09:05:58PM +0100, Andrew Cooper wrote:
> On 14/10/2024 7:26 pm, Marek Marczykowski-Górecki wrote:
> > Hi,
> >
> > It looks like we've identified a second buggy driver that somewhere
> > assumes PAT is configured the way Linux normally does natively - the nvidia
> > binary one this time[3]. The first one affected was i915, but it turned out
> > to be a bug in Linux mm. It was eventually fixed[1], but it was quite painful
> > to debug. This time a proper fix is not known yet. Since the previous
> > issue, Qubes OS has carried a patch[2] that changes Xen to use the same PAT
> > as Linux. We recently dropped this patch, since the Linux fix had reached
> > all the branches we support, but apparently that wasn't all...
> >
> > Anyway, would it be useful (and acceptable) for upstream Xen to have
> > a kconfig option (behind UNSUPPORTED or so) to switch this behavior?
> 
> Not UNSUPPORTED - it's bogus and I still want it purged.
> 
> But, behind EXPERT, with a suitable description (e.g. "This breaks
> various ABIs including migration, and is presented here for debugging PV
> driver issues in a single system.  If turning it on fixes a bug, please
> contact upstream Xen"), then I think we need to take it.

Makes sense.

> The fact that I've had to recommend it once already this week for
> debugging purposes, and it wasn't even this Nvidia bug, demonstrates how
> pervasive the problems are.
> 
> > Technically, it's a PV ABI violation, and it does break a few things
> > (PV domUs with passthrough are definitely affected - Xen then considers
> > them L1TF vulnerable; PV live migration is most likely broken too).
> 
> Do you have more information on this?  The PAT bits shouldn't form any
> part of L1TF considerations.

https://github.com/QubesOS/qubes-issues/issues/8593

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]


* Re: Xen PAT settings vs Linux PAT settings
  2024-10-14 21:37   ` Marek Marczykowski-Górecki
@ 2024-10-14 22:21     ` Andrew Cooper
  0 siblings, 0 replies; 4+ messages in thread
From: Andrew Cooper @ 2024-10-14 22:21 UTC (permalink / raw)
  To: Marek Marczykowski-Górecki; +Cc: xen-devel

On 14/10/2024 10:37 pm, Marek Marczykowski-Górecki wrote:
> On Mon, Oct 14, 2024 at 09:05:58PM +0100, Andrew Cooper wrote:
>> On 14/10/2024 7:26 pm, Marek Marczykowski-Górecki wrote:
>>> Technically, it's a PV ABI violation, and it does break a few things
>>> (PV domUs with passthrough are definitely affected - Xen then considers
>>> them L1TF vulnerable; PV live migration is most likely broken too).
>> Do you have more information on this?  The PAT bits shouldn't form any
>> part of L1TF considerations.
> https://github.com/QubesOS/qubes-issues/issues/8593
>

0x8010000018200066

That's a very L1TF-unsafe PTE, but it's also got nothing to do with PAT.
It's:

  NX | Avail(bit 52) | addr (0x18200000) | D | A | U | W

and importantly not present.  PAT == 0 == WB in both the Xen and Linux
worlds.
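
The breakdown above can be reproduced mechanically. The following is a minimal Python sketch, assuming the standard x86-64 leaf PTE bit layout (P=0, W=1, U=2, PWT=3, PCD=4, A=5, D=6, PAT=7, address in bits 12..51, NX=63). The helper name `decode_pte` is hypothetical and exists only for illustration.

```python
# Flag bit positions of an x86-64 4KiB leaf PTE.
FLAGS = {0: "P", 1: "W", 2: "U", 3: "PWT", 4: "PCD",
         5: "A", 6: "D", 7: "PAT", 63: "NX"}
ADDR_MASK = 0x000FFFFFFFFFF000  # physical address, bits 12..51


def decode_pte(pte):
    """Split a raw PTE into flags, address, high available bits and PAT index."""
    flags = {name: bool((pte >> bit) & 1) for bit, name in FLAGS.items()}
    # Bits 52..62 are treated here as software-available; this ignores
    # protection keys, which can claim bits 59..62 when enabled.
    avail = [bit for bit in range(52, 63) if (pte >> bit) & 1]
    # The PAT entry is selected by the 3-bit index PAT:PCD:PWT.
    pat_index = (flags["PAT"] << 2) | (flags["PCD"] << 1) | flags["PWT"]
    return {"flags": flags, "addr": pte & ADDR_MASK,
            "avail_bits": avail, "pat_index": pat_index}


if __name__ == "__main__":
    info = decode_pte(0x8010000018200066)
    set_flags = sorted(n for n, v in info["flags"].items() if v)
    print("set flags: ", set_flags)
    print("address:   ", hex(info["addr"]))
    print("avail bits:", info["avail_bits"])
    print("PAT index: ", info["pat_index"])
```

With P clear and a non-zero address, the entry is the pattern L1TF cares about, while the PAT index of 0 is independent of which PAT layout is programmed.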

But, it likely does highlight a codepath which is opencoding PTE updates.

We really ought to have an option to do as f61c54967f4a did with
_PAGE_GNTTAB, and to inject #GP into the guest to get a backtrace out of
Linux.

In the case that we're going to crash the domain anyway, #GP is still
more useful, although I would quite like the #GP option instead of
shadowing too.  Maybe hanging off pv-l1tf=fault as an option?

~Andrew

