* [PATCH] [RFC] vpci: allow BAR write while mapped
@ 2025-03-12 19:50 Stewart Hildebrand
2025-03-13 15:14 ` Alejandro Vallejo
2025-03-13 17:48 ` Roger Pau Monné
0 siblings, 2 replies; 12+ messages in thread
From: Stewart Hildebrand @ 2025-03-12 19:50 UTC (permalink / raw)
To: xen-devel; +Cc: Stewart Hildebrand, Roger Pau Monné
Xen vPCI refuses BAR writes if the BAR is mapped in p2m. If firmware
initialized the BAR to a bad address, Linux will try to write a new
address to the BAR without disabling memory decoding. Allow the write
by updating p2m right away in the vPCI BAR write handler.
Resolves: https://gitlab.com/xen-project/xen/-/issues/197
Signed-off-by: Stewart Hildebrand <stewart.hildebrand@amd.com>
---
RFC: Currently the deferred mapping machinery supports only map or
unmap, not both. It might be better to rework the mapping machinery
to support unmap-then-map operations, but please let me know your
thoughts.
RFC: This patch has not yet made an attempt to distinguish between
32-bit and 64-bit writes. It probably should.
---
xen/drivers/vpci/header.c | 65 +++++++++++++++++++++++++++++++--------
1 file changed, 53 insertions(+), 12 deletions(-)
diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
index ef6c965c081c..66adb2183cfe 100644
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -173,7 +173,7 @@ static void modify_decoding(const struct pci_dev *pdev, uint16_t cmd,
ASSERT_UNREACHABLE();
}
-bool vpci_process_pending(struct vcpu *v)
+static bool process_pending(struct vcpu *v, bool need_lock)
{
struct pci_dev *pdev = v->vpci.pdev;
struct vpci_header *header = NULL;
@@ -182,12 +182,14 @@ bool vpci_process_pending(struct vcpu *v)
if ( !pdev )
return false;
- read_lock(&v->domain->pci_lock);
+ if ( need_lock )
+ read_lock(&v->domain->pci_lock);
if ( !pdev->vpci || (v->domain != pdev->domain) )
{
v->vpci.pdev = NULL;
- read_unlock(&v->domain->pci_lock);
+ if ( need_lock )
+ read_unlock(&v->domain->pci_lock);
return false;
}
@@ -209,17 +211,20 @@ bool vpci_process_pending(struct vcpu *v)
if ( rc == -ERESTART )
{
- read_unlock(&v->domain->pci_lock);
+ if ( need_lock )
+ read_unlock(&v->domain->pci_lock);
return true;
}
if ( rc )
{
- spin_lock(&pdev->vpci->lock);
+ if ( need_lock )
+ spin_lock(&pdev->vpci->lock);
/* Disable memory decoding unconditionally on failure. */
modify_decoding(pdev, v->vpci.cmd & ~PCI_COMMAND_MEMORY,
false);
- spin_unlock(&pdev->vpci->lock);
+ if ( need_lock )
+ spin_unlock(&pdev->vpci->lock);
/* Clean all the rangesets */
for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
@@ -228,7 +233,8 @@ bool vpci_process_pending(struct vcpu *v)
v->vpci.pdev = NULL;
- read_unlock(&v->domain->pci_lock);
+ if ( need_lock )
+ read_unlock(&v->domain->pci_lock);
if ( !is_hardware_domain(v->domain) )
domain_crash(v->domain);
@@ -238,15 +244,23 @@ bool vpci_process_pending(struct vcpu *v)
}
v->vpci.pdev = NULL;
- spin_lock(&pdev->vpci->lock);
+ if ( need_lock )
+ spin_lock(&pdev->vpci->lock);
modify_decoding(pdev, v->vpci.cmd, v->vpci.rom_only);
- spin_unlock(&pdev->vpci->lock);
+ if ( need_lock )
+ spin_unlock(&pdev->vpci->lock);
- read_unlock(&v->domain->pci_lock);
+ if ( need_lock )
+ read_unlock(&v->domain->pci_lock);
return false;
}
+bool vpci_process_pending(struct vcpu *v)
+{
+ return process_pending(v, true);
+}
+
static int __init apply_map(struct domain *d, const struct pci_dev *pdev,
uint16_t cmd)
{
@@ -565,6 +579,8 @@ static void cf_check bar_write(
{
struct vpci_bar *bar = data;
bool hi = false;
+ bool reenable = false;
+ uint32_t cmd = 0;
ASSERT(is_hardware_domain(pdev->domain));
@@ -585,10 +601,31 @@ static void cf_check bar_write(
{
/* If the value written is the current one avoid printing a warning. */
if ( val != (uint32_t)(bar->addr >> (hi ? 32 : 0)) )
+ {
gprintk(XENLOG_WARNING,
- "%pp: ignored BAR %zu write while mapped\n",
+ "%pp: allowing BAR %zu write while mapped\n",
&pdev->sbdf, bar - pdev->vpci->header.bars + hi);
- return;
+ ASSERT(rw_is_write_locked(&pdev->domain->pci_lock));
+ ASSERT(spin_is_locked(&pdev->vpci->lock));
+ reenable = true;
+ cmd = pci_conf_read16(pdev->sbdf, PCI_COMMAND);
+ /*
+ * Write-while-mapped: unmap the old BAR in p2m. We want this to
+ * finish right away since the deferral machinery only supports
+ * unmap OR map, not unmap-then-remap. Ultimately, it probably would
+ * be better to defer the write-while-mapped case just like regular
+ * BAR writes (but still only allow it for 32-bit BAR writes).
+ */
+ /* Disable memory decoding */
+ modify_bars(pdev, cmd & ~PCI_COMMAND_MEMORY, false);
+ /* Call process pending here to ensure P2M operations are done */
+ while ( process_pending(current, false) )
+ {
+ /* Pre-empted, try again */
+ }
+ }
+ else
+ return;
}
@@ -610,6 +647,10 @@ static void cf_check bar_write(
}
pci_conf_write32(pdev->sbdf, reg, val);
+
+ if ( reenable )
+ /* Write-while-mapped: map the new BAR in p2m. OK to defer. */
+ modify_bars(pdev, cmd, false);
}
static void cf_check guest_mem_bar_write(const struct pci_dev *pdev,
base-commit: 8e60d47cf0112c145b6b0e454d102b04c857db8c
--
2.48.1
^ permalink raw reply related [flat|nested] 12+ messages in thread* Re: [PATCH] [RFC] vpci: allow BAR write while mapped
2025-03-12 19:50 [PATCH] [RFC] vpci: allow BAR write while mapped Stewart Hildebrand
@ 2025-03-13 15:14 ` Alejandro Vallejo
2025-03-13 17:43 ` Stewart Hildebrand
2025-03-13 17:58 ` Roger Pau Monné
2025-03-13 17:48 ` Roger Pau Monné
1 sibling, 2 replies; 12+ messages in thread
From: Alejandro Vallejo @ 2025-03-13 15:14 UTC (permalink / raw)
To: Stewart Hildebrand, xen-devel; +Cc: Roger Pau Monné
On Wed Mar 12, 2025 at 7:50 PM GMT, Stewart Hildebrand wrote:
> Xen vPCI refuses BAR writes if the BAR is mapped in p2m. If firmware
> initialized the BAR to a bad address, Linux will try to write a new
> address to the BAR without disabling memory decoding. Allow the write
> by updating p2m right away in the vPCI BAR write handler.
>
> Resolves: https://gitlab.com/xen-project/xen/-/issues/197
> Signed-off-by: Stewart Hildebrand <stewart.hildebrand@amd.com>
> ---
> RFC: Currently the deferred mapping machinery supports only map or
> unmap, not both. It might be better to rework the mapping machinery
> to support unmap-then-map operations, but please let me know your
> thoughts.
> RFC: This patch has not yet made an attempt to distinguish between
> 32-bit and 64-bit writes. It probably should.
> ---
> xen/drivers/vpci/header.c | 65 +++++++++++++++++++++++++++++++--------
> 1 file changed, 53 insertions(+), 12 deletions(-)
>
> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
> index ef6c965c081c..66adb2183cfe 100644
> --- a/xen/drivers/vpci/header.c
> +++ b/xen/drivers/vpci/header.c
> @@ -173,7 +173,7 @@ static void modify_decoding(const struct pci_dev *pdev, uint16_t cmd,
> ASSERT_UNREACHABLE();
> }
>
> -bool vpci_process_pending(struct vcpu *v)
> +static bool process_pending(struct vcpu *v, bool need_lock)
> {
> struct pci_dev *pdev = v->vpci.pdev;
> struct vpci_header *header = NULL;
> @@ -182,12 +182,14 @@ bool vpci_process_pending(struct vcpu *v)
> if ( !pdev )
> return false;
>
> - read_lock(&v->domain->pci_lock);
> + if ( need_lock )
> + read_lock(&v->domain->pci_lock);
>
> if ( !pdev->vpci || (v->domain != pdev->domain) )
> {
> v->vpci.pdev = NULL;
> - read_unlock(&v->domain->pci_lock);
> + if ( need_lock )
> + read_unlock(&v->domain->pci_lock);
> return false;
> }
>
> @@ -209,17 +211,20 @@ bool vpci_process_pending(struct vcpu *v)
>
> if ( rc == -ERESTART )
> {
> - read_unlock(&v->domain->pci_lock);
> + if ( need_lock )
> + read_unlock(&v->domain->pci_lock);
> return true;
> }
>
> if ( rc )
> {
> - spin_lock(&pdev->vpci->lock);
> + if ( need_lock )
> + spin_lock(&pdev->vpci->lock);
> /* Disable memory decoding unconditionally on failure. */
> modify_decoding(pdev, v->vpci.cmd & ~PCI_COMMAND_MEMORY,
> false);
> - spin_unlock(&pdev->vpci->lock);
> + if ( need_lock )
> + spin_unlock(&pdev->vpci->lock);
>
> /* Clean all the rangesets */
> for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
> @@ -228,7 +233,8 @@ bool vpci_process_pending(struct vcpu *v)
>
> v->vpci.pdev = NULL;
>
> - read_unlock(&v->domain->pci_lock);
> + if ( need_lock )
> + read_unlock(&v->domain->pci_lock);
>
> if ( !is_hardware_domain(v->domain) )
> domain_crash(v->domain);
> @@ -238,15 +244,23 @@ bool vpci_process_pending(struct vcpu *v)
> }
> v->vpci.pdev = NULL;
>
> - spin_lock(&pdev->vpci->lock);
> + if ( need_lock )
> + spin_lock(&pdev->vpci->lock);
> modify_decoding(pdev, v->vpci.cmd, v->vpci.rom_only);
> - spin_unlock(&pdev->vpci->lock);
> + if ( need_lock )
> + spin_unlock(&pdev->vpci->lock);
>
> - read_unlock(&v->domain->pci_lock);
> + if ( need_lock )
> + read_unlock(&v->domain->pci_lock);
>
> return false;
> }
>
> +bool vpci_process_pending(struct vcpu *v)
> +{
> + return process_pending(v, true);
> +}
> +
> static int __init apply_map(struct domain *d, const struct pci_dev *pdev,
> uint16_t cmd)
> {
> @@ -565,6 +579,8 @@ static void cf_check bar_write(
> {
> struct vpci_bar *bar = data;
> bool hi = false;
> + bool reenable = false;
> + uint32_t cmd = 0;
>
> ASSERT(is_hardware_domain(pdev->domain));
>
> @@ -585,10 +601,31 @@ static void cf_check bar_write(
> {
> /* If the value written is the current one avoid printing a warning. */
> if ( val != (uint32_t)(bar->addr >> (hi ? 32 : 0)) )
> + {
> gprintk(XENLOG_WARNING,
> - "%pp: ignored BAR %zu write while mapped\n",
> + "%pp: allowing BAR %zu write while mapped\n",
> &pdev->sbdf, bar - pdev->vpci->header.bars + hi);
> - return;
> + ASSERT(rw_is_write_locked(&pdev->domain->pci_lock));
> + ASSERT(spin_is_locked(&pdev->vpci->lock));
> + reenable = true;
> + cmd = pci_conf_read16(pdev->sbdf, PCI_COMMAND);
> + /*
> + * Write-while-mapped: unmap the old BAR in p2m. We want this to
> + * finish right away since the deferral machinery only supports
> + * unmap OR map, not unmap-then-remap. Ultimately, it probably would
> + * be better to defer the write-while-mapped case just like regular
> + * BAR writes (but still only allow it for 32-bit BAR writes).
> + */
> + /* Disable memory decoding */
> + modify_bars(pdev, cmd & ~PCI_COMMAND_MEMORY, false);
> + /* Call process pending here to ensure P2M operations are done */
> + while ( process_pending(current, false) )
> + {
> + /* Pre-empted, try again */
This seems a tad dangerous. There may be a non-negligible amount of work queued
up. I also wonder whether the guest can induce spinning by increasing
contention on the p2m (e.g: via ballooning) or by induces work being queued up.
I don't quite understand the logic, but I suspect you could
raise_softirq(SCHEDULE_SOFTIRQ), decrease the IP so the instruction is
replayed, release the locks, and simply exit the hypervisor. The system ought
to naturally split the operation in several continuations each of which does
either unmapping or mapping if it couldn't be done in a single one. Replaying
the instruction after decoding is disabled ought to be benign.
I haven't tried any of what I just wrote, so take it with with several tons of
salt though.
Do you know if Linux intentionally skips disabling decode? Or is it a bug?
> + }
> + }
> + else
> + return;
> }
>
>
> @@ -610,6 +647,10 @@ static void cf_check bar_write(
> }
>
> pci_conf_write32(pdev->sbdf, reg, val);
> +
> + if ( reenable )
> + /* Write-while-mapped: map the new BAR in p2m. OK to defer. */
> + modify_bars(pdev, cmd, false);
> }
>
> static void cf_check guest_mem_bar_write(const struct pci_dev *pdev,
>
> base-commit: 8e60d47cf0112c145b6b0e454d102b04c857db8c
Cheers,
Alejandro
^ permalink raw reply [flat|nested] 12+ messages in thread* Re: [PATCH] [RFC] vpci: allow BAR write while mapped
2025-03-13 15:14 ` Alejandro Vallejo
@ 2025-03-13 17:43 ` Stewart Hildebrand
2025-03-14 10:39 ` Alejandro Vallejo
2025-03-13 17:58 ` Roger Pau Monné
1 sibling, 1 reply; 12+ messages in thread
From: Stewart Hildebrand @ 2025-03-13 17:43 UTC (permalink / raw)
To: Alejandro Vallejo, xen-devel; +Cc: Roger Pau Monné
On 3/13/25 11:14, Alejandro Vallejo wrote:
> On Wed Mar 12, 2025 at 7:50 PM GMT, Stewart Hildebrand wrote:
>> Xen vPCI refuses BAR writes if the BAR is mapped in p2m. If firmware
>> initialized the BAR to a bad address, Linux will try to write a new
>> address to the BAR without disabling memory decoding. Allow the write
>> by updating p2m right away in the vPCI BAR write handler.
>>
>> Resolves: https://gitlab.com/xen-project/xen/-/issues/197
>> Signed-off-by: Stewart Hildebrand <stewart.hildebrand@amd.com>
>> ---
>> RFC: Currently the deferred mapping machinery supports only map or
>> unmap, not both. It might be better to rework the mapping machinery
>> to support unmap-then-map operations, but please let me know your
>> thoughts.
>> RFC: This patch has not yet made an attempt to distinguish between
>> 32-bit and 64-bit writes. It probably should.
>> ---
>> xen/drivers/vpci/header.c | 65 +++++++++++++++++++++++++++++++--------
>> 1 file changed, 53 insertions(+), 12 deletions(-)
>>
>> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
>> index ef6c965c081c..66adb2183cfe 100644
>> --- a/xen/drivers/vpci/header.c
>> +++ b/xen/drivers/vpci/header.c
>> @@ -173,7 +173,7 @@ static void modify_decoding(const struct pci_dev *pdev, uint16_t cmd,
>> ASSERT_UNREACHABLE();
>> }
>>
>> -bool vpci_process_pending(struct vcpu *v)
>> +static bool process_pending(struct vcpu *v, bool need_lock)
>> {
>> struct pci_dev *pdev = v->vpci.pdev;
>> struct vpci_header *header = NULL;
>> @@ -182,12 +182,14 @@ bool vpci_process_pending(struct vcpu *v)
>> if ( !pdev )
>> return false;
>>
>> - read_lock(&v->domain->pci_lock);
>> + if ( need_lock )
>> + read_lock(&v->domain->pci_lock);
>>
>> if ( !pdev->vpci || (v->domain != pdev->domain) )
>> {
>> v->vpci.pdev = NULL;
>> - read_unlock(&v->domain->pci_lock);
>> + if ( need_lock )
>> + read_unlock(&v->domain->pci_lock);
>> return false;
>> }
>>
>> @@ -209,17 +211,20 @@ bool vpci_process_pending(struct vcpu *v)
>>
>> if ( rc == -ERESTART )
>> {
>> - read_unlock(&v->domain->pci_lock);
>> + if ( need_lock )
>> + read_unlock(&v->domain->pci_lock);
>> return true;
>> }
>>
>> if ( rc )
>> {
>> - spin_lock(&pdev->vpci->lock);
>> + if ( need_lock )
>> + spin_lock(&pdev->vpci->lock);
>> /* Disable memory decoding unconditionally on failure. */
>> modify_decoding(pdev, v->vpci.cmd & ~PCI_COMMAND_MEMORY,
>> false);
>> - spin_unlock(&pdev->vpci->lock);
>> + if ( need_lock )
>> + spin_unlock(&pdev->vpci->lock);
>>
>> /* Clean all the rangesets */
>> for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
>> @@ -228,7 +233,8 @@ bool vpci_process_pending(struct vcpu *v)
>>
>> v->vpci.pdev = NULL;
>>
>> - read_unlock(&v->domain->pci_lock);
>> + if ( need_lock )
>> + read_unlock(&v->domain->pci_lock);
>>
>> if ( !is_hardware_domain(v->domain) )
>> domain_crash(v->domain);
>> @@ -238,15 +244,23 @@ bool vpci_process_pending(struct vcpu *v)
>> }
>> v->vpci.pdev = NULL;
>>
>> - spin_lock(&pdev->vpci->lock);
>> + if ( need_lock )
>> + spin_lock(&pdev->vpci->lock);
>> modify_decoding(pdev, v->vpci.cmd, v->vpci.rom_only);
>> - spin_unlock(&pdev->vpci->lock);
>> + if ( need_lock )
>> + spin_unlock(&pdev->vpci->lock);
>>
>> - read_unlock(&v->domain->pci_lock);
>> + if ( need_lock )
>> + read_unlock(&v->domain->pci_lock);
>>
>> return false;
>> }
>>
>> +bool vpci_process_pending(struct vcpu *v)
>> +{
>> + return process_pending(v, true);
>> +}
>> +
>> static int __init apply_map(struct domain *d, const struct pci_dev *pdev,
>> uint16_t cmd)
>> {
>> @@ -565,6 +579,8 @@ static void cf_check bar_write(
>> {
>> struct vpci_bar *bar = data;
>> bool hi = false;
>> + bool reenable = false;
>> + uint32_t cmd = 0;
>>
>> ASSERT(is_hardware_domain(pdev->domain));
>>
>> @@ -585,10 +601,31 @@ static void cf_check bar_write(
>> {
>> /* If the value written is the current one avoid printing a warning. */
>> if ( val != (uint32_t)(bar->addr >> (hi ? 32 : 0)) )
>> + {
>> gprintk(XENLOG_WARNING,
>> - "%pp: ignored BAR %zu write while mapped\n",
>> + "%pp: allowing BAR %zu write while mapped\n",
>> &pdev->sbdf, bar - pdev->vpci->header.bars + hi);
>> - return;
>> + ASSERT(rw_is_write_locked(&pdev->domain->pci_lock));
>> + ASSERT(spin_is_locked(&pdev->vpci->lock));
>> + reenable = true;
>> + cmd = pci_conf_read16(pdev->sbdf, PCI_COMMAND);
>> + /*
>> + * Write-while-mapped: unmap the old BAR in p2m. We want this to
>> + * finish right away since the deferral machinery only supports
>> + * unmap OR map, not unmap-then-remap. Ultimately, it probably would
>> + * be better to defer the write-while-mapped case just like regular
>> + * BAR writes (but still only allow it for 32-bit BAR writes).
>> + */
>> + /* Disable memory decoding */
>> + modify_bars(pdev, cmd & ~PCI_COMMAND_MEMORY, false);
>> + /* Call process pending here to ensure P2M operations are done */
>> + while ( process_pending(current, false) )
>> + {
>> + /* Pre-empted, try again */
>
> This seems a tad dangerous. There may be a non-negligible amount of work queued
> up. I also wonder whether the guest can induce spinning by increasing
> contention on the p2m (e.g: via ballooning) or by induces work being queued up.
>
> I don't quite understand the logic, but I suspect you could
> raise_softirq(SCHEDULE_SOFTIRQ), decrease the IP so the instruction is
> replayed, release the locks, and simply exit the hypervisor. The system ought
> to naturally split the operation in several continuations each of which does
> either unmapping or mapping if it couldn't be done in a single one. Replaying
> the instruction after decoding is disabled ought to be benign.
>
> I haven't tried any of what I just wrote, so take it with with several tons of
> salt though.
The idea was that the unmap-then-map operation would appear atomic from
the guest's point of view. I've only queued up the unmap operation at
this point in the new logic. Due to the mentioned limitation in the BAR
mapping deferral machinery, I wanted to make sure *this BAR* was
unmapped before queuing up the map operation (see below). Waiting for
*all* pending operations to finish here is likely not appropriate.
I think this just reinforces the need to rework the BAR mapping
machinery.
> Do you know if Linux intentionally skips disabling decode? Or is it a bug?
I think it's intentional. See https://gitlab.com/xen-project/xen/-/issues/197
>> + }
>> + }
>> + else
>> + return;
>> }
>>
>>
>> @@ -610,6 +647,10 @@ static void cf_check bar_write(
>> }
>>
>> pci_conf_write32(pdev->sbdf, reg, val);
>> +
>> + if ( reenable )
>> + /* Write-while-mapped: map the new BAR in p2m. OK to defer. */
>> + modify_bars(pdev, cmd, false);
This call to modify_bars() will raise a softirq for the map operation.
>> }
>>
>> static void cf_check guest_mem_bar_write(const struct pci_dev *pdev,
>>
>> base-commit: 8e60d47cf0112c145b6b0e454d102b04c857db8c
>
> Cheers,
> Alejandro
^ permalink raw reply [flat|nested] 12+ messages in thread* Re: [PATCH] [RFC] vpci: allow BAR write while mapped
2025-03-13 17:43 ` Stewart Hildebrand
@ 2025-03-14 10:39 ` Alejandro Vallejo
2025-03-17 12:32 ` Roger Pau Monné
0 siblings, 1 reply; 12+ messages in thread
From: Alejandro Vallejo @ 2025-03-14 10:39 UTC (permalink / raw)
To: Stewart Hildebrand, xen-devel; +Cc: Roger Pau Monné
On Thu Mar 13, 2025 at 5:43 PM GMT, Stewart Hildebrand wrote:
> The idea was that the unmap-then-map operation would appear atomic from
> the guest's point of view. I've only queued up the unmap operation at
> this point in the new logic. Due to the mentioned limitation in the BAR
> mapping deferral machinery, I wanted to make sure *this BAR* was
> unmapped before queuing up the map operation (see below). Waiting for
> *all* pending operations to finish here is likely not appropriate.
Looking more closely after reading Roger's answer, I misunderstood what was
being queued where. There's space for a single deferred operation that's
retried if pending on each attempt to resume the vCPU, whereas I initially
thought it was the mutations to the p2m (which would've competed with other
mutations from other vCPUs). This makes more sense, sorry for the noise.
> I think this just reinforces the need to rework the BAR mapping
> machinery.
Right. The most delicate part is dealing with races with another vCPU when the
unmap-then-map operation does not complete in a single taking of the vpci lock
I'd say. And that much is unavoidable, I think, because either unmapping or
mapping might take a while.
>
> > Do you know if Linux intentionally skips disabling decode? Or is it a bug?
>
> I think it's intentional. See https://gitlab.com/xen-project/xen/-/issues/197
Interesting. I seemed to recall some devices being able to decode their own BAR
accesses. But I must've been wrong.
>
> >> + }
> >> + }
> >> + else
> >> + return;
> >> }
> >>
> >>
> >> @@ -610,6 +647,10 @@ static void cf_check bar_write(
> >> }
> >>
> >> pci_conf_write32(pdev->sbdf, reg, val);
> >> +
> >> + if ( reenable )
> >> + /* Write-while-mapped: map the new BAR in p2m. OK to defer. */
> >> + modify_bars(pdev, cmd, false);
>
> This call to modify_bars() will raise a softirq for the map operation.
Ah, fair enough. I clearly didn't look closely enough.
Cheers,
Alejandro
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] [RFC] vpci: allow BAR write while mapped
2025-03-14 10:39 ` Alejandro Vallejo
@ 2025-03-17 12:32 ` Roger Pau Monné
2025-03-17 13:49 ` Alejandro Vallejo
0 siblings, 1 reply; 12+ messages in thread
From: Roger Pau Monné @ 2025-03-17 12:32 UTC (permalink / raw)
To: Alejandro Vallejo; +Cc: Stewart Hildebrand, xen-devel
On Fri, Mar 14, 2025 at 10:39:04AM +0000, Alejandro Vallejo wrote:
> On Thu Mar 13, 2025 at 5:43 PM GMT, Stewart Hildebrand wrote:
> > I think this just reinforces the need to rework the BAR mapping
> > machinery.
>
> Right. The most delicate part is dealing with races with another vCPU when the
> unmap-then-map operation does not complete in a single taking of the vpci lock
> I'd say. And that much is unavoidable, I think, because either unmapping or
> mapping might take a while.
The original code was put together for dom0, I bet there are some
races that can lead to incomplete p2m mappings if the domain attempts
parallel manipulation of the BARs and memory decoding bits. However
there should be no case where an unexpected mfn gets mapped into the
p2m as a result of such races.
It's fine for a domain to shot it's own foot if it attempts to do
concurrent PCI accesses to explicitly trigger races in the MMIO
mapping handling, as long as this doesn't cause issues to other
guests, and doesn't leak memory.
Regards, Roger.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] [RFC] vpci: allow BAR write while mapped
2025-03-17 12:32 ` Roger Pau Monné
@ 2025-03-17 13:49 ` Alejandro Vallejo
0 siblings, 0 replies; 12+ messages in thread
From: Alejandro Vallejo @ 2025-03-17 13:49 UTC (permalink / raw)
To: Roger Pau Monné; +Cc: Stewart Hildebrand, xen-devel
On Mon Mar 17, 2025 at 12:32 PM GMT, Roger Pau Monné wrote:
> On Fri, Mar 14, 2025 at 10:39:04AM +0000, Alejandro Vallejo wrote:
> > On Thu Mar 13, 2025 at 5:43 PM GMT, Stewart Hildebrand wrote:
> > > I think this just reinforces the need to rework the BAR mapping
> > > machinery.
> >
> > Right. The most delicate part is dealing with races with another vCPU when the
> > unmap-then-map operation does not complete in a single taking of the vpci lock
> > I'd say. And that much is unavoidable, I think, because either unmapping or
> > mapping might take a while.
>
> The original code was put together for dom0, I bet there are some
> races that can lead to incomplete p2m mappings if the domain attempts
> parallel manipulation of the BARs and memory decoding bits. However
> there should be no case where an unexpected mfn gets mapped into the
> p2m as a result of such races.
>
> It's fine for a domain to shot it's own foot if it attempts to do
> concurrent PCI accesses to explicitly trigger races in the MMIO
> mapping handling, as long as this doesn't cause issues to other
> guests, and doesn't leak memory.
>
> Regards, Roger.
It's refcounting bugs that would scare me the most. Or something similar, like
having an expectation that a BAR is absent from a certain p2m when it's still
present.
Cheers,
Alejandro
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] [RFC] vpci: allow BAR write while mapped
2025-03-13 15:14 ` Alejandro Vallejo
2025-03-13 17:43 ` Stewart Hildebrand
@ 2025-03-13 17:58 ` Roger Pau Monné
1 sibling, 0 replies; 12+ messages in thread
From: Roger Pau Monné @ 2025-03-13 17:58 UTC (permalink / raw)
To: Alejandro Vallejo; +Cc: Stewart Hildebrand, xen-devel
On Thu, Mar 13, 2025 at 03:14:26PM +0000, Alejandro Vallejo wrote:
> On Wed Mar 12, 2025 at 7:50 PM GMT, Stewart Hildebrand wrote:
> > Xen vPCI refuses BAR writes if the BAR is mapped in p2m. If firmware
> > initialized the BAR to a bad address, Linux will try to write a new
> > address to the BAR without disabling memory decoding. Allow the write
> > by updating p2m right away in the vPCI BAR write handler.
> >
> > Resolves: https://gitlab.com/xen-project/xen/-/issues/197
> > Signed-off-by: Stewart Hildebrand <stewart.hildebrand@amd.com>
> > ---
> > RFC: Currently the deferred mapping machinery supports only map or
> > unmap, not both. It might be better to rework the mapping machinery
> > to support unmap-then-map operations, but please let me know your
> > thoughts.
> > RFC: This patch has not yet made an attempt to distinguish between
> > 32-bit and 64-bit writes. It probably should.
> > ---
> > xen/drivers/vpci/header.c | 65 +++++++++++++++++++++++++++++++--------
> > 1 file changed, 53 insertions(+), 12 deletions(-)
> >
> > diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
> > index ef6c965c081c..66adb2183cfe 100644
> > --- a/xen/drivers/vpci/header.c
> > +++ b/xen/drivers/vpci/header.c
> > @@ -173,7 +173,7 @@ static void modify_decoding(const struct pci_dev *pdev, uint16_t cmd,
> > ASSERT_UNREACHABLE();
> > }
> >
> > -bool vpci_process_pending(struct vcpu *v)
> > +static bool process_pending(struct vcpu *v, bool need_lock)
> > {
> > struct pci_dev *pdev = v->vpci.pdev;
> > struct vpci_header *header = NULL;
> > @@ -182,12 +182,14 @@ bool vpci_process_pending(struct vcpu *v)
> > if ( !pdev )
> > return false;
> >
> > - read_lock(&v->domain->pci_lock);
> > + if ( need_lock )
> > + read_lock(&v->domain->pci_lock);
> >
> > if ( !pdev->vpci || (v->domain != pdev->domain) )
> > {
> > v->vpci.pdev = NULL;
> > - read_unlock(&v->domain->pci_lock);
> > + if ( need_lock )
> > + read_unlock(&v->domain->pci_lock);
> > return false;
> > }
> >
> > @@ -209,17 +211,20 @@ bool vpci_process_pending(struct vcpu *v)
> >
> > if ( rc == -ERESTART )
> > {
> > - read_unlock(&v->domain->pci_lock);
> > + if ( need_lock )
> > + read_unlock(&v->domain->pci_lock);
> > return true;
> > }
> >
> > if ( rc )
> > {
> > - spin_lock(&pdev->vpci->lock);
> > + if ( need_lock )
> > + spin_lock(&pdev->vpci->lock);
> > /* Disable memory decoding unconditionally on failure. */
> > modify_decoding(pdev, v->vpci.cmd & ~PCI_COMMAND_MEMORY,
> > false);
> > - spin_unlock(&pdev->vpci->lock);
> > + if ( need_lock )
> > + spin_unlock(&pdev->vpci->lock);
> >
> > /* Clean all the rangesets */
> > for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
> > @@ -228,7 +233,8 @@ bool vpci_process_pending(struct vcpu *v)
> >
> > v->vpci.pdev = NULL;
> >
> > - read_unlock(&v->domain->pci_lock);
> > + if ( need_lock )
> > + read_unlock(&v->domain->pci_lock);
> >
> > if ( !is_hardware_domain(v->domain) )
> > domain_crash(v->domain);
> > @@ -238,15 +244,23 @@ bool vpci_process_pending(struct vcpu *v)
> > }
> > v->vpci.pdev = NULL;
> >
> > - spin_lock(&pdev->vpci->lock);
> > + if ( need_lock )
> > + spin_lock(&pdev->vpci->lock);
> > modify_decoding(pdev, v->vpci.cmd, v->vpci.rom_only);
> > - spin_unlock(&pdev->vpci->lock);
> > + if ( need_lock )
> > + spin_unlock(&pdev->vpci->lock);
> >
> > - read_unlock(&v->domain->pci_lock);
> > + if ( need_lock )
> > + read_unlock(&v->domain->pci_lock);
> >
> > return false;
> > }
> >
> > +bool vpci_process_pending(struct vcpu *v)
> > +{
> > + return process_pending(v, true);
> > +}
> > +
> > static int __init apply_map(struct domain *d, const struct pci_dev *pdev,
> > uint16_t cmd)
> > {
> > @@ -565,6 +579,8 @@ static void cf_check bar_write(
> > {
> > struct vpci_bar *bar = data;
> > bool hi = false;
> > + bool reenable = false;
> > + uint32_t cmd = 0;
> >
> > ASSERT(is_hardware_domain(pdev->domain));
> >
> > @@ -585,10 +601,31 @@ static void cf_check bar_write(
> > {
> > /* If the value written is the current one avoid printing a warning. */
> > if ( val != (uint32_t)(bar->addr >> (hi ? 32 : 0)) )
> > + {
> > gprintk(XENLOG_WARNING,
> > - "%pp: ignored BAR %zu write while mapped\n",
> > + "%pp: allowing BAR %zu write while mapped\n",
> > &pdev->sbdf, bar - pdev->vpci->header.bars + hi);
> > - return;
> > + ASSERT(rw_is_write_locked(&pdev->domain->pci_lock));
> > + ASSERT(spin_is_locked(&pdev->vpci->lock));
> > + reenable = true;
> > + cmd = pci_conf_read16(pdev->sbdf, PCI_COMMAND);
> > + /*
> > + * Write-while-mapped: unmap the old BAR in p2m. We want this to
> > + * finish right away since the deferral machinery only supports
> > + * unmap OR map, not unmap-then-remap. Ultimately, it probably would
> > + * be better to defer the write-while-mapped case just like regular
> > + * BAR writes (but still only allow it for 32-bit BAR writes).
> > + */
> > + /* Disable memory decoding */
> > + modify_bars(pdev, cmd & ~PCI_COMMAND_MEMORY, false);
> > + /* Call process pending here to ensure P2M operations are done */
> > + while ( process_pending(current, false) )
> > + {
> > + /* Pre-empted, try again */
>
> This seems a tad dangerous. There may be a non-negligible amount of work queued
> up. I also wonder whether the guest can induce spinning by increasing
> contention on the p2m (e.g: via ballooning) or by induces work being queued up.
>
> I don't quite understand the logic, but I suspect you could
> raise_softirq(SCHEDULE_SOFTIRQ), decrease the IP so the instruction is
> replayed, release the locks, and simply exit the hypervisor.
There's no instruction replay in this case, instead the vCPU is
prevented from resuming operation until the deferred pending operation
is finished. This is done by preventing hvm_do_resume() from
succeeding as long as vpci_process_pending() return true.
> The system ought
> to naturally split the operation in several continuations each of which does
> either unmapping or mapping if it couldn't be done in a single one. Replaying
> the instruction after decoding is disabled ought to be benign.
IMO the main issue is how to signal which operations are pending, so
they can be executed in the deferred context. `struct vpci_vcpu` is
used to hold a single pending operation.
> I haven't tried any of what I just wrote, so take it with with several tons of
> salt though.
>
> Do you know if Linux intentionally skips disabling decode? Or is it a bug?
For 32bit BARs it's possible to atomically reposition them with a
single atomic write, so Linux will try to do that. For 64bit BARs
this is not possible (as it involves writes to two registers), and
hence Linux won't usually attempt to atomically reposition 64bit BARs.
Thanks, Roger.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] [RFC] vpci: allow BAR write while mapped
2025-03-12 19:50 [PATCH] [RFC] vpci: allow BAR write while mapped Stewart Hildebrand
2025-03-13 15:14 ` Alejandro Vallejo
@ 2025-03-13 17:48 ` Roger Pau Monné
2025-03-13 18:28 ` Stewart Hildebrand
2025-03-14 8:04 ` Jan Beulich
1 sibling, 2 replies; 12+ messages in thread
From: Roger Pau Monné @ 2025-03-13 17:48 UTC (permalink / raw)
To: Stewart Hildebrand; +Cc: xen-devel
On Wed, Mar 12, 2025 at 03:50:17PM -0400, Stewart Hildebrand wrote:
> Xen vPCI refuses BAR writes if the BAR is mapped in p2m. If firmware
> initialized the BAR to a bad address, Linux will try to write a new
> address to the BAR without disabling memory decoding. Allow the write
> by updating p2m right away in the vPCI BAR write handler.
IIRC it's only 32bit BARs that Linux will attempt to reposition
without toggling memory decoding off. For 64bit BARs it will in
general (unless pci_dev->mmio_always_on is set) toggle memory decoding
off and then update the BAR registers.
>
> Resolves: https://gitlab.com/xen-project/xen/-/issues/197
> Signed-off-by: Stewart Hildebrand <stewart.hildebrand@amd.com>
> ---
> RFC: Currently the deferred mapping machinery supports only map or
> unmap, not both. It might be better to rework the mapping machinery
> to support unmap-then-map operations, but please let me know your
> thoughts.
> RFC: This patch has not yet made an attempt to distinguish between
> 32-bit and 64-bit writes. It probably should.
> ---
> xen/drivers/vpci/header.c | 65 +++++++++++++++++++++++++++++++--------
> 1 file changed, 53 insertions(+), 12 deletions(-)
>
> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
> index ef6c965c081c..66adb2183cfe 100644
> --- a/xen/drivers/vpci/header.c
> +++ b/xen/drivers/vpci/header.c
> @@ -173,7 +173,7 @@ static void modify_decoding(const struct pci_dev *pdev, uint16_t cmd,
> ASSERT_UNREACHABLE();
> }
>
> -bool vpci_process_pending(struct vcpu *v)
> +static bool process_pending(struct vcpu *v, bool need_lock)
> {
> struct pci_dev *pdev = v->vpci.pdev;
> struct vpci_header *header = NULL;
> @@ -182,12 +182,14 @@ bool vpci_process_pending(struct vcpu *v)
> if ( !pdev )
> return false;
>
> - read_lock(&v->domain->pci_lock);
> + if ( need_lock )
> + read_lock(&v->domain->pci_lock);
The addition of need_lock would better be done in a pre-patch.
>
> if ( !pdev->vpci || (v->domain != pdev->domain) )
> {
> v->vpci.pdev = NULL;
> - read_unlock(&v->domain->pci_lock);
> + if ( need_lock )
> + read_unlock(&v->domain->pci_lock);
> return false;
> }
>
> @@ -209,17 +211,20 @@ bool vpci_process_pending(struct vcpu *v)
>
> if ( rc == -ERESTART )
> {
> - read_unlock(&v->domain->pci_lock);
> + if ( need_lock )
> + read_unlock(&v->domain->pci_lock);
> return true;
> }
>
> if ( rc )
> {
> - spin_lock(&pdev->vpci->lock);
> + if ( need_lock )
> + spin_lock(&pdev->vpci->lock);
> /* Disable memory decoding unconditionally on failure. */
> modify_decoding(pdev, v->vpci.cmd & ~PCI_COMMAND_MEMORY,
> false);
> - spin_unlock(&pdev->vpci->lock);
> + if ( need_lock )
> + spin_unlock(&pdev->vpci->lock);
>
> /* Clean all the rangesets */
> for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
> @@ -228,7 +233,8 @@ bool vpci_process_pending(struct vcpu *v)
>
> v->vpci.pdev = NULL;
>
> - read_unlock(&v->domain->pci_lock);
> + if ( need_lock )
> + read_unlock(&v->domain->pci_lock);
>
> if ( !is_hardware_domain(v->domain) )
> domain_crash(v->domain);
> @@ -238,15 +244,23 @@ bool vpci_process_pending(struct vcpu *v)
> }
> v->vpci.pdev = NULL;
>
> - spin_lock(&pdev->vpci->lock);
> + if ( need_lock )
> + spin_lock(&pdev->vpci->lock);
> modify_decoding(pdev, v->vpci.cmd, v->vpci.rom_only);
> - spin_unlock(&pdev->vpci->lock);
> + if ( need_lock )
> + spin_unlock(&pdev->vpci->lock);
>
> - read_unlock(&v->domain->pci_lock);
> + if ( need_lock )
> + read_unlock(&v->domain->pci_lock);
>
> return false;
> }
>
> +bool vpci_process_pending(struct vcpu *v)
> +{
> + return process_pending(v, true);
> +}
> +
> static int __init apply_map(struct domain *d, const struct pci_dev *pdev,
> uint16_t cmd)
> {
> @@ -565,6 +579,8 @@ static void cf_check bar_write(
> {
> struct vpci_bar *bar = data;
> bool hi = false;
> + bool reenable = false;
> + uint32_t cmd = 0;
>
> ASSERT(is_hardware_domain(pdev->domain));
>
> @@ -585,10 +601,31 @@ static void cf_check bar_write(
> {
> /* If the value written is the current one avoid printing a warning. */
> if ( val != (uint32_t)(bar->addr >> (hi ? 32 : 0)) )
> + {
> gprintk(XENLOG_WARNING,
> - "%pp: ignored BAR %zu write while mapped\n",
> + "%pp: allowing BAR %zu write while mapped\n",
> &pdev->sbdf, bar - pdev->vpci->header.bars + hi);
If Xen now handles BARs writes with memory decoding enabled the
message can be removed. It's only purpose was to signal this missing
handling.
> - return;
> + ASSERT(rw_is_write_locked(&pdev->domain->pci_lock));
> + ASSERT(spin_is_locked(&pdev->vpci->lock));
> + reenable = true;
> + cmd = pci_conf_read16(pdev->sbdf, PCI_COMMAND);
> + /*
> + * Write-while-mapped: unmap the old BAR in p2m. We want this to
> + * finish right away since the deferral machinery only supports
> + * unmap OR map, not unmap-then-remap. Ultimately, it probably would
> + * be better to defer the write-while-mapped case just like regular
> + * BAR writes (but still only allow it for 32-bit BAR writes).
> + */
> + /* Disable memory decoding */
> + modify_bars(pdev, cmd & ~PCI_COMMAND_MEMORY, false);
I think if the guest explicitly avoids toggling memory decoding Xen
should also to the same, and not touch the bit.
> + /* Call process pending here to ensure P2M operations are done */
> + while ( process_pending(current, false) )
> + {
> + /* Pre-empted, try again */
> + }
I'm afraid this is not how I would expect this to be done. We
explicitly do the p2m modifications in a deferred context to avoid
long running operations. We should continue to do so to perform this
unmap + map operation.
I think you need to introduce a way to queue an operation that will do
a map + unmap in the deferred context processing, or signal that after
the currently queued operation finishes a new call to modify_bars()
should be issued.
It would be nice if we had a more generic way to queue guest vCPU p2m
(map and unmap) operations, but that's likely to require a much better
interface than what we currently have.
Thanks, Roger.
^ permalink raw reply [flat|nested] 12+ messages in thread* Re: [PATCH] [RFC] vpci: allow BAR write while mapped
2025-03-13 17:48 ` Roger Pau Monné
@ 2025-03-13 18:28 ` Stewart Hildebrand
2025-03-14 8:04 ` Jan Beulich
1 sibling, 0 replies; 12+ messages in thread
From: Stewart Hildebrand @ 2025-03-13 18:28 UTC (permalink / raw)
To: Roger Pau Monné; +Cc: xen-devel
On 3/13/25 13:48, Roger Pau Monné wrote:
> On Wed, Mar 12, 2025 at 03:50:17PM -0400, Stewart Hildebrand wrote:
>> Xen vPCI refuses BAR writes if the BAR is mapped in p2m. If firmware
>> initialized the BAR to a bad address, Linux will try to write a new
>> address to the BAR without disabling memory decoding. Allow the write
>> by updating p2m right away in the vPCI BAR write handler.
>
> IIRC it's only 32bit BARs that Linux will attempt to reposition
> without toggling memory decoding off.
This matches my observations as well.
> For 64bit BARs it will in
> general (unless pci_dev->mmio_always_on is set) toggle memory decoding
> off and then update the BAR registers.
>
>>
>> Resolves: https://gitlab.com/xen-project/xen/-/issues/197
>> Signed-off-by: Stewart Hildebrand <stewart.hildebrand@amd.com>
>> ---
>> RFC: Currently the deferred mapping machinery supports only map or
>> unmap, not both. It might be better to rework the mapping machinery
>> to support unmap-then-map operations, but please let me know your
>> thoughts.
>> RFC: This patch has not yet made an attempt to distinguish between
>> 32-bit and 64-bit writes. It probably should.
>> ---
>> xen/drivers/vpci/header.c | 65 +++++++++++++++++++++++++++++++--------
>> 1 file changed, 53 insertions(+), 12 deletions(-)
>>
>> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
>> index ef6c965c081c..66adb2183cfe 100644
>> --- a/xen/drivers/vpci/header.c
>> +++ b/xen/drivers/vpci/header.c
>> @@ -173,7 +173,7 @@ static void modify_decoding(const struct pci_dev *pdev, uint16_t cmd,
>> ASSERT_UNREACHABLE();
>> }
>>
>> -bool vpci_process_pending(struct vcpu *v)
>> +static bool process_pending(struct vcpu *v, bool need_lock)
>> {
>> struct pci_dev *pdev = v->vpci.pdev;
>> struct vpci_header *header = NULL;
>> @@ -182,12 +182,14 @@ bool vpci_process_pending(struct vcpu *v)
>> if ( !pdev )
>> return false;
>>
>> - read_lock(&v->domain->pci_lock);
>> + if ( need_lock )
>> + read_lock(&v->domain->pci_lock);
>
> The addition of need_lock would better be done in a pre-patch.
>
>>
>> if ( !pdev->vpci || (v->domain != pdev->domain) )
>> {
>> v->vpci.pdev = NULL;
>> - read_unlock(&v->domain->pci_lock);
>> + if ( need_lock )
>> + read_unlock(&v->domain->pci_lock);
>> return false;
>> }
>>
>> @@ -209,17 +211,20 @@ bool vpci_process_pending(struct vcpu *v)
>>
>> if ( rc == -ERESTART )
>> {
>> - read_unlock(&v->domain->pci_lock);
>> + if ( need_lock )
>> + read_unlock(&v->domain->pci_lock);
>> return true;
>> }
>>
>> if ( rc )
>> {
>> - spin_lock(&pdev->vpci->lock);
>> + if ( need_lock )
>> + spin_lock(&pdev->vpci->lock);
>> /* Disable memory decoding unconditionally on failure. */
>> modify_decoding(pdev, v->vpci.cmd & ~PCI_COMMAND_MEMORY,
>> false);
>> - spin_unlock(&pdev->vpci->lock);
>> + if ( need_lock )
>> + spin_unlock(&pdev->vpci->lock);
>>
>> /* Clean all the rangesets */
>> for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
>> @@ -228,7 +233,8 @@ bool vpci_process_pending(struct vcpu *v)
>>
>> v->vpci.pdev = NULL;
>>
>> - read_unlock(&v->domain->pci_lock);
>> + if ( need_lock )
>> + read_unlock(&v->domain->pci_lock);
>>
>> if ( !is_hardware_domain(v->domain) )
>> domain_crash(v->domain);
>> @@ -238,15 +244,23 @@ bool vpci_process_pending(struct vcpu *v)
>> }
>> v->vpci.pdev = NULL;
>>
>> - spin_lock(&pdev->vpci->lock);
>> + if ( need_lock )
>> + spin_lock(&pdev->vpci->lock);
>> modify_decoding(pdev, v->vpci.cmd, v->vpci.rom_only);
>> - spin_unlock(&pdev->vpci->lock);
>> + if ( need_lock )
>> + spin_unlock(&pdev->vpci->lock);
>>
>> - read_unlock(&v->domain->pci_lock);
>> + if ( need_lock )
>> + read_unlock(&v->domain->pci_lock);
>>
>> return false;
>> }
>>
>> +bool vpci_process_pending(struct vcpu *v)
>> +{
>> + return process_pending(v, true);
>> +}
>> +
>> static int __init apply_map(struct domain *d, const struct pci_dev *pdev,
>> uint16_t cmd)
>> {
>> @@ -565,6 +579,8 @@ static void cf_check bar_write(
>> {
>> struct vpci_bar *bar = data;
>> bool hi = false;
>> + bool reenable = false;
>> + uint32_t cmd = 0;
>>
>> ASSERT(is_hardware_domain(pdev->domain));
>>
>> @@ -585,10 +601,31 @@ static void cf_check bar_write(
>> {
>> /* If the value written is the current one avoid printing a warning. */
>> if ( val != (uint32_t)(bar->addr >> (hi ? 32 : 0)) )
>> + {
>> gprintk(XENLOG_WARNING,
>> - "%pp: ignored BAR %zu write while mapped\n",
>> + "%pp: allowing BAR %zu write while mapped\n",
>> &pdev->sbdf, bar - pdev->vpci->header.bars + hi);
>
> If Xen now handles BARs writes with memory decoding enabled the
> message can be removed. It's only purpose was to signal this missing
> handling.
OK
>> - return;
>> + ASSERT(rw_is_write_locked(&pdev->domain->pci_lock));
>> + ASSERT(spin_is_locked(&pdev->vpci->lock));
>> + reenable = true;
>> + cmd = pci_conf_read16(pdev->sbdf, PCI_COMMAND);
>> + /*
>> + * Write-while-mapped: unmap the old BAR in p2m. We want this to
>> + * finish right away since the deferral machinery only supports
>> + * unmap OR map, not unmap-then-remap. Ultimately, it probably would
>> + * be better to defer the write-while-mapped case just like regular
>> + * BAR writes (but still only allow it for 32-bit BAR writes).
>> + */
>> + /* Disable memory decoding */
>> + modify_bars(pdev, cmd & ~PCI_COMMAND_MEMORY, false);
>
> I think if the guest explicitly avoids toggling memory decoding Xen
> should also to the same, and not touch the bit.
OK
>> + /* Call process pending here to ensure P2M operations are done */
>> + while ( process_pending(current, false) )
>> + {
>> + /* Pre-empted, try again */
>> + }
>
> I'm afraid this is not how I would expect this to be done. We
> explicitly do the p2m modifications in a deferred context to avoid
> long running operations. We should continue to do so to perform this
> unmap + map operation.
>
> I think you need to introduce a way to queue an operation that will do
> a map + unmap in the deferred context processing, or signal that after
> the currently queued operation finishes a new call to modify_bars()
> should be issued.
Yep, this makes sense. Will do.
>
> It would be nice if we had a more generic way to queue guest vCPU p2m
> (map and unmap) operations, but that's likely to require a much better
> interface than what we currently have.
>
> Thanks, Roger.
^ permalink raw reply [flat|nested] 12+ messages in thread* Re: [PATCH] [RFC] vpci: allow BAR write while mapped
2025-03-13 17:48 ` Roger Pau Monné
2025-03-13 18:28 ` Stewart Hildebrand
@ 2025-03-14 8:04 ` Jan Beulich
2025-03-14 8:21 ` Roger Pau Monné
1 sibling, 1 reply; 12+ messages in thread
From: Jan Beulich @ 2025-03-14 8:04 UTC (permalink / raw)
To: Roger Pau Monné; +Cc: xen-devel, Stewart Hildebrand
On 13.03.2025 18:48, Roger Pau Monné wrote:
> On Wed, Mar 12, 2025 at 03:50:17PM -0400, Stewart Hildebrand wrote:
>> @@ -585,10 +601,31 @@ static void cf_check bar_write(
>> {
>> /* If the value written is the current one avoid printing a warning. */
>> if ( val != (uint32_t)(bar->addr >> (hi ? 32 : 0)) )
>> + {
>> gprintk(XENLOG_WARNING,
>> - "%pp: ignored BAR %zu write while mapped\n",
>> + "%pp: allowing BAR %zu write while mapped\n",
>> &pdev->sbdf, bar - pdev->vpci->header.bars + hi);
>
> If Xen now handles BARs writes with memory decoding enabled the
> message can be removed. It's only purpose was to signal this missing
> handling.
>
>> - return;
>> + ASSERT(rw_is_write_locked(&pdev->domain->pci_lock));
>> + ASSERT(spin_is_locked(&pdev->vpci->lock));
>> + reenable = true;
>> + cmd = pci_conf_read16(pdev->sbdf, PCI_COMMAND);
>> + /*
>> + * Write-while-mapped: unmap the old BAR in p2m. We want this to
>> + * finish right away since the deferral machinery only supports
>> + * unmap OR map, not unmap-then-remap. Ultimately, it probably would
>> + * be better to defer the write-while-mapped case just like regular
>> + * BAR writes (but still only allow it for 32-bit BAR writes).
>> + */
>> + /* Disable memory decoding */
>> + modify_bars(pdev, cmd & ~PCI_COMMAND_MEMORY, false);
>
> I think if the guest explicitly avoids toggling memory decoding Xen
> should also to the same, and not touch the bit.
For Dom0 I'm inclined to agree, but for DomU-s it may be unsafe to do so.
(You may have meant it like this, but you said "guest".)
Jan
^ permalink raw reply [flat|nested] 12+ messages in thread* Re: [PATCH] [RFC] vpci: allow BAR write while mapped
2025-03-14 8:04 ` Jan Beulich
@ 2025-03-14 8:21 ` Roger Pau Monné
2025-03-14 8:26 ` Jan Beulich
0 siblings, 1 reply; 12+ messages in thread
From: Roger Pau Monné @ 2025-03-14 8:21 UTC (permalink / raw)
To: Jan Beulich; +Cc: xen-devel, Stewart Hildebrand
On Fri, Mar 14, 2025 at 09:04:55AM +0100, Jan Beulich wrote:
> On 13.03.2025 18:48, Roger Pau Monné wrote:
> > On Wed, Mar 12, 2025 at 03:50:17PM -0400, Stewart Hildebrand wrote:
> >> @@ -585,10 +601,31 @@ static void cf_check bar_write(
> >> {
> >> /* If the value written is the current one avoid printing a warning. */
> >> if ( val != (uint32_t)(bar->addr >> (hi ? 32 : 0)) )
> >> + {
> >> gprintk(XENLOG_WARNING,
> >> - "%pp: ignored BAR %zu write while mapped\n",
> >> + "%pp: allowing BAR %zu write while mapped\n",
> >> &pdev->sbdf, bar - pdev->vpci->header.bars + hi);
> >
> > If Xen now handles BARs writes with memory decoding enabled the
> > message can be removed. It's only purpose was to signal this missing
> > handling.
> >
> >> - return;
> >> + ASSERT(rw_is_write_locked(&pdev->domain->pci_lock));
> >> + ASSERT(spin_is_locked(&pdev->vpci->lock));
> >> + reenable = true;
> >> + cmd = pci_conf_read16(pdev->sbdf, PCI_COMMAND);
> >> + /*
> >> + * Write-while-mapped: unmap the old BAR in p2m. We want this to
> >> + * finish right away since the deferral machinery only supports
> >> + * unmap OR map, not unmap-then-remap. Ultimately, it probably would
> >> + * be better to defer the write-while-mapped case just like regular
> >> + * BAR writes (but still only allow it for 32-bit BAR writes).
> >> + */
> >> + /* Disable memory decoding */
> >> + modify_bars(pdev, cmd & ~PCI_COMMAND_MEMORY, false);
> >
> > I think if the guest explicitly avoids toggling memory decoding Xen
> > should also to the same, and not touch the bit.
>
> For Dom0 I'm inclined to agree, but for DomU-s it may be unsafe to do so.
> (You may have meant it like this, but you said "guest".)
Sorry, I'm not sure I'm following. For domUs the BAR register write
is not propagated to the hardware, it's just the p2m mappings that
change, and hence it's even safer to not toggle the memory decoding
bit for that case? (as there's no write to the device BAR registers
for domUs).
Maybe there's some aspect I'm missing.
Thanks, Roger.
^ permalink raw reply [flat|nested] 12+ messages in thread* Re: [PATCH] [RFC] vpci: allow BAR write while mapped
2025-03-14 8:21 ` Roger Pau Monné
@ 2025-03-14 8:26 ` Jan Beulich
0 siblings, 0 replies; 12+ messages in thread
From: Jan Beulich @ 2025-03-14 8:26 UTC (permalink / raw)
To: Roger Pau Monné; +Cc: xen-devel, Stewart Hildebrand
On 14.03.2025 09:21, Roger Pau Monné wrote:
> On Fri, Mar 14, 2025 at 09:04:55AM +0100, Jan Beulich wrote:
>> On 13.03.2025 18:48, Roger Pau Monné wrote:
>>> On Wed, Mar 12, 2025 at 03:50:17PM -0400, Stewart Hildebrand wrote:
>>>> @@ -585,10 +601,31 @@ static void cf_check bar_write(
>>>> {
>>>> /* If the value written is the current one avoid printing a warning. */
>>>> if ( val != (uint32_t)(bar->addr >> (hi ? 32 : 0)) )
>>>> + {
>>>> gprintk(XENLOG_WARNING,
>>>> - "%pp: ignored BAR %zu write while mapped\n",
>>>> + "%pp: allowing BAR %zu write while mapped\n",
>>>> &pdev->sbdf, bar - pdev->vpci->header.bars + hi);
>>>
>>> If Xen now handles BARs writes with memory decoding enabled the
>>> message can be removed. It's only purpose was to signal this missing
>>> handling.
>>>
>>>> - return;
>>>> + ASSERT(rw_is_write_locked(&pdev->domain->pci_lock));
>>>> + ASSERT(spin_is_locked(&pdev->vpci->lock));
>>>> + reenable = true;
>>>> + cmd = pci_conf_read16(pdev->sbdf, PCI_COMMAND);
>>>> + /*
>>>> + * Write-while-mapped: unmap the old BAR in p2m. We want this to
>>>> + * finish right away since the deferral machinery only supports
>>>> + * unmap OR map, not unmap-then-remap. Ultimately, it probably would
>>>> + * be better to defer the write-while-mapped case just like regular
>>>> + * BAR writes (but still only allow it for 32-bit BAR writes).
>>>> + */
>>>> + /* Disable memory decoding */
>>>> + modify_bars(pdev, cmd & ~PCI_COMMAND_MEMORY, false);
>>>
>>> I think if the guest explicitly avoids toggling memory decoding Xen
>>> should also to the same, and not touch the bit.
>>
>> For Dom0 I'm inclined to agree, but for DomU-s it may be unsafe to do so.
>> (You may have meant it like this, but you said "guest".)
>
> Sorry, I'm not sure I'm following. For domUs the BAR register write
> is not propagated to the hardware, it's just the p2m mappings that
> change, and hence it's even safer to not toggle the memory decoding
> bit for that case? (as there's no write to the device BAR registers
> for domUs).
Oh, right. I'm sorry for the noise.
Jan
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2025-03-17 13:49 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-03-12 19:50 [PATCH] [RFC] vpci: allow BAR write while mapped Stewart Hildebrand
2025-03-13 15:14 ` Alejandro Vallejo
2025-03-13 17:43 ` Stewart Hildebrand
2025-03-14 10:39 ` Alejandro Vallejo
2025-03-17 12:32 ` Roger Pau Monné
2025-03-17 13:49 ` Alejandro Vallejo
2025-03-13 17:58 ` Roger Pau Monné
2025-03-13 17:48 ` Roger Pau Monné
2025-03-13 18:28 ` Stewart Hildebrand
2025-03-14 8:04 ` Jan Beulich
2025-03-14 8:21 ` Roger Pau Monné
2025-03-14 8:26 ` Jan Beulich
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.