“Backend has not unmapped grant” errors

All of lore.kernel.org
 help / color / mirror / Atom feed

* “Backend has not unmapped grant” errors
@ 2022-08-23  7:40 Demi Marie Obenour
  2022-08-23  7:48 ` Juergen Gross
  0 siblings, 1 reply; 18+ messages in thread
From: Demi Marie Obenour @ 2022-08-23  7:40 UTC (permalink / raw)
  To: Xen developer discussion; +Cc: Marek Marczykowski-Górecki

[-- Attachment #1: Type: text/plain, Size: 1376 bytes --]

I recently had a VM’s /dev/xvdb stop working with a “backend has not
unmapped grant” error.  Since /dev/xvdb was the VM’s private volume,
that rendered the VM effectively useless.  I had to kill it with
qvm-kill.

The backend of /dev/xvdb is dom0, so a malicious backend is clearly not
the cause of this.  I believe the actual cause is a race condition, such
as the following:

1. GUI agent in VM allocates grant X.
2. GUI agent tells GUI daemon in dom0 to map X.
3. GUI agent frees grant X.
4. blkfront allocates grant X and passes it to dom0.
5. dom0’s blkback maps grant X.
6. blkback unmaps grant X.
7. GUI daemon maps grant X.
8. blkfront tries to revoke access to grant X and fails.  Disaster
   ensues.

What could be done to prevent this race?  Right now all of the
approaches I can think of are horribly backwards-incompatible.  They
require replacing grant IDs with some sort of handle, and requiring
userspace to pass these handles to ioctls.  It is also possible that
netfront and blkfront could race against each other in a way that causes
this, though I suspect that race would be much harder to trigger.

This has happened more than once so it is not a fluke due to e.g. cosmic
rays or other random bit-flips.

Marek, do you have any suggestions?
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: “Backend has not unmapped grant” errors
  2022-08-23  7:40 “Backend has not unmapped grant” errors Demi Marie Obenour
@ 2022-08-23  7:48 ` Juergen Gross
  2022-08-24  0:20   ` Marek Marczykowski-Górecki
  0 siblings, 1 reply; 18+ messages in thread
From: Juergen Gross @ 2022-08-23  7:48 UTC (permalink / raw)
  To: Xen developer discussion, Marek Marczykowski-Górecki


[-- Attachment #1.1.1: Type: text/plain, Size: 1704 bytes --]

On 23.08.22 09:40, Demi Marie Obenour wrote:
> I recently had a VM’s /dev/xvdb stop working with a “backend has not
> unmapped grant” error.  Since /dev/xvdb was the VM’s private volume,
> that rendered the VM effectively useless.  I had to kill it with
> qvm-kill.
> 
> The backend of /dev/xvdb is dom0, so a malicious backend is clearly not
> the cause of this.  I believe the actual cause is a race condition, such
> as the following:
> 
> 1. GUI agent in VM allocates grant X.
> 2. GUI agent tells GUI daemon in dom0 to map X.
> 3. GUI agent frees grant X.
> 4. blkfront allocates grant X and passes it to dom0.
> 5. dom0’s blkback maps grant X.
> 6. blkback unmaps grant X.
> 7. GUI daemon maps grant X.
> 8. blkfront tries to revoke access to grant X and fails.  Disaster
>     ensues.
> 
> What could be done to prevent this race?  Right now all of the
> approaches I can think of are horribly backwards-incompatible.  They
> require replacing grant IDs with some sort of handle, and requiring
> userspace to pass these handles to ioctls.  It is also possible that
> netfront and blkfront could race against each other in a way that causes
> this, though I suspect that race would be much harder to trigger.
> 
> This has happened more than once so it is not a fluke due to e.g. cosmic
> rays or other random bit-flips.
> 
> Marek, do you have any suggestions?

To me that sounds like the interface of the GUI is the culprit.

The GUI agent in the guest should only free a grant, if it got a message
from the backend that it can do so. Just assuming to be able to free it
because it isn't in use currently is the broken assumption here.


Juergen

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3149 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: “Backend has not unmapped grant” errors
  2022-08-23  7:48 ` Juergen Gross
@ 2022-08-24  0:20   ` Marek Marczykowski-Górecki
  2022-08-24  6:02     ` Juergen Gross
  2022-08-24  6:11     ` Juergen Gross
  0 siblings, 2 replies; 18+ messages in thread
From: Marek Marczykowski-Górecki @ 2022-08-24  0:20 UTC (permalink / raw)
  To: Juergen Gross; +Cc: Xen developer discussion

[-- Attachment #1: Type: text/plain, Size: 2882 bytes --]

On Tue, Aug 23, 2022 at 09:48:57AM +0200, Juergen Gross wrote:
> On 23.08.22 09:40, Demi Marie Obenour wrote:
> > I recently had a VM’s /dev/xvdb stop working with a “backend has not
> > unmapped grant” error.  Since /dev/xvdb was the VM’s private volume,
> > that rendered the VM effectively useless.  I had to kill it with
> > qvm-kill.
> > 
> > The backend of /dev/xvdb is dom0, so a malicious backend is clearly not
> > the cause of this.  I believe the actual cause is a race condition, such
> > as the following:
> > 
> > 1. GUI agent in VM allocates grant X.
> > 2. GUI agent tells GUI daemon in dom0 to map X.
> > 3. GUI agent frees grant X.
> > 4. blkfront allocates grant X and passes it to dom0.
> > 5. dom0’s blkback maps grant X.
> > 6. blkback unmaps grant X.
> > 7. GUI daemon maps grant X.
> > 8. blkfront tries to revoke access to grant X and fails.  Disaster
> >     ensues.
> > 
> > What could be done to prevent this race?  Right now all of the
> > approaches I can think of are horribly backwards-incompatible.  They
> > require replacing grant IDs with some sort of handle, and requiring
> > userspace to pass these handles to ioctls.  It is also possible that
> > netfront and blkfront could race against each other in a way that causes
> > this, though I suspect that race would be much harder to trigger.
> > 
> > This has happened more than once so it is not a fluke due to e.g. cosmic
> > rays or other random bit-flips.
> > 
> > Marek, do you have any suggestions?
> 
> To me that sounds like the interface of the GUI is the culprit.
> 
> The GUI agent in the guest should only free a grant, if it got a message
> from the backend that it can do so. Just assuming to be able to free it
> because it isn't in use currently is the broken assumption here.

FWIW, I hit this issue twice already in this week CI run, while it never
happened before. The difference compared to previous run is Linux
5.15.57 vs 5.15.61. The latter reports persistent grants disabled. The
only related commits I see there are three commits indeed related to
persistent grants:

  c98e956ef489 xen-blkfront: Apply 'feature_persistent' parameter when connect
  ef26b5d530d4 xen-blkback: Apply 'feature_persistent' parameter when connect
  7304be4c985d xen-blkback: fix persistent grants negotiation

But none of the commit messages suggests intentional disabling it
without explicit request for doing so. I did not requested disabling it
in toolstack (although I have set backend as "trusted" - XSA-403).
I have confirmed it's the frontend version that matters. Running older
frontend kernel with 5.15.61 backend results in persistent grants
enabled (and both frontend and backend xenstore "feature-persistent"
entries are "1" in this case).

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: “Backend has not unmapped grant” errors
  2022-08-24  0:20   ` Marek Marczykowski-Górecki
@ 2022-08-24  6:02     ` Juergen Gross
  2022-08-24  6:30       ` Jan Beulich
  2022-08-24 17:44       ` SeongJae Park
  2022-08-24  6:11     ` Juergen Gross
  1 sibling, 2 replies; 18+ messages in thread
From: Juergen Gross @ 2022-08-24  6:02 UTC (permalink / raw)
  To: Marek Marczykowski-Górecki
  Cc: Xen developer discussion, SeongJae Park, Maximilian Heyne

[-- Attachment #1.1.1: Type: text/plain, Size: 4312 bytes --]

On 24.08.22 02:20, Marek Marczykowski-Górecki wrote:
> FWIW, I hit this issue twice already in this week CI run, while it never
> happened before. The difference compared to previous run is Linux
> 5.15.57 vs 5.15.61. The latter reports persistent grants disabled. The
> only related commits I see there are three commits indeed related to
> persistent grants:
> 
>    c98e956ef489 xen-blkfront: Apply 'feature_persistent' parameter when connect
>    ef26b5d530d4 xen-blkback: Apply 'feature_persistent' parameter when connect
>    7304be4c985d xen-blkback: fix persistent grants negotiation
> 
> But none of the commit messages suggests intentional disabling it
> without explicit request for doing so. I did not requested disabling it
> in toolstack (although I have set backend as "trusted" - XSA-403).
> I have confirmed it's the frontend version that matters. Running older
> frontend kernel with 5.15.61 backend results in persistent grants
> enabled (and both frontend and backend xenstore "feature-persistent"
> entries are "1" in this case).

This is a mess.

I think the main problem seems to be that the feature negotiation process
isn't specified in a sane way.

 From the blkif.h header:

Backend-side:
  * feature-persistent
  *      Values:         0/1 (boolean)
  *      Default Value:  0
  *      Notes: 7
  *
  *      A value of "1" indicates that the backend can keep the grants used
  *      by the frontend driver mapped, so the same set of grants should be
  *      used in all transactions. The maximum number of grants the backend
  *      can map persistently depends on the implementation, but ideally it
  *      should be RING_SIZE * BLKIF_MAX_SEGMENTS_PER_REQUEST. Using this
  *      feature the backend doesn't need to unmap each grant, preventing
  *      costly TLB flushes. The backend driver should only map grants
  *      persistently if the frontend supports it. If a backend driver chooses
  *      to use the persistent protocol when the frontend doesn't support it,
  *      it will probably hit the maximum number of persistently mapped grants
  *      (due to the fact that the frontend won't be reusing the same grants),
  *      and fall back to non-persistent mode. Backend implementations may
  *      shrink or expand the number of persistently mapped grants without
  *      notifying the frontend depending on memory constraints (this might
  *      cause a performance degradation).

Frontend-side:
  * feature-persistent
  *      Values:         0/1 (boolean)
  *      Default Value:  0
  *      Notes: 7, 8, 9
  *
  *      A value of "1" indicates that the frontend will reuse the same grants
  *      for all transactions, allowing the backend to map them with write
  *      access (even when it should be read-only). If the frontend hits the
  *      maximum number of allowed persistently mapped grants, it can fallback
  *      to non persistent mode. This will cause a performance degradation,
  *      since the the backend driver will still try to map those grants
  *      persistently. Since the persistent grants protocol is compatible with
  *      the previous protocol, a frontend driver can choose to work in
  *      persistent mode even when the backend doesn't support it.

Those definitions don't make clear, which side is the one to decide whether
the feature should be used or not. In my understanding the related drivers
should just advertise their setting (the _ability_ to use the feature), and
it should be used only if both sides have written a "1".

With above patches applied, the frontend will set 'feature-persistent' in
Xenstore only, if the backend has done so, but the backend will set it
only, if the frontend has done it. This results in persistent grants
always being disabled.

This is wrong, as the value written should not reflect the current state
of the interface. That state should be set according to both sides' value,
probably a cached one on the blkback side (using a new flag for caching it,
not the current state).

The blkif.h comments should be updated to make it clear that the values in
Xenstore don't reflect the state of the connection, but the availability of
the feature in the related driver.

Comments?

Juergen

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3149 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: “Backend has not unmapped grant” errors
  2022-08-24  6:02     ` Juergen Gross
@ 2022-08-24  6:30       ` Jan Beulich
  2022-08-24  6:36         ` Juergen Gross
  2022-08-24 17:44       ` SeongJae Park
  1 sibling, 1 reply; 18+ messages in thread
From: Jan Beulich @ 2022-08-24  6:30 UTC (permalink / raw)
  To: Juergen Gross
  Cc: Xen developer discussion, SeongJae Park, Maximilian Heyne,
	Marek Marczykowski-Górecki

On 24.08.2022 08:02, Juergen Gross wrote:
> The blkif.h comments should be updated to make it clear that the values in
> Xenstore don't reflect the state of the connection, but the availability of
> the feature in the related driver.

Isn't that implied for all the feature-* leaves? I certainly don't mind it
being spelled out, but I don't think there's any real ambiguity here.

Jan


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: “Backend has not unmapped grant” errors
  2022-08-24  6:30       ` Jan Beulich
@ 2022-08-24  6:36         ` Juergen Gross
  2022-08-24  6:40           ` Jan Beulich
  0 siblings, 1 reply; 18+ messages in thread
From: Juergen Gross @ 2022-08-24  6:36 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Xen developer discussion, SeongJae Park, Maximilian Heyne,
	Marek Marczykowski-Górecki


[-- Attachment #1.1.1: Type: text/plain, Size: 760 bytes --]

On 24.08.22 08:30, Jan Beulich wrote:
> On 24.08.2022 08:02, Juergen Gross wrote:
>> The blkif.h comments should be updated to make it clear that the values in
>> Xenstore don't reflect the state of the connection, but the availability of
>> the feature in the related driver.
> 
> Isn't that implied for all the feature-* leaves? I certainly don't mind it
> being spelled out, but I don't think there's any real ambiguity here.

I think it should spelled out explicitly, maybe in the general paragraph
about feature negotiation.

To me especially the phrasing on the frontend side was reading as if a "1"
would indicate the feature to be actively used:

"A value of "1" indicates that the frontend will reuse the same grants ..."


Juergen

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3149 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: “Backend has not unmapped grant” errors
  2022-08-24  6:36         ` Juergen Gross
@ 2022-08-24  6:40           ` Jan Beulich
  0 siblings, 0 replies; 18+ messages in thread
From: Jan Beulich @ 2022-08-24  6:40 UTC (permalink / raw)
  To: Juergen Gross
  Cc: Xen developer discussion, SeongJae Park, Maximilian Heyne,
	Marek Marczykowski-Górecki

On 24.08.2022 08:36, Juergen Gross wrote:
> On 24.08.22 08:30, Jan Beulich wrote:
>> On 24.08.2022 08:02, Juergen Gross wrote:
>>> The blkif.h comments should be updated to make it clear that the values in
>>> Xenstore don't reflect the state of the connection, but the availability of
>>> the feature in the related driver.
>>
>> Isn't that implied for all the feature-* leaves? I certainly don't mind it
>> being spelled out, but I don't think there's any real ambiguity here.
> 
> I think it should spelled out explicitly, maybe in the general paragraph
> about feature negotiation.
> 
> To me especially the phrasing on the frontend side was reading as if a "1"
> would indicate the feature to be actively used:
> 
> "A value of "1" indicates that the frontend will reuse the same grants ..."

Hmm, yes, that's certainly wording worth of improving (regardless of any
addition to the general paragraph).

Jan


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: “Backend has not unmapped grant” errors
  2022-08-24  6:02     ` Juergen Gross
  2022-08-24  6:30       ` Jan Beulich
@ 2022-08-24 17:44       ` SeongJae Park
  2022-08-24 20:38         ` SeongJae Park
  1 sibling, 1 reply; 18+ messages in thread
From: SeongJae Park @ 2022-08-24 17:44 UTC (permalink / raw)
  To: Juergen Gross
  Cc: Marek Marczykowski-Górecki, Xen developer discussion,
	SeongJae Park, Maximilian Heyne

Hello,

On Wed, 24 Aug 2022 08:02:40 +0200 Juergen Gross <jgross@suse.com> wrote:

> 
> [-- Attachment #1.1.1: Type: text/plain, Size: 4312 bytes --]
> 
> On 24.08.22 02:20, Marek Marczykowski-Górecki wrote:
> > FWIW, I hit this issue twice already in this week CI run, while it never
> > happened before. The difference compared to previous run is Linux
> > 5.15.57 vs 5.15.61. The latter reports persistent grants disabled. The
> > only related commits I see there are three commits indeed related to
> > persistent grants:
> > 
> >    c98e956ef489 xen-blkfront: Apply 'feature_persistent' parameter when connect
> >    ef26b5d530d4 xen-blkback: Apply 'feature_persistent' parameter when connect
> >    7304be4c985d xen-blkback: fix persistent grants negotiation
> > 
> > But none of the commit messages suggests intentional disabling it
> > without explicit request for doing so. I did not requested disabling it
> > in toolstack (although I have set backend as "trusted" - XSA-403).
> > I have confirmed it's the frontend version that matters. Running older
> > frontend kernel with 5.15.61 backend results in persistent grants
> > enabled (and both frontend and backend xenstore "feature-persistent"
> > entries are "1" in this case).
> 
> This is a mess.
> 
> I think the main problem seems to be that the feature negotiation process
> isn't specified in a sane way.
> 
>  From the blkif.h header:
> 
> Backend-side:
>   * feature-persistent
>   *      Values:         0/1 (boolean)
>   *      Default Value:  0
>   *      Notes: 7
>   *
>   *      A value of "1" indicates that the backend can keep the grants used
>   *      by the frontend driver mapped, so the same set of grants should be
>   *      used in all transactions. The maximum number of grants the backend
>   *      can map persistently depends on the implementation, but ideally it
>   *      should be RING_SIZE * BLKIF_MAX_SEGMENTS_PER_REQUEST. Using this
>   *      feature the backend doesn't need to unmap each grant, preventing
>   *      costly TLB flushes. The backend driver should only map grants
>   *      persistently if the frontend supports it. If a backend driver chooses
>   *      to use the persistent protocol when the frontend doesn't support it,
>   *      it will probably hit the maximum number of persistently mapped grants
>   *      (due to the fact that the frontend won't be reusing the same grants),
>   *      and fall back to non-persistent mode. Backend implementations may
>   *      shrink or expand the number of persistently mapped grants without
>   *      notifying the frontend depending on memory constraints (this might
>   *      cause a performance degradation).
> 
> Frontend-side:
>   * feature-persistent
>   *      Values:         0/1 (boolean)
>   *      Default Value:  0
>   *      Notes: 7, 8, 9
>   *
>   *      A value of "1" indicates that the frontend will reuse the same grants
>   *      for all transactions, allowing the backend to map them with write
>   *      access (even when it should be read-only). If the frontend hits the
>   *      maximum number of allowed persistently mapped grants, it can fallback
>   *      to non persistent mode. This will cause a performance degradation,
>   *      since the the backend driver will still try to map those grants
>   *      persistently. Since the persistent grants protocol is compatible with
>   *      the previous protocol, a frontend driver can choose to work in
>   *      persistent mode even when the backend doesn't support it.
> 
> Those definitions don't make clear, which side is the one to decide whether
> the feature should be used or not. In my understanding the related drivers
> should just advertise their setting (the _ability_ to use the feature), and
> it should be used only if both sides have written a "1".
> 
> With above patches applied, the frontend will set 'feature-persistent' in
> Xenstore only, if the backend has done so, but the backend will set it
> only, if the frontend has done it. This results in persistent grants
> always being disabled.

Sorry for making the mess, and thank you for the kind report and detailed
explanation of the problem.

> 
> This is wrong, as the value written should not reflect the current state
> of the interface. That state should be set according to both sides' value,
> probably a cached one on the blkback side (using a new flag for caching it,
> not the current state).

Agreed.  So, I think the issue comes from the fact that we are using one field,
which was a place for saving only the negotiation result, for yet another
purpose: caching of the parameter value.  As a result, the advertisement, which
should follow only the parameter value, becomes inconsistent.

How about simply adding another field for the caching purpose, so that the
advertisation could be done regardless of the negotiation?  For example:

diff --git a/drivers/block/xen-blkback/common.h b/drivers/block/xen-blkback/common.h
index bda5c815e441..a28473470e66 100644
--- a/drivers/block/xen-blkback/common.h
+++ b/drivers/block/xen-blkback/common.h
@@ -226,6 +226,9 @@ struct xen_vbd {
 	sector_t		size;
 	unsigned int		flush_support:1;
 	unsigned int		discard_secure:1;
+	/* Connect-time cached feature_persistent parameter value */
+	unsigned int		feature_gnt_persistent_parm:1;
+	/* Persistent grants feature negotiation result */
 	unsigned int		feature_gnt_persistent:1;
 	unsigned int		overflow_max_grants:1;
 };
diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
index ee7ad2fb432d..c0227dfa4688 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -907,7 +907,7 @@ static void connect(struct backend_info *be)
 	xen_blkbk_barrier(xbt, be, be->blkif->vbd.flush_support);
 
 	err = xenbus_printf(xbt, dev->nodename, "feature-persistent", "%u",
-			be->blkif->vbd.feature_gnt_persistent);
+			be->blkif->vbd.feature_gnt_persistent_parm);
 	if (err) {
 		xenbus_dev_fatal(dev, err, "writing %s/feature-persistent",
 				 dev->nodename);
@@ -1085,7 +1085,9 @@ static int connect_ring(struct backend_info *be)
 		return -ENOSYS;
 	}
 
-	blkif->vbd.feature_gnt_persistent = feature_persistent &&
+	blkif->vbd.feature_gnt_persistent_parm = feature_persistent;
+	blkif->vbd.feature_gnt_persistent =
+		blkif->vbd.feature_gnt_persistent_parm &&
 		xenbus_read_unsigned(dev->otherend, "feature-persistent", 0);
 
 	blkif->vbd.overflow_max_grants = 0;
diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 8e56e69fb4c4..dfae08115450 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -213,6 +213,9 @@ struct blkfront_info
 	unsigned int feature_fua:1;
 	unsigned int feature_discard:1;
 	unsigned int feature_secdiscard:1;
+	/* Connect-time cached feature_persistent parameter */
+	unsigned int feature_persistent_parm:1;
+	/* Persistent grants feature negotiation result */
 	unsigned int feature_persistent:1;
 	unsigned int bounce:1;
 	unsigned int discard_granularity;
@@ -1848,7 +1851,7 @@ static int talk_to_blkback(struct xenbus_device *dev,
 		goto abort_transaction;
 	}
 	err = xenbus_printf(xbt, dev->nodename, "feature-persistent", "%u",
-			info->feature_persistent);
+			info->feature_persistent_parm);
 	if (err)
 		dev_warn(&dev->dev,
 			 "writing persistent grants feature to xenbus");
@@ -2281,7 +2284,8 @@ static void blkfront_gather_backend_features(struct blkfront_info *info)
 	if (xenbus_read_unsigned(info->xbdev->otherend, "feature-discard", 0))
 		blkfront_setup_discard(info);
 
-	if (feature_persistent)
+	info->feature_persistent_parm = feature_persistent;
+	if (info->feature_persistent_parm)
 		info->feature_persistent =
 			!!xenbus_read_unsigned(info->xbdev->otherend,
 					       "feature-persistent", 0);


Thanks,
SJ

> 
> The blkif.h comments should be updated to make it clear that the values in
> Xenstore don't reflect the state of the connection, but the availability of
> the feature in the related driver.
> 
> Comments?
> 
> 
> Juergen


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: “Backend has not unmapped grant” errors
  2022-08-24 17:44       ` SeongJae Park
@ 2022-08-24 20:38         ` SeongJae Park
  2022-08-25  6:20           ` Juergen Gross
  0 siblings, 1 reply; 18+ messages in thread
From: SeongJae Park @ 2022-08-24 20:38 UTC (permalink / raw)
  To: SeongJae Park
  Cc: Juergen Gross, roger.pau, Marek Marczykowski-Górecki,
	Xen developer discussion, SeongJae Park, Maximilian Heyne

+ Roger

On Wed, 24 Aug 2022 17:44:42 +0000 SeongJae Park <sj@kernel.org> wrote:

> Hello,
> 
> On Wed, 24 Aug 2022 08:02:40 +0200 Juergen Gross <jgross@suse.com> wrote:
> 
> > 
> > [-- Attachment #1.1.1: Type: text/plain, Size: 4312 bytes --]
> > 
> > On 24.08.22 02:20, Marek Marczykowski-Górecki wrote:
> > > FWIW, I hit this issue twice already in this week CI run, while it never
> > > happened before. The difference compared to previous run is Linux
> > > 5.15.57 vs 5.15.61. The latter reports persistent grants disabled. The
> > > only related commits I see there are three commits indeed related to
> > > persistent grants:
> > > 
> > >    c98e956ef489 xen-blkfront: Apply 'feature_persistent' parameter when connect
> > >    ef26b5d530d4 xen-blkback: Apply 'feature_persistent' parameter when connect
> > >    7304be4c985d xen-blkback: fix persistent grants negotiation
> > > 
> > > But none of the commit messages suggests intentional disabling it
> > > without explicit request for doing so. I did not requested disabling it
> > > in toolstack (although I have set backend as "trusted" - XSA-403).
> > > I have confirmed it's the frontend version that matters. Running older
> > > frontend kernel with 5.15.61 backend results in persistent grants
> > > enabled (and both frontend and backend xenstore "feature-persistent"
> > > entries are "1" in this case).
> > 
> > This is a mess.
> > 
> > I think the main problem seems to be that the feature negotiation process
> > isn't specified in a sane way.
> > 
> >  From the blkif.h header:
> > 
> > Backend-side:
> >   * feature-persistent
> >   *      Values:         0/1 (boolean)
> >   *      Default Value:  0
> >   *      Notes: 7
> >   *
> >   *      A value of "1" indicates that the backend can keep the grants used
> >   *      by the frontend driver mapped, so the same set of grants should be
> >   *      used in all transactions. The maximum number of grants the backend
> >   *      can map persistently depends on the implementation, but ideally it
> >   *      should be RING_SIZE * BLKIF_MAX_SEGMENTS_PER_REQUEST. Using this
> >   *      feature the backend doesn't need to unmap each grant, preventing
> >   *      costly TLB flushes. The backend driver should only map grants
> >   *      persistently if the frontend supports it. If a backend driver chooses
> >   *      to use the persistent protocol when the frontend doesn't support it,
> >   *      it will probably hit the maximum number of persistently mapped grants
> >   *      (due to the fact that the frontend won't be reusing the same grants),
> >   *      and fall back to non-persistent mode. Backend implementations may
> >   *      shrink or expand the number of persistently mapped grants without
> >   *      notifying the frontend depending on memory constraints (this might
> >   *      cause a performance degradation).
> > 
> > Frontend-side:
> >   * feature-persistent
> >   *      Values:         0/1 (boolean)
> >   *      Default Value:  0
> >   *      Notes: 7, 8, 9
> >   *
> >   *      A value of "1" indicates that the frontend will reuse the same grants
> >   *      for all transactions, allowing the backend to map them with write
> >   *      access (even when it should be read-only). If the frontend hits the
> >   *      maximum number of allowed persistently mapped grants, it can fallback
> >   *      to non persistent mode. This will cause a performance degradation,
> >   *      since the the backend driver will still try to map those grants
> >   *      persistently. Since the persistent grants protocol is compatible with
> >   *      the previous protocol, a frontend driver can choose to work in
> >   *      persistent mode even when the backend doesn't support it.
> > 
> > Those definitions don't make clear, which side is the one to decide whether
> > the feature should be used or not. In my understanding the related drivers
> > should just advertise their setting (the _ability_ to use the feature), and
> > it should be used only if both sides have written a "1".
> > 
> > With above patches applied, the frontend will set 'feature-persistent' in
> > Xenstore only, if the backend has done so, but the backend will set it
> > only, if the frontend has done it. This results in persistent grants
> > always being disabled.
> 
> Sorry for making the mess, and thank you for the kind report and detailed
> explanation of the problem.
> 
> > 
> > This is wrong, as the value written should not reflect the current state
> > of the interface. That state should be set according to both sides' value,
> > probably a cached one on the blkback side (using a new flag for caching it,
> > not the current state).
> 
> Agreed.  So, I think the issue comes from the fact that we are using one field,
> which was a place for saving only the negotiation result, for yet another
> purpose: caching of the parameter value.  As a result, the advertisement, which
> should follow only the parameter value, becomes inconsistent.
> 
> How about simply adding another field for the caching purpose, so that the
> advertisation could be done regardless of the negotiation?  For example:
> 
> diff --git a/drivers/block/xen-blkback/common.h b/drivers/block/xen-blkback/common.h
> index bda5c815e441..a28473470e66 100644
> --- a/drivers/block/xen-blkback/common.h
> +++ b/drivers/block/xen-blkback/common.h
> @@ -226,6 +226,9 @@ struct xen_vbd {
>  	sector_t		size;
>  	unsigned int		flush_support:1;
>  	unsigned int		discard_secure:1;
> +	/* Connect-time cached feature_persistent parameter value */
> +	unsigned int		feature_gnt_persistent_parm:1;
> +	/* Persistent grants feature negotiation result */
>  	unsigned int		feature_gnt_persistent:1;
>  	unsigned int		overflow_max_grants:1;
>  };
> diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
> index ee7ad2fb432d..c0227dfa4688 100644
> --- a/drivers/block/xen-blkback/xenbus.c
> +++ b/drivers/block/xen-blkback/xenbus.c
> @@ -907,7 +907,7 @@ static void connect(struct backend_info *be)
>  	xen_blkbk_barrier(xbt, be, be->blkif->vbd.flush_support);
>  
>  	err = xenbus_printf(xbt, dev->nodename, "feature-persistent", "%u",
> -			be->blkif->vbd.feature_gnt_persistent);
> +			be->blkif->vbd.feature_gnt_persistent_parm);
>  	if (err) {
>  		xenbus_dev_fatal(dev, err, "writing %s/feature-persistent",
>  				 dev->nodename);
> @@ -1085,7 +1085,9 @@ static int connect_ring(struct backend_info *be)
>  		return -ENOSYS;
>  	}
>  
> -	blkif->vbd.feature_gnt_persistent = feature_persistent &&
> +	blkif->vbd.feature_gnt_persistent_parm = feature_persistent;
> +	blkif->vbd.feature_gnt_persistent =
> +		blkif->vbd.feature_gnt_persistent_parm &&
>  		xenbus_read_unsigned(dev->otherend, "feature-persistent", 0);
>  
>  	blkif->vbd.overflow_max_grants = 0;
> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
> index 8e56e69fb4c4..dfae08115450 100644
> --- a/drivers/block/xen-blkfront.c
> +++ b/drivers/block/xen-blkfront.c
> @@ -213,6 +213,9 @@ struct blkfront_info
>  	unsigned int feature_fua:1;
>  	unsigned int feature_discard:1;
>  	unsigned int feature_secdiscard:1;
> +	/* Connect-time cached feature_persistent parameter */
> +	unsigned int feature_persistent_parm:1;
> +	/* Persistent grants feature negotiation result */
>  	unsigned int feature_persistent:1;
>  	unsigned int bounce:1;
>  	unsigned int discard_granularity;
> @@ -1848,7 +1851,7 @@ static int talk_to_blkback(struct xenbus_device *dev,
>  		goto abort_transaction;
>  	}
>  	err = xenbus_printf(xbt, dev->nodename, "feature-persistent", "%u",
> -			info->feature_persistent);
> +			info->feature_persistent_parm);
>  	if (err)
>  		dev_warn(&dev->dev,
>  			 "writing persistent grants feature to xenbus");
> @@ -2281,7 +2284,8 @@ static void blkfront_gather_backend_features(struct blkfront_info *info)
>  	if (xenbus_read_unsigned(info->xbdev->otherend, "feature-discard", 0))
>  		blkfront_setup_discard(info);
>  
> -	if (feature_persistent)
> +	info->feature_persistent_parm = feature_persistent;
> +	if (info->feature_persistent_parm)
>  		info->feature_persistent =
>  			!!xenbus_read_unsigned(info->xbdev->otherend,
>  					       "feature-persistent", 0);
> 
> 
> Thanks,
> SJ
> 
> > 
> > The blkif.h comments should be updated to make it clear that the values in
> > Xenstore don't reflect the state of the connection, but the availability of
> > the feature in the related driver.
> > 
> > Comments?
> > 
> > 
> > Juergen


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: “Backend has not unmapped grant” errors
  2022-08-24 20:38         ` SeongJae Park
@ 2022-08-25  6:20           ` Juergen Gross
  2022-08-25 16:22             ` SeongJae Park
  0 siblings, 1 reply; 18+ messages in thread
From: Juergen Gross @ 2022-08-25  6:20 UTC (permalink / raw)
  To: SeongJae Park
  Cc: roger.pau, Marek Marczykowski-Górecki,
	Xen developer discussion, SeongJae Park, Maximilian Heyne


[-- Attachment #1.1.1: Type: text/plain, Size: 8724 bytes --]

On 24.08.22 22:38, SeongJae Park wrote:
> + Roger
> 
> On Wed, 24 Aug 2022 17:44:42 +0000 SeongJae Park <sj@kernel.org> wrote:
> 
>> Hello,
>>
>> On Wed, 24 Aug 2022 08:02:40 +0200 Juergen Gross <jgross@suse.com> wrote:
>>
>>>
>>> [-- Attachment #1.1.1: Type: text/plain, Size: 4312 bytes --]
>>>
>>> On 24.08.22 02:20, Marek Marczykowski-Górecki wrote:
>>>> FWIW, I hit this issue twice already in this week CI run, while it never
>>>> happened before. The difference compared to previous run is Linux
>>>> 5.15.57 vs 5.15.61. The latter reports persistent grants disabled. The
>>>> only related commits I see there are three commits indeed related to
>>>> persistent grants:
>>>>
>>>>     c98e956ef489 xen-blkfront: Apply 'feature_persistent' parameter when connect
>>>>     ef26b5d530d4 xen-blkback: Apply 'feature_persistent' parameter when connect
>>>>     7304be4c985d xen-blkback: fix persistent grants negotiation
>>>>
>>>> But none of the commit messages suggests intentional disabling it
>>>> without explicit request for doing so. I did not requested disabling it
>>>> in toolstack (although I have set backend as "trusted" - XSA-403).
>>>> I have confirmed it's the frontend version that matters. Running older
>>>> frontend kernel with 5.15.61 backend results in persistent grants
>>>> enabled (and both frontend and backend xenstore "feature-persistent"
>>>> entries are "1" in this case).
>>>
>>> This is a mess.
>>>
>>> I think the main problem seems to be that the feature negotiation process
>>> isn't specified in a sane way.
>>>
>>>   From the blkif.h header:
>>>
>>> Backend-side:
>>>    * feature-persistent
>>>    *      Values:         0/1 (boolean)
>>>    *      Default Value:  0
>>>    *      Notes: 7
>>>    *
>>>    *      A value of "1" indicates that the backend can keep the grants used
>>>    *      by the frontend driver mapped, so the same set of grants should be
>>>    *      used in all transactions. The maximum number of grants the backend
>>>    *      can map persistently depends on the implementation, but ideally it
>>>    *      should be RING_SIZE * BLKIF_MAX_SEGMENTS_PER_REQUEST. Using this
>>>    *      feature the backend doesn't need to unmap each grant, preventing
>>>    *      costly TLB flushes. The backend driver should only map grants
>>>    *      persistently if the frontend supports it. If a backend driver chooses
>>>    *      to use the persistent protocol when the frontend doesn't support it,
>>>    *      it will probably hit the maximum number of persistently mapped grants
>>>    *      (due to the fact that the frontend won't be reusing the same grants),
>>>    *      and fall back to non-persistent mode. Backend implementations may
>>>    *      shrink or expand the number of persistently mapped grants without
>>>    *      notifying the frontend depending on memory constraints (this might
>>>    *      cause a performance degradation).
>>>
>>> Frontend-side:
>>>    * feature-persistent
>>>    *      Values:         0/1 (boolean)
>>>    *      Default Value:  0
>>>    *      Notes: 7, 8, 9
>>>    *
>>>    *      A value of "1" indicates that the frontend will reuse the same grants
>>>    *      for all transactions, allowing the backend to map them with write
>>>    *      access (even when it should be read-only). If the frontend hits the
>>>    *      maximum number of allowed persistently mapped grants, it can fallback
>>>    *      to non persistent mode. This will cause a performance degradation,
>>>    *      since the the backend driver will still try to map those grants
>>>    *      persistently. Since the persistent grants protocol is compatible with
>>>    *      the previous protocol, a frontend driver can choose to work in
>>>    *      persistent mode even when the backend doesn't support it.
>>>
>>> Those definitions don't make clear, which side is the one to decide whether
>>> the feature should be used or not. In my understanding the related drivers
>>> should just advertise their setting (the _ability_ to use the feature), and
>>> it should be used only if both sides have written a "1".
>>>
>>> With above patches applied, the frontend will set 'feature-persistent' in
>>> Xenstore only, if the backend has done so, but the backend will set it
>>> only, if the frontend has done it. This results in persistent grants
>>> always being disabled.
>>
>> Sorry for making the mess, and thank you for the kind report and detailed
>> explanation of the problem.
>>
>>>
>>> This is wrong, as the value written should not reflect the current state
>>> of the interface. That state should be set according to both sides' value,
>>> probably a cached one on the blkback side (using a new flag for caching it,
>>> not the current state).
>>
>> Agreed.  So, I think the issue comes from the fact that we are using one field,
>> which was a place for saving only the negotiation result, for yet another
>> purpose: caching of the parameter value.  As a result, the advertisement, which
>> should follow only the parameter value, becomes inconsistent.
>>
>> How about simply adding another field for the caching purpose, so that the
>> advertisation could be done regardless of the negotiation?  For example:
>>
>> diff --git a/drivers/block/xen-blkback/common.h b/drivers/block/xen-blkback/common.h
>> index bda5c815e441..a28473470e66 100644
>> --- a/drivers/block/xen-blkback/common.h
>> +++ b/drivers/block/xen-blkback/common.h
>> @@ -226,6 +226,9 @@ struct xen_vbd {
>>   	sector_t		size;
>>   	unsigned int		flush_support:1;
>>   	unsigned int		discard_secure:1;
>> +	/* Connect-time cached feature_persistent parameter value */
>> +	unsigned int		feature_gnt_persistent_parm:1;
>> +	/* Persistent grants feature negotiation result */
>>   	unsigned int		feature_gnt_persistent:1;
>>   	unsigned int		overflow_max_grants:1;
>>   };
>> diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
>> index ee7ad2fb432d..c0227dfa4688 100644
>> --- a/drivers/block/xen-blkback/xenbus.c
>> +++ b/drivers/block/xen-blkback/xenbus.c
>> @@ -907,7 +907,7 @@ static void connect(struct backend_info *be)
>>   	xen_blkbk_barrier(xbt, be, be->blkif->vbd.flush_support);
>>   
>>   	err = xenbus_printf(xbt, dev->nodename, "feature-persistent", "%u",
>> -			be->blkif->vbd.feature_gnt_persistent);
>> +			be->blkif->vbd.feature_gnt_persistent_parm);
>>   	if (err) {
>>   		xenbus_dev_fatal(dev, err, "writing %s/feature-persistent",
>>   				 dev->nodename);
>> @@ -1085,7 +1085,9 @@ static int connect_ring(struct backend_info *be)
>>   		return -ENOSYS;
>>   	}
>>   
>> -	blkif->vbd.feature_gnt_persistent = feature_persistent &&
>> +	blkif->vbd.feature_gnt_persistent_parm = feature_persistent;
>> +	blkif->vbd.feature_gnt_persistent =
>> +		blkif->vbd.feature_gnt_persistent_parm &&
>>   		xenbus_read_unsigned(dev->otherend, "feature-persistent", 0);
>>   
>>   	blkif->vbd.overflow_max_grants = 0;
>> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
>> index 8e56e69fb4c4..dfae08115450 100644
>> --- a/drivers/block/xen-blkfront.c
>> +++ b/drivers/block/xen-blkfront.c
>> @@ -213,6 +213,9 @@ struct blkfront_info
>>   	unsigned int feature_fua:1;
>>   	unsigned int feature_discard:1;
>>   	unsigned int feature_secdiscard:1;
>> +	/* Connect-time cached feature_persistent parameter */
>> +	unsigned int feature_persistent_parm:1;
>> +	/* Persistent grants feature negotiation result */
>>   	unsigned int feature_persistent:1;
>>   	unsigned int bounce:1;
>>   	unsigned int discard_granularity;
>> @@ -1848,7 +1851,7 @@ static int talk_to_blkback(struct xenbus_device *dev,
>>   		goto abort_transaction;
>>   	}
>>   	err = xenbus_printf(xbt, dev->nodename, "feature-persistent", "%u",
>> -			info->feature_persistent);
>> +			info->feature_persistent_parm);
>>   	if (err)
>>   		dev_warn(&dev->dev,
>>   			 "writing persistent grants feature to xenbus");
>> @@ -2281,7 +2284,8 @@ static void blkfront_gather_backend_features(struct blkfront_info *info)
>>   	if (xenbus_read_unsigned(info->xbdev->otherend, "feature-discard", 0))
>>   		blkfront_setup_discard(info);
>>   
>> -	if (feature_persistent)
>> +	info->feature_persistent_parm = feature_persistent;
>> +	if (info->feature_persistent_parm)
>>   		info->feature_persistent =
>>   			!!xenbus_read_unsigned(info->xbdev->otherend,
>>   					       "feature-persistent", 0);
>>

Yes, this is much better IMO.

Could you please send it as two proper patches (one for each driver) with
the correct "Fixes:" tags?


Juergen

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3149 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: “Backend has not unmapped grant” errors
  2022-08-25  6:20           ` Juergen Gross
@ 2022-08-25 16:22             ` SeongJae Park
  0 siblings, 0 replies; 18+ messages in thread
From: SeongJae Park @ 2022-08-25 16:22 UTC (permalink / raw)
  To: Juergen Gross
  Cc: SeongJae Park, roger.pau, Marek Marczykowski-Górecki,
	Xen developer discussion, SeongJae Park, Maximilian Heyne

Hi Juergen,


Thank you for the quick and nice reply!

On Thu, 25 Aug 2022 08:20:33 +0200 Juergen Gross <jgross@suse.com> wrote:

> 
[...]
> 
> Could you please send it as two proper patches (one for each driver) with
> the correct "Fixes:" tags?

Sure, just posted:
https://lore.kernel.org/xen-devel/20220825161511.94922-2-sj@kernel.org/


Thanks,
SJ

> 
> 
> Juergen


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: “Backend has not unmapped grant” errors
  2022-08-24  0:20   ` Marek Marczykowski-Górecki
  2022-08-24  6:02     ` Juergen Gross
@ 2022-08-24  6:11     ` Juergen Gross
  2022-08-28  5:15       ` Demi Marie Obenour
  1 sibling, 1 reply; 18+ messages in thread
From: Juergen Gross @ 2022-08-24  6:11 UTC (permalink / raw)
  To: Marek Marczykowski-Górecki; +Cc: Xen developer discussion


[-- Attachment #1.1.1: Type: text/plain, Size: 2359 bytes --]

On 24.08.22 02:20, Marek Marczykowski-Górecki wrote:
> On Tue, Aug 23, 2022 at 09:48:57AM +0200, Juergen Gross wrote:
>> On 23.08.22 09:40, Demi Marie Obenour wrote:
>>> I recently had a VM’s /dev/xvdb stop working with a “backend has not
>>> unmapped grant” error.  Since /dev/xvdb was the VM’s private volume,
>>> that rendered the VM effectively useless.  I had to kill it with
>>> qvm-kill.
>>>
>>> The backend of /dev/xvdb is dom0, so a malicious backend is clearly not
>>> the cause of this.  I believe the actual cause is a race condition, such
>>> as the following:
>>>
>>> 1. GUI agent in VM allocates grant X.
>>> 2. GUI agent tells GUI daemon in dom0 to map X.
>>> 3. GUI agent frees grant X.
>>> 4. blkfront allocates grant X and passes it to dom0.
>>> 5. dom0’s blkback maps grant X.
>>> 6. blkback unmaps grant X.
>>> 7. GUI daemon maps grant X.
>>> 8. blkfront tries to revoke access to grant X and fails.  Disaster
>>>      ensues.
>>>
>>> What could be done to prevent this race?  Right now all of the
>>> approaches I can think of are horribly backwards-incompatible.  They
>>> require replacing grant IDs with some sort of handle, and requiring
>>> userspace to pass these handles to ioctls.  It is also possible that
>>> netfront and blkfront could race against each other in a way that causes
>>> this, though I suspect that race would be much harder to trigger.
>>>
>>> This has happened more than once so it is not a fluke due to e.g. cosmic
>>> rays or other random bit-flips.
>>>
>>> Marek, do you have any suggestions?
>>
>> To me that sounds like the interface of the GUI is the culprit.
>>
>> The GUI agent in the guest should only free a grant, if it got a message
>> from the backend that it can do so. Just assuming to be able to free it
>> because it isn't in use currently is the broken assumption here.
> 
> FWIW, I hit this issue twice already in this week CI run, while it never
> happened before. The difference compared to previous run is Linux
> 5.15.57 vs 5.15.61. The latter reports persistent grants disabled.

I think this additional bug is just triggering the race in the GUI
interface more easily, as blkfront will allocate new grants with a
much higher frequency.

So fixing the persistent grant issue will just paper over the real
issue.


Juergen

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3149 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: “Backend has not unmapped grant” errors
  2022-08-24  6:11     ` Juergen Gross
@ 2022-08-28  5:15       ` Demi Marie Obenour
  2022-08-29 12:55         ` Juergen Gross
  0 siblings, 1 reply; 18+ messages in thread
From: Demi Marie Obenour @ 2022-08-28  5:15 UTC (permalink / raw)
  To: Juergen Gross, Marek Marczykowski-Górecki; +Cc: Xen developer discussion

[-- Attachment #1: Type: text/plain, Size: 4024 bytes --]

On Wed, Aug 24, 2022 at 08:11:56AM +0200, Juergen Gross wrote:
> On 24.08.22 02:20, Marek Marczykowski-Górecki wrote:
> > On Tue, Aug 23, 2022 at 09:48:57AM +0200, Juergen Gross wrote:
> > > On 23.08.22 09:40, Demi Marie Obenour wrote:
> > > > I recently had a VM’s /dev/xvdb stop working with a “backend has not
> > > > unmapped grant” error.  Since /dev/xvdb was the VM’s private volume,
> > > > that rendered the VM effectively useless.  I had to kill it with
> > > > qvm-kill.
> > > > 
> > > > The backend of /dev/xvdb is dom0, so a malicious backend is clearly not
> > > > the cause of this.  I believe the actual cause is a race condition, such
> > > > as the following:
> > > > 
> > > > 1. GUI agent in VM allocates grant X.
> > > > 2. GUI agent tells GUI daemon in dom0 to map X.
> > > > 3. GUI agent frees grant X.
> > > > 4. blkfront allocates grant X and passes it to dom0.
> > > > 5. dom0’s blkback maps grant X.
> > > > 6. blkback unmaps grant X.
> > > > 7. GUI daemon maps grant X.
> > > > 8. blkfront tries to revoke access to grant X and fails.  Disaster
> > > >      ensues.
> > > > 
> > > > What could be done to prevent this race?  Right now all of the
> > > > approaches I can think of are horribly backwards-incompatible.  They
> > > > require replacing grant IDs with some sort of handle, and requiring
> > > > userspace to pass these handles to ioctls.  It is also possible that
> > > > netfront and blkfront could race against each other in a way that causes
> > > > this, though I suspect that race would be much harder to trigger.
> > > > 
> > > > This has happened more than once so it is not a fluke due to e.g. cosmic
> > > > rays or other random bit-flips.
> > > > 
> > > > Marek, do you have any suggestions?
> > > 
> > > To me that sounds like the interface of the GUI is the culprit.
> > > 
> > > The GUI agent in the guest should only free a grant, if it got a message
> > > from the backend that it can do so. Just assuming to be able to free it
> > > because it isn't in use currently is the broken assumption here.
> > 
> > FWIW, I hit this issue twice already in this week CI run, while it never
> > happened before. The difference compared to previous run is Linux
> > 5.15.57 vs 5.15.61. The latter reports persistent grants disabled.
> 
> I think this additional bug is just triggering the race in the GUI
> interface more easily, as blkfront will allocate new grants with a
> much higher frequency.
> 
> So fixing the persistent grant issue will just paper over the real
> issue.

Indeed so, but making the bug happen much less frequently is still a
significant win for users.

In the long term, there is one situation I do not have a good solution
for: recovery from GUI agent crashes.  If the GUI agent crashes, the
kernel it is running under has two bad choices.  Either the kernel can
reclaim the grants, risking them being mapped at a later time by the GUI
daemon, or it can leak them, which is bad for obvious reasons.  I
believe the current implementation makes the former choice.

To fix this problem, I recommend the following changes:

1. Treat “backend has not unmapped grant” errors as non-fatal.  The most
   likely cause is buggy userspace software, not an attempt to exploit
   XSA-396.  Instead of disabling the device, just log a warning message
   and put the grant on the deferred queue.  Even leaking the grant
   would be preferable to the current behavior, as disabling a block
   device typically leaves the VM unusable.

2. Ensure that the same grant being mapped twice is handled correctly.
   At least Linux is known to have bugs in this regard.

3. Provide a means for a domain to be notified by Xen whenever one of
   its grants is unmapped.  Setting an event channel and writing to a
   shared ring would suffice.  This would allow eliminating the kludgy
   deferred freeing mechanism.
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: “Backend has not unmapped grant” errors
  2022-08-28  5:15       ` Demi Marie Obenour
@ 2022-08-29 12:55         ` Juergen Gross
  2022-08-29 14:39           ` Marek Marczykowski-Górecki
  2022-08-29 18:54           ` Demi Marie Obenour
  0 siblings, 2 replies; 18+ messages in thread
From: Juergen Gross @ 2022-08-29 12:55 UTC (permalink / raw)
  To: Demi Marie Obenour, Marek Marczykowski-Górecki
  Cc: Xen developer discussion


[-- Attachment #1.1.1: Type: text/plain, Size: 5192 bytes --]

On 28.08.22 07:15, Demi Marie Obenour wrote:
> On Wed, Aug 24, 2022 at 08:11:56AM +0200, Juergen Gross wrote:
>> On 24.08.22 02:20, Marek Marczykowski-Górecki wrote:
>>> On Tue, Aug 23, 2022 at 09:48:57AM +0200, Juergen Gross wrote:
>>>> On 23.08.22 09:40, Demi Marie Obenour wrote:
>>>>> I recently had a VM’s /dev/xvdb stop working with a “backend has not
>>>>> unmapped grant” error.  Since /dev/xvdb was the VM’s private volume,
>>>>> that rendered the VM effectively useless.  I had to kill it with
>>>>> qvm-kill.
>>>>>
>>>>> The backend of /dev/xvdb is dom0, so a malicious backend is clearly not
>>>>> the cause of this.  I believe the actual cause is a race condition, such
>>>>> as the following:
>>>>>
>>>>> 1. GUI agent in VM allocates grant X.
>>>>> 2. GUI agent tells GUI daemon in dom0 to map X.
>>>>> 3. GUI agent frees grant X.
>>>>> 4. blkfront allocates grant X and passes it to dom0.
>>>>> 5. dom0’s blkback maps grant X.
>>>>> 6. blkback unmaps grant X.
>>>>> 7. GUI daemon maps grant X.
>>>>> 8. blkfront tries to revoke access to grant X and fails.  Disaster
>>>>>       ensues.
>>>>>
>>>>> What could be done to prevent this race?  Right now all of the
>>>>> approaches I can think of are horribly backwards-incompatible.  They
>>>>> require replacing grant IDs with some sort of handle, and requiring
>>>>> userspace to pass these handles to ioctls.  It is also possible that
>>>>> netfront and blkfront could race against each other in a way that causes
>>>>> this, though I suspect that race would be much harder to trigger.
>>>>>
>>>>> This has happened more than once so it is not a fluke due to e.g. cosmic
>>>>> rays or other random bit-flips.
>>>>>
>>>>> Marek, do you have any suggestions?
>>>>
>>>> To me that sounds like the interface of the GUI is the culprit.
>>>>
>>>> The GUI agent in the guest should only free a grant, if it got a message
>>>> from the backend that it can do so. Just assuming to be able to free it
>>>> because it isn't in use currently is the broken assumption here.
>>>
>>> FWIW, I hit this issue twice already in this week CI run, while it never
>>> happened before. The difference compared to previous run is Linux
>>> 5.15.57 vs 5.15.61. The latter reports persistent grants disabled.
>>
>> I think this additional bug is just triggering the race in the GUI
>> interface more easily, as blkfront will allocate new grants with a
>> much higher frequency.
>>
>> So fixing the persistent grant issue will just paper over the real
>> issue.
> 
> Indeed so, but making the bug happen much less frequently is still a
> significant win for users.

Probably, yes.

> In the long term, there is one situation I do not have a good solution
> for: recovery from GUI agent crashes.  If the GUI agent crashes, the
> kernel it is running under has two bad choices.  Either the kernel can
> reclaim the grants, risking them being mapped at a later time by the GUI
> daemon, or it can leak them, which is bad for obvious reasons.  I
> believe the current implementation makes the former choice.

It does.

I don't have enough information about the GUI architecture you are using.
Which components are involved on the backend side, and which on the
frontend side? Especially the responsibilities and communication regarding
grants is important here.

> To fix this problem, I recommend the following changes:
> 
> 1. Treat “backend has not unmapped grant” errors as non-fatal.  The most
>     likely cause is buggy userspace software, not an attempt to exploit
>     XSA-396.  Instead of disabling the device, just log a warning message
>     and put the grant on the deferred queue.  Even leaking the grant
>     would be preferable to the current behavior, as disabling a block
>     device typically leaves the VM unusable.

Sorry, I don't agree. This is a major violation of the normal I/O
architecture. Your reasoning with the disabled block device doesn't make
much sense IMHO, as the mapped grant was due to a bad interface leading to
another component using a grant it was not meant to use.

Shutting down the block device is the right thing to do here, as data
corruption might be happening.

> 2. Ensure that the same grant being mapped twice is handled correctly.
>     At least Linux is known to have bugs in this regard.

I agree that this should be repaired.

> 3. Provide a means for a domain to be notified by Xen whenever one of
>     its grants is unmapped.  Setting an event channel and writing to a
>     shared ring would suffice.  This would allow eliminating the kludgy
>     deferred freeing mechanism.

Interesting idea.

I believe such an interface would need to be activated per grant, as
otherwise performance could suffer a lot. There are still some unused bits
in the grant flags, one could be used for that purpose.

I'm not sure how often this would be used. In case it is only for the rare
case of unexpectedly long mapped grant pages, a simple event might do the
job, with the event handler just skimming through the pending unmaps to
find the grants being available again.


Juergen

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3149 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: “Backend has not unmapped grant” errors
  2022-08-29 12:55         ` Juergen Gross
@ 2022-08-29 14:39           ` Marek Marczykowski-Górecki
  2022-08-29 16:03             ` Juergen Gross
  2022-08-29 18:32             ` Demi Marie Obenour
  2022-08-29 18:54           ` Demi Marie Obenour
  1 sibling, 2 replies; 18+ messages in thread
From: Marek Marczykowski-Górecki @ 2022-08-29 14:39 UTC (permalink / raw)
  To: Juergen Gross; +Cc: Demi Marie Obenour, Xen developer discussion

[-- Attachment #1: Type: text/plain, Size: 6324 bytes --]

On Mon, Aug 29, 2022 at 02:55:55PM +0200, Juergen Gross wrote:
> On 28.08.22 07:15, Demi Marie Obenour wrote:
> > On Wed, Aug 24, 2022 at 08:11:56AM +0200, Juergen Gross wrote:
> > > On 24.08.22 02:20, Marek Marczykowski-Górecki wrote:
> > > > On Tue, Aug 23, 2022 at 09:48:57AM +0200, Juergen Gross wrote:
> > > > > On 23.08.22 09:40, Demi Marie Obenour wrote:
> > > > > > I recently had a VM’s /dev/xvdb stop working with a “backend has not
> > > > > > unmapped grant” error.  Since /dev/xvdb was the VM’s private volume,
> > > > > > that rendered the VM effectively useless.  I had to kill it with
> > > > > > qvm-kill.
> > > > > > 
> > > > > > The backend of /dev/xvdb is dom0, so a malicious backend is clearly not
> > > > > > the cause of this.  I believe the actual cause is a race condition, such
> > > > > > as the following:
> > > > > > 
> > > > > > 1. GUI agent in VM allocates grant X.
> > > > > > 2. GUI agent tells GUI daemon in dom0 to map X.
> > > > > > 3. GUI agent frees grant X.
> > > > > > 4. blkfront allocates grant X and passes it to dom0.
> > > > > > 5. dom0’s blkback maps grant X.
> > > > > > 6. blkback unmaps grant X.
> > > > > > 7. GUI daemon maps grant X.
> > > > > > 8. blkfront tries to revoke access to grant X and fails.  Disaster
> > > > > >       ensues.
> > > > > > 
> > > > > > What could be done to prevent this race?  Right now all of the
> > > > > > approaches I can think of are horribly backwards-incompatible.  They
> > > > > > require replacing grant IDs with some sort of handle, and requiring
> > > > > > userspace to pass these handles to ioctls.  It is also possible that
> > > > > > netfront and blkfront could race against each other in a way that causes
> > > > > > this, though I suspect that race would be much harder to trigger.
> > > > > > 
> > > > > > This has happened more than once so it is not a fluke due to e.g. cosmic
> > > > > > rays or other random bit-flips.
> > > > > > 
> > > > > > Marek, do you have any suggestions?
> > > > > 
> > > > > To me that sounds like the interface of the GUI is the culprit.
> > > > > 
> > > > > The GUI agent in the guest should only free a grant, if it got a message
> > > > > from the backend that it can do so. Just assuming to be able to free it
> > > > > because it isn't in use currently is the broken assumption here.
> > > > 
> > > > FWIW, I hit this issue twice already in this week CI run, while it never
> > > > happened before. The difference compared to previous run is Linux
> > > > 5.15.57 vs 5.15.61. The latter reports persistent grants disabled.
> > > 
> > > I think this additional bug is just triggering the race in the GUI
> > > interface more easily, as blkfront will allocate new grants with a
> > 
> > 1. Treat “backend has not unmapped grant” errors as non-fatal.  The most
> >     likely cause is buggy userspace software, not an attempt to exploit
> >     XSA-396.  Instead of disabling the device, just log a warning message
> > > much higher frequency.
> > > 
> > > So fixing the persistent grant issue will just paper over the real
> > > issue.
> > 
> > Indeed so, but making the bug happen much less frequently is still a
> > significant win for users.
> 
> Probably, yes.
> 
> > In the long term, there is one situation I do not have a good solution
> > for: recovery from GUI agent crashes.  If the GUI agent crashes, the
> > kernel it is running under has two bad choices.  Either the kernel can
> > reclaim the grants, risking them being mapped at a later time by the GUI
> > daemon, or it can leak them, which is bad for obvious reasons.  I
> > believe the current implementation makes the former choice.
> 
> It does.
> 
> I don't have enough information about the GUI architecture you are using.
> Which components are involved on the backend side, and which on the
> frontend side? Especially the responsibilities and communication regarding
> grants is important here.

I'll limit the description to the relevant minimum here.
The gui-agent(*) uses gntalloc to share framebuffers (they are allocated
whenever an application within domU opens a window), then sends grant
reference numbers over vchan to the gui-daemon (running in dom0 by
default, but it can be also another domU).
Then the gui-daemon(*) maps them.
Later, when an application closes a window, the shared memory is
unmapped, and gui-daemon is informed about it. Releasing grant refs is
deferred by the kernel (until gui-daemon unmaps them). It may happen
that unmapping on the gui-agent side will happen before gui-daemon maps
them. We are modifying our GUI protocol to delay releasing grants on the
user space side, to coordinate with gui-daemon (basically wait until
gui-daemon confirms it unmapped them). This should fix the "normal"
case.
But if the gui-agent crashes just after sending grant refs, but before
gui-daemon maps them, then the problem is still there. If they are
immediately released by the kernel for others to use, we can hit the
same issue again (for example blkfront using them, and then gui-daemon
mapping them). I don't see race-free method for solving this with the
current API. GUI daemon can notice when such situation happens (by
checking if gui-agent is still alive after mapping grants), but that is
too late already.

The main difference compared to kernel drivers is the automatic release
on crash (or other unclean exit). In case of kernel driver crash, either
the whole VM goes down, or at least automatic release doesn't happen.
Maybe gntalloc could have some flag (per open file? per allocated
grant?) to _not_ release grant reference (aka leak it) in case of
implicit unmap, instead of explicit release? Such explicit release
would need to be added to the Linux gntshr API, as xengntshr_unshare()
currently is just munmap()). I don't see many other options to avoid
userspace crash (potentially) taking down PV device with it too...

(*) gui-agent and gui-daemon here are both in fact two processes (qubes gui
process that handles vchan communication and Xorg that does the actual
mapping). It complicates few things, but generally is irrelevant detail
from the Xen point of view.

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: “Backend has not unmapped grant” errors
  2022-08-29 14:39           ` Marek Marczykowski-Górecki
@ 2022-08-29 16:03             ` Juergen Gross
  2022-08-29 18:32             ` Demi Marie Obenour
  1 sibling, 0 replies; 18+ messages in thread
From: Juergen Gross @ 2022-08-29 16:03 UTC (permalink / raw)
  To: Marek Marczykowski-Górecki
  Cc: Demi Marie Obenour, Xen developer discussion


[-- Attachment #1.1.1: Type: text/plain, Size: 6461 bytes --]

On 29.08.22 16:39, Marek Marczykowski-Górecki wrote:
> On Mon, Aug 29, 2022 at 02:55:55PM +0200, Juergen Gross wrote:
>> On 28.08.22 07:15, Demi Marie Obenour wrote:
>>> On Wed, Aug 24, 2022 at 08:11:56AM +0200, Juergen Gross wrote:
>>>> On 24.08.22 02:20, Marek Marczykowski-Górecki wrote:
>>>>> On Tue, Aug 23, 2022 at 09:48:57AM +0200, Juergen Gross wrote:
>>>>>> On 23.08.22 09:40, Demi Marie Obenour wrote:
>>>>>>> I recently had a VM’s /dev/xvdb stop working with a “backend has not
>>>>>>> unmapped grant” error.  Since /dev/xvdb was the VM’s private volume,
>>>>>>> that rendered the VM effectively useless.  I had to kill it with
>>>>>>> qvm-kill.
>>>>>>>
>>>>>>> The backend of /dev/xvdb is dom0, so a malicious backend is clearly not
>>>>>>> the cause of this.  I believe the actual cause is a race condition, such
>>>>>>> as the following:
>>>>>>>
>>>>>>> 1. GUI agent in VM allocates grant X.
>>>>>>> 2. GUI agent tells GUI daemon in dom0 to map X.
>>>>>>> 3. GUI agent frees grant X.
>>>>>>> 4. blkfront allocates grant X and passes it to dom0.
>>>>>>> 5. dom0’s blkback maps grant X.
>>>>>>> 6. blkback unmaps grant X.
>>>>>>> 7. GUI daemon maps grant X.
>>>>>>> 8. blkfront tries to revoke access to grant X and fails.  Disaster
>>>>>>>        ensues.
>>>>>>>
>>>>>>> What could be done to prevent this race?  Right now all of the
>>>>>>> approaches I can think of are horribly backwards-incompatible.  They
>>>>>>> require replacing grant IDs with some sort of handle, and requiring
>>>>>>> userspace to pass these handles to ioctls.  It is also possible that
>>>>>>> netfront and blkfront could race against each other in a way that causes
>>>>>>> this, though I suspect that race would be much harder to trigger.
>>>>>>>
>>>>>>> This has happened more than once so it is not a fluke due to e.g. cosmic
>>>>>>> rays or other random bit-flips.
>>>>>>>
>>>>>>> Marek, do you have any suggestions?
>>>>>>
>>>>>> To me that sounds like the interface of the GUI is the culprit.
>>>>>>
>>>>>> The GUI agent in the guest should only free a grant, if it got a message
>>>>>> from the backend that it can do so. Just assuming to be able to free it
>>>>>> because it isn't in use currently is the broken assumption here.
>>>>>
>>>>> FWIW, I hit this issue twice already in this week CI run, while it never
>>>>> happened before. The difference compared to previous run is Linux
>>>>> 5.15.57 vs 5.15.61. The latter reports persistent grants disabled.
>>>>
>>>> I think this additional bug is just triggering the race in the GUI
>>>> interface more easily, as blkfront will allocate new grants with a
>>>
>>> 1. Treat “backend has not unmapped grant” errors as non-fatal.  The most
>>>      likely cause is buggy userspace software, not an attempt to exploit
>>>      XSA-396.  Instead of disabling the device, just log a warning message
>>>> much higher frequency.
>>>>
>>>> So fixing the persistent grant issue will just paper over the real
>>>> issue.
>>>
>>> Indeed so, but making the bug happen much less frequently is still a
>>> significant win for users.
>>
>> Probably, yes.
>>
>>> In the long term, there is one situation I do not have a good solution
>>> for: recovery from GUI agent crashes.  If the GUI agent crashes, the
>>> kernel it is running under has two bad choices.  Either the kernel can
>>> reclaim the grants, risking them being mapped at a later time by the GUI
>>> daemon, or it can leak them, which is bad for obvious reasons.  I
>>> believe the current implementation makes the former choice.
>>
>> It does.
>>
>> I don't have enough information about the GUI architecture you are using.
>> Which components are involved on the backend side, and which on the
>> frontend side? Especially the responsibilities and communication regarding
>> grants is important here.
> 
> I'll limit the description to the relevant minimum here.

Thanks for the information. It helps a lot.

> The gui-agent(*) uses gntalloc to share framebuffers (they are allocated
> whenever an application within domU opens a window), then sends grant
> reference numbers over vchan to the gui-daemon (running in dom0 by
> default, but it can be also another domU).
> Then the gui-daemon(*) maps them.
> Later, when an application closes a window, the shared memory is
> unmapped, and gui-daemon is informed about it. Releasing grant refs is
> deferred by the kernel (until gui-daemon unmaps them). It may happen
> that unmapping on the gui-agent side will happen before gui-daemon maps
> them. We are modifying our GUI protocol to delay releasing grants on the
> user space side, to coordinate with gui-daemon (basically wait until
> gui-daemon confirms it unmapped them). This should fix the "normal"
> case.
> But if the gui-agent crashes just after sending grant refs, but before
> gui-daemon maps them, then the problem is still there. If they are
> immediately released by the kernel for others to use, we can hit the
> same issue again (for example blkfront using them, and then gui-daemon
> mapping them). I don't see race-free method for solving this with the
> current API. GUI daemon can notice when such situation happens (by
> checking if gui-agent is still alive after mapping grants), but that is
> too late already.
> 
> The main difference compared to kernel drivers is the automatic release
> on crash (or other unclean exit). In case of kernel driver crash, either
> the whole VM goes down, or at least automatic release doesn't happen.
> Maybe gntalloc could have some flag (per open file? per allocated
> grant?) to _not_ release grant reference (aka leak it) in case of
> implicit unmap, instead of explicit release? Such explicit release
> would need to be added to the Linux gntshr API, as xengntshr_unshare()
> currently is just munmap()). I don't see many other options to avoid
> userspace crash (potentially) taking down PV device with it too...

My idea would be to add a new ioctl() to the gntalloc driver allowing to
specify a permanent name for a file. This would lead to:

- the grants not to be dropped when the process is dying
- in case grants with this name are existing, they are added to the file
   descriptor, resulting in them being under control of the new process
- the permanent grants would need to be remove explicitly instead of
   cleaned up due to close()


Juergen

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3149 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: “Backend has not unmapped grant” errors
  2022-08-29 14:39           ` Marek Marczykowski-Górecki
  2022-08-29 16:03             ` Juergen Gross
@ 2022-08-29 18:32             ` Demi Marie Obenour
  1 sibling, 0 replies; 18+ messages in thread
From: Demi Marie Obenour @ 2022-08-29 18:32 UTC (permalink / raw)
  To: Marek Marczykowski-Górecki, Juergen Gross; +Cc: Xen developer discussion

[-- Attachment #1: Type: text/plain, Size: 6934 bytes --]

On Mon, Aug 29, 2022 at 04:39:29PM +0200, Marek Marczykowski-Górecki wrote:
> On Mon, Aug 29, 2022 at 02:55:55PM +0200, Juergen Gross wrote:
> > On 28.08.22 07:15, Demi Marie Obenour wrote:
> > > On Wed, Aug 24, 2022 at 08:11:56AM +0200, Juergen Gross wrote:
> > > > On 24.08.22 02:20, Marek Marczykowski-Górecki wrote:
> > > > > On Tue, Aug 23, 2022 at 09:48:57AM +0200, Juergen Gross wrote:
> > > > > > On 23.08.22 09:40, Demi Marie Obenour wrote:
> > > > > > > I recently had a VM’s /dev/xvdb stop working with a “backend has not
> > > > > > > unmapped grant” error.  Since /dev/xvdb was the VM’s private volume,
> > > > > > > that rendered the VM effectively useless.  I had to kill it with
> > > > > > > qvm-kill.
> > > > > > > 
> > > > > > > The backend of /dev/xvdb is dom0, so a malicious backend is clearly not
> > > > > > > the cause of this.  I believe the actual cause is a race condition, such
> > > > > > > as the following:
> > > > > > > 
> > > > > > > 1. GUI agent in VM allocates grant X.
> > > > > > > 2. GUI agent tells GUI daemon in dom0 to map X.
> > > > > > > 3. GUI agent frees grant X.
> > > > > > > 4. blkfront allocates grant X and passes it to dom0.
> > > > > > > 5. dom0’s blkback maps grant X.
> > > > > > > 6. blkback unmaps grant X.
> > > > > > > 7. GUI daemon maps grant X.
> > > > > > > 8. blkfront tries to revoke access to grant X and fails.  Disaster
> > > > > > >       ensues.
> > > > > > > 
> > > > > > > What could be done to prevent this race?  Right now all of the
> > > > > > > approaches I can think of are horribly backwards-incompatible.  They
> > > > > > > require replacing grant IDs with some sort of handle, and requiring
> > > > > > > userspace to pass these handles to ioctls.  It is also possible that
> > > > > > > netfront and blkfront could race against each other in a way that causes
> > > > > > > this, though I suspect that race would be much harder to trigger.
> > > > > > > 
> > > > > > > This has happened more than once so it is not a fluke due to e.g. cosmic
> > > > > > > rays or other random bit-flips.
> > > > > > > 
> > > > > > > Marek, do you have any suggestions?
> > > > > > 
> > > > > > To me that sounds like the interface of the GUI is the culprit.
> > > > > > 
> > > > > > The GUI agent in the guest should only free a grant, if it got a message
> > > > > > from the backend that it can do so. Just assuming to be able to free it
> > > > > > because it isn't in use currently is the broken assumption here.
> > > > > 
> > > > > FWIW, I hit this issue twice already in this week CI run, while it never
> > > > > happened before. The difference compared to previous run is Linux
> > > > > 5.15.57 vs 5.15.61. The latter reports persistent grants disabled.
> > > > 
> > > > I think this additional bug is just triggering the race in the GUI
> > > > interface more easily, as blkfront will allocate new grants with a
> > > 
> > > 1. Treat “backend has not unmapped grant” errors as non-fatal.  The most
> > >     likely cause is buggy userspace software, not an attempt to exploit
> > >     XSA-396.  Instead of disabling the device, just log a warning message
> > > > much higher frequency.
> > > > 
> > > > So fixing the persistent grant issue will just paper over the real
> > > > issue.
> > > 
> > > Indeed so, but making the bug happen much less frequently is still a
> > > significant win for users.
> > 
> > Probably, yes.
> > 
> > > In the long term, there is one situation I do not have a good solution
> > > for: recovery from GUI agent crashes.  If the GUI agent crashes, the
> > > kernel it is running under has two bad choices.  Either the kernel can
> > > reclaim the grants, risking them being mapped at a later time by the GUI
> > > daemon, or it can leak them, which is bad for obvious reasons.  I
> > > believe the current implementation makes the former choice.
> > 
> > It does.
> > 
> > I don't have enough information about the GUI architecture you are using.
> > Which components are involved on the backend side, and which on the
> > frontend side? Especially the responsibilities and communication regarding
> > grants is important here.
> 
> I'll limit the description to the relevant minimum here.
> The gui-agent(*) uses gntalloc to share framebuffers (they are allocated
> whenever an application within domU opens a window), then sends grant
> reference numbers over vchan to the gui-daemon (running in dom0 by
> default, but it can be also another domU).
> Then the gui-daemon(*) maps them.
> Later, when an application closes a window, the shared memory is
> unmapped, and gui-daemon is informed about it. Releasing grant refs is
> deferred by the kernel (until gui-daemon unmaps them). It may happen
> that unmapping on the gui-agent side will happen before gui-daemon maps
> them. We are modifying our GUI protocol to delay releasing grants on the
> user space side, to coordinate with gui-daemon (basically wait until
> gui-daemon confirms it unmapped them). This should fix the "normal"
> case.
> But if the gui-agent crashes just after sending grant refs, but before
> gui-daemon maps them, then the problem is still there. If they are
> immediately released by the kernel for others to use, we can hit the
> same issue again (for example blkfront using them, and then gui-daemon
> mapping them). I don't see race-free method for solving this with the
> current API. GUI daemon can notice when such situation happens (by
> checking if gui-agent is still alive after mapping grants), but that is
> too late already.
> 
> The main difference compared to kernel drivers is the automatic release
> on crash (or other unclean exit). In case of kernel driver crash, either
> the whole VM goes down, or at least automatic release doesn't happen.
> Maybe gntalloc could have some flag (per open file? per allocated
> grant?) to _not_ release grant reference (aka leak it) in case of
> implicit unmap, instead of explicit release? Such explicit release
> would need to be added to the Linux gntshr API, as xengntshr_unshare()
> currently is just munmap()). I don't see many other options to avoid
> userspace crash (potentially) taking down PV device with it too...

That is still less than great, as it leads to a memory leak.  Another
approach would be some sort of unmap/revoke operation in the backend, so
that the backend revokes its own access to the grants before telling the
frontend it has unmapped them.  This would cause the userspace mmap()
call to fail.

> (*) gui-agent and gui-daemon here are both in fact two processes (qubes gui
> process that handles vchan communication and Xorg that does the actual
> mapping). It complicates few things, but generally is irrelevant detail
> from the Xen point of view.
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: “Backend has not unmapped grant” errors
  2022-08-29 12:55         ` Juergen Gross
  2022-08-29 14:39           ` Marek Marczykowski-Górecki
@ 2022-08-29 18:54           ` Demi Marie Obenour
  1 sibling, 0 replies; 18+ messages in thread
From: Demi Marie Obenour @ 2022-08-29 18:54 UTC (permalink / raw)
  To: Juergen Gross, Marek Marczykowski-Górecki; +Cc: Xen developer discussion

[-- Attachment #1: Type: text/plain, Size: 6372 bytes --]

On Mon, Aug 29, 2022 at 02:55:55PM +0200, Juergen Gross wrote:
> On 28.08.22 07:15, Demi Marie Obenour wrote:
> > On Wed, Aug 24, 2022 at 08:11:56AM +0200, Juergen Gross wrote:
> > > On 24.08.22 02:20, Marek Marczykowski-Górecki wrote:
> > > > On Tue, Aug 23, 2022 at 09:48:57AM +0200, Juergen Gross wrote:
> > > > > On 23.08.22 09:40, Demi Marie Obenour wrote:
> > > > > > I recently had a VM’s /dev/xvdb stop working with a “backend has not
> > > > > > unmapped grant” error.  Since /dev/xvdb was the VM’s private volume,
> > > > > > that rendered the VM effectively useless.  I had to kill it with
> > > > > > qvm-kill.
> > > > > > 
> > > > > > The backend of /dev/xvdb is dom0, so a malicious backend is clearly not
> > > > > > the cause of this.  I believe the actual cause is a race condition, such
> > > > > > as the following:
> > > > > > 
> > > > > > 1. GUI agent in VM allocates grant X.
> > > > > > 2. GUI agent tells GUI daemon in dom0 to map X.
> > > > > > 3. GUI agent frees grant X.
> > > > > > 4. blkfront allocates grant X and passes it to dom0.
> > > > > > 5. dom0’s blkback maps grant X.
> > > > > > 6. blkback unmaps grant X.
> > > > > > 7. GUI daemon maps grant X.
> > > > > > 8. blkfront tries to revoke access to grant X and fails.  Disaster
> > > > > >       ensues.
> > > > > > 
> > > > > > What could be done to prevent this race?  Right now all of the
> > > > > > approaches I can think of are horribly backwards-incompatible.  They
> > > > > > require replacing grant IDs with some sort of handle, and requiring
> > > > > > userspace to pass these handles to ioctls.  It is also possible that
> > > > > > netfront and blkfront could race against each other in a way that causes
> > > > > > this, though I suspect that race would be much harder to trigger.
> > > > > > 
> > > > > > This has happened more than once so it is not a fluke due to e.g. cosmic
> > > > > > rays or other random bit-flips.
> > > > > > 
> > > > > > Marek, do you have any suggestions?
> > > > > 
> > > > > To me that sounds like the interface of the GUI is the culprit.
> > > > > 
> > > > > The GUI agent in the guest should only free a grant, if it got a message
> > > > > from the backend that it can do so. Just assuming to be able to free it
> > > > > because it isn't in use currently is the broken assumption here.
> > > > 
> > > > FWIW, I hit this issue twice already in this week CI run, while it never
> > > > happened before. The difference compared to previous run is Linux
> > > > 5.15.57 vs 5.15.61. The latter reports persistent grants disabled.
> > > 
> > > I think this additional bug is just triggering the race in the GUI
> > > interface more easily, as blkfront will allocate new grants with a
> > > much higher frequency.
> > > 
> > > So fixing the persistent grant issue will just paper over the real
> > > issue.
> > 
> > Indeed so, but making the bug happen much less frequently is still a
> > significant win for users.
> 
> Probably, yes.
> 
> > In the long term, there is one situation I do not have a good solution
> > for: recovery from GUI agent crashes.  If the GUI agent crashes, the
> > kernel it is running under has two bad choices.  Either the kernel can
> > reclaim the grants, risking them being mapped at a later time by the GUI
> > daemon, or it can leak them, which is bad for obvious reasons.  I
> > believe the current implementation makes the former choice.
> 
> It does.
> 
> I don't have enough information about the GUI architecture you are using.
> Which components are involved on the backend side, and which on the
> frontend side? Especially the responsibilities and communication regarding
> grants is important here.

See Marek’s reply.

> > To fix this problem, I recommend the following changes:
> > 
> > 1. Treat “backend has not unmapped grant” errors as non-fatal.  The most
> >     likely cause is buggy userspace software, not an attempt to exploit
> >     XSA-396.  Instead of disabling the device, just log a warning message
> >     and put the grant on the deferred queue.  Even leaking the grant
> >     would be preferable to the current behavior, as disabling a block
> >     device typically leaves the VM unusable.
> 
> Sorry, I don't agree. This is a major violation of the normal I/O
> architecture. Your reasoning with the disabled block device doesn't make
> much sense IMHO, as the mapped grant was due to a bad interface leading to
> another component using a grant it was not meant to use.
> 
> Shutting down the block device is the right thing to do here, as data
> corruption might be happening.

In this case, the grants are being mapped read-only, so (unless I have
missed something) data corruption is not possible.

> > 3. Provide a means for a domain to be notified by Xen whenever one of
> >     its grants is unmapped.  Setting an event channel and writing to a
> >     shared ring would suffice.  This would allow eliminating the kludgy
> >     deferred freeing mechanism.
> 
> Interesting idea.
> 
> I believe such an interface would need to be activated per grant, as
> otherwise performance could suffer a lot. There are still some unused bits
> in the grant flags, one could be used for that purpose.

At least in the GUI case, large numbers of grants are typically unmapped
at once, and a notification is only necessary when the entire block has
been unmapped.  This should mitigate the performance concerns.

> I'm not sure how often this would be used. In case it is only for the rare
> case of unexpectedly long mapped grant pages, a simple event might do the
> job, with the event handler just skimming through the pending unmaps to
> find the grants being available again.

In Qubes OS, this happens so often that we had to patch the Linux kernel
to handle it better.  Prior to the patch, the background deferred
reclaim could not keep up, causing a memory leak.  Furthermore, the log
messages whenever an unmap had to be deferred were flooding the logs.
While we could change the GUI protocol to provide an unmap-time
notification, this is only because we use an LD_PRELOAD hack to hook
Xorg’s unmapping calls.  I would prefer to not continue to rely on this.
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2022-08-29 18:54 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-08-23  7:40 “Backend has not unmapped grant” errors Demi Marie Obenour
2022-08-23  7:48 ` Juergen Gross
2022-08-24  0:20   ` Marek Marczykowski-Górecki
2022-08-24  6:02     ` Juergen Gross
2022-08-24  6:30       ` Jan Beulich
2022-08-24  6:36         ` Juergen Gross
2022-08-24  6:40           ` Jan Beulich
2022-08-24 17:44       ` SeongJae Park
2022-08-24 20:38         ` SeongJae Park
2022-08-25  6:20           ` Juergen Gross
2022-08-25 16:22             ` SeongJae Park
2022-08-24  6:11     ` Juergen Gross
2022-08-28  5:15       ` Demi Marie Obenour
2022-08-29 12:55         ` Juergen Gross
2022-08-29 14:39           ` Marek Marczykowski-Górecki
2022-08-29 16:03             ` Juergen Gross
2022-08-29 18:32             ` Demi Marie Obenour
2022-08-29 18:54           ` Demi Marie Obenour

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.