* “Backend has not unmapped grant” errors @ 2022-08-23 7:40 Demi Marie Obenour 2022-08-23 7:48 ` Juergen Gross 0 siblings, 1 reply; 18+ messages in thread From: Demi Marie Obenour @ 2022-08-23 7:40 UTC (permalink / raw) To: Xen developer discussion; +Cc: Marek Marczykowski-Górecki [-- Attachment #1: Type: text/plain, Size: 1376 bytes --] I recently had a VM’s /dev/xvdb stop working with a “backend has not unmapped grant” error. Since /dev/xvdb was the VM’s private volume, that rendered the VM effectively useless. I had to kill it with qvm-kill. The backend of /dev/xvdb is dom0, so a malicious backend is clearly not the cause of this. I believe the actual cause is a race condition, such as the following: 1. GUI agent in VM allocates grant X. 2. GUI agent tells GUI daemon in dom0 to map X. 3. GUI agent frees grant X. 4. blkfront allocates grant X and passes it to dom0. 5. dom0’s blkback maps grant X. 6. blkback unmaps grant X. 7. GUI daemon maps grant X. 8. blkfront tries to revoke access to grant X and fails. Disaster ensues. What could be done to prevent this race? Right now all of the approaches I can think of are horribly backwards-incompatible. They require replacing grant IDs with some sort of handle, and requiring userspace to pass these handles to ioctls. It is also possible that netfront and blkfront could race against each other in a way that causes this, though I suspect that race would be much harder to trigger. This has happened more than once so it is not a fluke due to e.g. cosmic rays or other random bit-flips. Marek, do you have any suggestions? -- Sincerely, Demi Marie Obenour (she/her/hers) Invisible Things Lab [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: “Backend has not unmapped grant” errors 2022-08-23 7:40 “Backend has not unmapped grant” errors Demi Marie Obenour @ 2022-08-23 7:48 ` Juergen Gross 2022-08-24 0:20 ` Marek Marczykowski-Górecki 0 siblings, 1 reply; 18+ messages in thread From: Juergen Gross @ 2022-08-23 7:48 UTC (permalink / raw) To: Xen developer discussion, Marek Marczykowski-Górecki [-- Attachment #1.1.1: Type: text/plain, Size: 1704 bytes --] On 23.08.22 09:40, Demi Marie Obenour wrote: > I recently had a VM’s /dev/xvdb stop working with a “backend has not > unmapped grant” error. Since /dev/xvdb was the VM’s private volume, > that rendered the VM effectively useless. I had to kill it with > qvm-kill. > > The backend of /dev/xvdb is dom0, so a malicious backend is clearly not > the cause of this. I believe the actual cause is a race condition, such > as the following: > > 1. GUI agent in VM allocates grant X. > 2. GUI agent tells GUI daemon in dom0 to map X. > 3. GUI agent frees grant X. > 4. blkfront allocates grant X and passes it to dom0. > 5. dom0’s blkback maps grant X. > 6. blkback unmaps grant X. > 7. GUI daemon maps grant X. > 8. blkfront tries to revoke access to grant X and fails. Disaster > ensues. > > What could be done to prevent this race? Right now all of the > approaches I can think of are horribly backwards-incompatible. They > require replacing grant IDs with some sort of handle, and requiring > userspace to pass these handles to ioctls. It is also possible that > netfront and blkfront could race against each other in a way that causes > this, though I suspect that race would be much harder to trigger. > > This has happened more than once so it is not a fluke due to e.g. cosmic > rays or other random bit-flips. > > Marek, do you have any suggestions? To me that sounds like the interface of the GUI is the culprit. The GUI agent in the guest should only free a grant, if it got a message from the backend that it can do so. Just assuming to be able to free it because it isn't in use currently is the broken assumption here. Juergen [-- Attachment #1.1.2: OpenPGP public key --] [-- Type: application/pgp-keys, Size: 3149 bytes --] [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 495 bytes --] ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: “Backend has not unmapped grant” errors 2022-08-23 7:48 ` Juergen Gross @ 2022-08-24 0:20 ` Marek Marczykowski-Górecki 2022-08-24 6:02 ` Juergen Gross 2022-08-24 6:11 ` Juergen Gross 0 siblings, 2 replies; 18+ messages in thread From: Marek Marczykowski-Górecki @ 2022-08-24 0:20 UTC (permalink / raw) To: Juergen Gross; +Cc: Xen developer discussion [-- Attachment #1: Type: text/plain, Size: 2882 bytes --] On Tue, Aug 23, 2022 at 09:48:57AM +0200, Juergen Gross wrote: > On 23.08.22 09:40, Demi Marie Obenour wrote: > > I recently had a VM’s /dev/xvdb stop working with a “backend has not > > unmapped grant” error. Since /dev/xvdb was the VM’s private volume, > > that rendered the VM effectively useless. I had to kill it with > > qvm-kill. > > > > The backend of /dev/xvdb is dom0, so a malicious backend is clearly not > > the cause of this. I believe the actual cause is a race condition, such > > as the following: > > > > 1. GUI agent in VM allocates grant X. > > 2. GUI agent tells GUI daemon in dom0 to map X. > > 3. GUI agent frees grant X. > > 4. blkfront allocates grant X and passes it to dom0. > > 5. dom0’s blkback maps grant X. > > 6. blkback unmaps grant X. > > 7. GUI daemon maps grant X. > > 8. blkfront tries to revoke access to grant X and fails. Disaster > > ensues. > > > > What could be done to prevent this race? Right now all of the > > approaches I can think of are horribly backwards-incompatible. They > > require replacing grant IDs with some sort of handle, and requiring > > userspace to pass these handles to ioctls. It is also possible that > > netfront and blkfront could race against each other in a way that causes > > this, though I suspect that race would be much harder to trigger. > > > > This has happened more than once so it is not a fluke due to e.g. cosmic > > rays or other random bit-flips. > > > > Marek, do you have any suggestions? > > To me that sounds like the interface of the GUI is the culprit. > > The GUI agent in the guest should only free a grant, if it got a message > from the backend that it can do so. Just assuming to be able to free it > because it isn't in use currently is the broken assumption here. FWIW, I hit this issue twice already in this week CI run, while it never happened before. The difference compared to previous run is Linux 5.15.57 vs 5.15.61. The latter reports persistent grants disabled. The only related commits I see there are three commits indeed related to persistent grants: c98e956ef489 xen-blkfront: Apply 'feature_persistent' parameter when connect ef26b5d530d4 xen-blkback: Apply 'feature_persistent' parameter when connect 7304be4c985d xen-blkback: fix persistent grants negotiation But none of the commit messages suggests intentional disabling it without explicit request for doing so. I did not requested disabling it in toolstack (although I have set backend as "trusted" - XSA-403). I have confirmed it's the frontend version that matters. Running older frontend kernel with 5.15.61 backend results in persistent grants enabled (and both frontend and backend xenstore "feature-persistent" entries are "1" in this case). -- Best Regards, Marek Marczykowski-Górecki Invisible Things Lab [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: “Backend has not unmapped grant” errors 2022-08-24 0:20 ` Marek Marczykowski-Górecki @ 2022-08-24 6:02 ` Juergen Gross 2022-08-24 6:30 ` Jan Beulich 2022-08-24 17:44 ` SeongJae Park 2022-08-24 6:11 ` Juergen Gross 1 sibling, 2 replies; 18+ messages in thread From: Juergen Gross @ 2022-08-24 6:02 UTC (permalink / raw) To: Marek Marczykowski-Górecki Cc: Xen developer discussion, SeongJae Park, Maximilian Heyne [-- Attachment #1.1.1: Type: text/plain, Size: 4312 bytes --] On 24.08.22 02:20, Marek Marczykowski-Górecki wrote: > FWIW, I hit this issue twice already in this week CI run, while it never > happened before. The difference compared to previous run is Linux > 5.15.57 vs 5.15.61. The latter reports persistent grants disabled. The > only related commits I see there are three commits indeed related to > persistent grants: > > c98e956ef489 xen-blkfront: Apply 'feature_persistent' parameter when connect > ef26b5d530d4 xen-blkback: Apply 'feature_persistent' parameter when connect > 7304be4c985d xen-blkback: fix persistent grants negotiation > > But none of the commit messages suggests intentional disabling it > without explicit request for doing so. I did not requested disabling it > in toolstack (although I have set backend as "trusted" - XSA-403). > I have confirmed it's the frontend version that matters. Running older > frontend kernel with 5.15.61 backend results in persistent grants > enabled (and both frontend and backend xenstore "feature-persistent" > entries are "1" in this case). This is a mess. I think the main problem seems to be that the feature negotiation process isn't specified in a sane way. From the blkif.h header: Backend-side: * feature-persistent * Values: 0/1 (boolean) * Default Value: 0 * Notes: 7 * * A value of "1" indicates that the backend can keep the grants used * by the frontend driver mapped, so the same set of grants should be * used in all transactions. The maximum number of grants the backend * can map persistently depends on the implementation, but ideally it * should be RING_SIZE * BLKIF_MAX_SEGMENTS_PER_REQUEST. Using this * feature the backend doesn't need to unmap each grant, preventing * costly TLB flushes. The backend driver should only map grants * persistently if the frontend supports it. If a backend driver chooses * to use the persistent protocol when the frontend doesn't support it, * it will probably hit the maximum number of persistently mapped grants * (due to the fact that the frontend won't be reusing the same grants), * and fall back to non-persistent mode. Backend implementations may * shrink or expand the number of persistently mapped grants without * notifying the frontend depending on memory constraints (this might * cause a performance degradation). Frontend-side: * feature-persistent * Values: 0/1 (boolean) * Default Value: 0 * Notes: 7, 8, 9 * * A value of "1" indicates that the frontend will reuse the same grants * for all transactions, allowing the backend to map them with write * access (even when it should be read-only). If the frontend hits the * maximum number of allowed persistently mapped grants, it can fallback * to non persistent mode. This will cause a performance degradation, * since the the backend driver will still try to map those grants * persistently. Since the persistent grants protocol is compatible with * the previous protocol, a frontend driver can choose to work in * persistent mode even when the backend doesn't support it. Those definitions don't make clear, which side is the one to decide whether the feature should be used or not. In my understanding the related drivers should just advertise their setting (the _ability_ to use the feature), and it should be used only if both sides have written a "1". With above patches applied, the frontend will set 'feature-persistent' in Xenstore only, if the backend has done so, but the backend will set it only, if the frontend has done it. This results in persistent grants always being disabled. This is wrong, as the value written should not reflect the current state of the interface. That state should be set according to both sides' value, probably a cached one on the blkback side (using a new flag for caching it, not the current state). The blkif.h comments should be updated to make it clear that the values in Xenstore don't reflect the state of the connection, but the availability of the feature in the related driver. Comments? Juergen [-- Attachment #1.1.2: OpenPGP public key --] [-- Type: application/pgp-keys, Size: 3149 bytes --] [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 495 bytes --] ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: “Backend has not unmapped grant” errors 2022-08-24 6:02 ` Juergen Gross @ 2022-08-24 6:30 ` Jan Beulich 2022-08-24 6:36 ` Juergen Gross 2022-08-24 17:44 ` SeongJae Park 1 sibling, 1 reply; 18+ messages in thread From: Jan Beulich @ 2022-08-24 6:30 UTC (permalink / raw) To: Juergen Gross Cc: Xen developer discussion, SeongJae Park, Maximilian Heyne, Marek Marczykowski-Górecki On 24.08.2022 08:02, Juergen Gross wrote: > The blkif.h comments should be updated to make it clear that the values in > Xenstore don't reflect the state of the connection, but the availability of > the feature in the related driver. Isn't that implied for all the feature-* leaves? I certainly don't mind it being spelled out, but I don't think there's any real ambiguity here. Jan ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: “Backend has not unmapped grant” errors 2022-08-24 6:30 ` Jan Beulich @ 2022-08-24 6:36 ` Juergen Gross 2022-08-24 6:40 ` Jan Beulich 0 siblings, 1 reply; 18+ messages in thread From: Juergen Gross @ 2022-08-24 6:36 UTC (permalink / raw) To: Jan Beulich Cc: Xen developer discussion, SeongJae Park, Maximilian Heyne, Marek Marczykowski-Górecki [-- Attachment #1.1.1: Type: text/plain, Size: 760 bytes --] On 24.08.22 08:30, Jan Beulich wrote: > On 24.08.2022 08:02, Juergen Gross wrote: >> The blkif.h comments should be updated to make it clear that the values in >> Xenstore don't reflect the state of the connection, but the availability of >> the feature in the related driver. > > Isn't that implied for all the feature-* leaves? I certainly don't mind it > being spelled out, but I don't think there's any real ambiguity here. I think it should spelled out explicitly, maybe in the general paragraph about feature negotiation. To me especially the phrasing on the frontend side was reading as if a "1" would indicate the feature to be actively used: "A value of "1" indicates that the frontend will reuse the same grants ..." Juergen [-- Attachment #1.1.2: OpenPGP public key --] [-- Type: application/pgp-keys, Size: 3149 bytes --] [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 495 bytes --] ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: “Backend has not unmapped grant” errors 2022-08-24 6:36 ` Juergen Gross @ 2022-08-24 6:40 ` Jan Beulich 0 siblings, 0 replies; 18+ messages in thread From: Jan Beulich @ 2022-08-24 6:40 UTC (permalink / raw) To: Juergen Gross Cc: Xen developer discussion, SeongJae Park, Maximilian Heyne, Marek Marczykowski-Górecki On 24.08.2022 08:36, Juergen Gross wrote: > On 24.08.22 08:30, Jan Beulich wrote: >> On 24.08.2022 08:02, Juergen Gross wrote: >>> The blkif.h comments should be updated to make it clear that the values in >>> Xenstore don't reflect the state of the connection, but the availability of >>> the feature in the related driver. >> >> Isn't that implied for all the feature-* leaves? I certainly don't mind it >> being spelled out, but I don't think there's any real ambiguity here. > > I think it should spelled out explicitly, maybe in the general paragraph > about feature negotiation. > > To me especially the phrasing on the frontend side was reading as if a "1" > would indicate the feature to be actively used: > > "A value of "1" indicates that the frontend will reuse the same grants ..." Hmm, yes, that's certainly wording worth of improving (regardless of any addition to the general paragraph). Jan ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: “Backend has not unmapped grant” errors 2022-08-24 6:02 ` Juergen Gross 2022-08-24 6:30 ` Jan Beulich @ 2022-08-24 17:44 ` SeongJae Park 2022-08-24 20:38 ` SeongJae Park 1 sibling, 1 reply; 18+ messages in thread From: SeongJae Park @ 2022-08-24 17:44 UTC (permalink / raw) To: Juergen Gross Cc: Marek Marczykowski-Górecki, Xen developer discussion, SeongJae Park, Maximilian Heyne Hello, On Wed, 24 Aug 2022 08:02:40 +0200 Juergen Gross <jgross@suse.com> wrote: > > [-- Attachment #1.1.1: Type: text/plain, Size: 4312 bytes --] > > On 24.08.22 02:20, Marek Marczykowski-Górecki wrote: > > FWIW, I hit this issue twice already in this week CI run, while it never > > happened before. The difference compared to previous run is Linux > > 5.15.57 vs 5.15.61. The latter reports persistent grants disabled. The > > only related commits I see there are three commits indeed related to > > persistent grants: > > > > c98e956ef489 xen-blkfront: Apply 'feature_persistent' parameter when connect > > ef26b5d530d4 xen-blkback: Apply 'feature_persistent' parameter when connect > > 7304be4c985d xen-blkback: fix persistent grants negotiation > > > > But none of the commit messages suggests intentional disabling it > > without explicit request for doing so. I did not requested disabling it > > in toolstack (although I have set backend as "trusted" - XSA-403). > > I have confirmed it's the frontend version that matters. Running older > > frontend kernel with 5.15.61 backend results in persistent grants > > enabled (and both frontend and backend xenstore "feature-persistent" > > entries are "1" in this case). > > This is a mess. > > I think the main problem seems to be that the feature negotiation process > isn't specified in a sane way. > > From the blkif.h header: > > Backend-side: > * feature-persistent > * Values: 0/1 (boolean) > * Default Value: 0 > * Notes: 7 > * > * A value of "1" indicates that the backend can keep the grants used > * by the frontend driver mapped, so the same set of grants should be > * used in all transactions. The maximum number of grants the backend > * can map persistently depends on the implementation, but ideally it > * should be RING_SIZE * BLKIF_MAX_SEGMENTS_PER_REQUEST. Using this > * feature the backend doesn't need to unmap each grant, preventing > * costly TLB flushes. The backend driver should only map grants > * persistently if the frontend supports it. If a backend driver chooses > * to use the persistent protocol when the frontend doesn't support it, > * it will probably hit the maximum number of persistently mapped grants > * (due to the fact that the frontend won't be reusing the same grants), > * and fall back to non-persistent mode. Backend implementations may > * shrink or expand the number of persistently mapped grants without > * notifying the frontend depending on memory constraints (this might > * cause a performance degradation). > > Frontend-side: > * feature-persistent > * Values: 0/1 (boolean) > * Default Value: 0 > * Notes: 7, 8, 9 > * > * A value of "1" indicates that the frontend will reuse the same grants > * for all transactions, allowing the backend to map them with write > * access (even when it should be read-only). If the frontend hits the > * maximum number of allowed persistently mapped grants, it can fallback > * to non persistent mode. This will cause a performance degradation, > * since the the backend driver will still try to map those grants > * persistently. Since the persistent grants protocol is compatible with > * the previous protocol, a frontend driver can choose to work in > * persistent mode even when the backend doesn't support it. > > Those definitions don't make clear, which side is the one to decide whether > the feature should be used or not. In my understanding the related drivers > should just advertise their setting (the _ability_ to use the feature), and > it should be used only if both sides have written a "1". > > With above patches applied, the frontend will set 'feature-persistent' in > Xenstore only, if the backend has done so, but the backend will set it > only, if the frontend has done it. This results in persistent grants > always being disabled. Sorry for making the mess, and thank you for the kind report and detailed explanation of the problem. > > This is wrong, as the value written should not reflect the current state > of the interface. That state should be set according to both sides' value, > probably a cached one on the blkback side (using a new flag for caching it, > not the current state). Agreed. So, I think the issue comes from the fact that we are using one field, which was a place for saving only the negotiation result, for yet another purpose: caching of the parameter value. As a result, the advertisement, which should follow only the parameter value, becomes inconsistent. How about simply adding another field for the caching purpose, so that the advertisation could be done regardless of the negotiation? For example: diff --git a/drivers/block/xen-blkback/common.h b/drivers/block/xen-blkback/common.h index bda5c815e441..a28473470e66 100644 --- a/drivers/block/xen-blkback/common.h +++ b/drivers/block/xen-blkback/common.h @@ -226,6 +226,9 @@ struct xen_vbd { sector_t size; unsigned int flush_support:1; unsigned int discard_secure:1; + /* Connect-time cached feature_persistent parameter value */ + unsigned int feature_gnt_persistent_parm:1; + /* Persistent grants feature negotiation result */ unsigned int feature_gnt_persistent:1; unsigned int overflow_max_grants:1; }; diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c index ee7ad2fb432d..c0227dfa4688 100644 --- a/drivers/block/xen-blkback/xenbus.c +++ b/drivers/block/xen-blkback/xenbus.c @@ -907,7 +907,7 @@ static void connect(struct backend_info *be) xen_blkbk_barrier(xbt, be, be->blkif->vbd.flush_support); err = xenbus_printf(xbt, dev->nodename, "feature-persistent", "%u", - be->blkif->vbd.feature_gnt_persistent); + be->blkif->vbd.feature_gnt_persistent_parm); if (err) { xenbus_dev_fatal(dev, err, "writing %s/feature-persistent", dev->nodename); @@ -1085,7 +1085,9 @@ static int connect_ring(struct backend_info *be) return -ENOSYS; } - blkif->vbd.feature_gnt_persistent = feature_persistent && + blkif->vbd.feature_gnt_persistent_parm = feature_persistent; + blkif->vbd.feature_gnt_persistent = + blkif->vbd.feature_gnt_persistent_parm && xenbus_read_unsigned(dev->otherend, "feature-persistent", 0); blkif->vbd.overflow_max_grants = 0; diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c index 8e56e69fb4c4..dfae08115450 100644 --- a/drivers/block/xen-blkfront.c +++ b/drivers/block/xen-blkfront.c @@ -213,6 +213,9 @@ struct blkfront_info unsigned int feature_fua:1; unsigned int feature_discard:1; unsigned int feature_secdiscard:1; + /* Connect-time cached feature_persistent parameter */ + unsigned int feature_persistent_parm:1; + /* Persistent grants feature negotiation result */ unsigned int feature_persistent:1; unsigned int bounce:1; unsigned int discard_granularity; @@ -1848,7 +1851,7 @@ static int talk_to_blkback(struct xenbus_device *dev, goto abort_transaction; } err = xenbus_printf(xbt, dev->nodename, "feature-persistent", "%u", - info->feature_persistent); + info->feature_persistent_parm); if (err) dev_warn(&dev->dev, "writing persistent grants feature to xenbus"); @@ -2281,7 +2284,8 @@ static void blkfront_gather_backend_features(struct blkfront_info *info) if (xenbus_read_unsigned(info->xbdev->otherend, "feature-discard", 0)) blkfront_setup_discard(info); - if (feature_persistent) + info->feature_persistent_parm = feature_persistent; + if (info->feature_persistent_parm) info->feature_persistent = !!xenbus_read_unsigned(info->xbdev->otherend, "feature-persistent", 0); Thanks, SJ > > The blkif.h comments should be updated to make it clear that the values in > Xenstore don't reflect the state of the connection, but the availability of > the feature in the related driver. > > Comments? > > > Juergen ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: “Backend has not unmapped grant” errors 2022-08-24 17:44 ` SeongJae Park @ 2022-08-24 20:38 ` SeongJae Park 2022-08-25 6:20 ` Juergen Gross 0 siblings, 1 reply; 18+ messages in thread From: SeongJae Park @ 2022-08-24 20:38 UTC (permalink / raw) To: SeongJae Park Cc: Juergen Gross, roger.pau, Marek Marczykowski-Górecki, Xen developer discussion, SeongJae Park, Maximilian Heyne + Roger On Wed, 24 Aug 2022 17:44:42 +0000 SeongJae Park <sj@kernel.org> wrote: > Hello, > > On Wed, 24 Aug 2022 08:02:40 +0200 Juergen Gross <jgross@suse.com> wrote: > > > > > [-- Attachment #1.1.1: Type: text/plain, Size: 4312 bytes --] > > > > On 24.08.22 02:20, Marek Marczykowski-Górecki wrote: > > > FWIW, I hit this issue twice already in this week CI run, while it never > > > happened before. The difference compared to previous run is Linux > > > 5.15.57 vs 5.15.61. The latter reports persistent grants disabled. The > > > only related commits I see there are three commits indeed related to > > > persistent grants: > > > > > > c98e956ef489 xen-blkfront: Apply 'feature_persistent' parameter when connect > > > ef26b5d530d4 xen-blkback: Apply 'feature_persistent' parameter when connect > > > 7304be4c985d xen-blkback: fix persistent grants negotiation > > > > > > But none of the commit messages suggests intentional disabling it > > > without explicit request for doing so. I did not requested disabling it > > > in toolstack (although I have set backend as "trusted" - XSA-403). > > > I have confirmed it's the frontend version that matters. Running older > > > frontend kernel with 5.15.61 backend results in persistent grants > > > enabled (and both frontend and backend xenstore "feature-persistent" > > > entries are "1" in this case). > > > > This is a mess. > > > > I think the main problem seems to be that the feature negotiation process > > isn't specified in a sane way. > > > > From the blkif.h header: > > > > Backend-side: > > * feature-persistent > > * Values: 0/1 (boolean) > > * Default Value: 0 > > * Notes: 7 > > * > > * A value of "1" indicates that the backend can keep the grants used > > * by the frontend driver mapped, so the same set of grants should be > > * used in all transactions. The maximum number of grants the backend > > * can map persistently depends on the implementation, but ideally it > > * should be RING_SIZE * BLKIF_MAX_SEGMENTS_PER_REQUEST. Using this > > * feature the backend doesn't need to unmap each grant, preventing > > * costly TLB flushes. The backend driver should only map grants > > * persistently if the frontend supports it. If a backend driver chooses > > * to use the persistent protocol when the frontend doesn't support it, > > * it will probably hit the maximum number of persistently mapped grants > > * (due to the fact that the frontend won't be reusing the same grants), > > * and fall back to non-persistent mode. Backend implementations may > > * shrink or expand the number of persistently mapped grants without > > * notifying the frontend depending on memory constraints (this might > > * cause a performance degradation). > > > > Frontend-side: > > * feature-persistent > > * Values: 0/1 (boolean) > > * Default Value: 0 > > * Notes: 7, 8, 9 > > * > > * A value of "1" indicates that the frontend will reuse the same grants > > * for all transactions, allowing the backend to map them with write > > * access (even when it should be read-only). If the frontend hits the > > * maximum number of allowed persistently mapped grants, it can fallback > > * to non persistent mode. This will cause a performance degradation, > > * since the the backend driver will still try to map those grants > > * persistently. Since the persistent grants protocol is compatible with > > * the previous protocol, a frontend driver can choose to work in > > * persistent mode even when the backend doesn't support it. > > > > Those definitions don't make clear, which side is the one to decide whether > > the feature should be used or not. In my understanding the related drivers > > should just advertise their setting (the _ability_ to use the feature), and > > it should be used only if both sides have written a "1". > > > > With above patches applied, the frontend will set 'feature-persistent' in > > Xenstore only, if the backend has done so, but the backend will set it > > only, if the frontend has done it. This results in persistent grants > > always being disabled. > > Sorry for making the mess, and thank you for the kind report and detailed > explanation of the problem. > > > > > This is wrong, as the value written should not reflect the current state > > of the interface. That state should be set according to both sides' value, > > probably a cached one on the blkback side (using a new flag for caching it, > > not the current state). > > Agreed. So, I think the issue comes from the fact that we are using one field, > which was a place for saving only the negotiation result, for yet another > purpose: caching of the parameter value. As a result, the advertisement, which > should follow only the parameter value, becomes inconsistent. > > How about simply adding another field for the caching purpose, so that the > advertisation could be done regardless of the negotiation? For example: > > diff --git a/drivers/block/xen-blkback/common.h b/drivers/block/xen-blkback/common.h > index bda5c815e441..a28473470e66 100644 > --- a/drivers/block/xen-blkback/common.h > +++ b/drivers/block/xen-blkback/common.h > @@ -226,6 +226,9 @@ struct xen_vbd { > sector_t size; > unsigned int flush_support:1; > unsigned int discard_secure:1; > + /* Connect-time cached feature_persistent parameter value */ > + unsigned int feature_gnt_persistent_parm:1; > + /* Persistent grants feature negotiation result */ > unsigned int feature_gnt_persistent:1; > unsigned int overflow_max_grants:1; > }; > diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c > index ee7ad2fb432d..c0227dfa4688 100644 > --- a/drivers/block/xen-blkback/xenbus.c > +++ b/drivers/block/xen-blkback/xenbus.c > @@ -907,7 +907,7 @@ static void connect(struct backend_info *be) > xen_blkbk_barrier(xbt, be, be->blkif->vbd.flush_support); > > err = xenbus_printf(xbt, dev->nodename, "feature-persistent", "%u", > - be->blkif->vbd.feature_gnt_persistent); > + be->blkif->vbd.feature_gnt_persistent_parm); > if (err) { > xenbus_dev_fatal(dev, err, "writing %s/feature-persistent", > dev->nodename); > @@ -1085,7 +1085,9 @@ static int connect_ring(struct backend_info *be) > return -ENOSYS; > } > > - blkif->vbd.feature_gnt_persistent = feature_persistent && > + blkif->vbd.feature_gnt_persistent_parm = feature_persistent; > + blkif->vbd.feature_gnt_persistent = > + blkif->vbd.feature_gnt_persistent_parm && > xenbus_read_unsigned(dev->otherend, "feature-persistent", 0); > > blkif->vbd.overflow_max_grants = 0; > diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c > index 8e56e69fb4c4..dfae08115450 100644 > --- a/drivers/block/xen-blkfront.c > +++ b/drivers/block/xen-blkfront.c > @@ -213,6 +213,9 @@ struct blkfront_info > unsigned int feature_fua:1; > unsigned int feature_discard:1; > unsigned int feature_secdiscard:1; > + /* Connect-time cached feature_persistent parameter */ > + unsigned int feature_persistent_parm:1; > + /* Persistent grants feature negotiation result */ > unsigned int feature_persistent:1; > unsigned int bounce:1; > unsigned int discard_granularity; > @@ -1848,7 +1851,7 @@ static int talk_to_blkback(struct xenbus_device *dev, > goto abort_transaction; > } > err = xenbus_printf(xbt, dev->nodename, "feature-persistent", "%u", > - info->feature_persistent); > + info->feature_persistent_parm); > if (err) > dev_warn(&dev->dev, > "writing persistent grants feature to xenbus"); > @@ -2281,7 +2284,8 @@ static void blkfront_gather_backend_features(struct blkfront_info *info) > if (xenbus_read_unsigned(info->xbdev->otherend, "feature-discard", 0)) > blkfront_setup_discard(info); > > - if (feature_persistent) > + info->feature_persistent_parm = feature_persistent; > + if (info->feature_persistent_parm) > info->feature_persistent = > !!xenbus_read_unsigned(info->xbdev->otherend, > "feature-persistent", 0); > > > Thanks, > SJ > > > > > The blkif.h comments should be updated to make it clear that the values in > > Xenstore don't reflect the state of the connection, but the availability of > > the feature in the related driver. > > > > Comments? > > > > > > Juergen ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: “Backend has not unmapped grant” errors 2022-08-24 20:38 ` SeongJae Park @ 2022-08-25 6:20 ` Juergen Gross 2022-08-25 16:22 ` SeongJae Park 0 siblings, 1 reply; 18+ messages in thread From: Juergen Gross @ 2022-08-25 6:20 UTC (permalink / raw) To: SeongJae Park Cc: roger.pau, Marek Marczykowski-Górecki, Xen developer discussion, SeongJae Park, Maximilian Heyne [-- Attachment #1.1.1: Type: text/plain, Size: 8724 bytes --] On 24.08.22 22:38, SeongJae Park wrote: > + Roger > > On Wed, 24 Aug 2022 17:44:42 +0000 SeongJae Park <sj@kernel.org> wrote: > >> Hello, >> >> On Wed, 24 Aug 2022 08:02:40 +0200 Juergen Gross <jgross@suse.com> wrote: >> >>> >>> [-- Attachment #1.1.1: Type: text/plain, Size: 4312 bytes --] >>> >>> On 24.08.22 02:20, Marek Marczykowski-Górecki wrote: >>>> FWIW, I hit this issue twice already in this week CI run, while it never >>>> happened before. The difference compared to previous run is Linux >>>> 5.15.57 vs 5.15.61. The latter reports persistent grants disabled. The >>>> only related commits I see there are three commits indeed related to >>>> persistent grants: >>>> >>>> c98e956ef489 xen-blkfront: Apply 'feature_persistent' parameter when connect >>>> ef26b5d530d4 xen-blkback: Apply 'feature_persistent' parameter when connect >>>> 7304be4c985d xen-blkback: fix persistent grants negotiation >>>> >>>> But none of the commit messages suggests intentional disabling it >>>> without explicit request for doing so. I did not requested disabling it >>>> in toolstack (although I have set backend as "trusted" - XSA-403). >>>> I have confirmed it's the frontend version that matters. Running older >>>> frontend kernel with 5.15.61 backend results in persistent grants >>>> enabled (and both frontend and backend xenstore "feature-persistent" >>>> entries are "1" in this case). >>> >>> This is a mess. >>> >>> I think the main problem seems to be that the feature negotiation process >>> isn't specified in a sane way. >>> >>> From the blkif.h header: >>> >>> Backend-side: >>> * feature-persistent >>> * Values: 0/1 (boolean) >>> * Default Value: 0 >>> * Notes: 7 >>> * >>> * A value of "1" indicates that the backend can keep the grants used >>> * by the frontend driver mapped, so the same set of grants should be >>> * used in all transactions. The maximum number of grants the backend >>> * can map persistently depends on the implementation, but ideally it >>> * should be RING_SIZE * BLKIF_MAX_SEGMENTS_PER_REQUEST. Using this >>> * feature the backend doesn't need to unmap each grant, preventing >>> * costly TLB flushes. The backend driver should only map grants >>> * persistently if the frontend supports it. If a backend driver chooses >>> * to use the persistent protocol when the frontend doesn't support it, >>> * it will probably hit the maximum number of persistently mapped grants >>> * (due to the fact that the frontend won't be reusing the same grants), >>> * and fall back to non-persistent mode. Backend implementations may >>> * shrink or expand the number of persistently mapped grants without >>> * notifying the frontend depending on memory constraints (this might >>> * cause a performance degradation). >>> >>> Frontend-side: >>> * feature-persistent >>> * Values: 0/1 (boolean) >>> * Default Value: 0 >>> * Notes: 7, 8, 9 >>> * >>> * A value of "1" indicates that the frontend will reuse the same grants >>> * for all transactions, allowing the backend to map them with write >>> * access (even when it should be read-only). If the frontend hits the >>> * maximum number of allowed persistently mapped grants, it can fallback >>> * to non persistent mode. This will cause a performance degradation, >>> * since the the backend driver will still try to map those grants >>> * persistently. Since the persistent grants protocol is compatible with >>> * the previous protocol, a frontend driver can choose to work in >>> * persistent mode even when the backend doesn't support it. >>> >>> Those definitions don't make clear, which side is the one to decide whether >>> the feature should be used or not. In my understanding the related drivers >>> should just advertise their setting (the _ability_ to use the feature), and >>> it should be used only if both sides have written a "1". >>> >>> With above patches applied, the frontend will set 'feature-persistent' in >>> Xenstore only, if the backend has done so, but the backend will set it >>> only, if the frontend has done it. This results in persistent grants >>> always being disabled. >> >> Sorry for making the mess, and thank you for the kind report and detailed >> explanation of the problem. >> >>> >>> This is wrong, as the value written should not reflect the current state >>> of the interface. That state should be set according to both sides' value, >>> probably a cached one on the blkback side (using a new flag for caching it, >>> not the current state). >> >> Agreed. So, I think the issue comes from the fact that we are using one field, >> which was a place for saving only the negotiation result, for yet another >> purpose: caching of the parameter value. As a result, the advertisement, which >> should follow only the parameter value, becomes inconsistent. >> >> How about simply adding another field for the caching purpose, so that the >> advertisation could be done regardless of the negotiation? For example: >> >> diff --git a/drivers/block/xen-blkback/common.h b/drivers/block/xen-blkback/common.h >> index bda5c815e441..a28473470e66 100644 >> --- a/drivers/block/xen-blkback/common.h >> +++ b/drivers/block/xen-blkback/common.h >> @@ -226,6 +226,9 @@ struct xen_vbd { >> sector_t size; >> unsigned int flush_support:1; >> unsigned int discard_secure:1; >> + /* Connect-time cached feature_persistent parameter value */ >> + unsigned int feature_gnt_persistent_parm:1; >> + /* Persistent grants feature negotiation result */ >> unsigned int feature_gnt_persistent:1; >> unsigned int overflow_max_grants:1; >> }; >> diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c >> index ee7ad2fb432d..c0227dfa4688 100644 >> --- a/drivers/block/xen-blkback/xenbus.c >> +++ b/drivers/block/xen-blkback/xenbus.c >> @@ -907,7 +907,7 @@ static void connect(struct backend_info *be) >> xen_blkbk_barrier(xbt, be, be->blkif->vbd.flush_support); >> >> err = xenbus_printf(xbt, dev->nodename, "feature-persistent", "%u", >> - be->blkif->vbd.feature_gnt_persistent); >> + be->blkif->vbd.feature_gnt_persistent_parm); >> if (err) { >> xenbus_dev_fatal(dev, err, "writing %s/feature-persistent", >> dev->nodename); >> @@ -1085,7 +1085,9 @@ static int connect_ring(struct backend_info *be) >> return -ENOSYS; >> } >> >> - blkif->vbd.feature_gnt_persistent = feature_persistent && >> + blkif->vbd.feature_gnt_persistent_parm = feature_persistent; >> + blkif->vbd.feature_gnt_persistent = >> + blkif->vbd.feature_gnt_persistent_parm && >> xenbus_read_unsigned(dev->otherend, "feature-persistent", 0); >> >> blkif->vbd.overflow_max_grants = 0; >> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c >> index 8e56e69fb4c4..dfae08115450 100644 >> --- a/drivers/block/xen-blkfront.c >> +++ b/drivers/block/xen-blkfront.c >> @@ -213,6 +213,9 @@ struct blkfront_info >> unsigned int feature_fua:1; >> unsigned int feature_discard:1; >> unsigned int feature_secdiscard:1; >> + /* Connect-time cached feature_persistent parameter */ >> + unsigned int feature_persistent_parm:1; >> + /* Persistent grants feature negotiation result */ >> unsigned int feature_persistent:1; >> unsigned int bounce:1; >> unsigned int discard_granularity; >> @@ -1848,7 +1851,7 @@ static int talk_to_blkback(struct xenbus_device *dev, >> goto abort_transaction; >> } >> err = xenbus_printf(xbt, dev->nodename, "feature-persistent", "%u", >> - info->feature_persistent); >> + info->feature_persistent_parm); >> if (err) >> dev_warn(&dev->dev, >> "writing persistent grants feature to xenbus"); >> @@ -2281,7 +2284,8 @@ static void blkfront_gather_backend_features(struct blkfront_info *info) >> if (xenbus_read_unsigned(info->xbdev->otherend, "feature-discard", 0)) >> blkfront_setup_discard(info); >> >> - if (feature_persistent) >> + info->feature_persistent_parm = feature_persistent; >> + if (info->feature_persistent_parm) >> info->feature_persistent = >> !!xenbus_read_unsigned(info->xbdev->otherend, >> "feature-persistent", 0); >> Yes, this is much better IMO. Could you please send it as two proper patches (one for each driver) with the correct "Fixes:" tags? Juergen [-- Attachment #1.1.2: OpenPGP public key --] [-- Type: application/pgp-keys, Size: 3149 bytes --] [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 495 bytes --] ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: “Backend has not unmapped grant” errors 2022-08-25 6:20 ` Juergen Gross @ 2022-08-25 16:22 ` SeongJae Park 0 siblings, 0 replies; 18+ messages in thread From: SeongJae Park @ 2022-08-25 16:22 UTC (permalink / raw) To: Juergen Gross Cc: SeongJae Park, roger.pau, Marek Marczykowski-Górecki, Xen developer discussion, SeongJae Park, Maximilian Heyne Hi Juergen, Thank you for the quick and nice reply! On Thu, 25 Aug 2022 08:20:33 +0200 Juergen Gross <jgross@suse.com> wrote: > [...] > > Could you please send it as two proper patches (one for each driver) with > the correct "Fixes:" tags? Sure, just posted: https://lore.kernel.org/xen-devel/20220825161511.94922-2-sj@kernel.org/ Thanks, SJ > > > Juergen ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: “Backend has not unmapped grant” errors 2022-08-24 0:20 ` Marek Marczykowski-Górecki 2022-08-24 6:02 ` Juergen Gross @ 2022-08-24 6:11 ` Juergen Gross 2022-08-28 5:15 ` Demi Marie Obenour 1 sibling, 1 reply; 18+ messages in thread From: Juergen Gross @ 2022-08-24 6:11 UTC (permalink / raw) To: Marek Marczykowski-Górecki; +Cc: Xen developer discussion [-- Attachment #1.1.1: Type: text/plain, Size: 2359 bytes --] On 24.08.22 02:20, Marek Marczykowski-Górecki wrote: > On Tue, Aug 23, 2022 at 09:48:57AM +0200, Juergen Gross wrote: >> On 23.08.22 09:40, Demi Marie Obenour wrote: >>> I recently had a VM’s /dev/xvdb stop working with a “backend has not >>> unmapped grant” error. Since /dev/xvdb was the VM’s private volume, >>> that rendered the VM effectively useless. I had to kill it with >>> qvm-kill. >>> >>> The backend of /dev/xvdb is dom0, so a malicious backend is clearly not >>> the cause of this. I believe the actual cause is a race condition, such >>> as the following: >>> >>> 1. GUI agent in VM allocates grant X. >>> 2. GUI agent tells GUI daemon in dom0 to map X. >>> 3. GUI agent frees grant X. >>> 4. blkfront allocates grant X and passes it to dom0. >>> 5. dom0’s blkback maps grant X. >>> 6. blkback unmaps grant X. >>> 7. GUI daemon maps grant X. >>> 8. blkfront tries to revoke access to grant X and fails. Disaster >>> ensues. >>> >>> What could be done to prevent this race? Right now all of the >>> approaches I can think of are horribly backwards-incompatible. They >>> require replacing grant IDs with some sort of handle, and requiring >>> userspace to pass these handles to ioctls. It is also possible that >>> netfront and blkfront could race against each other in a way that causes >>> this, though I suspect that race would be much harder to trigger. >>> >>> This has happened more than once so it is not a fluke due to e.g. cosmic >>> rays or other random bit-flips. >>> >>> Marek, do you have any suggestions? >> >> To me that sounds like the interface of the GUI is the culprit. >> >> The GUI agent in the guest should only free a grant, if it got a message >> from the backend that it can do so. Just assuming to be able to free it >> because it isn't in use currently is the broken assumption here. > > FWIW, I hit this issue twice already in this week CI run, while it never > happened before. The difference compared to previous run is Linux > 5.15.57 vs 5.15.61. The latter reports persistent grants disabled. I think this additional bug is just triggering the race in the GUI interface more easily, as blkfront will allocate new grants with a much higher frequency. So fixing the persistent grant issue will just paper over the real issue. Juergen [-- Attachment #1.1.2: OpenPGP public key --] [-- Type: application/pgp-keys, Size: 3149 bytes --] [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 495 bytes --] ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: “Backend has not unmapped grant” errors 2022-08-24 6:11 ` Juergen Gross @ 2022-08-28 5:15 ` Demi Marie Obenour 2022-08-29 12:55 ` Juergen Gross 0 siblings, 1 reply; 18+ messages in thread From: Demi Marie Obenour @ 2022-08-28 5:15 UTC (permalink / raw) To: Juergen Gross, Marek Marczykowski-Górecki; +Cc: Xen developer discussion [-- Attachment #1: Type: text/plain, Size: 4024 bytes --] On Wed, Aug 24, 2022 at 08:11:56AM +0200, Juergen Gross wrote: > On 24.08.22 02:20, Marek Marczykowski-Górecki wrote: > > On Tue, Aug 23, 2022 at 09:48:57AM +0200, Juergen Gross wrote: > > > On 23.08.22 09:40, Demi Marie Obenour wrote: > > > > I recently had a VM’s /dev/xvdb stop working with a “backend has not > > > > unmapped grant” error. Since /dev/xvdb was the VM’s private volume, > > > > that rendered the VM effectively useless. I had to kill it with > > > > qvm-kill. > > > > > > > > The backend of /dev/xvdb is dom0, so a malicious backend is clearly not > > > > the cause of this. I believe the actual cause is a race condition, such > > > > as the following: > > > > > > > > 1. GUI agent in VM allocates grant X. > > > > 2. GUI agent tells GUI daemon in dom0 to map X. > > > > 3. GUI agent frees grant X. > > > > 4. blkfront allocates grant X and passes it to dom0. > > > > 5. dom0’s blkback maps grant X. > > > > 6. blkback unmaps grant X. > > > > 7. GUI daemon maps grant X. > > > > 8. blkfront tries to revoke access to grant X and fails. Disaster > > > > ensues. > > > > > > > > What could be done to prevent this race? Right now all of the > > > > approaches I can think of are horribly backwards-incompatible. They > > > > require replacing grant IDs with some sort of handle, and requiring > > > > userspace to pass these handles to ioctls. It is also possible that > > > > netfront and blkfront could race against each other in a way that causes > > > > this, though I suspect that race would be much harder to trigger. > > > > > > > > This has happened more than once so it is not a fluke due to e.g. cosmic > > > > rays or other random bit-flips. > > > > > > > > Marek, do you have any suggestions? > > > > > > To me that sounds like the interface of the GUI is the culprit. > > > > > > The GUI agent in the guest should only free a grant, if it got a message > > > from the backend that it can do so. Just assuming to be able to free it > > > because it isn't in use currently is the broken assumption here. > > > > FWIW, I hit this issue twice already in this week CI run, while it never > > happened before. The difference compared to previous run is Linux > > 5.15.57 vs 5.15.61. The latter reports persistent grants disabled. > > I think this additional bug is just triggering the race in the GUI > interface more easily, as blkfront will allocate new grants with a > much higher frequency. > > So fixing the persistent grant issue will just paper over the real > issue. Indeed so, but making the bug happen much less frequently is still a significant win for users. In the long term, there is one situation I do not have a good solution for: recovery from GUI agent crashes. If the GUI agent crashes, the kernel it is running under has two bad choices. Either the kernel can reclaim the grants, risking them being mapped at a later time by the GUI daemon, or it can leak them, which is bad for obvious reasons. I believe the current implementation makes the former choice. To fix this problem, I recommend the following changes: 1. Treat “backend has not unmapped grant” errors as non-fatal. The most likely cause is buggy userspace software, not an attempt to exploit XSA-396. Instead of disabling the device, just log a warning message and put the grant on the deferred queue. Even leaking the grant would be preferable to the current behavior, as disabling a block device typically leaves the VM unusable. 2. Ensure that the same grant being mapped twice is handled correctly. At least Linux is known to have bugs in this regard. 3. Provide a means for a domain to be notified by Xen whenever one of its grants is unmapped. Setting an event channel and writing to a shared ring would suffice. This would allow eliminating the kludgy deferred freeing mechanism. -- Sincerely, Demi Marie Obenour (she/her/hers) Invisible Things Lab [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: “Backend has not unmapped grant” errors 2022-08-28 5:15 ` Demi Marie Obenour @ 2022-08-29 12:55 ` Juergen Gross 2022-08-29 14:39 ` Marek Marczykowski-Górecki 2022-08-29 18:54 ` Demi Marie Obenour 0 siblings, 2 replies; 18+ messages in thread From: Juergen Gross @ 2022-08-29 12:55 UTC (permalink / raw) To: Demi Marie Obenour, Marek Marczykowski-Górecki Cc: Xen developer discussion [-- Attachment #1.1.1: Type: text/plain, Size: 5192 bytes --] On 28.08.22 07:15, Demi Marie Obenour wrote: > On Wed, Aug 24, 2022 at 08:11:56AM +0200, Juergen Gross wrote: >> On 24.08.22 02:20, Marek Marczykowski-Górecki wrote: >>> On Tue, Aug 23, 2022 at 09:48:57AM +0200, Juergen Gross wrote: >>>> On 23.08.22 09:40, Demi Marie Obenour wrote: >>>>> I recently had a VM’s /dev/xvdb stop working with a “backend has not >>>>> unmapped grant” error. Since /dev/xvdb was the VM’s private volume, >>>>> that rendered the VM effectively useless. I had to kill it with >>>>> qvm-kill. >>>>> >>>>> The backend of /dev/xvdb is dom0, so a malicious backend is clearly not >>>>> the cause of this. I believe the actual cause is a race condition, such >>>>> as the following: >>>>> >>>>> 1. GUI agent in VM allocates grant X. >>>>> 2. GUI agent tells GUI daemon in dom0 to map X. >>>>> 3. GUI agent frees grant X. >>>>> 4. blkfront allocates grant X and passes it to dom0. >>>>> 5. dom0’s blkback maps grant X. >>>>> 6. blkback unmaps grant X. >>>>> 7. GUI daemon maps grant X. >>>>> 8. blkfront tries to revoke access to grant X and fails. Disaster >>>>> ensues. >>>>> >>>>> What could be done to prevent this race? Right now all of the >>>>> approaches I can think of are horribly backwards-incompatible. They >>>>> require replacing grant IDs with some sort of handle, and requiring >>>>> userspace to pass these handles to ioctls. It is also possible that >>>>> netfront and blkfront could race against each other in a way that causes >>>>> this, though I suspect that race would be much harder to trigger. >>>>> >>>>> This has happened more than once so it is not a fluke due to e.g. cosmic >>>>> rays or other random bit-flips. >>>>> >>>>> Marek, do you have any suggestions? >>>> >>>> To me that sounds like the interface of the GUI is the culprit. >>>> >>>> The GUI agent in the guest should only free a grant, if it got a message >>>> from the backend that it can do so. Just assuming to be able to free it >>>> because it isn't in use currently is the broken assumption here. >>> >>> FWIW, I hit this issue twice already in this week CI run, while it never >>> happened before. The difference compared to previous run is Linux >>> 5.15.57 vs 5.15.61. The latter reports persistent grants disabled. >> >> I think this additional bug is just triggering the race in the GUI >> interface more easily, as blkfront will allocate new grants with a >> much higher frequency. >> >> So fixing the persistent grant issue will just paper over the real >> issue. > > Indeed so, but making the bug happen much less frequently is still a > significant win for users. Probably, yes. > In the long term, there is one situation I do not have a good solution > for: recovery from GUI agent crashes. If the GUI agent crashes, the > kernel it is running under has two bad choices. Either the kernel can > reclaim the grants, risking them being mapped at a later time by the GUI > daemon, or it can leak them, which is bad for obvious reasons. I > believe the current implementation makes the former choice. It does. I don't have enough information about the GUI architecture you are using. Which components are involved on the backend side, and which on the frontend side? Especially the responsibilities and communication regarding grants is important here. > To fix this problem, I recommend the following changes: > > 1. Treat “backend has not unmapped grant” errors as non-fatal. The most > likely cause is buggy userspace software, not an attempt to exploit > XSA-396. Instead of disabling the device, just log a warning message > and put the grant on the deferred queue. Even leaking the grant > would be preferable to the current behavior, as disabling a block > device typically leaves the VM unusable. Sorry, I don't agree. This is a major violation of the normal I/O architecture. Your reasoning with the disabled block device doesn't make much sense IMHO, as the mapped grant was due to a bad interface leading to another component using a grant it was not meant to use. Shutting down the block device is the right thing to do here, as data corruption might be happening. > 2. Ensure that the same grant being mapped twice is handled correctly. > At least Linux is known to have bugs in this regard. I agree that this should be repaired. > 3. Provide a means for a domain to be notified by Xen whenever one of > its grants is unmapped. Setting an event channel and writing to a > shared ring would suffice. This would allow eliminating the kludgy > deferred freeing mechanism. Interesting idea. I believe such an interface would need to be activated per grant, as otherwise performance could suffer a lot. There are still some unused bits in the grant flags, one could be used for that purpose. I'm not sure how often this would be used. In case it is only for the rare case of unexpectedly long mapped grant pages, a simple event might do the job, with the event handler just skimming through the pending unmaps to find the grants being available again. Juergen [-- Attachment #1.1.2: OpenPGP public key --] [-- Type: application/pgp-keys, Size: 3149 bytes --] [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 495 bytes --] ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: “Backend has not unmapped grant” errors 2022-08-29 12:55 ` Juergen Gross @ 2022-08-29 14:39 ` Marek Marczykowski-Górecki 2022-08-29 16:03 ` Juergen Gross 2022-08-29 18:32 ` Demi Marie Obenour 2022-08-29 18:54 ` Demi Marie Obenour 1 sibling, 2 replies; 18+ messages in thread From: Marek Marczykowski-Górecki @ 2022-08-29 14:39 UTC (permalink / raw) To: Juergen Gross; +Cc: Demi Marie Obenour, Xen developer discussion [-- Attachment #1: Type: text/plain, Size: 6324 bytes --] On Mon, Aug 29, 2022 at 02:55:55PM +0200, Juergen Gross wrote: > On 28.08.22 07:15, Demi Marie Obenour wrote: > > On Wed, Aug 24, 2022 at 08:11:56AM +0200, Juergen Gross wrote: > > > On 24.08.22 02:20, Marek Marczykowski-Górecki wrote: > > > > On Tue, Aug 23, 2022 at 09:48:57AM +0200, Juergen Gross wrote: > > > > > On 23.08.22 09:40, Demi Marie Obenour wrote: > > > > > > I recently had a VM’s /dev/xvdb stop working with a “backend has not > > > > > > unmapped grant” error. Since /dev/xvdb was the VM’s private volume, > > > > > > that rendered the VM effectively useless. I had to kill it with > > > > > > qvm-kill. > > > > > > > > > > > > The backend of /dev/xvdb is dom0, so a malicious backend is clearly not > > > > > > the cause of this. I believe the actual cause is a race condition, such > > > > > > as the following: > > > > > > > > > > > > 1. GUI agent in VM allocates grant X. > > > > > > 2. GUI agent tells GUI daemon in dom0 to map X. > > > > > > 3. GUI agent frees grant X. > > > > > > 4. blkfront allocates grant X and passes it to dom0. > > > > > > 5. dom0’s blkback maps grant X. > > > > > > 6. blkback unmaps grant X. > > > > > > 7. GUI daemon maps grant X. > > > > > > 8. blkfront tries to revoke access to grant X and fails. Disaster > > > > > > ensues. > > > > > > > > > > > > What could be done to prevent this race? Right now all of the > > > > > > approaches I can think of are horribly backwards-incompatible. They > > > > > > require replacing grant IDs with some sort of handle, and requiring > > > > > > userspace to pass these handles to ioctls. It is also possible that > > > > > > netfront and blkfront could race against each other in a way that causes > > > > > > this, though I suspect that race would be much harder to trigger. > > > > > > > > > > > > This has happened more than once so it is not a fluke due to e.g. cosmic > > > > > > rays or other random bit-flips. > > > > > > > > > > > > Marek, do you have any suggestions? > > > > > > > > > > To me that sounds like the interface of the GUI is the culprit. > > > > > > > > > > The GUI agent in the guest should only free a grant, if it got a message > > > > > from the backend that it can do so. Just assuming to be able to free it > > > > > because it isn't in use currently is the broken assumption here. > > > > > > > > FWIW, I hit this issue twice already in this week CI run, while it never > > > > happened before. The difference compared to previous run is Linux > > > > 5.15.57 vs 5.15.61. The latter reports persistent grants disabled. > > > > > > I think this additional bug is just triggering the race in the GUI > > > interface more easily, as blkfront will allocate new grants with a > > > > 1. Treat “backend has not unmapped grant” errors as non-fatal. The most > > likely cause is buggy userspace software, not an attempt to exploit > > XSA-396. Instead of disabling the device, just log a warning message > > > much higher frequency. > > > > > > So fixing the persistent grant issue will just paper over the real > > > issue. > > > > Indeed so, but making the bug happen much less frequently is still a > > significant win for users. > > Probably, yes. > > > In the long term, there is one situation I do not have a good solution > > for: recovery from GUI agent crashes. If the GUI agent crashes, the > > kernel it is running under has two bad choices. Either the kernel can > > reclaim the grants, risking them being mapped at a later time by the GUI > > daemon, or it can leak them, which is bad for obvious reasons. I > > believe the current implementation makes the former choice. > > It does. > > I don't have enough information about the GUI architecture you are using. > Which components are involved on the backend side, and which on the > frontend side? Especially the responsibilities and communication regarding > grants is important here. I'll limit the description to the relevant minimum here. The gui-agent(*) uses gntalloc to share framebuffers (they are allocated whenever an application within domU opens a window), then sends grant reference numbers over vchan to the gui-daemon (running in dom0 by default, but it can be also another domU). Then the gui-daemon(*) maps them. Later, when an application closes a window, the shared memory is unmapped, and gui-daemon is informed about it. Releasing grant refs is deferred by the kernel (until gui-daemon unmaps them). It may happen that unmapping on the gui-agent side will happen before gui-daemon maps them. We are modifying our GUI protocol to delay releasing grants on the user space side, to coordinate with gui-daemon (basically wait until gui-daemon confirms it unmapped them). This should fix the "normal" case. But if the gui-agent crashes just after sending grant refs, but before gui-daemon maps them, then the problem is still there. If they are immediately released by the kernel for others to use, we can hit the same issue again (for example blkfront using them, and then gui-daemon mapping them). I don't see race-free method for solving this with the current API. GUI daemon can notice when such situation happens (by checking if gui-agent is still alive after mapping grants), but that is too late already. The main difference compared to kernel drivers is the automatic release on crash (or other unclean exit). In case of kernel driver crash, either the whole VM goes down, or at least automatic release doesn't happen. Maybe gntalloc could have some flag (per open file? per allocated grant?) to _not_ release grant reference (aka leak it) in case of implicit unmap, instead of explicit release? Such explicit release would need to be added to the Linux gntshr API, as xengntshr_unshare() currently is just munmap()). I don't see many other options to avoid userspace crash (potentially) taking down PV device with it too... (*) gui-agent and gui-daemon here are both in fact two processes (qubes gui process that handles vchan communication and Xorg that does the actual mapping). It complicates few things, but generally is irrelevant detail from the Xen point of view. -- Best Regards, Marek Marczykowski-Górecki Invisible Things Lab [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: “Backend has not unmapped grant” errors 2022-08-29 14:39 ` Marek Marczykowski-Górecki @ 2022-08-29 16:03 ` Juergen Gross 2022-08-29 18:32 ` Demi Marie Obenour 1 sibling, 0 replies; 18+ messages in thread From: Juergen Gross @ 2022-08-29 16:03 UTC (permalink / raw) To: Marek Marczykowski-Górecki Cc: Demi Marie Obenour, Xen developer discussion [-- Attachment #1.1.1: Type: text/plain, Size: 6461 bytes --] On 29.08.22 16:39, Marek Marczykowski-Górecki wrote: > On Mon, Aug 29, 2022 at 02:55:55PM +0200, Juergen Gross wrote: >> On 28.08.22 07:15, Demi Marie Obenour wrote: >>> On Wed, Aug 24, 2022 at 08:11:56AM +0200, Juergen Gross wrote: >>>> On 24.08.22 02:20, Marek Marczykowski-Górecki wrote: >>>>> On Tue, Aug 23, 2022 at 09:48:57AM +0200, Juergen Gross wrote: >>>>>> On 23.08.22 09:40, Demi Marie Obenour wrote: >>>>>>> I recently had a VM’s /dev/xvdb stop working with a “backend has not >>>>>>> unmapped grant” error. Since /dev/xvdb was the VM’s private volume, >>>>>>> that rendered the VM effectively useless. I had to kill it with >>>>>>> qvm-kill. >>>>>>> >>>>>>> The backend of /dev/xvdb is dom0, so a malicious backend is clearly not >>>>>>> the cause of this. I believe the actual cause is a race condition, such >>>>>>> as the following: >>>>>>> >>>>>>> 1. GUI agent in VM allocates grant X. >>>>>>> 2. GUI agent tells GUI daemon in dom0 to map X. >>>>>>> 3. GUI agent frees grant X. >>>>>>> 4. blkfront allocates grant X and passes it to dom0. >>>>>>> 5. dom0’s blkback maps grant X. >>>>>>> 6. blkback unmaps grant X. >>>>>>> 7. GUI daemon maps grant X. >>>>>>> 8. blkfront tries to revoke access to grant X and fails. Disaster >>>>>>> ensues. >>>>>>> >>>>>>> What could be done to prevent this race? Right now all of the >>>>>>> approaches I can think of are horribly backwards-incompatible. They >>>>>>> require replacing grant IDs with some sort of handle, and requiring >>>>>>> userspace to pass these handles to ioctls. It is also possible that >>>>>>> netfront and blkfront could race against each other in a way that causes >>>>>>> this, though I suspect that race would be much harder to trigger. >>>>>>> >>>>>>> This has happened more than once so it is not a fluke due to e.g. cosmic >>>>>>> rays or other random bit-flips. >>>>>>> >>>>>>> Marek, do you have any suggestions? >>>>>> >>>>>> To me that sounds like the interface of the GUI is the culprit. >>>>>> >>>>>> The GUI agent in the guest should only free a grant, if it got a message >>>>>> from the backend that it can do so. Just assuming to be able to free it >>>>>> because it isn't in use currently is the broken assumption here. >>>>> >>>>> FWIW, I hit this issue twice already in this week CI run, while it never >>>>> happened before. The difference compared to previous run is Linux >>>>> 5.15.57 vs 5.15.61. The latter reports persistent grants disabled. >>>> >>>> I think this additional bug is just triggering the race in the GUI >>>> interface more easily, as blkfront will allocate new grants with a >>> >>> 1. Treat “backend has not unmapped grant” errors as non-fatal. The most >>> likely cause is buggy userspace software, not an attempt to exploit >>> XSA-396. Instead of disabling the device, just log a warning message >>>> much higher frequency. >>>> >>>> So fixing the persistent grant issue will just paper over the real >>>> issue. >>> >>> Indeed so, but making the bug happen much less frequently is still a >>> significant win for users. >> >> Probably, yes. >> >>> In the long term, there is one situation I do not have a good solution >>> for: recovery from GUI agent crashes. If the GUI agent crashes, the >>> kernel it is running under has two bad choices. Either the kernel can >>> reclaim the grants, risking them being mapped at a later time by the GUI >>> daemon, or it can leak them, which is bad for obvious reasons. I >>> believe the current implementation makes the former choice. >> >> It does. >> >> I don't have enough information about the GUI architecture you are using. >> Which components are involved on the backend side, and which on the >> frontend side? Especially the responsibilities and communication regarding >> grants is important here. > > I'll limit the description to the relevant minimum here. Thanks for the information. It helps a lot. > The gui-agent(*) uses gntalloc to share framebuffers (they are allocated > whenever an application within domU opens a window), then sends grant > reference numbers over vchan to the gui-daemon (running in dom0 by > default, but it can be also another domU). > Then the gui-daemon(*) maps them. > Later, when an application closes a window, the shared memory is > unmapped, and gui-daemon is informed about it. Releasing grant refs is > deferred by the kernel (until gui-daemon unmaps them). It may happen > that unmapping on the gui-agent side will happen before gui-daemon maps > them. We are modifying our GUI protocol to delay releasing grants on the > user space side, to coordinate with gui-daemon (basically wait until > gui-daemon confirms it unmapped them). This should fix the "normal" > case. > But if the gui-agent crashes just after sending grant refs, but before > gui-daemon maps them, then the problem is still there. If they are > immediately released by the kernel for others to use, we can hit the > same issue again (for example blkfront using them, and then gui-daemon > mapping them). I don't see race-free method for solving this with the > current API. GUI daemon can notice when such situation happens (by > checking if gui-agent is still alive after mapping grants), but that is > too late already. > > The main difference compared to kernel drivers is the automatic release > on crash (or other unclean exit). In case of kernel driver crash, either > the whole VM goes down, or at least automatic release doesn't happen. > Maybe gntalloc could have some flag (per open file? per allocated > grant?) to _not_ release grant reference (aka leak it) in case of > implicit unmap, instead of explicit release? Such explicit release > would need to be added to the Linux gntshr API, as xengntshr_unshare() > currently is just munmap()). I don't see many other options to avoid > userspace crash (potentially) taking down PV device with it too... My idea would be to add a new ioctl() to the gntalloc driver allowing to specify a permanent name for a file. This would lead to: - the grants not to be dropped when the process is dying - in case grants with this name are existing, they are added to the file descriptor, resulting in them being under control of the new process - the permanent grants would need to be remove explicitly instead of cleaned up due to close() Juergen [-- Attachment #1.1.2: OpenPGP public key --] [-- Type: application/pgp-keys, Size: 3149 bytes --] [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 495 bytes --] ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: “Backend has not unmapped grant” errors 2022-08-29 14:39 ` Marek Marczykowski-Górecki 2022-08-29 16:03 ` Juergen Gross @ 2022-08-29 18:32 ` Demi Marie Obenour 1 sibling, 0 replies; 18+ messages in thread From: Demi Marie Obenour @ 2022-08-29 18:32 UTC (permalink / raw) To: Marek Marczykowski-Górecki, Juergen Gross; +Cc: Xen developer discussion [-- Attachment #1: Type: text/plain, Size: 6934 bytes --] On Mon, Aug 29, 2022 at 04:39:29PM +0200, Marek Marczykowski-Górecki wrote: > On Mon, Aug 29, 2022 at 02:55:55PM +0200, Juergen Gross wrote: > > On 28.08.22 07:15, Demi Marie Obenour wrote: > > > On Wed, Aug 24, 2022 at 08:11:56AM +0200, Juergen Gross wrote: > > > > On 24.08.22 02:20, Marek Marczykowski-Górecki wrote: > > > > > On Tue, Aug 23, 2022 at 09:48:57AM +0200, Juergen Gross wrote: > > > > > > On 23.08.22 09:40, Demi Marie Obenour wrote: > > > > > > > I recently had a VM’s /dev/xvdb stop working with a “backend has not > > > > > > > unmapped grant” error. Since /dev/xvdb was the VM’s private volume, > > > > > > > that rendered the VM effectively useless. I had to kill it with > > > > > > > qvm-kill. > > > > > > > > > > > > > > The backend of /dev/xvdb is dom0, so a malicious backend is clearly not > > > > > > > the cause of this. I believe the actual cause is a race condition, such > > > > > > > as the following: > > > > > > > > > > > > > > 1. GUI agent in VM allocates grant X. > > > > > > > 2. GUI agent tells GUI daemon in dom0 to map X. > > > > > > > 3. GUI agent frees grant X. > > > > > > > 4. blkfront allocates grant X and passes it to dom0. > > > > > > > 5. dom0’s blkback maps grant X. > > > > > > > 6. blkback unmaps grant X. > > > > > > > 7. GUI daemon maps grant X. > > > > > > > 8. blkfront tries to revoke access to grant X and fails. Disaster > > > > > > > ensues. > > > > > > > > > > > > > > What could be done to prevent this race? Right now all of the > > > > > > > approaches I can think of are horribly backwards-incompatible. They > > > > > > > require replacing grant IDs with some sort of handle, and requiring > > > > > > > userspace to pass these handles to ioctls. It is also possible that > > > > > > > netfront and blkfront could race against each other in a way that causes > > > > > > > this, though I suspect that race would be much harder to trigger. > > > > > > > > > > > > > > This has happened more than once so it is not a fluke due to e.g. cosmic > > > > > > > rays or other random bit-flips. > > > > > > > > > > > > > > Marek, do you have any suggestions? > > > > > > > > > > > > To me that sounds like the interface of the GUI is the culprit. > > > > > > > > > > > > The GUI agent in the guest should only free a grant, if it got a message > > > > > > from the backend that it can do so. Just assuming to be able to free it > > > > > > because it isn't in use currently is the broken assumption here. > > > > > > > > > > FWIW, I hit this issue twice already in this week CI run, while it never > > > > > happened before. The difference compared to previous run is Linux > > > > > 5.15.57 vs 5.15.61. The latter reports persistent grants disabled. > > > > > > > > I think this additional bug is just triggering the race in the GUI > > > > interface more easily, as blkfront will allocate new grants with a > > > > > > 1. Treat “backend has not unmapped grant” errors as non-fatal. The most > > > likely cause is buggy userspace software, not an attempt to exploit > > > XSA-396. Instead of disabling the device, just log a warning message > > > > much higher frequency. > > > > > > > > So fixing the persistent grant issue will just paper over the real > > > > issue. > > > > > > Indeed so, but making the bug happen much less frequently is still a > > > significant win for users. > > > > Probably, yes. > > > > > In the long term, there is one situation I do not have a good solution > > > for: recovery from GUI agent crashes. If the GUI agent crashes, the > > > kernel it is running under has two bad choices. Either the kernel can > > > reclaim the grants, risking them being mapped at a later time by the GUI > > > daemon, or it can leak them, which is bad for obvious reasons. I > > > believe the current implementation makes the former choice. > > > > It does. > > > > I don't have enough information about the GUI architecture you are using. > > Which components are involved on the backend side, and which on the > > frontend side? Especially the responsibilities and communication regarding > > grants is important here. > > I'll limit the description to the relevant minimum here. > The gui-agent(*) uses gntalloc to share framebuffers (they are allocated > whenever an application within domU opens a window), then sends grant > reference numbers over vchan to the gui-daemon (running in dom0 by > default, but it can be also another domU). > Then the gui-daemon(*) maps them. > Later, when an application closes a window, the shared memory is > unmapped, and gui-daemon is informed about it. Releasing grant refs is > deferred by the kernel (until gui-daemon unmaps them). It may happen > that unmapping on the gui-agent side will happen before gui-daemon maps > them. We are modifying our GUI protocol to delay releasing grants on the > user space side, to coordinate with gui-daemon (basically wait until > gui-daemon confirms it unmapped them). This should fix the "normal" > case. > But if the gui-agent crashes just after sending grant refs, but before > gui-daemon maps them, then the problem is still there. If they are > immediately released by the kernel for others to use, we can hit the > same issue again (for example blkfront using them, and then gui-daemon > mapping them). I don't see race-free method for solving this with the > current API. GUI daemon can notice when such situation happens (by > checking if gui-agent is still alive after mapping grants), but that is > too late already. > > The main difference compared to kernel drivers is the automatic release > on crash (or other unclean exit). In case of kernel driver crash, either > the whole VM goes down, or at least automatic release doesn't happen. > Maybe gntalloc could have some flag (per open file? per allocated > grant?) to _not_ release grant reference (aka leak it) in case of > implicit unmap, instead of explicit release? Such explicit release > would need to be added to the Linux gntshr API, as xengntshr_unshare() > currently is just munmap()). I don't see many other options to avoid > userspace crash (potentially) taking down PV device with it too... That is still less than great, as it leads to a memory leak. Another approach would be some sort of unmap/revoke operation in the backend, so that the backend revokes its own access to the grants before telling the frontend it has unmapped them. This would cause the userspace mmap() call to fail. > (*) gui-agent and gui-daemon here are both in fact two processes (qubes gui > process that handles vchan communication and Xorg that does the actual > mapping). It complicates few things, but generally is irrelevant detail > from the Xen point of view. -- Sincerely, Demi Marie Obenour (she/her/hers) Invisible Things Lab [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: “Backend has not unmapped grant” errors 2022-08-29 12:55 ` Juergen Gross 2022-08-29 14:39 ` Marek Marczykowski-Górecki @ 2022-08-29 18:54 ` Demi Marie Obenour 1 sibling, 0 replies; 18+ messages in thread From: Demi Marie Obenour @ 2022-08-29 18:54 UTC (permalink / raw) To: Juergen Gross, Marek Marczykowski-Górecki; +Cc: Xen developer discussion [-- Attachment #1: Type: text/plain, Size: 6372 bytes --] On Mon, Aug 29, 2022 at 02:55:55PM +0200, Juergen Gross wrote: > On 28.08.22 07:15, Demi Marie Obenour wrote: > > On Wed, Aug 24, 2022 at 08:11:56AM +0200, Juergen Gross wrote: > > > On 24.08.22 02:20, Marek Marczykowski-Górecki wrote: > > > > On Tue, Aug 23, 2022 at 09:48:57AM +0200, Juergen Gross wrote: > > > > > On 23.08.22 09:40, Demi Marie Obenour wrote: > > > > > > I recently had a VM’s /dev/xvdb stop working with a “backend has not > > > > > > unmapped grant” error. Since /dev/xvdb was the VM’s private volume, > > > > > > that rendered the VM effectively useless. I had to kill it with > > > > > > qvm-kill. > > > > > > > > > > > > The backend of /dev/xvdb is dom0, so a malicious backend is clearly not > > > > > > the cause of this. I believe the actual cause is a race condition, such > > > > > > as the following: > > > > > > > > > > > > 1. GUI agent in VM allocates grant X. > > > > > > 2. GUI agent tells GUI daemon in dom0 to map X. > > > > > > 3. GUI agent frees grant X. > > > > > > 4. blkfront allocates grant X and passes it to dom0. > > > > > > 5. dom0’s blkback maps grant X. > > > > > > 6. blkback unmaps grant X. > > > > > > 7. GUI daemon maps grant X. > > > > > > 8. blkfront tries to revoke access to grant X and fails. Disaster > > > > > > ensues. > > > > > > > > > > > > What could be done to prevent this race? Right now all of the > > > > > > approaches I can think of are horribly backwards-incompatible. They > > > > > > require replacing grant IDs with some sort of handle, and requiring > > > > > > userspace to pass these handles to ioctls. It is also possible that > > > > > > netfront and blkfront could race against each other in a way that causes > > > > > > this, though I suspect that race would be much harder to trigger. > > > > > > > > > > > > This has happened more than once so it is not a fluke due to e.g. cosmic > > > > > > rays or other random bit-flips. > > > > > > > > > > > > Marek, do you have any suggestions? > > > > > > > > > > To me that sounds like the interface of the GUI is the culprit. > > > > > > > > > > The GUI agent in the guest should only free a grant, if it got a message > > > > > from the backend that it can do so. Just assuming to be able to free it > > > > > because it isn't in use currently is the broken assumption here. > > > > > > > > FWIW, I hit this issue twice already in this week CI run, while it never > > > > happened before. The difference compared to previous run is Linux > > > > 5.15.57 vs 5.15.61. The latter reports persistent grants disabled. > > > > > > I think this additional bug is just triggering the race in the GUI > > > interface more easily, as blkfront will allocate new grants with a > > > much higher frequency. > > > > > > So fixing the persistent grant issue will just paper over the real > > > issue. > > > > Indeed so, but making the bug happen much less frequently is still a > > significant win for users. > > Probably, yes. > > > In the long term, there is one situation I do not have a good solution > > for: recovery from GUI agent crashes. If the GUI agent crashes, the > > kernel it is running under has two bad choices. Either the kernel can > > reclaim the grants, risking them being mapped at a later time by the GUI > > daemon, or it can leak them, which is bad for obvious reasons. I > > believe the current implementation makes the former choice. > > It does. > > I don't have enough information about the GUI architecture you are using. > Which components are involved on the backend side, and which on the > frontend side? Especially the responsibilities and communication regarding > grants is important here. See Marek’s reply. > > To fix this problem, I recommend the following changes: > > > > 1. Treat “backend has not unmapped grant” errors as non-fatal. The most > > likely cause is buggy userspace software, not an attempt to exploit > > XSA-396. Instead of disabling the device, just log a warning message > > and put the grant on the deferred queue. Even leaking the grant > > would be preferable to the current behavior, as disabling a block > > device typically leaves the VM unusable. > > Sorry, I don't agree. This is a major violation of the normal I/O > architecture. Your reasoning with the disabled block device doesn't make > much sense IMHO, as the mapped grant was due to a bad interface leading to > another component using a grant it was not meant to use. > > Shutting down the block device is the right thing to do here, as data > corruption might be happening. In this case, the grants are being mapped read-only, so (unless I have missed something) data corruption is not possible. > > 3. Provide a means for a domain to be notified by Xen whenever one of > > its grants is unmapped. Setting an event channel and writing to a > > shared ring would suffice. This would allow eliminating the kludgy > > deferred freeing mechanism. > > Interesting idea. > > I believe such an interface would need to be activated per grant, as > otherwise performance could suffer a lot. There are still some unused bits > in the grant flags, one could be used for that purpose. At least in the GUI case, large numbers of grants are typically unmapped at once, and a notification is only necessary when the entire block has been unmapped. This should mitigate the performance concerns. > I'm not sure how often this would be used. In case it is only for the rare > case of unexpectedly long mapped grant pages, a simple event might do the > job, with the event handler just skimming through the pending unmaps to > find the grants being available again. In Qubes OS, this happens so often that we had to patch the Linux kernel to handle it better. Prior to the patch, the background deferred reclaim could not keep up, causing a memory leak. Furthermore, the log messages whenever an unmap had to be deferred were flooding the logs. While we could change the GUI protocol to provide an unmap-time notification, this is only because we use an LD_PRELOAD hack to hook Xorg’s unmapping calls. I would prefer to not continue to rely on this. -- Sincerely, Demi Marie Obenour (she/her/hers) Invisible Things Lab [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2022-08-29 18:54 UTC | newest] Thread overview: 18+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2022-08-23 7:40 “Backend has not unmapped grant” errors Demi Marie Obenour 2022-08-23 7:48 ` Juergen Gross 2022-08-24 0:20 ` Marek Marczykowski-Górecki 2022-08-24 6:02 ` Juergen Gross 2022-08-24 6:30 ` Jan Beulich 2022-08-24 6:36 ` Juergen Gross 2022-08-24 6:40 ` Jan Beulich 2022-08-24 17:44 ` SeongJae Park 2022-08-24 20:38 ` SeongJae Park 2022-08-25 6:20 ` Juergen Gross 2022-08-25 16:22 ` SeongJae Park 2022-08-24 6:11 ` Juergen Gross 2022-08-28 5:15 ` Demi Marie Obenour 2022-08-29 12:55 ` Juergen Gross 2022-08-29 14:39 ` Marek Marczykowski-Górecki 2022-08-29 16:03 ` Juergen Gross 2022-08-29 18:32 ` Demi Marie Obenour 2022-08-29 18:54 ` Demi Marie Obenour
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.