* [PATCH 0/3] Fixing races in probe/remove
@ 2022-06-20 5:11 Jason Wang
2022-06-20 5:11 ` [PATCH 1/3] caif_virtio: remove virtqueue_disable_cb() in probe Jason Wang
` (2 more replies)
0 siblings, 3 replies; 13+ messages in thread
From: Jason Wang @ 2022-06-20 5:11 UTC (permalink / raw)
To: netdev, linux-kernel, mst, jasowang; +Cc: davem, kuba, erwan.yvin
Hi all:
This series tries to fix races spotted during code review for
caif_virtio in probe() and remove().
Compile test only. (I don't have setup to test them).
Please review.
Thanks
Jason Wang (3):
caif_virtio: remove virtqueue_disable_cb() in probe
caif_virtio: fix the race between virtio_device_ready() and ndo_open()
caif_virtio: fix the race between reset and netdev unregister
drivers/net/caif/caif_virtio.c | 19 +++++++++++--------
1 file changed, 11 insertions(+), 8 deletions(-)
--
2.25.1
^ permalink raw reply [flat|nested] 13+ messages in thread* [PATCH 1/3] caif_virtio: remove virtqueue_disable_cb() in probe 2022-06-20 5:11 [PATCH 0/3] Fixing races in probe/remove Jason Wang @ 2022-06-20 5:11 ` Jason Wang 2022-06-20 9:02 ` Michael S. Tsirkin 2022-06-20 5:11 ` [PATCH 2/3] caif_virtio: fix the race between virtio_device_ready() and ndo_open() Jason Wang 2022-06-20 5:11 ` [PATCH 3/3] caif_virtio: fix the race between reset and netdev unregister Jason Wang 2 siblings, 1 reply; 13+ messages in thread From: Jason Wang @ 2022-06-20 5:11 UTC (permalink / raw) To: netdev, linux-kernel, mst, jasowang; +Cc: davem, kuba, erwan.yvin This disabling is a just a hint with best effort, there's no guarantee that device doesn't send notification. The driver should survive with that, so let's remove it. Signed-off-by: Jason Wang <jasowang@redhat.com> --- drivers/net/caif/caif_virtio.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/drivers/net/caif/caif_virtio.c b/drivers/net/caif/caif_virtio.c index 5458f57177a0..c677ded81133 100644 --- a/drivers/net/caif/caif_virtio.c +++ b/drivers/net/caif/caif_virtio.c @@ -705,9 +705,6 @@ static int cfv_probe(struct virtio_device *vdev) netdev->needed_headroom = cfv->tx_hr; netdev->needed_tailroom = cfv->tx_tr; - /* Disable buffer release interrupts unless we have stopped TX queues */ - virtqueue_disable_cb(cfv->vq_tx); - netdev->mtu = cfv->mtu - cfv->tx_tr; vdev->priv = cfv; -- 2.25.1 ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH 1/3] caif_virtio: remove virtqueue_disable_cb() in probe 2022-06-20 5:11 ` [PATCH 1/3] caif_virtio: remove virtqueue_disable_cb() in probe Jason Wang @ 2022-06-20 9:02 ` Michael S. Tsirkin 2022-06-21 3:10 ` Jason Wang 0 siblings, 1 reply; 13+ messages in thread From: Michael S. Tsirkin @ 2022-06-20 9:02 UTC (permalink / raw) To: Jason Wang; +Cc: netdev, linux-kernel, davem, kuba, erwan.yvin On Mon, Jun 20, 2022 at 01:11:13PM +0800, Jason Wang wrote: > This disabling is a just a hint with best effort, there's no guarantee > that device doesn't send notification. The driver should survive with > that, so let's remove it. > > Signed-off-by: Jason Wang <jasowang@redhat.com> I guess, but frankly this change feels gratituous, and just might uncover latent bugs. Which would be fine if we were out to find and fix them, but given this was compile-tested only, I'm not sure that's the case. > --- > drivers/net/caif/caif_virtio.c | 3 --- > 1 file changed, 3 deletions(-) > > diff --git a/drivers/net/caif/caif_virtio.c b/drivers/net/caif/caif_virtio.c > index 5458f57177a0..c677ded81133 100644 > --- a/drivers/net/caif/caif_virtio.c > +++ b/drivers/net/caif/caif_virtio.c > @@ -705,9 +705,6 @@ static int cfv_probe(struct virtio_device *vdev) > netdev->needed_headroom = cfv->tx_hr; > netdev->needed_tailroom = cfv->tx_tr; > > - /* Disable buffer release interrupts unless we have stopped TX queues */ > - virtqueue_disable_cb(cfv->vq_tx); > - > netdev->mtu = cfv->mtu - cfv->tx_tr; > vdev->priv = cfv; > > -- > 2.25.1 ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 1/3] caif_virtio: remove virtqueue_disable_cb() in probe 2022-06-20 9:02 ` Michael S. Tsirkin @ 2022-06-21 3:10 ` Jason Wang 0 siblings, 0 replies; 13+ messages in thread From: Jason Wang @ 2022-06-21 3:10 UTC (permalink / raw) To: Michael S. Tsirkin Cc: netdev, linux-kernel, davem, Jakub Kicinski, erwan.yvin On Mon, Jun 20, 2022 at 5:02 PM Michael S. Tsirkin <mst@redhat.com> wrote: > > On Mon, Jun 20, 2022 at 01:11:13PM +0800, Jason Wang wrote: > > This disabling is a just a hint with best effort, there's no guarantee > > that device doesn't send notification. The driver should survive with > > that, so let's remove it. > > > > Signed-off-by: Jason Wang <jasowang@redhat.com> > > I guess, but frankly this change feels gratituous, and just might > uncover latent bugs. Which would be fine if we were out to > find and fix them, but given this was compile-tested only, > I'm not sure that's the case. Ok, let me drop this from the next version (no way to test). Thanks > > > > --- > > drivers/net/caif/caif_virtio.c | 3 --- > > 1 file changed, 3 deletions(-) > > > > diff --git a/drivers/net/caif/caif_virtio.c b/drivers/net/caif/caif_virtio.c > > index 5458f57177a0..c677ded81133 100644 > > --- a/drivers/net/caif/caif_virtio.c > > +++ b/drivers/net/caif/caif_virtio.c > > @@ -705,9 +705,6 @@ static int cfv_probe(struct virtio_device *vdev) > > netdev->needed_headroom = cfv->tx_hr; > > netdev->needed_tailroom = cfv->tx_tr; > > > > - /* Disable buffer release interrupts unless we have stopped TX queues */ > > - virtqueue_disable_cb(cfv->vq_tx); > > - > > netdev->mtu = cfv->mtu - cfv->tx_tr; > > vdev->priv = cfv; > > > > -- > > 2.25.1 > ^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH 2/3] caif_virtio: fix the race between virtio_device_ready() and ndo_open() 2022-06-20 5:11 [PATCH 0/3] Fixing races in probe/remove Jason Wang 2022-06-20 5:11 ` [PATCH 1/3] caif_virtio: remove virtqueue_disable_cb() in probe Jason Wang @ 2022-06-20 5:11 ` Jason Wang 2022-06-20 9:04 ` Michael S. Tsirkin 2022-06-20 5:11 ` [PATCH 3/3] caif_virtio: fix the race between reset and netdev unregister Jason Wang 2 siblings, 1 reply; 13+ messages in thread From: Jason Wang @ 2022-06-20 5:11 UTC (permalink / raw) To: netdev, linux-kernel, mst, jasowang; +Cc: davem, kuba, erwan.yvin We used to depend on the virtio_device_ready() that is called after probe() by virtio_dev_probe() after netdev registration. This cause a race between ndo_open() and virtio_device_ready(): if ndo_open() is called before virtio_device_ready(), the driver may start to use the device (e.g TX) before DRIVER_OK which violates the spec. Fixing this by switching to use register_netdevice() and protect the virtio_device_ready() with rtnl_lock() to make sure ndo_open() can only be called after virtio_device_ready(). Fixes: 0d2e1a2926b18 ("caif_virtio: Introduce caif over virtio") Signed-off-by: Jason Wang <jasowang@redhat.com> --- drivers/net/caif/caif_virtio.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/drivers/net/caif/caif_virtio.c b/drivers/net/caif/caif_virtio.c index c677ded81133..66375bea2fcd 100644 --- a/drivers/net/caif/caif_virtio.c +++ b/drivers/net/caif/caif_virtio.c @@ -719,13 +719,21 @@ static int cfv_probe(struct virtio_device *vdev) /* Carrier is off until netdevice is opened */ netif_carrier_off(netdev); + /* serialize netdev register + virtio_device_ready() with ndo_open() */ + rtnl_lock(); + /* register Netdev */ - err = register_netdev(netdev); + err = register_netdevice(netdev); if (err) { + rtnl_unlock(); dev_err(&vdev->dev, "Unable to register netdev (%d)\n", err); goto err; } + virtio_device_ready(vdev); + + rtnl_unlock(); + debugfs_init(cfv); return 0; -- 2.25.1 ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH 2/3] caif_virtio: fix the race between virtio_device_ready() and ndo_open() 2022-06-20 5:11 ` [PATCH 2/3] caif_virtio: fix the race between virtio_device_ready() and ndo_open() Jason Wang @ 2022-06-20 9:04 ` Michael S. Tsirkin 0 siblings, 0 replies; 13+ messages in thread From: Michael S. Tsirkin @ 2022-06-20 9:04 UTC (permalink / raw) To: Jason Wang; +Cc: netdev, linux-kernel, davem, kuba, erwan.yvin On Mon, Jun 20, 2022 at 01:11:14PM +0800, Jason Wang wrote: > We used to depend on the virtio_device_ready() that is called after "We used to" implies it's not the case currently. I think you mean "we currently depend". > probe() by virtio_dev_probe() after netdev registration. This > cause causes >a race between ndo_open() and virtio_device_ready(): if > ndo_open() is called before virtio_device_ready(), the driver may > start to use the device (e.g TX) before DRIVER_OK which violates the > spec. > > Fixing this Fix this > by switching to use register_netdevice() and protect the > virtio_device_ready() with rtnl_lock() to make sure ndo_open() can > only be called after virtio_device_ready(). > > Fixes: 0d2e1a2926b18 ("caif_virtio: Introduce caif over virtio") > Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> > --- > drivers/net/caif/caif_virtio.c | 10 +++++++++- > 1 file changed, 9 insertions(+), 1 deletion(-) > > diff --git a/drivers/net/caif/caif_virtio.c b/drivers/net/caif/caif_virtio.c > index c677ded81133..66375bea2fcd 100644 > --- a/drivers/net/caif/caif_virtio.c > +++ b/drivers/net/caif/caif_virtio.c > @@ -719,13 +719,21 @@ static int cfv_probe(struct virtio_device *vdev) > /* Carrier is off until netdevice is opened */ > netif_carrier_off(netdev); > > + /* serialize netdev register + virtio_device_ready() with ndo_open() */ > + rtnl_lock(); > + > /* register Netdev */ > - err = register_netdev(netdev); > + err = register_netdevice(netdev); > if (err) { > + rtnl_unlock(); > dev_err(&vdev->dev, "Unable to register netdev (%d)\n", err); > goto err; > } > > + virtio_device_ready(vdev); > + > + rtnl_unlock(); > + > debugfs_init(cfv); > > return 0; > -- > 2.25.1 ^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH 3/3] caif_virtio: fix the race between reset and netdev unregister 2022-06-20 5:11 [PATCH 0/3] Fixing races in probe/remove Jason Wang 2022-06-20 5:11 ` [PATCH 1/3] caif_virtio: remove virtqueue_disable_cb() in probe Jason Wang 2022-06-20 5:11 ` [PATCH 2/3] caif_virtio: fix the race between virtio_device_ready() and ndo_open() Jason Wang @ 2022-06-20 5:11 ` Jason Wang 2022-06-20 9:09 ` Michael S. Tsirkin 2 siblings, 1 reply; 13+ messages in thread From: Jason Wang @ 2022-06-20 5:11 UTC (permalink / raw) To: netdev, linux-kernel, mst, jasowang; +Cc: davem, kuba, erwan.yvin We use to do the following steps during .remove(): static void cfv_remove(struct virtio_device *vdev) { struct cfv_info *cfv = vdev->priv; rtnl_lock(); dev_close(cfv->ndev); rtnl_unlock(); tasklet_kill(&cfv->tx_release_tasklet); debugfs_remove_recursive(cfv->debugfs); vringh_kiov_cleanup(&cfv->ctx.riov); virtio_reset_device(vdev); vdev->vringh_config->del_vrhs(cfv->vdev); cfv->vr_rx = NULL; vdev->config->del_vqs(cfv->vdev); unregister_netdev(cfv->ndev); } This is racy since device could be re-opened after dev_close() but before unregister_netdevice(): 1) RX vringh is cleaned before resetting the device, rx callbacks that is called after the vringh_kiov_cleanup() will result a UAF 2) Network stack can still try to use TX virtqueue even if it has been deleted after dev_vqs() Fixing this by unregistering the network device first to make sure not device access from both TX and RX side. Fixes: 0d2e1a2926b18 ("caif_virtio: Introduce caif over virtio") Signed-off-by: Jason Wang <jasowang@redhat.com> --- drivers/net/caif/caif_virtio.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/drivers/net/caif/caif_virtio.c b/drivers/net/caif/caif_virtio.c index 66375bea2fcd..a29f9b2df5b1 100644 --- a/drivers/net/caif/caif_virtio.c +++ b/drivers/net/caif/caif_virtio.c @@ -752,9 +752,8 @@ static void cfv_remove(struct virtio_device *vdev) { struct cfv_info *cfv = vdev->priv; - rtnl_lock(); - dev_close(cfv->ndev); - rtnl_unlock(); + /* Make sure NAPI/TX won't try to access the device */ + unregister_netdev(cfv->ndev); tasklet_kill(&cfv->tx_release_tasklet); debugfs_remove_recursive(cfv->debugfs); @@ -764,7 +763,6 @@ static void cfv_remove(struct virtio_device *vdev) vdev->vringh_config->del_vrhs(cfv->vdev); cfv->vr_rx = NULL; vdev->config->del_vqs(cfv->vdev); - unregister_netdev(cfv->ndev); } static struct virtio_device_id id_table[] = { -- 2.25.1 ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH 3/3] caif_virtio: fix the race between reset and netdev unregister 2022-06-20 5:11 ` [PATCH 3/3] caif_virtio: fix the race between reset and netdev unregister Jason Wang @ 2022-06-20 9:09 ` Michael S. Tsirkin 2022-06-20 9:18 ` Jason Wang 0 siblings, 1 reply; 13+ messages in thread From: Michael S. Tsirkin @ 2022-06-20 9:09 UTC (permalink / raw) To: Jason Wang; +Cc: netdev, linux-kernel, davem, kuba, erwan.yvin On Mon, Jun 20, 2022 at 01:11:15PM +0800, Jason Wang wrote: > We use to do the following steps during .remove(): We currently do > static void cfv_remove(struct virtio_device *vdev) > { > struct cfv_info *cfv = vdev->priv; > > rtnl_lock(); > dev_close(cfv->ndev); > rtnl_unlock(); > > tasklet_kill(&cfv->tx_release_tasklet); > debugfs_remove_recursive(cfv->debugfs); > > vringh_kiov_cleanup(&cfv->ctx.riov); > virtio_reset_device(vdev); > vdev->vringh_config->del_vrhs(cfv->vdev); > cfv->vr_rx = NULL; > vdev->config->del_vqs(cfv->vdev); > unregister_netdev(cfv->ndev); > } > This is racy since device could be re-opened after dev_close() but > before unregister_netdevice(): > > 1) RX vringh is cleaned before resetting the device, rx callbacks that > is called after the vringh_kiov_cleanup() will result a UAF > 2) Network stack can still try to use TX virtqueue even if it has been > deleted after dev_vqs() > > Fixing this by unregistering the network device first to make sure not > device access from both TX and RX side. > > Fixes: 0d2e1a2926b18 ("caif_virtio: Introduce caif over virtio") > Signed-off-by: Jason Wang <jasowang@redhat.com> > --- > drivers/net/caif/caif_virtio.c | 6 ++---- > 1 file changed, 2 insertions(+), 4 deletions(-) > > diff --git a/drivers/net/caif/caif_virtio.c b/drivers/net/caif/caif_virtio.c > index 66375bea2fcd..a29f9b2df5b1 100644 > --- a/drivers/net/caif/caif_virtio.c > +++ b/drivers/net/caif/caif_virtio.c > @@ -752,9 +752,8 @@ static void cfv_remove(struct virtio_device *vdev) > { > struct cfv_info *cfv = vdev->priv; > > - rtnl_lock(); > - dev_close(cfv->ndev); > - rtnl_unlock(); > + /* Make sure NAPI/TX won't try to access the device */ > + unregister_netdev(cfv->ndev); > > tasklet_kill(&cfv->tx_release_tasklet); > debugfs_remove_recursive(cfv->debugfs); > @@ -764,7 +763,6 @@ static void cfv_remove(struct virtio_device *vdev) > vdev->vringh_config->del_vrhs(cfv->vdev); > cfv->vr_rx = NULL; > vdev->config->del_vqs(cfv->vdev); > - unregister_netdev(cfv->ndev); > } This gives me pause, callbacks can now trigger after device has been unregistered. Are we sure this is safe? Won't it be safer to just keep the rtnl_lock around the whole process? > static struct virtio_device_id id_table[] = { > -- > 2.25.1 ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 3/3] caif_virtio: fix the race between reset and netdev unregister 2022-06-20 9:09 ` Michael S. Tsirkin @ 2022-06-20 9:18 ` Jason Wang 2022-06-20 10:18 ` Michael S. Tsirkin 0 siblings, 1 reply; 13+ messages in thread From: Jason Wang @ 2022-06-20 9:18 UTC (permalink / raw) To: Michael S. Tsirkin Cc: netdev, linux-kernel, davem, Jakub Kicinski, erwan.yvin On Mon, Jun 20, 2022 at 5:09 PM Michael S. Tsirkin <mst@redhat.com> wrote: > > On Mon, Jun 20, 2022 at 01:11:15PM +0800, Jason Wang wrote: > > We use to do the following steps during .remove(): > > We currently do > > > > static void cfv_remove(struct virtio_device *vdev) > > { > > struct cfv_info *cfv = vdev->priv; > > > > rtnl_lock(); > > dev_close(cfv->ndev); > > rtnl_unlock(); > > > > tasklet_kill(&cfv->tx_release_tasklet); > > debugfs_remove_recursive(cfv->debugfs); > > > > vringh_kiov_cleanup(&cfv->ctx.riov); > > virtio_reset_device(vdev); > > vdev->vringh_config->del_vrhs(cfv->vdev); > > cfv->vr_rx = NULL; > > vdev->config->del_vqs(cfv->vdev); > > unregister_netdev(cfv->ndev); > > } > > This is racy since device could be re-opened after dev_close() but > > before unregister_netdevice(): > > > > 1) RX vringh is cleaned before resetting the device, rx callbacks that > > is called after the vringh_kiov_cleanup() will result a UAF > > 2) Network stack can still try to use TX virtqueue even if it has been > > deleted after dev_vqs() > > > > Fixing this by unregistering the network device first to make sure not > > device access from both TX and RX side. > > > > Fixes: 0d2e1a2926b18 ("caif_virtio: Introduce caif over virtio") > > Signed-off-by: Jason Wang <jasowang@redhat.com> > > --- > > drivers/net/caif/caif_virtio.c | 6 ++---- > > 1 file changed, 2 insertions(+), 4 deletions(-) > > > > diff --git a/drivers/net/caif/caif_virtio.c b/drivers/net/caif/caif_virtio.c > > index 66375bea2fcd..a29f9b2df5b1 100644 > > --- a/drivers/net/caif/caif_virtio.c > > +++ b/drivers/net/caif/caif_virtio.c > > @@ -752,9 +752,8 @@ static void cfv_remove(struct virtio_device *vdev) > > { > > struct cfv_info *cfv = vdev->priv; > > > > - rtnl_lock(); > > - dev_close(cfv->ndev); > > - rtnl_unlock(); > > + /* Make sure NAPI/TX won't try to access the device */ > > + unregister_netdev(cfv->ndev); > > > > tasklet_kill(&cfv->tx_release_tasklet); > > debugfs_remove_recursive(cfv->debugfs); > > @@ -764,7 +763,6 @@ static void cfv_remove(struct virtio_device *vdev) > > vdev->vringh_config->del_vrhs(cfv->vdev); > > cfv->vr_rx = NULL; > > vdev->config->del_vqs(cfv->vdev); > > - unregister_netdev(cfv->ndev); > > } > > > This gives me pause, callbacks can now trigger after device > has been unregistered. Are we sure this is safe? It looks safe, for RX NAPI is disabled. For TX, tasklet is disabled after tasklet_kill(). I can add a comment to explain this. > Won't it be safer to just keep the rtnl_lock around > the whole process? It looks to me we rtnl_lock can't help in synchronizing with the callbacks, anything I miss? Thanks > > > static struct virtio_device_id id_table[] = { > > -- > > 2.25.1 > ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 3/3] caif_virtio: fix the race between reset and netdev unregister 2022-06-20 9:18 ` Jason Wang @ 2022-06-20 10:18 ` Michael S. Tsirkin 2022-06-21 3:09 ` Jason Wang 0 siblings, 1 reply; 13+ messages in thread From: Michael S. Tsirkin @ 2022-06-20 10:18 UTC (permalink / raw) To: Jason Wang; +Cc: netdev, linux-kernel, davem, Jakub Kicinski, erwan.yvin On Mon, Jun 20, 2022 at 05:18:29PM +0800, Jason Wang wrote: > On Mon, Jun 20, 2022 at 5:09 PM Michael S. Tsirkin <mst@redhat.com> wrote: > > > > On Mon, Jun 20, 2022 at 01:11:15PM +0800, Jason Wang wrote: > > > We use to do the following steps during .remove(): > > > > We currently do > > > > > > > static void cfv_remove(struct virtio_device *vdev) > > > { > > > struct cfv_info *cfv = vdev->priv; > > > > > > rtnl_lock(); > > > dev_close(cfv->ndev); > > > rtnl_unlock(); > > > > > > tasklet_kill(&cfv->tx_release_tasklet); > > > debugfs_remove_recursive(cfv->debugfs); > > > > > > vringh_kiov_cleanup(&cfv->ctx.riov); > > > virtio_reset_device(vdev); > > > vdev->vringh_config->del_vrhs(cfv->vdev); > > > cfv->vr_rx = NULL; > > > vdev->config->del_vqs(cfv->vdev); > > > unregister_netdev(cfv->ndev); > > > } > > > This is racy since device could be re-opened after dev_close() but > > > before unregister_netdevice(): > > > > > > 1) RX vringh is cleaned before resetting the device, rx callbacks that > > > is called after the vringh_kiov_cleanup() will result a UAF > > > 2) Network stack can still try to use TX virtqueue even if it has been > > > deleted after dev_vqs() > > > > > > Fixing this by unregistering the network device first to make sure not > > > device access from both TX and RX side. > > > > > > Fixes: 0d2e1a2926b18 ("caif_virtio: Introduce caif over virtio") > > > Signed-off-by: Jason Wang <jasowang@redhat.com> > > > --- > > > drivers/net/caif/caif_virtio.c | 6 ++---- > > > 1 file changed, 2 insertions(+), 4 deletions(-) > > > > > > diff --git a/drivers/net/caif/caif_virtio.c b/drivers/net/caif/caif_virtio.c > > > index 66375bea2fcd..a29f9b2df5b1 100644 > > > --- a/drivers/net/caif/caif_virtio.c > > > +++ b/drivers/net/caif/caif_virtio.c > > > @@ -752,9 +752,8 @@ static void cfv_remove(struct virtio_device *vdev) > > > { > > > struct cfv_info *cfv = vdev->priv; > > > > > > - rtnl_lock(); > > > - dev_close(cfv->ndev); > > > - rtnl_unlock(); > > > + /* Make sure NAPI/TX won't try to access the device */ > > > + unregister_netdev(cfv->ndev); > > > > > > tasklet_kill(&cfv->tx_release_tasklet); > > > debugfs_remove_recursive(cfv->debugfs); > > > @@ -764,7 +763,6 @@ static void cfv_remove(struct virtio_device *vdev) > > > vdev->vringh_config->del_vrhs(cfv->vdev); > > > cfv->vr_rx = NULL; > > > vdev->config->del_vqs(cfv->vdev); > > > - unregister_netdev(cfv->ndev); > > > } > > > > > > This gives me pause, callbacks can now trigger after device > > has been unregistered. Are we sure this is safe? > > It looks safe, for RX NAPI is disabled. For TX, tasklet is disabled > after tasklet_kill(). I can add a comment to explain this. that waits for outstanding tasklets but does it really prevent future ones? > > Won't it be safer to just keep the rtnl_lock around > > the whole process? > > It looks to me we rtnl_lock can't help in synchronizing with the > callbacks, anything I miss? > > Thanks good point. > > > > > static struct virtio_device_id id_table[] = { > > > -- > > > 2.25.1 > > ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 3/3] caif_virtio: fix the race between reset and netdev unregister 2022-06-20 10:18 ` Michael S. Tsirkin @ 2022-06-21 3:09 ` Jason Wang 2022-06-21 6:00 ` Michael S. Tsirkin 0 siblings, 1 reply; 13+ messages in thread From: Jason Wang @ 2022-06-21 3:09 UTC (permalink / raw) To: Michael S. Tsirkin Cc: netdev, linux-kernel, davem, Jakub Kicinski, erwan.yvin On Mon, Jun 20, 2022 at 6:18 PM Michael S. Tsirkin <mst@redhat.com> wrote: > > On Mon, Jun 20, 2022 at 05:18:29PM +0800, Jason Wang wrote: > > On Mon, Jun 20, 2022 at 5:09 PM Michael S. Tsirkin <mst@redhat.com> wrote: > > > > > > On Mon, Jun 20, 2022 at 01:11:15PM +0800, Jason Wang wrote: > > > > We use to do the following steps during .remove(): > > > > > > We currently do > > > > > > > > > > static void cfv_remove(struct virtio_device *vdev) > > > > { > > > > struct cfv_info *cfv = vdev->priv; > > > > > > > > rtnl_lock(); > > > > dev_close(cfv->ndev); > > > > rtnl_unlock(); > > > > > > > > tasklet_kill(&cfv->tx_release_tasklet); > > > > debugfs_remove_recursive(cfv->debugfs); > > > > > > > > vringh_kiov_cleanup(&cfv->ctx.riov); > > > > virtio_reset_device(vdev); > > > > vdev->vringh_config->del_vrhs(cfv->vdev); > > > > cfv->vr_rx = NULL; > > > > vdev->config->del_vqs(cfv->vdev); > > > > unregister_netdev(cfv->ndev); > > > > } > > > > This is racy since device could be re-opened after dev_close() but > > > > before unregister_netdevice(): > > > > > > > > 1) RX vringh is cleaned before resetting the device, rx callbacks that > > > > is called after the vringh_kiov_cleanup() will result a UAF > > > > 2) Network stack can still try to use TX virtqueue even if it has been > > > > deleted after dev_vqs() > > > > > > > > Fixing this by unregistering the network device first to make sure not > > > > device access from both TX and RX side. > > > > > > > > Fixes: 0d2e1a2926b18 ("caif_virtio: Introduce caif over virtio") > > > > Signed-off-by: Jason Wang <jasowang@redhat.com> > > > > --- > > > > drivers/net/caif/caif_virtio.c | 6 ++---- > > > > 1 file changed, 2 insertions(+), 4 deletions(-) > > > > > > > > diff --git a/drivers/net/caif/caif_virtio.c b/drivers/net/caif/caif_virtio.c > > > > index 66375bea2fcd..a29f9b2df5b1 100644 > > > > --- a/drivers/net/caif/caif_virtio.c > > > > +++ b/drivers/net/caif/caif_virtio.c > > > > @@ -752,9 +752,8 @@ static void cfv_remove(struct virtio_device *vdev) > > > > { > > > > struct cfv_info *cfv = vdev->priv; > > > > > > > > - rtnl_lock(); > > > > - dev_close(cfv->ndev); > > > > - rtnl_unlock(); > > > > + /* Make sure NAPI/TX won't try to access the device */ > > > > + unregister_netdev(cfv->ndev); > > > > > > > > tasklet_kill(&cfv->tx_release_tasklet); > > > > debugfs_remove_recursive(cfv->debugfs); > > > > @@ -764,7 +763,6 @@ static void cfv_remove(struct virtio_device *vdev) > > > > vdev->vringh_config->del_vrhs(cfv->vdev); > > > > cfv->vr_rx = NULL; > > > > vdev->config->del_vqs(cfv->vdev); > > > > - unregister_netdev(cfv->ndev); > > > > } > > > > > > > > > This gives me pause, callbacks can now trigger after device > > > has been unregistered. Are we sure this is safe? > > > > It looks safe, for RX NAPI is disabled. For TX, tasklet is disabled > > after tasklet_kill(). I can add a comment to explain this. > > that waits for outstanding tasklets but does it really prevent > future ones? I think so, it tries to test and set TASKLET_STATE_SCHED which blocks the future scheduling of a tasklet. Thanks > > > > Won't it be safer to just keep the rtnl_lock around > > > the whole process? > > > > It looks to me we rtnl_lock can't help in synchronizing with the > > callbacks, anything I miss? > > > > Thanks > > good point. > > > > > > > > > static struct virtio_device_id id_table[] = { > > > > -- > > > > 2.25.1 > > > > ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 3/3] caif_virtio: fix the race between reset and netdev unregister 2022-06-21 3:09 ` Jason Wang @ 2022-06-21 6:00 ` Michael S. Tsirkin 2022-06-21 6:25 ` Jason Wang 0 siblings, 1 reply; 13+ messages in thread From: Michael S. Tsirkin @ 2022-06-21 6:00 UTC (permalink / raw) To: Jason Wang; +Cc: netdev, linux-kernel, davem, Jakub Kicinski, erwan.yvin On Tue, Jun 21, 2022 at 11:09:45AM +0800, Jason Wang wrote: > On Mon, Jun 20, 2022 at 6:18 PM Michael S. Tsirkin <mst@redhat.com> wrote: > > > > On Mon, Jun 20, 2022 at 05:18:29PM +0800, Jason Wang wrote: > > > On Mon, Jun 20, 2022 at 5:09 PM Michael S. Tsirkin <mst@redhat.com> wrote: > > > > > > > > On Mon, Jun 20, 2022 at 01:11:15PM +0800, Jason Wang wrote: > > > > > We use to do the following steps during .remove(): > > > > > > > > We currently do > > > > > > > > > > > > > static void cfv_remove(struct virtio_device *vdev) > > > > > { > > > > > struct cfv_info *cfv = vdev->priv; > > > > > > > > > > rtnl_lock(); > > > > > dev_close(cfv->ndev); > > > > > rtnl_unlock(); > > > > > > > > > > tasklet_kill(&cfv->tx_release_tasklet); > > > > > debugfs_remove_recursive(cfv->debugfs); > > > > > > > > > > vringh_kiov_cleanup(&cfv->ctx.riov); > > > > > virtio_reset_device(vdev); > > > > > vdev->vringh_config->del_vrhs(cfv->vdev); > > > > > cfv->vr_rx = NULL; > > > > > vdev->config->del_vqs(cfv->vdev); > > > > > unregister_netdev(cfv->ndev); > > > > > } > > > > > This is racy since device could be re-opened after dev_close() but > > > > > before unregister_netdevice(): > > > > > > > > > > 1) RX vringh is cleaned before resetting the device, rx callbacks that > > > > > is called after the vringh_kiov_cleanup() will result a UAF > > > > > 2) Network stack can still try to use TX virtqueue even if it has been > > > > > deleted after dev_vqs() > > > > > > > > > > Fixing this by unregistering the network device first to make sure not > > > > > device access from both TX and RX side. > > > > > > > > > > Fixes: 0d2e1a2926b18 ("caif_virtio: Introduce caif over virtio") > > > > > Signed-off-by: Jason Wang <jasowang@redhat.com> > > > > > --- > > > > > drivers/net/caif/caif_virtio.c | 6 ++---- > > > > > 1 file changed, 2 insertions(+), 4 deletions(-) > > > > > > > > > > diff --git a/drivers/net/caif/caif_virtio.c b/drivers/net/caif/caif_virtio.c > > > > > index 66375bea2fcd..a29f9b2df5b1 100644 > > > > > --- a/drivers/net/caif/caif_virtio.c > > > > > +++ b/drivers/net/caif/caif_virtio.c > > > > > @@ -752,9 +752,8 @@ static void cfv_remove(struct virtio_device *vdev) > > > > > { > > > > > struct cfv_info *cfv = vdev->priv; > > > > > > > > > > - rtnl_lock(); > > > > > - dev_close(cfv->ndev); > > > > > - rtnl_unlock(); > > > > > + /* Make sure NAPI/TX won't try to access the device */ > > > > > + unregister_netdev(cfv->ndev); > > > > > > > > > > tasklet_kill(&cfv->tx_release_tasklet); > > > > > debugfs_remove_recursive(cfv->debugfs); > > > > > @@ -764,7 +763,6 @@ static void cfv_remove(struct virtio_device *vdev) > > > > > vdev->vringh_config->del_vrhs(cfv->vdev); > > > > > cfv->vr_rx = NULL; > > > > > vdev->config->del_vqs(cfv->vdev); > > > > > - unregister_netdev(cfv->ndev); > > > > > } > > > > > > > > > > > > This gives me pause, callbacks can now trigger after device > > > > has been unregistered. Are we sure this is safe? > > > > > > It looks safe, for RX NAPI is disabled. For TX, tasklet is disabled > > > after tasklet_kill(). I can add a comment to explain this. > > > > that waits for outstanding tasklets but does it really prevent > > future ones? > > I think so, it tries to test and set TASKLET_STATE_SCHED which blocks > the future scheduling of a tasklet. > > Thanks But then in the end it clears it, does it not? > > > > > > Won't it be safer to just keep the rtnl_lock around > > > > the whole process? > > > > > > It looks to me we rtnl_lock can't help in synchronizing with the > > > callbacks, anything I miss? > > > > > > Thanks > > > > good point. > > > > > > > > > > > > > static struct virtio_device_id id_table[] = { > > > > > -- > > > > > 2.25.1 > > > > > > ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 3/3] caif_virtio: fix the race between reset and netdev unregister 2022-06-21 6:00 ` Michael S. Tsirkin @ 2022-06-21 6:25 ` Jason Wang 0 siblings, 0 replies; 13+ messages in thread From: Jason Wang @ 2022-06-21 6:25 UTC (permalink / raw) To: Michael S. Tsirkin Cc: netdev, linux-kernel, davem, Jakub Kicinski, erwan.yvin On Tue, Jun 21, 2022 at 2:00 PM Michael S. Tsirkin <mst@redhat.com> wrote: > > On Tue, Jun 21, 2022 at 11:09:45AM +0800, Jason Wang wrote: > > On Mon, Jun 20, 2022 at 6:18 PM Michael S. Tsirkin <mst@redhat.com> wrote: > > > > > > On Mon, Jun 20, 2022 at 05:18:29PM +0800, Jason Wang wrote: > > > > On Mon, Jun 20, 2022 at 5:09 PM Michael S. Tsirkin <mst@redhat.com> wrote: > > > > > > > > > > On Mon, Jun 20, 2022 at 01:11:15PM +0800, Jason Wang wrote: > > > > > > We use to do the following steps during .remove(): > > > > > > > > > > We currently do > > > > > > > > > > > > > > > > static void cfv_remove(struct virtio_device *vdev) > > > > > > { > > > > > > struct cfv_info *cfv = vdev->priv; > > > > > > > > > > > > rtnl_lock(); > > > > > > dev_close(cfv->ndev); > > > > > > rtnl_unlock(); > > > > > > > > > > > > tasklet_kill(&cfv->tx_release_tasklet); > > > > > > debugfs_remove_recursive(cfv->debugfs); > > > > > > > > > > > > vringh_kiov_cleanup(&cfv->ctx.riov); > > > > > > virtio_reset_device(vdev); > > > > > > vdev->vringh_config->del_vrhs(cfv->vdev); > > > > > > cfv->vr_rx = NULL; > > > > > > vdev->config->del_vqs(cfv->vdev); > > > > > > unregister_netdev(cfv->ndev); > > > > > > } > > > > > > This is racy since device could be re-opened after dev_close() but > > > > > > before unregister_netdevice(): > > > > > > > > > > > > 1) RX vringh is cleaned before resetting the device, rx callbacks that > > > > > > is called after the vringh_kiov_cleanup() will result a UAF > > > > > > 2) Network stack can still try to use TX virtqueue even if it has been > > > > > > deleted after dev_vqs() > > > > > > > > > > > > Fixing this by unregistering the network device first to make sure not > > > > > > device access from both TX and RX side. > > > > > > > > > > > > Fixes: 0d2e1a2926b18 ("caif_virtio: Introduce caif over virtio") > > > > > > Signed-off-by: Jason Wang <jasowang@redhat.com> > > > > > > --- > > > > > > drivers/net/caif/caif_virtio.c | 6 ++---- > > > > > > 1 file changed, 2 insertions(+), 4 deletions(-) > > > > > > > > > > > > diff --git a/drivers/net/caif/caif_virtio.c b/drivers/net/caif/caif_virtio.c > > > > > > index 66375bea2fcd..a29f9b2df5b1 100644 > > > > > > --- a/drivers/net/caif/caif_virtio.c > > > > > > +++ b/drivers/net/caif/caif_virtio.c > > > > > > @@ -752,9 +752,8 @@ static void cfv_remove(struct virtio_device *vdev) > > > > > > { > > > > > > struct cfv_info *cfv = vdev->priv; > > > > > > > > > > > > - rtnl_lock(); > > > > > > - dev_close(cfv->ndev); > > > > > > - rtnl_unlock(); > > > > > > + /* Make sure NAPI/TX won't try to access the device */ > > > > > > + unregister_netdev(cfv->ndev); > > > > > > > > > > > > tasklet_kill(&cfv->tx_release_tasklet); > > > > > > debugfs_remove_recursive(cfv->debugfs); > > > > > > @@ -764,7 +763,6 @@ static void cfv_remove(struct virtio_device *vdev) > > > > > > vdev->vringh_config->del_vrhs(cfv->vdev); > > > > > > cfv->vr_rx = NULL; > > > > > > vdev->config->del_vqs(cfv->vdev); > > > > > > - unregister_netdev(cfv->ndev); > > > > > > } > > > > > > > > > > > > > > > This gives me pause, callbacks can now trigger after device > > > > > has been unregistered. Are we sure this is safe? > > > > > > > > It looks safe, for RX NAPI is disabled. For TX, tasklet is disabled > > > > after tasklet_kill(). I can add a comment to explain this. > > > > > > that waits for outstanding tasklets but does it really prevent > > > future ones? > > > > I think so, it tries to test and set TASKLET_STATE_SCHED which blocks > > the future scheduling of a tasklet. > > > > Thanks > > But then in the end it clears it, does it not? Right, so we need to reset before taskset_kill(). Thanks > > > > > > > > > Won't it be safer to just keep the rtnl_lock around > > > > > the whole process? > > > > > > > > It looks to me we rtnl_lock can't help in synchronizing with the > > > > callbacks, anything I miss? > > > > > > > > Thanks > > > > > > good point. > > > > > > > > > > > > > > > > > static struct virtio_device_id id_table[] = { > > > > > > -- > > > > > > 2.25.1 > > > > > > > > > ^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2022-06-21 6:25 UTC | newest] Thread overview: 13+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2022-06-20 5:11 [PATCH 0/3] Fixing races in probe/remove Jason Wang 2022-06-20 5:11 ` [PATCH 1/3] caif_virtio: remove virtqueue_disable_cb() in probe Jason Wang 2022-06-20 9:02 ` Michael S. Tsirkin 2022-06-21 3:10 ` Jason Wang 2022-06-20 5:11 ` [PATCH 2/3] caif_virtio: fix the race between virtio_device_ready() and ndo_open() Jason Wang 2022-06-20 9:04 ` Michael S. Tsirkin 2022-06-20 5:11 ` [PATCH 3/3] caif_virtio: fix the race between reset and netdev unregister Jason Wang 2022-06-20 9:09 ` Michael S. Tsirkin 2022-06-20 9:18 ` Jason Wang 2022-06-20 10:18 ` Michael S. Tsirkin 2022-06-21 3:09 ` Jason Wang 2022-06-21 6:00 ` Michael S. Tsirkin 2022-06-21 6:25 ` Jason Wang
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).