* Re: [PATCH net-next] virtio_net: set link state down when virtqueue is broken [not found] <79907bf6c835572b4af92f16d9a3ff2822b1c7ea.1622028946.git.wangyunjian@huawei.com> @ 2021-05-27 4:22 ` Jason Wang [not found] ` <d18383f7e675452d9392321506db6fa0@huawei.com> 0 siblings, 1 reply; 4+ messages in thread From: Jason Wang @ 2021-05-27 4:22 UTC (permalink / raw) To: wangyunjian, netdev; +Cc: kuba, virtualization, dingxiaoxiong, davem, mst 在 2021/5/26 下午7:39, wangyunjian 写道: > From: Yunjian Wang <wangyunjian@huawei.com> > > The NIC can't receive/send packets if a rx/tx virtqueue is broken. > However, the link state of the NIC is still normal. As a result, > the user cannot detect the NIC exception. Doesn't we have: /* This should not happen! */ if (unlikely(err)) { dev->stats.tx_fifo_errors++; if (net_ratelimit()) dev_warn(&dev->dev, "Unexpected TXQ (%d) queue failure: %d\n", qnum, err); dev->stats.tx_dropped++; dev_kfree_skb_any(skb); return NETDEV_TX_OK; } Which should be sufficient? > > The driver can set the link state down when the virtqueue is broken. > If the state is down, the user can switch over to another NIC. Note that, we probably need the watchdog for virtio-net in order to be a complete solution. Thanks > > Signed-off-by: Yunjian Wang <wangyunjian@huawei.com> > --- > drivers/net/virtio_net.c | 36 +++++++++++++++++++++++++++++++++++- > 1 file changed, 35 insertions(+), 1 deletion(-) > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > index 073fec4c0df1..05a3cd1c589b 100644 > --- a/drivers/net/virtio_net.c > +++ b/drivers/net/virtio_net.c > @@ -237,6 +237,10 @@ struct virtnet_info { > > /* failover when STANDBY feature enabled */ > struct failover *failover; > + > + /* Work struct for checking vq status, stop NIC if vq is broken. */ > + struct delayed_work vq_check_work; > + bool broken; > }; > > struct padded_vnet_hdr { > @@ -1407,6 +1411,27 @@ static void refill_work(struct work_struct *work) > } > } > > +static void virnet_vq_check_work(struct work_struct *work) > +{ > + struct virtnet_info *vi = > + container_of(work, struct virtnet_info, vq_check_work.work); > + struct net_device *netdev = vi->dev; > + int i; > + > + if (vi->broken) > + return; > + > + /* If virtqueue is broken, set link down and stop all queues */ > + for (i = 0; i < vi->max_queue_pairs; i++) { > + if (virtqueue_is_broken(vi->rq[i].vq) || virtqueue_is_broken(vi->sq[i].vq)) { > + netif_carrier_off(netdev); > + netif_tx_stop_all_queues(netdev); > + vi->broken = true; > + break; > + } > + } > +} > + > static int virtnet_receive(struct receive_queue *rq, int budget, > unsigned int *xdp_xmit) > { > @@ -1432,6 +1457,9 @@ static int virtnet_receive(struct receive_queue *rq, int budget, > } > } > > + if (unlikely(!virtqueue_is_broken(rq->vq))) > + schedule_delayed_work(&vi->vq_check_work, HZ); > + > if (rq->vq->num_free > min((unsigned int)budget, virtqueue_get_vring_size(rq->vq)) / 2) { > if (!try_fill_recv(vi, rq, GFP_ATOMIC)) > schedule_delayed_work(&vi->refill, 0); > @@ -1681,6 +1709,7 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev) > qnum, err); > dev->stats.tx_dropped++; > dev_kfree_skb_any(skb); > + schedule_delayed_work(&vi->vq_check_work, HZ); > return NETDEV_TX_OK; > } > > @@ -1905,6 +1934,7 @@ static int virtnet_close(struct net_device *dev) > > /* Make sure refill_work doesn't re-enable napi! */ > cancel_delayed_work_sync(&vi->refill); > + cancel_delayed_work_sync(&vi->vq_check_work); > > for (i = 0; i < vi->max_queue_pairs; i++) { > xdp_rxq_info_unreg(&vi->rq[i].xdp_rxq); > @@ -2381,6 +2411,7 @@ static void virtnet_freeze_down(struct virtio_device *vdev) > netif_device_detach(vi->dev); > netif_tx_unlock_bh(vi->dev); > cancel_delayed_work_sync(&vi->refill); > + cancel_delayed_work_sync(&vi->vq_check_work); > > if (netif_running(vi->dev)) { > for (i = 0; i < vi->max_queue_pairs; i++) { > @@ -2662,7 +2693,7 @@ static void virtnet_config_changed_work(struct work_struct *work) > > vi->status = v; > > - if (vi->status & VIRTIO_NET_S_LINK_UP) { > + if ((vi->status & VIRTIO_NET_S_LINK_UP) && !vi->broken) { > virtnet_update_settings(vi); > netif_carrier_on(vi->dev); > netif_tx_wake_all_queues(vi->dev); > @@ -2889,6 +2920,8 @@ static int virtnet_alloc_queues(struct virtnet_info *vi) > goto err_rq; > > INIT_DELAYED_WORK(&vi->refill, refill_work); > + INIT_DELAYED_WORK(&vi->vq_check_work, virnet_vq_check_work); > + > for (i = 0; i < vi->max_queue_pairs; i++) { > vi->rq[i].pages = NULL; > netif_napi_add(vi->dev, &vi->rq[i].napi, virtnet_poll, > @@ -3240,6 +3273,7 @@ static int virtnet_probe(struct virtio_device *vdev) > net_failover_destroy(vi->failover); > free_vqs: > cancel_delayed_work_sync(&vi->refill); > + cancel_delayed_work_sync(&vi->vq_check_work); > free_receive_page_frags(vi); > virtnet_del_vqs(vi); > free: _______________________________________________ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization ^ permalink raw reply [flat|nested] 4+ messages in thread
[parent not found: <d18383f7e675452d9392321506db6fa0@huawei.com>]
* Re: [PATCH net-next] virtio_net: set link state down when virtqueue is broken [not found] ` <d18383f7e675452d9392321506db6fa0@huawei.com> @ 2021-05-31 3:28 ` Jason Wang [not found] ` <20a5f1bd8a5a49fa8c0f90875a49631b@huawei.com> 0 siblings, 1 reply; 4+ messages in thread From: Jason Wang @ 2021-05-31 3:28 UTC (permalink / raw) To: wangyunjian, netdev@vger.kernel.org Cc: kuba@kernel.org, virtualization@lists.linux-foundation.org, dingxiaoxiong, davem@davemloft.net, mst@redhat.com 在 2021/5/28 下午6:58, wangyunjian 写道: >> -----Original Message----- >>> From: Yunjian Wang <wangyunjian@huawei.com> >>> >>> The NIC can't receive/send packets if a rx/tx virtqueue is broken. >>> However, the link state of the NIC is still normal. As a result, the >>> user cannot detect the NIC exception. >> >> Doesn't we have: >> >> /* This should not happen! */ >> if (unlikely(err)) { >> dev->stats.tx_fifo_errors++; >> if (net_ratelimit()) >> dev_warn(&dev->dev, >> "Unexpected TXQ (%d) queue >> failure: %d\n", >> qnum, err); >> dev->stats.tx_dropped++; >> dev_kfree_skb_any(skb); >> return NETDEV_TX_OK; >> } >> >> Which should be sufficient? > There may be other reasons for this error, e.g -ENOSPC(no free desc). This should not happen unless the device or driver is buggy. We always reserved sufficient slots: if (sq->vq->num_free < 2+MAX_SKB_FRAGS) { netif_stop_subqueue(dev, qnum); ... > And if rx virtqueue is broken, there is no error statistics. Feel free to add one if it's necessary. Let's leave the policy decision (link down) to userspace. > >> >>> The driver can set the link state down when the virtqueue is broken. >>> If the state is down, the user can switch over to another NIC. >> >> Note that, we probably need the watchdog for virtio-net in order to be a >> complete solution. > Yes, I can think of is that the virtqueue's broken exception is detected on watchdog. > Is there anything else that needs to be done? Basically, it's all about TX stall which watchdog tries to catch. Broken vq is only one of the possible reason. Thanks > > Thanks > >> Thanks >> >> _______________________________________________ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization ^ permalink raw reply [flat|nested] 4+ messages in thread
[parent not found: <20a5f1bd8a5a49fa8c0f90875a49631b@huawei.com>]
* Re: [PATCH net-next] virtio_net: set link state down when virtqueue is broken [not found] ` <20a5f1bd8a5a49fa8c0f90875a49631b@huawei.com> @ 2021-06-04 2:37 ` Jason Wang [not found] ` <5d6fdd5c8e62498ba804aa22d71eb6a8@huawei.com> 0 siblings, 1 reply; 4+ messages in thread From: Jason Wang @ 2021-06-04 2:37 UTC (permalink / raw) To: wangyunjian, netdev@vger.kernel.org Cc: kuba@kernel.org, virtualization@lists.linux-foundation.org, dingxiaoxiong, davem@davemloft.net, mst@redhat.com 在 2021/6/3 下午7:34, wangyunjian 写道: >> -----Original Message----- >> From: Jason Wang [mailto:jasowang@redhat.com] >> Sent: Monday, May 31, 2021 11:29 AM >> To: wangyunjian <wangyunjian@huawei.com>; netdev@vger.kernel.org >> Cc: kuba@kernel.org; davem@davemloft.net; mst@redhat.com; >> virtualization@lists.linux-foundation.org; dingxiaoxiong >> <dingxiaoxiong@huawei.com> >> Subject: Re: [PATCH net-next] virtio_net: set link state down when virtqueue is >> broken >> >> >> 在 2021/5/28 下午6:58, wangyunjian 写道: >>>> -----Original Message----- >>>>> From: Yunjian Wang <wangyunjian@huawei.com> >>>>> >>>>> The NIC can't receive/send packets if a rx/tx virtqueue is broken. >>>>> However, the link state of the NIC is still normal. As a result, the >>>>> user cannot detect the NIC exception. >>>> Doesn't we have: >>>> >>>> /* This should not happen! */ >>>> if (unlikely(err)) { >>>> dev->stats.tx_fifo_errors++; >>>> if (net_ratelimit()) >>>> dev_warn(&dev->dev, >>>> "Unexpected TXQ (%d) >> queue >>>> failure: %d\n", >>>> qnum, err); >>>> dev->stats.tx_dropped++; >>>> dev_kfree_skb_any(skb); >>>> return NETDEV_TX_OK; >>>> } >>>> >>>> Which should be sufficient? >>> There may be other reasons for this error, e.g -ENOSPC(no free desc). >> >> This should not happen unless the device or driver is buggy. We always reserved >> sufficient slots: >> >> if (sq->vq->num_free < 2+MAX_SKB_FRAGS) { >> netif_stop_subqueue(dev, qnum); ... >> >> >>> And if rx virtqueue is broken, there is no error statistics. >> >> Feel free to add one if it's necessary. > Currently receiving scenario, it is impossible to distinguish whether the reason for > not receiving packet is virtqueue's broken or no packet. Can we introduce rx_fifo_errors for that? > >> Let's leave the policy decision (link down) to userspace. >> >> >>>>> The driver can set the link state down when the virtqueue is broken. >>>>> If the state is down, the user can switch over to another NIC. >>>> Note that, we probably need the watchdog for virtio-net in order to >>>> be a complete solution. >>> Yes, I can think of is that the virtqueue's broken exception is detected on >> watchdog. >>> Is there anything else that needs to be done? >> >> Basically, it's all about TX stall which watchdog tries to catch. Broken vq is only >> one of the possible reason. > Are there any plans for the watchdog? Somebody posted a prototype 3 or 4 years ago, you can search it and maybe we can start from there. Thanks > > Thanks > >> Thanks >> >> >>> Thanks >>> >>>> Thanks >>>> >>>> _______________________________________________ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization ^ permalink raw reply [flat|nested] 4+ messages in thread
[parent not found: <5d6fdd5c8e62498ba804aa22d71eb6a8@huawei.com>]
* Re: [PATCH net-next] virtio_net: set link state down when virtqueue is broken [not found] ` <5d6fdd5c8e62498ba804aa22d71eb6a8@huawei.com> @ 2021-06-07 2:28 ` Jason Wang 0 siblings, 0 replies; 4+ messages in thread From: Jason Wang @ 2021-06-07 2:28 UTC (permalink / raw) To: wangyunjian, netdev@vger.kernel.org, jcfaracco@gmail.com Cc: kuba@kernel.org, virtualization@lists.linux-foundation.org, dingxiaoxiong, davem@davemloft.net, mst@redhat.com 在 2021/6/5 下午3:10, wangyunjian 写道: >> -----Original Message----- >> From: Jason Wang [mailto:jasowang@redhat.com] >> Sent: Friday, June 4, 2021 10:38 AM >> To: wangyunjian <wangyunjian@huawei.com>; netdev@vger.kernel.org >> Cc: kuba@kernel.org; davem@davemloft.net; mst@redhat.com; >> virtualization@lists.linux-foundation.org; dingxiaoxiong >> <dingxiaoxiong@huawei.com> >> Subject: Re: [PATCH net-next] virtio_net: set link state down when virtqueue is >> broken >> >> >> 在 2021/6/3 下午7:34, wangyunjian 写道: >>>> -----Original Message----- >>>> From: Jason Wang [mailto:jasowang@redhat.com] >>>> Sent: Monday, May 31, 2021 11:29 AM >>>> To: wangyunjian <wangyunjian@huawei.com>; netdev@vger.kernel.org >>>> Cc: kuba@kernel.org; davem@davemloft.net; mst@redhat.com; >>>> virtualization@lists.linux-foundation.org; dingxiaoxiong >>>> <dingxiaoxiong@huawei.com> >>>> Subject: Re: [PATCH net-next] virtio_net: set link state down when >>>> virtqueue is broken >>>> >>>> >>>> 在 2021/5/28 下午6:58, wangyunjian 写道: >>>>>> -----Original Message----- >>>>>>> From: Yunjian Wang <wangyunjian@huawei.com> >>>>>>> >>>>>>> The NIC can't receive/send packets if a rx/tx virtqueue is broken. >>>>>>> However, the link state of the NIC is still normal. As a result, >>>>>>> the user cannot detect the NIC exception. >>>>>> Doesn't we have: >>>>>> >>>>>> /* This should not happen! */ >>>>>> if (unlikely(err)) { >>>>>> dev->stats.tx_fifo_errors++; >>>>>> if (net_ratelimit()) >>>>>> dev_warn(&dev->dev, >>>>>> "Unexpected TXQ (%d) >>>> queue >>>>>> failure: %d\n", >>>>>> qnum, err); >>>>>> dev->stats.tx_dropped++; >>>>>> dev_kfree_skb_any(skb); >>>>>> return NETDEV_TX_OK; >>>>>> } >>>>>> >>>>>> Which should be sufficient? >>>>> There may be other reasons for this error, e.g -ENOSPC(no free desc). >>>> This should not happen unless the device or driver is buggy. We >>>> always reserved sufficient slots: >>>> >>>> if (sq->vq->num_free < 2+MAX_SKB_FRAGS) { >>>> netif_stop_subqueue(dev, qnum); ... >>>> >>>> >>>>> And if rx virtqueue is broken, there is no error statistics. >>>> Feel free to add one if it's necessary. >>> Currently receiving scenario, it is impossible to distinguish whether >>> the reason for not receiving packet is virtqueue's broken or no packet. >> >> Can we introduce rx_fifo_errors for that? >> >> >>>> Let's leave the policy decision (link down) to userspace. >>>> >>>> >>>>>>> The driver can set the link state down when the virtqueue is broken. >>>>>>> If the state is down, the user can switch over to another NIC. >>>>>> Note that, we probably need the watchdog for virtio-net in order to >>>>>> be a complete solution. >>>>> Yes, I can think of is that the virtqueue's broken exception is >>>>> detected on >>>> watchdog. >>>>> Is there anything else that needs to be done? >>>> Basically, it's all about TX stall which watchdog tries to catch. >>>> Broken vq is only one of the possible reason. >>> Are there any plans for the watchdog? >> >> Somebody posted a prototype 3 or 4 years ago, you can search it and maybe we >> can start from there. > I find the patch (https://patchwork.ozlabs.org/project/netdev/patch/20191126200628.22251-3-jcfaracco@gmail.com/) > > The patch checks only the scenario where the sending queue is abnormal, but can > not detect the exception in the receiving queue. It's almost impossible to detect the abnormal of receiving queue since we there's no deterministic time for a receiving packet. > > And the patch restores the NIC by reset, which is inappropriate because the broken > state may be caused by a front-end or back-end bug. We should keep the scene to > locate bugs. This could be changed, we can increase the error counters and let userspce to decide what to do. Thanks > > Thanks > >> Thanks >> >> >>> Thanks >>> >>>> Thanks >>>> >>>> >>>>> Thanks >>>>> >>>>>> Thanks >>>>>> >>>>>> _______________________________________________ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization ^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2021-06-07 2:29 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <79907bf6c835572b4af92f16d9a3ff2822b1c7ea.1622028946.git.wangyunjian@huawei.com>
2021-05-27 4:22 ` [PATCH net-next] virtio_net: set link state down when virtqueue is broken Jason Wang
[not found] ` <d18383f7e675452d9392321506db6fa0@huawei.com>
2021-05-31 3:28 ` Jason Wang
[not found] ` <20a5f1bd8a5a49fa8c0f90875a49631b@huawei.com>
2021-06-04 2:37 ` Jason Wang
[not found] ` <5d6fdd5c8e62498ba804aa22d71eb6a8@huawei.com>
2021-06-07 2:28 ` Jason Wang
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).