* [PATCH 0/2] virtio-net: a fix and some updates for virtio dim
@ 2024-03-21 11:45 Heng Qi
2024-03-21 11:45 ` [PATCH 1/2] virtio-net: fix possible dim status unrecoverable Heng Qi
0 siblings, 3 replies; 23+ messages in thread
From: Heng Qi @ 2024-03-21 11:45 UTC (permalink / raw)
To: netdev, virtualization, Jason Wang, Michael S. Tsirkin,
Jakub Kicinski, Paolo Abeni, Eric Dumazet, David S. Miller,
Xuan Zhuo
Patch 1 fixes an existing bug and belongs to the net branch.
Patch 2 changes the sending of dim commands to an asynchronous model.
Heng Qi (2):
virtio-net: fix possible dim status unrecoverable
virtio-net: reduce the CPU consumption of dim worker
drivers/net/virtio_net.c | 273 ++++++++++++++++++++++++++++++++++++++++++-----
1 file changed, 246 insertions(+), 27 deletions(-)
--
1.8.3.1
^ permalink raw reply [flat|nested] 23+ messages in thread* [PATCH 1/2] virtio-net: fix possible dim status unrecoverable 2024-03-21 11:45 [PATCH 0/2] virtio-net: a fix and some updates for virtio dim Heng Qi @ 2024-03-21 11:45 ` Heng Qi 2024-03-22 5:17 ` Jason Wang 2024-03-21 11:45 ` [PATCH 2/2] virtio-net: reduce the CPU consumption of dim worker Heng Qi 2024-03-21 12:25 ` [PATCH 0/2] virtio-net: a fix and some updates for virtio dim Jiri Pirko 2 siblings, 1 reply; 23+ messages in thread From: Heng Qi @ 2024-03-21 11:45 UTC (permalink / raw) To: netdev, virtualization, Jason Wang, Michael S. Tsirkin, Jakub Kicinski, Paolo Abeni, Eric Dumazet, David S. Miller, Xuan Zhuo When the dim worker is scheduled, if it fails to acquire the lock, dim may not be able to return to the working state later. For example, the following single queue scenario: 1. The dim worker of rxq0 is scheduled, and the dim status is changed to DIM_APPLY_NEW_PROFILE; 2. The ethtool command is holding rtnl lock; 3. Since the rtnl lock is already held, virtnet_rx_dim_work fails to acquire the lock and exits; Then, even if net_dim is invoked again, it cannot work because the state is not restored to DIM_START_MEASURE. Fixes: 6208799553a8 ("virtio-net: support rx netdim") Signed-off-by: Heng Qi <hengqi@linux.alibaba.com> --- drivers/net/virtio_net.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index c22d111..0ebe322 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -3563,8 +3563,10 @@ static void virtnet_rx_dim_work(struct work_struct *work) struct dim_cq_moder update_moder; int i, qnum, err; - if (!rtnl_trylock()) + if (!rtnl_trylock()) { + schedule_work(&dim->work); return; + } /* Each rxq's work is queued by "net_dim()->schedule_work()" * in response to NAPI traffic changes. Note that dim->profile_ix -- 1.8.3.1 ^ permalink raw reply related [flat|nested] 23+ messages in thread
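The failure mode and the requeue fix can be modeled in isolation. The following is a minimal userspace sketch (the types and the two flags are hypothetical stand-ins, not the kernel's `struct dim` or workqueue API) showing why returning early on lock contention strands the state machine, and why requeueing the work recovers it:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical two-state model of the dim worker under discussion. */
enum dim_state { DIM_START_MEASURE, DIM_APPLY_NEW_PROFILE };

struct dim {
    enum dim_state state;
    bool work_pending;  /* stands in for schedule_work(&dim->work) */
};

/* One invocation of the worker. got_rtnl models rtnl_trylock() success;
 * requeue_on_contention toggles the fix. */
static void dim_work(struct dim *dim, bool got_rtnl,
                     bool requeue_on_contention)
{
    dim->work_pending = false;
    if (!got_rtnl) {
        if (requeue_on_contention)
            dim->work_pending = true;  /* the fix: try again later */
        return;  /* pre-fix: state stays DIM_APPLY_NEW_PROFILE forever */
    }
    /* apply the new profile, then let net_dim() measure again */
    dim->state = DIM_START_MEASURE;
}
```

Without the requeue, a single contended run leaves the state at `DIM_APPLY_NEW_PROFILE` with no pending work, so later `net_dim()` calls never schedule the worker again.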
* Re: [PATCH 1/2] virtio-net: fix possible dim status unrecoverable 2024-03-21 11:45 ` [PATCH 1/2] virtio-net: fix possible dim status unrecoverable Heng Qi @ 2024-03-22 5:17 ` Jason Wang 2024-03-25 2:11 ` Heng Qi 0 siblings, 1 reply; 23+ messages in thread From: Jason Wang @ 2024-03-22 5:17 UTC (permalink / raw) To: Heng Qi Cc: netdev, virtualization, Michael S. Tsirkin, Jakub Kicinski, Paolo Abeni, Eric Dumazet, David S. Miller, Xuan Zhuo On Thu, Mar 21, 2024 at 7:46 PM Heng Qi <hengqi@linux.alibaba.com> wrote: > > When the dim worker is scheduled, if it fails to acquire the lock, > dim may not be able to return to the working state later. > > For example, the following single queue scenario: > 1. The dim worker of rxq0 is scheduled, and the dim status is > changed to DIM_APPLY_NEW_PROFILE; > 2. The ethtool command is holding rtnl lock; > 3. Since the rtnl lock is already held, virtnet_rx_dim_work fails > to acquire the lock and exits; > > Then, even if net_dim is invoked again, it cannot work because the > state is not restored to DIM_START_MEASURE. > > Fixes: 6208799553a8 ("virtio-net: support rx netdim") > Signed-off-by: Heng Qi <hengqi@linux.alibaba.com> > --- > drivers/net/virtio_net.c | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > index c22d111..0ebe322 100644 > --- a/drivers/net/virtio_net.c > +++ b/drivers/net/virtio_net.c > @@ -3563,8 +3563,10 @@ static void virtnet_rx_dim_work(struct work_struct *work) > struct dim_cq_moder update_moder; > int i, qnum, err; > > - if (!rtnl_trylock()) > + if (!rtnl_trylock()) { > + schedule_work(&dim->work); > return; > + } Patch looks fine but I wonder if a delayed schedule is better. Thanks > > /* Each rxq's work is queued by "net_dim()->schedule_work()" > * in response to NAPI traffic changes. Note that dim->profile_ix > -- > 1.8.3.1 > ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 1/2] virtio-net: fix possible dim status unrecoverable 2024-03-22 5:17 ` Jason Wang @ 2024-03-25 2:11 ` Heng Qi 2024-03-25 6:29 ` Jason Wang 0 siblings, 1 reply; 23+ messages in thread From: Heng Qi @ 2024-03-25 2:11 UTC (permalink / raw) To: Jason Wang Cc: netdev, virtualization, Michael S. Tsirkin, Jakub Kicinski, Paolo Abeni, Eric Dumazet, David S. Miller, Xuan Zhuo 在 2024/3/22 下午1:17, Jason Wang 写道: > On Thu, Mar 21, 2024 at 7:46 PM Heng Qi <hengqi@linux.alibaba.com> wrote: >> When the dim worker is scheduled, if it fails to acquire the lock, >> dim may not be able to return to the working state later. >> >> For example, the following single queue scenario: >> 1. The dim worker of rxq0 is scheduled, and the dim status is >> changed to DIM_APPLY_NEW_PROFILE; >> 2. The ethtool command is holding rtnl lock; >> 3. Since the rtnl lock is already held, virtnet_rx_dim_work fails >> to acquire the lock and exits; >> >> Then, even if net_dim is invoked again, it cannot work because the >> state is not restored to DIM_START_MEASURE. >> >> Fixes: 6208799553a8 ("virtio-net: support rx netdim") >> Signed-off-by: Heng Qi <hengqi@linux.alibaba.com> >> --- >> drivers/net/virtio_net.c | 4 +++- >> 1 file changed, 3 insertions(+), 1 deletion(-) >> >> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c >> index c22d111..0ebe322 100644 >> --- a/drivers/net/virtio_net.c >> +++ b/drivers/net/virtio_net.c >> @@ -3563,8 +3563,10 @@ static void virtnet_rx_dim_work(struct work_struct *work) >> struct dim_cq_moder update_moder; >> int i, qnum, err; >> >> - if (!rtnl_trylock()) >> + if (!rtnl_trylock()) { >> + schedule_work(&dim->work); >> return; >> + } > Patch looks fine but I wonder if a delayed schedule is better. The work in net_dim() core layer uses non-delayed-work, and the two cannot be mixed. Thanks, Heng > > Thanks > >> /* Each rxq's work is queued by "net_dim()->schedule_work()" >> * in response to NAPI traffic changes. 
Note that dim->profile_ix >> -- >> 1.8.3.1 >> ^ permalink raw reply [flat|nested] 23+ messages in thread
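The "cannot be mixed" point rests on the container layout: a delayed work item wraps a plain work handle together with a timer, while the dim core embeds only the plain handle. A minimal userspace sketch (these structs are simplified stand-ins for the kernel's `struct work_struct` and `struct delayed_work`) makes the layout difference concrete:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's plain work handle. */
struct work_struct {
    void (*func)(struct work_struct *work);
};

/* The delayed variant wraps the plain handle plus timer state; a
 * delayed-queueing API would container_of() back to this outer type. */
struct delayed_work {
    struct work_struct work;
    unsigned long timer_expires;  /* stand-in for struct timer_list */
};

/* The dim core embeds only the plain handle, so passing &dim->work to a
 * delayed-queueing API would read timer fields past the allocation. */
struct dim {
    int state;
    struct work_struct work;  /* what net_dim()->schedule_work() queues */
};
```

This is why switching virtio-net's dim to delayed work would also require changing what the dim core embeds, rather than just the call site.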
* Re: [PATCH 1/2] virtio-net: fix possible dim status unrecoverable 2024-03-25 2:11 ` Heng Qi @ 2024-03-25 6:29 ` Jason Wang 2024-03-25 6:57 ` Heng Qi 0 siblings, 1 reply; 23+ messages in thread From: Jason Wang @ 2024-03-25 6:29 UTC (permalink / raw) To: Heng Qi Cc: netdev, virtualization, Michael S. Tsirkin, Jakub Kicinski, Paolo Abeni, Eric Dumazet, David S. Miller, Xuan Zhuo On Mon, Mar 25, 2024 at 10:11 AM Heng Qi <hengqi@linux.alibaba.com> wrote: > > > > 在 2024/3/22 下午1:17, Jason Wang 写道: > > On Thu, Mar 21, 2024 at 7:46 PM Heng Qi <hengqi@linux.alibaba.com> wrote: > >> When the dim worker is scheduled, if it fails to acquire the lock, > >> dim may not be able to return to the working state later. > >> > >> For example, the following single queue scenario: > >> 1. The dim worker of rxq0 is scheduled, and the dim status is > >> changed to DIM_APPLY_NEW_PROFILE; > >> 2. The ethtool command is holding rtnl lock; > >> 3. Since the rtnl lock is already held, virtnet_rx_dim_work fails > >> to acquire the lock and exits; > >> > >> Then, even if net_dim is invoked again, it cannot work because the > >> state is not restored to DIM_START_MEASURE. > >> > >> Fixes: 6208799553a8 ("virtio-net: support rx netdim") > >> Signed-off-by: Heng Qi <hengqi@linux.alibaba.com> > >> --- > >> drivers/net/virtio_net.c | 4 +++- > >> 1 file changed, 3 insertions(+), 1 deletion(-) > >> > >> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > >> index c22d111..0ebe322 100644 > >> --- a/drivers/net/virtio_net.c > >> +++ b/drivers/net/virtio_net.c > >> @@ -3563,8 +3563,10 @@ static void virtnet_rx_dim_work(struct work_struct *work) > >> struct dim_cq_moder update_moder; > >> int i, qnum, err; > >> > >> - if (!rtnl_trylock()) > >> + if (!rtnl_trylock()) { > >> + schedule_work(&dim->work); > >> return; > >> + } > > Patch looks fine but I wonder if a delayed schedule is better. > > The work in net_dim() core layer uses non-delayed-work, and the two > cannot be mixed. 
Well, I think we need first to figure out if delayed work is better here. Switching to use delayed work for dim seems not hard anyhow. Thanks > > Thanks, > Heng > > > > > Thanks > > > >> /* Each rxq's work is queued by "net_dim()->schedule_work()" > >> * in response to NAPI traffic changes. Note that dim->profile_ix > >> -- > >> 1.8.3.1 > >> > ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 1/2] virtio-net: fix possible dim status unrecoverable 2024-03-25 6:29 ` Jason Wang @ 2024-03-25 6:57 ` Heng Qi 2024-03-25 7:06 ` Jason Wang 0 siblings, 1 reply; 23+ messages in thread From: Heng Qi @ 2024-03-25 6:57 UTC (permalink / raw) To: Jason Wang Cc: netdev, virtualization, Michael S. Tsirkin, Jakub Kicinski, Paolo Abeni, Eric Dumazet, David S. Miller, Xuan Zhuo 在 2024/3/25 下午2:29, Jason Wang 写道: > On Mon, Mar 25, 2024 at 10:11 AM Heng Qi <hengqi@linux.alibaba.com> wrote: >> >> >> 在 2024/3/22 下午1:17, Jason Wang 写道: >>> On Thu, Mar 21, 2024 at 7:46 PM Heng Qi <hengqi@linux.alibaba.com> wrote: >>>> When the dim worker is scheduled, if it fails to acquire the lock, >>>> dim may not be able to return to the working state later. >>>> >>>> For example, the following single queue scenario: >>>> 1. The dim worker of rxq0 is scheduled, and the dim status is >>>> changed to DIM_APPLY_NEW_PROFILE; >>>> 2. The ethtool command is holding rtnl lock; >>>> 3. Since the rtnl lock is already held, virtnet_rx_dim_work fails >>>> to acquire the lock and exits; >>>> >>>> Then, even if net_dim is invoked again, it cannot work because the >>>> state is not restored to DIM_START_MEASURE. >>>> >>>> Fixes: 6208799553a8 ("virtio-net: support rx netdim") >>>> Signed-off-by: Heng Qi <hengqi@linux.alibaba.com> >>>> --- >>>> drivers/net/virtio_net.c | 4 +++- >>>> 1 file changed, 3 insertions(+), 1 deletion(-) >>>> >>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c >>>> index c22d111..0ebe322 100644 >>>> --- a/drivers/net/virtio_net.c >>>> +++ b/drivers/net/virtio_net.c >>>> @@ -3563,8 +3563,10 @@ static void virtnet_rx_dim_work(struct work_struct *work) >>>> struct dim_cq_moder update_moder; >>>> int i, qnum, err; >>>> >>>> - if (!rtnl_trylock()) >>>> + if (!rtnl_trylock()) { >>>> + schedule_work(&dim->work); >>>> return; >>>> + } >>> Patch looks fine but I wonder if a delayed schedule is better. 
>> The work in net_dim() core layer uses non-delayed-work, and the two >> cannot be mixed. > Well, I think we need first to figure out if delayed work is better here. I tested a VM with 16 NICs, 128 queues per NIC (2kq total). With dim enabled on all queues, there are many opportunities for contention for rtnl lock, and this patch introduces no visible hotspots. The dim performance is also stable. So I think there doesn't seem to be a strong motivation right now. Thanks, Heng > > Switching to use delayed work for dim seems not hard anyhow. > > Thanks > >> Thanks, >> Heng >> >>> Thanks >>> >>>> /* Each rxq's work is queued by "net_dim()->schedule_work()" >>>> * in response to NAPI traffic changes. Note that dim->profile_ix >>>> -- >>>> 1.8.3.1 >>>> ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 1/2] virtio-net: fix possible dim status unrecoverable 2024-03-25 6:57 ` Heng Qi @ 2024-03-25 7:06 ` Jason Wang 0 siblings, 0 replies; 23+ messages in thread From: Jason Wang @ 2024-03-25 7:06 UTC (permalink / raw) To: Heng Qi Cc: netdev, virtualization, Michael S. Tsirkin, Jakub Kicinski, Paolo Abeni, Eric Dumazet, David S. Miller, Xuan Zhuo On Mon, Mar 25, 2024 at 2:58 PM Heng Qi <hengqi@linux.alibaba.com> wrote: > > > > 在 2024/3/25 下午2:29, Jason Wang 写道: > > On Mon, Mar 25, 2024 at 10:11 AM Heng Qi <hengqi@linux.alibaba.com> wrote: > >> > >> > >> 在 2024/3/22 下午1:17, Jason Wang 写道: > >>> On Thu, Mar 21, 2024 at 7:46 PM Heng Qi <hengqi@linux.alibaba.com> wrote: > >>>> When the dim worker is scheduled, if it fails to acquire the lock, > >>>> dim may not be able to return to the working state later. > >>>> > >>>> For example, the following single queue scenario: > >>>> 1. The dim worker of rxq0 is scheduled, and the dim status is > >>>> changed to DIM_APPLY_NEW_PROFILE; > >>>> 2. The ethtool command is holding rtnl lock; > >>>> 3. Since the rtnl lock is already held, virtnet_rx_dim_work fails > >>>> to acquire the lock and exits; > >>>> > >>>> Then, even if net_dim is invoked again, it cannot work because the > >>>> state is not restored to DIM_START_MEASURE. 
> >>>> > >>>> Fixes: 6208799553a8 ("virtio-net: support rx netdim") > >>>> Signed-off-by: Heng Qi <hengqi@linux.alibaba.com> > >>>> --- > >>>> drivers/net/virtio_net.c | 4 +++- > >>>> 1 file changed, 3 insertions(+), 1 deletion(-) > >>>> > >>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > >>>> index c22d111..0ebe322 100644 > >>>> --- a/drivers/net/virtio_net.c > >>>> +++ b/drivers/net/virtio_net.c > >>>> @@ -3563,8 +3563,10 @@ static void virtnet_rx_dim_work(struct work_struct *work) > >>>> struct dim_cq_moder update_moder; > >>>> int i, qnum, err; > >>>> > >>>> - if (!rtnl_trylock()) > >>>> + if (!rtnl_trylock()) { > >>>> + schedule_work(&dim->work); > >>>> return; > >>>> + } > >>> Patch looks fine but I wonder if a delayed schedule is better. > >> The work in net_dim() core layer uses non-delayed-work, and the two > >> cannot be mixed. > > Well, I think we need first to figure out if delayed work is better here. > > I tested a VM with 16 NICs, 128 queues per NIC (2kq total). With dim > enabled on all queues, > there are many opportunities for contention for rtnl lock, and this > patch introduces no visible hotspots. > The dim performance is also stable. So I think there doesn't seem to be > a strong motivation right now. That's fine, let's add them to the changelog. Acked-by: Jason Wang <jasowang@redhat.com> Thanks > > Thanks, > Heng > > > > > Switching to use delayed work for dim seems not hard anyhow. > > > > Thanks > > > >> Thanks, > >> Heng > >> > >>> Thanks > >>> > >>>> /* Each rxq's work is queued by "net_dim()->schedule_work()" > >>>> * in response to NAPI traffic changes. Note that dim->profile_ix > >>>> -- > >>>> 1.8.3.1 > >>>> > ^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH 2/2] virtio-net: reduce the CPU consumption of dim worker 2024-03-21 11:45 [PATCH 0/2] virtio-net: a fix and some updates for virtio dim Heng Qi 2024-03-21 11:45 ` [PATCH 1/2] virtio-net: fix possible dim status unrecoverable Heng Qi @ 2024-03-21 11:45 ` Heng Qi 2024-03-22 2:03 ` kernel test robot ` (2 more replies) 2024-03-21 12:25 ` [PATCH 0/2] virtio-net: a fix and some updates for virtio dim Jiri Pirko 2 siblings, 3 replies; 23+ messages in thread From: Heng Qi @ 2024-03-21 11:45 UTC (permalink / raw) To: netdev, virtualization, Jason Wang, Michael S. Tsirkin, Jakub Kicinski, Paolo Abeni, Eric Dumazet, David S. Miller, Xuan Zhuo Currently, ctrlq processes commands in a synchronous manner, which increases the delay of dim commands when configuring multi-queue VMs, which in turn causes the CPU utilization to increase and interferes with the performance of dim. Therefore we asynchronously process ctlq's dim commands. Signed-off-by: Heng Qi <hengqi@linux.alibaba.com> --- drivers/net/virtio_net.c | 269 ++++++++++++++++++++++++++++++++++++++++++----- 1 file changed, 243 insertions(+), 26 deletions(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 0ebe322..460fc9e 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -138,6 +138,13 @@ struct virtnet_interrupt_coalesce { u32 max_usecs; }; +struct virtnet_coal_node { + struct virtio_net_ctrl_hdr hdr; + virtio_net_ctrl_ack status; + struct virtio_net_ctrl_coal_vq coal_vqs; + struct list_head list; +}; + /* The dma information of pages allocated at a time. */ struct virtnet_rq_dma { dma_addr_t addr; @@ -300,6 +307,9 @@ struct virtnet_info { /* Work struct for delayed refilling if we run low on memory. */ struct delayed_work refill; + /* Work struct for delayed acquisition of cvq processing results. */ + struct delayed_work get_cvq; + /* Is delayed refill enabled? 
*/ bool refill_enabled; @@ -332,6 +342,10 @@ struct virtnet_info { bool rx_dim_enabled; /* Interrupt coalescing settings */ + int cvq_cmd_nums; + int batch_dim_nums; + int dim_loop_index; + struct list_head coal_list; struct virtnet_interrupt_coalesce intr_coal_tx; struct virtnet_interrupt_coalesce intr_coal_rx; @@ -2522,6 +2536,64 @@ static int virtnet_tx_resize(struct virtnet_info *vi, return err; } +static void virtnet_process_dim_cmd(struct virtnet_info *vi, void *res) +{ + struct virtnet_coal_node *coal_node; + u16 queue; + + vi->cvq_cmd_nums--; + + coal_node = (struct virtnet_coal_node *)res; + list_add(&coal_node->list, &vi->coal_list); + + queue = le16_to_cpu(coal_node->coal_vqs.vqn) / 2; + vi->rq[queue].dim.state = DIM_START_MEASURE; +} + +/** + * virtnet_cvq_response - get the response for filled ctrlq requests + * @poll: keep polling ctrlq when a NULL buffer is obtained. + * @dim_oneshot: process a dim cmd then exit, excluding user commands. + * + * Note that user commands must be processed synchronously + * (poll = true, dim_oneshot = false). + */ +static void virtnet_cvq_response(struct virtnet_info *vi, + bool poll, + bool dim_oneshot) +{ + unsigned tmp; + void *res; + + while (true) { + res = virtqueue_get_buf(vi->cvq, &tmp); + if (virtqueue_is_broken(vi->cvq)) { + dev_warn(&vi->dev->dev, "Control vq is broken.\n"); + return; + } + + if (!res) { + if (!poll) + return; + + cond_resched(); + cpu_relax(); + continue; + } + + /* this does not occur inside the process of waiting dim */ + if (res == ((void *)vi)) + return; + + virtnet_process_dim_cmd(vi, res); + /* When it is a user command, we must wait until the + * processing result is processed synchronously. + */ + if (dim_oneshot) + return; + } +} + /* * Send command via the control virtqueue and check status. 
Commands * supported by the hypervisor, as indicated by feature bits, should @@ -2531,7 +2603,7 @@ static bool virtnet_send_command(struct virtnet_info *vi, u8 class, u8 cmd, struct scatterlist *out) { struct scatterlist *sgs[4], hdr, stat; - unsigned out_num = 0, tmp; + unsigned out_num = 0; int ret; /* Caller should know better */ @@ -2552,6 +2624,13 @@ static bool virtnet_send_command(struct virtnet_info *vi, u8 class, u8 cmd, sgs[out_num] = &stat; BUG_ON(out_num + 1 > ARRAY_SIZE(sgs)); + + /* The additional task (dim) consumes the descriptor asynchronously, + * so we must ensure that there is a location for us. + */ + if (vi->cvq->num_free <= 3) + virtnet_cvq_response(vi, true, true); + ret = virtqueue_add_sgs(vi->cvq, sgs, out_num, 1, vi, GFP_ATOMIC); if (ret < 0) { dev_warn(&vi->vdev->dev, @@ -2565,11 +2644,7 @@ static bool virtnet_send_command(struct virtnet_info *vi, u8 class, u8 cmd, /* Spin for a response, the kick causes an ioport write, trapping * into the hypervisor, so the request should be handled immediately. 
*/ - while (!virtqueue_get_buf(vi->cvq, &tmp) && - !virtqueue_is_broken(vi->cvq)) { - cond_resched(); - cpu_relax(); - } + virtnet_cvq_response(vi, true, false); return vi->ctrl->status == VIRTIO_NET_OK; } @@ -2721,6 +2796,7 @@ static int virtnet_close(struct net_device *dev) cancel_work_sync(&vi->rq[i].dim.work); } + cancel_delayed_work_sync(&vi->get_cvq); return 0; } @@ -3553,48 +3629,148 @@ static int virtnet_send_notf_coal_vq_cmds(struct virtnet_info *vi, return 0; } +static bool virtnet_add_dim_command(struct virtnet_info *vi, + struct virtnet_coal_node *ctrl) +{ + struct scatterlist *sgs[4], hdr, stat, out; + unsigned out_num = 0; + int ret; + + BUG_ON(!virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ)); + + ctrl->hdr.class = VIRTIO_NET_CTRL_NOTF_COAL; + ctrl->hdr.cmd = VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET; + + sg_init_one(&hdr, &ctrl->hdr, sizeof(ctrl->hdr)); + sgs[out_num++] = &hdr; + + sg_init_one(&out, &ctrl->coal_vqs, sizeof(ctrl->coal_vqs)); + sgs[out_num++] = &out; + + ctrl->status = VIRTIO_NET_OK; + sg_init_one(&stat, &ctrl->status, sizeof(ctrl->status)); + sgs[out_num] = &stat; + + BUG_ON(out_num + 1 > ARRAY_SIZE(sgs)); + ret = virtqueue_add_sgs(vi->cvq, sgs, out_num, 1, ctrl, GFP_ATOMIC); + if (ret < 0) { + dev_warn(&vi->vdev->dev, "Failed to add sgs for command vq: %d\n.", ret); + return false; + } + + virtqueue_kick(vi->cvq); + + vi->cvq_cmd_nums++; + + return true; +} + +static void virtnet_get_cvq_work(struct work_struct *work) +{ + struct virtnet_info *vi = + container_of(work, struct virtnet_info, get_cvq.work); + + if (!rtnl_trylock()) { + schedule_delayed_work(&vi->get_cvq, 1); + return; + } + + if (!vi->cvq_cmd_nums) + goto ret; + + virtnet_cvq_response(vi, false, false); + + if (vi->cvq_cmd_nums) + schedule_delayed_work(&vi->get_cvq, 1); + +ret: + rtnl_unlock(); +} + +static int virtnet_config_dim(struct virtnet_info *vi, struct receive_queue *rq, + struct dim *dim) +{ + struct virtnet_coal_node *avail_coal; + struct dim_cq_moder 
update_moder; + int qnum = rq - vi->rq; + + update_moder = net_dim_get_rx_moderation(dim->mode, dim->profile_ix); + if (update_moder.usec != rq->intr_coal.max_usecs || + update_moder.pkts != rq->intr_coal.max_packets) { + avail_coal = list_first_entry(&vi->coal_list, + struct virtnet_coal_node, list); + avail_coal->coal_vqs.vqn = cpu_to_le16(rxq2vq(qnum)); + avail_coal->coal_vqs.coal.max_usecs = cpu_to_le32(update_moder.usec); + avail_coal->coal_vqs.coal.max_packets = cpu_to_le32(update_moder.pkts); + list_del(&avail_coal->list); + if (!virtnet_add_dim_command(vi, avail_coal)) + return -EINVAL; + + rq->intr_coal.max_usecs = update_moder.usec; + rq->intr_coal.max_packets = update_moder.pkts; + } else if (dim->state == DIM_APPLY_NEW_PROFILE) { + dim->state = DIM_START_MEASURE; + } + + return 0; +} + static void virtnet_rx_dim_work(struct work_struct *work) { struct dim *dim = container_of(work, struct dim, work); - struct receive_queue *rq = container_of(dim, + struct receive_queue *rq, *rq_ = container_of(dim, struct receive_queue, dim); - struct virtnet_info *vi = rq->vq->vdev->priv; - struct net_device *dev = vi->dev; - struct dim_cq_moder update_moder; - int i, qnum, err; + struct virtnet_info *vi = rq_->vq->vdev->priv; + int i = 0, err; if (!rtnl_trylock()) { schedule_work(&dim->work); return; } + if (list_empty(&vi->coal_list) || vi->cvq->num_free <= 3) + virtnet_cvq_response(vi, true, true); + + /* The request scheduling the worker must be processed first + * to avoid not having enough descs for ctrlq, causing the + * request to fail, and the parameters of the queue will never + * be updated again in the future. + */ + err = virtnet_config_dim(vi, rq_, dim); + if (err) + goto ret; + /* Each rxq's work is queued by "net_dim()->schedule_work()" * in response to NAPI traffic changes. Note that dim->profile_ix * for each rxq is updated prior to the queuing action. 
* So we only need to traverse and update profiles for all rxqs * in the work which is holding rtnl_lock. */ - for (i = 0; i < vi->curr_queue_pairs; i++) { + for (i = vi->dim_loop_index; i < vi->curr_queue_pairs; i++) { rq = &vi->rq[i]; dim = &rq->dim; - qnum = rq - vi->rq; - if (!rq->dim_enabled) + if (list_empty(&vi->coal_list) || vi->cvq->num_free <= 3) + break; + + if (!rq->dim_enabled || rq == rq_) continue; - update_moder = net_dim_get_rx_moderation(dim->mode, dim->profile_ix); - if (update_moder.usec != rq->intr_coal.max_usecs || - update_moder.pkts != rq->intr_coal.max_packets) { - err = virtnet_send_rx_ctrl_coal_vq_cmd(vi, qnum, - update_moder.usec, - update_moder.pkts); - if (err) - pr_debug("%s: Failed to send dim parameters on rxq%d\n", - dev->name, qnum); - dim->state = DIM_START_MEASURE; - } + err = virtnet_config_dim(vi, rq, dim); + if (err) + goto ret; + } + if (vi->cvq_cmd_nums) + schedule_delayed_work(&vi->get_cvq, 1); + +ret: + if (i == vi->curr_queue_pairs) + vi->dim_loop_index = 0; + else + vi->dim_loop_index = i; + rtnl_unlock(); } @@ -4439,6 +4615,7 @@ static int virtnet_alloc_queues(struct virtnet_info *vi) goto err_rq; INIT_DELAYED_WORK(&vi->refill, refill_work); + INIT_DELAYED_WORK(&vi->get_cvq, virtnet_get_cvq_work); for (i = 0; i < vi->max_queue_pairs; i++) { vi->rq[i].pages = NULL; netif_napi_add_weight(vi->dev, &vi->rq[i].napi, virtnet_poll, @@ -4623,6 +4800,35 @@ static void virtnet_set_big_packets(struct virtnet_info *vi, const int mtu) } } +static void virtnet_del_coal_list(struct virtnet_info *vi) +{ + struct virtnet_coal_node *coal_node, *tmp; + + list_for_each_entry_safe(coal_node, tmp, &vi->coal_list, list) { + list_del(&coal_node->list); + kfree(coal_node); + } +} + +static int virtnet_init_coal_list(struct virtnet_info *vi) +{ + struct virtnet_coal_node *coal_node; + int i; + + vi->batch_dim_nums = min((unsigned int)vi->max_queue_pairs, + virtqueue_get_vring_size(vi->cvq) / 3); + for (i = 0; i < vi->batch_dim_nums; i++) { + 
coal_node = kmalloc(sizeof(*coal_node), GFP_KERNEL); + if (!coal_node) { + virtnet_del_coal_list(vi); + return -ENOMEM; + } + list_add(&coal_node->list, &vi->coal_list); + } + + return 0; +} + static int virtnet_probe(struct virtio_device *vdev) { int i, err = -ENOMEM; @@ -4816,11 +5022,20 @@ static int virtnet_probe(struct virtio_device *vdev) vi->intr_coal_tx.max_packets = 0; } + INIT_LIST_HEAD(&vi->coal_list); if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_VQ_NOTF_COAL)) { + vi->cvq_cmd_nums = 0; + vi->dim_loop_index = 0; + + if (virtnet_init_coal_list(vi)) + goto free; + /* The reason is the same as VIRTIO_NET_F_NOTF_COAL. */ - for (i = 0; i < vi->max_queue_pairs; i++) + for (i = 0; i < vi->max_queue_pairs; i++) { + vi->rq[i].packets_in_napi = 0; if (vi->sq[i].napi.weight) vi->sq[i].intr_coal.max_packets = 1; + } } #ifdef CONFIG_SYSFS @@ -4949,6 +5164,8 @@ static void virtnet_remove(struct virtio_device *vdev) net_failover_destroy(vi->failover); + virtnet_del_coal_list(vi); + remove_vq_common(vi); free_netdev(vi->dev); -- 1.8.3.1 ^ permalink raw reply related [flat|nested] 23+ messages in thread
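The `vi->cvq->num_free <= 3` guard in the patch reflects that each control command occupies three descriptors (header, payload, status) until the device completes it. The accounting can be sketched with a small userspace model (the struct and functions are hypothetical, not the driver's vring code):

```c
#include <assert.h>

#define DESCS_PER_CMD 3  /* header + payload + status */

/* Hypothetical model of a control virtqueue's descriptor budget. */
struct cvq_model {
    int num_free;   /* free descriptors in the ring */
    int in_flight;  /* async commands the device has not completed */
};

/* Reap one completed command, returning its descriptors. */
static void cvq_reap_one(struct cvq_model *cvq)
{
    if (cvq->in_flight) {
        cvq->in_flight--;
        cvq->num_free += DESCS_PER_CMD;
    }
}

/* Submit a command, draining a completion first when the ring is tight,
 * mirroring the patch's "if (num_free <= 3) reap" guard. */
static int cvq_submit(struct cvq_model *cvq)
{
    if (cvq->num_free <= DESCS_PER_CMD)
        cvq_reap_one(cvq);
    if (cvq->num_free < DESCS_PER_CMD)
        return -1;  /* would fail like virtqueue_add_sgs() */
    cvq->num_free -= DESCS_PER_CMD;
    cvq->in_flight++;
    return 0;
}
```

With the guard, a submitter on a small ring first harvests an outstanding dim command's descriptors instead of failing the add, which is the scenario the comment in `virtnet_send_command()` warns about.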
* Re: [PATCH 2/2] virtio-net: reduce the CPU consumption of dim worker 2024-03-21 11:45 ` [PATCH 2/2] virtio-net: reduce the CPU consumption of dim worker Heng Qi @ 2024-03-22 2:03 ` kernel test robot 2024-03-22 5:19 ` Jason Wang 2024-03-22 6:50 ` Dan Carpenter 2 siblings, 0 replies; 23+ messages in thread From: kernel test robot @ 2024-03-22 2:03 UTC (permalink / raw) To: Heng Qi, netdev, virtualization, Jason Wang, Michael S. Tsirkin, Jakub Kicinski, Paolo Abeni, Eric Dumazet, David S. Miller, Xuan Zhuo Cc: oe-kbuild-all Hi Heng, kernel test robot noticed the following build warnings: [auto build test WARNING on linus/master] [also build test WARNING on next-20240321] [cannot apply to v6.8] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch#_base_tree_information] url: https://github.com/intel-lab-lkp/linux/commits/Heng-Qi/virtio-net-fix-possible-dim-status-unrecoverable/20240321-194759 base: linus/master patch link: https://lore.kernel.org/r/1711021557-58116-3-git-send-email-hengqi%40linux.alibaba.com patch subject: [PATCH 2/2] virtio-net: reduce the CPU consumption of dim worker config: x86_64-defconfig (https://download.01.org/0day-ci/archive/20240322/202403220916.cSUxehuW-lkp@intel.com/config) compiler: gcc-11 (Debian 11.3.0-12) 11.3.0 reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240322/202403220916.cSUxehuW-lkp@intel.com/reproduce) If you fix the issue in a separate patch/commit (i.e. 
not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <lkp@intel.com> | Closes: https://lore.kernel.org/oe-kbuild-all/202403220916.cSUxehuW-lkp@intel.com/ All warnings (new ones prefixed by >>): >> drivers/net/virtio_net.c:2564: warning: Function parameter or struct member 'vi' not described in 'virtnet_cvq_response' vim +2564 drivers/net/virtio_net.c 2552 2553 /** 2554 * virtnet_cvq_response - get the response for filled ctrlq requests 2555 * @poll: keep polling ctrlq when a NULL buffer is obtained. 2556 * @dim_oneshot: process a dim cmd then exit, excluding user commands. 2557 * 2558 * Note that user commands must be processed synchronously 2559 * (poll = true, dim_oneshot = false). 2560 */ 2561 static void virtnet_cvq_response(struct virtnet_info *vi, 2562 bool poll, 2563 bool dim_oneshot) > 2564 { 2565 unsigned tmp; 2566 void *res; 2567 2568 while (true) { 2569 res = virtqueue_get_buf(vi->cvq, &tmp); 2570 if (virtqueue_is_broken(vi->cvq)) { 2571 dev_warn(&vi->dev->dev, "Control vq is broken.\n"); 2572 return; 2573 } 2574 2575 if (!res) { 2576 if (!poll) 2577 return; 2578 2579 cond_resched(); 2580 cpu_relax(); 2581 continue; 2582 } 2583 2584 /* this does not occur inside the process of waiting dim */ 2585 if (res == ((void *)vi)) 2586 return; 2587 2588 virtnet_process_dim_cmd(vi, res); 2589 /* When it is a user command, we must wait until the 2590 * processing result is processed synchronously. 2591 */ 2592 if (dim_oneshot) 2593 return; 2594 } 2595 } 2596 -- 0-DAY CI Kernel Test Service https://github.com/intel/lkp-tests/wiki ^ permalink raw reply [flat|nested] 23+ messages in thread
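The robot's warning is about kernel-doc completeness: a `/** ... */` comment must carry one `@name:` line per parameter, and `@vi` is missing. A sketch of the corrected header follows; the function body is a hypothetical no-op stub for illustration, not the driver code:

```c
#include <stdbool.h>

struct virtnet_info;  /* opaque for this sketch */

/**
 * virtnet_cvq_response - get the response for filled ctrlq requests
 * @vi: virtnet device state that owns the control virtqueue
 * @poll: keep polling ctrlq when a NULL buffer is obtained.
 * @dim_oneshot: process a dim cmd then exit, excluding user commands.
 *
 * Note that user commands must be processed synchronously
 * (poll = true, dim_oneshot = false).
 */
static void virtnet_cvq_response(struct virtnet_info *vi, bool poll,
                                 bool dim_oneshot)
{
    (void)vi; (void)poll; (void)dim_oneshot;  /* stub body */
}
```

Adding the `@vi:` line is all the W=1 build needs to stop flagging the function.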
* Re: [PATCH 2/2] virtio-net: reduce the CPU consumption of dim worker 2024-03-21 11:45 ` [PATCH 2/2] virtio-net: reduce the CPU consumption of dim worker Heng Qi 2024-03-22 2:03 ` kernel test robot @ 2024-03-22 5:19 ` Jason Wang 2024-03-25 2:21 ` Heng Qi 2024-03-22 6:50 ` Dan Carpenter 2 siblings, 1 reply; 23+ messages in thread From: Jason Wang @ 2024-03-22 5:19 UTC (permalink / raw) To: Heng Qi Cc: netdev, virtualization, Michael S. Tsirkin, Jakub Kicinski, Paolo Abeni, Eric Dumazet, David S. Miller, Xuan Zhuo On Thu, Mar 21, 2024 at 7:46 PM Heng Qi <hengqi@linux.alibaba.com> wrote: > > Currently, ctrlq processes commands in a synchronous manner, > which increases the delay of dim commands when configuring > multi-queue VMs, which in turn causes the CPU utilization to > increase and interferes with the performance of dim. > > Therefore we asynchronously process ctlq's dim commands. > > Signed-off-by: Heng Qi <hengqi@linux.alibaba.com> I may miss some previous discussions. But at least the changelog needs to explain why you don't use interrupt. Thanks ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 2/2] virtio-net: reduce the CPU consumption of dim worker 2024-03-22 5:19 ` Jason Wang @ 2024-03-25 2:21 ` Heng Qi 2024-03-25 5:57 ` Jason Wang 0 siblings, 1 reply; 23+ messages in thread From: Heng Qi @ 2024-03-25 2:21 UTC (permalink / raw) To: Jason Wang Cc: netdev, virtualization, Michael S. Tsirkin, Jakub Kicinski, Paolo Abeni, Eric Dumazet, David S. Miller, Xuan Zhuo 在 2024/3/22 下午1:19, Jason Wang 写道: > On Thu, Mar 21, 2024 at 7:46 PM Heng Qi <hengqi@linux.alibaba.com> wrote: >> Currently, ctrlq processes commands in a synchronous manner, >> which increases the delay of dim commands when configuring >> multi-queue VMs, which in turn causes the CPU utilization to >> increase and interferes with the performance of dim. >> >> Therefore we asynchronously process ctlq's dim commands. >> >> Signed-off-by: Heng Qi <hengqi@linux.alibaba.com> > I may miss some previous discussions. > > But at least the changelog needs to explain why you don't use interrupt. Will add, but reply here first. When upgrading the driver's ctrlq to use interrupt, problems may occur with some existing devices. For example, when existing devices are replaced with new drivers, they may not work. Or, if the guest OS supported by the new device is replaced by an old downstream OS product, it will not be usable. Although, ctrlq has the same capabilities as IOq in the virtio spec, this does have historical baggage. Thanks, Heng > > Thanks ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 2/2] virtio-net: reduce the CPU consumption of dim worker
  2024-03-25  2:21 ` Heng Qi
@ 2024-03-25  5:57   ` Jason Wang
  2024-03-25  7:17     ` Heng Qi
  0 siblings, 1 reply; 23+ messages in thread
From: Jason Wang @ 2024-03-25 5:57 UTC (permalink / raw)
To: Heng Qi
Cc: netdev, virtualization, Michael S. Tsirkin, Jakub Kicinski,
    Paolo Abeni, Eric Dumazet, David S. Miller, Xuan Zhuo

On Mon, Mar 25, 2024 at 10:21 AM Heng Qi <hengqi@linux.alibaba.com> wrote:
>
>
> On 2024/3/22 13:19, Jason Wang wrote:
> > On Thu, Mar 21, 2024 at 7:46 PM Heng Qi <hengqi@linux.alibaba.com> wrote:
> >> Currently, ctrlq processes commands in a synchronous manner,
> >> which increases the delay of dim commands when configuring
> >> multi-queue VMs, which in turn causes the CPU utilization to
> >> increase and interferes with the performance of dim.
> >>
> >> Therefore we asynchronously process ctrlq's dim commands.
> >>
> >> Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
> > I may miss some previous discussions.
> >
> > But at least the changelog needs to explain why you don't use interrupt.
>
> Will add, but reply here first.
>
> When upgrading the driver's ctrlq to use interrupt, problems may occur
> with some existing devices.
> For example, when existing devices are replaced with new drivers, they
> may not work.
> Or, if the guest OS supported by the new device is replaced by an old
> downstream OS product, it will not be usable.
>
> Although ctrlq has the same capabilities as IOq in the virtio spec,
> this does have historical baggage.

I don't think the upstream Linux drivers need to workaround buggy
devices. Or it is a good excuse to block configure interrupts.

And I remember you told us your device doesn't have such an issue.

Thanks

> Thanks,
> Heng
>
> >
> > Thanks

^ permalink raw reply	[flat|nested] 23+ messages in thread
* Re: [PATCH 2/2] virtio-net: reduce the CPU consumption of dim worker
  2024-03-25  5:57 ` Jason Wang
@ 2024-03-25  7:17   ` Heng Qi
  2024-03-25  7:56     ` Jason Wang
  0 siblings, 1 reply; 23+ messages in thread
From: Heng Qi @ 2024-03-25 7:17 UTC (permalink / raw)
To: Jason Wang
Cc: netdev, virtualization, Michael S. Tsirkin, Jakub Kicinski,
    Paolo Abeni, Eric Dumazet, David S. Miller, Xuan Zhuo

On 2024/3/25 13:57, Jason Wang wrote:
> On Mon, Mar 25, 2024 at 10:21 AM Heng Qi <hengqi@linux.alibaba.com> wrote:
>>
>>
>> On 2024/3/22 13:19, Jason Wang wrote:
>>> On Thu, Mar 21, 2024 at 7:46 PM Heng Qi <hengqi@linux.alibaba.com> wrote:
>>>> Currently, ctrlq processes commands in a synchronous manner,
>>>> which increases the delay of dim commands when configuring
>>>> multi-queue VMs, which in turn causes the CPU utilization to
>>>> increase and interferes with the performance of dim.
>>>>
>>>> Therefore we asynchronously process ctrlq's dim commands.
>>>>
>>>> Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
>>> I may miss some previous discussions.
>>>
>>> But at least the changelog needs to explain why you don't use interrupt.
>> Will add, but reply here first.
>>
>> When upgrading the driver's ctrlq to use interrupt, problems may occur
>> with some existing devices.
>> For example, when existing devices are replaced with new drivers, they
>> may not work.
>> Or, if the guest OS supported by the new device is replaced by an old
>> downstream OS product, it will not be usable.
>>
>> Although ctrlq has the same capabilities as IOq in the virtio spec,
>> this does have historical baggage.
> I don't think the upstream Linux drivers need to workaround buggy
> devices. Or it is a good excuse to block configure interrupts.

Of course I agree. Our DPU devices support ctrlq irq natively, as long
as the guest OS opens irq to ctrlq.

If other products have no problem with this, I would prefer to use irq
to solve this problem, which is the most essential solution.

>
> And I remember you told us your device doesn't have such an issue.

YES.

Thanks,
Heng

>
> Thanks
>
>> Thanks,
>> Heng
>>
>>> Thanks

^ permalink raw reply	[flat|nested] 23+ messages in thread
* Re: [PATCH 2/2] virtio-net: reduce the CPU consumption of dim worker
  2024-03-25  7:17 ` Heng Qi
@ 2024-03-25  7:56   ` Jason Wang
  2024-03-25  8:22     ` Heng Qi
  0 siblings, 1 reply; 23+ messages in thread
From: Jason Wang @ 2024-03-25 7:56 UTC (permalink / raw)
To: Heng Qi
Cc: netdev, virtualization, Michael S. Tsirkin, Jakub Kicinski,
    Paolo Abeni, Eric Dumazet, David S. Miller, Xuan Zhuo

On Mon, Mar 25, 2024 at 3:18 PM Heng Qi <hengqi@linux.alibaba.com> wrote:
>
>
> On 2024/3/25 13:57, Jason Wang wrote:
> > On Mon, Mar 25, 2024 at 10:21 AM Heng Qi <hengqi@linux.alibaba.com> wrote:
> >>
> >>
> >> On 2024/3/22 13:19, Jason Wang wrote:
> >>> On Thu, Mar 21, 2024 at 7:46 PM Heng Qi <hengqi@linux.alibaba.com> wrote:
> >>>> Currently, ctrlq processes commands in a synchronous manner,
> >>>> which increases the delay of dim commands when configuring
> >>>> multi-queue VMs, which in turn causes the CPU utilization to
> >>>> increase and interferes with the performance of dim.
> >>>>
> >>>> Therefore we asynchronously process ctrlq's dim commands.
> >>>>
> >>>> Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
> >>> I may miss some previous discussions.
> >>>
> >>> But at least the changelog needs to explain why you don't use interrupt.
> >> Will add, but reply here first.
> >>
> >> When upgrading the driver's ctrlq to use interrupt, problems may occur
> >> with some existing devices.
> >> For example, when existing devices are replaced with new drivers, they
> >> may not work.
> >> Or, if the guest OS supported by the new device is replaced by an old
> >> downstream OS product, it will not be usable.
> >>
> >> Although ctrlq has the same capabilities as IOq in the virtio spec,
> >> this does have historical baggage.
> > I don't think the upstream Linux drivers need to workaround buggy
> > devices. Or it is a good excuse to block configure interrupts.
>
> Of course I agree. Our DPU devices support ctrlq irq natively, as long
> as the guest OS opens irq to ctrlq.
>
> If other products have no problem with this, I would prefer to use irq
> to solve this problem, which is the most essential solution.

Let's do that.

Thanks

>
> >
> > And I remember you told us your device doesn't have such an issue.
>
> YES.
>
> Thanks,
> Heng
>
> >
> > Thanks
> >
> >> Thanks,
> >> Heng
> >>
> >>> Thanks
>

^ permalink raw reply	[flat|nested] 23+ messages in thread
* Re: [PATCH 2/2] virtio-net: reduce the CPU consumption of dim worker
  2024-03-25  7:56 ` Jason Wang
@ 2024-03-25  8:22   ` Heng Qi
  2024-03-25  8:42     ` Jason Wang
  0 siblings, 1 reply; 23+ messages in thread
From: Heng Qi @ 2024-03-25 8:22 UTC (permalink / raw)
To: Jason Wang
Cc: netdev, virtualization, Michael S. Tsirkin, Jakub Kicinski,
    Paolo Abeni, Eric Dumazet, David S. Miller, Xuan Zhuo

On 2024/3/25 15:56, Jason Wang wrote:
> On Mon, Mar 25, 2024 at 3:18 PM Heng Qi <hengqi@linux.alibaba.com> wrote:
>>
>>
>> On 2024/3/25 13:57, Jason Wang wrote:
>>> On Mon, Mar 25, 2024 at 10:21 AM Heng Qi <hengqi@linux.alibaba.com> wrote:
>>>>
>>>> On 2024/3/22 13:19, Jason Wang wrote:
>>>>> On Thu, Mar 21, 2024 at 7:46 PM Heng Qi <hengqi@linux.alibaba.com> wrote:
>>>>>> Currently, ctrlq processes commands in a synchronous manner,
>>>>>> which increases the delay of dim commands when configuring
>>>>>> multi-queue VMs, which in turn causes the CPU utilization to
>>>>>> increase and interferes with the performance of dim.
>>>>>>
>>>>>> Therefore we asynchronously process ctrlq's dim commands.
>>>>>>
>>>>>> Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
>>>>> I may miss some previous discussions.
>>>>>
>>>>> But at least the changelog needs to explain why you don't use interrupt.
>>>> Will add, but reply here first.
>>>>
>>>> When upgrading the driver's ctrlq to use interrupt, problems may occur
>>>> with some existing devices.
>>>> For example, when existing devices are replaced with new drivers, they
>>>> may not work.
>>>> Or, if the guest OS supported by the new device is replaced by an old
>>>> downstream OS product, it will not be usable.
>>>>
>>>> Although ctrlq has the same capabilities as IOq in the virtio spec,
>>>> this does have historical baggage.
>>> I don't think the upstream Linux drivers need to workaround buggy
>>> devices. Or it is a good excuse to block configure interrupts.
>> Of course I agree. Our DPU devices support ctrlq irq natively, as long
>> as the guest OS opens irq to ctrlq.
>>
>> If other products have no problem with this, I would prefer to use irq
>> to solve this problem, which is the most essential solution.
> Let's do that.

Ok, will do.

Do you have the link to the patch where you previously modified the
control queue for interrupt notifications?
I think a new patch could be made on top of it, but I can't seem to find it.

Thanks,
Heng

>
> Thanks
>
>>> And I remember you told us your device doesn't have such an issue.
>> YES.
>>
>> Thanks,
>> Heng
>>
>>> Thanks
>>>
>>>> Thanks,
>>>> Heng
>>>>
>>>>> Thanks

^ permalink raw reply	[flat|nested] 23+ messages in thread
* Re: [PATCH 2/2] virtio-net: reduce the CPU consumption of dim worker
  2024-03-25  8:22 ` Heng Qi
@ 2024-03-25  8:42   ` Jason Wang
  2024-03-26  2:46     ` Heng Qi
  0 siblings, 1 reply; 23+ messages in thread
From: Jason Wang @ 2024-03-25 8:42 UTC (permalink / raw)
To: Heng Qi
Cc: netdev, virtualization, Michael S. Tsirkin, Jakub Kicinski,
    Paolo Abeni, Eric Dumazet, David S. Miller, Xuan Zhuo

On Mon, Mar 25, 2024 at 4:22 PM Heng Qi <hengqi@linux.alibaba.com> wrote:
>
>
> On 2024/3/25 15:56, Jason Wang wrote:
> > On Mon, Mar 25, 2024 at 3:18 PM Heng Qi <hengqi@linux.alibaba.com> wrote:
> >>
> >>
> >> On 2024/3/25 13:57, Jason Wang wrote:
> >>> On Mon, Mar 25, 2024 at 10:21 AM Heng Qi <hengqi@linux.alibaba.com> wrote:
> >>>>
> >>>> On 2024/3/22 13:19, Jason Wang wrote:
> >>>>> On Thu, Mar 21, 2024 at 7:46 PM Heng Qi <hengqi@linux.alibaba.com> wrote:
> >>>>>> Currently, ctrlq processes commands in a synchronous manner,
> >>>>>> which increases the delay of dim commands when configuring
> >>>>>> multi-queue VMs, which in turn causes the CPU utilization to
> >>>>>> increase and interferes with the performance of dim.
> >>>>>>
> >>>>>> Therefore we asynchronously process ctrlq's dim commands.
> >>>>>>
> >>>>>> Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
> >>>>> I may miss some previous discussions.
> >>>>>
> >>>>> But at least the changelog needs to explain why you don't use interrupt.
> >>>> Will add, but reply here first.
> >>>>
> >>>> When upgrading the driver's ctrlq to use interrupt, problems may occur
> >>>> with some existing devices.
> >>>> For example, when existing devices are replaced with new drivers, they
> >>>> may not work.
> >>>> Or, if the guest OS supported by the new device is replaced by an old
> >>>> downstream OS product, it will not be usable.
> >>>>
> >>>> Although ctrlq has the same capabilities as IOq in the virtio spec,
> >>>> this does have historical baggage.
> >>> I don't think the upstream Linux drivers need to workaround buggy
> >>> devices. Or it is a good excuse to block configure interrupts.
> >> Of course I agree. Our DPU devices support ctrlq irq natively, as long
> >> as the guest OS opens irq to ctrlq.
> >>
> >> If other products have no problem with this, I would prefer to use irq
> >> to solve this problem, which is the most essential solution.
> > Let's do that.
>
> Ok, will do.
>
> Do you have the link to the patch where you previously modified the
> control queue for interrupt notifications?
> I think a new patch could be made on top of it, but I can't seem to find it.

Something like this?

https://lore.kernel.org/lkml/6026e801-6fda-fee9-a69b-d06a80368621@redhat.com/t/

Note that

1) some patch has been merged
2) we probably need to drop the timeout logic as it's another topic
3) need to address other comments

Thanks

> Thanks,
> Heng
>
> >
> > Thanks
> >
> >>> And I remember you told us your device doesn't have such an issue.
> >> YES.
> >>
> >> Thanks,
> >> Heng
> >>
> >>> Thanks
> >>>
> >>>> Thanks,
> >>>> Heng
> >>>>
> >>>>> Thanks
>

^ permalink raw reply	[flat|nested] 23+ messages in thread
* Re: [PATCH 2/2] virtio-net: reduce the CPU consumption of dim worker
  2024-03-25  8:42 ` Jason Wang
@ 2024-03-26  2:46   ` Heng Qi
  2024-03-26  4:08     ` Jason Wang
  0 siblings, 1 reply; 23+ messages in thread
From: Heng Qi @ 2024-03-26 2:46 UTC (permalink / raw)
To: Jason Wang
Cc: netdev, virtualization, Michael S. Tsirkin, Jakub Kicinski,
    Paolo Abeni, Eric Dumazet, David S. Miller, Xuan Zhuo

On 2024/3/25 16:42, Jason Wang wrote:
> On Mon, Mar 25, 2024 at 4:22 PM Heng Qi <hengqi@linux.alibaba.com> wrote:
>>
>>
>> On 2024/3/25 15:56, Jason Wang wrote:
>>> On Mon, Mar 25, 2024 at 3:18 PM Heng Qi <hengqi@linux.alibaba.com> wrote:
>>>>
>>>> On 2024/3/25 13:57, Jason Wang wrote:
>>>>> On Mon, Mar 25, 2024 at 10:21 AM Heng Qi <hengqi@linux.alibaba.com> wrote:
>>>>>> On 2024/3/22 13:19, Jason Wang wrote:
>>>>>>> On Thu, Mar 21, 2024 at 7:46 PM Heng Qi <hengqi@linux.alibaba.com> wrote:
>>>>>>>> Currently, ctrlq processes commands in a synchronous manner,
>>>>>>>> which increases the delay of dim commands when configuring
>>>>>>>> multi-queue VMs, which in turn causes the CPU utilization to
>>>>>>>> increase and interferes with the performance of dim.
>>>>>>>>
>>>>>>>> Therefore we asynchronously process ctrlq's dim commands.
>>>>>>>>
>>>>>>>> Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
>>>>>>> I may miss some previous discussions.
>>>>>>>
>>>>>>> But at least the changelog needs to explain why you don't use interrupt.
>>>>>> Will add, but reply here first.
>>>>>>
>>>>>> When upgrading the driver's ctrlq to use interrupt, problems may occur
>>>>>> with some existing devices.
>>>>>> For example, when existing devices are replaced with new drivers, they
>>>>>> may not work.
>>>>>> Or, if the guest OS supported by the new device is replaced by an old
>>>>>> downstream OS product, it will not be usable.
>>>>>>
>>>>>> Although ctrlq has the same capabilities as IOq in the virtio spec,
>>>>>> this does have historical baggage.
>>>>> I don't think the upstream Linux drivers need to workaround buggy
>>>>> devices. Or it is a good excuse to block configure interrupts.
>>>> Of course I agree. Our DPU devices support ctrlq irq natively, as long
>>>> as the guest OS opens irq to ctrlq.
>>>>
>>>> If other products have no problem with this, I would prefer to use irq
>>>> to solve this problem, which is the most essential solution.
>>> Let's do that.
>> Ok, will do.
>>
>> Do you have the link to the patch where you previously modified the
>> control queue for interrupt notifications?
>> I think a new patch could be made on top of it, but I can't seem to find it.
> Something like this?

YES. Thanks Jason.

>
> https://lore.kernel.org/lkml/6026e801-6fda-fee9-a69b-d06a80368621@redhat.com/t/
>
> Note that
>
> 1) some patch has been merged
> 2) we probably need to drop the timeout logic as it's another topic
> 3) need to address other comments

I did a quick read of your patch sets from the previous 5 versions:
[1] https://lore.kernel.org/lkml/6026e801-6fda-fee9-a69b-d06a80368621@redhat.com/t/
[2] https://lore.kernel.org/all/20221226074908.8154-1-jasowang@redhat.com/
[3] https://lore.kernel.org/all/20230413064027.13267-1-jasowang@redhat.com/
[4] https://lore.kernel.org/all/20230524081842.3060-1-jasowang@redhat.com/
[5] https://lore.kernel.org/all/20230720083839.481487-1-jasowang@redhat.com/

Regarding adding the interrupt to ctrlq, there are a few points where
there is no agreement, which I summarize below.

1. Require additional interrupt vector resource
https://lore.kernel.org/all/20230516165043-mutt-send-email-mst@kernel.org/

2. Adding the interrupt for ctrlq may break some devices
https://lore.kernel.org/all/f9e75ce5-e6df-d1be-201b-7d0f18c1b6e7@redhat.com/

3. RTNL breaks surprise removal
https://lore.kernel.org/all/20230720170001-mutt-send-email-mst@kernel.org/

Regarding the above, there seems to be no conclusion yet.
If these problems still exist, I think this patch is good enough and we
can merge it first.

For the third point, it seems to be being solved by Daniel now [6], but
a spin lock is used, which I think conflicts with the way of adding
interrupts to ctrlq.

[6] https://lore.kernel.org/all/20240325214912.323749-1-danielj@nvidia.com/

Thanks,
Heng

>
> Thanks
>
>> Thanks,
>> Heng
>>
>>> Thanks
>>>
>>>>> And I remember you told us your device doesn't have such an issue.
>>>> YES.
>>>>
>>>> Thanks,
>>>> Heng
>>>>
>>>>> Thanks
>>>>>
>>>>>> Thanks,
>>>>>> Heng
>>>>>>
>>>>>>> Thanks

^ permalink raw reply	[flat|nested] 23+ messages in thread
* Re: [PATCH 2/2] virtio-net: reduce the CPU consumption of dim worker
  2024-03-26  2:46 ` Heng Qi
@ 2024-03-26  4:08   ` Jason Wang
  2024-03-26  5:57     ` Heng Qi
  0 siblings, 1 reply; 23+ messages in thread
From: Jason Wang @ 2024-03-26 4:08 UTC (permalink / raw)
To: Heng Qi
Cc: netdev, virtualization, Michael S. Tsirkin, Jakub Kicinski,
    Paolo Abeni, Eric Dumazet, David S. Miller, Xuan Zhuo

On Tue, Mar 26, 2024 at 10:46 AM Heng Qi <hengqi@linux.alibaba.com> wrote:
>
>
> On 2024/3/25 16:42, Jason Wang wrote:
> > On Mon, Mar 25, 2024 at 4:22 PM Heng Qi <hengqi@linux.alibaba.com> wrote:
> >>
> >>
> >> On 2024/3/25 15:56, Jason Wang wrote:
> >>> On Mon, Mar 25, 2024 at 3:18 PM Heng Qi <hengqi@linux.alibaba.com> wrote:
> >>>>
> >>>> On 2024/3/25 13:57, Jason Wang wrote:
> >>>>> On Mon, Mar 25, 2024 at 10:21 AM Heng Qi <hengqi@linux.alibaba.com> wrote:
> >>>>>> On 2024/3/22 13:19, Jason Wang wrote:
> >>>>>>> On Thu, Mar 21, 2024 at 7:46 PM Heng Qi <hengqi@linux.alibaba.com> wrote:
> >>>>>>>> Currently, ctrlq processes commands in a synchronous manner,
> >>>>>>>> which increases the delay of dim commands when configuring
> >>>>>>>> multi-queue VMs, which in turn causes the CPU utilization to
> >>>>>>>> increase and interferes with the performance of dim.
> >>>>>>>>
> >>>>>>>> Therefore we asynchronously process ctrlq's dim commands.
> >>>>>>>>
> >>>>>>>> Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
> >>>>>>> I may miss some previous discussions.
> >>>>>>>
> >>>>>>> But at least the changelog needs to explain why you don't use interrupt.
> >>>>>> Will add, but reply here first.
> >>>>>>
> >>>>>> When upgrading the driver's ctrlq to use interrupt, problems may occur
> >>>>>> with some existing devices.
> >>>>>> For example, when existing devices are replaced with new drivers, they
> >>>>>> may not work.
> >>>>>> Or, if the guest OS supported by the new device is replaced by an old
> >>>>>> downstream OS product, it will not be usable.
> >>>>>>
> >>>>>> Although ctrlq has the same capabilities as IOq in the virtio spec,
> >>>>>> this does have historical baggage.
> >>>>> I don't think the upstream Linux drivers need to workaround buggy
> >>>>> devices. Or it is a good excuse to block configure interrupts.
> >>>> Of course I agree. Our DPU devices support ctrlq irq natively, as long
> >>>> as the guest OS opens irq to ctrlq.
> >>>>
> >>>> If other products have no problem with this, I would prefer to use irq
> >>>> to solve this problem, which is the most essential solution.
> >>> Let's do that.
> >> Ok, will do.
> >>
> >> Do you have the link to the patch where you previously modified the
> >> control queue for interrupt notifications?
> >> I think a new patch could be made on top of it, but I can't seem to find it.
> > Something like this?
>
> YES. Thanks Jason.
>
> >
> > https://lore.kernel.org/lkml/6026e801-6fda-fee9-a69b-d06a80368621@redhat.com/t/
> >
> > Note that
> >
> > 1) some patch has been merged
> > 2) we probably need to drop the timeout logic as it's another topic
> > 3) need to address other comments
>
> I did a quick read of your patch sets from the previous 5 versions:
> [1] https://lore.kernel.org/lkml/6026e801-6fda-fee9-a69b-d06a80368621@redhat.com/t/
> [2] https://lore.kernel.org/all/20221226074908.8154-1-jasowang@redhat.com/
> [3] https://lore.kernel.org/all/20230413064027.13267-1-jasowang@redhat.com/
> [4] https://lore.kernel.org/all/20230524081842.3060-1-jasowang@redhat.com/
> [5] https://lore.kernel.org/all/20230720083839.481487-1-jasowang@redhat.com/
>
> Regarding adding the interrupt to ctrlq, there are a few points where
> there is no agreement, which I summarize below.
>
> 1. Require additional interrupt vector resource
> https://lore.kernel.org/all/20230516165043-mutt-send-email-mst@kernel.org/

I don't think one more vector is a big problem. Multiqueue will
require much more than this.

Even if it is, we can try to share an interrupt as Michael suggests.

Let's start from something that is simple, just one more vector.

> 2. Adding the interrupt for ctrlq may break some devices
> https://lore.kernel.org/all/f9e75ce5-e6df-d1be-201b-7d0f18c1b6e7@redhat.com/

These devices need to be fixed. It's hard to imagine the evolution of
virtio-net is blocked by buggy devices.

> 3. RTNL breaks surprise removal
> https://lore.kernel.org/all/20230720170001-mutt-send-email-mst@kernel.org/

The comment is for indefinite waiting for ctrl vq which turns out to
be another issue.

For the removal, we just need to do the wakeup then everything is fine.

>
> Regarding the above, there seems to be no conclusion yet.
> If these problems still exist, I think this patch is good enough and we
> can merge it first.

I don't think so, poll turns out to be problematic for a lot of cases.

>
> For the third point, it seems to be being solved by Daniel now [6], but
> a spin lock is used, which I think conflicts with the way of adding
> interrupts to ctrlq.
>
> [6] https://lore.kernel.org/all/20240325214912.323749-1-danielj@nvidia.com/

I don't see how it conflicts with this.

Thanks

>
> Thanks,
> Heng
>
> >
> > Thanks
> >
> >>> And I remember you told us your device doesn't have such an issue.
> >> YES.
> >>
> >> Thanks,
> >> Heng
> >>
> >>> Thanks

^ permalink raw reply	[flat|nested] 23+ messages in thread
* Re: [PATCH 2/2] virtio-net: reduce the CPU consumption of dim worker
  2024-03-26  4:08 ` Jason Wang
@ 2024-03-26  5:57   ` Heng Qi
  2024-03-26  6:05     ` Jason Wang
  0 siblings, 1 reply; 23+ messages in thread
From: Heng Qi @ 2024-03-26 5:57 UTC (permalink / raw)
To: Jason Wang
Cc: netdev, virtualization, Michael S. Tsirkin, Jakub Kicinski,
    Paolo Abeni, Eric Dumazet, David S. Miller, Xuan Zhuo

On 2024/3/26 12:08, Jason Wang wrote:
> On Tue, Mar 26, 2024 at 10:46 AM Heng Qi <hengqi@linux.alibaba.com> wrote:
>>
>>
>> On 2024/3/25 16:42, Jason Wang wrote:
>>> On Mon, Mar 25, 2024 at 4:22 PM Heng Qi <hengqi@linux.alibaba.com> wrote:
>>>>
>>>> On 2024/3/25 15:56, Jason Wang wrote:
>>>>> On Mon, Mar 25, 2024 at 3:18 PM Heng Qi <hengqi@linux.alibaba.com> wrote:
>>>>>> On 2024/3/25 13:57, Jason Wang wrote:
>>>>>>> On Mon, Mar 25, 2024 at 10:21 AM Heng Qi <hengqi@linux.alibaba.com> wrote:
>>>>>>>> On 2024/3/22 13:19, Jason Wang wrote:
>>>>>>>>> On Thu, Mar 21, 2024 at 7:46 PM Heng Qi <hengqi@linux.alibaba.com> wrote:
>>>>>>>>>> Currently, ctrlq processes commands in a synchronous manner,
>>>>>>>>>> which increases the delay of dim commands when configuring
>>>>>>>>>> multi-queue VMs, which in turn causes the CPU utilization to
>>>>>>>>>> increase and interferes with the performance of dim.
>>>>>>>>>>
>>>>>>>>>> Therefore we asynchronously process ctrlq's dim commands.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
>>>>>>>>> I may miss some previous discussions.
>>>>>>>>>
>>>>>>>>> But at least the changelog needs to explain why you don't use interrupt.
>>>>>>>> Will add, but reply here first.
>>>>>>>>
>>>>>>>> When upgrading the driver's ctrlq to use interrupt, problems may occur
>>>>>>>> with some existing devices.
>>>>>>>> For example, when existing devices are replaced with new drivers, they
>>>>>>>> may not work.
>>>>>>>> Or, if the guest OS supported by the new device is replaced by an old
>>>>>>>> downstream OS product, it will not be usable.
>>>>>>>>
>>>>>>>> Although ctrlq has the same capabilities as IOq in the virtio spec,
>>>>>>>> this does have historical baggage.
>>>>>>> I don't think the upstream Linux drivers need to workaround buggy
>>>>>>> devices. Or it is a good excuse to block configure interrupts.
>>>>>> Of course I agree. Our DPU devices support ctrlq irq natively, as long
>>>>>> as the guest OS opens irq to ctrlq.
>>>>>>
>>>>>> If other products have no problem with this, I would prefer to use irq
>>>>>> to solve this problem, which is the most essential solution.
>>>>> Let's do that.
>>>> Ok, will do.
>>>>
>>>> Do you have the link to the patch where you previously modified the
>>>> control queue for interrupt notifications?
>>>> I think a new patch could be made on top of it, but I can't seem to find it.
>>> Something like this?
>> YES. Thanks Jason.
>>
>>> https://lore.kernel.org/lkml/6026e801-6fda-fee9-a69b-d06a80368621@redhat.com/t/
>>>
>>> Note that
>>>
>>> 1) some patch has been merged
>>> 2) we probably need to drop the timeout logic as it's another topic
>>> 3) need to address other comments
>> I did a quick read of your patch sets from the previous 5 versions:
>> [1] https://lore.kernel.org/lkml/6026e801-6fda-fee9-a69b-d06a80368621@redhat.com/t/
>> [2] https://lore.kernel.org/all/20221226074908.8154-1-jasowang@redhat.com/
>> [3] https://lore.kernel.org/all/20230413064027.13267-1-jasowang@redhat.com/
>> [4] https://lore.kernel.org/all/20230524081842.3060-1-jasowang@redhat.com/
>> [5] https://lore.kernel.org/all/20230720083839.481487-1-jasowang@redhat.com/
>>
>> Regarding adding the interrupt to ctrlq, there are a few points where
>> there is no agreement, which I summarize below.
>>
>> 1. Require additional interrupt vector resource
>> https://lore.kernel.org/all/20230516165043-mutt-send-email-mst@kernel.org/
> I don't think one more vector is a big problem. Multiqueue will
> require much more than this.
>
> Even if it is, we can try to share an interrupt as Michael suggests.
>
> Let's start from something that is simple, just one more vector.

OK, that puts my concerns to rest.

>
>> 2. Adding the interrupt for ctrlq may break some devices
>> https://lore.kernel.org/all/f9e75ce5-e6df-d1be-201b-7d0f18c1b6e7@redhat.com/
> These devices need to be fixed. It's hard to imagine the evolution of
> virtio-net is blocked by buggy devices.

Agree.

>
>> 3. RTNL breaks surprise removal
>> https://lore.kernel.org/all/20230720170001-mutt-send-email-mst@kernel.org/
> The comment is for indefinite waiting for ctrl vq which turns out to
> be another issue.
>
> For the removal, we just need to do the wakeup then everything is fine.

Then I will make a patch set based on irq and without timeout.

>
>> Regarding the above, there seems to be no conclusion yet.
>> If these problems still exist, I think this patch is good enough and we
>> can merge it first.
> I don't think so, poll turns out to be problematic for a lot of cases.
>
>> For the third point, it seems to be being solved by Daniel now [6], but
>> a spin lock is used, which I think conflicts with the way of adding
>> interrupts to ctrlq.
>>
>> [6] https://lore.kernel.org/all/20240325214912.323749-1-danielj@nvidia.com/
> I don't see how it conflicts with this.

I'll just make changes on top of it. Can I?

Thanks,
Heng

>
> Thanks
>
>> Thanks,
>> Heng
>>
>>> Thanks
>>>
>>>>> And I remember you told us your device doesn't have such an issue.
>>>> YES.
>>>>
>>>> Thanks,
>>>> Heng
>>>>
>>>>> Thanks

^ permalink raw reply	[flat|nested] 23+ messages in thread
* Re: [PATCH 2/2] virtio-net: reduce the CPU consumption of dim worker 2024-03-26 5:57 ` Heng Qi @ 2024-03-26 6:05 ` Jason Wang 0 siblings, 0 replies; 23+ messages in thread From: Jason Wang @ 2024-03-26 6:05 UTC (permalink / raw) To: Heng Qi Cc: netdev, virtualization, Michael S. Tsirkin, Jakub Kicinski, Paolo Abeni, Eric Dumazet, David S. Miller, Xuan Zhuo On Tue, Mar 26, 2024 at 1:57 PM Heng Qi <hengqi@linux.alibaba.com> wrote: > > > > 在 2024/3/26 下午12:08, Jason Wang 写道: > > On Tue, Mar 26, 2024 at 10:46 AM Heng Qi <hengqi@linux.alibaba.com> wrote: > >> > >> > >> 在 2024/3/25 下午4:42, Jason Wang 写道: > >>> On Mon, Mar 25, 2024 at 4:22 PM Heng Qi <hengqi@linux.alibaba.com> wrote: > >>>> > >>>> 在 2024/3/25 下午3:56, Jason Wang 写道: > >>>>> On Mon, Mar 25, 2024 at 3:18 PM Heng Qi <hengqi@linux.alibaba.com> wrote: > >>>>>> 在 2024/3/25 下午1:57, Jason Wang 写道: > >>>>>>> On Mon, Mar 25, 2024 at 10:21 AM Heng Qi <hengqi@linux.alibaba.com> wrote: > >>>>>>>> 在 2024/3/22 下午1:19, Jason Wang 写道: > >>>>>>>>> On Thu, Mar 21, 2024 at 7:46 PM Heng Qi <hengqi@linux.alibaba.com> wrote: > >>>>>>>>>> Currently, ctrlq processes commands in a synchronous manner, > >>>>>>>>>> which increases the delay of dim commands when configuring > >>>>>>>>>> multi-queue VMs, which in turn causes the CPU utilization to > >>>>>>>>>> increase and interferes with the performance of dim. > >>>>>>>>>> > >>>>>>>>>> Therefore we asynchronously process ctlq's dim commands. > >>>>>>>>>> > >>>>>>>>>> Signed-off-by: Heng Qi <hengqi@linux.alibaba.com> > >>>>>>>>> I may miss some previous discussions. > >>>>>>>>> > >>>>>>>>> But at least the changelog needs to explain why you don't use interrupt. > >>>>>>>> Will add, but reply here first. > >>>>>>>> > >>>>>>>> When upgrading the driver's ctrlq to use interrupt, problems may occur > >>>>>>>> with some existing devices. > >>>>>>>> For example, when existing devices are replaced with new drivers, they > >>>>>>>> may not work. 
> >>>>>>>> Or, if the guest OS supported by the new device is replaced by an old
> >>>>>>>> downstream OS product, it will not be usable.
> >>>>>>>>
> >>>>>>>> Although ctrlq has the same capabilities as IOq in the virtio spec,
> >>>>>>>> this does have historical baggage.
> >>>>>>> I don't think the upstream Linux drivers need to work around buggy
> >>>>>>> devices. Or it is a good excuse to block configure interrupts.
> >>>>>> Of course I agree. Our DPU devices support ctrlq irq natively, as long
> >>>>>> as the guest OS enables the irq for ctrlq.
> >>>>>>
> >>>>>> If other products have no problem with this, I would prefer to use irq
> >>>>>> to solve this problem, which is the most essential solution.
> >>>>> Let's do that.
> >>>> Ok, will do.
> >>>>
> >>>> Do you have the link to the patch where you previously modified the
> >>>> control queue for interrupt notifications?
> >>>> I think a new patch could be made on top of it, but I can't seem to find it.
> >>> Something like this?
> >> Yes. Thanks Jason.
> >>
> >>> https://lore.kernel.org/lkml/6026e801-6fda-fee9-a69b-d06a80368621@redhat.com/t/
> >>>
> >>> Note that
> >>>
> >>> 1) some patches have been merged
> >>> 2) we probably need to drop the timeout logic as it's another topic
> >>> 3) we need to address other comments
> >> I did a quick read of the previous 5 versions of your patch set:
> >> [1] https://lore.kernel.org/lkml/6026e801-6fda-fee9-a69b-d06a80368621@redhat.com/t/
> >> [2] https://lore.kernel.org/all/20221226074908.8154-1-jasowang@redhat.com/
> >> [3] https://lore.kernel.org/all/20230413064027.13267-1-jasowang@redhat.com/
> >> [4] https://lore.kernel.org/all/20230524081842.3060-1-jasowang@redhat.com/
> >> [5] https://lore.kernel.org/all/20230720083839.481487-1-jasowang@redhat.com/
> >>
> >> Regarding adding the interrupt to ctrlq, there are a few points on which
> >> there is no agreement yet, which I summarize below.
> >>
> >> 1. It requires an additional interrupt vector resource
> >> https://lore.kernel.org/all/20230516165043-mutt-send-email-mst@kernel.org/
> > I don't think one more vector is a big problem. Multiqueue will
> > require much more than this.
> >
> > Even if it is, we can try to share an interrupt as Michael suggests.
> >
> > Let's start from something that is simple, just one more vector.
>
> OK, that puts my concerns to rest.
>
> >> 2. Adding the interrupt for ctrlq may break some devices
> >> https://lore.kernel.org/all/f9e75ce5-e6df-d1be-201b-7d0f18c1b6e7@redhat.com/
> > These devices need to be fixed. It's hard to imagine the evolution of
> > virtio-net being blocked by buggy devices.
>
> Agree.
>
> >> 3. RTNL breaks surprise removal
> >> https://lore.kernel.org/all/20230720170001-mutt-send-email-mst@kernel.org/
> > The comment was about the indefinite wait for the ctrl vq, which turns
> > out to be another issue.
> >
> > For the removal, we just need to do the wakeup and then everything is fine.
>
> Then I will make a patch set based on irq and without the timeout.

Ok.

> >> Regarding the above, there seems to be no conclusion yet.
> >> If these problems still exist, I think this patch is good enough and we
> >> can merge it first.
> > I don't think so; polling turns out to be problematic in a lot of cases.
> >
> >> For the third point, it seems to be being solved by Daniel now [6], but a
> >> spin lock is used, which I think conflicts with the way of adding
> >> interrupts to ctrlq.
> >>
> >> [6] https://lore.kernel.org/all/20240325214912.323749-1-danielj@nvidia.com/
> > I don't see how it conflicts with this.
>
> I'll just make my changes on top of it. Can I?

It should be fine.

Thanks

> Thanks,
> Heng
>
> > Thanks
> >
> >> Thanks,
> >> Heng
> >>
> >>> Thanks
> >>>
> >>>> Thanks,
> >>>> Heng
> >>>>
> >>>>> Thanks
> >>>>>
> >>>>>>> And I remember you told us your device doesn't have such an issue.
> >>>>>> YES.
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Heng
> >>>>>>
> >>>>>>> Thanks
> >>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>> Heng
> >>>>>>>>
> >>>>>>>>> Thanks
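[Editor's note] The "just do the wakeup" approach discussed above — sleeping on a completion instead of polling the control virtqueue, and signalling that completion from either the interrupt handler or the surprise-removal path — can be sketched in user space with a pthread condition variable. The names below only mirror the kernel's completion API; everything here is a hypothetical model, not driver code:

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's struct completion: the ctrlq
 * command issuer sleeps here instead of busy-polling for the used buffer. */
struct completion {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;    /* set by the "interrupt" (or removal) path */
	bool removed; /* surprise removal: wake the waiter with an error */
};

/* Called from the ctrlq interrupt handler on command completion,
 * or from the removal path with removed = true. */
static void complete(struct completion *c, bool removed)
{
	pthread_mutex_lock(&c->lock);
	c->done = true;
	c->removed = removed;
	pthread_cond_signal(&c->cond); /* the wakeup referred to above */
	pthread_mutex_unlock(&c->lock);
}

/* Returns 0 on command completion, -1 if the device went away.
 * Note: no timeout, per the discussion above. */
static int wait_for_completion(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	while (!c->done)
		pthread_cond_wait(&c->cond, &c->lock);
	int ret = c->removed ? -1 : 0;
	pthread_mutex_unlock(&c->lock);
	return ret;
}
```

The point of the pattern is that the waiter never spins: it either wakes on completion or wakes on removal with an error, so a surprise-removed device cannot leave a command stuck.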
* Re: [PATCH 2/2] virtio-net: reduce the CPU consumption of dim worker
  2024-03-21 11:45 ` [PATCH 2/2] virtio-net: reduce the CPU consumption of dim worker Heng Qi
  2024-03-22  2:03   ` kernel test robot
  2024-03-22  5:19   ` Jason Wang
@ 2024-03-22  6:50   ` Dan Carpenter
  2 siblings, 0 replies; 23+ messages in thread
From: Dan Carpenter @ 2024-03-22 6:50 UTC (permalink / raw)
To: oe-kbuild, Heng Qi, netdev, virtualization, Jason Wang,
	Michael S. Tsirkin, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	David S. Miller, Xuan Zhuo
Cc: lkp, oe-kbuild-all

Hi Heng,

kernel test robot noticed the following build warnings:

https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Heng-Qi/virtio-net-fix-possible-dim-status-unrecoverable/20240321-194759
base:   linus/master
patch link:    https://lore.kernel.org/r/1711021557-58116-3-git-send-email-hengqi%40linux.alibaba.com
patch subject: [PATCH 2/2] virtio-net: reduce the CPU consumption of dim worker
config: i386-randconfig-141-20240322 (https://download.01.org/0day-ci/archive/20240322/202403221133.J66oueZh-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
| Closes: https://lore.kernel.org/r/202403221133.J66oueZh-lkp@intel.com/

New smatch warnings:
drivers/net/virtio_net.c:5031 virtnet_probe() warn: missing error code 'err'

vim +/err +5031 drivers/net/virtio_net.c

986a4f4d452dec Jason Wang          2012-12-07  5006  	/* Allocate/initialize the rx/tx queues, and invoke find_vqs */
3f9c10b0d478a3 Amit Shah           2011-12-22  5007  	err = init_vqs(vi);
d2a7ddda9ffb1c Michael S. Tsirkin  2009-06-12  5008  	if (err)
d7dfc5cf56b0e3 Toshiaki Makita     2018-01-17  5009  		goto free;
296f96fcfc160e Rusty Russell       2007-10-22  5010  
3014a0d54820d2 Heng Qi             2023-10-08  5011  	if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_NOTF_COAL)) {
3014a0d54820d2 Heng Qi             2023-10-08  5012  		vi->intr_coal_rx.max_usecs = 0;
3014a0d54820d2 Heng Qi             2023-10-08  5013  		vi->intr_coal_tx.max_usecs = 0;
3014a0d54820d2 Heng Qi             2023-10-08  5014  		vi->intr_coal_rx.max_packets = 0;
3014a0d54820d2 Heng Qi             2023-10-08  5015  
3014a0d54820d2 Heng Qi             2023-10-08  5016  		/* Keep the default values of the coalescing parameters
3014a0d54820d2 Heng Qi             2023-10-08  5017  		 * aligned with the default napi_tx state.
3014a0d54820d2 Heng Qi             2023-10-08  5018  		 */
3014a0d54820d2 Heng Qi             2023-10-08  5019  		if (vi->sq[0].napi.weight)
3014a0d54820d2 Heng Qi             2023-10-08  5020  			vi->intr_coal_tx.max_packets = 1;
3014a0d54820d2 Heng Qi             2023-10-08  5021  		else
3014a0d54820d2 Heng Qi             2023-10-08  5022  			vi->intr_coal_tx.max_packets = 0;
3014a0d54820d2 Heng Qi             2023-10-08  5023  	}
3014a0d54820d2 Heng Qi             2023-10-08  5024  
d8cd72f1622753 Heng Qi             2024-03-21  5025  	INIT_LIST_HEAD(&vi->coal_list);
3014a0d54820d2 Heng Qi             2023-10-08  5026  	if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_VQ_NOTF_COAL)) {
d8cd72f1622753 Heng Qi             2024-03-21  5027  		vi->cvq_cmd_nums = 0;
d8cd72f1622753 Heng Qi             2024-03-21  5028  		vi->dim_loop_index = 0;
d8cd72f1622753 Heng Qi             2024-03-21  5029  
d8cd72f1622753 Heng Qi             2024-03-21  5030  		if (virtnet_init_coal_list(vi))
d8cd72f1622753 Heng Qi             2024-03-21 @5031  			goto free;

This should probably set the error code.

	err = virtnet_init_coal_list(vi);
	if (err)
		goto free;

d8cd72f1622753 Heng Qi             2024-03-21  5032  
3014a0d54820d2 Heng Qi             2023-10-08  5033  		/* The reason is the same as VIRTIO_NET_F_NOTF_COAL. */
d8cd72f1622753 Heng Qi             2024-03-21  5034  		for (i = 0; i < vi->max_queue_pairs; i++) {
d8cd72f1622753 Heng Qi             2024-03-21  5035  			vi->rq[i].packets_in_napi = 0;
3014a0d54820d2 Heng Qi             2023-10-08  5036  			if (vi->sq[i].napi.weight)
3014a0d54820d2 Heng Qi             2023-10-08  5037  				vi->sq[i].intr_coal.max_packets = 1;
3014a0d54820d2 Heng Qi             2023-10-08  5038  		}
d8cd72f1622753 Heng Qi             2024-03-21  5039  	}

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
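[Editor's note] The bug class smatch flags here is easy to reproduce in isolation: when the `goto free` is taken without assigning `err`, the function returns the stale success value from an earlier step. A minimal user-space model — the helper name only echoes `virtnet_init_coal_list()`; this is not the driver code:

```c
#include <errno.h>

/* Hypothetical helper that fails, standing in for virtnet_init_coal_list(). */
static int init_coal_list(void)
{
	return -ENOMEM;
}

/* Buggy shape: 'err' still holds 0 from the previous successful step when
 * the goto is taken, so the probe reports success for a failed init. */
static int probe_buggy(void)
{
	int err = 0; /* a previous step succeeded */

	if (init_coal_list())
		goto free; /* err is never set */
free:
	return err;
}

/* Fixed shape suggested in the report: capture the return value first. */
static int probe_fixed(void)
{
	int err = 0;

	err = init_coal_list();
	if (err)
		goto free;
free:
	return err;
}
```

This is why netdev reviewers (and smatch) insist on the `err = fn(); if (err)` idiom on error paths that end in a shared cleanup label.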
* Re: [PATCH 0/2] virtio-net: a fix and some updates for virtio dim
  2024-03-21 11:45 [PATCH 0/2] virtio-net: a fix and some updates for virtio dim Heng Qi
  2024-03-21 11:45 ` [PATCH 1/2] virtio-net: fix possible dim status unrecoverable Heng Qi
  2024-03-21 11:45 ` [PATCH 2/2] virtio-net: reduce the CPU consumption of dim worker Heng Qi
@ 2024-03-21 12:25 ` Jiri Pirko
  2024-03-25  2:23   ` Heng Qi
  2 siblings, 1 reply; 23+ messages in thread
From: Jiri Pirko @ 2024-03-21 12:25 UTC (permalink / raw)
To: Heng Qi
Cc: netdev, virtualization, Jason Wang, Michael S. Tsirkin,
	Jakub Kicinski, Paolo Abeni, Eric Dumazet, David S. Miller,
	Xuan Zhuo

Thu, Mar 21, 2024 at 12:45:55PM CET, hengqi@linux.alibaba.com wrote:
>Patch 1 fixes an existing bug. Belongs to the net branch.

Send separately with "net" indication in the patch brackets:
https://www.kernel.org/doc/html/next/process/maintainer-netdev.html?highlight=network#tl-dr

>Patch 2 attempts to modify the sending of dim cmds to an asynchronous way.

Not a bugfix, then send separately with "net-next" indication. Net-next
is currently closed, send it next week.

>
>Heng Qi (2):
>  virtio-net: fix possible dim status unrecoverable
>  virtio-net: reduce the CPU consumption of dim worker

The name of the driver is "virtio_net".

pw-bot: cr

>
> drivers/net/virtio_net.c | 273 ++++++++++++++++++++++++++++++++++++++++++-----
> 1 file changed, 246 insertions(+), 27 deletions(-)
>
>--
>1.8.3.1
>
* Re: [PATCH 0/2] virtio-net: a fix and some updates for virtio dim
  2024-03-21 12:25 ` [PATCH 0/2] virtio-net: a fix and some updates for virtio dim Jiri Pirko
@ 2024-03-25  2:23   ` Heng Qi
  0 siblings, 0 replies; 23+ messages in thread
From: Heng Qi @ 2024-03-25 2:23 UTC (permalink / raw)
To: Jiri Pirko
Cc: netdev, virtualization, Jason Wang, Michael S. Tsirkin,
	Jakub Kicinski, Paolo Abeni, Eric Dumazet, David S. Miller,
	Xuan Zhuo

On 2024/3/21 8:25 PM, Jiri Pirko wrote:
> Thu, Mar 21, 2024 at 12:45:55PM CET, hengqi@linux.alibaba.com wrote:
>> Patch 1 fixes an existing bug. Belongs to the net branch.
> Send separately with "net" indication in the patch brackets:
> https://www.kernel.org/doc/html/next/process/maintainer-netdev.html?highlight=network#tl-dr
>
>> Patch 2 attempts to modify the sending of dim cmds to an asynchronous way.
> Not a bugfix, then send separately with "net-next" indication. Net-next
> is currently closed, send it next week.

Will do.

>> Heng Qi (2):
>>   virtio-net: fix possible dim status unrecoverable
>>   virtio-net: reduce the CPU consumption of dim worker
> The name of the driver is "virtio_net".

'git log --grep="^virtio-net:" --oneline' will show a large number of
"virtio-net" names.

>
> pw-bot: cr
>
>> drivers/net/virtio_net.c | 273 ++++++++++++++++++++++++++++++++++++++++++-----
>> 1 file changed, 246 insertions(+), 27 deletions(-)
>>
>> --
>> 1.8.3.1
>>
end of thread, other threads: [~2024-03-26  6:05 UTC | newest]

Thread overview: 23+ messages:
2024-03-21 11:45 [PATCH 0/2] virtio-net: a fix and some updates for virtio dim Heng Qi
2024-03-21 11:45 ` [PATCH 1/2] virtio-net: fix possible dim status unrecoverable Heng Qi
2024-03-22  5:17   ` Jason Wang
2024-03-25  2:11     ` Heng Qi
2024-03-25  6:29       ` Jason Wang
2024-03-25  6:57         ` Heng Qi
2024-03-25  7:06           ` Jason Wang
2024-03-21 11:45 ` [PATCH 2/2] virtio-net: reduce the CPU consumption of dim worker Heng Qi
2024-03-22  2:03   ` kernel test robot
2024-03-22  5:19   ` Jason Wang
2024-03-25  2:21     ` Heng Qi
2024-03-25  5:57       ` Jason Wang
2024-03-25  7:17         ` Heng Qi
2024-03-25  7:56           ` Jason Wang
2024-03-25  8:22             ` Heng Qi
2024-03-25  8:42               ` Jason Wang
2024-03-26  2:46                 ` Heng Qi
2024-03-26  4:08                   ` Jason Wang
2024-03-26  5:57                     ` Heng Qi
2024-03-26  6:05                       ` Jason Wang
2024-03-22  6:50   ` Dan Carpenter
2024-03-21 12:25 ` [PATCH 0/2] virtio-net: a fix and some updates for virtio dim Jiri Pirko
2024-03-25  2:23   ` Heng Qi