* [PATCH net-next v6 0/5] virtio-net: support dynamic coalescing moderation
@ 2023-12-05 8:02 Heng Qi
2023-12-05 8:02 ` [PATCH net-next v6 1/5] virtio-net: returns whether napi is complete Heng Qi
` (4 more replies)
0 siblings, 5 replies; 14+ messages in thread
From: Heng Qi @ 2023-12-05 8:02 UTC (permalink / raw)
To: netdev, virtualization
Cc: jasowang, mst, pabeni, kuba, yinjun.zhang, edumazet, davem, hawk,
john.fastabend, ast, horms, xuanzhuo
Now, virtio-net already supports per-queue moderation parameter
setting. Based on this, we use the linux dimlib to support
dynamic coalescing moderation for virtio-net.
Due to some scheduling issues, we only support and test the rx dim.
Some test results:
I. Sockperf UDP
=================================================
1. Env
rxq_0 with affinity to cpu_0.
2. Cmd
client: taskset -c 0 sockperf tp -p 8989 -i $IP -t 10 -m 16B
server: taskset -c 0 sockperf sr -p 8989
3. Result
dim off: 1143277.00 rxpps, throughput 17.844 MBps, cpu is 100%.
dim on: 1124161.00 rxpps, throughput 17.610 MBps, cpu is 83.5%.
=================================================
II. Redis
=================================================
1. Env
There are 8 rxqs, and rxq_i with affinity to cpu_i.
2. Result
When all cpus are 100%, ops/sec of memtier_benchmark client is
dim off: 978437.23
dim on: 1143638.28
=================================================
III. Nginx
=================================================
1. Env
There are 8 rxqs and rxq_i with affinity to cpu_i.
2. Result
When all cpus are 100%, requests/sec of wrk client is
dim off: 877931.67
dim on: 1019160.31
=================================================
IV. Latency of sockperf udp
=================================================
1. Rx cmd
taskset -c 0 sockperf sr -p 8989
2. Tx cmd
taskset -c 0 sockperf pp -i ${ip} -p 8989 -t 10
After running this cmd 5 times and averaging the results,
3. Result
dim off: 17.7735 usec
dim on: 18.0110 usec
=================================================
Changelog:
v5->v6:
- Add patch(4/5): spin lock for ctrl cmd access
- Patch(5/5):
- Use spin lock and cancel_work_sync to synchronize
v4->v5:
- Patch(4/4):
- Fix possible synchronization issues with cancel_work_sync.
- Reduce if/else nesting levels
v3->v4:
- Patch(5/5): drop.
v2->v3:
- Patch(4/5): some minor modifications.
v1->v2:
- Patch(2/5): a minor fix.
- Patch(4/5):
- improve the judgment of dim switch conditions.
- Cancel the work when vq reset.
- Patch(5/5): drop the tx dim implementation.
Heng Qi (5):
virtio-net: returns whether napi is complete
virtio-net: separate rx/tx coalescing moderation cmds
virtio-net: extract virtqueue coalescig cmd for reuse
virtio-net: add spin lock for ctrl cmd access
virtio-net: support rx netdim
drivers/net/virtio_net.c | 346 +++++++++++++++++++++++++++++++++------
1 file changed, 295 insertions(+), 51 deletions(-)
--
2.19.1.6.gb485710b
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH net-next v6 1/5] virtio-net: returns whether napi is complete
2023-12-05 8:02 [PATCH net-next v6 0/5] virtio-net: support dynamic coalescing moderation Heng Qi
@ 2023-12-05 8:02 ` Heng Qi
2023-12-05 8:02 ` [PATCH net-next v6 2/5] virtio-net: separate rx/tx coalescing moderation cmds Heng Qi
` (3 subsequent siblings)
4 siblings, 0 replies; 14+ messages in thread
From: Heng Qi @ 2023-12-05 8:02 UTC (permalink / raw)
To: netdev, virtualization
Cc: jasowang, mst, pabeni, kuba, yinjun.zhang, edumazet, davem, hawk,
john.fastabend, ast, horms, xuanzhuo
rx netdim needs to count the traffic during a complete napi process,
and start updating and comparing samples to make decisions after
the napi ends. Let virtqueue_napi_complete() return true if napi is done,
otherwise vice versa.
Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
---
drivers/net/virtio_net.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index d16f592c2061..0ad2894e6a5e 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -431,7 +431,7 @@ static void virtqueue_napi_schedule(struct napi_struct *napi,
}
}
-static void virtqueue_napi_complete(struct napi_struct *napi,
+static bool virtqueue_napi_complete(struct napi_struct *napi,
struct virtqueue *vq, int processed)
{
int opaque;
@@ -440,9 +440,13 @@ static void virtqueue_napi_complete(struct napi_struct *napi,
if (napi_complete_done(napi, processed)) {
if (unlikely(virtqueue_poll(vq, opaque)))
virtqueue_napi_schedule(napi, vq);
+ else
+ return true;
} else {
virtqueue_disable_cb(vq);
}
+
+ return false;
}
static void skb_xmit_done(struct virtqueue *vq)
--
2.19.1.6.gb485710b
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH net-next v6 2/5] virtio-net: separate rx/tx coalescing moderation cmds
2023-12-05 8:02 [PATCH net-next v6 0/5] virtio-net: support dynamic coalescing moderation Heng Qi
2023-12-05 8:02 ` [PATCH net-next v6 1/5] virtio-net: returns whether napi is complete Heng Qi
@ 2023-12-05 8:02 ` Heng Qi
2023-12-05 8:02 ` [PATCH net-next v6 3/5] virtio-net: extract virtqueue coalescig cmd for reuse Heng Qi
` (2 subsequent siblings)
4 siblings, 0 replies; 14+ messages in thread
From: Heng Qi @ 2023-12-05 8:02 UTC (permalink / raw)
To: netdev, virtualization
Cc: jasowang, mst, pabeni, kuba, yinjun.zhang, edumazet, davem, hawk,
john.fastabend, ast, horms, xuanzhuo
This patch separates the rx and tx global coalescing moderation
commands to support netdim switches in subsequent patches.
Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
---
drivers/net/virtio_net.c | 31 ++++++++++++++++++++++++++++---
1 file changed, 28 insertions(+), 3 deletions(-)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 0ad2894e6a5e..0285301caf78 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -3266,10 +3266,10 @@ static int virtnet_get_link_ksettings(struct net_device *dev,
return 0;
}
-static int virtnet_send_notf_coal_cmds(struct virtnet_info *vi,
- struct ethtool_coalesce *ec)
+static int virtnet_send_tx_notf_coal_cmds(struct virtnet_info *vi,
+ struct ethtool_coalesce *ec)
{
- struct scatterlist sgs_tx, sgs_rx;
+ struct scatterlist sgs_tx;
int i;
vi->ctrl->coal_tx.tx_usecs = cpu_to_le32(ec->tx_coalesce_usecs);
@@ -3289,6 +3289,15 @@ static int virtnet_send_notf_coal_cmds(struct virtnet_info *vi,
vi->sq[i].intr_coal.max_packets = ec->tx_max_coalesced_frames;
}
+ return 0;
+}
+
+static int virtnet_send_rx_notf_coal_cmds(struct virtnet_info *vi,
+ struct ethtool_coalesce *ec)
+{
+ struct scatterlist sgs_rx;
+ int i;
+
vi->ctrl->coal_rx.rx_usecs = cpu_to_le32(ec->rx_coalesce_usecs);
vi->ctrl->coal_rx.rx_max_packets = cpu_to_le32(ec->rx_max_coalesced_frames);
sg_init_one(&sgs_rx, &vi->ctrl->coal_rx, sizeof(vi->ctrl->coal_rx));
@@ -3309,6 +3318,22 @@ static int virtnet_send_notf_coal_cmds(struct virtnet_info *vi,
return 0;
}
+static int virtnet_send_notf_coal_cmds(struct virtnet_info *vi,
+ struct ethtool_coalesce *ec)
+{
+ int err;
+
+ err = virtnet_send_tx_notf_coal_cmds(vi, ec);
+ if (err)
+ return err;
+
+ err = virtnet_send_rx_notf_coal_cmds(vi, ec);
+ if (err)
+ return err;
+
+ return 0;
+}
+
static int virtnet_send_ctrl_coal_vq_cmd(struct virtnet_info *vi,
u16 vqn, u32 max_usecs, u32 max_packets)
{
--
2.19.1.6.gb485710b
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH net-next v6 3/5] virtio-net: extract virtqueue coalescig cmd for reuse
2023-12-05 8:02 [PATCH net-next v6 0/5] virtio-net: support dynamic coalescing moderation Heng Qi
2023-12-05 8:02 ` [PATCH net-next v6 1/5] virtio-net: returns whether napi is complete Heng Qi
2023-12-05 8:02 ` [PATCH net-next v6 2/5] virtio-net: separate rx/tx coalescing moderation cmds Heng Qi
@ 2023-12-05 8:02 ` Heng Qi
2023-12-05 8:02 ` [PATCH net-next v6 4/5] virtio-net: add spin lock for ctrl cmd access Heng Qi
2023-12-05 8:02 ` [PATCH net-next v6 5/5] virtio-net: support rx netdim Heng Qi
4 siblings, 0 replies; 14+ messages in thread
From: Heng Qi @ 2023-12-05 8:02 UTC (permalink / raw)
To: netdev, virtualization
Cc: jasowang, mst, pabeni, kuba, yinjun.zhang, edumazet, davem, hawk,
john.fastabend, ast, horms, xuanzhuo
Extract commands to set virtqueue coalescing parameters for reuse
by ethtool -Q, vq resize and netdim.
Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
---
drivers/net/virtio_net.c | 106 +++++++++++++++++++++++----------------
1 file changed, 64 insertions(+), 42 deletions(-)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 0285301caf78..69fe09e99b3c 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -2849,6 +2849,58 @@ static void virtnet_cpu_notif_remove(struct virtnet_info *vi)
&vi->node_dead);
}
+static int virtnet_send_ctrl_coal_vq_cmd(struct virtnet_info *vi,
+ u16 vqn, u32 max_usecs, u32 max_packets)
+{
+ struct scatterlist sgs;
+
+ vi->ctrl->coal_vq.vqn = cpu_to_le16(vqn);
+ vi->ctrl->coal_vq.coal.max_usecs = cpu_to_le32(max_usecs);
+ vi->ctrl->coal_vq.coal.max_packets = cpu_to_le32(max_packets);
+ sg_init_one(&sgs, &vi->ctrl->coal_vq, sizeof(vi->ctrl->coal_vq));
+
+ if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_NOTF_COAL,
+ VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET,
+ &sgs))
+ return -EINVAL;
+
+ return 0;
+}
+
+static int virtnet_send_rx_ctrl_coal_vq_cmd(struct virtnet_info *vi,
+ u16 queue, u32 max_usecs,
+ u32 max_packets)
+{
+ int err;
+
+ err = virtnet_send_ctrl_coal_vq_cmd(vi, rxq2vq(queue),
+ max_usecs, max_packets);
+ if (err)
+ return err;
+
+ vi->rq[queue].intr_coal.max_usecs = max_usecs;
+ vi->rq[queue].intr_coal.max_packets = max_packets;
+
+ return 0;
+}
+
+static int virtnet_send_tx_ctrl_coal_vq_cmd(struct virtnet_info *vi,
+ u16 queue, u32 max_usecs,
+ u32 max_packets)
+{
+ int err;
+
+ err = virtnet_send_ctrl_coal_vq_cmd(vi, txq2vq(queue),
+ max_usecs, max_packets);
+ if (err)
+ return err;
+
+ vi->sq[queue].intr_coal.max_usecs = max_usecs;
+ vi->sq[queue].intr_coal.max_packets = max_packets;
+
+ return 0;
+}
+
static void virtnet_get_ringparam(struct net_device *dev,
struct ethtool_ringparam *ring,
struct kernel_ethtool_ringparam *kernel_ring,
@@ -2906,14 +2958,11 @@ static int virtnet_set_ringparam(struct net_device *dev,
* through the VIRTIO_NET_CTRL_NOTF_COAL_TX_SET command, or, if the driver
* did not set any TX coalescing parameters, to 0.
*/
- err = virtnet_send_ctrl_coal_vq_cmd(vi, txq2vq(i),
- vi->intr_coal_tx.max_usecs,
- vi->intr_coal_tx.max_packets);
+ err = virtnet_send_tx_ctrl_coal_vq_cmd(vi, i,
+ vi->intr_coal_tx.max_usecs,
+ vi->intr_coal_tx.max_packets);
if (err)
return err;
-
- vi->sq[i].intr_coal.max_usecs = vi->intr_coal_tx.max_usecs;
- vi->sq[i].intr_coal.max_packets = vi->intr_coal_tx.max_packets;
}
if (ring->rx_pending != rx_pending) {
@@ -2922,14 +2971,11 @@ static int virtnet_set_ringparam(struct net_device *dev,
return err;
/* The reason is same as the transmit virtqueue reset */
- err = virtnet_send_ctrl_coal_vq_cmd(vi, rxq2vq(i),
- vi->intr_coal_rx.max_usecs,
- vi->intr_coal_rx.max_packets);
+ err = virtnet_send_rx_ctrl_coal_vq_cmd(vi, i,
+ vi->intr_coal_rx.max_usecs,
+ vi->intr_coal_rx.max_packets);
if (err)
return err;
-
- vi->rq[i].intr_coal.max_usecs = vi->intr_coal_rx.max_usecs;
- vi->rq[i].intr_coal.max_packets = vi->intr_coal_rx.max_packets;
}
}
@@ -3334,48 +3380,24 @@ static int virtnet_send_notf_coal_cmds(struct virtnet_info *vi,
return 0;
}
-static int virtnet_send_ctrl_coal_vq_cmd(struct virtnet_info *vi,
- u16 vqn, u32 max_usecs, u32 max_packets)
-{
- struct scatterlist sgs;
-
- vi->ctrl->coal_vq.vqn = cpu_to_le16(vqn);
- vi->ctrl->coal_vq.coal.max_usecs = cpu_to_le32(max_usecs);
- vi->ctrl->coal_vq.coal.max_packets = cpu_to_le32(max_packets);
- sg_init_one(&sgs, &vi->ctrl->coal_vq, sizeof(vi->ctrl->coal_vq));
-
- if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_NOTF_COAL,
- VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET,
- &sgs))
- return -EINVAL;
-
- return 0;
-}
-
static int virtnet_send_notf_coal_vq_cmds(struct virtnet_info *vi,
struct ethtool_coalesce *ec,
u16 queue)
{
int err;
- err = virtnet_send_ctrl_coal_vq_cmd(vi, rxq2vq(queue),
- ec->rx_coalesce_usecs,
- ec->rx_max_coalesced_frames);
+ err = virtnet_send_rx_ctrl_coal_vq_cmd(vi, queue,
+ ec->rx_coalesce_usecs,
+ ec->rx_max_coalesced_frames);
if (err)
return err;
- vi->rq[queue].intr_coal.max_usecs = ec->rx_coalesce_usecs;
- vi->rq[queue].intr_coal.max_packets = ec->rx_max_coalesced_frames;
-
- err = virtnet_send_ctrl_coal_vq_cmd(vi, txq2vq(queue),
- ec->tx_coalesce_usecs,
- ec->tx_max_coalesced_frames);
+ err = virtnet_send_tx_ctrl_coal_vq_cmd(vi, queue,
+ ec->tx_coalesce_usecs,
+ ec->tx_max_coalesced_frames);
if (err)
return err;
- vi->sq[queue].intr_coal.max_usecs = ec->tx_coalesce_usecs;
- vi->sq[queue].intr_coal.max_packets = ec->tx_max_coalesced_frames;
-
return 0;
}
--
2.19.1.6.gb485710b
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH net-next v6 4/5] virtio-net: add spin lock for ctrl cmd access
2023-12-05 8:02 [PATCH net-next v6 0/5] virtio-net: support dynamic coalescing moderation Heng Qi
` (2 preceding siblings ...)
2023-12-05 8:02 ` [PATCH net-next v6 3/5] virtio-net: extract virtqueue coalescig cmd for reuse Heng Qi
@ 2023-12-05 8:02 ` Heng Qi
2023-12-05 8:35 ` Jason Wang
2023-12-05 8:02 ` [PATCH net-next v6 5/5] virtio-net: support rx netdim Heng Qi
4 siblings, 1 reply; 14+ messages in thread
From: Heng Qi @ 2023-12-05 8:02 UTC (permalink / raw)
To: netdev, virtualization
Cc: jasowang, mst, pabeni, kuba, yinjun.zhang, edumazet, davem, hawk,
john.fastabend, ast, horms, xuanzhuo
Currently access to ctrl cmd is globally protected via rtnl_lock and works
fine. But if dim work's access to ctrl cmd also holds rtnl_lock, deadlock
may occur due to cancel_work_sync for dim work. Therefore, treating
ctrl cmd as a separate protection object of the lock is the solution and
the basis for the next patch.
Since ndo_set_rx_mode is in an atomic environment(netif_addr_lock_bh),
the mutex lock is excluded.
And I tried putting the spin lock in virtnet_send_command, but
virtnet_rx_dim_work and virtnet_set_per_queue_coalesce access to
shared variables prevent this.
cancel_work_sync and virtnet_rx_dim_work are from the next patch.
Please review. Thanks a lot!
Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
---
drivers/net/virtio_net.c | 70 ++++++++++++++++++++++++++++++++++++----
1 file changed, 64 insertions(+), 6 deletions(-)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 69fe09e99b3c..0dd09c4f8d89 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -301,6 +301,9 @@ struct virtnet_info {
struct control_buf *ctrl;
+ /* The lock to synchronize the access to contrl cmd */
+ spinlock_t ctrl_lock;
+
/* Ethtool settings */
u8 duplex;
u32 speed;
@@ -2520,13 +2523,16 @@ static int virtnet_set_mac_address(struct net_device *dev, void *p)
if (virtio_has_feature(vdev, VIRTIO_NET_F_CTRL_MAC_ADDR)) {
sg_init_one(&sg, addr->sa_data, dev->addr_len);
+ spin_lock(&vi->ctrl_lock);
if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_MAC,
VIRTIO_NET_CTRL_MAC_ADDR_SET, &sg)) {
dev_warn(&vdev->dev,
"Failed to set mac address by vq command.\n");
ret = -EINVAL;
+ spin_unlock(&vi->ctrl_lock);
goto out;
}
+ spin_unlock(&vi->ctrl_lock);
} else if (virtio_has_feature(vdev, VIRTIO_NET_F_MAC) &&
!virtio_has_feature(vdev, VIRTIO_F_VERSION_1)) {
unsigned int i;
@@ -2589,9 +2595,11 @@ static void virtnet_stats(struct net_device *dev,
static void virtnet_ack_link_announce(struct virtnet_info *vi)
{
rtnl_lock();
+ spin_lock(&vi->ctrl_lock);
if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_ANNOUNCE,
VIRTIO_NET_CTRL_ANNOUNCE_ACK, NULL))
dev_warn(&vi->dev->dev, "Failed to ack link announce.\n");
+ spin_unlock(&vi->ctrl_lock);
rtnl_unlock();
}
@@ -2603,6 +2611,7 @@ static int _virtnet_set_queues(struct virtnet_info *vi, u16 queue_pairs)
if (!vi->has_cvq || !virtio_has_feature(vi->vdev, VIRTIO_NET_F_MQ))
return 0;
+ spin_lock(&vi->ctrl_lock);
vi->ctrl->mq.virtqueue_pairs = cpu_to_virtio16(vi->vdev, queue_pairs);
sg_init_one(&sg, &vi->ctrl->mq, sizeof(vi->ctrl->mq));
@@ -2610,6 +2619,7 @@ static int _virtnet_set_queues(struct virtnet_info *vi, u16 queue_pairs)
VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET, &sg)) {
dev_warn(&dev->dev, "Fail to set num of queue pairs to %d\n",
queue_pairs);
+ spin_unlock(&vi->ctrl_lock);
return -EINVAL;
} else {
vi->curr_queue_pairs = queue_pairs;
@@ -2618,6 +2628,7 @@ static int _virtnet_set_queues(struct virtnet_info *vi, u16 queue_pairs)
schedule_delayed_work(&vi->refill, 0);
}
+ spin_unlock(&vi->ctrl_lock);
return 0;
}
@@ -2662,6 +2673,7 @@ static void virtnet_set_rx_mode(struct net_device *dev)
if (!virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_RX))
return;
+ spin_lock(&vi->ctrl_lock);
vi->ctrl->promisc = ((dev->flags & IFF_PROMISC) != 0);
vi->ctrl->allmulti = ((dev->flags & IFF_ALLMULTI) != 0);
@@ -2679,6 +2691,7 @@ static void virtnet_set_rx_mode(struct net_device *dev)
dev_warn(&dev->dev, "Failed to %sable allmulti mode.\n",
vi->ctrl->allmulti ? "en" : "dis");
+ spin_unlock(&vi->ctrl_lock);
uc_count = netdev_uc_count(dev);
mc_count = netdev_mc_count(dev);
/* MAC filter - use one buffer for both lists */
@@ -2710,10 +2723,12 @@ static void virtnet_set_rx_mode(struct net_device *dev)
sg_set_buf(&sg[1], mac_data,
sizeof(mac_data->entries) + (mc_count * ETH_ALEN));
+ spin_lock(&vi->ctrl_lock);
if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_MAC,
VIRTIO_NET_CTRL_MAC_TABLE_SET, sg))
dev_warn(&dev->dev, "Failed to set MAC filter table.\n");
+ spin_unlock(&vi->ctrl_lock);
kfree(buf);
}
@@ -2723,12 +2738,15 @@ static int virtnet_vlan_rx_add_vid(struct net_device *dev,
struct virtnet_info *vi = netdev_priv(dev);
struct scatterlist sg;
+ spin_lock(&vi->ctrl_lock);
vi->ctrl->vid = cpu_to_virtio16(vi->vdev, vid);
sg_init_one(&sg, &vi->ctrl->vid, sizeof(vi->ctrl->vid));
if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_VLAN,
VIRTIO_NET_CTRL_VLAN_ADD, &sg))
dev_warn(&dev->dev, "Failed to add VLAN ID %d.\n", vid);
+ spin_unlock(&vi->ctrl_lock);
+
return 0;
}
@@ -2738,12 +2756,15 @@ static int virtnet_vlan_rx_kill_vid(struct net_device *dev,
struct virtnet_info *vi = netdev_priv(dev);
struct scatterlist sg;
+ spin_lock(&vi->ctrl_lock);
vi->ctrl->vid = cpu_to_virtio16(vi->vdev, vid);
sg_init_one(&sg, &vi->ctrl->vid, sizeof(vi->ctrl->vid));
if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_VLAN,
VIRTIO_NET_CTRL_VLAN_DEL, &sg))
dev_warn(&dev->dev, "Failed to kill VLAN ID %d.\n", vid);
+ spin_unlock(&vi->ctrl_lock);
+
return 0;
}
@@ -2958,11 +2979,15 @@ static int virtnet_set_ringparam(struct net_device *dev,
* through the VIRTIO_NET_CTRL_NOTF_COAL_TX_SET command, or, if the driver
* did not set any TX coalescing parameters, to 0.
*/
+ spin_lock(&vi->ctrl_lock);
err = virtnet_send_tx_ctrl_coal_vq_cmd(vi, i,
vi->intr_coal_tx.max_usecs,
vi->intr_coal_tx.max_packets);
- if (err)
+ if (err) {
+ spin_unlock(&vi->ctrl_lock);
return err;
+ }
+ spin_unlock(&vi->ctrl_lock);
}
if (ring->rx_pending != rx_pending) {
@@ -2971,11 +2996,15 @@ static int virtnet_set_ringparam(struct net_device *dev,
return err;
/* The reason is same as the transmit virtqueue reset */
+ spin_lock(&vi->ctrl_lock);
err = virtnet_send_rx_ctrl_coal_vq_cmd(vi, i,
vi->intr_coal_rx.max_usecs,
vi->intr_coal_rx.max_packets);
- if (err)
+ if (err) {
+ spin_unlock(&vi->ctrl_lock);
return err;
+ }
+ spin_unlock(&vi->ctrl_lock);
}
}
@@ -2991,6 +3020,7 @@ static bool virtnet_commit_rss_command(struct virtnet_info *vi)
/* prepare sgs */
sg_init_table(sgs, 4);
+ spin_lock(&vi->ctrl_lock);
sg_buf_size = offsetof(struct virtio_net_ctrl_rss, indirection_table);
sg_set_buf(&sgs[0], &vi->ctrl->rss, sg_buf_size);
@@ -3008,8 +3038,12 @@ static bool virtnet_commit_rss_command(struct virtnet_info *vi)
vi->has_rss ? VIRTIO_NET_CTRL_MQ_RSS_CONFIG
: VIRTIO_NET_CTRL_MQ_HASH_CONFIG, sgs)) {
dev_warn(&dev->dev, "VIRTIONET issue with committing RSS sgs\n");
+ spin_unlock(&vi->ctrl_lock);
return false;
}
+
+ spin_unlock(&vi->ctrl_lock);
+
return true;
}
@@ -3318,14 +3352,17 @@ static int virtnet_send_tx_notf_coal_cmds(struct virtnet_info *vi,
struct scatterlist sgs_tx;
int i;
+ spin_lock(&vi->ctrl_lock);
vi->ctrl->coal_tx.tx_usecs = cpu_to_le32(ec->tx_coalesce_usecs);
vi->ctrl->coal_tx.tx_max_packets = cpu_to_le32(ec->tx_max_coalesced_frames);
sg_init_one(&sgs_tx, &vi->ctrl->coal_tx, sizeof(vi->ctrl->coal_tx));
if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_NOTF_COAL,
VIRTIO_NET_CTRL_NOTF_COAL_TX_SET,
- &sgs_tx))
+ &sgs_tx)) {
+ spin_unlock(&vi->ctrl_lock);
return -EINVAL;
+ }
/* Save parameters */
vi->intr_coal_tx.max_usecs = ec->tx_coalesce_usecs;
@@ -3334,6 +3371,7 @@ static int virtnet_send_tx_notf_coal_cmds(struct virtnet_info *vi,
vi->sq[i].intr_coal.max_usecs = ec->tx_coalesce_usecs;
vi->sq[i].intr_coal.max_packets = ec->tx_max_coalesced_frames;
}
+ spin_unlock(&vi->ctrl_lock);
return 0;
}
@@ -3344,14 +3382,17 @@ static int virtnet_send_rx_notf_coal_cmds(struct virtnet_info *vi,
struct scatterlist sgs_rx;
int i;
+ spin_lock(&vi->ctrl_lock);
vi->ctrl->coal_rx.rx_usecs = cpu_to_le32(ec->rx_coalesce_usecs);
vi->ctrl->coal_rx.rx_max_packets = cpu_to_le32(ec->rx_max_coalesced_frames);
sg_init_one(&sgs_rx, &vi->ctrl->coal_rx, sizeof(vi->ctrl->coal_rx));
if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_NOTF_COAL,
VIRTIO_NET_CTRL_NOTF_COAL_RX_SET,
- &sgs_rx))
+ &sgs_rx)) {
+ spin_unlock(&vi->ctrl_lock);
return -EINVAL;
+ }
/* Save parameters */
vi->intr_coal_rx.max_usecs = ec->rx_coalesce_usecs;
@@ -3361,6 +3402,8 @@ static int virtnet_send_rx_notf_coal_cmds(struct virtnet_info *vi,
vi->rq[i].intr_coal.max_packets = ec->rx_max_coalesced_frames;
}
+ spin_unlock(&vi->ctrl_lock);
+
return 0;
}
@@ -3386,17 +3429,24 @@ static int virtnet_send_notf_coal_vq_cmds(struct virtnet_info *vi,
{
int err;
+ spin_lock(&vi->ctrl_lock);
err = virtnet_send_rx_ctrl_coal_vq_cmd(vi, queue,
ec->rx_coalesce_usecs,
ec->rx_max_coalesced_frames);
- if (err)
+ if (err) {
+ spin_unlock(&vi->ctrl_lock);
return err;
+ }
err = virtnet_send_tx_ctrl_coal_vq_cmd(vi, queue,
ec->tx_coalesce_usecs,
ec->tx_max_coalesced_frames);
- if (err)
+ if (err) {
+ spin_unlock(&vi->ctrl_lock);
return err;
+ }
+
+ spin_unlock(&vi->ctrl_lock);
return 0;
}
@@ -3733,6 +3783,8 @@ static int virtnet_restore_up(struct virtio_device *vdev)
static int virtnet_set_guest_offloads(struct virtnet_info *vi, u64 offloads)
{
struct scatterlist sg;
+
+ spin_lock(&vi->ctrl_lock);
vi->ctrl->offloads = cpu_to_virtio64(vi->vdev, offloads);
sg_init_one(&sg, &vi->ctrl->offloads, sizeof(vi->ctrl->offloads));
@@ -3740,9 +3792,11 @@ static int virtnet_set_guest_offloads(struct virtnet_info *vi, u64 offloads)
if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_GUEST_OFFLOADS,
VIRTIO_NET_CTRL_GUEST_OFFLOADS_SET, &sg)) {
dev_warn(&vi->dev->dev, "Fail to set guest offload.\n");
+ spin_unlock(&vi->ctrl_lock);
return -EINVAL;
}
+ spin_unlock(&vi->ctrl_lock);
return 0;
}
@@ -4525,6 +4579,7 @@ static int virtnet_probe(struct virtio_device *vdev)
INIT_WORK(&vi->config_work, virtnet_config_changed_work);
spin_lock_init(&vi->refill_lock);
+ spin_lock_init(&vi->ctrl_lock);
if (virtio_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF)) {
vi->mergeable_rx_bufs = true;
@@ -4669,13 +4724,16 @@ static int virtnet_probe(struct virtio_device *vdev)
struct scatterlist sg;
sg_init_one(&sg, dev->dev_addr, dev->addr_len);
+ spin_lock(&vi->ctrl_lock);
if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_MAC,
VIRTIO_NET_CTRL_MAC_ADDR_SET, &sg)) {
pr_debug("virtio_net: setting MAC address failed\n");
+ spin_unlock(&vi->ctrl_lock);
rtnl_unlock();
err = -EINVAL;
goto free_unregister_netdev;
}
+ spin_unlock(&vi->ctrl_lock);
}
rtnl_unlock();
--
2.19.1.6.gb485710b
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH net-next v6 5/5] virtio-net: support rx netdim
2023-12-05 8:02 [PATCH net-next v6 0/5] virtio-net: support dynamic coalescing moderation Heng Qi
` (3 preceding siblings ...)
2023-12-05 8:02 ` [PATCH net-next v6 4/5] virtio-net: add spin lock for ctrl cmd access Heng Qi
@ 2023-12-05 8:02 ` Heng Qi
4 siblings, 0 replies; 14+ messages in thread
From: Heng Qi @ 2023-12-05 8:02 UTC (permalink / raw)
To: netdev, virtualization
Cc: jasowang, mst, pabeni, kuba, yinjun.zhang, edumazet, davem, hawk,
john.fastabend, ast, horms, xuanzhuo
By comparing the traffic information in the complete napi processes,
let the virtio-net driver automatically adjust the coalescing
moderation parameters of each receive queue.
Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
---
v5->v5:
- Use spin lock and cancel_work_sync to synchronize
v4->v5:
- Fix possible synchronization issues using cancel_work().
- Reduce if/else nesting levels
v2->v3:
- Some minor modifications.
v1->v2:
- Improved the judgment of dim switch conditions.
- Cancel the work when vq reset.
drivers/net/virtio_net.c | 199 ++++++++++++++++++++++++++++++++-------
1 file changed, 167 insertions(+), 32 deletions(-)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 0dd09c4f8d89..882761c64db4 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -19,6 +19,7 @@
#include <linux/average.h>
#include <linux/filter.h>
#include <linux/kernel.h>
+#include <linux/dim.h>
#include <net/route.h>
#include <net/xdp.h>
#include <net/net_failover.h>
@@ -172,6 +173,17 @@ struct receive_queue {
struct virtnet_rq_stats stats;
+ /* The number of rx notifications */
+ u16 calls;
+
+ /* Is dynamic interrupt moderation enabled? */
+ bool dim_enabled;
+
+ /* Dynamic Interrupt Moderation */
+ struct dim dim;
+
+ u32 packets_in_napi;
+
struct virtnet_interrupt_coalesce intr_coal;
/* Chain pages by the private ptr. */
@@ -308,6 +320,9 @@ struct virtnet_info {
u8 duplex;
u32 speed;
+ /* Is rx dynamic interrupt moderation enabled? */
+ bool rx_dim_enabled;
+
/* Interrupt coalescing settings */
struct virtnet_interrupt_coalesce intr_coal_tx;
struct virtnet_interrupt_coalesce intr_coal_rx;
@@ -2004,6 +2019,7 @@ static void skb_recv_done(struct virtqueue *rvq)
struct virtnet_info *vi = rvq->vdev->priv;
struct receive_queue *rq = &vi->rq[vq2rxq(rvq)];
+ rq->calls++;
virtqueue_napi_schedule(&rq->napi, rvq);
}
@@ -2144,6 +2160,26 @@ static void virtnet_poll_cleantx(struct receive_queue *rq)
}
}
+static void virtnet_rx_dim_work(struct work_struct *work);
+
+static void virtnet_rx_dim_update(struct virtnet_info *vi, struct receive_queue *rq)
+{
+ struct dim_sample cur_sample = {};
+
+ if (!rq->packets_in_napi)
+ return;
+
+ u64_stats_update_begin(&rq->stats.syncp);
+ dim_update_sample(rq->calls,
+ u64_stats_read(&rq->stats.packets),
+ u64_stats_read(&rq->stats.bytes),
+ &cur_sample);
+ u64_stats_update_end(&rq->stats.syncp);
+
+ net_dim(&rq->dim, cur_sample);
+ rq->packets_in_napi = 0;
+}
+
static int virtnet_poll(struct napi_struct *napi, int budget)
{
struct receive_queue *rq =
@@ -2152,17 +2188,22 @@ static int virtnet_poll(struct napi_struct *napi, int budget)
struct send_queue *sq;
unsigned int received;
unsigned int xdp_xmit = 0;
+ bool napi_complete;
virtnet_poll_cleantx(rq);
received = virtnet_receive(rq, budget, &xdp_xmit);
+ rq->packets_in_napi += received;
if (xdp_xmit & VIRTIO_XDP_REDIR)
xdp_do_flush();
/* Out of packets? */
- if (received < budget)
- virtqueue_napi_complete(napi, rq->vq, received);
+ if (received < budget) {
+ napi_complete = virtqueue_napi_complete(napi, rq->vq, received);
+ if (napi_complete && rq->dim_enabled)
+ virtnet_rx_dim_update(vi, rq);
+ }
if (xdp_xmit & VIRTIO_XDP_TX) {
sq = virtnet_xdp_get_sq(vi);
@@ -2233,8 +2274,11 @@ static int virtnet_open(struct net_device *dev)
disable_delayed_refill(vi);
cancel_delayed_work_sync(&vi->refill);
- for (i--; i >= 0; i--)
+ for (i--; i >= 0; i--) {
virtnet_disable_queue_pair(vi, i);
+ cancel_work_sync(&vi->rq[i].dim.work);
+ }
+
return err;
}
@@ -2396,8 +2440,10 @@ static int virtnet_rx_resize(struct virtnet_info *vi,
qindex = rq - vi->rq;
- if (running)
+ if (running) {
napi_disable(&rq->napi);
+ cancel_work_sync(&rq->dim.work);
+ }
err = virtqueue_resize(rq->vq, ring_num, virtnet_rq_free_unused_buf);
if (err)
@@ -2652,8 +2698,10 @@ static int virtnet_close(struct net_device *dev)
/* Make sure refill_work doesn't re-enable napi! */
cancel_delayed_work_sync(&vi->refill);
- for (i = 0; i < vi->max_queue_pairs; i++)
+ for (i = 0; i < vi->max_queue_pairs; i++) {
virtnet_disable_queue_pair(vi, i);
+ cancel_work_sync(&vi->rq[i].dim.work);
+ }
return 0;
}
@@ -2905,20 +2953,25 @@ static int virtnet_send_rx_ctrl_coal_vq_cmd(struct virtnet_info *vi,
return 0;
}
-static int virtnet_send_tx_ctrl_coal_vq_cmd(struct virtnet_info *vi,
- u16 queue, u32 max_usecs,
- u32 max_packets)
+static int virtnet_send_tx_notf_coal_vq_cmds(struct virtnet_info *vi,
+ u16 queue, u32 max_usecs,
+ u32 max_packets)
{
int err;
+ spin_lock(&vi->ctrl_lock);
err = virtnet_send_ctrl_coal_vq_cmd(vi, txq2vq(queue),
max_usecs, max_packets);
- if (err)
+ if (err) {
+ spin_unlock(&vi->ctrl_lock);
return err;
+ }
vi->sq[queue].intr_coal.max_usecs = max_usecs;
vi->sq[queue].intr_coal.max_packets = max_packets;
+ spin_unlock(&vi->ctrl_lock);
+
return 0;
}
@@ -2935,9 +2988,6 @@ static void virtnet_get_ringparam(struct net_device *dev,
ring->tx_pending = virtqueue_get_vring_size(vi->sq[0].vq);
}
-static int virtnet_send_ctrl_coal_vq_cmd(struct virtnet_info *vi,
- u16 vqn, u32 max_usecs, u32 max_packets);
-
static int virtnet_set_ringparam(struct net_device *dev,
struct ethtool_ringparam *ring,
struct kernel_ethtool_ringparam *kernel_ring,
@@ -2979,15 +3029,11 @@ static int virtnet_set_ringparam(struct net_device *dev,
* through the VIRTIO_NET_CTRL_NOTF_COAL_TX_SET command, or, if the driver
* did not set any TX coalescing parameters, to 0.
*/
- spin_lock(&vi->ctrl_lock);
- err = virtnet_send_tx_ctrl_coal_vq_cmd(vi, i,
- vi->intr_coal_tx.max_usecs,
- vi->intr_coal_tx.max_packets);
- if (err) {
- spin_unlock(&vi->ctrl_lock);
+ err = virtnet_send_tx_notf_coal_vq_cmds(vi, i,
+ vi->intr_coal_tx.max_usecs,
+ vi->intr_coal_tx.max_packets);
+ if (err)
return err;
- }
- spin_unlock(&vi->ctrl_lock);
}
if (ring->rx_pending != rx_pending) {
@@ -3379,9 +3425,34 @@ static int virtnet_send_tx_notf_coal_cmds(struct virtnet_info *vi,
static int virtnet_send_rx_notf_coal_cmds(struct virtnet_info *vi,
struct ethtool_coalesce *ec)
{
+ bool rx_ctrl_dim_on = !!ec->use_adaptive_rx_coalesce;
struct scatterlist sgs_rx;
int i;
+ if (rx_ctrl_dim_on && !virtio_has_feature(vi->vdev, VIRTIO_NET_F_VQ_NOTF_COAL))
+ return -EOPNOTSUPP;
+
+ if (rx_ctrl_dim_on && (ec->rx_coalesce_usecs != vi->intr_coal_rx.max_usecs ||
+ ec->rx_max_coalesced_frames != vi->intr_coal_rx.max_packets))
+ return -EINVAL;
+
+ if (rx_ctrl_dim_on && !vi->rx_dim_enabled) {
+ vi->rx_dim_enabled = true;
+ for (i = 0; i < vi->max_queue_pairs; i++)
+ vi->rq[i].dim_enabled = true;
+ return 0;
+ }
+
+ if (!rx_ctrl_dim_on && vi->rx_dim_enabled) {
+ vi->rx_dim_enabled = false;
+ for (i = 0; i < vi->max_queue_pairs; i++)
+ vi->rq[i].dim_enabled = false;
+ }
+
+ /* Since the per-queue coalescing params can be set,
+ * we need apply the global new params even if they
+ * are not updated.
+ */
spin_lock(&vi->ctrl_lock);
vi->ctrl->coal_rx.rx_usecs = cpu_to_le32(ec->rx_coalesce_usecs);
vi->ctrl->coal_rx.rx_max_packets = cpu_to_le32(ec->rx_max_coalesced_frames);
@@ -3394,7 +3465,6 @@ static int virtnet_send_rx_notf_coal_cmds(struct virtnet_info *vi,
return -EINVAL;
}
- /* Save parameters */
vi->intr_coal_rx.max_usecs = ec->rx_coalesce_usecs;
vi->intr_coal_rx.max_packets = ec->rx_max_coalesced_frames;
for (i = 0; i < vi->max_queue_pairs; i++) {
@@ -3423,13 +3493,36 @@ static int virtnet_send_notf_coal_cmds(struct virtnet_info *vi,
return 0;
}
-static int virtnet_send_notf_coal_vq_cmds(struct virtnet_info *vi,
- struct ethtool_coalesce *ec,
- u16 queue)
+static int virtnet_send_rx_notf_coal_vq_cmds(struct virtnet_info *vi,
+ struct ethtool_coalesce *ec,
+ u16 queue)
{
+ bool rx_ctrl_dim_on = !!ec->use_adaptive_rx_coalesce;
+ bool cur_rx_dim = vi->rq[queue].dim_enabled;
+ u32 max_usecs, max_packets;
int err;
spin_lock(&vi->ctrl_lock);
+ max_usecs = vi->rq[queue].intr_coal.max_usecs;
+ max_packets = vi->rq[queue].intr_coal.max_packets;
+ if (rx_ctrl_dim_on && (ec->rx_coalesce_usecs != max_usecs ||
+ ec->rx_max_coalesced_frames != max_packets)) {
+ spin_unlock(&vi->ctrl_lock);
+ return -EINVAL;
+ }
+
+ if (rx_ctrl_dim_on && !cur_rx_dim) {
+ vi->rq[queue].dim_enabled = true;
+ spin_unlock(&vi->ctrl_lock);
+ return 0;
+ }
+
+ if (!rx_ctrl_dim_on && cur_rx_dim)
+ vi->rq[queue].dim_enabled = false;
+
+ /* If no params are updated, userspace ethtool will
+ * reject the modification.
+ */
err = virtnet_send_rx_ctrl_coal_vq_cmd(vi, queue,
ec->rx_coalesce_usecs,
ec->rx_max_coalesced_frames);
@@ -3438,19 +3531,56 @@ static int virtnet_send_notf_coal_vq_cmds(struct virtnet_info *vi,
return err;
}
- err = virtnet_send_tx_ctrl_coal_vq_cmd(vi, queue,
- ec->tx_coalesce_usecs,
- ec->tx_max_coalesced_frames);
- if (err) {
- spin_unlock(&vi->ctrl_lock);
+ spin_unlock(&vi->ctrl_lock);
+
+ return 0;
+}
+
+static int virtnet_send_notf_coal_vq_cmds(struct virtnet_info *vi,
+ struct ethtool_coalesce *ec,
+ u16 queue)
+{
+ int err;
+
+ err = virtnet_send_rx_notf_coal_vq_cmds(vi, ec, queue);
+ if (err)
return err;
- }
- spin_unlock(&vi->ctrl_lock);
+ err = virtnet_send_tx_notf_coal_vq_cmds(vi, queue,
+ ec->tx_coalesce_usecs,
+ ec->tx_max_coalesced_frames);
+ if (err)
+ return err;
return 0;
}
+static void virtnet_rx_dim_work(struct work_struct *work)
+{
+ struct dim *dim = container_of(work, struct dim, work);
+ struct receive_queue *rq = container_of(dim,
+ struct receive_queue, dim);
+ struct virtnet_info *vi = rq->vq->vdev->priv;
+ struct net_device *dev = vi->dev;
+ struct dim_cq_moder update_moder;
+ int qnum = rq - vi->rq, err;
+
+ update_moder = net_dim_get_rx_moderation(dim->mode, dim->profile_ix);
+ spin_lock(&vi->ctrl_lock);
+ if (update_moder.usec != vi->rq[qnum].intr_coal.max_usecs ||
+ update_moder.pkts != vi->rq[qnum].intr_coal.max_packets) {
+ err = virtnet_send_rx_ctrl_coal_vq_cmd(vi, qnum,
+ update_moder.usec,
+ update_moder.pkts);
+ if (err)
+ pr_debug("%s: Failed to send dim parameters on rxq%d\n",
+ dev->name, (int)(rq - vi->rq));
+ }
+
+ spin_unlock(&vi->ctrl_lock);
+ dim->state = DIM_START_MEASURE;
+}
+
static int virtnet_coal_params_supported(struct ethtool_coalesce *ec)
{
/* usecs coalescing is supported only if VIRTIO_NET_F_NOTF_COAL
@@ -3532,6 +3662,7 @@ static int virtnet_get_coalesce(struct net_device *dev,
ec->tx_coalesce_usecs = vi->intr_coal_tx.max_usecs;
ec->tx_max_coalesced_frames = vi->intr_coal_tx.max_packets;
ec->rx_max_coalesced_frames = vi->intr_coal_rx.max_packets;
+ ec->use_adaptive_rx_coalesce = vi->rx_dim_enabled;
} else {
ec->rx_max_coalesced_frames = 1;
@@ -3589,6 +3720,7 @@ static int virtnet_get_per_queue_coalesce(struct net_device *dev,
ec->tx_coalesce_usecs = vi->sq[queue].intr_coal.max_usecs;
ec->tx_max_coalesced_frames = vi->sq[queue].intr_coal.max_packets;
ec->rx_max_coalesced_frames = vi->rq[queue].intr_coal.max_packets;
+ ec->use_adaptive_rx_coalesce = vi->rq[queue].dim_enabled;
} else {
ec->rx_max_coalesced_frames = 1;
@@ -3714,7 +3846,7 @@ static int virtnet_set_rxnfc(struct net_device *dev, struct ethtool_rxnfc *info)
static const struct ethtool_ops virtnet_ethtool_ops = {
.supported_coalesce_params = ETHTOOL_COALESCE_MAX_FRAMES |
- ETHTOOL_COALESCE_USECS,
+ ETHTOOL_COALESCE_USECS | ETHTOOL_COALESCE_USE_ADAPTIVE_RX,
.get_drvinfo = virtnet_get_drvinfo,
.get_link = ethtool_op_get_link,
.get_ringparam = virtnet_get_ringparam,
@@ -4308,6 +4440,9 @@ static int virtnet_alloc_queues(struct virtnet_info *vi)
virtnet_poll_tx,
napi_tx ? napi_weight : 0);
+ INIT_WORK(&vi->rq[i].dim.work, virtnet_rx_dim_work);
+ vi->rq[i].dim.mode = DIM_CQ_PERIOD_MODE_START_FROM_EQE;
+
sg_init_table(vi->rq[i].sg, ARRAY_SIZE(vi->rq[i].sg));
ewma_pkt_len_init(&vi->rq[i].mrg_avg_pkt_len);
sg_init_table(vi->sq[i].sg, ARRAY_SIZE(vi->sq[i].sg));
--
2.19.1.6.gb485710b
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH net-next v6 4/5] virtio-net: add spin lock for ctrl cmd access
2023-12-05 8:02 ` [PATCH net-next v6 4/5] virtio-net: add spin lock for ctrl cmd access Heng Qi
@ 2023-12-05 8:35 ` Jason Wang
2023-12-05 11:05 ` Heng Qi
0 siblings, 1 reply; 14+ messages in thread
From: Jason Wang @ 2023-12-05 8:35 UTC (permalink / raw)
To: Heng Qi
Cc: netdev, virtualization, mst, pabeni, kuba, yinjun.zhang, edumazet,
davem, hawk, john.fastabend, ast, horms, xuanzhuo
On Tue, Dec 5, 2023 at 4:02 PM Heng Qi <hengqi@linux.alibaba.com> wrote:
>
> Currently access to ctrl cmd is globally protected via rtnl_lock and works
> fine. But if dim work's access to ctrl cmd also holds rtnl_lock, deadlock
> may occur due to cancel_work_sync for dim work.
Can you explain why?
> Therefore, treating
> ctrl cmd as a separate protection object of the lock is the solution and
> the basis for the next patch.
Let's don't do that. Reasons are:
1) virtnet_send_command() may wait for cvq commands for an indefinite time
2) hold locks may complicate the future hardening works around cvq
Thanks
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH net-next v6 4/5] virtio-net: add spin lock for ctrl cmd access
2023-12-05 8:35 ` Jason Wang
@ 2023-12-05 11:05 ` Heng Qi
2023-12-06 9:11 ` Jason Wang
2023-12-06 12:27 ` Paolo Abeni
0 siblings, 2 replies; 14+ messages in thread
From: Heng Qi @ 2023-12-05 11:05 UTC (permalink / raw)
To: Jason Wang
Cc: netdev, virtualization, mst, pabeni, kuba, yinjun.zhang, edumazet,
davem, hawk, john.fastabend, ast, horms, xuanzhuo
在 2023/12/5 下午4:35, Jason Wang 写道:
> On Tue, Dec 5, 2023 at 4:02 PM Heng Qi <hengqi@linux.alibaba.com> wrote:
>> Currently access to ctrl cmd is globally protected via rtnl_lock and works
>> fine. But if dim work's access to ctrl cmd also holds rtnl_lock, deadlock
>> may occur due to cancel_work_sync for dim work.
> Can you explain why?
For example, during the bus unbind operation, the following call stack
occurs:
virtnet_remove -> unregister_netdev -> rtnl_lock[1] -> virtnet_close ->
cancel_work_sync -> virtnet_rx_dim_work -> rtnl_lock[2] (deadlock occurs).
>> Therefore, treating
>> ctrl cmd as a separate protection object of the lock is the solution and
>> the basis for the next patch.
> Let's don't do that. Reasons are:
>
> 1) virtnet_send_command() may wait for cvq commands for an indefinite time
Yes, I took that into consideration. But ndo_set_rx_mode's need for an
atomic
environment rules out the mutex lock.
> 2) hold locks may complicate the future hardening works around cvq
Agree, but I don't seem to have thought of a better way besides passing
the lock.
Do you have any other better ideas or suggestions?
Thanks!
>
> Thanks
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH net-next v6 4/5] virtio-net: add spin lock for ctrl cmd access
2023-12-05 11:05 ` Heng Qi
@ 2023-12-06 9:11 ` Jason Wang
2023-12-06 12:27 ` Paolo Abeni
1 sibling, 0 replies; 14+ messages in thread
From: Jason Wang @ 2023-12-06 9:11 UTC (permalink / raw)
To: Heng Qi
Cc: netdev, virtualization, mst, pabeni, kuba, yinjun.zhang, edumazet,
davem, hawk, john.fastabend, ast, horms, xuanzhuo
On Tue, Dec 5, 2023 at 7:06 PM Heng Qi <hengqi@linux.alibaba.com> wrote:
>
>
>
> 在 2023/12/5 下午4:35, Jason Wang 写道:
> > On Tue, Dec 5, 2023 at 4:02 PM Heng Qi <hengqi@linux.alibaba.com> wrote:
> >> Currently access to ctrl cmd is globally protected via rtnl_lock and works
> >> fine. But if dim work's access to ctrl cmd also holds rtnl_lock, deadlock
> >> may occur due to cancel_work_sync for dim work.
> > Can you explain why?
>
> For example, during the bus unbind operation, the following call stack
> occurs:
> virtnet_remove -> unregister_netdev -> rtnl_lock[1] -> virtnet_close ->
> cancel_work_sync -> virtnet_rx_dim_work -> rtnl_lock[2] (deadlock occurs).
Can we use rtnl_trylock() and reschedule the work?
>
> >> Therefore, treating
> >> ctrl cmd as a separate protection object of the lock is the solution and
> >> the basis for the next patch.
> > Let's don't do that. Reasons are:
> >
> > 1) virtnet_send_command() may wait for cvq commands for an indefinite time
>
> Yes, I took that into consideration. But ndo_set_rx_mode's need for an
> atomic
> environment rules out the mutex lock.
It is a "bug" that we need to fix.
>
> > 2) hold locks may complicate the future hardening works around cvq
>
> Agree, but I don't seem to have thought of a better way besides passing
> the lock.
> Do you have any other better ideas or suggestions?
So far no, you can refer to the past discussions, it might require the
collaboration from the uAPI and stack.
Thanks
>
> Thanks!
>
> >
> > Thanks
>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH net-next v6 4/5] virtio-net: add spin lock for ctrl cmd access
2023-12-05 11:05 ` Heng Qi
2023-12-06 9:11 ` Jason Wang
@ 2023-12-06 12:27 ` Paolo Abeni
2023-12-06 13:03 ` Heng Qi
1 sibling, 1 reply; 14+ messages in thread
From: Paolo Abeni @ 2023-12-06 12:27 UTC (permalink / raw)
To: Heng Qi, Jason Wang
Cc: netdev, virtualization, mst, kuba, yinjun.zhang, edumazet, davem,
hawk, john.fastabend, ast, horms, xuanzhuo
On Tue, 2023-12-05 at 19:05 +0800, Heng Qi wrote:
>
> 在 2023/12/5 下午4:35, Jason Wang 写道:
> > On Tue, Dec 5, 2023 at 4:02 PM Heng Qi <hengqi@linux.alibaba.com> wrote:
> > > Currently access to ctrl cmd is globally protected via rtnl_lock and works
> > > fine. But if dim work's access to ctrl cmd also holds rtnl_lock, deadlock
> > > may occur due to cancel_work_sync for dim work.
> > Can you explain why?
>
> For example, during the bus unbind operation, the following call stack
> occurs:
> virtnet_remove -> unregister_netdev -> rtnl_lock[1] -> virtnet_close ->
> cancel_work_sync -> virtnet_rx_dim_work -> rtnl_lock[2] (deadlock occurs).
>
> > > Therefore, treating
> > > ctrl cmd as a separate protection object of the lock is the solution and
> > > the basis for the next patch.
> > Let's don't do that. Reasons are:
> >
> > 1) virtnet_send_command() may wait for cvq commands for an indefinite time
>
> Yes, I took that into consideration. But ndo_set_rx_mode's need for an
> atomic
> environment rules out the mutex lock.
>
> > 2) hold locks may complicate the future hardening works around cvq
>
> Agree, but I don't seem to have thought of a better way besides passing
> the lock.
> Do you have any other better ideas or suggestions?
What about:
- using the rtnl lock only
- virtionet_close() invokes cancel_work(), without flushing the work
- virtnet_remove() calls flush_work() after unregister_netdev(),
outside the rtnl lock
Should prevent both the deadlock and the UaF.
Side note: for this specific case any functional test with a
CONFIG_LOCKDEP enabled build should suffice to catch the deadlock
scenario above.
Cheers,
Paolo
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH net-next v6 4/5] virtio-net: add spin lock for ctrl cmd access
2023-12-06 12:27 ` Paolo Abeni
@ 2023-12-06 13:03 ` Heng Qi
2023-12-07 4:19 ` Jason Wang
0 siblings, 1 reply; 14+ messages in thread
From: Heng Qi @ 2023-12-06 13:03 UTC (permalink / raw)
To: Paolo Abeni, Jason Wang
Cc: netdev, virtualization, mst, kuba, yinjun.zhang, edumazet, davem,
hawk, john.fastabend, ast, horms, xuanzhuo
在 2023/12/6 下午8:27, Paolo Abeni 写道:
> On Tue, 2023-12-05 at 19:05 +0800, Heng Qi wrote:
>> 在 2023/12/5 下午4:35, Jason Wang 写道:
>>> On Tue, Dec 5, 2023 at 4:02 PM Heng Qi <hengqi@linux.alibaba.com> wrote:
>>>> Currently access to ctrl cmd is globally protected via rtnl_lock and works
>>>> fine. But if dim work's access to ctrl cmd also holds rtnl_lock, deadlock
>>>> may occur due to cancel_work_sync for dim work.
>>> Can you explain why?
>> For example, during the bus unbind operation, the following call stack
>> occurs:
>> virtnet_remove -> unregister_netdev -> rtnl_lock[1] -> virtnet_close ->
>> cancel_work_sync -> virtnet_rx_dim_work -> rtnl_lock[2] (deadlock occurs).
>>
>>>> Therefore, treating
>>>> ctrl cmd as a separate protection object of the lock is the solution and
>>>> the basis for the next patch.
>>> Let's don't do that. Reasons are:
>>>
>>> 1) virtnet_send_command() may wait for cvq commands for an indefinite time
>> Yes, I took that into consideration. But ndo_set_rx_mode's need for an
>> atomic
>> environment rules out the mutex lock.
>>
>>> 2) hold locks may complicate the future hardening works around cvq
>> Agree, but I don't seem to have thought of a better way besides passing
>> the lock.
>> Do you have any other better ideas or suggestions?
> What about:
>
> - using the rtnl lock only
> - virtionet_close() invokes cancel_work(), without flushing the work
> - virtnet_remove() calls flush_work() after unregister_netdev(),
> outside the rtnl lock
>
> Should prevent both the deadlock and the UaF.
Hi, Paolo and Jason!
Thank you very much for your effective suggestions, but I found another
solution[1],
based on the ideas of rtnl_trylock and refill_work, which works very well:
[1]
+static void virtnet_rx_dim_work(struct work_struct *work)
+{
+ struct dim *dim = container_of(work, struct dim, work);
+ struct receive_queue *rq = container_of(dim,
+ struct receive_queue, dim);
+ struct virtnet_info *vi = rq->vq->vdev->priv;
+ struct net_device *dev = vi->dev;
+ struct dim_cq_moder update_moder;
+ int i, qnum, err;
+
+ if (!rtnl_trylock())
+ return;
+
+ for (i = 0; i < vi->curr_queue_pairs; i++) {
+ rq = &vi->rq[i];
+ dim = &rq->dim;
+ qnum = rq - vi->rq;
+
+ if (!rq->dim_enabled)
+ continue;
+
+ update_moder = net_dim_get_rx_moderation(dim->mode,
dim->profile_ix);
+ if (update_moder.usec != rq->intr_coal.max_usecs ||
+ update_moder.pkts != rq->intr_coal.max_packets) {
+ err = virtnet_send_rx_ctrl_coal_vq_cmd(vi, qnum,
+ update_moder.usec,
+ update_moder.pkts);
+ if (err)
+ pr_debug("%s: Failed to send dim parameters on rxq%d\n",
+ dev->name, qnum);
+ dim->state = DIM_START_MEASURE;
+ }
+ }
+
+ rtnl_unlock();
+}
In addition, other optimizations[2] have been tried, but it may be due
to the sparsely
scheduled work that the retry condition is always satisfied, affecting
performance,
so [1] is the final solution:
[2]
+static void virtnet_rx_dim_work(struct work_struct *work)
+{
+ struct dim *dim = container_of(work, struct dim, work);
+ struct receive_queue *rq = container_of(dim,
+ struct receive_queue, dim);
+ struct virtnet_info *vi = rq->vq->vdev->priv;
+ struct net_device *dev = vi->dev;
+ struct dim_cq_moder update_moder;
+ int i, qnum, err, count;
+
+ if (!rtnl_trylock())
+ return;
+retry:
+ count = vi->curr_queue_pairs;
+ for (i = 0; i < vi->curr_queue_pairs; i++) {
+ rq = &vi->rq[i];
+ dim = &rq->dim;
+ qnum = rq - vi->rq;
+ update_moder = net_dim_get_rx_moderation(dim->mode,
dim->profile_ix);
+ if (update_moder.usec != rq->intr_coal.max_usecs ||
+ update_moder.pkts != rq->intr_coal.max_packets) {
+ --count;
+ err = virtnet_send_rx_ctrl_coal_vq_cmd(vi, qnum,
+ update_moder.usec,
+ update_moder.pkts);
+ if (err)
+ pr_debug("%s: Failed to send dim parameters on rxq%d\n",
+ dev->name, qnum);
+ dim->state = DIM_START_MEASURE;
+ }
+ }
+
+ if (need_resched()) {
+ rtnl_unlock();
+ schedule();
+ }
+
+ if (count)
+ goto retry;
+
+ rtnl_unlock();
+}
Thanks a lot!
>
> Side note: for this specific case any functional test with a
> CONFIG_LOCKDEP enabled build should suffice to catch the deadlock
> scenario above.
>
> Cheers,
>
> Paolo
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH net-next v6 4/5] virtio-net: add spin lock for ctrl cmd access
2023-12-06 13:03 ` Heng Qi
@ 2023-12-07 4:19 ` Jason Wang
2023-12-07 4:47 ` Heng Qi
0 siblings, 1 reply; 14+ messages in thread
From: Jason Wang @ 2023-12-07 4:19 UTC (permalink / raw)
To: Heng Qi
Cc: Paolo Abeni, netdev, virtualization, mst, kuba, yinjun.zhang,
edumazet, davem, hawk, john.fastabend, ast, horms, xuanzhuo
On Wed, Dec 6, 2023 at 9:03 PM Heng Qi <hengqi@linux.alibaba.com> wrote:
>
>
>
> 在 2023/12/6 下午8:27, Paolo Abeni 写道:
> > On Tue, 2023-12-05 at 19:05 +0800, Heng Qi wrote:
> >> 在 2023/12/5 下午4:35, Jason Wang 写道:
> >>> On Tue, Dec 5, 2023 at 4:02 PM Heng Qi <hengqi@linux.alibaba.com> wrote:
> >>>> Currently access to ctrl cmd is globally protected via rtnl_lock and works
> >>>> fine. But if dim work's access to ctrl cmd also holds rtnl_lock, deadlock
> >>>> may occur due to cancel_work_sync for dim work.
> >>> Can you explain why?
> >> For example, during the bus unbind operation, the following call stack
> >> occurs:
> >> virtnet_remove -> unregister_netdev -> rtnl_lock[1] -> virtnet_close ->
> >> cancel_work_sync -> virtnet_rx_dim_work -> rtnl_lock[2] (deadlock occurs).
> >>
> >>>> Therefore, treating
> >>>> ctrl cmd as a separate protection object of the lock is the solution and
> >>>> the basis for the next patch.
> >>> Let's don't do that. Reasons are:
> >>>
> >>> 1) virtnet_send_command() may wait for cvq commands for an indefinite time
> >> Yes, I took that into consideration. But ndo_set_rx_mode's need for an
> >> atomic
> >> environment rules out the mutex lock.
> >>
> >>> 2) hold locks may complicate the future hardening works around cvq
> >> Agree, but I don't seem to have thought of a better way besides passing
> >> the lock.
> >> Do you have any other better ideas or suggestions?
> > What about:
> >
> > - using the rtnl lock only
> > - virtionet_close() invokes cancel_work(), without flushing the work
> > - virtnet_remove() calls flush_work() after unregister_netdev(),
> > outside the rtnl lock
> >
> > Should prevent both the deadlock and the UaF.
>
>
> Hi, Paolo and Jason!
>
> Thank you very much for your effective suggestions, but I found another
> solution[1],
> based on the ideas of rtnl_trylock and refill_work, which works very well:
>
> [1]
> +static void virtnet_rx_dim_work(struct work_struct *work)
> +{
> + struct dim *dim = container_of(work, struct dim, work);
> + struct receive_queue *rq = container_of(dim,
> + struct receive_queue, dim);
> + struct virtnet_info *vi = rq->vq->vdev->priv;
> + struct net_device *dev = vi->dev;
> + struct dim_cq_moder update_moder;
> + int i, qnum, err;
> +
> + if (!rtnl_trylock())
> + return;
Don't we need to reschedule here?
like
if (rq->dim_enabled)
sechedule_work()
?
Thanks
> +
> + for (i = 0; i < vi->curr_queue_pairs; i++) {
> + rq = &vi->rq[i];
> + dim = &rq->dim;
> + qnum = rq - vi->rq;
> +
> + if (!rq->dim_enabled)
> + continue;
> +
> + update_moder = net_dim_get_rx_moderation(dim->mode,
> dim->profile_ix);
> + if (update_moder.usec != rq->intr_coal.max_usecs ||
> + update_moder.pkts != rq->intr_coal.max_packets) {
> + err = virtnet_send_rx_ctrl_coal_vq_cmd(vi, qnum,
> + update_moder.usec,
> + update_moder.pkts);
> + if (err)
> + pr_debug("%s: Failed to send dim parameters on rxq%d\n",
> + dev->name, qnum);
> + dim->state = DIM_START_MEASURE;
> + }
> + }
> +
> + rtnl_unlock();
> +}
>
>
> In addition, other optimizations[2] have been tried, but it may be due
> to the sparsely
> scheduled work that the retry condition is always satisfied, affecting
> performance,
> so [1] is the final solution:
>
> [2]
>
> +static void virtnet_rx_dim_work(struct work_struct *work)
> +{
> + struct dim *dim = container_of(work, struct dim, work);
> + struct receive_queue *rq = container_of(dim,
> + struct receive_queue, dim);
> + struct virtnet_info *vi = rq->vq->vdev->priv;
> + struct net_device *dev = vi->dev;
> + struct dim_cq_moder update_moder;
> + int i, qnum, err, count;
> +
> + if (!rtnl_trylock())
> + return;
> +retry:
> + count = vi->curr_queue_pairs;
> + for (i = 0; i < vi->curr_queue_pairs; i++) {
> + rq = &vi->rq[i];
> + dim = &rq->dim;
> + qnum = rq - vi->rq;
> + update_moder = net_dim_get_rx_moderation(dim->mode,
> dim->profile_ix);
> + if (update_moder.usec != rq->intr_coal.max_usecs ||
> + update_moder.pkts != rq->intr_coal.max_packets) {
> + --count;
> + err = virtnet_send_rx_ctrl_coal_vq_cmd(vi, qnum,
> + update_moder.usec,
> + update_moder.pkts);
> + if (err)
> + pr_debug("%s: Failed to send dim parameters on rxq%d\n",
> + dev->name, qnum);
> + dim->state = DIM_START_MEASURE;
> + }
> + }
> +
> + if (need_resched()) {
> + rtnl_unlock();
> + schedule();
> + }
> +
> + if (count)
> + goto retry;
> +
> + rtnl_unlock();
> +}
>
> Thanks a lot!
>
> >
> > Side note: for this specific case any functional test with a
> > CONFIG_LOCKDEP enabled build should suffice to catch the deadlock
> > scenario above.
> >
> > Cheers,
> >
> > Paolo
>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH net-next v6 4/5] virtio-net: add spin lock for ctrl cmd access
2023-12-07 4:19 ` Jason Wang
@ 2023-12-07 4:47 ` Heng Qi
2023-12-07 5:31 ` Jason Wang
0 siblings, 1 reply; 14+ messages in thread
From: Heng Qi @ 2023-12-07 4:47 UTC (permalink / raw)
To: Jason Wang
Cc: Paolo Abeni, netdev, virtualization, mst, kuba, yinjun.zhang,
edumazet, davem, hawk, john.fastabend, ast, horms, xuanzhuo
在 2023/12/7 下午12:19, Jason Wang 写道:
> On Wed, Dec 6, 2023 at 9:03 PM Heng Qi <hengqi@linux.alibaba.com> wrote:
>>
>>
>> 在 2023/12/6 下午8:27, Paolo Abeni 写道:
>>> On Tue, 2023-12-05 at 19:05 +0800, Heng Qi wrote:
>>>> 在 2023/12/5 下午4:35, Jason Wang 写道:
>>>>> On Tue, Dec 5, 2023 at 4:02 PM Heng Qi <hengqi@linux.alibaba.com> wrote:
>>>>>> Currently access to ctrl cmd is globally protected via rtnl_lock and works
>>>>>> fine. But if dim work's access to ctrl cmd also holds rtnl_lock, deadlock
>>>>>> may occur due to cancel_work_sync for dim work.
>>>>> Can you explain why?
>>>> For example, during the bus unbind operation, the following call stack
>>>> occurs:
>>>> virtnet_remove -> unregister_netdev -> rtnl_lock[1] -> virtnet_close ->
>>>> cancel_work_sync -> virtnet_rx_dim_work -> rtnl_lock[2] (deadlock occurs).
>>>>
>>>>>> Therefore, treating
>>>>>> ctrl cmd as a separate protection object of the lock is the solution and
>>>>>> the basis for the next patch.
>>>>> Let's don't do that. Reasons are:
>>>>>
>>>>> 1) virtnet_send_command() may wait for cvq commands for an indefinite time
>>>> Yes, I took that into consideration. But ndo_set_rx_mode's need for an
>>>> atomic
>>>> environment rules out the mutex lock.
>>>>
>>>>> 2) hold locks may complicate the future hardening works around cvq
>>>> Agree, but I don't seem to have thought of a better way besides passing
>>>> the lock.
>>>> Do you have any other better ideas or suggestions?
>>> What about:
>>>
>>> - using the rtnl lock only
>>> - virtionet_close() invokes cancel_work(), without flushing the work
>>> - virtnet_remove() calls flush_work() after unregister_netdev(),
>>> outside the rtnl lock
>>>
>>> Should prevent both the deadlock and the UaF.
>>
>> Hi, Paolo and Jason!
>>
>> Thank you very much for your effective suggestions, but I found another
>> solution[1],
>> based on the ideas of rtnl_trylock and refill_work, which works very well:
>>
>> [1]
>> +static void virtnet_rx_dim_work(struct work_struct *work)
>> +{
>> + struct dim *dim = container_of(work, struct dim, work);
>> + struct receive_queue *rq = container_of(dim,
>> + struct receive_queue, dim);
>> + struct virtnet_info *vi = rq->vq->vdev->priv;
>> + struct net_device *dev = vi->dev;
>> + struct dim_cq_moder update_moder;
>> + int i, qnum, err;
>> +
>> + if (!rtnl_trylock())
>> + return;
> Don't we need to reschedule here?
>
> like
>
> if (rq->dim_enabled)
> sechedule_work()
>
> ?
I think no, we don't need this.
The work of each queue will be called by "net_dim()->schedule_work()"
when napi traffic changes (before schedule_work(), the dim->profile_ix
of the corresponding rxq has been updated).
So we only need to traverse and update the profiles of all rxqs in the
work which is obtaining the rtnl_lock.
Thanks!
>
> Thanks
>
>> +
>> + for (i = 0; i < vi->curr_queue_pairs; i++) {
>> + rq = &vi->rq[i];
>> + dim = &rq->dim;
>> + qnum = rq - vi->rq;
>> +
>> + if (!rq->dim_enabled)
>> + continue;
>> +
>> + update_moder = net_dim_get_rx_moderation(dim->mode,
>> dim->profile_ix);
>> + if (update_moder.usec != rq->intr_coal.max_usecs ||
>> + update_moder.pkts != rq->intr_coal.max_packets) {
>> + err = virtnet_send_rx_ctrl_coal_vq_cmd(vi, qnum,
>> + update_moder.usec,
>> + update_moder.pkts);
>> + if (err)
>> + pr_debug("%s: Failed to send dim parameters on rxq%d\n",
>> + dev->name, qnum);
>> + dim->state = DIM_START_MEASURE;
>> + }
>> + }
>> +
>> + rtnl_unlock();
>> +}
>>
>>
>> In addition, other optimizations[2] have been tried, but it may be due
>> to the sparsely
>> scheduled work that the retry condition is always satisfied, affecting
>> performance,
>> so [1] is the final solution:
>>
>> [2]
>>
>> +static void virtnet_rx_dim_work(struct work_struct *work)
>> +{
>> + struct dim *dim = container_of(work, struct dim, work);
>> + struct receive_queue *rq = container_of(dim,
>> + struct receive_queue, dim);
>> + struct virtnet_info *vi = rq->vq->vdev->priv;
>> + struct net_device *dev = vi->dev;
>> + struct dim_cq_moder update_moder;
>> + int i, qnum, err, count;
>> +
>> + if (!rtnl_trylock())
>> + return;
>> +retry:
>> + count = vi->curr_queue_pairs;
>> + for (i = 0; i < vi->curr_queue_pairs; i++) {
>> + rq = &vi->rq[i];
>> + dim = &rq->dim;
>> + qnum = rq - vi->rq;
>> + update_moder = net_dim_get_rx_moderation(dim->mode,
>> dim->profile_ix);
>> + if (update_moder.usec != rq->intr_coal.max_usecs ||
>> + update_moder.pkts != rq->intr_coal.max_packets) {
>> + --count;
>> + err = virtnet_send_rx_ctrl_coal_vq_cmd(vi, qnum,
>> + update_moder.usec,
>> + update_moder.pkts);
>> + if (err)
>> + pr_debug("%s: Failed to send dim parameters on rxq%d\n",
>> + dev->name, qnum);
>> + dim->state = DIM_START_MEASURE;
>> + }
>> + }
>> +
>> + if (need_resched()) {
>> + rtnl_unlock();
>> + schedule();
>> + }
>> +
>> + if (count)
>> + goto retry;
>> +
>> + rtnl_unlock();
>> +}
>>
>> Thanks a lot!
>>
>>> Side note: for this specific case any functional test with a
>>> CONFIG_LOCKDEP enabled build should suffice to catch the deadlock
>>> scenario above.
>>>
>>> Cheers,
>>>
>>> Paolo
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH net-next v6 4/5] virtio-net: add spin lock for ctrl cmd access
2023-12-07 4:47 ` Heng Qi
@ 2023-12-07 5:31 ` Jason Wang
0 siblings, 0 replies; 14+ messages in thread
From: Jason Wang @ 2023-12-07 5:31 UTC (permalink / raw)
To: Heng Qi
Cc: Paolo Abeni, netdev, virtualization, mst, kuba, yinjun.zhang,
edumazet, davem, hawk, john.fastabend, ast, horms, xuanzhuo
On Thu, Dec 7, 2023 at 12:47 PM Heng Qi <hengqi@linux.alibaba.com> wrote:
>
>
>
> 在 2023/12/7 下午12:19, Jason Wang 写道:
> > On Wed, Dec 6, 2023 at 9:03 PM Heng Qi <hengqi@linux.alibaba.com> wrote:
> >>
> >>
> >> 在 2023/12/6 下午8:27, Paolo Abeni 写道:
> >>> On Tue, 2023-12-05 at 19:05 +0800, Heng Qi wrote:
> >>>> 在 2023/12/5 下午4:35, Jason Wang 写道:
> >>>>> On Tue, Dec 5, 2023 at 4:02 PM Heng Qi <hengqi@linux.alibaba.com> wrote:
> >>>>>> Currently access to ctrl cmd is globally protected via rtnl_lock and works
> >>>>>> fine. But if dim work's access to ctrl cmd also holds rtnl_lock, deadlock
> >>>>>> may occur due to cancel_work_sync for dim work.
> >>>>> Can you explain why?
> >>>> For example, during the bus unbind operation, the following call stack
> >>>> occurs:
> >>>> virtnet_remove -> unregister_netdev -> rtnl_lock[1] -> virtnet_close ->
> >>>> cancel_work_sync -> virtnet_rx_dim_work -> rtnl_lock[2] (deadlock occurs).
> >>>>
> >>>>>> Therefore, treating
> >>>>>> ctrl cmd as a separate protection object of the lock is the solution and
> >>>>>> the basis for the next patch.
> >>>>> Let's don't do that. Reasons are:
> >>>>>
> >>>>> 1) virtnet_send_command() may wait for cvq commands for an indefinite time
> >>>> Yes, I took that into consideration. But ndo_set_rx_mode's need for an
> >>>> atomic
> >>>> environment rules out the mutex lock.
> >>>>
> >>>>> 2) hold locks may complicate the future hardening works around cvq
> >>>> Agree, but I don't seem to have thought of a better way besides passing
> >>>> the lock.
> >>>> Do you have any other better ideas or suggestions?
> >>> What about:
> >>>
> >>> - using the rtnl lock only
> >>> - virtionet_close() invokes cancel_work(), without flushing the work
> >>> - virtnet_remove() calls flush_work() after unregister_netdev(),
> >>> outside the rtnl lock
> >>>
> >>> Should prevent both the deadlock and the UaF.
> >>
> >> Hi, Paolo and Jason!
> >>
> >> Thank you very much for your effective suggestions, but I found another
> >> solution[1],
> >> based on the ideas of rtnl_trylock and refill_work, which works very well:
> >>
> >> [1]
> >> +static void virtnet_rx_dim_work(struct work_struct *work)
> >> +{
> >> + struct dim *dim = container_of(work, struct dim, work);
> >> + struct receive_queue *rq = container_of(dim,
> >> + struct receive_queue, dim);
> >> + struct virtnet_info *vi = rq->vq->vdev->priv;
> >> + struct net_device *dev = vi->dev;
> >> + struct dim_cq_moder update_moder;
> >> + int i, qnum, err;
> >> +
> >> + if (!rtnl_trylock())
> >> + return;
> > Don't we need to reschedule here?
> >
> > like
> >
> > if (rq->dim_enabled)
> > sechedule_work()
> >
> > ?
>
> I think no, we don't need this.
>
> The work of each queue will be called by "net_dim()->schedule_work()"
> when napi traffic changes (before schedule_work(), the dim->profile_ix
> of the corresponding rxq has been updated).
> So we only need to traverse and update the profiles of all rxqs in the
> work which is obtaining the rtnl_lock.
Ok, let's have a comment here then.
Thanks
>
> Thanks!
>
> >
> > Thanks
> >
> >> +
> >> + for (i = 0; i < vi->curr_queue_pairs; i++) {
> >> + rq = &vi->rq[i];
> >> + dim = &rq->dim;
> >> + qnum = rq - vi->rq;
> >> +
> >> + if (!rq->dim_enabled)
> >> + continue;
> >> +
> >> + update_moder = net_dim_get_rx_moderation(dim->mode,
> >> dim->profile_ix);
> >> + if (update_moder.usec != rq->intr_coal.max_usecs ||
> >> + update_moder.pkts != rq->intr_coal.max_packets) {
> >> + err = virtnet_send_rx_ctrl_coal_vq_cmd(vi, qnum,
> >> + update_moder.usec,
> >> + update_moder.pkts);
> >> + if (err)
> >> + pr_debug("%s: Failed to send dim parameters on rxq%d\n",
> >> + dev->name, qnum);
> >> + dim->state = DIM_START_MEASURE;
> >> + }
> >> + }
> >> +
> >> + rtnl_unlock();
> >> +}
> >>
> >>
> >> In addition, other optimizations[2] have been tried, but it may be due
> >> to the sparsely
> >> scheduled work that the retry condition is always satisfied, affecting
> >> performance,
> >> so [1] is the final solution:
> >>
> >> [2]
> >>
> >> +static void virtnet_rx_dim_work(struct work_struct *work)
> >> +{
> >> + struct dim *dim = container_of(work, struct dim, work);
> >> + struct receive_queue *rq = container_of(dim,
> >> + struct receive_queue, dim);
> >> + struct virtnet_info *vi = rq->vq->vdev->priv;
> >> + struct net_device *dev = vi->dev;
> >> + struct dim_cq_moder update_moder;
> >> + int i, qnum, err, count;
> >> +
> >> + if (!rtnl_trylock())
> >> + return;
> >> +retry:
> >> + count = vi->curr_queue_pairs;
> >> + for (i = 0; i < vi->curr_queue_pairs; i++) {
> >> + rq = &vi->rq[i];
> >> + dim = &rq->dim;
> >> + qnum = rq - vi->rq;
> >> + update_moder = net_dim_get_rx_moderation(dim->mode,
> >> dim->profile_ix);
> >> + if (update_moder.usec != rq->intr_coal.max_usecs ||
> >> + update_moder.pkts != rq->intr_coal.max_packets) {
> >> + --count;
> >> + err = virtnet_send_rx_ctrl_coal_vq_cmd(vi, qnum,
> >> + update_moder.usec,
> >> + update_moder.pkts);
> >> + if (err)
> >> + pr_debug("%s: Failed to send dim parameters on rxq%d\n",
> >> + dev->name, qnum);
> >> + dim->state = DIM_START_MEASURE;
> >> + }
> >> + }
> >> +
> >> + if (need_resched()) {
> >> + rtnl_unlock();
> >> + schedule();
> >> + }
> >> +
> >> + if (count)
> >> + goto retry;
> >> +
> >> + rtnl_unlock();
> >> +}
> >>
> >> Thanks a lot!
> >>
> >>> Side note: for this specific case any functional test with a
> >>> CONFIG_LOCKDEP enabled build should suffice to catch the deadlock
> >>> scenario above.
> >>>
> >>> Cheers,
> >>>
> >>> Paolo
>
^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2023-12-07 5:32 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-12-05 8:02 [PATCH net-next v6 0/5] virtio-net: support dynamic coalescing moderation Heng Qi
2023-12-05 8:02 ` [PATCH net-next v6 1/5] virtio-net: returns whether napi is complete Heng Qi
2023-12-05 8:02 ` [PATCH net-next v6 2/5] virtio-net: separate rx/tx coalescing moderation cmds Heng Qi
2023-12-05 8:02 ` [PATCH net-next v6 3/5] virtio-net: extract virtqueue coalescig cmd for reuse Heng Qi
2023-12-05 8:02 ` [PATCH net-next v6 4/5] virtio-net: add spin lock for ctrl cmd access Heng Qi
2023-12-05 8:35 ` Jason Wang
2023-12-05 11:05 ` Heng Qi
2023-12-06 9:11 ` Jason Wang
2023-12-06 12:27 ` Paolo Abeni
2023-12-06 13:03 ` Heng Qi
2023-12-07 4:19 ` Jason Wang
2023-12-07 4:47 ` Heng Qi
2023-12-07 5:31 ` Jason Wang
2023-12-05 8:02 ` [PATCH net-next v6 5/5] virtio-net: support rx netdim Heng Qi
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.