* [PATCH net-next V2 0/2] veth: qdisc backpressure and qdisc check refactor
@ 2025-04-08 15:31 Jesper Dangaard Brouer
  2025-04-08 15:31 ` [PATCH net-next V2 1/2] veth: apply qdisc backpressure on full ptr_ring to reduce TX drops Jesper Dangaard Brouer
  ` (2 more replies)
0 siblings, 3 replies; 11+ messages in thread

From: Jesper Dangaard Brouer @ 2025-04-08 15:31 UTC
To: netdev, Jakub Kicinski
Cc: Jesper Dangaard Brouer, bpf, tom, Eric Dumazet, David S. Miller,
	Paolo Abeni, Toke Høiland-Jørgensen, dsahern, makita.toshiaki,
	kernel-team

This series addresses TX drops observed in production when using veth
devices with threaded NAPI, and refactors a common qdisc check into a
shared helper.

In threaded NAPI mode, packet drops can occur when the ptr_ring backing
the veth peer fills up. This is typically due to a combination of
scheduling delays and the consumer (NAPI thread) being slower than the
producer. When the ring overflows, packets are dropped in veth_xmit().

Patch 1 introduces a backpressure mechanism: when the ring is full, the
driver returns NETDEV_TX_BUSY, signaling the qdisc layer to requeue the
packet. This allows Active Queue Management (AQM) - such as fq or sfq -
to spread traffic more fairly across flows and reduce damage from
elephant flows.

To minimize invasiveness, this backpressure behavior is only enabled
when a qdisc is attached. If no qdisc is present, the driver retains its
original behavior (dropping packets on a full ring), avoiding behavior
changes for configurations without a qdisc.

Detecting the presence of a "real" qdisc relies on a check that is
already duplicated across multiple drivers (e.g., veth, vrf). Patch 2
consolidates this logic into a new helper, qdisc_txq_is_noop(), to avoid
duplication and clarify intent.
---

Jesper Dangaard Brouer (2):
      veth: apply qdisc backpressure on full ptr_ring to reduce TX drops
      net: sched: generalize check for no-op qdisc

 drivers/net/veth.c        | 49 ++++++++++++++++++++++++++++++++-------
 drivers/net/vrf.c         |  3 +--
 include/net/sch_generic.h |  7 +++++-
 3 files changed, 48 insertions(+), 11 deletions(-)
* [PATCH net-next V2 1/2] veth: apply qdisc backpressure on full ptr_ring to reduce TX drops 2025-04-08 15:31 [PATCH net-next V2 0/2] veth: qdisc backpressure and qdisc check refactor Jesper Dangaard Brouer @ 2025-04-08 15:31 ` Jesper Dangaard Brouer 2025-04-11 12:45 ` Simon Horman 2025-04-08 15:31 ` [PATCH net-next V2 2/2] net: sched: generalize check for no-op qdisc Jesper Dangaard Brouer 2025-04-11 12:59 ` [PATCH net-next V2 0/2] veth: qdisc backpressure and qdisc check refactor Jakub Sitnicki 2 siblings, 1 reply; 11+ messages in thread From: Jesper Dangaard Brouer @ 2025-04-08 15:31 UTC (permalink / raw) To: netdev, Jakub Kicinski Cc: Jesper Dangaard Brouer, bpf, tom, Eric Dumazet, David S. Miller, Paolo Abeni, Toke Høiland-Jørgensen, dsahern, makita.toshiaki, kernel-team In production, we're seeing TX drops on veth devices when the ptr_ring fills up. This can occur when NAPI mode is enabled, though it's relatively rare. However, with threaded NAPI - which we use in production - the drops become significantly more frequent. The underlying issue is that with threaded NAPI, the consumer often runs on a different CPU than the producer. This increases the likelihood of the ring filling up before the consumer gets scheduled, especially under load, leading to drops in veth_xmit() (ndo_start_xmit()). This patch introduces backpressure by returning NETDEV_TX_BUSY when the ring is full, signaling the qdisc layer to requeue the packet. The txq (netdev queue) is stopped in this condition and restarted once veth_poll() drains entries from the ring, ensuring coordination between NAPI and qdisc. Backpressure is only enabled when a qdisc is attached. Without a qdisc, the driver retains its original behavior - dropping packets immediately when the ring is full. This avoids unexpected behavior changes in setups without a configured qdisc. With a qdisc in place (e.g. 
fq, sfq) this allows Active Queue Management (AQM) to fairly schedule packets across flows and reduce collateral damage from elephant flows. A known limitation of this approach is that the full ring sits in front of the qdisc layer, effectively forming a FIFO buffer that introduces base latency. While AQM still improves fairness and mitigates flow dominance, the latency impact is measurable. In hardware drivers, this issue is typically addressed using BQL (Byte Queue Limits), which tracks in-flight bytes needed based on physical link rate. However, for virtual drivers like veth, there is no fixed bandwidth constraint - the bottleneck is CPU availability and the scheduler's ability to run the NAPI thread. It is unclear how effective BQL would be in this context. This patch serves as a first step toward addressing TX drops. Future work may explore adapting a BQL-like mechanism to better suit virtual devices like veth. Reported-by: Yan Zhai <yan@cloudflare.com> Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org> --- drivers/net/veth.c | 61 +++++++++++++++++++++++++++++++++++++++++++++------- 1 file changed, 53 insertions(+), 8 deletions(-) diff --git a/drivers/net/veth.c b/drivers/net/veth.c index 7bb53961c0ea..f29a0db2ba36 100644 --- a/drivers/net/veth.c +++ b/drivers/net/veth.c @@ -308,11 +308,10 @@ static void __veth_xdp_flush(struct veth_rq *rq) static int veth_xdp_rx(struct veth_rq *rq, struct sk_buff *skb) { if (unlikely(ptr_ring_produce(&rq->xdp_ring, skb))) { - dev_kfree_skb_any(skb); - return NET_RX_DROP; + return NETDEV_TX_BUSY; /* signal qdisc layer */ } - return NET_RX_SUCCESS; + return NET_RX_SUCCESS; /* same as NETDEV_TX_OK */ } static int veth_forward_skb(struct net_device *dev, struct sk_buff *skb, @@ -342,15 +341,27 @@ static bool veth_skb_is_eligible_for_gro(const struct net_device *dev, rcv->features & (NETIF_F_GRO_FRAGLIST | NETIF_F_GRO_UDP_FWD)); } +/* Does specific txq have a real qdisc attached? 
- see noqueue_init() */ +static inline bool txq_has_qdisc(struct netdev_queue *txq) +{ + struct Qdisc *q; + + q = rcu_dereference(txq->qdisc); + if (q->enqueue) + return true; + else + return false; +} + static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev) { struct veth_priv *rcv_priv, *priv = netdev_priv(dev); struct veth_rq *rq = NULL; - int ret = NETDEV_TX_OK; + struct netdev_queue *txq; struct net_device *rcv; int length = skb->len; bool use_napi = false; - int rxq; + int ret, rxq; rcu_read_lock(); rcv = rcu_dereference(priv->peer); @@ -373,17 +384,41 @@ static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev) } skb_tx_timestamp(skb); - if (likely(veth_forward_skb(rcv, skb, rq, use_napi) == NET_RX_SUCCESS)) { + + ret = veth_forward_skb(rcv, skb, rq, use_napi); + switch(ret) { + case NET_RX_SUCCESS: /* same as NETDEV_TX_OK */ if (!use_napi) dev_sw_netstats_tx_add(dev, 1, length); else __veth_xdp_flush(rq); - } else { + break; + case NETDEV_TX_BUSY: + /* If a qdisc is attached to our virtual device, returning + * NETDEV_TX_BUSY is allowed. 
+ */ + txq = netdev_get_tx_queue(dev, rxq); + + if (!txq_has_qdisc(txq)) { + dev_kfree_skb_any(skb); + goto drop; + } + netif_tx_stop_queue(txq); /* Unconditional netif_txq_try_stop */ + /* Restore Eth hdr pulled by dev_forward_skb/eth_type_trans */ + __skb_push(skb, ETH_HLEN); + if (use_napi) + __veth_xdp_flush(rq); + + break; + case NET_RX_DROP: /* same as NET_XMIT_DROP */ drop: atomic64_inc(&priv->dropped); ret = NET_XMIT_DROP; + break; + default: + net_crit_ratelimited("veth_xmit(%s): Invalid return code(%d)", + dev->name, ret); } - rcu_read_unlock(); return ret; @@ -874,9 +909,16 @@ static int veth_xdp_rcv(struct veth_rq *rq, int budget, struct veth_xdp_tx_bq *bq, struct veth_stats *stats) { + struct veth_priv *priv = netdev_priv(rq->dev); + int queue_idx = rq->xdp_rxq.queue_index; + struct netdev_queue *peer_txq; + struct net_device *peer_dev; int i, done = 0, n_xdpf = 0; void *xdpf[VETH_XDP_BATCH]; + peer_dev = priv->peer; + peer_txq = netdev_get_tx_queue(peer_dev, queue_idx); + for (i = 0; i < budget; i++) { void *ptr = __ptr_ring_consume(&rq->xdp_ring); @@ -925,6 +967,9 @@ static int veth_xdp_rcv(struct veth_rq *rq, int budget, rq->stats.vs.xdp_packets += done; u64_stats_update_end(&rq->stats.syncp); + if (unlikely(netif_tx_queue_stopped(peer_txq))) + netif_tx_wake_queue(peer_txq); + return done; } ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH net-next V2 1/2] veth: apply qdisc backpressure on full ptr_ring to reduce TX drops 2025-04-08 15:31 ` [PATCH net-next V2 1/2] veth: apply qdisc backpressure on full ptr_ring to reduce TX drops Jesper Dangaard Brouer @ 2025-04-11 12:45 ` Simon Horman 2025-04-11 13:56 ` Jesper Dangaard Brouer 0 siblings, 1 reply; 11+ messages in thread From: Simon Horman @ 2025-04-11 12:45 UTC (permalink / raw) To: Jesper Dangaard Brouer Cc: netdev, Jakub Kicinski, bpf, tom, Eric Dumazet, David S. Miller, Paolo Abeni, Toke Høiland-Jørgensen, dsahern, makita.toshiaki, kernel-team On Tue, Apr 08, 2025 at 05:31:19PM +0200, Jesper Dangaard Brouer wrote: > In production, we're seeing TX drops on veth devices when the ptr_ring > fills up. This can occur when NAPI mode is enabled, though it's > relatively rare. However, with threaded NAPI - which we use in > production - the drops become significantly more frequent. > > The underlying issue is that with threaded NAPI, the consumer often runs > on a different CPU than the producer. This increases the likelihood of > the ring filling up before the consumer gets scheduled, especially under > load, leading to drops in veth_xmit() (ndo_start_xmit()). > > This patch introduces backpressure by returning NETDEV_TX_BUSY when the > ring is full, signaling the qdisc layer to requeue the packet. The txq > (netdev queue) is stopped in this condition and restarted once > veth_poll() drains entries from the ring, ensuring coordination between > NAPI and qdisc. > > Backpressure is only enabled when a qdisc is attached. Without a qdisc, > the driver retains its original behavior - dropping packets immediately > when the ring is full. This avoids unexpected behavior changes in setups > without a configured qdisc. > > With a qdisc in place (e.g. fq, sfq) this allows Active Queue Management > (AQM) to fairly schedule packets across flows and reduce collateral > damage from elephant flows. 
> > A known limitation of this approach is that the full ring sits in front > of the qdisc layer, effectively forming a FIFO buffer that introduces > base latency. While AQM still improves fairness and mitigates flow > dominance, the latency impact is measurable. > > In hardware drivers, this issue is typically addressed using BQL (Byte > Queue Limits), which tracks in-flight bytes needed based on physical link > rate. However, for virtual drivers like veth, there is no fixed bandwidth > constraint - the bottleneck is CPU availability and the scheduler's ability > to run the NAPI thread. It is unclear how effective BQL would be in this > context. > > This patch serves as a first step toward addressing TX drops. Future work > may explore adapting a BQL-like mechanism to better suit virtual devices > like veth. > > Reported-by: Yan Zhai <yan@cloudflare.com> > Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org> Thanks Jesper, It's very nice to see backpressure support being added here. ... > @@ -874,9 +909,16 @@ static int veth_xdp_rcv(struct veth_rq *rq, int budget, > struct veth_xdp_tx_bq *bq, > struct veth_stats *stats) > { > + struct veth_priv *priv = netdev_priv(rq->dev); > + int queue_idx = rq->xdp_rxq.queue_index; > + struct netdev_queue *peer_txq; > + struct net_device *peer_dev; > int i, done = 0, n_xdpf = 0; > void *xdpf[VETH_XDP_BATCH]; > > + peer_dev = priv->peer; I think you need to take into account RCU here. Sparse says: .../veth.c:919:18: warning: incorrect type in assignment (different address spaces) .../veth.c:919:18: expected struct net_device *peer_dev .../veth.c:919:18: got struct net_device [noderef] __rcu *peer > + peer_txq = netdev_get_tx_queue(peer_dev, queue_idx); > + > for (i = 0; i < budget; i++) { > void *ptr = __ptr_ring_consume(&rq->xdp_ring); > ... ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH net-next V2 1/2] veth: apply qdisc backpressure on full ptr_ring to reduce TX drops 2025-04-11 12:45 ` Simon Horman @ 2025-04-11 13:56 ` Jesper Dangaard Brouer 2025-04-11 14:32 ` Toke Høiland-Jørgensen 0 siblings, 1 reply; 11+ messages in thread From: Jesper Dangaard Brouer @ 2025-04-11 13:56 UTC (permalink / raw) To: Simon Horman Cc: netdev, Jakub Kicinski, bpf, tom, Eric Dumazet, David S. Miller, Paolo Abeni, Toke Høiland-Jørgensen, dsahern, makita.toshiaki, kernel-team On 11/04/2025 14.45, Simon Horman wrote: > On Tue, Apr 08, 2025 at 05:31:19PM +0200, Jesper Dangaard Brouer wrote: >> In production, we're seeing TX drops on veth devices when the ptr_ring >> fills up. This can occur when NAPI mode is enabled, though it's >> relatively rare. However, with threaded NAPI - which we use in >> production - the drops become significantly more frequent. >> >> The underlying issue is that with threaded NAPI, the consumer often runs >> on a different CPU than the producer. This increases the likelihood of >> the ring filling up before the consumer gets scheduled, especially under >> load, leading to drops in veth_xmit() (ndo_start_xmit()). >> >> This patch introduces backpressure by returning NETDEV_TX_BUSY when the >> ring is full, signaling the qdisc layer to requeue the packet. The txq >> (netdev queue) is stopped in this condition and restarted once >> veth_poll() drains entries from the ring, ensuring coordination between >> NAPI and qdisc. >> >> Backpressure is only enabled when a qdisc is attached. Without a qdisc, >> the driver retains its original behavior - dropping packets immediately >> when the ring is full. This avoids unexpected behavior changes in setups >> without a configured qdisc. >> >> With a qdisc in place (e.g. fq, sfq) this allows Active Queue Management >> (AQM) to fairly schedule packets across flows and reduce collateral >> damage from elephant flows. 
>> >> A known limitation of this approach is that the full ring sits in front >> of the qdisc layer, effectively forming a FIFO buffer that introduces >> base latency. While AQM still improves fairness and mitigates flow >> dominance, the latency impact is measurable. >> >> In hardware drivers, this issue is typically addressed using BQL (Byte >> Queue Limits), which tracks in-flight bytes needed based on physical link >> rate. However, for virtual drivers like veth, there is no fixed bandwidth >> constraint - the bottleneck is CPU availability and the scheduler's ability >> to run the NAPI thread. It is unclear how effective BQL would be in this >> context. >> >> This patch serves as a first step toward addressing TX drops. Future work >> may explore adapting a BQL-like mechanism to better suit virtual devices >> like veth. >> >> Reported-by: Yan Zhai <yan@cloudflare.com> >> Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org> > > Thanks Jesper, > > It's very nice to see backpressure support being added here. > > ... > >> @@ -874,9 +909,16 @@ static int veth_xdp_rcv(struct veth_rq *rq, int budget, >> struct veth_xdp_tx_bq *bq, >> struct veth_stats *stats) >> { >> + struct veth_priv *priv = netdev_priv(rq->dev); >> + int queue_idx = rq->xdp_rxq.queue_index; >> + struct netdev_queue *peer_txq; >> + struct net_device *peer_dev; >> int i, done = 0, n_xdpf = 0; >> void *xdpf[VETH_XDP_BATCH]; >> >> + peer_dev = priv->peer; > > I think you need to take into account RCU here. > > Sparse says: > > .../veth.c:919:18: warning: incorrect type in assignment (different address spaces) > .../veth.c:919:18: expected struct net_device *peer_dev > .../veth.c:919:18: got struct net_device [noderef] __rcu *peer > Is it correctly understood that I need an: peer_dev = rcu_dereference(priv->peer); And also wrap this in a RCU section (rcu_read_lock()) ? 
>
>> +	peer_txq = netdev_get_tx_queue(peer_dev, queue_idx);
>> +
>> 	for (i = 0; i < budget; i++) {
>> 		void *ptr = __ptr_ring_consume(&rq->xdp_ring);
>>
>
> ...
* Re: [PATCH net-next V2 1/2] veth: apply qdisc backpressure on full ptr_ring to reduce TX drops 2025-04-11 13:56 ` Jesper Dangaard Brouer @ 2025-04-11 14:32 ` Toke Høiland-Jørgensen 0 siblings, 0 replies; 11+ messages in thread From: Toke Høiland-Jørgensen @ 2025-04-11 14:32 UTC (permalink / raw) To: Jesper Dangaard Brouer, Simon Horman Cc: netdev, Jakub Kicinski, bpf, tom, Eric Dumazet, David S. Miller, Paolo Abeni, dsahern, makita.toshiaki, kernel-team Jesper Dangaard Brouer <hawk@kernel.org> writes: > On 11/04/2025 14.45, Simon Horman wrote: >> On Tue, Apr 08, 2025 at 05:31:19PM +0200, Jesper Dangaard Brouer wrote: >>> In production, we're seeing TX drops on veth devices when the ptr_ring >>> fills up. This can occur when NAPI mode is enabled, though it's >>> relatively rare. However, with threaded NAPI - which we use in >>> production - the drops become significantly more frequent. >>> >>> The underlying issue is that with threaded NAPI, the consumer often runs >>> on a different CPU than the producer. This increases the likelihood of >>> the ring filling up before the consumer gets scheduled, especially under >>> load, leading to drops in veth_xmit() (ndo_start_xmit()). >>> >>> This patch introduces backpressure by returning NETDEV_TX_BUSY when the >>> ring is full, signaling the qdisc layer to requeue the packet. The txq >>> (netdev queue) is stopped in this condition and restarted once >>> veth_poll() drains entries from the ring, ensuring coordination between >>> NAPI and qdisc. >>> >>> Backpressure is only enabled when a qdisc is attached. Without a qdisc, >>> the driver retains its original behavior - dropping packets immediately >>> when the ring is full. This avoids unexpected behavior changes in setups >>> without a configured qdisc. >>> >>> With a qdisc in place (e.g. fq, sfq) this allows Active Queue Management >>> (AQM) to fairly schedule packets across flows and reduce collateral >>> damage from elephant flows. 
>>> >>> A known limitation of this approach is that the full ring sits in front >>> of the qdisc layer, effectively forming a FIFO buffer that introduces >>> base latency. While AQM still improves fairness and mitigates flow >>> dominance, the latency impact is measurable. >>> >>> In hardware drivers, this issue is typically addressed using BQL (Byte >>> Queue Limits), which tracks in-flight bytes needed based on physical link >>> rate. However, for virtual drivers like veth, there is no fixed bandwidth >>> constraint - the bottleneck is CPU availability and the scheduler's ability >>> to run the NAPI thread. It is unclear how effective BQL would be in this >>> context. >>> >>> This patch serves as a first step toward addressing TX drops. Future work >>> may explore adapting a BQL-like mechanism to better suit virtual devices >>> like veth. >>> >>> Reported-by: Yan Zhai <yan@cloudflare.com> >>> Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org> >> >> Thanks Jesper, >> >> It's very nice to see backpressure support being added here. >> >> ... >> >>> @@ -874,9 +909,16 @@ static int veth_xdp_rcv(struct veth_rq *rq, int budget, >>> struct veth_xdp_tx_bq *bq, >>> struct veth_stats *stats) >>> { >>> + struct veth_priv *priv = netdev_priv(rq->dev); >>> + int queue_idx = rq->xdp_rxq.queue_index; >>> + struct netdev_queue *peer_txq; >>> + struct net_device *peer_dev; >>> int i, done = 0, n_xdpf = 0; >>> void *xdpf[VETH_XDP_BATCH]; >>> >>> + peer_dev = priv->peer; >> >> I think you need to take into account RCU here. >> >> Sparse says: >> >> .../veth.c:919:18: warning: incorrect type in assignment (different address spaces) >> .../veth.c:919:18: expected struct net_device *peer_dev >> .../veth.c:919:18: got struct net_device [noderef] __rcu *peer >> > > Is it correctly understood that I need an: > > peer_dev = rcu_dereference(priv->peer); > > And also wrap this in a RCU section (rcu_read_lock()) ? 
Just the deref - softirq already counts as an RCU section, so no need
for an additional rcu_read_lock() :)

-Toke
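Putting Simon's sparse report and Toke's answer together, the corrected
assignment would look roughly like this (a kernel-style fragment, not
standalone-compilable, shown only to illustrate the fix for the __rcu
address-space warning):

```c
	/* veth_xdp_rcv() runs from NAPI context, which already counts
	 * as an RCU read-side critical section, so a plain
	 * rcu_dereference() suffices - no extra rcu_read_lock(). */
	peer_dev = rcu_dereference(priv->peer);
	peer_txq = netdev_get_tx_queue(peer_dev, queue_idx);
```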
* [PATCH net-next V2 2/2] net: sched: generalize check for no-op qdisc 2025-04-08 15:31 [PATCH net-next V2 0/2] veth: qdisc backpressure and qdisc check refactor Jesper Dangaard Brouer 2025-04-08 15:31 ` [PATCH net-next V2 1/2] veth: apply qdisc backpressure on full ptr_ring to reduce TX drops Jesper Dangaard Brouer @ 2025-04-08 15:31 ` Jesper Dangaard Brouer 2025-04-08 15:47 ` Eric Dumazet 2025-04-11 12:48 ` Simon Horman 2025-04-11 12:59 ` [PATCH net-next V2 0/2] veth: qdisc backpressure and qdisc check refactor Jakub Sitnicki 2 siblings, 2 replies; 11+ messages in thread From: Jesper Dangaard Brouer @ 2025-04-08 15:31 UTC (permalink / raw) To: netdev, Jakub Kicinski Cc: Jesper Dangaard Brouer, bpf, tom, Eric Dumazet, David S. Miller, Paolo Abeni, Toke Høiland-Jørgensen, dsahern, makita.toshiaki, kernel-team Several drivers (e.g., veth, vrf) contain open-coded checks to determine whether a TX queue has a real qdisc attached - typically by testing if qdisc->enqueue is non-NULL. These checks are functionally equivalent to comparing the queue's qdisc pointer against &noop_qdisc (qdisc named "noqueue"). This equivalence stems from noqueue_init(), which explicitly clears the enqueue pointer for the "noqueue" qdisc. As a result, __dev_queue_xmit() treats the qdisc as a no-op only when enqueue == NULL. This patch introduces a common helper, qdisc_txq_is_noop() to standardize this check. The helper is added in sch_generic.h and replaces open-coded logic in both the veth and vrf drivers. This is a non-functional change. 
Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org> --- drivers/net/veth.c | 14 +------------- drivers/net/vrf.c | 3 +-- include/net/sch_generic.h | 7 ++++++- 3 files changed, 8 insertions(+), 16 deletions(-) diff --git a/drivers/net/veth.c b/drivers/net/veth.c index f29a0db2ba36..83c7758534da 100644 --- a/drivers/net/veth.c +++ b/drivers/net/veth.c @@ -341,18 +341,6 @@ static bool veth_skb_is_eligible_for_gro(const struct net_device *dev, rcv->features & (NETIF_F_GRO_FRAGLIST | NETIF_F_GRO_UDP_FWD)); } -/* Does specific txq have a real qdisc attached? - see noqueue_init() */ -static inline bool txq_has_qdisc(struct netdev_queue *txq) -{ - struct Qdisc *q; - - q = rcu_dereference(txq->qdisc); - if (q->enqueue) - return true; - else - return false; -} - static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev) { struct veth_priv *rcv_priv, *priv = netdev_priv(dev); @@ -399,7 +387,7 @@ static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev) */ txq = netdev_get_tx_queue(dev, rxq); - if (!txq_has_qdisc(txq)) { + if (qdisc_txq_is_noop(txq)) { dev_kfree_skb_any(skb); goto drop; } diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c index 7168b33adadb..d4fe36c55f29 100644 --- a/drivers/net/vrf.c +++ b/drivers/net/vrf.c @@ -349,9 +349,8 @@ static bool qdisc_tx_is_default(const struct net_device *dev) return false; txq = netdev_get_tx_queue(dev, 0); - qdisc = rcu_access_pointer(txq->qdisc); - return !qdisc->enqueue; + return qdisc_txq_is_noop(txq); } /* Local traffic destined to local address. 
Reinsert the packet to rx diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h index d48c657191cd..eb90d5103371 100644 --- a/include/net/sch_generic.h +++ b/include/net/sch_generic.h @@ -803,6 +803,11 @@ static inline bool qdisc_tx_changing(const struct net_device *dev) return false; } +static inline bool qdisc_txq_is_noop(const struct netdev_queue *txq) +{ + return (rcu_access_pointer(txq->qdisc) == &noop_qdisc); +} + /* Is the device using the noop qdisc on all queues? */ static inline bool qdisc_tx_is_noop(const struct net_device *dev) { @@ -810,7 +815,7 @@ static inline bool qdisc_tx_is_noop(const struct net_device *dev) for (i = 0; i < dev->num_tx_queues; i++) { struct netdev_queue *txq = netdev_get_tx_queue(dev, i); - if (rcu_access_pointer(txq->qdisc) != &noop_qdisc) + if (!qdisc_txq_is_noop(txq)) return false; } return true; ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH net-next V2 2/2] net: sched: generalize check for no-op qdisc 2025-04-08 15:31 ` [PATCH net-next V2 2/2] net: sched: generalize check for no-op qdisc Jesper Dangaard Brouer @ 2025-04-08 15:47 ` Eric Dumazet 2025-04-09 13:28 ` Jesper Dangaard Brouer 2025-04-11 12:48 ` Simon Horman 1 sibling, 1 reply; 11+ messages in thread From: Eric Dumazet @ 2025-04-08 15:47 UTC (permalink / raw) To: Jesper Dangaard Brouer, netdev, Jakub Kicinski, edumazet Cc: bpf, tom, David S. Miller, Paolo Abeni, Toke Høiland-Jørgensen, dsahern, makita.toshiaki, kernel-team On 4/8/25 5:31 PM, Jesper Dangaard Brouer wrote: > Several drivers (e.g., veth, vrf) contain open-coded checks to determine > whether a TX queue has a real qdisc attached - typically by testing if > qdisc->enqueue is non-NULL. > > These checks are functionally equivalent to comparing the queue's qdisc > pointer against &noop_qdisc (qdisc named "noqueue"). This equivalence > stems from noqueue_init(), which explicitly clears the enqueue pointer > for the "noqueue" qdisc. As a result, __dev_queue_xmit() treats the qdisc > as a no-op only when enqueue == NULL. > > This patch introduces a common helper, qdisc_txq_is_noop() to standardize > this check. The helper is added in sch_generic.h and replaces open-coded > logic in both the veth and vrf drivers. > > This is a non-functional change. > > Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org> > --- > drivers/net/veth.c | 14 +------------- > drivers/net/vrf.c | 3 +-- > include/net/sch_generic.h | 7 ++++++- > 3 files changed, 8 insertions(+), 16 deletions(-) > > diff --git a/drivers/net/veth.c b/drivers/net/veth.c > index f29a0db2ba36..83c7758534da 100644 > --- a/drivers/net/veth.c > +++ b/drivers/net/veth.c > @@ -341,18 +341,6 @@ static bool veth_skb_is_eligible_for_gro(const struct net_device *dev, > rcv->features & (NETIF_F_GRO_FRAGLIST | NETIF_F_GRO_UDP_FWD)); > } > > -/* Does specific txq have a real qdisc attached? 
- see noqueue_init() */ > -static inline bool txq_has_qdisc(struct netdev_queue *txq) > -{ > - struct Qdisc *q; > - > - q = rcu_dereference(txq->qdisc); > - if (q->enqueue) > - return true; > - else > - return false; > -} > - > static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev) > { > struct veth_priv *rcv_priv, *priv = netdev_priv(dev); > @@ -399,7 +387,7 @@ static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev) > */ > txq = netdev_get_tx_queue(dev, rxq); > > - if (!txq_has_qdisc(txq)) { > + if (qdisc_txq_is_noop(txq)) { > dev_kfree_skb_any(skb); > goto drop; > } > diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c > index 7168b33adadb..d4fe36c55f29 100644 > --- a/drivers/net/vrf.c > +++ b/drivers/net/vrf.c > @@ -349,9 +349,8 @@ static bool qdisc_tx_is_default(const struct net_device *dev) > return false; > > txq = netdev_get_tx_queue(dev, 0); > - qdisc = rcu_access_pointer(txq->qdisc); > > - return !qdisc->enqueue; > + return qdisc_txq_is_noop(txq); > } > > /* Local traffic destined to local address. Reinsert the packet to rx > diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h > index d48c657191cd..eb90d5103371 100644 > --- a/include/net/sch_generic.h > +++ b/include/net/sch_generic.h > @@ -803,6 +803,11 @@ static inline bool qdisc_tx_changing(const struct net_device *dev) > return false; > } > > +static inline bool qdisc_txq_is_noop(const struct netdev_queue *txq) > +{ > + return (rcu_access_pointer(txq->qdisc) == &noop_qdisc); return (expression); -> return expression; return rcu_access_pointer(txq->qdisc) == &noop_qdisc; I also feel this patch should come first in the series ? ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH net-next V2 2/2] net: sched: generalize check for no-op qdisc 2025-04-08 15:47 ` Eric Dumazet @ 2025-04-09 13:28 ` Jesper Dangaard Brouer 2025-04-09 13:47 ` Toke Høiland-Jørgensen 0 siblings, 1 reply; 11+ messages in thread From: Jesper Dangaard Brouer @ 2025-04-09 13:28 UTC (permalink / raw) To: Eric Dumazet, netdev, Jakub Kicinski, edumazet Cc: bpf, tom, David S. Miller, Paolo Abeni, Toke Høiland-Jørgensen, dsahern, makita.toshiaki, kernel-team On 08/04/2025 17.47, Eric Dumazet wrote: > > On 4/8/25 5:31 PM, Jesper Dangaard Brouer wrote: >> Several drivers (e.g., veth, vrf) contain open-coded checks to determine >> whether a TX queue has a real qdisc attached - typically by testing if >> qdisc->enqueue is non-NULL. >> >> These checks are functionally equivalent to comparing the queue's qdisc >> pointer against &noop_qdisc (qdisc named "noqueue"). This equivalence >> stems from noqueue_init(), which explicitly clears the enqueue pointer >> for the "noqueue" qdisc. As a result, __dev_queue_xmit() treats the qdisc >> as a no-op only when enqueue == NULL. >> >> This patch introduces a common helper, qdisc_txq_is_noop() to standardize >> this check. The helper is added in sch_generic.h and replaces open-coded >> logic in both the veth and vrf drivers. >> >> This is a non-functional change. >> >> Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org> >> --- >> drivers/net/veth.c | 14 +------------- >> drivers/net/vrf.c | 3 +-- >> include/net/sch_generic.h | 7 ++++++- >> 3 files changed, 8 insertions(+), 16 deletions(-) >> >> diff --git a/drivers/net/veth.c b/drivers/net/veth.c >> index f29a0db2ba36..83c7758534da 100644 >> --- a/drivers/net/veth.c >> +++ b/drivers/net/veth.c >> @@ -341,18 +341,6 @@ static bool veth_skb_is_eligible_for_gro(const >> struct net_device *dev, >> rcv->features & (NETIF_F_GRO_FRAGLIST | NETIF_F_GRO_UDP_FWD)); >> } >> -/* Does specific txq have a real qdisc attached? 
- see noqueue_init() */ >> -static inline bool txq_has_qdisc(struct netdev_queue *txq) >> -{ >> - struct Qdisc *q; >> - >> - q = rcu_dereference(txq->qdisc); >> - if (q->enqueue) >> - return true; >> - else >> - return false; >> -} >> - >> static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device >> *dev) >> { >> struct veth_priv *rcv_priv, *priv = netdev_priv(dev); >> @@ -399,7 +387,7 @@ static netdev_tx_t veth_xmit(struct sk_buff *skb, >> struct net_device *dev) >> */ >> txq = netdev_get_tx_queue(dev, rxq); >> - if (!txq_has_qdisc(txq)) { >> + if (qdisc_txq_is_noop(txq)) { >> dev_kfree_skb_any(skb); >> goto drop; >> } >> diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c >> index 7168b33adadb..d4fe36c55f29 100644 >> --- a/drivers/net/vrf.c >> +++ b/drivers/net/vrf.c >> @@ -349,9 +349,8 @@ static bool qdisc_tx_is_default(const struct >> net_device *dev) >> return false; >> txq = netdev_get_tx_queue(dev, 0); >> - qdisc = rcu_access_pointer(txq->qdisc); >> - return !qdisc->enqueue; >> + return qdisc_txq_is_noop(txq); >> } >> /* Local traffic destined to local address. Reinsert the packet to rx >> diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h >> index d48c657191cd..eb90d5103371 100644 >> --- a/include/net/sch_generic.h >> +++ b/include/net/sch_generic.h >> @@ -803,6 +803,11 @@ static inline bool qdisc_tx_changing(const struct >> net_device *dev) >> return false; >> } >> +static inline bool qdisc_txq_is_noop(const struct netdev_queue *txq) >> +{ >> + return (rcu_access_pointer(txq->qdisc) == &noop_qdisc); > > > return (expression); > > -> > > return expression; > > > return rcu_access_pointer(txq->qdisc) == &noop_qdisc; Will fix in next iteration. > I also feel this patch should come first in the series ? > To me it looks/feels wrong doing this before there are two users. With only the vrf driver, the changed looked unnecessary. The diff stats looks/feels wrong, when it's patch-1. As I have to respin anyhow, I will let you decide. 
Please let me know if you prefer this to be patch-1? --Jesper ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH net-next V2 2/2] net: sched: generalize check for no-op qdisc 2025-04-09 13:28 ` Jesper Dangaard Brouer @ 2025-04-09 13:47 ` Toke Høiland-Jørgensen 0 siblings, 0 replies; 11+ messages in thread From: Toke Høiland-Jørgensen @ 2025-04-09 13:47 UTC (permalink / raw) To: Jesper Dangaard Brouer, Eric Dumazet, netdev, Jakub Kicinski, edumazet Cc: bpf, tom, David S. Miller, Paolo Abeni, dsahern, makita.toshiaki, kernel-team Jesper Dangaard Brouer <hawk@kernel.org> writes: > On 08/04/2025 17.47, Eric Dumazet wrote: >> >> On 4/8/25 5:31 PM, Jesper Dangaard Brouer wrote: >>> Several drivers (e.g., veth, vrf) contain open-coded checks to determine >>> whether a TX queue has a real qdisc attached - typically by testing if >>> qdisc->enqueue is non-NULL. >>> >>> These checks are functionally equivalent to comparing the queue's qdisc >>> pointer against &noop_qdisc (qdisc named "noqueue"). This equivalence >>> stems from noqueue_init(), which explicitly clears the enqueue pointer >>> for the "noqueue" qdisc. As a result, __dev_queue_xmit() treats the qdisc >>> as a no-op only when enqueue == NULL. >>> >>> This patch introduces a common helper, qdisc_txq_is_noop() to standardize >>> this check. The helper is added in sch_generic.h and replaces open-coded >>> logic in both the veth and vrf drivers. >>> >>> This is a non-functional change. >>> >>> Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org> >>> --- >>> drivers/net/veth.c | 14 +------------- >>> drivers/net/vrf.c | 3 +-- >>> include/net/sch_generic.h | 7 ++++++- >>> 3 files changed, 8 insertions(+), 16 deletions(-) >>> >>> diff --git a/drivers/net/veth.c b/drivers/net/veth.c >>> index f29a0db2ba36..83c7758534da 100644 >>> --- a/drivers/net/veth.c >>> +++ b/drivers/net/veth.c >>> @@ -341,18 +341,6 @@ static bool veth_skb_is_eligible_for_gro(const >>> struct net_device *dev, >>> rcv->features & (NETIF_F_GRO_FRAGLIST | NETIF_F_GRO_UDP_FWD)); >>> } >>> -/* Does specific txq have a real qdisc attached? 
- see noqueue_init() */ >>> -static inline bool txq_has_qdisc(struct netdev_queue *txq) >>> -{ >>> - struct Qdisc *q; >>> - >>> - q = rcu_dereference(txq->qdisc); >>> - if (q->enqueue) >>> - return true; >>> - else >>> - return false; >>> -} >>> - >>> static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device >>> *dev) >>> { >>> struct veth_priv *rcv_priv, *priv = netdev_priv(dev); >>> @@ -399,7 +387,7 @@ static netdev_tx_t veth_xmit(struct sk_buff *skb, >>> struct net_device *dev) >>> */ >>> txq = netdev_get_tx_queue(dev, rxq); >>> - if (!txq_has_qdisc(txq)) { >>> + if (qdisc_txq_is_noop(txq)) { >>> dev_kfree_skb_any(skb); >>> goto drop; >>> } >>> diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c >>> index 7168b33adadb..d4fe36c55f29 100644 >>> --- a/drivers/net/vrf.c >>> +++ b/drivers/net/vrf.c >>> @@ -349,9 +349,8 @@ static bool qdisc_tx_is_default(const struct >>> net_device *dev) >>> return false; >>> txq = netdev_get_tx_queue(dev, 0); >>> - qdisc = rcu_access_pointer(txq->qdisc); >>> - return !qdisc->enqueue; >>> + return qdisc_txq_is_noop(txq); >>> } >>> /* Local traffic destined to local address. Reinsert the packet to rx >>> diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h >>> index d48c657191cd..eb90d5103371 100644 >>> --- a/include/net/sch_generic.h >>> +++ b/include/net/sch_generic.h >>> @@ -803,6 +803,11 @@ static inline bool qdisc_tx_changing(const struct >>> net_device *dev) >>> return false; >>> } >>> +static inline bool qdisc_txq_is_noop(const struct netdev_queue *txq) >>> +{ >>> + return (rcu_access_pointer(txq->qdisc) == &noop_qdisc); >> >> >> return (expression); >> >> -> >> >> return expression; >> >> >> return rcu_access_pointer(txq->qdisc) == &noop_qdisc; > > Will fix in next iteration. > >> I also feel this patch should come first in the series ? >> > > To me it looks/feels wrong doing this before there are two users. > With only the vrf driver, the changed looked unnecessary. 
> The diff stats looks/feels wrong, when it's patch-1. Generalising something in preparation for another user is pretty normal, isn't it? Just write this in the commit message, like: "In preparation for using this in more places, move the check for a noop qdisc from the vrf driver into sch_generic.h" - or something like that? -Toke ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH net-next V2 2/2] net: sched: generalize check for no-op qdisc 2025-04-08 15:31 ` [PATCH net-next V2 2/2] net: sched: generalize check for no-op qdisc Jesper Dangaard Brouer 2025-04-08 15:47 ` Eric Dumazet @ 2025-04-11 12:48 ` Simon Horman 1 sibling, 0 replies; 11+ messages in thread From: Simon Horman @ 2025-04-11 12:48 UTC (permalink / raw) To: Jesper Dangaard Brouer Cc: netdev, Jakub Kicinski, bpf, tom, Eric Dumazet, David S. Miller, Paolo Abeni, Toke Høiland-Jørgensen, dsahern, makita.toshiaki, kernel-team On Tue, Apr 08, 2025 at 05:31:24PM +0200, Jesper Dangaard Brouer wrote: > Several drivers (e.g., veth, vrf) contain open-coded checks to determine > whether a TX queue has a real qdisc attached - typically by testing if > qdisc->enqueue is non-NULL. > > These checks are functionally equivalent to comparing the queue's qdisc > pointer against &noop_qdisc (qdisc named "noqueue"). This equivalence > stems from noqueue_init(), which explicitly clears the enqueue pointer > for the "noqueue" qdisc. As a result, __dev_queue_xmit() treats the qdisc > as a no-op only when enqueue == NULL. > > This patch introduces a common helper, qdisc_txq_is_noop() to standardize > this check. The helper is added in sch_generic.h and replaces open-coded > logic in both the veth and vrf drivers. > > This is a non-functional change. > > Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org> ... > diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c > index 7168b33adadb..d4fe36c55f29 100644 > --- a/drivers/net/vrf.c > +++ b/drivers/net/vrf.c > @@ -349,9 +349,8 @@ static bool qdisc_tx_is_default(const struct net_device *dev) > return false; > > txq = netdev_get_tx_queue(dev, 0); > - qdisc = rcu_access_pointer(txq->qdisc); nit: the qdisc variable is now unused in this function and can be removed. Flagged by W=1 builds. > > - return !qdisc->enqueue; > + return qdisc_txq_is_noop(txq); > } > > /* Local traffic destined to local address. Reinsert the packet to rx ... 
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH net-next V2 0/2] veth: qdisc backpressure and qdisc check refactor 2025-04-08 15:31 [PATCH net-next V2 0/2] veth: qdisc backpressure and qdisc check refactor Jesper Dangaard Brouer 2025-04-08 15:31 ` [PATCH net-next V2 1/2] veth: apply qdisc backpressure on full ptr_ring to reduce TX drops Jesper Dangaard Brouer 2025-04-08 15:31 ` [PATCH net-next V2 2/2] net: sched: generalize check for no-op qdisc Jesper Dangaard Brouer @ 2025-04-11 12:59 ` Jakub Sitnicki 2 siblings, 0 replies; 11+ messages in thread From: Jakub Sitnicki @ 2025-04-11 12:59 UTC (permalink / raw) To: Jesper Dangaard Brouer Cc: netdev, Jakub Kicinski, bpf, tom, Eric Dumazet, David S. Miller, Paolo Abeni, Toke Høiland-Jørgensen, dsahern, makita.toshiaki, kernel-team On Tue, Apr 08, 2025 at 05:31 PM +02, Jesper Dangaard Brouer wrote: > This series addresses TX drops observed in production when using veth > devices with threaded NAPI, and refactors a common qdisc check into a > shared helper. > > In threaded NAPI mode, packet drops can occur when the ptr_ring backing > the veth peer fills up. This is typically due to a combination of > scheduling delays and the consumer (NAPI thread) being slower than the > producer. When the ring overflows, packets are dropped in veth_xmit(). > > Patch 1 introduces a backpressure mechanism: when the ring is full, the > driver returns NETDEV_TX_BUSY, signaling the qdisc layer to requeue the > packet. This allows Active Queue Management (AQM) - such as fq or sfq - > to spread traffic more fairly across flows and reduce damage from > elephant flows. > > To minimize invasiveness, this backpressure behavior is only enabled when > a qdisc is attached. If no qdisc is present, the driver retains its > original behavior (dropping packets on a full ring), avoiding behavior > changes for configurations without a qdisc. > > Detecting the presence of a "real" qdisc relies on a check that is > already duplicated across multiple drivers (e.g., veth, vrf). 
Patch-2 > consolidates this logic into a new helper, qdisc_txq_is_noop(), to avoid > duplication and clarify intent. > > --- > > Jesper Dangaard Brouer (2): > veth: apply qdisc backpressure on full ptr_ring to reduce TX drops > net: sched: generalize check for no-op qdisc > > > drivers/net/veth.c | 49 ++++++++++++++++++++++++++++++++------- > drivers/net/vrf.c | 3 +-- > include/net/sch_generic.h | 7 +++++- > 3 files changed, 48 insertions(+), 11 deletions(-) This setup scenario is currently not covered by the veth selftest [1]. Would be great to extend it so the code gets exercised by the CI. [1] https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/tree/tools/testing/selftests/net/veth.sh ^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2025-04-11 14:32 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-04-08 15:31 [PATCH net-next V2 0/2] veth: qdisc backpressure and qdisc check refactor Jesper Dangaard Brouer
2025-04-08 15:31 ` [PATCH net-next V2 1/2] veth: apply qdisc backpressure on full ptr_ring to reduce TX drops Jesper Dangaard Brouer
2025-04-11 12:45   ` Simon Horman
2025-04-11 13:56     ` Jesper Dangaard Brouer
2025-04-11 14:32       ` Toke Høiland-Jørgensen
2025-04-08 15:31 ` [PATCH net-next V2 2/2] net: sched: generalize check for no-op qdisc Jesper Dangaard Brouer
2025-04-08 15:47   ` Eric Dumazet
2025-04-09 13:28     ` Jesper Dangaard Brouer
2025-04-09 13:47       ` Toke Høiland-Jørgensen
2025-04-11 12:48   ` Simon Horman
2025-04-11 12:59 ` [PATCH net-next V2 0/2] veth: qdisc backpressure and qdisc check refactor Jakub Sitnicki