* [RFC net-next 1/3] net: provide macros for commonly copied lockless queue stop/wake code
@ 2023-03-11 5:01 Jakub Kicinski
2023-03-11 5:01 ` [RFC net-next 2/3] ixgbe: use new queue try_stop/try_wake macros Jakub Kicinski
` (3 more replies)
0 siblings, 4 replies; 8+ messages in thread
From: Jakub Kicinski @ 2023-03-11 5:01 UTC (permalink / raw)
To: netdev, davem, edumazet, pabeni
Cc: alexanderduyck, roman.gushchin, Jakub Kicinski
A lot of drivers follow the same scheme to stop / start queues
without introducing locks between xmit and NAPI tx completions.
I'm guessing they all copy'n'paste each other's code.
Smaller drivers shy away from the scheme and introduce a lock
which may cause deadlocks in netpoll.
Provide macros which encapsulate the necessary logic.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
include/net/netdev_queues.h | 166 ++++++++++++++++++++++++++++++++++++
1 file changed, 166 insertions(+)
create mode 100644 include/net/netdev_queues.h
diff --git a/include/net/netdev_queues.h b/include/net/netdev_queues.h
new file mode 100644
index 000000000000..2a857faf28d8
--- /dev/null
+++ b/include/net/netdev_queues.h
@@ -0,0 +1,166 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_NET_QUEUES_H
+#define _LINUX_NET_QUEUES_H
+
+#include <linux/netdevice.h>
+
+/* Lockless queue stopping / waking helpers.
+ *
+ * These macros are designed to safely implement stopping and waking
+ * netdev queues without any lock protection. We assume that there can
+ * be no concurrent stop attempts and no concurrent wake attempts.
+ * This is usually true as stop attempts happen from the xmit handler,
+ * while wake up is triggered from NAPI poll context. The two may run
+ * concurrently but are each protected by a lock (SPSC of sorts).
+ *
+ * All descriptor ring indexes (and other relevant shared state) must
+ * be updated before invoking the macros.
+ */
+
+#define netif_tx_queue_try_stop(txq, get_desc, start_thrs) \
+ ({ \
+ int _res; \
+ \
+ netif_tx_stop_queue(txq); \
+ \
+ smp_mb(); \
+ \
+ /* We need to check again in case another \
+ * CPU has just made room available. \
+ */ \
+ if (likely(get_desc < start_thrs)) { \
+ _res = 0; \
+ } else { \
+ netif_tx_wake_queue(txq); \
+ _res = -1; \
+ } \
+ _res; \
+ }) \
+
+/**
+ * netif_tx_queue_maybe_stop() - locklessly stop a Tx queue, if needed
+ * @txq: struct netdev_queue to stop/start
+ * @get_desc: get current number of free descriptors (see requirements below!)
+ * @stop_thrs: minimal number of available descriptors for queue to be left
+ * enabled
+ * @start_thrs: minimal number of descriptors to re-enable the queue, can be
+ * equal to @stop_thrs or higher to avoid frequent waking
+ *
+ * All arguments may be evaluated multiple times, beware of side effects.
+ * @get_desc must be a formula or a function call, it must always
+ * return up-to-date information when evaluated!
+ *
+ * Returns:
+ * 0 if the queue was stopped
+ * 1 if the queue was left enabled
+ * -1 if the queue was re-enabled (raced with waking)
+ */
+#define netif_tx_queue_maybe_stop(txq, get_desc, stop_thrs, start_thrs) \
+ ({ \
+ int _res; \
+ \
+ if (likely(get_desc > stop_thrs)) \
+ _res = 1; \
+ else \
+ _res = netif_tx_queue_try_stop(txq, get_desc, \
+ start_thrs); \
+ _res; \
+ }) \
+
+#define __netif_tx_queue_try_wake(txq, get_desc, start_thrs, down_cond) \
+ ({ \
+ int _res; \
+ \
+ /* Make sure that anybody stopping the queue after \
+ * this sees the new next_to_clean. \
+ */ \
+ smp_mb(); \
+ if (netif_tx_queue_stopped(txq) && !(down_cond)) { \
+ netif_tx_wake_queue(txq); \
+ _res = 0; \
+ } else { \
+ _res = 1; \
+ } \
+ _res; \
+ })
+
+#define netif_tx_queue_try_wake(txq, get_desc, start_thrs) \
+ __netif_tx_queue_try_wake(txq, get_desc, start_thrs, false)
+
+/**
+ * __netif_tx_queue_maybe_wake() - locklessly wake a Tx queue, if needed
+ * @txq: struct netdev_queue to stop/start
+ * @get_desc: get current number of free descriptors (see requirements below!)
+ * @start_thrs: minimal number of descriptors to re-enable the queue
+ * @down_cond: down condition, perdicate indicating that the queue should
+ * not be woken up even if descriptors are available
+ *
+ * All arguments may be evaluated multiple times.
+ * @get_desc must be a formula or a function call, it must always
+ * return up-to-date information when evaluated!
+ *
+ * Returns:
+ * 0 if the queue was woken up
+ * 1 if the queue was already enabled (or disabled but @down_cond is true)
+ * -1 if the queue was left stopped
+ */
+#define __netif_tx_queue_maybe_wake(txq, get_desc, start_thrs, down_cond) \
+ ({ \
+ int _res; \
+ \
+ if (likely(get_desc < start_thrs)) \
+ _res = -1; \
+ else \
+ _res = __netif_tx_queue_try_wake(txq, get_desc, \
+ start_thrs, \
+ down_cond); \
+ _res; \
+ })
+
+#define netif_tx_queue_maybe_wake(txq, get_desc, start_thrs) \
+ __netif_tx_queue_maybe_wake(txq, get_desc, start_thrs, false)
+
+/* subqueue variants follow */
+
+#define netif_subqueue_try_stop(dev, idx, get_desc, start_thrs) \
+ ({ \
+ struct netdev_queue *txq; \
+ \
+ txq = netdev_get_tx_queue(dev, idx); \
+ netif_tx_queue_try_stop(txq, get_desc, start_thrs); \
+ })
+
+#define netif_subqueue_maybe_stop(dev, idx, get_desc, stop_thrs, start_thrs) \
+ ({ \
+ struct netdev_queue *txq; \
+ \
+ txq = netdev_get_tx_queue(dev, idx); \
+ netif_tx_queue_maybe_stop(txq, get_desc, \
+ stop_thrs, start_thrs); \
+ })
+
+#define __netif_subqueue_try_wake(dev, idx, get_desc, start_thrs, down_cond) \
+ ({ \
+ struct netdev_queue *txq; \
+ \
+ txq = netdev_get_tx_queue(dev, idx); \
+ __netif_tx_queue_try_wake(txq, get_desc, \
+ start_thrs, down_cond); \
+ })
+
+#define netif_subqueue_try_wake(dev, idx, get_desc, start_thrs) \
+ __netif_subqueue_try_wake(dev, idx, get_desc, start_thrs, false)
+
+#define __netif_subqueue_maybe_wake(dev, idx, get_desc, start_thrs, down_cond) \
+ ({ \
+ struct netdev_queue *txq; \
+ \
+ txq = netdev_get_tx_queue(dev, idx); \
+ __netif_tx_queue_maybe_wake(txq, get_desc, \
+ start_thrs, down_cond); \
+ })
+
+#define netif_subqueue_maybe_wake(dev, idx, get_desc, start_thrs) \
+ __netif_subqueue_maybe_wake(dev, idx, get_desc, start_thrs, false)
+
+#endif
--
2.39.2
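The return-value conventions documented in the header can be tried out in userspace. The following is only a sketch, assuming C11 atomics as a stand-in for the kernel primitives: `struct model_txq` and the `model_*` functions are hypothetical names, not kernel API, and plain functions are used here because the model reads the free count from the struct rather than from a caller-supplied expression.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a Tx ring and its queue state. */
struct model_txq {
	atomic_int free_desc;	/* descriptors currently free */
	atomic_bool stopped;	/* models the "queue stopped" bit */
};

/* Models netif_tx_queue_try_stop(): 0 = stopped, -1 = raced, re-enabled. */
static int model_try_stop(struct model_txq *q, int start_thrs)
{
	atomic_store(&q->stopped, true);
	/* Order the stop before the re-read of free_desc below. */
	atomic_thread_fence(memory_order_seq_cst);
	/* The completion side may have freed descriptors in the meantime. */
	if (atomic_load(&q->free_desc) < start_thrs)
		return 0;
	atomic_store(&q->stopped, false);
	return -1;
}

/* Models netif_tx_queue_maybe_stop(): 1 = queue left enabled. */
static int model_maybe_stop(struct model_txq *q, int stop_thrs, int start_thrs)
{
	assert(start_thrs >= stop_thrs);
	if (atomic_load(&q->free_desc) > stop_thrs)
		return 1;
	return model_try_stop(q, start_thrs);
}

/* Models __netif_tx_queue_maybe_wake():
 * 0 = woken, 1 = already enabled or held down, -1 = not enough descriptors.
 */
static int model_maybe_wake(struct model_txq *q, int start_thrs, bool down_cond)
{
	if (atomic_load(&q->free_desc) < start_thrs)
		return -1;
	/* Order the earlier free_desc update before reading the stopped bit. */
	atomic_thread_fence(memory_order_seq_cst);
	if (atomic_load(&q->stopped) && !down_cond) {
		atomic_store(&q->stopped, false);
		return 0;
	}
	return 1;
}
```

With `stop_thrs = 2` and `start_thrs = 4`, a queue with 2 free descriptors is stopped (0), stays stopped while fewer than 4 are free (-1 from the wake side), and is woken (0) once 4 or more are free and the down condition is false.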
* [RFC net-next 2/3] ixgbe: use new queue try_stop/try_wake macros
From: Jakub Kicinski @ 2023-03-11 5:01 UTC (permalink / raw)
To: netdev, davem, edumazet, pabeni
Cc: alexanderduyck, roman.gushchin, Jakub Kicinski
Convert ixgbe to use the new macros. I think a lot of people
copy the ixgbe code.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 37 +++++--------------
1 file changed, 9 insertions(+), 28 deletions(-)
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 773c35fecace..db00e50a40ff 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -36,6 +36,7 @@
#include <net/tc_act/tc_mirred.h>
#include <net/vxlan.h>
#include <net/mpls.h>
+#include <net/netdev_queues.h>
#include <net/xdp_sock_drv.h>
#include <net/xfrm.h>
@@ -1253,20 +1254,12 @@ static bool ixgbe_clean_tx_irq(struct ixgbe_q_vector *q_vector,
total_packets, total_bytes);
#define TX_WAKE_THRESHOLD (DESC_NEEDED * 2)
- if (unlikely(total_packets && netif_carrier_ok(tx_ring->netdev) &&
- (ixgbe_desc_unused(tx_ring) >= TX_WAKE_THRESHOLD))) {
- /* Make sure that anybody stopping the queue after this
- * sees the new next_to_clean.
- */
- smp_mb();
- if (__netif_subqueue_stopped(tx_ring->netdev,
- tx_ring->queue_index)
- && !test_bit(__IXGBE_DOWN, &adapter->state)) {
- netif_wake_subqueue(tx_ring->netdev,
- tx_ring->queue_index);
- ++tx_ring->tx_stats.restart_queue;
- }
- }
+ if (total_packets && netif_carrier_ok(tx_ring->netdev) &&
+ !__netif_subqueue_maybe_wake(tx_ring->netdev, tx_ring->queue_index,
+ ixgbe_desc_unused(tx_ring),
+ TX_WAKE_THRESHOLD,
+ test_bit(__IXGBE_DOWN, &adapter->state)))
+ ++tx_ring->tx_stats.restart_queue;
return !!budget;
}
@@ -8270,22 +8263,10 @@ static void ixgbe_tx_olinfo_status(union ixgbe_adv_tx_desc *tx_desc,
static int __ixgbe_maybe_stop_tx(struct ixgbe_ring *tx_ring, u16 size)
{
- netif_stop_subqueue(tx_ring->netdev, tx_ring->queue_index);
-
- /* Herbert's original patch had:
- * smp_mb__after_netif_stop_queue();
- * but since that doesn't exist yet, just open code it.
- */
- smp_mb();
-
- /* We need to check again in a case another CPU has just
- * made room available.
- */
- if (likely(ixgbe_desc_unused(tx_ring) < size))
+ if (!netif_subqueue_try_stop(tx_ring->netdev, tx_ring->queue_index,
+ ixgbe_desc_unused(tx_ring), size))
return -EBUSY;
- /* A reprieve! - use start_queue because it doesn't call schedule */
- netif_start_subqueue(tx_ring->netdev, tx_ring->queue_index);
++tx_ring->tx_stats.restart_queue;
return 0;
}
--
2.39.2
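The caller side of the conversion maps the helper's result onto an errno: 0 from the stop helper means the queue really is stopped, so the caller reports busy, while a non-zero result means the race was won and transmission can proceed. A minimal userspace sketch of that mapping follows; `struct demo_ring` and the `demo_*` functions are hypothetical stand-ins, and the fast path in `demo_maybe_stop()` is an assumption about how such a wrapper is typically shaped, not a copy of driver code.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical ring state; "unused" plays the role of a free-descriptor count. */
struct demo_ring {
	atomic_int unused;
	atomic_bool stopped;
	unsigned long restart_queue;	/* counts stop attempts that were undone */
};

/* 0 = queue stopped, -1 = raced with completion and was re-enabled. */
static int demo_try_stop(struct demo_ring *r, int start_thrs)
{
	atomic_store(&r->stopped, true);
	atomic_thread_fence(memory_order_seq_cst);
	if (atomic_load(&r->unused) < start_thrs)
		return 0;
	atomic_store(&r->stopped, false);
	return -1;
}

/* Slow path: translate the helper's result into 0 / -EBUSY. */
static int demo_maybe_stop_slow(struct demo_ring *r, int size)
{
	if (!demo_try_stop(r, size))
		return -EBUSY;		/* stopped, caller must back off */
	++r->restart_queue;		/* a reprieve: room appeared meanwhile */
	return 0;
}

/* Fast path (assumed shape): skip the stop attempt when there is room. */
static int demo_maybe_stop(struct demo_ring *r, int size)
{
	if (atomic_load(&r->unused) >= size)
		return 0;
	return demo_maybe_stop_slow(r, size);
}
```

Note the inverted sense: `!demo_try_stop(...)` is true exactly when the queue was stopped, which is why the negation leads to the busy return.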
* [RFC net-next 3/3] bnxt: use new queue try_stop/try_wake macros
From: Jakub Kicinski @ 2023-03-11 5:01 UTC (permalink / raw)
To: netdev, davem, edumazet, pabeni
Cc: alexanderduyck, roman.gushchin, Jakub Kicinski
Convert bnxt to use new macros rather than open code the logic.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 41 +++++------------------
1 file changed, 8 insertions(+), 33 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index dceaecab6605..b52d1e5d0ac7 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -56,6 +56,7 @@
#include <linux/hwmon-sysfs.h>
#include <net/page_pool.h>
#include <linux/align.h>
+#include <net/netdev_queues.h>
#include "bnxt_hsi.h"
#include "bnxt.h"
@@ -331,26 +332,6 @@ static void bnxt_txr_db_kick(struct bnxt *bp, struct bnxt_tx_ring_info *txr,
txr->kick_pending = 0;
}
-static bool bnxt_txr_netif_try_stop_queue(struct bnxt *bp,
- struct bnxt_tx_ring_info *txr,
- struct netdev_queue *txq)
-{
- netif_tx_stop_queue(txq);
-
- /* netif_tx_stop_queue() must be done before checking
- * tx index in bnxt_tx_avail() below, because in
- * bnxt_tx_int(), we update tx index before checking for
- * netif_tx_queue_stopped().
- */
- smp_mb();
- if (bnxt_tx_avail(bp, txr) >= bp->tx_wake_thresh) {
- netif_tx_wake_queue(txq);
- return false;
- }
-
- return true;
-}
-
static netdev_tx_t bnxt_start_xmit(struct sk_buff *skb, struct net_device *dev)
{
struct bnxt *bp = netdev_priv(dev);
@@ -384,7 +365,8 @@ static netdev_tx_t bnxt_start_xmit(struct sk_buff *skb, struct net_device *dev)
if (net_ratelimit() && txr->kick_pending)
netif_warn(bp, tx_err, dev,
"bnxt: ring busy w/ flush pending!\n");
- if (bnxt_txr_netif_try_stop_queue(bp, txr, txq))
+ if (!netif_tx_queue_try_stop(txq, bnxt_tx_avail(bp, txr),
+ bp->tx_wake_thresh))
return NETDEV_TX_BUSY;
}
@@ -614,7 +596,8 @@ static netdev_tx_t bnxt_start_xmit(struct sk_buff *skb, struct net_device *dev)
if (netdev_xmit_more() && !tx_buf->is_push)
bnxt_txr_db_kick(bp, txr, prod);
- bnxt_txr_netif_try_stop_queue(bp, txr, txq);
+ netif_tx_queue_try_stop(txq, bnxt_tx_avail(bp, txr),
+ bp->tx_wake_thresh);
}
return NETDEV_TX_OK;
@@ -708,17 +691,9 @@ static void bnxt_tx_int(struct bnxt *bp, struct bnxt_napi *bnapi, int nr_pkts)
netdev_tx_completed_queue(txq, nr_pkts, tx_bytes);
txr->tx_cons = cons;
- /* Need to make the tx_cons update visible to bnxt_start_xmit()
- * before checking for netif_tx_queue_stopped(). Without the
- * memory barrier, there is a small possibility that bnxt_start_xmit()
- * will miss it and cause the queue to be stopped forever.
- */
- smp_mb();
-
- if (unlikely(netif_tx_queue_stopped(txq)) &&
- bnxt_tx_avail(bp, txr) >= bp->tx_wake_thresh &&
- READ_ONCE(txr->dev_state) != BNXT_DEV_STATE_CLOSING)
- netif_tx_wake_queue(txq);
+ __netif_tx_queue_maybe_wake(txq, bnxt_tx_avail(bp, txr),
+ bp->tx_wake_thresh,
+ READ_ONCE(txr->dev_state) == BNXT_DEV_STATE_CLOSING);
}
static struct page *__bnxt_alloc_rx_page(struct bnxt *bp, dma_addr_t *mapping,
--
2.39.2
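The polarity of @down_cond is easy to get backwards when converting an open-coded guard: the header documents it as a predicate that is true when the queue must not be woken, and the wake helper tests `!(down_cond)`. The single-threaded sketch below illustrates this with hypothetical `demo_*` names; barriers are deliberately left out because only the predicate logic is being shown.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical device states, loosely modelled on an "open/closing" flag. */
enum demo_dev_state { DEMO_DEV_STATE_OPEN, DEMO_DEV_STATE_CLOSING };

struct demo_txq {
	bool stopped;
	int avail;	/* free descriptors */
};

/* Same result codes as __netif_tx_queue_maybe_wake():
 * 0 = woken, 1 = already enabled or held down, -1 = not enough descriptors.
 */
static int demo_maybe_wake(struct demo_txq *q, int start_thrs, bool down_cond)
{
	if (q->avail < start_thrs)
		return -1;
	if (q->stopped && !down_cond) {
		q->stopped = false;
		return 0;
	}
	return 1;
}

/* Completion handler: down_cond is TRUE while the device is closing. */
static int demo_tx_int(struct demo_txq *q, enum demo_dev_state state,
		       int wake_thresh)
{
	return demo_maybe_wake(q, wake_thresh,
			       state == DEMO_DEV_STATE_CLOSING);
}
```

A guard that used to read "wake if the state is not closing" therefore becomes a down condition of "state is closing".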
* Re: [RFC net-next 1/3] net: provide macros for commonly copied lockless queue stop/wake code
From: Stephen Hemminger @ 2023-03-11 16:28 UTC (permalink / raw)
To: Jakub Kicinski
Cc: netdev, davem, edumazet, pabeni, alexanderduyck, roman.gushchin
On Fri, 10 Mar 2023 21:01:28 -0800
Jakub Kicinski <kuba@kernel.org> wrote:
> A lot of drivers follow the same scheme to stop / start queues
> without introducing locks between xmit and NAPI tx completions.
> I'm guessing they all copy'n'paste each other's code.
>
> Smaller drivers shy away from the scheme and introduce a lock
> which may cause deadlocks in netpoll.
>
> Provide macros which encapsulate the necessary logic.
Could any of these be inline functions instead for type safety?
* Re: [RFC net-next 1/3] net: provide macros for commonly copied lockless queue stop/wake code
From: Willem de Bruijn @ 2023-03-13 1:37 UTC (permalink / raw)
To: Stephen Hemminger, Jakub Kicinski
Cc: netdev, davem, edumazet, pabeni, alexanderduyck, roman.gushchin
Stephen Hemminger wrote:
> On Fri, 10 Mar 2023 21:01:28 -0800
> Jakub Kicinski <kuba@kernel.org> wrote:
>
> > A lot of drivers follow the same scheme to stop / start queues
> > without introducing locks between xmit and NAPI tx completions.
> > I'm guessing they all copy'n'paste each other's code.
> >
> > Smaller drivers shy away from the scheme and introduce a lock
> > which may cause deadlocks in netpoll.
> >
> > Provide macros which encapsulate the necessary logic.
>
> Could any of these be inline functions instead for type safety?
I suppose not because of the condition that is evaluated.
Btw: perdicate -> predicate
* Re: [RFC net-next 1/3] net: provide macros for commonly copied lockless queue stop/wake code
From: Stephen Hemminger @ 2023-03-13 1:45 UTC (permalink / raw)
To: Willem de Bruijn
Cc: Jakub Kicinski, netdev, davem, edumazet, pabeni, alexanderduyck,
roman.gushchin
On Sun, 12 Mar 2023 21:37:39 -0400
Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote:
> Stephen Hemminger wrote:
> > On Fri, 10 Mar 2023 21:01:28 -0800
> > Jakub Kicinski <kuba@kernel.org> wrote:
> >
> > > A lot of drivers follow the same scheme to stop / start queues
> > > without introducing locks between xmit and NAPI tx completions.
> > > I'm guessing they all copy'n'paste each other's code.
> > >
> > > Smaller drivers shy away from the scheme and introduce a lock
> > > which may cause deadlocks in netpoll.
> > >
> > > Provide macros which encapsulate the necessary logic.
> >
> > Could any of these be inline functions instead for type safety?
>
> I suppose not because of the condition that is evaluated.
It is more that the condition needs to be evaluated after some other
pre-conditions.
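The difference can be seen in a small model. A function receives a snapshot of the free-descriptor count taken before the call, whereas a macro receives the expression itself and can re-evaluate it after the queue has been stopped. This is only a sketch: `demo_stop()` simulates a racing completion by freeing descriptors as a side effect, and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

struct demo_q {
	bool stopped;
	int free_desc;
};

/* Stop the queue; a "racing completion" frees 8 descriptors right after. */
static void demo_stop(struct demo_q *q)
{
	q->stopped = true;
	q->free_desc += 8;
}

/* Function form: free_now was evaluated by the caller BEFORE demo_stop(). */
static int try_stop_fn(struct demo_q *q, int free_now, int start_thrs)
{
	demo_stop(q);
	if (free_now < start_thrs)
		return 0;		/* stays stopped, based on stale data */
	q->stopped = false;
	return -1;
}

/* Macro form: get_desc is an expression, evaluated AFTER demo_stop(). */
#define TRY_STOP_MACRO(q, get_desc, start_thrs)			\
	(demo_stop(q),						\
	 ((get_desc) < (start_thrs)) ? 0 : ((q)->stopped = false, -1))
```

With one free descriptor and a threshold of 4, `try_stop_fn(&q, q.free_desc, 4)` leaves the queue stopped even though 9 descriptors are free afterwards, while `TRY_STOP_MACRO(&q, q.free_desc, 4)` notices the freed descriptors and re-enables the queue. A function taking a callback for the count would presumably also work, at the cost of an indirect call.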
* Re: [RFC net-next 1/3] net: provide macros for commonly copied lockless queue stop/wake code
From: Jakub Kicinski @ 2023-03-13 20:56 UTC (permalink / raw)
To: Stephen Hemminger
Cc: Willem de Bruijn, netdev, davem, edumazet, pabeni, alexanderduyck,
roman.gushchin
On Sun, 12 Mar 2023 18:45:15 -0700 Stephen Hemminger wrote:
> > > Could any of these be inline functions instead for type safety?
> >
> > I suppose not because of the condition that is evaluated.
>
> It is more that the condition needs to be evaluated after some other
> pre-conditions.
Right, I think I could slice off individual chunks and wrap them in
static inlines, but I reckon the result is relatively readable now?
* Re: [RFC net-next 1/3] net: provide macros for commonly copied lockless queue stop/wake code
From: Herbert Xu @ 2023-04-04 3:29 UTC (permalink / raw)
To: Jakub Kicinski
Cc: netdev, davem, edumazet, pabeni, alexanderduyck, roman.gushchin,
kuba
Jakub Kicinski <kuba@kernel.org> wrote:
>
> +#define netif_tx_queue_try_stop(txq, get_desc, start_thrs) \
> + ({ \
> + int _res; \
> + \
> + netif_tx_stop_queue(txq); \
> + \
> + smp_mb(); \
We should never have an smp_mb by itself. It must come with a
comment indicating which other barrier (possibly implicit) it
pairs with.
I know that you're just copying old code around, but by turning
it into a helper, we should treat it as new code and apply the
current requirements.
Thanks,
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
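As an illustration of what a pairing comment could look like, the sketch below splits each side of the protocol into its "publish, then barrier" step and its "read the other side" step, and labels the two fences so that each comment names its partner. It uses C11 fences in userspace as an assumed stand-in for smp_mb(); the `sb_state` struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct sb_state {
	atomic_bool stopped;	/* written by xmit, read by completion */
	atomic_int free_desc;	/* written by completion, read by xmit */
};

/* xmit, step 1: publish the stop. */
static void xmit_stop(struct sb_state *s)
{
	atomic_store_explicit(&s->stopped, true, memory_order_relaxed);
	/* Barrier A: orders the store to ->stopped before the load of
	 * ->free_desc in xmit_recheck(). Pairs with barrier B.
	 */
	atomic_thread_fence(memory_order_seq_cst);
}

/* xmit, step 2: true if the queue must stay stopped. */
static bool xmit_recheck(struct sb_state *s, int start_thrs)
{
	return atomic_load_explicit(&s->free_desc,
				    memory_order_relaxed) < start_thrs;
}

/* completion, step 1: publish freed descriptors. */
static void napi_free(struct sb_state *s, int n)
{
	atomic_fetch_add_explicit(&s->free_desc, n, memory_order_relaxed);
	/* Barrier B: orders the update of ->free_desc before the load of
	 * ->stopped in napi_sees_stopped(). Pairs with barrier A.
	 */
	atomic_thread_fence(memory_order_seq_cst);
}

/* completion, step 2: true if the stop was observed (so a wake is due). */
static bool napi_sees_stopped(struct sb_state *s)
{
	return atomic_load_explicit(&s->stopped, memory_order_relaxed);
}
```

The property the pair is meant to provide is that the outcome "xmit decides to stay stopped and completion did not see the stop" cannot happen: at least one side observes the other's store, so the queue is never left stopped with nobody scheduled to wake it.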