* [PATCH net-next 1/3] net: provide macros for commonly copied lockless queue stop/wake code
@ 2023-03-22 23:30 Jakub Kicinski
2023-03-22 23:30 ` [PATCH net-next 2/3] ixgbe: use new queue try_stop/try_wake macros Jakub Kicinski
` (5 more replies)
0 siblings, 6 replies; 19+ messages in thread
From: Jakub Kicinski @ 2023-03-22 23:30 UTC (permalink / raw)
To: davem; +Cc: netdev, edumazet, pabeni, willemb, alexander.duyck,
Jakub Kicinski
A lot of drivers follow the same scheme to stop / start queues
without introducing locks between xmit and NAPI tx completions.
I'm guessing they all copy'n'paste each other's code.
Smaller drivers shy away from the scheme and introduce a lock
which may cause deadlocks in netpoll.
Provide macros which encapsulate the necessary logic.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
rfc: https://lore.kernel.org/all/20230311050130.115138-1-kuba@kernel.org/
- perdicate -> predicate
- on race use start instead of wake and make a note of that
in the doc / comment at the start
---
include/net/netdev_queues.h | 171 ++++++++++++++++++++++++++++++++++++
1 file changed, 171 insertions(+)
create mode 100644 include/net/netdev_queues.h
diff --git a/include/net/netdev_queues.h b/include/net/netdev_queues.h
new file mode 100644
index 000000000000..64e059647274
--- /dev/null
+++ b/include/net/netdev_queues.h
@@ -0,0 +1,171 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_NET_QUEUES_H
+#define _LINUX_NET_QUEUES_H
+
+#include <linux/netdevice.h>
+
+/* Lockless queue stopping / waking helpers.
+ *
+ * These macros are designed to safely implement stopping and waking
+ * netdev queues without full lock protection. We assume that there can
+ * be no concurrent stop attempts and no concurrent wake attempts.
+ * The try-stop should happen from the xmit handler*, while wake up
+ * should be triggered from NAPI poll context. The two may run
+ * concurrently (single producer, single consumer).
+ *
+ * All descriptor ring indexes (and other relevant shared state) must
+ * be updated before invoking the macros.
+ *
+ * * the try-stop side does not reschedule Tx (netif_tx_start_queue()
+ * instead of netif_tx_wake_queue()) so uses outside of the xmit
+ * handler may lead to bugs
+ */
+
+#define netif_tx_queue_try_stop(txq, get_desc, start_thrs) \
+ ({ \
+ int _res; \
+ \
+ netif_tx_stop_queue(txq); \
+ \
+ smp_mb(); \
+ \
+ /* We need to check again in a case another \
+ * CPU has just made room available. \
+ */ \
+ if (likely(get_desc < start_thrs)) { \
+ _res = 0; \
+ } else { \
+ netif_tx_start_queue(txq); \
+ _res = -1; \
+ } \
+ _res; \
+ }) \
+
+/**
+ * netif_tx_queue_maybe_stop() - locklessly stop a Tx queue, if needed
+ * @txq: struct netdev_queue to stop/start
+ * @get_desc: get current number of free descriptors (see requirements below!)
+ * @stop_thrs: minimal number of available descriptors for queue to be left
+ * enabled
+ * @start_thrs: minimal number of descriptors to re-enable the queue, can be
+ * equal to @stop_thrs or higher to avoid frequent waking
+ *
+ * All arguments may be evaluated multiple times, beware of side effects.
+ * @get_desc must be a formula or a function call, it must always
+ * return up-to-date information when evaluated!
+ * Expected to be used from ndo_start_xmit, see the comment on top of the file.
+ *
+ * Returns:
+ * 0 if the queue was stopped
+ * 1 if the queue was left enabled
+ * -1 if the queue was re-enabled (raced with waking)
+ */
+#define netif_tx_queue_maybe_stop(txq, get_desc, stop_thrs, start_thrs) \
+ ({ \
+ int _res; \
+ \
+ if (likely(get_desc > stop_thrs)) \
+ _res = 1; \
+ else \
+ _res = netif_tx_queue_try_stop(txq, get_desc, \
+ start_thrs); \
+ _res; \
+ }) \
+
+#define __netif_tx_queue_try_wake(txq, get_desc, start_thrs, down_cond) \
+ ({ \
+ int _res; \
+ \
+ /* Make sure that anybody stopping the queue after \
+ * this sees the new next_to_clean. \
+ */ \
+ smp_mb(); \
+ if (netif_tx_queue_stopped(txq) && !(down_cond)) { \
+ netif_tx_wake_queue(txq); \
+ _res = 0; \
+ } else { \
+ _res = 1; \
+ } \
+ _res; \
+ })
+
+#define netif_tx_queue_try_wake(txq, get_desc, start_thrs) \
+ __netif_tx_queue_try_wake(txq, get_desc, start_thrs, false)
+
+/**
+ * __netif_tx_queue_maybe_wake() - locklessly wake a Tx queue, if needed
+ * @txq: struct netdev_queue to stop/start
+ * @get_desc: get current number of free descriptors (see requirements below!)
+ * @start_thrs: minimal number of descriptors to re-enable the queue
+ * @down_cond: down condition, predicate indicating that the queue should
+ * not be woken up even if descriptors are available
+ *
+ * All arguments may be evaluated multiple times.
+ * @get_desc must be a formula or a function call, it must always
+ * return up-to-date information when evaluated!
+ *
+ * Returns:
+ * 0 if the queue was woken up
+ * 1 if the queue was already enabled (or disabled but @down_cond is true)
+ * -1 if the queue was left stopped
+ */
+#define __netif_tx_queue_maybe_wake(txq, get_desc, start_thrs, down_cond) \
+ ({ \
+ int _res; \
+ \
+ if (likely(get_desc < start_thrs)) \
+ _res = -1; \
+ else \
+ _res = __netif_tx_queue_try_wake(txq, get_desc, \
+ start_thrs, \
+ down_cond); \
+ _res; \
+ })
+
+#define netif_tx_queue_maybe_wake(txq, get_desc, start_thrs) \
+ __netif_tx_queue_maybe_wake(txq, get_desc, start_thrs, false)
+
+/* subqueue variants follow */
+
+#define netif_subqueue_try_stop(dev, idx, get_desc, start_thrs) \
+ ({ \
+ struct netdev_queue *txq; \
+ \
+ txq = netdev_get_tx_queue(dev, idx); \
+ netif_tx_queue_try_stop(txq, get_desc, start_thrs); \
+ })
+
+#define netif_subqueue_maybe_stop(dev, idx, get_desc, stop_thrs, start_thrs) \
+ ({ \
+ struct netdev_queue *txq; \
+ \
+ txq = netdev_get_tx_queue(dev, idx); \
+ netif_tx_queue_maybe_stop(txq, get_desc, \
+ stop_thrs, start_thrs); \
+ })
+
+#define __netif_subqueue_try_wake(dev, idx, get_desc, start_thrs, down_cond) \
+ ({ \
+ struct netdev_queue *txq; \
+ \
+ txq = netdev_get_tx_queue(dev, idx); \
+ __netif_tx_queue_try_wake(txq, get_desc, \
+ start_thrs, down_cond); \
+ })
+
+#define netif_subqueue_try_wake(dev, idx, get_desc, start_thrs) \
+ __netif_subqueue_try_wake(dev, idx, get_desc, start_thrs, false)
+
+#define __netif_subqueue_maybe_wake(dev, idx, get_desc, start_thrs, down_cond) \
+ ({ \
+ struct netdev_queue *txq; \
+ \
+ txq = netdev_get_tx_queue(dev, idx); \
+ __netif_tx_queue_maybe_wake(txq, get_desc, \
+ start_thrs, down_cond); \
+ })
+
+#define netif_subqueue_maybe_wake(dev, idx, get_desc, start_thrs) \
+ __netif_subqueue_maybe_wake(dev, idx, get_desc, start_thrs, false)
+
+#endif
--
2.39.2
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH net-next 2/3] ixgbe: use new queue try_stop/try_wake macros
2023-03-22 23:30 [PATCH net-next 1/3] net: provide macros for commonly copied lockless queue stop/wake code Jakub Kicinski
@ 2023-03-22 23:30 ` Jakub Kicinski
2023-03-22 23:30 ` [PATCH net-next 3/3] bnxt: " Jakub Kicinski
` (4 subsequent siblings)
5 siblings, 0 replies; 19+ messages in thread
From: Jakub Kicinski @ 2023-03-22 23:30 UTC (permalink / raw)
To: davem; +Cc: netdev, edumazet, pabeni, willemb, alexander.duyck,
Jakub Kicinski
Convert ixgbe to use the new macros, I think a lot of people
copy the ixgbe code. The only functional change is that the
unlikely() in ixgbe_clean_tx_irq() turns into a likely()
inside the new macro and no longer includes
total_packets && netif_carrier_ok(tx_ring->netdev)
which is probably for the best, anyway.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 37 +++++--------------
1 file changed, 9 insertions(+), 28 deletions(-)
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 773c35fecace..db00e50a40ff 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -36,6 +36,7 @@
#include <net/tc_act/tc_mirred.h>
#include <net/vxlan.h>
#include <net/mpls.h>
+#include <net/netdev_queues.h>
#include <net/xdp_sock_drv.h>
#include <net/xfrm.h>
@@ -1253,20 +1254,12 @@ static bool ixgbe_clean_tx_irq(struct ixgbe_q_vector *q_vector,
total_packets, total_bytes);
#define TX_WAKE_THRESHOLD (DESC_NEEDED * 2)
- if (unlikely(total_packets && netif_carrier_ok(tx_ring->netdev) &&
- (ixgbe_desc_unused(tx_ring) >= TX_WAKE_THRESHOLD))) {
- /* Make sure that anybody stopping the queue after this
- * sees the new next_to_clean.
- */
- smp_mb();
- if (__netif_subqueue_stopped(tx_ring->netdev,
- tx_ring->queue_index)
- && !test_bit(__IXGBE_DOWN, &adapter->state)) {
- netif_wake_subqueue(tx_ring->netdev,
- tx_ring->queue_index);
- ++tx_ring->tx_stats.restart_queue;
- }
- }
+ if (total_packets && netif_carrier_ok(tx_ring->netdev) &&
+ !__netif_subqueue_maybe_wake(tx_ring->netdev, tx_ring->queue_index,
+ ixgbe_desc_unused(tx_ring),
+ TX_WAKE_THRESHOLD,
+ test_bit(__IXGBE_DOWN, &adapter->state)))
+ ++tx_ring->tx_stats.restart_queue;
return !!budget;
}
@@ -8270,22 +8263,10 @@ static void ixgbe_tx_olinfo_status(union ixgbe_adv_tx_desc *tx_desc,
static int __ixgbe_maybe_stop_tx(struct ixgbe_ring *tx_ring, u16 size)
{
- netif_stop_subqueue(tx_ring->netdev, tx_ring->queue_index);
-
- /* Herbert's original patch had:
- * smp_mb__after_netif_stop_queue();
- * but since that doesn't exist yet, just open code it.
- */
- smp_mb();
-
- /* We need to check again in a case another CPU has just
- * made room available.
- */
- if (likely(ixgbe_desc_unused(tx_ring) < size))
+ if (!netif_subqueue_try_stop(tx_ring->netdev, tx_ring->queue_index,
+ ixgbe_desc_unused(tx_ring), size))
return -EBUSY;
- /* A reprieve! - use start_queue because it doesn't call schedule */
- netif_start_subqueue(tx_ring->netdev, tx_ring->queue_index);
++tx_ring->tx_stats.restart_queue;
return 0;
}
--
2.39.2
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH net-next 3/3] bnxt: use new queue try_stop/try_wake macros
2023-03-22 23:30 [PATCH net-next 1/3] net: provide macros for commonly copied lockless queue stop/wake code Jakub Kicinski
2023-03-22 23:30 ` [PATCH net-next 2/3] ixgbe: use new queue try_stop/try_wake macros Jakub Kicinski
@ 2023-03-22 23:30 ` Jakub Kicinski
2023-03-23 0:35 ` [PATCH net-next 1/3] net: provide macros for commonly copied lockless queue stop/wake code Andrew Lunn
` (3 subsequent siblings)
5 siblings, 0 replies; 19+ messages in thread
From: Jakub Kicinski @ 2023-03-22 23:30 UTC (permalink / raw)
To: davem; +Cc: netdev, edumazet, pabeni, willemb, alexander.duyck,
Jakub Kicinski
Convert bnxt to use new macros rather than open code the logic.
Two differences:
(1) bnxt_tx_int() will now only issue a memory barrier if it sees
enough space on the ring to wake the queue. This should be fine,
the mb() is between the writes to the ring pointers and checking
queue state.
(2) we'll start the queue instead of waking on race, this should
be safe inside the xmit handler.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 41 +++++------------------
1 file changed, 8 insertions(+), 33 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index f533a8f46217..cbd48c192de3 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -56,6 +56,7 @@
#include <linux/hwmon-sysfs.h>
#include <net/page_pool.h>
#include <linux/align.h>
+#include <net/netdev_queues.h>
#include "bnxt_hsi.h"
#include "bnxt.h"
@@ -331,26 +332,6 @@ static void bnxt_txr_db_kick(struct bnxt *bp, struct bnxt_tx_ring_info *txr,
txr->kick_pending = 0;
}
-static bool bnxt_txr_netif_try_stop_queue(struct bnxt *bp,
- struct bnxt_tx_ring_info *txr,
- struct netdev_queue *txq)
-{
- netif_tx_stop_queue(txq);
-
- /* netif_tx_stop_queue() must be done before checking
- * tx index in bnxt_tx_avail() below, because in
- * bnxt_tx_int(), we update tx index before checking for
- * netif_tx_queue_stopped().
- */
- smp_mb();
- if (bnxt_tx_avail(bp, txr) >= bp->tx_wake_thresh) {
- netif_tx_wake_queue(txq);
- return false;
- }
-
- return true;
-}
-
static netdev_tx_t bnxt_start_xmit(struct sk_buff *skb, struct net_device *dev)
{
struct bnxt *bp = netdev_priv(dev);
@@ -384,7 +365,8 @@ static netdev_tx_t bnxt_start_xmit(struct sk_buff *skb, struct net_device *dev)
if (net_ratelimit() && txr->kick_pending)
netif_warn(bp, tx_err, dev,
"bnxt: ring busy w/ flush pending!\n");
- if (bnxt_txr_netif_try_stop_queue(bp, txr, txq))
+ if (!netif_tx_queue_try_stop(txq, bnxt_tx_avail(bp, txr),
+ bp->tx_wake_thresh))
return NETDEV_TX_BUSY;
}
@@ -614,7 +596,8 @@ static netdev_tx_t bnxt_start_xmit(struct sk_buff *skb, struct net_device *dev)
if (netdev_xmit_more() && !tx_buf->is_push)
bnxt_txr_db_kick(bp, txr, prod);
- bnxt_txr_netif_try_stop_queue(bp, txr, txq);
+ netif_tx_queue_try_stop(txq, bnxt_tx_avail(bp, txr),
+ bp->tx_wake_thresh);
}
return NETDEV_TX_OK;
@@ -708,17 +691,9 @@ static void bnxt_tx_int(struct bnxt *bp, struct bnxt_napi *bnapi, int nr_pkts)
netdev_tx_completed_queue(txq, nr_pkts, tx_bytes);
txr->tx_cons = cons;
- /* Need to make the tx_cons update visible to bnxt_start_xmit()
- * before checking for netif_tx_queue_stopped(). Without the
- * memory barrier, there is a small possibility that bnxt_start_xmit()
- * will miss it and cause the queue to be stopped forever.
- */
- smp_mb();
-
- if (unlikely(netif_tx_queue_stopped(txq)) &&
- bnxt_tx_avail(bp, txr) >= bp->tx_wake_thresh &&
- READ_ONCE(txr->dev_state) != BNXT_DEV_STATE_CLOSING)
- netif_tx_wake_queue(txq);
+ __netif_tx_queue_maybe_wake(txq, bnxt_tx_avail(bp, txr),
+ bp->tx_wake_thresh,
+ READ_ONCE(txr->dev_state) != BNXT_DEV_STATE_CLOSING);
}
static struct page *__bnxt_alloc_rx_page(struct bnxt *bp, dma_addr_t *mapping,
--
2.39.2
^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [PATCH net-next 1/3] net: provide macros for commonly copied lockless queue stop/wake code
2023-03-22 23:30 [PATCH net-next 1/3] net: provide macros for commonly copied lockless queue stop/wake code Jakub Kicinski
2023-03-22 23:30 ` [PATCH net-next 2/3] ixgbe: use new queue try_stop/try_wake macros Jakub Kicinski
2023-03-22 23:30 ` [PATCH net-next 3/3] bnxt: " Jakub Kicinski
@ 2023-03-23 0:35 ` Andrew Lunn
2023-03-23 1:04 ` Jakub Kicinski
2023-03-23 3:05 ` Yunsheng Lin
` (2 subsequent siblings)
5 siblings, 1 reply; 19+ messages in thread
From: Andrew Lunn @ 2023-03-23 0:35 UTC (permalink / raw)
To: Jakub Kicinski; +Cc: davem, netdev, edumazet, pabeni, willemb, alexander.duyck
On Wed, Mar 22, 2023 at 04:30:26PM -0700, Jakub Kicinski wrote:
> A lot of drivers follow the same scheme to stop / start queues
> without introducing locks between xmit and NAPI tx completions.
> I'm guessing they all copy'n'paste each other's code.
>
> Smaller drivers shy away from the scheme and introduce a lock
> which may cause deadlocks in netpoll.
Hi Jakub
I notice there is no patch 0/X. Seems like the above would be good
material for it, along with a comment that a few drivers are converted
to make use of the new macros.
Andrew
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH net-next 1/3] net: provide macros for commonly copied lockless queue stop/wake code
2023-03-23 0:35 ` [PATCH net-next 1/3] net: provide macros for commonly copied lockless queue stop/wake code Andrew Lunn
@ 2023-03-23 1:04 ` Jakub Kicinski
2023-03-23 21:02 ` Andrew Lunn
0 siblings, 1 reply; 19+ messages in thread
From: Jakub Kicinski @ 2023-03-23 1:04 UTC (permalink / raw)
To: Andrew Lunn
Cc: davem, netdev, edumazet, pabeni, willemb, alexander.duyck,
michael.chan, intel-wired-lan, anthony.l.nguyen, jesse.brandeburg
CC: maintainers, in case there isn't a repost
https://lore.kernel.org/all/20230322233028.269410-1-kuba@kernel.org/
On Thu, 23 Mar 2023 01:35:34 +0100 Andrew Lunn wrote:
> On Wed, Mar 22, 2023 at 04:30:26PM -0700, Jakub Kicinski wrote:
> > A lot of drivers follow the same scheme to stop / start queues
> > without introducing locks between xmit and NAPI tx completions.
> > I'm guessing they all copy'n'paste each other's code.
> >
> > Smaller drivers shy away from the scheme and introduce a lock
> > which may cause deadlocks in netpoll.
>
> I notice there is no patch 0/X. Seems like the above would be good
> material for it, along with a comment that a few drivers are converted
> to make use of the new macros.
Then do I repeat the same text in the commit? Or cut the commit down?
Doesn't that just take away information from the commit which will
show up in git blame?
Having a cover letter is a good default, and required if the series
is a larger change decomposed into steps. But here there is a major
change and a bunch of loose conversions. More sample users than
meaningful part.
LMK what your preference for splitting this info is, I'm unsure.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH net-next 1/3] net: provide macros for commonly copied lockless queue stop/wake code
2023-03-22 23:30 [PATCH net-next 1/3] net: provide macros for commonly copied lockless queue stop/wake code Jakub Kicinski
` (2 preceding siblings ...)
2023-03-23 0:35 ` [PATCH net-next 1/3] net: provide macros for commonly copied lockless queue stop/wake code Andrew Lunn
@ 2023-03-23 3:05 ` Yunsheng Lin
2023-03-23 3:27 ` Jakub Kicinski
2023-03-23 4:53 ` Pavan Chebbi
2023-03-23 16:05 ` Alexander H Duyck
5 siblings, 1 reply; 19+ messages in thread
From: Yunsheng Lin @ 2023-03-23 3:05 UTC (permalink / raw)
To: Jakub Kicinski, davem; +Cc: netdev, edumazet, pabeni, willemb, alexander.duyck
On 2023/3/23 7:30, Jakub Kicinski wrote:
> A lot of drivers follow the same scheme to stop / start queues
> without introducing locks between xmit and NAPI tx completions.
> I'm guessing they all copy'n'paste each other's code.
>
> Smaller drivers shy away from the scheme and introduce a lock
> which may cause deadlocks in netpoll.
>
> Provide macros which encapsulate the necessary logic.
>
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
> rfc: https://lore.kernel.org/all/20230311050130.115138-1-kuba@kernel.org/
> - perdicate -> predicate
> - on race use start instead of wake and make a note of that
> in the doc / comment at the start
> ---
> include/net/netdev_queues.h | 171 ++++++++++++++++++++++++++++++++++++
> 1 file changed, 171 insertions(+)
> create mode 100644 include/net/netdev_queues.h
>
> diff --git a/include/net/netdev_queues.h b/include/net/netdev_queues.h
> new file mode 100644
> index 000000000000..64e059647274
> --- /dev/null
> +++ b/include/net/netdev_queues.h
> @@ -0,0 +1,171 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _LINUX_NET_QUEUES_H
> +#define _LINUX_NET_QUEUES_H
> +
> +#include <linux/netdevice.h>
> +
> +/* Lockless queue stopping / waking helpers.
> + *
> + * These macros are designed to safely implement stopping and waking
> + * netdev queues without full lock protection. We assume that there can
> + * be no concurrent stop attempts and no concurrent wake attempts.
> + * The try-stop should happen from the xmit handler*, while wake up
unnecessary '*' after handler?
> + * should be triggered from NAPI poll context. The two may run
> + * concurrently (single producer, single consumer).
> + *
> + * All descriptor ring indexes (and other relevant shared state) must
> + * be updated before invoking the macros.
> + *
> + * * the try-stop side does not reschedule Tx (netif_tx_start_queue()
double '*' here?
> + * instead of netif_tx_wake_queue()) so uses outside of the xmit
> + * handler may lead to bugs
> + */
> +
> +#define netif_tx_queue_try_stop(txq, get_desc, start_thrs) \
> + ({ \
> + int _res; \
> + \
> + netif_tx_stop_queue(txq); \
> + \
> + smp_mb(); \
> + \
> + /* We need to check again in a case another \
> + * CPU has just made room available. \
> + */ \
> + if (likely(get_desc < start_thrs)) { \
> + _res = 0; \
> + } else { \
> + netif_tx_start_queue(txq); \
> + _res = -1; \
> + } \
> + _res; \
> + }) \
> +
> +/**
> + * netif_tx_queue_maybe_stop() - locklessly stop a Tx queue, if needed
> + * @txq: struct netdev_queue to stop/start
> + * @get_desc: get current number of free descriptors (see requirements below!)
> + * @stop_thrs: minimal number of available descriptors for queue to be left
> + * enabled
> + * @start_thrs: minimal number of descriptors to re-enable the queue, can be
> + * equal to @stop_thrs or higher to avoid frequent waking
> + *
> + * All arguments may be evaluated multiple times, beware of side effects.
> + * @get_desc must be a formula or a function call, it must always
> + * return up-to-date information when evaluated!
> + * Expected to be used from ndo_start_xmit, see the comment on top of the file.
For now,there seems to be three ways of calling *_maybe_stop():
1. called before transimting a skb.
2. called after transimting a skb.
3. called both before and after transimting a skb.
Which one should new driver follow?
Do we need to make it clear here?
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH net-next 1/3] net: provide macros for commonly copied lockless queue stop/wake code
2023-03-23 3:05 ` Yunsheng Lin
@ 2023-03-23 3:27 ` Jakub Kicinski
0 siblings, 0 replies; 19+ messages in thread
From: Jakub Kicinski @ 2023-03-23 3:27 UTC (permalink / raw)
To: Yunsheng Lin; +Cc: davem, netdev, edumazet, pabeni, willemb, alexander.duyck
On Thu, 23 Mar 2023 11:05:36 +0800 Yunsheng Lin wrote:
> > +/* Lockless queue stopping / waking helpers.
> > + *
> > + * These macros are designed to safely implement stopping and waking
> > + * netdev queues without full lock protection. We assume that there can
> > + * be no concurrent stop attempts and no concurrent wake attempts.
> > + * The try-stop should happen from the xmit handler*, while wake up
>
> unnecessary '*' after handler?
>
> > + * should be triggered from NAPI poll context. The two may run
> > + * concurrently (single producer, single consumer).
> > + *
> > + * All descriptor ring indexes (and other relevant shared state) must
> > + * be updated before invoking the macros.
> > + *
> > + * * the try-stop side does not reschedule Tx (netif_tx_start_queue()
>
> double '*' here?
It's a footnote..
> > + * instead of netif_tx_wake_queue()) so uses outside of the xmit
> > + * handler may lead to bugs
> > + */
> > +
> > +#define netif_tx_queue_try_stop(txq, get_desc, start_thrs) \
> > + ({ \
> > + int _res; \
> > + \
> > + netif_tx_stop_queue(txq); \
> > + \
> > + smp_mb(); \
> > + \
> > + /* We need to check again in a case another \
> > + * CPU has just made room available. \
> > + */ \
> > + if (likely(get_desc < start_thrs)) { \
> > + _res = 0; \
> > + } else { \
> > + netif_tx_start_queue(txq); \
> > + _res = -1; \
> > + } \
> > + _res; \
> > + }) \
> > +
> > +/**
> > + * netif_tx_queue_maybe_stop() - locklessly stop a Tx queue, if needed
> > + * @txq: struct netdev_queue to stop/start
> > + * @get_desc: get current number of free descriptors (see requirements below!)
> > + * @stop_thrs: minimal number of available descriptors for queue to be left
> > + * enabled
> > + * @start_thrs: minimal number of descriptors to re-enable the queue, can be
> > + * equal to @stop_thrs or higher to avoid frequent waking
> > + *
> > + * All arguments may be evaluated multiple times, beware of side effects.
> > + * @get_desc must be a formula or a function call, it must always
> > + * return up-to-date information when evaluated!
> > + * Expected to be used from ndo_start_xmit, see the comment on top of the file.
>
> For now,there seems to be three ways of calling *_maybe_stop():
> 1. called before transimting a skb.
> 2. called after transimting a skb.
> 3. called both before and after transimting a skb.
>
> Which one should new driver follow?
> Do we need to make it clear here?
After, the check before is more of a sanity check.
AFAIR netdevice.rst or some other place documents the stopping
expectations.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH net-next 1/3] net: provide macros for commonly copied lockless queue stop/wake code
2023-03-22 23:30 [PATCH net-next 1/3] net: provide macros for commonly copied lockless queue stop/wake code Jakub Kicinski
` (3 preceding siblings ...)
2023-03-23 3:05 ` Yunsheng Lin
@ 2023-03-23 4:53 ` Pavan Chebbi
2023-03-23 5:08 ` Jakub Kicinski
2023-03-23 16:05 ` Alexander H Duyck
5 siblings, 1 reply; 19+ messages in thread
From: Pavan Chebbi @ 2023-03-23 4:53 UTC (permalink / raw)
To: Jakub Kicinski; +Cc: davem, netdev, edumazet, pabeni, willemb, alexander.duyck
[-- Attachment #1: Type: text/plain, Size: 1844 bytes --]
On Thu, Mar 23, 2023 at 5:01 AM Jakub Kicinski <kuba@kernel.org> wrote:
> +
> +#define netif_tx_queue_try_stop(txq, get_desc, start_thrs) \
> + ({ \
> + int _res; \
> + \
> + netif_tx_stop_queue(txq); \
> + \
> + smp_mb(); \
> + \
> + /* We need to check again in a case another \
> + * CPU has just made room available. \
> + */ \
> + if (likely(get_desc < start_thrs)) { \
I am only curious to understand why initializing _res with likely
result and having a condition to cover only the unlikely case, would
not be better.
As in:
int _res = 0;
if (unlikely(get_desc >= start_thrs) {
start_queue()
_res = -1
}
> + _res = 0; \
> + } else { \
> + netif_tx_start_queue(txq); \
> + _res = -1; \
> + } \
> + _res; \
> + }) \
> +
> --
> 2.39.2
>
[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4209 bytes --]
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH net-next 1/3] net: provide macros for commonly copied lockless queue stop/wake code
2023-03-23 4:53 ` Pavan Chebbi
@ 2023-03-23 5:08 ` Jakub Kicinski
0 siblings, 0 replies; 19+ messages in thread
From: Jakub Kicinski @ 2023-03-23 5:08 UTC (permalink / raw)
To: Pavan Chebbi; +Cc: davem, netdev, edumazet, pabeni, willemb, alexander.duyck
On Thu, 23 Mar 2023 10:23:32 +0530 Pavan Chebbi wrote:
> > + /* We need to check again in a case another \
> > + * CPU has just made room available. \
> > + */ \
> > + if (likely(get_desc < start_thrs)) { \
>
> I am only curious to understand why initializing _res with likely
> result and having a condition to cover only the unlikely case, would
> not be better.
> As in:
> int _res = 0;
> if (unlikely(get_desc >= start_thrs) {
> start_queue()
> _res = -1
> }
I don't think it matters.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH net-next 1/3] net: provide macros for commonly copied lockless queue stop/wake code
2023-03-22 23:30 [PATCH net-next 1/3] net: provide macros for commonly copied lockless queue stop/wake code Jakub Kicinski
` (4 preceding siblings ...)
2023-03-23 4:53 ` Pavan Chebbi
@ 2023-03-23 16:05 ` Alexander H Duyck
2023-03-24 3:09 ` Jakub Kicinski
5 siblings, 1 reply; 19+ messages in thread
From: Alexander H Duyck @ 2023-03-23 16:05 UTC (permalink / raw)
To: Jakub Kicinski, davem; +Cc: netdev, edumazet, pabeni, willemb
On Wed, 2023-03-22 at 16:30 -0700, Jakub Kicinski wrote:
> A lot of drivers follow the same scheme to stop / start queues
> without introducing locks between xmit and NAPI tx completions.
> I'm guessing they all copy'n'paste each other's code.
>
> Smaller drivers shy away from the scheme and introduce a lock
> which may cause deadlocks in netpoll.
>
> Provide macros which encapsulate the necessary logic.
>
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
> rfc: https://lore.kernel.org/all/20230311050130.115138-1-kuba@kernel.org/
> - perdicate -> predicate
> - on race use start instead of wake and make a note of that
> in the doc / comment at the start
> ---
> include/net/netdev_queues.h | 171 ++++++++++++++++++++++++++++++++++++
> 1 file changed, 171 insertions(+)
> create mode 100644 include/net/netdev_queues.h
>
> diff --git a/include/net/netdev_queues.h b/include/net/netdev_queues.h
> new file mode 100644
> index 000000000000..64e059647274
> --- /dev/null
> +++ b/include/net/netdev_queues.h
> @@ -0,0 +1,171 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _LINUX_NET_QUEUES_H
> +#define _LINUX_NET_QUEUES_H
> +
> +#include <linux/netdevice.h>
> +
> +/* Lockless queue stopping / waking helpers.
> + *
> + * These macros are designed to safely implement stopping and waking
> + * netdev queues without full lock protection. We assume that there can
> + * be no concurrent stop attempts and no concurrent wake attempts.
> + * The try-stop should happen from the xmit handler*, while wake up
> + * should be triggered from NAPI poll context. The two may run
> + * concurrently (single producer, single consumer).
> + *
> + * All descriptor ring indexes (and other relevant shared state) must
> + * be updated before invoking the macros.
> + *
> + * * the try-stop side does not reschedule Tx (netif_tx_start_queue()
> + * instead of netif_tx_wake_queue()) so uses outside of the xmit
> + * handler may lead to bugs
> + */
> +
> +#define netif_tx_queue_try_stop(txq, get_desc, start_thrs) \
> + ({ \
> + int _res; \
> + \
> + netif_tx_stop_queue(txq); \
> + \
> + smp_mb(); \
> + \
> + /* We need to check again in a case another \
> + * CPU has just made room available. \
> + */ \
> + if (likely(get_desc < start_thrs)) { \
> + _res = 0; \
> + } else { \
> + netif_tx_start_queue(txq); \
> + _res = -1; \
> + } \
> + _res; \
> + }) \
> +
The issue I see here is that with this being a macro it abstracts away
the relationship between get_desc and the memory barrier. At a minimum
I think we should be treating get_desc as a function instead of just
passing it and its arguments as a single value. Maybe something more
like how read_poll_timeout passes the "op" and then uses it as a
function with args passed seperately. What we want to avoid is passing
a precomuted value to this function as get_desc.
In addition I think I would prefer to see _res initialized to the
likely value so that we can drop this to one case instead of having to
have two. Same thing for the macros below.
> +/**
> + * netif_tx_queue_maybe_stop() - locklessly stop a Tx queue, if needed
> + * @txq: struct netdev_queue to stop/start
> + * @get_desc: get current number of free descriptors (see requirements below!)
> + * @stop_thrs: minimal number of available descriptors for queue to be left
> + * enabled
> + * @start_thrs: minimal number of descriptors to re-enable the queue, can be
> + * equal to @stop_thrs or higher to avoid frequent waking
> + *
> + * All arguments may be evaluated multiple times, beware of side effects.
> + * @get_desc must be a formula or a function call, it must always
> + * return up-to-date information when evaluated!
> + * Expected to be used from ndo_start_xmit, see the comment on top of the file.
> + *
> + * Returns:
> + * 0 if the queue was stopped
> + * 1 if the queue was left enabled
> + * -1 if the queue was re-enabled (raced with waking)
> + */
We may want to change the values here. The most likely case is "left
enabled" with that being the case we probably want to make that the 0
case. I would then probably make 1 the re-enabled case and -1 the
stopped case.
With that the decision tree becomes more straightforward as we would do
something like:
if (result) {
if (result < 0)
Increment stopped stat
return
else
Increment restarted stat
}
In addition for readability we may want consider adding an enum simliar
to the netdev_tx enum as then we have the return types locked and
usable should we want to specifically pick out one.
> +#define netif_tx_queue_maybe_stop(txq, get_desc, stop_thrs, start_thrs) \
> + ({ \
> + int _res; \
> + \
> + if (likely(get_desc > stop_thrs)) \
> + _res = 1; \
> + else \
> + _res = netif_tx_queue_try_stop(txq, get_desc, \
> + start_thrs); \
> + _res; \
> + }) \
> +
> +#define __netif_tx_queue_try_wake(txq, get_desc, start_thrs, down_cond) \
> + ({ \
> + int _res; \
> + \
> + /* Make sure that anybody stopping the queue after \
> + * this sees the new next_to_clean. \
> + */ \
> + smp_mb(); \
> + if (netif_tx_queue_stopped(txq) && !(down_cond)) { \
> + netif_tx_wake_queue(txq); \
> + _res = 0; \
> + } else { \
> + _res = 1; \
> + } \
> + _res; \
> + })
> +
> +#define netif_tx_queue_try_wake(txq, get_desc, start_thrs) \
> + __netif_tx_queue_try_wake(txq, get_desc, start_thrs, false)
> +
> +/**
> + * __netif_tx_queue_maybe_wake() - locklessly wake a Tx queue, if needed
> + * @txq: struct netdev_queue to stop/start
> + * @get_desc: get current number of free descriptors (see requirements below!)
> + * @start_thrs: minimal number of descriptors to re-enable the queue
> + * @down_cond: down condition, predicate indicating that the queue should
> + * not be woken up even if descriptors are available
> + *
> + * All arguments may be evaluated multiple times.
> + * @get_desc must be a formula or a function call, it must always
> + * return up-to-date information when evaluated!
> + *
> + * Returns:
> + * 0 if the queue was woken up
> + * 1 if the queue was already enabled (or disabled but @down_cond is true)
> + * -1 if the queue was left stopped
> + */
I would go with the same here. The most common case should probably be
0 which would be "already enabled" with 1 being woken up and -1 being
stopped. In addition keeping the two consistent with each other would
allow for easier understanding of the two.
> +#define __netif_tx_queue_maybe_wake(txq, get_desc, start_thrs, down_cond) \
> + ({ \
> + int _res; \
> + \
> + if (likely(get_desc < start_thrs)) \
> + _res = -1; \
> + else \
> + _res = __netif_tx_queue_try_wake(txq, get_desc, \
> + start_thrs, \
> + down_cond); \
> + _res; \
> + })
> +
The likely here is probably not correct. In most cases the queue will
likely have enough descriptors to enable waking since Tx cleanup can
usually run pretty fast compared to the transmit path itself since it
can run without needing to take locks.
> +#define netif_tx_queue_maybe_wake(txq, get_desc, start_thrs) \
> + __netif_tx_queue_maybe_wake(txq, get_desc, start_thrs, false)
> +
> +/* subqueue variants follow */
> +
> +#define netif_subqueue_try_stop(dev, idx, get_desc, start_thrs) \
> + ({ \
> + struct netdev_queue *txq; \
> + \
> + txq = netdev_get_tx_queue(dev, idx); \
> + netif_tx_queue_try_stop(txq, get_desc, start_thrs); \
> + })
> +
> +#define netif_subqueue_maybe_stop(dev, idx, get_desc, stop_thrs, start_thrs) \
> + ({ \
> + struct netdev_queue *txq; \
> + \
> + txq = netdev_get_tx_queue(dev, idx); \
> + netif_tx_queue_maybe_stop(txq, get_desc, \
> + stop_thrs, start_thrs); \
> + })
> +
> +#define __netif_subqueue_try_wake(dev, idx, get_desc, start_thrs, down_cond) \
> + ({ \
> + struct netdev_queue *txq; \
> + \
> + txq = netdev_get_tx_queue(dev, idx); \
> + __netif_tx_queue_try_wake(txq, get_desc, \
> + start_thrs, down_cond); \
> + })
> +
> +#define netif_subqueue_try_wake(dev, idx, get_desc, start_thrs) \
> + __netif_subqueue_try_wake(dev, idx, get_desc, start_thrs, false)
> +
> +#define __netif_subqueue_maybe_wake(dev, idx, get_desc, start_thrs, down_cond) \
> + ({ \
> + struct netdev_queue *txq; \
> + \
> + txq = netdev_get_tx_queue(dev, idx); \
> + __netif_tx_queue_maybe_wake(txq, get_desc, \
> + start_thrs, down_cond); \
> + })
> +
> +#define netif_subqueue_maybe_wake(dev, idx, get_desc, start_thrs) \
> + __netif_subqueue_maybe_wake(dev, idx, get_desc, start_thrs, false)
> +
> +#endif
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH net-next 1/3] net: provide macros for commonly copied lockless queue stop/wake code
2023-03-23 1:04 ` Jakub Kicinski
@ 2023-03-23 21:02 ` Andrew Lunn
2023-03-23 22:46 ` Jakub Kicinski
0 siblings, 1 reply; 19+ messages in thread
From: Andrew Lunn @ 2023-03-23 21:02 UTC (permalink / raw)
To: Jakub Kicinski
Cc: davem, netdev, edumazet, pabeni, willemb, alexander.duyck,
michael.chan, intel-wired-lan, anthony.l.nguyen, jesse.brandeburg
On Wed, Mar 22, 2023 at 06:04:06PM -0700, Jakub Kicinski wrote:
> CC: maintainers, in case there isn't a repost
> https://lore.kernel.org/all/20230322233028.269410-1-kuba@kernel.org/
>
> On Thu, 23 Mar 2023 01:35:34 +0100 Andrew Lunn wrote:
> > On Wed, Mar 22, 2023 at 04:30:26PM -0700, Jakub Kicinski wrote:
> > > A lot of drivers follow the same scheme to stop / start queues
> > > without introducing locks between xmit and NAPI tx completions.
> > > I'm guessing they all copy'n'paste each other's code.
> > >
> > > Smaller drivers shy away from the scheme and introduce a lock
> > > which may cause deadlocks in netpoll.
> >
> > I notice there is no patch 0/X. Seems like the above would be good
> > material for it, along with a comment that a few drivers are converted
> > to make use of the new macros.
>
> Then do I repeat the same text in the commit? Or cut the commit down?
> Doesn't that just take away information from the commit which will
> show up in git blame?
>
> Having a cover letter is a good default, and required if the series
> is a larger change decomposed into steps. But here there is a major
> change and a bunch of loose conversions. More sample users than
> meaningful part.
>
> LMK what your preference for splitting this info is, I'm unsure.
We do seem to have a policy of asking for a 0/X. And it is used for
the merge commit. That is my real point. And i don't see why the text
can be repeated in the merge commit and the individual commits.
Andrew
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH net-next 1/3] net: provide macros for commonly copied lockless queue stop/wake code
2023-03-23 21:02 ` Andrew Lunn
@ 2023-03-23 22:46 ` Jakub Kicinski
0 siblings, 0 replies; 19+ messages in thread
From: Jakub Kicinski @ 2023-03-23 22:46 UTC (permalink / raw)
To: Andrew Lunn
Cc: davem, netdev, edumazet, pabeni, willemb, alexander.duyck,
michael.chan, intel-wired-lan, anthony.l.nguyen, jesse.brandeburg
On Thu, 23 Mar 2023 22:02:29 +0100 Andrew Lunn wrote:
> We do seem to have a policy of asking for a 0/X. And it is used for
> the merge commit. That is my real point. And i don't see why the text
> can be repeated in the merge commit and the individual commits.
Alright, I'll respin and add a cover. But I think we should be critical
of the rules. The rules are just a mental shortcut, we shouldn't lose
sight of why they are in place and maintain some flexibility.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH net-next 1/3] net: provide macros for commonly copied lockless queue stop/wake code
2023-03-23 16:05 ` Alexander H Duyck
@ 2023-03-24 3:09 ` Jakub Kicinski
2023-03-24 15:45 ` Alexander Duyck
0 siblings, 1 reply; 19+ messages in thread
From: Jakub Kicinski @ 2023-03-24 3:09 UTC (permalink / raw)
To: Alexander H Duyck; +Cc: davem, netdev, edumazet, pabeni, willemb
On Thu, 23 Mar 2023 09:05:35 -0700 Alexander H Duyck wrote:
> > +#define netif_tx_queue_try_stop(txq, get_desc, start_thrs) \
> > + ({ \
> > + int _res; \
> > + \
> > + netif_tx_stop_queue(txq); \
> > + \
> > + smp_mb(); \
> > + \
> > + /* We need to check again in a case another \
> > + * CPU has just made room available. \
> > + */ \
> > + if (likely(get_desc < start_thrs)) { \
> > + _res = 0; \
> > + } else { \
> > + netif_tx_start_queue(txq); \
> > + _res = -1; \
> > + } \
> > + _res; \
> > + }) \
> > +
>
> The issue I see here is that with this being a macro it abstracts away
> the relationship between get_desc and the memory barrier. At a minimum
> I think we should be treating get_desc as a function instead of just
> passing it and its arguments as a single value. Maybe something more
> like how read_poll_timeout passes the "op" and then uses it as a
> function with args passed seperately. What we want to avoid is passing
> a precomuted value to this function as get_desc.
The kdocs hopefully have enough warnings. The issue I see with
read_poll_timeout() is that I always have to have the definition
open side by side to match up the arguments. I wish there was
a way the test that something is not an lval, but I couldn't
find it :(
Let's see if anyone gets this wrong, you can tell me "I told you so"?
> In addition I think I would prefer to see _res initialized to the
> likely value so that we can drop this to one case instead of having to
> have two. Same thing for the macros below.
Alright.
> > +/**
> > + * netif_tx_queue_maybe_stop() - locklessly stop a Tx queue, if needed
> > + * @txq: struct netdev_queue to stop/start
> > + * @get_desc: get current number of free descriptors (see requirements below!)
> > + * @stop_thrs: minimal number of available descriptors for queue to be left
> > + * enabled
> > + * @start_thrs: minimal number of descriptors to re-enable the queue, can be
> > + * equal to @stop_thrs or higher to avoid frequent waking
> > + *
> > + * All arguments may be evaluated multiple times, beware of side effects.
> > + * @get_desc must be a formula or a function call, it must always
> > + * return up-to-date information when evaluated!
> > + * Expected to be used from ndo_start_xmit, see the comment on top of the file.
> > + *
> > + * Returns:
> > + * 0 if the queue was stopped
> > + * 1 if the queue was left enabled
> > + * -1 if the queue was re-enabled (raced with waking)
> > + */
>
> We may want to change the values here. The most likely case is "left
> enabled" with that being the case we probably want to make that the 0
> case. I would then probably make 1 the re-enabled case and -1 the
> stopped case.
I chose the return values this way because the important information is
whether the queue was in fact stopped (in case the macro is used at the
start of .xmit as a safety check). If stopped is zero caller can check
!ret vs !!ret.
Seems pretty normal for the kernel function called stop() to return 0
if it did stop.
> With that the decision tree becomes more straightforward as we would do
> something like:
> if (result) {
> if (result < 0)
> Increment stopped stat
> return
> else
> Increment restarted stat
> }
Do you see a driver where it matters? ixgbe and co. use
netif_tx_queue_try_stop() and again they just test stopped vs not stopped.
> In addition for readability we may want consider adding an enum simliar
> to the netdev_tx enum as then we have the return types locked and
> usable should we want to specifically pick out one.
Hm, I thought people generally dislike the netdev_tx enum.
Maybe it's just me.
> > +#define __netif_tx_queue_maybe_wake(txq, get_desc, start_thrs, down_cond) \
> > + ({ \
> > + int _res; \
> > + \
> > + if (likely(get_desc < start_thrs)) \
> > + _res = -1; \
> > + else \
> > + _res = __netif_tx_queue_try_wake(txq, get_desc, \
> > + start_thrs, \
> > + down_cond); \
> > + _res; \
> > + })
> > +
>
> The likely here is probably not correct. In most cases the queue will
> likely have enough descriptors to enable waking since Tx cleanup can
> usually run pretty fast compared to the transmit path itself since it
> can run without needing to take locks.
Good catch, must be a copy paste from the other direction without
inverting the likely.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH net-next 1/3] net: provide macros for commonly copied lockless queue stop/wake code
2023-03-24 3:09 ` Jakub Kicinski
@ 2023-03-24 15:45 ` Alexander Duyck
2023-03-24 21:28 ` Jakub Kicinski
0 siblings, 1 reply; 19+ messages in thread
From: Alexander Duyck @ 2023-03-24 15:45 UTC (permalink / raw)
To: Jakub Kicinski; +Cc: davem, netdev, edumazet, pabeni, willemb
On Thu, Mar 23, 2023 at 8:09 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Thu, 23 Mar 2023 09:05:35 -0700 Alexander H Duyck wrote:
> > > +#define netif_tx_queue_try_stop(txq, get_desc, start_thrs) \
> > > + ({ \
> > > + int _res; \
> > > + \
> > > + netif_tx_stop_queue(txq); \
> > > + \
> > > + smp_mb(); \
> > > + \
> > > + /* We need to check again in a case another \
> > > + * CPU has just made room available. \
> > > + */ \
> > > + if (likely(get_desc < start_thrs)) { \
> > > + _res = 0; \
> > > + } else { \
> > > + netif_tx_start_queue(txq); \
> > > + _res = -1; \
> > > + } \
> > > + _res; \
> > > + }) \
> > > +
> >
> > The issue I see here is that with this being a macro it abstracts away
> > the relationship between get_desc and the memory barrier. At a minimum
> > I think we should be treating get_desc as a function instead of just
> > passing it and its arguments as a single value. Maybe something more
> > like how read_poll_timeout passes the "op" and then uses it as a
> > function with args passed seperately. What we want to avoid is passing
> > a precomuted value to this function as get_desc.
>
> The kdocs hopefully have enough warnings. The issue I see with
> read_poll_timeout() is that I always have to have the definition
> open side by side to match up the arguments. I wish there was
> a way the test that something is not an lval, but I couldn't
> find it :(
>
> Let's see if anyone gets this wrong, you can tell me "I told you so"?
The setup for it makes me really uncomfortable. Passing a function w/
arguments as an argument itself usually implies that it is called
first, not during.
> > In addition I think I would prefer to see _res initialized to the
> > likely value so that we can drop this to one case instead of having to
> > have two. Same thing for the macros below.
>
> Alright.
>
> > > +/**
> > > + * netif_tx_queue_maybe_stop() - locklessly stop a Tx queue, if needed
> > > + * @txq: struct netdev_queue to stop/start
> > > + * @get_desc: get current number of free descriptors (see requirements below!)
> > > + * @stop_thrs: minimal number of available descriptors for queue to be left
> > > + * enabled
> > > + * @start_thrs: minimal number of descriptors to re-enable the queue, can be
> > > + * equal to @stop_thrs or higher to avoid frequent waking
> > > + *
> > > + * All arguments may be evaluated multiple times, beware of side effects.
> > > + * @get_desc must be a formula or a function call, it must always
> > > + * return up-to-date information when evaluated!
> > > + * Expected to be used from ndo_start_xmit, see the comment on top of the file.
> > > + *
> > > + * Returns:
> > > + * 0 if the queue was stopped
> > > + * 1 if the queue was left enabled
> > > + * -1 if the queue was re-enabled (raced with waking)
> > > + */
> >
> > We may want to change the values here. The most likely case is "left
> > enabled" with that being the case we probably want to make that the 0
> > case. I would then probably make 1 the re-enabled case and -1 the
> > stopped case.
>
> I chose the return values this way because the important information is
> whether the queue was in fact stopped (in case the macro is used at the
> start of .xmit as a safety check). If stopped is zero caller can check
> !ret vs !!ret.
>
> Seems pretty normal for the kernel function called stop() to return 0
> if it did stop.
Except this isn't "stop", this is "maybe stop". Maybe we should just
do away with the stop/wake messaging and go with something such as a
RTS/CTS type setup. Basically this function is acting as a RTS to
verify that we have room on the ring to place the frame. If we don't
we are stopped. The "wake" function is on what is essentially the
receiving end on the other side of the hardware after it has DMAed the
frames and is providing the CTS signal back.
> > With that the decision tree becomes more straightforward as we would do
> > something like:
> > if (result) {
> > if (result < 0)
> > Increment stopped stat
> > return
> > else
> > Increment restarted stat
> > }
>
> Do you see a driver where it matters? ixgbe and co. use
> netif_tx_queue_try_stop() and again they just test stopped vs not stopped.
The thing is in order to make this work for the ixgbe patch you didn't
use the maybe_stop instead you went with the try_stop. If you replaced
the ixgbe_maybe_stop_tx with your maybe stop would have to do
something such as the code above to make it work. That is what I am
getting at. From what I can tell the only real difference between
ixgbe_maybe_stop_tx and your maybe_stop is that you avoided having to
move the restart_queue stat increment out.
The general thought is I would prefer to keep it so that 0 is the
default most likely case in both where the queue is enabled and is
still enabled. By moving the "take action" items into the 1/-1 values
then it becomes much easier to sort them out with 1 being a stat
increment and -1 being an indication to stop transmitting and prep for
watchdog hang if we don't clear this in the next watchdog period.
Also in general it makes it easier to understand if these all work
with the same logic.
> > In addition for readability we may want consider adding an enum simliar
> > to the netdev_tx enum as then we have the return types locked and
> > usable should we want to specifically pick out one.
>
> Hm, I thought people generally dislike the netdev_tx enum.
> Maybe it's just me.
The thought I had with the enum is to more easily connect the outcomes
with the sources. It would also help to prevent any confusion on what
is what. Having the two stop/wake functions return different values is
a potential source for errors since 0/1 means different things in the
different functions. Basically since we have 3 possible outcomes using
the enum would make it very clear what the mapping is between the two.
> > > +#define __netif_tx_queue_maybe_wake(txq, get_desc, start_thrs, down_cond) \
> > > + ({ \
> > > + int _res; \
> > > + \
> > > + if (likely(get_desc < start_thrs)) \
> > > + _res = -1; \
> > > + else \
> > > + _res = __netif_tx_queue_try_wake(txq, get_desc, \
> > > + start_thrs, \
> > > + down_cond); \
> > > + _res; \
> > > + })
> > > +
> >
> > The likely here is probably not correct. In most cases the queue will
> > likely have enough descriptors to enable waking since Tx cleanup can
> > usually run pretty fast compared to the transmit path itself since it
> > can run without needing to take locks.
>
> Good catch, must be a copy paste from the other direction without
> inverting the likely.
Actually the original code had it there in ixgbe although the check
also included total_packets in it which implies that maybe the
assumption was that Tx cleanup wasn't occurring as often as Rx? Either
that or it was a carry-over and had a check for
__netif_subqueue_stopped included in there previously.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH net-next 1/3] net: provide macros for commonly copied lockless queue stop/wake code
2023-03-24 15:45 ` Alexander Duyck
@ 2023-03-24 21:28 ` Jakub Kicinski
2023-03-26 21:23 ` Alexander Duyck
0 siblings, 1 reply; 19+ messages in thread
From: Jakub Kicinski @ 2023-03-24 21:28 UTC (permalink / raw)
To: Alexander Duyck; +Cc: davem, netdev, edumazet, pabeni, willemb
On Fri, 24 Mar 2023 08:45:23 -0700 Alexander Duyck wrote:
> On Thu, Mar 23, 2023 at 8:09 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > > We may want to change the values here. The most likely case is "left
> > > enabled" with that being the case we probably want to make that the 0
> > > case. I would then probably make 1 the re-enabled case and -1 the
> > > stopped case.
> >
> > I chose the return values this way because the important information is
> > whether the queue was in fact stopped (in case the macro is used at the
> > start of .xmit as a safety check). If stopped is zero caller can check
> > !ret vs !!ret.
> >
> > Seems pretty normal for the kernel function called stop() to return 0
> > if it did stop.
>
> Except this isn't "stop", this is "maybe stop".
So the return value from try_stop and maybe_stop would be different?
try_stop needs to return 0 if it stopped - the same semantics as
trylock(), AFAIR. Not that I love those semantics, but it's a fairly
strong precedent.
> Maybe we should just
> do away with the stop/wake messaging and go with something such as a
> RTS/CTS type setup. Basically this function is acting as a RTS to
> verify that we have room on the ring to place the frame. If we don't
> we are stopped. The "wake" function is on what is essentially the
> receiving end on the other side of the hardware after it has DMAed the
> frames and is providing the CTS signal back.
I'm definitely open to different naming but wouldn't calling RTS _after_
send be even more confusing than maybe_stop?
> > > With that the decision tree becomes more straightforward as we would do
> > > something like:
> > > if (result) {
> > > if (result < 0)
> > > Increment stopped stat
> > > return
> > > else
> > > Increment restarted stat
> > > }
> >
> > Do you see a driver where it matters? ixgbe and co. use
> > netif_tx_queue_try_stop() and again they just test stopped vs not stopped.
>
> The thing is in order to make this work for the ixgbe patch you didn't
> use the maybe_stop instead you went with the try_stop. If you replaced
> the ixgbe_maybe_stop_tx with your maybe stop would have to do
> something such as the code above to make it work. That is what I am
> getting at. From what I can tell the only real difference between
> ixgbe_maybe_stop_tx and your maybe_stop is that you avoided having to
> move the restart_queue stat increment out.
I can convert ixgbe further, true, but I needed the try_stop, anyway,
because bnxt does:
if (/* need to stop */) {
if (xmit_more())
flush_db_write();
netif_tx_queue_try_stop();
}
which seems reasonable.
> The general thought is I would prefer to keep it so that 0 is the
> default most likely case in both where the queue is enabled and is
> still enabled. By moving the "take action" items into the 1/-1 values
> then it becomes much easier to sort them out with 1 being a stat
> increment and -1 being an indication to stop transmitting and prep for
> watchdog hang if we don't clear this in the next watchdog period.
Maybe worth taking a step back - the restart stat which ixgbe
maintains made perfect sense when you pioneered this approach but
I think we had a decade of use, and have kprobes now, so we don't
really need to maintain a statistic for a condition with no impact
to the user? New driver should not care 1 vs -1..
> Also in general it makes it easier to understand if these all work
> with the same logic.
>
> > > In addition for readability we may want consider adding an enum simliar
> > > to the netdev_tx enum as then we have the return types locked and
> > > usable should we want to specifically pick out one.
> >
> > Hm, I thought people generally dislike the netdev_tx enum.
> > Maybe it's just me.
>
> The thought I had with the enum is to more easily connect the outcomes
> with the sources. It would also help to prevent any confusion on what
> is what. Having the two stop/wake functions return different values is
> a potential source for errors since 0/1 means different things in the
> different functions. Basically since we have 3 possible outcomes using
> the enum would make it very clear what the mapping is between the two.
IMO only two outcomes matter in practice (as mentioned above).
I really like the ability to treat the return value as a bool, if only
we had negative zero we would have a perfect compromise :(
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH net-next 1/3] net: provide macros for commonly copied lockless queue stop/wake code
2023-03-24 21:28 ` Jakub Kicinski
@ 2023-03-26 21:23 ` Alexander Duyck
2023-03-29 0:56 ` Jakub Kicinski
0 siblings, 1 reply; 19+ messages in thread
From: Alexander Duyck @ 2023-03-26 21:23 UTC (permalink / raw)
To: Jakub Kicinski; +Cc: davem, netdev, edumazet, pabeni, willemb
On Fri, Mar 24, 2023 at 2:28 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Fri, 24 Mar 2023 08:45:23 -0700 Alexander Duyck wrote:
> > On Thu, Mar 23, 2023 at 8:09 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > > > We may want to change the values here. The most likely case is "left
> > > > enabled" with that being the case we probably want to make that the 0
> > > > case. I would then probably make 1 the re-enabled case and -1 the
> > > > stopped case.
> > >
> > > I chose the return values this way because the important information is
> > > whether the queue was in fact stopped (in case the macro is used at the
> > > start of .xmit as a safety check). If stopped is zero caller can check
> > > !ret vs !!ret.
> > >
> > > Seems pretty normal for the kernel function called stop() to return 0
> > > if it did stop.
> >
> > Except this isn't "stop", this is "maybe stop".
>
> So the return value from try_stop and maybe_stop would be different?
> try_stop needs to return 0 if it stopped - the same semantics as
> trylock(), AFAIR. Not that I love those semantics, but it's a fairly
> strong precedent.
The problem is this isn't a lock. Ideally with this we aren't taking
the action. So if anything this functions in my mind more like the
inverse where if this does stop we have to abort more like trylock
failing.
This is why I mentioned that maybe this should be renamed. I view this
more as a check to verify we are good to proceed. In addition there is
the problem that there are 3 possible outcomes with maybe_stop versus
the two from try_stop.
> > Maybe we should just
> > do away with the stop/wake messaging and go with something such as a
> > RTS/CTS type setup. Basically this function is acting as a RTS to
> > verify that we have room on the ring to place the frame. If we don't
> > we are stopped. The "wake" function is on what is essentially the
> > receiving end on the other side of the hardware after it has DMAed the
> > frames and is providing the CTS signal back.
>
> I'm definitely open to different naming but wouldn't calling RTS _after_
> send be even more confusing than maybe_stop?
We don't call maybe_stop after the send. For that we would be calling
try_stop. The difference being in the case of maybe_stop we might have
to return busy.
> > > > With that the decision tree becomes more straightforward as we would do
> > > > something like:
> > > > if (result) {
> > > > if (result < 0)
> > > > Increment stopped stat
> > > > return
> > > > else
> > > > Increment restarted stat
> > > > }
> > >
> > > Do you see a driver where it matters? ixgbe and co. use
> > > netif_tx_queue_try_stop() and again they just test stopped vs not stopped.
> >
> > The thing is in order to make this work for the ixgbe patch you didn't
> > use the maybe_stop instead you went with the try_stop. If you replaced
> > the ixgbe_maybe_stop_tx with your maybe stop would have to do
> > something such as the code above to make it work. That is what I am
> > getting at. From what I can tell the only real difference between
> > ixgbe_maybe_stop_tx and your maybe_stop is that you avoided having to
> > move the restart_queue stat increment out.
>
> I can convert ixgbe further, true, but I needed the try_stop, anyway,
> because bnxt does:
>
> if (/* need to stop */) {
> if (xmit_more())
> flush_db_write();
> netif_tx_queue_try_stop();
> }
>
> which seems reasonable.
I wasn't saying we didn't need try_stop. However the logic here
doesn't care about the return value. In the ixgbe case we track the
queue restarts so we would want a 0 on success and a non-zero if we
have to increment the stat. I would be okay with the 0 (success) / -1
(queue restarted) in this case.
> > The general thought is I would prefer to keep it so that 0 is the
> > default most likely case in both where the queue is enabled and is
> > still enabled. By moving the "take action" items into the 1/-1 values
> > then it becomes much easier to sort them out with 1 being a stat
> > increment and -1 being an indication to stop transmitting and prep for
> > watchdog hang if we don't clear this in the next watchdog period.
>
> Maybe worth taking a step back - the restart stat which ixgbe
> maintains made perfect sense when you pioneered this approach but
> I think we had a decade of use, and have kprobes now, so we don't
> really need to maintain a statistic for a condition with no impact
> to the user? New driver should not care 1 vs -1..
Actually the restart_queue stat is VERY useful for debugging. It tells
us we are seeing backlogs develop in the Tx queue. We track it any
time we wake up the queue, not just in the maybe_stop case.
WIthout that we are then having to break out kprobes and the like
which we could only add after-the-fact which makes things much harder
to debug when issues occur. For example, a common case to use it is to
monitor it when we see a system with slow Tx connections. With that
stat we can tell if we are building a backlog in the qdisc or if it is
something else such as a limited amount of socket memory is limiting
the transmits.
> > Also in general it makes it easier to understand if these all work
> > with the same logic.
> >
> > > > In addition for readability we may want consider adding an enum simliar
> > > > to the netdev_tx enum as then we have the return types locked and
> > > > usable should we want to specifically pick out one.
> > >
> > > Hm, I thought people generally dislike the netdev_tx enum.
> > > Maybe it's just me.
> >
> > The thought I had with the enum is to more easily connect the outcomes
> > with the sources. It would also help to prevent any confusion on what
> > is what. Having the two stop/wake functions return different values is
> > a potential source for errors since 0/1 means different things in the
> > different functions. Basically since we have 3 possible outcomes using
> > the enum would make it very clear what the mapping is between the two.
>
> IMO only two outcomes matter in practice (as mentioned above).
> I really like the ability to treat the return value as a bool, if only
> we had negative zero we would have a perfect compromise :(
I think we are just thinking about two different things. I am focusing
on the "maybe" calls that have 3 outcomes whereas I think you are
mostly focused on the "try" calls. My thought is to treat it something
like the msix allocation calls where a negative indicates a failure
forcing us to stop since the ring is full, 0 is a success, and a value
indicates that there are resources but they are/were limited.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH net-next 1/3] net: provide macros for commonly copied lockless queue stop/wake code
2023-03-26 21:23 ` Alexander Duyck
@ 2023-03-29 0:56 ` Jakub Kicinski
2023-03-30 14:56 ` Paolo Abeni
0 siblings, 1 reply; 19+ messages in thread
From: Jakub Kicinski @ 2023-03-29 0:56 UTC (permalink / raw)
To: Alexander Duyck, pabeni; +Cc: davem, netdev, edumazet, willemb
On Sun, 26 Mar 2023 14:23:07 -0700 Alexander Duyck wrote:
> > > Except this isn't "stop", this is "maybe stop".
> >
> > So the return value from try_stop and maybe_stop would be different?
> > try_stop needs to return 0 if it stopped - the same semantics as
> > trylock(), AFAIR. Not that I love those semantics, but it's a fairly
> > strong precedent.
>
> The problem is this isn't a lock. Ideally with this we aren't taking
> the action. So if anything this functions in my mind more like the
> inverse where if this does stop we have to abort more like trylock
> failing.
No.. for try_stop we are trying to stop.
> This is why I mentioned that maybe this should be renamed. I view this
> more as a check to verify we are good to proceed. In addition there is
> the problem that there are 3 possible outcomes with maybe_stop versus
> the two from try_stop.
I'm open to other names :S
> > > The thing is in order to make this work for the ixgbe patch you didn't
> > > use the maybe_stop instead you went with the try_stop. If you replaced
> > > the ixgbe_maybe_stop_tx with your maybe stop would have to do
> > > something such as the code above to make it work. That is what I am
> > > getting at. From what I can tell the only real difference between
> > > ixgbe_maybe_stop_tx and your maybe_stop is that you avoided having to
> > > move the restart_queue stat increment out.
> >
> > I can convert ixgbe further, true, but I needed the try_stop, anyway,
> > because bnxt does:
> >
> > if (/* need to stop */) {
> > if (xmit_more())
> > flush_db_write();
> > netif_tx_queue_try_stop();
> > }
> >
> > which seems reasonable.
>
> I wasn't saying we didn't need try_stop. However the logic here
> doesn't care about the return value. In the ixgbe case we track the
> queue restarts so we would want a 0 on success and a non-zero if we
> have to increment the stat. I would be okay with the 0 (success) / -1
> (queue restarted) in this case.
>
> > > The general thought is I would prefer to keep it so that 0 is the
> > > default most likely case in both where the queue is enabled and is
> > > still enabled. By moving the "take action" items into the 1/-1 values
> > > then it becomes much easier to sort them out with 1 being a stat
> > > increment and -1 being an indication to stop transmitting and prep for
> > > watchdog hang if we don't clear this in the next watchdog period.
> >
> > Maybe worth taking a step back - the restart stat which ixgbe
> > maintains made perfect sense when you pioneered this approach but
> > I think we had a decade of use, and have kprobes now, so we don't
> > really need to maintain a statistic for a condition with no impact
> > to the user? New driver should not care 1 vs -1..
>
> Actually the restart_queue stat is VERY useful for debugging. It tells
> us we are seeing backlogs develop in the Tx queue. We track it any
> time we wake up the queue, not just in the maybe_stop case.
>
> WIthout that we are then having to break out kprobes and the like
> which we could only add after-the-fact which makes things much harder
> to debug when issues occur. For example, a common case to use it is to
> monitor it when we see a system with slow Tx connections. With that
> stat we can tell if we are building a backlog in the qdisc or if it is
> something else such as a limited amount of socket memory is limiting
> the transmits.
Oh, I missed that wake uses the same stat. Let me clarify - the
stop/start counter is definitely useful. What I thought the restart
counter is counting is just the race cases. I don't think the race
cases are worth counting in any way.
> > > The thought I had with the enum is to more easily connect the outcomes
> > > with the sources. It would also help to prevent any confusion on what
> > > is what. Having the two stop/wake functions return different values is
> > > a potential source for errors since 0/1 means different things in the
> > > different functions. Basically since we have 3 possible outcomes using
> > > the enum would make it very clear what the mapping is between the two.
> >
> > IMO only two outcomes matter in practice (as mentioned above).
> > I really like the ability to treat the return value as a bool, if only
> > we had negative zero we would have a perfect compromise :(
>
> I think we are just thinking about two different things. I am focusing
> on the "maybe" calls that have 3 outcomes whereas I think you are
> mostly focused on the "try" calls. My thought is to treat it something
> like the msix allocation calls where a negative indicates a failure
> forcing us to stop since the ring is full, 0 is a success, and a value
> indicates that there are resources but they are/were limited.
I don't see a strong analogy to PCI resource allocation :(
I prefer to keep the 0 vs non-0 distinction to indicate whether
the action was performed.
Paolo, Eric, any opinion? Other than the one likely vs unlikely
flip -- is this good enough to merge for you?
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH net-next 1/3] net: provide macros for commonly copied lockless queue stop/wake code
2023-03-29 0:56 ` Jakub Kicinski
@ 2023-03-30 14:56 ` Paolo Abeni
0 siblings, 0 replies; 19+ messages in thread
From: Paolo Abeni @ 2023-03-30 14:56 UTC (permalink / raw)
To: Jakub Kicinski, Alexander Duyck; +Cc: davem, netdev, edumazet, willemb
Hi,
On Tue, 2023-03-28 at 17:56 -0700, Jakub Kicinski wrote:
> On Sun, 26 Mar 2023 14:23:07 -0700 Alexander Duyck wrote:
> > > > Except this isn't "stop", this is "maybe stop".
> > >
> > > So the return value from try_stop and maybe_stop would be different?
> > > try_stop needs to return 0 if it stopped - the same semantics as
> > > trylock(), AFAIR. Not that I love those semantics, but it's a fairly
> > > strong precedent.
> >
> > The problem is this isn't a lock. Ideally with this we aren't taking
> > the action. So if anything this functions in my mind more like the
> > inverse where if this does stop we have to abort more like trylock
> > failing.
>
> No.. for try_stop we are trying to stop.
>
> > This is why I mentioned that maybe this should be renamed. I view this
> > more as a check to verify we are good to proceed. In addition there is
> > the problem that there are 3 possible outcomes with maybe_stop versus
> > the two from try_stop.
>
> I'm open to other names :S
>
> > > > The thing is in order to make this work for the ixgbe patch you didn't
> > > > use the maybe_stop instead you went with the try_stop. If you replaced
> > > > the ixgbe_maybe_stop_tx with your maybe stop would have to do
> > > > something such as the code above to make it work. That is what I am
> > > > getting at. From what I can tell the only real difference between
> > > > ixgbe_maybe_stop_tx and your maybe_stop is that you avoided having to
> > > > move the restart_queue stat increment out.
> > >
> > > I can convert ixgbe further, true, but I needed the try_stop, anyway,
> > > because bnxt does:
> > >
> > > if (/* need to stop */) {
> > > if (xmit_more())
> > > flush_db_write();
> > > netif_tx_queue_try_stop();
> > > }
> > >
> > > which seems reasonable.
> >
> > I wasn't saying we didn't need try_stop. However the logic here
> > doesn't care about the return value. In the ixgbe case we track the
> > queue restarts so we would want a 0 on success and a non-zero if we
> > have to increment the stat. I would be okay with the 0 (success) / -1
> > (queue restarted) in this case.
> >
> > > > The general thought is I would prefer to keep it so that 0 is the
> > > > default most likely case in both where the queue is enabled and is
> > > > still enabled. By moving the "take action" items into the 1/-1 values
> > > > then it becomes much easier to sort them out with 1 being a stat
> > > > increment and -1 being an indication to stop transmitting and prep for
> > > > watchdog hang if we don't clear this in the next watchdog period.
> > >
> > > Maybe worth taking a step back - the restart stat which ixgbe
> > > maintains made perfect sense when you pioneered this approach but
> > > I think we had a decade of use, and have kprobes now, so we don't
> > > really need to maintain a statistic for a condition with no impact
> > > to the user? New driver should not care 1 vs -1..
> >
> > Actually the restart_queue stat is VERY useful for debugging. It tells
> > us we are seeing backlogs develop in the Tx queue. We track it any
> > time we wake up the queue, not just in the maybe_stop case.
> >
> > WIthout that we are then having to break out kprobes and the like
> > which we could only add after-the-fact which makes things much harder
> > to debug when issues occur. For example, a common case to use it is to
> > monitor it when we see a system with slow Tx connections. With that
> > stat we can tell if we are building a backlog in the qdisc or if it is
> > something else such as a limited amount of socket memory is limiting
> > the transmits.
>
> Oh, I missed that wake uses the same stat. Let me clarify - the
> stop/start counter is definitely useful. What I thought the restart
> counter is counting is just the race cases. I don't think the race
> cases are worth counting in any way.
>
> > > > The thought I had with the enum is to more easily connect the outcomes
> > > > with the sources. It would also help to prevent any confusion on what
> > > > is what. Having the two stop/wake functions return different values is
> > > > a potential source for errors since 0/1 means different things in the
> > > > different functions. Basically since we have 3 possible outcomes using
> > > > the enum would make it very clear what the mapping is between the two.
> > >
> > > IMO only two outcomes matter in practice (as mentioned above).
> > > I really like the ability to treat the return value as a bool, if only
> > > we had negative zero we would have a perfect compromise :(
> >
> > I think we are just thinking about two different things. I am focusing
> > on the "maybe" calls that have 3 outcomes whereas I think you are
> > mostly focused on the "try" calls. My thought is to treat it something
> > like the msix allocation calls where a negative indicates a failure
> > forcing us to stop since the ring is full, 0 is a success, and a value
> > indicates that there are resources but they are/were limited.
>
> I don't see a strong analogy to PCI resource allocation :(
>
> I prefer to keep the 0 vs non-0 distinction to indicate whether
> the action was performed.
>
> Paolo, Eric, any opinion? Other than the one likely vs unlikely
> flip -- is this good enough to merge for you?
As you know I'm usually horrible at name related choice, but you asked,
so...
I'm personally ok with the current naming, and AFAICS the coding style
guidelines suggest returning 0 when imperative functions complete
successfully.
I think we should apply the guidelines here, even if are talking about
macros.
That means netif_tx_queue_maybe_stop() and netif_tx_queue_try_stop()
should return 0 when the queue is actually stopped.
I'm personally fine with the current implementation.
Cheers,
Paolo
^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH net-next 2/3] ixgbe: use new queue try_stop/try_wake macros
2023-04-01 5:12 [PATCH net-next 0/3] " Jakub Kicinski
@ 2023-04-01 5:12 ` Jakub Kicinski
0 siblings, 0 replies; 19+ messages in thread
From: Jakub Kicinski @ 2023-04-01 5:12 UTC (permalink / raw)
To: davem
Cc: netdev, edumazet, pabeni, Jakub Kicinski, jesse.brandeburg,
anthony.l.nguyen
Convert ixgbe to use the new macros, I think a lot of people
copy the ixgbe code. The only functional change is that the
unlikely() in ixgbe_clean_tx_irq() turns into a likely()
inside the new macro and no longer includes
total_packets && netif_carrier_ok(tx_ring->netdev)
which is probably for the best, anyway.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
CC: jesse.brandeburg@intel.com
CC: anthony.l.nguyen@intel.com
---
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 37 +++++--------------
1 file changed, 9 insertions(+), 28 deletions(-)
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 773c35fecace..db00e50a40ff 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -36,6 +36,7 @@
#include <net/tc_act/tc_mirred.h>
#include <net/vxlan.h>
#include <net/mpls.h>
+#include <net/netdev_queues.h>
#include <net/xdp_sock_drv.h>
#include <net/xfrm.h>
@@ -1253,20 +1254,12 @@ static bool ixgbe_clean_tx_irq(struct ixgbe_q_vector *q_vector,
total_packets, total_bytes);
#define TX_WAKE_THRESHOLD (DESC_NEEDED * 2)
- if (unlikely(total_packets && netif_carrier_ok(tx_ring->netdev) &&
- (ixgbe_desc_unused(tx_ring) >= TX_WAKE_THRESHOLD))) {
- /* Make sure that anybody stopping the queue after this
- * sees the new next_to_clean.
- */
- smp_mb();
- if (__netif_subqueue_stopped(tx_ring->netdev,
- tx_ring->queue_index)
- && !test_bit(__IXGBE_DOWN, &adapter->state)) {
- netif_wake_subqueue(tx_ring->netdev,
- tx_ring->queue_index);
- ++tx_ring->tx_stats.restart_queue;
- }
- }
+ if (total_packets && netif_carrier_ok(tx_ring->netdev) &&
+ !__netif_subqueue_maybe_wake(tx_ring->netdev, tx_ring->queue_index,
+ ixgbe_desc_unused(tx_ring),
+ TX_WAKE_THRESHOLD,
+ test_bit(__IXGBE_DOWN, &adapter->state)))
+ ++tx_ring->tx_stats.restart_queue;
return !!budget;
}
@@ -8270,22 +8263,10 @@ static void ixgbe_tx_olinfo_status(union ixgbe_adv_tx_desc *tx_desc,
static int __ixgbe_maybe_stop_tx(struct ixgbe_ring *tx_ring, u16 size)
{
- netif_stop_subqueue(tx_ring->netdev, tx_ring->queue_index);
-
- /* Herbert's original patch had:
- * smp_mb__after_netif_stop_queue();
- * but since that doesn't exist yet, just open code it.
- */
- smp_mb();
-
- /* We need to check again in a case another CPU has just
- * made room available.
- */
- if (likely(ixgbe_desc_unused(tx_ring) < size))
+ if (!netif_subqueue_try_stop(tx_ring->netdev, tx_ring->queue_index,
+ ixgbe_desc_unused(tx_ring), size))
return -EBUSY;
- /* A reprieve! - use start_queue because it doesn't call schedule */
- netif_start_subqueue(tx_ring->netdev, tx_ring->queue_index);
++tx_ring->tx_stats.restart_queue;
return 0;
}
--
2.39.2
^ permalink raw reply related [flat|nested] 19+ messages in thread
end of thread, other threads:[~2023-04-01 5:12 UTC | newest]
Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-03-22 23:30 [PATCH net-next 1/3] net: provide macros for commonly copied lockless queue stop/wake code Jakub Kicinski
2023-03-22 23:30 ` [PATCH net-next 2/3] ixgbe: use new queue try_stop/try_wake macros Jakub Kicinski
2023-03-22 23:30 ` [PATCH net-next 3/3] bnxt: " Jakub Kicinski
2023-03-23 0:35 ` [PATCH net-next 1/3] net: provide macros for commonly copied lockless queue stop/wake code Andrew Lunn
2023-03-23 1:04 ` Jakub Kicinski
2023-03-23 21:02 ` Andrew Lunn
2023-03-23 22:46 ` Jakub Kicinski
2023-03-23 3:05 ` Yunsheng Lin
2023-03-23 3:27 ` Jakub Kicinski
2023-03-23 4:53 ` Pavan Chebbi
2023-03-23 5:08 ` Jakub Kicinski
2023-03-23 16:05 ` Alexander H Duyck
2023-03-24 3:09 ` Jakub Kicinski
2023-03-24 15:45 ` Alexander Duyck
2023-03-24 21:28 ` Jakub Kicinski
2023-03-26 21:23 ` Alexander Duyck
2023-03-29 0:56 ` Jakub Kicinski
2023-03-30 14:56 ` Paolo Abeni
-- strict thread matches above, loose matches on Subject: below --
2023-04-01 5:12 [PATCH net-next 0/3] " Jakub Kicinski
2023-04-01 5:12 ` [PATCH net-next 2/3] ixgbe: use new queue try_stop/try_wake macros Jakub Kicinski
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).