* [PATCH net-next 0/3] net: netpoll: Improve SKB pool management
@ 2024-10-25 14:20 Breno Leitao
2024-10-25 14:20 ` [PATCH net-next 1/3] net: netpoll: Defer skb_pool population until setup success Breno Leitao
` (2 more replies)
0 siblings, 3 replies; 17+ messages in thread
From: Breno Leitao @ 2024-10-25 14:20 UTC (permalink / raw)
To: kuba, horms, davem, edumazet, pabeni
Cc: thepacketgeek, netdev, linux-kernel, davej, vlad.wing, max,
kernel-team, jiri, jv, andy, aehkn
The netpoll subsystem pre-allocates 32 SKBs in a pool for emergency use
during out-of-memory conditions. However, the current implementation has
several inefficiencies:
* The SKB pool, once allocated, is never freed:
  * Resources remain allocated even after netpoll users are removed
  * Failed initialization can leave the pool populated forever
* The global pool design makes resource tracking difficult
This series addresses these issues through three patches:
Patch 1 ("net: netpoll: Defer skb_pool population until setup success"):
- Defer SKB pool population until setup validation passes
Patch 2 ("net: netpoll: Individualize the skb pool"):
- Replace global pool with per-user pools in netpoll struct
Patch 3 ("net: netpoll: flush skb pool during cleanup"):
- Properly free pool resources during netconsole cleanup
These changes improve resource management and make the code more
maintainable. As a side benefit, the improved structure would allow
netpoll to be modularized if desired in the future.
What is coming next?
Once this series is integrated, I am planning to have the SKB pool
refilled outside of the hot (send) path, in a work thread.
Breno Leitao (3):
net: netpoll: Defer skb_pool population until setup success
net: netpoll: Individualize the skb pool
net: netpoll: flush skb pool during cleanup
include/linux/netpoll.h | 1 +
net/core/netpoll.c | 52 +++++++++++++++++++++++++----------------
2 files changed, 33 insertions(+), 20 deletions(-)
--
2.43.5
* [PATCH net-next 1/3] net: netpoll: Defer skb_pool population until setup success
2024-10-25 14:20 [PATCH net-next 0/3] net: netpoll: Improve SKB pool management Breno Leitao
@ 2024-10-25 14:20 ` Breno Leitao
2024-11-01 1:26 ` Jakub Kicinski
2024-10-25 14:20 ` [PATCH net-next 2/3] net: netpoll: Individualize the skb pool Breno Leitao
2024-10-25 14:20 ` [PATCH net-next 3/3] net: netpoll: flush skb pool during cleanup Breno Leitao
2 siblings, 1 reply; 17+ messages in thread
From: Breno Leitao @ 2024-10-25 14:20 UTC (permalink / raw)
To: kuba, horms, davem, edumazet, pabeni
Cc: thepacketgeek, netdev, linux-kernel, davej, vlad.wing, max,
kernel-team, jiri, jv, andy, aehkn, Rik van Riel, Al Viro
The current implementation has a flaw where it populates the skb_pool
with 32 SKBs before calling __netpoll_setup(). If the setup fails, the
skb_pool buffer will persist indefinitely and never be cleaned up.
This change moves the skb_pool population to after the successful
completion of __netpoll_setup(), ensuring that the buffers are not
unnecessarily retained. Additionally, this modification alleviates rtnl
lock pressure by allowing the buffer filling to occur outside of the
lock.
Signed-off-by: Breno Leitao <leitao@debian.org>
---
net/core/netpoll.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index aa49b92e9194..e83fd8bdce36 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -772,13 +772,13 @@ int netpoll_setup(struct netpoll *np)
}
}
- /* fill up the skb queue */
- refill_skbs();
-
err = __netpoll_setup(np, ndev);
if (err)
goto put;
rtnl_unlock();
+
+ /* fill up the skb queue */
+ refill_skbs();
return 0;
put:
--
2.43.5
* [PATCH net-next 2/3] net: netpoll: Individualize the skb pool
2024-10-25 14:20 [PATCH net-next 0/3] net: netpoll: Improve SKB pool management Breno Leitao
2024-10-25 14:20 ` [PATCH net-next 1/3] net: netpoll: Defer skb_pool population until setup success Breno Leitao
@ 2024-10-25 14:20 ` Breno Leitao
2024-11-01 1:28 ` Jakub Kicinski
2024-10-25 14:20 ` [PATCH net-next 3/3] net: netpoll: flush skb pool during cleanup Breno Leitao
2 siblings, 1 reply; 17+ messages in thread
From: Breno Leitao @ 2024-10-25 14:20 UTC (permalink / raw)
To: kuba, horms, davem, edumazet, pabeni
Cc: thepacketgeek, netdev, linux-kernel, davej, vlad.wing, max,
kernel-team, jiri, jv, andy, aehkn, Rik van Riel, Al Viro
The current implementation of the netpoll system uses a global skb pool,
which can lead to inefficient memory usage and waste when targets are
disabled or no longer in use.
This can result in a significant amount of memory being unnecessarily
allocated and retained, potentially causing performance issues and
limiting the availability of resources for other system components.
Modify the netpoll system to assign a skb pool to each target instead of
using a global one.
This approach allows for more fine-grained control over memory
allocation and deallocation, ensuring that resources are only allocated
and retained as needed.
Signed-off-by: Breno Leitao <leitao@debian.org>
---
include/linux/netpoll.h | 1 +
net/core/netpoll.c | 29 +++++++++++------------------
2 files changed, 12 insertions(+), 18 deletions(-)
diff --git a/include/linux/netpoll.h b/include/linux/netpoll.h
index cd4e28db0cbd..77635b885c18 100644
--- a/include/linux/netpoll.h
+++ b/include/linux/netpoll.h
@@ -32,6 +32,7 @@ struct netpoll {
bool ipv6;
u16 local_port, remote_port;
u8 remote_mac[ETH_ALEN];
+ struct sk_buff_head skb_pool;
};
struct netpoll_info {
diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index e83fd8bdce36..c9a9e37e2d74 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -45,9 +45,6 @@
#define MAX_UDP_CHUNK 1460
#define MAX_SKBS 32
-
-static struct sk_buff_head skb_pool;
-
#define USEC_PER_POLL 50
#define MAX_SKB_SIZE \
@@ -234,20 +231,23 @@ void netpoll_poll_enable(struct net_device *dev)
up(&ni->dev_lock);
}
-static void refill_skbs(void)
+static void refill_skbs(struct netpoll *np)
{
+ struct sk_buff_head *skb_pool;
struct sk_buff *skb;
unsigned long flags;
- spin_lock_irqsave(&skb_pool.lock, flags);
- while (skb_pool.qlen < MAX_SKBS) {
+ skb_pool = &np->skb_pool;
+
+ spin_lock_irqsave(&skb_pool->lock, flags);
+ while (skb_pool->qlen < MAX_SKBS) {
skb = alloc_skb(MAX_SKB_SIZE, GFP_ATOMIC);
if (!skb)
break;
- __skb_queue_tail(&skb_pool, skb);
+ __skb_queue_tail(skb_pool, skb);
}
- spin_unlock_irqrestore(&skb_pool.lock, flags);
+ spin_unlock_irqrestore(&skb_pool->lock, flags);
}
static void zap_completion_queue(void)
@@ -284,12 +284,12 @@ static struct sk_buff *find_skb(struct netpoll *np, int len, int reserve)
struct sk_buff *skb;
zap_completion_queue();
- refill_skbs();
+ refill_skbs(np);
repeat:
skb = alloc_skb(len, GFP_ATOMIC);
if (!skb)
- skb = skb_dequeue(&skb_pool);
+ skb = skb_dequeue(&np->skb_pool);
if (!skb) {
if (++count < 10) {
@@ -778,7 +778,7 @@ int netpoll_setup(struct netpoll *np)
rtnl_unlock();
/* fill up the skb queue */
- refill_skbs();
+ refill_skbs(np);
return 0;
put:
@@ -792,13 +792,6 @@ int netpoll_setup(struct netpoll *np)
}
EXPORT_SYMBOL(netpoll_setup);
-static int __init netpoll_init(void)
-{
- skb_queue_head_init(&skb_pool);
- return 0;
-}
-core_initcall(netpoll_init);
-
static void rcu_cleanup_netpoll_info(struct rcu_head *rcu_head)
{
struct netpoll_info *npinfo =
--
2.43.5
* [PATCH net-next 3/3] net: netpoll: flush skb pool during cleanup
2024-10-25 14:20 [PATCH net-next 0/3] net: netpoll: Improve SKB pool management Breno Leitao
2024-10-25 14:20 ` [PATCH net-next 1/3] net: netpoll: Defer skb_pool population until setup success Breno Leitao
2024-10-25 14:20 ` [PATCH net-next 2/3] net: netpoll: Individualize the skb pool Breno Leitao
@ 2024-10-25 14:20 ` Breno Leitao
2024-11-01 1:29 ` Jakub Kicinski
2 siblings, 1 reply; 17+ messages in thread
From: Breno Leitao @ 2024-10-25 14:20 UTC (permalink / raw)
To: kuba, horms, davem, edumazet, pabeni
Cc: thepacketgeek, netdev, linux-kernel, davej, vlad.wing, max,
kernel-team, jiri, jv, andy, aehkn, Rik van Riel, Al Viro
The netpoll subsystem maintains a pool of 32 pre-allocated SKBs per
instance, but these SKBs are not freed when the netpoll user is brought
down. This leads to memory waste as these buffers remain allocated but
unused.
Add skb_pool_flush() to properly clean up these SKBs when netconsole is
terminated, improving memory efficiency.
Signed-off-by: Breno Leitao <leitao@debian.org>
---
net/core/netpoll.c | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)
diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index c9a9e37e2d74..bf2064d689d5 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -656,6 +656,7 @@ int __netpoll_setup(struct netpoll *np, struct net_device *ndev)
/* last thing to do is link it to the net device structure */
rcu_assign_pointer(ndev->npinfo, npinfo);
+ skb_queue_head_init(&np->skb_pool);
return 0;
@@ -809,6 +810,22 @@ static void rcu_cleanup_netpoll_info(struct rcu_head *rcu_head)
kfree(npinfo);
}
+static void skb_pool_flush(struct netpoll *np)
+{
+ struct sk_buff_head *skb_pool;
+ struct sk_buff *skb;
+ unsigned long flags;
+
+ skb_pool = &np->skb_pool;
+
+ spin_lock_irqsave(&skb_pool->lock, flags);
+ while (skb_pool->qlen > 0) {
+ skb = __skb_dequeue(skb_pool);
+ kfree_skb(skb);
+ }
+ spin_unlock_irqrestore(&skb_pool->lock, flags);
+}
+
void __netpoll_cleanup(struct netpoll *np)
{
struct netpoll_info *npinfo;
@@ -828,6 +845,8 @@ void __netpoll_cleanup(struct netpoll *np)
call_rcu(&npinfo->rcu, rcu_cleanup_netpoll_info);
} else
RCU_INIT_POINTER(np->dev->npinfo, NULL);
+
+ skb_pool_flush(np);
}
EXPORT_SYMBOL_GPL(__netpoll_cleanup);
--
2.43.5
* Re: [PATCH net-next 1/3] net: netpoll: Defer skb_pool population until setup success
2024-10-25 14:20 ` [PATCH net-next 1/3] net: netpoll: Defer skb_pool population until setup success Breno Leitao
@ 2024-11-01 1:26 ` Jakub Kicinski
2024-11-01 10:51 ` Breno Leitao
0 siblings, 1 reply; 17+ messages in thread
From: Jakub Kicinski @ 2024-11-01 1:26 UTC (permalink / raw)
To: Breno Leitao
Cc: horms, davem, edumazet, pabeni, thepacketgeek, netdev,
linux-kernel, davej, vlad.wing, max, kernel-team, jiri, jv, andy,
aehkn, Rik van Riel, Al Viro
On Fri, 25 Oct 2024 07:20:18 -0700 Breno Leitao wrote:
> The current implementation has a flaw where it populates the skb_pool
> with 32 SKBs before calling __netpoll_setup(). If the setup fails, the
> skb_pool buffer will persist indefinitely and never be cleaned up.
>
> This change moves the skb_pool population to after the successful
> completion of __netpoll_setup(), ensuring that the buffers are not
> unnecessarily retained. Additionally, this modification alleviates rtnl
> lock pressure by allowing the buffer filling to occur outside of the
> lock.
arguably if the setup succeeds there would now be a window of time
where np is active but pool is empty.
* Re: [PATCH net-next 2/3] net: netpoll: Individualize the skb pool
2024-10-25 14:20 ` [PATCH net-next 2/3] net: netpoll: Individualize the skb pool Breno Leitao
@ 2024-11-01 1:28 ` Jakub Kicinski
2024-11-01 11:56 ` Breno Leitao
0 siblings, 1 reply; 17+ messages in thread
From: Jakub Kicinski @ 2024-11-01 1:28 UTC (permalink / raw)
To: Breno Leitao
Cc: horms, davem, edumazet, pabeni, thepacketgeek, netdev,
linux-kernel, davej, vlad.wing, max, kernel-team, jiri, jv, andy,
aehkn, Rik van Riel, Al Viro
On Fri, 25 Oct 2024 07:20:19 -0700 Breno Leitao wrote:
> The current implementation of the netpoll system uses a global skb pool,
> which can lead to inefficient memory usage and waste when targets are
> disabled or no longer in use.
>
> This can result in a significant amount of memory being unnecessarily
> allocated and retained, potentially causing performance issues and
> limiting the availability of resources for other system components.
>
> Modify the netpoll system to assign a skb pool to each target instead of
> using a global one.
>
> This approach allows for more fine-grained control over memory
> allocation and deallocation, ensuring that resources are only allocated
> and retained as needed.
If memory consumption is a concern then having n pools for n targets
rather than one seems even worse?
Is it not better to flush the pool when last target gets disabled?
* Re: [PATCH net-next 3/3] net: netpoll: flush skb pool during cleanup
2024-10-25 14:20 ` [PATCH net-next 3/3] net: netpoll: flush skb pool during cleanup Breno Leitao
@ 2024-11-01 1:29 ` Jakub Kicinski
2024-11-01 11:57 ` Breno Leitao
0 siblings, 1 reply; 17+ messages in thread
From: Jakub Kicinski @ 2024-11-01 1:29 UTC (permalink / raw)
To: Breno Leitao
Cc: horms, davem, edumazet, pabeni, thepacketgeek, netdev,
linux-kernel, davej, vlad.wing, max, kernel-team, jiri, jv, andy,
aehkn, Rik van Riel, Al Viro
On Fri, 25 Oct 2024 07:20:20 -0700 Breno Leitao wrote:
> + skb_queue_head_init(&np->skb_pool);
This belongs in the previous patch
--
pw-bot: cr
* Re: [PATCH net-next 1/3] net: netpoll: Defer skb_pool population until setup success
2024-11-01 1:26 ` Jakub Kicinski
@ 2024-11-01 10:51 ` Breno Leitao
2024-11-01 18:18 ` Breno Leitao
0 siblings, 1 reply; 17+ messages in thread
From: Breno Leitao @ 2024-11-01 10:51 UTC (permalink / raw)
To: Jakub Kicinski
Cc: horms, davem, edumazet, pabeni, thepacketgeek, netdev,
linux-kernel, davej, vlad.wing, max, kernel-team, jiri, jv, andy,
aehkn, Rik van Riel, Al Viro
Hello Jakub,
On Thu, Oct 31, 2024 at 06:26:47PM -0700, Jakub Kicinski wrote:
> On Fri, 25 Oct 2024 07:20:18 -0700 Breno Leitao wrote:
> > The current implementation has a flaw where it populates the skb_pool
> > with 32 SKBs before calling __netpoll_setup(). If the setup fails, the
> > skb_pool buffer will persist indefinitely and never be cleaned up.
> >
> > This change moves the skb_pool population to after the successful
> > completion of __netpoll_setup(), ensuring that the buffers are not
> > unnecessarily retained. Additionally, this modification alleviates rtnl
> > lock pressure by allowing the buffer filling to occur outside of the
> > lock.
>
> arguably if the setup succeeds there would now be a window of time
> where np is active but pool is empty.
I am not convinced this is a problem. Given that netpoll_setup() is only
called from netconsole.
In netconsole, a target is not enabled (as in sending packets) until the
netconsole target is, in fact, enabled (nt->enabled = true). Enabling
the target (nt) only happens after netpoll_setup() returns successfully.
Example:
static void write_ext_msg(struct console *con, const char *msg,
unsigned int len)
{
...
list_for_each_entry(nt, &target_list, list)
if (nt->extended && nt->enabled && netif_running(nt->np.dev))
send_ext_msg_udp(nt, msg, len);
So, back to your point, the netpoll interface will be up, but not used
at all.
On top of that, two other considerations:
* If the netpoll target is used before the pool is populated, it is not a big
deal, since refill_skbs() is called regardless of whether the pool is full
or not. (Which is not ideal, and something I eventually want to improve.)
Anyway, this is how the code works today:
void netpoll_send_udp(struct netpoll *np, const char *msg, int len)
{
...
skb = find_skb(np, total_len + np->dev->needed_tailroom,...
// transmit the skb
static struct sk_buff *find_skb(struct netpoll *np, int len, int reserve)
{
...
refill_skbs(np);
skb = alloc_skb(len, GFP_ATOMIC);
if (!skb)
skb = skb_dequeue(&np->skb_pool);
...
// return the skb
So, even if there is a transmission in the window between enabling the netpoll
target and populating the pool (which is NOT the case in the code
today), it would not be a problem, given that netpoll_send_udp() will
call refill_skbs() anyway.
I have an in-development patch to improve it by deferring this to a
work thread, mainly because this whole allocation dance is done with a
bunch of locks held, including the printk/console lock.
I think that a better mechanism might be something like:
* If find_skb() needs to consume from the pool (which is rare, only
when alloc_skb() fails), raise a work thread that tries to repopulate the
pool in the background.
* Eventually avoid alloc_skb() first and get the skb directly from the
pool; if the pool is depleted, try alloc_skb(GFP_ATOMIC).
This might make the code faster, but I don't have data yet.
This might also require making the netpoll pool size configurable. Today
it is hardcoded (#define MAX_SKBS 32). This current patchset is the
first step to individualize the pool; then, we can have a field in
struct netpoll that specifies the pool size (32 by default) but is
user configurable.
On netconsole, we can expose it through the configfs fields.
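Roughly what I have in mind, on top of the refill_skbs() from patch 2
(the pool_size field and its default are purely illustrative, nothing
like this exists in the tree yet):

struct netpoll {
	...
	struct sk_buff_head skb_pool;
	u32 pool_size;	/* hypothetical knob, MAX_SKBS (32) by default */
};

static void refill_skbs(struct netpoll *np)
{
	struct sk_buff_head *skb_pool = &np->skb_pool;
	struct sk_buff *skb;
	unsigned long flags;

	spin_lock_irqsave(&skb_pool->lock, flags);
	/* same loop as today, just bounded by the per-target size
	 * instead of the hardcoded MAX_SKBS
	 */
	while (skb_pool->qlen < np->pool_size) {
		skb = alloc_skb(MAX_SKB_SIZE, GFP_ATOMIC);
		if (!skb)
			break;
		__skb_queue_tail(skb_pool, skb);
	}
	spin_unlock_irqrestore(&skb_pool->lock, flags);
}

netconsole would then only need to parse its configfs attribute and
write the value into np->pool_size before calling netpoll_setup().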
Anyway, are these ideas too crazy?
* Re: [PATCH net-next 2/3] net: netpoll: Individualize the skb pool
2024-11-01 1:28 ` Jakub Kicinski
@ 2024-11-01 11:56 ` Breno Leitao
0 siblings, 0 replies; 17+ messages in thread
From: Breno Leitao @ 2024-11-01 11:56 UTC (permalink / raw)
To: Jakub Kicinski
Cc: horms, davem, edumazet, pabeni, thepacketgeek, netdev,
linux-kernel, davej, vlad.wing, max, kernel-team, jiri, jv, andy,
aehkn, Rik van Riel, Al Viro
On Thu, Oct 31, 2024 at 06:28:57PM -0700, Jakub Kicinski wrote:
> On Fri, 25 Oct 2024 07:20:19 -0700 Breno Leitao wrote:
> > The current implementation of the netpoll system uses a global skb pool,
> > which can lead to inefficient memory usage and waste when targets are
> > disabled or no longer in use.
> >
> > This can result in a significant amount of memory being unnecessarily
> > allocated and retained, potentially causing performance issues and
> > limiting the availability of resources for other system components.
> >
> > Modify the netpoll system to assign a skb pool to each target instead of
> > using a global one.
> >
> > This approach allows for more fine-grained control over memory
> > allocation and deallocation, ensuring that resources are only allocated
> > and retained as needed.
>
> If memory consumption is a concern then having n pools for n targets
> rather than one seems even worse?
>
> Is it not better to flush the pool when last target gets disabled?
That is an option as well: we can create a refcount and flush the pool
when it reaches zero. This will require some core reorganization due
to the way the buffers are initialized (at early init), but it is totally
doable. In fact, this is how I started this patchset.
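For reference, the refcount variant would look roughly like this (the
helper names are made up, and it assumes the global pool keeps being
initialized at core_initcall time, as it is today):

static struct sk_buff_head skb_pool;
static int skb_pool_users;	/* both setup and cleanup run under rtnl */

static void netpoll_pool_get(void)
{
	/* first user populates the shared reserve */
	if (skb_pool_users++ == 0)
		refill_skbs();
}

static void netpoll_pool_put(void)
{
	/* last user going away frees the reserve skbs */
	if (--skb_pool_users == 0)
		skb_queue_purge(&skb_pool);
}

netpoll_setup() would call netpoll_pool_get() once it succeeds, and
__netpoll_cleanup() would call netpoll_pool_put().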
On the other side, it seems a better design to have a pool per user, and
they are independent and not sharing the same pool.
In fact, if the pool is per-user, we can make the whole netpoll
transmission faster by just dequeuing an skb from the pool instead of
trying to allocate a fresh skb.
* Re: [PATCH net-next 3/3] net: netpoll: flush skb pool during cleanup
2024-11-01 1:29 ` Jakub Kicinski
@ 2024-11-01 11:57 ` Breno Leitao
0 siblings, 0 replies; 17+ messages in thread
From: Breno Leitao @ 2024-11-01 11:57 UTC (permalink / raw)
To: Jakub Kicinski
Cc: horms, davem, edumazet, pabeni, thepacketgeek, netdev,
linux-kernel, davej, vlad.wing, max, kernel-team, jiri, jv, andy,
aehkn, Rik van Riel, Al Viro
On Thu, Oct 31, 2024 at 06:29:27PM -0700, Jakub Kicinski wrote:
> On Fri, 25 Oct 2024 07:20:20 -0700 Breno Leitao wrote:
> > + skb_queue_head_init(&np->skb_pool);
>
> This belongs in the previous patch
Agree. Good catch. I will update.
* Re: [PATCH net-next 1/3] net: netpoll: Defer skb_pool population until setup success
2024-11-01 10:51 ` Breno Leitao
@ 2024-11-01 18:18 ` Breno Leitao
2024-11-02 2:01 ` Jakub Kicinski
0 siblings, 1 reply; 17+ messages in thread
From: Breno Leitao @ 2024-11-01 18:18 UTC (permalink / raw)
To: Jakub Kicinski
Cc: horms, davem, edumazet, pabeni, thepacketgeek, netdev,
linux-kernel, davej, vlad.wing, max, kernel-team, jiri, jv, andy,
aehkn, Rik van Riel, Al Viro
On Fri, Nov 01, 2024 at 03:51:59AM -0700, Breno Leitao wrote:
> Hello Jakub,
>
> On Thu, Oct 31, 2024 at 06:26:47PM -0700, Jakub Kicinski wrote:
> > On Fri, 25 Oct 2024 07:20:18 -0700 Breno Leitao wrote:
> > > The current implementation has a flaw where it populates the skb_pool
> > > with 32 SKBs before calling __netpoll_setup(). If the setup fails, the
> > > skb_pool buffer will persist indefinitely and never be cleaned up.
> > >
> > > This change moves the skb_pool population to after the successful
> > > completion of __netpoll_setup(), ensuring that the buffers are not
> > > unnecessarily retained. Additionally, this modification alleviates rtnl
> > > lock pressure by allowing the buffer filling to occur outside of the
> > > lock.
> >
> > arguably if the setup succeeds there would now be a window of time
> > where np is active but pool is empty.
>
> I am not convinced this is a problem. Given that netpoll_setup() is only
> called from netconsole.
>
> In netconsole, a target is not enabled (as in sending packets) until the
> netconsole target is, in fact, enabled. (nt->enabled = true). Enabling
> the target(nt) only happen after netpoll_setup() returns successfully.
>
> Example:
>
> static void write_ext_msg(struct console *con, const char *msg,
> unsigned int len)
> {
> ...
> list_for_each_entry(nt, &target_list, list)
> if (nt->extended && nt->enabled && netif_running(nt->np.dev))
> send_ext_msg_udp(nt, msg, len);
>
> So, back to your point, the netpoll interface will be up, but, not used
> at all.
>
> On top of that, two other considerations:
>
> * If the netpoll target is used without the buffer, it is not a big
> deal, since refill_skbs() is called, independently if the pool is full
> or not. (Which is not ideal and I eventually want to improve it).
>
> Anyway, this is how the code works today:
>
>
> void netpoll_send_udp(struct netpoll *np, const char *msg, int len)
> {
> ...
> skb = find_skb(np, total_len + np->dev->needed_tailroom,...
> // transmit the skb
>
>
> static struct sk_buff *find_skb(struct netpoll *np, int len, int reserve)
> {
> ...
> refill_skbs(np);
> skb = alloc_skb(len, GFP_ATOMIC);
> if (!skb)
> skb = skb_dequeue(&np->skb_pool);
> ...
> // return the skb
>
> So, even in there is a transmission in-between enabling the netpoll
> target and not populating the pool (which is NOT the case in the code
> today), it would not be a problem, given that netpoll_send_udp() will
> call refill_skbs() anyway.
>
> I have an in-development patch to improve it, by deferring this to a
> workthread, mainly because this whole allocation dance is done with a
> bunch of locks held, including printk/console lock.
>
> I think that a best mechanism might be something like:
>
> * If find_skb() needs to consume from the pool (which is rare, only
> when alloc_skb() fails), raise workthread that tries to repopulate the
> pool in the background.
>
> * Eventually avoid alloc_skb() first, and getting directly from the
> pool first, if the pool is depleted, try to alloc_skb(GFP_ATOMIC).
> This might make the code faster, but, I don't have data yet.
I've hacked up this case (getting the skb from the pool first and refilling
it on a workqueue) today, and the performance improvement is significant.
I've tested sending 2k messages and measured the time it takes to
run `netpoll_send_udp`, which is the critical function in netpoll.
Current code (with this patchset applied), where the bucket index is in
nanoseconds:
[14K, 16K) 107 |@@@ |
[16K, 20K) 1757 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[20K, 24K) 59 |@ |
[24K, 28K) 35 |@ |
[28K, 32K) 35 |@ |
[32K, 40K) 5 | |
[40K, 48K) 0 | |
[48K, 56K) 0 | |
[56K, 64K) 0 | |
[64K, 80K) 1 | |
[80K, 96K) 0 | |
[96K, 112K) 1 | |
With the optimization applied, I get a solid win:
[8K, 10K) 32 |@ |
[10K, 12K) 514 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
[12K, 14K) 932 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[14K, 16K) 367 |@@@@@@@@@@@@@@@@@@@@ |
[16K, 20K) 102 |@@@@@ |
[20K, 24K) 29 |@ |
[24K, 28K) 17 | |
[28K, 32K) 1 | |
[32K, 40K) 3 | |
[40K, 48K) 0 | |
[48K, 56K) 0 | |
[56K, 64K) 1 | |
[64K, 80K) 0 | |
[80K, 96K) 1 | |
[96K, 112K) 1 | |
That was captured with this simple bpftrace script:
kprobe:netpoll_send_udp {
	@start[tid] = nsecs;
}
kretprobe:netpoll_send_udp /@start[tid]/ {
	$duration = nsecs - @start[tid];
	delete(@start[tid]);
	@ = hist($duration, 2);
}
END
{
	clear(@start);
	print(@);
}
And this is the patch I am testing right now:
commit 262de00e439e0708fadf5e4c2896837c046a325b
Author: Breno Leitao <leitao@debian.org>
Date: Fri Nov 1 10:03:58 2024 -0700
netpoll: prioritize the skb from the pool.
Prioritize allocating SKBs from the pool over alloc_skb() to reduce
overhead in the critical path.
Move the pool refill operation to a work item, allowing it to run
outside of the printk/console lock.
This change improves performance by minimizing the time spent in the
critical path filling and allocating skbs, reducing contention on the
printk/console lock.
Signed-off-by: Breno Leitao <leitao@debian.org>
diff --git a/include/linux/netpoll.h b/include/linux/netpoll.h
index 77635b885c18..c81dc9cc0139 100644
--- a/include/linux/netpoll.h
+++ b/include/linux/netpoll.h
@@ -33,6 +33,7 @@ struct netpoll {
u16 local_port, remote_port;
u8 remote_mac[ETH_ALEN];
struct sk_buff_head skb_pool;
+ struct work_struct work;
};
struct netpoll_info {
diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index bf2064d689d5..657100393489 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -278,18 +278,24 @@ static void zap_completion_queue(void)
put_cpu_var(softnet_data);
}
+
+static void refill_skbs_wt(struct work_struct *work)
+{
+ struct netpoll *np = container_of(work, struct netpoll, work);
+
+ refill_skbs(np);
+}
+
static struct sk_buff *find_skb(struct netpoll *np, int len, int reserve)
{
int count = 0;
struct sk_buff *skb;
zap_completion_queue();
- refill_skbs(np);
repeat:
-
- skb = alloc_skb(len, GFP_ATOMIC);
+ skb = skb_dequeue(&np->skb_pool);
if (!skb)
- skb = skb_dequeue(&np->skb_pool);
+ skb = alloc_skb(len, GFP_ATOMIC);
if (!skb) {
if (++count < 10) {
@@ -301,6 +307,7 @@ static struct sk_buff *find_skb(struct netpoll *np, int len, int reserve)
refcount_set(&skb->users, 1);
skb_reserve(skb, reserve);
+ schedule_work(&np->work);
return skb;
}
@@ -780,6 +787,7 @@ int netpoll_setup(struct netpoll *np)
/* fill up the skb queue */
refill_skbs(np);
+ INIT_WORK(&np->work, refill_skbs_wt);
return 0;
put:
* Re: [PATCH net-next 1/3] net: netpoll: Defer skb_pool population until setup success
2024-11-01 18:18 ` Breno Leitao
@ 2024-11-02 2:01 ` Jakub Kicinski
2024-11-04 20:40 ` Breno Leitao
0 siblings, 1 reply; 17+ messages in thread
From: Jakub Kicinski @ 2024-11-02 2:01 UTC (permalink / raw)
To: Breno Leitao
Cc: horms, davem, edumazet, pabeni, thepacketgeek, netdev,
linux-kernel, davej, vlad.wing, max, kernel-team, jiri, jv, andy,
aehkn, Rik van Riel, Al Viro
On Fri, 1 Nov 2024 11:18:29 -0700 Breno Leitao wrote:
> > I think that a best mechanism might be something like:
> >
> > * If find_skb() needs to consume from the pool (which is rare, only
> > when alloc_skb() fails), raise workthread that tries to repopulate the
> > pool in the background.
> >
> > * Eventually avoid alloc_skb() first, and getting directly from the
> > pool first, if the pool is depleted, try to alloc_skb(GFP_ATOMIC).
> > This might make the code faster, but, I don't have data yet.
>
> I've hacked this case (getting the skb from the pool first and refilling
> it on a workqueue) today, and the performance is expressive.
>
> I've tested sending 2k messages, and meassured the time it takes to
> run `netpoll_send_udp`, which is the critical function in netpoll.
The purpose of the pool is to have a reserve in case of OOM, AFAIU.
We may speed things up by taking the allocations out of line but
we risk the pool being empty when we really need it.
* Re: [PATCH net-next 1/3] net: netpoll: Defer skb_pool population until setup success
2024-11-02 2:01 ` Jakub Kicinski
@ 2024-11-04 20:40 ` Breno Leitao
2024-11-06 1:00 ` Jakub Kicinski
0 siblings, 1 reply; 17+ messages in thread
From: Breno Leitao @ 2024-11-04 20:40 UTC (permalink / raw)
To: Jakub Kicinski
Cc: horms, davem, edumazet, pabeni, thepacketgeek, netdev,
linux-kernel, davej, vlad.wing, max, kernel-team, jiri, jv, andy,
aehkn, Rik van Riel, Al Viro
On Fri, Nov 01, 2024 at 07:01:01PM -0700, Jakub Kicinski wrote:
> On Fri, 1 Nov 2024 11:18:29 -0700 Breno Leitao wrote:
> > > I think that a best mechanism might be something like:
> > >
> > > * If find_skb() needs to consume from the pool (which is rare, only
> > > when alloc_skb() fails), raise workthread that tries to repopulate the
> > > pool in the background.
> > >
> > > * Eventually avoid alloc_skb() first, and getting directly from the
> > > pool first, if the pool is depleted, try to alloc_skb(GFP_ATOMIC).
> > > This might make the code faster, but, I don't have data yet.
> >
> > I've hacked this case (getting the skb from the pool first and refilling
> > it on a workqueue) today, and the performance is expressive.
> >
> > I've tested sending 2k messages, and meassured the time it takes to
> > run `netpoll_send_udp`, which is the critical function in netpoll.
>
> The purpose of the pool is to have a reserve in case of OOM, AFAIU.
> We may speed things up by taking the allocations out of line but
> we risk the pool being empty when we really need it.
Correct, but in a case of OOM, I am not sure this is going to
change the chances at all.
Let's assume the pool is full and we start getting OOMs. It doesn't
matter whether alloc_skb() fails in the critical path or in the work
thread: netpoll will have MAX_SKBS skbs buffered to use, and none will
be allocated, so just 32 SKBs can be used before -ENOMEM is returned.
On the other side, let's suppose there is a bunch of OOM pressure for a
while (10 SKBs are consumed, for instance), and then some free memory
shows up, causing the pool to be replenished. It is better
to do that in the work thread than in the hot path.
In both cases, the chance of not having SKBs to send the packet seems to
be the same, unless I am not modeling the problem correctly.
On top of that, a few other points where this new model could help more
in an OOM case:
1) Now, with Maksym's patches, we can monitor the OOMing rate
2) With the pool per target, we can easily increase the pool size if we
want. (patchset not pushed yet)
This will also help with another corner case we have in netconsole. While
printk() holds the console/target_list lock, the code that follows cannot
call printk() anymore, otherwise it will deadlock the system. That is
because a printk() will call netconsole again (nested), and it will try to
take the console_lock/target_lock again, but those are already held. Having
the alloc_skb() out of that hot path will reduce the probability of this
happening. This is something I am planning to fix later, by just
dropping the incoming message. Having this patch might mean fewer packets
being dropped.
* Re: [PATCH net-next 1/3] net: netpoll: Defer skb_pool population until setup success
2024-11-04 20:40 ` Breno Leitao
@ 2024-11-06 1:00 ` Jakub Kicinski
2024-11-06 15:06 ` Breno Leitao
0 siblings, 1 reply; 17+ messages in thread
From: Jakub Kicinski @ 2024-11-06 1:00 UTC (permalink / raw)
To: Breno Leitao
Cc: horms, davem, edumazet, pabeni, thepacketgeek, netdev,
linux-kernel, davej, vlad.wing, max, kernel-team, jiri, jv, andy,
aehkn, Rik van Riel, Al Viro
On Mon, 4 Nov 2024 12:40:00 -0800 Breno Leitao wrote:
> Let's assume the pool is full and we start getting OOMs. It doesn't
> matter if alloc_skb() will fail in the critical path or in the work
> thread, netpoll will have MAX_SKBS skbs buffered to use, and none will
> be allocated, thus, just 32 SKBs will be used until a -ENOMEM returns.
Do you assume the worker thread will basically keep up with the output?
Vadim was showing me a system earlier today where workqueue workers
didn't get scheduled in for minutes :( That's a bit extreme but doesn't
inspire confidence in the worker replenishing the pool quickly.
> On the other side, let's suppose there is a bunch of OOM pressure for a
> while (10 SKBs are consumed for instance), and then some free memory
> show up, causing the pool to be replenished. It is better
> to do it in the workthread other than in the hot path.
We could cap how much we replenish in one go?
> In both cases, the chance of not having SKBs to send the packet seems to
> be the same, unless I am not modeling the problem correctly.
Maybe I misunderstood the proposal, I think you said earlier that you
want to consume from the pool instead of calling alloc(). If you mean
that we'd still alloc in the fast path but not replenish the pool
that's different.
> On top of that, a few other points that this new model could help more,
> in a OOM case.
>
> 1) Now with Maksysm patches, we can monitor the OOMing rate
>
> 2) With the pool per target, we can easily increase the pool size if we
> want. (patchset not pushed yet)
* Re: [PATCH net-next 1/3] net: netpoll: Defer skb_pool population until setup success
2024-11-06 1:00 ` Jakub Kicinski
@ 2024-11-06 15:06 ` Breno Leitao
2024-11-06 23:43 ` Jakub Kicinski
0 siblings, 1 reply; 17+ messages in thread
From: Breno Leitao @ 2024-11-06 15:06 UTC (permalink / raw)
To: Jakub Kicinski
Cc: horms, davem, edumazet, pabeni, thepacketgeek, netdev,
linux-kernel, davej, vlad.wing, max, kernel-team, jiri, jv, andy,
aehkn, Rik van Riel, Al Viro
Hello Jakub,
On Tue, Nov 05, 2024 at 05:00:29PM -0800, Jakub Kicinski wrote:
> On Mon, 4 Nov 2024 12:40:00 -0800 Breno Leitao wrote:
> > Let's assume the pool is full and we start getting OOMs. It doesn't
> > matter if alloc_skb() will fail in the critical path or in the work
> > thread, netpoll will have MAX_SKBS skbs buffered to use, and none will
> > be allocated, thus, just 32 SKBs will be used until a -ENOMEM returns.
>
> Do you assume the worker thread will basically keep up with the output?
> Vadim was showing me a system earlier today where workqueue workers
> didn't get scheduled in for minutes :( That's a bit extreme but doesn't
> inspire confidence in worker replenishing the pool quickly.
Interesting. Thanks for the data point.
> > On the other side, let's suppose there is a bunch of OOM pressure for a
> > while (10 SKBs are consumed for instance), and then some free memory
> > show up, causing the pool to be replenished. It is better
> > to do it in the workthread other than in the hot path.
>
> We could cap how much we replenish in one go?
If we keep the replenish in the hot path, I think it is worth doing, for
sure.
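Something like this, maybe, on top of the refill_skbs() from this series
(MAX_REFILL_BURST is a made-up name and value, just to illustrate the cap):

#define MAX_REFILL_BURST 8	/* illustrative only */

static void refill_skbs(struct netpoll *np)
{
	struct sk_buff_head *skb_pool = &np->skb_pool;
	struct sk_buff *skb;
	unsigned long flags;
	int budget = MAX_REFILL_BURST;

	spin_lock_irqsave(&skb_pool->lock, flags);
	/* stop early once the burst budget is spent, even if the
	 * pool is still below MAX_SKBS
	 */
	while (skb_pool->qlen < MAX_SKBS && budget-- > 0) {
		skb = alloc_skb(MAX_SKB_SIZE, GFP_ATOMIC);
		if (!skb)
			break;
		__skb_queue_tail(skb_pool, skb);
	}
	spin_unlock_irqrestore(&skb_pool->lock, flags);
}

That way a single send would never spend more than a handful of
allocations topping the pool up.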
> > In both cases, the chance of not having SKBs to send the packet seems to
> > be the same, unless I am not modeling the problem correctly.
>
> Maybe I misunderstood the proposal, I think you said earlier that you
> want to consume from the pool instead of calling alloc(). If you mean
> that we'd still alloc in the fast path but not replenish the pool
> that's different.
To clarify, let me take a step back and outline what this patchset proposes:
The patchset enhances SKB pool management in three key ways:
a) It delays populating the skb pool until the target is active.
b) It releases the skb pool when there are no more active users.
c) It creates a separate pool for each target.
The third point (c) is the one that's open to discussion, as I
understand.
I proposed that having an individualized skb pool that users can control
would be beneficial. For example, users could define the number of skbs
in the pool. This could lead to additional advantages, such as allowing
netpoll to directly consume from the pool instead of relying on alloc()
in the optimal scenario, thereby speeding up the critical path.
--breno
* Re: [PATCH net-next 1/3] net: netpoll: Defer skb_pool population until setup success
2024-11-06 15:06 ` Breno Leitao
@ 2024-11-06 23:43 ` Jakub Kicinski
2024-11-07 11:50 ` Breno Leitao
0 siblings, 1 reply; 17+ messages in thread
From: Jakub Kicinski @ 2024-11-06 23:43 UTC (permalink / raw)
To: Breno Leitao
Cc: horms, davem, edumazet, pabeni, thepacketgeek, netdev,
linux-kernel, davej, vlad.wing, max, kernel-team, jiri, jv, andy,
aehkn, Rik van Riel, Al Viro
On Wed, 6 Nov 2024 07:06:06 -0800 Breno Leitao wrote:
> To clarify, let me take a step back and outline what this patchset proposes:
>
> The patchset enhances SKB pool management in three key ways:
>
> a) It delays populating the skb pool until the target is active.
> b) It releases the skb pool when there are no more active users.
> c) It creates a separate pool for each target.
>
> The third point (c) is the one that's open to discussion, as I
> understand.
>
> I proposed that having an individualized skb pool that users can control
> would be beneficial. For example, users could define the number of skbs
> in the pool. This could lead to additional advantages, such as allowing
> netpoll to directly consume from the pool instead of relying on alloc()
> in the optimal scenario, thereby speeding up the critical path.
Patch 1 is the one I'm not completely convinced by. I understand
the motivation but it's rather unusual to activate partially initialized
objects. Maybe let's leave it out.
The rest is fine, although I'd invert the justification for the second
patch. We should in fact scale the number of pooled packets with the
number of consoles. Each message gets sent to every console, so a system
with 2 netconsoles has effectively half the OOM cushion.
* Re: [PATCH net-next 1/3] net: netpoll: Defer skb_pool population until setup success
2024-11-06 23:43 ` Jakub Kicinski
@ 2024-11-07 11:50 ` Breno Leitao
0 siblings, 0 replies; 17+ messages in thread
From: Breno Leitao @ 2024-11-07 11:50 UTC (permalink / raw)
To: Jakub Kicinski
Cc: horms, davem, edumazet, pabeni, thepacketgeek, netdev,
linux-kernel, davej, vlad.wing, max, kernel-team, jiri, jv, andy,
aehkn, Rik van Riel, Al Viro
Hello Jakub,
On Wed, Nov 06, 2024 at 03:43:49PM -0800, Jakub Kicinski wrote:
> On Wed, 6 Nov 2024 07:06:06 -0800 Breno Leitao wrote:
> > To clarify, let me take a step back and outline what this patchset proposes:
> >
> > The patchset enhances SKB pool management in three key ways:
> >
> > a) It delays populating the skb pool until the target is active.
> > b) It releases the skb pool when there are no more active users.
> > c) It creates a separate pool for each target.
> >
> > The third point (c) is the one that's open to discussion, as I
> > understand.
> >
> > I proposed that having an individualized skb pool that users can control
> > would be beneficial. For example, users could define the number of skbs
> > in the pool. This could lead to additional advantages, such as allowing
> > netpoll to directly consume from the pool instead of relying on alloc()
> > in the optimal scenario, thereby speeding up the critical path.
>
> Patch 1 is the one I'm not completely convinced by. I understand
> the motivation but its rather unusual to activate partially initialized
> objects. Maybe let's leave it out.
>
> The rest is fine, although I'd invert the justification for the second
> patch. We should in fact scale the number of pooled packets with the
> number of consoles. Each message gets send to every console so system
> with 2 netconsoles has effectively half the OOM cushion.
That is fair. Thanks for the guidance. I will keep patch 1 out of it and
send a v2.
Thanks
--breno