netdev.vger.kernel.org archive mirror
* issue with inflight pages from page_pool
@ 2023-04-17 17:53 Lorenzo Bianconi
  2023-04-17 18:08 ` Eric Dumazet
  0 siblings, 1 reply; 20+ messages in thread
From: Lorenzo Bianconi @ 2023-04-17 17:53 UTC (permalink / raw)
  To: netdev
  Cc: hawk, ilias.apalodimas, davem, edumazet, kuba, pabeni, bpf,
	lorenzo.bianconi, nbd


Hi all,

I am triggering an issue with a device using the page_pool allocator.
In particular, the device runs an iperf TCP server receiving traffic from a
remote client, and the driver has a simple XDP program attached that just
returns XDP_PASS. When I remove the eBPF program and destroy the pool, the
page_pool allocator starts complaining in page_pool_release_retry() that not
all the pages have been returned to the allocator. In fact, the pool is never
really destroyed in this case.
Debugging the code, it seems the pages are stuck in the softnet_data
defer_list and are never freed by skb_defer_free_flush(), since there is no
more TCP traffic to trigger it. To confirm this, I set sysctl_skb_defer_max
to 0 and the issue no longer occurs.
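(For reference, the workaround mentioned above, assuming net.core.skb_defer_max
is the sysctl knob backing sysctl_skb_defer_max on kernels that have deferred
skb freeing; requires root:)

```shell
# Disable deferred skb freeing entirely: skbs (and the page_pool pages
# they hold) are then released immediately instead of parking on the
# per-CPU softnet_data defer_list until new traffic flushes them.
sysctl -w net.core.skb_defer_max=0
```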
I developed the PoC patch below and it seems to fix the issue:

diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 193c18799865..160f45c4e3a5 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -19,6 +19,7 @@
 #include <linux/mm.h> /* for put_page() */
 #include <linux/poison.h>
 #include <linux/ethtool.h>
+#include <linux/netdevice.h>
 
 #include <trace/events/page_pool.h>
 
@@ -810,12 +811,23 @@ static void page_pool_release_retry(struct work_struct *wq)
 {
 	struct delayed_work *dwq = to_delayed_work(wq);
 	struct page_pool *pool = container_of(dwq, typeof(*pool), release_dw);
-	int inflight;
+	int cpu, inflight;
 
 	inflight = page_pool_release(pool);
 	if (!inflight)
 		return;
 
+	/* Run NET_RX_SOFTIRQ in order to free pending skbs in softnet_data
+	 * defer_list that can stay in the list until we have enough queued
+	 * traffic.
+	 */
+	for_each_online_cpu(cpu) {
+		struct softnet_data *sd = &per_cpu(softnet_data, cpu);
+
+		if (!cmpxchg(&sd->defer_ipi_scheduled, 0, 1))
+			smp_call_function_single_async(cpu, &sd->defer_csd);
+	}
+
 	/* Periodic warning */
 	if (time_after_eq(jiffies, pool->defer_warn)) {
 		int sec = (s32)((u32)jiffies - (u32)pool->defer_start) / HZ;

Is this ok, or do you think there is a better solution for this issue?
@Felix: I think we faced a similar issue in mt76 when unloading the module, right?

Regards,
Lorenzo



Thread overview: 20+ messages
2023-04-17 17:53 issue with inflight pages from page_pool Lorenzo Bianconi
2023-04-17 18:08 ` Eric Dumazet
2023-04-17 18:17   ` Lorenzo Bianconi
2023-04-17 18:23     ` Jakub Kicinski
2023-04-17 18:42       ` Lorenzo Bianconi
2023-04-17 19:08         ` Jakub Kicinski
2023-04-17 21:31           ` Lorenzo Bianconi
2023-04-17 23:32             ` Jakub Kicinski
2023-04-18  7:36               ` Lorenzo Bianconi
2023-04-19 11:08                 ` Jesper Dangaard Brouer
2023-04-19 12:09                   ` Eric Dumazet
2023-04-19 14:02                     ` Jesper Dangaard Brouer
2023-04-19 14:18                       ` Eric Dumazet
2023-04-19 16:10                         ` Jesper Dangaard Brouer
2023-04-19 14:21                       ` Lorenzo Bianconi
2023-04-19 15:36                         ` Jesper Dangaard Brouer
2023-04-19 16:40                           ` Lorenzo Bianconi
2023-04-19 17:12                           ` Jakub Kicinski
2023-04-17 18:24     ` Gerhard Engleder
2023-04-17 19:00       ` Lorenzo Bianconi
