* Increased memory usage on NUMA nodes with ICE driver after upgrade to 6.13.y (regression in commit 492a044508ad)
@ 2025-04-14 16:29 Jaroslav Pulchart
  2025-04-14 17:15 ` [Intel-wired-lan] " Paul Menzel
  ` (2 more replies)
  0 siblings, 3 replies; 46+ messages in thread
From: Jaroslav Pulchart @ 2025-04-14 16:29 UTC (permalink / raw)
  To: Tony Nguyen, Kitszel, Przemyslaw
  Cc: jdamato, intel-wired-lan, netdev, Igor Raits, Daniel Secik, Zdenek Pesek

Hello,

While investigating increased memory usage after upgrading our host/hypervisor
servers from Linux kernel 6.12.y to 6.13.y, I observed a regression in available
memory per NUMA node. Our servers allocate 60GB of each NUMA node’s 64GB of RAM
to HugePages for VMs, leaving 4GB for the host OS.

After the upgrade, we noticed approximately 500MB less free RAM on NUMA nodes 0
and 2 compared to 6.12.y, even with no VMs running (just the host OS after
reboot). These nodes host the Intel 810-XXV NICs. Here's a snapshot of the NUMA
stats on vanilla 6.13.y:

NUMA nodes:  0      1      2      3      4      5      6      7      8      9      10     11     12     13     14     15
HPFreeGiB:   60     60     60     60     60     60     60     60     60     60     60     60     60     60     60     60
MemTotal:    64989  65470  65470  65470  65470  65470  65470  65453  65470  65470  65470  65470  65470  65470  65470  65462
MemFree:     2793   3559   3150   3438   3616   3722   3520   3547   3547   3536   3506   3452   3440   3489   3607   3729

We traced the issue to commit 492a044508ad13a490a24c66f311339bf891cb5f
"ice: Add support for persistent NAPI config".

We limit the number of channels on the NICs to match the local NUMA cores, or
fewer for unused interfaces (down from the ridiculous default of 96), for example:
  ethtool -L em1  combined 6   # active port; down from 96
  ethtool -L p3p2 combined 2   # unused port; down from 96

This typically aligns memory use with local CPUs and keeps NUMA-local memory
usage within expected limits. However, starting with kernel 6.13.y and this
commit, the high memory usage by the ICE driver persists regardless of the
reduced channel configuration.

Reverting the commit restores the expected memory availability on nodes 0 and 2.
Below are stats from 6.13.y with the commit reverted:

NUMA nodes:  0      1      2      3      4      5      6      7      8      9      10     11     12     13     14     15
HPFreeGiB:   60     60     60     60     60     60     60     60     60     60     60     60     60     60     60     60
MemTotal:    64989  65470  65470  65470  65470  65470  65470  65453  65470  65470  65470  65470  65470  65470  65470  65462
MemFree:     3208   3765   3668   3507   3811   3727   3812   3546   3676   3596   ...

This brings nodes 0 and 2 back to ~3.5GB free RAM, similar to kernel 6.12.y, and
avoids swap pressure and memory exhaustion when running services and VMs.

I also do not see any practical benefit in persisting the channel memory
allocation. After a fresh server reboot, channels are not explicitly configured,
and the system will not automatically resize them back to a higher count unless
they are manually set again. Therefore, retaining the previous memory footprint
appears unnecessary and potentially harmful in memory-constrained environments.

Best regards,
Jaroslav Pulchart

^ permalink raw reply	[flat|nested] 46+ messages in thread
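For reference, a per-NUMA-node snapshot like the one above can be gathered
straight from sysfs. This is a rough sketch, not the reporter's actual tooling;
it assumes 1GiB hugepages and prints MemTotal/MemFree in MiB as read from
/sys/devices/system/node/node*/meminfo:

  # Sketch only: collect per-node HugePages and memory stats (MiB units).
  for n in /sys/devices/system/node/node[0-9]*; do
      node=${n##*/node}
      free_hp=$(cat "$n"/hugepages/hugepages-1048576kB/free_hugepages 2>/dev/null)
      total=$(awk '/MemTotal:/ {print int($4/1024)}' "$n"/meminfo)
      free=$(awk '/MemFree:/ {print int($4/1024)}' "$n"/meminfo)
      printf 'node%s HPFreeGiB:%s MemTotal:%s MemFree:%s\n' "$node" "$free_hp" "$total" "$free"
  done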
* Re: [Intel-wired-lan] Increased memory usage on NUMA nodes with ICE driver after upgrade to 6.13.y (regression in commit 492a044508ad) 2025-04-14 16:29 Increased memory usage on NUMA nodes with ICE driver after upgrade to 6.13.y (regression in commit 492a044508ad) Jaroslav Pulchart @ 2025-04-14 17:15 ` Paul Menzel 2025-04-15 14:38 ` Przemek Kitszel 2025-07-04 16:55 ` Michal Kubiak 2 siblings, 0 replies; 46+ messages in thread From: Paul Menzel @ 2025-04-14 17:15 UTC (permalink / raw) To: Jaroslav Pulchart, Tony Nguyen, Przemyslaw Kitszel Cc: jdamato, intel-wired-lan, netdev, Igor Raits, Daniel Secik, Zdenek Pesek, regressions #regzbot ^introduced: 492a044508ad13a490a24c66f311339bf891cb5f Am 14.04.25 um 18:29 schrieb Jaroslav Pulchart: > Hello, > > While investigating increased memory usage after upgrading our > host/hypervisor servers from Linux kernel 6.12.y to 6.13.y, I observed > a regression in available memory per NUMA node. Our servers allocate > 60GB of each NUMA node’s 64GB of RAM to HugePages for VMs, leaving 4GB > for the host OS. > > After the upgrade, we noticed approximately 500MB less free RAM on > NUMA nodes 0 and 2 compared to 6.12.y, even with no VMs running (just > the host OS after reboot). These nodes host Intel 810-XXV NICs. Here's > a snapshot of the NUMA stats on vanilla 6.13.y: > > NUMA nodes: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 > HPFreeGiB: 60 60 60 60 60 60 60 60 60 60 60 60 60 60 60 60 > MemTotal: 64989 65470 65470 65470 65470 65470 65470 65453 65470 65470 65470 65470 65470 65470 65470 65462 > MemFree: 2793 3559 3150 3438 3616 3722 3520 3547 3547 3536 3506 3452 3440 3489 3607 3729 > > We traced the issue to commit 492a044508ad13a490a24c66f311339bf891cb5f > "ice: Add support for persistent NAPI config". > > We limit the number of channels on the NICs to match local NUMA cores > or less if unused interface (from ridiculous 96 default), for example: > ethtool -L em1 combined 6 # active port; from 96 > ethtool -L p3p2 combined 2 # unused port; from 96 > > This typically aligns memory use with local CPUs and keeps NUMA-local > memory usage within expected limits. However, starting with kernel > 6.13.y and this commit, the high memory usage by the ICE driver > persists regardless of reduced channel configuration. > > Reverting the commit restores expected memory availability on nodes 0 > and 2. Below are stats from 6.13.y with the commit reverted: > NUMA nodes: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 > HPFreeGiB: 60 60 60 60 60 60 60 60 60 60 60 60 60 60 60 60 > MemTotal: 64989 65470 65470 65470 65470 65470 65470 65453 65470 65470 65470 65470 65470 65470 65470 65462 > MemFree: 3208 3765 3668 3507 3811 3727 3812 3546 3676 3596 ... > > This brings nodes 0 and 2 back to ~3.5GB free RAM, similar to kernel > 6.12.y, and avoids swap pressure and memory exhaustion when running > services and VMs. > > I also do not see any practical benefit in persisting the channel > memory allocation. After a fresh server reboot, channels are not > explicitly configured, and the system will not automatically resize > them back to a higher count unless manually set again. Therefore, > retaining the previous memory footprint appears unnecessary and > potentially harmful in memory-constrained environments > > Best regards, > Jaroslav Pulchart ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: Increased memory usage on NUMA nodes with ICE driver after upgrade to 6.13.y (regression in commit 492a044508ad) 2025-04-14 16:29 Increased memory usage on NUMA nodes with ICE driver after upgrade to 6.13.y (regression in commit 492a044508ad) Jaroslav Pulchart 2025-04-14 17:15 ` [Intel-wired-lan] " Paul Menzel @ 2025-04-15 14:38 ` Przemek Kitszel 2025-04-16 0:53 ` Jakub Kicinski 2025-07-04 16:55 ` Michal Kubiak 2 siblings, 1 reply; 46+ messages in thread From: Przemek Kitszel @ 2025-04-15 14:38 UTC (permalink / raw) To: Jaroslav Pulchart Cc: jdamato, intel-wired-lan, netdev, Tony Nguyen, Igor Raits, Daniel Secik, Zdenek Pesek, Jakub Kicinski, Eric Dumazet, Martin Karsten, Ahmed Zaki, Czapnik, Lukasz, Michal Swiatkowski On 4/14/25 18:29, Jaroslav Pulchart wrote: > Hello, +CC to co-devs and reviewers of initial napi_config introduction +CC Ahmed, who leverages napi_config for more stuff in 6.15 > > While investigating increased memory usage after upgrading our > host/hypervisor servers from Linux kernel 6.12.y to 6.13.y, I observed > a regression in available memory per NUMA node. Our servers allocate > 60GB of each NUMA node’s 64GB of RAM to HugePages for VMs, leaving 4GB > for the host OS. > > After the upgrade, we noticed approximately 500MB less free RAM on > NUMA nodes 0 and 2 compared to 6.12.y, even with no VMs running (just > the host OS after reboot). These nodes host Intel 810-XXV NICs. Here's > a snapshot of the NUMA stats on vanilla 6.13.y: > > NUMA nodes: 0 1 2 3 4 5 6 7 8 > 9 10 11 12 13 14 15 > HPFreeGiB: 60 60 60 60 60 60 60 60 60 > 60 60 60 60 60 60 60 > MemTotal: 64989 65470 65470 65470 65470 65470 65470 65453 > 65470 65470 65470 65470 65470 65470 65470 65462 > MemFree: 2793 3559 3150 3438 3616 3722 3520 3547 3547 > 3536 3506 3452 3440 3489 3607 3729 > > We traced the issue to commit 492a044508ad13a490a24c66f311339bf891cb5f > "ice: Add support for persistent NAPI config". thank you for the report and bisection, this commit is ice's opt-in into using persistent napi_config I have checked the code, and there is nothing obvious to inflate memory consumption in the driver/core in the touched parts. I have not yet looked into how much memory is eaten by the hash array of now-kept configs. > > We limit the number of channels on the NICs to match local NUMA cores > or less if unused interface (from ridiculous 96 default), for example: We will experiment with other defaults, looks like number of total CPUs, instead of local NUMA cores, might be better here. And even if that would resolve the issue, I would like to have a more direct fix for this > ethtool -L em1 combined 6 # active port; from 96 > ethtool -L p3p2 combined 2 # unused port; from 96 > > This typically aligns memory use with local CPUs and keeps NUMA-local > memory usage within expected limits. However, starting with kernel > 6.13.y and this commit, the high memory usage by the ICE driver > persists regardless of reduced channel configuration. As a workaround, you could try to do devlink reload (action driver_reinit), that should flush all napi instances. We will try to reproduce the issue locally and work on a fix. > > Reverting the commit restores expected memory availability on nodes 0 > and 2. 
Below are stats from 6.13.y with the commit reverted: > NUMA nodes: 0 1 2 3 4 5 6 7 8 > 9 10 11 12 13 14 15 > HPFreeGiB: 60 60 60 60 60 60 60 60 60 > 60 60 60 60 60 60 60 > MemTotal: 64989 65470 65470 65470 65470 65470 65470 65453 65470 > 65470 65470 65470 65470 65470 65470 65462 > MemFree: 3208 3765 3668 3507 3811 3727 3812 3546 3676 3596 ... > > This brings nodes 0 and 2 back to ~3.5GB free RAM, similar to kernel > 6.12.y, and avoids swap pressure and memory exhaustion when running > services and VMs. > > I also do not see any practical benefit in persisting the channel > memory allocation. After a fresh server reboot, channels are not > explicitly configured, and the system will not automatically resize > them back to a higher count unless manually set again. Therefore, > retaining the previous memory footprint appears unnecessary and > potentially harmful in memory-constrained environments in this particular case there is indeed no benefit, it was designed for keeping the config/stats for queues that were meaningfully used it is rather clunky anyway > > Best regards, > Jaroslav Pulchart ^ permalink raw reply [flat|nested] 46+ messages in thread
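The devlink reload workaround suggested above (action driver_reinit, which
should flush all NAPI instances) would look roughly like this; the PCI address
is a placeholder and must be replaced with the affected ice PF:

  # Sketch of the suggested workaround; 0000:17:00.0 is a placeholder address,
  # pick the real ice PF from "devlink dev show".
  devlink dev show
  devlink dev reload pci/0000:17:00.0 action driver_reinit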
* Re: Increased memory usage on NUMA nodes with ICE driver after upgrade to 6.13.y (regression in commit 492a044508ad) 2025-04-15 14:38 ` Przemek Kitszel @ 2025-04-16 0:53 ` Jakub Kicinski 2025-04-16 7:13 ` Jaroslav Pulchart 0 siblings, 1 reply; 46+ messages in thread From: Jakub Kicinski @ 2025-04-16 0:53 UTC (permalink / raw) To: Przemek Kitszel Cc: Jaroslav Pulchart, jdamato, intel-wired-lan, netdev, Tony Nguyen, Igor Raits, Daniel Secik, Zdenek Pesek, Eric Dumazet, Martin Karsten, Ahmed Zaki, Czapnik, Lukasz, Michal Swiatkowski On Tue, 15 Apr 2025 16:38:40 +0200 Przemek Kitszel wrote: > > We traced the issue to commit 492a044508ad13a490a24c66f311339bf891cb5f > > "ice: Add support for persistent NAPI config". > > thank you for the report and bisection, > this commit is ice's opt-in into using persistent napi_config > > I have checked the code, and there is nothing obvious to inflate memory > consumption in the driver/core in the touched parts. I have not yet > looked into how much memory is eaten by the hash array of now-kept > configs. +1 also unclear to me how that commit makes any difference. Jaroslav, when you say "traced" what do you mean? CONFIG_MEM_ALLOC_PROFILING ? The napi_config struct is just 24B. The queue struct (we allocate napi_config for each queue) is 320B... ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: Increased memory usage on NUMA nodes with ICE driver after upgrade to 6.13.y (regression in commit 492a044508ad)
  2025-04-16  0:53     ` Jakub Kicinski
@ 2025-04-16  7:13       ` Jaroslav Pulchart
  2025-04-16 13:48         ` Jakub Kicinski
  0 siblings, 1 reply; 46+ messages in thread
From: Jaroslav Pulchart @ 2025-04-16 7:13 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Przemek Kitszel, jdamato, intel-wired-lan, netdev, Tony Nguyen, Igor Raits, Daniel Secik, Zdenek Pesek, Eric Dumazet, Martin Karsten, Ahmed Zaki, Czapnik, Lukasz, Michal Swiatkowski

On Wed, 16 Apr 2025 at 2:54, Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Tue, 15 Apr 2025 16:38:40 +0200 Przemek Kitszel wrote:
> > > We traced the issue to commit 492a044508ad13a490a24c66f311339bf891cb5f
> > > "ice: Add support for persistent NAPI config".
> >
> > thank you for the report and bisection,
> > this commit is ice's opt-in into using persistent napi_config
> >
> > I have checked the code, and there is nothing obvious to inflate memory
> > consumption in the driver/core in the touched parts. I have not yet
> > looked into how much memory is eaten by the hash array of now-kept
> > configs.
>
> +1 also unclear to me how that commit makes any difference.
>
> Jaroslav, when you say "traced" what do you mean?
> CONFIG_MEM_ALLOC_PROFILING ?
>
> The napi_config struct is just 24B. The queue struct (we allocate
> napi_config for each queue) is 320B...

By "traced" I mean running the kernel and checking the memory situation on the
NUMA nodes with and without production load. The NUMA nodes with the X810 NIC
show quite a bit less available memory with the default queue count (the number
of all CPUs); it needs to be lowered to 1-2 on unused interfaces and to at most
the number of NUMA-node cores on used interfaces to keep the memory allocation
reasonable and to keep the server from falling back to "kswapd"...

See "MemFree" on NUMA nodes 0 + 1 of a different/smaller but utilized (running
VMs + using network) host server with 8 NUMA nodes (32GB RAM each, 28GB in
HugePages for VMs and 4GB for the host OS):

6.13.y vanilla (lots of kswapd0 activity in the background):
NUMA nodes:  0      1      2      3      4      5      6      7
HPTotalGiB:  28     28     28     28     28     28     28     28
HPFreeGiB:   0      0      0      0      0      0      0      0
MemTotal:    32220  32701  32701  32686  32701  32701  32701  32696
MemFree:     274    254    1327   1928   1949   2683   2624   2769

6.13.y + Revert (no memory issues at all):
NUMA nodes:  0      1      2      3      4      5      6      7
HPTotalGiB:  28     28     28     28     28     28     28     28
HPFreeGiB:   0      0      0      0      0      0      0      0
MemTotal:    32220  32701  32701  32686  32701  32701  32701  32696
MemFree:     2213   2438   3402   3108   2846   2672   2592   3063

We need to lower the queue count on all X810 interfaces from the default (64 in
this case) to ensure we have memory available for host OS services:
  ethtool -L em2  combined 1
  ethtool -L p3p2 combined 1
  ethtool -L em1  combined 6
  ethtool -L p3p1 combined 6
This trick "does not work" without the revert.

^ permalink raw reply	[flat|nested] 46+ messages in thread
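Whether the reduced channel count actually took effect can be verified with the
lowercase query form of the same ethtool option (interface names as in the
examples above; a small illustrative sketch):

  ethtool -L em1 combined 6   # reduce channels on the active port
  ethtool -l em1              # verify: "Combined" under "Current hardware settings" should read 6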
* Re: Increased memory usage on NUMA nodes with ICE driver after upgrade to 6.13.y (regression in commit 492a044508ad) 2025-04-16 7:13 ` Jaroslav Pulchart @ 2025-04-16 13:48 ` Jakub Kicinski 2025-04-16 16:03 ` Jaroslav Pulchart 0 siblings, 1 reply; 46+ messages in thread From: Jakub Kicinski @ 2025-04-16 13:48 UTC (permalink / raw) To: Jaroslav Pulchart Cc: Przemek Kitszel, jdamato, intel-wired-lan, netdev, Tony Nguyen, Igor Raits, Daniel Secik, Zdenek Pesek, Eric Dumazet, Martin Karsten, Ahmed Zaki, Czapnik, Lukasz, Michal Swiatkowski On Wed, 16 Apr 2025 09:13:23 +0200 Jaroslav Pulchart wrote: > By "traced" I mean using the kernel and checking memory situation on > numa nodes with and without production load. Numa nodes, with X810 > NIC, showing a quite less available memory with default queue length > (num of all cpus) and it needs to be lowered to 1-2 (for unused > interfaces) and up-to-count of numa node cores on used interfaces to > make the memory allocation reasonable and server avoiding "kswapd"... > > See "MemFree" on numa 0 + 1 on different/smaller but utilized (running > VMs + using network) host server with 8 numa nodes (32GB RAM each, 28G > in Hugepase for VMs and 4GB for host os): FWIW you can also try the tools/net/ynl/samples/page-pool application, not sure if Intel NICs init page pools appropriately but this will show you exactly how much memory is sitting on Rx rings of the driver (and in net socket buffers). > 6.13.y vanilla (lot of kswapd0 in background): > NUMA nodes: 0 1 2 3 4 5 6 7 > HPTotalGiB: 28 28 28 28 28 28 28 28 > HPFreeGiB: 0 0 0 0 0 0 0 0 > MemTotal: 32220 32701 32701 32686 32701 32701 > 32701 32696 > MemFree: 274 254 1327 1928 1949 2683 2624 2769 > 6.13.y + Revert (no memory issues at all): > NUMA nodes: 0 1 2 3 4 5 6 7 > HPTotalGiB: 28 28 28 28 28 28 28 28 > HPFreeGiB: 0 0 0 0 0 0 0 0 > MemTotal: 32220 32701 32701 32686 32701 32701 32701 32696 > MemFree: 2213 2438 3402 3108 2846 2672 2592 3063 > > We need to lower the queue on all X810 interfaces from default (64 in > this case), to ensure we have memory available for host OS services. > ethtool -L em2 combined 1 > ethtool -L p3p2 combined 1 > ethtool -L em1 combined 6 > ethtool -L p3p1 combined 6 > This trick "does not work" without the revert. And you're reverting just and exactly 492a044508ad13 ? The memory for persistent config is allocated in alloc_netdev_mqs() unconditionally. I'm lost as to how this commit could make any difference :( ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: Increased memory usage on NUMA nodes with ICE driver after upgrade to 6.13.y (regression in commit 492a044508ad) 2025-04-16 13:48 ` Jakub Kicinski @ 2025-04-16 16:03 ` Jaroslav Pulchart 2025-04-16 22:44 ` Jakub Kicinski 2025-04-16 22:57 ` Keller, Jacob E 0 siblings, 2 replies; 46+ messages in thread From: Jaroslav Pulchart @ 2025-04-16 16:03 UTC (permalink / raw) To: Jakub Kicinski Cc: Przemek Kitszel, jdamato, intel-wired-lan, netdev, Tony Nguyen, Igor Raits, Daniel Secik, Zdenek Pesek, Eric Dumazet, Martin Karsten, Ahmed Zaki, Czapnik, Lukasz, Michal Swiatkowski > > On Wed, 16 Apr 2025 09:13:23 +0200 Jaroslav Pulchart wrote: > > By "traced" I mean using the kernel and checking memory situation on > > numa nodes with and without production load. Numa nodes, with X810 > > NIC, showing a quite less available memory with default queue length > > (num of all cpus) and it needs to be lowered to 1-2 (for unused > > interfaces) and up-to-count of numa node cores on used interfaces to > > make the memory allocation reasonable and server avoiding "kswapd"... > > > > See "MemFree" on numa 0 + 1 on different/smaller but utilized (running > > VMs + using network) host server with 8 numa nodes (32GB RAM each, 28G > > in Hugepase for VMs and 4GB for host os): > > FWIW you can also try the tools/net/ynl/samples/page-pool > application, not sure if Intel NICs init page pools appropriately > but this will show you exactly how much memory is sitting on Rx rings > of the driver (and in net socket buffers). I'm not familiar with the page-pool tool, I try to build it, run it and nothing is shown. Any hint/menual how to use it? > > > 6.13.y vanilla (lot of kswapd0 in background): > > NUMA nodes: 0 1 2 3 4 5 6 7 > > HPTotalGiB: 28 28 28 28 28 28 28 28 > > HPFreeGiB: 0 0 0 0 0 0 0 0 > > MemTotal: 32220 32701 32701 32686 32701 32701 > > 32701 32696 > > MemFree: 274 254 1327 1928 1949 2683 2624 2769 > > 6.13.y + Revert (no memory issues at all): > > NUMA nodes: 0 1 2 3 4 5 6 7 > > HPTotalGiB: 28 28 28 28 28 28 28 28 > > HPFreeGiB: 0 0 0 0 0 0 0 0 > > MemTotal: 32220 32701 32701 32686 32701 32701 32701 32696 > > MemFree: 2213 2438 3402 3108 2846 2672 2592 3063 > > > > We need to lower the queue on all X810 interfaces from default (64 in > > this case), to ensure we have memory available for host OS services. > > ethtool -L em2 combined 1 > > ethtool -L p3p2 combined 1 > > ethtool -L em1 combined 6 > > ethtool -L p3p1 combined 6 > > This trick "does not work" without the revert. > > And you're reverting just and exactly 492a044508ad13 ? > The memory for persistent config is allocated in alloc_netdev_mqs() > unconditionally. I'm lost as to how this commit could make any > difference :( Yes, reverted the 492a044508ad13. ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: Increased memory usage on NUMA nodes with ICE driver after upgrade to 6.13.y (regression in commit 492a044508ad) 2025-04-16 16:03 ` Jaroslav Pulchart @ 2025-04-16 22:44 ` Jakub Kicinski 2025-04-16 22:57 ` [Intel-wired-lan] " Keller, Jacob E 2025-04-16 22:57 ` Keller, Jacob E 1 sibling, 1 reply; 46+ messages in thread From: Jakub Kicinski @ 2025-04-16 22:44 UTC (permalink / raw) To: Jaroslav Pulchart Cc: Przemek Kitszel, jdamato, intel-wired-lan, netdev, Tony Nguyen, Igor Raits, Daniel Secik, Zdenek Pesek, Eric Dumazet, Martin Karsten, Ahmed Zaki, Czapnik, Lukasz, Michal Swiatkowski On Wed, 16 Apr 2025 18:03:52 +0200 Jaroslav Pulchart wrote: > > FWIW you can also try the tools/net/ynl/samples/page-pool > > application, not sure if Intel NICs init page pools appropriately > > but this will show you exactly how much memory is sitting on Rx rings > > of the driver (and in net socket buffers). > > I'm not familiar with the page-pool tool, I try to build it, run it > and nothing is shown. Any hint/menual how to use it? It's pretty dumb, you run it and it tells you how much memory is allocated by Rx page pools. Commit message has an example: https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=637567e4a3ef6f6a5ffa48781207d270265f7e68 ^ permalink raw reply [flat|nested] 46+ messages in thread
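Building and running the sample mentioned above goes roughly like this; a
sketch only, since the exact make invocation and prerequisites can differ
between kernel trees:

  # Rough sketch; run from the top of a kernel source tree.
  make -C tools/net/ynl          # builds libynl and the samples, including page-pool
  ./tools/net/ynl/samples/page-pool   # prints how much memory sits in Rx page pools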
* RE: [Intel-wired-lan] Increased memory usage on NUMA nodes with ICE driver after upgrade to 6.13.y (regression in commit 492a044508ad) 2025-04-16 22:44 ` Jakub Kicinski @ 2025-04-16 22:57 ` Keller, Jacob E 0 siblings, 0 replies; 46+ messages in thread From: Keller, Jacob E @ 2025-04-16 22:57 UTC (permalink / raw) To: Jakub Kicinski, Jaroslav Pulchart Cc: Kitszel, Przemyslaw, Damato, Joe, intel-wired-lan@lists.osuosl.org, netdev@vger.kernel.org, Nguyen, Anthony L, Igor Raits, Daniel Secik, Zdenek Pesek, Dumazet, Eric, Martin Karsten, Zaki, Ahmed, Czapnik, Lukasz, Michal Swiatkowski > -----Original Message----- > From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf Of Jakub > Kicinski > Sent: Wednesday, April 16, 2025 3:45 PM > To: Jaroslav Pulchart <jaroslav.pulchart@gooddata.com> > Cc: Kitszel, Przemyslaw <przemyslaw.kitszel@intel.com>; Damato, Joe > <jdamato@fastly.com>; intel-wired-lan@lists.osuosl.org; netdev@vger.kernel.org; > Nguyen, Anthony L <anthony.l.nguyen@intel.com>; Igor Raits > <igor@gooddata.com>; Daniel Secik <daniel.secik@gooddata.com>; Zdenek Pesek > <zdenek.pesek@gooddata.com>; Dumazet, Eric <edumazet@google.com>; Martin > Karsten <mkarsten@uwaterloo.ca>; Zaki, Ahmed <ahmed.zaki@intel.com>; > Czapnik, Lukasz <lukasz.czapnik@intel.com>; Michal Swiatkowski > <michal.swiatkowski@linux.intel.com> > Subject: Re: [Intel-wired-lan] Increased memory usage on NUMA nodes with ICE > driver after upgrade to 6.13.y (regression in commit 492a044508ad) > > On Wed, 16 Apr 2025 18:03:52 +0200 Jaroslav Pulchart wrote: > > > FWIW you can also try the tools/net/ynl/samples/page-pool > > > application, not sure if Intel NICs init page pools appropriately > > > but this will show you exactly how much memory is sitting on Rx rings > > > of the driver (and in net socket buffers). > > > > I'm not familiar with the page-pool tool, I try to build it, run it > > and nothing is shown. Any hint/menual how to use it? > > It's pretty dumb, you run it and it tells you how much memory is > allocated by Rx page pools. Commit message has an example: > https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i > d=637567e4a3ef6f6a5ffa48781207d270265f7e68 Unfortunately, I don't think ice has migrated to page pool just yet ☹ ^ permalink raw reply [flat|nested] 46+ messages in thread
* RE: [Intel-wired-lan] Increased memory usage on NUMA nodes with ICE driver after upgrade to 6.13.y (regression in commit 492a044508ad) 2025-04-16 16:03 ` Jaroslav Pulchart 2025-04-16 22:44 ` Jakub Kicinski @ 2025-04-16 22:57 ` Keller, Jacob E 2025-04-17 0:13 ` Jakub Kicinski 1 sibling, 1 reply; 46+ messages in thread From: Keller, Jacob E @ 2025-04-16 22:57 UTC (permalink / raw) To: Jaroslav Pulchart, Jakub Kicinski Cc: Kitszel, Przemyslaw, Damato, Joe, intel-wired-lan@lists.osuosl.org, netdev@vger.kernel.org, Nguyen, Anthony L, Igor Raits, Daniel Secik, Zdenek Pesek, Dumazet, Eric, Martin Karsten, Zaki, Ahmed, Czapnik, Lukasz, Michal Swiatkowski > -----Original Message----- > From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf Of Jaroslav > Pulchart > Sent: Wednesday, April 16, 2025 9:04 AM > To: Jakub Kicinski <kuba@kernel.org> > Cc: Kitszel, Przemyslaw <przemyslaw.kitszel@intel.com>; Damato, Joe > <jdamato@fastly.com>; intel-wired-lan@lists.osuosl.org; netdev@vger.kernel.org; > Nguyen, Anthony L <anthony.l.nguyen@intel.com>; Igor Raits > <igor@gooddata.com>; Daniel Secik <daniel.secik@gooddata.com>; Zdenek Pesek > <zdenek.pesek@gooddata.com>; Dumazet, Eric <edumazet@google.com>; Martin > Karsten <mkarsten@uwaterloo.ca>; Zaki, Ahmed <ahmed.zaki@intel.com>; > Czapnik, Lukasz <lukasz.czapnik@intel.com>; Michal Swiatkowski > <michal.swiatkowski@linux.intel.com> > Subject: Re: [Intel-wired-lan] Increased memory usage on NUMA nodes with ICE > driver after upgrade to 6.13.y (regression in commit 492a044508ad) > > > > > On Wed, 16 Apr 2025 09:13:23 +0200 Jaroslav Pulchart wrote: > > > By "traced" I mean using the kernel and checking memory situation on > > > numa nodes with and without production load. Numa nodes, with X810 > > > NIC, showing a quite less available memory with default queue length > > > (num of all cpus) and it needs to be lowered to 1-2 (for unused > > > interfaces) and up-to-count of numa node cores on used interfaces to > > > make the memory allocation reasonable and server avoiding "kswapd"... > > > > > > See "MemFree" on numa 0 + 1 on different/smaller but utilized (running > > > VMs + using network) host server with 8 numa nodes (32GB RAM each, 28G > > > in Hugepase for VMs and 4GB for host os): > > > > FWIW you can also try the tools/net/ynl/samples/page-pool > > application, not sure if Intel NICs init page pools appropriately > > but this will show you exactly how much memory is sitting on Rx rings > > of the driver (and in net socket buffers). > > I'm not familiar with the page-pool tool, I try to build it, run it > and nothing is shown. Any hint/menual how to use it? > > > > > > 6.13.y vanilla (lot of kswapd0 in background): > > > NUMA nodes: 0 1 2 3 4 5 6 7 > > > HPTotalGiB: 28 28 28 28 28 28 28 28 > > > HPFreeGiB: 0 0 0 0 0 0 0 0 > > > MemTotal: 32220 32701 32701 32686 32701 32701 > > > 32701 32696 > > > MemFree: 274 254 1327 1928 1949 2683 2624 2769 > > > 6.13.y + Revert (no memory issues at all): > > > NUMA nodes: 0 1 2 3 4 5 6 7 > > > HPTotalGiB: 28 28 28 28 28 28 28 28 > > > HPFreeGiB: 0 0 0 0 0 0 0 0 > > > MemTotal: 32220 32701 32701 32686 32701 32701 32701 32696 > > > MemFree: 2213 2438 3402 3108 2846 2672 2592 3063 > > > > > > We need to lower the queue on all X810 interfaces from default (64 in > > > this case), to ensure we have memory available for host OS services. 
> > > ethtool -L em2 combined 1
> > > ethtool -L p3p2 combined 1
> > > ethtool -L em1 combined 6
> > > ethtool -L p3p1 combined 6
> > > This trick "does not work" without the revert.
> >
> > And you're reverting just and exactly 492a044508ad13 ?
> > The memory for persistent config is allocated in alloc_netdev_mqs()
> > unconditionally. I'm lost as to how this commit could make any
> > difference :(
>
> Yes, reverted the 492a044508ad13.

Struct napi_config *is* 1056 bytes, or about 1KB, and we will allocate one per
max queue with this change, resulting in 1KB per CPU. On a 64-CPU system this
should be at most ~64KB... It seems unlikely that the root cause of a memory
outage like this is just the napi_config structure.

Perhaps netif_napi_restore_config is somehow causing us to end up with more
allocated memory? Or some interaction with our ethtool callback to reduce the
number of rings is not working properly?

^ permalink raw reply	[flat|nested] 46+ messages in thread
* Re: [Intel-wired-lan] Increased memory usage on NUMA nodes with ICE driver after upgrade to 6.13.y (regression in commit 492a044508ad) 2025-04-16 22:57 ` Keller, Jacob E @ 2025-04-17 0:13 ` Jakub Kicinski 2025-04-17 17:52 ` Keller, Jacob E 0 siblings, 1 reply; 46+ messages in thread From: Jakub Kicinski @ 2025-04-17 0:13 UTC (permalink / raw) To: Keller, Jacob E Cc: Jaroslav Pulchart, Kitszel, Przemyslaw, Damato, Joe, intel-wired-lan@lists.osuosl.org, netdev@vger.kernel.org, Nguyen, Anthony L, Igor Raits, Daniel Secik, Zdenek Pesek, Dumazet, Eric, Martin Karsten, Zaki, Ahmed, Czapnik, Lukasz, Michal Swiatkowski On Wed, 16 Apr 2025 22:57:10 +0000 Keller, Jacob E wrote: > > > And you're reverting just and exactly 492a044508ad13 ? > > > The memory for persistent config is allocated in alloc_netdev_mqs() > > > unconditionally. I'm lost as to how this commit could make any > > > difference :( > > > > Yes, reverted the 492a044508ad13. > > Struct napi_config *is* 1056 bytes You're probably looking at 6.15-rcX kernels. Yes, the affinity mask can be large depending on the kernel config. But report is for 6.13, AFAIU. In 6.13 and 6.14 napi_config was tiny. ^ permalink raw reply [flat|nested] 46+ messages in thread
* RE: [Intel-wired-lan] Increased memory usage on NUMA nodes with ICE driver after upgrade to 6.13.y (regression in commit 492a044508ad) 2025-04-17 0:13 ` Jakub Kicinski @ 2025-04-17 17:52 ` Keller, Jacob E 2025-05-21 10:50 ` Jaroslav Pulchart 0 siblings, 1 reply; 46+ messages in thread From: Keller, Jacob E @ 2025-04-17 17:52 UTC (permalink / raw) To: Jakub Kicinski Cc: Jaroslav Pulchart, Kitszel, Przemyslaw, Damato, Joe, intel-wired-lan@lists.osuosl.org, netdev@vger.kernel.org, Nguyen, Anthony L, Igor Raits, Daniel Secik, Zdenek Pesek, Dumazet, Eric, Martin Karsten, Zaki, Ahmed, Czapnik, Lukasz, Michal Swiatkowski > -----Original Message----- > From: Jakub Kicinski <kuba@kernel.org> > Sent: Wednesday, April 16, 2025 5:13 PM > To: Keller, Jacob E <jacob.e.keller@intel.com> > Cc: Jaroslav Pulchart <jaroslav.pulchart@gooddata.com>; Kitszel, Przemyslaw > <przemyslaw.kitszel@intel.com>; Damato, Joe <jdamato@fastly.com>; intel-wired- > lan@lists.osuosl.org; netdev@vger.kernel.org; Nguyen, Anthony L > <anthony.l.nguyen@intel.com>; Igor Raits <igor@gooddata.com>; Daniel Secik > <daniel.secik@gooddata.com>; Zdenek Pesek <zdenek.pesek@gooddata.com>; > Dumazet, Eric <edumazet@google.com>; Martin Karsten > <mkarsten@uwaterloo.ca>; Zaki, Ahmed <ahmed.zaki@intel.com>; Czapnik, > Lukasz <lukasz.czapnik@intel.com>; Michal Swiatkowski > <michal.swiatkowski@linux.intel.com> > Subject: Re: [Intel-wired-lan] Increased memory usage on NUMA nodes with ICE > driver after upgrade to 6.13.y (regression in commit 492a044508ad) > > On Wed, 16 Apr 2025 22:57:10 +0000 Keller, Jacob E wrote: > > > > And you're reverting just and exactly 492a044508ad13 ? > > > > The memory for persistent config is allocated in alloc_netdev_mqs() > > > > unconditionally. I'm lost as to how this commit could make any > > > > difference :( > > > > > > Yes, reverted the 492a044508ad13. > > > > Struct napi_config *is* 1056 bytes > > You're probably looking at 6.15-rcX kernels. Yes, the affinity mask > can be large depending on the kernel config. But report is for 6.13, > AFAIU. In 6.13 and 6.14 napi_config was tiny. Regardless, it should still be ~64KB even in that case which is a far cry from eating all available memory. Something else must be going on.... Thanks, Jake ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [Intel-wired-lan] Increased memory usage on NUMA nodes with ICE driver after upgrade to 6.13.y (regression in commit 492a044508ad)
  2025-04-17 17:52                 ` Keller, Jacob E
@ 2025-05-21 10:50                   ` Jaroslav Pulchart
  2025-06-04  8:42                     ` Jaroslav Pulchart
  0 siblings, 1 reply; 46+ messages in thread
From: Jaroslav Pulchart @ 2025-05-21 10:50 UTC (permalink / raw)
  To: Keller, Jacob E, Jakub Kicinski, Kitszel, Przemyslaw, Damato, Joe, intel-wired-lan@lists.osuosl.org, netdev@vger.kernel.org, Nguyen, Anthony L, Michal Swiatkowski, Czapnik, Lukasz, Dumazet, Eric, Zaki, Ahmed, Martin Karsten
  Cc: Igor Raits, Daniel Secik, Zdenek Pesek

On Thu, 17 Apr 2025 at 19:52, Keller, Jacob E <jacob.e.keller@intel.com> wrote:
>
>
>
> > -----Original Message-----
> > From: Jakub Kicinski <kuba@kernel.org>
> > Sent: Wednesday, April 16, 2025 5:13 PM
> > To: Keller, Jacob E <jacob.e.keller@intel.com>
> > Cc: Jaroslav Pulchart <jaroslav.pulchart@gooddata.com>; Kitszel, Przemyslaw
> > <przemyslaw.kitszel@intel.com>; Damato, Joe <jdamato@fastly.com>; intel-wired-
> > lan@lists.osuosl.org; netdev@vger.kernel.org; Nguyen, Anthony L
> > <anthony.l.nguyen@intel.com>; Igor Raits <igor@gooddata.com>; Daniel Secik
> > <daniel.secik@gooddata.com>; Zdenek Pesek <zdenek.pesek@gooddata.com>;
> > Dumazet, Eric <edumazet@google.com>; Martin Karsten
> > <mkarsten@uwaterloo.ca>; Zaki, Ahmed <ahmed.zaki@intel.com>; Czapnik,
> > Lukasz <lukasz.czapnik@intel.com>; Michal Swiatkowski
> > <michal.swiatkowski@linux.intel.com>
> > Subject: Re: [Intel-wired-lan] Increased memory usage on NUMA nodes with ICE
> > driver after upgrade to 6.13.y (regression in commit 492a044508ad)
> >
> > On Wed, 16 Apr 2025 22:57:10 +0000 Keller, Jacob E wrote:
> > > > > And you're reverting just and exactly 492a044508ad13 ?
> > > > > The memory for persistent config is allocated in alloc_netdev_mqs()
> > > > > unconditionally. I'm lost as to how this commit could make any
> > > > > difference :(
> > > >
> > > > Yes, reverted the 492a044508ad13.
> > >
> > > Struct napi_config *is* 1056 bytes
> >
> > You're probably looking at 6.15-rcX kernels. Yes, the affinity mask
> > can be large depending on the kernel config. But report is for 6.13,
> > AFAIU. In 6.13 and 6.14 napi_config was tiny.
>
> Regardless, it should still be ~64KB even in that case which is a far cry from eating all available memory. Something else must be going on....
>
> Thanks,
> Jake

Hello

One observation: this "problem" still exists with the latest 6.14.y, and there
must be multiple issues. Memory utilization slowly goes down, from 3GB to 100MB
over 10-20 days, on the home NUMA nodes of the Intel X810 NICs (it looks like a
memory leak related to networking).

So without the revert, kswapd usage is observed almost immediately, within 1-2
days; with the mentioned commit reverted, kswapd starts to consume resources
later, after roughly 10-20 days. It is almost impossible to use servers with
Intel X810 cards (ice driver) with recent Linux kernels.

Were you able to reproduce the memory problems in your testbed?

Best,
Jaroslav

^ permalink raw reply	[flat|nested] 46+ messages in thread
* Re: [Intel-wired-lan] Increased memory usage on NUMA nodes with ICE driver after upgrade to 6.13.y (regression in commit 492a044508ad) 2025-05-21 10:50 ` Jaroslav Pulchart @ 2025-06-04 8:42 ` Jaroslav Pulchart [not found] ` <CAK8fFZ5XTO9dGADuMSV0hJws-6cZE9equa3X6dfTBgDyzE1pEQ@mail.gmail.com> 0 siblings, 1 reply; 46+ messages in thread From: Jaroslav Pulchart @ 2025-06-04 8:42 UTC (permalink / raw) To: Keller, Jacob E, Jakub Kicinski, Kitszel, Przemyslaw, Damato, Joe, intel-wired-lan@lists.osuosl.org, netdev@vger.kernel.org, Nguyen, Anthony L, Michal Swiatkowski, Czapnik, Lukasz, Dumazet, Eric, Zaki, Ahmed, Martin Karsten Cc: Igor Raits, Daniel Secik, Zdenek Pesek [-- Attachment #1: Type: text/plain, Size: 3080 bytes --] > > čt 17. 4. 2025 v 19:52 odesílatel Keller, Jacob E > <jacob.e.keller@intel.com> napsal: > > > > > > > > > -----Original Message----- > > > From: Jakub Kicinski <kuba@kernel.org> > > > Sent: Wednesday, April 16, 2025 5:13 PM > > > To: Keller, Jacob E <jacob.e.keller@intel.com> > > > Cc: Jaroslav Pulchart <jaroslav.pulchart@gooddata.com>; Kitszel, Przemyslaw > > > <przemyslaw.kitszel@intel.com>; Damato, Joe <jdamato@fastly.com>; intel-wired- > > > lan@lists.osuosl.org; netdev@vger.kernel.org; Nguyen, Anthony L > > > <anthony.l.nguyen@intel.com>; Igor Raits <igor@gooddata.com>; Daniel Secik > > > <daniel.secik@gooddata.com>; Zdenek Pesek <zdenek.pesek@gooddata.com>; > > > Dumazet, Eric <edumazet@google.com>; Martin Karsten > > > <mkarsten@uwaterloo.ca>; Zaki, Ahmed <ahmed.zaki@intel.com>; Czapnik, > > > Lukasz <lukasz.czapnik@intel.com>; Michal Swiatkowski > > > <michal.swiatkowski@linux.intel.com> > > > Subject: Re: [Intel-wired-lan] Increased memory usage on NUMA nodes with ICE > > > driver after upgrade to 6.13.y (regression in commit 492a044508ad) > > > > > > On Wed, 16 Apr 2025 22:57:10 +0000 Keller, Jacob E wrote: > > > > > > And you're reverting just and exactly 492a044508ad13 ? > > > > > > The memory for persistent config is allocated in alloc_netdev_mqs() > > > > > > unconditionally. I'm lost as to how this commit could make any > > > > > > difference :( > > > > > > > > > > Yes, reverted the 492a044508ad13. > > > > > > > > Struct napi_config *is* 1056 bytes > > > > > > You're probably looking at 6.15-rcX kernels. Yes, the affinity mask > > > can be large depending on the kernel config. But report is for 6.13, > > > AFAIU. In 6.13 and 6.14 napi_config was tiny. > > > > Regardless, it should still be ~64KB even in that case which is a far cry from eating all available memory. Something else must be going on.... > > > > Thanks, > > Jake > > Hello > > Some observation, this "problem" still exists with the latest 6.14.y > and there must be multiple issues, the memory utilization is slowly > going down, from 3GB to 100MB in 10-20days. at home NUMA nodes where > intel x810 NIC are (looks like some memory leak related to > networking). > > So without the revert the kawadX usage is observed asap like till > 1-2d, with revert of mentioned commit kswadX starts to consume > resources later like in ~10d-20d later. It is almost impossible to use > servers with Intel X810 cards (ice driver) with recent linux kernels. > > Were you able to reproduce the memory problems in your testbed? 
> > Best,
> Jaroslav

Hello

I deployed Linux 6.15.0 to our servers 7 days ago and observed the behaviour of
the memory utilization on the NUMA home nodes of the Intel X810:
1/ there is no need to revert the commit as before,
2/ the memory is continuously consumed (like a memory leak),
see the attached "7d_memory_usage_per_numa_linux6.15.0.png" screenshot of the
8 NUMA nodes (NUMA0 + NUMA1 are local for the X810 NICs). BTW: We do not see
this memory utilization pattern on servers using Broadcom NetXtreme-E NICs.

[-- Attachment #2: 7d_memory_usage_per_numa_linux6.15.0.png --]
[-- Type: image/png, Size: 430093 bytes --]

^ permalink raw reply	[flat|nested] 46+ messages in thread
[parent not found: <CAK8fFZ5XTO9dGADuMSV0hJws-6cZE9equa3X6dfTBgDyzE1pEQ@mail.gmail.com>]
* Re: [Intel-wired-lan] Increased memory usage on NUMA nodes with ICE driver after upgrade to 6.13.y (regression in commit 492a044508ad) [not found] ` <CAK8fFZ5XTO9dGADuMSV0hJws-6cZE9equa3X6dfTBgDyzE1pEQ@mail.gmail.com> @ 2025-06-25 14:03 ` Przemek Kitszel [not found] ` <CAK8fFZ7LREBEdhXjBAKuaqktOz1VwsBTxcCpLBsa+dkMj4Pyyw@mail.gmail.com> 2025-06-25 14:53 ` Paul Menzel 1 sibling, 1 reply; 46+ messages in thread From: Przemek Kitszel @ 2025-06-25 14:03 UTC (permalink / raw) To: Jaroslav Pulchart, intel-wired-lan@lists.osuosl.org Cc: Keller, Jacob E, Jakub Kicinski, Damato, Joe, netdev@vger.kernel.org, Nguyen, Anthony L, Michal Swiatkowski, Czapnik, Lukasz, Dumazet, Eric, Zaki, Ahmed, Martin Karsten, Igor Raits, Daniel Secik, Zdenek Pesek On 6/25/25 14:17, Jaroslav Pulchart wrote: > Hello > > We are still facing the memory issue with Intel 810 NICs (even on latest > 6.15.y). > > Our current stabilization and solution is to move everything to a new > INTEL-FREE server and get rid of last Intel sights there (after Intel's > CPU vulnerabilities fuckups NICs are next step). > > Any help welcomed, > Jaroslav P. > > Thank you for urging us, I can understand the frustration. We have identified some (unrelated) memory leaks, will soon ship fixes. And, as there were no clear issue with any commit/version you have posted to be a culprit, there is a chance that our random findings could help. Anyway going to zero kmemleak reports is good in itself, that is a good start. Will ask my VAL too to increase efforts in this area too. Przemek > > st 4. 6. 2025 v 10:42 odesílatel Jaroslav Pulchart > <jaroslav.pulchart@gooddata.com <mailto:jaroslav.pulchart@gooddata.com>> > napsal: > > > > > čt 17. 4. 2025 v 19:52 odesílatel Keller, Jacob E > > <jacob.e.keller@intel.com <mailto:jacob.e.keller@intel.com>> napsal: > > > > > > > > > > > > > -----Original Message----- > > > > From: Jakub Kicinski <kuba@kernel.org <mailto:kuba@kernel.org>> > > > > Sent: Wednesday, April 16, 2025 5:13 PM > > > > To: Keller, Jacob E <jacob.e.keller@intel.com > <mailto:jacob.e.keller@intel.com>> > > > > Cc: Jaroslav Pulchart <jaroslav.pulchart@gooddata.com > <mailto:jaroslav.pulchart@gooddata.com>>; Kitszel, Przemyslaw > > > > <przemyslaw.kitszel@intel.com > <mailto:przemyslaw.kitszel@intel.com>>; Damato, Joe > <jdamato@fastly.com <mailto:jdamato@fastly.com>>; intel-wired- > > > > lan@lists.osuosl.org <mailto:lan@lists.osuosl.org>; > netdev@vger.kernel.org <mailto:netdev@vger.kernel.org>; Nguyen, > Anthony L > > > > <anthony.l.nguyen@intel.com > <mailto:anthony.l.nguyen@intel.com>>; Igor Raits <igor@gooddata.com > <mailto:igor@gooddata.com>>; Daniel Secik > > > > <daniel.secik@gooddata.com > <mailto:daniel.secik@gooddata.com>>; Zdenek Pesek > <zdenek.pesek@gooddata.com <mailto:zdenek.pesek@gooddata.com>>; > > > > Dumazet, Eric <edumazet@google.com > <mailto:edumazet@google.com>>; Martin Karsten > > > > <mkarsten@uwaterloo.ca <mailto:mkarsten@uwaterloo.ca>>; Zaki, > Ahmed <ahmed.zaki@intel.com <mailto:ahmed.zaki@intel.com>>; Czapnik, > > > > Lukasz <lukasz.czapnik@intel.com > <mailto:lukasz.czapnik@intel.com>>; Michal Swiatkowski > > > > <michal.swiatkowski@linux.intel.com > <mailto:michal.swiatkowski@linux.intel.com>> > > > > Subject: Re: [Intel-wired-lan] Increased memory usage on NUMA > nodes with ICE > > > > driver after upgrade to 6.13.y (regression in commit > 492a044508ad) > > > > > > > > On Wed, 16 Apr 2025 22:57:10 +0000 Keller, Jacob E wrote: > > > > > > > And you're reverting just and exactly 492a044508ad13 ? 
> > > > > > > The memory for persistent config is allocated in > alloc_netdev_mqs() > > > > > > > unconditionally. I'm lost as to how this commit could > make any > > > > > > > difference :( > > > > > > > > > > > > Yes, reverted the 492a044508ad13. > > > > > > > > > > Struct napi_config *is* 1056 bytes > > > > > > > > You're probably looking at 6.15-rcX kernels. Yes, the > affinity mask > > > > can be large depending on the kernel config. But report is > for 6.13, > > > > AFAIU. In 6.13 and 6.14 napi_config was tiny. > > > > > > Regardless, it should still be ~64KB even in that case which is > a far cry from eating all available memory. Something else must be > going on.... > > > > > > Thanks, > > > Jake > > > > Hello > > > > Some observation, this "problem" still exists with the latest 6.14.y > > and there must be multiple issues, the memory utilization is slowly > > going down, from 3GB to 100MB in 10-20days. at home NUMA nodes where > > intel x810 NIC are (looks like some memory leak related to > > networking). > > > > So without the revert the kawadX usage is observed asap like till > > 1-2d, with revert of mentioned commit kswadX starts to consume > > resources later like in ~10d-20d later. It is almost impossible > to use > > servers with Intel X810 cards (ice driver) with recent linux kernels. > > > > Were you able to reproduce the memory problems in your testbed? > > > > Best, > > Jaroslav > > Hello > > I deployed linux 6.15.0 to our servers 7d ago and observed the > behaviour of memory utilization of NUMA home nodes of Intel X810 > 1/ there is no need to revert the commit as before, > 2/ the memory is continuously consumed (like memory leak), > see attached "7d_memory_usage_per_numa_linux6.15.0.png" screenshot 8x > numa nodes, (NUMA0 + NUMA1 are local for X810 nics). BTW: We do not > see this memory utilization pattern on server s using Broadcom > Netxtreme-E NICs > > > > -- > Jaroslav Pulchart > Sr. Principal SW Engineer > GoodData ^ permalink raw reply [flat|nested] 46+ messages in thread
[parent not found: <CAK8fFZ7LREBEdhXjBAKuaqktOz1VwsBTxcCpLBsa+dkMj4Pyyw@mail.gmail.com>]
* Re: [Intel-wired-lan] Increased memory usage on NUMA nodes with ICE driver after upgrade to 6.13.y (regression in commit 492a044508ad) [not found] ` <CAK8fFZ7LREBEdhXjBAKuaqktOz1VwsBTxcCpLBsa+dkMj4Pyyw@mail.gmail.com> @ 2025-06-25 20:25 ` Jakub Kicinski 2025-06-26 7:42 ` Jaroslav Pulchart 0 siblings, 1 reply; 46+ messages in thread From: Jakub Kicinski @ 2025-06-25 20:25 UTC (permalink / raw) To: Jaroslav Pulchart Cc: Przemek Kitszel, intel-wired-lan@lists.osuosl.org, Keller, Jacob E, Damato, Joe, netdev@vger.kernel.org, Nguyen, Anthony L, Michal Swiatkowski, Czapnik, Lukasz, Dumazet, Eric, Zaki, Ahmed, Martin Karsten, Igor Raits, Daniel Secik, Zdenek Pesek On Wed, 25 Jun 2025 19:51:08 +0200 Jaroslav Pulchart wrote: > Great, please send me a link to the related patch set. I can apply them in > our kernel build and try them ASAP! Sorry if I'm repeating the question - have you tried CONFIG_MEM_ALLOC_PROFILING? Reportedly the overhead in recent kernels is low enough to use it for production workloads. > st 25. 6. 2025 v 16:03 odesílatel Przemek Kitszel < > przemyslaw.kitszel@intel.com> napsal: > > > On 6/25/25 14:17, Jaroslav Pulchart wrote: > > > Hello > > > > > > We are still facing the memory issue with Intel 810 NICs (even on latest > > > 6.15.y). > > > > > > Our current stabilization and solution is to move everything to a new > > > INTEL-FREE server and get rid of last Intel sights there (after Intel's > > > CPU vulnerabilities fuckups NICs are next step). > > > > > > Any help welcomed, > > > Jaroslav P. > > > > > > > > > > Thank you for urging us, I can understand the frustration. > > > > We have identified some (unrelated) memory leaks, will soon ship fixes. > > And, as there were no clear issue with any commit/version you have > > posted to be a culprit, there is a chance that our random findings could > > help. Anyway going to zero kmemleak reports is good in itself, that is > > a good start. > > > > Will ask my VAL too to increase efforts in this area too. ^ permalink raw reply [flat|nested] 46+ messages in thread
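For anyone wanting to reproduce the profiling shown in the next message: besides
building with CONFIG_MEM_ALLOC_PROFILING=y, the accounting can be toggled at
runtime. A minimal sketch, based on the kernel's allocation-profiling
documentation:

  # Requires a kernel built with CONFIG_MEM_ALLOC_PROFILING=y.
  sysctl vm.mem_profiling=1          # or boot with sysctl.vm.mem_profiling=1
  sort -g /proc/allocinfo | tail -n 15   # list the 15 biggest allocation sites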
* Re: [Intel-wired-lan] Increased memory usage on NUMA nodes with ICE driver after upgrade to 6.13.y (regression in commit 492a044508ad) 2025-06-25 20:25 ` Jakub Kicinski @ 2025-06-26 7:42 ` Jaroslav Pulchart 2025-06-30 7:35 ` Jaroslav Pulchart 0 siblings, 1 reply; 46+ messages in thread From: Jaroslav Pulchart @ 2025-06-26 7:42 UTC (permalink / raw) To: Jakub Kicinski Cc: Przemek Kitszel, intel-wired-lan@lists.osuosl.org, Keller, Jacob E, Damato, Joe, netdev@vger.kernel.org, Nguyen, Anthony L, Michal Swiatkowski, Czapnik, Lukasz, Dumazet, Eric, Zaki, Ahmed, Martin Karsten, Igor Raits, Daniel Secik, Zdenek Pesek > > On Wed, 25 Jun 2025 19:51:08 +0200 Jaroslav Pulchart wrote: > > Great, please send me a link to the related patch set. I can apply them in > > our kernel build and try them ASAP! > > Sorry if I'm repeating the question - have you tried > CONFIG_MEM_ALLOC_PROFILING? Reportedly the overhead in recent kernels > is low enough to use it for production workloads. I try it now, the fresh booted server: # sort -g /proc/allocinfo| tail -n 15 45409728 236509 fs/dcache.c:1681 func:__d_alloc 71041024 17344 mm/percpu-vm.c:95 func:pcpu_alloc_pages 71524352 11140 kernel/dma/direct.c:141 func:__dma_direct_alloc_pages 85098496 4486 mm/slub.c:2452 func:alloc_slab_page 115470992 101647 fs/ext4/super.c:1388 [ext4] func:ext4_alloc_inode 134479872 32832 kernel/events/ring_buffer.c:811 func:perf_mmap_alloc_page 141426688 34528 mm/filemap.c:1978 func:__filemap_get_folio 191594496 46776 mm/memory.c:1056 func:folio_prealloc 360710144 172 mm/khugepaged.c:1084 func:alloc_charge_folio 444076032 33790 mm/slub.c:2450 func:alloc_slab_page 530579456 129536 mm/page_ext.c:271 func:alloc_page_ext 975175680 465 mm/huge_memory.c:1165 func:vma_alloc_anon_folio_pmd 1022427136 249616 mm/memory.c:1054 func:folio_prealloc 1105125376 139252 drivers/net/ethernet/intel/ice/ice_txrx.c:681 [ice] func:ice_alloc_mapped_page 1621598208 395848 mm/readahead.c:186 func:ractl_alloc_folio > > > st 25. 6. 2025 v 16:03 odesílatel Przemek Kitszel < > > przemyslaw.kitszel@intel.com> napsal: > > > > > On 6/25/25 14:17, Jaroslav Pulchart wrote: > > > > Hello > > > > > > > > We are still facing the memory issue with Intel 810 NICs (even on latest > > > > 6.15.y). > > > > > > > > Our current stabilization and solution is to move everything to a new > > > > INTEL-FREE server and get rid of last Intel sights there (after Intel's > > > > CPU vulnerabilities fuckups NICs are next step). > > > > > > > > Any help welcomed, > > > > Jaroslav P. > > > > > > > > > > > > > > Thank you for urging us, I can understand the frustration. > > > > > > We have identified some (unrelated) memory leaks, will soon ship fixes. > > > And, as there were no clear issue with any commit/version you have > > > posted to be a culprit, there is a chance that our random findings could > > > help. Anyway going to zero kmemleak reports is good in itself, that is > > > a good start. > > > > > > Will ask my VAL too to increase efforts in this area too. > ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [Intel-wired-lan] Increased memory usage on NUMA nodes with ICE driver after upgrade to 6.13.y (regression in commit 492a044508ad) 2025-06-26 7:42 ` Jaroslav Pulchart @ 2025-06-30 7:35 ` Jaroslav Pulchart 2025-06-30 16:02 ` Jacob Keller 0 siblings, 1 reply; 46+ messages in thread From: Jaroslav Pulchart @ 2025-06-30 7:35 UTC (permalink / raw) To: Jakub Kicinski Cc: Przemek Kitszel, intel-wired-lan@lists.osuosl.org, Keller, Jacob E, Damato, Joe, netdev@vger.kernel.org, Nguyen, Anthony L, Michal Swiatkowski, Czapnik, Lukasz, Dumazet, Eric, Zaki, Ahmed, Martin Karsten, Igor Raits, Daniel Secik, Zdenek Pesek > > > > > On Wed, 25 Jun 2025 19:51:08 +0200 Jaroslav Pulchart wrote: > > > Great, please send me a link to the related patch set. I can apply them in > > > our kernel build and try them ASAP! > > > > Sorry if I'm repeating the question - have you tried > > CONFIG_MEM_ALLOC_PROFILING? Reportedly the overhead in recent kernels > > is low enough to use it for production workloads. > > I try it now, the fresh booted server: > > # sort -g /proc/allocinfo| tail -n 15 > 45409728 236509 fs/dcache.c:1681 func:__d_alloc > 71041024 17344 mm/percpu-vm.c:95 func:pcpu_alloc_pages > 71524352 11140 kernel/dma/direct.c:141 func:__dma_direct_alloc_pages > 85098496 4486 mm/slub.c:2452 func:alloc_slab_page > 115470992 101647 fs/ext4/super.c:1388 [ext4] func:ext4_alloc_inode > 134479872 32832 kernel/events/ring_buffer.c:811 func:perf_mmap_alloc_page > 141426688 34528 mm/filemap.c:1978 func:__filemap_get_folio > 191594496 46776 mm/memory.c:1056 func:folio_prealloc > 360710144 172 mm/khugepaged.c:1084 func:alloc_charge_folio > 444076032 33790 mm/slub.c:2450 func:alloc_slab_page > 530579456 129536 mm/page_ext.c:271 func:alloc_page_ext > 975175680 465 mm/huge_memory.c:1165 func:vma_alloc_anon_folio_pmd > 1022427136 249616 mm/memory.c:1054 func:folio_prealloc > 1105125376 139252 drivers/net/ethernet/intel/ice/ice_txrx.c:681 > [ice] func:ice_alloc_mapped_page > 1621598208 395848 mm/readahead.c:186 func:ractl_alloc_folio > The "drivers/net/ethernet/intel/ice/ice_txrx.c:681 [ice] func:ice_alloc_mapped_page" is just growing... # uptime ; sort -g /proc/allocinfo| tail -n 15 09:33:58 up 4 days, 6 min, 1 user, load average: 6.65, 8.18, 9.81 # sort -g /proc/allocinfo| tail -n 15 85216896 443838 fs/dcache.c:1681 func:__d_alloc 106156032 25917 mm/shmem.c:1854 func:shmem_alloc_folio 116850096 102861 fs/ext4/super.c:1388 [ext4] func:ext4_alloc_inode 134479872 32832 kernel/events/ring_buffer.c:811 func:perf_mmap_alloc_page 143556608 6894 mm/slub.c:2452 func:alloc_slab_page 186793984 45604 mm/memory.c:1056 func:folio_prealloc 362807296 88576 mm/percpu-vm.c:95 func:pcpu_alloc_pages 530579456 129536 mm/page_ext.c:271 func:alloc_page_ext 598237184 51309 mm/slub.c:2450 func:alloc_slab_page 838860800 400 mm/huge_memory.c:1165 func:vma_alloc_anon_folio_pmd 929083392 226827 mm/filemap.c:1978 func:__filemap_get_folio 1034657792 252602 mm/memory.c:1054 func:folio_prealloc 1262485504 602 mm/khugepaged.c:1084 func:alloc_charge_folio 1335377920 325970 mm/readahead.c:186 func:ractl_alloc_folio 2544877568 315003 drivers/net/ethernet/intel/ice/ice_txrx.c:681 [ice] func:ice_alloc_mapped_page > > > > > > st 25. 6. 2025 v 16:03 odesílatel Przemek Kitszel < > > > przemyslaw.kitszel@intel.com> napsal: > > > > > > > On 6/25/25 14:17, Jaroslav Pulchart wrote: > > > > > Hello > > > > > > > > > > We are still facing the memory issue with Intel 810 NICs (even on latest > > > > > 6.15.y). 
> > > > > > > > > > Our current stabilization and solution is to move everything to a new > > > > > INTEL-FREE server and get rid of last Intel sights there (after Intel's > > > > > CPU vulnerabilities fuckups NICs are next step). > > > > > > > > > > Any help welcomed, > > > > > Jaroslav P. > > > > > > > > > > > > > > > > > > Thank you for urging us, I can understand the frustration. > > > > > > > > We have identified some (unrelated) memory leaks, will soon ship fixes. > > > > And, as there were no clear issue with any commit/version you have > > > > posted to be a culprit, there is a chance that our random findings could > > > > help. Anyway going to zero kmemleak reports is good in itself, that is > > > > a good start. > > > > > > > > Will ask my VAL too to increase efforts in this area too. > > ^ permalink raw reply [flat|nested] 46+ messages in thread
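To confirm that the ice Rx page allocations keep growing over time, the relevant
/proc/allocinfo line can be sampled periodically; a simple sketch, with an
arbitrarily chosen interval:

  # Sample the ice Rx-page allocation counters every 10 minutes.
  while true; do
      date
      grep -F 'func:ice_alloc_mapped_page' /proc/allocinfo
      sleep 600
  done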
* Re: [Intel-wired-lan] Increased memory usage on NUMA nodes with ICE driver after upgrade to 6.13.y (regression in commit 492a044508ad) 2025-06-30 7:35 ` Jaroslav Pulchart @ 2025-06-30 16:02 ` Jacob Keller 2025-06-30 17:24 ` Jaroslav Pulchart 0 siblings, 1 reply; 46+ messages in thread From: Jacob Keller @ 2025-06-30 16:02 UTC (permalink / raw) To: Jaroslav Pulchart, Jakub Kicinski Cc: Przemek Kitszel, intel-wired-lan@lists.osuosl.org, Damato, Joe, netdev@vger.kernel.org, Nguyen, Anthony L, Michal Swiatkowski, Czapnik, Lukasz, Dumazet, Eric, Zaki, Ahmed, Martin Karsten, Igor Raits, Daniel Secik, Zdenek Pesek [-- Attachment #1.1: Type: text/plain, Size: 3954 bytes --] On 6/30/2025 12:35 AM, Jaroslav Pulchart wrote: >> >>> >>> On Wed, 25 Jun 2025 19:51:08 +0200 Jaroslav Pulchart wrote: >>>> Great, please send me a link to the related patch set. I can apply them in >>>> our kernel build and try them ASAP! >>> >>> Sorry if I'm repeating the question - have you tried >>> CONFIG_MEM_ALLOC_PROFILING? Reportedly the overhead in recent kernels >>> is low enough to use it for production workloads. >> >> I try it now, the fresh booted server: >> >> # sort -g /proc/allocinfo| tail -n 15 >> 45409728 236509 fs/dcache.c:1681 func:__d_alloc >> 71041024 17344 mm/percpu-vm.c:95 func:pcpu_alloc_pages >> 71524352 11140 kernel/dma/direct.c:141 func:__dma_direct_alloc_pages >> 85098496 4486 mm/slub.c:2452 func:alloc_slab_page >> 115470992 101647 fs/ext4/super.c:1388 [ext4] func:ext4_alloc_inode >> 134479872 32832 kernel/events/ring_buffer.c:811 func:perf_mmap_alloc_page >> 141426688 34528 mm/filemap.c:1978 func:__filemap_get_folio >> 191594496 46776 mm/memory.c:1056 func:folio_prealloc >> 360710144 172 mm/khugepaged.c:1084 func:alloc_charge_folio >> 444076032 33790 mm/slub.c:2450 func:alloc_slab_page >> 530579456 129536 mm/page_ext.c:271 func:alloc_page_ext >> 975175680 465 mm/huge_memory.c:1165 func:vma_alloc_anon_folio_pmd >> 1022427136 249616 mm/memory.c:1054 func:folio_prealloc >> 1105125376 139252 drivers/net/ethernet/intel/ice/ice_txrx.c:681 >> [ice] func:ice_alloc_mapped_page >> 1621598208 395848 mm/readahead.c:186 func:ractl_alloc_folio >> > > The "drivers/net/ethernet/intel/ice/ice_txrx.c:681 [ice] > func:ice_alloc_mapped_page" is just growing... > > # uptime ; sort -g /proc/allocinfo| tail -n 15 > 09:33:58 up 4 days, 6 min, 1 user, load average: 6.65, 8.18, 9.81 > > # sort -g /proc/allocinfo| tail -n 15 > 85216896 443838 fs/dcache.c:1681 func:__d_alloc > 106156032 25917 mm/shmem.c:1854 func:shmem_alloc_folio > 116850096 102861 fs/ext4/super.c:1388 [ext4] func:ext4_alloc_inode > 134479872 32832 kernel/events/ring_buffer.c:811 func:perf_mmap_alloc_page > 143556608 6894 mm/slub.c:2452 func:alloc_slab_page > 186793984 45604 mm/memory.c:1056 func:folio_prealloc > 362807296 88576 mm/percpu-vm.c:95 func:pcpu_alloc_pages > 530579456 129536 mm/page_ext.c:271 func:alloc_page_ext > 598237184 51309 mm/slub.c:2450 func:alloc_slab_page > 838860800 400 mm/huge_memory.c:1165 func:vma_alloc_anon_folio_pmd > 929083392 226827 mm/filemap.c:1978 func:__filemap_get_folio > 1034657792 252602 mm/memory.c:1054 func:folio_prealloc > 1262485504 602 mm/khugepaged.c:1084 func:alloc_charge_folio > 1335377920 325970 mm/readahead.c:186 func:ractl_alloc_folio > 2544877568 315003 drivers/net/ethernet/intel/ice/ice_txrx.c:681 > [ice] func:ice_alloc_mapped_page > ice_alloc_mapped_page is the function used to allocate the pages for the Rx ring buffers. There were a number of fixes for the hot path from Maciej which might be related. 
Although those fixes were primarily for XDP, they do impact the regular hot path as well. These were fixes on top of work he did that landed in v6.13, so it seems plausible they might be related. In particular, one of them mentions a missing buffer put: 743bbd93cf29 ("ice: put Rx buffers after being done with current frame") It says the following: > While at it, address an error path of ice_add_xdp_frag() - we were > missing buffer putting from day 1 there. > It seems to me the issue must be somehow related to the buffer cleanup logic for the Rx ring, since that's the only thing allocated by ice_alloc_mapped_page. It might be something fixed with the work Maciej did, but it seems very weird that 492a044508ad ("ice: Add support for persistent NAPI config") would affect that logic at all. [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 236 bytes --] ^ permalink raw reply [flat|nested] 46+ messages in thread
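As an aside: the /proc/allocinfo entries above name the allocation site by file and line, so the counter is easy to cross-check against the exact source of the running kernel. The line number (681 here) shifts between releases, and the range below is just an example window around it; as the func: tag already indicates, it should land inside ice_alloc_mapped_page(), the Rx buffer page allocator discussed here.

    # in a source checkout matching the running kernel
    sed -n '670,690p' drivers/net/ethernet/intel/ice/ice_txrx.c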
* Re: [Intel-wired-lan] Increased memory usage on NUMA nodes with ICE driver after upgrade to 6.13.y (regression in commit 492a044508ad) 2025-06-30 16:02 ` Jacob Keller @ 2025-06-30 17:24 ` Jaroslav Pulchart 2025-06-30 18:59 ` Jacob Keller 0 siblings, 1 reply; 46+ messages in thread From: Jaroslav Pulchart @ 2025-06-30 17:24 UTC (permalink / raw) To: Jacob Keller Cc: Jakub Kicinski, Przemek Kitszel, intel-wired-lan@lists.osuosl.org, Damato, Joe, netdev@vger.kernel.org, Nguyen, Anthony L, Michal Swiatkowski, Czapnik, Lukasz, Dumazet, Eric, Zaki, Ahmed, Martin Karsten, Igor Raits, Daniel Secik, Zdenek Pesek > > > > On 6/30/2025 12:35 AM, Jaroslav Pulchart wrote: > >> > >>> > >>> On Wed, 25 Jun 2025 19:51:08 +0200 Jaroslav Pulchart wrote: > >>>> Great, please send me a link to the related patch set. I can apply them in > >>>> our kernel build and try them ASAP! > >>> > >>> Sorry if I'm repeating the question - have you tried > >>> CONFIG_MEM_ALLOC_PROFILING? Reportedly the overhead in recent kernels > >>> is low enough to use it for production workloads. > >> > >> I try it now, the fresh booted server: > >> > >> # sort -g /proc/allocinfo| tail -n 15 > >> 45409728 236509 fs/dcache.c:1681 func:__d_alloc > >> 71041024 17344 mm/percpu-vm.c:95 func:pcpu_alloc_pages > >> 71524352 11140 kernel/dma/direct.c:141 func:__dma_direct_alloc_pages > >> 85098496 4486 mm/slub.c:2452 func:alloc_slab_page > >> 115470992 101647 fs/ext4/super.c:1388 [ext4] func:ext4_alloc_inode > >> 134479872 32832 kernel/events/ring_buffer.c:811 func:perf_mmap_alloc_page > >> 141426688 34528 mm/filemap.c:1978 func:__filemap_get_folio > >> 191594496 46776 mm/memory.c:1056 func:folio_prealloc > >> 360710144 172 mm/khugepaged.c:1084 func:alloc_charge_folio > >> 444076032 33790 mm/slub.c:2450 func:alloc_slab_page > >> 530579456 129536 mm/page_ext.c:271 func:alloc_page_ext > >> 975175680 465 mm/huge_memory.c:1165 func:vma_alloc_anon_folio_pmd > >> 1022427136 249616 mm/memory.c:1054 func:folio_prealloc > >> 1105125376 139252 drivers/net/ethernet/intel/ice/ice_txrx.c:681 > >> [ice] func:ice_alloc_mapped_page > >> 1621598208 395848 mm/readahead.c:186 func:ractl_alloc_folio > >> > > > > The "drivers/net/ethernet/intel/ice/ice_txrx.c:681 [ice] > > func:ice_alloc_mapped_page" is just growing... > > > > # uptime ; sort -g /proc/allocinfo| tail -n 15 > > 09:33:58 up 4 days, 6 min, 1 user, load average: 6.65, 8.18, 9.81 > > > > # sort -g /proc/allocinfo| tail -n 15 > > 85216896 443838 fs/dcache.c:1681 func:__d_alloc > > 106156032 25917 mm/shmem.c:1854 func:shmem_alloc_folio > > 116850096 102861 fs/ext4/super.c:1388 [ext4] func:ext4_alloc_inode > > 134479872 32832 kernel/events/ring_buffer.c:811 func:perf_mmap_alloc_page > > 143556608 6894 mm/slub.c:2452 func:alloc_slab_page > > 186793984 45604 mm/memory.c:1056 func:folio_prealloc > > 362807296 88576 mm/percpu-vm.c:95 func:pcpu_alloc_pages > > 530579456 129536 mm/page_ext.c:271 func:alloc_page_ext > > 598237184 51309 mm/slub.c:2450 func:alloc_slab_page > > 838860800 400 mm/huge_memory.c:1165 func:vma_alloc_anon_folio_pmd > > 929083392 226827 mm/filemap.c:1978 func:__filemap_get_folio > > 1034657792 252602 mm/memory.c:1054 func:folio_prealloc > > 1262485504 602 mm/khugepaged.c:1084 func:alloc_charge_folio > > 1335377920 325970 mm/readahead.c:186 func:ractl_alloc_folio > > 2544877568 315003 drivers/net/ethernet/intel/ice/ice_txrx.c:681 > > [ice] func:ice_alloc_mapped_page > > > ice_alloc_mapped_page is the function used to allocate the pages for the > Rx ring buffers. 
> > There were a number of fixes for the hot path from Maciej which might be > related. Although those fixes were primarily for XDP they do impact the > regular hot path as well. > > These were fixes on top of work he did which landed in v6.13, so it > seems plausible they might be related. In particular one which mentions > a missing buffer put: > > 743bbd93cf29 ("ice: put Rx buffers after being done with current frame") > > It says the following: > > While at it, address an error path of ice_add_xdp_frag() - we were > > missing buffer putting from day 1 there. > > > > It seems to me the issue must be somehow related to the buffer cleanup > logic for the Rx ring, since thats the only thing allocated by > ice_alloc_mapped_page. > > It might be something fixed with the work Maciej did.. but it seems very > weird that 492a044508ad ("ice: Add support for persistent NAPI config") > would affect that logic at all.... I believe there were/are at least two separate issues. Regarding commit 492a044508ad (“ice: Add support for persistent NAPI config”): * On 6.13.y and 6.14.y kernels, this change prevented us from lowering the driver’s initial, large memory allocation immediately after server power-up. A few hours (max few days) later, this inevitably led to an out-of-memory condition. * Reverting the commit in those series only delayed the OOM, it allowed the queue size (and thus memory footprint) to shrink on boot just as it did in 6.12.y but didn’t eliminate the underlying 'leak'. * In 6.15.y, however, that revert isn’t required (and isn’t even applicable). The after boot allocation can once again be tuned down without patching. Still, we observe the same increase in memory use over time, as shown in the 'allocmap' output. Thus, commit 492a044508ad led us down a false trail, or at the very least hastened the inevitable OOM. ^ permalink raw reply [flat|nested] 46+ messages in thread
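As an aside: on a given kernel, the first point above is simple to confirm with the allocation profiling already used in this thread, by watching the ice Rx-page counter across a channel change. The interface name and channel count below are only examples taken from this thread, and a drop in the byte count is what a kernel with working shrink should show.

    grep 'func:ice_alloc_mapped_page' /proc/allocinfo   # note bytes/count before
    ethtool -L em1 combined 2
    grep 'func:ice_alloc_mapped_page' /proc/allocinfo   # should drop if the shrink takes effect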
* Re: [Intel-wired-lan] Increased memory usage on NUMA nodes with ICE driver after upgrade to 6.13.y (regression in commit 492a044508ad) 2025-06-30 17:24 ` Jaroslav Pulchart @ 2025-06-30 18:59 ` Jacob Keller 2025-06-30 20:01 ` Jaroslav Pulchart 0 siblings, 1 reply; 46+ messages in thread From: Jacob Keller @ 2025-06-30 18:59 UTC (permalink / raw) To: Jaroslav Pulchart, Maciej Fijalkowski Cc: Jakub Kicinski, Przemek Kitszel, intel-wired-lan@lists.osuosl.org, Damato, Joe, netdev@vger.kernel.org, Nguyen, Anthony L, Michal Swiatkowski, Czapnik, Lukasz, Dumazet, Eric, Zaki, Ahmed, Martin Karsten, Igor Raits, Daniel Secik, Zdenek Pesek [-- Attachment #1.1: Type: text/plain, Size: 5575 bytes --] On 6/30/2025 10:24 AM, Jaroslav Pulchart wrote: >> >> >> >> On 6/30/2025 12:35 AM, Jaroslav Pulchart wrote: >>>> >>>>> >>>>> On Wed, 25 Jun 2025 19:51:08 +0200 Jaroslav Pulchart wrote: >>>>>> Great, please send me a link to the related patch set. I can apply them in >>>>>> our kernel build and try them ASAP! >>>>> >>>>> Sorry if I'm repeating the question - have you tried >>>>> CONFIG_MEM_ALLOC_PROFILING? Reportedly the overhead in recent kernels >>>>> is low enough to use it for production workloads. >>>> >>>> I try it now, the fresh booted server: >>>> >>>> # sort -g /proc/allocinfo| tail -n 15 >>>> 45409728 236509 fs/dcache.c:1681 func:__d_alloc >>>> 71041024 17344 mm/percpu-vm.c:95 func:pcpu_alloc_pages >>>> 71524352 11140 kernel/dma/direct.c:141 func:__dma_direct_alloc_pages >>>> 85098496 4486 mm/slub.c:2452 func:alloc_slab_page >>>> 115470992 101647 fs/ext4/super.c:1388 [ext4] func:ext4_alloc_inode >>>> 134479872 32832 kernel/events/ring_buffer.c:811 func:perf_mmap_alloc_page >>>> 141426688 34528 mm/filemap.c:1978 func:__filemap_get_folio >>>> 191594496 46776 mm/memory.c:1056 func:folio_prealloc >>>> 360710144 172 mm/khugepaged.c:1084 func:alloc_charge_folio >>>> 444076032 33790 mm/slub.c:2450 func:alloc_slab_page >>>> 530579456 129536 mm/page_ext.c:271 func:alloc_page_ext >>>> 975175680 465 mm/huge_memory.c:1165 func:vma_alloc_anon_folio_pmd >>>> 1022427136 249616 mm/memory.c:1054 func:folio_prealloc >>>> 1105125376 139252 drivers/net/ethernet/intel/ice/ice_txrx.c:681 >>>> [ice] func:ice_alloc_mapped_page >>>> 1621598208 395848 mm/readahead.c:186 func:ractl_alloc_folio >>>> >>> >>> The "drivers/net/ethernet/intel/ice/ice_txrx.c:681 [ice] >>> func:ice_alloc_mapped_page" is just growing... 
>>> >>> # uptime ; sort -g /proc/allocinfo| tail -n 15 >>> 09:33:58 up 4 days, 6 min, 1 user, load average: 6.65, 8.18, 9.81 >>> >>> # sort -g /proc/allocinfo| tail -n 15 >>> 85216896 443838 fs/dcache.c:1681 func:__d_alloc >>> 106156032 25917 mm/shmem.c:1854 func:shmem_alloc_folio >>> 116850096 102861 fs/ext4/super.c:1388 [ext4] func:ext4_alloc_inode >>> 134479872 32832 kernel/events/ring_buffer.c:811 func:perf_mmap_alloc_page >>> 143556608 6894 mm/slub.c:2452 func:alloc_slab_page >>> 186793984 45604 mm/memory.c:1056 func:folio_prealloc >>> 362807296 88576 mm/percpu-vm.c:95 func:pcpu_alloc_pages >>> 530579456 129536 mm/page_ext.c:271 func:alloc_page_ext >>> 598237184 51309 mm/slub.c:2450 func:alloc_slab_page >>> 838860800 400 mm/huge_memory.c:1165 func:vma_alloc_anon_folio_pmd >>> 929083392 226827 mm/filemap.c:1978 func:__filemap_get_folio >>> 1034657792 252602 mm/memory.c:1054 func:folio_prealloc >>> 1262485504 602 mm/khugepaged.c:1084 func:alloc_charge_folio >>> 1335377920 325970 mm/readahead.c:186 func:ractl_alloc_folio >>> 2544877568 315003 drivers/net/ethernet/intel/ice/ice_txrx.c:681 >>> [ice] func:ice_alloc_mapped_page >>> >> ice_alloc_mapped_page is the function used to allocate the pages for the >> Rx ring buffers. >> >> There were a number of fixes for the hot path from Maciej which might be >> related. Although those fixes were primarily for XDP they do impact the >> regular hot path as well. >> >> These were fixes on top of work he did which landed in v6.13, so it >> seems plausible they might be related. In particular one which mentions >> a missing buffer put: >> >> 743bbd93cf29 ("ice: put Rx buffers after being done with current frame") >> >> It says the following: >>> While at it, address an error path of ice_add_xdp_frag() - we were >>> missing buffer putting from day 1 there. >>> >> >> It seems to me the issue must be somehow related to the buffer cleanup >> logic for the Rx ring, since thats the only thing allocated by >> ice_alloc_mapped_page. >> >> It might be something fixed with the work Maciej did.. but it seems very >> weird that 492a044508ad ("ice: Add support for persistent NAPI config") >> would affect that logic at all.... > > I believe there were/are at least two separate issues. Regarding > commit 492a044508ad (“ice: Add support for persistent NAPI config”): > * On 6.13.y and 6.14.y kernels, this change prevented us from lowering > the driver’s initial, large memory allocation immediately after server > power-up. A few hours (max few days) later, this inevitably led to an > out-of-memory condition. > * Reverting the commit in those series only delayed the OOM, it > allowed the queue size (and thus memory footprint) to shrink on boot > just as it did in 6.12.y but didn’t eliminate the underlying 'leak'. > * In 6.15.y, however, that revert isn’t required (and isn’t even > applicable). The after boot allocation can once again be tuned down > without patching. Still, we observe the same increase in memory use > over time, as shown in the 'allocmap' output. > Thus, commit 492a044508ad led us down a false trail, or at the very > least hastened the inevitable OOM. That seems reasonable. I'm still surprised the specific commit leads to any large increase in memory, since it should only be a few bytes per NAPI. But there may be some related driver-specific issues. Either way, we clearly need to isolate how we're leaking memory in the hot path. 
I think it might be related to the fixes from Maciej, which are pretty recent and so might not be in 6.13 or 6.14. [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 236 bytes --] ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [Intel-wired-lan] Increased memory usage on NUMA nodes with ICE driver after upgrade to 6.13.y (regression in commit 492a044508ad) 2025-06-30 18:59 ` Jacob Keller @ 2025-06-30 20:01 ` Jaroslav Pulchart 2025-06-30 20:42 ` Jacob Keller 2025-06-30 21:56 ` Jacob Keller 0 siblings, 2 replies; 46+ messages in thread From: Jaroslav Pulchart @ 2025-06-30 20:01 UTC (permalink / raw) To: Jacob Keller Cc: Maciej Fijalkowski, Jakub Kicinski, Przemek Kitszel, intel-wired-lan@lists.osuosl.org, Damato, Joe, netdev@vger.kernel.org, Nguyen, Anthony L, Michal Swiatkowski, Czapnik, Lukasz, Dumazet, Eric, Zaki, Ahmed, Martin Karsten, Igor Raits, Daniel Secik, Zdenek Pesek > > > > On 6/30/2025 10:24 AM, Jaroslav Pulchart wrote: > >> > >> > >> > >> On 6/30/2025 12:35 AM, Jaroslav Pulchart wrote: > >>>> > >>>>> > >>>>> On Wed, 25 Jun 2025 19:51:08 +0200 Jaroslav Pulchart wrote: > >>>>>> Great, please send me a link to the related patch set. I can apply them in > >>>>>> our kernel build and try them ASAP! > >>>>> > >>>>> Sorry if I'm repeating the question - have you tried > >>>>> CONFIG_MEM_ALLOC_PROFILING? Reportedly the overhead in recent kernels > >>>>> is low enough to use it for production workloads. > >>>> > >>>> I try it now, the fresh booted server: > >>>> > >>>> # sort -g /proc/allocinfo| tail -n 15 > >>>> 45409728 236509 fs/dcache.c:1681 func:__d_alloc > >>>> 71041024 17344 mm/percpu-vm.c:95 func:pcpu_alloc_pages > >>>> 71524352 11140 kernel/dma/direct.c:141 func:__dma_direct_alloc_pages > >>>> 85098496 4486 mm/slub.c:2452 func:alloc_slab_page > >>>> 115470992 101647 fs/ext4/super.c:1388 [ext4] func:ext4_alloc_inode > >>>> 134479872 32832 kernel/events/ring_buffer.c:811 func:perf_mmap_alloc_page > >>>> 141426688 34528 mm/filemap.c:1978 func:__filemap_get_folio > >>>> 191594496 46776 mm/memory.c:1056 func:folio_prealloc > >>>> 360710144 172 mm/khugepaged.c:1084 func:alloc_charge_folio > >>>> 444076032 33790 mm/slub.c:2450 func:alloc_slab_page > >>>> 530579456 129536 mm/page_ext.c:271 func:alloc_page_ext > >>>> 975175680 465 mm/huge_memory.c:1165 func:vma_alloc_anon_folio_pmd > >>>> 1022427136 249616 mm/memory.c:1054 func:folio_prealloc > >>>> 1105125376 139252 drivers/net/ethernet/intel/ice/ice_txrx.c:681 > >>>> [ice] func:ice_alloc_mapped_page > >>>> 1621598208 395848 mm/readahead.c:186 func:ractl_alloc_folio > >>>> > >>> > >>> The "drivers/net/ethernet/intel/ice/ice_txrx.c:681 [ice] > >>> func:ice_alloc_mapped_page" is just growing... 
> >>> > >>> # uptime ; sort -g /proc/allocinfo| tail -n 15 > >>> 09:33:58 up 4 days, 6 min, 1 user, load average: 6.65, 8.18, 9.81 > >>> > >>> # sort -g /proc/allocinfo| tail -n 15 > >>> 85216896 443838 fs/dcache.c:1681 func:__d_alloc > >>> 106156032 25917 mm/shmem.c:1854 func:shmem_alloc_folio > >>> 116850096 102861 fs/ext4/super.c:1388 [ext4] func:ext4_alloc_inode > >>> 134479872 32832 kernel/events/ring_buffer.c:811 func:perf_mmap_alloc_page > >>> 143556608 6894 mm/slub.c:2452 func:alloc_slab_page > >>> 186793984 45604 mm/memory.c:1056 func:folio_prealloc > >>> 362807296 88576 mm/percpu-vm.c:95 func:pcpu_alloc_pages > >>> 530579456 129536 mm/page_ext.c:271 func:alloc_page_ext > >>> 598237184 51309 mm/slub.c:2450 func:alloc_slab_page > >>> 838860800 400 mm/huge_memory.c:1165 func:vma_alloc_anon_folio_pmd > >>> 929083392 226827 mm/filemap.c:1978 func:__filemap_get_folio > >>> 1034657792 252602 mm/memory.c:1054 func:folio_prealloc > >>> 1262485504 602 mm/khugepaged.c:1084 func:alloc_charge_folio > >>> 1335377920 325970 mm/readahead.c:186 func:ractl_alloc_folio > >>> 2544877568 315003 drivers/net/ethernet/intel/ice/ice_txrx.c:681 > >>> [ice] func:ice_alloc_mapped_page > >>> > >> ice_alloc_mapped_page is the function used to allocate the pages for the > >> Rx ring buffers. > >> > >> There were a number of fixes for the hot path from Maciej which might be > >> related. Although those fixes were primarily for XDP they do impact the > >> regular hot path as well. > >> > >> These were fixes on top of work he did which landed in v6.13, so it > >> seems plausible they might be related. In particular one which mentions > >> a missing buffer put: > >> > >> 743bbd93cf29 ("ice: put Rx buffers after being done with current frame") > >> > >> It says the following: > >>> While at it, address an error path of ice_add_xdp_frag() - we were > >>> missing buffer putting from day 1 there. > >>> > >> > >> It seems to me the issue must be somehow related to the buffer cleanup > >> logic for the Rx ring, since thats the only thing allocated by > >> ice_alloc_mapped_page. > >> > >> It might be something fixed with the work Maciej did.. but it seems very > >> weird that 492a044508ad ("ice: Add support for persistent NAPI config") > >> would affect that logic at all.... > > > > I believe there were/are at least two separate issues. Regarding > > commit 492a044508ad (“ice: Add support for persistent NAPI config”): > > * On 6.13.y and 6.14.y kernels, this change prevented us from lowering > > the driver’s initial, large memory allocation immediately after server > > power-up. A few hours (max few days) later, this inevitably led to an > > out-of-memory condition. > > * Reverting the commit in those series only delayed the OOM, it > > allowed the queue size (and thus memory footprint) to shrink on boot > > just as it did in 6.12.y but didn’t eliminate the underlying 'leak'. > > * In 6.15.y, however, that revert isn’t required (and isn’t even > > applicable). The after boot allocation can once again be tuned down > > without patching. Still, we observe the same increase in memory use > > over time, as shown in the 'allocmap' output. > > Thus, commit 492a044508ad led us down a false trail, or at the very > > least hastened the inevitable OOM. > > That seems reasonable. I'm still surprised the specific commit leads to > any large increase in memory, since it should only be a few bytes per > NAPI. But there may be some related driver-specific issues. 
Actually, the large base allocation has existed for quite some time; the mentioned commit didn’t suddenly grow our memory usage, it only prevented us from shrinking it via "ethtool -L <iface> combined <small-number>" after boot. In other words, we’re still stuck with the same big allocation, we just can’t tune it down (until reverting the commit). > > Either way, we clearly need to isolate how we're leaking memory in the > hot path. I think it might be related to the fixes from Maciej which are > pretty recent so might not be in 6.13 or 6.14 I’m fine with the fix targeting mainline only (now 6.15.y); 6.13.y and 6.14.y are already EOL. Could you please tell me which 6.15.y stable release first incorporates that patch? Is it included in the current 6.15.5, or will it arrive in a later point release? ^ permalink raw reply [flat|nested] 46+ messages in thread
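As an aside: the release-containment question can be answered locally with git against a tree that has the stable tags fetched. The commit id below is just the fix already named above, and the same command works for any other sha; the first v6.15.* tag printed (if any) is the first point release carrying it.

    git tag --contains 743bbd93cf29 --sort=version:refname | grep '^v6\.15' | head -n 3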
* Re: [Intel-wired-lan] Increased memory usage on NUMA nodes with ICE driver after upgrade to 6.13.y (regression in commit 492a044508ad) 2025-06-30 20:01 ` Jaroslav Pulchart @ 2025-06-30 20:42 ` Jacob Keller 2025-06-30 21:56 ` Jacob Keller 1 sibling, 0 replies; 46+ messages in thread From: Jacob Keller @ 2025-06-30 20:42 UTC (permalink / raw) To: Jaroslav Pulchart Cc: Maciej Fijalkowski, Jakub Kicinski, Przemek Kitszel, intel-wired-lan@lists.osuosl.org, Damato, Joe, netdev@vger.kernel.org, Nguyen, Anthony L, Michal Swiatkowski, Czapnik, Lukasz, Dumazet, Eric, Zaki, Ahmed, Martin Karsten, Igor Raits, Daniel Secik, Zdenek Pesek [-- Attachment #1.1: Type: text/plain, Size: 6777 bytes --] On 6/30/2025 1:01 PM, Jaroslav Pulchart wrote: >> >> >> >> On 6/30/2025 10:24 AM, Jaroslav Pulchart wrote: >>>> >>>> >>>> >>>> On 6/30/2025 12:35 AM, Jaroslav Pulchart wrote: >>>>>> >>>>>>> >>>>>>> On Wed, 25 Jun 2025 19:51:08 +0200 Jaroslav Pulchart wrote: >>>>>>>> Great, please send me a link to the related patch set. I can apply them in >>>>>>>> our kernel build and try them ASAP! >>>>>>> >>>>>>> Sorry if I'm repeating the question - have you tried >>>>>>> CONFIG_MEM_ALLOC_PROFILING? Reportedly the overhead in recent kernels >>>>>>> is low enough to use it for production workloads. >>>>>> >>>>>> I try it now, the fresh booted server: >>>>>> >>>>>> # sort -g /proc/allocinfo| tail -n 15 >>>>>> 45409728 236509 fs/dcache.c:1681 func:__d_alloc >>>>>> 71041024 17344 mm/percpu-vm.c:95 func:pcpu_alloc_pages >>>>>> 71524352 11140 kernel/dma/direct.c:141 func:__dma_direct_alloc_pages >>>>>> 85098496 4486 mm/slub.c:2452 func:alloc_slab_page >>>>>> 115470992 101647 fs/ext4/super.c:1388 [ext4] func:ext4_alloc_inode >>>>>> 134479872 32832 kernel/events/ring_buffer.c:811 func:perf_mmap_alloc_page >>>>>> 141426688 34528 mm/filemap.c:1978 func:__filemap_get_folio >>>>>> 191594496 46776 mm/memory.c:1056 func:folio_prealloc >>>>>> 360710144 172 mm/khugepaged.c:1084 func:alloc_charge_folio >>>>>> 444076032 33790 mm/slub.c:2450 func:alloc_slab_page >>>>>> 530579456 129536 mm/page_ext.c:271 func:alloc_page_ext >>>>>> 975175680 465 mm/huge_memory.c:1165 func:vma_alloc_anon_folio_pmd >>>>>> 1022427136 249616 mm/memory.c:1054 func:folio_prealloc >>>>>> 1105125376 139252 drivers/net/ethernet/intel/ice/ice_txrx.c:681 >>>>>> [ice] func:ice_alloc_mapped_page >>>>>> 1621598208 395848 mm/readahead.c:186 func:ractl_alloc_folio >>>>>> >>>>> >>>>> The "drivers/net/ethernet/intel/ice/ice_txrx.c:681 [ice] >>>>> func:ice_alloc_mapped_page" is just growing... 
>>>>> >>>>> # uptime ; sort -g /proc/allocinfo| tail -n 15 >>>>> 09:33:58 up 4 days, 6 min, 1 user, load average: 6.65, 8.18, 9.81 >>>>> >>>>> # sort -g /proc/allocinfo| tail -n 15 >>>>> 85216896 443838 fs/dcache.c:1681 func:__d_alloc >>>>> 106156032 25917 mm/shmem.c:1854 func:shmem_alloc_folio >>>>> 116850096 102861 fs/ext4/super.c:1388 [ext4] func:ext4_alloc_inode >>>>> 134479872 32832 kernel/events/ring_buffer.c:811 func:perf_mmap_alloc_page >>>>> 143556608 6894 mm/slub.c:2452 func:alloc_slab_page >>>>> 186793984 45604 mm/memory.c:1056 func:folio_prealloc >>>>> 362807296 88576 mm/percpu-vm.c:95 func:pcpu_alloc_pages >>>>> 530579456 129536 mm/page_ext.c:271 func:alloc_page_ext >>>>> 598237184 51309 mm/slub.c:2450 func:alloc_slab_page >>>>> 838860800 400 mm/huge_memory.c:1165 func:vma_alloc_anon_folio_pmd >>>>> 929083392 226827 mm/filemap.c:1978 func:__filemap_get_folio >>>>> 1034657792 252602 mm/memory.c:1054 func:folio_prealloc >>>>> 1262485504 602 mm/khugepaged.c:1084 func:alloc_charge_folio >>>>> 1335377920 325970 mm/readahead.c:186 func:ractl_alloc_folio >>>>> 2544877568 315003 drivers/net/ethernet/intel/ice/ice_txrx.c:681 >>>>> [ice] func:ice_alloc_mapped_page >>>>> >>>> ice_alloc_mapped_page is the function used to allocate the pages for the >>>> Rx ring buffers. >>>> >>>> There were a number of fixes for the hot path from Maciej which might be >>>> related. Although those fixes were primarily for XDP they do impact the >>>> regular hot path as well. >>>> >>>> These were fixes on top of work he did which landed in v6.13, so it >>>> seems plausible they might be related. In particular one which mentions >>>> a missing buffer put: >>>> >>>> 743bbd93cf29 ("ice: put Rx buffers after being done with current frame") >>>> >>>> It says the following: >>>>> While at it, address an error path of ice_add_xdp_frag() - we were >>>>> missing buffer putting from day 1 there. >>>>> >>>> >>>> It seems to me the issue must be somehow related to the buffer cleanup >>>> logic for the Rx ring, since thats the only thing allocated by >>>> ice_alloc_mapped_page. >>>> >>>> It might be something fixed with the work Maciej did.. but it seems very >>>> weird that 492a044508ad ("ice: Add support for persistent NAPI config") >>>> would affect that logic at all.... >>> >>> I believe there were/are at least two separate issues. Regarding >>> commit 492a044508ad (“ice: Add support for persistent NAPI config”): >>> * On 6.13.y and 6.14.y kernels, this change prevented us from lowering >>> the driver’s initial, large memory allocation immediately after server >>> power-up. A few hours (max few days) later, this inevitably led to an >>> out-of-memory condition. >>> * Reverting the commit in those series only delayed the OOM, it >>> allowed the queue size (and thus memory footprint) to shrink on boot >>> just as it did in 6.12.y but didn’t eliminate the underlying 'leak'. >>> * In 6.15.y, however, that revert isn’t required (and isn’t even >>> applicable). The after boot allocation can once again be tuned down >>> without patching. Still, we observe the same increase in memory use >>> over time, as shown in the 'allocmap' output. >>> Thus, commit 492a044508ad led us down a false trail, or at the very >>> least hastened the inevitable OOM. >> >> That seems reasonable. I'm still surprised the specific commit leads to >> any large increase in memory, since it should only be a few bytes per >> NAPI. But there may be some related driver-specific issues. 
> > Actually, the large base allocation has existed for quite some time, > the mentioned commit didn’t suddenly grow our memory usage, it only > prevented us from shrinking it via "ethtool -L <iface> combined > <small-number>" > after boot. In other words, we’re still stuck with the same big > allocation, we just can’t tune it down (till reverting the commit) > Yes. My point is that I still don't understand the mechanism by which that change *prevents* ethtool -L from working as you describe. >> >> Either way, we clearly need to isolate how we're leaking memory in the >> hot path. I think it might be related to the fixes from Maciej which are >> pretty recent so might not be in 6.13 or 6.14 > > I’m fine with the fix for the mainline (now 6.15.y), the 6.13.y and > 6.14.y are already EOL. Could you please tell me which 6.15.y stable > release first incorporates that patch? Is it included in current > 6.15.5, or will it arrive in a later point release? I'm not certain if this fix actually is resolving your issue, but I will figure out which stable kernels have it shortly. [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 236 bytes --] ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [Intel-wired-lan] Increased memory usage on NUMA nodes with ICE driver after upgrade to 6.13.y (regression in commit 492a044508ad) 2025-06-30 20:01 ` Jaroslav Pulchart 2025-06-30 20:42 ` Jacob Keller @ 2025-06-30 21:56 ` Jacob Keller 2025-06-30 23:16 ` Jacob Keller 1 sibling, 1 reply; 46+ messages in thread From: Jacob Keller @ 2025-06-30 21:56 UTC (permalink / raw) To: Jaroslav Pulchart Cc: Maciej Fijalkowski, Jakub Kicinski, Przemek Kitszel, intel-wired-lan@lists.osuosl.org, Damato, Joe, netdev@vger.kernel.org, Nguyen, Anthony L, Michal Swiatkowski, Czapnik, Lukasz, Dumazet, Eric, Zaki, Ahmed, Martin Karsten, Igor Raits, Daniel Secik, Zdenek Pesek [-- Attachment #1.1: Type: text/plain, Size: 6966 bytes --] On 6/30/2025 1:01 PM, Jaroslav Pulchart wrote: >> >> >> >> On 6/30/2025 10:24 AM, Jaroslav Pulchart wrote: >>>> >>>> >>>> >>>> On 6/30/2025 12:35 AM, Jaroslav Pulchart wrote: >>>>>> >>>>>>> >>>>>>> On Wed, 25 Jun 2025 19:51:08 +0200 Jaroslav Pulchart wrote: >>>>>>>> Great, please send me a link to the related patch set. I can apply them in >>>>>>>> our kernel build and try them ASAP! >>>>>>> >>>>>>> Sorry if I'm repeating the question - have you tried >>>>>>> CONFIG_MEM_ALLOC_PROFILING? Reportedly the overhead in recent kernels >>>>>>> is low enough to use it for production workloads. >>>>>> >>>>>> I try it now, the fresh booted server: >>>>>> >>>>>> # sort -g /proc/allocinfo| tail -n 15 >>>>>> 45409728 236509 fs/dcache.c:1681 func:__d_alloc >>>>>> 71041024 17344 mm/percpu-vm.c:95 func:pcpu_alloc_pages >>>>>> 71524352 11140 kernel/dma/direct.c:141 func:__dma_direct_alloc_pages >>>>>> 85098496 4486 mm/slub.c:2452 func:alloc_slab_page >>>>>> 115470992 101647 fs/ext4/super.c:1388 [ext4] func:ext4_alloc_inode >>>>>> 134479872 32832 kernel/events/ring_buffer.c:811 func:perf_mmap_alloc_page >>>>>> 141426688 34528 mm/filemap.c:1978 func:__filemap_get_folio >>>>>> 191594496 46776 mm/memory.c:1056 func:folio_prealloc >>>>>> 360710144 172 mm/khugepaged.c:1084 func:alloc_charge_folio >>>>>> 444076032 33790 mm/slub.c:2450 func:alloc_slab_page >>>>>> 530579456 129536 mm/page_ext.c:271 func:alloc_page_ext >>>>>> 975175680 465 mm/huge_memory.c:1165 func:vma_alloc_anon_folio_pmd >>>>>> 1022427136 249616 mm/memory.c:1054 func:folio_prealloc >>>>>> 1105125376 139252 drivers/net/ethernet/intel/ice/ice_txrx.c:681 >>>>>> [ice] func:ice_alloc_mapped_page >>>>>> 1621598208 395848 mm/readahead.c:186 func:ractl_alloc_folio >>>>>> >>>>> >>>>> The "drivers/net/ethernet/intel/ice/ice_txrx.c:681 [ice] >>>>> func:ice_alloc_mapped_page" is just growing... 
>>>>> >>>>> # uptime ; sort -g /proc/allocinfo| tail -n 15 >>>>> 09:33:58 up 4 days, 6 min, 1 user, load average: 6.65, 8.18, 9.81 >>>>> >>>>> # sort -g /proc/allocinfo| tail -n 15 >>>>> 85216896 443838 fs/dcache.c:1681 func:__d_alloc >>>>> 106156032 25917 mm/shmem.c:1854 func:shmem_alloc_folio >>>>> 116850096 102861 fs/ext4/super.c:1388 [ext4] func:ext4_alloc_inode >>>>> 134479872 32832 kernel/events/ring_buffer.c:811 func:perf_mmap_alloc_page >>>>> 143556608 6894 mm/slub.c:2452 func:alloc_slab_page >>>>> 186793984 45604 mm/memory.c:1056 func:folio_prealloc >>>>> 362807296 88576 mm/percpu-vm.c:95 func:pcpu_alloc_pages >>>>> 530579456 129536 mm/page_ext.c:271 func:alloc_page_ext >>>>> 598237184 51309 mm/slub.c:2450 func:alloc_slab_page >>>>> 838860800 400 mm/huge_memory.c:1165 func:vma_alloc_anon_folio_pmd >>>>> 929083392 226827 mm/filemap.c:1978 func:__filemap_get_folio >>>>> 1034657792 252602 mm/memory.c:1054 func:folio_prealloc >>>>> 1262485504 602 mm/khugepaged.c:1084 func:alloc_charge_folio >>>>> 1335377920 325970 mm/readahead.c:186 func:ractl_alloc_folio >>>>> 2544877568 315003 drivers/net/ethernet/intel/ice/ice_txrx.c:681 >>>>> [ice] func:ice_alloc_mapped_page >>>>> >>>> ice_alloc_mapped_page is the function used to allocate the pages for the >>>> Rx ring buffers. >>>> >>>> There were a number of fixes for the hot path from Maciej which might be >>>> related. Although those fixes were primarily for XDP they do impact the >>>> regular hot path as well. >>>> >>>> These were fixes on top of work he did which landed in v6.13, so it >>>> seems plausible they might be related. In particular one which mentions >>>> a missing buffer put: >>>> >>>> 743bbd93cf29 ("ice: put Rx buffers after being done with current frame") >>>> >>>> It says the following: >>>>> While at it, address an error path of ice_add_xdp_frag() - we were >>>>> missing buffer putting from day 1 there. >>>>> >>>> >>>> It seems to me the issue must be somehow related to the buffer cleanup >>>> logic for the Rx ring, since thats the only thing allocated by >>>> ice_alloc_mapped_page. >>>> >>>> It might be something fixed with the work Maciej did.. but it seems very >>>> weird that 492a044508ad ("ice: Add support for persistent NAPI config") >>>> would affect that logic at all.... >>> >>> I believe there were/are at least two separate issues. Regarding >>> commit 492a044508ad (“ice: Add support for persistent NAPI config”): >>> * On 6.13.y and 6.14.y kernels, this change prevented us from lowering >>> the driver’s initial, large memory allocation immediately after server >>> power-up. A few hours (max few days) later, this inevitably led to an >>> out-of-memory condition. >>> * Reverting the commit in those series only delayed the OOM, it >>> allowed the queue size (and thus memory footprint) to shrink on boot >>> just as it did in 6.12.y but didn’t eliminate the underlying 'leak'. >>> * In 6.15.y, however, that revert isn’t required (and isn’t even >>> applicable). The after boot allocation can once again be tuned down >>> without patching. Still, we observe the same increase in memory use >>> over time, as shown in the 'allocmap' output. >>> Thus, commit 492a044508ad led us down a false trail, or at the very >>> least hastened the inevitable OOM. >> >> That seems reasonable. I'm still surprised the specific commit leads to >> any large increase in memory, since it should only be a few bytes per >> NAPI. But there may be some related driver-specific issues. 
> > Actually, the large base allocation has existed for quite some time, > the mentioned commit didn’t suddenly grow our memory usage, it only > prevented us from shrinking it via "ethtool -L <iface> combined > <small-number>" > after boot. In other words, we’re still stuck with the same big > allocation, we just can’t tune it down (till reverting the commit) > >> >> Either way, we clearly need to isolate how we're leaking memory in the >> hot path. I think it might be related to the fixes from Maciej which are >> pretty recent so might not be in 6.13 or 6.14 > > I’m fine with the fix for the mainline (now 6.15.y), the 6.13.y and > 6.14.y are already EOL. Could you please tell me which 6.15.y stable > release first incorporates that patch? Is it included in current > 6.15.5, or will it arrive in a later point release? Unfortunately it looks like the fix I mentioned has landed in 6.14, so its not a fix for your issue (since you mentioned 6.14 has failed testing in your system) $ git describe --first-parent --contains --match=v* --exclude=*rc* 743bbd93cf29f653fae0e1416a31f03231689911 v6.14~251^2~15^2~2 I don't see any other relevant changes since v6.14. I can try to see if I see similar issues with CONFIG_MEM_ALLOC_PROFILING on some test systems here. [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 236 bytes --] ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [Intel-wired-lan] Increased memory usage on NUMA nodes with ICE driver after upgrade to 6.13.y (regression in commit 492a044508ad) 2025-06-30 21:56 ` Jacob Keller @ 2025-06-30 23:16 ` Jacob Keller 2025-07-01 6:48 ` Jaroslav Pulchart 0 siblings, 1 reply; 46+ messages in thread From: Jacob Keller @ 2025-06-30 23:16 UTC (permalink / raw) To: Jaroslav Pulchart Cc: Maciej Fijalkowski, Jakub Kicinski, Przemek Kitszel, intel-wired-lan@lists.osuosl.org, Damato, Joe, netdev@vger.kernel.org, Nguyen, Anthony L, Michal Swiatkowski, Czapnik, Lukasz, Dumazet, Eric, Zaki, Ahmed, Martin Karsten, Igor Raits, Daniel Secik, Zdenek Pesek [-- Attachment #1.1: Type: text/plain, Size: 2175 bytes --] On 6/30/2025 2:56 PM, Jacob Keller wrote: > Unfortunately it looks like the fix I mentioned has landed in 6.14, so > its not a fix for your issue (since you mentioned 6.14 has failed > testing in your system) > > $ git describe --first-parent --contains --match=v* --exclude=*rc* > 743bbd93cf29f653fae0e1416a31f03231689911 > v6.14~251^2~15^2~2 > > I don't see any other relevant changes since v6.14. I can try to see if > I see similar issues with CONFIG_MEM_ALLOC_PROFILING on some test > systems here. On my system I see this at boot after loading the ice module from $ grep -F "/ice/" /proc/allocinfo | sort -g | tail | numfmt --to=iec> 26K 230 drivers/net/ethernet/intel/ice/ice_irq.c:84 [ice] func:ice_get_irq_res > 48K 2 drivers/net/ethernet/intel/ice/ice_arfs.c:565 [ice] func:ice_init_arfs > 57K 226 drivers/net/ethernet/intel/ice/ice_lib.c:397 [ice] func:ice_vsi_alloc_ring_stats > 57K 226 drivers/net/ethernet/intel/ice/ice_lib.c:416 [ice] func:ice_vsi_alloc_ring_stats > 85K 226 drivers/net/ethernet/intel/ice/ice_lib.c:1398 [ice] func:ice_vsi_alloc_rings > 339K 226 drivers/net/ethernet/intel/ice/ice_lib.c:1422 [ice] func:ice_vsi_alloc_rings > 678K 226 drivers/net/ethernet/intel/ice/ice_base.c:109 [ice] func:ice_vsi_alloc_q_vector > 1.1M 257 drivers/net/ethernet/intel/ice/ice_fwlog.c:40 [ice] func:ice_fwlog_alloc_ring_buffs > 7.2M 114 drivers/net/ethernet/intel/ice/ice_txrx.c:493 [ice] func:ice_setup_rx_ring > 896M 229264 drivers/net/ethernet/intel/ice/ice_txrx.c:680 [ice] func:ice_alloc_mapped_page Its about 1GB for the mapped pages. I don't see any increase moment to moment. I've started an iperf session to simulate some traffic, and I'll leave this running to see if anything changes overnight. Is there anything else that you can share about the traffic setup or otherwise that I could look into? Your system seems to use ~2.5 x the buffer size as mine, but that might just be a smaller number of CPUs. Hopefully I'll get some more results overnight. Thanks, Jake [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 236 bytes --] ^ permalink raw reply [flat|nested] 46+ messages in thread
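As an aside: since the growth is slow, the simplest thing to leave running next to the workload is a periodic sample of that one allocinfo line (this assumes CONFIG_MEM_ALLOC_PROFILING as above). A flat byte count over a day or two would look like the idle/iperf case; steady growth would look like the production hosts in this thread.

    while sleep 3600; do
        printf '%s ' "$(date -Is)"
        grep 'func:ice_alloc_mapped_page' /proc/allocinfo
    done | tee -a ice-rx-pages.log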
* Re: [Intel-wired-lan] Increased memory usage on NUMA nodes with ICE driver after upgrade to 6.13.y (regression in commit 492a044508ad) 2025-06-30 23:16 ` Jacob Keller @ 2025-07-01 6:48 ` Jaroslav Pulchart 2025-07-01 20:48 ` Jacob Keller 0 siblings, 1 reply; 46+ messages in thread From: Jaroslav Pulchart @ 2025-07-01 6:48 UTC (permalink / raw) To: Jacob Keller Cc: Maciej Fijalkowski, Jakub Kicinski, Przemek Kitszel, intel-wired-lan@lists.osuosl.org, Damato, Joe, netdev@vger.kernel.org, Nguyen, Anthony L, Michal Swiatkowski, Czapnik, Lukasz, Dumazet, Eric, Zaki, Ahmed, Martin Karsten, Igor Raits, Daniel Secik, Zdenek Pesek > On 6/30/2025 2:56 PM, Jacob Keller wrote: > > Unfortunately it looks like the fix I mentioned has landed in 6.14, so > > its not a fix for your issue (since you mentioned 6.14 has failed > > testing in your system) > > > > $ git describe --first-parent --contains --match=v* --exclude=*rc* > > 743bbd93cf29f653fae0e1416a31f03231689911 > > v6.14~251^2~15^2~2 > > > > I don't see any other relevant changes since v6.14. I can try to see if > > I see similar issues with CONFIG_MEM_ALLOC_PROFILING on some test > > systems here. > > On my system I see this at boot after loading the ice module from > > $ grep -F "/ice/" /proc/allocinfo | sort -g | tail | numfmt --to=iec> > 26K 230 drivers/net/ethernet/intel/ice/ice_irq.c:84 [ice] > func:ice_get_irq_res > > 48K 2 drivers/net/ethernet/intel/ice/ice_arfs.c:565 [ice] func:ice_init_arfs > > 57K 226 drivers/net/ethernet/intel/ice/ice_lib.c:397 [ice] func:ice_vsi_alloc_ring_stats > > 57K 226 drivers/net/ethernet/intel/ice/ice_lib.c:416 [ice] func:ice_vsi_alloc_ring_stats > > 85K 226 drivers/net/ethernet/intel/ice/ice_lib.c:1398 [ice] func:ice_vsi_alloc_rings > > 339K 226 drivers/net/ethernet/intel/ice/ice_lib.c:1422 [ice] func:ice_vsi_alloc_rings > > 678K 226 drivers/net/ethernet/intel/ice/ice_base.c:109 [ice] func:ice_vsi_alloc_q_vector > > 1.1M 257 drivers/net/ethernet/intel/ice/ice_fwlog.c:40 [ice] func:ice_fwlog_alloc_ring_buffs > > 7.2M 114 drivers/net/ethernet/intel/ice/ice_txrx.c:493 [ice] func:ice_setup_rx_ring > > 896M 229264 drivers/net/ethernet/intel/ice/ice_txrx.c:680 [ice] func:ice_alloc_mapped_page > > Its about 1GB for the mapped pages. I don't see any increase moment to > moment. I've started an iperf session to simulate some traffic, and I'll > leave this running to see if anything changes overnight. > > Is there anything else that you can share about the traffic setup or > otherwise that I could look into? Your system seems to use ~2.5 x the > buffer size as mine, but that might just be a smaller number of CPUs. > > Hopefully I'll get some more results overnight. The traffic is random production workloads from VMs, using standard Linux or OVS bridges. There is no specific pattern to it. I haven’t had any luck reproducing (or was not patient enough) this with iperf3 myself. The two active (UP) interfaces are in an LACP bonding setup. 
Here are our ethtool settings for the two member ports (em1 and p3p1) # ethtool -l em1 Channel parameters for em1: Pre-set maximums: RX: 64 TX: 64 Other: 1 Combined: 64 Current hardware settings: RX: 0 TX: 0 Other: 1 Combined: 8 # ethtool -g em1 Ring parameters for em1: Pre-set maximums: RX: 8160 RX Mini: n/a RX Jumbo: n/a TX: 8160 TX push buff len: n/a Current hardware settings: RX: 8160 RX Mini: n/a RX Jumbo: n/a TX: 8160 RX Buf Len: n/a CQE Size: n/a TX Push: off RX Push: off TX push buff len: n/a TCP data split: n/a # ethtool -c em1 Coalesce parameters for em1: Adaptive RX: off TX: off stats-block-usecs: n/a sample-interval: n/a pkt-rate-low: n/a pkt-rate-high: n/a rx-usecs: 12 rx-frames: n/a rx-usecs-irq: n/a rx-frames-irq: n/a tx-usecs: 28 tx-frames: n/a tx-usecs-irq: n/a tx-frames-irq: n/a rx-usecs-low: n/a rx-frame-low: n/a tx-usecs-low: n/a tx-frame-low: n/a rx-usecs-high: 0 rx-frame-high: n/a tx-usecs-high: n/a tx-frame-high: n/a CQE mode RX: n/a TX: n/a tx-aggr-max-bytes: n/a tx-aggr-max-frames: n/a tx-aggr-time-usecs: n/a # ethtool -k em1 Features for em1: rx-checksumming: on tx-checksumming: on tx-checksum-ipv4: on tx-checksum-ip-generic: off [fixed] tx-checksum-ipv6: on tx-checksum-fcoe-crc: off [fixed] tx-checksum-sctp: on scatter-gather: on tx-scatter-gather: on tx-scatter-gather-fraglist: off [fixed] tcp-segmentation-offload: on tx-tcp-segmentation: on tx-tcp-ecn-segmentation: on tx-tcp-mangleid-segmentation: off tx-tcp6-segmentation: on tx-tcp-accecn-segmentation: off [fixed] generic-segmentation-offload: on generic-receive-offload: on large-receive-offload: off [fixed] rx-vlan-offload: on tx-vlan-offload: on ntuple-filters: on receive-hashing: on highdma: on rx-vlan-filter: on vlan-challenged: off [fixed] tx-gso-robust: off [fixed] tx-fcoe-segmentation: off [fixed] tx-gre-segmentation: on tx-gre-csum-segmentation: on tx-ipxip4-segmentation: on tx-ipxip6-segmentation: on tx-udp_tnl-segmentation: on tx-udp_tnl-csum-segmentation: on tx-gso-partial: on tx-tunnel-remcsum-segmentation: off [fixed] tx-sctp-segmentation: off [fixed] tx-esp-segmentation: off [fixed] tx-udp-segmentation: on tx-gso-list: off [fixed] tx-nocache-copy: off loopback: off rx-fcs: off rx-all: off [fixed] tx-vlan-stag-hw-insert: off rx-vlan-stag-hw-parse: off rx-vlan-stag-filter: on l2-fwd-offload: off [fixed] hw-tc-offload: off esp-hw-offload: off [fixed] esp-tx-csum-hw-offload: off [fixed] rx-udp_tunnel-port-offload: on tls-hw-tx-offload: off [fixed] tls-hw-rx-offload: off [fixed] rx-gro-hw: off [fixed] tls-hw-record: off [fixed] rx-gro-list: off macsec-hw-offload: off [fixed] rx-udp-gro-forwarding: off hsr-tag-ins-offload: off [fixed] hsr-tag-rm-offload: off [fixed] hsr-fwd-offload: off [fixed] hsr-dup-offload: off [fixed] # ethtool -i em1 driver: ice version: 6.15.3-3.gdc.el9.x86_64 firmware-version: 4.51 0x8001e501 23.0.8 expansion-rom-version: bus-info: 0000:63:00.0 supports-statistics: yes supports-test: yes supports-eeprom-access: yes supports-register-dump: yes supports-priv-flags: yes # ethtool em1 Settings for em1: Supported ports: [ FIBRE ] Supported link modes: 1000baseT/Full 25000baseCR/Full 25000baseSR/Full 1000baseX/Full 10000baseCR/Full 10000baseSR/Full 10000baseLR/Full Supported pause frame use: Symmetric Supports auto-negotiation: Yes Supported FEC modes: None RS BASER Advertised link modes: 25000baseCR/Full 10000baseCR/Full Advertised pause frame use: No Advertised auto-negotiation: Yes Advertised FEC modes: None RS BASER Speed: 25000Mb/s Duplex: Full Auto-negotiation: off Port: Direct 
Attach Copper PHYAD: 0 Transceiver: internal Supports Wake-on: g Wake-on: d Current message level: 0x00000007 (7) drv probe link Link detected: yes > > Thanks, > Jake ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [Intel-wired-lan] Increased memory usage on NUMA nodes with ICE driver after upgrade to 6.13.y (regression in commit 492a044508ad) 2025-07-01 6:48 ` Jaroslav Pulchart @ 2025-07-01 20:48 ` Jacob Keller 2025-07-02 9:48 ` Jaroslav Pulchart 0 siblings, 1 reply; 46+ messages in thread From: Jacob Keller @ 2025-07-01 20:48 UTC (permalink / raw) To: Jaroslav Pulchart Cc: Maciej Fijalkowski, Jakub Kicinski, Przemek Kitszel, intel-wired-lan@lists.osuosl.org, Damato, Joe, netdev@vger.kernel.org, Nguyen, Anthony L, Michal Swiatkowski, Czapnik, Lukasz, Dumazet, Eric, Zaki, Ahmed, Martin Karsten, Igor Raits, Daniel Secik, Zdenek Pesek [-- Attachment #1.1: Type: text/plain, Size: 3099 bytes --] On 6/30/2025 11:48 PM, Jaroslav Pulchart wrote: >> On 6/30/2025 2:56 PM, Jacob Keller wrote: >>> Unfortunately it looks like the fix I mentioned has landed in 6.14, so >>> its not a fix for your issue (since you mentioned 6.14 has failed >>> testing in your system) >>> >>> $ git describe --first-parent --contains --match=v* --exclude=*rc* >>> 743bbd93cf29f653fae0e1416a31f03231689911 >>> v6.14~251^2~15^2~2 >>> >>> I don't see any other relevant changes since v6.14. I can try to see if >>> I see similar issues with CONFIG_MEM_ALLOC_PROFILING on some test >>> systems here. >> >> On my system I see this at boot after loading the ice module from >> >> $ grep -F "/ice/" /proc/allocinfo | sort -g | tail | numfmt --to=iec> >> 26K 230 drivers/net/ethernet/intel/ice/ice_irq.c:84 [ice] >> func:ice_get_irq_res >>> 48K 2 drivers/net/ethernet/intel/ice/ice_arfs.c:565 [ice] func:ice_init_arfs >>> 57K 226 drivers/net/ethernet/intel/ice/ice_lib.c:397 [ice] func:ice_vsi_alloc_ring_stats >>> 57K 226 drivers/net/ethernet/intel/ice/ice_lib.c:416 [ice] func:ice_vsi_alloc_ring_stats >>> 85K 226 drivers/net/ethernet/intel/ice/ice_lib.c:1398 [ice] func:ice_vsi_alloc_rings >>> 339K 226 drivers/net/ethernet/intel/ice/ice_lib.c:1422 [ice] func:ice_vsi_alloc_rings >>> 678K 226 drivers/net/ethernet/intel/ice/ice_base.c:109 [ice] func:ice_vsi_alloc_q_vector >>> 1.1M 257 drivers/net/ethernet/intel/ice/ice_fwlog.c:40 [ice] func:ice_fwlog_alloc_ring_buffs >>> 7.2M 114 drivers/net/ethernet/intel/ice/ice_txrx.c:493 [ice] func:ice_setup_rx_ring >>> 896M 229264 drivers/net/ethernet/intel/ice/ice_txrx.c:680 [ice] func:ice_alloc_mapped_page >> >> Its about 1GB for the mapped pages. I don't see any increase moment to >> moment. I've started an iperf session to simulate some traffic, and I'll >> leave this running to see if anything changes overnight. >> >> Is there anything else that you can share about the traffic setup or >> otherwise that I could look into? Your system seems to use ~2.5 x the >> buffer size as mine, but that might just be a smaller number of CPUs. >> >> Hopefully I'll get some more results overnight. > > The traffic is random production workloads from VMs, using standard > Linux or OVS bridges. There is no specific pattern to it. I haven’t > had any luck reproducing (or was not patient enough) this with iperf3 > myself. The two active (UP) interfaces are in an LACP bonding setup. > Here are our ethtool settings for the two member ports (em1 and p3p1) > I had iperf3 running overnight and the memory usage for ice_alloc_mapped_pages is constant here. Mine was direct connections without bridge or bonding. From your description I assume there's no XDP happening either. I guess the traffic patterns of an iperf session are too regular, or something to do with bridge or bonding.. 
but I also struggle to see how those could play a role in the buffer management in the ice driver... [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 236 bytes --] ^ permalink raw reply [flat|nested] 46+ messages in thread
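As an aside: a single long-lived iperf3 stream is very uniform. One cheap way to get closer to a production mix, without reproducing the whole VM setup, is several parallel bidirectional streams plus a second client doing small writes, so that many queues and different buffer-fill patterns are exercised. Peer, port and duration below are placeholders, and the second client needs its own iperf3 server instance listening on that port.

    iperf3 -c <peer> -P 16 --bidir -t 86400
    iperf3 -c <peer> -p 5202 -P 8 -l 1K -t 86400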
* Re: [Intel-wired-lan] Increased memory usage on NUMA nodes with ICE driver after upgrade to 6.13.y (regression in commit 492a044508ad) 2025-07-01 20:48 ` Jacob Keller @ 2025-07-02 9:48 ` Jaroslav Pulchart 2025-07-02 18:01 ` Jacob Keller 2025-07-02 21:56 ` Jacob Keller 0 siblings, 2 replies; 46+ messages in thread From: Jaroslav Pulchart @ 2025-07-02 9:48 UTC (permalink / raw) To: Jacob Keller Cc: Maciej Fijalkowski, Jakub Kicinski, Przemek Kitszel, intel-wired-lan@lists.osuosl.org, Damato, Joe, netdev@vger.kernel.org, Nguyen, Anthony L, Michal Swiatkowski, Czapnik, Lukasz, Dumazet, Eric, Zaki, Ahmed, Martin Karsten, Igor Raits, Daniel Secik, Zdenek Pesek > > On 6/30/2025 11:48 PM, Jaroslav Pulchart wrote: > >> On 6/30/2025 2:56 PM, Jacob Keller wrote: > >>> Unfortunately it looks like the fix I mentioned has landed in 6.14, so > >>> its not a fix for your issue (since you mentioned 6.14 has failed > >>> testing in your system) > >>> > >>> $ git describe --first-parent --contains --match=v* --exclude=*rc* > >>> 743bbd93cf29f653fae0e1416a31f03231689911 > >>> v6.14~251^2~15^2~2 > >>> > >>> I don't see any other relevant changes since v6.14. I can try to see if > >>> I see similar issues with CONFIG_MEM_ALLOC_PROFILING on some test > >>> systems here. > >> > >> On my system I see this at boot after loading the ice module from > >> > >> $ grep -F "/ice/" /proc/allocinfo | sort -g | tail | numfmt --to=iec> > >> 26K 230 drivers/net/ethernet/intel/ice/ice_irq.c:84 [ice] > >> func:ice_get_irq_res > >>> 48K 2 drivers/net/ethernet/intel/ice/ice_arfs.c:565 [ice] func:ice_init_arfs > >>> 57K 226 drivers/net/ethernet/intel/ice/ice_lib.c:397 [ice] func:ice_vsi_alloc_ring_stats > >>> 57K 226 drivers/net/ethernet/intel/ice/ice_lib.c:416 [ice] func:ice_vsi_alloc_ring_stats > >>> 85K 226 drivers/net/ethernet/intel/ice/ice_lib.c:1398 [ice] func:ice_vsi_alloc_rings > >>> 339K 226 drivers/net/ethernet/intel/ice/ice_lib.c:1422 [ice] func:ice_vsi_alloc_rings > >>> 678K 226 drivers/net/ethernet/intel/ice/ice_base.c:109 [ice] func:ice_vsi_alloc_q_vector > >>> 1.1M 257 drivers/net/ethernet/intel/ice/ice_fwlog.c:40 [ice] func:ice_fwlog_alloc_ring_buffs > >>> 7.2M 114 drivers/net/ethernet/intel/ice/ice_txrx.c:493 [ice] func:ice_setup_rx_ring > >>> 896M 229264 drivers/net/ethernet/intel/ice/ice_txrx.c:680 [ice] func:ice_alloc_mapped_page > >> > >> Its about 1GB for the mapped pages. I don't see any increase moment to > >> moment. I've started an iperf session to simulate some traffic, and I'll > >> leave this running to see if anything changes overnight. > >> > >> Is there anything else that you can share about the traffic setup or > >> otherwise that I could look into? Your system seems to use ~2.5 x the > >> buffer size as mine, but that might just be a smaller number of CPUs. > >> > >> Hopefully I'll get some more results overnight. > > > > The traffic is random production workloads from VMs, using standard > > Linux or OVS bridges. There is no specific pattern to it. I haven’t > > had any luck reproducing (or was not patient enough) this with iperf3 > > myself. The two active (UP) interfaces are in an LACP bonding setup. > > Here are our ethtool settings for the two member ports (em1 and p3p1) > > > > I had iperf3 running overnight and the memory usage for > ice_alloc_mapped_pages is constant here. Mine was direct connections > without bridge or bonding. From your description I assume there's no XDP > happening either. Yes, no XDP in use. 
BTW the allocinfo after 6days uptime: # uptime ; sort -g /proc/allocinfo| tail -n 15 11:46:44 up 6 days, 2:18, 1 user, load average: 9.24, 11.33, 15.07 102489024 533797 fs/dcache.c:1681 func:__d_alloc 106229760 25935 mm/shmem.c:1854 func:shmem_alloc_folio 117118192 103097 fs/ext4/super.c:1388 [ext4] func:ext4_alloc_inode 134479872 32832 kernel/events/ring_buffer.c:811 func:perf_mmap_alloc_page 162783232 7656 mm/slub.c:2452 func:alloc_slab_page 189906944 46364 mm/memory.c:1056 func:folio_prealloc 499384320 121920 mm/percpu-vm.c:95 func:pcpu_alloc_pages 530579456 129536 mm/page_ext.c:271 func:alloc_page_ext 625876992 54186 mm/slub.c:2450 func:alloc_slab_page 838860800 400 mm/huge_memory.c:1165 func:vma_alloc_anon_folio_pmd 1014710272 247732 mm/filemap.c:1978 func:__filemap_get_folio 1056710656 257986 mm/memory.c:1054 func:folio_prealloc 1279262720 610 mm/khugepaged.c:1084 func:alloc_charge_folio 1334530048 325763 mm/readahead.c:186 func:ractl_alloc_folio 3341238272 412215 drivers/net/ethernet/intel/ice/ice_txrx.c:681 [ice] func:ice_alloc_mapped_page > > I guess the traffic patterns of an iperf session are too regular, or > something to do with bridge or bonding.. but I also struggle to see how > those could play a role in the buffer management in the ice driver... ^ permalink raw reply [flat|nested] 46+ messages in thread
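Back-of-envelope from the three snapshots quoted in this thread: ice_alloc_mapped_page went from 1,105,125,376 bytes / 139,252 live allocations right after boot, to 2,544,877,568 / 315,003 at day 4, to 3,341,238,272 / 412,215 at day 6. Both deltas divide exactly by 8,192 bytes, so whole order-1 pages (on a 4 KiB-page system) are being retained, at a fairly steady rate of roughly 45k pages, i.e. about 350 MiB, per day under this workload.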
* Re: [Intel-wired-lan] Increased memory usage on NUMA nodes with ICE driver after upgrade to 6.13.y (regression in commit 492a044508ad) 2025-07-02 9:48 ` Jaroslav Pulchart @ 2025-07-02 18:01 ` Jacob Keller 2025-07-02 21:56 ` Jacob Keller 1 sibling, 0 replies; 46+ messages in thread From: Jacob Keller @ 2025-07-02 18:01 UTC (permalink / raw) To: Jaroslav Pulchart Cc: Maciej Fijalkowski, Jakub Kicinski, Przemek Kitszel, intel-wired-lan@lists.osuosl.org, Damato, Joe, netdev@vger.kernel.org, Nguyen, Anthony L, Michal Swiatkowski, Czapnik, Lukasz, Dumazet, Eric, Zaki, Ahmed, Martin Karsten, Igor Raits, Daniel Secik, Zdenek Pesek [-- Attachment #1.1: Type: text/plain, Size: 4996 bytes --] On 7/2/2025 2:48 AM, Jaroslav Pulchart wrote: >> >> On 6/30/2025 11:48 PM, Jaroslav Pulchart wrote: >>>> On 6/30/2025 2:56 PM, Jacob Keller wrote: >>>>> Unfortunately it looks like the fix I mentioned has landed in 6.14, so >>>>> its not a fix for your issue (since you mentioned 6.14 has failed >>>>> testing in your system) >>>>> >>>>> $ git describe --first-parent --contains --match=v* --exclude=*rc* >>>>> 743bbd93cf29f653fae0e1416a31f03231689911 >>>>> v6.14~251^2~15^2~2 >>>>> >>>>> I don't see any other relevant changes since v6.14. I can try to see if >>>>> I see similar issues with CONFIG_MEM_ALLOC_PROFILING on some test >>>>> systems here. >>>> >>>> On my system I see this at boot after loading the ice module from >>>> >>>> $ grep -F "/ice/" /proc/allocinfo | sort -g | tail | numfmt --to=iec> >>>> 26K 230 drivers/net/ethernet/intel/ice/ice_irq.c:84 [ice] >>>> func:ice_get_irq_res >>>>> 48K 2 drivers/net/ethernet/intel/ice/ice_arfs.c:565 [ice] func:ice_init_arfs >>>>> 57K 226 drivers/net/ethernet/intel/ice/ice_lib.c:397 [ice] func:ice_vsi_alloc_ring_stats >>>>> 57K 226 drivers/net/ethernet/intel/ice/ice_lib.c:416 [ice] func:ice_vsi_alloc_ring_stats >>>>> 85K 226 drivers/net/ethernet/intel/ice/ice_lib.c:1398 [ice] func:ice_vsi_alloc_rings >>>>> 339K 226 drivers/net/ethernet/intel/ice/ice_lib.c:1422 [ice] func:ice_vsi_alloc_rings >>>>> 678K 226 drivers/net/ethernet/intel/ice/ice_base.c:109 [ice] func:ice_vsi_alloc_q_vector >>>>> 1.1M 257 drivers/net/ethernet/intel/ice/ice_fwlog.c:40 [ice] func:ice_fwlog_alloc_ring_buffs >>>>> 7.2M 114 drivers/net/ethernet/intel/ice/ice_txrx.c:493 [ice] func:ice_setup_rx_ring >>>>> 896M 229264 drivers/net/ethernet/intel/ice/ice_txrx.c:680 [ice] func:ice_alloc_mapped_page >>>> >>>> Its about 1GB for the mapped pages. I don't see any increase moment to >>>> moment. I've started an iperf session to simulate some traffic, and I'll >>>> leave this running to see if anything changes overnight. >>>> >>>> Is there anything else that you can share about the traffic setup or >>>> otherwise that I could look into? Your system seems to use ~2.5 x the >>>> buffer size as mine, but that might just be a smaller number of CPUs. >>>> >>>> Hopefully I'll get some more results overnight. >>> >>> The traffic is random production workloads from VMs, using standard >>> Linux or OVS bridges. There is no specific pattern to it. I haven’t >>> had any luck reproducing (or was not patient enough) this with iperf3 >>> myself. The two active (UP) interfaces are in an LACP bonding setup. >>> Here are our ethtool settings for the two member ports (em1 and p3p1) >>> >> >> I had iperf3 running overnight and the memory usage for >> ice_alloc_mapped_pages is constant here. Mine was direct connections >> without bridge or bonding. From your description I assume there's no XDP >> happening either. 
> > Yes, no XDP in use. > > BTW the allocinfo after 6days uptime: > # uptime ; sort -g /proc/allocinfo| tail -n 15 > 11:46:44 up 6 days, 2:18, 1 user, load average: 9.24, 11.33, 15.07 > 102489024 533797 fs/dcache.c:1681 func:__d_alloc > 106229760 25935 mm/shmem.c:1854 func:shmem_alloc_folio > 117118192 103097 fs/ext4/super.c:1388 [ext4] func:ext4_alloc_inode > 134479872 32832 kernel/events/ring_buffer.c:811 func:perf_mmap_alloc_page > 162783232 7656 mm/slub.c:2452 func:alloc_slab_page > 189906944 46364 mm/memory.c:1056 func:folio_prealloc > 499384320 121920 mm/percpu-vm.c:95 func:pcpu_alloc_pages > 530579456 129536 mm/page_ext.c:271 func:alloc_page_ext > 625876992 54186 mm/slub.c:2450 func:alloc_slab_page > 838860800 400 mm/huge_memory.c:1165 func:vma_alloc_anon_folio_pmd > 1014710272 247732 mm/filemap.c:1978 func:__filemap_get_folio > 1056710656 257986 mm/memory.c:1054 func:folio_prealloc > 1279262720 610 mm/khugepaged.c:1084 func:alloc_charge_folio > 1334530048 325763 mm/readahead.c:186 func:ractl_alloc_folio > 3341238272 412215 drivers/net/ethernet/intel/ice/ice_txrx.c:681 > [ice] func:ice_alloc_mapped_page > 3.2GB meaning an entire GB wasted from your on-boot up :( Unfortunately, I've had no luck trying to reproduce the conditions that trigger this. We do have a series in flight to convert ice to page pool which we hope resolves this.. but of course that isn't really a suitable backport candidate. Its quite frustrating when I can't figure out how to reproduce to further debug where the leak is. I also discovered that the leak sanitizer doesn't cover page allocations :( >> >> I guess the traffic patterns of an iperf session are too regular, or >> something to do with bridge or bonding.. but I also struggle to see how >> those could play a role in the buffer management in the ice driver... [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 236 bytes --] ^ permalink raw reply [flat|nested] 46+ messages in thread
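As an aside: since neither kmemleak nor the leak sanitizer covers raw page allocations, page_owner is one existing facility that does. With CONFIG_PAGE_OWNER=y and page_owner=on on the kernel command line, every live page keeps the stack trace of its allocation, so dumps taken a day apart can show whether the growing population comes from the ring-setup path or from the NAPI refill path, which the aggregated allocinfo line cannot distinguish. This is only a sketch; the sort helper under tools/mm/ (tools/vm/ in older trees) can group the dump by stack if needed.

    dump=page_owner.$(date +%s).txt
    cat /sys/kernel/debug/page_owner > "$dump"
    grep -c ice_alloc_mapped_page "$dump"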
* Re: [Intel-wired-lan] Increased memory usage on NUMA nodes with ICE driver after upgrade to 6.13.y (regression in commit 492a044508ad) 2025-07-02 9:48 ` Jaroslav Pulchart 2025-07-02 18:01 ` Jacob Keller @ 2025-07-02 21:56 ` Jacob Keller 2025-07-03 6:46 ` Jaroslav Pulchart 1 sibling, 1 reply; 46+ messages in thread From: Jacob Keller @ 2025-07-02 21:56 UTC (permalink / raw) To: Jaroslav Pulchart Cc: Maciej Fijalkowski, Jakub Kicinski, Przemek Kitszel, intel-wired-lan@lists.osuosl.org, Damato, Joe, netdev@vger.kernel.org, Nguyen, Anthony L, Michal Swiatkowski, Czapnik, Lukasz, Dumazet, Eric, Zaki, Ahmed, Martin Karsten, Igor Raits, Daniel Secik, Zdenek Pesek [-- Attachment #1.1: Type: text/plain, Size: 5974 bytes --] On 7/2/2025 2:48 AM, Jaroslav Pulchart wrote: >> >> On 6/30/2025 11:48 PM, Jaroslav Pulchart wrote: >>>> On 6/30/2025 2:56 PM, Jacob Keller wrote: >>>>> Unfortunately it looks like the fix I mentioned has landed in 6.14, so >>>>> its not a fix for your issue (since you mentioned 6.14 has failed >>>>> testing in your system) >>>>> >>>>> $ git describe --first-parent --contains --match=v* --exclude=*rc* >>>>> 743bbd93cf29f653fae0e1416a31f03231689911 >>>>> v6.14~251^2~15^2~2 >>>>> >>>>> I don't see any other relevant changes since v6.14. I can try to see if >>>>> I see similar issues with CONFIG_MEM_ALLOC_PROFILING on some test >>>>> systems here. >>>> >>>> On my system I see this at boot after loading the ice module from >>>> >>>> $ grep -F "/ice/" /proc/allocinfo | sort -g | tail | numfmt --to=iec> >>>> 26K 230 drivers/net/ethernet/intel/ice/ice_irq.c:84 [ice] >>>> func:ice_get_irq_res >>>>> 48K 2 drivers/net/ethernet/intel/ice/ice_arfs.c:565 [ice] func:ice_init_arfs >>>>> 57K 226 drivers/net/ethernet/intel/ice/ice_lib.c:397 [ice] func:ice_vsi_alloc_ring_stats >>>>> 57K 226 drivers/net/ethernet/intel/ice/ice_lib.c:416 [ice] func:ice_vsi_alloc_ring_stats >>>>> 85K 226 drivers/net/ethernet/intel/ice/ice_lib.c:1398 [ice] func:ice_vsi_alloc_rings >>>>> 339K 226 drivers/net/ethernet/intel/ice/ice_lib.c:1422 [ice] func:ice_vsi_alloc_rings >>>>> 678K 226 drivers/net/ethernet/intel/ice/ice_base.c:109 [ice] func:ice_vsi_alloc_q_vector >>>>> 1.1M 257 drivers/net/ethernet/intel/ice/ice_fwlog.c:40 [ice] func:ice_fwlog_alloc_ring_buffs >>>>> 7.2M 114 drivers/net/ethernet/intel/ice/ice_txrx.c:493 [ice] func:ice_setup_rx_ring >>>>> 896M 229264 drivers/net/ethernet/intel/ice/ice_txrx.c:680 [ice] func:ice_alloc_mapped_page >>>> >>>> Its about 1GB for the mapped pages. I don't see any increase moment to >>>> moment. I've started an iperf session to simulate some traffic, and I'll >>>> leave this running to see if anything changes overnight. >>>> >>>> Is there anything else that you can share about the traffic setup or >>>> otherwise that I could look into? Your system seems to use ~2.5 x the >>>> buffer size as mine, but that might just be a smaller number of CPUs. >>>> >>>> Hopefully I'll get some more results overnight. >>> >>> The traffic is random production workloads from VMs, using standard >>> Linux or OVS bridges. There is no specific pattern to it. I haven’t >>> had any luck reproducing (or was not patient enough) this with iperf3 >>> myself. The two active (UP) interfaces are in an LACP bonding setup. >>> Here are our ethtool settings for the two member ports (em1 and p3p1) >>> >> >> I had iperf3 running overnight and the memory usage for >> ice_alloc_mapped_pages is constant here. Mine was direct connections >> without bridge or bonding. 
From your description I assume there's no XDP >> happening either. > > Yes, no XDP in use. > > BTW the allocinfo after 6days uptime: > # uptime ; sort -g /proc/allocinfo| tail -n 15 > 11:46:44 up 6 days, 2:18, 1 user, load average: 9.24, 11.33, 15.07 > 102489024 533797 fs/dcache.c:1681 func:__d_alloc > 106229760 25935 mm/shmem.c:1854 func:shmem_alloc_folio > 117118192 103097 fs/ext4/super.c:1388 [ext4] func:ext4_alloc_inode > 134479872 32832 kernel/events/ring_buffer.c:811 func:perf_mmap_alloc_page > 162783232 7656 mm/slub.c:2452 func:alloc_slab_page > 189906944 46364 mm/memory.c:1056 func:folio_prealloc > 499384320 121920 mm/percpu-vm.c:95 func:pcpu_alloc_pages > 530579456 129536 mm/page_ext.c:271 func:alloc_page_ext > 625876992 54186 mm/slub.c:2450 func:alloc_slab_page > 838860800 400 mm/huge_memory.c:1165 func:vma_alloc_anon_folio_pmd > 1014710272 247732 mm/filemap.c:1978 func:__filemap_get_folio > 1056710656 257986 mm/memory.c:1054 func:folio_prealloc > 1279262720 610 mm/khugepaged.c:1084 func:alloc_charge_folio > 1334530048 325763 mm/readahead.c:186 func:ractl_alloc_folio > 3341238272 412215 drivers/net/ethernet/intel/ice/ice_txrx.c:681 > [ice] func:ice_alloc_mapped_page > I have a suspicion that the issue is related to the updating of page_count in ice_get_rx_pgcnt(). The i40e driver has a very similar logic for page reuse but doesn't do this. It also has a counter to track failure to re-use the Rx pages. Commit 11c4aa074d54 ("ice: gather page_count()'s of each frag right before XDP prog call") changed the logic to update page_count of the Rx page just prior to the XDP call instead of at the point where we get the page from ice_get_rx_buf(). I think this change was originally introduced while we were trying out an experimental refactor of the hotpath to handle fragments differently, which no longer happens since 743bbd93cf29 ("ice: put Rx buffers after being done with current frame"), which ironically was part of this very same series.. I think this updating of page count is accidentally causing us to miscount when we could perform page-reuse, and ultimately causes us to leak the page somehow. I'm still investigating, but I think this might trigger if somehow the page pgcnt - pagecnt_bias becomes >1, we don't reuse the page. The i40e driver stores the page count in i40e_get_rx_buffer, and I think our updating it later can somehow get things out-of-sync. Do you know if your traffic pattern happens to send fragmented frames? I think iperf doesn't do that, which might be part of whats causing this issue. I'm going to try to see if I can generate such fragmentation to confirm. Is your MTU kept at the default ethernet size? At the very least I'm going to propose a patch for ice similar to the one from Joe Damato to track the rx busy page count. That might at least help track something.. Thanks, Jake [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 236 bytes --] ^ permalink raw reply [flat|nested] 46+ messages in thread
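The reuse decision under discussion boils down to a "sole owner" test on each half-page Rx buffer. Below is a minimal sketch of that heuristic in the style of the i40e/ice Rx path; the function name and argument list are illustrative rather than the exact upstream code:

#include <linux/mm.h>
#include <linux/skbuff.h>       /* dev_page_is_reusable() */

/*
 * Simplified "can this half page be recycled?" check.  pagecnt_bias is
 * the number of page references the driver has pre-charged for itself;
 * rx_buffer_pgcnt is a snapshot of page_count() taken somewhere in the
 * Rx hot path.  If anyone besides the driver still holds a reference
 * (difference greater than 1), the other half of the page is still in
 * flight up the stack, so the buffer must not be reused and a fresh
 * page has to be allocated.
 */
static bool rx_page_can_be_reused(struct page *page,
                                  unsigned short pagecnt_bias,
                                  int rx_buffer_pgcnt)
{
        /* must be on the local NUMA node and not a pfmemalloc page */
        if (!dev_page_is_reusable(page))
                return false;

        /* is the driver the sole owner of the page? */
        if (unlikely(rx_buffer_pgcnt - pagecnt_bias > 1))
                return false;

        return true;
}

The concern raised above is about when rx_buffer_pgcnt is sampled: if the snapshot is taken just before the XDP program call rather than at the point the buffer is fetched, as i40e does, the subtraction can be evaluated against a stale count and the reuse test can misfire.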
* Re: [Intel-wired-lan] Increased memory usage on NUMA nodes with ICE driver after upgrade to 6.13.y (regression in commit 492a044508ad) 2025-07-02 21:56 ` Jacob Keller @ 2025-07-03 6:46 ` Jaroslav Pulchart 2025-07-03 16:16 ` Jacob Keller 0 siblings, 1 reply; 46+ messages in thread From: Jaroslav Pulchart @ 2025-07-03 6:46 UTC (permalink / raw) To: Jacob Keller Cc: Maciej Fijalkowski, Jakub Kicinski, Przemek Kitszel, intel-wired-lan@lists.osuosl.org, Damato, Joe, netdev@vger.kernel.org, Nguyen, Anthony L, Michal Swiatkowski, Czapnik, Lukasz, Dumazet, Eric, Zaki, Ahmed, Martin Karsten, Igor Raits, Daniel Secik, Zdenek Pesek > > On 7/2/2025 2:48 AM, Jaroslav Pulchart wrote: > >> > >> On 6/30/2025 11:48 PM, Jaroslav Pulchart wrote: > >>>> On 6/30/2025 2:56 PM, Jacob Keller wrote: > >>>>> Unfortunately it looks like the fix I mentioned has landed in 6.14, so > >>>>> its not a fix for your issue (since you mentioned 6.14 has failed > >>>>> testing in your system) > >>>>> > >>>>> $ git describe --first-parent --contains --match=v* --exclude=*rc* > >>>>> 743bbd93cf29f653fae0e1416a31f03231689911 > >>>>> v6.14~251^2~15^2~2 > >>>>> > >>>>> I don't see any other relevant changes since v6.14. I can try to see if > >>>>> I see similar issues with CONFIG_MEM_ALLOC_PROFILING on some test > >>>>> systems here. > >>>> > >>>> On my system I see this at boot after loading the ice module from > >>>> > >>>> $ grep -F "/ice/" /proc/allocinfo | sort -g | tail | numfmt --to=iec> > >>>> 26K 230 drivers/net/ethernet/intel/ice/ice_irq.c:84 [ice] > >>>> func:ice_get_irq_res > >>>>> 48K 2 drivers/net/ethernet/intel/ice/ice_arfs.c:565 [ice] func:ice_init_arfs > >>>>> 57K 226 drivers/net/ethernet/intel/ice/ice_lib.c:397 [ice] func:ice_vsi_alloc_ring_stats > >>>>> 57K 226 drivers/net/ethernet/intel/ice/ice_lib.c:416 [ice] func:ice_vsi_alloc_ring_stats > >>>>> 85K 226 drivers/net/ethernet/intel/ice/ice_lib.c:1398 [ice] func:ice_vsi_alloc_rings > >>>>> 339K 226 drivers/net/ethernet/intel/ice/ice_lib.c:1422 [ice] func:ice_vsi_alloc_rings > >>>>> 678K 226 drivers/net/ethernet/intel/ice/ice_base.c:109 [ice] func:ice_vsi_alloc_q_vector > >>>>> 1.1M 257 drivers/net/ethernet/intel/ice/ice_fwlog.c:40 [ice] func:ice_fwlog_alloc_ring_buffs > >>>>> 7.2M 114 drivers/net/ethernet/intel/ice/ice_txrx.c:493 [ice] func:ice_setup_rx_ring > >>>>> 896M 229264 drivers/net/ethernet/intel/ice/ice_txrx.c:680 [ice] func:ice_alloc_mapped_page > >>>> > >>>> Its about 1GB for the mapped pages. I don't see any increase moment to > >>>> moment. I've started an iperf session to simulate some traffic, and I'll > >>>> leave this running to see if anything changes overnight. > >>>> > >>>> Is there anything else that you can share about the traffic setup or > >>>> otherwise that I could look into? Your system seems to use ~2.5 x the > >>>> buffer size as mine, but that might just be a smaller number of CPUs. > >>>> > >>>> Hopefully I'll get some more results overnight. > >>> > >>> The traffic is random production workloads from VMs, using standard > >>> Linux or OVS bridges. There is no specific pattern to it. I haven’t > >>> had any luck reproducing (or was not patient enough) this with iperf3 > >>> myself. The two active (UP) interfaces are in an LACP bonding setup. > >>> Here are our ethtool settings for the two member ports (em1 and p3p1) > >>> > >> > >> I had iperf3 running overnight and the memory usage for > >> ice_alloc_mapped_pages is constant here. Mine was direct connections > >> without bridge or bonding. 
From your description I assume there's no XDP > >> happening either. > > > > Yes, no XDP in use. > > > > BTW the allocinfo after 6days uptime: > > # uptime ; sort -g /proc/allocinfo| tail -n 15 > > 11:46:44 up 6 days, 2:18, 1 user, load average: 9.24, 11.33, 15.07 > > 102489024 533797 fs/dcache.c:1681 func:__d_alloc > > 106229760 25935 mm/shmem.c:1854 func:shmem_alloc_folio > > 117118192 103097 fs/ext4/super.c:1388 [ext4] func:ext4_alloc_inode > > 134479872 32832 kernel/events/ring_buffer.c:811 func:perf_mmap_alloc_page > > 162783232 7656 mm/slub.c:2452 func:alloc_slab_page > > 189906944 46364 mm/memory.c:1056 func:folio_prealloc > > 499384320 121920 mm/percpu-vm.c:95 func:pcpu_alloc_pages > > 530579456 129536 mm/page_ext.c:271 func:alloc_page_ext > > 625876992 54186 mm/slub.c:2450 func:alloc_slab_page > > 838860800 400 mm/huge_memory.c:1165 func:vma_alloc_anon_folio_pmd > > 1014710272 247732 mm/filemap.c:1978 func:__filemap_get_folio > > 1056710656 257986 mm/memory.c:1054 func:folio_prealloc > > 1279262720 610 mm/khugepaged.c:1084 func:alloc_charge_folio > > 1334530048 325763 mm/readahead.c:186 func:ractl_alloc_folio > > 3341238272 412215 drivers/net/ethernet/intel/ice/ice_txrx.c:681 > > [ice] func:ice_alloc_mapped_page > > > I have a suspicion that the issue is related to the updating of > page_count in ice_get_rx_pgcnt(). The i40e driver has a very similar > logic for page reuse but doesn't do this. It also has a counter to track > failure to re-use the Rx pages. > > Commit 11c4aa074d54 ("ice: gather page_count()'s of each frag right > before XDP prog call") changed the logic to update page_count of the Rx > page just prior to the XDP call instead of at the point where we get the > page from ice_get_rx_buf(). I think this change was originally > introduced while we were trying out an experimental refactor of the > hotpath to handle fragments differently, which no longer happens since > 743bbd93cf29 ("ice: put Rx buffers after being done with current > frame"), which ironically was part of this very same series.. > > I think this updating of page count is accidentally causing us to > miscount when we could perform page-reuse, and ultimately causes us to > leak the page somehow. I'm still investigating, but I think this might > trigger if somehow the page pgcnt - pagecnt_bias becomes >1, we don't > reuse the page. > > The i40e driver stores the page count in i40e_get_rx_buffer, and I think > our updating it later can somehow get things out-of-sync. > > Do you know if your traffic pattern happens to send fragmented frames? I Hmm, I check the * node_netstat_Ip_Frag* metrics and they are empty(do-not-exists), * shortly run "tcpdump -n -i any 'ip[6:2] & 0x3fff != 0'" and nothing was found looks to me like there is no fragmentation. > think iperf doesn't do that, which might be part of whats causing this > issue. I'm going to try to see if I can generate such fragmentation to > confirm. Is your MTU kept at the default ethernet size? Our MTU size is set to 9000 everywhere. > > At the very least I'm going to propose a patch for ice similar to the > one from Joe Damato to track the rx busy page count. That might at least > help track something.. > > Thanks, > Jake ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [Intel-wired-lan] Increased memory usage on NUMA nodes with ICE driver after upgrade to 6.13.y (regression in commit 492a044508ad) 2025-07-03 6:46 ` Jaroslav Pulchart @ 2025-07-03 16:16 ` Jacob Keller 2025-07-04 19:30 ` Maciej Fijalkowski 2025-07-07 18:32 ` Jacob Keller 0 siblings, 2 replies; 46+ messages in thread From: Jacob Keller @ 2025-07-03 16:16 UTC (permalink / raw) To: Jaroslav Pulchart Cc: Maciej Fijalkowski, Jakub Kicinski, Przemek Kitszel, intel-wired-lan@lists.osuosl.org, Damato, Joe, netdev@vger.kernel.org, Nguyen, Anthony L, Michal Swiatkowski, Czapnik, Lukasz, Dumazet, Eric, Zaki, Ahmed, Martin Karsten, Igor Raits, Daniel Secik, Zdenek Pesek [-- Attachment #1.1: Type: text/plain, Size: 6719 bytes --] On 7/2/2025 11:46 PM, Jaroslav Pulchart wrote: >> >> On 7/2/2025 2:48 AM, Jaroslav Pulchart wrote: >>>> >>>> On 6/30/2025 11:48 PM, Jaroslav Pulchart wrote: >>>>>> On 6/30/2025 2:56 PM, Jacob Keller wrote: >>>>>>> Unfortunately it looks like the fix I mentioned has landed in 6.14, so >>>>>>> its not a fix for your issue (since you mentioned 6.14 has failed >>>>>>> testing in your system) >>>>>>> >>>>>>> $ git describe --first-parent --contains --match=v* --exclude=*rc* >>>>>>> 743bbd93cf29f653fae0e1416a31f03231689911 >>>>>>> v6.14~251^2~15^2~2 >>>>>>> >>>>>>> I don't see any other relevant changes since v6.14. I can try to see if >>>>>>> I see similar issues with CONFIG_MEM_ALLOC_PROFILING on some test >>>>>>> systems here. >>>>>> >>>>>> On my system I see this at boot after loading the ice module from >>>>>> >>>>>> $ grep -F "/ice/" /proc/allocinfo | sort -g | tail | numfmt --to=iec> >>>>>> 26K 230 drivers/net/ethernet/intel/ice/ice_irq.c:84 [ice] >>>>>> func:ice_get_irq_res >>>>>>> 48K 2 drivers/net/ethernet/intel/ice/ice_arfs.c:565 [ice] func:ice_init_arfs >>>>>>> 57K 226 drivers/net/ethernet/intel/ice/ice_lib.c:397 [ice] func:ice_vsi_alloc_ring_stats >>>>>>> 57K 226 drivers/net/ethernet/intel/ice/ice_lib.c:416 [ice] func:ice_vsi_alloc_ring_stats >>>>>>> 85K 226 drivers/net/ethernet/intel/ice/ice_lib.c:1398 [ice] func:ice_vsi_alloc_rings >>>>>>> 339K 226 drivers/net/ethernet/intel/ice/ice_lib.c:1422 [ice] func:ice_vsi_alloc_rings >>>>>>> 678K 226 drivers/net/ethernet/intel/ice/ice_base.c:109 [ice] func:ice_vsi_alloc_q_vector >>>>>>> 1.1M 257 drivers/net/ethernet/intel/ice/ice_fwlog.c:40 [ice] func:ice_fwlog_alloc_ring_buffs >>>>>>> 7.2M 114 drivers/net/ethernet/intel/ice/ice_txrx.c:493 [ice] func:ice_setup_rx_ring >>>>>>> 896M 229264 drivers/net/ethernet/intel/ice/ice_txrx.c:680 [ice] func:ice_alloc_mapped_page >>>>>> >>>>>> Its about 1GB for the mapped pages. I don't see any increase moment to >>>>>> moment. I've started an iperf session to simulate some traffic, and I'll >>>>>> leave this running to see if anything changes overnight. >>>>>> >>>>>> Is there anything else that you can share about the traffic setup or >>>>>> otherwise that I could look into? Your system seems to use ~2.5 x the >>>>>> buffer size as mine, but that might just be a smaller number of CPUs. >>>>>> >>>>>> Hopefully I'll get some more results overnight. >>>>> >>>>> The traffic is random production workloads from VMs, using standard >>>>> Linux or OVS bridges. There is no specific pattern to it. I haven’t >>>>> had any luck reproducing (or was not patient enough) this with iperf3 >>>>> myself. The two active (UP) interfaces are in an LACP bonding setup. 
>>>>> Here are our ethtool settings for the two member ports (em1 and p3p1) >>>>> >>>> >>>> I had iperf3 running overnight and the memory usage for >>>> ice_alloc_mapped_pages is constant here. Mine was direct connections >>>> without bridge or bonding. From your description I assume there's no XDP >>>> happening either. >>> >>> Yes, no XDP in use. >>> >>> BTW the allocinfo after 6days uptime: >>> # uptime ; sort -g /proc/allocinfo| tail -n 15 >>> 11:46:44 up 6 days, 2:18, 1 user, load average: 9.24, 11.33, 15.07 >>> 102489024 533797 fs/dcache.c:1681 func:__d_alloc >>> 106229760 25935 mm/shmem.c:1854 func:shmem_alloc_folio >>> 117118192 103097 fs/ext4/super.c:1388 [ext4] func:ext4_alloc_inode >>> 134479872 32832 kernel/events/ring_buffer.c:811 func:perf_mmap_alloc_page >>> 162783232 7656 mm/slub.c:2452 func:alloc_slab_page >>> 189906944 46364 mm/memory.c:1056 func:folio_prealloc >>> 499384320 121920 mm/percpu-vm.c:95 func:pcpu_alloc_pages >>> 530579456 129536 mm/page_ext.c:271 func:alloc_page_ext >>> 625876992 54186 mm/slub.c:2450 func:alloc_slab_page >>> 838860800 400 mm/huge_memory.c:1165 func:vma_alloc_anon_folio_pmd >>> 1014710272 247732 mm/filemap.c:1978 func:__filemap_get_folio >>> 1056710656 257986 mm/memory.c:1054 func:folio_prealloc >>> 1279262720 610 mm/khugepaged.c:1084 func:alloc_charge_folio >>> 1334530048 325763 mm/readahead.c:186 func:ractl_alloc_folio >>> 3341238272 412215 drivers/net/ethernet/intel/ice/ice_txrx.c:681 >>> [ice] func:ice_alloc_mapped_page >>> >> I have a suspicion that the issue is related to the updating of >> page_count in ice_get_rx_pgcnt(). The i40e driver has a very similar >> logic for page reuse but doesn't do this. It also has a counter to track >> failure to re-use the Rx pages. >> >> Commit 11c4aa074d54 ("ice: gather page_count()'s of each frag right >> before XDP prog call") changed the logic to update page_count of the Rx >> page just prior to the XDP call instead of at the point where we get the >> page from ice_get_rx_buf(). I think this change was originally >> introduced while we were trying out an experimental refactor of the >> hotpath to handle fragments differently, which no longer happens since >> 743bbd93cf29 ("ice: put Rx buffers after being done with current >> frame"), which ironically was part of this very same series.. >> >> I think this updating of page count is accidentally causing us to >> miscount when we could perform page-reuse, and ultimately causes us to >> leak the page somehow. I'm still investigating, but I think this might >> trigger if somehow the page pgcnt - pagecnt_bias becomes >1, we don't >> reuse the page. >> >> The i40e driver stores the page count in i40e_get_rx_buffer, and I think >> our updating it later can somehow get things out-of-sync. >> >> Do you know if your traffic pattern happens to send fragmented frames? I > > Hmm, I check the > * node_netstat_Ip_Frag* metrics and they are empty(do-not-exists), > * shortly run "tcpdump -n -i any 'ip[6:2] & 0x3fff != 0'" and nothing was found > looks to me like there is no fragmentation. > Good to rule it out at least. >> think iperf doesn't do that, which might be part of whats causing this >> issue. I'm going to try to see if I can generate such fragmentation to >> confirm. Is your MTU kept at the default ethernet size? > > Our MTU size is set to 9000 everywhere. > Ok. I am re-trying with MTU 9000 and using some traffic generated by wrk now. I do see much larger memory use (~2GB) when using MTU 9000, so that tracks with what your system shows. 
Currently its fluctuating between 1.9 and 2G. I'll leave this going for a couple of days while on vacation and see if anything pops up. Thanks, Jake [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 236 bytes --] ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [Intel-wired-lan] Increased memory usage on NUMA nodes with ICE driver after upgrade to 6.13.y (regression in commit 492a044508ad) 2025-07-03 16:16 ` Jacob Keller @ 2025-07-04 19:30 ` Maciej Fijalkowski 2025-07-07 18:32 ` Jacob Keller 1 sibling, 0 replies; 46+ messages in thread From: Maciej Fijalkowski @ 2025-07-04 19:30 UTC (permalink / raw) To: Jacob Keller Cc: Jaroslav Pulchart, Jakub Kicinski, Przemek Kitszel, intel-wired-lan@lists.osuosl.org, Damato, Joe, netdev@vger.kernel.org, Nguyen, Anthony L, Michal Swiatkowski, Czapnik, Lukasz, Dumazet, Eric, Zaki, Ahmed, Martin Karsten, Igor Raits, Daniel Secik, Zdenek Pesek On Thu, Jul 03, 2025 at 09:16:35AM -0700, Jacob Keller wrote: > > > On 7/2/2025 11:46 PM, Jaroslav Pulchart wrote: > >> > >> On 7/2/2025 2:48 AM, Jaroslav Pulchart wrote: > >>>> > >>>> On 6/30/2025 11:48 PM, Jaroslav Pulchart wrote: > >>>>>> On 6/30/2025 2:56 PM, Jacob Keller wrote: > >>>>>>> Unfortunately it looks like the fix I mentioned has landed in 6.14, so > >>>>>>> its not a fix for your issue (since you mentioned 6.14 has failed > >>>>>>> testing in your system) > >>>>>>> > >>>>>>> $ git describe --first-parent --contains --match=v* --exclude=*rc* > >>>>>>> 743bbd93cf29f653fae0e1416a31f03231689911 > >>>>>>> v6.14~251^2~15^2~2 > >>>>>>> > >>>>>>> I don't see any other relevant changes since v6.14. I can try to see if > >>>>>>> I see similar issues with CONFIG_MEM_ALLOC_PROFILING on some test > >>>>>>> systems here. > >>>>>> > >>>>>> On my system I see this at boot after loading the ice module from > >>>>>> > >>>>>> $ grep -F "/ice/" /proc/allocinfo | sort -g | tail | numfmt --to=iec> > >>>>>> 26K 230 drivers/net/ethernet/intel/ice/ice_irq.c:84 [ice] > >>>>>> func:ice_get_irq_res > >>>>>>> 48K 2 drivers/net/ethernet/intel/ice/ice_arfs.c:565 [ice] func:ice_init_arfs > >>>>>>> 57K 226 drivers/net/ethernet/intel/ice/ice_lib.c:397 [ice] func:ice_vsi_alloc_ring_stats > >>>>>>> 57K 226 drivers/net/ethernet/intel/ice/ice_lib.c:416 [ice] func:ice_vsi_alloc_ring_stats > >>>>>>> 85K 226 drivers/net/ethernet/intel/ice/ice_lib.c:1398 [ice] func:ice_vsi_alloc_rings > >>>>>>> 339K 226 drivers/net/ethernet/intel/ice/ice_lib.c:1422 [ice] func:ice_vsi_alloc_rings > >>>>>>> 678K 226 drivers/net/ethernet/intel/ice/ice_base.c:109 [ice] func:ice_vsi_alloc_q_vector > >>>>>>> 1.1M 257 drivers/net/ethernet/intel/ice/ice_fwlog.c:40 [ice] func:ice_fwlog_alloc_ring_buffs > >>>>>>> 7.2M 114 drivers/net/ethernet/intel/ice/ice_txrx.c:493 [ice] func:ice_setup_rx_ring > >>>>>>> 896M 229264 drivers/net/ethernet/intel/ice/ice_txrx.c:680 [ice] func:ice_alloc_mapped_page > >>>>>> > >>>>>> Its about 1GB for the mapped pages. I don't see any increase moment to > >>>>>> moment. I've started an iperf session to simulate some traffic, and I'll > >>>>>> leave this running to see if anything changes overnight. > >>>>>> > >>>>>> Is there anything else that you can share about the traffic setup or > >>>>>> otherwise that I could look into? Your system seems to use ~2.5 x the > >>>>>> buffer size as mine, but that might just be a smaller number of CPUs. > >>>>>> > >>>>>> Hopefully I'll get some more results overnight. > >>>>> > >>>>> The traffic is random production workloads from VMs, using standard > >>>>> Linux or OVS bridges. There is no specific pattern to it. I haven’t > >>>>> had any luck reproducing (or was not patient enough) this with iperf3 > >>>>> myself. The two active (UP) interfaces are in an LACP bonding setup. 
> >>>>> Here are our ethtool settings for the two member ports (em1 and p3p1) > >>>>> > >>>> > >>>> I had iperf3 running overnight and the memory usage for > >>>> ice_alloc_mapped_pages is constant here. Mine was direct connections > >>>> without bridge or bonding. From your description I assume there's no XDP > >>>> happening either. > >>> > >>> Yes, no XDP in use. > >>> > >>> BTW the allocinfo after 6days uptime: > >>> # uptime ; sort -g /proc/allocinfo| tail -n 15 > >>> 11:46:44 up 6 days, 2:18, 1 user, load average: 9.24, 11.33, 15.07 > >>> 102489024 533797 fs/dcache.c:1681 func:__d_alloc > >>> 106229760 25935 mm/shmem.c:1854 func:shmem_alloc_folio > >>> 117118192 103097 fs/ext4/super.c:1388 [ext4] func:ext4_alloc_inode > >>> 134479872 32832 kernel/events/ring_buffer.c:811 func:perf_mmap_alloc_page > >>> 162783232 7656 mm/slub.c:2452 func:alloc_slab_page > >>> 189906944 46364 mm/memory.c:1056 func:folio_prealloc > >>> 499384320 121920 mm/percpu-vm.c:95 func:pcpu_alloc_pages > >>> 530579456 129536 mm/page_ext.c:271 func:alloc_page_ext > >>> 625876992 54186 mm/slub.c:2450 func:alloc_slab_page > >>> 838860800 400 mm/huge_memory.c:1165 func:vma_alloc_anon_folio_pmd > >>> 1014710272 247732 mm/filemap.c:1978 func:__filemap_get_folio > >>> 1056710656 257986 mm/memory.c:1054 func:folio_prealloc > >>> 1279262720 610 mm/khugepaged.c:1084 func:alloc_charge_folio > >>> 1334530048 325763 mm/readahead.c:186 func:ractl_alloc_folio > >>> 3341238272 412215 drivers/net/ethernet/intel/ice/ice_txrx.c:681 > >>> [ice] func:ice_alloc_mapped_page > >>> > >> I have a suspicion that the issue is related to the updating of > >> page_count in ice_get_rx_pgcnt(). The i40e driver has a very similar > >> logic for page reuse but doesn't do this. It also has a counter to track > >> failure to re-use the Rx pages. > >> > >> Commit 11c4aa074d54 ("ice: gather page_count()'s of each frag right > >> before XDP prog call") changed the logic to update page_count of the Rx > >> page just prior to the XDP call instead of at the point where we get the > >> page from ice_get_rx_buf(). I think this change was originally > >> introduced while we were trying out an experimental refactor of the > >> hotpath to handle fragments differently, which no longer happens since > >> 743bbd93cf29 ("ice: put Rx buffers after being done with current > >> frame"), which ironically was part of this very same series.. > >> > >> I think this updating of page count is accidentally causing us to > >> miscount when we could perform page-reuse, and ultimately causes us to > >> leak the page somehow. I'm still investigating, but I think this might > >> trigger if somehow the page pgcnt - pagecnt_bias becomes >1, we don't > >> reuse the page. > >> > >> The i40e driver stores the page count in i40e_get_rx_buffer, and I think > >> our updating it later can somehow get things out-of-sync. > >> > >> Do you know if your traffic pattern happens to send fragmented frames? I > > > > Hmm, I check the > > * node_netstat_Ip_Frag* metrics and they are empty(do-not-exists), > > * shortly run "tcpdump -n -i any 'ip[6:2] & 0x3fff != 0'" and nothing was found > > looks to me like there is no fragmentation. > > > > Good to rule it out at least. > > >> think iperf doesn't do that, which might be part of whats causing this > >> issue. I'm going to try to see if I can generate such fragmentation to > >> confirm. Is your MTU kept at the default ethernet size? > > > > Our MTU size is set to 9000 everywhere. > > > > Ok. 
I am re-trying with MTU 9000 and using some traffic generated by wrk > now. I do see much larger memory use (~2GB) when using MTU 9000, so that > tracks with what your system shows. Currently its fluctuating between > 1.9 and 2G. I'll leave this going for a couple of days while on vacation > and see if anything pops up. I was thinking if order-1 pages might do the mess there for some reason since for 9k mtu we pull them and split into half. Maybe it would be worth trying out if legacy-rx (which will work on order-0 pages) doesn't have this issue? but that would require 8k mtu. > > Thanks, > Jake ^ permalink raw reply [flat|nested] 46+ messages in thread
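As a rough illustration of why order-1 pages are in play with a 9000-byte MTU (a simplified sketch; the real driver derives its Rx buffer length from the MTU and private flags, and the exact thresholds may differ):

#include <linux/mm.h>

/*
 * Once the Rx buffer length exceeds half of a 4 KiB page, each buffer
 * is carved out of half of an order-1 (8 KiB) compound page, and a
 * jumbo frame arrives as a multi-buffer chain of such half pages.  The
 * legacy-rx path mentioned above sticks to order-0 (4 KiB) pages
 * instead, which caps the usable buffer size and hence the MTU.
 */
static unsigned int rx_page_order(unsigned int rx_buf_len)
{
#if (PAGE_SIZE < 8192)
        if (rx_buf_len > PAGE_SIZE / 2)
                return 1;       /* order-1 page, split into two buffers */
#endif
        return 0;               /* order-0 page */
}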
* Re: [Intel-wired-lan] Increased memory usage on NUMA nodes with ICE driver after upgrade to 6.13.y (regression in commit 492a044508ad) 2025-07-03 16:16 ` Jacob Keller 2025-07-04 19:30 ` Maciej Fijalkowski @ 2025-07-07 18:32 ` Jacob Keller 2025-07-07 22:03 ` Jacob Keller 1 sibling, 1 reply; 46+ messages in thread From: Jacob Keller @ 2025-07-07 18:32 UTC (permalink / raw) To: Jaroslav Pulchart Cc: Maciej Fijalkowski, Jakub Kicinski, Przemek Kitszel, intel-wired-lan@lists.osuosl.org, Damato, Joe, netdev@vger.kernel.org, Nguyen, Anthony L, Michal Swiatkowski, Czapnik, Lukasz, Dumazet, Eric, Zaki, Ahmed, Martin Karsten, Igor Raits, Daniel Secik, Zdenek Pesek [-- Attachment #1.1: Type: text/plain, Size: 1394 bytes --] On 7/3/2025 9:16 AM, Jacob Keller wrote: > On 7/2/2025 11:46 PM, Jaroslav Pulchart wrote: >>> think iperf doesn't do that, which might be part of whats causing this >>> issue. I'm going to try to see if I can generate such fragmentation to >>> confirm. Is your MTU kept at the default ethernet size? >> >> Our MTU size is set to 9000 everywhere. >> > > Ok. I am re-trying with MTU 9000 and using some traffic generated by wrk > now. I do see much larger memory use (~2GB) when using MTU 9000, so that > tracks with what your system shows. Currently its fluctuating between > 1.9 and 2G. I'll leave this going for a couple of days while on vacation > and see if anything pops up. > > Thanks, > Jake Good news! After several days of running a wrk and iperf3 workload with 9k MTU, I see a significant increase in the memory usage from the page allocations: 7.3G 953314 drivers/net/ethernet/intel/ice/ice_txrx.c:682 [ice] func:ice_alloc_mapped_page ~5GB extra. At least I can reproduce this now. Its unclear how long it took since I was out on vacation from Wednesday through until now. I do have a singular hypothesis regarding the way we're currently tracking the page count, (just based on differences between ice and i40e). I'm going to attempt to align with i40e and re-run the test. Hopefully I'll have some more information in a day or two. [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 236 bytes --] ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [Intel-wired-lan] Increased memory usage on NUMA nodes with ICE driver after upgrade to 6.13.y (regression in commit 492a044508ad) 2025-07-07 18:32 ` Jacob Keller @ 2025-07-07 22:03 ` Jacob Keller 2025-07-09 0:50 ` Jacob Keller 0 siblings, 1 reply; 46+ messages in thread From: Jacob Keller @ 2025-07-07 22:03 UTC (permalink / raw) To: Jaroslav Pulchart Cc: Maciej Fijalkowski, Jakub Kicinski, Przemek Kitszel, intel-wired-lan@lists.osuosl.org, Damato, Joe, netdev@vger.kernel.org, Nguyen, Anthony L, Michal Swiatkowski, Czapnik, Lukasz, Dumazet, Eric, Zaki, Ahmed, Martin Karsten, Igor Raits, Daniel Secik, Zdenek Pesek [-- Attachment #1.1: Type: text/plain, Size: 2279 bytes --] On 7/7/2025 11:32 AM, Jacob Keller wrote: > > > On 7/3/2025 9:16 AM, Jacob Keller wrote: >> On 7/2/2025 11:46 PM, Jaroslav Pulchart wrote: >>>> think iperf doesn't do that, which might be part of whats causing this >>>> issue. I'm going to try to see if I can generate such fragmentation to >>>> confirm. Is your MTU kept at the default ethernet size? >>> >>> Our MTU size is set to 9000 everywhere. >>> >> >> Ok. I am re-trying with MTU 9000 and using some traffic generated by wrk >> now. I do see much larger memory use (~2GB) when using MTU 9000, so that >> tracks with what your system shows. Currently its fluctuating between >> 1.9 and 2G. I'll leave this going for a couple of days while on vacation >> and see if anything pops up. >> >> Thanks, >> Jake > > Good news! After several days of running a wrk and iperf3 workload with > 9k MTU, I see a significant increase in the memory usage from the page > allocations: > > 7.3G 953314 drivers/net/ethernet/intel/ice/ice_txrx.c:682 [ice] > func:ice_alloc_mapped_page > > ~5GB extra. > > At least I can reproduce this now. Its unclear how long it took since I > was out on vacation from Wednesday through until now. > > I do have a singular hypothesis regarding the way we're currently > tracking the page count, (just based on differences between ice and > i40e). I'm going to attempt to align with i40e and re-run the test. > Hopefully I'll have some more information in a day or two. Bad news: my hypothesis was incorrect. Good news: I can immediately see the problem if I set MTU to 9K and start an iperf3 session and just watch the count of allocations from ice_alloc_mapped_pages(). It goes up consistently, so I can quickly tell if a change is helping. I ported the stats from i40e for tracking the page allocations, and I can see that we're allocating new pages despite not actually performing releases. I don't yet have a good understanding of what causes this, and the logic in ice is pretty hard to track... I'm going to try the page pool patches myself to see if this test bed triggers the same problems. Unfortunately I think I need someone else with more experience with the hotpath code to help figure out whats going wrong here... [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 236 bytes --] ^ permalink raw reply [flat|nested] 46+ messages in thread
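The counting approach described here does not need anything elaborate; a handful of per-ring counters bumped on allocation, reuse and release is enough to show whether pages go missing in between. The names below are made up for illustration and are not the actual i40e or ice statistics:

#include <linux/types.h>

/*
 * Hypothetical per-ring bookkeeping for hunting an Rx page leak.
 */
struct rx_page_stats {
        u64 page_alloc;         /* fresh pages mapped for the ring */
        u64 page_reuse;         /* half-page buffers recycled in place */
        u64 page_release;       /* pages unmapped and freed back to the MM */
};

static inline u64 rx_pages_outstanding(const struct rx_page_stats *s)
{
        /*
         * Pages handed out but not yet given back.  This should hover
         * around the ring size; if it grows without bound under steady
         * traffic, pages are leaking.
         */
        return s->page_alloc - s->page_release;
}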
* Re: [Intel-wired-lan] Increased memory usage on NUMA nodes with ICE driver after upgrade to 6.13.y (regression in commit 492a044508ad) 2025-07-07 22:03 ` Jacob Keller @ 2025-07-09 0:50 ` Jacob Keller 2025-07-09 19:11 ` Jacob Keller 0 siblings, 1 reply; 46+ messages in thread From: Jacob Keller @ 2025-07-09 0:50 UTC (permalink / raw) To: Jaroslav Pulchart Cc: Maciej Fijalkowski, Jakub Kicinski, Przemek Kitszel, intel-wired-lan@lists.osuosl.org, Damato, Joe, netdev@vger.kernel.org, Nguyen, Anthony L, Michal Swiatkowski, Czapnik, Lukasz, Dumazet, Eric, Zaki, Ahmed, Martin Karsten, Igor Raits, Daniel Secik, Zdenek Pesek [-- Attachment #1.1: Type: text/plain, Size: 1346 bytes --] On 7/7/2025 3:03 PM, Jacob Keller wrote: > Bad news: my hypothesis was incorrect. > > Good news: I can immediately see the problem if I set MTU to 9K and > start an iperf3 session and just watch the count of allocations from > ice_alloc_mapped_pages(). It goes up consistently, so I can quickly tell > if a change is helping. > > I ported the stats from i40e for tracking the page allocations, and I > can see that we're allocating new pages despite not actually performing > releases. > > I don't yet have a good understanding of what causes this, and the logic > in ice is pretty hard to track... > > I'm going to try the page pool patches myself to see if this test bed > triggers the same problems. Unfortunately I think I need someone else > with more experience with the hotpath code to help figure out whats > going wrong here... I believe I have isolated this and figured out the issue: With 9K MTU, sometimes the hardware posts a multi-buffer frame with an extra descriptor that has a size of 0 bytes with no data in it. When this happens, our logic for tracking buffers fails to free this buffer. We then later overwrite the page because we failed to either free or re-use the page, and our overwriting logic doesn't verify this. I will have a fix with a more detailed description posted tomorrow. [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 236 bytes --] ^ permalink raw reply [flat|nested] 46+ messages in thread
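To make the described failure mode concrete, here is a deliberately simplified illustration of the bug pattern, with hypothetical types and function names rather than the actual ice code. Every buffer posted to hardware holds a page reference, so the per-frame cleanup has to settle that reference even when the descriptor carried zero bytes:

#include <stddef.h>

/* Hypothetical stand-ins for the driver's ring structures. */
struct rx_buffer {
        void *page;             /* backing (half) page */
        unsigned int size;      /* bytes written by hardware, may be 0 */
};

struct rx_ring {
        struct rx_buffer *bufs;
        unsigned int count;     /* ring size, power of two */
};

/* Pretend this recycles the buffer's page or drops its reference. */
static void recycle_or_free_buffer(struct rx_ring *ring, struct rx_buffer *buf)
{
        (void)ring;
        (void)buf;
}

/*
 * Cleanup pass over the buffers of one multi-buffer frame.  Skipping
 * "empty" buffers means the trailing zero-byte descriptor of a jumbo
 * frame never gives its page reference back; the slot is later
 * overwritten with a fresh page and the old one is leaked, which is
 * the pattern described in the message above.
 */
static void put_frame_buffers(struct rx_ring *ring,
                              unsigned int first, unsigned int nr_bufs)
{
        unsigned int idx = first;

        while (nr_bufs--) {
                struct rx_buffer *buf = &ring->bufs[idx];

                if (buf->size)          /* BUG: must be unconditional */
                        recycle_or_free_buffer(ring, buf);

                idx = (idx + 1) & (ring->count - 1);
        }
}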
* Re: [Intel-wired-lan] Increased memory usage on NUMA nodes with ICE driver after upgrade to 6.13.y (regression in commit 492a044508ad) 2025-07-09 0:50 ` Jacob Keller @ 2025-07-09 19:11 ` Jacob Keller 2025-07-09 21:04 ` Jaroslav Pulchart 0 siblings, 1 reply; 46+ messages in thread From: Jacob Keller @ 2025-07-09 19:11 UTC (permalink / raw) To: Jaroslav Pulchart Cc: Maciej Fijalkowski, Jakub Kicinski, Przemek Kitszel, intel-wired-lan@lists.osuosl.org, Damato, Joe, netdev@vger.kernel.org, Nguyen, Anthony L, Michal Swiatkowski, Czapnik, Lukasz, Dumazet, Eric, Zaki, Ahmed, Martin Karsten, Igor Raits, Daniel Secik, Zdenek Pesek [-- Attachment #1.1: Type: text/plain, Size: 1772 bytes --] On 7/8/2025 5:50 PM, Jacob Keller wrote: > > > On 7/7/2025 3:03 PM, Jacob Keller wrote: >> Bad news: my hypothesis was incorrect. >> >> Good news: I can immediately see the problem if I set MTU to 9K and >> start an iperf3 session and just watch the count of allocations from >> ice_alloc_mapped_pages(). It goes up consistently, so I can quickly tell >> if a change is helping. >> >> I ported the stats from i40e for tracking the page allocations, and I >> can see that we're allocating new pages despite not actually performing >> releases. >> >> I don't yet have a good understanding of what causes this, and the logic >> in ice is pretty hard to track... >> >> I'm going to try the page pool patches myself to see if this test bed >> triggers the same problems. Unfortunately I think I need someone else >> with more experience with the hotpath code to help figure out whats >> going wrong here... > > I believe I have isolated this and figured out the issue: With 9K MTU, > sometimes the hardware posts a multi-buffer frame with an extra > descriptor that has a size of 0 bytes with no data in it. When this > happens, our logic for tracking buffers fails to free this buffer. We > then later overwrite the page because we failed to either free or re-use > the page, and our overwriting logic doesn't verify this. > > I will have a fix with a more detailed description posted tomorrow. @Jaroslav, I've posted a fix which I believe should resolve your issue: https://lore.kernel.org/intel-wired-lan/20250709-jk-ice-fix-rx-mem-leak-v1-1-cfdd7eeea905@intel.com/T/#u I am reasonably confident it should resolve the issue you reported. If possible, it would be appreciated if you could test it and report back to confirm. [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 236 bytes --] ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [Intel-wired-lan] Increased memory usage on NUMA nodes with ICE driver after upgrade to 6.13.y (regression in commit 492a044508ad) 2025-07-09 19:11 ` Jacob Keller @ 2025-07-09 21:04 ` Jaroslav Pulchart 2025-07-09 21:15 ` Jacob Keller 0 siblings, 1 reply; 46+ messages in thread From: Jaroslav Pulchart @ 2025-07-09 21:04 UTC (permalink / raw) To: Jacob Keller Cc: Maciej Fijalkowski, Jakub Kicinski, Przemek Kitszel, intel-wired-lan@lists.osuosl.org, Damato, Joe, netdev@vger.kernel.org, Nguyen, Anthony L, Michal Swiatkowski, Czapnik, Lukasz, Dumazet, Eric, Zaki, Ahmed, Martin Karsten, Igor Raits, Daniel Secik, Zdenek Pesek > > > On 7/8/2025 5:50 PM, Jacob Keller wrote: > > > > > > On 7/7/2025 3:03 PM, Jacob Keller wrote: > >> Bad news: my hypothesis was incorrect. > >> > >> Good news: I can immediately see the problem if I set MTU to 9K and > >> start an iperf3 session and just watch the count of allocations from > >> ice_alloc_mapped_pages(). It goes up consistently, so I can quickly tell > >> if a change is helping. > >> > >> I ported the stats from i40e for tracking the page allocations, and I > >> can see that we're allocating new pages despite not actually performing > >> releases. > >> > >> I don't yet have a good understanding of what causes this, and the logic > >> in ice is pretty hard to track... > >> > >> I'm going to try the page pool patches myself to see if this test bed > >> triggers the same problems. Unfortunately I think I need someone else > >> with more experience with the hotpath code to help figure out whats > >> going wrong here... > > > > I believe I have isolated this and figured out the issue: With 9K MTU, > > sometimes the hardware posts a multi-buffer frame with an extra > > descriptor that has a size of 0 bytes with no data in it. When this > > happens, our logic for tracking buffers fails to free this buffer. We > > then later overwrite the page because we failed to either free or re-use > > the page, and our overwriting logic doesn't verify this. > > > > I will have a fix with a more detailed description posted tomorrow. > > @Jaroslav, I've posted a fix which I believe should resolve your issue: > > https://lore.kernel.org/intel-wired-lan/20250709-jk-ice-fix-rx-mem-leak-v1-1-cfdd7eeea905@intel.com/T/#u > > I am reasonably confident it should resolve the issue you reported. If > possible, it would be appreciated if you could test it and report back > to confirm. @Jacob that’s excellent news! I’ve built and installed 6.15.5 with your patch on one of our servers (strange that I had to disable CONFIG_MEM_ALLOC_PROFILING with this patch or the kernel wouldn’t boot) and started a VM running our production traffic. I’ll let it run for a day-two, observe the memory utilization per NUMA node and report back. ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [Intel-wired-lan] Increased memory usage on NUMA nodes with ICE driver after upgrade to 6.13.y (regression in commit 492a044508ad) 2025-07-09 21:04 ` Jaroslav Pulchart @ 2025-07-09 21:15 ` Jacob Keller 2025-07-11 18:16 ` Jaroslav Pulchart 0 siblings, 1 reply; 46+ messages in thread From: Jacob Keller @ 2025-07-09 21:15 UTC (permalink / raw) To: Jaroslav Pulchart Cc: Maciej Fijalkowski, Jakub Kicinski, Przemek Kitszel, intel-wired-lan@lists.osuosl.org, Damato, Joe, netdev@vger.kernel.org, Nguyen, Anthony L, Michal Swiatkowski, Czapnik, Lukasz, Dumazet, Eric, Zaki, Ahmed, Martin Karsten, Igor Raits, Daniel Secik, Zdenek Pesek [-- Attachment #1.1: Type: text/plain, Size: 2406 bytes --] On 7/9/2025 2:04 PM, Jaroslav Pulchart wrote: >> >> >> On 7/8/2025 5:50 PM, Jacob Keller wrote: >>> >>> >>> On 7/7/2025 3:03 PM, Jacob Keller wrote: >>>> Bad news: my hypothesis was incorrect. >>>> >>>> Good news: I can immediately see the problem if I set MTU to 9K and >>>> start an iperf3 session and just watch the count of allocations from >>>> ice_alloc_mapped_pages(). It goes up consistently, so I can quickly tell >>>> if a change is helping. >>>> >>>> I ported the stats from i40e for tracking the page allocations, and I >>>> can see that we're allocating new pages despite not actually performing >>>> releases. >>>> >>>> I don't yet have a good understanding of what causes this, and the logic >>>> in ice is pretty hard to track... >>>> >>>> I'm going to try the page pool patches myself to see if this test bed >>>> triggers the same problems. Unfortunately I think I need someone else >>>> with more experience with the hotpath code to help figure out whats >>>> going wrong here... >>> >>> I believe I have isolated this and figured out the issue: With 9K MTU, >>> sometimes the hardware posts a multi-buffer frame with an extra >>> descriptor that has a size of 0 bytes with no data in it. When this >>> happens, our logic for tracking buffers fails to free this buffer. We >>> then later overwrite the page because we failed to either free or re-use >>> the page, and our overwriting logic doesn't verify this. >>> >>> I will have a fix with a more detailed description posted tomorrow. >> >> @Jaroslav, I've posted a fix which I believe should resolve your issue: >> >> https://lore.kernel.org/intel-wired-lan/20250709-jk-ice-fix-rx-mem-leak-v1-1-cfdd7eeea905@intel.com/T/#u >> >> I am reasonably confident it should resolve the issue you reported. If >> possible, it would be appreciated if you could test it and report back >> to confirm. > > @Jacob that’s excellent news! > > I’ve built and installed 6.15.5 with your patch on one of our servers > (strange that I had to disable CONFIG_MEM_ALLOC_PROFILING with this > patch or the kernel wouldn’t boot) and started a VM running our > production traffic. I’ll let it run for a day-two, observe the memory > utilization per NUMA node and report back. Great! A bit odd you had to disable CONFIG_MEM_ALLOC_PROFILING. I didn't have trouble on my kernel with it enabled. [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 236 bytes --] ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [Intel-wired-lan] Increased memory usage on NUMA nodes with ICE driver after upgrade to 6.13.y (regression in commit 492a044508ad) 2025-07-09 21:15 ` Jacob Keller @ 2025-07-11 18:16 ` Jaroslav Pulchart 2025-07-11 22:30 ` Jacob Keller 0 siblings, 1 reply; 46+ messages in thread From: Jaroslav Pulchart @ 2025-07-11 18:16 UTC (permalink / raw) To: Jacob Keller Cc: Maciej Fijalkowski, Jakub Kicinski, Przemek Kitszel, intel-wired-lan@lists.osuosl.org, Damato, Joe, netdev@vger.kernel.org, Nguyen, Anthony L, Michal Swiatkowski, Czapnik, Lukasz, Dumazet, Eric, Zaki, Ahmed, Martin Karsten, Igor Raits, Daniel Secik, Zdenek Pesek [-- Attachment #1: Type: text/plain, Size: 2717 bytes --] > > > > On 7/9/2025 2:04 PM, Jaroslav Pulchart wrote: > >> > >> > >> On 7/8/2025 5:50 PM, Jacob Keller wrote: > >>> > >>> > >>> On 7/7/2025 3:03 PM, Jacob Keller wrote: > >>>> Bad news: my hypothesis was incorrect. > >>>> > >>>> Good news: I can immediately see the problem if I set MTU to 9K and > >>>> start an iperf3 session and just watch the count of allocations from > >>>> ice_alloc_mapped_pages(). It goes up consistently, so I can quickly tell > >>>> if a change is helping. > >>>> > >>>> I ported the stats from i40e for tracking the page allocations, and I > >>>> can see that we're allocating new pages despite not actually performing > >>>> releases. > >>>> > >>>> I don't yet have a good understanding of what causes this, and the logic > >>>> in ice is pretty hard to track... > >>>> > >>>> I'm going to try the page pool patches myself to see if this test bed > >>>> triggers the same problems. Unfortunately I think I need someone else > >>>> with more experience with the hotpath code to help figure out whats > >>>> going wrong here... > >>> > >>> I believe I have isolated this and figured out the issue: With 9K MTU, > >>> sometimes the hardware posts a multi-buffer frame with an extra > >>> descriptor that has a size of 0 bytes with no data in it. When this > >>> happens, our logic for tracking buffers fails to free this buffer. We > >>> then later overwrite the page because we failed to either free or re-use > >>> the page, and our overwriting logic doesn't verify this. > >>> > >>> I will have a fix with a more detailed description posted tomorrow. > >> > >> @Jaroslav, I've posted a fix which I believe should resolve your issue: > >> > >> https://lore.kernel.org/intel-wired-lan/20250709-jk-ice-fix-rx-mem-leak-v1-1-cfdd7eeea905@intel.com/T/#u > >> > >> I am reasonably confident it should resolve the issue you reported. If > >> possible, it would be appreciated if you could test it and report back > >> to confirm. > > > > @Jacob that’s excellent news! > > > > I’ve built and installed 6.15.5 with your patch on one of our servers > > (strange that I had to disable CONFIG_MEM_ALLOC_PROFILING with this > > patch or the kernel wouldn’t boot) and started a VM running our > > production traffic. I’ll let it run for a day-two, observe the memory > > utilization per NUMA node and report back. > > Great! A bit odd you had to disable CONFIG_MEM_ALLOC_PROFILING. I didn't > have trouble on my kernel with it enabled. Status update after ~45h of uptime. So far so good, I do not see continuous memory consumption increase on home numa nodes like before. See attached "status_before_after_45h_uptime.png" comparison. [-- Attachment #2: status_before_after_45h_uptime.png --] [-- Type: image/png, Size: 355801 bytes --] ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [Intel-wired-lan] Increased memory usage on NUMA nodes with ICE driver after upgrade to 6.13.y (regression in commit 492a044508ad) 2025-07-11 18:16 ` Jaroslav Pulchart @ 2025-07-11 22:30 ` Jacob Keller 2025-07-14 5:34 ` Jaroslav Pulchart 0 siblings, 1 reply; 46+ messages in thread From: Jacob Keller @ 2025-07-11 22:30 UTC (permalink / raw) To: Jaroslav Pulchart Cc: Maciej Fijalkowski, Jakub Kicinski, Przemek Kitszel, intel-wired-lan@lists.osuosl.org, Damato, Joe, netdev@vger.kernel.org, Nguyen, Anthony L, Michal Swiatkowski, Czapnik, Lukasz, Dumazet, Eric, Zaki, Ahmed, Martin Karsten, Igor Raits, Daniel Secik, Zdenek Pesek [-- Attachment #1.1: Type: text/plain, Size: 2901 bytes --] On 7/11/2025 11:16 AM, Jaroslav Pulchart wrote: >> >> >> >> On 7/9/2025 2:04 PM, Jaroslav Pulchart wrote: >>>> >>>> >>>> On 7/8/2025 5:50 PM, Jacob Keller wrote: >>>>> >>>>> >>>>> On 7/7/2025 3:03 PM, Jacob Keller wrote: >>>>>> Bad news: my hypothesis was incorrect. >>>>>> >>>>>> Good news: I can immediately see the problem if I set MTU to 9K and >>>>>> start an iperf3 session and just watch the count of allocations from >>>>>> ice_alloc_mapped_pages(). It goes up consistently, so I can quickly tell >>>>>> if a change is helping. >>>>>> >>>>>> I ported the stats from i40e for tracking the page allocations, and I >>>>>> can see that we're allocating new pages despite not actually performing >>>>>> releases. >>>>>> >>>>>> I don't yet have a good understanding of what causes this, and the logic >>>>>> in ice is pretty hard to track... >>>>>> >>>>>> I'm going to try the page pool patches myself to see if this test bed >>>>>> triggers the same problems. Unfortunately I think I need someone else >>>>>> with more experience with the hotpath code to help figure out whats >>>>>> going wrong here... >>>>> >>>>> I believe I have isolated this and figured out the issue: With 9K MTU, >>>>> sometimes the hardware posts a multi-buffer frame with an extra >>>>> descriptor that has a size of 0 bytes with no data in it. When this >>>>> happens, our logic for tracking buffers fails to free this buffer. We >>>>> then later overwrite the page because we failed to either free or re-use >>>>> the page, and our overwriting logic doesn't verify this. >>>>> >>>>> I will have a fix with a more detailed description posted tomorrow. >>>> >>>> @Jaroslav, I've posted a fix which I believe should resolve your issue: >>>> >>>> https://lore.kernel.org/intel-wired-lan/20250709-jk-ice-fix-rx-mem-leak-v1-1-cfdd7eeea905@intel.com/T/#u >>>> >>>> I am reasonably confident it should resolve the issue you reported. If >>>> possible, it would be appreciated if you could test it and report back >>>> to confirm. >>> >>> @Jacob that’s excellent news! >>> >>> I’ve built and installed 6.15.5 with your patch on one of our servers >>> (strange that I had to disable CONFIG_MEM_ALLOC_PROFILING with this >>> patch or the kernel wouldn’t boot) and started a VM running our >>> production traffic. I’ll let it run for a day-two, observe the memory >>> utilization per NUMA node and report back. >> >> Great! A bit odd you had to disable CONFIG_MEM_ALLOC_PROFILING. I didn't >> have trouble on my kernel with it enabled. > > Status update after ~45h of uptime. So far so good, I do not see > continuous memory consumption increase on home numa nodes like before. > See attached "status_before_after_45h_uptime.png" comparison. Great news! Would you like your "Tested-by" being added to the commit message when we submit the fix to netdev? 
[-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 236 bytes --] ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [Intel-wired-lan] Increased memory usage on NUMA nodes with ICE driver after upgrade to 6.13.y (regression in commit 492a044508ad) 2025-07-11 22:30 ` Jacob Keller @ 2025-07-14 5:34 ` Jaroslav Pulchart 0 siblings, 0 replies; 46+ messages in thread From: Jaroslav Pulchart @ 2025-07-14 5:34 UTC (permalink / raw) To: Jacob Keller Cc: Maciej Fijalkowski, Jakub Kicinski, Przemek Kitszel, intel-wired-lan@lists.osuosl.org, Damato, Joe, netdev@vger.kernel.org, Nguyen, Anthony L, Michal Swiatkowski, Czapnik, Lukasz, Dumazet, Eric, Zaki, Ahmed, Martin Karsten, Igor Raits, Daniel Secik, Zdenek Pesek > > On 7/11/2025 11:16 AM, Jaroslav Pulchart wrote: > >> > >> > >> > >> On 7/9/2025 2:04 PM, Jaroslav Pulchart wrote: > >>>> > >>>> > >>>> On 7/8/2025 5:50 PM, Jacob Keller wrote: > >>>>> > >>>>> > >>>>> On 7/7/2025 3:03 PM, Jacob Keller wrote: > >>>>>> Bad news: my hypothesis was incorrect. > >>>>>> > >>>>>> Good news: I can immediately see the problem if I set MTU to 9K and > >>>>>> start an iperf3 session and just watch the count of allocations from > >>>>>> ice_alloc_mapped_pages(). It goes up consistently, so I can quickly tell > >>>>>> if a change is helping. > >>>>>> > >>>>>> I ported the stats from i40e for tracking the page allocations, and I > >>>>>> can see that we're allocating new pages despite not actually performing > >>>>>> releases. > >>>>>> > >>>>>> I don't yet have a good understanding of what causes this, and the logic > >>>>>> in ice is pretty hard to track... > >>>>>> > >>>>>> I'm going to try the page pool patches myself to see if this test bed > >>>>>> triggers the same problems. Unfortunately I think I need someone else > >>>>>> with more experience with the hotpath code to help figure out whats > >>>>>> going wrong here... > >>>>> > >>>>> I believe I have isolated this and figured out the issue: With 9K MTU, > >>>>> sometimes the hardware posts a multi-buffer frame with an extra > >>>>> descriptor that has a size of 0 bytes with no data in it. When this > >>>>> happens, our logic for tracking buffers fails to free this buffer. We > >>>>> then later overwrite the page because we failed to either free or re-use > >>>>> the page, and our overwriting logic doesn't verify this. > >>>>> > >>>>> I will have a fix with a more detailed description posted tomorrow. > >>>> > >>>> @Jaroslav, I've posted a fix which I believe should resolve your issue: > >>>> > >>>> https://lore.kernel.org/intel-wired-lan/20250709-jk-ice-fix-rx-mem-leak-v1-1-cfdd7eeea905@intel.com/T/#u > >>>> > >>>> I am reasonably confident it should resolve the issue you reported. If > >>>> possible, it would be appreciated if you could test it and report back > >>>> to confirm. > >>> > >>> @Jacob that’s excellent news! > >>> > >>> I’ve built and installed 6.15.5 with your patch on one of our servers > >>> (strange that I had to disable CONFIG_MEM_ALLOC_PROFILING with this > >>> patch or the kernel wouldn’t boot) and started a VM running our > >>> production traffic. I’ll let it run for a day-two, observe the memory > >>> utilization per NUMA node and report back. > >> > >> Great! A bit odd you had to disable CONFIG_MEM_ALLOC_PROFILING. I didn't > >> have trouble on my kernel with it enabled. > > > > Status update after ~45h of uptime. So far so good, I do not see > > continuous memory consumption increase on home numa nodes like before. > > See attached "status_before_after_45h_uptime.png" comparison. > > Great news! Would you like your "Tested-by" being added to the commit > message when we submit the fix to netdev? 
Jacob, absolutely. ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [Intel-wired-lan] Increased memory usage on NUMA nodes with ICE driver after upgrade to 6.13.y (regression in commit 492a044508ad) [not found] ` <CAK8fFZ5XTO9dGADuMSV0hJws-6cZE9equa3X6dfTBgDyzE1pEQ@mail.gmail.com> 2025-06-25 14:03 ` Przemek Kitszel @ 2025-06-25 14:53 ` Paul Menzel 1 sibling, 0 replies; 46+ messages in thread From: Paul Menzel @ 2025-06-25 14:53 UTC (permalink / raw) To: Jaroslav Pulchart Cc: Jacob E Keller, Jakub Kicinski, Przemyslaw Kitszel, Joe Damato, intel-wired-lan, netdev, Anthony L Nguyen, Michal Swiatkowski, Lukasz Czapnik, Eric Dumazet, Ahmed Zaki, Martin Karsten, Igor Raits, Daniel Secik, Zdenek Pesek, regressions Dear Jaroslav, On 25.06.25 at 14:17, Jaroslav Pulchart wrote: > We are still facing the memory issue with Intel 810 NICs (even on latest > 6.15.y). Commit 492a044508ad13 ("ice: Add support for persistent NAPI config") was added in Linux v6.13-rc1, and since no fix has been presented so far while reverting the commit fixes your issue, I strongly recommend sending a revert. No idea if it’s compiler dependent or what else could be the issue, but due to Linux’s no-regression policy this should be reverted as soon as possible. Kind regards, Paul ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [Intel-wired-lan] Increased memory usage on NUMA nodes with ICE driver after upgrade to 6.13.y (regression in commit 492a044508ad) 2025-04-14 16:29 Increased memory usage on NUMA nodes with ICE driver after upgrade to 6.13.y (regression in commit 492a044508ad) Jaroslav Pulchart 2025-04-14 17:15 ` [Intel-wired-lan] " Paul Menzel 2025-04-15 14:38 ` Przemek Kitszel @ 2025-07-04 16:55 ` Michal Kubiak 2025-07-05 7:01 ` Jaroslav Pulchart 2 siblings, 1 reply; 46+ messages in thread From: Michal Kubiak @ 2025-07-04 16:55 UTC (permalink / raw) To: Jaroslav Pulchart Cc: Tony Nguyen, Kitszel, Przemyslaw, jdamato, intel-wired-lan, netdev, Igor Raits, Daniel Secik, Zdenek Pesek On Mon, Apr 14, 2025 at 06:29:01PM +0200, Jaroslav Pulchart wrote: > Hello, > > While investigating increased memory usage after upgrading our > host/hypervisor servers from Linux kernel 6.12.y to 6.13.y, I observed > a regression in available memory per NUMA node. Our servers allocate > 60GB of each NUMA node’s 64GB of RAM to HugePages for VMs, leaving 4GB > for the host OS. > > After the upgrade, we noticed approximately 500MB less free RAM on > NUMA nodes 0 and 2 compared to 6.12.y, even with no VMs running (just > the host OS after reboot). These nodes host Intel 810-XXV NICs. Here's > a snapshot of the NUMA stats on vanilla 6.13.y: > > NUMA nodes: 0 1 2 3 4 5 6 7 8 > 9 10 11 12 13 14 15 > HPFreeGiB: 60 60 60 60 60 60 60 60 60 > 60 60 60 60 60 60 60 > MemTotal: 64989 65470 65470 65470 65470 65470 65470 65453 > 65470 65470 65470 65470 65470 65470 65470 65462 > MemFree: 2793 3559 3150 3438 3616 3722 3520 3547 3547 > 3536 3506 3452 3440 3489 3607 3729 > > We traced the issue to commit 492a044508ad13a490a24c66f311339bf891cb5f > "ice: Add support for persistent NAPI config". > > We limit the number of channels on the NICs to match local NUMA cores > or less if unused interface (from ridiculous 96 default), for example: > ethtool -L em1 combined 6 # active port; from 96 > ethtool -L p3p2 combined 2 # unused port; from 96 > > This typically aligns memory use with local CPUs and keeps NUMA-local > memory usage within expected limits. However, starting with kernel > 6.13.y and this commit, the high memory usage by the ICE driver > persists regardless of reduced channel configuration. > > Reverting the commit restores expected memory availability on nodes 0 > and 2. Below are stats from 6.13.y with the commit reverted: > NUMA nodes: 0 1 2 3 4 5 6 7 8 > 9 10 11 12 13 14 15 > HPFreeGiB: 60 60 60 60 60 60 60 60 60 > 60 60 60 60 60 60 60 > MemTotal: 64989 65470 65470 65470 65470 65470 65470 65453 65470 > 65470 65470 65470 65470 65470 65470 65462 > MemFree: 3208 3765 3668 3507 3811 3727 3812 3546 3676 3596 ... > > This brings nodes 0 and 2 back to ~3.5GB free RAM, similar to kernel > 6.12.y, and avoids swap pressure and memory exhaustion when running > services and VMs. > > I also do not see any practical benefit in persisting the channel > memory allocation. After a fresh server reboot, channels are not > explicitly configured, and the system will not automatically resize > them back to a higher count unless manually set again. Therefore, > retaining the previous memory footprint appears unnecessary and > potentially harmful in memory-constrained environments > > Best regards, > Jaroslav Pulchart > Hello Jaroslav, I have just sent a series for converting the Rx path of the ice driver to use the Page Pool. 
We suspect it may help with the memory consumption issue, since it removes
the problematic code and delegates some memory management to the generic
code.

Could you please give it a try and check whether it helps with your issue?
The link to the series:
https://lore.kernel.org/intel-wired-lan/20250704161859.871152-1-michal.kubiak@intel.com/

Thanks,
Michal

^ permalink raw reply	[flat|nested] 46+ messages in thread
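For others following along, one hedged way to fetch and test the series is with the b4 tool, using the message ID from the link above; the commands are a sketch and assume a local tree reasonably close to the base the series was generated against:

  # download the whole series as an mbox ready for git am
  b4 am 20250704161859.871152-1-michal.kubiak@intel.com
  # b4 prints the name of the .mbx file it wrote; apply it with a three-way
  # merge so that small context differences can be resolved automatically
  git am -3 ./*_michal_kubiak_intel_com.mbx   # actual file name will differ

As the next message shows, applying the series on plain 6.15.y does not work without extra backporting.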
* Re: [Intel-wired-lan] Increased memory usage on NUMA nodes with ICE driver after upgrade to 6.13.y (regression in commit 492a044508ad) 2025-07-04 16:55 ` Michal Kubiak @ 2025-07-05 7:01 ` Jaroslav Pulchart 2025-07-07 15:37 ` Jaroslav Pulchart 0 siblings, 1 reply; 46+ messages in thread From: Jaroslav Pulchart @ 2025-07-05 7:01 UTC (permalink / raw) To: Michal Kubiak Cc: Tony Nguyen, Kitszel, Przemyslaw, jdamato, intel-wired-lan, netdev, Igor Raits, Daniel Secik, Zdenek Pesek > On Mon, Apr 14, 2025 at 06:29:01PM +0200, Jaroslav Pulchart wrote: > > Hello, > > > > While investigating increased memory usage after upgrading our > > host/hypervisor servers from Linux kernel 6.12.y to 6.13.y, I observed > > a regression in available memory per NUMA node. Our servers allocate > > 60GB of each NUMA node’s 64GB of RAM to HugePages for VMs, leaving 4GB > > for the host OS. > > > > After the upgrade, we noticed approximately 500MB less free RAM on > > NUMA nodes 0 and 2 compared to 6.12.y, even with no VMs running (just > > the host OS after reboot). These nodes host Intel 810-XXV NICs. Here's > > a snapshot of the NUMA stats on vanilla 6.13.y: > > > > NUMA nodes: 0 1 2 3 4 5 6 7 8 > > 9 10 11 12 13 14 15 > > HPFreeGiB: 60 60 60 60 60 60 60 60 60 > > 60 60 60 60 60 60 60 > > MemTotal: 64989 65470 65470 65470 65470 65470 65470 65453 > > 65470 65470 65470 65470 65470 65470 65470 65462 > > MemFree: 2793 3559 3150 3438 3616 3722 3520 3547 3547 > > 3536 3506 3452 3440 3489 3607 3729 > > > > We traced the issue to commit 492a044508ad13a490a24c66f311339bf891cb5f > > "ice: Add support for persistent NAPI config". > > > > We limit the number of channels on the NICs to match local NUMA cores > > or less if unused interface (from ridiculous 96 default), for example: > > ethtool -L em1 combined 6 # active port; from 96 > > ethtool -L p3p2 combined 2 # unused port; from 96 > > > > This typically aligns memory use with local CPUs and keeps NUMA-local > > memory usage within expected limits. However, starting with kernel > > 6.13.y and this commit, the high memory usage by the ICE driver > > persists regardless of reduced channel configuration. > > > > Reverting the commit restores expected memory availability on nodes 0 > > and 2. Below are stats from 6.13.y with the commit reverted: > > NUMA nodes: 0 1 2 3 4 5 6 7 8 > > 9 10 11 12 13 14 15 > > HPFreeGiB: 60 60 60 60 60 60 60 60 60 > > 60 60 60 60 60 60 60 > > MemTotal: 64989 65470 65470 65470 65470 65470 65470 65453 65470 > > 65470 65470 65470 65470 65470 65470 65462 > > MemFree: 3208 3765 3668 3507 3811 3727 3812 3546 3676 3596 ... > > > > This brings nodes 0 and 2 back to ~3.5GB free RAM, similar to kernel > > 6.12.y, and avoids swap pressure and memory exhaustion when running > > services and VMs. > > > > I also do not see any practical benefit in persisting the channel > > memory allocation. After a fresh server reboot, channels are not > > explicitly configured, and the system will not automatically resize > > them back to a higher count unless manually set again. Therefore, > > retaining the previous memory footprint appears unnecessary and > > potentially harmful in memory-constrained environments > > > > Best regards, > > Jaroslav Pulchart > > > > > Hello Jaroslav, > > I have just sent a series for converting the Rx path of the ice driver > to use the Page Pool. > We suspect it may help for the memory consumption issue since it removes > the problematic code and delegates some memory management to the generic > code. 
>
> Could you please give it a try and check whether it helps with your issue?
> The link to the series: https://lore.kernel.org/intel-wired-lan/20250704161859.871152-1-michal.kubiak@intel.com/

I can try it, but I cannot apply the patch as-is on 6.15.y:

$ git am ~/ice-convert-Rx-path-to-Page-Pool.patch
Applying: ice: remove legacy Rx and construct SKB
Applying: ice: drop page splitting and recycling
error: patch failed: drivers/net/ethernet/intel/ice/ice_txrx.h:480
error: drivers/net/ethernet/intel/ice/ice_txrx.h: patch does not apply
Patch failed at 0002 ice: drop page splitting and recycling
hint: Use 'git am --show-current-patch=diff' to see the failed patch
hint: When you have resolved this problem, run "git am --continue".
hint: If you prefer to skip this patch, run "git am --skip" instead.
hint: To restore the original branch and stop patching, run "git am --abort".
hint: Disable this message with "git config set advice.mergeConflict false"

>
> Thanks,
> Michal
>

^ permalink raw reply	[flat|nested] 46+ messages in thread
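The failed hunks suggest the series was generated against a newer development base than 6.15.y. A hedged alternative to backporting is to apply it on top of the Intel Ethernet development tree; the remote URL and branch name below are assumptions, so the series cover letter should be checked for the actual base:

  # Tony Nguyen's next-queue tree usually stages intel-wired-lan series;
  # dev-queue is assumed here to carry the in-flight patches
  git remote add iwl https://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue.git
  git fetch iwl
  git checkout -b ice-page-pool-test iwl/dev-queue
  git am -3 ~/ice-convert-Rx-path-to-Page-Pool.patch

Backporting to 6.15.y instead means resolving the ice_txrx.h context conflict by hand, which is what the follow-up below ends up doing.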
* Re: [Intel-wired-lan] Increased memory usage on NUMA nodes with ICE driver after upgrade to 6.13.y (regression in commit 492a044508ad) 2025-07-05 7:01 ` Jaroslav Pulchart @ 2025-07-07 15:37 ` Jaroslav Pulchart 0 siblings, 0 replies; 46+ messages in thread From: Jaroslav Pulchart @ 2025-07-07 15:37 UTC (permalink / raw) To: Michal Kubiak Cc: Tony Nguyen, Kitszel, Przemyslaw, jdamato, intel-wired-lan, netdev, Igor Raits, Daniel Secik, Zdenek Pesek so 5. 7. 2025 v 9:01 odesílatel Jaroslav Pulchart <jaroslav.pulchart@gooddata.com> napsal: > > > On Mon, Apr 14, 2025 at 06:29:01PM +0200, Jaroslav Pulchart wrote: > > > Hello, > > > > > > While investigating increased memory usage after upgrading our > > > host/hypervisor servers from Linux kernel 6.12.y to 6.13.y, I observed > > > a regression in available memory per NUMA node. Our servers allocate > > > 60GB of each NUMA node’s 64GB of RAM to HugePages for VMs, leaving 4GB > > > for the host OS. > > > > > > After the upgrade, we noticed approximately 500MB less free RAM on > > > NUMA nodes 0 and 2 compared to 6.12.y, even with no VMs running (just > > > the host OS after reboot). These nodes host Intel 810-XXV NICs. Here's > > > a snapshot of the NUMA stats on vanilla 6.13.y: > > > > > > NUMA nodes: 0 1 2 3 4 5 6 7 8 > > > 9 10 11 12 13 14 15 > > > HPFreeGiB: 60 60 60 60 60 60 60 60 60 > > > 60 60 60 60 60 60 60 > > > MemTotal: 64989 65470 65470 65470 65470 65470 65470 65453 > > > 65470 65470 65470 65470 65470 65470 65470 65462 > > > MemFree: 2793 3559 3150 3438 3616 3722 3520 3547 3547 > > > 3536 3506 3452 3440 3489 3607 3729 > > > > > > We traced the issue to commit 492a044508ad13a490a24c66f311339bf891cb5f > > > "ice: Add support for persistent NAPI config". > > > > > > We limit the number of channels on the NICs to match local NUMA cores > > > or less if unused interface (from ridiculous 96 default), for example: > > > ethtool -L em1 combined 6 # active port; from 96 > > > ethtool -L p3p2 combined 2 # unused port; from 96 > > > > > > This typically aligns memory use with local CPUs and keeps NUMA-local > > > memory usage within expected limits. However, starting with kernel > > > 6.13.y and this commit, the high memory usage by the ICE driver > > > persists regardless of reduced channel configuration. > > > > > > Reverting the commit restores expected memory availability on nodes 0 > > > and 2. Below are stats from 6.13.y with the commit reverted: > > > NUMA nodes: 0 1 2 3 4 5 6 7 8 > > > 9 10 11 12 13 14 15 > > > HPFreeGiB: 60 60 60 60 60 60 60 60 60 > > > 60 60 60 60 60 60 60 > > > MemTotal: 64989 65470 65470 65470 65470 65470 65470 65453 65470 > > > 65470 65470 65470 65470 65470 65470 65462 > > > MemFree: 3208 3765 3668 3507 3811 3727 3812 3546 3676 3596 ... > > > > > > This brings nodes 0 and 2 back to ~3.5GB free RAM, similar to kernel > > > 6.12.y, and avoids swap pressure and memory exhaustion when running > > > services and VMs. > > > > > > I also do not see any practical benefit in persisting the channel > > > memory allocation. After a fresh server reboot, channels are not > > > explicitly configured, and the system will not automatically resize > > > them back to a higher count unless manually set again. Therefore, > > > retaining the previous memory footprint appears unnecessary and > > > potentially harmful in memory-constrained environments > > > > > > Best regards, > > > Jaroslav Pulchart > > > > > > > > > Hello Jaroslav, > > > > I have just sent a series for converting the Rx path of the ice driver > > to use the Page Pool. 
> > We suspect it may help with the memory consumption issue, since it removes
> > the problematic code and delegates some memory management to the generic
> > code.
> >
> > Could you please give it a try and check whether it helps with your issue?
> > The link to the series: https://lore.kernel.org/intel-wired-lan/20250704161859.871152-1-michal.kubiak@intel.com/
>
> I can try it, but I cannot apply the patch as-is on 6.15.y:
> $ git am ~/ice-convert-Rx-path-to-Page-Pool.patch
> Applying: ice: remove legacy Rx and construct SKB
> Applying: ice: drop page splitting and recycling
> error: patch failed: drivers/net/ethernet/intel/ice/ice_txrx.h:480
> error: drivers/net/ethernet/intel/ice/ice_txrx.h: patch does not apply
> Patch failed at 0002 ice: drop page splitting and recycling
> hint: Use 'git am --show-current-patch=diff' to see the failed patch
> hint: When you have resolved this problem, run "git am --continue".
> hint: If you prefer to skip this patch, run "git am --skip" instead.
> hint: To restore the original branch and stop patching, run "git am --abort".
> hint: Disable this message with "git config set advice.mergeConflict false"
>

My colleague and I have applied the missing bits and have it building on
6.15.5 (note that we had to disable CONFIG_MEM_ALLOC_PROFILING, or the
kernel won't boot). The patches we used are:

0001-libeth-convert-to-netmem.patch
0002-libeth-support-native-XDP-and-register-memory-model.patch
0003-libeth-xdp-add-XDP_TX-buffers-sending.patch
0004-libeth-xdp-add-.ndo_xdp_xmit-helpers.patch
0005-libeth-xdp-add-XDPSQE-completion-helpers.patch
0006-libeth-xdp-add-XDPSQ-locking-helpers.patch
0007-libeth-xdp-add-XDPSQ-cleanup-timers.patch
0008-libeth-xdp-add-helpers-for-preparing-processing-libe.patch
0009-libeth-xdp-add-XDP-prog-run-and-verdict-result-handl.patch
0010-libeth-xdp-add-templates-for-building-driver-side-ca.patch
0011-libeth-xdp-add-RSS-hash-hint-and-XDP-features-setup-.patch
0012-libeth-xsk-add-XSk-XDP_TX-sending-helpers.patch
0013-libeth-xsk-add-XSk-xmit-functions.patch
0014-libeth-xsk-add-XSk-Rx-processing-support.patch
0015-libeth-xsk-add-XSkFQ-refill-and-XSk-wakeup-helpers.patch
0016-libeth-xdp-xsk-access-adjacent-u32s-as-u64-where-app.patch
0017-ice-add-a-separate-Rx-handler-for-flow-director-comm.patch
0018-ice-remove-legacy-Rx-and-construct-SKB.patch
0019-ice-drop-page-splitting-and-recycling.patch
0020-ice-switch-to-Page-Pool.patch

Unfortunately, the new setup crashes after VMs are started. Here's the
oops trace:

[ 82.816544] tun: Universal TUN/TAP device driver, 1.6
[ 82.823923] tap2c2b8dfc-91: entered promiscuous mode
[ 82.848913] tapa92181fc-b5: entered promiscuous mode
[ 84.030527] tap54ab9888-90: entered promiscuous mode
[ 84.043251] tap89f4f7ae-d1: entered promiscuous mode
[ 85.768578] tapf1e9f4f9-17: entered promiscuous mode
[ 85.780372] tap72c64909-77: entered promiscuous mode
[ 87.580455] tape1b2d2dd-bc: entered promiscuous mode
[ 87.593224] tap34fb2668-4a: entered promiscuous mode
[ 150.406899] Oops: general protection fault, probably for non-canonical address 0xffff3b95e757d5a0: 0000 [#1] SMP NOPTI
[ 150.417626] CPU: 4 UID: 0 PID: 0 Comm: swapper/4 Tainted: G E 6.15.5-1.gdc+ice.el9.x86_64 #1 PREEMPT(lazy)
[ 150.428845] Tainted: [E]=UNSIGNED_MODULE
[ 150.432773] Hardware name: Dell Inc.
PowerEdge R7525/0H3K7P, BIOS 2.19.0 03/07/2025 [ 150.440432] RIP: 0010:page_pool_put_unrefed_netmem+0xe2/0x250 [ 150.446186] Code: 18 48 85 d2 0f 84 58 ff ff ff 8b 52 2c 4c 89 e7 39 d0 41 0f 94 c5 e8 0d f2 ff ff 84 c0 0f 85 4f ff ff ff 48 8b 85 60 06 00 00 <65> 48 ff 40 20 5b 4c 89 e6 48 89 ef 5d 41 5c 41 5d e9 f8 fa ff ff [ 150.464947] RSP: 0018:ffffbc4a003fcd18 EFLAGS: 00010246 [ 150.470173] RAX: ffff9dcabfc37580 RBX: 00000000ffffffff RCX: 0000000000000000 [ 150.477496] RDX: 0000000000000000 RSI: fffff2ec441924c0 RDI: fffff2ec441924c0 [ 150.484773] RBP: ffff9dcabfc36f20 R08: ffff9dc330536d20 R09: 0000000000551618 [ 150.492045] R10: 0000000000000000 R11: 0000000000000f82 R12: fffff2ec441924c0 [ 150.499317] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000001b69 [ 150.506584] FS: 0000000000000000(0000) GS:ffff9dcb27946000(0000) knlGS:0000000000000000 [ 150.514806] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 150.520677] CR2: 00007f82d00041b8 CR3: 000000012bcab00a CR4: 0000000000770ef0 [ 150.527937] PKRU: 55555554 [ 150.530770] Call Trace: [ 150.533342] <IRQ> [ 150.535484] ice_clean_rx_irq+0x288/0x530 [ice] [ 150.540171] ? sched_balance_find_src_group+0x13f/0x210 [ 150.545521] ? ice_clean_tx_irq+0x18f/0x3a0 [ice] [ 150.550373] ice_napi_poll+0xe2/0x290 [ice] [ 150.554709] __napi_poll+0x27/0x1e0 [ 150.558323] net_rx_action+0x1d3/0x3f0 [ 150.562194] ? __napi_schedule+0x8e/0xb0 [ 150.566239] ? sched_clock+0xc/0x30 [ 150.569852] ? sched_clock_cpu+0xb/0x190 [ 150.573897] handle_softirqs+0xd0/0x2b0 [ 150.577858] __irq_exit_rcu+0xcd/0xf0 [ 150.581636] common_interrupt+0x7f/0xa0 [ 150.585601] </IRQ> [ 150.587826] <TASK> [ 150.590049] asm_common_interrupt+0x22/0x40 [ 150.594352] RIP: 0010:flush_smp_call_function_queue+0x39/0x50 [ 150.600218] Code: 80 c0 bb 2e 98 48 85 c0 74 31 53 9c 5b fa bf 01 00 00 00 e8 49 f5 ff ff 65 66 83 3d 58 af 90 02 00 75 0c 80 e7 02 74 01 fb 5b <c3> cc cc cc cc e8 8d 1d f1 ff 80 e7 02 74 f0 eb ed c3 cc cc cc cc [ 150.619204] RSP: 0018:ffffbc4a001e7ed8 EFLAGS: 00000202 [ 150.624550] RAX: 0000000000000000 RBX: ffff9dc2c0088000 RCX: 00000000000f4240 [ 150.631806] RDX: 0000000000007f0c RSI: 0000000000000008 RDI: ffff9dcabfc30880 [ 150.639057] RBP: 0000000000000004 R08: 0000000000000008 R09: ffff9dcabfc311e8 [ 150.646314] R10: ffff9dcabfc1fd80 R11: 0000000000000004 R12: ffff9dc2c1e64400 [ 150.653569] R13: ffffffff978da0e0 R14: 0000000000000001 R15: 0000000000000000 [ 150.660829] do_idle+0x13a/0x200 [ 150.664186] cpu_startup_entry+0x25/0x30 [ 150.668241] start_secondary+0x114/0x140 [ 150.672292] common_startup_64+0x13e/0x141 [ 150.676525] </TASK> [ 150.678840] Modules linked in: target_core_user(E) uio(E) target_core_pscsi(E) target_core_file(E) target_core_iblock(E) nf_conntrack_netlink(E) vhost_net(E) vhost(E) vhost_iotlb(E) tap(E) tun(E) rpcsec_gss_krb5(E) auth_rpcgss(E) nfsv4(E) dns_resolver(E) nfs(E) lockd(E) grace(E) netfs(E) netconsole(E) scsi_transport_iscsi(E) sch_ingress(E) iscsi_target_mod(E) target_core_mod(E) 8021q(E) garp(E) mrp(E) bonding(E) tls(E) nfnetlink_cttimeout(E) nfnetlink(E) openvswitch(E) nf_conncount(E) nf_nat(E) psample(E) ib_core(E) binfmt_misc(E) dell_rbu(E) sunrpc(E) vfat(E) fat(E) dm_service_time(E) dm_multipath(E) amd_atl(E) intel_rapl_msr(E) intel_rapl_common(E) amd64_edac(E) ipmi_ssif(E) edac_mce_amd(E) kvm_amd(E) kvm(E) dell_pc(E) platform_profile(E) dell_smbios(E) dcdbas(E) mgag200(E) irqbypass(E) dell_wmi_descriptor(E) wmi_bmof(E) i2c_algo_bit(E) rapl(E) acpi_cpufreq(E) ptdma(E) i2c_piix4(E) acpi_power_meter(E) ipmi_si(E) 
k10temp(E) i2c_smbus(E) acpi_ipmi(E) wmi(E) ipmi_devintf(E) ipmi_msghandler(E) tcp_bbr(E) fuse(E) zram(E) [ 150.678894] lz4hc_compress(E) lz4_compress(E) zstd_compress(E) ext4(E) crc16(E) mbcache(E) jbd2(E) dm_crypt(E) sd_mod(E) sg(E) ice(E) ahci(E) polyval_clmulni(E) libie(E) libeth_xdp(E) polyval_generic(E) libahci(E) libeth(E) ghash_clmulni_intel(E) sha512_ssse3(E) libata(E) ccp(E) megaraid_sas(E) gnss(E) sp5100_tco(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) br_netfilter(E) bridge(E) stp(E) llc(E) [ 150.770112] Unloaded tainted modules: fmpm(E):1 fjes(E):2 padlock_aes(E):2 [ 150.818140] ---[ end trace 0000000000000000 ]--- [ 150.913536] pstore: backend (erst) writing error (-22) [ 150.918850] RIP: 0010:page_pool_put_unrefed_netmem+0xe2/0x250 [ 150.924764] Code: 18 48 85 d2 0f 84 58 ff ff ff 8b 52 2c 4c 89 e7 39 d0 41 0f 94 c5 e8 0d f2 ff ff 84 c0 0f 85 4f ff ff ff 48 8b 85 60 06 00 00 <65> 48 ff 40 20 5b 4c 89 e6 48 89 ef 5d 41 5c 41 5d e9 f8 fa ff ff [ 150.943854] RSP: 0018:ffffbc4a003fcd18 EFLAGS: 00010246 [ 150.949245] RAX: ffff9dcabfc37580 RBX: 00000000ffffffff RCX: 0000000000000000 [ 150.956556] RDX: 0000000000000000 RSI: fffff2ec441924c0 RDI: fffff2ec441924c0 [ 150.963860] RBP: ffff9dcabfc36f20 R08: ffff9dc330536d20 R09: 0000000000551618 [ 150.971166] R10: 0000000000000000 R11: 0000000000000f82 R12: fffff2ec441924c0 [ 150.978475] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000001b69 [ 150.985782] FS: 0000000000000000(0000) GS:ffff9dcb27946000(0000) knlGS:0000000000000000 [ 150.994036] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 150.999958] CR2: 00007f82d00041b8 CR3: 000000012bcab00a CR4: 0000000000770ef0 [ 151.007270] PKRU: 55555554 [ 151.010151] Kernel panic - not syncing: Fatal exception in interrupt [ 151.488873] Kernel Offset: 0x14600000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) [ 151.581163] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]--- > > > > Thanks, > > Michal > > ^ permalink raw reply [flat|nested] 46+ messages in thread
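For anyone digging further into this crash, the raw symbol+offset values in the trace can be resolved to source lines with the scripts shipped in the kernel tree, assuming the vmlinux and module objects from the exact build that crashed are still available; the paths below are illustrative:

  # resolve individual frames to file:line
  ./scripts/faddr2line vmlinux page_pool_put_unrefed_netmem+0xe2/0x250
  ./scripts/faddr2line drivers/net/ethernet/intel/ice/ice.ko ice_clean_rx_irq+0x288/0x530
  # or decode a saved copy of the whole oops in one pass
  ./scripts/decode_stacktrace.sh vmlinux < oops.txt

This does not change the outcome reported here; it only helps pin down where inside page_pool_put_unrefed_netmem() the non-canonical address is dereferenced.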
Thread overview: 46+ messages

2025-04-14 16:29 Increased memory usage on NUMA nodes with ICE driver after upgrade to 6.13.y (regression in commit 492a044508ad) Jaroslav Pulchart
2025-04-14 17:15 ` [Intel-wired-lan] " Paul Menzel
2025-04-15 14:38 ` Przemek Kitszel
2025-04-16 0:53 ` Jakub Kicinski
2025-04-16 7:13 ` Jaroslav Pulchart
2025-04-16 13:48 ` Jakub Kicinski
2025-04-16 16:03 ` Jaroslav Pulchart
2025-04-16 22:44 ` Jakub Kicinski
2025-04-16 22:57 ` [Intel-wired-lan] " Keller, Jacob E
2025-04-16 22:57 ` Keller, Jacob E
2025-04-17 0:13 ` Jakub Kicinski
2025-04-17 17:52 ` Keller, Jacob E
2025-05-21 10:50 ` Jaroslav Pulchart
2025-06-04 8:42 ` Jaroslav Pulchart
[not found] ` <CAK8fFZ5XTO9dGADuMSV0hJws-6cZE9equa3X6dfTBgDyzE1pEQ@mail.gmail.com>
2025-06-25 14:03 ` Przemek Kitszel
[not found] ` <CAK8fFZ7LREBEdhXjBAKuaqktOz1VwsBTxcCpLBsa+dkMj4Pyyw@mail.gmail.com>
2025-06-25 20:25 ` Jakub Kicinski
2025-06-26 7:42 ` Jaroslav Pulchart
2025-06-30 7:35 ` Jaroslav Pulchart
2025-06-30 16:02 ` Jacob Keller
2025-06-30 17:24 ` Jaroslav Pulchart
2025-06-30 18:59 ` Jacob Keller
2025-06-30 20:01 ` Jaroslav Pulchart
2025-06-30 20:42 ` Jacob Keller
2025-06-30 21:56 ` Jacob Keller
2025-06-30 23:16 ` Jacob Keller
2025-07-01 6:48 ` Jaroslav Pulchart
2025-07-01 20:48 ` Jacob Keller
2025-07-02 9:48 ` Jaroslav Pulchart
2025-07-02 18:01 ` Jacob Keller
2025-07-02 21:56 ` Jacob Keller
2025-07-03 6:46 ` Jaroslav Pulchart
2025-07-03 16:16 ` Jacob Keller
2025-07-04 19:30 ` Maciej Fijalkowski
2025-07-07 18:32 ` Jacob Keller
2025-07-07 22:03 ` Jacob Keller
2025-07-09 0:50 ` Jacob Keller
2025-07-09 19:11 ` Jacob Keller
2025-07-09 21:04 ` Jaroslav Pulchart
2025-07-09 21:15 ` Jacob Keller
2025-07-11 18:16 ` Jaroslav Pulchart
2025-07-11 22:30 ` Jacob Keller
2025-07-14 5:34 ` Jaroslav Pulchart
2025-06-25 14:53 ` Paul Menzel
2025-07-04 16:55 ` Michal Kubiak
2025-07-05 7:01 ` Jaroslav Pulchart
2025-07-07 15:37 ` Jaroslav Pulchart