* RE: [Intel-wired-lan] Increased memory usage on NUMA nodes with ICE driver after upgrade to 6.13.y (regression in commit 492a044508ad)
@ 2025-04-17 17:52 ` Keller, Jacob E
0 siblings, 0 replies; 59+ messages in thread
From: Keller, Jacob E @ 2025-04-17 17:52 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Jaroslav Pulchart, Kitszel, Przemyslaw, Damato, Joe,
intel-wired-lan@lists.osuosl.org, netdev@vger.kernel.org,
Nguyen, Anthony L, Igor Raits, Daniel Secik, Zdenek Pesek,
Dumazet, Eric, Martin Karsten, Zaki, Ahmed, Czapnik, Lukasz,
Michal Swiatkowski
> -----Original Message-----
> From: Jakub Kicinski <kuba@kernel.org>
> Sent: Wednesday, April 16, 2025 5:13 PM
> To: Keller, Jacob E <jacob.e.keller@intel.com>
> Cc: Jaroslav Pulchart <jaroslav.pulchart@gooddata.com>; Kitszel, Przemyslaw
> <przemyslaw.kitszel@intel.com>; Damato, Joe <jdamato@fastly.com>; intel-wired-
> lan@lists.osuosl.org; netdev@vger.kernel.org; Nguyen, Anthony L
> <anthony.l.nguyen@intel.com>; Igor Raits <igor@gooddata.com>; Daniel Secik
> <daniel.secik@gooddata.com>; Zdenek Pesek <zdenek.pesek@gooddata.com>;
> Dumazet, Eric <edumazet@google.com>; Martin Karsten
> <mkarsten@uwaterloo.ca>; Zaki, Ahmed <ahmed.zaki@intel.com>; Czapnik,
> Lukasz <lukasz.czapnik@intel.com>; Michal Swiatkowski
> <michal.swiatkowski@linux.intel.com>
> Subject: Re: [Intel-wired-lan] Increased memory usage on NUMA nodes with ICE
> driver after upgrade to 6.13.y (regression in commit 492a044508ad)
>
> On Wed, 16 Apr 2025 22:57:10 +0000 Keller, Jacob E wrote:
> > > > And you're reverting just and exactly 492a044508ad13 ?
> > > > The memory for persistent config is allocated in alloc_netdev_mqs()
> > > > unconditionally. I'm lost as to how this commit could make any
> > > > difference :(
> > >
> > > Yes, reverted the 492a044508ad13.
> >
> > Struct napi_config *is* 1056 bytes
>
> You're probably looking at 6.15-rcX kernels. Yes, the affinity mask
> can be large depending on the kernel config. But report is for 6.13,
> AFAIU. In 6.13 and 6.14 napi_config was tiny.
Regardless, it should still be ~64KB even in that case, which is a far cry from eating all available memory. Something else must be going on...
Thanks,
Jake
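For scale, the ~64KB estimate above is consistent with one 1056-byte struct napi_config entry per queue. A back-of-the-envelope sketch, assuming roughly 64 queues per netdev (the queue count is not stated in the thread and is used here for illustration only):

```python
# Size of struct napi_config as quoted in the thread (6.15-rcX kernels).
NAPI_CONFIG_BYTES = 1056

# Assumption for illustration only: the queue count is not given in the thread.
ASSUMED_QUEUES = 64


def napi_config_footprint(queues: int, entry_bytes: int = NAPI_CONFIG_BYTES) -> int:
    """Return the bytes needed for one napi_config entry per queue."""
    return queues * entry_bytes


total = napi_config_footprint(ASSUMED_QUEUES)
print(f"{total} bytes = {total / 1024:.1f} KiB")  # prints "67584 bytes = 66.0 KiB"
```

Even with several times as many queues, the total stays in the hundreds of KiB, which supports the point that this allocation alone cannot explain gigabytes of lost memory.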
^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [Intel-wired-lan] Increased memory usage on NUMA nodes with ICE driver after upgrade to 6.13.y (regression in commit 492a044508ad)
2025-04-17 17:52 ` Keller, Jacob E
(?)
@ 2025-05-21 9:32 ` Jaroslav Pulchart
-1 siblings, 0 replies; 59+ messages in thread
From: Jaroslav Pulchart @ 2025-05-21 9:32 UTC (permalink / raw)
To: Keller, Jacob E
Cc: Jakub Kicinski, Kitszel, Przemyslaw, Damato, Joe,
intel-wired-lan@lists.osuosl.org, netdev@vger.kernel.org,
Nguyen, Anthony L, Igor Raits, Daniel Secik, Zdenek Pesek,
Dumazet, Eric, Martin Karsten, Zaki, Ahmed, Czapnik, Lukasz,
Michal Swiatkowski
Hello,
Some observations:
* I'm still observing this "problem" with the latest 6.14.y.
* There must be multiple problems: available memory is slowly going
down on the home NUMA nodes of the Intel X810 NICs (it looks like a memory leak).
Were you able to reproduce the memory problems in your testbed?
Best,
Jaroslav
--
Jaroslav Pulchart
Sr. Principal SW Engineer
GoodData
^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [Intel-wired-lan] Increased memory usage on NUMA nodes with ICE driver after upgrade to 6.13.y (regression in commit 492a044508ad)
2025-04-17 17:52 ` Keller, Jacob E
(?)
(?)
@ 2025-05-21 10:50 ` Jaroslav Pulchart
2025-06-04 8:42 ` Jaroslav Pulchart
-1 siblings, 1 reply; 59+ messages in thread
From: Jaroslav Pulchart @ 2025-05-21 10:50 UTC (permalink / raw)
To: Keller, Jacob E, Jakub Kicinski, Kitszel, Przemyslaw, Damato, Joe,
intel-wired-lan@lists.osuosl.org, netdev@vger.kernel.org,
Nguyen, Anthony L, Michal Swiatkowski, Czapnik, Lukasz,
Dumazet, Eric, Zaki, Ahmed, Martin Karsten
Cc: Igor Raits, Daniel Secik, Zdenek Pesek
On Thu, 17 Apr 2025 at 19:52, Keller, Jacob E
<jacob.e.keller@intel.com> wrote:
> Regardless, it should still be ~64KB even in that case which is a far cry from eating all available memory. Something else must be going on....
>
> Thanks,
> Jake
Hello,
Some observations: this "problem" still exists with the latest 6.14.y,
and there must be multiple issues. Available memory on the home NUMA
nodes of the Intel X810 NICs slowly drops from 3GB to 100MB over
10-20 days (it looks like a memory leak related to networking).
Without the revert, kswapdX activity shows up quickly, within 1-2 days.
With the mentioned commit reverted, kswapdX starts to consume
resources later, after roughly 10-20 days. It is almost impossible to use
servers with Intel X810 cards (ice driver) with recent Linux kernels.
Were you able to reproduce the memory problems in your testbed?
Best,
Jaroslav
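The per-node drop described above can be tracked without a full monitoring stack. A minimal sketch, assuming the standard sysfs per-node files at /sys/devices/system/node/node*/meminfo; the helper names are hypothetical and not part of any tool mentioned in this thread:

```python
import glob
import re

# Matches lines such as "Node 0 MemFree:        3145728 kB".
_MEMFREE_RE = re.compile(r"^Node\s+(\d+)\s+MemFree:\s+(\d+)\s+kB", re.MULTILINE)


def parse_memfree_kb(meminfo_text: str) -> dict[int, int]:
    """Map NUMA node id -> MemFree in kB from per-node meminfo text."""
    return {int(node): int(kb) for node, kb in _MEMFREE_RE.findall(meminfo_text)}


def read_all_nodes() -> dict[int, int]:
    """Collect MemFree for every node exposed under sysfs (Linux only)."""
    result: dict[int, int] = {}
    for path in sorted(glob.glob("/sys/devices/system/node/node*/meminfo")):
        with open(path) as f:
            result.update(parse_memfree_kb(f.read()))
    return result


if __name__ == "__main__":
    for node, kb in sorted(read_all_nodes().items()):
        print(f"node{node}: {kb / 1024:.0f} MiB free")
```

Running this periodically (for example from cron) and logging the output would show whether only the NIC-local nodes lose free memory over time.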
^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [Intel-wired-lan] Increased memory usage on NUMA nodes with ICE driver after upgrade to 6.13.y (regression in commit 492a044508ad)
2025-05-21 10:50 ` Jaroslav Pulchart
@ 2025-06-04 8:42 ` Jaroslav Pulchart
2025-06-25 12:17 ` Jaroslav Pulchart
0 siblings, 1 reply; 59+ messages in thread
From: Jaroslav Pulchart @ 2025-06-04 8:42 UTC (permalink / raw)
To: Keller, Jacob E, Jakub Kicinski, Kitszel, Przemyslaw, Damato, Joe,
intel-wired-lan@lists.osuosl.org, netdev@vger.kernel.org,
Nguyen, Anthony L, Michal Swiatkowski, Czapnik, Lukasz,
Dumazet, Eric, Zaki, Ahmed, Martin Karsten
Cc: Igor Raits, Daniel Secik, Zdenek Pesek
Hello,
I deployed Linux 6.15.0 to our servers 7 days ago and observed the
memory utilization of the home NUMA nodes of the Intel X810 NICs:
1/ there is no longer a need to revert the commit as before,
2/ memory is continuously consumed (like a memory leak).
See the attached "7d_memory_usage_per_numa_linux6.15.0.png" screenshot of 8
NUMA nodes (NUMA0 + NUMA1 are local to the X810 NICs). BTW: we do not
see this memory utilization pattern on servers using Broadcom
NetXtreme-E NICs.
[-- Attachment #2: 7d_memory_usage_per_numa_linux6.15.0.png --]
[-- Type: image/png, Size: 430093 bytes --]
^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [Intel-wired-lan] Increased memory usage on NUMA nodes with ICE driver after upgrade to 6.13.y (regression in commit 492a044508ad)
2025-06-04 8:42 ` Jaroslav Pulchart
@ 2025-06-25 12:17 ` Jaroslav Pulchart
2025-06-25 14:03 ` Przemek Kitszel
2025-06-25 14:53 ` Paul Menzel
0 siblings, 2 replies; 59+ messages in thread
From: Jaroslav Pulchart @ 2025-06-25 12:17 UTC (permalink / raw)
To: Keller, Jacob E, Jakub Kicinski, Kitszel, Przemyslaw, Damato, Joe,
intel-wired-lan@lists.osuosl.org, netdev@vger.kernel.org,
Nguyen, Anthony L, Michal Swiatkowski, Czapnik, Lukasz,
Dumazet, Eric, Zaki, Ahmed, Martin Karsten
Cc: Igor Raits, Daniel Secik, Zdenek Pesek
Hello,
We are still facing the memory issue with Intel 810 NICs (even on the latest
6.15.y).
Our current stabilization plan is to move everything to a new
INTEL-FREE server and get rid of the last Intel sights there (after Intel's CPU
vulnerability fuckups, NICs are the next step).
Any help is welcome,
Jaroslav P.
--
Jaroslav Pulchart
Sr. Principal SW Engineer
GoodData
^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [Intel-wired-lan] Increased memory usage on NUMA nodes with ICE driver after upgrade to 6.13.y (regression in commit 492a044508ad)
2025-06-25 12:17 ` Jaroslav Pulchart
@ 2025-06-25 14:03 ` Przemek Kitszel
2025-06-25 17:51 ` Jaroslav Pulchart
2025-06-25 14:53 ` Paul Menzel
1 sibling, 1 reply; 59+ messages in thread
From: Przemek Kitszel @ 2025-06-25 14:03 UTC (permalink / raw)
To: Jaroslav Pulchart, intel-wired-lan@lists.osuosl.org
Cc: Keller, Jacob E, Jakub Kicinski, Damato, Joe,
netdev@vger.kernel.org, Nguyen, Anthony L, Michal Swiatkowski,
Czapnik, Lukasz, Dumazet, Eric, Zaki, Ahmed, Martin Karsten,
Igor Raits, Daniel Secik, Zdenek Pesek
On 6/25/25 14:17, Jaroslav Pulchart wrote:
> Hello
>
> We are still facing the memory issue with Intel 810 NICs (even on latest
> 6.15.y).
>
> Our current stabilization and solution is to move everything to a new
> INTEL-FREE server and get rid of last Intel sights there (after Intel's
> CPU vulnerabilities fuckups NICs are next step).
>
> Any help welcomed,
> Jaroslav P.
>
>
Thank you for urging us; I can understand the frustration.
We have identified some (unrelated) memory leaks and will ship fixes soon.
And, as no commit/version you have posted stands out as a clear
culprit, there is a chance that our random findings could
help. Anyway, getting to zero kmemleak reports is good in itself, so that is
a good start.
I will also ask my VAL to increase efforts in this area.
Przemek
^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [Intel-wired-lan] Increased memory usage on NUMA nodes with ICE driver after upgrade to 6.13.y (regression in commit 492a044508ad)
2025-06-25 14:03 ` Przemek Kitszel
@ 2025-06-25 17:51 ` Jaroslav Pulchart
2025-06-25 20:25 ` Jakub Kicinski
0 siblings, 1 reply; 59+ messages in thread
From: Jaroslav Pulchart @ 2025-06-25 17:51 UTC (permalink / raw)
To: Przemek Kitszel
Cc: intel-wired-lan@lists.osuosl.org, Keller, Jacob E, Jakub Kicinski,
Damato, Joe, netdev@vger.kernel.org, Nguyen, Anthony L,
Michal Swiatkowski, Czapnik, Lukasz, Dumazet, Eric, Zaki, Ahmed,
Martin Karsten, Igor Raits, Daniel Secik, Zdenek Pesek
Great, please send me a link to the related patch set. I can apply them in
our kernel build and try them ASAP!
--
Jaroslav Pulchart
Sr. Principal SW Engineer
GoodData
^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [Intel-wired-lan] Increased memory usage on NUMA nodes with ICE driver after upgrade to 6.13.y (regression in commit 492a044508ad)
2025-06-25 17:51 ` Jaroslav Pulchart
@ 2025-06-25 20:25 ` Jakub Kicinski
2025-06-26 7:42 ` Jaroslav Pulchart
0 siblings, 1 reply; 59+ messages in thread
From: Jakub Kicinski @ 2025-06-25 20:25 UTC (permalink / raw)
To: Jaroslav Pulchart
Cc: Przemek Kitszel, intel-wired-lan@lists.osuosl.org,
Keller, Jacob E, Damato, Joe, netdev@vger.kernel.org,
Nguyen, Anthony L, Michal Swiatkowski, Czapnik, Lukasz,
Dumazet, Eric, Zaki, Ahmed, Martin Karsten, Igor Raits,
Daniel Secik, Zdenek Pesek
On Wed, 25 Jun 2025 19:51:08 +0200 Jaroslav Pulchart wrote:
> Great, please send me a link to the related patch set. I can apply them in
> our kernel build and try them ASAP!
Sorry if I'm repeating the question - have you tried
CONFIG_MEM_ALLOC_PROFILING? Reportedly the overhead in recent kernels
is low enough to use it for production workloads.
^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [Intel-wired-lan] Increased memory usage on NUMA nodes with ICE driver after upgrade to 6.13.y (regression in commit 492a044508ad)
2025-06-25 20:25 ` Jakub Kicinski
@ 2025-06-26 7:42 ` Jaroslav Pulchart
2025-06-30 7:35 ` Jaroslav Pulchart
0 siblings, 1 reply; 59+ messages in thread
From: Jaroslav Pulchart @ 2025-06-26 7:42 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Przemek Kitszel, intel-wired-lan@lists.osuosl.org,
Keller, Jacob E, Damato, Joe, netdev@vger.kernel.org,
Nguyen, Anthony L, Michal Swiatkowski, Czapnik, Lukasz,
Dumazet, Eric, Zaki, Ahmed, Martin Karsten, Igor Raits,
Daniel Secik, Zdenek Pesek
>
> On Wed, 25 Jun 2025 19:51:08 +0200 Jaroslav Pulchart wrote:
> > Great, please send me a link to the related patch set. I can apply them in
> > our kernel build and try them ASAP!
>
> Sorry if I'm repeating the question - have you tried
> CONFIG_MEM_ALLOC_PROFILING? Reportedly the overhead in recent kernels
> is low enough to use it for production workloads.
I tried it now, on a freshly booted server:
# sort -g /proc/allocinfo| tail -n 15
45409728 236509 fs/dcache.c:1681 func:__d_alloc
71041024 17344 mm/percpu-vm.c:95 func:pcpu_alloc_pages
71524352 11140 kernel/dma/direct.c:141 func:__dma_direct_alloc_pages
85098496 4486 mm/slub.c:2452 func:alloc_slab_page
115470992 101647 fs/ext4/super.c:1388 [ext4] func:ext4_alloc_inode
134479872 32832 kernel/events/ring_buffer.c:811 func:perf_mmap_alloc_page
141426688 34528 mm/filemap.c:1978 func:__filemap_get_folio
191594496 46776 mm/memory.c:1056 func:folio_prealloc
360710144 172 mm/khugepaged.c:1084 func:alloc_charge_folio
444076032 33790 mm/slub.c:2450 func:alloc_slab_page
530579456 129536 mm/page_ext.c:271 func:alloc_page_ext
975175680 465 mm/huge_memory.c:1165 func:vma_alloc_anon_folio_pmd
1022427136 249616 mm/memory.c:1054 func:folio_prealloc
1105125376 139252 drivers/net/ethernet/intel/ice/ice_txrx.c:681 [ice] func:ice_alloc_mapped_page
1621598208 395848 mm/readahead.c:186 func:ractl_alloc_folio
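The same ranking can be scripted when the numbers need to be post-processed rather than just eyeballed. The following is a minimal sketch, assuming the three-column layout seen in the output above (bytes, call count, tag text); the names `AllocTag`, `parse_allocinfo` and `top_by_size` are made up for illustration and are not part of any kernel tooling.

```python
from typing import NamedTuple


class AllocTag(NamedTuple):
    size_bytes: int  # first column: bytes attributed to this call site
    calls: int       # second column: call count as reported by the kernel
    tag: str         # rest of the line: file:line, optional [module], func:<name>


def parse_allocinfo(text: str) -> list[AllocTag]:
    """Parse /proc/allocinfo-style text, skipping header or malformed lines."""
    records = []
    for line in text.splitlines():
        fields = line.split(None, 2)
        if len(fields) < 3 or not (fields[0].isdigit() and fields[1].isdigit()):
            continue
        records.append(AllocTag(int(fields[0]), int(fields[1]), fields[2].strip()))
    return records


def top_by_size(records: list[AllocTag], n: int = 15) -> list[AllocTag]:
    """Return the n largest call sites, biggest first."""
    return sorted(records, key=lambda r: r.size_bytes, reverse=True)[:n]


# Three lines copied from the output above, used as sample input.
SAMPLE = """\
1022427136 249616 mm/memory.c:1054 func:folio_prealloc
1105125376 139252 drivers/net/ethernet/intel/ice/ice_txrx.c:681 [ice] func:ice_alloc_mapped_page
1621598208 395848 mm/readahead.c:186 func:ractl_alloc_folio
"""

for rec in top_by_size(parse_allocinfo(SAMPLE), n=3):
    print(f"{rec.size_bytes / 2**20:10.1f} MiB  {rec.calls:8d}  {rec.tag}")
```

On a live system the input would come from reading /proc/allocinfo (root is typically needed) instead of `SAMPLE`.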
>
> > st 25. 6. 2025 v 16:03 odesílatel Przemek Kitszel <
> > przemyslaw.kitszel@intel.com> napsal:
> >
> > > On 6/25/25 14:17, Jaroslav Pulchart wrote:
> > > > Hello
> > > >
> > > > We are still facing the memory issue with Intel 810 NICs (even on latest
> > > > 6.15.y).
> > > >
> > > > Our current stabilization and solution is to move everything to a new
> > > > INTEL-FREE server and get rid of last Intel sights there (after Intel's
> > > > CPU vulnerabilities fuckups NICs are next step).
> > > >
> > > > Any help welcomed,
> > > > Jaroslav P.
> > > >
> > > >
> > >
> > > Thank you for urging us, I can understand the frustration.
> > >
> > > We have identified some (unrelated) memory leaks, will soon ship fixes.
> > > And, as there were no clear issue with any commit/version you have
> > > posted to be a culprit, there is a chance that our random findings could
> > > help. Anyway going to zero kmemleak reports is good in itself, that is
> > > a good start.
> > >
> > > Will ask my VAL too to increase efforts in this area too.
>
* Re: [Intel-wired-lan] Increased memory usage on NUMA nodes with ICE driver after upgrade to 6.13.y (regression in commit 492a044508ad)
2025-06-26 7:42 ` Jaroslav Pulchart
@ 2025-06-30 7:35 ` Jaroslav Pulchart
2025-06-30 16:02 ` Jacob Keller
0 siblings, 1 reply; 59+ messages in thread
From: Jaroslav Pulchart @ 2025-06-30 7:35 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Przemek Kitszel, intel-wired-lan@lists.osuosl.org,
Keller, Jacob E, Damato, Joe, netdev@vger.kernel.org,
Nguyen, Anthony L, Michal Swiatkowski, Czapnik, Lukasz,
Dumazet, Eric, Zaki, Ahmed, Martin Karsten, Igor Raits,
Daniel Secik, Zdenek Pesek
>
> >
> > On Wed, 25 Jun 2025 19:51:08 +0200 Jaroslav Pulchart wrote:
> > > Great, please send me a link to the related patch set. I can apply them in
> > > our kernel build and try them ASAP!
> >
> > Sorry if I'm repeating the question - have you tried
> > CONFIG_MEM_ALLOC_PROFILING? Reportedly the overhead in recent kernels
> > is low enough to use it for production workloads.
>
> I try it now, the fresh booted server:
>
> # sort -g /proc/allocinfo| tail -n 15
> 45409728 236509 fs/dcache.c:1681 func:__d_alloc
> 71041024 17344 mm/percpu-vm.c:95 func:pcpu_alloc_pages
> 71524352 11140 kernel/dma/direct.c:141 func:__dma_direct_alloc_pages
> 85098496 4486 mm/slub.c:2452 func:alloc_slab_page
> 115470992 101647 fs/ext4/super.c:1388 [ext4] func:ext4_alloc_inode
> 134479872 32832 kernel/events/ring_buffer.c:811 func:perf_mmap_alloc_page
> 141426688 34528 mm/filemap.c:1978 func:__filemap_get_folio
> 191594496 46776 mm/memory.c:1056 func:folio_prealloc
> 360710144 172 mm/khugepaged.c:1084 func:alloc_charge_folio
> 444076032 33790 mm/slub.c:2450 func:alloc_slab_page
> 530579456 129536 mm/page_ext.c:271 func:alloc_page_ext
> 975175680 465 mm/huge_memory.c:1165 func:vma_alloc_anon_folio_pmd
> 1022427136 249616 mm/memory.c:1054 func:folio_prealloc
> 1105125376 139252 drivers/net/ethernet/intel/ice/ice_txrx.c:681
> [ice] func:ice_alloc_mapped_page
> 1621598208 395848 mm/readahead.c:186 func:ractl_alloc_folio
>
The "drivers/net/ethernet/intel/ice/ice_txrx.c:681 [ice]
func:ice_alloc_mapped_page" allocation just keeps growing:
# uptime ; sort -g /proc/allocinfo| tail -n 15
09:33:58 up 4 days, 6 min, 1 user, load average: 6.65, 8.18, 9.81
# sort -g /proc/allocinfo| tail -n 15
85216896 443838 fs/dcache.c:1681 func:__d_alloc
106156032 25917 mm/shmem.c:1854 func:shmem_alloc_folio
116850096 102861 fs/ext4/super.c:1388 [ext4] func:ext4_alloc_inode
134479872 32832 kernel/events/ring_buffer.c:811 func:perf_mmap_alloc_page
143556608 6894 mm/slub.c:2452 func:alloc_slab_page
186793984 45604 mm/memory.c:1056 func:folio_prealloc
362807296 88576 mm/percpu-vm.c:95 func:pcpu_alloc_pages
530579456 129536 mm/page_ext.c:271 func:alloc_page_ext
598237184 51309 mm/slub.c:2450 func:alloc_slab_page
838860800 400 mm/huge_memory.c:1165 func:vma_alloc_anon_folio_pmd
929083392 226827 mm/filemap.c:1978 func:__filemap_get_folio
1034657792 252602 mm/memory.c:1054 func:folio_prealloc
1262485504 602 mm/khugepaged.c:1084 func:alloc_charge_folio
1335377920 325970 mm/readahead.c:186 func:ractl_alloc_folio
2544877568 315003 drivers/net/ethernet/intel/ice/ice_txrx.c:681 [ice] func:ice_alloc_mapped_page
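To put a number on "growing": between the fresh-boot snapshot and this one, the ice_alloc_mapped_page entry went from 1105125376 to 2544877568 bytes, an increase of about 1.34 GiB. The sketch below works out the rate. It assumes the first snapshot was taken right after boot, so that the reported uptime (4 days, 6 min) approximates the time between the two samples; the helper name `growth_per_day` is illustrative only.

```python
# Values copied from the two /proc/allocinfo snapshots for
# drivers/net/ethernet/intel/ice/ice_txrx.c:681 func:ice_alloc_mapped_page.
BOOT_BYTES, BOOT_CALLS = 1105125376, 139252
LATER_BYTES, LATER_CALLS = 2544877568, 315003

# Assumption: the first snapshot was taken right after boot, so the elapsed
# time is approximated by the uptime reported with the second snapshot.
ELAPSED_DAYS = 4 + 6 / (24 * 60)

MIB = 2**20


def growth_per_day(start: int, end: int, days: float) -> float:
    """Average growth per day between two samples."""
    return (end - start) / days


delta_bytes = LATER_BYTES - BOOT_BYTES
delta_calls = LATER_CALLS - BOOT_CALLS
rate_mib = growth_per_day(BOOT_BYTES, LATER_BYTES, ELAPSED_DAYS) / MIB

print(f"bytes grew by {delta_bytes / MIB:.0f} MiB ({LATER_BYTES / BOOT_BYTES:.2f}x)")
print(f"call count grew by {delta_calls}")
print(f"average rate: {rate_mib:.0f} MiB/day")
```

A linear rate is only a first approximation; two samples cannot show whether the growth tracks traffic, time, or specific events.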
>
> >
> > > st 25. 6. 2025 v 16:03 odesílatel Przemek Kitszel <
> > > przemyslaw.kitszel@intel.com> napsal:
> > >
> > > > On 6/25/25 14:17, Jaroslav Pulchart wrote:
> > > > > Hello
> > > > >
> > > > > We are still facing the memory issue with Intel 810 NICs (even on latest
> > > > > 6.15.y).
> > > > >
> > > > > Our current stabilization and solution is to move everything to a new
> > > > > INTEL-FREE server and get rid of last Intel sights there (after Intel's
> > > > > CPU vulnerabilities fuckups NICs are next step).
> > > > >
> > > > > Any help welcomed,
> > > > > Jaroslav P.
> > > > >
> > > > >
> > > >
> > > > Thank you for urging us, I can understand the frustration.
> > > >
> > > > We have identified some (unrelated) memory leaks, will soon ship fixes.
> > > > And, as there were no clear issue with any commit/version you have
> > > > posted to be a culprit, there is a chance that our random findings could
> > > > help. Anyway going to zero kmemleak reports is good in itself, that is
> > > > a good start.
> > > >
> > > > Will ask my VAL too to increase efforts in this area too.
> >
* Re: [Intel-wired-lan] Increased memory usage on NUMA nodes with ICE driver after upgrade to 6.13.y (regression in commit 492a044508ad)
2025-06-30 7:35 ` Jaroslav Pulchart
@ 2025-06-30 16:02 ` Jacob Keller
2025-06-30 17:24 ` Jaroslav Pulchart
0 siblings, 1 reply; 59+ messages in thread
From: Jacob Keller @ 2025-06-30 16:02 UTC (permalink / raw)
To: Jaroslav Pulchart, Jakub Kicinski
Cc: Przemek Kitszel, intel-wired-lan@lists.osuosl.org, Damato, Joe,
netdev@vger.kernel.org, Nguyen, Anthony L, Michal Swiatkowski,
Czapnik, Lukasz, Dumazet, Eric, Zaki, Ahmed, Martin Karsten,
Igor Raits, Daniel Secik, Zdenek Pesek
On 6/30/2025 12:35 AM, Jaroslav Pulchart wrote:
>>
>>>
>>> On Wed, 25 Jun 2025 19:51:08 +0200 Jaroslav Pulchart wrote:
>>>> Great, please send me a link to the related patch set. I can apply them in
>>>> our kernel build and try them ASAP!
>>>
>>> Sorry if I'm repeating the question - have you tried
>>> CONFIG_MEM_ALLOC_PROFILING? Reportedly the overhead in recent kernels
>>> is low enough to use it for production workloads.
>>
>> I try it now, the fresh booted server:
>>
>> # sort -g /proc/allocinfo| tail -n 15
>> 45409728 236509 fs/dcache.c:1681 func:__d_alloc
>> 71041024 17344 mm/percpu-vm.c:95 func:pcpu_alloc_pages
>> 71524352 11140 kernel/dma/direct.c:141 func:__dma_direct_alloc_pages
>> 85098496 4486 mm/slub.c:2452 func:alloc_slab_page
>> 115470992 101647 fs/ext4/super.c:1388 [ext4] func:ext4_alloc_inode
>> 134479872 32832 kernel/events/ring_buffer.c:811 func:perf_mmap_alloc_page
>> 141426688 34528 mm/filemap.c:1978 func:__filemap_get_folio
>> 191594496 46776 mm/memory.c:1056 func:folio_prealloc
>> 360710144 172 mm/khugepaged.c:1084 func:alloc_charge_folio
>> 444076032 33790 mm/slub.c:2450 func:alloc_slab_page
>> 530579456 129536 mm/page_ext.c:271 func:alloc_page_ext
>> 975175680 465 mm/huge_memory.c:1165 func:vma_alloc_anon_folio_pmd
>> 1022427136 249616 mm/memory.c:1054 func:folio_prealloc
>> 1105125376 139252 drivers/net/ethernet/intel/ice/ice_txrx.c:681
>> [ice] func:ice_alloc_mapped_page
>> 1621598208 395848 mm/readahead.c:186 func:ractl_alloc_folio
>>
>
> The "drivers/net/ethernet/intel/ice/ice_txrx.c:681 [ice]
> func:ice_alloc_mapped_page" is just growing...
>
> # uptime ; sort -g /proc/allocinfo| tail -n 15
> 09:33:58 up 4 days, 6 min, 1 user, load average: 6.65, 8.18, 9.81
>
> # sort -g /proc/allocinfo| tail -n 15
> 85216896 443838 fs/dcache.c:1681 func:__d_alloc
> 106156032 25917 mm/shmem.c:1854 func:shmem_alloc_folio
> 116850096 102861 fs/ext4/super.c:1388 [ext4] func:ext4_alloc_inode
> 134479872 32832 kernel/events/ring_buffer.c:811 func:perf_mmap_alloc_page
> 143556608 6894 mm/slub.c:2452 func:alloc_slab_page
> 186793984 45604 mm/memory.c:1056 func:folio_prealloc
> 362807296 88576 mm/percpu-vm.c:95 func:pcpu_alloc_pages
> 530579456 129536 mm/page_ext.c:271 func:alloc_page_ext
> 598237184 51309 mm/slub.c:2450 func:alloc_slab_page
> 838860800 400 mm/huge_memory.c:1165 func:vma_alloc_anon_folio_pmd
> 929083392 226827 mm/filemap.c:1978 func:__filemap_get_folio
> 1034657792 252602 mm/memory.c:1054 func:folio_prealloc
> 1262485504 602 mm/khugepaged.c:1084 func:alloc_charge_folio
> 1335377920 325970 mm/readahead.c:186 func:ractl_alloc_folio
> 2544877568 315003 drivers/net/ethernet/intel/ice/ice_txrx.c:681
> [ice] func:ice_alloc_mapped_page
>
ice_alloc_mapped_page is the function used to allocate the pages for the
Rx ring buffers.

There were a number of fixes for the hot path from Maciej which might be
related. Although those fixes were primarily for XDP, they do impact the
regular hot path as well.

These were fixes on top of work he did which landed in v6.13, so it
seems plausible they might be related. In particular, one of them mentions
a missing buffer put:

743bbd93cf29 ("ice: put Rx buffers after being done with current frame")

It says the following:

> While at it, address an error path of ice_add_xdp_frag() - we were
> missing buffer putting from day 1 there.

It seems to me the issue must be somehow related to the buffer cleanup
logic for the Rx ring, since that's the only thing allocated by
ice_alloc_mapped_page.

It might be something fixed by the work Maciej did, but it seems very
weird that 492a044508ad ("ice: Add support for persistent NAPI config")
would affect that logic at all.
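For readers unfamiliar with the "missing buffer put" pattern, here is a toy model of how an error path that skips the release step makes the outstanding-buffer count grow without bound. This is not the ice driver's code, and nothing in this thread establishes that this exact pattern is the cause here; `PagePool`, `process_leaky` and `process_fixed` are hypothetical names used only to illustrate the general bug class.

```python
class PagePool:
    """Toy pool that only counts pages handed out and not yet returned."""

    def __init__(self) -> None:
        self.outstanding = 0

    def get(self) -> object:
        self.outstanding += 1
        return object()

    def put(self, page: object) -> None:
        self.outstanding -= 1


def process_leaky(pool: PagePool, frame_ok: bool) -> bool:
    page = pool.get()
    if not frame_ok:
        return False  # error path returns without pool.put(page)
    pool.put(page)
    return True


def process_fixed(pool: PagePool, frame_ok: bool) -> bool:
    page = pool.get()
    try:
        return frame_ok
    finally:
        pool.put(page)  # runs on both the success and the error path


def run(process, frames) -> int:
    """Process all frames and return how many pages were never returned."""
    pool = PagePool()
    for ok in frames:
        process(pool, ok)
    return pool.outstanding


# Hypothetical workload: one failing frame out of every three.
frames = [True, True, False] * 1000
print("leaky:", run(process_leaky, frames))
print("fixed:", run(process_fixed, frames))
```

The point of the model is that such a leak scales with the number of times the error path is taken, which would be consistent with memory that grows with uptime rather than with configuration.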
* Re: [Intel-wired-lan] Increased memory usage on NUMA nodes with ICE driver after upgrade to 6.13.y (regression in commit 492a044508ad)
2025-06-30 16:02 ` Jacob Keller
@ 2025-06-30 17:24 ` Jaroslav Pulchart
2025-06-30 18:59 ` Jacob Keller
0 siblings, 1 reply; 59+ messages in thread
From: Jaroslav Pulchart @ 2025-06-30 17:24 UTC (permalink / raw)
To: Jacob Keller
Cc: Jakub Kicinski, Przemek Kitszel, intel-wired-lan@lists.osuosl.org,
Damato, Joe, netdev@vger.kernel.org, Nguyen, Anthony L,
Michal Swiatkowski, Czapnik, Lukasz, Dumazet, Eric, Zaki, Ahmed,
Martin Karsten, Igor Raits, Daniel Secik, Zdenek Pesek
>
>
>
> On 6/30/2025 12:35 AM, Jaroslav Pulchart wrote:
> >>
> >>>
> >>> On Wed, 25 Jun 2025 19:51:08 +0200 Jaroslav Pulchart wrote:
> >>>> Great, please send me a link to the related patch set. I can apply them in
> >>>> our kernel build and try them ASAP!
> >>>
> >>> Sorry if I'm repeating the question - have you tried
> >>> CONFIG_MEM_ALLOC_PROFILING? Reportedly the overhead in recent kernels
> >>> is low enough to use it for production workloads.
> >>
> >> I try it now, the fresh booted server:
> >>
> >> # sort -g /proc/allocinfo| tail -n 15
> >> 45409728 236509 fs/dcache.c:1681 func:__d_alloc
> >> 71041024 17344 mm/percpu-vm.c:95 func:pcpu_alloc_pages
> >> 71524352 11140 kernel/dma/direct.c:141 func:__dma_direct_alloc_pages
> >> 85098496 4486 mm/slub.c:2452 func:alloc_slab_page
> >> 115470992 101647 fs/ext4/super.c:1388 [ext4] func:ext4_alloc_inode
> >> 134479872 32832 kernel/events/ring_buffer.c:811 func:perf_mmap_alloc_page
> >> 141426688 34528 mm/filemap.c:1978 func:__filemap_get_folio
> >> 191594496 46776 mm/memory.c:1056 func:folio_prealloc
> >> 360710144 172 mm/khugepaged.c:1084 func:alloc_charge_folio
> >> 444076032 33790 mm/slub.c:2450 func:alloc_slab_page
> >> 530579456 129536 mm/page_ext.c:271 func:alloc_page_ext
> >> 975175680 465 mm/huge_memory.c:1165 func:vma_alloc_anon_folio_pmd
> >> 1022427136 249616 mm/memory.c:1054 func:folio_prealloc
> >> 1105125376 139252 drivers/net/ethernet/intel/ice/ice_txrx.c:681
> >> [ice] func:ice_alloc_mapped_page
> >> 1621598208 395848 mm/readahead.c:186 func:ractl_alloc_folio
> >>
> >
> > The "drivers/net/ethernet/intel/ice/ice_txrx.c:681 [ice]
> > func:ice_alloc_mapped_page" is just growing...
> >
> > # uptime ; sort -g /proc/allocinfo| tail -n 15
> > 09:33:58 up 4 days, 6 min, 1 user, load average: 6.65, 8.18, 9.81
> >
> > # sort -g /proc/allocinfo| tail -n 15
> > 85216896 443838 fs/dcache.c:1681 func:__d_alloc
> > 106156032 25917 mm/shmem.c:1854 func:shmem_alloc_folio
> > 116850096 102861 fs/ext4/super.c:1388 [ext4] func:ext4_alloc_inode
> > 134479872 32832 kernel/events/ring_buffer.c:811 func:perf_mmap_alloc_page
> > 143556608 6894 mm/slub.c:2452 func:alloc_slab_page
> > 186793984 45604 mm/memory.c:1056 func:folio_prealloc
> > 362807296 88576 mm/percpu-vm.c:95 func:pcpu_alloc_pages
> > 530579456 129536 mm/page_ext.c:271 func:alloc_page_ext
> > 598237184 51309 mm/slub.c:2450 func:alloc_slab_page
> > 838860800 400 mm/huge_memory.c:1165 func:vma_alloc_anon_folio_pmd
> > 929083392 226827 mm/filemap.c:1978 func:__filemap_get_folio
> > 1034657792 252602 mm/memory.c:1054 func:folio_prealloc
> > 1262485504 602 mm/khugepaged.c:1084 func:alloc_charge_folio
> > 1335377920 325970 mm/readahead.c:186 func:ractl_alloc_folio
> > 2544877568 315003 drivers/net/ethernet/intel/ice/ice_txrx.c:681
> > [ice] func:ice_alloc_mapped_page
> >
> ice_alloc_mapped_page is the function used to allocate the pages for the
> Rx ring buffers.
>
> There were a number of fixes for the hot path from Maciej which might be
> related. Although those fixes were primarily for XDP they do impact the
> regular hot path as well.
>
> These were fixes on top of work he did which landed in v6.13, so it
> seems plausible they might be related. In particular one which mentions
> a missing buffer put:
>
> 743bbd93cf29 ("ice: put Rx buffers after being done with current frame")
>
> It says the following:
> > While at it, address an error path of ice_add_xdp_frag() - we were
> > missing buffer putting from day 1 there.
> >
>
> It seems to me the issue must be somehow related to the buffer cleanup
> logic for the Rx ring, since thats the only thing allocated by
> ice_alloc_mapped_page.
>
> It might be something fixed with the work Maciej did.. but it seems very
> weird that 492a044508ad ("ice: Add support for persistent NAPI config")
> would affect that logic at all....
I believe there were/are at least two separate issues. Regarding
commit 492a044508ad (“ice: Add support for persistent NAPI config”):

* On 6.13.y and 6.14.y kernels, this change prevented us from lowering
  the driver’s initial, large memory allocation immediately after server
  power-up. A few hours (at most a few days) later, this inevitably led
  to an out-of-memory condition.
* Reverting the commit in those series only delayed the OOM. It
  allowed the number of queues (and thus the memory footprint) to shrink
  on boot, just as it did in 6.12.y, but didn’t eliminate the underlying
  'leak'.
* In 6.15.y, however, that revert isn’t required (and isn’t even
  applicable). The post-boot allocation can once again be tuned down
  without patching. Still, we observe the same increase in memory use
  over time, as shown in the /proc/allocinfo output.

Thus, commit 492a044508ad led us down a false trail, or at the very
least hastened the inevitable OOM.
* Re: [Intel-wired-lan] Increased memory usage on NUMA nodes with ICE driver after upgrade to 6.13.y (regression in commit 492a044508ad)
2025-06-30 17:24 ` Jaroslav Pulchart
@ 2025-06-30 18:59 ` Jacob Keller
2025-06-30 20:01 ` Jaroslav Pulchart
0 siblings, 1 reply; 59+ messages in thread
From: Jacob Keller @ 2025-06-30 18:59 UTC (permalink / raw)
To: Jaroslav Pulchart, Maciej Fijalkowski
Cc: Jakub Kicinski, Przemek Kitszel, intel-wired-lan@lists.osuosl.org,
Damato, Joe, netdev@vger.kernel.org, Nguyen, Anthony L,
Michal Swiatkowski, Czapnik, Lukasz, Dumazet, Eric, Zaki, Ahmed,
Martin Karsten, Igor Raits, Daniel Secik, Zdenek Pesek
On 6/30/2025 10:24 AM, Jaroslav Pulchart wrote:
>>
>>
>>
>> On 6/30/2025 12:35 AM, Jaroslav Pulchart wrote:
>>>>
>>>>>
>>>>> On Wed, 25 Jun 2025 19:51:08 +0200 Jaroslav Pulchart wrote:
>>>>>> Great, please send me a link to the related patch set. I can apply them in
>>>>>> our kernel build and try them ASAP!
>>>>>
>>>>> Sorry if I'm repeating the question - have you tried
>>>>> CONFIG_MEM_ALLOC_PROFILING? Reportedly the overhead in recent kernels
>>>>> is low enough to use it for production workloads.
>>>>
>>>> I try it now, the fresh booted server:
>>>>
>>>> # sort -g /proc/allocinfo| tail -n 15
>>>> 45409728 236509 fs/dcache.c:1681 func:__d_alloc
>>>> 71041024 17344 mm/percpu-vm.c:95 func:pcpu_alloc_pages
>>>> 71524352 11140 kernel/dma/direct.c:141 func:__dma_direct_alloc_pages
>>>> 85098496 4486 mm/slub.c:2452 func:alloc_slab_page
>>>> 115470992 101647 fs/ext4/super.c:1388 [ext4] func:ext4_alloc_inode
>>>> 134479872 32832 kernel/events/ring_buffer.c:811 func:perf_mmap_alloc_page
>>>> 141426688 34528 mm/filemap.c:1978 func:__filemap_get_folio
>>>> 191594496 46776 mm/memory.c:1056 func:folio_prealloc
>>>> 360710144 172 mm/khugepaged.c:1084 func:alloc_charge_folio
>>>> 444076032 33790 mm/slub.c:2450 func:alloc_slab_page
>>>> 530579456 129536 mm/page_ext.c:271 func:alloc_page_ext
>>>> 975175680 465 mm/huge_memory.c:1165 func:vma_alloc_anon_folio_pmd
>>>> 1022427136 249616 mm/memory.c:1054 func:folio_prealloc
>>>> 1105125376 139252 drivers/net/ethernet/intel/ice/ice_txrx.c:681
>>>> [ice] func:ice_alloc_mapped_page
>>>> 1621598208 395848 mm/readahead.c:186 func:ractl_alloc_folio
>>>>
>>>
>>> The "drivers/net/ethernet/intel/ice/ice_txrx.c:681 [ice]
>>> func:ice_alloc_mapped_page" is just growing...
>>>
>>> # uptime ; sort -g /proc/allocinfo| tail -n 15
>>> 09:33:58 up 4 days, 6 min, 1 user, load average: 6.65, 8.18, 9.81
>>>
>>> # sort -g /proc/allocinfo| tail -n 15
>>> 85216896 443838 fs/dcache.c:1681 func:__d_alloc
>>> 106156032 25917 mm/shmem.c:1854 func:shmem_alloc_folio
>>> 116850096 102861 fs/ext4/super.c:1388 [ext4] func:ext4_alloc_inode
>>> 134479872 32832 kernel/events/ring_buffer.c:811 func:perf_mmap_alloc_page
>>> 143556608 6894 mm/slub.c:2452 func:alloc_slab_page
>>> 186793984 45604 mm/memory.c:1056 func:folio_prealloc
>>> 362807296 88576 mm/percpu-vm.c:95 func:pcpu_alloc_pages
>>> 530579456 129536 mm/page_ext.c:271 func:alloc_page_ext
>>> 598237184 51309 mm/slub.c:2450 func:alloc_slab_page
>>> 838860800 400 mm/huge_memory.c:1165 func:vma_alloc_anon_folio_pmd
>>> 929083392 226827 mm/filemap.c:1978 func:__filemap_get_folio
>>> 1034657792 252602 mm/memory.c:1054 func:folio_prealloc
>>> 1262485504 602 mm/khugepaged.c:1084 func:alloc_charge_folio
>>> 1335377920 325970 mm/readahead.c:186 func:ractl_alloc_folio
>>> 2544877568 315003 drivers/net/ethernet/intel/ice/ice_txrx.c:681
>>> [ice] func:ice_alloc_mapped_page
>>>
>> ice_alloc_mapped_page is the function used to allocate the pages for the
>> Rx ring buffers.
>>
>> There were a number of fixes for the hot path from Maciej which might be
>> related. Although those fixes were primarily for XDP they do impact the
>> regular hot path as well.
>>
>> These were fixes on top of work he did which landed in v6.13, so it
>> seems plausible they might be related. In particular one which mentions
>> a missing buffer put:
>>
>> 743bbd93cf29 ("ice: put Rx buffers after being done with current frame")
>>
>> It says the following:
>>> While at it, address an error path of ice_add_xdp_frag() - we were
>>> missing buffer putting from day 1 there.
>>>
>>
>> It seems to me the issue must be somehow related to the buffer cleanup
>> logic for the Rx ring, since thats the only thing allocated by
>> ice_alloc_mapped_page.
>>
>> It might be something fixed with the work Maciej did.. but it seems very
>> weird that 492a044508ad ("ice: Add support for persistent NAPI config")
>> would affect that logic at all....
>
> I believe there were/are at least two separate issues. Regarding
> commit 492a044508ad (“ice: Add support for persistent NAPI config”):
> * On 6.13.y and 6.14.y kernels, this change prevented us from lowering
> the driver’s initial, large memory allocation immediately after server
> power-up. A few hours (max few days) later, this inevitably led to an
> out-of-memory condition.
> * Reverting the commit in those series only delayed the OOM, it
> allowed the queue size (and thus memory footprint) to shrink on boot
> just as it did in 6.12.y but didn’t eliminate the underlying 'leak'.
> * In 6.15.y, however, that revert isn’t required (and isn’t even
> applicable). The after boot allocation can once again be tuned down
> without patching. Still, we observe the same increase in memory use
> over time, as shown in the 'allocmap' output.
> Thus, commit 492a044508ad led us down a false trail, or at the very
> least hastened the inevitable OOM.
That seems reasonable. I'm still surprised the specific commit leads to
any large increase in memory, since it should only be a few bytes per
NAPI. But there may be some related driver-specific issues.

Either way, we clearly need to isolate how we're leaking memory in the
hot path. I think it might be related to the fixes from Maciej, which are
pretty recent and so might not be in 6.13 or 6.14.
* Re: [Intel-wired-lan] Increased memory usage on NUMA nodes with ICE driver after upgrade to 6.13.y (regression in commit 492a044508ad)
2025-06-30 18:59 ` Jacob Keller
@ 2025-06-30 20:01 ` Jaroslav Pulchart
2025-06-30 20:42 ` Jacob Keller
2025-06-30 21:56 ` Jacob Keller
0 siblings, 2 replies; 59+ messages in thread
From: Jaroslav Pulchart @ 2025-06-30 20:01 UTC (permalink / raw)
To: Jacob Keller
Cc: Maciej Fijalkowski, Jakub Kicinski, Przemek Kitszel,
intel-wired-lan@lists.osuosl.org, Damato, Joe,
netdev@vger.kernel.org, Nguyen, Anthony L, Michal Swiatkowski,
Czapnik, Lukasz, Dumazet, Eric, Zaki, Ahmed, Martin Karsten,
Igor Raits, Daniel Secik, Zdenek Pesek
>
>
>
> On 6/30/2025 10:24 AM, Jaroslav Pulchart wrote:
> >>
> >>
> >>
> >> On 6/30/2025 12:35 AM, Jaroslav Pulchart wrote:
> >>>>
> >>>>>
> >>>>> On Wed, 25 Jun 2025 19:51:08 +0200 Jaroslav Pulchart wrote:
> >>>>>> Great, please send me a link to the related patch set. I can apply them in
> >>>>>> our kernel build and try them ASAP!
> >>>>>
> >>>>> Sorry if I'm repeating the question - have you tried
> >>>>> CONFIG_MEM_ALLOC_PROFILING? Reportedly the overhead in recent kernels
> >>>>> is low enough to use it for production workloads.
> >>>>
> >>>> I try it now, the fresh booted server:
> >>>>
> >>>> # sort -g /proc/allocinfo| tail -n 15
> >>>> 45409728 236509 fs/dcache.c:1681 func:__d_alloc
> >>>> 71041024 17344 mm/percpu-vm.c:95 func:pcpu_alloc_pages
> >>>> 71524352 11140 kernel/dma/direct.c:141 func:__dma_direct_alloc_pages
> >>>> 85098496 4486 mm/slub.c:2452 func:alloc_slab_page
> >>>> 115470992 101647 fs/ext4/super.c:1388 [ext4] func:ext4_alloc_inode
> >>>> 134479872 32832 kernel/events/ring_buffer.c:811 func:perf_mmap_alloc_page
> >>>> 141426688 34528 mm/filemap.c:1978 func:__filemap_get_folio
> >>>> 191594496 46776 mm/memory.c:1056 func:folio_prealloc
> >>>> 360710144 172 mm/khugepaged.c:1084 func:alloc_charge_folio
> >>>> 444076032 33790 mm/slub.c:2450 func:alloc_slab_page
> >>>> 530579456 129536 mm/page_ext.c:271 func:alloc_page_ext
> >>>> 975175680 465 mm/huge_memory.c:1165 func:vma_alloc_anon_folio_pmd
> >>>> 1022427136 249616 mm/memory.c:1054 func:folio_prealloc
> >>>> 1105125376 139252 drivers/net/ethernet/intel/ice/ice_txrx.c:681
> >>>> [ice] func:ice_alloc_mapped_page
> >>>> 1621598208 395848 mm/readahead.c:186 func:ractl_alloc_folio
> >>>>
> >>>
> >>> The "drivers/net/ethernet/intel/ice/ice_txrx.c:681 [ice]
> >>> func:ice_alloc_mapped_page" is just growing...
> >>>
> >>> # uptime ; sort -g /proc/allocinfo| tail -n 15
> >>> 09:33:58 up 4 days, 6 min, 1 user, load average: 6.65, 8.18, 9.81
> >>>
> >>> # sort -g /proc/allocinfo| tail -n 15
> >>> 85216896 443838 fs/dcache.c:1681 func:__d_alloc
> >>> 106156032 25917 mm/shmem.c:1854 func:shmem_alloc_folio
> >>> 116850096 102861 fs/ext4/super.c:1388 [ext4] func:ext4_alloc_inode
> >>> 134479872 32832 kernel/events/ring_buffer.c:811 func:perf_mmap_alloc_page
> >>> 143556608 6894 mm/slub.c:2452 func:alloc_slab_page
> >>> 186793984 45604 mm/memory.c:1056 func:folio_prealloc
> >>> 362807296 88576 mm/percpu-vm.c:95 func:pcpu_alloc_pages
> >>> 530579456 129536 mm/page_ext.c:271 func:alloc_page_ext
> >>> 598237184 51309 mm/slub.c:2450 func:alloc_slab_page
> >>> 838860800 400 mm/huge_memory.c:1165 func:vma_alloc_anon_folio_pmd
> >>> 929083392 226827 mm/filemap.c:1978 func:__filemap_get_folio
> >>> 1034657792 252602 mm/memory.c:1054 func:folio_prealloc
> >>> 1262485504 602 mm/khugepaged.c:1084 func:alloc_charge_folio
> >>> 1335377920 325970 mm/readahead.c:186 func:ractl_alloc_folio
> >>> 2544877568 315003 drivers/net/ethernet/intel/ice/ice_txrx.c:681
> >>> [ice] func:ice_alloc_mapped_page
> >>>
> >> ice_alloc_mapped_page is the function used to allocate the pages for the
> >> Rx ring buffers.
> >>
> >> There were a number of fixes for the hot path from Maciej which might be
> >> related. Although those fixes were primarily for XDP they do impact the
> >> regular hot path as well.
> >>
> >> These were fixes on top of work he did which landed in v6.13, so it
> >> seems plausible they might be related. In particular one which mentions
> >> a missing buffer put:
> >>
> >> 743bbd93cf29 ("ice: put Rx buffers after being done with current frame")
> >>
> >> It says the following:
> >>> While at it, address an error path of ice_add_xdp_frag() - we were
> >>> missing buffer putting from day 1 there.
> >>>
> >>
> >> It seems to me the issue must be somehow related to the buffer cleanup
> >> logic for the Rx ring, since thats the only thing allocated by
> >> ice_alloc_mapped_page.
> >>
> >> It might be something fixed with the work Maciej did.. but it seems very
> >> weird that 492a044508ad ("ice: Add support for persistent NAPI config")
> >> would affect that logic at all....
> >
> > I believe there were/are at least two separate issues. Regarding
> > commit 492a044508ad (“ice: Add support for persistent NAPI config”):
> > * On 6.13.y and 6.14.y kernels, this change prevented us from lowering
> > the driver’s initial, large memory allocation immediately after server
> > power-up. A few hours (max few days) later, this inevitably led to an
> > out-of-memory condition.
> > * Reverting the commit in those series only delayed the OOM, it
> > allowed the queue size (and thus memory footprint) to shrink on boot
> > just as it did in 6.12.y but didn’t eliminate the underlying 'leak'.
> > * In 6.15.y, however, that revert isn’t required (and isn’t even
> > applicable). The after boot allocation can once again be tuned down
> > without patching. Still, we observe the same increase in memory use
> > over time, as shown in the 'allocmap' output.
> > Thus, commit 492a044508ad led us down a false trail, or at the very
> > least hastened the inevitable OOM.
>
> That seems reasonable. I'm still surprised the specific commit leads to
> any large increase in memory, since it should only be a few bytes per
> NAPI. But there may be some related driver-specific issues.
Actually, the large base allocation has existed for quite some time.
The mentioned commit didn’t suddenly grow our memory usage; it only
prevented us from shrinking it via "ethtool -L <iface> combined
<small-number>" after boot. In other words, we’re still stuck with the
same big allocation; we just can’t tune it down (until we revert the
commit).
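As a rough illustration of why lowering the channel count shrinks the baseline, here is a back-of-the-envelope sketch. It assumes one page per Rx descriptor and 4 KiB pages, and the queue and ring sizes are hypothetical, not measured on our servers; `rx_ring_bytes` is an illustrative helper, not a driver function.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, one page backing each Rx descriptor


def rx_ring_bytes(queues: int, descriptors_per_ring: int,
                  page_size: int = PAGE_SIZE) -> int:
    """Estimate memory pinned by Rx buffers across all queues of one port."""
    return queues * descriptors_per_ring * page_size


MIB = 2**20

# Hypothetical values: 64 queues by default, reduced to 8 with
# "ethtool -L <iface> combined 8"; 2048 descriptors per ring in both cases.
before = rx_ring_bytes(queues=64, descriptors_per_ring=2048)
after = rx_ring_bytes(queues=8, descriptors_per_ring=2048)

print(f"before: {before // MIB} MiB per port")
print(f"after:  {after // MIB} MiB per port")
```

Under these assumptions the baseline scales linearly with the queue count, which is why being unable to reduce it after boot matters on hosts with many CPUs and several ports.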
>
> Either way, we clearly need to isolate how we're leaking memory in the
> hot path. I think it might be related to the fixes from Maciej which are
> pretty recent so might not be in 6.13 or 6.14
I’m fine with a fix for mainline only (now 6.15.y); 6.13.y and
6.14.y are already EOL. Could you please tell me which 6.15.y stable
release first incorporates that patch? Is it included in the current
6.15.5, or will it arrive in a later point release?
* Re: [Intel-wired-lan] Increased memory usage on NUMA nodes with ICE driver after upgrade to 6.13.y (regression in commit 492a044508ad)
2025-06-30 20:01 ` Jaroslav Pulchart
@ 2025-06-30 20:42 ` Jacob Keller
2025-06-30 21:56 ` Jacob Keller
1 sibling, 0 replies; 59+ messages in thread
From: Jacob Keller @ 2025-06-30 20:42 UTC (permalink / raw)
To: Jaroslav Pulchart
Cc: Maciej Fijalkowski, Jakub Kicinski, Przemek Kitszel,
intel-wired-lan@lists.osuosl.org, Damato, Joe,
netdev@vger.kernel.org, Nguyen, Anthony L, Michal Swiatkowski,
Czapnik, Lukasz, Dumazet, Eric, Zaki, Ahmed, Martin Karsten,
Igor Raits, Daniel Secik, Zdenek Pesek
On 6/30/2025 1:01 PM, Jaroslav Pulchart wrote:
>>
>>
>>
>> On 6/30/2025 10:24 AM, Jaroslav Pulchart wrote:
>>>>
>>>>
>>>>
>>>> On 6/30/2025 12:35 AM, Jaroslav Pulchart wrote:
>>>>>>
>>>>>>>
>>>>>>> On Wed, 25 Jun 2025 19:51:08 +0200 Jaroslav Pulchart wrote:
>>>>>>>> Great, please send me a link to the related patch set. I can apply them in
>>>>>>>> our kernel build and try them ASAP!
>>>>>>>
>>>>>>> Sorry if I'm repeating the question - have you tried
>>>>>>> CONFIG_MEM_ALLOC_PROFILING? Reportedly the overhead in recent kernels
>>>>>>> is low enough to use it for production workloads.
>>>>>>
>>>>>> I try it now, the fresh booted server:
>>>>>>
>>>>>> # sort -g /proc/allocinfo| tail -n 15
>>>>>> 45409728 236509 fs/dcache.c:1681 func:__d_alloc
>>>>>> 71041024 17344 mm/percpu-vm.c:95 func:pcpu_alloc_pages
>>>>>> 71524352 11140 kernel/dma/direct.c:141 func:__dma_direct_alloc_pages
>>>>>> 85098496 4486 mm/slub.c:2452 func:alloc_slab_page
>>>>>> 115470992 101647 fs/ext4/super.c:1388 [ext4] func:ext4_alloc_inode
>>>>>> 134479872 32832 kernel/events/ring_buffer.c:811 func:perf_mmap_alloc_page
>>>>>> 141426688 34528 mm/filemap.c:1978 func:__filemap_get_folio
>>>>>> 191594496 46776 mm/memory.c:1056 func:folio_prealloc
>>>>>> 360710144 172 mm/khugepaged.c:1084 func:alloc_charge_folio
>>>>>> 444076032 33790 mm/slub.c:2450 func:alloc_slab_page
>>>>>> 530579456 129536 mm/page_ext.c:271 func:alloc_page_ext
>>>>>> 975175680 465 mm/huge_memory.c:1165 func:vma_alloc_anon_folio_pmd
>>>>>> 1022427136 249616 mm/memory.c:1054 func:folio_prealloc
>>>>>> 1105125376 139252 drivers/net/ethernet/intel/ice/ice_txrx.c:681
>>>>>> [ice] func:ice_alloc_mapped_page
>>>>>> 1621598208 395848 mm/readahead.c:186 func:ractl_alloc_folio
>>>>>>
>>>>>
>>>>> The "drivers/net/ethernet/intel/ice/ice_txrx.c:681 [ice]
>>>>> func:ice_alloc_mapped_page" is just growing...
>>>>>
>>>>> # uptime ; sort -g /proc/allocinfo| tail -n 15
>>>>> 09:33:58 up 4 days, 6 min, 1 user, load average: 6.65, 8.18, 9.81
>>>>>
>>>>> # sort -g /proc/allocinfo| tail -n 15
>>>>> 85216896 443838 fs/dcache.c:1681 func:__d_alloc
>>>>> 106156032 25917 mm/shmem.c:1854 func:shmem_alloc_folio
>>>>> 116850096 102861 fs/ext4/super.c:1388 [ext4] func:ext4_alloc_inode
>>>>> 134479872 32832 kernel/events/ring_buffer.c:811 func:perf_mmap_alloc_page
>>>>> 143556608 6894 mm/slub.c:2452 func:alloc_slab_page
>>>>> 186793984 45604 mm/memory.c:1056 func:folio_prealloc
>>>>> 362807296 88576 mm/percpu-vm.c:95 func:pcpu_alloc_pages
>>>>> 530579456 129536 mm/page_ext.c:271 func:alloc_page_ext
>>>>> 598237184 51309 mm/slub.c:2450 func:alloc_slab_page
>>>>> 838860800 400 mm/huge_memory.c:1165 func:vma_alloc_anon_folio_pmd
>>>>> 929083392 226827 mm/filemap.c:1978 func:__filemap_get_folio
>>>>> 1034657792 252602 mm/memory.c:1054 func:folio_prealloc
>>>>> 1262485504 602 mm/khugepaged.c:1084 func:alloc_charge_folio
>>>>> 1335377920 325970 mm/readahead.c:186 func:ractl_alloc_folio
>>>>> 2544877568 315003 drivers/net/ethernet/intel/ice/ice_txrx.c:681 [ice] func:ice_alloc_mapped_page
>>>>>
>>>> ice_alloc_mapped_page is the function used to allocate the pages for the
>>>> Rx ring buffers.
>>>>
>>>> There were a number of fixes for the hot path from Maciej which might be
>>>> related. Although those fixes were primarily for XDP they do impact the
>>>> regular hot path as well.
>>>>
>>>> These were fixes on top of work he did which landed in v6.13, so it
>>>> seems plausible they might be related. In particular one which mentions
>>>> a missing buffer put:
>>>>
>>>> 743bbd93cf29 ("ice: put Rx buffers after being done with current frame")
>>>>
>>>> It says the following:
>>>>> While at it, address an error path of ice_add_xdp_frag() - we were
>>>>> missing buffer putting from day 1 there.
>>>>>
>>>>
>>>> It seems to me the issue must be somehow related to the buffer cleanup
>>>> logic for the Rx ring, since that's the only thing allocated by
>>>> ice_alloc_mapped_page.
>>>>
>>>> It might be something fixed with the work Maciej did, but it seems very
>>>> weird that 492a044508ad ("ice: Add support for persistent NAPI config")
>>>> would affect that logic at all.
>>>
>>> I believe there were/are at least two separate issues. Regarding
>>> commit 492a044508ad (“ice: Add support for persistent NAPI config”):
>>> * On 6.13.y and 6.14.y kernels, this change prevented us from lowering
>>> the driver’s initial, large memory allocation immediately after server
>>> power-up. A few hours (max few days) later, this inevitably led to an
>>> out-of-memory condition.
>>> * Reverting the commit in those series only delayed the OOM, it
>>> allowed the queue size (and thus memory footprint) to shrink on boot
>>> just as it did in 6.12.y but didn’t eliminate the underlying 'leak'.
>>> * In 6.15.y, however, that revert isn’t required (and isn’t even
>>> applicable). The after boot allocation can once again be tuned down
>>> without patching. Still, we observe the same increase in memory use
>>> over time, as shown in the 'allocinfo' output.
>>> Thus, commit 492a044508ad led us down a false trail, or at the very
>>> least hastened the inevitable OOM.
>>
>> That seems reasonable. I'm still surprised the specific commit leads to
>> any large increase in memory, since it should only be a few bytes per
>> NAPI. But there may be some related driver-specific issues.
>
> Actually, the large base allocation has existed for quite some time,
> the mentioned commit didn’t suddenly grow our memory usage, it only
> prevented us from shrinking it via "ethtool -L <iface> combined
> <small-number>"
> after boot. In other words, we’re still stuck with the same big
> allocation, we just can’t tune it down (till reverting the commit)
>
Yes. My point is that I still don't understand the mechanism by which
that change *prevents* ethtool -L from working as you describe.
>>
>> Either way, we clearly need to isolate how we're leaking memory in the
>> hot path. I think it might be related to the fixes from Maciej which are
>> pretty recent so might not be in 6.13 or 6.14
>
> I’m fine with the fix for the mainline (now 6.15.y), the 6.13.y and
> 6.14.y are already EOL. Could you please tell me which 6.15.y stable
> release first incorporates that patch? Is it included in current
> 6.15.5, or will it arrive in a later point release?
I'm not certain that this fix actually resolves your issue, but I will
figure out which stable kernels have it shortly.
^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [Intel-wired-lan] Increased memory usage on NUMA nodes with ICE driver after upgrade to 6.13.y (regression in commit 492a044508ad)
2025-06-30 20:01 ` Jaroslav Pulchart
2025-06-30 20:42 ` Jacob Keller
@ 2025-06-30 21:56 ` Jacob Keller
2025-06-30 23:16 ` Jacob Keller
1 sibling, 1 reply; 59+ messages in thread
From: Jacob Keller @ 2025-06-30 21:56 UTC (permalink / raw)
To: Jaroslav Pulchart
Cc: Maciej Fijalkowski, Jakub Kicinski, Przemek Kitszel,
intel-wired-lan@lists.osuosl.org, Damato, Joe,
netdev@vger.kernel.org, Nguyen, Anthony L, Michal Swiatkowski,
Czapnik, Lukasz, Dumazet, Eric, Zaki, Ahmed, Martin Karsten,
Igor Raits, Daniel Secik, Zdenek Pesek
On 6/30/2025 1:01 PM, Jaroslav Pulchart wrote:
>>
>>
>>
>> On 6/30/2025 10:24 AM, Jaroslav Pulchart wrote:
>>>>
>>>>
>>>>
>>>> On 6/30/2025 12:35 AM, Jaroslav Pulchart wrote:
>>>>>>
>>>>>>>
>>>>>>> On Wed, 25 Jun 2025 19:51:08 +0200 Jaroslav Pulchart wrote:
>>>>>>>> Great, please send me a link to the related patch set. I can apply them in
>>>>>>>> our kernel build and try them ASAP!
>>>>>>>
>>>>>>> Sorry if I'm repeating the question - have you tried
>>>>>>> CONFIG_MEM_ALLOC_PROFILING? Reportedly the overhead in recent kernels
>>>>>>> is low enough to use it for production workloads.
>>>>>>
>>>>>> I tried it now, on a freshly booted server:
>>>>>>
>>>>>> # sort -g /proc/allocinfo| tail -n 15
>>>>>> 45409728 236509 fs/dcache.c:1681 func:__d_alloc
>>>>>> 71041024 17344 mm/percpu-vm.c:95 func:pcpu_alloc_pages
>>>>>> 71524352 11140 kernel/dma/direct.c:141 func:__dma_direct_alloc_pages
>>>>>> 85098496 4486 mm/slub.c:2452 func:alloc_slab_page
>>>>>> 115470992 101647 fs/ext4/super.c:1388 [ext4] func:ext4_alloc_inode
>>>>>> 134479872 32832 kernel/events/ring_buffer.c:811 func:perf_mmap_alloc_page
>>>>>> 141426688 34528 mm/filemap.c:1978 func:__filemap_get_folio
>>>>>> 191594496 46776 mm/memory.c:1056 func:folio_prealloc
>>>>>> 360710144 172 mm/khugepaged.c:1084 func:alloc_charge_folio
>>>>>> 444076032 33790 mm/slub.c:2450 func:alloc_slab_page
>>>>>> 530579456 129536 mm/page_ext.c:271 func:alloc_page_ext
>>>>>> 975175680 465 mm/huge_memory.c:1165 func:vma_alloc_anon_folio_pmd
>>>>>> 1022427136 249616 mm/memory.c:1054 func:folio_prealloc
>>>>>> 1105125376 139252 drivers/net/ethernet/intel/ice/ice_txrx.c:681 [ice] func:ice_alloc_mapped_page
>>>>>> 1621598208 395848 mm/readahead.c:186 func:ractl_alloc_folio
>>>>>>
>>>>>
>>>>> The "drivers/net/ethernet/intel/ice/ice_txrx.c:681 [ice]
>>>>> func:ice_alloc_mapped_page" is just growing...
>>>>>
>>>>> # uptime ; sort -g /proc/allocinfo| tail -n 15
>>>>> 09:33:58 up 4 days, 6 min, 1 user, load average: 6.65, 8.18, 9.81
>>>>>
>>>>> # sort -g /proc/allocinfo| tail -n 15
>>>>> 85216896 443838 fs/dcache.c:1681 func:__d_alloc
>>>>> 106156032 25917 mm/shmem.c:1854 func:shmem_alloc_folio
>>>>> 116850096 102861 fs/ext4/super.c:1388 [ext4] func:ext4_alloc_inode
>>>>> 134479872 32832 kernel/events/ring_buffer.c:811 func:perf_mmap_alloc_page
>>>>> 143556608 6894 mm/slub.c:2452 func:alloc_slab_page
>>>>> 186793984 45604 mm/memory.c:1056 func:folio_prealloc
>>>>> 362807296 88576 mm/percpu-vm.c:95 func:pcpu_alloc_pages
>>>>> 530579456 129536 mm/page_ext.c:271 func:alloc_page_ext
>>>>> 598237184 51309 mm/slub.c:2450 func:alloc_slab_page
>>>>> 838860800 400 mm/huge_memory.c:1165 func:vma_alloc_anon_folio_pmd
>>>>> 929083392 226827 mm/filemap.c:1978 func:__filemap_get_folio
>>>>> 1034657792 252602 mm/memory.c:1054 func:folio_prealloc
>>>>> 1262485504 602 mm/khugepaged.c:1084 func:alloc_charge_folio
>>>>> 1335377920 325970 mm/readahead.c:186 func:ractl_alloc_folio
>>>>> 2544877568 315003 drivers/net/ethernet/intel/ice/ice_txrx.c:681 [ice] func:ice_alloc_mapped_page
>>>>>
>>>> ice_alloc_mapped_page is the function used to allocate the pages for the
>>>> Rx ring buffers.
>>>>
>>>> There were a number of fixes for the hot path from Maciej which might be
>>>> related. Although those fixes were primarily for XDP they do impact the
>>>> regular hot path as well.
>>>>
>>>> These were fixes on top of work he did which landed in v6.13, so it
>>>> seems plausible they might be related. In particular one which mentions
>>>> a missing buffer put:
>>>>
>>>> 743bbd93cf29 ("ice: put Rx buffers after being done with current frame")
>>>>
>>>> It says the following:
>>>>> While at it, address an error path of ice_add_xdp_frag() - we were
>>>>> missing buffer putting from day 1 there.
>>>>>
>>>>
>>>> It seems to me the issue must be somehow related to the buffer cleanup
>>>> logic for the Rx ring, since that's the only thing allocated by
>>>> ice_alloc_mapped_page.
>>>>
>>>> It might be something fixed with the work Maciej did, but it seems very
>>>> weird that 492a044508ad ("ice: Add support for persistent NAPI config")
>>>> would affect that logic at all.
>>>
>>> I believe there were/are at least two separate issues. Regarding
>>> commit 492a044508ad (“ice: Add support for persistent NAPI config”):
>>> * On 6.13.y and 6.14.y kernels, this change prevented us from lowering
>>> the driver’s initial, large memory allocation immediately after server
>>> power-up. A few hours (max few days) later, this inevitably led to an
>>> out-of-memory condition.
>>> * Reverting the commit in those series only delayed the OOM, it
>>> allowed the queue size (and thus memory footprint) to shrink on boot
>>> just as it did in 6.12.y but didn’t eliminate the underlying 'leak'.
>>> * In 6.15.y, however, that revert isn’t required (and isn’t even
>>> applicable). The after boot allocation can once again be tuned down
>>> without patching. Still, we observe the same increase in memory use
>>> over time, as shown in the 'allocinfo' output.
>>> Thus, commit 492a044508ad led us down a false trail, or at the very
>>> least hastened the inevitable OOM.
>>
>> That seems reasonable. I'm still surprised the specific commit leads to
>> any large increase in memory, since it should only be a few bytes per
>> NAPI. But there may be some related driver-specific issues.
>
> Actually, the large base allocation has existed for quite some time,
> the mentioned commit didn’t suddenly grow our memory usage, it only
> prevented us from shrinking it via "ethtool -L <iface> combined
> <small-number>"
> after boot. In other words, we’re still stuck with the same big
> allocation, we just can’t tune it down (till reverting the commit)
>
>>
>> Either way, we clearly need to isolate how we're leaking memory in the
>> hot path. I think it might be related to the fixes from Maciej which are
>> pretty recent so might not be in 6.13 or 6.14
>
> I’m fine with the fix for the mainline (now 6.15.y), the 6.13.y and
> 6.14.y are already EOL. Could you please tell me which 6.15.y stable
> release first incorporates that patch? Is it included in current
> 6.15.5, or will it arrive in a later point release?
Unfortunately, it looks like the fix I mentioned already landed in 6.14, so
it's not a fix for your issue (since you mentioned 6.14 has failed
testing on your system).
$ git describe --first-parent --contains --match=v* --exclude=*rc*
743bbd93cf29f653fae0e1416a31f03231689911
v6.14~251^2~15^2~2
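The describe output above names the first tagged release that contains the commit. As a rough illustration only, here is a small Python sketch of reading that string programmatically. The helper names containing_tag and version_tuple are hypothetical, and the comparison assumes that a stable series contains everything in the mainline tag it branched from.

```python
import re


def containing_tag(describe_output):
    """Extract the tag from `git describe --contains` output,
    e.g. 'v6.14~251^2~15^2~2' -> 'v6.14'."""
    # Everything before the first '~' or '^' is the tag name.
    match = re.match(r"[^~^]+", describe_output.strip())
    if match is None:
        raise ValueError(f"unexpected describe output: {describe_output!r}")
    return match.group(0)


def version_tuple(tag):
    """Turn 'v6.14' or 'v6.15.5' into a comparable tuple of ints."""
    return tuple(int(part) for part in tag.lstrip("v").split("."))


# Output copied from the command above.
tag = containing_tag("v6.14~251^2~15^2~2")

# Assumption: anything tagged at or before v6.15.5 on mainline is present there.
fix_in_6_15_5 = version_tuple(tag) <= version_tuple("v6.15.5")
print(tag, fix_in_6_15_5)
```

Since the tag is v6.14, a 6.15.y kernel already carries the change, which is consistent with the conclusion that it does not explain the growth seen on 6.15.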
I don't see any other relevant changes since v6.14. I can try to see if
I see similar issues with CONFIG_MEM_ALLOC_PROFILING on some test
systems here.
^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [Intel-wired-lan] Increased memory usage on NUMA nodes with ICE driver after upgrade to 6.13.y (regression in commit 492a044508ad)
2025-06-30 21:56 ` Jacob Keller
@ 2025-06-30 23:16 ` Jacob Keller
2025-07-01 6:48 ` Jaroslav Pulchart
0 siblings, 1 reply; 59+ messages in thread
From: Jacob Keller @ 2025-06-30 23:16 UTC (permalink / raw)
To: Jaroslav Pulchart
Cc: Maciej Fijalkowski, Jakub Kicinski, Przemek Kitszel,
intel-wired-lan@lists.osuosl.org, Damato, Joe,
netdev@vger.kernel.org, Nguyen, Anthony L, Michal Swiatkowski,
Czapnik, Lukasz, Dumazet, Eric, Zaki, Ahmed, Martin Karsten,
Igor Raits, Daniel Secik, Zdenek Pesek
On 6/30/2025 2:56 PM, Jacob Keller wrote:
> Unfortunately it looks like the fix I mentioned has landed in 6.14, so
> it's not a fix for your issue (since you mentioned 6.14 has failed
> testing in your system)
>
> $ git describe --first-parent --contains --match=v* --exclude=*rc*
> 743bbd93cf29f653fae0e1416a31f03231689911
> v6.14~251^2~15^2~2
>
> I don't see any other relevant changes since v6.14. I can try to see if
> I see similar issues with CONFIG_MEM_ALLOC_PROFILING on some test
> systems here.
On my system I see this at boot after loading the ice module from
$ grep -F "/ice/" /proc/allocinfo | sort -g | tail | numfmt --to=iec
  26K    230 drivers/net/ethernet/intel/ice/ice_irq.c:84 [ice] func:ice_get_irq_res
  48K      2 drivers/net/ethernet/intel/ice/ice_arfs.c:565 [ice] func:ice_init_arfs
  57K    226 drivers/net/ethernet/intel/ice/ice_lib.c:397 [ice] func:ice_vsi_alloc_ring_stats
  57K    226 drivers/net/ethernet/intel/ice/ice_lib.c:416 [ice] func:ice_vsi_alloc_ring_stats
  85K    226 drivers/net/ethernet/intel/ice/ice_lib.c:1398 [ice] func:ice_vsi_alloc_rings
 339K    226 drivers/net/ethernet/intel/ice/ice_lib.c:1422 [ice] func:ice_vsi_alloc_rings
 678K    226 drivers/net/ethernet/intel/ice/ice_base.c:109 [ice] func:ice_vsi_alloc_q_vector
 1.1M    257 drivers/net/ethernet/intel/ice/ice_fwlog.c:40 [ice] func:ice_fwlog_alloc_ring_buffs
 7.2M    114 drivers/net/ethernet/intel/ice/ice_txrx.c:493 [ice] func:ice_setup_rx_ring
 896M 229264 drivers/net/ethernet/intel/ice/ice_txrx.c:680 [ice] func:ice_alloc_mapped_page
It's about 1 GB for the mapped pages. I don't see any increase moment to
moment. I've started an iperf session to simulate some traffic, and I'll
leave this running to see if anything changes overnight.
Is there anything else that you can share about the traffic setup or
otherwise that I could look into? Your system seems to use ~2.5x the
buffer memory that mine does, but that might just be because mine has a
smaller number of CPUs.
Hopefully I'll get some more results overnight.
Thanks,
Jake
^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [Intel-wired-lan] Increased memory usage on NUMA nodes with ICE driver after upgrade to 6.13.y (regression in commit 492a044508ad)
2025-06-30 23:16 ` Jacob Keller
@ 2025-07-01 6:48 ` Jaroslav Pulchart
2025-07-01 20:48 ` Jacob Keller
0 siblings, 1 reply; 59+ messages in thread
From: Jaroslav Pulchart @ 2025-07-01 6:48 UTC (permalink / raw)
To: Jacob Keller
Cc: Maciej Fijalkowski, Jakub Kicinski, Przemek Kitszel,
intel-wired-lan@lists.osuosl.org, Damato, Joe,
netdev@vger.kernel.org, Nguyen, Anthony L, Michal Swiatkowski,
Czapnik, Lukasz, Dumazet, Eric, Zaki, Ahmed, Martin Karsten,
Igor Raits, Daniel Secik, Zdenek Pesek
> On 6/30/2025 2:56 PM, Jacob Keller wrote:
> > Unfortunately it looks like the fix I mentioned has landed in 6.14, so
> > it's not a fix for your issue (since you mentioned 6.14 has failed
> > testing in your system)
> >
> > $ git describe --first-parent --contains --match=v* --exclude=*rc*
> > 743bbd93cf29f653fae0e1416a31f03231689911
> > v6.14~251^2~15^2~2
> >
> > I don't see any other relevant changes since v6.14. I can try to see if
> > I see similar issues with CONFIG_MEM_ALLOC_PROFILING on some test
> > systems here.
>
> On my system I see this at boot after loading the ice module from
>
> $ grep -F "/ice/" /proc/allocinfo | sort -g | tail | numfmt --to=iec
>   26K    230 drivers/net/ethernet/intel/ice/ice_irq.c:84 [ice] func:ice_get_irq_res
>   48K      2 drivers/net/ethernet/intel/ice/ice_arfs.c:565 [ice] func:ice_init_arfs
>   57K    226 drivers/net/ethernet/intel/ice/ice_lib.c:397 [ice] func:ice_vsi_alloc_ring_stats
>   57K    226 drivers/net/ethernet/intel/ice/ice_lib.c:416 [ice] func:ice_vsi_alloc_ring_stats
>   85K    226 drivers/net/ethernet/intel/ice/ice_lib.c:1398 [ice] func:ice_vsi_alloc_rings
>  339K    226 drivers/net/ethernet/intel/ice/ice_lib.c:1422 [ice] func:ice_vsi_alloc_rings
>  678K    226 drivers/net/ethernet/intel/ice/ice_base.c:109 [ice] func:ice_vsi_alloc_q_vector
>  1.1M    257 drivers/net/ethernet/intel/ice/ice_fwlog.c:40 [ice] func:ice_fwlog_alloc_ring_buffs
>  7.2M    114 drivers/net/ethernet/intel/ice/ice_txrx.c:493 [ice] func:ice_setup_rx_ring
>  896M 229264 drivers/net/ethernet/intel/ice/ice_txrx.c:680 [ice] func:ice_alloc_mapped_page
>
> It's about 1 GB for the mapped pages. I don't see any increase moment to
> moment. I've started an iperf session to simulate some traffic, and I'll
> leave this running to see if anything changes overnight.
>
> Is there anything else that you can share about the traffic setup or
> otherwise that I could look into? Your system seems to use ~2.5 x the
> buffer size as mine, but that might just be a smaller number of CPUs.
>
> Hopefully I'll get some more results overnight.
The traffic is random production workloads from VMs, using standard
Linux or OVS bridges. There is no specific pattern to it. I haven’t
had any luck reproducing this with iperf3 myself (or I was not patient
enough). The two active (UP) interfaces are in an LACP bonding setup.
Here are our ethtool settings for the two member ports (em1 and p3p1):
# ethtool -l em1
Channel parameters for em1:
Pre-set maximums:
RX: 64
TX: 64
Other: 1
Combined: 64
Current hardware settings:
RX: 0
TX: 0
Other: 1
Combined: 8
# ethtool -g em1
Ring parameters for em1:
Pre-set maximums:
RX: 8160
RX Mini: n/a
RX Jumbo: n/a
TX: 8160
TX push buff len: n/a
Current hardware settings:
RX: 8160
RX Mini: n/a
RX Jumbo: n/a
TX: 8160
RX Buf Len: n/a
CQE Size: n/a
TX Push: off
RX Push: off
TX push buff len: n/a
TCP data split: n/a
# ethtool -c em1
Coalesce parameters for em1:
Adaptive RX: off TX: off
stats-block-usecs: n/a
sample-interval: n/a
pkt-rate-low: n/a
pkt-rate-high: n/a
rx-usecs: 12
rx-frames: n/a
rx-usecs-irq: n/a
rx-frames-irq: n/a
tx-usecs: 28
tx-frames: n/a
tx-usecs-irq: n/a
tx-frames-irq: n/a
rx-usecs-low: n/a
rx-frame-low: n/a
tx-usecs-low: n/a
tx-frame-low: n/a
rx-usecs-high: 0
rx-frame-high: n/a
tx-usecs-high: n/a
tx-frame-high: n/a
CQE mode RX: n/a TX: n/a
tx-aggr-max-bytes: n/a
tx-aggr-max-frames: n/a
tx-aggr-time-usecs: n/a
# ethtool -k em1
Features for em1:
rx-checksumming: on
tx-checksumming: on
tx-checksum-ipv4: on
tx-checksum-ip-generic: off [fixed]
tx-checksum-ipv6: on
tx-checksum-fcoe-crc: off [fixed]
tx-checksum-sctp: on
scatter-gather: on
tx-scatter-gather: on
tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
tx-tcp-segmentation: on
tx-tcp-ecn-segmentation: on
tx-tcp-mangleid-segmentation: off
tx-tcp6-segmentation: on
tx-tcp-accecn-segmentation: off [fixed]
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: on
receive-hashing: on
highdma: on
rx-vlan-filter: on
vlan-challenged: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: on
tx-gre-csum-segmentation: on
tx-ipxip4-segmentation: on
tx-ipxip6-segmentation: on
tx-udp_tnl-segmentation: on
tx-udp_tnl-csum-segmentation: on
tx-gso-partial: on
tx-tunnel-remcsum-segmentation: off [fixed]
tx-sctp-segmentation: off [fixed]
tx-esp-segmentation: off [fixed]
tx-udp-segmentation: on
tx-gso-list: off [fixed]
tx-nocache-copy: off
loopback: off
rx-fcs: off
rx-all: off [fixed]
tx-vlan-stag-hw-insert: off
rx-vlan-stag-hw-parse: off
rx-vlan-stag-filter: on
l2-fwd-offload: off [fixed]
hw-tc-offload: off
esp-hw-offload: off [fixed]
esp-tx-csum-hw-offload: off [fixed]
rx-udp_tunnel-port-offload: on
tls-hw-tx-offload: off [fixed]
tls-hw-rx-offload: off [fixed]
rx-gro-hw: off [fixed]
tls-hw-record: off [fixed]
rx-gro-list: off
macsec-hw-offload: off [fixed]
rx-udp-gro-forwarding: off
hsr-tag-ins-offload: off [fixed]
hsr-tag-rm-offload: off [fixed]
hsr-fwd-offload: off [fixed]
hsr-dup-offload: off [fixed]
# ethtool -i em1
driver: ice
version: 6.15.3-3.gdc.el9.x86_64
firmware-version: 4.51 0x8001e501 23.0.8
expansion-rom-version:
bus-info: 0000:63:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes
# ethtool em1
Settings for em1:
Supported ports: [ FIBRE ]
Supported link modes: 1000baseT/Full
25000baseCR/Full
25000baseSR/Full
1000baseX/Full
10000baseCR/Full
10000baseSR/Full
10000baseLR/Full
Supported pause frame use: Symmetric
Supports auto-negotiation: Yes
Supported FEC modes: None RS BASER
Advertised link modes: 25000baseCR/Full
10000baseCR/Full
Advertised pause frame use: No
Advertised auto-negotiation: Yes
Advertised FEC modes: None RS BASER
Speed: 25000Mb/s
Duplex: Full
Auto-negotiation: off
Port: Direct Attach Copper
PHYAD: 0
Transceiver: internal
Supports Wake-on: g
Wake-on: d
Current message level: 0x00000007 (7)
drv probe link
Link detected: yes
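For scale, the channel and ring settings above allow a rough estimate of how much memory the Rx buffers should pin in steady state. This is only a back-of-the-envelope Python sketch under assumptions that are not confirmed anywhere in this thread: one 4 KiB page per Rx descriptor, p3p1 configured identically to em1, and no other ice ports holding Rx buffers. The function name is hypothetical. The observed figure is the ice_alloc_mapped_page value from the 4-day snapshot quoted earlier.

```python
# Back-of-the-envelope estimate, not the driver's real accounting.
PAGE_SIZE = 4096  # assumption: 4 KiB pages, one page per Rx descriptor


def rx_ring_bytes(queues, rx_descriptors, ports, page_size=PAGE_SIZE):
    """Bytes pinned if every Rx descriptor on every queue holds one page."""
    return queues * rx_descriptors * ports * page_size


# From 'ethtool -l em1' (Combined: 8) and 'ethtool -g em1' (RX: 8160).
# Assumption: the second bond member (p3p1) uses the same settings.
expected = rx_ring_bytes(queues=8, rx_descriptors=8160, ports=2)

# ice_alloc_mapped_page bytes after 4 days of uptime, copied from this thread.
observed = 2544877568

ratio = observed / expected
print(f"expected ~{expected} bytes, observed {observed} bytes, ratio {ratio:.2f}")
```

If the assumptions hold even approximately, the observed allocation is several times what the configured rings alone would need, which fits the suspicion that pages are not being returned rather than the rings simply being large.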
>
> Thanks,
> Jake
^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [Intel-wired-lan] Increased memory usage on NUMA nodes with ICE driver after upgrade to 6.13.y (regression in commit 492a044508ad)
2025-07-01 6:48 ` Jaroslav Pulchart
@ 2025-07-01 20:48 ` Jacob Keller
2025-07-02 9:48 ` Jaroslav Pulchart
0 siblings, 1 reply; 59+ messages in thread
From: Jacob Keller @ 2025-07-01 20:48 UTC (permalink / raw)
To: Jaroslav Pulchart
Cc: Maciej Fijalkowski, Jakub Kicinski, Przemek Kitszel,
intel-wired-lan@lists.osuosl.org, Damato, Joe,
netdev@vger.kernel.org, Nguyen, Anthony L, Michal Swiatkowski,
Czapnik, Lukasz, Dumazet, Eric, Zaki, Ahmed, Martin Karsten,
Igor Raits, Daniel Secik, Zdenek Pesek
On 6/30/2025 11:48 PM, Jaroslav Pulchart wrote:
>> On 6/30/2025 2:56 PM, Jacob Keller wrote:
>>> Unfortunately it looks like the fix I mentioned has landed in 6.14, so
>>> it's not a fix for your issue (since you mentioned 6.14 has failed
>>> testing in your system)
>>>
>>> $ git describe --first-parent --contains --match=v* --exclude=*rc*
>>> 743bbd93cf29f653fae0e1416a31f03231689911
>>> v6.14~251^2~15^2~2
>>>
>>> I don't see any other relevant changes since v6.14. I can try to see if
>>> I see similar issues with CONFIG_MEM_ALLOC_PROFILING on some test
>>> systems here.
>>
>> On my system I see this at boot after loading the ice module from
>>
>> $ grep -F "/ice/" /proc/allocinfo | sort -g | tail | numfmt --to=iec
>>   26K    230 drivers/net/ethernet/intel/ice/ice_irq.c:84 [ice] func:ice_get_irq_res
>>   48K      2 drivers/net/ethernet/intel/ice/ice_arfs.c:565 [ice] func:ice_init_arfs
>>   57K    226 drivers/net/ethernet/intel/ice/ice_lib.c:397 [ice] func:ice_vsi_alloc_ring_stats
>>   57K    226 drivers/net/ethernet/intel/ice/ice_lib.c:416 [ice] func:ice_vsi_alloc_ring_stats
>>   85K    226 drivers/net/ethernet/intel/ice/ice_lib.c:1398 [ice] func:ice_vsi_alloc_rings
>>  339K    226 drivers/net/ethernet/intel/ice/ice_lib.c:1422 [ice] func:ice_vsi_alloc_rings
>>  678K    226 drivers/net/ethernet/intel/ice/ice_base.c:109 [ice] func:ice_vsi_alloc_q_vector
>>  1.1M    257 drivers/net/ethernet/intel/ice/ice_fwlog.c:40 [ice] func:ice_fwlog_alloc_ring_buffs
>>  7.2M    114 drivers/net/ethernet/intel/ice/ice_txrx.c:493 [ice] func:ice_setup_rx_ring
>>  896M 229264 drivers/net/ethernet/intel/ice/ice_txrx.c:680 [ice] func:ice_alloc_mapped_page
>>
>> It's about 1 GB for the mapped pages. I don't see any increase moment to
>> moment. I've started an iperf session to simulate some traffic, and I'll
>> leave this running to see if anything changes overnight.
>>
>> Is there anything else that you can share about the traffic setup or
>> otherwise that I could look into? Your system seems to use ~2.5 x the
>> buffer size as mine, but that might just be a smaller number of CPUs.
>>
>> Hopefully I'll get some more results overnight.
>
> The traffic is random production workloads from VMs, using standard
> Linux or OVS bridges. There is no specific pattern to it. I haven’t
> had any luck reproducing (or was not patient enough) this with iperf3
> myself. The two active (UP) interfaces are in an LACP bonding setup.
> Here are our ethtool settings for the two member ports (em1 and p3p1)
>
I had iperf3 running overnight and the memory usage for
ice_alloc_mapped_page is constant here. Mine was direct connections
without bridge or bonding. From your description I assume there's no XDP
happening either.
I guess the traffic patterns of an iperf session are too regular, or the
leak has something to do with bridge or bonding, but I also struggle to see
how those could play a role in the buffer management in the ice driver.
^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [Intel-wired-lan] Increased memory usage on NUMA nodes with ICE driver after upgrade to 6.13.y (regression in commit 492a044508ad)
2025-07-01 20:48 ` Jacob Keller
@ 2025-07-02 9:48 ` Jaroslav Pulchart
2025-07-02 18:01 ` Jacob Keller
2025-07-02 21:56 ` Jacob Keller
0 siblings, 2 replies; 59+ messages in thread
From: Jaroslav Pulchart @ 2025-07-02 9:48 UTC (permalink / raw)
To: Jacob Keller
Cc: Maciej Fijalkowski, Jakub Kicinski, Przemek Kitszel,
intel-wired-lan@lists.osuosl.org, Damato, Joe,
netdev@vger.kernel.org, Nguyen, Anthony L, Michal Swiatkowski,
Czapnik, Lukasz, Dumazet, Eric, Zaki, Ahmed, Martin Karsten,
Igor Raits, Daniel Secik, Zdenek Pesek
>
> On 6/30/2025 11:48 PM, Jaroslav Pulchart wrote:
> >> On 6/30/2025 2:56 PM, Jacob Keller wrote:
> >>> Unfortunately it looks like the fix I mentioned has landed in 6.14, so
> >>> it's not a fix for your issue (since you mentioned 6.14 has failed
> >>> testing in your system)
> >>>
> >>> $ git describe --first-parent --contains --match=v* --exclude=*rc*
> >>> 743bbd93cf29f653fae0e1416a31f03231689911
> >>> v6.14~251^2~15^2~2
> >>>
> >>> I don't see any other relevant changes since v6.14. I can try to see if
> >>> I see similar issues with CONFIG_MEM_ALLOC_PROFILING on some test
> >>> systems here.
> >>
> >> On my system I see this at boot after loading the ice module from
> >>
> >> $ grep -F "/ice/" /proc/allocinfo | sort -g | tail | numfmt --to=iec
> >>   26K    230 drivers/net/ethernet/intel/ice/ice_irq.c:84 [ice] func:ice_get_irq_res
> >>   48K      2 drivers/net/ethernet/intel/ice/ice_arfs.c:565 [ice] func:ice_init_arfs
> >>   57K    226 drivers/net/ethernet/intel/ice/ice_lib.c:397 [ice] func:ice_vsi_alloc_ring_stats
> >>   57K    226 drivers/net/ethernet/intel/ice/ice_lib.c:416 [ice] func:ice_vsi_alloc_ring_stats
> >>   85K    226 drivers/net/ethernet/intel/ice/ice_lib.c:1398 [ice] func:ice_vsi_alloc_rings
> >>  339K    226 drivers/net/ethernet/intel/ice/ice_lib.c:1422 [ice] func:ice_vsi_alloc_rings
> >>  678K    226 drivers/net/ethernet/intel/ice/ice_base.c:109 [ice] func:ice_vsi_alloc_q_vector
> >>  1.1M    257 drivers/net/ethernet/intel/ice/ice_fwlog.c:40 [ice] func:ice_fwlog_alloc_ring_buffs
> >>  7.2M    114 drivers/net/ethernet/intel/ice/ice_txrx.c:493 [ice] func:ice_setup_rx_ring
> >>  896M 229264 drivers/net/ethernet/intel/ice/ice_txrx.c:680 [ice] func:ice_alloc_mapped_page
> >>
> >> It's about 1 GB for the mapped pages. I don't see any increase moment to
> >> moment. I've started an iperf session to simulate some traffic, and I'll
> >> leave this running to see if anything changes overnight.
> >>
> >> Is there anything else that you can share about the traffic setup or
> >> otherwise that I could look into? Your system seems to use ~2.5 x the
> >> buffer size as mine, but that might just be a smaller number of CPUs.
> >>
> >> Hopefully I'll get some more results overnight.
> >
> > The traffic is random production workloads from VMs, using standard
> > Linux or OVS bridges. There is no specific pattern to it. I haven’t
> > had any luck reproducing (or was not patient enough) this with iperf3
> > myself. The two active (UP) interfaces are in an LACP bonding setup.
> > Here are our ethtool settings for the two member ports (em1 and p3p1)
> >
>
> I had iperf3 running overnight and the memory usage for
> ice_alloc_mapped_page is constant here. Mine was direct connections
> without bridge or bonding. From your description I assume there's no XDP
> happening either.
Yes, no XDP in use.
BTW, here is the allocinfo after 6 days of uptime:
# uptime ; sort -g /proc/allocinfo| tail -n 15
11:46:44 up 6 days, 2:18, 1 user, load average: 9.24, 11.33, 15.07
102489024 533797 fs/dcache.c:1681 func:__d_alloc
106229760 25935 mm/shmem.c:1854 func:shmem_alloc_folio
117118192 103097 fs/ext4/super.c:1388 [ext4] func:ext4_alloc_inode
134479872 32832 kernel/events/ring_buffer.c:811 func:perf_mmap_alloc_page
162783232 7656 mm/slub.c:2452 func:alloc_slab_page
189906944 46364 mm/memory.c:1056 func:folio_prealloc
499384320 121920 mm/percpu-vm.c:95 func:pcpu_alloc_pages
530579456 129536 mm/page_ext.c:271 func:alloc_page_ext
625876992 54186 mm/slub.c:2450 func:alloc_slab_page
838860800 400 mm/huge_memory.c:1165 func:vma_alloc_anon_folio_pmd
1014710272 247732 mm/filemap.c:1978 func:__filemap_get_folio
1056710656 257986 mm/memory.c:1054 func:folio_prealloc
1279262720 610 mm/khugepaged.c:1084 func:alloc_charge_folio
1334530048 325763 mm/readahead.c:186 func:ractl_alloc_folio
3341238272 412215 drivers/net/ethernet/intel/ice/ice_txrx.c:681 [ice] func:ice_alloc_mapped_page
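One way to make this growth easier to track is to diff two /proc/allocinfo snapshots per call site. The following is a minimal Python sketch, not something taken from the original report: the helper names parse_allocinfo and growth are hypothetical, the record layout (bytes, calls, tag) is assumed from the listings in this thread, and the sample values are copied from the fresh-boot and 6-day snapshots.

```python
# Sketch: compare two /proc/allocinfo snapshots per call site.
# Assumes each record is "<bytes> <calls> <file:line> [module] func:<name>".


def parse_allocinfo(text):
    """Return {call_site: (bytes, calls)} from allocinfo-style text."""
    sites = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip headers, blank lines and anything not starting with two integers.
        if len(fields) < 3 or not (fields[0].isdigit() and fields[1].isdigit()):
            continue
        sites[" ".join(fields[2:])] = (int(fields[0]), int(fields[1]))
    return sites


def growth(before, after):
    """Return [(delta_bytes, delta_calls, site)], largest byte growth first."""
    deltas = []
    for site, (nbytes, calls) in after.items():
        old_bytes, old_calls = before.get(site, (0, 0))
        deltas.append((nbytes - old_bytes, calls - old_calls, site))
    return sorted(deltas, reverse=True)


# Values copied from the snapshots posted in this thread (fresh boot vs. 6 days).
BOOT = """\
1022427136 249616 mm/memory.c:1054 func:folio_prealloc
1105125376 139252 drivers/net/ethernet/intel/ice/ice_txrx.c:681 [ice] func:ice_alloc_mapped_page
1621598208 395848 mm/readahead.c:186 func:ractl_alloc_folio
"""
SIX_DAYS = """\
1056710656 257986 mm/memory.c:1054 func:folio_prealloc
1334530048 325763 mm/readahead.c:186 func:ractl_alloc_folio
3341238272 412215 drivers/net/ethernet/intel/ice/ice_txrx.c:681 [ice] func:ice_alloc_mapped_page
"""

report = growth(parse_allocinfo(BOOT), parse_allocinfo(SIX_DAYS))
for delta_bytes, delta_calls, site in report:
    print(f"{delta_bytes:>+14d} {delta_calls:>+8d} {site}")
```

On a live system the two inputs would be the contents of /proc/allocinfo read at different times. Records that a mail client has wrapped onto two lines would need to be rejoined before parsing.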
>
> I guess the traffic patterns of an iperf session are too regular, or
> something to do with bridge or bonding.. but I also struggle to see how
> those could play a role in the buffer management in the ice driver...
^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [Intel-wired-lan] Increased memory usage on NUMA nodes with ICE driver after upgrade to 6.13.y (regression in commit 492a044508ad)
2025-07-02 9:48 ` Jaroslav Pulchart
@ 2025-07-02 18:01 ` Jacob Keller
2025-07-02 21:56 ` Jacob Keller
1 sibling, 0 replies; 59+ messages in thread
From: Jacob Keller @ 2025-07-02 18:01 UTC (permalink / raw)
To: Jaroslav Pulchart
Cc: Maciej Fijalkowski, Jakub Kicinski, Przemek Kitszel,
intel-wired-lan@lists.osuosl.org, Damato, Joe,
netdev@vger.kernel.org, Nguyen, Anthony L, Michal Swiatkowski,
Czapnik, Lukasz, Dumazet, Eric, Zaki, Ahmed, Martin Karsten,
Igor Raits, Daniel Secik, Zdenek Pesek
On 7/2/2025 2:48 AM, Jaroslav Pulchart wrote:
>>
>> On 6/30/2025 11:48 PM, Jaroslav Pulchart wrote:
>>>> On 6/30/2025 2:56 PM, Jacob Keller wrote:
>>>>> Unfortunately it looks like the fix I mentioned has landed in 6.14, so
>>>>> it's not a fix for your issue (since you mentioned 6.14 has failed
>>>>> testing in your system)
>>>>>
>>>>> $ git describe --first-parent --contains --match=v* --exclude=*rc*
>>>>> 743bbd93cf29f653fae0e1416a31f03231689911
>>>>> v6.14~251^2~15^2~2
>>>>>
>>>>> I don't see any other relevant changes since v6.14. I can try to see if
>>>>> I see similar issues with CONFIG_MEM_ALLOC_PROFILING on some test
>>>>> systems here.
>>>>
>>>> On my system I see this at boot after loading the ice module from
>>>>
>>>> $ grep -F "/ice/" /proc/allocinfo | sort -g | tail | numfmt --to=iec
>>>>   26K    230 drivers/net/ethernet/intel/ice/ice_irq.c:84 [ice] func:ice_get_irq_res
>>>>   48K      2 drivers/net/ethernet/intel/ice/ice_arfs.c:565 [ice] func:ice_init_arfs
>>>>   57K    226 drivers/net/ethernet/intel/ice/ice_lib.c:397 [ice] func:ice_vsi_alloc_ring_stats
>>>>   57K    226 drivers/net/ethernet/intel/ice/ice_lib.c:416 [ice] func:ice_vsi_alloc_ring_stats
>>>>   85K    226 drivers/net/ethernet/intel/ice/ice_lib.c:1398 [ice] func:ice_vsi_alloc_rings
>>>>  339K    226 drivers/net/ethernet/intel/ice/ice_lib.c:1422 [ice] func:ice_vsi_alloc_rings
>>>>  678K    226 drivers/net/ethernet/intel/ice/ice_base.c:109 [ice] func:ice_vsi_alloc_q_vector
>>>>  1.1M    257 drivers/net/ethernet/intel/ice/ice_fwlog.c:40 [ice] func:ice_fwlog_alloc_ring_buffs
>>>>  7.2M    114 drivers/net/ethernet/intel/ice/ice_txrx.c:493 [ice] func:ice_setup_rx_ring
>>>>  896M 229264 drivers/net/ethernet/intel/ice/ice_txrx.c:680 [ice] func:ice_alloc_mapped_page
>>>>
>>>> It's about 1 GB for the mapped pages. I don't see any increase moment to
>>>> moment. I've started an iperf session to simulate some traffic, and I'll
>>>> leave this running to see if anything changes overnight.
>>>>
>>>> Is there anything else that you can share about the traffic setup or
>>>> otherwise that I could look into? Your system seems to use ~2.5 x the
>>>> buffer size as mine, but that might just be a smaller number of CPUs.
>>>>
>>>> Hopefully I'll get some more results overnight.
>>>
>>> The traffic is random production workloads from VMs, using standard
>>> Linux or OVS bridges. There is no specific pattern to it. I haven’t
>>> had any luck reproducing (or was not patient enough) this with iperf3
>>> myself. The two active (UP) interfaces are in an LACP bonding setup.
>>> Here are our ethtool settings for the two member ports (em1 and p3p1)
>>>
>>
>> I had iperf3 running overnight and the memory usage for
>> ice_alloc_mapped_pages is constant here. Mine was direct connections
>> without bridge or bonding. From your description I assume there's no XDP
>> happening either.
>
> Yes, no XDP in use.
>
> BTW the allocinfo after 6days uptime:
> # uptime ; sort -g /proc/allocinfo| tail -n 15
> 11:46:44 up 6 days, 2:18, 1 user, load average: 9.24, 11.33, 15.07
> 102489024 533797 fs/dcache.c:1681 func:__d_alloc
> 106229760 25935 mm/shmem.c:1854 func:shmem_alloc_folio
> 117118192 103097 fs/ext4/super.c:1388 [ext4] func:ext4_alloc_inode
> 134479872 32832 kernel/events/ring_buffer.c:811 func:perf_mmap_alloc_page
> 162783232 7656 mm/slub.c:2452 func:alloc_slab_page
> 189906944 46364 mm/memory.c:1056 func:folio_prealloc
> 499384320 121920 mm/percpu-vm.c:95 func:pcpu_alloc_pages
> 530579456 129536 mm/page_ext.c:271 func:alloc_page_ext
> 625876992 54186 mm/slub.c:2450 func:alloc_slab_page
> 838860800 400 mm/huge_memory.c:1165 func:vma_alloc_anon_folio_pmd
> 1014710272 247732 mm/filemap.c:1978 func:__filemap_get_folio
> 1056710656 257986 mm/memory.c:1054 func:folio_prealloc
> 1279262720 610 mm/khugepaged.c:1084 func:alloc_charge_folio
> 1334530048 325763 mm/readahead.c:186 func:ractl_alloc_folio
> 3341238272 412215 drivers/net/ethernet/intel/ice/ice_txrx.c:681
> [ice] func:ice_alloc_mapped_page
>
3.2GB, meaning an entire GB has been wasted compared to your usage right
after boot :(
Unfortunately, I've had no luck trying to reproduce the conditions that
trigger this. We do have a series in flight to convert ice to page pool,
which we hope resolves this, but of course that isn't really a suitable
backport candidate.
It's quite frustrating when I can't figure out how to reproduce this to
further debug where the leak is.
I also discovered that the leak sanitizer doesn't cover page allocations :(
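For anyone following along, here is a rough sketch of turning an allocinfo line like the one quoted above into per-call numbers. It is Python rather than the shell one-liners used in this thread. The AllocTag type and the parse_allocinfo_line() / bytes_per_call() helpers are made up for illustration, and the field layout (bytes, calls, file:line, optional [module], func:name) is assumed from the quoted output only.

```python
from typing import NamedTuple, Optional


class AllocTag(NamedTuple):
    """One /proc/allocinfo record (hypothetical helper type)."""
    size_bytes: int
    calls: int
    location: str
    module: Optional[str]
    func: str


def parse_allocinfo_line(line: str) -> AllocTag:
    # Assumed layout, as seen in the quoted output:
    #   <bytes> <calls> <file:line> [module] func:<name>
    fields = line.split()
    size_bytes, calls, location = int(fields[0]), int(fields[1]), fields[2]
    module = None
    func = ""
    for field in fields[3:]:
        if field.startswith("[") and field.endswith("]"):
            module = field[1:-1]
        elif field.startswith("func:"):
            func = field[len("func:"):]
    return AllocTag(size_bytes, calls, location, module, func)


def bytes_per_call(tag: AllocTag) -> float:
    # Average size of one live allocation recorded under this tag.
    return tag.size_bytes / tag.calls if tag.calls else 0.0


if __name__ == "__main__":
    # Line taken from the 6-day-uptime output quoted above (re-joined).
    line = ("3341238272 412215 drivers/net/ethernet/intel/ice/ice_txrx.c:681 "
            "[ice] func:ice_alloc_mapped_page")
    tag = parse_allocinfo_line(line)
    print(f"{tag.func}: {tag.size_bytes / 2**30:.2f} GiB, "
          f"{bytes_per_call(tag):.0f} bytes per call")
```

The per-call average is a cheap way to see whether a tag is dominated by single pages or by larger allocations, which may help when comparing systems with different MTU settings.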
>>
>> I guess the traffic patterns of an iperf session are too regular, or
>> something to do with bridge or bonding.. but I also struggle to see how
>> those could play a role in the buffer management in the ice driver...
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 236 bytes --]
^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [Intel-wired-lan] Increased memory usage on NUMA nodes with ICE driver after upgrade to 6.13.y (regression in commit 492a044508ad)
2025-07-02 9:48 ` Jaroslav Pulchart
2025-07-02 18:01 ` Jacob Keller
@ 2025-07-02 21:56 ` Jacob Keller
2025-07-03 6:46 ` Jaroslav Pulchart
1 sibling, 1 reply; 59+ messages in thread
From: Jacob Keller @ 2025-07-02 21:56 UTC (permalink / raw)
To: Jaroslav Pulchart
Cc: Maciej Fijalkowski, Jakub Kicinski, Przemek Kitszel,
intel-wired-lan@lists.osuosl.org, Damato, Joe,
netdev@vger.kernel.org, Nguyen, Anthony L, Michal Swiatkowski,
Czapnik, Lukasz, Dumazet, Eric, Zaki, Ahmed, Martin Karsten,
Igor Raits, Daniel Secik, Zdenek Pesek
[-- Attachment #1.1: Type: text/plain, Size: 5974 bytes --]
On 7/2/2025 2:48 AM, Jaroslav Pulchart wrote:
>>
>> On 6/30/2025 11:48 PM, Jaroslav Pulchart wrote:
>>>> On 6/30/2025 2:56 PM, Jacob Keller wrote:
>>>>> Unfortunately it looks like the fix I mentioned has landed in 6.14, so
>>>>> its not a fix for your issue (since you mentioned 6.14 has failed
>>>>> testing in your system)
>>>>>
>>>>> $ git describe --first-parent --contains --match=v* --exclude=*rc*
>>>>> 743bbd93cf29f653fae0e1416a31f03231689911
>>>>> v6.14~251^2~15^2~2
>>>>>
>>>>> I don't see any other relevant changes since v6.14. I can try to see if
>>>>> I see similar issues with CONFIG_MEM_ALLOC_PROFILING on some test
>>>>> systems here.
>>>>
>>>> On my system I see this at boot after loading the ice module from
>>>>
>>>> $ grep -F "/ice/" /proc/allocinfo | sort -g | tail | numfmt --to=iec>
>>>> 26K 230 drivers/net/ethernet/intel/ice/ice_irq.c:84 [ice]
>>>> func:ice_get_irq_res
>>>>> 48K 2 drivers/net/ethernet/intel/ice/ice_arfs.c:565 [ice] func:ice_init_arfs
>>>>> 57K 226 drivers/net/ethernet/intel/ice/ice_lib.c:397 [ice] func:ice_vsi_alloc_ring_stats
>>>>> 57K 226 drivers/net/ethernet/intel/ice/ice_lib.c:416 [ice] func:ice_vsi_alloc_ring_stats
>>>>> 85K 226 drivers/net/ethernet/intel/ice/ice_lib.c:1398 [ice] func:ice_vsi_alloc_rings
>>>>> 339K 226 drivers/net/ethernet/intel/ice/ice_lib.c:1422 [ice] func:ice_vsi_alloc_rings
>>>>> 678K 226 drivers/net/ethernet/intel/ice/ice_base.c:109 [ice] func:ice_vsi_alloc_q_vector
>>>>> 1.1M 257 drivers/net/ethernet/intel/ice/ice_fwlog.c:40 [ice] func:ice_fwlog_alloc_ring_buffs
>>>>> 7.2M 114 drivers/net/ethernet/intel/ice/ice_txrx.c:493 [ice] func:ice_setup_rx_ring
>>>>> 896M 229264 drivers/net/ethernet/intel/ice/ice_txrx.c:680 [ice] func:ice_alloc_mapped_page
>>>>
>>>> Its about 1GB for the mapped pages. I don't see any increase moment to
>>>> moment. I've started an iperf session to simulate some traffic, and I'll
>>>> leave this running to see if anything changes overnight.
>>>>
>>>> Is there anything else that you can share about the traffic setup or
>>>> otherwise that I could look into? Your system seems to use ~2.5 x the
>>>> buffer size as mine, but that might just be a smaller number of CPUs.
>>>>
>>>> Hopefully I'll get some more results overnight.
>>>
>>> The traffic is random production workloads from VMs, using standard
>>> Linux or OVS bridges. There is no specific pattern to it. I haven’t
>>> had any luck reproducing (or was not patient enough) this with iperf3
>>> myself. The two active (UP) interfaces are in an LACP bonding setup.
>>> Here are our ethtool settings for the two member ports (em1 and p3p1)
>>>
>>
>> I had iperf3 running overnight and the memory usage for
>> ice_alloc_mapped_pages is constant here. Mine was direct connections
>> without bridge or bonding. From your description I assume there's no XDP
>> happening either.
>
> Yes, no XDP in use.
>
> BTW the allocinfo after 6days uptime:
> # uptime ; sort -g /proc/allocinfo| tail -n 15
> 11:46:44 up 6 days, 2:18, 1 user, load average: 9.24, 11.33, 15.07
> 102489024 533797 fs/dcache.c:1681 func:__d_alloc
> 106229760 25935 mm/shmem.c:1854 func:shmem_alloc_folio
> 117118192 103097 fs/ext4/super.c:1388 [ext4] func:ext4_alloc_inode
> 134479872 32832 kernel/events/ring_buffer.c:811 func:perf_mmap_alloc_page
> 162783232 7656 mm/slub.c:2452 func:alloc_slab_page
> 189906944 46364 mm/memory.c:1056 func:folio_prealloc
> 499384320 121920 mm/percpu-vm.c:95 func:pcpu_alloc_pages
> 530579456 129536 mm/page_ext.c:271 func:alloc_page_ext
> 625876992 54186 mm/slub.c:2450 func:alloc_slab_page
> 838860800 400 mm/huge_memory.c:1165 func:vma_alloc_anon_folio_pmd
> 1014710272 247732 mm/filemap.c:1978 func:__filemap_get_folio
> 1056710656 257986 mm/memory.c:1054 func:folio_prealloc
> 1279262720 610 mm/khugepaged.c:1084 func:alloc_charge_folio
> 1334530048 325763 mm/readahead.c:186 func:ractl_alloc_folio
> 3341238272 412215 drivers/net/ethernet/intel/ice/ice_txrx.c:681
> [ice] func:ice_alloc_mapped_page
>
I have a suspicion that the issue is related to the updating of
page_count in ice_get_rx_pgcnt(). The i40e driver has very similar
logic for page reuse but doesn't do this. It also has a counter to track
failures to re-use the Rx pages.
Commit 11c4aa074d54 ("ice: gather page_count()'s of each frag right
before XDP prog call") changed the logic to update page_count of the Rx
page just prior to the XDP call instead of at the point where we get the
page from ice_get_rx_buf(). I think this change was originally
introduced while we were trying out an experimental refactor of the
hotpath to handle fragments differently, which no longer happens since
743bbd93cf29 ("ice: put Rx buffers after being done with current
frame"), which ironically was part of this very same series.
I think this updating of page count is accidentally causing us to
miscount when we could perform page-reuse, and ultimately causes us to
leak the page somehow. I'm still investigating, but I think this might
trigger when pgcnt - pagecnt_bias for the page somehow becomes >1, in
which case we don't reuse the page.
The i40e driver stores the page count in i40e_get_rx_buffer, and I think
our updating it later can somehow get things out-of-sync.
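To make the suspicion above a bit more concrete, here is a toy model in Python of the reuse rule as described (no reuse once pgcnt - pagecnt_bias exceeds 1). It is not the driver code: RxPage, hand_to_stack(), stack_releases() and snapshot_timing_demo() are invented for illustration, and the starting counts are arbitrary. All it tries to show is that the answer depends on when the page count snapshot is taken relative to other reference changes; it makes no claim about which ordering ice actually hits.

```python
from dataclasses import dataclass


@dataclass
class RxPage:
    """Toy stand-in for an Rx page; not the real driver structure."""
    refcount: int       # plays the role of page_count(page)
    pagecnt_bias: int   # references the driver still accounts to itself

    def hand_to_stack(self) -> None:
        # The driver gives one of its references away with the buffer.
        self.pagecnt_bias -= 1

    def stack_releases(self) -> None:
        # The stack drops a reference it was given earlier.
        self.refcount -= 1


def can_reuse(pgcnt_snapshot: int, pagecnt_bias: int) -> bool:
    # Rule from the text above: more than one outstanding reference
    # (pgcnt - pagecnt_bias > 1) means the page is not reused.
    return (pgcnt_snapshot - pagecnt_bias) <= 1


def snapshot_timing_demo() -> tuple:
    page = RxPage(refcount=8, pagecnt_bias=8)
    page.hand_to_stack()      # an older buffer, still held by the stack
    page.hand_to_stack()      # the buffer for the current frame
    early = page.refcount     # snapshot taken before the older buffer is freed
    page.stack_releases()     # the older buffer is freed in between
    late = page.refcount      # snapshot taken afterwards
    return (can_reuse(early, page.pagecnt_bias),
            can_reuse(late, page.pagecnt_bias))


if __name__ == "__main__":
    # First element: decision with the early snapshot, second: with the late one.
    print(snapshot_timing_demo())
```

A counter for the "not reusable" outcome, like the one i40e has, would make it visible how often each decision is actually taken.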
Do you know if your traffic pattern happens to send fragmented frames? I
think iperf doesn't do that, which might be part of what's causing this
issue. I'm going to try to see if I can generate such fragmentation to
confirm. Is your MTU kept at the default Ethernet size?
At the very least I'm going to propose a patch for ice similar to the
one from Joe Damato to track the rx busy page count. That might at least
help track something.
Thanks,
Jake
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 236 bytes --]
^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [Intel-wired-lan] Increased memory usage on NUMA nodes with ICE driver after upgrade to 6.13.y (regression in commit 492a044508ad)
2025-07-02 21:56 ` Jacob Keller
@ 2025-07-03 6:46 ` Jaroslav Pulchart
2025-07-03 16:16 ` Jacob Keller
0 siblings, 1 reply; 59+ messages in thread
From: Jaroslav Pulchart @ 2025-07-03 6:46 UTC (permalink / raw)
To: Jacob Keller
Cc: Maciej Fijalkowski, Jakub Kicinski, Przemek Kitszel,
intel-wired-lan@lists.osuosl.org, Damato, Joe,
netdev@vger.kernel.org, Nguyen, Anthony L, Michal Swiatkowski,
Czapnik, Lukasz, Dumazet, Eric, Zaki, Ahmed, Martin Karsten,
Igor Raits, Daniel Secik, Zdenek Pesek
>
> On 7/2/2025 2:48 AM, Jaroslav Pulchart wrote:
> >>
> >> On 6/30/2025 11:48 PM, Jaroslav Pulchart wrote:
> >>>> On 6/30/2025 2:56 PM, Jacob Keller wrote:
> >>>>> Unfortunately it looks like the fix I mentioned has landed in 6.14, so
> >>>>> its not a fix for your issue (since you mentioned 6.14 has failed
> >>>>> testing in your system)
> >>>>>
> >>>>> $ git describe --first-parent --contains --match=v* --exclude=*rc*
> >>>>> 743bbd93cf29f653fae0e1416a31f03231689911
> >>>>> v6.14~251^2~15^2~2
> >>>>>
> >>>>> I don't see any other relevant changes since v6.14. I can try to see if
> >>>>> I see similar issues with CONFIG_MEM_ALLOC_PROFILING on some test
> >>>>> systems here.
> >>>>
> >>>> On my system I see this at boot after loading the ice module from
> >>>>
> >>>> $ grep -F "/ice/" /proc/allocinfo | sort -g | tail | numfmt --to=iec>
> >>>> 26K 230 drivers/net/ethernet/intel/ice/ice_irq.c:84 [ice]
> >>>> func:ice_get_irq_res
> >>>>> 48K 2 drivers/net/ethernet/intel/ice/ice_arfs.c:565 [ice] func:ice_init_arfs
> >>>>> 57K 226 drivers/net/ethernet/intel/ice/ice_lib.c:397 [ice] func:ice_vsi_alloc_ring_stats
> >>>>> 57K 226 drivers/net/ethernet/intel/ice/ice_lib.c:416 [ice] func:ice_vsi_alloc_ring_stats
> >>>>> 85K 226 drivers/net/ethernet/intel/ice/ice_lib.c:1398 [ice] func:ice_vsi_alloc_rings
> >>>>> 339K 226 drivers/net/ethernet/intel/ice/ice_lib.c:1422 [ice] func:ice_vsi_alloc_rings
> >>>>> 678K 226 drivers/net/ethernet/intel/ice/ice_base.c:109 [ice] func:ice_vsi_alloc_q_vector
> >>>>> 1.1M 257 drivers/net/ethernet/intel/ice/ice_fwlog.c:40 [ice] func:ice_fwlog_alloc_ring_buffs
> >>>>> 7.2M 114 drivers/net/ethernet/intel/ice/ice_txrx.c:493 [ice] func:ice_setup_rx_ring
> >>>>> 896M 229264 drivers/net/ethernet/intel/ice/ice_txrx.c:680 [ice] func:ice_alloc_mapped_page
> >>>>
> >>>> Its about 1GB for the mapped pages. I don't see any increase moment to
> >>>> moment. I've started an iperf session to simulate some traffic, and I'll
> >>>> leave this running to see if anything changes overnight.
> >>>>
> >>>> Is there anything else that you can share about the traffic setup or
> >>>> otherwise that I could look into? Your system seems to use ~2.5 x the
> >>>> buffer size as mine, but that might just be a smaller number of CPUs.
> >>>>
> >>>> Hopefully I'll get some more results overnight.
> >>>
> >>> The traffic is random production workloads from VMs, using standard
> >>> Linux or OVS bridges. There is no specific pattern to it. I haven’t
> >>> had any luck reproducing (or was not patient enough) this with iperf3
> >>> myself. The two active (UP) interfaces are in an LACP bonding setup.
> >>> Here are our ethtool settings for the two member ports (em1 and p3p1)
> >>>
> >>
> >> I had iperf3 running overnight and the memory usage for
> >> ice_alloc_mapped_pages is constant here. Mine was direct connections
> >> without bridge or bonding. From your description I assume there's no XDP
> >> happening either.
> >
> > Yes, no XDP in use.
> >
> > BTW the allocinfo after 6days uptime:
> > # uptime ; sort -g /proc/allocinfo| tail -n 15
> > 11:46:44 up 6 days, 2:18, 1 user, load average: 9.24, 11.33, 15.07
> > 102489024 533797 fs/dcache.c:1681 func:__d_alloc
> > 106229760 25935 mm/shmem.c:1854 func:shmem_alloc_folio
> > 117118192 103097 fs/ext4/super.c:1388 [ext4] func:ext4_alloc_inode
> > 134479872 32832 kernel/events/ring_buffer.c:811 func:perf_mmap_alloc_page
> > 162783232 7656 mm/slub.c:2452 func:alloc_slab_page
> > 189906944 46364 mm/memory.c:1056 func:folio_prealloc
> > 499384320 121920 mm/percpu-vm.c:95 func:pcpu_alloc_pages
> > 530579456 129536 mm/page_ext.c:271 func:alloc_page_ext
> > 625876992 54186 mm/slub.c:2450 func:alloc_slab_page
> > 838860800 400 mm/huge_memory.c:1165 func:vma_alloc_anon_folio_pmd
> > 1014710272 247732 mm/filemap.c:1978 func:__filemap_get_folio
> > 1056710656 257986 mm/memory.c:1054 func:folio_prealloc
> > 1279262720 610 mm/khugepaged.c:1084 func:alloc_charge_folio
> > 1334530048 325763 mm/readahead.c:186 func:ractl_alloc_folio
> > 3341238272 412215 drivers/net/ethernet/intel/ice/ice_txrx.c:681
> > [ice] func:ice_alloc_mapped_page
> >
> I have a suspicion that the issue is related to the updating of
> page_count in ice_get_rx_pgcnt(). The i40e driver has a very similar
> logic for page reuse but doesn't do this. It also has a counter to track
> failure to re-use the Rx pages.
>
> Commit 11c4aa074d54 ("ice: gather page_count()'s of each frag right
> before XDP prog call") changed the logic to update page_count of the Rx
> page just prior to the XDP call instead of at the point where we get the
> page from ice_get_rx_buf(). I think this change was originally
> introduced while we were trying out an experimental refactor of the
> hotpath to handle fragments differently, which no longer happens since
> 743bbd93cf29 ("ice: put Rx buffers after being done with current
> frame"), which ironically was part of this very same series..
>
> I think this updating of page count is accidentally causing us to
> miscount when we could perform page-reuse, and ultimately causes us to
> leak the page somehow. I'm still investigating, but I think this might
> trigger if somehow the page pgcnt - pagecnt_bias becomes >1, we don't
> reuse the page.
>
> The i40e driver stores the page count in i40e_get_rx_buffer, and I think
> our updating it later can somehow get things out-of-sync.
>
> Do you know if your traffic pattern happens to send fragmented frames? I
Hmm, I checked:
* the node_netstat_Ip_Frag* metrics, and they are empty (they do not exist),
* a short run of "tcpdump -n -i any 'ip[6:2] & 0x3fff != 0'", which found nothing.
It looks to me like there is no fragmentation.
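For reference, the tcpdump expression above looks at bytes 6..7 of the IPv4 header, which carry the flags and the 13-bit fragment offset; the 0x3fff mask keeps the "more fragments" bit and the offset while ignoring "don't fragment". A small Python sketch of the same test follows. The is_ipv4_fragment() and make_header() helpers are made up for illustration and are not part of any tool mentioned here.

```python
import struct

MF_FLAG = 0x2000                      # "more fragments" bit
OFFSET_MASK = 0x1FFF                  # 13-bit fragment offset
FRAG_MASK = MF_FLAG | OFFSET_MASK     # 0x3fff, the mask used in the filter


def is_ipv4_fragment(ip_header: bytes) -> bool:
    # Bytes 6..7 of the IPv4 header hold flags (3 bits) + fragment offset.
    (flags_and_offset,) = struct.unpack("!H", ip_header[6:8])
    return (flags_and_offset & FRAG_MASK) != 0


def make_header(flags_and_offset: int) -> bytes:
    # Minimal 20-byte header with only the field of interest filled in
    # (hypothetical test helper, every other byte is left as zero).
    return bytes(6) + struct.pack("!H", flags_and_offset) + bytes(12)


if __name__ == "__main__":
    for value in (0x0000, 0x4000, 0x2000, 0x00B9):
        print(hex(value), is_ipv4_fragment(make_header(value)))
```

A packet with only the DF bit set (0x4000) is not matched, so ordinary TCP traffic does not show up in that capture.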
> think iperf doesn't do that, which might be part of whats causing this
> issue. I'm going to try to see if I can generate such fragmentation to
> confirm. Is your MTU kept at the default ethernet size?
Our MTU size is set to 9000 everywhere.
>
> At the very least I'm going to propose a patch for ice similar to the
> one from Joe Damato to track the rx busy page count. That might at least
> help track something..
>
> Thanks,
> Jake
^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [Intel-wired-lan] Increased memory usage on NUMA nodes with ICE driver after upgrade to 6.13.y (regression in commit 492a044508ad)
2025-07-03 6:46 ` Jaroslav Pulchart
@ 2025-07-03 16:16 ` Jacob Keller
2025-07-04 19:30 ` Maciej Fijalkowski
2025-07-07 18:32 ` Jacob Keller
0 siblings, 2 replies; 59+ messages in thread
From: Jacob Keller @ 2025-07-03 16:16 UTC (permalink / raw)
To: Jaroslav Pulchart
Cc: Maciej Fijalkowski, Jakub Kicinski, Przemek Kitszel,
intel-wired-lan@lists.osuosl.org, Damato, Joe,
netdev@vger.kernel.org, Nguyen, Anthony L, Michal Swiatkowski,
Czapnik, Lukasz, Dumazet, Eric, Zaki, Ahmed, Martin Karsten,
Igor Raits, Daniel Secik, Zdenek Pesek
[-- Attachment #1.1: Type: text/plain, Size: 6719 bytes --]
On 7/2/2025 11:46 PM, Jaroslav Pulchart wrote:
>>
>> On 7/2/2025 2:48 AM, Jaroslav Pulchart wrote:
>>>>
>>>> On 6/30/2025 11:48 PM, Jaroslav Pulchart wrote:
>>>>>> On 6/30/2025 2:56 PM, Jacob Keller wrote:
>>>>>>> Unfortunately it looks like the fix I mentioned has landed in 6.14, so
>>>>>>> its not a fix for your issue (since you mentioned 6.14 has failed
>>>>>>> testing in your system)
>>>>>>>
>>>>>>> $ git describe --first-parent --contains --match=v* --exclude=*rc*
>>>>>>> 743bbd93cf29f653fae0e1416a31f03231689911
>>>>>>> v6.14~251^2~15^2~2
>>>>>>>
>>>>>>> I don't see any other relevant changes since v6.14. I can try to see if
>>>>>>> I see similar issues with CONFIG_MEM_ALLOC_PROFILING on some test
>>>>>>> systems here.
>>>>>>
>>>>>> On my system I see this at boot after loading the ice module from
>>>>>>
>>>>>> $ grep -F "/ice/" /proc/allocinfo | sort -g | tail | numfmt --to=iec>
>>>>>> 26K 230 drivers/net/ethernet/intel/ice/ice_irq.c:84 [ice]
>>>>>> func:ice_get_irq_res
>>>>>>> 48K 2 drivers/net/ethernet/intel/ice/ice_arfs.c:565 [ice] func:ice_init_arfs
>>>>>>> 57K 226 drivers/net/ethernet/intel/ice/ice_lib.c:397 [ice] func:ice_vsi_alloc_ring_stats
>>>>>>> 57K 226 drivers/net/ethernet/intel/ice/ice_lib.c:416 [ice] func:ice_vsi_alloc_ring_stats
>>>>>>> 85K 226 drivers/net/ethernet/intel/ice/ice_lib.c:1398 [ice] func:ice_vsi_alloc_rings
>>>>>>> 339K 226 drivers/net/ethernet/intel/ice/ice_lib.c:1422 [ice] func:ice_vsi_alloc_rings
>>>>>>> 678K 226 drivers/net/ethernet/intel/ice/ice_base.c:109 [ice] func:ice_vsi_alloc_q_vector
>>>>>>> 1.1M 257 drivers/net/ethernet/intel/ice/ice_fwlog.c:40 [ice] func:ice_fwlog_alloc_ring_buffs
>>>>>>> 7.2M 114 drivers/net/ethernet/intel/ice/ice_txrx.c:493 [ice] func:ice_setup_rx_ring
>>>>>>> 896M 229264 drivers/net/ethernet/intel/ice/ice_txrx.c:680 [ice] func:ice_alloc_mapped_page
>>>>>>
>>>>>> Its about 1GB for the mapped pages. I don't see any increase moment to
>>>>>> moment. I've started an iperf session to simulate some traffic, and I'll
>>>>>> leave this running to see if anything changes overnight.
>>>>>>
>>>>>> Is there anything else that you can share about the traffic setup or
>>>>>> otherwise that I could look into? Your system seems to use ~2.5 x the
>>>>>> buffer size as mine, but that might just be a smaller number of CPUs.
>>>>>>
>>>>>> Hopefully I'll get some more results overnight.
>>>>>
>>>>> The traffic is random production workloads from VMs, using standard
>>>>> Linux or OVS bridges. There is no specific pattern to it. I haven’t
>>>>> had any luck reproducing (or was not patient enough) this with iperf3
>>>>> myself. The two active (UP) interfaces are in an LACP bonding setup.
>>>>> Here are our ethtool settings for the two member ports (em1 and p3p1)
>>>>>
>>>>
>>>> I had iperf3 running overnight and the memory usage for
>>>> ice_alloc_mapped_pages is constant here. Mine was direct connections
>>>> without bridge or bonding. From your description I assume there's no XDP
>>>> happening either.
>>>
>>> Yes, no XDP in use.
>>>
>>> BTW the allocinfo after 6days uptime:
>>> # uptime ; sort -g /proc/allocinfo| tail -n 15
>>> 11:46:44 up 6 days, 2:18, 1 user, load average: 9.24, 11.33, 15.07
>>> 102489024 533797 fs/dcache.c:1681 func:__d_alloc
>>> 106229760 25935 mm/shmem.c:1854 func:shmem_alloc_folio
>>> 117118192 103097 fs/ext4/super.c:1388 [ext4] func:ext4_alloc_inode
>>> 134479872 32832 kernel/events/ring_buffer.c:811 func:perf_mmap_alloc_page
>>> 162783232 7656 mm/slub.c:2452 func:alloc_slab_page
>>> 189906944 46364 mm/memory.c:1056 func:folio_prealloc
>>> 499384320 121920 mm/percpu-vm.c:95 func:pcpu_alloc_pages
>>> 530579456 129536 mm/page_ext.c:271 func:alloc_page_ext
>>> 625876992 54186 mm/slub.c:2450 func:alloc_slab_page
>>> 838860800 400 mm/huge_memory.c:1165 func:vma_alloc_anon_folio_pmd
>>> 1014710272 247732 mm/filemap.c:1978 func:__filemap_get_folio
>>> 1056710656 257986 mm/memory.c:1054 func:folio_prealloc
>>> 1279262720 610 mm/khugepaged.c:1084 func:alloc_charge_folio
>>> 1334530048 325763 mm/readahead.c:186 func:ractl_alloc_folio
>>> 3341238272 412215 drivers/net/ethernet/intel/ice/ice_txrx.c:681
>>> [ice] func:ice_alloc_mapped_page
>>>
>> I have a suspicion that the issue is related to the updating of
>> page_count in ice_get_rx_pgcnt(). The i40e driver has a very similar
>> logic for page reuse but doesn't do this. It also has a counter to track
>> failure to re-use the Rx pages.
>>
>> Commit 11c4aa074d54 ("ice: gather page_count()'s of each frag right
>> before XDP prog call") changed the logic to update page_count of the Rx
>> page just prior to the XDP call instead of at the point where we get the
>> page from ice_get_rx_buf(). I think this change was originally
>> introduced while we were trying out an experimental refactor of the
>> hotpath to handle fragments differently, which no longer happens since
>> 743bbd93cf29 ("ice: put Rx buffers after being done with current
>> frame"), which ironically was part of this very same series..
>>
>> I think this updating of page count is accidentally causing us to
>> miscount when we could perform page-reuse, and ultimately causes us to
>> leak the page somehow. I'm still investigating, but I think this might
>> trigger if somehow the page pgcnt - pagecnt_bias becomes >1, we don't
>> reuse the page.
>>
>> The i40e driver stores the page count in i40e_get_rx_buffer, and I think
>> our updating it later can somehow get things out-of-sync.
>>
>> Do you know if your traffic pattern happens to send fragmented frames? I
>
> Hmm, I check the
> * node_netstat_Ip_Frag* metrics and they are empty(do-not-exists),
> * shortly run "tcpdump -n -i any 'ip[6:2] & 0x3fff != 0'" and nothing was found
> looks to me like there is no fragmentation.
>
Good to rule it out at least.
>> think iperf doesn't do that, which might be part of whats causing this
>> issue. I'm going to try to see if I can generate such fragmentation to
>> confirm. Is your MTU kept at the default ethernet size?
>
> Our MTU size is set to 9000 everywhere.
>
Ok. I am re-trying with MTU 9000 and using some traffic generated by wrk
now. I do see much larger memory use (~2GB) when using MTU 9000, so that
tracks with what your system shows. Currently it's fluctuating between
1.9G and 2G. I'll leave this going for a couple of days while on vacation
and see if anything pops up.
Thanks,
Jake
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 236 bytes --]
^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [Intel-wired-lan] Increased memory usage on NUMA nodes with ICE driver after upgrade to 6.13.y (regression in commit 492a044508ad)
2025-07-03 16:16 ` Jacob Keller
@ 2025-07-04 19:30 ` Maciej Fijalkowski
2025-07-07 18:32 ` Jacob Keller
1 sibling, 0 replies; 59+ messages in thread
From: Maciej Fijalkowski @ 2025-07-04 19:30 UTC (permalink / raw)
To: Jacob Keller
Cc: Jaroslav Pulchart, Jakub Kicinski, Przemek Kitszel,
intel-wired-lan@lists.osuosl.org, Damato, Joe,
netdev@vger.kernel.org, Nguyen, Anthony L, Michal Swiatkowski,
Czapnik, Lukasz, Dumazet, Eric, Zaki, Ahmed, Martin Karsten,
Igor Raits, Daniel Secik, Zdenek Pesek
On Thu, Jul 03, 2025 at 09:16:35AM -0700, Jacob Keller wrote:
>
>
> On 7/2/2025 11:46 PM, Jaroslav Pulchart wrote:
> >>
> >> On 7/2/2025 2:48 AM, Jaroslav Pulchart wrote:
> >>>>
> >>>> On 6/30/2025 11:48 PM, Jaroslav Pulchart wrote:
> >>>>>> On 6/30/2025 2:56 PM, Jacob Keller wrote:
> >>>>>>> Unfortunately it looks like the fix I mentioned has landed in 6.14, so
> >>>>>>> its not a fix for your issue (since you mentioned 6.14 has failed
> >>>>>>> testing in your system)
> >>>>>>>
> >>>>>>> $ git describe --first-parent --contains --match=v* --exclude=*rc*
> >>>>>>> 743bbd93cf29f653fae0e1416a31f03231689911
> >>>>>>> v6.14~251^2~15^2~2
> >>>>>>>
> >>>>>>> I don't see any other relevant changes since v6.14. I can try to see if
> >>>>>>> I see similar issues with CONFIG_MEM_ALLOC_PROFILING on some test
> >>>>>>> systems here.
> >>>>>>
> >>>>>> On my system I see this at boot after loading the ice module from
> >>>>>>
> >>>>>> $ grep -F "/ice/" /proc/allocinfo | sort -g | tail | numfmt --to=iec>
> >>>>>> 26K 230 drivers/net/ethernet/intel/ice/ice_irq.c:84 [ice]
> >>>>>> func:ice_get_irq_res
> >>>>>>> 48K 2 drivers/net/ethernet/intel/ice/ice_arfs.c:565 [ice] func:ice_init_arfs
> >>>>>>> 57K 226 drivers/net/ethernet/intel/ice/ice_lib.c:397 [ice] func:ice_vsi_alloc_ring_stats
> >>>>>>> 57K 226 drivers/net/ethernet/intel/ice/ice_lib.c:416 [ice] func:ice_vsi_alloc_ring_stats
> >>>>>>> 85K 226 drivers/net/ethernet/intel/ice/ice_lib.c:1398 [ice] func:ice_vsi_alloc_rings
> >>>>>>> 339K 226 drivers/net/ethernet/intel/ice/ice_lib.c:1422 [ice] func:ice_vsi_alloc_rings
> >>>>>>> 678K 226 drivers/net/ethernet/intel/ice/ice_base.c:109 [ice] func:ice_vsi_alloc_q_vector
> >>>>>>> 1.1M 257 drivers/net/ethernet/intel/ice/ice_fwlog.c:40 [ice] func:ice_fwlog_alloc_ring_buffs
> >>>>>>> 7.2M 114 drivers/net/ethernet/intel/ice/ice_txrx.c:493 [ice] func:ice_setup_rx_ring
> >>>>>>> 896M 229264 drivers/net/ethernet/intel/ice/ice_txrx.c:680 [ice] func:ice_alloc_mapped_page
> >>>>>>
> >>>>>> Its about 1GB for the mapped pages. I don't see any increase moment to
> >>>>>> moment. I've started an iperf session to simulate some traffic, and I'll
> >>>>>> leave this running to see if anything changes overnight.
> >>>>>>
> >>>>>> Is there anything else that you can share about the traffic setup or
> >>>>>> otherwise that I could look into? Your system seems to use ~2.5 x the
> >>>>>> buffer size as mine, but that might just be a smaller number of CPUs.
> >>>>>>
> >>>>>> Hopefully I'll get some more results overnight.
> >>>>>
> >>>>> The traffic is random production workloads from VMs, using standard
> >>>>> Linux or OVS bridges. There is no specific pattern to it. I haven’t
> >>>>> had any luck reproducing (or was not patient enough) this with iperf3
> >>>>> myself. The two active (UP) interfaces are in an LACP bonding setup.
> >>>>> Here are our ethtool settings for the two member ports (em1 and p3p1)
> >>>>>
> >>>>
> >>>> I had iperf3 running overnight and the memory usage for
> >>>> ice_alloc_mapped_pages is constant here. Mine was direct connections
> >>>> without bridge or bonding. From your description I assume there's no XDP
> >>>> happening either.
> >>>
> >>> Yes, no XDP in use.
> >>>
> >>> BTW the allocinfo after 6days uptime:
> >>> # uptime ; sort -g /proc/allocinfo| tail -n 15
> >>> 11:46:44 up 6 days, 2:18, 1 user, load average: 9.24, 11.33, 15.07
> >>> 102489024 533797 fs/dcache.c:1681 func:__d_alloc
> >>> 106229760 25935 mm/shmem.c:1854 func:shmem_alloc_folio
> >>> 117118192 103097 fs/ext4/super.c:1388 [ext4] func:ext4_alloc_inode
> >>> 134479872 32832 kernel/events/ring_buffer.c:811 func:perf_mmap_alloc_page
> >>> 162783232 7656 mm/slub.c:2452 func:alloc_slab_page
> >>> 189906944 46364 mm/memory.c:1056 func:folio_prealloc
> >>> 499384320 121920 mm/percpu-vm.c:95 func:pcpu_alloc_pages
> >>> 530579456 129536 mm/page_ext.c:271 func:alloc_page_ext
> >>> 625876992 54186 mm/slub.c:2450 func:alloc_slab_page
> >>> 838860800 400 mm/huge_memory.c:1165 func:vma_alloc_anon_folio_pmd
> >>> 1014710272 247732 mm/filemap.c:1978 func:__filemap_get_folio
> >>> 1056710656 257986 mm/memory.c:1054 func:folio_prealloc
> >>> 1279262720 610 mm/khugepaged.c:1084 func:alloc_charge_folio
> >>> 1334530048 325763 mm/readahead.c:186 func:ractl_alloc_folio
> >>> 3341238272 412215 drivers/net/ethernet/intel/ice/ice_txrx.c:681
> >>> [ice] func:ice_alloc_mapped_page
> >>>
> >> I have a suspicion that the issue is related to the updating of
> >> page_count in ice_get_rx_pgcnt(). The i40e driver has a very similar
> >> logic for page reuse but doesn't do this. It also has a counter to track
> >> failure to re-use the Rx pages.
> >>
> >> Commit 11c4aa074d54 ("ice: gather page_count()'s of each frag right
> >> before XDP prog call") changed the logic to update page_count of the Rx
> >> page just prior to the XDP call instead of at the point where we get the
> >> page from ice_get_rx_buf(). I think this change was originally
> >> introduced while we were trying out an experimental refactor of the
> >> hotpath to handle fragments differently, which no longer happens since
> >> 743bbd93cf29 ("ice: put Rx buffers after being done with current
> >> frame"), which ironically was part of this very same series..
> >>
> >> I think this updating of page count is accidentally causing us to
> >> miscount when we could perform page-reuse, and ultimately causes us to
> >> leak the page somehow. I'm still investigating, but I think this might
> >> trigger if somehow the page pgcnt - pagecnt_bias becomes >1, we don't
> >> reuse the page.
> >>
> >> The i40e driver stores the page count in i40e_get_rx_buffer, and I think
> >> our updating it later can somehow get things out-of-sync.
> >>
> >> Do you know if your traffic pattern happens to send fragmented frames? I
> >
> > Hmm, I check the
> > * node_netstat_Ip_Frag* metrics and they are empty(do-not-exists),
> > * shortly run "tcpdump -n -i any 'ip[6:2] & 0x3fff != 0'" and nothing was found
> > looks to me like there is no fragmentation.
> >
>
> Good to rule it out at least.
>
> >> think iperf doesn't do that, which might be part of whats causing this
> >> issue. I'm going to try to see if I can generate such fragmentation to
> >> confirm. Is your MTU kept at the default ethernet size?
> >
> > Our MTU size is set to 9000 everywhere.
> >
>
> Ok. I am re-trying with MTU 9000 and using some traffic generated by wrk
> now. I do see much larger memory use (~2GB) when using MTU 9000, so that
> tracks with what your system shows. Currently its fluctuating between
> 1.9 and 2G. I'll leave this going for a couple of days while on vacation
> and see if anything pops up.
I was wondering whether order-1 pages might be causing the mess there for
some reason, since for 9k MTU we pull them and split them in half.
Maybe it would be worth checking whether legacy-rx (which works on
order-0 pages) avoids this issue? That would require 8k MTU, though.
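As a back-of-the-envelope check of the order-0 vs order-1 point, the arithmetic can be sketched as below (Python, assuming 4 KiB base pages; page_alloc_bytes() and half_page_buffer_bytes() are illustrative names, not driver functions). The last line reuses the 6-day allocinfo figures quoted earlier in the thread; under the 4 KiB assumption the average per allocation comes out close to an order-1 allocation, which would be consistent with that system mostly using order-1 pages, though it does not prove it.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB base pages


def page_alloc_bytes(order: int, page_size: int = PAGE_SIZE) -> int:
    # An order-n allocation covers 2**n contiguous base pages.
    return page_size << order


def half_page_buffer_bytes(order: int, page_size: int = PAGE_SIZE) -> int:
    # Splitting the allocation in half yields two Rx buffers per allocation.
    return page_alloc_bytes(order, page_size) // 2


if __name__ == "__main__":
    for order in (0, 1):
        print(f"order-{order}: {page_alloc_bytes(order)} bytes per allocation, "
              f"{half_page_buffer_bytes(order)} bytes per half")
    # Average bytes per live allocation from the 6-day allocinfo output
    # quoted earlier (3341238272 bytes over 412215 calls).
    print(f"average per call: {3341238272 / 412215:.0f} bytes")
```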
>
> Thanks,
> Jake
^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [Intel-wired-lan] Increased memory usage on NUMA nodes with ICE driver after upgrade to 6.13.y (regression in commit 492a044508ad)
2025-07-03 16:16 ` Jacob Keller
2025-07-04 19:30 ` Maciej Fijalkowski
@ 2025-07-07 18:32 ` Jacob Keller
2025-07-07 22:03 ` Jacob Keller
1 sibling, 1 reply; 59+ messages in thread
From: Jacob Keller @ 2025-07-07 18:32 UTC (permalink / raw)
To: Jaroslav Pulchart
Cc: Maciej Fijalkowski, Jakub Kicinski, Przemek Kitszel,
intel-wired-lan@lists.osuosl.org, Damato, Joe,
netdev@vger.kernel.org, Nguyen, Anthony L, Michal Swiatkowski,
Czapnik, Lukasz, Dumazet, Eric, Zaki, Ahmed, Martin Karsten,
Igor Raits, Daniel Secik, Zdenek Pesek
[-- Attachment #1.1: Type: text/plain, Size: 1394 bytes --]
On 7/3/2025 9:16 AM, Jacob Keller wrote:
> On 7/2/2025 11:46 PM, Jaroslav Pulchart wrote:
>>> think iperf doesn't do that, which might be part of whats causing this
>>> issue. I'm going to try to see if I can generate such fragmentation to
>>> confirm. Is your MTU kept at the default ethernet size?
>>
>> Our MTU size is set to 9000 everywhere.
>>
>
> Ok. I am re-trying with MTU 9000 and using some traffic generated by wrk
> now. I do see much larger memory use (~2GB) when using MTU 9000, so that
> tracks with what your system shows. Currently its fluctuating between
> 1.9 and 2G. I'll leave this going for a couple of days while on vacation
> and see if anything pops up.
>
> Thanks,
> Jake
Good news! After several days of running a wrk and iperf3 workload with
9k MTU, I see a significant increase in the memory usage from the page
allocations:
7.3G 953314 drivers/net/ethernet/intel/ice/ice_txrx.c:682 [ice]
func:ice_alloc_mapped_page
~5GB extra.
At least I can reproduce this now. It's unclear how long it took, since I
was out on vacation from Wednesday until now.
I do have one hypothesis regarding the way we're currently tracking the
page count (based purely on differences between ice and i40e). I'm going
to attempt to align with i40e and re-run the test.
Hopefully I'll have some more information in a day or two.
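As a rough sanity check on the numbers above, dividing the reported size by the reported call count gives the average size per allocation. The sketch below assumes that "7.3G" means GiB (it is a rounded figure, so the result is approximate) and that the base page size is 4 KiB; both are assumptions, not something stated in the output.

```python
GIB = 1024 ** 3

reported_bytes = 7.3 * GIB  # "7.3G" from the output above (rounded, assumed GiB)
reported_calls = 953314     # call count from the same line

bytes_per_alloc = reported_bytes / reported_calls
order1_bytes = 2 * 4096     # one order-1 allocation, assuming 4 KiB base pages

print(f"{bytes_per_alloc:.0f} bytes per allocation")
print(f"{bytes_per_alloc / order1_bytes:.3f} x an order-1 allocation")
```

Under those assumptions the average comes out close to 8 KiB per allocation, which fits the order-1 pages mentioned earlier in the thread for 9k MTU.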
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 236 bytes --]
^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [Intel-wired-lan] Increased memory usage on NUMA nodes with ICE driver after upgrade to 6.13.y (regression in commit 492a044508ad)
2025-07-07 18:32 ` Jacob Keller
@ 2025-07-07 22:03 ` Jacob Keller
2025-07-09 0:50 ` Jacob Keller
0 siblings, 1 reply; 59+ messages in thread
From: Jacob Keller @ 2025-07-07 22:03 UTC (permalink / raw)
To: Jaroslav Pulchart
Cc: Maciej Fijalkowski, Jakub Kicinski, Przemek Kitszel,
intel-wired-lan@lists.osuosl.org, Damato, Joe,
netdev@vger.kernel.org, Nguyen, Anthony L, Michal Swiatkowski,
Czapnik, Lukasz, Dumazet, Eric, Zaki, Ahmed, Martin Karsten,
Igor Raits, Daniel Secik, Zdenek Pesek
[-- Attachment #1.1: Type: text/plain, Size: 2279 bytes --]
On 7/7/2025 11:32 AM, Jacob Keller wrote:
>
>
> On 7/3/2025 9:16 AM, Jacob Keller wrote:
>> On 7/2/2025 11:46 PM, Jaroslav Pulchart wrote:
>>>> think iperf doesn't do that, which might be part of whats causing this
>>>> issue. I'm going to try to see if I can generate such fragmentation to
>>>> confirm. Is your MTU kept at the default ethernet size?
>>>
>>> Our MTU size is set to 9000 everywhere.
>>>
>>
>> Ok. I am re-trying with MTU 9000 and using some traffic generated by wrk
>> now. I do see much larger memory use (~2GB) when using MTU 9000, so that
>> tracks with what your system shows. Currently its fluctuating between
>> 1.9 and 2G. I'll leave this going for a couple of days while on vacation
>> and see if anything pops up.
>>
>> Thanks,
>> Jake
>
> Good news! After several days of running a wrk and iperf3 workload with
> 9k MTU, I see a significant increase in the memory usage from the page
> allocations:
>
> 7.3G 953314 drivers/net/ethernet/intel/ice/ice_txrx.c:682 [ice]
> func:ice_alloc_mapped_page
>
> ~5GB extra.
>
> At least I can reproduce this now. Its unclear how long it took since I
> was out on vacation from Wednesday through until now.
>
> I do have a singular hypothesis regarding the way we're currently
> tracking the page count, (just based on differences between ice and
> i40e). I'm going to attempt to align with i40e and re-run the test.
> Hopefully I'll have some more information in a day or two.
Bad news: my hypothesis was incorrect.
Good news: I can immediately see the problem if I set MTU to 9K,
start an iperf3 session, and just watch the count of allocations from
ice_alloc_mapped_page(). It goes up consistently, so I can quickly tell
if a change is helping.
I ported the stats from i40e for tracking the page allocations, and I
can see that we're allocating new pages despite not actually performing
releases.
I don't yet have a good understanding of what causes this, and the logic
in ice is pretty hard to track...
I'm going to try the page pool patches myself to see if this test bed
triggers the same problems. Unfortunately I think I need someone else
with more experience with the hotpath code to help figure out what's
going wrong here...
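One way to watch that allocation count is to sample /proc/allocinfo repeatedly on a kernel built with CONFIG_MEM_ALLOC_PROFILING. The Python sketch below assumes the raw file layout is "size in bytes, call count, tag" per line after the header lines; the function names are made up for illustration, and reading the file normally needs root.

```python
def parse_allocinfo(text, needle="ice_alloc_mapped_page"):
    """Sum (bytes, calls) over allocinfo lines whose tag contains `needle`."""
    total_bytes = 0
    total_calls = 0
    for line in text.splitlines():
        fields = line.split(None, 2)
        if len(fields) < 3 or needle not in fields[2]:
            continue
        try:
            size, calls = int(fields[0]), int(fields[1])
        except ValueError:
            continue  # header line, or a size that is not a plain integer
        total_bytes += size
        total_calls += calls
    return total_bytes, total_calls


def sample_once(path="/proc/allocinfo", needle="ice_alloc_mapped_page"):
    """Read the profiling file once and return (bytes, calls) for `needle`."""
    with open(path) as handle:
        return parse_allocinfo(handle.read(), needle)
```

Calling sample_once() every few seconds while iperf3 is running and printing the difference between consecutive call counts shows whether the count keeps climbing.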
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 236 bytes --]
^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [Intel-wired-lan] Increased memory usage on NUMA nodes with ICE driver after upgrade to 6.13.y (regression in commit 492a044508ad)
2025-07-07 22:03 ` Jacob Keller
@ 2025-07-09 0:50 ` Jacob Keller
2025-07-09 19:11 ` Jacob Keller
0 siblings, 1 reply; 59+ messages in thread
From: Jacob Keller @ 2025-07-09 0:50 UTC (permalink / raw)
To: Jaroslav Pulchart
Cc: Maciej Fijalkowski, Jakub Kicinski, Przemek Kitszel,
intel-wired-lan@lists.osuosl.org, Damato, Joe,
netdev@vger.kernel.org, Nguyen, Anthony L, Michal Swiatkowski,
Czapnik, Lukasz, Dumazet, Eric, Zaki, Ahmed, Martin Karsten,
Igor Raits, Daniel Secik, Zdenek Pesek
[-- Attachment #1.1: Type: text/plain, Size: 1346 bytes --]
On 7/7/2025 3:03 PM, Jacob Keller wrote:
> Bad news: my hypothesis was incorrect.
>
> Good news: I can immediately see the problem if I set MTU to 9K and
> start an iperf3 session and just watch the count of allocations from
> ice_alloc_mapped_pages(). It goes up consistently, so I can quickly tell
> if a change is helping.
>
> I ported the stats from i40e for tracking the page allocations, and I
> can see that we're allocating new pages despite not actually performing
> releases.
>
> I don't yet have a good understanding of what causes this, and the logic
> in ice is pretty hard to track...
>
> I'm going to try the page pool patches myself to see if this test bed
> triggers the same problems. Unfortunately I think I need someone else
> with more experience with the hotpath code to help figure out whats
> going wrong here...
I believe I have isolated this and figured out the issue: With 9K MTU,
sometimes the hardware posts a multi-buffer frame with an extra
descriptor that has a size of 0 bytes with no data in it. When this
happens, our logic for tracking buffers fails to free this buffer. We
then later overwrite the page because we failed to either free or re-use
the page, and our overwriting logic doesn't verify this.
I will have a fix with a more detailed description posted tomorrow.
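To illustrate the failure mode described above, here is a toy Python model. It is not the ice driver code: the class, its fields, and the buffer sizes are hypothetical, and real drivers recycle pages rather than simply releasing them. It only shows how skipping a zero-length buffer during cleanup, combined with a refill step that overwrites a slot without checking it, loses one page per affected frame.

```python
class ToyRxRing:
    """Toy receive ring: each slot holds at most one page reference."""

    def __init__(self, size, handle_empty):
        self.pages = [None] * size
        self.handle_empty = handle_empty  # True models the corrected behaviour
        self.allocated = 0
        self.released = 0
        self.leaked = 0
        for slot in range(size):
            self.refill(slot)

    def refill(self, slot):
        """Put a fresh page in the slot without checking what was there."""
        if self.pages[slot] is not None:
            self.leaked += 1  # old reference is overwritten and lost
        self.pages[slot] = object()
        self.allocated += 1

    def receive_frame(self, first_slot, buffer_sizes):
        """Consume one multi-buffer frame, then refill the slots it used."""
        count = len(buffer_sizes)
        slots = [(first_slot + i) % len(self.pages) for i in range(count)]
        for slot, size in zip(slots, buffer_sizes):
            if size == 0 and not self.handle_empty:
                continue  # modelled bug: the empty buffer is never accounted for
            self.pages[slot] = None
            self.released += 1
        for slot in slots:
            self.refill(slot)


def run(handle_empty, frames=1000):
    """Return the number of leaked pages after `frames` four-buffer frames."""
    ring = ToyRxRing(size=8, handle_empty=handle_empty)
    for frame in range(frames):
        # Illustrative sizes: three data buffers plus one zero-length buffer.
        ring.receive_frame((frame * 4) % 8, (3072, 3072, 2856, 0))
    return ring.leaked
```

In this model run(handle_empty=False) loses one page for every frame that carries a zero-length buffer, while run(handle_empty=True) loses none.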
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 236 bytes --]
^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [Intel-wired-lan] Increased memory usage on NUMA nodes with ICE driver after upgrade to 6.13.y (regression in commit 492a044508ad)
2025-07-09 0:50 ` Jacob Keller
@ 2025-07-09 19:11 ` Jacob Keller
2025-07-09 21:04 ` Jaroslav Pulchart
0 siblings, 1 reply; 59+ messages in thread
From: Jacob Keller @ 2025-07-09 19:11 UTC (permalink / raw)
To: Jaroslav Pulchart
Cc: Maciej Fijalkowski, Jakub Kicinski, Przemek Kitszel,
intel-wired-lan@lists.osuosl.org, Damato, Joe,
netdev@vger.kernel.org, Nguyen, Anthony L, Michal Swiatkowski,
Czapnik, Lukasz, Dumazet, Eric, Zaki, Ahmed, Martin Karsten,
Igor Raits, Daniel Secik, Zdenek Pesek
[-- Attachment #1.1: Type: text/plain, Size: 1772 bytes --]
On 7/8/2025 5:50 PM, Jacob Keller wrote:
>
>
> On 7/7/2025 3:03 PM, Jacob Keller wrote:
>> Bad news: my hypothesis was incorrect.
>>
>> Good news: I can immediately see the problem if I set MTU to 9K and
>> start an iperf3 session and just watch the count of allocations from
>> ice_alloc_mapped_pages(). It goes up consistently, so I can quickly tell
>> if a change is helping.
>>
>> I ported the stats from i40e for tracking the page allocations, and I
>> can see that we're allocating new pages despite not actually performing
>> releases.
>>
>> I don't yet have a good understanding of what causes this, and the logic
>> in ice is pretty hard to track...
>>
>> I'm going to try the page pool patches myself to see if this test bed
>> triggers the same problems. Unfortunately I think I need someone else
>> with more experience with the hotpath code to help figure out whats
>> going wrong here...
>
> I believe I have isolated this and figured out the issue: With 9K MTU,
> sometimes the hardware posts a multi-buffer frame with an extra
> descriptor that has a size of 0 bytes with no data in it. When this
> happens, our logic for tracking buffers fails to free this buffer. We
> then later overwrite the page because we failed to either free or re-use
> the page, and our overwriting logic doesn't verify this.
>
> I will have a fix with a more detailed description posted tomorrow.
@Jaroslav, I've posted a fix which I believe should resolve your issue:
https://lore.kernel.org/intel-wired-lan/20250709-jk-ice-fix-rx-mem-leak-v1-1-cfdd7eeea905@intel.com/T/#u
I am reasonably confident it should resolve the issue you reported. If
possible, it would be appreciated if you could test it and report back
to confirm.
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 236 bytes --]
^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [Intel-wired-lan] Increased memory usage on NUMA nodes with ICE driver after upgrade to 6.13.y (regression in commit 492a044508ad)
2025-07-09 19:11 ` Jacob Keller
@ 2025-07-09 21:04 ` Jaroslav Pulchart
2025-07-09 21:15 ` Jacob Keller
0 siblings, 1 reply; 59+ messages in thread
From: Jaroslav Pulchart @ 2025-07-09 21:04 UTC (permalink / raw)
To: Jacob Keller
Cc: Maciej Fijalkowski, Jakub Kicinski, Przemek Kitszel,
intel-wired-lan@lists.osuosl.org, Damato, Joe,
netdev@vger.kernel.org, Nguyen, Anthony L, Michal Swiatkowski,
Czapnik, Lukasz, Dumazet, Eric, Zaki, Ahmed, Martin Karsten,
Igor Raits, Daniel Secik, Zdenek Pesek
>
>
> On 7/8/2025 5:50 PM, Jacob Keller wrote:
> >
> >
> > On 7/7/2025 3:03 PM, Jacob Keller wrote:
> >> Bad news: my hypothesis was incorrect.
> >>
> >> Good news: I can immediately see the problem if I set MTU to 9K and
> >> start an iperf3 session and just watch the count of allocations from
> >> ice_alloc_mapped_pages(). It goes up consistently, so I can quickly tell
> >> if a change is helping.
> >>
> >> I ported the stats from i40e for tracking the page allocations, and I
> >> can see that we're allocating new pages despite not actually performing
> >> releases.
> >>
> >> I don't yet have a good understanding of what causes this, and the logic
> >> in ice is pretty hard to track...
> >>
> >> I'm going to try the page pool patches myself to see if this test bed
> >> triggers the same problems. Unfortunately I think I need someone else
> >> with more experience with the hotpath code to help figure out whats
> >> going wrong here...
> >
> > I believe I have isolated this and figured out the issue: With 9K MTU,
> > sometimes the hardware posts a multi-buffer frame with an extra
> > descriptor that has a size of 0 bytes with no data in it. When this
> > happens, our logic for tracking buffers fails to free this buffer. We
> > then later overwrite the page because we failed to either free or re-use
> > the page, and our overwriting logic doesn't verify this.
> >
> > I will have a fix with a more detailed description posted tomorrow.
>
> @Jaroslav, I've posted a fix which I believe should resolve your issue:
>
> https://lore.kernel.org/intel-wired-lan/20250709-jk-ice-fix-rx-mem-leak-v1-1-cfdd7eeea905@intel.com/T/#u
>
> I am reasonably confident it should resolve the issue you reported. If
> possible, it would be appreciated if you could test it and report back
> to confirm.
@Jacob that’s excellent news!
I’ve built and installed 6.15.5 with your patch on one of our servers
(strangely, I had to disable CONFIG_MEM_ALLOC_PROFILING with this
patch or the kernel wouldn’t boot) and started a VM running our
production traffic. I’ll let it run for a day or two, observe the memory
utilization per NUMA node, and report back.
^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [Intel-wired-lan] Increased memory usage on NUMA nodes with ICE driver after upgrade to 6.13.y (regression in commit 492a044508ad)
2025-07-09 21:04 ` Jaroslav Pulchart
@ 2025-07-09 21:15 ` Jacob Keller
2025-07-11 18:16 ` Jaroslav Pulchart
0 siblings, 1 reply; 59+ messages in thread
From: Jacob Keller @ 2025-07-09 21:15 UTC (permalink / raw)
To: Jaroslav Pulchart
Cc: Maciej Fijalkowski, Jakub Kicinski, Przemek Kitszel,
intel-wired-lan@lists.osuosl.org, Damato, Joe,
netdev@vger.kernel.org, Nguyen, Anthony L, Michal Swiatkowski,
Czapnik, Lukasz, Dumazet, Eric, Zaki, Ahmed, Martin Karsten,
Igor Raits, Daniel Secik, Zdenek Pesek
[-- Attachment #1.1: Type: text/plain, Size: 2406 bytes --]
On 7/9/2025 2:04 PM, Jaroslav Pulchart wrote:
>>
>>
>> On 7/8/2025 5:50 PM, Jacob Keller wrote:
>>>
>>>
>>> On 7/7/2025 3:03 PM, Jacob Keller wrote:
>>>> Bad news: my hypothesis was incorrect.
>>>>
>>>> Good news: I can immediately see the problem if I set MTU to 9K and
>>>> start an iperf3 session and just watch the count of allocations from
>>>> ice_alloc_mapped_pages(). It goes up consistently, so I can quickly tell
>>>> if a change is helping.
>>>>
>>>> I ported the stats from i40e for tracking the page allocations, and I
>>>> can see that we're allocating new pages despite not actually performing
>>>> releases.
>>>>
>>>> I don't yet have a good understanding of what causes this, and the logic
>>>> in ice is pretty hard to track...
>>>>
>>>> I'm going to try the page pool patches myself to see if this test bed
>>>> triggers the same problems. Unfortunately I think I need someone else
>>>> with more experience with the hotpath code to help figure out whats
>>>> going wrong here...
>>>
>>> I believe I have isolated this and figured out the issue: With 9K MTU,
>>> sometimes the hardware posts a multi-buffer frame with an extra
>>> descriptor that has a size of 0 bytes with no data in it. When this
>>> happens, our logic for tracking buffers fails to free this buffer. We
>>> then later overwrite the page because we failed to either free or re-use
>>> the page, and our overwriting logic doesn't verify this.
>>>
>>> I will have a fix with a more detailed description posted tomorrow.
>>
>> @Jaroslav, I've posted a fix which I believe should resolve your issue:
>>
>> https://lore.kernel.org/intel-wired-lan/20250709-jk-ice-fix-rx-mem-leak-v1-1-cfdd7eeea905@intel.com/T/#u
>>
>> I am reasonably confident it should resolve the issue you reported. If
>> possible, it would be appreciated if you could test it and report back
>> to confirm.
>
> @Jacob that’s excellent news!
>
> I’ve built and installed 6.15.5 with your patch on one of our servers
> (strange that I had to disable CONFIG_MEM_ALLOC_PROFILING with this
> patch or the kernel wouldn’t boot) and started a VM running our
> production traffic. I’ll let it run for a day-two, observe the memory
> utilization per NUMA node and report back.
Great! A bit odd you had to disable CONFIG_MEM_ALLOC_PROFILING. I didn't
have trouble on my kernel with it enabled.
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 236 bytes --]
^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [Intel-wired-lan] Increased memory usage on NUMA nodes with ICE driver after upgrade to 6.13.y (regression in commit 492a044508ad)
2025-07-09 21:15 ` Jacob Keller
@ 2025-07-11 18:16 ` Jaroslav Pulchart
2025-07-11 22:30 ` Jacob Keller
0 siblings, 1 reply; 59+ messages in thread
From: Jaroslav Pulchart @ 2025-07-11 18:16 UTC (permalink / raw)
To: Jacob Keller
Cc: Maciej Fijalkowski, Jakub Kicinski, Przemek Kitszel,
intel-wired-lan@lists.osuosl.org, Damato, Joe,
netdev@vger.kernel.org, Nguyen, Anthony L, Michal Swiatkowski,
Czapnik, Lukasz, Dumazet, Eric, Zaki, Ahmed, Martin Karsten,
Igor Raits, Daniel Secik, Zdenek Pesek
[-- Attachment #1: Type: text/plain, Size: 2717 bytes --]
>
>
>
> On 7/9/2025 2:04 PM, Jaroslav Pulchart wrote:
> >>
> >>
> >> On 7/8/2025 5:50 PM, Jacob Keller wrote:
> >>>
> >>>
> >>> On 7/7/2025 3:03 PM, Jacob Keller wrote:
> >>>> Bad news: my hypothesis was incorrect.
> >>>>
> >>>> Good news: I can immediately see the problem if I set MTU to 9K and
> >>>> start an iperf3 session and just watch the count of allocations from
> >>>> ice_alloc_mapped_pages(). It goes up consistently, so I can quickly tell
> >>>> if a change is helping.
> >>>>
> >>>> I ported the stats from i40e for tracking the page allocations, and I
> >>>> can see that we're allocating new pages despite not actually performing
> >>>> releases.
> >>>>
> >>>> I don't yet have a good understanding of what causes this, and the logic
> >>>> in ice is pretty hard to track...
> >>>>
> >>>> I'm going to try the page pool patches myself to see if this test bed
> >>>> triggers the same problems. Unfortunately I think I need someone else
> >>>> with more experience with the hotpath code to help figure out whats
> >>>> going wrong here...
> >>>
> >>> I believe I have isolated this and figured out the issue: With 9K MTU,
> >>> sometimes the hardware posts a multi-buffer frame with an extra
> >>> descriptor that has a size of 0 bytes with no data in it. When this
> >>> happens, our logic for tracking buffers fails to free this buffer. We
> >>> then later overwrite the page because we failed to either free or re-use
> >>> the page, and our overwriting logic doesn't verify this.
> >>>
> >>> I will have a fix with a more detailed description posted tomorrow.
> >>
> >> @Jaroslav, I've posted a fix which I believe should resolve your issue:
> >>
> >> https://lore.kernel.org/intel-wired-lan/20250709-jk-ice-fix-rx-mem-leak-v1-1-cfdd7eeea905@intel.com/T/#u
> >>
> >> I am reasonably confident it should resolve the issue you reported. If
> >> possible, it would be appreciated if you could test it and report back
> >> to confirm.
> >
> > @Jacob that’s excellent news!
> >
> > I’ve built and installed 6.15.5 with your patch on one of our servers
> > (strange that I had to disable CONFIG_MEM_ALLOC_PROFILING with this
> > patch or the kernel wouldn’t boot) and started a VM running our
> > production traffic. I’ll let it run for a day-two, observe the memory
> > utilization per NUMA node and report back.
>
> Great! A bit odd you had to disable CONFIG_MEM_ALLOC_PROFILING. I didn't
> have trouble on my kernel with it enabled.
Status update after ~45h of uptime: so far so good. I do not see the
continuous increase in memory consumption on the home NUMA nodes that we
saw before. See the attached "status_before_after_45h_uptime.png" comparison.
[-- Attachment #2: status_before_after_45h_uptime.png --]
[-- Type: image/png, Size: 355801 bytes --]
^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [Intel-wired-lan] Increased memory usage on NUMA nodes with ICE driver after upgrade to 6.13.y (regression in commit 492a044508ad)
2025-07-11 18:16 ` Jaroslav Pulchart
@ 2025-07-11 22:30 ` Jacob Keller
2025-07-14 5:34 ` Jaroslav Pulchart
0 siblings, 1 reply; 59+ messages in thread
From: Jacob Keller @ 2025-07-11 22:30 UTC (permalink / raw)
To: Jaroslav Pulchart
Cc: Maciej Fijalkowski, Jakub Kicinski, Przemek Kitszel,
intel-wired-lan@lists.osuosl.org, Damato, Joe,
netdev@vger.kernel.org, Nguyen, Anthony L, Michal Swiatkowski,
Czapnik, Lukasz, Dumazet, Eric, Zaki, Ahmed, Martin Karsten,
Igor Raits, Daniel Secik, Zdenek Pesek
[-- Attachment #1.1: Type: text/plain, Size: 2901 bytes --]
On 7/11/2025 11:16 AM, Jaroslav Pulchart wrote:
>>
>>
>>
>> On 7/9/2025 2:04 PM, Jaroslav Pulchart wrote:
>>>>
>>>>
>>>> On 7/8/2025 5:50 PM, Jacob Keller wrote:
>>>>>
>>>>>
>>>>> On 7/7/2025 3:03 PM, Jacob Keller wrote:
>>>>>> Bad news: my hypothesis was incorrect.
>>>>>>
>>>>>> Good news: I can immediately see the problem if I set MTU to 9K and
>>>>>> start an iperf3 session and just watch the count of allocations from
>>>>>> ice_alloc_mapped_pages(). It goes up consistently, so I can quickly tell
>>>>>> if a change is helping.
>>>>>>
>>>>>> I ported the stats from i40e for tracking the page allocations, and I
>>>>>> can see that we're allocating new pages despite not actually performing
>>>>>> releases.
>>>>>>
>>>>>> I don't yet have a good understanding of what causes this, and the logic
>>>>>> in ice is pretty hard to track...
>>>>>>
>>>>>> I'm going to try the page pool patches myself to see if this test bed
>>>>>> triggers the same problems. Unfortunately I think I need someone else
>>>>>> with more experience with the hotpath code to help figure out whats
>>>>>> going wrong here...
>>>>>
>>>>> I believe I have isolated this and figured out the issue: With 9K MTU,
>>>>> sometimes the hardware posts a multi-buffer frame with an extra
>>>>> descriptor that has a size of 0 bytes with no data in it. When this
>>>>> happens, our logic for tracking buffers fails to free this buffer. We
>>>>> then later overwrite the page because we failed to either free or re-use
>>>>> the page, and our overwriting logic doesn't verify this.
>>>>>
>>>>> I will have a fix with a more detailed description posted tomorrow.
>>>>
>>>> @Jaroslav, I've posted a fix which I believe should resolve your issue:
>>>>
>>>> https://lore.kernel.org/intel-wired-lan/20250709-jk-ice-fix-rx-mem-leak-v1-1-cfdd7eeea905@intel.com/T/#u
>>>>
>>>> I am reasonably confident it should resolve the issue you reported. If
>>>> possible, it would be appreciated if you could test it and report back
>>>> to confirm.
>>>
>>> @Jacob that’s excellent news!
>>>
>>> I’ve built and installed 6.15.5 with your patch on one of our servers
>>> (strange that I had to disable CONFIG_MEM_ALLOC_PROFILING with this
>>> patch or the kernel wouldn’t boot) and started a VM running our
>>> production traffic. I’ll let it run for a day-two, observe the memory
>>> utilization per NUMA node and report back.
>>
>> Great! A bit odd you had to disable CONFIG_MEM_ALLOC_PROFILING. I didn't
>> have trouble on my kernel with it enabled.
>
> Status update after ~45h of uptime. So far so good, I do not see
> continuous memory consumption increase on home numa nodes like before.
> See attached "status_before_after_45h_uptime.png" comparison.
Great news! Would you like your "Tested-by" to be added to the commit
message when we submit the fix to netdev?
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 236 bytes --]
^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [Intel-wired-lan] Increased memory usage on NUMA nodes with ICE driver after upgrade to 6.13.y (regression in commit 492a044508ad)
2025-07-11 22:30 ` Jacob Keller
@ 2025-07-14 5:34 ` Jaroslav Pulchart
0 siblings, 0 replies; 59+ messages in thread
From: Jaroslav Pulchart @ 2025-07-14 5:34 UTC (permalink / raw)
To: Jacob Keller
Cc: Maciej Fijalkowski, Jakub Kicinski, Przemek Kitszel,
intel-wired-lan@lists.osuosl.org, Damato, Joe,
netdev@vger.kernel.org, Nguyen, Anthony L, Michal Swiatkowski,
Czapnik, Lukasz, Dumazet, Eric, Zaki, Ahmed, Martin Karsten,
Igor Raits, Daniel Secik, Zdenek Pesek
>
> On 7/11/2025 11:16 AM, Jaroslav Pulchart wrote:
> >>
> >>
> >>
> >> On 7/9/2025 2:04 PM, Jaroslav Pulchart wrote:
> >>>>
> >>>>
> >>>> On 7/8/2025 5:50 PM, Jacob Keller wrote:
> >>>>>
> >>>>>
> >>>>> On 7/7/2025 3:03 PM, Jacob Keller wrote:
> >>>>>> Bad news: my hypothesis was incorrect.
> >>>>>>
> >>>>>> Good news: I can immediately see the problem if I set MTU to 9K and
> >>>>>> start an iperf3 session and just watch the count of allocations from
> >>>>>> ice_alloc_mapped_pages(). It goes up consistently, so I can quickly tell
> >>>>>> if a change is helping.
> >>>>>>
> >>>>>> I ported the stats from i40e for tracking the page allocations, and I
> >>>>>> can see that we're allocating new pages despite not actually performing
> >>>>>> releases.
> >>>>>>
> >>>>>> I don't yet have a good understanding of what causes this, and the logic
> >>>>>> in ice is pretty hard to track...
> >>>>>>
> >>>>>> I'm going to try the page pool patches myself to see if this test bed
> >>>>>> triggers the same problems. Unfortunately I think I need someone else
> >>>>>> with more experience with the hotpath code to help figure out whats
> >>>>>> going wrong here...
> >>>>>
> >>>>> I believe I have isolated this and figured out the issue: With 9K MTU,
> >>>>> sometimes the hardware posts a multi-buffer frame with an extra
> >>>>> descriptor that has a size of 0 bytes with no data in it. When this
> >>>>> happens, our logic for tracking buffers fails to free this buffer. We
> >>>>> then later overwrite the page because we failed to either free or re-use
> >>>>> the page, and our overwriting logic doesn't verify this.
> >>>>>
> >>>>> I will have a fix with a more detailed description posted tomorrow.
> >>>>
> >>>> @Jaroslav, I've posted a fix which I believe should resolve your issue:
> >>>>
> >>>> https://lore.kernel.org/intel-wired-lan/20250709-jk-ice-fix-rx-mem-leak-v1-1-cfdd7eeea905@intel.com/T/#u
> >>>>
> >>>> I am reasonably confident it should resolve the issue you reported. If
> >>>> possible, it would be appreciated if you could test it and report back
> >>>> to confirm.
> >>>
> >>> @Jacob that’s excellent news!
> >>>
> >>> I’ve built and installed 6.15.5 with your patch on one of our servers
> >>> (strange that I had to disable CONFIG_MEM_ALLOC_PROFILING with this
> >>> patch or the kernel wouldn’t boot) and started a VM running our
> >>> production traffic. I’ll let it run for a day-two, observe the memory
> >>> utilization per NUMA node and report back.
> >>
> >> Great! A bit odd you had to disable CONFIG_MEM_ALLOC_PROFILING. I didn't
> >> have trouble on my kernel with it enabled.
> >
> > Status update after ~45h of uptime. So far so good, I do not see
> > continuous memory consumption increase on home numa nodes like before.
> > See attached "status_before_after_45h_uptime.png" comparison.
>
> Great news! Would you like your "Tested-by" being added to the commit
> message when we submit the fix to netdev?
Jacob, absolutely.
^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [Intel-wired-lan] Increased memory usage on NUMA nodes with ICE driver after upgrade to 6.13.y (regression in commit 492a044508ad)
2025-06-25 12:17 ` Jaroslav Pulchart
2025-06-25 14:03 ` Przemek Kitszel
@ 2025-06-25 14:53 ` Paul Menzel
1 sibling, 0 replies; 59+ messages in thread
From: Paul Menzel @ 2025-06-25 14:53 UTC (permalink / raw)
To: Jaroslav Pulchart
Cc: Jacob E Keller, Jakub Kicinski, Przemyslaw Kitszel, Joe Damato,
intel-wired-lan, netdev, Anthony L Nguyen, Michal Swiatkowski,
Lukasz Czapnik, Eric Dumazet, Ahmed Zaki, Martin Karsten,
Igor Raits, Daniel Secik, Zdenek Pesek, regressions
Dear Jaroslav,
On 25.06.25 at 14:17, Jaroslav Pulchart wrote:
> We are still facing the memory issue with Intel 810 NICs (even on latest
> 6.15.y).
Commit 492a044508ad13 ("ice: Add support for persistent NAPI config")
was added in Linux v6.13-rc1. As no fix has been presented so far, but
reverting it fixes your issue, I strongly recommend sending a revert.
I have no idea whether it is compiler-dependent or what else could be
the issue, but due to Linux’ no-regression policy this should be
reverted as soon as possible.
Kind regards,
Paul
^ permalink raw reply [flat|nested] 59+ messages in thread