From: Jacob Keller <jacob.e.keller@intel.com>
To: Jaroslav Pulchart <jaroslav.pulchart@gooddata.com>
Cc: Maciej Fijalkowski <maciej.fijalkowski@intel.com>,
Jakub Kicinski <kuba@kernel.org>,
Przemek Kitszel <przemyslaw.kitszel@intel.com>,
"intel-wired-lan@lists.osuosl.org"
<intel-wired-lan@lists.osuosl.org>,
"Damato, Joe" <jdamato@fastly.com>,
"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
"Nguyen, Anthony L" <anthony.l.nguyen@intel.com>,
Michal Swiatkowski <michal.swiatkowski@linux.intel.com>,
"Czapnik, Lukasz" <lukasz.czapnik@intel.com>,
"Dumazet, Eric" <edumazet@google.com>,
"Zaki, Ahmed" <ahmed.zaki@intel.com>,
Martin Karsten <mkarsten@uwaterloo.ca>,
"Igor Raits" <igor@gooddata.com>,
Daniel Secik <daniel.secik@gooddata.com>,
"Zdenek Pesek" <zdenek.pesek@gooddata.com>
Subject: Re: [Intel-wired-lan] Increased memory usage on NUMA nodes with ICE driver after upgrade to 6.13.y (regression in commit 492a044508ad)
Date: Mon, 30 Jun 2025 13:42:20 -0700 [thread overview]
Message-ID: <48324fdb-59f5-4113-87cd-c3e6ad7560ec@intel.com> (raw)
In-Reply-To: <CAK8fFZ6FU1+1__FndEoFQgHqSXN+330qvNTWMvMfiXc2DpN8NQ@mail.gmail.com>
On 6/30/2025 1:01 PM, Jaroslav Pulchart wrote:
>>
>>
>>
>> On 6/30/2025 10:24 AM, Jaroslav Pulchart wrote:
>>>>
>>>>
>>>>
>>>> On 6/30/2025 12:35 AM, Jaroslav Pulchart wrote:
>>>>>>
>>>>>>>
>>>>>>> On Wed, 25 Jun 2025 19:51:08 +0200 Jaroslav Pulchart wrote:
>>>>>>>> Great, please send me a link to the related patch set. I can apply them in
>>>>>>>> our kernel build and try them ASAP!
>>>>>>>
>>>>>>> Sorry if I'm repeating the question - have you tried
>>>>>>> CONFIG_MEM_ALLOC_PROFILING? Reportedly the overhead in recent kernels
>>>>>>> is low enough to use it for production workloads.
>>>>>>
>>>>>> I'm trying it now; here is the freshly booted server:
>>>>>>
>>>>>> # sort -g /proc/allocinfo| tail -n 15
>>>>>> 45409728 236509 fs/dcache.c:1681 func:__d_alloc
>>>>>> 71041024 17344 mm/percpu-vm.c:95 func:pcpu_alloc_pages
>>>>>> 71524352 11140 kernel/dma/direct.c:141 func:__dma_direct_alloc_pages
>>>>>> 85098496 4486 mm/slub.c:2452 func:alloc_slab_page
>>>>>> 115470992 101647 fs/ext4/super.c:1388 [ext4] func:ext4_alloc_inode
>>>>>> 134479872 32832 kernel/events/ring_buffer.c:811 func:perf_mmap_alloc_page
>>>>>> 141426688 34528 mm/filemap.c:1978 func:__filemap_get_folio
>>>>>> 191594496 46776 mm/memory.c:1056 func:folio_prealloc
>>>>>> 360710144 172 mm/khugepaged.c:1084 func:alloc_charge_folio
>>>>>> 444076032 33790 mm/slub.c:2450 func:alloc_slab_page
>>>>>> 530579456 129536 mm/page_ext.c:271 func:alloc_page_ext
>>>>>> 975175680 465 mm/huge_memory.c:1165 func:vma_alloc_anon_folio_pmd
>>>>>> 1022427136 249616 mm/memory.c:1054 func:folio_prealloc
>>>>>> 1105125376 139252 drivers/net/ethernet/intel/ice/ice_txrx.c:681
>>>>>> [ice] func:ice_alloc_mapped_page
>>>>>> 1621598208 395848 mm/readahead.c:186 func:ractl_alloc_folio
>>>>>>
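A quick way to track a single allocation site from that output over time is to sum its bytes/count columns; this is just a sketch, demonstrated against two lines captured from the listing above (on a live box you would pipe /proc/allocinfo in directly):

```shell
# Sketch: extract total bytes and allocation count for one allocation
# site from /proc/allocinfo-formatted input (columns: bytes count site func).
site_usage() {
    grep "$1" | awk '{ bytes += $1; count += $2 } END { print bytes, count }'
}

# On a live box: site_usage ice_alloc_mapped_page < /proc/allocinfo
# Demo against two captured lines from the listing above:
printf '%s\n' \
  '1105125376 139252 drivers/net/ethernet/intel/ice/ice_txrx.c:681 [ice] func:ice_alloc_mapped_page' \
  '1621598208 395848 mm/readahead.c:186 func:ractl_alloc_folio' \
  | site_usage ice_alloc_mapped_page
# -> 1105125376 139252
```

Running that periodically (e.g. under `watch`) makes the growth of one site easy to see without eyeballing the whole table.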
>>>>>
>>>>> The "drivers/net/ethernet/intel/ice/ice_txrx.c:681 [ice]
>>>>> func:ice_alloc_mapped_page" entry just keeps growing...
>>>>>
>>>>> # uptime ; sort -g /proc/allocinfo| tail -n 15
>>>>> 09:33:58 up 4 days, 6 min, 1 user, load average: 6.65, 8.18, 9.81
>>>>>
>>>>> # sort -g /proc/allocinfo| tail -n 15
>>>>> 85216896 443838 fs/dcache.c:1681 func:__d_alloc
>>>>> 106156032 25917 mm/shmem.c:1854 func:shmem_alloc_folio
>>>>> 116850096 102861 fs/ext4/super.c:1388 [ext4] func:ext4_alloc_inode
>>>>> 134479872 32832 kernel/events/ring_buffer.c:811 func:perf_mmap_alloc_page
>>>>> 143556608 6894 mm/slub.c:2452 func:alloc_slab_page
>>>>> 186793984 45604 mm/memory.c:1056 func:folio_prealloc
>>>>> 362807296 88576 mm/percpu-vm.c:95 func:pcpu_alloc_pages
>>>>> 530579456 129536 mm/page_ext.c:271 func:alloc_page_ext
>>>>> 598237184 51309 mm/slub.c:2450 func:alloc_slab_page
>>>>> 838860800 400 mm/huge_memory.c:1165 func:vma_alloc_anon_folio_pmd
>>>>> 929083392 226827 mm/filemap.c:1978 func:__filemap_get_folio
>>>>> 1034657792 252602 mm/memory.c:1054 func:folio_prealloc
>>>>> 1262485504 602 mm/khugepaged.c:1084 func:alloc_charge_folio
>>>>> 1335377920 325970 mm/readahead.c:186 func:ractl_alloc_folio
>>>>> 2544877568 315003 drivers/net/ethernet/intel/ice/ice_txrx.c:681
>>>>> [ice] func:ice_alloc_mapped_page
>>>>>
>>>> ice_alloc_mapped_page is the function used to allocate the pages for the
>>>> Rx ring buffers.
>>>>
>>>> There were a number of fixes for the hot path from Maciej which might be
>>>> related. Although those fixes were primarily for XDP they do impact the
>>>> regular hot path as well.
>>>>
>>>> These were fixes on top of work he did which landed in v6.13, so it
>>>> seems plausible they might be related. In particular one which mentions
>>>> a missing buffer put:
>>>>
>>>> 743bbd93cf29 ("ice: put Rx buffers after being done with current frame")
>>>>
>>>> It says the following:
>>>>> While at it, address an error path of ice_add_xdp_frag() - we were
>>>>> missing buffer putting from day 1 there.
>>>>>
>>>>
>>>> It seems to me the issue must be somehow related to the buffer cleanup
>>>> logic for the Rx ring, since that's the only thing allocated by
>>>> ice_alloc_mapped_page.
>>>>
>>>> It might be something fixed by Maciej's work... but it seems very
>>>> weird that 492a044508ad ("ice: Add support for persistent NAPI config")
>>>> would affect that logic at all.
>>>
>>> I believe there were/are at least two separate issues. Regarding
>>> commit 492a044508ad (“ice: Add support for persistent NAPI config”):
>>> * On 6.13.y and 6.14.y kernels, this change prevented us from lowering
>>> the driver's initial, large memory allocation immediately after server
>>> power-up. A few hours (at most a few days) later, this inevitably led
>>> to an out-of-memory condition.
>>> * Reverting the commit on those series only delayed the OOM: it
>>> allowed the queue size (and thus the memory footprint) to shrink after
>>> boot just as it did in 6.12.y, but didn't eliminate the underlying
>>> 'leak'.
>>> * On 6.15.y, however, that revert isn't required (and isn't even
>>> applicable). The after-boot allocation can once again be tuned down
>>> without patching. Still, we observe the same increase in memory use
>>> over time, as shown in the 'allocinfo' output.
>>> Thus, commit 492a044508ad led us down a false trail, or at the very
>>> least hastened the inevitable OOM.
>>
>> That seems reasonable. I'm still surprised the specific commit leads to
>> any large increase in memory, since it should only be a few bytes per
>> NAPI. But there may be some related driver-specific issues.
>
> Actually, the large base allocation has existed for quite some time.
> The mentioned commit didn't suddenly grow our memory usage; it only
> prevented us from shrinking it via "ethtool -L <iface> combined
> <small-number>" after boot. In other words, we're still stuck with the
> same big allocation, we just can't tune it down (until the commit is
> reverted).
>
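For reference, the post-boot tuning being described is just a channel-count reduction along these lines (the interface name and count are placeholders, not taken from the report):

```shell
# Show current vs. maximum channel counts for the interface.
ethtool -l eth0

# Shrink the combined Rx/Tx queue pairs, releasing their ring buffers.
ethtool -L eth0 combined 8
```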
Yes. My point is that I still don't understand the mechanism by which
that change *prevents* ethtool -L from working as you describe.
>>
>> Either way, we clearly need to isolate how we're leaking memory in the
>> hot path. I think it might be related to the fixes from Maciej, which
>> are pretty recent, so they might not be in 6.13 or 6.14.
>
> I'm fine with a fix in mainline (now 6.15.y); 6.13.y and 6.14.y are
> already EOL. Could you please tell me which 6.15.y stable release first
> incorporates that patch? Is it included in the current 6.15.5, or will
> it arrive in a later point release?
I'm not certain this fix actually resolves your issue, but I will
figure out which stable kernels have it shortly.
Thread overview: 46+ messages
2025-04-14 16:29 Increased memory usage on NUMA nodes with ICE driver after upgrade to 6.13.y (regression in commit 492a044508ad) Jaroslav Pulchart
2025-04-14 17:15 ` [Intel-wired-lan] " Paul Menzel
2025-04-15 14:38 ` Przemek Kitszel
2025-04-16 0:53 ` Jakub Kicinski
2025-04-16 7:13 ` Jaroslav Pulchart
2025-04-16 13:48 ` Jakub Kicinski
2025-04-16 16:03 ` Jaroslav Pulchart
2025-04-16 22:44 ` Jakub Kicinski
2025-04-16 22:57 ` [Intel-wired-lan] " Keller, Jacob E
2025-04-16 22:57 ` Keller, Jacob E
2025-04-17 0:13 ` Jakub Kicinski
2025-04-17 17:52 ` Keller, Jacob E
2025-05-21 10:50 ` Jaroslav Pulchart
2025-06-04 8:42 ` Jaroslav Pulchart
[not found] ` <CAK8fFZ5XTO9dGADuMSV0hJws-6cZE9equa3X6dfTBgDyzE1pEQ@mail.gmail.com>
2025-06-25 14:03 ` Przemek Kitszel
[not found] ` <CAK8fFZ7LREBEdhXjBAKuaqktOz1VwsBTxcCpLBsa+dkMj4Pyyw@mail.gmail.com>
2025-06-25 20:25 ` Jakub Kicinski
2025-06-26 7:42 ` Jaroslav Pulchart
2025-06-30 7:35 ` Jaroslav Pulchart
2025-06-30 16:02 ` Jacob Keller
2025-06-30 17:24 ` Jaroslav Pulchart
2025-06-30 18:59 ` Jacob Keller
2025-06-30 20:01 ` Jaroslav Pulchart
2025-06-30 20:42 ` Jacob Keller [this message]
2025-06-30 21:56 ` Jacob Keller
2025-06-30 23:16 ` Jacob Keller
2025-07-01 6:48 ` Jaroslav Pulchart
2025-07-01 20:48 ` Jacob Keller
2025-07-02 9:48 ` Jaroslav Pulchart
2025-07-02 18:01 ` Jacob Keller
2025-07-02 21:56 ` Jacob Keller
2025-07-03 6:46 ` Jaroslav Pulchart
2025-07-03 16:16 ` Jacob Keller
2025-07-04 19:30 ` Maciej Fijalkowski
2025-07-07 18:32 ` Jacob Keller
2025-07-07 22:03 ` Jacob Keller
2025-07-09 0:50 ` Jacob Keller
2025-07-09 19:11 ` Jacob Keller
2025-07-09 21:04 ` Jaroslav Pulchart
2025-07-09 21:15 ` Jacob Keller
2025-07-11 18:16 ` Jaroslav Pulchart
2025-07-11 22:30 ` Jacob Keller
2025-07-14 5:34 ` Jaroslav Pulchart
2025-06-25 14:53 ` Paul Menzel
2025-07-04 16:55 ` Michal Kubiak
2025-07-05 7:01 ` Jaroslav Pulchart
2025-07-07 15:37 ` Jaroslav Pulchart