From: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
To: Jacob Keller <jacob.e.keller@intel.com>
Cc: Jaroslav Pulchart <jaroslav.pulchart@gooddata.com>,
Jakub Kicinski <kuba@kernel.org>,
Przemek Kitszel <przemyslaw.kitszel@intel.com>,
"intel-wired-lan@lists.osuosl.org"
<intel-wired-lan@lists.osuosl.org>,
"Damato, Joe" <jdamato@fastly.com>,
"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
"Nguyen, Anthony L" <anthony.l.nguyen@intel.com>,
Michal Swiatkowski <michal.swiatkowski@linux.intel.com>,
"Czapnik, Lukasz" <lukasz.czapnik@intel.com>,
"Dumazet, Eric" <edumazet@google.com>,
"Zaki, Ahmed" <ahmed.zaki@intel.com>,
Martin Karsten <mkarsten@uwaterloo.ca>,
"Igor Raits" <igor@gooddata.com>,
Daniel Secik <daniel.secik@gooddata.com>,
"Zdenek Pesek" <zdenek.pesek@gooddata.com>
Subject: Re: [Intel-wired-lan] Increased memory usage on NUMA nodes with ICE driver after upgrade to 6.13.y (regression in commit 492a044508ad)
Date: Fri, 4 Jul 2025 21:30:02 +0200
Message-ID: <aGgruu0EWqQnVRd8@boxer>
In-Reply-To: <aff93c23-4f46-4d52-bdaa-9ed365e87782@intel.com>
On Thu, Jul 03, 2025 at 09:16:35AM -0700, Jacob Keller wrote:
>
>
> On 7/2/2025 11:46 PM, Jaroslav Pulchart wrote:
> >>
> >> On 7/2/2025 2:48 AM, Jaroslav Pulchart wrote:
> >>>>
> >>>> On 6/30/2025 11:48 PM, Jaroslav Pulchart wrote:
> >>>>>> On 6/30/2025 2:56 PM, Jacob Keller wrote:
> >>>>>>> Unfortunately it looks like the fix I mentioned has landed in 6.14, so
> >>>>>>> its not a fix for your issue (since you mentioned 6.14 has failed
> >>>>>>> testing in your system)
> >>>>>>>
> >>>>>>> $ git describe --first-parent --contains --match=v* --exclude=*rc*
> >>>>>>> 743bbd93cf29f653fae0e1416a31f03231689911
> >>>>>>> v6.14~251^2~15^2~2
> >>>>>>>
> >>>>>>> I don't see any other relevant changes since v6.14. I can try to see if
> >>>>>>> I see similar issues with CONFIG_MEM_ALLOC_PROFILING on some test
> >>>>>>> systems here.
> >>>>>>
> >>>>>> On my system I see this at boot after loading the ice module from
> >>>>>>
> >>>>>> $ grep -F "/ice/" /proc/allocinfo | sort -g | tail | numfmt --to=iec>
> >>>>>> 26K 230 drivers/net/ethernet/intel/ice/ice_irq.c:84 [ice] func:ice_get_irq_res
> >>>>>> 48K 2 drivers/net/ethernet/intel/ice/ice_arfs.c:565 [ice] func:ice_init_arfs
> >>>>>> 57K 226 drivers/net/ethernet/intel/ice/ice_lib.c:397 [ice] func:ice_vsi_alloc_ring_stats
> >>>>>> 57K 226 drivers/net/ethernet/intel/ice/ice_lib.c:416 [ice] func:ice_vsi_alloc_ring_stats
> >>>>>> 85K 226 drivers/net/ethernet/intel/ice/ice_lib.c:1398 [ice] func:ice_vsi_alloc_rings
> >>>>>> 339K 226 drivers/net/ethernet/intel/ice/ice_lib.c:1422 [ice] func:ice_vsi_alloc_rings
> >>>>>> 678K 226 drivers/net/ethernet/intel/ice/ice_base.c:109 [ice] func:ice_vsi_alloc_q_vector
> >>>>>> 1.1M 257 drivers/net/ethernet/intel/ice/ice_fwlog.c:40 [ice] func:ice_fwlog_alloc_ring_buffs
> >>>>>> 7.2M 114 drivers/net/ethernet/intel/ice/ice_txrx.c:493 [ice] func:ice_setup_rx_ring
> >>>>>> 896M 229264 drivers/net/ethernet/intel/ice/ice_txrx.c:680 [ice] func:ice_alloc_mapped_page
> >>>>>>
> >>>>>> Its about 1GB for the mapped pages. I don't see any increase moment to
> >>>>>> moment. I've started an iperf session to simulate some traffic, and I'll
> >>>>>> leave this running to see if anything changes overnight.
> >>>>>>
> >>>>>> Is there anything else that you can share about the traffic setup or
> >>>>>> otherwise that I could look into? Your system seems to use ~2.5 x the
> >>>>>> buffer size as mine, but that might just be a smaller number of CPUs.
> >>>>>>
> >>>>>> Hopefully I'll get some more results overnight.
> >>>>>
> >>>>> The traffic is random production workloads from VMs, using standard
> >>>>> Linux or OVS bridges. There is no specific pattern to it. I haven’t
> >>>>> had any luck reproducing (or was not patient enough) this with iperf3
> >>>>> myself. The two active (UP) interfaces are in an LACP bonding setup.
> >>>>> Here are our ethtool settings for the two member ports (em1 and p3p1)
> >>>>>
> >>>>
> >>>> I had iperf3 running overnight and the memory usage for
> >>>> ice_alloc_mapped_pages is constant here. Mine was direct connections
> >>>> without bridge or bonding. From your description I assume there's no XDP
> >>>> happening either.
> >>>
> >>> Yes, no XDP in use.
> >>>
> >>> BTW the allocinfo after 6days uptime:
> >>> # uptime ; sort -g /proc/allocinfo| tail -n 15
> >>> 11:46:44 up 6 days, 2:18, 1 user, load average: 9.24, 11.33, 15.07
> >>> 102489024 533797 fs/dcache.c:1681 func:__d_alloc
> >>> 106229760 25935 mm/shmem.c:1854 func:shmem_alloc_folio
> >>> 117118192 103097 fs/ext4/super.c:1388 [ext4] func:ext4_alloc_inode
> >>> 134479872 32832 kernel/events/ring_buffer.c:811 func:perf_mmap_alloc_page
> >>> 162783232 7656 mm/slub.c:2452 func:alloc_slab_page
> >>> 189906944 46364 mm/memory.c:1056 func:folio_prealloc
> >>> 499384320 121920 mm/percpu-vm.c:95 func:pcpu_alloc_pages
> >>> 530579456 129536 mm/page_ext.c:271 func:alloc_page_ext
> >>> 625876992 54186 mm/slub.c:2450 func:alloc_slab_page
> >>> 838860800 400 mm/huge_memory.c:1165 func:vma_alloc_anon_folio_pmd
> >>> 1014710272 247732 mm/filemap.c:1978 func:__filemap_get_folio
> >>> 1056710656 257986 mm/memory.c:1054 func:folio_prealloc
> >>> 1279262720 610 mm/khugepaged.c:1084 func:alloc_charge_folio
> >>> 1334530048 325763 mm/readahead.c:186 func:ractl_alloc_folio
> >>> 3341238272 412215 drivers/net/ethernet/intel/ice/ice_txrx.c:681 [ice] func:ice_alloc_mapped_page
> >>>
> >> I have a suspicion that the issue is related to the updating of
> >> page_count in ice_get_rx_pgcnt(). The i40e driver has a very similar
> >> logic for page reuse but doesn't do this. It also has a counter to track
> >> failure to re-use the Rx pages.
> >>
> >> Commit 11c4aa074d54 ("ice: gather page_count()'s of each frag right
> >> before XDP prog call") changed the logic to update page_count of the Rx
> >> page just prior to the XDP call instead of at the point where we get the
> >> page from ice_get_rx_buf(). I think this change was originally
> >> introduced while we were trying out an experimental refactor of the
> >> hotpath to handle fragments differently, which no longer happens since
> >> 743bbd93cf29 ("ice: put Rx buffers after being done with current
> >> frame"), which ironically was part of this very same series..
> >>
> >> I think this updating of page count is accidentally causing us to
> >> miscount when we could perform page-reuse, and ultimately causes us to
> >> leak the page somehow. I'm still investigating, but I think this might
> >> trigger if somehow the page pgcnt - pagecnt_bias becomes >1, we don't
> >> reuse the page.
> >>
> >> The i40e driver stores the page count in i40e_get_rx_buffer, and I think
> >> our updating it later can somehow get things out-of-sync.
> >>
> >> Do you know if your traffic pattern happens to send fragmented frames? I
> >
> > Hmm, I check the
> > * node_netstat_Ip_Frag* metrics and they are empty(do-not-exists),
> > * shortly run "tcpdump -n -i any 'ip[6:2] & 0x3fff != 0'" and nothing was found
> > looks to me like there is no fragmentation.
> >
>
> Good to rule it out at least.
>
> >> think iperf doesn't do that, which might be part of whats causing this
> >> issue. I'm going to try to see if I can generate such fragmentation to
> >> confirm. Is your MTU kept at the default ethernet size?
> >
> > Our MTU size is set to 9000 everywhere.
> >
>
> Ok. I am re-trying with MTU 9000 and using some traffic generated by wrk
> now. I do see much larger memory use (~2GB) when using MTU 9000, so that
> tracks with what your system shows. Currently its fluctuating between
> 1.9 and 2G. I'll leave this going for a couple of days while on vacation
> and see if anything pops up.
I was wondering whether order-1 pages might be causing the mess here for
some reason, since for 9k MTU we pull order-1 pages and split them in half.
Maybe it would be worth checking whether legacy-rx (which works on
order-0 pages) shows the same issue? That would require 8k MTU, though.
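
To make the order-1 idea a bit more concrete, here is a rough toy model in
Python. It is not driver code. The function names, the 4 KiB base page size
and the 3072-byte jumbo buffer length are my assumptions for illustration
only; the reuse rule is just the one described earlier in this thread
(no reuse once page_count - pagecnt_bias goes above 1).

```python
# Toy model only: names and constants are illustrative assumptions,
# not copied from the ice driver.
PAGE_SIZE = 4096  # assumed base page size


def rx_page_order(rx_buf_len: int) -> int:
    """Order-1 is needed once a buffer no longer fits in half of an order-0 page."""
    return 1 if rx_buf_len > PAGE_SIZE // 2 else 0


def half_page_size(order: int) -> int:
    """Each Rx page is split in half; this is the size of one half."""
    return (PAGE_SIZE << order) // 2


def can_reuse(page_count: int, pagecnt_bias: int) -> bool:
    """Reuse rule as described in this thread: the page is not recycled
    once page_count - pagecnt_bias exceeds 1."""
    return page_count - pagecnt_bias <= 1


# 2048 stands in for a small-buffer setup, 3072 is an assumed jumbo buffer length.
for label, buf_len in (("small buffers", 2048), ("jumbo buffers", 3072)):
    order = rx_page_order(buf_len)
    print(f"{label}: order-{order} page, {half_page_size(order)} bytes per half")
```

If the order-0 path stays flat under the same traffic while the order-1
path keeps growing, that would at least hint that the split/reuse accounting
is worth a closer look, though it would not prove it.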
>
> Thanks,
> Jake