From: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
To: Jacob Keller <jacob.e.keller@intel.com>
Cc: Jaroslav Pulchart <jaroslav.pulchart@gooddata.com>,
	Jakub Kicinski <kuba@kernel.org>,
	Przemek Kitszel <przemyslaw.kitszel@intel.com>,
	"intel-wired-lan@lists.osuosl.org"
	<intel-wired-lan@lists.osuosl.org>,
	"Damato, Joe" <jdamato@fastly.com>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	"Nguyen, Anthony L" <anthony.l.nguyen@intel.com>,
	Michal Swiatkowski <michal.swiatkowski@linux.intel.com>,
	"Czapnik, Lukasz" <lukasz.czapnik@intel.com>,
	"Dumazet, Eric" <edumazet@google.com>,
	"Zaki, Ahmed" <ahmed.zaki@intel.com>,
	Martin Karsten <mkarsten@uwaterloo.ca>,
	"Igor Raits" <igor@gooddata.com>,
	Daniel Secik <daniel.secik@gooddata.com>,
	"Zdenek Pesek" <zdenek.pesek@gooddata.com>
Subject: Re: [Intel-wired-lan] Increased memory usage on NUMA nodes with ICE driver after upgrade to 6.13.y (regression in commit 492a044508ad)
Date: Fri, 4 Jul 2025 21:30:02 +0200
Message-ID: <aGgruu0EWqQnVRd8@boxer>
In-Reply-To: <aff93c23-4f46-4d52-bdaa-9ed365e87782@intel.com>

On Thu, Jul 03, 2025 at 09:16:35AM -0700, Jacob Keller wrote:
> 
> 
> On 7/2/2025 11:46 PM, Jaroslav Pulchart wrote:
> >>
> >> On 7/2/2025 2:48 AM, Jaroslav Pulchart wrote:
> >>>>
> >>>> On 6/30/2025 11:48 PM, Jaroslav Pulchart wrote:
> >>>>>> On 6/30/2025 2:56 PM, Jacob Keller wrote:
> >>>>>>> Unfortunately it looks like the fix I mentioned has landed in 6.14, so
> >>>>>>> it's not a fix for your issue (since you mentioned 6.14 has failed
> >>>>>>> testing in your system)
> >>>>>>>
> >>>>>>> $ git describe --first-parent --contains --match=v* --exclude=*rc*
> >>>>>>> 743bbd93cf29f653fae0e1416a31f03231689911
> >>>>>>> v6.14~251^2~15^2~2
> >>>>>>>
> >>>>>>> I don't see any other relevant changes since v6.14. I can try to see if
> >>>>>>> I see similar issues with CONFIG_MEM_ALLOC_PROFILING on some test
> >>>>>>> systems here.
> >>>>>>
> >>>>>> On my system I see this at boot after loading the ice module from
> >>>>>>
> >>>>>> $ grep -F "/ice/" /proc/allocinfo | sort -g | tail | numfmt --to=iec
> >>>>>>          26K      230 drivers/net/ethernet/intel/ice/ice_irq.c:84 [ice] func:ice_get_irq_res
> >>>>>>          48K        2 drivers/net/ethernet/intel/ice/ice_arfs.c:565 [ice] func:ice_init_arfs
> >>>>>>          57K      226 drivers/net/ethernet/intel/ice/ice_lib.c:397 [ice] func:ice_vsi_alloc_ring_stats
> >>>>>>          57K      226 drivers/net/ethernet/intel/ice/ice_lib.c:416 [ice] func:ice_vsi_alloc_ring_stats
> >>>>>>          85K      226 drivers/net/ethernet/intel/ice/ice_lib.c:1398 [ice] func:ice_vsi_alloc_rings
> >>>>>>         339K      226 drivers/net/ethernet/intel/ice/ice_lib.c:1422 [ice] func:ice_vsi_alloc_rings
> >>>>>>         678K      226 drivers/net/ethernet/intel/ice/ice_base.c:109 [ice] func:ice_vsi_alloc_q_vector
> >>>>>>         1.1M      257 drivers/net/ethernet/intel/ice/ice_fwlog.c:40 [ice] func:ice_fwlog_alloc_ring_buffs
> >>>>>>         7.2M      114 drivers/net/ethernet/intel/ice/ice_txrx.c:493 [ice] func:ice_setup_rx_ring
> >>>>>>         896M   229264 drivers/net/ethernet/intel/ice/ice_txrx.c:680 [ice] func:ice_alloc_mapped_page
> >>>>>>
> >>>>>> It's about 1GB for the mapped pages. I don't see any increase moment to
> >>>>>> moment. I've started an iperf session to simulate some traffic, and I'll
> >>>>>> leave this running to see if anything changes overnight.
> >>>>>>
> >>>>>> Is there anything else that you can share about the traffic setup or
> >>>>>> otherwise that I could look into?  Your system seems to use ~2.5 x the
> >>>>>> buffer size as mine, but that might just be a smaller number of CPUs.
> >>>>>>
> >>>>>> Hopefully I'll get some more results overnight.
> >>>>>
> >>>>> The traffic is random production workloads from VMs, using standard
> >>>>> Linux or OVS bridges. There is no specific pattern to it. I haven’t
> >>>>> had any luck reproducing (or was not patient enough) this with iperf3
> >>>>> myself. The two active (UP) interfaces are in an LACP bonding setup.
> >>>>> Here are our ethtool settings for the two member ports (em1 and p3p1)
> >>>>>
> >>>>
> >>>> I had iperf3 running overnight and the memory usage for
> >>>> ice_alloc_mapped_page is constant here. Mine was direct connections
> >>>> without bridge or bonding. From your description I assume there's no XDP
> >>>> happening either.
> >>>
> >>> Yes, no XDP in use.
> >>>
> >>> BTW the allocinfo after 6 days uptime:
> >>> # uptime ; sort -g /proc/allocinfo| tail -n 15
> >>>  11:46:44 up 6 days,  2:18,  1 user,  load average: 9.24, 11.33, 15.07
> >>>    102489024   533797 fs/dcache.c:1681 func:__d_alloc
> >>>    106229760    25935 mm/shmem.c:1854 func:shmem_alloc_folio
> >>>    117118192   103097 fs/ext4/super.c:1388 [ext4] func:ext4_alloc_inode
> >>>    134479872    32832 kernel/events/ring_buffer.c:811 func:perf_mmap_alloc_page
> >>>    162783232     7656 mm/slub.c:2452 func:alloc_slab_page
> >>>    189906944    46364 mm/memory.c:1056 func:folio_prealloc
> >>>    499384320   121920 mm/percpu-vm.c:95 func:pcpu_alloc_pages
> >>>    530579456   129536 mm/page_ext.c:271 func:alloc_page_ext
> >>>    625876992    54186 mm/slub.c:2450 func:alloc_slab_page
> >>>    838860800      400 mm/huge_memory.c:1165 func:vma_alloc_anon_folio_pmd
> >>>   1014710272   247732 mm/filemap.c:1978 func:__filemap_get_folio
> >>>   1056710656   257986 mm/memory.c:1054 func:folio_prealloc
> >>>   1279262720      610 mm/khugepaged.c:1084 func:alloc_charge_folio
> >>>   1334530048   325763 mm/readahead.c:186 func:ractl_alloc_folio
> >>>   3341238272   412215 drivers/net/ethernet/intel/ice/ice_txrx.c:681 [ice] func:ice_alloc_mapped_page
> >>>
> >> I have a suspicion that the issue is related to the updating of
> >> page_count in ice_get_rx_pgcnt(). The i40e driver has a very similar
> >> logic for page reuse but doesn't do this. It also has a counter to track
> >> failure to re-use the Rx pages.
> >>
> >> Commit 11c4aa074d54 ("ice: gather page_count()'s of each frag right
> >> before XDP prog call") changed the logic to update page_count of the Rx
> >> page just prior to the XDP call instead of at the point where we get the
> >> page from ice_get_rx_buf(). I think this change was originally
> >> introduced while we were trying out an experimental refactor of the
> >> hotpath to handle fragments differently, which no longer happens since
> >> 743bbd93cf29 ("ice: put Rx buffers after being done with current
> >> frame"), which ironically was part of this very same series.
> >>
> >> I think this updating of page count is accidentally causing us to
> >> miscount when we could perform page-reuse, and ultimately causes us to
> >> leak the page somehow. I'm still investigating, but I think this might
> >> trigger when pgcnt - pagecnt_bias somehow becomes >1, in which case we
> >> don't reuse the page.
> >>
> >> The i40e driver stores the page count in i40e_get_rx_buffer, and I think
> >> our updating it later can somehow get things out-of-sync.
> >>
> >> Do you know if your traffic pattern happens to send fragmented frames? I
> > 
> > Hmm, I checked:
> > * the node_netstat_Ip_Frag* metrics, and they are empty (they do not exist),
> > * a short run of "tcpdump -n -i any 'ip[6:2] & 0x3fff != 0'", which found nothing.
> > It looks to me like there is no fragmentation.
> > 
> 
> Good to rule it out at least.
> 
> >> think iperf doesn't do that, which might be part of what's causing this
> >> issue. I'm going to try to see if I can generate such fragmentation to
> >> confirm. Is your MTU kept at the default ethernet size?
> > 
> > Our MTU size is set to 9000 everywhere.
> > 
> 
> Ok. I am re-trying with MTU 9000 and using some traffic generated by wrk
> now. I do see much larger memory use (~2GB) when using MTU 9000, so that
> tracks with what your system shows. Currently it's fluctuating between
> 1.9 and 2G. I'll leave this going for a couple of days while on vacation
> and see if anything pops up.

I was wondering whether the order-1 pages might be what causes the mess
here for some reason, since for 9k MTU we allocate them and split them in half.
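
A minimal sketch of that sizing, modelled in Python rather than taken
from the driver source. The 4 KiB base page size and the 1536/3072-byte
Rx buffer lengths are assumptions for illustration, and the function
names are hypothetical:

```python
PAGE_SIZE = 4096  # assumed base page size (typical x86-64)


def rx_page_order(rx_buf_len: int, page_size: int = PAGE_SIZE) -> int:
    """Return 1 when one buffer no longer fits in half of a base page, else 0."""
    return 1 if rx_buf_len > page_size // 2 else 0


def rx_half_size(rx_buf_len: int, page_size: int = PAGE_SIZE) -> int:
    """Bytes available per buffer when the allocated page is split in half."""
    return (page_size << rx_page_order(rx_buf_len, page_size)) // 2


# Assumed buffer lengths: a standard-MTU ring versus a jumbo-MTU ring.
for buf_len in (1536, 3072):
    print(buf_len, rx_page_order(buf_len), rx_half_size(buf_len))
```

Under these assumptions the jumbo case ends up on 8 KiB (order-1)
allocations with 4 KiB per half, while the standard case stays on a
single 4 KiB page.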

Maybe it would be worth checking whether legacy-rx (which works on
order-0 pages) has the same issue? That would require 8k MTU, though.
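
When comparing such runs, the average bytes per call in /proc/allocinfo
gives a rough hint of the page order behind ice_alloc_mapped_page. Below
is a small Python sketch, assuming the "bytes calls location func:name"
layout seen in the raw output quoted above; the helper name is
hypothetical and the sample line is copied from this thread:

```python
def avg_bytes_per_call(line: str) -> tuple[str, float]:
    """Parse one raw /proc/allocinfo line into (function, average bytes per call)."""
    fields = line.split()
    total_bytes, calls = int(fields[0]), int(fields[1])
    # The function name is the field carrying the "func:" prefix.
    func = next(f[len("func:"):] for f in fields if f.startswith("func:"))
    return func, (total_bytes / calls if calls else 0.0)


# Sample line taken from the 6-day uptime output earlier in the thread.
sample = ("3341238272   412215 drivers/net/ethernet/intel/ice/ice_txrx.c:681 "
          "[ice] func:ice_alloc_mapped_page")
func, avg = avg_bytes_per_call(sample)
print(f"{func}: {avg:.0f} bytes per call")
```

For the sample line this comes out a little under 8 KiB per call, which
would be consistent with mostly order-1 allocations. The 896M / 229264
figure from the other system works out to roughly 4 KiB per call, though
that input was already rounded by numfmt.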

> 
> Thanks,
> Jake
