From mboxrd@z Thu Jan 1 00:00:00 1970
From: Dave Airlie
To: dri-devel@lists.freedesktop.org, tj@kernel.org, christian.koenig@amd.com,
	Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song
Cc: cgroups@vger.kernel.org, Dave Chinner, Waiman Long, simona@ffwll.ch
Subject: [PATCH 06/16] ttm/pool: track allocated_pages per numa node.
Date: Thu, 16 Oct 2025 12:31:34 +1000
Message-ID: <20251016023205.2303108-7-airlied@gmail.com>
In-Reply-To: <20251016023205.2303108-1-airlied@gmail.com>
References: <20251016023205.2303108-1-airlied@gmail.com>
X-Mailing-List: cgroups@vger.kernel.org

From: Dave Airlie

This gets the memory size of each NUMA node and sets that node's pool
limit to 50% of it.

I think we should eventually drop the limits once we have memcg-aware
shrinking, but this should be more NUMA friendly, and I think it is
what people would prefer to happen on NUMA-aware systems.

Cc: Christian Koenig
Signed-off-by: Dave Airlie
---
 drivers/gpu/drm/ttm/ttm_pool.c | 60 +++++++++++++++++++++++++---------
 1 file changed, 45 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
index ae54f01f240b..a6b055256150 100644
--- a/drivers/gpu/drm/ttm/ttm_pool.c
+++ b/drivers/gpu/drm/ttm/ttm_pool.c
@@ -115,10 +115,11 @@ struct ttm_pool_tt_restore {
 
 static unsigned long page_pool_size;
 
-MODULE_PARM_DESC(page_pool_size, "Number of pages in the WC/UC/DMA pool");
+MODULE_PARM_DESC(page_pool_size, "Number of pages in the WC/UC/DMA pool per NUMA node");
 module_param(page_pool_size, ulong, 0644);
 
-static atomic_long_t allocated_pages;
+static unsigned long pool_node_limit[MAX_NUMNODES];
+static atomic_long_t allocated_pages[MAX_NUMNODES];
 
 static struct ttm_pool_type global_write_combined[NR_PAGE_ORDERS];
 static struct ttm_pool_type global_uncached[NR_PAGE_ORDERS];
@@ -289,6 +290,7 @@ static void ttm_pool_unmap(struct ttm_pool *pool, dma_addr_t dma_addr,
 static void ttm_pool_type_give(struct ttm_pool_type *pt, struct page *p)
 {
 	unsigned int i, num_pages = 1 << pt->order;
+	int nid = page_to_nid(p);
 
 	for (i = 0; i < num_pages; ++i) {
 		if (PageHighMem(p))
@@ -299,10 +301,10 @@ static void ttm_pool_type_give(struct ttm_pool_type *pt, struct page *p)
 
 	INIT_LIST_HEAD(&p->lru);
 	rcu_read_lock();
-	list_lru_add(&pt->pages, &p->lru, page_to_nid(p), NULL);
+	list_lru_add(&pt->pages, &p->lru, nid, NULL);
 	rcu_read_unlock();
-	atomic_long_add(1 << pt->order, &allocated_pages);
 
+	atomic_long_add(num_pages, &allocated_pages[nid]);	
 	mod_lruvec_page_state(p, NR_GPU_ACTIVE, -num_pages);
 	mod_lruvec_page_state(p, NR_GPU_RECLAIM, num_pages);
 }
@@ -328,7 +330,7 @@ static struct page *ttm_pool_type_take(struct ttm_pool_type *pt, int nid)
 
 	ret = list_lru_walk_node(&pt->pages, nid, take_one_from_lru, (void *)&p, &nr_to_walk);
 	if (ret == 1 && p) {
-		atomic_long_sub(1 << pt->order, &allocated_pages);
+		atomic_long_sub(1 << pt->order, &allocated_pages[nid]);
 		mod_lruvec_page_state(p, NR_GPU_ACTIVE, (1 << pt->order));
 		mod_lruvec_page_state(p, NR_GPU_RECLAIM, -(1 << pt->order));
 	}
@@ -367,7 +369,7 @@ static void ttm_pool_dispose_list(struct ttm_pool_type *pt,
 		struct page *p;
 		p = list_first_entry(dispose, struct page, lru);
 		list_del_init(&p->lru);
-		atomic_long_sub(1 << pt->order, &allocated_pages);
+		atomic_long_sub(1 << pt->order, &allocated_pages[page_to_nid(p)]);
 		ttm_pool_free_page(pt->pool, pt->caching, pt->order, p, true);
 	}
 }
@@ -925,11 +927,13 @@ int ttm_pool_restore_and_alloc(struct ttm_pool *pool, struct ttm_tt *tt,
  */
 void ttm_pool_free(struct ttm_pool *pool, struct ttm_tt *tt)
 {
+	int nid = ttm_pool_nid(pool);
+
 	ttm_pool_free_range(pool, tt, tt->caching, 0, tt->num_pages);
 
-	while (atomic_long_read(&allocated_pages) > page_pool_size) {
-		unsigned long diff = page_pool_size - atomic_long_read(&allocated_pages);
-		ttm_pool_shrink(ttm_pool_nid(pool), diff);
+	while (atomic_long_read(&allocated_pages[nid]) > pool_node_limit[nid]) {
+		unsigned long diff = pool_node_limit[nid] - atomic_long_read(&allocated_pages[nid]);
+		ttm_pool_shrink(nid, diff);
 	}
 }
 EXPORT_SYMBOL(ttm_pool_free);
@@ -1189,7 +1193,7 @@ static unsigned long ttm_pool_shrinker_scan(struct shrinker *shrink,
 	do
 		num_freed += ttm_pool_shrink(sc->nid, sc->nr_to_scan);
 	while (num_freed < sc->nr_to_scan &&
-	       atomic_long_read(&allocated_pages));
+	       atomic_long_read(&allocated_pages[sc->nid]));
 
 	sc->nr_scanned = num_freed;
 
@@ -1200,7 +1204,7 @@ static unsigned long ttm_pool_shrinker_scan(struct shrinker *shrink,
 static unsigned long ttm_pool_shrinker_count(struct shrinker *shrink,
 					     struct shrink_control *sc)
 {
-	unsigned long num_pages = atomic_long_read(&allocated_pages);
+	unsigned long num_pages = atomic_long_read(&allocated_pages[sc->nid]);
 
 	return num_pages ? num_pages : SHRINK_EMPTY;
 }
@@ -1237,8 +1241,12 @@ static void ttm_pool_debugfs_orders(struct ttm_pool_type *pt,
 /* Dump the total amount of allocated pages */
 static void ttm_pool_debugfs_footer(struct seq_file *m)
 {
-	seq_printf(m, "\ntotal\t: %8lu of %8lu\n",
-		   atomic_long_read(&allocated_pages), page_pool_size);
+	int nid;
+
+	for_each_node(nid) {
+		seq_printf(m, "\ntotal node%d\t: %8lu of %8lu\n", nid,
+			   atomic_long_read(&allocated_pages[nid]), pool_node_limit[nid]);
+	}
 }
 
 /* Dump the information for the global pools */
@@ -1332,6 +1340,22 @@ DEFINE_SHOW_ATTRIBUTE(ttm_pool_debugfs_shrink);
 
 #endif
 
+static inline uint64_t ttm_get_node_memory_size(int nid)
+{
+	/* This is directly using si_meminfo_node implementation as the
+	 * function is not exported.
+	 */
+	int zone_type;
+	uint64_t managed_pages = 0;
+
+	pg_data_t *pgdat = NODE_DATA(nid);
+
+	for (zone_type = 0; zone_type < MAX_NR_ZONES; zone_type++)
+		managed_pages +=
+			zone_managed_pages(&pgdat->node_zones[zone_type]);
+	return managed_pages * PAGE_SIZE;
+}
+
 /**
  * ttm_pool_mgr_init - Initialize globals
  *
@@ -1343,8 +1367,14 @@ int ttm_pool_mgr_init(unsigned long num_pages)
 {
 	unsigned int i;
 
-	if (!page_pool_size)
-		page_pool_size = num_pages;
+	int nid;
+	for_each_node(nid) {
+		if (!page_pool_size) {
+			uint64_t node_size = ttm_get_node_memory_size(nid);
+			pool_node_limit[nid] = (node_size >> PAGE_SHIFT) / 2;
+		} else
+			pool_node_limit[nid] = page_pool_size;
+	}
 
 	spin_lock_init(&shrinker_lock);
 	INIT_LIST_HEAD(&shrinker_list);
-- 
2.51.0
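
For readers following along outside the kernel tree, here is a minimal
userspace sketch of the limit policy that ttm_pool_mgr_init() applies in
this patch: with page_pool_size unset, each node's limit is half of that
node's memory in pages; otherwise every node gets page_pool_size. The
sketch_* names, SKETCH_MAX_NODES and the 4 KiB page size are assumptions
made for illustration and are not part of the patch, and the managed page
counts are passed in by the caller rather than read from NODE_DATA().

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MAX_NODES  4   /* hypothetical stand-in for MAX_NUMNODES */
#define SKETCH_PAGE_SHIFT 12  /* assumption: 4 KiB pages */

/* Per-node pool limit in pages, indexed by node id. */
static unsigned long sketch_node_limit[SKETCH_MAX_NODES];

/* Node size in bytes, derived from its managed page count. */
static uint64_t sketch_node_bytes(uint64_t managed_pages)
{
	return managed_pages << SKETCH_PAGE_SHIFT;
}

/*
 * Follows the policy in the patch: if page_pool_size is zero, the limit
 * for a node is 50% of that node's memory, expressed in pages; otherwise
 * page_pool_size is used as the limit for every node.
 */
static void sketch_init_limits(const uint64_t *managed_pages, int nr_nodes,
			       unsigned long page_pool_size)
{
	assert(nr_nodes <= SKETCH_MAX_NODES);

	for (int nid = 0; nid < nr_nodes; nid++) {
		if (!page_pool_size) {
			uint64_t bytes = sketch_node_bytes(managed_pages[nid]);

			sketch_node_limit[nid] =
				(unsigned long)((bytes >> SKETCH_PAGE_SHIFT) / 2);
		} else {
			sketch_node_limit[nid] = page_pool_size;
		}
	}
}
```

Since the byte size is shifted straight back down by the page shift, the
default limit works out to managed_pages / 2 for each node.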