From: David Airlie
Date: Wed, 2 Jul 2025 08:11:10 +1000
Subject: Re: [PATCH 12/17] ttm: add objcg pointer to bo and tt
To: Christian König
Cc: Dave Airlie, dri-devel@lists.freedesktop.org, linux-mm@kvack.org, Johannes Weiner, Dave Chinner, Kairui Song
References: <20250630045005.1337339-1-airlied@gmail.com> <20250630045005.1337339-13-airlied@gmail.com> <20a90668-3ddf-4153-9953-a2df9179a1b1@amd.com> <26c79b1e-0f7f-4efa-9040-92df8c5bdf1f@amd.com>
In-Reply-To: <26c79b1e-0f7f-4efa-9040-92df8c5bdf1f@amd.com>

On Tue, Jul 1,
2025 at 6:16 PM Christian König wrote:
>
> On 01.07.25 10:06, David Airlie wrote:
> > On Tue, Jul 1, 2025 at 5:22 PM Christian König wrote:
> >>>>> diff --git a/include/drm/ttm/ttm_tt.h b/include/drm/ttm/ttm_tt.h
> >>>>> index 15d4019685f6..c13fea4c2915 100644
> >>>>> --- a/include/drm/ttm/ttm_tt.h
> >>>>> +++ b/include/drm/ttm/ttm_tt.h
> >>>>> @@ -126,6 +126,8 @@ struct ttm_tt {
> >>>>>       enum ttm_caching caching;
> >>>>>       /** @restore: Partial restoration from backup state. TTM private */
> >>>>>       struct ttm_pool_tt_restore *restore;
> >>>>> +     /** @objcg: Object cgroup for this TT allocation */
> >>>>> +     struct obj_cgroup *objcg;
> >>>>>  };
> >>>>
> >>>> We should probably keep that out of the pool and account the memory to the BO instead.
> >>>>
> >>>
> >>> I tried that like 2-3 patch posting iterations ago, you suggested it
> >>> then, it didn't work. It has to be done at the pool level, I think it
> >>> was due to swap handling.
> >>
> >> When you do it at the pool level the swap/shrink handling is broken as well, just not for amdgpu.
> >>
> >> See xe_bo_shrink() and drivers/gpu/drm/xe/xe_shrinker.c on how XE does it.
> >
> > I've read all of that, but I don't think it needs changing yet, though
> > I do think I probably need to do a bit more work on the ttm
> > backup/restore paths to account things, but again we suffer from the
> > what happens if your cgroup runs out of space on a restore path,
> > similiar to eviction.
>
> My thinking was rather that because of this we do it at the resource level and keep memory accounted to whoever allocated it even if it's backed up or swapped out.
>
> > Blocking the problems we can solve now on the problems we've no idea
> > how to solve means nobody gets experience with solving anything.
>
> Well that's exactly the reason why I'm suggesting this. Ignoring swapping/backup for now seems to make things much easier.

It makes it easier now, but when we have to solve swapping, step one
will be moving all this code around to what I have now, and starting
from there. This just raises the bar to solving the next problem.

We need to find incremental approaches to getting all the pieces of
the puzzle solved, or else we will still be here in 10 years.

The steps I've formulated (none of them are perfect, but they all seem
better than the status quo):

1. add global counters for pages - now we can at least see things in
   vmstat and per-node
2. add numa to the pool lru - we can remove our own numa code and
   align with core kernel - probably doesn't help anything
3. add memcg awareness to the pool and pool shrinker.
   if you are on an APU with no swap configured - you have a lot
   better time.
   if you are on a dGPU or APU with swap - you have a moderately
   better time, but I can't see you having a worse time.
4. look into tt level swapping and see how to integrate that lru
   with numa/memcg awareness.
   in theory we can do better than allocated_pages tracking (I'd
   like to burn that down, since it seems at odds with memcg)
5. look into xe swapping and see if we can integrate that with
   numa/memcg better.

So the question I really want answered when I'm submitting patches
isn't "what does this not fix or not make better", but "what does this
actively make worse than the status quo, and is it heading in a
consistent direction to solve the problem".

Accounting at the resource level makes stuff better, but after
implementing it I don't believe it is consistent with solving the
overall problem.

Dave.