From: Baolin Wang <baolin.wang@linux.alibaba.com>
To: Hugh Dickins <hughd@google.com>
Cc: "David Hildenbrand" <david@redhat.com>,
	"Patryk Kowalczyk" <patryk@kowalczyk.ws>,
	da.gomez@samsung.com, baohua@kernel.org,
	wangkefeng.wang@huawei.com, ioworker0@gmail.com,
	willy@infradead.org, ryan.roberts@arm.com,
	akpm@linux-foundation.org, eero.t.tamminen@intel.com,
	"Ville Syrjälä" <ville.syrjala@linux.intel.com>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>
Subject: Re: regression - mm: shmem: add large folio support for tmpfs affect GPU performance.
Date: Mon, 28 Jul 2025 14:29:23 +0800
Message-ID: <dce00acf-bbf6-4e4f-a646-d939998f979a@linux.alibaba.com>
In-Reply-To: <df580afe-f7c2-c5eb-b575-373908e0f6a1@google.com>



On 2025/7/28 13:35, Hugh Dickins wrote:
> On Fri, 25 Jul 2025, Baolin Wang wrote:
>> On 2025/7/25 12:47, Hugh Dickins wrote:
>>> On Fri, 25 Jul 2025, Baolin Wang wrote:
>>>>>
>>>>> I hope to correct the i915 driver's shmem allocation logic, by
>>>>> extending the shmem write length in the i915 driver to allocate
>>>>> PMD-sized THPs. IIUC, some sample fix code is as follows (untested).
>>>>> Patryk, could you help test it to see if this resolves your issue?
>>>>> Thanks.
>>>
>>> This patch cannot be the right fix.  It may be a very sensible workaround
>>> for some in-kernel drivers (I've not looked or tried); but unless I
>>> misunderstand, it does nothing to restore userspace behaviour on a
>>> huge=always tmpfs.
>>
>> Yes. Initially, we wanted to maintain compatibility with the 'huge=' option,
>> meaning that 'huge=always' tmpfs mount would still allocate PMD-sized THPs.
>> However, the current implementation is the consensus we reached after much
>> debate:
>>
>> 1. “When using tmpfs as a filesystem, it should behave like other filesystems.
>> No more special mount options.” Per Matthew.
> 
> That's okay, I've not proposed a new mount option at all (though that is
> rather how "never" came to end up meaning "not usually": our shared dislike
> for adding yet more options).  I'm proposing (shock horror) respecting the
> long-standing meaning of "huge=always".

OK.

>> 2. “Do not let the 'huge=' mount option mean 'PMD-sized' when other sizes
>> exist.” Per David.
> 
> That's less obvious.  The collision in tmpfs between anon mTHP, file large
> folio, and huge mount option (where shmem_enabled in sysfs provides that
> mount option for the internal mounts) is certainly difficult to resolve
> in any way pleasing to all (or any) of us.
> 
> But what remains clear is that we should not degrade the behaviour of
> "huge=always" for existing users: they were given PMD-sized when possible
> before, and they should be given PMD-sized when possible now (not suited
> to all usages, when "huge=within_size" may be more suitable).

This is the most contentious point. If we agree on the principle that
'when using tmpfs as a filesystem, it should behave like other
filesystems,' then tmpfs's large folio allocations should also be
consistent with other filesystems, i.e., ‘tmpfs will allow getting a
highest order hint based on the size of write and fallocate paths, and
then will try each allowable huge order’ (see shmem_huge_global_enabled()).

So, based on the above principle, we allocate large folios in the same
way when the 'huge=always' option is set. However, as you mentioned,
this breaks the previous behavior, where the 'huge=always' option always
meant PMD-sized large folio allocation.
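
To make the difference between the two behaviors concrete, here is a
minimal userspace sketch (not kernel code, and untested against the real
shmem paths). It assumes 4 KiB base pages and a PMD order of 9, ignores
index alignment, and models "allocation succeeds" simply as "the order is
no larger than the largest free order". The names order_hint_from_len()
and alloc_with_fallback() are hypothetical and exist only for this
illustration.

```c
#include <stddef.h>

#define PAGE_SHIFT 12   /* assumption: 4 KiB base pages */
#define PMD_ORDER  9    /* assumption: 2 MiB PMD with 4 KiB pages */

/*
 * Hypothetical helper: the highest order whose folio size does not
 * exceed the write length, capped at PMD_ORDER. The real write path
 * also takes the file index alignment into account; this does not.
 */
static unsigned int order_hint_from_len(size_t len)
{
	size_t pages = len >> PAGE_SHIFT;
	unsigned int order = 0;

	while (order < PMD_ORDER && (2UL << order) <= pages)
		order++;
	return order;
}

/*
 * Hypothetical helper: starting at 'start', try each order in turn and
 * return the first one that "allocates". An order is treated as
 * allocatable when it is <= max_free_order. Order 0 always succeeds.
 */
static unsigned int alloc_with_fallback(unsigned int start,
					unsigned int max_free_order)
{
	unsigned int order;

	for (order = start; order > 0; order--)
		if (order <= max_free_order)
			return order;
	return 0;
}
```

With a 64 KiB write and order-9 memory available,
alloc_with_fallback(order_hint_from_len(64 * 1024), 9) gives order 4
under the write-size-hint policy, while alloc_with_fallback(PMD_ORDER, 9)
gives order 9 under a "PMD-sized first, then fall back" policy. When only
order-3 memory is free, both end up at order 3.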

>> At the time, we should have sought your advice, but we failed. The long
>> historical discussion is in this thread[1]. So now the strategy for tmpfs
>> supporting large folios is:
> 
> Yes, it's a pity how limited and unresponsive I am, then and now and forever;
> but the principle of not regressing userspace is not a topic on which my
> special input should be needed.
> 
>>
>> "
>> Considering that tmpfs already has the 'huge=' option to control the PMD-sized
>> large folios allocation, we can extend the 'huge=' option to allow any sized
>> large folios. The semantics of the 'huge=' mount option are:
>> huge=never: no any sized large folios
>> huge=always: any sized large folios
>> huge=within_size: like 'always' but respect i_size
>> huge=advise: like 'always' if requested with madvise()
>>
>> Note: For tmpfs mmap() faults, due to the lack of a write size hint, still
>> allocate the PMD-sized large folios if huge=always/within_size/advise is set.
>>
>> Moreover, the 'deny' and 'force' testing options controlled by
>> '/sys/kernel/mm/transparent_hugepage/shmem_enabled' still retain the same
>> semantics. The 'deny' can disable any sized large folios for tmpfs, while the
>> 'force' can enable PMD sized large folios for tmpfs.
>> "
> 
> Thanks for the summary, I'll have to come back to it another time: on
> first reading, it is not incompatible with "huge=always" always trying
> for PMD-sized, but falling back to smaller large folios when unsuccessful.
> 
> (I'll mention in passing that I find it strange the way shmem is getting
> large folios of a selected subset of sizes from one direction, but large
> folios of all possible sizes from another direction - often dependent
> on whether i_nlink is 0 at the time, but maybe not.  My own preference,
> so long as those tunings exist, is that shmem should always be restricted
> to the selected subset of sizes; but I may well alienate everyone I've
> not already annoyed with that opinion, and it's probably "not a hill I'm
> prepared to die on", nor even directly relevant here - except that I'd
> better mention that unhappiness while I'm in the area.)
> 
>>
>> Currently, we have observed regression in the i915 driver but have not yet
>> seen userspace regression on a huge=always tmpfs.
> 
> I shall not object to a temporary workaround to suit the i915 driver; but
> insist it not be taken as excuse not to fix the userspace regression later.

OK. Let me fix the i915 driver first.

>> If you have better suggestions, please feel free to point them out. Thanks.
> 
> Sounds like you're disinclined to fix it yourself, and I'll lose the

No, I do intend to address this incompatibility myself. However, I 
wanted to clearly describe our previous discussions and decisions to you 
first, and then make a decision after fully understanding your opinion.

Now I see your point, and I can create a patch to fix the semantics of
'huge=always', but I think there will still be a lot of contention and
discussion. Let's continue the discussion on that patch.

Thanks for your input.

> argument if it's not fixed during this cycle (since 6.17-next will become
> 6.18 LTS); so I'd better carve out the time to get into it in coming weeks.

