linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: David Hildenbrand <david@redhat.com>
To: Matthew Wilcox <willy@infradead.org>
Cc: linux-mm@kvack.org, Oscar Salvador <osalvador@suse.de>
Subject: Re: [PATCH 0/5] Remove some races around folio_test_hugetlb
Date: Thu, 7 Mar 2024 22:38:35 +0100	[thread overview]
Message-ID: <4b9f83e5-c7b5-4a08-b5b0-411921e00b5e@redhat.com> (raw)
In-Reply-To: <ZeouSqELyK5UPkIM@casper.infradead.org>

On 07.03.24 22:14, Matthew Wilcox wrote:
> On Thu, Mar 07, 2024 at 10:20:15AM +0100, David Hildenbrand wrote:
>>>>> IOW:
>>>>>
>>>>> word	page0			page1
>>>>> 0	flags			flags
>>>>> 1	lru.next		head
>>>>> 2	lru.prev		entire_mapcount + gap
>>>>> 3	mapping			nr_pages_mapped + gap / hugetlb_id
>>>>> 4	index			pincount + nr_pages
>>>>> 5	private			unused
>>>>> 6	mapcount+refcount	mapcount+refcount(0)
>>>>> 7	memcg_data		-
>>>>>
>>>>> or on 32-bit
>>>>>
>>>>> word	page0			page1
>>>>> 0	flags			flags
>>>>> 1	lru.next		head
>>>>> 2	lru.prev		entire_mapcount
>>>>> 3	mapping			nr_pages_mapped / hugetlb_id
>>>>> 4	index			pincount
>>>>> 5	private			unused
>>>>> 6	mapcount		mapcount
>>>>> 7	refcount		refcount
>>>>> 8	memcg_data		-
>>>>> 9+	virtual? last_cpupid? whatever
>>>
>>> How about this layout?
>>>
>>> @@ -350,8 +350,13 @@ struct folio {
>>>                           unsigned long _head_1;
>>>                           unsigned long _folio_avail;
>>>           /* public: */
>>> -                       atomic_t _entire_mapcount;
>>> -                       atomic_t _nr_pages_mapped;
>>> +                       union {
>>> +                               unsigned long _hugetlb_id;
>>> +                               struct {
>>> +                                       atomic_t _entire_mapcount;
>>> +                                       atomic_t _nr_pages_mapped;
>>> +                               };
>>> +                       };
>>>                           atomic_t _pincount;
>>>    #ifdef CONFIG_64BIT
>>>                           unsigned int _folio_nr_pages;
>>>
>>> That keeps _folio_avail as, well, available.  It puts _hugetlb_id in
>>> the same bits as ->mapping.  It continues to leave ->private unused
>>> on 64-bit.  I think this does everything we want?
>>
>> _entire_mapcount is (still) used for hugetlb folios.
> 
> Oh, duh, of course it is.  I thought we used page[0].mapcount for them,
> but we don't and shouldn't.  I suppose we could use a magic value for
> page[0].mapcount to indicate hugetlb, but that'd make page_mapcount()
> more complex.
> 
>> With the total mapcount in place, I was thinking about renaming it to
>> "_pmd_mapcount" and stop using it for hugetlb folios, just like we'd not be
>> using _nr_pages_mapped for hugetlb folios.
>>
>> [I also thought about moving the _pmd_mapcount to another subpage, where
>> we'd also have a _pud_mapcount in the future; but again, stuff for the
>> future]
>>
>> Until then, wouldn't _hugetlb_id be problematic here? [I could move
>> _entire_mapcount/_pmd_mapcount later I guess]
> 
> New idea then, how about simply:
> 
>                          unsigned long _flags_1;
>                          unsigned long _head_1;
> -                       unsigned long _folio_avail;
> +                       unsigned long _hugetlb_id;
> 
> We have to check the various other users of struct page to see what we
> might conflict with.

Well, compared to the current version there is no change for me: the 
space I wanted to use for the total_mapcount is gone and I'll have to 
start being creative :)

Free space in subpage1: "The LORD gave and the LORD has taken away"

We seem to be running again into space issues in subpage1 (also, more 
relevant now that we will support order-1 folios ...). This kind of 
wastage+over-complicated layout (again :( ) just to handle hugetlb 
oddities for free pages -- refcount 0 -- feels very wrong.


Stupid idea: Do we really have to identify (possibly free) hugetlb 
folios that way from lockless pfnwalkers without a folio reference?

I mean, with a folio reference held it's all working as expected.

Couldn't we just make hugetlb store them in some kind of xarray that we 
can walk (using RCU?), indexed by start PFN we want to test?

So if we find the current hugetlb flag to be set on the lockless path, 
simply check that xarray. It's all super racy either way, we can get a 
free+split+reuse anytime.

Or is that completely flawed?

-- 
Cheers,

David / dhildenb



  reply	other threads:[~2024-03-07 21:38 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-03-01 21:47 [PATCH 0/5] Remove some races around folio_test_hugetlb Matthew Wilcox (Oracle)
2024-03-01 21:47 ` [PATCH 1/5] hugetlb: Make folio_test_hugetlb safer to call Matthew Wilcox (Oracle)
2024-03-05  6:43   ` Oscar Salvador
2024-03-05  8:39   ` David Hildenbrand
2024-03-01 21:47 ` [PATCH 2/5] hugetlb: Add hugetlb_pfn_folio Matthew Wilcox (Oracle)
2024-03-05  6:58   ` Oscar Salvador
2024-03-01 21:47 ` [PATCH 3/5] memory-failure: Use hugetlb_pfn_folio Matthew Wilcox (Oracle)
2024-03-01 21:47 ` [PATCH 4/5] memory-failure: Reorganise get_huge_page_for_hwpoison() Matthew Wilcox (Oracle)
2024-03-01 21:47 ` [PATCH 5/5] compaction: Use hugetlb_pfn_folio in isolate_migratepages_block Matthew Wilcox (Oracle)
2024-03-04  9:09 ` [PATCH 0/5] Remove some races around folio_test_hugetlb Miaohe Lin
2024-03-04 17:08   ` Matthew Wilcox
2024-03-06  7:58     ` Miaohe Lin
2024-03-07 21:16       ` Matthew Wilcox
2024-03-05  9:10 ` David Hildenbrand
2024-03-05 20:35   ` Matthew Wilcox
2024-03-06 15:18     ` David Hildenbrand
2024-03-07  4:31       ` Matthew Wilcox
2024-03-07  9:20         ` David Hildenbrand
2024-03-07 21:14           ` Matthew Wilcox
2024-03-07 21:38             ` David Hildenbrand [this message]
2024-03-08  4:31             ` Matthew Wilcox
2024-03-08  8:46               ` David Hildenbrand

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4b9f83e5-c7b5-4a08-b5b0-411921e00b5e@redhat.com \
    --to=david@redhat.com \
    --cc=linux-mm@kvack.org \
    --cc=osalvador@suse.de \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).