From: Jason Gunthorpe <jgg@ziepe.ca>
To: Hugh Dickins <hughd@google.com>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>,
Vasily Gorbik <gor@linux.ibm.com>,
Heiko Carstens <hca@linux.ibm.com>,
Christian Borntraeger <borntraeger@linux.ibm.com>,
Claudio Imbrenda <imbrenda@linux.ibm.com>,
Alexander Gordeev <agordeev@linux.ibm.com>,
linux-s390@vger.kernel.org
Subject: Re: [PATCH 07/12] s390: add pte_free_defer(), with use of mmdrop_async()
Date: Thu, 15 Jun 2023 09:34:09 -0300
Message-ID: <ZIsFQalF7rwVKXrD@ziepe.ca>
In-Reply-To: <fc5cd62e-d85f-36c3-ba37-db87e8b625d@google.com>
On Wed, Jun 14, 2023 at 02:59:33PM -0700, Hugh Dickins wrote:
> I guess the best thing would be to modify kernel/fork.c to allow the
> architecture to override free_mm(), and arch/s390 call_rcu to free mm.
> But as a quick and dirty s390-end workaround, how about:
RCU callbacks are not ordered so that doesn't seem like it helps..
synchronize_rcu would do the job since it is ordered, but I think the
performance cost is too great to just call it from mmdrop
rcu_barrier() followed by call_rcu on the mm struct might work, but I
don't know the cost
A per-cpu refcount scheme might also do the job reasonably
Making the page frag pool global (per-cpu global I guess) would also
remove the need to reach back to the freeable mm_struct and reduce the
need for struct page memory. This views it as a special kind of
kmemcache.
Another approach is to not use a rcu_head in the ptdesc at all.
With a global kmemcache-like-thing we could probably also organize
something where you don't use a rcu_head in the ptdesc, but instead
just a naked 'next' pointer. This would give enough space to have two
next pointers and the next pointers can be re-used for the normal free
list as well.
In this flow you'd thread the free'd frags onto a waterfall of global
per-cpu lists:
- RCU free the next cycle
- RCU free this cycle
- Actually free
Where a single rcu_head and single call_rcu frees the entire 2nd list
to the 3rd list and then schedules the 1st list to be RCU'd next. This
eliminates the need to store a function pointer in the ptdesc at
all.
It does require some global per-cpu lock on the free/alloc paths, but
this is basically what every other arch does as it frees the page
back to the page allocator.
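To make the waterfall concrete, here is a rough single-threaded
userspace model of it. This is only a sketch under assumptions, not
kernel code: struct frag, struct frag_pool and the pool_*() helpers
are hypothetical names, cb_pending stands in for "the single rcu_head
is queued with call_rcu()", and pool_grace_period_elapsed() stands in
for the RCU callback. The per-cpu locking is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical frag descriptor: a bare next pointer, no rcu_head. */
struct frag {
	struct frag *next;
};

/* One set of lists; the idea above is one such set per CPU. */
struct frag_pool {
	struct frag *rcu_next;	/* list 1: RCU free the next cycle */
	struct frag *rcu_this;	/* list 2: RCU free this cycle */
	struct frag *free_list;	/* list 3: actually free */
	int cb_pending;		/* models the single queued rcu_head */
};

/* Queue a frag for deferred freeing. */
static void pool_defer_free(struct frag_pool *p, struct frag *f)
{
	f->next = p->rcu_next;
	p->rcu_next = f;

	if (!p->cb_pending) {
		/* No cycle in flight: start one for what is queued. */
		assert(p->rcu_this == NULL);
		p->rcu_this = p->rcu_next;
		p->rcu_next = NULL;
		p->cb_pending = 1;	/* call_rcu() would go here */
	}
}

/* Models the RCU callback running after a grace period. */
static void pool_grace_period_elapsed(struct frag_pool *p)
{
	struct frag *f;

	/* Everything on list 2 was queued before this cycle began. */
	while ((f = p->rcu_this) != NULL) {
		p->rcu_this = f->next;
		f->next = p->free_list;
		p->free_list = f;
	}

	/* Schedule list 1 to be RCU'd next, if there is anything. */
	p->rcu_this = p->rcu_next;
	p->rcu_next = NULL;
	p->cb_pending = (p->rcu_this != NULL);
}

/* Allocate from the truly free list only; NULL if it is empty. */
static struct frag *pool_alloc(struct frag_pool *p)
{
	struct frag *f = p->free_list;

	if (f)
		p->free_list = f->next;
	return f;
}
```

In this model a frag freed while a cycle is already in flight waits
for the following grace period, which is why there are two RCU lists
rather than one.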
I suspect that two next pointers would also eliminate pt_frag_refcount
entirely as we can encode that information in the low bits of the next
pointers.
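As an illustration of the low-bits idea, a minimal userspace sketch
follows. It assumes the descriptors are at least 4-byte aligned, so
each next pointer has two spare low bits, and that the count to be
stored fits in those bits. struct frag_desc, tag_pack(), tag_ptr()
and tag_value() are hypothetical names, not existing kernel helpers.

```c
#include <assert.h>
#include <stdint.h>

#define TAG_BITS 2u
#define TAG_MASK ((uintptr_t)((1u << TAG_BITS) - 1))

/* Hypothetical descriptor with two tagged next pointers. */
struct frag_desc {
	uintptr_t next[2];
};

/* Combine an aligned pointer with a small tag in its low bits. */
static inline uintptr_t tag_pack(void *ptr, unsigned int tag)
{
	assert(((uintptr_t)ptr & TAG_MASK) == 0);	/* needs alignment */
	assert(tag <= TAG_MASK);			/* must fit */
	return (uintptr_t)ptr | tag;
}

/* Recover the pointer by masking the tag off. */
static inline void *tag_ptr(uintptr_t v)
{
	return (void *)(v & ~TAG_MASK);
}

/* Recover the tag, e.g. a small per-page frag count. */
static inline unsigned int tag_value(uintptr_t v)
{
	return (unsigned int)(v & TAG_MASK);
}
```

With two such pointers there are four spare bits in total, which is
where a small refcount could live under these assumptions.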
> (Funnily enough, there's no problem when the stored mm gets re-used for
> a different mm, once past its spin_lock_init(&mm->context.lock);
> because
We do have that really weird "type safe by rcu" thing in the
allocators, but I don't quite know how it works.
> Powerpc is like that. I have no idea how much gets wasted that way.
> I was keen not to degrade what s390 does: which is definitely superior,
> but possibly not worth the effort.
Yeah, it would be good to understand if this is really sufficiently
beneficial..
> I'll look into it, once I understand c2c224932fd0. But may have to write
> to Vishal first, or get the v2 of my series out: if only I could work out
> a safe and easy way of unbreaking s390...
Can arches opt in to RCU freeing page table support and still keep
your series sane?
Honestly, I feel like trying to RCU enable page tables should be its
own series. It is a sufficiently tricky subject in its own right.
Jason