From: "Kirill A. Shutemov" <kirill@shutemov.name>
To: Matthew Wilcox <willy@infradead.org>
Cc: Khalid Aziz <khalid.aziz@oracle.com>,
akpm@linux-foundation.org, longpeng2@huawei.com, arnd@arndb.de,
dave.hansen@linux.intel.com, david@redhat.com, rppt@kernel.org,
surenb@google.com, linux-kernel@vger.kernel.org,
linux-mm@kvack.org
Subject: Re: [RFC PATCH 0/6] Add support for shared PTEs across processes
Date: Wed, 26 Jan 2022 16:42:47 +0300
Message-ID: <20220126134247.fadtwbvyknh3ejpe@box.shutemov.name>
In-Reply-To: <YfDIYKygRHX4RIri@casper.infradead.org>

On Wed, Jan 26, 2022 at 04:04:48AM +0000, Matthew Wilcox wrote:
> On Tue, Jan 25, 2022 at 06:59:50PM +0000, Matthew Wilcox wrote:
> > On Tue, Jan 25, 2022 at 09:57:05PM +0300, Kirill A. Shutemov wrote:
> > > On Tue, Jan 25, 2022 at 02:09:47PM +0000, Matthew Wilcox wrote:
> > > > > I think zero-API approach (plus madvise() hints to tweak it) is worth
> > > > > considering.
> > > >
> > > > I think the zero-API approach actually misses out on a lot of
> > > > possibilities that the mshare() approach offers. For example, mshare()
> > > > allows you to mmap() many small files in the shared region -- you
> > > > can't do that with zeroAPI.
> > >
> > > Do you consider a use-case for many small files to be common? I would
> > > think that the main consumer of the feature to be mmap of huge files.
> > > And in this case zero enabling burden on userspace side sounds like a
> > > sweet deal.
> >
> > mmap() of huge files is certainly the Oracle use-case. With occasional
> > funny business like mprotect() of a single page in the middle of a 1GB
> > hugepage.
>
> Bill and I were talking about this earlier and realised that this is
> the key point. There's a requirement that when one process mprotects
> a page that it gets protected in all processes. You can't do that
> without *some* API because that's different behaviour than any existing
> API would produce.
"hurr, durr, we are Oracle" :P

Sounds like a very niche requirement. I doubt there will be more than a
single-digit user count for the feature. Maybe only the DB.

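For reference, the existing behaviour that the quoted paragraph contrasts
with can be checked with a minimal sketch, assuming Linux. The function
name parent_still_writable() is made up for illustration and is not part
of the patch set. A child makes its own view of a MAP_SHARED page
read-only with mprotect(), and the parent's view stays writable, because
the protection change applies only to the caller's address space.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS under strict -std= modes */
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper, for illustration only.
 * Returns 1 if the parent can still store to a MAP_SHARED page after a
 * child has mprotect()ed its own view of that page to PROT_READ.
 * Returns -1 on setup errors. If protections were shared between the
 * processes, the store below would raise SIGSEGV instead of returning.
 */
int parent_still_writable(void)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    char *p = mmap(NULL, page, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        munmap(p, page);
        return -1;
    }
    if (pid == 0) {
        /* Child: change the protection of its own mapping only. */
        _exit(mprotect(p, page, PROT_READ) == 0 ? 0 : 1);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0 ||
        !WIFEXITED(status) || WEXITSTATUS(status) != 0) {
        munmap(p, page);
        return -1;
    }

    /* Parent: its own mapping is still PROT_READ | PROT_WRITE. */
    p[0] = 42;
    int ok = (p[0] == 42);

    munmap(p, page);
    return ok;
}
```

Calling parent_still_writable() from a small main() is enough to try it.
The requirement discussed above is the opposite outcome: a single
mprotect() call that takes effect in every process attached to the region.
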
> So how about something like this ...
>
> int mcreate(const char *name, int flags, mode_t mode);
>
> creates a new mm_struct with a refcount of 2. returns an fd (one
> of the two refcounts) and creates a name for it (inside msharefs,
> holds the other refcount).
>
> You can then mmap() that fd to attach it to a chunk of your address
> space. Once attached, you can start to populate it by calling
> mmap() and specifying an address inside the attached mm as the first
> argument to mmap().
That is not what mmap() would normally do to an existing mapping. So it
requires special treatment.
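
To make "what mmap() would normally do" concrete, here is a minimal
sketch, assuming Linux semantics. The names struct overlap_result and
mmap_over_existing() are made up for illustration. With an address inside
an existing mapping, a plain mmap() treats the address as a hint and places
the new mapping elsewhere, while MAP_FIXED discards whatever was mapped in
that range. Neither populates something "inside" the existing mapping.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS under strict -std= modes */
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical result type, for illustration only. */
struct overlap_result {
    int hint_honoured;  /* non-MAP_FIXED mmap() landed exactly on the hint */
    int fixed_replaced; /* MAP_FIXED mmap() landed on the hint, old data gone */
};

/* Returns 0 on success and fills *out, -1 if any mmap() call fails. */
int mmap_over_existing(struct overlap_result *out)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    int prot = PROT_READ | PROT_WRITE;
    int flags = MAP_PRIVATE | MAP_ANONYMOUS;

    /* An existing two-page mapping; mark its second page. */
    char *base = mmap(NULL, 2 * page, prot, flags, -1, 0);
    if (base == MAP_FAILED)
        return -1;
    char *inside = base + page;
    inside[0] = 'A';

    /* Address used as a hint only: the range is occupied. */
    char *hinted = mmap(inside, page, prot, flags, -1, 0);
    if (hinted == MAP_FAILED) {
        munmap(base, 2 * page);
        return -1;
    }
    out->hint_honoured = (hinted == inside);
    munmap(hinted, page);

    /* MAP_FIXED: the old page in the range is unmapped and replaced. */
    char *fixed = mmap(inside, page, prot, flags | MAP_FIXED, -1, 0);
    if (fixed == MAP_FAILED) {
        munmap(base, 2 * page);
        return -1;
    }
    out->fixed_replaced = (fixed == inside && inside[0] == 0);

    munmap(base, 2 * page);
    return 0;
}
```

On Linux I would expect hint_honoured to be 0 and fixed_replaced to be 1,
which is why mmap() into an attached mm_struct would need its own rules.
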

In general, mmap() of an mm_struct scares me. I can't wrap my head around
the implications.

Like, how does it work on fork()?

How does accounting work? What happens on OOM?

What prevents creating loops, like mapping an mm_struct inside itself?

What do mremap()/munmap() do to such a mapping? Will they affect the
mapping of the mm_struct, or will they target mappings inside the
mm_struct?

Maybe it just didn't click for me, I dunno.

> Maybe mcreate() is just a library call, and it's really a thin wrapper
> around open() that happens to know where msharefs is mounted.
--
Kirill A. Shutemov