From: Khalid Aziz <khalid.aziz@oracle.com>
To: Matthew Wilcox <willy@infradead.org>
Cc: akpm@linux-foundation.org, aneesh.kumar@linux.ibm.com,
	arnd@arndb.de, 21cnbao@gmail.com, corbet@lwn.net,
	dave.hansen@linux.intel.com, david@redhat.com,
	ebiederm@xmission.com, hagen@jauu.net, jack@suse.cz,
	keescook@chromium.org, kirill@shutemov.name, kucharsk@gmail.com,
	linkinjeon@kernel.org, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	longpeng2@huawei.com, luto@kernel.org, markhemm@googlemail.com,
	pcc@google.com, rppt@kernel.org, sieberf@amazon.com,
	sjpark@amazon.de, surenb@google.com, tst@schoebel-theuer.de,
	yzaikin@google.com
Subject: Re: [PATCH v1 00/14] Add support for shared PTEs across processes
Date: Mon, 11 Apr 2022 13:52:31 -0600
Message-ID: <fe797b58-bd46-754d-17d2-a19e7ce1bf40@oracle.com>
In-Reply-To: <YlRnPstOywJzxUib@casper.infradead.org>

On 4/11/22 11:37, Matthew Wilcox wrote:
> On Mon, Apr 11, 2022 at 10:05:44AM -0600, Khalid Aziz wrote:
>> Page tables in kernel consume some of the memory and as long as number
>> of mappings being maintained is small enough, this space consumed by
>> page tables is not objectionable. When very few memory pages are
>> shared between processes, the number of page table entries (PTEs) to
>> maintain is mostly constrained by the number of pages of memory on the
>> system. As the number of shared pages and the number of times pages
>> are shared goes up, amount of memory consumed by page tables starts to
>> become significant.
> 
> All of this is true.  However, I've found a lot of people don't see this
> as compelling.  I've had more success explaining this from a different
> direction:
> 
> --- 8< ---
> 
> Linux supports processes which share all of their address space (threads)
> and processes that share none of their address space (tasks).  We propose
> a useful intermediate model where two or more cooperating processes
> can choose to share portions of their address space with each other.
> The shared portion is referred to by a file descriptor which processes
> can choose to attach to their own address space.
> 
> Modifications to the shared region affect all processes sharing
> that region, just as changes by one thread affect all threads in a
> multithreaded program.  This implies a certain level of trust between
> the different processes (ie malicious processes should not be allowed
> access to the mshared region).
> 
> --- 8< ---
> 
> Another argument that MM developers find compelling is that we can reduce
> some of the complexity in hugetlbfs where it has the ability to share
> page tables between processes.

This all sounds reasonable.

> 
> One objection that was raised is that the mechanism for starting the
> shared region is a bit clunky.  Did you investigate the proposed approach
> of creating an empty address space, attaching to it and using an fd-based
> mmap to modify its contents?

I want to make sure I understand this correctly. In the example I gave, the process creating the mshare'd region first
maps in the address space, possibly using mmap(). It then calls mshare() to share this already-mapped region. Are you
suggesting that the process be able to call mshare() before mapping in the address range, and then map things into that
address range later? If so, I intend to support that after the initial implementation, as an expansion of the original
concept.

> 
>> int mshare_unlink(char *name)
>>
>> A shared address range created by mshare() can be destroyed using
>> mshare_unlink() which removes the shared named object. Once all
>> processes have unmapped the shared object, the shared address range
>> references are de-allocated and destroyed.
>>
>> mshare_unlink() returns 0 on success or -1 on error.
> 
> Can you explain why this is a syscall instead of being a library
> function which does
> 
> 	int dirfd = open("/sys/fs/mshare");
> 	err = unlinkat(dirfd, name, 0);
> 	close(dirfd);
> 	return err;

mshare_unlink() can be a simple unlink on the file in msharefs. The API will then be asymmetrical, in that creating an
mshare'd region is a syscall while tearing it down is a file op. I don't mind saving a syscall slot. Would you prefer it
that way?
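
For illustration only, here is a minimal sketch of what such a userspace wrapper might look like. It follows the
snippet quoted above and assumes msharefs is mounted at /sys/fs/mshare, as in that snippet. The helper name
mshare_unlink_in() and the choice of open() flags are assumptions made for this sketch and are not part of the patch
series.

```c
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Assumed mount point, taken from the quoted example. */
#define MSHAREFS_DIR "/sys/fs/mshare"

/*
 * Hypothetical helper: remove the entry `name` from directory `dir`.
 * Returns 0 on success or -1 on error with errno set, matching the
 * return convention described for mshare_unlink().
 */
int mshare_unlink_in(const char *dir, const char *name)
{
	int dirfd = open(dir, O_RDONLY | O_DIRECTORY | O_CLOEXEC);
	if (dirfd < 0)
		return -1;

	int err = unlinkat(dirfd, name, 0);
	int saved_errno = errno;	/* close() may overwrite errno */

	close(dirfd);
	errno = saved_errno;
	return err;
}

/* Library-level replacement for the proposed syscall. */
int mshare_unlink(const char *name)
{
	return mshare_unlink_in(MSHAREFS_DIR, name);
}
```

Taking the directory as a parameter is only a convenience so the helper can be exercised against an ordinary
directory on a system without msharefs.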

> 
> Does msharefs support creating directories, so that we can use file
> permissions to limit who can see the sharable files?  Or is it strictly
> a single-level-deep hierarchy?
> 

For now msharefs is single-level-deep. It can be expanded to support directories to limit the visibility of filenames.
Would you prefer to see it support directories from the beginning, or can that be a future expansion of this feature?

Thanks,
Khalid



