From: "Huang, Kai" <kai.huang@intel.com>
To: "Christopherson,, Sean" <seanjc@google.com>
Cc: "linux-sgx@vger.kernel.org" <linux-sgx@vger.kernel.org>,
"Chatre, Reinette" <reinette.chatre@intel.com>,
"jarkko@kernel.org" <jarkko@kernel.org>,
"Dhanraj, Vijay" <vijay.dhanraj@intel.com>,
"haitao.huang@linux.intel.com" <haitao.huang@linux.intel.com>,
"dave.hansen@linux.intel.com" <dave.hansen@linux.intel.com>
Subject: Re: [RFC PATCH v4 2/4] x86/sgx: Implement support for MADV_WILLNEED
Date: Mon, 19 Jun 2023 11:17:45 +0000 [thread overview]
Message-ID: <fc214963a01ca975d6de2991b63c93f04114c562.camel@intel.com> (raw)
In-Reply-To: <ZIzcrmU1RA7oEeQ7@google.com>
On Fri, 2023-06-16 at 15:05 -0700, Sean Christopherson wrote:
> On Fri, Jun 16, 2023, Kai Huang wrote:
> > On Tue, 2023-06-06 at 04:11 +0000, Huang, Kai wrote:
> > > On Fri, 2023-05-26 at 19:32 -0500, Haitao Huang wrote:
> > > > Hi Kai, Jarkko and Dave
> > > >
> > > > On Thu, 09 Mar 2023 05:31:29 -0600, Huang, Kai <kai.huang@intel.com> wrote:
> > > > >
> > > > > So I am still a little bit confused about where does "SGX driver uses
> > > > > MAP_ANONYMOUS semantics for fd-based mmap()" come from.
> > > > >
> > > > > Anyway, we certainly don't want to break userspace. However, IIUC,
> > > > > even from now on we change the driver to depend on userspace to pass
> > > > > the correct pgoff in mmap(), this won't break userspace, because old
> > > > > userspace which doesn't use fadvice() and pgoff actually doesn't
> > > > > matter. For new userspace which uses fadvice(), it needs to pass the
> > > > > correct pgoff.
> > > > >
> > > > > I am not saying we should do this, but it doesn't seem we can break
> > > > > userspace?
> > > > >
> > > >
> > > > Sorry for delayed update but I thought about this more and likely to
> > > > propose a new EAUG ioctl for this and for enabling SGX-CET shadow stack
> > > > pages. But regardless, I'd like to wrap up this discussion to just clarify
> > > > this anonymous semantics design in documentation so people won't get
> > > > confused in future.
> > > >
> > > > I think we all agree to keep this semantics so no user space would need
> > > > specify 'offset' for mmap with enclave fd. And here is my proposed
> > > > documentation changes.
> > > >
> > > > --- a/Documentation/x86/sgx.rst
> > > > +++ b/Documentation/x86/sgx.rst
> > > > @@ -100,6 +100,23 @@ pages and establish enclave page permissions.
> > > > sgx_ioc_enclave_init
> > > > sgx_ioc_enclave_provision
> > > >
> > > > +Enclave memory mapping
> > > > +----------------------
> > > > +
> > > > +A file descriptor created from opening **/dev/sgx_enclave** represents an
> > > > +enclave object. The mmap() syscall with enclave file descriptors does not
> > > > +support non-zero value for the 'offset' parameter.
> > >
> > > I think we all need to understand better why SGX driver requires anonymous
> > > semantics mmap() against /dev/sgx_enclave, and as a result of that, requires
> > > mmap() to pass 0 as pgoff (which looks wasn't even discussed when upstreaming
> > > the driver).
> > >
> > > I'll do some investigation and try to summerize and report back. Thanks.
> > >
> >
> > + Sean.
> >
> > Hi Sean,
> >
> > If you see this and have time, please help to comment. Thanks.
> >
> > I've spent plenty of time to look into the discussions around v20/v28/v29 and
> > roughly v38/v39 to find out why SGX driver requires MAP_ANONYMOUS semantics,
> > AFAICT it turns out it was never explicitly discussed. Or perhaps the
> > "MAP_ANONYMOUS semantics" actually just means "MAP_SHARED | MAP_FIXED + pgoff is
> > ignored", and everyone believed there was no need to explain what does "SGX
> > driver uses MAP_ANONYMOUS semantics for mmap()" mean.
> >
> > Details:
> >
> > The v20 story (that I spent most of my time on) mentioned by Haitao was actually
> > about how to make SGX and LSM work together but not related to SGX driver mmap()
> > semantic.
> >
> > Also Haitao mentioned "the use of anonymous mapping can be traced back to v29"
> > but this actually was just about how to use the first mmap() to "reserve the
> > ELRANGE before ECREATE". It wasn't about to changing mmap(/dev/sgx_enclave)
> > semantics at all.
> >
> > Sean actually suggested to explicitly document "how does SGX driver recommend
> > the user to reserve ELRANGE", but Jarkko didn't think we should do:
> >
> > https://lore.kernel.org/linux-sgx/20200528111910.GB1666298@linux.intel.com/
> >
> > which is a pity IMHO, because I believe for anyone, naturally, the first
> > instinct to reserve ELRANGE is to use mmap(/dev/sgx_enclave) but not
> > mmap(MAP_ANONYMOUS). If we suggest user to use the latter then there must be
> > some reason and IMHO such suggestion and reason should be documented.
>
> Ya, the use of mmap() on fd=-1 is done in order to find an available, naturally
> aligned chunk of virtual memory[*]. IIRC, there was a (very brief) discussion
> about enhancing .mmap() so that userspace wouldn't be responsible for doing the
> alignment, but I think we didn't pursue that idea very because we had bigger fish
> to fry.
Thanks for your time :)
Yeah I noticed in v20, the sgx_get_unmapped_area() internally would allocate
2*len and then manually return the aligned address. This wouldn't work for
mmap()ing the small chunks with an offset smaller than the size
>
> But I think this is unrelated to what you really care about, e.g. a userspace that
> tightly controls its virtual memory could hardcode enclave placement (IIRC graphene
> did/does do that). I.e. the alignment issue is a completely different discussion.
>
> [*] https://lore.kernel.org/all/20190522153836.GA24833@linux.intel.com
Right. For the sake of making progress of this Haitao's series, we want to
understand where did "SGX driver requires mmap(/dev/sgx_enclave) to use
MAP_ANONYMOUS semantics" come from.
But for this topic (how to reserve ELRANGE). I am not sure the reason that we
prefer mmap(MAP_ANONYMOUS) still stands. For instance, the discussion in your
above link was old implementation/assumption -- e.g., MAP_FIXED wasn't even
used/supported for mmap()ing enclave chunks.
So I am wondering now whether we suggest user to use mmap(MAP_ANONYMOUS) to get
a size-aligned address still stands? The current SGX driver's
get_unmapped_area:
static unsigned long sgx_get_unmapped_area(struct file *file,
unsigned long addr,
unsigned long len,
unsigned long pgoff,
unsigned long flags)
{
if ((flags & MAP_TYPE) == MAP_PRIVATE)
return -EINVAL;
if (flags & MAP_FIXED)
return addr;
return current->mm->get_unmapped_area(file, addr, len, pgoff, flags);
}
As you can see if we don't pass MAP_FIXED, which is the case for mmap() to
reserve ELRANGE, looks there's no difference between mmap(MAP_ANONYMOUS) and
mmap(/dev/sgx_enclave)?
>
> > Also, if I am not missing something, the current driver doesn't prevent using
> > mmap(/dev/sgx_enclave, PROT_NONE) to reserve ELANGE. So having clear
> > documentation is helpful for SGX users to choose how to write their apps.
> >
> > Go back to the "SGX driver uses MAP_ANONYMOUS semantics for mmap()", I believe
> > this just is "SGX driver requires mmap() after ECREATE/EINIT to use MAP_SHARED |
> > MAP_FIXED and pgoff is ignored". Or more precisely, pgoff is "not _used_ by SGX
> > driver".
> >
> > In fact, I think "pgoff is ignored/not used" is technically wrong for enclave.
>
> Yeah, it's wrong. It works because, until now apparently, there was never a reason
> a need to care about the file offset since ELRANGE base always provided the necessary
> information. It wasn't a deliberate design choice, we simply overlooked that detail
> (unless Jarkko was holding back on me all those years ago :-) ).
Yeah. But I am not sure whether there are corner cases that we can have
potential bug around here, since those VMA's are not aligned between the core-mm
and the driver.
I haven't thought carefully though. I guess from core-mm's perspective there
are multi-VMAs mapping to the same/overlapping part of the enclave, while the
SGX driver treats them as mapping to different non-overlapping parts. Perhaps
it's OK as long as SGX driver doesn't use vma->pgoff to do something???
>
> > IMHO we should stop saying SGX driver uses MAP_ANONYMOUS semantics, because the
> > truth is it just takes advantage of MAP_FIXED and carelessly ignores the pgoff
> > due to the nature of SGX w/o considering from core-MM's perspective.
> >
> > And IMHO there are two ways to fix:
> >
> > 1) From now on, we ask SGX apps to use the correct pgoff in their
> > mmap(/dev/sgx_enclave). This shouldn't impact the existing SGX apps because SGX
> > driver doesn't use vma->pgoff anyway.
>
> Heh, just "asking" won't help. And breaking userspace, i.e. requiring all apps
> to update, will just get you yelled at :-)
We can document properly and assume the new apps will follow. As I mentioned
above, the old apps, which doesn't/shouldn't have been using fadvice() anyway,
doesn't need to be changed, i.e., they should continue to work. :)
No?
>
> > 2) For the sake of avoiding having to ask existing SGX apps to change their
> > mmap()s, we _officially_ say that userspace isn't required to pass a correct
> > pgoff to mmap() (i.e. passing 0 as did in existing apps), but the kernel should
> > fix the vma->pgoff internally.
>
> I recommend you don't do this. Overwriting vm_pgoff would probably work, but it's
> going to make a flawed API even messier. E.g. you'll have painted SGX into a
> corner if you ever want to decouple vma->start/end from encl->base. I highly
> doubt that will ever happen given how ELRANGE works, but I don't think a hack-a-fix
> buys you enough to justify any more ugliness.
I also found it's not feasible to cleanly fix the pgoff from userspace (I
thought the pgoff could be fixed at very early stage of do_mmap() so everything
later just worked, but it's not the case). In do_mmap() the pgoff from
userspace is already used to VMA merging/splitting etc before creating the
target VMA.
Hmm.. Now I think we shouldn't silently fix pgoff in SGX driver as it may
surprise the core-mm later because the core-mm has already done some job around
VMAs before vma->pgoff is changed.
>
> > I do prefer option 2) because it has no harm to anyone: 1) No changes to
> > existing SGX apps; 2) It aligns with the core-MM to so all existing mmap()
> > operations should work as expected, meaning no surprise; 3) And this patchset
> > from Haitao to use fadvice() to accelerate EAUG flow just works.
>
> I think you can have your cake and eat it too. IIUC, the goal is to get fadvise()
> working, and to do that without creating a truly heinous uAPI, you need an accurate
> vm_pgoff. So, use a carrot and stick approach.
>
> If userspace properly defines vm_pgoff during mmap() and doesn't specify MAP_ANONYMOUS,
> then they get to use fadvise() (give 'em a carrot). But if *any* mmap() on the
> enclave doesn't followo those rules, mark the enclave as tainted or whatever and
> disallow fadvise() (hit 'em with a stick).
If we are supposed to use mmap(MAP_ANONYMOUS) to reserve ELRANGE, then I think
userspace will just use MAP_FIXED for all mmap()s to /dev/sgx_enclave. To
detect whether a VMA has the correct pgoff, we need to somehow mark it in
sgx_mmap(), but currently we don't have facility to do that. Even we can do,
the core madvice() -> vfs_fadvice() cannot be aware of such VMA either, and in
sgx_fadvice() we may have problem locating the correct VMA to do EAUG (because
the vfs_fadvice() uses 0 pgoff to calculate the file offset).
So, now I think we should just let userspace itself to pass the correct pgoff
when it wants to use fadvice(), which is the option 1) I mentioned above. The
kernel doesn't/shouldn't need to fix vma->pgoff.
In another words, I think:
- For the old apps, doesn't matter, continue to work;
- For new apps which want to use fadvice() to accelerate EAUG, it should pass
the correct pgoff in mmap(/dev/sgx_enclave) even with MAP_FIXED.
And we don't say "SGX driver uses MAP_ANONYMOUS semantics for
mmap(/dev/sgx_enclave) any more".
Does this make sense?
next prev parent reply other threads:[~2023-06-19 11:17 UTC|newest]
Thread overview: 60+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-01-28 4:55 [RFC PATCH v4 0/4] x86/sgx: implement support for MADV_WILLNEED Haitao Huang
2023-01-28 4:55 ` [RFC PATCH v4 1/4] x86/sgx: Export sgx_encl_eaug_page Haitao Huang
2023-01-28 4:55 ` [RFC PATCH v4 2/4] x86/sgx: Implement support for MADV_WILLNEED Haitao Huang
2023-01-28 4:55 ` [RFC PATCH v4 3/4] selftests/sgx: add len field for EACCEPT op Haitao Huang
2023-01-28 4:55 ` [RFC PATCH v4 4/4] selftests/sgx: Add test for madvise(..., WILLNEED) Haitao Huang
2023-02-07 23:30 ` Jarkko Sakkinen
2023-02-15 2:38 ` Huang, Kai
2023-02-15 4:42 ` Haitao Huang
2023-02-15 8:46 ` Huang, Kai
2023-02-17 22:29 ` jarkko
2023-02-07 23:29 ` [RFC PATCH v4 3/4] selftests/sgx: add len field for EACCEPT op Jarkko Sakkinen
2023-02-07 23:28 ` [RFC PATCH v4 2/4] x86/sgx: Implement support for MADV_WILLNEED Jarkko Sakkinen
2023-02-14 9:47 ` Huang, Kai
2023-02-14 19:18 ` Haitao Huang
2023-02-14 20:54 ` Huang, Kai
2023-02-14 21:42 ` Haitao Huang
2023-02-14 22:36 ` Huang, Kai
2023-02-15 3:59 ` Haitao Huang
2023-02-15 8:51 ` Huang, Kai
2023-02-15 15:42 ` Haitao Huang
2023-02-16 7:53 ` Huang, Kai
2023-02-16 17:12 ` Haitao Huang
2023-02-17 22:32 ` jarkko
2023-02-17 23:03 ` Haitao Huang
2023-02-21 22:10 ` jarkko
2023-02-22 1:37 ` Haitao Huang
2023-03-07 23:32 ` Huang, Kai
2023-03-09 0:50 ` Haitao Huang
2023-03-09 11:31 ` Huang, Kai
2023-03-14 14:54 ` Haitao Huang
2023-03-19 13:26 ` jarkko
2023-03-20 9:36 ` Huang, Kai
2023-03-20 14:04 ` jarkko
2023-05-27 0:32 ` Haitao Huang
2023-06-06 4:11 ` Huang, Kai
2023-06-07 16:59 ` Haitao Huang
2023-06-16 3:49 ` Huang, Kai
2023-06-16 22:05 ` Sean Christopherson
2023-06-19 11:17 ` Huang, Kai [this message]
2023-06-22 22:01 ` Sean Christopherson
2023-06-22 23:21 ` Huang, Kai
2023-06-26 22:28 ` Sean Christopherson
2023-06-27 11:43 ` Huang, Kai
2023-06-27 14:50 ` Sean Christopherson
2023-06-28 9:37 ` Huang, Kai
2023-06-28 14:57 ` Sean Christopherson
2023-06-29 3:10 ` Huang, Kai
2023-06-29 14:23 ` Sean Christopherson
2023-06-29 23:29 ` Huang, Kai
2023-06-30 0:14 ` Sean Christopherson
2023-06-30 0:56 ` Huang, Kai
2023-06-30 1:54 ` Jarkko Sakkinen
2023-06-30 1:57 ` Jarkko Sakkinen
2023-06-30 4:26 ` Huang, Kai
2023-06-30 9:35 ` Jarkko Sakkinen
2023-03-12 1:25 ` jarkko
2023-03-12 22:25 ` Huang, Kai
2023-02-17 22:07 ` jarkko
2023-02-17 21:50 ` jarkko
2023-02-07 23:26 ` [RFC PATCH v4 1/4] x86/sgx: Export sgx_encl_eaug_page Jarkko Sakkinen
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=fc214963a01ca975d6de2991b63c93f04114c562.camel@intel.com \
--to=kai.huang@intel.com \
--cc=dave.hansen@linux.intel.com \
--cc=haitao.huang@linux.intel.com \
--cc=jarkko@kernel.org \
--cc=linux-sgx@vger.kernel.org \
--cc=reinette.chatre@intel.com \
--cc=seanjc@google.com \
--cc=vijay.dhanraj@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox