public inbox for linux-sgx@vger.kernel.org
 help / color / mirror / Atom feed
From: Jarkko Sakkinen <jarkko@kernel.org>
To: Dave Hansen <dave.hansen@intel.com>
Cc: linux-sgx@vger.kernel.org,
	"dave.hansen@linux.intel.com" <dave.hansen@linux.intel.com>,
	reinette.chatre@intel.com,
	Nathaniel McCallum <nathaniel@profian.com>, "Christopherson,,
	Sean" <seanjc@google.com>
Subject: Re: VMA's not getting merged
Date: Mon, 6 Jun 2022 10:21:15 +0300	[thread overview]
Message-ID: <Yp2q68qYtdpR+Z+T@iki.fi> (raw)
In-Reply-To: <YoRCJCtSGfxjwu2w@iki.fi>

On Wed, May 18, 2022 at 03:47:32AM +0300, Jarkko Sakkinen wrote:
> On Mon, May 16, 2022 at 09:45:56PM +0300, Jarkko Sakkinen wrote:
> > On Mon, May 16, 2022 at 07:13:42AM -0700, Dave Hansen wrote:
> > > On 5/14/22 18:20, Jarkko Sakkinen wrote:
> > > > Looks like VMA's do not get merged properly:
> > > ...
> > > > 7f8f0800b000-7f8f08014000 rw-s 00000000 00:05 84                         /dev/sgx_enclave
> > > > 7f8f08014000-7f8f08025000 rw-s 00000000 00:05 84                         /dev/sgx_enclave
> > > > 7f8f08025000-7f8f08046000 rw-s 00000000 00:05 84                         /dev/sgx_enclave
> > > > 7f8f08046000-7f8f0804a000 rw-s 00000000 00:05 84                         /dev/sgx_enclave
> > > > 7f8f0804a000-7f8f0804b000 rw-s 00000000 00:05 84                         /dev/sgx_enclave
> > > > 7f8f0804b000-7f8f0804c000 rw-s 00000000 00:05 84                         /dev/sgx_enclave
> > > > 7f8f0804c000-7f8f0804d000 rw-s 00000000 00:05 84                         /dev/sgx_enclave
> > > > 7f8f0804d000-7f8f0804e000 rw-s 00000000 00:05 84                         /dev/sgx_enclave
> > > 
> > > I remember some issue with VM_IO VMAs not getting merged, but I don't
> > > remember how we fixed or addressed it.
> > > 
> > > Seems like something we can and should fix, though.
> > 
> > Access pattern here is series of small adjacent mmaps (brk()).
> > 
> > During upstreaming vma_close callback was eliminated to allow VMA's getting
> > merged. I'll try to trace the condition in the call path triggering this.
> > 
> > BR, Jarkko
> 
> I wrote a small bpftrace script:
> [SNIP]

I started looking into this again by writing a short program:

// SPDX-License-Identifier: GPL-2.0
/* Derivative of tools/testing/selftests/sgx */

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <asm/sgx.h>

#define ENCL_SIZE (1 << 24) /* 16 MB */

struct sgx_secs {
	__u64 size;
	__u64 base;
	__u32 ssa_frame_size;
	__u32 miscselect;
	__u8  reserved1[24];
	__u64 attributes;
	__u64 xfrm;
	__u32 mrenclave[8];
	__u8  reserved2[32];
	__u32 mrsigner[8];
	__u8  reserved3[32];
	__u32 config_id[16];
	__u16 isv_prod_id;
	__u16 isv_svn;
	__u16 config_svn;
	__u8  reserved4[3834];
} __attribute__((packed));

void dump_maps(void)
{
	char maps_line[256];
	FILE *fp = fopen("/proc/self/maps", "r");
	if (fp != NULL)  {
		while (fgets(maps_line, sizeof(maps_line), fp) != NULL) {
			maps_line[strlen(maps_line) - 1] = '\0';

			if (strstr(maps_line, "/dev/sgx_enclave"))
				printf("%s\n", maps_line);
		}

		fclose(fp);
	}
}

int main(void)
{	
	const char device_path[] = "/dev/sgx_enclave";
	struct sgx_enclave_create ioc;
	struct sgx_secs secs;
	int fd = -1;
	void *area;
	void *addr;
	int ret;

	area = mmap(NULL, ENCL_SIZE * 2, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (area == MAP_FAILED) {
		perror("mmap");
		exit(1);
	}

	memset(&secs, 0, sizeof(secs));
	secs.ssa_frame_size = 1;
	secs.attributes = 4;
	secs.xfrm = 3;
	secs.base = ((uint64_t)area + ENCL_SIZE - 1) & ~(ENCL_SIZE - 1);
	secs.size = ENCL_SIZE;

	munmap(area, secs.base - (uint64_t)area);
	munmap((void *)(secs.base + ENCL_SIZE), (uint64_t)area + ENCL_SIZE - secs.base);

	fd = open(device_path, O_RDWR);
	if (fd < 0) {
		perror("open");
		exit(1);
	}

	ioc.src = (unsigned long)&secs;
	ret = ioctl(fd, SGX_IOC_ENCLAVE_CREATE, &ioc);
	if (ret) {
		perror("ioctl");
		exit(1);
	}

	addr = mmap((void *)secs.base, ENCL_SIZE / 2, PROT_READ | PROT_WRITE,
		    MAP_SHARED | MAP_FIXED, fd, 0);
	if (addr == MAP_FAILED) {
		perror("mmap");
		exit(1);
	}

	addr = mmap((void *)(secs.base + ENCL_SIZE / 4), ENCL_SIZE / 2, PROT_READ | PROT_WRITE,
		    MAP_SHARED | MAP_FIXED, fd, 0);
	if (addr == MAP_FAILED) {
		perror("mmap");
		exit(1);
	}

	dump_maps();

	exit(0);
}

It creates two RW VMA's:

1. 8 MB VMA.
2. Another 8 MB VMA that overlaps the first by 4 MB.

Expected result ought to be 12 MB RW VMA.

Then I wrote a another eBPF script:

#include <linux/fs.h>
#include <linux/mm.h>

k:vma_merge
/strncmp(str(((struct vm_area_struct *)arg1)->vm_file->f_path.dentry->d_name.name), "sgx_enclave", 11) == 0/
{
	@merge[tid] = true;
}

kr:vma_merge /@merge[tid]/
{
	printf("vma_merge: retval = %d\n", retval);
}

k:__vma_adjust
/strncmp(str(((struct vm_area_struct *)arg0)->vm_file->f_path.dentry->d_name.name), "sgx_enclave", 11) == 0/
{
	@adjust[tid] = true;
}

kr:__vma_adjust /@adjust[tid]/
{
	printf("__vma_adjust: retval = %d\n", retval);
}

kr:__mpol_equal
{
	printf("__mpol_equal: retval = %d\n", retval);
}

This gives me:

$ sudo bpftrace sgx-vma-merged.bt 
Attaching 5 probes...
__vma_adjust: retval = 0
__vma_adjust: retval = 0
vma_merge: retval = 0
vma_merge: retval = 0
vma_merge: retval = 0

Two __vma_adjust() calls are result of mapping enclave VMA on top
of anonymous VMA. __vma_adjust() is never reached when overlapping
VMA is mapped.

Another noworthy thing is that mpol_equal() is never reached, and
the condition in vma_merge  is like this:

	if (prev && prev->vm_end == addr &&
			mpol_equal(vma_policy(prev), policy) &&
			can_vma_merge_after(prev, vm_flags,
					    anon_vma, file, pgoff,
					    vm_userfaultfd_ctx, anon_name)) {

So this led to the obvious fact: it's the VM_SPECIAL check in the
beginning of the vma_merge() function.

I found at least this old ref on removing the close callback:

https://lore.kernel.org/linux-sgx/20190708145707.GC20433@linux.intel.com/

So I guess we did not take this VM_SPECIAL check in the account...

On bright side: this can be obviously worked around in the
user space. E.g. in "brk()" you do mmap() over the whole
new VMA instead of the new subportion of it, and so forth.

And all in all the whole merging thing can be worked around
in the user space. So it's not a dead end but still would
be nice to have kernel to do this automatically.

Dave: so my question is this: can VM_SPECIAL check be
relaxed somehow, or should it be just documented that
enclave VMAs do not get merged?

Anyway, if this was a blocker for getting SGX2 patches to
5.19, it should not be. I did all the research with
my i5-9600KF desktop, which is oddbal CPU with flexible
launch control, and no SGX2. Neither was any SGX2 patches
applied.

BR, Jarkko

  reply	other threads:[~2022-06-06  7:23 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-05-15  1:20 VMA's not getting merged Jarkko Sakkinen
2022-05-16 14:13 ` Dave Hansen
2022-05-16 18:45   ` Jarkko Sakkinen
2022-05-18  0:47     ` Jarkko Sakkinen
2022-06-06  7:21       ` Jarkko Sakkinen [this message]
2022-06-06 11:40         ` Jarkko Sakkinen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Yp2q68qYtdpR+Z+T@iki.fi \
    --to=jarkko@kernel.org \
    --cc=dave.hansen@intel.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=linux-sgx@vger.kernel.org \
    --cc=nathaniel@profian.com \
    --cc=reinette.chatre@intel.com \
    --cc=seanjc@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox