From: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
To: linux-sgx@vger.kernel.org
Cc: sean.j.christopherson@intel.com, shay.katz-zamir@intel.com,
serge.ayoun@intel.com
Subject: Re: x86/sgx: v23-rc1
Date: Thu, 10 Oct 2019 11:07:44 +0300 [thread overview]
Message-ID: <20191010080744.GA20442@linux.intel.com> (raw)
In-Reply-To: <20191010073815.GB31041@linux.intel.com>
On Thu, Oct 10, 2019 at 10:38:15AM +0300, Jarkko Sakkinen wrote:
> On Thu, Oct 10, 2019 at 10:34:48AM +0300, Jarkko Sakkinen wrote:
> > On Wed, Oct 09, 2019 at 05:35:29PM +0300, Jarkko Sakkinen wrote:
> > > On Wed, Oct 09, 2019 at 05:07:23PM +0300, Jarkko Sakkinen wrote:
> > > > Baseline before adding Sean's updates. This contains only my updates. I
> > > > spent this day mostly fixing diff's. Especially these two were somewhat
> > > > unclean:
> > > >
> > > > 1. x86/sgx: Add a page reclaimer
> > > > 2. x86/sgx: Linux Enclave Driver
> > > >
> > > > Now they pile up nicely (I think). So I decided to do this tag since now
> > > > commit's in the sense of form and shape are legit. And also because
> > > > things, well, work.
> > > >
> > > > I'll continue from this by integrating Sean's changes. You can see below
> > > > what has been already changed.
> > > >
> > > > /Jarkko
> > > >
> > > > tag v23-rc1
> > > > Tagger: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
> > > > Date: Wed Oct 9 16:59:10 2019 +0300
> > > >
> > > > x86/sgx: v23-rc1 patch set
> > > >
> > > > * Return -EIO instead of -ECANCELED when ptrace() fails to read a TCS page.
> > > > * In the reclaimer, pin page before ENCLS[EBLOCK] because pinning can fail
> > > > (because of OOM) even in legit behaviour and after EBLOCK the reclaiming
> > > > flow can be only reverted by killing the whole enclave.
> > > > * Fixed SGX_ATTR_RESERVED_MASK. Bit 7 was marked as reserved while in fact
> > > > it should have been bit 6 (Table 37-3 in the SDM).
> > > > * Return -EPERM from SGX_IOC_ENCLAVE_INIT when ENCLS[EINIT] returns an SGX
> > > > error code.
> > > > -----BEGIN PGP SIGNATURE-----
> > > >
> > > > iJYEABYIAD4WIQRE6pSOnaBC00OEHEIaerohdGur0gUCXZ3nxCAcamFya2tvLnNh
> > > > a2tpbmVuQGxpbnV4LmludGVsLmNvbQAKCRAaerohdGur0mKVAQDcmIGs2f8y8hDY
> > > > b7zaQdNbaAMgsEkQ3ohMA88fbm2UQwD+P7y5AcAxzdccbgh++7RDy6XR2Ow2pluW
> > > > vCGUvRAhgwY=
> > > > =LCI3
> > > > -----END PGP SIGNATURE-----
> > > >
> > > > /Jarkko
> > >
> > > Getting this with rc1 (after running selftest). Leaving from office.
> > > No time to check this today but here are anyway logs.
> > >
> > > [ 96.906523] ============================================
> > > [ 96.906600] WARNING: possible recursive locking detected
> > > [ 96.906679] 5.4.0-rc1-custom #66 Not tainted
> > > [ 96.906741] --------------------------------------------
> > > [ 96.906817] test_sgx/1297 is trying to acquire lock:
> > > [ 96.906889] ffff99032aebdb18 (&mm->mmap_sem#2){++++}, at: __do_page_fault+0x424/0x4f0
> > > [ 96.907009]
> > > but task is already holding lock:
> > > [ 96.907091] ffff99032aebdb18 (&mm->mmap_sem#2){++++}, at: sgx_ioc_enclave_add_page+0x1fc/0x620
> > > [ 96.907217]
> > > other info that might help us debug this:
> > > [ 96.907308] Possible unsafe locking scenario:
> > >
> > > [ 96.907391] CPU0
> > > [ 96.907428] ----
> > > [ 96.907464] lock(&mm->mmap_sem#2);
> > > [ 96.907516] lock(&mm->mmap_sem#2);
> > > [ 96.907569]
> > > *** DEADLOCK ***
> > >
> > > [ 96.907650] May be due to missing lock nesting notation
> > >
> > > [ 96.907745] 2 locks held by test_sgx/1297:
> > > [ 96.907804] #0: ffff99032aebdb18 (&mm->mmap_sem#2){++++}, at: sgx_ioc_enclave_add_page+0x1fc/0x620
> > > [ 96.907935] #1: ffff990322de0080 (&encl->lock){+.+.}, at: sgx_ioc_enclave_add_page+0x212/0x620
> > > [ 96.910109]
> > > stack backtrace:
> > > [ 96.914616] CPU: 0 PID: 1297 Comm: test_sgx Not tainted 5.4.0-rc1-custom #66
> > > [ 96.918182] Hardware name: Intel Corporation NUC7CJYH/NUC7JYB, BIOS JYGLKCPX.86A.0047.2018.1219.1246 12/19/2018
> > > [ 96.921795] Call Trace:
> > > [ 96.925209] dump_stack+0x8e/0xd5
> > > [ 96.928462] __lock_acquire+0xeab/0x1470
> > > [ 96.931648] ? __do_fault+0x57/0x11d
> > > [ 96.934761] lock_acquire+0xa3/0x180
> > > [ 96.937820] ? __do_page_fault+0x424/0x4f0
> > > [ 96.940866] down_read+0x30/0x150
> > > [ 96.943867] ? __do_page_fault+0x424/0x4f0
> > > [ 96.946889] __do_page_fault+0x424/0x4f0
> > > [ 96.949792] do_page_fault+0x2c/0x1a0
> > > [ 96.952602] page_fault+0x39/0x40
> > > [ 96.955400] RIP: 0010:sgx_ioc_enclave_add_page+0x3aa/0x620
> > > [ 96.958181] Code: 9d 10 ff ff ff 48 89 c8 48 81 e1 00 f0 ff ff 83 e0 0f 48 8d 14 40 48 8d 04 90 49 8d 04 c0 48 2b 08 48 03 48 08 b8 01 00 00 00 <0f> 01 cf 31 c0 0f 01 ca 85 c0 0f 85 0b 02 00 00 65 48 8b 04 25 c0
> > > [ 96.964133] RSP: 0018:ffffb46640df3c80 EFLAGS: 00050286
> > > [ 96.967141] RAX: 0000000000000001 RBX: ffffb46640df3cc0 RCX: ffffb4664ddfd000
> > > [ 96.970254] RDX: 0000000000000000 RSI: 00007f2d7a30f000 RDI: 0000000000000246
> > > [ 96.973440] RBP: ffffb46640df3db0 R08: ffffffffa8faf8a0 R09: 0000000000000000
> > > [ 96.976698] R10: 0000000000000000 R11: 0000000000000000 R12: ffff990322de0000
> > > [ 96.980048] R13: ffff99032ba198c0 R14: 0000000000000000 R15: ffff99033a10cfe0
> > > [ 96.983425] ? avc_has_extended_perms+0x1f6/0x610
> > > [ 96.986830] sgx_ioctl+0x87/0x470
> > > [ 96.990247] ? sgx_ioctl+0x87/0x470
> > > [ 96.993696] do_vfs_ioctl+0xa9/0x6d0
> > > [ 96.997151] ? tomoyo_file_ioctl+0x19/0x20
> > > [ 97.000571] ksys_ioctl+0x75/0x80
> > > [ 97.003983] ? do_syscall_64+0x17/0x230
> > > [ 97.007285] __x64_sys_ioctl+0x1a/0x20
> > > [ 97.010487] do_syscall_64+0x5f/0x230
> > > [ 97.013612] entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > > [ 97.016791] RIP: 0033:0x7f2d79e135d7
> > > [ 97.019897] Code: b3 66 90 48 8b 05 b1 48 2d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 81 48 2d 00 f7 d8 64 89 01 48
> > > [ 97.026332] RSP: 002b:00007ffd983721f8 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
> > > [ 97.029673] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f2d79e135d7
> > > [ 97.033037] RDX: 00007ffd98372260 RSI: 000000004020a401 RDI: 0000000000000003
> > > [ 97.036426] RBP: 00007ffd98372330 R08: 0000000000000003 R09: 0000000000000000
> > > [ 97.039813] R10: 00007ffd98372350 R11: 0000000000000202 R12: 00005565ae6e89e0
> > > [ 97.043190] R13: 00007ffd98373c40 R14: 0000000000000000 R15: 0000000000000000
> > > [ 681.794211] kmemleak: 1 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
> > >
> > > $ sudo cat /sys/kernel/debug/kmemleak
> > > [sudo] password for jsakkine:
> > > unreferenced object 0xffff990325b69eb0 (size 16):
> > > comm "kworker/u8:1", pid 31, jiffies 4294895395 (age 1718.288s)
> > > hex dump (first 16 bytes):
> > > 6d 65 6d 73 74 69 63 6b 30 00 01 00 00 00 00 00 memstick0.......
> > > backtrace:
> > > [<0000000010512df5>] __kmalloc_track_caller+0x139/0x280
> > > [<00000000a5374cb0>] kstrdup+0x31/0x60
> > > [<00000000c59be911>] kstrdup_const+0x24/0x30
> > > [<00000000ff88e957>] kvasprintf_const+0x86/0xa0
> > > [<0000000050affb9a>] kobject_set_name_vargs+0x23/0x90
> > > [<00000000839b8dd7>] dev_set_name+0x4e/0x70
> > > [<0000000069897a8c>] memstick_check+0xdf/0x3a3 [memstick]
> > > [<00000000dffb0c9f>] process_one_work+0x281/0x5c0
> > > [<00000000090981e2>] worker_thread+0x34/0x400
> > > [<00000000bb117b3c>] kthread+0x121/0x140
> > > [<000000004d2f4c32>] ret_from_fork+0x24/0x50
> > >
> > > /Jarkko
> >
> > The locking order is all wrong:
> >
> > up_read(¤t->mm->mmap_sem);
> >
> > if (ret)
> > goto err_out;
> >
> > ret = __sgx_encl_extend(encl, epc_page, addp->mrmask);
> > if (ret)
> > goto err_out;
> >
> > encl_page->encl = encl;
> > encl_page->epc_page = epc_page;
> > encl->secs_child_cnt++;
> >
> > sgx_mark_page_reclaimable(encl_page->epc_page);
> >
> > mutex_unlock(&encl->lock);
> >
> > Sean: what might be reason for this? Probably is caused by add
> > page worker changes. Is this just something that has happend when
> > squashing patches by accident?
>
> I guess it is a real regression as the same happens in the
> error path.
The problem is that __uaccess_begin() can ignite #PF, which causes the
deadlock.
When I saw this first time I proposed using get_user_pages() because the
code looked at first sight a bit flakky (e.g. you don't ignite fault
handlers explicitly). Now I think it is the only acceptable way to fix
this.
/Jarkko
prev parent reply other threads:[~2019-10-10 8:07 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-10-09 14:07 x86/sgx: v23-rc1 Jarkko Sakkinen
2019-10-09 14:35 ` Jarkko Sakkinen
2019-10-10 7:34 ` Jarkko Sakkinen
2019-10-10 7:38 ` Jarkko Sakkinen
2019-10-10 8:07 ` Jarkko Sakkinen [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20191010080744.GA20442@linux.intel.com \
--to=jarkko.sakkinen@linux.intel.com \
--cc=linux-sgx@vger.kernel.org \
--cc=sean.j.christopherson@intel.com \
--cc=serge.ayoun@intel.com \
--cc=shay.katz-zamir@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.