From: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
To: linux-sgx@vger.kernel.org
Cc: Sean Christopherson <sean.j.christopherson@intel.com>,
Haitao Huang <haitao.huang@linux.intel.com>
Subject: Re: [PATCH v2] x86/sgx: Fix deadlock and race conditions between fork() and EPC reclaim
Date: Sat, 4 Apr 2020 02:33:31 +0300
Message-ID: <20200403233237.GA11802@linux.intel.com>
In-Reply-To: <20200403093550.104789-1-jarkko.sakkinen@linux.intel.com>
On Fri, Apr 03, 2020 at 12:35:50PM +0300, Jarkko Sakkinen wrote:
> From: Sean Christopherson <sean.j.christopherson@intel.com>
>
> Drop the synchronize_srcu() from sgx_encl_mm_add() and replace it with a
> mm_list versioning concept to avoid deadlock when adding a mm during
> dup_mmap()/fork(), and to ensure copied PTEs are zapped.
>
> When dup_mmap() runs, it holds mmap_sem for write in both the old mm and
> new mm. Invoking synchronize_srcu() while holding mmap_sem of a mm that
> is already attached to the enclave will deadlock if the reclaimer is in
> the process of walking mm_list, as the reclaimer will try to acquire
> mmap_sem (of the old mm) while holding encl->srcu for read.
>
> INFO: task ksgxswapd:181 blocked for more than 120 seconds.
> ksgxswapd D 0 181 2 0x80004000
> Call Trace:
> __schedule+0x2db/0x700
> schedule+0x44/0xb0
> rwsem_down_read_slowpath+0x370/0x470
> down_read+0x95/0xa0
> sgx_reclaim_pages+0x1d2/0x7d0
> ksgxswapd+0x151/0x2e0
> kthread+0x120/0x140
> ret_from_fork+0x35/0x40
>
> INFO: task fork_consistenc:18824 blocked for more than 120 seconds.
> fork_consistenc D 0 18824 18786 0x00004320
> Call Trace:
> __schedule+0x2db/0x700
> schedule+0x44/0xb0
> schedule_timeout+0x205/0x300
> wait_for_completion+0xb7/0x140
> __synchronize_srcu.part.22+0x81/0xb0
> synchronize_srcu_expedited+0x27/0x30
> synchronize_srcu+0x57/0xe0
> sgx_encl_mm_add+0x12b/0x160
> sgx_vma_open+0x22/0x40
> dup_mm+0x521/0x580
> copy_process+0x1a56/0x1b50
> _do_fork+0x85/0x3a0
> __x64_sys_clone+0x8e/0xb0
> do_syscall_64+0x57/0x1b0
> entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> Furthermore, doing synchronize_srcu() in sgx_encl_mm_add() does not
> prevent the new mm from having stale PTEs pointing at the EPC page to be
> reclaimed. dup_mmap() calls vm_ops->open()/sgx_encl_mm_add() _after_
> PTEs are copied to the new mm, i.e. blocking fork() until reclaim zaps
> the old mm is pointless as the stale PTEs have already been created in
> the new mm.
>
> All other flows that walk mm_list can safely race with dup_mmap() or are
> protected by a different mechanism. Add comments to all srcu readers
> that don't check the list version to document why it's ok for the flow to
> ignore the version.
>
> Note, synchronize_srcu() is still needed when removing a mm from an
> enclave, as the srcu readers must complete their walk before the mm can
> be freed. Removing a mm is never done while holding mmap_sem.
>
> Cc: Haitao Huang <haitao.huang@linux.intel.com>
> Cc: Sean Christopherson <sean.j.christopherson@intel.com>
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
> ---
> v2:
> * Remove smp_wmb() as x86 does not reorder writes in the pipeline.
> * Refine comments to be more to the point and more maintainable when
>   things might change.
> * Replace the ad hoc (goto-based) loop construct with a proper loop
>   construct.
>  arch/x86/kernel/cpu/sgx/encl.c    | 11 +++++++--
>  arch/x86/kernel/cpu/sgx/encl.h    |  1 +
>  arch/x86/kernel/cpu/sgx/reclaim.c | 39 +++++++++++++++++++++----------
>  3 files changed, 37 insertions(+), 14 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
> index e0124a2f22d5..5b15352b3d4f 100644
> --- a/arch/x86/kernel/cpu/sgx/encl.c
> +++ b/arch/x86/kernel/cpu/sgx/encl.c
> @@ -196,6 +196,9 @@ int sgx_encl_mm_add(struct sgx_encl *encl, struct mm_struct *mm)
>  	struct sgx_encl_mm *encl_mm;
>  	int ret;
>
> +	/* mm_list can be accessed only by a single thread at a time. */
> +	lockdep_assert_held_write(&mm->mmap_sem);
> +
>  	if (atomic_read(&encl->flags) & SGX_ENCL_DEAD)
>  		return -EINVAL;
>
> @@ -221,12 +224,16 @@ int sgx_encl_mm_add(struct sgx_encl *encl, struct mm_struct *mm)
>  		return ret;
>  	}
>
> +	/*
> +	 * The page reclaimer uses list version for synchronization instead of
> +	 * synchronize_srcu() because otherwise we could conflict with
> +	 * dup_mmap().
> +	 */
>  	spin_lock(&encl->mm_lock);
>  	list_add_rcu(&encl_mm->list, &encl->mm_list);
> +	encl->mm_list_version++;
>  	spin_unlock(&encl->mm_lock);
>
> -	synchronize_srcu(&encl->srcu);
> -
>  	return 0;
>  }
>
> diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
> index 44b353aa8866..f0f72e591244 100644
> --- a/arch/x86/kernel/cpu/sgx/encl.h
> +++ b/arch/x86/kernel/cpu/sgx/encl.h
> @@ -74,6 +74,7 @@ struct sgx_encl {
>  	struct mutex lock;
>  	struct list_head mm_list;
>  	spinlock_t mm_lock;
> +	unsigned long mm_list_version;
>  	struct file *backing;
>  	struct kref refcount;
>  	struct srcu_struct srcu;
> diff --git a/arch/x86/kernel/cpu/sgx/reclaim.c b/arch/x86/kernel/cpu/sgx/reclaim.c
> index 39f0ddefbb79..5fb8bdfa6a1f 100644
> --- a/arch/x86/kernel/cpu/sgx/reclaim.c
> +++ b/arch/x86/kernel/cpu/sgx/reclaim.c
> @@ -184,28 +184,38 @@ static void sgx_reclaimer_block(struct sgx_epc_page *epc_page)
>  	struct sgx_encl_page *page = epc_page->owner;
>  	unsigned long addr = SGX_ENCL_PAGE_ADDR(page);
>  	struct sgx_encl *encl = page->encl;
> +	unsigned long mm_list_version;
>  	struct sgx_encl_mm *encl_mm;
>  	struct vm_area_struct *vma;
>  	int idx, ret;
>
> -	idx = srcu_read_lock(&encl->srcu);
> +	do {
> +		mm_list_version = encl->mm_list_version;
>
> -	list_for_each_entry_rcu(encl_mm, &encl->mm_list, list) {
> -		if (!mmget_not_zero(encl_mm->mm))
> -			continue;
> +		/* Fence reads as the CPU can reorder them. This guarantees
> +		 * that we don't access old list with a new version.
> +		 */
> +		smp_rmb();
>
> -		down_read(&encl_mm->mm->mmap_sem);
> +		idx = srcu_read_lock(&encl->srcu);
>
> -		ret = sgx_encl_find(encl_mm->mm, addr, &vma);
> -		if (!ret && encl == vma->vm_private_data)
> -			zap_vma_ptes(vma, addr, PAGE_SIZE);
> +		list_for_each_entry_rcu(encl_mm, &encl->mm_list, list) {
> +			if (!mmget_not_zero(encl_mm->mm))
> +				continue;
>
> -		up_read(&encl_mm->mm->mmap_sem);
> +			down_read(&encl_mm->mm->mmap_sem);
>
> -		mmput_async(encl_mm->mm);
> -	}
> +			ret = sgx_encl_find(encl_mm->mm, addr, &vma);
> +			if (!ret && encl == vma->vm_private_data)
> +				zap_vma_ptes(vma, addr, PAGE_SIZE);
>
> -	srcu_read_unlock(&encl->srcu, idx);
> +			up_read(&encl_mm->mm->mmap_sem);
> +
> +			mmput_async(encl_mm->mm);
> +		}
> +
> +		srcu_read_unlock(&encl->srcu, idx);
> +	} while (unlikely(encl->mm_list_version != mm_list_version));

This is bad, or at least it makes the code unnecessarily hard to follow
given the use of the fences: encl->mm_list_version is read twice per
iteration.

I'll change the code to use "for ( ; ; )" and have exactly one read per
iteration.
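
A minimal user-space sketch of one possible shape for that loop, assuming
C11 atomics in place of the kernel primitives. struct encl_sketch,
walk_until_stable() and racing_walk() are hypothetical names used only for
illustration, the callback stands in for the SRCU-protected mm_list walk,
and this is not the actual follow-up patch:

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for struct sgx_encl: only the version counter. */
struct encl_sketch {
	atomic_ulong mm_list_version;
};

/* Hypothetical callback standing in for the SRCU-protected mm_list walk. */
typedef void (*walk_fn)(struct encl_sketch *encl, void *arg);

/*
 * Repeat the walk until the version seen before a pass equals the version
 * seen after it. The counter is loaded once up front and then exactly once
 * per iteration: the value loaded after a pass is reused as the "before"
 * snapshot of the next pass.
 *
 * The acquire loads keep the following walk's reads from being hoisted
 * above the snapshot; in this sketch they take the place of the smp_rmb()
 * in the quoted patch.
 *
 * Returns the number of passes made over the list.
 */
static unsigned long walk_until_stable(struct encl_sketch *encl,
				       walk_fn walk, void *arg)
{
	unsigned long seen = atomic_load_explicit(&encl->mm_list_version,
						  memory_order_acquire);
	unsigned long passes = 0;

	for ( ; ; ) {
		unsigned long now;

		walk(encl, arg);
		passes++;

		now = atomic_load_explicit(&encl->mm_list_version,
					   memory_order_acquire);
		if (now == seen)
			break;

		seen = now;
	}

	return passes;
}

/*
 * Demo walker (hypothetical): pretends that an mm was added to the list
 * during the walk by bumping the version while *arg is positive.
 */
static void racing_walk(struct encl_sketch *encl, void *arg)
{
	int *pending_adds = arg;

	if (*pending_adds > 0) {
		(*pending_adds)--;
		atomic_fetch_add(&encl->mm_list_version, 1);
	}
}
```

With two simulated additions racing against the walk, walk_until_stable()
makes three passes: two that observe a version change and one that does not.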

/Jarkko