All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dave Hansen <dave.hansen@intel.com>
To: Mel Gorman <mgorman@techsingularity.net>
Cc: linux-kernel@vger.kernel.org, x86@kernel.org,
	linux-api@vger.kernel.org, linux-arch@vger.kernel.org,
	linux-mm@kvack.org, torvalds@linux-foundation.org,
	akpm@linux-foundation.org, dave.hansen@linux.intel.com,
	arnd@arndb.de, hughd@google.com, viro@zeniv.linux.org.uk
Subject: Re: [PATCH 5/9] x86, pkeys: allocation/free syscalls
Date: Thu, 7 Jul 2016 08:38:10 -0700	[thread overview]
Message-ID: <577E7762.1010003@intel.com> (raw)
In-Reply-To: <20160707144017.GW11498@techsingularity.net>

On 07/07/2016 07:40 AM, Mel Gorman wrote:
> Ok, so the last patch wired up the system call before the kernel was
> tracking which numbers were in use. It doesn't really matter as such but
> the patches should be swapped around and only expose the systemcall when
> it's actually safe.

I can do that.

>> These system calls are also very important given the kernel's use
>> of pkeys to implement execute-only support.  These help ensure
>> that userspace can never assume that it has control of a key
>> unless it first asks the kernel.
>>
>> The 'init_access_rights' argument to pkey_alloc() specifies the
>> rights that will be established for the returned pkey.  For
>> instance:
>>
>> 	pkey = pkey_alloc(flags, PKEY_DENY_WRITE);
>>
>> will allocate 'pkey', but also sets the bits in PKRU[1] such that
>> writing to 'pkey' is already denied.  This keeps userspace from
>> needing to have knowledge about manipulating PKRU with the
>> RDPKRU/WRPKRU instructions.  Userspace is still free to use these
>> instructions as it wishes, but this facility ensures it is no
>> longer required.
>>
>> The kernel does _not_ enforce that this interface must be used for
>> changes to PKRU, even for keys it does not control.
>>
>> The kernel does not prevent pkey_free() from successfully freeing
>> in-use pkeys (those still assigned to a memory range by
>> pkey_mprotect()).  It would be expensive to implement the checks
>> for this, so we instead say, "Just don't do it" since sane
>> software will never do it anyway.
> 
> Unfortunately, it could manifest as either corruption due to an area
> expected to be protected being accessible or an unexpected SEGV.
> 
> I accept the expensive arguement but it opens a new class of problems
> that userspace debuggers will need to evaluate.

Yeah.  I guess it would be good to have a debugging mechanism here at least.

>> +static inline
>> +bool mm_pkey_is_allocated(struct mm_struct *mm, unsigned long pkey)
>> +{
>> +	if (!validate_pkey(pkey))
>> +		return true;
>> +
>> +	return mm_pkey_allocation_map(mm) & (1 << pkey);
>> +}
>> +
> 
> We flip-flop between whether pkey is signed or unsigned.

Yeah, I can add some consistency here, for sure.

>> +SYSCALL_DEFINE2(pkey_alloc, unsigned long, flags, unsigned long, init_val)
>> +{
>> +	int pkey;
>> +	int ret;
>> +
>> +	/* No flags supported yet. */
>> +	if (flags)
>> +		return -EINVAL;
>> +	/* check for unsupported init values */
>> +	if (init_val & ~PKEY_ACCESS_MASK)
>> +		return -EINVAL;
>> +
>> +	down_write(&current->mm->mmap_sem);
>> +	pkey = mm_pkey_alloc(current->mm);
>> +
>> +	ret = -ENOSPC;
>> +	if (pkey == -1)
>> +		goto out;
>> +
>> +	ret = arch_set_user_pkey_access(current, pkey, init_val);
>> +	if (ret) {
>> +		mm_pkey_free(current->mm, pkey);
>> +		goto out;
>> +	}
>> +	ret = pkey;
>> +out:
>> +	up_write(&current->mm->mmap_sem);
>> +	return ret;
>> +}
> 
> It's not wrong as such but mmap_sem taken for write seems *extremely*
> heavy to protect the allocation mask. If userspace is racing a key
> allocation with mprotect, it's already game over in terms of random
> behaviour.
> 
> I've no idea what the frequency of pkey alloc/free is expected to be. If
> it's really low then maybe it doesn't matter but if it's high this is
> going to be a bottleneck later.

I think pkey_alloc() is fundamentally less frequent than mprotect().  If
you're doing a pkey_alloc() it's because you want to set it on at least
one memory area, which means at least one mprotect().  So, at _worst_,
it's 1:1.  If you've got more than one thing you're protecting, you'll
have many mprotect()s for each pkey_alloc().

The real reason I did this, though, was to avoid having _some_ other
lock.  It'll cost more storage space, have more locking rules and I need
exclusion against pkey_mprotect() which already holds mmap_sem for
write.  IOW, I think this was the simplest thing to do.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

WARNING: multiple messages have this Message-ID (diff)
From: Dave Hansen <dave.hansen@intel.com>
To: Mel Gorman <mgorman@techsingularity.net>
Cc: linux-kernel@vger.kernel.org, x86@kernel.org,
	linux-api@vger.kernel.org, linux-arch@vger.kernel.org,
	linux-mm@kvack.org, torvalds@linux-foundation.org,
	akpm@linux-foundation.org, dave.hansen@linux.intel.com,
	arnd@arndb.de, hughd@google.com, viro@zeniv.linux.org.uk
Subject: Re: [PATCH 5/9] x86, pkeys: allocation/free syscalls
Date: Thu, 7 Jul 2016 08:38:10 -0700	[thread overview]
Message-ID: <577E7762.1010003@intel.com> (raw)
Message-ID: <20160707153810.uGKMZS2emIKX4IWKR-yHyqxm5-pcZuM-HbS5tPFQeeU@z> (raw)
In-Reply-To: <20160707144017.GW11498@techsingularity.net>

On 07/07/2016 07:40 AM, Mel Gorman wrote:
> Ok, so the last patch wired up the system call before the kernel was
> tracking which numbers were in use. It doesn't really matter as such but
> the patches should be swapped around and only expose the systemcall when
> it's actually safe.

I can do that.

>> These system calls are also very important given the kernel's use
>> of pkeys to implement execute-only support.  These help ensure
>> that userspace can never assume that it has control of a key
>> unless it first asks the kernel.
>>
>> The 'init_access_rights' argument to pkey_alloc() specifies the
>> rights that will be established for the returned pkey.  For
>> instance:
>>
>> 	pkey = pkey_alloc(flags, PKEY_DENY_WRITE);
>>
>> will allocate 'pkey', but also sets the bits in PKRU[1] such that
>> writing to 'pkey' is already denied.  This keeps userspace from
>> needing to have knowledge about manipulating PKRU with the
>> RDPKRU/WRPKRU instructions.  Userspace is still free to use these
>> instructions as it wishes, but this facility ensures it is no
>> longer required.
>>
>> The kernel does _not_ enforce that this interface must be used for
>> changes to PKRU, even for keys it does not control.
>>
>> The kernel does not prevent pkey_free() from successfully freeing
>> in-use pkeys (those still assigned to a memory range by
>> pkey_mprotect()).  It would be expensive to implement the checks
>> for this, so we instead say, "Just don't do it" since sane
>> software will never do it anyway.
> 
> Unfortunately, it could manifest as either corruption due to an area
> expected to be protected being accessible or an unexpected SEGV.
> 
> I accept the expensive arguement but it opens a new class of problems
> that userspace debuggers will need to evaluate.

Yeah.  I guess it would be good to have a debugging mechanism here at least.

>> +static inline
>> +bool mm_pkey_is_allocated(struct mm_struct *mm, unsigned long pkey)
>> +{
>> +	if (!validate_pkey(pkey))
>> +		return true;
>> +
>> +	return mm_pkey_allocation_map(mm) & (1 << pkey);
>> +}
>> +
> 
> We flip-flop between whether pkey is signed or unsigned.

Yeah, I can add some consistency here, for sure.

>> +SYSCALL_DEFINE2(pkey_alloc, unsigned long, flags, unsigned long, init_val)
>> +{
>> +	int pkey;
>> +	int ret;
>> +
>> +	/* No flags supported yet. */
>> +	if (flags)
>> +		return -EINVAL;
>> +	/* check for unsupported init values */
>> +	if (init_val & ~PKEY_ACCESS_MASK)
>> +		return -EINVAL;
>> +
>> +	down_write(&current->mm->mmap_sem);
>> +	pkey = mm_pkey_alloc(current->mm);
>> +
>> +	ret = -ENOSPC;
>> +	if (pkey == -1)
>> +		goto out;
>> +
>> +	ret = arch_set_user_pkey_access(current, pkey, init_val);
>> +	if (ret) {
>> +		mm_pkey_free(current->mm, pkey);
>> +		goto out;
>> +	}
>> +	ret = pkey;
>> +out:
>> +	up_write(&current->mm->mmap_sem);
>> +	return ret;
>> +}
> 
> It's not wrong as such but mmap_sem taken for write seems *extremely*
> heavy to protect the allocation mask. If userspace is racing a key
> allocation with mprotect, it's already game over in terms of random
> behaviour.
> 
> I've no idea what the frequency of pkey alloc/free is expected to be. If
> it's really low then maybe it doesn't matter but if it's high this is
> going to be a bottleneck later.

I think pkey_alloc() is fundamentally less frequent than mprotect().  If
you're doing a pkey_alloc() it's because you want to set it on at least
one memory area, which means at least one mprotect().  So, at _worst_,
it's 1:1.  If you've got more than one thing you're protecting, you'll
have many mprotect()s for each pkey_alloc().

The real reason I did this, though, was to avoid having _some_ other
lock.  It'll cost more storage space, have more locking rules and I need
exclusion against pkey_mprotect() which already holds mmap_sem for
write.  IOW, I think this was the simplest thing to do.


  reply	other threads:[~2016-07-07 15:38 UTC|newest]

Thread overview: 103+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-07-07 12:47 [PATCH 0/9] [REVIEW-REQUEST] [v4] System Calls for Memory Protection Keys Dave Hansen
2016-07-07 12:47 ` Dave Hansen
2016-07-07 12:47 ` [PATCH 1/9] x86, pkeys: add fault handling for PF_PK page fault bit Dave Hansen
2016-07-07 12:47   ` Dave Hansen
2016-07-07 14:40   ` Mel Gorman
2016-07-07 14:40     ` Mel Gorman
2016-07-07 15:42     ` Dave Hansen
2016-07-07 15:42       ` Dave Hansen
2016-07-07 12:47 ` [PATCH 2/9] mm: implement new pkey_mprotect() system call Dave Hansen
2016-07-07 12:47   ` Dave Hansen
2016-07-07 14:40   ` Mel Gorman
2016-07-07 14:40     ` Mel Gorman
2016-07-07 16:51     ` Dave Hansen
2016-07-07 16:51       ` Dave Hansen
2016-07-08 10:15       ` Mel Gorman
2016-07-08 10:15         ` Mel Gorman
2016-07-07 12:47 ` [PATCH 3/9] x86, pkeys: make mprotect_key() mask off additional vm_flags Dave Hansen
2016-07-07 12:47   ` Dave Hansen
2016-07-07 12:47 ` [PATCH 4/9] x86: wire up mprotect_key() system call Dave Hansen
2016-07-07 12:47   ` Dave Hansen
2016-07-07 12:47 ` [PATCH 5/9] x86, pkeys: allocation/free syscalls Dave Hansen
2016-07-07 12:47   ` Dave Hansen
2016-07-07 14:40   ` Mel Gorman
2016-07-07 14:40     ` Mel Gorman
2016-07-07 15:38     ` Dave Hansen [this message]
2016-07-07 15:38       ` Dave Hansen
2016-07-07 12:47 ` [PATCH 6/9] x86, pkeys: add pkey set/get syscalls Dave Hansen
2016-07-07 12:47   ` Dave Hansen
2016-07-07 14:45   ` Mel Gorman
2016-07-07 14:45     ` Mel Gorman
2016-07-07 17:33     ` Dave Hansen
2016-07-07 17:33       ` Dave Hansen
2016-07-08  7:18       ` Ingo Molnar
2016-07-08  7:18         ` Ingo Molnar
     [not found]         ` <20160708071810.GA27457-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2016-07-08 16:32           ` Dave Hansen
2016-07-08 16:32             ` Dave Hansen
2016-07-08 16:32             ` Dave Hansen
2016-07-09  8:37             ` Ingo Molnar
2016-07-09  8:37               ` Ingo Molnar
2016-07-09  8:37               ` Ingo Molnar
2016-07-11  4:25               ` Andy Lutomirski
2016-07-11  4:25                 ` Andy Lutomirski
     [not found]                 ` <CALCETrXJhVz6Za4=oidiM2Vfbb+XdggFBYiVyvOCcia+w064aQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-07-11  7:35                   ` Ingo Molnar
2016-07-11  7:35                     ` Ingo Molnar
2016-07-11  7:35                     ` Ingo Molnar
2016-07-11 14:28                     ` Dave Hansen
2016-07-11 14:28                       ` Dave Hansen
2016-07-12  7:13                       ` Ingo Molnar
2016-07-12  7:13                         ` Ingo Molnar
2016-07-12 15:39                         ` Dave Hansen
2016-07-12 15:39                           ` Dave Hansen
2016-07-11 14:50                     ` Andy Lutomirski
2016-07-11 14:50                       ` Andy Lutomirski
2016-07-11 14:34                   ` Dave Hansen
2016-07-11 14:34                     ` Dave Hansen
2016-07-11 14:34                     ` Dave Hansen
2016-07-11 14:45                     ` Andy Lutomirski
2016-07-11 14:45                       ` Andy Lutomirski
2016-07-11 15:48                       ` Dave Hansen
2016-07-11 15:48                         ` Dave Hansen
2016-07-12 16:32                         ` Andy Lutomirski
2016-07-12 16:32                           ` Andy Lutomirski
2016-07-12 17:12                           ` Dave Hansen
2016-07-12 17:12                             ` Dave Hansen
2016-07-12 22:55                             ` Andy Lutomirski
2016-07-12 22:55                               ` Andy Lutomirski
2016-07-13  7:56                           ` Ingo Molnar
2016-07-13  7:56                             ` Ingo Molnar
2016-07-13 18:43                             ` Andy Lutomirski
2016-07-13 18:43                               ` Andy Lutomirski
2016-07-14  8:07                               ` Ingo Molnar
2016-07-14  8:07                                 ` Ingo Molnar
2016-07-18  4:43                                 ` Andy Lutomirski
2016-07-18  4:43                                   ` Andy Lutomirski
2016-07-18  9:56                                   ` Ingo Molnar
2016-07-18  9:56                                     ` Ingo Molnar
     [not found]               ` <20160709083715.GA29939-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2016-07-18 18:02                 ` Dave Hansen
2016-07-18 18:02                   ` Dave Hansen
2016-07-18 18:02                   ` Dave Hansen
2016-07-18 20:12                 ` Dave Hansen
2016-07-18 20:12                   ` Dave Hansen
2016-07-18 20:12                   ` Dave Hansen
2016-07-08 19:26         ` Dave Hansen
2016-07-08 19:26           ` Dave Hansen
     [not found]       ` <577E924C.6010406-gkUM19QKKo4@public.gmane.org>
2016-07-08 10:22         ` Mel Gorman
2016-07-08 10:22           ` Mel Gorman
2016-07-08 10:22           ` Mel Gorman
2016-07-07 12:47 ` [PATCH 7/9] generic syscalls: wire up memory protection keys syscalls Dave Hansen
2016-07-07 12:47   ` Dave Hansen
2016-07-07 12:47 ` [PATCH 8/9] pkeys: add details of system call use to Documentation/ Dave Hansen
2016-07-07 12:47   ` Dave Hansen
2016-07-07 12:47 ` [PATCH 9/9] x86, pkeys: add self-tests Dave Hansen
2016-07-07 12:47   ` Dave Hansen
     [not found] ` <20160707124719.3F04C882-LXbPSdftPKxrdx17CPfAsdBPR1lH4CV8@public.gmane.org>
2016-07-07 14:47   ` [PATCH 0/9] [REVIEW-REQUEST] [v4] System Calls for Memory Protection Keys Mel Gorman
2016-07-07 14:47     ` Mel Gorman
2016-07-07 14:47     ` Mel Gorman
2016-07-08 18:38   ` Hugh Dickins
2016-07-08 18:38     ` Hugh Dickins
2016-07-08 18:38     ` Hugh Dickins
  -- strict thread matches above, loose matches on Subject: below --
2016-06-09  0:01 [PATCH 0/9] [v3] " Dave Hansen
2016-06-09  0:01 ` [PATCH 5/9] x86, pkeys: allocation/free syscalls Dave Hansen
2016-06-09  0:01   ` Dave Hansen
2016-06-07 20:47 [PATCH 0/9] [v2] System Calls for Memory Protection Keys Dave Hansen
2016-06-07 20:47 ` [PATCH 5/9] x86, pkeys: allocation/free syscalls Dave Hansen
2016-06-07 20:47   ` Dave Hansen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=577E7762.1010003@intel.com \
    --to=dave.hansen@intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=arnd@arndb.de \
    --cc=dave.hansen@linux.intel.com \
    --cc=hughd@google.com \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@techsingularity.net \
    --cc=torvalds@linux-foundation.org \
    --cc=viro@zeniv.linux.org.uk \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.