* Re: [PATCH v6 1/3] random: add vgetrandom_alloc() syscall
2022-11-24 12:24 ` [PATCH v6 1/3] random: add vgetrandom_alloc() syscall Jason A. Donenfeld
@ 2022-11-24 12:48 ` Jason A. Donenfeld
2022-11-24 13:18 ` Arnd Bergmann
2022-11-24 12:49 ` Christian Brauner
2022-11-24 16:30 ` Jason A. Donenfeld
2 siblings, 1 reply; 9+ messages in thread
From: Jason A. Donenfeld @ 2022-11-24 12:48 UTC
To: Florian Weimer
Cc: linux-kernel, patches, tglx, linux-crypto, x86,
Greg Kroah-Hartman, Adhemerval Zanella Netto, Carlos O'Donell,
linux-api
Hey again,
On Thu, Nov 24, 2022 at 01:24:42PM +0100, Jason A. Donenfeld wrote:
> Hi Florian,
>
> On Thu, Nov 24, 2022 at 01:15:24PM +0100, Florian Weimer wrote:
> > * Jason A. Donenfeld:
> >
> > > Hi Florian,
> > >
> > > On Thu, Nov 24, 2022 at 06:25:39AM +0100, Florian Weimer wrote:
> > >> * Jason A. Donenfeld:
> > >>
> > >> > Hi Florian,
> > >> >
> > >> > On Wed, Nov 23, 2022 at 11:46:58AM +0100, Florian Weimer wrote:
> > >> >> * Jason A. Donenfeld:
> > >> >>
> > >> >> > + * The vgetrandom() function in userspace requires an opaque state, which this
> > >> >> > + * function provides to userspace, by mapping a certain number of special pages
> > >> >> > + * into the calling process. It takes a hint as to the number of opaque states
> > >> >> > + * desired, and returns the number of opaque states actually allocated, the
> > >> >> > + * size of each one in bytes, and the address of the first state.
> > >> >> > + */
> > >> >> > +SYSCALL_DEFINE3(vgetrandom_alloc, unsigned long __user *, num,
> > >> >> > + unsigned long __user *, size_per_each, unsigned int, flags)
> > >> >>
> > >> >> I think you should make this __u64, so that you get a consistent
> > >> >> userspace interface on all architectures, without the need for compat
> > >> >> system calls.
> > >> >
> > >> > That would be quite unconventional. Most syscalls that take lengths do
> > >> > so with the native register size (`unsigned long`, `size_t`), rather
> > >> > than u64. If you can point to a recent trend away from this by
> > >> > indicating some commits that added new syscalls with u64, I'd be happy
> > >> > to be shown otherwise. But AFAIK, that's not the way it's done.
> > >>
> > >> See clone3 and struct clone_args.
> > >
> > > The struct is one thing. But actually, clone3 takes a `size_t`:
> > >
> > > SYSCALL_DEFINE2(clone3, struct clone_args __user *, uargs, size_t, size)
> > >
> > > I take from this that I too should use `size_t` rather than
> > > `unsigned long`. And it doesn't seem like there's any compat clone3.
> >
> > But vgetrandom_alloc does not use unsigned long, but unsigned long *.
> > You need to look at the contents for struct clone_args for comparison.
>
> Ah! I see what you mean; that's a good point. The usual register
> clearing thing isn't going to happen because these are addresses.
>
> I still am somewhat hesitant, though, because `size_t` is really the
> "proper" type to be used. Maybe the compat syscall thing is just a
> necessary evil?
>
> The other direction would be making this a u32, since 640k ought to be
> enough for anybody and such, but maybe that'd be a mistake too.
>
> So I'm not sure. Anybody else on the list with experience adding
> syscalls have an opinion?
Looks like set_mempolicy, get_mempolicy, and migrate_pages pass an
unsigned long pointer, and I don't see any compat handling around them:
SYSCALL_DEFINE3(set_mempolicy, int, mode, const unsigned long __user *, nmask,
unsigned long, maxnode)
SYSCALL_DEFINE5(get_mempolicy, int __user *, policy,
unsigned long __user *, nmask, unsigned long, maxnode,
unsigned long, addr, unsigned long, flags)
SYSCALL_DEFINE4(migrate_pages, pid_t, pid, unsigned long, maxnode,
const unsigned long __user *, old_nodes,
const unsigned long __user *, new_nodes)
In contrast, sched_setaffinity and get_robust_list take an unsigned long
pointer and a size_t pointer, respectively, and do have compat wrappers:
SYSCALL_DEFINE3(sched_setaffinity, pid_t, pid, unsigned int, len,
unsigned long __user *, user_mask_ptr)
SYSCALL_DEFINE3(get_robust_list, int, pid,
struct robust_list_head __user * __user *, head_ptr,
size_t __user *, len_ptr)
Jason
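To make the compat concern concrete, here is a minimal userspace simulation (not kernel code): a 64-bit kernel that stores a native `unsigned long` through a pointer supplied by a 32-bit process writes 8 bytes where the caller reserved only 4, while a compat-style store writes just 32 bits. The functions `native_put_ulong()` and `compat_put_ulong()` are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Illustrative sketch, not kernel code: simulate a 64-bit kernel storing a
 * result through a user pointer that a 32-bit (ILP32) process believes
 * points at a 4-byte unsigned long. Both helpers are hypothetical.
 */

/* Native path: the 64-bit kernel stores its own unsigned long (8 bytes). */
static void native_put_ulong(void *uptr, uint64_t val)
{
	memcpy(uptr, &val, sizeof(val));
}

/* Compat-style path: store only 32 bits, as a compat_ulong_t would be. */
static int compat_put_ulong(void *uptr, uint64_t val)
{
	uint32_t narrow = (uint32_t)val;

	if (narrow != val)
		return -1;	/* value does not fit in the 32-bit ABI */
	memcpy(uptr, &narrow, sizeof(narrow));
	return 0;
}
```

The native store clobbers whatever the 32-bit caller keeps next to the 4-byte slot; avoiding that is exactly the job of a compat entry point (or an `in_compat_syscall()` branch), or of a type whose width is the same on every ABI.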
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v6 1/3] random: add vgetrandom_alloc() syscall
2022-11-24 12:48 ` Jason A. Donenfeld
@ 2022-11-24 13:18 ` Arnd Bergmann
0 siblings, 0 replies; 9+ messages in thread
From: Arnd Bergmann @ 2022-11-24 13:18 UTC
To: Jason A . Donenfeld, Florian Weimer
Cc: linux-kernel, patches, Thomas Gleixner, linux-crypto, x86,
Greg Kroah-Hartman, Adhemerval Zanella Netto, Carlos O'Donell,
linux-api
On Thu, Nov 24, 2022, at 13:48, Jason A. Donenfeld wrote:
> On Thu, Nov 24, 2022 at 01:24:42PM +0100, Jason A. Donenfeld wrote:
> Looks like set_mempolicy, get_mempolicy, and migrate_pages pass an
> unsigned long pointer, and I don't see any compat handling around them:
>
> SYSCALL_DEFINE3(set_mempolicy, int, mode, const unsigned long
> __user *, nmask,
> unsigned long, maxnode)
>
> SYSCALL_DEFINE5(get_mempolicy, int __user *, policy,
> unsigned long __user *, nmask, unsigned long, maxnode,
> unsigned long, addr, unsigned long, flags)
>
> SYSCALL_DEFINE4(migrate_pages, pid_t, pid, unsigned long, maxnode,
> const unsigned long __user *, old_nodes,
> const unsigned long __user *, new_nodes)
Compat handling for these is done all the way down in the
pointer access:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/mm/mempolicy.c#n1368
This works here because it's a special bitmap, but it is not the
best approach if you only have a pointer to a single value.
Arnd
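Arnd's point can be sketched in a few lines: a bitmap's bit numbering does not depend on word size, so a 32-bit task's array of 32-bit words can be repacked pairwise into the 64-bit words a 64-bit kernel uses. The kernel's `compat_get_bitmap()` helper works along these lines; the code below is a simplified userspace model with hypothetical names, not that helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Simplified model of compat bitmap handling (hypothetical names).
 * In a 32-bit task's bitmap, word i always holds bits [32*i, 32*i + 31],
 * so pairing adjacent words rebuilds the 64-bit words the kernel uses.
 */
static void bitmap_from_u32_words(uint64_t *dst, const uint32_t *src,
				  size_t nwords32)
{
	size_t i;

	for (i = 0; i + 1 < nwords32; i += 2)
		*dst++ = ((uint64_t)src[i + 1] << 32) | src[i];
	if (i < nwords32)	/* odd trailing word: upper half stays zero */
		*dst = src[i];
}

/* Test a single bit in a bitmap made of 64-bit words. */
static int bit_is_set(const uint64_t *bitmap, size_t bit)
{
	return (int)((bitmap[bit / 64] >> (bit % 64)) & 1);
}
```

A pointer to a single `unsigned long` has no such width-independent layout, so each access would need its own compat branch, which is why the bitmap approach does not carry over to vgetrandom_alloc()'s scalar outputs.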
* Re: [PATCH v6 1/3] random: add vgetrandom_alloc() syscall
2022-11-24 12:24 ` [PATCH v6 1/3] random: add vgetrandom_alloc() syscall Jason A. Donenfeld
2022-11-24 12:48 ` Jason A. Donenfeld
@ 2022-11-24 12:49 ` Christian Brauner
2022-11-24 12:57 ` Jason A. Donenfeld
2022-11-24 16:30 ` Jason A. Donenfeld
2 siblings, 1 reply; 9+ messages in thread
From: Christian Brauner @ 2022-11-24 12:49 UTC
To: Jason A. Donenfeld
Cc: Florian Weimer, linux-kernel, patches, tglx, linux-crypto, x86,
Greg Kroah-Hartman, Adhemerval Zanella Netto, Carlos O'Donell,
linux-api, Arnd Bergmann
On Thu, Nov 24, 2022 at 01:24:42PM +0100, Jason A. Donenfeld wrote:
> Hi Florian,
>
> On Thu, Nov 24, 2022 at 01:15:24PM +0100, Florian Weimer wrote:
> > * Jason A. Donenfeld:
> >
> > > Hi Florian,
> > >
> > > On Thu, Nov 24, 2022 at 06:25:39AM +0100, Florian Weimer wrote:
> > >> * Jason A. Donenfeld:
> > >>
> > >> > Hi Florian,
> > >> >
> > >> > On Wed, Nov 23, 2022 at 11:46:58AM +0100, Florian Weimer wrote:
> > >> >> * Jason A. Donenfeld:
> > >> >>
> > >> >> > + * The vgetrandom() function in userspace requires an opaque state, which this
> > >> >> > + * function provides to userspace, by mapping a certain number of special pages
> > >> >> > + * into the calling process. It takes a hint as to the number of opaque states
> > >> >> > + * desired, and returns the number of opaque states actually allocated, the
> > >> >> > + * size of each one in bytes, and the address of the first state.
> > >> >> > + */
> > >> >> > +SYSCALL_DEFINE3(vgetrandom_alloc, unsigned long __user *, num,
> > >> >> > + unsigned long __user *, size_per_each, unsigned int, flags)
> > >> >>
> > >> >> I think you should make this __u64, so that you get a consistent
> > >> >> userspace interface on all architectures, without the need for compat
> > >> >> system calls.
> > >> >
> > >> > That would be quite unconventional. Most syscalls that take lengths do
> > >> > so with the native register size (`unsigned long`, `size_t`), rather
> > >> > than u64. If you can point to a recent trend away from this by
> > >> > indicating some commits that added new syscalls with u64, I'd be happy
> > >> > to be shown otherwise. But AFAIK, that's not the way it's done.
> > >>
> > >> See clone3 and struct clone_args.
For system calls that take structs as arguments we use u64 in the struct
for proper alignment so we can extend structs without regressing old
kernels. We have a few of those extensible struct system calls.
But we don't really have a lot of system calls that pass a u64 outside
of a structure so far, neither as a register argument nor as a pointer,
iirc. Passing one as a register argument is problematic because of
32-bit arches. Passing it via a pointer should be fine, but it is indeed
uncommon.
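The alignment point can be illustrated with a hypothetical argument struct. On i386, a plain 64-bit integer inside a struct is only 4-byte aligned, which is why uapi headers use `__aligned_u64`; the sketch defines a local stand-in so it compiles outside the kernel, and the static assertions check that the layout comes out the same whether it is built as 32-bit or 64-bit. `struct example_args` is invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for the kernel's __aligned_u64 from <linux/types.h>. */
typedef uint64_t aligned_u64 __attribute__((aligned(8)));

/*
 * Hypothetical argument struct. Because the 64-bit fields are forced to
 * 8-byte alignment, the compiler pads the struct to 24 bytes on every ABI.
 * With a plain uint64_t, i386 would lay it out in 20 bytes instead.
 */
struct example_args {
	aligned_u64 num;
	aligned_u64 size_per_each;
	uint32_t flags;
};

_Static_assert(sizeof(struct example_args) == 24,
	       "struct size must not depend on the ABI");
_Static_assert(offsetof(struct example_args, size_per_each) == 8,
	       "field offset must not depend on the ABI");
_Static_assert(offsetof(struct example_args, flags) == 16,
	       "field offset must not depend on the ABI");
```

Using only 64-bit fields, as `struct clone_args` does, sidesteps the padding question entirely.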
> > >
> > > The struct is one thing. But actually, clone3 takes a `size_t`:
> > >
> > > SYSCALL_DEFINE2(clone3, struct clone_args __user *, uargs, size_t, size)
> > >
> > > I take from this that I too should use `size_t` rather than
> > > `unsigned long`. And it doesn't seem like there's any compat clone3.
> >
> > But vgetrandom_alloc does not use unsigned long, but unsigned long *.
> > You need to look at the contents for struct clone_args for comparison.
>
> Ah! I see what you mean; that's a good point. The usual register
> clearing thing isn't going to happen because these are addresses.
>
> I still am somewhat hesitant, though, because `size_t` is really the
> "proper" type to be used. Maybe the compat syscall thing is just a
> necessary evil?
We usually try to avoid adding new compat-requiring syscalls like the
plague. (At least for new syscalls that don't need to inherit behavior
from an earlier syscall they are a revision of.)
>
> The other direction would be making this a u32, since 640k ought to be
> enough for anybody and such, but maybe that'd be a mistake too.
I think making this a size_t is fine. We haven't traditionally used u32
for sizes. All syscalls that pass structs versioned by size use size_t.
So I would recommend to stick with that.
Alternatively, you could also introduce a simple struct versioned by
size for this system call, similar to mount_setattr() and clone3() and so
on. This way you don't need to worry about future extensibility. Just a
thought.
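The size-versioning scheme can be modeled in userspace. Extensible-struct syscalls such as clone3() pass the struct size alongside the pointer, and the kernel's `copy_struct_from_user()` zero-fills fields an older caller does not know about, while rejecting a larger struct with `-E2BIG` unless its unknown tail is all zeroes. `copy_struct_model()` below is a hypothetical stand-in that applies those rules to plain memory.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Userspace model (hypothetical) of the size-versioning rule:
 *  - caller's struct smaller than ours: copy it and zero-fill the rest;
 *  - caller's struct larger: accept only if the unknown tail is all zero.
 */
static int copy_struct_model(void *dst, size_t ksize,
			     const void *src, size_t usize)
{
	size_t n = usize < ksize ? usize : ksize;

	if (usize > ksize) {
		const unsigned char *tail = (const unsigned char *)src + ksize;
		size_t i;

		for (i = 0; i < usize - ksize; i++)
			if (tail[i])
				return -E2BIG;	/* caller uses a field we lack */
	}
	memset(dst, 0, ksize);
	memcpy(dst, src, n);
	return 0;
}
```

Old binaries keep working on new kernels (missing fields read as zero), and new binaries fail cleanly on old kernels only when they actually rely on a newer field.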
* Re: [PATCH v6 1/3] random: add vgetrandom_alloc() syscall
2022-11-24 12:49 ` Christian Brauner
@ 2022-11-24 12:57 ` Jason A. Donenfeld
0 siblings, 0 replies; 9+ messages in thread
From: Jason A. Donenfeld @ 2022-11-24 12:57 UTC
To: Christian Brauner
Cc: Florian Weimer, linux-kernel, patches, tglx, linux-crypto, x86,
Greg Kroah-Hartman, Adhemerval Zanella Netto, Carlos O'Donell,
linux-api, Arnd Bergmann
Hi Christian,
Thanks a bunch for chiming in.
On Thu, Nov 24, 2022 at 01:49:27PM +0100, Christian Brauner wrote:
> Alternatively, you could also introduce a simple struct versioned by
> size for this system call, similar to mount_setattr() and clone3() and so
> on. This way you don't need to worry about future extensibility. Just a
> thought.
I briefly considered that, but it seemed a bit heavy for something like
this. I'm not strongly opposed, though.
> > > >> >> > +SYSCALL_DEFINE3(vgetrandom_alloc, unsigned long __user *, num,
> > > >> >> > + unsigned long __user *, size_per_each, unsigned int, flags)
> > > >> >>
> > > >> >> I think you should make this __u64, so that you get a consistent
> > > >> >> userspace interface on all architectures, without the need for compat
> > > >> >> system calls.
> > > >> >
> > > >> > That would be quite unconventional. Most syscalls that take lengths do
> > > >> > so with the native register size (`unsigned long`, `size_t`), rather
> > > >> > than u64. If you can point to a recent trend away from this by
> > > >> > indicating some commits that added new syscalls with u64, I'd be happy
> > > >> > to be shown otherwise. But AFAIK, that's not the way it's done.
> > > >>
> > > >> See clone3 and struct clone_args.
>
> For system calls that take structs as arguments we use u64 in the struct
> for proper alignment so we can extend structs without regressing old
> kernels. We have a few of those extensible struct system calls.
>
> But we don't really have a lot of system calls that pass a u64 outside
> of a structure so far, neither as a register argument nor as a pointer,
> iirc.
Right, the __aligned_u64 business seemed to be mostly about
extensibility.
> > > > The struct is one thing. But actually, clone3 takes a `size_t`:
> > > >
> > > > SYSCALL_DEFINE2(clone3, struct clone_args __user *, uargs, size_t, size)
> > > >
> > > > I take from this that I too should use `size_t` rather than
> > > > `unsigned long`. And it doesn't seem like there's any compat clone3.
> > >
> > > But vgetrandom_alloc does not use unsigned long, but unsigned long *.
> > > You need to look at the contents for struct clone_args for comparison.
> >
> > Ah! I see what you mean; that's a good point. The usual register
> > clearing thing isn't going to happen because these are addresses.
> >
> > I still am somewhat hesitant, though, because `size_t` is really the
> > "proper" type to be used. Maybe the compat syscall thing is just a
> > necessary evil?
>
> I think making this a size_t is fine. We haven't traditionally used u32
> for sizes. All syscalls that pass structs versioned by size use size_t.
> So I would recommend to stick with that.
This isn't quite a struct versioned by size. This is:
void *vgetrandom_alloc(size_t *num /* inout */, size_t *size_per_each /* out */,
                       unsigned int flags);
You give it an input 'num' and some flags (currently flags=0), and it
gives you back an output 'num', an output 'size_per_each', and the
address of the opaque state mapping as its return value.
I do like the idea of keeping size_t so that the type is "right". But
the other arguments are compelling too, so I'm not sure.
Jason
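The in/out convention described above can be exercised with a userspace mock. Everything here is an assumption for illustration: `mock_vgetrandom_alloc()` is not the proposed syscall, the 256-byte state size is made up, and memory comes from `calloc()` rather than special kernel-mapped pages. The mock treats `*num` as a hint, rounds it up so the allocation fills whole pages, and reports the actual count and per-state size back through the pointers.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical per-state size, chosen only for this illustration. */
#define MOCK_STATE_SIZE 256

/*
 * Userspace mock of the calling convention (NOT the real syscall):
 * *num is a hint on input and the actual count on output, *size_per_each
 * is output only, and the return value is the base address of the states.
 */
static void *mock_vgetrandom_alloc(size_t *num, size_t *size_per_each,
				   unsigned int flags)
{
	long page = sysconf(_SC_PAGESIZE);
	size_t per_page, pages;
	void *base;

	if (flags != 0 || page < MOCK_STATE_SIZE) {
		errno = EINVAL;
		return NULL;
	}
	per_page = (size_t)page / MOCK_STATE_SIZE;
	if (*num > SIZE_MAX - per_page) {	/* avoid overflow when rounding */
		errno = EINVAL;
		return NULL;
	}
	pages = (*num + per_page - 1) / per_page;	/* round hint up */
	if (pages == 0)
		pages = 1;
	base = calloc(pages, (size_t)page);
	if (!base)
		return NULL;	/* calloc has set errno */
	*num = pages * per_page;
	*size_per_each = MOCK_STATE_SIZE;
	return base;
}
```

A caller would then address state `i` at `(char *)states + i * size_per_each`, using whatever count came back rather than the hint it passed in.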
* Re: [PATCH v6 1/3] random: add vgetrandom_alloc() syscall
2022-11-24 12:24 ` [PATCH v6 1/3] random: add vgetrandom_alloc() syscall Jason A. Donenfeld
2022-11-24 12:48 ` Jason A. Donenfeld
2022-11-24 12:49 ` Christian Brauner
@ 2022-11-24 16:30 ` Jason A. Donenfeld
2 siblings, 0 replies; 9+ messages in thread
From: Jason A. Donenfeld @ 2022-11-24 16:30 UTC
To: Florian Weimer
Cc: linux-kernel, patches, tglx, linux-crypto, x86,
Greg Kroah-Hartman, Adhemerval Zanella Netto, Carlos O'Donell,
linux-api
On Thu, Nov 24, 2022 at 01:24:42PM +0100, Jason A. Donenfeld wrote:
> Hi Florian,
>
> On Thu, Nov 24, 2022 at 01:15:24PM +0100, Florian Weimer wrote:
> > * Jason A. Donenfeld:
> >
> > > Hi Florian,
> > >
> > > On Thu, Nov 24, 2022 at 06:25:39AM +0100, Florian Weimer wrote:
> > >> * Jason A. Donenfeld:
> > >>
> > >> > Hi Florian,
> > >> >
> > >> > On Wed, Nov 23, 2022 at 11:46:58AM +0100, Florian Weimer wrote:
> > >> >> * Jason A. Donenfeld:
> > >> >>
> > >> >> > + * The vgetrandom() function in userspace requires an opaque state, which this
> > >> >> > + * function provides to userspace, by mapping a certain number of special pages
> > >> >> > + * into the calling process. It takes a hint as to the number of opaque states
> > >> >> > + * desired, and returns the number of opaque states actually allocated, the
> > >> >> > + * size of each one in bytes, and the address of the first state.
> > >> >> > + */
> > >> >> > +SYSCALL_DEFINE3(vgetrandom_alloc, unsigned long __user *, num,
> > >> >> > + unsigned long __user *, size_per_each, unsigned int, flags)
> > >> >>
> > >> >> I think you should make this __u64, so that you get a consistent
> > >> >> userspace interface on all architectures, without the need for compat
> > >> >> system calls.
> > >> >
> > >> > That would be quite unconventional. Most syscalls that take lengths do
> > >> > so with the native register size (`unsigned long`, `size_t`), rather
> > >> > than u64. If you can point to a recent trend away from this by
> > >> > indicating some commits that added new syscalls with u64, I'd be happy
> > >> > to be shown otherwise. But AFAIK, that's not the way it's done.
> > >>
> > >> See clone3 and struct clone_args.
> > >
> > > The struct is one thing. But actually, clone3 takes a `size_t`:
> > >
> > > SYSCALL_DEFINE2(clone3, struct clone_args __user *, uargs, size_t, size)
> > >
> > > I take from this that I too should use `size_t` rather than
> > > `unsigned long`. And it doesn't seem like there's any compat clone3.
> >
> > But vgetrandom_alloc does not use unsigned long, but unsigned long *.
> > You need to look at the contents for struct clone_args for comparison.
>
> The other direction would be making this a u32
I think `unsigned int` is actually a sensible size for what these values
should be. That eliminates both the compat problem and the potential
bikeshed. So I'll go with that for v+1.
Jason
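Switching to `unsigned int` works because it is 32 bits wide on both the ILP32 and LP64 Linux ABIs, so a 64-bit kernel and a 32-bit caller agree on the size of `*num` without a compat wrapper. The remaining cost is a range check when a count computed internally in a wider type is stored back. `store_count()` and the `-EOVERFLOW` choice are hypothetical, shown only to illustrate that check.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>

/* Assumption: a Linux ABI where unsigned int is 32 bits on every arch. */
_Static_assert(sizeof(unsigned int) == 4, "assumes a 32-bit unsigned int");

/*
 * Hypothetical helper: store an internally computed count through an
 * unsigned int pointer. The store has the same width for 32-bit and
 * 64-bit callers; only values above UINT_MAX need to be rejected.
 */
static int store_count(unsigned int *uptr, size_t count)
{
	if (count > UINT_MAX)
		return -EOVERFLOW;	/* hypothetical error choice */
	*uptr = (unsigned int)count;
	return 0;
}
```

Unlike the `unsigned long` case, the 4-byte store never touches memory beyond what the caller reserved, whatever the caller's ABI.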