Linux userland API discussions


* Re: [PATCH 4/7] Teach SELinux about a new userfaultfd class
From: Daniel Colascione @ 2019-10-13  0:11 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linux API, LKML, Lokesh Gidra, Nick Kralevich, Nosh Minwalla,
	Tim Murray
In-Reply-To: <CALCETrVmYQ9xikif--RSAWhboY1yj=piEAEuPzisf+b+qEX4uA@mail.gmail.com>

On Sat, Oct 12, 2019 at 4:09 PM Andy Lutomirski <luto@kernel.org> wrote:
>
> On Sat, Oct 12, 2019 at 12:16 PM Daniel Colascione <dancol@google.com> wrote:
> >
> > Use the secure anonymous inode LSM hook we just added to let SELinux
> > policy place restrictions on userfaultfd use. The create operation
> > applies to processes creating new instances of these file objects;
> > transfer between processes is covered by restrictions on read, write,
> > and ioctl access already checked inside selinux_file_receive.
>
> This is great, and I suspect we'll want it for things like SGX, too.
> But the current design seems like it will make it essentially
> impossible for SELinux to reference an anon_inode class whose
> file_operations are in a module, and moving file_operations out of a
> module would be nasty.
>
> Could this instead be keyed off a new struct anon_inode_class, an
> enum, or even just a string?

The new LSM hook already receives the string that callers pass to the
anon_inode APIs; modules can look at that instead of the fops if they
want. The reason to pass both the name and the fops through the hook
is to allow LSMs to match using fops comparison (which seems less
prone to breakage) when possible and rely on string matching when it
isn't.
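
A rough userspace sketch of that matching order, for illustration only.
The struct, the enum, and classify_anon_inode() below are hypothetical
stand-ins and not code from the patch; the only things taken from this
thread are the "[userfaultfd]" name and the idea of checking fops first.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for the kernel's struct file_operations; only its address matters. */
struct file_operations { int unused; };

/* Hypothetical fops table that a built-in LSM could reference directly. */
static const struct file_operations userfaultfd_fops = { 0 };

enum anon_class { ANON_CLASS_UNKNOWN, ANON_CLASS_USERFAULTFD };

/*
 * Classify an anonymous inode from the two values the hook receives.
 * Pointer identity on fops is tried first; the name string is the
 * fallback for fops the LSM cannot reference (for example, in a module).
 */
static enum anon_class classify_anon_inode(const char *name,
                                           const struct file_operations *fops)
{
        assert(name != NULL);
        if (fops == &userfaultfd_fops)
                return ANON_CLASS_USERFAULTFD;
        if (strcmp(name, "[userfaultfd]") == 0)
                return ANON_CLASS_USERFAULTFD;
        return ANON_CLASS_UNKNOWN;
}
```

The fops check survives a rename of the string, while the string check
is the only option when the fops symbol is not visible to the LSM.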

* Re: [PATCH 1/7 v2] tracefs: Revert ccbd54ff54e8 ("tracefs: Restrict tracefs when the kernel is locked down")
From: Steven Rostedt @ 2019-10-13  0:35 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Linux Kernel Mailing List, Ingo Molnar, Andrew Morton,
	Matthew Garrett, James Morris, LSM List, Linux API,
	Ben Hutchings, Al Viro
In-Reply-To: <CAHk-=whE7GjKz9LtEVNw=zEgWr65N1mU7t2rA4MLiia8Zit6DQ@mail.gmail.com>

On Sat, 12 Oct 2019 15:56:15 -0700
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Fri, Oct 11, 2019 at 5:59 PM Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> >
> > I bisected this down to the addition of the proxy_ops into tracefs for
> > lockdown. It appears that the allocation of the proxy_ops and then freeing
> > it in the destroy_inode callback, is causing havoc with the memory system.
> > Reading the documentation about destroy_inode and talking with Linus about
> > this, this is buggy and wrong.  
> 
> Can you still add the explanation about the inode memory leak to this message?
> 
> Right now it just says "it's buggy and wrong". True. But doesn't
> explain _why_ it is buggy and wrong.
> 

Sure. The patches just finished my testing (along with other fixes that
I need to send you). I have to make a few other updates in the change
log though, so I'll be rebasing them (but not touching the code), to
clean up the change logs.

-- Steve

* Re: [PATCH 1/7 v2] tracefs: Revert ccbd54ff54e8 ("tracefs: Restrict tracefs when the kernel is locked down")
From: Steven Rostedt @ 2019-10-13  0:39 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Linux Kernel Mailing List, Ingo Molnar, Andrew Morton,
	Matthew Garrett, James Morris, LSM List, Linux API,
	Ben Hutchings, Al Viro
In-Reply-To: <20191012203502.065258d2@gandalf.local.home>

On Sat, 12 Oct 2019 20:35:02 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> On Sat, 12 Oct 2019 15:56:15 -0700
> Linus Torvalds <torvalds@linux-foundation.org> wrote:
> 
> > On Fri, Oct 11, 2019 at 5:59 PM Steven Rostedt <rostedt@goodmis.org> wrote:  
> > >
> > >
> > > I bisected this down to the addition of the proxy_ops into tracefs for
> > > lockdown. It appears that the allocation of the proxy_ops and then freeing
> > > it in the destroy_inode callback, is causing havoc with the memory system.
> > > Reading the documentation about destroy_inode and talking with Linus about
> > > this, this is buggy and wrong.    
> > 
> > Can you still add the explanation about the inode memory leak to this message?
> > 
> > Right now it just says "it's buggy and wrong". True. But doesn't
> > explain _why_ it is buggy and wrong.
> >   
> 
> Sure. The patches just finished my testing (along with other fixes that
> I need to send you). I have to make a few other updates in the change
> log though, so I'll be rebasing them (but not touching the code), to
> clean up the change logs.
> 

I updated this change log to state:

"I bisected this down to the addition of the proxy_ops into tracefs for
lockdown. It appears that the allocation of the proxy_ops and then freeing
it in the destroy_inode callback, is causing havoc with the memory system.
Reading the documentation about destroy_inode and talking with Linus about
this, this is buggy and wrong. When a filesystem defines the
destroy_inode() method, that method is expected to free the inode itself
as well, and not just the extra allocations done in the creation of the
inode. The faulty commit causes a memory leak of the inode data structure
whenever an inode is deleted."
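
A toy userspace model of the rule described in that change log, offered
only as an illustration. All names here (the toy struct inode,
alloc_inode(), vfs_destroy(), the live_inodes counter) are made up for
the sketch and are not the tracefs or VFS code.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy inode: the object itself plus one extra allocation hanging off it. */
struct inode { void *private_data; };

/* Number of inode objects currently allocated and not yet freed. */
static int live_inodes;

static struct inode *alloc_inode(int with_extra)
{
        struct inode *inode = calloc(1, sizeof(*inode));

        if (!inode)
                return NULL;
        if (with_extra)
                inode->private_data = malloc(16);
        live_inodes++;
        return inode;
}

/* Buggy callback: releases only the extra allocation, so the inode leaks. */
static void destroy_inode_buggy(struct inode *inode)
{
        free(inode->private_data);
        inode->private_data = NULL;
}

/* Correct callback: releases the extra allocation and the inode itself. */
static void destroy_inode_fixed(struct inode *inode)
{
        free(inode->private_data);
        free(inode);
        live_inodes--;
}

/*
 * Models the contract: when a destroy_inode callback is supplied, the
 * generic code hands over ownership and does not free the inode itself.
 */
static void vfs_destroy(struct inode *inode,
                        void (*destroy_inode)(struct inode *))
{
        assert(inode != NULL);
        if (destroy_inode) {
                destroy_inode(inode);
                return;
        }
        free(inode);
        live_inodes--;
}
```

With the buggy callback the live_inodes counter never drops, which is
the leak in miniature.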

-- Steve

* Re: [PATCH 4/7] Teach SELinux about a new userfaultfd class
From: Andy Lutomirski @ 2019-10-13  0:46 UTC (permalink / raw)
  To: Daniel Colascione
  Cc: Andy Lutomirski, Linux API, LKML, Lokesh Gidra, Nick Kralevich,
	Nosh Minwalla, Tim Murray
In-Reply-To: <CAKOZuevQD-xsy_PrvT7F3Pqaoo5apZFukj2ZKLLQKup1cwgZ-A@mail.gmail.com>

On Sat, Oct 12, 2019 at 5:12 PM Daniel Colascione <dancol@google.com> wrote:
>
> On Sat, Oct 12, 2019 at 4:09 PM Andy Lutomirski <luto@kernel.org> wrote:
> >
> > On Sat, Oct 12, 2019 at 12:16 PM Daniel Colascione <dancol@google.com> wrote:
> > >
> > > Use the secure anonymous inode LSM hook we just added to let SELinux
> > > policy place restrictions on userfaultfd use. The create operation
> > > applies to processes creating new instances of these file objects;
> > > transfer between processes is covered by restrictions on read, write,
> > > and ioctl access already checked inside selinux_file_receive.
> >
> > This is great, and I suspect we'll want it for things like SGX, too.
> > But the current design seems like it will make it essentially
> > impossible for SELinux to reference an anon_inode class whose
> > file_operations are in a module, and moving file_operations out of a
> > module would be nasty.
> >
> > Could this instead be keyed off a new struct anon_inode_class, an
> > enum, or even just a string?
>
> The new LSM hook already receives the string that callers pass to the
> anon_inode APIs; modules can look at that instead of the fops if they
> want. The reason to pass both the name and the fops through the hook
> is to allow LSMs to match using fops comparison (which seems less
> prone to breakage) when possible and rely on string matching when it
> isn't.

I suppose that whoever makes the first module that wants to use this
mechanism can have the fun task of reworking it.  There's nothing
user-visible here that would make it hard to change in the future.

* Re: [PATCH 3/7] Add a UFFD_SECURE flag to the userfaultfd API.
From: Daniel Colascione @ 2019-10-13  0:51 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linux API, LKML, Lokesh Gidra, Nick Kralevich, Nosh Minwalla,
	Tim Murray
In-Reply-To: <CALCETrVZHd+csdRL-uKbVN3Z7yeNNtxiDy-UsutMi=K3ZgCiYw@mail.gmail.com>

On Sat, Oct 12, 2019 at 4:10 PM Andy Lutomirski <luto@kernel.org> wrote:
>
> On Sat, Oct 12, 2019 at 12:16 PM Daniel Colascione <dancol@google.com> wrote:
> >
> > The new secure flag makes userfaultfd use a new "secure" anonymous
> > file object instead of the default one, letting security modules
> > supervise userfaultfd use.
> >
> > Requiring that users pass a new flag lets us avoid changing the
> > semantics for existing callers.
>
> Is there any good reason not to make this be the default?
>
>
> The only downside I can see is that it would increase the memory usage
> of userfaultfd(), but that doesn't seem like such a big deal.  A
> lighter-weight alternative would be to have a single inode shared by
> all userfaultfd instances, which would require a somewhat different
> internal anon_inode API.

I'd also prefer to just make SELinux use mandatory, but there's a
nasty interaction with UFFD_EVENT_FORK. Adding a new UFFD_SECURE mode
which blocks UFFD_EVENT_FORK sidesteps this problem. Maybe you know a
better way to deal with it.

Right now, when a process with a UFFD-managed VMA using
UFFD_EVENT_FORK forks, we make a new userfaultfd_ctx out of thin air
and enqueue it on the message queue for the parent process. When we
dequeue that context, we get to resolve_userfault_fork, which makes up
a new UFFD file object out of thin air in the context of the reading
process. Following normal SELinux rules, the SID attached to that new
file object would be the task SID of the process *reading* the fork
event, not the SID of the new fork child. That seems wrong, because
the label we give to the UFFD should correspond to the label of the
process that UFFD controls.

To try to solve this problem, we can move the file object creation to
the fork child and enqueue the file object itself instead of just the
userfaultfd_ctx, treating the dequeue as a file-descriptor-receive
operation just like a recvmsg of an AF_UNIX socket with SCM_RIGHTS.
(This approach seems more elegant anyway, since it reflects what's
actually going on.) The trouble with the early-file-object-creation
approach is that the fork child may not be allowed to create UFFD file
objects on its own and an LSM can't tell the difference between
UFFD_EVENT_FORK handling creating the file object and the fork child
just calling userfaultfd(), meaning an LSM could veto the creation of
the file object for the fork event. We can't just create a
non-ANON_INODE_SECURE file object instead: that would defeat the whole
purpose of supervising UFFD using SELinux.

But maybe we can go further: let's separate authentication and
authorization, as we do in other LSM hooks. Let's split my
inode_init_security_anon into two hooks, inode_init_security_anon and
inode_create_anon. We'd define the former to just initialize the file
object's security information --- in the SELinux case, figuring out
its class and SID --- and define the latter to answer the yes/no
question of whether a particular anonymous inode creation should be
allowed. Normally, anon_inode_getfile2() would just call both hooks.
We'd add another anon_inode_getfd flag, ANON_INODE_SKIP_AUTHORIZATION
or something, that would tell anon_inode_getfile2() to skip calling
the authorization hook, effectively making the creation always
succeed. We can then make the UFFD code pass
ANON_INODE_SKIP_AUTHORIZATION when it's creating a file object in the
fork child while creating UFFD_EVENT_FORK messages.
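
To make the split concrete, here is a small userspace mock. The hook
and flag names come from the paragraph above; the signatures, the flag
value, the toy struct anon_inode, and the policy_allows_create switch
are assumptions made up for this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical flag value; only the name appears in the proposal. */
#define ANON_INODE_SKIP_AUTHORIZATION 0x1

/* Toy inode carrying just the security label. */
struct anon_inode {
        unsigned int sid;
        bool labeled;
};

/* Stand-in for the loaded policy's answer to "may this task create one?". */
static bool policy_allows_create;

/* Labeling hook: only initializes security state, never denies. */
static int inode_init_security_anon(struct anon_inode *inode,
                                    unsigned int creator_sid)
{
        inode->sid = creator_sid;
        inode->labeled = true;
        return 0;
}

/* Authorization hook: answers the yes/no creation question. */
static int inode_create_anon(const struct anon_inode *inode)
{
        assert(inode->labeled);
        return policy_allows_create ? 0 : -EACCES;
}

/* Sketch of the creation path: always label, authorize unless told to skip. */
static int anon_inode_getfile2_sketch(struct anon_inode *inode,
                                      unsigned int creator_sid, int flags)
{
        int err = inode_init_security_anon(inode, creator_sid);

        if (err)
                return err;
        if (flags & ANON_INODE_SKIP_AUTHORIZATION)
                return 0;
        return inode_create_anon(inode);
}
```

In this model the fork-event path would pass the skip flag and still
end up with a labeled object, while a plain userfaultfd() call would
go through both hooks.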

Granted, UFFD fork processing doesn't actually occur in the fork
child, but in copy_mm, in the parent --- but the right thing should
happen anyway, right?

I'm open to suggestions. In the meantime, I figured we'd just define a
UFFD_SECURE and make it incompatible with UFFD_EVENT_FORK.

> In any event, I don't think that "make me visible to SELinux" should
> be a choice that user code makes.

Right. The new unprivileged_userfaultfd setting is ugly, but it at
least removes the ability of unprivileged users to opt out of SELinux
supervision.

* Re: [PATCH 3/7] Add a UFFD_SECURE flag to the userfaultfd API.
From: Andy Lutomirski @ 2019-10-13  1:14 UTC (permalink / raw)
  To: Daniel Colascione, Linus Torvalds, Jann Horn, Andrea Arcangeli,
	Pavel Emelyanov
  Cc: Andy Lutomirski, Linux API, LKML, Lokesh Gidra, Nick Kralevich,
	Nosh Minwalla, Tim Murray
In-Reply-To: <CAKOZuevUqs_Oe1UEwguQK7Ate3ai1DSVSij=0R=vmz9LzX4k6Q@mail.gmail.com>

[adding more people because this is going to be an ABI break, sigh]

On Sat, Oct 12, 2019 at 5:52 PM Daniel Colascione <dancol@google.com> wrote:
>
> On Sat, Oct 12, 2019 at 4:10 PM Andy Lutomirski <luto@kernel.org> wrote:
> >
> > On Sat, Oct 12, 2019 at 12:16 PM Daniel Colascione <dancol@google.com> wrote:
> > >
> > > The new secure flag makes userfaultfd use a new "secure" anonymous
> > > file object instead of the default one, letting security modules
> > > supervise userfaultfd use.
> > >
> > > Requiring that users pass a new flag lets us avoid changing the
> > > semantics for existing callers.
> >
> > Is there any good reason not to make this be the default?
> >
> >
> > The only downside I can see is that it would increase the memory usage
> > of userfaultfd(), but that doesn't seem like such a big deal.  A
> > lighter-weight alternative would be to have a single inode shared by
> > all userfaultfd instances, which would require a somewhat different
> > internal anon_inode API.
>
> I'd also prefer to just make SELinux use mandatory, but there's a
> nasty interaction with UFFD_EVENT_FORK. Adding a new UFFD_SECURE mode
> which blocks UFFD_EVENT_FORK sidesteps this problem. Maybe you know a
> better way to deal with it.

...

> But maybe we can go further: let's separate authentication and
> authorization, as we do in other LSM hooks. Let's split my
> inode_init_security_anon into two hooks, inode_init_security_anon and
> inode_create_anon. We'd define the former to just initialize the file
> object's security information --- in the SELinux case, figuring out
> its class and SID --- and define the latter to answer the yes/no
> question of whether a particular anonymous inode creation should be
> allowed. Normally, anon_inode_getfile2() would just call both hooks.
> We'd add another anon_inode_getfd flag, ANON_INODE_SKIP_AUTHORIZATION
> or something, that would tell anon_inode_getfile2() to skip calling
> the authorization hook, effectively making the creation always
> succeed. We can then make the UFFD code pass
> ANON_INODE_SKIP_AUTHORIZATION when it's creating a file object in the
> fork child while creating UFFD_EVENT_FORK messages.

That sounds like an improvement.  Or maybe just teach SELinux that
this particular fd creation is actually making an anon_inode that is a
child of an existing anon inode and that the context should be copied
or whatever SELinux wants to do.  Like this, maybe:

static int resolve_userfault_fork(struct userfaultfd_ctx *ctx,
                                  struct userfaultfd_ctx *new,
                                  struct uffd_msg *msg)
{
        int fd;

Change this:

        fd = anon_inode_getfd("[userfaultfd]", &userfaultfd_fops, new,
                              O_RDWR | (new->flags & UFFD_SHARED_FCNTL_FLAGS));

to something like:

      fd = anon_inode_make_child_fd(..., ctx->inode, ...);

where ctx->inode is the existing context's inode.

*** HOWEVER *** !!!

Now that you've pointed this mechanism out, it is utterly and
completely broken and should be removed from the kernel outright or at
least severely restricted.  A .read implementation MUST NOT ACT ON THE
CALLING TASK.  Ever.  Just imagine the effect of passing a userfaultfd
as stdin to a setuid program.

So I think the right solution might be to attempt to *remove*
UFFD_EVENT_FORK.  Maybe the solution is to say that, unless the
creator of a userfaultfd() has global CAP_SYS_ADMIN, it cannot use
UFFD_FEATURE_EVENT_FORK, and to print a warning (once) when
UFFD_FEATURE_EVENT_FORK is allowed.  And, after some suitable
deprecation period, just remove it.  If it's genuinely useful, it
needs an entirely new API based on ioctl() or a syscall.  Or even
recvmsg() :)
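
Roughly, as a userspace mock of the check (not a patch): the function
name, the has_cap_sys_admin parameter, and the warn_count counter are
invented for illustration, and the feature bit is redefined locally
rather than pulled from <linux/userfaultfd.h>.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Local definition for the sketch; the real one lives in the uapi header. */
#define UFFD_FEATURE_EVENT_FORK (1u << 1)

/* Counts how many times the deprecation warning was actually emitted. */
static int warn_count;

/*
 * Reject the fork-event feature for unprivileged creators, and warn a
 * single time when a privileged creator is allowed to enable it.
 */
static int check_uffd_features(unsigned int features, bool has_cap_sys_admin)
{
        static bool warned;

        assert((features & ~UFFD_FEATURE_EVENT_FORK) == 0); /* sketch handles one bit */
        if (!(features & UFFD_FEATURE_EVENT_FORK))
                return 0;
        if (!has_cap_sys_admin)
                return -EPERM;
        if (!warned) {
                warned = true;
                warn_count++;
                fprintf(stderr, "userfaultfd: fork events are deprecated\n");
        }
        return 0;
}
```

The one-shot flag plays the role of a warn-once helper, so repeated
privileged callers do not flood the log.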

And UFFD_SECURE should just become automatic, since you don't have a
problem any more. :-p

--Andy

* Re: [PATCH 3/7] Add a UFFD_SECURE flag to the userfaultfd API.
From: Daniel Colascione @ 2019-10-13  1:38 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linus Torvalds, Jann Horn, Andrea Arcangeli, Pavel Emelyanov,
	Linux API, LKML, Lokesh Gidra, Nick Kralevich, Nosh Minwalla,
	Tim Murray
In-Reply-To: <CALCETrUyq=J37gU-MYXqLdoi7uH7iNNVRjvcGUT11JA1QuTFyg@mail.gmail.com>

On Sat, Oct 12, 2019 at 6:14 PM Andy Lutomirski <luto@kernel.org> wrote:
>
..
>
> > But maybe we can go further: let's separate authentication and
> > authorization, as we do in other LSM hooks. Let's split my
> > inode_init_security_anon into two hooks, inode_init_security_anon and
> > inode_create_anon. We'd define the former to just initialize the file
> > object's security information --- in the SELinux case, figuring out
> > its class and SID --- and define the latter to answer the yes/no
> > question of whether a particular anonymous inode creation should be
> > allowed. Normally, anon_inode_getfile2() would just call both hooks.
> > We'd add another anon_inode_getfd flag, ANON_INODE_SKIP_AUTHORIZATION
> > or something, that would tell anon_inode_getfile2() to skip calling
> > the authorization hook, effectively making the creation always
> > succeed. We can then make the UFFD code pass
> > ANON_INODE_SKIP_AUTHORIZATION when it's creating a file object in the
> > fork child while creating UFFD_EVENT_FORK messages.
>
> That sounds like an improvement.  Or maybe just teach SELinux that
> this particular fd creation is actually making an anon_inode that is a
> child of an existing anon inode and that the context should be copied
> or whatever SELinux wants to do.  Like this, maybe:
>
> static int resolve_userfault_fork(struct userfaultfd_ctx *ctx,
>                                   struct userfaultfd_ctx *new,
>                                   struct uffd_msg *msg)
> {
>         int fd;
>
> Change this:
>
>         fd = anon_inode_getfd("[userfaultfd]", &userfaultfd_fops, new,
>                               O_RDWR | (new->flags & UFFD_SHARED_FCNTL_FLAGS));
>
> to something like:
>
>       fd = anon_inode_make_child_fd(..., ctx->inode, ...);
>
> where ctx->inode is the existing context's inode.

Yeah. I figured we could just add a special-purpose hook for this
case. Having a special hook for this one case feels ugly though, and
at copy_mm time, we don't have a PID for the new child yet --- I don't
know whether LSMs would care about that. But maybe this is one of
those "doctor, it hurts when I do this!" situations and this child
process difficulty is just a hint that some other design might work
better.

> Now that you've pointed this mechanism out, it is utterly and
> completely broken and should be removed from the kernel outright or at
> least severely restricted.  A .read implementation MUST NOT ACT ON THE
> CALLING TASK.  Ever.  Just imagine the effect of passing a userfaultfd
> as stdin to a setuid program.
>
> So I think the right solution might be to attempt to *remove*
> UFFD_EVENT_FORK.  Maybe the solution is to say that, unless the
> creator of a userfaultfd() has global CAP_SYS_ADMIN, it cannot use
> UFFD_FEATURE_EVENT_FORK, and to print a warning (once) when
> UFFD_FEATURE_EVENT_FORK is allowed.  And, after some suitable
> deprecation period, just remove it.  If it's genuinely useful, it
> needs an entirely new API based on ioctl() or a syscall.  Or even
> recvmsg() :)

IMHO, userfaultfd should have been a datagram socket from the start.
As you point out, it's a good fit for the UFFD protocol, which
involves FD passing and a fixed message size.

> And UFFD_SECURE should just become automatic, since you don't have a
> problem any more. :-p

Agreed. I'll wait to hear what everyone else has to say.

* Re: [PATCHv7 06/33] alarmtimer: Provide get_timespec() callback
From: kbuild test robot @ 2019-10-14  0:36 UTC (permalink / raw)
  Cc: kbuild-all, linux-kernel, Dmitry Safonov, Andrei Vagin,
	Dmitry Safonov, Adrian Reber, Andrei Vagin, Andy Lutomirski,
	Arnd Bergmann, Christian Brauner, Cyrill Gorcunov,
	Eric W. Biederman, H. Peter Anvin, Ingo Molnar, Jann Horn,
	Jeff Dike, Oleg Nesterov, Pavel Emelyanov, Shuah Khan,
	Thomas Gleixner, Vincenzo Frascino, containers
In-Reply-To: <20191011012341.846266-7-dima@arista.com>

[-- Attachment #1: Type: text/plain, Size: 1213 bytes --]

Hi Dmitry,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[cannot apply to v5.4-rc2 next-20191011]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/Dmitry-Safonov/kernel-Introduce-Time-Namespace/20191014-075119
config: i386-tinyconfig (attached as .config)
compiler: gcc-7 (Debian 7.4.0-13) 7.4.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   ld: kernel/time/alarmtimer.o: in function `alarmtimer_init':
>> alarmtimer.c:(.init.text+0x26): undefined reference to `posix_get_realtime_timespec'
>> ld: alarmtimer.c:(.init.text+0x44): undefined reference to `posix_get_boottime_timespec'

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 7207 bytes --]

* Re: [PATCHv7 15/33] posix-timers: Make clock_nanosleep() time namespace aware
From: kbuild test robot @ 2019-10-14  0:50 UTC (permalink / raw)
  Cc: kbuild-all, linux-kernel, Dmitry Safonov, Andrei Vagin,
	Dmitry Safonov, Adrian Reber, Andy Lutomirski, Arnd Bergmann,
	Christian Brauner, Cyrill Gorcunov, Eric W. Biederman,
	H. Peter Anvin, Ingo Molnar, Jann Horn, Jeff Dike, Oleg Nesterov,
	Pavel Emelyanov, Shuah Khan, Thomas Gleixner, Vincenzo Frascino,
	containers, criu
In-Reply-To: <20191011012341.846266-16-dima@arista.com>

[-- Attachment #1: Type: text/plain, Size: 4759 bytes --]

Hi Dmitry,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[cannot apply to v5.4-rc2 next-20191011]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/Dmitry-Safonov/kernel-Introduce-Time-Namespace/20191014-075119
config: i386-tinyconfig (attached as .config)
compiler: gcc-7 (Debian 7.4.0-13) 7.4.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   kernel/time/posix-stubs.c: In function '__do_sys_clock_nanosleep':
>> kernel/time/posix-stubs.c:153:31: error: 'clockid' undeclared (first use in this function); did you mean 'clock_t'?
      texp = timens_ktime_to_host(clockid, texp);
                                  ^~~~~~~
                                  clock_t
   kernel/time/posix-stubs.c:153:31: note: each undeclared identifier is reported only once for each function it appears in
   kernel/time/posix-stubs.c: In function '__do_sys_clock_nanosleep_time32':
>> kernel/time/posix-stubs.c:222:2: error: unknown type name 'ktime'; did you mean 'ktime_t'?
     ktime texp;
     ^~~~~
     ktime_t
   kernel/time/posix-stubs.c:243:31: error: 'clockid' undeclared (first use in this function); did you mean 'clock_t'?
      texp = timens_ktime_to_host(clockid, texp);
                                  ^~~~~~~
                                  clock_t
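
The diagnostics above suggest two likely culprits: the syscall
parameter in this file is named which_clock rather than clockid, and
the kernel type is ktime_t rather than ktime. A userspace mock of the
corrected shape follows, as a guess and not the author's fix. The
ktime_t typedef, the toy offset, the body of timens_ktime_to_host(),
and nanosleep_expiry() are all stand-ins invented for this sketch.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t ktime_t;        /* toy stand-in for the kernel type */

#ifndef TIMER_ABSTIME
#define TIMER_ABSTIME 0x01
#endif

/* Toy per-namespace offset; the real lookup depends on the clock and task. */
static ktime_t toy_timens_offset;

/* Toy conversion from namespace time to host time (direction assumed). */
static ktime_t timens_ktime_to_host(int clockid, ktime_t tim)
{
        (void)clockid;
        return tim - toy_timens_offset;
}

/* Mirrors the expiry computation in the listing, with both names corrected. */
static ktime_t nanosleep_expiry(int which_clock, int flags, ktime_t requested)
{
        ktime_t texp = requested;       /* the listing declares "ktime texp;" */

        assert(requested >= 0);
        if (flags & TIMER_ABSTIME)      /* the listing passes "clockid" here */
                texp = timens_ktime_to_host(which_clock, texp);
        return texp;
}
```

Only absolute sleeps are translated; relative timeouts pass through
unchanged, as in the quoted source.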

vim +153 kernel/time/posix-stubs.c

   126	
   127	SYSCALL_DEFINE4(clock_nanosleep, const clockid_t, which_clock, int, flags,
   128			const struct __kernel_timespec __user *, rqtp,
   129			struct __kernel_timespec __user *, rmtp)
   130	{
   131		struct timespec64 t;
   132		ktime_t texp;
   133	
   134		switch (which_clock) {
   135		case CLOCK_REALTIME:
   136		case CLOCK_MONOTONIC:
   137		case CLOCK_BOOTTIME:
   138			break;
   139		default:
   140			return -EINVAL;
   141		}
   142	
   143		if (get_timespec64(&t, rqtp))
   144			return -EFAULT;
   145		if (!timespec64_valid(&t))
   146			return -EINVAL;
   147		if (flags & TIMER_ABSTIME)
   148			rmtp = NULL;
   149		current->restart_block.nanosleep.type = rmtp ? TT_NATIVE : TT_NONE;
   150		current->restart_block.nanosleep.rmtp = rmtp;
   151		texp = timespec64_to_ktime(t);
   152		if (flags & TIMER_ABSTIME)
 > 153			texp = timens_ktime_to_host(clockid, texp);
   154		return hrtimer_nanosleep(texp, flags & TIMER_ABSTIME ?
   155					 HRTIMER_MODE_ABS : HRTIMER_MODE_REL,
   156					 which_clock);
   157	}
   158	
   159	#ifdef CONFIG_COMPAT
   160	COMPAT_SYS_NI(timer_create);
   161	COMPAT_SYS_NI(getitimer);
   162	COMPAT_SYS_NI(setitimer);
   163	#endif
   164	
   165	#ifdef CONFIG_COMPAT_32BIT_TIME
   166	SYS_NI(timer_settime32);
   167	SYS_NI(timer_gettime32);
   168	
   169	SYSCALL_DEFINE2(clock_settime32, const clockid_t, which_clock,
   170			struct old_timespec32 __user *, tp)
   171	{
   172		struct timespec64 new_tp;
   173	
   174		if (which_clock != CLOCK_REALTIME)
   175			return -EINVAL;
   176		if (get_old_timespec32(&new_tp, tp))
   177			return -EFAULT;
   178	
   179		return do_sys_settimeofday64(&new_tp, NULL);
   180	}
   181	
   182	SYSCALL_DEFINE2(clock_gettime32, clockid_t, which_clock,
   183			struct old_timespec32 __user *, tp)
   184	{
   185		int ret;
   186		struct timespec64 kernel_tp;
   187	
   188		ret = do_clock_gettime(which_clock, &kernel_tp);
   189		if (ret)
   190			return ret;
   191	
   192		if (put_old_timespec32(&kernel_tp, tp))
   193			return -EFAULT;
   194		return 0;
   195	}
   196	
   197	SYSCALL_DEFINE2(clock_getres_time32, clockid_t, which_clock,
   198			struct old_timespec32 __user *, tp)
   199	{
   200		struct timespec64 rtn_tp = {
   201			.tv_sec = 0,
   202			.tv_nsec = hrtimer_resolution,
   203		};
   204	
   205		switch (which_clock) {
   206		case CLOCK_REALTIME:
   207		case CLOCK_MONOTONIC:
   208		case CLOCK_BOOTTIME:
   209			if (put_old_timespec32(&rtn_tp, tp))
   210				return -EFAULT;
   211			return 0;
   212		default:
   213			return -EINVAL;
   214		}
   215	}
   216	
   217	SYSCALL_DEFINE4(clock_nanosleep_time32, clockid_t, which_clock, int, flags,
   218			struct old_timespec32 __user *, rqtp,
   219			struct old_timespec32 __user *, rmtp)
   220	{
   221		struct timespec64 t;
 > 222		ktime texp;

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 7207 bytes --]

* Re: [PATCHv7 22/33] time: Allocate per-timens vvar page
From: kbuild test robot @ 2019-10-14  2:22 UTC (permalink / raw)
  Cc: kbuild-all, linux-kernel, Dmitry Safonov, Dmitry Safonov,
	Adrian Reber, Andrei Vagin, Andy Lutomirski, Arnd Bergmann,
	Christian Brauner, Cyrill Gorcunov, Eric W. Biederman,
	H. Peter Anvin, Ingo Molnar, Jann Horn, Jeff Dike, Oleg Nesterov,
	Pavel Emelyanov, Shuah Khan, Thomas Gleixner, Vincenzo Frascino,
	containers, criu
In-Reply-To: <20191011012341.846266-23-dima@arista.com>

[-- Attachment #1: Type: text/plain, Size: 1278 bytes --]

Hi Dmitry,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[cannot apply to v5.4-rc2 next-20191010]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/Dmitry-Safonov/kernel-Introduce-Time-Namespace/20191014-075119
config: parisc-b180_defconfig (attached as .config)
compiler: hppa-linux-gcc (GCC) 7.4.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        GCC_VERSION=7.4.0 make.cross ARCH=parisc 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   kernel/time/namespace.o: In function `timens_set_vvar_page.isra.8.part.9':
>> (.text+0x130): undefined reference to `arch_get_vdso_data'

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 14107 bytes --]

* Re: [PATCHv7 22/33] time: Allocate per-timens vvar page
From: kbuild test robot @ 2019-10-14  2:34 UTC (permalink / raw)
  Cc: kbuild-all, linux-kernel, Dmitry Safonov, Dmitry Safonov,
	Adrian Reber, Andrei Vagin, Andy Lutomirski, Arnd Bergmann,
	Christian Brauner, Cyrill Gorcunov, Eric W. Biederman,
	H. Peter Anvin, Ingo Molnar, Jann Horn, Jeff Dike, Oleg Nesterov,
	Pavel Emelyanov, Shuah Khan, Thomas Gleixner, Vincenzo Frascino,
	containers, criu
In-Reply-To: <20191011012341.846266-23-dima@arista.com>

[-- Attachment #1: Type: text/plain, Size: 1255 bytes --]

Hi Dmitry,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[cannot apply to v5.4-rc2 next-20191011]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/Dmitry-Safonov/kernel-Introduce-Time-Namespace/20191014-075119
config: riscv-defconfig (attached as .config)
compiler: riscv64-linux-gcc (GCC) 7.4.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        GCC_VERSION=7.4.0 make.cross ARCH=riscv 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   kernel/time/namespace.o: In function `.L0 ':
>> namespace.c:(.text+0xfc): undefined reference to `arch_get_vdso_data'

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 18632 bytes --]

* Re: [PATCHv7 25/33] x86/vdso: Zap vvar pages on switch a time namspace
From: kbuild test robot @ 2019-10-14  2:47 UTC (permalink / raw)
  Cc: kbuild-all, linux-kernel, Dmitry Safonov, Dmitry Safonov,
	Adrian Reber, Andrei Vagin, Andy Lutomirski, Arnd Bergmann,
	Christian Brauner, Cyrill Gorcunov, Eric W. Biederman,
	H. Peter Anvin, Ingo Molnar, Jann Horn, Jeff Dike, Oleg Nesterov,
	Pavel Emelyanov, Shuah Khan, Thomas Gleixner, Vincenzo Frascino,
	containers, criu
In-Reply-To: <20191011012341.846266-26-dima@arista.com>

[-- Attachment #1: Type: text/plain, Size: 1514 bytes --]

Hi Dmitry,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[cannot apply to v5.4-rc2 next-20191010]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/Dmitry-Safonov/kernel-Introduce-Time-Namespace/20191014-075119
config: parisc-b180_defconfig (attached as .config)
compiler: hppa-linux-gcc (GCC) 7.4.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        GCC_VERSION=7.4.0 make.cross ARCH=parisc 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   kernel/time/namespace.o: In function `timens_set_vvar_page.isra.8.part.9':
   (.text+0x130): undefined reference to `arch_get_vdso_data'
   kernel/time/namespace.o: In function `timens_install':
>> (.text+0x6e0): undefined reference to `vdso_join_timens'
   kernel/time/namespace.o: In function `timens_on_fork':
   (.text+0x804): undefined reference to `vdso_join_timens'

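The undefined references above suggest that generic code in
kernel/time/namespace.c calls arch_get_vdso_data() and vdso_join_timens()
on architectures that do not define them. One way to keep such call sites
linkable is a weak default that an architecture can override. The sketch
below is plain userspace C for GCC/Clang; every demo_ name and signature in
it is hypothetical and not taken from the series.

```c
#include <stddef.h>

struct demo_vdso_data;	/* opaque stand-in for the real vdso data type */

/* Weak default: used only if no strong (per-architecture) definition exists. */
__attribute__((weak))
struct demo_vdso_data *demo_arch_get_vdso_data(void *vvar_page)
{
	(void)vvar_page;
	return NULL;	/* "this architecture has no timens vDSO data" */
}

/* Weak default for joining a time namespace: nothing to do, report success. */
__attribute__((weak))
int demo_vdso_join_timens(void *task)
{
	(void)task;
	return 0;
}

/* Generic caller: links on every architecture thanks to the defaults above. */
int demo_timens_install(void *task)
{
	return demo_vdso_join_timens(task);
}
```

Another option is to restrict the feature in Kconfig to architectures that
implement the hooks, which avoids carrying no-op defaults at all.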
---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 14107 bytes --]


* Re: [PATCH 2/7] Add a concept of a "secure" anonymous file
From: kbuild test robot @ 2019-10-14  3:01 UTC (permalink / raw)
  Cc: kbuild-all, linux-api, linux-kernel, lokeshgidra, dancol, nnk,
	nosh, timmurray
In-Reply-To: <20191012191602.45649-3-dancol@google.com>

[-- Attachment #1: Type: text/plain, Size: 1984 bytes --]

Hi Daniel,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[cannot apply to v5.4-rc2 next-20191011]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/Daniel-Colascione/Harden-userfaultfd/20191014-102741
config: i386-tinyconfig (attached as .config)
compiler: gcc-7 (Debian 7.4.0-13) 7.4.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   fs/anon_inodes.c: In function 'anon_inode_make_secure_inode':
>> fs/anon_inodes.c:67:10: error: implicit declaration of function 'security_inode_init_security_anon'; did you mean 'security_inode_init_security'? [-Werror=implicit-function-declaration]
     error = security_inode_init_security_anon(inode, name, fops);
             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
             security_inode_init_security
   cc1: some warnings being treated as errors

vim +67 fs/anon_inodes.c

    57	
    58	struct inode *anon_inode_make_secure_inode(const char *name,
    59						   const struct file_operations *fops)
    60	{
    61		struct inode *inode;
    62		int error;
    63		inode = alloc_anon_inode(anon_inode_mnt->mnt_sb);
    64		if (IS_ERR(inode))
    65			return ERR_PTR(PTR_ERR(inode));
    66		inode->i_flags &= ~S_PRIVATE;
  > 67		error =	security_inode_init_security_anon(inode, name, fops);
    68		if (error) {
    69			iput(inode);
    70			return ERR_PTR(error);
    71		}
    72		return inode;
    73	}
    74	

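A plausible cause, assuming i386-tinyconfig leaves CONFIG_SECURITY unset, is
that the new hook is declared only in the CONFIG_SECURITY branch of the
security header, with no stub for the disabled case. The usual header
pattern looks roughly like the sketch below. It is self-contained userspace
C; the guard DEMO_CONFIG_SECURITY and all demo_ names are hypothetical.

```c
#include <stddef.h>

struct demo_inode;		/* stand-in for struct inode */
struct demo_file_operations;	/* stand-in for struct file_operations */

#ifdef DEMO_CONFIG_SECURITY
/* LSM support built in: the real hook is defined elsewhere. */
int demo_security_inode_init_security_anon(struct demo_inode *inode,
					   const char *name,
					   const struct demo_file_operations *fops);
#else
/* LSM support compiled out: an inline no-op keeps callers building. */
static inline int
demo_security_inode_init_security_anon(struct demo_inode *inode,
					const char *name,
					const struct demo_file_operations *fops)
{
	(void)inode;
	(void)name;
	(void)fops;
	return 0;
}
#endif

/* Caller shaped like the listing above: propagate the hook's result. */
int demo_make_secure_inode(struct demo_inode *inode, const char *name,
			   const struct demo_file_operations *fops)
{
	return demo_security_inode_init_security_anon(inode, name, fops);
}
```

With the guard undefined, the caller compiles and the hook reduces to a
constant 0, so the compiled-out configuration costs nothing.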
---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 7207 bytes --]


* Re: [PATCHv7 25/33] x86/vdso: Zap vvar pages on switch a time namspace
From: kbuild test robot @ 2019-10-14  3:11 UTC (permalink / raw)
  Cc: kbuild-all, linux-kernel, Dmitry Safonov, Dmitry Safonov,
	Adrian Reber, Andrei Vagin, Andy Lutomirski, Arnd Bergmann,
	Christian Brauner, Cyrill Gorcunov, Eric W. Biederman,
	H. Peter Anvin, Ingo Molnar, Jann Horn, Jeff Dike, Oleg Nesterov,
	Pavel Emelyanov, Shuah Khan, Thomas Gleixner, Vincenzo Frascino,
	containers, criu
In-Reply-To: <20191011012341.846266-26-dima@arista.com>

[-- Attachment #1: Type: text/plain, Size: 1457 bytes --]

Hi Dmitry,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[cannot apply to v5.4-rc2 next-20191011]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/Dmitry-Safonov/kernel-Introduce-Time-Namespace/20191014-075119
config: riscv-defconfig (attached as .config)
compiler: riscv64-linux-gcc (GCC) 7.4.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        GCC_VERSION=7.4.0 make.cross ARCH=riscv 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   kernel/time/namespace.o: In function `.L0 ':
   namespace.c:(.text+0xfc): undefined reference to `arch_get_vdso_data'
   kernel/time/namespace.o: In function `timens_install':
>> namespace.c:(.text+0x41c): undefined reference to `vdso_join_timens'
   namespace.c:(.text+0x4ce): undefined reference to `vdso_join_timens'

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 18632 bytes --]


* Re: [PATCHv7 15/33] posix-timers: Make clock_nanosleep() time namespace aware
From: kbuild test robot @ 2019-10-14  4:10 UTC (permalink / raw)
  Cc: kbuild-all, linux-kernel, Dmitry Safonov, Andrei Vagin,
	Dmitry Safonov, Adrian Reber, Andy Lutomirski, Arnd Bergmann,
	Christian Brauner, Cyrill Gorcunov, Eric W. Biederman,
	H. Peter Anvin, Ingo Molnar, Jann Horn, Jeff Dike, Oleg Nesterov,
	Pavel Emelyanov, Shuah Khan, Thomas Gleixner, Vincenzo Frascino,
	containers, criu
In-Reply-To: <20191011012341.846266-16-dima@arista.com>

[-- Attachment #1: Type: text/plain, Size: 4611 bytes --]

Hi Dmitry,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[cannot apply to v5.4-rc2 next-20191011]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/Dmitry-Safonov/kernel-Introduce-Time-Namespace/20191014-075119
config: x86_64-randconfig-s1-201941 (attached as .config)
compiler: gcc-4.9 (Debian 4.9.2-10+deb8u1) 4.9.2
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   kernel//time/posix-stubs.c: In function '__do_sys_clock_nanosleep':
>> kernel//time/posix-stubs.c:153:31: error: 'clockid' undeclared (first use in this function)
      texp = timens_ktime_to_host(clockid, texp);
                                  ^
   kernel//time/posix-stubs.c:153:31: note: each undeclared identifier is reported only once for each function it appears in
   kernel//time/posix-stubs.c: In function '__do_sys_clock_nanosleep_time32':
>> kernel//time/posix-stubs.c:222:2: error: unknown type name 'ktime'
     ktime texp;
     ^
   kernel//time/posix-stubs.c:243:31: error: 'clockid' undeclared (first use in this function)
      texp = timens_ktime_to_host(clockid, texp);
                                  ^

vim +/clockid +153 kernel//time/posix-stubs.c

   126	
   127	SYSCALL_DEFINE4(clock_nanosleep, const clockid_t, which_clock, int, flags,
   128			const struct __kernel_timespec __user *, rqtp,
   129			struct __kernel_timespec __user *, rmtp)
   130	{
   131		struct timespec64 t;
   132		ktime_t texp;
   133	
   134		switch (which_clock) {
   135		case CLOCK_REALTIME:
   136		case CLOCK_MONOTONIC:
   137		case CLOCK_BOOTTIME:
   138			break;
   139		default:
   140			return -EINVAL;
   141		}
   142	
   143		if (get_timespec64(&t, rqtp))
   144			return -EFAULT;
   145		if (!timespec64_valid(&t))
   146			return -EINVAL;
   147		if (flags & TIMER_ABSTIME)
   148			rmtp = NULL;
   149		current->restart_block.nanosleep.type = rmtp ? TT_NATIVE : TT_NONE;
   150		current->restart_block.nanosleep.rmtp = rmtp;
   151		texp = timespec64_to_ktime(t);
   152		if (flags & TIMER_ABSTIME)
 > 153			texp = timens_ktime_to_host(clockid, texp);
   154		return hrtimer_nanosleep(texp, flags & TIMER_ABSTIME ?
   155					 HRTIMER_MODE_ABS : HRTIMER_MODE_REL,
   156					 which_clock);
   157	}
   158	
   159	#ifdef CONFIG_COMPAT
   160	COMPAT_SYS_NI(timer_create);
   161	COMPAT_SYS_NI(getitimer);
   162	COMPAT_SYS_NI(setitimer);
   163	#endif
   164	
   165	#ifdef CONFIG_COMPAT_32BIT_TIME
   166	SYS_NI(timer_settime32);
   167	SYS_NI(timer_gettime32);
   168	
   169	SYSCALL_DEFINE2(clock_settime32, const clockid_t, which_clock,
   170			struct old_timespec32 __user *, tp)
   171	{
   172		struct timespec64 new_tp;
   173	
   174		if (which_clock != CLOCK_REALTIME)
   175			return -EINVAL;
   176		if (get_old_timespec32(&new_tp, tp))
   177			return -EFAULT;
   178	
   179		return do_sys_settimeofday64(&new_tp, NULL);
   180	}
   181	
   182	SYSCALL_DEFINE2(clock_gettime32, clockid_t, which_clock,
   183			struct old_timespec32 __user *, tp)
   184	{
   185		int ret;
   186		struct timespec64 kernel_tp;
   187	
   188		ret = do_clock_gettime(which_clock, &kernel_tp);
   189		if (ret)
   190			return ret;
   191	
   192		if (put_old_timespec32(&kernel_tp, tp))
   193			return -EFAULT;
   194		return 0;
   195	}
   196	
   197	SYSCALL_DEFINE2(clock_getres_time32, clockid_t, which_clock,
   198			struct old_timespec32 __user *, tp)
   199	{
   200		struct timespec64 rtn_tp = {
   201			.tv_sec = 0,
   202			.tv_nsec = hrtimer_resolution,
   203		};
   204	
   205		switch (which_clock) {
   206		case CLOCK_REALTIME:
   207		case CLOCK_MONOTONIC:
   208		case CLOCK_BOOTTIME:
   209			if (put_old_timespec32(&rtn_tp, tp))
   210				return -EFAULT;
   211			return 0;
   212		default:
   213			return -EINVAL;
   214		}
   215	}
   216	
   217	SYSCALL_DEFINE4(clock_nanosleep_time32, clockid_t, which_clock, int, flags,
   218			struct old_timespec32 __user *, rqtp,
   219			struct old_timespec32 __user *, rmtp)
   220	{
   221		struct timespec64 t;
 > 222		ktime texp;

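Both errors look like naming slips in the stub file: the syscall parameter
in the listing is called which_clock rather than clockid, and the type is
spelled ktime_t rather than ktime. The sketch below mirrors the intended
flow with mock types so it builds in userspace. The demo_ names, the offset
value and the conversion arithmetic are placeholders, not the series'
implementation.

```c
#include <stdint.h>

typedef int64_t demo_ktime_t;	/* stand-in for the kernel's ktime_t */
typedef int demo_clockid_t;	/* stand-in for clockid_t */
#define DEMO_TIMER_ABSTIME 0x01

/* Hypothetical per-namespace offset, in nanoseconds. */
static const demo_ktime_t demo_ns_offset = 1000;

/* Mock of a namespace-to-host conversion for an absolute expiry time. */
static demo_ktime_t demo_timens_ktime_to_host(demo_clockid_t clockid,
					      demo_ktime_t tim)
{
	(void)clockid;
	return tim - demo_ns_offset;
}

/* Mirrors the stub's logic: convert only absolute expiry times. */
demo_ktime_t demo_nanosleep_expiry(demo_clockid_t which_clock, int flags,
				   demo_ktime_t requested)
{
	demo_ktime_t texp = requested;	/* full type name, not "ktime" */

	if (flags & DEMO_TIMER_ABSTIME)
		texp = demo_timens_ktime_to_host(which_clock, texp);
	return texp;
}
```

Relative sleeps pass through unchanged, which matches the TIMER_ABSTIME
check in the listing above.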
---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 30786 bytes --]


* Re: [PATCH 1/7] Add a new flags-accepting interface for anonymous inodes
From: kbuild test robot @ 2019-10-14  4:26 UTC (permalink / raw)
  Cc: kbuild-all, linux-api, linux-kernel, lokeshgidra, dancol, nnk,
	nosh, timmurray
In-Reply-To: <20191012191602.45649-2-dancol@google.com>

[-- Attachment #1: Type: text/plain, Size: 18565 bytes --]

Hi Daniel,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[cannot apply to v5.4-rc3 next-20191011]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/Daniel-Colascione/Harden-userfaultfd/20191014-102741
reproduce: make htmldocs

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

   Warning: The Sphinx 'sphinx_rtd_theme' HTML theme was not found. Make sure you have the theme installed to produce pretty HTML output. Falling back to the default theme.
   WARNING: dot(1) not found, for better output quality install graphviz from http://www.graphviz.org
   WARNING: convert(1) not found, for SVG to PDF conversion install ImageMagick (https://www.imagemagick.org)
   Error: Cannot open file drivers/dma-buf/reservation.c
   Error: Cannot open file drivers/dma-buf/reservation.c
   Error: Cannot open file drivers/dma-buf/reservation.c
   Error: Cannot open file include/linux/reservation.h
   Error: Cannot open file include/linux/reservation.h
   include/linux/lsm_hooks.h:1822: warning: Function parameter or member 'quotactl' not described in 'security_list_options'
   include/linux/lsm_hooks.h:1822: warning: Function parameter or member 'quota_on' not described in 'security_list_options'
   include/linux/lsm_hooks.h:1822: warning: Function parameter or member 'sb_free_mnt_opts' not described in 'security_list_options'
   include/linux/lsm_hooks.h:1822: warning: Function parameter or member 'sb_eat_lsm_opts' not described in 'security_list_options'
   include/linux/lsm_hooks.h:1822: warning: Function parameter or member 'sb_kern_mount' not described in 'security_list_options'
   include/linux/lsm_hooks.h:1822: warning: Function parameter or member 'sb_show_options' not described in 'security_list_options'
   include/linux/lsm_hooks.h:1822: warning: Function parameter or member 'sb_add_mnt_opt' not described in 'security_list_options'
   include/linux/lsm_hooks.h:1822: warning: Function parameter or member 'd_instantiate' not described in 'security_list_options'
   include/linux/lsm_hooks.h:1822: warning: Function parameter or member 'getprocattr' not described in 'security_list_options'
   include/linux/lsm_hooks.h:1822: warning: Function parameter or member 'setprocattr' not described in 'security_list_options'
   include/linux/lsm_hooks.h:1822: warning: Function parameter or member 'locked_down' not described in 'security_list_options'
   include/linux/regulator/machine.h:196: warning: Function parameter or member 'max_uV_step' not described in 'regulation_constraints'
   include/linux/regulator/driver.h:223: warning: Function parameter or member 'resume' not described in 'regulator_ops'
   fs/fs-writeback.c:918: warning: Excess function parameter 'nr_pages' description in 'cgroup_writeback_by_id'
>> fs/anon_inodes.c:139: warning: Function parameter or member 'anon_inode_flags' not described in 'anon_inode_getfd2'
   fs/direct-io.c:258: warning: Excess function parameter 'offset' description in 'dio_complete'
   fs/libfs.c:501: warning: Excess function parameter 'available' description in 'simple_write_end'
   fs/posix_acl.c:647: warning: Function parameter or member 'inode' not described in 'posix_acl_update_mode'
   fs/posix_acl.c:647: warning: Function parameter or member 'mode_p' not described in 'posix_acl_update_mode'
   fs/posix_acl.c:647: warning: Function parameter or member 'acl' not described in 'posix_acl_update_mode'
   include/linux/spi/spi.h:190: warning: Function parameter or member 'driver_override' not described in 'spi_device'
   drivers/usb/typec/bus.c:1: warning: 'typec_altmode_unregister_driver' not found
   drivers/usb/typec/bus.c:1: warning: 'typec_altmode_register_driver' not found
   drivers/usb/typec/class.c:1: warning: 'typec_altmode_register_notifier' not found
   drivers/usb/typec/class.c:1: warning: 'typec_altmode_unregister_notifier' not found
   include/linux/w1.h:277: warning: Function parameter or member 'of_match_table' not described in 'w1_family'
   drivers/gpio/gpiolib-of.c:92: warning: Excess function parameter 'dev' description in 'of_gpio_need_valid_mask'
   include/linux/i2c.h:337: warning: Function parameter or member 'init_irq' not described in 'i2c_client'
   kernel/dma/coherent.c:1: warning: no structured comments found
   include/linux/input/sparse-keymap.h:43: warning: Function parameter or member 'sw' not described in 'key_entry'
   include/linux/skbuff.h:888: warning: Function parameter or member 'dev_scratch' not described in 'sk_buff'
   include/linux/skbuff.h:888: warning: Function parameter or member 'list' not described in 'sk_buff'
   include/linux/skbuff.h:888: warning: Function parameter or member 'ip_defrag_offset' not described in 'sk_buff'
   include/linux/skbuff.h:888: warning: Function parameter or member 'skb_mstamp_ns' not described in 'sk_buff'
   include/linux/skbuff.h:888: warning: Function parameter or member '__cloned_offset' not described in 'sk_buff'
   include/linux/skbuff.h:888: warning: Function parameter or member 'head_frag' not described in 'sk_buff'
   include/linux/skbuff.h:888: warning: Function parameter or member '__pkt_type_offset' not described in 'sk_buff'
   include/linux/skbuff.h:888: warning: Function parameter or member 'encapsulation' not described in 'sk_buff'
   include/linux/skbuff.h:888: warning: Function parameter or member 'encap_hdr_csum' not described in 'sk_buff'
   include/linux/skbuff.h:888: warning: Function parameter or member 'csum_valid' not described in 'sk_buff'
   include/linux/skbuff.h:888: warning: Function parameter or member '__pkt_vlan_present_offset' not described in 'sk_buff'
   include/linux/skbuff.h:888: warning: Function parameter or member 'vlan_present' not described in 'sk_buff'
   include/linux/skbuff.h:888: warning: Function parameter or member 'csum_complete_sw' not described in 'sk_buff'
   include/linux/skbuff.h:888: warning: Function parameter or member 'csum_level' not described in 'sk_buff'
   include/linux/skbuff.h:888: warning: Function parameter or member 'inner_protocol_type' not described in 'sk_buff'
   include/linux/skbuff.h:888: warning: Function parameter or member 'remcsum_offload' not described in 'sk_buff'
   include/linux/skbuff.h:888: warning: Function parameter or member 'sender_cpu' not described in 'sk_buff'
   include/linux/skbuff.h:888: warning: Function parameter or member 'reserved_tailroom' not described in 'sk_buff'
   include/linux/skbuff.h:888: warning: Function parameter or member 'inner_ipproto' not described in 'sk_buff'
   include/net/sock.h:233: warning: Function parameter or member 'skc_addrpair' not described in 'sock_common'
   include/net/sock.h:233: warning: Function parameter or member 'skc_portpair' not described in 'sock_common'
   include/net/sock.h:233: warning: Function parameter or member 'skc_ipv6only' not described in 'sock_common'
   include/net/sock.h:233: warning: Function parameter or member 'skc_net_refcnt' not described in 'sock_common'
   include/net/sock.h:233: warning: Function parameter or member 'skc_v6_daddr' not described in 'sock_common'
   include/net/sock.h:233: warning: Function parameter or member 'skc_v6_rcv_saddr' not described in 'sock_common'
   include/net/sock.h:233: warning: Function parameter or member 'skc_cookie' not described in 'sock_common'
   include/net/sock.h:233: warning: Function parameter or member 'skc_listener' not described in 'sock_common'
   include/net/sock.h:233: warning: Function parameter or member 'skc_tw_dr' not described in 'sock_common'
   include/net/sock.h:233: warning: Function parameter or member 'skc_rcv_wnd' not described in 'sock_common'
   include/net/sock.h:233: warning: Function parameter or member 'skc_tw_rcv_nxt' not described in 'sock_common'
   include/net/sock.h:515: warning: Function parameter or member 'sk_rx_skb_cache' not described in 'sock'
   include/net/sock.h:515: warning: Function parameter or member 'sk_wq_raw' not described in 'sock'
   include/net/sock.h:515: warning: Function parameter or member 'tcp_rtx_queue' not described in 'sock'
   include/net/sock.h:515: warning: Function parameter or member 'sk_tx_skb_cache' not described in 'sock'
   include/net/sock.h:515: warning: Function parameter or member 'sk_route_forced_caps' not described in 'sock'
   include/net/sock.h:515: warning: Function parameter or member 'sk_txtime_report_errors' not described in 'sock'
   include/net/sock.h:515: warning: Function parameter or member 'sk_validate_xmit_skb' not described in 'sock'
   include/net/sock.h:515: warning: Function parameter or member 'sk_bpf_storage' not described in 'sock'
   include/net/sock.h:2439: warning: Function parameter or member 'tcp_rx_skb_cache_key' not described in 'DECLARE_STATIC_KEY_FALSE'
   include/net/sock.h:2439: warning: Excess function parameter 'sk' description in 'DECLARE_STATIC_KEY_FALSE'
   include/net/sock.h:2439: warning: Excess function parameter 'skb' description in 'DECLARE_STATIC_KEY_FALSE'
   include/linux/netdevice.h:2053: warning: Function parameter or member 'gso_partial_features' not described in 'net_device'
   include/linux/netdevice.h:2053: warning: Function parameter or member 'l3mdev_ops' not described in 'net_device'
   include/linux/netdevice.h:2053: warning: Function parameter or member 'xfrmdev_ops' not described in 'net_device'
   include/linux/netdevice.h:2053: warning: Function parameter or member 'tlsdev_ops' not described in 'net_device'
   include/linux/netdevice.h:2053: warning: Function parameter or member 'name_assign_type' not described in 'net_device'
   include/linux/netdevice.h:2053: warning: Function parameter or member 'ieee802154_ptr' not described in 'net_device'
   include/linux/netdevice.h:2053: warning: Function parameter or member 'mpls_ptr' not described in 'net_device'
   include/linux/netdevice.h:2053: warning: Function parameter or member 'xdp_prog' not described in 'net_device'
   include/linux/netdevice.h:2053: warning: Function parameter or member 'gro_flush_timeout' not described in 'net_device'
   include/linux/netdevice.h:2053: warning: Function parameter or member 'nf_hooks_ingress' not described in 'net_device'
   include/linux/netdevice.h:2053: warning: Function parameter or member '____cacheline_aligned_in_smp' not described in 'net_device'
   include/linux/netdevice.h:2053: warning: Function parameter or member 'qdisc_hash' not described in 'net_device'
   include/linux/netdevice.h:2053: warning: Function parameter or member 'xps_cpus_map' not described in 'net_device'
   include/linux/netdevice.h:2053: warning: Function parameter or member 'xps_rxqs_map' not described in 'net_device'
   include/linux/phylink.h:56: warning: Function parameter or member '__ETHTOOL_DECLARE_LINK_MODE_MASK(advertising' not described in 'phylink_link_state'
   include/linux/phylink.h:56: warning: Function parameter or member '__ETHTOOL_DECLARE_LINK_MODE_MASK(lp_advertising' not described in 'phylink_link_state'
   drivers/net/phy/phylink.c:595: warning: Function parameter or member 'config' not described in 'phylink_create'
   drivers/net/phy/phylink.c:595: warning: Excess function parameter 'ndev' description in 'phylink_create'
   lib/genalloc.c:1: warning: 'gen_pool_add_virt' not found
   lib/genalloc.c:1: warning: 'gen_pool_alloc' not found
   lib/genalloc.c:1: warning: 'gen_pool_free' not found
   lib/genalloc.c:1: warning: 'gen_pool_alloc_algo' not found
   include/linux/bitmap.h:341: warning: Function parameter or member 'nbits' not described in 'bitmap_or_equal'
   include/linux/rculist.h:374: warning: Excess function parameter 'cond' description in 'list_for_each_entry_rcu'
   include/linux/rculist.h:651: warning: Excess function parameter 'cond' description in 'hlist_for_each_entry_rcu'
   mm/util.c:1: warning: 'get_user_pages_fast' not found
   mm/slab.c:4215: warning: Function parameter or member 'objp' not described in '__ksize'
   drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c:335: warning: Excess function parameter 'dev' description in 'amdgpu_gem_prime_export'
   drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c:336: warning: Excess function parameter 'dev' description in 'amdgpu_gem_prime_export'
   drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c:142: warning: Function parameter or member 'blockable' not described in 'amdgpu_mn_read_lock'
   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:347: warning: cannot understand function prototype: 'struct amdgpu_vm_pt_cursor '
   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:348: warning: cannot understand function prototype: 'struct amdgpu_vm_pt_cursor '
   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:494: warning: Function parameter or member 'start' not described in 'amdgpu_vm_pt_first_dfs'
   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:546: warning: Function parameter or member 'adev' not described in 'for_each_amdgpu_vm_pt_dfs_safe'
   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:546: warning: Function parameter or member 'vm' not described in 'for_each_amdgpu_vm_pt_dfs_safe'
   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:546: warning: Function parameter or member 'start' not described in 'for_each_amdgpu_vm_pt_dfs_safe'
   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:546: warning: Function parameter or member 'cursor' not described in 'for_each_amdgpu_vm_pt_dfs_safe'
   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:546: warning: Function parameter or member 'entry' not described in 'for_each_amdgpu_vm_pt_dfs_safe'
   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:821: warning: Function parameter or member 'level' not described in 'amdgpu_vm_bo_param'
   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:1283: warning: Function parameter or member 'params' not described in 'amdgpu_vm_update_flags'
   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:1283: warning: Function parameter or member 'bo' not described in 'amdgpu_vm_update_flags'
   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:1283: warning: Function parameter or member 'level' not described in 'amdgpu_vm_update_flags'
   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:1283: warning: Function parameter or member 'pe' not described in 'amdgpu_vm_update_flags'
   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:1283: warning: Function parameter or member 'addr' not described in 'amdgpu_vm_update_flags'

vim +139 fs/anon_inodes.c

562787a5c32ccd Davide Libenzi    2009-09-22  119  
562787a5c32ccd Davide Libenzi    2009-09-22  120  /**
428e297f7ee416 Daniel Colascione 2019-10-12  121   * anon_inode_getfd2 - creates a new file instance by hooking it up to an
562787a5c32ccd Davide Libenzi    2009-09-22  122   *                     anonymous inode, and a dentry that describe the "class"
562787a5c32ccd Davide Libenzi    2009-09-22  123   *                     of the file
562787a5c32ccd Davide Libenzi    2009-09-22  124   *
562787a5c32ccd Davide Libenzi    2009-09-22  125   * @name:    [in]    name of the "class" of the new file
562787a5c32ccd Davide Libenzi    2009-09-22  126   * @fops:    [in]    file operations for the new file
562787a5c32ccd Davide Libenzi    2009-09-22  127   * @priv:    [in]    private data for the new file (will be file's private_data)
562787a5c32ccd Davide Libenzi    2009-09-22  128   * @flags:   [in]    flags
562787a5c32ccd Davide Libenzi    2009-09-22  129   *
562787a5c32ccd Davide Libenzi    2009-09-22  130   * Creates a new file by hooking it on a single inode. This is useful for files
562787a5c32ccd Davide Libenzi    2009-09-22  131   * that do not need to have a full-fledged inode in order to operate correctly.
562787a5c32ccd Davide Libenzi    2009-09-22  132   * All the files created with anon_inode_getfd() will share a single inode,
562787a5c32ccd Davide Libenzi    2009-09-22  133   * hence saving memory and avoiding code duplication for the file/inode/dentry
562787a5c32ccd Davide Libenzi    2009-09-22  134   * setup.  Returns new descriptor or an error code.
562787a5c32ccd Davide Libenzi    2009-09-22  135   */
428e297f7ee416 Daniel Colascione 2019-10-12  136  int anon_inode_getfd2(const char *name, const struct file_operations *fops,
428e297f7ee416 Daniel Colascione 2019-10-12  137  		      void *priv, int flags, int anon_inode_flags)
562787a5c32ccd Davide Libenzi    2009-09-22  138  {
562787a5c32ccd Davide Libenzi    2009-09-22 @139  	int error, fd;
562787a5c32ccd Davide Libenzi    2009-09-22  140  	struct file *file;
562787a5c32ccd Davide Libenzi    2009-09-22  141  
562787a5c32ccd Davide Libenzi    2009-09-22  142  	error = get_unused_fd_flags(flags);
562787a5c32ccd Davide Libenzi    2009-09-22  143  	if (error < 0)
562787a5c32ccd Davide Libenzi    2009-09-22  144  		return error;
562787a5c32ccd Davide Libenzi    2009-09-22  145  	fd = error;
562787a5c32ccd Davide Libenzi    2009-09-22  146  
428e297f7ee416 Daniel Colascione 2019-10-12  147  	file = anon_inode_getfile2(name, fops, priv, flags, anon_inode_flags);
562787a5c32ccd Davide Libenzi    2009-09-22  148  	if (IS_ERR(file)) {
562787a5c32ccd Davide Libenzi    2009-09-22  149  		error = PTR_ERR(file);
562787a5c32ccd Davide Libenzi    2009-09-22  150  		goto err_put_unused_fd;
562787a5c32ccd Davide Libenzi    2009-09-22  151  	}
5dc8bf8132d59c Davide Libenzi    2007-05-10  152  	fd_install(fd, file);
5dc8bf8132d59c Davide Libenzi    2007-05-10  153  
2030a42cecd4dd Al Viro           2008-02-23  154  	return fd;
5dc8bf8132d59c Davide Libenzi    2007-05-10  155  
5dc8bf8132d59c Davide Libenzi    2007-05-10  156  err_put_unused_fd:
5dc8bf8132d59c Davide Libenzi    2007-05-10  157  	put_unused_fd(fd);
5dc8bf8132d59c Davide Libenzi    2007-05-10  158  	return error;
5dc8bf8132d59c Davide Libenzi    2007-05-10  159  }
d6d281684913da Avi Kivity        2007-06-28  160  EXPORT_SYMBOL_GPL(anon_inode_getfd);
428e297f7ee416 Daniel Colascione 2019-10-12  161  EXPORT_SYMBOL_GPL(anon_inode_getfd2);
5dc8bf8132d59c Davide Libenzi    2007-05-10  162  

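The new warning should go away once the kernel-doc block gains a line for
the added parameter. A sketch of the amended comment follows, trimmed to the
parameter list; the description given for @anon_inode_flags is an assumption
and should be replaced by the author's own wording.

```c
struct file_operations;	/* forward declaration so the prototype stands alone */

/**
 * anon_inode_getfd2 - creates a new file instance by hooking it up to an
 *                     anonymous inode, and a dentry that describe the "class"
 *                     of the file
 *
 * @name:    [in]    name of the "class" of the new file
 * @fops:    [in]    file operations for the new file
 * @priv:    [in]    private data for the new file (will be file's private_data)
 * @flags:   [in]    flags
 * @anon_inode_flags: [in] flags for the anonymous inode (description assumed)
 */
int anon_inode_getfd2(const char *name, const struct file_operations *fops,
		      void *priv, int flags, int anon_inode_flags);
```

kernel-doc matches each @name line against the prototype, so every
parameter needs exactly one entry.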
:::::: The code at line 139 was first introduced by commit
:::::: 562787a5c32ccdf182de27793a83a9f2ee86cd77 anonfd: split interface into file creation and install

:::::: TO: Davide Libenzi <davidel@xmailserver.org>
:::::: CC: Linus Torvalds <torvalds@linux-foundation.org>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 7279 bytes --]


* [PATCH v7 0/3] add thermal/power management features for FPGA DFL drivers
From: Wu Hao @ 2019-10-14  5:42 UTC (permalink / raw)
  To: mdf, linux-fpga, linux-kernel
  Cc: linux-api, linux-hwmon, linux, jdelvare, gregkh, Wu Hao

Hi Moritz and all,

This patchset adds thermal and power management features for FPGA DFL
drivers. Both features use hwmon as the userspace interface.

This patchset is based on 5.4-rc3. Please help review it and share any
comments. Thank you very much!

Main changes from v6:
 - update kernel version and date in sysfs doc.

Main changes from v5:
 - rebase and clean up (remove empty uinit function) per changes in
   recently merged dfl patches.
 - update date in sysfs doc.

Main changes from v4:
 - rebase due to Documentation format change (dfl.txt -> rst).
 - clamp threshold inputs for sysfs interfaces. (patch#3)
 - update sysfs doc to add more description for ltr sysfs interfaces.
   (patch#3)

Main changes from v3:
 - use HWMON_CHANNEL_INFO.

Main changes from v2:
 - switch to standard hwmon APIs for thermal hwmon:
     temp1_alarm        --> temp1_max
     temp1_alarm_status --> temp1_max_alarm
     temp1_crit_status  --> temp1_crit_alarm
     temp1_alarm_policy --> temp1_max_policy
 - switch to standard hwmon APIs for power hwmon:
     power1_cap         --> power1_max
     power1_cap_status  --> power1_max_alarm
     power1_crit_status --> power1_crit_alarm

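For readers unfamiliar with the standard hwmon names listed above: the
generic hwmon sysfs convention reports temperatures such as temp1_max in
millidegrees Celsius. A minimal userspace sketch of reading one such
attribute follows. The demo_ helper names are hypothetical, and the
attribute path is left to the caller because it depends on the device.

```c
#include <stdio.h>

/* Parse a hwmon temperature value given in millidegrees Celsius. */
int demo_parse_millidegrees(const char *text, double *celsius)
{
	long raw;

	if (sscanf(text, "%ld", &raw) != 1)
		return -1;
	*celsius = raw / 1000.0;
	return 0;
}

/* Read one attribute file (for example a temp1_max file) and convert it. */
int demo_read_hwmon_temp(const char *path, double *celsius)
{
	char buf[32];
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return demo_parse_millidegrees(buf, celsius);
}
```

Using the standard attribute names means generic tools that already parse
hwmon sysfs files need no device-specific knowledge.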
Wu Hao (2):
  fpga: dfl: fme: add thermal management support
  fpga: dfl: fme: add power management support

Xu Yilun (1):
  Documentation: fpga: dfl: add descriptions for thermal/power
    management interfaces

 Documentation/ABI/testing/sysfs-platform-dfl-fme | 132 ++++++++
 Documentation/fpga/dfl.rst                       |  10 +
 drivers/fpga/Kconfig                             |   2 +-
 drivers/fpga/dfl-fme-main.c                      | 385 +++++++++++++++++++++++
 4 files changed, 528 insertions(+), 1 deletion(-)

-- 
1.8.3.1


* [PATCH v7 1/3] Documentation: fpga: dfl: add descriptions for thermal/power management interfaces
From: Wu Hao @ 2019-10-14  5:42 UTC (permalink / raw)
  To: mdf, linux-fpga, linux-kernel
  Cc: linux-api, linux-hwmon, linux, jdelvare, gregkh, Xu Yilun, Wu Hao
In-Reply-To: <1571031723-12101-1-git-send-email-hao.wu@intel.com>

From: Xu Yilun <yilun.xu@intel.com>

This patch adds an introduction to the thermal/power interfaces. They are
implemented as hwmon sysfs interfaces by the thermal/power private
feature drivers.
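
As an illustration only (not part of this patch), a userspace program could
read one of these hwmon attributes roughly as sketched below. The helper name
hwmon_read_long() is hypothetical; the caller is assumed to pass a path such
as /sys/bus/platform/devices/dfl-fme.0/hwmon/hwmonX/temp1_input, with hwmonX
resolved at run time.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: read a single integer value from a hwmon sysfs
 * attribute file. Returns 0 on success or a negative errno-style code.
 */
static int hwmon_read_long(const char *path, long *out)
{
	char buf[32];
	char *end;
	FILE *f = fopen(path, "r");

	if (!f)
		return -errno;

	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -EIO;
	}
	fclose(f);

	errno = 0;
	*out = strtol(buf, &end, 10);
	if (errno || end == buf)
		return -EINVAL;

	return 0;
}
```

For temp1_input the value read this way is in millidegrees Celsius, so a
caller would divide by 1000 to get degrees.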

Signed-off-by: Xu Yilun <yilun.xu@intel.com>
Signed-off-by: Wu Hao <hao.wu@intel.com>
Acked-by: Guenter Roeck <linux@roeck-us.net>
---
 Documentation/fpga/dfl.rst | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/Documentation/fpga/dfl.rst b/Documentation/fpga/dfl.rst
index 6fa483f..094fc8a 100644
--- a/Documentation/fpga/dfl.rst
+++ b/Documentation/fpga/dfl.rst
@@ -108,6 +108,16 @@ More functions are exposed through sysfs
      error reporting sysfs interfaces allow user to read errors detected by the
      hardware, and clear the logged errors.
 
+ Power management (dfl_fme_power hwmon)
+     power management hwmon sysfs interfaces allow user to read power management
+     information (power consumption, thresholds, threshold status, limits, etc.)
+     and configure power thresholds for different throttling levels.
+
+ Thermal management (dfl_fme_thermal hwmon)
+     thermal management hwmon sysfs interfaces allow user to read thermal
+     management information (current temperature, thresholds, threshold status,
+     etc.).
+
 
 FIU - PORT
 ==========
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH v7 2/3] fpga: dfl: fme: add thermal management support
From: Wu Hao @ 2019-10-14  5:42 UTC (permalink / raw)
  To: mdf, linux-fpga, linux-kernel
  Cc: linux-api, linux-hwmon, linux, jdelvare, gregkh, Wu Hao,
	Luwei Kang, Russ Weight, Xu Yilun
In-Reply-To: <1571031723-12101-1-git-send-email-hao.wu@intel.com>

This patch adds support for the thermal management private feature of the
DFL FPGA Management Engine (FME). This private feature driver registers
a hwmon device for temperature monitoring (hwmon temp1_input).
If the hardware supports automatic throttling, the driver also exposes
sysfs interfaces under hwmon for the thresholds (temp1_max / temp1_crit /
temp1_emergency), threshold alarms (temp1_max_alarm / temp1_crit_alarm)
and throttling policy (temp1_max_policy).
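
For readers following the register layout, here is a minimal userspace sketch
(not part of this patch) of how one 64-bit threshold register value maps to
the hwmon values above. The bit positions are copied from the #define lines in
this patch; MASK_ULL(), field_get(), struct thermal_info and
decode_thermal_threshold() are illustrative stand-ins, not kernel APIs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's GENMASK_ULL(h, l). */
#define MASK_ULL(h, l)	((~0ULL >> (63 - (h))) & (~0ULL << (l)))

/* Userspace stand-in for FIELD_GET(); mask must be non-zero. */
static uint64_t field_get(uint64_t mask, uint64_t reg)
{
	unsigned int shift = 0;

	/* find the lowest set bit of the mask */
	while (!((mask >> shift) & 1))
		shift++;

	return (reg & mask) >> shift;
}

/* Hypothetical container for the decoded values. */
struct thermal_info {
	long max_mc;		/* temp1_max, millidegrees Celsius */
	long crit_mc;		/* temp1_crit */
	long emergency_mc;	/* temp1_emergency */
	bool max_alarm;		/* temp1_max_alarm */
	bool crit_alarm;	/* temp1_crit_alarm */
	unsigned int policy;	/* temp1_max_policy: 0 - AP2, 1 - AP1 */
};

static struct thermal_info decode_thermal_threshold(uint64_t reg)
{
	struct thermal_info t;

	/* hardware reports whole degrees; hwmon expects millidegrees */
	t.max_mc = (long)(field_get(MASK_ULL(6, 0), reg) * 1000);
	t.crit_mc = (long)(field_get(MASK_ULL(14, 8), reg) * 1000);
	t.emergency_mc = (long)(field_get(MASK_ULL(30, 24), reg) * 1000);
	t.max_alarm = field_get(1ULL << 32, reg) != 0;
	t.crit_alarm = field_get(1ULL << 33, reg) != 0;
	t.policy = (unsigned int)field_get(1ULL << 44, reg);

	return t;
}
```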

Signed-off-by: Luwei Kang <luwei.kang@intel.com>
Signed-off-by: Russ Weight <russell.h.weight@intel.com>
Signed-off-by: Xu Yilun <yilun.xu@intel.com>
Signed-off-by: Wu Hao <hao.wu@intel.com>
Acked-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Moritz Fischer <mdf@kernel.org>
---
v2: create a dfl_fme_thermal hwmon to expose thermal information.
    move all sysfs interfaces under hwmon
	temperature       --> hwmon temp1_input
	threshold1        --> hwmon temp1_alarm
	threshold2        --> hwmon temp1_crit
	trip_threshold    --> hwmon temp1_emergency
	threshold1_status --> hwmon temp1_alarm_status
	threshold2_status --> hwmon temp1_crit_status
	threshold1_policy --> hwmon temp1_alarm_policy
v3: rename some hwmon sysfs interfaces to follow hwmon ABI.
	temp1_alarm        --> temp1_max
	temp1_alarm_status --> temp1_max_alarm
	temp1_crit_status  --> temp1_crit_alarm
	temp1_alarm_policy --> temp1_max_policy
    update sysfs doc for above sysfs interface changes.
    replace scnprintf with sprintf in sysfs interface.
v4: use HWMON_CHANNEL_INFO.
    rebase, and update date in sysfs doc.
v5: no change.
v6: rebased, and clean up (remove empty uinit function).
    update date in sysfs doc.
v7: update kernel version and date in sysfs doc.
---
 Documentation/ABI/testing/sysfs-platform-dfl-fme |  64 ++++++++
 drivers/fpga/Kconfig                             |   2 +-
 drivers/fpga/dfl-fme-main.c                      | 178 +++++++++++++++++++++++
 3 files changed, 243 insertions(+), 1 deletion(-)

diff --git a/Documentation/ABI/testing/sysfs-platform-dfl-fme b/Documentation/ABI/testing/sysfs-platform-dfl-fme
index 72634d3..8eb6d03 100644
--- a/Documentation/ABI/testing/sysfs-platform-dfl-fme
+++ b/Documentation/ABI/testing/sysfs-platform-dfl-fme
@@ -106,3 +106,67 @@ KernelVersion:  5.4
 Contact:	Wu Hao <hao.wu@intel.com>
 Description:	Read-only. Read this file to get the second error detected by
 		hardware.
+
+What:		/sys/bus/platform/devices/dfl-fme.0/hwmon/hwmonX/name
+Date:		October 2019
+KernelVersion:	5.5
+Contact:	Wu Hao <hao.wu@intel.com>
+Description:	Read-Only. Read this file to get the name of hwmon device, it
+		supports values:
+		    'dfl_fme_thermal' - thermal hwmon device name
+
+What:		/sys/bus/platform/devices/dfl-fme.0/hwmon/hwmonX/temp1_input
+Date:		October 2019
+KernelVersion:	5.5
+Contact:	Wu Hao <hao.wu@intel.com>
+Description:	Read-Only. It returns FPGA device temperature in millidegrees
+		Celsius.
+
+What:		/sys/bus/platform/devices/dfl-fme.0/hwmon/hwmonX/temp1_max
+Date:		October 2019
+KernelVersion:	5.5
+Contact:	Wu Hao <hao.wu@intel.com>
+Description:	Read-Only. It returns hardware threshold1 temperature in
+		millidegrees Celsius. If temperature rises at or above this
+		threshold, hardware starts 50% or 90% throttling (see
+		'temp1_max_policy').
+
+What:		/sys/bus/platform/devices/dfl-fme.0/hwmon/hwmonX/temp1_crit
+Date:		October 2019
+KernelVersion:	5.5
+Contact:	Wu Hao <hao.wu@intel.com>
+Description:	Read-Only. It returns hardware threshold2 temperature in
+		millidegrees Celsius. If temperature rises at or above this
+		threshold, hardware starts 100% throttling.
+
+What:		/sys/bus/platform/devices/dfl-fme.0/hwmon/hwmonX/temp1_emergency
+Date:		October 2019
+KernelVersion:	5.5
+Contact:	Wu Hao <hao.wu@intel.com>
+Description:	Read-Only. It returns hardware trip threshold temperature in
+		millidegrees Celsius. If temperature rises at or above this
+		threshold, a fatal event will be triggered to board management
+		controller (BMC) to shutdown FPGA.
+
+What:		/sys/bus/platform/devices/dfl-fme.0/hwmon/hwmonX/temp1_max_alarm
+Date:		October 2019
+KernelVersion:	5.5
+Contact:	Wu Hao <hao.wu@intel.com>
+Description:	Read-only. It returns 1 if temperature is currently at or above
+		hardware threshold1 (see 'temp1_max'), otherwise 0.
+
+What:		/sys/bus/platform/devices/dfl-fme.0/hwmon/hwmonX/temp1_crit_alarm
+Date:		October 2019
+KernelVersion:	5.5
+Contact:	Wu Hao <hao.wu@intel.com>
+Description:	Read-only. It returns 1 if temperature is currently at or above
+		hardware threshold2 (see 'temp1_crit'), otherwise 0.
+
+What:		/sys/bus/platform/devices/dfl-fme.0/hwmon/hwmonX/temp1_max_policy
+Date:		October 2019
+KernelVersion:	5.5
+Contact:	Wu Hao <hao.wu@intel.com>
+Description:	Read-Only. Read this file to get the policy of hardware threshold1
+		(see 'temp1_max'). It only supports two values (policies):
+		    0 - AP2 state (90% throttling)
+		    1 - AP1 state (50% throttling)
diff --git a/drivers/fpga/Kconfig b/drivers/fpga/Kconfig
index 73c779e..72380e1 100644
--- a/drivers/fpga/Kconfig
+++ b/drivers/fpga/Kconfig
@@ -156,7 +156,7 @@ config FPGA_DFL
 
 config FPGA_DFL_FME
 	tristate "FPGA DFL FME Driver"
-	depends on FPGA_DFL
+	depends on FPGA_DFL && HWMON
 	help
 	  The FPGA Management Engine (FME) is a feature device implemented
 	  under Device Feature List (DFL) framework. Select this option to
diff --git a/drivers/fpga/dfl-fme-main.c b/drivers/fpga/dfl-fme-main.c
index 4d78e18..752d71c 100644
--- a/drivers/fpga/dfl-fme-main.c
+++ b/drivers/fpga/dfl-fme-main.c
@@ -14,6 +14,8 @@
  *   Henry Mitchel <henry.mitchel@intel.com>
  */
 
+#include <linux/hwmon.h>
+#include <linux/hwmon-sysfs.h>
 #include <linux/kernel.h>
 #include <linux/module.h>
 #include <linux/uaccess.h>
@@ -181,6 +183,178 @@ static long fme_hdr_ioctl(struct platform_device *pdev,
 	.ioctl = fme_hdr_ioctl,
 };
 
+#define FME_THERM_THRESHOLD	0x8
+#define TEMP_THRESHOLD1		GENMASK_ULL(6, 0)
+#define TEMP_THRESHOLD1_EN	BIT_ULL(7)
+#define TEMP_THRESHOLD2		GENMASK_ULL(14, 8)
+#define TEMP_THRESHOLD2_EN	BIT_ULL(15)
+#define TRIP_THRESHOLD		GENMASK_ULL(30, 24)
+#define TEMP_THRESHOLD1_STATUS	BIT_ULL(32)		/* threshold1 reached */
+#define TEMP_THRESHOLD2_STATUS	BIT_ULL(33)		/* threshold2 reached */
+/* threshold1 policy: 0 - AP2 (90% throttle) / 1 - AP1 (50% throttle) */
+#define TEMP_THRESHOLD1_POLICY	BIT_ULL(44)
+
+#define FME_THERM_RDSENSOR_FMT1	0x10
+#define FPGA_TEMPERATURE	GENMASK_ULL(6, 0)
+
+#define FME_THERM_CAP		0x20
+#define THERM_NO_THROTTLE	BIT_ULL(0)
+
+#define MD_PRE_DEG
+
+static bool fme_thermal_throttle_support(void __iomem *base)
+{
+	u64 v = readq(base + FME_THERM_CAP);
+
+	return FIELD_GET(THERM_NO_THROTTLE, v) ? false : true;
+}
+
+static umode_t thermal_hwmon_attrs_visible(const void *drvdata,
+					   enum hwmon_sensor_types type,
+					   u32 attr, int channel)
+{
+	const struct dfl_feature *feature = drvdata;
+
+	/* temperature is always supported, and check hardware cap for others */
+	if (attr == hwmon_temp_input)
+		return 0444;
+
+	return fme_thermal_throttle_support(feature->ioaddr) ? 0444 : 0;
+}
+
+static int thermal_hwmon_read(struct device *dev, enum hwmon_sensor_types type,
+			      u32 attr, int channel, long *val)
+{
+	struct dfl_feature *feature = dev_get_drvdata(dev);
+	u64 v;
+
+	switch (attr) {
+	case hwmon_temp_input:
+		v = readq(feature->ioaddr + FME_THERM_RDSENSOR_FMT1);
+		*val = (long)(FIELD_GET(FPGA_TEMPERATURE, v) * 1000);
+		break;
+	case hwmon_temp_max:
+		v = readq(feature->ioaddr + FME_THERM_THRESHOLD);
+		*val = (long)(FIELD_GET(TEMP_THRESHOLD1, v) * 1000);
+		break;
+	case hwmon_temp_crit:
+		v = readq(feature->ioaddr + FME_THERM_THRESHOLD);
+		*val = (long)(FIELD_GET(TEMP_THRESHOLD2, v) * 1000);
+		break;
+	case hwmon_temp_emergency:
+		v = readq(feature->ioaddr + FME_THERM_THRESHOLD);
+		*val = (long)(FIELD_GET(TRIP_THRESHOLD, v) * 1000);
+		break;
+	case hwmon_temp_max_alarm:
+		v = readq(feature->ioaddr + FME_THERM_THRESHOLD);
+		*val = (long)FIELD_GET(TEMP_THRESHOLD1_STATUS, v);
+		break;
+	case hwmon_temp_crit_alarm:
+		v = readq(feature->ioaddr + FME_THERM_THRESHOLD);
+		*val = (long)FIELD_GET(TEMP_THRESHOLD2_STATUS, v);
+		break;
+	default:
+		return -EOPNOTSUPP;
+	}
+
+	return 0;
+}
+
+static const struct hwmon_ops thermal_hwmon_ops = {
+	.is_visible = thermal_hwmon_attrs_visible,
+	.read = thermal_hwmon_read,
+};
+
+static const struct hwmon_channel_info *thermal_hwmon_info[] = {
+	HWMON_CHANNEL_INFO(temp, HWMON_T_INPUT | HWMON_T_EMERGENCY |
+				 HWMON_T_MAX   | HWMON_T_MAX_ALARM |
+				 HWMON_T_CRIT  | HWMON_T_CRIT_ALARM),
+	NULL
+};
+
+static const struct hwmon_chip_info thermal_hwmon_chip_info = {
+	.ops = &thermal_hwmon_ops,
+	.info = thermal_hwmon_info,
+};
+
+static ssize_t temp1_max_policy_show(struct device *dev,
+				     struct device_attribute *attr, char *buf)
+{
+	struct dfl_feature *feature = dev_get_drvdata(dev);
+	u64 v;
+
+	v = readq(feature->ioaddr + FME_THERM_THRESHOLD);
+
+	return sprintf(buf, "%u\n",
+		       (unsigned int)FIELD_GET(TEMP_THRESHOLD1_POLICY, v));
+}
+
+static DEVICE_ATTR_RO(temp1_max_policy);
+
+static struct attribute *thermal_extra_attrs[] = {
+	&dev_attr_temp1_max_policy.attr,
+	NULL,
+};
+
+static umode_t thermal_extra_attrs_visible(struct kobject *kobj,
+					   struct attribute *attr, int index)
+{
+	struct device *dev = kobj_to_dev(kobj);
+	struct dfl_feature *feature = dev_get_drvdata(dev);
+
+	return fme_thermal_throttle_support(feature->ioaddr) ? attr->mode : 0;
+}
+
+static const struct attribute_group thermal_extra_group = {
+	.attrs		= thermal_extra_attrs,
+	.is_visible	= thermal_extra_attrs_visible,
+};
+__ATTRIBUTE_GROUPS(thermal_extra);
+
+static int fme_thermal_mgmt_init(struct platform_device *pdev,
+				 struct dfl_feature *feature)
+{
+	struct device *hwmon;
+
+	/*
+	 * create hwmon to allow userspace monitoring temperature and other
+	 * threshold information.
+	 *
+	 * temp1_input      -> FPGA device temperature
+	 * temp1_max        -> hardware threshold 1 -> 50% or 90% throttling
+	 * temp1_crit       -> hardware threshold 2 -> 100% throttling
+	 * temp1_emergency  -> hardware trip_threshold to shutdown FPGA
+	 * temp1_max_alarm  -> hardware threshold 1 alarm
+	 * temp1_crit_alarm -> hardware threshold 2 alarm
+	 *
+	 * create device specific sysfs interfaces, e.g. read temp1_max_policy
+	 * to understand the actual hardware throttling action (50% vs 90%).
+	 *
+	 * If hardware doesn't support automatic throttling per thresholds,
+	 * then all above sysfs interfaces are not visible except temp1_input
+	 * for temperature.
+	 */
+	hwmon = devm_hwmon_device_register_with_info(&pdev->dev,
+						     "dfl_fme_thermal", feature,
+						     &thermal_hwmon_chip_info,
+						     thermal_extra_groups);
+	if (IS_ERR(hwmon)) {
+		dev_err(&pdev->dev, "Fail to register thermal hwmon\n");
+		return PTR_ERR(hwmon);
+	}
+
+	return 0;
+}
+
+static const struct dfl_feature_id fme_thermal_mgmt_id_table[] = {
+	{.id = FME_FEATURE_ID_THERMAL_MGMT,},
+	{0,}
+};
+
+static const struct dfl_feature_ops fme_thermal_mgmt_ops = {
+	.init = fme_thermal_mgmt_init,
+};
+
 static struct dfl_feature_driver fme_feature_drvs[] = {
 	{
 		.id_table = fme_hdr_id_table,
@@ -195,6 +369,10 @@ static long fme_hdr_ioctl(struct platform_device *pdev,
 		.ops = &fme_global_err_ops,
 	},
 	{
+		.id_table = fme_thermal_mgmt_id_table,
+		.ops = &fme_thermal_mgmt_ops,
+	},
+	{
 		.ops = NULL,
 	},
 };
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH v7 3/3] fpga: dfl: fme: add power management support
From: Wu Hao @ 2019-10-14  5:42 UTC (permalink / raw)
  To: mdf, linux-fpga, linux-kernel
  Cc: linux-api, linux-hwmon, linux, jdelvare, gregkh, Wu Hao,
	Luwei Kang, Xu Yilun
In-Reply-To: <1571031723-12101-1-git-send-email-hao.wu@intel.com>

This patch adds support for the power management private feature under
the FPGA Management Engine (FME). This private feature driver registers
a hwmon device that exposes power consumption (power1_input), threshold
information (power1_max / power1_crit / power1_max_alarm /
power1_crit_alarm) and read-only sysfs interfaces for other power
management information. For configuration, the user can write threshold
values via the power1_max / power1_crit sysfs interfaces under hwmon.
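
To make the write semantics concrete, the following userspace sketch (not part
of this patch) mirrors the conversion described in the sysfs doc: a value
written in uW is truncated to whole Watts, clamped to 0..127, and placed in
the 7-bit threshold field. The mask values follow the PWR_THRESHOLD1 and
PWR_THRESHOLD2 bit ranges in this patch; the function names are hypothetical.

```c
#include <stdint.h>

#define PWR_THRESHOLD_MAX_W	0x7f		/* in Watts */
#define PWR_THRESHOLD1_MASK	0x7fULL		/* bits 6:0 */
#define PWR_THRESHOLD2_MASK	0x7f00ULL	/* bits 14:8 */

/* Convert a uW input to the whole-Watt value the hardware accepts. */
static long uw_to_threshold_watts(long uw)
{
	long w = uw / 1000000;	/* sub-Watt part is discarded */

	if (w < 0)
		return 0;
	if (w > PWR_THRESHOLD_MAX_W)
		return PWR_THRESHOLD_MAX_W;
	return w;
}

/* Read-modify-write of threshold1, leaving the other bits untouched. */
static uint64_t set_threshold1(uint64_t reg, long uw)
{
	reg &= ~PWR_THRESHOLD1_MASK;
	reg |= (uint64_t)uw_to_threshold_watts(uw) & PWR_THRESHOLD1_MASK;
	return reg;
}

/* Same for threshold2, which sits 8 bits higher in the register. */
static uint64_t set_threshold2(uint64_t reg, long uw)
{
	reg &= ~PWR_THRESHOLD2_MASK;
	reg |= ((uint64_t)uw_to_threshold_watts(uw) << 8) & PWR_THRESHOLD2_MASK;
	return reg;
}
```

In the driver itself the read-modify-write is done under pdata->lock, which
this sketch leaves out.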

Signed-off-by: Luwei Kang <luwei.kang@intel.com>
Signed-off-by: Xu Yilun <yilun.xu@intel.com>
Signed-off-by: Wu Hao <hao.wu@intel.com>
Acked-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Moritz Fischer <mdf@kernel.org>
---
v2: create a dfl_fme_power hwmon to expose power sysfs interfaces.
    move all sysfs interfaces under hwmon
        consumed          --> hwmon power1_input
        threshold1        --> hwmon power1_cap
        threshold2        --> hwmon power1_crit
        threshold1_status --> hwmon power1_cap_status
        threshold2_status --> hwmon power1_crit_status
        xeon_limit        --> hwmon power1_xeon_limit
        fpga_limit        --> hwmon power1_fpga_limit
        ltr               --> hwmon power1_ltr
v3: rename some hwmon sysfs interfaces to follow hwmon ABI.
	power1_cap         --> power1_max
	power1_cap_status  --> power1_max_alarm
	power1_crit_status --> power1_crit_alarm
    update sysfs doc for above sysfs interface changes.
    replace scnprintf with sprintf in sysfs interface.
v4: use HWMON_CHANNEL_INFO.
    update date in sysfs doc.
v5: clamp threshold inputs in power_hwmon_write function.
    update sysfs doc as threshold inputs are clamped now.
    add more descriptions to ltr sysfs interface.
v6: rebase and clean up (remove empty uinit function).
    update date in sysfs doc.
v7: update kernel version and date in sysfs doc.
---
 Documentation/ABI/testing/sysfs-platform-dfl-fme |  68 ++++++++
 drivers/fpga/dfl-fme-main.c                      | 207 +++++++++++++++++++++++
 2 files changed, 275 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-platform-dfl-fme b/Documentation/ABI/testing/sysfs-platform-dfl-fme
index 8eb6d03..3683cb1c 100644
--- a/Documentation/ABI/testing/sysfs-platform-dfl-fme
+++ b/Documentation/ABI/testing/sysfs-platform-dfl-fme
@@ -114,6 +114,7 @@ Contact:	Wu Hao <hao.wu@intel.com>
 Description:	Read-Only. Read this file to get the name of hwmon device, it
 		supports values:
 		    'dfl_fme_thermal' - thermal hwmon device name
+		    'dfl_fme_power'   - power hwmon device name
 
 What:		/sys/bus/platform/devices/dfl-fme.0/hwmon/hwmonX/temp1_input
 Date:		October 2019
@@ -170,3 +171,70 @@ Description:	Read-Only. Read this file to get the policy of hardware threshold1
 		(see 'temp1_max'). It only supports two values (policies):
 		    0 - AP2 state (90% throttling)
 		    1 - AP1 state (50% throttling)
+
+What:		/sys/bus/platform/devices/dfl-fme.0/hwmon/hwmonX/power1_input
+Date:		October 2019
+KernelVersion:	5.5
+Contact:	Wu Hao <hao.wu@intel.com>
+Description:	Read-Only. It returns current FPGA power consumption in uW.
+
+What:		/sys/bus/platform/devices/dfl-fme.0/hwmon/hwmonX/power1_max
+Date:		October 2019
+KernelVersion:	5.5
+Contact:	Wu Hao <hao.wu@intel.com>
+Description:	Read-Write. Read this file to get current hardware power
+		threshold1 in uW. If power consumption rises at or above
+		this threshold, hardware starts 50% throttling.
+		Write this file to set current hardware power threshold1 in uW.
+		As hardware only accepts values in Watts, the input value is
+		rounded down to whole Watts (the sub-Watt part is discarded)
+		and clamped to the range from 0 to 127 Watts. Write fails with
+		-EINVAL if input parsing fails.
+
+What:		/sys/bus/platform/devices/dfl-fme.0/hwmon/hwmonX/power1_crit
+Date:		October 2019
+KernelVersion:	5.5
+Contact:	Wu Hao <hao.wu@intel.com>
+Description:	Read-Write. Read this file to get current hardware power
+		threshold2 in uW. If power consumption rises at or above
+		this threshold, hardware starts 90% throttling.
+		Write this file to set current hardware power threshold2 in uW.
+		As hardware only accepts values in Watts, the input value is
+		rounded down to whole Watts (the sub-Watt part is discarded)
+		and clamped to the range from 0 to 127 Watts. Write fails with
+		-EINVAL if input parsing fails.
+
+What:		/sys/bus/platform/devices/dfl-fme.0/hwmon/hwmonX/power1_max_alarm
+Date:		October 2019
+KernelVersion:	5.5
+Contact:	Wu Hao <hao.wu@intel.com>
+Description:	Read-only. It returns 1 if power consumption is currently at or
+		above hardware threshold1 (see 'power1_max'), otherwise 0.
+
+What:		/sys/bus/platform/devices/dfl-fme.0/hwmon/hwmonX/power1_crit_alarm
+Date:		October 2019
+KernelVersion:	5.5
+Contact:	Wu Hao <hao.wu@intel.com>
+Description:	Read-only. It returns 1 if power consumption is currently at or
+		above hardware threshold2 (see 'power1_crit'), otherwise 0.
+
+What:		/sys/bus/platform/devices/dfl-fme.0/hwmon/hwmonX/power1_xeon_limit
+Date:		October 2019
+KernelVersion:	5.5
+Contact:	Wu Hao <hao.wu@intel.com>
+Description:	Read-Only. It returns power limit for XEON in uW.
+
+What:		/sys/bus/platform/devices/dfl-fme.0/hwmon/hwmonX/power1_fpga_limit
+Date:		October 2019
+KernelVersion:	5.5
+Contact:	Wu Hao <hao.wu@intel.com>
+Description:	Read-Only. It returns power limit for FPGA in uW.
+
+What:		/sys/bus/platform/devices/dfl-fme.0/hwmon/hwmonX/power1_ltr
+Date:		October 2019
+KernelVersion:	5.5
+Contact:	Wu Hao <hao.wu@intel.com>
+Description:	Read-only. Read this file to get current Latency Tolerance
+		Reporting (ltr) value. It returns 1 if all Accelerated
+		Function Units (AFUs) can tolerate latency >= 40us for memory
+		access or 0 if any AFU is latency sensitive (< 40us).
diff --git a/drivers/fpga/dfl-fme-main.c b/drivers/fpga/dfl-fme-main.c
index 752d71c..7c930e6 100644
--- a/drivers/fpga/dfl-fme-main.c
+++ b/drivers/fpga/dfl-fme-main.c
@@ -355,6 +355,209 @@ static int fme_thermal_mgmt_init(struct platform_device *pdev,
 	.init = fme_thermal_mgmt_init,
 };
 
+#define FME_PWR_STATUS		0x8
+#define FME_LATENCY_TOLERANCE	BIT_ULL(18)
+#define PWR_CONSUMED		GENMASK_ULL(17, 0)
+
+#define FME_PWR_THRESHOLD	0x10
+#define PWR_THRESHOLD1		GENMASK_ULL(6, 0)	/* in Watts */
+#define PWR_THRESHOLD2		GENMASK_ULL(14, 8)	/* in Watts */
+#define PWR_THRESHOLD_MAX	0x7f			/* in Watts */
+#define PWR_THRESHOLD1_STATUS	BIT_ULL(16)
+#define PWR_THRESHOLD2_STATUS	BIT_ULL(17)
+
+#define FME_PWR_XEON_LIMIT	0x18
+#define XEON_PWR_LIMIT		GENMASK_ULL(14, 0)	/* in 0.1 Watts */
+#define XEON_PWR_EN		BIT_ULL(15)
+#define FME_PWR_FPGA_LIMIT	0x20
+#define FPGA_PWR_LIMIT		GENMASK_ULL(14, 0)	/* in 0.1 Watts */
+#define FPGA_PWR_EN		BIT_ULL(15)
+
+static int power_hwmon_read(struct device *dev, enum hwmon_sensor_types type,
+			    u32 attr, int channel, long *val)
+{
+	struct dfl_feature *feature = dev_get_drvdata(dev);
+	u64 v;
+
+	switch (attr) {
+	case hwmon_power_input:
+		v = readq(feature->ioaddr + FME_PWR_STATUS);
+		*val = (long)(FIELD_GET(PWR_CONSUMED, v) * 1000000);
+		break;
+	case hwmon_power_max:
+		v = readq(feature->ioaddr + FME_PWR_THRESHOLD);
+		*val = (long)(FIELD_GET(PWR_THRESHOLD1, v) * 1000000);
+		break;
+	case hwmon_power_crit:
+		v = readq(feature->ioaddr + FME_PWR_THRESHOLD);
+		*val = (long)(FIELD_GET(PWR_THRESHOLD2, v) * 1000000);
+		break;
+	case hwmon_power_max_alarm:
+		v = readq(feature->ioaddr + FME_PWR_THRESHOLD);
+		*val = (long)FIELD_GET(PWR_THRESHOLD1_STATUS, v);
+		break;
+	case hwmon_power_crit_alarm:
+		v = readq(feature->ioaddr + FME_PWR_THRESHOLD);
+		*val = (long)FIELD_GET(PWR_THRESHOLD2_STATUS, v);
+		break;
+	default:
+		return -EOPNOTSUPP;
+	}
+
+	return 0;
+}
+
+static int power_hwmon_write(struct device *dev, enum hwmon_sensor_types type,
+			     u32 attr, int channel, long val)
+{
+	struct dfl_feature_platform_data *pdata = dev_get_platdata(dev->parent);
+	struct dfl_feature *feature = dev_get_drvdata(dev);
+	int ret = 0;
+	u64 v;
+
+	val = clamp_val(val / 1000000, 0, PWR_THRESHOLD_MAX);
+
+	mutex_lock(&pdata->lock);
+
+	switch (attr) {
+	case hwmon_power_max:
+		v = readq(feature->ioaddr + FME_PWR_THRESHOLD);
+		v &= ~PWR_THRESHOLD1;
+		v |= FIELD_PREP(PWR_THRESHOLD1, val);
+		writeq(v, feature->ioaddr + FME_PWR_THRESHOLD);
+		break;
+	case hwmon_power_crit:
+		v = readq(feature->ioaddr + FME_PWR_THRESHOLD);
+		v &= ~PWR_THRESHOLD2;
+		v |= FIELD_PREP(PWR_THRESHOLD2, val);
+		writeq(v, feature->ioaddr + FME_PWR_THRESHOLD);
+		break;
+	default:
+		ret = -EOPNOTSUPP;
+		break;
+	}
+
+	mutex_unlock(&pdata->lock);
+
+	return ret;
+}
+
+static umode_t power_hwmon_attrs_visible(const void *drvdata,
+					 enum hwmon_sensor_types type,
+					 u32 attr, int channel)
+{
+	switch (attr) {
+	case hwmon_power_input:
+	case hwmon_power_max_alarm:
+	case hwmon_power_crit_alarm:
+		return 0444;
+	case hwmon_power_max:
+	case hwmon_power_crit:
+		return 0644;
+	}
+
+	return 0;
+}
+
+static const struct hwmon_ops power_hwmon_ops = {
+	.is_visible = power_hwmon_attrs_visible,
+	.read = power_hwmon_read,
+	.write = power_hwmon_write,
+};
+
+static const struct hwmon_channel_info *power_hwmon_info[] = {
+	HWMON_CHANNEL_INFO(power, HWMON_P_INPUT |
+				  HWMON_P_MAX   | HWMON_P_MAX_ALARM |
+				  HWMON_P_CRIT  | HWMON_P_CRIT_ALARM),
+	NULL
+};
+
+static const struct hwmon_chip_info power_hwmon_chip_info = {
+	.ops = &power_hwmon_ops,
+	.info = power_hwmon_info,
+};
+
+static ssize_t power1_xeon_limit_show(struct device *dev,
+				      struct device_attribute *attr, char *buf)
+{
+	struct dfl_feature *feature = dev_get_drvdata(dev);
+	u16 xeon_limit = 0;
+	u64 v;
+
+	v = readq(feature->ioaddr + FME_PWR_XEON_LIMIT);
+
+	if (FIELD_GET(XEON_PWR_EN, v))
+		xeon_limit = FIELD_GET(XEON_PWR_LIMIT, v);
+
+	return sprintf(buf, "%u\n", xeon_limit * 100000);
+}
+
+static ssize_t power1_fpga_limit_show(struct device *dev,
+				      struct device_attribute *attr, char *buf)
+{
+	struct dfl_feature *feature = dev_get_drvdata(dev);
+	u16 fpga_limit = 0;
+	u64 v;
+
+	v = readq(feature->ioaddr + FME_PWR_FPGA_LIMIT);
+
+	if (FIELD_GET(FPGA_PWR_EN, v))
+		fpga_limit = FIELD_GET(FPGA_PWR_LIMIT, v);
+
+	return sprintf(buf, "%u\n", fpga_limit * 100000);
+}
+
+static ssize_t power1_ltr_show(struct device *dev,
+			       struct device_attribute *attr, char *buf)
+{
+	struct dfl_feature *feature = dev_get_drvdata(dev);
+	u64 v;
+
+	v = readq(feature->ioaddr + FME_PWR_STATUS);
+
+	return sprintf(buf, "%u\n",
+		       (unsigned int)FIELD_GET(FME_LATENCY_TOLERANCE, v));
+}
+
+static DEVICE_ATTR_RO(power1_xeon_limit);
+static DEVICE_ATTR_RO(power1_fpga_limit);
+static DEVICE_ATTR_RO(power1_ltr);
+
+static struct attribute *power_extra_attrs[] = {
+	&dev_attr_power1_xeon_limit.attr,
+	&dev_attr_power1_fpga_limit.attr,
+	&dev_attr_power1_ltr.attr,
+	NULL
+};
+
+ATTRIBUTE_GROUPS(power_extra);
+
+static int fme_power_mgmt_init(struct platform_device *pdev,
+			       struct dfl_feature *feature)
+{
+	struct device *hwmon;
+
+	hwmon = devm_hwmon_device_register_with_info(&pdev->dev,
+						     "dfl_fme_power", feature,
+						     &power_hwmon_chip_info,
+						     power_extra_groups);
+	if (IS_ERR(hwmon)) {
+		dev_err(&pdev->dev, "Fail to register power hwmon\n");
+		return PTR_ERR(hwmon);
+	}
+
+	return 0;
+}
+
+static const struct dfl_feature_id fme_power_mgmt_id_table[] = {
+	{.id = FME_FEATURE_ID_POWER_MGMT,},
+	{0,}
+};
+
+static const struct dfl_feature_ops fme_power_mgmt_ops = {
+	.init = fme_power_mgmt_init,
+};
+
 static struct dfl_feature_driver fme_feature_drvs[] = {
 	{
 		.id_table = fme_hdr_id_table,
@@ -373,6 +576,10 @@ static int fme_thermal_mgmt_init(struct platform_device *pdev,
 		.ops = &fme_thermal_mgmt_ops,
 	},
 	{
+		.id_table = fme_power_mgmt_id_table,
+		.ops = &fme_power_mgmt_ops,
+	},
+	{
 		.ops = NULL,
 	},
 };
-- 
1.8.3.1

^ permalink raw reply related

* Re: [PATCH v6 2/3] fpga: dfl: fme: add thermal management support
From: Wu, Hao @ 2019-10-14  6:12 UTC (permalink / raw)
  To: mdf@kernel.org, linux-fpga@vger.kernel.org,
	linux-kernel@vger.kernel.org
  Cc: linux-api@vger.kernel.org, linux-hwmon@vger.kernel.org,
	linux@roeck-us.net, jdelvare@suse.com, gregkh@linuxfoundation.org,
	Kang, Luwei, Weight, Russell H, Xu, Yilun
In-Reply-To: <1568094640-4920-3-git-send-email-hao.wu@intel.com>

Please ignore this one, sent by mistake. Sorry.
Latest version is v7 here: 
    https://lkml.org/lkml/2019/10/14/32

Hao

> -----Original Message-----
> Subject: [PATCH v6 2/3] fpga: dfl: fme: add thermal management support
> 
> This patch adds support to thermal management private feature for DFL
> FPGA Management Engine (FME). This private feature driver registers
> a hwmon for thermal/temperature monitoring (hwmon temp1_input).
> If hardware automatic throttling is supported by this hardware, then
> driver also exposes sysfs interfaces under hwmon for thresholds
> (temp1_max/ crit/ emergency), threshold alarms (temp1_max_alarm/
> temp1_crit_alarm) and throttling policy (temp1_max_policy).
> 
> Signed-off-by: Wu Hao <hao.wu@intel.com>
> ---
> v2: create a dfl_fme_thermal hwmon to expose thermal information.
>     move all sysfs interfaces under hwmon
> 	tempareture       --> hwmon temp1_input
> 	threshold1        --> hwmon temp1_alarm
> 	threshold2        --> hwmon temp1_crit
> 	trip_threshold    --> hwmon temp1_emergency
> 	threshold1_status --> hwmon temp1_alarm_status
> 	threshold2_status --> hwmon temp1_crit_status
> 	threshold1_policy --> hwmon temp1_alarm_policy
> v3: rename some hwmon sysfs interfaces to follow hwmon ABI.
> 	temp1_alarm        --> temp1_max
> 	temp1_alarm_status --> temp1_max_alarm
> 	temp1_crit_status  --> temp1_crit_alarm
> 	temp1_alarm_policy --> temp1_max_policy
>     update sysfs doc for above sysfs interface changes.
>     replace scnprintf with sprintf in sysfs interface.
> v4: use HWMON_CHANNEL_INFO.
>     rebase, and update date in sysfs doc.
> v5: no change.
> v6: rebased, and clean up (remove empty uinit function).
>     update date in sysfs doc.
> ---
>  Documentation/ABI/testing/sysfs-platform-dfl-fme |  64 ++++++++
>  drivers/fpga/Kconfig                             |   2 +-
>  drivers/fpga/dfl-fme-main.c                      | 178
> +++++++++++++++++++++++
>  3 files changed, 243 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/ABI/testing/sysfs-platform-dfl-fme
> b/Documentation/ABI/testing/sysfs-platform-dfl-fme
> index 72634d3..c84b3c1 100644
> --- a/Documentation/ABI/testing/sysfs-platform-dfl-fme
> +++ b/Documentation/ABI/testing/sysfs-platform-dfl-fme
> @@ -106,3 +106,67 @@ KernelVersion:  5.4
>  Contact:	Wu Hao <hao.wu@intel.com>
>  Description:	Read-only. Read this file to get the second error detected by
>  		hardware.
> +
> +What:		/sys/bus/platform/devices/dfl-
> fme.0/hwmon/hwmonX/name
> +Date:		September 2019
> +KernelVersion:	5.4
> +Contact:	Wu Hao <hao.wu@intel.com>
> +Description:	Read-Only. Read this file to get the name of hwmon device, it
> +		supports values:
> +		    'dfl_fme_thermal' - thermal hwmon device name
> +
> +What:		/sys/bus/platform/devices/dfl-
> fme.0/hwmon/hwmonX/temp1_input
> +Date:		September 2019
> +KernelVersion:	5.4
> +Contact:	Wu Hao <hao.wu@intel.com>
> +Description:	Read-Only. It returns FPGA device temperature in millidegrees
> +		Celsius.
> +
> +What:		/sys/bus/platform/devices/dfl-
> fme.0/hwmon/hwmonX/temp1_max
> +Date:		September 2019
> +KernelVersion:	5.4
> +Contact:	Wu Hao <hao.wu@intel.com>
> +Description:	Read-Only. It returns hardware threshold1 temperature in
> +		millidegrees Celsius. If temperature rises at or above this
> +		threshold, hardware starts 50% or 90% throttling (see
> +		'temp1_max_policy').
> +
> +What:		/sys/bus/platform/devices/dfl-
> fme.0/hwmon/hwmonX/temp1_crit
> +Date:		September 2019
> +KernelVersion:	5.4
> +Contact:	Wu Hao <hao.wu@intel.com>
> +Description:	Read-Only. It returns hardware threshold2 temperature in
> +		millidegrees Celsius. If temperature rises at or above this
> +		threshold, hardware starts 100% throttling.
> +
> +What:		/sys/bus/platform/devices/dfl-
> fme.0/hwmon/hwmonX/temp1_emergency
> +Date:		September 2019
> +KernelVersion:	5.4
> +Contact:	Wu Hao <hao.wu@intel.com>
> +Description:	Read-Only. It returns hardware trip threshold temperature in
> +		millidegrees Celsius. If temperature rises at or above this
> +		threshold, a fatal event will be triggered to board management
> +		controller (BMC) to shutdown FPGA.
> +
> +What:		/sys/bus/platform/devices/dfl-
> fme.0/hwmon/hwmonX/temp1_max_alarm
> +Date:		September 2019
> +KernelVersion:	5.4
> +Contact:	Wu Hao <hao.wu@intel.com>
> +Description:	Read-only. It returns 1 if temperature is currently at or above
> +		hardware threshold1 (see 'temp1_max'), otherwise 0.
> +
> +What:		/sys/bus/platform/devices/dfl-
> fme.0/hwmon/hwmonX/temp1_crit_alarm
> +Date:		September 2019
> +KernelVersion:	5.4
> +Contact:	Wu Hao <hao.wu@intel.com>
> +Description:	Read-only. It returns 1 if temperature is currently at or above
> +		hardware threshold2 (see 'temp1_crit'), otherwise 0.
> +
> +What:		/sys/bus/platform/devices/dfl-
> fme.0/hwmon/hwmonX/temp1_max_policy
> +Date:		September 2019
> +KernelVersion:	5.4
> +Contact:	Wu Hao <hao.wu@intel.com>
> +Description:	Read-Only. Read this file to get the policy of hardware threshold1
> +		(see 'temp1_max'). It only supports two values (policies):
> +		    0 - AP2 state (90% throttling)
> +		    1 - AP1 state (50% throttling)
> diff --git a/drivers/fpga/Kconfig b/drivers/fpga/Kconfig
> index 73c779e..72380e1 100644
> --- a/drivers/fpga/Kconfig
> +++ b/drivers/fpga/Kconfig
> @@ -156,7 +156,7 @@ config FPGA_DFL
> 
>  config FPGA_DFL_FME
>  	tristate "FPGA DFL FME Driver"
> -	depends on FPGA_DFL
> +	depends on FPGA_DFL && HWMON
>  	help
>  	  The FPGA Management Engine (FME) is a feature device implemented
>  	  under Device Feature List (DFL) framework. Select this option to
> diff --git a/drivers/fpga/dfl-fme-main.c b/drivers/fpga/dfl-fme-main.c
> index 4d78e18..752d71c 100644
> --- a/drivers/fpga/dfl-fme-main.c
> +++ b/drivers/fpga/dfl-fme-main.c
> @@ -14,6 +14,8 @@
>   *   Henry Mitchel <henry.mitchel@intel.com>
>   */
> 
> +#include <linux/hwmon.h>
> +#include <linux/hwmon-sysfs.h>
>  #include <linux/kernel.h>
>  #include <linux/module.h>
>  #include <linux/uaccess.h>
> @@ -181,6 +183,178 @@ static long fme_hdr_ioctl(struct platform_device *pdev,
>  	.ioctl = fme_hdr_ioctl,
>  };
> 
> +#define FME_THERM_THRESHOLD	0x8
> +#define TEMP_THRESHOLD1		GENMASK_ULL(6, 0)
> +#define TEMP_THRESHOLD1_EN	BIT_ULL(7)
> +#define TEMP_THRESHOLD2		GENMASK_ULL(14, 8)
> +#define TEMP_THRESHOLD2_EN	BIT_ULL(15)
> +#define TRIP_THRESHOLD		GENMASK_ULL(30, 24)
> +#define TEMP_THRESHOLD1_STATUS	BIT_ULL(32)		/* threshold1 reached */
> +#define TEMP_THRESHOLD2_STATUS	BIT_ULL(33)		/* threshold2 reached */
> +/* threshold1 policy: 0 - AP2 (90% throttle) / 1 - AP1 (50% throttle) */
> +#define TEMP_THRESHOLD1_POLICY	BIT_ULL(44)
> +
> +#define FME_THERM_RDSENSOR_FMT1	0x10
> +#define FPGA_TEMPERATURE	GENMASK_ULL(6, 0)
> +
> +#define FME_THERM_CAP		0x20
> +#define THERM_NO_THROTTLE	BIT_ULL(0)
> +
> +#define MD_PRE_DEG
> +
> +static bool fme_thermal_throttle_support(void __iomem *base)
> +{
> +	u64 v = readq(base + FME_THERM_CAP);
> +
> +	return FIELD_GET(THERM_NO_THROTTLE, v) ? false : true;
> +}
> +
> +static umode_t thermal_hwmon_attrs_visible(const void *drvdata,
> +					   enum hwmon_sensor_types type,
> +					   u32 attr, int channel)
> +{
> +	const struct dfl_feature *feature = drvdata;
> +
> +	/* temperature is always supported, and check hardware cap for others */
> +	if (attr == hwmon_temp_input)
> +		return 0444;
> +
> +	return fme_thermal_throttle_support(feature->ioaddr) ? 0444 : 0;
> +}
> +
> +static int thermal_hwmon_read(struct device *dev, enum hwmon_sensor_types type,
> +			      u32 attr, int channel, long *val)
> +{
> +	struct dfl_feature *feature = dev_get_drvdata(dev);
> +	u64 v;
> +
> +	switch (attr) {
> +	case hwmon_temp_input:
> +		v = readq(feature->ioaddr + FME_THERM_RDSENSOR_FMT1);
> +		*val = (long)(FIELD_GET(FPGA_TEMPERATURE, v) * 1000);
> +		break;
> +	case hwmon_temp_max:
> +		v = readq(feature->ioaddr + FME_THERM_THRESHOLD);
> +		*val = (long)(FIELD_GET(TEMP_THRESHOLD1, v) * 1000);
> +		break;
> +	case hwmon_temp_crit:
> +		v = readq(feature->ioaddr + FME_THERM_THRESHOLD);
> +		*val = (long)(FIELD_GET(TEMP_THRESHOLD2, v) * 1000);
> +		break;
> +	case hwmon_temp_emergency:
> +		v = readq(feature->ioaddr + FME_THERM_THRESHOLD);
> +		*val = (long)(FIELD_GET(TRIP_THRESHOLD, v) * 1000);
> +		break;
> +	case hwmon_temp_max_alarm:
> +		v = readq(feature->ioaddr + FME_THERM_THRESHOLD);
> +		*val = (long)FIELD_GET(TEMP_THRESHOLD1_STATUS, v);
> +		break;
> +	case hwmon_temp_crit_alarm:
> +		v = readq(feature->ioaddr + FME_THERM_THRESHOLD);
> +		*val = (long)FIELD_GET(TEMP_THRESHOLD2_STATUS, v);
> +		break;
> +	default:
> +		return -EOPNOTSUPP;
> +	}
> +
> +	return 0;
> +}
> +
> +static const struct hwmon_ops thermal_hwmon_ops = {
> +	.is_visible = thermal_hwmon_attrs_visible,
> +	.read = thermal_hwmon_read,
> +};
> +
> +static const struct hwmon_channel_info *thermal_hwmon_info[] = {
> +	HWMON_CHANNEL_INFO(temp, HWMON_T_INPUT | HWMON_T_EMERGENCY |
> +				 HWMON_T_MAX   | HWMON_T_MAX_ALARM |
> +				 HWMON_T_CRIT  | HWMON_T_CRIT_ALARM),
> +	NULL
> +};
> +
> +static const struct hwmon_chip_info thermal_hwmon_chip_info = {
> +	.ops = &thermal_hwmon_ops,
> +	.info = thermal_hwmon_info,
> +};
> +
> +static ssize_t temp1_max_policy_show(struct device *dev,
> +				     struct device_attribute *attr, char *buf)
> +{
> +	struct dfl_feature *feature = dev_get_drvdata(dev);
> +	u64 v;
> +
> +	v = readq(feature->ioaddr + FME_THERM_THRESHOLD);
> +
> +	return sprintf(buf, "%u\n",
> +		       (unsigned int)FIELD_GET(TEMP_THRESHOLD1_POLICY, v));
> +}
> +
> +static DEVICE_ATTR_RO(temp1_max_policy);
> +
> +static struct attribute *thermal_extra_attrs[] = {
> +	&dev_attr_temp1_max_policy.attr,
> +	NULL,
> +};
> +
> +static umode_t thermal_extra_attrs_visible(struct kobject *kobj,
> +					   struct attribute *attr, int index)
> +{
> +	struct device *dev = kobj_to_dev(kobj);
> +	struct dfl_feature *feature = dev_get_drvdata(dev);
> +
> +	return fme_thermal_throttle_support(feature->ioaddr) ? attr->mode : 0;
> +}
> +
> +static const struct attribute_group thermal_extra_group = {
> +	.attrs		= thermal_extra_attrs,
> +	.is_visible	= thermal_extra_attrs_visible,
> +};
> +__ATTRIBUTE_GROUPS(thermal_extra);
> +
> +static int fme_thermal_mgmt_init(struct platform_device *pdev,
> +				 struct dfl_feature *feature)
> +{
> +	struct device *hwmon;
> +
> +	/*
> +	 * create hwmon to allow userspace monitoring temperature and other
> +	 * threshold information.
> +	 *
> +	 * temp1_input      -> FPGA device temperature
> +	 * temp1_max        -> hardware threshold 1 -> 50% or 90% throttling
> +	 * temp1_crit       -> hardware threshold 2 -> 100% throttling
> +	 * temp1_emergency  -> hardware trip_threshold to shutdown FPGA
> +	 * temp1_max_alarm  -> hardware threshold 1 alarm
> +	 * temp1_crit_alarm -> hardware threshold 2 alarm
> +	 *
> +	 * create device specific sysfs interfaces, e.g. read temp1_max_policy
> +	 * to understand the actual hardware throttling action (50% vs 90%).
> +	 *
> +	 * If hardware doesn't support automatic throttling per thresholds,
> +	 * then all above sysfs interfaces are not visible except temp1_input
> +	 * for temperature.
> +	 */
> +	hwmon = devm_hwmon_device_register_with_info(&pdev->dev,
> +						     "dfl_fme_thermal", feature,
> +						     &thermal_hwmon_chip_info,
> +						     thermal_extra_groups);
> +	if (IS_ERR(hwmon)) {
> +		dev_err(&pdev->dev, "Fail to register thermal hwmon\n");
> +		return PTR_ERR(hwmon);
> +	}
> +
> +	return 0;
> +}
> +
> +static const struct dfl_feature_id fme_thermal_mgmt_id_table[] = {
> +	{.id = FME_FEATURE_ID_THERMAL_MGMT,},
> +	{0,}
> +};
> +
> +static const struct dfl_feature_ops fme_thermal_mgmt_ops = {
> +	.init = fme_thermal_mgmt_init,
> +};
> +
>  static struct dfl_feature_driver fme_feature_drvs[] = {
>  	{
>  		.id_table = fme_hdr_id_table,
> @@ -195,6 +369,10 @@ static long fme_hdr_ioctl(struct platform_device *pdev,
>  		.ops = &fme_global_err_ops,
>  	},
>  	{
> +		.id_table = fme_thermal_mgmt_id_table,
> +		.ops = &fme_thermal_mgmt_ops,
> +	},
> +	{
>  		.ops = NULL,
>  	},
>  };
> --
> 1.8.3.1
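
For readers following the register layout in the quoted patch, here is a
minimal userspace sketch of how the FME_THERM_THRESHOLD fields decode into
the millidegree values that hwmon reports. The field positions are copied
from the patch; the GENMASK_ULL()/FIELD_GET() macros below are simplified
stand-ins for the kernel ones, and struct therm_thresholds and
decode_thresholds() are hypothetical names used only for illustration.

```c
#include <stdint.h>

/* Simplified userspace stand-ins for the kernel's bitfield helpers. */
#define GENMASK_ULL(h, l)	(((~0ULL) << (l)) & (~0ULL >> (63 - (h))))
#define BIT_ULL(n)		(1ULL << (n))
/* Divide by the lowest set bit of the mask to shift the field down. */
#define FIELD_GET(mask, reg)	(((reg) & (mask)) / ((mask) & (~(mask) + 1)))

/* Field layout as given in the quoted patch. */
#define TEMP_THRESHOLD1		GENMASK_ULL(6, 0)
#define TEMP_THRESHOLD2		GENMASK_ULL(14, 8)
#define TRIP_THRESHOLD		GENMASK_ULL(30, 24)
#define TEMP_THRESHOLD1_STATUS	BIT_ULL(32)
#define TEMP_THRESHOLD2_STATUS	BIT_ULL(33)
#define TEMP_THRESHOLD1_POLICY	BIT_ULL(44)

/* Hypothetical container for the decoded values (not part of the patch). */
struct therm_thresholds {
	long max_mC;		/* temp1_max, millidegrees Celsius */
	long crit_mC;		/* temp1_crit */
	long emergency_mC;	/* temp1_emergency */
	int max_alarm;		/* temp1_max_alarm */
	int crit_alarm;		/* temp1_crit_alarm */
	int max_policy;		/* temp1_max_policy: 0 = AP2, 1 = AP1 */
};

/* Decode one raw 64-bit threshold register value. */
static struct therm_thresholds decode_thresholds(uint64_t v)
{
	struct therm_thresholds t;

	/* The hardware stores whole degrees; hwmon expects millidegrees. */
	t.max_mC = (long)(FIELD_GET(TEMP_THRESHOLD1, v) * 1000);
	t.crit_mC = (long)(FIELD_GET(TEMP_THRESHOLD2, v) * 1000);
	t.emergency_mC = (long)(FIELD_GET(TRIP_THRESHOLD, v) * 1000);
	t.max_alarm = (int)FIELD_GET(TEMP_THRESHOLD1_STATUS, v);
	t.crit_alarm = (int)FIELD_GET(TEMP_THRESHOLD2_STATUS, v);
	t.max_policy = (int)FIELD_GET(TEMP_THRESHOLD1_POLICY, v);
	return t;
}
```

In the driver itself the raw value comes from readq() on the feature's MMIO
region; the sketch only models the decoding step.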

^ permalink raw reply

* Re: [PATCH] mm: mempolicy: fix the absence of the last bit of nodemask
From: Vlastimil Babka @ 2019-10-14  9:35 UTC (permalink / raw)
  To: Michal Hocko, Pan Zhang
  Cc: akpm, rientjes, jgg, aarcange, yang.shi, zhongjiang, linux-mm,
	linux-kernel, Cristopher Lameter, Linux API, Alexander Viro
In-Reply-To: <20191014091243.GD317@dhcp22.suse.cz>

On 10/14/19 11:12 AM, Michal Hocko wrote:
>> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
>> index 4ae967b..a23509f 100644
>> --- a/mm/mempolicy.c
>> +++ b/mm/mempolicy.c
>> @@ -1328,9 +1328,11 @@ static int get_nodes(nodemask_t *nodes, const unsigned long __user *nmask,
>>  	unsigned long nlongs;
>>  	unsigned long endmask;
>>  
>> -	--maxnode;
>>  	nodes_clear(*nodes);
>> -	if (maxnode == 0 || !nmask)
>> +	/*
>> +	 * If the user specified only one node, no need to set nodemask
>> +	 */
>> +	if (maxnode - 1 == 0 || !nmask)
>>  		return 0;
>>  	if (maxnode > PAGE_SIZE*BITS_PER_BYTE)
>>  		return -EINVAL;
> 
> I am afraid this is a wrong fix. It is really hard to grasp the code but my
> understanding is that the caller is supposed to provide maxnode larger
> than the nodemask. So if you want 2 nodes then maxnode should be 3.
> Have a look at libnuma (which is a reference implementation):
> 
> static void setpol(int policy, struct bitmask *bmp)
> {
> 	if (set_mempolicy(policy, bmp->maskp, bmp->size + 1) < 0)
> 		numa_error("set_mempolicy");
> }
> 
> The semantics are quite awkward, but it has been that way for years.

Yes, unfortunately. Too late to change. We could just update the
manpages at this point.

get_mempolicy(2) says:
 maxnode specifies the number of node IDs that can be stored into
nodemask—that is, the maximum node ID plus one.

- Since node IDs start at 0 and the kernel decrements maxnode before using
  it, this should actually say "plus two".

set_mempolicy(2) says:
 nodemask  points to a bit mask of node IDs that contains up to maxnode
bits.

- This should also be clarified.
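
To make the off-by-one concrete, here is a small sketch of the arithmetic,
modeled on the "--maxnode" step in the quoted get_nodes() code. The helper
names are hypothetical and the model only covers maxnode >= 1; it is meant
as an illustration of the semantics discussed here, not as a copy of the
kernel logic.

```c
/*
 * Number of nodemask bits that are actually examined for a given
 * maxnode (assumed >= 1), because get_nodes() decrements maxnode first.
 */
static unsigned long nodemask_bits_examined(unsigned long maxnode)
{
	return maxnode - 1;
}

/* Is the bit for node_id looked at when the caller passes maxnode? */
static int node_is_examined(unsigned long node_id, unsigned long maxnode)
{
	return node_id < nodemask_bits_examined(maxnode);
}

/*
 * maxnode a caller has to pass so that node IDs 0..highest_node_id are
 * all examined: highest ID, plus one for the count, plus one more for
 * the decrement.
 */
static unsigned long maxnode_for_highest_node(unsigned long highest_node_id)
{
	return highest_node_id + 2;
}
```

With two nodes (IDs 0 and 1), passing maxnode = 2 leaves the bit for node 1
unexamined, while maxnode = 3 covers both, which matches the libnuma call
that passes bmp->size + 1.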

^ permalink raw reply

* Re: [PATCH] pidfd: add NSpid entries to fdinfo
From: Christian Kellner @ 2019-10-14  9:43 UTC (permalink / raw)
  To: Christian Brauner, jannh
  Cc: aarcange, akpm, cyphar, elena.reshetova, guro, ldv, linux-api,
	linux-kernel, mhocko, mingo, peterz, tglx, viro
In-Reply-To: <20191012102119.qq2adlnxjxrkslca@wittgenstein>

On Sat, 2019-10-12 at 12:21 +0200, Christian Brauner wrote:
> I think this might be more what we want.
Yep, indeed.

> I tried to think of cases where the first entry of Pid is not
> identical
> to the first entry of NSpid but I came up with none. Maybe you do,
> Jann?
Yeah, I don't think that can be the case. Looking at the source of
'pid_nr_ns(pid, ns)', a non-zero return means that a) 'pid' is valid, i.e.
non-null, and b) 'ns' is in the pid namespace hierarchy of 'pid' (at
pid->level, i.e. "pid->numbers[ns->level].ns == ns").
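
The two conditions can be illustrated with a small userspace model of the
lookup. The model_* types below are hypothetical, cut-down stand-ins for
struct pid, struct upid and struct pid_namespace (with a fixed maximum
depth of four levels), so treat this as a sketch of the reasoning rather
than the kernel function itself.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct pid_namespace: only the level matters. */
struct model_ns {
	int level;
};

/* Hypothetical stand-in for struct upid: a number in one namespace. */
struct model_upid {
	int nr;
	const struct model_ns *ns;
};

/* Hypothetical stand-in for struct pid, limited to four levels here. */
struct model_pid {
	int level;
	struct model_upid numbers[4];
};

/*
 * Return the pid number as seen from ns, or 0 if pid is NULL or ns is
 * not one of the namespaces in pid's hierarchy.
 */
static int model_pid_nr_ns(const struct model_pid *pid,
			   const struct model_ns *ns)
{
	if (pid && ns->level <= pid->level) {
		const struct model_upid *upid = &pid->numbers[ns->level];

		if (upid->ns == ns)
			return upid->nr;
	}
	return 0;
}
```

A non-zero result therefore implies both a valid pid and a namespace that
sits on the pid's ancestry chain, which is what the simplification relies
on.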

> Christian, this is just a quick stab I took. Feel free to pick this
> up as a template.
Thanks! I slightly re-worked it, with the reasoning above in mind, to
get rid of one of the branches:

+#ifdef CONFIG_PID_NS
+	seq_put_decimal_ull(m, "\nNSpid:\t", nr);
+	if (nr) {
+		int i;
+
+		/* If nr is non-zero it means that 'pid' is valid and that
+		 * ns, i.e. the pid namespace associated with the procfs
+		 * instance, is in the pid namespace hierarchy of pid.
+		 * Start at one level below and print all descending pids.
+		 */
+		for (i = ns->level + 1; i <= pid->level; i++) {
+			ns = pid->numbers[i].ns;
+			seq_put_decimal_ull(m, "\t", pid_nr_ns(pid, ns));
+		}
+	}
+#endif

But I now just realized that with the very same reasoning, if nr is
non-zero, we don't need to redo all the checks and can just do:

for (i = ns->level + 1; i <= pid->level; i++)
	seq_put_decimal_ull(m, "\t", pid->numbers[i].nr);

If this sounds good to you, I will resend the patches with the change above.

Thanks,
Christian

^ permalink raw reply

* Re: [PATCH] pidfd: add NSpid entries to fdinfo
From: Christian Brauner @ 2019-10-14 10:31 UTC (permalink / raw)
  To: Christian Kellner
  Cc: jannh, aarcange, akpm, cyphar, elena.reshetova, guro, ldv,
	linux-api, linux-kernel, mhocko, mingo, peterz, tglx, viro
In-Reply-To: <abc477fb3bd8fbf7b4d7e53d079dd1d8902e54af.camel@kellner.me>

On Mon, Oct 14, 2019 at 11:43:01AM +0200, Christian Kellner wrote:
> On Sat, 2019-10-12 at 12:21 +0200, Christian Brauner wrote:
> > I think this might be more what we want.
> Yep, indeed.
> 
> > I tried to think of cases where the first entry of Pid is not
> > identical
> > to the first entry of NSpid but I came up with none. Maybe you do,
> > Jann?
> Yeah, I don't think that can be the case. Looking at the source of
> 'pid_nr_ns(pid, ns)', a non-zero return means that a) 'pid' is valid, i.e.
> non-null, and b) 'ns' is in the pid namespace hierarchy of 'pid' (at
> pid->level, i.e. "pid->numbers[ns->level].ns == ns").
> 
> > Christian, this is just a quick stab I took. Feel free to pick this
> > up as a template.
> Thanks! I slightly re-worked it, with the reasoning above in mind, to
> get rid of one of the branches:

Thanks!

> 
> +#ifdef CONFIG_PID_NS
> +	seq_put_decimal_ull(m, "\nNSpid:\t", nr);
> +	if (nr) {
> +		int i;
> +
> +		/* If nr is non-zero it means that 'pid' is valid and that
> +		 * ns, i.e. the pid namespace associated with the procfs
> +		 * instance, is in the pid namespace hierarchy of pid.
> +		 * Start at one level below and print all descending pids.
> +		 */
> +		for (i = ns->level + 1; i <= pid->level; i++) {
> +			ns = pid->numbers[i].ns;

I'm not a fan of overriding the "ns" pointer. It's not a huge deal but
it's rather subtle.

> +			seq_put_decimal_ull(m, "\t", pid_nr_ns(pid, ns));
> +		}
> +	}
> +#endif
> 
> But I now just realized that with the very same reasoning, if nr is
> non-zero, we don't need to redo all the checks and can just do:
> 
> for (i = ns->level + 1; i <= pid->level; i++)
> 	seq_put_decimal_ull(m, "\t", pid->numbers[i].nr);
> 
> If this sounds good to you, I will resend the patches with the change above.

You could probably do:

#ifdef CONFIG_PID_NS
seq_put_decimal_ull(m, "\nNSpid:\t", nr);
for (i = ns->level + 1; i <= pid->level && nr; i++)
	seq_put_decimal_ull(m, "\t", pid->numbers[i].nr);
#endif
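
As a rough illustration of what that loop emits, here is a userspace sketch
that builds the NSpid line into a buffer. format_nspid() and its parameters
are hypothetical; numbers[i] stands in for pid->numbers[i].nr, and the
leading "\n" that the kernel snippet prints is left out.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Write "NSpid:\t<nr>" followed by one tab-separated number for every
 * namespace level below ns_level, but only when nr is non-zero.
 * Returns the length written, or -1 if the buffer is too small.
 */
static int format_nspid(char *buf, size_t len, const int *numbers,
			int pid_level, int ns_level, int nr)
{
	int off = snprintf(buf, len, "NSpid:\t%d", nr);

	if (off < 0 || (size_t)off >= len)
		return -1;

	/* Same loop shape as above: "&& nr" skips the descent when nr is 0. */
	for (int i = ns_level + 1; i <= pid_level && nr; i++) {
		size_t room = len - (size_t)off;
		int n = snprintf(buf + off, room, "\t%d", numbers[i]);

		if (n < 0 || (size_t)n >= room)
			return -1;
		off += n;
	}
	return off;
}
```

For a pid three levels deep with numbers 1000, 50 and 1, reading from the
top-level namespace would give "NSpid:\t1000\t50\t1", and a reader outside
the hierarchy (nr == 0) would see just "NSpid:\t0".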

Christian

^ permalink raw reply

* [PATCH v3 1/2] clone3: add CLONE_CLEAR_SIGHAND
From: Christian Brauner @ 2019-10-14 10:45 UTC (permalink / raw)
  To: linux-kernel, Oleg Nesterov, Florian Weimer, Arnd Bergmann,
	libc-alpha
  Cc: David Howells, Jann Horn, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Shuah Khan, Andrew Morton, Michal Hocko,
	Elena Reshetova, Thomas Gleixner, Roman Gushchin,
	Andrea Arcangeli, Al Viro, Aleksa Sarai, Dmitry V. Levin,
	linux-kselftest

Reset all signal handlers of the child not set to SIG_IGN to SIG_DFL.
This flag is mutually exclusive with CLONE_SIGHAND so that it does not
disturb other threads' signal handlers.

In the spirit of closer cooperation between glibc developers and kernel
developers (cf. [2]) this patchset came out of a discussion on the glibc
mailing list for improving posix_spawn() (cf. [1], [3], [4]). Kernel
support for this feature has been explicitly requested by glibc and I
see no reason not to help them with this.

On Linux, the child helper process used by posix_spawn must ensure that no
signal handlers are enabled, so the signal disposition must be either
SIG_DFL or SIG_IGN. However, this requires a sigprocmask call to obtain the
current signal mask and at least _NSIG sigaction calls to reset the signal
handlers for each posix_spawn call, or complex state tracking that might
lead to data corruption in glibc. Adding this flag lets glibc avoid
these problems.

[1]: https://www.sourceware.org/ml/libc-alpha/2019-10/msg00149.html
[2]: https://lwn.net/Articles/799331/
     '[...] by asking for better cooperation with the C-library projects
     in general. They should be copied on patches containing ABI
     changes, for example. I noted that there are often times where
     C-library developers wish the kernel community had done things
     differently; how could those be avoided in the future? Members of
     the audience suggested that more glibc developers should perhaps
     join the linux-api list. The other suggestion was to "copy Florian
     on everything".'
[3]: https://www.sourceware.org/ml/libc-alpha/2019-10/msg00158.html
[4]: https://www.sourceware.org/ml/libc-alpha/2019-10/msg00160.html
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: libc-alpha@sourceware.org
Cc: linux-api@vger.kernel.org
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
---
/* v1 */
Link: https://lore.kernel.org/r/20191010133518.5420-1-christian.brauner@ubuntu.com

/* v2 */
Link: https://lore.kernel.org/r/20191011102537.27502-1-christian.brauner@ubuntu.com
- Florian Weimer <fweimer@redhat.com>:
  - update comment in clone3_args_valid()

/* v3 */
- "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com>:
  - s/CLONE3_CLEAR_SIGHAND/CLONE_CLEAR_SIGHAND/g
---
 include/uapi/linux/sched.h |  3 +++
 kernel/fork.c              | 16 +++++++++++-----
 2 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/include/uapi/linux/sched.h b/include/uapi/linux/sched.h
index 99335e1f4a27..1d500ed03c63 100644
--- a/include/uapi/linux/sched.h
+++ b/include/uapi/linux/sched.h
@@ -33,6 +33,9 @@
 #define CLONE_NEWNET		0x40000000	/* New network namespace */
 #define CLONE_IO		0x80000000	/* Clone io context */
 
+/* Flags for the clone3() syscall. */
+#define CLONE_CLEAR_SIGHAND 0x100000000ULL /* Clear any signal handler and reset to SIG_DFL. */
+
 #ifndef __ASSEMBLY__
 /**
  * struct clone_args - arguments for the clone3 syscall
diff --git a/kernel/fork.c b/kernel/fork.c
index 1f6c45f6a734..aa5b5137f071 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1517,6 +1517,11 @@ static int copy_sighand(unsigned long clone_flags, struct task_struct *tsk)
 	spin_lock_irq(&current->sighand->siglock);
 	memcpy(sig->action, current->sighand->action, sizeof(sig->action));
 	spin_unlock_irq(&current->sighand->siglock);
+
+	/* Reset all signal handlers not set to SIG_IGN to SIG_DFL. */
+	if (clone_flags & CLONE_CLEAR_SIGHAND)
+		flush_signal_handlers(tsk, 0);
+
 	return 0;
 }
 
@@ -2563,11 +2568,8 @@ noinline static int copy_clone_args_from_user(struct kernel_clone_args *kargs,
 
 static bool clone3_args_valid(const struct kernel_clone_args *kargs)
 {
-	/*
-	 * All lower bits of the flag word are taken.
-	 * Verify that no other unknown flags are passed along.
-	 */
-	if (kargs->flags & ~CLONE_LEGACY_FLAGS)
+	/* Verify that no unknown flags are passed along. */
+	if (kargs->flags & ~(CLONE_LEGACY_FLAGS | CLONE_CLEAR_SIGHAND))
 		return false;
 
 	/*
@@ -2577,6 +2579,10 @@ static bool clone3_args_valid(const struct kernel_clone_args *kargs)
 	if (kargs->flags & (CLONE_DETACHED | CSIGNAL))
 		return false;
 
+	if ((kargs->flags & (CLONE_SIGHAND | CLONE_CLEAR_SIGHAND)) ==
+	    (CLONE_SIGHAND | CLONE_CLEAR_SIGHAND))
+		return false;
+
 	if ((kargs->flags & (CLONE_THREAD | CLONE_PARENT)) &&
 	    kargs->exit_signal)
 		return false;
-- 
2.23.0

^ permalink raw reply related
