Linux userland API discussions

Linux userland API discussions
 help / color / mirror / Atom feed

* Re: [RFC PATCH] fork: add CLONE_PIDFD
From: Linus Torvalds @ 2019-04-11 16:50 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Al Viro, Jann Horn, David Howells, Linux API,
	Linux List Kernel Mailing, Serge E. Hallyn, Andrew Lutomirski,
	Arnd Bergmann, Eric W. Biederman, Kees Cook, Alexey Dobriyan,
	Thomas Gleixner, Michael Kerrisk-manpages, Jonathan Kowalski,
	Dmitry V. Levin, Andrew Morton, Oleg Nesterov, Aleksa Sarai,
	Joel Fernandes, Daniel
In-Reply-To: <20190410234045.29846-1-christian@brauner.io>

On Wed, Apr 10, 2019 at 4:43 PM Christian Brauner <christian@brauner.io> wrote:
>
> RFC-2:
> This alternative patchset uses anonymous file descriptors instead of
> file descriptors from /proc/<pid>.

I think I prefer this one. Your diffstat makes it look bigger, but
that's because you added the example code to use this to that rfc..

Plus making anon_inodes unconditional makes sense anyway.  They are
always selected in practice to begin with.

               Linus

^ permalink raw reply

* Re: [PATCH] Linux: Define struct termios2 in <termios.h> under _GNU_SOURCE [BZ #10339]
From: Adhemerval Zanella @ 2019-04-11 12:53 UTC (permalink / raw)
  To: Florian Weimer; +Cc: libc-alpha, hpa, linux-api, linuxppc-dev
In-Reply-To: <87imvker6t.fsf@oldenburg2.str.redhat.com>



On 11/04/2019 08:07, Florian Weimer wrote:
> * Adhemerval Zanella:
> 
>> This allows us to adjust the baud rates to non-standard values using termios
>> interfaces without to resorting to add new headers and use a different API
>> (ioctl).
> 
> How much symbol versioning will be required for this change?

I think all interfaces that have termios as input for sparc and mips 
(tcgetattr, tcsetattr, cfmakeraw, cfgetispeed, cfgetospeed, cfsetispeed,
cfsetospeed, cfsetspeed).

Alpha will also need to use termios1 for pre-4.20 kernels.

> 
>> As Peter Anvin has indicated, he create a POC [1] with the aforementioned
>> new interfaces.  It has not been rebased against master, more specially against
>> my termios refactor to simplify the multiple architecture header definitions,
>> but I intend to use as a base.
> 
> Reference [1] is still missing. 8-(

Oops... it is https://git.zytor.com/users/hpa/glibc/termbaud.git/log/?h=wip.termbaud

> 
> Thanks,
> Florian
> 

^ permalink raw reply

* Re: [PATCH ghak90 V6 00/10] audit: implement container identifier
From: Richard Guy Briggs @ 2019-04-11 11:31 UTC (permalink / raw)
  To: containers, linux-api, Linux-Audit Mailing List, linux-fsdevel,
	LKML, netdev, netfilter-devel
  Cc: Paul Moore, sgrubb, omosnace, dhowells, simo, eparis, serge,
	ebiederm, nhorman
In-Reply-To: <cover.1554732921.git.rgb@redhat.com>

On 2019-04-08 23:39, Richard Guy Briggs wrote:
> Implement kernel audit container identifier.

Here's a first revision of the conversion of my manual test script from
bash to automated perl in the audit-testsuite:

	https://github.com/linux-audit/audit-testsuite/pull/83

It revealed some bugs/limitations in userspace code.  One is an
omission in my userspace code support for these features that treat the
contid field in the CONATAINER_ID auxiliary record to the NETFILTER_PKT
record as a comma separated list (I already have a patch).  Another is
the inability to search on contid in CONTAINER_ID fields (complicated by
the previous issue).  A third (already noted in ghau86) is the failure
to group records of the same event if the record number is in the 1000
block.  Another is pondering the addition of an old-contid search
option.

Despite these limitations, the test script works.

> This patchset is a fifth based on the proposal document (V3)
> posted:
> 	https://www.redhat.com/archives/linux-audit/2018-January/msg00014.html
> 
> The first patch was the last patch from ghak81 that was absorbed into
> this patchset since its primary justification is the rest of this
> patchset.
> 
> The second patch implements the proc fs write to set the audit container
> identifier of a process, emitting an AUDIT_CONTAINER_OP record to
> announce the registration of that audit container identifier on that
> process.  This patch requires userspace support for record acceptance
> and proper type display.
> 
> The third implements reading the audit container identifier from the
> proc filesystem for debugging.  This patch wasn't planned for upstream
> inclusion but is starting to become more likely.
> 
> The fourth implements the auxiliary record AUDIT_CONTAINER_ID if an audit
> container identifier is associated with an event.  This patch requires
> userspace support for proper type display.
> 
> The 5th adds audit daemon signalling provenance through audit_sig_info2.
> 
> The 6th creates a local audit context to be able to bind a standalone
> record with a locally created auxiliary record.
> 
> The 7th patch adds audit container identifier records to the user
> standalone records.
> 
> The 8th adds audit container identifier filtering to the exit,
> exclude and user lists.  This patch adds the AUDIT_CONTID field and
> requires auditctl userspace support for the --contid option.
> 
> The 9th adds network namespace audit container identifier labelling
> based on member tasks' audit container identifier labels.
> 
> The 10th adds audit container identifier support to standalone netfilter
> records that don't have a task context and lists each container to which
> that net namespace belongs.
> 
> Example: Set an audit container identifier of 123456 to the "sleep" task:
> 
>   sleep 2&
>   child=$!
>   echo 123456 > /proc/$child/audit_containerid; echo $?
>   ausearch -ts recent -m container_op
>   echo child:$child contid:$( cat /proc/$child/audit_containerid)
> 
> This should produce a record such as:
> 
>   type=CONTAINER_OP msg=audit(2018-06-06 12:39:29.636:26949) : op=set opid=2209 contid=123456 old-contid=18446744073709551615 pid=628 auid=root uid=root tty=ttyS0 ses=1 subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 comm=bash exe=/usr/bin/bash res=yes
> 
> 
> Example: Set a filter on an audit container identifier 123459 on /tmp/tmpcontainerid:
> 
>   contid=123459
>   key=tmpcontainerid
>   auditctl -a exit,always -F dir=/tmp -F perm=wa -F contid=$contid -F key=$key
>   perl -e "sleep 1; open(my \$tmpfile, '>', \"/tmp/$key\"); close(\$tmpfile);" &
>   child=$!
>   echo $contid > /proc/$child/audit_containerid
>   sleep 2
>   ausearch -i -ts recent -k $key
>   auditctl -d exit,always -F dir=/tmp -F perm=wa -F contid=$contid -F key=$key
>   rm -f /tmp/$key
> 
> This should produce an event such as:
> 
>   type=CONTAINER_ID msg=audit(2018-06-06 12:46:31.707:26953) : contid=123459
>   type=PROCTITLE msg=audit(2018-06-06 12:46:31.707:26953) : proctitle=perl -e sleep 1; open(my $tmpfile, '>', "/tmp/tmpcontainerid"); close($tmpfile);
>   type=PATH msg=audit(2018-06-06 12:46:31.707:26953) : item=1 name=/tmp/tmpcontainerid inode=25656 dev=00:26 mode=file,644 ouid=root ogid=root rdev=00:00 obj=unconfined_u:object_r:user_tmp_t:s0 nametype=CREATE cap_fp=none cap_fi=none cap_fe=0 cap_fver=0
>   type=PATH msg=audit(2018-06-06 12:46:31.707:26953) : item=0 name=/tmp/ inode=8985 dev=00:26 mode=dir,sticky,777 ouid=root ogid=root rdev=00:00 obj=system_u:object_r:tmp_t:s0 nametype=PARENT cap_fp=none cap_fi=none cap_fe=0 cap_fver=0
>   type=CWD msg=audit(2018-06-06 12:46:31.707:26953) : cwd=/root
>   type=SYSCALL msg=audit(2018-06-06 12:46:31.707:26953) : arch=x86_64 syscall=openat success=yes exit=3 a0=0xffffffffffffff9c a1=0x5621f2b81900 a2=O_WRONLY|O_CREAT|O_TRUNC a3=0x1b6 items=2 ppid=628 pid=2232 auid=root uid=root gid=root euid=root suid=root fsuid=root egid=root sgid=root fsgid=root tty=ttyS0 ses=1 comm=perl exe=/usr/bin/perl subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 key=tmpcontainerid
> 
> Example: Test multiple containers on one netns:
> 
>   sleep 5 &
>   child1=$!
>   containerid1=123451
>   echo $containerid1 > /proc/$child1/audit_containerid
>   sleep 5 &
>   child2=$!
>   containerid2=123452
>   echo $containerid2 > /proc/$child2/audit_containerid
>   iptables -I INPUT -i lo -p icmp --icmp-type echo-request -j AUDIT --type accept
>   iptables -I INPUT  -t mangle -i lo -p icmp --icmp-type echo-request -j MARK --set-mark 0x12345555
>   sleep 1;
>   bash -c "ping -q -c 1 127.0.0.1 >/dev/null 2>&1"
>   sleep 1;
>   ausearch -i -m NETFILTER_PKT -ts boot|grep mark=0x12345555
>   ausearch -i -m NETFILTER_PKT -ts boot|grep contid=|grep $containerid1|grep $containerid2
> 
> This should produce an event such as:
> 
>   type=NETFILTER_PKT msg=audit(03/15/2019 14:16:13.369:244) : mark=0x12345555 saddr=127.0.0.1 daddr=127.0.0.1 proto=icmp
>   type=CONTAINER_ID msg=audit(03/15/2019 14:16:13.369:244) : contid=123452,123451
> 
> 
> Includes the last patch of https://github.com/linux-audit/audit-kernel/issues/81
> Please see the github audit kernel issue for the main feature:
>   https://github.com/linux-audit/audit-kernel/issues/90
> and the kernel filter code:
>   https://github.com/linux-audit/audit-kernel/issues/91
> and the network support:
>   https://github.com/linux-audit/audit-kernel/issues/92
> Please see the github audit userspace issue for supporting record types:
>   https://github.com/linux-audit/audit-userspace/issues/51
> and filter code:
>   https://github.com/linux-audit/audit-userspace/issues/40
> Please see the github audit testsuiite issue for the test case:
>   https://github.com/linux-audit/audit-testsuite/issues/64
> Please see the github audit wiki for the feature overview:
>   https://github.com/linux-audit/audit-kernel/wiki/RFE-Audit-Container-ID
> 
> 
> Changelog:
> 
> v6
> - change TMPBUFLEN from 11 to 21 to cover the decimal value of contid
>   u64 (nhorman)
> - fix bug overwriting ctx in struct audit_sig_info, move cid above
>   ctx[0] (nhorman)
> - fix bug skipping remaining fields and not advancing bufp when copying
>   out contid in audit_krule_to_data (omosnacec)
> - add acks, tidy commit descriptions, other formatting fixes (checkpatch
>   wrong on audit_log_lost)
> - cast ull for u64 prints
> - target_cid tracking was moved from the ptrace/signal patch to
>   container_op
> - target ptrace and signal records were moved from the ptrace/signal
>   patch to container_id
> - auditd signaller tracking was moved to a new AUDIT_SIGNAL_INFO2
>   request and record
> - ditch unnecessary list_empty() checks
> - check for null net and aunet in audit_netns_contid_add()
> - swap CONTAINER_OP contid/old-contid order to ease parsing
> 
> v5
> - address loginuid and sessionid syscall scope in ghak104
> - address audit_context in CONFIG_AUDIT vs CONFIG_AUDITSYSCALL in ghak105
> - remove tty patch, addressed in ghak106
> - rebase on audit/next v5.0-rc1
>   w/ghak59/ghak104/ghak103/ghak100/ghak107/ghak105/ghak106/ghak105sup
> - update CONTAINER_ID to CONTAINER_OP in patch description
> - move audit_context in audit_task_info to CONFIG_AUDITSYSCALL
> - move audit_alloc() and audit_free() out of CONFIG_AUDITSYSCALL and into
>   CONFIG_AUDIT and create audit_{alloc,free}_syscall
> - use plain kmem_cache_alloc() rather than kmem_cache_zalloc() in audit_alloc()
> - fix audit_get_contid() declaration type error
> - move audit_set_contid() from auditsc.c to audit.c
> - audit_log_contid() returns void
> - audit_log_contid() handed contid rather than tsk
> - switch from AUDIT_CONTAINER to AUDIT_CONTAINER_ID for aux record
> - move audit_log_contid(tsk/contid) & audit_contid_set(tsk)/audit_contid_valid(contid)
> - switch from tsk to current
> - audit_alloc_local() calls audit_log_lost() on failure to allocate a context
> - add AUDIT_USER* non-syscall contid record
> - cosmetic cleanup double parens, goto out on err
> - ditch audit_get_ns_contid_list_lock(), fix aunet lock race
> - switch from all-cpu read spinlock to rcu, keep spinlock for write
> - update audit_alloc_local() to use ktime_get_coarse_real_ts64()
> - add nft_log support
> - add call from do_exit() in audit_free() to remove contid from netns
> - relegate AUDIT_CONTAINER ref= field (was op=) to debug patch
> 
> v4
> - preface set with ghak81:"collect audit task parameters"
> - add shallyn and sgrubb acks
> - rename feature bitmap macro
> - rename cid_valid() to audit_contid_valid()
> - rename AUDIT_CONTAINER_ID to AUDIT_CONTAINER_OP
> - delete audit_get_contid_list() from headers
> - move work into inner if, delete "found"
> - change netns contid list function names
> - move exports for audit_log_contid audit_alloc_local audit_free_context to non-syscall patch
> - list contids CSV
> - pass in gfp flags to audit_alloc_local() (fix audit_alloc_context callers)
> - use "local" in lieu of abusing in_syscall for auditsc_get_stamp()
> - read_lock(&tasklist_lock) around children and thread check
> - task_lock(tsk) should be taken before first check of tsk->audit
> - add spin lock to contid list in aunet
> - restrict /proc read to CAP_AUDIT_CONTROL
> - remove set again prohibition and inherited flag
> - delete contidion spelling fix from patchset, send to netdev/linux-wireless
> 
> v3
> - switched from containerid in task_struct to audit_task_info (depends on ghak81)
> - drop INVALID_CID in favour of only AUDIT_CID_UNSET
> - check for !audit_task_info, throw -ENOPROTOOPT on set
> - changed -EPERM to -EEXIST for parent check
> - return AUDIT_CID_UNSET if !audit_enabled
> - squash child/thread check patch into AUDIT_CONTAINER_ID patch
> - changed -EPERM to -EBUSY for child check
> - separate child and thread checks, use -EALREADY for latter
> - move addition of op= from ptrace/signal patch to AUDIT_CONTAINER patch
> - fix && to || bashism in ptrace/signal patch
> - uninline and export function for audit_free_context()
> - drop CONFIG_CHANGE, FEATURE_CHANGE, ANOM_ABEND, ANOM_SECCOMP patches
> - move audit_enabled check (xt_AUDIT)
> - switched from containerid list in struct net to net_generic's struct audit_net
> - move containerid list iteration into audit (xt_AUDIT)
> - create function to move namespace switch into audit
> - switched /proc/PID/ entry from containerid to audit_containerid
> - call kzalloc with GFP_ATOMIC on in_atomic() in audit_alloc_context()
> - call kzalloc with GFP_ATOMIC on in_atomic() in audit_log_container_info()
> - use xt_net(par) instead of sock_net(skb->sk) to get net
> - switched record and field names: initial CONTAINER_ID, aux CONTAINER, field CONTID
> - allow to set own contid
> - open code audit_set_containerid
> - add contid inherited flag
> - ccontainerid and pcontainerid eliminated due to inherited flag
> - change name of container list funcitons
> - rename containerid to contid
> - convert initial container record to syscall aux
> - fix spelling mistake of contidion in net/rfkill/core.c to avoid contid name collision
> 
> v2
> - add check for children and threads
> - add network namespace container identifier list
> - add NETFILTER_PKT audit container identifier logging
> - patch description and documentation clean-up and example
> - reap unused ppid
> 
> Richard Guy Briggs (10):
>   audit: collect audit task parameters
>   audit: add container id
>   audit: read container ID of a process
>   audit: log container info of syscalls
>   audit: add contid support for signalling the audit daemon
>   audit: add support for non-syscall auxiliary records
>   audit: add containerid support for user records
>   audit: add containerid filtering
>   audit: add support for containerid to network namespaces
>   audit: NETFILTER_PKT: record each container ID associated with a netNS
> 
>  fs/proc/base.c              |  57 +++++++-
>  include/linux/audit.h       | 113 +++++++++++++--
>  include/linux/sched.h       |   7 +-
>  include/uapi/linux/audit.h  |   9 +-
>  init/init_task.c            |   3 +-
>  init/main.c                 |   2 +
>  kernel/audit.c              | 325 ++++++++++++++++++++++++++++++++++++++++++--
>  kernel/audit.h              |   9 ++
>  kernel/auditfilter.c        |  47 +++++++
>  kernel/auditsc.c            |  90 ++++++++----
>  kernel/fork.c               |   1 -
>  kernel/nsproxy.c            |   4 +
>  net/netfilter/nft_log.c     |  11 +-
>  net/netfilter/xt_AUDIT.c    |  11 +-
>  security/selinux/nlmsgtab.c |   1 +
>  15 files changed, 627 insertions(+), 63 deletions(-)
> 
> -- 
> 1.8.3.1
> 

- RGB

--
Richard Guy Briggs <rgb@redhat.com>
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635

^ permalink raw reply

* [PATCH v2] fs/sync.c: sync_file_range(2) may use WB_SYNC_ALL writeback
From: Amir Goldstein @ 2019-04-11 11:21 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Jan Kara, Dave Chinner, Al Viro, linux-mm, linux-api,
	linux-fsdevel, linux-kernel

Commit 23d0127096cb ("fs/sync.c: make sync_file_range(2) use WB_SYNC_NONE
writeback") claims that sync_file_range(2) syscall was "created for
userspace to be able to issue background writeout and so waiting for
in-flight IO is undesirable there" and changes the writeback (back) to
WB_SYNC_NONE.

This claim is only partially true. It is true for users that use the flag
SYNC_FILE_RANGE_WRITE by itself, as does PostgreSQL, the user that was
the reason for changing to WB_SYNC_NONE writeback.

However, that claim is not true for users that use that flag combination
SYNC_FILE_RANGE_{WAIT_BEFORE|WRITE|_WAIT_AFTER}.
Those users explicitly requested to wait for in-flight IO as well as to
writeback of dirty pages.

Re-brand that flag combination as SYNC_FILE_RANGE_WRITE_AND_WAIT
and use the helper filemap_write_and_wait_range(), that uses WB_SYNC_ALL
writeback, to perform the full range sync request.

Fixes: 23d0127096cb ("fs/sync.c: make sync_file_range(2) use WB_SYNC_NONE")
Acked-by: Jan Kara <jack@suse.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/sync.c               | 25 ++++++++++++++++++-------
 include/uapi/linux/fs.h |  3 +++
 2 files changed, 21 insertions(+), 7 deletions(-)

Andrew,

Since you were the one to merge Jan's patch that this Fixes,
I figured it would be best to send the fix through your tree.

You may find the discussion on V1 here:
https://lore.kernel.org/lkml/20190409114922.30095-1-amir73il@gmail.com/

Thanks,
Amir.

Changes since v1:
- Remove non-guaranties of the API from commit message
- Added ACK by Jan

diff --git a/fs/sync.c b/fs/sync.c
index b54e0541ad89..5cf6fdbae4de 100644
--- a/fs/sync.c
+++ b/fs/sync.c
@@ -18,8 +18,8 @@
 #include <linux/backing-dev.h>
 #include "internal.h"
 
-#define VALID_FLAGS (SYNC_FILE_RANGE_WAIT_BEFORE|SYNC_FILE_RANGE_WRITE| \
-			SYNC_FILE_RANGE_WAIT_AFTER)
+#define VALID_FLAGS (SYNC_FILE_RANGE_WRITE | SYNC_FILE_RANGE_WRITE_AND_WAIT | \
+		     SYNC_FILE_RANGE_WAIT_BEFORE | SYNC_FILE_RANGE_WAIT_AFTER)
 
 /*
  * Do the filesystem syncing work. For simple filesystems
@@ -235,9 +235,9 @@ SYSCALL_DEFINE1(fdatasync, unsigned int, fd)
 }
 
 /*
- * sys_sync_file_range() permits finely controlled syncing over a segment of
+ * ksys_sync_file_range() permits finely controlled syncing over a segment of
  * a file in the range offset .. (offset+nbytes-1) inclusive.  If nbytes is
- * zero then sys_sync_file_range() will operate from offset out to EOF.
+ * zero then ksys_sync_file_range() will operate from offset out to EOF.
  *
  * The flag bits are:
  *
@@ -254,7 +254,7 @@ SYSCALL_DEFINE1(fdatasync, unsigned int, fd)
  * Useful combinations of the flag bits are:
  *
  * SYNC_FILE_RANGE_WAIT_BEFORE|SYNC_FILE_RANGE_WRITE: ensures that all pages
- * in the range which were dirty on entry to sys_sync_file_range() are placed
+ * in the range which were dirty on entry to ksys_sync_file_range() are placed
  * under writeout.  This is a start-write-for-data-integrity operation.
  *
  * SYNC_FILE_RANGE_WRITE: start writeout of all dirty pages in the range which
@@ -266,10 +266,13 @@ SYSCALL_DEFINE1(fdatasync, unsigned int, fd)
  * earlier SYNC_FILE_RANGE_WAIT_BEFORE|SYNC_FILE_RANGE_WRITE operation to wait
  * for that operation to complete and to return the result.
  *
- * SYNC_FILE_RANGE_WAIT_BEFORE|SYNC_FILE_RANGE_WRITE|SYNC_FILE_RANGE_WAIT_AFTER:
+ * SYNC_FILE_RANGE_WAIT_BEFORE|SYNC_FILE_RANGE_WRITE|SYNC_FILE_RANGE_WAIT_AFTER
+ * (a.k.a. SYNC_FILE_RANGE_WRITE_AND_WAIT):
  * a traditional sync() operation.  This is a write-for-data-integrity operation
  * which will ensure that all pages in the range which were dirty on entry to
- * sys_sync_file_range() are committed to disk.
+ * ksys_sync_file_range() are written to disk.  It should be noted that disk
+ * caches are not flushed by this call, so there are no guarantees here that the
+ * data will be available on disk after a crash.
  *
  *
  * SYNC_FILE_RANGE_WAIT_BEFORE and SYNC_FILE_RANGE_WAIT_AFTER will detect any
@@ -338,6 +341,14 @@ int ksys_sync_file_range(int fd, loff_t offset, loff_t nbytes,
 
 	mapping = f.file->f_mapping;
 	ret = 0;
+	if ((flags & SYNC_FILE_RANGE_WRITE_AND_WAIT) ==
+		     SYNC_FILE_RANGE_WRITE_AND_WAIT) {
+		/* Unlike SYNC_FILE_RANGE_WRITE alone uses WB_SYNC_ALL */
+		ret = filemap_write_and_wait_range(mapping, offset, endbyte);
+		if (ret < 0)
+			goto out_put;
+	}
+
 	if (flags & SYNC_FILE_RANGE_WAIT_BEFORE) {
 		ret = file_fdatawait_range(f.file, offset, endbyte);
 		if (ret < 0)
diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
index 121e82ce296b..59c71fa8c553 100644
--- a/include/uapi/linux/fs.h
+++ b/include/uapi/linux/fs.h
@@ -320,6 +320,9 @@ struct fscrypt_key {
 #define SYNC_FILE_RANGE_WAIT_BEFORE	1
 #define SYNC_FILE_RANGE_WRITE		2
 #define SYNC_FILE_RANGE_WAIT_AFTER	4
+#define SYNC_FILE_RANGE_WRITE_AND_WAIT	(SYNC_FILE_RANGE_WRITE | \
+					 SYNC_FILE_RANGE_WAIT_BEFORE | \
+					 SYNC_FILE_RANGE_WAIT_AFTER)
 
 /*
  * Flags for preadv2/pwritev2:
-- 
2.17.1

^ permalink raw reply related

* Re: [PATCH] Linux: Define struct termios2 in <termios.h> under _GNU_SOURCE [BZ #10339]
From: Florian Weimer @ 2019-04-11 11:07 UTC (permalink / raw)
  To: Adhemerval Zanella; +Cc: libc-alpha, hpa, linux-api, linuxppc-dev
In-Reply-To: <284e9c76-2411-b8f4-c4bc-c25c60c04cf7@linaro.org>

* Adhemerval Zanella:

> This allows us to adjust the baud rates to non-standard values using termios
> interfaces without to resorting to add new headers and use a different API
> (ioctl).

How much symbol versioning will be required for this change?

> As Peter Anvin has indicated, he create a POC [1] with the aforementioned
> new interfaces.  It has not been rebased against master, more specially against
> my termios refactor to simplify the multiple architecture header definitions,
> but I intend to use as a base.

Reference [1] is still missing. 8-(

Thanks,
Florian

^ permalink raw reply

* Re: [PATCH] io_uring: add support for barrier fsync
From: Dave Chinner @ 2019-04-11 11:05 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Chris Mason, Christoph Hellwig, linux-fsdevel,
	linux-block@vger.kernel.org, linux-api@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <a7f00629-f822-7271-7a4e-ad3b802752e4@kernel.dk>

On Tue, Apr 09, 2019 at 12:46:15PM -0600, Jens Axboe wrote:
> On 4/9/19 12:42 PM, Chris Mason wrote:
> > On 9 Apr 2019, at 14:23, Jens Axboe wrote:
> > 
> >> On 4/9/19 12:17 PM, Christoph Hellwig wrote:
> >>> On Tue, Apr 09, 2019 at 10:27:43AM -0600, Jens Axboe wrote:
> >>>> It's a quite common use case to issue a bunch of writes, then an 
> >>>> fsync
> >>>> or fdatasync when they complete. Since io_uring doesn't guarantee 
> >>>> any
> >>>> type of ordering, the application must track issued writes and wait
> >>>> with the fsync issue until they have completed.
> >>>>
> >>>> Add an IORING_FSYNC_BARRIER flag that helps with this so the 
> >>>> application
> >>>> doesn't have to do this manually. If this flag is set for the fsync
> >>>> request, we won't issue it until pending IO has already completed.
> >>>
> >>> I think we need a much more detailed explanation of the semantics,
> >>> preferably in man page format.
> >>>
> >>> Barrier at least in Linux traditionally means all previously 
> >>> submitted
> >>> requests have finished and no new ones are started until the
> >>> barrier request finishes, which is very heavy handed.  Is that what
> >>> this is supposed to do?  If not what are the exact guarantees vs
> >>> ordering and or barrier semantics?
> >>
> >> The patch description isn't that great, and maybe the naming isn't 
> >> that
> >> intuitive either. The way it's implemented, the fsync will NOT be 
> >> issued
> >> until previously issued IOs have completed. That means both reads and
> >> writes, since there's no way to wait for just one.  In terms of
> >> semantics, any previously submitted writes will have completed before
> >> this fsync is issued. The barrier fsync has no ordering wrt future
> >> writes, no ordering is implied there. Hence:
> >>
> >> W1, W2, W3, FSYNC_W_BARRIER, W4, W5
> >>
> >> W1..3 will have been completed by the hardware side before we start
> >> FSYNC_W_BARRIER. We don't wait with issuing W4..5 until after the 
> >> fsync
> >> completes, no ordering is provided there.
> > 
> > Looking at the patch, why is fsync special?  Seems like you could add 
> > this ordering bit to any write?
> 
> It's really not, the exact same technique could be used on any type of
> command to imply ordering. My initial idea was to have an explicit
> barrier/ordering command, but I didn't think that separating it from an
> actual command would be needed/useful.
> 
> > While you're here, do you want to add a way to FUA/cache flush?  
> > Basically the rest of what user land would need to make their own 
> > write-back-cache-safe implementation.
> 
> FUA would be a WRITEV/WRITE_FIXED flag, that should be trivially doable.

We already have plumbing to make pwritev2 and AIO issue FUA writes
via the RWF_DSYNC flag through the fs/iomap.c direct IO path. FUA is
only valid if the file does not have dirty metadata (e.g. because of
block allocation) and that requires the filesystem block mapping to
tell the IO path if FUA can be used. Otherwise a journal flush is
also required to make the data stable and there's no point in doing
a FUA write for the data in that case...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply

* [PATCH v2 10/11] platform/x86: asus-wmi: Switch fan boost mode
From: Yurii Pavlovskyi @ 2019-04-11  5:47 UTC (permalink / raw)
  Cc: Corentin Chary, Darren Hart, Andy Shevchenko, Daniel Drake,
	acpi4asus-user, platform-driver-x86, linux-kernel, linux-api
In-Reply-To: <72b3e0aa-d53a-8a82-1505-f4f00aa2bb46@gmail.com>

The WMI exposes a write-only device ID where three modes can be switched
on some laptops (TUF Gaming FX505GM). There is a hotkey combination Fn-F5
that does have a fan icon which is designed to toggle between these 3
modes.

Add a SysFS entry that reads the last written value and updates value in
WMI on write and a hotkey handler that toggles the modes. The
corresponding DEVS device handler does obviously take 3 possible
argument values.

Method (SFBM, 1, NotSerialized)
{
    If ((Arg0 == Zero) { .. }
    If ((Arg0 == One)) { .. }
    If ((Arg0 == 0x02)) { .. }
}

... // DEVS
If ((IIA0 == 0x00110018))
{
   SFBM (IIA1)
   Return (One)
}

* 0x00 - is normal,
* 0x01 - is obviously turbo by the amount of noise, might be useful to
avoid CPU frequency throttling on high load,
* 0x02 - the meaning is unknown at the time as modes are not named
in the vendor documentation, but it does look like a quiet mode as CPU
temperature does increase about 10 degrees on maximum load.

Signed-off-by: Yurii Pavlovskyi <yurii.pavlovskyi@gmail.com>
---
 .../ABI/testing/sysfs-platform-asus-wmi       |  10 ++
 drivers/platform/x86/asus-wmi.c               | 119 ++++++++++++++++--
 include/linux/platform_data/x86/asus-wmi.h    |   1 +
 3 files changed, 117 insertions(+), 13 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-platform-asus-wmi b/Documentation/ABI/testing/sysfs-platform-asus-wmi
index 300a40519695..2b3184e297a7 100644
--- a/Documentation/ABI/testing/sysfs-platform-asus-wmi
+++ b/Documentation/ABI/testing/sysfs-platform-asus-wmi
@@ -97,3 +97,13 @@ Description:
 		Write changed RGB keyboard backlight parameters:
 			* 1 - permanently,
 			* 2 - temporarily.
+
+What:		/sys/devices/platform/<platform>/fan_mode
+Date:		Apr 2019
+KernelVersion:	5.1
+Contact:	"Yurii Pavlovskyi" <yurii.pavlovskyi@gmail.com>
+Description:
+		Fan boost mode:
+			* 0 - normal,
+			* 1 - turbo,
+			* 2 - quiet?
diff --git a/drivers/platform/x86/asus-wmi.c b/drivers/platform/x86/asus-wmi.c
index b4fd200e8335..f0e506feb924 100644
--- a/drivers/platform/x86/asus-wmi.c
+++ b/drivers/platform/x86/asus-wmi.c
@@ -69,6 +69,7 @@ MODULE_LICENSE("GPL");
 #define NOTIFY_KBD_BRTUP		0xc4
 #define NOTIFY_KBD_BRTDWN		0xc5
 #define NOTIFY_KBD_BRTTOGGLE		0xc7
+#define NOTIFY_KBD_FBM			0x99
 
 #define ASUS_FAN_DESC			"cpu_fan"
 #define ASUS_FAN_MFUN			0x13
@@ -77,6 +78,8 @@ MODULE_LICENSE("GPL");
 #define ASUS_FAN_CTRL_MANUAL		1
 #define ASUS_FAN_CTRL_AUTO		2
 
+#define ASUS_FAN_MODE_COUNT		3
+
 #define USB_INTEL_XUSB2PR		0xD0
 #define PCI_DEVICE_ID_INTEL_LYNXPOINT_LP_XHCI	0x9c31
 
@@ -196,6 +199,9 @@ struct asus_wmi {
 	int asus_hwmon_num_fans;
 	int asus_hwmon_pwm;
 
+	bool fan_mode_available;
+	u8 fan_mode;
+
 	bool kbbl_rgb_available;
 	struct asus_kbbl_rgb kbbl_rgb;
 
@@ -1832,6 +1838,87 @@ static int asus_wmi_fan_init(struct asus_wmi *asus)
 	return 0;
 }
 
+/* Fan mode *******************************************************************/
+
+static int fan_mode_check_present(struct asus_wmi *asus)
+{
+	u32 result;
+	int err;
+
+	asus->fan_mode_available = false;
+
+	err = asus_wmi_get_devstate(asus, ASUS_WMI_DEVID_FAN_MODE, &result);
+	if (err) {
+		if (err == -ENODEV)
+			return 0;
+		else
+			return err;
+	}
+
+	if (result & ASUS_WMI_DSTS_PRESENCE_BIT)
+		asus->fan_mode_available = true;
+
+	return 0;
+}
+
+static int fan_mode_write(struct asus_wmi *asus)
+{
+	int err;
+	u8 value;
+	u32 retval;
+
+	value = asus->fan_mode % ASUS_FAN_MODE_COUNT;
+	pr_info("Set fan mode: %u\n", value);
+	err = asus_wmi_set_devstate(ASUS_WMI_DEVID_FAN_MODE, value, &retval);
+
+	if (err) {
+		pr_warn("Failed to set fan mode: %d\n", err);
+		return err;
+	}
+
+	if (retval != 1) {
+		pr_warn("Failed to set fan mode (retval): 0x%x\n", retval);
+		return -EIO;
+	}
+
+	return 0;
+}
+
+static int fan_mode_switch_next(struct asus_wmi *asus)
+{
+	asus->fan_mode = (asus->fan_mode + 1) % ASUS_FAN_MODE_COUNT;
+	return fan_mode_write(asus);
+}
+
+static ssize_t fan_mode_show(struct device *dev,
+		struct device_attribute *attr, char *buf)
+{
+	struct asus_wmi *asus = dev_get_drvdata(dev);
+
+	return show_u8(asus->fan_mode, buf);
+}
+
+static ssize_t fan_mode_store(struct device *dev,
+		struct device_attribute *attr, const char *buf, size_t count)
+{
+	int result;
+	u8 new_mode;
+
+	struct asus_wmi *asus = dev_get_drvdata(dev);
+
+	result = store_u8(&new_mode, buf, count);
+	if (result < 0)
+		return result;
+
+	asus->fan_mode = new_mode % ASUS_FAN_MODE_COUNT;
+	fan_mode_write(asus);
+
+	return result;
+}
+
+// Fan mode: 0 - normal, 1 - turbo, 2 - quiet?
+static DEVICE_ATTR_RW(fan_mode);
+
 /* Backlight ******************************************************************/
 
 static int read_backlight_power(struct asus_wmi *asus)
@@ -2083,6 +2170,9 @@ static void asus_wmi_handle_notify(int code, struct asus_wmi *asus)
 		return;
 	}
 
+	if (asus->fan_mode_available && code == NOTIFY_KBD_FBM)
+		fan_mode_switch_next(asus);
+
 	if (is_display_toggle(code) && asus->driver->quirks->no_display_toggle)
 		return;
 
@@ -2236,6 +2326,7 @@ static struct attribute *platform_attributes[] = {
 	&dev_attr_touchpad.attr,
 	&dev_attr_lid_resume.attr,
 	&dev_attr_als_enable.attr,
+	&dev_attr_fan_mode.attr,
 	NULL
 };
 
@@ -2257,6 +2348,8 @@ static umode_t asus_sysfs_is_visible(struct kobject *kobj,
 		devid = ASUS_WMI_DEVID_LID_RESUME;
 	else if (attr == &dev_attr_als_enable.attr)
 		devid = ASUS_WMI_DEVID_ALS_ENABLE;
+	else if (attr == &dev_attr_fan_mode.attr)
+		ok = asus->fan_mode_available;
 
 	if (devid != -1)
 		ok = !(asus_wmi_get_devstate_simple(asus, devid) < 0);
@@ -2281,7 +2374,7 @@ static int asus_wmi_sysfs_init(struct platform_device *device)
 
 /* Platform device ************************************************************/
 
-static int asus_wmi_platform_init(struct asus_wmi *asus)
+static void asus_wmi_platform_init(struct asus_wmi *asus)
 {
 	int rv;
 
@@ -2333,13 +2426,6 @@ static int asus_wmi_platform_init(struct asus_wmi *asus)
 	if (asus->driver->quirks->wapf >= 0)
 		asus_wmi_set_devstate(ASUS_WMI_DEVID_CWAP,
 				      asus->driver->quirks->wapf, NULL);
-
-	return asus_wmi_sysfs_init(asus->platform_device);
-}
-
-static void asus_wmi_platform_exit(struct asus_wmi *asus)
-{
-	asus_wmi_sysfs_exit(asus->platform_device);
 }
 
 /* debugfs ********************************************************************/
@@ -2514,9 +2600,15 @@ static int asus_wmi_add(struct platform_device *pdev)
 	if (wdrv->detect_quirks)
 		wdrv->detect_quirks(asus->driver);
 
-	err = asus_wmi_platform_init(asus);
+	asus_wmi_platform_init(asus);
+
+	err = fan_mode_check_present(asus);
 	if (err)
-		goto fail_platform;
+		goto fail_fan_mode;
+
+	err = asus_wmi_sysfs_init(asus->platform_device);
+	if (err)
+		goto fail_sysfs;
 
 	err = asus_wmi_input_init(asus);
 	if (err)
@@ -2611,8 +2703,9 @@ static int asus_wmi_add(struct platform_device *pdev)
 fail_hwmon:
 	asus_wmi_input_exit(asus);
 fail_input:
-	asus_wmi_platform_exit(asus);
-fail_platform:
+	asus_wmi_sysfs_exit(asus->platform_device);
+fail_sysfs:
+fail_fan_mode:
 	kfree(asus);
 	return err;
 }
@@ -2629,7 +2722,7 @@ static int asus_wmi_remove(struct platform_device *device)
 	kbbl_rgb_exit(asus);
 	asus_wmi_rfkill_exit(asus);
 	asus_wmi_debugfs_exit(asus);
-	asus_wmi_platform_exit(asus);
+	asus_wmi_sysfs_exit(asus->platform_device);
 	asus_wmi_hwmon_exit(asus);
 
 	kfree(asus);
diff --git a/include/linux/platform_data/x86/asus-wmi.h b/include/linux/platform_data/x86/asus-wmi.h
index 25b7b653e6d2..0f3654b7b8a8 100644
--- a/include/linux/platform_data/x86/asus-wmi.h
+++ b/include/linux/platform_data/x86/asus-wmi.h
@@ -59,6 +59,7 @@
 #define ASUS_WMI_DEVID_LIGHTBAR		0x00050025
 #define ASUS_WMI_DEVID_KBD_RGB		0x00100056
 #define ASUS_WMI_DEVID_KBD_RGB2		0x00100057
+#define ASUS_WMI_DEVID_FAN_MODE		0x00110018
 
 /* Misc */
 #define ASUS_WMI_DEVID_CAMERA		0x00060013
-- 
2.17.1

^ permalink raw reply related

* [PATCH v2 09/11] platform/x86: asus-wmi: Control RGB keyboard backlight
From: Yurii Pavlovskyi @ 2019-04-11  5:46 UTC (permalink / raw)
  Cc: Corentin Chary, Darren Hart, Andy Shevchenko, Daniel Drake,
	acpi4asus-user, platform-driver-x86, linux-kernel, linux-api
In-Reply-To: <bc00c6bb-298f-6c85-83a3-7500ed3c8d61@gmail.com>

The WMI exposes two methods for controlling RGB keyboard backlight which
allow to control:
* RGB components in range 00 - ff,
* Switch between 4 effects,
* Switch between 3 effect speed modes,
* Separately enable the backlight on boot, in awake state (after driver
  load), in sleep mode, and probably in something called shutdown mode
  (no observable effects of enabling it are known so far).

The configuration should be written to several sysfs parameter buffers
which are then written via WMI by writing either 1 or 2 to the "kbbl_set"
parameter. When reading the buffers the last written value is returned.

If the 2 is written to "kbbl_set", the parameters will be reset on reboot
(temporary mode), 1 is permanent mode, parameters are retained.

The calls use new 3-dword input buffer method call.

The functionality is only enabled if corresponding DSTS methods return
exact valid values.

The following script demonstrates usage:

echo Red [00 - ff]
echo 33 > /sys/devices/platform/asus-nb-wmi/kbbl/kbbl_red
echo Green [00 - ff]
echo ff > /sys/devices/platform/asus-nb-wmi/kbbl/kbbl_green
echo Blue [00 - ff]
echo 0 > /sys/devices/platform/asus-nb-wmi/kbbl/kbbl_blue
echo Mode: 0 - static color, 1 - blink, 2 - rainbow, 3 - strobe
echo 0 > /sys/devices/platform/asus-nb-wmi/kbbl/kbbl_mode
echo Speed for modes 1 and 2: 0 - slow, 1 - medium, 2 - fast
echo 0 > /sys/devices/platform/asus-nb-wmi/kbbl/kbbl_speed
echo Enable: 02 - on boot, before module load, 08 - awake, 20 - sleep,
echo 2a or ff to set all
echo 2a > /sys/devices/platform/asus-nb-wmi/kbbl/kbbl_flags
echo Save: 1 - permanently, 2 - temporarily, reset after reboot
echo 1 > /sys/devices/platform/asus-nb-wmi/kbbl/kbbl_set

Signed-off-by: Yurii Pavlovskyi <yurii.pavlovskyi@gmail.com>
---
 .../ABI/testing/sysfs-platform-asus-wmi       |  61 ++++
 drivers/platform/x86/asus-wmi.c               | 329 ++++++++++++++++++
 include/linux/platform_data/x86/asus-wmi.h    |   2 +
 3 files changed, 392 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-platform-asus-wmi b/Documentation/ABI/testing/sysfs-platform-asus-wmi
index 019e1e29370e..300a40519695 100644
--- a/Documentation/ABI/testing/sysfs-platform-asus-wmi
+++ b/Documentation/ABI/testing/sysfs-platform-asus-wmi
@@ -36,3 +36,64 @@ KernelVersion:	3.5
 Contact:	"AceLan Kao" <acelan.kao@canonical.com>
 Description:
 		Resume on lid open. 1 means on, 0 means off.
+
+What:		/sys/devices/platform/<platform>/kbbl/kbbl_red
+Date:		Apr 2019
+KernelVersion:	5.1
+Contact:	"Yurii Pavlovskyi" <yurii.pavlovskyi@gmail.com>
+Description:
+		RGB keyboard backlight red component: 00 .. ff.
+
+What:		/sys/devices/platform/<platform>/kbbl/kbbl_green
+Date:		Apr 2019
+KernelVersion:	5.1
+Contact:	"Yurii Pavlovskyi" <yurii.pavlovskyi@gmail.com>
+Description:
+		RGB keyboard backlight green component: 00 .. ff.
+
+What:		/sys/devices/platform/<platform>/kbbl/kbbl_blue
+Date:		Apr 2019
+KernelVersion:	5.1
+Contact:	"Yurii Pavlovskyi" <yurii.pavlovskyi@gmail.com>
+Description:
+		RGB keyboard backlight blue component: 00 .. ff.
+
+What:		/sys/devices/platform/<platform>/kbbl/kbbl_mode
+Date:		Apr 2019
+KernelVersion:	5.1
+Contact:	"Yurii Pavlovskyi" <yurii.pavlovskyi@gmail.com>
+Description:
+		RGB keyboard backlight mode:
+			* 0 - static color,
+			* 1 - blink,
+			* 2 - rainbow,
+			* 3 - strobe.
+
+What:		/sys/devices/platform/<platform>/kbbl/kbbl_speed
+Date:		Apr 2019
+KernelVersion:	5.1
+Contact:	"Yurii Pavlovskyi" <yurii.pavlovskyi@gmail.com>
+Description:
+		RGB keyboard backlight speed for modes 1 and 2:
+			* 0 - slow,
+			* 1 - medium,
+			* 2 - fast.
+
+What:		/sys/devices/platform/<platform>/kbbl/kbbl_flags
+Date:		Apr 2019
+KernelVersion:	5.1
+Contact:	"Yurii Pavlovskyi" <yurii.pavlovskyi@gmail.com>
+Description:
+		RGB keyboard backlight enable flags (2a to enable everything), OR of:
+			* 02 - on boot (until module load),
+			* 08 - awake,
+			* 20 - sleep.
+
+What:		/sys/devices/platform/<platform>/kbbl/kbbl_set
+Date:		Apr 2019
+KernelVersion:	5.1
+Contact:	"Yurii Pavlovskyi" <yurii.pavlovskyi@gmail.com>
+Description:
+		Write changed RGB keyboard backlight parameters:
+			* 1 - permanently,
+			* 2 - temporarily.
diff --git a/drivers/platform/x86/asus-wmi.c b/drivers/platform/x86/asus-wmi.c
index de0a8f61d4a1..b4fd200e8335 100644
--- a/drivers/platform/x86/asus-wmi.c
+++ b/drivers/platform/x86/asus-wmi.c
@@ -145,6 +145,21 @@ struct asus_rfkill {
 	u32 dev_id;
 };
 
+struct asus_kbbl_rgb {
+	u8 kbbl_red;
+	u8 kbbl_green;
+	u8 kbbl_blue;
+	u8 kbbl_mode;
+	u8 kbbl_speed;
+
+	u8 kbbl_set_red;
+	u8 kbbl_set_green;
+	u8 kbbl_set_blue;
+	u8 kbbl_set_mode;
+	u8 kbbl_set_speed;
+	u8 kbbl_set_flags;
+};
+
 struct asus_wmi {
 	int dsts_id;
 	int spec;
@@ -181,6 +196,9 @@ struct asus_wmi {
 	int asus_hwmon_num_fans;
 	int asus_hwmon_pwm;
 
+	bool kbbl_rgb_available;
+	struct asus_kbbl_rgb kbbl_rgb;
+
 	struct hotplug_slot hotplug_slot;
 	struct mutex hotplug_lock;
 	struct mutex wmi_lock;
@@ -656,6 +674,310 @@ static int asus_wmi_led_init(struct asus_wmi *asus)
 	return rv;
 }
 
+/* RGB keyboard backlight *****************************************************/
+
+static ssize_t show_u8(u8 value, char *buf)
+{
+	return scnprintf(buf, PAGE_SIZE, "%02x\n", value);
+}
+
+static ssize_t store_u8(u8 *value, const char *buf, int count)
+{
+	int err;
+	u8 result;
+
+	err = kstrtou8(buf, 16, &result);
+	if (err < 0) {
+		pr_warn("Trying to store invalid value\n");
+		return err;
+	}
+
+	*value = result;
+
+	return count;
+}
+
+static ssize_t kbbl_red_show(struct device *dev, struct device_attribute *attr,
+		char *buf)
+{
+	struct asus_wmi *asus = dev_get_drvdata(dev);
+
+	return show_u8(asus->kbbl_rgb.kbbl_red, buf);
+}
+
+static ssize_t kbbl_red_store(struct device *dev, struct device_attribute *attr,
+		const char *buf, size_t count)
+{
+	struct asus_wmi *asus = dev_get_drvdata(dev);
+
+	return store_u8(&asus->kbbl_rgb.kbbl_set_red, buf, count);
+}
+
+static ssize_t kbbl_green_show(struct device *dev,
+		struct device_attribute *attr, char *buf)
+{
+	struct asus_wmi *asus = dev_get_drvdata(dev);
+
+	return show_u8(asus->kbbl_rgb.kbbl_green, buf);
+}
+
+static ssize_t kbbl_green_store(struct device *dev,
+		struct device_attribute *attr, const char *buf, size_t count)
+{
+	struct asus_wmi *asus = dev_get_drvdata(dev);
+
+	return store_u8(&asus->kbbl_rgb.kbbl_set_green, buf, count);
+}
+
+static ssize_t kbbl_blue_show(struct device *dev, struct device_attribute *attr,
+		char *buf)
+{
+	struct asus_wmi *asus = dev_get_drvdata(dev);
+
+	return show_u8(asus->kbbl_rgb.kbbl_blue, buf);
+}
+
+static ssize_t kbbl_blue_store(struct device *dev,
+		struct device_attribute *attr, const char *buf, size_t count)
+{
+	struct asus_wmi *asus = dev_get_drvdata(dev);
+
+	return store_u8(&asus->kbbl_rgb.kbbl_set_blue, buf, count);
+}
+
+static ssize_t kbbl_mode_show(struct device *dev, struct device_attribute *attr,
+		char *buf)
+{
+	struct asus_wmi *asus = dev_get_drvdata(dev);
+
+	return show_u8(asus->kbbl_rgb.kbbl_mode, buf);
+}
+
+static ssize_t kbbl_mode_store(struct device *dev,
+		struct device_attribute *attr, const char *buf, size_t count)
+{
+	struct asus_wmi *asus = dev_get_drvdata(dev);
+
+	return store_u8(&asus->kbbl_rgb.kbbl_set_mode, buf, count);
+}
+
+static ssize_t kbbl_speed_show(struct device *dev,
+		struct device_attribute *attr, char *buf)
+{
+	struct asus_wmi *asus = dev_get_drvdata(dev);
+
+	return show_u8(asus->kbbl_rgb.kbbl_speed, buf);
+}
+
+static ssize_t kbbl_speed_store(struct device *dev,
+		struct device_attribute *attr, const char *buf, size_t count)
+{
+	struct asus_wmi *asus = dev_get_drvdata(dev);
+
+	return store_u8(&asus->kbbl_rgb.kbbl_set_speed, buf, count);
+}
+
+static ssize_t kbbl_flags_show(struct device *dev,
+		struct device_attribute *attr, char *buf)
+{
+	struct asus_wmi *asus = dev_get_drvdata(dev);
+
+	return show_u8(asus->kbbl_rgb.kbbl_set_flags, buf);
+}
+
+static ssize_t kbbl_flags_store(struct device *dev,
+		struct device_attribute *attr, const char *buf, size_t count)
+{
+	struct asus_wmi *asus = dev_get_drvdata(dev);
+
+	return store_u8(&asus->kbbl_rgb.kbbl_set_flags, buf, count);
+}
+
+static ssize_t kbbl_set_show(struct device *dev,
+		struct device_attribute *attr, char *buf)
+{
+	return scnprintf(buf, PAGE_SIZE,
+			"Write to configure RGB keyboard backlight\n");
+}
+
+static int kbbl_rgb_write(struct asus_wmi *asus, int persistent)
+{
+	int err;
+	u32 retval;
+	u8 speed_byte;
+	u8 mode_byte;
+	u8 speed;
+	u8 mode;
+
+	speed = asus->kbbl_rgb.kbbl_set_speed;
+	switch (speed) {
+	case 0:
+	default:
+		speed_byte = 0xe1; // slow
+		speed = 0;
+		break;
+	case 1:
+		speed_byte = 0xeb; // medium
+		break;
+	case 2:
+		speed_byte = 0xf5; // fast
+		break;
+	}
+
+	mode = asus->kbbl_rgb.kbbl_set_mode;
+	switch (mode) {
+	case 0:
+	default:
+		mode_byte = 0x00; // static color
+		mode = 0;
+		break;
+	case 1:
+		mode_byte = 0x01; // blink
+		break;
+	case 2:
+		mode_byte = 0x02; // rainbow
+		break;
+	case 3:
+		mode_byte = 0x0a; // strobe
+		break;
+	}
+
+	err = asus_wmi_evaluate_method_3dw(ASUS_WMI_METHODID_DEVS,
+		ASUS_WMI_DEVID_KBD_RGB,
+		(persistent ? 0xb4 : 0xb3) |
+		(mode_byte << 8) |
+		(asus->kbbl_rgb.kbbl_set_red << 16) |
+		(asus->kbbl_rgb.kbbl_set_green << 24),
+		(asus->kbbl_rgb.kbbl_set_blue) |
+		(speed_byte << 8), &retval);
+	if (err) {
+		pr_warn("RGB keyboard device 1, write error: %d\n", err);
+		return err;
+	}
+
+	if (retval != 1) {
+		pr_warn("RGB keyboard device 1, write error (retval): %x\n",
+				retval);
+		return -EIO;
+	}
+
+	err = asus_wmi_evaluate_method_3dw(ASUS_WMI_METHODID_DEVS,
+		ASUS_WMI_DEVID_KBD_RGB2,
+		(0xbd) |
+		(asus->kbbl_rgb.kbbl_set_flags << 16) |
+		(persistent ? 0x0100 : 0x0000), 0, &retval);
+	if (err) {
+		pr_warn("RGB keyboard device 2, write error: %d\n", err);
+		return err;
+	}
+
+	if (retval != 1) {
+		pr_warn("RGB keyboard device 2, write error (retval): %x\n",
+				retval);
+		return -EIO;
+	}
+
+	asus->kbbl_rgb.kbbl_red = asus->kbbl_rgb.kbbl_set_red;
+	asus->kbbl_rgb.kbbl_green = asus->kbbl_rgb.kbbl_set_green;
+	asus->kbbl_rgb.kbbl_blue = asus->kbbl_rgb.kbbl_set_blue;
+	asus->kbbl_rgb.kbbl_mode = mode;
+	asus->kbbl_rgb.kbbl_speed = speed;
+
+	return 0;
+}
+
+static ssize_t kbbl_set_store(struct device *dev,
+		struct device_attribute *attr, const char *buf, size_t count)
+{
+	u8 value;
+	struct asus_wmi *asus;
+	int result;
+
+	asus = dev_get_drvdata(dev);
+	result = store_u8(&value, buf, count);
+	if (result < 0)
+		return result;
+
+	if (value == 1)
+		kbbl_rgb_write(asus, 1);
+	else if (value == 2)
+		kbbl_rgb_write(asus, 0);
+
+	return count;
+}
+
+/* RGB values: 00 .. ff */
+static DEVICE_ATTR_RW(kbbl_red);
+static DEVICE_ATTR_RW(kbbl_green);
+static DEVICE_ATTR_RW(kbbl_blue);
+
+/* Color modes: 0 - static color, 1 - blink, 2 - rainbow, 3 - strobe */
+static DEVICE_ATTR_RW(kbbl_mode);
+
+/* Speed for modes 1 and 2: 0 - slow, 1 - medium, 2 - fast */
+static DEVICE_ATTR_RW(kbbl_speed);
+
+/*
+ * Enable: 02 - on boot (until module load) | 08 - awake | 20 - sleep
+ * (2a or ff to enable everything)
+ *
+ * Logically 80 would be shutdown, but no visible effects of this option
+ * were observed so far
+ */
+static DEVICE_ATTR_RW(kbbl_flags);
+
+/* Write data: 1 - permanently, 2 - temporarily (reset after reboot) */
+static DEVICE_ATTR_RW(kbbl_set);
+
+static struct attribute *rgbkb_sysfs_attributes[] = {
+	&dev_attr_kbbl_red.attr,
+	&dev_attr_kbbl_green.attr,
+	&dev_attr_kbbl_blue.attr,
+	&dev_attr_kbbl_mode.attr,
+	&dev_attr_kbbl_speed.attr,
+	&dev_attr_kbbl_flags.attr,
+	&dev_attr_kbbl_set.attr,
+	NULL,
+};
+
+static const struct attribute_group kbbl_attribute_group = {
+	.name = "kbbl",
+	.attrs = rgbkb_sysfs_attributes
+};
+
+static int kbbl_rgb_init(struct asus_wmi *asus)
+{
+	int err;
+
+	err = asus_wmi_get_devstate_simple(asus, ASUS_WMI_DEVID_KBD_RGB);
+	if (err) {
+		if (err == -ENODEV)
+			return 0;
+		else
+			return err;
+	}
+
+	err = asus_wmi_get_devstate_simple(asus, ASUS_WMI_DEVID_KBD_RGB2);
+	if (err) {
+		if (err == -ENODEV)
+			return 0;
+		else
+			return err;
+	}
+
+	asus->kbbl_rgb_available = true;
+	return sysfs_create_group(&asus->platform_device->dev.kobj,
+			&kbbl_attribute_group);
+}
+
+static void kbbl_rgb_exit(struct asus_wmi *asus)
+{
+	if (asus->kbbl_rgb_available) {
+		sysfs_remove_group(&asus->platform_device->dev.kobj,
+				&kbbl_attribute_group);
+	}
+}
+
 /* RF *************************************************************************/
 
 /*
@@ -2211,6 +2533,10 @@ static int asus_wmi_add(struct platform_device *pdev)
 	if (err)
 		goto fail_leds;
 
+	err = kbbl_rgb_init(asus);
+	if (err)
+		goto fail_rgbkb;
+
 	asus_wmi_get_devstate(asus, ASUS_WMI_DEVID_WLAN, &result);
 	if (result & (ASUS_WMI_DSTS_PRESENCE_BIT | ASUS_WMI_DSTS_USER_BIT))
 		asus->driver->wlan_ctrl_by_user = 1;
@@ -2277,6 +2603,8 @@ static int asus_wmi_add(struct platform_device *pdev)
 fail_backlight:
 	asus_wmi_rfkill_exit(asus);
 fail_rfkill:
+	kbbl_rgb_exit(asus);
+fail_rgbkb:
 	asus_wmi_led_exit(asus);
 fail_leds:
 	asus_wmi_hwmon_exit(asus);
@@ -2298,6 +2626,7 @@ static int asus_wmi_remove(struct platform_device *device)
 	asus_wmi_backlight_exit(asus);
 	asus_wmi_input_exit(asus);
 	asus_wmi_led_exit(asus);
+	kbbl_rgb_exit(asus);
 	asus_wmi_rfkill_exit(asus);
 	asus_wmi_debugfs_exit(asus);
 	asus_wmi_platform_exit(asus);
diff --git a/include/linux/platform_data/x86/asus-wmi.h b/include/linux/platform_data/x86/asus-wmi.h
index 53dfc2541960..25b7b653e6d2 100644
--- a/include/linux/platform_data/x86/asus-wmi.h
+++ b/include/linux/platform_data/x86/asus-wmi.h
@@ -57,6 +57,8 @@
 #define ASUS_WMI_DEVID_KBD_BACKLIGHT	0x00050021
 #define ASUS_WMI_DEVID_LIGHT_SENSOR	0x00050022 /* ?? */
 #define ASUS_WMI_DEVID_LIGHTBAR		0x00050025
+#define ASUS_WMI_DEVID_KBD_RGB		0x00100056
+#define ASUS_WMI_DEVID_KBD_RGB2		0x00100057
 
 /* Misc */
 #define ASUS_WMI_DEVID_CAMERA		0x00060013
-- 
2.17.1

^ permalink raw reply related

* Re: [PATCH v14 1/3] /proc/pid/status: Add support for architecture specific output
From: Li, Aubrey @ 2019-04-11  1:02 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Thomas Gleixner, Ingo Molnar, Peter Zijlstra, H. Peter Anvin,
	Andi Kleen, Tim Chen, Dave Hansen, Arjan van de Ven,
	Alexey Dobriyan, Andrew Morton, aubrey.li, Linux API, LKML
In-Reply-To: <CALCETrUPHg8b-aRpo3NhaedGMJgQxOJjgQN_9fjubnDWEzo+aA@mail.gmail.com>

On 2019/4/10 22:54, Andy Lutomirski wrote:
> On Tue, Apr 9, 2019 at 8:40 PM Li, Aubrey <aubrey.li@linux.intel.com> wrote:
>>
>> On 2019/4/10 10:36, Li, Aubrey wrote:
>>> On 2019/4/10 10:25, Andy Lutomirski wrote:
>>>> On Tue, Apr 9, 2019 at 7:20 PM Li, Aubrey <aubrey.li@linux.intel.com> wrote:
>>>>>
>>>>> On 2019/4/10 9:58, Andy Lutomirski wrote:
>>>>>> On Tue, Apr 9, 2019 at 6:55 PM Aubrey Li <aubrey.li@linux.intel.com> wrote:
>>>>>>>
>>>>>>> The architecture specific information of the running processes could
>>>>>>> be useful to the userland. Add support to examine process architecture
>>>>>>> specific information externally.
>>>>>>>
>>>>>>> Signed-off-by: Aubrey Li <aubrey.li@linux.intel.com>
>>>>>>> Cc: Peter Zijlstra <peterz@infradead.org>
>>>>>>> Cc: Andi Kleen <ak@linux.intel.com>
>>>>>>> Cc: Tim Chen <tim.c.chen@linux.intel.com>
>>>>>>> Cc: Dave Hansen <dave.hansen@intel.com>
>>>>>>> Cc: Arjan van de Ven <arjan@linux.intel.com>
>>>>>>> Cc: Linux API <linux-api@vger.kernel.org>
>>>>>>> Cc: Alexey Dobriyan <adobriyan@gmail.com>
>>>>>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>>>>>> ---
>>>>>>>  fs/proc/array.c         | 5 +++++
>>>>>>>  include/linux/proc_fs.h | 2 ++
>>>>>>>  2 files changed, 7 insertions(+)
>>>>>>>
>>>>>>> diff --git a/fs/proc/array.c b/fs/proc/array.c
>>>>>>> index 2edbb657f859..331592a61718 100644
>>>>>>> --- a/fs/proc/array.c
>>>>>>> +++ b/fs/proc/array.c
>>>>>>> @@ -401,6 +401,10 @@ static inline void task_thp_status(struct seq_file *m, struct mm_struct *mm)
>>>>>>>         seq_printf(m, "THP_enabled:\t%d\n", thp_enabled);
>>>>>>>  }
>>>>>>>
>>>>>>> +void __weak arch_proc_pid_status(struct seq_file *m, struct task_struct *task)
>>>>>>> +{
>>>>>>> +}
>>>>>>
>>>>>> This pointlessly bloats other architectures.  Do this instead in an
>>>>>> appropriate header:
>>>>>>
>>>>>> #ifndef arch_proc_pid_status
>>>>>> static inline void arch_proc_pid_status(...)
>>>>>> {
>>>>>> }
>>>>>> #endif
>>>>>>
>>>>>
>>>>> I saw a bunch of similar weak functions, is it not acceptable?
>>>>>
>>>>> fs/proc$ grep weak *.c
>>>>> cpuinfo.c:__weak void arch_freq_prepare_all(void)
>>>>> meminfo.c:void __attribute__((weak)) arch_report_meminfo(struct seq_file *m)
>>>>> vmcore.c:int __weak elfcorehdr_alloc(unsigned long long *addr, unsigned long long *size)
>>>>> vmcore.c:void __weak elfcorehdr_free(unsigned long long addr)
>>>>> vmcore.c:ssize_t __weak elfcorehdr_read(char *buf, size_t count, u64 *ppos)
>>>>> vmcore.c:ssize_t __weak elfcorehdr_read_notes(char *buf, size_t count, u64 *ppos)
>>>>> vmcore.c:int __weak remap_oldmem_pfn_range(struct vm_area_struct *vma,
>>>>> vmcore.c:ssize_t __weak
>>>>
>>>> I think they're acceptable, but I don't personally like them.
>>>>
>>>
>>> okay, let me try to see if I can refine it in an appropriate way.
>>
>> Hi Andy,
>>
>> Is this what you want?
>>
>> Thanks,
>> -Aubrey
>>
>> ====================================================================
>> diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
>> index 2bb3a648fc12..82d77d3aefff 100644
>> --- a/arch/x86/include/asm/processor.h
>> +++ b/arch/x86/include/asm/processor.h
>> @@ -990,5 +990,8 @@ enum l1tf_mitigations {
>>  };
>>
>>  extern enum l1tf_mitigations l1tf_mitigation;
>> +/* Add support for architecture specific output in /proc/pid/status */
>> +void arch_proc_pid_status(struct seq_file *m, struct task_struct *task);
>> +#define arch_proc_pid_status arch_proc_pid_status
>>
>>  #endif /* _ASM_X86_PROCESSOR_H */
>> diff --git a/fs/proc/array.c b/fs/proc/array.c
>> index 2edbb657f859..fd65a6ba2864 100644
>> --- a/fs/proc/array.c
>> +++ b/fs/proc/array.c
>> @@ -401,6 +401,11 @@ static inline void task_thp_status(struct seq_file *m, struct mm_struct *mm)
>>         seq_printf(m, "THP_enabled:\t%d\n", thp_enabled);
>>  }
>>
>> +/* Add support for architecture specific output in /proc/pid/status */
>> +#ifndef arch_proc_pid_status
>> +#define arch_proc_pid_status(m, task)
>> +#endif
>> +
>>  int proc_pid_status(struct seq_file *m, struct pid_namespace *ns,
>>                         struct pid *pid, struct task_struct *task)
>>  {
>> @@ -424,6 +429,7 @@ int proc_pid_status(struct seq_file *m, struct pid_namespace *ns,
>>         task_cpus_allowed(m, task);
>>         cpuset_task_status_allowed(m, task);
>>         task_context_switch_counts(m, task);
>> +       arch_proc_pid_status(m, task);
>>         return 0;
>>  }
>>
> 
> Yes.  But I still think it would be nicer to separate the arch stuff
> into its own file.  Others might reasonably disagree with me.
> 
I like arch_status, I proposed but no other arch shows interesting in it.

I think the problem is similar for x86_status, it does not make sense for
those x86 platform without AVX512 to have an empty arch file. I personally
don't like [arch]_status because the code may become unclean if more arches
added in future.

Maybe it's too early to have a separated arch staff file for now.

Thanks,
-Aubrey

^ permalink raw reply

* Re: [RFC PATCH] fork: add CLONE_PIDFD
From: Daniel Colascione @ 2019-04-11  0:12 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Linus Torvalds, Al Viro, Jann Horn, David Howells, Linux API,
	linux-kernel, Serge E. Hallyn, Andy Lutomirski, Arnd Bergmann,
	Eric W. Biederman, Kees Cook, Alexey Dobriyan, Thomas Gleixner,
	Michael Kerrisk-manpages, Jonathan Kowalski, Dmitry V. Levin,
	Andrew Morton, Oleg Nesterov, Aleksa Sarai, Joel
In-Reply-To: <20190410234045.29846-1-christian@brauner.io>

Thanks for trying it both ways.

On Wed, Apr 10, 2019 at 4:43 PM Christian Brauner <christian@brauner.io> wrote:
>
> Hey Linus,
>
> This is an RFC for adding a new CLONE_PIDFD flag to clone() as
> previously discussed.
> While implementing this Jann and I ran into additional complexity that
> prompted us to send out an initial RFC patchset to make sure we still
> think going forward with the current implementation is a good idea and
> also provide an alternative approach:
>
> RFC-1:
> This is an RFC for the implementation of pidfds as /proc/<pid> file
> descriptors.
> The tricky part here is that we need to retrieve a file descriptor for
> /proc/<pid> before clone's point of no return. Otherwise, we need to fail
> the creation of a process that has already passed all barriers and is
> visible in userspace. Getting that file descriptor then becomes a rather
> intricate dance including allocating a detached dentry that we need to
> commit once attach_pid() has been called.
> Note that this RFC only includes the logic we think is needed to return
> /proc/<pid> file descriptors from clone. It does *not* yet include the even
> more complex logic needed to restrict procfs itself. And the additional
> logic needed to prevent attacks such as openat(pidfd, "..", ...) and access
> to /proc/<pid>/net/ on top of the procfs restriction.

Why would filtering proc be all that complicated? Wouldn't it just be
adding a "sensitive" flag to struct pid_entry and skipping entries
with that flag when constructing proc entries?

> There are a couple of reasons why we stopped short of this and decided to
> sent out an RFC first:
> - Even the initial part of getting file descriptors from /proc/<pid> out
>   of clone() required rather complex code that struck us as very
>   inelegant and heavy (which granted, might partially caused by not seeing
>   a cleaner way to implement this). Thus, it felt like we needed to see
>   whether this is even remotely considered acceptable.
> - While discussing further aspects of this approach with Al we received
>   rather substantiated opposition to exposing even more codepaths to
>   procfs.
> - Restricting access to procfs properly requires a lot of invasive work
>   even touching core vfs functions such as
>   follow_dotdot()/follow_dotdot_rcu() which also caused 2.

Wasn't an internal bind mount supposed to take care of the parent
traversal problem?

^ permalink raw reply

* Re: [RFC-2 PATCH 4/4] samples: show race-free pidfd metadata access
From: Daniel Colascione @ 2019-04-11  0:08 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Linus Torvalds, Al Viro, Jann Horn, David Howells, Linux API,
	linux-kernel, Serge E. Hallyn, Andy Lutomirski, Arnd Bergmann,
	Eric W. Biederman, Kees Cook, Alexey Dobriyan, Thomas Gleixner,
	Michael Kerrisk-manpages, Jonathan Kowalski, Dmitry V. Levin,
	Andrew Morton, Oleg Nesterov, Aleksa Sarai, Joel
In-Reply-To: <20190410234045.29846-6-christian@brauner.io>

Thanks for providing this example. A new nits below.

On Wed, Apr 10, 2019 at 4:43 PM Christian Brauner <christian@brauner.io> wrote:
>
> This is an sample program to show userspace how to get race-free access to
> process metadata from a pidfd.
> It is really not that difficult and instead of burdening the kernel with
> this task by using fds to /proc/<pid> we can simply add a helper to libc
> that does it for the user.
>
> Signed-off-by: Christian Brauner <christian@brauner.io>
> Signed-off-by: Jann Horn <jann@thejh.net>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: "Eric W. Biederman" <ebiederm@xmission.com>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Alexey Dobriyan <adobriyan@gmail.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: David Howells <dhowells@redhat.com>
> Cc: "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com>
> Cc: Jonathan Kowalski <bl0pbl33p@gmail.com>
> Cc: "Dmitry V. Levin" <ldv@altlinux.org>
> Cc: Andy Lutomirsky <luto@kernel.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Oleg Nesterov <oleg@redhat.com>
> Cc: Aleksa Sarai <cyphar@cyphar.com>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Al Viro <viro@zeniv.linux.org.uk>
> ---
>  samples/Makefile               |   2 +-
>  samples/pidfd/Makefile         |   6 ++
>  samples/pidfd/pidfd-metadata.c | 169 +++++++++++++++++++++++++++++++++
>  3 files changed, 176 insertions(+), 1 deletion(-)
>  create mode 100644 samples/pidfd/Makefile
>  create mode 100644 samples/pidfd/pidfd-metadata.c
>
> diff --git a/samples/Makefile b/samples/Makefile
> index b1142a958811..fadadb1c3b05 100644
> --- a/samples/Makefile
> +++ b/samples/Makefile
> @@ -3,4 +3,4 @@
>  obj-$(CONFIG_SAMPLES)  += kobject/ kprobes/ trace_events/ livepatch/ \
>                            hw_breakpoint/ kfifo/ kdb/ hidraw/ rpmsg/ seccomp/ \
>                            configfs/ connector/ v4l/ trace_printk/ \
> -                          vfio-mdev/ statx/ qmi/ binderfs/
> +                          vfio-mdev/ statx/ qmi/ binderfs/ pidfd/
> diff --git a/samples/pidfd/Makefile b/samples/pidfd/Makefile
> new file mode 100644
> index 000000000000..0ff97784177a
> --- /dev/null
> +++ b/samples/pidfd/Makefile
> @@ -0,0 +1,6 @@
> +# SPDX-License-Identifier: GPL-2.0
> +
> +hostprogs-y := pidfd-metadata
> +always := $(hostprogs-y)
> +HOSTCFLAGS_pidfd-metadata.o += -I$(objtree)/usr/include
> +all: pidfd-metadata
> diff --git a/samples/pidfd/pidfd-metadata.c b/samples/pidfd/pidfd-metadata.c
> new file mode 100644
> index 000000000000..c46c6c34a012
> --- /dev/null
> +++ b/samples/pidfd/pidfd-metadata.c
> @@ -0,0 +1,169 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#define _GNU_SOURCE
> +#include <err.h>
> +#include <errno.h>
> +#include <fcntl.h>
> +#include <inttypes.h>
> +#include <limits.h>
> +#include <sched.h>
> +#include <signal.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <sys/stat.h>
> +#include <sys/syscall.h>
> +#include <sys/types.h>
> +#include <sys/wait.h>
> +#include <unistd.h>
> +
> +#ifndef CLONE_PIDFD
> +#define CLONE_PIDFD 0x00001000
> +#endif
> +
> +static int raw_clone_pidfd(void)
> +{
> +       unsigned long flags = CLONE_PIDFD;
> +
> +#if defined(__s390x__) || defined(__s390__) || defined(__CRIS__)
> +       /* On s390/s390x and cris the order of the first and second arguments
> +         * of the system call is reversed.
> +         */
> +       return (int)syscall(__NR_clone, NULL, flags | SIGCHLD);
> +#elif defined(__sparc__) && defined(__arch64__)
> +       {
> +               /*
> +                 * sparc64 always returns the other process id in %o0, and a
> +                 * boolean flag whether this is the child or the parent in %o1.
> +                 * Inline assembly is needed to get the flag returned in %o1.
> +                 */
> +               int in_child;
> +               int child_pid;
> +               asm volatile("mov %2, %%g1\n\t"
> +                            "mov %3, %%o0\n\t"
> +                            "mov 0 , %%o1\n\t"
> +                            "t 0x6d\n\t"
> +                            "mov %%o1, %0\n\t"
> +                            "mov %%o0, %1"
> +                            : "=r"(in_child), "=r"(child_pid)
> +                            : "i"(__NR_clone), "r"(flags | SIGCHLD)
> +                            : "%o1", "%o0", "%g1");
> +
> +               if (in_child)
> +                       return 0;
> +               else
> +                       return child_pid;
> +       }
> +#elif defined(__ia64__)
> +       /* On ia64 the stack and stack size are passed as separate arguments. */
> +       return (int)syscall(__NR_clone, flags | SIGCHLD, NULL, prctl_arg(0));
> +#else
> +       return (int)syscall(__NR_clone, flags | SIGCHLD, NULL);
> +#endif
> +}
> +
> +static inline int sys_pidfd_send_signal(int pidfd, int sig, siginfo_t *info,
> +                                       unsigned int flags)
> +{
> +       return syscall(__NR_pidfd_send_signal, pidfd, sig, info, flags);
> +}
> +
> +static int pidfd_metadata_fd(int pidfd)
> +{
> +       int procfd, ret;
> +       char path[100];
> +       FILE *f;
> +       size_t n = 0;
> +       char *line = NULL;
> +
> +       snprintf(path, sizeof(path), "/proc/self/fdinfo/%d", pidfd);
> +
> +       f = fopen(path, "re");
> +       if (!f)
> +               return -1;
> +
> +       ret = 0;
> +       while (getline(&line, &n, f) != -1) {
> +               char *numstr;
> +               size_t len;
> +
> +               if (strncmp(line, "Pid:\t", 5))
> +                       continue;
> +
> +               numstr = line + 5;
> +               len = strlen(numstr);
> +               if (len > 0 && numstr[len - 1] == '\n')
> +                       numstr[len - 1] = '\0';
> +               ret = snprintf(path, sizeof(path), "/proc/%s", numstr);
> +               break;
> +       }
> +       free(line);
> +       fclose(f);
> +
> +       if (!ret) {
> +               errno = ENOENT;
> +               warn("Failed to parse pid from fdinfo\n");
> +               return -1;
> +       }
> +
> +       procfd = open(path, O_DIRECTORY | O_RDONLY | O_CLOEXEC);
> +       if (procfd < 0) {
> +               warn("Failed to open %s\n", path);
> +               return -1;
> +       }
> +
> +       /*
> +        * Verify that the pid has not been recycled and our /proc/<pid> handle
> +        * is still valid.
> +        */
> +       if (sys_pidfd_send_signal(pidfd, 0, NULL, 0) < 0) {
> +               /* process does not exist */
> +               if (errno == ESRCH) {
> +                       warn("The pid was recycled\n");

ITYM that the process was reaped.

> +                       close(procfd);
> +                       return -1;
> +               }
> +
> +               /* just not allowed to signal it */

I'd look for EPERM specifically instead of just assuming that any
error indicates that a permission failure. I'd also explicitly state
that EPERM still implies process existence.

> +       }
> +
> +       return procfd;
> +}
> +
> +int main(int argc, char *argv[])
> +{
> +       int procfd, ret = EXIT_FAILURE;
> +       ssize_t bytes;
> +       char buf[4096] = { 0 };
> +
> +       int pidfd = raw_clone_pidfd();
> +       if (pidfd < 0)
> +               return -1;
> +
> +       if (pidfd == 0) {
> +               printf("%d\n", getpid());
> +               exit(EXIT_SUCCESS);
> +       }
> +
> +       procfd = pidfd_metadata_fd(pidfd);
> +       close(pidfd);
> +       if (procfd < 0)
> +               goto out;
> +
> +       int statusfd = openat(procfd, "status", O_RDONLY | O_CLOEXEC);
> +       close(procfd);
> +       if (statusfd < 0)
> +               goto out;
> +
> +       bytes = read(statusfd, buf, sizeof(buf));
> +       if (bytes > 0)
> +               bytes = write(STDOUT_FILENO, buf, bytes);
> +       close(statusfd);
> +
> +out:
> +       (void)wait(NULL);
> +       if (bytes < 0 || ret)
> +               exit(EXIT_FAILURE);
> +
> +       exit(EXIT_SUCCESS);
> +}
> --
> 2.21.0
>

^ permalink raw reply

* [RFC-2 PATCH 4/4] samples: show race-free pidfd metadata access
From: Christian Brauner @ 2019-04-10 23:40 UTC (permalink / raw)
  To: torvalds, viro, jannh, dhowells, linux-api, linux-kernel
  Cc: serge, luto, arnd, ebiederm, keescook, adobriyan, tglx,
	mtk.manpages, bl0pbl33p, ldv, akpm, oleg, cyphar, joel, dancol,
	Christian Brauner
In-Reply-To: <20190410234045.29846-1-christian@brauner.io>

This is an sample program to show userspace how to get race-free access to
process metadata from a pidfd.
It is really not that difficult and instead of burdening the kernel with
this task by using fds to /proc/<pid> we can simply add a helper to libc
that does it for the user.

Signed-off-by: Christian Brauner <christian@brauner.io>
Signed-off-by: Jann Horn <jann@thejh.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: David Howells <dhowells@redhat.com>
Cc: "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com>
Cc: Jonathan Kowalski <bl0pbl33p@gmail.com>
Cc: "Dmitry V. Levin" <ldv@altlinux.org>
Cc: Andy Lutomirsky <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
---
 samples/Makefile               |   2 +-
 samples/pidfd/Makefile         |   6 ++
 samples/pidfd/pidfd-metadata.c | 169 +++++++++++++++++++++++++++++++++
 3 files changed, 176 insertions(+), 1 deletion(-)
 create mode 100644 samples/pidfd/Makefile
 create mode 100644 samples/pidfd/pidfd-metadata.c

diff --git a/samples/Makefile b/samples/Makefile
index b1142a958811..fadadb1c3b05 100644
--- a/samples/Makefile
+++ b/samples/Makefile
@@ -3,4 +3,4 @@
 obj-$(CONFIG_SAMPLES)	+= kobject/ kprobes/ trace_events/ livepatch/ \
 			   hw_breakpoint/ kfifo/ kdb/ hidraw/ rpmsg/ seccomp/ \
 			   configfs/ connector/ v4l/ trace_printk/ \
-			   vfio-mdev/ statx/ qmi/ binderfs/
+			   vfio-mdev/ statx/ qmi/ binderfs/ pidfd/
diff --git a/samples/pidfd/Makefile b/samples/pidfd/Makefile
new file mode 100644
index 000000000000..0ff97784177a
--- /dev/null
+++ b/samples/pidfd/Makefile
@@ -0,0 +1,6 @@
+# SPDX-License-Identifier: GPL-2.0
+
+hostprogs-y := pidfd-metadata
+always := $(hostprogs-y)
+HOSTCFLAGS_pidfd-metadata.o += -I$(objtree)/usr/include
+all: pidfd-metadata
diff --git a/samples/pidfd/pidfd-metadata.c b/samples/pidfd/pidfd-metadata.c
new file mode 100644
index 000000000000..c46c6c34a012
--- /dev/null
+++ b/samples/pidfd/pidfd-metadata.c
@@ -0,0 +1,169 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#define _GNU_SOURCE
+#include <err.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <inttypes.h>
+#include <limits.h>
+#include <sched.h>
+#include <signal.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/stat.h>
+#include <sys/syscall.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+#include <unistd.h>
+
+#ifndef CLONE_PIDFD
+#define CLONE_PIDFD 0x00001000
+#endif
+
+static int raw_clone_pidfd(void)
+{
+	unsigned long flags = CLONE_PIDFD;
+
+#if defined(__s390x__) || defined(__s390__) || defined(__CRIS__)
+	/* On s390/s390x and cris the order of the first and second arguments
+         * of the system call is reversed.
+         */
+	return (int)syscall(__NR_clone, NULL, flags | SIGCHLD);
+#elif defined(__sparc__) && defined(__arch64__)
+	{
+		/*
+                 * sparc64 always returns the other process id in %o0, and a
+                 * boolean flag whether this is the child or the parent in %o1.
+                 * Inline assembly is needed to get the flag returned in %o1.
+                 */
+		int in_child;
+		int child_pid;
+		asm volatile("mov %2, %%g1\n\t"
+			     "mov %3, %%o0\n\t"
+			     "mov 0 , %%o1\n\t"
+			     "t 0x6d\n\t"
+			     "mov %%o1, %0\n\t"
+			     "mov %%o0, %1"
+			     : "=r"(in_child), "=r"(child_pid)
+			     : "i"(__NR_clone), "r"(flags | SIGCHLD)
+			     : "%o1", "%o0", "%g1");
+
+		if (in_child)
+			return 0;
+		else
+			return child_pid;
+	}
+#elif defined(__ia64__)
+	/* On ia64 the stack and stack size are passed as separate arguments. */
+	return (int)syscall(__NR_clone, flags | SIGCHLD, NULL, prctl_arg(0));
+#else
+	return (int)syscall(__NR_clone, flags | SIGCHLD, NULL);
+#endif
+}
+
+static inline int sys_pidfd_send_signal(int pidfd, int sig, siginfo_t *info,
+					unsigned int flags)
+{
+	return syscall(__NR_pidfd_send_signal, pidfd, sig, info, flags);
+}
+
+static int pidfd_metadata_fd(int pidfd)
+{
+	int procfd, ret;
+	char path[100];
+	FILE *f;
+	size_t n = 0;
+	char *line = NULL;
+
+	snprintf(path, sizeof(path), "/proc/self/fdinfo/%d", pidfd);
+
+	f = fopen(path, "re");
+	if (!f)
+		return -1;
+
+	ret = 0;
+	while (getline(&line, &n, f) != -1) {
+		char *numstr;
+		size_t len;
+
+		if (strncmp(line, "Pid:\t", 5))
+			continue;
+
+		numstr = line + 5;
+		len = strlen(numstr);
+		if (len > 0 && numstr[len - 1] == '\n')
+			numstr[len - 1] = '\0';
+		ret = snprintf(path, sizeof(path), "/proc/%s", numstr);
+		break;
+	}
+	free(line);
+	fclose(f);
+
+	if (!ret) {
+		errno = ENOENT;
+		warn("Failed to parse pid from fdinfo\n");
+		return -1;
+	}
+
+	procfd = open(path, O_DIRECTORY | O_RDONLY | O_CLOEXEC);
+	if (procfd < 0) {
+		warn("Failed to open %s\n", path);
+		return -1;
+	}
+
+	/*
+	 * Verify that the pid has not been recycled and our /proc/<pid> handle
+	 * is still valid.
+	 */
+	if (sys_pidfd_send_signal(pidfd, 0, NULL, 0) < 0) {
+		/* process does not exist */
+		if (errno == ESRCH) {
+			warn("The pid was recycled\n");
+			close(procfd);
+			return -1;
+		}
+
+		/* just not allowed to signal it */
+	}
+
+	return procfd;
+}
+
+int main(int argc, char *argv[])
+{
+	int procfd, ret = EXIT_FAILURE;
+	ssize_t bytes;
+	char buf[4096] = { 0 };
+
+	int pidfd = raw_clone_pidfd();
+	if (pidfd < 0)
+		return -1;
+
+	if (pidfd == 0) {
+		printf("%d\n", getpid());
+		exit(EXIT_SUCCESS);
+	}
+
+	procfd = pidfd_metadata_fd(pidfd);
+	close(pidfd);
+	if (procfd < 0)
+		goto out;
+
+	int statusfd = openat(procfd, "status", O_RDONLY | O_CLOEXEC);
+	close(procfd);
+	if (statusfd < 0)
+		goto out;
+
+	bytes = read(statusfd, buf, sizeof(buf));
+	if (bytes > 0)
+		bytes = write(STDOUT_FILENO, buf, bytes);
+	close(statusfd);
+
+out:
+	(void)wait(NULL);
+	if (bytes < 0 || ret)
+		exit(EXIT_FAILURE);
+
+	exit(EXIT_SUCCESS);
+}
-- 
2.21.0

^ permalink raw reply related

* [RFC-2 PATCH 3/4] signal: support CLONE_PIDFD with pidfd_send_signal
From: Christian Brauner @ 2019-04-10 23:40 UTC (permalink / raw)
  To: torvalds, viro, jannh, dhowells, linux-api, linux-kernel
  Cc: serge, luto, arnd, ebiederm, keescook, adobriyan, tglx,
	mtk.manpages, bl0pbl33p, ldv, akpm, oleg, cyphar, joel, dancol,
	Christian Brauner
In-Reply-To: <20190410234045.29846-1-christian@brauner.io>

Let pidfd_send_signal() use pidfds retrieved via CLONE_PIDFD. With this
patch pidfd_send_signal() becomes independent of procfs. This fullfils the
request made when we merged the pidfd_send_signal() patchset. The
pidfd_send_signal() syscall is now always available allowing for it to be
used by users without procfs mounted or even users without procfs support
compiled into the kernel.

Signed-off-by: Christian Brauner <christian@brauner.io>
Signed-off-by: Jann Horn <jann@thejh.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: David Howells <dhowells@redhat.com>
Cc: "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com>
Cc: Jonathan Kowalski <bl0pbl33p@gmail.com>
Cc: "Dmitry V. Levin" <ldv@altlinux.org>
Cc: Andy Lutomirsky <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
---
 kernel/signal.c | 14 ++++++++++----
 kernel/sys_ni.c |  3 ---
 2 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index f98448cf2def..cd83cc376767 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -3513,7 +3513,6 @@ SYSCALL_DEFINE2(kill, pid_t, pid, int, sig)
 	return kill_something_info(sig, &info, pid);
 }
 
-#ifdef CONFIG_PROC_FS
 /*
  * Verify that the signaler and signalee either are in the same pid namespace
  * or that the signaler's pid namespace is an ancestor of the signalee's pid
@@ -3550,6 +3549,14 @@ static int copy_siginfo_from_user_any(kernel_siginfo_t *kinfo, siginfo_t *info)
 	return copy_siginfo_from_user(kinfo, info);
 }
 
+static struct pid *pidfd_to_pid(const struct file *file)
+{
+	if (file->f_op == &pidfd_fops)
+		return file->private_data;
+
+	return tgid_pidfd_to_pid(file);
+}
+
 /**
  * sys_pidfd_send_signal - send a signal to a process through a task file
  *                          descriptor
@@ -3581,12 +3588,12 @@ SYSCALL_DEFINE4(pidfd_send_signal, int, pidfd, int, sig,
 	if (flags)
 		return -EINVAL;
 
-	f = fdget_raw(pidfd);
+	f = fdget(pidfd);
 	if (!f.file)
 		return -EBADF;
 
 	/* Is this a pidfd? */
-	pid = tgid_pidfd_to_pid(f.file);
+	pid = pidfd_to_pid(f.file);
 	if (IS_ERR(pid)) {
 		ret = PTR_ERR(pid);
 		goto err;
@@ -3620,7 +3627,6 @@ SYSCALL_DEFINE4(pidfd_send_signal, int, pidfd, int, sig,
 	fdput(f);
 	return ret;
 }
-#endif /* CONFIG_PROC_FS */
 
 static int
 do_send_specific(pid_t tgid, pid_t pid, int sig, struct kernel_siginfo *info)
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index d21f4befaea4..4d9ae5ea6caf 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -167,9 +167,6 @@ COND_SYSCALL(syslog);
 
 /* kernel/sched/core.c */
 
-/* kernel/signal.c */
-COND_SYSCALL(pidfd_send_signal);
-
 /* kernel/sys.c */
 COND_SYSCALL(setregid);
 COND_SYSCALL(setgid);
-- 
2.21.0

^ permalink raw reply related

* [RFC-2 PATCH 2/4] fork: add CLONE_PIDFD via anonymous inode
From: Christian Brauner @ 2019-04-10 23:40 UTC (permalink / raw)
  To: torvalds, viro, jannh, dhowells, linux-api, linux-kernel
  Cc: serge, luto, arnd, ebiederm, keescook, adobriyan, tglx,
	mtk.manpages, bl0pbl33p, ldv, akpm, oleg, cyphar, joel, dancol,
	Christian Brauner
In-Reply-To: <20190410234045.29846-1-christian@brauner.io>

This patchset makes it possible to retrieve pid file descriptors at process
creation time by introducing a new flag CLONE_PIDFD. As spotted by Linus,
there is exactly one bit left.

In this version of CLONE_PIDFD anonymous inode file descriptors are used.
They serve as a simple opaque handle on pids. Logically, this makes it
possible to interpret a pidfd differently, narrowing or widening the scope
of various operations (e.g. signal sending). Thus, a pidfd cannot just
refer to a tgid, but also a tid, or in theory - given appropriate flag
arguments in relevant syscalls - a process group or session.

This patchset uses anonymous file descriptors instead of file descriptors
from /proc/<pid>.
A pidfd in this style comes with additional information in fdinfo: the pid
of the process it refers to in the current pid namespace (Pid: %d).

Even though originally file descriptors to /proc/<pid> were preferred we
discovered the associated complexity while implementing this solution which
prompted us to implement an alternative and put it up for debate. We have
chosen to implement this alternative to illustrate how strikingly simple
this patchset is in comparision to the original approach.
To remove worries about missing metadata access we have written a POC that
illustrates how a combination of CLONE_PIDFD, fdinfo, and
pidfd_send_signal() can be used to gain race-free access to process
metadata through /proc/<pid>. The sample program can easily be translated
into a helper that would be suitable for inclusion in libc so that users
don't have to worry about writing it themselves.

We hope that this ultimately will be the approach the community prefers.

Signed-off-by: Christian Brauner <christian@brauner.io>
Signed-off-by: Jann Horn <jann@thejh.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: David Howells <dhowells@redhat.com>
Cc: "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com>
Cc: Jonathan Kowalski <bl0pbl33p@gmail.com>
Cc: "Dmitry V. Levin" <ldv@altlinux.org>
Cc: Andy Lutomirsky <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
---
 include/linux/pid.h        |  2 +
 include/uapi/linux/sched.h |  1 +
 kernel/fork.c              | 94 ++++++++++++++++++++++++++++++++++++--
 3 files changed, 92 insertions(+), 5 deletions(-)

diff --git a/include/linux/pid.h b/include/linux/pid.h
index b6f4ba16065a..3c8ef5a199ca 100644
--- a/include/linux/pid.h
+++ b/include/linux/pid.h
@@ -66,6 +66,8 @@ struct pid
 
 extern struct pid init_struct_pid;
 
+extern const struct file_operations pidfd_fops;
+
 static inline struct pid *get_pid(struct pid *pid)
 {
 	if (pid)
diff --git a/include/uapi/linux/sched.h b/include/uapi/linux/sched.h
index 22627f80063e..cd9bd14ce56d 100644
--- a/include/uapi/linux/sched.h
+++ b/include/uapi/linux/sched.h
@@ -10,6 +10,7 @@
 #define CLONE_FS	0x00000200	/* set if fs info shared between processes */
 #define CLONE_FILES	0x00000400	/* set if open files shared between processes */
 #define CLONE_SIGHAND	0x00000800	/* set if signal handlers and blocked signals shared */
+#define CLONE_PIDFD	0x00001000	/* create new pid file descriptor */
 #define CLONE_PTRACE	0x00002000	/* set if we want to let tracing continue on the child too */
 #define CLONE_VFORK	0x00004000	/* set if the parent wants the child to wake it up on mm_release */
 #define CLONE_PARENT	0x00008000	/* set if we want to have the same parent as the cloner */
diff --git a/kernel/fork.c b/kernel/fork.c
index 9dcd18aa210b..5716ea8c32e5 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -11,6 +11,7 @@
  * management can be a bitch. See 'mm/memory.c': 'copy_page_range()'
  */
 
+#include <linux/anon_inodes.h>
 #include <linux/slab.h>
 #include <linux/sched/autogroup.h>
 #include <linux/sched/mm.h>
@@ -21,8 +22,10 @@
 #include <linux/sched/task.h>
 #include <linux/sched/task_stack.h>
 #include <linux/sched/cputime.h>
+#include <linux/seq_file.h>
 #include <linux/rtmutex.h>
 #include <linux/init.h>
+#include <linux/fsnotify.h>
 #include <linux/unistd.h>
 #include <linux/module.h>
 #include <linux/vmalloc.h>
@@ -1662,6 +1665,64 @@ static inline void rcu_copy_process(struct task_struct *p)
 #endif /* #ifdef CONFIG_TASKS_RCU */
 }
 
+static int pidfd_release(struct inode *inode, struct file *file)
+{
+	struct pid *pid = file->private_data;
+	file->private_data = NULL;
+	put_pid(pid);
+	return 0;
+}
+
+#ifdef CONFIG_PROC_FS
+static void pidfd_show_fdinfo(struct seq_file *m, struct file *f)
+{
+	struct pid_namespace *ns = proc_pid_ns(file_inode(m->file));
+	struct pid *pid = f->private_data;
+
+	seq_put_decimal_ull(m, "Pid:\t", pid_nr_ns(pid, ns));
+	seq_putc(m, '\n');
+}
+#endif
+
+const struct file_operations pidfd_fops = {
+	.release = pidfd_release,
+#ifdef CONFIG_PROC_FS
+	.show_fdinfo = pidfd_show_fdinfo,
+#endif
+};
+
+static int pidfd_create_cloexec(struct pid *pid, struct file **file)
+{
+	unsigned int flags = O_RDWR | O_CLOEXEC;
+	int error, fd;
+	struct file *f;
+
+	error = __alloc_fd(current->files, 1, rlimit(RLIMIT_NOFILE), flags);
+	if (error < 0)
+		return error;
+	fd = error;
+
+	f = anon_inode_getfile("pidfd", &pidfd_fops, get_pid(pid), flags);
+	if (IS_ERR(f)) {
+		put_pid(pid);
+		error = PTR_ERR(f);
+		goto err_put_unused_fd;
+	}
+
+	*file = f;
+	return fd;
+
+err_put_unused_fd:
+	put_unused_fd(fd);
+	return error;
+}
+
+static inline void pidfd_put_cloexec(struct pid *pid, int fd, struct file *file)
+{
+        put_unused_fd(fd);
+	fput(file);
+}
+
 /*
  * This creates a new process as a copy of the old one,
  * but does not actually start it yet.
@@ -1678,11 +1739,12 @@ static __latent_entropy struct task_struct *copy_process(
 					struct pid *pid,
 					int trace,
 					unsigned long tls,
-					int node)
+					int node, int *pidfd)
 {
 	int retval;
 	struct task_struct *p;
 	struct multiprocess_signals delayed;
+	struct file *pidfdf = NULL;
 
 	/*
 	 * Don't allow sharing the root directory with processes in a different
@@ -1936,6 +1998,18 @@ static __latent_entropy struct task_struct *copy_process(
 		}
 	}
 
+	/*
+	 * This has to happen after we've potentially unshared the file
+	 * descriptor table (so that the pidfd doesn't leak into the child if
+	 * the fd table isn't shared).
+	 */
+	if (clone_flags & CLONE_PIDFD) {
+		retval = pidfd_create_cloexec(pid, &pidfdf);
+		if (retval < 0)
+			goto bad_fork_free_pid;
+		*pidfd = retval;
+	}
+
 #ifdef CONFIG_BLOCK
 	p->plug = NULL;
 #endif
@@ -1996,7 +2070,7 @@ static __latent_entropy struct task_struct *copy_process(
 	 */
 	retval = cgroup_can_fork(p);
 	if (retval)
-		goto bad_fork_free_pid;
+		goto bad_fork_put_pidfd;
 
 	/*
 	 * From this point on we must avoid any synchronous user-space
@@ -2097,6 +2171,9 @@ static __latent_entropy struct task_struct *copy_process(
 	syscall_tracepoint_update(p);
 	write_unlock_irq(&tasklist_lock);
 
+	if (clone_flags & CLONE_PIDFD)
+		fd_install(*pidfd, pidfdf);
+
 	proc_fork_connector(p);
 	cgroup_post_fork(p);
 	cgroup_threadgroup_change_end(current);
@@ -2111,6 +2188,9 @@ static __latent_entropy struct task_struct *copy_process(
 	spin_unlock(&current->sighand->siglock);
 	write_unlock_irq(&tasklist_lock);
 	cgroup_cancel_fork(p);
+bad_fork_put_pidfd:
+	if (clone_flags & CLONE_PIDFD)
+		pidfd_put_cloexec(pid, *pidfd, pidfdf);
 bad_fork_free_pid:
 	cgroup_threadgroup_change_end(current);
 	if (pid != &init_struct_pid)
@@ -2177,7 +2257,7 @@ struct task_struct *fork_idle(int cpu)
 {
 	struct task_struct *task;
 	task = copy_process(CLONE_VM, 0, 0, NULL, &init_struct_pid, 0, 0,
-			    cpu_to_node(cpu));
+			    cpu_to_node(cpu), NULL);
 	if (!IS_ERR(task)) {
 		init_idle_pids(task);
 		init_idle(task, cpu);
@@ -2202,7 +2282,7 @@ long _do_fork(unsigned long clone_flags,
 	struct completion vfork;
 	struct pid *pid;
 	struct task_struct *p;
-	int trace = 0;
+	int pidfd, trace = 0;
 	long nr;
 
 	/*
@@ -2224,7 +2304,7 @@ long _do_fork(unsigned long clone_flags,
 	}
 
 	p = copy_process(clone_flags, stack_start, stack_size,
-			 child_tidptr, NULL, trace, tls, NUMA_NO_NODE);
+			 child_tidptr, NULL, trace, tls, NUMA_NO_NODE, &pidfd);
 	add_latent_entropy();
 
 	if (IS_ERR(p))
@@ -2260,6 +2340,10 @@ long _do_fork(unsigned long clone_flags,
 	}
 
 	put_pid(pid);
+
+	if (clone_flags & CLONE_PIDFD)
+		nr = pidfd;
+
 	return nr;
 }
 
-- 
2.21.0

^ permalink raw reply related

* [RFC-2 PATCH 1/4] Make anon_inodes unconditional
From: Christian Brauner @ 2019-04-10 23:40 UTC (permalink / raw)
  To: torvalds, viro, jannh, dhowells, linux-api, linux-kernel
  Cc: serge, luto, arnd, ebiederm, keescook, adobriyan, tglx,
	mtk.manpages, bl0pbl33p, ldv, akpm, oleg, cyphar, joel, dancol,
	Christian Brauner
In-Reply-To: <20190410234045.29846-1-christian@brauner.io>

From: David Howells <dhowells@redhat.com>

Make the anon_inodes facility unconditional so that it can be used by core
VFS code and the pidfd_open() syscall.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
[christian@brauner.io: adapt commit message to mention pidfd_open()]
Signed-off-by: Christian Brauner <christian@brauner.io>
---
 arch/arm/kvm/Kconfig       |  1 -
 arch/arm64/kvm/Kconfig     |  1 -
 arch/mips/kvm/Kconfig      |  1 -
 arch/powerpc/kvm/Kconfig   |  1 -
 arch/s390/kvm/Kconfig      |  1 -
 arch/x86/Kconfig           |  1 -
 arch/x86/kvm/Kconfig       |  1 -
 drivers/base/Kconfig       |  1 -
 drivers/char/tpm/Kconfig   |  1 -
 drivers/dma-buf/Kconfig    |  1 -
 drivers/gpio/Kconfig       |  1 -
 drivers/iio/Kconfig        |  1 -
 drivers/infiniband/Kconfig |  1 -
 drivers/vfio/Kconfig       |  1 -
 fs/Makefile                |  2 +-
 fs/notify/fanotify/Kconfig |  1 -
 fs/notify/inotify/Kconfig  |  1 -
 init/Kconfig               | 10 ----------
 18 files changed, 1 insertion(+), 27 deletions(-)

diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
index 3f5320f46de2..f591026347a5 100644
--- a/arch/arm/kvm/Kconfig
+++ b/arch/arm/kvm/Kconfig
@@ -22,7 +22,6 @@ config KVM
 	bool "Kernel-based Virtual Machine (KVM) support"
 	depends on MMU && OF
 	select PREEMPT_NOTIFIERS
-	select ANON_INODES
 	select ARM_GIC
 	select ARM_GIC_V3
 	select ARM_GIC_V3_ITS
diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index a3f85624313e..a67121d419a2 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -23,7 +23,6 @@ config KVM
 	depends on OF
 	select MMU_NOTIFIER
 	select PREEMPT_NOTIFIERS
-	select ANON_INODES
 	select HAVE_KVM_CPU_RELAX_INTERCEPT
 	select HAVE_KVM_ARCH_TLB_FLUSH_ALL
 	select KVM_MMIO
diff --git a/arch/mips/kvm/Kconfig b/arch/mips/kvm/Kconfig
index 4528bc9c3cb1..eac25aef21e0 100644
--- a/arch/mips/kvm/Kconfig
+++ b/arch/mips/kvm/Kconfig
@@ -21,7 +21,6 @@ config KVM
 	depends on MIPS_FP_SUPPORT
 	select EXPORT_UASM
 	select PREEMPT_NOTIFIERS
-	select ANON_INODES
 	select KVM_GENERIC_DIRTYLOG_READ_PROTECT
 	select HAVE_KVM_VCPU_ASYNC_IOCTL
 	select KVM_MMIO
diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
index bfdde04e4905..f53997a8ca62 100644
--- a/arch/powerpc/kvm/Kconfig
+++ b/arch/powerpc/kvm/Kconfig
@@ -20,7 +20,6 @@ if VIRTUALIZATION
 config KVM
 	bool
 	select PREEMPT_NOTIFIERS
-	select ANON_INODES
 	select HAVE_KVM_EVENTFD
 	select HAVE_KVM_VCPU_ASYNC_IOCTL
 	select SRCU
diff --git a/arch/s390/kvm/Kconfig b/arch/s390/kvm/Kconfig
index 767453faacfc..1816ee48eadd 100644
--- a/arch/s390/kvm/Kconfig
+++ b/arch/s390/kvm/Kconfig
@@ -21,7 +21,6 @@ config KVM
 	prompt "Kernel-based Virtual Machine (KVM) support"
 	depends on HAVE_KVM
 	select PREEMPT_NOTIFIERS
-	select ANON_INODES
 	select HAVE_KVM_CPU_RELAX_INTERCEPT
 	select HAVE_KVM_VCPU_ASYNC_IOCTL
 	select HAVE_KVM_EVENTFD
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 5ad92419be19..7a70fb58b2d0 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -44,7 +44,6 @@ config X86
 	#
 	select ACPI_LEGACY_TABLES_LOOKUP	if ACPI
 	select ACPI_SYSTEM_POWER_STATES_SUPPORT	if ACPI
-	select ANON_INODES
 	select ARCH_32BIT_OFF_T			if X86_32
 	select ARCH_CLOCKSOURCE_DATA
 	select ARCH_CLOCKSOURCE_INIT
diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index 72fa955f4a15..fc042419e670 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -27,7 +27,6 @@ config KVM
 	depends on X86_LOCAL_APIC
 	select PREEMPT_NOTIFIERS
 	select MMU_NOTIFIER
-	select ANON_INODES
 	select HAVE_KVM_IRQCHIP
 	select HAVE_KVM_IRQFD
 	select IRQ_BYPASS_MANAGER
diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
index 059700ea3521..03f067da12ee 100644
--- a/drivers/base/Kconfig
+++ b/drivers/base/Kconfig
@@ -174,7 +174,6 @@ source "drivers/base/regmap/Kconfig"
 config DMA_SHARED_BUFFER
 	bool
 	default n
-	select ANON_INODES
 	select IRQ_WORK
 	help
 	  This option enables the framework for buffer-sharing between
diff --git a/drivers/char/tpm/Kconfig b/drivers/char/tpm/Kconfig
index 536e55d3919f..f3e4bc490cf0 100644
--- a/drivers/char/tpm/Kconfig
+++ b/drivers/char/tpm/Kconfig
@@ -157,7 +157,6 @@ config TCG_CRB
 config TCG_VTPM_PROXY
 	tristate "VTPM Proxy Interface"
 	depends on TCG_TPM
-	select ANON_INODES
 	---help---
 	  This driver proxies for an emulated TPM (vTPM) running in userspace.
 	  A device /dev/vtpmx is provided that creates a device pair
diff --git a/drivers/dma-buf/Kconfig b/drivers/dma-buf/Kconfig
index 2e5a0faa2cb1..3fc9c2efc583 100644
--- a/drivers/dma-buf/Kconfig
+++ b/drivers/dma-buf/Kconfig
@@ -3,7 +3,6 @@ menu "DMABUF options"
 config SYNC_FILE
 	bool "Explicit Synchronization Framework"
 	default n
-	select ANON_INODES
 	select DMA_SHARED_BUFFER
 	---help---
 	  The Sync File Framework adds explicit syncronization via
diff --git a/drivers/gpio/Kconfig b/drivers/gpio/Kconfig
index 3f50526a771f..0f91600c27ae 100644
--- a/drivers/gpio/Kconfig
+++ b/drivers/gpio/Kconfig
@@ -12,7 +12,6 @@ config ARCH_HAVE_CUSTOM_GPIO_H
 
 menuconfig GPIOLIB
 	bool "GPIO Support"
-	select ANON_INODES
 	help
 	  This enables GPIO support through the generic GPIO library.
 	  You only need to enable this, if you also want to enable
diff --git a/drivers/iio/Kconfig b/drivers/iio/Kconfig
index d08aeb41cd07..1dec0fecb6ef 100644
--- a/drivers/iio/Kconfig
+++ b/drivers/iio/Kconfig
@@ -4,7 +4,6 @@
 
 menuconfig IIO
 	tristate "Industrial I/O support"
-	select ANON_INODES
 	help
 	  The industrial I/O subsystem provides a unified framework for
 	  drivers for many different types of embedded sensors using a
diff --git a/drivers/infiniband/Kconfig b/drivers/infiniband/Kconfig
index a1fb840de45d..d318bab25860 100644
--- a/drivers/infiniband/Kconfig
+++ b/drivers/infiniband/Kconfig
@@ -25,7 +25,6 @@ config INFINIBAND_USER_MAD
 
 config INFINIBAND_USER_ACCESS
 	tristate "InfiniBand userspace access (verbs and CM)"
-	select ANON_INODES
 	depends on MMU
 	---help---
 	  Userspace InfiniBand access support.  This enables the
diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
index 9de5ed38da83..3798d77d131c 100644
--- a/drivers/vfio/Kconfig
+++ b/drivers/vfio/Kconfig
@@ -22,7 +22,6 @@ menuconfig VFIO
 	tristate "VFIO Non-Privileged userspace driver framework"
 	depends on IOMMU_API
 	select VFIO_IOMMU_TYPE1 if (X86 || S390 || ARM || ARM64)
-	select ANON_INODES
 	help
 	  VFIO provides a framework for secure userspace device drivers.
 	  See Documentation/vfio.txt for more details.
diff --git a/fs/Makefile b/fs/Makefile
index 427fec226fae..35945f8139e6 100644
--- a/fs/Makefile
+++ b/fs/Makefile
@@ -25,7 +25,7 @@ obj-$(CONFIG_PROC_FS) += proc_namespace.o
 
 obj-y				+= notify/
 obj-$(CONFIG_EPOLL)		+= eventpoll.o
-obj-$(CONFIG_ANON_INODES)	+= anon_inodes.o
+obj-y				+= anon_inodes.o
 obj-$(CONFIG_SIGNALFD)		+= signalfd.o
 obj-$(CONFIG_TIMERFD)		+= timerfd.o
 obj-$(CONFIG_EVENTFD)		+= eventfd.o
diff --git a/fs/notify/fanotify/Kconfig b/fs/notify/fanotify/Kconfig
index 735bfb2e9190..521dc91d2cb5 100644
--- a/fs/notify/fanotify/Kconfig
+++ b/fs/notify/fanotify/Kconfig
@@ -1,7 +1,6 @@
 config FANOTIFY
 	bool "Filesystem wide access notification"
 	select FSNOTIFY
-	select ANON_INODES
 	select EXPORTFS
 	default n
 	---help---
diff --git a/fs/notify/inotify/Kconfig b/fs/notify/inotify/Kconfig
index b981fc0c8379..0161c74e76e2 100644
--- a/fs/notify/inotify/Kconfig
+++ b/fs/notify/inotify/Kconfig
@@ -1,6 +1,5 @@
 config INOTIFY_USER
 	bool "Inotify support for userspace"
-	select ANON_INODES
 	select FSNOTIFY
 	default y
 	---help---
diff --git a/init/Kconfig b/init/Kconfig
index 4592bf7997c0..be8f97e37a76 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1171,9 +1171,6 @@ config LD_DEAD_CODE_DATA_ELIMINATION
 config SYSCTL
 	bool
 
-config ANON_INODES
-	bool
-
 config HAVE_UID16
 	bool
 
@@ -1378,14 +1375,12 @@ config HAVE_FUTEX_CMPXCHG
 config EPOLL
 	bool "Enable eventpoll support" if EXPERT
 	default y
-	select ANON_INODES
 	help
 	  Disabling this option will cause the kernel to be built without
 	  support for epoll family of system calls.
 
 config SIGNALFD
 	bool "Enable signalfd() system call" if EXPERT
-	select ANON_INODES
 	default y
 	help
 	  Enable the signalfd() system call that allows to receive signals
@@ -1395,7 +1390,6 @@ config SIGNALFD
 
 config TIMERFD
 	bool "Enable timerfd() system call" if EXPERT
-	select ANON_INODES
 	default y
 	help
 	  Enable the timerfd() system call that allows to receive timer
@@ -1405,7 +1399,6 @@ config TIMERFD
 
 config EVENTFD
 	bool "Enable eventfd() system call" if EXPERT
-	select ANON_INODES
 	default y
 	help
 	  Enable the eventfd() system call that allows to receive both
@@ -1516,7 +1509,6 @@ config KALLSYMS_BASE_RELATIVE
 # syscall, maps, verifier
 config BPF_SYSCALL
 	bool "Enable bpf() system call"
-	select ANON_INODES
 	select BPF
 	select IRQ_WORK
 	default n
@@ -1533,7 +1525,6 @@ config BPF_JIT_ALWAYS_ON
 
 config USERFAULTFD
 	bool "Enable userfaultfd() system call"
-	select ANON_INODES
 	depends on MMU
 	help
 	  Enable the userfaultfd() system call that allows to intercept and
@@ -1600,7 +1591,6 @@ config PERF_EVENTS
 	bool "Kernel performance events and counters"
 	default y if PROFILING
 	depends on HAVE_PERF_EVENTS
-	select ANON_INODES
 	select IRQ_WORK
 	select SRCU
 	help
-- 
2.21.0

^ permalink raw reply related

* [RFC-1 PATCH 1/1] fork: add CLONE_PIDFD via /proc/<pid>
From: Christian Brauner @ 2019-04-10 23:40 UTC (permalink / raw)
  To: torvalds, viro, jannh, dhowells, linux-api, linux-kernel
  Cc: serge, luto, arnd, ebiederm, keescook, adobriyan, tglx,
	mtk.manpages, bl0pbl33p, ldv, akpm, oleg, cyphar, joel, dancol,
	Christian Brauner
In-Reply-To: <20190410234045.29846-1-christian@brauner.io>

This is an RFC for the implementation of pidfds as /proc/<pid> file
descriptors. They can be retrieved through the clone() with the addition of
the CLONE_PIDFD flag.
The tricky part here is that we need to retrieve a file descriptor for
/proc/<pid> before clone's point of no return. Otherwise, we need to fail
the creation of a process that has already passed all barriers and is
visible in userspace. Getting that file descriptor then becomes a rather
intricate dance including allocating a detached dentry that we need to
commit once attach_pid() has been called.
Note that this RFC only includes the logic we think is needed to return
/proc/<pid> file descriptors from clone. It does *not* yet include the even
more complex logic needed to restrict procfs itself. And the additional
logic needed to prevent attacks such as openat(pidfd, "..", ...) and access
to /proc/<pid>/net/.

There are a couple of reasons why we stopped short of this and decided to
sent out an RFC first.
- Even the initial part of getting file descriptors from /proc/<pid> out
  out clone() required rather complex code that struck us as very
  inelegant and heavy (which granted, might partially caused by not seeing
  a cleaner way to implement this). Thus, it felt like we needed to see
  whether this is even remotely considered acceptable.
- While discussion further aspects of this approach with Al we received
  rather substantiated opposition to exposing even more codepaths to
  procfs.
- Restricting access to procfs properly requires a lot of invasive work
  even touching core vfs functions such as
  follow_dotdot()/follow_dotdot_rcu() which also caused 2.

Jann and I are providing a second RFC alongside this one that shows an
alternative and rather much simpler approach we think might be preferable.

Signed-off-by: Christian Brauner <christian@brauner.io>
Signed-off-by: Jann Horn <jann@thejh.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: David Howells <dhowells@redhat.com>
Cc: "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com>
Cc: Jonathan Kowalski <bl0pbl33p@gmail.com>
Cc: "Dmitry V. Levin" <ldv@altlinux.org>
Cc: Andy Lutomirsky <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/proc/base.c             | 130 ++++++++++++++++++++++++++++++++++---
 fs/proc/fd.c               |   4 +-
 fs/proc/internal.h         |   2 +-
 fs/proc/namespaces.c       |   2 +-
 include/linux/proc_fs.h    |  19 ++++++
 include/uapi/linux/sched.h |   1 +
 kernel/fork.c              |  40 ++++++++++--
 7 files changed, 180 insertions(+), 18 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 6a803a0b75df..2f5d7bd5d047 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -1731,7 +1731,7 @@ void task_dump_owner(struct task_struct *task, umode_t mode,
 	*rgid = gid;
 }
 
-struct inode *proc_pid_make_inode(struct super_block * sb,
+struct inode *proc_pid_make_inode(struct super_block * sb, struct pid *pid,
 				  struct task_struct *task, umode_t mode)
 {
 	struct inode * inode;
@@ -1753,7 +1753,7 @@ struct inode *proc_pid_make_inode(struct super_block * sb,
 	/*
 	 * grab the reference to task.
 	 */
-	ei->pid = get_task_pid(task, PIDTYPE_PID);
+	ei->pid = pid ? get_pid(pid) : get_task_pid(task, PIDTYPE_PID);
 	if (!ei->pid)
 		goto out_unlock;
 
@@ -2070,7 +2070,7 @@ proc_map_files_instantiate(struct dentry *dentry,
 	struct proc_inode *ei;
 	struct inode *inode;
 
-	inode = proc_pid_make_inode(dentry->d_sb, task, S_IFLNK |
+	inode = proc_pid_make_inode(dentry->d_sb, NULL, task, S_IFLNK |
 				    ((mode & FMODE_READ ) ? S_IRUSR : 0) |
 				    ((mode & FMODE_WRITE) ? S_IWUSR : 0));
 	if (!inode)
@@ -2428,7 +2428,7 @@ static struct dentry *proc_pident_instantiate(struct dentry *dentry,
 	struct inode *inode;
 	struct proc_inode *ei;
 
-	inode = proc_pid_make_inode(dentry->d_sb, task, p->mode);
+	inode = proc_pid_make_inode(dentry->d_sb, NULL, task, p->mode);
 	if (!inode)
 		return ERR_PTR(-ENOENT);
 
@@ -3184,12 +3184,13 @@ void proc_flush_task(struct task_struct *task)
 	}
 }
 
-static struct dentry *proc_pid_instantiate(struct dentry * dentry,
-				   struct task_struct *task, const void *ptr)
+static struct inode *proc_pid_dentry_init(struct dentry *dentry,
+					  struct task_struct *task)
 {
 	struct inode *inode;
 
-	inode = proc_pid_make_inode(dentry->d_sb, task, S_IFDIR | S_IRUGO | S_IXUGO);
+	inode = proc_pid_make_inode(dentry->d_sb, NULL, task,
+				    S_IFDIR | S_IRUGO | S_IXUGO);
 	if (!inode)
 		return ERR_PTR(-ENOENT);
 
@@ -3201,9 +3202,122 @@ static struct dentry *proc_pid_instantiate(struct dentry * dentry,
 	pid_update_inode(task, inode);
 
 	d_set_d_op(dentry, &pid_dentry_operations);
+	return inode;
+}
+
+static struct dentry *proc_pid_instantiate(struct dentry * dentry,
+				   struct task_struct *task, const void *ptr)
+{
+	struct inode *inode = proc_pid_dentry_init(dentry, task);
+	if (IS_ERR(inode))
+		return ERR_CAST(inode);
 	return d_splice_alias(inode, dentry);
 }
 
+/*
+ * Open /proc/$pid before clone()'s point of no return, i.e. before
+ * attach_pid() has been called.
+ */
+int proc_pid_open_early(struct task_struct *task, struct pid *pid,
+			struct early_proc_pid *info)
+{
+	struct pid_namespace *ns = task_active_pid_ns(current);
+	struct dentry *root = ns->proc_mnt->mnt_root;
+	pid_t vpid = pid_nr_ns(pid, ns);
+	struct inode *inode;
+	char pid_str[12];
+	int res;
+
+	if (WARN_ON(vpid == 0))
+		return -ESRCH;
+
+	/*
+	 * We can't use lookup_one_len() here. When this function is called
+	 * attach_pid() will not have been called which means that
+	 * proc_pid_lookup() will fail with ENOENT as it can't successfully
+	 * find_task_by_pid_ns().
+	 * We can just use d_alloc_name() though.
+	 */
+	snprintf(pid_str, sizeof(pid_str), "%d", vpid);
+	info->dentry = d_alloc_name(root, pid_str);
+	if (IS_ERR(info->dentry))
+		return PTR_ERR(info->dentry);
+
+	info->fd = __alloc_fd(current->files, 1, rlimit(RLIMIT_NOFILE),
+			      O_CLOEXEC);
+	if (info->fd < 0) {
+		res = info->fd;
+		goto out_put_dentry;
+	}
+
+	info->tmp_dentry = d_alloc_anon(root->d_sb);
+	if (!info->tmp_dentry) {
+		res = -ENOMEM;
+		goto out_put_fd;
+	}
+
+	inode = proc_pid_dentry_init(info->tmp_dentry, task);
+	if (IS_ERR(inode)) {
+		res = PTR_ERR(inode);
+		goto out_put_tmp_dentry;
+	}
+
+	d_instantiate(info->tmp_dentry, inode);
+	info->file = file_open_root(info->tmp_dentry, ns->proc_mnt, "/",
+				    O_RDONLY | O_NOFOLLOW | O_DIRECTORY, 0);
+	if (IS_ERR(info->file)) {
+		res = PTR_ERR(info->file);
+		goto out_put_tmp_dentry;
+	}
+
+	return 0;
+
+out_put_tmp_dentry:
+	dput(info->tmp_dentry);
+
+out_put_fd:
+	put_unused_fd(info->fd);
+
+out_put_dentry:
+	dput(info->dentry);
+
+	return res;
+}
+
+void proc_pid_dentry_commit_lock(struct early_proc_pid *info)
+{
+	lock_rename(info->tmp_dentry, info->dentry->d_parent);
+}
+
+/*
+ * Commit /proc/$pid after clone()'s point of no return, and install the file
+ * descriptor.
+ * Drops the locks acquired by proc_pid_dentry_commit_lock().
+ * Returns the file descriptor.
+ */
+int proc_pid_dentry_commit_unlock(struct early_proc_pid *info)
+{
+	/* commit the dentry */
+	d_move(info->tmp_dentry, info->dentry);
+	unlock_rename(info->tmp_dentry, info->dentry->d_parent);
+
+	/* release extra references */
+	dput(info->tmp_dentry);
+	dput(info->dentry);
+
+	/* install fd */
+	fd_install(info->fd, info->file);
+	return info->fd;
+}
+
+void proc_pid_dentry_abort(struct early_proc_pid *info)
+{
+	fput(info->file);
+	dput(info->tmp_dentry);
+	put_unused_fd(info->fd);
+	dput(info->dentry);
+}
+
 struct dentry *proc_pid_lookup(struct dentry *dentry, unsigned int flags)
 {
 	struct task_struct *task;
@@ -3480,7 +3594,7 @@ static struct dentry *proc_task_instantiate(struct dentry *dentry,
 	struct task_struct *task, const void *ptr)
 {
 	struct inode *inode;
-	inode = proc_pid_make_inode(dentry->d_sb, task, S_IFDIR | S_IRUGO | S_IXUGO);
+	inode = proc_pid_make_inode(dentry->d_sb, NULL, task, S_IFDIR | S_IRUGO | S_IXUGO);
 	if (!inode)
 		return ERR_PTR(-ENOENT);
 
diff --git a/fs/proc/fd.c b/fs/proc/fd.c
index 81882a13212d..7e624695db5a 100644
--- a/fs/proc/fd.c
+++ b/fs/proc/fd.c
@@ -186,7 +186,7 @@ static struct dentry *proc_fd_instantiate(struct dentry *dentry,
 	struct proc_inode *ei;
 	struct inode *inode;
 
-	inode = proc_pid_make_inode(dentry->d_sb, task, S_IFLNK);
+	inode = proc_pid_make_inode(dentry->d_sb, NULL, task, S_IFLNK);
 	if (!inode)
 		return ERR_PTR(-ENOENT);
 
@@ -325,7 +325,7 @@ static struct dentry *proc_fdinfo_instantiate(struct dentry *dentry,
 	struct proc_inode *ei;
 	struct inode *inode;
 
-	inode = proc_pid_make_inode(dentry->d_sb, task, S_IFREG | S_IRUSR);
+	inode = proc_pid_make_inode(dentry->d_sb, NULL, task, S_IFREG | S_IRUSR);
 	if (!inode)
 		return ERR_PTR(-ENOENT);
 
diff --git a/fs/proc/internal.h b/fs/proc/internal.h
index d1671e97f7fe..9b4cb85b96be 100644
--- a/fs/proc/internal.h
+++ b/fs/proc/internal.h
@@ -159,7 +159,7 @@ extern int proc_pid_statm(struct seq_file *, struct pid_namespace *,
 extern const struct dentry_operations pid_dentry_operations;
 extern int pid_getattr(const struct path *, struct kstat *, u32, unsigned int);
 extern int proc_setattr(struct dentry *, struct iattr *);
-extern struct inode *proc_pid_make_inode(struct super_block *, struct task_struct *, umode_t);
+extern struct inode *proc_pid_make_inode(struct super_block *, struct pid *pid, struct task_struct *, umode_t);
 extern void pid_update_inode(struct task_struct *, struct inode *);
 extern int pid_delete_dentry(const struct dentry *);
 extern int proc_pid_readdir(struct file *, struct dir_context *);
diff --git a/fs/proc/namespaces.c b/fs/proc/namespaces.c
index dd2b35f78b09..b77e4234a892 100644
--- a/fs/proc/namespaces.c
+++ b/fs/proc/namespaces.c
@@ -94,7 +94,7 @@ static struct dentry *proc_ns_instantiate(struct dentry *dentry,
 	struct inode *inode;
 	struct proc_inode *ei;
 
-	inode = proc_pid_make_inode(dentry->d_sb, task, S_IFLNK | S_IRWXUGO);
+	inode = proc_pid_make_inode(dentry->d_sb, NULL, task, S_IFLNK | S_IRWXUGO);
 	if (!inode)
 		return ERR_PTR(-ENOENT);
 
diff --git a/include/linux/proc_fs.h b/include/linux/proc_fs.h
index 52a283ba0465..e801481a8a24 100644
--- a/include/linux/proc_fs.h
+++ b/include/linux/proc_fs.h
@@ -75,6 +75,18 @@ struct proc_dir_entry *proc_create_net_single_write(const char *name, umode_t mo
 						    void *data);
 extern struct pid *tgid_pidfd_to_pid(const struct file *file);
 
+struct early_proc_pid {
+	struct dentry *dentry;
+	struct dentry *tmp_dentry;
+	struct file *file;
+	int fd;
+};
+int proc_pid_open_early(struct task_struct *task, struct pid *pid,
+			struct early_proc_pid *info);
+void proc_pid_dentry_commit_lock(struct early_proc_pid *info);
+int proc_pid_dentry_commit_unlock(struct early_proc_pid *info);
+void proc_pid_dentry_abort(struct early_proc_pid *info);
+
 #else /* CONFIG_PROC_FS */
 
 static inline void proc_root_init(void)
@@ -120,6 +132,13 @@ static inline struct pid *tgid_pidfd_to_pid(const struct file *file)
 	return ERR_PTR(-EBADF);
 }
 
+struct early_proc_pid {};
+static inline int proc_pid_open_early(struct task_struct *task,
+		struct pid *pid, struct early_proc_pid *info) { return -EINVAL; }
+void proc_pid_dentry_commit_lock(struct early_proc_pid *info) { }
+static inline int proc_pid_dentry_commit_unlock(struct early_proc_pid *info) { return -EINVAL; }
+static inline void proc_pid_dentry_abort(struct early_proc_pid *info) { }
+
 #endif /* CONFIG_PROC_FS */
 
 struct net;
diff --git a/include/uapi/linux/sched.h b/include/uapi/linux/sched.h
index 22627f80063e..cd9bd14ce56d 100644
--- a/include/uapi/linux/sched.h
+++ b/include/uapi/linux/sched.h
@@ -10,6 +10,7 @@
 #define CLONE_FS	0x00000200	/* set if fs info shared between processes */
 #define CLONE_FILES	0x00000400	/* set if open files shared between processes */
 #define CLONE_SIGHAND	0x00000800	/* set if signal handlers and blocked signals shared */
+#define CLONE_PIDFD	0x00001000	/* create new pid file descriptor */
 #define CLONE_PTRACE	0x00002000	/* set if we want to let tracing continue on the child too */
 #define CLONE_VFORK	0x00004000	/* set if the parent wants the child to wake it up on mm_release */
 #define CLONE_PARENT	0x00008000	/* set if we want to have the same parent as the cloner */
diff --git a/kernel/fork.c b/kernel/fork.c
index 9dcd18aa210b..31b405eee020 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -32,6 +32,7 @@
 #include <linux/sem.h>
 #include <linux/file.h>
 #include <linux/fdtable.h>
+#include <linux/fsnotify.h>
 #include <linux/iocontext.h>
 #include <linux/key.h>
 #include <linux/binfmts.h>
@@ -1678,11 +1679,13 @@ static __latent_entropy struct task_struct *copy_process(
 					struct pid *pid,
 					int trace,
 					unsigned long tls,
-					int node)
+					int node,
+					int *pidfd)
 {
 	int retval;
 	struct task_struct *p;
 	struct multiprocess_signals delayed;
+	struct early_proc_pid proc_pid_info;
 
 	/*
 	 * Don't allow sharing the root directory with processes in a different
@@ -1936,6 +1939,17 @@ static __latent_entropy struct task_struct *copy_process(
 		}
 	}
 
+	/*
+	 * This has to happen after we've potentially unshared the file
+	 * descriptor table (so that the pidfd doesn't leak into the child if
+	 * the fd table isn't shared).
+	 */
+	if (clone_flags & CLONE_PIDFD) {
+		retval = proc_pid_open_early(p, pid, &proc_pid_info);
+		if (retval)
+			goto bad_fork_free_pid;
+	}
+
 #ifdef CONFIG_BLOCK
 	p->plug = NULL;
 #endif
@@ -1996,7 +2010,7 @@ static __latent_entropy struct task_struct *copy_process(
 	 */
 	retval = cgroup_can_fork(p);
 	if (retval)
-		goto bad_fork_free_pid;
+		goto bad_fork_abort_cgroup;
 
 	/*
 	 * From this point on we must avoid any synchronous user-space
@@ -2009,6 +2023,9 @@ static __latent_entropy struct task_struct *copy_process(
 	p->start_time = ktime_get_ns();
 	p->real_start_time = ktime_get_boot_ns();
 
+	if (clone_flags & CLONE_PIDFD)
+		proc_pid_dentry_commit_lock(&proc_pid_info);
+
 	/*
 	 * Make it visible to the rest of the system, but dont wake it up yet.
 	 * Need tasklist lock for parent etc handling!
@@ -2091,12 +2108,16 @@ static __latent_entropy struct task_struct *copy_process(
 		attach_pid(p, PIDTYPE_PID);
 		nr_threads++;
 	}
+
 	total_forks++;
 	hlist_del_init(&delayed.node);
 	spin_unlock(&current->sighand->siglock);
 	syscall_tracepoint_update(p);
 	write_unlock_irq(&tasklist_lock);
 
+	if (clone_flags & CLONE_PIDFD)
+		*pidfd = proc_pid_dentry_commit_unlock(&proc_pid_info);
+
 	proc_fork_connector(p);
 	cgroup_post_fork(p);
 	cgroup_threadgroup_change_end(current);
@@ -2111,8 +2132,11 @@ static __latent_entropy struct task_struct *copy_process(
 	spin_unlock(&current->sighand->siglock);
 	write_unlock_irq(&tasklist_lock);
 	cgroup_cancel_fork(p);
-bad_fork_free_pid:
+bad_fork_abort_cgroup:
 	cgroup_threadgroup_change_end(current);
+	if (clone_flags & CLONE_PIDFD)
+		proc_pid_dentry_abort(&proc_pid_info);
+bad_fork_free_pid:
 	if (pid != &init_struct_pid)
 		free_pid(pid);
 bad_fork_cleanup_thread:
@@ -2177,7 +2201,7 @@ struct task_struct *fork_idle(int cpu)
 {
 	struct task_struct *task;
 	task = copy_process(CLONE_VM, 0, 0, NULL, &init_struct_pid, 0, 0,
-			    cpu_to_node(cpu));
+			    cpu_to_node(cpu), NULL);
 	if (!IS_ERR(task)) {
 		init_idle_pids(task);
 		init_idle(task, cpu);
@@ -2202,7 +2226,7 @@ long _do_fork(unsigned long clone_flags,
 	struct completion vfork;
 	struct pid *pid;
 	struct task_struct *p;
-	int trace = 0;
+	int trace = 0, pidfd;
 	long nr;
 
 	/*
@@ -2224,7 +2248,7 @@ long _do_fork(unsigned long clone_flags,
 	}
 
 	p = copy_process(clone_flags, stack_start, stack_size,
-			 child_tidptr, NULL, trace, tls, NUMA_NO_NODE);
+			 child_tidptr, NULL, trace, tls, NUMA_NO_NODE, &pidfd);
 	add_latent_entropy();
 
 	if (IS_ERR(p))
@@ -2260,6 +2284,10 @@ long _do_fork(unsigned long clone_flags,
 	}
 
 	put_pid(pid);
+
+	if (clone_flags & CLONE_PIDFD)
+		nr = pidfd;
+
 	return nr;
 }
 
-- 
2.21.0

^ permalink raw reply related

* [RFC PATCH] fork: add CLONE_PIDFD
From: Christian Brauner @ 2019-04-10 23:40 UTC (permalink / raw)
  To: torvalds, viro, jannh, dhowells, linux-api, linux-kernel
  Cc: serge, luto, arnd, ebiederm, keescook, adobriyan, tglx,
	mtk.manpages, bl0pbl33p, ldv, akpm, oleg, cyphar, joel, dancol,
	Christian Brauner

Hey Linus,

This is an RFC for adding a new CLONE_PIDFD flag to clone() as
previously discussed.
While implementing this Jann and I ran into additional complexity that
prompted us to send out an initial RFC patchset to make sure we still
think going forward with the current implementation is a good idea and
also provide an alternative approach:

RFC-1:
This is an RFC for the implementation of pidfds as /proc/<pid> file
descriptors.
The tricky part here is that we need to retrieve a file descriptor for
/proc/<pid> before clone's point of no return. Otherwise, we need to fail
the creation of a process that has already passed all barriers and is
visible in userspace. Getting that file descriptor then becomes a rather
intricate dance including allocating a detached dentry that we need to
commit once attach_pid() has been called.
Note that this RFC only includes the logic we think is needed to return
/proc/<pid> file descriptors from clone. It does *not* yet include the even
more complex logic needed to restrict procfs itself. And the additional
logic needed to prevent attacks such as openat(pidfd, "..", ...) and access
to /proc/<pid>/net/ on top of the procfs restriction.

There are a couple of reasons why we stopped short of this and decided to
sent out an RFC first:
- Even the initial part of getting file descriptors from /proc/<pid> out
  of clone() required rather complex code that struck us as very
  inelegant and heavy (which granted, might partially caused by not seeing
  a cleaner way to implement this). Thus, it felt like we needed to see
  whether this is even remotely considered acceptable.
- While discussing further aspects of this approach with Al we received
  rather substantiated opposition to exposing even more codepaths to
  procfs.
- Restricting access to procfs properly requires a lot of invasive work
  even touching core vfs functions such as
  follow_dotdot()/follow_dotdot_rcu() which also caused 2.

RFC-2:
This alternative patchset uses anonymous file descriptors instead of
file descriptors from /proc/<pid>.
A pidfd in this style comes with additional information in fdinfo: the pid
of the process it refers to in the current pid namespace (Pid: %d).

We have chosen to implement this alternative to illustrate how
strikingly simple this patchset is in comparision to the original
approach.
To remove worries about missing metadata access we have written a POC
that illustrates how a combination of CLONE_PIDFD, fdinfo, and
pidfd_send_signal() can be used to gain race-free access to process
metadata through /proc/<pid>. The sample program can easily be
translated into a helper that would be suitable for inclusion in libc so
that users don't have to worry about writing it themselves.

Thanks!
Jann & Christian

RFC-1:
Christian Brauner (1):
  fork: add CLONE_PIDFD via /proc/<pid>

 fs/proc/base.c             | 130 ++++++++++++++++++++++++++++++++++---
 fs/proc/fd.c               |   4 +-
 fs/proc/internal.h         |   2 +-
 fs/proc/namespaces.c       |   2 +-
 include/linux/proc_fs.h    |  19 ++++++
 include/uapi/linux/sched.h |   1 +
 kernel/fork.c              |  40 ++++++++++--
 7 files changed, 180 insertions(+), 18 deletions(-)

RFC-2:
Christian Brauner (3):
  fork: add CLONE_PIDFD via anonymous inode
  signal: support CLONE_PIDFD with pidfd_send_signal
  samples: show race-free pidfd metadata access

David Howells (1):
  Make anon_inodes unconditional

 arch/arm/kvm/Kconfig           |   1 -
 arch/arm64/kvm/Kconfig         |   1 -
 arch/mips/kvm/Kconfig          |   1 -
 arch/powerpc/kvm/Kconfig       |   1 -
 arch/s390/kvm/Kconfig          |   1 -
 arch/x86/Kconfig               |   1 -
 arch/x86/kvm/Kconfig           |   1 -
 drivers/base/Kconfig           |   1 -
 drivers/char/tpm/Kconfig       |   1 -
 drivers/dma-buf/Kconfig        |   1 -
 drivers/gpio/Kconfig           |   1 -
 drivers/iio/Kconfig            |   1 -
 drivers/infiniband/Kconfig     |   1 -
 drivers/vfio/Kconfig           |   1 -
 fs/Makefile                    |   2 +-
 fs/notify/fanotify/Kconfig     |   1 -
 fs/notify/inotify/Kconfig      |   1 -
 include/linux/pid.h            |   2 +
 include/uapi/linux/sched.h     |   1 +
 init/Kconfig                   |  10 --
 kernel/fork.c                  |  94 +++++++++++++++++-
 kernel/signal.c                |  14 ++-
 kernel/sys_ni.c                |   3 -
 samples/Makefile               |   2 +-
 samples/pidfd/Makefile         |   6 ++
 samples/pidfd/pidfd-metadata.c | 169 +++++++++++++++++++++++++++++++++
 26 files changed, 279 insertions(+), 40 deletions(-)
 create mode 100644 samples/pidfd/Makefile
 create mode 100644 samples/pidfd/pidfd-metadata.c

-- 
2.21.0

^ permalink raw reply

* [PATCH 10/11] platform/x86: asus-wmi: Switch fan boost mode
From: Yurii Pavlovskyi @ 2019-04-10 20:34 UTC (permalink / raw)
  Cc: Corentin Chary, Darren Hart, Andy Shevchenko, Daniel Drake,
	acpi4asus-user, platform-driver-x86, linux-kernel, linux-api
In-Reply-To: <a8dd77f3-1e28-6416-5f04-aff44d5bb6d6@gmail.com>

The WMI exposes a write-only device ID where three modes can be switched
on some laptops (TUF Gaming FX505GM). There is a hotkey combination Fn-F5
that does have a fan icon which is designed to toggle between these 3
modes.

Add a SysFS entry that reads the last written value and updates value in
WMI on write and a hotkey handler that toggles the modes. The
corresponding DEVS device handler does obviously take 3 possible
argument values.

Method (SFBM, 1, NotSerialized)
{
    If ((Arg0 == Zero) { .. }
    If ((Arg0 == One)) { .. }
    If ((Arg0 == 0x02)) { .. }
}

... // DEVS
If ((IIA0 == 0x00110018))
{
   SFBM (IIA1)
   Return (One)
}

* 0x00 - is normal,
* 0x01 - is obviously turbo by the amount of noise, might be useful to
avoid CPU frequency throttling on high load,
* 0x02 - the meaning is unknown at the time as modes are not named
in the vendor documentation, but it does look like a quiet mode as CPU
temperature does increase about 10 degrees on maximum load.

Signed-off-by: Yurii Pavlovskyi <yurii.pavlovskyi@gmail.com>
---
 .../ABI/testing/sysfs-platform-asus-wmi       |  10 ++
 drivers/platform/x86/asus-wmi.c               | 119 ++++++++++++++++--
 include/linux/platform_data/x86/asus-wmi.h    |   1 +
 3 files changed, 117 insertions(+), 13 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-platform-asus-wmi b/Documentation/ABI/testing/sysfs-platform-asus-wmi
index 300a40519695..2b3184e297a7 100644
--- a/Documentation/ABI/testing/sysfs-platform-asus-wmi
+++ b/Documentation/ABI/testing/sysfs-platform-asus-wmi
@@ -97,3 +97,13 @@ Description:
 		Write changed RGB keyboard backlight parameters:
 			* 1 - permanently,
 			* 2 - temporarily.
+
+What:		/sys/devices/platform/<platform>/fan_mode
+Date:		Apr 2019
+KernelVersion:	5.1
+Contact:	"Yurii Pavlovskyi" <yurii.pavlovskyi@gmail.com>
+Description:
+		Fan boost mode:
+			* 0 - normal,
+			* 1 - turbo,
+			* 2 - quiet?
diff --git a/drivers/platform/x86/asus-wmi.c b/drivers/platform/x86/asus-wmi.c
index f4323a57f22f..941c628945ac 100644
--- a/drivers/platform/x86/asus-wmi.c
+++ b/drivers/platform/x86/asus-wmi.c
@@ -69,6 +69,7 @@ MODULE_LICENSE("GPL");
 #define NOTIFY_KBD_BRTUP		0xc4
 #define NOTIFY_KBD_BRTDWN		0xc5
 #define NOTIFY_KBD_BRTTOGGLE		0xc7
+#define NOTIFY_KBD_FBM			0x99
 
 #define ASUS_FAN_DESC			"cpu_fan"
 #define ASUS_FAN_MFUN			0x13
@@ -77,6 +78,8 @@ MODULE_LICENSE("GPL");
 #define ASUS_FAN_CTRL_MANUAL		1
 #define ASUS_FAN_CTRL_AUTO		2
 
+#define ASUS_FAN_MODE_COUNT		3
+
 #define USB_INTEL_XUSB2PR		0xD0
 #define PCI_DEVICE_ID_INTEL_LYNXPOINT_LP_XHCI	0x9c31
 
@@ -196,6 +199,9 @@ struct asus_wmi {
 	int asus_hwmon_num_fans;
 	int asus_hwmon_pwm;
 
+	bool fan_mode_available;
+	u8 fan_mode;
+
 	bool kbbl_rgb_available;
 	struct asus_kbbl_rgb kbbl_rgb;
 
@@ -1833,6 +1839,87 @@ static int asus_wmi_fan_init(struct asus_wmi *asus)
 	return 0;
 }
 
+/* Fan mode *******************************************************************/
+
+static int fan_mode_check_present(struct asus_wmi *asus)
+{
+	u32 result;
+	int err;
+
+	asus->fan_mode_available = false;
+
+	err = asus_wmi_get_devstate(asus, ASUS_WMI_DEVID_FAN_MODE, &result);
+	if (err) {
+		if (err == -ENODEV)
+			return 0;
+		else
+			return err;
+	}
+
+	if (result & ASUS_WMI_DSTS_PRESENCE_BIT)
+		asus->fan_mode_available = true;
+
+	return 0;
+}
+
+static int fan_mode_write(struct asus_wmi *asus)
+{
+	int err;
+	u8 value;
+	u32 retval;
+
+	value = asus->fan_mode % ASUS_FAN_MODE_COUNT;
+	pr_info(PR "Set fan mode: %u\n", value);
+	err = asus_wmi_set_devstate(ASUS_WMI_DEVID_FAN_MODE, value, &retval);
+
+	if (err) {
+		pr_warn(PR "Failed to set fan mode: %d\n", err);
+		return err;
+	}
+
+	if (retval != 1) {
+		pr_warn(PR "Failed to set fan mode (retval): 0x%x\n", retval);
+		return -EIO;
+	}
+
+	return 0;
+}
+
+static int fan_mode_switch_next(struct asus_wmi *asus)
+{
+	asus->fan_mode = (asus->fan_mode + 1) % ASUS_FAN_MODE_COUNT;
+	return fan_mode_write(asus);
+}
+
+static ssize_t fan_mode_show(struct device *dev,
+		struct device_attribute *attr, char *buf)
+{
+	struct asus_wmi *asus = dev_get_drvdata(dev);
+
+	return show_u8(asus->fan_mode, buf);
+}
+
+static ssize_t fan_mode_store(struct device *dev,
+		struct device_attribute *attr, const char *buf, size_t count)
+{
+	int result;
+	u8 new_mode;
+
+	struct asus_wmi *asus = dev_get_drvdata(dev);
+
+	result = store_u8(&new_mode, buf, count);
+	if (result < 0)
+		return result;
+
+	asus->fan_mode = new_mode % ASUS_FAN_MODE_COUNT;
+	fan_mode_write(asus);
+
+	return result;
+}
+
+// Fan mode: 0 - normal, 1 - turbo, 2 - quiet?
+static DEVICE_ATTR_RW(fan_mode);
+
 /* Backlight ******************************************************************/
 
 static int read_backlight_power(struct asus_wmi *asus)
@@ -2084,6 +2171,9 @@ static void asus_wmi_handle_notify(int code, struct asus_wmi *asus)
 		return;
 	}
 
+	if (asus->fan_mode_available && code == NOTIFY_KBD_FBM)
+		fan_mode_switch_next(asus);
+
 	if (is_display_toggle(code) && asus->driver->quirks->no_display_toggle)
 		return;
 
@@ -2237,6 +2327,7 @@ static struct attribute *platform_attributes[] = {
 	&dev_attr_touchpad.attr,
 	&dev_attr_lid_resume.attr,
 	&dev_attr_als_enable.attr,
+	&dev_attr_fan_mode.attr,
 	NULL
 };
 
@@ -2258,6 +2349,8 @@ static umode_t asus_sysfs_is_visible(struct kobject *kobj,
 		devid = ASUS_WMI_DEVID_LID_RESUME;
 	else if (attr == &dev_attr_als_enable.attr)
 		devid = ASUS_WMI_DEVID_ALS_ENABLE;
+	else if (attr == &dev_attr_fan_mode.attr)
+		ok = asus->fan_mode_available;
 
 	if (devid != -1)
 		ok = !(asus_wmi_get_devstate_simple(asus, devid) < 0);
@@ -2282,7 +2375,7 @@ static int asus_wmi_sysfs_init(struct platform_device *device)
 
 /* Platform device ************************************************************/
 
-static int asus_wmi_platform_init(struct asus_wmi *asus)
+static void asus_wmi_platform_init(struct asus_wmi *asus)
 {
 	int rv;
 
@@ -2334,13 +2427,6 @@ static int asus_wmi_platform_init(struct asus_wmi *asus)
 	if (asus->driver->quirks->wapf >= 0)
 		asus_wmi_set_devstate(ASUS_WMI_DEVID_CWAP,
 				      asus->driver->quirks->wapf, NULL);
-
-	return asus_wmi_sysfs_init(asus->platform_device);
-}
-
-static void asus_wmi_platform_exit(struct asus_wmi *asus)
-{
-	asus_wmi_sysfs_exit(asus->platform_device);
 }
 
 /* debugfs ********************************************************************/
@@ -2515,9 +2601,15 @@ static int asus_wmi_add(struct platform_device *pdev)
 	if (wdrv->detect_quirks)
 		wdrv->detect_quirks(asus->driver);
 
-	err = asus_wmi_platform_init(asus);
+	asus_wmi_platform_init(asus);
+
+	err = fan_mode_check_present(asus);
 	if (err)
-		goto fail_platform;
+		goto fail_fan_mode;
+
+	err = asus_wmi_sysfs_init(asus->platform_device);
+	if (err)
+		goto fail_sysfs;
 
 	err = asus_wmi_input_init(asus);
 	if (err)
@@ -2612,8 +2704,9 @@ static int asus_wmi_add(struct platform_device *pdev)
 fail_hwmon:
 	asus_wmi_input_exit(asus);
 fail_input:
-	asus_wmi_platform_exit(asus);
-fail_platform:
+	asus_wmi_sysfs_exit(asus->platform_device);
+fail_sysfs:
+fail_fan_mode:
 	kfree(asus);
 	return err;
 }
@@ -2630,7 +2723,7 @@ static int asus_wmi_remove(struct platform_device *device)
 	kbbl_rgb_exit(asus);
 	asus_wmi_rfkill_exit(asus);
 	asus_wmi_debugfs_exit(asus);
-	asus_wmi_platform_exit(asus);
+	asus_wmi_sysfs_exit(asus->platform_device);
 	asus_wmi_hwmon_exit(asus);
 
 	kfree(asus);
diff --git a/include/linux/platform_data/x86/asus-wmi.h b/include/linux/platform_data/x86/asus-wmi.h
index 25b7b653e6d2..0f3654b7b8a8 100644
--- a/include/linux/platform_data/x86/asus-wmi.h
+++ b/include/linux/platform_data/x86/asus-wmi.h
@@ -59,6 +59,7 @@
 #define ASUS_WMI_DEVID_LIGHTBAR		0x00050025
 #define ASUS_WMI_DEVID_KBD_RGB		0x00100056
 #define ASUS_WMI_DEVID_KBD_RGB2		0x00100057
+#define ASUS_WMI_DEVID_FAN_MODE		0x00110018
 
 /* Misc */
 #define ASUS_WMI_DEVID_CAMERA		0x00060013
-- 
2.17.1

^ permalink raw reply related

* [PATCH 09/11] platform/x86: asus-wmi: Control RGB keyboard backlight
From: Yurii Pavlovskyi @ 2019-04-10 20:33 UTC (permalink / raw)
  Cc: Corentin Chary, Darren Hart, Andy Shevchenko, Daniel Drake,
	acpi4asus-user, platform-driver-x86, linux-kernel, linux-api
In-Reply-To: <a8dd77f3-1e28-6416-5f04-aff44d5bb6d6@gmail.com>

The WMI exposes two methods for controlling RGB keyboard backlight which
allow to control:
* RGB components in range 00 - ff,
* Switch between 4 effects,
* Switch between 3 effect speed modes,
* Separately enable the backlight on boot, in awake state (after driver
  load), in sleep mode, and probably in something called shutdown mode
  (no observable effects of enabling it are known so far).

The configuration should be written to several sysfs parameter buffers
which are then written via WMI by writing either 1 or 2 to the "kbbl_set"
parameter. When reading the buffers the last written value is returned.

If the 2 is written to "kbbl_set", the parameters will be reset on reboot
(temporary mode), 1 is permanent mode, parameters are retained.

The calls use new 3-dword input buffer method call.

The functionality is only enabled if corresponding DSTS methods return
exact valid values.

The following script demonstrates usage:

echo Red [00 - ff]
echo 33 > /sys/devices/platform/asus-nb-wmi/kbbl/kbbl_red
echo Green [00 - ff]
echo ff > /sys/devices/platform/asus-nb-wmi/kbbl/kbbl_green
echo Blue [00 - ff]
echo 0 > /sys/devices/platform/asus-nb-wmi/kbbl/kbbl_blue
echo Mode: 0 - static color, 1 - blink, 2 - rainbow, 3 - strobe
echo 0 > /sys/devices/platform/asus-nb-wmi/kbbl/kbbl_mode
echo Speed for modes 1 and 2: 0 - slow, 1 - medium, 2 - fast
echo 0 > /sys/devices/platform/asus-nb-wmi/kbbl/kbbl_speed
echo Enable: 02 - on boot, before module load, 08 - awake, 20 - sleep,
echo 2a or ff to set all
echo 2a > /sys/devices/platform/asus-nb-wmi/kbbl/kbbl_flags
echo Save: 1 - permanently, 2 - temporarily, reset after reboot
echo 1 > /sys/devices/platform/asus-nb-wmi/kbbl/kbbl_set

Signed-off-by: Yurii Pavlovskyi <yurii.pavlovskyi@gmail.com>
---
 .../ABI/testing/sysfs-platform-asus-wmi       |  61 ++++
 drivers/platform/x86/asus-wmi.c               | 329 ++++++++++++++++++
 include/linux/platform_data/x86/asus-wmi.h    |   2 +
 3 files changed, 392 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-platform-asus-wmi b/Documentation/ABI/testing/sysfs-platform-asus-wmi
index 019e1e29370e..300a40519695 100644
--- a/Documentation/ABI/testing/sysfs-platform-asus-wmi
+++ b/Documentation/ABI/testing/sysfs-platform-asus-wmi
@@ -36,3 +36,64 @@ KernelVersion:	3.5
 Contact:	"AceLan Kao" <acelan.kao@canonical.com>
 Description:
 		Resume on lid open. 1 means on, 0 means off.
+
+What:		/sys/devices/platform/<platform>/kbbl/kbbl_red
+Date:		Apr 2019
+KernelVersion:	5.1
+Contact:	"Yurii Pavlovskyi" <yurii.pavlovskyi@gmail.com>
+Description:
+		RGB keyboard backlight red component: 00 .. ff.
+
+What:		/sys/devices/platform/<platform>/kbbl/kbbl_green
+Date:		Apr 2019
+KernelVersion:	5.1
+Contact:	"Yurii Pavlovskyi" <yurii.pavlovskyi@gmail.com>
+Description:
+		RGB keyboard backlight green component: 00 .. ff.
+
+What:		/sys/devices/platform/<platform>/kbbl/kbbl_blue
+Date:		Apr 2019
+KernelVersion:	5.1
+Contact:	"Yurii Pavlovskyi" <yurii.pavlovskyi@gmail.com>
+Description:
+		RGB keyboard backlight blue component: 00 .. ff.
+
+What:		/sys/devices/platform/<platform>/kbbl/kbbl_mode
+Date:		Apr 2019
+KernelVersion:	5.1
+Contact:	"Yurii Pavlovskyi" <yurii.pavlovskyi@gmail.com>
+Description:
+		RGB keyboard backlight mode:
+			* 0 - static color,
+			* 1 - blink,
+			* 2 - rainbow,
+			* 3 - strobe.
+
+What:		/sys/devices/platform/<platform>/kbbl/kbbl_speed
+Date:		Apr 2019
+KernelVersion:	5.1
+Contact:	"Yurii Pavlovskyi" <yurii.pavlovskyi@gmail.com>
+Description:
+		RGB keyboard backlight speed for modes 1 and 2:
+			* 0 - slow,
+			* 1 - medium,
+			* 2 - fast.
+
+What:		/sys/devices/platform/<platform>/kbbl/kbbl_flags
+Date:		Apr 2019
+KernelVersion:	5.1
+Contact:	"Yurii Pavlovskyi" <yurii.pavlovskyi@gmail.com>
+Description:
+		RGB keyboard backlight enable flags (2a to enable everything), OR of:
+			* 02 - on boot (until module load),
+			* 08 - awake,
+			* 20 - sleep.
+
+What:		/sys/devices/platform/<platform>/kbbl/kbbl_set
+Date:		Apr 2019
+KernelVersion:	5.1
+Contact:	"Yurii Pavlovskyi" <yurii.pavlovskyi@gmail.com>
+Description:
+		Write changed RGB keyboard backlight parameters:
+			* 1 - permanently,
+			* 2 - temporarily.
diff --git a/drivers/platform/x86/asus-wmi.c b/drivers/platform/x86/asus-wmi.c
index 175ecd5b7c51..f4323a57f22f 100644
--- a/drivers/platform/x86/asus-wmi.c
+++ b/drivers/platform/x86/asus-wmi.c
@@ -145,6 +145,21 @@ struct asus_rfkill {
 	u32 dev_id;
 };
 
+struct asus_kbbl_rgb {
+	u8 kbbl_red;
+	u8 kbbl_green;
+	u8 kbbl_blue;
+	u8 kbbl_mode;
+	u8 kbbl_speed;
+
+	u8 kbbl_set_red;
+	u8 kbbl_set_green;
+	u8 kbbl_set_blue;
+	u8 kbbl_set_mode;
+	u8 kbbl_set_speed;
+	u8 kbbl_set_flags;
+};
+
 struct asus_wmi {
 	int dsts_id;
 	int spec;
@@ -181,6 +196,9 @@ struct asus_wmi {
 	int asus_hwmon_num_fans;
 	int asus_hwmon_pwm;
 
+	bool kbbl_rgb_available;
+	struct asus_kbbl_rgb kbbl_rgb;
+
 	struct hotplug_slot hotplug_slot;
 	struct mutex hotplug_lock;
 	struct mutex wmi_lock;
@@ -656,6 +674,310 @@ static int asus_wmi_led_init(struct asus_wmi *asus)
 	return rv;
 }
 
+/* RGB keyboard backlight *****************************************************/
+
+static ssize_t show_u8(u8 value, char *buf)
+{
+	return scnprintf(buf, PAGE_SIZE, "%02x\n", value);
+}
+
+static ssize_t store_u8(u8 *value, const char *buf, int count)
+{
+	int err;
+	u8 result;
+
+	err = kstrtou8(buf, 16, &result);
+	if (err < 0) {
+		pr_warn(PR "Trying to store invalid value\n");
+		return err;
+	}
+
+	*value = result;
+
+	return count;
+}
+
+static ssize_t kbbl_red_show(struct device *dev, struct device_attribute *attr,
+		char *buf)
+{
+	struct asus_wmi *asus = dev_get_drvdata(dev);
+
+	return show_u8(asus->kbbl_rgb.kbbl_red, buf);
+}
+
+static ssize_t kbbl_red_store(struct device *dev, struct device_attribute *attr,
+		const char *buf, size_t count)
+{
+	struct asus_wmi *asus = dev_get_drvdata(dev);
+
+	return store_u8(&asus->kbbl_rgb.kbbl_set_red, buf, count);
+}
+
+static ssize_t kbbl_green_show(struct device *dev,
+		struct device_attribute *attr, char *buf)
+{
+	struct asus_wmi *asus = dev_get_drvdata(dev);
+
+	return show_u8(asus->kbbl_rgb.kbbl_green, buf);
+}
+
+static ssize_t kbbl_green_store(struct device *dev,
+		struct device_attribute *attr, const char *buf, size_t count)
+{
+	struct asus_wmi *asus = dev_get_drvdata(dev);
+
+	return store_u8(&asus->kbbl_rgb.kbbl_set_green, buf, count);
+}
+
+static ssize_t kbbl_blue_show(struct device *dev, struct device_attribute *attr,
+		char *buf)
+{
+	struct asus_wmi *asus = dev_get_drvdata(dev);
+
+	return show_u8(asus->kbbl_rgb.kbbl_blue, buf);
+}
+
+static ssize_t kbbl_blue_store(struct device *dev,
+		struct device_attribute *attr, const char *buf, size_t count)
+{
+	struct asus_wmi *asus = dev_get_drvdata(dev);
+
+	return store_u8(&asus->kbbl_rgb.kbbl_set_blue, buf, count);
+}
+
+static ssize_t kbbl_mode_show(struct device *dev, struct device_attribute *attr,
+		char *buf)
+{
+	struct asus_wmi *asus = dev_get_drvdata(dev);
+
+	return show_u8(asus->kbbl_rgb.kbbl_mode, buf);
+}
+
+static ssize_t kbbl_mode_store(struct device *dev,
+		struct device_attribute *attr, const char *buf, size_t count)
+{
+	struct asus_wmi *asus = dev_get_drvdata(dev);
+
+	return store_u8(&asus->kbbl_rgb.kbbl_set_mode, buf, count);
+}
+
+static ssize_t kbbl_speed_show(struct device *dev,
+		struct device_attribute *attr, char *buf)
+{
+	struct asus_wmi *asus = dev_get_drvdata(dev);
+
+	return show_u8(asus->kbbl_rgb.kbbl_speed, buf);
+}
+
+static ssize_t kbbl_speed_store(struct device *dev,
+		struct device_attribute *attr, const char *buf, size_t count)
+{
+	struct asus_wmi *asus = dev_get_drvdata(dev);
+
+	return store_u8(&asus->kbbl_rgb.kbbl_set_speed, buf, count);
+}
+
+static ssize_t kbbl_flags_show(struct device *dev,
+		struct device_attribute *attr, char *buf)
+{
+	struct asus_wmi *asus = dev_get_drvdata(dev);
+
+	return show_u8(asus->kbbl_rgb.kbbl_set_flags, buf);
+}
+
+static ssize_t kbbl_flags_store(struct device *dev,
+		struct device_attribute *attr, const char *buf, size_t count)
+{
+	struct asus_wmi *asus = dev_get_drvdata(dev);
+
+	return store_u8(&asus->kbbl_rgb.kbbl_set_flags, buf, count);
+}
+
+static ssize_t kbbl_set_show(struct device *dev,
+		struct device_attribute *attr, char *buf)
+{
+	return scnprintf(buf, PAGE_SIZE,
+			"Write to configure RGB keyboard backlight\n");
+}
+
+static int kbbl_rgb_write(struct asus_wmi *asus, int persistent)
+{
+	int err;
+	u32 retval;
+	u8 speed_byte;
+	u8 mode_byte;
+	u8 speed;
+	u8 mode;
+
+	speed = asus->kbbl_rgb.kbbl_set_speed;
+	switch (speed) {
+	case 0:
+	default:
+		speed_byte = 0xe1; // slow
+		speed = 0;
+		break;
+	case 1:
+		speed_byte = 0xeb; // medium
+		break;
+	case 2:
+		speed_byte = 0xf5; // fast
+		break;
+	}
+
+	mode = asus->kbbl_rgb.kbbl_set_mode;
+	switch (mode) {
+	case 0:
+	default:
+		mode_byte = 0x00; // static color
+		mode = 0;
+		break;
+	case 1:
+		mode_byte = 0x01; // blink
+		break;
+	case 2:
+		mode_byte = 0x02; // rainbow
+		break;
+	case 3:
+		mode_byte = 0x0a; // strobe
+		break;
+	}
+
+	err = asus_wmi_evaluate_method_3dw(ASUS_WMI_METHODID_DEVS,
+		ASUS_WMI_DEVID_KBD_RGB,
+		(persistent ? 0xb4 : 0xb3) |
+		(mode_byte << 8) |
+		(asus->kbbl_rgb.kbbl_set_red << 16) |
+		(asus->kbbl_rgb.kbbl_set_green << 24),
+		(asus->kbbl_rgb.kbbl_set_blue) |
+		(speed_byte << 8), &retval);
+	if (err) {
+		pr_warn(PR "RGB keyboard device 1, write error: %d\n", err);
+		return err;
+	}
+
+	if (retval != 1) {
+		pr_warn(PR "RGB keyboard device 1, write error (retval): %x\n",
+				retval);
+		return -EIO;
+	}
+
+	err = asus_wmi_evaluate_method_3dw(ASUS_WMI_METHODID_DEVS,
+		ASUS_WMI_DEVID_KBD_RGB2,
+		(0xbd) |
+		(asus->kbbl_rgb.kbbl_set_flags << 16) |
+		(persistent ? 0x0100 : 0x0000), 0, &retval);
+	if (err) {
+		pr_warn(PR "RGB keyboard device 2, write error: %d\n", err);
+		return err;
+	}
+
+	if (retval != 1) {
+		pr_warn(PR "RGB keyboard device 2, write error (retval): %x\n",
+				retval);
+		return -EIO;
+	}
+
+	asus->kbbl_rgb.kbbl_red = asus->kbbl_rgb.kbbl_set_red;
+	asus->kbbl_rgb.kbbl_green = asus->kbbl_rgb.kbbl_set_green;
+	asus->kbbl_rgb.kbbl_blue = asus->kbbl_rgb.kbbl_set_blue;
+	asus->kbbl_rgb.kbbl_mode = mode;
+	asus->kbbl_rgb.kbbl_speed = speed;
+
+	return 0;
+}
+
+static ssize_t kbbl_set_store(struct device *dev,
+		struct device_attribute *attr, const char *buf, size_t count)
+{
+	u8 value;
+	struct asus_wmi *asus;
+	int result;
+
+	asus = dev_get_drvdata(dev);
+	result = store_u8(&value, buf, count);
+	if (result < 0)
+		return result;
+
+	if (value == 1)
+		kbbl_rgb_write(asus, 1);
+	else if (value == 2)
+		kbbl_rgb_write(asus, 0);
+
+	return count;
+}
+
+/* RGB values: 00 .. ff */
+static DEVICE_ATTR_RW(kbbl_red);
+static DEVICE_ATTR_RW(kbbl_green);
+static DEVICE_ATTR_RW(kbbl_blue);
+
+/* Color modes: 0 - static color, 1 - blink, 2 - rainbow, 3 - strobe */
+static DEVICE_ATTR_RW(kbbl_mode);
+
+/* Speed for modes 1 and 2: 0 - slow, 1 - medium, 2 - fast */
+static DEVICE_ATTR_RW(kbbl_speed);
+
+/*
+ * Enable: 02 - on boot (until module load) | 08 - awake | 20 - sleep
+ * (2a or ff to enable everything)
+ *
+ * Logically 80 would be shutdown, but no visible effects of this option
+ * were observed so far
+ */
+static DEVICE_ATTR_RW(kbbl_flags);
+
+/* Write data: 1 - permanently, 2 - temporarily (reset after reboot) */
+static DEVICE_ATTR_RW(kbbl_set);
+
+static struct attribute *rgbkb_sysfs_attributes[] = {
+	&dev_attr_kbbl_red.attr,
+	&dev_attr_kbbl_green.attr,
+	&dev_attr_kbbl_blue.attr,
+	&dev_attr_kbbl_mode.attr,
+	&dev_attr_kbbl_speed.attr,
+	&dev_attr_kbbl_flags.attr,
+	&dev_attr_kbbl_set.attr,
+	NULL,
+};
+
+static const struct attribute_group kbbl_attribute_group = {
+	.name = "kbbl",
+	.attrs = rgbkb_sysfs_attributes
+};
+
+static int kbbl_rgb_init(struct asus_wmi *asus)
+{
+	int err;
+
+	err = asus_wmi_get_devstate_simple(asus, ASUS_WMI_DEVID_KBD_RGB);
+	if (err) {
+		if (err == -ENODEV)
+			return 0;
+		else
+			return err;
+	}
+
+	err = asus_wmi_get_devstate_simple(asus, ASUS_WMI_DEVID_KBD_RGB2);
+	if (err) {
+		if (err == -ENODEV)
+			return 0;
+		else
+			return err;
+	}
+
+	asus->kbbl_rgb_available = true;
+	return sysfs_create_group(&asus->platform_device->dev.kobj,
+			&kbbl_attribute_group);
+}
+
+static void kbbl_rgb_exit(struct asus_wmi *asus)
+{
+	if (asus->kbbl_rgb_available) {
+		sysfs_remove_group(&asus->platform_device->dev.kobj,
+				&kbbl_attribute_group);
+	}
+}
+
 /* RF *************************************************************************/
 
 /*
@@ -2212,6 +2534,10 @@ static int asus_wmi_add(struct platform_device *pdev)
 	if (err)
 		goto fail_leds;
 
+	err = kbbl_rgb_init(asus);
+	if (err)
+		goto fail_rgbkb;
+
 	asus_wmi_get_devstate(asus, ASUS_WMI_DEVID_WLAN, &result);
 	if (result & (ASUS_WMI_DSTS_PRESENCE_BIT | ASUS_WMI_DSTS_USER_BIT))
 		asus->driver->wlan_ctrl_by_user = 1;
@@ -2278,6 +2604,8 @@ static int asus_wmi_add(struct platform_device *pdev)
 fail_backlight:
 	asus_wmi_rfkill_exit(asus);
 fail_rfkill:
+	kbbl_rgb_exit(asus);
+fail_rgbkb:
 	asus_wmi_led_exit(asus);
 fail_leds:
 	asus_wmi_hwmon_exit(asus);
@@ -2299,6 +2627,7 @@ static int asus_wmi_remove(struct platform_device *device)
 	asus_wmi_backlight_exit(asus);
 	asus_wmi_input_exit(asus);
 	asus_wmi_led_exit(asus);
+	kbbl_rgb_exit(asus);
 	asus_wmi_rfkill_exit(asus);
 	asus_wmi_debugfs_exit(asus);
 	asus_wmi_platform_exit(asus);
diff --git a/include/linux/platform_data/x86/asus-wmi.h b/include/linux/platform_data/x86/asus-wmi.h
index 53dfc2541960..25b7b653e6d2 100644
--- a/include/linux/platform_data/x86/asus-wmi.h
+++ b/include/linux/platform_data/x86/asus-wmi.h
@@ -57,6 +57,8 @@
 #define ASUS_WMI_DEVID_KBD_BACKLIGHT	0x00050021
 #define ASUS_WMI_DEVID_LIGHT_SENSOR	0x00050022 /* ?? */
 #define ASUS_WMI_DEVID_LIGHTBAR		0x00050025
+#define ASUS_WMI_DEVID_KBD_RGB		0x00100056
+#define ASUS_WMI_DEVID_KBD_RGB2		0x00100057
 
 /* Misc */
 #define ASUS_WMI_DEVID_CAMERA		0x00060013
-- 
2.17.1

^ permalink raw reply related

* Re: [PATCH] Linux: Define struct termios2 in <termios.h> under _GNU_SOURCE [BZ #10339]
From: Adhemerval Zanella @ 2019-04-10 20:24 UTC (permalink / raw)
  To: Florian Weimer, libc-alpha; +Cc: hpa, linux-api, linuxppc-dev
In-Reply-To: <875zrnlakd.fsf@oldenburg2.str.redhat.com>



On 09/04/2019 07:47, Florian Weimer wrote:
> struct termios2 is required for setting arbitrary baud rates on serial
> ports.  <sys/ioctl.h> and <linux/termios.h> have conflicting
> definitions in the existing termios definitions, which means that it
> is currently very difficult to use TCGETS2/TCSETS2 and struct termios2
> with glibc.  Providing a definition within glibc resolves this problem.
> 
> This does not completely address bug 10339, but it at least exposes
> the current kernel functionality in this area.
> 
> Support for struct termios2 is architecture-specific in the kernel.
> Support on alpha was only added in Linux 4.20.  POWER support is
> currently missing.  The expectation is that the kernel will eventually
> use the generic UAPI definition for struct termios2.

I still think the better strategy, from both BZ#10339 and recent thread
discussion about the issue on libc-alpha, is rather to:

  1. Start to use termios2 ioctl kabi instead of termios1.  The only
     missing spot is alpha pre linux 4.20.

  2. Adjust sparc and mips to add c_ispeed and c_ospeed along with
     compat symbols to use termios1.  This will allow also some cleanup
     to remove _HAVE_STRUCT_TERMIOS_C_{O,I}SPEED.

  3. Use the compat symbols for alpha pre-4.20.

  4. With termios Linux ABI being essentially the same for all supported
     architectures (with support for c_ospeed and c_ispeed) we can move 
     forward to adapt the current cfgetospeed, cfgetispeed, cfsetospeed,
     cfsetispeed to work with arbitrary values.

     The POSIX and Linux extended BXX values will need to be handled 
     exceptionally.  It means their integers values will be reserved and
     mapped to the termios2 values.  The code resulting code for cfsetospeed,
     for instance, would be:

     ---
     static inline speed_t
     c_ispeed (tcflag_t c_cflag)
     {
       return (c_cflag >> IBSHIFT) & CBAUD;
     }

     /*
      * The top four bits in speed_t are reserved for future use, and *currently*
      * the equivalent values are the only valid baud_t values.
      */
     static inline bool         
     invalid_speed (speed_t x)
     {
       return x > 0x0fffffff;
     } 

     /* Set the output baud rate stored in *TERMIOS_P to the symbol SPEED */
     int
     cfsetospeed (struct termios *termios_p, speed_t speed)
     {
       if (invalid_speed (speed))
         {
           __set_errno (EINVAL); 
           return -1;
         }

       termios_p->c_ospeed = speed_kernel_from_user (speed);
       if ( c_ispeed (termios_p->c_cflag) == B0 )
         termios_p->c_ispeed = termios_p->c_ospeed;
  
       if ( (speed & ~CBAUD) != 0 || speed > _MAX_BAUD )
         speed = BOTHER;

       /*
        * Don't set the input flags here; the B0 in c_cflag indicates that
        * the input speed is tied to the output speed.
        */
       termios_p->c_cflag = (termios_p->c_cflag & ~CBAUD) | speed;
       return 0;
     } 
     ---

This allows us to adjust the baud rates to non-standard values using termios
interfaces without to resorting to add new headers and use a different API
(ioctl).

As Peter Anvin has indicated, he create a POC [1] with the aforementioned
new interfaces.  It has not been rebased against master, more specially against
my termios refactor to simplify the multiple architecture header definitions,
but I intend to use as a base.

> 
> 2019-04-09  Florian Weimer  <fweimer@redhat.com>
> 
> 	[BZ #10339]
> 	Linux: Define struct termios2 in <termios.h> under _GNU_SOURCE.
> 	* sysdeps/unix/sysv/linux/Makefile [$(subdir) == termios] (tests):
> 	Add tst-termios2.
> 	* sysdeps/unix/sysv/linux/tst-termios2.c: New file.
> 	* sysdeps/unix/sysv/linux/bits/termios2-struct.h: Likewise.
> 	* sysdeps/unix/sysv/linux/bits/termios.h [__USE_GNU]: Include it.
> 	* sysdeps/unix/sysv/linux/alpha/bits/termios2-struct.h: New file.
> 	* sysdeps/unix/sysv/linux/sparc/bits/termios2-struct.h: Likewise.
> 
> diff --git a/NEWS b/NEWS
> index b58e2469d4..5e6ecb9c7d 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -18,6 +18,9 @@ Major new features:
>  
>  * On Linux, the gettid function has been added.
>  
> +* On Linux, <termios.h> now provides a definition of struct termios2 with
> +  the _GNU_SOURCE feature test macro.
> +
>  * Minguo (Republic of China) calendar support has been added as an
>    alternative calendar for the following locales: zh_TW, cmn_TW, hak_TW,
>    nan_TW, lzh_TW.
> diff --git a/sysdeps/unix/sysv/linux/Makefile b/sysdeps/unix/sysv/linux/Makefile
> index 52ac6ad484..4cb5e4f0d2 100644
> --- a/sysdeps/unix/sysv/linux/Makefile
> +++ b/sysdeps/unix/sysv/linux/Makefile
> @@ -156,6 +156,7 @@ endif
>  
>  ifeq ($(subdir),termios)
>  sysdep_headers += termio.h
> +tests += tst-termios2
>  endif
>  
>  ifeq ($(subdir),posix)
> diff --git a/sysdeps/unix/sysv/linux/alpha/bits/termios2-struct.h b/sysdeps/unix/sysv/linux/alpha/bits/termios2-struct.h
> new file mode 100644
> index 0000000000..5f09445e23
> --- /dev/null
> +++ b/sysdeps/unix/sysv/linux/alpha/bits/termios2-struct.h
> @@ -0,0 +1,33 @@
> +/* struct termios2 definition.  Linux/alpha version.
> +   Copyright (C) 2019 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library.  If not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#ifndef _TERMIOS_H
> +# error "Never include <bits/termios2-struct.h> directly; use <termios.h> instead."
> +#endif
> +
> +struct termios2
> +{
> +  tcflag_t c_iflag;
> +  tcflag_t c_oflag;
> +  tcflag_t c_cflag;
> +  tcflag_t c_lflag;
> +  cc_t c_cc[NCCS];
> +  cc_t c_line;
> +  speed_t c_ispeed;
> +  speed_t c_ospeed;
> +};
> diff --git a/sysdeps/unix/sysv/linux/bits/termios.h b/sysdeps/unix/sysv/linux/bits/termios.h
> index 997231cd03..45ac7affdf 100644
> --- a/sysdeps/unix/sysv/linux/bits/termios.h
> +++ b/sysdeps/unix/sysv/linux/bits/termios.h
> @@ -25,6 +25,10 @@ typedef unsigned int	speed_t;
>  typedef unsigned int	tcflag_t;
>  
>  #include <bits/termios-struct.h>
> +#ifdef __USE_GNU
> +# include <bits/termios2-struct.h>
> +#endif
> +
>  #include <bits/termios-c_cc.h>
>  #include <bits/termios-c_iflag.h>
>  #include <bits/termios-c_oflag.h>
> diff --git a/sysdeps/unix/sysv/linux/bits/termios2-struct.h b/sysdeps/unix/sysv/linux/bits/termios2-struct.h
> new file mode 100644
> index 0000000000..5a48e45ef3
> --- /dev/null
> +++ b/sysdeps/unix/sysv/linux/bits/termios2-struct.h
> @@ -0,0 +1,33 @@
> +/* struct termios2 definition.  Linux/generic version.
> +   Copyright (C) 2019 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library.  If not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#ifndef _TERMIOS_H
> +# error "Never include <bits/termios2-struct.h> directly; use <termios.h> instead."
> +#endif
> +
> +struct termios2
> +{
> +  tcflag_t c_iflag;
> +  tcflag_t c_oflag;
> +  tcflag_t c_cflag;
> +  tcflag_t c_lflag;
> +  cc_t c_line;
> +  cc_t c_cc[NCCS];
> +  speed_t c_ispeed;
> +  speed_t c_ospeed;
> +};
> diff --git a/sysdeps/unix/sysv/linux/sparc/bits/termios2-struct.h b/sysdeps/unix/sysv/linux/sparc/bits/termios2-struct.h
> new file mode 100644
> index 0000000000..7c889e575c
> --- /dev/null
> +++ b/sysdeps/unix/sysv/linux/sparc/bits/termios2-struct.h
> @@ -0,0 +1,33 @@
> +/* struct termios2 definition.  Linux/sparc version.
> +   Copyright (C) 2019 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library.  If not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#ifndef _TERMIOS_H
> +# error "Never include <bits/termios2-struct.h> directly; use <termios.h> instead."
> +#endif
> +
> +struct termios2
> +{
> +  tcflag_t c_iflag;
> +  tcflag_t c_oflag;
> +  tcflag_t c_cflag;
> +  tcflag_t c_lflag;
> +  cc_t c_line;
> +  cc_t c_cc[NCCS + 2];
> +  speed_t c_ispeed;
> +  speed_t c_ospeed;
> +};
> diff --git a/sysdeps/unix/sysv/linux/tst-termios2.c b/sysdeps/unix/sysv/linux/tst-termios2.c
> new file mode 100644
> index 0000000000..82326a1288
> --- /dev/null
> +++ b/sysdeps/unix/sysv/linux/tst-termios2.c
> @@ -0,0 +1,48 @@
> +/* Minimal test of struct termios2 definition.
> +   Copyright (C) 2019 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library.  If not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#include <termios.h>
> +#include <sys/ioctl.h>
> +
> +/* This function is never executed, but must be compiled successfully.
> +   Accessing serial ports in the test suite is problematic because
> +   they likely correspond with low-level system functionality.  */
> +void
> +not_executed (int fd)
> +{
> +  /* Avoid a compilation failure if TCGETS2, TCSETS2 are not
> +     defined.  */
> +#if defined (TCGETS2) && defined (TCSETS2)
> +  struct termios2 ti;
> +  ioctl (fd, TCGETS2, &ti);
> +  ioctl (fd, TCSETS2, &ti);
> +#endif
> +}
> +
> +static int
> +do_test (void)
> +{
> +  /* Fail at run time if TCGETS2 or TCSETS2 is not defined.  */
> +#if defined (TCGETS2) && defined (TCSETS2)
> +  return 0;
> +#else
> +  return 1;
> +#endif
> +}
> +
> +#include <support/test-driver.c>
> 

^ permalink raw reply

* Re: [PATCH v14 1/3] /proc/pid/status: Add support for architecture specific output
From: Andy Lutomirski @ 2019-04-10 14:54 UTC (permalink / raw)
  To: Li, Aubrey
  Cc: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
	H. Peter Anvin, Andi Kleen, Tim Chen, Dave Hansen,
	Arjan van de Ven, Alexey Dobriyan, Andrew Morton, aubrey.li,
	Linux API, LKML
In-Reply-To: <39318cc8-340f-5366-f1d6-e7dcbb6ba595@linux.intel.com>

On Tue, Apr 9, 2019 at 8:40 PM Li, Aubrey <aubrey.li@linux.intel.com> wrote:
>
> On 2019/4/10 10:36, Li, Aubrey wrote:
> > On 2019/4/10 10:25, Andy Lutomirski wrote:
> >> On Tue, Apr 9, 2019 at 7:20 PM Li, Aubrey <aubrey.li@linux.intel.com> wrote:
> >>>
> >>> On 2019/4/10 9:58, Andy Lutomirski wrote:
> >>>> On Tue, Apr 9, 2019 at 6:55 PM Aubrey Li <aubrey.li@linux.intel.com> wrote:
> >>>>>
> >>>>> The architecture specific information of the running processes could
> >>>>> be useful to the userland. Add support to examine process architecture
> >>>>> specific information externally.
> >>>>>
> >>>>> Signed-off-by: Aubrey Li <aubrey.li@linux.intel.com>
> >>>>> Cc: Peter Zijlstra <peterz@infradead.org>
> >>>>> Cc: Andi Kleen <ak@linux.intel.com>
> >>>>> Cc: Tim Chen <tim.c.chen@linux.intel.com>
> >>>>> Cc: Dave Hansen <dave.hansen@intel.com>
> >>>>> Cc: Arjan van de Ven <arjan@linux.intel.com>
> >>>>> Cc: Linux API <linux-api@vger.kernel.org>
> >>>>> Cc: Alexey Dobriyan <adobriyan@gmail.com>
> >>>>> Cc: Andrew Morton <akpm@linux-foundation.org>
> >>>>> ---
> >>>>>  fs/proc/array.c         | 5 +++++
> >>>>>  include/linux/proc_fs.h | 2 ++
> >>>>>  2 files changed, 7 insertions(+)
> >>>>>
> >>>>> diff --git a/fs/proc/array.c b/fs/proc/array.c
> >>>>> index 2edbb657f859..331592a61718 100644
> >>>>> --- a/fs/proc/array.c
> >>>>> +++ b/fs/proc/array.c
> >>>>> @@ -401,6 +401,10 @@ static inline void task_thp_status(struct seq_file *m, struct mm_struct *mm)
> >>>>>         seq_printf(m, "THP_enabled:\t%d\n", thp_enabled);
> >>>>>  }
> >>>>>
> >>>>> +void __weak arch_proc_pid_status(struct seq_file *m, struct task_struct *task)
> >>>>> +{
> >>>>> +}
> >>>>
> >>>> This pointlessly bloats other architectures.  Do this instead in an
> >>>> appropriate header:
> >>>>
> >>>> #ifndef arch_proc_pid_status
> >>>> static inline void arch_proc_pid_status(...)
> >>>> {
> >>>> }
> >>>> #endif
> >>>>
> >>>
> >>> I saw a bunch of similar weak functions, is it not acceptable?
> >>>
> >>> fs/proc$ grep weak *.c
> >>> cpuinfo.c:__weak void arch_freq_prepare_all(void)
> >>> meminfo.c:void __attribute__((weak)) arch_report_meminfo(struct seq_file *m)
> >>> vmcore.c:int __weak elfcorehdr_alloc(unsigned long long *addr, unsigned long long *size)
> >>> vmcore.c:void __weak elfcorehdr_free(unsigned long long addr)
> >>> vmcore.c:ssize_t __weak elfcorehdr_read(char *buf, size_t count, u64 *ppos)
> >>> vmcore.c:ssize_t __weak elfcorehdr_read_notes(char *buf, size_t count, u64 *ppos)
> >>> vmcore.c:int __weak remap_oldmem_pfn_range(struct vm_area_struct *vma,
> >>> vmcore.c:ssize_t __weak
> >>
> >> I think they're acceptable, but I don't personally like them.
> >>
> >
> > okay, let me try to see if I can refine it in an appropriate way.
>
> Hi Andy,
>
> Is this what you want?
>
> Thanks,
> -Aubrey
>
> ====================================================================
> diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
> index 2bb3a648fc12..82d77d3aefff 100644
> --- a/arch/x86/include/asm/processor.h
> +++ b/arch/x86/include/asm/processor.h
> @@ -990,5 +990,8 @@ enum l1tf_mitigations {
>  };
>
>  extern enum l1tf_mitigations l1tf_mitigation;
> +/* Add support for architecture specific output in /proc/pid/status */
> +void arch_proc_pid_status(struct seq_file *m, struct task_struct *task);
> +#define arch_proc_pid_status arch_proc_pid_status
>
>  #endif /* _ASM_X86_PROCESSOR_H */
> diff --git a/fs/proc/array.c b/fs/proc/array.c
> index 2edbb657f859..fd65a6ba2864 100644
> --- a/fs/proc/array.c
> +++ b/fs/proc/array.c
> @@ -401,6 +401,11 @@ static inline void task_thp_status(struct seq_file *m, struct mm_struct *mm)
>         seq_printf(m, "THP_enabled:\t%d\n", thp_enabled);
>  }
>
> +/* Add support for architecture specific output in /proc/pid/status */
> +#ifndef arch_proc_pid_status
> +#define arch_proc_pid_status(m, task)
> +#endif
> +
>  int proc_pid_status(struct seq_file *m, struct pid_namespace *ns,
>                         struct pid *pid, struct task_struct *task)
>  {
> @@ -424,6 +429,7 @@ int proc_pid_status(struct seq_file *m, struct pid_namespace *ns,
>         task_cpus_allowed(m, task);
>         cpuset_task_status_allowed(m, task);
>         task_context_switch_counts(m, task);
> +       arch_proc_pid_status(m, task);
>         return 0;
>  }
>

Yes.  But I still think it would be nicer to separate the arch stuff
into its own file.  Others might reasonably disagree with me.

^ permalink raw reply

* Re: [PATCH] Linux: Define struct termios2 in <termios.h> under _GNU_SOURCE [BZ #10339]
From: Florian Weimer @ 2019-04-10  6:50 UTC (permalink / raw)
  To: hpa; +Cc: libc-alpha, adhemerval.zanella, linux-api, linuxppc-dev
In-Reply-To: <EF5CE34B-1A82-4391-B07E-DB832814AD1C@zytor.com>

* hpa:

> PowerPC doesn't need it at all.

Because the definition would be the same as struct termios?

I think we should still define struct termios2 using the struct termios
definition, and also define TCGETS2 and TCSETS2.  This way, applications
can use the struct termios2 type without affecting portability to POWER.

Thanks,
Florian

^ permalink raw reply

* Re: [PATCH v14 1/3] /proc/pid/status: Add support for architecture specific output
From: Li, Aubrey @ 2019-04-10  3:39 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Thomas Gleixner, Ingo Molnar, Peter Zijlstra, H. Peter Anvin,
	Andi Kleen, Tim Chen, Dave Hansen, Arjan van de Ven,
	Alexey Dobriyan, Andrew Morton, aubrey.li, Linux API, LKML
In-Reply-To: <d9e9f68e-e8bb-7e53-9909-d86f5f73f617@linux.intel.com>

On 2019/4/10 10:36, Li, Aubrey wrote:
> On 2019/4/10 10:25, Andy Lutomirski wrote:
>> On Tue, Apr 9, 2019 at 7:20 PM Li, Aubrey <aubrey.li@linux.intel.com> wrote:
>>>
>>> On 2019/4/10 9:58, Andy Lutomirski wrote:
>>>> On Tue, Apr 9, 2019 at 6:55 PM Aubrey Li <aubrey.li@linux.intel.com> wrote:
>>>>>
>>>>> The architecture specific information of the running processes could
>>>>> be useful to the userland. Add support to examine process architecture
>>>>> specific information externally.
>>>>>
>>>>> Signed-off-by: Aubrey Li <aubrey.li@linux.intel.com>
>>>>> Cc: Peter Zijlstra <peterz@infradead.org>
>>>>> Cc: Andi Kleen <ak@linux.intel.com>
>>>>> Cc: Tim Chen <tim.c.chen@linux.intel.com>
>>>>> Cc: Dave Hansen <dave.hansen@intel.com>
>>>>> Cc: Arjan van de Ven <arjan@linux.intel.com>
>>>>> Cc: Linux API <linux-api@vger.kernel.org>
>>>>> Cc: Alexey Dobriyan <adobriyan@gmail.com>
>>>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>>>> ---
>>>>>  fs/proc/array.c         | 5 +++++
>>>>>  include/linux/proc_fs.h | 2 ++
>>>>>  2 files changed, 7 insertions(+)
>>>>>
>>>>> diff --git a/fs/proc/array.c b/fs/proc/array.c
>>>>> index 2edbb657f859..331592a61718 100644
>>>>> --- a/fs/proc/array.c
>>>>> +++ b/fs/proc/array.c
>>>>> @@ -401,6 +401,10 @@ static inline void task_thp_status(struct seq_file *m, struct mm_struct *mm)
>>>>>         seq_printf(m, "THP_enabled:\t%d\n", thp_enabled);
>>>>>  }
>>>>>
>>>>> +void __weak arch_proc_pid_status(struct seq_file *m, struct task_struct *task)
>>>>> +{
>>>>> +}
>>>>
>>>> This pointlessly bloats other architectures.  Do this instead in an
>>>> appropriate header:
>>>>
>>>> #ifndef arch_proc_pid_status
>>>> static inline void arch_proc_pid_status(...)
>>>> {
>>>> }
>>>> #endif
>>>>
>>>
>>> I saw a bunch of similar weak functions, is it not acceptable?
>>>
>>> fs/proc$ grep weak *.c
>>> cpuinfo.c:__weak void arch_freq_prepare_all(void)
>>> meminfo.c:void __attribute__((weak)) arch_report_meminfo(struct seq_file *m)
>>> vmcore.c:int __weak elfcorehdr_alloc(unsigned long long *addr, unsigned long long *size)
>>> vmcore.c:void __weak elfcorehdr_free(unsigned long long addr)
>>> vmcore.c:ssize_t __weak elfcorehdr_read(char *buf, size_t count, u64 *ppos)
>>> vmcore.c:ssize_t __weak elfcorehdr_read_notes(char *buf, size_t count, u64 *ppos)
>>> vmcore.c:int __weak remap_oldmem_pfn_range(struct vm_area_struct *vma,
>>> vmcore.c:ssize_t __weak
>>
>> I think they're acceptable, but I don't personally like them.
>>
> 
> okay, let me try to see if I can refine it in an appropriate way.

Hi Andy,

Is this what you want? 

Thanks,
-Aubrey

====================================================================
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 2bb3a648fc12..82d77d3aefff 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -990,5 +990,8 @@ enum l1tf_mitigations {
 };
 
 extern enum l1tf_mitigations l1tf_mitigation;
+/* Add support for architecture specific output in /proc/pid/status */
+void arch_proc_pid_status(struct seq_file *m, struct task_struct *task);
+#define arch_proc_pid_status arch_proc_pid_status
 
 #endif /* _ASM_X86_PROCESSOR_H */
diff --git a/fs/proc/array.c b/fs/proc/array.c
index 2edbb657f859..fd65a6ba2864 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -401,6 +401,11 @@ static inline void task_thp_status(struct seq_file *m, struct mm_struct *mm)
 	seq_printf(m, "THP_enabled:\t%d\n", thp_enabled);
 }
 
+/* Add support for architecture specific output in /proc/pid/status */
+#ifndef arch_proc_pid_status
+#define arch_proc_pid_status(m, task)
+#endif
+
 int proc_pid_status(struct seq_file *m, struct pid_namespace *ns,
 			struct pid *pid, struct task_struct *task)
 {
@@ -424,6 +429,7 @@ int proc_pid_status(struct seq_file *m, struct pid_namespace *ns,
 	task_cpus_allowed(m, task);
 	cpuset_task_status_allowed(m, task);
 	task_context_switch_counts(m, task);
+	arch_proc_pid_status(m, task);
 	return 0;
 }
 

^ permalink raw reply related

* Re: [PATCH v14 1/3] /proc/pid/status: Add support for architecture specific output
From: Li, Aubrey @ 2019-04-10  2:36 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Thomas Gleixner, Ingo Molnar, Peter Zijlstra, H. Peter Anvin,
	Andi Kleen, Tim Chen, Dave Hansen, Arjan van de Ven,
	Alexey Dobriyan, Andrew Morton, aubrey.li, Linux API, LKML
In-Reply-To: <CALCETrWD4H8zZVD+Mk+=1VWmYNWx=aC7w+LOi8KqoSpAFgD--Q@mail.gmail.com>

On 2019/4/10 10:25, Andy Lutomirski wrote:
> On Tue, Apr 9, 2019 at 7:20 PM Li, Aubrey <aubrey.li@linux.intel.com> wrote:
>>
>> On 2019/4/10 9:58, Andy Lutomirski wrote:
>>> On Tue, Apr 9, 2019 at 6:55 PM Aubrey Li <aubrey.li@linux.intel.com> wrote:
>>>>
>>>> The architecture specific information of the running processes could
>>>> be useful to the userland. Add support to examine process architecture
>>>> specific information externally.
>>>>
>>>> Signed-off-by: Aubrey Li <aubrey.li@linux.intel.com>
>>>> Cc: Peter Zijlstra <peterz@infradead.org>
>>>> Cc: Andi Kleen <ak@linux.intel.com>
>>>> Cc: Tim Chen <tim.c.chen@linux.intel.com>
>>>> Cc: Dave Hansen <dave.hansen@intel.com>
>>>> Cc: Arjan van de Ven <arjan@linux.intel.com>
>>>> Cc: Linux API <linux-api@vger.kernel.org>
>>>> Cc: Alexey Dobriyan <adobriyan@gmail.com>
>>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>>> ---
>>>>  fs/proc/array.c         | 5 +++++
>>>>  include/linux/proc_fs.h | 2 ++
>>>>  2 files changed, 7 insertions(+)
>>>>
>>>> diff --git a/fs/proc/array.c b/fs/proc/array.c
>>>> index 2edbb657f859..331592a61718 100644
>>>> --- a/fs/proc/array.c
>>>> +++ b/fs/proc/array.c
>>>> @@ -401,6 +401,10 @@ static inline void task_thp_status(struct seq_file *m, struct mm_struct *mm)
>>>>         seq_printf(m, "THP_enabled:\t%d\n", thp_enabled);
>>>>  }
>>>>
>>>> +void __weak arch_proc_pid_status(struct seq_file *m, struct task_struct *task)
>>>> +{
>>>> +}
>>>
>>> This pointlessly bloats other architectures.  Do this instead in an
>>> appropriate header:
>>>
>>> #ifndef arch_proc_pid_status
>>> static inline void arch_proc_pid_status(...)
>>> {
>>> }
>>> #endif
>>>
>>
>> I saw a bunch of similar weak functions, is it not acceptable?
>>
>> fs/proc$ grep weak *.c
>> cpuinfo.c:__weak void arch_freq_prepare_all(void)
>> meminfo.c:void __attribute__((weak)) arch_report_meminfo(struct seq_file *m)
>> vmcore.c:int __weak elfcorehdr_alloc(unsigned long long *addr, unsigned long long *size)
>> vmcore.c:void __weak elfcorehdr_free(unsigned long long addr)
>> vmcore.c:ssize_t __weak elfcorehdr_read(char *buf, size_t count, u64 *ppos)
>> vmcore.c:ssize_t __weak elfcorehdr_read_notes(char *buf, size_t count, u64 *ppos)
>> vmcore.c:int __weak remap_oldmem_pfn_range(struct vm_area_struct *vma,
>> vmcore.c:ssize_t __weak
> 
> I think they're acceptable, but I don't personally like them.
> 

okay, let me try to see if I can refine it in an appropriate way.

>>
>>> Or add /proc/PID/x86_status, which sounds better in most respects to me.
>>>
>>
>> I didn't figure out how to make /proc/PID/x86_status invisible to other
>> architectures in an appropriate way, do you have any suggestions?
> 
> #ifdef CONFIG_X86?
> 

I'm not sure if everyone like adding an arch #ifdef in a common file,
please allow me to wait a while to see if there are other comments.

Thanks,
-Aubrey

^ permalink raw reply

* Re: [PATCH v14 1/3] /proc/pid/status: Add support for architecture specific output
From: Andy Lutomirski @ 2019-04-10  2:25 UTC (permalink / raw)
  To: Li, Aubrey
  Cc: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
	H. Peter Anvin, Andi Kleen, Tim Chen, Dave Hansen,
	Arjan van de Ven, Alexey Dobriyan, Andrew Morton, aubrey.li,
	Linux API, LKML
In-Reply-To: <89957461-ec42-f041-a7e9-2452b2e83d39@linux.intel.com>

On Tue, Apr 9, 2019 at 7:20 PM Li, Aubrey <aubrey.li@linux.intel.com> wrote:
>
> On 2019/4/10 9:58, Andy Lutomirski wrote:
> > On Tue, Apr 9, 2019 at 6:55 PM Aubrey Li <aubrey.li@linux.intel.com> wrote:
> >>
> >> The architecture specific information of the running processes could
> >> be useful to the userland. Add support to examine process architecture
> >> specific information externally.
> >>
> >> Signed-off-by: Aubrey Li <aubrey.li@linux.intel.com>
> >> Cc: Peter Zijlstra <peterz@infradead.org>
> >> Cc: Andi Kleen <ak@linux.intel.com>
> >> Cc: Tim Chen <tim.c.chen@linux.intel.com>
> >> Cc: Dave Hansen <dave.hansen@intel.com>
> >> Cc: Arjan van de Ven <arjan@linux.intel.com>
> >> Cc: Linux API <linux-api@vger.kernel.org>
> >> Cc: Alexey Dobriyan <adobriyan@gmail.com>
> >> Cc: Andrew Morton <akpm@linux-foundation.org>
> >> ---
> >>  fs/proc/array.c         | 5 +++++
> >>  include/linux/proc_fs.h | 2 ++
> >>  2 files changed, 7 insertions(+)
> >>
> >> diff --git a/fs/proc/array.c b/fs/proc/array.c
> >> index 2edbb657f859..331592a61718 100644
> >> --- a/fs/proc/array.c
> >> +++ b/fs/proc/array.c
> >> @@ -401,6 +401,10 @@ static inline void task_thp_status(struct seq_file *m, struct mm_struct *mm)
> >>         seq_printf(m, "THP_enabled:\t%d\n", thp_enabled);
> >>  }
> >>
> >> +void __weak arch_proc_pid_status(struct seq_file *m, struct task_struct *task)
> >> +{
> >> +}
> >
> > This pointlessly bloats other architectures.  Do this instead in an
> > appropriate header:
> >
> > #ifndef arch_proc_pid_status
> > static inline void arch_proc_pid_status(...)
> > {
> > }
> > #endif
> >
>
> I saw a bunch of similar weak functions, is it not acceptable?
>
> fs/proc$ grep weak *.c
> cpuinfo.c:__weak void arch_freq_prepare_all(void)
> meminfo.c:void __attribute__((weak)) arch_report_meminfo(struct seq_file *m)
> vmcore.c:int __weak elfcorehdr_alloc(unsigned long long *addr, unsigned long long *size)
> vmcore.c:void __weak elfcorehdr_free(unsigned long long addr)
> vmcore.c:ssize_t __weak elfcorehdr_read(char *buf, size_t count, u64 *ppos)
> vmcore.c:ssize_t __weak elfcorehdr_read_notes(char *buf, size_t count, u64 *ppos)
> vmcore.c:int __weak remap_oldmem_pfn_range(struct vm_area_struct *vma,
> vmcore.c:ssize_t __weak

I think they're acceptable, but I don't personally like them.

>
> > Or add /proc/PID/x86_status, which sounds better in most respects to me.
> >
>
> I didn't figure out how to make /proc/PID/x86_status invisible to other
> architectures in an appropriate way, do you have any suggestions?

#ifdef CONFIG_X86?

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox