* Re: [PATCH v1 1/6] eeprom: Add a simple EEPROM framework for eeprom providers
From: Srinivas Kandagatla @ 2015-03-09 7:13 UTC (permalink / raw)
To: Mark Brown
Cc: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Maxime Ripard,
Rob Herring, Pawel Moll, Kumar Gala,
linux-api-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
devicetree-u79uwXL29TY76Z2rM5mHXA, Stephen Boyd,
andrew-g2DYL2Zd6BY, Arnd Bergmann, Greg Kroah-Hartman
In-Reply-To: <20150307150035.GN28806-GFdadSzt00ze9xe1eoZjHA@public.gmane.org>
On 07/03/15 15:00, Mark Brown wrote:
> On Thu, Mar 05, 2015 at 09:45:41AM +0000, Srinivas Kandagatla wrote:
>
>> +
>> + return eeprom;
>> +}
>> +EXPORT_SYMBOL(eeprom_register);
>
> This framework uses regmap but regmap is EXPORT_SYMBOL_GPL() and this is
> using EXPORT_SYMBOL().
>
Thanks for spotting this, I will fix this in the next version.
>> +int eeprom_unregister(struct eeprom_device *eeprom)
>> +{
>> + mutex_lock(&eeprom_mutex);
>> + if (atomic_read(&eeprom->users)) {
>> + mutex_unlock(&eeprom_mutex);
>
> Atomic reads and a mutex - isn't the mutex enough? Atomics are
> generally a recipe for bugs due to the complexity in using them.
Yes, you are right. As long as we protect the users variable with the mutex,
using an atomic is redundant. I will fix it in the next version.
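The change agreed on here could look roughly like the following userspace sketch. It uses a pthread mutex in place of the kernel mutex, and every name in it (`demo_eeprom`, `demo_get`, `demo_put`, `demo_unregister`) is a hypothetical stand-in rather than the framework's API. The `-EBUSY` and `-ENODEV` return values are assumptions too, since the quoted hunk does not show them.

```c
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for struct eeprom_device. */
struct demo_eeprom {
	int users;	/* plain int: only touched with demo_mutex held */
	int registered;	/* nonzero while the device may be used */
};

static pthread_mutex_t demo_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Take a reference; fails once the device has been unregistered. */
int demo_get(struct demo_eeprom *e)
{
	int ret = 0;

	pthread_mutex_lock(&demo_mutex);
	if (e->registered)
		e->users++;
	else
		ret = -ENODEV;
	pthread_mutex_unlock(&demo_mutex);
	return ret;
}

/* Drop a reference taken with demo_get(). */
void demo_put(struct demo_eeprom *e)
{
	pthread_mutex_lock(&demo_mutex);
	e->users--;
	pthread_mutex_unlock(&demo_mutex);
}

/* Refuse to unregister while users remain (return value is assumed). */
int demo_unregister(struct demo_eeprom *e)
{
	int ret = 0;

	pthread_mutex_lock(&demo_mutex);
	if (e->users)
		ret = -EBUSY;
	else
		e->registered = 0;
	pthread_mutex_unlock(&demo_mutex);
	return ret;
}
```

Because every read and write of `users` happens under the same lock, the check in `demo_unregister()` cannot race with `demo_get()`, which is why a separate atomic adds nothing.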
>
^ permalink raw reply
* Re: [PATCH v2 02/18] ARM: ARMv7M: Enlarge vector table to 256 entries
From: Stefan Agner @ 2015-03-09 0:29 UTC (permalink / raw)
To: Maxime Coquelin
Cc: u.kleine-koenig, afaerber, geert, Rob Herring, Philipp Zabel,
Jonathan Corbet, Pawel Moll, Mark Rutland, Ian Campbell,
Kumar Gala, Russell King, Daniel Lezcano, Thomas Gleixner,
Linus Walleij, Greg Kroah-Hartman, Jiri Slaby, Arnd Bergmann,
Andrew Morton, David S. Miller, Mauro Carvalho Chehab,
Joe Perches, Antti Palosaari, Tejun Heo, Will Deacon, Nikolay
In-Reply-To: <1424455277-29983-3-git-send-email-mcoquelin.stm32@gmail.com>
On 2015-02-20 19:01, Maxime Coquelin wrote:
> From Cortex-M reference manuals, the nvic supports up to 240 interrupts.
> So the number of entries in vectors table is up to 256.
>
> This patch adds a new config flag to specify the number of external interrupts.
> Some ifdeferies are added in order to respect the natural alignment without
> wasting too much space on smaller systems.
>
> Signed-off-by: Maxime Coquelin <mcoquelin.stm32@gmail.com>
> ---
> arch/arm/kernel/entry-v7m.S | 13 +++++++++----
> arch/arm/mm/Kconfig | 15 +++++++++++++++
> 2 files changed, 24 insertions(+), 4 deletions(-)
>
> diff --git a/arch/arm/kernel/entry-v7m.S b/arch/arm/kernel/entry-v7m.S
> index 8944f49..68cde36 100644
> --- a/arch/arm/kernel/entry-v7m.S
> +++ b/arch/arm/kernel/entry-v7m.S
> @@ -117,9 +117,14 @@ ENTRY(__switch_to)
> ENDPROC(__switch_to)
>
> .data
> - .align 8
> +#if CONFIG_CPUV7M_NUM_IRQ <= 112
> + .align 9
> +#else
> + .align 10
> +#endif
> +
> /*
> - * Vector table (64 words => 256 bytes natural alignment)
> + * Vector table (Natural alignment need to be ensured)
> */
> ENTRY(vector_table)
> .long 0 @ 0 - Reset stack pointer
> @@ -138,6 +143,6 @@ ENTRY(vector_table)
> .long __invalid_entry @ 13 - Reserved
> .long __pendsv_entry @ 14 - PendSV
> .long __invalid_entry @ 15 - SysTick
> - .rept 64 - 16
> - .long __irq_entry @ 16..64 - External Interrupts
> + .rept CONFIG_CPUV7M_NUM_IRQ
> + .long __irq_entry @ External Interrupts
> .endr
> diff --git a/arch/arm/mm/Kconfig b/arch/arm/mm/Kconfig
> index c43c714..27eb835 100644
> --- a/arch/arm/mm/Kconfig
> +++ b/arch/arm/mm/Kconfig
> @@ -604,6 +604,21 @@ config CPU_USE_DOMAINS
> This option enables or disables the use of domain switching
> via the set_fs() function.
>
> +config CPUV7M_NUM_IRQ
> + int "Number of external interrupts connected to the NVIC"
> + depends on CPU_V7M
> + default 90 if ARCH_STM32
> + default 38 if ARCH_EFM32
> + default 240
> + help
> + This option indicates the number of interrupts connected to the NVIC.
> + The value can be larger than the real number of interrupts supported
> + by the system, but must not be lower.
> + The default value is 240, corresponding to the maximum number of
> + interrupts supported by the NVIC on Cortex-M family.
> +
> + If unsure, keep default value.
> +
> #
> # CPU supports 36-bit I/O
> #
I sent a patch which extended that vector table some weeks ago:
https://lkml.org/lkml/2014/12/29/296
But your solution is definitely more flexible, and given that we deal
with small devices here, it's worth saving memory.
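As a rough illustration of where the two thresholds in the hunk come from, the sketch below computes the table size and the matching `.align` argument for a given interrupt count. It assumes the table has to be aligned to its size rounded up to a power of two, and that each entry is a 4-byte `.long`; the function names are made up for this example.

```c
#include <stddef.h>

#define V7M_NUM_EXCEPTIONS 16	/* entries 0..15 precede the external IRQs */
#define V7M_ENTRY_BYTES    4	/* each .long in the table is 4 bytes */

/* Size in bytes of a vector table with n_irq external interrupt entries. */
size_t v7m_vector_table_bytes(unsigned int n_irq)
{
	return (V7M_NUM_EXCEPTIONS + (size_t)n_irq) * V7M_ENTRY_BYTES;
}

/* Smallest shift with (1 << shift) >= table size: the power-of-two .align value. */
unsigned int v7m_vector_table_align(unsigned int n_irq)
{
	size_t bytes = v7m_vector_table_bytes(n_irq);
	unsigned int shift = 0;

	while (((size_t)1 << shift) < bytes)
		shift++;
	return shift;
}
```

With 112 interrupts the table is 128 words (512 bytes), the largest size that still fits `.align 9`; anything above that up to 240 interrupts needs `.align 10`. The old 64-word table corresponds to `.align 8`.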
Acked-by: Stefan Agner <stefan@agner.ch>
* Re: [PATCH v5 tip 0/7] tracing: attach eBPF programs to kprobes
From: Alexei Starovoitov @ 2015-03-08 0:21 UTC (permalink / raw)
To: Steven Rostedt, Ingo Molnar
Cc: Namhyung Kim, Arnaldo Carvalho de Melo, Jiri Olsa,
Masami Hiramatsu, David S. Miller, Daniel Borkmann,
Peter Zijlstra, Linux API, Network Development, LKML
In-Reply-To: <20150306200938.6a6387c0@gandalf.local.home>
On 3/6/15 5:09 PM, Steven Rostedt wrote:
> On Wed, 4 Mar 2015 15:48:24 -0500
> Steven Rostedt <rostedt@goodmis.org> wrote:
>
>> On Wed, 4 Mar 2015 21:33:16 +0100
>> Ingo Molnar <mingo@kernel.org> wrote:
>>
>>>
>>> * Alexei Starovoitov <ast@plumgrid.com> wrote:
>>>
>>>> On Sun, Mar 1, 2015 at 3:27 PM, Alexei Starovoitov <ast@plumgrid.com> wrote:
>>>>> Peter, Steven,
>>>>> I think this set addresses everything we've discussed.
>>>>> Please review/ack. Thanks!
>>>>
>>>> icmp echo request
>>>
>>> I'd really like to have an Acked-by from Steve (propagated into the
>>> changelogs) before looking at applying these patches.
>>
>> I'll have to look at this tomorrow. I'm a bit swamped with other things
>> at the moment :-/
>>
>
> Just an update. I started looking at it but then was pulled off to do
> other things. I'll make this a priority next week. Sorry for the delay.
There is no rush. Please let me know if I need to clarify anything.
One thing I just caught, which I'm planning to address in a follow-on
patch, is a missing 'recursion check'. Attaching programs to kprobes
means that root may create loops by adding a kprobe somewhere in
the call chain invoked from a bpf program. So far I'm thinking of doing a
simple stack_trace_call()-like check. I don't think it's a blocker
for this set, but if I'm done coding the recursion check soon, I'll just
roll it in and respin this set :)
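A recursion check of this kind could be as small as a guard flag around the program invocation. The following is only a userspace sketch under that assumption: it uses a thread-local flag where kernel code would more likely use a per-CPU one, and all names (`demo_prog_active`, `demo_run_guarded`, `demo_recursive_prog`) are hypothetical.

```c
/* Hypothetical guard; a kernel version would more likely be per-CPU. */
static _Thread_local int demo_prog_active;

typedef int (*demo_prog_fn)(void *ctx);

/*
 * Run prog unless one is already running on this thread.
 * Returns the program's result, or 0 when the call was skipped.
 */
int demo_run_guarded(demo_prog_fn prog, void *ctx)
{
	int ret;

	if (demo_prog_active)
		return 0;	/* re-entered from within a program: skip */

	demo_prog_active = 1;
	ret = prog(ctx);
	demo_prog_active = 0;
	return ret;
}

/* Example program that re-triggers itself, as a probe in its own call chain would. */
int demo_recursive_prog(void *ctx)
{
	int *depth = ctx;

	(*depth)++;
	demo_run_guarded(demo_recursive_prog, ctx);	/* skipped by the guard */
	return *depth;
}
```

The nested invocation is dropped instead of looping, at the cost of missing events that fire while a program is already running.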
* Re: [PATCH v0 01/11] stm class: Introduce an abstraction for System Trace Module devices
From: Paul Bolle @ 2015-03-07 22:26 UTC (permalink / raw)
To: Alexander Shishkin
Cc: Greg Kroah-Hartman, linux-kernel, Pratik Patel, mathieu.poirier,
peter.lachner, norbert.schulz, keven.boell, yann.fouassier,
laurent.fert, linux-api
In-Reply-To: <1425728161-164217-2-git-send-email-alexander.shishkin@linux.intel.com>
On Sat, 2015-03-07 at 13:35 +0200, Alexander Shishkin wrote:
> Documentation/ABI/testing/configfs-stp-policy | 44 ++
git am whined about this file when I tried to apply this patch:
Applying: stm class: Introduce an abstraction for System Trace Module devices
[...]/.git/rebase-apply/patch:77: new blank line at EOF.
> Documentation/ABI/testing/sysfs-class-stm | 14 +
> Documentation/ABI/testing/sysfs-class-stm_source | 11 +
> Documentation/trace/stm.txt | 77 +++
> drivers/Kconfig | 2 +
> drivers/Makefile | 1 +
> drivers/stm/Kconfig | 8 +
> drivers/stm/Makefile | 3 +
> drivers/stm/core.c | 839 +++++++++++++++++++++++
> drivers/stm/policy.c | 470 +++++++++++++
> drivers/stm/stm.h | 77 +++
> include/linux/stm.h | 87 +++
> include/uapi/linux/stm.h | 47 ++
> --- /dev/null
> +++ b/drivers/stm/Kconfig
> @@ -0,0 +1,8 @@
> +config STM
> + tristate "System Trace Module devices"
> + help
> + A System Trace Module (STM) is a device exporting data in System
> + Trace Protocol (STP) format as defined by MIPI STP standards.
> + Examples of such devices are Intel Trace Hub and Coresight STM.
> +
> + Say Y here to enable System Trace Module device support.
> diff --git a/drivers/stm/Makefile b/drivers/stm/Makefile
> new file mode 100644
> index 0000000000..adec701649
> --- /dev/null
> +++ b/drivers/stm/Makefile
> @@ -0,0 +1,3 @@
> +obj-$(CONFIG_STM) += stm_core.o
> +
> +stm_core-y := core.o policy.o
I tried to compile this as a module:
$ make -C ../.. M=$PWD CONFIG_STM=m stm_core.ko
make: Entering directory `[...]'
LD [M] [...]/drivers/stm/stm_core.o
[...]/drivers/stm/policy.o: In function `stp_configfs_init':
policy.c:(.text+0x5f0): multiple definition of `init_module'
[...]/drivers/stm/core.o:core.c:(.init.text+0x0): first defined here
make[1]: *** [[...]/drivers/stm/stm_core.o] Error 1
make: *** [stm_core.ko] Error 2
make: Leaving directory `[...]'
I think that's because
postcore_initcall(stm_core_init);
in core.c becomes
module_init(stm_core_init);
if this driver is compiled as a module. And that will clash with
module_init(stp_configfs_init);
in policy.c. Am I missing something obvious or should STM not be a
tristate symbol?
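If the symbol stays tristate, one possible way around the clash is to have a single init entry point that calls the other. The sketch below only illustrates that shape in plain C with stub functions; the `demo_*` names and the failure switch are invented for the example and do not come from the patch.

```c
/* Stubs standing in for the two init functions named above. */
static int demo_core_up, demo_policy_up;
int demo_fail_policy;	/* set nonzero to simulate a configfs failure */

static int demo_stm_core_init(void) { demo_core_up = 1; return 0; }
static void demo_stm_core_exit(void) { demo_core_up = 0; }

static int demo_stp_configfs_init(void)
{
	if (demo_fail_policy)
		return -1;
	demo_policy_up = 1;
	return 0;
}

/* Single entry point: the only function that would be handed to an initcall macro. */
int demo_stm_init(void)
{
	int err = demo_stm_core_init();

	if (err)
		return err;

	err = demo_stp_configfs_init();
	if (err)
		demo_stm_core_exit();	/* unwind so a failed load leaves nothing behind */
	return err;
}

/* Nonzero only when both halves came up. */
int demo_stm_is_up(void)
{
	return demo_core_up && demo_policy_up;
}
```

With one entry point there is only one definition of the module's init symbol, whichever way the driver is built.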
Paul Bolle
* Re: [PATCH] capabilities: Ambient capability set V2
From: Serge E. Hallyn @ 2015-03-07 21:35 UTC (permalink / raw)
To: Christoph Lameter
Cc: Serge E. Hallyn, Andy Lutomirski, Serge Hallyn, Jonathan Corbet,
Aaron Jones, LSM List,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
Andrew Morton, Andrew G. Morgan, Mimi Zohar, Austin S Hemmelgarn,
Markku Savela, Jarkko Sakkinen, Linux API, Michael Kerrisk
In-Reply-To: <alpine.DEB.2.11.1503070907330.15173-gkYfJU5Cukgdnm+yROfE0A@public.gmane.org>
On Sat, Mar 07, 2015 at 09:09:05AM -0600, Christoph Lameter wrote:
> On Fri, 6 Mar 2015, Serge E. Hallyn wrote:
>
> > > I think that's right. fI doesn't set pI.
> >
> > Right. The idea is that for the running binary to get capability x in its
> > pP, its privileged ancestor must have set x in pI, and the binary itself
> > must be trusted with x in fI.
>
> The ancestor here is ambient_test and when it is run pI will not be set
> despite the cap setting.
ambient_test is supposed to set it.
> Therefore anything it spawns cannot have the inheritance bits set either.
> This plainly does not make any sense whatsoever. If this is so as it seems
> to be then we should be able to remove the inheritance bits because they
> have no effect.
>
* Re: [PATCH] capabilities: Ambient capability set V2
From: Serge E. Hallyn @ 2015-03-07 21:35 UTC (permalink / raw)
To: Christoph Lameter
Cc: Andy Lutomirski, Serge E. Hallyn, Serge Hallyn, Jonathan Corbet,
Aaron Jones, LSM List, linux-kernel@vger.kernel.org,
Andrew Morton, Andrew G. Morgan, Mimi Zohar, Austin S Hemmelgarn,
Markku Savela, Jarkko Sakkinen, Linux API, Michael Kerrisk
In-Reply-To: <alpine.DEB.2.11.1503070906130.15173@gentwo.org>
On Sat, Mar 07, 2015 at 09:06:46AM -0600, Christoph Lameter wrote:
> On Fri, 6 Mar 2015, Andy Lutomirski wrote:
>
> > > christoph@fujitsu-haswell:~$ getcap ambient_test
> > >
> > > ambient_test = cap_setpcap,cap_net_admin,cap_net_raw,cap_sys_nice+eip
> >
> > I think that's right. fI doesn't set pI.
>
> Ok then what is the point of pI if it cannot be set?
It can be set! Anything with CAP_SETPCAP can fill its pI. When
it and its children exec(), pI' = pI.
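For reference, the exec() rules being discussed can be written out as a small sketch. This models only the pre-ambient rules for a non-root process as documented in capabilities(7); the struct and function names are made up for illustration, and the special handling of uid 0 is left out.

```c
#include <stdint.h>

/* One bit per capability; the bit positions used by callers are illustrative. */
typedef uint64_t capset_t;

struct proc_caps {
	capset_t inheritable;	/* pI */
	capset_t permitted;	/* pP */
	capset_t effective;	/* pE */
	capset_t bounding;	/* X  */
};

struct file_caps {
	capset_t inheritable;	/* fI */
	capset_t permitted;	/* fP */
	int effective;		/* fE: a single flag */
};

/* Capability sets after exec(), without any ambient set. */
struct proc_caps caps_after_exec(struct proc_caps p, struct file_caps f)
{
	struct proc_caps n;

	n.inheritable = p.inheritable;			/* pI' = pI */
	n.permitted = (p.inheritable & f.inheritable) |
		      (f.permitted & p.bounding);	/* pP' = (pI & fI) | (fP & X) */
	n.effective = f.effective ? n.permitted : 0;	/* pE' = fE ? pP' : 0 */
	n.bounding = p.bounding;
	return n;
}
```

Note that fI appears only on the right-hand side of pP': it never feeds into pI', which is why setting fI on a binary does not by itself populate the process's inheritable set.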
* Re: [PATCH] capabilities: Ambient capability set V2
From: Christoph Lameter @ 2015-03-07 15:09 UTC (permalink / raw)
To: Serge E. Hallyn
Cc: Andy Lutomirski, Serge Hallyn, Jonathan Corbet, Aaron Jones,
LSM List, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
Andrew Morton, Andrew G. Morgan, Mimi Zohar, Austin S Hemmelgarn,
Markku Savela, Jarkko Sakkinen, Linux API, Michael Kerrisk
In-Reply-To: <20150306200838.GA29198-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
On Fri, 6 Mar 2015, Serge E. Hallyn wrote:
> > I think that's right. fI doesn't set pI.
>
> Right. The idea is that for the running binary to get capability x in its
> pP, its privileged ancestor must have set x in pI, and the binary itself
> must be trusted with x in fI.
The ancestor here is ambient_test and when it is run pI will not be set
despite the cap setting.
Therefore anything it spawns cannot have the inheritance bits set either.
This plainly does not make any sense whatsoever. If this is so as it seems
to be then we should be able to remove the inheritance bits because they
have no effect.
* Re: [PATCH] capabilities: Ambient capability set V2
From: Christoph Lameter @ 2015-03-07 15:06 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Serge E. Hallyn, Serge Hallyn, Jonathan Corbet, Aaron Jones,
LSM List, linux-kernel@vger.kernel.org, Andrew Morton,
Andrew G. Morgan, Mimi Zohar, Austin S Hemmelgarn, Markku Savela,
Jarkko Sakkinen, Linux API, Michael Kerrisk
In-Reply-To: <CALCETrVQF22rkZFD8VAW_xrVQOjwpej6W4TJS9gbN9B431TEKg@mail.gmail.com>
On Fri, 6 Mar 2015, Andy Lutomirski wrote:
> > christoph@fujitsu-haswell:~$ getcap ambient_test
> >
> > ambient_test = cap_setpcap,cap_net_admin,cap_net_raw,cap_sys_nice+eip
>
> I think that's right. fI doesn't set pI.
Ok then what is the point of pI if it cannot be set?
* Re: [PATCH v1 1/6] eeprom: Add a simple EEPROM framework for eeprom providers
From: Mark Brown @ 2015-03-07 15:00 UTC (permalink / raw)
To: Srinivas Kandagatla
Cc: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Maxime Ripard,
Rob Herring, Pawel Moll, Kumar Gala,
linux-api-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
devicetree-u79uwXL29TY76Z2rM5mHXA, Stephen Boyd,
andrew-g2DYL2Zd6BY, Arnd Bergmann, Greg Kroah-Hartman
In-Reply-To: <1425548741-12930-1-git-send-email-srinivas.kandagatla-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
On Thu, Mar 05, 2015 at 09:45:41AM +0000, Srinivas Kandagatla wrote:
> +
> + return eeprom;
> +}
> +EXPORT_SYMBOL(eeprom_register);
This framework uses regmap but regmap is EXPORT_SYMBOL_GPL() and this is
using EXPORT_SYMBOL().
> +int eeprom_unregister(struct eeprom_device *eeprom)
> +{
> + mutex_lock(&eeprom_mutex);
> + if (atomic_read(&eeprom->users)) {
> + mutex_unlock(&eeprom_mutex);
Atomic reads and a mutex - isn't the mutex enough? Atomics are
generally a recipe for bugs due to the complexity in using them.
* Re: [PATCH v3 0/3] epoll: introduce round robin wakeup mode
From: Jason Baron @ 2015-03-07 12:35 UTC (permalink / raw)
To: Ingo Molnar
Cc: Andrew Morton, peterz-wEGCiKHe2LqWVfeAwA7xHQ,
mingo-H+wXaHxf7aLQT0dZR+AlfA,
viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn, normalperson-rMlxZR9MS24,
davidel-AhlLAIvw+VEjIGhXcJzhZg,
mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w, luto-kltTT9wpgjJwATOyAt5JVQ,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
linux-api-u79uwXL29TY76Z2rM5mHXA, Linus Torvalds, Alexander Viro
In-Reply-To: <20150305091517.GA25158-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
On 03/05/2015 04:15 AM, Ingo Molnar wrote:
> * Jason Baron <jbaron-JqFfY2XvxFXQT0dZR+AlfA@public.gmane.org> wrote:
>
>> 2) We are using the wakeup in this case to 'assign' work more
>> permanently to the thread. That is, in the case of a listen socket
>> we then add the connected socket to the woken-up thread's local set
>> of epoll events. So the load persists past the wake up. And in this
>> case, doing the round robin wakeups, simply allows us to access more
>> cpu bandwidth. (I'm also looking into potentially using cpu affinity
>> to do the wakeups as well as you suggested.)
> So this is the part that I still don't understand.
Here's maybe another way to frame this. Epoll sets add
a waiter on the wait queue in a fixed order when epoll sets
are added (via EPOLL_CTL_ADD). This order does not change
modulo adds/dels which are usually not common. So if
we don't want to wake all threads, when say an interrupt
occurs at some random point, we can either:
1) Walk the list, wake up the first epoll set that has idle
threads (queued via epoll_wait()) and return.
or:
2) Walk the list and wake up the first epoll set that has idle
threads, but then 'rotate' or move this epoll set to the tail
of the queue before returning.
So because the epoll sets are in a fixed order there is
an extreme bias to pick the same epoll sets over and over
regardless of the order in which threads return to wait
via epoll_wait(). So I think the rotate makes sense for
the case where I am trying to assign work to threads that
may persist past the wake up point, and for cases where
the threads can finish all their work before returning
back to epoll_wait().
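The difference between the two options can be shown with a toy queue. This is a simplified userspace sketch, not the epoll code: the queue is a fixed array, each entry stands for one epoll set, and the names (`demo_queue`, `demo_wake_first`, `demo_wake_rotate`) are invented for the example.

```c
#include <stddef.h>

#define DEMO_MAX_SETS 8

/* Hypothetical wait queue of epoll sets, head at index 0. */
struct demo_queue {
	int ids[DEMO_MAX_SETS];		/* epoll set identifiers in queue order */
	int idle[DEMO_MAX_SETS];	/* nonzero if that set has a thread in epoll_wait() */
	size_t len;
};

/* Option 1: wake the first set with idle threads; queue order is unchanged. */
int demo_wake_first(struct demo_queue *q)
{
	for (size_t i = 0; i < q->len; i++)
		if (q->idle[i])
			return q->ids[i];
	return -1;
}

/* Option 2: same pick, but move the woken set to the tail before returning. */
int demo_wake_rotate(struct demo_queue *q)
{
	for (size_t i = 0; i < q->len; i++) {
		if (!q->idle[i])
			continue;

		int id = q->ids[i];
		int idle = q->idle[i];

		/* Shift the entries behind it forward by one slot. */
		for (size_t j = i; j + 1 < q->len; j++) {
			q->ids[j] = q->ids[j + 1];
			q->idle[j] = q->idle[j + 1];
		}
		q->ids[q->len - 1] = id;
		q->idle[q->len - 1] = idle;
		return id;
	}
	return -1;
}
```

With three idle sets queued as 10, 11, 12, repeated calls to `demo_wake_first()` keep returning 10, while repeated calls to `demo_wake_rotate()` cycle through 10, 11, 12 and back to 10.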
Thanks,
-Jason
* [PATCH v0 01/11] stm class: Introduce an abstraction for System Trace Module devices
From: Alexander Shishkin @ 2015-03-07 11:35 UTC (permalink / raw)
To: Greg Kroah-Hartman
Cc: linux-kernel, Pratik Patel, mathieu.poirier, peter.lachner,
norbert.schulz, keven.boell, yann.fouassier, laurent.fert,
Alexander Shishkin, linux-api
In-Reply-To: <1425728161-164217-1-git-send-email-alexander.shishkin@linux.intel.com>
A System Trace Module (STM) is a device exporting data in System Trace
Protocol (STP) format as defined by MIPI STP standards. Examples of such
devices are Intel Trace Hub and Coresight STM.
This abstraction provides a unified interface for software trace sources
to send their data over an STM device to a debug host. In order to do
that, such a trace source needs to be assigned a pair of master/channel
identifiers that all the data from this source will be tagged with. The
STP decoder on the debug host side will use these master/channel tags to
distinguish different trace streams from one another inside one STP
stream.
This abstraction provides a configfs-based policy management mechanism
for dynamic allocation of these master/channel pairs based on trace
source-supplied string identifier. It has the flexibility of being
defined at runtime and at the same time (provided that the policy
definition is aligned with the decoding end) consistency.
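To give a feel for the allocation step, here is a simplified userspace sketch of picking a block of channels out of a policy range. It assumes a single master with at most 64 channels tracked in one 64-bit word, and that a block of `width` channels (a power of two) must start on a multiple of `width`; the function name is hypothetical and this is not the code from core.c.

```c
#include <stdint.h>

/*
 * Find `width` consecutive free channels in [start, end], aligned to `width`
 * (a nonzero power of two, at most 64). Bit n of `used` is set when channel n
 * is taken. Returns the first channel of the block, or -1 if none is free.
 */
int demo_find_free_channels(uint64_t used, unsigned int start,
			    unsigned int end, unsigned int width)
{
	/* Round start up to the next multiple of width. */
	unsigned int pos = (start + width - 1) & ~(width - 1);

	for (; pos + width <= end + 1 && pos + width <= 64; pos += width) {
		/* Mask covering channels pos .. pos + width - 1. */
		uint64_t mask = (width == 64) ? ~UINT64_C(0)
					      : ((UINT64_C(1) << width) - 1) << pos;

		if (!(used & mask))
			return (int)pos;
	}
	return -1;
}
```

Stepping by `width` on every iteration keeps the search aligned and guarantees it terminates even when a candidate block is only partly free.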
For userspace trace sources, this abstraction provides write()-based and
mmap()-based (if the underlying stm device allows this) output mechanism.
For kernel-side trace sources, we provide "stm_source" device class that
can be connected to an stm device at run time.
Cc: linux-api@vger.kernel.org
Cc: Pratik Patel <pratikp@codeaurora.org>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
---
Documentation/ABI/testing/configfs-stp-policy | 44 ++
Documentation/ABI/testing/sysfs-class-stm | 14 +
Documentation/ABI/testing/sysfs-class-stm_source | 11 +
Documentation/trace/stm.txt | 77 +++
drivers/Kconfig | 2 +
drivers/Makefile | 1 +
drivers/stm/Kconfig | 8 +
drivers/stm/Makefile | 3 +
drivers/stm/core.c | 839 +++++++++++++++++++++++
drivers/stm/policy.c | 470 +++++++++++++
drivers/stm/stm.h | 77 +++
include/linux/stm.h | 87 +++
include/uapi/linux/stm.h | 47 ++
13 files changed, 1680 insertions(+)
create mode 100644 Documentation/ABI/testing/configfs-stp-policy
create mode 100644 Documentation/ABI/testing/sysfs-class-stm
create mode 100644 Documentation/ABI/testing/sysfs-class-stm_source
create mode 100644 Documentation/trace/stm.txt
create mode 100644 drivers/stm/Kconfig
create mode 100644 drivers/stm/Makefile
create mode 100644 drivers/stm/core.c
create mode 100644 drivers/stm/policy.c
create mode 100644 drivers/stm/stm.h
create mode 100644 include/linux/stm.h
create mode 100644 include/uapi/linux/stm.h
diff --git a/Documentation/ABI/testing/configfs-stp-policy b/Documentation/ABI/testing/configfs-stp-policy
new file mode 100644
index 0000000000..1c7ab3dbcd
--- /dev/null
+++ b/Documentation/ABI/testing/configfs-stp-policy
@@ -0,0 +1,44 @@
+What: /config/stp-policy
+Date: Jan 2015
+KernelVersion: 3.20
+Description:
+ This group contains policies mandating Master/Channel allocation
+ for software sources wishing to send trace data over an STM
+ device.
+
+What: /config/stp-policy/<policy>
+Date: Jan 2015
+KernelVersion: 3.20
+Description:
+ Root of a policy. This name is an arbitrary string.
+
+What: /config/stp-policy/<policy>/device
+Date: Jan 2015
+KernelVersion: 3.20
+Description:
+ STM device to which this policy applies. Write a valid stm class
+ device name here to assign this policy to that device.
+
+What: /config/stp-policy/<policy>/<node>
+Date: Jan 2015
+KernelVersion: 3.20
+Description:
+ Policy node is a string identifier that software clients will
+ use to request a master/channel to be allocated and assigned to
+ them.
+
+What: /config/stp-policy/<policy>/<node>/masters
+Date: Jan 2015
+KernelVersion: 3.20
+Description:
+ Range of masters from which to allocate for users of this node.
+ Write two numbers: the first master and the last master number.
+
+What: /config/stp-policy/<policy>/<node>/channels
+Date: Jan 2015
+KernelVersion: 3.20
+Description:
+ Range of channels from which to allocate for users of this node.
+ Write two numbers: the first channel and the last channel
+ number.
+
diff --git a/Documentation/ABI/testing/sysfs-class-stm b/Documentation/ABI/testing/sysfs-class-stm
new file mode 100644
index 0000000000..186f8e66e1
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-class-stm
@@ -0,0 +1,14 @@
+What: /sys/class/stm/<stm>/masters
+Date: Jan 2015
+KernelVersion: 3.20
+Contact: Alexander Shishkin <alexander.shishkin@linux.intel.com>
+Description:
+ Shows first and last available to software master numbers on
+ this STM device.
+
+What: /sys/class/stm/<stm>/channels
+Date: Jan 2015
+KernelVersion: 3.20
+Contact: Alexander Shishkin <alexander.shishkin@linux.intel.com>
+Description:
+ Shows the number of channels per master on this STM device.
diff --git a/Documentation/ABI/testing/sysfs-class-stm_source b/Documentation/ABI/testing/sysfs-class-stm_source
new file mode 100644
index 0000000000..735b0d657f
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-class-stm_source
@@ -0,0 +1,11 @@
+What: /sys/class/stm_source/<stm_source>/stm_source_link
+Date: Jan 2015
+KernelVersion: 3.20
+Contact: Alexander Shishkin <alexander.shishkin@linux.intel.com>
+Description:
+ stm_source device linkage to stm device, where its tracing data
+ is directed. Reads return an existing connection or "<none>" if
+ this stm_source is not connected to any stm device yet.
+ Write an existing (registered) stm device's name here to
+ connect that device. If a device is already connected to this
+ stm_source, it will first be disconnected.
diff --git a/Documentation/trace/stm.txt b/Documentation/trace/stm.txt
new file mode 100644
index 0000000000..0ba5c9115c
--- /dev/null
+++ b/Documentation/trace/stm.txt
@@ -0,0 +1,77 @@
+System Trace Module
+===================
+
+System Trace Module (STM) is a device described in MIPI STP specs as
+STP trace stream generator. STP (System Trace Protocol) is a trace
+protocol multiplexing data from multiple trace sources, each one of
+which is assigned a unique pair of master and channel. While some of
+these masters and channels are statically allocated to certain
+hardware trace sources, others are available to software. Software
+trace sources are usually free to pick for themselves any
+master/channel combination from this pool.
+
+On the receiving end of this STP stream (the decoder side), trace
+sources can only be identified by master/channel combination, so in
+order for the decoder to be able to make sense of the trace that
+involves multiple trace sources, it needs to be able to map those
+master/channel pairs to the trace sources that it understands.
+
+For instance, it is helpful to know that syslog messages come on
+master 7 channel 15, while arbitrary user applications can use masters
+48 to 63 and channels 0 to 127.
+
+To solve this mapping problem, stm class provides a policy management
+mechanism via configfs, that allows defining rules that map string
+identifiers to ranges of masters and channels. If these rules (policy)
+are consistent with what decoder expects, it will be able to properly
+process the trace data.
+
+This policy is a tree structure containing rules (policy_node) that
+have a name (string identifier) and a range of masters and channels
+associated with it, located in "stp-policy" subsystem directory in
+configfs. From the example above, a rule may look like this:
+
+$ ls /config/stp-policy/my-policy/user
+channels masters
+$ cat /config/stp-policy/my-policy/user/masters
+48 63
+$ cat /config/stp-policy/my-policy/user/channels
+0 127
+
+which means that the master allocation pool for this rule consists of
+masters 48 through 63 and channel allocation pool has channels 0
+through 127 in it. Now, any producer (trace source) identifying itself
+with "user" identification string will be allocated a master and
+channel from within these ranges.
+
+These rules can be nested, for example, one can define a rule "dummy"
+under "user" directory from the example above and this new rule will
+be used for trace sources with the id string of "user/dummy".
+
+Trace sources have to open the stm class device's node and write their
+trace data into its file descriptor. In order to identify themselves
+to the policy, they need to do a STP_POLICY_ID_SET ioctl on this file
+descriptor providing their id string. Otherwise, they will be
+automatically allocated a master/channel pair upon first write to this
+file descriptor according to the "default" rule of the policy, if such
+exists.
+
+Some STM devices may allow direct mapping of the channel mmio regions
+to userspace for zero-copy writing. One mappable page (in terms of
+mmu) will usually contain multiple channels' mmios, so the user will
+need to allocate that many channels to themselves (via the
+aforementioned ioctl() call) to be able to do this. That is, if your
+stm device's channel mmio region is 64 bytes and hardware page size is
+4096 bytes, after a successful STP_POLICY_ID_SET ioctl() call with
+width==64, you should be able to mmap() one page on this file
+descriptor and obtain direct access to an mmio region for 64 channels.
+
+For kernel-based trace sources, there is "stm_source" device
+class. Devices of this class can be connected and disconnected to/from
+stm devices at runtime via a sysfs attribute.
+
+Examples of STM devices are Intel Trace Hub [1] and Coresight STM
+[2].
+
+[1] https://software.intel.com/sites/default/files/managed/d3/3c/intel-th-developer-manual.pdf
+[2] http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0444b/index.html
diff --git a/drivers/Kconfig b/drivers/Kconfig
index c0cc96bab9..7bc80670bb 100644
--- a/drivers/Kconfig
+++ b/drivers/Kconfig
@@ -182,4 +182,6 @@ source "drivers/thunderbolt/Kconfig"
source "drivers/android/Kconfig"
+source "drivers/stm/Kconfig"
+
endmenu
diff --git a/drivers/Makefile b/drivers/Makefile
index 527a6da8d5..2d511b411a 100644
--- a/drivers/Makefile
+++ b/drivers/Makefile
@@ -165,3 +165,4 @@ obj-$(CONFIG_RAS) += ras/
obj-$(CONFIG_THUNDERBOLT) += thunderbolt/
obj-$(CONFIG_CORESIGHT) += coresight/
obj-$(CONFIG_ANDROID) += android/
+obj-$(CONFIG_STM) += stm/
diff --git a/drivers/stm/Kconfig b/drivers/stm/Kconfig
new file mode 100644
index 0000000000..90ed327461
--- /dev/null
+++ b/drivers/stm/Kconfig
@@ -0,0 +1,8 @@
+config STM
+ tristate "System Trace Module devices"
+ help
+ A System Trace Module (STM) is a device exporting data in System
+ Trace Protocol (STP) format as defined by MIPI STP standards.
+ Examples of such devices are Intel Trace Hub and Coresight STM.
+
+ Say Y here to enable System Trace Module device support.
diff --git a/drivers/stm/Makefile b/drivers/stm/Makefile
new file mode 100644
index 0000000000..adec701649
--- /dev/null
+++ b/drivers/stm/Makefile
@@ -0,0 +1,3 @@
+obj-$(CONFIG_STM) += stm_core.o
+
+stm_core-y := core.o policy.o
diff --git a/drivers/stm/core.c b/drivers/stm/core.c
new file mode 100644
index 0000000000..ba0ce55b0a
--- /dev/null
+++ b/drivers/stm/core.c
@@ -0,0 +1,839 @@
+/*
+ * System Trace Module (STM) infrastructure
+ * Copyright (c) 2014, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * STM class implements generic infrastructure for System Trace Module devices
+ * as defined in MIPI STPv2 specification.
+ */
+
+#include <linux/uaccess.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/device.h>
+#include <linux/compat.h>
+#include <linux/kdev_t.h>
+#include <linux/srcu.h>
+#include <linux/slab.h>
+#include <linux/stm.h>
+#include <linux/fs.h>
+#include <linux/mm.h>
+#include "stm.h"
+
+#include <uapi/linux/stm.h>
+
+static unsigned int stm_core_up;
+
+/*
+ * The SRCU here makes sure that STM device doesn't disappear from under a
+ * stm_source_write() caller, which may want to have as little overhead as
+ * possible.
+ */
+static struct srcu_struct stm_source_srcu;
+
+static ssize_t masters_show(struct device *dev,
+ struct device_attribute *attr,
+ char *buf)
+{
+ struct stm_device *stm = dev_get_drvdata(dev);
+ int ret;
+
+ ret = sprintf(buf, "%u %u\n", stm->data->sw_start, stm->data->sw_end);
+
+ return ret;
+}
+
+static DEVICE_ATTR_RO(masters);
+
+static ssize_t channels_show(struct device *dev,
+ struct device_attribute *attr,
+ char *buf)
+{
+ struct stm_device *stm = dev_get_drvdata(dev);
+ int ret;
+
+ ret = sprintf(buf, "%u\n", stm->data->sw_nchannels);
+
+ return ret;
+}
+
+static DEVICE_ATTR_RO(channels);
+
+static struct attribute *stm_attrs[] = {
+ &dev_attr_masters.attr,
+ &dev_attr_channels.attr,
+ NULL,
+};
+
+static const struct attribute_group stm_group = {
+ .attrs = stm_attrs,
+};
+
+static const struct attribute_group *stm_groups[] = {
+ &stm_group,
+ NULL,
+};
+
+static struct class stm_class = {
+ .name = "stm",
+ .dev_groups = stm_groups,
+};
+
+static int stm_dev_match(struct device *dev, const void *data)
+{
+ const char *name = data;
+
+ return sysfs_streq(name, dev_name(dev));
+}
+
+/**
+ * stm_find_device() - find stm device by name
+ * @buf: character buffer containing the name
+ * @len: length of the name in @buf
+ *
+ * This is called from attributes' store methods, so it will
+ * also trim the trailing newline if necessary.
+ *
+ * Return: device pointer or null if lookup failed.
+ */
+struct device *stm_find_device(const char *buf, size_t len)
+{
+ if (!stm_core_up)
+ return NULL;
+
+ return class_find_device(&stm_class, NULL, buf, stm_dev_match);
+}
+
+#define __stm_master(_s, _m) \
+ ((_s)->masters[(_m) - (_s)->data->sw_start])
+
+static inline struct stp_master *
+stm_master(struct stm_device *stm, unsigned int idx)
+{
+ if (idx < stm->data->sw_start || idx > stm->data->sw_end)
+ return NULL;
+
+ return __stm_master(stm, idx);
+}
+
+static int stp_master_alloc(struct stm_device *stm, unsigned int idx)
+{
+ struct stp_master *master;
+ size_t size;
+
+ size = ALIGN(stm->data->sw_nchannels, 8) / 8;
+ size += sizeof(struct stp_master);
+ master = kzalloc(size, GFP_ATOMIC);
+ if (!master)
+ return -ENOMEM;
+
+ master->nr_free = stm->data->sw_nchannels;
+ __stm_master(stm, idx) = master;
+
+ return 0;
+}
+
+static void stp_master_free(struct stm_device *stm, unsigned int idx)
+{
+ struct stp_master *master = stm_master(stm, idx);
+
+ if (!master)
+ return;
+
+ __stm_master(stm, idx) = NULL;
+ kfree(master);
+}
+
+static void stm_output_claim(struct stm_device *stm, struct stm_output *output)
+{
+ struct stp_master *master = stm_master(stm, output->master);
+
+ if (WARN_ON_ONCE(master->nr_free < output->nr_chans))
+ return;
+
+ bitmap_allocate_region(&master->chan_map[0], output->channel,
+ ilog2(output->nr_chans));
+
+ master->nr_free -= output->nr_chans;
+}
+
+static void
+stm_output_disclaim(struct stm_device *stm, struct stm_output *output)
+{
+ struct stp_master *master = stm_master(stm, output->master);
+
+ bitmap_release_region(&master->chan_map[0], output->channel,
+ ilog2(output->nr_chans));
+
+ master->nr_free += output->nr_chans;
+}
+
+/*
+ * This is like bitmap_find_free_region(), except it can ignore @start bits
+ * at the beginning.
+ */
+static int find_free_channels(unsigned long *bitmap, unsigned int start,
+ unsigned int end, unsigned int width)
+{
+ unsigned int pos;
+ int i;
+
+ for (pos = start; pos < end + 1; pos = ALIGN(pos, width)) {
+ pos = find_next_zero_bit(bitmap, end + 1, pos);
+ if (pos + width > end + 1)
+ break;
+
+ if (pos & (width - 1))
+ continue;
+
+ for (i = 1; i < width && !test_bit(pos + i, bitmap); i++)
+ ;
+ if (i == width)
+ return pos;
+ }
+
+ return -1;
+}
+
+static unsigned int
+stm_find_master_chan(struct stm_device *stm, unsigned int width,
+ unsigned int *mstart, unsigned int mend,
+ unsigned int *cstart, unsigned int cend)
+{
+ struct stp_master *master;
+ unsigned int midx;
+ int pos, err;
+
+ for (midx = *mstart; midx <= mend; midx++) {
+ if (!stm_master(stm, midx)) {
+ err = stp_master_alloc(stm, midx);
+ if (err)
+ return err;
+ }
+
+ master = stm_master(stm, midx);
+
+ if (!master->nr_free)
+ continue;
+
+ pos = find_free_channels(master->chan_map, *cstart, cend,
+ width);
+ if (pos < 0)
+ continue;
+
+ *mstart = midx;
+ *cstart = pos;
+ return 0;
+ }
+
+ return -ENOSPC;
+}
+
+static int stm_output_assign(struct stm_device *stm, unsigned int width,
+ struct stp_policy_node *policy_node,
+ struct stm_output *output)
+{
+ unsigned int midx, cidx, mend, cend;
+ int ret = -EBUSY;
+
+ if (width > stm->data->sw_nchannels)
+ return -EINVAL;
+
+ if (policy_node) {
+ stp_policy_node_get_ranges(policy_node,
+ &midx, &mend, &cidx, &cend);
+ } else {
+ midx = stm->data->sw_start;
+ cidx = 0;
+ mend = stm->data->sw_end;
+ cend = stm->data->sw_nchannels - 1;
+ }
+
+ spin_lock(&stm->mc_lock);
+ if (output->nr_chans)
+ goto unlock;
+
+ ret = stm_find_master_chan(stm, width, &midx, mend, &cidx, cend);
+ if (ret)
+ goto unlock;
+
+ output->master = midx;
+ output->channel = cidx;
+ output->nr_chans = width;
+ stm_output_claim(stm, output);
+ dev_dbg(stm->dev, "assigned %u:%u (+%u)\n", midx, cidx, width);
+
+ ret = 0;
+unlock:
+ spin_unlock(&stm->mc_lock);
+
+ return ret;
+}
+
+static void stm_output_free(struct stm_device *stm, struct stm_output *output)
+{
+ spin_lock(&stm->mc_lock);
+ if (output->nr_chans)
+ stm_output_disclaim(stm, output);
+ spin_unlock(&stm->mc_lock);
+}
+
+static int major_match(struct device *dev, const void *data)
+{
+ unsigned int major = *(unsigned int *)data;
+
+ return MAJOR(dev->devt) == major;
+}
+
+static int stm_char_open(struct inode *inode, struct file *file)
+{
+ struct stm_file *stmf;
+ struct device *dev;
+ unsigned int major = imajor(inode);
+ int err = -ENODEV;
+
+ dev = class_find_device(&stm_class, NULL, &major, major_match);
+ if (!dev)
+ return -ENODEV;
+
+ stmf = kzalloc(sizeof(*stmf), GFP_KERNEL);
+ if (!stmf)
+ return -ENOMEM;
+
+ stmf->stm = dev_get_drvdata(dev);
+
+ if (!try_module_get(stmf->stm->owner))
+ goto err_free;
+
+ file->private_data = stmf;
+
+ return nonseekable_open(inode, file);
+
+err_free:
+ kfree(stmf);
+
+ return err;
+}
+
+static int stm_char_release(struct inode *inode, struct file *file)
+{
+ struct stm_file *stmf = file->private_data;
+
+ stm_output_free(stmf->stm, &stmf->output);
+ module_put(stmf->stm->owner);
+ kfree(stmf);
+
+ return 0;
+}
+
+static int stm_file_assign(struct stm_file *stmf, char *id, unsigned int width)
+{
+ struct stm_device *stm = stmf->stm;
+ int ret;
+
+ mutex_lock(&stm->policy_mutex);
+ if (stm->policy)
+ stmf->policy_node = stp_policy_node_lookup(stm->policy, id);
+
+ ret = stm_output_assign(stm, width, stmf->policy_node, &stmf->output);
+ mutex_unlock(&stm->policy_mutex);
+
+ return ret;
+}
+
+static ssize_t stm_char_write(struct file *file, const char __user *buf,
+ size_t count, loff_t *ppos)
+{
+ struct stm_file *stmf = file->private_data;
+ struct stm_device *stm = stmf->stm;
+ char *kbuf;
+ int err;
+
+ /*
+ * if no m/c have been assigned to this writer up to this
+ * point, use "default" policy entry
+ */
+ if (!stmf->output.nr_chans) {
+ err = stm_file_assign(stmf, "default", 1);
+ /*
+ * EBUSY means that somebody else just assigned this
+ * output, which is just fine for write()
+ */
+ if (err && err != -EBUSY)
+ return err;
+ }
+
+ kbuf = kmalloc(count + 1, GFP_KERNEL);
+ if (!kbuf)
+ return -ENOMEM;
+
+ err = copy_from_user(kbuf, buf, count);
+ if (err) {
+ kfree(kbuf);
+ return -EFAULT;
+ }
+
+ stm->data->write(stm->data, stmf->output.master,
+ stmf->output.channel, kbuf, count);
+
+
+ kfree(kbuf);
+
+ return count;
+}
+
+static int stm_char_mmap(struct file *file, struct vm_area_struct *vma)
+{
+ struct stm_file *stmf = file->private_data;
+ struct stm_device *stm = stmf->stm;
+ unsigned long size, phys;
+
+ if (!stm->data->mmio_addr)
+ return -EOPNOTSUPP;
+
+ if (vma->vm_pgoff)
+ return -EINVAL;
+
+ size = vma->vm_end - vma->vm_start;
+
+ if (stmf->output.nr_chans * stm->data->sw_mmiosz != size)
+ return -EINVAL;
+
+ phys = stm->data->mmio_addr(stm->data, stmf->output.master,
+ stmf->output.channel,
+ stmf->output.nr_chans);
+
+ if (!phys)
+ return -EINVAL;
+
+ vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
+ vma->vm_flags |= VM_IO | VM_DONTEXPAND | VM_DONTDUMP;
+ vm_iomap_memory(vma, phys, size);
+
+ return 0;
+}
+
+static int stm_char_policy_set_ioctl(struct stm_file *stmf, void __user *arg)
+{
+ struct stm_device *stm = stmf->stm;
+ struct stp_policy_id *id;
+ int ret = -EFAULT;
+ u32 size;
+
+ if (stmf->output.nr_chans)
+ return -EBUSY;
+
+ if (copy_from_user(&size, arg, sizeof(size)))
+ return -EFAULT;
+
+ if (size < sizeof(*id) || size >= PATH_MAX + sizeof(*id))
+ return -EINVAL;
+
+ id = kzalloc(size + 1, GFP_KERNEL);
+ if (!id)
+ return -ENOMEM;
+
+ if (copy_from_user(id, arg, size))
+ goto err_free;
+ ret = -EINVAL;
+ if (id->__reserved_0 || id->__reserved_1)
+ goto err_free;
+
+ if (id->width < 1 ||
+ id->width > PAGE_SIZE / stm->data->sw_mmiosz) {
+ ret = -EINVAL;
+ goto err_free;
+ }
+
+ ret = stm_file_assign(stmf, id->id, id->width);
+ if (ret)
+ goto err_free;
+
+ if (stm->data->link)
+ stm->data->link(stm->data, stmf->output.master,
+ stmf->output.channel);
+
+ ret = 0;
+
+err_free:
+ kfree(id);
+
+ return ret;
+}
+
+static int stm_char_policy_get_ioctl(struct stm_file *stmf, void __user *arg)
+{
+ struct stp_policy_id id = {
+ .size = sizeof(id),
+ .master = stmf->output.master,
+ .channel = stmf->output.channel,
+ .width = stmf->output.nr_chans,
+ .__reserved_0 = 0,
+ .__reserved_1 = 0,
+ };
+
+ return copy_to_user(arg, &id, id.size) ? -EFAULT : 0;
+}
+
+static long
+stm_char_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+{
+ struct stm_file *stmf = file->private_data;
+ int err;
+
+ switch (cmd) {
+ case STP_POLICY_ID_SET:
+ err = stm_char_policy_set_ioctl(stmf, (void __user *)arg);
+ if (err)
+ return err;
+
+ return stm_char_policy_get_ioctl(stmf, (void __user *)arg);
+
+ case STP_POLICY_ID_GET:
+ return stm_char_policy_get_ioctl(stmf, (void __user *)arg);
+
+ default:
+ return -ENOTTY;
+ }
+
+ return 0;
+}
+
+#ifdef CONFIG_COMPAT
+static long
+stm_char_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+{
+ return stm_char_ioctl(file, cmd, (unsigned long)compat_ptr(arg));
+}
+#else
+#define stm_char_compat_ioctl NULL
+#endif
+
+static const struct file_operations stm_fops = {
+ .open = stm_char_open,
+ .release = stm_char_release,
+ .write = stm_char_write,
+ .mmap = stm_char_mmap,
+ .unlocked_ioctl = stm_char_ioctl,
+ .compat_ioctl = stm_char_compat_ioctl,
+ .llseek = no_llseek,
+};
+
+int stm_register_device(struct device *parent, struct stm_data *stm_data,
+ struct module *owner)
+{
+ struct stm_device *stm;
+ struct device *dev;
+ unsigned int nmasters;
+ int err = -ENOMEM;
+
+ if (!stm_core_up)
+ return -EPROBE_DEFER;
+
+ if (!stm_data->write || !stm_data->sw_nchannels)
+ return -EINVAL;
+
+ nmasters = stm_data->sw_end - stm_data->sw_start + 1;
+ stm = kzalloc(sizeof(*stm) + nmasters * sizeof(void *), GFP_KERNEL);
+ if (!stm)
+ return -ENOMEM;
+
+ stm->major = register_chrdev(0, stm_data->name, &stm_fops);
+ if (stm->major < 0)
+ goto err_free;
+
+ dev = device_create(&stm_class, parent, MKDEV(stm->major, 0), NULL,
+ "%s", stm_data->name);
+ if (IS_ERR(dev)) {
+ err = PTR_ERR(dev);
+ goto err_device;
+ }
+
+ spin_lock_init(&stm->link_lock);
+ INIT_LIST_HEAD(&stm->link_list);
+
+ spin_lock_init(&stm->mc_lock);
+ mutex_init(&stm->policy_mutex);
+ stm->sw_nmasters = nmasters;
+ stm->owner = owner;
+ stm->data = stm_data;
+ stm->dev = dev;
+ stm_data->stm = stm;
+
+ dev_set_drvdata(dev, stm);
+
+ return 0;
+
+err_device:
+ unregister_chrdev(stm->major, stm_data->name);
+err_free:
+ kfree(stm);
+
+ return err;
+}
+EXPORT_SYMBOL_GPL(stm_register_device);
+
+static void stm_source_link_drop(struct stm_source_device *src);
+
+void stm_unregister_device(struct stm_data *stm_data)
+{
+ struct stm_device *stm = stm_data->stm;
+ struct stm_source_device *src, *iter;
+ int i;
+
+ spin_lock(&stm->link_lock);
+ list_for_each_entry_safe(src, iter, &stm->link_list, link_entry) {
+ stm_source_link_drop(src);
+ }
+ spin_unlock(&stm->link_lock);
+
+ synchronize_srcu(&stm_source_srcu);
+
+ unregister_chrdev(stm->major, stm_data->name);
+
+ if (stm->policy)
+ stp_policy_unbind(stm->policy);
+
+ for (i = stm->data->sw_start; i <= stm->data->sw_end; i++)
+ stp_master_free(stm, i);
+
+ device_unregister(stm->dev);
+ kfree(stm);
+ stm_data->stm = NULL;
+}
+EXPORT_SYMBOL_GPL(stm_unregister_device);
+
+static int stm_source_link_add(struct stm_source_device *src,
+ struct stm_device *stm)
+{
+ int err;
+
+ spin_lock(&stm->link_lock);
+ spin_lock(&src->link_lock);
+
+ /* src->link is dereferenced under stm_source_srcu but not the list */
+ rcu_assign_pointer(src->link, stm);
+ list_add_tail(&src->link_entry, &stm->link_list);
+
+ spin_unlock(&src->link_lock);
+ spin_unlock(&stm->link_lock);
+
+ if (stm->policy) {
+ char *id = kstrdup(src->data->name, GFP_KERNEL);
+
+ if (id) {
+ src->policy_node =
+ stp_policy_node_lookup(stm->policy, id);
+
+ kfree(id);
+ }
+ }
+
+ err = stm_output_assign(stm, src->data->nr_chans,
+ src->policy_node, &src->output);
+ if (err)
+ return err;
+
+ if (stm->data->link)
+ stm->data->link(stm->data, src->output.master,
+ src->output.channel);
+ if (src->data->link)
+ src->data->link(src->data);
+
+ return 0;
+}
+
+static void stm_source_link_drop(struct stm_source_device *src)
+{
+ int idx = srcu_read_lock(&stm_source_srcu);
+
+ if (src->link && src->data->unlink)
+ src->data->unlink(src->data);
+
+ srcu_read_unlock(&stm_source_srcu, idx);
+
+ spin_lock(&src->link_lock);
+ if (src->link) {
+ stm_output_free(src->link, &src->output);
+ list_del_init(&src->link_entry);
+ rcu_assign_pointer(src->link, NULL);
+ }
+ spin_unlock(&src->link_lock);
+}
+
+static ssize_t stm_source_link_show(struct device *dev,
+ struct device_attribute *attr,
+ char *buf)
+{
+ struct stm_source_device *src = dev_get_drvdata(dev);
+ int idx, ret;
+
+ idx = srcu_read_lock(&stm_source_srcu);
+ ret = sprintf(buf, "%s\n",
+ src->link ? dev_name(src->link->dev) : "<none>");
+ srcu_read_unlock(&stm_source_srcu, idx);
+
+ return ret;
+}
+
+static ssize_t stm_source_link_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ struct stm_source_device *src = dev_get_drvdata(dev);
+ struct stm_device *link;
+ struct device *linkdev;
+ int err;
+
+ stm_source_link_drop(src);
+
+ linkdev = stm_find_device(buf, count);
+ if (!linkdev)
+ return -EINVAL;
+
+ link = dev_get_drvdata(linkdev);
+
+ err = stm_source_link_add(src, link);
+
+ return err ? : count;
+}
+
+static DEVICE_ATTR_RW(stm_source_link);
+
+static struct attribute *stm_source_attrs[] = {
+ &dev_attr_stm_source_link.attr,
+ NULL,
+};
+
+static const struct attribute_group stm_source_group = {
+ .attrs = stm_source_attrs,
+};
+
+static const struct attribute_group *stm_source_groups[] = {
+ &stm_source_group,
+ NULL,
+};
+
+static struct class stm_source_class = {
+ .name = "stm_source",
+ .dev_groups = stm_source_groups,
+};
+
+/**
+ * stm_source_register_device() - register an stm_source device
+ * @parent: parent device
+ * @data: device description structure
+ *
+ * This will create a device of stm_source class that can write
+ * data to an stm device once linked.
+ *
+ * Return: 0 on success, -errno otherwise.
+ */
+int stm_source_register_device(struct device *parent,
+ struct stm_source_data *data)
+{
+ struct stm_source_device *src;
+ struct device *dev;
+
+ if (!stm_core_up)
+ return -EPROBE_DEFER;
+
+ src = kzalloc(sizeof(*src), GFP_KERNEL);
+ if (!src)
+ return -ENOMEM;
+
+ dev = device_create(&stm_source_class, parent, MKDEV(0, 0), NULL, "%s",
+ data->name);
+ if (IS_ERR(dev)) {
+ kfree(src);
+ return PTR_ERR(dev);
+ }
+
+ spin_lock_init(&src->link_lock);
+ INIT_LIST_HEAD(&src->link_entry);
+ src->dev = dev;
+ src->data = data;
+ data->src = src;
+ dev_set_drvdata(dev, src);
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(stm_source_register_device);
+
+/**
+ * stm_source_unregister_device() - unregister an stm_source device
+ * @data: device description that was used to register the device
+ *
+ * This will remove a previously created stm_source device from the system.
+ */
+void stm_source_unregister_device(struct stm_source_data *data)
+{
+ struct stm_source_device *src = data->src;
+
+ stm_source_link_drop(src);
+
+ device_destroy(&stm_source_class, src->dev->devt);
+
+ kfree(src);
+}
+EXPORT_SYMBOL_GPL(stm_source_unregister_device);
+
+int stm_source_write(struct stm_source_data *data, unsigned int chan,
+ const char *buf, size_t count)
+{
+ struct stm_source_device *src = data->src;
+ struct stm_device *stm;
+ int idx;
+
+ if (!src->output.nr_chans)
+ return -ENODEV;
+
+ if (chan >= src->output.nr_chans)
+ return -EINVAL;
+
+ idx = srcu_read_lock(&stm_source_srcu);
+
+ stm = srcu_dereference(src->link, &stm_source_srcu);
+ if (stm)
+ count = stm->data->write(stm->data, src->output.master,
+ src->output.channel + chan, buf,
+ count);
+ else
+ count = -ENODEV;
+
+ srcu_read_unlock(&stm_source_srcu, idx);
+
+ return count;
+}
+EXPORT_SYMBOL_GPL(stm_source_write);
+
+static int __init stm_core_init(void)
+{
+ int err;
+
+ err = class_register(&stm_class);
+ if (err)
+ return err;
+
+ err = class_register(&stm_source_class);
+ if (err) {
+ class_unregister(&stm_class);
+ return err;
+ }
+
+ init_srcu_struct(&stm_source_srcu);
+
+ stm_core_up++;
+
+ return 0;
+}
+
+postcore_initcall(stm_core_init);
diff --git a/drivers/stm/policy.c b/drivers/stm/policy.c
new file mode 100644
index 0000000000..6ce125da3b
--- /dev/null
+++ b/drivers/stm/policy.c
@@ -0,0 +1,470 @@
+/*
+ * System Trace Module (STM) master/channel allocation policy management
+ * Copyright (c) 2014, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * A master/channel allocation policy allows mapping string identifiers to
+ * master and channel ranges, where allocation can be done.
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/types.h>
+#include <linux/module.h>
+#include <linux/device.h>
+#include <linux/configfs.h>
+#include <linux/slab.h>
+#include <linux/stm.h>
+#include "stm.h"
+
+/*
+ * STP Master/Channel allocation policy configfs layout.
+ */
+
+struct stp_policy {
+ struct config_group group;
+ struct stm_device *stm;
+};
+
+struct stp_policy_node {
+ struct config_group group;
+ struct stm_device *stm;
+ struct stp_policy *policy;
+ unsigned int first_master;
+ unsigned int last_master;
+ unsigned int first_channel;
+ unsigned int last_channel;
+};
+
+void stp_policy_node_get_ranges(struct stp_policy_node *policy_node,
+ unsigned int *mstart, unsigned int *mend,
+ unsigned int *cstart, unsigned int *cend)
+{
+ *mstart = policy_node->first_master;
+ *mend = policy_node->last_master;
+ *cstart = policy_node->first_channel;
+ *cend = policy_node->last_channel;
+}
+
+static inline char *stp_policy_node_name(struct stp_policy_node *policy_node)
+{
+ return policy_node->group.cg_item.ci_name ? : "<none>";
+}
+
+static inline struct stp_policy *to_stp_policy(struct config_item *item)
+{
+ return item ?
+ container_of(to_config_group(item), struct stp_policy, group) :
+ NULL;
+}
+
+static inline struct stp_policy_node *
+to_stp_policy_node(struct config_item *item)
+{
+ return item ?
+ container_of(to_config_group(item), struct stp_policy_node,
+ group) :
+ NULL;
+}
+
+static ssize_t stp_policy_node_masters_show(struct stp_policy_node *policy_node,
+ char *page)
+{
+ ssize_t count;
+
+ count = sprintf(page, "%u %u\n", policy_node->first_master,
+ policy_node->last_master);
+
+ return count;
+}
+
+static ssize_t
+stp_policy_node_masters_store(struct stp_policy_node *policy_node,
+ const char *page, size_t count)
+{
+ struct stm_device *stm = policy_node->stm;
+ unsigned int first, last;
+ char *p = (char *) page;
+
+ if (sscanf(p, "%u %u", &first, &last) != 2)
+ return -EINVAL;
+
+ /* must be within [sw_start..sw_end], which is an inclusive range */
+ if (first > INT_MAX || last > INT_MAX || first > last ||
+ first < stm->data->sw_start ||
+ last > stm->data->sw_end)
+ return -ERANGE;
+
+ policy_node->first_master = first;
+ policy_node->last_master = last;
+
+ return count;
+}
+
+static ssize_t
+stp_policy_node_channels_show(struct stp_policy_node *policy_node, char *page)
+{
+ ssize_t count;
+
+ count = sprintf(page, "%u %u\n", policy_node->first_channel,
+ policy_node->last_channel);
+
+ return count;
+}
+
+static ssize_t
+stp_policy_node_channels_store(struct stp_policy_node *policy_node,
+ const char *page, size_t count)
+{
+ unsigned int first, last;
+ char *p = (char *) page;
+
+ if (sscanf(p, "%u %u", &first, &last) != 2)
+ return -EINVAL;
+
+ if (first > INT_MAX || last > INT_MAX || first > last ||
+ last >= policy_node->stm->data->sw_nchannels)
+ return -ERANGE;
+
+ policy_node->first_channel = first;
+ policy_node->last_channel = last;
+
+ return count;
+}
+
+static void stp_policy_node_release(struct config_item *item)
+{
+ kfree(to_stp_policy_node(item));
+}
+
+struct stp_policy_node_attribute {
+ struct configfs_attribute attr;
+ ssize_t (*show)(struct stp_policy_node *, char *);
+ ssize_t (*store)(struct stp_policy_node *, const char *, size_t);
+};
+
+static ssize_t stp_policy_node_attr_show(struct config_item *item,
+ struct configfs_attribute *attr,
+ char *page)
+{
+ struct stp_policy_node *policy_node = to_stp_policy_node(item);
+ struct stp_policy_node_attribute *pn_attr =
+ container_of(attr, struct stp_policy_node_attribute, attr);
+ ssize_t count = 0;
+
+ if (pn_attr->show)
+ count = pn_attr->show(policy_node, page);
+
+ return count;
+}
+
+static ssize_t stp_policy_node_attr_store(struct config_item *item,
+ struct configfs_attribute *attr,
+ const char *page, size_t len)
+{
+ struct stp_policy_node *policy_node = to_stp_policy_node(item);
+ struct stp_policy_node_attribute *pn_attr =
+ container_of(attr, struct stp_policy_node_attribute, attr);
+ ssize_t count = -EINVAL;
+
+ if (pn_attr->store)
+ count = pn_attr->store(policy_node, page, len);
+
+ return count;
+}
+
+static struct configfs_item_operations stp_policy_node_item_ops = {
+ .release = stp_policy_node_release,
+ .show_attribute = stp_policy_node_attr_show,
+ .store_attribute = stp_policy_node_attr_store,
+};
+
+static struct stp_policy_node_attribute stp_policy_node_attr_range = {
+ .attr = {
+ .ca_owner = THIS_MODULE,
+ .ca_name = "masters",
+ .ca_mode = S_IRUGO | S_IWUSR,
+ },
+ .show = stp_policy_node_masters_show,
+ .store = stp_policy_node_masters_store,
+};
+
+static struct stp_policy_node_attribute stp_policy_node_attr_channels = {
+ .attr = {
+ .ca_owner = THIS_MODULE,
+ .ca_name = "channels",
+ .ca_mode = S_IRUGO | S_IWUSR,
+ },
+ .show = stp_policy_node_channels_show,
+ .store = stp_policy_node_channels_store,
+};
+
+static struct configfs_attribute *stp_policy_node_attrs[] = {
+ &stp_policy_node_attr_range.attr,
+ &stp_policy_node_attr_channels.attr,
+ NULL,
+};
+
+static struct config_item_type stp_policy_type;
+static struct config_item_type stp_policy_node_type;
+
+static struct config_group *
+stp_policy_node_make(struct config_group *group, const char *name)
+{
+ struct stp_policy_node *policy_node, *parent_node;
+ struct stp_policy *policy;
+
+ if (group->cg_item.ci_type == &stp_policy_type) {
+ policy = container_of(group, struct stp_policy, group);
+ } else {
+ parent_node = container_of(group, struct stp_policy_node,
+ group);
+ policy = parent_node->policy;
+ }
+
+ if (!policy->stm)
+ return ERR_PTR(-ENODEV);
+
+ policy_node = kzalloc(sizeof(struct stp_policy_node), GFP_KERNEL);
+ if (!policy_node)
+ return ERR_PTR(-ENOMEM);
+
+ config_group_init_type_name(&policy_node->group, name,
+ &stp_policy_node_type);
+
+ policy_node->policy = policy;
+ policy_node->stm = policy->stm;
+
+ /* default values for the attributes */
+ policy_node->first_master = policy->stm->data->sw_start;
+ policy_node->last_master = policy->stm->data->sw_end;
+ policy_node->first_channel = 0;
+ policy_node->last_channel = policy->stm->data->sw_nchannels - 1;
+
+ return &policy_node->group;
+}
+
+static void
+stp_policy_node_drop(struct config_group *group, struct config_item *item)
+{
+ config_item_put(item);
+}
+
+static struct configfs_group_operations stp_policy_node_group_ops = {
+ .make_group = stp_policy_node_make,
+ .drop_item = stp_policy_node_drop,
+};
+
+static struct config_item_type stp_policy_node_type = {
+ .ct_item_ops = &stp_policy_node_item_ops,
+ .ct_group_ops = &stp_policy_node_group_ops,
+ .ct_attrs = stp_policy_node_attrs,
+ .ct_owner = THIS_MODULE,
+};
+
+/*
+ * Root group: policies.
+ */
+static struct configfs_attribute stp_policy_attr_device = {
+ .ca_owner = THIS_MODULE,
+ .ca_name = "device",
+ .ca_mode = S_IRUGO | S_IWUSR,
+};
+
+static struct configfs_attribute *stp_policy_attrs[] = {
+ &stp_policy_attr_device,
+ NULL,
+};
+
+static ssize_t stp_policy_attr_show(struct config_item *item,
+ struct configfs_attribute *attr,
+ char *page)
+{
+ struct stp_policy *policy = to_stp_policy(item);
+
+ return sprintf(page, "%s\n",
+ (policy && policy->stm) ?
+ policy->stm->data->name :
+ "<none>");
+}
+
+static ssize_t stp_policy_attr_store(struct config_item *item,
+ struct configfs_attribute *attr,
+ const char *page, size_t len)
+{
+ struct stp_policy *policy = to_stp_policy(item);
+ ssize_t count = -EINVAL;
+ struct device *dev;
+
+ dev = stm_find_device(page, len);
+ if (dev) {
+ count = len;
+ if (policy->stm)
+ put_device(policy->stm->dev);
+
+ policy->stm = dev_get_drvdata(dev);
+
+ mutex_lock(&policy->stm->policy_mutex);
+ policy->stm->policy = policy;
+ mutex_unlock(&policy->stm->policy_mutex);
+ }
+
+ return count;
+}
+
+void stp_policy_unbind(struct stp_policy *policy)
+{
+ put_device(policy->stm->dev);
+
+ mutex_lock(&policy->stm->policy_mutex);
+ policy->stm->policy = NULL;
+ mutex_unlock(&policy->stm->policy_mutex);
+
+ policy->stm = NULL;
+}
+
+static void stp_policy_release(struct config_item *item)
+{
+ struct stp_policy *policy = to_stp_policy(item);
+
+ stp_policy_unbind(policy);
+ kfree(policy);
+}
+
+static struct configfs_item_operations stp_policy_item_ops = {
+ .release = stp_policy_release,
+ .show_attribute = stp_policy_attr_show,
+ .store_attribute = stp_policy_attr_store,
+};
+
+static struct configfs_group_operations stp_policy_group_ops = {
+ .make_group = stp_policy_node_make,
+};
+
+static struct config_item_type stp_policy_type = {
+ .ct_item_ops = &stp_policy_item_ops,
+ .ct_group_ops = &stp_policy_group_ops,
+ .ct_attrs = stp_policy_attrs,
+ .ct_owner = THIS_MODULE,
+};
+
+static struct config_group *
+stp_policies_make(struct config_group *group, const char *name)
+{
+ struct stp_policy *policy;
+
+ policy = kzalloc(sizeof(*policy), GFP_KERNEL);
+ if (!policy)
+ return ERR_PTR(-ENOMEM);
+
+ config_group_init_type_name(&policy->group, name,
+ &stp_policy_type);
+ policy->stm = NULL;
+
+ return &policy->group;
+}
+
+static struct configfs_group_operations stp_policies_group_ops = {
+ .make_group = stp_policies_make,
+};
+
+static struct config_item_type stp_policies_type = {
+ .ct_group_ops = &stp_policies_group_ops,
+ .ct_owner = THIS_MODULE,
+};
+
+static struct configfs_subsystem stp_policy_subsys = {
+ .su_group = {
+ .cg_item = {
+ .ci_namebuf = "stp-policy",
+ .ci_type = &stp_policies_type,
+ },
+ },
+};
+
+/*
+ * Lock the policy mutex from the outside
+ */
+static struct stp_policy_node *
+__stp_policy_node_lookup(struct stp_policy *policy, char *s)
+{
+ struct stp_policy_node *policy_node, *ret;
+ struct list_head *head = &policy->group.cg_children;
+ struct config_item *item;
+ char *start, *end = s;
+
+ if (list_empty(head))
+ return NULL;
+
+ /* return the first entry if everything else fails */
+ item = list_entry(head->next, struct config_item, ci_entry);
+ ret = to_stp_policy_node(item);
+
+next:
+ for (;;) {
+ start = strsep(&end, "/");
+ if (!start)
+ break;
+
+ if (!*start)
+ continue;
+
+ list_for_each_entry(item, head, ci_entry) {
+ policy_node = to_stp_policy_node(item);
+
+ if (!strcmp(start,
+ policy_node->group.cg_item.ci_name)) {
+ ret = policy_node;
+
+ if (!end)
+ goto out;
+
+ head = &policy_node->group.cg_children;
+ goto next;
+ }
+ }
+ break;
+ }
+
+out:
+ return ret;
+}
+
+struct stp_policy_node *
+stp_policy_node_lookup(struct stp_policy *policy, char *s)
+{
+ struct stp_policy_node *policy_node;
+
+ mutex_lock(&stp_policy_subsys.su_mutex);
+ policy_node = __stp_policy_node_lookup(policy, s);
+ mutex_unlock(&stp_policy_subsys.su_mutex);
+
+ return policy_node;
+}
+
+static int stp_configfs_init(void)
+{
+ int err;
+
+ config_group_init(&stp_policy_subsys.su_group);
+ mutex_init(&stp_policy_subsys.su_mutex);
+ err = configfs_register_subsystem(&stp_policy_subsys);
+
+ return err;
+}
+
+static void stp_configfs_done(void)
+{
+ configfs_unregister_subsystem(&stp_policy_subsys);
+}
+
+module_init(stp_configfs_init);
+module_exit(stp_configfs_done);
diff --git a/drivers/stm/stm.h b/drivers/stm/stm.h
new file mode 100644
index 0000000000..7d7d9954e2
--- /dev/null
+++ b/drivers/stm/stm.h
@@ -0,0 +1,77 @@
+/*
+ * System Trace Module (STM) infrastructure
+ * Copyright (c) 2014, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * STM class implements generic infrastructure for System Trace Module devices
+ * as defined in MIPI STPv2 specification.
+ */
+
+#ifndef _CLASS_STM_H_
+#define _CLASS_STM_H_
+
+struct stp_policy;
+struct stp_policy_node;
+
+struct stp_policy_node *
+stp_policy_node_lookup(struct stp_policy *policy, char *s);
+void stp_policy_unbind(struct stp_policy *policy);
+
+void stp_policy_node_get_ranges(struct stp_policy_node *policy_node,
+ unsigned int *mstart, unsigned int *mend,
+ unsigned int *cstart, unsigned int *cend);
+
+struct stp_master {
+ unsigned int nr_free;
+ unsigned long chan_map[0];
+};
+
+struct stm_device {
+ struct device *dev;
+ struct module *owner;
+ struct stp_policy *policy;
+ struct mutex policy_mutex;
+ int major;
+ unsigned int sw_nmasters;
+ struct stm_data *data;
+ spinlock_t link_lock;
+ struct list_head link_list;
+ /* master allocation */
+ spinlock_t mc_lock;
+ struct stp_master *masters[0];
+};
+
+struct stm_output {
+ unsigned int master;
+ unsigned int channel;
+ unsigned int nr_chans;
+};
+
+struct stm_file {
+ struct stm_device *stm;
+ struct stp_policy_node *policy_node;
+ struct stm_output output;
+};
+
+struct device *stm_find_device(const char *name, size_t len);
+
+struct stm_source_device {
+ struct device *dev;
+ struct stm_source_data *data;
+ spinlock_t link_lock;
+ struct stm_device *link;
+ struct list_head link_entry;
+ /* one output per stm_source device */
+ struct stp_policy_node *policy_node;
+ struct stm_output output;
+};
+
+#endif /* _CLASS_STM_H_ */
diff --git a/include/linux/stm.h b/include/linux/stm.h
new file mode 100644
index 0000000000..d00fa9a3fc
--- /dev/null
+++ b/include/linux/stm.h
@@ -0,0 +1,87 @@
+/*
+ * System Trace Module (STM) infrastructure APIs
+ * Copyright (C) 2014 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ */
+
+#ifndef _STM_H_
+#define _STM_H_
+
+struct stp_policy;
+
+struct stm_device;
+
+/**
+ * struct stm_data - STM device description and callbacks
+ * @name: device name
+ * @stm: internal structure, only used by stm class code
+ * @sw_start: first STP master
+ * @sw_end: last STP master
+ * @sw_nchannels: number of STP channels per master
+ * @sw_mmiosz: size of one channel's IO space, for mmap, optional
+ * @write: write callback
+ * @mmio_addr: mmap callback, optional
+ *
+ * Fill out this structure before calling stm_register_device() to create
+ * an STM device and stm_unregister_device() to destroy it. It will also be
+ * passed back to write() and mmio_addr() callbacks.
+ */
+struct stm_data {
+ const char *name;
+ struct stm_device *stm;
+ unsigned int sw_start;
+ unsigned int sw_end;
+ unsigned int sw_nchannels;
+ unsigned int sw_mmiosz;
+ ssize_t (*write)(struct stm_data *, unsigned int,
+ unsigned int, const char *, size_t);
+ phys_addr_t (*mmio_addr)(struct stm_data *, unsigned int,
+ unsigned int, unsigned int);
+ void (*link)(struct stm_data *, unsigned int,
+ unsigned int);
+ void (*unlink)(struct stm_data *, unsigned int,
+ unsigned int);
+};
+
+int stm_register_device(struct device *parent, struct stm_data *stm_data,
+ struct module *owner);
+void stm_unregister_device(struct stm_data *stm_data);
+
+struct stm_source_device;
+
+/**
+ * struct stm_source_data - STM source device description and callbacks
+ * @name: device name, will be used for policy lookup
+ * @src: internal structure, only used by stm class code
+ * @nr_chans: number of channels to allocate
+ * @link: called when STM device gets linked to this source
+ * @unlink: called when STM device is about to be unlinked
+ *
+ * Fill in this structure before calling stm_source_register_device() to
+ * register a source device. Also pass it to unregister and write calls.
+ */
+struct stm_source_data {
+ const char *name;
+ struct stm_source_device *src;
+ unsigned int percpu;
+ unsigned int nr_chans;
+ int (*link)(struct stm_source_data *data);
+ void (*unlink)(struct stm_source_data *data);
+};
+
+int stm_source_register_device(struct device *parent,
+ struct stm_source_data *data);
+void stm_source_unregister_device(struct stm_source_data *data);
+
+int stm_source_write(struct stm_source_data *data, unsigned int chan,
+ const char *buf, size_t count);
+
+#endif /* _STM_H_ */
diff --git a/include/uapi/linux/stm.h b/include/uapi/linux/stm.h
new file mode 100644
index 0000000000..042b58b53b
--- /dev/null
+++ b/include/uapi/linux/stm.h
@@ -0,0 +1,47 @@
+/*
+ * System Trace Module (STM) userspace interfaces
+ * Copyright (c) 2014, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * STM class implements generic infrastructure for System Trace Module devices
+ * as defined in MIPI STPv2 specification.
+ */
+
+#ifndef _UAPI_LINUX_STM_H
+#define _UAPI_LINUX_STM_H
+
+/**
+ * struct stp_policy_id - identification for the STP policy
+ * @size: size of the structure including real id[] length
+ * @master: assigned master
+ * @channel: first assigned channel
+ * @width: number of requested channels
+ * @id: identification string
+ *
+ * User must calculate the total size of the structure and put it into
+ * @size field, fill out the @id and desired @width. In return, kernel
+ * fills out @master, @channel and @width.
+ */
+struct stp_policy_id {
+ __u32 size;
+ __u16 master;
+ __u16 channel;
+ __u16 width;
+ /* padding */
+ __u16 __reserved_0;
+ __u32 __reserved_1;
+ char id[0];
+};
+
+#define STP_POLICY_ID_SET _IOWR('%', 0, struct stp_policy_id)
+#define STP_POLICY_ID_GET _IOR('%', 1, struct stp_policy_id)
+
+#endif /* _UAPI_LINUX_STM_H */
--
2.1.4
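The @size convention documented for struct stp_policy_id above can be
sketched in userspace as follows. This is only an illustration under stated
assumptions: the struct is a local mirror of the proposed layout built from
<stdint.h> types, the helper name stp_policy_id_alloc() is hypothetical, and
counting the terminating NUL of @id in @size is an assumption, since the
patch does not say whether it is included.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Local mirror of the proposed struct stp_policy_id (illustration only). */
struct stp_policy_id_example {
	uint32_t size;
	uint16_t master;
	uint16_t channel;
	uint16_t width;
	uint16_t reserved_0;	/* padding */
	uint32_t reserved_1;
	char id[];		/* identification string follows the header */
};

/*
 * Hypothetical helper: build a request with @size, @width and @id filled in.
 * @master and @channel stay zero; per the kerneldoc the kernel fills them.
 */
static struct stp_policy_id_example *
stp_policy_id_alloc(const char *id, uint16_t width)
{
	struct stp_policy_id_example *req;
	size_t len, total;

	assert(id != NULL);
	len = strlen(id) + 1;	/* assumption: the NUL byte is counted */
	total = sizeof(*req) + len;

	req = calloc(1, total);
	if (!req)
		return NULL;

	req->size = (uint32_t)total;
	req->width = width;
	memcpy(req->id, id, len);
	return req;
}
```

The caller would pass the returned buffer to the STP_POLICY_ID_SET ioctl and
free() it afterwards.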
^ permalink raw reply related
* Re: [PATCH v5 tip 0/7] tracing: attach eBPF programs to kprobes
From: Steven Rostedt @ 2015-03-07 1:09 UTC (permalink / raw)
To: Ingo Molnar
Cc: Alexei Starovoitov, Namhyung Kim, Arnaldo Carvalho de Melo,
Jiri Olsa, Masami Hiramatsu, David S. Miller, Daniel Borkmann,
Peter Zijlstra, Linux API, Network Development, LKML
In-Reply-To: <20150304154824.5f165c6d@gandalf.local.home>
On Wed, 4 Mar 2015 15:48:24 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:
> On Wed, 4 Mar 2015 21:33:16 +0100
> Ingo Molnar <mingo@kernel.org> wrote:
>
> >
> > * Alexei Starovoitov <ast@plumgrid.com> wrote:
> >
> > > On Sun, Mar 1, 2015 at 3:27 PM, Alexei Starovoitov <ast@plumgrid.com> wrote:
> > > > Peter, Steven,
> > > > I think this set addresses everything we've discussed.
> > > > Please review/ack. Thanks!
> > >
> > > icmp echo request
> >
> > I'd really like to have an Acked-by from Steve (propagated into the
> > changelogs) before looking at applying these patches.
>
> I'll have to look at this tomorrow. I'm a bit swamped with other things
> at the moment :-/
>
Just an update. I started looking at it but then was pulled off to do
other things. I'll make this a priority next week. Sorry for the delay.
-- Steve
^ permalink raw reply
* Re: Right interface for cellphone modem audio (was Re: [PATCHv2 0/2] N900 Modem Speech Support)
From: Kai Vehmanen @ 2015-03-06 20:49 UTC (permalink / raw)
To: Pavel Machek
Cc: perex-/Fr2/VpizcU, Takashi Iwai,
alsa-devel-K7yf7f+aM1XWsZ/bQMPhNw, Sebastian Reichel,
Peter Ujfalusi, Kai Vehmanen, Pali Rohar, Aaro Koskinen,
Ivaylo Dimitrov, linux-omap-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
linux-api-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20150306094354.GA32369@amd>
Hi,
On Fri, 6 Mar 2015, Pavel Machek wrote:
>> Our take was that ALSA is not the right interface for cmt_speech. The
>> cmt_speech interface in the modem is _not_ a PCM interface as modelled by
>> ALSA. Specifically:
>>
>> - the interface is lossy in both directions
>> - data is sent in packets, not a stream of samples (could be other things
>> than PCM samples), with timing and meta-data
>> - timing of uplink is of utmost importance
>
> I see that you may not have data available in "downlink" scenario, but
> how is it lossy in "uplink" scenario? Phone should always try to fill
> the uplink, no? (Or do you detect silence and not transmit in this
Lossy was perhaps not the best choice of words, non-continuous would be
a better choice in the uplink case. To adjust timing, some samples from
the continuous locally recorded PCM stream need to be skipped and/or
duplicated. This would normally be done between speech bursts to avoid
artifacts.
> Packets vs. stream of samples... does userland need to know about the
> packets? Could we simply hide it from the userland? As userland daemon
> is (supposed to be) realtime, do we really need extra set of
> timestamps? What other metadata are there?
Yes, we need flags that tell about the frame. Please see docs for
'frame_flags' and 'spc_flags' in libcmtspeechdata cmtspeech.h:
https://www.gitorious.org/libcmtspeechdata/libcmtspeechdata/source/9206835ea3c96815840a80ccba9eaeb16ff7e294:cmtspeech.h
Kernel space does not have enough info to handle these flags as the audio
mixer is not implemented in kernel, so they have to be passed to/from
user-space.
And some further info in libcmtspeechdata/docs/
https://www.gitorious.org/libcmtspeechdata/libcmtspeechdata/source/9206835ea3c96815840a80ccba9eaeb16ff7e294:doc/libcmtspeechdata_api_docs_main.txt
> Uplink timing... As the daemon is realtime, can it just send the data
> at the right time? Also normally uplink would be filled, no?
But how would you implement that via the ALSA API? With cmt_speech, a
speech packet is prepared in a mmap'ed buffer, flags are set to describe
the buffer, and at the correct time, write() is called to trigger
transmission in HW (see cmtspeech_ul_buffer_release() in
libcmtspeechdata, and compare this to snd_pcm_mmap_commit() in ALSA). In
ALSA, the mmap commit and PCM write variants just add data to the
ringbuffer and update the appl pointer. Only the initial start (and stop)
of a stream have the "do something now" semantics in ALSA.
The ALSA compressed offload API did not exist back when we were working on
cmt_speech, but it is still not a good fit, although it adds some of the
concepts (notably frames).
> Well, packets are of fixed size, right? So the userland can simply
> supply the right size in the common case. As for sending at the right
> time... well... if the userspace is already real-time, that should be
> easy
See above, ALSA just doesn't work like that, there's no syscall for "send
these samples now", the model is different.
Br, Kai
^ permalink raw reply
* Re: [PATCH] capabilities: Ambient capability set V2
From: Serge E. Hallyn @ 2015-03-06 20:08 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Christoph Lameter, Serge E. Hallyn, Serge Hallyn, Jonathan Corbet,
Aaron Jones, LSM List, linux-kernel@vger.kernel.org,
Andrew Morton, Andrew G. Morgan, Mimi Zohar, Austin S Hemmelgarn,
Markku Savela, Jarkko Sakkinen, Linux API, Michael Kerrisk
In-Reply-To: <CALCETrVQF22rkZFD8VAW_xrVQOjwpej6W4TJS9gbN9B431TEKg@mail.gmail.com>
On Fri, Mar 06, 2015 at 11:02:43AM -0800, Andy Lutomirski wrote:
> On Fri, Mar 6, 2015 at 10:53 AM, Christoph Lameter <cl@linux.com> wrote:
> > On Fri, 6 Mar 2015, Serge E. Hallyn wrote:
> >
> >> Sorry, something about that patch-patch didn't make sense to me, but I
> >> need to look more closely. My objection was that you were able to get the
> >> pA capabilities into pP without them being in your pI. Your proposed
> >> change didn't seem like it would fix that.
> >
> > Just tried to fix that. Could it be that cap_inheritable is never set even
> > for a binary that has
> >
> > christoph@fujitsu-haswell:~$ getcap ambient_test
> >
> > ambient_test = cap_setpcap,cap_net_admin,cap_net_raw,cap_sys_nice+eip
>
> I think that's right. fI doesn't set pI.
Right. The idea is that for the running binary to get capability x in its
pP, its privileged ancestor must have set x in pI, and the binary itself
must be trusted with x in fI.
What we are doing is allowing bypassing fI using pA, without bypassing the
requirement for x to be in pI. Since pI is intended to be filled (for
instance) at login based on username/group, pI generally does not get cleared.
At the same time, any software which thinks it is running untrusted code
safely without privilege by clearing pI and pP won't be fooled by pA.
-serge
^ permalink raw reply
* Re: [PATCH] capabilities: Ambient capability set V2
From: Andy Lutomirski @ 2015-03-06 19:02 UTC (permalink / raw)
To: Christoph Lameter
Cc: Serge E. Hallyn, Serge Hallyn, Jonathan Corbet, Aaron Jones,
LSM List, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
Andrew Morton, Andrew G. Morgan, Mimi Zohar, Austin S Hemmelgarn,
Markku Savela, Jarkko Sakkinen, Linux API, Michael Kerrisk
In-Reply-To: <alpine.DEB.2.11.1503061244130.9804-gkYfJU5Cukgdnm+yROfE0A@public.gmane.org>
On Fri, Mar 6, 2015 at 10:53 AM, Christoph Lameter <cl-vYTEC60ixJUAvxtiuMwx3w@public.gmane.org> wrote:
> On Fri, 6 Mar 2015, Serge E. Hallyn wrote:
>
>> Sorry, something about that patch-patch didn't make sense to me, but I
>> need to look more closely. My objection was that you were able to get the
>> pA capabilities into pP without them being in your pI. Your proposed
>> change didn't seem like it would fix that.
>
> Just tried to fix that. Could it be that cap_inheritable is never set even
> for a binary that has
>
> christoph@fujitsu-haswell:~$ getcap ambient_test
>
> ambient_test = cap_setpcap,cap_net_admin,cap_net_raw,cap_sys_nice+eip
I think that's right. fI doesn't set pI.
--Andy
^ permalink raw reply
* Re: [PATCH] capabilities: Ambient capability set V2
From: Christoph Lameter @ 2015-03-06 18:53 UTC (permalink / raw)
To: Serge E. Hallyn
Cc: Serge Hallyn, Andy Lutomirski, Jonathan Corbet, Aaron Jones,
linux-security-module, linux-kernel, akpm, Andrew G. Morgan,
Mimi Zohar, Austin S Hemmelgarn, Markku Savela, Jarkko Sakkinen,
linux-api, Michael Kerrisk
In-Reply-To: <20150306163443.GA28386@mail.hallyn.com>
On Fri, 6 Mar 2015, Serge E. Hallyn wrote:
> Sorry, something about that patch-patch didn't make sense to me, but I
> need to look more closely. My objection was that you were able to get the
> pA capabilities into pP without them being in your pI. Your proposed
> change didn't seem like it would fix that.
Just tried to fix that. Could it be that cap_inheritable is never set even
for a binary that has
christoph@fujitsu-haswell:~$ getcap ambient_test
ambient_test = cap_setpcap,cap_net_admin,cap_net_raw,cap_sys_nice+eip
I added some printks and it seems that current_cred()->cap_inheritable is
not set when running ambient_test.
Index: linux/security/commoncap.c
===================================================================
--- linux.orig/security/commoncap.c 2015-03-06 11:05:10.802218196 -0600
+++ linux/security/commoncap.c 2015-03-06 12:50:38.424330679 -0600
@@ -456,6 +456,10 @@ static int get_file_caps(struct linux_bi
kernel_cap_t relevant_ambient = cap_intersect(
current_cred()->cap_ambient,
current_cred()->cap_inheritable);
+ printk("task->comm %s: Amb=%x Inh=%x relevant=%x\n",
+ current->comm, current_cred()->cap_ambient.cap[0],
+ current_cred()->cap_inheritable.cap[0],
+ relevant_ambient.cap[0]);
rc = 0;
if (!cap_isclear(relevant_ambient)) {
/*
Mar 6 12:42:18 fujitsu-haswell kernel: [ 284.715051] task->comm ambient_test: Amb=803000 Inh=0 relevant=0
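To read the masks in the log line above, a small decoder helps. This is a
minimal sketch: the capability numbers are the ones from
<linux/capability.h>, while the table and the function name decode_caps()
are illustrative and only cover the four capabilities named in the getcap
output.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Subset of capability numbers from <linux/capability.h>. */
static const struct {
	int nr;
	const char *name;
} ex_caps[] = {
	{  8, "cap_setpcap" },
	{ 12, "cap_net_admin" },
	{ 13, "cap_net_raw" },
	{ 23, "cap_sys_nice" },
};

/* Write the space-separated names of the known bits set in mask to buf. */
static void decode_caps(uint32_t mask, char *buf, size_t len)
{
	size_t i;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';
	for (i = 0; i < sizeof(ex_caps) / sizeof(ex_caps[0]); i++) {
		if (!(mask & (UINT32_C(1) << ex_caps[i].nr)))
			continue;
		if (buf[0] != '\0')
			strncat(buf, " ", len - strlen(buf) - 1);
		strncat(buf, ex_caps[i].name, len - strlen(buf) - 1);
	}
}
```

Decoding Amb=803000 this way yields cap_net_admin, cap_net_raw and
cap_sys_nice (bit 8, cap_setpcap, is not set in that mask). Intersecting it
with Inh=0 gives 0, matching relevant=0 in the log.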
^ permalink raw reply
* Re: [PATCH] capabilities: Ambient capability set V2
From: Serge E. Hallyn @ 2015-03-06 16:34 UTC (permalink / raw)
To: Christoph Lameter
Cc: Serge E. Hallyn, Serge Hallyn, Andy Lutomirski, Jonathan Corbet,
Aaron Jones, linux-security-module-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
akpm-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r, Andrew G. Morgan,
Mimi Zohar, Austin S Hemmelgarn, Markku Savela, Jarkko Sakkinen,
linux-api-u79uwXL29TY76Z2rM5mHXA, Michael Kerrisk
In-Reply-To: <alpine.DEB.2.11.1503060948460.8207-gkYfJU5Cukgdnm+yROfE0A@public.gmane.org>
On Fri, Mar 06, 2015 at 09:50:02AM -0600, Christoph Lameter wrote:
> On Thu, 5 Mar 2015, Serge E. Hallyn wrote:
>
> > > > So I'd say drop this change ^
> > >
> > > Then the ambient caps get ignored for executables that have capabilities
> > > set on the file?
> >
> > Yes. Those are assumed to already know what they're doing.
>
> Ok can we get this patch merged now if I do this change
> (effectively ambient caps for binaries that have no caps set) and deal with the
> other issues later? This would cover most of the use cases here at least.
Sorry, something about that patch-patch didn't make sense to me, but I
need to look more closely. My objection was that you were able to get the
pA capabilities into pP without them being in your pI. Your proposed
change didn't seem like it would fix that.
It also seems worth waiting until you talk to Andy in person next week.
-serge
^ permalink raw reply
* Re: [PATCH] capabilities: Ambient capability set V2
From: Christoph Lameter @ 2015-03-06 15:50 UTC (permalink / raw)
To: Serge E. Hallyn
Cc: Serge Hallyn, Andy Lutomirski, Jonathan Corbet, Aaron Jones,
linux-security-module-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
akpm-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r, Andrew G. Morgan,
Mimi Zohar, Austin S Hemmelgarn, Markku Savela, Jarkko Sakkinen,
linux-api-u79uwXL29TY76Z2rM5mHXA, Michael Kerrisk
In-Reply-To: <20150305171326.GA14998-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
On Thu, 5 Mar 2015, Serge E. Hallyn wrote:
> > > So I'd say drop this change ^
> >
> > Then the ambient caps get ignored for executables that have capabilities
> > set on the file?
>
> Yes. Those are assumed to already know what they're doing.
Ok can we get this patch merged now if I do this change
(effectively ambient caps for binaries that have no caps set) and deal with the
other issues later? This would cover most of the use cases here at least.
^ permalink raw reply
* Re: [PATCH] capabilities: Ambient capability set V2
From: Christoph Lameter @ 2015-03-06 15:47 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Jarkko Sakkinen, Andrew Morton, LSM List, Andrew G. Morgan,
Michael Kerrisk, Mimi Zohar,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
Austin S Hemmelgarn, Aaron Jones, Serge Hallyn, Serge E. Hallyn,
Markku Savela, Linux API, Jonathan Corbet
In-Reply-To: <CALCETrUVrfPBpb69WFyptzFoJ8Sx4LwhhjirVx=KQ11ofCcwYg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
[-- Attachment #1: Type: TEXT/PLAIN, Size: 825 bytes --]
On Thu, 5 Mar 2015, Andy Lutomirski wrote:
> > Yes due to the library issues.
>
> You can't LD_PRELOAD and fP together. And I'm still unconvinced that
> ambient caps can ever be safe in conjunction with fP. I'll grill you
> next week on what you're trying to do that makes you want this :)
From the ld.so manpage:
LD_PRELOAD
A whitespace-separated list of additional, user-specified, ELF shared
libraries to be loaded before all others. This can be used to selectively
override functions in other shared libraries. For setuid/setgid
ELF binaries, only libraries in the standard search directories
that are also setgid will be loaded.
So this mechanism has not been made to work for binaries with caps? We
have to keep using setuid?
^ permalink raw reply
* Re: [Qemu-devel] [PATCH 02/21] userfaultfd: linux/Documentation/vm/userfaultfd.txt
From: Eric Blake @ 2015-03-06 15:39 UTC (permalink / raw)
To: Andrea Arcangeli, qemu-devel, kvm, linux-kernel, linux-mm,
linux-api, Android Kernel Team
Cc: Robert Love, Dave Hansen, Jan Kara, Neil Brown, Stefan Hajnoczi,
Andrew Jones, Sanidhya Kashyap, KOSAKI Motohiro,
Michel Lespinasse, Taras Glek, zhang.zhanghailiang,
Pavel Emelyanov, Hugh Dickins, Mel Gorman, Sasha Levin,
Dr. David Alan Gilbert, Huangpeng (Peter), Andres Lagar-Cavilla,
Christopher Covington, Anthony Liguori, Paolo Bonzini
In-Reply-To: <1425575884-2574-3-git-send-email-aarcange@redhat.com>
[-- Attachment #1: Type: text/plain, Size: 6342 bytes --]
On 03/05/2015 10:17 AM, Andrea Arcangeli wrote:
> Add documentation.
>
> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
> ---
> Documentation/vm/userfaultfd.txt | 97 ++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 97 insertions(+)
> create mode 100644 Documentation/vm/userfaultfd.txt
Just a grammar review (no analysis of technical correctness)
>
> diff --git a/Documentation/vm/userfaultfd.txt b/Documentation/vm/userfaultfd.txt
> new file mode 100644
> index 0000000..2ec296c
> --- /dev/null
> +++ b/Documentation/vm/userfaultfd.txt
> @@ -0,0 +1,97 @@
> += Userfaultfd =
> +
> +== Objective ==
> +
> +Userfaults allow to implement on demand paging from userland and more
s/to implement/the implementation of/
and maybe: s/on demand/on-demand/
> +generally they allow userland to take control various memory page
> +faults, something otherwise only the kernel code could do.
> +
> +For example userfaults allows a proper and more optimal implementation
> +of the PROT_NONE+SIGSEGV trick.
> +
> +== Design ==
> +
> +Userfaults are delivered and resolved through the userfaultfd syscall.
> +
> +The userfaultfd (aside from registering and unregistering virtual
> +memory ranges) provides for two primary functionalities:
s/provides for/provides/
> +
> +1) read/POLLIN protocol to notify an userland thread of the faults
s/an userland/a userland/ (remember, 'a unicorn gets an umbrella' - if
the 'u' is pronounced 'you' the correct article is 'a')
> + happening
> +
> +2) various UFFDIO_* ioctls that can mangle over the virtual memory
> + regions registered in the userfaultfd that allows userland to
> + efficiently resolve the userfaults it receives via 1) or to mangle
> + the virtual memory in the background
maybe: s/mangle/manage/2
> +
> +The real advantage of userfaults if compared to regular virtual memory
> +management of mremap/mprotect is that the userfaults in all their
> +operations never involve heavyweight structures like vmas (in fact the
> +userfaultfd runtime load never takes the mmap_sem for writing).
> +
> +Vmas are not suitable for page(or hugepage)-granular fault tracking
s/page(or hugepage)-granular/page- (or hugepage-) granular/
> +when dealing with virtual address spaces that could span
> +Terabytes. Too many vmas would be needed for that.
> +
> +The userfaultfd once opened by invoking the syscall, can also be
> +passed using unix domain sockets to a manager process, so the same
> +manager process could handle the userfaults of a multitude of
> +different process without them being aware about what is going on
s/process/processes/
> +(well of course unless they later try to use the userfaultfd themself
s/themself/themselves/
> +on the same region the manager is already tracking, which is a corner
> +case that would currently return -EBUSY).
> +
> +== API ==
> +
> +When first opened the userfaultfd must be enabled invoking the
> +UFFDIO_API ioctl specifying an uffdio_api.api value set to UFFD_API
s/an uffdio/a uffdio/
> +which will specify the read/POLLIN protocol userland intends to speak
> +on the UFFD. The UFFDIO_API ioctl if successful (i.e. if the requested
> +uffdio_api.api is spoken also by the running kernel), will return into
> +uffdio_api.bits and uffdio_api.ioctls two 64bit bitmasks of
> +respectively the activated feature bits below PAGE_SHIFT in the
> +userfault addresses returned by read(2) and the generic ioctl
> +available.
> +
> +Once the userfaultfd has been enabled the UFFDIO_REGISTER ioctl should
> +be invoked (if present in the returned uffdio_api.ioctls bitmask) to
> +register a memory range in the userfaultfd by setting the
> +uffdio_register structure accordingly. The uffdio_register.mode
> +bitmask will specify to the kernel which kind of faults to track for
> +the range (UFFDIO_REGISTER_MODE_MISSING would track missing
> +pages). The UFFDIO_REGISTER ioctl will return the
> +uffdio_register.ioctls bitmask of ioctls that are suitable to resolve
> +userfaults on the range reigstered. Not all ioctls will necessarily be
s/reigstered/registered/
> +supported for all memory types depending on the underlying virtual
> +memory backend (anonymous memory vs tmpfs vs real filebacked
> +mappings).
> +
> +Userland can use the uffdio_register.ioctls to mangle the virtual
maybe s/mangle/manage/
> +address space in the background (to add or potentially also remove
> +memory from the userfaultfd registered range). This means an userfault
s/an/a/
> +could be triggering just before userland maps in the background the
> +user-faulted page. To avoid POLLIN resulting in an unexpected blocking
> +read (if the UFFD is not opened in nonblocking mode in the first
> +place), we don't allow the background thread to wake userfaults that
> +haven't been read by userland yet. If we would do that likely the
> +UFFDIO_WAKE ioctl could be dropped. This may change in the future
> +(with a UFFD_API protocol bumb combined with the removal of the
s/bumb/bump/
> +UFFDIO_WAKE ioctl) if it'll be demonstrated that it's a valid
> +optimization and worthy to force userland to use the UFFD always in
> +nonblocking mode if combined with POLLIN.
> +
> +userfaultfd is also a generic enough feature, that it allows KVM to
> +implement postcopy live migration (one form of memory externalization
> +consisting of a virtual machine running with part or all of its memory
> +residing on a different node in the cloud) without having to modify a
> +single line of KVM kernel code. Guest async page faults, FOLL_NOWAIT
> +and all other GUP features works just fine in combination with
> +userfaults (userfaults trigger async page faults in the guest
> +scheduler so those guest processes that aren't waiting for userfaults
> +can keep running in the guest vcpus).
> +
> +The primary ioctl to resolve userfaults is UFFDIO_COPY. That
> +atomically copies a page into the userfault registered range and wakes
> +up the blocked userfaults (unless uffdio_copy.mode &
> +UFFDIO_COPY_MODE_DONTWAKE is set). Other ioctl works similarly to
> +UFFDIO_COPY.
>
>
>
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]
^ permalink raw reply
* Re: [PATCH 10/21] userfaultfd: add new syscall to provide memory externalization
From: Michael Kerrisk (man-pages) @ 2015-03-06 10:48 UTC (permalink / raw)
To: Andrea Arcangeli
Cc: qemu-devel, kvm, lkml, linux-mm@kvack.org, Linux API,
Android Kernel Team, Kirill A. Shutemov, Pavel Emelyanov,
Sanidhya Kashyap, zhang.zhanghailiang, Linus Torvalds,
Andres Lagar-Cavilla, Dave Hansen, Paolo Bonzini, Rik van Riel,
Mel Gorman, Andy Lutomirski, Andrew Morton, Sasha Levin,
Hugh Dickins, Peter Feiner, Dr. David Alan Gilbert,
Christopher Covington, Jo
In-Reply-To: <1425575884-2574-11-git-send-email-aarcange@redhat.com>
Hi Andrea,
On 5 March 2015 at 18:17, Andrea Arcangeli <aarcange@redhat.com> wrote:
> Once an userfaultfd has been created and certain region of the process
> virtual address space have been registered into it, the thread
> responsible for doing the memory externalization can manage the page
> faults in userland by talking to the kernel using the userfaultfd
> protocol.
Is there something like a man page for this new syscall?
Thanks,
Michael
> poll() can be used to know when there are new pending userfaults to be
> read (POLLIN).
>
> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
> ---
> fs/userfaultfd.c | 977 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 977 insertions(+)
> create mode 100644 fs/userfaultfd.c
>
> diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
> new file mode 100644
> index 0000000..6b31967
> --- /dev/null
> +++ b/fs/userfaultfd.c
> @@ -0,0 +1,977 @@
> +/*
> + * fs/userfaultfd.c
> + *
> + * Copyright (C) 2007 Davide Libenzi <davidel@xmailserver.org>
> + * Copyright (C) 2008-2009 Red Hat, Inc.
> + * Copyright (C) 2015 Red Hat, Inc.
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2. See
> + * the COPYING file in the top-level directory.
> + *
> + * Some part derived from fs/eventfd.c (anon inode setup) and
> + * mm/ksm.c (mm hashing).
> + */
> +
> +#include <linux/hashtable.h>
> +#include <linux/sched.h>
> +#include <linux/mm.h>
> +#include <linux/poll.h>
> +#include <linux/slab.h>
> +#include <linux/seq_file.h>
> +#include <linux/file.h>
> +#include <linux/bug.h>
> +#include <linux/anon_inodes.h>
> +#include <linux/syscalls.h>
> +#include <linux/userfaultfd_k.h>
> +#include <linux/mempolicy.h>
> +#include <linux/ioctl.h>
> +#include <linux/security.h>
> +
> +enum userfaultfd_state {
> + UFFD_STATE_WAIT_API,
> + UFFD_STATE_RUNNING,
> +};
> +
> +struct userfaultfd_ctx {
> + /* pseudo fd refcounting */
> + atomic_t refcount;
> + /* waitqueue head for the userfaultfd page faults */
> + wait_queue_head_t fault_wqh;
> + /* waitqueue head for the pseudo fd to wakeup poll/read */
> + wait_queue_head_t fd_wqh;
> + /* userfaultfd syscall flags */
> + unsigned int flags;
> + /* state machine */
> + enum userfaultfd_state state;
> + /* released */
> + bool released;
> + /* mm with one or more vmas attached to this userfaultfd_ctx */
> + struct mm_struct *mm;
> +};
> +
> +struct userfaultfd_wait_queue {
> + unsigned long address;
> + wait_queue_t wq;
> + bool pending;
> + struct userfaultfd_ctx *ctx;
> +};
> +
> +struct userfaultfd_wake_range {
> + unsigned long start;
> + unsigned long len;
> +};
> +
> +static int userfaultfd_wake_function(wait_queue_t *wq, unsigned mode,
> + int wake_flags, void *key)
> +{
> + struct userfaultfd_wake_range *range = key;
> + int ret;
> + struct userfaultfd_wait_queue *uwq;
> + unsigned long start, len;
> +
> + uwq = container_of(wq, struct userfaultfd_wait_queue, wq);
> + ret = 0;
> + /* don't wake the pending ones, so that reads do not block */
> + if (uwq->pending && !ACCESS_ONCE(uwq->ctx->released))
> + goto out;
> + /* len == 0 means wake all */
> + start = range->start;
> + len = range->len;
> + if (len && (start > uwq->address || start + len <= uwq->address))
> + goto out;
> + ret = wake_up_state(wq->private, mode);
> + if (ret)
> + /* wake only once, autoremove behavior */
> + list_del_init(&wq->task_list);
> +out:
> + return ret;
> +}
> +
> +/**
> + * userfaultfd_ctx_get - Acquires a reference to the internal userfaultfd
> + * context.
> + * @ctx: [in] Pointer to the userfaultfd context.
> + *
> + * The caller must already hold a reference; a zero refcount triggers BUG().
> + */
> +static void userfaultfd_ctx_get(struct userfaultfd_ctx *ctx)
> +{
> + if (!atomic_inc_not_zero(&ctx->refcount))
> + BUG();
> +}
> +
> +/**
> + * userfaultfd_ctx_put - Releases a reference to the internal userfaultfd
> + * context.
> + * @ctx: [in] Pointer to userfaultfd context.
> + *
> + * The userfaultfd context reference must have been previously acquired either
> + * with userfaultfd_ctx_get() or userfaultfd_ctx_fdget().
> + */
> +static void userfaultfd_ctx_put(struct userfaultfd_ctx *ctx)
> +{
> + if (atomic_dec_and_test(&ctx->refcount)) {
> + mmdrop(ctx->mm);
> + kfree(ctx);
> + }
> +}
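For readers less familiar with the inc-not-zero idiom used by the get/put
pair above, here is a userspace sketch with C11 atomics. The names are
illustrative, and unlike the quoted patch the get side reports failure to
the caller instead of calling BUG().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

struct ctx_example {
	atomic_int refcount;
};

/* Take a reference unless the count has already dropped to zero. */
static bool ctx_get(struct ctx_example *ctx)
{
	int old;

	assert(ctx != NULL);
	old = atomic_load(&ctx->refcount);
	while (old != 0) {
		/* on failure, old is reloaded with the current value */
		if (atomic_compare_exchange_weak(&ctx->refcount, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; free the object and return true on the last put. */
static bool ctx_put(struct ctx_example *ctx)
{
	assert(ctx != NULL);
	if (atomic_fetch_sub(&ctx->refcount, 1) == 1) {
		free(ctx);
		return true;
	}
	return false;
}
```

The point of refusing to go from 0 to 1 is that a zero count means the
object is already being torn down, so handing out a new reference would be
a use-after-free.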
> +
> +static inline unsigned long userfault_address(unsigned long address,
> + unsigned int flags,
> + unsigned long reason)
> +{
> + BUILD_BUG_ON(PAGE_SHIFT < UFFD_BITS);
> + address &= PAGE_MASK;
> + if (flags & FAULT_FLAG_WRITE)
> + /*
> + * Encode "write" fault information in the LSB of the
> + * address read by userland, without depending on
> + * FAULT_FLAG_WRITE kernel internal value.
> + */
> + address |= UFFD_BIT_WRITE;
> + if (reason & VM_UFFD_WP)
> + /*
> + * Encode "reason" fault information as bit number 1
> + * in the address read by userland. If bit number 1 is
> + * clear it means the reason is a VM_FAULT_MISSING
> + * fault.
> + */
> + address |= UFFD_BIT_WP;
> + return address;
> +}
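On the userland side, the value read from the fd would be unpacked in the
reverse order. A sketch under assumptions: the bit positions are inferred
from the comments in userfault_address() above ("LSB" for write, "bit number
1" for write-protect), and the EX_ names and struct are hypothetical since
the header defining UFFD_BIT_WRITE and UFFD_BIT_WP is not shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EX_UFFD_BIT_WRITE	(UINT64_C(1) << 0)	/* assumed: LSB */
#define EX_UFFD_BIT_WP		(UINT64_C(1) << 1)	/* assumed: bit 1 */

struct fault_info_example {
	uint64_t page;	/* page-aligned fault address */
	bool write;	/* fault was a write access */
	bool wp;	/* write-protect fault; otherwise a missing page */
};

/* Split a raw value read from the userfaultfd into address and flags. */
static struct fault_info_example decode_fault(uint64_t raw, uint64_t page_size)
{
	struct fault_info_example fi;

	/* page_size must be a non-zero power of two */
	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);

	fi.page = raw & ~(page_size - 1);
	fi.write = (raw & EX_UFFD_BIT_WRITE) != 0;
	fi.wp = (raw & EX_UFFD_BIT_WP) != 0;
	return fi;
}
```

The encoding works because the address is masked with PAGE_MASK first, so
the low bits are free to carry the flags.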
> +
> +/*
> + * The locking rules involved in returning VM_FAULT_RETRY depending on
> + * FAULT_FLAG_ALLOW_RETRY, FAULT_FLAG_RETRY_NOWAIT and
> + * FAULT_FLAG_KILLABLE are not straightforward. The "Caution"
> + * recommendation in __lock_page_or_retry is not an understatement.
> + *
> + * If FAULT_FLAG_ALLOW_RETRY is set, the mmap_sem must be released
> + * before returning VM_FAULT_RETRY only if FAULT_FLAG_RETRY_NOWAIT is
> + * not set.
> + *
> + * If FAULT_FLAG_ALLOW_RETRY is set but FAULT_FLAG_KILLABLE is not
> + * set, VM_FAULT_RETRY can still be returned if and only if there are
> + * fatal_signal_pending()s, and the mmap_sem must be released before
> + * returning it.
> + */
> +int handle_userfault(struct vm_area_struct *vma, unsigned long address,
> + unsigned int flags, unsigned long reason)
> +{
> + struct mm_struct *mm = vma->vm_mm;
> + struct userfaultfd_ctx *ctx;
> + struct userfaultfd_wait_queue uwq;
> +
> + BUG_ON(!rwsem_is_locked(&mm->mmap_sem));
> +
> + ctx = vma->vm_userfaultfd_ctx.ctx;
> + if (!ctx)
> + return VM_FAULT_SIGBUS;
> +
> + BUG_ON(ctx->mm != mm);
> +
> + VM_BUG_ON(reason & ~(VM_UFFD_MISSING|VM_UFFD_WP));
> + VM_BUG_ON(!(reason & VM_UFFD_MISSING) ^ !!(reason & VM_UFFD_WP));
> +
> + /*
> + * If it's already released don't get it. This avoids looping
> + * in __get_user_pages if userfaultfd_release waits on the
> + * caller of handle_userfault to release the mmap_sem.
> + */
> + if (unlikely(ACCESS_ONCE(ctx->released)))
> + return VM_FAULT_SIGBUS;
> +
> + /* check that we can return VM_FAULT_RETRY */
> + if (unlikely(!(flags & FAULT_FLAG_ALLOW_RETRY))) {
> + /*
> + * Validate the invariant that nowait must allow retry
> + * to be sure not to return SIGBUS erroneously on
> + * nowait invocations.
> + */
> + BUG_ON(flags & FAULT_FLAG_RETRY_NOWAIT);
> +#ifdef CONFIG_DEBUG_VM
> + if (printk_ratelimit()) {
> + printk(KERN_WARNING
> + "FAULT_FLAG_ALLOW_RETRY missing %x\n", flags);
> + dump_stack();
> + }
> +#endif
> + return VM_FAULT_SIGBUS;
> + }
> +
> + /*
> + * Handle nowait, not much to do other than tell it to retry
> + * and wait.
> + */
> + if (flags & FAULT_FLAG_RETRY_NOWAIT)
> + return VM_FAULT_RETRY;
> +
> + /* take the reference before dropping the mmap_sem */
> + userfaultfd_ctx_get(ctx);
> +
> + /* be gentle and immediately relinquish the mmap_sem */
> + up_read(&mm->mmap_sem);
> +
> + init_waitqueue_func_entry(&uwq.wq, userfaultfd_wake_function);
> + uwq.wq.private = current;
> + uwq.address = userfault_address(address, flags, reason);
> + uwq.pending = true;
> + uwq.ctx = ctx;
> +
> + spin_lock(&ctx->fault_wqh.lock);
> + /*
> + * After the __add_wait_queue the uwq is visible to userland
> + * through poll/read().
> + */
> + __add_wait_queue(&ctx->fault_wqh, &uwq.wq);
> + for (;;) {
> + set_current_state(TASK_KILLABLE);
> + if (!uwq.pending || ACCESS_ONCE(ctx->released) ||
> + fatal_signal_pending(current))
> + break;
> + spin_unlock(&ctx->fault_wqh.lock);
> +
> + wake_up_poll(&ctx->fd_wqh, POLLIN);
> + schedule();
> +
> + spin_lock(&ctx->fault_wqh.lock);
> + }
> + __remove_wait_queue(&ctx->fault_wqh, &uwq.wq);
> + __set_current_state(TASK_RUNNING);
> + spin_unlock(&ctx->fault_wqh.lock);
> +
> + /*
> + * ctx may go away after this if the userfault pseudo fd is
> + * already released.
> + */
> + userfaultfd_ctx_put(ctx);
> +
> + return VM_FAULT_RETRY;
> +}
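The early exits of handle_userfault() above boil down to a small decision
table. The following is a simplified model, not kernel code: the flag
values are placeholders (the real FAULT_FLAG_* constants are defined in
<linux/mm.h>), the names are hypothetical, and the wait loop is reduced to
a single outcome.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder values for illustration only. */
#define EX_FLAG_ALLOW_RETRY	0x1u
#define EX_FLAG_RETRY_NOWAIT	0x2u

enum ex_outcome {
	EX_SIGBUS,		/* return VM_FAULT_SIGBUS */
	EX_RETRY_NOW,		/* return VM_FAULT_RETRY without sleeping */
	EX_BLOCK_THEN_RETRY,	/* drop mmap_sem, sleep, then VM_FAULT_RETRY */
};

static enum ex_outcome classify_fault(bool has_ctx, bool released,
				      unsigned int flags)
{
	/* no context, or the fd was already closed */
	if (!has_ctx || released)
		return EX_SIGBUS;

	if (!(flags & EX_FLAG_ALLOW_RETRY)) {
		/* mirrors the BUG_ON(): nowait implies allow-retry */
		assert(!(flags & EX_FLAG_RETRY_NOWAIT));
		return EX_SIGBUS;
	}

	if (flags & EX_FLAG_RETRY_NOWAIT)
		return EX_RETRY_NOW;

	return EX_BLOCK_THEN_RETRY;
}
```

Only the last case takes a reference on the context and releases the
mmap_sem before sleeping.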
> +
> +static int userfaultfd_release(struct inode *inode, struct file *file)
> +{
> + struct userfaultfd_ctx *ctx = file->private_data;
> + struct mm_struct *mm = ctx->mm;
> + struct vm_area_struct *vma, *prev;
> + /* len == 0 means wake all */
> + struct userfaultfd_wake_range range = { .len = 0, };
> + unsigned long new_flags;
> +
> + ACCESS_ONCE(ctx->released) = true;
> +
> + /*
> + * Flush page faults out of all CPUs. NOTE: all page faults
> + * must be retried without returning VM_FAULT_SIGBUS if
> + * userfaultfd_ctx_get() succeeds but vma->vm_userfaultfd_ctx
> + * changes while handle_userfault released the mmap_sem. So
> + * it's critical that released is set to true (above), before
> + * taking the mmap_sem for writing.
> + */
> + down_write(&mm->mmap_sem);
> + prev = NULL;
> + for (vma = mm->mmap; vma; vma = vma->vm_next) {
> + cond_resched();
> + BUG_ON(!!vma->vm_userfaultfd_ctx.ctx ^
> + !!(vma->vm_flags & (VM_UFFD_MISSING | VM_UFFD_WP)));
> + if (vma->vm_userfaultfd_ctx.ctx != ctx) {
> + prev = vma;
> + continue;
> + }
> + new_flags = vma->vm_flags & ~(VM_UFFD_MISSING | VM_UFFD_WP);
> + prev = vma_merge(mm, prev, vma->vm_start, vma->vm_end,
> + new_flags, vma->anon_vma,
> + vma->vm_file, vma->vm_pgoff,
> + vma_policy(vma),
> + NULL_VM_UFFD_CTX);
> + if (prev)
> + vma = prev;
> + else
> + prev = vma;
> + vma->vm_flags = new_flags;
> + vma->vm_userfaultfd_ctx = NULL_VM_UFFD_CTX;
> + }
> + up_write(&mm->mmap_sem);
> +
> + /*
> + * After no new page faults can wait on this fault_wqh, flush
> + * the last page faults that may have been already waiting on
> + * the fault_wqh.
> + */
> + spin_lock(&ctx->fault_wqh.lock);
> + __wake_up_locked_key(&ctx->fault_wqh, TASK_NORMAL, 0, &range);
> + spin_unlock(&ctx->fault_wqh.lock);
> +
> + wake_up_poll(&ctx->fd_wqh, POLLHUP);
> + userfaultfd_ctx_put(ctx);
> + return 0;
> +}
> +
> +static inline unsigned int find_userfault(struct userfaultfd_ctx *ctx,
> + struct userfaultfd_wait_queue **uwq)
> +{
> + wait_queue_t *wq;
> + struct userfaultfd_wait_queue *_uwq;
> + unsigned int ret = 0;
> +
> + spin_lock(&ctx->fault_wqh.lock);
> + list_for_each_entry(wq, &ctx->fault_wqh.task_list, task_list) {
> + _uwq = container_of(wq, struct userfaultfd_wait_queue, wq);
> + if (_uwq->pending) {
> + ret = POLLIN;
> + if (uwq)
> + *uwq = _uwq;
> + break;
> + }
> + }
> + spin_unlock(&ctx->fault_wqh.lock);
> +
> + return ret;
> +}
> +
> +static unsigned int userfaultfd_poll(struct file *file, poll_table *wait)
> +{
> + struct userfaultfd_ctx *ctx = file->private_data;
> +
> + poll_wait(file, &ctx->fd_wqh, wait);
> +
> + switch (ctx->state) {
> + case UFFD_STATE_WAIT_API:
> + return POLLERR;
> + case UFFD_STATE_RUNNING:
> + return find_userfault(ctx, NULL);
> + default:
> + BUG();
> + }
> +}
> +
> +static ssize_t userfaultfd_ctx_read(struct userfaultfd_ctx *ctx, int no_wait,
> + __u64 *addr)
> +{
> + ssize_t ret;
> + DECLARE_WAITQUEUE(wait, current);
> + struct userfaultfd_wait_queue *uwq = NULL;
> +
> + /* always take the fd_wqh lock before the fault_wqh lock */
> + spin_lock(&ctx->fd_wqh.lock);
> + __add_wait_queue(&ctx->fd_wqh, &wait);
> + for (;;) {
> + set_current_state(TASK_INTERRUPTIBLE);
> + if (find_userfault(ctx, &uwq)) {
> + uwq->pending = false;
> + /* careful to always initialize addr if ret == 0 */
> + *addr = uwq->address;
> + ret = 0;
> + break;
> + }
> + if (signal_pending(current)) {
> + ret = -ERESTARTSYS;
> + break;
> + }
> + if (no_wait) {
> + ret = -EAGAIN;
> + break;
> + }
> + spin_unlock(&ctx->fd_wqh.lock);
> + schedule();
> + spin_lock_irq(&ctx->fd_wqh.lock);
> + }
> + __remove_wait_queue(&ctx->fd_wqh, &wait);
> + __set_current_state(TASK_RUNNING);
> + spin_unlock_irq(&ctx->fd_wqh.lock);
> +
> + return ret;
> +}
> +
> +static ssize_t userfaultfd_read(struct file *file, char __user *buf,
> + size_t count, loff_t *ppos)
> +{
> + struct userfaultfd_ctx *ctx = file->private_data;
> + ssize_t _ret, ret = 0;
> + /* careful to always initialize addr if ret == 0 */
> + __u64 uninitialized_var(addr);
> + int no_wait = file->f_flags & O_NONBLOCK;
> +
> + if (ctx->state == UFFD_STATE_WAIT_API)
> + return -EINVAL;
> + BUG_ON(ctx->state != UFFD_STATE_RUNNING);
> +
> + for (;;) {
> + if (count < sizeof(addr))
> + return ret ? ret : -EINVAL;
> + _ret = userfaultfd_ctx_read(ctx, no_wait, &addr);
> + if (_ret < 0)
> + return ret ? ret : _ret;
> + if (put_user(addr, (__u64 __user *) buf))
> + return ret ? ret : -EFAULT;
> + ret += sizeof(addr);
> + buf += sizeof(addr);
> + count -= sizeof(addr);
> + /*
> + * Allow to read more than one fault at time but only
> + * block if waiting for the very first one.
> + */
> + no_wait = O_NONBLOCK;
> + }
> +}
> +
> +static int __wake_userfault(struct userfaultfd_ctx *ctx,
> + struct userfaultfd_wake_range *range)
> +{
> + wait_queue_t *wq;
> + struct userfaultfd_wait_queue *uwq;
> + int ret;
> + unsigned long start, end;
> +
> + start = range->start;
> + end = range->start + range->len;
> +
> + ret = -ENOENT;
> + spin_lock(&ctx->fault_wqh.lock);
> + list_for_each_entry(wq, &ctx->fault_wqh.task_list, task_list) {
> + uwq = container_of(wq, struct userfaultfd_wait_queue, wq);
> + if (uwq->pending)
> + continue;
> + if (uwq->address >= start && uwq->address < end) {
> + ret = 0;
> + /* wake all in the range and autoremove */
> + __wake_up_locked_key(&ctx->fault_wqh, TASK_NORMAL, 0,
> + range);
> + break;
> + }
> + }
> + spin_unlock(&ctx->fault_wqh.lock);
> +
> + return ret;
> +}
> +
> +static __always_inline int wake_userfault(struct userfaultfd_ctx *ctx,
> + struct userfaultfd_wake_range *range)
> +{
> + if (!waitqueue_active(&ctx->fault_wqh))
> + return -ENOENT;
> +
> + return __wake_userfault(ctx, range);
> +}
> +
> +static __always_inline int validate_range(struct mm_struct *mm,
> + __u64 start, __u64 len)
> +{
> + __u64 task_size = mm->task_size;
> +
> + if (start & ~PAGE_MASK)
> + return -EINVAL;
> + if (len & ~PAGE_MASK)
> + return -EINVAL;
> + if (!len)
> + return -EINVAL;
> + if (start < mmap_min_addr)
> + return -EINVAL;
> + if (start >= task_size)
> + return -EINVAL;
> + if (len > task_size - start)
> + return -EINVAL;
> + return 0;
> +}
> +
> +static int userfaultfd_register(struct userfaultfd_ctx *ctx,
> + unsigned long arg)
> +{
> + struct mm_struct *mm = ctx->mm;
> + struct vm_area_struct *vma, *prev, *cur;
> + int ret;
> + struct uffdio_register uffdio_register;
> + struct uffdio_register __user *user_uffdio_register;
> + unsigned long vm_flags, new_flags;
> + bool found;
> + unsigned long start, end, vma_end;
> +
> + user_uffdio_register = (struct uffdio_register __user *) arg;
> +
> + ret = -EFAULT;
> + if (copy_from_user(&uffdio_register, user_uffdio_register,
> + sizeof(uffdio_register)-sizeof(__u64)))
> + goto out;
> +
> + ret = -EINVAL;
> + if (!uffdio_register.mode)
> + goto out;
> + if (uffdio_register.mode & ~(UFFDIO_REGISTER_MODE_MISSING|
> + UFFDIO_REGISTER_MODE_WP))
> + goto out;
> + vm_flags = 0;
> + if (uffdio_register.mode & UFFDIO_REGISTER_MODE_MISSING)
> + vm_flags |= VM_UFFD_MISSING;
> + if (uffdio_register.mode & UFFDIO_REGISTER_MODE_WP) {
> + vm_flags |= VM_UFFD_WP;
> + /*
> + * FIXME: remove the below error constraint by
> + * implementing the wprotect tracking mode.
> + */
> + ret = -EINVAL;
> + goto out;
> + }
> +
> + ret = validate_range(mm, uffdio_register.range.start,
> + uffdio_register.range.len);
> + if (ret)
> + goto out;
> +
> + start = uffdio_register.range.start;
> + end = start + uffdio_register.range.len;
> +
> + down_write(&mm->mmap_sem);
> + vma = find_vma_prev(mm, start, &prev);
> +
> + ret = -ENOMEM;
> + if (!vma)
> + goto out_unlock;
> +
> + /* check that there's at least one vma in the range */
> + ret = -EINVAL;
> + if (vma->vm_start >= end)
> + goto out_unlock;
> +
> + /*
> + * Search for not compatible vmas.
> + *
> + * FIXME: this shall be relaxed later so that it doesn't fail
> + * on tmpfs backed vmas (in addition to the current allowance
> + * on anonymous vmas).
> + */
> + found = false;
> + for (cur = vma; cur && cur->vm_start < end; cur = cur->vm_next) {
> + cond_resched();
> +
> + BUG_ON(!!cur->vm_userfaultfd_ctx.ctx ^
> + !!(cur->vm_flags & (VM_UFFD_MISSING | VM_UFFD_WP)));
> +
> + /* check not compatible vmas */
> + ret = -EINVAL;
> + if (cur->vm_ops)
> + goto out_unlock;
> +
> + /*
> + * Check that this vma isn't already owned by a
> + * different userfaultfd. We can't allow more than one
> + * userfaultfd to own a single vma simultaneously or we
> + * wouldn't know which one to deliver the userfaults to.
> + */
> + ret = -EBUSY;
> + if (cur->vm_userfaultfd_ctx.ctx &&
> + cur->vm_userfaultfd_ctx.ctx != ctx)
> + goto out_unlock;
> +
> + found = true;
> + }
> + BUG_ON(!found);
> +
> + /*
> + * Now that we scanned all vmas we can already tell userland which
> + * ioctls methods are guaranteed to succeed on this range.
> + */
> + ret = -EFAULT;
> + if (put_user(UFFD_API_RANGE_IOCTLS, &user_uffdio_register->ioctls))
> + goto out_unlock;
> +
> + if (vma->vm_start < start)
> + prev = vma;
> +
> + ret = 0;
> + do {
> + cond_resched();
> +
> + BUG_ON(vma->vm_ops);
> + BUG_ON(vma->vm_userfaultfd_ctx.ctx &&
> + vma->vm_userfaultfd_ctx.ctx != ctx);
> +
> + /*
> + * Nothing to do: this vma is already registered into this
> + * userfaultfd and with the right tracking mode too.
> + */
> + if (vma->vm_userfaultfd_ctx.ctx == ctx &&
> + (vma->vm_flags & vm_flags) == vm_flags)
> + goto skip;
> +
> + if (vma->vm_start > start)
> + start = vma->vm_start;
> + vma_end = min(end, vma->vm_end);
> +
> + new_flags = (vma->vm_flags & ~vm_flags) | vm_flags;
> + prev = vma_merge(mm, prev, start, vma_end, new_flags,
> + vma->anon_vma, vma->vm_file, vma->vm_pgoff,
> + vma_policy(vma),
> + ((struct vm_userfaultfd_ctx){ ctx }));
> + if (prev) {
> + vma = prev;
> + goto next;
> + }
> + if (vma->vm_start < start) {
> + ret = split_vma(mm, vma, start, 1);
> + if (ret)
> + break;
> + }
> + if (vma->vm_end > end) {
> + ret = split_vma(mm, vma, end, 0);
> + if (ret)
> + break;
> + }
> + next:
> + /*
> + * In the vma_merge() successful mprotect-like case 8:
> + * the next vma was merged into the current one and
> + * the current one has not been updated yet.
> + */
> + vma->vm_flags = new_flags;
> + vma->vm_userfaultfd_ctx.ctx = ctx;
> +
> + skip:
> + prev = vma;
> + start = vma->vm_end;
> + vma = vma->vm_next;
> + } while (vma && vma->vm_start < end);
> +out_unlock:
> + up_write(&mm->mmap_sem);
> +out:
> + return ret;
> +}
> +
> +static int userfaultfd_unregister(struct userfaultfd_ctx *ctx,
> + unsigned long arg)
> +{
> + struct mm_struct *mm = ctx->mm;
> + struct vm_area_struct *vma, *prev, *cur;
> + int ret;
> + struct uffdio_range uffdio_unregister;
> + unsigned long new_flags;
> + bool found;
> + unsigned long start, end, vma_end;
> + const void __user *buf = (void __user *)arg;
> +
> + ret = -EFAULT;
> + if (copy_from_user(&uffdio_unregister, buf, sizeof(uffdio_unregister)))
> + goto out;
> +
> + ret = validate_range(mm, uffdio_unregister.start,
> + uffdio_unregister.len);
> + if (ret)
> + goto out;
> +
> + start = uffdio_unregister.start;
> + end = start + uffdio_unregister.len;
> +
> + down_write(&mm->mmap_sem);
> + vma = find_vma_prev(mm, start, &prev);
> +
> + ret = -ENOMEM;
> + if (!vma)
> + goto out_unlock;
> +
> + /* check that there's at least one vma in the range */
> + ret = -EINVAL;
> + if (vma->vm_start >= end)
> + goto out_unlock;
> +
> + /*
> + * Search for not compatible vmas.
> + *
> + * FIXME: this shall be relaxed later so that it doesn't fail
> + * on tmpfs backed vmas (in addition to the current allowance
> + * on anonymous vmas).
> + */
> + found = false;
> + ret = -EINVAL;
> + for (cur = vma; cur && cur->vm_start < end; cur = cur->vm_next) {
> + cond_resched();
> +
> + BUG_ON(!!cur->vm_userfaultfd_ctx.ctx ^
> + !!(cur->vm_flags & (VM_UFFD_MISSING | VM_UFFD_WP)));
> +
> + /*
> + * Check not compatible vmas, not strictly required
> + * here as not compatible vmas cannot have an
> + * userfaultfd_ctx registered on them, but this
> + * provides for more strict behavior to notice
> + * unregistration errors.
> + */
> + if (cur->vm_ops)
> + goto out_unlock;
> +
> + found = true;
> + }
> + BUG_ON(!found);
> +
> + if (vma->vm_start < start)
> + prev = vma;
> +
> + ret = 0;
> + do {
> + cond_resched();
> +
> + BUG_ON(vma->vm_ops);
> +
> + /*
> + * Nothing to do: this vma is not registered into any
> + * userfaultfd, so there is nothing to unregister.
> + */
> + if (!vma->vm_userfaultfd_ctx.ctx)
> + goto skip;
> +
> + if (vma->vm_start > start)
> + start = vma->vm_start;
> + vma_end = min(end, vma->vm_end);
> +
> + new_flags = vma->vm_flags & ~(VM_UFFD_MISSING | VM_UFFD_WP);
> + prev = vma_merge(mm, prev, start, vma_end, new_flags,
> + vma->anon_vma, vma->vm_file, vma->vm_pgoff,
> + vma_policy(vma),
> + NULL_VM_UFFD_CTX);
> + if (prev) {
> + vma = prev;
> + goto next;
> + }
> + if (vma->vm_start < start) {
> + ret = split_vma(mm, vma, start, 1);
> + if (ret)
> + break;
> + }
> + if (vma->vm_end > end) {
> + ret = split_vma(mm, vma, end, 0);
> + if (ret)
> + break;
> + }
> + next:
> + /*
> + * In the vma_merge() successful mprotect-like case 8:
> + * the next vma was merged into the current one and
> + * the current one has not been updated yet.
> + */
> + vma->vm_flags = new_flags;
> + vma->vm_userfaultfd_ctx = NULL_VM_UFFD_CTX;
> +
> + skip:
> + prev = vma;
> + start = vma->vm_end;
> + vma = vma->vm_next;
> + } while (vma && vma->vm_start < end);
> +out_unlock:
> + up_write(&mm->mmap_sem);
> +out:
> + return ret;
> +}
> +
> +/*
> + * This is mostly needed to re-wakeup those userfaults that were still
> + * pending when userland wake them up the first time. We don't wake
> + * the pending one to avoid blocking reads to block, or non blocking
> + * read to return -EAGAIN, if used with POLLIN, to avoid userland
> + * doubts on why POLLIN wasn't reliable.
> + */
> +static int userfaultfd_wake(struct userfaultfd_ctx *ctx,
> + unsigned long arg)
> +{
> + int ret;
> + struct uffdio_range uffdio_wake;
> + struct userfaultfd_wake_range range;
> + const void __user *buf = (void __user *)arg;
> +
> + ret = -EFAULT;
> + if (copy_from_user(&uffdio_wake, buf, sizeof(uffdio_wake)))
> + goto out;
> +
> + ret = validate_range(ctx->mm, uffdio_wake.start, uffdio_wake.len);
> + if (ret)
> + goto out;
> +
> + range.start = uffdio_wake.start;
> + range.len = uffdio_wake.len;
> +
> + /*
> + * len == 0 means wake all and we don't want to wake all here,
> + * so check it again to be sure.
> + */
> + VM_BUG_ON(!range.len);
> +
> + ret = wake_userfault(ctx, &range);
> +
> +out:
> + return ret;
> +}
> +
> +/*
> + * userland asks for a certain API version and we return which bits
> + * and ioctl commands are implemented in this kernel for such API
> + * version or -EINVAL if unknown.
> + */
> +static int userfaultfd_api(struct userfaultfd_ctx *ctx,
> + unsigned long arg)
> +{
> + struct uffdio_api uffdio_api;
> + void __user *buf = (void __user *)arg;
> + int ret;
> +
> + ret = -EINVAL;
> + if (ctx->state != UFFD_STATE_WAIT_API)
> + goto out;
> + ret = -EFAULT;
> + if (copy_from_user(&uffdio_api, buf, sizeof(__u64)))
> + goto out;
> + if (uffdio_api.api != UFFD_API) {
> + /* careful not to leak info, we only read the first 8 bytes */
> + memset(&uffdio_api, 0, sizeof(uffdio_api));
> + if (copy_to_user(buf, &uffdio_api, sizeof(uffdio_api)))
> + goto out;
> + ret = -EINVAL;
> + goto out;
> + }
> + /* careful not to leak info, we only read the first 8 bytes */
> + uffdio_api.bits = UFFD_API_BITS;
> + uffdio_api.ioctls = UFFD_API_IOCTLS;
> + ret = -EFAULT;
> + if (copy_to_user(buf, &uffdio_api, sizeof(uffdio_api)))
> + goto out;
> + ctx->state = UFFD_STATE_RUNNING;
> + ret = 0;
> +out:
> + return ret;
> +}
> +
> +static long userfaultfd_ioctl(struct file *file, unsigned cmd,
> + unsigned long arg)
> +{
> + int ret = -EINVAL;
> + struct userfaultfd_ctx *ctx = file->private_data;
> +
> + switch(cmd) {
> + case UFFDIO_API:
> + ret = userfaultfd_api(ctx, arg);
> + break;
> + case UFFDIO_REGISTER:
> + ret = userfaultfd_register(ctx, arg);
> + break;
> + case UFFDIO_UNREGISTER:
> + ret = userfaultfd_unregister(ctx, arg);
> + break;
> + case UFFDIO_WAKE:
> + ret = userfaultfd_wake(ctx, arg);
> + break;
> + }
> + return ret;
> +}
> +
> +#ifdef CONFIG_PROC_FS
> +static void userfaultfd_show_fdinfo(struct seq_file *m, struct file *f)
> +{
> + struct userfaultfd_ctx *ctx = f->private_data;
> + wait_queue_t *wq;
> + struct userfaultfd_wait_queue *uwq;
> + unsigned long pending = 0, total = 0;
> +
> + spin_lock(&ctx->fault_wqh.lock);
> + list_for_each_entry(wq, &ctx->fault_wqh.task_list, task_list) {
> + uwq = container_of(wq, struct userfaultfd_wait_queue, wq);
> + if (uwq->pending)
> + pending++;
> + total++;
> + }
> + spin_unlock(&ctx->fault_wqh.lock);
> +
> + /*
> + * If more protocols will be added, there will be all shown
> + * separated by a space. Like this:
> + * protocols: 0xaa 0xbb
> + */
> + seq_printf(m, "pending:\t%lu\ntotal:\t%lu\nAPI:\t%Lx:%x:%Lx\n",
> + pending, total, UFFD_API, UFFD_API_BITS,
> + UFFD_API_IOCTLS|UFFD_API_RANGE_IOCTLS);
> +}
> +#endif
> +
> +static const struct file_operations userfaultfd_fops = {
> +#ifdef CONFIG_PROC_FS
> + .show_fdinfo = userfaultfd_show_fdinfo,
> +#endif
> + .release = userfaultfd_release,
> + .poll = userfaultfd_poll,
> + .read = userfaultfd_read,
> + .unlocked_ioctl = userfaultfd_ioctl,
> + .compat_ioctl = userfaultfd_ioctl,
> + .llseek = noop_llseek,
> +};
> +
> +/**
> + * userfaultfd_file_create - Creates an userfaultfd file pointer.
> + * @flags: Flags for the userfaultfd file.
> + *
> + * This function creates an userfaultfd file pointer, w/out installing
> + * it into the fd table. This is useful when the userfaultfd file is
> + * used during the initialization of data structures that require
> + * extra setup after the userfaultfd creation. So the userfaultfd
> + * creation is split into the file pointer creation phase, and the
> + * file descriptor installation phase. In this way races with
> + * userspace closing the newly installed file descriptor can be
> + * avoided. Returns an userfaultfd file pointer, or a proper error
> + * pointer.
> + */
> +static struct file *userfaultfd_file_create(int flags)
> +{
> + struct file *file;
> + struct userfaultfd_ctx *ctx;
> +
> + BUG_ON(!current->mm);
> +
> + /* Check the UFFD_* constants for consistency. */
> + BUILD_BUG_ON(UFFD_CLOEXEC != O_CLOEXEC);
> + BUILD_BUG_ON(UFFD_NONBLOCK != O_NONBLOCK);
> +
> + file = ERR_PTR(-EINVAL);
> + if (flags & ~UFFD_SHARED_FCNTL_FLAGS)
> + goto out;
> +
> + file = ERR_PTR(-ENOMEM);
> + ctx = kmalloc(sizeof(*ctx), GFP_KERNEL);
> + if (!ctx)
> + goto out;
> +
> + atomic_set(&ctx->refcount, 1);
> + init_waitqueue_head(&ctx->fault_wqh);
> + init_waitqueue_head(&ctx->fd_wqh);
> + ctx->flags = flags;
> + ctx->state = UFFD_STATE_WAIT_API;
> + ctx->released = false;
> + ctx->mm = current->mm;
> + /* prevent the mm struct to be freed */
> + atomic_inc(&ctx->mm->mm_count);
> +
> + file = anon_inode_getfile("[userfaultfd]", &userfaultfd_fops, ctx,
> + O_RDWR | (flags & UFFD_SHARED_FCNTL_FLAGS));
> + if (IS_ERR(file))
> + kfree(ctx);
> +out:
> + return file;
> +}
> +
> +SYSCALL_DEFINE1(userfaultfd, int, flags)
> +{
> + int fd, error;
> + struct file *file;
> +
> + error = get_unused_fd_flags(flags & UFFD_SHARED_FCNTL_FLAGS);
> + if (error < 0)
> + return error;
> + fd = error;
> +
> + file = userfaultfd_file_create(flags);
> + if (IS_ERR(file)) {
> + error = PTR_ERR(file);
> + goto err_put_unused_fd;
> + }
> + fd_install(fd, file);
> +
> + return fd;
> +
> +err_put_unused_fd:
> + put_unused_fd(fd);
> +
> + return error;
> +}
> --
> To unsubscribe from this list: send the line "unsubscribe linux-api" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
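For readers following the range checks in validate_range() quoted above, here is a minimal userspace sketch of the same sequence of tests. The page size, minimum address and task size are assumed constants chosen for illustration (the kernel uses PAGE_MASK, mmap_min_addr and mm->task_size), and the sketch_ names are hypothetical, not part of the patch.

```c
#include <stdint.h>

/* Assumed values for illustration only; the kernel takes these from
 * PAGE_MASK, mmap_min_addr and mm->task_size. */
#define SKETCH_PAGE_SIZE 4096ULL
#define SKETCH_PAGE_MASK (~(SKETCH_PAGE_SIZE - 1))
#define SKETCH_MIN_ADDR  65536ULL
#define SKETCH_TASK_SIZE (1ULL << 47)

/* Returns 0 if [start, start + len) passes the checks, -1 otherwise. */
static int sketch_validate_range(uint64_t start, uint64_t len)
{
	if (start & ~SKETCH_PAGE_MASK)		/* start must be page aligned */
		return -1;
	if (len & ~SKETCH_PAGE_MASK)		/* len must be a page multiple */
		return -1;
	if (!len)				/* empty ranges are rejected */
		return -1;
	if (start < SKETCH_MIN_ADDR)		/* below the minimum mappable address */
		return -1;
	if (start >= SKETCH_TASK_SIZE)		/* starts beyond the address space */
		return -1;
	if (len > SKETCH_TASK_SIZE - start)	/* written this way to avoid overflow */
		return -1;
	return 0;
}
```

The last test is written as "len > task_size - start" rather than "start + len > task_size" so that a huge len cannot wrap around; the subtraction is safe because the previous test already guarantees start < task_size.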
--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply
* Re: [PATCH v2 11/18] pinctrl: Add pinctrl driver for STM32 MCUs
From: Maxime Coquelin @ 2015-03-06 9:57 UTC (permalink / raw)
To: Linus Walleij
Cc: Uwe Kleine-König, Andreas Färber, Geert Uytterhoeven,
Rob Herring, Philipp Zabel, Jonathan Corbet, Pawel Moll,
Mark Rutland, Ian Campbell, Kumar Gala, Russell King,
Daniel Lezcano, Thomas Gleixner, Greg Kroah-Hartman, Jiri Slaby,
Arnd Bergmann, Andrew Morton, David S. Miller,
Mauro Carvalho Chehab, Joe Perches, Antti Palosaari, Tejun Heo
In-Reply-To: <CACRpkdbuJ5B_GwvRXax2Y4V37ihh5e6H7=2no0fYTMZPXwDdCw@mail.gmail.com>
2015-03-06 10:24 GMT+01:00 Linus Walleij <linus.walleij@linaro.org>:
> On Fri, Feb 20, 2015 at 7:01 PM, Maxime Coquelin
> <mcoquelin.stm32@gmail.com> wrote:
>
>> This driver adds pinctrl and GPIO support to STMicroelectronics'
>> STM32 family of MCUs.
>>
>> Pin muxing and GPIO handling have been tested on STM32F429
>> based Discovery board.
>>
>> Signed-off-by: Maxime Coquelin <mcoquelin.stm32@gmail.com>
>
> (...)
>> +config PINCTRL_STM32
>> + bool "STMicroelectronics STM32 pinctrl driver"
>> + depends on OF
>> + depends on ARCH_STM32 || COMPILE_TEST
>> + select PINMUX
>> + select PINCONF
>> + select GPIOLIB_IRQCHIP
>> + help
>> + This selects the device tree based generic pinctrl driver for STM32.
>
> Good start! Especially that you use GPIOLIB_IRQCHIP.
>
> But this (as discussed earlier) should select GENERIC_PINCONF
>
> Stopping review here so you can reengineer it a bit using GENERIC_PINCONF
> for next submission.
>
> Also think about pinmux in single registers, whether you want to do this
> with a single value for a register or using strings to identify groups
> and functions.
Thanks for the review.
I will digest all this, and come back with another solution :)
Best regards,
Maxime
>
> Yours,
> Linus Walleij
^ permalink raw reply
* Re: [PATCH 14/14] MAINTAINERS: Add entry for STM32 MCUs
From: Maxime Coquelin @ 2015-03-06 9:55 UTC (permalink / raw)
To: Linus Walleij
Cc: Jonathan Corbet, Rob Herring, Pawel Moll, Mark Rutland,
Ian Campbell, Kumar Gala, Philipp Zabel, Russell King,
Daniel Lezcano, Thomas Gleixner, Greg Kroah-Hartman, Jiri Slaby,
Arnd Bergmann, Andrew Morton, David S. Miller,
Mauro Carvalho Chehab, Joe Perches, Antti Palosaari, Tejun Heo,
Will Deacon, Nikolay Borisov, Rusty Russell, Kees Cook, Michal
In-Reply-To: <CACRpkdaM0BwMcq77MO=0-dVaqwobD5Pt+11j7ZZ4c3yKM=AkeA@mail.gmail.com>
2015-03-06 10:03 GMT+01:00 Linus Walleij <linus.walleij@linaro.org>:
> On Thu, Feb 12, 2015 at 6:46 PM, Maxime Coquelin
> <mcoquelin.stm32@gmail.com> wrote:
>
>> Add a MAINTAINERS entry covering all STM32 machine and driver files.
>>
>> Signed-off-by: Maxime Coquelin <mcoquelin.stm32@gmail.com>
>
> (...)
>> +F: drivers/clocksource/arm_system_timer.c
>
> Is that all? And that is not even a STM32 specific driver.
For the ARM System Timer, I'm fine with adding a new entry.
Or I can remove the line and leave maintainership to the clocksource maintainers.
All the STM32 files are covered by this line:
+N: stm32
Thanks,
Maxime
>
> Yours,
> Linus Walleij
^ permalink raw reply
* Right interface for cellphone modem audio (was Re: [PATCHv2 0/2] N900 Modem Speech Support)
From: Pavel Machek @ 2015-03-06 9:43 UTC (permalink / raw)
To: Kai Vehmanen, perex, tiwai, alsa-devel
Cc: Sebastian Reichel, Peter Ujfalusi, Kai Vehmanen, Pali Rohar,
Aaro Koskinen, Ivaylo Dimitrov, linux-omap, linux-kernel,
linux-api
In-Reply-To: <alpine.DEB.2.00.1503051844420.2610@ecabase.localdomain>
Hi!
> >>Userland access goes via /dev/cmt_speech. The API is implemented in
> >>libcmtspeechdata, which is used by ofono and the freesmartphone.org project.
> >Yes, the ABI is "tested" for some years, but it is not documented, and
> >it is very wrong ABI.
> >
> >I'm not sure what they do with the "read()". I was assuming it is
> >meant for passing voice data, but it can return at most 4 bytes,
> >AFAICT.
> >
> >We already have perfectly good ABI for passing voice data around. It
> >is called "ALSA". libcmtspeech will then become unnecessary, and the
> >daemon routing voice data will be as simple as "read sample from
>
> I'm no longer involved with cmt_speech (with this driver nor modems in
> general), but let me clarify some bits about the design.
Thanks a lot for your insights; high level design decisions are quite
hard to understand from C code.
> First, the team that designed the driver and the stack above had a lot of
> folks working also with ALSA (and the ALSA drivers have been merged to
> mainline long ago) and we considered ALSA on multiple occasions as the
> interface for this as well.
>
> Our take was that ALSA is not the right interface for cmt_speech. The
> cmt_speech interface in the modem is _not_ a PCM interface as modelled by
> ALSA. Specifically:
>
> - the interface is lossy in both directions
> - data is sent in packets, not a stream of samples (could be other things
> than PCM samples), with timing and meta-data
> - timing of uplink is of utmost importance
I see that you may not have data available in "downlink" scenario, but
how is it lossy in "uplink" scenario? Phone should always try to fill
the uplink, no? (Or do you detect silence and not transmit in this
case?) (Actually, I guess applications should be ready for "data not
ready" case even on "normal" hardware due to differing clocks.)
Packets vs. stream of samples... does userland need to know about the
packets? Could we simply hide it from the userland? As userland daemon
is (supposed to be) realtime, do we really need extra set of
timestamps? What other metadata are there?
Uplink timing... As the daemon is realtime, can it just send the data
at the right time? Also normally uplink would be filled, no?
> Some definite similarities:
> - the mmap interface to manage the PCM buffers (that is on purpose
> similar to that of ALSA)
>
> The interface was designed so that the audio mixer (e.g. Pulseaudio) is run
> with a soft real-time SCHED_FIFO/RR user-space thread that has full control
> over _when_ voice _packets_ are sent, and can receive packets with meta-data
> (see libcmtspeechdata interface, cmtspeech.h), and can detect and handle
> gaps in the received packets.
Well, packets are of fixed size, right? So the userland can simply
supply the right size in the common case. As for sending at the right
time... well... if the userspace is already real-time, that should be
easy.
Now, there's a difference in the downlink. Maybe ALSA people have an
idea what to do in this case? Perhaps we can just provide artificial
"zero" data?
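To make the "artificial zero data" idea concrete, here is a rough sketch of what padding a missing downlink frame could look like. It is not taken from the cmt_speech driver or libcmtspeechdata: the function name, the NULL-means-no-packet convention and the frame size of 160 samples are all assumptions for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed fixed frame size; the real packet size is not stated in this thread. */
#define SKETCH_FRAME_SAMPLES 160

/*
 * Copies one downlink frame into out. If no packet arrived (pkt == NULL),
 * out is filled with silence instead. Returns 1 when real data was
 * delivered and 0 when the frame was padded with zeros.
 */
static int sketch_fill_frame(int16_t *out, const int16_t *pkt)
{
	if (pkt) {
		memcpy(out, pkt, SKETCH_FRAME_SAMPLES * sizeof(*out));
		return 1;
	}
	memset(out, 0, SKETCH_FRAME_SAMPLES * sizeof(*out));
	return 0;
}
```

The return value keeps the "was this real data?" information available to the caller, which is roughly the metadata a plain sample stream would otherwise lose.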
> This is very different from modems that offer an actual PCM voice link for
> example over I2S to the application processor (there are lots of these on
> the market). When you walk out of coverage during a call with these modems,
> you'll still get samples over I2S, but not so with cmt_speech, so ALSA is
> not the right interface.
Yes, understood.
> Now, I'm not saying the interface is perfect, but just to give a bit of
> background, why a custom char-device interface was chosen.
Thanks and best regards,
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
^ permalink raw reply
* Re: [PATCH v2 10/18] dt-bindings: Document the STM32 pin controller
From: Linus Walleij @ 2015-03-06 9:35 UTC (permalink / raw)
To: Maxime Coquelin
Cc: Uwe Kleine-König, Andreas Färber, Geert Uytterhoeven,
Rob Herring, Philipp Zabel, Jonathan Corbet, Pawel Moll,
Mark Rutland, Ian Campbell, Kumar Gala, Russell King,
Daniel Lezcano, Thomas Gleixner, Greg Kroah-Hartman, Jiri Slaby,
Arnd Bergmann, Andrew Morton, David S. Miller,
Mauro Carvalho Chehab, Joe Perches, Antti Palosaari, Tejun Heo
In-Reply-To: <1424455277-29983-11-git-send-email-mcoquelin.stm32@gmail.com>
I saw this other thing:
On Fri, Feb 20, 2015 at 7:01 PM, Maxime Coquelin
<mcoquelin.stm32@gmail.com> wrote:
> This adds documentation of device tree bindings for the
> STM32 pin controller.
>
> Signed-off-by: Maxime Coquelin <mcoquelin.stm32@gmail.com>
(...)
> +- altmode : Should be mode or alternate function number associated with this pin, as
> +described in the datasheet (IN, OUT, ALT0...ALT15, ANALOG)
We can now describe muxing (altmodes etc) in two ways as described
in the generic bindings in
Documentation/devicetree/bindings/pinctrl/pinctrl-bindings.txt
This is done by strings combining a function with N groups.
We are also discussing having a single config number setting up
all and keeping down the size of the DTB (which
is close to what you're doing here). Please take part in that
discussion to standardize such bindings. Sascha Hauer and
others are involved, don't know the exact topic right now but
it involved using a single "pinmux" parameter in the device tree.
All agree on using the standardized pin config bindings henceforth
so start by migrating to these.
Yours,
Linus Walleij
^ permalink raw reply