Linux userland API discussions
 help / color / mirror / Atom feed
* Re: [PATCH v2 11/18] pinctrl: Add pinctrl driver for STM32 MCUs
From: Maxime Coquelin @ 2015-03-06  9:57 UTC (permalink / raw)
  To: Linus Walleij
  Cc: Uwe Kleine-König, Andreas Färber, Geert Uytterhoeven,
	Rob Herring, Philipp Zabel, Jonathan Corbet, Pawel Moll,
	Mark Rutland, Ian Campbell, Kumar Gala, Russell King,
	Daniel Lezcano, Thomas Gleixner, Greg Kroah-Hartman, Jiri Slaby,
	Arnd Bergmann, Andrew Morton, David S. Miller,
	Mauro Carvalho Chehab, Joe Perches, Antti Palosaari, Tejun Heo
In-Reply-To: <CACRpkdbuJ5B_GwvRXax2Y4V37ihh5e6H7=2no0fYTMZPXwDdCw@mail.gmail.com>

2015-03-06 10:24 GMT+01:00 Linus Walleij <linus.walleij@linaro.org>:
> On Fri, Feb 20, 2015 at 7:01 PM, Maxime Coquelin
> <mcoquelin.stm32@gmail.com> wrote:
>
>> This driver adds pinctrl and GPIO support to STMicrolectronic's
>> STM32 family of MCUs.
>>
>> Pin muxing and GPIO handling have been tested on STM32F429
>> based Discovery board.
>>
>> Signed-off-by: Maxime Coquelin <mcoquelin.stm32@gmail.com>
>
> (...)
>> +config PINCTRL_STM32
>> +       bool "STMicroelectronics STM32 pinctrl driver"
>> +       depends on OF
>> +       depends on ARCH_STM32 || COMPILE_TEST
>> +       select PINMUX
>> +       select PINCONF
>> +       select GPIOLIB_IRQCHIP
>> +       help
>> +         This selects the device tree based generic pinctrl driver for STM32.
>
> Good start! Especially that you use GPIOLIB_IRQCHIP.
>
> But this (as discussed earlier) should select GENERIC_PINCONF
>
> Stopping review here so you can reengineer it a bit using GENERIC_PINCONF
> for next submission.
>
> Also think about pinmux in single registers, whether you want to do this
> with a single value for a register or using strings to identify groups
> and functions.

Thanks for the review.
I will digest all this, and come back with another solution :)

Best regards,
Maxime

>
> Yours,
> Linus Walleij

^ permalink raw reply

* Re: [PATCH 14/14] MAINTAINERS: Add entry for STM32 MCUs
From: Maxime Coquelin @ 2015-03-06  9:55 UTC (permalink / raw)
  To: Linus Walleij
  Cc: Jonathan Corbet, Rob Herring, Pawel Moll, Mark Rutland,
	Ian Campbell, Kumar Gala, Philipp Zabel, Russell King,
	Daniel Lezcano, Thomas Gleixner, Greg Kroah-Hartman, Jiri Slaby,
	Arnd Bergmann, Andrew Morton, David S. Miller,
	Mauro Carvalho Chehab, Joe Perches, Antti Palosaari, Tejun Heo,
	Will Deacon, Nikolay Borisov, Rusty Russell, Kees Cook, Michal
In-Reply-To: <CACRpkdaM0BwMcq77MO=0-dVaqwobD5Pt+11j7ZZ4c3yKM=AkeA@mail.gmail.com>

2015-03-06 10:03 GMT+01:00 Linus Walleij <linus.walleij@linaro.org>:
> On Thu, Feb 12, 2015 at 6:46 PM, Maxime Coquelin
> <mcoquelin.stm32@gmail.com> wrote:
>
>> Add a MAINTAINER entry covering all STM32 machine and drivers files.
>>
>> Signed-off-by: Maxime Coquelin <mcoquelin.stm32@gmail.com>
>
> (...)
>> +F:     drivers/clocksource/arm_system_timer.c
>
> Is that all? And that is not even a STM32 specific driver.

For the ARM System Timer, I'm fine to add a new entry.
Or remove the line, and let the maintain-ship to clocksource maintainers.

All the STM32 files are covered by this line:
+N:     stm32

Thanks,
Maxime
>
> Yours,
> Linus Walleij

^ permalink raw reply

* Right interface for cellphone modem audio (was Re: [PATCHv2 0/2] N900 Modem Speech Support)
From: Pavel Machek @ 2015-03-06  9:43 UTC (permalink / raw)
  To: Kai Vehmanen, perex, tiwai, alsa-devel
  Cc: Sebastian Reichel, Peter Ujfalusi, Kai Vehmanen, Pali Rohar,
	Aaro Koskinen, Ivaylo Dimitrov, linux-omap, linux-kernel,
	linux-api
In-Reply-To: <alpine.DEB.2.00.1503051844420.2610@ecabase.localdomain>

Hi!

> >>Userland access goes via /dev/cmt_speech. The API is implemented in
> >>libcmtspeechdata, which is used by ofono and the freesmartphone.org project.
> >Yes, the ABI is "tested" for some years, but it is not documented, and
> >it is very wrong ABI.
> >
> >I'm not sure what they do with the "read()". I was assuming it is
> >meant for passing voice data, but it can return at most 4 bytes,
> >AFAICT.
> >
> >We already have perfectly good ABI for passing voice data around. It
> >is called "ALSA". libcmtspeech will then become unneccessary, and the
> >daemon routing voice data will be as simple as "read sample from
> 
> I'm no longer involved with cmt_speech (with this driver nor modems in
> general), but let me clarify some bits about the design.

Thanks a lot for your insights; high level design decisions are quite
hard to understand from C code.

> First, the team that designed the driver and the stack above had a lot of
> folks working also with ALSA (and the ALSA drivers have been merged to
> mainline long ago) and we considered ALSA on multiple occasions as the
> interface for this as well.
> 
> Our take was that ALSA is not the right interface for cmt_speech. The
> cmt_speech interface in the modem is _not_ a PCM interface as modelled by
> ALSA. Specifically:
> 
> - the interface is lossy in both directions
> - data is sent in packets, not a stream of samples (could be other things
>   than PCM samples), with timing and meta-data
> - timing of uplink is of utmost importance

I see that you may not have data available in "downlink" scenario, but
how is it lossy in "uplink" scenario? Phone should always try to fill
the uplink, no? (Or do you detect silence and not transmit in this
case?) (Actually, I guess applications should be ready for "data not
ready" case even on "normal" hardware due to differing clocks.)

Packets vs. stream of samples... does userland need to know about the
packets? Could we simply hide it from the userland? As userland daemon
is (supposed to be) realtime, do we really need extra set of
timestamps? What other metadata are there?

Uplink timing... As the daemon is realtime, can it just send the data
at the right time? Also normally uplink would be filled, no?

> Some definite similarities:
>  - the mmap interface to manage the PCM buffers (that is on purpose
>    similar to that of ALSA)
> 
> The interface was designed so that the audio mixer (e.g. Pulseaudio) is run
> with a soft real-time SCHED_FIFO/RR user-space thread that has full control
> over _when_ voice _packets_ are sent, and can receive packets with meta-data
> (see libcmtspeechdata interface, cmtspeech.h), and can detect and handle
> gaps in the received packets.

Well, packets are of fixed size, right? So the userland can simply
supply the right size in the common case. As for sending at the right
time... well... if the userspace is already real-time, that should be
easy. 

Now, there's a difference in the downlink. Maybe ALSA people have an
idea what to do in this case? Perhaps we can just provide artificial
"zero" data?

> This is very different from modems that offer an actual PCM voice link for
> example over I2S to the application processor (there are lots of these on
> the market). When you walk out of coverage during a call with these modems,
> you'll still get samples over I2S, but not so with cmt_speech, so ALSA is
> not the right interface.

Yes, understood.

> Now, I'm not saying the interface is perfect, but just to give a bit of
> background, why a custom char-device interface was chosen.

Thanks and best regards,
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply

* Re: [PATCH v2 10/18] dt-bindings: Document the STM32 pin controller
From: Linus Walleij @ 2015-03-06  9:35 UTC (permalink / raw)
  To: Maxime Coquelin
  Cc: Uwe Kleine-König, Andreas Färber, Geert Uytterhoeven,
	Rob Herring, Philipp Zabel, Jonathan Corbet, Pawel Moll,
	Mark Rutland, Ian Campbell, Kumar Gala, Russell King,
	Daniel Lezcano, Thomas Gleixner, Greg Kroah-Hartman, Jiri Slaby,
	Arnd Bergmann, Andrew Morton, David S. Miller,
	Mauro Carvalho Chehab, Joe Perches, Antti Palosaari, Tejun Heo
In-Reply-To: <1424455277-29983-11-git-send-email-mcoquelin.stm32@gmail.com>

I saw this other thing:

On Fri, Feb 20, 2015 at 7:01 PM, Maxime Coquelin
<mcoquelin.stm32@gmail.com> wrote:

> This adds documentation of device tree bindings for the
> STM32 pin controller.
>
> Signed-off-by: Maxime Coquelin <mcoquelin.stm32@gmail.com>
(...)
> +- altmode : Should be mode or alternate function number associated this pin, as
> +described in the datasheet (IN, OUT, ALT0...ALT15, ANALOG)

We can now describe muxing (altmodes etc) in two ways as described
in the generic bindings in
Documentation/devicetree/bindings/pinctrl/pinctrl-bindings.txt

This is done by strings combining a function with N groups.

We are also discussing having a single config number setting up
all and keeping down the size of the DTB (which
is close to what you're doing here). Please take part in that
discussion to standardize such bindings. Sascha Hauer and
others are involved, don't know the exact topic right now but
it involved using a single "pinmux" parameter in the device treel.

All agree on using the standardized pin config bindings henceforth
so start by migrating to these.

Yours,
Linus Walleij

^ permalink raw reply

* Re: [PATCH v2 11/18] pinctrl: Add pinctrl driver for STM32 MCUs
From: Linus Walleij @ 2015-03-06  9:24 UTC (permalink / raw)
  To: Maxime Coquelin
  Cc: Uwe Kleine-König, Andreas Färber, Geert Uytterhoeven,
	Rob Herring, Philipp Zabel, Jonathan Corbet, Pawel Moll,
	Mark Rutland, Ian Campbell, Kumar Gala, Russell King,
	Daniel Lezcano, Thomas Gleixner, Greg Kroah-Hartman, Jiri Slaby,
	Arnd Bergmann, Andrew Morton, David S. Miller,
	Mauro Carvalho Chehab, Joe Perches, Antti Palosaari, Tejun Heo
In-Reply-To: <1424455277-29983-12-git-send-email-mcoquelin.stm32@gmail.com>

On Fri, Feb 20, 2015 at 7:01 PM, Maxime Coquelin
<mcoquelin.stm32@gmail.com> wrote:

> This driver adds pinctrl and GPIO support to STMicrolectronic's
> STM32 family of MCUs.
>
> Pin muxing and GPIO handling have been tested on STM32F429
> based Discovery board.
>
> Signed-off-by: Maxime Coquelin <mcoquelin.stm32@gmail.com>

(...)
> +config PINCTRL_STM32
> +       bool "STMicroelectronics STM32 pinctrl driver"
> +       depends on OF
> +       depends on ARCH_STM32 || COMPILE_TEST
> +       select PINMUX
> +       select PINCONF
> +       select GPIOLIB_IRQCHIP
> +       help
> +         This selects the device tree based generic pinctrl driver for STM32.

Good start! Especially that you use GPIOLIB_IRQCHIP.

But this (as discussed earlier) should select GENERIC_PINCONF

Stopping review here so you can reengineer it a bit using GENERIC_PINCONF
for next submission.

Also think about pinmux in single registers, whether you want to do this
with a single value for a register or using strings to identify groups
and functions.

Yours,
Linus Walleij

^ permalink raw reply

* Re: [PATCH v2 10/18] dt-bindings: Document the STM32 pin controller
From: Linus Walleij @ 2015-03-06  9:12 UTC (permalink / raw)
  To: Maxime Coquelin
  Cc: Uwe Kleine-König, Andreas Färber, Geert Uytterhoeven,
	Rob Herring, Philipp Zabel, Jonathan Corbet, Pawel Moll,
	Mark Rutland, Ian Campbell, Kumar Gala, Russell King,
	Daniel Lezcano, Thomas Gleixner, Greg Kroah-Hartman, Jiri Slaby,
	Arnd Bergmann, Andrew Morton, David S. Miller,
	Mauro Carvalho Chehab, Joe Perches, Antti Palosaari, Tejun Heo
In-Reply-To: <1424455277-29983-11-git-send-email-mcoquelin.stm32-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

On Fri, Feb 20, 2015 at 7:01 PM, Maxime Coquelin
<mcoquelin.stm32-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:

> This adds documentation of device tree bindings for the
> STM32 pin controller.
>
> Signed-off-by: Maxime Coquelin <mcoquelin.stm32-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
(...)
> +Pin controller node:
> +Required properies:
> +- compatible   : "st,stm32-pinctrl"
> +- #address-cells: The value of this property must be 1
> +- #size-cells  : The value of this property must be 1
> +- ranges       : defines mapping between pin controller node (parent) to
> +  gpio-bank node (children).
> +
> +GPIO controller/bank node:
> +Required properties:
> +- gpio-controller : Indicates this device is a GPIO controller
> +- #gpio-cells    : Should be two.
> +                       The first cell is the pin number
> +                       The second one is the polarity:
> +                               - 0 for active high
> +                               - 1 for active low
> +- reg            : The gpio address range, relative to the pinctrl range
> +- st,bank-name   : Should be a name string for this bank as specified in
> +  the datasheet

OK...

(...)
> +               ...
> +               pin-functions nodes follow...
> +       };
> +
> +Contents of function subnode node:
> +----------------------------------
> +
> +Required properties for pin configuration node:
> +- st,pins      : Child node with list of pins with configuration.

Don't use vendor prefix. Use the new standard notation, just "pins".
See Documentation/devicetree/bindings/pinctrl/pinctrl-bindings.txt

> +Below is the format of how each pin conf should look like.
> +
> +<bank offset altmode pull type speed>
> +
> +Every PIO is represented with 4 to 6 parameters.
> +Each parameter is explained as below.
> +
> +- bank   : Should be bank phandle to which this PIO belongs.
> +- offset  : Offset in the PIO bank.

No. Don't hardcode register offsets into the DTS files. This is
something the driver should know from the compatible string or
instance number, not by explicit reference.

> +- altmode : Should be mode or alternate function number associated this pin, as
> +described in the datasheet (IN, OUT, ALT0...ALT15, ANALOG)
> +- pull   : Should be either NO_PULL, PULL_UP or PULL_DOWN
> +- type   : Should be either PUSH_PULL or OPEN_DRAIN.
> +           Setting it is not needed for IN and ANALOG modes, or alternate
> +           functions acting as inputs.
> +- speed          : Value taken from the datasheet, depending on the function
> +(LOW_SPEED, MEDIUM_SPEED, FAST_SPEED, HIGH_SPEED)
> +           Setting it is not needed for IN and ANALOG modes, or alternate
> +           functions acting as inputs.

NACK. Make your Kconfig select GENERIC_PINCONF and use the
already existing generic pin configuration specifiers from the standard
bindings. Like bias-pull-up; etc.

This kind of custom pin config is a thing of the past.

> +usart1 {
> +       pinctrl_usart1: usart1-0 {
> +               st,pins {
> +                       tx = <&gpioa 9 ALT7 NO_PULL PUSH_PULL LOW_SPEED>;
> +                       rx = <&gpioa 10 ALT7 NO_PULL PUSH_PULL LOW_SPEED>;
> +               };
> +       };
> +};

You might have to use separate parameters for muxing and config
in your description.

Yours,
Linus Walleij

^ permalink raw reply

* Re: [PATCH 14/14] MAINTAINERS: Add entry for STM32 MCUs
From: Linus Walleij @ 2015-03-06  9:03 UTC (permalink / raw)
  To: Maxime Coquelin
  Cc: Jonathan Corbet, Rob Herring, Pawel Moll, Mark Rutland,
	Ian Campbell, Kumar Gala, Philipp Zabel, Russell King,
	Daniel Lezcano, Thomas Gleixner, Greg Kroah-Hartman, Jiri Slaby,
	Arnd Bergmann, Andrew Morton, David S. Miller,
	Mauro Carvalho Chehab, Joe Perches, Antti Palosaari, Tejun Heo,
	Will Deacon, Nikolay Borisov, Rusty Russell, Kees Cook, Michal
In-Reply-To: <1423763164-5606-15-git-send-email-mcoquelin.stm32-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

On Thu, Feb 12, 2015 at 6:46 PM, Maxime Coquelin
<mcoquelin.stm32-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:

> Add a MAINTAINER entry covering all STM32 machine and drivers files.
>
> Signed-off-by: Maxime Coquelin <mcoquelin.stm32-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

(...)
> +F:     drivers/clocksource/arm_system_timer.c

Is that all? And that is not even a STM32 specific driver.

Yours,
Linus Walleij

^ permalink raw reply

* Re: [PATCH 07/14] clockevent: Add STM32 Timer driver
From: Linus Walleij @ 2015-03-06  8:57 UTC (permalink / raw)
  To: Maxime Coquelin
  Cc: Jonathan Corbet, Rob Herring, Pawel Moll, Mark Rutland,
	Ian Campbell, Kumar Gala, Philipp Zabel, Russell King,
	Daniel Lezcano, Thomas Gleixner, Greg Kroah-Hartman, Jiri Slaby,
	Arnd Bergmann, Andrew Morton, David S. Miller,
	Mauro Carvalho Chehab, Joe Perches, Antti Palosaari, Tejun Heo,
	Will Deacon, Nikolay Borisov, Rusty Russell, Kees Cook, Michal
In-Reply-To: <1423763164-5606-8-git-send-email-mcoquelin.stm32@gmail.com>

On Thu, Feb 12, 2015 at 6:45 PM, Maxime Coquelin
<mcoquelin.stm32@gmail.com> wrote:

> STM32 MCUs feature 16 and 32 bits general purpose timers with prescalers.
> The drivers detects whether the time is 16 or 32 bits, and applies a
> 1024 prescaler value if it is 16 bits.
>
> Signed-off-by: Maxime Coquelin <mcoquelin.stm32@gmail.com>

This is a nice driver.
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>

Yours,
Linus Walleij

^ permalink raw reply

* Re: [PATCH] capabilities: Ambient capability set V2
From: Andy Lutomirski @ 2015-03-05 23:07 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Jarkko Sakkinen, Andrew Morton, LSM List, Andrew G. Morgan,
	Michael Kerrisk, Mimi Zohar, linux-kernel@vger.kernel.org,
	Austin S Hemmelgarn, Aaron Jones, Serge Hallyn, Serge E. Hallyn,
	Markku Savela, Linux API, Jonathan Corbet
In-Reply-To: <alpine.DEB.2.11.1503051235300.32373@gentwo.org>

On Mar 5, 2015 10:41 AM, "Christoph Lameter" <cl@linux.com> wrote:
>
> On Thu, 5 Mar 2015, Serge E. Hallyn wrote:
>
> > > >
> > > > So I'd say drop this change ^
> > >
> > > Then the ambient caps get ignored for a executables that have capabilities
> > > seton the file?
> >
> > Yes.  Those are assumed to already know what they're doing.
>
> Do they? What if there is a LD_PRELOAD library that redirects socket calls
> and that needs raw device access (there are actually a number of software
> packages like that to reduce the latency of network I/O. See for example
> Solarflare's software products and the current rsocket libary in OFED.
> There are cap issues if the rsocket library should be made useful for
> Ethernet instead of infiniband).
>
> > Why?  Do you foresee cases where a file that has fP set needs capabilities
> > that aren't in its fP?
>
> Yes due to the library issues.

You can't LD_PRELOAD and fP together.  And I'm still unconvinced that
ambient caps can ever be safe in conjunction with fP.  I'll grill you
next week on what you're trying to do that makes you want this :)

--Andy

>
> > It seems more likely that they'll risk misbehaving due to an unexpected set
> > of caps.
>
> The userspace driver code in the library wont work since it does not have
> the caps to access the raw device registers.

^ permalink raw reply

* Re: [PATCH v1 3/6] eeprom: Add bindings for simple eeprom framework
From: Srinivas Kandagatla @ 2015-03-05 22:34 UTC (permalink / raw)
  To: Rob Herring
  Cc: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org,
	Maxime Ripard, Rob Herring, Pawel Moll, Kumar Gala,
	linux-api-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	devicetree-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Stephen Boyd,
	Andrew Lunn, Arnd Bergmann, Mark Brown, Greg Kroah-Hartman
In-Reply-To: <CAL_Jsq+wrZYn82otDfLsTVpgmhLFWtzRHQ4v+qn-Ks--ZpXR8Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

>> +
>> +For example:
>> +
>> +       /* Provider */
>> +       qfprom: qfprom@00700000 {
>> +               compatible      = "qcom,qfprom";
>> +               reg             = <0x00700000 0x1000>;
>> +               ...
>> +
>> +               /* Data cells */
>> +               tsens_calibration: calib@404 {
>> +                       reg = <0x404 0x10>;
>> +               };
>> +
>> +               serial_number: sn {
>> +                       reg = <0x104 0x4>, <0x204 0x4>, <0x30c 0x4>;
>> +
>> +               };
>> +               ...
>> +       };
>> +
>> += Data consumers =
>> +Are drivers which consume eeprom data cells.
>
> s/drivers/device nodes/
>
Thats true, "device nodes" makes sense.
>> +
>> +Required properties:
>> +
>> +eeproms: List of phandle and data cell the device might be interested in.
>> +
>> +Optional properties:
>> +
>> +eeprom-names: List of data cell name strings sorted in the same order
>> +             as the resets property. Consumers drivers will use
>
> resets?
Opps..
I remember fixing this, I will take care of it in next version.
>
>> +             eeprom-names to differentiate between multiple cells,
>> +             and hence being able to know what these cells are for.
>
> Is this still needed? The sub-node name defines the name. Or you can
> use reg-names with-in the sub-node.
Yes, eeprom-names is needed in the consumer nodes, where there are 
multiple eeproms cells, its easy to lookup by name rather than 
index,which depends on the order of the entries.

reg-names inside the "data cells" is ok, but I can't think of its use 
immediately. May be useful for debug?

--srini
 >
>
> Rob
>
>> +
>> +For example:
>> +
>> +       tsens {
>> +               ...
>> +               eeproms = <&tsens_calibration>;
>> +               eeprom-names = "calibration";
>> +       };
>> --
>> 1.9.1
>>
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH v3 0/3] epoll: introduce round robin wakeup mode
From: Jason Baron @ 2015-03-05 20:24 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andrew Morton, peterz-wEGCiKHe2LqWVfeAwA7xHQ,
	mingo-H+wXaHxf7aLQT0dZR+AlfA,
	viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn, normalperson-rMlxZR9MS24,
	davidel-AhlLAIvw+VEjIGhXcJzhZg,
	mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w, luto-kltTT9wpgjJwATOyAt5JVQ,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA, Linus Torvalds, Alexander Viro
In-Reply-To: <20150305091517.GA25158-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>


On 03/05/2015 04:15 AM, Ingo Molnar wrote:
> * Jason Baron <jbaron-JqFfY2XvxFXQT0dZR+AlfA@public.gmane.org> wrote:
>
>> 2) We are using the wakeup in this case to 'assign' work more 
>> permanently to the thread. That is, in the case of a listen socket 
>> we then add the connected socket to the woken up threads local set 
>> of epoll events. So the load persists past the wake up. And in this 
>> case, doing the round robin wakeups, simply allows us to access more 
>> cpu bandwidth. (I'm also looking into potentially using cpu affinity 
>> to do the wakeups as well as you suggested.)
> So this is the part that I still don't understand.
>
> What difference does LIFO versus FIFO wakeups make to CPU utilization: 
> a thread waiting for work is idle, no matter whether it ran most 
> recently or least recently.
>
> Once an idle worker thread is woken it will compute its own work, for 
> whatever time it needs to, and won't be bothered by epoll again until 
> it finished its work and starts waiting again.
>
> So regardless the wakeup order it's the same principal bandwidth 
> utilization, modulo caching artifacts [*] and modulo scheduling 
> artifacts [**]:

So just adding the wakeup source as 'exclusive', I think would
give much of the desired behavior as you point out. In the first
patch posting I separated 'exclusive' from 'rotate' (where rotate
depended on exclusive), since the idle threads will tend to get
assigned the new work vs. the busy threads as you point out
and the workload naturally spreads out (modulo the artifacts
you mentioned).

However, I added the 'rotate' b/c I'm assigning work via the
wakeup that persists past the wakeup point. So without the rotate
I might end up assigning a lot of work to always say the first
thread if its always idle. And then I might get a large burst of
work queued to it at some later point. The rotate is intended
to address this case.

To use some pseudo-code in hopes of clarifying things, each
thread is roughly doing:

epoll_ctl(epfd, EPOLL_CTL_ADD, listen_fd...);

while(1) {

    epoll_wait(epfd...);
    fd = accept(listen_fd...);
    epoll_ctl(epfd, EPOLL_CTL_ADD, fd...);
    ...do any additional desired fd processing...
}

So since the work persists past the wakeup point (after
the 'fd' has been assigned to the epfd set of the local
thread), I am trying to balance out future load.

This is an issue that current userspace has to address in
various ways. In our case, we periodically remove all
epfds from the listen socket, and then re-add in a
different order periodically. Another alternative that was
suggested by Eric was to have a dedicated thread(s), to
do the assignment. So these approaches can work to an
extent, but they seem at least to me to complicate
userspace somewhat. And at least in our case, its not
providing as good balancing as this approach.

So I am trying to use epoll in a special way to do work
assignment. I think the model is different from the
standard waker/wakee model. So to that end, in this
v3 version, I've attempted to isolate all the changes to
be contained within epoll to reflect that fact.

Thanks,

-Jason


>
> [*]  Caching artifacts: in that sense Andrew's point stands: given 
>      multiple equivalent choices it's more beneficial to pick a thread 
>      that was most recently used (and is thus most cache-hot - i.e. 
>      the current wakeup behavior), versus a thread that was least 
>      recently used (and is thus the most cache-cold - i.e. the 
>      round-robin wakeup you introduce).
>
> [**] The hack patch I posted in my previous reply.
>
> Thanks,
>
> 	Ingo
>

^ permalink raw reply

* Re: [PATCH v1 3/6] eeprom: Add bindings for simple eeprom framework
From: Rob Herring @ 2015-03-05 20:11 UTC (permalink / raw)
  To: Srinivas Kandagatla
  Cc: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org,
	Maxime Ripard, Rob Herring, Pawel Moll, Kumar Gala,
	linux-api-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	devicetree-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Stephen Boyd,
	Andrew Lunn, Arnd Bergmann, Mark Brown, Greg Kroah-Hartman
In-Reply-To: <1425548765-13019-1-git-send-email-srinivas.kandagatla-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>

On Thu, Mar 5, 2015 at 3:46 AM, Srinivas Kandagatla
<srinivas.kandagatla-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
> This patch adds bindings for simple eeprom framework which allows eeprom
> consumers to talk to eeprom providers to get access to eeprom cell data.
>
> Signed-off-by: Maxime Ripard <maxime.ripard-wi1+55ScJUtKEb57/3fJTNBPR1lH4CV8@public.gmane.org>
> [Maxime Ripard: intial version of eeprom framework]
> Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
> ---
>  .../devicetree/bindings/eeprom/eeprom.txt          | 70 ++++++++++++++++++++++
>  1 file changed, 70 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/eeprom/eeprom.txt
>
> diff --git a/Documentation/devicetree/bindings/eeprom/eeprom.txt b/Documentation/devicetree/bindings/eeprom/eeprom.txt
> new file mode 100644
> index 0000000..dbfb95c
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/eeprom/eeprom.txt
> @@ -0,0 +1,70 @@
> += EEPROM Data Device Tree Bindings =
> +
> +This binding is intended to represent the location of hardware
> +configuration data stored in EEPROMs.
> +
> +On a significant proportion of boards, the manufacturer has stored
> +some data on an EEPROM-like device, for the OS to be able to retrieve
> +these information and act upon it. Obviously, the OS has to know
> +about where to retrieve these data from, and where they are stored on
> +the storage device.
> +
> +This document is here to document this.
> +
> += Data providers =
> +Contains bindings specific to provider drivers and data cells as children
> +to this node.
> +
> += Data cells =
> +These are the child nodes of the provider which contain data cell
> +information like offset and size in eeprom provider.
> +
> +Required properties:
> +reg:   specifies the offset in byte within that storage device, and the length
> +       in bytes of the data we care about.
> +       There could be more then one offset-length pairs in this property.
> +
> +Optional properties:
> +As required by specific data parsers/interpreters.
> +
> +For example:
> +
> +       /* Provider */
> +       qfprom: qfprom@00700000 {
> +               compatible      = "qcom,qfprom";
> +               reg             = <0x00700000 0x1000>;
> +               ...
> +
> +               /* Data cells */
> +               tsens_calibration: calib@404 {
> +                       reg = <0x404 0x10>;
> +               };
> +
> +               serial_number: sn {
> +                       reg = <0x104 0x4>, <0x204 0x4>, <0x30c 0x4>;
> +
> +               };
> +               ...
> +       };
> +
> += Data consumers =
> +Are drivers which consume eeprom data cells.

s/drivers/device nodes/

> +
> +Required properties:
> +
> +eeproms: List of phandle and data cell the device might be interested in.
> +
> +Optional properties:
> +
> +eeprom-names: List of data cell name strings sorted in the same order
> +             as the resets property. Consumers drivers will use

resets?

> +             eeprom-names to differentiate between multiple cells,
> +             and hence being able to know what these cells are for.

Is this still needed? The sub-node name defines the name. Or you can
use reg-names with-in the sub-node.

Rob

> +
> +For example:
> +
> +       tsens {
> +               ...
> +               eeproms = <&tsens_calibration>;
> +               eeprom-names = "calibration";
> +       };
> --
> 1.9.1
>

^ permalink raw reply

* Re: [PATCH 19/21] userfaultfd: remap_pages: UFFDIO_REMAP preparation
From: Linus Torvalds @ 2015-03-05 19:32 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: qemu-devel-qX2TKyscuCcdnm+yROfE0A, KVM list,
	Linux Kernel Mailing List, linux-mm, Linux API,
	Android Kernel Team, Kirill A. Shutemov, Pavel Emelyanov,
	Sanidhya Kashyap, zhang.zhanghailiang-hv44wF8Li93QT0dZR+AlfA,
	Andres Lagar-Cavilla, Dave Hansen, Paolo Bonzini, Rik van Riel,
	Mel Gorman, Andy Lutomirski, Andrew Morton, Sasha Levin,
	Hugh Dickins, Peter Feiner, Dr. David Alan Gilbert,
	Christopher Covington, Johannes Weiner
In-Reply-To: <20150305185112.GL4280-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

On Thu, Mar 5, 2015 at 10:51 AM, Andrea Arcangeli <aarcange-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
>
> Thanks for your idea that the UFFDIO_COPY is faster, the userland code
> we submitted for qemu only uses UFFDIO_COPY|ZEROPAGE, it never uses
> UFFDIO_REMAP.

Ok. So there's no actual expected use of the remap interface. Good.
That makes this series more palatable, since the rest didn't raise my
hackles much.

(But yeah, the documentation patch didn't really explain the uses very
much or at all, so I think something more is needed in that area).

                   Linus

^ permalink raw reply

* Re: [PATCH 2/9] selftests: Add install target
From: Dave Jones @ 2015-03-05 18:53 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	shuahkh-JPH+aEBZ4P+UEJcrhfAQsw, mmarek-AlSwsSmVLrQ,
	linux-api-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1425358302-16680-2-git-send-email-mpe-Gsx/Oe8HsFggBc27wqDAHg@public.gmane.org>

On Tue, Mar 03, 2015 at 03:51:35PM +1100, Michael Ellerman wrote:
 > This adds make install support to selftests. The basic usage is:
 > 
 > $ cd tools/testing/selftests
 > $ make install
 > 
 > That installs into tools/testing/selftests/install, which can then be
 > copied where ever necessary.
 > 
 > The install destination is also configurable using eg:
 > 
 > $ INSTALL_PATH=/mnt/selftests make install

 ...

 > +	@# Ask all targets to emit their test scripts
 > +	echo "#!/bin/bash\n\n" > $(ALL_SCRIPT)

$ ./all.sh 
-bash: ./all.sh: /bin/bash\n\n: bad interpreter: No such file or directory

Removing the \n\n fixes it.

 > +		echo "cd \$$ROOT\n" >> $(ALL_SCRIPT); \

ditto

	Dave

^ permalink raw reply

* Re: [PATCH 19/21] userfaultfd: remap_pages: UFFDIO_REMAP preparation
From: Andrea Arcangeli @ 2015-03-05 18:51 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: qemu-devel, KVM list, Linux Kernel Mailing List, linux-mm,
	Linux API, Android Kernel Team, Kirill A. Shutemov,
	Pavel Emelyanov, Sanidhya Kashyap, zhang.zhanghailiang,
	Andres Lagar-Cavilla, Dave Hansen, Paolo Bonzini, Rik van Riel,
	Mel Gorman, Andy Lutomirski, Andrew Morton, Sasha Levin,
	Hugh Dickins, Peter Feiner, Dr. David 
In-Reply-To: <CA+55aFzW=qaO0iKZWK9BWDNHu4eOgiKOJ-=0SvzsmZawuH5_3A@mail.gmail.com>

On Thu, Mar 05, 2015 at 09:39:48AM -0800, Linus Torvalds wrote:
> Is this really worth it? On real loads? That people are expected to use?

I fully agree that it's not worth merging upstream UFFDIO_REMAP until
(and if) a real world usage for it will showup. To further clarify:
would this not have been an RFC, the patchset would have stopped at
patch number 15/21 included.

Merging UFFDIO_REMAP with no real life users, would just increase the
attack vector surface of the kernel for no good.

Thanks for your idea that the UFFDIO_COPY is faster, the userland code
we submitted for qemu only uses UFFDIO_COPY|ZEROPAGE, it never uses
UFFDIO_REMAP. I immediately agreed about UFFDIO_COPY being preferable
after you mentioned it during review of the previous RFC.

However this being a RFC with a large audience, and UFFDIO_REMAP
allowing to "remove" memory (think like externalizing memory into to
ceph with deduplication or such), I still added it just in case there
are real world use cases that may justify me keeping it around (even
if I would definitely not have submitted it for merging in the short
term regardless).

In addition of dropping the parts that aren't suitable for merging in
the short term like UFFDIO_REMAP, for any further submits that won't
substantially alter the API like it happened between the v2 to v3
RFCs, I'll also shrink the To/Cc list considerably.

> Considering how we just got rid of one special magic VM remapping
> thing that nobody actually used, I'd really hate to add a new one.

Having to define an API somehow, I tried to think at all possible
future usages and make sure the API would allow for those if needed.

> Quite frankly, *if* we ever merge userfaultfd, I would *strongly*
> argue for not merging the remap parts. I just don't see the point. It
> doesn't seem to add anything that is semantically very important -
> it's *potentially* a faster copy, but even that is
> 
>   (a) questionable in the first place

Yes, we already measured the UFFDIO_COPY is faster than UFFDIO_REMAP,
the userfault latency decreases -20%.

> 
> and
> 
>  (b) unclear why anybody would ever care about performance of
> infrastructure that nobody actually uses today, and future use isn't
> even clear or shown to be particualrly performance-sensitive.

The only potential _theoretical_ case that justify the existence of
UFFDIO_REMAP is about "removing" memory from the address space. To
"add" memory UFFDIO_COPY and UFFDIO_ZEROPAGE are always preferable
like you suggested.

> So basically I'd like to see better documentation, a few real use
> cases (and by real I very much do *not* mean "you can use it for
> this", but actual patches to actual projects that matter and that are
> expected to care and merge them), and a simplified series that doesn't
> do the remap thing.

So far I wrote some doc in 2/21 and in the cover letter, but certainly
more docs are necessary. Trinity is also needed (I got trinity running
on the v2 API but I haven't adapted to the new API yet).

About the real world usages, this is the primary one:

http://lists.gnu.org/archive/html/qemu-devel/2015-02/msg04873.html

And it actually cannot be merged in qemu until userfaultfd is merged
in the kernel. There's simply no safe way to implement postcopy live
migration without something equivalent to the userfaultfd if all Linux
VM features are intended to be retained in the destination node.

> Because *every* time we add a new clever interface, we end up with
> approximately zero users and just pain down the line. Examples:
> splice, mremap, yadda yadda.

Aside from mremap which I think is widely used, I totally agree in
principle.

For now I can quite comfortably guarantee the above real life user for
userfaultfd (qemu), but there are potential 5 of them. And none needs
UFFDIO_REMAP, which is again why I totally agree of not submitting it
for merging and it was intended it only for the initial RFC to share
the idea of "removing" the memory with a larger audience before I
shrink the Cc/To list for further updates.

Thanks,
Andrea

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: [PATCH] capabilities: Ambient capability set V2
From: Christoph Lameter @ 2015-03-05 18:41 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Serge Hallyn, Andy Lutomirski, Jonathan Corbet, Aaron Jones,
	linux-security-module, linux-kernel, akpm, Andrew G. Morgan,
	Mimi Zohar, Austin S Hemmelgarn, Markku Savela, Jarkko Sakkinen,
	linux-api, Michael Kerrisk
In-Reply-To: <20150305171326.GA14998@mail.hallyn.com>

On Thu, 5 Mar 2015, Serge E. Hallyn wrote:

> > >
> > > So I'd say drop this change ^
> >
> > Then the ambient caps get ignored for a executables that have capabilities
> > seton the file?
>
> Yes.  Those are assumed to already know what they're doing.

Do they? What if there is a LD_PRELOAD library that redirects socket calls
and that needs raw device access (there are actually a number of software
packages like that to reduce the latency of network I/O. See for example
Solarflare's software products and the current rsocket libary in OFED.
There are cap issues if the rsocket library should be made useful for
Ethernet instead of infiniband).

> Why?  Do you foresee cases where a file that has fP set needs capabilities
> that aren't in its fP?

Yes due to the library issues.

> It seems more likely that they'll risk misbehaving due to an unexpected set
> of caps.

The userspace driver code in the library wont work since it does not have
the caps to access the raw device registers.

^ permalink raw reply

* Re: [PATCH v1 4/6] eeprom: sunxi: Move the SID driver to the eeprom framework
From: Maxime Ripard @ 2015-03-05 18:36 UTC (permalink / raw)
  To: Paul Bolle
  Cc: Srinivas Kandagatla,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Rob Herring,
	Pawel Moll, Kumar Gala, linux-api-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA, Stephen Boyd,
	andrew-g2DYL2Zd6BY, Arnd Bergmann, broonie-DgEjT+Ai2ygdnm+yROfE0A,
	Greg Kroah-Hartman
In-Reply-To: <1425550515.24292.212.camel@x220>

[-- Attachment #1: Type: text/plain, Size: 2338 bytes --]

Hi Paul,

On Thu, Mar 05, 2015 at 11:15:15AM +0100, Paul Bolle wrote:
> >  endif
> > diff --git a/drivers/eeprom/Makefile b/drivers/eeprom/Makefile
> > index e130079..661422c 100644
> > --- a/drivers/eeprom/Makefile
> > +++ b/drivers/eeprom/Makefile
> > @@ -7,3 +7,4 @@ ccflags-$(CONFIG_EEPROM_DEBUG) += -DDEBUG
> >  obj-$(CONFIG_EEPROM)		+= core.o
> >  
> >  # Devices
> > +obj-$(CONFIG_EEPROM_SUNXI_SID)	+= eeprom-sunxi-sid.o
> > diff --git a/drivers/eeprom/eeprom-sunxi-sid.c b/drivers/eeprom/eeprom-sunxi-sid.c
> > new file mode 100644
> > index 0000000..eb32afb
> > --- /dev/null
> > +++ b/drivers/eeprom/eeprom-sunxi-sid.c
> > @@ -0,0 +1,129 @@
> > +/*
> > + * Allwinner sunXi SoCs Security ID support.
> > + *
> > + * Copyright (c) 2013 Oliver Schinagl <oliver-dxLnbx3+1qmEVqv0pETR8A@public.gmane.org>
> > + * Copyright (C) 2014 Maxime Ripard <maxime.ripard-wi1+55ScJUtKEb57/3fJTNBPR1lH4CV8@public.gmane.org>
> > + *
> > + * This file is licensed under the terms of the GNU General Public
> > + * License version 2.  This program is licensed "as is" without any
> > + * warranty of any kind, whether express or implied.
> > + */
> 
> So the license is GPL v2.
> 
> > +MODULE_LICENSE("GPL");
> 
> Which means you probably want
>     MODULE_LICENSE("GPL v2");
> 
> > --- a/drivers/misc/eeprom/sunxi_sid.c
> > +++ /dev/null
> > @@ -1,156 +0,0 @@
> > -/*
> > - * Copyright (c) 2013 Oliver Schinagl <oliver-dxLnbx3+1qmEVqv0pETR8A@public.gmane.org>
> > - * http://www.linux-sunxi.org
> > - *
> > - * This program is free software; you can redistribute it and/or modify
> > - * it under the terms of the GNU General Public License as published by
> > - * the Free Software Foundation; either version 2 of the License, or
> > - * (at your option) any later version.
> 
> But the previous driver was GPL v2 or later.
> 
> > -MODULE_LICENSE("GPL");
> 
> And this matches that.
> 
> Was it intended to re-license this, or is the code basically new? (I
> haven't compared the before and after code, to be honest.)

That was unintentional, especially since this driver is not new at
all, and is barely a conversion to that framework.

Thanks for catching this!
Maxime

-- 
Maxime Ripard, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply

* Re: [PATCH 00/21] RFC: userfaultfd v3
From: Pavel Emelyanov @ 2015-03-05 18:15 UTC (permalink / raw)
  To: Andrea Arcangeli, qemu-devel, kvm, linux-kernel, linux-mm,
	linux-api, Android Kernel Team
  Cc: Kirill A. Shutemov, Sanidhya Kashyap, zhang.zhanghailiang,
	Linus Torvalds, Andres Lagar-Cavilla, Dave Hansen, Paolo Bonzini,
	Rik van Riel, Mel Gorman, Andy Lutomirski, Andrew Morton,
	Sasha Levin, Hugh Dickins, Peter Feiner, Dr. David Alan Gilbert,
	Christopher Covington, Johannes Weiner, Robert Love,
	Dmitry Adamushko, Neil Brown, Mike Hommey, Taras Glek, Jan Kara,
	KOSAKI Motohiro <kosaki.mo>
In-Reply-To: <1425575884-2574-1-git-send-email-aarcange@redhat.com>


> All UFFDIO_COPY/ZEROPAGE/REMAP methods already support CRIU postcopy
> live migration and the UFFD can be passed to a manager process through
> unix domain sockets to satisfy point 5).

Yup :) That's the best (from my POV) point of ufd -- the ability to delegate
the descriptor to some other task. Though there are several limitations (I've
expressed them in other e-mails), I'm definitely supporting this!

The respective CRIU code is quite sloppy yet, I will try to brush one up and
show soon.

Thanks,
Pavel

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: [PATCH 14/21] userfaultfd: mcopy_atomic|mfill_zeropage: UFFDIO_COPY|UFFDIO_ZEROPAGE preparation
From: Pavel Emelyanov @ 2015-03-05 18:07 UTC (permalink / raw)
  To: Andrea Arcangeli, qemu-devel, kvm, linux-kernel, linux-mm,
	linux-api, Android Kernel Team
  Cc: Kirill A. Shutemov, Sanidhya Kashyap, zhang.zhanghailiang,
	Linus Torvalds, Andres Lagar-Cavilla, Dave Hansen, Paolo Bonzini,
	Rik van Riel, Mel Gorman, Andy Lutomirski, Andrew Morton,
	Sasha Levin, Hugh Dickins, Peter Feiner, Dr. David Alan Gilbert,
	Christopher Covington, Johannes Weiner, Robert Love,
	Dmitry Adamushko, Neil Brown, Mike Hommey, Taras Glek, Jan Kara,
	KOSAKI Motohiro <kosaki.mo>
In-Reply-To: <1425575884-2574-15-git-send-email-aarcange@redhat.com>

> +static int mcopy_atomic_pte(struct mm_struct *dst_mm,
> +			    pmd_t *dst_pmd,
> +			    struct vm_area_struct *dst_vma,
> +			    unsigned long dst_addr,
> +			    unsigned long src_addr)
> +{
> +	struct mem_cgroup *memcg;
> +	pte_t _dst_pte, *dst_pte;
> +	spinlock_t *ptl;
> +	struct page *page;
> +	void *page_kaddr;
> +	int ret;
> +
> +	ret = -ENOMEM;
> +	page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, dst_vma, dst_addr);
> +	if (!page)
> +		goto out;

Not a fatal thing, but still quite inconvenient. If there are two tasks that
have anonymous private VMAs that are still not COW-ed from each other, then
it will be impossible to keep the pages shared with userfault. Thus if we do
post-copy memory migration for tasks, then these guys will have their
memory COW-ed.


Thanks,
Pavel

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: [PATCH 19/21] userfaultfd: remap_pages: UFFDIO_REMAP preparation
From: Pavel Emelyanov @ 2015-03-05 18:01 UTC (permalink / raw)
  To: Andrea Arcangeli, qemu-devel, kvm, linux-kernel, linux-mm,
	linux-api, Android Kernel Team
  Cc: Kirill A. Shutemov, Sanidhya Kashyap, zhang.zhanghailiang,
	Linus Torvalds, Andres Lagar-Cavilla, Dave Hansen, Paolo Bonzini,
	Rik van Riel, Mel Gorman, Andy Lutomirski, Andrew Morton,
	Sasha Levin, Hugh Dickins, Peter Feiner, Dr. David Alan Gilbert,
	Christopher Covington, Johannes Weiner, Robert Love,
	Dmitry Adamushko, Neil Brown, Mike Hommey, Taras Glek, Jan Kara,
	KOSAKI Motohiro <kosaki.mo>
In-Reply-To: <1425575884-2574-20-git-send-email-aarcange@redhat.com>

> +ssize_t remap_pages(struct mm_struct *dst_mm, struct mm_struct *src_mm,
> +		    unsigned long dst_start, unsigned long src_start,
> +		    unsigned long len, __u64 mode)
> +{
> +	struct vm_area_struct *src_vma, *dst_vma;
> +	long err = -EINVAL;
> +	pmd_t *src_pmd, *dst_pmd;
> +	pte_t *src_pte, *dst_pte;
> +	spinlock_t *dst_ptl, *src_ptl;
> +	unsigned long src_addr, dst_addr;
> +	int thp_aligned = -1;
> +	ssize_t moved = 0;
> +
> +	/*
> +	 * Sanitize the command parameters:
> +	 */
> +	BUG_ON(src_start & ~PAGE_MASK);
> +	BUG_ON(dst_start & ~PAGE_MASK);
> +	BUG_ON(len & ~PAGE_MASK);
> +
> +	/* Does the address range wrap, or is the span zero-sized? */
> +	BUG_ON(src_start + len <= src_start);
> +	BUG_ON(dst_start + len <= dst_start);
> +
> +	/*
> +	 * Because these are read sempahores there's no risk of lock
> +	 * inversion.
> +	 */
> +	down_read(&dst_mm->mmap_sem);
> +	if (dst_mm != src_mm)
> +		down_read(&src_mm->mmap_sem);
> +
> +	/*
> +	 * Make sure the vma is not shared, that the src and dst remap
> +	 * ranges are both valid and fully within a single existing
> +	 * vma.
> +	 */
> +	src_vma = find_vma(src_mm, src_start);
> +	if (!src_vma || (src_vma->vm_flags & VM_SHARED))
> +		goto out;
> +	if (src_start < src_vma->vm_start ||
> +	    src_start + len > src_vma->vm_end)
> +		goto out;
> +
> +	dst_vma = find_vma(dst_mm, dst_start);
> +	if (!dst_vma || (dst_vma->vm_flags & VM_SHARED))
> +		goto out;

I again have a concern about the case when one task monitors the VM of the
other one. If the target task (owning the mm) unmaps a VMA then the monitor
task (holding and operating on the ufd) will get plain EINVAL on UFFDIO_REMAP
request. This is not fatal, but still inconvenient as it will be hard to
find out the reason for failure -- dst VMA is removed and the monitor should
just drop the respective pages with data, or some other error has occurred
and some other actions should be taken.

Thanks,
Pavel

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: [PATCH 10/21] userfaultfd: add new syscall to provide memory externalization
From: Pavel Emelyanov @ 2015-03-05 17:57 UTC (permalink / raw)
  To: Andrea Arcangeli, qemu-devel, kvm, linux-kernel, linux-mm,
	linux-api, Android Kernel Team
  Cc: Kirill A. Shutemov, Sanidhya Kashyap, zhang.zhanghailiang,
	Linus Torvalds, Andres Lagar-Cavilla, Dave Hansen, Paolo Bonzini,
	Rik van Riel, Mel Gorman, Andy Lutomirski, Andrew Morton,
	Sasha Levin, Hugh Dickins, Peter Feiner, Dr. David Alan Gilbert,
	Christopher Covington, Johannes Weiner, Robert Love,
	Dmitry Adamushko, Neil Brown, Mike Hommey, Taras Glek, Jan Kara,
	KOSAKI Motohiro <kosaki.mo>
In-Reply-To: <1425575884-2574-11-git-send-email-aarcange@redhat.com>


> +int handle_userfault(struct vm_area_struct *vma, unsigned long address,
> +		     unsigned int flags, unsigned long reason)
> +{
> +	struct mm_struct *mm = vma->vm_mm;
> +	struct userfaultfd_ctx *ctx;
> +	struct userfaultfd_wait_queue uwq;
> +
> +	BUG_ON(!rwsem_is_locked(&mm->mmap_sem));
> +
> +	ctx = vma->vm_userfaultfd_ctx.ctx;
> +	if (!ctx)
> +		return VM_FAULT_SIGBUS;
> +
> +	BUG_ON(ctx->mm != mm);
> +
> +	VM_BUG_ON(reason & ~(VM_UFFD_MISSING|VM_UFFD_WP));
> +	VM_BUG_ON(!(reason & VM_UFFD_MISSING) ^ !!(reason & VM_UFFD_WP));
> +
> +	/*
> +	 * If it's already released don't get it. This avoids to loop
> +	 * in __get_user_pages if userfaultfd_release waits on the
> +	 * caller of handle_userfault to release the mmap_sem.
> +	 */
> +	if (unlikely(ACCESS_ONCE(ctx->released)))
> +		return VM_FAULT_SIGBUS;
> +
> +	/* check that we can return VM_FAULT_RETRY */
> +	if (unlikely(!(flags & FAULT_FLAG_ALLOW_RETRY))) {
> +		/*
> +		 * Validate the invariant that nowait must allow retry
> +		 * to be sure not to return SIGBUS erroneously on
> +		 * nowait invocations.
> +		 */
> +		BUG_ON(flags & FAULT_FLAG_RETRY_NOWAIT);
> +#ifdef CONFIG_DEBUG_VM
> +		if (printk_ratelimit()) {
> +			printk(KERN_WARNING
> +			       "FAULT_FLAG_ALLOW_RETRY missing %x\n", flags);
> +			dump_stack();
> +		}
> +#endif
> +		return VM_FAULT_SIGBUS;
> +	}
> +
> +	/*
> +	 * Handle nowait, not much to do other than tell it to retry
> +	 * and wait.
> +	 */
> +	if (flags & FAULT_FLAG_RETRY_NOWAIT)
> +		return VM_FAULT_RETRY;
> +
> +	/* take the reference before dropping the mmap_sem */
> +	userfaultfd_ctx_get(ctx);
> +
> +	/* be gentle and immediately relinquish the mmap_sem */
> +	up_read(&mm->mmap_sem);
> +
> +	init_waitqueue_func_entry(&uwq.wq, userfaultfd_wake_function);
> +	uwq.wq.private = current;
> +	uwq.address = userfault_address(address, flags, reason);

Since we report only the virtual address of the fault, this will make difficulties
for task monitoring the address space of some other task. Like this:

Let's assume a task creates a userfaultfd, activates one, registers several VMAs 
in it and then sends the ufd descriptor to other task. If later the first task will
remap those VMAs and will start touching pages, the monitor will start receiving 
fault addresses using which it will not be able to guess the exact vma the
requests come from.

Thanks,
Pavel

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: [PATCH 05/21] userfaultfd: add vm_userfaultfd_ctx to the vm_area_struct
From: Pavel Emelyanov @ 2015-03-05 17:48 UTC (permalink / raw)
  To: Andrea Arcangeli, qemu-devel, kvm, linux-kernel, linux-mm,
	linux-api, Android Kernel Team
  Cc: Kirill A. Shutemov, Sanidhya Kashyap, zhang.zhanghailiang,
	Linus Torvalds, Andres Lagar-Cavilla, Dave Hansen, Paolo Bonzini,
	Rik van Riel, Mel Gorman, Andy Lutomirski, Andrew Morton,
	Sasha Levin, Hugh Dickins, Peter Feiner, Dr. David Alan Gilbert,
	Christopher Covington, Johannes Weiner, Robert Love,
	Dmitry Adamushko, Neil Brown, Mike Hommey, Taras Glek, Jan Kara,
	KOSAKI Motohiro <kosaki.mo>
In-Reply-To: <1425575884-2574-6-git-send-email-aarcange@redhat.com>

> diff --git a/kernel/fork.c b/kernel/fork.c
> index cf65139..cb215c0 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -425,6 +425,7 @@ static int dup_mmap(struct mm_struct *mm, struct mm_struct *oldmm)
>  			goto fail_nomem_anon_vma_fork;
>  		tmp->vm_flags &= ~VM_LOCKED;
>  		tmp->vm_next = tmp->vm_prev = NULL;
> +		tmp->vm_userfaultfd_ctx = NULL_VM_UFFD_CTX;

This creates an interesting effect when the userfaultfd is used outside of
the process which created and activated one. If I try to monitor the memory
usage of one task with another, once the first task fork()-s, its child
begins to see zero-pages in the places where the monitor task was supposed
to insert pages with data.

>  		file = tmp->vm_file;
>  		if (file) {
>  			struct inode *inode = file_inode(file);
> .
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: [PATCH 19/21] userfaultfd: remap_pages: UFFDIO_REMAP preparation
From: Linus Torvalds @ 2015-03-05 17:39 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: qemu-devel, KVM list, Linux Kernel Mailing List, linux-mm,
	Linux API, Android Kernel Team, Kirill A. Shutemov,
	Pavel Emelyanov, Sanidhya Kashyap, zhang.zhanghailiang,
	Andres Lagar-Cavilla, Dave Hansen, Paolo Bonzini, Rik van Riel,
	Mel Gorman, Andy Lutomirski, Andrew Morton, Sasha Levin,
	Hugh Dickins, Peter Feiner, Dr. David Alan Gilbert,
	Christopher Covington, Johannes Weiner
In-Reply-To: <1425575884-2574-20-git-send-email-aarcange@redhat.com>

On Thu, Mar 5, 2015 at 9:18 AM, Andrea Arcangeli <aarcange@redhat.com> wrote:
> remap_pages is the lowlevel mm helper needed to implement
> UFFDIO_REMAP.

This function is nasty nasty nasty.

Is this really worth it? On real loads? That people are expected to use?

Considering how we just got rid of one special magic VM remapping
thing that nobody actually used, I'd really hate to add a new one.

The fact is, almost nobody ever uses anything that isn't standard
POSIX. There are no apps, and even for specialized things like
virtualization hypervisors this kind of thing is often simply not
worth it.

Quite frankly, *if* we ever merge userfaultfd, I would *strongly*
argue for not merging the remap parts. I just don't see the point. It
doesn't seem to add anything that is semantically very important -
it's *potentially* a faster copy, but even that is

  (a) questionable in the first place

and

 (b) unclear why anybody would ever care about performance of
infrastructure that nobody actually uses today, and future use isn't
even clear or shown to be particualrly performance-sensitive.

So basically I'd like to see better documentation, a few real use
cases (and by real I very much do *not* mean "you can use it for
this", but actual patches to actual projects that matter and that are
expected to care and merge them), and a simplified series that doesn't
do the remap thing.

Because *every* time we add a new clever interface, we end up with
approximately zero users and just pain down the line. Examples:
splice, mremap, yadda yadda.

                        Linus

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: [PATCHv2 0/2] N900 Modem Speech Support
From: Kai Vehmanen @ 2015-03-05 17:32 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Sebastian Reichel, Peter Ujfalusi, Kai Vehmanen, Pali Rohar,
	Aaro Koskinen, Ivaylo Dimitrov, linux-omap, linux-kernel,
	linux-api
In-Reply-To: <20150305113013.GA3867@amd>

Hi,

On Thu, 5 Mar 2015, Pavel Machek wrote:

>> Userland access goes via /dev/cmt_speech. The API is implemented in
>> libcmtspeechdata, which is used by ofono and the freesmartphone.org project.
> Yes, the ABI is "tested" for some years, but it is not documented, and
> it is very wrong ABI.
>
> I'm not sure what they do with the "read()". I was assuming it is
> meant for passing voice data, but it can return at most 4 bytes,
> AFAICT.
>
> We already have perfectly good ABI for passing voice data around. It
> is called "ALSA". libcmtspeech will then become unneccessary, and the
> daemon routing voice data will be as simple as "read sample from

I'm no longer involved with cmt_speech (with this driver nor modems in 
general), but let me clarify some bits about the design.

First, the team that designed the driver and the stack above had a lot of 
folks working also with ALSA (and the ALSA drivers have been merged to 
mainline long ago) and we considered ALSA on multiple occasions as the 
interface for this as well.

Our take was that ALSA is not the right interface for cmt_speech. The 
cmt_speech interface in the modem is _not_ a PCM interface as modelled by 
ALSA. Specifically:

- the interface is lossy in both directions
- data is sent in packets, not a stream of samples (could be other things
   than PCM samples), with timing and meta-data
- timing of uplink is of utmost importance

Some definite similarities:
  - the mmap interface to manage the PCM buffers (that is on purpose
    similar to that of ALSA)

The interface was designed so that the audio mixer (e.g. Pulseaudio) is 
run with a soft real-time SCHED_FIFO/RR user-space thread that has full 
control over _when_ voice _packets_ are sent, and can receive packets with 
meta-data (see libcmtspeechdata interface, cmtspeech.h), and can 
detect and handle gaps in the received packets.

This is very different from modems that offer an actual PCM voice link for 
example over I2S to the application processor (there are lots of these on 
the market). When you walk out of coverage during a call with these 
modems, you'll still get samples over I2S, but not so with cmt_speech, so 
ALSA is not the right interface.

Now, I'm not saying the interface is perfect, but just to give a bit of 
background, why a custom char-device interface was chosen.

PS Not saying it's enough for mainline inclusion, but libcmtspeechdata [1]
    was released and documented to enable the driver to be used by
    other software than the closed pulseaudio modules. You Pavel of course
    know this as you've been maintaining the library, but FYI for others.

[1] https://www.gitorious.org/libcmtspeechdata

Br, Kai

^ permalink raw reply

* [PATCH 21/21] userfaultfd: add userfaultfd_wp mm helpers
From: Andrea Arcangeli @ 2015-03-05 17:18 UTC (permalink / raw)
  To: qemu-devel, kvm, linux-kernel, linux-mm, linux-api,
	Android Kernel Team
  Cc: Kirill A. Shutemov, Pavel Emelyanov, Sanidhya Kashyap,
	zhang.zhanghailiang, Linus Torvalds, Andres Lagar-Cavilla,
	Dave Hansen, Paolo Bonzini, Rik van Riel, Mel Gorman,
	Andy Lutomirski, Andrew Morton, Sasha Levin, Hugh Dickins,
	Peter Feiner, Dr. David Alan Gilbert, Christopher Covington,
	Johannes Weiner, Robert Love, Dmitry Adamushko
In-Reply-To: <1425575884-2574-1-git-send-email-aarcange@redhat.com>

These helpers will be used to know if to call handle_userfault() during
wrprotect faults in order to deliver the wrprotect faults to userland.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
---
 include/linux/userfaultfd_k.h | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
index 3c39a4f..81f0d11 100644
--- a/include/linux/userfaultfd_k.h
+++ b/include/linux/userfaultfd_k.h
@@ -65,6 +65,11 @@ static inline bool userfaultfd_missing(struct vm_area_struct *vma)
 	return vma->vm_flags & VM_UFFD_MISSING;
 }
 
+static inline bool userfaultfd_wp(struct vm_area_struct *vma)
+{
+	return vma->vm_flags & VM_UFFD_WP;
+}
+
 static inline bool userfaultfd_armed(struct vm_area_struct *vma)
 {
 	return vma->vm_flags & (VM_UFFD_MISSING | VM_UFFD_WP);
@@ -92,6 +97,11 @@ static inline bool userfaultfd_missing(struct vm_area_struct *vma)
 	return false;
 }
 
+static inline bool userfaultfd_wp(struct vm_area_struct *vma)
+{
+	return false;
+}
+
 static inline bool userfaultfd_armed(struct vm_area_struct *vma)
 {
 	return false;

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox