* Re: [PATCH v12 02/11] lib: kstrtox: add kstrtoudec64() and kstrtodec64()
From: Andy Shevchenko @ 2026-05-12 17:13 UTC (permalink / raw)
To: Rodrigo Alencar
Cc: Andy Shevchenko, Jonathan Cameron, Rodrigo Alencar via B4 Relay,
rodrigo.alencar, linux-kernel, linux-iio, devicetree, linux-doc,
David Lechner, Andy Shevchenko, Lars-Peter Clausen,
Michael Hennerich, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
Jonathan Corbet, Andrew Morton, Petr Mladek, Steven Rostedt,
Rasmus Villemoes, Sergey Senozhatsky, Shuah Khan, David Laight
In-Reply-To: <q4rmlkgecvztnvjg7b7wtqyvhdy7uxgaouvhae2mlsxaasasbf@dfakp4m5l5sl>
On Tue, May 12, 2026 at 05:35:59PM +0100, Rodrigo Alencar wrote:
> On 26/05/12 06:21PM, Andy Shevchenko wrote:
> > On Tue, May 12, 2026 at 6:11 PM Rodrigo Alencar
> > <455.rodrigo.alencar@gmail.com> wrote:
> > > On 26/05/12 05:43PM, Andy Shevchenko wrote:
> > > > On Tue, May 12, 2026 at 03:12:24PM +0100, Rodrigo Alencar wrote:
> > > > > On 26/05/12 04:48PM, Andy Shevchenko wrote:
> > > > > > On Tue, May 12, 2026 at 02:21:14PM +0100, Rodrigo Alencar wrote:
> > > > > > > On 26/05/12 04:12PM, Andy Shevchenko wrote:
> > > > > > > > On Tue, May 12, 2026 at 12:39:53PM +0100, Jonathan Cameron wrote:
> > > > > > > > > On Sun, 10 May 2026 13:42:20 +0100
> > > > > > > > > Rodrigo Alencar via B4 Relay <devnull+rodrigo.alencar.analog.com@kernel.org> wrote:
> > > > > > > > >
> > > > > > > > > > > Add helpers that parse decimal numbers into a 64-bit number, i.e., decimal
> > > > > > > > > > point numbers with pre-defined scale are parsed into a 64-bit value (fixed
> > > > > > > > > > precision). After the decimal point, digits beyond the specified scale
> > > > > > > > > > are ignored.
> > > > > > > > >
> > > > > > > > > Whilst Rodrigo has already replied to say there will be another version
> > > > > > > > > I'd like to request final feedback from those who were involved in the parser
> > > > > > > > > discussions.
> > > > > > > > >
> > > > > > > > > They got very involved and I'm far from an expert in the right way to do
> > > > > > > > > this stuff.
> > > > > > > > >
> > > > > > > > > I don't think David Laight was +CC so I've added that.
> > > > > > > > > David, Andy - I think you two were most involved in that discussion:
> > > > > > > > > Any objections to the end result?
> > > > > > > >
> > > > > > > > I already said a few times about the naming. I do not like the kstrto*()
> > > > > > > > be semantically different on how they treat the input. Second point is
> > > > > > > > to avoid code duplication, but this one is less of a concern since the
> > > > > > > > new code is in the library close to the other potentially duplicate code
> > > > > > > > piece and hence can be addressed later.
> > > > > > >
> > > > > > > I suppose I reached into kstrtodec64() and kstrtoudec64() because it aligns
> > > > > > > with your expectations for kstrto*() semantics, no? Those include:
> > > > > > > - overflow check;
> > > > > > > - extensive input validation;
> > > > > > > - optional '\n' in the end;
> > > > > > > - mandatory nul-termination.
> > > > > > >
> > > > > > > am I missing anything?
> > > > > >
> > > > > > When we add scale we basically make that not true. Moreover the code in this
> > > > > > patch makes scale == number_of_characters which I think a bit fragile, however
> > > > > > it's about the fractional part when the amount of digits is equal to scale.
> > > > >
> > > > > That is not really the case. It is being set as a limit, so it does check for
> > > > > truncation and zero-padding.
> > > >
> > > > I do not see it happens in _parse_integer_limit(). It doesn't try to parse more
> > > > characters than it's requested in max_chars. It doesn't check if there are more
> > > > character nor their converted values.
> > > >
> > > > > > To make this work as expected we need to add an additional call like
> > > > > > kstrtoull() (and perhaps drop that \n and NUL-terminator checks) and see
> > > > > > if that overflows or not. Since it's a fractional part it must have less
> > > > > > than 20 (decimal) digits there, so we check the rv (or how many digits
> > > > > > were parsed successfully) and compare to 20. If it's more, we got too many
> > > > > > decimal digits.
> > > > >
> > > > > For overflow it checks the KSTRTOX_OVERFLOW flag and leverages check_mul_overflow()
> > > > > and check_add_overflow() when combining fractional and integer parts. The amount
> > > > > of characters is not really important there. The scale cannot be bigger than 19 and
> > > > > that makes sure that int_pow() does not overflow. The code uses _parse_integer_limit()
> > > > > due to the nature of input and to avoid 64-bit division, kstrtoull() at any point
> > > > > (parsing integer or fractional parts) does not make much sense.
> > > >
> > > > Under 'like kstrtoull()' I meant something that repeats needed functionality.
> > > > I believe it's parse_integer() (without limit).
> > >
> > > I think we are going in circles here and we could look at the code instead:
> > > - integer parsing with _parse_integer()
> > > - overflow check and validation of the return value
> > > - fractional parsing with _parse_integer_limit()
> > > - overflow check and validation of the return value
> >
> > No, this is not fully true. That's what my whole point is about. The
> > max_chars parameter limits the input check, then it skips an arbitrary
> > number of digits and only *then* it checks for \n and \0. What will be
> > the result of the
> > 0.00000000000000000000000000000000423 in your case? Whatever scale you
> > gave it will return 0 without checking on how many digits were
> > supplied.
>
> I suppose that is a valid input and 0 is the expected result there.
>
> > All the same for 0.9999999999999999999999999999999000423. My
> > point is that we should limit this by 19 digits.
>
> why we need to limit by 19? Digits beyond the scale carry no value...
...only if they are all 0:s.
> just like leading zeros to the integer part (which is also accepted by
> kstrtoull() when parsing with base 10). Not sure why this is invalid input.
See above. I agree on truncating trailing 0:s as is done for leading ones
in the integer part, but if any digit beyond the 19th is not 0, it's an
overflow condition (or bad input, depending on how strict the rules are).
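To make the rule I have in mind concrete, here is a minimal userspace sketch.
It is not the code from the patch: parse_fraction() and FRAC_MAX_DIGITS are
hypothetical names, and returning -ERANGE for a non-zero digit past the 19th
is only one possible choice (-EINVAL would be the "bad input" reading).
Digits between the scale and the 19th are truncated here; a stricter variant
could apply the same zero check right after the scale.

```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>

#define FRAC_MAX_DIGITS 19	/* assumption: 10^19 - 1 still fits in a u64 */

/*
 * Hypothetical helper: parse the digits that follow the decimal point
 * into a value scaled by 10^scale.
 * - fewer digits than scale: zero-padded on the right
 * - digits between scale and the 19th: truncated
 * - digits past the 19th: accepted only if they are '0'
 * - optional '\n' at the end, then mandatory NUL
 */
static int parse_fraction(const char *s, unsigned int scale, uint64_t *res)
{
	uint64_t val = 0;
	unsigned int n = 0;

	if (scale > FRAC_MAX_DIGITS)
		return -EINVAL;
	if (!isdigit((unsigned char)*s))
		return -EINVAL;

	for (; isdigit((unsigned char)*s); s++, n++) {
		if (n < scale)
			val = val * 10 + (uint64_t)(*s - '0');
		else if (n >= FRAC_MAX_DIGITS && *s != '0')
			return -ERANGE;
	}

	if (*s == '\n')
		s++;
	if (*s)
		return -EINVAL;

	/* zero-pad when fewer digits than scale were supplied */
	for (; n < scale; n++)
		val *= 10;

	*res = val;
	return 0;
}
```

With this, a fraction of 20 zeros followed by 423 is rejected whatever the
scale is, while the same input padded only with trailing zeros is accepted.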
> > On top of that, what about -0.9(19 times) ? the fraction should be u64
> > in this case and it's fine. The sign applies to the combined value.
>
> yes, the range for signed values is verified later.
> > > - extra scaling and truncation happening outside if needed.
> >
> > Right, but the given input may be way too long and still needs more validation.
>
> What is the problem with a long input of digits?
> C compiler does not complain about this when parsing a float value,
> Python does not complain about this when parsing floats or decimals either.
Because there is an exponent limit, and for double it's something like 1e308
IIRC; try 1024 digits to be sure.
Python most likely uses a big-number library, so you can't compare it with
this at all.
> > > - check for input termination
> > > - combination of integer and fractional parts with check_mul_overflow() and check_add_overflow()
> > >
> > > > > > Maybe I'm missing these checks already performed?
> > > > > >
> > > > > > > > Having the test cases is a big benefit, and that part I like the most.
--
With Best Regards,
Andy Shevchenko
^ permalink raw reply
* Re: [PATCH v3 0/3] Documentation: security-bugs: new updates covering triage and AI
From: Jonathan Corbet @ 2026-05-12 17:14 UTC (permalink / raw)
To: Willy Tarreau, greg
Cc: Leon Romanovsky, skhan, security, workflows, linux-doc,
linux-kernel, Willy Tarreau
In-Reply-To: <20260509094755.2838-1-w@1wt.eu>
Willy Tarreau <w@1wt.eu> writes:
> This series tries to translate recent discussions on the security list
> on how to better handle reports. It details:
> - when not to Cc: the security list
> - what classes of bugs do not need to be handled privately
> - minimum requirements for AI-assisted reports
>
> As usual, this is probably perfectible but can already help in the short
> term as we can point it to reporters, so barring any strong disagreement,
> better continue to proceed in small incremental improvements and observe
> the effects.
OK, I've applied the series to docs-fixes; after a short exposure in
linux-next I'll ship it Linusward.
I have a couple of comments on the individual changes that might merit
an eventual add-on patch.
Thanks,
jon
^ permalink raw reply
* Re: [PATCH v13 3/4] gpio: rpmsg: add generic rpmsg GPIO driver
From: Shah, Tanmay @ 2026-05-12 17:19 UTC (permalink / raw)
To: Mathieu Poirier, tanmay.shah
Cc: Arnaud POULIQUEN, Beleswar Prasad Padhi, Shenwei Wang,
Andrew Lunn, Linus Walleij, Bartosz Golaszewski, Jonathan Corbet,
Rob Herring, Krzysztof Kozlowski, Conor Dooley, Bjorn Andersson,
Frank Li, Sascha Hauer, Shuah Khan, linux-gpio@vger.kernel.org,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
Pengutronix Kernel Team, Fabio Estevam, Peng Fan,
devicetree@vger.kernel.org, linux-remoteproc@vger.kernel.org,
imx@lists.linux.dev, linux-arm-kernel@lists.infradead.org,
dl-linux-imx, Bartosz Golaszewski
In-Reply-To: <agNKHsAEscc1TDiq@p14s>
On 5/12/2026 10:41 AM, Mathieu Poirier wrote:
> On Mon, May 11, 2026 at 04:35:46PM -0500, Shah, Tanmay wrote:
>>
>>
>> On 5/11/2026 12:58 PM, Mathieu Poirier wrote:
>>> On Mon, 11 May 2026 at 10:47, Shah, Tanmay <tanmays@amd.com> wrote:
>>>>
>>>>
>>>>
>>>> On 5/5/2026 10:52 AM, Shah, Tanmay wrote:
>>>>>
>>>>>
>>>>> On 5/5/2026 4:28 AM, Arnaud POULIQUEN wrote:
>>>>>> Hi Tanmay,
>>>>>>
>>>>>> On 5/4/26 21:19, Shah, Tanmay wrote:
>>>>>>>
>>>>>>> Hello all,
>>>>>>>
>>>>>>> I have started reviewing this work as well.
>>>>>>> Thanks Shenwei for this work.
>>>>>>>
>>>>>>> I have gone through only the current revision, and would like to provide
>>>>>>> idea on how to achieve GPIO number multiplexing with the RPMsg protocol.
>>>>>>> Also, have some bindings related question.
>>>>>>>
>>>>>>> Please see below:
>>>>>>>
>>>>>>> On 4/30/2026 11:40 AM, Arnaud POULIQUEN wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 4/30/26 14:56, Beleswar Prasad Padhi wrote:
>>>>>>>>> Hello Arnaud,
>>>>>>>>>
>>>>>>>>> On 30/04/26 13:05, Arnaud POULIQUEN wrote:
>>>>>>>>>> Hello,
>>>>>>>>>>
>>>>>>>>>> On 4/29/26 21:20, Mathieu Poirier wrote:
>>>>>>>>>>> On Wed, 29 Apr 2026 at 12:07, Padhi, Beleswar <b-padhi@ti.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Hi Mathieu,
>>>>>>>>>>>>
>>>>>>>>>>>> On 4/29/2026 11:03 PM, Mathieu Poirier wrote:
>>>>>>>>>>>>> On Wed, 29 Apr 2026 at 10:53, Shenwei Wang <shenwei.wang@nxp.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>>>> From: Mathieu Poirier <mathieu.poirier@linaro.org>
>>>>>>>>>>>>>>> Sent: Wednesday, April 29, 2026 10:42 AM
>>>>>>>>>>>>>>> To: Shenwei Wang <shenwei.wang@nxp.com>
>>>>>>>>>>>>>>> Cc: Andrew Lunn <andrew@lunn.ch>; Padhi, Beleswar <b-
>>>>>>>>>>>>>>> padhi@ti.com>; Linus
>>>>>>>>>>>>>>> Walleij <linusw@kernel.org>; Bartosz Golaszewski
>>>>>>>>>>>>>>> <brgl@kernel.org>; Jonathan
>>>>>>>>>>>>>>> Corbet <corbet@lwn.net>; Rob Herring <robh@kernel.org>;
>>>>>>>>>>>>>>> Krzysztof Kozlowski
>>>>>>>>>>>>>>> <krzk+dt@kernel.org>; Conor Dooley <conor+dt@kernel.org>; Bjorn
>>>>>>>>>>>>>>> Andersson
>>>>>>>>>>>>>>> <andersson@kernel.org>; Frank Li <frank.li@nxp.com>; Sascha Hauer
>>>>>>>>>>>>>>> <s.hauer@pengutronix.de>; Shuah Khan
>>>>>>>>>>>>>>> <skhan@linuxfoundation.org>; linux-
>>>>>>>>>>>>>>> gpio@vger.kernel.org; linux-doc@vger.kernel.org; linux-
>>>>>>>>>>>>>>> kernel@vger.kernel.org;
>>>>>>>>>>>>>>> Pengutronix Kernel Team <kernel@pengutronix.de>; Fabio Estevam
>>>>>>>>>>>>>>> <festevam@gmail.com>; Peng Fan <peng.fan@nxp.com>;
>>>>>>>>>>>>>>> devicetree@vger.kernel.org; linux-remoteproc@vger.kernel.org;
>>>>>>>>>>>>>>> imx@lists.linux.dev; linux-arm-kernel@lists.infradead.org; dl-
>>>>>>>>>>>>>>> linux-imx <linux-
>>>>>>>>>>>>>>> imx@nxp.com>; Bartosz Golaszewski <brgl@bgdev.pl>
>>>>>>>>>>>>>>> Subject: [EXT] Re: [PATCH v13 3/4] gpio: rpmsg: add generic
>>>>>>>>>>>>>>> rpmsg GPIO driver
>>>>>>>>>>>>>>> On Tue, Apr 28, 2026 at 03:24:59PM +0000, Shenwei Wang wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>>>>>> From: Andrew Lunn <andrew@lunn.ch>
>>>>>>>>>>>>>>>>> Sent: Monday, April 27, 2026 3:49 PM
>>>>>>>>>>>>>>>>> To: Shenwei Wang <shenwei.wang@nxp.com>
>>>>>>>>>>>>>>>>> Cc: Padhi, Beleswar <b-padhi@ti.com>; Linus Walleij
>>>>>>>>>>>>>>>>> <linusw@kernel.org>; Bartosz Golaszewski <brgl@kernel.org>;
>>>>>>>>>>>>>>>>> Jonathan
>>>>>>>>>>>>>>>>> Corbet <corbet@lwn.net>; Rob Herring <robh@kernel.org>;
>>>>>>>>>>>>>>>>> Krzysztof
>>>>>>>>>>>>>>>>> Kozlowski <krzk+dt@kernel.org>; Conor Dooley
>>>>>>>>>>>>>>>>> <conor+dt@kernel.org>;
>>>>>>>>>>>>>>>>> Bjorn Andersson <andersson@kernel.org>; Mathieu Poirier
>>>>>>>>>>>>>>>>> <mathieu.poirier@linaro.org>; Frank Li <frank.li@nxp.com>;
>>>>>>>>>>>>>>>>> Sascha
>>>>>>>>>>>>>>>>> Hauer <s.hauer@pengutronix.de>; Shuah Khan
>>>>>>>>>>>>>>>>> <skhan@linuxfoundation.org>; linux-gpio@vger.kernel.org; linux-
>>>>>>>>>>>>>>>>> doc@vger.kernel.org; linux-kernel@vger.kernel.org; Pengutronix
>>>>>>>>>>>>>>>>> Kernel Team <kernel@pengutronix.de>; Fabio Estevam
>>>>>>>>>>>>>>>>> <festevam@gmail.com>; Peng Fan <peng.fan@nxp.com>;
>>>>>>>>>>>>>>>>> devicetree@vger.kernel.org; linux- remoteproc@vger.kernel.org;
>>>>>>>>>>>>>>>>> imx@lists.linux.dev; linux-arm- kernel@lists.infradead.org;
>>>>>>>>>>>>>>>>> dl-linux-imx <linux-imx@nxp.com>; Bartosz Golaszewski
>>>>>>>>>>>>>>>>> <brgl@bgdev.pl>
>>>>>>>>>>>>>>>>> Subject: [EXT] Re: [PATCH v13 3/4] gpio: rpmsg: add generic
>>>>>>>>>>>>>>>>> rpmsg
>>>>>>>>>>>>>>>>> GPIO driver
>>>>>>>>>>>>>>>>>>> struct virtio_gpio_response {
>>>>>>>>>>>>>>>>>>> __u8 status;
>>>>>>>>>>>>>>>>>>> __u8 value;
>>>>>>>>>>>>>>>>>>> };
>>>>>>>>>>>>>>>>>> It is the same message format. Please see the message
>>>>>>>>>>>>>>>>>> definition
>>>>>>>>>>>>>>>>> (GET_DIRECTION) below:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> + +-----+-----+-----+-----+-----+----+
>>>>>>>>>>>>>>>>>> + |0x00 |0x01 |0x02 |0x03 |0x04 |0x05|
>>>>>>>>>>>>>>>>>> + | 1 | 2 |port |line | err | dir|
>>>>>>>>>>>>>>>>>> + +-----+-----+-----+-----+-----+----+
>>>>>>>>>>>>>>>>> Sorry, but i don't see how two u8 vs six u8 are the same
>>>>>>>>>>>>>>>>> message format.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Some changes to the message format are necessary.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Virtio uses two communication channels (virtqueues): one for
>>>>>>>>>>>>>>>> requests and
>>>>>>>>>>>>>>> replies, and a second one for events.
>>>>>>>>>>>>>>>> In contrast, rpmsg provides only a single communication
>>>>>>>>>>>>>>>> channel, so a
>>>>>>>>>>>>>>>> type field is required to distinguish between different kinds
>>>>>>>>>>>>>>>> of messages.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Since rpmsg replies and events share the same message format,
>>>>>>>>>>>>>>>> an additional
>>>>>>>>>>>>>>> line is introduced to handle both cases.
>>>>>>>>>>>>>>>> Finally, rpmsg supports multiple GPIO controllers, so a port
>>>>>>>>>>>>>>>> field is added to
>>>>>>>>>>>>>>> uniquely identify the target controller.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I have commented on this before - RPMSG is already providing
>>>>>>>>>>>>>>> multiplexing
>>>>>>>>>>>>>>> capability by way of endpoints. There is no need for a port
>>>>>>>>>>>>>>> field. One endpoint,
>>>>>>>>>>>>>>> one GPIO controller.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> You still need a way to let the remote side know which port the
>>>>>>>>>>>>>> endpoint maps to, either
>>>>>>>>>>>>>> by embedding the port information in the message (the current
>>>>>>>>>>>>>> way), or by sending it
>>>>>>>>>>>>>> separately.
>>>>>>>>>>>>>>
>>>>>>>>>>>>> An endpoint is created with every namespace request. There
>>>>>>>>>>>>> should be
>>>>>>>>>>>>> one namespace request for every GPIO controller, which yields a
>>>>>>>>>>>>> unique
>>>>>>>>>>>>> endpoint for each controller and eliminates the need for an extra
>>>>>>>>>>>>> field to identify them.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Right, but this can still be done by just having one namespace
>>>>>>>>>>>> request.
>>>>>>>>>>>> We can create new endpoints bound to an existing namespace/
>>>>>>>>>>>> channel by
>>>>>>>>>>>> invoking rpmsg_create_ept(). This is what I suggested here too:
>>>>>>>>>>>> https://lore.kernel.org/all/29485742-6e49-482e-b73d-228295daaeec@ti.com/
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I will look at your suggestion (i.e link above) later this week or
>>>>>>>>>>> next week.
>>>>>>>>>>>
>>>>>>>>>>>> My mental model looks like this for the complete picture:
>>>>>>>>>>>>
>>>>>>>>>>>> 1. namespace/channel#1 = rpmsg-io
>>>>>>>>>>>> a. ept1 -> gpio-controller@1
>>>>>>>>>>>> b. ept2 -> gpio-controller@2
>>>>>>>>>>>>
>>>>>>>
>>>>>>> If my understanding of what gpio-controller is right, then this won't
>>>>>>> work. We need one rpmsg channel per gpio-controller, and in most cases
>>>>>>> there will be only one GPIO-controller on the remote side. If there are
>>>>>>> multiple controllers or multiple instances of the same controller, then
>>>>>>> we need a separate channel name for that controller, just like we would
>>>>>>> have a separate device on Linux.
>>>>>>
>>>>>> As done in the rpmsg_tty driver it could be instantiated several times with
>>>>>> the same channel/service name. This would imply a specific rpmsg to
>>>>>> retrieve the gpio controller index from the remote side.
>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I've asked for one endpoint per GPIO controller since the very
>>>>>>>>>>> beginning. I don't yet have a strong opinion on whether to use one
>>>>>>>>>>> namespace request per GPIO controller or a single request that spins
>>>>>>>>>>> off multiple endpoints. I'll have to look at your link and
>>>>>>>>>>> reflect on
>>>>>>>>>>> that. Regardless of how we proceed on that front, multiplexing needs
>>>>>>>>>>> to happen at the endpoint level rather than the packet level.
>>>>>>>>>>> This is
>>>>>>>>>>> the only way this work can move forward.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I would be more in favor of Mathieu’s proposal: “An endpoint is
>>>>>>>>>> created with every namespace request.”
>>>>>>>>>>
>>>>>>>>>> If the endpoint is created only on the Linux side, how do we match
>>>>>>>>>> the Linux endpoint address with the local port field on the remote
>>>>>>>>>> side?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Simply by sending a message to the remote containing the newly created
>>>>>>>>> endpoint and the port idx. Note that this is done just one time; after
>>>>>>>>> this, Linux need not have the port field in the message every time it's
>>>>>>>>> sending a message.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> With a multi-namespace approach, the namespace could be rpmsg-io-
>>>>>>>>>> [addr], where [addr] corresponds to the GPIO controller address in
>>>>>>>>>> the DT. This would:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> You will face the same problem in this case also that you asked above:
>>>>>>>>> "how do we match the Linux endpoint address with the local port field
>>>>>>>>> on the remote side?"
>>>>>>>>
>>>>>>>> Sorry I probably introduced confusion here
>>>>>>>> my sentence should be;
>>>>>>>> With a multi-namespace approach, the namespace could be rpmsg-io-
>>>>>>>> [port],
>>>>>>>> where [port] corresponds to the GPIO controller port in the DT.
>>>>>>>>
>>>>>>>>
>>>>>>>> For instance:
>>>>>>>>
>>>>>>>> rpmsg {
>>>>>>>> rpmsg-io {
>>>>>>>> #address-cells = <1>;
>>>>>>>> #size-cells = <0>;
>>>>>>>>
>>>>>>>> gpio@25 {
>>>>>>>> compatible = "rpmsg-gpio";
>>>>>>>> reg = <25>;
>>>>>>>> gpio-controller;
>>>>>>>> #gpio-cells = <2>;
>>>>>>>> #interrupt-cells = <2>;
>>>>>>>> interrupt-controller;
>>>>>>>> };
>>>>>>>>
>>>>>>>> gpio@32 {
>>>>>>>> compatible = "rpmsg-gpio";
>>>>>>>> reg = <32>;
>>>>>>>> gpio-controller;
>>>>>>>> #gpio-cells = <2>;
>>>>>>>> #interrupt-cells = <2>;
>>>>>>>> interrupt-controller;
>>>>>>>> };
>>>>>>>> };
>>>>>>>> };
>>>>>>>>
>>>>>>>> rpmsg-io-25 would match with gpio@25
>>>>>>>> rpmsg-io-32 would match with gpio@32
>>>>>>>>
>>>>>>>
>>>>>>> The problem with this approach is, we will end up creating way too many
>>>>>>> RPMsg devices/channels, i.e. one channel per GPIO. That limits how
>>>>>>> many GPIOs can be handled by the remote from a memory perspective. At
>>>>>>> some point we might just run out of the number of epts & channels created
>>>>>>> by the remote. As of now, the open-amp library supports 128 epts I think.
>>>>>>
>>>>>> Right, I proposed a solution in my previous answer to Beleswar who has
>>>>>> the same concern.
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Because the endpoint that is created on a namespace request is also
>>>>>>>>> dynamic in nature. How will the remote know which endpoint addr
>>>>>>>>> Linux allocated for a namespace that it announced?
>>>>>>>>>
>>>>>>>>> As an example/PoC, I created a firmware example which announces
>>>>>>>>> 2 name services to Linux, one is the standard "rpmsg_chrdev" and
>>>>>>>>> the other is a TI specific name service "ti.ipc4.ping-pong". You can
>>>>>>>>> see it created 2 different addresses (0x400 and 0x401) for each of
>>>>>>>>> the name service request from the same firmware:
>>>>>>>>>
>>>>>>>>> root@j784s4-evm:~# dmesg | grep virtio0 | grep -i channel
>>>>>>>>> [ 9.290275] virtio_rpmsg_bus virtio0: creating channel ti.ipc4.ping-pong addr 0xd
>>>>>>>>> [ 9.311230] virtio_rpmsg_bus virtio0: creating channel rpmsg_chrdev addr 0xe
>>>>>>>>> [ 9.496645] rpmsg_chrdev virtio0.rpmsg_chrdev.-1.14: DEBUG: Channel formed from src = 0x400 to dst = 0xe
>>>>>>>>> [ 9.707255] rpmsg_client_sample virtio0.ti.ipc4.ping-pong.-1.13: new channel: 0x401 -> 0xd!
>>>>>>>>>
>>>>>>>>> So in this case, rpmsg-io-1 can have different ept addr than rpmsg-io-2
>>>>>>>>> Back to same problem. Simple solution is to reply to remote with the
>>>>>>>>> created ept addr and the index.
>>>>>>>>
>>>>>>>> That why I would like to suggest to use the name service field to
>>>>>>>> identify the port/controller, instead of the endpoint address.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> - match the RPMsg probe with the DT,
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> We can probe from all controllers with a single name service
>>>>>>>>> announcement too.
>>>>>>>>>
>>>>>>>>>> - provide a simple mapping between the port and the endpoint on both
>>>>>>>>>> sides,
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> We are trying to get rid of this mapping from Linux side to adapt
>>>>>>>>> the gpio-virtio design.
>>>>>>>>>
>>>>>>>>>> - allow multiple endpoints on the remote side,
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> We can support this as well with single nameservice model.
>>>>>>>>> There is no limitation. Remote has to send a message with
>>>>>>>>> its newly created ept that's all.
>>>>>>>>>
>>>>>>>>>> - provide a simple discovery mechanism for remote capabilities.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> A single announcement: "rpmsg-io" is also discovery mechanism.
>>>>>>>>>
>>>>>>>>> Feel free to let me know if you have concerns with any of the
>>>>>>>>> suggestions!
>>>>>>>>
>>>>>>>> My only concern, whatever the solution, is that we find a smart
>>>>>>>> solution to associate the correct endpoint with the correct GPIO
>>>>>>>> port/controller defined in the DT.
>>>>>>>>
>>>>>>>> I may have misunderstood your solution. Could you please help me
>>>>>>>> understand your proposal by explaining how you would handle three
>>>>>>>> GPIO ports defined in the DT, considering that the endpoint
>>>>>>>> addresses on the Linux side can be random?
>>>>>>>> If I assume there is a unique endpoint on the remote side,
>>>>>>>> I do not understand how you can match, on the firmware side,
>>>>>>>> the Linux endpoint address to the GPIO port.
>>>>>>>>
>>>>>>>> Thanks and Regards,
>>>>>>>> Arnaud
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Beleswar
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Arnaud
>>>>>>>>>>
>>>>>>>>>>>> 2. namespace/channel#2 = rpmsg-i2c
>>>>>>>>>>>> a. ept1 -> i2c@1
>>>>>>>>>>>> b. ept2 -> i2c@2
>>>>>>>>>>>> c. ept3 -> i2c@3
>>>>>>>>>>>>
>>>>>>>>>>>> etc...
>>>>>>>>>>>>
>>>>>>>
>>>>>>> Just want to clear-up few terms before I jump to the solution:
>>>>>>>
>>>>>>> **RPMsg channel/device**:
>>>>>>> - These are devices announced by the remote processor, and created by
>>>>>>> linux. They are created at: /sys/bus/rpmsg/devices
>>>>>>> - The channel format: <name>.<src ept>.<dst ept>
>>>>>>>
>>>>>>> **RPMsg endpoint**:
>>>>>>> - An endpoint is different from a channel. A single channel can have
>>>>>>> multiple endpoints, represented in Linux by /dev/rpmsg? devices.
>>>>>>>
>>>>>>> To create endpoint device, we have rpmsg_create_ept API, which takes
>>>>>>> channel information as input, which has src-ept, dst-ept.
>>>>>>>
>>>>>>> Following is proposed solution:
>>>>>>>
>>>>>>> 1) Assign RPMsg channel/device per rpmsg-gpio controller (Not per GPIO
>>>>>>> pin/port).
>>>>>>> - In our case that would be, single rpmsg-io node. (That makes me
>>>>>>> question if bindings are correct or not).
>>>>>>>
>>>>>>> 2) Assign GPIO number as src ept.
>>>>>>>
>>>>>>> i.e. *rpmsg-io.<GPIO number>.<dst ept>*. Do not randomly assign src
>>>>>>> endpoint.
>>>>>>>
>>>>>>> Now, RPMSG channel by spec reserves first 1024 endpoints [1], so we can
>>>>>>> add 1024 offset to the GPIO number:
>>>>>>>
>>>>>>> so, when calling rpmsg_create_ept() API, we assign src_endpoint as:
>>>>>>> (GPIO_NUMBER + RPMSG_RESERVED_ADDRESSES)
>>>>>>>
>>>>>>> Now on the remote side, there is single channel and only single-endpoint
>>>>>>> is needed that is mapped to the rpmsg-io channel callback.
>>>>>>>
>>>>>>> That callback will receive all the payloads from the Linux, which will
>>>>>>> have src-ept i.e. (RPMSG_RESERVED_ADDRESSES + GPIO_NUMBER).
>>>>>>
>>>>>>
>>>>>> Interesting approach. I also tried to find a similar solution.
>>>>>>
>>>>>> The question here is: how can we guarantee continuous addresses? Given
>>>>>> the static and dynamic allocation of endpoint addresses that are
>>>>>> implemented, my conclusion was that it is not reliable enough.
>>>>>>
>>>>>> but perhaps I missed something...
>>>>>>
>>>>>>>
>>>>>>> It can retrieve GPIO_NUMBER easily, and convert to appropriate pin based
>>>>>>> on platform specific logic.
>>>>>>>
>>>>>>> This doesn't need PORT information at all. Also it makes sure that
>>>>>>> remote is using only single-endpoint so not much memory is used.
>>>>>>>
>>>>>>> *Example*:
>>>>>>> If only the rpmsg-gpio channel is created by the remote side, then the
>>>>>>> following is the representation of the devices when GPIO 25, 26, 27 are
>>>>>>> assigned to the rpmsg-io controller:
>>>>>>>
>>>>>>> Linux Remote
>>>>>>>
>>>>>>> rpmsg-channel: rpmsg-gpio.0x400.0x400
>>>>>>>
>>>>>>> /dev/rpmsg0 - GPIO25 ept (rpmsg-gpio.0x419.0x400)-|
>>>>>>> |
>>>>>>> /dev/rpmsg1 - GPIO26 ept (rpmsg-gpio.0x41a.0x400)-|-> rpmsg-gpio.*.0x400
>>>>>>> |
>>>>>>> /dev/rpmsg2 - GPIO27 ept (rpmsg-gpio.0x41b.0x400)-| 0x400 ept callback.
>>>>>>>
>>>>>>>
>>>>>>> *On remote side*:
>>>>>>>
>>>>>>> ept_0x400_callback(..., int src_ept, ...,)
>>>>>>> {
>>>>>>> int gpio_num = src_ept - RPMSG_RESERVED_ADDRESSES;
>>>>>>> // platform specific logic to convert gpio num to proper pin,
>>>>>>> // just like you would convert gpio num to pin on a linux gpio
>>>>>>> // controller.
>>>>>>> }
>>>>>>>
>>>>>>> My question on the binding:
>>>>>>>
>>>>>>> Why each GPIO is represented with the separate node? I think rpmsg-gpio
>>>>>>> can be represented just any other GPIO controller? Please let me know if
>>>>>>> I am missing something. So rpmsg channel/rpmsg device is not created per
>>>>>>> GPIO, but per controller. GPIO number multiplexing should be done with
>>>>>>> rpmsg src ept, that removes the need of having each GPIO as a separate
>>>>>>> node.
>>>>>>>
>>>>>>>
>>>>>>> rpmsg_gpio: rpmsg-gpio@0 {
>>>>>>> compatible = "rpmsg-gpio";
>>>>>>> reg = <0>;
>>>>>>> gpio-controller;
>>>>>>> #gpio-cells = <2>;
>>>>>>> #interrupt-cells = <2>;
>>>>>>> interrupt-controller;
>>>>>>> };
>>>>>>>
>>>>>>> Then in DT, use like regular GPIO, but with the rpmsg-gpio controller:
>>>>>>>
>>>>>>> rpmsg-gpios = <&rpmsg_gpio (GPIO NUM) (flags)>;
>>>>>>>
>>>>>>> If the intent to create separate gpio nodes was only for the channel
>>>>>>> creation, then it's not really needed.
>>>>>>>
>>>>>>> [1]
>>>>>>> https://github.com/torvalds/linux/blob/6d35786de28116ecf78797a62b84e6bf3c45aa5a/drivers/rpmsg/virtio_rpmsg_bus.c#L136
>>>>>>>
>>>>>>
>>>>>> It is already the case. bindings declare GPIO controllers, not directly
>>>>>> GPIOs in:
>>>>>>
>>>>>> [PATCH v13 2/4] dt-bindings: remoteproc: imx_rproc: Add "rpmsg" subnode
>>>>>> support
>>>>>>
>>>>>> The discussion is around having an unique RPmsg endpoint for all
>>>>>> GPIO controller or one RPmsg endpoint per GPIO controller.
>>>>>>
>>>>>
>>>>> Endpoint where remote side or linux side?
>>>>>
>>>>> If unique endpoint on remote side per gpio controller then it makes sense.
>>>>>
>>>>> Unique endpoint on linux side doesn't make sense. Instead, unique
>>>>> channel per gpio controller makes sense, and each channel will have
>>>>> multiple endpoints on linux side. As I replied to Beleswar in the other
>>>>> email, I will copy-paste my answer here too:
>>>>>
>>>>>
>>>>> To be more specific:
>>>>>
>>>>> Linux: remote:
>>>>>
>>>>> ch1: rpmsg-gpio.-1.1024 -> gpio-controller@1024
>>>>> - gpio-line ept1
>>>>> - gpio-line ept2 -> They all map to same callback_ept_1024.
>>>>> - gpio-line ept3
>>>>>
>>>>> ch2: rpmsg-gpio.-1.1025 -> gpio-controller@1025
>>>>> - gpio-line ept1
>>>>> - gpio-line ept2 -> They all map to same callback_ept_1025.
>>>>> - gpio-line ept3
>>>>>
>>>>
>>>>
>>>> Hi Mathieu,
>>>>
>>>> So upon more brainstorming on this approach I found a limitation:
>>>>
>>>> This approach won't work if host OS is any other OS but Linux. For
>>>> example, if the remote OS is zephyr/baremetal using open-amp, then Only
>>>> Linux <-> zephyr combination will work, and we won't be able to re-use
>>>> this approach for zephyr <-> zephyr use case. The concept of rpmsg
>>>> channel/device exist only in the linux kernel implementation. This
>>>> brings another question: Should the protocol we decide work on other use
>>>> cases as well? Or Linux must be the Host OS for this protocol ?
>>>>
>>>
>>> Linux and Zephyr are very distinct OS, each with their own subsystems
>>> and characteristics. The design we choose here involves RPMSG and,
>>> inherently, Linux. We can't make decisions based on what may
>>> potentially happen in Zephyr.
>>>
>>>>
>>>> I think your & Arnaud's proposed approach of single endpoint per
>>>> gpio-controller on both side makes more sense, as it will work
>>>> regardless of any OS on host or remote side.
>>>>
>>>
>>> Arnaud, Beleswar, Andrew and I are all advocating for one endpoint per
>>> GPIO controller. The remaining issue is about the best way to work
>>> out source and destination addresses between Linux and the remote
>>> processor. I'm running out of time for today but I'll return to this
>>> thread with a final analysis by the end of the week.
>>>
>>
>> Okay. Then that means multiple endpoints on Linux side can be considered.
>
> If there are multiple GPIO controllers then yes, there will be more than one
> endpoint. At this time I do not want to consider other bus architectures (i2c,
> spi, ...) to avoid muddying an already difficult conversation.
>
>>
>> If we decide to go single-endpoint per device on both side, then for
>> that here is the proposal to represent src ept and dst ept:
>
> I do not understand what you mean by "per device" - please be more specific.
>
By "per device" I mean per rpmsg device/channel. In our case that would be
per gpio-controller.
>>
>> When we represent any device under rpmsg bus node, I think it should be
>> considered remote's view of the address space. So ideally we can
>> convert it to Linux view of the address space, via 'ranges' property.
>
> There is no address space to consider since there is no GPIO controller memory
> space to access. All that is done by the driver (remote processor) and
> completely hidden from Linux by rpmsg-virtio-gpio.
>
So IMHO the dt-binding is the representation of the device hardware and
is independent of how the driver will access it. With any gpio-controller
device node, we are just representing what the gpio-controller hardware on
the remote side looks like, and what the corresponding Linux view is.
The rpmsg-gpio driver differs from the platform gpio controller
driver mainly in two ways:
1) How the driver is probed: the rpmsg-gpio driver is probed when the
corresponding rpmsg channel/device name-service announcement arrives
from the remote side.
2) The GPIO ops are not performed on the hardware directly, but
via rpmsg commands on the remote side.
However, the GPIO controller hardware remains the same. So bindings
shouldn't change.
IMHO that means, if I want to move any existing GPIO controller to the
remote side and want the rpmsg-gpio driver to handle it, all I
need to change is the compatible string of the current gpio-controller
device node. The rest of the address space should remain the same, and
the ranges property is left empty. If the remote core has a different view
of the address space, then the device should contain the remote's view and
the parent bus (rpmsg-io bus) should provide the Linux view via the
'ranges' property.
That is just the device hw representation in the device-tree as an rpmsg
device. Same for any other type of controller: i2c, spi etc.
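To make the 'ranges' idea concrete, here is a minimal userspace sketch of the lookup it implies, using the two example ranges quoted further down in this mail. The struct and function names are hypothetical and only for illustration; nothing like this exists in the rpmsg code today, and the "rpmsg-io-bus" binding itself is still just a proposal.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* One 'ranges' entry: <remote_view_addr linux_view_addr size> (hypothetical). */
struct ept_range {
	uint32_t remote_addr;	/* remote view, proposed as dst ept */
	uint32_t host_addr;	/* Linux view, proposed as src ept */
	uint32_t size;
};

/*
 * Translate a remote-view address into the Linux-view address by walking
 * the table. Returns 0 on success, -ENOENT if no entry covers the address.
 */
static int remote_to_host(const struct ept_range *map, size_t n,
			  uint32_t remote, uint32_t *host)
{
	size_t i;

	for (i = 0; i < n; i++) {
		uint32_t off = remote - map[i].remote_addr;

		if (remote >= map[i].remote_addr && off < map[i].size) {
			*host = map[i].host_addr + off;
			return 0;
		}
	}
	return -ENOENT;
}
```

With the example table, remote address 0x10000 would map to 0x50000 and 0x20000 to 0x60000; an address outside both windows is rejected.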
Thanks,
Tanmay
>>
>> So bindings should include 'ranges' property in the parent node. Then
>> linux view of the start address becomes src ept, and remote view of the
>> start address becomes dest ept. The remote view of the start address is
>> expected to be the static src endpoint on the remote side.
>>
>> Following representation of the rpmsg devices (gpio, i2c, spi or any other):
>>
>> rpmsg {
>> 	#address-cells = <1>;
>> 	#size-cells = <1>;
>>
>> 	rpmsg-io {
>> 		compatible = "rpmsg-io-bus";
>> 		ranges = <remote_view_addr(dst ept) linux_view_addr(src ept) size>;
>> 		#address-cells = <1>;
>> 		#size-cells = <1>;
>>
>> 		gpio@remote_view_addr(or dst ept) {
>> 			compatible = "rpmsg-io";
>> 			reg = <remote_view_addr addr_space_size>;
>> 			gpio-controller;
>> 			#gpio-cells = <2>;
>> 			interrupt-controller;
>> 			#interrupt-cells = <2>;
>> 		};
>>
>> 		...
>>
>> 	};
>>
>> };
>>
>> Example device-tree:
>>
>> rpmsg {
>> 	#address-cells = <1>;
>> 	#size-cells = <1>;
>>
>> 	rpmsg-io {
>> 		compatible = "rpmsg-io-bus";
>> 		ranges = <0x10000 0x50000 0x1000>,
>> 			 <0x20000 0x60000 0x1000>;
>> 		#address-cells = <1>;
>> 		#size-cells = <1>;
>>
>> 		gpio@10000 {
>> 			compatible = "rpmsg-io";
>> 			reg = <0x10000 0x1000>;
>> 			gpio-controller;
>> 			#gpio-cells = <2>;
>> 			interrupt-controller;
>> 			#interrupt-cells = <2>;
>> 		};
>>
>> 		gpio@20000 {
>> 			compatible = "rpmsg-io";
>> 			reg = <0x20000 0x1000>;
>> 			gpio-controller;
>> 			#gpio-cells = <2>;
>> 			interrupt-controller;
>> 			#interrupt-cells = <2>;
>> 		};
>>
>> 	};
>>
>> };
>>
>>
>> Thanks,
>> Tanmay
>>
>>
>>>> To be more specific this will look like following:
>>>>
>>>> Host (Linux) Remote (baremetal/RTOS)
>>>>
>>>> rpmsg ch/device 1:
>>>> - rpmsg ept 1 <------> rpmsg ept 1 gpio-controller 0
>>>>
>>>> rpmsg ch/device 2:
>>>> - rpmsg ept 2 <------> rpmsg ept 2 gpio-controller 1
>>>>
>>>>
>>>> The question is, how to decide src ept, and dest ept on both sides?
>>>> I still think it should be static endpoints.
>>>>
>>>> I will get back with more reasoning on that.
>>>>
>>>>> On the remote side, we have to hardcode which rpmsg controller is mapped
>>>>> to which endpoint.
>>>>>
>>>>>> Or did I misunderstand your questions?
>>>>>>
>>>>>> Thanks,
>>>>>> Arnaud
>>>>>>
>>>>>
>>>>>
>>>>> I gave this patch more time yesterday, and I think the 'reg' property
>>>>> should represent remote endpoint, instead of the gpio-controller index.
>>>>>
>>>>> So in this approach remote implementation is expected to provide
>>>>> hard-coded (static) endpoints for each gpio-controller instance, and
>>>>> that same number should be represented with the 'reg' property.
>>>>>
>>>>> On remote side:
>>>>>
>>>>> #define RPMSG_GPIO_0_CONTROLLER_EPT (RPMSG_RESERVED_ADDRESSES + 1) // 1024
>>>>>
>>>>> ept_1024_callback() {
>>>>>
>>>>> // handle appropriate gpio port ()
>>>>>
>>>>> }
>>>>>
>>>>> On linux side:
>>>>>
>>>>> So new representation of controller:
>>>>>
>>>>> rpmsg_gpio_0: gpio@1024 {
>>>>> 	compatible = "rpmsg-gpio";
>>>>> 	reg = <1024>;
>>>>> 	gpio-controller;
>>>>> 	#gpio-cells = <2>;
>>>>> 	#interrupt-cells = <2>;
>>>>> 	interrupt-controller;
>>>>> };
>>>>>
>>>>> rpmsg_gpio_1: gpio@1025 {
>>>>> 	compatible = "rpmsg-gpio";
>>>>> 	reg = <1025>;
>>>>> 	gpio-controller;
>>>>> 	#gpio-cells = <2>;
>>>>> 	#interrupt-cells = <2>;
>>>>> 	interrupt-controller;
>>>>> };
>>>>>
>>>>> gpios = <&rpmsg_gpio_0 (GPIO NUM or PIN) flags>,
>>>>> 	<&rpmsg_gpio_1 (GPIO NUM or PIN) flags>;
>>>>>
>>>>> Now in the linux driver:
>>>>>
>>>>> You can easily retrieve destination endpoint when we want to send the
>>>>> command to the gpio controller via device's "reg" property.
>>>>>
>>>>> This approach provides built-in security as well. Because now
>>>>> gpio-controller instance is hardcoded with the endpoint callback, it
>>>>> can't be modified/addressed without changing the 'reg' property.
>>>>>
>>>>> Just like you wouldn't change device address for the instance of the
>>>>> gpio-controller right?
>>>>>
>>>>> This approach can be easily adapted to all the other rpmsg controllers
>>>>> as well.
>>>>>
>>>>> So, dynamic endpoint allocation doesn't make sense in this case. Dynamic
>>>>> endpoint allocation makes more sense for user-space apps which don't
>>>>> really care about endpoints and only payloads.
>>>>>
>>>>> But, here we are multiplexing device-addresses with endpoints, and so it
>>>>> has to be fixed, and presented via 'reg' property. So, firmware can't
>>>>> change device-address without Linux knowing it.
>>>>>
>>>>> Thanks,
>>>>> Tanmay
>>>>>
>>>>>
>>>>>>
>>>>>>>>>>>> This way device groups are isolated with each channel/namespace, and
>>>>>>>>>>>> instances within each device groups are also respected with specific
>>>>>>>>>>>> endpoints.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Beleswar
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>
* Re: [PATCH v2 08/14] userfaultfd: add UFFDIO_REGISTER_MODE_RWP and UFFDIO_RWPROTECT plumbing
From: Mike Rapoport @ 2026-05-12 17:20 UTC (permalink / raw)
To: Kiryl Shutsemau (Meta)
Cc: akpm, peterx, david, ljs, surenb, vbabka, Liam.Howlett, ziy,
corbet, skhan, seanjc, pbonzini, jthoughton, aarcange, sj,
usama.arif, linux-mm, linux-kernel, linux-doc, linux-kselftest,
kvm, kernel-team
In-Reply-To: <1ad0cb61a7b5a33a5375baadbd0720ba2ba43d2f.1778254670.git.kas@kernel.org>
On Fri, May 08, 2026 at 04:55:20PM +0100, Kiryl Shutsemau (Meta) wrote:
> Add the userspace interface for read-write protection tracking:
>
> - UFFDIO_REGISTER_MODE_RWP register a range for RWP tracking
> - UFFD_FEATURE_RWP capability bit
> - UFFDIO_RWPROTECT install / remove RWP on a range
>
> Registration sets VM_UFFD_RWP on the VMA. Combining MODE_WP with
> MODE_RWP is rejected because both modes claim the uffd PTE bit.
>
> UFFDIO_RWPROTECT is the bidirectional counterpart of
> UFFDIO_WRITEPROTECT:
>
> - MODE_RWP change_protection() with MM_CP_UFFD_RWP
> installs PAGE_NONE and sets the uffd bit on
> present PTEs
> - !MODE_RWP change_protection() with MM_CP_UFFD_RWP_RESOLVE
> restores vma->vm_page_prot and clears the bit
>
> userfaultfd_clear_vma() runs the same resolve pass on unregister so
> RWP state cannot outlive the uffd.
>
> Re-registering a range must not drop a mode that installs per-PTE
> markers (WP or RWP); doing so returns -EBUSY. This also closes a
> pre-existing window where re-registering without MODE_WP would strand
> uffd-wp markers: before, those caused extra write-faults but were
> otherwise benign; with RWP preservation in place, a subsequent
> mprotect() on a VM_UFFD_RWP VMA would silently promote the stale
> markers to RWP.
>
> The feature is not yet advertised. UFFDIO_REGISTER_MODE_RWP,
> UFFD_FEATURE_RWP, and _UFFDIO_RWPROTECT are intentionally absent from
> UFFD_API_REGISTER_MODES, UFFD_API_FEATURES, and UFFD_API_RANGE_IOCTLS,
> so UFFDIO_API masks them out and the register-mode validator rejects
> the bit. The follow-up patch adds fault dispatch and exposes the UAPI.
>
> Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
> Assisted-by: Claude:claude-opus-4-6
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
with a comment below
> ---
> Documentation/admin-guide/mm/userfaultfd.rst | 10 ++
> fs/userfaultfd.c | 84 +++++++++++++++++
> include/linux/userfaultfd_k.h | 2 +
> include/uapi/linux/userfaultfd.h | 19 ++++
> mm/userfaultfd.c | 97 +++++++++++++++++++-
> 5 files changed, 209 insertions(+), 3 deletions(-)
>
> +	/*
> +	 * Pre-scan the range: validate every spanned VMA before applying
> +	 * any change_protection() so a partial failure cannot leave the
> +	 * process with only a prefix of the range re-protected.
> +	 */
> +	err = -ENOENT;
> +	for_each_vma_range(vmi, dst_vma, end) {
> +		if (!userfaultfd_rwp(dst_vma))
> +			return -ENOENT;
> +
> +		if (is_vm_hugetlb_page(dst_vma)) {
> +			unsigned long page_mask;
> +
> +			page_mask = vma_kernel_pagesize(dst_vma) - 1;
> +			if ((start & page_mask) || (len & page_mask))
> +				return -EINVAL;
> +		}
> +		err = 0;
> +	}
> +	if (err)
> +		return err;
It's an interesting way to say "no VMA found in range" :)
I think bool found and
	if (!found)
		return -ENOENT;
looks more readable.
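Roughly the shape I have in mind, as a standalone sketch. The struct and helper here are hypothetical stand-ins so it compiles outside the kernel; they are not the real VMA iterator or userfaultfd_rwp(), and the hugetlb alignment check is left out.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a VMA; not the kernel's struct vm_area_struct. */
struct fake_vma {
	unsigned long start;
	unsigned long end;
	bool rwp;		/* stands in for userfaultfd_rwp(vma) */
};

/*
 * Validate every region overlapping [start, end) before changing anything.
 * Returns 0 if at least one region overlaps and all overlapping regions are
 * RWP-registered, -ENOENT otherwise.
 */
static int prescan_range(const struct fake_vma *vmas, size_t n,
			 unsigned long start, unsigned long end)
{
	bool found = false;
	size_t i;

	for (i = 0; i < n; i++) {
		if (vmas[i].end <= start || vmas[i].start >= end)
			continue;	/* no overlap with the requested range */
		if (!vmas[i].rwp)
			return -ENOENT;
		found = true;
	}
	if (!found)
		return -ENOENT;

	return 0;
}
```

The behaviour is the same as with the err = -ENOENT / err = 0 dance, but the "nothing in range" case is spelled out by name.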
--
Sincerely yours,
Mike.
* Re: [PATCH v3 2/3] Documentation: security-bugs: explain what is and is not a security bug
From: Jonathan Corbet @ 2026-05-12 17:20 UTC (permalink / raw)
To: Willy Tarreau, greg
Cc: Leon Romanovsky, skhan, security, workflows, linux-doc,
linux-kernel, Willy Tarreau, Greg KH
In-Reply-To: <20260509094755.2838-3-w@1wt.eu>
Willy Tarreau <w@1wt.eu> writes:
> The use of automated tools to find bugs in random locations of the kernel
> induces a rise in security reports even if most of them should just be
> reported as regular bugs. This patch is an attempt at drawing a line
> between what qualifies as a security bug and what does not, hoping to
> improve the situation and ease decision on the reporter's side.
>
> It defers the enumeration to a new file, threat-model.rst, that tries
> to enumerate various classes of issues that are and are not security
> bugs. This should permit to more easily update this file for various
> subsystem-specific rules without having to revisit the security bug
> reporting guide.
One thing here:
[...]
> +* **Capability-based protection**:
> +
> + * users not having the ``CAP_SYS_ADMIN`` capability may not alter the
> + kernel's configuration, memory nor state, change other users' view of the
> + file system layout, grant any user capabilities they do not have, nor
> + affect the system's availability (shutdown, reboot, panic, hang, or making
> + the system unresponsive via unbounded resource exhaustion).
That is pretty demonstrably not true, and will likely elicit challenges
at some point. There are a lot of "make me root" capabilities that
enable users to do all of those things; consider CAP_DAC_OVERRIDE as an
obvious example. I think that just about all of the capabilities will
enable at least one of those things - that's why the capabilities exist
in the first place. So I think this needs to be written far more
generally.
As a lower-priority thing, lockdown mode is meant to at least try to
provide some stronger guarantees, and lockdown circumvention seems to
normally be viewed as a security bug. Worth a mention?
Thanks,
jon
* Re: [PATCH v3 3/3] Documentation: security-bugs: clarify requirements for AI-assisted reports
From: Jonathan Corbet @ 2026-05-12 17:21 UTC (permalink / raw)
To: Willy Tarreau, greg
Cc: Leon Romanovsky, skhan, security, workflows, linux-doc,
linux-kernel, Willy Tarreau, Greg KH
In-Reply-To: <20260509094755.2838-4-w@1wt.eu>
Willy Tarreau <w@1wt.eu> writes:
> AI tools are increasingly used to assist in bug discovery. While these
> tools can identify valid issues, reports that are submitted without
> manual verification often lack context, contain speculative impact
> assessments, or include unnecessary formatting. Such reports increase
> triage effort, waste maintainers' time and may be ignored.
>
> Reports where the reporter has verified the issue and the proposed fix
> typically meet quality standards. This documentation outlines specific
> requirements for length, formatting, and impact evaluation to reduce
> the effort needed to deal with these reports.
>
> Cc: Greg KH <gregkh@linuxfoundation.org>
> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Reviewed-by: Leon Romanovsky <leon@kernel.org>
> Signed-off-by: Willy Tarreau <w@1wt.eu>
> ---
> Documentation/process/security-bugs.rst | 57 +++++++++++++++++++++++++
> 1 file changed, 57 insertions(+)
One nit:
> + * **Impact Evaluation**: Many AI-generated reports lack an understanding of
> + the kernel's threat model and go to great lengths inventing theoretical
> + consequences.
If only we had a shiny new document describing that threat model that we
could reference here... :)
Thanks,
jon
* Re: [kees:for-next/hardening 1/1] htmldocs: Documentation/driver-api/basics:127: ./include/linux/stddef.h:110: WARNING: Definition list ends without a blank line; unexpected unindent. [docutils]
From: Gustavo A. R. Silva @ 2026-05-12 17:24 UTC (permalink / raw)
To: Kees Cook, Gustavo A. R. Silva
Cc: kernel test robot, oe-kbuild-all, linux-doc
In-Reply-To: <202605120755.4A2AC441EB@keescook>
> Oh, hrm, there are a lot of errors in stddef.h for the "htmldocs" make
> target. Gustavo, can you see what's needed to fix these?
Sure thing. I saw that yesterday.
-Gustavo
* Re: [PATCH v12 02/11] lib: kstrtox: add kstrtoudec64() and kstrtodec64()
From: Rodrigo Alencar @ 2026-05-12 17:26 UTC (permalink / raw)
To: Andy Shevchenko, Rodrigo Alencar
Cc: Andy Shevchenko, Jonathan Cameron, Rodrigo Alencar via B4 Relay,
rodrigo.alencar, linux-kernel, linux-iio, devicetree, linux-doc,
David Lechner, Andy Shevchenko, Lars-Peter Clausen,
Michael Hennerich, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
Jonathan Corbet, Andrew Morton, Petr Mladek, Steven Rostedt,
Rasmus Villemoes, Sergey Senozhatsky, Shuah Khan, David Laight
In-Reply-To: <agNfqiZpGZAM-x_H@ashevche-desk.local>
On 26/05/12 08:13PM, Andy Shevchenko wrote:
> On Tue, May 12, 2026 at 05:35:59PM +0100, Rodrigo Alencar wrote:
> > On 26/05/12 06:21PM, Andy Shevchenko wrote:
> > > On Tue, May 12, 2026 at 6:11 PM Rodrigo Alencar
> > > <455.rodrigo.alencar@gmail.com> wrote:
> > > > On 26/05/12 05:43PM, Andy Shevchenko wrote:
> > > > > On Tue, May 12, 2026 at 03:12:24PM +0100, Rodrigo Alencar wrote:
> > > > > > On 26/05/12 04:48PM, Andy Shevchenko wrote:
> > > > > > > On Tue, May 12, 2026 at 02:21:14PM +0100, Rodrigo Alencar wrote:
> > > > > > > > On 26/05/12 04:12PM, Andy Shevchenko wrote:
> > > > > > > > > On Tue, May 12, 2026 at 12:39:53PM +0100, Jonathan Cameron wrote:
> > > > > > > > > > On Sun, 10 May 2026 13:42:20 +0100
> > > > > > > > > > Rodrigo Alencar via B4 Relay <devnull+rodrigo.alencar.analog.com@kernel.org> wrote:
> > > > > > > > > >
> > > > > > > > > > > Add helpers that parses decimal numbers into 64-bit number, i.e., decimal
> > > > > > > > > > > point numbers with pre-defined scale are parsed into a 64-bit value (fixed
> > > > > > > > > > > precision). After the decimal point, digits beyond the specified scale
> > > > > > > > > > > are ignored.
> > > > > > > > > >
> > > > > > > > > > Whilst Rodrigo has already replied to say there will be another version
> > > > > > > > > > I'd like to request final feedback from those who were involved in the parser
> > > > > > > > > > discussions.
> > > > > > > > > >
> > > > > > > > > > They got very involved and I'm far from an expert in the right way to do
> > > > > > > > > > this stuff.
> > > > > > > > > >
> > > > > > > > > > I don't think David Laight was +CC so I've added that.
> > > > > > > > > > David, Andy - I think you two were most involved in that discussion:
> > > > > > > > > > Any objections to the end result?
> > > > > > > > >
> > > > > > > > > I already said a few times about the naming. I do not like the kstrto*()
> > > > > > > > > be semantically different on how they treat the input. Second point is
> > > > > > > > > to avoid code duplication, but this one is less of a concern since the
> > > > > > > > > new code is in the library close to the other potentially duplicate code
> > > > > > > > > piece and hence can be addressed later.
> > > > > > > >
> > > > > > > > I suppose I reached into kstrtodec64() and kstrtoudec64() because it aligns
> > > > > > > > with your expectations for kstrto*() semantics, no? Those include:
> > > > > > > > - overflow check;
> > > > > > > > - extensive input validation;
> > > > > > > > - optional '\n' in the end;
> > > > > > > > - mandatory nul-termination.
> > > > > > > >
> > > > > > > > am I missing anything?
> > > > > > >
> > > > > > > When we add scale we basically make that not true. Moreover the code in this
> > > > > > > > patch makes scale == number_of_characters which I think is a bit fragile, however
> > > > > > > it's about the fractional part when the amount of digits is equal to scale.
> > > > > >
> > > > > > That is not really the case. It is being set as a limit, so it does check for
> > > > > > truncation and zero-padding.
> > > > >
> > > > > I do not see it happens in _parse_integer_limit(). It doesn't try to parse more
> > > > > characters than it's requested in max_chars. It doesn't check if there are more
> > > > > > characters nor their converted values.
> > > > >
> > > > > > > To make this work as expected we need to add an additional call like
> > > > > > > kstrtoull() (and perhaps drop that \n and NUL-terminator checks) and see
> > > > > > > if that overflows or not. Since it's a fractional part it must have less
> > > > > > > than 20 (decimal) digits there, so we check the rv (or how many digits
> > > > > > > were parsed successfully) and compare to 20. If it's more, we got too many
> > > > > > > decimal digits.
> > > > > >
> > > > > > For overflow it checks the KSTRTOX_OVERFLOW flag and leverages check_mul_overflow()
> > > > > > and check_add_overflow() when combining fractional and integer parts. The amount
> > > > > > of characters is not really important there. The scale cannot be bigger than 19 and
> > > > > > that makes sure that int_pow() does not overflow. The code uses _parse_integer_limit()
> > > > > > due to the nature of input and to avoid 64-bit division, kstrtoull() at any point
> > > > > > (parsing integer or fractional parts) does not make much sense.
> > > > >
> > > > > Under 'like kstrtoull()' I meant something that repeats needed functionality.
> > > > > I believe it's parse_integer() (without limit).
> > > >
> > > > I think we are going in circles here and we could look at the code instead:
> > > > - integer parsing with _parse_integer()
> > > > - overflow check and validation of the return value
> > > > - fractional parsing with _parse_integer_limit()
> > > > - overflow check and validation of the return value
> > >
> > > No, this is not fully true. That's what my whole point is about. The
> > > max_chars parameter limits the input check, then it skips an arbitrary
> > > number of digits and only *then* it checks for \n and \0. What will be
> > > the result of the
> > > 0.00000000000000000000000000000000423 in your case? Whatever scale you
> > > gave it will return 0 without checking on how many digits were
> > > supplied.
> >
> > I suppose that is a valid input and 0 is the expected result there.
> >
> > > All the same for 0.9999999999999999999999999999999000423. My
> > > point is that we should limit this by 19 digits.
> >
> > why we need to limit by 19? Digits beyond the scale carry no value...
>
> ...only if they are all 0:s.
I thought your concern was on input length.
> > just like leading zeros to the integer part (which is also accepted by
> > kstrtoull() when parsing with base 10). Not sure why this is invalid input.
>
> See above. I agree on truncating trailing 0:s as it's done for leading ones
> in integer part, but if any of the digit behind 19th is not 0, it's an overflow
> condition (or bad input, depending how strict the rules are).
Is stating in the documentation that digits beyond the scale are ignored not
enough?
> > > On top of that, what about -0.9(19 times) ? the fraction should be u64
> > > in this case and it's fine. The sign applies to the combined value.
> >
> > Yes, the range for signed values is verified later.
>
> > > > - extra scaling and truncation happening outside if needed.
> > >
> > > Right, but the given input may be way too long and still needs more validation.
> >
> > What is the problem with a long input of digits?
> > A C compiler does not complain about this when parsing a float value, and
> > Python does not complain about this when parsing floats or decimals either.
>
> Because there is an exponent limit and for double it's something like 1e307
> IIRC, meaning, try 1024 digits to be sure.
>
> Python most likely uses the library for big numbers, you can't compare it at all with this.
Would you be fine if the truncation loop:
	while (isdigit(*s)) /* truncate */
		s++;
is bounded by a (19 - scale) iteration count? Or should it keep iterating while
the digits are zero? Is that the only concern? Again, the usage of
_parse_integer_limit(s, 10, &_frac, scale) avoids a 64-bit division when
checking the rv.
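To pin down the two behaviours being discussed, here is a standalone userspace sketch. It is not the code from the patch and does not use _parse_integer_limit(); parse_frac() and its "strict" flag are hypothetical names. With strict == false, digits beyond the scale are ignored; with strict == true, any non-zero digit beyond the scale is treated as bad input while trailing zeros are still accepted.

```c
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Parse the digits following a decimal point into *frac, expressed in units
 * of 10^-scale. scale is capped at 19 so the result always fits in 64 bits.
 * On success *endp points at the first non-digit character.
 */
static int parse_frac(const char *s, unsigned int scale, bool strict,
		      uint64_t *frac, const char **endp)
{
	uint64_t val = 0;
	unsigned int n = 0;

	if (scale > 19)
		return -EINVAL;

	/* Consume at most 'scale' significant fractional digits. */
	while (n < scale && isdigit((unsigned char)*s)) {
		val = val * 10 + (uint64_t)(*s - '0');
		s++;
		n++;
	}

	/* Zero-pad on the right when fewer than 'scale' digits were given. */
	for (; n < scale; n++)
		val *= 10;

	/* Digits beyond the scale: ignored, or rejected if non-zero. */
	while (isdigit((unsigned char)*s)) {
		if (strict && *s != '0')
			return -ERANGE;
		s++;
	}

	*frac = val;
	*endp = s;
	return 0;
}
```

For the fractional digits "12345" with scale 3, the lenient mode yields 123 and the strict mode fails, whereas "12300" yields 123 in both. Neither mode needs a 64-bit division.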
> > > > - check for input termination
> > > > - combination of integer and fractional parts with check_mul_overflow() and check_add_overflow()
> > > >
> > > > > > > Maybe I'm missing these checks already performed?
> > > > > > >
> > > > > > > > > Having the test cases is a big benefit, and that part I like the most.
>
> --
> With Best Regards,
> Andy Shevchenko
>
>
--
Kind regards,
Rodrigo Alencar
* Re: [PATCH v2 09/14] mm/userfaultfd: add RWP fault delivery and expose UFFDIO_REGISTER_MODE_RWP
From: Mike Rapoport @ 2026-05-12 17:29 UTC (permalink / raw)
To: Kiryl Shutsemau (Meta)
Cc: akpm, peterx, david, ljs, surenb, vbabka, Liam.Howlett, ziy,
corbet, skhan, seanjc, pbonzini, jthoughton, aarcange, sj,
usama.arif, linux-mm, linux-kernel, linux-doc, linux-kselftest,
kvm, kernel-team
In-Reply-To: <454b3381cb7ead65291b2d7e24c0bff62e55c41b.1778254670.git.kas@kernel.org>
On Fri, May 08, 2026 at 04:55:21PM +0100, Kiryl Shutsemau (Meta) wrote:
> Wire the fault side of read-write protection tracking and turn the
> userspace interface on.
>
> An RWP-protected PTE is PAGE_NONE with the uffd bit set. The
> PROT_NONE triggers a fault on any access; the uffd bit distinguishes
> it from plain mprotect(PROT_NONE) or NUMA hinting.
>
> Fault dispatch, per level:
>
> PTE handle_pte_fault() -> do_uffd_rwp()
> PMD __handle_mm_fault() -> do_huge_pmd_uffd_rwp()
> hugetlb hugetlb_fault() -> hugetlb_handle_userfault()
>
> The RWP branches gate on userfaultfd_pte_rwp() / userfaultfd_huge_pmd_rwp()
> (VM_UFFD_RWP plus the uffd bit) and fall through to do_numa_page() /
> do_huge_pmd_numa_page() otherwise. Each delivers a
> UFFD_PAGEFAULT_FLAG_RWP message through handle_userfault(); the handler
> resolves it with UFFDIO_RWPROTECT clearing MODE_RWP.
>
> userfaultfd_must_wait() and userfaultfd_huge_must_wait() add matching
> protnone+uffd waiters so sync-mode fault handlers block correctly.
>
> Expose the UAPI:
>
> UFFDIO_REGISTER_MODE_RWP -> UFFD_API_REGISTER_MODES
> UFFD_FEATURE_RWP -> UFFD_API_FEATURES
> _UFFDIO_RWPROTECT -> UFFD_API_RANGE_IOCTLS
> UFFD_API_RANGE_IOCTLS_BASIC
>
> UFFD_FEATURE_RWP is masked out at UFFDIO_API time when PROT_NONE is
> not available or VM_UFFD_RWP aliases VM_NONE (32-bit), so userspace
> never sees an advertised-but-broken feature.
>
> Works on anonymous, shmem, and hugetlb memory.
>
> Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
> Assisted-by: Claude:claude-opus-4-6
A small nit below, other than that
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
> @@ -347,6 +359,14 @@ static inline bool userfaultfd_must_wait(struct userfaultfd_ctx *ctx,
> */
> if (!pte_write(ptent) && (reason & VM_UFFD_WP))
> goto out;
> + /*
> + * PTE is still RW-protected (protnone with uffd bit), wait for
> + * userspace to resolve. Plain PROT_NONE without the marker is not
> + * an RWP fault.
> + */
> + if (pte_protnone(ptent) && pte_uffd(ptent) &&
> + (reason & VM_UFFD_RWP))
Nit: this fits even in an 80-char line
> + goto out;
>
> ret = false;
> out:
--
Sincerely yours,
Mike.
* htmldocs: Documentation/driver-api/vfio_pci_liveupdate:7: ./drivers/vfio/pci/vfio_pci_liveupdate.c:78: WARNING: unknown document: '/PCI/liveupdate' [ref.doc]
From: kernel test robot @ 2026-05-12 17:28 UTC (permalink / raw)
To: David Matlack; +Cc: oe-kbuild-all, 0day robot, Vipin Sharma, linux-doc
tree: https://github.com/intel-lab-lkp/linux/commits/Vipin-Sharma/vfio-pci-Register-a-file-handler-with-Live-Update-Orchestrator/20260512-152829
head: ebd50d810440364055692a3ed1d967cd7149d0dc
commit: 4ab988bdf43079ff3eb58904f591ec464f289db5 docs: liveupdate: Add documentation for VFIO PCI
date: 10 hours ago
compiler: clang version 20.1.8 (https://github.com/llvm/llvm-project 87f0227cb60147a26a1eeb4fb06e3b505e9c7261)
docutils: docutils (Docutils 0.21.2, Python 3.13.5, on linux)
reproduce: (https://download.01.org/0day-ci/archive/20260512/202605121946.GDFPWf9X-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202605121946.GDFPWf9X-lkp@intel.com/
All warnings (new ones prefixed by >>):
Documentation/userspace-api/landlock:550: ./include/uapi/linux/landlock.h:45: ERROR: Unknown target name: "network flags". [docutils]
Documentation/userspace-api/landlock:550: ./include/uapi/linux/landlock.h:50: ERROR: Unknown target name: "scope flags". [docutils]
Documentation/userspace-api/landlock:550: ./include/uapi/linux/landlock.h:24: ERROR: Unknown target name: "filesystem flags". [docutils]
Documentation/userspace-api/landlock:559: ./include/uapi/linux/landlock.h:168: ERROR: Unknown target name: "filesystem flags". [docutils]
Documentation/userspace-api/landlock:559: ./include/uapi/linux/landlock.h:191: ERROR: Unknown target name: "network flags". [docutils]
>> Documentation/driver-api/vfio_pci_liveupdate:7: ./drivers/vfio/pci/vfio_pci_liveupdate.c:78: WARNING: unknown document: '/PCI/liveupdate' [ref.doc]
Documentation/networking/skbuff:36: ./include/linux/skbuff.h:181: WARNING: Failed to create a cross reference. A title or caption not found: 'crc' [ref.ref]
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [PATCH v2 10/14] mm/pagemap: add PAGE_IS_ACCESSED for RWP tracking
From: Mike Rapoport @ 2026-05-12 17:41 UTC (permalink / raw)
To: Kiryl Shutsemau (Meta)
Cc: akpm, peterx, david, ljs, surenb, vbabka, Liam.Howlett, ziy,
corbet, skhan, seanjc, pbonzini, jthoughton, aarcange, sj,
usama.arif, linux-mm, linux-kernel, linux-doc, linux-kselftest,
kvm, kernel-team
In-Reply-To: <c076bf8d482e80512b0b1db0e107ef0c822c5ddf.1778254670.git.kas@kernel.org>
On Fri, May 08, 2026 at 04:55:22PM +0100, Kiryl Shutsemau (Meta) wrote:
> PAGEMAP_SCAN already reports PAGE_IS_WRITTEN from the inverted uffd
> PTE bit, targeting the UFFDIO_WRITEPROTECT workflow. UFFDIO_RWPROTECT
> reuses the same PTE bit as a marker for read-write protection, but
> "has been written" and "has been accessed" are distinct semantic
> signals — they happen to share one PTE bit today only because the two
> implementations share infrastructure.
>
> Give RWP its own pagemap category so the UAPI does not conflate them:
>
> PAGE_IS_WRITTEN reported on VM_UFFD_WP VMAs, !pte_uffd(pte)
> PAGE_IS_ACCESSED reported on VM_UFFD_RWP VMAs, !pte_uffd(pte)
>
> Both still read the same PTE bit today, but each is scoped to the VMA
> whose registered mode makes the bit meaningful. If a future
> implementation moves RWP to a separate PTE bit, only PAGE_IS_ACCESSED
> switches over.
>
> This is a UAPI narrowing. Outside VM_UFFD_WP VMAs the uffd bit is
> always clear, so PAGEMAP_SCAN used to flag PAGE_IS_WRITTEN on every
> present PTE there — a meaningless duplicate of PAGE_IS_PRESENT. Now
> PAGE_IS_WRITTEN fires only inside VM_UFFD_WP VMAs.
>
> pagemap_hugetlb_category() now takes the vma like its PTE/PMD peers.
>
> Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
> Assisted-by: Claude:claude-opus-4-6
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
> ---
> Documentation/admin-guide/mm/pagemap.rst | 13 ++++-
> fs/proc/task_mmu.c | 73 ++++++++++++++++++------
> include/uapi/linux/fs.h | 1 +
> tools/include/uapi/linux/fs.h | 1 +
> 4 files changed, 67 insertions(+), 21 deletions(-)
--
Sincerely yours,
Mike.
* Re: [PATCH v12 02/11] lib: kstrtox: add kstrtoudec64() and kstrtodec64()
From: Andy Shevchenko @ 2026-05-12 17:46 UTC (permalink / raw)
To: Rodrigo Alencar
Cc: Andy Shevchenko, Jonathan Cameron, Rodrigo Alencar via B4 Relay,
rodrigo.alencar, linux-kernel, linux-iio, devicetree, linux-doc,
David Lechner, Andy Shevchenko, Lars-Peter Clausen,
Michael Hennerich, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
Jonathan Corbet, Andrew Morton, Petr Mladek, Steven Rostedt,
Rasmus Villemoes, Sergey Senozhatsky, Shuah Khan, David Laight
In-Reply-To: <ru2h3ip7qf6j54dlrij54nwp45uyq6m2e6zspt6v6eynpsagqq@eo5v3yparuhh>
On Tue, May 12, 2026 at 06:26:12PM +0100, Rodrigo Alencar wrote:
> On 26/05/12 08:13PM, Andy Shevchenko wrote:
> > On Tue, May 12, 2026 at 05:35:59PM +0100, Rodrigo Alencar wrote:
> > > On 26/05/12 06:21PM, Andy Shevchenko wrote:
> > > > On Tue, May 12, 2026 at 6:11 PM Rodrigo Alencar
> > > > <455.rodrigo.alencar@gmail.com> wrote:
> > > > > On 26/05/12 05:43PM, Andy Shevchenko wrote:
> > > > > > On Tue, May 12, 2026 at 03:12:24PM +0100, Rodrigo Alencar wrote:
> > > > > > > On 26/05/12 04:48PM, Andy Shevchenko wrote:
> > > > > > > > On Tue, May 12, 2026 at 02:21:14PM +0100, Rodrigo Alencar wrote:
> > > > > > > > > On 26/05/12 04:12PM, Andy Shevchenko wrote:
> > > > > > > > > > On Tue, May 12, 2026 at 12:39:53PM +0100, Jonathan Cameron wrote:
> > > > > > > > > > > On Sun, 10 May 2026 13:42:20 +0100
> > > > > > > > > > > Rodrigo Alencar via B4 Relay <devnull+rodrigo.alencar.analog.com@kernel.org> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Add helpers that parses decimal numbers into 64-bit number, i.e., decimal
> > > > > > > > > > > > point numbers with pre-defined scale are parsed into a 64-bit value (fixed
> > > > > > > > > > > > precision). After the decimal point, digits beyond the specified scale
> > > > > > > > > > > > are ignored.
> > > > > > > > > > >
> > > > > > > > > > > Whilst Rodrigo has already replied to say there will be another version
> > > > > > > > > > > I'd like to request final feedback from those who were involved in the parser
> > > > > > > > > > > discussions.
> > > > > > > > > > >
> > > > > > > > > > > They got very involved and I'm far from an expert in the right way to do
> > > > > > > > > > > this stuff.
> > > > > > > > > > >
> > > > > > > > > > > I don't think David Laight was +CC so I've added that.
> > > > > > > > > > > David, Andy - I think you two were most involved in that discussion:
> > > > > > > > > > > Any objections to the end result?
> > > > > > > > > >
> > > > > > > > > > I already said a few times about the naming. I do not like the kstrto*()
> > > > > > > > > > be semantically different on how they treat the input. Second point is
> > > > > > > > > > to avoid code duplication, but this one is less of a concern since the
> > > > > > > > > > new code is in the library close to the other potentially duplicate code
> > > > > > > > > > piece and hence can be addressed later.
> > > > > > > > >
> > > > > > > > > I suppose I reached into kstrtodec64() and kstrtoudec64() because it aligns
> > > > > > > > > with your expectations for kstrto*() semantics, no? Those include:
> > > > > > > > > - overflow check;
> > > > > > > > > - extensive input validation;
> > > > > > > > > - optional '\n' in the end;
> > > > > > > > > - mandatory nul-termination.
> > > > > > > > >
> > > > > > > > > am I missing anything?
> > > > > > > >
> > > > > > > > When we add scale we basically make that not true. Moreover the code in this
> > > > > > > > patch makes scale == number_of_characters which I think a bit fragile, however
> > > > > > > > it's about the fractional part when the amount of digits is equal to scale.
> > > > > > >
> > > > > > > That is not really the case. It is being set as a limit, so it does check for
> > > > > > > truncation and zero-padding.
> > > > > >
> > > > > > I do not see it happens in _parse_integer_limit(). It doesn't try to parse more
> > > > > > characters than it's requested in max_chars. It doesn't check if there are more
> > > > > > character nor their converted values.
> > > > > >
> > > > > > > > To make this work as expected we need to add an additional call like
> > > > > > > > kstrtoull() (and perhaps drop that \n and NUL-terminator checks) and see
> > > > > > > > if that overflows or not. Since it's a fractional part it must have less
> > > > > > > > than 20 (decimal) digits there, so we check the rv (or how many digits
> > > > > > > > were parsed successfully) and compare to 20. If it's more, we got too many
> > > > > > > > decimal digits.
> > > > > > >
> > > > > > > For overflow it checks the KSTRTOX_OVERFLOW flag and leverages check_mul_overflow()
> > > > > > > and check_add_overflow() when combining fractional and integer parts. The amount
> > > > > > > of characters is not really important there. The scale cannot be bigger than 19 and
> > > > > > > that makes sure that int_pow() does not overflow. The code uses _parse_integer_limit()
> > > > > > > due to the nature of input and to avoid 64-bit division, kstrtoull() at any point
> > > > > > > (parsing integer or fractional parts) does not make much sense.
> > > > > >
> > > > > > Under 'like kstrtoull()' I meant something that repeats needed functionality.
> > > > > > I believe it's parse_integer() (without limit).
> > > > >
> > > > > I think we are going in circles here and we could look at the code instead:
> > > > > - integer parsing with _parse_integer()
> > > > > - overflow check and validation of the return value
> > > > > - fractional parsing with _parse_integer_limit()
> > > > > - overflow check and validation of the return value
> > > >
> > > > No, this is not fully true. That's what my whole point is about. The
> > > > max_chars parameter limits the input check, then it skips an arbitrary
> > > > number of digits and only *then* it checks for \n and \0. What will be
> > > > the result of the
> > > > 0.00000000000000000000000000000000423 in your case? Whatever scale you
> > > > gave it will return 0 without checking on how many digits were
> > > > supplied.
> > >
> > > I suppose that is a valid input and 0 is the expected result there.
> > >
> > > > All the same for 0.9999999999999999999999999999999000423. My
> > > > point is that we should limit this by 19 digits.
> > >
> > > why we need to limit by 19? Digits beyond the scale carry no value...
> >
> > ...only if they are all 0:s.
>
> I thought your concern was on input length.
One of them, since I think you raised the topic of leading 0:s for integers and
I agreed with that, which makes sense to have mirrored in the fractional part.
> > > just like leading zeros to the integer part (which is also accepted by
> > > kstrtoull() when parsing with base 10). Not sure why this is invalid input.
> >
> > See above. I agree on truncating trailing 0:s as it's done for leading ones
> > in integer part, but if any of the digit behind 19th is not 0, it's an overflow
> > condition (or bad input, depending how strict the rules are).
>
> stating in the documentation that digits beyond the scale are ignored is not
> enough?
It would be, if this were not in the kstrto*() family. My understanding is that
kstrto*() applies strict rules to the input in the overflow check.
> > > > On top of that, what about -0.9(19 times) ? the fraction should be u64
> > > > in this case and it's fine. The sign applies to the combined value.
> > >
> > > yes, range for signed values are verified later.
> >
> > > > > - extra scaling and truncation happening outside if needed.
> > > >
> > > > Right, but the given input may be way too long and still needs more validation.
> > >
> > > What is the problem with a long input of digits?
> > > C compiler does not complain about this when parsing a float value,
> > > python does not
> > > complain about this when parsing floats or decimals either.
> >
> > Because there is an exponent limit and for double it's something like 1e307
> > IIRC, meaning, try 1024 digits to be sure.
> >
> > Python most likely uses the library for big numbers, you can't compare it at all with this.
>
> You would be fine if the truncation loop:
>
> while (isdigit(*s)) /* truncate */
> s++;
>
> is bounded by (19-scale) iteration count? or it should keep iterating if those are zero?
Ideally both.
We don't care about the digits in the range of 19-scale and skip all 0:s after
that.
/* truncate unrequired digits within type limit, i.e. 19 decimal digits */
while (isdigit(*s) && "(s - pos_of_dot) is less than 19")
s++;
while (*s == '0') /* truncate trailing 0:s, it's not a bad input nor overflow */
s++;
// Now if it's not \0 nor \n and
// a) still a digit consider either overflow or bad input,
// b) if not a digit, consider as bad input.
In a) I tend to be on par with the other k*() and consider that as overflow.
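To make the rule above concrete, here is a minimal userspace sketch of the tail check. It is not the code from the patch: check_frac_tail(), its 'parsed' argument (the number of fractional digits already consumed, at most the scale) and MAX_FRAC_DIGITS are names made up for illustration, and returning -ERANGE for case a) is the assumption stated above.

```c
#include <ctype.h>
#include <errno.h>

#define MAX_FRAC_DIGITS 19	/* assumption: limit discussed in this thread */

/*
 * Hypothetical helper, not the code from the patch. @s points just past the
 * last fractional digit that was converted, @parsed is how many fractional
 * digits were consumed so far. Returns 0 if the rest of the input is
 * acceptable, -ERANGE or -EINVAL otherwise.
 */
static int check_frac_tail(const char *s, unsigned int parsed)
{
	/* Digits between the scale and the 19th position are dropped. */
	while (isdigit((unsigned char)*s) && parsed < MAX_FRAC_DIGITS) {
		s++;
		parsed++;
	}

	/* Trailing 0:s past that point carry no value either. */
	while (*s == '0')
		s++;

	/* a) still a digit: treat it like an overflow. */
	if (isdigit((unsigned char)*s))
		return -ERANGE;

	/* b) optional newline, then mandatory NUL; anything else is bad input. */
	if (*s == '\n')
		s++;
	return *s ? -EINVAL : 0;
}
```

With this sketch a tail of 19 zeros followed by a '1' is rejected with -ERANGE, while the same zeros without the final '1' are accepted.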
> is that the only concern? Again, the usage of _parse_integer_limit(s, 10, &_frac, scale)
> avoids a 64-bit division when checking the rv.
I'm not against usage of _parse_integer_limit(), I'm for stricter rules on the input.
With the above addressed, I have no more concerns.
> > > > > - check for input termination
> > > > > - combination of integer and fractional parts with check_mul_overflow() and check_add_overflow()
> > > > >
> > > > > > > > Maybe I'm missing these checks already performed?
> > > > > > > >
> > > > > > > > > > Having the test cases is a big benefit, and that part I like the most.
--
With Best Regards,
Andy Shevchenko
^ permalink raw reply
* Re: [PATCH v12 05/11] iio: core: add decimal value formatting into 64-bit value
From: Andy Shevchenko @ 2026-05-12 17:49 UTC (permalink / raw)
To: Rodrigo Alencar
Cc: rodrigo.alencar, linux-kernel, linux-iio, devicetree, linux-doc,
Jonathan Cameron, David Lechner, Andy Shevchenko,
Lars-Peter Clausen, Michael Hennerich, Rob Herring,
Krzysztof Kozlowski, Conor Dooley, Jonathan Corbet, Andrew Morton,
Petr Mladek, Steven Rostedt, Rasmus Villemoes, Sergey Senozhatsky,
Shuah Khan
In-Reply-To: <ql7smsqza7liupm7fhdts73cxsltrpxsqofu5ovzpxpwvcscuv@qigi3dwukk7k>
On Tue, May 12, 2026 at 05:09:32PM +0100, Rodrigo Alencar wrote:
> On 26/05/12 05:35PM, Andy Shevchenko wrote:
> > On Sun, May 10, 2026 at 01:42:23PM +0100, Rodrigo Alencar via B4 Relay wrote:
> >
> > > Create new format types for iio values (IIO_VAL_DECIMAL64_*), which
> > > defines the representation of fixed decimal point values into a single
> > > 64-bit number. This new format increases the range of represented values,
> > > allowing for integer parts greater than 2^32, as bits are not "wasted"
> > > in the fractional part, which can be seen in IIO_VAL_INT_PLUS_MICRO and
> > > IIO_VAL_INT_PLUS_NANO. Helpers are created to compose and decompose 64-bit
> > > decimals into integer values used in IIO formatting interfaces, which
> > > creates consistency and avoid error-prone manual assignments when using
> > > wordpart macros. When doing the parsing, kstrtodec64() is used with the
> > > scale defined by the specific decimal format type.
...
> > > + tmp2 = div64_s64_rem(iio_val_s64_from_array(vals),
> > > + int_pow(10, scale), &frac);
> > > + if (tmp2 == 0 && frac < 0)
> > > + return sysfs_emit_at(buf, offset, "-0.%0*lld", scale,
> > > + abs(frac));
> > > + else
> > > + return sysfs_emit_at(buf, offset, "%lld.%0*lld", tmp2,
> > > + scale, abs(frac));
> > > + }
> >
> > What about
> >
> > /* Print a leading '-' for negative fractions */
> > if (tmp2 == 0 && frac < 0)
> > offset += sysfs_emit_at(buf, offset, "-");
> >
> > return sysfs_emit_at(buf, offset, "%lld.%0*lld", tmp2, scale, abs(frac));
> >
> > Also note this won't work with the frac that are == S64_MIN. It's UB (undefined
> > behaviour), see the comment at abs() implementation. Maybe a time to add abs()
> > corner case tests...
>
> frac cannot be S64_MIN; it is always the remainder of a division by a power of 10.
Okay, but what about input of -0.9999999999999999999 ? Will it fit the signed
frac type?
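For reference, a small userspace sketch of the arithmetic in question. The helper names pow10_s64() and split_dec64() are made up for illustration and plain C division stands in for div64_s64_rem(); the sketch assumes the scale is kept at 18 or below, because 10^19 (and 19 nines) is larger than S64_MAX.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: 10^scale, only valid for scale <= 18 in int64_t. */
static int64_t pow10_s64(unsigned int scale)
{
	int64_t p = 1;

	assert(scale <= 18);	/* 10^19 does not fit in int64_t */
	while (scale--)
		p *= 10;
	return p;
}

/* Split a fixed-point value into integer and fractional parts. */
static void split_dec64(int64_t v, unsigned int scale, int64_t *ip, int64_t *frac)
{
	int64_t div = pow10_s64(scale);

	*ip = v / div;		/* C division truncates toward zero */
	*frac = v % div;	/* |*frac| < 10^scale, so never INT64_MIN */
}
```

Under that assumption the remainder stays strictly inside the s64 range, so taking its absolute value is safe; a fraction of 19 nines is a different matter since it does not fit the signed type at all.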
--
With Best Regards,
Andy Shevchenko
^ permalink raw reply
* Re: [PATCH v12 06/11] iio: test: iio-test-format: add test case for decimal format
From: Andy Shevchenko @ 2026-05-12 17:51 UTC (permalink / raw)
To: Rodrigo Alencar
Cc: rodrigo.alencar, linux-kernel, linux-iio, devicetree, linux-doc,
Jonathan Cameron, David Lechner, Andy Shevchenko,
Lars-Peter Clausen, Michael Hennerich, Rob Herring,
Krzysztof Kozlowski, Conor Dooley, Jonathan Corbet, Andrew Morton,
Petr Mladek, Steven Rostedt, Rasmus Villemoes, Sergey Senozhatsky,
Shuah Khan
In-Reply-To: <zugkmatjsacla7l7nguekmclfdkzsshr3gs434a3liccgokxb4@xg77y5tkts5c>
On Tue, May 12, 2026 at 06:02:22PM +0100, Rodrigo Alencar wrote:
> On 26/05/12 05:36PM, Andy Shevchenko wrote:
> > On Sun, May 10, 2026 at 01:42:24PM +0100, Rodrigo Alencar via B4 Relay wrote:
...
> > > + iio_val_s64_array_populate(24, values);
> >
> > You want to test this first...
> > I think the previous patch needs new test cases.
>
> This is not complex stuff: those functions are straightforward and
> are in accordance with what the format function does, which is
> the opposite, before populating the buffer. The assertion on the buffer
> content accounts for that behavior.
You never know what BE32 / BE64 architectures will give you...
(but okay, it's simple enough to check the implementation),
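As an illustration of why the endianness question matters, here is a minimal userspace sketch. s64_to_pair() and s64_from_pair() are hypothetical stand-ins, not the helpers from the patch, and the low-word-first layout is an assumption. Because the split is done with shifts on the value rather than by reinterpreting memory, the result should not depend on byte order.

```c
#include <stdint.h>

/* Hypothetical stand-in: split a 64-bit value into two 32-bit words. */
static void s64_to_pair(int64_t v, int32_t vals[2])
{
	uint64_t u = (uint64_t)v;

	vals[0] = (int32_t)(uint32_t)(u & 0xffffffffu);	/* low word */
	vals[1] = (int32_t)(uint32_t)(u >> 32);		/* high word */
}

/* Hypothetical stand-in: rebuild the 64-bit value from the two words. */
static int64_t s64_from_pair(const int32_t vals[2])
{
	uint64_t lo = (uint32_t)vals[0];
	uint64_t hi = (uint32_t)vals[1];

	return (int64_t)((hi << 32) | lo);
}
```

A version based on memcpy() or a union would instead place the words in an order that depends on the architecture, which is the kind of thing a dedicated test would catch.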
--
With Best Regards,
Andy Shevchenko
^ permalink raw reply
* Re: [PATCH 0/4] Add MSI Claw HID Configuration Driver
From: Derek John Clark @ 2026-05-12 17:54 UTC (permalink / raw)
To: Jiri Kosina
Cc: Benjamin Tissoires, Pierre-Loup A . Griffais, Denis Benato,
Zhouwang Huang, linux-input, linux-doc, linux-kernel
In-Reply-To: <n533qs94-7o4r-p5r0-04p1-68q1398n5785@xreary.bet>
On Tue, May 12, 2026 at 9:13 AM Jiri Kosina <jikos@kernel.org> wrote:
>
> On Sun, 10 May 2026, Derek J. Clark wrote:
>
> > This series adds an HID Configuration driver for the MSI Claw line of
> > Handheld Gaming PC's. The MSI Claw HID interface provides multiple
> > features, such as the ability to switch between xinput, dinput, and a
> > desktop mode, RGB control, rumble intensity, and mapping of the rear "M"
> > keys. There are additional gamepad modes that are not included in this
> > driver as they appear to be used in assembly line testing or are
> > incomplete in the firmware. During my testing I found them to be unstable.
> >
> > The initial version of this driver was written by Denis Benato, which
> > contained the initial reverse-engineering and implementation for the
> > gamepad mode switching. This work was later expanded by Zhouwang Huang
> > to include more gamepad modes and additional features. Finally, I
> > refactored the entire driver, fixed multiple bugs, and refined the overall
> > format to conform to kernel driver best practices and style guide.
> >
> > Claude was used initially by Zhouwang Huang to quickly parse HID captures
> > during the reverse-engineering of some of the features. Since Claude had
> > already been used, as a test of its capabilities I had it implement the
> > rumble intensity attribute after I had already rewritten most of the
> > driver, which I then manually edited to fix some mistakes. I also used
> > Claude to review the driver and these patches for any mistakes and bugs.
> >
> > Assisted-by: Claude:claude-sonnet-4-6
> > Co-developed-by: Denis Benato <denis.benato@linux.dev>
> > Signed-off-by: Denis Benato <denis.benato@linux.dev>
> > Co-developed-by: Zhouwang Huang <honjow311@gmail.com>
> > Signed-off-by: Zhouwang Huang <honjow311@gmail.com>
> > Signed-off-by: Derek J. Clark <derekjohn.clark@gmail.com>
> >
> > Derek J. Clark (4):
> > HID: hid-msi-claw: Add MSI Claw configuration driver
> > HID: hid-msi-claw: Add M-key mapping attributes
> > HID: hid-msi-claw: Add RGB control interface
> > HID: hid-msi-claw: Add Rumble Intensity Attributes
>
> The driver looks reasonable, I'd just like to propose that we name it just
> hid-msi to follow the usual HID subsystem driver naming standards, so that
> it can later be extended with supporting other MSI devices.
>
Hi Jiri,
Sounds good. I'll do that when I fix the issues flagged by the bot in
v2 and I'll try to have it out some time this week.
Thanks,
Derek
> Thanks,
>
> --
> Jiri Kosina
> SUSE Labs
>
^ permalink raw reply
* Re: [PATCH v6 1/4] mm/memory-failure: report MF_MSG_KERNEL for reserved pages
From: jane.chu @ 2026-05-12 17:58 UTC (permalink / raw)
To: David Hildenbrand (Arm), Breno Leitao, Miaohe Lin,
Naoya Horiguchi, Andrew Morton, Jonathan Corbet, Shuah Khan,
Lorenzo Stoakes, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Shuah Khan, Steven Rostedt,
Masami Hiramatsu, Mathieu Desnoyers, Liam R. Howlett
Cc: linux-mm, linux-kernel, linux-doc, linux-kselftest,
linux-trace-kernel, kernel-team, Lance Yang
In-Reply-To: <9504c193-8c01-4d03-8f62-c50fd7fbdbc0@kernel.org>
On 5/12/2026 1:17 AM, David Hildenbrand (Arm) wrote:
> On 5/11/26 17:38, Breno Leitao wrote:
>> When get_hwpoison_page() returns a negative value, distinguish
>> reserved pages from other failure cases by reporting MF_MSG_KERNEL
>> instead of MF_MSG_GET_HWPOISON. Reserved pages belong to the kernel
>> and should be classified accordingly for proper handling.
>>
>> Sample PG_reserved before the get_hwpoison_page() call. In the
>> MF_COUNT_INCREASED path get_any_page() can drop the caller's
>> reference before returning -EIO, after which the underlying page may
>> have been freed and reallocated with page->flags reset; reading
>> PageReserved(p) at that point would observe stale or unrelated state.
>> The pre-call snapshot reflects what the page actually was at the
>> time of the failure event.
>>
>> Acked-by: Miaohe Lin <linmiaohe@huawei.com>
>> Reviewed-by: Lance Yang <lance.yang@linux.dev>
>> Signed-off-by: Breno Leitao <leitao@debian.org>
>> ---
>> mm/memory-failure.c | 19 ++++++++++++++++++-
>> 1 file changed, 18 insertions(+), 1 deletion(-)
>>
>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>> index 866c4428ac7ef..f112fb27a8ff6 100644
>> --- a/mm/memory-failure.c
>> +++ b/mm/memory-failure.c
>> @@ -2348,6 +2348,7 @@ int memory_failure(unsigned long pfn, int flags)
>> unsigned long page_flags;
>> bool retry = true;
>> int hugetlb = 0;
>> + bool is_reserved;
>>
>> if (!sysctl_memory_failure_recovery)
>> panic("Memory failure on page %lx", pfn);
>> @@ -2411,6 +2412,18 @@ int memory_failure(unsigned long pfn, int flags)
>> * In fact it's dangerous to directly bump up page count from 0,
>> * that may make page_ref_freeze()/page_ref_unfreeze() mismatch.
>> */
>> + /*
>> + * Pages with PG_reserved set are not currently managed by the
>> + * page allocator (memblock-reserved memory, driver reservations,
>> + * etc.), so classify them as kernel-owned for reporting.
>> + *
>> + * Sample the flag before get_hwpoison_page(): in the
>> + * MF_COUNT_INCREASED path, get_any_page() can drop the caller's
>> + * reference before returning -EIO, after which page->flags may
>> + * have been reset by the allocator.
>> + */
>> + is_reserved = PageReserved(p);
>> +
>> res = get_hwpoison_page(p, flags);
>> if (!res) {
>> if (is_free_buddy_page(p)) {
>> @@ -2432,7 +2445,11 @@ int memory_failure(unsigned long pfn, int flags)
>> }
>> goto unlock_mutex;
>> } else if (res < 0) {
>> - res = action_result(pfn, MF_MSG_GET_HWPOISON, MF_IGNORED);
>> + if (is_reserved)
>> + res = action_result(pfn, MF_MSG_KERNEL, MF_IGNORED);
>> + else
>> + res = action_result(pfn, MF_MSG_GET_HWPOISON,
>> + MF_IGNORED);
>> goto unlock_mutex;
>> }
>>
>>
>
> It's a bit odd that we need this handling when we already have handling for
> reserved pages in error_states[].
>
> HWPoisonHandlable() would always essentially reject PG_reserved pages. So
> __get_hwpoison_page() ... would always fail? Making
> get_hwpoison_page()->get_any_page() always fail?
>
> But then, we never call identify_page_state()? And never call me_kernel()?
>
> This all looks very odd.
>
> Why would you even want to call get_hwpoison_page() in the first place if you
> find PageReserved?
>
Ah, good point!
It seems to me that all unhandlable pages should head out to
identify_page_state:
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -2411,6 +2411,10 @@ int memory_failure(unsigned long pfn, int flags)
* In fact it's dangerous to directly bump up page count from 0,
* that may make page_ref_freeze()/page_ref_unfreeze() mismatch.
*/
+
+ if (!HWPoisonHandlable(p, flags))
+ goto identify_page_state;
+
res = get_hwpoison_page(p, flags);
if (!res) {
if (is_free_buddy_page(p)) {
thanks,
-jane
^ permalink raw reply
* [PATCH] dm: fix dm-inlinecrypt docs warnings
From: Randy Dunlap @ 2026-05-12 18:04 UTC (permalink / raw)
To: linux-kernel
Cc: Randy Dunlap, Linlin Zhang, Alasdair Kergon, Mike Snitzer,
Mikulas Patocka, Benjamin Marzinski, dm-devel, Jonathan Corbet,
Shuah Khan, linux-doc
Add this file to the index and use a longer heading overline string
to eliminate warnings:
Documentation/admin-guide/device-mapper/dm-inlinecrypt.rst:1: WARNING: Title overline too short.
========
dm-inlinecrypt
========
Documentation/admin-guide/device-mapper/dm-inlinecrypt.rst: WARNING: document isn't included in any toctree [toc.not_included]
Fixes: b4a0774bd7fd ("dm: add documentation for dm-inlinecrypt target")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
---
Cc: Linlin Zhang <linlin.zhang@oss.qualcomm.com>
Cc: Alasdair Kergon <agk@redhat.com>
Cc: Mike Snitzer <snitzer@kernel.org>
Cc: Mikulas Patocka <mpatocka@redhat.com>
Cc: Benjamin Marzinski <bmarzins@redhat.com>
Cc: dm-devel@lists.linux.dev
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: linux-doc@vger.kernel.org
Documentation/admin-guide/device-mapper/dm-inlinecrypt.rst | 4 ++--
Documentation/admin-guide/device-mapper/index.rst | 1 +
2 files changed, 3 insertions(+), 2 deletions(-)
--- linux-next-20260508.orig/Documentation/admin-guide/device-mapper/dm-inlinecrypt.rst
+++ linux-next-20260508/Documentation/admin-guide/device-mapper/dm-inlinecrypt.rst
@@ -1,6 +1,6 @@
-========
+==============
dm-inlinecrypt
-========
+==============
Device-Mapper's "inlinecrypt" target provides transparent encryption of block devices
using the inline encryption hardware.
--- linux-next-20260508.orig/Documentation/admin-guide/device-mapper/index.rst
+++ linux-next-20260508/Documentation/admin-guide/device-mapper/index.rst
@@ -15,6 +15,7 @@ Device Mapper
dm-flakey
dm-ima
dm-init
+ dm-inlinecrypt
dm-integrity
dm-io
dm-log
^ permalink raw reply
* Re: [PATCH v2 11/14] userfaultfd: add UFFD_FEATURE_RWP_ASYNC for async fault resolution
From: Mike Rapoport @ 2026-05-12 18:05 UTC (permalink / raw)
To: Kiryl Shutsemau (Meta)
Cc: akpm, peterx, david, ljs, surenb, vbabka, Liam.Howlett, ziy,
corbet, skhan, seanjc, pbonzini, jthoughton, aarcange, sj,
usama.arif, linux-mm, linux-kernel, linux-doc, linux-kselftest,
kvm, kernel-team
In-Reply-To: <65492c7b535080c7e85e90cb7ca962a52871e8b9.1778254670.git.kas@kernel.org>
On Fri, May 08, 2026 at 04:55:23PM +0100, Kiryl Shutsemau (Meta) wrote:
> Sync RWP delivers a message and blocks the faulting thread until the
> handler resolves the fault. For working-set tracking the VMM does not
> need the message: it just needs to know, at scan time, which pages
> were touched. Async RWP serves that use case — the kernel restores
> access in-place and the faulting thread continues without blocking.
>
> The VMM reconstructs the access pattern after the fact via
> PAGEMAP_SCAN: pages whose uffd bit is still set (inverted
> PAGE_IS_ACCESSED) were not re-accessed since the last RWP cycle.
>
> Worth calling out: async resolution upgrades writable private anon
> PTEs via pte_mkwrite() when can_change_pte_writable() allows, mirroring
> do_numa_page(). Without it, every re-access of an RWP'd writable page
> would COW-fault a second time.
>
> UFFD_FEATURE_RWP_ASYNC requires UFFD_FEATURE_RWP.
>
> Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
> Assisted-by: Claude:claude-opus-4-6
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
> ---
> fs/userfaultfd.c | 19 ++++++++++++++++++-
> include/linux/userfaultfd_k.h | 6 ++++++
> include/uapi/linux/userfaultfd.h | 11 ++++++++++-
> mm/huge_memory.c | 25 ++++++++++++++++++++++++-
> mm/hugetlb.c | 32 +++++++++++++++++++++++++++++++-
> mm/memory.c | 27 +++++++++++++++++++++++++--
> 6 files changed, 114 insertions(+), 6 deletions(-)
--
Sincerely yours,
Mike.
^ permalink raw reply
* Re: [PATCH RFC v3 0/3] Add splash DRM client
From: Francesco Valla @ 2026-05-12 17:41 UTC (permalink / raw)
To: Mario Limonciello
Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
Simona Vetter, Jonathan Corbet, Jocelyn Falempe,
Javier Martinez Canillas, Shuah Khan, Sam Ravnborg, linux-kernel,
dri-devel, linux-doc, linux-embedded
In-Reply-To: <5d7067de-97b7-4232-9cf6-e4b978696482@amd.com>
Hello Mario,
Thank you for taking a stab at this.
On Mon, May 11, 2026 at 08:59:14PM -0500, Mario Limonciello wrote:
>
>
> On 5/10/26 16:29, Francesco Valla wrote:
> > Hello,
> >
> > this is the third (and hopefully last) RFC version for the DRM-based
> > splash screen.
> >
> > Motivation behind the work can be found in v1 [0]; in a nutshell, the
> > splash DRM client can draw a splashscreen using:
> >
> > - the BMP image supplied by the EFI BGRT;
> > - a BMP image loaded as firmware (either built-in or loaded from the
> > filesystem);
> > - a colored background.
> >
> > This revision greatly simplifies the image selection logic; now the EFI
> > BGRT is always used as first source if enabled, with a fallback to BMP
> > image loaded as firmware and then to a plain color.
> >
> > Sanity checks on the EFI BGRT image have been borrowed from the efifb
> > driver. More complete splash providers (e.g.: Plymouth) have an
> > extensive management of platform-specific quirks, but I don't think it
> > would be reasonable to introduce such complexity here.
> >
> > Additional notes:
> > - Rotation is still not managed (and probably won't?).
> > - Support for tiled screens is untested.
> > - Plain color and BMP sources were tested on QEMU, Beagleplay and
> > i.MX93 FRDM.
> > - EFI BGRT support was tested using QEMU+OVMF.
> >
> > Thank you in advance for any feedback.
>
> Unfortunately I found that I couldn't compile with my normal Kconfig.
>
> ERROR: modpost: "bgrt_tab" [drivers/gpu/drm/clients/drm_client_lib.ko]
> undefined!
> ERROR: modpost: "bgrt_image_size"
> [drivers/gpu/drm/clients/drm_client_lib.ko] undefined!
> make[2]: *** [scripts/Makefile.modpost:147: Module.symvers] Error 1
> make[1]: *** [/home/supermario/src/linux/Makefile:2091: modpost] Error 2
> make: *** [Makefile:248: __sub-make] Error 2
>
> ❮ grep ^CONFIG_DRM .config
> CONFIG_DRM=y
> CONFIG_DRM_KMS_HELPER=m
> CONFIG_DRM_DRAW=y
> CONFIG_DRM_CLIENT=y
> CONFIG_DRM_CLIENT_LIB=m
Here lies the source of the issue, since I forgot to export the BGRT
table symbols. In my test setup I had the clients built-in and didn't
catch this. A simple patch (which will be included in v4) is attached.
> CONFIG_DRM_CLIENT_SELECTION=m
> CONFIG_DRM_CLIENT_SETUP=y
> CONFIG_DRM_FBDEV_EMULATION=y
> CONFIG_DRM_FBDEV_OVERALLOC=100
> CONFIG_DRM_CLIENT_SPLASH=y
> CONFIG_DRM_CLIENT_SPLASH_BACKGROUND_COLOR=0x000000
> CONFIG_DRM_CLIENT_SPLASH_SRC_BGRT=y
> CONFIG_DRM_CLIENT_SPLASH_BMP_SUPPORT=y
> CONFIG_DRM_CLIENT_DEFAULT_SPLASH=y
> CONFIG_DRM_CLIENT_DEFAULT="splash"
> CONFIG_DRM_LOAD_EDID_FIRMWARE=y
> CONFIG_DRM_DISPLAY_HELPER=m
> CONFIG_DRM_DISPLAY_DP_AUX_CHARDEV=y
> CONFIG_DRM_DISPLAY_DP_HELPER=y
> CONFIG_DRM_DISPLAY_DSC_HELPER=y
> CONFIG_DRM_DISPLAY_HDCP_HELPER=y
> CONFIG_DRM_DISPLAY_HDMI_CEC_NOTIFIER_HELPER=y
> CONFIG_DRM_DISPLAY_HDMI_HELPER=y
> CONFIG_DRM_TTM=m
> CONFIG_DRM_EXEC=m
> CONFIG_DRM_BUDDY=m
> CONFIG_DRM_TTM_HELPER=m
> CONFIG_DRM_GEM_SHMEM_HELPER=m
> CONFIG_DRM_SUBALLOC_HELPER=m
> CONFIG_DRM_SCHED=m
> CONFIG_DRM_PANEL_BACKLIGHT_QUIRKS=m
> CONFIG_DRM_PRIVACY_SCREEN=y
> CONFIG_DRM_AMDGPU=m
> CONFIG_DRM_AMDGPU_CIK=y
> CONFIG_DRM_AMDGPU_USERPTR=y
> CONFIG_DRM_AMD_ISP=y
> CONFIG_DRM_AMD_ACP=y
> CONFIG_DRM_AMD_DC=y
> CONFIG_DRM_AMD_DC_FP=y
> CONFIG_DRM_AMD_SECURE_DISPLAY=y
> CONFIG_DRM_BRIDGE=y
> CONFIG_DRM_PANEL_BRIDGE=y
> CONFIG_DRM_PANEL=y
> CONFIG_DRM_SYSFB_HELPER=m
> CONFIG_DRM_SIMPLEDRM=m
> CONFIG_DRM_PANEL_ORIENTATION_QUIRKS=y
> CONFIG_DRM_ACCEL=y
> CONFIG_DRM_ACCEL_AMDXDNA=m
> CONFIG_DRM_ACCEL_HABANALABS=m
> CONFIG_DRM_ACCEL_IVPU=m
> CONFIG_DRM_ACCEL_QAIC=m
> ❮ grep BGRT .config
> CONFIG_ACPI_BGRT=y
> CONFIG_DRM_CLIENT_SPLASH_SRC_BGRT=y
>
Regards,
Francesco
---
drivers/firmware/efi/efi-bgrt.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/firmware/efi/efi-bgrt.c b/drivers/firmware/efi/efi-bgrt.c
index 1da451582812..4ca06ed5d6f5 100644
--- a/drivers/firmware/efi/efi-bgrt.c
+++ b/drivers/firmware/efi/efi-bgrt.c
@@ -17,7 +17,10 @@
#include <linux/efi-bgrt.h>
struct acpi_table_bgrt bgrt_tab;
+EXPORT_SYMBOL(bgrt_tab);
+
size_t bgrt_image_size;
+EXPORT_SYMBOL(bgrt_image_size);
struct bmp_header {
u16 id;
--
^ permalink raw reply related
* Re: [PATCH v2 12/14] userfaultfd: add UFFDIO_SET_MODE for runtime sync/async toggle
From: Mike Rapoport @ 2026-05-12 18:11 UTC (permalink / raw)
To: Kiryl Shutsemau (Meta)
Cc: akpm, peterx, david, ljs, surenb, vbabka, Liam.Howlett, ziy,
corbet, skhan, seanjc, pbonzini, jthoughton, aarcange, sj,
usama.arif, linux-mm, linux-kernel, linux-doc, linux-kselftest,
kvm, kernel-team
In-Reply-To: <e8f142c530b715c6d45475230c4e35a1cfd8dbd4.1778254670.git.kas@kernel.org>
On Fri, May 08, 2026 at 04:55:24PM +0100, Kiryl Shutsemau (Meta) wrote:
> Add an ioctl to toggle async mode at runtime without re-registering
> the userfaultfd. This allows a VMM to switch between sync and async
> RWP modes on-the-fly -- for example, starting in async mode for
> working set scanning, then switching to sync mode to intercept faults
> during page eviction.
>
> UFFDIO_SET_MODE takes an enable/disable bitmask of UFFD_FEATURE_*
> flags. Only UFFD_FEATURE_RWP_ASYNC is toggleable today; the ioctl
> rejects any other bit with -EINVAL. Enabling RWP_ASYNC also requires
> RWP to have been negotiated at UFFDIO_API time, mirroring the
> UFFDIO_API invariant.
>
> Fault-path readers of ctx->features run under mmap_read_lock or a
> per-VMA lock; the RMW takes mmap_write_lock and calls
> vma_start_write() on every UFFD-armed VMA, so those readers are fully
> excluded. userfaultfd_show_fdinfo(), however, reads ctx->features
> without any lock, so the RMW is written as a single WRITE_ONCE and
> fdinfo reads it with READ_ONCE. That keeps the lockless observer from
> seeing a mid-RMW intermediate and removes the audit burden when new
> toggleable bits are added later.
>
> When switching to async, pending sync waiters are woken so they retry
> and auto-resolve under the new mode.
>
> Signed-off-by: Kiryl Shutsemau (Meta) <kas@kernel.org>
> Assisted-by: Claude:claude-opus-4-6
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
> ---
> fs/userfaultfd.c | 150 +++++++++++++++++++++++++------
> include/uapi/linux/userfaultfd.h | 14 +++
> 2 files changed, 136 insertions(+), 28 deletions(-)
--
Sincerely yours,
Mike.
^ permalink raw reply
* Re: [PATCH v12 02/11] lib: kstrtox: add kstrtoudec64() and kstrtodec64()
From: Rodrigo Alencar @ 2026-05-12 18:15 UTC (permalink / raw)
To: Andy Shevchenko, Rodrigo Alencar
Cc: Andy Shevchenko, Jonathan Cameron, Rodrigo Alencar via B4 Relay,
rodrigo.alencar, linux-kernel, linux-iio, devicetree, linux-doc,
David Lechner, Andy Shevchenko, Lars-Peter Clausen,
Michael Hennerich, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
Jonathan Corbet, Andrew Morton, Petr Mladek, Steven Rostedt,
Rasmus Villemoes, Sergey Senozhatsky, Shuah Khan, David Laight
In-Reply-To: <agNnfWZa9_NyLoWq@ashevche-desk.local>
On 26/05/12 08:46PM, Andy Shevchenko wrote:
> On Tue, May 12, 2026 at 06:26:12PM +0100, Rodrigo Alencar wrote:
> > On 26/05/12 08:13PM, Andy Shevchenko wrote:
> > > On Tue, May 12, 2026 at 05:35:59PM +0100, Rodrigo Alencar wrote:
> > > > On 26/05/12 06:21PM, Andy Shevchenko wrote:
> > > > > On Tue, May 12, 2026 at 6:11 PM Rodrigo Alencar
> > > > > <455.rodrigo.alencar@gmail.com> wrote:
> > > > > > On 26/05/12 05:43PM, Andy Shevchenko wrote:
> > > > > > > On Tue, May 12, 2026 at 03:12:24PM +0100, Rodrigo Alencar wrote:
> > > > > > > > On 26/05/12 04:48PM, Andy Shevchenko wrote:
> > > > > > > > > On Tue, May 12, 2026 at 02:21:14PM +0100, Rodrigo Alencar wrote:
> > > > > > > > > > On 26/05/12 04:12PM, Andy Shevchenko wrote:
> > > > > > > > > > > On Tue, May 12, 2026 at 12:39:53PM +0100, Jonathan Cameron wrote:
> > > > > > > > > > > > On Sun, 10 May 2026 13:42:20 +0100
> > > > > > > > > > > > Rodrigo Alencar via B4 Relay <devnull+rodrigo.alencar.analog.com@kernel.org> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > > > Add helpers that parse decimal numbers into a 64-bit number, i.e., decimal
> > > > > > > > > > > > > point numbers with pre-defined scale are parsed into a 64-bit value (fixed
> > > > > > > > > > > > > precision). After the decimal point, digits beyond the specified scale
> > > > > > > > > > > > > are ignored.
...
> > > > > > I think we are going in circles here and we could look at the code instead:
> > > > > > - integer parsing with _parse_integer()
> > > > > > - overflow check and validation of the return value
> > > > > > - fractional parsing with _parse_integer_limit()
> > > > > > - overflow check and validation of the return value
> > > > >
> > > > > No, this is not fully true. That's what my whole point is about. The
> > > > > max_chars parameter limits the input check, then it skips an arbitrary
> > > > > number of digits and only *then* it checks for \n and \0. What will be
> > > > > the result of the
> > > > > 0.00000000000000000000000000000000423 in your case? Whatever scale you
> > > > > gave it will return 0 without checking on how many digits were
> > > > > supplied.
> > > >
> > > > I suppose that is a valid input and 0 is the expected result there.
> > > >
> > > > > All the same for 0.9999999999999999999999999999999000423. My
> > > > > point is that we should limit this by 19 digits.
> > > >
> > > > why we need to limit by 19? Digits beyond the scale carry no value...
> > >
> > > ...only if they are all 0:s.
> >
> > I thought your concern was on input length.
>
> One of, since I think you rose the topic of leading 0:s for integers and
> I agreed with that which makes sense to have mirrored in fractional part.
>
> > > > just like leading zeros to the integer part (which is also accepted by
> > > > kstrtoull() when parsing with base 10). Not sure why this is invalid input.
> > >
> > > See above. I agree on truncating trailing 0:s as it's done for leading ones
> > > in integer part, but if any of the digit behind 19th is not 0, it's an overflow
> > > condition (or bad input, depending how strict the rules are).
> >
> > stating in the documentation that digits beyond the scale are ignored is not
> > enough?
>
> It's in case we are not for kstrto*() family. My understanding that kstrto*()
> use strict rules on the input in overflow check.
>
> > > > > On top of that, what about -0.9(19 times) ? the fraction should be u64
> > > > > in this case and it's fine. The sign applies to the combined value.
> > > >
> > > > yes, the range for signed values is verified later.
> > >
> > > > > > - extra scaling and truncation happening outside if needed.
> > > > >
> > > > > Right, but the given input may be way too long and still needs more validation.
> > > >
> > > > What is the problem with a long input of digits?
> > > > C compiler does not complain about this when parsing a float value,
> > > > python does not
> > > > complain about this when parsing floats or decimals either.
> > >
> > > Because there is an exponent limit and for double it's something like 1e307
> > > IIRC, meaning, try 1024 digits to be sure.
> > >
> > > Python most likely uses the library for big numbers, you can't compare it at all with this.
> >
> > You would be fine if the truncation loop:
> >
> > while (isdigit(*s)) /* truncate */
> > s++;
> >
> > is bounded by (19-scale) iteration count? or it should keep iterating if those are zero?
>
> Ideally both.
>
> We don't care about the digits in the range of 19-scale and skip all 0:s after
> that.
>
> /* truncate unrequired digits within type limit, i.e. 19 decimal digits */
> while (isdigit(*s) && "(s - pos_of_dot) is less than 19")
> s++;
> while (*s == '0') /* truncate trailing 0:s, it's not a bad input nor overflow */
> s++;
We could have agreed on something like that from the beginning!
And I think that changing the logic to something like this would not change a
thing for the kind of inputs we expect, it will just complicate the code.
I suppose that kind of kstrto*() rules were never stated anywhere.
|> 20th digit
Also, 0.00000000000000000001 still sounds like a valid decimal number to me, even
though it is going to be parsed as 0!
>
> // Now if it's not \0 nor \n and
> // a) still a digit consider either overflow or bad input,
> // b) if not a digit, consider as bad input.
>
> In a) I tend to be on par with the other k*() and consider that as overflow.
>
> > is that the only concern? Again, the usage of _parse_integer_limit(s, 10, &_frac, scale)
> > avoids a 64-bit division when checking the rv.
>
> I'm not against usage of _parse_integer_limit(), I'm for stricter rules on the input.
> With the above addressed, I have no more concerns.
Thanks! I will proceed with the requested adjustments.
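
For reference, a minimal userspace sketch of the tail check discussed above,
under stated assumptions: check_frac_tail(), MAX_FRAC_DIGITS and the
"consumed" parameter are hypothetical names, not part of the patch, and the
caller is assumed to have already parsed "consumed" fractional digits (at most
the scale). Digits up to the 19th are truncated silently, trailing 0:s are
skipped, and a remaining digit is reported as overflow (option a) above).

```c
#include <ctype.h>
#include <errno.h>
#include <stddef.h>

#define MAX_FRAC_DIGITS 19	/* decimal digits that always fit in a u64 */

/*
 * check_frac_tail() - hypothetical helper, not taken from the patch.
 * @frac: first character after the decimal point
 * @consumed: fractional digits already parsed by the caller; the caller
 *            guarantees that frac[0..consumed-1] are digits
 *
 * Returns 0 if the remainder of the string is acceptable, -ERANGE if a
 * non-zero digit shows up beyond the 19th fractional digit, and -EINVAL
 * for any other trailing character.
 */
static int check_frac_tail(const char *frac, size_t consumed)
{
	const char *s = frac + consumed;

	/* Truncate digits beyond the scale, but only within 19 digits. */
	while (isdigit((unsigned char)*s) &&
	       (size_t)(s - frac) < MAX_FRAC_DIGITS)
		s++;

	/* Trailing 0:s are neither bad input nor overflow. */
	while (*s == '0')
		s++;

	/* A single trailing newline is tolerated, as in kstrto*(). */
	if (*s == '\n')
		s++;
	if (*s == '\0')
		return 0;

	/* Still a digit: overflow. Anything else: bad input. */
	return isdigit((unsigned char)*s) ? -ERANGE : -EINVAL;
}
```

With this rule, 0.5 followed by any number of 0:s is accepted, while
0.00000000000000000001 (non-zero 20th digit) is rejected with -ERANGE instead
of being parsed as 0.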
...
--
Kind regards,
Rodrigo Alencar
^ permalink raw reply
* Re: [PATCH RFC v4 01/10] dt-bindings: iio: frequency: add ad9910
From: Jonathan Cameron @ 2026-05-12 18:31 UTC (permalink / raw)
To: Rodrigo Alencar via B4 Relay
Cc: rodrigo.alencar, linux-iio, devicetree, linux-kernel, linux-doc,
linux-hardening, Lars-Peter Clausen, Michael Hennerich,
David Lechner, Andy Shevchenko, Rob Herring, Krzysztof Kozlowski,
Conor Dooley, Philipp Zabel, Jonathan Corbet, Shuah Khan,
Kees Cook, Gustavo A. R. Silva
In-Reply-To: <20260508-ad9910-iio-driver-v4-1-d26bfd20ee3d@analog.com>
On Fri, 08 May 2026 18:00:17 +0100
Rodrigo Alencar via B4 Relay <devnull+rodrigo.alencar.analog.com@kernel.org> wrote:
> From: Rodrigo Alencar <rodrigo.alencar@analog.com>
>
> DT-bindings for AD9910, a 1 GSPS DDS with 14-bit DAC. It includes
> configurations for clocks, DAC current, reset and basic GPIO control.
I think this is getting close enough now that for next version you should
drop the RFC (which is probably gating DT binding folk giving it
a detailed review!)
>
> Signed-off-by: Rodrigo Alencar <rodrigo.alencar@analog.com>
> +
> + adi,dac-output-current-microamp:
> + minimum: 8640
> + maximum: 31590
> + default: 20070
> + description:
> + DAC full-scale output current in microamps.
> +
Can we use generic dac.yaml defined output-range-microamp? The base will be 0 always but
that shouldn't matter.
^ permalink raw reply
* Re: [RFC net-next 0/4] devlink: Add boot-time defaults
From: Jiri Pirko @ 2026-05-12 18:35 UTC (permalink / raw)
To: Parav Pandit
Cc: Mark Bloch, Jakub Kicinski, Eric Dumazet, Paolo Abeni,
Andrew Lunn, David S. Miller, Jonathan Corbet, Shuah Khan,
Simon Horman, Saeed Mahameed, Leon Romanovsky, Tariq Toukan,
Andrew Morton, Borislav Petkov (AMD), Randy Dunlap, Dave Hansen,
Christian Brauner, Petr Mladek, Peter Zijlstra (Intel),
Thomas Gleixner, Pawan Gupta, Dapeng Mi, Kees Cook, Marco Elver,
Eric Biggers, NBU-Contact-Li Rongqing (EXTERNAL),
Paul E. McKenney, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
linux-rdma@vger.kernel.org
In-Reply-To: <SJ0PR12MB6806D8ADF943B30AD3B479CCDC392@SJ0PR12MB6806.namprd12.prod.outlook.com>
Tue, May 12, 2026 at 05:25:21PM CEST, parav@nvidia.com wrote:
>
>
>> From: Jiri Pirko <jiri@resnulli.us>
>> Sent: 12 May 2026 07:37 PM
>>
>> Tue, May 12, 2026 at 03:48:32PM CEST, parav@nvidia.com wrote:
>> >
>> >> From: Jiri Pirko <jiri@resnulli.us>
>> >> Sent: 12 May 2026 02:16 PM
>> >>
>> >> Mon, May 11, 2026 at 08:21:37PM +0200, parav@nvidia.com wrote:
>> >> >
>> >> >> From: Mark Bloch <mbloch@nvidia.com>
>> >> >> Sent: 10 May 2026 06:02 PM
>> >> >>
>> >> >
>> >> >[..]
>> >> >
>> >> >> > I look at it from the perspective that from some CX generation,
>> >> >> > switchdev mode should be default. So that is a device-based decision.
>> >> >> > I believe as such it can optionally be permanently configured (nv config)
>> >> >> > on older device. Why not?
>> >> >>
>> >> >Because sometimes switchdev_inactive is needed and sometimes not.
>> >> >Such knob is not device decision.
>> >>
>> >> That is what I would call corner case. In that, user can use userspace
>> >> configuration to change the mode in runtime.
>> >>
>> >Corner vs common depends on users one talks to. :)
>> >If fw has switchdev(active) as default, and the
>> >user needs to run switchdev_inactive, it will actually break their switching applications.
>>
>> Can you describe the actual breakage please?
>>
>Driver default was switchdev so all the traffic is forwarded to the switch,
>and user didn't have chance to setup the fdb rules.
>So packets are dropped but user didn't expect the traffic to be forwarded.
User may switch mode to switchdev_inactive early on, before any of the
representors are created. What's the issue then?
>
>With this RFC, the device would start in the switchdev_inactive.
>And user's goal is achieved.
>
>> >
>> >So, one needs to invent switchdev_inactive in the FW.
>> >
>> >Jakub's suggestion in this RFC is covering both the scenarios uniformly without above problems.
>> >Single uapi for all the cases, so looks good to me.
>> >
>> >Moreover, do not understand how alternative solves such problems.
>> >i.e. user is unable to configure the fw because driver is not yet loaded/up.
>>
>> See my other reply in this thread. I don't think there is a need to
>> configure anything in FW. If we fix the behaviour in switchdev mode for
>> non-sriov user and change the default, no fw knob needed. What am I
>> missing?
>>
>If I understood your suggestion right, is it the devlinkd based solution?
The suggestion is to use "switchdev" as default with user configuration
no matter if it is devlinkd or something else.
>
>If yes, then Mark explained that it has the issue of all drivers to be loaded, followed by user space to start.
^ permalink raw reply
* [PATCH v5 00/11] PCI: liveupdate: PCI core support for Live Update
From: David Matlack @ 2026-05-12 18:48 UTC (permalink / raw)
To: kexec, linux-doc, linux-kernel, linux-mm, linux-pci
Cc: Adithya Jayachandran, Alexander Graf, Alex Williamson,
Bjorn Helgaas, Chris Li, David Matlack, David Rientjes, Jacob Pan,
Jason Gunthorpe, Jonathan Corbet, Josh Hilke, Leon Romanovsky,
Lukas Wunner, Mike Rapoport, Parav Pandit, Pasha Tatashin,
Pranjal Shrivastava, Pratyush Yadav, Saeed Mahameed,
Samiullah Khawaja, Shuah Khan, Vipin Sharma, William Tu, Yi Liu
This series can be found on GitHub:
https://github.com/dmatlack/linux/tree/liveupdate/pci/base/v5
This patch series introduces the initial support in the PCI core for
Live Update, enabling drivers to preserve PCI devices across a
kexec-based kernel update without interrupting the device. This
functionality is critical for minimizing downtime in environments where
PCI devices (e.g., those assigned to VMs via VFIO) must continue
operating or maintain state across a host kernel upgrade.
Specifically, this patch series allows preserved PCI devices to perform
memory transactions to/from system memory (DMA) uninterrupted across a
Live Update. The devices can be behind a bridge, but must not be a VF.
Support for P2P and preserving VFs will come in future series.
Series Overview
---------------
This series implements the following to support PCI device preservation
across Live Update:
1. Set up a File-Lifecycle-Bound (FLB) handler to track and preserve
PCI-specific state (struct pci_ser) across Live Update using Kexec
Handover (KHO).
2. Add APIs for drivers to register "outgoing" devices for
preservation and for the PCI core to identify "incoming" preserved
devices during enumeration.
3. Automatically preserve all upstream bridges for any preserved
endpoint. Use reference counting to ensure bridges remain preserved
as long as any downstream device is preserved.
4. Guarantee that preserved devices can be identified by the same
RequesterID (bus, device, function) for as long as they are
preserved by always inheriting secondary and subordinate bus
numbers and ARI Forwarding Enable on bridges with preserved
downstream endpoints.
5. Guarantee that memory transactions to/from preserved devices are
routed the same way by inheriting Access Control Services (ACS)
flags across a Live Update.
6. Modify the PCI shutdown path to avoid disabling bus mastering on
preserved devices and their upstream bridges, allowing memory
transactions to continue uninterrupted.
7. Provide comprehensive documentation for the FLB API, device
tracking mechanisms, and the division of responsibilities between
the PCI core, drivers, and userspace.
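
As an illustration of why item 4 keeps bus numbers constant, here is a minimal
userspace sketch of how bus, device, and function pack into a 16-bit
RequesterID. The rid_*() helper names are hypothetical; the bit layout is
the conventional one (8-bit bus, 5-bit device, 3-bit function), the same
layout the kernel's PCI_DEVID() and PCI_DEVFN() macros produce.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration; not part of this series. */

/* Pack bus/device/function into a 16-bit RequesterID. */
static uint16_t rid_pack(uint8_t bus, uint8_t dev, uint8_t fn)
{
	return (uint16_t)((bus << 8) | ((dev & 0x1f) << 3) | (fn & 0x07));
}

/* Bus number: upper 8 bits. */
static uint8_t rid_bus(uint16_t rid)
{
	return rid >> 8;
}

/* Device number: 5 bits (without ARI). */
static uint8_t rid_dev(uint16_t rid)
{
	return (rid >> 3) & 0x1f;
}

/* Function number: low 3 bits (without ARI). */
static uint8_t rid_fn(uint16_t rid)
{
	return rid & 0x07;
}
```

Since the bus number occupies the upper byte, renumbering a secondary bus
changes the RequesterID of every device behind that bridge. With ARI
forwarding enabled, the low 8 bits are treated as a single function number,
so the way those bits are interpreted also depends on the bridge setting.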
Dependencies
------------
This series is built on top of the next branch of liveupdate.git tree
which has 2 commits to enable refcounting the incoming FLB:
https://git.kernel.org/pub/scm/linux/kernel/git/liveupdate/linux.git/log/?h=next
Testing
-------
This series was tested in conjunction with v4 of the VFIO PCI driver
series:
https://lore.kernel.org/kvm/20260511234802.2280368-1-vipinsh@google.com/
The full set of patches that I used for testing can be found on GitHub:
https://github.com/dmatlack/linux/tree/liveupdate/pci/base/v5-with-vfio
The full set of patches was tested using the new VFIO selftests:
- vfio_pci_liveupdate_uapi_test
- vfio_pci_liveupdate_kexec_test
Both tests were run in a QEMU-based VM environment, using a
single virtio-net PCIe device connected to a root port (to exercise the
bridge support in this series), and in a baremetal environment on an
Intel EMR server, using 8x Intel DSA PCIe devices (each on a host
bridge) and 1x NVMe device connected to a root port.
Future Work
-----------
After this series we expect to make further improvements to the PCI core
support for Live Update.
- Allow P2P across Live Update by avoiding sizing or moving preserved
device BARs and preserving all upstream bridge windows.
- Support preserving Virtual Functions, by preserving SR-IOV
configuration on PFs and enumerating VFs after Live Update.
Changelog
---------
v5:
- Update PCI LIVE UPDATE entry in MAINTAINERS to use liveupdate.git,
add kexec@ mailing list, and drop Bjorn (Pasha, Bjorn, Pratyush)
- Create separate headers for Live Update definitions to avoid future
patch conflicts (me)
- Add kernel doc for public (Driver) API (me)
- Rename reserved field to padding (Vipin)
- Reorder checks outside of mutex where possible (Jacob)
- Clarify refcount in struct pci_dev_ser in kernel-doc (Sami)
- Require CONFIG_64BIT to avoid overflowing xarray key (Sashiko)
- Various spelling and grammar fixes (Bjorn)
- Ensure incoming and outgoing devices do not have their bus numbers
changed during manual rescans via sysfs (Jacob)
- Fix refcount dropping for upstream bridges during finish (Sashiko)
- Disallow devices with PCI_DEV_FLAGS_ACS_ENABLED_QUIRK to simplify
ACS inheritance across Live Update (Sashiko)
- Fix ACS re-enablement via pci_restore_state() (Sashiko)
- Drop commit that requires singleton iommu groups (me, Sami)
- Add per-device lock to protect Live Update fields (Sami, Sashiko)
v4: https://lore.kernel.org/linux-pci/20260423212316.3431746-1-dmatlack@google.com/
v3: https://lore.kernel.org/kvm/20260323235817.1960573-1-dmatlack@google.com/
v2: https://lore.kernel.org/kvm/20260129212510.967611-1-dmatlack@google.com/
v1: https://lore.kernel.org/kvm/20251126193608.2678510-1-dmatlack@google.com/
rfc: https://lore.kernel.org/kvm/20251018000713.677779-1-vipinsh@google.com/
David Matlack (11):
PCI: liveupdate: Set up FLB handler for the PCI core
PCI: liveupdate: Track outgoing preserved PCI devices
PCI: liveupdate: Track incoming preserved PCI devices
PCI: liveupdate: Document driver binding responsibilities
PCI: liveupdate: Keep bus numbers constant during Live Update
PCI: liveupdate: Auto-preserve upstream bridges across Live Update
PCI: liveupdate: Inherit ACS flags in incoming preserved devices
PCI: liveupdate: Inherit ARI Forwarding Enable on preserved bridges
PCI: liveupdate: Freeze preservation status during shutdown
PCI: liveupdate: Do not disable bus mastering on preserved devices
during kexec
Documentation: PCI: Add documentation for Live Update
Documentation/PCI/index.rst | 1 +
Documentation/PCI/liveupdate.rst | 29 +
.../admin-guide/kernel-parameters.txt | 6 +-
Documentation/core-api/liveupdate.rst | 1 +
MAINTAINERS | 12 +
drivers/pci/Kconfig | 14 +
drivers/pci/Makefile | 1 +
drivers/pci/liveupdate.c | 807 ++++++++++++++++++
drivers/pci/liveupdate.h | 66 ++
drivers/pci/pci-driver.c | 33 +-
drivers/pci/pci.c | 13 +-
drivers/pci/probe.c | 29 +-
include/linux/kho/abi/pci.h | 64 ++
include/linux/pci.h | 4 +
include/linux/pci_liveupdate.h | 75 ++
15 files changed, 1140 insertions(+), 15 deletions(-)
create mode 100644 Documentation/PCI/liveupdate.rst
create mode 100644 drivers/pci/liveupdate.c
create mode 100644 drivers/pci/liveupdate.h
create mode 100644 include/linux/kho/abi/pci.h
create mode 100644 include/linux/pci_liveupdate.h
base-commit: 34e8f02817e31826e76bb2ded48bf28fe921f20b
--
2.54.0.563.g4f69b47b94-goog
^ permalink raw reply
* [PATCH v5 01/11] PCI: liveupdate: Set up FLB handler for the PCI core
From: David Matlack @ 2026-05-12 18:48 UTC (permalink / raw)
To: kexec, linux-doc, linux-kernel, linux-mm, linux-pci
Cc: Adithya Jayachandran, Alexander Graf, Alex Williamson,
Bjorn Helgaas, Chris Li, David Matlack, David Rientjes, Jacob Pan,
Jason Gunthorpe, Jonathan Corbet, Josh Hilke, Leon Romanovsky,
Lukas Wunner, Mike Rapoport, Parav Pandit, Pasha Tatashin,
Pranjal Shrivastava, Pratyush Yadav, Saeed Mahameed,
Samiullah Khawaja, Shuah Khan, Vipin Sharma, William Tu, Yi Liu
In-Reply-To: <20260512184846.119396-1-dmatlack@google.com>
Set up a File-Lifecycle-Bound (FLB) handler for the PCI core to enable
it to participate in the preservation of PCI devices across Live Update.
Essentially, this commit enables the PCI core to allocate a struct
(struct pci_ser) and preserve it across a Live Update whenever at least
one device is preserved.
Preserving PCI devices across Live Update is built on top of the Live
Update Orchestrator's (LUO) support for file preservation. Drivers are
expected to expose a file to userspace to represent a single PCI device
and support preservation of that file. This is intended primarily to
support preservation of PCI devices bound to VFIO drivers.
This commit enables drivers to register their liveupdate_file_handler
with the PCI core so that the PCI core can do its own tracking and
enforcement of which devices are preserved.
pci_liveupdate_register_flb(driver_file_handler);
pci_liveupdate_unregister_flb(driver_file_handler);
When the first file (with a handler registered with the PCI core) is
preserved, the PCI core will be notified to allocate its tracking struct
(pci_ser). When the last file is unpreserved (i.e. preservation
cancelled) the PCI core will be notified to free struct pci_ser.
This struct is preserved across a Live Update using KHO and can be
fetched by the PCI core during early boot (e.g. during device
enumeration) so that it knows which devices were preserved.
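
The first-file/last-file lifecycle described above can be modelled in
userspace roughly as follows. This is only a sketch under stated assumptions:
struct flb_model and the flb_model_*() functions are hypothetical and do not
exist in the kernel, calloc()/free() stand in for kho_alloc_preserve() and
kho_unpreserve_free(), and locking is omitted.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
struct flb_model {
	unsigned int nr_files;	/* files currently preserved */
	void *obj;		/* stands in for struct pci_ser */
};

/* Called for every preserved file; allocates on the first one only. */
static int flb_model_file_preserve(struct flb_model *flb, size_t size)
{
	if (flb->nr_files == 0) {
		flb->obj = calloc(1, size);
		if (!flb->obj)
			return -1;
	}
	flb->nr_files++;
	return 0;
}

/* Called for every unpreserved file; frees after the last one. */
static void flb_model_file_unpreserve(struct flb_model *flb)
{
	if (flb->nr_files == 0)
		return;
	if (--flb->nr_files == 0) {
		free(flb->obj);
		flb->obj = NULL;
	}
}
```

The tracking object stays the same while at least one file remains preserved,
which is what lets later commits record per-device state in it.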
Note: This commit only allocates struct pci_ser and preserves it across
Live Update. A subsequent commit will add an API for drivers to tell the
PCI core exactly which devices are being preserved.
Note: There is no reason to check for kho_is_enabled() since it can be
assumed to return true. If KHO was not enabled then Live Update would
not be enabled and these routines would never run.
Signed-off-by: David Matlack <dmatlack@google.com>
---
MAINTAINERS | 10 +++
drivers/pci/Kconfig | 14 +++
drivers/pci/Makefile | 1 +
drivers/pci/liveupdate.c | 153 +++++++++++++++++++++++++++++++++
include/linux/kho/abi/pci.h | 61 +++++++++++++
include/linux/pci.h | 1 +
include/linux/pci_liveupdate.h | 30 +++++++
7 files changed, 270 insertions(+)
create mode 100644 drivers/pci/liveupdate.c
create mode 100644 include/linux/kho/abi/pci.h
create mode 100644 include/linux/pci_liveupdate.h
diff --git a/MAINTAINERS b/MAINTAINERS
index 2fb1c75afd16..6c618830cf61 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -20530,6 +20530,16 @@ L: linux-pci@vger.kernel.org
S: Supported
F: Documentation/PCI/pci-error-recovery.rst
+PCI LIVE UPDATE
+M: David Matlack <dmatlack@google.com>
+L: kexec@lists.infradead.org
+L: linux-pci@vger.kernel.org
+S: Maintained
+T: git git://git.kernel.org/pub/scm/linux/kernel/git/liveupdate/linux.git
+F: drivers/pci/liveupdate.c
+F: include/linux/kho/abi/pci.h
+F: include/linux/pci_liveupdate.h
+
PCI MSI DRIVER FOR ALTERA MSI IP
L: linux-pci@vger.kernel.org
S: Orphan
diff --git a/drivers/pci/Kconfig b/drivers/pci/Kconfig
index 33c88432b728..08398cbe970c 100644
--- a/drivers/pci/Kconfig
+++ b/drivers/pci/Kconfig
@@ -328,6 +328,20 @@ config VGA_ARB_MAX_GPUS
Reserves space in the kernel to maintain resource locking for
multiple GPUS. The overhead for each GPU is very small.
+config PCI_LIVEUPDATE
+ bool "PCI Live Update Support (EXPERIMENTAL)"
+ depends on PCI && LIVEUPDATE
+ help
+ Enable PCI core support for preserving PCI devices across Live
+ Update. This, in combination with support in a device's driver,
+ enables PCI devices to run and perform memory transactions
+ uninterrupted during a kexec for Live Update.
+
+ This option should only be enabled by developers working on
+ implementing this support.
+
+ If unsure, say N.
+
source "drivers/pci/hotplug/Kconfig"
source "drivers/pci/controller/Kconfig"
source "drivers/pci/endpoint/Kconfig"
diff --git a/drivers/pci/Makefile b/drivers/pci/Makefile
index 41ebc3b9a518..e8d003cb6757 100644
--- a/drivers/pci/Makefile
+++ b/drivers/pci/Makefile
@@ -16,6 +16,7 @@ obj-$(CONFIG_PROC_FS) += proc.o
obj-$(CONFIG_SYSFS) += pci-sysfs.o slot.o
obj-$(CONFIG_ACPI) += pci-acpi.o
obj-$(CONFIG_GENERIC_PCI_IOMAP) += iomap.o
+obj-$(CONFIG_PCI_LIVEUPDATE) += liveupdate.o
endif
obj-$(CONFIG_OF) += of.o
diff --git a/drivers/pci/liveupdate.c b/drivers/pci/liveupdate.c
new file mode 100644
index 000000000000..dd2449e12b6d
--- /dev/null
+++ b/drivers/pci/liveupdate.c
@@ -0,0 +1,153 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * Copyright (c) 2026, Google LLC.
+ * David Matlack <dmatlack@google.com>
+ */
+
+/**
+ * DOC: PCI Live Update
+ *
+ * The PCI subsystem participates in the Live Update process to enable drivers
+ * to preserve their PCI devices across kexec.
+ *
+ * .. note::
+ * The support for preserving PCI devices across Live Update is currently
+ * *partial* and should be considered *experimental*. It should only be
+ * used by developers working on the implementation for the time being.
+ *
+ * To enable the support, enable ``CONFIG_PCI_LIVEUPDATE``.
+ *
+ * File-Lifecycle-Bound (FLB) Data
+ * ===============================
+ *
+ * PCI device preservation across Live Update is built on top of the Live Update
+ * Orchestrator's (LUO) support for file preservation across kexec. Drivers
+ * are expected to expose a file to represent a single PCI device and support
+ * preservation of that file with ``ioctl(LIVEUPDATE_SESSION_PRESERVE_FD)``.
+ * This allows userspace to control the preservation of devices and ensure
+ * proper lifecycle management while a device is preserved. The first intended
+ * use-case is preserving vfio-pci device files.
+ *
+ * The PCI core maintains its own state about what devices are being preserved
+ * across Live Update using a feature called File-Lifecycle-Bound (FLB) data in
+ * LUO. Essentially, this allows the PCI core to allocate struct pci_ser when
+ * the first device (file) is preserved and free it when the last device (file)
+ * is unpreserved. After kexec, the PCI core can fetch the struct pci_ser (which
+ * was constructed by the previous kernel) from LUO at any time (e.g. during
+ * enumeration) so that it knows which devices were preserved.
+ *
+ * To enable the PCI core to be notified whenever a file representing a device
+ * is preserved, drivers must register their struct liveupdate_file_handler with
+ * the PCI core by using the following APIs:
+ *
+ * * ``pci_liveupdate_register_flb(driver_file_handler)``
+ * * ``pci_liveupdate_unregister_flb(driver_file_handler)``
+ */
+
+#define pr_fmt(fmt) "PCI: liveupdate: " fmt
+
+#include <linux/io.h>
+#include <linux/kexec_handover.h>
+#include <linux/kho/abi/pci.h>
+#include <linux/liveupdate.h>
+#include <linux/mutex.h>
+#include <linux/mm.h>
+#include <linux/pci.h>
+
+static int pci_flb_preserve(struct liveupdate_flb_op_args *args)
+{
+ struct pci_dev *dev = NULL;
+ u32 max_nr_devices = 0;
+ struct pci_ser *ser;
+ unsigned long size;
+
+ /*
+ * Allocate enough space to preserve all of the devices that are
+ * currently present on the system. Extra padding can be added to this
+ * in the future to increase the chances that there is enough room to
+ * preserve devices that are not yet present on the system (e.g. VFs,
+ * hot-plugged devices).
+ */
+ for_each_pci_dev(dev)
+ max_nr_devices++;
+
+ size = struct_size_t(struct pci_ser, devices, max_nr_devices);
+
+ ser = kho_alloc_preserve(size);
+ if (IS_ERR(ser))
+ return PTR_ERR(ser);
+
+ pr_debug("Preserved struct pci_ser with room for %u devices\n",
+ max_nr_devices);
+
+ ser->max_nr_devices = max_nr_devices;
+ ser->nr_devices = 0;
+
+ args->obj = ser;
+ args->data = virt_to_phys(ser);
+ return 0;
+}
+
+static void pci_flb_unpreserve(struct liveupdate_flb_op_args *args)
+{
+ struct pci_ser *ser = args->obj;
+
+ WARN_ON_ONCE(ser->nr_devices);
+ kho_unpreserve_free(ser);
+
+ pr_debug("Unpreserved struct pci_ser\n");
+}
+
+static int pci_flb_retrieve(struct liveupdate_flb_op_args *args)
+{
+ args->obj = phys_to_virt(args->data);
+ return 0;
+}
+
+static void pci_flb_finish(struct liveupdate_flb_op_args *args)
+{
+ kho_restore_free(args->obj);
+}
+
+static struct liveupdate_flb_ops pci_liveupdate_flb_ops = {
+ .preserve = pci_flb_preserve,
+ .unpreserve = pci_flb_unpreserve,
+ .retrieve = pci_flb_retrieve,
+ .finish = pci_flb_finish,
+ .owner = THIS_MODULE,
+};
+
+static struct liveupdate_flb pci_liveupdate_flb = {
+ .ops = &pci_liveupdate_flb_ops,
+ .compatible = PCI_LUO_FLB_COMPATIBLE,
+};
+
+/**
+ * pci_liveupdate_register_flb() - Register a file handler with the PCI core
+ * @fh: The file handler to register.
+ *
+ * Drivers should call pci_liveupdate_register_flb() to register their
+ * struct liveupdate_file_handler with the PCI core. This enables the PCI core
+ * to allocate its outgoing struct pci_ser whenever the first device is
+ * preserved, and free it when the last device is unpreserved.
+ *
+ * Return: 0 on success, <0 on failure.
+ */
+int pci_liveupdate_register_flb(struct liveupdate_file_handler *fh)
+{
+ pr_debug("Registering file handler \"%s\"\n", fh->compatible);
+ return liveupdate_register_flb(fh, &pci_liveupdate_flb);
+}
+EXPORT_SYMBOL_GPL(pci_liveupdate_register_flb);
+
+/**
+ * pci_liveupdate_unregister_flb() - Unregister a file handler with the PCI core
+ * @fh: The file handler to unregister.
+ */
+void pci_liveupdate_unregister_flb(struct liveupdate_file_handler *fh)
+{
+ pr_debug("Unregistering file handler \"%s\"\n", fh->compatible);
+ liveupdate_unregister_flb(fh, &pci_liveupdate_flb);
+}
+EXPORT_SYMBOL_GPL(pci_liveupdate_unregister_flb);
diff --git a/include/linux/kho/abi/pci.h b/include/linux/kho/abi/pci.h
new file mode 100644
index 000000000000..6ebcf817fff4
--- /dev/null
+++ b/include/linux/kho/abi/pci.h
@@ -0,0 +1,61 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+/*
+ * Copyright (c) 2026, Google LLC.
+ * David Matlack <dmatlack@google.com>
+ */
+
+#ifndef _LINUX_KHO_ABI_PCI_H
+#define _LINUX_KHO_ABI_PCI_H
+
+#include <linux/bug.h>
+#include <linux/compiler.h>
+#include <linux/types.h>
+
+/**
+ * DOC: PCI File-Lifecycle Bound (FLB) Live Update ABI
+ *
+ * This header defines the ABI for preserving core PCI state across kexec using
+ * Live Update File-Lifecycle Bound (FLB) data.
+ *
+ * This interface is a contract. Any modification to any of the serialization
+ * structs defined here constitutes a breaking change. Such changes require
+ * incrementing the version number in the PCI_LUO_FLB_COMPATIBLE string.
+ */
+
+#define PCI_LUO_FLB_COMPATIBLE "pci-v1"
+
+/**
+ * struct pci_dev_ser - Serialized state about a single PCI device.
+ *
+ * @domain: The device's PCI domain number (segment).
+ * @bdf: The device's PCI bus, device, and function number.
+ * @padding: Padding to naturally align struct pci_dev_ser.
+ */
+struct pci_dev_ser {
+ u32 domain;
+ u16 bdf;
+ u16 padding;
+} __packed;
+
+/**
+ * struct pci_ser - PCI Subsystem Live Update State
+ *
+ * This struct tracks state about all devices that are being preserved across
+ * a Live Update for the next kernel.
+ *
+ * @max_nr_devices: The length of the devices[] flexible array.
+ * @nr_devices: The number of devices that were preserved.
+ * @devices: Flexible array of pci_dev_ser structs for each device.
+ */
+struct pci_ser {
+ u32 max_nr_devices;
+ u32 nr_devices;
+ struct pci_dev_ser devices[];
+} __packed;
+
+/* Ensure all elements of devices[] are naturally aligned. */
+static_assert(offsetof(struct pci_ser, devices) % sizeof(unsigned long) == 0);
+static_assert(sizeof(struct pci_dev_ser) % sizeof(unsigned long) == 0);
+
+#endif /* _LINUX_KHO_ABI_PCI_H */
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 2c4454583c11..8cadeeab86fd 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -42,6 +42,7 @@
#include <uapi/linux/pci.h>
#include <linux/pci_ids.h>
+#include <linux/pci_liveupdate.h>
#define PCI_STATUS_ERROR_BITS (PCI_STATUS_DETECTED_PARITY | \
PCI_STATUS_SIG_SYSTEM_ERROR | \
diff --git a/include/linux/pci_liveupdate.h b/include/linux/pci_liveupdate.h
new file mode 100644
index 000000000000..8ec98beefcb4
--- /dev/null
+++ b/include/linux/pci_liveupdate.h
@@ -0,0 +1,30 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * PCI Live Update support (Public/Driver API)
+ *
+ * Copyright (c) 2026, Google LLC.
+ * David Matlack <dmatlack@google.com>
+ */
+#ifndef LINUX_PCI_LIVEUPDATE_H
+#define LINUX_PCI_LIVEUPDATE_H
+
+#include <linux/liveupdate.h>
+#include <linux/types.h>
+
+struct pci_dev;
+
+#ifdef CONFIG_PCI_LIVEUPDATE
+int pci_liveupdate_register_flb(struct liveupdate_file_handler *fh);
+void pci_liveupdate_unregister_flb(struct liveupdate_file_handler *fh);
+#else
+static inline int pci_liveupdate_register_flb(struct liveupdate_file_handler *fh)
+{
+ return -EOPNOTSUPP;
+}
+
+static inline void pci_liveupdate_unregister_flb(struct liveupdate_file_handler *fh)
+{
+}
+#endif
+
+#endif /* LINUX_PCI_LIVEUPDATE_H */
--
2.54.0.563.g4f69b47b94-goog
^ permalink raw reply related