* Re: [PATCH v8 00/10] VMSCAPE optimization for BHI variant
From: Pawan Gupta @ 2026-03-30 16:11 UTC
To: Jon Kohler
Cc: x86@kernel.org, Nikolay Borisov, H. Peter Anvin, Josh Poimboeuf,
David Kaplan, Sean Christopherson, Borislav Petkov, Dave Hansen,
Peter Zijlstra, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, KP Singh, Jiri Olsa, David S. Miller,
David Laight, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
David Ahern, Martin KaFai Lau, Eduard Zingerman, Song Liu,
Yonghong Song, John Fastabend, Stanislav Fomichev, Hao Luo,
Paolo Bonzini, Jonathan Corbet, linux-kernel@vger.kernel.org,
kvm@vger.kernel.org, Asit Mallick, Tao Zhang, bpf@vger.kernel.org,
netdev@vger.kernel.org, linux-doc@vger.kernel.org
In-Reply-To: <3B7BF368-4A3A-4853-A7CD-6F17E7982546@nutanix.com>
On Mon, Mar 30, 2026 at 03:16:32AM +0000, Jon Kohler wrote:
> Tested the v7 of this series with 6.18.y and one of our performance
> suites, where we had previously bisected a significant regression to
> the enablement of the VMSCAPE mitigation. This particular suite looks
> at synthetic performance using KVM virtualized Windows guests.
>
> Long story short, this suite tries to derive what end user experience
> would be in these virtual machines while performing a standardized set
> of synthetic tasks on real apps.
>
> VMSCAPE hits especially hard when enabling Windows HVCI, which drives
> a much higher VMExit count, all else being equal.
>
> Tested on an Intel Xeon 6444Y (SPR)
>
> TLDR, we're really happy with the results. The following was with
> Intel MBEC *enabled*, so even with that speedup (and drastic reduction
> in VMExits), this optimization makes a significant difference.
>
> - CPU‑ready time drops ~70 % across all steady‑state and log‑on metrics
> with this series, indicating more efficient context switching, even
> though overall hypervisor CPU usage rises by ~14 % (steady) and ~12 % (max).
> Basically, we're getting more actual work done.
> - Read/write IOPS increase by ~18–37 % and 14–20 % respectively, while
> average IO latency remains largely unchanged or slightly lower in
> steady metrics.
> - Power consumption falls 5–11 % in every category
> - Login times improve by 4–6 % on average.
> - Application start‑up times are generally better (Word, Excel,
> PowerPoint, Outlook); notably, Outlook's maximum start‑up time drops
> 67 %, a clear win for end‑user experience.
These results are promising.
> Tested-By: Jon Kohler <jon@nutanix.com>
Thanks for testing, Jon.
* Re: [PATCH] doc tools: better handle KBUILD_VERBOSE
From: Jonathan Corbet @ 2026-03-30 16:03 UTC
To: Mauro Carvalho Chehab, Linux Doc Mailing List
Cc: Mauro Carvalho Chehab, linux-kernel, Jacob Keller,
Mauro Carvalho Chehab, Randy Dunlap, Shuah Khan
In-Reply-To: <7a99788db75630fb14828d612c0fd77c45ec1891.1774591065.git.mchehab+huawei@kernel.org>
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> writes:
> As reported by Jacob, there are problems when KBUILD_VERBOSE is
> set in the environment.
>
> Fix it in both kernel-doc and sphinx-build-wrapper.
>
> Reported-by: Jacob Keller <jacob.e.keller@intel.com>
> Closes: https://lore.kernel.org/linux-doc/9367d899-53af-4d9c-9320-22fc4dbadca5@intel.com/
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> ---
> tools/docs/sphinx-build-wrapper | 7 ++++++-
> tools/lib/python/kdoc/kdoc_files.py | 7 ++++++-
> 2 files changed, 12 insertions(+), 2 deletions(-)
Applied, thanks.
jon
* Re: [PATCH v10 2/2] hwmon: add support for MCP998X
From: Guenter Roeck @ 2026-03-30 16:00 UTC
To: Victor.Duicu
Cc: corbet, linux-hwmon, devicetree, robh, linux-kernel, krzk+dt,
linux-doc, conor+dt, Marius.Cristea
In-Reply-To: <2d3955f5b906018fd7670ed5b8d37eaffa0ec207.camel@microchip.com>
On 3/30/26 05:01, Victor.Duicu@microchip.com wrote:
> Hi Guenter,
>
> ...
>
>>> + }
>>> +
>>> + switch (type) {
>>> + case hwmon_temp:
>>> + switch (attr) {
>>> + case hwmon_temp_input:
>>> + /* Block reading from addresses 0x00->0x09 is not allowed. */
>>> + ret = regmap_read(priv->regmap, MCP9982_HIGH_BYTE_ADDR(channel), &reg_high);
>>> + if (ret)
>>> + return ret;
>>> +
>>> + ret = regmap_read(priv->regmap, MCP9982_HIGH_BYTE_ADDR(channel) + 1,
>>> + &reg_low);
>>> + if (ret)
>>> + return ret;
>>
>> Reading the 11-bit temperature value involves two separate 8-bit
>> register reads. If the chip updates the temperature between these two
>> reads, the resulting value may be torn. While some chips latch the low
>> byte upon reading the high byte, the driver does not explicitly rely on
>> or document this behavior, and it's safer to use regmap_bulk_read if
>> supported, or at least ensure the correct order and atomicity if
>> possible.
>>
>> Note: Maybe the low temperature is latched, but there is no indication
>> in the datasheet that this would be the case. Even if it is, the code
>> above is inefficient.
>
> The low temperature register is latched. Page 32 of the datasheet
> describes that when the high byte register is read, the value of the
> low byte register is copied into a 'shadow' register. This guarantees
> that when we read the low byte, it corresponds to the high byte.
>
> Regarding the bulk read, the chip has a number of design quirks, and
> because of them some commands are supported only on particular
> memory regions.
>
> According to page 26 of the documentation, the only memory areas that
> support SMBus block read are 80h->89h (temperature memory block) and
> 90h->97h (status memory block). To block read the temperatures, the
> targeted memory area has to be the temperature memory block. In this
> context the read operation uses the SMBus protocol, and the first value
> returned is the number of addresses that can be read (in our
> particular case a maximum of 10 bytes).
>
> In v8 of the driver
> (https://lore.kernel.org/all/20251120071248.3767-1-victor.duicu@microchip.com/),
> the temperature values were read with regmap_bulk_read(). In that
> version, regmap_bulk_read() was also used to read the temperature
> limits, without returning count (this is an undocumented feature of the
> chip, so we should assume it is not supported). To avoid relying on
> this behaviour and to avoid mixing the SMBus and I2C protocols, all
> block reads were removed.
>
> In the hopes of bypassing a long chain of replies, I tested the
> behaviour of the chip with different read instructions.
> regmap_bulk_read(), when applied to the temperature memory block
> (80h->89h), returns the count and the high and low bytes. When it is
> applied to the 00h->09h area, it uses I2C: it returns one temperature
> byte, but all other bytes are returned as 0xFF. The chip behaves as if
> it is at the last register location in the temperature block while the
> host continues to ACK (behaviour described on page 26).
> If we set use_single_read in regmap_config and apply regmap_bulk_read()
> to the 00h->09h register area, the high and low temperature bytes are
> read successfully without count.
>
> regmap_multi_reg_read() reads a number of registers one by one. When
> applied to the 00h->09h area, it uses I2C and returns only the high
> and low temperature bytes. When applied to the temperature memory block
> (80h->89h), because it is not a bulk function, it returns the count up
> to the end of the temperature memory block (the SMBus count).
>
> i2c_smbus_read_block_data(), when applied to the temperature block
> (80h-89h), returns the count, then the driver replies with a NACK and
> the communication is stopped. The board we are using to test the
> driver has an AT91 adapter that supports
> I2C_FUNC_SMBUS_READ_BLOCK_DATA. It seems that the AT91 I2C driver
> does not update the buffer length of the message, leaving it at 1.
>
> i2c_smbus_read_i2c_block_data(), when applied to the temperature block
> (80h-89h), returns the count and the temperature values.
>
> If you think block reading the temperatures is worth introducing
> (even if we need to skip the count), I can add it, but we should
> agree on which function to use.
> Please let me know your thoughts.
>
It is your chip, so I'll let you decide. Please include all the above
as comments in the code.
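Since Guenter asks for the above to be captured as comments, here is a minimal sketch of how the read order and byte combination might be documented. It assumes the default (non-extended) range, with the high byte holding whole degrees Celsius and bits 7..5 of the low byte holding 0.125 °C steps; that layout and the helper name mcp998x_raw_to_mcelsius() are assumptions for illustration, not taken from the driver or verified against the datasheet.

```c
#include <stdint.h>

/*
 * Hypothetical helper: combine the two temperature bytes into
 * millidegrees Celsius.
 *
 * Read order matters: reading the high byte copies the current low
 * byte into a shadow register, so the low byte must be read second
 * to get a value that matches the high byte.
 *
 * Assumed layout (default range, not verified against the datasheet):
 *   high byte: integer degrees Celsius
 *   low byte:  bits 7..5 = fraction in 0.125 degC steps, bits 4..0 unused
 */
static long mcp998x_raw_to_mcelsius(uint8_t high, uint8_t low)
{
	/* 11-bit value in units of 0.125 degC */
	unsigned int raw = ((unsigned int)high << 3) | (low >> 5);

	/* 0.125 degC == 125 millidegrees */
	return (long)raw * 125;
}
```

For example, a high byte of 0x19 and a low byte of 0x20 give 25125, i.e. 25.125 °C.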
Thanks,
Guenter
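If a block read is adopted anyway, the count byte Victor describes has to be validated before the data bytes are trusted. Below is a hedged sketch of that check. It assumes the data following the count is laid out as high/low pairs, one pair per channel. temp_block_get() and TEMP_BLOCK_MAX_LEN are hypothetical names, and the 10-byte limit comes from the 80h->89h range described above.

```c
#include <stddef.h>
#include <stdint.h>

#define TEMP_BLOCK_MAX_LEN 10	/* 80h..89h, per the thread */

/*
 * Hypothetical parser for an SMBus block read of the temperature block.
 * buf[0] is the byte count returned by the chip; the data bytes follow.
 * Assumption: data is laid out as high/low byte pairs, one pair per
 * channel, starting with channel 0.
 *
 * Returns 0 and fills *high / *low on success, or -1 if the count is
 * invalid or does not cover the requested channel.
 */
static int temp_block_get(const uint8_t *buf, size_t buf_len,
			  unsigned int channel, uint8_t *high, uint8_t *low)
{
	size_t count, off;

	if (buf_len < 1)
		return -1;

	count = buf[0];
	/* Reject counts larger than the block or than what was received */
	if (count > TEMP_BLOCK_MAX_LEN || count + 1 > buf_len)
		return -1;

	off = 2 * (size_t)channel;	/* offset into the data bytes */
	if (off + 1 >= count)
		return -1;

	*high = buf[1 + off];
	*low = buf[2 + off];
	return 0;
}
```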
* Re: [PATCH] docs: generate a static 404 page
From: Jonathan Corbet @ 2026-03-30 15:53 UTC
To: Rito Rhymes, skhan, mchehab; +Cc: linux-doc, linux-kernel, Rito Rhymes
In-Reply-To: <20260329180448.24614-1-rito@ritovision.com>
Rito Rhymes <rito@ritovision.com> writes:
> Broken links in static deployments currently fall back to a
> generic web server 404 page, which leaves users on an orphaned
> error page with no direct way to continue navigating the
> documentation site. Add a dedicated not-found page so deployments
> can serve a project-specific 404 instead.
>
> It keeps the normal documentation layout around the error state so
> users still have the search box, table of contents, footer links,
> and a clear route back to the documentation root. The penguin
> logo makes it less generic and adds character to what is
> otherwise a frustrating page to encounter.
>
> For translated documentation, generate 404 pages whose return
> link keeps users inside the current translation instead of always
> sending them back to the English root documentation.
>
> Actual 404 handling remains a web server concern.
>
> Signed-off-by: Rito Rhymes <rito@ritovision.com>
> Assisted-by: Codex:GPT-5.4
I don't think that this makes a lot of sense.
Who are your users, what is your use case? Who do you think will do all
of the setup work to create a server with a custom 404 page, but can't
supply the page itself?
This is the kernel documentation, not a web-site construction kit.
Please slow down and think about solving real problems. There is So
Much Work that needs to be done with the kernel documentation, but none
of it has to do with this stuff. And, in any case, the merge window is
approaching, so significant changes will not be accepted at this point
even if they otherwise make sense.
Thanks,
jon
* Re: [PATCH v8 2/3] hwmon: ltc4283: Add support for the LTC4283 Swap Controller
From: Guenter Roeck @ 2026-03-30 15:47 UTC
To: Nuno Sá, Nuno Sá
Cc: linux-gpio, linux-hwmon, devicetree, linux-doc, Rob Herring,
Krzysztof Kozlowski, Conor Dooley, Jonathan Corbet, Linus Walleij,
Bartosz Golaszewski
In-Reply-To: <aco5L_6SZIB2DdpF@nsa>
On 3/30/26 02:28, Nuno Sá wrote:
> Hi Guenter, Regarding the AI review, I think most of the points were
> discussed in previous revisions, but two are valid.
>
> On Fri, Mar 27, 2026 at 05:26:15PM +0000, Nuno Sá wrote:
>> Support the LTC4283 Hot Swap Controller. The device features programmable
>> current limit with foldback and independently adjustable inrush current to
>> optimize the MOSFET safe operating area (SOA). The SOA timer limits MOSFET
>> temperature rise for reliable protection against overstresses.
>>
>> An I2C interface and onboard ADC allow monitoring of board current,
>> voltage, power, energy, and fault status.
>>
>> Signed-off-by: Nuno Sá <nuno.sa@analog.com>
>> ---
>> Documentation/hwmon/index.rst | 1 +
>> Documentation/hwmon/ltc4283.rst | 266 ++++++
>> MAINTAINERS | 1 +
>> drivers/hwmon/Kconfig | 12 +
>> drivers/hwmon/Makefile | 1 +
>> drivers/hwmon/ltc4283.c | 1796 +++++++++++++++++++++++++++++++++++++++
>> 6 files changed, 2077 insertions(+)
>>
>
> ...
>
>> +static int ltc4283_read_in_alarm(struct ltc4283_hwmon *st, u32 channel,
>> + bool max_alm, long *val)
>> +{
>> + if (channel == LTC4283_VPWR)
>> + return ltc4283_read_alarm(st, LTC4283_ADC_ALM_LOG_1,
>> + BIT(2 + max_alm), val);
>> +
>> + if (channel >= LTC4283_CHAN_ADI_1 && channel <= LTC4283_CHAN_ADI_4) {
>> + u32 bit = (channel - LTC4283_CHAN_ADI_1) * 2;
>> + /*
>> + * Lower channels go to higher bits. We also want to go +1 down
>> + * in the min_alarm case.
>> + */
>> + return ltc4283_read_alarm(st, LTC4283_ADC_ALM_LOG_2,
>> + BIT(7 - bit - !max_alm), val);
>> + }
>> +
>> + if (channel >= LTC4283_CHAN_ADIO_1 && channel <= LTC4283_CHAN_ADIO_4) {
>> + u32 bit = (channel - LTC4283_CHAN_ADIO_1) * 2;
>> +
>> + return ltc4283_read_alarm(st, LTC4283_ADC_ALM_LOG_3,
>> + BIT(7 - bit - !max_alm), val);
>> + }
>> +
>> + if (channel >= LTC4283_CHAN_ADIN12 && channel <= LTC4283_CHAN_ADIN34) {
>> + u32 bit = (channel - LTC4283_CHAN_ADIN12) * 2;
>> +
>> + return ltc4283_read_alarm(st, LTC4283_ADC_ALM_LOG_5,
>> + BIT(7 - bit - !max_alm), val);
>> + }
>
> "Will this condition handle the ADIO12 and ADIO34 differential channels?
> It looks like channels 14 and 15 fall through to the default return intended
> for the DRAIN channel. Since reading the alarm implicitly clears the register
> bits, could reading these ADIO alarms unintentionally clear actual DRAIN
> alarms? Should the upper bound be LTC4283_CHAN_ADIO34?"
>
> Good catch and should be:
>
> - if (channel >= LTC4283_CHAN_ADIN12 && channel <= LTC4283_CHAN_ADIN34) {
> + if (channel >= LTC4283_CHAN_ADIN12 && channel <= LTC4283_CHAN_ADIO34) {
>
>> +
>> + if (channel == LTC4283_CHAN_DRNS)
>> + return ltc4283_read_alarm(st, LTC4283_ADC_ALM_LOG_4,
>> + BIT(6 + max_alm), val);
>> +
>> + return ltc4283_read_alarm(st, LTC4283_ADC_ALM_LOG_4, BIT(4 + max_alm),
>> + val);
>> +}
>
> ...
>
>> +
>> +static int ltc4283_probe(struct i2c_client *client)
>> +{
>> + struct device *dev = &client->dev, *hwmon;
>> + struct auxiliary_device *adev;
>> + struct ltc4283_hwmon *st;
>> + int ret;
>> +
>> + st = devm_kzalloc(dev, sizeof(*st), GFP_KERNEL);
>> + if (!st)
>> + return -ENOMEM;
>> +
>> + if (!i2c_check_functionality(client->adapter,
>> + I2C_FUNC_SMBUS_BYTE_DATA |
>> + I2C_FUNC_SMBUS_WORD_DATA |
>> + I2C_FUNC_SMBUS_READ_I2C_BLOCK))
>> + return -EOPNOTSUPP;
>> +
>> + st->client = client;
>> + st->map = devm_regmap_init(dev, &ltc4283_regmap_bus, client,
>> + &ltc4283_regmap_config);
>> + if (IS_ERR(st->map))
>> + return dev_err_probe(dev, PTR_ERR(st->map),
>> + "Failed to create regmap\n");
>> +
>> + ret = ltc4283_setup(st, dev);
>> + if (ret)
>> + return ret;
>> +
>> + hwmon = devm_hwmon_device_register_with_info(dev, "ltc4283", st,
>> + &ltc4283_chip_info, NULL);
>> +
>> + if (IS_ERR(hwmon))
>> + return PTR_ERR(hwmon);
>> +
>> + ltc4283_debugfs_init(st, client);
>> +
>> + if (!st->gpio_mask)
>> + return 0;
>> +
>> + adev = devm_auxiliary_device_create(dev, "gpio", &st->gpio_mask);
>> + if (!adev)
>> + return dev_err_probe(dev, -ENODEV, "Failed to add GPIO device\n");
>
> "Does this allow multiple LTC4283 chips to probe successfully?
> Without allocating a unique ID per I2C instance, it seems the first probed
> chip takes the generic name. If a second chip is present, it might attempt
> to register with the exact same name, resulting in a failure in device_add()
> and aborting the probe."
>
> This also looks valid, and I suspect it is one of those issues where a
> quick look will find more "offenders". I would propose:
>
> - adev = devm_auxiliary_device_create(dev, "gpio", &st->gpio_mask);
> + adev = __devm_auxiliary_device_create(dev, KBUILD_MODNAME, "gpio",
> + &st->gpio_mask, client->addr);
>
That would still fail if there are multiple chips at the same I2C address
on multiple I2C busses. Check drivers/gpu/drm/bridge/ti-sn65dsi86.c which has
the same problem.
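Guenter's point is that client->addr alone is only unique per bus. One hedged way to build an ID that is unique across buses is to fold the adapter number into the value alongside the address. I2C addresses are at most 10 bits wide, so shifting the adapter number left by 10 keeps the two fields from overlapping. The helper name is hypothetical, and this is not necessarily how ti-sn65dsi86.c resolves the issue.

```c
#include <stdint.h>

/*
 * Hypothetical helper: derive an auxiliary device ID that is unique
 * across I2C buses. The low 10 bits hold the client address (7-bit and
 * 10-bit addressing both fit), and the upper bits hold the adapter
 * number. In a driver this would be called with client->adapter->nr
 * and client->addr.
 */
static uint32_t i2c_unique_aux_id(int adapter_nr, uint16_t addr)
{
	return ((uint32_t)adapter_nr << 10) | (addr & 0x3ff);
}
```

In the probe path, this value would take the place of client->addr as the last argument to __devm_auxiliary_device_create() in Nuno's proposed change.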
> If there's nothing else and you agree with the above, is this something
> you can tweak while applying or should I spin a new version?
>
Please respin. Also, regarding the other concerns:
Can BIT(8) * st->rsense wrap to zero on 32-bit architectures?
BIT(8) is a 32-bit unsigned long and st->rsense is a u32. If a user sets a
very large sense resistor value via the device tree, the multiplication could
wrap to 0, causing a division-by-zero kernel panic. Should the divisor use
BIT_ULL(8)?
Unless I am missing something, this _can_ overflow. Try to provide a sense
resistor value of 1677721600. Yes, it is unreasonable to specify such large
rsense values, but why not just limit it so that it does not overflow?
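The arithmetic behind the reproducer is 256 * 1677721600 = 429496729600 = 100 * 2^32, so the 32-bit product is exactly 0. The minimal userspace sketch below shows the wrap, the 64-bit alternative, and the range check Guenter suggests. The function names are hypothetical, and the bound is UINT32_MAX / 256 = 16777215.

```c
#include <stdbool.h>
#include <stdint.h>

/* 32-bit product, as with BIT(8) * rsense on 32-bit architectures */
static uint32_t divisor_u32(uint32_t rsense)
{
	return (uint32_t)256u * rsense;	/* computed modulo 2^32, may wrap to 0 */
}

/* 64-bit product, equivalent to using BIT_ULL(8) */
static uint64_t divisor_u64(uint32_t rsense)
{
	return (uint64_t)256u * rsense;
}

/*
 * Hypothetical validation for the device tree value: reject anything
 * for which the 32-bit product would overflow (maximum 16777215).
 */
static bool rsense_fits_u32(uint32_t rsense)
{
	return rsense <= UINT32_MAX / 256u;
}
```

Rejecting out-of-range values at probe time keeps the 32-bit math valid. Switching to BIT_ULL(8) avoids the zero divisor but would still need the 64-bit result handled wherever it is used.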
Also, for the overflow concerns, if you are sure they can not happen, I'll
really need to write the unit test code to make sure that this is indeed
the case.
Thanks,
Guenter
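The bit layout implied by the corrected range check in ltc4283_read_in_alarm() can be sketched as a pure function. The channel numbers (12..15 for ADIN12, ADIN34, ADIO12, ADIO34) are an assumption inferred from the review comment that ADIO12 and ADIO34 are channels 14 and 15. The enum and function name are hypothetical. Each differential channel uses two bits of LTC4283_ADC_ALM_LOG_5: lower channels map to higher bits, and the max alarm takes the higher bit of each pair, mirroring BIT(7 - bit - !max_alm) in the patch.

```c
#include <stdbool.h>

/* Hypothetical channel numbering, consecutive as the fix requires */
enum {
	CHAN_ADIN12 = 12,
	CHAN_ADIN34 = 13,
	CHAN_ADIO12 = 14,
	CHAN_ADIO34 = 15,
};

/*
 * Returns the ALM_LOG_5 bit mask for a differential channel, or 0 if
 * the channel is not one of them (the real code would then fall
 * through to the DRNS/DRAIN handling).
 */
static unsigned int diff_alarm_mask(unsigned int channel, bool max_alm)
{
	unsigned int bit;

	/* Corrected upper bound: ADIO34, not ADIN34 */
	if (channel < CHAN_ADIN12 || channel > CHAN_ADIO34)
		return 0;

	bit = (channel - CHAN_ADIN12) * 2;
	return 1u << (7 - bit - !max_alm);
}
```

With the original ADIN34 bound, channels 14 and 15 would return 0 here, which corresponds to the fall-through into the DRAIN alarm read flagged in the review.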
* Re: [PATCH] docs: add copy buttons for code blocks
From: Jonathan Corbet @ 2026-03-30 15:40 UTC
To: Rito Rhymes, skhan; +Cc: linux-doc, linux-kernel, Rito Rhymes
In-Reply-To: <20260329214816.10553-1-rito@ritovision.com>
Rito Rhymes <rito@ritovision.com> writes:
> Add a copy button to highlighted code blocks in the documentation
> that copies the full contents of the code block to the clipboard.
>
> This is faster and less error-prone than manually selecting and
> copying code from the page, especially for longer examples where
> part of the block can be accidentally missed.
>
> Keep the control hidden until the user interacts with the block so
> it stays out of the way during normal reading. Reveal it on hover,
> focus, and touch interaction, then copy the block contents to the
> clipboard with a small success or failure state.
>
> Signed-off-by: Rito Rhymes <rito@ritovision.com>
> Assisted-by: Codex:GPT-5.4
> Assisted-by: Claude Opus 4.6
> ---
> Live demo:
> https://kernel-docs-cp.ritovision.com/accounting/delay-accounting.html
Honestly, I don't think so.
Rito, who is asking for this feature? What is the use case? Does it
really justify adding a blob of JavaScript code to every view of the
kernel documentation - JavaScript that we will have to maintain going
forward?
Our goal here is to make the kernel documentation better, not to shovel
lots of code into the repository.
If you can get some acks from established kernel developers saying that
they want this change, I will reconsider - but only after the merge
window. But I really think that, again, this is something you might
want to discuss with the Sphinx developers, then turn it into something
without hard-coded colors that will work with whatever theme people
might choose to build their docs with.
jon
* Re: [PATCH v2 5/5] docs: pt_BR: complete PGP guide translation
From: Konstantin Ryabitsev @ 2026-03-30 15:17 UTC
To: Daniel Pereira; +Cc: Jonathan Corbet, linux-doc
In-Reply-To: <20260329165041.831369-6-danielmaraboo@gmail.com>
On Sun, 29 Mar 2026 13:50:39 -0300, Daniel Pereira <danielmaraboo@gmail.com> wrote:
> diff --git a/Documentation/translations/pt_BR/process/maintainer-pgp-guide.rst b/Documentation/translations/pt_BR/process/maintainer-pgp-guide.rst
> index 3501756fda52..fd19fd4c9eda 100644
> --- a/Documentation/translations/pt_BR/process/maintainer-pgp-guide.rst
> +++ b/Documentation/translations/pt_BR/process/maintainer-pgp-guide.rst
> @@ -778,3 +778,136 @@ criptográfica nos cabeçalhos das mensagens (estilo DKIM):
> [ ... skip 46 lines ... ]
> + O patatt e o b4 ainda estão em desenvolvimento ativo e você deve consultar
> + a documentação mais recente desses projetos para quaisquer recursos novos
> + ou atualizados.
> +
> +Como verificar identidades de desenvolvedores do kernel
> +=======================================================
Needs the _kernel_identities_pt: label here as well.
> [ ... skip 22 lines ... ]
> +(Web Key Directory) é o método alternativo que usa consultas https para o mesmo
> +propósito. Ao usar DANE ou WKD para buscar chaves públicas, o GnuPG validará o
> +DNSSEC ou os certificados TLS, respectivamente, antes de adicionar as chaves
> +públicas recuperadas automaticamente ao seu chaveiro local.
> +
> +O Kernel.org publica o WKD para todos os desenvolvedores que possuem contas
I think just "O kernel.org" without capital K.
> [ ... skip 50 lines ... ]
> +- `Chaveiro PGP de desenvolvedores do Kernel (pt)`_
> +
> +Se você é um desenvolvedor do kernel, considere enviar sua chave para inclusão
> +nesse chaveiro.
> +
> +.. _`Chaveiro PGP de desenvolvedores do Kernel (pt)`: https://korg.docs.kernel.org/pgpkeys.html
I suggest you switch all (pt) disambiguated links to just anonymous
hyperlinks.
--
KR
* Re: [PATCH v2 4/5] docs: pt_BR: continue PGP guide: Git and maintenance
From: Konstantin Ryabitsev @ 2026-03-30 15:17 UTC
To: Daniel Pereira; +Cc: Jonathan Corbet, linux-doc
In-Reply-To: <20260329165041.831369-5-danielmaraboo@gmail.com>
On Sun, 29 Mar 2026 13:50:38 -0300, Daniel Pereira <danielmaraboo@gmail.com> wrote:
> diff --git a/Documentation/translations/pt_BR/process/maintainer-pgp-guide.rst b/Documentation/translations/pt_BR/process/maintainer-pgp-guide.rst
> index f7b31201499a..3501756fda52 100644
> --- a/Documentation/translations/pt_BR/process/maintainer-pgp-guide.rst
> +++ b/Documentation/translations/pt_BR/process/maintainer-pgp-guide.rst
> @@ -489,3 +489,292 @@ maioria das operações::
> Usar ``--edit-key`` nos coloca no modo de menu novamente, e você notará que a
> listagem das chaves é um pouco diferente. De aqui em diante, todos os comandos
> são feitos de dentro deste modo de menu, conforme indicado por ``gpg>``.
> +
> +Primeiro, vamos selecionar a chave que colocaremos no cartão -- você faz isso
> +digitando ``key 1`` (é a primeira na listagem, a subchave **[E]**)::
> +
> + gpg> key 1
> +
> +Na saída, você deverá ver agora ``ssb*`` na chave **[E]**. O ``*`` indica qual
> +chave está atualmente "selecionada". Ele funciona como uma *alternância*
> +(toggle), o que significa que se você digitar ``key 1`` novamente, o ``*``
If we have to add (toggle) here, I don't think we're properly explaining
what this does. The idea I had was to compare this to a physical switch
that one can toggle from one position to another, so maybe a better
translation here would be to use something like "Ele funciona como um
interruptor" or "botão liga/desliga"? Up to you, though.
> +desaparecerá e a chave não estará mais selecionada.
> +
> +Agora, vamos mover essa chave para o smartcard::
> +
> + gpg> keytocard
> + Please select where to store the key:
> + (2) Encryption key
> + Your selection? 2
> +
> +Como é a nossa chave **[E]**, faz sentido colocá-la no slot de Criptografia
> +(Encryption). Quando você enviar sua seleção, será solicitada primeiro a frase
"Quando você confirmar" would be better? (Oh, and Portuguese always
trips me up with its future subjunctive looking identical to its
indefinite -- my French speaking part of the brain is so confused).
> [ ... skip 126 lines ... ]
> +servidor sshd na extremidade remota.
> +
> +.. _`Encaminhamento de Agent sobre SSH (pt)`: https://wiki.gnupg.org/AgentForwarding
> +
> +Usando PGP com Git
> +==================
Same thing here, needs a .. _pgp_with_git_pt: label.
> +
> +Uma das principais características do Git é sua natureza descentralizada --
> +uma vez que um repositório é clonado em seu sistema, você tem o histórico
> +completo do projeto, incluindo todas as suas tags, commits e branches. No
> +entanto, com centenas de repositórios clonados por aí, como alguém verifica
> +se sua cópia do linux.git não foi adulterada por um terceiro mal-intencionado?
> +
> +Ou o que acontece se um código malicioso for descoberto no kernel e a linha
> +"Author" no commit disser que foi feito por você, enquanto você tem certeza
> +de que `não teve relação com isso (pt)`_?
Same here, just use an anonymous inline hyperlink to avoid having to
add (pt).
> [ ... skip 32 lines ... ]
> +
> +Para verificar uma tag assinada, use o comando ``verify-tag``::
> +
> + $ git verify-tag [tagname]
> +
> +Se você estiver baixando (pulling) uma tag de outro fork do repositório do
I would just use: Se você fizer um git pull
This avoids having to add (pulling) in parens to explain "baixando".
> [ ... skip 83 lines ... ]
> +kernel.org criou para este fim, que coloca assinaturas de atestação
> +criptográfica nos cabeçalhos das mensagens (estilo DKIM):
> +
> +- `Atestação de Patch Patatt (pt)`_
> +
> +.. _`Atestação de Patch Patatt (pt)`: https://pypi.org/project/patatt/
Same here -- just use an anonymous inline hyperlink.
--
KR
* Re: [PATCH v2 3/5] docs: pt_BR: continue PGP guide translation
From: Konstantin Ryabitsev @ 2026-03-30 15:17 UTC
To: Daniel Pereira; +Cc: Jonathan Corbet, linux-doc
In-Reply-To: <20260329165041.831369-4-danielmaraboo@gmail.com>
On Sun, 29 Mar 2026 13:50:37 -0300, Daniel Pereira <danielmaraboo@gmail.com> wrote:
> diff --git a/Documentation/translations/pt_BR/process/maintainer-pgp-guide.rst b/Documentation/translations/pt_BR/process/maintainer-pgp-guide.rst
> index 93f0759e94b2..f7b31201499a 100644
> --- a/Documentation/translations/pt_BR/process/maintainer-pgp-guide.rst
> +++ b/Documentation/translations/pt_BR/process/maintainer-pgp-guide.rst
> @@ -200,3 +200,292 @@ offline; portanto, se você tiver apenas uma chave **[SC]** combinada, você dev
> [ ... skip 23 lines ... ]
> +está criptografada com essa frase secreta e, se você algum dia alterá-la, você
> +não se lembrará de qual era quando criou o backup -- *garantido*.
> +
> +Coloque a cópia impressa resultante e a frase secreta escrita à mão em um
> +envelope e guarde-os em um local seguro e bem protegido, de preferência longe
> +de sua casa, como o cofre de um banco.
I'm okay with you changing it if the recommendation to store the backup in a
bank deposit box isn't really useful outside North America.
> [ ... skip 56 lines ... ]
> + de PDF, etc.)
> +- por meio de coação ao cruzar fronteiras internacionais
> +
> +Proteger sua chave com uma boa frase secreta ajuda muito a reduzir o risco
> +de qualquer um dos itens acima, mas as frases secretas podem ser descobertas
> +por meio de keyloggers, shoulder-surfing (observação direta) ou qualquer número
I think "observação clandestina" would work better than "direta" here.
> [ ... skip 63 lines ... ]
> +
> +Assim que concluir isso, certifique-se de excluir o arquivo ``secring.gpg``
> +obsoleto, que ainda contém suas chaves privadas.
> +
> +Mova as subchaves para um dispositivo criptográfico dedicado
> +============================================================
I think this needs a _smartcards_pt: label before this section. The
commit message says this label exists, but it is not present in the
patch.
> [ ... skip 30 lines ... ]
> +
> +A menos que todos os seus laptops e estações de trabalho tenham leitores de
> +smartcard, o mais fácil é obter um dispositivo USB especializado que implemente
> +a funcionalidade de smartcard. Existem várias opções disponíveis:
> +
> +- `Nitrokey Start (pt)`_: Hardware aberto e Software Livre, baseado no `Gnuk_pt`_ da FSI
I think (pt) here would be confusing to readers, because this implies
that if they follow the link, the site will be in Portuguese. There's
also an inconsistency with `Gnuk_pt` here. I think a better strategy is
to use inline anonymous hyperlinks like:
- `Nitrokey Start <https://www.nitrokey.com/products/nitrokeys>`__:
Hardware aberto e ...
This should avoid clashing with the English version and not create
confusion for readers.
> [ ... skip 22 lines ... ]
> +.. _`se qualifica para um Nitrokey Start gratuito`: https://www.kernel.org/nitrokey-digital-tokens-for-kernel-developers.html
> +
> +Configure seu dispositivo smartcard
> +-----------------------------------
> +
> +Seu dispositivo smartcard deve simplesmente funcionar (Just Work - TM) no
Does the "Just Work TM" joke still make sense in the translation? :)
--
KR
* Re: [PATCH v2 2/5] docs: pt_BR: start translation of the PGP maintainer guide
From: Konstantin Ryabitsev @ 2026-03-30 15:17 UTC
To: Daniel Pereira; +Cc: Jonathan Corbet, linux-doc
In-Reply-To: <20260329165041.831369-3-danielmaraboo@gmail.com>
On Sun, 29 Mar 2026 13:50:36 -0300, Daniel Pereira <danielmaraboo@gmail.com> wrote:
> diff --git a/Documentation/translations/pt_BR/process/maintainer-pgp-guide.rst b/Documentation/translations/pt_BR/process/maintainer-pgp-guide.rst
> new file mode 100644
> index 000000000000..93f0759e94b2
> --- /dev/null
> +++ b/Documentation/translations/pt_BR/process/maintainer-pgp-guide.rst
> @@ -0,0 +1,202 @@
> [ ... skip 26 lines ... ]
> +- Repositórios de fontes distribuídos (git)
> +- Arquivos tarballs por release
> +
> +Tanto os repositórios git quanto os tarballs carregam assinaturas PGP dos
> +desenvolvedores do kernel que criam os lançamentos oficiais. Essas assinaturas
> +oferecem uma garantia criptográfica de que as versões para download
The "para download" jumped out at me. Like I said, I'm not very well
versed in pt_BR, but shouldn't this be "baixar"? Looking at other docs
in the pt_BR translation, I think they use "baixar/baixando" there. I do
realize that technical jargon is normal, though.
> [ ... skip 80 lines ... ]
> +
> +Você também deve criar uma nova chave se a sua atual for inferior a 2048
> +bits (RSA).
> +
> +Você também deve criar uma nova chave se a sua atual for inferior a 2048
> +bits (RSA).
Looks like this got duplicated.
--
KR
* Re: [PATCH v2 1/5] docs: add maintainer-kvm-x86 to maintainer-handbooks index
From: Konstantin Ryabitsev @ 2026-03-30 15:17 UTC (permalink / raw)
To: Daniel Pereira; +Cc: Jonathan Corbet, linux-doc
In-Reply-To: <20260329165041.831369-2-danielmaraboo@gmail.com>
On Sun, 29 Mar 2026 13:50:35 -0300, Daniel Pereira <danielmaraboo@gmail.com> wrote:
> Include the KVM x86 subsystem development process notes in the main
> documentation tree. This ensures the new maintainer guide is properly
> indexed and reachable.
This doesn't really belong in this series, since the rest of the patches
deal exclusively with the PGP maintainer's guide. It should be a
separate patch.
--
KR
* Re: [PATCH v2 0/5] docs: pt_BR: Complete PGP maintainer guide translation
From: Konstantin Ryabitsev @ 2026-03-30 15:17 UTC
To: Daniel Pereira; +Cc: Jonathan Corbet, linux-doc
In-Reply-To: <20260329165041.831369-1-danielmaraboo@gmail.com>
On Sun, 29 Mar 2026 13:50:34 -0300, Daniel Pereira <danielmaraboo@gmail.com> wrote:
> This series provides the complete Brazilian Portuguese translation for
> the Kernel Maintainer PGP guide. The translation was divided into
> subsequent patches to facilitate review, covering PGP basics, hardware
> tokens (smartcards), Git integration, and identity verification.
Thank you for providing this translation. My Portuguese is not great --
I mostly nerded out one year after attending the Kernel Summit in Lisbon
to get myself to a decent level of reading fluency, but haven't kept it
up in a while. This translation was a fun refresher, so thank you for
the opportunity. Please take all my comments with a large grain of salt
as coming from someone for whom Portuguese is a 5th or 6th foreign
language.
>
> All internal cross-references were updated to ensure a clean Sphinx
> build, and terminology aligns with the existing pt_BR documentation.
>
> Changes in v2:
> - Fixed translation of "Periodic release snapshots" to "Arquivos
> tarballs por release" as suggested by Mauro Carvalho Chehab.
> - Corrected a double-hyphen formatting error in the first translation
> patch.
> - Added missing Signed-off-by and fixed line wrapping in the
> KVM index patch (1/5).
> - Rebased onto the latest docs-next branch.
>
> Daniel Pereira (5):
> docs: add maintainer-kvm-x86 to maintainer-handbooks index
I don't think this belongs in this series.
> docs: pt_BR: start translation of the PGP maintainer guide
> docs: pt_BR: continue PGP guide translation
> docs: pt_BR: continue PGP guide: Git and maintenance
> docs: pt_BR: complete PGP guide translation
I don't think the translation needs to be split into 4 patches. Just
submit it as a single translated document.
--
KR
* Re: [PATCH v2] bootconfig: Apply early options from embedded config
From: Breno Leitao @ 2026-03-30 15:04 UTC
To: Masami Hiramatsu
Cc: Jonathan Corbet, Shuah Khan, linux-kernel, linux-trace-kernel,
linux-doc, oss, paulmck, rostedt, kernel-team
In-Reply-To: <acpzhCBEPh-tKVqg@gmail.com>
On Mon, Mar 30, 2026 at 06:15:17AM -0700, Breno Leitao wrote:
> On Fri, Mar 27, 2026 at 10:37:44PM +0900, Masami Hiramatsu wrote:
> > On Fri, 27 Mar 2026 03:06:41 -0700
> > Breno Leitao <leitao@debian.org> wrote:
>
> > > > To fix this, we need to change setup_arch() for each architecture so
> > > > that it calls this bootconfig_apply_early_params().
> > >
> > > Could we instead integrate this into parse_early_param() itself? That
> > > approach would avoid the need to modify each architecture individually.
> >
> > Ah, indeed.
>
> I investigated integrating bootconfig into parse_early_param() and hit a
> blocker: xbc_init() and xbc_make_cmdline() depend on memblock_alloc(), but on
> most architectures (x86, arm64, arm, s390, riscv) parse_early_param() is called
> from setup_arch() _before_ memblock is initialized.
That said, I'd like to propose a simpler approach as a first step:
1) Keep calling bootconfig_apply_early_params() from setup_boot_config().
This is the least intrusive approach and expands bootconfig support to
additional early boot parameters.
2) Document that architecture-specific early parameters might be ignored.
If a parameter is consumed early enough (during setup_arch()), it will
not see the bootconfig value.
3) Ensure that early bootconfig parameters don't overwrite the boot command
line. For example, if the boot command line has foo=bar and bootconfig
later has foo=baz, the command line value should take precedence.
This prevents early boot code (in setup_arch()) from seeing a parameter
value that will be changed later.
If that is OK, this is what I have right now:
commit dd6e00e41c381e5fef9d22dda02b104aa8f83101
Author: Breno Leitao <leitao@debian.org>
Date: Mon Mar 30 06:50:28 2026 -0700
bootconfig: Apply early options from embedded config
Bootconfig currently cannot apply early kernel parameters. For example,
the "mitigations=" parameter must be passed through traditional boot
methods because bootconfig parsing happens after these early parameters
need to be processed.
Add bootconfig_apply_early_params() which walks all kernel.* keys in the
parsed XBC tree and calls do_early_param() for each one. It is called
from setup_boot_config() immediately after a successful xbc_init() on
the embedded data, which happens before parse_early_param() runs in
start_kernel().
This allows early options such as:
kernel.mitigations = off
to be placed in the embedded bootconfig and take effect, without
requiring them on the kernel command line.
If the same parameter appears on both the kernel command line and in
the embedded bootconfig, the command-line value takes precedence:
bootconfig_apply_early_params() checks boot_command_line and skips
any parameter already present there.
Known limitations are documented:
- Early options in initrd bootconfig are still silently ignored, as the
initrd is only available after the early param window has closed.
- Arch-specific early params consumed during setup_arch() (e.g. mem=,
earlycon, noapic) may not take effect from bootconfig.
Signed-off-by: Breno Leitao <leitao@debian.org>
diff --git a/Documentation/admin-guide/bootconfig.rst b/Documentation/admin-guide/bootconfig.rst
index f712758472d5c..6ed852a0c66d8 100644
--- a/Documentation/admin-guide/bootconfig.rst
+++ b/Documentation/admin-guide/bootconfig.rst
@@ -169,6 +169,15 @@ Boot Kernel With a Boot Config
There are two options to boot the kernel with bootconfig: attaching the
bootconfig to the initrd image or embedding it in the kernel itself.
+Early options (those registered with ``early_param()``) may only be
+specified in the embedded bootconfig, because the initrd is not yet
+available when early parameters are processed.
+
+Note that embedded bootconfig is parsed after ``setup_arch()``, so
+early options that are consumed during architecture initialization
+(e.g., ``mem=``, ``memmap=``, ``earlycon``, ``noapic``, ``nolapic``,
+``acpi=``, ``numa=``, ``iommu=``) may not take effect from bootconfig.
+
Attaching a Boot Config to Initrd
---------------------------------
diff --git a/init/Kconfig b/init/Kconfig
index 7484cd703bc1a..34adcc1feb9b6 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1525,6 +1525,16 @@ config BOOT_CONFIG_EMBED
image. But if the system doesn't support initrd, this option will
help you by embedding a bootconfig file while building the kernel.
+ Unlike bootconfig attached to initrd, the embedded bootconfig also
+ supports early options (those registered with early_param()). Any
+ kernel.* key in the embedded bootconfig is applied before
+ parse_early_param() runs. Early options in initrd bootconfig will
+ not be applied. Early options consumed during setup_arch() (e.g.
+ mem=, memmap=, earlycon, noapic, acpi=, numa=, iommu=) may not
+ take effect. If the same early option
+ appears in both bootconfig and the kernel command line, the
+ command line value takes precedence.
+
If unsure, say N.
config BOOT_CONFIG_EMBED_FILE
diff --git a/init/main.c b/init/main.c
index 1cb395dd94e43..487fe86ab5c09 100644
--- a/init/main.c
+++ b/init/main.c
@@ -414,10 +414,112 @@ static int __init warn_bootconfig(char *str)
return 0;
}
+/*
+ * do_early_param() is defined later in this file but called from
+ * bootconfig_apply_early_params() below, so we need a forward declaration.
+ */
+static int __init do_early_param(char *param, char *val,
+ const char *unused, void *arg);
+
+/*
+ * Check if a parameter name appears on the kernel command line.
+ * Returns true if the parameter was explicitly passed by the bootloader.
+ */
+static bool __init cmdline_has_param(const char *param)
+{
+ const char *p = boot_command_line;
+ int len = strlen(param);
+
+ while ((p = strstr(p, param)) != NULL) {
+ /* Check it's a whole-word match: preceded by space/start */
+ if (p != boot_command_line && *(p - 1) != ' ') {
+ p += len;
+ continue;
+ }
+ /* Followed by =, space, or end of string */
+ if (p[len] == '=' || p[len] == ' ' || p[len] == '\0')
+ return true;
+ p += len;
+ }
+ return false;
+}
+
+/*
+ * bootconfig_apply_early_params - apply kernel.* keys from the embedded
+ * bootconfig as early_param() calls.
+ *
+ * early_param() handlers run before most of the kernel initialises.
+ * A bootconfig attached to initrd arrives too late because the initrd is
+ * not mapped when early params are processed. The embedded bootconfig
+ * lives in the kernel image itself (.init.data), so it is always
+ * reachable.
+ *
+ * Called from setup_boot_config() which runs before parse_early_param()
+ * in start_kernel(), but after setup_arch(). Arch-specific early params
+ * parsed during setup_arch() will not see bootconfig values.
+ */
+static void __init bootconfig_apply_early_params(void)
+{
+ struct xbc_node *knode, *vnode, *root;
+ const char *val;
+ char *val_copy;
+
+ root = xbc_find_node("kernel");
+ if (!root)
+ return;
+
+ xbc_node_for_each_key_value(root, knode, val) {
+ if (xbc_node_compose_key_after(root, knode,
+ xbc_namebuf,
+ XBC_KEYLEN_MAX) < 0)
+ continue;
+
+ /* Command-line values take precedence over bootconfig */
+ if (cmdline_has_param(xbc_namebuf)) {
+ pr_info("bootconfig: skipping '%s', already on command line\n",
+ xbc_namebuf);
+ continue;
+ }
+
+ /* Boolean key with no value — pass NULL like parse_args() */
+ if (!xbc_node_get_child(knode)) {
+ do_early_param(xbc_namebuf, NULL, NULL, NULL);
+ continue;
+ }
+
+ /*
+ * Iterate array values: "foo = bar, buz" becomes two
+ * calls: do_early_param("foo", "bar") and
+ * do_early_param("foo", "buz").
+ */
+ vnode = xbc_node_get_child(knode);
+ xbc_array_for_each_value(vnode, val) {
+ /*
+ * Some early_param handlers save the pointer to
+ * val, so each value needs its own persistent
+ * copy. memblock is available here since we run
+ * after setup_arch(). These allocations are
+ * intentionally never freed because the handlers
+ * may retain references indefinitely.
+ */
+ val_copy = memblock_alloc(strlen(val) + 1,
+ SMP_CACHE_BYTES);
+ if (!val_copy) {
+ pr_err("Failed to allocate bootconfig value for '%s'\n",
+ xbc_namebuf);
+ continue;
+ }
+ strcpy(val_copy, val);
+ do_early_param(xbc_namebuf, val_copy, NULL, NULL);
+ }
+ }
+}
+
static void __init setup_boot_config(void)
{
static char tmp_cmdline[COMMAND_LINE_SIZE] __initdata;
const char *msg, *data;
+ bool embedded = false;
int pos, ret;
size_t size;
char *err;
@@ -425,8 +527,11 @@ static void __init setup_boot_config(void)
/* Cut out the bootconfig data even if we have no bootconfig option */
data = get_boot_config_from_initrd(&size);
/* If there is no bootconfig in initrd, try embedded one. */
- if (!data)
+ if (!data) {
data = xbc_get_embedded_bootconfig(&size);
+ /* Remember that the data came from the embedded bootconfig */
+ embedded = !!data;
+ }
strscpy(tmp_cmdline, boot_command_line, COMMAND_LINE_SIZE);
err = parse_args("bootconfig", tmp_cmdline, NULL, 0, 0, 0, NULL,
@@ -464,6 +569,8 @@ static void __init setup_boot_config(void)
} else {
xbc_get_info(&ret, NULL);
pr_info("Load bootconfig: %ld bytes %d nodes\n", (long)size, ret);
+ if (embedded)
+ bootconfig_apply_early_params();
/* keys starting with "kernel." are passed via cmdline */
extra_command_line = xbc_make_cmdline("kernel");
/* Also, "init." keys are init arguments */
* [PATCH v12 15/15] docs: Update KASAN and x86 memory map documentations
From: Maciej Wieczor-Retman @ 2026-03-30 14:34 UTC
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
H. Peter Anvin, Jonathan Corbet, Shuah Khan, Andrey Ryabinin,
Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov,
Vincenzo Frascino
Cc: m.wieczorretman, Maciej Wieczor-Retman, linux-kernel, linux-doc,
kasan-dev, workflows
In-Reply-To: <cover.1774872838.git.m.wieczorretman@pm.me>
From: Maciej Wieczor-Retman <maciej.wieczor-retman@intel.com>
Update the documentation to cover the changes to x86's memory address
space layout and the addition of x86 to KASAN's software tag-based mode.
Rework the paragraphs in KASAN's documentation that describe hardware and
software implementation details so they extend more easily to new
architectures.
Signed-off-by: Maciej Wieczor-Retman <maciej.wieczor-retman@intel.com>
---
Changelog v11:
- Split off the documentation portion of v10's patch 13.
- Apply Dave's suggestions to reformat the footer explaining alternate
ranges for KASAN shadow memory, put arch hardware implementation in a
separate paragraph and make a table to hold various implementation
details.
Documentation/arch/x86/x86_64/mm.rst | 21 +++++++++-
Documentation/dev-tools/kasan.rst | 61 ++++++++++++++++++++--------
2 files changed, 62 insertions(+), 20 deletions(-)
diff --git a/Documentation/arch/x86/x86_64/mm.rst b/Documentation/arch/x86/x86_64/mm.rst
index a6cf05d51bd8..3c78ab1afd8d 100644
--- a/Documentation/arch/x86/x86_64/mm.rst
+++ b/Documentation/arch/x86/x86_64/mm.rst
@@ -60,7 +60,7 @@ Complete virtual memory map with 4-level page tables
ffffe90000000000 | -23 TB | ffffe9ffffffffff | 1 TB | ... unused hole
ffffea0000000000 | -22 TB | ffffeaffffffffff | 1 TB | virtual memory map (vmemmap_base)
ffffeb0000000000 | -21 TB | ffffebffffffffff | 1 TB | ... unused hole
- ffffec0000000000 | -20 TB | fffffbffffffffff | 16 TB | KASAN shadow memory
+ ffffec0000000000 | -20 TB | fffffbffffffffff | 16 TB | KASAN shadow memory[1]
__________________|____________|__________________|_________|____________________________________________________________
|
| Identical layout to the 56-bit one from here on:
@@ -130,7 +130,7 @@ Complete virtual memory map with 5-level page tables
ffd2000000000000 | -11.5 PB | ffd3ffffffffffff | 0.5 PB | ... unused hole
ffd4000000000000 | -11 PB | ffd5ffffffffffff | 0.5 PB | virtual memory map (vmemmap_base)
ffd6000000000000 | -10.5 PB | ffdeffffffffffff | 2.25 PB | ... unused hole
- ffdf000000000000 | -8.25 PB | fffffbffffffffff | ~8 PB | KASAN shadow memory
+ ffdf000000000000 | -8.25 PB | fffffbffffffffff | ~8 PB | KASAN shadow memory[1]
__________________|____________|__________________|_________|____________________________________________________________
|
| Identical layout to the 47-bit one from here on:
@@ -178,3 +178,20 @@ correct as KASAN disables KASLR.
For both 4- and 5-level layouts, the KSTACK_ERASE_POISON value in the last 2MB
hole: ffffffffffff4111
+
+1. The shadow memory range depends on the KASAN mode and on the paging level
+ in use:
+
+::
+
+ ============================================================================================================
+ Start addr | Offset | End addr | Size | VM area description
+ ============================================================================================================
+ | | | | 4-level paging:
+ ffffec0000000000 | -20 TB | fffffbffffffffff | 16 TB | KASAN shadow memory (generic mode)
+ fffff40000000000 | -12 TB | fffffbffffffffff | 8 TB | KASAN shadow memory (software tag-based mode)
+ __________________|____________|__________________|_________|_______________________________________________
+ | | | | 5-level paging:
+ ffdf000000000000 | -8.25 PB | fffffbffffffffff | ~8 PB | KASAN shadow memory (generic mode)
+ ffeffc0000000000 | ~-4 PB | fffffbffffffffff | 4 PB | KASAN shadow memory (software tag-based mode)
+ __________________|____________|__________________|_________|_______________________________________________
diff --git a/Documentation/dev-tools/kasan.rst b/Documentation/dev-tools/kasan.rst
index b11c1be8dff4..d42d80e9fcf1 100644
--- a/Documentation/dev-tools/kasan.rst
+++ b/Documentation/dev-tools/kasan.rst
@@ -22,8 +22,8 @@ architectures, but it has significant performance and memory overheads.
Software Tag-Based KASAN or SW_TAGS KASAN, enabled with CONFIG_KASAN_SW_TAGS,
can be used for both debugging and dogfood testing, similar to userspace HWASan.
-This mode is only supported for arm64, but its moderate memory overhead allows
-using it for testing on memory-restricted devices with real workloads.
+This mode is only supported for arm64 and x86, but its moderate memory overhead
+allows using it for testing on memory-restricted devices with real workloads.
Hardware Tag-Based KASAN or HW_TAGS KASAN, enabled with CONFIG_KASAN_HW_TAGS,
is the mode intended to be used as an in-field memory bug detector or as a
@@ -346,16 +346,21 @@ Software Tag-Based KASAN
~~~~~~~~~~~~~~~~~~~~~~~~
Software Tag-Based KASAN uses a software memory tagging approach to checking
-access validity. It is currently only implemented for the arm64 architecture.
-
-Software Tag-Based KASAN uses the Top Byte Ignore (TBI) feature of arm64 CPUs
-to store a pointer tag in the top byte of kernel pointers. It uses shadow memory
-to store memory tags associated with each 16-byte memory cell (therefore, it
-dedicates 1/16th of the kernel memory for shadow memory).
-
-On each memory allocation, Software Tag-Based KASAN generates a random tag, tags
-the allocated memory with this tag, and embeds the same tag into the returned
-pointer.
+access validity. It is currently only implemented for the arm64 and x86
+architectures. It relies on hardware CPU features* that allow bits inside
+kernel pointers to be repurposed for storing pointer tags.
+
+Software Tag-Based mode uses shadow memory to store memory tags associated with
+each 16-byte memory cell (therefore, it dedicates 1/16th of the kernel memory
+for shadow memory). On each memory allocation, Software Tag-Based KASAN
+generates a random tag, tags the allocated memory with this tag, and embeds the
+same tag into the returned pointer.
+
+Two tag values are reserved. The first is the match-all pointer tag (also
+called the 'kernel tag', because it is supposed to equal the value normally
+present in the same bits of the linear address when KASAN is disabled);
+accesses through pointers carrying this tag are not checked. The second value
+is used to tag freed memory regions.
Software Tag-Based KASAN uses compile-time instrumentation to insert checks
before each memory access. These checks make sure that the tag of the memory
@@ -367,12 +372,32 @@ Software Tag-Based KASAN also has two instrumentation modes (outline, which
emits callbacks to check memory accesses; and inline, which performs the shadow
memory checks inline). With outline instrumentation mode, a bug report is
printed from the function that performs the access check. With inline
-instrumentation, a ``brk`` instruction is emitted by the compiler, and a
-dedicated ``brk`` handler is used to print bug reports.
-
-Software Tag-Based KASAN uses 0xFF as a match-all pointer tag (accesses through
-pointers with the 0xFF pointer tag are not checked). The value 0xFE is currently
-reserved to tag freed memory regions.
+instrumentation, the compiler emits a specific arch-dependent instruction with a
+dedicated handler to print bug reports.
+
+Architecture-specific details:
+
+::
+
+ +-----------------------+--------+---------------------+
+ | detail \ architecture | arm64 | x86 |
+ +=======================+========+=====================+
+ | Hardware feature | TBI | LAM |
+ +-----------------------+--------+---------------------+
+ | Kernel tag | 0xFF | 0x0F |
+ +-----------------------+--------+---------------------+
+ | Freed memory tag | 0xFE | 0x0E |
+ +-----------------------+--------+---------------------+
+ | Tag width | 8 bits | 4 bits |
+ +-----------------------+--------+---------------------+
+ | Inline instruction | brk | no compiler support |
+ +-----------------------+--------+---------------------+
+
+* Different architectures implement different hardware features to mask and
+ repurpose linear address bits. arm64 uses Top Byte Ignore (TBI), which masks
+ out the top byte of the pointer and so allows tags to be stored there. x86
+ uses Linear Address Masking (LAM) to store tags in four bits of the kernel
+ pointer's top byte.
Hardware Tag-Based KASAN
~~~~~~~~~~~~~~~~~~~~~~~~
--
2.53.0
* [PATCH v12 01/15] kasan: sw_tags: Use arithmetic shift for shadow computation
From: Maciej Wieczor-Retman @ 2026-03-30 14:33 UTC
To: Catalin Marinas, Will Deacon, Jonathan Corbet, Shuah Khan,
Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov,
Dmitry Vyukov, Vincenzo Frascino, Andrew Morton, Jan Kiszka,
Kieran Bingham, Nathan Chancellor, Nick Desaulniers,
Bill Wendling, Justin Stitt
Cc: m.wieczorretman, Samuel Holland, Maciej Wieczor-Retman,
linux-arm-kernel, linux-doc, linux-kernel, kasan-dev, workflows,
linux-mm, llvm
In-Reply-To: <cover.1774872838.git.m.wieczorretman@pm.me>
From: Samuel Holland <samuel.holland@sifive.com>
Currently, kasan_mem_to_shadow() uses a logical right shift, which turns
canonical kernel addresses into non-canonical addresses by clearing the
high KASAN_SHADOW_SCALE_SHIFT bits. The value of KASAN_SHADOW_OFFSET is
then chosen so that the addition results in a canonical address for the
shadow memory.
For KASAN_GENERIC, this shift/add combination is part of the ABI with the
compiler, because KASAN_SHADOW_OFFSET is used in compiler-generated inline
shadow checks[1], which must only attempt to dereference canonical addresses.
However, for KASAN_SW_TAGS there is some freedom to change the algorithm
without breaking the ABI. Because TBI is enabled for kernel addresses,
the top bits of shadow memory addresses computed during tag checks are
irrelevant, and so likewise are the top bits of KASAN_SHADOW_OFFSET.
This is demonstrated by the fact that LLVM uses a logical right shift in
the tag check fast path[2] but a sbfx (signed bitfield extract)
instruction in the slow path[3] without causing any issues.
Use an arithmetic shift in kasan_mem_to_shadow() as it provides a number
of benefits:
1) The memory layout doesn't change but is easier to understand.
KASAN_SHADOW_OFFSET becomes a canonical memory address, and the shifted
pointer becomes a negative offset, so KASAN_SHADOW_OFFSET ==
KASAN_SHADOW_END regardless of the shift amount or the size of the
virtual address space.
2) KASAN_SHADOW_OFFSET becomes a simpler constant, requiring only one
instruction to load instead of two. Since it must be loaded in each
function with a tag check, this decreases kernel text size by 0.5%.
3) This shift and the sign extension from kasan_reset_tag() can be
combined into a single sbfx instruction. When this same algorithm change
is applied to the compiler, it removes an instruction from each inline
tag check, further reducing kernel text size by an additional 4.6%.
These benefits extend to other architectures as well. On RISC-V, where
the baseline ISA has neither shifted addition nor an equivalent of the
sbfx instruction, loading KASAN_SHADOW_OFFSET is reduced from 3 to 2
instructions, and kasan_mem_to_shadow(kasan_reset_tag(addr)) similarly
combines two consecutive right shifts.
Link: https://github.com/llvm/llvm-project/blob/llvmorg-20-init/llvm/lib/Transforms/Instrumentation/AddressSanitizer.cpp#L1316 [1]
Link: https://github.com/llvm/llvm-project/blob/llvmorg-20-init/llvm/lib/Transforms/Instrumentation/HWAddressSanitizer.cpp#L895 [2]
Link: https://github.com/llvm/llvm-project/blob/llvmorg-20-init/llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp#L669 [3]
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Co-developed-by: Maciej Wieczor-Retman <maciej.wieczor-retman@intel.com>
Signed-off-by: Maciej Wieczor-Retman <maciej.wieczor-retman@intel.com>
---
Changelog v11: (Maciej)
- Remove the arch_kasan_non_canonical_hook() scheme in favor of Andrey
Ryabinin's much nicer simple implementation.
Changelog v10: (Maciej)
- Update the Documentation/dev-tools/kasan.rst file with the changed
kasan_mem_to_shadow().
Changelog v9: (Maciej)
- Take out the arm64 related code from mm/kasan/report.c and put it in
the arch specific directory in a new file so the kasan_mem_to_shadow()
function can be included.
- Reset addr tag bits in arm64's arch_kasan_non_canonical_hook() so the
inline mode can also work with that function (Andrey Ryabinin).
- Fix incorrect number of zeros in a comment in mm/kasan/report.c.
- Remove Catalin's acked-by since changes were made.
Changelog v7: (Maciej)
- Change UL to ULL in report.c to fix some compilation warnings.
Changelog v6: (Maciej)
- Add Catalin's acked-by.
- Move x86 gdb snippet here from the last patch.
Changelog v5: (Maciej)
- (u64) -> (unsigned long) in report.c
Changelog v4: (Maciej)
- Revert x86 to signed mem_to_shadow mapping.
- Remove last two paragraphs since they were just poorer duplication of
the comments in kasan_non_canonical_hook().
Changelog v3: (Maciej)
- Fix scripts/gdb/linux/kasan.py so the new signed mem_to_shadow() is
reflected there.
- Fix Documentation/arch/arm64/kasan-offsets.sh to take new offsets into
account.
- Made changes to the kasan_non_canonical_hook() according to upstream
discussion. Settled on overflow on both ranges and separate checks for
x86 and arm.
Changelog v2: (Maciej)
- Correct address range that's checked in kasan_non_canonical_hook().
Adjust the comment inside.
- Remove part of comment from arch/arm64/include/asm/memory.h.
- Append patch message paragraph about the overflow in
kasan_non_canonical_hook().
Documentation/arch/arm64/kasan-offsets.sh | 8 ++++++--
Documentation/dev-tools/kasan.rst | 18 ++++++++++++------
arch/arm64/Kconfig | 10 +++++-----
arch/arm64/include/asm/memory.h | 14 +++++++++++++-
arch/arm64/mm/kasan_init.c | 7 +++++--
include/linux/kasan.h | 10 ++++++++--
mm/kasan/report.c | 16 ++++++++++++----
scripts/gdb/linux/kasan.py | 5 ++++-
scripts/gdb/linux/mm.py | 5 +++--
9 files changed, 68 insertions(+), 25 deletions(-)
diff --git a/Documentation/arch/arm64/kasan-offsets.sh b/Documentation/arch/arm64/kasan-offsets.sh
index 2dc5f9e18039..ce777c7c7804 100644
--- a/Documentation/arch/arm64/kasan-offsets.sh
+++ b/Documentation/arch/arm64/kasan-offsets.sh
@@ -5,8 +5,12 @@
print_kasan_offset () {
printf "%02d\t" $1
- printf "0x%08x00000000\n" $(( (0xffffffff & (-1 << ($1 - 1 - 32))) \
- - (1 << (64 - 32 - $2)) ))
+ if [[ $2 -ne 4 ]] then
+ printf "0x%08x00000000\n" $(( (0xffffffff & (-1 << ($1 - 1 - 32))) \
+ - (1 << (64 - 32 - $2)) ))
+ else
+ printf "0x%08x00000000\n" $(( (0xffffffff & (-1 << ($1 - 1 - 32))) ))
+ fi
}
echo KASAN_SHADOW_SCALE_SHIFT = 3
diff --git a/Documentation/dev-tools/kasan.rst b/Documentation/dev-tools/kasan.rst
index 4968b2aa60c8..b11c1be8dff4 100644
--- a/Documentation/dev-tools/kasan.rst
+++ b/Documentation/dev-tools/kasan.rst
@@ -315,13 +315,19 @@ translate a memory address to its corresponding shadow address.
Here is the function which translates an address to its corresponding shadow
address::
- static inline void *kasan_mem_to_shadow(const void *addr)
- {
- return (void *)((unsigned long)addr >> KASAN_SHADOW_SCALE_SHIFT)
- + KASAN_SHADOW_OFFSET;
- }
+ static inline void *kasan_mem_to_shadow(const void *addr)
+ {
+ void *scaled;
-where ``KASAN_SHADOW_SCALE_SHIFT = 3``.
+ if (IS_ENABLED(CONFIG_KASAN_GENERIC))
+ scaled = (void *)((unsigned long)addr >> KASAN_SHADOW_SCALE_SHIFT);
+ else
+ scaled = (void *)((long)addr >> KASAN_SHADOW_SCALE_SHIFT);
+
+ return KASAN_SHADOW_OFFSET + scaled;
+ }
+
+where for Generic KASAN ``KASAN_SHADOW_SCALE_SHIFT = 3``.
Compile-time instrumentation is used to insert memory access checks. Compiler
inserts function calls (``__asan_load*(addr)``, ``__asan_store*(addr)``) before
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index bcd9f5bc66e2..87239396ed23 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -434,11 +434,11 @@ config KASAN_SHADOW_OFFSET
default 0xdffffe0000000000 if ARM64_VA_BITS_42 && !KASAN_SW_TAGS
default 0xdfffffc000000000 if ARM64_VA_BITS_39 && !KASAN_SW_TAGS
default 0xdffffff800000000 if ARM64_VA_BITS_36 && !KASAN_SW_TAGS
- default 0xefff800000000000 if (ARM64_VA_BITS_48 || (ARM64_VA_BITS_52 && !ARM64_16K_PAGES)) && KASAN_SW_TAGS
- default 0xefffc00000000000 if (ARM64_VA_BITS_47 || ARM64_VA_BITS_52) && ARM64_16K_PAGES && KASAN_SW_TAGS
- default 0xeffffe0000000000 if ARM64_VA_BITS_42 && KASAN_SW_TAGS
- default 0xefffffc000000000 if ARM64_VA_BITS_39 && KASAN_SW_TAGS
- default 0xeffffff800000000 if ARM64_VA_BITS_36 && KASAN_SW_TAGS
+ default 0xffff800000000000 if (ARM64_VA_BITS_48 || (ARM64_VA_BITS_52 && !ARM64_16K_PAGES)) && KASAN_SW_TAGS
+ default 0xffffc00000000000 if (ARM64_VA_BITS_47 || ARM64_VA_BITS_52) && ARM64_16K_PAGES && KASAN_SW_TAGS
+ default 0xfffffe0000000000 if ARM64_VA_BITS_42 && KASAN_SW_TAGS
+ default 0xffffffc000000000 if ARM64_VA_BITS_39 && KASAN_SW_TAGS
+ default 0xfffffff800000000 if ARM64_VA_BITS_36 && KASAN_SW_TAGS
default 0xffffffffffffffff
config UNWIND_TABLES
diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
index a2b7a33966ff..875c0bd0d85a 100644
--- a/arch/arm64/include/asm/memory.h
+++ b/arch/arm64/include/asm/memory.h
@@ -89,7 +89,15 @@
*
* KASAN_SHADOW_END is defined first as the shadow address that corresponds to
* the upper bound of possible virtual kernel memory addresses UL(1) << 64
- * according to the mapping formula.
+ * according to the mapping formula. For Generic KASAN, the address in the
+ * mapping formula is treated as unsigned (part of the compiler's ABI), so the
+ * end of the shadow memory region is at a large positive offset from
+ * KASAN_SHADOW_OFFSET. For Software Tag-Based KASAN, the address in the
+ * formula is treated as signed. Since all kernel addresses are negative, they
+ * map to shadow memory below KASAN_SHADOW_OFFSET, making KASAN_SHADOW_OFFSET
+ * itself the end of the shadow memory region. (User pointers are positive and
+ * would map to shadow memory above KASAN_SHADOW_OFFSET, but shadow memory is
+ * not allocated for them.)
*
* KASAN_SHADOW_START is defined second based on KASAN_SHADOW_END. The shadow
* memory start must map to the lowest possible kernel virtual memory address
@@ -100,7 +108,11 @@
*/
#if defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS)
#define KASAN_SHADOW_OFFSET _AC(CONFIG_KASAN_SHADOW_OFFSET, UL)
+#ifdef CONFIG_KASAN_GENERIC
#define KASAN_SHADOW_END ((UL(1) << (64 - KASAN_SHADOW_SCALE_SHIFT)) + KASAN_SHADOW_OFFSET)
+#else
+#define KASAN_SHADOW_END KASAN_SHADOW_OFFSET
+#endif
#define _KASAN_SHADOW_START(va) (KASAN_SHADOW_END - (UL(1) << ((va) - KASAN_SHADOW_SCALE_SHIFT)))
#define KASAN_SHADOW_START _KASAN_SHADOW_START(vabits_actual)
#define PAGE_END KASAN_SHADOW_START
diff --git a/arch/arm64/mm/kasan_init.c b/arch/arm64/mm/kasan_init.c
index abeb81bf6ebd..937f6eb8115b 100644
--- a/arch/arm64/mm/kasan_init.c
+++ b/arch/arm64/mm/kasan_init.c
@@ -198,8 +198,11 @@ static bool __init root_level_aligned(u64 addr)
/* The early shadow maps everything to a single page of zeroes */
asmlinkage void __init kasan_early_init(void)
{
- BUILD_BUG_ON(KASAN_SHADOW_OFFSET !=
- KASAN_SHADOW_END - (1UL << (64 - KASAN_SHADOW_SCALE_SHIFT)));
+ if (IS_ENABLED(CONFIG_KASAN_GENERIC))
+ BUILD_BUG_ON(KASAN_SHADOW_OFFSET !=
+ KASAN_SHADOW_END - (1UL << (64 - KASAN_SHADOW_SCALE_SHIFT)));
+ else
+ BUILD_BUG_ON(KASAN_SHADOW_OFFSET != KASAN_SHADOW_END);
BUILD_BUG_ON(!IS_ALIGNED(_KASAN_SHADOW_START(VA_BITS), SHADOW_ALIGN));
BUILD_BUG_ON(!IS_ALIGNED(_KASAN_SHADOW_START(VA_BITS_MIN), SHADOW_ALIGN));
BUILD_BUG_ON(!IS_ALIGNED(KASAN_SHADOW_END, SHADOW_ALIGN));
diff --git a/include/linux/kasan.h b/include/linux/kasan.h
index bf233bde68c7..fbff1b759c85 100644
--- a/include/linux/kasan.h
+++ b/include/linux/kasan.h
@@ -62,8 +62,14 @@ int kasan_populate_early_shadow(const void *shadow_start,
#ifndef kasan_mem_to_shadow
static inline void *kasan_mem_to_shadow(const void *addr)
{
- return (void *)((unsigned long)addr >> KASAN_SHADOW_SCALE_SHIFT)
- + KASAN_SHADOW_OFFSET;
+ void *scaled;
+
+ if (IS_ENABLED(CONFIG_KASAN_GENERIC))
+ scaled = (void *)((unsigned long)addr >> KASAN_SHADOW_SCALE_SHIFT);
+ else
+ scaled = (void *)((long)addr >> KASAN_SHADOW_SCALE_SHIFT);
+
+ return KASAN_SHADOW_OFFSET + scaled;
}
#endif
diff --git a/mm/kasan/report.c b/mm/kasan/report.c
index e804b1e1f886..1e4521b5ef14 100644
--- a/mm/kasan/report.c
+++ b/mm/kasan/report.c
@@ -640,12 +640,20 @@ void kasan_non_canonical_hook(unsigned long addr)
{
unsigned long orig_addr, user_orig_addr;
const char *bug_type;
+ void *tagged_null = set_tag(NULL, KASAN_TAG_KERNEL);
+ void *tagged_addr = set_tag((void *)addr, KASAN_TAG_KERNEL);
/*
- * All addresses that came as a result of the memory-to-shadow mapping
- * (even for bogus pointers) must be >= KASAN_SHADOW_OFFSET.
+ * Filter out addresses that cannot be shadow memory accesses generated
+ * by the compiler.
+ *
+ * In SW_TAGS mode, when computing a shadow address, the compiler always
+ * sets the kernel tag (some top bits) on the pointer *before* computing
+ * the memory-to-shadow mapping. As a result, valid shadow addresses
+ * are derived from tagged kernel pointers.
*/
- if (addr < KASAN_SHADOW_OFFSET)
+ if (tagged_addr < kasan_mem_to_shadow(tagged_null) ||
+ tagged_addr > kasan_mem_to_shadow((void *)(~0ULL)))
return;
orig_addr = (unsigned long)kasan_shadow_to_mem((void *)addr);
@@ -670,7 +678,7 @@ void kasan_non_canonical_hook(unsigned long addr)
} else if (user_orig_addr < TASK_SIZE) {
bug_type = "probably user-memory-access";
orig_addr = user_orig_addr;
- } else if (addr_in_shadow((void *)addr))
+ } else if (addr_in_shadow(tagged_addr))
bug_type = "probably wild-memory-access";
else
bug_type = "maybe wild-memory-access";
diff --git a/scripts/gdb/linux/kasan.py b/scripts/gdb/linux/kasan.py
index 56730b3fde0b..4b86202b155f 100644
--- a/scripts/gdb/linux/kasan.py
+++ b/scripts/gdb/linux/kasan.py
@@ -7,7 +7,8 @@
#
import gdb
-from linux import constants, mm
+from linux import constants, utils, mm
+from ctypes import c_int64 as s64
def help():
t = """Usage: lx-kasan_mem_to_shadow [Hex memory addr]
@@ -39,6 +40,8 @@ class KasanMemToShadow(gdb.Command):
else:
help()
def kasan_mem_to_shadow(self, addr):
+ if constants.LX_CONFIG_KASAN_SW_TAGS and not utils.is_target_arch('x86'):
+ addr = s64(addr).value
return (addr >> self.p_ops.KASAN_SHADOW_SCALE_SHIFT) + self.p_ops.KASAN_SHADOW_OFFSET
KasanMemToShadow()
diff --git a/scripts/gdb/linux/mm.py b/scripts/gdb/linux/mm.py
index d78908f6664d..d4ab341d89c5 100644
--- a/scripts/gdb/linux/mm.py
+++ b/scripts/gdb/linux/mm.py
@@ -281,12 +281,13 @@ class aarch64_page_ops():
self.KERNEL_END = gdb.parse_and_eval("_end")
if constants.LX_CONFIG_KASAN_GENERIC or constants.LX_CONFIG_KASAN_SW_TAGS:
+ self.KASAN_SHADOW_OFFSET = constants.LX_CONFIG_KASAN_SHADOW_OFFSET
if constants.LX_CONFIG_KASAN_GENERIC:
self.KASAN_SHADOW_SCALE_SHIFT = 3
+ self.KASAN_SHADOW_END = (1 << (64 - self.KASAN_SHADOW_SCALE_SHIFT)) + self.KASAN_SHADOW_OFFSET
else:
self.KASAN_SHADOW_SCALE_SHIFT = 4
- self.KASAN_SHADOW_OFFSET = constants.LX_CONFIG_KASAN_SHADOW_OFFSET
- self.KASAN_SHADOW_END = (1 << (64 - self.KASAN_SHADOW_SCALE_SHIFT)) + self.KASAN_SHADOW_OFFSET
+ self.KASAN_SHADOW_END = self.KASAN_SHADOW_OFFSET
self.PAGE_END = self.KASAN_SHADOW_END - (1 << (self.vabits_actual - self.KASAN_SHADOW_SCALE_SHIFT))
else:
self.PAGE_END = self._PAGE_END(self.VA_BITS_MIN)
--
2.53.0
^ permalink raw reply related
* [PATCH v12 00/15] kasan: x86: arm64: KASAN tag-based mode for x86
From: Maciej Wieczor-Retman @ 2026-03-30 14:31 UTC (permalink / raw)
To: vbabka, glider, ryabinin.a.a, urezki, tglx, jeremy.linton,
osandov, ritesh.list, morbo, axelrasmussen, ankur.a.arora, baohua,
tabba, catalin.marinas, surenb, maciej.wieczor-retman,
vincenzo.frascino, will, kasong, qi.zheng, hsj0512, shakeel.butt,
weixugc, kees, akpm, yeoreum.yun, jgross, justinstitt,
trintaeoitogc, nick.desaulniers+lkml, corbet, samuel.holland,
Liam.Howlett, rppt, mhocko, jackmanb, mingo, linmag7, kas, ardb,
leitao, david, skhan, thuth, hpa, andreyknvl, luto, maz, dvyukov,
nsc, houwenlong.hwl, bp, jan.kiszka, kevin.brodsky, nathan,
peterz, yuanchu, dave.hansen, kbingham, ljs
Cc: linux-mm, linux-arm-kernel, linux-doc, workflows, llvm, kasan-dev,
linux-kbuild, linux-kernel, x86, m.wieczorretman
======= Introduction
The patchset aims to add a KASAN tag-based mode for the x86 architecture
with the help of a new CPU feature called Linear Address Masking (LAM).
The main improvement introduced by the series is roughly 2x lower memory
usage compared to KASAN's generic mode, currently the only mode available
on x86. The tag-based mode may also find errors that the generic mode
can't, because it detects bad accesses by comparing pointer and memory
tags rather than by checking for poisoned shadow memory (for example, an
out-of-bounds access that skips past a redzone into another valid object).
======= How does KASAN's tag-based mode work?
When enabled, the kernel is instrumented at build time: the compiler
inserts a check before each memory access (pointer dereference), and
KASAN hooks are called on each memory allocation.
The allocation-related functions generate a random tag and store it in
two places: in the shadow memory that maps to the allocated memory, and in
the top bits of the pointer to the allocated memory. Storing the tag in
the top bits of the pointer is possible thanks to Top-Byte Ignore (TBI) on
arm64 and LAM on x86.
The access-related functions compare the tag stored in the pointer with
the one stored in shadow memory. If the tags don't match, the access is
invalid (for example, out of bounds) and an error report is generated.
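The allocation and access steps above can be modelled in user space as a rough sketch. This is a toy, not the kernel implementation: it assumes the x86 layout described later in this letter (a 4-bit tag in pointer bits [60:57]) and one shadow byte per 16-byte granule, uses offsets into a small array as "pointers", takes the tag as a parameter instead of drawing it at random, and all names (set_tag(), toy_alloc(), toy_access_ok(), ...) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TAG_SHIFT     57
#define TAG_MASK      (0xFULL << TAG_SHIFT)   /* pointer bits [60:57] */
#define GRANULE_SHIFT 4                       /* 16-byte granules */
#define GRANULE_SIZE  (1u << GRANULE_SHIFT)
#define HEAP_SIZE     256u

static_assert(HEAP_SIZE % GRANULE_SIZE == 0, "heap must be whole granules");

/* One tag per granule of the toy heap (the "shadow memory"). */
static uint8_t shadow[HEAP_SIZE / GRANULE_SIZE];

static uint64_t set_tag(uint64_t ptr, uint8_t tag)
{
	return (ptr & ~TAG_MASK) | ((uint64_t)(tag & 0xF) << TAG_SHIFT);
}

static uint8_t get_tag(uint64_t ptr)
{
	return (uint8_t)((ptr & TAG_MASK) >> TAG_SHIFT);
}

/* Allocation: record the tag for every granule and return a tagged pointer. */
static uint64_t toy_alloc(uint64_t offset, size_t size, uint8_t tag)
{
	assert(offset % GRANULE_SIZE == 0 && offset + size <= HEAP_SIZE);
	for (uint64_t g = offset / GRANULE_SIZE;
	     g < (offset + size + GRANULE_SIZE - 1) / GRANULE_SIZE; g++)
		shadow[g] = tag & 0xF;
	return set_tag(offset, tag);
}

/* Access: the pointer tag must match the tag stored for its granule. */
static bool toy_access_ok(uint64_t tagged_ptr)
{
	uint64_t offset = tagged_ptr & ~TAG_MASK;

	return offset < HEAP_SIZE &&
	       shadow[offset / GRANULE_SIZE] == get_tag(tagged_ptr);
}
```

For example, after a = toy_alloc(0, 32, 0x3) and b = toy_alloc(32, 16, 0x7), an access through a + 32 lands in b's granule, whose tag (0x7) differs from a's (0x3), so toy_access_ok() rejects it.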
The general idea behind the tag-based mode is explained very well in the
series that introduced the original implementation [1].
[1] https://lore.kernel.org/all/cover.1544099024.git.andreyknvl@google.com/
======= Differences summary compared to the arm64 tag-based mode
- Tag width:
- Tag width influences the chance of a tag mismatch due to two
tags from different allocations having the same value. The
bigger the possible range of tag values the lower the chance
of that happening.
- Shortening the tag width from 8 bits to 4 can help with
memory usage, but it also increases the chance of not
reporting an error: with 4-bit tags there is a ~7% chance
of a tag collision, in which case a bad access goes unreported.
- Address masking mechanism
- TBI in arm64 allows for storing metadata in the top 8 bits of
the virtual address.
- LAM in x86 allows storing tags in bits [62:57] of the pointer.
To maximize memory savings the tag width is reduced to bits
[60:57].
- Inline mode mismatch reporting
- Arm64 inserts a BRK instruction to pass metadata about a tag
mismatch to the KASAN report.
- Right now on x86 the INT3 instruction is used for the same
purpose. A change moving it to UD1 is already implemented
and tested, but it relies on another series that needs to be
merged first, so that patch will be posted separately once
the dependency is merged upstream.
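The ~7% figure is consistent with a simple estimate, sketched below under an assumption not stated in this letter: that, as on arm64, two tag values (the match-all kernel tag and the invalid tag) are reserved, leaving 14 of the 16 possible 4-bit values for allocations. A bad access then goes unreported when the two tags happen to be equal, with probability 1/14 (about 7.1%), versus 1/254 (about 0.4%) with 8-bit tags. The helper name is hypothetical.

```c
#include <assert.h>

/*
 * Probability that two independently, uniformly tagged allocations share a
 * tag, so that an access from one into the other is not reported.
 * reserved_tags is an assumption: tag values never handed out to allocations.
 */
static double tag_collision_chance(unsigned int tag_bits, unsigned int reserved_tags)
{
	assert(tag_bits > 0 && tag_bits < 32);
	assert(reserved_tags < (1u << tag_bits));
	return 1.0 / (double)((1u << tag_bits) - reserved_tags);
}
```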
======= Testing
Ran all the KASAN KUnit tests for both software tag-based and generic
KASAN after making the changes.
In generic mode (both with these patches and without) the results were:
kasan: pass:61 fail:1 skip:14 total:76
Totals: pass:61 fail:1 skip:14 total:76
not ok 1 kasan
and for software tags:
kasan: pass:65 fail:1 skip:10 total:76
Totals: pass:65 fail:1 skip:10 total:76
not ok 1 kasan
At the time of testing, the one failing case also fails in generic mode
without this patchset applied, which suggests that something else is at
fault. The failing test case is about an out-of-bounds strscpy() access
that does not get caught.
======= Benchmarks [1]
All tests were run on a Sierra Forest server platform. The tests differed
only in these kernel options:
- CONFIG_KASAN
- CONFIG_KASAN_GENERIC
- CONFIG_KASAN_SW_TAGS
- CONFIG_KASAN_INLINE [1]
- CONFIG_KASAN_OUTLINE
Boot time (until login prompt):
* 02:55 for clean kernel
* 05:42 / 06:32 for generic KASAN (inline/outline)
* 05:58 for tag-based KASAN (outline) [2]
Total memory usage (512GB present on the system - MemAvailable just
after boot):
* 12.56 GB for clean kernel
* 81.74 GB for generic KASAN
* 44.39 GB for tag-based KASAN
Kernel size:
* 14 MB for clean kernel
* 24.7 MB / 19.5 MB for generic KASAN (inline/outline)
* 27.1 MB / 18.1 MB for tag-based KASAN (inline/outline)
Workload time comparison (compiling the mainline kernel on 200 cores):
* 62s for clean kernel
* 125s / 171s for generic KASAN (inline/outline)
* 145s for tag-based KASAN (outline) [2]
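Reading the memory figures as memory in use after boot, the KASAN overhead over the clean kernel can be compared with the theoretical shadow size: one shadow byte per 8 bytes for generic KASAN and per 16 bytes for the tag-based mode (the KASAN_SHADOW_SCALE_SHIFT values 3 and 4 in scripts/gdb/linux/mm.py). The small C sketch below does that arithmetic; the helper names are hypothetical.

```c
#include <assert.h>

/* Shadow size for mem_gb of memory: one shadow byte per 2^scale_shift bytes. */
static double shadow_gb(double mem_gb, unsigned int scale_shift)
{
	assert(scale_shift < 32);
	return mem_gb / (double)(1u << scale_shift);
}

/* Extra memory in use compared to the clean kernel. */
static double overhead_gb(double kasan_gb, double clean_gb)
{
	return kasan_gb - clean_gb;
}
```

For 512 GB this gives 64 GB versus 32 GB of shadow, against measured overheads of roughly 69.2 GB (81.74 - 12.56) and 31.8 GB (44.39 - 12.56), which is in line with the ~2x memory reduction claimed in the introduction.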
[1] Currently inline mode doesn't work on x86 due to missing compiler
support. I have written a patch for clang that seems to fix the inline
mode, and I was able to boot and check that all patches regarding the
inline mode work as expected. My hope is to post the patch to LLVM once
this series is completed, and then make inline mode available in the
kernel config.
[2] While I was able to boot the inline tag-based kernel with my
compiler changes in a simulated environment, toolchain difficulties
prevented me from booting it on the machine I had access to. Also, the
boot time results from the simulation seem too good to be true, and
they're far too poor for the generic case to be believable. Therefore
I'm posting only results from the physical server platform.
======= Compilation
Clang was used to compile the series (make LLVM=1) since gcc doesn't
seem to support KASAN tag-based compiler instrumentation on x86. The
patchset does compile with gcc without issues, but the resulting kernel
doesn't boot.
======= Dependencies
The series is based on mm-new.
======= Previous versions
v11: https://lore.kernel.org/all/cover.1773164688.git.m.wieczorretman@pm.me/
v10: https://lore.kernel.org/all/cover.1770232424.git.m.wieczorretman@pm.me/
v9: https://lore.kernel.org/all/cover.1768845098.git.m.wieczorretman@pm.me/
v8: https://lore.kernel.org/all/cover.1768233085.git.m.wieczorretman@pm.me/
v7: https://lore.kernel.org/all/cover.1765386422.git.m.wieczorretman@pm.me/
v6: https://lore.kernel.org/all/cover.1761763681.git.m.wieczorretman@pm.me/
v5: https://lore.kernel.org/all/cover.1756151769.git.maciej.wieczor-retman@intel.com/
v4: https://lore.kernel.org/all/cover.1755004923.git.maciej.wieczor-retman@intel.com/
v3: https://lore.kernel.org/all/cover.1743772053.git.maciej.wieczor-retman@intel.com/
v2: https://lore.kernel.org/all/cover.1739866028.git.maciej.wieczor-retman@intel.com/
v1: https://lore.kernel.org/all/cover.1738686764.git.maciej.wieczor-retman@intel.com/
=== (two fix patches were split off after v6) (merged into mm-unstable)
v1: https://lore.kernel.org/all/cover.1762267022.git.m.wieczorretman@pm.me/
v2: https://lore.kernel.org/all/cover.1764685296.git.m.wieczorretman@pm.me/
v3: https://lore.kernel.org/all/cover.1764874575.git.m.wieczorretman@pm.me/
v4: https://lore.kernel.org/all/cover.1764945396.git.m.wieczorretman@pm.me/
Changes v12:
- Put CC_IS_CLANG and ADDRESS_MASKING into one Kconfig option that
controls HAVE_ARCH_KASAN_SW_TAGS. (Peter Zijlstra)
Changes v11:
- Rebase series onto mm-new.
- Split off and modify the documentation patch.
- Split the pointer arithmetic reset tag patch in two. One patch for
slight rework of page_to_virt() and one for putting x -
__START_KERNEL_map into a tag reset helper.
- Fix issue pointed out by Dave on copy_from_kernel_nofault_allowed().
- Remove the arch_kasan_non_canonical_hook function scheme in favor of
Andrey Ryabinin's simpler arch independent implementation.
Changes v10:
- Rebase the series onto 6.19-rc8.
- Add Mike Rapoport's acked-by to patch 6.
- Modify Documentation/dev-tools/kasan.rst in patches 1 and 13.
Changes v9:
- Lock HAVE_ARCH_KASAN_SW_TAGS behind CC_IS_CLANG due to gcc not working
in practice.
- Remove pr_info() from KASAN initialization.
- Add paragraph to mm.rst explaining the alternative KASAN memory
ranges.
- Move out arch based code from kasan_non_canonical_hook() into arch
subdirectories. arm64 and non-arch changes in patch 1, x86 changes in
patch 12.
- Reset tag bits on arm64's non-canonical hook to allow inline mode to
work.
- Revert modifying __is_canonical_address() since it can break KVM. Just
untag the address in copy_from_kernel_nofault_allowed().
- Add a bunch of reviewed-by tags.
Changes v8:
- Detached the UD1/INT3 inline patch from the series so the whole
patchset can be merged without waiting on other dependency series. For
now with lack of compiler support for the inline mode that patch
didn't work anyway so this delay is not an issue.
- Rebased patches onto 6.19-rc5.
- Added acked-by tag to "kasan: arm64: x86: Make special tags arch
specific".
Changes v7:
- Rebased the series onto Peter Zijlstra's "WARN() hackery" v2 patchset.
- Fix flipped memset arguments in "x86/kasan: KASAN raw shadow memory
PTE init".
- Reorder tag width defines on arm64 to avoid redefinition warnings.
- Split off the pcpu unpoison patches into a separate fix oriented
series.
- Redid the canonicality checks so they work for KVM too (didn't change
the __canonical_address() function previously).
- A lot of fixes pointed out by Alexander in his great review:
- Fixed "x86/mm: Physical address comparisons in fill_p*d/pte"
- Merged "Support tag widths less than 8 bits" and "Make special
tags arch specific".
- Added comments and extended patch messages for patches
"x86/kasan: Make software tag-based kasan available" and
"mm/execmem: Untag addresses in EXECMEM_ROX related pointer arithmetic".
- Fixed KASAN_TAG_MASK definition order so all patches compile
individually.
- Renamed kasan_inline.c to kasan_sw_tags.c.
Changes v6:
- Initialize sw-tags only when LAM is available.
- Move inline mode to use UD1 instead of INT3
- Remove inline multishot patch.
- Fix the canonical check to work for user addresses too.
- Revise patch names and messages to align to tip tree rules.
- Fix vdso compilation issue.
Changes v5:
- Fix a bunch of arm64 compilation errors I didn't catch earlier.
Thank you, Ada, for testing the series!
- Simplify the usage of the tag handling x86 functions (virt_to_page,
phys_addr etc.).
- Remove within() and within_range() from the EXECMEM_ROX patch.
Changes v4:
- Revert x86 kasan_mem_to_shadow() scheme to the same one used in generic
KASAN. Keep the arithmetic shift idea for KASAN in general since
it makes more sense for arm64 and RISC-V.
- Fix inline mode but leave it unavailable until a complementary
compiler patch can be merged.
- Apply Dave Hansen's comments on series formatting, patch style and
code simplifications.
Changes v3:
- Remove the runtime_const patch and set up a unified offset for both 5
and 4 paging levels.
- Add a fix for inline mode on x86 tag-based KASAN. Add a handler for
int3 that is generated on inline tag mismatches.
- Fix scripts/gdb/linux/kasan.py so the new signed mem_to_shadow() is
reflected there.
- Fix Documentation/arch/arm64/kasan-offsets.sh to take new offsets into
account.
- Made changes to the kasan_non_canonical_hook() according to upstream
discussion.
- Remove patches 2 and 3 since they relate to RISC-V and this series
adds only x86 related things.
- Reorder __tag_*() functions so they're before arch_kasan_*(). Remove
CONFIG_KASAN condition from __tag_set().
Changes v2:
- Split the series into one adding KASAN tag-based mode (this one) and
another one that adds the dense mode to KASAN (will post later).
- Removed exporting kasan_poison() and used a wrapper instead in
kasan_init_64.c
- Prepended series with 4 patches from the risc-v series and applied
review comments to the first patch as the rest already are reviewed.
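The v4 note above distinguishes two ways of computing kasan_mem_to_shadow(): an arithmetic (sign-preserving) shift, kept for KASAN in general, and the logical shift that x86 goes back to. The toy C sketch below shows how they differ for a kernel pointer carrying a 0xFF top-byte tag, arm64-style. SHADOW_OFFSET is an arbitrary illustrative value, not a real CONFIG_KASAN_SHADOW_OFFSET, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative values only; real ones come from the kernel config. */
#define SHADOW_SCALE_SHIFT 4                       /* SW_TAGS: 16-byte granules */
#define SHADOW_OFFSET      0xefff800000000000ULL   /* hypothetical offset */

static_assert(SHADOW_SCALE_SHIFT > 0 && SHADOW_SCALE_SHIFT < 64,
	      "scale shift must be a valid 64-bit shift count");

/* Logical shift: zeros are shifted in from the top. */
static uint64_t shadow_logical(uint64_t addr)
{
	return (addr >> SHADOW_SCALE_SHIFT) + SHADOW_OFFSET;
}

/*
 * Arithmetic shift: treat addr as a signed 64-bit value, so addresses with
 * bit 63 set shift in ones. Done with unsigned operations to avoid relying
 * on implementation-defined right shifts of negative signed values.
 */
static uint64_t shadow_arithmetic(uint64_t addr)
{
	uint64_t shifted = addr >> SHADOW_SCALE_SHIFT;

	if (addr >> 63)
		shifted = ~(~addr >> SHADOW_SCALE_SHIFT);
	/* Unsigned addition wraps modulo 2^64, like the kernel's pointer math. */
	return shifted + SHADOW_OFFSET;
}
```

With the arithmetic shift, every address with bit 63 set maps just below SHADOW_OFFSET (0xff00000000000000 maps to 0xefef800000000000, and ~0 maps to SHADOW_OFFSET - 1), which matches the scripts/gdb/linux/mm.py change earlier in this thread that sets KASAN_SHADOW_END to KASAN_SHADOW_OFFSET for SW_TAGS. Addresses with bit 63 clear map identically under both schemes.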
Maciej Wieczor-Retman (13):
kasan: Fix inline mode for x86 tag-based mode
x86/kasan: Add arch specific kasan functions
x86/mm: Reset pointer tag in x - __START_KERNEL_map instances
kasan: arm64: x86: Make page_to_virt() KASAN aware
mm/execmem: Untag addresses in EXECMEM_ROX related pointer arithmetic
x86/mm: Use physical address comparisons in fill_p*d/pte
x86/kasan: Initialize KASAN raw shadow memory
x86/mm: Reset tags in a canonical address helper call
x86/mm: Initialize LAM_SUP
x86: Increase minimal SLAB alignment for KASAN
x86/kasan: Use a logical bit shift for kasan_mem_to_shadow
x86/kasan: Make software tag-based kasan available
docs: Update KASAN and x86 memory map documentations
Samuel Holland (2):
kasan: sw_tags: Use arithmetic shift for shadow computation
kasan: arm64: x86: Make special tags arch specific
Documentation/arch/arm64/kasan-offsets.sh | 8 ++-
Documentation/arch/x86/x86_64/mm.rst | 21 +++++-
Documentation/dev-tools/kasan.rst | 79 ++++++++++++++++-------
MAINTAINERS | 2 +-
arch/arm64/Kconfig | 10 +--
arch/arm64/include/asm/kasan-tags.h | 14 ++++
arch/arm64/include/asm/kasan.h | 2 -
arch/arm64/include/asm/memory.h | 19 ++++--
arch/arm64/include/asm/uaccess.h | 1 +
arch/arm64/mm/kasan_init.c | 7 +-
arch/x86/Kconfig | 9 +++
arch/x86/boot/compressed/misc.h | 1 +
arch/x86/include/asm/cache.h | 4 ++
arch/x86/include/asm/kasan-tags.h | 9 +++
arch/x86/include/asm/kasan.h | 62 +++++++++++++++++-
arch/x86/include/asm/page_64.h | 11 +++-
arch/x86/kernel/head_64.S | 3 +
arch/x86/mm/init.c | 3 +
arch/x86/mm/init_64.c | 11 ++--
arch/x86/mm/kasan_init_64.c | 24 ++++++-
arch/x86/mm/maccess.c | 1 +
arch/x86/mm/physaddr.c | 4 +-
include/linux/kasan-tags.h | 21 ++++--
include/linux/kasan.h | 23 +++++--
include/linux/mm.h | 11 ++--
include/linux/mmzone.h | 2 +-
include/linux/page-flags-layout.h | 9 +--
lib/Kconfig.kasan | 4 +-
mm/execmem.c | 9 ++-
mm/kasan/report.c | 16 +++--
mm/vmalloc.c | 7 +-
scripts/Makefile.kasan | 3 +
scripts/gdb/linux/kasan.py | 5 +-
scripts/gdb/linux/mm.py | 5 +-
34 files changed, 331 insertions(+), 89 deletions(-)
create mode 100644 arch/arm64/include/asm/kasan-tags.h
create mode 100644 arch/x86/include/asm/kasan-tags.h
--
2.53.0
^ permalink raw reply
* RE: [PATCH v5 2/4] iio: adc: ad4691: add initial driver for AD4691 family
From: Sabau, Radu bogdan @ 2026-03-30 14:20 UTC (permalink / raw)
To: Andy Shevchenko
Cc: Lars-Peter Clausen, Hennerich, Michael, Jonathan Cameron,
David Lechner, Sa, Nuno, Andy Shevchenko, Rob Herring,
Krzysztof Kozlowski, Conor Dooley, Uwe Kleine-König,
Liam Girdwood, Mark Brown, Linus Walleij, Bartosz Golaszewski,
Philipp Zabel, Jonathan Corbet, Shuah Khan,
linux-iio@vger.kernel.org, devicetree@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-pwm@vger.kernel.org,
linux-gpio@vger.kernel.org, linux-doc@vger.kernel.org
In-Reply-To: <acZrthJYQX-h_9p5@ashevche-desk.local>
> -----Original Message-----
> From: Andy Shevchenko <andriy.shevchenko@intel.com>
> Sent: Friday, March 27, 2026 1:36 PM
> To: Sabau, Radu bogdan <Radu.Sabau@analog.com>
...
>
> > +#include <linux/bitfield.h>
> > +#include <linux/bitops.h>
> > +#include <linux/cleanup.h>
> > +#include <linux/delay.h>
> > +#include <linux/device.h>
>
> Hmm... Is it used? Or perhaps you need only
> dev_printk.h
> device/devres.h
> ?
Hi Andy,
I have checked this out, and it seems device.h doesn't need to be
included explicitly: spi.h already includes device.h, and since this is
a SPI driver, that dependency isn't going away. Will drop it!
Thanks,
Radu
^ permalink raw reply
* Re: [PATCH 1/2] mm/memory-failure: add panic_on_unrecoverable_memory_failure sysctl
From: Breno Leitao @ 2026-03-30 13:45 UTC (permalink / raw)
To: Miaohe Lin
Cc: linux-mm, linux-kernel, linux-doc, kernel-team, Naoya Horiguchi,
Andrew Morton, Jonathan Corbet, Shuah Khan
In-Reply-To: <a88d62ee-530c-1a6e-c05f-de324f940b8f@huawei.com>
On Mon, Mar 30, 2026 at 03:55:00PM +0800, Miaohe Lin wrote:
> On 2026/3/23 23:29, Breno Leitao wrote:
>
> > @@ -1298,6 +1309,10 @@ static int action_result(unsigned long pfn, enum mf_action_page_type type,
> > pr_err("%#lx: recovery action for %s: %s\n",
> > pfn, action_page_types[type], action_name[result]);
> >
> > + if (sysctl_panic_on_unrecoverable_mf &&
> > + type == MF_MSG_GET_HWPOISON && result == MF_IGNORED)
> > + panic("Memory failure: %#lx: unrecoverable page", pfn);
>
> MF_MSG_GET_HWPOISON contains some other scenarios. For example, an isolated folio will
> make get_hwpoison_page return -EIO so we will see MF_MSG_GET_HWPOISON and MF_IGNORED in
> action_result. But that's recoverable if folio is used by userspace thus panic will be
> unacceptable.
> Will it better to check type against MF_MSG_KERNEL_HIGH_ORDER?
Yes, I was discussing this with akpm, and maybe the better
approach would be to panic for types MF_MSG_KERNEL_HIGH_ORDER and MF_MSG_KERNEL.
In both cases, it seems the page cannot be migrated. What do
you think about a change like this:
@@ -1298,6 +1309,10 @@ static int action_result(unsigned long pfn, enum mf_action_page_type type,
pr_err("%#lx: recovery action for %s: %s\n",
pfn, action_page_types[type], action_name[result]);
+ if (sysctl_panic_on_unrecoverable_mf && result == MF_IGNORED &&
+ (type == MF_MSG_KERNEL || type == MF_MSG_KERNEL_HIGH_ORDER))
+ panic("Memory failure: %#lx: unrecoverable page", pfn);
+
return (result == MF_RECOVERED || result == MF_DELAYED) ? 0 : -EBUSY;
}
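The condition proposed above can be pulled out into a small predicate and checked against the case Miaohe raised (MF_MSG_GET_HWPOISON with MF_IGNORED for an isolated folio). The sketch below is a user-space stand-in: the enumerator names match the kernel's, but the enums are local copies with arbitrary values, and should_panic() is a hypothetical helper, not a function in mm/memory-failure.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-ins; only the names mirror the kernel's enums. */
enum mf_action_page_type {
	MF_MSG_KERNEL,
	MF_MSG_KERNEL_HIGH_ORDER,
	MF_MSG_GET_HWPOISON,
};

enum mf_result {
	MF_IGNORED,
	MF_FAILED,
	MF_DELAYED,
	MF_RECOVERED,
};

/* Panic only for ignored failures on kernel pages that cannot be migrated. */
static bool should_panic(bool panic_on_unrecoverable,
			 enum mf_action_page_type type, enum mf_result result)
{
	return panic_on_unrecoverable && result == MF_IGNORED &&
	       (type == MF_MSG_KERNEL || type == MF_MSG_KERNEL_HIGH_ORDER);
}
```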
^ permalink raw reply
* Re: [PATCH v6 3/4] RISC-V: KVM: Detect and expose supported HGATP G-stage modes
From: Guo Ren @ 2026-03-30 13:21 UTC (permalink / raw)
To: fangyu.yu
Cc: pbonzini, corbet, anup, atish.patra, pjw, palmer, aou, alex,
skhan, radim.krcmar, andrew.jones, linux-doc, kvm, kvm-riscv,
linux-riscv, linux-kernel
In-Reply-To: <20260330122601.22140-4-fangyu.yu@linux.alibaba.com>
On Mon, Mar 30, 2026 at 8:26 PM <fangyu.yu@linux.alibaba.com> wrote:
>
> From: Fangyu Yu <fangyu.yu@linux.alibaba.com>
>
> Extend kvm_riscv_gstage_mode_detect() to probe all HGATP.MODE values
> supported by the host and record them in a bitmask. Keep tracking the
> maximum supported G-stage page table level for existing internal users.
>
> Also provide lightweight helpers to retrieve the supported-mode bitmask
> and validate a requested HGATP.MODE against it.
>
> Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
> Reviewed-by: Andrew Jones <andrew.jones@oss.qualcomm.com>
> ---
> arch/riscv/include/asm/kvm_gstage.h | 11 ++++++++
> arch/riscv/kvm/gstage.c | 43 +++++++++++++++--------------
> 2 files changed, 34 insertions(+), 20 deletions(-)
>
> diff --git a/arch/riscv/include/asm/kvm_gstage.h b/arch/riscv/include/asm/kvm_gstage.h
> index 70d9d483365e..bbf8f45c6563 100644
> --- a/arch/riscv/include/asm/kvm_gstage.h
> +++ b/arch/riscv/include/asm/kvm_gstage.h
> @@ -31,6 +31,7 @@ struct kvm_gstage_mapping {
> #endif
>
> extern unsigned long kvm_riscv_gstage_max_pgd_levels;
> +extern u32 kvm_riscv_gstage_supported_mode_mask;
>
> #define kvm_riscv_gstage_pgd_xbits 2
> #define kvm_riscv_gstage_pgd_size (1UL << (HGATP_PAGE_SHIFT + kvm_riscv_gstage_pgd_xbits))
> @@ -102,4 +103,14 @@ static inline void kvm_riscv_gstage_init(struct kvm_gstage *gstage, struct kvm *
> gstage->pgd_levels = kvm->arch.pgd_levels;
> }
>
> +static inline u32 kvm_riscv_get_hgatp_mode_mask(void)
> +{
> + return kvm_riscv_gstage_supported_mode_mask;
> +}
> +
> +static inline bool kvm_riscv_hgatp_mode_is_valid(unsigned long mode)
> +{
> + return kvm_riscv_gstage_supported_mode_mask & BIT(mode);
> +}
> +
> #endif
> diff --git a/arch/riscv/kvm/gstage.c b/arch/riscv/kvm/gstage.c
> index 7c4c34bc191b..459041255c14 100644
> --- a/arch/riscv/kvm/gstage.c
> +++ b/arch/riscv/kvm/gstage.c
> @@ -16,6 +16,8 @@ unsigned long kvm_riscv_gstage_max_pgd_levels __ro_after_init = 3;
> #else
> unsigned long kvm_riscv_gstage_max_pgd_levels __ro_after_init = 2;
> #endif
> +/* Bitmask of supported HGATP.MODE encodings (BIT(HGATP_MODE_*)). */
> +u32 kvm_riscv_gstage_supported_mode_mask __ro_after_init;
>
> #define gstage_pte_leaf(__ptep) \
> (pte_val(*(__ptep)) & (_PAGE_READ | _PAGE_WRITE | _PAGE_EXEC))
> @@ -315,42 +317,43 @@ void kvm_riscv_gstage_wp_range(struct kvm_gstage *gstage, gpa_t start, gpa_t end
> }
> }
>
> +static bool __init kvm_riscv_hgatp_mode_supported(unsigned long mode)
> +{
> + csr_write(CSR_HGATP, mode << HGATP_MODE_SHIFT);
> + return ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) == mode);
> +}
> +
> void __init kvm_riscv_gstage_mode_detect(void)
> {
> + kvm_riscv_gstage_supported_mode_mask = 0;
> + kvm_riscv_gstage_max_pgd_levels = 0;
> +
> #ifdef CONFIG_64BIT
> - /* Try Sv57x4 G-stage mode */
> - csr_write(CSR_HGATP, HGATP_MODE_SV57X4 << HGATP_MODE_SHIFT);
> - if ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) == HGATP_MODE_SV57X4) {
> - kvm_riscv_gstage_max_pgd_levels = 5;
> - goto done;
> + /* Try Sv39x4 G-stage mode */
> + if (kvm_riscv_hgatp_mode_supported(HGATP_MODE_SV39X4)) {
> + kvm_riscv_gstage_supported_mode_mask |= BIT(HGATP_MODE_SV39X4);
> + kvm_riscv_gstage_max_pgd_levels = 3;
> }
>
> /* Try Sv48x4 G-stage mode */
> - csr_write(CSR_HGATP, HGATP_MODE_SV48X4 << HGATP_MODE_SHIFT);
> - if ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) == HGATP_MODE_SV48X4) {
> + if (kvm_riscv_hgatp_mode_supported(HGATP_MODE_SV48X4)) {
> + kvm_riscv_gstage_supported_mode_mask |= BIT(HGATP_MODE_SV48X4);
> kvm_riscv_gstage_max_pgd_levels = 4;
> - goto done;
> }
>
> - /* Try Sv39x4 G-stage mode */
> - csr_write(CSR_HGATP, HGATP_MODE_SV39X4 << HGATP_MODE_SHIFT);
> - if ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) == HGATP_MODE_SV39X4) {
> - kvm_riscv_gstage_max_pgd_levels = 3;
> - goto done;
> + /* Try Sv57x4 G-stage mode */
> + if (kvm_riscv_hgatp_mode_supported(HGATP_MODE_SV57X4)) {
> + kvm_riscv_gstage_supported_mode_mask |= BIT(HGATP_MODE_SV57X4);
> + kvm_riscv_gstage_max_pgd_levels = 5;
> }
> #else /* CONFIG_32BIT */
> /* Try Sv32x4 G-stage mode */
> - csr_write(CSR_HGATP, HGATP_MODE_SV32X4 << HGATP_MODE_SHIFT);
> - if ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) == HGATP_MODE_SV32X4) {
> + if (kvm_riscv_hgatp_mode_supported(HGATP_MODE_SV32X4)) {
> + kvm_riscv_gstage_supported_mode_mask |= BIT(HGATP_MODE_SV32X4);
> kvm_riscv_gstage_max_pgd_levels = 2;
> - goto done;
> }
> #endif
>
> - /* KVM depends on !HGATP_MODE_OFF */
> - kvm_riscv_gstage_max_pgd_levels = 0;
> -
> -done:
> csr_write(CSR_HGATP, 0);
> kvm_riscv_local_hfence_gvma_all();
> }
> --
> 2.50.1
>
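Because the probe now runs from Sv39x4 up to Sv57x4 without the early goto, the bitmask collects every supported mode and kvm_riscv_gstage_max_pgd_levels ends up at the highest one. A user-space sketch of that loop is below. It replaces the CSR with a hypothetical struct fake_hgatp and assumes, as the privileged specification states for satp, that writing an unsupported MODE leaves the register unchanged; the mode encodings are the spec's RV64 values (Sv39x4 = 8, Sv48x4 = 9, Sv57x4 = 10).

```c
#include <assert.h>
#include <stdint.h>

#define HGATP_MODE_SV39X4 8
#define HGATP_MODE_SV48X4 9
#define HGATP_MODE_SV57X4 10
#define BIT(n)            (1u << (n))

/* Hypothetical model of hgatp: hw_modes is what the "hardware" implements. */
struct fake_hgatp {
	uint32_t hw_modes;
	unsigned long mode;
};

/* Write the mode, read it back, and compare, like the patch's helper. */
static int mode_supported(struct fake_hgatp *csr, unsigned long mode)
{
	if (csr->hw_modes & BIT(mode))
		csr->mode = mode;   /* unsupported writes have no effect */
	return csr->mode == mode;
}

/* Probe in ascending order so the last match sets the maximum level count. */
static uint32_t detect_modes(struct fake_hgatp *csr, unsigned long *max_levels)
{
	static const struct { unsigned long mode, levels; } probes[] = {
		{ HGATP_MODE_SV39X4, 3 },
		{ HGATP_MODE_SV48X4, 4 },
		{ HGATP_MODE_SV57X4, 5 },
	};
	uint32_t mask = 0;

	*max_levels = 0;
	for (unsigned int i = 0; i < sizeof(probes) / sizeof(probes[0]); i++) {
		if (mode_supported(csr, probes[i].mode)) {
			mask |= BIT(probes[i].mode);
			*max_levels = probes[i].levels;
		}
	}
	csr->mode = 0;  /* mirrors csr_write(CSR_HGATP, 0) at the end */
	return mask;
}
```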
Reviewed-by: Guo Ren <guoren@kernel.org>
--
Best Regards
Guo Ren
^ permalink raw reply
* Re: [PATCH v6 4/4] RISC-V: KVM: add KVM_CAP_RISCV_SET_HGATP_MODE
From: Guo Ren @ 2026-03-30 13:20 UTC (permalink / raw)
To: fangyu.yu
Cc: pbonzini, corbet, anup, atish.patra, pjw, palmer, aou, alex,
skhan, radim.krcmar, andrew.jones, linux-doc, kvm, kvm-riscv,
linux-riscv, linux-kernel
In-Reply-To: <20260330122601.22140-5-fangyu.yu@linux.alibaba.com>
On Mon, Mar 30, 2026 at 8:26 PM <fangyu.yu@linux.alibaba.com> wrote:
>
> From: Fangyu Yu <fangyu.yu@linux.alibaba.com>
>
> Add a VM capability that allows userspace to select the G-stage page table
> format by setting HGATP.MODE on a per-VM basis.
>
> Userspace enables the capability via KVM_ENABLE_CAP, passing the requested
> HGATP.MODE in args[0]. The request is rejected with -EINVAL if the mode is
> not supported by the host, and with -EBUSY if the VM has already been
> committed (e.g. vCPUs have been created or any memslot is populated).
>
> KVM_CHECK_EXTENSION(KVM_CAP_RISCV_SET_HGATP_MODE) returns a bitmask of the
> HGATP.MODE formats supported by the host.
>
> Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
> Reviewed-by: Andrew Jones <andrew.jones@oss.qualcomm.com>
> ---
> Documentation/virt/kvm/api.rst | 27 +++++++++++++++++++++++++++
> arch/riscv/kvm/vm.c | 18 ++++++++++++++++--
> include/uapi/linux/kvm.h | 1 +
> 3 files changed, 44 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 032516783e96..9d7f6958fa81 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -8902,6 +8902,33 @@ helpful if user space wants to emulate instructions which are not
> This capability can be enabled dynamically even if VCPUs were already
> created and are running.
>
> +7.47 KVM_CAP_RISCV_SET_HGATP_MODE
> +---------------------------------
> +
> +:Architectures: riscv
> +:Type: VM
> +:Parameters: args[0] contains the requested HGATP mode
> +:Returns:
> + - 0 on success.
> + - -EINVAL if args[0] is outside the range of HGATP modes supported by the
> + hardware.
> + - -EBUSY if vCPUs have already been created for the VM, or if the VM has any
> + non-empty memslots.
> +
> +This capability allows userspace to explicitly select the HGATP mode for
> +the VM. The selected mode must be supported by both KVM and hardware. This
> +capability must be enabled before creating any vCPUs or memslots.
> +
> +If this capability is not enabled, KVM will select the default HGATP mode
> +automatically. The default is the highest HGATP.MODE value supported by
> +hardware.
> +
> +``KVM_CHECK_EXTENSION(KVM_CAP_RISCV_SET_HGATP_MODE)`` returns a bitmask of
> +HGATP.MODE values supported by the host. A return value of 0 indicates that
> +the capability is not supported. The supported-mode bitmask uses HGATP.MODE
> +encodings as defined by the RISC-V privileged specification; for example,
> +Sv39x4 corresponds to HGATP.MODE=8, so userspace should test bitmask & BIT(8).
> +
> 8. Other capabilities.
> ======================
>
> diff --git a/arch/riscv/kvm/vm.c b/arch/riscv/kvm/vm.c
> index 4d82a886102c..5e82a3ad3ad0 100644
> --- a/arch/riscv/kvm/vm.c
> +++ b/arch/riscv/kvm/vm.c
> @@ -201,6 +201,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> case KVM_CAP_VM_GPA_BITS:
> r = kvm_riscv_gstage_gpa_bits(kvm->arch.pgd_levels);
> break;
> + case KVM_CAP_RISCV_SET_HGATP_MODE:
> + r = kvm_riscv_get_hgatp_mode_mask();
> + break;
> default:
> r = 0;
> break;
> @@ -211,12 +214,23 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>
> int kvm_vm_ioctl_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap)
> {
> + if (cap->flags)
> + return -EINVAL;
> +
> switch (cap->cap) {
> case KVM_CAP_RISCV_MP_STATE_RESET:
> - if (cap->flags)
> - return -EINVAL;
> kvm->arch.mp_state_reset = true;
> return 0;
> + case KVM_CAP_RISCV_SET_HGATP_MODE:
> + if (!kvm_riscv_hgatp_mode_is_valid(cap->args[0]))
> + return -EINVAL;
> +
> + if (kvm->created_vcpus || !kvm_are_all_memslots_empty(kvm))
> + return -EBUSY;
> +#ifdef CONFIG_64BIT
> + kvm->arch.pgd_levels = 3 + cap->args[0] - HGATP_MODE_SV39X4;
> +#endif
> + return 0;
> default:
> return -EINVAL;
> }
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 80364d4dbebb..a74a80fd4046 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -989,6 +989,7 @@ struct kvm_enable_cap {
> #define KVM_CAP_ARM_SEA_TO_USER 245
> #define KVM_CAP_S390_USER_OPEREXEC 246
> #define KVM_CAP_S390_KEYOP 247
> +#define KVM_CAP_RISCV_SET_HGATP_MODE 248
>
> struct kvm_irq_routing_irqchip {
> __u32 irqchip;
> --
> 2.50.1
>
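On RV64 the enable path maps the requested mode to a level count with 3 + mode - HGATP_MODE_SV39X4, and kvm_riscv_gstage_gpa_bits() (from patch 1/4) turns that into a guest physical address width. The sketch below reproduces that arithmetic in user space; the helper names are hypothetical, and the RV64 index width of 9 bits is an assumption based on the 10-bit value the series uses for 32-bit. Unlike the patch's helper, mode_is_valid() bounds-checks mode before shifting, since args[0] comes from userspace and a shift count at or beyond the width of the type is undefined behavior in C.

```c
#include <assert.h>
#include <stdbool.h>

#define HGATP_MODE_SV39X4 8
#define HGATP_PAGE_SHIFT  12
#define PGD_XBITS         2   /* kvm_riscv_gstage_pgd_xbits */
#define INDEX_BITS_RV64   9   /* assumed kvm_riscv_gstage_index_bits on RV64 */

/* Is mode present in the bitmask returned by KVM_CHECK_EXTENSION? */
static bool mode_is_valid(unsigned int mask, unsigned long mode)
{
	return mode < 32 && (mask & (1u << mode));
}

/* RV64 only: Sv39x4 -> 3 levels, Sv48x4 -> 4, Sv57x4 -> 5. */
static unsigned long mode_to_pgd_levels(unsigned long mode)
{
	assert(mode >= HGATP_MODE_SV39X4);
	return 3 + mode - HGATP_MODE_SV39X4;
}

/* Same formula as kvm_riscv_gstage_gpa_bits(). */
static unsigned long gpa_bits(unsigned long pgd_levels)
{
	return HGATP_PAGE_SHIFT + pgd_levels * INDEX_BITS_RV64 + PGD_XBITS;
}
```

This yields 41, 50 and 59 bits for Sv39x4, Sv48x4 and Sv57x4, i.e. the x4 modes add two bits to the corresponding Sv39/Sv48/Sv57 width.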
Reviewed-by: Guo Ren <guoren@kernel.org>
--
Best Regards
Guo Ren
^ permalink raw reply
* Re: [PATCH v6 1/4] RISC-V: KVM: Support runtime configuration for per-VM's HGATP mode
From: Guo Ren @ 2026-03-30 13:20 UTC (permalink / raw)
To: fangyu.yu
Cc: pbonzini, corbet, anup, atish.patra, pjw, palmer, aou, alex,
skhan, radim.krcmar, andrew.jones, linux-doc, kvm, kvm-riscv,
linux-riscv, linux-kernel
In-Reply-To: <20260330122601.22140-2-fangyu.yu@linux.alibaba.com>
On Mon, Mar 30, 2026 at 8:26 PM <fangyu.yu@linux.alibaba.com> wrote:
>
> From: Fangyu Yu <fangyu.yu@linux.alibaba.com>
>
> Introduce one per-VM architecture-specific field to support runtime
> configuration of the G-stage page table format:
>
> - kvm->arch.pgd_levels: the corresponding number of page table levels
> for the selected mode.
>
> This field replaces the previous global variables
> kvm_riscv_gstage_mode and kvm_riscv_gstage_pgd_levels, enabling different
> virtual machines to independently select their G-stage page table format
> instead of being forced to share the maximum mode detected by the kernel
> at boot time.
>
> Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
> Reviewed-by: Andrew Jones <andrew.jones@oss.qualcomm.com>
> ---
> arch/riscv/include/asm/kvm_gstage.h | 37 ++++++++++++----
> arch/riscv/include/asm/kvm_host.h | 1 +
> arch/riscv/kvm/gstage.c | 65 ++++++++++++++---------------
> arch/riscv/kvm/main.c | 12 +++---
> arch/riscv/kvm/mmu.c | 20 +++++----
> arch/riscv/kvm/vm.c | 2 +-
> arch/riscv/kvm/vmid.c | 3 +-
> 7 files changed, 83 insertions(+), 57 deletions(-)
>
> diff --git a/arch/riscv/include/asm/kvm_gstage.h b/arch/riscv/include/asm/kvm_gstage.h
> index 595e2183173e..5aa58d1f692a 100644
> --- a/arch/riscv/include/asm/kvm_gstage.h
> +++ b/arch/riscv/include/asm/kvm_gstage.h
> @@ -29,16 +29,22 @@ struct kvm_gstage_mapping {
> #define kvm_riscv_gstage_index_bits 10
> #endif
>
> -extern unsigned long kvm_riscv_gstage_mode;
> -extern unsigned long kvm_riscv_gstage_pgd_levels;
> +extern unsigned long kvm_riscv_gstage_max_pgd_levels;
>
> #define kvm_riscv_gstage_pgd_xbits 2
> #define kvm_riscv_gstage_pgd_size (1UL << (HGATP_PAGE_SHIFT + kvm_riscv_gstage_pgd_xbits))
> -#define kvm_riscv_gstage_gpa_bits (HGATP_PAGE_SHIFT + \
> - (kvm_riscv_gstage_pgd_levels * \
> - kvm_riscv_gstage_index_bits) + \
> - kvm_riscv_gstage_pgd_xbits)
> -#define kvm_riscv_gstage_gpa_size ((gpa_t)(1ULL << kvm_riscv_gstage_gpa_bits))
> +
> +static inline unsigned long kvm_riscv_gstage_gpa_bits(unsigned long pgd_levels)
> +{
> + return (HGATP_PAGE_SHIFT +
> + pgd_levels * kvm_riscv_gstage_index_bits +
> + kvm_riscv_gstage_pgd_xbits);
> +}
> +
> +static inline gpa_t kvm_riscv_gstage_gpa_size(unsigned long pgd_levels)
> +{
> + return BIT_ULL(kvm_riscv_gstage_gpa_bits(pgd_levels));
> +}
>
> bool kvm_riscv_gstage_get_leaf(struct kvm_gstage *gstage, gpa_t addr,
> pte_t **ptepp, u32 *ptep_level);
> @@ -69,4 +75,21 @@ void kvm_riscv_gstage_wp_range(struct kvm_gstage *gstage, gpa_t start, gpa_t end
>
> void kvm_riscv_gstage_mode_detect(void);
>
> +static inline unsigned long kvm_riscv_gstage_mode(unsigned long pgd_levels)
> +{
> + switch (pgd_levels) {
> + case 2:
> + return HGATP_MODE_SV32X4;
> + case 3:
> + return HGATP_MODE_SV39X4;
> + case 4:
> + return HGATP_MODE_SV48X4;
> + case 5:
> + return HGATP_MODE_SV57X4;
> + default:
> + WARN_ON_ONCE(1);
> + return HGATP_MODE_OFF;
> + }
> +}
> +
> #endif
> diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h
> index 24585304c02b..478f699e9dec 100644
> --- a/arch/riscv/include/asm/kvm_host.h
> +++ b/arch/riscv/include/asm/kvm_host.h
> @@ -94,6 +94,7 @@ struct kvm_arch {
> /* G-stage page table */
> pgd_t *pgd;
> phys_addr_t pgd_phys;
> + unsigned long pgd_levels;
>
> /* Guest Timer */
> struct kvm_guest_timer timer;
> diff --git a/arch/riscv/kvm/gstage.c b/arch/riscv/kvm/gstage.c
> index b67d60d722c2..4beb9322fe76 100644
> --- a/arch/riscv/kvm/gstage.c
> +++ b/arch/riscv/kvm/gstage.c
> @@ -12,22 +12,21 @@
> #include <asm/kvm_gstage.h>
>
> #ifdef CONFIG_64BIT
> -unsigned long kvm_riscv_gstage_mode __ro_after_init = HGATP_MODE_SV39X4;
> -unsigned long kvm_riscv_gstage_pgd_levels __ro_after_init = 3;
> +unsigned long kvm_riscv_gstage_max_pgd_levels __ro_after_init = 3;
> #else
> -unsigned long kvm_riscv_gstage_mode __ro_after_init = HGATP_MODE_SV32X4;
> -unsigned long kvm_riscv_gstage_pgd_levels __ro_after_init = 2;
> +unsigned long kvm_riscv_gstage_max_pgd_levels __ro_after_init = 2;
> #endif
>
> #define gstage_pte_leaf(__ptep) \
> (pte_val(*(__ptep)) & (_PAGE_READ | _PAGE_WRITE | _PAGE_EXEC))
>
> -static inline unsigned long gstage_pte_index(gpa_t addr, u32 level)
> +static inline unsigned long gstage_pte_index(struct kvm_gstage *gstage,
> + gpa_t addr, u32 level)
> {
> unsigned long mask;
> unsigned long shift = HGATP_PAGE_SHIFT + (kvm_riscv_gstage_index_bits * level);
>
> - if (level == (kvm_riscv_gstage_pgd_levels - 1))
> + if (level == gstage->kvm->arch.pgd_levels - 1)
> mask = (PTRS_PER_PTE * (1UL << kvm_riscv_gstage_pgd_xbits)) - 1;
> else
> mask = PTRS_PER_PTE - 1;
> @@ -40,12 +39,13 @@ static inline unsigned long gstage_pte_page_vaddr(pte_t pte)
> return (unsigned long)pfn_to_virt(__page_val_to_pfn(pte_val(pte)));
> }
>
> -static int gstage_page_size_to_level(unsigned long page_size, u32 *out_level)
> +static int gstage_page_size_to_level(struct kvm_gstage *gstage, unsigned long page_size,
> + u32 *out_level)
> {
> u32 i;
> unsigned long psz = 1UL << 12;
>
> - for (i = 0; i < kvm_riscv_gstage_pgd_levels; i++) {
> + for (i = 0; i < gstage->kvm->arch.pgd_levels; i++) {
> if (page_size == (psz << (i * kvm_riscv_gstage_index_bits))) {
> *out_level = i;
> return 0;
> @@ -55,21 +55,23 @@ static int gstage_page_size_to_level(unsigned long page_size, u32 *out_level)
> return -EINVAL;
> }
>
> -static int gstage_level_to_page_order(u32 level, unsigned long *out_pgorder)
> +static int gstage_level_to_page_order(struct kvm_gstage *gstage, u32 level,
> + unsigned long *out_pgorder)
> {
> - if (kvm_riscv_gstage_pgd_levels < level)
> + if (gstage->kvm->arch.pgd_levels < level)
> return -EINVAL;
>
> *out_pgorder = 12 + (level * kvm_riscv_gstage_index_bits);
> return 0;
> }
>
> -static int gstage_level_to_page_size(u32 level, unsigned long *out_pgsize)
> +static int gstage_level_to_page_size(struct kvm_gstage *gstage, u32 level,
> + unsigned long *out_pgsize)
> {
> int rc;
> unsigned long page_order = PAGE_SHIFT;
>
> - rc = gstage_level_to_page_order(level, &page_order);
> + rc = gstage_level_to_page_order(gstage, level, &page_order);
> if (rc)
> return rc;
>
> @@ -81,11 +83,11 @@ bool kvm_riscv_gstage_get_leaf(struct kvm_gstage *gstage, gpa_t addr,
> pte_t **ptepp, u32 *ptep_level)
> {
> pte_t *ptep;
> - u32 current_level = kvm_riscv_gstage_pgd_levels - 1;
> + u32 current_level = gstage->kvm->arch.pgd_levels - 1;
>
> *ptep_level = current_level;
> ptep = (pte_t *)gstage->pgd;
> - ptep = &ptep[gstage_pte_index(addr, current_level)];
> + ptep = &ptep[gstage_pte_index(gstage, addr, current_level)];
> while (ptep && pte_val(ptep_get(ptep))) {
> if (gstage_pte_leaf(ptep)) {
> *ptep_level = current_level;
> @@ -97,7 +99,7 @@ bool kvm_riscv_gstage_get_leaf(struct kvm_gstage *gstage, gpa_t addr,
> current_level--;
> *ptep_level = current_level;
> ptep = (pte_t *)gstage_pte_page_vaddr(ptep_get(ptep));
> - ptep = &ptep[gstage_pte_index(addr, current_level)];
> + ptep = &ptep[gstage_pte_index(gstage, addr, current_level)];
> } else {
> ptep = NULL;
> }
> @@ -110,7 +112,7 @@ static void gstage_tlb_flush(struct kvm_gstage *gstage, u32 level, gpa_t addr)
> {
> unsigned long order = PAGE_SHIFT;
>
> - if (gstage_level_to_page_order(level, &order))
> + if (gstage_level_to_page_order(gstage, level, &order))
> return;
> addr &= ~(BIT(order) - 1);
>
> @@ -125,9 +127,9 @@ int kvm_riscv_gstage_set_pte(struct kvm_gstage *gstage,
> struct kvm_mmu_memory_cache *pcache,
> const struct kvm_gstage_mapping *map)
> {
> - u32 current_level = kvm_riscv_gstage_pgd_levels - 1;
> + u32 current_level = gstage->kvm->arch.pgd_levels - 1;
> pte_t *next_ptep = (pte_t *)gstage->pgd;
> - pte_t *ptep = &next_ptep[gstage_pte_index(map->addr, current_level)];
> + pte_t *ptep = &next_ptep[gstage_pte_index(gstage, map->addr, current_level)];
>
> if (current_level < map->level)
> return -EINVAL;
> @@ -151,7 +153,7 @@ int kvm_riscv_gstage_set_pte(struct kvm_gstage *gstage,
> }
>
> current_level--;
> - ptep = &next_ptep[gstage_pte_index(map->addr, current_level)];
> + ptep = &next_ptep[gstage_pte_index(gstage, map->addr, current_level)];
> }
>
> if (pte_val(*ptep) != pte_val(map->pte)) {
> @@ -175,7 +177,7 @@ int kvm_riscv_gstage_map_page(struct kvm_gstage *gstage,
> out_map->addr = gpa;
> out_map->level = 0;
>
> - ret = gstage_page_size_to_level(page_size, &out_map->level);
> + ret = gstage_page_size_to_level(gstage, page_size, &out_map->level);
> if (ret)
> return ret;
>
> @@ -217,7 +219,7 @@ void kvm_riscv_gstage_op_pte(struct kvm_gstage *gstage, gpa_t addr,
> u32 next_ptep_level;
> unsigned long next_page_size, page_size;
>
> - ret = gstage_level_to_page_size(ptep_level, &page_size);
> + ret = gstage_level_to_page_size(gstage, ptep_level, &page_size);
> if (ret)
> return;
>
> @@ -229,7 +231,7 @@ void kvm_riscv_gstage_op_pte(struct kvm_gstage *gstage, gpa_t addr,
> if (ptep_level && !gstage_pte_leaf(ptep)) {
> next_ptep = (pte_t *)gstage_pte_page_vaddr(ptep_get(ptep));
> next_ptep_level = ptep_level - 1;
> - ret = gstage_level_to_page_size(next_ptep_level, &next_page_size);
> + ret = gstage_level_to_page_size(gstage, next_ptep_level, &next_page_size);
> if (ret)
> return;
>
> @@ -263,7 +265,7 @@ void kvm_riscv_gstage_unmap_range(struct kvm_gstage *gstage,
>
> while (addr < end) {
> found_leaf = kvm_riscv_gstage_get_leaf(gstage, addr, &ptep, &ptep_level);
> - ret = gstage_level_to_page_size(ptep_level, &page_size);
> + ret = gstage_level_to_page_size(gstage, ptep_level, &page_size);
> if (ret)
> break;
>
> @@ -297,7 +299,7 @@ void kvm_riscv_gstage_wp_range(struct kvm_gstage *gstage, gpa_t start, gpa_t end
>
> while (addr < end) {
> found_leaf = kvm_riscv_gstage_get_leaf(gstage, addr, &ptep, &ptep_level);
> - ret = gstage_level_to_page_size(ptep_level, &page_size);
> + ret = gstage_level_to_page_size(gstage, ptep_level, &page_size);
> if (ret)
> break;
>
> @@ -319,39 +321,34 @@ void __init kvm_riscv_gstage_mode_detect(void)
> /* Try Sv57x4 G-stage mode */
> csr_write(CSR_HGATP, HGATP_MODE_SV57X4 << HGATP_MODE_SHIFT);
> if ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) == HGATP_MODE_SV57X4) {
> - kvm_riscv_gstage_mode = HGATP_MODE_SV57X4;
> - kvm_riscv_gstage_pgd_levels = 5;
> + kvm_riscv_gstage_max_pgd_levels = 5;
> goto done;
> }
>
> /* Try Sv48x4 G-stage mode */
> csr_write(CSR_HGATP, HGATP_MODE_SV48X4 << HGATP_MODE_SHIFT);
> if ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) == HGATP_MODE_SV48X4) {
> - kvm_riscv_gstage_mode = HGATP_MODE_SV48X4;
> - kvm_riscv_gstage_pgd_levels = 4;
> + kvm_riscv_gstage_max_pgd_levels = 4;
> goto done;
> }
>
> /* Try Sv39x4 G-stage mode */
> csr_write(CSR_HGATP, HGATP_MODE_SV39X4 << HGATP_MODE_SHIFT);
> if ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) == HGATP_MODE_SV39X4) {
> - kvm_riscv_gstage_mode = HGATP_MODE_SV39X4;
> - kvm_riscv_gstage_pgd_levels = 3;
> + kvm_riscv_gstage_max_pgd_levels = 3;
> goto done;
> }
> #else /* CONFIG_32BIT */
> /* Try Sv32x4 G-stage mode */
> csr_write(CSR_HGATP, HGATP_MODE_SV32X4 << HGATP_MODE_SHIFT);
> if ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) == HGATP_MODE_SV32X4) {
> - kvm_riscv_gstage_mode = HGATP_MODE_SV32X4;
> - kvm_riscv_gstage_pgd_levels = 2;
> + kvm_riscv_gstage_max_pgd_levels = 2;
> goto done;
> }
> #endif
>
> /* KVM depends on !HGATP_MODE_OFF */
> - kvm_riscv_gstage_mode = HGATP_MODE_OFF;
> - kvm_riscv_gstage_pgd_levels = 0;
> + kvm_riscv_gstage_max_pgd_levels = 0;
>
> done:
> csr_write(CSR_HGATP, 0);
> diff --git a/arch/riscv/kvm/main.c b/arch/riscv/kvm/main.c
> index 0f3fe3986fc0..90ee0a032b9a 100644
> --- a/arch/riscv/kvm/main.c
> +++ b/arch/riscv/kvm/main.c
> @@ -105,17 +105,17 @@ static int __init riscv_kvm_init(void)
> return rc;
>
> kvm_riscv_gstage_mode_detect();
> - switch (kvm_riscv_gstage_mode) {
> - case HGATP_MODE_SV32X4:
> + switch (kvm_riscv_gstage_max_pgd_levels) {
> + case 2:
> str = "Sv32x4";
> break;
> - case HGATP_MODE_SV39X4:
> + case 3:
> str = "Sv39x4";
> break;
> - case HGATP_MODE_SV48X4:
> + case 4:
> str = "Sv48x4";
> break;
> - case HGATP_MODE_SV57X4:
> + case 5:
> str = "Sv57x4";
> break;
> default:
> @@ -164,7 +164,7 @@ static int __init riscv_kvm_init(void)
> (rc) ? slist : "no features");
> }
>
> - kvm_info("using %s G-stage page table format\n", str);
> + kvm_info("highest G-stage page table mode is %s\n", str);
>
> kvm_info("VMID %ld bits available\n", kvm_riscv_gstage_vmid_bits());
>
> diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
> index 088d33ba90ed..fbcdd75cb9af 100644
> --- a/arch/riscv/kvm/mmu.c
> +++ b/arch/riscv/kvm/mmu.c
> @@ -67,7 +67,7 @@ int kvm_riscv_mmu_ioremap(struct kvm *kvm, gpa_t gpa, phys_addr_t hpa,
> if (!writable)
> map.pte = pte_wrprotect(map.pte);
>
> - ret = kvm_mmu_topup_memory_cache(&pcache, kvm_riscv_gstage_pgd_levels);
> + ret = kvm_mmu_topup_memory_cache(&pcache, kvm->arch.pgd_levels);
> if (ret)
> goto out;
>
> @@ -186,7 +186,7 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
> * space addressable by the KVM guest GPA space.
> */
> if ((new->base_gfn + new->npages) >=
> - (kvm_riscv_gstage_gpa_size >> PAGE_SHIFT))
> + kvm_riscv_gstage_gpa_size(kvm->arch.pgd_levels) >> PAGE_SHIFT)
> return -EFAULT;
>
> hva = new->userspace_addr;
> @@ -472,7 +472,7 @@ int kvm_riscv_mmu_map(struct kvm_vcpu *vcpu, struct kvm_memory_slot *memslot,
> memset(out_map, 0, sizeof(*out_map));
>
> /* We need minimum second+third level pages */
> - ret = kvm_mmu_topup_memory_cache(pcache, kvm_riscv_gstage_pgd_levels);
> + ret = kvm_mmu_topup_memory_cache(pcache, kvm->arch.pgd_levels);
> if (ret) {
> kvm_err("Failed to topup G-stage cache\n");
> return ret;
> @@ -575,6 +575,7 @@ int kvm_riscv_mmu_alloc_pgd(struct kvm *kvm)
> return -ENOMEM;
> kvm->arch.pgd = page_to_virt(pgd_page);
> kvm->arch.pgd_phys = page_to_phys(pgd_page);
> + kvm->arch.pgd_levels = kvm_riscv_gstage_max_pgd_levels;
>
> return 0;
> }
> @@ -590,10 +591,12 @@ void kvm_riscv_mmu_free_pgd(struct kvm *kvm)
> gstage.flags = 0;
> gstage.vmid = READ_ONCE(kvm->arch.vmid.vmid);
> gstage.pgd = kvm->arch.pgd;
> - kvm_riscv_gstage_unmap_range(&gstage, 0UL, kvm_riscv_gstage_gpa_size, false);
> + kvm_riscv_gstage_unmap_range(&gstage, 0UL,
> + kvm_riscv_gstage_gpa_size(kvm->arch.pgd_levels), false);
> pgd = READ_ONCE(kvm->arch.pgd);
> kvm->arch.pgd = NULL;
> kvm->arch.pgd_phys = 0;
> + kvm->arch.pgd_levels = 0;
> }
> spin_unlock(&kvm->mmu_lock);
>
> @@ -603,11 +606,12 @@ void kvm_riscv_mmu_free_pgd(struct kvm *kvm)
>
> void kvm_riscv_mmu_update_hgatp(struct kvm_vcpu *vcpu)
> {
> - unsigned long hgatp = kvm_riscv_gstage_mode << HGATP_MODE_SHIFT;
> - struct kvm_arch *k = &vcpu->kvm->arch;
> + struct kvm_arch *ka = &vcpu->kvm->arch;
> + unsigned long hgatp = kvm_riscv_gstage_mode(ka->pgd_levels)
> + << HGATP_MODE_SHIFT;
>
> - hgatp |= (READ_ONCE(k->vmid.vmid) << HGATP_VMID_SHIFT) & HGATP_VMID;
> - hgatp |= (k->pgd_phys >> PAGE_SHIFT) & HGATP_PPN;
> + hgatp |= (READ_ONCE(ka->vmid.vmid) << HGATP_VMID_SHIFT) & HGATP_VMID;
> + hgatp |= (ka->pgd_phys >> PAGE_SHIFT) & HGATP_PPN;
>
> ncsr_write(CSR_HGATP, hgatp);
>
> diff --git a/arch/riscv/kvm/vm.c b/arch/riscv/kvm/vm.c
> index 13c63ae1a78b..4d82a886102c 100644
> --- a/arch/riscv/kvm/vm.c
> +++ b/arch/riscv/kvm/vm.c
> @@ -199,7 +199,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> r = KVM_USER_MEM_SLOTS;
> break;
> case KVM_CAP_VM_GPA_BITS:
> - r = kvm_riscv_gstage_gpa_bits;
> + r = kvm_riscv_gstage_gpa_bits(kvm->arch.pgd_levels);
> break;
> default:
> r = 0;
> diff --git a/arch/riscv/kvm/vmid.c b/arch/riscv/kvm/vmid.c
> index cf34d448289d..c15bdb1dd8be 100644
> --- a/arch/riscv/kvm/vmid.c
> +++ b/arch/riscv/kvm/vmid.c
> @@ -26,7 +26,8 @@ static DEFINE_SPINLOCK(vmid_lock);
> void __init kvm_riscv_gstage_vmid_detect(void)
> {
> /* Figure-out number of VMID bits in HW */
> - csr_write(CSR_HGATP, (kvm_riscv_gstage_mode << HGATP_MODE_SHIFT) | HGATP_VMID);
> + csr_write(CSR_HGATP, (kvm_riscv_gstage_mode(kvm_riscv_gstage_max_pgd_levels) <<
> + HGATP_MODE_SHIFT) | HGATP_VMID);
> vmid_bits = csr_read(CSR_HGATP);
> vmid_bits = (vmid_bits & HGATP_VMID) >> HGATP_VMID_SHIFT;
> vmid_bits = fls_long(vmid_bits);
> --
> 2.50.1
>
Reviewed-by: Guo Ren <guoren@kernel.org>
--
Best Regards
Guo Ren
* Re: [PATCH v2] bootconfig: Apply early options from embedded config
From: Breno Leitao @ 2026-03-30 13:15 UTC (permalink / raw)
To: Masami Hiramatsu
Cc: Jonathan Corbet, Shuah Khan, linux-kernel, linux-trace-kernel,
linux-doc, oss, paulmck, rostedt, kernel-team
In-Reply-To: <20260327223744.f246150adc1671f7605a4f0a@kernel.org>
On Fri, Mar 27, 2026 at 10:37:44PM +0900, Masami Hiramatsu wrote:
> On Fri, 27 Mar 2026 03:06:41 -0700
> Breno Leitao <leitao@debian.org> wrote:
> > > To fix this, we need to change setup_arch() for each architecture so
> > > that it calls this bootconfig_apply_early_params().
> >
> > Could we instead integrate this into parse_early_param() itself? That
> > approach would avoid the need to modify each architecture individually.
>
> Ah, indeed.
I investigated integrating bootconfig into parse_early_param() and hit a
blocker: xbc_init() and xbc_make_cmdline() depend on memblock_alloc(), but on
most architectures (x86, arm64, arm, s390, riscv) parse_early_param() is called
from setup_arch() _before_ memblock is initialized.
So bootconfig cannot be made available by the time parse_early_param() runs.
An alternative is to replace the memblock allocations in lib/bootconfig.c with
static __initdata buffers, similar to Petr's approach in 2023:
https://lore.kernel.org/all/20231121231342.193646-3-oss@malat.biz/
However, there were concerns about the allocation size:
Petr Malat <oss@malat.biz> wrote:
> To allow handling of early options, it's necessary to eliminate allocations
> from embedded bootconfig handling
"Hm, my concern is that this can introduce some sort of overhead to parse the bootconfig."
* Re: [PATCH v14 12/12] crypto: qce - Communicate the base physical address to the dmaengine
From: Manivannan Sadhasivam @ 2026-03-30 13:08 UTC (permalink / raw)
To: Bartosz Golaszewski
Cc: Vinod Koul, Jonathan Corbet, Thara Gopinath, Herbert Xu,
David S. Miller, Udit Tiwari, Md Sadre Alam, Dmitry Baryshkov,
Stephan Gerhold, Bjorn Andersson, Peter Ujfalusi, Michal Simek,
Frank Li, dmaengine, linux-doc, linux-kernel, linux-arm-msm,
linux-crypto, linux-arm-kernel, brgl, Bartosz Golaszewski
In-Reply-To: <20260323-qcom-qce-cmd-descr-v14-12-f323af411274@oss.qualcomm.com>
On Mon, Mar 23, 2026 at 04:17:18PM +0100, Bartosz Golaszewski wrote:
> In order to communicate to the BAM DMA engine which address should be
> used as a scratchpad for dummy writes related to BAM pipe locking,
> fill out and attach the provided metadata struct to the descriptor as
> well as mark the RX channel as such using the slave config struct.
>
> Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
> ---
> drivers/crypto/qce/dma.c | 15 +++++++++++++++
> 1 file changed, 15 insertions(+)
>
> diff --git a/drivers/crypto/qce/dma.c b/drivers/crypto/qce/dma.c
> index 5c42fc7ddf01e11a6562d272ba7c90c906e0e312..635208947668667765e6accf9ef02100746c0f9a 100644
> --- a/drivers/crypto/qce/dma.c
> +++ b/drivers/crypto/qce/dma.c
> @@ -11,6 +11,7 @@
>
> #include "core.h"
> #include "dma.h"
> +#include "regs-v5.h"
>
> #define QCE_IGNORE_BUF_SZ (2 * QCE_BAM_BURST_SIZE)
> #define QCE_BAM_CMD_SGL_SIZE 128
> @@ -43,6 +44,7 @@ void qce_clear_bam_transaction(struct qce_device *qce)
>
> int qce_submit_cmd_desc(struct qce_device *qce)
> {
> + struct bam_desc_metadata meta = { .scratchpad_addr = qce->base_phys + REG_VERSION };
> struct qce_desc_info *qce_desc = qce->dma.bam_txn->desc;
> struct qce_bam_transaction *bam_txn = qce->dma.bam_txn;
> struct dma_async_tx_descriptor *dma_desc;
> @@ -64,6 +66,12 @@ int qce_submit_cmd_desc(struct qce_device *qce)
> return -ENOMEM;
> }
>
> + ret = dmaengine_desc_attach_metadata(dma_desc, &meta, 0);
> + if (ret) {
> + dma_unmap_sg(qce->dev, bam_txn->wr_sgl, bam_txn->wr_sgl_cnt, DMA_TO_DEVICE);
> + return ret;
> + }
> +
> qce_desc->dma_desc = dma_desc;
> cookie = dmaengine_submit(qce_desc->dma_desc);
>
> @@ -107,7 +115,9 @@ void qce_write_dma(struct qce_device *qce, unsigned int offset, u32 val)
> int devm_qce_dma_request(struct qce_device *qce)
> {
> struct qce_dma_data *dma = &qce->dma;
> + struct dma_slave_config cfg = { };
> struct device *dev = qce->dev;
> + int ret;
>
> dma->txchan = devm_dma_request_chan(dev, "tx");
> if (IS_ERR(dma->txchan))
> @@ -119,6 +129,11 @@ int devm_qce_dma_request(struct qce_device *qce)
> return dev_err_probe(dev, PTR_ERR(dma->rxchan),
> "Failed to get RX DMA channel\n");
>
> + cfg.direction = DMA_MEM_TO_DEV;
> + ret = dmaengine_slave_config(dma->rxchan, &cfg);
> + if (ret)
> + return ret;
> +
I don't think this part is necessary. You are already passing the metadata above,
and that should be sufficient for the BAM DMA driver to get the scratchpad
address. If a client driver calls dmaengine_slave_config() without
dmaengine_desc_attach_metadata() and the BAM supports locking, the BAM driver
should fail. Continuing in that case would cause race conditions among the BAM
clients, which is what we are seeing right now on Qcom SDX targets, where both
the NAND driver in Linux and the modem access NAND memory over BAM.
So please drop this and just use dmaengine_desc_attach_metadata().
- Mani
--
மணிவண்ணன் சதாசிவம்
* Re: [PATCH v14 05/12] dmaengine: qcom: bam_dma: add support for BAM locking
From: Manivannan Sadhasivam @ 2026-03-30 12:54 UTC (permalink / raw)
To: Bartosz Golaszewski
Cc: Vinod Koul, Jonathan Corbet, Thara Gopinath, Herbert Xu,
David S. Miller, Udit Tiwari, Md Sadre Alam, Dmitry Baryshkov,
Stephan Gerhold, Bjorn Andersson, Peter Ujfalusi, Michal Simek,
Frank Li, dmaengine, linux-doc, linux-kernel, linux-arm-msm,
linux-crypto, linux-arm-kernel, brgl, Bartosz Golaszewski
In-Reply-To: <20260323-qcom-qce-cmd-descr-v14-5-f323af411274@oss.qualcomm.com>
On Mon, Mar 23, 2026 at 04:17:11PM +0100, Bartosz Golaszewski wrote:
> Add support for BAM pipe locking. To that end: when starting DMA on an RX
> channel - prepend the existing queue of issued descriptors with an
> additional "dummy" command descriptor with the LOCK bit set. Once the
> transaction is done (no more issued descriptors), issue one more dummy
> descriptor with the UNLOCK bit.
>
> We *must* wait until the transaction is signalled as done because we
> must not perform any writes into config registers while the engine is
> busy.
>
> The dummy writes must be issued into a scratchpad register of the client
> so provide a mechanism to communicate the right address via descriptor
> metadata.
>
> Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
Reviewed-by: Manivannan Sadhasivam <mani@kernel.org>
- Mani
> ---
> drivers/dma/qcom/bam_dma.c | 165 ++++++++++++++++++++++++++++++++++++++-
> include/linux/dma/qcom_bam_dma.h | 10 +++
> 2 files changed, 171 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/dma/qcom/bam_dma.c b/drivers/dma/qcom/bam_dma.c
> index 83491e7c2f17d8c9d12a1a055baea7e3a0a75a53..309681e798d2e44992e3d20679c3a7564ad8f29e 100644
> --- a/drivers/dma/qcom/bam_dma.c
> +++ b/drivers/dma/qcom/bam_dma.c
> @@ -28,11 +28,13 @@
> #include <linux/clk.h>
> #include <linux/device.h>
> #include <linux/dma-mapping.h>
> +#include <linux/dma/qcom_bam_dma.h>
> #include <linux/dmaengine.h>
> #include <linux/init.h>
> #include <linux/interrupt.h>
> #include <linux/io.h>
> #include <linux/kernel.h>
> +#include <linux/lockdep.h>
> #include <linux/module.h>
> #include <linux/of_address.h>
> #include <linux/of_dma.h>
> @@ -60,6 +62,8 @@ struct bam_desc_hw {
> #define DESC_FLAG_EOB BIT(13)
> #define DESC_FLAG_NWD BIT(12)
> #define DESC_FLAG_CMD BIT(11)
> +#define DESC_FLAG_LOCK BIT(10)
> +#define DESC_FLAG_UNLOCK BIT(9)
>
> struct bam_async_desc {
> struct virt_dma_desc vd;
> @@ -391,6 +395,13 @@ struct bam_chan {
> struct list_head desc_list;
>
> struct list_head node;
> +
> + /* BAM locking infrastructure */
> + phys_addr_t scratchpad_addr;
> + struct scatterlist lock_sg;
> + struct scatterlist unlock_sg;
> + struct bam_cmd_element lock_ce;
> + struct bam_cmd_element unlock_ce;
> };
>
> static inline struct bam_chan *to_bam_chan(struct dma_chan *common)
> @@ -652,6 +663,32 @@ static int bam_slave_config(struct dma_chan *chan,
> return 0;
> }
>
> +static int bam_metadata_attach(struct dma_async_tx_descriptor *desc, void *data, size_t len)
> +{
> + struct bam_chan *bchan = to_bam_chan(desc->chan);
> + const struct bam_device_data *bdata = bchan->bdev->dev_data;
> + struct bam_desc_metadata *metadata = data;
> +
> + if (!data)
> + return -EINVAL;
> +
> + if (!bdata->pipe_lock_supported)
> + /*
> + * The client wants to use locking but this BAM version doesn't
> + * support it. Don't return an error here as this will stop the
> + * client from using DMA at all for no reason.
> + */
> + return 0;
> +
> + bchan->scratchpad_addr = metadata->scratchpad_addr;
> +
> + return 0;
> +}
> +
> +static const struct dma_descriptor_metadata_ops bam_metadata_ops = {
> + .attach = bam_metadata_attach,
> +};
> +
> /**
> * bam_prep_slave_sg - Prep slave sg transaction
> *
> @@ -668,6 +705,7 @@ static struct dma_async_tx_descriptor *bam_prep_slave_sg(struct dma_chan *chan,
> void *context)
> {
> struct bam_chan *bchan = to_bam_chan(chan);
> + struct dma_async_tx_descriptor *tx_desc;
> struct bam_device *bdev = bchan->bdev;
> struct bam_async_desc *async_desc;
> struct scatterlist *sg;
> @@ -723,7 +761,12 @@ static struct dma_async_tx_descriptor *bam_prep_slave_sg(struct dma_chan *chan,
> } while (remainder > 0);
> }
>
> - return vchan_tx_prep(&bchan->vc, &async_desc->vd, flags);
> + tx_desc = vchan_tx_prep(&bchan->vc, &async_desc->vd, flags);
> + if (!tx_desc)
> + return NULL;
> +
> + tx_desc->metadata_ops = &bam_metadata_ops;
> + return tx_desc;
> }
>
> /**
> @@ -1012,13 +1055,116 @@ static void bam_apply_new_config(struct bam_chan *bchan,
> bchan->reconfigure = 0;
> }
>
> +static struct bam_async_desc *
> +bam_make_lock_desc(struct bam_chan *bchan, struct scatterlist *sg,
> + struct bam_cmd_element *ce, unsigned long flag)
> +{
> + struct dma_chan *chan = &bchan->vc.chan;
> + struct bam_async_desc *async_desc;
> + struct bam_desc_hw *desc;
> + struct virt_dma_desc *vd;
> + struct virt_dma_chan *vc;
> + unsigned int mapped;
> + dma_cookie_t cookie;
> + int ret;
> +
> + sg_init_table(sg, 1);
> +
> + async_desc = kzalloc_flex(*async_desc, desc, 1, GFP_NOWAIT);
> + if (!async_desc) {
> + dev_err(bchan->bdev->dev, "failed to allocate the BAM lock descriptor\n");
> + return ERR_PTR(-ENOMEM);
> + }
> +
> + async_desc->num_desc = 1;
> + async_desc->curr_desc = async_desc->desc;
> + async_desc->dir = DMA_MEM_TO_DEV;
> +
> + desc = async_desc->desc;
> +
> + bam_prep_ce_le32(ce, bchan->scratchpad_addr, BAM_WRITE_COMMAND, 0);
> + sg_set_buf(sg, ce, sizeof(*ce));
> +
> + mapped = dma_map_sg_attrs(chan->slave, sg, 1, DMA_TO_DEVICE, DMA_PREP_CMD);
> + if (!mapped) {
> + kfree(async_desc);
> + return ERR_PTR(-ENOMEM);
> + }
> +
> + desc->flags |= cpu_to_le16(DESC_FLAG_CMD | flag);
> + desc->addr = sg_dma_address(sg);
> + desc->size = sizeof(struct bam_cmd_element);
> +
> + vc = &bchan->vc;
> + vd = &async_desc->vd;
> +
> + dma_async_tx_descriptor_init(&vd->tx, &vc->chan);
> + vd->tx.flags = DMA_PREP_CMD;
> + vd->tx.desc_free = vchan_tx_desc_free;
> + vd->tx_result.result = DMA_TRANS_NOERROR;
> + vd->tx_result.residue = 0;
> +
> + cookie = dma_cookie_assign(&vd->tx);
> + ret = dma_submit_error(cookie);
> + if (ret) {
> + dma_unmap_sg(chan->slave, sg, 1, DMA_TO_DEVICE);
> + kfree(async_desc);
> + return ERR_PTR(ret);
> + }
> +
> + return async_desc;
> +}
> +
> +static int bam_do_setup_pipe_lock(struct bam_chan *bchan, bool lock)
> +{
> + struct bam_device *bdev = bchan->bdev;
> + const struct bam_device_data *bdata = bdev->dev_data;
> + struct bam_async_desc *lock_desc;
> + struct bam_cmd_element *ce;
> + struct scatterlist *sgl;
> + unsigned long flag;
> +
> + lockdep_assert_held(&bchan->vc.lock);
> +
> + if (!bdata->pipe_lock_supported || !bchan->scratchpad_addr ||
> + bchan->slave.direction != DMA_MEM_TO_DEV)
> + return 0;
> +
> + if (lock) {
> + sgl = &bchan->lock_sg;
> + ce = &bchan->lock_ce;
> + flag = DESC_FLAG_LOCK;
> + } else {
> + sgl = &bchan->unlock_sg;
> + ce = &bchan->unlock_ce;
> + flag = DESC_FLAG_UNLOCK;
> + }
> +
> + lock_desc = bam_make_lock_desc(bchan, sgl, ce, flag);
> + if (IS_ERR(lock_desc))
> + return PTR_ERR(lock_desc);
> +
> + if (lock)
> + list_add(&lock_desc->vd.node, &bchan->vc.desc_issued);
> + else
> + list_add_tail(&lock_desc->vd.node, &bchan->vc.desc_issued);
> +
> + return 0;
> +}
> +
> +static void bam_setup_pipe_lock(struct bam_chan *bchan)
> +{
> + if (bam_do_setup_pipe_lock(bchan, true) || bam_do_setup_pipe_lock(bchan, false))
> + dev_err(bchan->vc.chan.slave, "Failed to setup BAM pipe lock descriptors");
> +}
> +
> /**
> * bam_start_dma - start next transaction
> * @bchan: bam dma channel
> */
> static void bam_start_dma(struct bam_chan *bchan)
> {
> - struct virt_dma_desc *vd = vchan_next_desc(&bchan->vc);
> + struct virt_dma_desc *vd;
> struct bam_device *bdev = bchan->bdev;
> struct bam_async_desc *async_desc = NULL;
> struct bam_desc_hw *desc;
> @@ -1030,6 +1176,9 @@ static void bam_start_dma(struct bam_chan *bchan)
>
> lockdep_assert_held(&bchan->vc.lock);
>
> + bam_setup_pipe_lock(bchan);
> +
> + vd = vchan_next_desc(&bchan->vc);
> if (!vd)
> return;
>
> @@ -1157,8 +1306,15 @@ static void bam_issue_pending(struct dma_chan *chan)
> */
> static void bam_dma_free_desc(struct virt_dma_desc *vd)
> {
> - struct bam_async_desc *async_desc = container_of(vd,
> - struct bam_async_desc, vd);
> + struct bam_async_desc *async_desc = container_of(vd, struct bam_async_desc, vd);
> + struct bam_desc_hw *desc = async_desc->desc;
> + struct dma_chan *chan = vd->tx.chan;
> + struct bam_chan *bchan = to_bam_chan(chan);
> +
> + if (le16_to_cpu(desc->flags) & DESC_FLAG_LOCK)
> + dma_unmap_sg(chan->slave, &bchan->lock_sg, 1, DMA_TO_DEVICE);
> + else if (le16_to_cpu(desc->flags) & DESC_FLAG_UNLOCK)
> + dma_unmap_sg(chan->slave, &bchan->unlock_sg, 1, DMA_TO_DEVICE);
>
> kfree(async_desc);
> }
> @@ -1350,6 +1506,7 @@ static int bam_dma_probe(struct platform_device *pdev)
> bdev->common.device_terminate_all = bam_dma_terminate_all;
> bdev->common.device_issue_pending = bam_issue_pending;
> bdev->common.device_tx_status = bam_tx_status;
> + bdev->common.desc_metadata_modes = DESC_METADATA_CLIENT;
> bdev->common.dev = bdev->dev;
>
> ret = dma_async_device_register(&bdev->common);
> diff --git a/include/linux/dma/qcom_bam_dma.h b/include/linux/dma/qcom_bam_dma.h
> index 68fc0e643b1b97fe4520d5878daa322b81f4f559..5f0d2a27face8223ecb77da33d9e050c1ff2622f 100644
> --- a/include/linux/dma/qcom_bam_dma.h
> +++ b/include/linux/dma/qcom_bam_dma.h
> @@ -34,6 +34,16 @@ enum bam_command_type {
> BAM_READ_COMMAND,
> };
>
> +/**
> + * struct bam_desc_metadata - DMA descriptor metadata specific to the BAM driver.
> + *
> + * @scratchpad_addr: Physical address to use for dummy write operations when
> + * queuing command descriptors with LOCK/UNLOCK bits set.
> + */
> +struct bam_desc_metadata {
> + phys_addr_t scratchpad_addr;
> +};
> +
> /*
> * prep_bam_ce_le32 - Wrapper function to prepare a single BAM command
> * element with the data already in le32 format.
>
> --
> 2.47.3
>
--
மணிவண்ணன் சதாசிவம்
* Re: [PATCH v9 2/9] lib: vsprintf: export simple_strntoull() in a safe prototype
From: Rodrigo Alencar @ 2026-03-30 12:49 UTC (permalink / raw)
To: Rodrigo Alencar, Andy Shevchenko
Cc: Petr Mladek, rodrigo.alencar, linux-kernel, linux-iio, devicetree,
linux-doc, Jonathan Cameron, David Lechner, Andy Shevchenko,
Lars-Peter Clausen, Michael Hennerich, Rob Herring,
Krzysztof Kozlowski, Conor Dooley, Jonathan Corbet, Andrew Morton,
Steven Rostedt, Rasmus Villemoes, Sergey Senozhatsky, Shuah Khan
In-Reply-To: <x34d7jz7be4ommjh6efx5mcq5pbpellykwuyrqayr4ske3lywf@wh46mu3anmcz>
On 26/03/27 03:17PM, Rodrigo Alencar wrote:
> On 26/03/27 12:21PM, Andy Shevchenko wrote:
> > On Fri, Mar 27, 2026 at 10:11:56AM +0000, Rodrigo Alencar wrote:
> > > On 26/03/27 11:17AM, Andy Shevchenko wrote:
> > > > On Fri, Mar 27, 2026 at 09:45:17AM +0100, Petr Mladek wrote:
> > > > > On Fri 2026-03-20 16:27:27, Rodrigo Alencar via B4 Relay wrote:
...
> > > > Maybe we want to have kstrtof32() and kstrtof64() for these two cases?
> > > >
> > > > With that we will always consider the fraction part as 32- or 64-bit,
> > > > imply floor() on the fraction for the sake of simplicity and require
> > > > it to be NUL-terminated with possible trailing '\n'.
> > >
> > > I think this is a good idea, but calling it float or fixed point itself
> > > is a bit confusing as float often refers to the IEEE 754 standard and
> > > fixed point types is often expressed in Q-format.
> >
> > Yeah... I am lack of better naming.
>
> decimals is the name, but they are often represented as:
>
> DECIMAL = INT * 10^X + FRAC
>
> in a single 64-bit number, which would be fine for my end use case.
> However IIO decimal fixed point parsing is out there for quite some time a
> lot of drivers use that. The interface often relies on breaking parsed values
> into an integer array (for standard attributes int val and int val2 are expected).
Thinking about this again: in IIO drivers we end up doing something like:
	val64 = (u64)val * MICRO + val2;
so drivers often work with scaled versions of the decimal value.
Would it then make sense to have a function that outputs such a value directly?
That would allow more freedom in how the 64 bits are split between the integer
and fractional parts.
As a draft:
static int _kstrtodec64(const char *s, unsigned int scale, u64 *res)
{
	u64 _res = 0, _frac = 0;
	unsigned int rv;

	if (*s != '.') {
		rv = _parse_integer(s, 10, &_res);
		if (rv & KSTRTOX_OVERFLOW)
			return -ERANGE;
		if (rv == 0)
			return -EINVAL;
		s += rv;
	}

	if (*s == '.') {
		s++;
		rv = _parse_integer_limit(s, 10, &_frac, scale);
		if (rv & KSTRTOX_OVERFLOW)
			return -ERANGE;
		if (rv == 0)
			return -EINVAL;
		s += rv;
		if (rv < scale)
			_frac *= int_pow(10, scale - rv);
		while (isdigit(*s)) /* truncate */
			s++;
	}

	if (*s == '\n')
		s++;
	if (*s)
		return -EINVAL;

	if (check_mul_overflow(_res, int_pow(10, scale), &_res) ||
	    check_add_overflow(_res, _frac, &_res))
		return -ERANGE;

	*res = _res;
	return 0;
}

noinline
int kstrtoudec64(const char *s, unsigned int scale, u64 *res)
{
	if (s[0] == '+')
		s++;
	return _kstrtodec64(s, scale, res);
}
EXPORT_SYMBOL(kstrtoudec64);

noinline
int kstrtosdec64(const char *s, unsigned int scale, s64 *res)
{
	u64 tmp;
	int rv;

	if (s[0] == '-') {
		rv = _kstrtodec64(s + 1, scale, &tmp);
		if (rv < 0)
			return rv;
		if ((s64)-tmp > 0)
			return -ERANGE;
		*res = -tmp;
	} else {
		rv = kstrtoudec64(s, scale, &tmp);
		if (rv < 0)
			return rv;
		if ((s64)tmp < 0)
			return -ERANGE;
		*res = tmp;
	}
	return 0;
}
EXPORT_SYMBOL(kstrtosdec64);
For example, kstrtoudec64() (or kstrtosdec64()) parses "3.1415" with scale 3
into 3141; fractional digits beyond the scale are truncated.
--
Kind regards,
Rodrigo Alencar