Linux Documentation

Linux Documentation
 help / color / mirror / Atom feed

* Re: [PATCH v2] docs: Fix minor grammatical error
From: Randy Dunlap @ 2026-06-09 16:56 UTC (permalink / raw)
  To: Brigham Campbell, Thorsten Leemhuis, Jonathan Corbet, Shuah Khan,
	open list:DOCUMENTATION REPORTING ISSUES, open list
In-Reply-To: <20260609070618.12566-1-me@brighamcampbell.com>



On 6/9/26 12:06 AM, Brigham Campbell wrote:
> Fix minor grammatical error in the administration guide.
> 
> Signed-off-by: Brigham Campbell <me@brighamcampbell.com>

Acked-by: Randy Dunlap <rdunlap@infradead.org>
Thanks.

> ---
> 
> Since v1:
> * Drop pedantic line re-wrapping.
> 
> In hindsight, I should have guessed that reflowing the paragraph was
> overzealous. Thanks for the guidance, Randy, Thorsten. I'll remember it
> if I make minor doc fixes in the future.
> 
>  Documentation/admin-guide/quickly-build-trimmed-linux.rst | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/Documentation/admin-guide/quickly-build-trimmed-linux.rst b/Documentation/admin-guide/quickly-build-trimmed-linux.rst
> index cb178e0a6208..3432dc8e1a85 100644
> --- a/Documentation/admin-guide/quickly-build-trimmed-linux.rst
> +++ b/Documentation/admin-guide/quickly-build-trimmed-linux.rst
> @@ -217,7 +217,7 @@ again.
>  
>     There is a catch: 'localmodconfig' is likely to disable kernel features you
>     did not use since you booted your Linux -- like drivers for currently
> -   disconnected peripherals or a virtualization software not haven't used yet.
> +   disconnected peripherals or virtualization software not currently in use.
>     You can reduce or nearly eliminate that risk with tricks the reference
>     section outlines; but when building a kernel just for quick testing purposes
>     it is often negligible if such features are missing. But you should keep that
> 
> base-commit: 738bb6e6c8d992f33335b3cbcce051ab118a33dc

-- 
~Randy

^ permalink raw reply

* Re: [PATCH v5 06/19] perf test cs-etm: Replace unroll loop thread with deterministic decode test
From: Leo Yan @ 2026-06-09 17:04 UTC (permalink / raw)
  To: James Clark
  Cc: Suzuki K Poulose, Mike Leach, Arnaldo Carvalho de Melo,
	Namhyung Kim, Jiri Olsa, Ian Rogers, Amir Ayupov, Jonathan Corbet,
	Shuah Khan, Paschalis Mpeis, coresight, linux-perf-users,
	linux-kernel, Arnaldo Carvalho de Melo, linux-doc
In-Reply-To: <20260609-james-cs-context-tracking-fix-v5-6-d53a7d096a19@linaro.org>

On Tue, Jun 09, 2026 at 03:40:11PM +0100, James Clark wrote:
> Testing a long sequence without branches seems like it would be better
> as a decoder unit test, and this test doesn't test decoding either, so
> it's not clear what bugs this is trying to catch.
> 
> The new deterministic workload has somewhat long sequences when built
> unoptimized, and we can always increase them later if we want to. But
> now we test that decoding always gives the same result for the same
> sequence of code which we've never had before.
> 
> Signed-off-by: James Clark <james.clark@linaro.org>

Tested-by: Leo Yan <leo.yan@arm.com>

^ permalink raw reply

* Re: [PATCH v5 08/19] perf test cs-etm: Replace memcpy test with raw dump stress test
From: Leo Yan @ 2026-06-09 17:16 UTC (permalink / raw)
  To: James Clark
  Cc: Suzuki K Poulose, Mike Leach, Arnaldo Carvalho de Melo,
	Namhyung Kim, Jiri Olsa, Ian Rogers, Amir Ayupov, Jonathan Corbet,
	Shuah Khan, Paschalis Mpeis, coresight, linux-perf-users,
	linux-kernel, Arnaldo Carvalho de Melo, linux-doc
In-Reply-To: <20260609-james-cs-context-tracking-fix-v5-8-d53a7d096a19@linaro.org>

On Tue, Jun 09, 2026 at 03:40:13PM +0100, James Clark wrote:
> Like asm_pure_loop, this memcpy test only checks that 10 of each of a
> few trace packet types occur after recording a lot of trace, which isn't
> more specific than other existing Coresight tests.
> 
> Assume it was supposed to be a stress test for dumping and replace it
> with one that doesn't require a custom binary and checks for a specific
> amount of raw output. Don't bother checking for packets because the
> other tests that test decoding will catch issues with malformed data.
> 
> This also adds coverage for exit snapshot mode which was missing.
> 
> Signed-off-by: James Clark <james.clark@linaro.org>

Tested-by: Leo Yan <leo.yan@arm.com>

^ permalink raw reply

* Re: [PATCH v5 10/19] perf test cs-etm: Test decoding for concurrent threads test
From: Leo Yan @ 2026-06-09 17:18 UTC (permalink / raw)
  To: James Clark
  Cc: Suzuki K Poulose, Mike Leach, Arnaldo Carvalho de Melo,
	Namhyung Kim, Jiri Olsa, Ian Rogers, Amir Ayupov, Jonathan Corbet,
	Shuah Khan, Paschalis Mpeis, coresight, linux-perf-users,
	linux-kernel, Arnaldo Carvalho de Melo, linux-doc
In-Reply-To: <20260609-james-cs-context-tracking-fix-v5-10-d53a7d096a19@linaro.org>

On Tue, Jun 09, 2026 at 03:40:15PM +0100, James Clark wrote:
> The thread_loop test only looks for context IDs in the raw trace.
> There's a lot more that can go wrong when decoding these, so replace it
> with a test that looks at the final output for matching thread names and
> symbols.
> 
> In the future we might use timestamps and context switch events to track
> threads, so looking at context IDs in the raw trace wouldn't always
> work.
> 
> Reviewed-by: Leo Yan <leo.yan@arm.com>
> Signed-off-by: James Clark <james.clark@linaro.org>

Tested-by: Leo Yan <leo.yan@arm.com>

^ permalink raw reply

* Re: [PATCH v6 08/12] PCI: liveupdate: Inherit ACS flags in incoming preserved devices
From: Pranjal Shrivastava @ 2026-06-09 17:20 UTC (permalink / raw)
  To: David Matlack
  Cc: kexec, linux-doc, linux-kernel, linux-mm, linux-pci,
	Adithya Jayachandran, Alexander Graf, Alex Williamson,
	Bjorn Helgaas, Chris Li, David Rientjes, Jacob Pan,
	Jason Gunthorpe, Jonathan Corbet, Josh Hilke, Leon Romanovsky,
	Lukas Wunner, Mike Rapoport, Parav Pandit, Pasha Tatashin,
	Pratyush Yadav, Saeed Mahameed, Samiullah Khawaja, Shuah Khan,
	Vipin Sharma, William Tu, Yi Liu
In-Reply-To: <aic6mdiZ0qUJpFca@google.com>

On Mon, Jun 08, 2026 at 09:56:41PM +0000, David Matlack wrote:
> On 2026-06-07 08:37 PM, Pranjal Shrivastava wrote:
> > On Fri, May 22, 2026 at 08:24:06PM +0000, David Matlack wrote:
> > > Inherit Access Control Services (ACS) flags on all incoming preserved
> > > devices (endpoints and upstream bridges) during a Live Update.
> > > 
> > > Inheriting ACS flags avoids changing routing rules while memory
> > > transactions are in flight from preserved devices. This is also strictly
> > > necessary to ensure that IOMMU group assignments do not change across
> > > a Live Update for preserved devices, as changing ACS configurations can
> > > split or merge IOMMU groups.
> > > 
> > > Cache the inherited ACS controls established by the previous kernel in
> > > struct pci_dev so that ACS controls do not change after a reset
> > > (pci_restore_state() calls pci_enable_acs()).
> > > 
> > > To simplify ACS inheritance, reject preserving any devices that require
> > > quirks to enable ACS as those quirks would also have to take Live Update
> > > into account.
> > > 
> > > Signed-off-by: David Matlack <dmatlack@google.com>
> > > ---
> > >  drivers/pci/liveupdate.c       | 68 ++++++++++++++++++++++++++++++++++
> > >  drivers/pci/liveupdate.h       | 11 ++++++
> > >  drivers/pci/pci.c              |  5 +++
> > >  drivers/pci/pci.h              |  5 +++
> > >  drivers/pci/quirks.c           |  7 ++++
> > >  include/linux/pci_liveupdate.h |  6 +++
> > >  6 files changed, 102 insertions(+)
> > > 
> > 
> > [...]
> > 
> > >  
> > > +void pci_liveupdate_init_acs(struct pci_dev *dev)
> > > +{
> > > +	guard(rwsem_read)(&pci_liveupdate.rwsem);
> > > +
> > > +	if (!dev->acs_cap || !dev->liveupdate.incoming)
> > > +		return;
> > > +
> > > +	pci_read_config_word(dev, dev->acs_cap + PCI_ACS_CTRL, &dev->liveupdate.acs_ctrl);
> > 
> > I might be thinking out loud here, but as an attacker, this motivates me
> > to somehow hack the EP FW to mis-report the PCI_ACS_CTRL register across
> > a liveupdate to fool the incoming kernel. If the FW feeds a 0, it silently
> > strips ACS protections.
> > 
> > Should we also serialize ACS state in ser somehow to ensure we aren't 
> > fooled by something like this?
> 
> What does "EP FW" mean?

I was referring to the Endpoint Firmware (basically any SW running on
a downstream device)

> 
> Does such an attacker even need Live Update to attack the system? It
> seems like such an attacker could route TLPs in whatever malicious way
> they want regardless of Live Update.
> 

I agree that compromised PCIe devices are a menace anyway. But I was
talking about the potential window opened up by Live Update here,
suppose we have Device A & B assigned to 2 different VMs (implying they
are in separate IOMMU groups because the switch set ACS_RR = 1).

Now, the attacker has an opportunity with Liveupdate, since the devices
are already assigned, if *somehow* it flips a bit like ACS_RR, the
incoming kernel might see both the devices in the same IOMMU group.
Who detects this case and what happens if this happens if the devices
are kept assigned to these VMs?

Thanks,
Praan

^ permalink raw reply

* Re: [PATCH v5 15/19] perf test cs-etm: Remove unused Coresight workloads
From: Leo Yan @ 2026-06-09 17:22 UTC (permalink / raw)
  To: James Clark
  Cc: Suzuki K Poulose, Mike Leach, Arnaldo Carvalho de Melo,
	Namhyung Kim, Jiri Olsa, Ian Rogers, Amir Ayupov, Jonathan Corbet,
	Shuah Khan, Paschalis Mpeis, coresight, linux-perf-users,
	linux-kernel, Arnaldo Carvalho de Melo, linux-doc
In-Reply-To: <20260609-james-cs-context-tracking-fix-v5-15-d53a7d096a19@linaro.org>

On Tue, Jun 09, 2026 at 03:40:20PM +0100, James Clark wrote:
> These are now unused and had various issues like not working with out of
> source builds and being slow to compile. Delete them.

> +There are a set of Perf tests for CoreSight which can be run with::
> +
> +  sudo perf test coresight

Tested-by: Leo Yan <leo.yan@arm.com>

^ permalink raw reply

* Re: [RFC PATCH v1 00/13] exec: add spawn templates for repeated executable startup
From: John Ericson @ 2026-06-09 17:27 UTC (permalink / raw)
  To: Li Chen, Andy Lutomirski
  Cc: Christian Brauner, Kees Cook, Al Viro, linux-fsdevel, linux-api,
	LKML, linux-mm, linux-arch, linux-doc, linux-kselftest, x86,
	Arnd Bergmann, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin, Jan Kara, Jonathan Corbet,
	Shuah Khan
In-Reply-To: <19eacd64508.26b92c022125848.262962729296162879@linux.beauty>

On Tue, Jun 9, 2026, at 10:43 AM, Li Chen wrote:
> Hi Andy,
>
> ---- On Tue, 09 Jun 2026 08:01:57 +0800  Andy Lutomirski <luto@kernel.org> wrote ---
> > [...]
> >
> > After contemplating this for a bit... why pidfd?  Doesn't a pidfd
> > refer to an actual process that is, or at least was, running?  This
> > new thing is a process that we are contemplating spawning.  I can
> > imagine that basically all pidfd APIs would be a bit confused by the
> > nonexistence of the process in question.
> >
>
> Yes, I think that is a real concern.
>
> In my current local WIP I tried to keep that distinction explicit.
> pidfd_spawn_open() returns a pidfs-backed builder fd, not a normal pidfd
> referring to a process. The builder fd is allocated as an anonymous pidfs
> file with builder-specific file operations:
>
>     file = pidfs_alloc_anon_file("[pidfd_spawn]",
>                                  &pidfd_spawn_builder_fops, builder,
>                                  O_RDWR);
>

What does your builder fd point to, explicitly? For example in my other reply I
talked about how it was "real" process state. In my FreeBSD patch, for example,
I found there was already a status for a process "in exec", and I figured that
was clean to reuse for one of these "embryonic" processes that also hadn't
started running. I would reckon that Linux probably has some similar notions.

> and the normal pidfd helpers still reject it because it does not use the
> ordinary pidfd file operations:
>
>     struct pid *pidfd_pid(const struct file *file)
>     {
>         if (file->f_op != &pidfs_file_operations)
>             return ERR_PTR(-EBADF);
>         return file_inode(file)->i_private;
>     }
>
> So the current split is:
>
>     builder_fd = pidfd_spawn_open(...);       /* builder object */
>     pidfd_config(builder_fd, ...);
>     child_pidfd = pidfd_spawn_run(builder_fd, ...); /* real pidfd */
>
> Only the last fd is a normal pidfd for an actual child process. The builder
> fd is only accepted by the builder operations.
>
> This avoids having to define what waitid(P_PIDFD), pidfd_send_signal(),
> pidfd_getfd(), poll(), etc. mean before the process exists.

I wouldn't be so sure this is necessary/good. For example, I think it could
make sense to wait on a process that has yet to be started; one just waits for
both the process to start and the process to exit. Obviously a blocking syscall
in the thread that is spawning the process is not useful, but the asynchronous
poll variation seems fine.

As long as there is real process state here, it shouldn't be too hard to
implement.

> The downside is that it adds a separate open-style entry point and is less
> uniform than the pidfd_open(0, PIDFD_EMPTY) spelling Christian sketched.

I do think there is no point having two file descriptors. The file descriptor
that previously referred to the builder/embryonic process then can refer to the
real process, right?

> If people think there is a better way to represent the pre-spawn builder
> state, or if the preference is to integrate it directly into pidfd_open()
> with an explicit empty/future-pidfd state, I would be happy to discuss that.

Hope the above answers your question? I suppose my ideas lean more on the
"future" than "empty" side --- there is indeed a thread in the thread group,
with real VM/namespace/file descriptor etc. state. Moreover, state gets
initialized before the process is started, so the actual start is a pretty
lightweight step of just letting the scheduler know the now-ready process can
be scheduled. The only thing that distinguishes the embryonic process from a
real one is simply that it isn't running --- i.e. isn't (yet) available to be
scheduled --- so the pidfds holders are free to poke at its state.

Cheers,

John

^ permalink raw reply

* Re: [RFC PATCH v1 00/13] exec: add spawn templates for repeated executable startup
From: Jann Horn @ 2026-06-09 17:53 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Mateusz Guzik, Christian Brauner, Li Chen, Kees Cook,
	Alexander Viro, linux-fsdevel, linux-api, linux-kernel, linux-mm,
	linux-arch, linux-doc, linux-kselftest, x86, Arnd Bergmann,
	Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin, Jan Kara, Jonathan Corbet,
	Shuah Khan
In-Reply-To: <lhubjdk1c1m.fsf@oldenburg.str.redhat.com>

On Tue, Jun 9, 2026 at 8:08 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> * Jann Horn:
>
> >> Per the above, the primary win would stem from *NOT* messing with mm.
> >
> > As you write below, I think we have that with CLONE_MM? The C function
> > vfork() is kind of a terrible API because of its returns-twice
> > behavior, but I think if process cloning with CLONE_VM|CLONE_VFORK was
> > wrapped by libc in a way similar to clone() (with the child executing
> > a separate handler function), or if it was used in the implementation
> > of some higher-level process-spawning API, it would be a perfectly
> > fine API?
>
> No, there is still a problem with SIGTSTP handling because we cannot
> atomically unmask the signal during execve.  We need to unblock SIGTSTP
> before execve in the new process, but this means that it can get
> suspended by SIGTSTP.  Consequently, the execve never happens and the
> original process is stuck in vfork:
>
>   posix_spawn: parent can get stuck in uninterruptible sleep if child
>   receives SIGTSTP early enough
>   <https://inbox.sourceware.org/libc-help/2921668c-773e-465d-9480-0abb6f979bf9@www.fastmail.com/>
>
> More on the low-level side, it's difficult to make sure that execve gets
> a consistent snapshot of the environ vector.  Both vfork and execve need
> to be async-signal-safe.  Any locking or memory allocation (except for
> the stack …) persists in the original process after vfork returns.  The

I think that's not entirely accurate; if you call set_robust_list() on
a futex list, then call execve(), the futexes should be released once
the process switches to a new MM, in
begin_new_exec -> exec_mmap -> exec_mm_release -> futex_exec_release
-> futex_cleanup -> exit_robust_list.

So in theory you could use clone() with CLONE_VM and without
CLONE_VFORK, and let the parent either wait for a futex that is
released on exec, or somehow asynchronously check later whether the
futex is still held... probably not the nicest building block but
maybe workable? Though I guess it would fit more nicely if there was a
"munmap() this range on exec" API...

> environ vector can be large, so making a copy on the stack is not ideal.
> It's even harder for getenv/setenv/unsetenv implementations that use
> locking instead of software transactional memory.

Makes sense, that kind of sounds like a pain inherent in being able to
execute from signal handler context...

^ permalink raw reply

* Re: [PATCH v3] cpu/hotplug: Fix NULL kobject warning in cpuhp_smt_enable()
From: Catalin Marinas @ 2026-06-09 17:58 UTC (permalink / raw)
  To: Jinjie Ruan
  Cc: Will Deacon, corbet, skhan, punit.agrawal, jic23,
	osama.abdelkader, chenl311, fengchengwen, suzuki.poulose, maz,
	lpieralisi, timothy.hayes, sascha.bischoff, arnd,
	mrigendra.chaubey, pierre.gondois, dietmar.eggemann, yangyicong,
	sudeep.holla, linux-arm-kernel, linux-doc, linux-kernel
In-Reply-To: <02932ef7-5819-4cf5-8e78-8fd3fd40274f@huawei.com>

Hi Jinjie,

On Wed, Jun 03, 2026 at 02:38:11PM +0800, Jinjie Ruan wrote:
> On 6/2/2026 7:09 PM, Will Deacon wrote:
> > On Wed, May 20, 2026 at 10:20:23AM +0800, Jinjie Ruan wrote:
> >> When booting with ACPI, arm64 smp_prepare_cpus() currently sets all
> >> enumerated CPUs as "present" regardless of their status in the MADT. This
> >> causes issues with SMT hotplug control. For instance, with QEMU's
> >> "-smp 4,maxcpus=8" configuration, the MADT GICC entries are populated as
> >> follows: the first four CPUs are marked Enabled while the remaining four
> >> are marked Online Capable to support potential hot-plugging.
> >>
> >> Fix this by:
> >>
> >> 1. When booting with ACPI, checking the ACPI_MADT_ENABLED flag in the GICC
> >>    entry before calling set_cpu_present() during SMP initialization.
> >>
> >> 2. Properly managing the present mask in acpi_map_cpu() and
> >>    acpi_unmap_cpu() to support actual CPU hotplug events, This aligns with
> >>    other architectures like x86 and LoongArch.
> >>
> >> 3. Update the arm64 CPU hotplug documentation to no longer state that all
> >>    online-capable vCPUs are marked as present by the kernel at boot time.
> >>
> >> This ensures that only physically available or explicitly enabled CPUs
> >> are in the present mask, keeping the SMT control logic consistent with
> >> the actual hardware state.
> > 
> > Please can you check the Sashiko review comment?
> > 
> > https://sashiko.dev/#/patchset/20260520022023.126670-1-ruanjinjie@huawei.com
> 
> I think commit eba4675008a6 ("arm64: arch_register_cpu() variant to
> check if an ACPI handle is now available.") introduced this bug.
> 
> It introduced an architectural safety block inside
> arch_unregister_cpu(). If a hot-unplug operation is determined to be a
> physical hardware removal (where _STA evaluates to
> !ACPI_STA_DEVICE_PRESENT), it aborts the unregistration transaction
> early to protect unreadied arm64 infrastructure, thereby skipping
> unregister_cpu().
> 
> However, the generic ACPI processor driver path in
> acpi_processor_post_eject() currently treats arch_unregister_cpu() as
> an unconditional void operation. When arch_unregister_cpu() bails out
> early, the subsequent cleanup flow blindly proceeds to call
> acpi_unmap_cpu(), clears global per-cpu processor arrays, and
> unconditionally free the 'struct acpi_processor' object.
> 
> I think we can fix this by:
> 
>     1. Refactoring arch_unregister_cpu() to return an integer
> transaction status. It returns -EOPNOTSUPP when aborting due to physical
> hot-remove blocking, -EINVAL/-EIO on firmware failures, and 0 only upon
> successful unregistration.
> 
>     2. Guarding the downstream execution flow in
> acpi_processor_post_eject(). If arch_unregister_cpu() returns a error
> code, the hot-unplug transaction is considered aborted.

I wonder whether we need all this guarding. In the worst case, we could
rewrite the function, something like below, to always unregister and
only warn:

void arch_unregister_cpu(int cpu)
{
	acpi_handle acpi_handle = acpi_get_processor_handle(cpu);
	struct cpu *c = &per_cpu(cpu_devices, cpu);
	acpi_status status;
	unsigned long long sta;

	if (!acpi_handle) {
		pr_err_once("Removing a CPU without associated ACPI handle\n");
	} else {
		status = acpi_evaluate_integer(acpi_handle, "_STA", NULL, &sta);
		if (!ACPI_FAILURE(status) &&
		    cpu_present(cpu) && !(sta & ACPI_STA_DEVICE_PRESENT))
			pr_err_once("Changing CPU present bit is not supported\n");
	}

	unregister_cpu(c);
}

However, on the first condition, can we actually trigger !acpi_handle?
If not, we could just drop it. I tried to look up the paths and I don't
think we'd ever end up in this function with !acpi_handle. So this
leaves us with the next checks.

On the second/third conditions, it's more about preventing physical CPU
hotplug as we haven't properly defined it for arm yet but we could just
add a WARN_ONCE() to make it more visible and still proceed with the
unregistering. I think with your proposal, we don't fully unroll the
state anyway just by returning an error in arch_unregister_cpu(), so I'd
rather continue here.

What does firmware do for virtual CPU hotplug w.r.t. _STA? I noticed a
slight change in wording in the cpu-hotplug.rst doc with your patch from

  On virtual systems the _STA method must always report the CPU as
  ``present``

to

  On virtual systems the _STA method must report the CPU as ``present``
  when it is activated by the firmware

Was your intention that _STA.PRESENT can become 0 when hot-unplugging
virtual CPUs?

-- 
Catalin

^ permalink raw reply

* Re: [PATCH v6 08/12] PCI: liveupdate: Inherit ACS flags in incoming preserved devices
From: David Matlack @ 2026-06-09 18:40 UTC (permalink / raw)
  To: Pranjal Shrivastava
  Cc: kexec, linux-doc, linux-kernel, linux-mm, linux-pci,
	Adithya Jayachandran, Alexander Graf, Alex Williamson,
	Bjorn Helgaas, Chris Li, David Rientjes, Jacob Pan,
	Jason Gunthorpe, Jonathan Corbet, Josh Hilke, Leon Romanovsky,
	Lukas Wunner, Mike Rapoport, Parav Pandit, Pasha Tatashin,
	Pratyush Yadav, Saeed Mahameed, Samiullah Khawaja, Shuah Khan,
	Vipin Sharma, William Tu, Yi Liu
In-Reply-To: <aihLTgs1Y49OXQaV@google.com>

On Tue, Jun 9, 2026 at 10:20 AM Pranjal Shrivastava <praan@google.com> wrote:
>
> On Mon, Jun 08, 2026 at 09:56:41PM +0000, David Matlack wrote:
> > On 2026-06-07 08:37 PM, Pranjal Shrivastava wrote:
> > > On Fri, May 22, 2026 at 08:24:06PM +0000, David Matlack wrote:
> > > > Inherit Access Control Services (ACS) flags on all incoming preserved
> > > > devices (endpoints and upstream bridges) during a Live Update.
> > > >
> > > > Inheriting ACS flags avoids changing routing rules while memory
> > > > transactions are in flight from preserved devices. This is also strictly
> > > > necessary to ensure that IOMMU group assignments do not change across
> > > > a Live Update for preserved devices, as changing ACS configurations can
> > > > split or merge IOMMU groups.
> > > >
> > > > Cache the inherited ACS controls established by the previous kernel in
> > > > struct pci_dev so that ACS controls do not change after a reset
> > > > (pci_restore_state() calls pci_enable_acs()).
> > > >
> > > > To simplify ACS inheritance, reject preserving any devices that require
> > > > quirks to enable ACS as those quirks would also have to take Live Update
> > > > into account.
> > > >
> > > > Signed-off-by: David Matlack <dmatlack@google.com>
> > > > ---
> > > >  drivers/pci/liveupdate.c       | 68 ++++++++++++++++++++++++++++++++++
> > > >  drivers/pci/liveupdate.h       | 11 ++++++
> > > >  drivers/pci/pci.c              |  5 +++
> > > >  drivers/pci/pci.h              |  5 +++
> > > >  drivers/pci/quirks.c           |  7 ++++
> > > >  include/linux/pci_liveupdate.h |  6 +++
> > > >  6 files changed, 102 insertions(+)
> > > >
> > >
> > > [...]
> > >
> > > >
> > > > +void pci_liveupdate_init_acs(struct pci_dev *dev)
> > > > +{
> > > > + guard(rwsem_read)(&pci_liveupdate.rwsem);
> > > > +
> > > > + if (!dev->acs_cap || !dev->liveupdate.incoming)
> > > > +         return;
> > > > +
> > > > + pci_read_config_word(dev, dev->acs_cap + PCI_ACS_CTRL, &dev->liveupdate.acs_ctrl);
> > >
> > > I might be thinking out loud here, but as an attacker, this motivates me
> > > to somehow hack the EP FW to mis-report the PCI_ACS_CTRL register across
> > > a liveupdate to fool the incoming kernel. If the FW feeds a 0, it silently
> > > strips ACS protections.
> > >
> > > Should we also serialize ACS state in ser somehow to ensure we aren't
> > > fooled by something like this?
> >
> > What does "EP FW" mean?
>
> I was referring to the Endpoint Firmware (basically any SW running on
> a downstream device)
>
> >
> > Does such an attacker even need Live Update to attack the system? It
> > seems like such an attacker could route TLPs in whatever malicious way
> > they want regardless of Live Update.
> >
>
> I agree that compromised PCIe devices are a menace anyway. But I was
> talking about the potential window opened up by Live Update here,
> suppose we have Device A & B assigned to 2 different VMs (implying they
> are in separate IOMMU groups because the switch set ACS_RR = 1).
>
> Now, the attacker has an opportunity with Liveupdate, since the devices
> are already assigned, if *somehow* it flips a bit like ACS_RR, the
> incoming kernel might see both the devices in the same IOMMU group.
> Who detects this case and what happens if this happens if the devices
> are kept assigned to these VMs?

I suspect that would be caught during the restore of the iommufds to
which those devices are attached.

The kernel would attempt to restore each device into a separate domain
(since that's how they were preserved before the Live Update) but that
will fail because they are in the same group now. Even if one of the
devices was not preserved, that will still cause a failure when a user
tries to start using that device (e.g. to try to attach it to a
different VM).

^ permalink raw reply

* Re: [PATCH v9 2/6] mm/memory-failure: surface unhandlable kernel pages as -ENOTRECOVERABLE
From: David Hildenbrand (Arm) @ 2026-06-09 18:41 UTC (permalink / raw)
  To: Breno Leitao
  Cc: Miaohe Lin, Andrew Morton, Lorenzo Stoakes, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shuah Khan,
	Naoya Horiguchi, Jonathan Corbet, Shuah Khan, Liam R. Howlett,
	lance.yang, Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
	linux-mm, linux-kernel, linux-doc, linux-kselftest,
	linux-trace-kernel, kernel-team
In-Reply-To: <aig7jzwDHfVCxikl@gmail.com>

On 6/9/26 18:15, Breno Leitao wrote:
> On Tue, Jun 09, 2026 at 04:41:01PM +0200, David Hildenbrand (Arm) wrote:
>> On 6/9/26 12:56, Breno Leitao wrote:
>>> get_any_page() collapses every HWPoisonHandlable() rejection into a
>>> single -EIO via the __get_hwpoison_page() -> -EBUSY -> shake_page()
>>> -> retry path.  That is correct for the transient case (a userspace
>>> folio briefly off LRU during migration or compaction, which a later
>>> shake can drag back), but wrong for stable kernel-owned pages: slab,
>>> page-table, large-kmalloc and PG_reserved pages will never become
>>> HWPoisonHandlable(), so the retry loop is wasted work and the final
>>> -EIO loses the "this is structurally unrecoverable" information.
>>> memory_failure() then maps -EIO into MF_MSG_GET_HWPOISON, which the
>>> panic-on-unrecoverable sysctl deliberately does not act on.
>>>
>>> Introduce HWPoisonKernelOwned(), a small predicate that positively
>>> identifies pages the hwpoison handler cannot recover from:
>>>
>>>   HWPoisonKernelOwned(p, flags) :=
>>>       !(MF_SOFT_OFFLINE && page_has_movable_ops(p)) &&
>>>       (PageReserved(p) ||
>>>        PageSlab(head) || PageTable(head) || PageLargeKmalloc(head))
>>>
>>>   where head = compound_head(p).
>>>
>>> PG_reserved is a per-page flag (PF_NO_COMPOUND) and is tested on the
>>> page directly.  The slab, page-table and large-kmalloc page-type bits
>>> are only stored on the head page, so those tests resolve the compound
>>> head first, then re-read compound_head(page) afterwards: a concurrent
>>> split or compound free that moves head invalidates the just-read flags
>>> and the loop retries.  The lookup still takes no refcount, mirroring
>>> the rest of get_any_page(); the recheck closes the common split race,
>>> and a residual free->alloc->free in the same window can only mis-tag
>>> a genuinely poisoned page, never reclassify a handlable one.
>>>
>>> The MF_SOFT_OFFLINE / page_has_movable_ops() opt-out mirrors the
>>> same exception in HWPoisonHandlable(): soft-offline is allowed to
>>> migrate movable_ops pages even though they are not on the LRU, and
>>> we must not pre-empt that with an unrecoverable verdict.
>>>
>>> The list is intentionally not exhaustive.  vmalloc and kernel-stack
>>> pages, for example, do not carry a page_type bit and would need a
>>> different oracle; they keep going through the existing retry path
>>> unchanged.  This is the smallest set we can identify with certainty
>>> by page type.
>>>
>>> Wire the helper into the top of get_any_page() to short-circuit
>>> those pages before the retry loop runs.  On a hit, drop the caller's
>>> MF_COUNT_INCREASED reference (if any) and return -ENOTRECOVERABLE
>>> straight away.  Pages outside the helper's positive list still take
>>> the existing retry path and return -EIO, leaving operator-visible
>>> behaviour for those cases unchanged.
>>>
>>> Extend the unhandlable-page pr_err() to fire for either errno and
>>> update the get_hwpoison_page() kerneldoc to document the new return.
>>>
>>> memory_failure() still folds every negative return into
>>> MF_MSG_GET_HWPOISON via its existing "else if (res < 0)" branch, so
>>> this patch on its own only changes the errno that soft_offline_page()
>>> can propagate to its callers.  A follow-up wires -ENOTRECOVERABLE
>>> through memory_failure() and reports MF_MSG_KERNEL for the
>>> unrecoverable cases, which is what the
>>> panic_on_unrecoverable_memory_failure sysctl observes.
>>>
>>> Suggested-by: David Hildenbrand <david@kernel.org>
>>> Suggested-by: Lance Yang <lance.yang@linux.dev>
>>> Signed-off-by: Breno Leitao <leitao@debian.org>
>>> ---
>>>  mm/memory-failure.c | 60 +++++++++++++++++++++++++++++++++++++++++++++++++++--
>>>  1 file changed, 58 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>>> index f4d3e6e20e13..eed9de387694 100644
>>> --- a/mm/memory-failure.c
>>> +++ b/mm/memory-failure.c
>>> @@ -1325,6 +1325,46 @@ static inline bool HWPoisonHandlable(struct page *page, unsigned long flags)
>>>  	return PageLRU(page) || is_free_buddy_page(page);
>>>  }
>>>  
>>> +/*
>>> + * Positive identification of pages the hwpoison handler cannot recover.
>>> + * These page types are owned by kernel internals (no userspace mapping
>>> + * to unmap, no file mapping to invalidate, no migration target), so the
>>> + * shake_page() / retry loop in get_any_page() can never turn them into
>>> + * something HWPoisonHandlable() will accept.  Short-circuit them to
>>> + * -ENOTRECOVERABLE so callers can panic on operator request instead of
>>> + * spinning through retries that exit as a transient-looking -EIO.
>>> + *
>>> + * The MF_SOFT_OFFLINE / page_has_movable_ops() opt-out mirrors
>>> + * HWPoisonHandlable(): soft-offline is allowed to migrate movable_ops
>>> + * pages even though they are not on the LRU.
>>> + */
>>> +static inline bool HWPoisonKernelOwned(struct page *page, unsigned long flags)
>>> +{
>>> +	struct page *head;
>>> +
>>> +	if ((flags & MF_SOFT_OFFLINE) && page_has_movable_ops(page))
>>> +		return false;
>>> +
>>
>> On a second look: Do we really need that? The page types below never support
>> migration. So I guess that check is not required?
>>
>> Apart from that, looks good with two comments:
>>
>> a) HWPoisonKernelOwned: this is not the common style for us to name functions.
>>
>> is_kernel_owned_page() or sth like that would do.
> 
> Ack, I will rename it is_kernel_owned_page()
> 
> In my defence, most of the functions similar to HWPoisonKernelOwned()
> has this name format, and I got this discussion earlier (with Lance?
> I think). Here are the similar function names in that file:
> 
>  * HWPoisonHandlable
>  * PageHWPoisonTakenOff()
>  * SetPageHWPoisonTakenOff

Some of these probably date back to our old way of handling page flags and
things, like PageLRU.

But we really should stop :)

> 
> I will update in the new version.

Thanks! Probably best to wait a bit, the merge window is coming up either way,
so this will have to wait a bit either way.

-- 
Cheers,

David

^ permalink raw reply

* Re: [PATCH v4 2/2] iio: dac: Add AD5529R DAC driver support
From: Andy Shevchenko @ 2026-06-09 18:44 UTC (permalink / raw)
  To: Janani Sunil
  Cc: Lars-Peter Clausen, Michael Hennerich, Jonathan Cameron,
	David Lechner, Nuno Sá, Andy Shevchenko, Rob Herring,
	Krzysztof Kozlowski, Conor Dooley, Philipp Zabel, Jonathan Corbet,
	Shuah Khan, linux-iio, devicetree, linux-kernel, linux-doc,
	Janani Sunil
In-Reply-To: <20260609-ad5529r-driver-v4-2-2e4c02234a1a@analog.com>

On Tue, Jun 09, 2026 at 05:00:21PM +0200, Janani Sunil wrote:
> Add support for AD5529R 16-channel, 12/16 bit Digital to Analog Converter

Looks like half-dropped sentence...


> +static const struct regmap_range ad5529r_16bit_read_only_ranges[] = {
> +	regmap_reg_range(AD5529R_REG_DAC_DATA_READBACK_BASE,
> +			 (AD5529R_REG_DAC_DATA_READBACK_BASE + 15 * 2)),

Too many parentheses.

> +	regmap_reg_range(AD5529R_REG_TSENS_ALERT_FLAG, AD5529R_REG_TSENS_SHTD_FLAG),
> +	regmap_reg_range(AD5529R_REG_FUNC_BUSY, AD5529R_REG_FUNC_BUSY),
> +	regmap_reg_range(AD5529R_REG_INIT_CRC_ERR_STAT, AD5529R_REG_INIT_CRC_ERR_STAT),
> +};

...

> +static int ad5529r_reset(struct ad5529r_state *st)
> +{
> +	struct reset_control *rst;
> +	int ret;
> +
> +	rst = devm_reset_control_get_optional_exclusive(&st->spi->dev, NULL);
> +	if (IS_ERR(rst))
> +		return PTR_ERR(rst);
> +
> +	if (rst) {
> +		ret = reset_control_deassert(rst);
> +		if (ret)
> +			return ret;
> +	} else {
> +		ret = regmap_write(st->regmap_8bit, AD5529R_REG_INTERFACE_CONFIG_A,
> +				   AD5529R_INTERFACE_CONFIG_A_SW_RESET);
> +		if (ret)
> +			return ret;
> +	}
> +
> +	/*
> +	 * Wait 10 ms for digital initialization to complete.
> +	 * Per datasheet, Interface Status A register NOT_READY_ERR bit is
> +	 * set if SPI transactions are attempted before digital initialization
> +	 * completes.
> +	 */
> +	fsleep(10000);

10 * USEC_PER_MSEC

(needs time.h if not yet included).

> +	return regmap_write(st->regmap_8bit, AD5529R_REG_INTERFACE_CONFIG_A,
> +			    AD5529R_INTERFACE_CONFIG_A_SDO_ENABLE |
> +			    AD5529R_INTERFACE_CONFIG_A_ADDR_ASCENSION);
> +}

...

> +	case IIO_CHAN_INFO_OFFSET:
> +		range_idx = st->output_range_idx[chan->channel];
> +
> +		if (ad5529r_output_ranges_mv[range_idx][0] < 0)
> +			*val = -(1 << (st->model_data->resolution - 1));

From the C standard point of view this is problematic code in case
if resolution == 32, but I believe it won't happen IRL.

> +		else
> +			*val = 0;
> +
> +		return IIO_VAL_INT;

...

> +static int ad5529r_find_output_range(const s32 *vals)
> +{
> +	for (unsigned int i = 0; i < ARRAY_SIZE(ad5529r_output_ranges_mv); i++) {
> +		if (vals[0] == ad5529r_output_ranges_mv[i][0] * 1000 &&
> +		    vals[1] == ad5529r_output_ranges_mv[i][1] * 1000)

Make suffix of the array _mV (yes, like in SI).

> +			return i;
> +	}
> +
> +	return -EINVAL;
> +}

...

> +static int ad5529r_parse_channel_ranges(struct device *dev,
> +					struct ad5529r_state *st)
> +{
> +	int ret, range_idx;
> +	u32 ch;
> +	s32 vals[2];

Reversed xmas tree order.

> +	device_for_each_child_node_scoped(dev, child) {
> +		range_idx = AD5529R_RANGE_0V_5V;
> +
> +		ret = fwnode_property_read_u32(child, "reg", &ch);
> +		if (ret)
> +			return dev_err_probe(dev, ret,
> +					     "Missing reg property in channel node\n");
> +
> +		if (ch >= 16)
> +			return dev_err_probe(dev, -EINVAL,
> +					     "Invalid channel number: %u\n", ch);
> +
> +		/* Read u32 property into s32 to handle negative voltage ranges */
> +		if (!fwnode_property_read_u32_array(child,
> +						    "adi,output-range-microvolt",
> +						    (u32 *)vals, ARRAY_SIZE(vals))) {

Yeah... Do you know how negative numbers will look in DT? Just as regular
negative integers and DT compiler will take care to pack them in unsigned
variable?

> +			range_idx = ad5529r_find_output_range(vals);
> +			if (range_idx < 0)
> +				return dev_err_probe(dev, range_idx,
> +						     "Invalid range [%d %d] for ch %u\n",
> +						     vals[0], vals[1], ch);
> +		}
> +
> +		st->output_range_idx[ch] = range_idx;
> +		ret = regmap_write(st->regmap_16bit,
> +				   AD5529R_REG_OUT_RANGE(ch), range_idx);
> +		if (ret)
> +			return dev_err_probe(dev, ret,
> +					     "Failed to configure range for ch %u\n",
> +					     ch);
> +	}
> +
> +	return 0;
> +}

...

> +static int ad5529r_probe(struct spi_device *spi)
> +{
> +	struct device *dev = &spi->dev;
> +	struct iio_dev *indio_dev;
> +	struct ad5529r_state *st;
> +	bool external_vref;
> +	int ret;
> +
> +	indio_dev = devm_iio_device_alloc(dev, sizeof(*st));
> +	if (!indio_dev)
> +		return -ENOMEM;
> +
> +	st = iio_priv(indio_dev);
> +
> +	st->spi = spi;
> +
> +	st->model_data = spi_get_device_match_data(spi);
> +	if (!st->model_data)
> +		return dev_err_probe(dev, -EINVAL, "Failed to identify device variant\n");
> +
> +	ret = devm_regulator_bulk_get_enable(dev, ARRAY_SIZE(ad5529r_supply_names),
> +					     ad5529r_supply_names);
> +	if (ret)
> +		return dev_err_probe(dev, ret,
> +				     "Failed to get and enable regulators\n");
> +
> +	ret = devm_regulator_get_enable_optional(dev, "hvss");
> +	if (ret && ret != -ENODEV)
> +		return dev_err_probe(dev, ret,
> +				     "Failed to get and enable hvss regulator\n");
> +
> +	/*
> +	 * The datasheet mentions a 4.096V external reference for correct
> +	 * operation.
> +	 */
> +	ret = devm_regulator_get_enable_optional(dev, "vref");
> +	if (ret && ret != -ENODEV)
> +		return dev_err_probe(dev, ret,
> +				     "Failed to get and enable vref regulator\n");
> +
> +	external_vref = ret != -ENODEV;

This can be written as

	if (ret == -ENODEV)
		external_vref = false;
	else if (ret)
		return dev_err_probe(dev, ret,
				     "Failed to get and enable vref regulator\n");
	else
		external_vref = true;

> +	st->regmap_8bit = devm_regmap_init_spi(spi, &ad5529r_regmap_8bit_config);
> +	if (IS_ERR(st->regmap_8bit))
> +		return dev_err_probe(dev, PTR_ERR(st->regmap_8bit),
> +				     "Failed to initialize 8-bit regmap\n");
> +
> +	st->regmap_16bit = devm_regmap_init_spi(spi, &ad5529r_regmap_16bit_config);
> +	if (IS_ERR(st->regmap_16bit))
> +		return dev_err_probe(dev, PTR_ERR(st->regmap_16bit),
> +				     "Failed to initialize 16-bit regmap\n");
> +
> +	ret = ad5529r_reset(st);
> +	if (ret)
> +		return dev_err_probe(dev, ret, "Failed to reset device\n");
> +
> +	ret = regmap_assign_bits(st->regmap_16bit, AD5529R_REG_REF_SEL,
> +				 AD5529R_REF_SEL_INTERNAL_REF,
> +				 external_vref ? 0 : AD5529R_REF_SEL_INTERNAL_REF);
> +	if (ret)
> +		return dev_err_probe(dev, ret, "Failed to configure reference\n");
> +
> +	ret = ad5529r_parse_channel_ranges(dev, st);
> +	if (ret)
> +		return ret;
> +
> +	indio_dev->name = st->model_data->model_name;
> +	indio_dev->info = &ad5529r_info;
> +	indio_dev->modes = INDIO_DIRECT_MODE;
> +	indio_dev->channels = st->model_data->channels;
> +	indio_dev->num_channels = st->model_data->num_channels;
> +
> +	return devm_iio_device_register(dev, indio_dev);
> +}

-- 
With Best Regards,
Andy Shevchenko



^ permalink raw reply

* Re: [PATCH v5 00/34] Cleaning up the KVM clock mess
From: David Woodhouse @ 2026-06-09 18:50 UTC (permalink / raw)
  To: Paolo Bonzini, Jonathan Corbet, Shuah Khan, Sean Christopherson,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Vitaly Kuznetsov, Juergen Gross, Boris Ostrovsky,
	Paul Durrant, Jonathan Cameron, Sascha Bischoff, Marc Zyngier,
	Joey Gouly, Jack Allister, Dongli Zhang, joe.jin, kvm, linux-doc,
	linux-kernel, xen-devel, linux-kselftest
In-Reply-To: <20260608145455.89187-1-dwmw2@infradead.org>

[-- Attachment #1: Type: text/plain, Size: 1716 bytes --]

On Mon, 2026-06-08 at 15:47 +0100, David Woodhouse wrote:
> This is v5 of the series to clean up the KVM clock, rebased onto
> tip/timers/ptp (which now includes Thomas's ktime snapshot series and
> the read_snapshot patches for hyperv, kvmclock, and vmclock).
> 
> The KVM clock has historically suffered from three problems:
> 
>  1. Imprecision: get_kvmclock_ns() computed the clock from the *host*
>     TSC without applying guest TSC scaling, causing systemic drift from
>     the values the guest computes from its own TSC.
> 
>  2. Unnecessary discontinuities: gratuitous KVM_REQ_MASTERCLOCK_UPDATE
>     requests caused the master clock reference point to be re-snapshotted,
>     yanking the guest's clock due to arithmetic precision differences.
> 
>  3. No precise migration API: the existing KVM_[GS]ET_CLOCK only allows
>     setting the clock at a given UTC reference time, which is necessarily
>     imprecise. There was no way to preserve the exact arithmetic
>     relationship between guest TSC and KVM clock across live migration.
> 
> This series addresses all three, and adds new APIs for precise clock 
> migration and TSC frequency reporting. As an added bonus, it now rips
> out the whole pvclock_gtod_data hack which was shadowing the kernel's
> timekeeping, and uses ktime snapshots as $DEITY (well, Thomas) intended.

Updated series with Sashiko's fixes pushed out to
https://git.infradead.org/?p=users/dwmw2/linux.git;a=shortlog;h=refs/heads/kvmclock6

I'll give it a while for meat-based reviewers to opine before sending
another round. (Is there a way to get Sashiko to take another look
without sending it all to the list again?)

[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5069 bytes --]

^ permalink raw reply

* [PATCH net-next] docs: networking: add guidance on what to push via extack
From: Jakub Kicinski @ 2026-06-09 19:09 UTC (permalink / raw)
  To: davem
  Cc: netdev, edumazet, pabeni, andrew+netdev, horms, Jakub Kicinski,
	corbet, skhan, linux-doc

Every now and then someone tries to duplicated extack
messages to dmesg. Document our guidance against this.
Also indicate that system level faults should continue
to go to system logs. The high level thinking is to try
to distinguish between what's important to the user vs
system admin.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
CC: corbet@lwn.net
CC: skhan@linuxfoundation.org
CC: linux-doc@vger.kernel.org
---
 Documentation/networking/driver.rst | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/Documentation/networking/driver.rst b/Documentation/networking/driver.rst
index 195a916dc0de..abd366dd5e43 100644
--- a/Documentation/networking/driver.rst
+++ b/Documentation/networking/driver.rst
@@ -128,3 +128,16 @@ to be freed up.
 If you return NETDEV_TX_BUSY from the ndo_start_xmit method, you
 must not keep any reference to that SKB and you must not attempt
 to free it up.
+
+Error message reporting
+=======================
+
+Number of driver configuration interfaces pass a Netlink extended ACK
+(``extack``) object to the driver (either directly as an argument or
+as a member of a parameter struct). The drivers should try to report
+most errors via the ``extack`` object. System level exceptions,
+indicating that system or device is misbehaving or is in bad state
+should continue to be reported to system logs.
+
+Messages should be passed **either** via ``extack`` **or** to system logs.
+Drivers should not try to report the same information to both.
-- 
2.54.0


^ permalink raw reply related

* Re: [PATCH v6 08/12] PCI: liveupdate: Inherit ACS flags in incoming preserved devices
From: Pranjal Shrivastava @ 2026-06-09 19:25 UTC (permalink / raw)
  To: David Matlack
  Cc: kexec, linux-doc, linux-kernel, linux-mm, linux-pci,
	Adithya Jayachandran, Alexander Graf, Alex Williamson,
	Bjorn Helgaas, Chris Li, David Rientjes, Jacob Pan,
	Jason Gunthorpe, Jonathan Corbet, Josh Hilke, Leon Romanovsky,
	Lukas Wunner, Mike Rapoport, Parav Pandit, Pasha Tatashin,
	Pratyush Yadav, Saeed Mahameed, Samiullah Khawaja, Shuah Khan,
	Vipin Sharma, William Tu, Yi Liu
In-Reply-To: <CALzav=dagHvcS8kbTti5rmMoks9DXuCpO3AjptkQ8z_PdG9JyQ@mail.gmail.com>

On Tue, Jun 09, 2026 at 11:40:39AM -0700, David Matlack wrote:
> On Tue, Jun 9, 2026 at 10:20 AM Pranjal Shrivastava <praan@google.com> wrote:
> >
> > On Mon, Jun 08, 2026 at 09:56:41PM +0000, David Matlack wrote:
> > > On 2026-06-07 08:37 PM, Pranjal Shrivastava wrote:
> > > > On Fri, May 22, 2026 at 08:24:06PM +0000, David Matlack wrote:
> > > > > Inherit Access Control Services (ACS) flags on all incoming preserved
> > > > > devices (endpoints and upstream bridges) during a Live Update.
> > > > >
> > > > > Inheriting ACS flags avoids changing routing rules while memory
> > > > > transactions are in flight from preserved devices. This is also strictly
> > > > > necessary to ensure that IOMMU group assignments do not change across
> > > > > a Live Update for preserved devices, as changing ACS configurations can
> > > > > split or merge IOMMU groups.
> > > > >
> > > > > Cache the inherited ACS controls established by the previous kernel in
> > > > > struct pci_dev so that ACS controls do not change after a reset
> > > > > (pci_restore_state() calls pci_enable_acs()).
> > > > >
> > > > > To simplify ACS inheritance, reject preserving any devices that require
> > > > > quirks to enable ACS as those quirks would also have to take Live Update
> > > > > into account.
> > > > >
> > > > > Signed-off-by: David Matlack <dmatlack@google.com>
> > > > > ---
> > > > >  drivers/pci/liveupdate.c       | 68 ++++++++++++++++++++++++++++++++++
> > > > >  drivers/pci/liveupdate.h       | 11 ++++++
> > > > >  drivers/pci/pci.c              |  5 +++
> > > > >  drivers/pci/pci.h              |  5 +++
> > > > >  drivers/pci/quirks.c           |  7 ++++
> > > > >  include/linux/pci_liveupdate.h |  6 +++
> > > > >  6 files changed, 102 insertions(+)
> > > > >
> > > >
> > > > [...]
> > > >
> > > > >
> > > > > +void pci_liveupdate_init_acs(struct pci_dev *dev)
> > > > > +{
> > > > > + guard(rwsem_read)(&pci_liveupdate.rwsem);
> > > > > +
> > > > > + if (!dev->acs_cap || !dev->liveupdate.incoming)
> > > > > +         return;
> > > > > +
> > > > > + pci_read_config_word(dev, dev->acs_cap + PCI_ACS_CTRL, &dev->liveupdate.acs_ctrl);
> > > >
> > > > I might be thinking out loud here, but as an attacker, this motivates me
> > > > to somehow hack the EP FW to mis-report the PCI_ACS_CTRL register across
> > > > a liveupdate to fool the incoming kernel. If the FW feeds a 0, it silently
> > > > strips ACS protections.
> > > >
> > > > Should we also serialize ACS state in ser somehow to ensure we aren't
> > > > fooled by something like this?
> > >
> > > What does "EP FW" mean?
> >
> > I was referring to the Endpoint Firmware (basically any SW running on
> > a downstream device)
> >
> > >
> > > Does such an attacker even need Live Update to attack the system? It
> > > seems like such an attacker could route TLPs in whatever malicious way
> > > they want regardless of Live Update.
> > >
> >
> > I agree that compromised PCIe devices are a menace anyway. But I was
> > talking about the potential window opened up by Live Update here,
> > suppose we have Device A & B assigned to 2 different VMs (implying they
> > are in separate IOMMU groups because the switch set ACS_RR = 1).
> >
> > Now, the attacker has an opportunity with Liveupdate, since the devices
> > are already assigned, if *somehow* it flips a bit like ACS_RR, the
> > incoming kernel might see both the devices in the same IOMMU group.
> > Who detects this case and what happens if this happens if the devices
> > are kept assigned to these VMs?
> 
> I suspect that would be caught during the restore of the iommufds to
> which those devices are attached.
> 
> The kernel would attempt to restore each device into a separate domain
> (since that's how they were preserved before the Live Update) but that
> will fail because they are in the same group now. Even if one of the
> devices was not preserved, that will still cause a failure when a user
> tries to start using that device (e.g. to try to attach it to a
> different VM).

Yes, IOMMU would eventually catch-up but what about the DMAs that were
done already? Say to an NVMe disk? We'll have to wipe the entire disk in
such a case? Also, we wouldn't know the offending device..

If such situations aren't a problem, then I guess it's fine.

Thanks,
Praan


^ permalink raw reply

* [PATCH v3 2/3] hwmon: Add update_interval_us chip attribute
From: Ferdinand Schwenk via B4 Relay @ 2026-06-09 19:43 UTC (permalink / raw)
  To: Guenter Roeck, Jonathan Corbet, Shuah Khan
  Cc: linux-hwmon, linux-kernel, linux-doc, richard.leitner,
	Ferdinand Schwenk
In-Reply-To: <20260609-hwmon-ina238-update-interval-us-v2-v3-0-016b55567950@advastore.com>

From: Ferdinand Schwenk <ferdinand.schwenk@advastore.com>

Some hardware monitoring chips support update intervals below one
millisecond. The existing update_interval attribute uses millisecond
granularity, which causes sub-millisecond steps to round to the same
value and become inaccessible from userspace.

Introduce update_interval_us, a companion chip-level attribute that
expresses the same update interval in microseconds. Drivers
implementing this attribute should also implement update_interval for
compatibility with millisecond-based userspace interfaces.

Signed-off-by: Ferdinand Schwenk <ferdinand.schwenk@advastore.com>
---
 Documentation/ABI/testing/sysfs-class-hwmon | 14 ++++++++++++++
 Documentation/hwmon/sysfs-interface.rst     |  4 ++++
 drivers/hwmon/hwmon.c                       |  1 +
 include/linux/hwmon.h                       |  2 ++
 4 files changed, 21 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-class-hwmon b/Documentation/ABI/testing/sysfs-class-hwmon
index cfd0d0bab483..b185bdfc7186 100644
--- a/Documentation/ABI/testing/sysfs-class-hwmon
+++ b/Documentation/ABI/testing/sysfs-class-hwmon
@@ -27,6 +27,20 @@ Description:
 		Some devices have a variable update rate or interval.
 		This attribute can be used to change it to the desired value.
 
+What:		/sys/class/hwmon/hwmonX/update_interval_us
+Description:
+		The interval at which the chip will update readings,
+		expressed in microseconds.
+		Unit: microsecond
+
+		RW
+
+		Some devices have a variable update rate or interval and
+		require finer-than-millisecond control.
+		This attribute can be used to change it to the desired value.
+		Drivers implementing this attribute should also implement
+		update_interval for millisecond-based userspace interfaces.
+
 What:		/sys/class/hwmon/hwmonX/inY_min
 Description:
 		Voltage min value.
diff --git a/Documentation/hwmon/sysfs-interface.rst b/Documentation/hwmon/sysfs-interface.rst
index f76e9f8cc1ad..94e1bbce172a 100644
--- a/Documentation/hwmon/sysfs-interface.rst
+++ b/Documentation/hwmon/sysfs-interface.rst
@@ -106,6 +106,10 @@ Global attributes
 `update_interval`
 		The interval at which the chip will update readings.
 
+`update_interval_us`
+		The interval at which the chip will update readings,
+		expressed in microseconds for finer resolution.
+
 
 ********
 Voltages
diff --git a/drivers/hwmon/hwmon.c b/drivers/hwmon/hwmon.c
index 9695dca62a7e..c481bb535e35 100644
--- a/drivers/hwmon/hwmon.c
+++ b/drivers/hwmon/hwmon.c
@@ -572,6 +572,7 @@ static const char * const hwmon_chip_attrs[] = {
 	[hwmon_chip_curr_reset_history] = "curr_reset_history",
 	[hwmon_chip_power_reset_history] = "power_reset_history",
 	[hwmon_chip_update_interval] = "update_interval",
+	[hwmon_chip_update_interval_us] = "update_interval_us",
 	[hwmon_chip_alarms] = "alarms",
 	[hwmon_chip_samples] = "samples",
 	[hwmon_chip_curr_samples] = "curr_samples",
diff --git a/include/linux/hwmon.h b/include/linux/hwmon.h
index 301a83afbd66..5a5882fee299 100644
--- a/include/linux/hwmon.h
+++ b/include/linux/hwmon.h
@@ -39,6 +39,7 @@ enum hwmon_chip_attributes {
 	hwmon_chip_power_reset_history,
 	hwmon_chip_register_tz,
 	hwmon_chip_update_interval,
+	hwmon_chip_update_interval_us,
 	hwmon_chip_alarms,
 	hwmon_chip_samples,
 	hwmon_chip_curr_samples,
@@ -55,6 +56,7 @@ enum hwmon_chip_attributes {
 #define HWMON_C_POWER_RESET_HISTORY	BIT(hwmon_chip_power_reset_history)
 #define HWMON_C_REGISTER_TZ		BIT(hwmon_chip_register_tz)
 #define HWMON_C_UPDATE_INTERVAL		BIT(hwmon_chip_update_interval)
+#define HWMON_C_UPDATE_INTERVAL_US	BIT(hwmon_chip_update_interval_us)
 #define HWMON_C_ALARMS			BIT(hwmon_chip_alarms)
 #define HWMON_C_SAMPLES			BIT(hwmon_chip_samples)
 #define HWMON_C_CURR_SAMPLES		BIT(hwmon_chip_curr_samples)

-- 
2.54.0



^ permalink raw reply related

* [PATCH v3 0/3] hwmon: ina238: add samples and update_interval_us support
From: Ferdinand Schwenk via B4 Relay @ 2026-06-09 19:43 UTC (permalink / raw)
  To: Guenter Roeck, Jonathan Corbet, Shuah Khan
  Cc: linux-hwmon, linux-kernel, linux-doc, richard.leitner,
	Ferdinand Schwenk

The INA238 family exposes ADC averaging and conversion timing controls
through ADC_CONFIG. Add support for the samples and update_interval
chip attributes and introduce a generic update_interval_us companion
attribute for devices that need sub-millisecond resolution.

The shortest INA238 conversion time steps are below 1 ms, so several
valid hardware settings collapse to the same update_interval value when
reported only in milliseconds. Keep update_interval in milliseconds as
required by the existing hwmon ABI and provide update_interval_us to
report and program the same total conversion cycle time with microsecond
resolution.

Patch 1 adds samples and update_interval support for the INA238 family.
Patch 2 adds the generic hwmon update_interval_us attribute and
documents it.
Patch 3 wires the new attribute up in the INA238 driver.

Link: https://lore.kernel.org/all/20260522-hwmon-ina238-add-samples-update-interval-v1-0-e1acfceb447e@advastore.com/
---
v2:
- keep update_interval in milliseconds to preserve the existing ABI
- add the generic update_interval_us hwmon chip attribute and documentation
- implement update_interval_us for ina238
- report and program intervals using the active averaging count

v3:
- add missing Signed-off-by trailer to all patches
- address truncation concern by tightening update_interval input clamping in patch 1/3
- preserve equivalent clamping in patch 3/3 after introducing update_interval_us
- Link to v2: https://lore.kernel.org/r/20260608-hwmon-ina238-update-interval-us-v2-v2-0-2d939fbb2ea1@advastore.com

---
Ferdinand Schwenk (3):
      hwmon: ina238: add support for samples and update_interval
      hwmon: Add update_interval_us chip attribute
      hwmon: ina238: add update_interval_us attribute

 Documentation/ABI/testing/sysfs-class-hwmon |  14 +++
 Documentation/hwmon/ina238.rst              |   4 +
 Documentation/hwmon/sysfs-interface.rst     |   4 +
 drivers/hwmon/hwmon.c                       |   1 +
 drivers/hwmon/ina238.c                      | 162 +++++++++++++++++++++++++++-
 include/linux/hwmon.h                       |   2 +
 6 files changed, 185 insertions(+), 2 deletions(-)
---
base-commit: 028ef9c96e96197026887c0f092424679298aae8
change-id: 20260608-hwmon-ina238-update-interval-us-v2-13448de0a083

Best regards,
-- 
Ferdinand Schwenk <ferdinand.schwenk@advastore.com>

^ permalink raw reply

* [PATCH v3 1/3] hwmon: ina238: add support for samples and update_interval
From: Ferdinand Schwenk via B4 Relay @ 2026-06-09 19:43 UTC (permalink / raw)
  To: Guenter Roeck, Jonathan Corbet, Shuah Khan
  Cc: linux-hwmon, linux-kernel, linux-doc, richard.leitner,
	Ferdinand Schwenk
In-Reply-To: <20260609-hwmon-ina238-update-interval-us-v2-v3-0-016b55567950@advastore.com>

From: Ferdinand Schwenk <ferdinand.schwenk@advastore.com>

Expose INA238 ADC averaging count (AVG) and conversion timing
(VBUSCT/VSHCT/VTCT) through chip-level hwmon attributes:

  chip/samples
  chip/update_interval

Use per-chip conversion-time lookup tables so the same helpers work
for INA228/INA237/INA238/INA700/INA780 and SQ52206. Cache ADC_CONFIG
in driver data and update it on writes to avoid extra register reads
during read-modify-write updates.

Report update_interval in milliseconds as required by the hwmon ABI.
Compute it from raw ADC cycle time multiplied by the active averaging
count, and apply the inverse mapping on writes so programmed conversion
time tracks the selected sample count.

Clamp user-provided update_interval before unit scaling to prevent
overflow in arithmetic conversions.

Also combine chip attributes in HWMON_CHANNEL_INFO using a bitwise OR
for a single logical chip channel.

Signed-off-by: Ferdinand Schwenk <ferdinand.schwenk@advastore.com>
---
 drivers/hwmon/ina238.c | 144 ++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 142 insertions(+), 2 deletions(-)

diff --git a/drivers/hwmon/ina238.c b/drivers/hwmon/ina238.c
index ff67b03189f7..dc5dd3ad2557 100644
--- a/drivers/hwmon/ina238.c
+++ b/drivers/hwmon/ina238.c
@@ -15,6 +15,7 @@
 #include <linux/module.h>
 #include <linux/of.h>
 #include <linux/regmap.h>
+#include <linux/util_macros.h>
 
 /* INA238 register definitions */
 #define INA238_CONFIG			0x0
@@ -49,6 +50,17 @@
 #define INA238_DIAG_ALERT_BUSUL		BIT(3)
 #define INA238_DIAG_ALERT_POL		BIT(2)
 
+/* INA238_ADC_CONFIG register field masks and shifts */
+#define INA238_ADC_CONFIG_MODE_MASK	GENMASK(15, 12)
+#define INA238_ADC_CONFIG_VBUSCT_SHIFT	9
+#define INA238_ADC_CONFIG_VBUSCT_MASK	GENMASK(11, 9)
+#define INA238_ADC_CONFIG_VSHCT_SHIFT	6
+#define INA238_ADC_CONFIG_VSHCT_MASK	GENMASK(8, 6)
+#define INA238_ADC_CONFIG_VTCT_SHIFT	3
+#define INA238_ADC_CONFIG_VTCT_MASK	GENMASK(5, 3)
+#define INA238_ADC_CONFIG_AVG_SHIFT	0
+#define INA238_ADC_CONFIG_AVG_MASK	GENMASK(2, 0)
+
 #define INA238_REGISTERS		0x20
 
 #define INA238_RSHUNT_DEFAULT		2500	/* uOhm */
@@ -101,6 +113,21 @@ static const struct regmap_config ina238_regmap_config = {
 	.val_bits = 16,
 };
 
+/* Lookup table for conversion times in usec for INA238 family */
+static const u16 ina238_conv_time[] = {
+	50, 84, 150, 280, 540, 1052, 2074, 4120,
+};
+
+/* Lookup table for conversion times in usec for SQ52206 */
+static const u16 sq52206_conv_time[] = {
+	66, 118, 310, 566, 1070, 2090, 4140, 8230,
+};
+
+/* Lookup table for number of samples used in averaging mode */
+static const int ina238_avg_samples[] = {
+	1, 4, 16, 64, 128, 256, 512, 1024,
+};
+
 enum ina238_ids { ina228, ina237, ina238, ina700, ina780, sq52206 };
 
 struct ina238_config {
@@ -112,6 +139,7 @@ struct ina238_config {
 	u32 power_calculate_factor;	/* fixed parameter for power calculation, from datasheet */
 	u32 bus_voltage_lsb;		/* bus voltage LSB, in nV */
 	int current_lsb;		/* current LSB, in uA */
+	const u16 *conv_time;		/* conversion time lookup table */
 };
 
 struct ina238_data {
@@ -124,6 +152,7 @@ struct ina238_data {
 	int current_lsb;		/* current LSB, in uA */
 	int power_lsb;			/* power LSB, in uW */
 	int energy_lsb;			/* energy LSB, in uJ */
+	u16 adc_config;			/* cached ADC_CONFIG register value */
 };
 
 static const struct ina238_config ina238_config[] = {
@@ -135,6 +164,7 @@ static const struct ina238_config ina238_config[] = {
 		.config_default = INA238_CONFIG_DEFAULT,
 		.bus_voltage_lsb = INA238_BUS_VOLTAGE_LSB,
 		.temp_resolution = 16,
+		.conv_time = ina238_conv_time,
 	},
 	[ina237] = {
 		.has_20bit_voltage_current = false,
@@ -144,6 +174,7 @@ static const struct ina238_config ina238_config[] = {
 		.config_default = INA238_CONFIG_DEFAULT,
 		.bus_voltage_lsb = INA238_BUS_VOLTAGE_LSB,
 		.temp_resolution = 12,
+		.conv_time = ina238_conv_time,
 	},
 	[ina238] = {
 		.has_20bit_voltage_current = false,
@@ -153,6 +184,7 @@ static const struct ina238_config ina238_config[] = {
 		.config_default = INA238_CONFIG_DEFAULT,
 		.bus_voltage_lsb = INA238_BUS_VOLTAGE_LSB,
 		.temp_resolution = 12,
+		.conv_time = ina238_conv_time,
 	},
 	[ina700] = {
 		.has_20bit_voltage_current = false,
@@ -163,6 +195,7 @@ static const struct ina238_config ina238_config[] = {
 		.bus_voltage_lsb = INA238_BUS_VOLTAGE_LSB,
 		.temp_resolution = 12,
 		.current_lsb = 480,
+		.conv_time = ina238_conv_time,
 	},
 	[ina780] = {
 		.has_20bit_voltage_current = false,
@@ -173,6 +206,7 @@ static const struct ina238_config ina238_config[] = {
 		.bus_voltage_lsb = INA238_BUS_VOLTAGE_LSB,
 		.temp_resolution = 12,
 		.current_lsb = 2400,
+		.conv_time = ina238_conv_time,
 	},
 	[sq52206] = {
 		.has_20bit_voltage_current = false,
@@ -182,6 +216,7 @@ static const struct ina238_config ina238_config[] = {
 		.config_default = SQ52206_CONFIG_DEFAULT,
 		.bus_voltage_lsb = SQ52206_BUS_VOLTAGE_LSB,
 		.temp_resolution = 16,
+		.conv_time = sq52206_conv_time,
 	},
 };
 
@@ -261,6 +296,97 @@ static int ina228_read_voltage(struct ina238_data *data, int channel, long *val)
 	return 0;
 }
 
+/* Converting ADC_CONFIG register value to update_interval in usec */
+static inline u32 ina238_reg_to_interval_us(struct ina238_data *data)
+{
+	const u16 *ct = data->config->conv_time;
+	u32 vbusct = ct[(data->adc_config & INA238_ADC_CONFIG_VBUSCT_MASK) >>
+			INA238_ADC_CONFIG_VBUSCT_SHIFT];
+	u32 vshct  = ct[(data->adc_config & INA238_ADC_CONFIG_VSHCT_MASK) >>
+			INA238_ADC_CONFIG_VSHCT_SHIFT];
+	u32 vtct   = ct[(data->adc_config & INA238_ADC_CONFIG_VTCT_MASK) >>
+			INA238_ADC_CONFIG_VTCT_SHIFT];
+
+	return vbusct + vshct + vtct;
+}
+
+static inline u32 ina238_samples(struct ina238_data *data)
+{
+	return ina238_avg_samples[(data->adc_config & INA238_ADC_CONFIG_AVG_MASK) >>
+				  INA238_ADC_CONFIG_AVG_SHIFT];
+}
+
+/* Converting update_interval in msec to a single conversion time in usec */
+static inline u32 ina238_interval_ms_to_conv_time(long interval, u32 samples)
+{
+	u64 interval_us;
+
+	interval = clamp_val(interval, 0, INT_MAX / 1000);
+	interval_us = (u64)interval * 1000;
+
+	/*
+	 * update_interval reports the ADC cycle time including averaging.
+	 * The target per-field conversion time is interval_us / (samples * 3).
+	 */
+	return DIV_ROUND_CLOSEST_ULL(interval_us, samples * 3);
+}
+
+static int ina238_read_chip(struct device *dev, u32 attr, long *val)
+{
+	struct ina238_data *data = dev_get_drvdata(dev);
+
+	switch (attr) {
+	case hwmon_chip_samples:
+		*val = ina238_samples(data);
+		return 0;
+	case hwmon_chip_update_interval:
+		/* Return in msec */
+		*val = DIV_ROUND_CLOSEST(ina238_reg_to_interval_us(data) *
+					ina238_samples(data), 1000);
+		return 0;
+	default:
+		return -EOPNOTSUPP;
+	}
+}
+
+static int ina238_write_chip(struct device *dev, u32 attr, long val)
+{
+	struct ina238_data *data = dev_get_drvdata(dev);
+	u16 adc_config;
+	int idx, ret;
+
+	switch (attr) {
+	case hwmon_chip_samples:
+		idx = find_closest(val, ina238_avg_samples,
+				   ARRAY_SIZE(ina238_avg_samples));
+		adc_config = (data->adc_config & ~INA238_ADC_CONFIG_AVG_MASK) |
+			     (idx << INA238_ADC_CONFIG_AVG_SHIFT);
+		ret = regmap_write(data->regmap, INA238_ADC_CONFIG, adc_config);
+		if (ret)
+			return ret;
+		data->adc_config = adc_config;
+		return 0;
+	case hwmon_chip_update_interval:
+		val = ina238_interval_ms_to_conv_time(val, ina238_samples(data));
+		idx = find_closest(val, data->config->conv_time,
+				   ARRAY_SIZE(ina238_conv_time));
+		adc_config = (data->adc_config &
+			      ~(INA238_ADC_CONFIG_VBUSCT_MASK |
+				INA238_ADC_CONFIG_VSHCT_MASK |
+				INA238_ADC_CONFIG_VTCT_MASK)) |
+			     ((u16)idx << INA238_ADC_CONFIG_VBUSCT_SHIFT) |
+			     ((u16)idx << INA238_ADC_CONFIG_VSHCT_SHIFT) |
+			     ((u16)idx << INA238_ADC_CONFIG_VTCT_SHIFT);
+		ret = regmap_write(data->regmap, INA238_ADC_CONFIG, adc_config);
+		if (ret)
+			return ret;
+		data->adc_config = adc_config;
+		return 0;
+	default:
+		return -EOPNOTSUPP;
+	}
+}
+
 static int ina238_read_in(struct device *dev, u32 attr, int channel,
 			  long *val)
 {
@@ -587,6 +713,8 @@ static int ina238_read(struct device *dev, enum hwmon_sensor_types type,
 		       u32 attr, int channel, long *val)
 {
 	switch (type) {
+	case hwmon_chip:
+		return ina238_read_chip(dev, attr, val);
 	case hwmon_in:
 		return ina238_read_in(dev, attr, channel, val);
 	case hwmon_curr:
@@ -607,6 +735,8 @@ static int ina238_write(struct device *dev, enum hwmon_sensor_types type,
 			u32 attr, int channel, long val)
 {
 	switch (type) {
+	case hwmon_chip:
+		return ina238_write_chip(dev, attr, val);
 	case hwmon_in:
 		return ina238_write_in(dev, attr, channel, val);
 	case hwmon_curr:
@@ -629,6 +759,14 @@ static umode_t ina238_is_visible(const void *drvdata,
 	bool has_energy = data->config->has_energy;
 
 	switch (type) {
+	case hwmon_chip:
+		switch (attr) {
+		case hwmon_chip_samples:
+		case hwmon_chip_update_interval:
+			return 0644;
+		default:
+			return 0;
+		}
 	case hwmon_in:
 		switch (attr) {
 		case hwmon_in_input:
@@ -692,6 +830,8 @@ static umode_t ina238_is_visible(const void *drvdata,
 				HWMON_I_MIN | HWMON_I_MIN_ALARM)
 
 static const struct hwmon_channel_info * const ina238_info[] = {
+	HWMON_CHANNEL_INFO(chip,
+			   HWMON_C_SAMPLES | HWMON_C_UPDATE_INTERVAL),
 	HWMON_CHANNEL_INFO(in,
 			   /* 0: shunt voltage */
 			   INA238_HWMON_IN_CONFIG,
@@ -798,8 +938,8 @@ static int ina238_probe(struct i2c_client *client)
 	}
 
 	/* Setup ADC_CONFIG register */
-	ret = regmap_write(data->regmap, INA238_ADC_CONFIG,
-			   INA238_ADC_CONFIG_DEFAULT);
+	data->adc_config = INA238_ADC_CONFIG_DEFAULT;
+	ret = regmap_write(data->regmap, INA238_ADC_CONFIG, data->adc_config);
 	if (ret < 0) {
 		dev_err(dev, "error configuring the device: %d\n", ret);
 		return -ENODEV;

-- 
2.54.0



^ permalink raw reply related

* [PATCH v3 3/3] hwmon: ina238: add update_interval_us attribute
From: Ferdinand Schwenk via B4 Relay @ 2026-06-09 19:43 UTC (permalink / raw)
  To: Guenter Roeck, Jonathan Corbet, Shuah Khan
  Cc: linux-hwmon, linux-kernel, linux-doc, richard.leitner,
	Ferdinand Schwenk
In-Reply-To: <20260609-hwmon-ina238-update-interval-us-v2-v3-0-016b55567950@advastore.com>

From: Ferdinand Schwenk <ferdinand.schwenk@advastore.com>

The INA238 family supports eight conversion time steps from 50 us to
4120 us (SQ52206: 66 us to 8230 us). At the millisecond granularity of
update_interval, the four shortest steps (50, 84, 150, 280 us) all
round to the same value and cannot be individually selected.

Add support for the generic update_interval_us attribute, which reports
and programs the same ADC cycle time as update_interval but in
microseconds, giving userspace full access to all conversion time steps.

Both attributes reflect the total cycle time including the active
averaging count: the reported value is the raw conversion time
multiplied by the number of averaged samples, and writes apply the
inverse mapping.

Signed-off-by: Ferdinand Schwenk <ferdinand.schwenk@advastore.com>
---
 Documentation/hwmon/ina238.rst |  4 +++
 drivers/hwmon/ina238.c         | 70 ++++++++++++++++++++++++++----------------
 2 files changed, 48 insertions(+), 26 deletions(-)

diff --git a/Documentation/hwmon/ina238.rst b/Documentation/hwmon/ina238.rst
index 43950d1ec551..a75b79e17d9d 100644
--- a/Documentation/hwmon/ina238.rst
+++ b/Documentation/hwmon/ina238.rst
@@ -106,4 +106,8 @@ energy1_input		Energy measurement (uJ)
 temp1_input		Die temperature measurement (mC)
 temp1_max		Maximum die temperature threshold (mC)
 temp1_max_alarm		Maximum die temperature alarm
+
+samples			ADC averaging count (1, 4, 16, 64, 128, 256, 512, 1024)
+update_interval		Total ADC conversion cycle time including averaging (ms)
+update_interval_us	Total ADC conversion cycle time including averaging (us)
 ======================= =======================================================
diff --git a/drivers/hwmon/ina238.c b/drivers/hwmon/ina238.c
index dc5dd3ad2557..080a93fcc9f7 100644
--- a/drivers/hwmon/ina238.c
+++ b/drivers/hwmon/ina238.c
@@ -316,19 +316,36 @@ static inline u32 ina238_samples(struct ina238_data *data)
 				  INA238_ADC_CONFIG_AVG_SHIFT];
 }
 
-/* Converting update_interval in msec to a single conversion time in usec */
-static inline u32 ina238_interval_ms_to_conv_time(long interval, u32 samples)
+/* Converting update_interval(_us) to a per-field conversion time in usec.
+ * interval_us is the total ADC cycle time including averaging in microseconds.
+ * All three conversion fields (VBUSCT, VSHCT, VTCT) are set equal, so the
+ * per-field time is interval_us / (samples * 3).
+ */
+static inline u32 ina238_interval_us_to_conv_time(u32 interval_us, u32 samples)
 {
-	u64 interval_us;
+	return DIV_ROUND_CLOSEST_ULL(interval_us, samples * 3);
+}
 
-	interval = clamp_val(interval, 0, INT_MAX / 1000);
-	interval_us = (u64)interval * 1000;
+/* Write a per-field conversion time (in usec) to the ADC_CONFIG register */
+static int ina238_write_conv_time(struct ina238_data *data, u32 conv_time_us)
+{
+	u16 adc_config;
+	int idx, ret;
 
-	/*
-	 * update_interval reports the ADC cycle time including averaging.
-	 * The target per-field conversion time is interval_us / (samples * 3).
-	 */
-	return DIV_ROUND_CLOSEST_ULL(interval_us, samples * 3);
+	idx = find_closest(conv_time_us, data->config->conv_time,
+			   ARRAY_SIZE(ina238_conv_time));
+	adc_config = (data->adc_config &
+		      ~(INA238_ADC_CONFIG_VBUSCT_MASK |
+			INA238_ADC_CONFIG_VSHCT_MASK |
+			INA238_ADC_CONFIG_VTCT_MASK)) |
+		     ((u16)idx << INA238_ADC_CONFIG_VBUSCT_SHIFT) |
+		     ((u16)idx << INA238_ADC_CONFIG_VSHCT_SHIFT) |
+		     ((u16)idx << INA238_ADC_CONFIG_VTCT_SHIFT);
+	ret = regmap_write(data->regmap, INA238_ADC_CONFIG, adc_config);
+	if (ret)
+		return ret;
+	data->adc_config = adc_config;
+	return 0;
 }
 
 static int ina238_read_chip(struct device *dev, u32 attr, long *val)
@@ -344,6 +361,10 @@ static int ina238_read_chip(struct device *dev, u32 attr, long *val)
 		*val = DIV_ROUND_CLOSEST(ina238_reg_to_interval_us(data) *
 					ina238_samples(data), 1000);
 		return 0;
+	case hwmon_chip_update_interval_us:
+		/* Return in usec */
+		*val = ina238_reg_to_interval_us(data) * ina238_samples(data);
+		return 0;
 	default:
 		return -EOPNOTSUPP;
 	}
@@ -367,21 +388,16 @@ static int ina238_write_chip(struct device *dev, u32 attr, long val)
 		data->adc_config = adc_config;
 		return 0;
 	case hwmon_chip_update_interval:
-		val = ina238_interval_ms_to_conv_time(val, ina238_samples(data));
-		idx = find_closest(val, data->config->conv_time,
-				   ARRAY_SIZE(ina238_conv_time));
-		adc_config = (data->adc_config &
-			      ~(INA238_ADC_CONFIG_VBUSCT_MASK |
-				INA238_ADC_CONFIG_VSHCT_MASK |
-				INA238_ADC_CONFIG_VTCT_MASK)) |
-			     ((u16)idx << INA238_ADC_CONFIG_VBUSCT_SHIFT) |
-			     ((u16)idx << INA238_ADC_CONFIG_VSHCT_SHIFT) |
-			     ((u16)idx << INA238_ADC_CONFIG_VTCT_SHIFT);
-		ret = regmap_write(data->regmap, INA238_ADC_CONFIG, adc_config);
-		if (ret)
-			return ret;
-		data->adc_config = adc_config;
-		return 0;
+		/* Convert ms to us before passing to the shared helper */
+		val = clamp_val(val, 0, INT_MAX / 1000) * 1000;
+		return ina238_write_conv_time(data,
+			ina238_interval_us_to_conv_time((u32)val,
+						ina238_samples(data)));
+	case hwmon_chip_update_interval_us:
+		val = clamp_val(val, 0, INT_MAX);
+		return ina238_write_conv_time(data,
+			ina238_interval_us_to_conv_time((u32)val,
+						ina238_samples(data)));
 	default:
 		return -EOPNOTSUPP;
 	}
@@ -763,6 +779,7 @@ static umode_t ina238_is_visible(const void *drvdata,
 		switch (attr) {
 		case hwmon_chip_samples:
 		case hwmon_chip_update_interval:
+		case hwmon_chip_update_interval_us:
 			return 0644;
 		default:
 			return 0;
@@ -831,7 +848,8 @@ static umode_t ina238_is_visible(const void *drvdata,
 
 static const struct hwmon_channel_info * const ina238_info[] = {
 	HWMON_CHANNEL_INFO(chip,
-			   HWMON_C_SAMPLES | HWMON_C_UPDATE_INTERVAL),
+			   HWMON_C_SAMPLES | HWMON_C_UPDATE_INTERVAL |
+			   HWMON_C_UPDATE_INTERVAL_US),
 	HWMON_CHANNEL_INFO(in,
 			   /* 0: shunt voltage */
 			   INA238_HWMON_IN_CONFIG,

-- 
2.54.0



^ permalink raw reply related

* Re: [PATCH net-next] docs: networking: add guidance on what to push via extack
From: Joe Damato @ 2026-06-09 19:51 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, corbet,
	skhan, linux-doc
In-Reply-To: <20260609190919.1139517-1-kuba@kernel.org>

On Tue, Jun 09, 2026 at 12:09:19PM -0700, Jakub Kicinski wrote:
> Every now and then someone tries to duplicated extack
> messages to dmesg. Document our guidance against this.
> Also indicate that system level faults should continue
> to go to system logs. The high level thinking is to try
> to distinguish between what's important to the user vs
> system admin.
> 
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
> CC: corbet@lwn.net
> CC: skhan@linuxfoundation.org
> CC: linux-doc@vger.kernel.org
> ---
>  Documentation/networking/driver.rst | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
> 
> diff --git a/Documentation/networking/driver.rst b/Documentation/networking/driver.rst
> index 195a916dc0de..abd366dd5e43 100644
> --- a/Documentation/networking/driver.rst
> +++ b/Documentation/networking/driver.rst
> @@ -128,3 +128,16 @@ to be freed up.
>  If you return NETDEV_TX_BUSY from the ndo_start_xmit method, you
>  must not keep any reference to that SKB and you must not attempt
>  to free it up.
> +
> +Error message reporting
> +=======================
> +
> +Number of driver configuration interfaces pass a Netlink extended ACK

Maybe I'm reading it wrong, but seems like it should be: "A number of" ?

Reviewed-by: Joe Damato <joe@dama.to>

^ permalink raw reply

* Re: [PATCH v17 00/28] Add new general DRM property "color format"
From: Daniel Stone @ 2026-06-09 20:11 UTC (permalink / raw)
  To: Nicolas Frattaroli
  Cc: Harry Wentland, Leo Li, Rodrigo Siqueira, Alex Deucher,
	Christian König, David Airlie, Simona Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Andrzej Hajda, Neil Armstrong, Robert Foss, Laurent Pinchart,
	Jonas Karlman, Jernej Skrabec, Sandy Huang, Heiko Stübner,
	Andy Yan, Jani Nikula, Rodrigo Vivi, Joonas Lahtinen,
	Tvrtko Ursulin, Dmitry Baryshkov, Sascha Hauer, Rob Herring,
	Jonathan Corbet, Shuah Khan, kernel, amd-gfx, dri-devel,
	linux-kernel, linux-arm-kernel, linux-rockchip, intel-gfx,
	intel-xe, linux-doc, wayland-devel, Werner Sembach,
	Andri Yngvason, Cristian Ciocaltea, Marius Vlad, Dmitry Baryshkov,
	Andy Yan
In-Reply-To: <20260609-color-format-v17-0-35739b5782cc@collabora.com>

Hi Nicolas,

On Tue, 9 Jun 2026 at 13:44, Nicolas Frattaroli
<nicolas.frattaroli@collabora.com> wrote:
> this is a follow-up to
> https://lore.kernel.org/all/20250911130739.4936-1-marius.vlad@collabora.com/
> which in of itself is a follow-up to
> https://lore.kernel.org/dri-devel/20240115160554.720247-1-andri@yngvason.is/ where
> a new DRM connector property has been added allowing users to
> force a particular color format.
>
> That in turn was actually also a follow-up from Werner Sembach's posted at
> https://lore.kernel.org/dri-devel/20210630151018.330354-1-wse@tuxedocomputers.com/
>
> As the number of cooks have reached critical mass, I'm hoping I'll be
> the last person to touch this particular series.

Thanks for seeing this through!

I've pushed this now, minus the Intel patches which they can merge
through their own tree. The AMD tree required a pretty trivial
conflict resolution which seems to work OK here.

Cheers,
Daniel

^ permalink raw reply

* [PATCH net-next 0/3] docs: net: more adjustments to docs
From: Jakub Kicinski @ 2026-06-09 20:12 UTC (permalink / raw)
  To: davem
  Cc: netdev, edumazet, pabeni, andrew+netdev, horms, corbet, linux-doc,
	bpf, Jakub Kicinski

A few small updates to the docs. Mostly typos this time.
This is trying to prepare docs for getting fed directly
into AI reviews.

Jakub Kicinski (3):
  docs: net: fix minor issues with XDP metadata docs
  docs: net: tls-offload: document tls_dev_del, tls_dev_resync, and
    rekey
  docs: net: fix minor issues with devlink docs

 .../networking/devlink/devlink-health.rst     | 12 +++++---
 .../networking/devlink/devlink-params.rst     |  2 +-
 .../networking/devlink/devlink-port.rst       |  5 +++-
 .../networking/devlink/devlink-trap.rst       |  8 +++--
 Documentation/networking/devlink/index.rst    | 10 +++----
 Documentation/networking/tls-offload.rst      | 29 ++++++++++++++++++
 Documentation/networking/xdp-rx-metadata.rst  |  2 +-
 Documentation/networking/xsk-tx-metadata.rst  | 30 ++++++++++---------
 8 files changed, 69 insertions(+), 29 deletions(-)

-- 
2.54.0


^ permalink raw reply

* [PATCH net-next 1/3] docs: net: fix minor issues with XDP metadata docs
From: Jakub Kicinski @ 2026-06-09 20:12 UTC (permalink / raw)
  To: davem
  Cc: netdev, edumazet, pabeni, andrew+netdev, horms, corbet, linux-doc,
	bpf, Jakub Kicinski, skhan, ast, daniel, hawk, john.fastabend,
	sdf
In-Reply-To: <20260609201224.1191391-1-kuba@kernel.org>

Minor updates to the XDP metadata documentation:
- s/union/struct/ for xsk_tx_metadata
- document nested request and completion metadata fields
- point capability queries at the xsk-features attribute
- fix grammar in the XDP RX metadata guide
- typos

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
CC: corbet@lwn.net
CC: skhan@linuxfoundation.org
CC: ast@kernel.org
CC: daniel@iogearbox.net
CC: hawk@kernel.org
CC: john.fastabend@gmail.com
CC: sdf@fomichev.me
CC: linux-doc@vger.kernel.org
CC: bpf@vger.kernel.org
---
 Documentation/networking/xdp-rx-metadata.rst |  2 +-
 Documentation/networking/xsk-tx-metadata.rst | 30 +++++++++++---------
 2 files changed, 17 insertions(+), 15 deletions(-)

diff --git a/Documentation/networking/xdp-rx-metadata.rst b/Documentation/networking/xdp-rx-metadata.rst
index ce96f4c99505..efdf5eeb49e7 100644
--- a/Documentation/networking/xdp-rx-metadata.rst
+++ b/Documentation/networking/xdp-rx-metadata.rst
@@ -36,7 +36,7 @@ metadata available in which case the driver returns ``-ENODATA``.
 
 Not all kfuncs have to be implemented by the device driver; when not
 implemented, the default ones that return ``-EOPNOTSUPP`` will be used
-to indicate the device driver have not implemented this kfunc.
+to indicate the device driver has not implemented this kfunc.
 
 
 Within an XDP frame, the metadata layout (accessed via ``xdp_buff``) is
diff --git a/Documentation/networking/xsk-tx-metadata.rst b/Documentation/networking/xsk-tx-metadata.rst
index df53a10ccac3..f240930ca1df 100644
--- a/Documentation/networking/xsk-tx-metadata.rst
+++ b/Documentation/networking/xsk-tx-metadata.rst
@@ -14,9 +14,9 @@ General Design
 The headroom for the metadata is reserved via ``tx_metadata_len`` and
 ``XDP_UMEM_TX_METADATA_LEN`` flag in ``struct xdp_umem_reg``. The metadata
 length is therefore the same for every socket that shares the same umem.
-The metadata layout is a fixed UAPI, refer to ``union xsk_tx_metadata`` in
+The metadata layout is a fixed UAPI, refer to ``struct xsk_tx_metadata`` in
 ``include/uapi/linux/if_xdp.h``. Thus, generally, the ``tx_metadata_len``
-field above should contain ``sizeof(union xsk_tx_metadata)``.
+field above should contain ``sizeof(struct xsk_tx_metadata)``.
 
 Note that in the original implementation the ``XDP_UMEM_TX_METADATA_LEN``
 flag was not required. Applications might attempt to create a umem
@@ -45,15 +45,16 @@ the metadata area is ignored by the kernel as well.
 The flags field enables the particular offload:
 
 - ``XDP_TXMD_FLAGS_TIMESTAMP``: requests the device to put transmission
-  timestamp into ``tx_timestamp`` field of ``union xsk_tx_metadata``.
+  timestamp into ``completion.tx_timestamp`` field of
+  ``struct xsk_tx_metadata``.
 - ``XDP_TXMD_FLAGS_CHECKSUM``: requests the device to calculate L4
-  checksum. ``csum_start`` specifies byte offset of where the checksumming
-  should start and ``csum_offset`` specifies byte offset where the
-  device should store the computed checksum.
+  checksum. ``request.csum_start`` specifies byte offset of where the
+  checksumming should start and ``request.csum_offset`` specifies byte offset
+  where the device should store the computed checksum.
 - ``XDP_TXMD_FLAGS_LAUNCH_TIME``: requests the device to schedule the
   packet for transmission at a pre-determined time called launch time. The
-  value of launch time is indicated by ``launch_time`` field of
-  ``union xsk_tx_metadata``.
+  value of launch time is indicated by ``request.launch_time`` field of
+  ``struct xsk_tx_metadata``.
 
 Besides the flags above, in order to trigger the offloads, the first
 packet's ``struct xdp_desc`` descriptor should set ``XDP_TX_METADATA``
@@ -63,9 +64,9 @@ only the first chunk should carry the metadata.
 Software TX Checksum
 ====================
 
-For development and testing purposes its possible to pass
+For development and testing purposes it's possible to pass
 ``XDP_UMEM_TX_SW_CSUM`` flag to ``XDP_UMEM_REG`` UMEM registration call.
-In this case, when running in ``XDK_COPY`` mode, the TX checksum
+In this case, when running in ``XDP_COPY`` mode, the TX checksum
 is calculated on the CPU. Do not enable this option in production because
 it will negatively affect performance.
 
@@ -118,7 +119,7 @@ schedule with a 1-second cycle time, with all Tx Queues open at all times.
 
 The value of the launch time that is programmed in the Advanced Transmit
 Context Descriptor is a relative offset to the starting time of the Qbv
-transmission window of the queue. The Frst flag of the descriptor can be
+transmission window of the queue. The First flag of the descriptor can be
 set to schedule the packet for the next Qbv cycle. Therefore, the horizon
 of the launch time for i225 and i226 is the ending time of the next cycle
 of the Qbv transmission window of the queue. For example, when the Qbv
@@ -129,9 +130,10 @@ running.
 Querying Device Capabilities
 ============================
 
-Every devices exports its offloads capabilities via netlink netdev family.
-Refer to ``xsk-flags`` features bitmask in
-``Documentation/netlink/specs/netdev.yaml``.
+Every device exports its offload capabilities via the Netlink netdev family.
+Query the ``xsk-features`` attribute in
+``Documentation/netlink/specs/netdev.yaml``. Its bits are defined by the
+``xsk-flags`` enum.
 
 - ``tx-timestamp``: device supports ``XDP_TXMD_FLAGS_TIMESTAMP``
 - ``tx-checksum``: device supports ``XDP_TXMD_FLAGS_CHECKSUM``
-- 
2.54.0


^ permalink raw reply related

* [PATCH net-next 2/3] docs: net: tls-offload: document tls_dev_del, tls_dev_resync, and rekey
From: Jakub Kicinski @ 2026-06-09 20:12 UTC (permalink / raw)
  To: davem
  Cc: netdev, edumazet, pabeni, andrew+netdev, horms, corbet, linux-doc,
	bpf, Jakub Kicinski, john.fastabend, sd, skhan
In-Reply-To: <20260609201224.1191391-1-kuba@kernel.org>

Fill in some gaps in the TLS offload doc:

- describe the tls_dev_del and tls_dev_resync callbacks
- add a mention of rekeying being out of scope for now

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
CC: john.fastabend@gmail.com
CC: sd@queasysnail.net
CC: corbet@lwn.net
CC: skhan@linuxfoundation.org
CC: linux-doc@vger.kernel.org
---
 Documentation/networking/tls-offload.rst | 29 ++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/Documentation/networking/tls-offload.rst b/Documentation/networking/tls-offload.rst
index c173f537bf4d..a41f46885e8c 100644
--- a/Documentation/networking/tls-offload.rst
+++ b/Documentation/networking/tls-offload.rst
@@ -104,6 +104,29 @@ at the end of kernel structures (see :c:member:`driver_state` members
 in ``include/net/tls.h``) to avoid additional allocations and pointer
 dereferences.
 
+When the offloaded connection is destroyed the core calls
+the :c:member:`tls_dev_del` callback so the driver can release per-direction
+state:
+
+.. code-block:: c
+
+	void (*tls_dev_del)(struct net_device *netdev,
+			    struct tls_context *ctx,
+			    enum tls_offload_ctx_dir direction);
+
+``tls_dev_del`` is mandatory whenever ``tls_dev_add`` is provided.
+
+The third TLS device callback is :c:member:`tls_dev_resync`, called by the core
+to synchronize the TCP stream with the record boundaries:
+
+.. code-block:: c
+
+	int (*tls_dev_resync)(struct net_device *netdev,
+			      struct sock *sk, u32 seq, u8 *rcd_sn,
+			      enum tls_offload_ctx_dir direction);
+
+See the `Resync handling`_ section for details.
+
 TX
 --
 
@@ -381,6 +404,12 @@ synchronization with an exponential back off (first after 2 encrypted
 records, then after 4 records, after 8, after 16... up until every
 128 records).
 
+Rekey
+=====
+
+Offload does not currently support TLS 1.3, therefore key rotation
+is not a concern for offloaded connections at this point.
+
 Error handling
 ==============
 
-- 
2.54.0


^ permalink raw reply related

* [PATCH net-next 3/3] docs: net: fix minor issues with devlink docs
From: Jakub Kicinski @ 2026-06-09 20:12 UTC (permalink / raw)
  To: davem
  Cc: netdev, edumazet, pabeni, andrew+netdev, horms, corbet, linux-doc,
	bpf, Jakub Kicinski, jiri, skhan
In-Reply-To: <20260609201224.1191391-1-kuba@kernel.org>

Update devlink documentation to match current code:

- describe health reporter defaults (it's currently under "callbacks"),
  best-effort auto-dump, and port-scoped reporters
- fix generic parameter names and values
- fix nested devlink setup wording and registration ordering

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
CC: jiri@resnulli.us
CC: corbet@lwn.net
CC: skhan@linuxfoundation.org
CC: linux-doc@vger.kernel.org
---
 Documentation/networking/devlink/devlink-health.rst | 12 ++++++++----
 Documentation/networking/devlink/devlink-params.rst |  2 +-
 Documentation/networking/devlink/devlink-port.rst   |  5 ++++-
 Documentation/networking/devlink/devlink-trap.rst   |  8 +++++---
 Documentation/networking/devlink/index.rst          | 10 +++++-----
 5 files changed, 23 insertions(+), 14 deletions(-)

diff --git a/Documentation/networking/devlink/devlink-health.rst b/Documentation/networking/devlink/devlink-health.rst
index 4d10536377ab..bedac58a2f36 100644
--- a/Documentation/networking/devlink/devlink-health.rst
+++ b/Documentation/networking/devlink/devlink-health.rst
@@ -33,7 +33,9 @@ asynchronously. All health reports handling is done by ``devlink``.
   * Recovery procedures
   * Diagnostics procedures
   * Object dump procedures
-  * Out Of Box initial parameters
+
+Drivers also provide default values for generic reporter parameters when
+creating a health reporter.
 
 Different parts of the driver can register different types of health reporters
 with different handlers.
@@ -45,8 +47,9 @@ Actions
 
   * A log is being send to the kernel trace events buffer
   * Health status and statistics are being updated for the reporter instance
-  * Object dump is being taken and saved at the reporter instance (as long as
-    auto-dump is set and there is no other dump which is already stored)
+  * Object dump is being taken and saved at the reporter instance. This is
+    best effort and skipped when recovery is aborted, auto-dump is disabled,
+    no dump callback is registered, or a dump is already stored.
   * Auto recovery attempt is being done. Depends on:
 
     - Auto-recovery configuration
@@ -75,7 +78,8 @@ User Interface
 ==============
 
 User can access/change each reporter's parameters and driver specific callbacks
-via ``devlink``, e.g per error type (per health reporter):
+via ``devlink``, e.g. per error type (per health reporter). Reporters may be
+registered for the whole devlink instance or for a specific devlink port.
 
   * Configure reporter's generic parameters (like: disable/enable auto recovery)
   * Invoke recovery procedure
diff --git a/Documentation/networking/devlink/devlink-params.rst b/Documentation/networking/devlink/devlink-params.rst
index ea17756dcda6..ca19ee3e63c8 100644
--- a/Documentation/networking/devlink/devlink-params.rst
+++ b/Documentation/networking/devlink/devlink-params.rst
@@ -122,7 +122,7 @@ own name.
    * - ``enable_iwarp``
      - Boolean
      - Enable handling of iWARP traffic in the device.
-   * - ``internal_err_reset``
+   * - ``internal_error_reset``
      - Boolean
      - When enabled, the device driver will reset the device on internal
        errors.
diff --git a/Documentation/networking/devlink/devlink-port.rst b/Documentation/networking/devlink/devlink-port.rst
index 5e397798a402..9374ebe70f48 100644
--- a/Documentation/networking/devlink/devlink-port.rst
+++ b/Documentation/networking/devlink/devlink-port.rst
@@ -38,7 +38,7 @@ Devlink port flavours are described below.
      - This indicates an eswitch port representing a port of PCI
        subfunction (SF).
    * - ``DEVLINK_PORT_FLAVOUR_VIRTUAL``
-     - This indicates a virtual port for the PCI virtual function.
+     - Any virtual port facing the user.
 
 Devlink port can have a different type based on the link layer described below.
 
@@ -134,6 +134,9 @@ Users may also set the IPsec crypto capability of the function using
 Users may also set the IPsec packet capability of the function using
 `devlink port function set ipsec_packet` command.
 
+The ``migratable`` attribute may be set only on ports with
+``DEVLINK_PORT_FLAVOUR_PCI_VF``.
+
 Users may also set the maximum IO event queues of the function
 using `devlink port function set max_io_eqs` command.
 
diff --git a/Documentation/networking/devlink/devlink-trap.rst b/Documentation/networking/devlink/devlink-trap.rst
index 5885e21e2212..ac5bf9337198 100644
--- a/Documentation/networking/devlink/devlink-trap.rst
+++ b/Documentation/networking/devlink/devlink-trap.rst
@@ -516,9 +516,11 @@ Generic Packet Trap Groups
 
 Generic packet trap groups are used to aggregate logically related packet
 traps. These groups allow the user to batch operations such as setting the trap
-action of all member traps. In addition, ``devlink-trap`` can report aggregated
-per-group packets and bytes statistics, in case per-trap statistics are too
-narrow. The description of these groups must be added to the following table:
+action of all member drop traps whose action may legally change. Exception and
+control traps remain unchanged. In addition, ``devlink-trap`` can report
+aggregated per-group packets and bytes statistics, in case per-trap statistics
+are too narrow. The description of these groups must be added to the following
+table:
 
 .. list-table:: List of Generic Packet Trap Groups
    :widths: 10 90
diff --git a/Documentation/networking/devlink/index.rst b/Documentation/networking/devlink/index.rst
index f7ba7dcf477d..32f70879ddd0 100644
--- a/Documentation/networking/devlink/index.rst
+++ b/Documentation/networking/devlink/index.rst
@@ -13,8 +13,8 @@ new APIs prefixed by ``devl_*``. The older APIs handle all the locking
 in devlink core, but don't allow registration of most sub-objects once
 the main devlink object is itself registered. The newer ``devl_*`` APIs assume
 the devlink instance lock is already held. Drivers can take the instance
-lock by calling ``devl_lock()``. It is also held all callbacks of devlink
-netlink commands.
+lock by calling ``devl_lock()``. It is also held across all callbacks of
+devlink netlink commands.
 
 Drivers are encouraged to use the devlink instance lock for their own needs.
 
@@ -33,11 +33,11 @@ devlink instances created underneath. In that case, drivers should make
    lock of both nested and parent instances at the same time, devlink
    instance lock of the parent instance should be taken first, only then
    instance lock of the nested instance could be taken.
- - Driver should use object-specific helpers to setup the
-   nested relationship:
+ - Driver should use object-specific helpers to setup the nested relationship
+   before registering the nested devlink instance:
 
    - ``devl_nested_devlink_set()`` - called to setup devlink -> nested
-     devlink relationship (could be user for multiple nested instances.
+     devlink relationship (could be used for multiple nested instances).
    - ``devl_port_fn_devlink_set()`` - called to setup port function ->
      nested devlink relationship.
    - ``devlink_linecard_nested_dl_set()`` - called to setup linecard ->
-- 
2.54.0


^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox