Linux Documentation

* [PATCH v5 00/19] drm/atomic: Rework initial state allocation
From: Maxime Ripard @ 2026-05-19  9:01 UTC
  To: Maarten Lankhorst, Thomas Zimmermann, David Airlie, Simona Vetter,
	Jonathan Corbet, Shuah Khan, Dmitry Baryshkov, Jyri Sarha,
	Tomi Valkeinen, Andrzej Hajda, Neil Armstrong, Robert Foss,
	Laurent Pinchart, Jonas Karlman, Jernej Skrabec, Simon Ser,
	Harry Wentland, Melissa Wen, Sebastian Wick, Alex Hung,
	Jani Nikula, Rodrigo Vivi, Joonas Lahtinen, Tvrtko Ursulin,
	Chen-Yu Tsai, Samuel Holland, Dave Stevenson, Maíra Canal,
	Raspberry Pi Kernel Maintenance
  Cc: dri-devel, linux-doc, linux-kernel, Daniel Stone, intel-gfx,
	intel-xe, linux-arm-kernel, linux-sunxi, Maxime Ripard,
	Laurent Pinchart, Laurent Pinchart

Hi,

This series started from my work on the hardware state readout[1], and
more specifically a discussion with Thomas[2].

This series expands the work that has been merged recently to make
drm_private_obj and drm_private_state allocation a bit more consistent
and ended up creating a new atomic_create_state callback to allocate a
new state with no side effect.

The first patches document the existing behaviour and make a few
cleanups and typo fixes.

Then, __drm_*_state_reset() helpers are renamed to
__drm_*_state_init() to clarify that they initialize rather than
reset state, and we add the new atomic_create_state callback to
every other DRM object (planes, CRTCs, connectors, colorops).

Next, we leverage those new callbacks to create a new helper,
drm_mode_config_create_initial_state(), to create the initial state
for all the objects of a driver, and update the driver skeleton to
recommend it.
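As a rough illustration of the difference between the two kinds of hooks, here is a user-space C model. It is not kernel code, and every type and function name in it is hypothetical. A create_state-style hook only allocates and initializes a state, with no side effect on the object; the initial-state helper is the one that attaches it. Following the v4 change note about leaks, the helper releases a state that is already attached instead of overwriting the pointer.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins for a DRM object and its atomic state. */
struct obj_state {
	int initialized;
};

struct obj {
	const char *name;
	struct obj_state *state;
	/* Allocate and initialize a fresh state; must not touch obj->state. */
	struct obj_state *(*create_state)(struct obj *obj);
	void (*destroy_state)(struct obj *obj, struct obj_state *state);
};

static struct obj_state *default_create_state(struct obj *obj)
{
	struct obj_state *s = calloc(1, sizeof(*s));

	(void)obj; /* a real hook could read defaults from the object */
	if (s)
		s->initialized = 1; /* the "_init" step: fill in defaults */
	return s;
}

static void default_destroy_state(struct obj *obj, struct obj_state *state)
{
	(void)obj;
	free(state);
}

/*
 * Hypothetical analogue of an initial-state helper: give every object a
 * fresh state, releasing any state that is already attached.
 */
static int create_initial_state(struct obj *objs, size_t count)
{
	for (size_t i = 0; i < count; i++) {
		struct obj *o = &objs[i];
		struct obj_state *s;

		assert(o->create_state && o->destroy_state);
		s = o->create_state(o);
		if (!s)
			return -ENOMEM;
		if (o->state)
			o->destroy_state(o, o->state); /* avoid leaking the old one */
		o->state = s; /* the helper, not the hook, attaches the state */
	}
	return 0;
}
```

Because the hook itself never assigns obj->state, calling it twice simply yields two independent states; deciding what to do with them is left to the caller.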

Finally, we convert the tidss driver and the bridge_connector to the
new pattern.

This was tested on a TI SK-AM62, with the tidss driver.

Let me know what you think,
Maxime

1: https://lore.kernel.org/dri-devel/20250902-drm-state-readout-v1-0-14ad5315da3f@kernel.org/
2: https://lore.kernel.org/dri-devel/5920ffe5-b6b1-484b-b320-332b9eb9db82@suse.de/

Signed-off-by: Maxime Ripard <mripard@kernel.org>
---
Changes in v5:
- Address sashiko reviews
- Improve the docs
- Fix drmm_connector_hdmi_init
- Drop drm/tidss: Switch to drm_mode_config_create_initial_state since
  not all possible bridges would have been converted to create_state 
- Link to v4: https://lore.kernel.org/r/20260512-drm-mode-config-init-v4-0-591dfdcc1bf9@kernel.org

Changes in v4:
- Rebased on current drm-misc-next
- Update drm_atomic_state to drm_atomic_commit
- Various doc improvements
- Don't call drm_crtc_vblank_reset in create_state
- Prevent mem leak if states already have a state when
  drm_mode_config_reset or _create_initial_state are called
- Link to v3: https://lore.kernel.org/r/20260424-drm-mode-config-init-v3-0-8b68d9db0d8b@kernel.org

Changes in v3:
- Reintroduce state documentation that was dropped by accident
- Change name to drm_mode_config_create_initial_state()
- Don't call drm_mode_config_create_initial_state() in drm_dev_register
  anymore
- Drop __drm_atomic_helper_*_create_state
- Improve documentation and commit messages where necessary
- Collected tags
- Link to v2: https://lore.kernel.org/r/20260320-drm-mode-config-init-v2-0-c63f1134e76c@kernel.org

Changes in v2:
- Change the _state_reset function names to _state_init
- Change the colorop too
- Various doc improvements
- Link to v1: https://lore.kernel.org/r/20260310-drm-mode-config-init-v1-0-de7397c8e1cf@kernel.org

---
Maxime Ripard (19):
      drm/atomic: Document atomic commit lifetime
      drm/colorop: Fix typos in the doc
      drm/atomic: Drop drm_private_obj.state assignment from create_state
      drm/atomic: Expand atomic_create_state expectations for drm_private_obj
      drm/mode-config: Document drm_private_obj exclusion from drm_mode_config_reset()
      drm/colorop: Rename __drm_colorop_state_reset()
      drm/colorop: Create drm_atomic_helper_colorop_create_state()
      drm/atomic-state-helper: Fix __drm_atomic_helper_plane_reset() doc typo
      drm/atomic-state-helper: Rename __drm_atomic_helper_plane_state_reset()
      drm/plane: Add new atomic_create_state callback
      drm/atomic-state-helper: Rename __drm_atomic_helper_crtc_state_reset()
      drm/crtc: Add new atomic_create_state callback
      drm/atomic-state-helper: Rename __drm_atomic_helper_connector_state_reset()
      drm/hdmi: Rename __drm_atomic_helper_connector_hdmi_reset()
      drm/connector: Add new atomic_create_state callback
      drm/mode-config: Create drm_mode_config_create_initial_state()
      drm/drv: Switch skeleton to drm_mode_config_create_initial_state()
      drm/tidss: Convert to atomic_create_state
      drm/bridge_connector: Convert to atomic_create_state

 Documentation/gpu/drm-kms.rst                      |   6 +
 drivers/gpu/drm/display/drm_bridge_connector.c     |  17 +-
 drivers/gpu/drm/display/drm_hdmi_state_helper.c    |  15 +-
 drivers/gpu/drm/drm_atomic.c                       |  67 ++++++++
 drivers/gpu/drm/drm_atomic_state_helper.c          | 114 ++++++++++---
 drivers/gpu/drm/drm_colorop.c                      |  41 ++++-
 drivers/gpu/drm/drm_connector.c                    |  10 +-
 drivers/gpu/drm/drm_drv.c                          |   4 +-
 drivers/gpu/drm/drm_mode_config.c                  | 189 ++++++++++++++++++++-
 drivers/gpu/drm/i915/display/intel_crtc.c          |   2 +-
 drivers/gpu/drm/i915/display/intel_plane.c         |   2 +-
 drivers/gpu/drm/sun4i/sun4i_hdmi_enc.c             |   2 +-
 drivers/gpu/drm/tests/drm_hdmi_state_helper_test.c |   2 +-
 drivers/gpu/drm/tidss/tidss_crtc.c                 |  17 +-
 drivers/gpu/drm/tidss/tidss_plane.c                |   2 +-
 drivers/gpu/drm/vc4/vc4_hdmi.c                     |   2 +-
 include/drm/display/drm_hdmi_state_helper.h        |   4 +-
 include/drm/drm_atomic.h                           |   5 +-
 include/drm/drm_atomic_state_helper.h              |  12 +-
 include/drm/drm_colorop.h                          |   2 +
 include/drm/drm_connector.h                        |  16 ++
 include/drm/drm_crtc.h                             |  16 ++
 include/drm/drm_mode_config.h                      |   1 +
 include/drm/drm_plane.h                            |  16 ++
 24 files changed, 496 insertions(+), 68 deletions(-)
---
base-commit: 69c95e4c529297c25503e60acba757fba24fdc95
change-id: 20260310-drm-mode-config-init-1e1f52b745d0

Best regards,
-- 
Maxime Ripard <mripard@kernel.org>


* Re: [PATCH v5 00/13] ima: Introduce staging mechanism
From: Roberto Sassu @ 2026-05-19  8:38 UTC
  To: Lakshmi Ramasubramanian, steven chen, corbet, skhan, zohar,
	dmitry.kasatkin, eric.snowberg, paul, jmorris, serge
  Cc: linux-doc, linux-kernel, linux-integrity, linux-security-module,
	gregorylumen, Roberto Sassu
In-Reply-To: <8db443f1-d2f3-47ce-9116-18985ed0b290@linux.microsoft.com>

On Fri, 2026-05-15 at 10:37 -0700, Lakshmi Ramasubramanian wrote:
> Thanks for the response Roberto.
> 
> On 5/12/2026 1:17 AM, Roberto Sassu wrote:
> 
> > > > 
> > > > This submission proposes two ways for log trimming:
> > > > 
> > > > *Flavor 1:* Staging With Prompt
> > > > *Flavor 2:* Stage and Delete N
> > > > 
> > 
> > I'm happy to support your trimming method. Just does not fit with my
> > use case. I would like to keep both.
> > 
> 
> If "Flavor 1: Staging With Prompt" would be beneficial to the Linux 
> kernel customers, in general, we should continue to review the change 
> and merge it eventually.
> 
> My request, then, would be to split this patch set into 2 parts:
> 
> 	Part 1: Implements "Staging With Prompt"
> 
> 	Part 2: Implements "Stage and Delete N"
> 
> I think that would make it easier to review the code, test/validate, 
> and merge.

No need in my opinion, it is simple enough.

Roberto


* Re: [PATCH 4/8] drm/panthor: Add support for protected memory allocation in panthor
From: Ketil Johnsen @ 2026-05-19  8:49 UTC
  To: Boris Brezillon, Chia-I Wu
  Cc: Liviu Dudau, Marcin Ślusarz, David Airlie, Simona Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Jonathan Corbet, Shuah Khan, Sumit Semwal, Benjamin Gaignard,
	Brian Starkey, John Stultz, T.J. Mercier, Christian König,
	Steven Price, Daniel Almeida, Alice Ryhl, Matthias Brugger,
	AngeloGioacchino Del Regno, dri-devel, linux-doc, linux-kernel,
	linux-media, linaro-mm-sig, linux-arm-kernel, linux-mediatek,
	Florent Tomasin, nd
In-Reply-To: <20260519093955.448ff899@fedora>

On 19/05/2026 09:39, Boris Brezillon wrote:
> On Mon, 18 May 2026 17:36:40 -0700
> Chia-I Wu <olvaffe@gmail.com> wrote:
> 
>> On Mon, May 18, 2026 at 12:16 AM Boris Brezillon
>> <boris.brezillon@collabora.com> wrote:
>>>
>>> On Wed, 13 May 2026 12:31:32 -0700
>>> Chia-I Wu <olvaffe@gmail.com> wrote:
>>>   
>>>> On Tue, May 12, 2026 at 8:39 AM Liviu Dudau <liviu.dudau@arm.com> wrote:
>>>>>
>>>>> On Tue, May 12, 2026 at 04:11:11PM +0200, Boris Brezillon wrote:
>>>>>> On Tue, 12 May 2026 14:47:27 +0100
>>>>>> Liviu Dudau <liviu.dudau@arm.com> wrote:
>>>>>>   
>>>>>>> On Thu, May 07, 2026 at 01:53:56PM +0200, Boris Brezillon wrote:
>>>>>>>> On Thu, 7 May 2026 11:02:26 +0200
>>>>>>>> Marcin Ślusarz <marcin.slusarz@arm.com> wrote:
>>>>>>>>   
>>>>>>>>> On Tue, May 05, 2026 at 06:15:23PM +0200, Boris Brezillon wrote:
>>>>>>>>>>> @@ -277,9 +286,21 @@ int panthor_device_init(struct panthor_device *ptdev)
>>>>>>>>>>>                      return ret;
>>>>>>>>>>>      }
>>>>>>>>>>>
>>>>>>>>>>> +   /* If a protected heap name is specified but not found, defer the probe until created */
>>>>>>>>>>> +   if (protected_heap_name && strlen(protected_heap_name)) {
>>>>>>>>>>
>>>>>>>>>> Do we really need this strlen() > 0? Won't dma_heap_find() fail if the
>>>>>>>>>> name is "" already?
>>>>>>>>>
>>>>>>>>> If dma_heap_find() fails, then the whole probe will fail too.
>>>>>>>>> This check prevents that.
>>>>>>>>
>>>>>>>> Yeah, that's also a questionable design choice. I mean, we can
>>>>>>>> currently probe and boot the FW even though we never setup the
>>>>>>>> protected FW sections, so why should we defer the probe here? Can't we
>>>>>>>> just retry the next time a group with the protected bit is created and
>>>>>>>> fail if we can't find a protected heap?
>>>>>>>
>>>>>>> The problem we have with the current firmware is that it does a number of setup steps at "boot"
>>>>>>> time only. One of the steps is preparing its internal structures for when it enters protected
>>>>>>> mode and it stores them in the buffer passed in at firmware loading. We cannot later run the
>>>>>>> process when we have a group with protected mode set.
>>>>>>
>>>>>> No, but we can force a full/slow reset and have that thing
>>>>>> re-initialized, can't we? I mean, that's basically what we do when a
>>>>>> fast reset fails: we re-initialize all the sections and reset again, at
>>>>>> which point the FW should start from a fresh state, and be able to
>>>>>> properly initialize the protected-related stuff if protected sections
>>>>>> are populated. Am I missing something?
>>>>>
>>>>> Right, we can do that. For some reason I keep associating the reset with the
>>>>> error handling and not with "normal" operations.
>>>> I kind of hope we end up with either
>>>>
>>>>   - panthor knows the exact heap to use and fails with EPROBE_DEFER if
>>>> the heap is missing, or
>>>>   - panthor gets a dma-buf from userspace and does the full reset
>>>>     - userspace also needs to provide a dma-buf for each protected
>>>> group for the suspend buffer
>>>>
>>>> than something in-between. The latter is more ad-hoc and basically
>>>> kicks the issue to the userspace.
>>>
>>> Indeed, the second option is more ad-hoc, but when you think about it,
>>> userspace has to have this knowledge, because it needs to know the
>>> dma-heap to use for allocating buffers that cross a device boundary
>>> anyway. Think about frames produced by a video decoder, and composited
>>> by the GPU into a protected scanout buffer that's passed to the KMS
>>> device. Why would the GPU driver be the source of truth when it comes to
>>> choosing the heap to use to allocate protected buffers for the video
>>> decoder or those used for the display?
>> I don't think the GPU driver is ever the source of truth. If the
>> system integrator wants to specify the source of truth (SoT) from
>> kernel space, they should use the device tree (or module params /
>> config options). If they want to specify the SoT in userspace, then we
>> don't really care how it is done other than providing an ioctl.
>> Panthor is always on the receiving end.
> 
> Okay, we're on the same page then.
> 
>>
>> If we don't want to delay this functionality, but it takes time to
>> converge on SoT, maybe a solution that is not a long-term promise can
>> work? Of the options on the table (dt, module params, kconfig options,
>> ioctls), a kconfig option, potentially marked as experimental, seems
>> like a good candidate.
> 
> If Panthor is only a consumer, I actually think it'd be easier to just
> let userspace pass the protected FW section as an imported buffer
> through an ioctl for now. It means we don't need any of the
> modifications to the dma_heap API in this series, and userspace is free
> to choose its SoT (efuse, DT, ...) and pass the info back to mesa/GBM
> somehow (envvar, driconf, ...). The only thing we need to check is whether
> lazy protected FW section allocation is going to work, but given the
> current code purely and simply ignores those sections, and the FW is
> still able to boot and act properly (at least on v10-v13), I'm pretty
> confident this is okay, unless there's some trick the MCU can do to
> detect that the protected section isn't mapped (which I doubt, because
> the MCU doesn't know it lives behind an MMU).
> 
> Of course, once we have a consensus on how to describe this in the DT,
> we can switch Panthor over to "protected dma_heap selection through DT",
> and reflect that through the ioctl that exposes whether protected
> support is ready or not (would be a DEV_QUERY), such that userspace can
> skip this "PROTM initialization" step.
> 
> We're talking about an extra ioctl to set those buffers, and a
> DEV_QUERY to query the state (ready or not), the size of the global
> protected buffer (protected FW section) and the size of the protected
> suspend buffer. The protected suspend buffer would be allocated and
> passed at group creation time (extra arg passed to the existing
> GROUP_CREATE ioctl). So, overall, I don't consider it a huge liability
> in terms of maintenance cost.

If we can avoid the dma-heap changes, then that would surely help!
I can try to implement this in the next version unless someone finds a 
reason why it is a bad idea.
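
To make the proposal concrete, here is a rough sketch of the uAPI shape described above, written as plain C structs. Every struct, field and function name is hypothetical (none of it exists in the panthor uAPI), and the layout just follows the usual DRM habit of fixed-width fields with explicit padding. The DEV_QUERY reports readiness plus the two buffer sizes; one ioctl hands over the global protected FW section; GROUP_CREATE gets an extra suspend-buffer argument.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical DEV_QUERY payload describing protected-mode setup. */
struct protm_info {
	uint32_t ready;            /* non-zero once the kernel has set PROTM up */
	uint32_t pad;              /* keeps the 64-bit fields naturally aligned */
	uint64_t fw_section_size;  /* global protected FW section, in bytes */
	uint64_t suspend_buf_size; /* per-group protected suspend buffer, in bytes */
};

/* Hypothetical argument of the extra "set protected FW section" ioctl. */
struct protm_set_fw_section {
	int32_t dmabuf_fd;         /* buffer userspace allocated from its heap */
	uint32_t pad;
};

/* Hypothetical extra argument passed to GROUP_CREATE for protected groups. */
struct group_create_protm {
	int32_t suspend_dmabuf_fd;
	uint32_t pad;
};

_Static_assert(sizeof(struct protm_info) == 24, "no implicit padding");

/* Userspace only needs the initialization step if the kernel isn't ready. */
static int protm_init_needed(const struct protm_info *info)
{
	return !info->ready;
}

/* Check the buffers userspace is about to import against the queried sizes. */
static int protm_bufs_fit(const struct protm_info *info,
			  uint64_t fw_buf_size, uint64_t suspend_buf_size)
{
	return fw_buf_size >= info->fw_section_size &&
	       suspend_buf_size >= info->suspend_buf_size;
}
```

In this shape the kernel never has to choose a dma-heap; userspace picks the heap from whatever source of truth it has and only needs the sizes from the query.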

>>>> For the former, expressing the relation in DT seems to be the best,
>>>> but only if possible :-). Otherwise, a kconfig option (instead of
>>>> module param) should be easier to work with.
>>>>
>>>> Looking at the userspace implementation, can we also have a panthor
>>>> ioctl to return the heap to userspace?
>>>
>>> Yes, it's something we can add, but again, I'm questioning the
>>> usefulness of this: how can we ensure the heap used by panthor to
>>> allocate its protected FW buffers is suitable for scanout buffers
>>> (buffers that can be used by display drivers). There needs to be some glue
>>> living in userland that takes the decision, and I'm not too sure
>>> trusting any of the components in the chain (vdec, gpu, display) is the
>>> right thing to do.
>> The heap returned by panthor is only for panfrost/panvk. It says
>> nothing about compatibility with other components on the system.
> 
> Okay, if it's used only for internal buffers, I guess that's fine.

--
Ketil

* Re: [PATCH] nios2: remove the architecture
From: David Laight @ 2026-05-19  8:48 UTC
  To: Ethan Nelson-Moore
  Cc: linux-doc, devicetree, workflows, linux-arch, dmaengine,
	linux-i2c, linux-iio, netdev, linux-pci, linux-pwm,
	linux-hardening, linux-kbuild, linux-csky, Jonathan Corbet,
	Shuah Khan, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	Daniel Lezcano, Thomas Gleixner, Alex Shi, Yanteng Si,
	Dongliang Mu, Hu Haowen, Dinh Nguyen, Kees Cook, Oleg Nesterov,
	Will Deacon, Aneesh Kumar K.V, Andrew Morton, Nick Piggin,
	Peter Zijlstra, Vinod Koul, Frank Li, Dave Penkler, Andi Shyti,
	Jonathan Cameron, David Lechner, Nuno Sá, Andy Shevchenko,
	Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Lorenzo Pieralisi, Krzysztof Wilczyński
In-Reply-To: <20260518042833.272221-1-enelsonmoore@gmail.com>

On Sun, 17 May 2026 21:28:33 -0700
Ethan Nelson-Moore <enelsonmoore@gmail.com> wrote:

> The Nios II architecture is a soft-core architecture developed by
> Altera (since acquired by Intel) and intended to run on their FPGAs.
> 
> Licenses for the architecture have not been available for purchase
> since 2024 [1],

Except I think they got 'beaten up' by some telcos.
The Nios II gets used inside FPGAs as a small CPU doing things that would
be far too difficult to do in VHDL.
(I believe the FPGAs in some mobile base stations embed a lot of them.)
These will have a small amount of code (maybe 4k - 64k) and a similarly
small amount of data memory along with access to fpga peripheral registers
and (optionally) host memory via PCIe. No MMU, no cache (or rather the code/data
is in the cache memory but it isn't backed by anything), no branch predictor
(guaranteed cycle times), etc.
Intel suggested that RISCV could be used instead, but it isn't the same beast.
They didn't document the instruction timings nor how to add custom instructions.

The company I used to work for used 4 NIOS II inside an fpga.
The instruction timing for one is pretty critical, it has some code that
has to complete in 122 clocks (worst case).
Our solution was to spend a few man-weeks writing a compatible cpu!
I think it came out with fewer pipeline stalls (in particular it 'lost'
the one for a (predicted) taken branch).
The maximum clock frequency might be lower, but it is OK at 62.5MHz, and the
higher 125MHz is just impossible for all sorts of reasons.
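
For a sense of scale, converting only the figures stated above (nothing is assumed about the core itself): 122 clocks at 62.5MHz is 1952ns, just under 2us, while the same cycle count at 125MHz would be 976ns. A trivial helper for the conversion:

```c
#include <assert.h>
#include <stdint.h>

/* Convert a worst-case cycle budget to nanoseconds for a clock in kHz. */
static uint64_t cycles_to_ns(uint64_t cycles, uint64_t clk_khz)
{
	/* One cycle at f kHz lasts 1e6 / f ns; multiply first to stay exact. */
	return cycles * 1000000u / clk_khz;
}
```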

OTOH I really wouldn't run Linux on it!

-- David

> and support for it has been removed from GCC 15 [2],
> Buildroot [3], and QEMU [4].
> 
> Given all of these factors, it is time to remove Nios II support from
> the kernel. The maintainer stated in 2024 that they were planning to do
> so soon [5], but this did not come to pass.
> 
> Remove Nios II support from the kernel and move the former maintainer
> to CREDITS. Thank you, Dinh Nguyen, for maintaining Nios II support!
> 
> References:
> [1] https://docs.altera.com/v/u/docs/781327/is-discontinuing-ip-ordering-codes-listed-in-pdn2312-for-nios-ii-ip
> [2] https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=e876acab6cdd84bb2b32c98fc69fb0ba29c81153
> [3] https://github.com/buildroot/buildroot/commit/6775ccc5a199d574ad70b5f79ec58cce97a07c6f
> [4] https://github.com/qemu/qemu/commit/6c3014858c4c0024dd0560f08a6eda0f92f658d6
> [5] https://sourceware.org/pipermail/newlib/2024/021083.html
> 
> Signed-off-by: Ethan Nelson-Moore <enelsonmoore@gmail.com>

* Re: [PATCH v2] dcache: add fs.dentry-limit sysctl with negative-first reaper
From: Jan Kara @ 2026-05-19  8:45 UTC
  To: Horst Birthelmer
  Cc: Matthew Wilcox, Horst Birthelmer, Miklos Szeredi, Jonathan Corbet,
	Shuah Khan, Alexander Viro, Christian Brauner, Jan Kara,
	linux-doc, linux-kernel, linux-fsdevel, Horst Birthelmer
In-Reply-To: <aglh7SrXWbYgD3nA@fedora.fritz.box>

Hi Horst!

On Sun 17-05-26 09:57:41, Horst Birthelmer wrote:
> On Sun, May 17, 2026 at 12:09:26AM +0100, Matthew Wilcox wrote:
> > On Sat, May 16, 2026 at 04:52:54PM +0200, Horst Birthelmer wrote:
> > > There was a discussion at LSFMM about servers with too many cached
> > > negative dentries.
> > > That gave me the idea to keep the dentries in general limited
> > > if the system administrator needs it to.
> > 
> > I feel you should link to the dozens of previous attempts at this kind
> > of thing to show that you're aware that this has been tried before and
> > you're doing something meaningfully different.

<snip>

> As a conclusion, I think I have an uncommon perspective on the cache entries
> since I don't usually work on vfs but argue from the perspective of a FUSE server,
> where the kernel makes us waste resources. This hurts way more in the FUSE context
> than in a 'normal' file system.
> I have taken a look at the dentry cache just because people told me that this
> has to be solved in the vfs (and I agree). I actually have a somewhat hacky patch
> to do this from fuse and only for the fuse sb.

So I'm a bit confused here. The changelog speaks only about negative
dentries (and that's what the change also concentrates on). OTOH you've
mentioned multiple times that you are not really interested in limiting
negative dentries but rather positive ones because you have a problem with
cached inodes. So can you perhaps formulate what is exactly the problem
you're trying to solve?
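
To pin down what a "negative-first" limit means in the patch subject, here is a toy model; it is a user-space illustration with hypothetical names, not the dcache code and not necessarily the patch's exact algorithm. Once the number of live entries exceeds the limit, the oldest negative entries are evicted first, and positive ones are only touched if that is not enough. That second phase is the one that would affect cached inodes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cache entry; index 0 is the oldest (LRU order). */
struct entry {
	int negative;  /* 1: caches a failed lookup, has no inode */
	int evicted;
};

/*
 * Evict entries until at most @limit remain live: first the oldest
 * negative entries, then (only if still over the limit) the oldest
 * positive ones. Returns the number of entries evicted.
 */
static size_t reap(struct entry *e, size_t n, size_t limit)
{
	size_t live = 0, evicted = 0;

	for (size_t i = 0; i < n; i++)
		live += !e[i].evicted;

	for (int pass = 0; pass < 2 && live > limit; pass++) {
		for (size_t i = 0; i < n && live > limit; i++) {
			if (e[i].evicted)
				continue;
			/* pass 0 only takes negative entries */
			if (pass == 0 && !e[i].negative)
				continue;
			e[i].evicted = 1;
			live--;
			evicted++;
		}
	}
	return evicted;
}
```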

Also you mention that cached (positive) dentries and inodes are wasted
memory when they aren't used. That is certainly a valid view, OTOH you can
never predict the future so you don't really know what will get used in the
future and thus will be useful. That's why we currently side with the idea
that memory that isn't used for something is wasted and unless there's
something to use the memory for, we cache dentries & inodes & page cache in
it.

If I remember the discussion we had at LSF correctly, the reason inode
caching is a problem for you, although there's enough free memory and no
memory pressure, is that these cached inodes pin memory on the other end of
the FUSE communication channel and there we are getting short on memory. Is
this what you're trying to solve?

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

* Re: [PATCH RFC 2/5] dma-heap: charge dma-buf memory via explicit memcg
From: Albert Esteve @ 2026-05-19  8:25 UTC
  To: Christian König
  Cc: T.J. Mercier, Christian Brauner, Tejun Heo, Johannes Weiner,
	Michal Koutný, Jonathan Corbet, Shuah Khan, Sumit Semwal,
	Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song,
	Andrew Morton, Benjamin Gaignard, Brian Starkey, John Stultz,
	Paul Moore, James Morris, Serge E. Hallyn, Stephen Smalley,
	Ondrej Mosnacek, Shuah Khan, cgroups, linux-doc, linux-kernel,
	linux-media, dri-devel, linaro-mm-sig, linux-mm,
	linux-security-module, selinux, linux-kselftest, mripard,
	echanude
In-Reply-To: <01b6eefc-c107-4f8c-9d7c-3b86f54cabaa@amd.com>

On Tue, May 19, 2026 at 9:20 AM Christian König
<christian.koenig@amd.com> wrote:
>
> On 5/19/26 01:39, T.J. Mercier wrote:
> > On Mon, May 18, 2026 at 7:07 AM Christian König
> > <christian.koenig@amd.com> wrote:
> >>
> >> On 5/18/26 14:50, Albert Esteve wrote:
> >>> On Mon, May 18, 2026 at 9:20 AM Christian König
> >>> <christian.koenig@amd.com> wrote:
> >>>>
> >>>> On 5/15/26 19:06, T.J. Mercier wrote:
> >>>>> On Fri, May 15, 2026 at 6:53 AM Christian Brauner <brauner@kernel.org> wrote:
> >>>>>>
> >>>>>> On Tue, May 12, 2026 at 11:10:44AM +0200, Albert Esteve wrote:
> >>>>>>> On embedded platforms a central process often allocates dma-buf
> >>>>>>> memory on behalf of client applications. Without a way to
> >>>>>>> attribute the charge to the requesting client's cgroup, the
> >>>>>>> cost lands on the allocator, making per-cgroup memory limits
> >>>>>>> ineffective for the actual consumers.
> >>>>>>>
> >>>>>>> Add charge_pid_fd to struct dma_heap_allocation_data. When set to
> >>>>>>
> >>>>>> Please be aware that pidfds come in two flavors:
> >>>>>>
> >>>>>> thread-group pidfds and thread-specific pidfds. Make sure that your API
> >>>>>> doesn't implicitly depend on this distinction not existing.
> >>>>>
> >>>>> Hi Christian,
> >>>>>
> >>>>> Memcg is not a controller that supports "thread mode" so all threads
> >>>>> in a group should belong to the same memcg.
> >>>>
> >>>> BTW: Exactly that is the requirement automotive has with their native context use case.
> >>>>
> >>>> The use case is that you have a daemon which has multiple threads where each one is acting on behalf of some other process.
> >>>>
> >>>> At the moment we basically say they are simply not using cgroups for that use case, but it would be really nice if we could handle that as well.
> >>>>
> >>>> Summarizing the requirement of that use case: You need a different cgroup for each thread of a process.
> >>>
> >>> Hi Christian,
> >>>
> >>> Thanks for sharing this automotive use case. If I understand correctly,
> >>> the actual requirement is attributing dma-buf charges to the right
> >>> client, not putting each daemon thread in a different cgroup?
> >>
> >> Nope, exactly that's the difference.
> >>
> >> The thread acts as a filtering agent for both memory allocation and command submission for somebody else; the process on whose behalf the daemon does things can even be in a client VM, completely remote over some network, or even something like a microcontroller.
> >>
> >> Everything the thread does regarding CPU time, GPU driver memory allocation as well as resources like GPU processing and I/O time etc.. needs to be accounted to one client which can be different for each thread of the process.
> >>
> >> The only thing which is shared with the main process thread is CPU memory resources, e.g. malloc() because that is basically just needed for housekeeping and pretty much irrelevant for this kind of use case.
> >>
> >> The problem is now you can't do that with cgroups at the moment but unfortunately only the kernel has the information you need to know to do this.
> >>
> >> So what you end up with is to define tons of interfaces just to get the necessary information from the kernel into userspace and then essentially duplicate the same infrastructure cgroup provides in the kernel in userspace again.
> >>
> >>> If so,
> >>> the `charge_pid_fd` approach achieves this directly by passing the
> >>> client's `pid_fd`, without needing to add per-thread cgroup
> >>> infrastructure.
> >>
> >> Well, it's already a massive improvement; we could basically stop doing the whole duplication part for the GPU driver stack and just use cgroups for this part.
> >>
> >> Doing that automatically for CPU and I/O time would just be nice to have additionally.
> >>
> >> Regards,
> >> Christian.
> >
> > Hopefully I'm following correctly here.... So you are duplicating the
> > GPU driver stack to achieve remote accounting on a per-thread basis?
>
> Not quite, we are duplicating the handling cgroup provides in the kernel in userspace.
>
> For this, memory usage information as well as execution times of the GPU kernel driver are exposed in fdinfo, for example.
>
> > Does this mean for GPU allocations you currently have some GFP_ACCOUNT
> > magic in your driver to attribute GPU memory to the correct remote
> > client?
>
> No, we just expose what the kernel driver has allocated for itself. E.g. page tables, buffers etc...
>
> When userspace allocates something using memfd_create() for example we just ignore that.


>
> > So this series would close the gap for dma-buf allocations,
> > but what about private GPU driver memory allocated on behalf of a
> > client?
>
> Well, we would need a cgroup which isn't associated with any process that we could charge the GPU driver allocations against.

I think I better understand your framing for this now. Thanks again
for taking the time to explain.

I was looking for a way to pass a cgroup around to do the charge. I
found that `struct cgroup *cgroup_get_from_fd(int fd)` already exists
among the available cgroup symbols for handling cgroup directories.

So here's an idea...

Rename the charge_pid_fd to charge_fd:
- If it is a pidfd (`!IS_ERR(pidfd_pid(fget(charge_fd)))`) then we do
what we're already doing here.
- If it is a cgroup_fd (`!IS_ERR(cgroup_get_from_fd(charge_fd))`) then
we charge to that cgroup.
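
On the userspace side, the two kinds of fd for such a charge_fd can already be told apart. A minimal sketch, assuming a cgroup v2 hierarchy: a cgroup directory fd lives on a filesystem whose fstatfs() magic is CGROUP2_SUPER_MAGIC, whereas a pidfd reports a different magic. The helper names are hypothetical, and in the kernel cgroup_get_from_fd() would remain the authoritative check.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>         /* open() */
#include <linux/magic.h>   /* CGROUP2_SUPER_MAGIC */
#include <sys/vfs.h>       /* fstatfs() */
#include <unistd.h>        /* close() */

enum charge_fd_kind { CHARGE_FD_INVALID = -1, CHARGE_FD_OTHER, CHARGE_FD_CGROUP };

/* Classify an fd the way a caller might before filling in charge_fd. */
static enum charge_fd_kind classify_charge_fd(int fd)
{
	struct statfs sfs;

	if (fstatfs(fd, &sfs) != 0)
		return CHARGE_FD_INVALID;
	if (sfs.f_type == CGROUP2_SUPER_MAGIC)
		return CHARGE_FD_CGROUP;
	return CHARGE_FD_OTHER; /* e.g. a pidfd, or something unrelated */
}

/* Convenience wrapper: classify a path such as a cgroup directory. */
static enum charge_fd_kind classify_charge_path(const char *path)
{
	int fd = open(path, O_RDONLY | O_DIRECTORY | O_CLOEXEC);
	enum charge_fd_kind kind;

	if (fd < 0)
		return CHARGE_FD_INVALID;
	kind = classify_charge_fd(fd);
	close(fd);
	return kind;
}
```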

Also we could add an ioctl for the generic fd path similar to what
we have for dma-buf heaps. Or have a new flavour for memfd_create:
```
memfd_create2(name, flags, charge_fd);
```

The transfer ioctl could also be made generic to accept both pidfds
and cgroup_fds.

For this series we could move forward as is, and make the generic
solution a follow-up series, knowing that the field can be reused for
cgroup fds.

>
> But good point, charging against a pid wouldn't work in this use case.
>
> Regards,
> Christian.
>


* Re: [PATCH 09/12] drm/syncobj: fix resource leak in drm_syncobj_import_sync_file_fence
From: Christian König @ 2026-05-19  8:22 UTC
  To: Julian Orth, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Simona Vetter, Sumit Semwal, Jonathan Corbet,
	Shuah Khan, Arnd Bergmann, Greg Kroah-Hartman
  Cc: dri-devel, linux-kernel, linux-media, linaro-mm-sig, linux-doc,
	wayland-devel
In-Reply-To: <20260516-jorth-syncobj-v1-9-88ede9d98a81@gmail.com>

On 5/16/26 13:06, Julian Orth wrote:
> Previously, if dma_fence_chain_alloc() failed, the syncobj and fence
> would be leaked.

Since it is a bug fix, this patch should be sent out separately from the patch set.

> 
> Signed-off-by: Julian Orth <ju.orth@gmail.com>
> ---
>  drivers/gpu/drm/drm_syncobj.c | 17 +++++++++++------
>  1 file changed, 11 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
> index 9b7ecc2978f5..1da96e23dfc0 100644
> --- a/drivers/gpu/drm/drm_syncobj.c
> +++ b/drivers/gpu/drm/drm_syncobj.c
> @@ -767,30 +767,35 @@ static int drm_syncobj_import_sync_file_fence(struct drm_file *file_private,
>  {
>  	struct dma_fence *fence = sync_file_get_fence(fd);
>  	struct drm_syncobj *syncobj;
> +	int ret = 0;

Please don't initialize local return variables, initialize them when you know that the function is successful.

Regards,
Christian.

>  
>  	if (!fence)
>  		return -EINVAL;
>  
>  	syncobj = drm_syncobj_find(file_private, handle);
>  	if (!syncobj) {
> -		dma_fence_put(fence);
> -		return -ENOENT;
> +		ret = -ENOENT;
> +		goto err_syncobj;
>  	}
>  
>  	if (point) {
>  		struct dma_fence_chain *chain = dma_fence_chain_alloc();
>  
> -		if (!chain)
> -			return -ENOMEM;
> +		if (!chain) {
> +			ret = -ENOMEM;
> +			goto err;
> +		}
>  
>  		drm_syncobj_add_point(syncobj, chain, fence, point);
>  	} else {
>  		drm_syncobj_replace_fence(syncobj, fence);
>  	}
>  
> -	dma_fence_put(fence);
> +err:
>  	drm_syncobj_put(syncobj);
> -	return 0;
> +err_syncobj:
> +	dma_fence_put(fence);
> +	return ret;
>  }
>  
>  static int drm_syncobj_export_sync_file(struct drm_file *file_private,
> 
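
The review comment about the return value can be illustrated with a small user-space model of the same unwind pattern: ret is assigned only on the paths that decide it, including ret = 0 right before the shared cleanup on success, and each label drops exactly the references taken above it. The refcounted type and function names are hypothetical stand-ins for the fence and the syncobj, not the DRM API.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical refcounted stand-in for a fence or a syncobj. */
struct ref {
	int count;
};

static struct ref *ref_get(struct ref *r)
{
	if (r)
		r->count++;
	return r;
}

static void ref_put(struct ref *r)
{
	r->count--;
}

/*
 * Model of the import path: take a fence reference, look up the
 * syncobj, optionally allocate a chain node, then drop both references.
 * @chain_ok simulates whether the chain allocation succeeds.
 */
static int import_fence(struct ref *fence_src, struct ref *syncobj_src,
			int point, int chain_ok)
{
	struct ref *fence = ref_get(fence_src);
	struct ref *syncobj;
	int ret;

	if (!fence)
		return -EINVAL;

	syncobj = ref_get(syncobj_src);
	if (!syncobj) {
		ret = -ENOENT;
		goto out_put_fence;
	}

	if (point && !chain_ok) {
		ret = -ENOMEM;
		goto out_put_syncobj;
	}

	/* ... attach the fence here; in this model attaching takes no ref ... */
	ret = 0;

out_put_syncobj:
	ref_put(syncobj);
out_put_fence:
	ref_put(fence);
	return ret;
}
```

Naming the labels out_* rather than err_* reflects that the success path falls through them too.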


* Re: [PATCH net-next v3 02/14] libie: add PCI device initialization helpers to libie
From: Philipp Stanner @ 2026-05-19  8:20 UTC
  To: Bjorn Helgaas, Tony Nguyen
  Cc: davem, kuba, pabeni, edumazet, andrew+netdev, netdev,
	Phani R Burra, larysa.zaremba, przemyslaw.kitszel,
	aleksander.lobakin, sridhar.samudrala, anjali.singhai,
	michal.swiatkowski, maciej.fijalkowski, emil.s.tantilov,
	madhu.chittim, joshua.a.hay, jacob.e.keller,
	jayaprakash.shanmugam, jiri, horms, corbet, richardcochran,
	linux-doc, bhelgaas, linux-pci, Bharath R, Samuel Salin,
	Aleksandr Loktionov, Philipp Stanner
In-Reply-To: <20260518215441.GA640516@bhelgaas>

On Mon, 2026-05-18 at 16:54 -0500, Bjorn Helgaas wrote:
> [+cc Philipp]
> 
> On Fri, May 15, 2026 at 03:44:26PM -0700, Tony Nguyen wrote:
> > From: Phani R Burra <phani.r.burra@intel.com>
> > 
> > Add support functions for drivers to configure PCI functionality and access
> > MMIO space.
> 
> This looks kind of like what pcim_iomap_range() does, i.e., a way to
> ioremap (BAR-idx, offset, size) pieces of PCI BARs.  That sounds like
> useful functionality.
> 
> Is there something Intel-specific or even ethernet-specific about
> this?  If devm_* and pcim_* don't do what you need, maybe they should
> be extended or this could be made generic so any drivers could use it?
> 
> This looks like a mix of managed (pcim_enable_device(),
> pcim_request_region()), and unmanaged (ioremap(), iounmap()) things.
> I haven't looked at how all this is used, but it's pretty easy to get
> things wrong when mixing models.
> 
> > +++ b/drivers/net/ethernet/intel/libie/pci.c
> > @@ -0,0 +1,208 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +/* Copyright (C) 2025 Intel Corporation */
> > +
> > +#include <linux/intel/libie/pci.h>
> > +
> > +/**
> > + * libie_find_mmio_region - find MMIO region containing a range
> > + * @mmio_list: list that contains MMIO region info
> > + * @offset: range start offset
> > + * @size: range size
> > + * @bar_idx: BAR index containing the range to search
> > + *
> > + * Return: pointer to a MMIO region overlapping with the range in any way or
> > + *	   NULL if no such region is mapped.
> > + */
> > +static struct libie_pci_mmio_region *
> > +libie_find_mmio_region(const struct list_head *mmio_list,
> > +		       resource_size_t offset, resource_size_t size,
> > +		       int bar_idx)
> > +{
> > +	resource_size_t end_offset = offset + size;
> > +	struct libie_pci_mmio_region *mr;
> > +
> > +	list_for_each_entry(mr, mmio_list, list) {
> > +		resource_size_t mr_end = mr->offset + mr->size;
> > +		resource_size_t mr_start = mr->offset;
> > +
> > +		if (mr->bar_idx != bar_idx)
> > +			continue;
> > +		if (offset < mr_end && end_offset > mr_start)
> > +			return mr;
> > +	}
> > +
> > +	return NULL;
> > +}
> > +
> > +/**
> > + * __libie_pci_get_mmio_addr - get the MMIO virtual address
> > + * @mmio_info: contains list of MMIO regions
> > + * @offset: register offset to find
> > + * @num_args: number of additional arguments present
> > + *
> > + * This function finds the virtual address of a register offset by iterating
> > + * through the non-linear MMIO regions that are mapped by the driver.
> > + *
> > + * Return: valid MMIO virtual address or NULL.
> > + */
> > +void __iomem *__libie_pci_get_mmio_addr(struct libie_mmio_info *mmio_info,
> > +					resource_size_t offset,
> > +					int num_args, ...)
> > +{
> > +	struct libie_pci_mmio_region *mr;
> > +	int bar_idx = 0;
> > +	va_list args;
> > +
> > +	if (num_args) {
> > +		va_start(args, num_args);
> > +		bar_idx = va_arg(args, int);
> > +		va_end(args);
> > +	}
> > +
> > +	list_for_each_entry(mr, &mmio_info->mmio_list, list)
> > +		if (bar_idx == mr->bar_idx && offset >= mr->offset &&
> > +		    offset < mr->offset + mr->size) {
> > +			offset -= mr->offset;
> > +
> > +			return mr->addr + offset;
> > +		}
> > +
> > +	return NULL;
> > +}
> > +EXPORT_SYMBOL_NS_GPL(__libie_pci_get_mmio_addr, "LIBIE_PCI");
> > +
> > +/**
> > + * __libie_pci_map_mmio_region - map PCI device MMIO region
> > + * @mmio_info: struct to store the mapped MMIO region
> > + * @offset: MMIO region start offset
> > + * @size: MMIO region size
> > + * @num_args: number of additional arguments present
> > + *
> > + * Return: true on success, false on memory map failure.
> > + */
> > +bool __libie_pci_map_mmio_region(struct libie_mmio_info *mmio_info,
> > +				 resource_size_t offset,
> > +				 resource_size_t size, int num_args, ...)
> > +{
> > +	struct pci_dev *pdev = mmio_info->pdev;
> > +	struct libie_pci_mmio_region *mr;
> > +	resource_size_t pa;
> > +	void __iomem *va;
> > +	int bar_idx = 0;
> > +	va_list args;
> > +
> > +	if (num_args) {
> > +		va_start(args, num_args);
> > +		bar_idx = va_arg(args, int);
> > +		va_end(args);
> > +	}
> > +
> > +	if (offset + size > pci_resource_len(pdev, bar_idx))
> > +		return false;
> > +
> > +	mr = libie_find_mmio_region(&mmio_info->mmio_list, offset, size,
> > +				    bar_idx);
> > +	if (mr) {
> > +		pci_warn(pdev,
> > +			 "Mapping of BAR%u (offset=%llu, size=%llu) intersecting region (offset=%llu, size=%llu) already exists\n",
> > +			 bar_idx, (unsigned long long)mr->offset,
> > +			 (unsigned long long)mr->size,
> > +			 (unsigned long long)offset, (unsigned long long)size);
> > +		return mr->offset <= offset &&
> > +		       mr->offset + mr->size >= offset + size;
> > +	}
> > +
> > +	pa = pci_resource_start(pdev, bar_idx) + offset;
> > +	va = ioremap(pa, size);

I agree with Bjorn, this certainly looks like something that can be
covered by shared PCI infrastructure?

> > +	if (!va) {
> > +		pci_err(pdev, "Failed to map BAR%u region\n", bar_idx);
> > +		return false;
> > +	}
> > +
> > +	mr = kvzalloc_obj(*mr);
> > +	if (!mr) {
> > +		iounmap(va);
> > +		return false;
> > +	}
> > +
> > +	mr->addr = va;
> > +	mr->offset = offset;
> > +	mr->size = size;
> > +	mr->bar_idx = bar_idx;
> > +
> > +	list_add_tail(&mr->list, &mmio_info->mmio_list);
> > +
> > +	return true;
> > +}
> > +EXPORT_SYMBOL_NS_GPL(__libie_pci_map_mmio_region, "LIBIE_PCI");
> > +
> > +/**
> > + * libie_pci_unmap_fltr_regs - unmap selected PCI device MMIO regions
> > + * @mmio_info: contains list of MMIO regions to unmap
> > + * @fltr: returns true, if region is to be unmapped
> > + */
> > +void libie_pci_unmap_fltr_regs(struct libie_mmio_info *mmio_info,
> > +			       bool (*fltr)(struct libie_mmio_info *mmio_info,
> > +					    struct libie_pci_mmio_region *reg))
> > +{
> > +	struct libie_pci_mmio_region *mr, *tmp;
> > +
> > +	list_for_each_entry_safe(mr, tmp, &mmio_info->mmio_list, list) {
> > +		if (!fltr(mmio_info, mr))
> > +			continue;
> > +		iounmap(mr->addr);
> > +		list_del(&mr->list);
> > +		kvfree(mr);
> > +	}
> > +}
> > +EXPORT_SYMBOL_NS_GPL(libie_pci_unmap_fltr_regs, "LIBIE_PCI");
> > +
> > +/**
> > + * libie_pci_unmap_all_mmio_regions - unmap all PCI device MMIO regions
> > + * @mmio_info: contains list of MMIO regions to unmap
> > + */
> > +void libie_pci_unmap_all_mmio_regions(struct libie_mmio_info *mmio_info)
> > +{
> > +	struct libie_pci_mmio_region *mr, *tmp;
> > +
> > +	list_for_each_entry_safe(mr, tmp, &mmio_info->mmio_list, list) {
> > +		iounmap(mr->addr);
> > +		list_del(&mr->list);
> > +		kvfree(mr);
> > +	}
> > +}
> > +EXPORT_SYMBOL_NS_GPL(libie_pci_unmap_all_mmio_regions, "LIBIE_PCI");
> > +
> > +/**
> > + * libie_pci_init_dev - enable and reserve PCI regions of the device
> > + * @pdev: PCI device information
> > + *
> > + * Return: %0 on success, -%errno on failure.
> > + */
> > +int libie_pci_init_dev(struct pci_dev *pdev)
> > +{
> > +	int err;
> > +
> > +	err = pcim_enable_device(pdev);
> > +	if (err)
> > +		return err;
> > +
> > +	for (int bar = 0; bar < PCI_STD_NUM_BARS; bar++)
> > +		if (pci_resource_flags(pdev, bar) & IORESOURCE_MEM) {
> > +			err = pcim_request_region(pdev, bar, pci_name(pdev));

So mappings are handled manually, and region requests automatically
through devres?

In case you can use (or add) a pcim_iomap_region() function for that,
you would get consistent automatic devres management.


Greetings,
P.

> > +			if (err)
> > +				return err;
> > +		}
> > +
> > +	err = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64));
> > +	if (err)
> > +		return err;
> > +
> > +	pci_set_master(pdev);
> > +
> > +	return 0;
> > +}
> > +EXPORT_SYMBOL_NS_GPL(libie_pci_init_dev, "LIBIE_PCI");
> > +
> > +MODULE_DESCRIPTION("Common Ethernet PCI library");
> > +MODULE_LICENSE("GPL");
> > diff --git a/include/linux/intel/libie/pci.h b/include/linux/intel/libie/pci.h
> > new file mode 100644
> > index 000000000000..effd072c55c8
> > --- /dev/null
> > +++ b/include/linux/intel/libie/pci.h
> > @@ -0,0 +1,56 @@
> > +/* SPDX-License-Identifier: GPL-2.0-only */
> > +/* Copyright (C) 2025 Intel Corporation */
> > +
> > +#ifndef __LIBIE_PCI_H
> > +#define __LIBIE_PCI_H
> > +
> > +#include <linux/pci.h>
> > +
> > +/**
> > + * struct libie_pci_mmio_region - structure for MMIO region info
> > + * @list: used to add a MMIO region to the list of MMIO regions in
> > + *	  libie_mmio_info
> > + * @addr: virtual address of MMIO region start
> > + * @offset: start offset of the MMIO region
> > + * @size: size of the MMIO region
> > + * @bar_idx: BAR index to which the MMIO region belongs to
> > + */
> > +struct libie_pci_mmio_region {
> > +	struct list_head	list;
> > +	void __iomem		*addr;
> > +	resource_size_t		offset;
> > +	resource_size_t		size;
> > +	u16			bar_idx;
> > +};
> > +
> > +/**
> > + * struct libie_mmio_info - contains list of MMIO regions
> > + * @pdev: PCI device pointer
> > + * @mmio_list: list of MMIO regions
> > + */
> > +struct libie_mmio_info {
> > +	struct pci_dev		*pdev;
> > +	struct list_head	mmio_list;
> > +};
> > +
> > +#define libie_pci_map_mmio_region(mmio_info, offset, size, ...)	\
> > +	__libie_pci_map_mmio_region(mmio_info, offset, size,		\
> > +				     COUNT_ARGS(__VA_ARGS__), ##__VA_ARGS__)
> > +
> > +#define libie_pci_get_mmio_addr(mmio_info, offset, ...)		\
> > +	__libie_pci_get_mmio_addr(mmio_info, offset,			\
> > +				   COUNT_ARGS(__VA_ARGS__), ##__VA_ARGS__)
> > +
> > +bool __libie_pci_map_mmio_region(struct libie_mmio_info *mmio_info,
> > +				 resource_size_t offset, resource_size_t size,
> > +				 int num_args, ...);
> > +void __iomem *__libie_pci_get_mmio_addr(struct libie_mmio_info *mmio_info,
> > +					resource_size_t offset,
> > +					int num_args, ...);
> > +void libie_pci_unmap_all_mmio_regions(struct libie_mmio_info *mmio_info);
> > +void libie_pci_unmap_fltr_regs(struct libie_mmio_info *mmio_info,
> > +			       bool (*fltr)(struct libie_mmio_info *mmio_info,
> > +					    struct libie_pci_mmio_region *reg));
> > +int libie_pci_init_dev(struct pci_dev *pdev);
> > +
> > +#endif /* __LIBIE_PCI_H */
> > -- 
> > 2.47.1
> > 

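The region bookkeeping in the quoted libie_find_mmio_region(), __libie_pci_map_mmio_region() and __libie_pci_get_mmio_addr() reduces to half-open interval checks over a list. The userspace model below is a simplified sketch. It uses an array instead of a list_head and integers instead of `__iomem` pointers, and all `toy_*` names are hypothetical. It is meant to make the lookup rules easy to check, not to replace the pcim_iomap_range()/pcim_iomap_region() route the reviewers suggest.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for struct libie_pci_mmio_region (hypothetical). */
struct toy_region {
	int bar_idx;
	uint64_t offset;	/* start offset within the BAR */
	uint64_t size;
	uintptr_t addr;		/* pretend virtual address of @offset */
};

/* Half-open ranges [a, a + alen) and [b, b + blen) intersect iff each
 * starts before the other ends: the test libie_find_mmio_region() uses. */
static bool toy_overlaps(uint64_t a, uint64_t alen, uint64_t b, uint64_t blen)
{
	return a < b + blen && b < a + alen;
}

/* What __libie_pci_map_mmio_region() returns when an intersecting mapping
 * already exists: success only if it covers the whole requested range. */
static bool toy_contains(const struct toy_region *r, uint64_t off,
			 uint64_t size)
{
	return r->offset <= off && r->offset + r->size >= off + size;
}

static const struct toy_region *
toy_find_overlap(const struct toy_region *regs, size_t n, int bar,
		 uint64_t off, uint64_t size)
{
	for (size_t i = 0; i < n; i++)
		if (regs[i].bar_idx == bar &&
		    toy_overlaps(off, size, regs[i].offset, regs[i].size))
			return &regs[i];
	return NULL;
}

/* Offset-to-address translation, as in __libie_pci_get_mmio_addr(). */
static uintptr_t toy_get_addr(const struct toy_region *regs, size_t n,
			      int bar, uint64_t off)
{
	for (size_t i = 0; i < n; i++)
		if (regs[i].bar_idx == bar && off >= regs[i].offset &&
		    off < regs[i].offset + regs[i].size)
			return regs[i].addr + (uintptr_t)(off - regs[i].offset);
	return 0;	/* models the NULL return */
}
```

Under these rules, a request that only partially overlaps an existing mapping fails rather than being mapped a second time.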


* Re: [PATCH 00/12] misc/syncobj: add /dev/syncobj device
From: Christian König @ 2026-05-19  8:18 UTC (permalink / raw)
  To: Julian Orth
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, Sumit Semwal, Jonathan Corbet, Shuah Khan,
	Arnd Bergmann, Greg Kroah-Hartman, dri-devel, linux-kernel,
	linux-media, linaro-mm-sig, linux-doc, wayland-devel,
	Michel Dänzer
In-Reply-To: <CAHijbEWqc2+kSkk3i_LxB2PQ6XwUetw1UkdUdXJfdv3zgKd1kA@mail.gmail.com>

On 5/18/26 14:58, Julian Orth wrote:
> On Mon, May 18, 2026 at 2:41 PM Christian König
> <christian.koenig@amd.com> wrote:
...
>> It could be that we have eventfd integration for that as well now, but in that case you could give the compositor an eventfd instead of a drm_syncobj fd in the first place.
> 
> Yes, all compositors use the DRM_IOCTL_SYNCOBJ_EVENTFD ioctl to wait
> async for the timeline point to materialize and/or be signaled. The
> wayland protocol was the motivation for that ioctl.
> 
>>
>> So as far as I can see using drm_syncobj for software rendering really doesn't make sense, eventfd is a much better fit for that use case.
> 
> Using eventfd has some disadvantages:
> 
> - We've just added syncobj support to vulkan:
> https://github.com/KhronosGroup/Vulkan-Docs/issues/2473#issuecomment-4446117280.
> For eventfd we would not only have to add yet another extension, that
> would realistically only be exposed by llvmpipe, but also every
> compositor and every client would have to support both extensions.
> - Similarly, a new wayland protocol would need to be designed to
> support sync over eventfd.
> - Eventfd does not support timeline semantics. Meaning that you would
> have to send two eventfds over the wire for each commit, one for the
> acquire point and one for the release point. Whereas with syncobj you
> only need to send two integers per commit.
> 
> I don't see the advantage when drm_syncobj already does everything we need.
> 
> You seem to believe that compositors would not be ready for this and
> from that perspective I can understand your apprehension. But I can
> assure you that compositors are already fully set up to support all of
> the use cases I've described: the Wayland protocol requires the
> compositor to support wait before signal.
Yeah that's much better than I thought it would be.

And that eventfds don't support timeline points is indeed a pretty good argument.
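The timeline argument can be made concrete with a toy model. It treats a timeline as one monotonically increasing 64-bit value, where point N counts as signaled once that value reaches N. This is a sketch under that assumption only. It ignores the dma_fence_chain machinery and wait-before-signal, and every `toy_*` name is hypothetical. In the model, each commit carries just two integers that refer to timelines exchanged once up front, which matches the "two integers per commit" point quoted above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy timeline: a single payload that only ever moves forward. */
struct toy_timeline {
	uint64_t value;	/* highest point signaled so far */
};

static void toy_timeline_signal(struct toy_timeline *t, uint64_t point)
{
	if (point > t->value)	/* signaling an older point is a no-op */
		t->value = point;
}

/* @point has signaled once the payload has reached it. */
static bool toy_timeline_signaled(const struct toy_timeline *t,
				  uint64_t point)
{
	return t->value >= point;
}

/* Per-commit payload in this model: two integers referring to an acquire
 * and a release timeline that were shared once, up front. */
struct toy_commit {
	uint64_t acquire_point;
	uint64_t release_point;
};
```

Signaling point 2 implicitly satisfies waiters on point 1. Two independent eventfds per commit would not provide that for free.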

But I still don't see much justification for creating a /dev/syncobj device; this is clearly something DRM-specific.

What about using VGEM for this?

Regards,
Christian.

> 
>>
>> Regards,
>> Christian.


* Re: [PATCH v4 03/30] UAPI: x86: Move pvclock-abi to UAPI for x86 platforms
From: David Woodhouse @ 2026-05-19  7:56 UTC (permalink / raw)
  To: Dongli Zhang, kvm
  Cc: Paolo Bonzini, Jonathan Corbet, Shuah Khan, Thomas Gleixner,
	Sean Christopherson, Ingo Molnar, Borislav Petkov, Dave Hansen,
	x86, H. Peter Anvin, Vitaly Kuznetsov, Juergen Gross,
	Boris Ostrovsky, Paul Durrant, Jonathan Cameron, Sascha Bischoff,
	Marc Zyngier, Joey Gouly, Jack Allister, joe.jin, linux-doc,
	linux-kernel, xen-devel, linux-kselftest
In-Reply-To: <93e799fd-b661-45f0-9cc6-21823765332e@oracle.com>


On Tue, 2026-05-19 at 00:35 -0700, Dongli Zhang wrote:
> I have encountered below build warning.
> 
> Perhaps it is because of PATCH 03?
> 
Almost certainly; I'll clean it up. Thank you.

> In file included from ./include/linux/types.h:5,
>                  from ./arch/x86/include/uapi/asm/pvclock-abi.h:5,
>                  from ./arch/x86/include/asm/xen/interface.h:197,
>                  from ./include/xen/interface/xen.h:13,
>                  from <command-line>:
> ./include/uapi/linux/types.h:10:2: warning: #warning "Attempt to use kernel headers from user space, see https://kernelnewbies.org/KernelHeaders" [-Wcpp]
>    10 | #warning "Attempt to use kernel headers from user space, see https://kernelnewbies.org/KernelHeaders"
>       |  ^~~~~~~




* Re: [PATCH v4 16/30] KVM: x86: Restructure kvm_guest_time_update() for TSC upscaling
From: David Woodhouse @ 2026-05-19  7:54 UTC (permalink / raw)
  To: Dongli Zhang, kvm
  Cc: Paolo Bonzini, Jonathan Corbet, Shuah Khan, Sean Christopherson,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Vitaly Kuznetsov, Juergen Gross, Boris Ostrovsky,
	Paul Durrant, Jonathan Cameron, Sascha Bischoff, Marc Zyngier,
	Joey Gouly, Jack Allister, joe.jin, linux-doc, linux-kernel,
	xen-devel, linux-kselftest
In-Reply-To: <b5a8262d-4128-4fd4-b3db-fa718002c4cc@oracle.com>


On Tue, 2026-05-19 at 00:38 -0700, Dongli Zhang wrote:
> I have encountered this build error with this patch.
> 
> Perhaps it is because all usage of "flags" are removed.
> 
> $ make -j32 > /dev/null
> arch/x86/kvm/x86.c: In function ‘kvm_guest_time_update’:
> arch/x86/kvm/x86.c:3359:23: error: unused variable ‘flags’ [-Werror=unused-variable]
>  3359 |         unsigned long flags;
>       |                       ^~~~~
> cc1: all warnings being treated as errors
> make[4]: *** [scripts/Makefile.build:289: arch/x86/kvm/x86.o] Error 1
> make[3]: *** [scripts/Makefile.build:548: arch/x86/kvm] Error 2
> make[2]: *** [scripts/Makefile.build:548: arch/x86] Error 2
> make[1]: *** [/home/opc/ext4/mainline-linux/Makefile:2143: .] Error 2
> make: *** [Makefile:248: __sub-make] Error 2
> 
> Thank you very much!
> 
> Dongli Zhang

Yes, in all the refactoring/rebasing, somehow the line which should
have removed 'flags' there ended up in
https://lore.kernel.org/all/20260509224824.3264567-31-dwmw2@infradead.org/
along with another one-liner that should have been in a different
previous commit and breaks bisectability of that too. Sorry about that.

Should all be fixed in
https://git.infradead.org/?p=users/dwmw2/linux.git;a=shortlog;h=refs/heads/kvmclock5
where I'm accumulating various fixes in preparation to post a v5.



* Re: [Linaro-mm-sig] Re: [PATCH RFC 2/5] dma-heap: charge dma-buf memory via explicit memcg
From: Christian König @ 2026-05-19  7:53 UTC (permalink / raw)
  To: Albert Esteve
  Cc: Barry Song, T.J. Mercier, Tejun Heo, Johannes Weiner,
	Michal Koutný, Jonathan Corbet, Shuah Khan, Sumit Semwal,
	Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song,
	Andrew Morton, Benjamin Gaignard, Brian Starkey, John Stultz,
	Christian Brauner, Paul Moore, James Morris, Serge E. Hallyn,
	Stephen Smalley, Ondrej Mosnacek, Shuah Khan, cgroups, linux-doc,
	linux-kernel, linux-media, dri-, linaro-mm-sig, linux-mm,
	linux-security-module, selinux, linux-kselftest, mripard,
	echanude
In-Reply-To: <CADSE00Lc42s2bzXzV5D7t1Enf56u4BVj-yXLp3Yxhm0=qMPvuw@mail.gmail.com>

On 5/18/26 14:06, Albert Esteve wrote:
>>>>> udmabufs are already
>>>>> memcg-charged, so adding a separate MEMCG_DMABUF would double count.
>>>>> Are there any other exporters you had in mind that would benefit from
>>>>> this approach?
>>
>> Well, apart from DMA-buf, memfd_create() is one of the things which has broken our neck in the past a couple of times.
>>
>> But thinking more about it: instead of making this DMA-buf heaps specific, what if we had a general cgroups function which allows changing the accounting of a buffer referenced by a file descriptor to a different process?
>>
>> That would cover not only the DMA-buf heaps use case, but also all other DMA-bufs with dmem and whatever we come up with in the future as well.
> 
> I removed a draft adding an ioctl for charge transfer from the series
> before sending because I wanted to focus on the charge_pid_fd approach
> and keep things simple, deferring the recharge path to a follow-up
> depending on feedback.
> 
> The main difference between my removed draft and what you're
> describing, iiuc, is scope and layer: my draft was an explicit ioctl
> on the dma-buf fd that the consumer calls to claim the charge (see
> below), while you seem to be suggesting a more general kernel-internal
> function that could work across buffer types and cgroup controllers,
> so not necessarily userspace-initiated? A kernel-internal function
> will need a way to identify the target process, which sounds similar
> to the binder-backed approach from TJ [1]. For everything else, the
> receiver still needs to declare itself, which the ioctl accomplishes.
> 
> ```
> /* When an app imports a daemon-allocated buffer, it can transfer the
>  * charge to itself: */
> int buf_fd = receive_dmabuf_from_daemon();
> ioctl(buf_fd, DMA_BUF_IOCTL_XFER_CHARGE); /* charge now attributed to the app's cgroup */
> ```

Well that thinking goes into the right direction, but the requirements are still not completely covered as far as I can see.

Let me explain below a bit more.

> 
> [1] https://lore.kernel.org/cgroups/20230109213809.418135-1-tjmercier@google.com/
> 
>>
>> The only drawback I can see is that DMA-buf heap allocations would be temporarily accounted to the memory allocation daemon, but I don't think that this would be a problem.
> 
> The main reasons we moved away from TJ's transfer-based approach
> toward `charge_pid_fd` are: avoid the transient charge window on the
> daemon's cgroup; and to decouple from Binder, allowing any allocator
> to use it.

Yeah those concerns are completely correct.

The application should not volunteer by saying 'charge that buffer to me'; rather, the daemon should say 'force-charge that buffer to this application and tell me when the application is over its limit'.

> 
> Technically, both approaches could coexist, though. Of the three
> scenarios TJ described:
> - Scenario 2 is directly addressed by charge_pid_fd approach without
> any transient charge on the daemon at the cost of one extra field in
> the heap ioctl uAPI struct.

Yeah, extending the uAPI to pass in the pid at allocation time is not much of a problem, but you also need to modify the whole stack above it, and that is a bit trickier.

> - Scenario 3 can be handled by the charge transfer function without
> changes to SurfaceFlinger. The app or dequeueBuffer claims the charge
> for itself or the app, respectively (depending on whether we include a
> pid_fd field in the transfer ioctl). It also covers non-heap
> exporters. The con in both variants is the transient charge window on
> the daemon.

It should be trivial for the daemon to charge the buffer to an application before handing it out.

> Both approaches shift the responsibility for correct charging
> attribution to userspace: first, `charge_pid_fd` on the allocator's
> side, and the transfer charge on the consumer's side.

Yeah, that's why I said it would be better if we did that without any uAPI change, but given all the uAPIs we have for transferring file descriptors (dup(), fork(), passing FDs over sockets, etc.), that could be really tricky to implement.

> Deciding on one, the other or both depends on how much we value
> avoiding transient attribution, and how much we need a non-heap
> generic solution. With the XFER_CHARGE we can cover both. Thus, the
> `charge_pid_fd` approach in this RFC can be seen as a
> performance/strictness optimisation, eliminating transient charges to
> the daemon at the cost of a permanent uAPI addition to the heap ioctl
> struct, but not strictly required for correctness.

Well all we need is a uAPI which says charge this buffer (file descriptor) to that cgroup (pidfd).

With this at hand we should be able to handle all use cases at the same time.
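The "charge this fd to that pidfd" operation, with the force-charge-and-report behaviour described earlier in this message, can be modeled in a few lines. This is a toy sketch under loud assumptions. It is not the memcg or dmem API, no such uAPI exists yet, and the `toy_*` names and plain byte-count accounting are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical accounting domain; not a real cgroup controller. */
struct toy_cgroup {
	size_t usage;
	size_t limit;
};

struct toy_buffer {
	size_t size;
	struct toy_cgroup *charged_to;	/* NULL until first charged */
};

/*
 * Move the whole charge of @buf to @target. The charge is applied even if
 * it pushes @target over its limit; the return value reports that so the
 * caller (e.g. an allocator daemon) can react.
 */
static bool toy_recharge(struct toy_buffer *buf, struct toy_cgroup *target)
{
	if (buf->charged_to)
		buf->charged_to->usage -= buf->size;
	target->usage += buf->size;
	buf->charged_to = target;
	return target->usage > target->limit;
}
```

In this model, moving the charge never counts the buffer twice, and an overrun is reported rather than refused. Whether a real implementation should refuse or merely report is a policy choice that this sketch does not settle.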

> On the other hand,
> if we agree on the end goal of migrating other exporters to use
> dma-buf heaps

That won't work. DMA-buf heaps are actually only a rather small and Android-specific use case.

We have tons of other interfaces to allocate DMA-bufs which need to stay around because of HW restrictions and we do need a solution for them as well.

Regards,
Christian.

>, and scenario 3 is addressed by adding the app's pid_fd
> to SurfaceFlinger, then `charge_pid_fd` alone is a coherent/sufficient
> approach despite the uAPI change.
> 
>>
>> Regards,
>> Christian.
>>
>>>
>>> Thanks
>>> Barry
>>
> 



* Re: [PATCH v4 04/30] KVM: x86: Add KVM_[GS]ET_CLOCK_GUEST for accurate KVM clock migration
From: David Woodhouse @ 2026-05-19  7:50 UTC (permalink / raw)
  To: Dongli Zhang, kvm
  Cc: Paolo Bonzini, Jonathan Corbet, Shuah Khan, Thomas Gleixner,
	Sean Christopherson, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Dave Hansen, Vitaly Kuznetsov, x86, Marc Zyngier, Juergen Gross,
	Boris Ostrovsky, Paul Durrant, Jonathan Cameron, Sascha Bischoff,
	Jack Allister, Joey Gouly, joe.jin, linux-doc, linux-kernel,
	xen-devel, linux-kselftest
In-Reply-To: <935312be-9a86-49fd-8bb4-2c998a68e2df@oracle.com>


On Mon, 2026-05-18 at 17:57 -0700, Dongli Zhang wrote:
> On 2026-05-18 1:48 AM, David Woodhouse wrote:
> > ...
> 
> I have fixed the Thunderbird configuration. Does it look better to you?

The date is certainly better, thank you. But although I *was* up late
that night frowning at clocks, I didn't think I was up *quite* as late
(almost 2am) as it suggests.

But I suspect that getting *that* right is beyond the limit of
Thunderbird's configurability.

Thanks :)

> I really appreciate guidelines like the ones below.
> 
> https://lore.kernel.org/all/20240522001817.619072-8-dwmw2@infradead.org
> 
> Assuming I am a user of the new API, I feel confused about whether the goal is
> to replace KVM_SET_CLOCK with KVM_SET_CLOCK_GUEST, or whether the latter is
> meant to supplement the former.

The issue is that KVM_SET_CLOCK_GUEST can only be used in 'masterclock'
mode, when the TSC is reliable and the guest TSCs are all in sync.

Which ought to be *all* of the time, on modern hardware and sane
configurations. And in this series, I don't even let the *guest* screw
that over by setting different TSC offsets on different vCPUs any more
(we stay in masterclock mode in that case now). But the VMM can cause
its guest to come out of masterclock mode, by setting different TSC
*speeds* on different vCPUs.

So there remain some pathological cases where the kvmclock actually
still has a justification to exist, and those are the cases where it
needs to be set in its own right as a function of host time
(KVM_SET_CLOCK), not purely as a function of the guest TSC
(KVM_SET_CLOCK_GUEST).

> 
> If we are going to use KVM_SET_CLOCK_GUEST when KVM_SET_CLOCK is not needed, I
> would appreciate it if the API could carry more data in addition to struct
> pvclock_vcpu_time_info.
> 
> +#define KVM_SET_CLOCK_GUEST    _IOW(KVMIO, 0xd6, struct pvclock_vcpu_time_info)
> +#define KVM_GET_CLOCK_GUEST    _IOR(KVMIO, 0xd7, struct pvclock_vcpu_time_info)
> 
> 
> In the future, if we need to carry additional data, we could simply reuse the
> padding fields instead of introducing another KVM_SET_CLOCK_GUEST2.
> 
> The following is an example of how additional data could be carried.
> 
> KVM: x86: Report host tsc and realtime values in KVM_GET_CLOCK
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c68dc1b577eabd5605c6c7c08f3e07ae18d30d5d

I'm not very keen on the way that KVM_[GS]ET_CLOCK threw extra time
references in over the years without any of them ever actually making
*sense*, which makes me reluctant. 

KVM_[GS]ET_CLOCK_GUEST on the other hand does *one* thing: it exports
the relationship between the guest TSC and kvmclock. We don't *need* to
litter it with time values from other clocks willy-nilly.

But yes, you are right in principle.

And in fact, the other part of this conversation has really drawn my
attention to the ugliness of the "try KVM_SET_CLOCK_GUEST and if that
doesn't take then fall back to KVM_SET_CLOCK" which we are pushing onto
userspace.

Yes, I covered it in the guidelines: but we should always abide by the
mantra, "if it *needs* documenting, fix it first".
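The sketch below shows roughly what the try-then-fall-back sequence looks like from userspace. It is a hypothetical illustration. The two function-pointer parameters stand in for issuing KVM_SET_CLOCK_GUEST and KVM_SET_CLOCK, and their real argument setup is omitted. The masterclock condition comes from the explanation earlier in this message.

```c
/* Hypothetical stand-ins for the two ioctls; each returns 0 on success or
 * a negative errno. */
typedef int (*toy_set_clock_fn)(void *ctx);

enum toy_clock_path {
	TOY_CLOCK_GUEST,	/* restored as a function of the guest TSC */
	TOY_CLOCK_HOST,		/* fell back to host-time-based KVM_SET_CLOCK */
	TOY_CLOCK_FAILED,
};

static enum toy_clock_path
toy_restore_kvmclock(toy_set_clock_fn set_clock_guest,
		     toy_set_clock_fn set_clock, void *ctx)
{
	/* Preferred path: only works in masterclock mode, and only on a
	 * kernel that has the ioctl at all. */
	if (set_clock_guest(ctx) == 0)
		return TOY_CLOCK_GUEST;

	/* Any failure falls back; this sketch does not try to tell
	 * "unsupported" apart from "not in masterclock mode". */
	if (set_clock(ctx) == 0)
		return TOY_CLOCK_HOST;

	return TOY_CLOCK_FAILED;
}
```

Every VMM has to carry this branching, which is the burden the proposed combined variant would remove.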

So perhaps we could build a variant which does both at once. It
provides the guest clock as a function of guest TSC for the sane
masterclock case, but *also* includes CLOCK_TAI at the same moment, in
case it needs to fall back to that.

Although... the actual reference time part of the existing pvclock
might be significantly in the past, especially with all the
MASTERCLOCK_UPDATE elimination that we've been doing. And we'd have to
regenerate it to get a simultaneous realtime reading, wouldn't we?
> > 

> > > Do we need a KVM_CHECK_EXTENSION capability for this? If userspace wants to
> > > support the new API, should it detect availability via KVM_CHECK_EXTENSION, or
> > > simply try the ioctl and handle failure?
> > 
> > That might be conventional, I suppose. But I suspect Jack's thinking
> > was that userspace is going to have to *try* it anyway, and still might
> > have to fall back to what KVM_SET_CLOCK can manage, so userspace
> > probably wouldn't even bother to check that capability; it doesn't
> > matter.
> > 
> > Since then, we've added some more attributes in this series though, and
> > it probably is worth adding a cap which advertises them *all*?
> > Something like KVM_CAP_CLOCK_PRECISION_API?
> 
> From an API user's perspective, userspace may need to distinguish between an API
> failure and the API not being available.

That's -ENOTTY vs. -EINVAL for the ioctl, isn't it? And isn't there
something similarly unambiguous for the attributes?

But I have no objection to adding a capability. The lack of it was more
oversight than intent.

> > > 
> > > From my perspective, I am also curious how we should reason about this in other
> > > scenarios in the future. Specifically, when do we need to process
> > > KVM_REQ_MASTERCLOCK_UPDATE before KVM_REQ_CLOCK_UPDATE, and when is it
> > > acceptable not to? I noticed that kvm_cpuid() already processes only
> > > KVM_REQ_CLOCK_UPDATE.
> > 
> > The way I've been thinking about it — and I'm only two cups of coffee
> > into Monday so take those words literally and don't think of them as
> > British understatement of something I believe is absolute truth — is
> > that MASTERCLOCK_UPDATE is updating the actual clock for the whole VM,
> > while CLOCK_UPDATE is about *putting* that information into the per-
> > vCPU pvclock structures.
> > 
> > So after a MASTERCLOCK_UPDATE, we need to do a CLOCK_UPDATE on all
> > vCPUs to disseminate the result. Which means that if CLOCK_UPDATE is
> > already pending before a MASTERCLOCK_UPDATE, it's probably redundant
> > and might as well be cleared because it's only going to get set *again*
> > in kvm_end_pvclock_update()? 
> 
> Another scenario is when only MASTERCLOCK_UPDATE is pending and there is no
> pending CLOCK_UPDATE.
> 
> In this scenario, is it fine to skip processing MASTERCLOCK_UPDATE before saving
> pvclock_vcpu_time_info?
> 

I'm not sure I understand that scenario. 

MASTERCLOCK_UPDATE means we have to actually recalculate the master
clock (which really *should* be rare, now!). And then any time we do
that, we also have to do a CLOCK_UPDATE on every vCPU to disseminate
the new information. Which is why kvm_end_pvclock_update() does exactly
that.

So your "MASTERCLOCK_UPDATE is pending and there is no pending
CLOCK_UPDATE" doesn't make much sense to me. If MASTERCLOCK_UPDATE is
pending, then there *will* be a CLOCK_UPDATE pending.
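Here is a toy model of that ordering. It assumes only what is described above: processing MASTERCLOCK_UPDATE recalculates a VM-wide value and then raises CLOCK_UPDATE on every vCPU, which is the role attributed to kvm_end_pvclock_update(). The request bits, struct and functions are hypothetical and much simpler than KVM's request machinery.

```c
#define TOY_NR_VCPUS 4

/* Toy request bits; they echo, but are not, KVM's request numbers. */
#define TOY_REQ_CLOCK_UPDATE		(1u << 0)
#define TOY_REQ_MASTERCLOCK_UPDATE	(1u << 1)

struct toy_vm {
	unsigned int master_gen;			/* bumped per recalculation */
	unsigned int requests[TOY_NR_VCPUS];		/* pending bits per vCPU */
	unsigned int published_gen[TOY_NR_VCPUS];	/* what each vCPU shows */
};

/* VM-wide update, then ask every vCPU to republish the result. */
static void toy_update_masterclock(struct toy_vm *vm)
{
	vm->master_gen++;
	for (int i = 0; i < TOY_NR_VCPUS; i++)
		vm->requests[i] |= TOY_REQ_CLOCK_UPDATE;
}

/* Per-vCPU processing: MASTERCLOCK_UPDATE first, then CLOCK_UPDATE. */
static void toy_process_requests(struct toy_vm *vm, int vcpu)
{
	if (vm->requests[vcpu] & TOY_REQ_MASTERCLOCK_UPDATE) {
		vm->requests[vcpu] &= ~TOY_REQ_MASTERCLOCK_UPDATE;
		toy_update_masterclock(vm);
	}
	if (vm->requests[vcpu] & TOY_REQ_CLOCK_UPDATE) {
		vm->requests[vcpu] &= ~TOY_REQ_CLOCK_UPDATE;
		vm->published_gen[vcpu] = vm->master_gen;
	}
}
```

Any CLOCK_UPDATE already pending before the master clock update is simply raised again by it. That is why the earlier one is redundant.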

> > > 
> > > Would it be helpful to validate that the delta is within a reasonable range,
> > > e.g. that the drift can never be more than five minutes (forward or backward)?
> > 
> > If a guest has been running for months on a previous host and is
> > migrated to a new host, don't we expect that the KVM clock of the new
> > VM on the new host is tweaked from its default near-zero after
> > creation, to some large amount?
> > 
> 
> Regarding live migration, my own investigation does not show a proportional
> relationship between VM uptime and the amount of drift.

You're comparing the VM on the source host, with the VM on the
destination post-migration.

Perhaps I misunderstood, but I thought your suggested validation of a
'reasonable range' would also apply when adjusting the kvmclock of the
nascent VM on the destination host, from "newly created" to "has been
running for months" while migrating the state of the actual guest onto
a clean new slate.

> Just taking QEMU + KVM as an example: suppose TSC scaling is inactive; then the
> amount of drift does not depend on how long the VM has been running before live
> migration.
> 
> Instead, it depends on the delta between when we call MSR_IA32_TSC and
> KVM_GET_CLOCK, and between MSR_IA32_TSC and KVM_SET_CLOCK.
> 
> The guest TSC stops at P1 and resumes at P3.
> The kvmclock stops at P2 and resumes at P4.
> 
> We expect P1 == P2 and P3 == P4.
> 
> On source host.
> 
> - kvm_get_msr_common(MSR_IA32_TSC) for vCPU=0 ===> P1

Here's where it all starts going wrong. Line 1.

Any API which lets you get a single time value in isolation, and thus
which is already out of date by the time the system call even returns,
is fundamentally unsuitable for migration.

> - kvm_get_msr_common(MSR_IA32_TSC) for vCPU=1
> - kvm_get_msr_common(MSR_IA32_TSC) for vCPU=2
> - kvm_get_msr_common(MSR_IA32_TSC) for vCPU=3
> - kvm_get_msr_common(MSR_IA32_TSC) for vCPU=4
> ... ...
> - kvm_get_msr_common(MSR_IA32_TSC) for vCPU=N
> - KVM_GET_CLOCK                               ===> P2
> 
> On target host.
> 
> - kvm_set_msr_common(MSR_IA32_TSC) for vCPU=1 ===> P3
> - kvm_set_msr_common(MSR_IA32_TSC) for vCPU=2

At this point, the nasty hack in the kernel steps in, realises that the
value you're setting on vCPU 2 is within a second or so of the value
you had previously set on vCPU 1, and snaps it back to be precisely the
same. To work around the fundamental brokenness of this method.

> - kvm_set_msr_common(MSR_IA32_TSC) for vCPU=3
> - kvm_set_msr_common(MSR_IA32_TSC) for vCPU=4
> - kvm_set_msr_common(MSR_IA32_TSC) for vCPU=5
> ... ...
> - kvm_set_msr_common(MSR_IA32_TSC) for vCPU=N
> - KVM_SET_CLOCK                               ====> P4
> 
> 
> Here is my equation to predict the drift.

I'm sure you're right, but I didn't get that far when looking at this.
I'd already thrown up in my mouth a little bit by line one.

Here's my equation to predict the drift of a live update done correctly
on the same host using the method I've now put in the documentation:

0.

:)

* Re: [PATCH 4/8] drm/panthor: Add support for protected memory allocation in panthor
From: Boris Brezillon @ 2026-05-19  7:39 UTC (permalink / raw)
  To: Chia-I Wu
  Cc: Liviu Dudau, Marcin Ślusarz, Ketil Johnsen, David Airlie,
	Simona Vetter, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, Jonathan Corbet, Shuah Khan, Sumit Semwal,
	Benjamin Gaignard, Brian Starkey, John Stultz, T.J. Mercier,
	Christian König, Steven Price, Daniel Almeida, Alice Ryhl,
	Matthias Brugger, AngeloGioacchino Del Regno, dri-devel,
	linux-doc, linux-kernel, linux-media, linaro-mm-sig,
	linux-arm-kernel, linux-mediatek, Florent Tomasin, nd
In-Reply-To: <CAPaKu7R9ET767qc3eppBUfG2RAeyrg7E-gE0turgp-u_FU4+Vg@mail.gmail.com>

On Mon, 18 May 2026 17:36:40 -0700
Chia-I Wu <olvaffe@gmail.com> wrote:

> On Mon, May 18, 2026 at 12:16 AM Boris Brezillon
> <boris.brezillon@collabora.com> wrote:
> >
> > On Wed, 13 May 2026 12:31:32 -0700
> > Chia-I Wu <olvaffe@gmail.com> wrote:
> >  
> > > On Tue, May 12, 2026 at 8:39 AM Liviu Dudau <liviu.dudau@arm.com> wrote:  
> > > >
> > > > On Tue, May 12, 2026 at 04:11:11PM +0200, Boris Brezillon wrote:  
> > > > > On Tue, 12 May 2026 14:47:27 +0100
> > > > > Liviu Dudau <liviu.dudau@arm.com> wrote:
> > > > >  
> > > > > > On Thu, May 07, 2026 at 01:53:56PM +0200, Boris Brezillon wrote:  
> > > > > > > On Thu, 7 May 2026 11:02:26 +0200
> > > > > > > Marcin Ślusarz <marcin.slusarz@arm.com> wrote:
> > > > > > >  
> > > > > > > > On Tue, May 05, 2026 at 06:15:23PM +0200, Boris Brezillon wrote:  
> > > > > > > > > > @@ -277,9 +286,21 @@ int panthor_device_init(struct panthor_device *ptdev)
> > > > > > > > > >                     return ret;
> > > > > > > > > >     }
> > > > > > > > > >
> > > > > > > > > > +   /* If a protected heap name is specified but not found, defer the probe until created */
> > > > > > > > > > +   if (protected_heap_name && strlen(protected_heap_name)) {  
> > > > > > > > >
> > > > > > > > > > Do we really need this strlen() > 0? Won't dma_heap_find() fail if the
> > > > > > > > > > name is "" already?  
> > > > > > > >
> > > > > > > > > If dma_heap_find() fails, then the whole probe will fail too.
> > > > > > > > This check prevents that.  
> > > > > > >
> > > > > > > Yeah, that's also a questionable design choice. I mean, we can
> > > > > > > currently probe and boot the FW even though we never setup the
> > > > > > > protected FW sections, so why should we defer the probe here? Can't we
> > > > > > > just retry the next time a group with the protected bit is created and
> > > > > > > > fail if we can't find a protected heap?  
> > > > > >
> > > > > > The problem we have with the current firmware is that it does a number of setup steps at "boot"
> > > > > > time only. One of the steps is preparing its internal structures for when it enters protected
> > > > > > mode and it stores them in the buffer passed in at firmware loading. We cannot later run the
> > > > > > process when we have a group with protected mode set.  
> > > > >
> > > > > No, but we can force a full/slow reset and have that thing
> > > > > re-initialized, can't we? I mean, that's basically what we do when a
> > > > > fast reset fails: we re-initialize all the sections and reset again, at
> > > > > which point the FW should start from a fresh state, and be able to
> > > > > properly initialize the protected-related stuff if protected sections
> > > > > are populated. Am I missing something?  
> > > >
> > > > Right, we can do that. For some reason I keep associating the reset with the
> > > > error handling and not with "normal" operations.  
> > > I kind of hope we end up with either
> > >
> > >  - panthor knows the exact heap to use and fails with EPROBE_DEFER if
> > > the heap is missing, or
> > >  - panthor gets a dma-buf from userspace and does the full reset
> > >    - userspace also needs to provide a dma-buf for each protected
> > > group for the suspend buffer
> > >
> > > rather than something in-between. The latter is more ad-hoc and basically
> > > kicks the issue to the userspace.  
> >
> > Indeed, the second option is more ad-hoc, but when you think about it,
> > userspace has to have this knowledge, because it needs to know the
> > dma-heap to use for buffer allocations that cross a device boundary
> > anyway. Think about frames produced by a video decoder, and composited
> > by the GPU into a protected scanout buffer that's passed to the KMS
> > device. Why would the GPU driver be the source of truth when it comes to
> > choosing the heap to use to allocate protected buffers for the video
> > decoder or those used for the display?  
> I don't think the GPU driver is ever the source of truth. If the
> system integrator wants to specify the source of truth (SoT) from
> kernel space, they should use the device tree (or module params /
> config options). If they want to specify the SoT in userspace, then we
> don't really care how it is done other than providing an ioctl.
> Panthor is always on the receiving end.

Okay, we're on the same page then.

> 
> If we don't want to delay this functionality, but it takes time to
> converge on SoT, maybe a solution that is not a long-term promise can
> work? Of the options on the table (dt, module params, kconfig options,
> ioctls), a kconfig option, potentially marked as experimental, seems
> like a good candidate.

If Panthor is only a consumer, I actually think it'd be easier to just
let userspace pass the protected FW section as an imported buffer
through an ioctl for now. It means we don't need any of the
modifications to the dma_heap API in this series, and userspace is free
to choose its SoT (efuse, DT, ...) and pass the info back to mesa/GBM
somehow (envvar, driconf, ...). The only thing we need to check is whether
lazy protected FW section allocation is going to work, but given the
current code purely and simply ignores those sections, and the FW is
still able to boot and act properly (at least on v10-v13), I'm pretty
confident this is okay, unless there's some trick the MCU can do to
detect that the protected section isn't mapped (which I doubt, because
the MCU doesn't know it lives behind an MMU).

Of course, once we have a consensus on how to describe this in the DT,
we can switch Panthor over to "protected dma_heap selection through DT",
and reflect that through the ioctl that exposes whether protected
support is ready or not (would be a DEV_QUERY), such that userspace can
skip this "PROTM initialization" step.

We're talking about an extra ioctl to set those buffers, and a
DEV_QUERY to query the state (ready or not), the size of the global
protected buffer (protected FW section) and the size of the protected
suspend buffer. The protected suspend buffer would be allocated and
passed at group creation time (extra arg passed to the existing
GROUP_CREATE ioctl). So, overall, I don't consider it a huge liability
in terms of maintenance cost.

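A rough sketch of the uAPI shape described above, purely to make the proposal concrete. Every struct, field and function name here is hypothetical (nothing like it exists in panthor today); it just mirrors the three pieces of information the DEV_QUERY would report and the single buffer handle the new ioctl would take.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical DEV_QUERY payload describing protected-mode setup. */
struct hyp_panthor_protm_info {
	uint32_t ready;             /* non-zero: kernel already set up PROTM (e.g. via DT) */
	uint32_t pad;               /* keeps the 64-bit fields naturally aligned */
	uint64_t fw_section_size;   /* size of the global protected FW section */
	uint64_t suspend_buf_size;  /* per-group protected suspend buffer size */
};

/* Hypothetical argument of the extra ioctl that hands over the FW section. */
struct hyp_panthor_protm_setup {
	uint32_t fw_section_handle; /* GEM handle of an imported protected dma-buf */
	uint32_t pad;
};

_Static_assert(sizeof(struct hyp_panthor_protm_info) == 24, "uAPI layout");
_Static_assert(sizeof(struct hyp_panthor_protm_setup) == 8, "uAPI layout");

/* Userspace only runs the "PROTM initialization" step when it is needed. */
static bool hyp_protm_setup_needed(const struct hyp_panthor_protm_info *info)
{
	return !info->ready;
}
```

The protected suspend buffer would then travel as an extra field of the existing GROUP_CREATE arguments, which this sketch leaves out.
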
> 
> >  
> > >
> > > For the former, expressing the relation in DT seems to be the best,
> > > but only if possible :-). Otherwise, a kconfig option (instead of
> > > module param) should be easier to work with.
> > >
> > > Looking at the userspace implementation, can we also have a panthor
> > > ioctl to return the heap to userspace?  
> >
> > Yes, it's something we can add, but again, I'm questioning the
> > usefulness of this: how can we ensure the heap used by panthor to
> > allocate its protected FW buffers is suitable for scanout buffers
> > (buffers that can be used by display drivers). There needs to be some glue
> > living in userland and making the decision, and I'm not too sure
> > trusting any of the components in the chain (vdec, gpu, display) is the
> > right thing to do.  
> The heap returned by panthor is only for panfrost/panvk. It says
> nothing about compatibility with other components on the system.

Okay, if it's used only for internal buffers, I guess that's fine.

* Re: [PATCH v4 16/30] KVM: x86: Restructure kvm_guest_time_update() for TSC upscaling
From: Dongli Zhang @ 2026-05-19  7:38 UTC (permalink / raw)
  To: David Woodhouse, kvm
  Cc: Paolo Bonzini, Jonathan Corbet, Shuah Khan, Sean Christopherson,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Vitaly Kuznetsov, Juergen Gross, Boris Ostrovsky,
	Paul Durrant, Jonathan Cameron, Sascha Bischoff, Marc Zyngier,
	Joey Gouly, Jack Allister, joe.jin, linux-doc, linux-kernel,
	xen-devel, linux-kselftest
In-Reply-To: <20260509224824.3264567-17-dwmw2@infradead.org>

I have encountered this build error with this patch.

Perhaps it is because all uses of "flags" were removed, leaving only its declaration.

$ make -j32 > /dev/null
arch/x86/kvm/x86.c: In function ‘kvm_guest_time_update’:
arch/x86/kvm/x86.c:3359:23: error: unused variable ‘flags’ [-Werror=unused-variable]
 3359 |         unsigned long flags;
      |                       ^~~~~
cc1: all warnings being treated as errors
make[4]: *** [scripts/Makefile.build:289: arch/x86/kvm/x86.o] Error 1
make[3]: *** [scripts/Makefile.build:548: arch/x86/kvm] Error 2
make[2]: *** [scripts/Makefile.build:548: arch/x86] Error 2
make[1]: *** [/home/opc/ext4/mainline-linux/Makefile:2143: .] Error 2
make: *** [Makefile:248: __sub-make] Error 2

Thank you very much!

Dongli Zhang

On 2026-05-09 3:46 PM, David Woodhouse wrote:
> From: David Woodhouse <dwmw@amazon.co.uk>
> 
> Restructure kvm_guest_time_update() so that kernel_ns/host_tsc are
> always "now" when doing TSC catchup, then swap in the master clock
> reference values afterward for the hv_clock.
> 
> This makes the TSC upscaling code considerably simpler: the catchup
> adjustment is computed as the delta between what the guest TSC *should*
> be at "now" and what it actually is, rather than mixing "now" and
> "master clock reference" timestamps.
> 
> The seqcount loop now also contains the kvm_get_time_and_clockread()
> call (matching get_kvmclock's pattern), with the same WARN for
> unexpected failure.
> 
> Based on a suggestion by Sean Christopherson.
> 
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> ---
>  arch/x86/kvm/x86.c | 67 ++++++++++++++++++++++++++++++++--------------
>  1 file changed, 47 insertions(+), 20 deletions(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index e281c49561fa..8e4993ef4f6b 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3363,39 +3363,51 @@ int kvm_guest_time_update(struct kvm_vcpu *v)
>  	struct kvm_arch *ka = &v->kvm->arch;
>  	s64 kernel_ns;
>  	u64 tsc_timestamp, host_tsc;
> +	u64 master_host_tsc = 0;
> +	s64 master_kernel_ns = 0;
>  	bool use_master_clock;
>  
> -	kernel_ns = 0;
> -	host_tsc = 0;
> -
>  	/*
>  	 * If the host uses TSC clock, then passthrough TSC as stable
>  	 * to the guest.
>  	 */
>  	do {
>  		seq = read_seqcount_begin(&ka->pvclock_sc);
> +
>  		use_master_clock = ka->use_master_clock;
> -		if (use_master_clock) {
> -			host_tsc = ka->master_cycle_now;
> -			kernel_ns = ka->master_kernel_ns;
> -		}
> +
> +		/*
> +		 * The TSC read and the call to get_cpu_tsc_khz() must happen
> +		 * on the same CPU.
> +		 */
> +		get_cpu();
> +
> +		tgt_tsc_hz = (u64)get_cpu_tsc_khz() * 1000;
> +
> +		if (use_master_clock &&
> +		    !kvm_get_time_and_clockread(&kernel_ns, &host_tsc) &&
> +		    WARN_ON_ONCE(!read_seqcount_retry(&ka->pvclock_sc, seq)))
> +			use_master_clock = false;
> +
> +		put_cpu();
> +
> +		if (!use_master_clock)
> +			break;
> +
> +		master_host_tsc = ka->master_cycle_now;
> +		master_kernel_ns = ka->master_kernel_ns;
>  	} while (read_seqcount_retry(&ka->pvclock_sc, seq));
>  
> -	/* Keep irq disabled to prevent changes to the clock */
> -	local_irq_save(flags);
> -	tgt_tsc_hz = (u64)get_cpu_tsc_khz() * 1000;
>  	if (unlikely(tgt_tsc_hz == 0)) {
> -		local_irq_restore(flags);
>  		kvm_make_request(KVM_REQ_CLOCK_UPDATE, v);
>  		return 1;
>  	}
> +
>  	if (!use_master_clock) {
>  		host_tsc = rdtsc();
>  		kernel_ns = get_kvmclock_base_ns();
>  	}
>  
> -	tsc_timestamp = kvm_read_l1_tsc(v, host_tsc);
> -
>  	/*
>  	 * We may have to catch up the TSC to match elapsed wall clock
>  	 * time for two reasons, even if kvmclock is used.
> @@ -3404,17 +3416,32 @@ int kvm_guest_time_update(struct kvm_vcpu *v)
>  	 *      entry to avoid unknown leaps of TSC even when running
>  	 *      again on the same CPU.  This may cause apparent elapsed
>  	 *      time to disappear, and the guest to stand still or run
> -	 *	very slowly.
> +	 *      very slowly.
>  	 */
>  	if (vcpu->tsc_catchup) {
> -		u64 tsc = compute_guest_tsc(v, kernel_ns);
> -		if (tsc > tsc_timestamp) {
> -			adjust_tsc_offset_guest(v, tsc - tsc_timestamp);
> -			tsc_timestamp = tsc;
> -		}
> +		s64 adjustment;
> +
> +		/*
> +		 * Calculate the delta between what the guest TSC *should* be
> +		 * and what it actually is according to kvm_read_l1_tsc().
> +		 */
> +		adjustment = compute_guest_tsc(v, kernel_ns) -
> +			     kvm_read_l1_tsc(v, host_tsc);
> +		if (adjustment > 0)
> +			adjust_tsc_offset_guest(v, adjustment);
>  	}
>  
> -	local_irq_restore(flags);
> +	/*
> +	 * Now that TSC upscaling is out of the way, the remaining calculations
> +	 * are all relative to the reference time that's placed in hv_clock.
> +	 * If the master clock is NOT in use, the reference time is "now".  If
> +	 * master clock is in use, the reference time comes from there.
> +	 */
> +	if (use_master_clock) {
> +		host_tsc = master_host_tsc;
> +		kernel_ns = master_kernel_ns;
> +	}
> +	tsc_timestamp = kvm_read_l1_tsc(v, host_tsc);
>  
>  	/* With all the info we got, fill in the values */
>  

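A simplified model of the catch-up arithmetic in the restructured code: compute where the guest TSC should be at "now" (from the last synchronized TSC value plus elapsed time at the guest frequency), compare it with the actual value, and only ever move the offset forward. The names are hypothetical and the kernel's mult/shift scaling is replaced by plain integer math; this is an illustration, not the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-vCPU state for the model. */
struct vcpu_tsc_model {
	uint64_t this_tsc_write;  /* guest TSC value at the last sync point */
	int64_t  this_tsc_nsec;   /* host ns at the last sync point */
	uint32_t guest_khz;       /* guest TSC frequency in kHz */
	int64_t  tsc_offset;      /* accumulated catch-up adjustment */
};

/* Where the guest TSC *should* be at kernel_ns (cf. compute_guest_tsc()). */
static uint64_t model_expected_tsc(const struct vcpu_tsc_model *v, int64_t kernel_ns)
{
	uint64_t elapsed_ns = (uint64_t)(kernel_ns - v->this_tsc_nsec);

	return v->this_tsc_write + elapsed_ns * v->guest_khz / 1000000;
}

/* Returns the adjustment applied; never moves the guest TSC backwards. */
static int64_t model_tsc_catchup(struct vcpu_tsc_model *v, int64_t kernel_ns,
				 uint64_t actual_tsc)
{
	int64_t adjustment = (int64_t)(model_expected_tsc(v, kernel_ns) - actual_tsc);

	if (adjustment <= 0)
		return 0;
	v->tsc_offset += adjustment;
	return adjustment;
}
```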

* Re: [PATCH v4 03/30] UAPI: x86: Move pvclock-abi to UAPI for x86 platforms
From: Dongli Zhang @ 2026-05-19  7:35 UTC (permalink / raw)
  To: David Woodhouse, kvm
  Cc: Paolo Bonzini, Jonathan Corbet, Shuah Khan, Thomas Gleixner,
	Sean Christopherson, Ingo Molnar, Borislav Petkov, Dave Hansen,
	x86, H. Peter Anvin, Vitaly Kuznetsov, Juergen Gross,
	Boris Ostrovsky, Paul Durrant, Jonathan Cameron, Sascha Bischoff,
	Marc Zyngier, Joey Gouly, Jack Allister, joe.jin, linux-doc,
	linux-kernel, xen-devel, linux-kselftest
In-Reply-To: <20260509224824.3264567-4-dwmw2@infradead.org>

I have encountered the build warning below.

Perhaps it is caused by PATCH 03?

In file included from ./include/linux/types.h:5,
                 from ./arch/x86/include/uapi/asm/pvclock-abi.h:5,
                 from ./arch/x86/include/asm/xen/interface.h:197,
                 from ./include/xen/interface/xen.h:13,
                 from <command-line>:
./include/uapi/linux/types.h:10:2: warning: #warning "Attempt to use kernel headers from user space, see https://kernelnewbies.org/KernelHeaders" [-Wcpp]
   10 | #warning "Attempt to use kernel headers from user space, see https://kernelnewbies.org/KernelHeaders"
      |  ^~~~~~~
In file included from ./include/linux/types.h:5,
                 from ./arch/x86/include/uapi/asm/pvclock-abi.h:5,
                 from ./arch/x86/include/asm/xen/interface.h:197,
                 from ./include/xen/interface/xen.h:13,
                 from ./include/xen/interface/xenpmu.h:5,
                 from <command-line>:
./include/uapi/linux/types.h:10:2: warning: #warning "Attempt to use kernel headers from user space, see https://kernelnewbies.org/KernelHeaders" [-Wcpp]
   10 | #warning "Attempt to use kernel headers from user space, see https://kernelnewbies.org/KernelHeaders"
      |  ^~~~~~~

Thank you very much!

Dongli Zhang

On 2026-05-09 3:46 PM, David Woodhouse wrote:
> From: Jack Allister <jalliste@amazon.com>
> 
> A subsequent commit will provide a new KVM interface for performing a
> fixup/correction of the KVM clock against the reference TSC. The
> KVM_[GS]ET_CLOCK_GUEST API requires a pvclock_vcpu_time_info, as such
> the caller must know about this definition.
> 
> Move the definition to the UAPI folder so that it is exported to
> usermode and also change the type definitions to use the standard for
> UAPI exports.
> 
> Signed-off-by: Jack Allister <jalliste@amazon.com>
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> Reviewed-by: Paul Durrant <paul@xen.org>
> ---
>  MAINTAINERS                                   |  4 +--
>  arch/x86/include/{ => uapi}/asm/pvclock-abi.h | 27 ++++++++++---------
>  2 files changed, 17 insertions(+), 14 deletions(-)
>  rename arch/x86/include/{ => uapi}/asm/pvclock-abi.h (82%)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index e0b307b2108c..e49676955c0c 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -14406,7 +14406,7 @@ S:	Supported
>  T:	git git://git.kernel.org/pub/scm/virt/kvm/kvm.git
>  F:	arch/um/include/asm/kvm_para.h
>  F:	arch/x86/include/asm/kvm_para.h
> -F:	arch/x86/include/asm/pvclock-abi.h
> +F:	arch/x86/include/uapi/asm/pvclock-abi.h
>  F:	arch/x86/include/uapi/asm/kvm_para.h
>  F:	arch/x86/kernel/kvm.c
>  F:	arch/x86/kernel/kvmclock.c
> @@ -29087,7 +29087,7 @@ R:	Boris Ostrovsky <boris.ostrovsky@oracle.com>
>  L:	xen-devel@lists.xenproject.org (moderated for non-subscribers)
>  S:	Supported
>  F:	arch/x86/configs/xen.config
> -F:	arch/x86/include/asm/pvclock-abi.h
> +F:	arch/x86/include/uapi/asm/pvclock-abi.h
>  F:	arch/x86/include/asm/xen/
>  F:	arch/x86/platform/pvh/
>  F:	arch/x86/xen/
> diff --git a/arch/x86/include/asm/pvclock-abi.h b/arch/x86/include/uapi/asm/pvclock-abi.h
> similarity index 82%
> rename from arch/x86/include/asm/pvclock-abi.h
> rename to arch/x86/include/uapi/asm/pvclock-abi.h
> index b9fece5fc96d..6d70cf640362 100644
> --- a/arch/x86/include/asm/pvclock-abi.h
> +++ b/arch/x86/include/uapi/asm/pvclock-abi.h
> @@ -1,6 +1,9 @@
> -/* SPDX-License-Identifier: GPL-2.0 */
> +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
>  #ifndef _ASM_X86_PVCLOCK_ABI_H
>  #define _ASM_X86_PVCLOCK_ABI_H
> +
> +#include <linux/types.h>
> +
>  #ifndef __ASSEMBLER__
>  
>  /*
> @@ -24,20 +27,20 @@
>   */
>  
>  struct pvclock_vcpu_time_info {
> -	u32   version;
> -	u32   pad0;
> -	u64   tsc_timestamp;
> -	u64   system_time;
> -	u32   tsc_to_system_mul;
> -	s8    tsc_shift;
> -	u8    flags;
> -	u8    pad[2];
> +	__u32   version;
> +	__u32   pad0;
> +	__u64   tsc_timestamp;
> +	__u64   system_time;
> +	__u32   tsc_to_system_mul;
> +	__s8    tsc_shift;
> +	__u8    flags;
> +	__u8    pad[2];
>  } __attribute__((__packed__)); /* 32 bytes */
>  
>  struct pvclock_wall_clock {
> -	u32   version;
> -	u32   sec;
> -	u32   nsec;
> +	__u32   version;
> +	__u32   sec;
> +	__u32   nsec;
>  } __attribute__((__packed__));
>  
>  #define PVCLOCK_TSC_STABLE_BIT	(1 << 0)

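With the struct exported, a user-space caller of KVM_[GS]ET_CLOCK_GUEST can decode it. Below is a minimal sketch of the usual pvclock conversion: system_time plus the TSC delta, pre-shifted by tsc_shift and then multiplied by the 32.32 fixed-point tsc_to_system_mul. It redeclares the layout with <stdint.h> types so it compiles without kernel headers; the struct and function names are local to this example, and it omits the odd/even version retry loop that a reader of a live, shared pvclock page needs.

```c
#include <assert.h>
#include <stdint.h>

/* Local mirror of struct pvclock_vcpu_time_info (32 bytes, packed). */
struct pvti_model {
	uint32_t version;
	uint32_t pad0;
	uint64_t tsc_timestamp;
	uint64_t system_time;
	uint32_t tsc_to_system_mul;
	int8_t   tsc_shift;
	uint8_t  flags;
	uint8_t  pad[2];
} __attribute__((__packed__));

_Static_assert(sizeof(struct pvti_model) == 32, "must match the ABI");

/* Scale a TSC delta to nanoseconds: shift first, then 32.32 multiply. */
static uint64_t pvclock_scale(uint64_t delta, uint32_t mul, int8_t shift)
{
	if (shift < 0)
		delta >>= -shift;
	else
		delta <<= shift;
	/* 128-bit product avoids overflow (GCC/Clang extension). */
	return (uint64_t)(((unsigned __int128)delta * mul) >> 32);
}

/* Time in ns at TSC value 'tsc', according to the given record. */
static uint64_t pvclock_ns(const struct pvti_model *ti, uint64_t tsc)
{
	return ti->system_time +
	       pvclock_scale(tsc - ti->tsc_timestamp,
			     ti->tsc_to_system_mul, ti->tsc_shift);
}
```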

* Re: [PATCH 2/6] mm/damon/sysfs: implement update_schemes_quota_goals command
From: Maksym Shcherba @ 2026-05-19  7:33 UTC (permalink / raw)
  To: SeongJae Park
  Cc: Maksym Shcherba, Maksym Shcherba, akpm, david, ljs, liam, vbabka,
	rppt, surenb, mhocko, corbet, skhan, damon, linux-mm,
	linux-kernel, linux-doc, linux-kselftest
In-Reply-To: <20260519001703.99264-1-sj@kernel.org>

On Mon, 18 May 2026 17:17:02 -0700 SeongJae Park <sj@kernel.org> wrote:

> On Mon, 18 May 2026 22:09:28 +0300 Maksym Shcherba <mshcherba2000@gmail.com> wrote:
> 
> > Add the logic to copy the current_value from the internal
> > damos_quota_goal structure to the damos_sysfs_quota_goal sysfs structure.
> > Introduce the DAMON_SYSFS_CMD_UPDATE_SCHEMES_QUOTA_GOALS command
> > and integrate it with the sysfs interface via the 'state' file.
> 
> Could you please further elaborate why you think this change is needed?  What
> is the expected use case and benefit?
>

Hi SJ,

The documentation (`Documentation/admin-guide/mm/damon/usage.rst`)
states that users can read the `current_value` file. However, the
kernel currently never updates this value in sysfs, preventing users
from reading the actual metrics.

This patch series implements the missing logic to align the code
with the documentation.

If the design intent was to intentionally keep `current_value`
internal and not expose it via sysfs, then the documentation is
incorrect. Let me know if that's the case, and I will send a v2
that drops the code changes and only fixes the documentation.
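
For illustration, a minimal sketch of how a user would consume the file once the kernel refreshes it. The sysfs path in the comment is an assumption based on the layout documented in usage.rst, and the string written to 'state' (taken from the patch subject) depends on the final version of this series; the helper itself just parses one unsigned integer from a file.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>

/*
 * Read one unsigned integer from a sysfs-style file, e.g. (assumed path)
 * /sys/kernel/mm/damon/admin/kdamonds/0/contexts/0/schemes/0/quotas/goals/0/current_value
 * after writing "update_schemes_quota_goals" to the kdamond's 'state' file.
 * Returns 0 on success, -1 on open or parse failure.
 */
static int read_sysfs_u64(const char *path, uint64_t *out)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = fscanf(f, "%" SCNu64, out) == 1;
	fclose(f);
	return ok ? 0 : -1;
}
```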

(Apologies for missing the cover letter where this should have
been explained, this is my first patch submission).

Thanks,
Maksym Shcherba

[...]

* Re: [PATCH RFC 2/5] dma-heap: charge dma-buf memory via explicit memcg
From: Christian König @ 2026-05-19  7:19 UTC (permalink / raw)
  To: T.J. Mercier
  Cc: Albert Esteve, Christian Brauner, Tejun Heo, Johannes Weiner,
	Michal Koutný, Jonathan Corbet, Shuah Khan, Sumit Semwal,
	Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song,
	Andrew Morton, Benjamin Gaignard, Brian Starkey, John Stultz,
	Paul Moore, James Morris, Serge E. Hallyn, Stephen Smalley,
	Ondrej Mosnacek, Shuah Khan, cgroups, linux-doc, linux-kernel,
	linux-media, dri-devel, linaro-mm-sig, linux-mm,
	linux-security-module, selinux, linux-kselftest, mripard,
	echanude
In-Reply-To: <CABdmKX3yZubjDKbVqwrjHAiKyj_ioHzOoxd0wzFbJK=PAGOqcQ@mail.gmail.com>

On 5/19/26 01:39, T.J. Mercier wrote:
> On Mon, May 18, 2026 at 7:07 AM Christian König
> <christian.koenig@amd.com> wrote:
>>
>> On 5/18/26 14:50, Albert Esteve wrote:
>>> On Mon, May 18, 2026 at 9:20 AM Christian König
>>> <christian.koenig@amd.com> wrote:
>>>>
>>>> On 5/15/26 19:06, T.J. Mercier wrote:
>>>>> On Fri, May 15, 2026 at 6:53 AM Christian Brauner <brauner@kernel.org> wrote:
>>>>>>
>>>>>> On Tue, May 12, 2026 at 11:10:44AM +0200, Albert Esteve wrote:
>>>>>>> On embedded platforms a central process often allocates dma-buf
>>>>>>> memory on behalf of client applications. Without a way to
>>>>>>> attribute the charge to the requesting client's cgroup, the
>>>>>>> cost lands on the allocator, making per-cgroup memory limits
>>>>>>> ineffective for the actual consumers.
>>>>>>>
>>>>>>> Add charge_pid_fd to struct dma_heap_allocation_data. When set to
>>>>>>
>>>>>> Please be aware that pidfds come in two flavors:
>>>>>>
>>>>>> thread-group pidfds and thread-specific pidfds. Make sure that your API
>>>>>> doesn't implicitly depend on this distinction not existing.
>>>>>
>>>>> Hi Christian,
>>>>>
>>>>> Memcg is not a controller that supports "thread mode" so all threads
>>>>> in a group should belong to the same memcg.
>>>>
>>>> BTW: Exactly that is the requirement automotive has with their native context use case.
>>>>
>>>> The use case is that you have a daemon which has multiple threads, where each one is acting on behalf of some other process.
>>>>
>>>> At the moment we basically say they are simply not using cgroups for that use case, but it would be really nice if we could handle that as well.
>>>>
>>>> Summarizing the requirement of that use case: You need a different cgroup for each thread of a process.
>>>
>>> Hi Christian,
>>>
>>> Thanks for sharing this automotive use case. If I understand correctly,
>>> the actual requirement is attributing dma-buf charges to the right
>>> client, not putting each daemon thread in a different cgroup?
>>
>> Nope, exactly that's the difference.
>>
>> The thread acts as a filtering agent for both memory allocation and command submission for somebody else; the process on whose behalf the daemon acts can even be in a client VM, completely remote over some network, or even something like a microcontroller.
>>
>> Everything the thread does regarding CPU time, GPU driver memory allocation as well as resources like GPU processing and I/O time etc. needs to be accounted to one client, which can be different for each thread of the process.
>>
>> The only thing which is shared with the main process thread is CPU memory resources, e.g. malloc() because that is basically just needed for housekeeping and pretty much irrelevant for this kind of use case.
>>
>> The problem is that you can't do that with cgroups at the moment, but unfortunately only the kernel has the information needed to do this.
>>
>> So what you end up with is to define tons of interfaces just to get the necessary information from the kernel into userspace and then essentially duplicate the same infrastructure cgroup provides in the kernel in userspace again.
>>
>>> If so,
>>> the `charge_pid_fd` approach achieves this directly by passing the
>>> client's `pid_fd`, without needing to add per-thread cgroup
>>> infrastructure.
>>
>> Well, it's already a massive improvement: we could basically stop doing the whole duplication part for the GPU driver stack and just use cgroups for this part.
>>
>> Doing that automatically for CPU and I/O time would just be nice to have additionally.
>>
>> Regards,
>> Christian.
> 
> Hopefully I'm following correctly here.... So you are duplicating the
> GPU driver stack to achieve remote accounting on a per-thread basis?

Not quite, we are duplicating the handling cgroup provides in the kernel in userspace.

For this, memory usage information as well as execution times of the GPU kernel driver are exposed in fdinfo, for example.

> Does this mean for GPU allocations you currently have some GFP_ACCOUNT
> magic in your driver to attribute GPU memory to the correct remote
> client?

No, we just expose what the kernel driver has allocated for itself, e.g. page tables, buffers, etc.

When userspace allocates something using memfd_create() for example we just ignore that. 

> So this series would close the gap for dma-buf allocations,
> but what about private GPU driver memory allocated on behalf of a
> client?

Well, we would need a cgroup which isn't associated with any process, against which we could charge the GPU driver allocations.

But good point, charging against a pid wouldn't work in this use case.

Regards,
Christian.

* Re: [PATCH v2 1/3] dt-bindings: iio: dac: Add AD5529R
From: Janani Sunil @ 2026-05-19  7:13 UTC (permalink / raw)
  To: David Lechner, Jonathan Cameron, Janani Sunil
  Cc: Lars-Peter Clausen, Michael Hennerich, Nuno Sá,
	Andy Shevchenko, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	Philipp Zabel, Jonathan Corbet, Shuah Khan, linux-iio, devicetree,
	linux-kernel, linux-doc, rodrigo.alencar
In-Reply-To: <53d547ee-1ac3-42b9-92a6-e7f48b72fee3@baylibre.com>


On 5/16/26 21:25, David Lechner wrote:
> On 5/8/26 7:48 AM, Jonathan Cameron wrote:
>> On Fri, 8 May 2026 13:55:47 +0200
>> Janani Sunil <janani.sunil@analog.com> wrote:
>>
>>> Devicetree bindings for AD5529R 16 channel 12/16 bit high voltage,
>>> buffered voltage output digital-to-analog converter (DAC) with an
>>> integrated precision reference.
>>>
>>> Signed-off-by: Janani Sunil <janani.sunil@analog.com>
>>> ---
> ...
>
>>> +  * Multiplexer for output voltage, load current sense and die temperature
>>> +
>>> +  Datasheet: https://www.analog.com/media/en/technical-documentation/data-sheets/ad5529r.pdf
>>> +
>>> +properties:
>>> +  compatible:
>>> +    const: adi,ad5529r
>>> +
>>> +  reg:
>>> +    maxItems: 1
>>> +
>>> +  spi-max-frequency:
>>> +    maximum: 50000000
>>> +
>>> +  reset-gpios:
>>> +    maxItems: 1
>>> +    description:
>>> +      GPIO connected to the RESET pin. Active low. When asserted low,
>>> +      performs a power-on reset and initializes the device to its default state.
>>> +
>>> +  vdd-supply:
>>> +    description: Digital power supply (typically 3.3V)
>>> +
>>> +  avdd-supply:
>>> +    description: Analog power supply (typically 5V)
>>> +
>>> +  hvdd-supply:
>>> +    description: High voltage positive supply (up to 40V for output range)
>>> +
>>> +  hvss-supply:
>>> +    description: High voltage negative supply (ground or negative voltage)
>> I don't mind doing it this way but in some similar cases where 0 is something that
>> can be considered the 'default' we've made the supply optional.  What was
>> your reasoning for requiring it in this case?
>>
>> dt-bindings should be as complete as we can make them - with that in mind...
>>
>> There are some more interesting corners on this device the binding doesn't
>> currently cover such as mux_out pin.  We'd normally do that by making the
>> driver potentially a client of an ADC
>>
>> Easier though is !alarm which smells like an interrupt.
>> !clear probably a gpio. TG0-3 also GPIOs.
> also optional vref-supply for external vs internal reference

I will add bindings for optional Vref supply in the next version.

Best Regards,
Janani Sunil


* Re: [PATCH v2 2/3] iio: dac: Add AD5529R DAC driver support
From: Janani Sunil @ 2026-05-19  7:11 UTC (permalink / raw)
  To: David Lechner, Janani Sunil, Lars-Peter Clausen,
	Michael Hennerich, Jonathan Cameron, Nuno Sá,
	Andy Shevchenko, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	Philipp Zabel, Jonathan Corbet, Shuah Khan
  Cc: linux-iio, devicetree, linux-kernel, linux-doc
In-Reply-To: <2f74e76e-b066-40ac-9cb4-c75137c9825d@baylibre.com>


On 5/16/26 21:35, David Lechner wrote:
> On 5/8/26 6:55 AM, Janani Sunil wrote:
>> Add support for AD5529R 16-channel, 12/16 bit Digital to Analog Converter
>>
> ...
>
>
>> +		.realbits = (bits),				\
>> +		.storagebits = 16,				\
>> +	},							\
>> +}
>> +static struct regmap *ad5529r_get_regmap(struct ad5529r_state *st, unsigned int reg)
>> +{
>> +	if (reg <= AD5529R_8BIT_REG_MAX)
>> +		return st->regmap_8bit;
>> +
>> +	return st->regmap_16bit;
>> +}
> Another way we have done this is make custom read/write functions for the
> regmap itself so that we don't have to have two regmaps.

The dual-regmap approach was chosen here because:

1) It leverages regmap's val_bits validation and endianness handling for the 16-bit registers, rather than
implementing them manually.

2) The two distinct register banks (8-bit and 16-bit) map naturally to separate regmap configs.

3) Each regmap has focused rd_table/wr_table ranges matching the hardware, rather than a complex unified table.

The routing overhead is just an address comparison, similar to what custom functions would need, but with automatic validation
and endianness handling.

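To make the trade-off concrete, here is a user-space model of the alternative suggested above: one pair of custom read/write callbacks (their signatures mirror the .reg_read/.reg_write hooks of struct regmap_config) that pick the access width from the register address. The 8-bit/16-bit boundary, the big-endian byte order and the fake register file are all hypothetical; the real AD5529R map and SPI framing come from the datasheet.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define HYP_8BIT_REG_MAX 0x3F   /* hypothetical bank boundary */
#define HYP_NUM_BYTES    0x200  /* size of the fake register file */

struct hyp_dev {
	uint8_t mem[HYP_NUM_BYTES]; /* stands in for the SPI device */
};

/* Same shape as regmap_config.reg_read. */
static int hyp_reg_read(void *context, unsigned int reg, unsigned int *val)
{
	struct hyp_dev *d = context;

	if (reg <= HYP_8BIT_REG_MAX) {
		*val = d->mem[reg];                         /* 8-bit bank */
		return 0;
	}
	if (reg + 1 >= HYP_NUM_BYTES)
		return -EINVAL;                             /* out of range */
	*val = (unsigned int)d->mem[reg] << 8 | d->mem[reg + 1]; /* 16-bit, big-endian */
	return 0;
}

/* Same shape as regmap_config.reg_write. */
static int hyp_reg_write(void *context, unsigned int reg, unsigned int val)
{
	struct hyp_dev *d = context;

	if (reg <= HYP_8BIT_REG_MAX) {
		if (val > 0xFF)
			return -EINVAL;                     /* manual val_bits check */
		d->mem[reg] = val;
		return 0;
	}
	if (reg + 1 >= HYP_NUM_BYTES || val > 0xFFFF)
		return -EINVAL;
	d->mem[reg] = val >> 8;                             /* manual byte order */
	d->mem[reg + 1] = val & 0xFF;
	return 0;
}
```

The manual width checks and byte swapping in this sketch are what the dual-regmap version gets from val_bits and val_format_endian for free; that extra code is the cost of collapsing to a single regmap.
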
Best Regards,
Janani Sunil


* Re: [Linaro-mm-sig] Re: [PATCH RFC 2/5] dma-heap: charge dma-buf memory via explicit memcg
From: Christian König @ 2026-05-19  7:09 UTC (permalink / raw)
  To: Barry Song
  Cc: T.J. Mercier, Albert Esteve, Tejun Heo, Johannes Weiner,
	Michal Koutný, Jonathan Corbet, Shuah Khan, Sumit Semwal,
	Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song,
	Andrew Morton, Benjamin Gaignard, Brian Starkey, John Stultz,
	Christian Brauner, Paul Moore, James Morris, Serge E. Hallyn,
	Stephen Smalley, Ondrej Mosnacek, Shuah Khan, cgroups, linux-doc,
	linux-kernel, linux-media, dri-, linaro-mm-sig, linux-mm,
	linux-security-module, selinux, linux-kselftest, mripard,
	echanude
In-Reply-To: <CAGsJ_4z121v4tK_3+j-hkD7HH0gH3w8tWD8nk0CwRhFE5T+4Og@mail.gmail.com>

On 5/19/26 01:00, Barry Song wrote:
> On Mon, May 18, 2026 at 3:34 PM Christian König
> <christian.koenig@amd.com> wrote:
>>
>> On 5/16/26 11:19, Barry Song wrote:
>>> On Thu, May 14, 2026 at 12:35 AM T.J. Mercier <tjmercier@google.com> wrote:
>>> [...]
>>>>>> I have a question about this part. Albert I guess you are interested
>>>>>> only in accounting dmabuf-heap allocations, or do you expect to add
>>>>>> __GFP_ACCOUNT or mem_cgroup_charge_dmabuf calls to other
>>>>>> non-dmabuf-heap exporters?
>>>>>
>>>>> We're scoping this to dma-buf heaps for now. CMA heaps and the dmem
>>>>> controller are on the radar for follow-up/parallel work (there will be
>>>>> dragons and will surely need discussion). For DRM and V4L2 the
>>>>> long-term intent is migration to heaps, which would make direct
>>>>> accounting on those paths unnecessary.
>>>>
>>>> Ah I see. GEM buffers exported to dmabufs are what I had in mind. I
>>>> guess this would only leave the odd non-DRM driver with the need to
>>>> add their own accounting calls, which I don't expect would be a big
>>>> problem.
>>>>
>>>
>>> sounds like we still have a long way to go to correctly account for
>>> various v4l2, drm, GEM, CMA, etc. In patch 1, the charging is done in
>>> dma_buf_export(), so I guess it covers all dma-buf types except
>>> dma_heap, but the problem is that it has no remote charging support at
>>> all?
>>
>> No, just the other way around
>>
>> DMA-buf heaps can be handled here because we know that it is pure system memory and nothing special, so memcg always applies.
>>
>> dma_buf_export(), on the other hand, handles tons of different use cases, ranging from buffers accounted to dmem, over special resources which aren't even memory, all the way to buffers which can migrate from dmem to memcg and back during their lifetime.
>>
> 
> Hi Christian,
> 
> Thanks very much for your explanation. So basically it seems that
> dma_buf_export() is not the proper place to charge, since it may end up
> mixing in non-system-memory accounting?

Yes, exactly that.
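A toy model of the distinction drawn above: an allocator that knows its memory is plain system memory (a dma-buf heap) can always charge memcg, whereas a single charge point in a generic export path sees buffers whose backing may not be system memory at all. The enum values and functions below are hypothetical and only encode the cases listed in this message; they are not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical backing types, mirroring the cases named in the thread. */
enum demo_backing {
	DEMO_SYSTEM_HEAP,  /* dma-buf heap: plain system memory */
	DEMO_DEVICE_MEM,   /* buffer accounted to dmem */
	DEMO_NON_MEMORY,   /* special resource that is not memory at all */
	DEMO_MIGRATABLE,   /* may move between dmem and system memory */
};

enum demo_controller { DEMO_MEMCG, DEMO_DMEM, DEMO_NONE, DEMO_UNKNOWN };

/* Which controller is known to be correct at allocation time? */
static enum demo_controller demo_charge_target(enum demo_backing b)
{
	switch (b) {
	case DEMO_SYSTEM_HEAP:
		return DEMO_MEMCG;    /* always system memory */
	case DEMO_DEVICE_MEM:
		return DEMO_DMEM;
	case DEMO_NON_MEMORY:
		return DEMO_NONE;     /* nothing to charge */
	case DEMO_MIGRATABLE:
	default:
		return DEMO_UNKNOWN;  /* depends on where it lives over time */
	}
}

/* One memcg charge point is only safe if every input maps to memcg. */
static bool demo_single_point_can_charge_memcg(const enum demo_backing *b,
					       int n)
{
	assert(b != NULL && n >= 0);
	for (int i = 0; i < n; i++)
		if (demo_charge_target(b[i]) != DEMO_MEMCG)
			return false;
	return true;
}
```

Under this model, the heap allocation path only ever sees the first case. The export path sees all four, which is why charging memcg there would mix in non-system-memory accounting.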

> My question is also about the global view for both heap and non-heap cases.
> After reading the discussion, I’ve tried to summarize it; please let me know
> if my understanding is correct.
> 
> For dma_heap, we have the ioctl DMA_HEAP_IOCTL_ALLOC, where users can pass a
> remote pidfd or similar information to indicate where the dma-buf should be
> charged, as in Albert's patchset.

Well that's the current proposal, but I think we need to come up with something more general.

> For non-dma_heap dma-bufs, we don’t have an obvious userspace entry point that
> triggers the allocation. So we likely need other approaches. We could either
> move more drivers over to dma-heap, or introduce something like
> DMA_BUF_IOCTL_XFER_CHARGE, as you are discussing, to let userspace explicitly
> declare a charge.

Yeah, but that's not only needed for DMA-buf; we need it for file descriptors returned by memfd_create() as well.

Regards,
Christian.

> Best Regards
> Barry



* Re: [PATCH v2 2/3] iio: dac: Add AD5529R DAC driver support
From: Janani Sunil @ 2026-05-19  7:07 UTC
  To: Jonathan Cameron, Janani Sunil
  Cc: Lars-Peter Clausen, Michael Hennerich, David Lechner,
	Nuno Sá, Andy Shevchenko, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley, Philipp Zabel, Jonathan Corbet, Shuah Khan,
	linux-iio, devicetree, linux-kernel, linux-doc
In-Reply-To: <20260508143017.28f86551@jic23-huawei>


On 5/8/26 15:30, Jonathan Cameron wrote:
> On Fri, 8 May 2026 13:55:48 +0200
> Janani Sunil <janani.sunil@analog.com> wrote:
>
>> Add support for AD5529R 16-channel, 12/16 bit Digital to Analog Converter
>>
>> Signed-off-by: Janani Sunil <janani.sunil@analog.com>
>> +/* Register Map */
>> +#define AD5529R_REG_INTERFACE_CONFIG_A		0x00
>> +#define AD5529R_REG_INTERFACE_CONFIG_B		0x01
>> +#define AD5529R_REG_DEVICE_CONFIG		0x02
>> +#define AD5529R_REG_CHIP_TYPE			0x03
>> +#define AD5529R_REG_PRODUCT_ID_L		0x04
>> +#define AD5529R_REG_PRODUCT_ID_H		0x05
>> +#define AD5529R_REG_CHIP_GRADE			0x06
>> +#define AD5529R_REG_SCRATCH_PAD			0x0A
>> +#define AD5529R_REG_SPI_REVISION		0x0B
>> +#define AD5529R_REG_VENDOR_L			0x0C
>> +#define AD5529R_REG_VENDOR_H			0x0D
>> +#define AD5529R_REG_STREAM_MODE			0x0E
>> +#define AD5529R_REG_TRANSFER_CONFIG		0x0F
>> +#define AD5529R_REG_INTERFACE_CONFIG_C		0x10
>> +#define AD5529R_REG_INTERFACE_STATUS_A		0x11
>> +
>> +/* Configuration registers */
>> +#define AD5529R_REG_MULTI_DAC_CH_SEL		(0x14 + 1)
> Feels like this would all be simpler if you used autoincrement rather than
> the default of autodecrement.  What breaks if you do that?
> Superficially feels like all the +1 would go away - though with the need
> for a byte swap?  Might be worth that pain for the simpler code.
> Should just be a regmap_config parameter.

Switching to auto-increment is feasible. I'll make that change and
eliminate all the +1 offsets.
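To make the "+1" and byte-swap trade-off concrete, here is a minimal sketch of how a two-byte streaming read assembles a 16-bit value. It assumes, for illustration only, that each 16-bit register stores its low byte at the even address N and its high byte at N + 1. Descending (auto-decrement) access must then start at N + 1 and returns the high byte first, while ascending (auto-increment) access starts at N and returns the low byte first. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed register layout: LSB at address n, MSB at address n + 1. */

/* Auto-decrement: the transfer starts at n + 1, so the MSB arrives first. */
static uint16_t demo_decode_descending(const uint8_t rx[2])
{
	return (uint16_t)((rx[0] << 8) | rx[1]);  /* big-endian on the wire */
}

/* Auto-increment: the transfer starts at n, so the LSB arrives first. */
static uint16_t demo_decode_ascending(const uint8_t rx[2])
{
	return (uint16_t)(rx[0] | (rx[1] << 8));  /* little-endian on the wire */
}

/* Start address for a multi-byte access to the register based at n. */
static unsigned int demo_start_addr(unsigned int n, int ascending)
{
	assert(n % 2 == 0);            /* 16-bit registers at even addresses */
	return ascending ? n : n + 1;  /* the "+1" disappears when ascending */
}
```

Under this assumed layout, switching to ascending access trades the address offset for a little-endian value. That byte order could plausibly be declared through the val_format_endian field of struct regmap_config rather than swapped by hand.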

>> +
>> +static const struct regmap_range ad5529r_8bit_readable_ranges[] = {
>> +	regmap_reg_range(AD5529R_REG_INTERFACE_CONFIG_A, AD5529R_REG_CHIP_GRADE),
>> +	regmap_reg_range(AD5529R_REG_SCRATCH_PAD, AD5529R_REG_VENDOR_H),
>> +	regmap_reg_range(AD5529R_REG_STREAM_MODE, AD5529R_REG_INTERFACE_STATUS_A),
>> +};
>> +
>> +static const struct regmap_range ad5529r_16bit_readable_ranges[] = {
> The tricky bit here is that you are saying it's a 16-bit regmap but then providing
> address ranges including the ones we shouldn't use. We need to hide those
> intermediate addresses.  Various things might work depending on the addresses.
> Can we hide the bottom bit of each address and then set it to the appropriate value
> under the hood, that is, divide addresses by 2?

I'll address this by using reg_stride = 2 in the 16-bit regmap configuration,
which automatically handles the address spacing and eliminates the need for manual
address range exclusion.
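regmap rejects register addresses that are not a multiple of the configured reg_stride, returning -EINVAL, which is what keeps the odd "intermediate" addresses out of reach. The userspace sketch below models that check for a stride of 2, alongside the reviewer's alternative of exposing halved addresses and rebuilding the hardware address underneath. The helper names are hypothetical, and the 0x14 address is taken from the quoted register map purely as an example.

```c
#include <assert.h>
#include <errno.h>

#define DEMO_REG_STRIDE 2u

/* Model of regmap's alignment check: reg must be a multiple of the stride. */
static int demo_check_stride(unsigned int reg)
{
	return (reg % DEMO_REG_STRIDE) ? -EINVAL : 0;
}

/*
 * Alternative from review: callers use an index (hardware address / 2),
 * so an odd hardware address cannot be named at all.
 */
static unsigned int demo_index_to_hw(unsigned int index)
{
	return index * 2u;
}

static unsigned int demo_hw_to_index(unsigned int hw)
{
	assert(hw % 2u == 0);  /* only even hardware addresses are valid */
	return hw / 2u;
}
```

Both approaches make the odd addresses unusable. The stride keeps the datasheet addresses visible in the driver, while halving hides them behind an index.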

>> +	int ret;
>> +
>> +	switch (mask) {
>> +	case IIO_CHAN_INFO_RAW:
>> +		reg_addr = AD5529R_REG_DAC_INPUT_A(chan->channel);
>> +		ret = regmap_read(st->regmap_16bit, reg_addr, &reg_val_h);
>> +		if (ret)
>> +			return ret;
>> +
>> +		*val = reg_val_h;
>> +
>> +		return IIO_VAL_INT;
>> +	case IIO_CHAN_INFO_SCALE:
>> +		/*
>> +		 * Using default 0-5V range: VOUTn = A × D/2^N + B
>> +		 * where A = 5V, B = 0V, D = digital code, N = resolution
>> +		 * Scale = 5000mV / 2^resolution
> See the comment on the dt-binding. I think we need support for
> DT-described output ranges from the start. This is a rare multi-range
> device where we could set a safe default, but to me it makes little sense
> and the driver will be doing something unexpected if a newer DT is
> provided with a different range.

I will add devicetree properties for per-channel output range configuration.
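The transfer function quoted in the driver comment, VOUTn = A × D/2^N + B, maps onto an IIO scale of A / 2^N (mV per LSB) plus an offset for B. The sketch below evaluates it in millivolts for a per-channel range of the kind the planned devicetree properties would supply. The struct, and the ±5 V example range used to exercise it, are assumptions for illustration, not values from the binding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-channel output range, e.g. parsed from devicetree. */
struct demo_range {
	int32_t span_mv;    /* A in the datasheet formula */
	int32_t offset_mv;  /* B in the datasheet formula */
};

/* VOUT = A * D / 2^N + B in millivolts, truncating toward zero. */
static int32_t demo_vout_mv(const struct demo_range *r, uint32_t code,
			    unsigned int resolution)
{
	assert(r != NULL && r->span_mv >= 0);
	assert(resolution <= 16 && code < (1u << resolution));

	return (int32_t)(((int64_t)r->span_mv * code) >> resolution) +
	       r->offset_mv;
}
```

In the driver, IIO_VAL_FRACTIONAL_LOG2 with *val set to the span and *val2 set to the resolution expresses the same A / 2^N scale without rounding. A non-zero B would presumably be reported through IIO_CHAN_INFO_OFFSET, expressed in raw code units.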

>> +
>> +static int ad5529r_probe(struct spi_device *spi)
>> +{
>> +	struct device *dev = &spi->dev;
>> +	struct iio_dev *indio_dev;
>> +	struct ad5529r_state *st;
>> +	int ret;
>> +
>> +	indio_dev = devm_iio_device_alloc(dev, sizeof(*st));
>> +	if (!indio_dev)
>> +		return -ENOMEM;
>> +
>> +	st = iio_priv(indio_dev);
>> +
>> +	st->spi = spi;
>> +
>> +	ret = devm_regulator_bulk_get_enable(dev, AD5529R_NUM_SUPPLIES,
>> +					     ad5529r_supply_names);
>> +	if (ret)
>> +		return dev_err_probe(dev, ret, "Failed to get and enable regulators\n");
>> +
>> +	st->regmap_8bit = devm_regmap_init_spi(spi, &ad5529r_regmap_8bit_config);
>> +	if (IS_ERR(st->regmap_8bit))
>> +		return dev_err_probe(dev, PTR_ERR(st->regmap_8bit),
>> +				     "Failed to initialize 8-bit regmap\n");
>> +
>> +	st->regmap_16bit = devm_regmap_init_spi(spi, &ad5529r_regmap_16bit_config);
>> +	if (IS_ERR(st->regmap_16bit))
>> +		return dev_err_probe(dev, PTR_ERR(st->regmap_16bit),
>> +				     "Failed to initialize 16-bit regmap\n");
>> +
>> +	ret = ad5529r_reset(st);
>> +	if (ret)
>> +		return dev_err_probe(dev, ret, "Failed to reset device\n");
>> +
>> +	ret = ad5529r_detect_device(st);
>> +	if (ret)
>> +		return dev_err_probe(dev, ret, "Failed to detect device variant\n");
> No to this. It breaks the use of fallback device tree compatibles.  As such we
> never fail on an ID mismatch. Instead we just believe firmware when it says
> whatever is there is compatible with this device. See below on why I think
> we need to break this into separate compatibles.

I'll create separate compatibles and remove the device ID detection logic.
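The pattern described in the review, trusting the firmware's compatible rather than failing on an ID mismatch, can be modelled in userspace as a match table whose entries carry per-variant data. The compatible strings, ID values, struct and function names below are made-up placeholders, not what the next revision will use. In the kernel, the table would be an of_device_id array whose .data is typically read back with spi_get_device_match_data().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-variant data carried by each compatible. */
struct demo_chip_info {
	const char *name;
	unsigned int resolution;  /* DAC bits */
	unsigned int product_id;  /* made-up expected ID register value */
};

static const struct demo_chip_info demo_12bit = { "demo-12bit", 12, 0xA1 };
static const struct demo_chip_info demo_16bit = { "demo-16bit", 16, 0xA2 };

/* Hypothetical compatible strings, one entry per variant. */
static const struct {
	const char *compatible;
	const struct demo_chip_info *info;
} demo_match_table[] = {
	{ "vendor,demo-dac-12", &demo_12bit },
	{ "vendor,demo-dac-16", &demo_16bit },
};

/* Chip data comes from firmware's compatible, never from the ID register. */
static const struct demo_chip_info *demo_match(const char *compatible)
{
	size_t n = sizeof(demo_match_table) / sizeof(demo_match_table[0]);

	for (size_t i = 0; i < n; i++)
		if (strcmp(demo_match_table[i].compatible, compatible) == 0)
			return demo_match_table[i].info;
	return NULL;
}

/* An ID mismatch is only reported; probing continues with firmware's choice. */
static int demo_check_id(const struct demo_chip_info *info, unsigned int id)
{
	assert(info != NULL);
	if (id != info->product_id)
		fprintf(stderr, "unexpected ID 0x%x for %s, continuing\n",
			id, info->name);
	return 0;
}
```

Because probe never returns an error on a mismatch, a future part listed with one of these as a fallback compatible still binds and runs with the fallback's data.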

Best Regards,
Janani Sunil



* Re: [PATCH v4 09/10] dt-bindings: firmware: add arm,ras-cper
From: Krzysztof Kozlowski @ 2026-05-19  7:04 UTC
  To: Ahmed Tiba, rafael, bp, saket.dumbre, will, xueshuai, mchehab,
	krzk+dt, dave, conor+dt, vishal.l.verma, jic23, corbet, guohanjun,
	dave.jiang, catalin.marinas, lenb, tony.luck, skhan, djbw,
	alison.schofield, ira.weiny, robh
  Cc: devicetree, linux-acpi, linux-doc, Dmitry.Lamerov, linux-cxl,
	Michael.Zhao2, acpica-devel, linux-kernel, linux-arm-kernel,
	linux-edac
In-Reply-To: <20260518-topics-ahmtib01-ras_ffh_arm_internal_review-v4-9-42698675ba61@arm.com>

On 18/05/2026 13:57, Ahmed Tiba wrote:
> Describe the DeviceTree node that exposes the Arm firmware-first
> CPER provider and hook the file into MAINTAINERS so the
> binding has an owner.
> 
> Signed-off-by: Ahmed Tiba <ahmed.tiba@arm.com>

Please address the previous comments.


> ---
>  .../devicetree/bindings/firmware/arm,ras-cper.yaml | 71 ++++++++++++++++++++++
>  MAINTAINERS                                        |  5 ++
>  2 files changed, 76 insertions(+)
> 
> diff --git a/Documentation/devicetree/bindings/firmware/arm,ras-cper.yaml b/Documentation/devicetree/bindings/firmware/arm,ras-cper.yaml
> new file mode 100644
> index 000000000000..81dc37390af5
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/firmware/arm,ras-cper.yaml
> @@ -0,0 +1,71 @@
> +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> +%YAML 1.2
> +---
> +$id: http://devicetree.org/schemas/firmware/arm,ras-cper.yaml#
> +$schema: http://devicetree.org/meta-schemas/core.yaml#
> +
> +title: Arm RAS CPER provider
> +
> +maintainers:
> +  - Ahmed Tiba <ahmed.tiba@arm.com>
> +
> +description:
> +  Arm Reliability, Availability and Serviceability (RAS) firmware can expose
> +  a firmware-first CPER error source directly via DeviceTree. Firmware
> +  provides the CPER Generic Error Status block and notifies the OS through
> +  an interrupt.
> +
> +properties:
> +  compatible:
> +    const: arm,ras-cper
> +
> +  memory-region:
> +    oneOf:
> +      - items:
> +          - description:
> +              CPER Generic Error Status block exposed by firmware
> +      - items:
> +          - description:
> +              CPER Generic Error Status block exposed by firmware.

Also, this is just a list with minItems. No need for oneOf.

Best regards,
Krzysztof


* Re: [PATCH v2 1/3] dt-bindings: iio: dac: Add AD5529R
From: Janani Sunil @ 2026-05-19  6:59 UTC
  To: Jonathan Cameron, Janani Sunil
  Cc: Lars-Peter Clausen, Michael Hennerich, David Lechner,
	Nuno Sá, Andy Shevchenko, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley, Philipp Zabel, Jonathan Corbet, Shuah Khan,
	linux-iio, devicetree, linux-kernel, linux-doc, rodrigo.alencar
In-Reply-To: <20260508140814.67800e4a@jic23-huawei>


On 5/8/26 15:08, Jonathan Cameron wrote:
> On Fri, 8 May 2026 13:48:43 +0100
> Jonathan Cameron <jic23@kernel.org> wrote:
>
>> On Fri, 8 May 2026 13:55:47 +0200
>> Janani Sunil <janani.sunil@analog.com> wrote:
>>
>>> Devicetree bindings for AD5529R 16 channel 12/16 bit high voltage,
>>> buffered voltage output digital-to-analog converter (DAC) with an
>>> integrated precision reference.
>>>
>>> Signed-off-by: Janani Sunil <janani.sunil@analog.com>
>>> ---
>>>   .../devicetree/bindings/iio/dac/adi,ad5529r.yaml   | 96 ++++++++++++++++++++++
>>>   MAINTAINERS                                        |  7 ++
>>>   2 files changed, 103 insertions(+)
>>>
>>> diff --git a/Documentation/devicetree/bindings/iio/dac/adi,ad5529r.yaml b/Documentation/devicetree/bindings/iio/dac/adi,ad5529r.yaml
>>> new file mode 100644
>>> index 000000000000..f531b4865b01
>>> --- /dev/null
>>> +++ b/Documentation/devicetree/bindings/iio/dac/adi,ad5529r.yaml
>>> @@ -0,0 +1,96 @@
>>> +# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause)
>>> +%YAML 1.2
>>> +---
>>> +$id: http://devicetree.org/schemas/iio/dac/adi,ad5529r.yaml#
>>> +$schema: http://devicetree.org/meta-schemas/core.yaml#
>>> +
>>> +title: Analog Devices AD5529R 16-Channel 12/16-bit High Voltage DAC
>> How is one device both 12 and 16-bit? That sometimes happens for
>> ADCs where it is really reflecting oversampling or for devices with hardware
>> FIFOs where storage space is saved by using a lower bit rate. I'm not sure either
>> applies here.
> Having read the driver I now understand. This is supporting two parts and
> doing device-ID-based detection.  In an unusual step for Analog, they have
> the same base part number with a suffix.  Whilst this approach works today,
> it fundamentally breaks fallback dt-compatibles being used in future (the
> driver fails for any non-matching WHOAMI value, as it needs them to look
> up device-specific data).  As such, I think you need to have separate
> compatibles for the 12 and 16 bit versions.

The AD5529R comes in two variants, 12-bit and 16-bit. They share the same register interface and pin configuration
but differ in DAC resolution. I will add separate compatibles for them.

Best Regards,
Janani Sunil



* [syzbot ci] Re: Introduce Per-CPU Work helpers (was QPW)
From: syzbot ci @ 2026-05-19  6:58 UTC
  To: akpm, axelrasmussen, baohua, bhe, boqun, bp, brauner, chrisl, cl,
	corbet, coxu, dapeng1.mi, david, dianders, ebiggers, elver,
	feng.tang, frederic, gary, hannes, hao.li, harry, jackmanb, jannh,
	kasong, kees, kuba, leobras.c, liam, linux-doc, linux-kernel,
	linux-mm, linux-rt-devel, lirongqing, ljs, longman, masahiroy,
	mhocko, mingo, mtosatti, nathan, nphamcs, nsc, ojeda,
	pasha.tatashin, paulmck, peterz, pfalcato, qi.zheng, rdunlap
  Cc: syzbot, syzkaller-bugs
In-Reply-To: <20260519012754.240804-1-leobras.c@gmail.com>

syzbot ci has tested the following series

[v4] Introduce Per-CPU Work helpers (was QPW)
https://lore.kernel.org/all/20260519012754.240804-1-leobras.c@gmail.com
* [PATCH v4 1/4] Introducing pw_lock() and per-cpu queue & flush work
* [PATCH v4 2/4] mm/swap: move bh draining into a separate workqueue
* [PATCH v4 3/4] swap: apply new pw_queue_on() interface
* [PATCH v4 4/4] slub: apply new pw_queue_on() interface

and found the following issue:
WARNING in __pcs_replace_empty_main

Full report is available here:
https://ci.syzbot.org/series/804f81bd-77b4-490e-bd57-6345ad2aa923

***

WARNING in __pcs_replace_empty_main

tree:      drm-next
URL:       https://gitlab.freedesktop.org/drm/kernel.git
base:      5200f5f493f79f14bbdc349e402a40dfb32f23c8
arch:      amd64
compiler:  Debian clang version 21.1.8 (++20251221033036+2078da43e25a-1~exp1~20251221153213.50), Debian LLD 21.1.8
config:    https://ci.syzbot.org/builds/3ea80958-13bd-49da-9c64-6deb788113f8/config

clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604462750000 ns
Zone ranges:
  DMA      [mem 0x0000000000001000-0x0000000000ffffff]
  DMA32    [mem 0x0000000001000000-0x00000000ffffffff]
  Normal   [mem 0x0000000100000000-0x000000023fffffff]
  Device   empty
Movable zone start for each node
Early memory node ranges
  node   0: [mem 0x0000000000001000-0x000000000009efff]
  node   0: [mem 0x0000000000100000-0x000000007ffdefff]
  node   0: [mem 0x0000000100000000-0x0000000160000fff]
  node   1: [mem 0x0000000160001000-0x000000023fffffff]
Initmem setup node 0 [mem 0x0000000000001000-0x0000000160000fff]
Initmem setup node 1 [mem 0x0000000160001000-0x000000023fffffff]
On node 0, zone DMA: 1 pages in unavailable ranges
On node 0, zone DMA: 97 pages in unavailable ranges
On node 0, zone Normal: 33 pages in unavailable ranges
setup_percpu: NR_CPUS:8 nr_cpumask_bits:2 nr_cpu_ids:2 nr_node_ids:2
percpu: Embedded 71 pages/cpu s250632 r8192 d31992 u2097152
kvm-guest: PV spinlocks disabled, no host support
Kernel command line: earlyprintk=serial net.ifnames=0 sysctl.kernel.hung_task_all_cpu_backtrace=1 ima_policy=tcb nf-conntrack-ftp.ports=20000 nf-conntrack-tftp.ports=20000 nf-conntrack-sip.ports=20000 nf-conntrack-irc.ports=20000 nf-conntrack-sane.ports=20000 binder.debug_mask=0 rcupdate.rcu_expedited=1 rcupdate.rcu_cpu_stall_cputime=1 no_hash_pointers page_owner=on sysctl.vm.nr_hugepages=4 sysctl.vm.nr_overcommit_hugepages=4 secretmem.enable=1 sysctl.max_rcu_stall_to_panic=1 msr.allow_writes=off coredump_filter=0xffff root=/dev/sda console=ttyS0 vsyscall=native numa=fake=2 kvm-intel.nested=1 spec_store_bypass_disable=prctl nopcid vivid.n_devs=64 vivid.multiplanar=1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2 netrom.nr_ndevs=32 rose.rose_ndevs=32 smp.csd_lock_timeout=100000 watchdog_thresh=55 workqueue.watchdog_thresh=140 sysctl.net.core.netdev_unregister_timeout_secs=140 dummy_hcd.num=32 max_loop=32 nbds_max=32 \
Kernel command line: comedi.comedi_num_legacy_minors=4 panic_on_warn=1 root=/dev/sda console=ttyS0 root=/dev/sda1
Unknown kernel command line parameters "nbds_max=32", will be passed to user space.
printk: log buffer data + meta data: 262144 + 917504 = 1179648 bytes
software IO TLB: area num 2.
Fallback order for Node 0: 0 1 
Fallback order for Node 1: 1 0 
Built 2 zonelists, mobility grouping on.  Total pages: 1834877
Policy zone: Normal
mem auto-init: stack:all(zero), heap alloc:on, heap free:off
stackdepot: allocating hash table via alloc_large_system_hash
stackdepot hash table entries: 1048576 (order: 12, 16777216 bytes, linear)
stackdepot: allocating space for 8192 stack pools via memblock
**********************************************************
**   NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE   **
**                                                      **
** This system shows unhashed kernel memory addresses   **
** via the console, logs, and other interfaces. This    **
** might reduce the security of your system.            **
**                                                      **
** If you see this message and you are not debugging    **
** the kernel, report this immediately to your system   **
** administrator!                                       **
**                                                      **
** Use hash_pointers=always to force this mode off      **
**                                                      **
**   NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE   **
**********************************************************
------------[ cut here ]------------
debug_locks && !(lock_is_held(&(&s->cpu_sheaves->lock)->dep_map) != 0)
WARNING: mm/slub.c:4601 at __pcs_replace_empty_main+0x51b/0x6e0, CPU#0: swapper/0
Modules linked in:
CPU: 0 UID: 0 PID: 0 Comm: swapper Not tainted syzkaller #0 PREEMPT(undef) 
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
RIP: 0010:__pcs_replace_empty_main+0x51b/0x6e0
Code: 48 85 f6 74 15 4c 89 ff 48 89 c6 e8 af 5e ff ff 4d 89 74 24 38 e9 36 fc ff ff 49 89 44 24 40 4d 89 74 24 38 e9 27 fc ff ff 90 <0f> 0b 90 83 7b 2c 00 0f 85 23 fb ff ff 48 8b 1b e8 20 cd 82 09 41
RSP: 0000:ffffffff8e607d58 EFLAGS: 00010046
RAX: 0000000000000000 RBX: ffffffff91bb8398 RCX: 0000000000000002
RDX: 0000000000000cc0 RSI: ffffffff8e21ec94 RDI: ffffffff8c28b160
RBP: 0000000000000cc0 R08: 0000000000005e00 R09: 00000000477ac845
R10: 0000000047d13f7f R11: 000000002fa01ecd R12: ffff88812103f308
R13: 0000000000000000 R14: ffffffff91bb8398 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff88818dc8a000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff88823ffff000 CR3: 000000000e74a000 CR4: 00000000000000b0
Call Trace:
 <TASK>
 kmem_cache_alloc_node_noprof+0x441/0x690
 do_kmem_cache_create+0x172/0x620
 create_boot_cache+0xbf/0x120
 kmem_cache_init+0x11a/0x1e0
 mm_core_init+0x7e/0xb0
 start_kernel+0x15a/0x3e0
 x86_64_start_reservations+0x24/0x30
 x86_64_start_kernel+0x143/0x1c0
 common_startup_64+0x13e/0x147
 </TASK>


***

If these findings have caused you to resend the series or submit a
separate fix, please add the following tag to your commit message:
  Tested-by: syzbot@syzkaller.appspotmail.com

---
This report is generated by a bot. It may contain errors.
syzbot ci engineers can be reached at syzkaller@googlegroups.com.

To test a patch for this bug, please reply with `#syz test`
(should be on a separate line).

The patch should be attached to the email.
Note: arguments like custom git repos and branches are not supported.
