All of lore.kernel.org

All of lore.kernel.org
 help / color / mirror / Atom feed

* Re: [PATCH v3 1/2] system/memory: Use qemu_ram_{copy, move}() in ram device region accessors
From: Michael S. Tsirkin @ 2026-06-25 11:07 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Gavin Shan, qemu-arm, qemu-devel, peterx, alex, richard.henderson,
	berrange, philmd, philmd, david, clg, pbonzini, phrdina, jugraham,
	liugang24219, dinghui, shan.gavin
In-Reply-To: <CAFEAcA-mf14=jpRFHarAbN6AbQwvW=Yu6WB1JiNMsECEdaxD3A@mail.gmail.com>

On Thu, Jun 25, 2026 at 11:09:26AM +0100, Peter Maydell wrote:
> On Tue, 16 Jun 2026 at 06:26, Gavin Shan <gshan@redhat.com> wrote:
> >
> > All ram device regions were turned to be indirectly accessible by commit
> > 4a2e242bbb ("memory: Don't use memcpy for ram_device regions"). This leads
> > to guest hang on attempt to build 'cuda-samples' as reported by Julia. The
> > guest is started by the following command lines, with GH100 GPU card passed
> > from the host.
> >
> >    host$ lspci | grep GH100
> >    0009:01:00.0 3D controller: NVIDIA Corporation GH100 [GH200 120GB / 480GB] (rev a1)
> >    host$ /home/sandbox/gavin/qemu.main/build/qemu-system-aarch64            \
> >          -machine virt,gic-version=host,ras=on,highmem-mmio-size=4T         \
> >          -accel kvm -cpu host -smp cpus=48 -m size=8G                       \
> >          -drive file=/home/gavin/sandbox/images/disk.qcow2,if=none,id=d0    \
> >          -device virtio-blk-pci,id=vb0,bus=pcie.0,drive=d0,num-queues=4     \
> >          -device vfio-pci-nohotplug,host=0009:01:00.0,bus=pcie.1.0
> >            :
> >    guest$ cd cuda-samples/build
> >    guest$ make -j 20 clean
> >    guest$ make -j 20
> >            :
> >    [ 54%] Linking CUDA executable graphMemoryNodes
> >    [ 54%] Built target graphMemoryNodes
> >    <no more output afterwards, guest becomes frozen here>
> >
> >    guest$ qemu-system-aarch64: virtio: bogus descriptor or out of resources
> >    [  555.814025] virtio_blk virtio0: [vda] new size: 268435456 512-byte logical blocks (137 GB/128 GiB)
> >
> > When the GPU's driver (NVidia open driver) is loaded on guest bootup,
> > the memory blocks residing in the PCI BAR#4 of the GH100 GPU card can
> > be presented to the guest through memory hot-add. The page cache can
> > then be allocated from the hot added memory blocks when cuda-samples
> > is being built. Afterwards, the page cache is sent to QEMU's virtio-blk
> > device as part of the DMA request, the bounce buffer has to be used to
> > accomodate the request as the corresponding memory region (MemoryRegion)
> > is an indirectly accessible ram device region in qemu. However, the max
> > bounce bufer size is only 4096 bytes by default and that is exhausted
> > quickly, leading to a reset on the virtio-blk device and frozen guest
> > eventually.
> >
> >   QEMU
> >   ====
> >   virtio_blk_handle_output
> >     virtio_blk_handle_vq
> >       virtio_blk_get_request
> >         virtqueue_pop
> >           virtqueue_split_pop
> >             virtqueue_map_desc
> >               address_space_map
> >                 memory_access_is_direct         # Return false
> >                   memory_region_supports_direct_access
> >
> >   (qemu) info mtree
> >   memory-region: pci_bridge_pci
> >     0000000000000000-ffffffffffffffff (prio 0, container): pci_bridge_pci
> >       0000042000000000-0000043fffffffff (prio 1, i/o): 0009:01:00.0 base BAR 4
> >         0000042000000000-0000043fffffffff (prio 0, i/o): 0009:01:00.0 BAR 4
> >           0000042000000000-000004379fffffff (prio 0, ramd): 0009:01:00.0 BAR 4 mmaps[0]
> >
> > This adds qemu_ram_{copy, move}() and replaces {memcpy, memmove}() with
> > them in the ram device memory region accessors, similar to what's done
> > in commit 4a2e242bbb so that the issue (MMIO access instructions were
> > optimized to SSE instructions) covered by that commit is fixed. This
> > makes 'ram_device_mem_ops' redundant, paving the way to revert that
> > commit to make ram device region directly accessible again in the next
> > patch.
> >
> > Reported-by: Julia Graham <jugraham@redhat.com>
> > Suggested-by: Michael S. Tsirkin <mst@redhat.com>
> > Suggested-by: Peter Xu <peterx@redhat.com>
> > Suggested-by: Richard Henderson <richard.henderson@linaro.org>
> > Suggested-by: Peter Maydell <peter.maydell@linaro.org>
> > Signed-off-by: Gavin Shan <gshan@redhat.com>
> > ---
> > v3: Documentation for qemu_ram_{copy, move}           (Peter/Michael)
> >     Support qemu_ram_move() for overlapped src/dest   (Richard)
> >     Use {memcpy, memmove} if step is 16-bytes or more (Michael)
> >     Code improvements                                 (Richard/Michael)
> > ---
> >  hw/remote/vfio-user-obj.c |   4 +-
> >  include/system/memory.h   |  32 ++++++-
> >  system/physmem.c          | 178 +++++++++++++++++++++++++++++++++++++-
> >  3 files changed, 207 insertions(+), 7 deletions(-)
> >
> > diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
> > index 87fa7b6572..97a6c88780 100644
> > --- a/hw/remote/vfio-user-obj.c
> > +++ b/hw/remote/vfio-user-obj.c
> > @@ -375,9 +375,9 @@ static int vfu_object_mr_rw(MemoryRegion *mr, uint8_t *buf, hwaddr offset,
> >          ram_ptr = memory_region_get_ram_ptr(mr);
> >
> >          if (is_write) {
> > -            memcpy((ram_ptr + offset), buf, size);
> > +            qemu_ram_copy(ram_ptr + offset, buf, size);
> >          } else {
> > -            memcpy(buf, (ram_ptr + offset), size);
> > +            qemu_ram_copy(buf, ram_ptr + offset, size);
> >          }
> >
> >          return 0;
> > diff --git a/include/system/memory.h b/include/system/memory.h
> > index 1417132f6d..84203c312d 100644
> > --- a/include/system/memory.h
> > +++ b/include/system/memory.h
> > @@ -2897,6 +2897,36 @@ void address_space_register_map_client(AddressSpace *as, QEMUBH *bh);
> >  void address_space_unregister_map_client(AddressSpace *as, QEMUBH *bh);
> >
> >  /* Internal functions, part of the implementation of address_space_read.  */
> > +
> > +/**
> > + * qemu_ram_copy: copy data to ramblock
> > + *
> > + * @dst: destination where the data is copied to
> > + * @src: source where the data is copied from
> > + * @n: length of data to be copied
> > + *
> > + *
> > + * Copy @n bytes from @src to @dst with the assumption that @src and @dst
> > + * do not overlap. Handles special cases such as uncacheable ramblocks
> > + * correctly. Use this for accessing ramblock in response to DMA/VCPU IO,
> > + * in preference to memcpy().
> > + */
> 
> The documentation for these functions needs to say what semantics
> the function is providing (e.g. can I rely on it to do a 4 byte aligned
> load as a single 4 byte read?). This does not.
> 
> > +/* x86 should work with __builtin_{memcpy, memmove}() for IO access */
> > +#if defined(__i386__) || defined(__x86_64__)
> > +#define HOST_UNALIGNED_MMIO_OK 1
> > +#else
> > +#define HOST_UNALIGNED_MMIO_OK 0
> > +#endif
> 
> This is still wrong. We should not have "x86 magically works
> and all other hosts do something different" ifdefs. Define what
> semantics you need and then we can figure out how to
> implement them.
> 
> My current thought is that we need to handle accesses of
> 1, 2, 4 and 8 bytes that are naturally aligned by ensuring that we
> do exactly one host load/store of that type, and that anything else
> is "the guest isn't relying on specific semantics here, we can just
> assume it's plain old RAM and do whatever". That would not
> require any architecture specific ifdefs.
> 
> thanks
> -- PMM


Well. X86 is special, as usual. It allows unaligned mmio so
we really have no way to know an x86 guest does not intend
just that. That can only be emulated perfectly on x86
which is sad but I see no reason to actively break it.



^ permalink raw reply

* Re: [PATCH] i2c: qcom-cci: drop custom suspend/resume and rely on runtime PM helpers
From: Vladimir Zapolskiy @ 2026-06-25 11:07 UTC (permalink / raw)
  To: Wenmeng Liu, Loic Poulain, Robert Foss, Andi Shyti,
	Bjorn Andersson, Wolfram Sang, Todor Tomov, Vinod Koul
  Cc: linux-i2c, linux-arm-msm, linux-kernel, stable
In-Reply-To: <20260625-cci-v1-1-a100cda673ce@oss.qualcomm.com>

On 6/25/26 12:42, Wenmeng Liu wrote:
> cci_resume() unconditionally calls cci_resume_runtime() regardless of
> the runtime PM state.
> 
> If the device is already runtime-suspended before system suspend,
> the clock is re-enabled while runtime_status remains RPM_SUSPENDED.
> As a result, pm_request_autosuspend() does not arm the timer,
> leaving the clock permanently enabled.
> 
> Fixes: e517526195de ("i2c: Add Qualcomm CCI I2C driver")
> Cc: stable@vger.kernel.org
> Signed-off-by: Wenmeng Liu <wenmeng.liu@oss.qualcomm.com>
> ---
>   drivers/i2c/busses/i2c-qcom-cci.c | 18 +-----------------
>   1 file changed, 1 insertion(+), 17 deletions(-)
> 
> diff --git a/drivers/i2c/busses/i2c-qcom-cci.c b/drivers/i2c/busses/i2c-qcom-cci.c
> index 4d64895a9e9e4e0bd5e0ccb5c3cc04b282b1e4d5..bdeda3979c4814b5cdb463734b8361da7fffa879 100644
> --- a/drivers/i2c/busses/i2c-qcom-cci.c
> +++ b/drivers/i2c/busses/i2c-qcom-cci.c
> @@ -492,24 +492,8 @@ static int __maybe_unused cci_resume_runtime(struct device *dev)
>   	return 0;
>   }
>   
> -static int __maybe_unused cci_suspend(struct device *dev)
> -{
> -	if (!pm_runtime_suspended(dev))
> -		return cci_suspend_runtime(dev);
> -
> -	return 0;
> -}
> -
> -static int __maybe_unused cci_resume(struct device *dev)
> -{
> -	cci_resume_runtime(dev);
> -	pm_request_autosuspend(dev);
> -
> -	return 0;
> -}
> -
>   static const struct dev_pm_ops qcom_cci_pm = {
> -	SET_SYSTEM_SLEEP_PM_OPS(cci_suspend, cci_resume)
> +	SET_SYSTEM_SLEEP_PM_OPS(pm_runtime_force_suspend, pm_runtime_force_resume)
>   	SET_RUNTIME_PM_OPS(cci_suspend_runtime, cci_resume_runtime, NULL)
>   };
>   

Reviewed-by: Vladimir Zapolskiy <vladimir.zapolskiy@linaro.org>

-- 
Best wishes,
Vladimir

^ permalink raw reply

* Re: [RFC PATCH 3/3] arm64: dts: mt8516/mt8167: Update pinctrl nodes for the new paris driver
From: sashiko-bot @ 2026-06-25 11:07 UTC (permalink / raw)
  To: Luca Leonardo Scorcia; +Cc: conor+dt, devicetree, robh
In-Reply-To: <20260625104742.113803-4-l.scorcia@gmail.com>

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [Medium] The `gpio-ranges` property incorrectly specifies a count of 124, which excludes the final pin (GPIO 124) from being mapped to the GPIO subsystem, as the drivers define exactly 125 pins.
--

commit 36a8d0bd5d2cccdc70b499936029deb0499393de
Author: Luca Leonardo Scorcia <l.scorcia@gmail.com>

arm64: dts: mt8516/mt8167: Update pinctrl nodes for the new paris driver

This patch updates the MediaTek mt8516 and mt8167 SoC descriptions to
respect the constraints of the Paris pinctrl driver. It sets separate base
and eint registers and removes the deprecated syscfg_pctl syscon node.

> diff --git a/arch/arm64/boot/dts/mediatek/mt8516.dtsi b/arch/arm64/boot/dts/mediatek/mt8516.dtsi
> index b5e753759465d..63f36df4d1b48 100644
> --- a/arch/arm64/boot/dts/mediatek/mt8516.dtsi
> +++ b/arch/arm64/boot/dts/mediatek/mt8516.dtsi

[ ... ]

> @@ -231,17 +231,13 @@ keypad: keypad@10002000 {
>  			status = "disabled";
>  		};
>  
> -		syscfg_pctl: syscfg-pctl@10005000 {
> -			compatible = "syscon";
> -			reg = <0 0x10005000 0 0x1000>;
> -		};
> -
> -		pio: pinctrl@1000b000 {
> +		pio: pinctrl@10005000 {
>  			compatible = "mediatek,mt8516-pinctrl";
> -			reg = <0 0x1000b000 0 0x1000>;
> -			mediatek,pctl-regmap = <&syscfg_pctl>;
> +			reg = <0 0x10005000 0 0x1000>, <0 0x1000b000 0 0x1000>;
> +			reg-names = "base", "eint";
>  			gpio-controller;
>  			#gpio-cells = <2>;
> +			gpio-ranges = <&pio 0 0 124>;

[Severity: Medium]
Does this count parameter accidentally exclude the final pin (GPIO 124) from
being mapped to the GPIO subsystem?

Looking at the driver definitions for the MT8516 and MT8167 platforms, there
are exactly 125 pins defined. A count of 124 will prevent the mapping of the
final pin, causing any driver or application attempting to request GPIO 124 to
fail. Since mt8167.dtsi includes mt8516.dtsi, this affects both SoCs.

Should this count parameter be 125 instead?

>  			interrupt-controller;
>  			#interrupt-cells = <2>;
>  			interrupts = <GIC_SPI 134 IRQ_TYPE_LEVEL_HIGH>;

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260625104742.113803-1-l.scorcia@gmail.com?part=3

^ permalink raw reply

* [PATCH 2/2] iio: adc: Add Nuvoton MA35D1 EADC driver
From: Chi-Wen Weng @ 2026-06-25 11:06 UTC (permalink / raw)
  To: jic23, robh, krzk+dt, conor+dt
  Cc: dlechner, nuno.sa, andy, linux-arm-kernel, linux-iio, devicetree,
	linux-kernel, cwweng, cwweng.linux
In-Reply-To: <20260625110638.38438-1-cwweng.linux@gmail.com>

From: Chi-Wen Weng <cwweng@nuvoton.com>

Add an IIO driver for the Nuvoton MA35D1 Enhanced ADC controller.

The driver supports direct raw reads and triggered buffered capture. The
controller end-of-conversion interrupt is exposed as the device trigger
and is used to push samples into the IIO buffer.

Channels are described by firmware child nodes and can be configured as
single-ended or differential inputs. Since the differential enable bit is
global, mixed single-ended and differential buffered scans are rejected.

DMA support is intentionally not included in this initial upstream driver;
conversions are handled through the interrupt-driven path.

Signed-off-by: Chi-Wen Weng <cwweng@nuvoton.com>
---
 drivers/iio/adc/Kconfig       |  10 +
 drivers/iio/adc/Makefile      |   1 +
 drivers/iio/adc/ma35d1_eadc.c | 636 ++++++++++++++++++++++++++++++++++
 3 files changed, 647 insertions(+)
 create mode 100644 drivers/iio/adc/ma35d1_eadc.c

diff --git a/drivers/iio/adc/Kconfig b/drivers/iio/adc/Kconfig
index 1c663c98c6c9..43409999a94b 100644
--- a/drivers/iio/adc/Kconfig
+++ b/drivers/iio/adc/Kconfig
@@ -981,6 +981,16 @@ config LTC2497
 	  To compile this driver as a module, choose M here: the module will be
 	  called ltc2497.
 
+config MA35D1_EADC
+	tristate "MA35D1 EADC driver"
+	select IIO_BUFFER
+	select IIO_TRIGGERED_BUFFER
+	help
+	  Say yes here to build support for MA35D1 EADC.
+
+	  To compile this driver as a module, choose M here: the module will be
+	  called ma35d1.
+
 config MAX1027
 	tristate "Maxim max1027 ADC driver"
 	depends on SPI
diff --git a/drivers/iio/adc/Makefile b/drivers/iio/adc/Makefile
index 707dd708912f..7b9b38688223 100644
--- a/drivers/iio/adc/Makefile
+++ b/drivers/iio/adc/Makefile
@@ -85,6 +85,7 @@ obj-$(CONFIG_LTC2471) += ltc2471.o
 obj-$(CONFIG_LTC2485) += ltc2485.o
 obj-$(CONFIG_LTC2496) += ltc2496.o ltc2497-core.o
 obj-$(CONFIG_LTC2497) += ltc2497.o ltc2497-core.o
+obj-$(CONFIG_MA35D1_EADC) += ma35d1_eadc.o
 obj-$(CONFIG_MAX1027) += max1027.o
 obj-$(CONFIG_MAX11100) += max11100.o
 obj-$(CONFIG_MAX1118) += max1118.o
diff --git a/drivers/iio/adc/ma35d1_eadc.c b/drivers/iio/adc/ma35d1_eadc.c
new file mode 100644
index 000000000000..0c075126e139
--- /dev/null
+++ b/drivers/iio/adc/ma35d1_eadc.c
@@ -0,0 +1,636 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Nuvoton MA35D1 EADC driver
+ *
+ * Copyright (c) 2026 Nuvoton Technology Corp.
+ */
+
+#include <linux/bitfield.h>
+#include <linux/bits.h>
+#include <linux/bitmap.h>
+#include <linux/clk.h>
+#include <linux/completion.h>
+#include <linux/device.h>
+#include <linux/err.h>
+#include <linux/interrupt.h>
+#include <linux/io.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/mod_devicetable.h>
+#include <linux/mutex.h>
+#include <linux/platform_device.h>
+#include <linux/pm.h>
+#include <linux/property.h>
+
+#include <linux/iio/buffer.h>
+#include <linux/iio/iio.h>
+#include <linux/iio/trigger.h>
+#include <linux/iio/trigger_consumer.h>
+#include <linux/iio/triggered_buffer.h>
+
+#define MA35D1_EADC_DAT(n)		(0x00 + (n) * 0x04)
+#define MA35D1_EADC_CTL			0x50
+#define MA35D1_EADC_SWTRG		0x54
+#define MA35D1_EADC_SCTL(n)		(0x80 + (n) * 0x04)
+#define MA35D1_EADC_INTSRC0		0xd0
+#define MA35D1_EADC_STATUS2		0xf8
+#define MA35D1_EADC_SELSMP0		0x140
+#define MA35D1_EADC_REFADJCTL		0x150
+
+#define MA35D1_EADC_CTL_ADCEN		BIT(0)
+#define MA35D1_EADC_CTL_ADCIEN0		BIT(2)
+#define MA35D1_EADC_CTL_DIFFEN		BIT(8)
+
+#define MA35D1_EADC_SCTL_CHSEL_MASK	GENMASK(3, 0)
+#define MA35D1_EADC_SCTL_TRGDLY_MASK	GENMASK(15, 8)
+#define MA35D1_EADC_SCTL_TRGSEL_MASK	GENMASK(21, 16)
+#define MA35D1_EADC_SCTL_TRGSEL_ADINT0	\
+	FIELD_PREP(MA35D1_EADC_SCTL_TRGSEL_MASK, 2)
+
+#define MA35D1_EADC_DAT_MASK		GENMASK(11, 0)
+#define MA35D1_EADC_STATUS2_ADIF0	BIT(0)
+#define MA35D1_EADC_INTSRC0_ADINT0	BIT(0)
+#define MA35D1_EADC_REFADJCTL_EXT_VREF	BIT(0)
+
+#define MA35D1_EADC_MAX_CHANNELS	9
+#define MA35D1_EADC_MAX_SAMPLE_MODULES	16
+#define MA35D1_EADC_CHAN_NAME_LEN	16
+#define MA35D1_EADC_TIMEOUT		msecs_to_jiffies(1000)
+
+struct ma35d1_adc {
+	struct device *dev;
+	void __iomem *regs;
+	struct clk *clk;
+	struct completion completion;
+	/* Protects direct conversions against concurrent register access. */
+	struct mutex lock;
+	struct iio_trigger *trig;
+	unsigned int scan_chancnt;
+	bool scan_differential;
+	char chan_name[MA35D1_EADC_MAX_CHANNELS][MA35D1_EADC_CHAN_NAME_LEN];
+	struct {
+		u16 channels[MA35D1_EADC_MAX_SAMPLE_MODULES];
+		aligned_s64 timestamp;
+	} scan;
+};
+
+static inline u32 ma35d1_adc_read(struct ma35d1_adc *adc, u32 reg)
+{
+	return readl(adc->regs + reg);
+}
+
+static inline void ma35d1_adc_write(struct ma35d1_adc *adc, u32 reg, u32 val)
+{
+	writel(val, adc->regs + reg);
+}
+
+static void ma35d1_adc_rmw(struct ma35d1_adc *adc, u32 reg, u32 mask, u32 val)
+{
+	u32 tmp;
+
+	tmp = ma35d1_adc_read(adc, reg);
+	tmp &= ~mask;
+	tmp |= val;
+	ma35d1_adc_write(adc, reg, tmp);
+}
+
+static void ma35d1_adc_set_diff(struct ma35d1_adc *adc, bool differential)
+{
+	ma35d1_adc_rmw(adc, MA35D1_EADC_CTL, MA35D1_EADC_CTL_DIFFEN,
+		       differential ? MA35D1_EADC_CTL_DIFFEN : 0);
+}
+
+static void ma35d1_adc_config_sample(struct ma35d1_adc *adc,
+				     unsigned int sample, unsigned int channel)
+{
+	u32 reg = MA35D1_EADC_SCTL(sample);
+
+	ma35d1_adc_rmw(adc, reg,
+		       MA35D1_EADC_SCTL_CHSEL_MASK |
+		       MA35D1_EADC_SCTL_TRGSEL_MASK,
+		       FIELD_PREP(MA35D1_EADC_SCTL_CHSEL_MASK, channel) |
+		       MA35D1_EADC_SCTL_TRGSEL_ADINT0);
+}
+
+static void ma35d1_adc_disable_irq(struct ma35d1_adc *adc)
+{
+	ma35d1_adc_rmw(adc, MA35D1_EADC_CTL, MA35D1_EADC_CTL_ADCIEN0, 0);
+}
+
+static void ma35d1_adc_hw_init(struct ma35d1_adc *adc)
+{
+	ma35d1_adc_disable_irq(adc);
+	ma35d1_adc_rmw(adc, MA35D1_EADC_CTL,
+		       MA35D1_EADC_CTL_ADCEN, MA35D1_EADC_CTL_ADCEN);
+	ma35d1_adc_write(adc, MA35D1_EADC_STATUS2, MA35D1_EADC_STATUS2_ADIF0);
+	ma35d1_adc_rmw(adc, MA35D1_EADC_INTSRC0,
+		       MA35D1_EADC_INTSRC0_ADINT0,
+		       MA35D1_EADC_INTSRC0_ADINT0);
+	ma35d1_adc_rmw(adc, MA35D1_EADC_REFADJCTL,
+		       MA35D1_EADC_REFADJCTL_EXT_VREF,
+		       MA35D1_EADC_REFADJCTL_EXT_VREF);
+	ma35d1_adc_rmw(adc, MA35D1_EADC_SELSMP0, GENMASK(1, 0), 3);
+}
+
+static void ma35d1_adc_hw_disable(void *data)
+{
+	struct ma35d1_adc *adc = data;
+
+	ma35d1_adc_disable_irq(adc);
+	ma35d1_adc_rmw(adc, MA35D1_EADC_CTL, MA35D1_EADC_CTL_ADCEN, 0);
+}
+
+static irqreturn_t ma35d1_adc_isr(int irq, void *data)
+{
+	struct iio_dev *indio_dev = data;
+	struct ma35d1_adc *adc = iio_priv(indio_dev);
+	u32 status;
+
+	status = ma35d1_adc_read(adc, MA35D1_EADC_STATUS2);
+	if (!(status & MA35D1_EADC_STATUS2_ADIF0))
+		return IRQ_NONE;
+
+	ma35d1_adc_write(adc, MA35D1_EADC_STATUS2, MA35D1_EADC_STATUS2_ADIF0);
+
+	if (iio_buffer_enabled(indio_dev)) {
+		ma35d1_adc_disable_irq(adc);
+		iio_trigger_poll(adc->trig);
+	} else {
+		complete(&adc->completion);
+	}
+
+	return IRQ_HANDLED;
+}
+
+static irqreturn_t ma35d1_adc_trigger_handler(int irq, void *p)
+{
+	struct iio_poll_func *pf = p;
+	struct iio_dev *indio_dev = pf->indio_dev;
+	struct ma35d1_adc *adc = iio_priv(indio_dev);
+	int i;
+
+	for (i = 0; i < adc->scan_chancnt; i++)
+		adc->scan.channels[i] =
+			ma35d1_adc_read(adc, MA35D1_EADC_DAT(i)) &
+			MA35D1_EADC_DAT_MASK;
+
+	iio_push_to_buffers_with_timestamp(indio_dev, &adc->scan, pf->timestamp);
+	iio_trigger_notify_done(adc->trig);
+
+	ma35d1_adc_rmw(adc, MA35D1_EADC_CTL, MA35D1_EADC_CTL_ADCIEN0,
+		       MA35D1_EADC_CTL_ADCIEN0);
+	ma35d1_adc_write(adc, MA35D1_EADC_SWTRG, 1);
+
+	return IRQ_HANDLED;
+}
+
+static int ma35d1_adc_read_conversion(struct iio_dev *indio_dev,
+				      const struct iio_chan_spec *chan,
+				      int *val)
+{
+	struct ma35d1_adc *adc = iio_priv(indio_dev);
+	long timeout;
+
+	reinit_completion(&adc->completion);
+
+	ma35d1_adc_write(adc, MA35D1_EADC_STATUS2, MA35D1_EADC_STATUS2_ADIF0);
+	ma35d1_adc_rmw(adc, MA35D1_EADC_SCTL(0),
+		       MA35D1_EADC_SCTL_CHSEL_MASK |
+		       MA35D1_EADC_SCTL_TRGSEL_MASK,
+		       FIELD_PREP(MA35D1_EADC_SCTL_CHSEL_MASK,
+				  chan->channel));
+	ma35d1_adc_set_diff(adc, chan->differential);
+	ma35d1_adc_rmw(adc, MA35D1_EADC_CTL, MA35D1_EADC_CTL_ADCIEN0,
+		       MA35D1_EADC_CTL_ADCIEN0);
+	ma35d1_adc_write(adc, MA35D1_EADC_SWTRG, 1);
+
+	timeout = wait_for_completion_interruptible_timeout(&adc->completion,
+							    MA35D1_EADC_TIMEOUT);
+	ma35d1_adc_disable_irq(adc);
+
+	if (timeout < 0)
+		return timeout;
+	if (!timeout)
+		return -ETIMEDOUT;
+
+	*val = ma35d1_adc_read(adc, MA35D1_EADC_DAT(0)) & MA35D1_EADC_DAT_MASK;
+
+	return 0;
+}
+
+static int ma35d1_adc_read_raw(struct iio_dev *indio_dev,
+			       const struct iio_chan_spec *chan,
+			       int *val, int *val2, long mask)
+{
+	struct ma35d1_adc *adc = iio_priv(indio_dev);
+	int ret;
+
+	switch (mask) {
+	case IIO_CHAN_INFO_RAW:
+		if (!iio_device_claim_direct(indio_dev))
+			return -EBUSY;
+
+		mutex_lock(&adc->lock);
+		ret = ma35d1_adc_read_conversion(indio_dev, chan, val);
+		mutex_unlock(&adc->lock);
+
+		iio_device_release_direct(indio_dev);
+		if (ret)
+			return ret;
+
+		return IIO_VAL_INT;
+	default:
+		return -EINVAL;
+	}
+}
+
+static int ma35d1_adc_validate_scan(struct iio_dev *indio_dev,
+				    const unsigned long *scan_mask)
+{
+	const struct iio_chan_spec *chan;
+	bool have_single = false;
+	bool have_diff = false;
+	unsigned int count = 0;
+	unsigned long bit;
+
+	for_each_set_bit(bit, scan_mask, indio_dev->masklength) {
+		chan = &indio_dev->channels[bit];
+
+		if (chan->type == IIO_TIMESTAMP)
+			continue;
+		count++;
+		if (chan->differential)
+			have_diff = true;
+		else
+			have_single = true;
+	}
+
+	if (!count || count > MA35D1_EADC_MAX_SAMPLE_MODULES)
+		return -EINVAL;
+
+	if (have_single && have_diff)
+		return -EINVAL;
+
+	return 0;
+}
+
+static int ma35d1_adc_update_scan_mode(struct iio_dev *indio_dev,
+				       const unsigned long *scan_mask)
+{
+	struct ma35d1_adc *adc = iio_priv(indio_dev);
+	const struct iio_chan_spec *chan;
+	unsigned int sample = 0;
+	unsigned long bit;
+	bool differential = false;
+	int ret;
+
+	ret = ma35d1_adc_validate_scan(indio_dev, scan_mask);
+	if (ret)
+		return ret;
+
+	for_each_set_bit(bit, scan_mask, indio_dev->masklength) {
+		chan = &indio_dev->channels[bit];
+		if (chan->type == IIO_TIMESTAMP)
+			continue;
+
+		if (!sample)
+			differential = chan->differential;
+
+		ma35d1_adc_config_sample(adc, sample, chan->channel);
+		sample++;
+	}
+
+	adc->scan_chancnt = sample;
+	adc->scan_differential = differential;
+
+	return 0;
+}
+
+static int ma35d1_adc_buffer_postenable(struct iio_dev *indio_dev)
+{
+	struct ma35d1_adc *adc = iio_priv(indio_dev);
+	int i;
+
+	if (!adc->scan_chancnt)
+		return -EINVAL;
+
+	ma35d1_adc_write(adc, MA35D1_EADC_STATUS2, MA35D1_EADC_STATUS2_ADIF0);
+	ma35d1_adc_rmw(adc, MA35D1_EADC_INTSRC0,
+		       MA35D1_EADC_INTSRC0_ADINT0,
+		       MA35D1_EADC_INTSRC0_ADINT0);
+	ma35d1_adc_rmw(adc, MA35D1_EADC_REFADJCTL,
+		       MA35D1_EADC_REFADJCTL_EXT_VREF,
+		       MA35D1_EADC_REFADJCTL_EXT_VREF);
+	ma35d1_adc_rmw(adc, MA35D1_EADC_SELSMP0, GENMASK(1, 0), 3);
+	ma35d1_adc_set_diff(adc, adc->scan_differential);
+
+	for (i = 0; i < adc->scan_chancnt; i++)
+		ma35d1_adc_rmw(adc, MA35D1_EADC_SCTL(i),
+			       MA35D1_EADC_SCTL_TRGDLY_MASK,
+			       MA35D1_EADC_SCTL_TRGDLY_MASK);
+
+	ma35d1_adc_rmw(adc, MA35D1_EADC_CTL, MA35D1_EADC_CTL_ADCIEN0,
+		       MA35D1_EADC_CTL_ADCIEN0);
+	ma35d1_adc_write(adc, MA35D1_EADC_SWTRG, 1);
+
+	return 0;
+}
+
+static int ma35d1_adc_buffer_predisable(struct iio_dev *indio_dev)
+{
+	struct ma35d1_adc *adc = iio_priv(indio_dev);
+	int i;
+
+	ma35d1_adc_disable_irq(adc);
+	for (i = 0; i < adc->scan_chancnt; i++)
+		ma35d1_adc_rmw(adc, MA35D1_EADC_SCTL(i),
+			       MA35D1_EADC_SCTL_TRGSEL_MASK, 0);
+
+	return 0;
+}
+
+static const struct iio_buffer_setup_ops ma35d1_adc_buffer_ops = {
+	.postenable = ma35d1_adc_buffer_postenable,
+	.predisable = ma35d1_adc_buffer_predisable,
+};
+
+static const struct iio_info ma35d1_adc_info = {
+	.read_raw = ma35d1_adc_read_raw,
+	.update_scan_mode = ma35d1_adc_update_scan_mode,
+};
+
+static const struct iio_trigger_ops ma35d1_adc_trigger_ops = {
+	.validate_device = iio_trigger_validate_own_device,
+};
+
+static void ma35d1_adc_init_channel(struct ma35d1_adc *adc,
+				    struct iio_chan_spec *chan, u32 vinp,
+				    u32 vinn, int scan_index, bool differential)
+{
+	char *name = adc->chan_name[vinp];
+
+	chan->type = IIO_VOLTAGE;
+	chan->indexed = 1;
+	chan->channel = vinp;
+	chan->address = vinp;
+	chan->scan_index = scan_index;
+	chan->info_mask_separate = BIT(IIO_CHAN_INFO_RAW);
+	chan->scan_type.sign = 'u';
+	chan->scan_type.realbits = 12;
+	chan->scan_type.storagebits = 16;
+	chan->scan_type.endianness = IIO_CPU;
+
+	if (differential) {
+		chan->differential = 1;
+		chan->channel2 = vinn;
+		snprintf(name, MA35D1_EADC_CHAN_NAME_LEN, "in%d-in%d", vinp,
+			 vinn);
+	} else {
+		snprintf(name, MA35D1_EADC_CHAN_NAME_LEN, "in%d", vinp);
+	}
+
+	chan->datasheet_name = name;
+}
+
+static int ma35d1_adc_parse_channels(struct iio_dev *indio_dev,
+				     struct device *dev)
+{
+	struct ma35d1_adc *adc = iio_priv(indio_dev);
+	DECLARE_BITMAP(used_channels, MA35D1_EADC_MAX_CHANNELS);
+	struct fwnode_handle *child;
+	struct iio_chan_spec *channels;
+	int num_channels;
+	int scan_index = 0;
+	int ret;
+
+	bitmap_zero(used_channels, MA35D1_EADC_MAX_CHANNELS);
+
+	num_channels = device_get_child_node_count(dev);
+	if (!num_channels)
+		return dev_err_probe(dev, -ENODATA,
+				     "no ADC channels configured\n");
+
+	if (num_channels > MA35D1_EADC_MAX_CHANNELS)
+		return dev_err_probe(dev, -EINVAL, "too many ADC channels\n");
+
+	channels = devm_kcalloc(dev, num_channels + 1, sizeof(*channels),
+				GFP_KERNEL);
+	if (!channels)
+		return -ENOMEM;
+
+	device_for_each_child_node(dev, child) {
+		u32 diff[2];
+		u32 reg;
+		bool differential = false;
+
+		ret = fwnode_property_read_u32(child, "reg", &reg);
+		if (ret) {
+			fwnode_handle_put(child);
+			return dev_err_probe(dev, ret,
+					     "missing channel reg property\n");
+		}
+
+		if (reg >= MA35D1_EADC_MAX_CHANNELS) {
+			fwnode_handle_put(child);
+			return dev_err_probe(dev, -EINVAL,
+					     "invalid ADC channel %u\n", reg);
+		}
+
+		if (test_and_set_bit(reg, used_channels)) {
+			fwnode_handle_put(child);
+			return dev_err_probe(dev, -EINVAL,
+					     "duplicate ADC channel %u\n", reg);
+		}
+
+		if (fwnode_property_present(child, "diff-channels")) {
+			ret = fwnode_property_read_u32_array(child,
+							     "diff-channels",
+							     diff,
+							     ARRAY_SIZE(diff));
+			if (ret) {
+				fwnode_handle_put(child);
+				return dev_err_probe(dev, ret,
+						     "invalid diff-channels for channel %u\n",
+						     reg);
+			}
+
+			if (diff[0] != reg ||
+			    diff[1] >= MA35D1_EADC_MAX_CHANNELS ||
+			    diff[0] == diff[1]) {
+				fwnode_handle_put(child);
+				return dev_err_probe(dev, -EINVAL,
+						     "invalid differential ADC channel %u-%u\n",
+						     diff[0], diff[1]);
+			}
+
+			if (test_and_set_bit(diff[1], used_channels)) {
+				fwnode_handle_put(child);
+				return dev_err_probe(dev, -EINVAL,
+						     "ADC channel %u already used\n",
+						     diff[1]);
+			}
+
+			differential = true;
+		}
+
+		ma35d1_adc_init_channel(adc, &channels[scan_index], reg,
+					differential ? diff[1] : 0,
+					scan_index, differential);
+		scan_index++;
+	}
+
+	channels[scan_index] = (struct iio_chan_spec)
+		IIO_CHAN_SOFT_TIMESTAMP(scan_index);
+
+	indio_dev->channels = channels;
+	indio_dev->num_channels = scan_index + 1;
+	indio_dev->masklength = indio_dev->num_channels;
+
+	return 0;
+}
+
+static int ma35d1_adc_setup_trigger(struct iio_dev *indio_dev,
+				    struct device *dev)
+{
+	struct ma35d1_adc *adc = iio_priv(indio_dev);
+	int ret;
+
+	adc->trig = devm_iio_trigger_alloc(dev, "%s-trigger", dev_name(dev));
+	if (!adc->trig)
+		return -ENOMEM;
+
+	adc->trig->ops = &ma35d1_adc_trigger_ops;
+	iio_trigger_set_drvdata(adc->trig, indio_dev);
+
+	ret = devm_iio_trigger_register(dev, adc->trig);
+	if (ret)
+		return dev_err_probe(dev, ret, "failed to register trigger\n");
+
+	ret = iio_trigger_set_immutable(indio_dev, adc->trig);
+	if (ret)
+		return dev_err_probe(dev, ret, "failed to set trigger\n");
+
+	return 0;
+}
+
+static int ma35d1_adc_probe(struct platform_device *pdev)
+{
+	struct device *dev = &pdev->dev;
+	struct iio_dev *indio_dev;
+	struct ma35d1_adc *adc;
+	int irq;
+	int ret;
+
+	indio_dev = devm_iio_device_alloc(dev, sizeof(*adc));
+	if (!indio_dev)
+		return -ENOMEM;
+	adc = iio_priv(indio_dev);
+	adc->dev = dev;
+	mutex_init(&adc->lock);
+	init_completion(&adc->completion);
+
+	adc->regs = devm_platform_ioremap_resource(pdev, 0);
+	if (IS_ERR(adc->regs))
+		return dev_err_probe(dev, PTR_ERR(adc->regs),
+				     "failed to map registers\n");
+
+	adc->clk = devm_clk_get_enabled(dev, NULL);
+	if (IS_ERR(adc->clk))
+		return dev_err_probe(dev, PTR_ERR(adc->clk),
+				     "failed to get and enable ADC clock\n");
+
+	indio_dev->name = "ma35d1-eadc";
+	indio_dev->modes = INDIO_DIRECT_MODE | INDIO_BUFFER_TRIGGERED;
+	indio_dev->info = &ma35d1_adc_info;
+
+	ret = ma35d1_adc_parse_channels(indio_dev, dev);
+	if (ret)
+		return ret;
+
+	ma35d1_adc_hw_init(adc);
+
+	ret = devm_add_action_or_reset(dev, ma35d1_adc_hw_disable, adc);
+	if (ret)
+		return ret;
+
+	ret = ma35d1_adc_setup_trigger(indio_dev, dev);
+	if (ret)
+		return ret;
+
+	irq = platform_get_irq(pdev, 0);
+	if (irq < 0)
+		return irq;
+
+	ret = devm_request_irq(dev, irq, ma35d1_adc_isr, 0, dev_name(dev),
+			       indio_dev);
+	if (ret)
+		return dev_err_probe(dev, ret, "failed to request IRQ %d\n", irq);
+
+	ret = devm_iio_triggered_buffer_setup(dev, indio_dev,
+					      iio_pollfunc_store_time,
+					      ma35d1_adc_trigger_handler,
+					      &ma35d1_adc_buffer_ops);
+	if (ret)
+		return dev_err_probe(dev, ret,
+				     "failed to setup triggered buffer\n");
+
+	platform_set_drvdata(pdev, indio_dev);
+
+	ret = devm_iio_device_register(dev, indio_dev);
+	if (ret)
+		return dev_err_probe(dev, ret, "failed to register IIO device\n");
+
+	return 0;
+}
+
+static int ma35d1_adc_suspend(struct device *dev)
+{
+	struct iio_dev *indio_dev = dev_get_drvdata(dev);
+	struct ma35d1_adc *adc = iio_priv(indio_dev);
+
+	if (iio_buffer_enabled(indio_dev))
+		return -EBUSY;
+
+	ma35d1_adc_hw_disable(adc);
+	clk_disable_unprepare(adc->clk);
+
+	return 0;
+}
+
+static int ma35d1_adc_resume(struct device *dev)
+{
+	struct iio_dev *indio_dev = dev_get_drvdata(dev);
+	struct ma35d1_adc *adc = iio_priv(indio_dev);
+	int ret;
+
+	ret = clk_prepare_enable(adc->clk);
+	if (ret)
+		return ret;
+
+	ma35d1_adc_hw_init(adc);
+
+	return 0;
+}
+
+static DEFINE_SIMPLE_DEV_PM_OPS(ma35d1_adc_pm_ops,
+				ma35d1_adc_suspend, ma35d1_adc_resume);
+
+static const struct of_device_id ma35d1_adc_of_match[] = {
+	{ .compatible = "nuvoton,ma35d1-eadc" },
+	{ }
+};
+MODULE_DEVICE_TABLE(of, ma35d1_adc_of_match);
+
+static struct platform_driver ma35d1_adc_driver = {
+	.probe = ma35d1_adc_probe,
+	.driver = {
+		.name = "ma35d1-eadc",
+		.of_match_table = ma35d1_adc_of_match,
+		.pm = pm_sleep_ptr(&ma35d1_adc_pm_ops),
+	},
+};
+module_platform_driver(ma35d1_adc_driver);
+
+MODULE_AUTHOR("Chi-Wen Weng <cwweng@nuvoton.com>");
+MODULE_DESCRIPTION("Nuvoton MA35D1 EADC driver");
+MODULE_LICENSE("GPL");
-- 
2.25.1



^ permalink raw reply related

* Re: [PATCH v19 1/7] coresight: core: refactor ctcu_get_active_port and make it generic
From: sashiko-bot @ 2026-06-25 11:07 UTC (permalink / raw)
  To: Jie Gan; +Cc: devicetree, conor+dt, robh
In-Reply-To: <20260625-enable-byte-cntr-for-ctcu-v19-1-8fbbf22e8381@oss.qualcomm.com>

Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:

New issues:
- [High] Lockless iteration over `pdata->in_conns` array introduces a TOCTOU race condition and potential Use-After-Free.

Pre-existing issues:
- [High] Missing bounds check for Device Tree-provided `port_num` leads to out-of-bounds array access in CTCU trace ID configuration.
- [High] Unsafe direct dereference of sparse array elements in `tpda_get_element_size`.
--

commit f655e6df0885045f179728cb52f455717556ac67
Author: Jie Gan <jie.gan@oss.qualcomm.com>

coresight: core: refactor ctcu_get_active_port and make it generic

This patch removes ctcu_get_active_port from the CTCU module and adds it to
the core framework as coresight_get_in_port. It allows retrieving the active
input port number generically for a given device.

> diff --git a/drivers/hwtracing/coresight/coresight-core.c b/drivers/hwtracing/coresight/coresight-core.c
> index 6d65c43d574fd..7a7a85acdca0d 100644
> --- a/drivers/hwtracing/coresight/coresight-core.c
> +++ b/drivers/hwtracing/coresight/coresight-core.c
> @@ -773,6 +773,33 @@ struct coresight_device *coresight_get_sink(struct coresight_path *path)
[ ... ]
> +int coresight_get_in_port(struct coresight_device *csdev,
> +			  struct coresight_device *remote)
> +{
> +	struct coresight_platform_data *pdata = remote->pdata;
> +	int i;
> +
> +	for (i = 0; i < pdata->nr_inconns; ++i) {
> +		if (!pdata->in_conns[i])
> +			continue;
> +
> +		if (pdata->in_conns[i]->src_dev == csdev)

[Severity: High]
Can this sequence cause a race condition leading to a NULL pointer
dereference?

The array element is read twice. First, to check if it's NULL, and then
again to access src_dev. If a concurrent device unbind operation like
coresight_remove_conns() sets the element to NULL between these reads, the
second access might trigger a panic.

Should we use READ_ONCE() to load the pointer into a local variable first?

> +			return pdata->in_conns[i]->dest_port;
> +	}
> +
> +	return -EINVAL;
> +}
> +EXPORT_SYMBOL_GPL(coresight_get_in_port);
> +
>  u32 coresight_get_sink_id(struct coresight_device *csdev)
>  {
>  	if (!csdev->ea)
> diff --git a/drivers/hwtracing/coresight/coresight-ctcu-core.c b/drivers/hwtracing/coresight/coresight-ctcu-core.c
> index 9043cad42f01e..e8720026c9e3f 100644
> --- a/drivers/hwtracing/coresight/coresight-ctcu-core.c
> +++ b/drivers/hwtracing/coresight/coresight-ctcu-core.c
[ ... ]
> @@ -145,7 +128,7 @@ static int ctcu_set_etr_traceid(struct coresight_device *csdev, struct coresight
>  		return -EINVAL;
>  	}
>  
> -	port_num = ctcu_get_active_port(sink, csdev);
> +	port_num = coresight_get_in_port(sink, csdev);
>  	if (port_num < 0)
>  		return -EINVAL;

[Severity: High]
This isn't a bug introduced by this patch, but does the returned
port_num need to be bounds-checked before being used?

The port_num originates from device tree properties, and if it exceeds
ETR_MAX_NUM (2), it leads to out-of-bounds array accesses in
__ctcu_set_etr_traceid() when indexing into drvdata->atid_offset and
drvdata->traceid_refcnt:

__ctcu_set_etr_traceid()
    ...
    atid_offset = drvdata->atid_offset[port_num];
    ...
    refcnt = drvdata->traceid_refcnt[port_num][traceid];

Would it be safer to validate port_num against the array bounds after
retrieving it?

>  
[ ... ]

[Severity: High]
This is a pre-existing issue not introduced by this patch, but while
looking at the sparse in_conns array usage, I noticed a similar issue in
tpda_get_element_size().

Does this code dereference the sparse array element before checking if it
is valid?

drivers/hwtracing/coresight/coresight-tpda.c:tpda_get_element_size() {
    ...
    for (i = 0; i < csdev->pdata->nr_inconns; i++) {
        in = csdev->pdata->in_conns[i]->src_dev;
        if (!in)
            continue;
    ...
}

If a component is unbound and the connection is removed by
coresight_remove_conns(), csdev->pdata->in_conns[i] could be NULL,
causing a panic before the check is reached.

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260625-enable-byte-cntr-for-ctcu-v19-0-8fbbf22e8381@oss.qualcomm.com?part=1

^ permalink raw reply

* [PATCH 1/2] dt-bindings: iio: adc: Add Nuvoton MA35D1 EADC
From: Chi-Wen Weng @ 2026-06-25 11:06 UTC (permalink / raw)
  To: jic23, robh, krzk+dt, conor+dt
  Cc: dlechner, nuno.sa, andy, linux-arm-kernel, linux-iio, devicetree,
	linux-kernel, cwweng, cwweng.linux
In-Reply-To: <20260625110638.38438-1-cwweng.linux@gmail.com>

From: Chi-Wen Weng <cwweng@nuvoton.com>

Add devicetree binding for the Enhanced ADC controller found on
Nuvoton MA35D1 SoCs.

The controller has one register region, one interrupt and one functional
clock. ADC inputs are described using standard channel child nodes,
including optional differential channel pairs.

Signed-off-by: Chi-Wen Weng <cwweng@nuvoton.com>
---
 .../bindings/iio/adc/nuvoton,ma35d1-eadc.yaml | 100 ++++++++++++++++++
 1 file changed, 100 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/iio/adc/nuvoton,ma35d1-eadc.yaml

diff --git a/Documentation/devicetree/bindings/iio/adc/nuvoton,ma35d1-eadc.yaml b/Documentation/devicetree/bindings/iio/adc/nuvoton,ma35d1-eadc.yaml
new file mode 100644
index 000000000000..ae7ad0f7689a
--- /dev/null
+++ b/Documentation/devicetree/bindings/iio/adc/nuvoton,ma35d1-eadc.yaml
@@ -0,0 +1,100 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/iio/adc/nuvoton,ma35d1-eadc.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Nuvoton MA35D1 Enhanced Analog to Digital Converter
+
+maintainers:
+  - Chi-Wen Weng <cwweng@nuvoton.com>
+
+description: |
+  The Nuvoton MA35D1 Enhanced Analog to Digital Converter (EADC) is a
+  12-bit ADC controller integrated in the MA35D1 SoC. Each enabled ADC
+  input is described by a child channel node.
+
+properties:
+  compatible:
+    const: nuvoton,ma35d1-eadc
+
+  reg:
+    maxItems: 1
+
+  interrupts:
+    maxItems: 1
+
+  clocks:
+    maxItems: 1
+
+  '#address-cells':
+    const: 1
+
+  '#size-cells':
+    const: 0
+
+patternProperties:
+  '^channel@[0-8]$':
+    type: object
+    $ref: adc.yaml
+    unevaluatedProperties: false
+
+    properties:
+      reg:
+        minimum: 0
+        maximum: 8
+
+      diff-channels:
+        minItems: 2
+        maxItems: 2
+        items:
+          minimum: 0
+          maximum: 8
+
+    required:
+      - reg
+
+required:
+  - compatible
+  - reg
+  - interrupts
+  - clocks
+  - '#address-cells'
+  - '#size-cells'
+
+additionalProperties: false
+
+examples:
+  - |
+    #include <dt-bindings/clock/nuvoton,ma35d1-clk.h>
+    #include <dt-bindings/interrupt-controller/arm-gic.h>
+    #include <dt-bindings/interrupt-controller/irq.h>
+
+    soc {
+        #address-cells = <2>;
+        #size-cells = <2>;
+
+        adc@40430000 {
+            compatible = "nuvoton,ma35d1-eadc";
+            reg = <0x0 0x40430000 0x0 0x10000>;
+            interrupts = <GIC_SPI 65 IRQ_TYPE_LEVEL_HIGH>;
+            clocks = <&clk EADC_GATE>;
+
+            #address-cells = <1>;
+            #size-cells = <0>;
+
+            channel@0 {
+                reg = <0>;
+            };
+
+            channel@1 {
+                reg = <1>;
+            };
+
+            channel@2 {
+                reg = <2>;
+                diff-channels = <2 3>;
+            };
+        };
+    };
+...
-- 
2.25.1



^ permalink raw reply related

* [PATCH 0/2] iio: adc: Add Nuvoton MA35D1 EADC support
From: Chi-Wen Weng @ 2026-06-25 11:06 UTC (permalink / raw)
  To: jic23, robh, krzk+dt, conor+dt
  Cc: dlechner, nuno.sa, andy, linux-arm-kernel, linux-iio, devicetree,
	linux-kernel, cwweng, cwweng.linux

From: Chi-Wen Weng <cwweng@nuvoton.com>

This series adds devicetree binding and IIO driver support for the
Nuvoton MA35D1 Enhanced ADC controller.

The MA35D1 EADC controller supports multiple ADC input channels. This
initial upstream driver supports direct raw reads and triggered buffered
capture using the controller end-of-conversion interrupt as the IIO
device trigger.

ADC channels are described using standard firmware child nodes. Both
single-ended and differential channels are supported. Since the
differential enable bit is global in the controller, mixed single-ended
and differential buffered scans are rejected.

DMA support is intentionally not included in this initial version. The
driver uses the interrupt-driven conversion path to keep the first
upstream submission small and easier to review.

Patch 1 adds the devicetree binding.
Patch 2 adds the MA35D1 EADC IIO driver.

Chi-Wen Weng (2):
  dt-bindings: iio: adc: Add Nuvoton MA35D1 EADC
  iio: adc: Add Nuvoton MA35D1 EADC driver

 .../bindings/iio/adc/nuvoton,ma35d1-eadc.yaml | 100 +++
 drivers/iio/adc/Kconfig                       |  10 +
 drivers/iio/adc/Makefile                      |   1 +
 drivers/iio/adc/ma35d1_eadc.c                 | 636 ++++++++++++++++++
 4 files changed, 747 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/iio/adc/nuvoton,ma35d1-eadc.yaml
 create mode 100644 drivers/iio/adc/ma35d1_eadc.c

-- 
2.25.1

^ permalink raw reply

* [rgushchin:bpfoom.4.rebased 18/29] undefined reference to `bpf_handle_oom'
From: kernel test robot @ 2026-06-25 11:05 UTC (permalink / raw)
  To: Roman Gushchin; +Cc: oe-kbuild-all

tree:   https://github.com/rgushchin/linux.git bpfoom.4.rebased
head:   702dfd9ab6849d59d3521f7c6542c1cf6aac217c
commit: f8f699c3c785a375e31b42825752cba88f18e1d8 [18/29] mm: introduce BPF OOM struct ops
config: x86_64-buildonly-randconfig-004 (https://download.01.org/0day-ci/archive/20260625/202606251832.iOg1HRsA-lkp@intel.com/config)
compiler: gcc-14 (Debian 14.2.0-19) 14.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260625/202606251832.iOg1HRsA-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202606251832.iOg1HRsA-lkp@intel.com/

All errors (new ones prefixed by >>):

   ld: vmlinux.o: in function `out_of_memory':
>> (.text+0x23ae81): undefined reference to `bpf_handle_oom'

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply

* Re: [PATCH v4 0/2] tracing: Move non-trace_printk prototypes into trace_controls.h
From: Jani Nikula @ 2026-06-25 11:05 UTC (permalink / raw)
  To: Steven Rostedt, linux-kernel, linux-trace-kernel
  Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
	Linus Torvalds, Sebastian Andrzej Siewior, John Ogness,
	Thomas Gleixner, Peter Zijlstra, Julia Lawall, Yury Norov,
	linux-doc, linux-kbuild, linuxppc-dev, dri-devel, linux-stm32,
	linux-arm-kernel, linux-rdma, linux-usb, linux-ext4, linux-nfs,
	kvm, intel-gfx
In-Reply-To: <20260625104007.041432666@kernel.org>

On Thu, 25 Jun 2026, Steven Rostedt <rostedt@kernel.org> wrote:
> Remove trace_printk.h by creating a trace_controls.h for those places that
> need access to tracing prototypes like tracing_off() and for the places that
> need trace_printk() directly, to have it included directly.
>
> Changse since v3: https://lore.kernel.org/all/20260624081806.120105649@kernel.org/
>
> - Always include trace_controls.h in rcu.h (kernel test robot)
>
>   There are other configs that may include tracing_off() in rcu.h besides
>   the one that had the include of trace_controls.h. Just always include
>   it in that header to be safe.
>
> Steven Rostedt (2):
>       tracing: Move non-trace_printk prototypes into trace_controls.h
>       tracing: Remove trace_printk.h from kernel.h
>
> ----
>  arch/powerpc/kvm/book3s_xics.c         |  1 +
>  arch/powerpc/xmon/xmon.c               |  1 +
>  arch/s390/kernel/ipl.c                 |  1 +
>  arch/s390/kernel/machine_kexec.c       |  1 +
>  drivers/gpu/drm/i915/gt/intel_gtt.h    |  1 +
>  drivers/gpu/drm/i915/i915_gem.h        |  2 ++

For the i915 parts,

Acked-by: Jani Nikula <jani.nikula@intel.com>

for merging via whichever tree.

>  drivers/hwtracing/stm/dummy_stm.c      |  1 +
>  drivers/infiniband/hw/hfi1/trace_dbg.h |  1 +
>  drivers/tty/sysrq.c                    |  1 +
>  drivers/usb/early/xhci-dbc.c           |  1 +
>  fs/ext4/inline.c                       |  1 +
>  include/linux/ftrace.h                 |  2 ++
>  include/linux/kernel.h                 |  1 -
>  include/linux/sunrpc/debug.h           |  1 +
>  include/linux/trace_controls.h         | 54 ++++++++++++++++++++++++++++++++
>  include/linux/trace_printk.h           | 56 ++--------------------------------
>  kernel/debug/debug_core.c              |  1 +
>  kernel/panic.c                         |  1 +
>  kernel/rcu/rcu.h                       |  1 +
>  kernel/rcu/rcutorture.c                |  1 +
>  kernel/trace/ring_buffer_benchmark.c   |  1 +
>  kernel/trace/trace.h                   |  1 +
>  kernel/trace/trace_benchmark.c         |  1 +
>  lib/sys_info.c                         |  1 +
>  samples/fprobe/fprobe_example.c        |  1 +
>  samples/ftrace/ftrace-direct-too.c     |  1 -
>  samples/trace_printk/trace-printk.c    |  1 +
>  27 files changed, 82 insertions(+), 55 deletions(-)
>  create mode 100644 include/linux/trace_controls.h

-- 
Jani Nikula, Intel

^ permalink raw reply

* Re: [PATCH v19 2/7] coresight: tmc: add create/clean functions for etr_buf_list
From: sashiko-bot @ 2026-06-25 11:06 UTC (permalink / raw)
  To: Jie Gan; +Cc: conor+dt, robh, devicetree
In-Reply-To: <20260625-enable-byte-cntr-for-ctcu-v19-2-8fbbf22e8381@oss.qualcomm.com>

Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
- [High] Lockless read of `drvdata->sysfs_buf` in `tmc_create_etr_buf_list` leads to a Time-of-Check to Time-of-Use race condition, resulting in a Double-Free or Use-After-Free.
- [Medium] The error path in `tmc_create_etr_buf_list` destructively frees the pre-existing `drvdata->sysfs_buf` if the device becomes inactive, leading to the irreversible loss of previously captured trace data.
- [Medium] `tmc_clean_etr_buf_list` leaks `drvdata->sysfs_buf` if called when the device is active in a non-SYSFS mode (e.g., `CS_MODE_PERF`).
--

commit efe04b42620438e6e268cf24ebf0538327ca8629
Author: Jie Gan <jie.gan@oss.qualcomm.com>

coresight: tmc: add create/clean functions for etr_buf_list

This commit introduces functions for creating and cleaning a list of
etr_buf_node elements to manage etr_buf entries. It is intended to support
byte-counter functionality which requires multiple trace data buffers.

> diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c
> index 361a433e6f0c5..9b3ef73e9cf27 100644
> --- a/drivers/hwtracing/coresight/coresight-tmc-etr.c
> +++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c

[ ... ]

> @@ -1918,6 +1918,132 @@ const struct coresight_ops tmc_etr_cs_ops = {
>  	.panic_ops	= &tmc_etr_sync_ops,
>  };
>  
> +/**
> + * tmc_clean_etr_buf_list - clean the etr_buf_list.
> + * @drvdata:	driver data of the TMC device.
> + *
> + * Remove all nodes from @drvdata->etr_buf_list and free their buffers.
> + * If a node holds the live sysfs_buf and the device is active, the node is
> + * removed but the buffer is not freed; ownership stays with drvdata->sysfs_buf.
> + *
> + * Locking: callers must guarantee exclusive access to @drvdata->etr_buf_list
> + * and must not hold @drvdata->spinlock. The spinlock is taken internally only
> + * to serialise the @drvdata->sysfs_buf accesses against the ETR sink
> + * enable/disable paths. Must be called from process context: buffers are freed
> + * with the lock released.
> + */
> +void tmc_clean_etr_buf_list(struct tmc_drvdata *drvdata)
> +{
> +	struct etr_buf_node *nd, *next;
> +	unsigned long flags;
> +
> +	list_for_each_entry_safe(nd, next, &drvdata->etr_buf_list, link) {
> +		raw_spin_lock_irqsave(&drvdata->spinlock, flags);
> +		if (nd->sysfs_buf == drvdata->sysfs_buf) {
> +			if (coresight_get_mode(drvdata->csdev) != CS_MODE_DISABLED)
> +				/*
> +				 * The device is still active. Keep the live
> +				 * buffer owned by drvdata->sysfs_buf and only
> +				 * drop the list's reference to it.
> +				 */
> +				nd->sysfs_buf = NULL;

[Severity: Medium]
Will this leak drvdata->sysfs_buf if the device mode is active in a non-SYSFS
mode (such as CS_MODE_PERF)?

If the mode changes to CS_MODE_PERF concurrently during the
tmc_create_etr_buf_list() error path, this condition evaluates to true and
abandons the pointer by setting nd->sysfs_buf to NULL.

Since PERF mode tracing exclusively manages drvdata->perf_buf, does its
teardown path ever free drvdata->sysfs_buf?

> +			else
> +				/* Free the buffer below through nd->sysfs_buf */
> +				drvdata->sysfs_buf = NULL;

[Severity: Medium]
If tmc_create_etr_buf_list() encounters an allocation failure and calls
tmc_clean_etr_buf_list() to roll back, and the device happens to be inactive,
will this unconditionally clear and free the pre-existing drvdata->sysfs_buf?

Should an initialization error path only clean up the resources it allocated,
rather than destroying a preserved external resource and losing previously
captured trace data?

> +		}
> +		raw_spin_unlock_irqrestore(&drvdata->spinlock, flags);
> +
> +		/* Free the buffer (NULL is ignored) and the node out of the lock */
> +		tmc_etr_free_sysfs_buf(nd->sysfs_buf);
> +		list_del(&nd->link);
> +		kfree(nd);
> +	}
> +}
> +EXPORT_SYMBOL_GPL(tmc_clean_etr_buf_list);
> +
> +/**
> + * tmc_create_etr_buf_list - create a list to manage the etr_buf_node.
> + * @drvdata:	driver data of the TMC device.
> + * @num_nodes:	number of nodes want to create with the list.
> + *
> + * Locking: callers must guarantee exclusive access to @drvdata->etr_buf_list
> + * and must not hold @drvdata->spinlock. The spinlock is taken internally only
> + * to serialise the @drvdata->sysfs_buf accesses against the ETR sink
> + * enable/disable paths. Must be called from process context: buffers and nodes
> + * are allocated with the lock released.
> + *
> + * Return 0 upon success and return the error number if fail.
> + */
> +int tmc_create_etr_buf_list(struct tmc_drvdata *drvdata, int num_nodes)
> +{
> +	struct etr_buf_node *new_node;
> +	struct etr_buf *sysfs_buf;
> +	unsigned long flags;
> +	int i = 0, ret = 0;
> +
> +	/* We don't need a list if there is only one node */
> +	if (num_nodes < 2)
> +		return -EINVAL;
> +
> +	/*
> +	 * We expect that sysfs_buf in drvdata has already been allocated.
> +	 * Wrap the live sysfs_buf into the first node so the captured trace
> +	 * data is preserved. The list is owned by the caller, so no lock is
> +	 * needed to read sysfs_buf or to add the node here.
> +	 */
> +	if (drvdata->sysfs_buf) {
> +		new_node = kzalloc_obj(*new_node, GFP_KERNEL);
> +		if (!new_node)
> +			return -ENOMEM;
> +
> +		new_node->sysfs_buf = drvdata->sysfs_buf;

[Severity: High]
Does this lockless read of drvdata->sysfs_buf introduce a race condition?

If a concurrent hardware disable/enable path (like tmc_etr_get_sysfs_buffer())
clears and frees drvdata->sysfs_buf immediately after this lockless check,
wouldn't new_node->sysfs_buf be left as a dangling pointer, leading to a
use-after-free or double free?

> +		new_node->is_free = false;
> +		list_add(&new_node->link, &drvdata->etr_buf_list);
> +		i++;
> +	}

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260625-enable-byte-cntr-for-ctcu-v19-0-8fbbf22e8381@oss.qualcomm.com?part=2

^ permalink raw reply

* Re: [stable request] ROSE memory-safety fixes for 7.0.y and earlier (merged out-of-tree in linux-netdev/mod-orphan)
From: Greg KH @ 2026-06-25 11:04 UTC (permalink / raw)
  To: Bernard Pidoux; +Cc: kuba, stable, linux-hams
In-Reply-To: <CAFAa3YCJXV9uW==2776dbfNFH4PhBPUYnTxDJ2xs7kn0b=4UTA@mail.gmail.com>

On Sat, Jun 20, 2026 at 02:42:17PM +0200, Bernard Pidoux wrote:
> Hi Greg,
> 
> Thanks, much appreciated.
> 
> Short answer: yes, the same series applies to 6.18.y, and the same bugs
> exist in the older trees too -- but only 7.0.y and 6.18.y take the series
> as-is. ROSE was removed in 7.1, so every stable line up to and including
> 7.0.y still carries this code and is affected.
> 
> I just test-applied this exact mbox with "git am" against the current
> ROSE files of each tree:
> 
> v7.0.13 : clean, 15/15 (what I sent you)
> linux-6.18.y : clean, 15/15, no conflicts -- the teardown code is
> identical to 7.0.13

Great, I've applied these to both now.

> linux-6.12.y : applies up to patch 3, then conflicts in
> rose_loopback.c (the loopback/timer code predates one
> of the refactors the series builds on)
> linux-6.6.y / 6.1.y / 5.15.y : same, conflict at the same patch
> 
> So for 6.18.y I can send an identical batch right away. For 6.12.y and
> the older LTS lines the fixes are still needed, but they need a rebased
> backport rather than a straight cherry-pick; I'm happy to prepare those
> per-tree once the format is settled.
> 
> My suggestion, matching what you said: let's land this 7.0.y batch first
> to work out the workflow. As soon as it's in I'll send the (identical)
> 6.18.y batch, and then the rebased older-tree batches one line at a time.
> Whatever order is easiest on your side works for me.

What you sent was fine, no problems, if that works for you, it's fine
for me.

thanks,

greg k-h

^ permalink raw reply

* Re: [PATCH v3 2/2] xen: introduce CONFIG_HAS_SHARED_INFO for archs without a shared page
From: Jan Beulich @ 2026-06-25 11:04 UTC (permalink / raw)
  To: Oleksii Kurochko
  Cc: Stefano Stabellini, Julien Grall, Bertrand Marquis, Michal Orzel,
	Volodymyr Babchuk, Andrew Cooper, Anthony PERARD,
	Roger Pau Monné, Teddy Astie, xen-devel
In-Reply-To: <910ed097-10d8-41a2-9035-a3f10d60c214@gmail.com>

On 25.06.2026 12:50, Oleksii Kurochko wrote:
> On 6/17/26 3:26 PM, Jan Beulich wrote:
>>> +#define shared_info(d, field) \
>>> +    (*(typeof(__shared_info(d, (d)->shared_info, field)) *)shared_info_absent())
>> How about the simpler
>>
>> extern struct shared_info *shared_info_absent;
>> #define shared_info(d, field) (shared_info_absent->field)
>>
>> ?
> 
> This could lead to compilation error:
> 
> common/domain.c: In function 'vcpu_info_reset':
> common/domain.c:316:20: error: unused variable 'd' [-Werror=unused-variable]
>    316 |     struct domain *d = v->domain;
>        |                    ^
> cc1: all warnings being treated as errors

#define shared_info(d, field) ((void)(d), shared_info_absent->field)

Which it should have been anyway, to make sure the first argument is evaluated.

Jan


^ permalink raw reply

* Re: [RFC PATCH 1/3] dt-bindings: pinctrl: mt8516/mt8167: Move compatibles from mt66xx to mt6795
From: sashiko-bot @ 2026-06-25 11:04 UTC (permalink / raw)
  To: Luca Leonardo Scorcia; +Cc: devicetree, conor+dt, robh
In-Reply-To: <20260625104742.113803-2-l.scorcia@gmail.com>

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [Critical] The patch series permanently breaks Device Tree backward compatibility (ABI breakage). By moving the mt8167 and mt8516 SoCs to the mt6795 bindings and pinctrl-paris.c driver, the schema now unconditionally requires `reg-names` and `gpio-ranges` and forbids the legacy `mediatek,pctl-regmap` layout. Furthermore, the updated driver calls `devm_platform_ioremap_resource_byname(pdev, "base")` and fails with `-EINVAL` if `reg-names` is absent. Booting a new kernel with an older, previously valid device tree will fail to probe the pinctrl driver, breaking all GPIO-dependent devices.
--

commit ae4fde26bd3f120a43beccf98009171f05144eee
Author: Luca Leonardo Scorcia <l.scorcia@gmail.com>

dt-bindings: pinctrl: mt8516/mt8167: Move compatibles from mt66xx to mt6795

Pinctrl settings for MediaTek mt8516-mt8167 SoCs use two reg base
addresses, one for GPIO and the other for EINT, as it is common in the
"Paris" pinctrl platform. This commit moves the mediatek,mt8516-pinctrl and
mediatek,mt8167-pinctrl compatible strings to the mt6795-pinctrl binding.

[ ... ]

> diff --git a/Documentation/devicetree/bindings/pinctrl/mediatek,mt6795-pinctrl.yaml b/Documentation/devicetree/bindings/pinctrl/mediatek,mt6795-pinctrl.yaml
> index 9a937f414cc98..c703de72e1d52 100644
> --- a/Documentation/devicetree/bindings/pinctrl/mediatek,mt6795-pinctrl.yaml
> +++ b/Documentation/devicetree/bindings/pinctrl/mediatek,mt6795-pinctrl.yaml
> @@ -15,7 +15,10 @@ description:
>  
>  properties:
>    compatible:
> -    const: mediatek,mt6795-pinctrl
> +    enum:
> +      - mediatek,mt6795-pinctrl
> +      - mediatek,mt8167-pinctrl
> +      - mediatek,mt8516-pinctrl

[Severity: Critical]
Does moving these compatibles to this binding break Device Tree backward
compatibility?

The schema now unconditionally requires properties like reg-names and
gpio-ranges, and forbids the legacy mediatek,pctl-regmap layout without an
allOf: if-then conditional to accommodate older hardware descriptions.

Furthermore, the updated driver unconditionally requires the reg-names
property:

drivers/pinctrl/mediatek/pinctrl-paris.c:mtk_paris_pinctrl_probe() {
    ...
		hw->base[i] = devm_platform_ioremap_resource_byname(pdev,
					hw->soc->base_names[i]);
    ...
}

Because older device trees lack the reg-names property, this call returns
-EINVAL and fails the probe.

Will booting a new kernel with an older device tree fail to probe the
pinctrl driver, breaking all GPIO-dependent peripherals?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260625104742.113803-1-l.scorcia@gmail.com?part=1

^ permalink raw reply

* [PATCH bpf-next v10 5/5] selftests/bpf: add bpf_icmp_send no route test
From: Mahe Tardy @ 2026-06-25 11:03 UTC (permalink / raw)
  To: bpf
  Cc: andrii, ast, daniel, john.fastabend, jordan, martin.lau,
	yonghong.song, emil, netdev, edumazet, kuba, pabeni, davem, horms,
	Mahe Tardy
In-Reply-To: <20260625110321.28236-1-mahe.tardy@gmail.com>

For normal live cgroup_skb paths, the skb should already be routed. The
exception is for test run via BPF_PROG_TEST_RUN with packets created
via bpf_prog_test_run_skb. Those lack dst route and thus the icmp_send
would quietly fail by returning early.

This test exercises this and makes sure the kfunc returns -ENETUNREACH.

Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Reviewed-by: Jordan Rife <jordan@jrife.io>
Signed-off-by: Mahe Tardy <mahe.tardy@gmail.com>
---
 .../bpf/prog_tests/icmp_send_kfunc.c          | 26 +++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/tools/testing/selftests/bpf/prog_tests/icmp_send_kfunc.c b/tools/testing/selftests/bpf/prog_tests/icmp_send_kfunc.c
index bb532aa0d158..ffaf0fe1880b 100644
--- a/tools/testing/selftests/bpf/prog_tests/icmp_send_kfunc.c
+++ b/tools/testing/selftests/bpf/prog_tests/icmp_send_kfunc.c
@@ -169,6 +169,29 @@ static void run_icmp_test(struct icmp_send *skel, int af, const char *ip,
 	}
 }

+static void run_icmp_no_route_test(struct icmp_send *skel)
+{
+	struct ipv4_packet pkt = pkt_v4;
+	LIBBPF_OPTS(bpf_test_run_opts, opts,
+		.data_in = &pkt,
+		.data_size_in = sizeof(pkt),
+	);
+	int err;
+
+	pkt.iph.version = 4;
+	pkt.iph.daddr = inet_addr("127.0.0.1");
+	pkt.tcp.dest = htons(80);
+	skel->bss->server_port = 80;
+	skel->bss->unreach_type = ICMP_DEST_UNREACH;
+	skel->bss->unreach_code = ICMP_HOST_UNREACH;
+	skel->data->kfunc_ret = KFUNC_RET_UNSET;
+
+	err = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.egress), &opts);
+	if (!ASSERT_OK(err, "test_run"))
+		return;
+	ASSERT_EQ(skel->data->kfunc_ret, -ENETUNREACH, "kfunc_ret_no_route");
+}
+
 void test_icmp_send_unreach_cgroup(void)
 {
 	struct icmp_send *skel;
@@ -193,6 +216,9 @@ void test_icmp_send_unreach_cgroup(void)
 	if (test__start_subtest("ipv6"))
 		run_icmp_test(skel, AF_INET6, "::1", ICMPV6_REJECT_ROUTE);

+	if (test__start_subtest("no_route"))
+		run_icmp_no_route_test(skel);
+
 cleanup:
 	icmp_send__destroy(skel);
 	if (cgroup_fd >= 0)
--
2.34.1


^ permalink raw reply related

* [PATCH bpf-next v10 4/5] selftests/bpf: add bpf_icmp_send recursion test
From: Mahe Tardy @ 2026-06-25 11:03 UTC (permalink / raw)
  To: bpf
  Cc: andrii, ast, daniel, john.fastabend, jordan, martin.lau,
	yonghong.song, emil, netdev, edumazet, kuba, pabeni, davem, horms,
	Mahe Tardy
In-Reply-To: <20260625110321.28236-1-mahe.tardy@gmail.com>

This test is similar to test_icmp_send_unreach_cgroup but checks that,
in case of recursion, meaning that the BPF program calling the kfunc was
re-triggered by the icmp_send done by the kfunc, the kfunc will stop
early and return -EBUSY.

The test attaches to the root cgroup to ensure the ICMP packet generated
by the kfunc re-triggers the BPF program.

Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Reviewed-by: Jordan Rife <jordan@jrife.io>
Signed-off-by: Mahe Tardy <mahe.tardy@gmail.com>
---
 .../bpf/prog_tests/icmp_send_kfunc.c          | 46 ++++++++++++++++
 tools/testing/selftests/bpf/progs/icmp_send.c | 55 +++++++++++++++++++
 2 files changed, 101 insertions(+)

diff --git a/tools/testing/selftests/bpf/prog_tests/icmp_send_kfunc.c b/tools/testing/selftests/bpf/prog_tests/icmp_send_kfunc.c
index bbb3c3d4509c..bb532aa0d158 100644
--- a/tools/testing/selftests/bpf/prog_tests/icmp_send_kfunc.c
+++ b/tools/testing/selftests/bpf/prog_tests/icmp_send_kfunc.c
@@ -1,8 +1,10 @@
 // SPDX-License-Identifier: GPL-2.0
 #include <test_progs.h>
 #include <network_helpers.h>
+#include <cgroup_helpers.h>
 #include <linux/errqueue.h>
 #include <poll.h>
+#include <unistd.h>
 #include "icmp_send.skel.h"

 #define TIMEOUT_MS 1000
@@ -10,6 +12,7 @@
 #define ICMP_DEST_UNREACH 3
 #define ICMPV6_DEST_UNREACH 1

+#define ICMP_HOST_UNREACH 1
 #define ICMP_FRAG_NEEDED 4
 #define NR_ICMP_UNREACH 15
 #define ICMPV6_REJECT_ROUTE 6
@@ -195,3 +198,46 @@ void test_icmp_send_unreach_cgroup(void)
 	if (cgroup_fd >= 0)
 		close(cgroup_fd);
 }
+
+void test_icmp_send_unreach_recursion(void)
+{
+	struct icmp_send *skel;
+	int cgroup_fd = -1;
+	int err;
+
+	err = setup_cgroup_environment();
+	if (!ASSERT_OK(err, "setup_cgroup_environment"))
+		return;
+
+	skel = icmp_send__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "skel_open"))
+		goto cleanup;
+
+	cgroup_fd = get_root_cgroup();
+	if (!ASSERT_OK_FD(cgroup_fd, "get_root_cgroup"))
+		goto cleanup;
+
+	skel->data->target_pid = getpid();
+	skel->links.recursion =
+		bpf_program__attach_cgroup(skel->progs.recursion, cgroup_fd);
+	if (!ASSERT_OK_PTR(skel->links.recursion, "prog_attach_cgroup"))
+		goto cleanup;
+
+	trigger_prog_read_icmp_errqueue(skel, ICMP_HOST_UNREACH, AF_INET,
+					"127.0.0.1");
+
+	/*
+	 * Because there's recursion involved, the first call will return at
+	 * index 1 since it will return the second, and the second call will
+	 * return at index 0 since it will return the first.
+	 */
+	ASSERT_EQ(skel->bss->rec_count, 2, "rec_count");
+	ASSERT_EQ(skel->data->rec_kfunc_rets[0], -EBUSY, "kfunc_rets[0]");
+	ASSERT_EQ(skel->data->rec_kfunc_rets[1], 0, "kfunc_rets[1]");
+
+cleanup:
+	icmp_send__destroy(skel);
+	if (cgroup_fd >= 0)
+		close(cgroup_fd);
+	cleanup_cgroup_environment();
+}
diff --git a/tools/testing/selftests/bpf/progs/icmp_send.c b/tools/testing/selftests/bpf/progs/icmp_send.c
index 6e1ba539eeb0..c642ccdf9fd5 100644
--- a/tools/testing/selftests/bpf/progs/icmp_send.c
+++ b/tools/testing/selftests/bpf/progs/icmp_send.c
@@ -12,6 +12,10 @@ __u16 server_port = 0;
 int unreach_type = 0;
 int unreach_code = 0;
 int kfunc_ret = -1;
+int target_pid = -1;
+
+unsigned int rec_count = 0;
+int rec_kfunc_rets[] = { -1, -1 };

 SEC("cgroup_skb/egress")
 int egress(struct __sk_buff *skb)
@@ -65,4 +69,55 @@ int egress(struct __sk_buff *skb)
 	return SK_DROP;
 }

+SEC("cgroup_skb/egress")
+int recursion(struct __sk_buff *skb)
+{
+	void *data = (void *)(long)skb->data;
+	void *data_end = (void *)(long)skb->data_end;
+	struct icmphdr *icmph;
+	struct tcphdr *tcph;
+	struct iphdr *iph;
+	int ret;
+
+	if ((bpf_get_current_pid_tgid() >> 32) != target_pid)
+		return SK_PASS;
+
+	iph = data;
+	if ((void *)(iph + 1) > data_end || iph->version != 4)
+		return SK_PASS;
+
+	if (iph->daddr != bpf_htonl(SERVER_IP))
+		return SK_PASS;
+
+	if (iph->protocol == IPPROTO_TCP) {
+		tcph = (void *)iph + iph->ihl * 4;
+		if ((void *)(tcph + 1) > data_end ||
+		    tcph->dest != bpf_htons(server_port))
+			return SK_PASS;
+	} else if (iph->protocol == IPPROTO_ICMP) {
+		icmph = (void *)iph + iph->ihl * 4;
+		if ((void *)(icmph + 1) > data_end ||
+		    icmph->type != unreach_type || icmph->code != unreach_code)
+			return SK_PASS;
+	} else {
+		return SK_PASS;
+	}
+
+	/*
+	 * This call will provoke a recursion: the ICMP packet generated by the
+	 * kfunc will re-trigger this program since we are in the root cgroup in
+	 * which the kernel ICMP socket belongs. However when re-entering the
+	 * kfunc, it should return EBUSY.
+	 */
+	ret = bpf_icmp_send(skb, unreach_type, unreach_code);
+	rec_kfunc_rets[rec_count & 1] = ret;
+	__sync_fetch_and_add(&rec_count, 1);
+
+	/* Let the first ICMP error message pass */
+	if (iph->protocol == IPPROTO_ICMP)
+		return SK_PASS;
+
+	return SK_DROP;
+}
+
 char LICENSE[] SEC("license") = "Dual BSD/GPL";
--
2.34.1


^ permalink raw reply related

* [PATCH bpf-next v10 3/5] selftests/bpf: add bpf_icmp_send kfunc cgroup_skb IPv6 tests
From: Mahe Tardy @ 2026-06-25 11:03 UTC (permalink / raw)
  To: bpf
  Cc: andrii, ast, daniel, john.fastabend, jordan, martin.lau,
	yonghong.song, emil, netdev, edumazet, kuba, pabeni, davem, horms,
	Mahe Tardy
In-Reply-To: <20260625110321.28236-1-mahe.tardy@gmail.com>

This test extends the existing cgroup_skb tests with IPv6 support.

Note that we need to set IPV6_RECVERR on the socket for IPv6 in
connect_to_fd_nonblock otherwise the error will be ignored even if we
are in the middle of the TCP handshake. See in
net/ipv6/datagram.c:ipv6_icmp_error for more details.

Reviewed-by: Jordan Rife <jordan@jrife.io>
Signed-off-by: Mahe Tardy <mahe.tardy@gmail.com>
---
 .../bpf/prog_tests/icmp_send_kfunc.c          | 91 +++++++++++++------
 tools/testing/selftests/bpf/progs/icmp_send.c | 48 ++++++++--
 2 files changed, 101 insertions(+), 38 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/icmp_send_kfunc.c b/tools/testing/selftests/bpf/prog_tests/icmp_send_kfunc.c
index b8a98c90053e..bbb3c3d4509c 100644
--- a/tools/testing/selftests/bpf/prog_tests/icmp_send_kfunc.c
+++ b/tools/testing/selftests/bpf/prog_tests/icmp_send_kfunc.c
@@ -8,9 +8,11 @@
 #define TIMEOUT_MS 1000

 #define ICMP_DEST_UNREACH 3
+#define ICMPV6_DEST_UNREACH 1

 #define ICMP_FRAG_NEEDED 4
 #define NR_ICMP_UNREACH 15
+#define ICMPV6_REJECT_ROUTE 6

 #define KFUNC_RET_UNSET -1

@@ -18,7 +20,7 @@ static int connect_to_fd_nonblock(int server_fd)
 {
 	struct sockaddr_storage addr;
 	socklen_t len = sizeof(addr);
-	int fd, err;
+	int fd, err, on = 1;

 	if (getsockname(server_fd, (struct sockaddr *)&addr, &len))
 		return -1;
@@ -27,6 +29,12 @@ static int connect_to_fd_nonblock(int server_fd)
 	if (fd < 0)
 		return -1;

+	if (addr.ss_family == AF_INET6 &&
+	    setsockopt(fd, IPPROTO_IPV6, IPV6_RECVERR, &on, sizeof(on)) < 0) {
+		close(fd);
+		return -1;
+	}
+
 	err = connect(fd, (struct sockaddr *)&addr, len);
 	if (err < 0 && errno != EINPROGRESS) {
 		close(fd);
@@ -36,8 +44,14 @@ static int connect_to_fd_nonblock(int server_fd)
 	return fd;
 }

-static void read_icmp_errqueue(int sockfd, int expected_code)
+static void read_icmp_errqueue(int sockfd, int expected_code, int af)
 {
+	int expected_ee_type = (af == AF_INET) ? ICMP_DEST_UNREACH :
+						 ICMPV6_DEST_UNREACH;
+	int expected_origin = (af == AF_INET) ? SO_EE_ORIGIN_ICMP :
+						SO_EE_ORIGIN_ICMP6;
+	int expected_level = (af == AF_INET) ? IPPROTO_IP : IPPROTO_IPV6;
+	int expected_type = (af == AF_INET) ? IP_RECVERR : IPV6_RECVERR;
 	struct sock_extended_err *sock_err;
 	char ctrl_buf[512];
 	struct msghdr msg = {
@@ -63,38 +77,43 @@ static void read_icmp_errqueue(int sockfd, int expected_code)
 		return;

 	for (; cm; cm = CMSG_NXTHDR(&msg, cm)) {
-		if (cm->cmsg_level != IPPROTO_IP || cm->cmsg_type != IP_RECVERR)
+		if (cm->cmsg_level != expected_level ||
+		    cm->cmsg_type != expected_type)
 			continue;

 		sock_err = (struct sock_extended_err *)CMSG_DATA(cm);

-		if (!ASSERT_EQ(sock_err->ee_origin, SO_EE_ORIGIN_ICMP,
-			       "sock_err_origin_icmp"))
+		if (!ASSERT_EQ(sock_err->ee_origin, expected_origin,
+			       "sock_err_origin"))
 			return;
-		if (!ASSERT_EQ(sock_err->ee_type, ICMP_DEST_UNREACH,
+		if (!ASSERT_EQ(sock_err->ee_type, expected_ee_type,
 			       "sock_err_type_dest_unreach"))
 			return;
 		ASSERT_EQ(sock_err->ee_code, expected_code, "sock_err_code");
 		return;
 	}

-	ASSERT_FAIL("no IP_RECVERR control message found");
+	ASSERT_FAIL("no IP_RECVERR/IPV6_RECVERR control message found");
 }

-static bool valid_unreach_code(int code)
+static bool valid_unreach_code(int code, int af)
 {
 	if (code < 0)
 		return false;

-	return code <= NR_ICMP_UNREACH && code != ICMP_FRAG_NEEDED;
+	if (af == AF_INET)
+		return code <= NR_ICMP_UNREACH && code != ICMP_FRAG_NEEDED;
+
+	return code <= ICMPV6_REJECT_ROUTE;
 }

-static void trigger_prog_read_icmp_errqueue(struct icmp_send *skel, int code)
+static void trigger_prog_read_icmp_errqueue(struct icmp_send *skel, int code,
+					    int af, const char *ip)
 {
 	int srv_fd = -1, client_fd = -1;
 	int port;

-	srv_fd = start_server(AF_INET, SOCK_STREAM, "127.0.0.1", 0, TIMEOUT_MS);
+	srv_fd = start_server(af, SOCK_STREAM, ip, 0, TIMEOUT_MS);
 	if (!ASSERT_OK_FD(srv_fd, "start_server"))
 		return;

@@ -105,6 +124,8 @@ static void trigger_prog_read_icmp_errqueue(struct icmp_send *skel, int code)
 	}

 	skel->bss->server_port = ntohs(port);
+	skel->bss->unreach_type = (af == AF_INET) ? ICMP_DEST_UNREACH :
+						    ICMPV6_DEST_UNREACH;
 	skel->bss->unreach_code = code;
 	skel->data->kfunc_ret = KFUNC_RET_UNSET;

@@ -114,13 +135,37 @@ static void trigger_prog_read_icmp_errqueue(struct icmp_send *skel, int code)
 		return;
 	}

-	if (valid_unreach_code(code))
-		read_icmp_errqueue(client_fd, code);
+	if (valid_unreach_code(code, af))
+		read_icmp_errqueue(client_fd, code, af);

 	close(client_fd);
 	close(srv_fd);
 }

+static void run_icmp_test(struct icmp_send *skel, int af, const char *ip,
+			  int max_code)
+{
+	for (int code = 0; code <= max_code; code++) {
+		if (af == AF_INET && code == ICMP_FRAG_NEEDED)
+			continue;
+
+		trigger_prog_read_icmp_errqueue(skel, code, af, ip);
+		ASSERT_EQ(skel->data->kfunc_ret, 0, "kfunc_ret");
+	}
+
+	/* Test invalid codes */
+	trigger_prog_read_icmp_errqueue(skel, -1, af, ip);
+	ASSERT_EQ(skel->data->kfunc_ret, -EINVAL, "kfunc_ret");
+
+	trigger_prog_read_icmp_errqueue(skel, max_code + 1, af, ip);
+	ASSERT_EQ(skel->data->kfunc_ret, -EINVAL, "kfunc_ret");
+
+	if (af == AF_INET) {
+		trigger_prog_read_icmp_errqueue(skel, ICMP_FRAG_NEEDED, af, ip);
+		ASSERT_EQ(skel->data->kfunc_ret, -EINVAL, "kfunc_ret");
+	}
+}
+
 void test_icmp_send_unreach_cgroup(void)
 {
 	struct icmp_send *skel;
@@ -139,23 +184,11 @@ void test_icmp_send_unreach_cgroup(void)
 	if (!ASSERT_OK_PTR(skel->links.egress, "prog_attach_cgroup"))
 		goto cleanup;

-	for (int code = 0; code <= NR_ICMP_UNREACH; code++) {
-		if (code == ICMP_FRAG_NEEDED)
-			continue;
-
-		trigger_prog_read_icmp_errqueue(skel, code);
-		ASSERT_EQ(skel->data->kfunc_ret, 0, "kfunc_ret");
-	}
-
-	/* Test invalid codes */
-	trigger_prog_read_icmp_errqueue(skel, -1);
-	ASSERT_EQ(skel->data->kfunc_ret, -EINVAL, "kfunc_ret");
+	if (test__start_subtest("ipv4"))
+		run_icmp_test(skel, AF_INET, "127.0.0.1", NR_ICMP_UNREACH);

-	trigger_prog_read_icmp_errqueue(skel, NR_ICMP_UNREACH + 1);
-	ASSERT_EQ(skel->data->kfunc_ret, -EINVAL, "kfunc_ret");
-
-	trigger_prog_read_icmp_errqueue(skel, ICMP_FRAG_NEEDED);
-	ASSERT_EQ(skel->data->kfunc_ret, -EINVAL, "kfunc_ret");
+	if (test__start_subtest("ipv6"))
+		run_icmp_test(skel, AF_INET6, "::1", ICMPV6_REJECT_ROUTE);

 cleanup:
 	icmp_send__destroy(skel);
diff --git a/tools/testing/selftests/bpf/progs/icmp_send.c b/tools/testing/selftests/bpf/progs/icmp_send.c
index 6d0be0a9afe1..6e1ba539eeb0 100644
--- a/tools/testing/selftests/bpf/progs/icmp_send.c
+++ b/tools/testing/selftests/bpf/progs/icmp_send.c
@@ -5,10 +5,11 @@

 /* 127.0.0.1 in host byte order */
 #define SERVER_IP 0x7F000001
-
-#define ICMP_DEST_UNREACH 3
+/* ::1 in host byte order (last 32-bit word) */
+#define SERVER_IP6_LO 0x00000001

 __u16 server_port = 0;
+int unreach_type = 0;
 int unreach_code = 0;
 int kfunc_ret = -1;

@@ -18,19 +19,48 @@ int egress(struct __sk_buff *skb)
 	void *data = (void *)(long)skb->data;
 	void *data_end = (void *)(long)skb->data_end;
 	struct iphdr *iph;
+	struct ipv6hdr *ip6h;
 	struct tcphdr *tcph;
+	__u8 version;

-	iph = data;
-	if ((void *)(iph + 1) > data_end || iph->version != 4 ||
-	    iph->protocol != IPPROTO_TCP || iph->daddr != bpf_htonl(SERVER_IP))
+	if (data + 1 > data_end)
 		return SK_PASS;

-	tcph = (void *)iph + iph->ihl * 4;
-	if ((void *)(tcph + 1) > data_end ||
-	    tcph->dest != bpf_htons(server_port))
+	version = (*((__u8 *)data)) >> 4;
+
+	if (version == 4) {
+		iph = data;
+		if ((void *)(iph + 1) > data_end ||
+		    iph->protocol != IPPROTO_TCP ||
+		    iph->daddr != bpf_htonl(SERVER_IP))
+			return SK_PASS;
+
+		tcph = (void *)iph + iph->ihl * 4;
+		if ((void *)(tcph + 1) > data_end ||
+		    tcph->dest != bpf_htons(server_port))
+			return SK_PASS;
+
+	} else if (version == 6) {
+		ip6h = data;
+		if ((void *)(ip6h + 1) > data_end ||
+		    ip6h->nexthdr != IPPROTO_TCP)
+			return SK_PASS;
+
+		if (ip6h->daddr.in6_u.u6_addr32[0] != 0 ||
+		    ip6h->daddr.in6_u.u6_addr32[1] != 0 ||
+		    ip6h->daddr.in6_u.u6_addr32[2] != 0 ||
+		    ip6h->daddr.in6_u.u6_addr32[3] != bpf_htonl(SERVER_IP6_LO))
+			return SK_PASS;
+
+		tcph = (void *)(ip6h + 1);
+		if ((void *)(tcph + 1) > data_end ||
+		    tcph->dest != bpf_htons(server_port))
+			return SK_PASS;
+	} else {
 		return SK_PASS;
+	}

-	kfunc_ret = bpf_icmp_send(skb, ICMP_DEST_UNREACH, unreach_code);
+	kfunc_ret = bpf_icmp_send(skb, unreach_type, unreach_code);

 	return SK_DROP;
 }
--
2.34.1


^ permalink raw reply related

* [PATCH bpf-next v10 2/5] selftests/bpf: add bpf_icmp_send kfunc cgroup_skb tests
From: Mahe Tardy @ 2026-06-25 11:03 UTC (permalink / raw)
  To: bpf
  Cc: andrii, ast, daniel, john.fastabend, jordan, martin.lau,
	yonghong.song, emil, netdev, edumazet, kuba, pabeni, davem, horms,
	Mahe Tardy
In-Reply-To: <20260625110321.28236-1-mahe.tardy@gmail.com>

This test opens a server and client, enters a new cgroup, attach a
cgroup_skb program on egress and calls the bpf_icmp_send function from
the client egress so that an ICMP unreach control message is sent back
to the client. It then fetches the message from the error queue to
confirm the correct ICMP unreach code has been sent.

Note that, for the client, we have to connect in non-blocking mode to
let the test execute faster. Otherwise, we need to wait for the TCP
three-way handshake to timeout in the kernel before reading the errno.

Also note that we don't set IP_RECVERR on the socket in
connect_to_fd_nonblock since the error will be transferred anyway in our
test because the connection is rejected at the beginning of the TCP
handshake. See in net/ipv4/tcp_ipv4.c:tcp_v4_err for more details.

Reviewed-by: Jordan Rife <jordan@jrife.io>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Signed-off-by: Mahe Tardy <mahe.tardy@gmail.com>
---
 .../bpf/prog_tests/icmp_send_kfunc.c          | 164 ++++++++++++++++++
 tools/testing/selftests/bpf/progs/icmp_send.c |  38 ++++
 2 files changed, 202 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/icmp_send_kfunc.c
 create mode 100644 tools/testing/selftests/bpf/progs/icmp_send.c

diff --git a/tools/testing/selftests/bpf/prog_tests/icmp_send_kfunc.c b/tools/testing/selftests/bpf/prog_tests/icmp_send_kfunc.c
new file mode 100644
index 000000000000..b8a98c90053e
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/icmp_send_kfunc.c
@@ -0,0 +1,164 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <test_progs.h>
+#include <network_helpers.h>
+#include <linux/errqueue.h>
+#include <poll.h>
+#include "icmp_send.skel.h"
+
+#define TIMEOUT_MS 1000
+
+#define ICMP_DEST_UNREACH 3
+
+#define ICMP_FRAG_NEEDED 4
+#define NR_ICMP_UNREACH 15
+
+#define KFUNC_RET_UNSET -1
+
+static int connect_to_fd_nonblock(int server_fd)
+{
+	struct sockaddr_storage addr;
+	socklen_t len = sizeof(addr);
+	int fd, err;
+
+	if (getsockname(server_fd, (struct sockaddr *)&addr, &len))
+		return -1;
+
+	fd = socket(addr.ss_family, SOCK_STREAM | SOCK_NONBLOCK, 0);
+	if (fd < 0)
+		return -1;
+
+	err = connect(fd, (struct sockaddr *)&addr, len);
+	if (err < 0 && errno != EINPROGRESS) {
+		close(fd);
+		return -1;
+	}
+
+	return fd;
+}
+
+static void read_icmp_errqueue(int sockfd, int expected_code)
+{
+	struct sock_extended_err *sock_err;
+	char ctrl_buf[512];
+	struct msghdr msg = {
+		.msg_control = ctrl_buf,
+		.msg_controllen = sizeof(ctrl_buf),
+	};
+	struct pollfd pfd = {
+		.fd = sockfd,
+		.events = POLLERR,
+	};
+	struct cmsghdr *cm;
+	ssize_t n;
+
+	if (!ASSERT_GE(poll(&pfd, 1, TIMEOUT_MS), 1, "poll_errqueue"))
+		return;
+
+	n = recvmsg(sockfd, &msg, MSG_ERRQUEUE);
+	if (!ASSERT_GE(n, 0, "recvmsg_errqueue"))
+		return;
+
+	cm = CMSG_FIRSTHDR(&msg);
+	if (!ASSERT_NEQ(cm, NULL, "cm_firsthdr_null"))
+		return;
+
+	for (; cm; cm = CMSG_NXTHDR(&msg, cm)) {
+		if (cm->cmsg_level != IPPROTO_IP || cm->cmsg_type != IP_RECVERR)
+			continue;
+
+		sock_err = (struct sock_extended_err *)CMSG_DATA(cm);
+
+		if (!ASSERT_EQ(sock_err->ee_origin, SO_EE_ORIGIN_ICMP,
+			       "sock_err_origin_icmp"))
+			return;
+		if (!ASSERT_EQ(sock_err->ee_type, ICMP_DEST_UNREACH,
+			       "sock_err_type_dest_unreach"))
+			return;
+		ASSERT_EQ(sock_err->ee_code, expected_code, "sock_err_code");
+		return;
+	}
+
+	ASSERT_FAIL("no IP_RECVERR control message found");
+}
+
+static bool valid_unreach_code(int code)
+{
+	if (code < 0)
+		return false;
+
+	return code <= NR_ICMP_UNREACH && code != ICMP_FRAG_NEEDED;
+}
+
+static void trigger_prog_read_icmp_errqueue(struct icmp_send *skel, int code)
+{
+	int srv_fd = -1, client_fd = -1;
+	int port;
+
+	srv_fd = start_server(AF_INET, SOCK_STREAM, "127.0.0.1", 0, TIMEOUT_MS);
+	if (!ASSERT_OK_FD(srv_fd, "start_server"))
+		return;
+
+	port = get_socket_local_port(srv_fd);
+	if (!ASSERT_GE(port, 0, "get_socket_local_port")) {
+		close(srv_fd);
+		return;
+	}
+
+	skel->bss->server_port = ntohs(port);
+	skel->bss->unreach_code = code;
+	skel->data->kfunc_ret = KFUNC_RET_UNSET;
+
+	client_fd = connect_to_fd_nonblock(srv_fd);
+	if (!ASSERT_OK_FD(client_fd, "client_connect_nonblock")) {
+		close(srv_fd);
+		return;
+	}
+
+	if (valid_unreach_code(code))
+		read_icmp_errqueue(client_fd, code);
+
+	close(client_fd);
+	close(srv_fd);
+}
+
+void test_icmp_send_unreach_cgroup(void)
+{
+	struct icmp_send *skel;
+	int cgroup_fd = -1;
+
+	skel = icmp_send__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "skel_open"))
+		goto cleanup;
+
+	cgroup_fd = test__join_cgroup("/icmp_send_unreach_cgroup");
+	if (!ASSERT_OK_FD(cgroup_fd, "join_cgroup"))
+		goto cleanup;
+
+	skel->links.egress =
+		bpf_program__attach_cgroup(skel->progs.egress, cgroup_fd);
+	if (!ASSERT_OK_PTR(skel->links.egress, "prog_attach_cgroup"))
+		goto cleanup;
+
+	for (int code = 0; code <= NR_ICMP_UNREACH; code++) {
+		if (code == ICMP_FRAG_NEEDED)
+			continue;
+
+		trigger_prog_read_icmp_errqueue(skel, code);
+		ASSERT_EQ(skel->data->kfunc_ret, 0, "kfunc_ret");
+	}
+
+	/* Test invalid codes */
+	trigger_prog_read_icmp_errqueue(skel, -1);
+	ASSERT_EQ(skel->data->kfunc_ret, -EINVAL, "kfunc_ret");
+
+	trigger_prog_read_icmp_errqueue(skel, NR_ICMP_UNREACH + 1);
+	ASSERT_EQ(skel->data->kfunc_ret, -EINVAL, "kfunc_ret");
+
+	trigger_prog_read_icmp_errqueue(skel, ICMP_FRAG_NEEDED);
+	ASSERT_EQ(skel->data->kfunc_ret, -EINVAL, "kfunc_ret");
+
+cleanup:
+	icmp_send__destroy(skel);
+	if (cgroup_fd >= 0)
+		close(cgroup_fd);
+}
diff --git a/tools/testing/selftests/bpf/progs/icmp_send.c b/tools/testing/selftests/bpf/progs/icmp_send.c
new file mode 100644
index 000000000000..6d0be0a9afe1
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/icmp_send.c
@@ -0,0 +1,38 @@
+// SPDX-License-Identifier: GPL-2.0
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_endian.h>
+
+/* 127.0.0.1 in host byte order */
+#define SERVER_IP 0x7F000001
+
+#define ICMP_DEST_UNREACH 3
+
+__u16 server_port = 0;
+int unreach_code = 0;
+int kfunc_ret = -1;
+
+SEC("cgroup_skb/egress")
+int egress(struct __sk_buff *skb)
+{
+	void *data = (void *)(long)skb->data;
+	void *data_end = (void *)(long)skb->data_end;
+	struct iphdr *iph;
+	struct tcphdr *tcph;
+
+	iph = data;
+	if ((void *)(iph + 1) > data_end || iph->version != 4 ||
+	    iph->protocol != IPPROTO_TCP || iph->daddr != bpf_htonl(SERVER_IP))
+		return SK_PASS;
+
+	tcph = (void *)iph + iph->ihl * 4;
+	if ((void *)(tcph + 1) > data_end ||
+	    tcph->dest != bpf_htons(server_port))
+		return SK_PASS;
+
+	kfunc_ret = bpf_icmp_send(skb, ICMP_DEST_UNREACH, unreach_code);
+
+	return SK_DROP;
+}
+
+char LICENSE[] SEC("license") = "Dual BSD/GPL";
--
2.34.1


^ permalink raw reply related

* [PATCH bpf-next v10 1/5] bpf: add bpf_icmp_send kfunc
From: Mahe Tardy @ 2026-06-25 11:03 UTC (permalink / raw)
  To: bpf
  Cc: andrii, ast, daniel, john.fastabend, jordan, martin.lau,
	yonghong.song, emil, netdev, edumazet, kuba, pabeni, davem, horms,
	Mahe Tardy
In-Reply-To: <20260625110321.28236-1-mahe.tardy@gmail.com>

This is needed in the context of Tetragon to provide improved feedback
(in contrast to just dropping packets) to east-west traffic when blocked
by policies using cgroup_skb programs.

This reuses concepts from netfilter reject target codepath with the
differences that:
* Packets are cloned since the BPF user can still let the packet pass
  (SK_PASS from the cgroup_skb progs for example) and the current skb
  need to stay untouched (cgroup_skb hooks only allow read-only skb
  payload).
* We protect against recursion since the kfunc, by generating an ICMP
  error message, could retrigger the BPF prog that invoked it.

Only ICMP_DEST_UNREACH and ICMPV6_DEST_UNREACH are currently supported.
The interface accepts a type parameter to facilitate future extension to
other ICMP control message types.

For normal cgroup_skb paths, the skb dst route should already be set.
However, bpf_prog_test_run_skb can create synthetic IPv4 skbs without an
attached route. In that case, icmp_send returns early, and the kfunc
would otherwise report success despite no ICMP reply being sent. The
check also rejects metadata dsts, which are not valid struct rtable
instances. For IPv6, reject metadata dsts only: icmpv6_send can reach
icmp6_dev, where skb_rt6_info treats any non-NULL skb dst as a struct
rt6_info, which is not valid for metadata_dst.

Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Reviewed-by: Jordan Rife <jordan@jrife.io>
Signed-off-by: Mahe Tardy <mahe.tardy@gmail.com>
---
 net/core/filter.c | 95 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 95 insertions(+)

diff --git a/net/core/filter.c b/net/core/filter.c
index 2e96b4b847ce..0a0191586b44 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -84,6 +84,9 @@
 #include <linux/un.h>
 #include <net/xdp_sock_drv.h>
 #include <net/inet_dscp.h>
+#include <linux/icmpv6.h>
+#include <net/icmp.h>
+#include <net/ip6_route.h>

 #include "dev.h"

@@ -12546,6 +12549,88 @@ __bpf_kfunc int bpf_xdp_pull_data(struct xdp_md *x, u32 len)
 	return 0;
 }

+/**
+ * bpf_icmp_send - Send an ICMP control message
+ * @skb_ctx: Packet that triggered the control message
+ * @type: ICMP type (only ICMP_DEST_UNREACH/ICMPV6_DEST_UNREACH supported)
+ * @code: ICMP code (0-15 except ICMP_FRAG_NEEDED for IPv4, 0-6 for IPv6)
+ *
+ * Sends an ICMP control message in response to the packet. The original packet
+ * is cloned before sending the ICMP message, so the BPF program can still let
+ * the packet pass if desired.
+ *
+ * Currently only ICMP_DEST_UNREACH (IPv4) and ICMPV6_DEST_UNREACH (IPv6) are
+ * supported.
+ *
+ * Return: 0 on success (send attempt), negative error code on failure:
+ *         -EBUSY: Recursion detected
+ *         -EPROTONOSUPPORT: Non-IP protocol
+ *         -EOPNOTSUPP: Unsupported ICMP type
+ *         -EINVAL: Invalid code parameter
+ *         -ENETUNREACH: No usable route/dst for the ICMP reply
+ *         -ENOMEM: Memory allocation failed
+ */
+__bpf_kfunc int bpf_icmp_send(struct __sk_buff *skb_ctx, int type, int code)
+{
+	struct sk_buff *skb = (struct sk_buff *)skb_ctx;
+	struct sk_buff *nskb;
+	struct sock *sk;
+
+	sk = skb_to_full_sk(skb);
+	if (sk && sk->sk_kern_sock &&
+	    (sk->sk_protocol == IPPROTO_ICMP || sk->sk_protocol == IPPROTO_ICMPV6))
+		return -EBUSY;
+
+	switch (skb->protocol) {
+#if IS_ENABLED(CONFIG_INET)
+	case htons(ETH_P_IP): {
+		if (type != ICMP_DEST_UNREACH)
+			return -EOPNOTSUPP;
+		if (code < 0 || code > NR_ICMP_UNREACH ||
+		    code == ICMP_FRAG_NEEDED) /* needs a valid next-hop MTU */
+			return -EINVAL;
+
+		/* icmp_send expects skb_dst to be a real rtable. */
+		if (!skb_valid_dst(skb))
+			return -ENETUNREACH;
+
+		nskb = skb_clone(skb, GFP_ATOMIC);
+		if (!nskb)
+			return -ENOMEM;
+
+		memset(IPCB(nskb), 0, sizeof(*IPCB(nskb)));
+		icmp_send(nskb, type, code, 0);
+		consume_skb(nskb);
+		break;
+	}
+#endif
+#if IS_ENABLED(CONFIG_IPV6)
+	case htons(ETH_P_IPV6):
+		if (type != ICMPV6_DEST_UNREACH)
+			return -EOPNOTSUPP;
+		if (code < 0 || code > ICMPV6_REJECT_ROUTE)
+			return -EINVAL;
+
+		/* icmpv6_send may treat skb_dst as rt6_info. */
+		if (skb_metadata_dst(skb))
+			return -ENETUNREACH;
+
+		nskb = skb_clone(skb, GFP_ATOMIC);
+		if (!nskb)
+			return -ENOMEM;
+
+		memset(IP6CB(nskb), 0, sizeof(*IP6CB(nskb)));
+		icmpv6_send(nskb, type, code, 0);
+		consume_skb(nskb);
+		break;
+#endif
+	default:
+		return -EPROTONOSUPPORT;
+	}
+
+	return 0;
+}
+
 __bpf_kfunc_end_defs();

 int bpf_dynptr_from_skb_rdonly(struct __sk_buff *skb, u64 flags,
@@ -12588,6 +12673,10 @@ BTF_KFUNCS_START(bpf_kfunc_check_set_sock_ops)
 BTF_ID_FLAGS(func, bpf_sock_ops_enable_tx_tstamp)
 BTF_KFUNCS_END(bpf_kfunc_check_set_sock_ops)

+BTF_KFUNCS_START(bpf_kfunc_check_set_icmp_send)
+BTF_ID_FLAGS(func, bpf_icmp_send)
+BTF_KFUNCS_END(bpf_kfunc_check_set_icmp_send)
+
 static const struct btf_kfunc_id_set bpf_kfunc_set_skb = {
 	.owner = THIS_MODULE,
 	.set = &bpf_kfunc_check_set_skb,
@@ -12618,6 +12707,11 @@ static const struct btf_kfunc_id_set bpf_kfunc_set_sock_ops = {
 	.set = &bpf_kfunc_check_set_sock_ops,
 };

+static const struct btf_kfunc_id_set bpf_kfunc_set_icmp_send = {
+	.owner = THIS_MODULE,
+	.set = &bpf_kfunc_check_set_icmp_send,
+};
+
 static int __init bpf_kfunc_init(void)
 {
 	int ret;
@@ -12639,6 +12733,7 @@ static int __init bpf_kfunc_init(void)
 	ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_CGROUP_SOCK_ADDR,
 					       &bpf_kfunc_set_sock_addr);
 	ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS, &bpf_kfunc_set_tcp_reqsk);
+	ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_CGROUP_SKB, &bpf_kfunc_set_icmp_send);
 	return ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SOCK_OPS, &bpf_kfunc_set_sock_ops);
 }
 late_initcall(bpf_kfunc_init);
--
2.34.1


^ permalink raw reply related

* [PATCH bpf-next v10 0/5] bpf: add icmp_send kfunc
From: Mahe Tardy @ 2026-06-25 11:03 UTC (permalink / raw)
  To: bpf
  Cc: andrii, ast, daniel, john.fastabend, jordan, martin.lau,
	yonghong.song, emil, netdev, edumazet, kuba, pabeni, davem, horms,
	Mahe Tardy

Hello,

This is v10 of adding the icmp_send kfunc, as suggested during
LSF/MM/BPF 2025[^1]. The goal is to allow cgroup_skb programs to
actively reject east-west traffic, similarly to what is possible to do
with netfilter reject target. Applications can receive early feedback
that something went wrong during the TCP handshake.

The first step to implement this is using ICMP control messages, with
the ICMP_DEST_UNREACH type with various code ICMP_NET_UNREACH,
ICMP_HOST_UNREACH, ICMP_PROT_UNREACH, etc. This is easier to implement
than a TCP RST reply and will already hint the client TCP stack to abort
the connection and not retry extensively.

Note that this is different than the sock_destroy kfunc, that along
calls tcp_abort and thus sends a reset, destroying the underlying
socket.

Caveats of this kfunc design are that a program can call this function N
times, thus send N ICMP unreach control messages and that the program
can return from the BPF filter with pass leading to a potential
confusing situation where the TCP connection was established while the
client received ICMP_DEST_UNREACH messages.

v2 updates:
- fix a build error from a missing function call rename;
- avoid changing return line in bpf_kfunc_init;
- return SK_DROP from the kfunc (similarly to bpf_redirect);
- check the return value in the selftest.

v3 update:
- fix an undefined reference build error.

v4 updates:
- prevent the kfunc to be called recursively and add a test (thanks to
  Martin).
- do not fetch dst route when unnecessary (thanks to Martin).
- extend the test for IPv6 (thanks to Martin).
- use SK_DROP in examples and use non blocking sockets for testing
  (thanks to Martin).
- test when the kfunc returns -EINVAL (thanks to Jordan).
- add the kfunc to bpf_kfunc_set_skb as suggested by Alexei.
- guard the IPv4 parts with IS_ENABLED(CONFIG_INET).
- fix a wrong initial value for client_fd (thanks to Yonghong).
- add documentation to the kfunc.
- to Jordan: I couldn't include <linux/icmp.h> because of redefines from
  <network_helpers.h>.

v5 updates:
- kfunc name is now icmp_send and takes the control message type as
  parameter for future potential extension (daniel)
- drop the net patches to route packet since now the kfunc is limited to
  cgroup_skb and tc progs (daniel & martin)
- linearize skb headers (sashiko)
- zero SKB control block (sashiko)
- bind to port 0 instead of fixed port (sashiko)
- poll to wait for POLLERR event (sashiko)
- do not use ASSERT_EQ in CMSG_NXTHDR loop (sashiko)
- fix comment about byte order (sashiko)
- fix endianness IP address issue (sashiko)
- add forgotten cleanup_cgroup_environment (sashiko)
- let packets pass in recursion test (sashiko)
- clarify evaluation order for recursion test (sashiko)

v6 updates (all from sashiko):
- bring back the net patches to route packet since tc ingress needs it.
- rename the ip_route_reply helpers from fetch to fill.
- call pskb_network_may_pull on the cloned pkt.
- check explicitly that we received one and only one ICMP err ctrl msg.

v7 updates:
- use consume_skb on success path (stanislav)
- replace recursion protection with CPU_ARRAY by checking the nature of
  the sk (daniel, offline)
- use reverse xmas tree in read_icmp_errqueue (jordan)
- use ASSERT_OK_FD instead of ASSERT_GE whenever possible (jordan)
- add a test for tc (jordan)
- better filtering from host cgroup test progs (sashiko)

v8 updates:
- mostly a resend as it's been sitting as "New" in the queue for almost
  one month, fixed a few nits.
- on new bpf_icmp_send kfunc cgroup_skb test (patch 4/7):
  - guard a close fd with fd >= 0 (jordan)
  - use ASSERT_OK_FD instead of ASSERT_GE (jordan)
  - fixed comment style (sashiko)
- on recursion test (patch 7/7):
  - guard a close fd with fd >= 0 (jordan)
  - fixed comments style (sashiko)
  - filter bpf prog on pid and ICMP message types (sashiko)

v9 updates:
- first, there was a v8.5 that I discussed here[^2] with Emil
  Tsalapatis. I tried once again to make tc work but the ai review found
  something fundamentally wrong. This version removes the tc support for
  now and focuses on cgroup_skb.
- use helper get_socket_local_port instead of getsockname (sashiko)
- use if_nametoindex("lo") instead of value 1 (bpf-ci)
- fix IPV6_RECVERR appearance before IPv6 patch (bpf-ci)
- precise that 0 on success mean icmp_send was called but it was just an
  attempt since this function does not return anything (sashiko)
- explicitly consider ICMP_FRAG_NEEDED as invalid in bpf_icmp_send as
  it would miss the next-hop MTU info. Also test it. (sashiko)
- test for max_code + 1 for invalid (sashiko)
- add review-by tags from Jordan and Emil but remove it on the main
  patch as I have significantly changed it.
- check for rec_count in recursion test (sashiko)
- re-order setup_cgroup_environmment in test (sashiko)
- reset kfunc_ret on every test run (sashiko)
- check for skb route for icmp_send as the function would quietly fail
  and add a test (sashiko)

v10 updates:
- guard against skbs with metadata_dst before calling icmpv6_send
  (sashiko)
- add more review-by tags from Emil and Jordan.

[^1]: https://lwn.net/Articles/1022034/
[^2]: https://lore.kernel.org/bpf/ajvDRCw8cPqXAqQq@gmail.com/

Link to v9: https://lore.kernel.org/bpf/20260624185554.362555-1-mahe.tardy@gmail.com/

Mahe Tardy (5):
  bpf: add bpf_icmp_send kfunc
  selftests/bpf: add bpf_icmp_send kfunc cgroup_skb tests
  selftests/bpf: add bpf_icmp_send kfunc cgroup_skb IPv6 tests
  selftests/bpf: add bpf_icmp_send recursion test
  selftests/bpf: add bpf_icmp_send no route test

 net/core/filter.c                             |  95 +++++++
 .../bpf/prog_tests/icmp_send_kfunc.c          | 269 ++++++++++++++++++
 tools/testing/selftests/bpf/progs/icmp_send.c | 123 ++++++++
 3 files changed, 487 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/icmp_send_kfunc.c
 create mode 100644 tools/testing/selftests/bpf/progs/icmp_send.c

--
2.34.1


^ permalink raw reply

* [PATCH] staging: fbtft: fix unaligned vmem writes when txbuf is byte-offset
From: suryasaimadhu @ 2026-06-25 11:02 UTC (permalink / raw)
  To: andy
  Cc: gregkh, dri-devel, linux-fbdev, linux-staging, linux-kernel,
	suryasaimadhu
In-Reply-To: <20260625104220.21E5A1F00A3D@smtp.kernel.org>

fbtft_write_vmem16_bus8() and fb_ra8875's write_vmem16_bus8() offset
txbuf16 by one byte for a command/start prefix, then store 16-bit pixel
data via txbuf16[i]. On strict-alignment architectures this can fault
the same way as the write_reg path fixed in the previous patch.

Use put_unaligned() for these stores.

Signed-off-by: suryasaimadhu <suryasaimadhu369@gmail.com>
---
 drivers/staging/fbtft/fb_ra8875.c | 3 ++-
 drivers/staging/fbtft/fbtft-bus.c | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/fbtft/fb_ra8875.c b/drivers/staging/fbtft/fb_ra8875.c
index 0ab1de664..5b95b0095 100644
--- a/drivers/staging/fbtft/fb_ra8875.c
+++ b/drivers/staging/fbtft/fb_ra8875.c
@@ -10,6 +10,7 @@
 #include <linux/delay.h>
 
 #include <linux/gpio/consumer.h>
+#include <linux/unaligned.h>
 #include "fbtft.h"
 
 #define DRVNAME "fb_ra8875"
@@ -262,7 +263,7 @@ static int write_vmem16_bus8(struct fbtft_par *par, size_t offset, size_t len)
 			to_copy, remain - to_copy);
 
 		for (i = 0; i < to_copy; i++)
-			txbuf16[i] = cpu_to_be16(vmem16[i]);
+			put_unaligned(cpu_to_be16(vmem16[i]), &txbuf16[i]);
 
 		vmem16 = vmem16 + to_copy;
 		ret = par->fbtftops.write(par, par->txbuf.buf,
diff --git a/drivers/staging/fbtft/fbtft-bus.c b/drivers/staging/fbtft/fbtft-bus.c
index cfcf4d7e7..fc3faab7d 100644
--- a/drivers/staging/fbtft/fbtft-bus.c
+++ b/drivers/staging/fbtft/fbtft-bus.c
@@ -158,7 +158,7 @@ int fbtft_write_vmem16_bus8(struct fbtft_par *par, size_t offset, size_t len)
 			to_copy, remain - to_copy);
 
 		for (i = 0; i < to_copy; i++)
-			txbuf16[i] = cpu_to_be16(vmem16[i]);
+			put_unaligned(cpu_to_be16(vmem16[i]), &txbuf16[i]);
 
 		vmem16 = vmem16 + to_copy;
 		ret = par->fbtftops.write(par, par->txbuf.buf,
-- 
2.47.3


^ permalink raw reply related

* RE: [PULL 28/41] vfio/listener: Add missing dirty tracking in region_del
From: Duan, Zhenzhong @ 2026-06-25 11:00 UTC (permalink / raw)
  To: Cédric Le Goater, qemu-devel@nongnu.org
  Cc: Alex Williamson, Cabiddu, Giovanni, Liu, Yi L
In-Reply-To: <10936003-6199-4cf9-92ba-f648f070d841@redhat.com>



>-----Original Message-----
>From: Cédric Le Goater <clg@redhat.com>
>Subject: Re: [PULL 28/41] vfio/listener: Add missing dirty tracking in region_del
>
>On 6/24/26 10:02, Duan, Zhenzhong wrote:
>> Hi Cédric,
>>
>>> -----Original Message-----
>>> From: Cédric Le Goater <clg@redhat.com>
>>> Subject: Re: [PULL 28/41] vfio/listener: Add missing dirty tracking in region_del
>>>
>>> Hello Zhenzhong,
>>>
>>> On 1/13/26 10:36, Cédric Le Goater wrote:
>>>> From: Zhenzhong Duan <zhenzhong.duan@intel.com>
>>>>
>>>> If a VFIO device in guest switches from passthrough(PT) domain to block
>>>> domain, the whole memory address space is unmapped, but we passed a
>NULL
>>>> iotlb entry to unmap_bitmap, then bitmap query didn't happen and we lost
>>>> dirty pages.
>>>>
>>>> By constructing an iotlb entry with iova = gpa for unmap_bitmap, it can
>>>> set dirty bits correctly.
>>>>
>>>> For IOMMU address space, we still send NULL iotlb because VFIO don't know
>>>> the actual mappings in guest. It's vIOMMU's responsibility to send actual
>>>> unmapping notifications, e.g., vtd_address_space_unmap_in_dirty_tracking().
>>>>
>>>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>>>> Tested-by: Giovannio Cabiddu <giovanni.cabiddu@intel.com>
>>>> Reviewed-by: Yi Liu <yi.l.liu@intel.com>
>>>> Link: https://lore.kernel.org/qemu-devel/20251218062643.624796-8-
>>> zhenzhong.duan@intel.com
>>>> Signed-off-by: Cédric Le Goater <clg@redhat.com>
>>>> ---
>>>>    hw/vfio/listener.c | 22 +++++++++++++++++++++-
>>>>    1 file changed, 21 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/hw/vfio/listener.c b/hw/vfio/listener.c
>>>> index
>>>
>62699cb772d786c1510318dff73973ef4d297177..813621f22f8b5ec284388f9c5f71
>>> 9525ec5f282c 100644
>>>> --- a/hw/vfio/listener.c
>>>> +++ b/hw/vfio/listener.c
>>>> @@ -713,14 +713,34 @@ static void
>vfio_listener_region_del(MemoryListener
>>> *listener,
>>>>
>>>>        if (try_unmap) {
>>>>            bool unmap_all = false;
>>>> +        IOMMUTLBEntry entry = {}, *iotlb = NULL;
>>>>
>>>>            if (int128_eq(llsize, int128_2_64())) {
>>>>                assert(!iova);
>>>>                unmap_all = true;
>>>>                llsize = int128_zero();
>>>>            }
>>>> +
>>>> +        /*
>>>> +         * Fake an IOTLB entry for identity mapping which is needed by dirty
>>>> +         * tracking when switch out of PT domain. In fact, in unmap_bitmap,
>>>> +         * only translated_addr field is used to set dirty bitmap.
>>>> +         *
>>>> +         * Note: When switch into PT domain from DMA domain, the whole
>>> IOMMU
>>>> +         * MR is deleted without iotlb, before that happen, we depend on
>>>> +         * vIOMMU to send unmap notification with accurate iotlb entry to
>>>> +         * VFIO. See vtd_address_space_unmap_in_dirty_tracking() for example,
>>>> +         * it is triggered during switching to block domain because vtd does
>>>> +         * not support direct switching from DMA to PT domain.
>>>> +         */
>>>> +        if (global_dirty_tracking && memory_region_is_ram(section->mr)) {
>>>> +            entry.iova = iova;
>>>> +            entry.translated_addr = iova;
>>>
>>> In an experiment involving a nested setup with L1 + intel_iommu,
>>> an L2 VM with an assigned igb VF can trigger the following QEMU
>>> segv :
>>>
>>>    bitmap_set_atomic(map=NULL, start=1, nr=1)
>>>    physical_memory_set_dirty_range(start=0x380004040000, length=4096)
>>>    physical_memory_set_dirty_lebitmap(start=0x380004040000, pages=3)
>>>    vfio_container_query_dirty_bitmap(translated_addr=0x380004040000)
>>>    vfio_legacy_dma_unmap_one(iova=0x380004040000, size=12288)
>>>    vfio_listener_region_del()
>>
>> Hmm, looks a range in PCI space, I suspect it's a PCI bar but it has 3 pages.
>
>It's the nested VT-d interrupt remapping table.
>
>> Is igb VF a real hw passthrough to L1, then L1 pass it to L2?
>
>Iit's an L1 emulated igb VF assigned to an L2.
>
>> Seems L2 crash, not L1?
>yes. It occured a few times and  I can't reproduce anymore. hmm.

It can be reproduced with below steps:

1. Start guest with VFIO device in legacy iommu mode, then start dirty rate tracking.

(qemu) calc_dirty_rate -b 60

2. disable VFIO device bar region by clear bit1 in COMMAND register in guest.
# setpci -s 01:00.0 COMMAND=0005

3. below trace triggers:

vfio_listener_region_del region_del 0xe0400020000 - 0xe0400021fff

Thread 10 "CPU 6/KVM" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffde5dfb640 (LWP 217374)]
0x000055555613c8a0 in bitmap_set_atomic (map=0x0, start=32, nr=1) at ../util/bitmap.c:213
213             qatomic_or(p, mask_to_set);
(gdb) bt
#0  0x000055555613c8a0 in bitmap_set_atomic (map=0x0, start=32, nr=1) at ../util/bitmap.c:213
#1  0x0000555555d6a0e2 in physical_memory_set_dirty_range (start=15410342789120, length=4096, mask=5 '\005') at ../system/physmem.c:1042
#2  0x0000555555d6b2c9 in physical_memory_set_dirty_lebitmap (bitmap=0x7ffdc80ac690, start=15410342789120, pages=2) at ../system/physmem.c:1305
#3  0x0000555555ca97ee in vfio_legacy_dma_unmap_get_dirty_bitmap (container=0x5555576fc980, iova=15410342789120, size=8192, iotlb=0x7ffde5df9e20) at ../hw/vfio/container-legacy.c:112
#4  0x0000555555ca9927 in vfio_legacy_dma_unmap_one (container=0x5555576fc980, iova=15410342789120, size=8192, flags=0, iotlb=0x7ffde5df9e20) at ../hw/vfio/container-legacy.c:145
#5  0x0000555555ca9aba in vfio_legacy_dma_unmap (bcontainer=0x5555576fc980, iova=15410342789120, size=8192, iotlb=0x7ffde5df9e20, unmap_all=false) at ../hw/vfio/container-legacy.c:196
#6  0x0000555555ca84b5 in vfio_container_dma_unmap (bcontainer=0x5555576fc980, iova=15410342789120, size=8192, iotlb=0x7ffde5df9e20, unmap_all=false) at ../hw/vfio/container.c:131
#7  0x0000555555ca68cd in vfio_listener_region_del (listener=0x5555576fc9b0, section=0x7ffde5df9ed0) at ../hw/vfio/listener.c:753
#8  0x0000555555d5d9a9 in address_space_update_topology_pass (as=0x5555574c1880 <address_space_memory>, old_view=0x7ffd7c001580, new_view=0x7ffdc8000c00, adding=false)


>
>That seemed to fix it
>
>-            entry.translated_addr = iova;
>+            entry.translated_addr = memory_region_get_ram_addr(section->mr) +
>+                                    section->offset_within_region;

I verified this works.

>
>
>Thanks,
>
>C.
>
>
>>>
>>> The VT-d interrupt-remapping table is mapped pretty high and
>>> translated_addr ends up as a very large value. It seems that
>>> the correct value should be :
>>>
>>>    translated_addr = memory_region_get_ram_addr(section->mr) +
>>>                      section->offset_within_region;
>>>
>>> as found in vfio_iommu_map_dirty_notify() and
>>> and vfio_ram_discard_query_dirty_bitmap().
>>>
>>> Is that correct ?
>>
>> Good catch, it looks we lack a conversion from iotlb->translated_addr to
>dirty_block offset just like in vfio_iommu_map_dirty_notify().
>> iotlb->translated_addr is designed to hold translation result from vIOMMU. How
>about below(untested)?
>>
>> --- a/hw/vfio/container-legacy.c
>> +++ b/hw/vfio/container-legacy.c
>> @@ -71,7 +71,7 @@ static int
>vfio_ram_block_discard_disable(VFIOLegacyContainer *container,
>>   static int
>>   vfio_legacy_dma_unmap_get_dirty_bitmap(const VFIOLegacyContainer
>*container,
>>                                          hwaddr iova, uint64_t size,
>> -                                       IOMMUTLBEntry *iotlb)
>> +                                       hwaddr translated_addr)
>>   {
>>       const VFIOContainer *bcontainer = VFIO_IOMMU(container);
>>       struct vfio_iommu_type1_dma_unmap *unmap;
>> @@ -109,8 +109,8 @@ vfio_legacy_dma_unmap_get_dirty_bitmap(const
>VFIOLegacyContainer *container,
>>
>>       ret = ioctl(container->fd, VFIO_IOMMU_UNMAP_DMA, unmap);
>>       if (!ret) {
>> -        physical_memory_set_dirty_lebitmap(vbmap.bitmap,
>> -                iotlb->translated_addr, vbmap.pages);
>> +        physical_memory_set_dirty_lebitmap(vbmap.bitmap, translated_addr,
>> +                                           vbmap.pages);
>>       } else {
>>           error_report("VFIO_UNMAP_DMA with DIRTY_BITMAP : %m");
>>       }
>> @@ -134,16 +134,27 @@ static int vfio_legacy_dma_unmap_one(const
>VFIOLegacyContainer *container,
>>           .size = size,
>>       };
>>       bool need_dirty_sync = false;
>> +    MemoryRegion *mr;
>> +    hwaddr translated_addr, xlat;
>>       int ret;
>>       Error *local_err = NULL;
>>
>>       g_assert(!cpr_is_incoming());
>>
>> +    RCU_READ_LOCK_GUARD();
>> +
>> +    mr = vfio_translate_iotlb(iotlb, &xlat, &local_err);
>> +    if (!mr) {
>> +        return -EINVAL;
>> +    }
>> +
>> +    translated_addr = memory_region_get_ram_addr(mr) + xlat;
>> +
>>       if (iotlb && vfio_container_dirty_tracking_is_started(bcontainer)) {
>>           if (!vfio_container_devices_dirty_tracking_is_supported(bcontainer) &&
>>               bcontainer->dirty_pages_supported) {
>>               return vfio_legacy_dma_unmap_get_dirty_bitmap(container, iova, size,
>> -                                                          iotlb);
>> +                                                          translated_addr);
>>           }
>>
>>           need_dirty_sync = true;
>> @@ -155,7 +166,7 @@ static int vfio_legacy_dma_unmap_one(const
>VFIOLegacyContainer *container,
>>
>>       if (need_dirty_sync) {
>>           ret = vfio_container_query_dirty_bitmap(bcontainer, iova, size, 0,
>> -                                    iotlb->translated_addr, &local_err);
>> +                                                translated_addr, &local_err);
>>           if (ret) {
>>               error_report_err(local_err);
>>               return ret;
>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>> index 68f2ae6f9f..969629b0f5 100644
>> --- a/hw/vfio/iommufd.c
>> +++ b/hw/vfio/iommufd.c
>> @@ -65,6 +65,8 @@ static int iommufd_cdev_unmap(const VFIOContainer
>*bcontainer,
>>       IOMMUFDBackend *be = container->be;
>>       uint32_t ioas_id = container->ioas_id;
>>       bool need_dirty_sync = false;
>> +    hwaddr translated_addr, xlat;
>> +    MemoryRegion *mr;
>>       Error *local_err = NULL;
>>       int ret, unmap_ret;
>>
>> @@ -72,12 +74,21 @@ static int iommufd_cdev_unmap(const VFIOContainer
>*bcontainer,
>>           size = UINT64_MAX;
>>       }
>>
>> +    RCU_READ_LOCK_GUARD();
>> +
>> +    mr = vfio_translate_iotlb(iotlb, &xlat, &local_err);
>> +    if (!mr) {
>> +        return -EINVAL;
>> +    }
>> +
>> +    translated_addr = memory_region_get_ram_addr(mr) + xlat;
>> +
>>       if (iotlb && vfio_container_dirty_tracking_is_started(bcontainer)) {
>>           if (!vfio_container_devices_dirty_tracking_is_supported(bcontainer) &&
>>               bcontainer->dirty_pages_supported) {
>>               ret = vfio_container_query_dirty_bitmap(bcontainer, iova, size,
>>                                                       IOMMU_HWPT_GET_DIRTY_BITMAP_NO_CLEAR,
>> -                                                    iotlb->translated_addr,
>> +                                                    translated_addr,
>>                                                       &local_err);
>>               if (ret) {
>>                   error_report_err(local_err);
>> @@ -103,7 +114,7 @@ static int iommufd_cdev_unmap(const VFIOContainer
>*bcontainer,
>>
>>       if (need_dirty_sync) {
>>           ret = vfio_container_query_dirty_bitmap(bcontainer, iova, size, 0,
>> -                                                iotlb->translated_addr,
>> +                                                translated_addr,
>>                                                   &local_err);
>>           if (ret) {
>>               error_report_err(local_err);
>> diff --git a/hw/vfio/listener.c b/hw/vfio/listener.c
>> index 109b5d61af..e1b078a7b3 100644
>> --- a/hw/vfio/listener.c
>> +++ b/hw/vfio/listener.c
>> @@ -97,8 +97,8 @@ static bool
>vfio_listener_skipped_section(MemoryRegionSection *section,
>>    * Called with rcu_read_lock held.
>>    * The returned MemoryRegion must not be accessed after calling
>rcu_read_unlock.
>>    */
>> -static MemoryRegion *vfio_translate_iotlb(IOMMUTLBEntry *iotlb, hwaddr
>*xlat_p,
>> -                                          Error **errp)
>> +MemoryRegion *vfio_translate_iotlb(IOMMUTLBEntry *iotlb, hwaddr *xlat_p,
>> +                                   Error **errp)
>>   {
>>       MemoryRegion *mr;

Above change needs below tweak to work:

@@ -134,7 +134,6 @@ static int vfio_legacy_dma_unmap_one(const VFIOLegacyContainer *container,
         .size = size,
     };
     bool need_dirty_sync = false;
-    MemoryRegion *mr;
     hwaddr translated_addr, xlat;
     int ret;
     Error *local_err = NULL;
@@ -143,14 +142,15 @@ static int vfio_legacy_dma_unmap_one(const VFIOLegacyContainer *container,

     RCU_READ_LOCK_GUARD();

-    mr = vfio_translate_iotlb(iotlb, &xlat, &local_err);
-    if (!mr) {
-        return -EINVAL;
-    }
+    if (iotlb && vfio_container_dirty_tracking_is_started(bcontainer)) {
+        MemoryRegion *mr = vfio_translate_iotlb(iotlb, &xlat, &local_err);

-    translated_addr = memory_region_get_ram_addr(mr) + xlat;
+        if (!mr) {
+            return -EINVAL;
+        }
+
+        translated_addr = memory_region_get_ram_addr(mr) + xlat;

-    if (iotlb && vfio_container_dirty_tracking_is_started(bcontainer)) {
         if (!vfio_container_devices_dirty_tracking_is_supported(bcontainer) &&
             bcontainer->dirty_pages_supported) {
             return vfio_legacy_dma_unmap_get_dirty_bitmap(container, iova, size,

Thanks
Zhenzhong

^ permalink raw reply

* Re: [PATCH 2/2] media: rzg2l-cru: Align bytesperline to hardware DMA stride requirement
From: Tommaso Merciai @ 2026-06-25 11:01 UTC (permalink / raw)
  To: Laurent Pinchart
  Cc: tomm.merciai, linux-renesas-soc, biju.das.jz, jacopo.mondi,
	Lad Prabhakar, Mauro Carvalho Chehab, Hans Verkuil,
	Nicolas Dufresne, Sakari Ailus, Sven Püschel, Mehdi Djait,
	Paul Cercueil, Isaac Scott, Daniel Scally, linux-media,
	linux-kernel
In-Reply-To: <20260624195334.GI851255@killaraus.ideasonboard.com>

Hi Laurent,
Thanks for your review.

On Wed, Jun 24, 2026 at 10:53:34PM +0300, Laurent Pinchart wrote:
> On Wed, Jun 24, 2026 at 12:41:31PM +0200, Tommaso Merciai wrote:
> > The RZ/G3E CRU programs the line stride via the AMnIS register, whose
> > IS field encodes the value in units of 128 bytes. If bytesperline is
> > not a multiple of 128, the division truncates and the hardware uses a
> > wrong stride, causing horizontal banding.
> > 
> > commit ace92ccef0c9 ("media: platform: rzg2l-cru: Use v4l2_fill_pixfmt()")
> 
> s/commit/Commit/

thanks.

> 
> > replaced the open-coded aligned calculation with v4l2_fill_pixfmt(),
> > which sets no alignment, reintroducing the issue.
> 
> I wonder how I missed that. Sorry.
> 
> > Switch to v4l2_fill_pixfmt_aligned() with RZG2L_CRU_STRIDE_ALIGN when
> > info->has_stride is set. RZ/G2L has no AMnIS register and keeps using
> > v4l2_fill_pixfmt() unchanged.
> > 
> > Fixes: ace92ccef0c9 ("media: platform: rzg2l-cru: Use v4l2_fill_pixfmt()")
> > Signed-off-by: Tommaso Merciai <tommaso.merciai.xr@bp.renesas.com>
> > ---
> >  drivers/media/platform/renesas/rzg2l-cru/rzg2l-video.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/media/platform/renesas/rzg2l-cru/rzg2l-video.c b/drivers/media/platform/renesas/rzg2l-cru/rzg2l-video.c
> > index 69346a585f9f..478264f26466 100644
> > --- a/drivers/media/platform/renesas/rzg2l-cru/rzg2l-video.c
> > +++ b/drivers/media/platform/renesas/rzg2l-cru/rzg2l-video.c
> > @@ -860,7 +860,8 @@ static void rzg2l_cru_format_align(struct rzg2l_cru_dev *cru,
> >  	v4l_bound_align_image(&pix->width, 320, info->max_width, 1,
> >  			      &pix->height, 240, info->max_height, 0, 0);
> >  
> > -	v4l2_fill_pixfmt(pix, pix->pixelformat, pix->width, pix->height);
> > +	v4l2_fill_pixfmt_aligned(pix, pix->pixelformat, pix->width, pix->height,
> > +				 info->has_stride ? RZG2L_CRU_STRIDE_ALIGN : 1);
> 
> The documentation states that, for RGB888, the stride has to be a
> multiple of 384 (3*128). Shouldn't you take that into account here ?
> 
> Also, for semi-planar YUV 4:2:0, the hardware seems to use a stride
> equal to AMnIS*2, which leaves blank lines after every U/V line. That's
> something userspace doesn't expect.

Correct.

Currently neither RGB888 nor semi-planar YUV 4:2:0 are supported.
I will handle this once the support for those formats will be added
if for you is ok.

Please let me know.
Thanks.

Kind Regards,
Tommaso

> 
> >  
> >  	dev_dbg(cru->dev, "Format %ux%u bpl: %u size: %u\n",
> >  		pix->width, pix->height, pix->bytesperline, pix->sizeimage);
> 
> -- 
> Regards,
> 
> Laurent Pinchart

^ permalink raw reply

* Re: [PATCH] xfs: fix capabily check in xfs_setattr_nonsize
From: Carlos Maiolino @ 2026-06-25 11:01 UTC (permalink / raw)
  To: Jan Kara
  Cc: Christoph Hellwig, linux-xfs, stable, Darrick J. Wong,
	Dave Chinner, Eric Sandeen, Dr. Thomas Orgis, linux-fsdevel,
	Christian Brauner
In-Reply-To: <qcsdbdpp23fsu3cqhpjdpwusvl6onc2knnrun522ofrutxpz6j@reh3k2ofqjir>

On Thu, Jun 25, 2026 at 12:43:54PM +0200, Jan Kara wrote:
> On Wed 24-06-26 15:40:39, Christoph Hellwig wrote:
> > Adding Jan and Christian for quota and user_ns knowledge.
> > 
> > On Wed, Jun 24, 2026 at 12:14:29PM +0200, cem@kernel.org wrote:
> > > From: Carlos Maiolino <cem@kernel.org>
> > > 
> > > An user reported a bug where he managed to evade group's quota
> > > by changing a file's gid to a different group id the same user
> > > belonged to, even though quotas were enforced on both gids and the
> > > file's size was big enough to exceed the quota's hardlimit.
> > > 
> > > Commit eba0549bc7d1 replaced a capable() call by a
> > > has_capability_noaudit() to prevent unnecessary selinux audit messages.
> > > Turns out that both calls have slightly different semantics even though
> > > their documentation seems similar. Where in a nutshell:
> > > 
> > > capable() - Tests the task's effective credentials
> > > has_ns_capability_noaudit() - Tests the task's real credentials
> > 
> > Eww..
> 
> Yeah, that's a catch.
> 
> > > This most of the time has no practical difference but in some cases like
> > > changing attrs (specifically group id in this case) through a NFS client
> > > this will allow the quota code to use XFS_QMOPT_FORCE_RES, effectively
> > > bypassing quota accounting checks.
> > 
> > Yeah, this does look wrong.  Do the other conversion in the above commit
> > have tthe same issue?
> > 
> > > Using instead ns_capable_noaudit() should fix this issue and prevent
> > > selinux audit messages.
> > 
> > The generic quota code manages to do without either has_capability_noaudit
> > or ns_capable_noaudit.  I think this might be hidden behind
> > inode_owner_or_capable calls.  Any idea why we're different?
> 
> Actually no. Generic quota code has equivalent checks in ignore_hardlimit()
> function which does capable(CAP_SYS_RESOURCE) check. I guess the reason why
> nobody complained about generic quota code is that we call
> ignore_hardlimit() only if we are above hardlimit whereas XFS calls this
> for every transaction...

But the capable() call works fine, it's the replacement by
has_capability_noaudit() that enables quota bypass I *think* generic
quota code is safe from this point.

> 
> 								Honza
> 
> -- 
> Jan Kara <jack@suse.com>
> SUSE Labs, CR

^ permalink raw reply

* Re: [PATCH v3 1/7] list: Add mutable iterator variants
From: Jani Nikula @ 2026-06-25 11:00 UTC (permalink / raw)
  To: Kaitao Cheng, David Laight, Christian König,
	David Hildenbrand (Arm), Alexei Starovoitov
  Cc: Andrew Morton, David Hildenbrand, Jens Axboe, Tejun Heo,
	Alexander Viro, Christian Brauner, Daniel Borkmann,
	Andrii Nakryiko, Johannes Weiner, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Thomas Gleixner,
	Juri Lelli, Vincent Guittot, Paul Moore, Andy Shevchenko,
	Paul E. McKenney, Shakeel Butt, David Howells, Simona Vetter,
	Randy Dunlap, Luca Ceresoli, Philipp Stanner, linux-block,
	linux-kernel, cgroups, linux-ntfs-dev, linux-fsdevel, io-uring,
	audit, bpf, netdev, dri-devel, linux-perf-users,
	linux-trace-kernel, kexec, live-patching, linux-modules,
	linux-crypto, linux-pm, rcu, sched-ext, linux-mm, virtualization,
	damon, llvm, Kaitao Cheng, Muchun Song
In-Reply-To: <0ed6b5c3-e955-46e2-9fc6-075a0dfd1c4f@linux.dev>

On Thu, 25 Jun 2026, Kaitao Cheng <kaitao.cheng@linux.dev> wrote:
> 在 2026/6/24 22:23, David Laight 写道:
>> On Wed, 24 Jun 2026 15:23:47 +0200
>> Christian König <christian.koenig@amd.com> wrote:
>>> On 6/24/26 15:14, Kaitao Cheng wrote:
>>>> 在 2026/6/22 16:42, David Laight 写道:  
>>>>> On Mon, 22 Jun 2026 12:05:31 +0800
>>>>> Kaitao Cheng <kaitao.cheng@linux.dev> wrote:
>>>>>  
>>>>>> From: Kaitao Cheng <chengkaitao@kylinos.cn>
>>>>>>
>>>>>> The list_for_each*_safe() helpers are used when the loop body may
>>>>>> remove the current entry.  Their API exposes the temporary cursor at
>>>>>> every call site, even though most users only need it for the iterator
>>>>>> implementation and never reference it in the loop body.
>>>>>>
>>>>>> Add *_mutable() variants for list and hlist iteration.  The new helpers
>>>>>> support both forms: callers may keep passing an explicit temporary cursor
>>>>>> when they need to inspect or reset it, or omit it and let the helper use
>>>>>> a unique internal cursor.  
>>>>>
>>>>> I'm not really sure 'mutable' means anything either.
>>>>> It is possible to make it valid for the loop body (or even other threads)
>>>>> to delete arbitrary list items - but that needs significant extra overheads.
>>>>>
>>>>> It might be worth doing something that doesn't need the extra variable,
>>>>> but there is little point doing all the churn just to rename things.
>>>>>  
>>>>>>
>>>>>> This makes call sites that only mutate the list through the current entry
>>>>>> less noisy, while keeping the existing *_safe() helpers available for
>>>>>> compatibility.
>>>>>>
>>>>>> Signed-off-by: Kaitao Cheng <chengkaitao@kylinos.cn>
>>>>>> ---
>>>>>>  include/linux/list.h | 269 +++++++++++++++++++++++++++++++++++++------
>>>>>>  1 file changed, 231 insertions(+), 38 deletions(-)
>>>>>>
>>>>>> diff --git a/include/linux/list.h b/include/linux/list.h
>>>>>> index 09d979976b3b..1081def7cea9 100644
>>>>>> --- a/include/linux/list.h
>>>>>> +++ b/include/linux/list.h
>>>>>> @@ -7,6 +7,7 @@
>>>>>>  #include <linux/stddef.h>
>>>>>>  #include <linux/poison.h>
>>>>>>  #include <linux/const.h>
>>>>>> +#include <linux/args.h>
>>>>>>  
>>>>>>  #include <asm/barrier.h>
>>>>>>  
>>>>>> @@ -763,28 +764,72 @@ static inline void list_splice_tail_init(struct list_head *list,
>>>>>>  #define list_for_each_prev(pos, head) \
>>>>>>  	for (pos = (head)->prev; !list_is_head(pos, (head)); pos = pos->prev)
>>>>>>  
>>>>>> -/**
>>>>>> - * list_for_each_safe - iterate over a list safe against removal of list entry
>>>>>> - * @pos:	the &struct list_head to use as a loop cursor.
>>>>>> - * @n:		another &struct list_head to use as temporary storage
>>>>>> - * @head:	the head for your list.
>>>>>> +/*
>>>>>> + * list_for_each_safe is an old interface, use list_for_each_mutable instead.
>>>>>>   */
>>>>>>  #define list_for_each_safe(pos, n, head) \
>>>>>>  	for (pos = (head)->next, n = pos->next; \
>>>>>>  	     !list_is_head(pos, (head)); \
>>>>>>  	     pos = n, n = pos->next)
>>>>>>  
>>>>>> +#define __list_for_each_mutable_internal(pos, tmp, head)		\
>>>>>> +	for (typeof(pos) tmp = (pos = (head)->next)->next;		\  
>>>>>
>>>>> Use auto
>>>>>  
>>>>>> +	     !list_is_head(pos, (head));				\
>>>>>> +	     pos = tmp, tmp = pos->next)
>>>>>> +
>>>>>> +#define __list_for_each_mutable1(pos, head)				\
>>>>>> +	__list_for_each_mutable_internal(pos, __UNIQUE_ID(next), head)
>>>>>> +
>>>>>> +#define __list_for_each_mutable2(pos, next, head)			\
>>>>>> +	list_for_each_safe(pos, next, head)
>>>>>> +
>>>>>>  /**
>>>>>> - * list_for_each_prev_safe - iterate over a list backwards safe against removal of list entry
>>>>>> + * list_for_each_mutable - iterate over a list safe against entry removal
>>>>>>   * @pos:	the &struct list_head to use as a loop cursor.
>>>>>> - * @n:		another &struct list_head to use as temporary storage
>>>>>> - * @head:	the head for your list.
>>>>>> + * @...:	either (head) or (next, head)
>>>>>> + *
>>>>>> + * next:	another &struct list_head to use as optional temporary storage.
>>>>>> + *		The temporary cursor is internal unless explicitly supplied by
>>>>>> + *		the caller.
>>>>>> + * head:	the head for your list.
>>>>>> + */
>>>>>> +#define list_for_each_mutable(pos, ...)					\
>>>>>> +	CONCATENATE(__list_for_each_mutable, COUNT_ARGS(__VA_ARGS__))	\
>>>>>> +		(pos, __VA_ARGS__)  
>>>>>
>>>>> The variable argument count logic really just slows down compilation.
>>>>> Maybe there aren't enough copies of this code to make that significant.
>>>>> But just because you can do it doesn't mean it is a gooD idea.
>>>>> I'm also not sure it really adds anything to the readability.
>>>>>
>>>>> And, it you are going to make the middle argument optional there is
>>>>> no need to change the macro name.  
>>>>
>>>> Christian König and Jani Nikula also disagree with the variadic-argument
>>>> implementation approach. If we abandon that method, it means we will
>>>> inevitably need to add some new macros. If mutable is not a good name,
>>>> suggestions for better alternatives would be welcome; coming up with a
>>>> suitable name is indeed rather tricky.  
>>>
>>> I don't think you need to add a new macro for the specific use case that people want to modify the next element of the iteration.
>>>
>>> If I remember your numbers correctly that is a really corner case and keeping using the existing *_safe() macros for that sounds perfectly fine to me.
>> 
>> IIRC currently you have a choice of either:
>> 	define               Item that can't be deleted
>> 	list_for_each()	     The current item.
>> 	list_for_each_safe() The next item.
>> There is also likely to be code that updates the variables to allow
>> for other scenarios.
>> 
>> Note that if increase a reference count and release a lock then list_for_each()
>> is likely safer than list_for_each_safe() :-)
>> 
>> list.h has 9 variants of the 'safe' loop.
>> The bloat of another 9 is getting excessive.
>> 
>> It has to be said that this is one of my least favourite type of list...
>
> Hi Christian König, David Laight, Jani Nikula, David Hildenbrand,
> Andy Shevchenko, Alexei Starovoitov
>
> For ease of discussion, I need to summarize the currently possible
> approaches and briefly describe their respective pros and cons,
> using the list_for_each_entry* interfaces as examples.
>
> 1. Add list_for_each_entry_mutable, while keeping list_for_each_entry
> and list_for_each_entry_safe unchanged. list_for_each_entry_mutable
> would be used specifically for safe deletion scenarios that do not
> need to expose the temporary cursor externally. The code can refer to
> the v1 version.
>
> Pros: Does not depend on immediate per-subsystem adaptation and can be
>       merged directly.
> Cons: Requires adding a whole set of mutable interfaces, which makes the
>       code somewhat redundant.

Seems fine, and the original _safe naming is ambiguous anyway.

> 2. Directly optimize away the temporary cursor in list_for_each_entry_safe
> and define it inside the loop instead, changing the interface from four
> arguments to three.
>
> Pros: Does not add redundant interfaces.
> Cons: (1) Users need to manually update special cases that use the
>       traversal variable of list_for_each_entry_safe, the new
>       list_for_each_entry_safe would no longer apply there and would
>       need to be open-coded.
>       (2) Because the macro arguments changes, all list_for_each_entry_safe
>       callers would need to be modified and merged together, making it
>       difficult to merge such a large amount of code at once.

This won't fly because there are literally thousands of
list_for_each_entry_safe() users.

> 3. Use a variadic macro approach to optimize list_for_each_entry_safe,
> so that it supports both three and four arguments.
>
> Pros: (1) Does not add redundant interfaces.
>       (2) Does not depend on immediate per-subsystem adaptation and can
>       be merged directly.
> Cons: (1) Increases compile time.
>       (2) Makes the interface harder for users to use.

Basically I'm against any variadic macro tricks where the optional
argument is not the last argument. That's just way too surprising, and
goes against common practice in just about all other languages.

> 4. Optimize list_for_each_entry by defining the temporary cursor internally,
> making it compatible with the functionality of list_for_each_entry_safe.
> The code can refer to the v2 version.
>
> Pros: (1) Does not add redundant interfaces.
>       (2) The number of externally visible arguments of list_for_each_entry
>       remains unchanged, still three.
> Cons: (1) list_for_each_entry and list_for_each_entry_safe would be merged
>       into one, and list_for_each_entry_safe would gradually be deprecated.
>       (2) Users need to manually update special cases that use the traversal
>       variable of list_for_each_entry, the new list_for_each_entry would no
>       longer apply there and would need to be open-coded. There are 15 such
>       cases in total.

This sounds good to me, though I take it there's some code size increase
and/or performance penalty?

Maybe the 15 cases are questionable anyway?

> 5. Use a variadic macro approach to optimize list_for_each_entry, so that
> it supports both three and four arguments.
>
> Pros: (1) Does not add redundant interfaces.
>       (2) Does not depend on immediate per-subsystem adaptation and can be
>       merged directly.
> Cons: (1) Increases compile time.
>       (2) list_for_each_entry and list_for_each_entry_safe would be merged
>       into one, and list_for_each_entry_safe would gradually be deprecated.

Please don't do the macro tricks.

> 6. Make no changes, keep the current logic unchanged, and close the current
> email discussion.

I like hiding the temporary stuff when possible.


BR,
Jani.

-- 
Jani Nikula, Intel

^ permalink raw reply

* Re: [PATCH v19 3/7] coresight: tmc: introduce tmc_sysfs_ops to wrap sysfs read operations
From: sashiko-bot @ 2026-06-25 11:00 UTC (permalink / raw)
  To: Jie Gan; +Cc: devicetree, conor+dt, robh
In-Reply-To: <20260625-enable-byte-cntr-for-ctcu-v19-3-8fbbf22e8381@oss.qualcomm.com>

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:

Pre-existing issues:
- [High] The error handling path in `__tmc_probe()` at the `out:` label unconditionally accesses `drvdata->csdev` and attempts to register the crash metadata interface, leading to NULL pointer dereference, error pointer dereference, or Use-After-Free.
--

commit 0e2c5d1f36c04454695412377951547f5c57c845
Author: Jie Gan <jie.gan@oss.qualcomm.com>

coresight: tmc: introduce tmc_sysfs_ops to wrap sysfs read operations

Introduce tmc_sysfs_ops as a wrapper, wrap sysfs read operations,
for reading trace data from the TMC buffer.

> diff --git a/drivers/hwtracing/coresight/coresight-tmc-core.c b/drivers/hwtracing/coresight/coresight-tmc-core.c
> index bc7dd676da473..4b40b692be4d8 100644
> --- a/drivers/hwtracing/coresight/coresight-tmc-core.c
> +++ b/drivers/hwtracing/coresight/coresight-tmc-core.c

[ ... ]

> @@ -835,6 +826,7 @@ static int __tmc_probe(struct device *dev, struct resource *res)
>  		idr_init(&drvdata->idr);
>  		mutex_init(&drvdata->idr_mutex);
>  		dev_list = "tmc_etr";
> +		drvdata->sysfs_ops = &etr_sysfs_ops;
>  		INIT_LIST_HEAD(&drvdata->etr_buf_list);
>  		break;
>  	case TMC_CONFIG_TYPE_ETF:
> @@ -844,6 +836,7 @@ static int __tmc_probe(struct device *dev, struct resource *res)
>  		desc.subtype.link_subtype = CORESIGHT_DEV_SUBTYPE_LINK_FIFO;
>  		desc.ops = &tmc_etf_cs_ops;
>  		dev_list = "tmc_etf";
> +		drvdata->sysfs_ops = &etb_sysfs_ops;
>  		break;
>  	default:
>  		pr_err("%s: Unsupported TMC config\n", desc.name);

[Severity: High]
This isn't a bug introduced by this patch, but looking at __tmc_probe(),
is it safe to unconditionally call the crashdata functions at the out: label
if probe fails?

If an error occurs during probe (for example, coresight_alloc_device_name()
returns NULL, or coresight_register() returns an ERR_PTR), the code jumps
to out:

out:
	if (is_tmc_crashdata_valid(drvdata) &&
	    !tmc_prepare_crashdata(drvdata))
		register_crash_dev_interface(drvdata, desc.name);

is_tmc_crashdata_valid() then attempts to access drvdata->csdev->dev:

drivers/hwtracing/coresight/coresight-tmc-core.c:is_tmc_crashdata_valid() {
    ...
	/* Check for valid metadata */
	if (!mdata->valid) {
		dev_dbg(&drvdata->csdev->dev,
			"Data invalid in tmc crash metadata\n");
    ...
}

Will this cause a NULL pointer or ERR_PTR dereference if drvdata->csdev
hasn't been successfully registered?

Furthermore, if misc_register() fails, coresight_unregister() frees csdev,
potentially leading to a use-after-free. If register_crash_dev_interface()
succeeds on an error path, could the driver core subsequently free drvdata
via devres cleanup, leaving a dangling pointer in the global miscdevice list?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260625-enable-byte-cntr-for-ctcu-v19-0-8fbbf22e8381@oss.qualcomm.com?part=3

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.