* Re: [PATCH v2 7/7] mm/vmalloc: Stop scanning for compound pages after encountering small pages in vmap
From: Wen Jiang @ 2026-05-20 10:56 UTC
To: Uladzislau Rezki
Cc: linux-mm, linux-arm-kernel, catalin.marinas, will, akpm, baohua,
Xueyuan.chen21, dev.jain, rppt, david, ryan.roberts,
anshuman.khandual, ajd, linux-kernel
In-Reply-To: <ag2CeHWDLLglCF7t@milan>
Sure, will do.
Thanks,
Wen Jiang
On Wed, 20 May 2026 at 17:44, Uladzislau Rezki <urezki@gmail.com> wrote:
>
> On Thu, May 14, 2026 at 05:41:08PM +0800, Wen Jiang wrote:
> > From: "Barry Song (Xiaomi)" <baohua@kernel.org>
> >
> > Users typically allocate memory in descending orders, e.g.
> > 8 → 4 → 0. Once an order-0 page is encountered, subsequent
> > pages are likely to also be order-0, so we stop scanning
> > for compound pages at that point.
> >
> > Signed-off-by: Barry Song (Xiaomi) <baohua@kernel.org>
> > Signed-off-by: Wen Jiang <jiangwen6@xiaomi.com>
> > Tested-by: Xueyuan Chen <xueyuan.chen21@gmail.com>
> > ---
> > mm/vmalloc.c | 6 ++++++
> > 1 file changed, 6 insertions(+)
> >
> > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > index b3389c8f1..60579bfbf 100644
> > --- a/mm/vmalloc.c
> > +++ b/mm/vmalloc.c
> > @@ -3576,6 +3576,12 @@ static int __vmap_huge(unsigned long addr, unsigned long end,
> > map_addr = addr;
> > idx = i;
> > }
> > + /*
> > + * Once small pages are encountered, the remaining pages
> > + * are likely small as well
> > + */
> > + if (shift == PAGE_SHIFT)
> > + break;
> >
> > addr += 1UL << shift;
> > i += 1U << (shift - PAGE_SHIFT);
> > --
> > 2.34.1
> >
> Can we squash this patch with
> "mm/vmalloc: map contiguous pages in batches for vmap() if possible"?
>
> --
> Uladzislau Rezki
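The early stop in __vmap_huge() relies on callers handing vmap() pages in descending order. A minimal sketch of the scan, assuming a hypothetical flat array of per-chunk orders instead of the real page array; scan_compound_chunks() is illustrative only and does no mapping:

```c
#include <stddef.h>

/*
 * Hypothetical model of the early stop: orders[i] is the allocation order
 * of the i-th chunk handed to vmap(). Returns how many leading chunks were
 * treated as compound before the scan gave up.
 */
static size_t scan_compound_chunks(const unsigned int *orders, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		/* First order-0 chunk: assume the rest are order-0 as well. */
		if (orders[i] == 0)
			break;
		/* A real implementation would map this chunk with a larger shift here. */
	}
	return i;
}
```

For orders {8, 4, 0, 0} the scan stops after two chunks. For {0, 8} it stops immediately, so the trailing order-8 chunk is never considered for a huge mapping; that is the trade-off the heuristic accepts.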
* [PATCH] i2c: imx: fix clock and pinctrl state inconsistency in runtime PM
From: Carlos Song (OSS) @ 2026-05-20 10:49 UTC
To: o.rempel, kernel, andi.shyti, Frank.Li, s.hauer, festevam,
carlos.song
Cc: linux-i2c, imx, linux-arm-kernel, linux-kernel, stable
From: Carlos Song <carlos.song@nxp.com>
In i2c_imx_runtime_suspend(), the clock is disabled before switching
the pinctrl state to sleep. If pinctrl_pm_select_sleep_state() fails,
the runtime suspend is aborted but the clock remains disabled, causing
a system crash when the hardware is subsequently accessed.
Fix this by switching the pinctrl state before disabling the clock so
that a pinctrl failure leaves the clock enabled and the hardware
accessible.
In i2c_imx_runtime_resume(), if clk_enable() fails, switch the pinctrl
state back to sleep so that the clock and pinctrl states stay consistent.
Fixes: 576eba03c994 ("i2c: imx: switch different pinctrl state in different system power status")
Cc: stable@vger.kernel.org
Signed-off-by: Carlos Song <carlos.song@nxp.com>
---
drivers/i2c/busses/i2c-imx.c | 15 ++++++++++++---
1 file changed, 12 insertions(+), 3 deletions(-)
diff --git a/drivers/i2c/busses/i2c-imx.c b/drivers/i2c/busses/i2c-imx.c
index d651ade86267..54fd5d0e4056 100644
--- a/drivers/i2c/busses/i2c-imx.c
+++ b/drivers/i2c/busses/i2c-imx.c
@@ -1892,9 +1892,15 @@ static void i2c_imx_remove(struct platform_device *pdev)
static int i2c_imx_runtime_suspend(struct device *dev)
{
struct imx_i2c_struct *i2c_imx = dev_get_drvdata(dev);
+ int ret;
+
+ ret = pinctrl_pm_select_sleep_state(dev);
+ if (ret)
+ return ret;
clk_disable(i2c_imx->clk);
- return pinctrl_pm_select_sleep_state(dev);
+
+ return 0;
}
static int i2c_imx_runtime_resume(struct device *dev)
@@ -1907,10 +1913,13 @@ static int i2c_imx_runtime_resume(struct device *dev)
return ret;
ret = clk_enable(i2c_imx->clk);
- if (ret)
+ if (ret) {
dev_err(dev, "can't enable I2C clock, ret=%d\n", ret);
+ pinctrl_pm_select_sleep_state(dev);
+ return ret;
+ }
- return ret;
+ return 0;
}
static int __maybe_unused i2c_imx_suspend_noirq(struct device *dev)
--
2.43.0
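The ordering argument can be checked with a small model. This is a sketch only: struct fake_i2c and its helpers are hypothetical stand-ins for the clock and pinctrl calls, with injectable failures, and -EIO is an arbitrary error code chosen for illustration:

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the controller's clock and pinctrl state. */
struct fake_i2c {
	bool clk_on;
	bool pins_sleep;
	bool pinctrl_fails;	/* inject a failure of the sleep-state switch */
	bool clk_fails;		/* inject a failure of the clock enable */
};

static int fake_select_sleep(struct fake_i2c *d)
{
	if (d->pinctrl_fails)
		return -EIO;
	d->pins_sleep = true;
	return 0;
}

/* Patched order: pinctrl first, so a failure leaves the clock running. */
static int fake_runtime_suspend(struct fake_i2c *d)
{
	int ret = fake_select_sleep(d);

	if (ret)
		return ret;
	d->clk_on = false;
	return 0;
}

/* On clock failure, put the pins back to sleep to match the stopped clock. */
static int fake_runtime_resume(struct fake_i2c *d)
{
	d->pins_sleep = false;		/* default pinctrl state */
	if (d->clk_fails) {
		d->pins_sleep = true;
		return -EIO;
	}
	d->clk_on = true;
	return 0;
}
```

With the old order (clock off first), an injected pinctrl failure would leave clk_on false while runtime PM still considers the device active, which is the crash scenario described in the commit message.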
* Re: [RFC V2 01/14] mm: Abstract printing of pxd_val()
From: David Hildenbrand (Arm) @ 2026-05-20 10:41 UTC
To: Dave Hansen, Anshuman Khandual, linux-arm-kernel
Cc: Catalin Marinas, Will Deacon, Ryan Roberts, Mark Rutland,
Lorenzo Stoakes, Andrew Morton, Mike Rapoport, Linu Cherian,
Usama Arif, linux-kernel, linux-mm
In-Reply-To: <74f66e30-ab3d-4352-89ef-1bccc7e9daeb@intel.com>
On 5/19/26 16:28, Dave Hansen wrote:
> On 5/12/26 21:45, Anshuman Khandual wrote:
>> if (!p4d_present(p4d) || p4d_leaf(p4d)) {
>> - pr_alert("pgd:%08llx p4d:%08llx\n", pgdv, p4dv);
>> + pr_alert("pgd:%" __PRIpxx " p4d:%" __PRIpxx "\n",
>> + __PRIpxx_args(pgdv), __PRIpxx_args(p4dv));
>> return;
>> }
>
> That's not the most readable result. Could a printk() format specifier
> make this nicer? Maybe use "%pT"?
>
> pr_alert("pgd:%pT p4d:%pT\n", &pgd, &p4d);
>
> I _think_ it could even get rid of the p??v variables.
That would be nicer indeed, if that works.
--
Cheers,
David
* Re: [PATCH v2 0/5] mm: reduce mmap_lock contention and improve page fault performance
From: David Hildenbrand (Arm) @ 2026-05-20 10:33 UTC
To: Lorenzo Stoakes, Suren Baghdasaryan
Cc: Barry Song, Matthew Wilcox, akpm, linux-mm, liam, vbabka, rppt,
mhocko, jack, pfalcato, wanglian, chentao, lianux.mm, kunwu.chan,
liyangouwen1, chrisl, kasong, shikemeng, nphamcs, bhe,
youngjun.park, linux-arm-kernel, linux-kernel, loongarch,
linuxppc-dev, linux-riscv, linux-s390, Nanzhe Zhao
In-Reply-To: <agxbq1TxJdniMQT3@lucifer>
On 5/19/26 14:53, Lorenzo Stoakes wrote:
> On Mon, May 18, 2026 at 12:56:59PM -0700, Suren Baghdasaryan wrote:
>
>>>
>>> I think we either need to fix `fork()`, or keep the current
>>> behavior of dropping the VMA lock before performing I/O.
>>
>> I see. So, this problem arises from the fact that we are changing the
>> pagefaults requiring I/O operation to hold VMA lock...
>> And you want to lock VMA on fork only if vma_is_anonymous(vma) ||
>> is_cow_mapping(vma->vm_flags). So, we will be blocking page faults for
>> anonymous and COW VMAs only while holding mmap_write_lock, preventing
>> any VMA modification. On the surface, that looks ok to me but I might
>> be missing some corner cases. If nobody sees any obvious issues, I
>> think it's worth a try.
>
> Not sure if you noticed but I did raise concerns ;)
>
> I wonder if you've confused the fault path and fork here, as I think Barry has
> been a little unclear on that.
>
> What's being suggested in this thread is to fundamentally change fork behaviour
> so it's different from the entire history of the kernel (or - presumably - at
> least recent history :)
I don't want fork() to become different in that regard.
There is already a slight difference with vs. without per-VMA locks, because
there is a window in-between us taking the write mmap_lock and all the per-VMA
locks. I raised that previously [1] and assumed that it is probably fine.
I also raised in the past why I think we must not allow concurrent page faults,
at least as soon as anonymous memory is involved [2].
... and I raised that this is pretty much slower by design right now: "Well, the
design decision that CONFIG_PER_VMA_LOCK made for now to make page faults fast
and to make blocking any page faults from happening to be slower ..." [3]
[1] https://lore.kernel.org/all/970295ab-e85d-7af3-76e6-df53a5c52f8b@redhat.com/
[2] https://lore.kernel.org/all/7e3f35cc-59b9-bf12-b8b1-4ed78223844a@redhat.com/
[3] https://lore.kernel.org/all/2efa2c89-3765-721d-2c3c-00590054aa5b@redhat.com/
--
Cheers,
David
* Re: [PATCH V2 00/11] soc: ti: keystone/k3 navigator queue/dma/ringacc cleanups
From: Sai Sree Kartheek Adivi @ 2026-05-20 10:29 UTC
To: Nishanth Menon
Cc: Justin Stitt, Bill Wendling, Nick Desaulniers, Nathan Chancellor,
Santosh Shilimkar, afd, llvm, linux-arm-kernel, linux-kernel
In-Reply-To: <20260512170623.3174416-1-nm@ti.com>
On 12:06-20260512, Nishanth Menon wrote:
> Fix W=2 (clang/gcc), sparse, smatch and coccinelle warnings.
> No functional changes.
>
> Tested: NFS boot (via nav subsystem) on k2l-evm, k2hk-evm and k2g-evm
> based on next-20260507:
> https://gist.github.com/nmenon/cff02a5f2a72fde5fcb49664fcc834d2
>
> Changes since V1:
> - update for review comments.
> - Picked up Randy's and Andrew's tags in appropriate patches.
>
> V1: https://lore.kernel.org/all/20260508153211.3688277-1-nm@ti.com/
>
For the entire series,
Reviewed-by: Sai Sree Kartheek Adivi <s-adivi@ti.com>
Regards,
Kartheek
> Nishanth Menon (11):
> soc: ti: knav_qmss: Remove remaining redundant ENOMEM printks
> soc: ti: knav_qmss: Rename global kdev to knav_qdev to fix -Wshadow
> soc: ti: knav_qmss: Inline lockdep condition in for_each_handle_rcu
> soc: ti: knav_qmss: Fix kernel-doc Return: tags
> soc: ti: knav_qmss: Use %pe to print PTR_ERR()
> soc: ti: knav_qmss: Fix __iomem annotations and __be32 type
> soc: ti: knav_qmss_acc: Fix kernel-doc Return: tag
> soc: ti: knav_dma: Remove unused DMA_PRIO_MASK macro
> soc: ti: knav_dma: Remove dead check on unsigned args.args[0]
> soc: ti: knav_dma: Use IOMEM_ERR_PTR() in pktdma_get_regs()
> soc: ti: k3-ringacc: Use str_enabled_disabled() helper
>
> drivers/soc/ti/k3-ringacc.c | 3 +-
> drivers/soc/ti/knav_dma.c | 8 +-
> drivers/soc/ti/knav_qmss.h | 2 +-
> drivers/soc/ti/knav_qmss_acc.c | 2 +-
> drivers/soc/ti/knav_qmss_queue.c | 148 +++++++++++++++----------------
> 5 files changed, 75 insertions(+), 88 deletions(-)
>
> --
> 2.47.0
>
* [PATCH] phy: rockchip: inno-usb2: Add missing clkout_ctl_phy kerneldoc
From: Heiko Stuebner @ 2026-05-20 10:28 UTC
To: vkoul
Cc: neil.armstrong, heiko, jonas, linux-phy, linux-arm-kernel,
linux-rockchip, linux-kernel, kernel test robot
Add the missing documentation for the newly added clkout_ctl_phy field.
Fixes: 2775541de058 ("phy: rockchip: inno-usb2: Add clkout_ctl_phy support")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202605150315.MyBNQOPB-lkp@intel.com/
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
---
drivers/phy/rockchip/phy-rockchip-inno-usb2.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/phy/rockchip/phy-rockchip-inno-usb2.c b/drivers/phy/rockchip/phy-rockchip-inno-usb2.c
index 133cfd6624e8..7d8a533f24ae 100644
--- a/drivers/phy/rockchip/phy-rockchip-inno-usb2.c
+++ b/drivers/phy/rockchip/phy-rockchip-inno-usb2.c
@@ -170,7 +170,8 @@ struct rockchip_usb2phy_port_cfg {
* @reg: the address offset of grf for usb-phy config.
* @num_ports: specify how many ports that the phy has.
* @phy_tuning: phy default parameters tuning.
- * @clkout_ctl: keep on/turn off output clk of phy.
+ * @clkout_ctl: register to enable output clk of phy, when set in GRF
+ * @clkout_ctl_phy: register to enable output clk of phy, when set inside phy
* @port_cfgs: usb-phy port configurations.
* @chg_det: charger detection registers.
*/
--
2.47.3
* Re: [PATCH v3] arm64: Kconfig: drop unneeded dependency on OF_GPIO for ARCH_MVEBU
From: Arnd Bergmann @ 2026-05-20 10:18 UTC
To: Bartosz Golaszewski
Cc: Bartosz Golaszewski, Catalin Marinas, Will Deacon,
linux-arm-kernel, linux-kernel, Linus Walleij
In-Reply-To: <CAMRc=Md5wDV+w7awoiiBoFVG24VEfH8hKvH2VP7dtcRX3m-Wnw@mail.gmail.com>
On Wed, May 20, 2026, at 10:32, Bartosz Golaszewski wrote:
> On Tue, May 5, 2026 at 12:42 PM Linus Walleij <linusw@kernel.org> wrote:
>>
>> On Thu, Apr 30, 2026 at 4:32 PM Bartosz Golaszewski
>> <bartosz.golaszewski@oss.qualcomm.com> wrote:
>>
>> > OF_GPIO is selected automatically on all OF systems. Any symbols it
>> > controls also provide stubs so there's really no reason to select it
>> > explicitly. ARCH_MVEBU already selects GPIOLIB, drop the redundant
>> > OF_GPIO dependency.
>> >
>> > Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
>>
>> Reviewed-by: Linus Walleij <linusw@kernel.org>
>>
>
> Arnd, can you please queue this directly for v7.2?
Done, thanks for the patch!
Arnd
* Re: [PATCH] coresight: fix resource leaks on path build failure
From: Jie Gan @ 2026-05-20 10:13 UTC
To: Mike Leach, James Clark
Cc: coresight@lists.linaro.org, linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, Suzuki Poulose, Leo Yan,
Alexander Shishkin, Mathieu Poirier, Tingwei Zhang,
Greg Kroah-Hartman, nd
In-Reply-To: <PAVPR08MB9674C4C58210122F3A6E07AB8C012@PAVPR08MB9674.eurprd08.prod.outlook.com>
On 5/20/2026 5:27 PM, Mike Leach wrote:
>
>
>> -----Original Message-----
>> From: James Clark <james.clark@linaro.org>
>> Sent: Wednesday, May 20, 2026 9:38 AM
>> To: Jie Gan <jie.gan@oss.qualcomm.com>
>> Cc: coresight@lists.linaro.org; linux-arm-kernel@lists.infradead.org; linux-
>> kernel@vger.kernel.org; Suzuki Poulose <Suzuki.Poulose@arm.com>; Mike
>> Leach <Mike.Leach@arm.com>; Leo Yan <Leo.Yan@arm.com>; Alexander
>> Shishkin <alexander.shishkin@linux.intel.com>; Mathieu Poirier
>> <mathieu.poirier@linaro.org>; Tingwei Zhang
>> <tingwei.zhang@oss.qualcomm.com>; Greg Kroah-Hartman
>> <gregkh@linuxfoundation.org>
>> Subject: Re: [PATCH] coresight: fix resource leaks on path build failure
>>
>>
>>
>> On 20/05/2026 2:55 am, Jie Gan wrote:
>>>
>>>
>>> On 5/19/2026 9:57 PM, James Clark wrote:
>>>>
>>>>
>>>> On 13/05/2026 2:32 am, Jie Gan wrote:
>>>>> Two related leaks when _coresight_build_path() encounters an error after
>>>>> coresight_grab_device() has already incremented the pm_runtime, module,
>>>>> and device references for a node:
>>>>>
>>>>> 1. In _coresight_build_path(), if kzalloc_obj() for the path node fails
>>>>> after coresight_grab_device() succeeds, coresight_drop_device() was
>>>>> never called, permanently leaking all three references.
>>>>>
>>>>> 2. In coresight_build_path(), on failure the partial path was freed with
>>>>> kfree(path) instead of coresight_release_path(path). kfree() only
>>>>> frees the coresight_path struct itself; it does not iterate path_list
>>>>> to call coresight_drop_device() and kfree() for each coresight_node
>>>>> already added by deeper recursive calls, leaking both the pm_runtime,
>>>>> module, and device references and the node memory for every element
>>>>> on the partial path.
>>>>>
>>>>> Fix both by adding coresight_drop_device() in the OOM unwind of
>>>>> _coresight_build_path(), and replacing kfree(path) with
>>>>> coresight_release_path(path) in coresight_build_path().
>>>>>
>>>>> Fixes: 32b0707a4182 ("coresight: Add try_get_module() in coresight_grab_device()")
>>>>> Fixes: b3e94405941e ("coresight: associating path with session rather than tracer")
>>>>> Signed-off-by: Jie Gan <jie.gan@oss.qualcomm.com>
>>>>> ---
>>>>> drivers/hwtracing/coresight/coresight-core.c | 6 ++++--
>>>>> 1 file changed, 4 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/drivers/hwtracing/coresight/coresight-core.c b/drivers/hwtracing/coresight/coresight-core.c
>>>>> index 46f247f73cf6..c1354ea8e11d 100644
>>>>> --- a/drivers/hwtracing/coresight/coresight-core.c
>>>>> +++ b/drivers/hwtracing/coresight/coresight-core.c
>>>>> @@ -825,8 +825,10 @@ static int _coresight_build_path(struct coresight_device *csdev,
>>>>> return ret;
>>>>> node = kzalloc_obj(struct coresight_node);
>>>>> - if (!node)
>>>>> + if (!node) {
>>>>> + coresight_drop_device(csdev);
>>>>> return -ENOMEM;
>>>>> + }
>>>>> node->csdev = csdev;
>>>>> list_add(&node->link, &path->path_list);
>>>>> @@ -851,7 +853,7 @@ struct coresight_path *coresight_build_path(struct coresight_device *source,
>>>>> rc = _coresight_build_path(source, source, sink, path);
>>>>> if (rc) {
>>>>> - kfree(path);
>>>>> + coresight_release_path(path);
>>>>> return ERR_PTR(rc);
>>>>> }
>>>>>
>>>>> ---
>>>>> base-commit: e98d21c170b01ddef366f023bbfcf6b31509fa83
>>>>> change-id: 20260513-fix-memory-leak-issue-034b4a45265e
>>>>>
>>>>> Best regards,
>>>>
>>>> Looks good to me, but sashiko is complaining:
>>>> https://sashiko.dev/#/patchset/20260513-fix-memory-leak-issue-v1-1-49822d7bc7d4%40oss.qualcomm.com
>>>>
>>>> I'm trying to understand why it's saying that, but I think the
>>>> scenario is that if there are multiple correct paths to a sink, when
>>>> one path partially fails and a second path succeeds you could get a
>>>> path_list with some garbage entries in it.
>>>
>>> I think the coresight_release_path is added to address this situation.
>>> We suffered the path partially failure, and we need release all nodes
>>> already added to the path.
>>>
>>
>> It wouldn't call coresight_release_path() in this case though. If one
>> path ends up building to success but a parallel path partially failed
>> then _coresight_build_path() still returns success. During the search it
>> would have still added the nodes from the partially failed path to the
>> path_list. This is only an issue if there are multiple correct paths.
>>
The point here is that there are multiple routes from the same source device
to the same sink device, am I right?
I have no experience with this scenario. So in that case, build_path may
succeed on one route and fail on another, but in the end
_coresight_build_path() still returns success, is that correct?
>>>>
>>>> That's kind of a different and existing issue to the one you've fixed,
>>>> and assumes that multiple paths to one sink are possible, which I'm
>>>> not sure is supported?
>>>
>>> Each path is unique. We only deal with the issue path for balancing the
>>> reference count.
>>>
>>> Thanks,
>>> Jie
>>>
>>
>> I'm not exactly sure what you mean by unique, but the same source and
>> sink could potentially be connected through two different sets of links.
>>
>
> Multiple paths between a source and sink are not permitted under the CoreSight spec.
>
As Mike mentioned, my understanding is that a source device is only
allowed to be added to one valid path—this is what I mean by “unique.”
Thanks,
Jie
> If such a system was to be built - then a fix would need to be in the declaration of connections - e.g. miss one path out in the device tree for example. Not up to the Coresight drivers to handle out of specification hardware.
>
> Mike
>
>
>>>>
>>>> It might be as easy as breaking the loop early for any return value
>>>> other than -ENODEV, but I'll leave it to you to decide whether to do
>>>> that here or not.
>>>>
>>>> Reviewed-by: James Clark <james.clark@linaro.org>
>>>>
>>>
>
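Both leaks follow the same unwind rule: every reference taken must be dropped on every error path, including for nodes already linked into the partial path. A minimal sketch under stated assumptions: a single counter stands in for the pm_runtime, module and device references, the path is built iteratively rather than recursively, and all names (fake_dev, add_node(), build_path(), release_path()) are hypothetical. It does not model the multiple-route case discussed above:

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical device: one counter stands in for the pm_runtime, module
 * and device references that coresight_grab_device() takes. */
struct fake_dev {
	int refs;
	int fail_alloc;		/* force the node allocation for this device to fail */
};

struct fake_node {
	struct fake_dev *dev;
	struct fake_node *next;
};

/* Drop the reference held by every node on the path and free the nodes. */
static void release_path(struct fake_node **path)
{
	while (*path) {
		struct fake_node *n = *path;

		*path = n->next;
		n->dev->refs--;
		free(n);
	}
}

static int add_node(struct fake_node **path, struct fake_dev *d)
{
	struct fake_node *node;

	d->refs++;				/* "grab" */
	node = d->fail_alloc ? NULL : malloc(sizeof(*node));
	if (!node) {
		d->refs--;			/* leak 1: undo the grab on OOM */
		return -ENOMEM;
	}
	node->dev = d;
	node->next = *path;
	*path = node;
	return 0;
}

/* Build a path over devs[0..n-1]; returns NULL and sets *err on failure. */
static struct fake_node *build_path(struct fake_dev *devs, int n, int *err)
{
	struct fake_node *path = NULL;

	for (int i = 0; i < n; i++) {
		*err = add_node(&path, &devs[i]);
		if (*err) {
			release_path(&path);	/* leak 2: release every node, not just the head */
			return NULL;
		}
	}
	*err = 0;
	return path;
}
```

If the error path called free() on only the head instead of release_path(), the counters of the devices added earlier would stay at 1, which mirrors the kfree(path) leak the patch fixes.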
* [PATCH v3] i2c: imx: mark I2C adapter when hardware is powered down
From: Carlos Song (OSS) @ 2026-05-20 10:15 UTC
To: o.rempel, kernel, andi.shyti, Frank.Li, s.hauer, festevam,
carlos.song, haibo.chen
Cc: linux-i2c, imx, linux-arm-kernel, linux-kernel, stable
From: Carlos Song <carlos.song@nxp.com>
Mark the I2C adapter as suspended during system suspend to block further
transfers, and resume it on system resume. This prevents potential hangs
when the hardware is powered down but clients still attempt I2C transfers.
Fixes: 358025ac091e ("i2c: imx: make controller available until system suspend_noirq() and from resume_noirq()")
Cc: stable@vger.kernel.org
Signed-off-by: Carlos Song <carlos.song@nxp.com>
---
Changes in v3:
- Add hrtimer_cancel in i2c_imx_suspend_noirq to cancel slave_timer for
safe suspend in i2c slave mode.
Changes in v2:
- Call i2c_mark_adapter_suspended() before pm_runtime_force_suspend()
to prevent potential deadlock if a transfer is active during suspend.
- Roll back with i2c_mark_adapter_resumed() if pm_runtime_force_suspend()
fails.
---
drivers/i2c/busses/i2c-imx.c | 41 ++++++++++++++++++++++++++++++++++--
1 file changed, 39 insertions(+), 2 deletions(-)
diff --git a/drivers/i2c/busses/i2c-imx.c b/drivers/i2c/busses/i2c-imx.c
index a208fefd3c3b..d651ade86267 100644
--- a/drivers/i2c/busses/i2c-imx.c
+++ b/drivers/i2c/busses/i2c-imx.c
@@ -1913,6 +1913,43 @@ static int i2c_imx_runtime_resume(struct device *dev)
return ret;
}
+static int __maybe_unused i2c_imx_suspend_noirq(struct device *dev)
+{
+ struct imx_i2c_struct *i2c_imx = dev_get_drvdata(dev);
+ int ret;
+
+ i2c_mark_adapter_suspended(&i2c_imx->adapter);
+
+ /*
+ * Cancel the slave timer before powering down to prevent
+ * i2c_imx_slave_timeout() from accessing hardware registers
+ * while the clock is disabled.
+ */
+ hrtimer_cancel(&i2c_imx->slave_timer);
+
+ ret = pm_runtime_force_suspend(dev);
+ if (ret) {
+ i2c_mark_adapter_resumed(&i2c_imx->adapter);
+ return ret;
+ }
+
+ return 0;
+}
+
+static int __maybe_unused i2c_imx_resume_noirq(struct device *dev)
+{
+ struct imx_i2c_struct *i2c_imx = dev_get_drvdata(dev);
+ int ret;
+
+ ret = pm_runtime_force_resume(dev);
+ if (ret)
+ return ret;
+
+ i2c_mark_adapter_resumed(&i2c_imx->adapter);
+
+ return 0;
+}
+
static int i2c_imx_suspend(struct device *dev)
{
/*
@@ -1946,8 +1983,8 @@ static int i2c_imx_resume(struct device *dev)
}
static const struct dev_pm_ops i2c_imx_pm_ops = {
- NOIRQ_SYSTEM_SLEEP_PM_OPS(pm_runtime_force_suspend,
- pm_runtime_force_resume)
+ NOIRQ_SYSTEM_SLEEP_PM_OPS(i2c_imx_suspend_noirq,
+ i2c_imx_resume_noirq)
SYSTEM_SLEEP_PM_OPS(i2c_imx_suspend, i2c_imx_resume)
RUNTIME_PM_OPS(i2c_imx_runtime_suspend, i2c_imx_runtime_resume, NULL)
};
--
2.43.0
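The order of operations in i2c_imx_suspend_noirq() (mark the adapter suspended, cancel the slave timer, power down, and roll back the mark if powering down fails) can be modelled as a small state machine. This is a sketch with a hypothetical struct fake_adapter; it assumes, as the I2C core does for adapters marked suspended, that transfers are rejected with -ESHUTDOWN, and -EBUSY is an arbitrary stand-in for a pm_runtime_force_suspend() failure:

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the adapter and controller state. */
struct fake_adapter {
	bool suspended;			/* models i2c_mark_adapter_suspended()/resumed() */
	bool timer_armed;		/* models the slave hrtimer */
	bool powered;
	bool force_suspend_fails;	/* inject a power-down failure */
};

static int fake_transfer(const struct fake_adapter *a)
{
	/* Transfers on a suspended adapter are rejected instead of hanging. */
	return a->suspended ? -ESHUTDOWN : 0;
}

static int fake_suspend_noirq(struct fake_adapter *a)
{
	a->suspended = true;		/* block new transfers first */
	a->timer_armed = false;		/* like hrtimer_cancel() on the slave timer */
	if (a->force_suspend_fails) {
		a->suspended = false;	/* roll back so the bus stays usable */
		return -EBUSY;
	}
	a->powered = false;
	return 0;
}

static int fake_resume_noirq(struct fake_adapter *a)
{
	a->powered = true;		/* like pm_runtime_force_resume() */
	a->suspended = false;		/* accept transfers again */
	return 0;
}
```

Marking the adapter before powering down means no transfer can start against hardware that is about to lose its clock; resuming in the opposite order means the mark is only cleared once the hardware is back.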
* Re: [PATCH] fbdev: Consistently define pci_device_ids using named initializers
From: Helge Deller @ 2026-05-20 10:11 UTC
To: Uwe Kleine-König (The Capable Hub)
Cc: Benjamin Herrenschmidt, Russell King, Andres Salomon,
Antonino Daplas, linux-fbdev, dri-devel, linux-kernel,
linux-arm-kernel, linux-geode, Markus Schneider-Pargmann
In-Reply-To: <ag1xQVCCzXkc_Ucu@monoceros>
On 5/20/26 10:46, Uwe Kleine-König (The Capable Hub) wrote:
> Hello,
>
> On Thu, Apr 30, 2026 at 01:16:36PM +0200, Uwe Kleine-König (The Capable Hub) wrote:
>> diff --git a/drivers/video/fbdev/matrox/matroxfb_base.c b/drivers/video/fbdev/matrox/matroxfb_base.c
>> index e1a4bc7c2318..22774eb1b14c 100644
>> --- a/drivers/video/fbdev/matrox/matroxfb_base.c
>> +++ b/drivers/video/fbdev/matrox/matroxfb_base.c
>> @@ -1642,7 +1642,7 @@ static int initMatrox2(struct matrox_fb_info *minfo, struct board *b)
>> int err;
>>
>> static const struct pci_device_id intel_82437[] = {
>> - { PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82437) },
>> + { PCI_VDEVICE(INTEL, PCI_DEVICE_ID_INTEL_82437) },
>> { },
>> };
>>
>
> after further sharpening my tooling there is an additional change that
> IMHO should be done here:
>
> - { },
> + { }
>
> and ...
>
>> diff --git a/drivers/video/fbdev/pvr2fb.c b/drivers/video/fbdev/pvr2fb.c
>> index 3f6384e631b1..06aefad75f46 100644
>> --- a/drivers/video/fbdev/pvr2fb.c
>> +++ b/drivers/video/fbdev/pvr2fb.c
>> @@ -993,9 +993,8 @@ static void pvr2fb_pci_remove(struct pci_dev *pdev)
>> }
>>
>> static const struct pci_device_id pvr2fb_pci_tbl[] = {
>> - { PCI_VENDOR_ID_NEC, PCI_DEVICE_ID_NEC_NEON250,
>> - PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0 },
>> - { 0, },
>> + { PCI_VDEVICE(NEC, PCI_DEVICE_ID_NEC_NEON250), },
>> + { },
>> };
>>
>> MODULE_DEVICE_TABLE(pci, pvr2fb_pci_tbl);
>
> ... here:
>
> - { PCI_VDEVICE(NEC, PCI_DEVICE_ID_NEC_NEON250), },
> + { PCI_VDEVICE(NEC, PCI_DEVICE_ID_NEC_NEON250) },
> - { },
> + { }
>
> Would you mind squashing that into the patch you already applied, maybe
> adding:
>
> While touching all these arrays, unify usage of whitespace and
> comma in a few drivers.
>
> to the commit log? I can also send a v2 of the patch with these changes
> included if that's easier for you.
>
> Otherwise I will put sending these modifications separately on my todo
> list.
No need to resend anything. I'll clean it up manually in the next few hours.
Helge
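The ID tables touched here are arrays terminated by an all-zero sentinel entry, which is presumably why nothing (not even a trailing comma) is wanted after it. A minimal sketch of how such a table is walked, using a hypothetical two-field struct in place of struct pci_device_id and made-up IDs; the real matcher also checks the subvendor, subdevice and class fields, and kernel tables spell the sentinel `{ }`:

```c
#include <stddef.h>

/* Hypothetical, cut-down stand-in for struct pci_device_id. */
struct fake_pci_id {
	unsigned int vendor;
	unsigned int device;
};

static const struct fake_pci_id fake_tbl[] = {
	{ .vendor = 0x1234, .device = 0x5678 },	/* made-up IDs */
	{ 0 }	/* sentinel: always last, so no trailing comma */
};

/* Walk the table until the all-zero sentinel. */
static const struct fake_pci_id *fake_match(const struct fake_pci_id *tbl,
					    unsigned int vendor, unsigned int device)
{
	for (; tbl->vendor; tbl++)
		if (tbl->vendor == vendor && tbl->device == device)
			return tbl;
	return NULL;
}
```

Named initializers (as produced by PCI_VDEVICE()) make each entry independent of field order, and the sentinel is what stops the walk, so an entry accidentally placed after it would never be matched.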
* Re: [PATCH] spi: aspeed: Replace VLA parameter with flat pointer in calibration helper
From: Mark Brown @ 2026-05-20 10:10 UTC
To: David Laight
Cc: Chin-Ting Kuo, clg, joel, andrew, linux-aspeed, openbmc,
linux-spi, linux-arm-kernel, linux-kernel, BMC-SW,
kernel test robot
In-Reply-To: <20260519181348.777f7dc5@pumpkin>
On Tue, May 19, 2026 at 06:13:48PM +0100, David Laight wrote:
> Mark Brown <broonie@kernel.org> wrote:
> > On Mon, May 18, 2026 at 05:57:08PM +0800, Chin-Ting Kuo wrote:
> > > - while (k < cols && buf[i][k])
> > > + while (k < cols && buf[i * cols + k])
> > This really needs () to make it clear what's going on; the precedence is
> > well defined but not everyone is going to know that off the top of their
> > head.
> Come on, it's multiply and add - everyone is going to get that right.
No, I have to stop and think. It's not just "what is the rule", it's
also "is that the same rule whoever wrote the code thought there was" -
implicit precedence is the sort of thing that flags up as an alarm bell
when scanning through code.
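Replacing the VLA parameter means buf[i][k] becomes an index into a flat row-major buffer. A minimal sketch with the parentheses spelled out, as requested in review; row_prefix_len() and its parameter types are hypothetical, not the driver's actual helper:

```c
#include <stddef.h>

/*
 * Count leading non-zero bytes in row i of a rows x cols buffer stored
 * flat (row-major). buf[i][k] in a 2-D array is buf[(i * cols) + k] here;
 * the parentheses do not change the result but state the intent.
 */
static size_t row_prefix_len(const unsigned char *buf, size_t cols, size_t i)
{
	size_t k = 0;

	while (k < cols && buf[(i * cols) + k])
		k++;
	return k;
}
```

For the 2x3 buffer {1, 2, 0, 3, 4, 5}, row 0 yields 2 (it stops at the zero) and row 1 yields 3 (it stops at the row width).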
* Re: [PATCH v2 0/5] mm: reduce mmap_lock contention and improve page fault performance
From: Lorenzo Stoakes @ 2026-05-20 10:07 UTC
To: Barry Song
Cc: Suren Baghdasaryan, Matthew Wilcox, akpm, linux-mm, david, liam,
vbabka, rppt, mhocko, jack, pfalcato, wanglian, chentao,
lianux.mm, kunwu.chan, liyangouwen1, chrisl, kasong, shikemeng,
nphamcs, bhe, youngjun.park, linux-arm-kernel, linux-kernel,
loongarch, linuxppc-dev, linux-riscv, linux-s390, Nanzhe Zhao
In-Reply-To: <CAGsJ_4zN5ezh9vvvQDQdMF2KuuDGvkhNjTZWc0y0Lsa-P4Aahw@mail.gmail.com>
On Wed, May 20, 2026 at 05:07:16PM +0800, Barry Song wrote:
> On Wed, May 20, 2026 at 3:50 PM Lorenzo Stoakes <ljs@kernel.org> wrote:
> >
> > On Wed, May 20, 2026 at 05:18:52AM +0800, Barry Song wrote:
> > > On Tue, May 19, 2026 at 8:53 PM Lorenzo Stoakes <ljs@kernel.org> wrote:
> > > >
> > > > On Mon, May 18, 2026 at 12:56:59PM -0700, Suren Baghdasaryan wrote:
> > > >
> > > > > >
> > > > > > I think we either need to fix `fork()`, or keep the current
> > > > > > behavior of dropping the VMA lock before performing I/O.
> > > > >
> > > > > I see. So, this problem arises from the fact that we are changing the
> > > > > pagefaults requiring I/O operation to hold VMA lock...
> > > > > And you want to lock VMA on fork only if vma_is_anonymous(vma) ||
> > > > > is_cow_mapping(vma->vm_flags). So, we will be blocking page faults for
> > > > > anonymous and COW VMAs only while holding mmap_write_lock, preventing
> > > > > any VMA modification. On the surface, that looks ok to me but I might
> > > > > be missing some corner cases. If nobody sees any obvious issues, I
> > > > > think it's worth a try.
> > > >
> > > > Not sure if you noticed but I did raise concerns ;)
> > > >
> > > > I wonder if you've confused the fault path and fork here, as I think Barry has
> > > > been a little unclear on that.
> > >
> > > I think I’ve been absolutely clear :-)
> >
> > On this point sure, I would argue less so around the fork stuff but I responded
> > on that specifically elsewhere so let's keep things moving :>)
> >
> > > We should either stick to the current behavior - drop
> > > the VMA lock before doing I/O, or change fork() so that it
> > > does not wait on vma_start_write().
> >
> > Again, as I said elsewhere, I think there might possibly be a 3rd way. It's a
> > big mistake to assume that there are only specific solutions to problems in the
> > kernel and then to present a false dichotomy.
>
> I recalled that when we discussed this part in my slides:
>
> ‘For simplicity, rather than using a whitelist mechanism for
> per-VMA retry, we could use a blacklist instead: default to
> always retry via the VMA lock, and only allow mmap_lock-based
> page-fault retry for specific cases such as
> __vmf_anon_prepare().’
Yeah, that's an interesting approach actually, sorry if I missed that.
>
> Suren mentioned introducing a FALLBACK flag. With the
> FALLBACK flag, we would retry via mmap_lock; with the RETRY
> flag, we would retry via the VMA lock.
Yeah, and honestly I'm beginning to wonder if we don't just have to pay the
complexity tax anyway and eat the fact we have to deal with that.
But as per Josef's comment re: this whole mechanism, simply not waiting for
file-backed I think is another option (but I don't recall where we left that
conversation actually?)
Anyway I want to make sure any complexity we add is necessary so will take a
look through patches and have a think (and obviously others will have their own
opinions!)
>
> Not sure whether this could really be called a ‘third way,’
> but it seems more like a shift from a whitelist model to a
> blacklist model, without changing the fundamental design, but
> it does change where we would need to touch the source code.
Right yeah, good to have more options.
>
> >
> > We absolutely hear you on this being a problem and it WILL be addressed one way
> > or another.
>
> Thanks. This is a bit of light in what has felt like a fairly
> dark situation. I really appreciate your thoughtful and
> responsible approach.
Yes, sorry, I was maybe a bit too harsh in my tone here; I didn't really intend
to be negative about addressing the problem as a whole.
It's more that I've been concerned about the fork approach, and that is what's led to me
being shall we say 'emphatic' about it :)
But of course I sometimes make mistakes in quite how my tone comes across, so
apologies if it came across overly negatively - I am negative (on a technical
level) about the fork approach, but not the fact we should address this.
To be clear - I'm very glad you've brought this up, it's important, as much as
it's painful that we have this issue in the first place! :)
>
> >
> > Of the two approaches, as I said elsewhere, I prefer what you've done in this
> > series to anything touching fork.
> >
> > But give me time to look through the series please (I'd also suggest RFC'ing
> > when it's something kinda fundamental that might generate conversation, makes
> > life a bit easier on the review side :)
>
> Thanks! Sure, I’m happy to wait and there’s no urgency.
>
> Last year you made quite a significant contribution to the work
> when I tried to remove mmap_lock in madvise. I really
> appreciated it. Now we’re back to the same lock again, just in
> different places.
Yeah :) one day maybe we can get rid of it altogether (maybe I'm dreaming :)
>
> Best Regards
> Barry
Cheers, Lorenzo
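The whitelist-to-blacklist shift Barry describes can be pictured as a default-plus-exception decision. A deliberately small sketch: the enum values loosely mirror the RETRY/FALLBACK flags mentioned above, but every name here (enum fault_retry, struct fault_ctx, choose_retry()) is hypothetical and not an existing mm API:

```c
#include <stdbool.h>

/* Hypothetical retry outcomes for a fault handled under the per-VMA lock. */
enum fault_retry {
	RETRY_VMA_LOCK,		/* default: retry under the per-VMA lock ("RETRY") */
	RETRY_MMAP_LOCK,	/* listed exception: fall back to mmap_lock ("FALLBACK") */
};

/* Hypothetical summary of why the fault needs retrying. */
struct fault_ctx {
	bool needs_anon_prepare;	/* e.g. the __vmf_anon_prepare() case */
};

/* Blacklist model: retry under the VMA lock unless the case is listed. */
static enum fault_retry choose_retry(const struct fault_ctx *ctx)
{
	if (ctx->needs_anon_prepare)
		return RETRY_MMAP_LOCK;
	return RETRY_VMA_LOCK;
}
```

The design difference is where new code must be touched: in a whitelist model every path that wants VMA-lock retry must opt in, while here only the paths that need mmap_lock must opt out.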
* [PATCH v3 2/6] KVM: arm64: Simplify userspace notification of interrupt state
From: Marc Zyngier @ 2026-05-20 10:01 UTC
To: kvmarm, linux-arm-kernel
Cc: Deepanshu Kartikey, Steffen Eiden, Joey Gouly, Suzuki K Poulose,
Oliver Upton, Zenghui Yu
In-Reply-To: <20260520100200.543845-1-maz@kernel.org>
The userspace notification of interrupts has a few problems:
- it is utterly pointless
- it is annoyingly split between detecting the need for notification
and the population of the interrupts in the run structure
We can't do anything about the former (yet), but the latter can be
addressed. If we detect that we must notify userspace, we know that
we are going to exit, as we populate the exit status. This means
we can also populate the interrupt state at this stage and be done
with it.
This simplifies the structure of the code.
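In the patch, kvm_timer_needs_notify() XORs the level last reported in device_irq_level with the current pending state, so the result has a bit set exactly where userspace is out of date, and kvm_timer_update_run() flips those bits. A standalone sketch of that trick, with hypothetical bit values standing in for KVM_ARM_DEV_EL1_VTIMER/PTIMER and booleans standing in for kvm_timer_pending():

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values for the two timer lines. */
#define DEV_VTIMER	(1ULL << 0)
#define DEV_PTIMER	(1ULL << 1)
#define TIMER_MASK	(DEV_VTIMER | DEV_PTIMER)

/* Bits whose reported level differs from the current pending state. */
static uint64_t timer_diff(uint64_t reported, bool vpend, bool ppend)
{
	uint64_t v = reported;

	v ^= vpend ? DEV_VTIMER : 0;
	v ^= ppend ? DEV_PTIMER : 0;
	return v & TIMER_MASK;	/* ignore bits owned by other devices */
}

/* Detect and publish in one step: flip exactly the stale bits. */
static bool update_run(uint64_t *reported, bool vpend, bool ppend)
{
	uint64_t mask = timer_diff(*reported, vpend, ppend);

	*reported ^= mask;
	return mask != 0;	/* true means userspace must be notified */
}
```

A second call with unchanged inputs returns false, because after the flip the reported bits already match the pending state, and bits outside TIMER_MASK are never modified.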
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
arch/arm64/kvm/arch_timer.c | 49 +++++++++++++++---------------------
arch/arm64/kvm/arm.c | 24 ++++++++++--------
arch/arm64/kvm/pmu-emul.c | 18 +++++--------
include/kvm/arm_arch_timer.h | 2 +-
include/kvm/arm_pmu.h | 4 +--
5 files changed, 43 insertions(+), 54 deletions(-)
diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c
index d8add34717f07..7236dd6a99e67 100644
--- a/arch/arm64/kvm/arch_timer.c
+++ b/arch/arm64/kvm/arch_timer.c
@@ -404,22 +404,30 @@ int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu)
return vcpu_has_wfit_active(vcpu) && wfit_delay_ns(vcpu) == 0;
}
+static u64 kvm_timer_needs_notify(struct kvm_vcpu *vcpu)
+{
+ u64 v = vcpu->run->s.regs.device_irq_level;
+
+ v ^= kvm_timer_pending(vcpu_vtimer(vcpu)) ? KVM_ARM_DEV_EL1_VTIMER : 0;
+ v ^= kvm_timer_pending(vcpu_ptimer(vcpu)) ? KVM_ARM_DEV_EL1_PTIMER : 0;
+
+ return v & (KVM_ARM_DEV_EL1_VTIMER | KVM_ARM_DEV_EL1_PTIMER);
+}
+
+bool kvm_timer_should_notify_user(struct kvm_vcpu *vcpu)
+{
+ return !!kvm_timer_needs_notify(vcpu);
+}
+
/*
* Reflect the timer output level into the kvm_run structure
*/
-void kvm_timer_update_run(struct kvm_vcpu *vcpu)
+bool kvm_timer_update_run(struct kvm_vcpu *vcpu)
{
- struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
- struct arch_timer_context *ptimer = vcpu_ptimer(vcpu);
- struct kvm_sync_regs *regs = &vcpu->run->s.regs;
-
- /* Populate the device bitmap with the timer states */
- regs->device_irq_level &= ~(KVM_ARM_DEV_EL1_VTIMER |
- KVM_ARM_DEV_EL1_PTIMER);
- if (kvm_timer_pending(vtimer))
- regs->device_irq_level |= KVM_ARM_DEV_EL1_VTIMER;
- if (kvm_timer_pending(ptimer))
- regs->device_irq_level |= KVM_ARM_DEV_EL1_PTIMER;
+ u64 mask = kvm_timer_needs_notify(vcpu);
+ if (mask)
+ vcpu->run->s.regs.device_irq_level ^= mask;
+ return !!mask;
}
static void kvm_timer_update_status(struct arch_timer_context *ctx, bool level)
@@ -903,23 +911,6 @@ void kvm_timer_vcpu_load(struct kvm_vcpu *vcpu)
timer_set_traps(vcpu, &map);
}
-bool kvm_timer_should_notify_user(struct kvm_vcpu *vcpu)
-{
- struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
- struct arch_timer_context *ptimer = vcpu_ptimer(vcpu);
- struct kvm_sync_regs *sregs = &vcpu->run->s.regs;
- bool vlevel, plevel;
-
- if (likely(irqchip_in_kernel(vcpu->kvm)))
- return false;
-
- vlevel = sregs->device_irq_level & KVM_ARM_DEV_EL1_VTIMER;
- plevel = sregs->device_irq_level & KVM_ARM_DEV_EL1_PTIMER;
-
- return kvm_timer_pending(vtimer) != vlevel ||
- kvm_timer_pending(ptimer) != plevel;
-}
-
void kvm_timer_vcpu_put(struct kvm_vcpu *vcpu)
{
struct arch_timer_cpu *timer = vcpu_timer(vcpu);
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 8bb2c7422cc8b..6e6dc17f8b606 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -1163,6 +1163,15 @@ static bool vcpu_mode_is_bad_32bit(struct kvm_vcpu *vcpu)
return !kvm_supports_32bit_el0();
}
+static bool kvm_irq_update_run(struct kvm_vcpu *vcpu)
+{
+ bool r;
+
+ r = kvm_timer_update_run(vcpu);
+ r |= kvm_pmu_update_run(vcpu);
+ return r;
+}
+
/**
* kvm_vcpu_exit_request - returns true if the VCPU should *not* enter the guest
* @vcpu: The VCPU pointer
@@ -1184,13 +1193,11 @@ static bool kvm_vcpu_exit_request(struct kvm_vcpu *vcpu, int *ret)
/*
* If we're using a userspace irqchip, then check if we need
* to tell a userspace irqchip about timer or PMU level
- * changes and if so, exit to userspace (the actual level
- * state gets updated in kvm_timer_update_run and
- * kvm_pmu_update_run below).
+ * changes and if so, exit to userspace while updating the run
+ * state.
*/
if (unlikely(!irqchip_in_kernel(vcpu->kvm))) {
- if (kvm_timer_should_notify_user(vcpu) ||
- kvm_pmu_should_notify_user(vcpu)) {
+ if (unlikely(kvm_irq_update_run(vcpu))) {
*ret = -EINTR;
run->exit_reason = KVM_EXIT_INTR;
return true;
@@ -1405,11 +1412,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
ret = handle_exit(vcpu, ret);
}
- /* Tell userspace about in-kernel device output levels */
- if (unlikely(!irqchip_in_kernel(vcpu->kvm))) {
- kvm_timer_update_run(vcpu);
- kvm_pmu_update_run(vcpu);
- }
+ if (unlikely(!irqchip_in_kernel(vcpu->kvm)))
+ kvm_irq_update_run(vcpu);
kvm_sigset_deactivate(vcpu);
diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c
index e1860acae641f..31a472a2c4881 100644
--- a/arch/arm64/kvm/pmu-emul.c
+++ b/arch/arm64/kvm/pmu-emul.c
@@ -413,27 +413,21 @@ static void kvm_pmu_update_state(struct kvm_vcpu *vcpu)
bool kvm_pmu_should_notify_user(struct kvm_vcpu *vcpu)
{
- struct kvm_pmu *pmu = &vcpu->arch.pmu;
struct kvm_sync_regs *sregs = &vcpu->run->s.regs;
bool run_level = sregs->device_irq_level & KVM_ARM_DEV_PMU;
- if (likely(irqchip_in_kernel(vcpu->kvm)))
- return false;
-
- return pmu->irq_level != run_level;
+ return kvm_pmu_overflow_status(vcpu) != run_level;
}
/*
* Reflect the PMU overflow interrupt output level into the kvm_run structure
*/
-void kvm_pmu_update_run(struct kvm_vcpu *vcpu)
+bool kvm_pmu_update_run(struct kvm_vcpu *vcpu)
{
- struct kvm_sync_regs *regs = &vcpu->run->s.regs;
-
- /* Populate the timer bitmap for user space */
- regs->device_irq_level &= ~KVM_ARM_DEV_PMU;
- if (vcpu->arch.pmu.irq_level)
- regs->device_irq_level |= KVM_ARM_DEV_PMU;
+ bool update = kvm_pmu_should_notify_user(vcpu);
+ if (update)
+ vcpu->run->s.regs.device_irq_level ^= KVM_ARM_DEV_PMU;
+ return update;
}
/**
diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
index bf8cc9589bd09..9e4076eebd29f 100644
--- a/include/kvm/arm_arch_timer.h
+++ b/include/kvm/arm_arch_timer.h
@@ -104,7 +104,7 @@ void kvm_timer_vcpu_init(struct kvm_vcpu *vcpu);
void kvm_timer_sync_nested(struct kvm_vcpu *vcpu);
void kvm_timer_sync_user(struct kvm_vcpu *vcpu);
bool kvm_timer_should_notify_user(struct kvm_vcpu *vcpu);
-void kvm_timer_update_run(struct kvm_vcpu *vcpu);
+bool kvm_timer_update_run(struct kvm_vcpu *vcpu);
void kvm_timer_vcpu_terminate(struct kvm_vcpu *vcpu);
void kvm_timer_init_vm(struct kvm *kvm);
diff --git a/include/kvm/arm_pmu.h b/include/kvm/arm_pmu.h
index 0a36a3d5c8944..3e844c5ee9174 100644
--- a/include/kvm/arm_pmu.h
+++ b/include/kvm/arm_pmu.h
@@ -54,7 +54,7 @@ void kvm_pmu_reprogram_counter_mask(struct kvm_vcpu *vcpu, u64 val);
void kvm_pmu_flush_hwstate(struct kvm_vcpu *vcpu);
void kvm_pmu_sync_hwstate(struct kvm_vcpu *vcpu);
bool kvm_pmu_should_notify_user(struct kvm_vcpu *vcpu);
-void kvm_pmu_update_run(struct kvm_vcpu *vcpu);
+bool kvm_pmu_update_run(struct kvm_vcpu *vcpu);
void kvm_pmu_software_increment(struct kvm_vcpu *vcpu, u64 val);
void kvm_pmu_handle_pmcr(struct kvm_vcpu *vcpu, u64 val);
void kvm_pmu_set_counter_event_type(struct kvm_vcpu *vcpu, u64 data,
@@ -131,7 +131,7 @@ static inline bool kvm_pmu_should_notify_user(struct kvm_vcpu *vcpu)
{
return false;
}
-static inline void kvm_pmu_update_run(struct kvm_vcpu *vcpu) {}
+static inline bool kvm_pmu_update_run(struct kvm_vcpu *vcpu) { return false; }
static inline void kvm_pmu_software_increment(struct kvm_vcpu *vcpu, u64 val) {}
static inline void kvm_pmu_handle_pmcr(struct kvm_vcpu *vcpu, u64 val) {}
static inline void kvm_pmu_set_counter_event_type(struct kvm_vcpu *vcpu,
--
2.47.3
^ permalink raw reply related
* [PATCH v3 0/6] KVM: arm64: Don't perform vgic-v2 lazy init on timer injection
From: Marc Zyngier @ 2026-05-20 10:01 UTC (permalink / raw)
To: kvmarm, linux-arm-kernel
Cc: Deepanshu Kartikey, Steffen Eiden, Joey Gouly, Suzuki K Poulose,
Oliver Upton, Zenghui Yu
This is the third version of this series aiming at fixing issues with
vgic-v2 being initialised from non-preemptible context.
* From v2 [2]:
- Remove the PMU's irq level cache which was hiding in plain sight
- Simplify the userspace notification of interrupt level update
- Additional comment clarification in patch #1
- Collected RB, with thanks
* From v1 [1]:
- Repaint kvm_timer_irq_can_fire() to kvm_timer_enabled()
- Drop duplicate kvm_timer_update_status() call
- Force lazy init on the irqfd slow-path for SPIs
[1] https://lore.kernel.org/r/20260417124612.2770268-1-maz@kernel.org
[2] https://lore.kernel.org/r/20260422100210.3008156-1-maz@kernel.org
Marc Zyngier (6):
KVM: arm64: timer: Repaint kvm_timer_{should,irq_can}_fire() to
kvm_timer_{pending,enabled}()
KVM: arm64: Simplify userspace notification of interrupt state
KVM: arm64: timer: Kill the per-timer irq level cache
KVM: arm64: pmu: Kill the PMU interrupt level cache
KVM: arm64: vgic-v2: Force vgic init on injection outside the run loop
KVM: arm64: vgic-v2: Don't init the vgic on in-kernel interrupt
injection
arch/arm64/kvm/arch_timer.c | 106 ++++++++++++++-----------------
arch/arm64/kvm/arm.c | 39 ++++++++----
arch/arm64/kvm/pmu-emul.c | 31 +++------
arch/arm64/kvm/vgic/vgic-irqfd.c | 6 ++
arch/arm64/kvm/vgic/vgic.c | 6 +-
include/kvm/arm_arch_timer.h | 7 +-
include/kvm/arm_pmu.h | 5 +-
7 files changed, 94 insertions(+), 106 deletions(-)
--
2.47.3
^ permalink raw reply
* [PATCH v3 4/6] KVM: arm64: pmu: Kill the PMU interrupt level cache
From: Marc Zyngier @ 2026-05-20 10:01 UTC (permalink / raw)
To: kvmarm, linux-arm-kernel
Cc: Deepanshu Kartikey, Steffen Eiden, Joey Gouly, Suzuki K Poulose,
Oliver Upton, Zenghui Yu
In-Reply-To: <20260520100200.543845-1-maz@kernel.org>
Just like the timer, the PMU has an interrupt cache that serves little
purpose. Drop it.
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
arch/arm64/kvm/pmu-emul.c | 13 +++----------
include/kvm/arm_pmu.h | 1 -
2 files changed, 3 insertions(+), 11 deletions(-)
diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c
index 31a472a2c4881..edb21239478a9 100644
--- a/arch/arm64/kvm/pmu-emul.c
+++ b/arch/arm64/kvm/pmu-emul.c
@@ -396,19 +396,12 @@ static bool kvm_pmu_overflow_status(struct kvm_vcpu *vcpu)
static void kvm_pmu_update_state(struct kvm_vcpu *vcpu)
{
struct kvm_pmu *pmu = &vcpu->arch.pmu;
- bool overflow;
- overflow = kvm_pmu_overflow_status(vcpu);
- if (pmu->irq_level == overflow)
+ if (unlikely(!irqchip_in_kernel(vcpu->kvm)))
return;
- pmu->irq_level = overflow;
-
- if (likely(irqchip_in_kernel(vcpu->kvm))) {
- int ret = kvm_vgic_inject_irq(vcpu->kvm, vcpu,
- pmu->irq_num, overflow, pmu);
- WARN_ON(ret);
- }
+ WARN_ON(kvm_vgic_inject_irq(vcpu->kvm, vcpu, pmu->irq_num,
+ kvm_pmu_overflow_status(vcpu), pmu));
}
bool kvm_pmu_should_notify_user(struct kvm_vcpu *vcpu)
diff --git a/include/kvm/arm_pmu.h b/include/kvm/arm_pmu.h
index 3e844c5ee9174..b5e5942204fc6 100644
--- a/include/kvm/arm_pmu.h
+++ b/include/kvm/arm_pmu.h
@@ -32,7 +32,6 @@ struct kvm_pmu {
struct kvm_pmc pmc[KVM_ARMV8_PMU_MAX_COUNTERS];
int irq_num;
bool created;
- bool irq_level;
};
struct arm_pmu_entry {
--
2.47.3
^ permalink raw reply related
* [PATCH v3 5/6] KVM: arm64: vgic-v2: Force vgic init on injection outside the run loop
From: Marc Zyngier @ 2026-05-20 10:01 UTC (permalink / raw)
To: kvmarm, linux-arm-kernel
Cc: Deepanshu Kartikey, Steffen Eiden, Joey Gouly, Suzuki K Poulose,
Oliver Upton, Zenghui Yu
In-Reply-To: <20260520100200.543845-1-maz@kernel.org>
Make sure that any attempt to inject an interrupt from userspace
or an irqfd triggers the GICv2 lazy init.
This is not currently necessary as the init is also performed on
*any* interrupt injection. But as we're about to remove that,
let's introduce it here.
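The ordering used on the userspace/irqfd paths is simple: reject with -ENXIO if there is no in-kernel irqchip, otherwise run the (idempotent) lazy init and only then inject. The following is a toy model of that ordering. struct vm, lazy_init() and user_inject() are hypothetical simplifications and not the KVM types. lazy_init() merely stands in for vgic_lazy_init().

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified VM state; not the kernel's struct kvm. */
struct vm {
	bool in_kernel_irqchip;
	bool vgic_ready;
	int init_calls;
};

/* Stand-in for vgic_lazy_init(): initialise on first call, no-op afterwards. */
static int lazy_init(struct vm *vm)
{
	if (!vm->vgic_ready) {
		vm->init_calls++;
		vm->vgic_ready = true;
	}
	return 0;
}

/* Userspace/irqfd injection path: force the lazy init before injecting. */
static int user_inject(struct vm *vm)
{
	int ret;

	if (!vm->in_kernel_irqchip)
		return -ENXIO;

	ret = lazy_init(vm);
	if (ret)
		return ret;

	/* kvm_vgic_inject_irq() would follow here. */
	return 0;
}
```

Because the init is idempotent, repeated injections only pay for a flag check after the first one.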
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
arch/arm64/kvm/arm.c | 15 +++++++++++++--
arch/arm64/kvm/vgic/vgic-irqfd.c | 6 ++++++
2 files changed, 19 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 6e6dc17f8b606..cfb7921fc7d75 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -51,6 +51,7 @@
#include <linux/irqchip/arm-gic-v5.h>
+#include "vgic/vgic.h"
#include "sys_regs.h"
static enum kvm_mode kvm_mode = KVM_MODE_DEFAULT;
@@ -1497,8 +1498,13 @@ int kvm_vm_ioctl_irq_line(struct kvm *kvm, struct kvm_irq_level *irq_level,
return vcpu_interrupt_line(vcpu, irq_num, level);
case KVM_ARM_IRQ_TYPE_PPI:
- if (!irqchip_in_kernel(kvm))
+ if (irqchip_in_kernel(kvm)) {
+ int ret = vgic_lazy_init(kvm);
+ if (ret)
+ return ret;
+ } else {
return -ENXIO;
+ }
vcpu = kvm_get_vcpu_by_id(kvm, vcpu_id);
if (!vcpu)
@@ -1525,8 +1531,13 @@ int kvm_vm_ioctl_irq_line(struct kvm *kvm, struct kvm_irq_level *irq_level,
return kvm_vgic_inject_irq(kvm, vcpu, irq_num, level, NULL);
case KVM_ARM_IRQ_TYPE_SPI:
- if (!irqchip_in_kernel(kvm))
+ if (irqchip_in_kernel(kvm)) {
+ int ret = vgic_lazy_init(kvm);
+ if (ret)
+ return ret;
+ } else {
return -ENXIO;
+ }
if (vgic_is_v5(kvm)) {
/* Build a GICv5-style IntID here */
diff --git a/arch/arm64/kvm/vgic/vgic-irqfd.c b/arch/arm64/kvm/vgic/vgic-irqfd.c
index b9b86e3a6c862..19a1094536e6a 100644
--- a/arch/arm64/kvm/vgic/vgic-irqfd.c
+++ b/arch/arm64/kvm/vgic/vgic-irqfd.c
@@ -20,9 +20,15 @@ static int vgic_irqfd_set_irq(struct kvm_kernel_irq_routing_entry *e,
int level, bool line_status)
{
unsigned int spi_id = e->irqchip.pin + VGIC_NR_PRIVATE_IRQS;
+ int ret;
if (!vgic_valid_spi(kvm, spi_id))
return -EINVAL;
+
+ ret = vgic_lazy_init(kvm);
+ if (ret)
+ return ret;
+
return kvm_vgic_inject_irq(kvm, NULL, spi_id, level, NULL);
}
--
2.47.3
^ permalink raw reply related
* [PATCH v3 1/6] KVM: arm64: timer: Repaint kvm_timer_{should,irq_can}_fire() to kvm_timer_{pending,enabled}()
From: Marc Zyngier @ 2026-05-20 10:01 UTC (permalink / raw)
To: kvmarm, linux-arm-kernel
Cc: Deepanshu Kartikey, Steffen Eiden, Joey Gouly, Suzuki K Poulose,
Oliver Upton, Zenghui Yu
In-Reply-To: <20260520100200.543845-1-maz@kernel.org>
kvm_timer_should_fire() seems to date back to a time when the author
of the timer code hadn't yet made the word "pending" part of
their vocabulary.
Having since slightly improved on that front, let's rename this predicate
to kvm_timer_pending(), which clearly indicates whether the timer
interrupt is pending or not.
Similarly, kvm_timer_irq_can_fire() is renamed to kvm_timer_enabled().
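The two renamed predicates can be modelled as follows. This is a simplified sketch based on the architectural generic timer control register (ENABLE is bit 0, IMASK is bit 1) and a "compare value reached" check. struct timer is a hypothetical stand-in for arch_timer_context. The real kvm_timer_pending() also has a path for loaded timers that reads the hardware state, and this sketch omits it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural CNT{V,P}_CTL_EL0 bits: ENABLE (bit 0), IMASK (bit 1). */
#define CTL_ENABLE (1u << 0)
#define CTL_IMASK  (1u << 1)

/* Hypothetical, simplified stand-in for struct arch_timer_context. */
struct timer {
	uint32_t ctl;
	uint64_t cval;
};

/* "enabled": the timer is switched on and its interrupt is not masked. */
static bool timer_enabled(const struct timer *t)
{
	return t && (t->ctl & (CTL_ENABLE | CTL_IMASK)) == CTL_ENABLE;
}

/* "pending": enabled, and the counter has reached the compare value. */
static bool timer_pending(const struct timer *t, uint64_t now)
{
	return timer_enabled(t) && t->cval <= now;
}
```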
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
arch/arm64/kvm/arch_timer.c | 55 ++++++++++++++++++-------------------
1 file changed, 27 insertions(+), 28 deletions(-)
diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c
index cbea4d9ee9552..d8add34717f07 100644
--- a/arch/arm64/kvm/arch_timer.c
+++ b/arch/arm64/kvm/arch_timer.c
@@ -39,10 +39,9 @@ static const u8 default_ppi[] = {
[TIMER_HVTIMER] = 28,
};
-static bool kvm_timer_irq_can_fire(struct arch_timer_context *timer_ctx);
static void kvm_timer_update_irq(struct kvm_vcpu *vcpu, bool new_level,
struct arch_timer_context *timer_ctx);
-static bool kvm_timer_should_fire(struct arch_timer_context *timer_ctx);
+static bool kvm_timer_pending(struct arch_timer_context *timer_ctx);
static void kvm_arm_timer_write(struct kvm_vcpu *vcpu,
struct arch_timer_context *timer,
enum kvm_arch_timer_regs treg,
@@ -224,7 +223,7 @@ static irqreturn_t kvm_arch_timer_handler(int irq, void *dev_id)
else
ctx = map.direct_ptimer;
- if (kvm_timer_should_fire(ctx))
+ if (kvm_timer_pending(ctx))
kvm_timer_update_irq(vcpu, true, ctx);
if (userspace_irqchip(vcpu->kvm) &&
@@ -257,7 +256,7 @@ static u64 kvm_timer_compute_delta(struct arch_timer_context *timer_ctx)
return kvm_counter_compute_delta(timer_ctx, timer_get_cval(timer_ctx));
}
-static bool kvm_timer_irq_can_fire(struct arch_timer_context *timer_ctx)
+static bool kvm_timer_enabled(struct arch_timer_context *timer_ctx)
{
WARN_ON(timer_ctx && timer_ctx->loaded);
return timer_ctx &&
@@ -294,7 +293,7 @@ static u64 kvm_timer_earliest_exp(struct kvm_vcpu *vcpu)
struct arch_timer_context *ctx = &vcpu->arch.timer_cpu.timers[i];
WARN(ctx->loaded, "timer %d loaded\n", i);
- if (kvm_timer_irq_can_fire(ctx))
+ if (kvm_timer_enabled(ctx))
min_delta = min(min_delta, kvm_timer_compute_delta(ctx));
}
@@ -358,7 +357,7 @@ static enum hrtimer_restart kvm_hrtimer_expire(struct hrtimer *hrt)
return HRTIMER_NORESTART;
}
-static bool kvm_timer_should_fire(struct arch_timer_context *timer_ctx)
+static bool kvm_timer_pending(struct arch_timer_context *timer_ctx)
{
enum kvm_arch_timers index;
u64 cval, now;
@@ -391,7 +390,7 @@ static bool kvm_timer_should_fire(struct arch_timer_context *timer_ctx)
!(cnt_ctl & ARCH_TIMER_CTRL_IT_MASK);
}
- if (!kvm_timer_irq_can_fire(timer_ctx))
+ if (!kvm_timer_enabled(timer_ctx))
return false;
cval = timer_get_cval(timer_ctx);
@@ -417,9 +416,9 @@ void kvm_timer_update_run(struct kvm_vcpu *vcpu)
/* Populate the device bitmap with the timer states */
regs->device_irq_level &= ~(KVM_ARM_DEV_EL1_VTIMER |
KVM_ARM_DEV_EL1_PTIMER);
- if (kvm_timer_should_fire(vtimer))
+ if (kvm_timer_pending(vtimer))
regs->device_irq_level |= KVM_ARM_DEV_EL1_VTIMER;
- if (kvm_timer_should_fire(ptimer))
+ if (kvm_timer_pending(ptimer))
regs->device_irq_level |= KVM_ARM_DEV_EL1_PTIMER;
}
@@ -473,21 +472,21 @@ static void kvm_timer_update_irq(struct kvm_vcpu *vcpu, bool new_level,
/* Only called for a fully emulated timer */
static void timer_emulate(struct arch_timer_context *ctx)
{
- bool should_fire = kvm_timer_should_fire(ctx);
+ bool pending = kvm_timer_pending(ctx);
- trace_kvm_timer_emulate(ctx, should_fire);
+ trace_kvm_timer_emulate(ctx, pending);
- if (should_fire != ctx->irq.level)
- kvm_timer_update_irq(timer_context_to_vcpu(ctx), should_fire, ctx);
+ if (pending != ctx->irq.level)
+ kvm_timer_update_irq(timer_context_to_vcpu(ctx), pending, ctx);
- kvm_timer_update_status(ctx, should_fire);
+ kvm_timer_update_status(ctx, pending);
/*
- * If the timer can fire now, we don't need to have a soft timer
- * scheduled for the future. If the timer cannot fire at all,
- * then we also don't need a soft timer.
+ * If the timer is pending, we don't need to have a soft timer
+ * scheduled for the future. If the timer is disabled, then
+ * we don't need a soft timer either.
*/
- if (should_fire || !kvm_timer_irq_can_fire(ctx))
+ if (pending || !kvm_timer_enabled(ctx))
return;
soft_timer_start(&ctx->hrtimer, kvm_timer_compute_delta(ctx));
@@ -594,10 +593,10 @@ static void kvm_timer_blocking(struct kvm_vcpu *vcpu)
* If no timers are capable of raising interrupts (disabled or
* masked), then there's no more work for us to do.
*/
- if (!kvm_timer_irq_can_fire(map.direct_vtimer) &&
- !kvm_timer_irq_can_fire(map.direct_ptimer) &&
- !kvm_timer_irq_can_fire(map.emul_vtimer) &&
- !kvm_timer_irq_can_fire(map.emul_ptimer) &&
+ if (!kvm_timer_enabled(map.direct_vtimer) &&
+ !kvm_timer_enabled(map.direct_ptimer) &&
+ !kvm_timer_enabled(map.emul_vtimer) &&
+ !kvm_timer_enabled(map.emul_ptimer) &&
!vcpu_has_wfit_active(vcpu))
return;
@@ -685,7 +684,7 @@ static void kvm_timer_vcpu_load_gic(struct arch_timer_context *ctx)
* this point and the register restoration, we'll take the
* interrupt anyway.
*/
- kvm_timer_update_irq(vcpu, kvm_timer_should_fire(ctx), ctx);
+ kvm_timer_update_irq(vcpu, kvm_timer_pending(ctx), ctx);
if (irqchip_in_kernel(vcpu->kvm))
phys_active = kvm_vgic_map_is_active(vcpu, timer_irq(ctx));
@@ -706,7 +705,7 @@ static void kvm_timer_vcpu_load_nogic(struct kvm_vcpu *vcpu)
* this point and the register restoration, we'll take the
* interrupt anyway.
*/
- kvm_timer_update_irq(vcpu, kvm_timer_should_fire(vtimer), vtimer);
+ kvm_timer_update_irq(vcpu, kvm_timer_pending(vtimer), vtimer);
/*
* When using a userspace irqchip with the architected timers and a
@@ -917,8 +916,8 @@ bool kvm_timer_should_notify_user(struct kvm_vcpu *vcpu)
vlevel = sregs->device_irq_level & KVM_ARM_DEV_EL1_VTIMER;
plevel = sregs->device_irq_level & KVM_ARM_DEV_EL1_PTIMER;
- return kvm_timer_should_fire(vtimer) != vlevel ||
- kvm_timer_should_fire(ptimer) != plevel;
+ return kvm_timer_pending(vtimer) != vlevel ||
+ kvm_timer_pending(ptimer) != plevel;
}
void kvm_timer_vcpu_put(struct kvm_vcpu *vcpu)
@@ -1006,7 +1005,7 @@ static void unmask_vtimer_irq_user(struct kvm_vcpu *vcpu)
{
struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
- if (!kvm_timer_should_fire(vtimer)) {
+ if (!kvm_timer_pending(vtimer)) {
kvm_timer_update_irq(vcpu, false, vtimer);
if (static_branch_likely(&has_gic_active_state))
set_timer_irq_phys_active(vtimer, false);
@@ -1579,7 +1578,7 @@ static bool kvm_arch_timer_get_input_level(int vintid)
ctx = vcpu_get_timer(vcpu, i);
if (timer_irq(ctx) == vintid)
- return kvm_timer_should_fire(ctx);
+ return kvm_timer_pending(ctx);
}
/* A timer IRQ has fired, but no matching timer was found? */
--
2.47.3
^ permalink raw reply related
* [PATCH v3 6/6] KVM: arm64: vgic-v2: Don't init the vgic on in-kernel interrupt injection
From: Marc Zyngier @ 2026-05-20 10:02 UTC (permalink / raw)
To: kvmarm, linux-arm-kernel
Cc: Deepanshu Kartikey, Steffen Eiden, Joey Gouly, Suzuki K Poulose,
Oliver Upton, Zenghui Yu
In-Reply-To: <20260520100200.543845-1-maz@kernel.org>
We now have the lazy init on three paths:
- on first run of a vcpu
- on first injection of an interrupt from userspace and irqfd
- on first injection of an interrupt from kernel space as
part of the device emulation (timers, PMU, vgic MI)
Given that we recompute the state of each in-kernel interrupt
every time we are about to enter the guest, we can drop the lazy
init from the kernel injection path.
This solves a bunch of issues related to vgic_lazy_init() being called
in non-preemptible context, such as vcpu reset.
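The following is a toy model of why silently dropping an early in-kernel injection is safe. It rests on the assumption stated above: every in-kernel interrupt level is recomputed on each guest entry, and by then the first-run lazy init has already happened. struct vm, kernel_inject() and vcpu_enter() are hypothetical simplifications. Setting vgic_ready in vcpu_enter() stands in for vgic_lazy_init() on first vcpu run.

```c
#include <stdbool.h>

/* Hypothetical model: one in-kernel interrupt line and the vgic's view of it. */
struct vm {
	bool vgic_ready;
	bool vgic_level;	/* level as seen by the emulated GIC */
};

/* In-kernel injection: drop the update (returning success) until the vgic exists. */
static int kernel_inject(struct vm *vm, bool level)
{
	if (!vm->vgic_ready)
		return 0;
	vm->vgic_level = level;
	return 0;
}

/* Guest entry: init has happened and the device level is recomputed, so nothing is lost. */
static void vcpu_enter(struct vm *vm, bool device_level)
{
	vm->vgic_ready = true;	/* stand-in for vgic_lazy_init() on first run */
	kernel_inject(vm, device_level);
}
```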
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
arch/arm64/kvm/vgic/vgic.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/arch/arm64/kvm/vgic/vgic.c b/arch/arm64/kvm/vgic/vgic.c
index 1e9fe8764584d..9e29f03d3463c 100644
--- a/arch/arm64/kvm/vgic/vgic.c
+++ b/arch/arm64/kvm/vgic/vgic.c
@@ -534,11 +534,9 @@ int kvm_vgic_inject_irq(struct kvm *kvm, struct kvm_vcpu *vcpu,
{
struct vgic_irq *irq;
unsigned long flags;
- int ret;
- ret = vgic_lazy_init(kvm);
- if (ret)
- return ret;
+ if (unlikely(!vgic_initialized(kvm)))
+ return 0;
if (!vcpu && irq_is_private(kvm, intid))
return -EINVAL;
--
2.47.3
^ permalink raw reply related
* [PATCH v3 3/6] KVM: arm64: timer: Kill the per-timer irq level cache
From: Marc Zyngier @ 2026-05-20 10:01 UTC (permalink / raw)
To: kvmarm, linux-arm-kernel
Cc: Deepanshu Kartikey, Steffen Eiden, Joey Gouly, Suzuki K Poulose,
Oliver Upton, Zenghui Yu
In-Reply-To: <20260520100200.543845-1-maz@kernel.org>
The timer code makes use of a per-timer irq level cache, which
looks like a very minor optimisation to avoid taking a lock upon
updating the GIC view of the interrupt when it is unchanged from
the previous state.
This is getting in the way of more important correctness issues,
so get rid of the cache, which also simplifies a couple of minor things.
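The next sketch contrasts the removed caching scheme with unconditional propagation. It illustrates one plausible way a level cache can go stale. It assumes that the GIC side can lose an update, for example when in-kernel injection becomes a no-op before the vgic is initialised, as in patch 6/6 of this series. The commit message does not spell out which correctness issues it has in mind. struct irq_line and both helpers are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical line model; gic_updates counts "take the vgic lock and update". */
struct irq_line {
	bool cached_level;	/* what the removed cache used to hold */
	bool gic_level;
	int gic_updates;
};

/* Old style: skip the GIC update when the cached level is unchanged. */
static void update_cached(struct irq_line *l, bool level)
{
	if (l->cached_level == level)
		return;
	l->cached_level = level;
	l->gic_level = level;
	l->gic_updates++;
}

/* New style: always propagate the level; the GIC view cannot drift from the source. */
static void update_uncached(struct irq_line *l, bool level)
{
	l->gic_level = level;
	l->gic_updates++;
}
```

The trade-off is one extra GIC update (and lock round-trip) per unchanged level, in exchange for never trusting a copy that may have diverged.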
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
arch/arm64/kvm/arch_timer.c | 20 +++++++++-----------
include/kvm/arm_arch_timer.h | 5 -----
2 files changed, 9 insertions(+), 16 deletions(-)
diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c
index 7236dd6a99e67..c3b8257888e89 100644
--- a/arch/arm64/kvm/arch_timer.c
+++ b/arch/arm64/kvm/arch_timer.c
@@ -453,9 +453,8 @@ static void kvm_timer_update_irq(struct kvm_vcpu *vcpu, bool new_level,
{
kvm_timer_update_status(timer_ctx, new_level);
- timer_ctx->irq.level = new_level;
trace_kvm_timer_update_irq(vcpu->vcpu_id, timer_irq(timer_ctx),
- timer_ctx->irq.level);
+ new_level);
if (userspace_irqchip(vcpu->kvm))
return;
@@ -473,7 +472,7 @@ static void kvm_timer_update_irq(struct kvm_vcpu *vcpu, bool new_level,
kvm_vgic_inject_irq(vcpu->kvm, vcpu,
timer_irq(timer_ctx),
- timer_ctx->irq.level,
+ new_level,
timer_ctx);
}
@@ -484,10 +483,7 @@ static void timer_emulate(struct arch_timer_context *ctx)
trace_kvm_timer_emulate(ctx, pending);
- if (pending != ctx->irq.level)
- kvm_timer_update_irq(timer_context_to_vcpu(ctx), pending, ctx);
-
- kvm_timer_update_status(ctx, pending);
+ kvm_timer_update_irq(timer_context_to_vcpu(ctx), pending, ctx);
/*
* If the timer is pending, we don't need to have a soft timer
@@ -684,6 +680,7 @@ static inline void set_timer_irq_phys_active(struct arch_timer_context *ctx, boo
static void kvm_timer_vcpu_load_gic(struct arch_timer_context *ctx)
{
struct kvm_vcpu *vcpu = timer_context_to_vcpu(ctx);
+ bool pending = kvm_timer_pending(ctx);
bool phys_active = false;
/*
@@ -692,12 +689,12 @@ static void kvm_timer_vcpu_load_gic(struct arch_timer_context *ctx)
* this point and the register restoration, we'll take the
* interrupt anyway.
*/
- kvm_timer_update_irq(vcpu, kvm_timer_pending(ctx), ctx);
+ kvm_timer_update_irq(vcpu, pending, ctx);
if (irqchip_in_kernel(vcpu->kvm))
phys_active = kvm_vgic_map_is_active(vcpu, timer_irq(ctx));
- phys_active |= ctx->irq.level;
+ phys_active |= pending;
phys_active |= vgic_is_v5(vcpu->kvm);
set_timer_irq_phys_active(ctx, phys_active);
@@ -706,6 +703,7 @@ static void kvm_timer_vcpu_load_gic(struct arch_timer_context *ctx)
static void kvm_timer_vcpu_load_nogic(struct kvm_vcpu *vcpu)
{
struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
+ bool pending = kvm_timer_pending(vtimer);
/*
* Update the timer output so that it is likely to match the
@@ -713,7 +711,7 @@ static void kvm_timer_vcpu_load_nogic(struct kvm_vcpu *vcpu)
* this point and the register restoration, we'll take the
* interrupt anyway.
*/
- kvm_timer_update_irq(vcpu, kvm_timer_pending(vtimer), vtimer);
+ kvm_timer_update_irq(vcpu, pending, vtimer);
/*
* When using a userspace irqchip with the architected timers and a
@@ -725,7 +723,7 @@ static void kvm_timer_vcpu_load_nogic(struct kvm_vcpu *vcpu)
* being de-asserted, we unmask the interrupt again so that we exit
* from the guest when the timer fires.
*/
- if (vtimer->irq.level)
+ if (pending)
disable_percpu_irq(host_vtimer_irq);
else
enable_percpu_irq(host_vtimer_irq, host_vtimer_irq_flags);
diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
index 9e4076eebd29f..15a4f97f81051 100644
--- a/include/kvm/arm_arch_timer.h
+++ b/include/kvm/arm_arch_timer.h
@@ -66,11 +66,6 @@ struct arch_timer_context {
*/
bool loaded;
- /* Output level of the timer IRQ */
- struct {
- bool level;
- } irq;
-
/* Who am I? */
enum kvm_arch_timers timer_id;
--
2.47.3
^ permalink raw reply related
* Re: [PATCH net-next v2 2/2] net: ti: icssg: Add HSR and LRE PA statistics
From: MD Danish Anwar @ 2026-05-20 10:00 UTC (permalink / raw)
To: Jakub Kicinski, Luka Gejak
Cc: Felix Maurer, David S. Miller, Eric Dumazet, Paolo Abeni,
Simon Horman, Jonathan Corbet, Shuah Khan, Roger Quadros,
Andrew Lunn, Meghana Malladi, Jacob Keller, David Carlier,
Vadim Fedorenko, Kevin Hao, netdev, linux-doc, linux-kernel,
linux-arm-kernel, Vladimir Oltean
In-Reply-To: <20260519165646.09b0783f@kernel.org>
Hi Jakub,
On 20/05/26 5:26 am, Jakub Kicinski wrote:
> On Tue, 19 May 2026 07:55:55 +0200 Luka Gejak wrote:
>> On May 19, 2026 3:45:06 AM GMT+02:00, Jakub Kicinski <kuba@kernel.org> wrote:
>>> On Thu, 14 May 2026 13:26:05 +0530 MD Danish Anwar wrote:
>>>> Add new firmware PA statistics counters for HSR and LRE to the ethtool
>>>> statistics exposed by the ICSSG driver.
>>>>
>>>> New statistics added:
>>>> - FW_HSR_FWD_CHECK_FAIL_DROP: Packets dropped on the HSR forwarding path
>>>> - FW_HSR_HE_CHECK_FAIL_DROP: Packets dropped on the HSR host egress path
>>>> - FW_HSR_SKIP_HOST_DUP_DISCARD_FRAMES: Frames with duplicate discard
>>>> skipped
>>>> - FW_LRE_CNT_UNIQUE/DUPLICATE/MULTIPLE_RX: LRE duplicate detection
>>>> counters
>>>> - FW_LRE_CNT_RX/TX: LRE per-port frame counters
>>>> - FW_LRE_CNT_OWN_RX: Own HSR tagged frames received
>>>> - FW_LRE_CNT_ERRWRONGLAN: Frames with wrong LAN identifier (PRP)
>>>>
>>>> Document the new HSR/LRE statistics in icssg_prueth.rst.
>>>
>>> To an untrained eye these stats look like stuff that could
>>> be standardized across drivers.
>>>
>>> Luka, Felix, others on CC, do you think we should expose these
>>> from HSR over netlink as "standard" offload stats different drivers
>>> can plug into or not worth it?
>>
>> I think there is a case for standardizing part of this, but I would
>> not standardize the whole set as-is.
>>
>> The LRE counters look generic enough to me, especially:
>> - unique rx
>> - duplicate rx
>> - multiple rx
>> - rx / tx
>> - own rx
>> - wrong LAN, PRP only
>>
>> Those are protocol/LRE concepts rather than TI firmware details, so
>> exposing them from the HSR/PRP layer sounds useful. I would expect
>> both the software implementation and offloaded implementations to be
>> able to provide at least some of them, with unsupported counters
>> omitted or reported as not available.
>> I would not put the firmware check/drop counters in the same standard
>> bucket, though:
>> - FW_HSR_FWD_CHECK_FAIL_DROP
>> - FW_HSR_HE_CHECK_FAIL_DROP
>> - FW_HSR_SKIP_HOST_DUP_DISCARD_FRAMES
>
> Thanks for the breakdown!
>
>> Those sound more like implementation/debug counters for the ICSSG
>> firmware pipeline. They are still useful in ethtool driver stats, but
>> I would be hesitant to bake their exact semantics into HSR UAPI.
>> So my preference would be:
>> 1. Keep driver-private ethtool stats for the full firmware counter set.
>> 2. Add a small HSR/PRP standard stats set separately, limited to
>> well-defined LRE counters.
>> 3. Make the HSR layer expose them, with offload drivers plugging in via
>> an optional callback or offload stats op.
>> 4. Define the counters carefully, including whether they are per-HSR
>> device or per-port A/B, and what PRP-only counters mean for HSR.
>>
>> I do not think this patch should blindly become the UAPI definition,
>
> Not at all, the unique / multiple stats gave me pause. We should
> only put in the standard API what can be easily and unambiguously
> defined given the protocol spec.
>
>> but I do think it points at a useful follow-up. If we want to avoid
>> adding driver-private names first and then standardizing different
>> names later, then it may be worth asking Danish to split the
>> protocol-level LRE counters out and route those through a common HSR
>> stats interface.
>
> As a general policy we ask for standard stats to be added first and
> ethtool to only contain what didn't fit in the standard ones.
> There are some technical reasons but it's mostly a mindset thing.
What should the next steps be here? Is there an existing, defined set of
stats that I could populate from the ICSSG firmware for HSR (similar
to the ndo_get_stats64 callback)? Or do we need to implement a new callback
that does this for HSR?
I agree with Luka on the categorization.
The stats below can be generic:
- unique rx
- duplicate rx
- multiple rx
- rx / tx
- own rx
- wrong LAN, PRP only
The stats below are driver-specific and can be pulled using `ethtool -S`
on the child interfaces of the HSR device:
- FW_HSR_FWD_CHECK_FAIL_DROP
- FW_HSR_HE_CHECK_FAIL_DROP
- FW_HSR_SKIP_HOST_DUP_DISCARD_FRAMES
Let me know if I should go ahead and implement this.
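For discussion, here is a rough sketch of the shape Luka described. It is not an existing kernel API. struct hsr_lre_stats, struct hsr_offload_ops, hsr_collect(), STAT_NOT_SET and demo_get_lre_stats() are all hypothetical names. The core pre-fills every counter with a "not set" sentinel, and an optional per-port driver callback fills in only what it supports. This mirrors, in spirit, how ethtool's standard stats use ETHTOOL_STAT_NOT_SET.

```c
#include <stdint.h>

/* Hypothetical sentinel for "not provided by this implementation". */
#define STAT_NOT_SET UINT64_MAX

/* Hypothetical per-port (A/B) set of protocol-level LRE counters. */
struct hsr_lre_stats {
	uint64_t unique_rx;
	uint64_t duplicate_rx;
	uint64_t multiple_rx;
	uint64_t rx;
	uint64_t tx;
	uint64_t own_rx;
	uint64_t err_wrong_lan;	/* PRP only */
};

/* Hypothetical optional offload hook an HSR-capable driver could implement. */
struct hsr_offload_ops {
	void (*get_lre_stats)(void *priv, int port, struct hsr_lre_stats *s);
};

static void hsr_lre_stats_init(struct hsr_lre_stats *s)
{
	s->unique_rx = s->duplicate_rx = s->multiple_rx = STAT_NOT_SET;
	s->rx = s->tx = s->own_rx = s->err_wrong_lan = STAT_NOT_SET;
}

/* Core side: pre-fill with STAT_NOT_SET, then let the driver fill what it has. */
static void hsr_collect(const struct hsr_offload_ops *ops, void *priv, int port,
			struct hsr_lre_stats *s)
{
	hsr_lre_stats_init(s);
	if (ops && ops->get_lre_stats)
		ops->get_lre_stats(priv, port, s);
}

/* Illustrative driver that only reports rx/tx counters. */
static void demo_get_lre_stats(void *priv, int port, struct hsr_lre_stats *s)
{
	(void)priv;
	s->rx = 10 + (uint64_t)port;
	s->tx = 20 + (uint64_t)port;
}
```

Unsupported counters stay at the sentinel, so userspace can tell "zero" apart from "not implemented".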
--
Thanks and Regards,
Danish
^ permalink raw reply
* Re: [PATCH v7 20/23] drm: bridge: dw_hdmi: Rework HDP and RXSENSE interrupt handling
From: Neil Armstrong @ 2026-05-20 9:59 UTC (permalink / raw)
To: Jonas Karlman, Andrzej Hajda, Robert Foss, Heiko Stuebner,
Laurent Pinchart, Jernej Skrabec, Luca Ceresoli,
Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
Simona Vetter
Cc: Liu Ying, Sandy Huang, Andy Yan, Chen-Yu Tsai, Christian Hewitt,
Diederik de Haas, Nicolas Frattaroli, Dmitry Baryshkov, dri-devel,
linux-arm-kernel, linux-rockchip, linux-amlogic, linux-sunxi, imx,
linux-kernel
In-Reply-To: <20260518180206.2480119-21-jonas@kwiboo.se>
On 5/18/26 20:01, Jonas Karlman wrote:
> The commit aeac23bda87f ("drm: bridge/dw_hdmi: improve HDMI
> enable/disable handling") added use of PHY RXSENSE indications to avoid
> triggering a full enable/disable of the HDMI block when a sink uses an HPD
> low voltage level pulse to indicate changes of the EDID.
>
> HDMI Specification Version 1.4b chapter 8.5 mentions:
>
> An HDMI Sink shall indicate any change to the contents of the E-EDID
> by driving a low voltage level pulse on the Hot Plug Detect pin. This
> pulse shall be at least 100 msec.
>
> A delayed work is now used to debounce reacting on a HPD low voltage
> level pulse when a sink changes the EDID. The delayed work triggers a
> hotplug uevent every time the connection status or EDID has changed.
>
> Remove RXSENSE handling to simplify the HPD interrupt handling and
> instead depend on the delayed work to detect any connection status or
> EDID changes.
>
> This also ensures the initial HPD interrupt polarity is based on current
> HPD status to avoid an unnecessary interrupt from being triggered
> immediately at probe or resume when a sink is connected.
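Below is a rough model of the debounce described above. It assumes a single re-armable delayed work (in the kernel this would typically be mod_delayed_work()) and a hypothetical EDID identifier. HPD_DEBOUNCE_MS is an illustrative value, not the one used by the patch. Every HPD edge pushes the deadline out. When the work finally runs, it compares the settled connection status and EDID against what was last reported, and it emits a hotplug uevent only on a change.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical debounce delay in ms, for illustration only. */
#define HPD_DEBOUNCE_MS 200

struct hpd_state {
	uint64_t work_deadline;	/* 0 = no work pending */
	bool last_connected;
	uint32_t last_edid_id;	/* hypothetical stand-in for an EDID hash */
	int uevents;
};

/* HPD interrupt on any edge: (re)arm the delayed work. */
static void hpd_irq(struct hpd_state *s, uint64_t now_ms)
{
	s->work_deadline = now_ms + HPD_DEBOUNCE_MS;
}

/* Delayed work: once the deadline passes, report only real changes. */
static void hpd_work_poll(struct hpd_state *s, uint64_t now_ms,
			  bool connected, uint32_t edid_id)
{
	if (!s->work_deadline || now_ms < s->work_deadline)
		return;
	s->work_deadline = 0;
	if (connected != s->last_connected || edid_id != s->last_edid_id) {
		s->last_connected = connected;
		s->last_edid_id = edid_id;
		s->uevents++;
	}
}
```

A 100 ms low pulse (falling edge at t=1000, rising edge at t=1100) thus collapses into a single check at t=1300. That check fires one uevent if the EDID changed and none otherwise.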
I'm still puzzled by the complete removal of RX_SENSE, as in v1, and since the
rx_sense code is not easy to understand, I don't have an opinion on it.
Can someone with more knowledge comment on that?
Neil
>
> Tested-by: Diederik de Haas <diederik@cknow-tech.com> # Rock64, RockPro64, Quartz64-B
> Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
> ---
> v7: Remove clear of STAT0_RX_SENSE in dw_hdmi_remove() added in prior
> patch
> v6: Update commit message,
> Collect t-b tag
> v5: Add comment about interrupt generation
> v4: New patch
> ---
> drivers/gpu/drm/bridge/synopsys/dw-hdmi.c | 147 ++++------------------
> 1 file changed, 22 insertions(+), 125 deletions(-)
>
> diff --git a/drivers/gpu/drm/bridge/synopsys/dw-hdmi.c b/drivers/gpu/drm/bridge/synopsys/dw-hdmi.c
> index 270db58a0e7c..2e09bff5faf7 100644
> --- a/drivers/gpu/drm/bridge/synopsys/dw-hdmi.c
> +++ b/drivers/gpu/drm/bridge/synopsys/dw-hdmi.c
> @@ -161,11 +161,7 @@ struct dw_hdmi {
> struct pinctrl_state *unwedge_state;
>
> struct mutex mutex; /* for state below */
> - enum drm_connector_force force; /* mutex-protected force state */
> struct drm_connector *curr_conn;/* current connector (only valid when !disabled) */
> - bool disabled; /* DRM has disabled our bridge */
> - bool rxsense; /* rxsense state */
> - u8 phy_mask; /* desired phy int mask settings */
> u8 mc_clkdis; /* clock disable register */
>
> spinlock_t audio_lock;
> @@ -196,14 +192,6 @@ const struct dw_hdmi_plat_data *dw_hdmi_to_plat_data(struct dw_hdmi *hdmi)
> }
> EXPORT_SYMBOL_GPL(dw_hdmi_to_plat_data);
>
> -#define HDMI_IH_PHY_STAT0_RX_SENSE \
> - (HDMI_IH_PHY_STAT0_RX_SENSE0 | HDMI_IH_PHY_STAT0_RX_SENSE1 | \
> - HDMI_IH_PHY_STAT0_RX_SENSE2 | HDMI_IH_PHY_STAT0_RX_SENSE3)
> -
> -#define HDMI_PHY_RX_SENSE \
> - (HDMI_PHY_RX_SENSE0 | HDMI_PHY_RX_SENSE1 | \
> - HDMI_PHY_RX_SENSE2 | HDMI_PHY_RX_SENSE3)
> -
> static inline void hdmi_writeb(struct dw_hdmi *hdmi, u8 val, int offset)
> {
> regmap_write(hdmi->regm, offset << hdmi->reg_shift, val);
> @@ -1702,36 +1690,25 @@ EXPORT_SYMBOL_GPL(dw_hdmi_phy_read_hpd);
> void dw_hdmi_phy_update_hpd(struct dw_hdmi *hdmi, void *data,
> bool force, bool disabled, bool rxsense)
> {
> - u8 old_mask = hdmi->phy_mask;
> -
> - if (force || disabled || !rxsense)
> - hdmi->phy_mask |= HDMI_PHY_RX_SENSE;
> - else
> - hdmi->phy_mask &= ~HDMI_PHY_RX_SENSE;
> -
> - if (old_mask != hdmi->phy_mask)
> - hdmi_writeb(hdmi, hdmi->phy_mask, HDMI_PHY_MASK0);
> }
> EXPORT_SYMBOL_GPL(dw_hdmi_phy_update_hpd);
>
> void dw_hdmi_phy_setup_hpd(struct dw_hdmi *hdmi, void *data)
> {
> /*
> - * Configure the PHY RX SENSE and HPD interrupts polarities and clear
> - * any pending interrupt.
> + * Configure the PHY HPD interrupt polarity based on current HPD status
> + * and clear any pending interrupt.
> */
> - hdmi_writeb(hdmi, HDMI_PHY_HPD | HDMI_PHY_RX_SENSE, HDMI_PHY_POL0);
> - hdmi_writeb(hdmi, HDMI_IH_PHY_STAT0_HPD | HDMI_IH_PHY_STAT0_RX_SENSE,
> - HDMI_IH_PHY_STAT0);
> + hdmi_modb(hdmi, hdmi_readb(hdmi, HDMI_PHY_STAT0) & HDMI_PHY_HPD ?
> + 0 : HDMI_PHY_HPD, HDMI_PHY_HPD, HDMI_PHY_POL0);
> + hdmi_writeb(hdmi, HDMI_IH_PHY_STAT0_HPD, HDMI_IH_PHY_STAT0);
>
> /* Enable cable hot plug irq. */
> - hdmi_writeb(hdmi, hdmi->phy_mask, HDMI_PHY_MASK0);
> + hdmi_writeb(hdmi, ~HDMI_PHY_HPD, HDMI_PHY_MASK0);
>
> /* Clear and unmute interrupts. */
> - hdmi_writeb(hdmi, HDMI_IH_PHY_STAT0_HPD | HDMI_IH_PHY_STAT0_RX_SENSE,
> - HDMI_IH_PHY_STAT0);
> - hdmi_writeb(hdmi, ~(HDMI_IH_PHY_STAT0_HPD | HDMI_IH_PHY_STAT0_RX_SENSE),
> - HDMI_IH_MUTE_PHY_STAT0);
> + hdmi_writeb(hdmi, HDMI_IH_PHY_STAT0_HPD, HDMI_IH_PHY_STAT0);
> + hdmi_writeb(hdmi, ~HDMI_IH_PHY_STAT0_HPD, HDMI_IH_MUTE_PHY_STAT0);
> }
> EXPORT_SYMBOL_GPL(dw_hdmi_phy_setup_hpd);
>
> @@ -2395,26 +2372,6 @@ static void dw_hdmi_poweroff(struct dw_hdmi *hdmi)
> }
> }
>
> -/*
> - * Adjust the detection of RXSENSE according to whether we have a forced
> - * connection mode enabled, or whether we have been disabled. There is
> - * no point processing RXSENSE interrupts if we have a forced connection
> - * state, or DRM has us disabled.
> - *
> - * We also disable rxsense interrupts when we think we're disconnected
> - * to avoid floating TDMS signals giving false rxsense interrupts.
> - *
> - * Note: we still need to listen for HPD interrupts even when DRM has us
> - * disabled so that we can detect a connect event.
> - */
> -static void dw_hdmi_update_phy_mask(struct dw_hdmi *hdmi)
> -{
> - if (hdmi->phy.ops->update_hpd)
> - hdmi->phy.ops->update_hpd(hdmi, hdmi->phy.data,
> - hdmi->force, hdmi->disabled,
> - hdmi->rxsense);
> -}
> -
> static enum drm_connector_status dw_hdmi_detect(struct dw_hdmi *hdmi)
> {
> enum drm_connector_status result;
> @@ -2512,9 +2469,7 @@ static void dw_hdmi_connector_force(struct drm_connector *connector)
> struct dw_hdmi *hdmi = container_of(connector, struct dw_hdmi, connector);
>
> mutex_lock(&hdmi->mutex);
> - hdmi->force = connector->force;
> hdmi->last_connector_result = connector->status;
> - dw_hdmi_update_phy_mask(hdmi);
> mutex_unlock(&hdmi->mutex);
>
> dw_hdmi_connector_status_update(hdmi, connector, connector->status);
> @@ -2932,10 +2887,8 @@ static void dw_hdmi_bridge_atomic_disable(struct drm_bridge *bridge,
> struct dw_hdmi *hdmi = bridge->driver_private;
>
> mutex_lock(&hdmi->mutex);
> - hdmi->disabled = true;
> hdmi->curr_conn = NULL;
> dw_hdmi_poweroff(hdmi);
> - dw_hdmi_update_phy_mask(hdmi);
> handle_plugged_change(hdmi, false);
> mutex_unlock(&hdmi->mutex);
> }
> @@ -2954,10 +2907,8 @@ static void dw_hdmi_bridge_atomic_enable(struct drm_bridge *bridge,
> mode = &drm_atomic_get_new_crtc_state(state, crtc)->adjusted_mode;
>
> mutex_lock(&hdmi->mutex);
> - hdmi->disabled = false;
> hdmi->curr_conn = connector;
> dw_hdmi_poweron(hdmi, connector, mode);
> - dw_hdmi_update_phy_mask(hdmi);
> handle_plugged_change(hdmi, true);
> mutex_unlock(&hdmi->mutex);
> }
> @@ -3060,78 +3011,29 @@ static irqreturn_t dw_hdmi_hardirq(int irq, void *dev_id)
>
> void dw_hdmi_setup_rx_sense(struct dw_hdmi *hdmi, bool hpd, bool rx_sense)
> {
> - mutex_lock(&hdmi->mutex);
> -
> - if (!hdmi->force) {
> - /*
> - * If the RX sense status indicates we're disconnected,
> - * clear the software rxsense status.
> - */
> - if (!rx_sense)
> - hdmi->rxsense = false;
> -
> - /*
> - * Only set the software rxsense status when both
> - * rxsense and hpd indicates we're connected.
> - * This avoids what seems to be bad behaviour in
> - * at least iMX6S versions of the phy.
> - */
> - if (hpd)
> - hdmi->rxsense = true;
> -
> - dw_hdmi_update_phy_mask(hdmi);
> - }
> - mutex_unlock(&hdmi->mutex);
> }
> EXPORT_SYMBOL_GPL(dw_hdmi_setup_rx_sense);
>
> static irqreturn_t dw_hdmi_irq(int irq, void *dev_id)
> {
> struct dw_hdmi *hdmi = dev_id;
> - u8 intr_stat, phy_int_pol, phy_pol_mask, phy_stat;
> - enum drm_connector_status status = connector_status_unknown;
> -
> - intr_stat = hdmi_readb(hdmi, HDMI_IH_PHY_STAT0);
> - phy_int_pol = hdmi_readb(hdmi, HDMI_PHY_POL0);
> - phy_stat = hdmi_readb(hdmi, HDMI_PHY_STAT0);
> -
> - phy_pol_mask = 0;
> - if (intr_stat & HDMI_IH_PHY_STAT0_HPD)
> - phy_pol_mask |= HDMI_PHY_HPD;
> - if (intr_stat & HDMI_IH_PHY_STAT0_RX_SENSE0)
> - phy_pol_mask |= HDMI_PHY_RX_SENSE0;
> - if (intr_stat & HDMI_IH_PHY_STAT0_RX_SENSE1)
> - phy_pol_mask |= HDMI_PHY_RX_SENSE1;
> - if (intr_stat & HDMI_IH_PHY_STAT0_RX_SENSE2)
> - phy_pol_mask |= HDMI_PHY_RX_SENSE2;
> - if (intr_stat & HDMI_IH_PHY_STAT0_RX_SENSE3)
> - phy_pol_mask |= HDMI_PHY_RX_SENSE3;
> -
> - if (phy_pol_mask)
> - hdmi_modb(hdmi, ~phy_int_pol, phy_pol_mask, HDMI_PHY_POL0);
> + u8 intr_stat;
>
> /*
> - * RX sense tells us whether the TDMS transmitters are detecting
> - * load - in other words, there's something listening on the
> - * other end of the link. Use this to decide whether we should
> - * power on the phy as HPD may be toggled by the sink to merely
> - * ask the source to re-read the EDID.
> + * Interrupt generation is accomplished in the following way:
> + * interrupt = (mask == 0) && (polarity == status)
> + * All interrupts are forwarded to the Interrupt Handler sticky bit
> + * register ih_phy_stat0 and muted using the register ih_mute_phy_stat0.
> */
> - if (intr_stat &
> - (HDMI_IH_PHY_STAT0_RX_SENSE | HDMI_IH_PHY_STAT0_HPD)) {
> - dw_hdmi_setup_rx_sense(hdmi,
> - phy_stat & HDMI_PHY_HPD,
> - phy_stat & HDMI_PHY_RX_SENSE);
> + intr_stat = hdmi_readb(hdmi, HDMI_IH_PHY_STAT0);
> + if (intr_stat & HDMI_IH_PHY_STAT0_HPD) {
> + enum drm_connector_status status;
>
> - if ((intr_stat & HDMI_IH_PHY_STAT0_HPD) &&
> - (phy_stat & HDMI_PHY_HPD))
> - status = connector_status_connected;
> + /* Set HPD interrupt polarity based on current HPD status. */
> + status = dw_hdmi_phy_read_hpd(hdmi, hdmi->phy.data);
> + hdmi_modb(hdmi, status == connector_status_connected ?
> + 0 : HDMI_PHY_HPD, HDMI_PHY_HPD, HDMI_PHY_POL0);
>
> - if (!(phy_stat & (HDMI_PHY_HPD | HDMI_PHY_RX_SENSE)))
> - status = connector_status_disconnected;
> - }
> -
> - if (status != connector_status_unknown) {
> dev_dbg(hdmi->dev, "EVENT=%s\n",
> status == connector_status_connected ?
> "plugin" : "plugout");
> @@ -3141,8 +3043,7 @@ static irqreturn_t dw_hdmi_irq(int irq, void *dev_id)
> }
>
> hdmi_writeb(hdmi, intr_stat, HDMI_IH_PHY_STAT0);
> - hdmi_writeb(hdmi, ~(HDMI_IH_PHY_STAT0_HPD | HDMI_IH_PHY_STAT0_RX_SENSE),
> - HDMI_IH_MUTE_PHY_STAT0);
> + hdmi_writeb(hdmi, ~HDMI_IH_PHY_STAT0_HPD, HDMI_IH_MUTE_PHY_STAT0);
>
> return IRQ_HANDLED;
> }
> @@ -3343,9 +3244,6 @@ struct dw_hdmi *dw_hdmi_probe(struct platform_device *pdev,
> hdmi->dev = dev;
> hdmi->sample_rate = 48000;
> hdmi->channels = 2;
> - hdmi->disabled = true;
> - hdmi->rxsense = true;
> - hdmi->phy_mask = (u8)~(HDMI_PHY_HPD | HDMI_PHY_RX_SENSE);
> hdmi->mc_clkdis = 0x7f;
> hdmi->last_connector_result = connector_status_disconnected;
>
> @@ -3599,8 +3497,7 @@ void dw_hdmi_remove(struct dw_hdmi *hdmi)
> /* Free, mute and clear phy interrupts */
> devm_free_irq(hdmi->dev, irq, hdmi);
> hdmi_writeb(hdmi, ~0, HDMI_IH_MUTE_PHY_STAT0);
> - hdmi_writeb(hdmi, HDMI_IH_PHY_STAT0_HPD | HDMI_IH_PHY_STAT0_RX_SENSE,
> - HDMI_IH_PHY_STAT0);
> + hdmi_writeb(hdmi, HDMI_IH_PHY_STAT0_HPD, HDMI_IH_PHY_STAT0);
>
> /* Cancel any pending hot plug work */
> cancel_delayed_work_sync(&hdmi->hpd_work);
^ permalink raw reply
* Re: [PATCH v7 19/23] drm: bridge: dw_hdmi: Use delayed_work to debounce hotplug event
From: Neil Armstrong @ 2026-05-20 9:58 UTC (permalink / raw)
To: Jonas Karlman, Andrzej Hajda, Robert Foss, Heiko Stuebner,
Laurent Pinchart, Jernej Skrabec, Luca Ceresoli,
Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
Simona Vetter
Cc: Liu Ying, Sandy Huang, Andy Yan, Chen-Yu Tsai, Christian Hewitt,
Diederik de Haas, Nicolas Frattaroli, Dmitry Baryshkov, dri-devel,
linux-arm-kernel, linux-rockchip, linux-amlogic, linux-sunxi, imx,
linux-kernel
In-Reply-To: <20260518180206.2480119-20-jonas@kwiboo.se>
Hi,
On 5/18/26 20:01, Jonas Karlman wrote:
> HDMI Specification Version 1.4b chapter 8.5 mentions:
>
> An HDMI Sink shall not assert high voltage level on its Hot Plug
> Detect pin when the E-EDID is not available for reading.
>
> A Source may use a high voltage level Hot Plug Detect signal to
> initiate the reading of E-EDID data.
>
> An HDMI Sink shall indicate any change to the contents of the E-EDID
> by driving a low voltage level pulse on the Hot Plug Detect pin. This
> pulse shall be at least 100 msec.
>
> Use a delayed work to debounce reacting on HPD events to improve
> handling of a HPD low voltage level pulse when a sink changes the EDID.
>
> The delayed work is only enabled between enable_hpd()/hpd_enable() and
> disable_hpd()/hpd_disable() calls from core, i.e. enabled after
> attach/bind/resume and disabled before detach/unbind/suspend.
>
> The 1100 msec hotplug debounce timeout was arbitrarily picked to match
> other drivers using the same const, and testing with a Raspberry Pi Monitor
> shows it uses a 200-300 msec pulse when going from standby to power on
> state.
The logic looks OK, but I'm puzzled by the 1.1 s debounce: after plugging in
a monitor, the IRQ event is only sent 1.1 s later, which is very long.
Since the spec says 100 ms and real-world values are more like 200-300 ms,
I would first reduce this to 500 ms.
But as I understand the code right now, on the first HPD edge the IRQ work
is scheduled to run after the debounce time, but if it's a pulse, the IRQ
also triggers on the second HPD edge and delays the work again by the
debounce time.
My understanding of a debounce was that we "ignore" the pulse by generating
only a single IRQ event when the pulse is finished.
The current code does that: we only get a single IRQ event and HPD returns
to the connected state, good. But this delays the IRQ event by 1.1 s _after_
the end of the pulse, whereas I would expect the event to be sent within the
debounce time from the start of the pulse.
That is: schedule the work at the beginning of the pulse; if the pulse ends
before the debounce time, send the IRQ event immediately; otherwise let the
debounce work run after the debounce time, which will trigger a disconnect event.
But the delay is too long: 1.1 s could be a manual unplug/replug, or a bad
connector with a false contact on the HPD pin.
I would rather reduce this to something more realistic, like 500 ms or less,
and try to handle the pulse better. But I have no idea whether the scheme
I described is doable.
Neil
>
> Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
> ---
> v7: Change to free irq before mute and clear using IH regs, also include
> clear of STAT0_RX_SENSE
> v6: Change back to disable_delayed_work_sync() in hpd disable ops,
> Ensure HPD interrupt is masked and IRQ handler is disabled early
> in dw_hdmi_remove() to prevent any irq re-arming of delayed work,
> Drop use of suspend helper
> v5: Change to none-sync disable_delayed_work() in hpd disable ops,
> Change to cancel_delayed_work_sync() in remove,
> Add cancel_delayed_work_sync() to new suspend helper
> v4: Disable/mask delayed_work until enable_hpd()/hpd_enable(),
> Read connector status directly from HW regs in hpd_work
> v3: New patch
> ---
> drivers/gpu/drm/bridge/synopsys/dw-hdmi.c | 80 +++++++++++++++++++++--
> 1 file changed, 75 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/bridge/synopsys/dw-hdmi.c b/drivers/gpu/drm/bridge/synopsys/dw-hdmi.c
> index 8afc9d240121..270db58a0e7c 100644
> --- a/drivers/gpu/drm/bridge/synopsys/dw-hdmi.c
> +++ b/drivers/gpu/drm/bridge/synopsys/dw-hdmi.c
> @@ -50,6 +50,8 @@
>
> #define HDMI14_MAX_TMDSCLK 340000000
>
> +#define HOTPLUG_DEBOUNCE_MS 1100
> +
> static const u16 csc_coeff_default[3][4] = {
> { 0x2000, 0x0000, 0x0000, 0x0000 },
> { 0x0000, 0x2000, 0x0000, 0x0000 },
> @@ -185,6 +187,7 @@ struct dw_hdmi {
> hdmi_codec_plugged_cb plugged_cb;
> struct device *codec_dev;
> enum drm_connector_status last_connector_result;
> + struct delayed_work hpd_work;
> };
>
> const struct dw_hdmi_plat_data *dw_hdmi_to_plat_data(struct dw_hdmi *hdmi)
> @@ -2517,6 +2520,20 @@ static void dw_hdmi_connector_force(struct drm_connector *connector)
> dw_hdmi_connector_status_update(hdmi, connector, connector->status);
> }
>
> +static void dw_hdmi_connector_enable_hpd(struct drm_connector *connector)
> +{
> + struct dw_hdmi *hdmi = container_of(connector, struct dw_hdmi, connector);
> +
> + enable_delayed_work(&hdmi->hpd_work);
> +}
> +
> +static void dw_hdmi_connector_disable_hpd(struct drm_connector *connector)
> +{
> + struct dw_hdmi *hdmi = container_of(connector, struct dw_hdmi, connector);
> +
> + disable_delayed_work_sync(&hdmi->hpd_work);
> +}
> +
> static void dw_hdmi_connector_destroy(struct drm_connector *connector)
> {
> struct dw_hdmi *hdmi = container_of(connector, struct dw_hdmi, connector);
> @@ -2538,6 +2555,8 @@ static const struct drm_connector_funcs dw_hdmi_connector_funcs = {
> static const struct drm_connector_helper_funcs dw_hdmi_connector_helper_funcs = {
> .get_modes = dw_hdmi_connector_get_modes,
> .atomic_check = dw_hdmi_connector_atomic_check,
> + .enable_hpd = dw_hdmi_connector_enable_hpd,
> + .disable_hpd = dw_hdmi_connector_disable_hpd,
> };
>
> static int dw_hdmi_connector_create(struct dw_hdmi *hdmi)
> @@ -2968,6 +2987,20 @@ static const struct drm_edid *dw_hdmi_bridge_edid_read(struct drm_bridge *bridge
> return dw_hdmi_edid_read(hdmi, connector);
> }
>
> +static void dw_hdmi_bridge_hpd_enable(struct drm_bridge *bridge)
> +{
> + struct dw_hdmi *hdmi = bridge->driver_private;
> +
> + enable_delayed_work(&hdmi->hpd_work);
> +}
> +
> +static void dw_hdmi_bridge_hpd_disable(struct drm_bridge *bridge)
> +{
> + struct dw_hdmi *hdmi = bridge->driver_private;
> +
> + disable_delayed_work_sync(&hdmi->hpd_work);
> +}
> +
> static const struct drm_bridge_funcs dw_hdmi_bridge_funcs = {
> .atomic_duplicate_state = drm_atomic_helper_bridge_duplicate_state,
> .atomic_destroy_state = drm_atomic_helper_bridge_destroy_state,
> @@ -2981,6 +3014,8 @@ static const struct drm_bridge_funcs dw_hdmi_bridge_funcs = {
> .mode_valid = dw_hdmi_bridge_mode_valid,
> .detect = dw_hdmi_bridge_detect,
> .edid_read = dw_hdmi_bridge_edid_read,
> + .hpd_enable = dw_hdmi_bridge_hpd_enable,
> + .hpd_disable = dw_hdmi_bridge_hpd_disable,
> };
>
> /* -----------------------------------------------------------------------------
> @@ -3101,8 +3136,8 @@ static irqreturn_t dw_hdmi_irq(int irq, void *dev_id)
> status == connector_status_connected ?
> "plugin" : "plugout");
>
> - if (hdmi->bridge.dev)
> - drm_helper_hpd_irq_event(hdmi->bridge.dev);
> + mod_delayed_work(system_percpu_wq, &hdmi->hpd_work,
> + msecs_to_jiffies(HOTPLUG_DEBOUNCE_MS));
> }
>
> hdmi_writeb(hdmi, intr_stat, HDMI_IH_PHY_STAT0);
> @@ -3112,6 +3147,29 @@ static irqreturn_t dw_hdmi_irq(int irq, void *dev_id)
> return IRQ_HANDLED;
> }
>
> +static void dw_hdmi_hpd_work(struct work_struct *work)
> +{
> + struct dw_hdmi *hdmi = container_of(work, struct dw_hdmi, hpd_work.work);
> + struct drm_device *dev = hdmi->bridge.dev;
> +
> + if (WARN_ON(!dev))
> + return;
> +
> + /*
> + * Notify the DRM core of the HPD event using drm_helper_hpd_irq_event()
> + * instead of drm_bridge_hpd_notify(). This will cause the DRM function
> + * check_connector_changed() to be called, which in turn calls the
> + * connector detect()/force() funcs to detect any connection status or
> + * epoch changes. The bridge connector detect() func also ensures that
> + * any hpd_notify() funcs are called for all bridges in the chain.
> + *
> + * drm_bridge_hpd_notify() shares a mutex with drm_bridge_hpd_disable(),
> + * and can result in a deadlock due to the disable_delayed_work_sync()
> + * call to wait on work to complete in dw_hdmi_bridge_hpd_disable().
> + */
> + drm_helper_hpd_irq_event(dev);
> +}
> +
> static const struct dw_hdmi_phy_data dw_hdmi_phys[] = {
> {
> .type = DW_HDMI_PHY_DWC_HDMI_TX_PHY,
> @@ -3396,6 +3454,9 @@ struct dw_hdmi *dw_hdmi_probe(struct platform_device *pdev,
> goto err_res;
> }
>
> + INIT_DELAYED_WORK(&hdmi->hpd_work, dw_hdmi_hpd_work);
> + disable_delayed_work(&hdmi->hpd_work);
> +
> ret = devm_request_threaded_irq(dev, irq, dw_hdmi_hardirq,
> dw_hdmi_irq, IRQF_SHARED,
> dev_name(dev), hdmi);
> @@ -3532,6 +3593,18 @@ EXPORT_SYMBOL_GPL(dw_hdmi_probe);
>
> void dw_hdmi_remove(struct dw_hdmi *hdmi)
> {
> + struct platform_device *pdev = to_platform_device(hdmi->dev);
> + int irq = platform_get_irq(pdev, 0);
> +
> + /* Free, mute and clear phy interrupts */
> + devm_free_irq(hdmi->dev, irq, hdmi);
> + hdmi_writeb(hdmi, ~0, HDMI_IH_MUTE_PHY_STAT0);
> + hdmi_writeb(hdmi, HDMI_IH_PHY_STAT0_HPD | HDMI_IH_PHY_STAT0_RX_SENSE,
> + HDMI_IH_PHY_STAT0);
> +
> + /* Cancel any pending hot plug work */
> + cancel_delayed_work_sync(&hdmi->hpd_work);
> +
> drm_bridge_remove(&hdmi->bridge);
>
> if (hdmi->audio && !IS_ERR(hdmi->audio))
> @@ -3539,9 +3612,6 @@ void dw_hdmi_remove(struct dw_hdmi *hdmi)
> if (!IS_ERR(hdmi->cec))
> platform_device_unregister(hdmi->cec);
>
> - /* Disable all interrupts */
> - hdmi_writeb(hdmi, ~0, HDMI_IH_MUTE_PHY_STAT0);
> -
> if (hdmi->i2c)
> i2c_del_adapter(&hdmi->i2c->adap);
> else
^ permalink raw reply
* Re: [PATCH v2 3/8] of: reserved_mem: add dumpable flag to opt-in vmcore
From: Marek Szyprowski @ 2026-05-20 9:53 UTC (permalink / raw)
To: Wandun Chen, linux-arm-kernel, linux-kernel, loongarch,
linux-riscv, devicetree, kexec, iommu, zhaomeijing
Cc: catalin.marinas, will, chenhuacai, kernel, pjw, palmer, aou, alex,
robh, saravanak, akpm, bhe, rppt, pasha.tatashin, pratyush,
ruirui.yang, robin.murphy, leitao, kees, coxu, tangyouling,
songshuaishuai
In-Reply-To: <20260520091844.592753-4-chenwandun@lixiang.com>
On 20.05.2026 11:18, Wandun Chen wrote:
> From: Wandun Chen <chenwandun1@gmail.com>
>
> From: Wandun Chen <chenwandun@lixiang.com>
>
> Add a 'dumpable' flag to struct reserved_mem so the kernel can decide
> whether a reserved area should be included in the kdump vmcore. Most
> reserved regions are owned by devices and do not contain data useful
> for kernel crash analysis, so excluding them by default is the right
> behaviour.
>
> Reusable CMA regions are different: pages in a CMA region are handed
> back to the buddy allocator and may contain key data for crash
> analysis, so set dumpable to true in rmem_cma_setup().
>
> Suggested-by: Rob Herring <robh@kernel.org>
> Signed-off-by: Wandun Chen <chenwandun@lixiang.com>
> Tested-by: Meijing Zhao <zhaomeijing@lixiang.com>
> Link: https://lore.kernel.org/all/20260506144542.GA2072596-robh@kernel.org/
Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
> ---
> include/linux/of_reserved_mem.h | 1 +
> kernel/dma/contiguous.c | 1 +
> 2 files changed, 2 insertions(+)
>
> diff --git a/include/linux/of_reserved_mem.h b/include/linux/of_reserved_mem.h
> index e8b20b29fa68..55a67cee41ea 100644
> --- a/include/linux/of_reserved_mem.h
> +++ b/include/linux/of_reserved_mem.h
> @@ -15,6 +15,7 @@ struct reserved_mem {
> phys_addr_t base;
> phys_addr_t size;
> void *priv;
> + bool dumpable;
> };
>
> struct reserved_mem_ops {
> diff --git a/kernel/dma/contiguous.c b/kernel/dma/contiguous.c
> index 03f52bd17120..eddec89eb414 100644
> --- a/kernel/dma/contiguous.c
> +++ b/kernel/dma/contiguous.c
> @@ -579,6 +579,7 @@ static int __init rmem_cma_setup(unsigned long node, struct reserved_mem *rmem)
> dma_contiguous_default_area = cma;
>
> rmem->priv = cma;
> + rmem->dumpable = true;
>
> pr_info("Reserved memory: created CMA memory pool at %pa, size %ld MiB\n",
> &rmem->base, (unsigned long)rmem->size / SZ_1M);
Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland
^ permalink raw reply
* Re: [PATCH v8 next 00/10] arm_mpam: Introduce Narrow-PARTID feature
From: Zeng Heng @ 2026-05-20 9:47 UTC (permalink / raw)
To: James Morse, ben.horgan, Dave.Martin, tan.shaopeng,
reinette.chatre, fenghuay, tglx, will, hpa, bp, babu.moger,
dave.hansen, mingo, tony.luck, gshan, catalin.marinas
Cc: linux-arm-kernel, x86, linux-kernel, wangkefeng.wang, zengheng4
In-Reply-To: <ec8bc617-9e74-4749-ab33-39d1079415cc@arm.com>
Hi James,
On 2026/5/15 1:06, James Morse wrote:
> Hi Zeng,
>
> (beware this is the first version I've seen - arm have been silently deleting your mail,
> it looks like a problem with DKIM signatures)
>
Thanks for letting me know. I will try sending community mail from my
huaweicloud address to avoid the DKIM signature issues.
Hopefully that works.
> On 13/04/2026 09:53, Zeng Heng wrote:
>> Background
>> ==========
>>
>> On x86, the resctrl allows creating up to num_rmids monitoring groups
>> under parent control group. However, ARM64 MPAM is currently limited by
>> the PMG (Performance Monitoring Group) count, which is typically much
>> smaller than the theoretical RMID limit.
>
> The MPAM PMG limit is 255. Is that not enough?
>
> I think the real problem is the CHI interconnect protocol is forcing people
> to only have 1 bit of PMG - regardless of what the architecture says. This
> isn't an MPAM problem as such - it's an implementation issue.
>
> (but we can try and work around it)
>
Yes, the architecture theoretically allows PMG to be up to 8 bits wide,
but many platforms I've worked with (not just Kunpeng) implement far
fewer bits in practice.
>
>> This creates a significant
>> scalability gap: users expecting fine-grained per-process or per-thread
>> monitoring quickly exhaust the PMG space, even when plenty of reqPARTIDs
>> remain available.
>
> This is more about MPAM's philosophical stance that PMG extends PARTID, whereas
> on x86 RMID is an independent number.
>
No value judgment here. ARM seeks to expand the number of monitoring
groups by combining PARTID and PMG within limited bit-width constraints,
which inherently introduces coupling between the two.
> Please don't muddle these - it results in muddled patches!
> If we want to try and attack both with narrowing, we should do them separately.
>
>
>> The Narrow-PARTID feature, defined in the ARM MPAM architecture,
>> addresses this by associating reqPARTIDs with intPARTIDs through a
>> programmable many-to-one mapping. This allows the kernel to present more
>> logical monitoring contexts.
>
> I'd put this as "can be abused to avoid this problem"! We still have a problem with
> controls that don't alias and need to be removed from MSC that don't support narrowing.
> This isn't what the feature was designed for - but it is a really cool trick, it works
> for some real platforms, and solves a problem seen in user-space.
>
> However - throughout this series you seem to be discarding all the control-group support
> for a monitoring-only setup that allocates intPARTID for everything. This might work for
> your use-case on your platform, but it doesn't generalise to platforms without narrowing
> or where multiple control-groups are needed.
>
Currently, for MSCs that have non-aliasing controls but do not support
the Narrow-PARTID feature, this solution simply disables itself rather
than hiding the non-aliasing controls (Patch 3:
https://lore.kernel.org/all/20260413085405.1166412-4-zengheng4@huawei.com/).
This does indeed limit where the solution can be enabled on systems whose
MSCs lack narrowing capability.
Otherwise, the solution tries to preserve as many intPARTIDs
(i.e., control groups) as before. In principle, I hope that on
systems where Narrow-PARTID was not previously enabled, this patch set
can create as many monitoring groups as possible without changing any
other functionality.
It also allows users to limit the intpartid_max count via a boot
parameter (Patch 6:
https://lore.kernel.org/all/20260413085405.1166412-7-zengheng4@huawei.com/).
>
>> Design Overview
>> ===============
>>
>> The implementation extends the RMID encoding to carry reqPARTID
>> information:
>>
>> RMID = reqPARTID * NUM_PMG + PMG
>>
>> In this patchset, a monitoring group is uniquely identified by the
>> combination of reqPARTID and PMG. The closid is represented by intPARTID,
>> which is exactly the original PARTID.
>
> The way I think of this is 'RMID' bits being spilled into PARTID. This
> means each control group has a set of PARTID. For MSC using narrowing,
> CLOSID would be the intPARTID value. But as you note, we need to support
> mismatches:
>
>
Yes.
>> For systems with homogeneous MSCs (all supporting Narrow-PARTID), the
>> driver exposes the full reqPARTID range directly. For heterogeneous
>> systems where some MSCs lack Narrow-PARTID support, the driver utilizes
>> PARTIDs beyond the intPARTID range as reqPARTIDs to expand monitoring
>> capability. The sole exception is when any type of MSCs lack Narrow-PARTID
>> support, their percentage-based control mechanism prevents the use of
>> PARTIDs as reqPARTIDs.
>
> It'd be good to have some discussion about what the interface between the
> mpam_devices code and any other user (like resctrl) should be.
>
> As a hypothetical system to think about:
> 64 PARTID at the L3, which support CPOR and CCAP
> 64 PARTID and narrowing to 16 at the SLC, which supports CPOR
> 64 PARTID and narrowing to 32 at the memory-controller, which support MBWU_MAX
>
By the way, in this case the L3 does not support NP but has CCAP, so
the PARTID mapping extension (PME) is not enabled by default.
If we exclude the L3 CCAP, the solution would support 16 control groups
and (64 * PMG) monitoring groups.
> I think whether using intPARTID is a benefit needs to be user-space policy.
> You've likely got a platform where that choice is obvious - but it is a
> trade-off as you lose the non-aliasing controls. In the example above, using
> narrowing on this system means losing the CCAP controls on L3 as they don't alias [*].
> Where it's a policy, it's likely to be one policy for resctrl, and another for any other
> user.
> We can get the resctrl glue code to turn it on unconditionally if there is no trade off,
> I think that means: no non-aliasing controls in any class that doesn't support narrowing
> - including 'unknown'. (we couldn't add them to resctrl in the future if you already chose
> to enable this).
>
Currently, after MPAM initialization, the PARTID mapping extension (PME)
is enabled by default unless there is an MSC that both lacks NP
support and has non-aliasing controls; in that default case it is purely
beneficial, with no downsides. Going forward, we may consider adding a
`force_reqpartid` option to forcibly enable the feature and disable
non-aliasing controls.
> As for the interface with mpam_devices:
> I think this means the resctrl glue code needs to be able to discover which
> classes support intPARTID, and how many controls they actually have. From there
> it can apply a policy to determine whether it's better to support fewer features
> in resctrl to get more RMID. (the alternative is always to ignore the MSC with
> narrowing - narrowing lets hardware lie about the features it supports).
>
> Currently the resctrl glue code has to program a configuration for two PARTID
> when CDP is being hidden on the MB resource. This is ugly and fragile. I'd like
> to explore generalising it as this narrowing stuff will also need to apply a
> configuration to a set of PARTID when that MSC doesn't support narrowing.
> In the example above, we'd need to discard the CCAP controls and write the same
> CPOR bitmap to each PARTID that is mapped together by narrowing.
>
One option is to extend CDP compatibility using PME: L3DATA and L3CODE
would still be controlled separately, while MB control would be
consolidated onto a single intPARTID via the narrowing mapping.
Of course, this requires the MB MSCs to support narrowing.
>
> I think this means the resctrl glue code will need to be able to write a configuration
> to controls using the full partid_max range as it does today. But also be able to set
> the narrowing mapping on classes that support it.
> For the monitors, the resctrl glue code will need to allocate and configure a set of
> monitors, and read and sum them. This will be regardless of whether narrowing is
> supported.
> I think this means allocating a table of CLOSID to PARTID(s). the intPARTID would
> always match the CLOSID. Monitors and non-narrowing MSC would need to walk the list.
> I'm hoping we can make CDP a subset of this problem.
> Some clever arithmetic may save allocating memory for a table - but if we change resctrl
> to do this dynamically, the numbers become arbitrary forcing it to be a table.
> It might also be possible to support moving monitor-groups between control groups with
> the table driven approach. (see what you think on how complex it ends up ...)
>
In the current patch series, static allocation uses a
straightforward intPARTID-to-reqPARTID translation, while dynamic
management tracks the mappings in the `reqpartid_map` table.
> I'd like to keep that grouping static for now, the table needs creating at setup time,
> (+/- CDP), to avoid problems like you've found with CDP. This means the intpartid mappings
> can be written once at setup time.
>
> I'd like to avoid exposing user ABI to control this until we get it working, then we can
> talk about whether to try making the grouping dynamically managed by resctrl. (there were
> some proposals in that area - but I can't find them on lore).
> If there are platforms where it's certainly not a trade-off, we can enable it
> unconditionally - but I'm wary of this being "what we care about now", requiring user-abi
> to enable features that were detectable.
> e.g. we ignore an unknown MSC, and add a resctrl schema for it later - only we can't
> expose it if we were using narrowing. Now it's a trade-off.
>
>
>> Capability Improvements
>> =======================
>>
>> --------------------------------------------------------------------------
>> The maximum        | Sub-monitoring groups             | System-wide
>> number of          | under a control group             | monitoring groups
>> --------------------------------------------------------------------------
>> Without reqPARTID  | PMG                               | intPARTID * PMG
>> --------------------------------------------------------------------------
>> reqPARTID          |                                   |
>> static allocation  | (reqPARTID // intPARTID) * PMG    | reqPARTID * PMG
>> --------------------------------------------------------------------------
>> reqPARTID          |                                   |
>> dynamic allocation | (reqPARTID - intPARTID + 1) * PMG | reqPARTID * PMG
>> --------------------------------------------------------------------------
>>
>> Note: The number of intPARTIDs can be capped via the boot parameter
>> mpam.intpartid_max. Under MPAM, reqPARTID count is always greater than
>> or equal to intPARTID count.
>>
>> Series Structure
>> ================
>>
>> Patch 1: Fix pre-existing out-of-range PARTID issue between mount sessions.
>> Patches 2-6: Implement static reqPARTID allocation.
>> Patches 7-10: Implement dynamic reqPARTID allocation.
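The formulas in the quoted capability table can be checked with a small calculation. This is a sketch that simply evaluates the table for one configuration. `max_mon_groups()` and the enum/struct names are hypothetical, and the mode names are labels for the three table rows.

```c
#include <assert.h>

enum reqpartid_mode { NO_REQPARTID, STATIC_ALLOC, DYNAMIC_ALLOC };

struct mon_capacity {
	unsigned int per_ctrl_group;	/* sub-monitoring groups under one control group */
	unsigned int system_wide;	/* monitoring groups across the system */
};

static struct mon_capacity max_mon_groups(enum reqpartid_mode mode,
					  unsigned int n_req, unsigned int n_int,
					  unsigned int n_pmg)
{
	struct mon_capacity c = { 0, 0 };

	assert(n_int > 0 && n_int <= n_req);	/* reqPARTIDs >= intPARTIDs */
	switch (mode) {
	case NO_REQPARTID:
		c.per_ctrl_group = n_pmg;
		c.system_wide = n_int * n_pmg;
		break;
	case STATIC_ALLOC:
		/* each control group owns a fixed share of the reqPARTIDs */
		c.per_ctrl_group = (n_req / n_int) * n_pmg;
		c.system_wide = n_req * n_pmg;
		break;
	case DYNAMIC_ALLOC:
		/* one group may take all reqPARTIDs except one kept by each of the other n_int - 1 */
		c.per_ctrl_group = (n_req - n_int + 1) * n_pmg;
		c.system_wide = n_req * n_pmg;
		break;
	}
	return c;
}
```

For example, with 64 reqPARTIDs, 16 intPARTIDs and 2 PMG values this gives 2/32, 8/128 and 98/128 (per group / system-wide) for the three rows. Per the static formula, lowering the intPARTID count (e.g. via `mpam.intpartid_max`) raises the per-group static capacity at the cost of fewer control groups.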
>
> I've had a hard time following this series. You dive in with invasive changes, then
> unbreak things in later patches.
>
> Please add the needed infrastructure in mpam_devices.c first. This should be free of
> resctrl-isms, and 'only' needs reviewing against the architecture.
>
> Then add the resctrl glue code stuff. That needs to comply with what resctrl expects.
>
> I think the cleanest way to think about this is to break the mapping between CLOSID and
> PARTID. We're effectively moving bits of RMID out of PMG into PARTID. Adding helpers
> to explicitly do this early in those patches will make your changes clearer.
> Please avoid spraying the narrowing terms for things everywhere.
>
>
Sure, I'll reorder the series to introduce the core infrastructure in
mpam_devices.c first. Should I drop the dynamic allocation part from
this series for now?
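One possible shape for the helpers suggested above, which make "moving bits of RMID out of PMG into PARTID" explicit, is sketched below. It assumes the static layout, where each CLOSID owns `per_closid` consecutive reqPARTIDs and an RMID selects one (reqPARTID, PMG) pair inside that group. All names are hypothetical and are not existing resctrl or mpam functions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout description for the static scheme. */
struct partid_layout {
	uint16_t per_closid;	/* reqPARTIDs per CLOSID */
	uint16_t nr_pmg;	/* PMG values supported */
};

static uint32_t rmid_count(const struct partid_layout *l)
{
	return (uint32_t)l->per_closid * l->nr_pmg;
}

/* rmid / nr_pmg picks the reqPARTID offset inside the CLOSID's group. */
static uint16_t rmid_to_partid(const struct partid_layout *l,
			       uint32_t closid, uint32_t rmid)
{
	assert(rmid < rmid_count(l));
	return (uint16_t)(closid * l->per_closid + rmid / l->nr_pmg);
}

/* rmid % nr_pmg picks the PMG. */
static uint8_t rmid_to_pmg(const struct partid_layout *l, uint32_t rmid)
{
	return (uint8_t)(rmid % l->nr_pmg);
}

/* Inverse direction, e.g. to attribute a hardware counter back to resctrl. */
static uint32_t partid_to_closid(const struct partid_layout *l, uint16_t partid)
{
	return partid / l->per_closid;
}

static uint32_t partid_pmg_to_rmid(const struct partid_layout *l,
				   uint16_t partid, uint8_t pmg)
{
	return (uint32_t)(partid % l->per_closid) * l->nr_pmg + pmg;
}
```

Keeping this arithmetic in a handful of helpers means the rest of the glue code only ever sees CLOSID/RMID or PARTID/PMG, rather than narrowing-specific terms.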
>
>
> [*] It's terminology from discussing this with Dave, just in case a summary is needed:
> aliasing controls are like CPOR where two different PARTID with the same bitmap
> compete for the same resource. If you give them each the same 50% of the portions,
> they can't exceed that together.
> non-aliasing controls are like CCAP where two different PARTID with the same fraction
> compete for different resources. If you give them each 50% of the capacity, it adds
> up to 100%. You can't represent 'the same' 50% using these controls.
>
> Narrowing papers over this problem with its remapping table, which gives you a 'same'
> property. For MSC that have controls of that shape - and where more monitors are
> desired - we'd have to drop the controls.
>
> I think "more monitors are desired" is going to need to be user-space policy. But
> we can come back to how to do that later.
>
>
I'm not sure if anyone else has formalized these into terminology
before, but I fully agree with the terms "aliasing controls" and "non-
aliasing controls" — they're instantly intuitive for software
developers.
Best regards,
Zeng Heng
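The aliasing/non-aliasing distinction can be made concrete with a toy model. This is a simplified sketch, not the architecture's definition of CPOR or CCAP. It assumes a portion bitmap of at most 32 portions for the aliasing case and a plain percentage per PARTID for the non-aliasing case. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

static unsigned int popcount32(uint32_t x)
{
	unsigned int n = 0;

	for (; x; x &= x - 1)	/* clear the lowest set bit each round */
		n++;
	return n;
}

/*
 * Aliasing (CPOR-like): PARTIDs with overlapping portion bitmaps share those
 * portions, so a group can use at most the union of its bitmaps.
 */
static unsigned int aliasing_group_percent(const uint32_t *bitmaps, unsigned int n,
					   unsigned int nr_portions)
{
	uint32_t mask, uni = 0;

	assert(nr_portions > 0 && nr_portions <= 32);
	mask = nr_portions == 32 ? UINT32_MAX : (UINT32_C(1) << nr_portions) - 1;
	for (unsigned int i = 0; i < n; i++)
		uni |= bitmaps[i] & mask;
	return popcount32(uni) * 100 / nr_portions;
}

/*
 * Non-aliasing (CCAP-like): each PARTID's fraction is its own, so the
 * fractions of a group add up.
 */
static unsigned int nonaliasing_group_percent(const unsigned int *percent, unsigned int n)
{
	unsigned int total = 0;

	for (unsigned int i = 0; i < n; i++)
		total += percent[i];
	return total;
}
```

Two PARTIDs that both get bitmap 0x0F out of 8 portions still share 50% together, while two PARTIDs that each get 50% of capacity add up to 100%. Narrowing both reqPARTIDs onto one intPARTID is what restores the "same 50%" property for the second kind.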
^ permalink raw reply
* Re: [PATCH v2 7/7] mm/vmalloc: Stop scanning for compound pages after encountering small pages in vmap
From: Uladzislau Rezki @ 2026-05-20 9:44 UTC (permalink / raw)
To: Wen Jiang
Cc: linux-mm, linux-arm-kernel, catalin.marinas, will, akpm, urezki,
baohua, Xueyuan.chen21, dev.jain, rppt, david, ryan.roberts,
anshuman.khandual, ajd, linux-kernel, Wen Jiang
In-Reply-To: <20260514094108.2016201-8-jiangwen6@xiaomi.com>
On Thu, May 14, 2026 at 05:41:08PM +0800, Wen Jiang wrote:
> From: "Barry Song (Xiaomi)" <baohua@kernel.org>
>
> Users typically allocate memory in descending orders, e.g.
> 8 → 4 → 0. Once an order-0 page is encountered, subsequent
> pages are likely to also be order-0, so we stop scanning
> for compound pages at that point.
>
> Signed-off-by: Barry Song (Xiaomi) <baohua@kernel.org>
> Signed-off-by: Wen Jiang <jiangwen6@xiaomi.com>
> Tested-by: Xueyuan Chen <xueyuan.chen21@gmail.com>
> ---
> mm/vmalloc.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index b3389c8f1..60579bfbf 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -3576,6 +3576,12 @@ static int __vmap_huge(unsigned long addr, unsigned long end,
> map_addr = addr;
> idx = i;
> }
> + /*
> + * Once small pages are encountered, the remaining pages
> + * are likely small as well
> + */
> + if (shift == PAGE_SHIFT)
> + break;
>
> addr += 1UL << shift;
> i += 1U << (shift - PAGE_SHIFT);
> --
> 2.34.1
>
Can we squash this patch with
"mm/vmalloc: map contiguous pages in batches for vmap() if possible"?
--
Uladzislau Rezki
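The early-exit heuristic in the quoted patch can be modelled in user space. This is a sketch under stated assumptions, not the actual `__vmap_huge()` code: `orders[i]` is assumed to hold the compound order of the folio that base page `i` belongs to, alignment checks are ignored, and `scan_until_small()` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Walk folio by folio, counting compound folios, and stop at the first
 * order-0 page. Returns the base-page index from which, in this model,
 * everything would be mapped with small pages.
 */
static size_t scan_until_small(const unsigned int *orders, size_t nr_pages,
			       size_t *nr_huge)
{
	size_t i = 0;

	*nr_huge = 0;
	while (i < nr_pages) {
		unsigned int order = orders[i];

		/* the patch's heuristic: after one small page, assume the rest are small */
		if (order == 0)
			break;
		(*nr_huge)++;
		i += (size_t)1 << order;	/* skip all base pages of this folio */
	}
	return i;
}
```

With a descending layout such as orders `{2,2,2,2,1,1,0,0,0}`, the scan finds two compound folios and stops at index 6. The cost of the heuristic shows up with `{1,1,0,1,1}`: the second order-1 folio sits after a small page and is treated as small.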
^ permalink raw reply