Linux-ARM-Kernel Archive on lore.kernel.org


* Re: [PATCH v3 2/4] mm: use tiered folio allocation for VM_EXEC readahead
From: Jan Kara @ 2026-04-13 11:03 UTC
  To: Usama Arif
  Cc: Andrew Morton, david, willy, ryan.roberts, linux-mm, r, jack, ajd,
	apopple, baohua, baolin.wang, brauner, catalin.marinas, dev.jain,
	kees, kevin.brodsky, lance.yang, Liam.Howlett, linux-arm-kernel,
	linux-fsdevel, linux-kernel, Lorenzo Stoakes, mhocko, npache,
	pasha.tatashin, rmclure, rppt, surenb, vbabka, Al Viro,
	wilts.infradead.org, ziy, hannes, kas, shakeel.butt, leitao,
	kernel-team
In-Reply-To: <20260402181326.3107102-3-usama.arif@linux.dev>

On Thu 02-04-26 11:08:23, Usama Arif wrote:
> When executable pages are faulted via do_sync_mmap_readahead(), request
> a folio order that enables the best hardware TLB coalescing available:
> 
> - If the VMA is large enough to contain a full PMD, request
>   HPAGE_PMD_ORDER so the folio can be PMD-mapped. This benefits
>   architectures where PMD_SIZE is reasonable (e.g. 2M on x86-64
>   and arm64 with 4K pages). VM_EXEC VMAs are very unlikely to be
>   large enough for 512M pages on ARM to take effect.

I'm not sure that relying on PMD_SIZE being too large for the VMA is a great
strategy. With a 16k PAGE_SIZE the PMD would be 32MB, which would fit in
the .text size but already looks like a bit too much? Mapping with PMD-sized
folios brings some benefits, but it also has costs: parts of the VMA that
would never be paged in are now pulled into memory, and LRU tracking now
happens at this very large granularity, which makes it fairly inefficient
(big folios have a much higher chance of being accessed similarly often,
making the LRU order mostly random). We are already getting reports from
people with small machines (phones etc.) where the memory overhead of large
folios (in the page cache) is simply too much. So I'd have greater peace of
mind if we capped the folio size at 2MB for now, until we come up with a more
sophisticated heuristic for picking a sensible folio order given the machine
size. Now, I'm not really an MM person, so my feeling here may just be wrong,
but I wanted to voice this concern from what I can see...

								Honza
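
For reference, the sizes discussed above can be checked with a small
userspace sketch. It assumes arm64-style 8-byte page-table entries, so one
table page holds PAGE_SIZE / 8 entries; capped_exec_order() is a hypothetical
helper illustrating the proposed 2MB cap, not code from the patch.

```c
#include <assert.h>

/*
 * Order (log2 of the page count) of a PMD-sized folio. Assumption: 8-byte
 * page-table entries, so a PMD maps PAGE_SIZE / 8 base pages.
 */
static unsigned int pmd_order(unsigned int page_shift)
{
	assert(page_shift >= 12 && page_shift <= 16);
	return page_shift - 3;
}

/* Largest order whose folio does not exceed 2MB for this page size. */
static unsigned int order_for_2m(unsigned int page_shift)
{
	return 21 - page_shift;
}

/* Hypothetical cap: never request more than 2MB, even if a PMD is larger. */
static unsigned int capped_exec_order(unsigned int page_shift)
{
	unsigned int pmd = pmd_order(page_shift);
	unsigned int cap = order_for_2m(page_shift);

	return pmd < cap ? pmd : cap;
}

/* Size in bytes of a folio of the given order. */
static unsigned long folio_bytes(unsigned int order, unsigned int page_shift)
{
	return 1UL << (order + page_shift);
}
```

Under these assumptions a PMD covers 2MB, 32MB and 512MB for 4K, 16K and 64K
pages, while the capped order comes out as 9, 7 and 5 respectively, which is
2MB in every case.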


> - Otherwise, fall back to exec_folio_order(), which returns the
>   minimum order for hardware PTE coalescing for arm64:
>   - arm64 4K:  order 4 (64K) for contpte (16 PTEs → 1 iTLB entry)
>   - arm64 16K: order 2 (64K) for HPA (4 pages → 1 TLB entry)
>   - arm64 64K: order 5 (2M) for contpte (32 PTEs → 1 iTLB entry)
>   - generic:   order 0 (no coalescing)
> 
> Update the arm64 exec_folio_order() to return ilog2(SZ_2M >>
> PAGE_SHIFT) on 64K page configurations, where the previous SZ_64K
> value collapsed to order 0 (a single page) and provided no coalescing
> benefit.
> 
> Use ~__GFP_RECLAIM so the allocation is opportunistic: if a large
> folio is readily available, use it, otherwise fall back to smaller
> folios without stalling on reclaim or compaction. The existing fallback
> in page_cache_ra_order() handles this naturally.
> 
> The readahead window is already clamped to the VMA boundaries, so
> ra->size naturally caps the folio order via ilog2(ra->size) in
> page_cache_ra_order().
> 
> Signed-off-by: Usama Arif <usama.arif@linux.dev>
> ---
>  arch/arm64/include/asm/pgtable.h | 16 +++++++++----
>  mm/filemap.c                     | 40 +++++++++++++++++++++++---------
>  mm/internal.h                    |  3 ++-
>  mm/readahead.c                   |  7 +++---
>  4 files changed, 45 insertions(+), 21 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index 52bafe79c10a..9ce9f73a6f35 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -1591,12 +1591,18 @@ static inline void update_mmu_cache_range(struct vm_fault *vmf,
>  #define arch_wants_old_prefaulted_pte	cpu_has_hw_af
>  
>  /*
> - * Request exec memory is read into pagecache in at least 64K folios. This size
> - * can be contpte-mapped when 4K base pages are in use (16 pages into 1 iTLB
> - * entry), and HPA can coalesce it (4 pages into 1 TLB entry) when 16K base
> - * pages are in use.
> + * Request exec memory is read into pagecache in folios large enough for
> + * hardware TLB coalescing. On 4K and 16K page configs this is 64K, which
> + * enables contpte mapping (16 × 4K) and HPA coalescing (4 × 16K). On
> + * 64K page configs, contpte requires 2M (32 × 64K).
>   */
> -#define exec_folio_order() ilog2(SZ_64K >> PAGE_SHIFT)
> +#define exec_folio_order exec_folio_order
> +static inline unsigned int exec_folio_order(void)
> +{
> +	if (PAGE_SIZE == SZ_64K)
> +		return ilog2(SZ_2M >> PAGE_SHIFT);
> +	return ilog2(SZ_64K >> PAGE_SHIFT);
> +}
>  
>  static inline bool pud_sect_supported(void)
>  {
> diff --git a/mm/filemap.c b/mm/filemap.c
> index a4ea869b2ca1..7ffea986b3b4 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -3311,6 +3311,7 @@ static struct file *do_sync_mmap_readahead(struct vm_fault *vmf)
>  	DEFINE_READAHEAD(ractl, file, ra, mapping, vmf->pgoff);
>  	struct file *fpin = NULL;
>  	vm_flags_t vm_flags = vmf->vma->vm_flags;
> +	gfp_t gfp = readahead_gfp_mask(mapping);
>  	bool force_thp_readahead = false;
>  	unsigned short mmap_miss;
>  
> @@ -3363,28 +3364,45 @@ static struct file *do_sync_mmap_readahead(struct vm_fault *vmf)
>  			ra->size *= 2;
>  		ra->async_size = HPAGE_PMD_NR;
>  		ra->order = HPAGE_PMD_ORDER;
> -		page_cache_ra_order(&ractl, ra);
> +		page_cache_ra_order(&ractl, ra, gfp);
>  		return fpin;
>  	}
>  
>  	if (vm_flags & VM_EXEC) {
>  		/*
> -		 * Allow arch to request a preferred minimum folio order for
> -		 * executable memory. This can often be beneficial to
> -		 * performance if (e.g.) arm64 can contpte-map the folio.
> -		 * Executable memory rarely benefits from readahead, due to its
> -		 * random access nature, so set async_size to 0.
> +		 * Request large folios for executable memory to enable
> +		 * hardware PTE coalescing and PMD mappings:
>  		 *
> -		 * Limit to the boundaries of the VMA to avoid reading in any
> -		 * pad that might exist between sections, which would be a waste
> -		 * of memory.
> +		 *  - If the VMA is large enough for a PMD, request
> +		 *    HPAGE_PMD_ORDER so the folio can be PMD-mapped.
> +		 *  - Otherwise, use exec_folio_order() which returns
> +		 *    the minimum order for hardware TLB coalescing
> +		 *    (e.g. arm64 contpte/HPA).
> +		 *
> +		 * Use ~__GFP_RECLAIM so large folio allocation is
> +		 * opportunistic — if memory isn't readily available,
> +		 * fall back to smaller folios rather than stalling on
> +		 * reclaim or compaction.
> +		 *
> +		 * Executable memory rarely benefits from speculative
> +		 * readahead due to its random access nature, so set
> +		 * async_size to 0.
> +		 *
> +		 * Limit to the boundaries of the VMA to avoid reading
> +		 * in any pad that might exist between sections, which
> +		 * would be a waste of memory.
>  		 */
> +		gfp &= ~__GFP_RECLAIM;
>  		struct vm_area_struct *vma = vmf->vma;
>  		unsigned long start = vma->vm_pgoff;
>  		unsigned long end = start + vma_pages(vma);
>  		unsigned long ra_end;
>  
> -		ra->order = exec_folio_order();
> +		if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) &&
> +		    vma_pages(vma) >= HPAGE_PMD_NR)
> +			ra->order = HPAGE_PMD_ORDER;
> +		else
> +			ra->order = exec_folio_order();
>  		ra->start = round_down(vmf->pgoff, 1UL << ra->order);
>  		ra->start = max(ra->start, start);
>  		ra_end = round_up(ra->start + ra->ra_pages, 1UL << ra->order);
> @@ -3403,7 +3421,7 @@ static struct file *do_sync_mmap_readahead(struct vm_fault *vmf)
>  
>  	fpin = maybe_unlock_mmap_for_io(vmf, fpin);
>  	ractl._index = ra->start;
> -	page_cache_ra_order(&ractl, ra);
> +	page_cache_ra_order(&ractl, ra, gfp);
>  	return fpin;
>  }
>  
> diff --git a/mm/internal.h b/mm/internal.h
> index 475bd281a10d..e624cb619057 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -545,7 +545,8 @@ int zap_vma_for_reaping(struct vm_area_struct *vma);
>  int folio_unmap_invalidate(struct address_space *mapping, struct folio *folio,
>  			   gfp_t gfp);
>  
> -void page_cache_ra_order(struct readahead_control *, struct file_ra_state *);
> +void page_cache_ra_order(struct readahead_control *, struct file_ra_state *,
> +			 gfp_t gfp);
>  void force_page_cache_ra(struct readahead_control *, unsigned long nr);
>  static inline void force_page_cache_readahead(struct address_space *mapping,
>  		struct file *file, pgoff_t index, unsigned long nr_to_read)
> diff --git a/mm/readahead.c b/mm/readahead.c
> index 7b05082c89ea..b3dc08cf180c 100644
> --- a/mm/readahead.c
> +++ b/mm/readahead.c
> @@ -465,7 +465,7 @@ static inline int ra_alloc_folio(struct readahead_control *ractl, pgoff_t index,
>  }
>  
>  void page_cache_ra_order(struct readahead_control *ractl,
> -		struct file_ra_state *ra)
> +		struct file_ra_state *ra, gfp_t gfp)
>  {
>  	struct address_space *mapping = ractl->mapping;
>  	pgoff_t start = readahead_index(ractl);
> @@ -475,7 +475,6 @@ void page_cache_ra_order(struct readahead_control *ractl,
>  	pgoff_t mark = index + ra->size - ra->async_size;
>  	unsigned int nofs;
>  	int err = 0;
> -	gfp_t gfp = readahead_gfp_mask(mapping);
>  	unsigned int new_order = ra->order;
>  
>  	trace_page_cache_ra_order(mapping->host, start, ra);
> @@ -626,7 +625,7 @@ void page_cache_sync_ra(struct readahead_control *ractl,
>  readit:
>  	ra->order = 0;
>  	ractl->_index = ra->start;
> -	page_cache_ra_order(ractl, ra);
> +	page_cache_ra_order(ractl, ra, readahead_gfp_mask(ractl->mapping));
>  }
>  EXPORT_SYMBOL_GPL(page_cache_sync_ra);
>  
> @@ -697,7 +696,7 @@ void page_cache_async_ra(struct readahead_control *ractl,
>  		ra->size -= end - aligned_end;
>  	ra->async_size = ra->size;
>  	ractl->_index = ra->start;
> -	page_cache_ra_order(ractl, ra);
> +	page_cache_ra_order(ractl, ra, readahead_gfp_mask(ractl->mapping));
>  }
>  EXPORT_SYMBOL_GPL(page_cache_async_ra);
>  
> -- 
> 2.52.0
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR
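
The order arithmetic in the quoted arm64 hunk can be reproduced in userspace.
This is only a sketch: ilog2_ul() is a stand-in for the kernel's ilog2(), and
the page shift is passed as a parameter instead of coming from PAGE_SHIFT.

```c
#include <assert.h>

#define SZ_64K (64UL << 10)
#define SZ_2M  (2UL << 20)

/* Userspace stand-in for the kernel's ilog2(): floor(log2(v)) for v > 0. */
static unsigned int ilog2_ul(unsigned long v)
{
	unsigned int r = 0;

	assert(v > 0);
	while (v >>= 1)
		r++;
	return r;
}

/* Previous definition: always target a 64K folio. */
static unsigned int exec_order_old(unsigned int page_shift)
{
	return ilog2_ul(SZ_64K >> page_shift);
}

/* Definition from the quoted patch: 2M on 64K pages, 64K otherwise. */
static unsigned int exec_order_new(unsigned int page_shift)
{
	if ((1UL << page_shift) == SZ_64K)
		return ilog2_ul(SZ_2M >> page_shift);
	return ilog2_ul(SZ_64K >> page_shift);
}
```

For 4K and 16K pages both versions give orders 4 and 2. For 64K pages the old
macro yields order 0 (a single page), while the new function yields order 5,
matching the numbers in the commit message.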



* Re: [PATCH 3/4] pinctrl: vt8500: Enable compile testing
From: Krzysztof Kozlowski @ 2026-04-13 11:12 UTC
  To: Sander Vanheule, Linus Walleij, Andreas Färber
  Cc: linux-gpio, linux-kernel, Andrew Jeffery, linux-aspeed, openbmc,
	linux-arm-kernel, Joel Stanley, linux-realtek-soc, James Tai,
	Yu-Chun Lin
In-Reply-To: <a5e993d2b6c8b57d2057909812ce831877762bd6.camel@svanheule.net>

On 10/04/2026 23:22, Sander Vanheule wrote:
> Hi Krzysztof,
> 
> On Fri, 2026-04-10 at 15:04 +0200, Krzysztof Kozlowski wrote:
>> Enable compile testing for Realtek pin controller drivers for increased
> 
> Small nit, but this looks like a copy-paste error from the other patch.
> 
> 	Realtek -> VIA/Wondermedia (or vt8500, whatever you prefer)
> 

Yes.

Best regards,
Krzysztof



* Re: [PATCH 4/4] ARM: realtek: MAINTAINERS: Include pin controller drivers
From: Krzysztof Kozlowski @ 2026-04-13 11:13 UTC
  To: Yu-Chun Lin [林祐君], Linus Walleij,
	Andreas Färber
  Cc: linux-gpio@vger.kernel.org, linux-kernel@vger.kernel.org,
	Andrew Jeffery, linux-aspeed@lists.ozlabs.org,
	openbmc@lists.ozlabs.org, linux-arm-kernel@lists.infradead.org,
	Joel Stanley, linux-realtek-soc@lists.infradead.org,
	James Tai [戴志峰]
In-Reply-To: <45866135c8a54e1d98cac51932ca8e1a@realtek.com>

On 13/04/2026 11:23, Yu-Chun Lin [林祐君] wrote:
>> No dedicated maintainers are shown for Realtek SoC pin controllers, except
>> pinctrl subsystem maintainer, which means reduced review and impression of
>> abandoned drivers.  Pin controller drivers are essential part of an SoC, so in
>> case of lack of dedicated entry at least cover it by the SoC platform
>> maintainers.
>>
>> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
>>
>> ---
>>
>> This patch should go via Realtek SoC maintainers, not pinctrl.
>> ---
>>  MAINTAINERS | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/MAINTAINERS b/MAINTAINERS
>> index 10d12b51b1f6..374ce55e4fb6 100644
>> --- a/MAINTAINERS
>> +++ b/MAINTAINERS
>> @@ -3373,6 +3373,7 @@ F:     Documentation/devicetree/bindings/arm/realtek.yaml
>>  F:     arch/arm/boot/dts/realtek/
>>  F:     arch/arm/mach-realtek/
>>  F:     arch/arm64/boot/dts/realtek/
>> +F:     drivers/pinctrl/realtek/
>>
>>  ARM/RISC-V/RENESAS ARCHITECTURE
>>  M:     Geert Uytterhoeven <geert+renesas@glider.be>
>>
>> --
>> 2.51.0
> 
> Acked-by: Yu-Chun Lin <eleanor.lin@realtek.com>

So James will pick it up?

Best regards,
Krzysztof



* Re: [PATCH] mm/arm: pgtable: remove young bit check for pte_valid_user
From: Brian Ruley @ 2026-04-13 11:17 UTC
  To: Will Deacon
  Cc: Russell King (Oracle), Steve Capper, linux-arm-kernel,
	linux-kernel, catalin.marinas
In-Reply-To: <adzMOdySgMIePcue@willie-the-truck>

On Apr 13, Will Deacon wrote:
> 
> On Fri, Apr 10, 2026 at 02:01:41PM +0300, Brian Ruley wrote:
> > On Apr 09, Russell King (Oracle) wrote:
> > >
> > > On Thu, Apr 09, 2026 at 06:17:36PM +0300, Brian Ruley wrote:
> > > > However, in the case I describe, if VA_B is mapped immediately to pfn_q
> > > > after it has been unmapped and freed for VA_A, then it's quite possible
> > > > that the page is still indexed in the cache.
> > >
> > > True.
> > >
> > > > The hypothesis is that if
> > > > VA_A and VA_B land in the same I-cache set and VA_A's old cache entry
> > > > still exists (tagged with pfn_q), then the CPU can fetch stale
> > > > instructions because the tag will match. That's one reason why we need
> > > > to invalidate the cache, but that will be skipped in the path:
> > > >
> > > >     migrate_pages
> > > >      migrate_pages_batch
> > > >       migrate_folio_move
> > > >        remove_migration_ptes
> > > >         remove_migration_pte
> > > >          set_pte_at
> > > >           set_ptes
> > > >            __sync_icache_dcache  (skipped if !young)
> > > >             set_pte_ext
> > >
> > > In this case, if the old PTE was marked !young, then the new PTE will
> > > have:
> > >         pte = pte_mkold(pte);
> > >
> > > on it, which marks it !young. As you say, __sync_icache_dcache() will
> > > be skipped. While a PTE entry will be set for the kernel, the code in
> > > set_pte_ext() will *not* establish a hardware PTE entry. For the
> > > 2-level pte code:
> > >
> > >         tst     r1, #L_PTE_YOUNG        @ <- results in Z being set
> > >         tstne   r1, #L_PTE_VALID        @ <- not executed
> > >         eorne   r1, r1, #L_PTE_NONE     @ <- not executed
> > >         tstne   r1, #L_PTE_NONE         @ <- not executed
> > >         moveq   r3, #0                  @ <- hardware PTE value
> > >  ARM(   str     r3, [r0, #2048]! )      @ <- writes hardware PTE
> > >
> > > So, for a !young PTE, the hardware PTE entry is written as zero,
> > > which means accesses should fault, which will then cause the PTE to
> > > be marked young.
> > >
> > > For the 3-level case, the L_PTE_YOUNG bit corresponds with the AF bit
> > > in the PTE, and there aren't split Linux / hardware PTE entries. AF
> > > being clear should result in a page fault being generated for the
> > > kernel to handle making the PTE young.
> > >
> > > In both of these cases, set_ptes() will need to be called with the
> > > updated PTE which will now be marked young, and that will result in
> > > the I-cache being flushed.
> >
> > Hi Russell,
> >
> > Thank you for the clarification, this is very educational for me.
> > I understand your scepticism, and I can't explain what's going on based
> > on what you replied. However, I do honestly believe there is a problem
> > here. I'll share the exact testing details and the instrumentation
> > we added that convinced us to reach out at the end. One idea we also
> > had was that cache aliasing could be happening here.
> 
> I thought a bit more about this over the weekend and started to wonder
> if there's a potential race where multiple CPUs try to write the same
> PTE and don't synchronise properly on the cache-maintenance.
> 
> In particular, PG_dcache_clean is manipulated with a test_and_set_bit()
> operation _before_ the cache maintenance is performed, so there's a
> small window where the flag is set but the page is _not_ clean. I don't
> think that matters with regards to invalid migration entries, but
> perhaps the migration just means that we end up putting down a bunch of
> 'old' entries and are then more likely to see concurrent faults trying
> to make the thing young again, potentially hitting this race.
> 
> Looking at arm64 this morning, I noticed that we split the flag
> manipulation so that it's set with a set_bit() after the maintenance has
> been performed. Git then points to 588a513d3425 ("arm64: Fix race
> condition on PG_dcache_clean in __sync_icache_dcache()") which seems to
> talk about the same race. In fact, the mailing list posting:
> 
>   https://lore.kernel.org/all/20210514095001.13236-1-catalin.marinas@arm.com/
> 
> points out that arch/arm/ is also affected but we forgot to CC Russell
> because I think this all came out of the MTE-enablement work [1] and it
> sounds like Catalin was trying to fix it in the core mprotect() code.
> 
> Brian, can you try something like 588a513d3425?
> 
> Will
> 
> [1] https://lore.kernel.org/all/YJGHApOCXl811VK3@arm.com/

I'll try it, thanks.  

Best regards,
Brian
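
The window Will describes can be illustrated with a single-threaded toy model.
This is a sketch under loud assumptions: struct page_model, do_maintenance()
and both sync_*() helpers are hypothetical, and a C11 atomic_bool stands in for
the PG_dcache_clean page flag. It only records what a concurrent observer
would see while the maintenance is still running.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy stand-in for a page: a "clean" flag plus an observation record. */
struct page_model {
	atomic_bool dcache_clean;
	bool clean_seen_during_maintenance;
	int maintenance_runs;
};

/* Pretend cache maintenance; samples the flag as another CPU would. */
static void do_maintenance(struct page_model *p)
{
	assert(p);
	p->clean_seen_during_maintenance = atomic_load(&p->dcache_clean);
	p->maintenance_runs++;
}

/* Flag set first, maintenance second (the test_and_set_bit() ordering). */
static void sync_flag_first(struct page_model *p)
{
	if (!atomic_exchange(&p->dcache_clean, true))
		do_maintenance(p);
}

/* Maintenance first, flag published last (the set_bit() ordering). */
static void sync_flag_last(struct page_model *p)
{
	if (!atomic_load(&p->dcache_clean)) {
		do_maintenance(p);
		atomic_store(&p->dcache_clean, true);
	}
}
```

With sync_flag_first() the flag already reads as clean while the maintenance
is in progress, so a second CPU faulting on the same page could skip its own
maintenance too early. With sync_flag_last() the flag only becomes visible
once the work is done, at the cost of possibly doing the maintenance twice.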



* Re: [PATCH 2/4] soc: amlogic: clk-measure: Add A1 and T7 support
From: Jian Hu @ 2026-04-13 11:33 UTC
  To: Neil Armstrong, Krzysztof Kozlowski, Jerome Brunet, Kevin Hilman,
	Michael Turquette, Martin Blumenstingl, robh+dt, Rob Herring
  Cc: devicetree, linux-amlogic, linux-kernel, linux-arm-kernel
In-Reply-To: <3a08bb84-b313-4b3b-bb61-1b686226e902@linaro.org>


On 4/13/2026 5:24 PM, Neil Armstrong wrote:
> On 4/13/26 11:10, Krzysztof Kozlowski wrote:
>> On 13/04/2026 10:21, Jian Hu wrote:
>>>
>>> On 4/12/2026 5:55 PM, Krzysztof Kozlowski wrote:
>>>> On 10/04/2026 12:03, Jian Hu wrote:
>>>>> Add support for the A1 and T7 SoC family in amlogic clk measure.
>>>>>
>>>>> Signed-off-by: Jian Hu <jian.hu@amlogic.com>
>>>>> ---
>>>>>    drivers/soc/amlogic/meson-clk-measure.c | 272 ++++++++++++++++++++++++
>>>>>    1 file changed, 272 insertions(+)
>>>>>
>>>>> diff --git a/drivers/soc/amlogic/meson-clk-measure.c b/drivers/soc/amlogic/meson-clk-measure.c
>>>>> index d862e30a244e..083524671b76 100644
>>>>> --- a/drivers/soc/amlogic/meson-clk-measure.c
>>>>> +++ b/drivers/soc/amlogic/meson-clk-measure.c
>>>>> @@ -787,6 +787,258 @@ static const struct meson_msr_id clk_msr_s4[] = {
>>>>>
>>>>>    };
>>>>>
>>>>> +static struct meson_msr_id clk_msr_a1[] = {
>>>> And existing code uses what sort of array? It seems you sent us obsolete
>>>> or downstream code.
>>>
>>>
>>> Thanks for your review.
>>>
>>>
>>> I have checked the previous Amlogic SoC's commits. Such as Amlogic AXG,
>>> G12A, C3, S4.
>>>
>>> The clk_msr_xx entry is added after the last SoC's array, sorted by
>>> submission date rather than alphabetical order.
>>>
>>> So I place A1 and T7 after S4 accordingly.
>>>
>>>
>>> The A1 clock controller driver was already supported in
>>> https://lore.kernel.org/all/20230523135351.19133-7-ddrokosov@sberdevices.ru/ 
>>>
>>>
>>> It is also present in the mainline kernel:
>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/clk/meson/Kconfig#n113 
>>>
>>>
>>>
>>> This clock measure IP is used to measure the internal clock path
>>> frequencies, and the A1 clock controller driver is already supported.
>>>
>>> Since the corresponding clock measure driver does not support A1 yet,
>>> add A1 clk msr here.
>>
>> No, what qualifiers or keywords are used for existing arrays? IOW,
>> please investigate and understand why you are doing this very different
>> than existing code. Maybe because you sent us downstream, so you
>> replicated all other downstream issues.
>
> I see, the existing code uses "static const struct".
>
> Jian, could you switch to that, please?
>
> Neil
>
>>
>> Best regards,
>> Krzysztof


Hi Krzysztof & Neil,


Got it. Thank you for pointing out the missing "const". I mistakenly
thought it was an alphabetical order issue.

I will fix it in the next version.


Best regards,

Jian
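
For readers following along, the qualifier in question can be sketched as
below. The struct demo_msr_id type, the table contents and demo_msr_name() are
hypothetical and chosen only for illustration; they are not taken from
meson-clk-measure.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical entry type, not the driver's struct meson_msr_id. */
struct demo_msr_id {
	unsigned int id;
	const char *name;
};

/*
 * "static const": the table is file-local and read-only, so the compiler
 * can place it in read-only data and reject accidental writes to it.
 */
static const struct demo_msr_id demo_clk_msr[] = {
	{ 0, "demo_clk_a" },
	{ 1, "demo_clk_b" },
	{ 5, "demo_clk_c" },
};

/* Look up a name by id; returns NULL when the id is not in the table. */
static const char *demo_msr_name(unsigned int id)
{
	size_t i;

	for (i = 0; i < sizeof(demo_clk_msr) / sizeof(demo_clk_msr[0]); i++)
		if (demo_clk_msr[i].id == id)
			return demo_clk_msr[i].name;
	return NULL;
}
```

Dropping the const would still compile, but the table would then be writable
at run time, which is why a plain "static struct" array stands out next to
the existing "static const struct" ones.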




* Re: [PATCH v2 0/4] usb: dwc3: xilinx: Add Versal2 MMI USB 3.2 controller support
From: Pandey, Radhey Shyam @ 2026-04-13 11:33 UTC
  To: Thinh Nguyen, Radhey Shyam Pandey
  Cc: gregkh@linuxfoundation.org, robh@kernel.org, krzk+dt@kernel.org,
	conor+dt@kernel.org, michal.simek@amd.com, p.zabel@pengutronix.de,
	linux-usb@vger.kernel.org, devicetree@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, git@amd.com
In-Reply-To: <20260401230401.w2si3gnqvzlszduh@synopsys.com>

> On Tue, Mar 31, 2026, Radhey Shyam Pandey wrote:
>> This series introduces support for the Multi-Media Integrated (MMI) USB
>> 3.2 Dual-Role Device (DRD) controller on Xilinx Versal2 platforms.
>>
>> The controller supports SSP(10-Gbps), SuperSpeed, high-speed, full-speed
>> and low-speed operation modes.
>>
>> The USB2 and USB3 PHYs provide physical connectivity via Type-C. The
>> DWC3 wrapper IP IO space is in SLCR, so reg is made
>> optional.
>>
>> The driver is required for the clock, reset and platform specific
>> initialization (coherency/TX_DEEMPH etc). In this initial version typec
>> reversibility is not implemented and it is assumed that USB3 PHY TCA mux
>> programming is done by MMI configuration data object (CDOs) and TI PD
>> controller is configured using external tiva programmer on VEK385
>> evaluation board.
>>
>> Changes for v2:
>> - DT binding: fix MHz spacing (SI convention), reorder description
>>    before $ref in xlnx,usb-syscon, restore zynqmp-dwc3 example and add
>>    versal2-mmi-dwc3 example, fix node name for no-reg case, use 1/1
>>    address/size configuration and lowercase hex in syscon offsets.
>> - Split config struct refactoring (device_get_match_data,dwc3_xlnx_config)
>>    into a separate preparatory patch.
>> - Fix error message capitalization to lowercase per kernel convention.
>> - Rename property snps,lcsr_tx_deemph to snps,lcsr-tx-deemph (hyphens).
>> - Fix double space in comment and missing blank line in core.h.
>> - Use platform data instead of of_device_is_compatible() check for
>>    deemphasis support.
>>
>> Link: https://lore.kernel.org/all/20251119193036.2666877-1-radhey.shyam.pandey@amd.com/
>>
>> Radhey Shyam Pandey (4):
>>    dt-bindings: usb: dwc3-xilinx: Add MMI USB support on Versal Gen2
>>      platform
>>    usb: dwc3: xilinx: Introduce dwc3_xlnx_config for per-platform data
>>    usb: dwc3: xilinx: Add Versal2 MMI USB 3.2 controller support
>>    usb: dwc3: xilinx: Add support to program MMI USB TX deemphasis
>>
>>   .../devicetree/bindings/usb/dwc3-xilinx.yaml  | 70 ++++++++++++++-
>>   drivers/usb/dwc3/core.c                       | 17 ++++
>>   drivers/usb/dwc3/core.h                       |  8 ++
>>   drivers/usb/dwc3/dwc3-xilinx.c                | 89 +++++++++++++++----
>>   4 files changed, 166 insertions(+), 18 deletions(-)
>>
>>
>> base-commit: 46b513250491a7bfc97d98791dbe6a10bcc8129d
>> -- 
>> 2.43.0
>>
> Hi Radhey,
>
> Do you have plans to convert dwc3-xilinx to using the new flattened model?
> The change you have here fits better for the new glue model.

Thanks, Thinh, for the review.

I have looked into the flattened model introduced by
commit 613a2e655d4d ("usb: dwc3: core: Expose core driver as library").
Moving to that approach would require switching to the new DT binding
and doing a large refactor.

Given this series is already implemented and under review,
I suggest we get it merged first, then evaluate the flattened model's
benefits and limitations and plan a follow-up migration if it still
makes sense. If there are no objections, I'll send out v3.

Thanks,
Radhey



* Re: [PATCH 01/11] arch: arm64: Export arch_smp_send_reschedule for mshv_vtl module
From: Naman Jain @ 2026-04-13 11:44 UTC
  To: Michael Kelley, K . Y . Srinivasan, Haiyang Zhang, Wei Liu,
	Dexuan Cui, Long Li, Catalin Marinas, Will Deacon,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	x86@kernel.org, H . Peter Anvin, Arnd Bergmann, Paul Walmsley,
	Palmer Dabbelt, Albert Ou, Alexandre Ghiti
  Cc: Marc Zyngier, Timothy Hayes, Lorenzo Pieralisi, mrigendrachaubey,
	ssengar@linux.microsoft.com, linux-hyperv@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org,
	linux-riscv@lists.infradead.org
In-Reply-To: <SN6PR02MB41570A9050B3EB6A905DFF56D450A@SN6PR02MB4157.namprd02.prod.outlook.com>



On 4/1/2026 10:24 PM, Michael Kelley wrote:
> From: Naman Jain <namjain@linux.microsoft.com> Sent: Monday, March 16, 2026 5:13 AM
>>
> 
> Nit: For the patch "Subject", the most common prefix for the file
> arch/arm64/kernel/smp.c is "arm64: smp:".  I'd suggest using that
> prefix for historical consistency.

Acked. Will change in v2.

> 
>> mshv_vtl_main.c calls smp_send_reschedule() which expands to
>> arch_smp_send_reschedule(). When CONFIG_MSHV_VTL=m, the module cannot
>> access this symbol since it is not exported on arm64.
>>
>> smp_send_reschedule() is used in mshv_vtl_cancel() to interrupt a vCPU
>> thread running on another CPU. When a vCPU is looping in
>> mshv_vtl_ioctl_return_to_lower_vtl(), it checks a per-CPU cancel flag
>> before each VTL0 entry. Setting cancel=1 alone is not enough if the
>> target CPU thread is sleeping - the IPI from smp_send_reschedule() kicks
>> the remote CPU out of idle/sleep so it re-checks the cancel flag and
>> exits the loop promptly.
>>
>> Other architectures (riscv, loongarch, powerpc) already export this
>> symbol. Add the same EXPORT_SYMBOL_GPL for arm64. This is required
>> for adding arm64 support in MSHV_VTL.
>>
>> Signed-off-by: Naman Jain <namjain@linux.microsoft.com>
>> ---
>>   arch/arm64/kernel/smp.c | 1 +
>>   1 file changed, 1 insertion(+)
>>
>> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
>> index 1aa324104afb..26b1a4456ceb 100644
>> --- a/arch/arm64/kernel/smp.c
>> +++ b/arch/arm64/kernel/smp.c
>> @@ -1152,6 +1152,7 @@ void arch_smp_send_reschedule(int cpu)
>>   {
>>   	smp_cross_call(cpumask_of(cpu), IPI_RESCHEDULE);
>>   }
>> +EXPORT_SYMBOL_GPL(arch_smp_send_reschedule);
>>
>>   #ifdef CONFIG_ARM64_ACPI_PARKING_PROTOCOL
>>   void arch_send_wakeup_ipi(unsigned int cpu)
>> --
>> 2.43.0
>>
> 
> The "Subject" nit notwithstanding,
> 
> Reviewed-by: Michael Kelley <mhklinux@outlook.com>

Thanks,
Naman



* Re: [PATCH 03/11] Drivers: hv: Add support to setup percpu vmbus handler
From: Naman Jain @ 2026-04-13 11:45 UTC
  To: Michael Kelley, K . Y . Srinivasan, Haiyang Zhang, Wei Liu,
	Dexuan Cui, Long Li, Catalin Marinas, Will Deacon,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	x86@kernel.org, H . Peter Anvin, Arnd Bergmann, Paul Walmsley,
	Palmer Dabbelt, Albert Ou, Alexandre Ghiti
  Cc: Marc Zyngier, Timothy Hayes, Lorenzo Pieralisi, mrigendrachaubey,
	ssengar@linux.microsoft.com, linux-hyperv@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org,
	linux-riscv@lists.infradead.org
In-Reply-To: <SN6PR02MB41570E0F113FE28CFC839476D450A@SN6PR02MB4157.namprd02.prod.outlook.com>



On 4/1/2026 10:25 PM, Michael Kelley wrote:
> From: Naman Jain <namjain@linux.microsoft.com> Sent: Monday, March 16, 2026 5:13 AM
>>
>> Add a wrapper function - hv_setup_percpu_vmbus_handler(), similar to
>> hv_setup_vmbus_handler() to allow setting up custom per-cpu VMBus
>> interrupt handler. This is required for arm64 support, to be added
>> in MSHV_VTL driver, where per-cpu VMBus interrupt handler will be
>> set to mshv_vtl_vmbus_isr() for VTL2 (Virtual Trust Level 2).
> 
> Needing both hv_setup_vmbus_handler() and
> hv_setup_percpu_vmbus_handler() seems unfortunate. Here's an
> alternate approach to consider:
> 
> 1. I think the x86 VMBus sysvec handler and the vmbus_percpu_isr()
> functions could both use the same vmbus_handler global variable.
> Looking at your changes in this patch set, hv_setup_vmbus_handler()
> and hv_setup_percpu_vmbus_handler() are used together and always
> set the same value.
> 
> 2. So move the global variable vmbus_handler out from arch/x86
> and into hv_common.c, and export it. The x86 sysvec handler can
> still reference it, and vmbus_percpu_isr() in vmbus_drv.c can
> also reference it.  No need to have vmbus_percpu_isr() under
> arch/arm64 or have a stub in hv_common.c.
> 
> 3. hv_setup_vmbus_handler() and hv_remove_vmbus_handler()
> also move to hv_common.c.  The __weak stubs go away.
> 
> With these changes, only hv_setup_vmbus_handler() needs to
> be called, and it works for both x86 with the sysvec handler and
> for arm64 with vmbus_percpu_isr().
> 
> I haven't coded this up, so maybe there's some problematic detail,
> but the idea seems like it would work. If it does work, some of my
> comments below are no longer applicable.
> 

This is a great suggestion. The current implementation looked complex in
design, and it was becoming more complex with the changes I was making
while addressing Sashiko's AI review comments. Your suggestion looks much
better, so I'll implement it. Thanks for suggesting it.
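
As a rough userspace model of the idea (not the eventual kernel code), a
single shared handler pointer can serve both interrupt paths. Everything here
is an assumption for illustration: the demo_* names, the demo_irqreturn_t
enum standing in for irqreturn_t, and the lack of any locking or READ_ONCE()
style protection around the pointer.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { DEMO_IRQ_NONE, DEMO_IRQ_HANDLED } demo_irqreturn_t;

/* One shared handler pointer used by both interrupt paths. */
static void (*demo_vmbus_handler)(void);

static void demo_setup_vmbus_handler(void (*handler)(void))
{
	demo_vmbus_handler = handler;
}

static void demo_remove_vmbus_handler(void)
{
	demo_vmbus_handler = NULL;
}

/* Per-CPU IRQ path (arm64 style): dispatch through the shared pointer. */
static demo_irqreturn_t demo_vmbus_percpu_isr(int irq, void *dev_id)
{
	(void)irq;
	(void)dev_id;
	if (demo_vmbus_handler)
		demo_vmbus_handler();
	return DEMO_IRQ_HANDLED;
}

/* Hard-coded vector path (x86 sysvec style): same pointer, same handler. */
static void demo_sysvec_callback(void)
{
	if (demo_vmbus_handler)
		demo_vmbus_handler();
}

/* Example handler that just counts invocations. */
static int demo_calls;

static void demo_isr(void)
{
	demo_calls++;
}
```

In this model a single demo_setup_vmbus_handler() call is enough for either
path to reach the custom handler, which is the property that removes the need
for a second setup function.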

>>
>> Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com>
>> Signed-off-by: Naman Jain <namjain@linux.microsoft.com>
>> ---
>>   arch/arm64/hyperv/mshyperv.c   | 13 +++++++++++++
>>   drivers/hv/hv_common.c         | 11 +++++++++++
>>   drivers/hv/vmbus_drv.c         |  7 +------
>>   include/asm-generic/mshyperv.h |  3 +++
>>   4 files changed, 28 insertions(+), 6 deletions(-)
>>
>> diff --git a/arch/arm64/hyperv/mshyperv.c b/arch/arm64/hyperv/mshyperv.c
>> index 4fdc26ade1d7..d4494ceeaad0 100644
>> --- a/arch/arm64/hyperv/mshyperv.c
>> +++ b/arch/arm64/hyperv/mshyperv.c
>> @@ -134,3 +134,16 @@ bool hv_is_hyperv_initialized(void)
>>   	return hyperv_initialized;
>>   }
>>   EXPORT_SYMBOL_GPL(hv_is_hyperv_initialized);
>> +
>> +static void (*vmbus_percpu_handler)(void);
>> +void hv_setup_percpu_vmbus_handler(void (*handler)(void))
>> +{
>> +	vmbus_percpu_handler = handler;
>> +}
>> +
>> +irqreturn_t vmbus_percpu_isr(int irq, void *dev_id)
>> +{
>> +	if (vmbus_percpu_handler)
>> +		vmbus_percpu_handler();
>> +	return IRQ_HANDLED;
>> +}
>> diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
>> index d1ebc0ebd08f..a5064f558bf6 100644
>> --- a/drivers/hv/hv_common.c
>> +++ b/drivers/hv/hv_common.c
>> @@ -759,6 +759,17 @@ void __weak hv_setup_vmbus_handler(void (*handler)(void))
>>   }
>>   EXPORT_SYMBOL_GPL(hv_setup_vmbus_handler);
>>
>> +irqreturn_t __weak vmbus_percpu_isr(int irq, void *dev_id)
>> +{
>> +	return IRQ_HANDLED;
>> +}
>> +EXPORT_SYMBOL_GPL(vmbus_percpu_isr);
>> +
>> +void __weak hv_setup_percpu_vmbus_handler(void (*handler)(void))
>> +{
>> +}
>> +EXPORT_SYMBOL_GPL(hv_setup_percpu_vmbus_handler);
> 
> You've implemented hv_setup_percpu_vmbus_handler() following
> the pattern of hv_setup_vmbus_handler(), which is reasonable.
> But that turns out to be unnecessarily complicated. The existing
> hv_setup_vmbus_handler() has a portion in
> arch/x86/kernel/cpu/mshyperv.c as a special case because it uses a
> hard-coded interrupt vector on x86/x64, and has its own custom
> sysvec code. And there's a need for a __weak stub in hv_common.c
> so that vmbus_drv.c will compile on arm64.
> 
> But hv_setup_percpu_vmbus_handler() does not have the same
> requirements. It could be implemented entirely in vmbus_drv.c,
> with no code under arch/x86 or arch/arm64, and no __weak stub
> in hv_common.c.  vmbus_drv.c would just need to
> EXPORT_SYMBOL_FOR_MODULES, like it already does with vmbus_isr.
> I didn't code it up, but I think that approach would be simpler with
> fewer piece-parts scattered all over. If so, it would be worth
> breaking the symmetry with hv_setup_vmbus_handler().
> 

No longer applicable.

Regards,
Naman



* Re: [PATCH 04/11] Drivers: hv: Refactor mshv_vtl for ARM64 support to be added
From: Naman Jain @ 2026-04-13 11:46 UTC (permalink / raw)
  To: Michael Kelley, K . Y . Srinivasan, Haiyang Zhang, Wei Liu,
	Dexuan Cui, Long Li, Catalin Marinas, Will Deacon,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	x86@kernel.org, H . Peter Anvin, Arnd Bergmann, Paul Walmsley,
	Palmer Dabbelt, Albert Ou, Alexandre Ghiti
  Cc: Marc Zyngier, Timothy Hayes, Lorenzo Pieralisi, mrigendrachaubey,
	ssengar@linux.microsoft.com, linux-hyperv@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org,
	linux-riscv@lists.infradead.org
In-Reply-To: <SN6PR02MB41573C4A21BA96A534E5429CD450A@SN6PR02MB4157.namprd02.prod.outlook.com>



On 4/1/2026 10:26 PM, Michael Kelley wrote:
> From: Naman Jain <namjain@linux.microsoft.com> Sent: Monday, March 16, 2026 5:13 AM
>>
>> Refactor MSHV_VTL driver to move some of the x86 specific code to arch
>> specific files, and add corresponding functions for arm64.
>>
>> Signed-off-by: Roman Kisel <romank@linux.microsoft.com>
>> Signed-off-by: Naman Jain <namjain@linux.microsoft.com>
>> ---
>>   arch/arm64/include/asm/mshyperv.h |  10 +++
>>   arch/x86/hyperv/hv_vtl.c          |  98 ++++++++++++++++++++++++++++
>>   arch/x86/include/asm/mshyperv.h   |   1 +
>>   drivers/hv/mshv_vtl_main.c        | 102 +-----------------------------
>>   4 files changed, 111 insertions(+), 100 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/mshyperv.h
>> b/arch/arm64/include/asm/mshyperv.h
>> index b721d3134ab6..804068e0941b 100644
>> --- a/arch/arm64/include/asm/mshyperv.h
>> +++ b/arch/arm64/include/asm/mshyperv.h
>> @@ -60,6 +60,16 @@ static inline u64 hv_get_non_nested_msr(unsigned int reg)
>>   				ARM_SMCCC_SMC_64,		\
>>   				ARM_SMCCC_OWNER_VENDOR_HYP,	\
>>   				HV_SMCCC_FUNC_NUMBER)
>> +#ifdef CONFIG_HYPERV_VTL_MODE
>> +/*
>> + * Get/Set the register. If the function returns `1`, that must be done via
>> + * a hypercall. Returning `0` means success.
>> + */
>> +static inline int hv_vtl_get_set_reg(struct hv_register_assoc *regs, bool set, u64 shared)
>> +{
>> +	return 1;
>> +}
>> +#endif
>>
>>   #include <asm-generic/mshyperv.h>
>>
>> diff --git a/arch/x86/hyperv/hv_vtl.c b/arch/x86/hyperv/hv_vtl.c
>> index 9b6a9bc4ab76..72a0bb4ae0c7 100644
>> --- a/arch/x86/hyperv/hv_vtl.c
>> +++ b/arch/x86/hyperv/hv_vtl.c
>> @@ -17,6 +17,8 @@
>>   #include <asm/realmode.h>
>>   #include <asm/reboot.h>
>>   #include <asm/smap.h>
>> +#include <uapi/asm/mtrr.h>
>> +#include <asm/debugreg.h>
>>   #include <linux/export.h>
>>   #include <../kernel/smpboot.h>
>>   #include "../../kernel/fpu/legacy.h"
>> @@ -281,3 +283,99 @@ void mshv_vtl_return_call(struct mshv_vtl_cpu_context *vtl0)
>>   	kernel_fpu_end();
>>   }
>>   EXPORT_SYMBOL(mshv_vtl_return_call);
>> +
>> +/* Static table mapping register names to their corresponding actions */
>> +static const struct {
>> +	enum hv_register_name reg_name;
>> +	int debug_reg_num;  /* -1 if not a debug register */
>> +	u32 msr_addr;       /* 0 if not an MSR */
>> +} reg_table[] = {
>> +	/* Debug registers */
>> +	{HV_X64_REGISTER_DR0, 0, 0},
>> +	{HV_X64_REGISTER_DR1, 1, 0},
>> +	{HV_X64_REGISTER_DR2, 2, 0},
>> +	{HV_X64_REGISTER_DR3, 3, 0},
>> +	{HV_X64_REGISTER_DR6, 6, 0},
>> +	/* MTRR MSRs */
>> +	{HV_X64_REGISTER_MSR_MTRR_CAP, -1, MSR_MTRRcap},
>> +	{HV_X64_REGISTER_MSR_MTRR_DEF_TYPE, -1, MSR_MTRRdefType},
>> +	{HV_X64_REGISTER_MSR_MTRR_PHYS_BASE0, -1, MTRRphysBase_MSR(0)},
>> +	{HV_X64_REGISTER_MSR_MTRR_PHYS_BASE1, -1, MTRRphysBase_MSR(1)},
>> +	{HV_X64_REGISTER_MSR_MTRR_PHYS_BASE2, -1, MTRRphysBase_MSR(2)},
>> +	{HV_X64_REGISTER_MSR_MTRR_PHYS_BASE3, -1, MTRRphysBase_MSR(3)},
>> +	{HV_X64_REGISTER_MSR_MTRR_PHYS_BASE4, -1, MTRRphysBase_MSR(4)},
>> +	{HV_X64_REGISTER_MSR_MTRR_PHYS_BASE5, -1, MTRRphysBase_MSR(5)},
>> +	{HV_X64_REGISTER_MSR_MTRR_PHYS_BASE6, -1, MTRRphysBase_MSR(6)},
>> +	{HV_X64_REGISTER_MSR_MTRR_PHYS_BASE7, -1, MTRRphysBase_MSR(7)},
>> +	{HV_X64_REGISTER_MSR_MTRR_PHYS_BASE8, -1, MTRRphysBase_MSR(8)},
>> +	{HV_X64_REGISTER_MSR_MTRR_PHYS_BASE9, -1, MTRRphysBase_MSR(9)},
>> +	{HV_X64_REGISTER_MSR_MTRR_PHYS_BASEA, -1, MTRRphysBase_MSR(0xa)},
>> +	{HV_X64_REGISTER_MSR_MTRR_PHYS_BASEB, -1, MTRRphysBase_MSR(0xb)},
>> +	{HV_X64_REGISTER_MSR_MTRR_PHYS_BASEC, -1, MTRRphysBase_MSR(0xc)},
>> +	{HV_X64_REGISTER_MSR_MTRR_PHYS_BASED, -1, MTRRphysBase_MSR(0xd)},
>> +	{HV_X64_REGISTER_MSR_MTRR_PHYS_BASEE, -1, MTRRphysBase_MSR(0xe)},
>> +	{HV_X64_REGISTER_MSR_MTRR_PHYS_BASEF, -1, MTRRphysBase_MSR(0xf)},
>> +	{HV_X64_REGISTER_MSR_MTRR_PHYS_MASK0, -1, MTRRphysMask_MSR(0)},
>> +	{HV_X64_REGISTER_MSR_MTRR_PHYS_MASK1, -1, MTRRphysMask_MSR(1)},
>> +	{HV_X64_REGISTER_MSR_MTRR_PHYS_MASK2, -1, MTRRphysMask_MSR(2)},
>> +	{HV_X64_REGISTER_MSR_MTRR_PHYS_MASK3, -1, MTRRphysMask_MSR(3)},
>> +	{HV_X64_REGISTER_MSR_MTRR_PHYS_MASK4, -1, MTRRphysMask_MSR(4)},
>> +	{HV_X64_REGISTER_MSR_MTRR_PHYS_MASK5, -1, MTRRphysMask_MSR(5)},
>> +	{HV_X64_REGISTER_MSR_MTRR_PHYS_MASK6, -1, MTRRphysMask_MSR(6)},
>> +	{HV_X64_REGISTER_MSR_MTRR_PHYS_MASK7, -1, MTRRphysMask_MSR(7)},
>> +	{HV_X64_REGISTER_MSR_MTRR_PHYS_MASK8, -1, MTRRphysMask_MSR(8)},
>> +	{HV_X64_REGISTER_MSR_MTRR_PHYS_MASK9, -1, MTRRphysMask_MSR(9)},
>> +	{HV_X64_REGISTER_MSR_MTRR_PHYS_MASKA, -1, MTRRphysMask_MSR(0xa)},
>> +	{HV_X64_REGISTER_MSR_MTRR_PHYS_MASKB, -1, MTRRphysMask_MSR(0xb)},
>> +	{HV_X64_REGISTER_MSR_MTRR_PHYS_MASKC, -1, MTRRphysMask_MSR(0xc)},
>> +	{HV_X64_REGISTER_MSR_MTRR_PHYS_MASKD, -1, MTRRphysMask_MSR(0xd)},
>> +	{HV_X64_REGISTER_MSR_MTRR_PHYS_MASKE, -1, MTRRphysMask_MSR(0xe)},
>> +	{HV_X64_REGISTER_MSR_MTRR_PHYS_MASKF, -1, MTRRphysMask_MSR(0xf)},
>> +	{HV_X64_REGISTER_MSR_MTRR_FIX64K00000, -1, MSR_MTRRfix64K_00000},
>> +	{HV_X64_REGISTER_MSR_MTRR_FIX16K80000, -1, MSR_MTRRfix16K_80000},
>> +	{HV_X64_REGISTER_MSR_MTRR_FIX16KA0000, -1, MSR_MTRRfix16K_A0000},
>> +	{HV_X64_REGISTER_MSR_MTRR_FIX4KC0000, -1, MSR_MTRRfix4K_C0000},
>> +	{HV_X64_REGISTER_MSR_MTRR_FIX4KC8000, -1, MSR_MTRRfix4K_C8000},
>> +	{HV_X64_REGISTER_MSR_MTRR_FIX4KD0000, -1, MSR_MTRRfix4K_D0000},
>> +	{HV_X64_REGISTER_MSR_MTRR_FIX4KD8000, -1, MSR_MTRRfix4K_D8000},
>> +	{HV_X64_REGISTER_MSR_MTRR_FIX4KE0000, -1, MSR_MTRRfix4K_E0000},
>> +	{HV_X64_REGISTER_MSR_MTRR_FIX4KE8000, -1, MSR_MTRRfix4K_E8000},
>> +	{HV_X64_REGISTER_MSR_MTRR_FIX4KF0000, -1, MSR_MTRRfix4K_F0000},
>> +	{HV_X64_REGISTER_MSR_MTRR_FIX4KF8000, -1, MSR_MTRRfix4K_F8000},
>> +};
>> +
>> +int hv_vtl_get_set_reg(struct hv_register_assoc *regs, bool set, u64 shared)
>> +{
>> +	u64 *reg64;
>> +	enum hv_register_name gpr_name;
>> +	int i;
>> +
>> +	gpr_name = regs->name;
>> +	reg64 = &regs->value.reg64;
>> +
>> +	/* Search for the register in the table */
>> +	for (i = 0; i < ARRAY_SIZE(reg_table); i++) {
>> +		if (reg_table[i].reg_name != gpr_name)
>> +			continue;
>> +		if (reg_table[i].debug_reg_num != -1) {
>> +			/* Handle debug registers */
>> +			if (gpr_name == HV_X64_REGISTER_DR6 && !shared)
>> +				goto hypercall;
>> +			if (set)
>> +				native_set_debugreg(reg_table[i].debug_reg_num, *reg64);
>> +			else
>> +				*reg64 = native_get_debugreg(reg_table[i].debug_reg_num);
>> +		} else {
>> +			/* Handle MSRs */
>> +			if (set)
>> +				wrmsrl(reg_table[i].msr_addr, *reg64);
>> +			else
>> +				rdmsrl(reg_table[i].msr_addr, *reg64);
>> +		}
>> +		return 0;
>> +	}
>> +
>> +hypercall:
>> +	return 1;
>> +}
>> +EXPORT_SYMBOL(hv_vtl_get_set_reg);
>> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
>> index f64393e853ee..d5355a5b7517 100644
>> --- a/arch/x86/include/asm/mshyperv.h
>> +++ b/arch/x86/include/asm/mshyperv.h
>> @@ -304,6 +304,7 @@ void mshv_vtl_return_call(struct mshv_vtl_cpu_context *vtl0);
>>   void mshv_vtl_return_call_init(u64 vtl_return_offset);
>>   void mshv_vtl_return_hypercall(void);
>>   void __mshv_vtl_return_call(struct mshv_vtl_cpu_context *vtl0);
>> +int hv_vtl_get_set_reg(struct hv_register_assoc *regs, bool set, u64 shared);
> 
> Can this move to asm-generic/mshyperv.h?  The function is no longer specific
> to x86/x64, so one would want to not declare it in the arch/x86 version
> of mshyperv.h. But maybe moving it to asm-generic/mshyperv.h breaks
> compilation on arm64 because there's also the static inline stub there.

This is still arch-specific (x86, to be precise). On ARM64 we always
want to return 1, which tells the caller to fall back to a hypercall.
Moving this x86-specific implementation (X64 registers, MTRRs, etc.) to
common code would not be right. One thing that could be done here is to
move the "return 1" stub from arm64 to asm-generic mshyperv.h, but that
would not provide much benefit.
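
To make that contract concrete, here is a minimal userspace sketch (not
the driver code). Only the 0 / 1 return convention is taken from the
patch; the demo_* types, the register id and the dispatch helper are
hypothetical stand-ins for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct hv_register_assoc. */
struct demo_reg_assoc {
	uint32_t name;
	uint64_t value;
};

enum demo_path { DEMO_PATH_DIRECT, DEMO_PATH_HYPERCALL };

/* Arch hook contract from the patch: 0 = handled, 1 = use a hypercall. */
typedef int (*demo_get_set_reg_fn)(struct demo_reg_assoc *reg, bool set);

/* arm64-style stub: nothing is handled natively. */
static int demo_arm64_get_set_reg(struct demo_reg_assoc *reg, bool set)
{
	(void)reg;
	(void)set;
	return 1;
}

#define DEMO_REG_DIRECT 6u	/* arbitrary id handled natively in this demo */

static uint64_t demo_native_reg;	/* stands in for a debug register or MSR */

/* x86-style hook: handle a known register directly, defer the rest. */
static int demo_x86_get_set_reg(struct demo_reg_assoc *reg, bool set)
{
	if (reg->name != DEMO_REG_DIRECT)
		return 1;
	if (set)
		demo_native_reg = reg->value;
	else
		reg->value = demo_native_reg;
	return 0;
}

/* Caller side: try the arch hook first, fall back when it returns 1. */
static enum demo_path demo_access_reg(demo_get_set_reg_fn hook,
				      struct demo_reg_assoc *reg, bool set)
{
	if (hook(reg, set) == 0)
		return DEMO_PATH_DIRECT;
	/* A real driver would issue the register hypercall here. */
	return DEMO_PATH_HYPERCALL;
}
```

With the arm64-style stub every access takes the hypercall path, which
is why a shared declaration would buy little for that architecture.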

I am currently not planning to make any changes here. If I misunderstood 
your comment, please let me know.

Regards,
Naman




* Re: [PATCH 05/11] drivers: hv: Export vmbus_interrupt for mshv_vtl module
From: Naman Jain @ 2026-04-13 11:46 UTC (permalink / raw)
  To: Michael Kelley, K . Y . Srinivasan, Haiyang Zhang, Wei Liu,
	Dexuan Cui, Long Li, Catalin Marinas, Will Deacon,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	x86@kernel.org, H . Peter Anvin, Arnd Bergmann, Paul Walmsley,
	Palmer Dabbelt, Albert Ou, Alexandre Ghiti
  Cc: Marc Zyngier, Timothy Hayes, Lorenzo Pieralisi, mrigendrachaubey,
	ssengar@linux.microsoft.com, linux-hyperv@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org,
	linux-riscv@lists.infradead.org
In-Reply-To: <SN6PR02MB4157F1DAEF3BC14A67D59FB8D450A@SN6PR02MB4157.namprd02.prod.outlook.com>



On 4/1/2026 10:26 PM, Michael Kelley wrote:
> From: Naman Jain <namjain@linux.microsoft.com> Sent: Monday, March 16, 2026 5:13 AM
>>
> 
> Nit:  For the patch Subject, capitalize "Drivers:" in the prefix.

Acked.

Thanks,
Naman

> 
>> vmbus_interrupt is used in mshv_vtl_main.c to set the SINT vector.
>> When CONFIG_MSHV_VTL=m and CONFIG_HYPERV_VMBUS=y (built-in), the module
>> cannot access vmbus_interrupt at load time since it is not exported.
>>
>> Export it using EXPORT_SYMBOL_FOR_MODULES consistent with the existing
>> pattern used for vmbus_isr.
>>
>> Signed-off-by: Naman Jain <namjain@linux.microsoft.com>
>> ---
>>   drivers/hv/vmbus_drv.c | 1 +
>>   1 file changed, 1 insertion(+)
>>
>> diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
>> index f99d4f2d3862..de191799a8f6 100644
>> --- a/drivers/hv/vmbus_drv.c
>> +++ b/drivers/hv/vmbus_drv.c
>> @@ -57,6 +57,7 @@ static DEFINE_PER_CPU(long, vmbus_evt);
>>   /* Values parsed from ACPI DSDT */
>>   int vmbus_irq;
>>   int vmbus_interrupt;
>> +EXPORT_SYMBOL_FOR_MODULES(vmbus_interrupt, "mshv_vtl");
>>
>>   /*
>>    * If the Confidential VMBus is used, the data on the "wire" is not
>> --
>> 2.43.0
>>
> 
> Reviewed-by: Michael Kelley <mhklinux@outlook.com>




* [PATCH] drm: mxsfb: lcdif: enforce 64-byte pitch alignment for scanout
From: Advait Dhamorikar @ 2026-04-13 11:44 UTC (permalink / raw)
  To: marex
  Cc: stefan, maarten.lankhorst, mripard, tzimmermann, airlied, simona,
	Frank.Li, s.hauer, kernel, festevam, dri-devel, imx,
	linux-arm-kernel, linux-kernel, Advait Dhamorikar

The LCDIF controller expects the framebuffer pitch to be aligned to a
64-byte boundary for reliable scanout. While byte-granular pitches are
supported by the interface, the i.MX8MP reference manual
recommends 64-byte alignment for optimal operation.

Corrupted output was observed with XR24 framebuffers where a pitch of
4320 bytes caused visible corruption and choppy display, while an aligned
pitch of 4352 bytes worked correctly.

Ensure that only framebuffers with properly aligned pitch are accepted
by rejecting invalid configurations in lcdif_plane_atomic_check().
This allows userspace to fall back to a compatible allocation.
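
As a rough illustration of the arithmetic (a userspace sketch, not part
of the patch), the helpers below check a pitch against the 64-byte
boundary and round it up. The demo_* names are hypothetical; the 1080
pixel width is only an assumption that happens to produce the 4320-byte
pitch mentioned above at 4 bytes per XR24 pixel.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_PITCH_ALIGN 64u	/* alignment checked by the patch */

/* True when the pitch sits on a 64-byte boundary. */
static bool demo_pitch_is_aligned(uint32_t pitch)
{
	return (pitch & (DEMO_PITCH_ALIGN - 1)) == 0;
}

/* Round a pitch up to the next 64-byte boundary. */
static uint32_t demo_pitch_align_up(uint32_t pitch)
{
	return (pitch + DEMO_PITCH_ALIGN - 1) & ~(DEMO_PITCH_ALIGN - 1);
}

/* Tightly packed XR24 pitch: 4 bytes per pixel. */
static uint32_t demo_xr24_min_pitch(uint32_t width)
{
	return width * 4;
}
```

Under these assumptions a 4320-byte pitch leaves a remainder of 32 and
is rejected, while rounding it up yields the 4352-byte pitch that
worked.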

Signed-off-by: Advait Dhamorikar <advaitd@mechasystems.com>
---
 drivers/gpu/drm/mxsfb/lcdif_kms.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/drivers/gpu/drm/mxsfb/lcdif_kms.c b/drivers/gpu/drm/mxsfb/lcdif_kms.c
index 72eb0de46b54..8e574e9a591a 100644
--- a/drivers/gpu/drm/mxsfb/lcdif_kms.c
+++ b/drivers/gpu/drm/mxsfb/lcdif_kms.c
@@ -674,6 +674,18 @@ static int lcdif_plane_atomic_check(struct drm_plane *plane,
 	crtc_state = drm_atomic_get_new_crtc_state(state,
 						   &lcdif->crtc);

+	/*
+	 * While byte granularity is supported, LCDIF requires
+	 * that framebuffer pitch be aligned to 64 bytes.
+	 */
+	if (plane_state->fb &&
+	    !IS_ALIGNED(plane_state->fb->pitches[0], 64)) {
+		DRM_DEV_DEBUG_DRIVER(plane->dev->dev,
+							"Framebuffer pitch (%u bytes) must be aligned to 64 bytes\n",
+							plane_state->fb->pitches[0]);
+		return -EINVAL;
+	}
+
 	return drm_atomic_helper_check_plane_state(plane_state, crtc_state,
 						   DRM_PLANE_NO_SCALING,
 						   DRM_PLANE_NO_SCALING,
-- 
2.43.0


* Re: [PATCH 06/11] Drivers: hv: Make sint vector architecture neutral in MSHV_VTL
From: Naman Jain @ 2026-04-13 11:47 UTC (permalink / raw)
  To: Michael Kelley, K . Y . Srinivasan, Haiyang Zhang, Wei Liu,
	Dexuan Cui, Long Li, Catalin Marinas, Will Deacon,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	x86@kernel.org, H . Peter Anvin, Arnd Bergmann, Paul Walmsley,
	Palmer Dabbelt, Albert Ou, Alexandre Ghiti
  Cc: Marc Zyngier, Timothy Hayes, Lorenzo Pieralisi, mrigendrachaubey,
	ssengar@linux.microsoft.com, linux-hyperv@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org,
	linux-riscv@lists.infradead.org
In-Reply-To: <SN6PR02MB4157521DEF9EA2471B6F3359D450A@SN6PR02MB4157.namprd02.prod.outlook.com>



On 4/1/2026 10:27 PM, Michael Kelley wrote:
> From: Naman Jain <namjain@linux.microsoft.com> Sent: Monday, March 16, 2026 5:13 AM
>>
>> Generalize Synthetic interrupt source vector (sint) to use
>> vmbus_interrupt variable instead, which automatically takes care of
>> architectures where HYPERVISOR_CALLBACK_VECTOR is not present (arm64).
> 
> Sashiko AI raised an interesting question about the startup timing --
> whether the vmbus_platform_driver_probe() is guaranteed to have
> set vmbus_interrupt before the VTL functions below run and use it.
> What causes the mshv_vtl.ko module to be loaded, and hence run
> mshv_vtl_init()?

There is no race condition here. The init ordering guarantees that
vmbus_interrupt is always set before mshv_vtl_synic_enable_regs()
reads it.

The call chain for setting vmbus_interrupt:

   subsys_initcall(hv_acpi_init)                          [level 4]
     -> platform_driver_register(&vmbus_platform_driver) and so on.


The call chain for reading vmbus_interrupt:

   module_init(mshv_vtl_init)                             [level 6]
     -> hv_vtl_setup_synic()
       -> cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, ...,
                            mshv_vtl_alloc_context, ...)
         -> mshv_vtl_alloc_context()
           -> mshv_vtl_synic_enable_regs()
             -> sint.vector = vmbus_interrupt

do_initcalls() processes sections in order 0 through 7, so 
hv_acpi_init() (level 4) is guaranteed to complete before 
mshv_vtl_init() (level 6) runs.
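
The level ordering argument can be modelled with a small userspace
sketch. This is only a toy: the demo_* names and the vector value 18 are
made up, and it models just the level-by-level walk for the built-in
case, not driver probing or module loading.

```c
#include <stddef.h>

static int demo_vmbus_interrupt = -1;	/* unset until the level 4 call runs */
static int demo_sint_vector = -1;

/* Stands in for hv_acpi_init(): the writer, registered at level 4. */
static int demo_hv_acpi_init(void)
{
	demo_vmbus_interrupt = 18;	/* arbitrary placeholder value */
	return 0;
}

/* Stands in for mshv_vtl_init(): the reader, registered at level 6. */
static int demo_mshv_vtl_init(void)
{
	demo_sint_vector = demo_vmbus_interrupt;
	return 0;
}

struct demo_initcall {
	int level;
	int (*fn)(void);
};

/* Deliberately listed out of order; only the level decides the order. */
static const struct demo_initcall demo_initcalls[] = {
	{ 6, demo_mshv_vtl_init },
	{ 4, demo_hv_acpi_init },
};

/* Walk levels 0 through 7 in order, as described above. */
static void demo_do_initcalls(void)
{
	size_t n = sizeof(demo_initcalls) / sizeof(demo_initcalls[0]);

	for (int level = 0; level <= 7; level++)
		for (size_t i = 0; i < n; i++)
			if (demo_initcalls[i].level == level)
				demo_initcalls[i].fn();
}
```

After demo_do_initcalls() the reader sees the value stored by the
writer, regardless of the order of the table entries.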



Regarding the memory leak on CPU offline/online or module load/unload:
that is beyond the scope of this series, and I will fix it separately.

I may need some more time to address the comments on the rest of the
patches. Please bear with me.

Regards,
Naman

> 
>>
>> Signed-off-by: Roman Kisel <romank@linux.microsoft.com>
>> Signed-off-by: Naman Jain <namjain@linux.microsoft.com>
>> ---
>>   drivers/hv/mshv_vtl_main.c | 4 ++--
>>   1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/hv/mshv_vtl_main.c b/drivers/hv/mshv_vtl_main.c
>> index b607b6e7e121..91517b45d526 100644
>> --- a/drivers/hv/mshv_vtl_main.c
>> +++ b/drivers/hv/mshv_vtl_main.c
>> @@ -234,7 +234,7 @@ static void mshv_vtl_synic_enable_regs(unsigned int cpu)
>>   	union hv_synic_sint sint;
>>
>>   	sint.as_uint64 = 0;
>> -	sint.vector = HYPERVISOR_CALLBACK_VECTOR;
>> +	sint.vector = vmbus_interrupt;
>>   	sint.masked = false;
>>   	sint.auto_eoi = hv_recommend_using_aeoi();
>>
>> @@ -753,7 +753,7 @@ static void mshv_vtl_synic_mask_vmbus_sint(void *info)
>>   	const u8 *mask = info;
>>
>>   	sint.as_uint64 = 0;
>> -	sint.vector = HYPERVISOR_CALLBACK_VECTOR;
>> +	sint.vector = vmbus_interrupt;
>>   	sint.masked = (*mask != 0);
>>   	sint.auto_eoi = hv_recommend_using_aeoi();
>>
>> --
>> 2.43.0
>>
> 
> Assuming there's no timing problem vs. the VMBus driver,
> 
> Reviewed-by: Michael Kelley <mhklinux@outlook.com>




* Re: [PATCH v2 1/3] arm64: mm: Fix rodata=full block mapping support for realm guests
From: Kevin Brodsky @ 2026-04-13 11:47 UTC (permalink / raw)
  To: Ryan Roberts, Catalin Marinas, Will Deacon,
	David Hildenbrand (Arm), Dev Jain, Yang Shi, Suzuki K Poulose,
	Jinjiang Tu
  Cc: linux-arm-kernel, linux-kernel, stable
In-Reply-To: <20260330161705.3349825-2-ryan.roberts@arm.com>

On 30/03/2026 18:17, Ryan Roberts wrote:
> +bool page_alloc_available __ro_after_init;
> +
> +void __init mem_init(void)
> +{
> + page_alloc_available = true;

I knew I had seen a simpler solution to this: vmemmap_alloc_block() uses
slab_is_available() to tell whether the buddy allocator is available.
AFAICT this becomes true somewhere in kmem_cache_init(), which is called
just after mem_init(). Probably good enough for our purpose?

- Kevin

> + swiotlb_update_mem_attributes();
> +}


* Re: [PATCH] drm/bridge: stm_lvds: Do not fail atomic_check on disabled connector
From: Raphael Gallais-Pou @ 2026-04-13 11:48 UTC (permalink / raw)
  To: dri-devel, Marek Vasut
  Cc: Alexandre Torgue, David Airlie, Maarten Lankhorst,
	Maxime Coquelin, Maxime Ripard, Philippe Cornu, Simona Vetter,
	Thomas Zimmermann, Yannick Fertre, linux-arm-kernel, linux-kernel,
	linux-stm32
In-Reply-To: <20260409024928.344010-1-marex@nabladev.com>


On Thu, 09 Apr 2026 04:48:41 +0200, Marek Vasut wrote:
> If the connector is disabled, the new connector state has .crtc field
> set to NULL and there is nothing more to validate after that point.
> The .crtc field being NULL is not an error. Test for .crtc being NULL,
> and if it is NULL, exit early with return 0.
> 
> This fixes a failure in suspend/resume path, where the connector is
> already disabled, but .atomic_check is called, fails, returns -EINVAL
> and blocks the suspend entry.
> 
> [...]

Applied, thanks!

[1/1] drm/bridge: stm_lvds: Do not fail atomic_check on disabled connector
      commit: eecdd4bd6e47bf0c8ff1e049771fa5bab7074c7c

Best regards,
-- 
Raphael Gallais-Pou <raphael.gallais-pou@foss.st.com>




* Re: [PATCH v3 2/4] mm: use tiered folio allocation for VM_EXEC readahead
From: Usama Arif @ 2026-04-13 11:48 UTC (permalink / raw)
  To: Jan Kara
  Cc: Andrew Morton, david, willy, ryan.roberts, linux-mm, r, ajd,
	apopple, baohua, baolin.wang, brauner, catalin.marinas, dev.jain,
	kees, kevin.brodsky, lance.yang, Liam.Howlett, linux-arm-kernel,
	linux-fsdevel, linux-kernel, Lorenzo Stoakes, mhocko, npache,
	pasha.tatashin, rmclure, rppt, surenb, vbabka, Al Viro,
	wilts.infradead.org, ziy, hannes, kas, shakeel.butt, leitao,
	kernel-team
In-Reply-To: <aji7zs42th272khtxesk6dfcrgf7ddr5r5n62wgzeqooyexgxf@5ns3i47f5nlg>



On 13/04/2026 12:03, Jan Kara wrote:
> On Thu 02-04-26 11:08:23, Usama Arif wrote:
>> When executable pages are faulted via do_sync_mmap_readahead(), request
>> a folio order that enables the best hardware TLB coalescing available:
>>
>> - If the VMA is large enough to contain a full PMD, request
>>   HPAGE_PMD_ORDER so the folio can be PMD-mapped. This benefits
>>   architectures where PMD_SIZE is reasonable (e.g. 2M on x86-64
>>   and arm64 with 4K pages). VM_EXEC VMAs are very unlikely to be
>>   large enough for 512M pages on ARM to take effect.
> 
> I'm not sure relying on PMD_SIZE will be too much for a VMA is a great
> strategy. With 16k PAGE_SIZE the PMD would be 32MB large which would fit in
> the .text size but already looks a bit too much? Mapping with PMD sized
> folios brings some benefits but at the same time it costs because now parts
> of VMA that would be never paged in are pulled into memory and also LRU
> tracking now happens with this very large granularity making it fairly
> inefficient (big folios have much higher chances of getting accessed
> similarly often making LRU order mostly random). We are already getting
> reports of people with small machines (phones etc.) where the memory
> overhead of large folios (in the page cache) is simply too much. So I'd
> have a bigger peace of mind if we capped folio size at 2MB for now until we
> come with a more sophisticated heuristic of picking sensible folio order
> given the machine size. Now I'm not really an MM person so my feeling here
> may be just wrong but I wanted to voice this concern from what I can see...
> 
> 								Honza
> 
> 

Thanks for the feedback! I agree, it makes sense. I did that in the previous
revision [1]. I will reinstate that in the next one.

[1] https://lore.kernel.org/all/20260320140315.979307-3-usama.arif@linux.dev/
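
For reference, the orders under discussion can be worked out with a
small userspace sketch. It assumes the arm64 layout where each
page-table level resolves PAGE_SHIFT - 3 bits; the demo_* helpers are
hypothetical, and the 2M cap is the one suggested above, not necessarily
the exact form used in [1].

```c
/* Integer log2 for a non-zero value. */
static unsigned int demo_ilog2(unsigned long v)
{
	unsigned int r = 0;

	while (v >>= 1)
		r++;
	return r;
}

#define DEMO_SZ_64K (64UL << 10)
#define DEMO_SZ_2M  (2UL << 20)

/* PMD order on arm64: a PMD entry covers 2^(PAGE_SHIFT - 3) pages. */
static unsigned int demo_pmd_order(unsigned int page_shift)
{
	return page_shift - 3;
}

/* Mirrors the exec_folio_order() logic in the quoted patch. */
static unsigned int demo_exec_folio_order(unsigned int page_shift)
{
	if ((1UL << page_shift) == DEMO_SZ_64K)
		return demo_ilog2(DEMO_SZ_2M >> page_shift);
	return demo_ilog2(DEMO_SZ_64K >> page_shift);
}

/* PMD order capped so that a folio never exceeds 2M. */
static unsigned int demo_capped_order(unsigned int page_shift)
{
	unsigned int cap = demo_ilog2(DEMO_SZ_2M >> page_shift);
	unsigned int pmd = demo_pmd_order(page_shift);

	return pmd < cap ? pmd : cap;
}
```

With 4K pages the cap changes nothing (order 9, 2M). With 16K pages the
PMD order of 11 (32M) drops to 7, and with 64K pages the PMD order of 13
(512M) drops to 5, in both cases a 2M folio.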

>> - Otherwise, fall back to exec_folio_order(), which returns the
>>   minimum order for hardware PTE coalescing for arm64:
>>   - arm64 4K:  order 4 (64K) for contpte (16 PTEs → 1 iTLB entry)
>>   - arm64 16K: order 2 (64K) for HPA (4 pages → 1 TLB entry)
>>   - arm64 64K: order 5 (2M) for contpte (32 PTEs → 1 iTLB entry)
>>   - generic:   order 0 (no coalescing)
>>
>> Update the arm64 exec_folio_order() to return ilog2(SZ_2M >>
>> PAGE_SHIFT) on 64K page configurations, where the previous SZ_64K
>> value collapsed to order 0 (a single page) and provided no coalescing
>> benefit.
>>
>> Use ~__GFP_RECLAIM so the allocation is opportunistic: if a large
>> folio is readily available, use it, otherwise fall back to smaller
>> folios without stalling on reclaim or compaction. The existing fallback
>> in page_cache_ra_order() handles this naturally.
>>
>> The readahead window is already clamped to the VMA boundaries, so
>> ra->size naturally caps the folio order via ilog2(ra->size) in
>> page_cache_ra_order().
>>
>> Signed-off-by: Usama Arif <usama.arif@linux.dev>
>> ---
>>  arch/arm64/include/asm/pgtable.h | 16 +++++++++----
>>  mm/filemap.c                     | 40 +++++++++++++++++++++++---------
>>  mm/internal.h                    |  3 ++-
>>  mm/readahead.c                   |  7 +++---
>>  4 files changed, 45 insertions(+), 21 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
>> index 52bafe79c10a..9ce9f73a6f35 100644
>> --- a/arch/arm64/include/asm/pgtable.h
>> +++ b/arch/arm64/include/asm/pgtable.h
>> @@ -1591,12 +1591,18 @@ static inline void update_mmu_cache_range(struct vm_fault *vmf,
>>  #define arch_wants_old_prefaulted_pte	cpu_has_hw_af
>>  
>>  /*
>> - * Request exec memory is read into pagecache in at least 64K folios. This size
>> - * can be contpte-mapped when 4K base pages are in use (16 pages into 1 iTLB
>> - * entry), and HPA can coalesce it (4 pages into 1 TLB entry) when 16K base
>> - * pages are in use.
>> + * Request exec memory is read into pagecache in folios large enough for
>> + * hardware TLB coalescing. On 4K and 16K page configs this is 64K, which
>> + * enables contpte mapping (16 × 4K) and HPA coalescing (4 × 16K). On
>> + * 64K page configs, contpte requires 2M (32 × 64K).
>>   */
>> -#define exec_folio_order() ilog2(SZ_64K >> PAGE_SHIFT)
>> +#define exec_folio_order exec_folio_order
>> +static inline unsigned int exec_folio_order(void)
>> +{
>> +	if (PAGE_SIZE == SZ_64K)
>> +		return ilog2(SZ_2M >> PAGE_SHIFT);
>> +	return ilog2(SZ_64K >> PAGE_SHIFT);
>> +}
>>  
>>  static inline bool pud_sect_supported(void)
>>  {
>> diff --git a/mm/filemap.c b/mm/filemap.c
>> index a4ea869b2ca1..7ffea986b3b4 100644
>> --- a/mm/filemap.c
>> +++ b/mm/filemap.c
>> @@ -3311,6 +3311,7 @@ static struct file *do_sync_mmap_readahead(struct vm_fault *vmf)
>>  	DEFINE_READAHEAD(ractl, file, ra, mapping, vmf->pgoff);
>>  	struct file *fpin = NULL;
>>  	vm_flags_t vm_flags = vmf->vma->vm_flags;
>> +	gfp_t gfp = readahead_gfp_mask(mapping);
>>  	bool force_thp_readahead = false;
>>  	unsigned short mmap_miss;
>>  
>> @@ -3363,28 +3364,45 @@ static struct file *do_sync_mmap_readahead(struct vm_fault *vmf)
>>  			ra->size *= 2;
>>  		ra->async_size = HPAGE_PMD_NR;
>>  		ra->order = HPAGE_PMD_ORDER;
>> -		page_cache_ra_order(&ractl, ra);
>> +		page_cache_ra_order(&ractl, ra, gfp);
>>  		return fpin;
>>  	}
>>  
>>  	if (vm_flags & VM_EXEC) {
>>  		/*
>> -		 * Allow arch to request a preferred minimum folio order for
>> -		 * executable memory. This can often be beneficial to
>> -		 * performance if (e.g.) arm64 can contpte-map the folio.
>> -		 * Executable memory rarely benefits from readahead, due to its
>> -		 * random access nature, so set async_size to 0.
>> +		 * Request large folios for executable memory to enable
>> +		 * hardware PTE coalescing and PMD mappings:
>>  		 *
>> -		 * Limit to the boundaries of the VMA to avoid reading in any
>> -		 * pad that might exist between sections, which would be a waste
>> -		 * of memory.
>> +		 *  - If the VMA is large enough for a PMD, request
>> +		 *    HPAGE_PMD_ORDER so the folio can be PMD-mapped.
>> +		 *  - Otherwise, use exec_folio_order() which returns
>> +		 *    the minimum order for hardware TLB coalescing
>> +		 *    (e.g. arm64 contpte/HPA).
>> +		 *
>> +		 * Use ~__GFP_RECLAIM so large folio allocation is
>> +		 * opportunistic — if memory isn't readily available,
>> +		 * fall back to smaller folios rather than stalling on
>> +		 * reclaim or compaction.
>> +		 *
>> +		 * Executable memory rarely benefits from speculative
>> +		 * readahead due to its random access nature, so set
>> +		 * async_size to 0.
>> +		 *
>> +		 * Limit to the boundaries of the VMA to avoid reading
>> +		 * in any pad that might exist between sections, which
>> +		 * would be a waste of memory.
>>  		 */
>> +		gfp &= ~__GFP_RECLAIM;
>>  		struct vm_area_struct *vma = vmf->vma;
>>  		unsigned long start = vma->vm_pgoff;
>>  		unsigned long end = start + vma_pages(vma);
>>  		unsigned long ra_end;
>>  
>> -		ra->order = exec_folio_order();
>> +		if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) &&
>> +		    vma_pages(vma) >= HPAGE_PMD_NR)
>> +			ra->order = HPAGE_PMD_ORDER;
>> +		else
>> +			ra->order = exec_folio_order();
>>  		ra->start = round_down(vmf->pgoff, 1UL << ra->order);
>>  		ra->start = max(ra->start, start);
>>  		ra_end = round_up(ra->start + ra->ra_pages, 1UL << ra->order);
>> @@ -3403,7 +3421,7 @@ static struct file *do_sync_mmap_readahead(struct vm_fault *vmf)
>>  
>>  	fpin = maybe_unlock_mmap_for_io(vmf, fpin);
>>  	ractl._index = ra->start;
>> -	page_cache_ra_order(&ractl, ra);
>> +	page_cache_ra_order(&ractl, ra, gfp);
>>  	return fpin;
>>  }
>>  
>> diff --git a/mm/internal.h b/mm/internal.h
>> index 475bd281a10d..e624cb619057 100644
>> --- a/mm/internal.h
>> +++ b/mm/internal.h
>> @@ -545,7 +545,8 @@ int zap_vma_for_reaping(struct vm_area_struct *vma);
>>  int folio_unmap_invalidate(struct address_space *mapping, struct folio *folio,
>>  			   gfp_t gfp);
>>  
>> -void page_cache_ra_order(struct readahead_control *, struct file_ra_state *);
>> +void page_cache_ra_order(struct readahead_control *, struct file_ra_state *,
>> +			 gfp_t gfp);
>>  void force_page_cache_ra(struct readahead_control *, unsigned long nr);
>>  static inline void force_page_cache_readahead(struct address_space *mapping,
>>  		struct file *file, pgoff_t index, unsigned long nr_to_read)
>> diff --git a/mm/readahead.c b/mm/readahead.c
>> index 7b05082c89ea..b3dc08cf180c 100644
>> --- a/mm/readahead.c
>> +++ b/mm/readahead.c
>> @@ -465,7 +465,7 @@ static inline int ra_alloc_folio(struct readahead_control *ractl, pgoff_t index,
>>  }
>>  
>>  void page_cache_ra_order(struct readahead_control *ractl,
>> -		struct file_ra_state *ra)
>> +		struct file_ra_state *ra, gfp_t gfp)
>>  {
>>  	struct address_space *mapping = ractl->mapping;
>>  	pgoff_t start = readahead_index(ractl);
>> @@ -475,7 +475,6 @@ void page_cache_ra_order(struct readahead_control *ractl,
>>  	pgoff_t mark = index + ra->size - ra->async_size;
>>  	unsigned int nofs;
>>  	int err = 0;
>> -	gfp_t gfp = readahead_gfp_mask(mapping);
>>  	unsigned int new_order = ra->order;
>>  
>>  	trace_page_cache_ra_order(mapping->host, start, ra);
>> @@ -626,7 +625,7 @@ void page_cache_sync_ra(struct readahead_control *ractl,
>>  readit:
>>  	ra->order = 0;
>>  	ractl->_index = ra->start;
>> -	page_cache_ra_order(ractl, ra);
>> +	page_cache_ra_order(ractl, ra, readahead_gfp_mask(ractl->mapping));
>>  }
>>  EXPORT_SYMBOL_GPL(page_cache_sync_ra);
>>  
>> @@ -697,7 +696,7 @@ void page_cache_async_ra(struct readahead_control *ractl,
>>  		ra->size -= end - aligned_end;
>>  	ra->async_size = ra->size;
>>  	ractl->_index = ra->start;
>> -	page_cache_ra_order(ractl, ra);
>> +	page_cache_ra_order(ractl, ra, readahead_gfp_mask(ractl->mapping));
>>  }
>>  EXPORT_SYMBOL_GPL(page_cache_async_ra);
>>  
>> -- 
>> 2.52.0
>>




* Re: [PATCH RFC 00/12] Add support for DisplayPort link training information report
From: Kory Maincent @ 2026-04-13 12:10 UTC (permalink / raw)
  To: Dmitry Baryshkov
  Cc: Ville Syrjälä, Jani Nikula, Rodrigo Vivi,
	Joonas Lahtinen, Tvrtko Ursulin, David Airlie, Simona Vetter,
	Dave Airlie, Jesse Barnes, Eric Anholt, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, Andrzej Hajda, Neil Armstrong,
	Robert Foss, Laurent Pinchart, Jonas Karlman, Jernej Skrabec,
	Chun-Kuang Hu, Philipp Zabel, Matthias Brugger,
	AngeloGioacchino Del Regno, Chris Wilson, Thomas Petazzoni,
	Mark Yacoub, Sean Paul, Louis Chauvet, intel-gfx, intel-xe,
	dri-devel, linux-kernel, linux-mediatek, linux-arm-kernel,
	Simona Vetter
In-Reply-To: <u4ononk4cpccx77gvlywtfen5rmyslvr72v7olkhdrjf65aqce@xo777vofhcan>

On Fri, 10 Apr 2026 00:36:09 +0300
Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> wrote:

> On Thu, Apr 09, 2026 at 11:36:21PM +0300, Ville Syrjälä wrote:
> > On Thu, Apr 09, 2026 at 07:08:16PM +0200, Kory Maincent wrote:  
> > > DisplayPort link training negotiates the physical-layer parameters needed
> > > for a reliable connection: lane count, link rate, voltage swing,
> > > pre-emphasis, and optionally Display Stream Compression (DSC). Currently,
> > > each driver exposes this state in its own way, often through
> > > driver-specific debugfs entries, with no standard interface for userspace
> > > diagnostic and monitoring tools.
> > > 
> > > This series introduces a generic, DRM-managed framework for exposing DP
> > > link training state as standard connector properties, modeled after the
> > > existing HDMI helper drmm_connector_hdmi_init().
> > > 
> > > The new drmm_connector_dp_init() helper initializes a DP connector and
> > > registers the following connector properties to expose the negotiated link
> > > state to userspace:
> > > 
> > > - num_lanes:      negotiated lane count (1, 2 or 4)
> > > - link_rate:      negotiated link rate
> > > - dsc_en:         whether Display Stream Compression is active
> > > - voltage_swingN: per-lane voltage swing level (lanes 0-3)
> > > - pre_emphasisN:  per-lane pre-emphasis level (lanes 0-3)  
> > 
> > I don't see why any real userspace would be interested in those (apart
> > from maybe DSC). If this is just for diagnostics and whatnot then I
> > think sysfs/debugfs could be a better fit.  
> 
> I'd agree here. Please consider implementing it as a debugfs interface,
> possibly reusing the Intel's format.

Sorry, I completely forgot to include a paragraph explaining the rationale
behind using DRM properties.

This DisplayPort link information report was requested by OSes to allow them to
assess the capabilities of each DisplayPort connector on the system, and to
guide users from the most to least capable ones. It will also enable the OS to
warn the user when a cable is too long or experiencing noise (indicated by high
voltage swing and pre-emphasis levels).

Since this is information that OSes will consume on a regular basis, exposing
it directly as DRM properties seems the most appropriate approach. 

Regards,
-- 
Köry Maincent, Bootlin
Embedded Linux and kernel engineering
https://bootlin.com


^ permalink raw reply

* [PATCH v4 2/2] usb: mtu3: add support remote wakeup of mt8196
From: Chunfeng Yun @ 2026-04-13 12:17 UTC (permalink / raw)
  To: Greg Kroah-Hartman, AngeloGioacchino Del Regno
  Cc: Chunfeng Yun, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	Matthias Brugger, linux-usb, linux-arm-kernel, linux-mediatek,
	devicetree, linux-kernel
In-Reply-To: <20260413121727.4702-1-chunfeng.yun@mediatek.com>

The mt8196 has three USB controllers, each with a different wakeup
control, so add a specific version for each of them.

Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: Chunfeng Yun <chunfeng.yun@mediatek.com>
---
v4: add reviewed-by
v3: add the omitted third dual-role controller; add acked-by from Conor
v2: new patch for dual-role controllers
---
 drivers/usb/mtu3/mtu3_host.c | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/drivers/usb/mtu3/mtu3_host.c b/drivers/usb/mtu3/mtu3_host.c
index 7c657ea2dabd..8138b3f3096a 100644
--- a/drivers/usb/mtu3/mtu3_host.c
+++ b/drivers/usb/mtu3/mtu3_host.c
@@ -46,6 +46,14 @@
 #define WC1_IS_P_95		BIT(12)
 #define WC1_IS_EN_P0_95		BIT(6)
 
+/* mt8196 */
+#define PERI_WK_CTRL0_8196	0x08
+#define WC0_IS_EN_P0_96		BIT(0)
+#define WC0_IS_EN_P1_96		BIT(7)
+
+#define PERI_WK_CTRL1_8196	0x10
+#define WC1_IS_EN_P2_96		BIT(0)
+
 /* mt2712 etc */
 #define PERI_SSUSB_SPM_CTRL	0x0
 #define SSC_IP_SLEEP_EN	BIT(4)
@@ -59,6 +67,9 @@ enum ssusb_uwk_vers {
 	SSUSB_UWK_V1_3,		/* mt8195 IP0 */
 	SSUSB_UWK_V1_5 = 105,	/* mt8195 IP2 */
 	SSUSB_UWK_V1_6,		/* mt8195 IP3 */
+	SSUSB_UWK_V1_7, 	/* mt8196 IP0 */
+	SSUSB_UWK_V1_8, 	/* mt8196 IP1 */
+	SSUSB_UWK_V1_9, 	/* mt8196 IP2 */
 };
 
 /*
@@ -100,6 +111,21 @@ static void ssusb_wakeup_ip_sleep_set(struct ssusb_mtk *ssusb, bool enable)
 		msk = WC0_IS_EN_P3_95 | WC0_IS_C_95(0x7) | WC0_IS_P_95;
 		val = enable ? (WC0_IS_EN_P3_95 | WC0_IS_C_95(0x1)) : 0;
 		break;
+	case SSUSB_UWK_V1_7:
+		reg = ssusb->uwk_reg_base + PERI_WK_CTRL0_8196;
+		msk = WC0_IS_EN_P0_96;
+		val = enable ? msk : 0;
+		break;
+	case SSUSB_UWK_V1_8:
+		reg = ssusb->uwk_reg_base + PERI_WK_CTRL0_8196;
+		msk = WC0_IS_EN_P1_96;
+		val = enable ? msk : 0;
+		break;
+	case SSUSB_UWK_V1_9:
+		reg = ssusb->uwk_reg_base + PERI_WK_CTRL1_8196;
+		msk = WC1_IS_EN_P2_96;
+		val = enable ? msk : 0;
+		break;
 	case SSUSB_UWK_V2:
 		reg = ssusb->uwk_reg_base + PERI_SSUSB_SPM_CTRL;
 		msk = SSC_IP_SLEEP_EN | SSC_SPM_INT_EN;
-- 
2.45.2
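
For readers following along, the per-controller mapping added above can be summarized as a lookup table. This is a userspace sketch only, not driver code: the register offsets, bit positions and version numbers are copied from the patch, while `struct uwk_desc`, `uwk_lookup()` and `uwk_apply()` are hypothetical names, and a plain `uint32_t` array stands in for the syscon regmap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BIT(n)			(UINT32_C(1) << (n))

/* Offsets and bits copied from the patch. */
#define PERI_WK_CTRL0_8196	0x08
#define WC0_IS_EN_P0_96		BIT(0)
#define WC0_IS_EN_P1_96		BIT(7)
#define PERI_WK_CTRL1_8196	0x10
#define WC1_IS_EN_P2_96		BIT(0)

/* Hypothetical descriptor: one row per mt8196 controller. */
struct uwk_desc {
	unsigned int vers;	/* wakeup version number (107..109) */
	uint32_t reg;		/* byte offset from the wakeup base */
	uint32_t msk;		/* enable bit for this controller */
};

static const struct uwk_desc mt8196_uwk[] = {
	{ 107, PERI_WK_CTRL0_8196, WC0_IS_EN_P0_96 },	/* IP0 */
	{ 108, PERI_WK_CTRL0_8196, WC0_IS_EN_P1_96 },	/* IP1 */
	{ 109, PERI_WK_CTRL1_8196, WC1_IS_EN_P2_96 },	/* IP2 */
};

/* Return the table row for @vers, or NULL if the version is unknown. */
static const struct uwk_desc *uwk_lookup(unsigned int vers)
{
	for (size_t i = 0; i < sizeof(mt8196_uwk) / sizeof(mt8196_uwk[0]); i++)
		if (mt8196_uwk[i].vers == vers)
			return &mt8196_uwk[i];
	return NULL;
}

/* Read-modify-write of the enable bit, mirroring "val = enable ? msk : 0". */
static int uwk_apply(uint32_t *regs, unsigned int vers, bool enable)
{
	const struct uwk_desc *d = uwk_lookup(vers);
	uint32_t val;

	assert(regs != NULL);
	if (!d)
		return -1;

	val = enable ? d->msk : 0;
	regs[d->reg / 4] = (regs[d->reg / 4] & ~d->msk) | val;
	return 0;
}
```

Note that IP0 and IP1 share PERI_WK_CTRL0_8196, so the update has to preserve the other controller's bit; that is why only the masked bit is rewritten.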



^ permalink raw reply related

* [PATCH v4 1/2] dt-bindings: usb: mtu3: add support mt8196
From: Chunfeng Yun @ 2026-04-13 12:17 UTC (permalink / raw)
  To: Greg Kroah-Hartman, AngeloGioacchino Del Regno
  Cc: Chunfeng Yun, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	Matthias Brugger, linux-usb, linux-arm-kernel, linux-mediatek,
	devicetree, linux-kernel, Conor Dooley

The mt8196 has three USB controllers, each with a different wakeup
control, so add a specific version for each of them, and add a
compatible for mt8196.

Acked-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: Chunfeng Yun <chunfeng.yun@mediatek.com>
---
v4: add reviewed-by
v3: add the omitted third dual-role controller suggested by Angelo
v2: add wakeup for dual-role controllers
---
 Documentation/devicetree/bindings/usb/mediatek,mtu3.yaml | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/Documentation/devicetree/bindings/usb/mediatek,mtu3.yaml b/Documentation/devicetree/bindings/usb/mediatek,mtu3.yaml
index 21fc6bbe954f..d148e938d647 100644
--- a/Documentation/devicetree/bindings/usb/mediatek,mtu3.yaml
+++ b/Documentation/devicetree/bindings/usb/mediatek,mtu3.yaml
@@ -28,6 +28,7 @@ properties:
           - mediatek,mt8188-mtu3
           - mediatek,mt8192-mtu3
           - mediatek,mt8195-mtu3
+          - mediatek,mt8196-mtu3
           - mediatek,mt8365-mtu3
       - const: mediatek,mtu3
 
@@ -200,7 +201,10 @@ properties:
             103 - used by mt8195, IP0, specific 1.03;
             105 - used by mt8195, IP2, specific 1.05;
             106 - used by mt8195, IP3, specific 1.06;
-          enum: [1, 2, 101, 102, 103, 105, 106]
+            107 - used by mt8196, IP0, specific 1.07;
+            108 - used by mt8196, IP1, specific 1.08;
+            109 - used by mt8196, IP2, specific 1.09;
+          enum: [1, 2, 101, 102, 103, 105, 106, 107, 108, 109]
 
   mediatek,u3p-dis-msk:
     $ref: /schemas/types.yaml#/definitions/uint32
-- 
2.45.2



^ permalink raw reply related

* [PATCH V2 RESEND 2/2] phy: mediatek: xsphy: add support to set disconnect threshold
From: Chunfeng Yun @ 2026-04-13 12:28 UTC (permalink / raw)
  To: Vinod Koul, AngeloGioacchino Del Regno
  Cc: Chunfeng Yun, Neil Armstrong, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley, Matthias Brugger, linux-arm-kernel, linux-mediatek,
	linux-phy, devicetree, linux-kernel
In-Reply-To: <20260413122836.4848-1-chunfeng.yun@mediatek.com>

Add a property to tune the USB2 PHY's disconnect threshold.

Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: Chunfeng Yun <chunfeng.yun@mediatek.com>
---
v2: change property name
---
 drivers/phy/mediatek/phy-mtk-xsphy.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/phy/mediatek/phy-mtk-xsphy.c b/drivers/phy/mediatek/phy-mtk-xsphy.c
index c0ddb9273cc3..46345e4f4189 100644
--- a/drivers/phy/mediatek/phy-mtk-xsphy.c
+++ b/drivers/phy/mediatek/phy-mtk-xsphy.c
@@ -61,6 +61,7 @@
 #define XSP_USBPHYACR6		((SSUSB_SIFSLV_U2PHY_COM) + 0x018)
 #define P2A6_RG_BC11_SW_EN	BIT(23)
 #define P2A6_RG_OTG_VBUSCMP_EN	BIT(20)
+#define PA6_RG_U2_DISCTH	GENMASK(7, 4)
 
 #define XSP_U2PHYDTM1		((SSUSB_SIFSLV_U2PHY_COM) + 0x06C)
 #define P2D_FORCE_IDDIG		BIT(9)
@@ -107,6 +108,7 @@ struct xsphy_instance {
 	int eye_src;
 	int eye_vrt;
 	int eye_term;
+	int discth;
 };
 
 struct mtk_xsphy {
@@ -256,9 +258,12 @@ static void phy_parse_property(struct mtk_xsphy *xsphy,
 					 &inst->eye_vrt);
 		device_property_read_u32(dev, "mediatek,eye-term",
 					 &inst->eye_term);
-		dev_dbg(dev, "intr:%d, src:%d, vrt:%d, term:%d\n",
+		device_property_read_u32(dev, "mediatek,disconnect-threshold",
+					 &inst->discth);
+		dev_dbg(dev, "intr:%d, src:%d, vrt:%d, term:%d, discth:%d\n",
 			inst->efuse_intr, inst->eye_src,
-			inst->eye_vrt, inst->eye_term);
+			inst->eye_vrt, inst->eye_term,
+			inst->discth);
 		break;
 	case PHY_TYPE_USB3:
 		device_property_read_u32(dev, "mediatek,efuse-intr",
@@ -301,6 +306,9 @@ static void u2_phy_props_set(struct mtk_xsphy *xsphy,
 	if (inst->eye_term)
 		mtk_phy_update_field(pbase + XSP_USBPHYACR1, P2A1_RG_TERM_SEL,
 				     inst->eye_term);
+	if (inst->discth)
+		mtk_phy_update_field(pbase + XSP_USBPHYACR6, PA6_RG_U2_DISCTH,
+				     inst->discth);
 }
 
 static void u3_phy_props_set(struct mtk_xsphy *xsphy,
-- 
2.45.2
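
As an illustration of what the field update in u2_phy_props_set() amounts to, here is a small userspace sketch of a masked read-modify-write on a 4-bit field at bits 7:4. It is not the kernel helper: `GENMASK32()` and `update_field()` are hypothetical stand-ins written for this example, and only the mask position is taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit stand-in for the kernel's GENMASK(h, l). */
#define GENMASK32(h, l)	((UINT32_MAX >> (31 - (h))) & (UINT32_MAX << (l)))

#define PA6_RG_U2_DISCTH	GENMASK32(7, 4)	/* position from the patch */

/* Hypothetical helper: replace the bits under @mask with @val. */
static uint32_t update_field(uint32_t reg, uint32_t mask, uint32_t val)
{
	unsigned int shift = 0;

	assert(mask != 0);

	/* Find the position of the lowest set bit of the mask. */
	while (!((mask >> shift) & 1))
		shift++;

	/* Clear the field, then insert the shifted value. */
	return (reg & ~mask) | ((val << shift) & mask);
}
```

With a 4-bit field the largest value is 15, which matches the `maximum: 15` in the binding; the driver treats 0 as "not set" and skips the write, which matches `minimum: 1`.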



^ permalink raw reply related

* [PATCH v2 RESEND 1/2] dt-bindings: phy: mediatek,xsphy: add property to set disconnect threshold
From: Chunfeng Yun @ 2026-04-13 12:28 UTC (permalink / raw)
  To: Vinod Koul, AngeloGioacchino Del Regno
  Cc: Chunfeng Yun, Neil Armstrong, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley, Matthias Brugger, linux-arm-kernel, linux-mediatek,
	linux-phy, devicetree, linux-kernel

Add a property to tune the USB2 PHY's disconnect threshold, and add a
compatible for mt8196.

Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: Chunfeng Yun <chunfeng.yun@mediatek.com>
---
v2: change property name
---
 Documentation/devicetree/bindings/phy/mediatek,xsphy.yaml | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/Documentation/devicetree/bindings/phy/mediatek,xsphy.yaml b/Documentation/devicetree/bindings/phy/mediatek,xsphy.yaml
index 0bed847bb4ad..9017a9c93eb9 100644
--- a/Documentation/devicetree/bindings/phy/mediatek,xsphy.yaml
+++ b/Documentation/devicetree/bindings/phy/mediatek,xsphy.yaml
@@ -50,6 +50,7 @@ properties:
           - mediatek,mt3611-xsphy
           - mediatek,mt3612-xsphy
           - mediatek,mt7988-xsphy
+          - mediatek,mt8196-xsphy
       - const: mediatek,xsphy
 
   reg:
@@ -130,6 +131,13 @@ patternProperties:
         minimum: 1
         maximum: 7
 
+      mediatek,disconnect-threshold:
+        description:
+          The selection of disconnect threshold (U2 phy)
+        $ref: /schemas/types.yaml#/definitions/uint32
+        minimum: 1
+        maximum: 15
+
       mediatek,efuse-intr:
         description:
           The selection of Internal Resistor (U2/U3 phy)
-- 
2.45.2



^ permalink raw reply related

* Re: [PATCH] clk: bcm: rpi: Mark VEC clock as CLK_IGNORE_UNUSED
From: Maíra Canal @ 2026-04-13 12:29 UTC (permalink / raw)
  To: Michael Turquette, Stephen Boyd, Florian Fainelli,
	Broadcom internal kernel review list, Mark Brown, Maxime Ripard,
	Stefan Wahren, Dom Cobley, Dave Stevenson
  Cc: linux-clk, linux-rpi-kernel, linux-arm-kernel, kernel-dev
In-Reply-To: <20260401111416.562279-2-mcanal@igalia.com>

Hi Stephen,

It would be great to land this patch in the next release together with
commit 672299736af6 ("clk: bcm: rpi: Manage clock rate in prepare/
unprepare callbacks"). When possible, could you take a look at it?

Best regards,
- Maíra

On 4/1/26 08:13, Maíra Canal wrote:
> On Raspberry Pi 3B, the VEC clock is used by the VideoCore firmware
> display driver, which remains active until the vc4 driver loads and
> sends NOTIFY_DISPLAY_DONE. If this clock is disabled during boot, a bus
> lockup happens and the firmware becomes unresponsive, causing a complete
> system lockup.
> 
> Mark the VEC clock with CLK_IGNORE_UNUSED so it survives the unused
> clock disablement and remains available until the vc4 driver takes over
> display management.
> 
> Fixes: 672299736af6 ("clk: bcm: rpi: Manage clock rate in prepare/unprepare callbacks")
> Reported-by: Mark Brown <broonie@kernel.org>
> Signed-off-by: Maíra Canal <mcanal@igalia.com>
> ---
>   drivers/clk/bcm/clk-raspberrypi.c | 7 +++++++
>   1 file changed, 7 insertions(+)
> 
> diff --git a/drivers/clk/bcm/clk-raspberrypi.c b/drivers/clk/bcm/clk-raspberrypi.c
> index df2d246eb6ef..f1a99de6de4f 100644
> --- a/drivers/clk/bcm/clk-raspberrypi.c
> +++ b/drivers/clk/bcm/clk-raspberrypi.c
> @@ -160,6 +160,13 @@ raspberrypi_clk_variants[RPI_FIRMWARE_NUM_CLK_ID] = {
>   	[RPI_FIRMWARE_VEC_CLK_ID] = {
>   		.export = true,
>   		.minimize = true,
> +
> +		/*
> +		 * If this clock is disabled during boot, it causes a bus
> +		 * lockup in RPi 3B. Therefore, make sure it's left enabled
> +		 * during boot.
> +		 */
> +		.flags = CLK_IGNORE_UNUSED,
>   	},
>   	[RPI_FIRMWARE_DISP_CLK_ID] = {
>   		.export = true,
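
For context, the effect of the flag can be sketched with a toy model of the late "disable unused clocks" pass. This is an assumption-laden simplification, not the clock framework's code: `struct mock_clk` and `disable_unused()` are hypothetical, and the flag value is defined locally for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Defined locally for this sketch; the kernel's flag lives in clk-provider.h. */
#define CLK_IGNORE_UNUSED	(1UL << 3)

/* Hypothetical, heavily simplified clock model. */
struct mock_clk {
	const char *name;
	unsigned long flags;
	unsigned int enable_count;	/* users inside the kernel */
	bool hw_enabled;		/* state left behind by firmware */
};

/* Gate every clock that runs without a kernel user, unless it is flagged. */
static size_t disable_unused(struct mock_clk *clks, size_t n)
{
	size_t gated = 0;

	assert(clks != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		/* Skip clocks that are in use or already off. */
		if (clks[i].enable_count || !clks[i].hw_enabled)
			continue;
		/* Flagged clocks are left running even without users. */
		if (clks[i].flags & CLK_IGNORE_UNUSED)
			continue;
		clks[i].hw_enabled = false;
		gated++;
	}
	return gated;
}
```

In this model the VEC clock has no kernel user at boot (the firmware display driver is the only consumer), so without the flag it would be gated by the pass above.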



^ permalink raw reply

* [PATCH bpf-next 0/2] bpf, arm64/riscv: Remove redundant icache flush after pack allocator finalize
From: Puranjay Mohan @ 2026-04-13 12:32 UTC (permalink / raw)
  To: bpf
  Cc: Puranjay Mohan, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman,
	Kumar Kartikeya Dwivedi, Song Liu, Yonghong Song, Jiri Olsa,
	Xu Kuohai, Catalin Marinas, Will Deacon, Luke Nelson, Xi Wang,
	Björn Töpel, Pu Lehui, Paul Walmsley, Palmer Dabbelt,
	Albert Ou, Alexandre Ghiti, linux-arm-kernel, linux-riscv,
	linux-kernel

When the BPF prog pack allocator was added for arm64 and riscv, the
existing bpf_flush_icache() calls were retained after
bpf_jit_binary_pack_finalize(). However, the finalize path copies the
JITed code via architecture text patching routines (__text_poke on arm64,
patch_text_nosync on riscv) that already perform a full
flush_icache_range() internally. The subsequent bpf_flush_icache()
repeats the same cache maintenance on the same range.

Remove the redundant flush and the now-unused bpf_flush_icache()
definitions on both architectures.

Puranjay Mohan (2):
  bpf, arm64: Remove redundant bpf_flush_icache() after pack allocator
    finalize
  bpf, riscv: Remove redundant bpf_flush_icache() after pack allocator
    finalize

 arch/arm64/net/bpf_jit_comp.c | 11 -----------
 arch/riscv/net/bpf_jit.h      |  5 -----
 arch/riscv/net/bpf_jit_core.c |  7 -------
 3 files changed, 23 deletions(-)

base-commit: 71b500afd2f7336f5b6c6026f2af546fc079be26
-- 
2.52.0
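
The redundancy described in this cover letter can be shown with a userspace mock that only counts cache-maintenance calls. Every name below is hypothetical and nothing here touches real caches; the sketch assumes, as stated above, that the text-copy step flushes the range it writes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static unsigned int flush_calls;	/* number of mock flushes issued */

/* Mock of flush_icache_range(): only records that it was called. */
static void mock_flush_icache_range(const char *start, const char *end)
{
	assert(start <= end);
	flush_calls++;
}

/* Mock of the text-copy step: copies, then flushes what it wrote. */
static void mock_text_copy(char *dst, const char *src, size_t len)
{
	memcpy(dst, src, len);
	mock_flush_icache_range(dst, dst + len);
}

/* Old flow: finalize (copy + flush) followed by an explicit second flush. */
static unsigned int finalize_old(char *ro, const char *rw, size_t len)
{
	flush_calls = 0;
	mock_text_copy(ro, rw, len);
	mock_flush_icache_range(ro, ro + len);	/* the redundant call */
	return flush_calls;
}

/* New flow: rely on the flush already done by the copy step. */
static unsigned int finalize_new(char *ro, const char *rw, size_t len)
{
	flush_calls = 0;
	mock_text_copy(ro, rw, len);
	return flush_calls;
}
```

Both flows leave the same bytes in the destination; the only difference in the model is one flush instead of two.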

^ permalink raw reply

* [PATCH bpf-next 2/2] bpf, riscv: Remove redundant bpf_flush_icache() after pack allocator finalize
From: Puranjay Mohan @ 2026-04-13 12:32 UTC (permalink / raw)
  To: bpf
  Cc: Puranjay Mohan, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman,
	Kumar Kartikeya Dwivedi, Song Liu, Yonghong Song, Jiri Olsa,
	Xu Kuohai, Catalin Marinas, Will Deacon, Luke Nelson, Xi Wang,
	Björn Töpel, Pu Lehui, Paul Walmsley, Palmer Dabbelt,
	Albert Ou, Alexandre Ghiti, linux-arm-kernel, linux-riscv,
	linux-kernel
In-Reply-To: <20260413123256.3296452-1-puranjay@kernel.org>

bpf_flush_icache() calls flush_icache_range() to clean the data cache
and invalidate the instruction cache for the JITed code region. However,
since commit 48a8f78c50bd ("bpf, riscv: use prog pack allocator in the
BPF JIT"), this flush is redundant.

bpf_jit_binary_pack_finalize() copies the JITed instructions to the ROX
region via bpf_arch_text_copy() -> patch_text_nosync(), and
patch_text_nosync() already calls flush_icache_range() on the written
range. The subsequent bpf_flush_icache() repeats the same cache
maintenance on an overlapping range.

Remove the redundant bpf_flush_icache() call and its now-unused
definition.

Fixes: 48a8f78c50bd ("bpf, riscv: use prog pack allocator in the BPF JIT")
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
---
 arch/riscv/net/bpf_jit.h      | 5 -----
 arch/riscv/net/bpf_jit_core.c | 7 -------
 2 files changed, 12 deletions(-)

diff --git a/arch/riscv/net/bpf_jit.h b/arch/riscv/net/bpf_jit.h
index 632ced07bca4..549537cad86b 100644
--- a/arch/riscv/net/bpf_jit.h
+++ b/arch/riscv/net/bpf_jit.h
@@ -105,11 +105,6 @@ static inline void bpf_fill_ill_insns(void *area, unsigned int size)
 	memset(area, 0, size);
 }
 
-static inline void bpf_flush_icache(void *start, void *end)
-{
-	flush_icache_range((unsigned long)start, (unsigned long)end);
-}
-
 /* Emit a 4-byte riscv instruction. */
 static inline void emit(const u32 insn, struct rv_jit_context *ctx)
 {
diff --git a/arch/riscv/net/bpf_jit_core.c b/arch/riscv/net/bpf_jit_core.c
index b3581e926436..f7fd4afc3ca3 100644
--- a/arch/riscv/net/bpf_jit_core.c
+++ b/arch/riscv/net/bpf_jit_core.c
@@ -183,13 +183,6 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
 			prog = orig_prog;
 			goto out_offset;
 		}
-		/*
-		 * The instructions have now been copied to the ROX region from
-		 * where they will execute.
-		 * Write any modified data cache blocks out to memory and
-		 * invalidate the corresponding blocks in the instruction cache.
-		 */
-		bpf_flush_icache(jit_data->ro_header, ctx->ro_insns + ctx->ninsns);
 		for (i = 0; i < prog->len; i++)
 			ctx->offset[i] = ninsns_rvoff(ctx->offset[i]);
 		bpf_prog_fill_jited_linfo(prog, ctx->offset);
-- 
2.52.0



^ permalink raw reply related

* [PATCH bpf-next 1/2] bpf, arm64: Remove redundant bpf_flush_icache() after pack allocator finalize
From: Puranjay Mohan @ 2026-04-13 12:32 UTC (permalink / raw)
  To: bpf
  Cc: Puranjay Mohan, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman,
	Kumar Kartikeya Dwivedi, Song Liu, Yonghong Song, Jiri Olsa,
	Xu Kuohai, Catalin Marinas, Will Deacon, Luke Nelson, Xi Wang,
	Björn Töpel, Pu Lehui, Paul Walmsley, Palmer Dabbelt,
	Albert Ou, Alexandre Ghiti, linux-arm-kernel, linux-riscv,
	linux-kernel
In-Reply-To: <20260413123256.3296452-1-puranjay@kernel.org>

bpf_flush_icache() calls flush_icache_range() to clean the data cache
and invalidate the instruction cache for the JITed code region. However,
since commit 1dad391daef1 ("bpf, arm64: use bpf_prog_pack for memory
management"), this flush is redundant.

bpf_jit_binary_pack_finalize() copies the JITed instructions to the ROX
region via bpf_arch_text_copy() -> aarch64_insn_copy() -> __text_poke(),
and __text_poke() already calls flush_icache_range() on the written
range. The subsequent bpf_flush_icache() repeats the same cache
maintenance on an overlapping range, including an unnecessary second
synchronous IPI to all CPUs via kick_all_cpus_sync().

Remove the redundant bpf_flush_icache() call and its now-unused
definition.

Fixes: 1dad391daef1 ("bpf, arm64: use bpf_prog_pack for memory management")
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
---
 arch/arm64/net/bpf_jit_comp.c | 11 -----------
 1 file changed, 11 deletions(-)

diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
index adf84962d579..e88b0917adec 100644
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -1961,11 +1961,6 @@ static int validate_ctx(struct jit_ctx *ctx)
 	return 0;
 }
 
-static inline void bpf_flush_icache(void *start, void *end)
-{
-	flush_icache_range((unsigned long)start, (unsigned long)end);
-}
-
 static void priv_stack_init_guard(void __percpu *priv_stack_ptr, int alloc_size)
 {
 	int cpu, underflow_idx = (alloc_size - PRIV_STACK_GUARD_SZ) >> 3;
@@ -2204,12 +2199,6 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
 			prog = orig_prog;
 			goto out_off;
 		}
-		/*
-		 * The instructions have now been copied to the ROX region from
-		 * where they will execute. Now the data cache has to be cleaned to
-		 * the PoU and the I-cache has to be invalidated for the VAs.
-		 */
-		bpf_flush_icache(ro_header, ctx.ro_image + ctx.idx);
 	} else {
 		jit_data->ctx = ctx;
 		jit_data->ro_image = ro_image_ptr;
-- 
2.52.0



^ permalink raw reply related

* Re: [PATCH RFC 10/12] drm/i915/display/dp: Adopt dp_connector helpers to expose link training state
From: Kory Maincent @ 2026-04-13 12:34 UTC (permalink / raw)
  To: Jani Nikula
  Cc: Rodrigo Vivi, Joonas Lahtinen, Tvrtko Ursulin, David Airlie,
	Simona Vetter, Dave Airlie, Jesse Barnes, Eric Anholt,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Andrzej Hajda, Neil Armstrong, Robert Foss, Laurent Pinchart,
	Jonas Karlman, Jernej Skrabec, Chun-Kuang Hu, Philipp Zabel,
	Matthias Brugger, AngeloGioacchino Del Regno, Chris Wilson,
	Thomas Petazzoni, Mark Yacoub, Sean Paul, Louis Chauvet,
	intel-gfx, intel-xe, dri-devel, linux-kernel, linux-mediatek,
	linux-arm-kernel, Simona Vetter
In-Reply-To: <e253ca4fa0b493032a7b35a0a20689b9d9e0c4e7@intel.com>

On Fri, 10 Apr 2026 19:26:53 +0300
Jani Nikula <jani.nikula@linux.intel.com> wrote:

> On Thu, 09 Apr 2026, Kory Maincent <kory.maincent@bootlin.com> wrote:
> > Switch the i915 DP connector initialization from drmm_connector_init()
> > to drmm_connector_dp_init(), providing the source link capabilities
> > (supported lane counts, link rates, DSC support, voltage swing and
> > pre-emphasis levels).
> >
> > Add intel_dp_report_link_train() to collect the negotiated link
> > parameters (rate, lane count, DSC enable, per-lane voltage swing and
> > pre-emphasis) and report them via
> > drm_connector_dp_set_link_train_properties() once link training completes
> > successfully.
> >
> > Reset the link training properties via
> > drm_connector_dp_reset_link_train_properties() when the connector is
> > reported as disconnected or when the display device is disabled, so
> > the exposed state always reflects the current link status.
> >
> > Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
> > ---
> >  drivers/gpu/drm/i915/display/intel_dp.c            | 31 +++++++++++++++++++---
> >  .../gpu/drm/i915/display/intel_dp_link_training.c  | 25 +++++++++++++++++
> >  2 files changed, 52 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/display/intel_dp.c b/drivers/gpu/drm/i915/display/intel_dp.c
> > index 2af64de9c81de..641406bdc0cc9 100644
> > --- a/drivers/gpu/drm/i915/display/intel_dp.c
> > +++ b/drivers/gpu/drm/i915/display/intel_dp.c
> > @@ -45,6 +45,7 @@
> >  #include <drm/display/drm_hdmi_helper.h>
> >  #include <drm/drm_atomic_helper.h>
> >  #include <drm/drm_crtc.h>
> > +#include <drm/drm_dp_connector.h>
> >  #include <drm/drm_edid.h>
> >  #include <drm/drm_fixed.h>
> >  #include <drm/drm_managed.h>
> > @@ -6337,8 +6338,10 @@ intel_dp_detect(struct drm_connector *_connector,
> >  	drm_WARN_ON(display->drm,
> >  		    !drm_modeset_is_locked(&display->drm->mode_config.connection_mutex));
> >  
> > -	if (!intel_display_device_enabled(display))
> > +	if (!intel_display_device_enabled(display)) {
> > +		drm_connector_dp_reset_link_train_properties(_connector);
> >  		return connector_status_disconnected;
> > +	}
> >  
> >  	if (!intel_display_driver_check_access(display))
> >  		return connector->base.status;
> > @@ -6388,6 +6391,8 @@ intel_dp_detect(struct drm_connector *_connector,
> >  
> >  		intel_dp_tunnel_disconnect(intel_dp);
> >  
> > +		drm_connector_dp_reset_link_train_properties(_connector);
> > +
> >  		goto out_unset_edid;
> >  	}
> >  
> > @@ -7162,10 +7167,12 @@ intel_dp_init_connector(struct intel_digital_port *dig_port,
> >  			struct intel_connector *connector)
> >  {
> >  	struct intel_display *display = to_intel_display(dig_port);
> > +	struct drm_connector_dp_link_train_caps link_caps;
> >  	struct intel_dp *intel_dp = &dig_port->dp;
> >  	struct intel_encoder *encoder = &dig_port->base;
> >  	struct drm_device *dev = encoder->base.dev;
> >  	enum port port = encoder->port;
> > +	u32 *rates;
> >  	int type;
> >  
> >  	if (drm_WARN(dev, dig_port->max_lanes < 1,
> > @@ -7213,8 +7220,25 @@ intel_dp_init_connector(struct intel_digital_port *dig_port,
> >  		    type == DRM_MODE_CONNECTOR_eDP ? "eDP" : "DP",
> >  		    encoder->base.base.id, encoder->base.name);
> >  
> > -	drmm_connector_init(dev, &connector->base, &intel_dp_connector_funcs,
> > -			    type, &intel_dp->aux.ddc);
> > +	intel_dp_set_source_rates(intel_dp);
> > +	link_caps.nlanes = DRM_DP_1LANE | DRM_DP_2LANE | DRM_DP_4LANE;
> > +	link_caps.nrates = intel_dp->num_source_rates;
> > +	rates = kzalloc_objs(*rates, intel_dp->num_source_rates);
> > +	if (!rates)
> > +		goto fail;
> > +
> > +	for (int i = 0; i < intel_dp->num_source_rates; i++)
> > +		rates[i] = intel_dp->source_rates[i];
> > +
> > +	link_caps.rates = rates;
> > +	link_caps.dsc = true;  
> 
> You have a source, you have a sink, and you have a link between the two.
> 
> Source rates do not reflect the link rates common between source and
> sink.
> 
> DSC depends on source and sink, and it's not statically "true" for
> either, and depends on a bunch of things.

At init, we are reporting the capabilities of the source. So we list every
link rate that the source can achieve, and we report that the source is DSC
capable, which it is if I understand the code correctly. Or maybe I am
missing something?
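
To make the source/sink/link distinction from the review concrete: the rates usable on a link are the ones present in both the source and the sink lists. Below is a minimal sketch of that intersection, assuming both arrays are sorted in ascending order; `intersect_rates()` here is written for this example and is not claimed to match any driver helper.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Intersect two ascending rate arrays.
 * Writes at most @nout common entries to @out and returns how many.
 */
static size_t intersect_rates(const int *a, size_t na,
			      const int *b, size_t nb,
			      int *out, size_t nout)
{
	size_t i = 0, j = 0, k = 0;

	assert(a && b && out);

	while (i < na && j < nb && k < nout) {
		if (a[i] < b[j]) {
			i++;		/* rate only in @a */
		} else if (a[i] > b[j]) {
			j++;		/* rate only in @b */
		} else {
			out[k++] = a[i];	/* rate supported by both */
			i++;
			j++;
		}
	}
	return k;
}
```

Under this model, the source list alone is an upper bound on what the link can use, which is the point being discussed: what is reported at init describes the source, not the negotiated link.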

Regards,
-- 
Köry Maincent, Bootlin
Embedded Linux and kernel engineering
https://bootlin.com


^ permalink raw reply
