All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jakub Kicinski <kuba@kernel.org>
To: Dipayaan Roy <dipayanroy@linux.microsoft.com>
Cc: kys@microsoft.com, haiyangz@microsoft.com, wei.liu@kernel.org,
	decui@microsoft.com, andrew+netdev@lunn.ch, davem@davemloft.net,
	edumazet@google.com, pabeni@redhat.com, leon@kernel.org,
	longli@microsoft.com, kotaranov@microsoft.com, horms@kernel.org,
	shradhagupta@linux.microsoft.com, ssengar@linux.microsoft.com,
	ernis@linux.microsoft.com, shirazsaleem@microsoft.com,
	linux-hyperv@vger.kernel.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org,
	stephen@networkplumber.org, jacob.e.keller@intel.com,
	dipayanroy@microsoft.com, leitao@debian.org, kees@kernel.org,
	john.fastabend@gmail.com, hawk@kernel.org, bpf@vger.kernel.org,
	daniel@iogearbox.net, ast@kernel.org, sdf@fomichev.me,
	yury.norov@gmail.com
Subject: Re: [PATCH net, v2] net: mana: Fix crash from unvalidated SHM offset read from BAR0 during FLR
Date: Fri, 1 May 2026 18:53:24 -0700	[thread overview]
Message-ID: <20260501185324.0f02dc72@kernel.org> (raw)
In-Reply-To: <afJUszROT+yKjth0@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net>

On Wed, 29 Apr 2026 11:57:55 -0700 Dipayaan Roy wrote:
> During Function Level Reset recovery, the MANA driver reads
> hardware BAR0 registers that may temporarily contain garbage values.
> The SHM (Shared Memory) offset read from GDMA_REG_SHM_OFFSET is used
> to compute gc->shm_base, which is later dereferenced via readl() in
> mana_smc_poll_register(). If the hardware returns an unaligned or
> out-of-range value, the driver must not blindly use it, as this would
> propagate the hardware error into a kernel crash.
> 
> The following crash was observed on an arm64 Hyper-V guest running
> kernel 6.17.0-3013-azure during VF reset recovery triggered by HWC
> timeout.
> 
> [13291.785274] Unable to handle kernel paging request at virtual address ffff8000a200001b
> [13291.785311] Mem abort info:
> [13291.785332]   ESR = 0x0000000096000021
> [13291.785343]   EC = 0x25: DABT (current EL), IL = 32 bits
> [13291.785355]   SET = 0, FnV = 0
> [13291.785363]   EA = 0, S1PTW = 0
> [13291.785372]   FSC = 0x21: alignment fault
> [13291.785382] Data abort info:
> [13291.785391]   ISV = 0, ISS = 0x00000021, ISS2 = 0x00000000
> [13291.785404]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
> [13291.785412]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
> [13291.785421] swapper pgtable: 4k pages, 48-bit VAs, pgdp=00000014df3a1000
> [13291.785432] [ffff8000a200001b] pgd=1000000100438403, p4d=1000000100438403, pud=1000000100439403, pmd=0068000fc2000711
> [13291.785703] Internal error: Oops: 0000000096000021 [#1]  SMP
> [13291.830975] Modules linked in: tls qrtr mana_ib ib_uverbs ib_core xt_owner xt_tcpudp xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_compat nf_tables cfg80211 8021q garp mrp stp llc binfmt_misc joydev serio_raw nls_iso8859_1 hid_generic aes_ce_blk aes_ce_cipher polyval_ce ghash_ce sm4_ce_gcm sm4_ce_ccm sm4_ce sm4_ce_cipher hid_hyperv sm4 sm3_ce sha3_ce hv_netvsc hid vmgenid hyperv_keyboard hyperv_drm sch_fq_codel nvme_fabrics efi_pstore dm_multipath nfnetlink vsock_loopback vmw_vsock_virtio_transport_common hv_sock vmw_vsock_vmci_transport vmw_vmci vsock dmi_sysfs ip_tables x_tables autofs4
> [13291.862630] CPU: 122 UID: 0 PID: 61796 Comm: kworker/122:2 Tainted: G        W           6.17.0-3013-azure #13-Ubuntu VOLUNTARY
> [13291.869902] Tainted: [W]=WARN
> [13291.871901] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 01/08/2026
> [13291.878086] Workqueue: events mana_serv_func
> [13291.880718] pstate: 62400005 (nZCv daif +PAN -UAO +TCO -DIT -SSBS BTYPE=--)
> [13291.884835] pc : mana_smc_poll_register+0x48/0xb0
> [13291.887902] lr : mana_smc_setup_hwc+0x70/0x1c0
> [13291.890493] sp : ffff8000ab79bbb0
> [13291.892364] x29: ffff8000ab79bbb0 x28: ffff00410c8b5900 x27: ffff00410d630680
> [13291.896252] x26: ffff004171f9fd80 x25: 000000016ed55000 x24: 000000017f37e000
> [13291.899990] x23: 0000000000000000 x22: 000000016ed55000 x21: 0000000000000000
> [13291.904497] x20: ffff8000a200001b x19: 0000000000004e20 x18: ffff8000a6183050
> [13291.908308] x17: 0000000000000000 x16: 0000000000000000 x15: 000000000000000a
> [13291.912542] x14: 0000000000000004 x13: 0000000000000000 x12: 0000000000000000
> [13291.916298] x11: 0000000000000000 x10: 0000000000000001 x9 : ffffc45006af1bd8
> [13291.920945] x8 : ffff000151129000 x7 : 0000000000000000 x6 : 0000000000000000
> [13291.925293] x5 : 000000015f214000 x4 : 000000017217a000 x3 : 000000016ed50000
> [13291.930436] x2 : 000000016ed55000 x1 : 0000000000000000 x0 : ffff8000a1ffffff
> [13291.934342] Call trace:
> [13291.935736]  mana_smc_poll_register+0x48/0xb0 (P)
> [13291.938611]  mana_smc_setup_hwc+0x70/0x1c0
> [13291.941113]  mana_hwc_create_channel+0x1a0/0x3a0
> [13291.944283]  mana_gd_setup+0x16c/0x398
> [13291.946584]  mana_gd_resume+0x24/0x70
> [13291.948917]  mana_do_service+0x13c/0x1d0
> [13291.951583]  mana_serv_func+0x34/0x68
> [13291.953732]  process_one_work+0x168/0x3d0
> [13291.956745]  worker_thread+0x2ac/0x480
> [13291.959104]  kthread+0xf8/0x110
> [13291.961026]  ret_from_fork+0x10/0x20
> [13291.963560] Code: d2807d00 9417c551 71000673 54000220 (b9400281)
> [13291.967299] ---[ end trace 0000000000000000 ]---
> 
> Disassembly of mana_smc_poll_register() around the crash site:
> 
> Disassembly of section .text:
> 
> 00000000000047c8 <mana_smc_poll_register>:
>     47c8: d503201f        nop
>     47cc: d503201f        nop
>     47d0: d503233f        paciasp
>     47d4: f800865e        str     x30, [x18], #8
>     47d8: a9bd7bfd        stp     x29, x30, [sp, #-48]!
>     47dc: 910003fd        mov     x29, sp
>     47e0: a90153f3        stp     x19, x20, [sp, #16]
>     47e4: 91007014        add     x20, x0, #0x1c
>     47e8: 5289c413        mov     w19, #0x4e20
>     47ec: f90013f5        str     x21, [sp, #32]
>     47f0: 12001c35        and     w21, w1, #0xff
>     47f4: 14000008        b       4814 <mana_smc_poll_register+0x4c>
>     47f8: 36f801e1  tbz  w1, #31, 4834 <mana_smc_poll_register+0x6c>
>     47fc: 52800042        mov     w2, #0x2
>     4800: d280fa01        mov     x1, #0x7d0
>     4804: d2807d00        mov     x0, #0x3e8
>     4808: 94000000        bl      0 <usleep_range_state>
>     480c: 71000673        subs    w19, w19, #0x1
>     4810: 54000200        b.eq    4850 <mana_smc_poll_register+0x88>
>     4814: b9400281      ldr   w1, [x20] <-- **** CRASHED HERE *****
>     4818: d50331bf        dmb     oshld
>     481c: 2a0103e2        mov     w2, w1
>     ...
> 
> From the crash signature x20 = ffff8000a200001b, this address
> ends in 0x1b which is not 4-byte aligned, so the 'ldr w1, [x20]'
> instruction (readl) triggers the arm64 alignment fault (FSC = 0x21).
> 
> The root cause is in mana_gd_init_vf_regs(), which computes:
> 
>   gc->shm_base = gc->bar0_va + mana_gd_r64(gc, GDMA_REG_SHM_OFFSET);
> 
> The offset is used without any validation.  The same problem exists
> in mana_gd_init_pf_regs() for sriov_base_off and sriov_shm_off.
> 
> Fix this by validating all offsets before use:
> 
> - VF: check shm_off is within BAR0, properly aligned to 4 bytes
>   (readl requirement), and leaves room for the full 256-bit
>   (32-byte) SMC aperture.
> 
> - PF: check sriov_base_off is within BAR0, aligned to 8 bytes
>   (readq requirement), and leaves room to safely read the
>   sriov_shm_off register at sriov_base_off + GDMA_PF_REG_SHM_OFF.
>   Then check sriov_shm_off leaves room for the full SMC aperture.
>   All arithmetic uses subtraction rather than addition to avoid
>   integer overflow on garbage firmware values.
> 
> without validating the offset read from hardware. If the register
> returns a garbage value that is neither within bar 0 bounds nor aligned
> to the 4-byte granularity, thus causing the alignment fault.
> 
> Define SMC_APERTURE_SIZE (32 bytes, derived from the 256-bit aperture
> width)
> 
> Return -EPROTO on invalid values.  The existing recovery path in
> mana_serv_reset() already handles -EPROTO by falling through to PCI
> device rescan, giving the hardware another chance to present valid
> register values after reset.
> 
> Fixes: 9bf66036d686 ("net: mana: Handle hardware recovery events when probing the device")
> Signed-off-by: Dipayaan Roy <dipayanroy@linux.microsoft.com>
> 
> ---
> Changes in v2:
> - Fix sriov_base_off alignment check: sizeof(u32) to sizeof(u64), since
>   mana_gd_r64() (readq) requires 8-byte alignment on arm64.
> - Fix sriov_base_off bounds: also verify enough space remains in BAR0
>   to safely read sriov_shm_off at offset GDMA_PF_REG_SHM_OFF + 8 bytes.
> - Fix integer overflow: rewrite bounds checks using subtraction
>   (remaining = bar0_size - base) instead of addition.
> - Fix SMC aperture size: add gc->bar0_size - shm_off < SMC_APERTURE_SIZE
>   checks in both VF and PF paths; previously only the start address was
>   validated, but mana_smc_poll_register() accesses up to shm_base + 0x1c
>   (28 bytes from base, 32 bytes total).
> - Export SMC_APERTURE_SIZE to shm_channel.h.
> ---
>  .../net/ethernet/microsoft/mana/gdma_main.c   | 40 ++++++++++++++++---
>  include/net/mana/shm_channel.h                |  6 +++
>  2 files changed, 41 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c b/drivers/net/ethernet/microsoft/mana/gdma_main.c
> index 098fbda0d128..d8e816882f02 100644
> --- a/drivers/net/ethernet/microsoft/mana/gdma_main.c
> +++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c
> @@ -43,8 +43,9 @@ static u64 mana_gd_r64(struct gdma_context *g, u64 offset)
>  static int mana_gd_init_pf_regs(struct pci_dev *pdev)
>  {
>  	struct gdma_context *gc = pci_get_drvdata(pdev);
> -	void __iomem *sriov_base_va;
> +	u64 remaining_barsize;
>  	u64 sriov_base_off;
> +	u64 sriov_shm_off;
>  
>  	gc->db_page_size = mana_gd_r32(gc, GDMA_PF_REG_DB_PAGE_SIZE) & 0xFFFF;
>  
> @@ -73,10 +74,28 @@ static int mana_gd_init_pf_regs(struct pci_dev *pdev)
>  	gc->phys_db_page_base = gc->bar0_pa + gc->db_page_off;
>  
>  	sriov_base_off = mana_gd_r64(gc, GDMA_SRIOV_REG_CFG_BASE_OFF);
> +	if (sriov_base_off >= gc->bar0_size ||
> +	    gc->bar0_size - sriov_base_off <
> +		GDMA_PF_REG_SHM_OFF + sizeof(u64) ||

nit: fits on a single line, I think?

> +	    !IS_ALIGNED(sriov_base_off, sizeof(u64))) {
> +		dev_err(gc->dev,
> +			"SRIOV base offset 0x%llx out of range or unaligned (BAR0 size 0x%llx)\n",
> +			sriov_base_off, (u64)gc->bar0_size);
> +		return -EPROTO;
> +	}
>  
> -	sriov_base_va = gc->bar0_va + sriov_base_off;
> -	gc->shm_base = sriov_base_va +
> -			mana_gd_r64(gc, sriov_base_off + GDMA_PF_REG_SHM_OFF);
> +	remaining_barsize = gc->bar0_size - sriov_base_off;
> +	sriov_shm_off = mana_gd_r64(gc, sriov_base_off + GDMA_PF_REG_SHM_OFF);
> +	if (sriov_shm_off >= remaining_barsize ||
> +	    remaining_barsize - sriov_shm_off < SMC_APERTURE_SIZE ||
> +	    !IS_ALIGNED(sriov_shm_off, sizeof(u32))) {
> +		dev_err(gc->dev,
> +			"SRIOV SHM offset 0x%llx out of range or unaligned (BAR0 size 0x%llx)\n",
> +			sriov_shm_off, (u64)gc->bar0_size);
> +		return -EPROTO;
> +	}
> +
> +	gc->shm_base = gc->bar0_va + sriov_base_off + sriov_shm_off;
>  
>  	return 0;
>  }
> @@ -84,6 +103,7 @@ static int mana_gd_init_pf_regs(struct pci_dev *pdev)
>  static int mana_gd_init_vf_regs(struct pci_dev *pdev)
>  {
>  	struct gdma_context *gc = pci_get_drvdata(pdev);
> +	u64 shm_off;
>  
>  	gc->db_page_size = mana_gd_r32(gc, GDMA_REG_DB_PAGE_SIZE) & 0xFFFF;
>  
> @@ -111,7 +131,17 @@ static int mana_gd_init_vf_regs(struct pci_dev *pdev)
>  	gc->db_page_base = gc->bar0_va + gc->db_page_off;
>  	gc->phys_db_page_base = gc->bar0_pa + gc->db_page_off;
>  
> -	gc->shm_base = gc->bar0_va + mana_gd_r64(gc, GDMA_REG_SHM_OFFSET);
> +	shm_off = mana_gd_r64(gc, GDMA_REG_SHM_OFFSET);
> +	if (shm_off >= gc->bar0_size ||
> +	    gc->bar0_size - shm_off < SMC_APERTURE_SIZE ||
> +	    !IS_ALIGNED(shm_off, sizeof(u32))) {
> +		dev_err(gc->dev,
> +			"SHM offset 0x%llx out of range or unaligned (BAR0 size 0x%llx)\n",
> +			shm_off, (u64)gc->bar0_size);
> +		return -EPROTO;
> +	}
> +
> +	gc->shm_base = gc->bar0_va + shm_off;
>  
>  	return 0;
>  }
> diff --git a/include/net/mana/shm_channel.h b/include/net/mana/shm_channel.h
> index 5199b41497ff..dbabcfb95daf 100644
> --- a/include/net/mana/shm_channel.h
> +++ b/include/net/mana/shm_channel.h
> @@ -4,6 +4,12 @@
>  #ifndef _SHM_CHANNEL_H
>  #define _SHM_CHANNEL_H
>  
> +#define SMC_APERTURE_BITS 256
> +#define SMC_BASIC_UNIT (sizeof(u32))
> +#define SMC_APERTURE_DWORDS (SMC_APERTURE_BITS / (SMC_BASIC_UNIT * 8))
> +#define SMC_LAST_DWORD (SMC_APERTURE_DWORDS - 1)
> +#define SMC_APERTURE_SIZE  (SMC_APERTURE_BITS / 8)

AI bots complain that we're redefining this.
Since it's a fix I think it's better to remove the existing definition
even if it lives in a driver that goes via a different tree.

>  struct shm_channel {
>  	struct device *dev;
>  	void __iomem *base;
-- 
pw-bot: cr

  parent reply	other threads:[~2026-05-02  1:53 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-29 18:57 [PATCH net, v2] net: mana: Fix crash from unvalidated SHM offset read from BAR0 during FLR Dipayaan Roy
2026-04-30 18:59 ` sashiko-bot
2026-05-02  1:53 ` Jakub Kicinski [this message]
2026-05-03  3:03   ` Dipayaan Roy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260501185324.0f02dc72@kernel.org \
    --to=kuba@kernel.org \
    --cc=andrew+netdev@lunn.ch \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=decui@microsoft.com \
    --cc=dipayanroy@linux.microsoft.com \
    --cc=dipayanroy@microsoft.com \
    --cc=edumazet@google.com \
    --cc=ernis@linux.microsoft.com \
    --cc=haiyangz@microsoft.com \
    --cc=hawk@kernel.org \
    --cc=horms@kernel.org \
    --cc=jacob.e.keller@intel.com \
    --cc=john.fastabend@gmail.com \
    --cc=kees@kernel.org \
    --cc=kotaranov@microsoft.com \
    --cc=kys@microsoft.com \
    --cc=leitao@debian.org \
    --cc=leon@kernel.org \
    --cc=linux-hyperv@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-rdma@vger.kernel.org \
    --cc=longli@microsoft.com \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=sdf@fomichev.me \
    --cc=shirazsaleem@microsoft.com \
    --cc=shradhagupta@linux.microsoft.com \
    --cc=ssengar@linux.microsoft.com \
    --cc=stephen@networkplumber.org \
    --cc=wei.liu@kernel.org \
    --cc=yury.norov@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.