From: Simon Horman <horms@kernel.org>
To: mhklinux@outlook.com
Cc: kys@microsoft.com, haiyangz@microsoft.com, wei.liu@kernel.org,
decui@microsoft.com, andrew+netdev@lunn.ch, davem@davemloft.net,
edumazet@google.com, kuba@kernel.org, pabeni@redhat.com,
James.Bottomley@hansenpartnership.com,
martin.petersen@oracle.com, linux-hyperv@vger.kernel.org,
linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
linux-scsi@vger.kernel.org, stable@vger.kernel.org
Subject: Re: [PATCH net 3/5] hv_netvsc: Preserve contiguous PFN grouping in the page buffer array
Date: Wed, 14 May 2025 10:34:35 +0100 [thread overview]
Message-ID: <20250514093435.GE3339421@horms.kernel.org> (raw)
In-Reply-To: <20250513000604.1396-4-mhklinux@outlook.com>
On Mon, May 12, 2025 at 05:06:02PM -0700, mhkelley58@gmail.com wrote:
> From: Michael Kelley <mhklinux@outlook.com>
>
> Starting with commit dca5161f9bd0 ("hv_netvsc: Check status in
> SEND_RNDIS_PKT completion message") in the 6.3 kernel, the Linux
> driver for Hyper-V synthetic networking (netvsc) occasionally reports
> "nvsp_rndis_pkt_complete error status: 2".[1] This error indicates
> that Hyper-V has rejected a network packet transmit request from the
> guest, and the outgoing network packet is dropped. Higher level
> network protocols presumably recover and resend the packet so there is
> no functional error, but performance is slightly impacted. Commit
> dca5161f9bd0 is not the cause of the error -- it only added reporting
> of an error that was already happening without any notice. The error
> has presumably been present since the netvsc driver was originally
> introduced into Linux.
>
> The root cause of the problem is that the netvsc driver in Linux may
> send an incorrectly formatted VMBus message to Hyper-V when
> transmitting the network packet. The incorrect formatting occurs when
> the rndis header of the VMBus message crosses a page boundary due to
> how the Linux skb head memory is aligned. In such a case, two PFNs are
> required to describe the location of the rndis header, even though
> they are contiguous in guest physical address (GPA) space. Hyper-V
> requires that two rndis header PFNs be in a single "GPA range" data
> struture, but current netvsc code puts each PFN in its own GPA range,
> which Hyper-V rejects as an error.
>
> The incorrect formatting occurs only for larger packets that netvsc
> must transmit via a VMBus "GPA Direct" message. There's no problem
> when netvsc transmits a smaller packet by copying it into a pre-
> allocated send buffer slot because the pre-allocated slots don't have
> page crossing issues.
>
> After commit 14ad6ed30a10 ("net: allow small head cache usage with
> large MAX_SKB_FRAGS values") in the 6.14-rc4 kernel, the error occurs
> much more frequently in VMs with 16 or more vCPUs. It may occur every
> few seconds, or even more frequently, in an ssh session that outputs a
> lot of text. Commit 14ad6ed30a10 subtly changes how skb head memory is
> allocated, making it much more likely that the rndis header will cross
> a page boundary when the vCPU count is 16 or more. The changes in
> commit 14ad6ed30a10 are perfectly valid -- they just had the side
> effect of making the netvsc bug more prominent.
>
> Current code in init_page_array() creates a separate page buffer array
> entry for each PFN required to identify the data to be transmitted.
> Contiguous PFNs get separate entries in the page buffer array, and any
> information about contiguity is lost.
>
> Fix the core issue by having init_page_array() construct the page
> buffer array to represent contiguous ranges rather than individual
> pages. When these ranges are subsequently passed to
> netvsc_build_mpb_array(), it can build GPA ranges that contain
> multiple PFNs, as required to avoid the error "nvsp_rndis_pkt_complete
> error status: 2". If instead the network packet is sent by copying
> into a pre-allocated send buffer slot, the copy proceeds using the
> contiguous ranges rather than individual pages, but the result of the
> copying is the same. Also fix rndis_filter_send_request() to construct
> a contiguous range, since it has its own page buffer array.
>
> This change has a side benefit in CoCo VMs in that netvsc_dma_map()
> calls dma_map_single() on each contiguous range instead of on each
> page. This results in fewer calls to dma_map_single() but on larger
> chunks of memory, which should reduce contention on the swiotlb.
>
> Since the page buffer array now contains one entry for each contiguous
> range instead of for each individual page, the number of entries in
> the array can be reduced, saving 208 bytes of stack space in
> netvsc_xmit() when MAX_SKG_FRAGS has the default value of 17.
>
> [1] https://bugzilla.kernel.org/show_bug.cgi?id=217503
>
> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217503
> Cc: <stable@vger.kernel.org> # 6.1.x
> Signed-off-by: Michael Kelley <mhklinux@outlook.com>
> ---
> drivers/net/hyperv/hyperv_net.h | 12 ++++++
> drivers/net/hyperv/netvsc_drv.c | 63 ++++++++-----------------------
> drivers/net/hyperv/rndis_filter.c | 24 +++---------
> 3 files changed, 32 insertions(+), 67 deletions(-)
>
> diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
> index 70f7cb383228..76725f25abd5 100644
> --- a/drivers/net/hyperv/hyperv_net.h
> +++ b/drivers/net/hyperv/hyperv_net.h
> @@ -893,6 +893,18 @@ struct nvsp_message {
> sizeof(struct nvsp_message))
> #define NETVSC_MIN_IN_MSG_SIZE sizeof(struct vmpacket_descriptor)
>
> +/* Maximum # of contiguous data ranges that can make up a trasmitted packet.
> + * Typically it's the max SKB fragments plus 2 for the rndis packet and the
> + * linear portion of the SKB. But if MAX_SKB_FRAGS is large, the value may
> + * need to be limited to MAX_PAGE_BUFFER_COUNT, which is the max # of entries
> + * in a GPA direct packet sent to netvsp over VMBus.
> + */
> +#if MAX_SKB_FRAGS + 2 < MAX_PAGE_BUFFER_COUNT
> +#define MAX_DATA_RANGES (MAX_SKB_FRAGS + 2)
> +#else
> +#define MAX_DATA_RANGES MAX_PAGE_BUFFER_COUNT
> +#endif
> +
> /* Estimated requestor size:
> * out_ring_size/min_out_msg_size + in_ring_size/min_in_msg_size
> */
...
> @@ -371,28 +338,28 @@ static u32 init_page_array(void *hdr, u32 len, struct sk_buff *skb,
> * 2. skb linear data
> * 3. skb fragment data
> */
> - slots_used += fill_pg_buf(virt_to_hvpfn(hdr),
> - offset_in_hvpage(hdr),
> - len,
> - &pb[slots_used]);
>
> + pb[0].offset = offset_in_hvpage(hdr);
> + pb[0].len = len;
> + pb[0].pfn = virt_to_hvpfn(hdr);
> packet->rmsg_size = len;
> - packet->rmsg_pgcnt = slots_used;
> + packet->rmsg_pgcnt = 1;
>
> - slots_used += fill_pg_buf(virt_to_hvpfn(data),
> - offset_in_hvpage(data),
> - skb_headlen(skb),
> - &pb[slots_used]);
> + pb[1].offset = offset_in_hvpage(skb->data);
> + pb[1].len = skb_headlen(skb);
> + pb[1].pfn = virt_to_hvpfn(skb->data);
>
> for (i = 0; i < frags; i++) {
> skb_frag_t *frag = skb_shinfo(skb)->frags + i;
> + struct hv_page_buffer *cur_pb = &pb[i + 2];
Hi Michael,
If I got things right then then pb is allocated on the stack
in netvsc_xmit and has MAX_DATA_RANGES elements.
If MAX_SKB_FRAGS is largs and MAX_DATA_RANGES has been limited to
MAX_DATA_RANGES. And frags is large. Is is possible to overrun pb here?
> + u64 pfn = page_to_hvpfn(skb_frag_page(frag));
> + u32 offset = skb_frag_off(frag);
>
> - slots_used += fill_pg_buf(page_to_hvpfn(skb_frag_page(frag)),
> - skb_frag_off(frag),
> - skb_frag_size(frag),
> - &pb[slots_used]);
> + cur_pb->offset = offset_in_hvpage(offset);
> + cur_pb->len = skb_frag_size(frag);
> + cur_pb->pfn = pfn + (offset >> HV_HYP_PAGE_SHIFT);
> }
> - return slots_used;
> + return frags + 2;
> }
>
> static int count_skb_frag_slots(struct sk_buff *skb)
> @@ -483,7 +450,7 @@ static int netvsc_xmit(struct sk_buff *skb, struct net_device *net, bool xdp_tx)
> struct net_device *vf_netdev;
> u32 rndis_msg_size;
> u32 hash;
> - struct hv_page_buffer pb[MAX_PAGE_BUFFER_COUNT];
> + struct hv_page_buffer pb[MAX_DATA_RANGES];
>
> /* If VF is present and up then redirect packets to it.
> * Skip the VF if it is marked down or has no carrier.
...
next prev parent reply other threads:[~2025-05-14 9:34 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-05-13 0:05 [PATCH net 0/5] hv_netvsc: Fix error "nvsp_rndis_pkt_complete error status: 2" mhkelley58
2025-05-13 0:06 ` [PATCH net 1/5] Drivers: hv: Allow vmbus_sendpacket_mpb_desc() to create multiple ranges mhkelley58
2025-05-15 10:51 ` Simon Horman
2025-05-13 0:06 ` [PATCH net 2/5] hv_netvsc: Use vmbus_sendpacket_mpb_desc() to send VMBus messages mhkelley58
2025-05-14 9:37 ` Simon Horman
2025-05-14 15:44 ` Michael Kelley
2025-05-15 10:50 ` Simon Horman
2025-05-13 0:06 ` [PATCH net 3/5] hv_netvsc: Preserve contiguous PFN grouping in the page buffer array mhkelley58
2025-05-14 9:34 ` Simon Horman [this message]
2025-05-14 15:42 ` Michael Kelley
2025-05-15 10:40 ` Simon Horman
2025-05-13 0:06 ` [PATCH net 4/5] hv_netvsc: Remove rmsg_pgcnt mhkelley58
2025-05-15 10:55 ` Simon Horman
2025-05-13 0:06 ` [PATCH net 5/5] Drivers: hv: vmbus: Remove vmbus_sendpacket_pagebuffer() mhkelley58
2025-05-15 3:00 ` [PATCH net 0/5] hv_netvsc: Fix error "nvsp_rndis_pkt_complete error status: 2" patchwork-bot+netdevbpf
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250514093435.GE3339421@horms.kernel.org \
--to=horms@kernel.org \
--cc=James.Bottomley@hansenpartnership.com \
--cc=andrew+netdev@lunn.ch \
--cc=davem@davemloft.net \
--cc=decui@microsoft.com \
--cc=edumazet@google.com \
--cc=haiyangz@microsoft.com \
--cc=kuba@kernel.org \
--cc=kys@microsoft.com \
--cc=linux-hyperv@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-scsi@vger.kernel.org \
--cc=martin.petersen@oracle.com \
--cc=mhklinux@outlook.com \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=stable@vger.kernel.org \
--cc=wei.liu@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).