* [PATCH net] net: gso: fix panic on frag_list with mixed head alloc types
@ 2022-10-27 8:20 Jiri Benc
2022-10-29 4:41 ` Jakub Kicinski
2022-10-29 7:41 ` Shmulik Ladkani
0 siblings, 2 replies; 7+ messages in thread
From: Jiri Benc @ 2022-10-27 8:20 UTC (permalink / raw)
To: netdev; +Cc: Shmulik Ladkani, Eric Dumazet, Tomas Hruby, Jeremi Piotrowski
Since commit 3dcbdb134f32 ("net: gso: Fix skb_segment splat when
splitting gso_size mangled skb having linear-headed frag_list"), it is
allowed to change gso_size of a GRO packet. However, that commit assumes
that "checking the first list_skb member suffices; i.e if either of the
list_skb members have non head_frag head, then the first one has too".
It turns out this assumption does not hold. We've seen BUG_ON being hit
in skb_segment when skbs on the frag_list had differing head_frag. That
particular case was with vmxnet3; looking at the driver, it indeed uses
different skb allocation strategies based on the packet size. The last
packet in frag_list can thus be kmalloced if it is sufficiently small.
And there's nothing preventing drivers from mixing things even more
freely.
There are three different locations where this can be fixed:
(1) We could check head_frag in GRO and not allow GROing skbs with
different head_frag. However, that would lead to performance
regression (at least on vmxnet3) on normal forward paths with
unmodified gso_size, where mixed head_frag is not a problem.
(2) Set a flag in bpf_skb_net_grow and bpf_skb_net_shrink indicating
that NETIF_F_SG is undesirable. That would need to eat a bit in
sk_buff. Furthermore, that flag can be unset when all skbs on the
frag_list are page backed. To retain good performance,
bpf_skb_net_grow/shrink would have to walk the frag_list.
(3) Walk the frag_list in skb_segment when determining whether
NETIF_F_SG should be cleared. This of course slows things down.
This patch implements (3). To limit the performance impact in
skb_segment, the list is walked only for skbs with SKB_GSO_DODGY set
that have gso_size changed. Normal paths thus will not hit it.
Fixes: 3dcbdb134f32 ("net: gso: Fix skb_segment splat when splitting gso_size mangled skb having linear-headed frag_list")
Signed-off-by: Jiri Benc <jbenc@redhat.com>
---
net/core/skbuff.c | 36 +++++++++++++++++++-----------------
1 file changed, 19 insertions(+), 17 deletions(-)
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 1d9719e72f9d..bbf3acff44c6 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -4134,23 +4134,25 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
 	int i = 0;
 	int pos;
 
-	if (list_skb && !list_skb->head_frag && skb_headlen(list_skb) &&
-	    (skb_shinfo(head_skb)->gso_type & SKB_GSO_DODGY)) {
-		/* gso_size is untrusted, and we have a frag_list with a linear
-		 * non head_frag head.
-		 *
-		 * (we assume checking the first list_skb member suffices;
-		 * i.e if either of the list_skb members have non head_frag
-		 * head, then the first one has too).
-		 *
-		 * If head_skb's headlen does not fit requested gso_size, it
-		 * means that the frag_list members do NOT terminate on exact
-		 * gso_size boundaries. Hence we cannot perform skb_frag_t page
-		 * sharing. Therefore we must fallback to copying the frag_list
-		 * skbs; we do so by disabling SG.
-		 */
-		if (mss != GSO_BY_FRAGS && mss != skb_headlen(head_skb))
-			features &= ~NETIF_F_SG;
+	if ((skb_shinfo(head_skb)->gso_type & SKB_GSO_DODGY) &&
+	    mss != GSO_BY_FRAGS && mss != skb_headlen(head_skb)) {
+		struct sk_buff *check_skb;
+
+		for (check_skb = list_skb; check_skb; check_skb = check_skb->next) {
+			if (skb_headlen(check_skb) && !check_skb->head_frag) {
+				/* gso_size is untrusted, and we have a frag_list with
+				 * a linear non head_frag item.
+				 *
+				 * If head_skb's headlen does not fit requested gso_size,
+				 * it means that the frag_list members do NOT terminate
+				 * on exact gso_size boundaries. Hence we cannot perform
+				 * skb_frag_t page sharing. Therefore we must fallback to
+				 * copying the frag_list skbs; we do so by disabling SG.
+				 */
+				features &= ~NETIF_F_SG;
+				break;
+			}
+		}
 	}
 
 	__skb_push(head_skb, doffset);
--
2.37.3
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH net] net: gso: fix panic on frag_list with mixed head alloc types
2022-10-27 8:20 [PATCH net] net: gso: fix panic on frag_list with mixed head alloc types Jiri Benc
@ 2022-10-29 4:41 ` Jakub Kicinski
2022-10-31 15:54 ` Jiri Benc
2022-10-29 7:41 ` Shmulik Ladkani
1 sibling, 1 reply; 7+ messages in thread
From: Jakub Kicinski @ 2022-10-29 4:41 UTC (permalink / raw)
To: Jiri Benc
Cc: netdev, Shmulik Ladkani, Eric Dumazet, Tomas Hruby,
Jeremi Piotrowski, alexanderduyck, willemb, Paolo Abeni
On Thu, 27 Oct 2022 10:20:56 +0200 Jiri Benc wrote:
> Since commit 3dcbdb134f32 ("net: gso: Fix skb_segment splat when
> splitting gso_size mangled skb having linear-headed frag_list"), it is
> allowed to change gso_size of a GRO packet. However, that commit assumes
> that "checking the first list_skb member suffices; i.e if either of the
> list_skb members have non head_frag head, then the first one has too".
>
> It turns out this assumption does not hold. We've seen BUG_ON being hit
> in skb_segment when skbs on the frag_list had differing head_frag. That
> particular case was with vmxnet3; looking at the driver, it indeed uses
> different skb allocation strategies based on the packet size.
Where are you looking? I'm not seeing it TBH.
I don't think the driver is that important, tho, __napi_alloc_skb()
will select page backing or kmalloc, all by itself.
The patch LGTM, adding more CCs in case I'm missing something.
> The last packet in frag_list can thus be kmalloced if it is
> sufficiently small. And there's nothing preventing drivers from
> mixing things even more freely.
>
> There are three different locations where this can be fixed:
>
> (1) We could check head_frag in GRO and not allow GROing skbs with
> different head_frag. However, that would lead to performance
> regression (at least on vmxnet3) on normal forward paths with
> unmodified gso_size, where mixed head_frag is not a problem.
>
> (2) Set a flag in bpf_skb_net_grow and bpf_skb_net_shrink indicating
> that NETIF_F_SG is undesirable. That would need to eat a bit in
> sk_buff. Furthermore, that flag can be unset when all skbs on the
> frag_list are page backed. To retain good performance,
> bpf_skb_net_grow/shrink would have to walk the frag_list.
>
> (3) Walk the frag_list in skb_segment when determining whether
> NETIF_F_SG should be cleared. This of course slows things down.
>
> This patch implements (3). To limit the performance impact in
> skb_segment, the list is walked only for skbs with SKB_GSO_DODGY set
> that have gso_size changed. Normal paths thus will not hit it.
>
> Fixes: 3dcbdb134f32 ("net: gso: Fix skb_segment splat when splitting gso_size mangled skb having linear-headed frag_list")
> Signed-off-by: Jiri Benc <jbenc@redhat.com>
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 1d9719e72f9d..bbf3acff44c6 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -4134,23 +4134,25 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
>  	int i = 0;
>  	int pos;
>  
> -	if (list_skb && !list_skb->head_frag && skb_headlen(list_skb) &&
> -	    (skb_shinfo(head_skb)->gso_type & SKB_GSO_DODGY)) {
> -		/* gso_size is untrusted, and we have a frag_list with a linear
> -		 * non head_frag head.
> -		 *
> -		 * (we assume checking the first list_skb member suffices;
> -		 * i.e if either of the list_skb members have non head_frag
> -		 * head, then the first one has too).
> -		 *
> -		 * If head_skb's headlen does not fit requested gso_size, it
> -		 * means that the frag_list members do NOT terminate on exact
> -		 * gso_size boundaries. Hence we cannot perform skb_frag_t page
> -		 * sharing. Therefore we must fallback to copying the frag_list
> -		 * skbs; we do so by disabling SG.
> -		 */
> -		if (mss != GSO_BY_FRAGS && mss != skb_headlen(head_skb))
> -			features &= ~NETIF_F_SG;
> +	if ((skb_shinfo(head_skb)->gso_type & SKB_GSO_DODGY) &&
> +	    mss != GSO_BY_FRAGS && mss != skb_headlen(head_skb)) {
> +		struct sk_buff *check_skb;
> +
> +		for (check_skb = list_skb; check_skb; check_skb = check_skb->next) {
> +			if (skb_headlen(check_skb) && !check_skb->head_frag) {
> +				/* gso_size is untrusted, and we have a frag_list with
> +				 * a linear non head_frag item.
> +				 *
> +				 * If head_skb's headlen does not fit requested gso_size,
> +				 * it means that the frag_list members do NOT terminate
> +				 * on exact gso_size boundaries. Hence we cannot perform
> +				 * skb_frag_t page sharing. Therefore we must fallback to
> +				 * copying the frag_list skbs; we do so by disabling SG.
> +				 */
> +				features &= ~NETIF_F_SG;
> +				break;
> +			}
> +		}
>  	}
>  
>  	__skb_push(head_skb, doffset);
* Re: [PATCH net] net: gso: fix panic on frag_list with mixed head alloc types
2022-10-27 8:20 [PATCH net] net: gso: fix panic on frag_list with mixed head alloc types Jiri Benc
2022-10-29 4:41 ` Jakub Kicinski
@ 2022-10-29 7:41 ` Shmulik Ladkani
2022-10-29 14:10 ` Willem de Bruijn
1 sibling, 1 reply; 7+ messages in thread
From: Shmulik Ladkani @ 2022-10-29 7:41 UTC (permalink / raw)
To: Jiri Benc, willemb
Cc: netdev, Eric Dumazet, Tomas Hruby, Jeremi Piotrowski,
alexanderduyck, Jakub Kicinski
On Thu, 27 Oct 2022 10:20:56 +0200
Jiri Benc <jbenc@redhat.com> wrote:
> It turns out this assumption does not hold. We've seen BUG_ON being hit
> in skb_segment when skbs on the frag_list had differing head_frag. That
> particular case was with vmxnet3; looking at the driver, it indeed uses
> different skb allocation strategies based on the packet size. The last
> packet in frag_list can thus be kmalloced if it is sufficiently small.
> And there's nothing preventing drivers from mixing things even more
> freely.
Hi Jiri,
One of my early attempts to fix the original BUG was to also detect:
> - some frag in the frag_list has a linear part that is NOT head_frag,
> or length not equal to the requested gso_size
See [0], see skb_is_nonlinear_equal_frags() there
(Note that your current suggestion implements the "some frag in the
frag_list has a linear part that is NOT head_frag" condition, but not
"length not equal to the requested gso_size")
As a response, Willem suggested:
> My suggestion only tested the first frag_skb length. If a list can be
> created where the first frag_skb is head_frag but a later one is not,
> it will fail short. I kind of doubt that.
See [1]
So we eventually concluded testing just
!list_skb->head_frag && skb_headlen(list_skb)
and not every frag in frag_list.
Maybe Willem can elaborate on that.
[0] https://lore.kernel.org/netdev/20190903185121.56906d31@pixies/
[1] https://lore.kernel.org/netdev/CA+FuTScE=pyopY=3f5E4JGx1zyGqT+XS+8ss13UN4if4TZ2NbA@mail.gmail.com/
* Re: [PATCH net] net: gso: fix panic on frag_list with mixed head alloc types
2022-10-29 7:41 ` Shmulik Ladkani
@ 2022-10-29 14:10 ` Willem de Bruijn
2022-10-31 16:52 ` Jiri Benc
0 siblings, 1 reply; 7+ messages in thread
From: Willem de Bruijn @ 2022-10-29 14:10 UTC (permalink / raw)
To: Shmulik Ladkani
Cc: Jiri Benc, netdev, Eric Dumazet, Tomas Hruby, Jeremi Piotrowski,
alexanderduyck, Jakub Kicinski
On Sat, Oct 29, 2022 at 3:41 AM Shmulik Ladkani
<shmulik.ladkani@gmail.com> wrote:
>
> On Thu, 27 Oct 2022 10:20:56 +0200
> Jiri Benc <jbenc@redhat.com> wrote:
>
> > It turns out this assumption does not hold. We've seen BUG_ON being hit
> > in skb_segment when skbs on the frag_list had differing head_frag. That
> > particular case was with vmxnet3; looking at the driver, it indeed uses
> > different skb allocation strategies based on the packet size. The last
> > packet in frag_list can thus be kmalloced if it is sufficiently small.
> > And there's nothing preventing drivers from mixing things even more
> > freely.
>
> Hi Jiri,
>
> One of my early attempts to fix the original BUG was to also detect:
>
> > - some frag in the frag_list has a linear part that is NOT head_frag,
> > or length not equal to the requested gso_size
>
> See [0], see skb_is_nonlinear_equal_frags() there
>
> (Note that your current suggestion implements the "some frag in the
> frag_list has a linear part that is NOT head_frag" condition, but not
> "length not equal to the requested gso_size")
>
> As a response, Willem suggested:
>
> > My suggestion only tested the first frag_skb length. If a list can be
> > created where the first frag_skb is head_frag but a later one is not,
> > it will fail short. I kind of doubt that.
>
> See [1]
>
> So we eventually concluded testing just
> !list_skb->head_frag && skb_headlen(list_skb)
> and not every frag in frag_list.
>
> Maybe Willem can elaborate on that.
Clearly I was wrong :)
If a device has different allocation strategies depending on packet
size, and GRO coalesces those using a list, then indeed this does not
have to hold. GRO requires the same packet size and thus allocation
strategy to coalesce -- except for the last segment.
I don't see any allocation in vmxnet3 that uses a head frag, though.
There is a small packet path (rxDataRingUsed), but both small and
large allocate using a kmalloc-backed skb->data as far as I can tell.
In any case, iterating over all frags is more robust. This is an edge
case, fine to incur the cost there.
* Re: [PATCH net] net: gso: fix panic on frag_list with mixed head alloc types
2022-10-29 4:41 ` Jakub Kicinski
@ 2022-10-31 15:54 ` Jiri Benc
0 siblings, 0 replies; 7+ messages in thread
From: Jiri Benc @ 2022-10-31 15:54 UTC (permalink / raw)
To: Jakub Kicinski
Cc: netdev, Shmulik Ladkani, Eric Dumazet, Tomas Hruby,
Jeremi Piotrowski, alexanderduyck, willemb, Paolo Abeni
On Fri, 28 Oct 2022 21:41:23 -0700, Jakub Kicinski wrote:
> On Thu, 27 Oct 2022 10:20:56 +0200 Jiri Benc wrote:
> > It turns out this assumption does not hold. We've seen BUG_ON being hit
> > in skb_segment when skbs on the frag_list had differing head_frag. That
> > particular case was with vmxnet3; looking at the driver, it indeed uses
> > different skb allocation strategies based on the packet size.
>
> Where are you looking? I'm not seeing it TBH.
Looking at the code again, I think I misread it.
> I don't think the driver is that important, tho, __napi_alloc_skb()
> will select page backing or kmalloc, all by itself.
In this case, it was __netdev_alloc_skb. I'll fix the description.
Thanks!
Jiri
* Re: [PATCH net] net: gso: fix panic on frag_list with mixed head alloc types
2022-10-29 14:10 ` Willem de Bruijn
@ 2022-10-31 16:52 ` Jiri Benc
2022-10-31 21:16 ` Willem de Bruijn
0 siblings, 1 reply; 7+ messages in thread
From: Jiri Benc @ 2022-10-31 16:52 UTC (permalink / raw)
To: Willem de Bruijn
Cc: Shmulik Ladkani, netdev, Eric Dumazet, Tomas Hruby,
Jeremi Piotrowski, alexanderduyck, Jakub Kicinski
On Sat, 29 Oct 2022 10:10:03 -0400, Willem de Bruijn wrote:
> If a device has different allocation strategies depending on packet
> size, and GRO coalesces those using a list, then indeed this does not
> have to hold. GRO requires the same packet size and thus allocation
> strategy to coalesce -- except for the last segment.
That's exactly what I saw: the last segment was different.
However, I don't see anything in the GRO code that enforces that. It
appears that currently, it just usually happens that way. When there's
a burst of packets for the given flow on the wire, only the last
segment is small (and thus malloced) and there's no immediate packet
following for the same flow. What would happen if (for whatever reason)
there was such packet following?
> I don't see any allocation in vmxnet3 that uses a head frag, though.
> There is a small packet path (rxDataRingUsed), but both small and
> large allocate using a kmalloc-backed skb->data as far as I can tell.
I believe the logic is that for rxDataRingUsed,
netdev_alloc_skb_ip_align is called to alloc skb to copy data into,
passing to it the actual packet length. If it's small enough,
__netdev_alloc_skb will kmalloc the data. However, for !rxDataRingUsed,
the skb for dma buffer is allocated with a larger length and
__netdev_alloc_skb will use page_frag_alloc.
I admit I've not spent that much time understanding the logic in the
driver. I was satisfied when the perceived logic matched what I saw in
the kernel memory dump. I may have easily missed something, such as
Jakub's point that it's not actually the driver deciding on the
allocation strategy but rather __netdev_alloc_skb on its own. But the
outcome still seems to be that small packets are kmalloced, while
larger packets are page backed. Am I wrong?
> In any case, iterating over all frags is more robust. This is an edge
> case, fine to incur the cost there.
Thanks! We might get a minor speedup if we check only the last segment;
but first, I'd like to be proven wrong about GRO not enforcing this.
Plus, I wonder whether the speedup would be measurable if we have to
iterate through the list to find the last segment anyway.
Unless there are objections or clarifications (and unless I'm wrong
above), I'll send a v2 with the commit message corrected and with the
same code.
Thanks to all!
Jiri
* Re: [PATCH net] net: gso: fix panic on frag_list with mixed head alloc types
2022-10-31 16:52 ` Jiri Benc
@ 2022-10-31 21:16 ` Willem de Bruijn
0 siblings, 0 replies; 7+ messages in thread
From: Willem de Bruijn @ 2022-10-31 21:16 UTC (permalink / raw)
To: Jiri Benc
Cc: Willem de Bruijn, Shmulik Ladkani, netdev, Eric Dumazet,
Tomas Hruby, Jeremi Piotrowski, alexanderduyck, Jakub Kicinski
On Mon, Oct 31, 2022 at 12:53 PM Jiri Benc <jbenc@redhat.com> wrote:
>
> On Sat, 29 Oct 2022 10:10:03 -0400, Willem de Bruijn wrote:
> > If a device has different allocation strategies depending on packet
> > size, and GRO coalesces those using a list, then indeed this does not
> > have to hold. GRO requires the same packet size and thus allocation
> > strategy to coalesce -- except for the last segment.
>
> That's exactly what I saw: the last segment was different.
>
> However, I don't see anything in the GRO code that enforces that. It
> appears that currently, it just usually happens that way. When there's
> a burst of packets for the given flow on the wire, only the last
> segment is small (and thus malloced) and there's no immediate packet
> following for the same flow. What would happen if (for whatever reason)
> there was such packet following?
This is enforced. In tcp_gro_receive:
	/* If skb is a GRO packet, make sure its gso_size matches prior
	 * packet mss.
	 *
	 * If it is a single frame, do not aggregate it if its length
	 * is bigger than our mss.
	 */
	if (unlikely(skb_is_gso(skb)))
		flush |= (mss != skb_shinfo(skb)->gso_size);
	else
		flush |= (len - 1) >= mss;

	[..]

	/* Force a flush if last segment is smaller than mss. */
	if (unlikely(skb_is_gso(skb)))
		flush = len != NAPI_GRO_CB(skb)->count * skb_shinfo(skb)->gso_size;
	else
		flush = len < mss;
That branch and the comment are very new, introduced with GRO support
for HW-GRO packets. But the else clauses are not new.
>
> > I don't see any allocation in vmxnet3 that uses a head frag, though.
> > There is a small packet path (rxDataRingUsed), but both small and
> > large allocate using a kmalloc-backed skb->data as far as I can tell.
>
> I believe the logic is that for rxDataRingUsed,
> netdev_alloc_skb_ip_align is called to alloc skb to copy data into,
> passing to it the actual packet length. If it's small enough,
> __netdev_alloc_skb will kmalloc the data. However, for !rxDataRingUsed,
> the skb for dma buffer is allocated with a larger length and
> __netdev_alloc_skb will use page_frag_alloc.
That explains perfectly.
> I admit I've not spent that much time understanding the logic in the
> driver. I was satisfied when the perceived logic matched what I saw in
> the kernel memory dump. I may have easily missed something, such as
> Jakub's point that it's not actually the driver deciding on the
> allocation strategy but rather __netdev_alloc_skb on its own. But the
> outcome still seems to be that small packets are kmalloced, while
> larger packets are page backed. Am I wrong?
>
> > In any case, iterating over all frags is more robust. This is an edge
> > case, fine to incur the cost there.
>
> Thanks! We might get a minor speedup if we check only the last segment;
> but first, I'd like to be proven wrong about GRO not enforcing this.
> Plus, I wonder whether the speedup would be measurable if we have to
> iterate through the list to find the last segment anyway.
>
> Unless there are objections or clarifications (and unless I'm wrong
> above), I'll send a v2 with the commit message corrected and with the
> same code.
Sounds good to me, thanks.
>
> Thanks to all!
>
> Jiri
>