* [PATCH net-next 0/2] net: permit skb_segment on head_frag frag_list skb
@ 2018-03-20 5:16 Yonghong Song
2018-03-20 5:16 ` [PATCH net-next 1/2] " Yonghong Song
2018-03-20 5:16 ` [PATCH net-next 2/2] net: bpf: add a test for skb_segment in test_bpf module Yonghong Song
0 siblings, 2 replies; 6+ messages in thread
From: Yonghong Song @ 2018-03-20 5:16 UTC (permalink / raw)
To: edumazet, ast, daniel, diptanu, netdev; +Cc: kernel-team
One of our in-house projects, bpf-based NAT, hits a kernel BUG_ON at
function skb_segment(), line 3667. The bpf program attaches to
clsact ingress, calls bpf_skb_change_proto to change protocol
from ipv4 to ipv6 or from ipv6 to ipv4, and then calls bpf_redirect
to send the changed packet out.
...
3665 while (pos < offset + len) {
3666 if (i >= nfrags) {
3667 BUG_ON(skb_headlen(list_skb));
...
The triggering input skb has the following properties:
list_skb = skb->frag_list;
skb->nfrags != NULL && skb_headlen(list_skb) != 0
and skb_segment() is not able to handle a frag_list skb
if its headlen (list_skb->len - list_skb->data_len) is not 0.
Patch #1 provides a simple solution to avoid BUG_ON. If
list_skb->head_frag is true, its page-backed frag will
be processed before the list_skb->frags.
Patch #2 provides a test case in test_bpf module which
constructs a skb and calls skb_segment() directly. The test
case is able to trigger the BUG_ON without Patch #1.
Yonghong Song (2):
net: permit skb_segment on head_frag frag_list skb
net: bpf: add a test for skb_segment in test_bpf module
lib/test_bpf.c | 71 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
net/core/skbuff.c | 42 ++++++++++++++++++++++----------
2 files changed, 100 insertions(+), 13 deletions(-)
--
2.9.5
^ permalink raw reply [flat|nested] 6+ messages in thread* [PATCH net-next 1/2] net: permit skb_segment on head_frag frag_list skb 2018-03-20 5:16 [PATCH net-next 0/2] net: permit skb_segment on head_frag frag_list skb Yonghong Song @ 2018-03-20 5:16 ` Yonghong Song 2018-03-20 5:30 ` Yuan, Linyu (NSB - CN/Shanghai) 2018-03-20 5:16 ` [PATCH net-next 2/2] net: bpf: add a test for skb_segment in test_bpf module Yonghong Song 1 sibling, 1 reply; 6+ messages in thread From: Yonghong Song @ 2018-03-20 5:16 UTC (permalink / raw) To: edumazet, ast, daniel, diptanu, netdev; +Cc: kernel-team One of our in-house projects, bpf-based NAT, hits a kernel BUG_ON at function skb_segment(), line 3667. The bpf program attaches to clsact ingress, calls bpf_skb_change_proto to change protocol from ipv4 to ipv6 or from ipv6 to ipv4, and then calls bpf_redirect to send the changed packet out. 3472 struct sk_buff *skb_segment(struct sk_buff *head_skb, 3473 netdev_features_t features) 3474 { 3475 struct sk_buff *segs = NULL; 3476 struct sk_buff *tail = NULL; ... 3665 while (pos < offset + len) { 3666 if (i >= nfrags) { 3667 BUG_ON(skb_headlen(list_skb)); 3668 3669 i = 0; 3670 nfrags = skb_shinfo(list_skb)->nr_frags; 3671 frag = skb_shinfo(list_skb)->frags; 3672 frag_skb = list_skb; ... call stack: ... #1 [ffff883ffef03558] __crash_kexec at ffffffff8110c525 #2 [ffff883ffef03620] crash_kexec at ffffffff8110d5cc #3 [ffff883ffef03640] oops_end at ffffffff8101d7e7 #4 [ffff883ffef03668] die at ffffffff8101deb2 #5 [ffff883ffef03698] do_trap at ffffffff8101a700 #6 [ffff883ffef036e8] do_error_trap at ffffffff8101abfe #7 [ffff883ffef037a0] do_invalid_op at ffffffff8101acd0 #8 [ffff883ffef037b0] invalid_op at ffffffff81a00bab [exception RIP: skb_segment+3044] RIP: ffffffff817e4dd4 RSP: ffff883ffef03860 RFLAGS: 00010216 RAX: 0000000000002bf6 RBX: ffff883feb7aaa00 RCX: 0000000000000011 RDX: ffff883fb87910c0 RSI: 0000000000000011 RDI: ffff883feb7ab500 RBP: ffff883ffef03928 R8: 0000000000002ce2 R9: 00000000000027da R10: 000001ea00000000 R11: 0000000000002d82 R12: ffff883f90a1ee80 R13: ffff883fb8791120 R14: ffff883feb7abc00 R15: 0000000000002ce2 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #9 [ffff883ffef03930] tcp_gso_segment at ffffffff818713e7 --- <IRQ stack> --- ... The triggering input skb has the following properties: list_skb = skb->frag_list; skb->nfrags != NULL && skb_headlen(list_skb) != 0 and skb_segment() is not able to handle a frag_list skb if its headlen (list_skb->len - list_skb->data_len) is not 0. This patch addressed the issue by handling skb_headlen(list_skb) != 0 case properly if list_skb->head_frag is true, which is expected in most cases. A one-element frag array is created for the list_skb head and processed before list_skb->frags are processed. Reported-by: Diptanu Gon Choudhury <diptanu@fb.com> Signed-off-by: Yonghong Song <yhs@fb.com> --- net/core/skbuff.c | 42 ++++++++++++++++++++++++++++++------------ 1 file changed, 30 insertions(+), 12 deletions(-) diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 715c134..983f62a 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -3475,9 +3475,10 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb, struct sk_buff *segs = NULL; struct sk_buff *tail = NULL; struct sk_buff *list_skb = skb_shinfo(head_skb)->frag_list; - skb_frag_t *frag = skb_shinfo(head_skb)->frags; + skb_frag_t *frag = skb_shinfo(head_skb)->frags, head_frag; unsigned int mss = skb_shinfo(head_skb)->gso_size; unsigned int doffset = head_skb->data - skb_mac_header(head_skb); + struct sk_buff *check_list_skb = list_skb; struct sk_buff *frag_skb = head_skb; unsigned int offset = doffset; unsigned int tnl_hlen = skb_tnl_header_len(head_skb); @@ -3590,6 +3591,7 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb, nskb = skb_clone(list_skb, GFP_ATOMIC); list_skb = list_skb->next; + check_list_skb = list_skb; if (unlikely(!nskb)) goto err; @@ -3664,21 +3666,37 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb, while (pos < offset + len) { if (i >= nfrags) { - BUG_ON(skb_headlen(list_skb)); + if (skb_headlen(list_skb) && check_list_skb == list_skb) { + struct page *page; + + BUG_ON(!list_skb->head_frag); + + i = 0; + nfrags = 1; + page = virt_to_head_page(list_skb->head); + head_frag.page.p = page; + head_frag.page_offset = list_skb->data - + (unsigned char *)page_address(page); + head_frag.size = skb_headlen(list_skb); + frag = &head_frag; + check_list_skb = list_skb->next; + } else { + BUG_ON(skb_headlen(list_skb) && check_list_skb == list_skb); - i = 0; - nfrags = skb_shinfo(list_skb)->nr_frags; - frag = skb_shinfo(list_skb)->frags; - frag_skb = list_skb; + i = 0; + nfrags = skb_shinfo(list_skb)->nr_frags; + frag = skb_shinfo(list_skb)->frags; + frag_skb = list_skb; - BUG_ON(!nfrags); + BUG_ON(!nfrags); - if (skb_orphan_frags(frag_skb, GFP_ATOMIC) || - skb_zerocopy_clone(nskb, frag_skb, - GFP_ATOMIC)) - goto err; + if (skb_orphan_frags(frag_skb, GFP_ATOMIC) || + skb_zerocopy_clone(nskb, frag_skb, GFP_ATOMIC)) + goto err; - list_skb = list_skb->next; + list_skb = list_skb->next; + check_list_skb = list_skb; + } } if (unlikely(skb_shinfo(nskb)->nr_frags >= -- 2.9.5 ^ permalink raw reply related [flat|nested] 6+ messages in thread
* RE: [PATCH net-next 1/2] net: permit skb_segment on head_frag frag_list skb 2018-03-20 5:16 ` [PATCH net-next 1/2] " Yonghong Song @ 2018-03-20 5:30 ` Yuan, Linyu (NSB - CN/Shanghai) 2018-03-20 5:54 ` Yonghong Song 0 siblings, 1 reply; 6+ messages in thread From: Yuan, Linyu (NSB - CN/Shanghai) @ 2018-03-20 5:30 UTC (permalink / raw) To: Yonghong Song, edumazet@google.com, ast@fb.com, daniel@iogearbox.net, diptanu@fb.com, netdev@vger.kernel.org Cc: kernel-team@fb.com > -----Original Message----- > From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org] > On Behalf Of Yonghong Song > Sent: Tuesday, March 20, 2018 1:16 PM > To: edumazet@google.com; ast@fb.com; daniel@iogearbox.net; > diptanu@fb.com; netdev@vger.kernel.org > Cc: kernel-team@fb.com > Subject: [PATCH net-next 1/2] net: permit skb_segment on head_frag frag_list > skb > > > while (pos < offset + len) { > if (i >= nfrags) { > - BUG_ON(skb_headlen(list_skb)); > + if (skb_headlen(list_skb) && check_list_skb == list_skb) { Here cause next BUG_ON always false. > + } else { > + BUG_ON(skb_headlen(list_skb) && check_list_skb == > list_skb); Just according code logic, no need BUG_ON, right? > > - i = 0; > - nfrags = skb_shinfo(list_skb)->nr_frags; > - frag = skb_shinfo(list_skb)->frags; > - frag_skb = list_skb; > + i = 0; > + nfrags = skb_shinfo(list_skb)->nr_frags; > + frag = skb_shinfo(list_skb)->frags; > + frag_skb = list_skb; > > - BUG_ON(!nfrags); > + BUG_ON(!nfrags); > > - if (skb_orphan_frags(frag_skb, GFP_ATOMIC) || > - skb_zerocopy_clone(nskb, frag_skb, > - GFP_ATOMIC)) > - goto err; > + if (skb_orphan_frags(frag_skb, GFP_ATOMIC) || > + skb_zerocopy_clone(nskb, frag_skb, > GFP_ATOMIC)) > + goto err; > > - list_skb = list_skb->next; > + list_skb = list_skb->next; > + check_list_skb = list_skb; > + } > } > > if (unlikely(skb_shinfo(nskb)->nr_frags >= > -- > 2.9.5 ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH net-next 1/2] net: permit skb_segment on head_frag frag_list skb 2018-03-20 5:30 ` Yuan, Linyu (NSB - CN/Shanghai) @ 2018-03-20 5:54 ` Yonghong Song 2018-03-20 6:11 ` Yuan, Linyu (NSB - CN/Shanghai) 0 siblings, 1 reply; 6+ messages in thread From: Yonghong Song @ 2018-03-20 5:54 UTC (permalink / raw) To: Yuan, Linyu (NSB - CN/Shanghai), edumazet@google.com, ast@fb.com, daniel@iogearbox.net, diptanu@fb.com, netdev@vger.kernel.org Cc: kernel-team@fb.com On 3/19/18 10:30 PM, Yuan, Linyu (NSB - CN/Shanghai) wrote: > > >> -----Original Message----- >> From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org] >> On Behalf Of Yonghong Song >> Sent: Tuesday, March 20, 2018 1:16 PM >> To: edumazet@google.com; ast@fb.com; daniel@iogearbox.net; >> diptanu@fb.com; netdev@vger.kernel.org >> Cc: kernel-team@fb.com >> Subject: [PATCH net-next 1/2] net: permit skb_segment on head_frag frag_list >> skb >> >> >> while (pos < offset + len) { >> if (i >= nfrags) { >> - BUG_ON(skb_headlen(list_skb)); >> + if (skb_headlen(list_skb) && check_list_skb == list_skb) { > Here cause next BUG_ON always false. The idea is since in this branch, we did not do list_skb = list_skb->next. So we update check_list_skb. Next time, when the control reaches here, list_skb may still be the same, but check_list_skb is not, so we proceed to process list_skb->frags in the else branch. In the else branch, we have list_skb = list_skb->next; check_list_skb = list_skb; So when the current frags are processed and ready for the list_skb. list_skb will be equal to check_list_skb and it will be processed again. It is a little bit convoluted. Please let me know you have better idea. >> + } else { >> + BUG_ON(skb_headlen(list_skb) && check_list_skb == >> list_skb); > Just according code logic, no need BUG_ON, right? Oh, yes, we do not need this. Will remove in the next version. >> >> - i = 0; >> - nfrags = skb_shinfo(list_skb)->nr_frags; >> - frag = skb_shinfo(list_skb)->frags; >> - frag_skb = list_skb; >> + i = 0; >> + nfrags = skb_shinfo(list_skb)->nr_frags; >> + frag = skb_shinfo(list_skb)->frags; >> + frag_skb = list_skb; >> >> - BUG_ON(!nfrags); >> + BUG_ON(!nfrags); >> >> - if (skb_orphan_frags(frag_skb, GFP_ATOMIC) || >> - skb_zerocopy_clone(nskb, frag_skb, >> - GFP_ATOMIC)) >> - goto err; >> + if (skb_orphan_frags(frag_skb, GFP_ATOMIC) || >> + skb_zerocopy_clone(nskb, frag_skb, >> GFP_ATOMIC)) >> + goto err; >> >> - list_skb = list_skb->next; >> + list_skb = list_skb->next; >> + check_list_skb = list_skb; >> + } >> } >> >> if (unlikely(skb_shinfo(nskb)->nr_frags >= >> -- >> 2.9.5 > ^ permalink raw reply [flat|nested] 6+ messages in thread
* RE: [PATCH net-next 1/2] net: permit skb_segment on head_frag frag_list skb 2018-03-20 5:54 ` Yonghong Song @ 2018-03-20 6:11 ` Yuan, Linyu (NSB - CN/Shanghai) 0 siblings, 0 replies; 6+ messages in thread From: Yuan, Linyu (NSB - CN/Shanghai) @ 2018-03-20 6:11 UTC (permalink / raw) To: Yonghong Song, edumazet@google.com, ast@fb.com, daniel@iogearbox.net, diptanu@fb.com, netdev@vger.kernel.org Cc: kernel-team@fb.com Sorry, I should not add "Here cause next BUG_ON always false." It cause misunderstanding, I just comment on BUG_ON in else branch. > -----Original Message----- > From: Yonghong Song [mailto:yhs@fb.com] > Sent: Tuesday, March 20, 2018 1:54 PM > To: Yuan, Linyu (NSB - CN/Shanghai); edumazet@google.com; ast@fb.com; > daniel@iogearbox.net; diptanu@fb.com; netdev@vger.kernel.org > Cc: kernel-team@fb.com > Subject: Re: [PATCH net-next 1/2] net: permit skb_segment on head_frag > frag_list skb > > > > On 3/19/18 10:30 PM, Yuan, Linyu (NSB - CN/Shanghai) wrote: > > > > > >> -----Original Message----- > >> From: netdev-owner@vger.kernel.org > [mailto:netdev-owner@vger.kernel.org] > >> On Behalf Of Yonghong Song > >> Sent: Tuesday, March 20, 2018 1:16 PM > >> To: edumazet@google.com; ast@fb.com; daniel@iogearbox.net; > >> diptanu@fb.com; netdev@vger.kernel.org > >> Cc: kernel-team@fb.com > >> Subject: [PATCH net-next 1/2] net: permit skb_segment on head_frag > frag_list > >> skb > >> > >> > >> while (pos < offset + len) { > >> if (i >= nfrags) { > >> - BUG_ON(skb_headlen(list_skb)); > >> + if (skb_headlen(list_skb) && check_list_skb == list_skb) { > > Here cause next BUG_ON always false. > > The idea is since in this branch, we did not do list_skb = > list_skb->next. So we update check_list_skb. Next time, when the > control reaches here, list_skb may still be the same, but check_list_skb > is not, so we proceed to process list_skb->frags in the else branch. > > In the else branch, we have > list_skb = list_skb->next; > check_list_skb = list_skb; > > So when the current frags are processed and ready for the list_skb. > list_skb will be equal to check_list_skb and it will be processed again. > > It is a little bit convoluted. Please let me know you have better idea. > > >> + } else { > >> + BUG_ON(skb_headlen(list_skb) && check_list_skb == > >> list_skb); > > Just according code logic, no need BUG_ON, right? > > Oh, yes, we do not need this. Will remove in the next version. > > >> > >> - i = 0; > >> - nfrags = skb_shinfo(list_skb)->nr_frags; > >> - frag = skb_shinfo(list_skb)->frags; > >> - frag_skb = list_skb; > >> + i = 0; > >> + nfrags = skb_shinfo(list_skb)->nr_frags; > >> + frag = skb_shinfo(list_skb)->frags; > >> + frag_skb = list_skb; > >> > >> - BUG_ON(!nfrags); > >> + BUG_ON(!nfrags); > >> > >> - if (skb_orphan_frags(frag_skb, GFP_ATOMIC) || > >> - skb_zerocopy_clone(nskb, frag_skb, > >> - GFP_ATOMIC)) > >> - goto err; > >> + if (skb_orphan_frags(frag_skb, GFP_ATOMIC) || > >> + skb_zerocopy_clone(nskb, frag_skb, > >> GFP_ATOMIC)) > >> + goto err; > >> > >> - list_skb = list_skb->next; > >> + list_skb = list_skb->next; > >> + check_list_skb = list_skb; > >> + } > >> } > >> > >> if (unlikely(skb_shinfo(nskb)->nr_frags >= > >> -- > >> 2.9.5 > > ^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH net-next 2/2] net: bpf: add a test for skb_segment in test_bpf module 2018-03-20 5:16 [PATCH net-next 0/2] net: permit skb_segment on head_frag frag_list skb Yonghong Song 2018-03-20 5:16 ` [PATCH net-next 1/2] " Yonghong Song @ 2018-03-20 5:16 ` Yonghong Song 1 sibling, 0 replies; 6+ messages in thread From: Yonghong Song @ 2018-03-20 5:16 UTC (permalink / raw) To: edumazet, ast, daniel, diptanu, netdev; +Cc: kernel-team Without the previous commit, "modprobe test_bpf" will have the following errors: ... [ 98.149165] ------------[ cut here ]------------ [ 98.159362] kernel BUG at net/core/skbuff.c:3667! [ 98.169756] invalid opcode: 0000 [#1] SMP PTI [ 98.179370] Modules linked in: [ 98.179371] test_bpf(+) ... which triggers the bug the previous commit intends to fix. The skbs are constructed to mimic what mlx5 may generate. The packet size/header may not mimic real cases in production. But the processing flow is similar. Signed-off-by: Yonghong Song <yhs@fb.com> --- lib/test_bpf.c | 71 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 70 insertions(+), 1 deletion(-) diff --git a/lib/test_bpf.c b/lib/test_bpf.c index 2efb213..045d7d3 100644 --- a/lib/test_bpf.c +++ b/lib/test_bpf.c @@ -6574,6 +6574,72 @@ static bool exclude_test(int test_id) return test_id < test_range[0] || test_id > test_range[1]; } +static struct sk_buff *build_test_skb(void *page) +{ + u32 headroom = NET_SKB_PAD + NET_IP_ALIGN + ETH_HLEN; + struct sk_buff *skb[2]; + int i, data_size = 8; + + for (i = 0; i < 2; i++) { + /* this will set skb[i]->head_frag */ + skb[i] = build_skb(page, headroom); + if (!skb[i]) + return NULL; + + skb_reserve(skb[i], headroom); + skb_put(skb[i], data_size); + skb[i]->protocol = htons(ETH_P_IP); + skb_reset_network_header(skb[i]); + skb_set_mac_header(skb[i], -ETH_HLEN); + + skb_add_rx_frag(skb[i], skb_shinfo(skb[i])->nr_frags, + page, 0, 64, 64); + // skb: skb_headlen(skb[i]): 8, skb[i]->head_frag = 1 + } + + /* setup shinfo */ + skb_shinfo(skb[0])->gso_size = 1448; + skb_shinfo(skb[0])->gso_type = SKB_GSO_TCPV4; + skb_shinfo(skb[0])->gso_type |= SKB_GSO_DODGY; + skb_shinfo(skb[0])->gso_segs = 0; + skb_shinfo(skb[0])->frag_list = skb[1]; + + /* adjust skb[0]'s len */ + skb[0]->len += skb[1]->len; + skb[0]->data_len += skb[1]->data_len; + skb[0]->truesize += skb[1]->truesize; + + return skb[0]; +} + +static __init int test_skb_segment(void) +{ + netdev_features_t features; + struct sk_buff *skb; + void *page; + int ret = -1; + + page = (void *)__get_free_page(GFP_KERNEL | __GFP_ZERO); + if (!page) { + pr_info("%s: failed to get_free_page!", __func__); + return ret; + } + + features = NETIF_F_SG | NETIF_F_GSO_PARTIAL | NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM; + features |= NETIF_F_RXCSUM; + skb = build_test_skb(page); + if (!skb) { + pr_info("%s: failed to build_test_skb", __func__); + } else if (skb_segment(skb, features)) { + ret = 0; + pr_info("%s: success in skb_segment!", __func__); + } else { + pr_info("%s: failed in skb_segment!", __func__); + } + free_page((unsigned long)page); + return ret; +} + static __init int test_bpf(void) { int i, err_cnt = 0, pass_cnt = 0; @@ -6632,8 +6698,11 @@ static int __init test_bpf_init(void) return ret; ret = test_bpf(); - destroy_bpf_tests(); + if (ret) + return ret; + + ret = test_skb_segment(); return ret; } -- 2.9.5 ^ permalink raw reply related [flat|nested] 6+ messages in thread
end of thread, other threads:[~2018-03-20 6:11 UTC | newest] Thread overview: 6+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2018-03-20 5:16 [PATCH net-next 0/2] net: permit skb_segment on head_frag frag_list skb Yonghong Song 2018-03-20 5:16 ` [PATCH net-next 1/2] " Yonghong Song 2018-03-20 5:30 ` Yuan, Linyu (NSB - CN/Shanghai) 2018-03-20 5:54 ` Yonghong Song 2018-03-20 6:11 ` Yuan, Linyu (NSB - CN/Shanghai) 2018-03-20 5:16 ` [PATCH net-next 2/2] net: bpf: add a test for skb_segment in test_bpf module Yonghong Song
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).