* [PATCH net-next v3 0/2] net: permit skb_segment on head_frag frag_list skb
From: Yonghong Song @ 2018-03-20 23:21 UTC (permalink / raw)
To: edumazet, ast, daniel, diptanu, netdev, alexander.duyck; +Cc: kernel-team
One of our in-house projects, bpf-based NAT, hits a kernel BUG_ON at
function skb_segment(), line 3667. The bpf program attaches to
clsact ingress, calls bpf_skb_change_proto to change protocol
from ipv4 to ipv6 or from ipv6 to ipv4, and then calls bpf_redirect
to send the changed packet out.
...
3665 while (pos < offset + len) {
3666 if (i >= nfrags) {
3667 BUG_ON(skb_headlen(list_skb));
...
The triggering input skb has the following properties:
list_skb = skb->frag_list;
skb->nfrags != NULL && skb_headlen(list_skb) != 0
and skb_segment() is not able to handle a frag_list skb
if its headlen (list_skb->len - list_skb->data_len) is not 0.
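For readers who want the headlen condition spelled out, here is a minimal
user-space sketch. The toy_skb type and the toy_* helpers are hypothetical
stand-ins for illustration only, not kernel code; only the subtraction
(len - data_len) is taken from the description above.

```c
#include <assert.h>

/* Toy stand-in for the two length fields of an skb (not the kernel type). */
struct toy_skb {
	unsigned int len;      /* total bytes: linear head plus paged frags */
	unsigned int data_len; /* bytes held in paged frags only */
};

/* Same arithmetic as the headlen described above: bytes in the linear head. */
static unsigned int toy_headlen(const struct toy_skb *skb)
{
	assert(skb->data_len <= skb->len);
	return skb->len - skb->data_len;
}

/* Nonzero when a check of the form BUG_ON(headlen) would fire. */
static int toy_would_hit_bug(const struct toy_skb *list_skb)
{
	return toy_headlen(list_skb) != 0;
}
```

A frag_list member with len == data_len is fully paged and passes the old
check; any member that carries bytes in its linear head does not.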
Patch #1 provides a simple solution to avoid BUG_ON. If
list_skb->head_frag is true, its page-backed frag will
be processed before the list_skb->frags.
Patch #2 provides a test case in test_bpf module which
constructs a skb and calls skb_segment() directly. The test
case is able to trigger the BUG_ON without Patch #1.
The patch has been tested in the following setup:
ipv6_host <-> nat_server <-> ipv4_host
where nat_server has a bpf program doing ipv4<->ipv6
translation and forwarding through clsact hook
bpf_skb_change_proto.
Changelog:
v2 -> v3:
. Use starting frag index -1 (instead of 0) to
special process head_frag before other frags in the skb,
suggested by Alexander Duyck.
v1 -> v2:
. Removed never-hit BUG_ON, spotted by Linyu Yuan.
Yonghong Song (2):
net: permit skb_segment on head_frag frag_list skb
net: bpf: add a test for skb_segment in test_bpf module
lib/test_bpf.c | 71 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
net/core/skbuff.c | 51 ++++++++++++++++++++++++++++-----------
2 files changed, 107 insertions(+), 15 deletions(-)
--
2.9.5
* [PATCH net-next v3 1/2] net: permit skb_segment on head_frag frag_list skb
From: Yonghong Song @ 2018-03-20 23:21 UTC (permalink / raw)
To: edumazet, ast, daniel, diptanu, netdev, alexander.duyck; +Cc: kernel-team
In-Reply-To: <20180320232156.3455738-1-yhs@fb.com>
One of our in-house projects, bpf-based NAT, hits a kernel BUG_ON at
function skb_segment(), line 3667. The bpf program attaches to
clsact ingress, calls bpf_skb_change_proto to change protocol
from ipv4 to ipv6 or from ipv6 to ipv4, and then calls bpf_redirect
to send the changed packet out.
3472 struct sk_buff *skb_segment(struct sk_buff *head_skb,
3473 netdev_features_t features)
3474 {
3475 struct sk_buff *segs = NULL;
3476 struct sk_buff *tail = NULL;
...
3665 while (pos < offset + len) {
3666 if (i >= nfrags) {
3667 BUG_ON(skb_headlen(list_skb));
3668
3669 i = 0;
3670 nfrags = skb_shinfo(list_skb)->nr_frags;
3671 frag = skb_shinfo(list_skb)->frags;
3672 frag_skb = list_skb;
...
call stack:
...
#1 [ffff883ffef03558] __crash_kexec at ffffffff8110c525
#2 [ffff883ffef03620] crash_kexec at ffffffff8110d5cc
#3 [ffff883ffef03640] oops_end at ffffffff8101d7e7
#4 [ffff883ffef03668] die at ffffffff8101deb2
#5 [ffff883ffef03698] do_trap at ffffffff8101a700
#6 [ffff883ffef036e8] do_error_trap at ffffffff8101abfe
#7 [ffff883ffef037a0] do_invalid_op at ffffffff8101acd0
#8 [ffff883ffef037b0] invalid_op at ffffffff81a00bab
[exception RIP: skb_segment+3044]
RIP: ffffffff817e4dd4 RSP: ffff883ffef03860 RFLAGS: 00010216
RAX: 0000000000002bf6 RBX: ffff883feb7aaa00 RCX: 0000000000000011
RDX: ffff883fb87910c0 RSI: 0000000000000011 RDI: ffff883feb7ab500
RBP: ffff883ffef03928 R8: 0000000000002ce2 R9: 00000000000027da
R10: 000001ea00000000 R11: 0000000000002d82 R12: ffff883f90a1ee80
R13: ffff883fb8791120 R14: ffff883feb7abc00 R15: 0000000000002ce2
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#9 [ffff883ffef03930] tcp_gso_segment at ffffffff818713e7
--- <IRQ stack> ---
...
The triggering input skb has the following properties:
list_skb = skb->frag_list;
skb->nfrags != NULL && skb_headlen(list_skb) != 0
and skb_segment() is not able to handle a frag_list skb
if its headlen (list_skb->len - list_skb->data_len) is not 0.
This patch addresses the issue by handling the skb_headlen(list_skb) != 0
case when list_skb->head_frag is true, which is expected in most cases.
The head frag is processed before list_skb->frags.
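The ordering can be modelled in user space as follows. This is only a
sketch of the index convention (i == -1 selects the head frag, i >= 0
selects the regular frags); toy_frag and toy_walk are hypothetical names,
and the real loop also handles page references and segment boundaries,
which are omitted here.

```c
#include <assert.h>
#include <stddef.h>

/* Toy fragment descriptor: only the size matters for this model. */
struct toy_frag {
	unsigned int size;
};

/*
 * Visit the pieces of one frag_list member in order and record their sizes
 * in out[]. If headlen is nonzero, the head is visited first under index -1.
 * Returns the number of pieces visited (at most cap).
 */
static size_t toy_walk(unsigned int headlen, const struct toy_frag *frags,
		       int nfrags, unsigned int *out, size_t cap)
{
	struct toy_frag head_frag = { .size = headlen };
	const struct toy_frag *frag = frags;
	size_t n = 0;
	int i = headlen ? -1 : 0;

	assert(nfrags >= 0);

	while (i < nfrags && n < cap) {
		const struct toy_frag *cur = (i == -1) ? &head_frag : frag;

		out[n++] = cur->size;
		i++;
		/* Going from -1 to 0 must not skip frags[0]. */
		if (i != 0)
			frag++;
	}
	return n;
}
```

The "if (i != 0)" guard is the part that is easy to get wrong: after the
head frag is consumed, i becomes 0 and frag must still point at the first
regular frag.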
Reported-by: Diptanu Gon Choudhury <diptanu@fb.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
---
net/core/skbuff.c | 51 +++++++++++++++++++++++++++++++++++++--------------
1 file changed, 37 insertions(+), 14 deletions(-)
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 715c134..59bbc06 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -3475,7 +3475,7 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
struct sk_buff *segs = NULL;
struct sk_buff *tail = NULL;
struct sk_buff *list_skb = skb_shinfo(head_skb)->frag_list;
- skb_frag_t *frag = skb_shinfo(head_skb)->frags;
+ skb_frag_t *frag = skb_shinfo(head_skb)->frags, *head_frag = NULL;
unsigned int mss = skb_shinfo(head_skb)->gso_size;
unsigned int doffset = head_skb->data - skb_mac_header(head_skb);
struct sk_buff *frag_skb = head_skb;
@@ -3664,19 +3664,39 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
while (pos < offset + len) {
if (i >= nfrags) {
- BUG_ON(skb_headlen(list_skb));
-
i = 0;
+ if (skb_headlen(list_skb)) {
+ struct page *page;
+
+ BUG_ON(!list_skb->head_frag);
+
+ page = virt_to_head_page(list_skb->head);
+ if (!head_frag) {
+ head_frag = kmalloc(sizeof(skb_frag_t),
+ GFP_KERNEL);
+ if (!head_frag)
+ goto err;
+ }
+ head_frag->page.p = page;
+ head_frag->page_offset = list_skb->data -
+ (unsigned char *)page_address(page);
+ head_frag->size = skb_headlen(list_skb);
+ /* set i = -1 so we will pick head_frag
+ * instead of skb_shinfo(list_skb)->frags
+ * when i == -1.
+ */
+ i = -1;
+ }
nfrags = skb_shinfo(list_skb)->nr_frags;
- frag = skb_shinfo(list_skb)->frags;
- frag_skb = list_skb;
-
- BUG_ON(!nfrags);
-
- if (skb_orphan_frags(frag_skb, GFP_ATOMIC) ||
- skb_zerocopy_clone(nskb, frag_skb,
- GFP_ATOMIC))
- goto err;
+ if (nfrags) {
+ frag = skb_shinfo(list_skb)->frags;
+ frag_skb = list_skb;
+
+ if (skb_orphan_frags(frag_skb, GFP_ATOMIC) ||
+ skb_zerocopy_clone(nskb, frag_skb,
+ GFP_ATOMIC))
+ goto err;
+ }
list_skb = list_skb->next;
}
@@ -3689,7 +3709,7 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
goto err;
}
- *nskb_frag = *frag;
+ *nskb_frag = (i == -1) ? *head_frag : *frag;
__skb_frag_ref(nskb_frag);
size = skb_frag_size(nskb_frag);
@@ -3702,7 +3722,8 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
if (pos + size <= offset + len) {
i++;
- frag++;
+ if (i != 0)
+ frag++;
pos += size;
} else {
skb_frag_size_sub(nskb_frag, pos + size - (offset + len));
@@ -3774,10 +3795,12 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
swap(tail->destructor, head_skb->destructor);
swap(tail->sk, head_skb->sk);
}
+ kfree(head_frag);
return segs;
err:
kfree_skb_list(segs);
+ kfree(head_frag);
return ERR_PTR(err);
}
EXPORT_SYMBOL_GPL(skb_segment);
--
2.9.5
* Re: [PATCH bpf-next] bpf: skip unnecessary capability check
From: Daniel Borkmann @ 2018-03-20 22:55 UTC (permalink / raw)
To: Lorenzo Colitti, Chenbo Feng
Cc: netdev, ast, Jeffrey Vander Stoep, Chenbo Feng
In-Reply-To: <CAKD1Yr2SLoQ_HzYouchVH6J6TU2JYf_wAOh5DawUARW4W_O82w@mail.gmail.com>
On 03/20/2018 12:37 PM, Lorenzo Colitti wrote:
> On Tue, Mar 20, 2018 at 12:57 AM, Chenbo Feng
> <chenbofeng.kernel@gmail.com> wrote:
>> - if (!capable(CAP_SYS_ADMIN) && sysctl_unprivileged_bpf_disabled)
>> + if (sysctl_unprivileged_bpf_disabled && !capable(CAP_SYS_ADMIN))
>> return -EPERM;
>>
>
> Acked-by: Lorenzo Colitti <lorenzo@google.com>
>
> Should this be targeted to bpf (or even -stable) instead of bpf-next?
Ok, I've applied to bpf tree, thanks guys!
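For context, the two conditions quoted above return the same result; what
changes is whether capable() runs at all when the sysctl is off. A minimal
user-space sketch of that short-circuit behaviour follows. toy_capable,
check_old and check_new are hypothetical names, and the call counter is an
assumption standing in for whatever side effects a real capability check
may have (for example, security hooks or audit records).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

static int capable_calls; /* how often the capability check actually ran */

/* Stand-in for capable(CAP_SYS_ADMIN); counts every invocation. */
static bool toy_capable(bool has_cap)
{
	capable_calls++;
	return has_cap;
}

/* Old order: the capability check always runs first. */
static int check_old(bool unpriv_disabled, bool has_cap)
{
	if (!toy_capable(has_cap) && unpriv_disabled)
		return -EPERM;
	return 0;
}

/* New order: the capability check runs only if the sysctl is set. */
static int check_new(bool unpriv_disabled, bool has_cap)
{
	if (unpriv_disabled && !toy_capable(has_cap))
		return -EPERM;
	return 0;
}

static void toy_check_demo(void)
{
	capable_calls = 0;
	assert(check_old(false, false) == 0);
	assert(capable_calls == 1);	/* check ran although it was not needed */

	capable_calls = 0;
	assert(check_new(false, false) == 0);
	assert(capable_calls == 0);	/* check skipped by short-circuit */
}
```

With the sysctl set, both orderings still call the check and deny an
unprivileged caller, so the visible permission semantics do not change.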
* Re: [PATCH net-next v2 1/2] net: permit skb_segment on head_frag frag_list skb
From: Yonghong Song @ 2018-03-20 22:53 UTC (permalink / raw)
To: Alexander Duyck
Cc: Eric Dumazet, ast, Daniel Borkmann, diptanu, Netdev, Kernel Team
In-Reply-To: <CAKgT0Ue3am2MtFcqEpOsLWwEtg_Whh3eV4OoMM_a3ES82s1Scg@mail.gmail.com>
On 3/20/18 11:08 AM, Alexander Duyck wrote:
> On Tue, Mar 20, 2018 at 8:55 AM, Yonghong Song <yhs@fb.com> wrote:
>> One of our in-house projects, bpf-based NAT, hits a kernel BUG_ON at
>> function skb_segment(), line 3667. The bpf program attaches to
>> clsact ingress, calls bpf_skb_change_proto to change protocol
>> from ipv4 to ipv6 or from ipv6 to ipv4, and then calls bpf_redirect
>> to send the changed packet out.
>>
>> 3472 struct sk_buff *skb_segment(struct sk_buff *head_skb,
>> 3473 netdev_features_t features)
>> 3474 {
>> 3475 struct sk_buff *segs = NULL;
>> 3476 struct sk_buff *tail = NULL;
>> ...
>> 3665 while (pos < offset + len) {
>> 3666 if (i >= nfrags) {
>> 3667 BUG_ON(skb_headlen(list_skb));
>> 3668
>> 3669 i = 0;
>> 3670 nfrags = skb_shinfo(list_skb)->nr_frags;
>> 3671 frag = skb_shinfo(list_skb)->frags;
>> 3672 frag_skb = list_skb;
>> ...
>>
>> call stack:
>> ...
>> #1 [ffff883ffef03558] __crash_kexec at ffffffff8110c525
>> #2 [ffff883ffef03620] crash_kexec at ffffffff8110d5cc
>> #3 [ffff883ffef03640] oops_end at ffffffff8101d7e7
>> #4 [ffff883ffef03668] die at ffffffff8101deb2
>> #5 [ffff883ffef03698] do_trap at ffffffff8101a700
>> #6 [ffff883ffef036e8] do_error_trap at ffffffff8101abfe
>> #7 [ffff883ffef037a0] do_invalid_op at ffffffff8101acd0
>> #8 [ffff883ffef037b0] invalid_op at ffffffff81a00bab
>> [exception RIP: skb_segment+3044]
>> RIP: ffffffff817e4dd4 RSP: ffff883ffef03860 RFLAGS: 00010216
>> RAX: 0000000000002bf6 RBX: ffff883feb7aaa00 RCX: 0000000000000011
>> RDX: ffff883fb87910c0 RSI: 0000000000000011 RDI: ffff883feb7ab500
>> RBP: ffff883ffef03928 R8: 0000000000002ce2 R9: 00000000000027da
>> R10: 000001ea00000000 R11: 0000000000002d82 R12: ffff883f90a1ee80
>> R13: ffff883fb8791120 R14: ffff883feb7abc00 R15: 0000000000002ce2
>> ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
>> #9 [ffff883ffef03930] tcp_gso_segment at ffffffff818713e7
>> --- <IRQ stack> ---
>> ...
>>
>> The triggering input skb has the following properties:
>> list_skb = skb->frag_list;
>> skb->nfrags != NULL && skb_headlen(list_skb) != 0
>> and skb_segment() is not able to handle a frag_list skb
>> if its headlen (list_skb->len - list_skb->data_len) is not 0.
>>
>> This patch addressed the issue by handling skb_headlen(list_skb) != 0
>> case properly if list_skb->head_frag is true, which is expected in
>> most cases. A one-element frag array is created for the list_skb head
>> and processed before list_skb->frags are processed.
>>
>> Reported-by: Diptanu Gon Choudhury <diptanu@fb.com>
>> Signed-off-by: Yonghong Song <yhs@fb.com>
>> ---
>> net/core/skbuff.c | 42 +++++++++++++++++++++++++++++-------------
>> 1 file changed, 29 insertions(+), 13 deletions(-)
>>
>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
>> index 715c134..0ad4cda 100644
>> --- a/net/core/skbuff.c
>> +++ b/net/core/skbuff.c
>> @@ -3475,9 +3475,10 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
>> struct sk_buff *segs = NULL;
>> struct sk_buff *tail = NULL;
>> struct sk_buff *list_skb = skb_shinfo(head_skb)->frag_list;
>> - skb_frag_t *frag = skb_shinfo(head_skb)->frags;
>> + skb_frag_t *frag = skb_shinfo(head_skb)->frags, head_frag;
>
> I would move head_frag down into the while loop. No point in making it
> global to this function and eating up the extra stack space if you
> don't need it.
We need head_frag declared outside the main segmentation loop.
Its value may survive across different main loop iterations.
However, I see your point to save the stack space.
Will use a pointer here and do allocation on demand instead.
>
>> unsigned int mss = skb_shinfo(head_skb)->gso_size;
>> unsigned int doffset = head_skb->data - skb_mac_header(head_skb);
>> + struct sk_buff *check_list_skb = list_skb;
>
> This seems like a waste of a pointer. You can probably just repurpose
> nfrags to achieve the same purpose you are achieving below. Note that
> nfrags is a signed int and only needing to store up to only 18 or so
> total frags so we can probably just reserve a nfrags value of -1 to
> indicate that we are using the header frag.
Just did some prototyping. Indeed, we could reserve the i = -1 to
indicate the special head_frag.
>
>> struct sk_buff *frag_skb = head_skb;
>> unsigned int offset = doffset;
>> unsigned int tnl_hlen = skb_tnl_header_len(head_skb);
>> @@ -3590,6 +3591,7 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
>>
>> nskb = skb_clone(list_skb, GFP_ATOMIC);
>> list_skb = list_skb->next;
>> + check_list_skb = list_skb;
>>
>> if (unlikely(!nskb))
>> goto err;
>
> If my understanding is correct then this is unneeded if you just
> change how you use i and nfrags.
Right.
>
>> @@ -3664,21 +3666,35 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
>>
>> while (pos < offset + len) {
>
> You could probably just declare here since this would be more local to
> where it is actually used and the fact is it can be overwritten with
> each iteration of the loop anyway so there is no need to reserve the
> space before you get here.
The actual space can be allocated once and reused later. But the space
needs to be preserved across main segmentation loop.
>
>> if (i >= nfrags) {
>> - BUG_ON(skb_headlen(list_skb));
>> -
>> - i = 0;
>> - nfrags = skb_shinfo(list_skb)->nr_frags;
>> - frag = skb_shinfo(list_skb)->frags;
>> - frag_skb = list_skb;
>> + if (skb_headlen(list_skb) && check_list_skb == list_skb) {
>> + struct page *page;
>> +
>> + BUG_ON(!list_skb->head_frag);
>> +
>> + i = 0;
>> + nfrags = 1;
>> + page = virt_to_head_page(list_skb->head);
>> + head_frag.page.p = page;
>> + head_frag.page_offset = list_skb->data -
>> + (unsigned char *)page_address(page);
>> + head_frag.size = skb_headlen(list_skb);
>> + frag = &head_frag;
>> + check_list_skb = list_skb->next;
>
> The whole need for check_list_skb can be worked around if we take
> advantage of the fact that i and nfrags are both integers so we could
> use -1 for both to indicate we are processing the head frag. The only
> bit that gets messy is the fact that we have to add special handling
> for the case where skb_shinfo(list_skb)->nr_frags is 0. If that is the
> case we would have to set nfrags to 0 and bump the list_skb =
> list_skb->next so that we avoid trying to pull frags from an otherwise
> empty buffer.
Right. If nr_frags is 0, we do need to avoid pulling frags.
Thanks for the review, will send out v3 shortly.
>
>> + } else {
>> + i = 0;
>> + nfrags = skb_shinfo(list_skb)->nr_frags;
>> + frag = skb_shinfo(list_skb)->frags;
>> + frag_skb = list_skb;
>>
>> - BUG_ON(!nfrags);
>> + BUG_ON(!nfrags);
>>
>> - if (skb_orphan_frags(frag_skb, GFP_ATOMIC) ||
>> - skb_zerocopy_clone(nskb, frag_skb,
>> - GFP_ATOMIC))
>> - goto err;
>> + if (skb_orphan_frags(frag_skb, GFP_ATOMIC) ||
>> + skb_zerocopy_clone(nskb, frag_skb, GFP_ATOMIC))
>> + goto err;
>>
>> - list_skb = list_skb->next;
>> + list_skb = list_skb->next;
>> + check_list_skb = list_skb;
>> + }
>> }
>>
>> if (unlikely(skb_shinfo(nskb)->nr_frags >=
>> --
>> 2.9.5
>>
* Re: [PATCH bpf-next] bpf, doc: add description wrt native/bpf clang target and pointer size
From: Alexei Starovoitov @ 2018-03-20 22:49 UTC (permalink / raw)
To: Daniel Borkmann; +Cc: netdev
In-Reply-To: <6f7d230bee66625c8dab2b9e0db3ea62a8d48459.1521501596.git.daniel@iogearbox.net>
On Tue, Mar 20, 2018 at 12:21:15AM +0100, Daniel Borkmann wrote:
> As this recently came up on netdev [0], let's add it to the BPF devel doc.
>
> [0] https://www.spinics.net/lists/netdev/msg489612.html
>
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Applied to bpf-next, thanks Daniel.
* Re: [PATCH net] trace/bpf: remove helper bpf_perf_prog_read_value from tracepoint type programs
From: Daniel Borkmann @ 2018-03-20 22:21 UTC (permalink / raw)
To: Yonghong Song, ast, netdev; +Cc: kernel-team
In-Reply-To: <20180320181917.2030514-1-yhs@fb.com>
On 03/20/2018 07:19 PM, Yonghong Song wrote:
> Commit 4bebdc7a85aa ("bpf: add helper bpf_perf_prog_read_value")
> added helper bpf_perf_prog_read_value so that perf_event type program
> can read event counter and enabled/running time.
> This commit, however, introduced a bug which allows this helper
> for tracepoint type programs. This is incorrect as bpf_perf_prog_read_value
> needs to access perf_event through its bpf_perf_event_data_kern type context,
> which is not available for tracepoint type program.
>
> This patch fixed the issue by separating bpf_func_proto between tracepoint
> and perf_event type programs and removed bpf_perf_prog_read_value
> from tracepoint func prototype.
>
> Fixes: Commit 4bebdc7a85aa ("bpf: add helper bpf_perf_prog_read_value")
> Reported-by: Alexei Starovoitov <ast@kernel.org>
> Signed-off-by: Yonghong Song <yhs@fb.com>
Sigh, also makes sense to split given this is so subtle. Applied to bpf,
thanks Yonghong!
* Re: [PATCH] test_bpf: Fix testing with CONFIG_BPF_JIT_ALWAYS_ON=y on other arches
From: Daniel Borkmann @ 2018-03-20 22:07 UTC (permalink / raw)
To: Thadeu Lima de Souza Cascardo, netdev
Cc: linux-kernel, Yonghong Song, Alexei Starovoitov
In-Reply-To: <20180320125851.19650-1-cascardo@canonical.com>
On 03/20/2018 01:58 PM, Thadeu Lima de Souza Cascardo wrote:
> Function bpf_fill_maxinsns11 is designed to not be able to be JITed on
> x86_64. So, it fails when CONFIG_BPF_JIT_ALWAYS_ON=y, and
> commit 09584b406742 ("bpf: fix selftests/bpf test_kmod.sh failure when
> CONFIG_BPF_JIT_ALWAYS_ON=y") makes sure that failure is detected on that
> case.
>
> However, it does not fail on other architectures, which have a different
> JIT compiler design. So, test_bpf has started to fail to load on those.
>
> After this fix, test_bpf loads fine on both x86_64 and ppc64el.
>
> Fixes: 09584b406742 ("bpf: fix selftests/bpf test_kmod.sh failure when CONFIG_BPF_JIT_ALWAYS_ON=y")
> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Yep, agree. Applied to bpf tree, thanks Thadeu!
* [PATCH PATCH net 4/4] hv_netvsc: common detach logic
From: Stephen Hemminger @ 2018-03-20 22:03 UTC (permalink / raw)
To: kys, haiyangz, sthemmin; +Cc: devel, netdev
In-Reply-To: <20180320220305.32223-1-sthemmin@microsoft.com>
Make common function for detaching internals of device
during changes to MTU and RSS. Make sure no more packets
are transmitted and all packets have been received before
doing device teardown.
Change the wait logic to be common and use usleep_range().
Change the transmit enabling logic so that transmit queues are disabled
while the lower device is being changed, and enabled only after the
sub channels are set up. This avoids a packet being sent while a
subchannel is not yet initialized.
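The common wait logic amounts to a bounded poll: check for pending bytes,
give up after RETRY_MAX attempts, otherwise sleep and retry. With
RETRY_MAX = 2000 and a sleep of 5000 to 10000 us per attempt, the total
wait before timing out is roughly 10 to 20 seconds. Below is a user-space
sketch of that control flow only. All toy_* names are hypothetical, and
the sleep is replaced by a counter so the example runs instantly.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define RETRY_MAX 2000	/* same bound as the patch */

static unsigned int toy_naps;	/* how many times we "slept" */

/* Stand-in for usleep_range(RETRY_US_LO, RETRY_US_HI); only counts calls. */
static void toy_nap(void)
{
	toy_naps++;
}

/* Hypothetical ring: each poll consumes one unit and reports what is left. */
static unsigned int toy_drain(void *ctx)
{
	unsigned int *left = ctx;

	if (*left)
		(*left)--;
	return *left;
}

/* Hypothetical ring that never empties. */
static unsigned int toy_stuck(void *ctx)
{
	(void)ctx;
	return 1;
}

/* Poll until nothing is pending, or fail after RETRY_MAX sleeps. */
static int toy_wait_until_empty(unsigned int (*pending)(void *), void *ctx)
{
	unsigned int retry = 0;

	for (;;) {
		if (pending(ctx) == 0)
			return 0;
		if (++retry > RETRY_MAX)
			return -ETIMEDOUT;
		toy_nap();
	}
}

static void toy_wait_demo(void)
{
	unsigned int left = 3;

	toy_naps = 0;
	assert(toy_wait_until_empty(toy_drain, &left) == 0);
	assert(toy_naps == 2);

	toy_naps = 0;
	assert(toy_wait_until_empty(toy_stuck, NULL) == -ETIMEDOUT);
	assert(toy_naps == RETRY_MAX);
}
```

Compared with the old doubling msleep() backoff, a fixed short interval
keeps the worst-case latency after the ring drains small, at the cost of
more wakeups while waiting.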
Fixes: 8195b1396ec8 ("hv_netvsc: fix deadlock on hotplug")
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
---
drivers/net/hyperv/hyperv_net.h | 1 -
drivers/net/hyperv/netvsc.c | 20 +--
drivers/net/hyperv/netvsc_drv.c | 278 +++++++++++++++++++++-----------------
drivers/net/hyperv/rndis_filter.c | 17 +--
4 files changed, 173 insertions(+), 143 deletions(-)
diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index cd538d5a7986..32861036c3fc 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -212,7 +212,6 @@ void netvsc_channel_cb(void *context);
int netvsc_poll(struct napi_struct *napi, int budget);
void rndis_set_subchannel(struct work_struct *w);
-bool rndis_filter_opened(const struct netvsc_device *nvdev);
int rndis_filter_open(struct netvsc_device *nvdev);
int rndis_filter_close(struct netvsc_device *nvdev);
struct netvsc_device *rndis_filter_device_add(struct hv_device *dev,
diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 37b0a30d6b03..7472172823f3 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -555,8 +555,6 @@ void netvsc_device_remove(struct hv_device *device)
= rtnl_dereference(net_device_ctx->nvdev);
int i;
- cancel_work_sync(&net_device->subchan_work);
-
netvsc_revoke_buf(device, net_device);
RCU_INIT_POINTER(net_device_ctx->nvdev, NULL);
@@ -643,14 +641,18 @@ static void netvsc_send_tx_complete(struct netvsc_device *net_device,
queue_sends =
atomic_dec_return(&net_device->chan_table[q_idx].queue_sends);
- if (net_device->destroy && queue_sends == 0)
- wake_up(&net_device->wait_drain);
+ if (unlikely(net_device->destroy)) {
+ if (queue_sends == 0)
+ wake_up(&net_device->wait_drain);
+ } else {
+ struct netdev_queue *txq = netdev_get_tx_queue(ndev, q_idx);
- if (netif_tx_queue_stopped(netdev_get_tx_queue(ndev, q_idx)) &&
- (hv_ringbuf_avail_percent(&channel->outbound) > RING_AVAIL_PERCENT_HIWATER ||
- queue_sends < 1)) {
- netif_tx_wake_queue(netdev_get_tx_queue(ndev, q_idx));
- ndev_ctx->eth_stats.wake_queue++;
+ if (netif_tx_queue_stopped(txq) &&
+ (hv_ringbuf_avail_percent(&channel->outbound) > RING_AVAIL_PERCENT_HIWATER ||
+ queue_sends < 1)) {
+ netif_tx_wake_queue(txq);
+ ndev_ctx->eth_stats.wake_queue++;
+ }
}
}
diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index faea0be18924..f28c85d212ce 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -46,7 +46,10 @@
#include "hyperv_net.h"
-#define RING_SIZE_MIN 64
+#define RING_SIZE_MIN 64
+#define RETRY_US_LO 5000
+#define RETRY_US_HI 10000
+#define RETRY_MAX 2000 /* >10 sec */
#define LINKCHANGE_INT (2 * HZ)
#define VF_TAKEOVER_INT (HZ / 10)
@@ -123,10 +126,8 @@ static int netvsc_open(struct net_device *net)
}
rdev = nvdev->extension;
- if (!rdev->link_state) {
+ if (!rdev->link_state)
netif_carrier_on(net);
- netif_tx_wake_all_queues(net);
- }
if (vf_netdev) {
/* Setting synthetic device up transparently sets
@@ -142,36 +143,25 @@ static int netvsc_open(struct net_device *net)
return 0;
}
-static int netvsc_close(struct net_device *net)
+static int netvsc_wait_until_empty(struct netvsc_device *nvdev)
{
- struct net_device_context *net_device_ctx = netdev_priv(net);
- struct net_device *vf_netdev
- = rtnl_dereference(net_device_ctx->vf_netdev);
- struct netvsc_device *nvdev = rtnl_dereference(net_device_ctx->nvdev);
- int ret = 0;
- u32 aread, i, msec = 10, retry = 0, retry_max = 20;
- struct vmbus_channel *chn;
-
- netif_tx_disable(net);
-
- /* No need to close rndis filter if it is removed already */
- if (!nvdev)
- goto out;
-
- ret = rndis_filter_close(nvdev);
- if (ret != 0) {
- netdev_err(net, "unable to close device (ret %d).\n", ret);
- return ret;
- }
+ unsigned int retry = 0;
+ int i;
/* Ensure pending bytes in ring are read */
- while (true) {
- aread = 0;
+ for (;;) {
+ u32 aread = 0;
+
for (i = 0; i < nvdev->num_chn; i++) {
- chn = nvdev->chan_table[i].channel;
+ struct vmbus_channel *chn
+ = nvdev->chan_table[i].channel;
+
if (!chn)
continue;
+ /* make sure receive not running now */
+ napi_synchronize(&nvdev->chan_table[i].napi);
+
aread = hv_get_bytes_to_read(&chn->inbound);
if (aread)
break;
@@ -181,22 +171,40 @@ static int netvsc_close(struct net_device *net)
break;
}
- retry++;
- if (retry > retry_max || aread == 0)
- break;
+ if (aread == 0)
+ return 0;
- msleep(msec);
+ if (++retry > RETRY_MAX)
+ return -ETIMEDOUT;
- if (msec < 1000)
- msec *= 2;
+ usleep_range(RETRY_US_LO, RETRY_US_HI);
}
+}
- if (aread) {
- netdev_err(net, "Ring buffer not empty after closing rndis\n");
- ret = -ETIMEDOUT;
+static int netvsc_close(struct net_device *net)
+{
+ struct net_device_context *net_device_ctx = netdev_priv(net);
+ struct net_device *vf_netdev
+ = rtnl_dereference(net_device_ctx->vf_netdev);
+ struct netvsc_device *nvdev = rtnl_dereference(net_device_ctx->nvdev);
+ int ret;
+
+ netif_tx_disable(net);
+
+ /* No need to close rndis filter if it is removed already */
+ if (!nvdev)
+ return 0;
+
+ ret = rndis_filter_close(nvdev);
+ if (ret != 0) {
+ netdev_err(net, "unable to close device (ret %d).\n", ret);
+ return ret;
}
-out:
+ ret = netvsc_wait_until_empty(nvdev);
+ if (ret)
+ netdev_err(net, "Ring buffer not empty after closing rndis\n");
+
if (vf_netdev)
dev_close(vf_netdev);
@@ -845,16 +853,81 @@ static void netvsc_get_channels(struct net_device *net,
}
}
+static int netvsc_detach(struct net_device *ndev,
+ struct netvsc_device *nvdev)
+{
+ struct net_device_context *ndev_ctx = netdev_priv(ndev);
+ struct hv_device *hdev = ndev_ctx->device_ctx;
+ int ret;
+
+ /* Don't try continuing to try and setup sub channels */
+ if (cancel_work_sync(&nvdev->subchan_work))
+ nvdev->num_chn = 1;
+
+ /* If device was up (receiving) then shutdown */
+ if (netif_running(ndev)) {
+ netif_tx_disable(ndev);
+
+ ret = rndis_filter_close(nvdev);
+ if (ret) {
+ netdev_err(ndev,
+ "unable to close device (ret %d).\n", ret);
+ return ret;
+ }
+
+ ret = netvsc_wait_until_empty(nvdev);
+ if (ret) {
+ netdev_err(ndev,
+ "Ring buffer not empty after closing rndis\n");
+ return ret;
+ }
+ }
+
+ netif_device_detach(ndev);
+
+ rndis_filter_device_remove(hdev, nvdev);
+
+ return 0;
+}
+
+static int netvsc_attach(struct net_device *ndev,
+ struct netvsc_device_info *dev_info)
+{
+ struct net_device_context *ndev_ctx = netdev_priv(ndev);
+ struct hv_device *hdev = ndev_ctx->device_ctx;
+ struct netvsc_device *nvdev;
+ struct rndis_device *rdev;
+ int ret;
+
+ nvdev = rndis_filter_device_add(hdev, dev_info);
+ if (IS_ERR(nvdev))
+ return PTR_ERR(nvdev);
+
+ /* Note: enable and attach happen when sub-channels setup */
+
+ netif_carrier_off(ndev);
+
+ if (netif_running(ndev)) {
+ ret = rndis_filter_open(nvdev);
+ if (ret)
+ return ret;
+
+ rdev = nvdev->extension;
+ if (!rdev->link_state)
+ netif_carrier_on(ndev);
+ }
+
+ return 0;
+}
+
static int netvsc_set_channels(struct net_device *net,
struct ethtool_channels *channels)
{
struct net_device_context *net_device_ctx = netdev_priv(net);
- struct hv_device *dev = net_device_ctx->device_ctx;
struct netvsc_device *nvdev = rtnl_dereference(net_device_ctx->nvdev);
unsigned int orig, count = channels->combined_count;
struct netvsc_device_info device_info;
- bool was_opened;
- int ret = 0;
+ int ret;
/* We do not support separate count for rx, tx, or other */
if (count == 0 ||
@@ -871,9 +944,6 @@ static int netvsc_set_channels(struct net_device *net,
return -EINVAL;
orig = nvdev->num_chn;
- was_opened = rndis_filter_opened(nvdev);
- if (was_opened)
- rndis_filter_close(nvdev);
memset(&device_info, 0, sizeof(device_info));
device_info.num_chn = count;
@@ -882,28 +952,17 @@ static int netvsc_set_channels(struct net_device *net,
device_info.recv_sections = nvdev->recv_section_cnt;
device_info.recv_section_size = nvdev->recv_section_size;
- rndis_filter_device_remove(dev, nvdev);
+ ret = netvsc_detach(net, nvdev);
+ if (ret)
+ return ret;
- nvdev = rndis_filter_device_add(dev, &device_info);
- if (IS_ERR(nvdev)) {
- ret = PTR_ERR(nvdev);
+ ret = netvsc_attach(net, &device_info);
+ if (ret) {
device_info.num_chn = orig;
- nvdev = rndis_filter_device_add(dev, &device_info);
-
- if (IS_ERR(nvdev)) {
- netdev_err(net, "restoring channel setting failed: %ld\n",
- PTR_ERR(nvdev));
- return ret;
- }
+ if (netvsc_attach(net, &device_info))
+ netdev_err(net, "restoring channel setting failed\n");
}
- if (was_opened)
- rndis_filter_open(nvdev);
-
- /* We may have missed link change notifications */
- net_device_ctx->last_reconfig = 0;
- schedule_delayed_work(&net_device_ctx->dwork, 0);
-
return ret;
}
@@ -969,10 +1028,8 @@ static int netvsc_change_mtu(struct net_device *ndev, int mtu)
struct net_device_context *ndevctx = netdev_priv(ndev);
struct net_device *vf_netdev = rtnl_dereference(ndevctx->vf_netdev);
struct netvsc_device *nvdev = rtnl_dereference(ndevctx->nvdev);
- struct hv_device *hdev = ndevctx->device_ctx;
int orig_mtu = ndev->mtu;
struct netvsc_device_info device_info;
- bool was_opened;
int ret = 0;
if (!nvdev || nvdev->destroy)
@@ -985,11 +1042,6 @@ static int netvsc_change_mtu(struct net_device *ndev, int mtu)
return ret;
}
- netif_device_detach(ndev);
- was_opened = rndis_filter_opened(nvdev);
- if (was_opened)
- rndis_filter_close(nvdev);
-
memset(&device_info, 0, sizeof(device_info));
device_info.num_chn = nvdev->num_chn;
device_info.send_sections = nvdev->send_section_cnt;
@@ -997,35 +1049,27 @@ static int netvsc_change_mtu(struct net_device *ndev, int mtu)
device_info.recv_sections = nvdev->recv_section_cnt;
device_info.recv_section_size = nvdev->recv_section_size;
- rndis_filter_device_remove(hdev, nvdev);
+ ret = netvsc_detach(ndev, nvdev);
+ if (ret)
+ goto rollback_vf;
ndev->mtu = mtu;
- nvdev = rndis_filter_device_add(hdev, &device_info);
- if (IS_ERR(nvdev)) {
- ret = PTR_ERR(nvdev);
-
- /* Attempt rollback to original MTU */
- ndev->mtu = orig_mtu;
- nvdev = rndis_filter_device_add(hdev, &device_info);
-
- if (vf_netdev)
- dev_set_mtu(vf_netdev, orig_mtu);
-
- if (IS_ERR(nvdev)) {
- netdev_err(ndev, "restoring mtu failed: %ld\n",
- PTR_ERR(nvdev));
- return ret;
- }
- }
+ ret = netvsc_attach(ndev, &device_info);
+ if (ret)
+ goto rollback;
- if (was_opened)
- rndis_filter_open(nvdev);
+ return 0;
- netif_device_attach(ndev);
+rollback:
+ /* Attempt rollback to original MTU */
+ ndev->mtu = orig_mtu;
- /* We may have missed link change notifications */
- schedule_delayed_work(&ndevctx->dwork, 0);
+ if (netvsc_attach(ndev, &device_info))
+ netdev_err(ndev, "restoring mtu failed\n");
+rollback_vf:
+ if (vf_netdev)
+ dev_set_mtu(vf_netdev, orig_mtu);
return ret;
}
@@ -1531,11 +1575,9 @@ static int netvsc_set_ringparam(struct net_device *ndev,
{
struct net_device_context *ndevctx = netdev_priv(ndev);
struct netvsc_device *nvdev = rtnl_dereference(ndevctx->nvdev);
- struct hv_device *hdev = ndevctx->device_ctx;
struct netvsc_device_info device_info;
struct ethtool_ringparam orig;
u32 new_tx, new_rx;
- bool was_opened;
int ret = 0;
if (!nvdev || nvdev->destroy)
@@ -1560,34 +1602,18 @@ static int netvsc_set_ringparam(struct net_device *ndev,
device_info.recv_sections = new_rx;
device_info.recv_section_size = nvdev->recv_section_size;
- netif_device_detach(ndev);
- was_opened = rndis_filter_opened(nvdev);
- if (was_opened)
- rndis_filter_close(nvdev);
-
- rndis_filter_device_remove(hdev, nvdev);
-
- nvdev = rndis_filter_device_add(hdev, &device_info);
- if (IS_ERR(nvdev)) {
- ret = PTR_ERR(nvdev);
+ ret = netvsc_detach(ndev, nvdev);
+ if (ret)
+ return ret;
+ ret = netvsc_attach(ndev, &device_info);
+ if (ret) {
device_info.send_sections = orig.tx_pending;
device_info.recv_sections = orig.rx_pending;
- nvdev = rndis_filter_device_add(hdev, &device_info);
- if (IS_ERR(nvdev)) {
- netdev_err(ndev, "restoring ringparam failed: %ld\n",
- PTR_ERR(nvdev));
- return ret;
- }
- }
- if (was_opened)
- rndis_filter_open(nvdev);
- netif_device_attach(ndev);
-
- /* We may have missed link change notifications */
- ndevctx->last_reconfig = 0;
- schedule_delayed_work(&ndevctx->dwork, 0);
+ if (netvsc_attach(ndev, &device_info))
+ netdev_err(ndev, "restoring ringparam failed");
+ }
return ret;
}
@@ -2072,8 +2098,8 @@ static int netvsc_probe(struct hv_device *dev,
static int netvsc_remove(struct hv_device *dev)
{
struct net_device_context *ndev_ctx;
- struct net_device *vf_netdev;
- struct net_device *net;
+ struct net_device *vf_netdev, *net;
+ struct netvsc_device *nvdev;
net = hv_get_drvdata(dev);
if (net == NULL) {
@@ -2083,10 +2109,14 @@ static int netvsc_remove(struct hv_device *dev)
ndev_ctx = netdev_priv(net);
- netif_device_detach(net);
-
cancel_delayed_work_sync(&ndev_ctx->dwork);
+ rcu_read_lock();
+ nvdev = rcu_dereference(ndev_ctx->nvdev);
+
+ if (nvdev)
+ cancel_work_sync(&nvdev->subchan_work);
+
/*
* Call to the vsc driver to let it know that the device is being
* removed. Also blocks mtu and channel changes.
@@ -2096,11 +2126,13 @@ static int netvsc_remove(struct hv_device *dev)
if (vf_netdev)
netvsc_unregister_vf(vf_netdev);
+ if (nvdev)
+ rndis_filter_device_remove(dev, nvdev);
+
unregister_netdevice(net);
- rndis_filter_device_remove(dev,
- rtnl_dereference(ndev_ctx->nvdev));
rtnl_unlock();
+ rcu_read_unlock();
hv_set_drvdata(dev, NULL);
diff --git a/drivers/net/hyperv/rndis_filter.c b/drivers/net/hyperv/rndis_filter.c
index 963314eb3226..a6ec41c399d6 100644
--- a/drivers/net/hyperv/rndis_filter.c
+++ b/drivers/net/hyperv/rndis_filter.c
@@ -1118,6 +1118,7 @@ void rndis_set_subchannel(struct work_struct *w)
for (i = 0; i < VRSS_SEND_TAB_SIZE; i++)
ndev_ctx->tx_table[i] = i % nvdev->num_chn;
+ netif_device_attach(ndev);
rtnl_unlock();
return;
@@ -1128,6 +1129,8 @@ void rndis_set_subchannel(struct work_struct *w)
nvdev->max_chn = 1;
nvdev->num_chn = 1;
+
+ netif_device_attach(ndev);
unlock:
rtnl_unlock();
}
@@ -1330,6 +1333,10 @@ struct netvsc_device *rndis_filter_device_add(struct hv_device *dev,
net_device->num_chn = 1;
}
+ /* No sub channels, device is ready */
+ if (net_device->num_chn == 1)
+ netif_device_attach(net);
+
return net_device;
err_dev_remv:
@@ -1342,9 +1349,6 @@ void rndis_filter_device_remove(struct hv_device *dev,
{
struct rndis_device *rndis_dev = net_dev->extension;
- /* Don't try and setup sub channels if about to halt */
- cancel_work_sync(&net_dev->subchan_work);
-
/* Halt and release the rndis device */
rndis_filter_halt_device(rndis_dev);
@@ -1368,10 +1372,3 @@ int rndis_filter_close(struct netvsc_device *nvdev)
return rndis_filter_close_device(nvdev->extension);
}
-
-bool rndis_filter_opened(const struct netvsc_device *nvdev)
-{
- const struct rndis_device *dev = nvdev->extension;
-
- return dev->state == RNDIS_DEV_DATAINITIALIZED;
-}
--
2.16.2
^ permalink raw reply related
* [PATCH PATCH net 3/4] hv_netvsc: change GPAD teardown order on older versions
From: Stephen Hemminger @ 2018-03-20 22:03 UTC (permalink / raw)
To: kys, haiyangz, sthemmin; +Cc: devel, netdev
In-Reply-To: <20180320220305.32223-1-sthemmin@microsoft.com>
On older versions of Windows, the host ignores messages after
the vmbus channel is closed.
Work around this by doing what Windows does and sending the teardown
before close on older versions of the NVSP protocol.
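The ordering rule described above can be sketched in plain C, outside the kernel. This is only an illustration: the `toy_` names and the small integer version numbers are hypothetical stand-ins, not the driver's constants, and the function merely records which step would run first for a given protocol version.

```c
#include <assert.h>

/* Hypothetical version numbers; only their relative order matters here. */
enum { TOY_NVSP_VERSION_2 = 2, TOY_NVSP_VERSION_4 = 4, TOY_NVSP_VERSION_5 = 5 };

enum toy_step { TOY_TEARDOWN_GPADL, TOY_CLOSE_CHANNEL };

/*
 * Fill steps[0..1] with the removal order for the given protocol version:
 * hosts older than version 4 get the buffer teardown before the channel
 * close, newer hosts get the close first.
 */
static void toy_remove_order(int nvsp_version, enum toy_step steps[2])
{
	if (nvsp_version < TOY_NVSP_VERSION_4) {
		steps[0] = TOY_TEARDOWN_GPADL;
		steps[1] = TOY_CLOSE_CHANNEL;
	} else {
		steps[0] = TOY_CLOSE_CHANNEL;
		steps[1] = TOY_TEARDOWN_GPADL;
	}
}
```

Keeping the version test in one place makes it harder for the two branches to drift apart; the patch itself expresses the same choice with two complementary checks around vmbus_close().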
Reported-by: Mohammed Gamal <mgamal@redhat.com>
Fixes: 0cf737808ae7 ("hv_netvsc: netvsc_teardown_gpadl() split")
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
---
drivers/net/hyperv/netvsc.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 12c044baf1af..37b0a30d6b03 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -571,10 +571,15 @@ void netvsc_device_remove(struct hv_device *device)
*/
netdev_dbg(ndev, "net device safe to remove\n");
+ /* older versions require that buffer be revoked before close */
+ if (net_device->nvsp_version < NVSP_PROTOCOL_VERSION_4)
+ netvsc_teardown_gpadl(device, net_device);
+
/* Now, we can close the channel safely */
vmbus_close(device->channel);
- netvsc_teardown_gpadl(device, net_device);
+ if (net_device->nvsp_version >= NVSP_PROTOCOL_VERSION_4)
+ netvsc_teardown_gpadl(device, net_device);
/* Release all resources */
free_netvsc_device_rcu(net_device);
--
2.16.2
^ permalink raw reply related
* [PATCH PATCH net 2/4] hv_netvsc: use RCU to fix concurrent rx and queue changes
From: Stephen Hemminger @ 2018-03-20 22:03 UTC (permalink / raw)
To: kys, haiyangz, sthemmin; +Cc: devel, netdev
In-Reply-To: <20180320220305.32223-1-sthemmin@microsoft.com>
The receive processing may continue to happen while the
internal network device state is in an RCU grace period.
The internal RNDIS structure is associated with the
internal netvsc_device structure; both have the same
RCU lifetime.
Defer freeing all associated parts until after the grace
period.
Fixes: 0cf737808ae7 ("hv_netvsc: netvsc_teardown_gpadl() split")
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
---
drivers/net/hyperv/netvsc.c | 17 +++++------------
drivers/net/hyperv/rndis_filter.c | 39 ++++++++++++++++-----------------------
2 files changed, 21 insertions(+), 35 deletions(-)
diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index e70a44273f55..12c044baf1af 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -90,6 +90,11 @@ static void free_netvsc_device(struct rcu_head *head)
= container_of(head, struct netvsc_device, rcu);
int i;
+ kfree(nvdev->extension);
+ vfree(nvdev->recv_buf);
+ vfree(nvdev->send_buf);
+ kfree(nvdev->send_section_map);
+
for (i = 0; i < VRSS_CHANNEL_MAX; i++)
vfree(nvdev->chan_table[i].mrc.slots);
@@ -211,12 +216,6 @@ static void netvsc_teardown_gpadl(struct hv_device *device,
net_device->recv_buf_gpadl_handle = 0;
}
- if (net_device->recv_buf) {
- /* Free up the receive buffer */
- vfree(net_device->recv_buf);
- net_device->recv_buf = NULL;
- }
-
if (net_device->send_buf_gpadl_handle) {
ret = vmbus_teardown_gpadl(device->channel,
net_device->send_buf_gpadl_handle);
@@ -231,12 +230,6 @@ static void netvsc_teardown_gpadl(struct hv_device *device,
}
net_device->send_buf_gpadl_handle = 0;
}
- if (net_device->send_buf) {
- /* Free up the send buffer */
- vfree(net_device->send_buf);
- net_device->send_buf = NULL;
- }
- kfree(net_device->send_section_map);
}
int netvsc_alloc_recv_comp_ring(struct netvsc_device *net_device, u32 q_idx)
diff --git a/drivers/net/hyperv/rndis_filter.c b/drivers/net/hyperv/rndis_filter.c
index 00ec80c23fe5..963314eb3226 100644
--- a/drivers/net/hyperv/rndis_filter.c
+++ b/drivers/net/hyperv/rndis_filter.c
@@ -264,13 +264,23 @@ static void rndis_set_link_state(struct rndis_device *rdev,
}
}
-static void rndis_filter_receive_response(struct rndis_device *dev,
- struct rndis_message *resp)
+static void rndis_filter_receive_response(struct net_device *ndev,
+ struct netvsc_device *nvdev,
+ const struct rndis_message *resp)
{
+ struct rndis_device *dev = nvdev->extension;
struct rndis_request *request = NULL;
bool found = false;
unsigned long flags;
- struct net_device *ndev = dev->ndev;
+
+ /* This should never happen, it means control message
+ * response received after device removed.
+ */
+ if (dev->state == RNDIS_DEV_UNINITIALIZED) {
+ netdev_err(ndev,
+ "got rndis message uninitialized\n");
+ return;
+ }
spin_lock_irqsave(&dev->request_lock, flags);
list_for_each_entry(request, &dev->req_list, list_ent) {
@@ -352,7 +362,6 @@ static inline void *rndis_get_ppi(struct rndis_packet *rpkt, u32 type)
static int rndis_filter_receive_data(struct net_device *ndev,
struct netvsc_device *nvdev,
- struct rndis_device *dev,
struct rndis_message *msg,
struct vmbus_channel *channel,
void *data, u32 data_buflen)
@@ -372,7 +381,7 @@ static int rndis_filter_receive_data(struct net_device *ndev,
* should be the data packet size plus the trailer padding size
*/
if (unlikely(data_buflen < rndis_pkt->data_len)) {
- netdev_err(dev->ndev, "rndis message buffer "
+ netdev_err(ndev, "rndis message buffer "
"overflow detected (got %u, min %u)"
"...dropping this message!\n",
data_buflen, rndis_pkt->data_len);
@@ -400,35 +409,20 @@ int rndis_filter_receive(struct net_device *ndev,
void *data, u32 buflen)
{
struct net_device_context *net_device_ctx = netdev_priv(ndev);
- struct rndis_device *rndis_dev = net_dev->extension;
struct rndis_message *rndis_msg = data;
- /* Make sure the rndis device state is initialized */
- if (unlikely(!rndis_dev)) {
- netif_dbg(net_device_ctx, rx_err, ndev,
- "got rndis message but no rndis device!\n");
- return NVSP_STAT_FAIL;
- }
-
- if (unlikely(rndis_dev->state == RNDIS_DEV_UNINITIALIZED)) {
- netif_dbg(net_device_ctx, rx_err, ndev,
- "got rndis message uninitialized\n");
- return NVSP_STAT_FAIL;
- }
-
if (netif_msg_rx_status(net_device_ctx))
dump_rndis_message(ndev, rndis_msg);
switch (rndis_msg->ndis_msg_type) {
case RNDIS_MSG_PACKET:
- return rndis_filter_receive_data(ndev, net_dev,
- rndis_dev, rndis_msg,
+ return rndis_filter_receive_data(ndev, net_dev, rndis_msg,
channel, data, buflen);
case RNDIS_MSG_INIT_C:
case RNDIS_MSG_QUERY_C:
case RNDIS_MSG_SET_C:
/* completion msgs */
- rndis_filter_receive_response(rndis_dev, rndis_msg);
+ rndis_filter_receive_response(ndev, net_dev, rndis_msg);
break;
case RNDIS_MSG_INDICATE:
@@ -1357,7 +1351,6 @@ void rndis_filter_device_remove(struct hv_device *dev,
net_dev->extension = NULL;
netvsc_device_remove(dev);
- kfree(rndis_dev);
}
int rndis_filter_open(struct netvsc_device *nvdev)
--
2.16.2
^ permalink raw reply related
* [PATCH PATCH net 1/4] hv_netvsc: disable NAPI before channel close
From: Stephen Hemminger @ 2018-03-20 22:03 UTC (permalink / raw)
To: kys, haiyangz, sthemmin; +Cc: devel, netdev
In-Reply-To: <20180320220305.32223-1-sthemmin@microsoft.com>
This makes sure that no CPU is still processing packets when
the channel is closed.
Fixes: 76bb5db5c749 ("netvsc: fix use after free on module removal")
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
---
drivers/net/hyperv/netvsc.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 0265d703eb03..e70a44273f55 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -568,6 +568,10 @@ void netvsc_device_remove(struct hv_device *device)
RCU_INIT_POINTER(net_device_ctx->nvdev, NULL);
+ /* And disassociate NAPI context from device */
+ for (i = 0; i < net_device->num_chn; i++)
+ netif_napi_del(&net_device->chan_table[i].napi);
+
/*
* At this point, no one should be accessing net_device
* except in here
@@ -579,10 +583,6 @@ void netvsc_device_remove(struct hv_device *device)
netvsc_teardown_gpadl(device, net_device);
- /* And dissassociate NAPI context from device */
- for (i = 0; i < net_device->num_chn; i++)
- netif_napi_del(&net_device->chan_table[i].napi);
-
/* Release all resources */
free_netvsc_device_rcu(net_device);
}
--
2.16.2
^ permalink raw reply related
* [PATCH PATCH net 0/4] hv_netvsc: fix races during shutdown and changes
From: Stephen Hemminger @ 2018-03-20 22:03 UTC (permalink / raw)
To: kys, haiyangz, sthemmin; +Cc: devel, netdev
This set of patches fixes issues identified by Vitaly Kuznetsov and
Mohammed Gamal related to state changes in the Hyper-V network driver.
A lot of the issues arise because setting up the netvsc device requires
a second step (in a work queue) to get all the sub-channels running.
Stephen Hemminger (4):
hv_netvsc: disable NAPI before channel close
hv_netvsc: use RCU to fix concurrent rx and queue changes
hv_netvsc: change GPAD teardown order on older versions
hv_netvsc: common detach logic
drivers/net/hyperv/hyperv_net.h | 1 -
drivers/net/hyperv/netvsc.c | 52 +++----
drivers/net/hyperv/netvsc_drv.c | 278 +++++++++++++++++++++-----------------
drivers/net/hyperv/rndis_filter.c | 56 ++++----
4 files changed, 204 insertions(+), 183 deletions(-)
--
2.16.2
^ permalink raw reply
* Re: [PATCH net-next v2 2/5] net: Revert "ipv4: fix a deadlock in ip_ra_control"
From: Kirill Tkhai @ 2018-03-20 21:50 UTC (permalink / raw)
To: David Miller
Cc: yoshfuji, edumazet, yanhaishuang, nikolay, yotamg, soheil, avagin,
nicolas.dichtel, ebiederm, fw, roman.kapl, netdev, xiyou.wangcong,
dvyukov, andreyknvl, lkp
In-Reply-To: <41aba98d-6e38-0789-f562-4eada70a84b6@virtuozzo.com>
On 20.03.2018 22:25, Kirill Tkhai wrote:
> Hi, David,
>
> thanks for the review!
>
> On 20.03.2018 19:23, David Miller wrote:
>> From: Kirill Tkhai <ktkhai@virtuozzo.com>
>> Date: Mon, 19 Mar 2018 12:14:54 +0300
>>
>>> This reverts commit 1215e51edad1.
>>> Since raw_close() is used on every RAW socket destruction,
>>> the changes made by 1215e51edad1 scale poorly. This is clearly
>>> seen in an endless unshare(CLONE_NEWNET) test, where the cleanup_net()
>>> kwork spends a lot of time waiting for the rtnl_lock() introduced
>>> by this commit.
>>>
>>> The next patches in the series will rework this in another way,
>>> so for now we revert 1215e51edad1. Also, it does not seem that
>>> mrtsock_destruct() takes sk_lock, and the commit message
>>> does not show the actual stack dump. So it is questionable
>>> whether we really needed it.
>>>
>>> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
>>
>> Kirill, I think the commit you are reverting is legitimate.
>>
>> The IP_RAW_CONTROL path has an ABBA deadlock with other paths once
>> you revert this, so you are reintroducing a bug.
>
> The talk is about IP_ROUTER_ALERT, I assume there is just an erratum.
>
>> All code paths that must take both RTNL and the socket lock must
>> do them in the same order. And that order is RTNL then socket
>> lock.
>
> The place I change in this patch is IP_ROUTER_ALERT. It only
> calls ip_ra_control(), and that function does not need the socket
> lock. Please see the next patch: it moves ip_ra_control() out
> from under the socket lock, and it fixes the problem pointed out in
> the reverted patch in another way. So, if there is an ABBA deadlock,
> it is resolved after the next patch. Does this mean I have to merge
> [2/5] and [3/5] together?
We could also just change the order of the patches and make [3/5] go before [2/5].
Then the kernel still remains bisectable. What do you think about this?
Thanks,
Kirill
>> But you are breaking that here by getting us back into a state
>> where IP_RAW_CONTROL setsockopt will take the socket lock and
>> then RTNL.
>>
>> Again, we can't take, or retake, RTNL if we have the socket lock
>> currently.
>>
>> The only valid locking order is RTNL then socket lock.
>
> Thanks,
> Kirill
>
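The ABBA concern in this thread comes down to every path taking the two locks in one fixed order. A minimal userspace sketch of how an order inversion can be detected is given below; it is not lockdep and not kernel code, and all `toy_` names are hypothetical. It records which lock was held when another was taken and refuses an acquisition that reverses a recorded pair.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_lock { TOY_RTNL, TOY_SOCK, TOY_NLOCKS };

struct toy_lockdep {
	bool held[TOY_NLOCKS];
	/* before[a][b] is true once some path took b while holding a. */
	bool before[TOY_NLOCKS][TOY_NLOCKS];
};

/* Returns false if taking 'l' now would reverse a recorded order. */
static bool toy_acquire(struct toy_lockdep *ld, enum toy_lock l)
{
	int h;

	/* First pass: refuse if 'l' was ever held while a now-held lock was taken. */
	for (h = 0; h < TOY_NLOCKS; h++)
		if (h != (int)l && ld->held[h] && ld->before[l][h])
			return false;

	/* Second pass: record "h before l" for every lock currently held. */
	for (h = 0; h < TOY_NLOCKS; h++)
		if (h != (int)l && ld->held[h])
			ld->before[h][l] = true;

	ld->held[l] = true;
	return true;
}

static void toy_release(struct toy_lockdep *ld, enum toy_lock l)
{
	ld->held[l] = false;
}
```

With this model, a path that takes TOY_RTNL and then TOY_SOCK establishes the order, and a later path that holds TOY_SOCK and asks for TOY_RTNL is rejected, which is the pattern the review comments warn about.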
^ permalink raw reply
* Re: linux-next on x60: network manager often complains "network is disabled" after resume
From: Woody Suwalski @ 2018-03-20 21:26 UTC (permalink / raw)
To: Pavel Machek
Cc: Rafael J. Wysocki, kernel list, Linux-pm mailing list,
Netdev list
In-Reply-To: <20180319092106.GA5683@amd>
Pavel Machek wrote:
> On Mon 2018-03-19 05:17:45, Woody Suwalski wrote:
>> Pavel Machek wrote:
>>> Hi!
>>>
>>> With recent linux-next, after resume networkmanager often claims that
>>> "network is disabled". Sometimes suspend/resume clears that.
>>>
>>> Any ideas? Does it work for you?
>>> Pavel
>> Tried the 4.16-rc6 with nm 1.4.4 - I do not see the issue.
> Thanks for testing... but yes, 4.16 should be ok. If not fixed,
> problem will appear in 4.17-rc1.
>
Works here OK. Tried ~10 suspends, all restarted OK.
kernel next-20180320
nmcli shows that Wifi always connects OK
Woody
^ permalink raw reply
* Re: [PATCH] net: dev_forward_skb(): Scrub packet's per-netns info only when crossing netns
From: Liran Alon @ 2018-03-20 21:12 UTC (permalink / raw)
To: valdis.kletnieks
Cc: David Miller, netdev, linux-kernel, idan.brown, yuval.shaia
In-Reply-To: <55538.1521571867@turing-police.cc.vt.edu>
On 20/03/18 20:51, valdis.kletnieks@vt.edu wrote:
> On Tue, 20 Mar 2018 18:39:47 +0200, Liran Alon said:
>> What is your opinion on whether it's OK to put the flag enabling this
>> "fix" in /proc/sys/net/core? Do you think it's sufficient?
>
> Umm.. *which* /proc/sys/net/core? These could differ for things that
> are in different namespaces. Or are you proposing one systemwide
> global value (which also gets "interesting" if it's writable inside a
> container and changes the behavior a different container sees...)
>
I'm indeed proposing an opt-in system-wide global value.
I think it is the simplest approach to fix the issue at
hand here while maintaining backwards-compatibility.
I'm open to suggestions as to where that system-wide
global value should live.
It must be a system-wide global value if we are not going
with the per-netdev flag approach, as this system-wide global flag
should control how an skb travels between different netns.
So it doesn't belong to any single netns.
^ permalink raw reply
* Re: [PATCH iproute2 v2 9/9] bpf: avoid compiler warnings about strncpy
From: Daniel Borkmann @ 2018-03-20 21:12 UTC (permalink / raw)
To: Stephen Hemminger, netdev
In-Reply-To: <20180320202909.22166-10-stephen@networkplumber.org>
On 03/20/2018 09:29 PM, Stephen Hemminger wrote:
> Use strlcpy to avoid cases where sizeof(buf) == strlen(buf)
>
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
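For readers unfamiliar with the case named in the commit message: when the source string is exactly as long as the destination buffer, strncpy() fills the buffer and leaves no terminating NUL. A minimal sketch of a bounded copy with strlcpy-like semantics follows. The helper name `toy_strlcpy` is hypothetical and this is not the iproute2 implementation; it only illustrates the behaviour the patch relies on.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy at most size - 1 bytes of src into dst and always NUL-terminate
 * (when size > 0). Returns strlen(src), so a return value >= size tells
 * the caller that the copy was truncated.
 */
static size_t toy_strlcpy(char *dst, const char *src, size_t size)
{
	size_t len = strlen(src);

	if (size > 0) {
		size_t n = len < size - 1 ? len : size - 1;

		memcpy(dst, src, n);
		dst[n] = '\0';
	}
	return len;
}
```

A caller with a 4-byte buffer copying a 4-character string ends up with the first three characters plus a terminator, and can detect the truncation from the return value.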
^ permalink raw reply
* Re: [net-next] intel: add SPDX identifiers to all the Intel drivers
From: Joe Perches @ 2018-03-20 21:01 UTC (permalink / raw)
To: Allan, Bruce W, Kirsher, Jeffrey T, davem@davemloft.net,
Philippe Ombredanne, Thomas Gleixner
Cc: netdev@vger.kernel.org, nhorman@redhat.com, sassmann@redhat.com,
jogreene@redhat.com
In-Reply-To: <804857E1F29AAC47BF68C404FC60A184ED5CCD20@ORSMSX105.amr.corp.intel.com>
On Tue, 2018-03-20 at 20:48 +0000, Allan, Bruce W wrote:
> > -----Original Message-----
> > From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org]
> > On Behalf Of Jeff Kirsher
> > Sent: Tuesday, March 20, 2018 10:52 AM
> > To: Joe Perches <joe@perches.com>; davem@davemloft.net; Philippe
> > Ombredanne <pombredanne@nexb.com>
> > Cc: netdev@vger.kernel.org; nhorman@redhat.com; sassmann@redhat.com;
> > jogreene@redhat.com
> > Subject: Re: [net-next] intel: add SPDX identifiers to all the Intel drivers
> >
> > On Tue, 2018-03-20 at 10:41 -0700, Joe Perches wrote:
> > > On Tue, 2018-03-20 at 10:13 -0700, Jeff Kirsher wrote:
> > > > Add the SPDX identifiers to all the Intel wired LAN driver files,
> > > > as
> > > > outlined in Documentation/process/license-rules.rst.
> > >
> > > So far the Documentation does not show using the -only variant.
> > >
> > > For a discussion, please see:
> > > https://lkml.org/lkml/2018/2/8/311
>
> But the Linux Foundation, the authority maintaining the valid SPDX
> identifiers, indicates at https://spdx.org/licenses/ that "GPL-2.0" is
> deprecated while "GPL-2.0-only" (and others) is appropriate.
> Was there any mention in the thread or other conversations if/when the
> kernel's documentation (and all existing uses of "GPL-2.0" in the
> kernel) will be updated to "GPL-2.0-only"?
Not as far as I know.
I believe the uses of "-only" and "-or-later" are
superior to the current styles and should be done
sooner rather than later.
I trust at some point the documentation will be
updated and something like the script I submitted
to do the conversions will be run across the tree.
At that point, checkpatch will be updated to do
appropriate specific license checking.
> >
> > :-( I had it originally as GPL-2.0 and then it was pointed out that it
> > was being deprecated, so rather than creating future thrash over the
> > change, figured I would be ahead of the game.
> >
> > >
> > > > diff --git a/drivers/net/ethernet/intel/e100.c
> > > > b/drivers/net/ethernet/intel/e100.c
> > >
> > > []
> > > > @@ -1,3 +1,4 @@
> > > > +// SPDX-License-Identifier: GPL-2.0-only
> > >
> > > etc...
^ permalink raw reply
* RE: [net-next] intel: add SPDX identifiers to all the Intel drivers
From: Allan, Bruce W @ 2018-03-20 20:48 UTC (permalink / raw)
To: Kirsher, Jeffrey T, Joe Perches, davem@davemloft.net,
Philippe Ombredanne
Cc: netdev@vger.kernel.org, nhorman@redhat.com, sassmann@redhat.com,
jogreene@redhat.com
In-Reply-To: <1521568343.12746.4.camel@intel.com>
> -----Original Message-----
> From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org]
> On Behalf Of Jeff Kirsher
> Sent: Tuesday, March 20, 2018 10:52 AM
> To: Joe Perches <joe@perches.com>; davem@davemloft.net; Philippe
> Ombredanne <pombredanne@nexb.com>
> Cc: netdev@vger.kernel.org; nhorman@redhat.com; sassmann@redhat.com;
> jogreene@redhat.com
> Subject: Re: [net-next] intel: add SPDX identifiers to all the Intel drivers
>
> On Tue, 2018-03-20 at 10:41 -0700, Joe Perches wrote:
> > On Tue, 2018-03-20 at 10:13 -0700, Jeff Kirsher wrote:
> > > Add the SPDX identifiers to all the Intel wired LAN driver files,
> > > as
> > > outlined in Documentation/process/license-rules.rst.
> >
> > So far the Documentation does not show using the -only variant.
> >
> > For a discussion, please see:
> > https://lkml.org/lkml/2018/2/8/311
But the Linux Foundation, the authority maintaining the valid SPDX identifiers, indicates at https://spdx.org/licenses/ that "GPL-2.0" is deprecated while "GPL-2.0-only" (and others) is appropriate.
Was there any mention in the thread or other conversations if/when the kernel's documentation (and all existing uses of "GPL-2.0" in the kernel) will be updated to "GPL-2.0-only"?
>
> :-( I had it originally as GPL-2.0 and then it was pointed out that it
> was being deprecated, so rather than creating future thrash over the
> change, figured I would be ahead of the game.
>
> >
> > > diff --git a/drivers/net/ethernet/intel/e100.c
> > > b/drivers/net/ethernet/intel/e100.c
> >
> > []
> > > @@ -1,3 +1,4 @@
> > > +// SPDX-License-Identifier: GPL-2.0-only
> >
> > etc...
^ permalink raw reply
* Re: [RFC 2/2] page_frag_cache: Store metadata in struct page
From: Alexander Duyck @ 2018-03-20 20:47 UTC (permalink / raw)
To: Matthew Wilcox; +Cc: Alexander Duyck, linux-mm, Netdev, Matthew Wilcox
In-Reply-To: <20180315195329.7787-3-willy@infradead.org>
On Thu, Mar 15, 2018 at 12:53 PM, Matthew Wilcox <willy@infradead.org> wrote:
> From: Matthew Wilcox <mawilcox@microsoft.com>
>
> Shrink page_frag_cache from 24 to 8 bytes (a single pointer to the
> currently-in-use struct page) by using the page's refcount directly
> (instead of maintaining a bias) and storing our current progress through
> the page in the same bits currently used for page->index. We no longer
> need to reflect the page pfmemalloc state if we're storing the page
> directly.
>
> On the downside, we now call page_address() on every allocation, and we
> do an atomic_inc() rather than a non-atomic decrement, but we should
> touch the same number of cachelines and there is far less code (and
> the code is less complex).
>
> Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
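The idea in the quoted description (a cache that is a single pointer, with the reference count and the allocation progress kept in the page itself) can be modelled in userspace roughly as follows. This is a sketch under stated assumptions, not the RFC's code: the `toy_` types are hypothetical, malloc() stands in for page allocation, counting the remaining space downward is an illustrative choice, and the empty-cache check is deliberately placed before any page field is read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_PAGE_SIZE 4096u

struct toy_page {
	unsigned int refcount;  /* cache's own reference plus one per fragment */
	unsigned int remaining; /* bytes not yet handed out, counted down */
	unsigned char data[TOY_PAGE_SIZE];
};

struct toy_frag_cache {
	struct toy_page *page;  /* the only field: one pointer */
};

static void toy_page_put(struct toy_page *p)
{
	if (--p->refcount == 0)
		free(p);
}

static void *toy_frag_alloc(struct toy_frag_cache *fc, unsigned int size)
{
	struct toy_page *p = fc->page;

	if (size > TOY_PAGE_SIZE)
		return NULL;

	/* Test for an empty cache before touching any field of the page. */
	if (!p || p->remaining < size) {
		if (p)
			toy_page_put(p); /* drop the cache's reference */
		p = malloc(sizeof(*p));
		fc->page = p;
		if (!p)
			return NULL;
		p->refcount = 1;
		p->remaining = TOY_PAGE_SIZE;
	}

	p->remaining -= size;
	p->refcount++; /* each fragment holds its own reference */
	return p->data + p->remaining;
}
```

Two 1024-byte allocations from a fresh cache land in the same toy page, 1024 bytes apart, and leave the reference count at three (the cache plus two fragments).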
So I went to test the patches and it looks like I am running into
something similar to what the kbuild test robot was reporting.
For my test I was going to just do a simple routing setup with an igb
pf trying to route packets to a netns with a pair of VFs on two
different subnets. As soon as I brought the VFs up in the namespace
they crashed with the following errors:
[ 270.169580] BUG: unable to handle kernel NULL pointer dereference
at 0000000000000010
[ 270.177473] IP: page_frag_alloc+0xb/0x60
[ 270.181415] PGD 0 P4D 0
[ 270.183966] Oops: 0000 [#1] SMP PTI
[ 270.187476] Modules linked in: igbvf tun bridge stp llc vfat fat
intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp
kvm_intel kvm irqbypass crct10dif_pclmul nfsd crc32_pclmul
ghash_clmulni_intel ipmi_ssif iTCO_wdt pcbc mei_me ipmi_si
iTCO_vendor_support aesni_intel crypto_simd glue_helper mei cryptd
mxm_wmi wmi ioatdma ipmi_devintf i2c_i801 sg pcspkr shpchp lpc_ich
mfd_core ipmi_msghandler acpi_power_meter acpi_pad auth_rpcgss nfs_acl
lockd grace sunrpc binfmt_misc xfs libcrc32c sr_mod cdrom sd_mod ast
drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops igb ttm
ixgbe i40e drm ahci mdio libahci ptp crc32c_intel libata pps_core
i2c_algo_bit dca i2c_core dm_mirror dm_region_hash dm_log dm_mod dax
[ 270.251286] CPU: 1 PID: 3638 Comm: ifconfig Not tainted 4.16.0-rc4+ #66
[ 270.257937] Hardware name: Supermicro X10DRH/X10DRH-i, BIOS 2.0a 06/30/2016
[ 270.264934] RIP: 0010:page_frag_alloc+0xb/0x60
[ 270.269402] RSP: 0018:ffffc900093bbb28 EFLAGS: 00010086
[ 270.274653] RAX: 0000000000000202 RBX: 0000000000000780 RCX: 0000000000000000
[ 270.281820] RDX: 0000000001080020 RSI: 0000000000000780 RDI: ffff88085f2610f0
[ 270.290557] RBP: ffff88085f2610f0 R08: 0000000001080020 R09: 000000000000030f
[ 270.299196] R10: 0000000000000001 R11: 0000000000000018 R12: 0000000000000780
[ 270.307828] R13: 0000000000000202 R14: 0000000000000000 R15: ffff88084e9de880
[ 270.316461] FS: 00007f135724a740(0000) GS:ffff88085f240000(0000)
knlGS:0000000000000000
[ 270.326034] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 270.333226] CR2: 0000000000000010 CR3: 0000000852612001 CR4: 00000000003606e0
[ 270.341817] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 270.350393] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 270.358974] Call Trace:
[ 270.362869] __netdev_alloc_skb+0x9b/0x100
[ 270.368420] igbvf_alloc_rx_buffers+0x23a/0x3d0 [igbvf]
[ 270.375097] igbvf_open+0x63/0x130 [igbvf]
[ 270.380627] __dev_open+0xcb/0x150
[ 270.385461] __dev_change_flags+0x1a4/0x1f0
[ 270.391067] ? netdev_run_todo+0x62/0x330
[ 270.396484] dev_change_flags+0x23/0x60
[ 270.401711] devinet_ioctl+0x63c/0x710
[ 270.406839] inet_ioctl+0x93/0x170
[ 270.411608] ? dev_get_by_name_rcu+0x66/0x80
[ 270.417239] sock_do_ioctl+0x3d/0x130
[ 270.422255] sock_ioctl+0x1f8/0x310
[ 270.427093] do_vfs_ioctl+0xa6/0x5f0
[ 270.432016] ? handle_mm_fault+0xfa/0x210
[ 270.437368] SyS_ioctl+0x74/0x80
[ 270.441929] do_syscall_64+0x6e/0x1a0
[ 270.446910] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[ 270.453295] RIP: 0033:0x7f1356d81107
[ 270.458198] RSP: 002b:00007ffed6fc6888 EFLAGS: 00000202 ORIG_RAX:
0000000000000010
[ 270.467125] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f1356d81107
[ 270.475626] RDX: 00007ffed6fc6890 RSI: 0000000000008914 RDI: 0000000000000004
[ 270.484145] RBP: 00007ffed6fc6960 R08: 0000000000000005 R09: 00007ffed6fc6a9c
[ 270.492655] R10: 0000000000000000 R11: 0000000000000202 R12: 0000558d817b9b5c
[ 270.501137] R13: 0000000000000041 R14: 0000000000000000 R15: 0000000000000000
[ 270.509606] Code: 89 ef e8 75 1e 00 00 49 89 c5 e9 94 fe ff ff 40
80 e5 3f eb d9 4c 89 74 24 08 eb de 0f 1f 40 00 0f 1f 44 00 00 48 8b
0f 53 89 f3 <8b> 41 10 39 f0 72 35 48 85 c9 74 30 f0 ff 41 1c 48 ba 00
00 00
[ 270.531290] RIP: page_frag_alloc+0xb/0x60 RSP: ffffc900093bbb28
[ 270.538584] CR2: 0000000000000010
[ 270.543301] ---[ end trace ce66eb444de36915 ]---
[ 270.554539] Kernel panic - not syncing: Fatal exception
[ 270.561129] Kernel Offset: disabled
[ 270.571026] ---[ end Kernel panic - not syncing: Fatal exception
[ 270.578258] WARNING: CPU: 1 PID: 3638 at kernel/sched/core.c:1189
set_task_cpu+0x184/0x190
[ 270.587728] Modules linked in: igbvf tun bridge stp llc vfat fat
intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp
kvm_intel kvm irqbypass crct10dif_pclmul nfsd crc32_pclmul
ghash_clmulni_intel ipmi_ssif iTCO_wdt pcbc mei_me ipmi_si
iTCO_vendor_support aesni_intel crypto_simd glue_helper mei cryptd
mxm_wmi wmi ioatdma ipmi_devintf i2c_i801 sg pcspkr shpchp lpc_ich
mfd_core ipmi_msghandler acpi_power_meter acpi_pad auth_rpcgss nfs_acl
lockd grace sunrpc binfmt_misc xfs libcrc32c sr_mod cdrom sd_mod ast
drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops igb ttm
ixgbe i40e drm ahci mdio libahci ptp crc32c_intel libata pps_core
i2c_algo_bit dca i2c_core dm_mirror dm_region_hash dm_log dm_mod dax
[ 270.659174] CPU: 1 PID: 3638 Comm: ifconfig Tainted: G D
4.16.0-rc4+ #66
[ 270.668522] Hardware name: Supermicro X10DRH/X10DRH-i, BIOS 2.0a 06/30/2016
[ 270.676928] RIP: 0010:set_task_cpu+0x184/0x190
[ 270.682802] RSP: 0018:ffff88085f243ce0 EFLAGS: 00010046
[ 270.689449] RAX: 0000000000000200 RBX: ffff8808502eaa00 RCX: 00000fffffc00001
[ 270.698011] RDX: 0000000000000001 RSI: 0000000000000016 RDI: ffff8808502eaa00
[ 270.706583] RBP: 0000000000021e80 R08: 0000000000000000 R09: 0000000000000000
[ 270.715162] R10: 0000000000000000 R11: 00000000197d4f00 R12: 0000000000000016
[ 270.723724] R13: 0000000000000016 R14: 0000000000000004 R15: 0000000000000016
[ 270.732278] FS: 00007f135724a740(0000) GS:ffff88085f240000(0000)
knlGS:0000000000000000
[ 270.741801] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 270.748985] CR2: 0000000000000010 CR3: 0000000852612001 CR4: 00000000003606e0
[ 270.757582] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 270.766179] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 270.774774] Call Trace:
[ 270.778676] <IRQ>
[ 270.782144] try_to_wake_up+0x15d/0x430
[ 270.787441] __wake_up_common+0x8a/0x150
[ 270.792823] ep_poll_callback+0xc4/0x2f0
[ 270.798203] __wake_up_common+0x8a/0x150
[ 270.803585] __wake_up_common_lock+0x7a/0xc0
[ 270.809305] irq_work_run_list+0x46/0x70
[ 270.814688] ? tick_sched_do_timer+0x60/0x60
[ 270.820413] update_process_times+0x3b/0x50
[ 270.826054] tick_sched_handle+0x26/0x60
[ 270.831426] tick_sched_timer+0x34/0x70
[ 270.836704] __hrtimer_run_queues+0xf9/0x260
[ 270.842397] hrtimer_interrupt+0x122/0x270
[ 270.847911] smp_apic_timer_interrupt+0x56/0x120
[ 270.853942] apic_timer_interrupt+0xf/0x20
[ 270.859450] </IRQ>
[ 270.862956] RIP: 0010:panic+0x1fa/0x23c
[ 270.868202] RSP: 0018:ffffc900093bb8e0 EFLAGS: 00000246 ORIG_RAX:
ffffffffffffff12
[ 270.877170] RAX: 0000000000000034 RBX: 0000000000000000 RCX: 0000000000000006
[ 270.885691] RDX: 0000000000000000 RSI: 0000000000000086 RDI: ffff88085f256500
[ 270.894182] RBP: ffffc900093bb950 R08: 0000000000000000 R09: 0000000000000611
[ 270.902667] R10: 00000000000003ff R11: 0000000000aaaaaa R12: ffffffff81e243ce
[ 270.911149] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000001
[ 270.919631] oops_end+0xb0/0xc0
[ 270.924128] no_context+0x1a9/0x3f0
[ 270.928954] __do_page_fault+0x97/0x4c0
[ 270.934104] ? update_curr+0x6d/0x190
[ 270.939039] do_page_fault+0x32/0x130
[ 270.943928] page_fault+0x25/0x50
[ 270.948418] RIP: 0010:page_frag_alloc+0xb/0x60
[ 270.953996] RSP: 0018:ffffc900093bbb28 EFLAGS: 00010086
[ 270.960320] RAX: 0000000000000202 RBX: 0000000000000780 RCX: 0000000000000000
[ 270.968528] RDX: 0000000001080020 RSI: 0000000000000780 RDI: ffff88085f2610f0
[ 270.976718] RBP: ffff88085f2610f0 R08: 0000000001080020 R09: 000000000000030f
[ 270.984888] R10: 0000000000000001 R11: 0000000000000018 R12: 0000000000000780
[ 270.993051] R13: 0000000000000202 R14: 0000000000000000 R15: ffff88084e9de880
[ 271.001214] __netdev_alloc_skb+0x9b/0x100
[ 271.006324] igbvf_alloc_rx_buffers+0x23a/0x3d0 [igbvf]
[ 271.012564] igbvf_open+0x63/0x130 [igbvf]
[ 271.017677] __dev_open+0xcb/0x150
[ 271.022087] __dev_change_flags+0x1a4/0x1f0
[ 271.027286] ? netdev_run_todo+0x62/0x330
[ 271.032295] dev_change_flags+0x23/0x60
[ 271.037112] devinet_ioctl+0x63c/0x710
[ 271.041829] inet_ioctl+0x93/0x170
[ 271.046187] ? dev_get_by_name_rcu+0x66/0x80
[ 271.051420] sock_do_ioctl+0x3d/0x130
[ 271.056032] sock_ioctl+0x1f8/0x310
[ 271.060476] do_vfs_ioctl+0xa6/0x5f0
[ 271.064999] ? handle_mm_fault+0xfa/0x210
[ 271.069956] SyS_ioctl+0x74/0x80
[ 271.074133] do_syscall_64+0x6e/0x1a0
[ 271.078752] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[ 271.084774] RIP: 0033:0x7f1356d81107
[ 271.089322] RSP: 002b:00007ffed6fc6888 EFLAGS: 00000202 ORIG_RAX:
0000000000000010
[ 271.097894] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f1356d81107
[ 271.106039] RDX: 00007ffed6fc6890 RSI: 0000000000008914 RDI: 0000000000000004
[ 271.114193] RBP: 00007ffed6fc6960 R08: 0000000000000005 R09: 00007ffed6fc6a9c
[ 271.122347] R10: 0000000000000000 R11: 0000000000000202 R12: 0000558d817b9b5c
[ 271.130511] R13: 0000000000000041 R14: 0000000000000000 R15: 0000000000000000
[ 271.138668] Code: ff 80 8b ec 03 00 00 04 e9 2b ff ff ff 0f 0b e9
c7 fe ff ff f7 83 88 00 00 00 fd ff ff ff 0f 84 d1 fe ff ff 0f 0b e9
ca fe ff ff <0f> 0b e9 d9 fe ff ff 0f 1f 44 00 00 0f 1f 44 00 00 41 55
49 89
[ 271.159723] ---[ end trace ce66eb444de36916 ]---
[ 271.165446] ------------[ cut here ]------------
[ 271.171172] sched: Unexpected reschedule of offline CPU#22!
[ 271.177852] WARNING: CPU: 1 PID: 3638 at arch/x86/kernel/smp.c:128
native_smp_send_reschedule+0x36/0x40
[ 271.188373] Modules linked in: igbvf tun bridge stp llc vfat fat
intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp
kvm_intel kvm irqbypass crct10dif_pclmul nfsd crc32_pclmul
ghash_clmulni_intel ipmi_ssif iTCO_wdt pcbc mei_me ipmi_si
iTCO_vendor_support aesni_intel crypto_simd glue_helper mei cryptd
mxm_wmi wmi ioatdma ipmi_devintf i2c_i801 sg pcspkr shpchp lpc_ich
mfd_core ipmi_msghandler acpi_power_meter acpi_pad auth_rpcgss nfs_acl
lockd grace sunrpc binfmt_misc xfs libcrc32c sr_mod cdrom sd_mod ast
drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops igb ttm
ixgbe i40e drm ahci mdio libahci ptp crc32c_intel libata pps_core
i2c_algo_bit dca i2c_core dm_mirror dm_region_hash dm_log dm_mod dax
[ 271.259471] CPU: 1 PID: 3638 Comm: ifconfig Tainted: G D W
4.16.0-rc4+ #66
[ 271.268767] Hardware name: Supermicro X10DRH/X10DRH-i, BIOS 2.0a 06/30/2016
[ 271.277133] RIP: 0010:native_smp_send_reschedule+0x36/0x40
[ 271.284039] RSP: 0018:ffff88085f243d00 EFLAGS: 00010086
[ 271.290686] RAX: 0000000000000000 RBX: ffff8808502eaa00 RCX: 0000000000000006
[ 271.299351] RDX: 0000000000000000 RSI: 0000000000000096 RDI: ffff88085f256500
[ 271.307929] RBP: 0000000000021e80 R08: 0000000000000000 R09: 0000000000000662
[ 271.316502] R10: 00000000000003ff R11: 0000000000aaaaaa R12: ffff8808502eb0ec
[ 271.325072] R13: 0000000000000046 R14: 0000000000000001 R15: 0000000000000016
[ 271.333633] FS: 00007f135724a740(0000) GS:ffff88085f240000(0000)
knlGS:0000000000000000
[ 271.343175] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 271.350385] CR2: 0000000000000010 CR3: 0000000852612001 CR4: 00000000003606e0
[ 271.358980] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 271.367570] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 271.376147] Call Trace:
[ 271.380039] <IRQ>
[ 271.383499] try_to_wake_up+0x3c9/0x430
[ 271.388786] __wake_up_common+0x8a/0x150
[ 271.394158] ep_poll_callback+0xc4/0x2f0
[ 271.399533] __wake_up_common+0x8a/0x150
[ 271.404905] __wake_up_common_lock+0x7a/0xc0
[ 271.410617] irq_work_run_list+0x46/0x70
[ 271.415981] ? tick_sched_do_timer+0x60/0x60
[ 271.421700] update_process_times+0x3b/0x50
[ 271.427331] tick_sched_handle+0x26/0x60
[ 271.432705] tick_sched_timer+0x34/0x70
[ 271.437990] __hrtimer_run_queues+0xf9/0x260
[ 271.443711] hrtimer_interrupt+0x122/0x270
[ 271.449250] smp_apic_timer_interrupt+0x56/0x120
[ 271.455316] apic_timer_interrupt+0xf/0x20
[ 271.460842] </IRQ>
[ 271.464343] RIP: 0010:panic+0x1fa/0x23c
[ 271.469585] RSP: 0018:ffffc900093bb8e0 EFLAGS: 00000246 ORIG_RAX:
ffffffffffffff12
[ 271.478588] RAX: 0000000000000034 RBX: 0000000000000000 RCX: 0000000000000006
[ 271.487168] RDX: 0000000000000000 RSI: 0000000000000086 RDI: ffff88085f256500
[ 271.495748] RBP: ffffc900093bb950 R08: 0000000000000000 R09: 0000000000000611
[ 271.504291] R10: 00000000000003ff R11: 0000000000aaaaaa R12: ffffffff81e243ce
[ 271.512809] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000001
[ 271.521316] oops_end+0xb0/0xc0
[ 271.525829] no_context+0x1a9/0x3f0
[ 271.530674] __do_page_fault+0x97/0x4c0
[ 271.535838] ? update_curr+0x6d/0x190
[ 271.540785] do_page_fault+0x32/0x130
[ 271.545681] page_fault+0x25/0x50
[ 271.550190] RIP: 0010:page_frag_alloc+0xb/0x60
[ 271.555785] RSP: 0018:ffffc900093bbb28 EFLAGS: 00010086
[ 271.562118] RAX: 0000000000000202 RBX: 0000000000000780 RCX: 0000000000000000
[ 271.570343] RDX: 0000000001080020 RSI: 0000000000000780 RDI: ffff88085f2610f0
[ 271.578540] RBP: ffff88085f2610f0 R08: 0000000001080020 R09: 000000000000030f
[ 271.586730] R10: 0000000000000001 R11: 0000000000000018 R12: 0000000000000780
[ 271.594902] R13: 0000000000000202 R14: 0000000000000000 R15: ffff88084e9de880
[ 271.603071] __netdev_alloc_skb+0x9b/0x100
[ 271.608191] igbvf_alloc_rx_buffers+0x23a/0x3d0 [igbvf]
[ 271.614449] igbvf_open+0x63/0x130 [igbvf]
[ 271.619578] __dev_open+0xcb/0x150
[ 271.623998] __dev_change_flags+0x1a4/0x1f0
[ 271.629197] ? netdev_run_todo+0x62/0x330
[ 271.634205] dev_change_flags+0x23/0x60
[ 271.639014] devinet_ioctl+0x63c/0x710
[ 271.643722] inet_ioctl+0x93/0x170
[ 271.648080] ? dev_get_by_name_rcu+0x66/0x80
[ 271.653315] sock_do_ioctl+0x3d/0x130
[ 271.657926] sock_ioctl+0x1f8/0x310
[ 271.662360] do_vfs_ioctl+0xa6/0x5f0
[ 271.666886] ? handle_mm_fault+0xfa/0x210
[ 271.671849] SyS_ioctl+0x74/0x80
[ 271.676029] do_syscall_64+0x6e/0x1a0
[ 271.680646] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[ 271.686666] RIP: 0033:0x7f1356d81107
[ 271.691268] RSP: 002b:00007ffed6fc6888 EFLAGS: 00000202 ORIG_RAX:
0000000000000010
[ 271.699838] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f1356d81107
[ 271.707985] RDX: 00007ffed6fc6890 RSI: 0000000000008914 RDI: 0000000000000004
[ 271.716130] RBP: 00007ffed6fc6960 R08: 0000000000000005 R09: 00007ffed6fc6a9c
[ 271.724286] R10: 0000000000000000 R11: 0000000000000202 R12: 0000558d817b9b5c
[ 271.732449] R13: 0000000000000041 R14: 0000000000000000 R15: 0000000000000000
[ 271.740603] Code: f1 18 0e 01 0f 92 c0 84 c0 74 12 48 8b 05 33 03
e8 00 be fd 00 00 00 48 8b 40 30 ff e0 89 fe 48 c7 c7 18 ba e2 81 e8
0a 1c 03 00 <0f> 0b c3 0f 1f 80 00 00 00 00 0f 1f 44 00 00 8b 05 ad 22
57 01
[ 271.761653] ---[ end trace ce66eb444de36917 ]---
[ 271.767406] ------------[ cut here ]------------
[ 271.772606] sched: Unexpected reschedule of offline CPU#0!
[ 271.778578] WARNING: CPU: 1 PID: 3638 at arch/x86/kernel/smp.c:128
native_smp_send_reschedule+0x36/0x40
[ 271.788454] Modules linked in: igbvf tun bridge stp llc vfat fat
intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp
kvm_intel kvm irqbypass crct10dif_pclmul nfsd crc32_pclmul
ghash_clmulni_intel ipmi_ssif iTCO_wdt pcbc mei_me ipmi_si
iTCO_vendor_support aesni_intel crypto_simd glue_helper mei cryptd
mxm_wmi wmi ioatdma ipmi_devintf i2c_i801 sg pcspkr shpchp lpc_ich
mfd_core ipmi_msghandler acpi_power_meter acpi_pad auth_rpcgss nfs_acl
lockd grace sunrpc binfmt_misc xfs libcrc32c sr_mod cdrom sd_mod ast
drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops igb ttm
ixgbe i40e drm ahci mdio libahci ptp crc32c_intel libata pps_core
i2c_algo_bit dca i2c_core dm_mirror dm_region_hash dm_log dm_mod dax
[ 271.855174] CPU: 1 PID: 3638 Comm: ifconfig Tainted: G D W
4.16.0-rc4+ #66
[ 271.863755] Hardware name: Supermicro X10DRH/X10DRH-i, BIOS 2.0a 06/30/2016
[ 271.871390] RIP: 0010:native_smp_send_reschedule+0x36/0x40
[ 271.877541] RSP: 0018:ffff88085f243bf0 EFLAGS: 00010086
[ 271.883391] RAX: 0000000000000000 RBX: ffff88085f221e80 RCX: 0000000000000006
[ 271.891147] RDX: 0000000000000000 RSI: 0000000000000092 RDI: ffff88085f256500
[ 271.898980] RBP: ffff88085f221e80 R08: 0000000000000000 R09: 00000000000006b3
[ 271.906728] R10: 00000000000003ff R11: 0000000000aaaaaa R12: ffff88085a671c00
[ 271.914476] R13: ffff88085f243c38 R14: 0000000000000000 R15: 0000000000000000
[ 271.922222] FS: 00007f135724a740(0000) GS:ffff88085f240000(0000)
knlGS:0000000000000000
[ 271.930933] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 271.937310] CR2: 0000000000000010 CR3: 0000000852612001 CR4: 00000000003606e0
[ 271.945065] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 271.952823] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 271.960622] Call Trace:
[ 271.963715] <IRQ>
[ 271.966351] check_preempt_curr+0x74/0xa0
[ 271.971004] ttwu_do_wakeup+0x19/0x140
[ 271.975373] try_to_wake_up+0x1cd/0x430
[ 271.979838] ? acpi_hw_write_port+0x2c/0x91
[ 271.984645] __queue_work+0x12c/0x3a0
[ 271.988935] ? acpi_ev_asynch_enable_gpe+0x32/0x32
[ 271.994376] queue_work_on+0x24/0x40
[ 271.998599] acpi_os_execute+0x8b/0xf0
[ 272.002982] acpi_ev_gpe_dispatch+0xe0/0x12d
[ 272.007941] acpi_ev_gpe_detect+0x16d/0x1cf
[ 272.012758] acpi_ev_sci_xrupt_handler+0x1c/0x31
[ 272.018000] acpi_irq+0x12/0x30
[ 272.021772] __handle_irq_event_percpu+0x3d/0x190
[ 272.027100] handle_irq_event_percpu+0x30/0x70
[ 272.032203] handle_irq_event+0x39/0x60
[ 272.036649] handle_fasteoi_irq+0x84/0x130
[ 272.041388] handle_irq+0xa7/0x130
[ 272.045403] do_IRQ+0x43/0xc0
[ 272.048972] common_interrupt+0xf/0xf
[ 272.053235] RIP: 0010:__do_softirq+0x6f/0x26c
[ 272.058175] RSP: 0018:ffff88085f243f78 EFLAGS: 00000206 ORIG_RAX:
ffffffffffffffde
[ 272.066328] RAX: ffff880829071c00 RBX: ffff88085f255f40 RCX: 0000000000000282
[ 272.074050] RDX: 0000000000000424 RSI: 00000000f6c088cd RDI: 00000000000006e0
[ 272.081781] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000001
[ 272.089492] R10: 0000000000000004 R11: 0000000000000005 R12: 0000000000000000
[ 272.097283] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 272.104971] ? common_interrupt+0xa/0xf
[ 272.109332] irq_exit+0xbe/0xd0
[ 272.112978] smp_apic_timer_interrupt+0x60/0x120
[ 272.118161] apic_timer_interrupt+0xf/0x20
[ 272.122806] </IRQ>
[ 272.125363] RIP: 0010:panic+0x1fa/0x23c
[ 272.129668] RSP: 0018:ffffc900093bb8e0 EFLAGS: 00000246 ORIG_RAX:
ffffffffffffff12
[ 272.137685] RAX: 0000000000000034 RBX: 0000000000000000 RCX: 0000000000000006
[ 272.145259] RDX: 0000000000000000 RSI: 0000000000000086 RDI: ffff88085f256500
[ 272.152832] RBP: ffffc900093bb950 R08: 0000000000000000 R09: 0000000000000611
[ 272.160407] R10: 00000000000003ff R11: 0000000000aaaaaa R12: ffffffff81e243ce
[ 272.167979] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000001
[ 272.175547] oops_end+0xb0/0xc0
[ 272.179126] no_context+0x1a9/0x3f0
[ 272.183051] __do_page_fault+0x97/0x4c0
[ 272.187378] ? update_curr+0x6d/0x190
[ 272.191474] do_page_fault+0x32/0x130
[ 272.195547] page_fault+0x25/0x50
[ 272.199266] RIP: 0010:page_frag_alloc+0xb/0x60
[ 272.204118] RSP: 0018:ffffc900093bbb28 EFLAGS: 00010086
[ 272.209752] RAX: 0000000000000202 RBX: 0000000000000780 RCX: 0000000000000000
[ 272.217307] RDX: 0000000001080020 RSI: 0000000000000780 RDI: ffff88085f2610f0
[ 272.224854] RBP: ffff88085f2610f0 R08: 0000000001080020 R09: 000000000000030f
[ 272.232412] R10: 0000000000000001 R11: 0000000000000018 R12: 0000000000000780
[ 272.239967] R13: 0000000000000202 R14: 0000000000000000 R15: ffff88084e9de880
[ 272.247535] __netdev_alloc_skb+0x9b/0x100
[ 272.252059] igbvf_alloc_rx_buffers+0x23a/0x3d0 [igbvf]
[ 272.257760] igbvf_open+0x63/0x130 [igbvf]
[ 272.262284] __dev_open+0xcb/0x150
[ 272.266115] __dev_change_flags+0x1a4/0x1f0
[ 272.270725] ? netdev_run_todo+0x62/0x330
[ 272.275171] dev_change_flags+0x23/0x60
[ 272.279434] devinet_ioctl+0x63c/0x710
[ 272.283611] inet_ioctl+0x93/0x170
[ 272.287434] ? dev_get_by_name_rcu+0x66/0x80
[ 272.292157] sock_do_ioctl+0x3d/0x130
[ 272.296246] sock_ioctl+0x1f8/0x310
[ 272.300198] do_vfs_ioctl+0xa6/0x5f0
[ 272.304203] ? handle_mm_fault+0xfa/0x210
[ 272.308638] SyS_ioctl+0x74/0x80
[ 272.312295] do_syscall_64+0x6e/0x1a0
[ 272.316386] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[ 272.321861] RIP: 0033:0x7f1356d81107
[ 272.325874] RSP: 002b:00007ffed6fc6888 EFLAGS: 00000202 ORIG_RAX:
0000000000000010
[ 272.333865] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f1356d81107
[ 272.341447] RDX: 00007ffed6fc6890 RSI: 0000000000008914 RDI: 0000000000000004
[ 272.349038] RBP: 00007ffed6fc6960 R08: 0000000000000005 R09: 00007ffed6fc6a9c
[ 272.356612] R10: 0000000000000000 R11: 0000000000000202 R12: 0000558d817b9b5c
[ 272.364177] R13: 0000000000000041 R14: 0000000000000000 R15: 0000000000000000
[ 272.371744] Code: f1 18 0e 01 0f 92 c0 84 c0 74 12 48 8b 05 33 03
e8 00 be fd 00 00 00 48 8b 40 30 ff e0 89 fe 48 c7 c7 18 ba e2 81 e8
0a 1c 03 00 <0f> 0b c3 0f 1f 80 00 00 00 00 0f 1f 44 00 00 8b 05 ad 22
57 01
[ 272.391554] ---[ end trace ce66eb444de36918 ]---
^ permalink raw reply
* Re: [PATCH net-next 01/14] tcp: Add clean acked data hook
From: Rao Shoaib @ 2018-03-20 20:36 UTC (permalink / raw)
To: Saeed Mahameed, David S. Miller
Cc: netdev, Dave Watson, Boris Pismenny, Ilya Lesokhin,
Aviad Yehezkel
In-Reply-To: <20180320024510.7408-2-saeedm@mellanox.com>
On 03/19/2018 07:44 PM, Saeed Mahameed wrote:
> From: Ilya Lesokhin <ilyal@mellanox.com>
>
> Called when a TCP segment is acknowledged.
> Could be used by application protocols who hold additional
> metadata associated with the stream data.
>
> This is required by TLS device offload to release
> metadata associated with acknowledged TLS records.
>
> Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
> Signed-off-by: Boris Pismenny <borisp@mellanox.com>
> Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
> ---
> include/net/inet_connection_sock.h | 2 ++
> net/ipv4/tcp_input.c | 2 ++
> 2 files changed, 4 insertions(+)
>
> diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h
> index b68fea022a82..2ab6667275df 100644
> --- a/include/net/inet_connection_sock.h
> +++ b/include/net/inet_connection_sock.h
> @@ -77,6 +77,7 @@ struct inet_connection_sock_af_ops {
> * @icsk_af_ops Operations which are AF_INET{4,6} specific
> * @icsk_ulp_ops Pluggable ULP control hook
> * @icsk_ulp_data ULP private data
> + * @icsk_clean_acked Clean acked data hook
> * @icsk_listen_portaddr_node hash to the portaddr listener hashtable
> * @icsk_ca_state: Congestion control state
> * @icsk_retransmits: Number of unrecovered [RTO] timeouts
> @@ -102,6 +103,7 @@ struct inet_connection_sock {
> const struct inet_connection_sock_af_ops *icsk_af_ops;
> const struct tcp_ulp_ops *icsk_ulp_ops;
> void *icsk_ulp_data;
> + void (*icsk_clean_acked)(struct sock *sk, u32 acked_seq);
> struct hlist_node icsk_listen_portaddr_node;
> unsigned int (*icsk_sync_mss)(struct sock *sk, u32 pmtu);
> __u8 icsk_ca_state:6,
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index 451ef3012636..9854ecae7245 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -3542,6 +3542,8 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
> if (after(ack, prior_snd_una)) {
> flag |= FLAG_SND_UNA_ADVANCED;
> icsk->icsk_retransmits = 0;
> + if (icsk->icsk_clean_acked)
> + icsk->icsk_clean_acked(sk, ack);
> }
>
> prior_fack = tcp_is_sack(tp) ? tcp_highest_sack_seq(tp) : tp->snd_una;
Per Dave, we are not allowed to use function pointers any more, so why
extend their use? I implemented a similar callback for my changes, but in
my use case I need to call the metadata update function even when the
packet does not ack any new data or has no payload. Would it be possible
to move this to, say, tcp_data_queue()?
Thanks,
Shoaib
^ permalink raw reply
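The behaviour under discussion can be sketched in userspace as follows. This is only an illustration, not the kernel code: `struct demo_conn`, `demo_seq_after()`, `demo_record_acked()` and `demo_handle_ack()` are hypothetical names, and updating `snd_una` inside the handler is a simplification. It shows an optional hook that runs only when the cumulative ACK advances, using a wraparound-safe sequence comparison in the spirit of `after()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for per-connection state; not the kernel struct. */
struct demo_conn {
	uint32_t snd_una;	/* oldest unacknowledged sequence number */
	/* Optional hook, left NULL when no upper layer needs it. */
	void (*clean_acked)(struct demo_conn *c, uint32_t acked_seq);
	uint32_t last_cleaned;	/* written by the demo callback below */
	int calls;		/* number of times the hook ran */
};

/* Wraparound-safe "seq1 is after seq2" on 32-bit sequence numbers. */
static int demo_seq_after(uint32_t seq1, uint32_t seq2)
{
	return (int32_t)(seq2 - seq1) < 0;
}

/* Example hook: remember the last acknowledged sequence number. */
static void demo_record_acked(struct demo_conn *c, uint32_t acked_seq)
{
	c->last_cleaned = acked_seq;
	c->calls++;
}

/* Returns 1 if the ACK advanced snd_una, 0 otherwise. */
static int demo_handle_ack(struct demo_conn *c, uint32_t ack)
{
	assert(c != NULL);
	if (!demo_seq_after(ack, c->snd_una))
		return 0;		/* duplicate or old ACK: hook is not called */
	c->snd_una = ack;		/* simplification for this sketch */
	if (c->clean_acked)		/* the hook is optional */
		c->clean_acked(c, ack);
	return 1;
}
```

In this sketch a duplicate ACK never reaches the hook, which is the limitation raised in the reply above for a use case that needs the callback on every packet.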
* [PATCH iproute2 v2 9/9] bpf: avoid compiler warnings about strncpy
From: Stephen Hemminger @ 2018-03-20 20:29 UTC (permalink / raw)
To: netdev; +Cc: Stephen Hemminger
In-Reply-To: <20180320202909.22166-1-stephen@networkplumber.org>
Use strlcpy to avoid cases where sizeof(buf) == strlen(buf)
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
lib/bpf.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/lib/bpf.c b/lib/bpf.c
index c38d92d87759..04bc5a5685d5 100644
--- a/lib/bpf.c
+++ b/lib/bpf.c
@@ -2593,7 +2593,7 @@ bpf_map_set_send(int fd, struct sockaddr_un *addr, unsigned int addr_len,
char *amsg_buf;
int i;
- strncpy(msg.aux.obj_name, aux->obj, sizeof(msg.aux.obj_name));
+ strlcpy(msg.aux.obj_name, aux->obj, sizeof(msg.aux.obj_name));
memcpy(&msg.aux.obj_st, aux->st, sizeof(msg.aux.obj_st));
cmsg_buf = bpf_map_set_init(&msg, addr, addr_len);
@@ -2682,7 +2682,7 @@ int bpf_send_map_fds(const char *path, const char *obj)
return -1;
}
- strncpy(addr.sun_path, path, sizeof(addr.sun_path));
+ strlcpy(addr.sun_path, path, sizeof(addr.sun_path));
ret = connect(fd, (struct sockaddr *)&addr, sizeof(addr));
if (ret < 0) {
@@ -2715,7 +2715,7 @@ int bpf_recv_map_fds(const char *path, int *fds, struct bpf_map_aux *aux,
return -1;
}
- strncpy(addr.sun_path, path, sizeof(addr.sun_path));
+ strlcpy(addr.sun_path, path, sizeof(addr.sun_path));
ret = bind(fd, (struct sockaddr *)&addr, sizeof(addr));
if (ret < 0) {
--
2.16.2
^ permalink raw reply related
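As a rough illustration of why the patch above prefers strlcpy: strncpy() leaves the destination unterminated when the source is at least as long as the size given, whereas a strlcpy-style copy always terminates and reports the source length. The helper below is a hypothetical userspace stand-in (`demo_strlcpy()` and `demo_is_terminated()` are names made up for this sketch), not the implementation iproute2 uses.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper with strlcpy-style semantics: copy at most
 * size - 1 bytes, always NUL-terminate when size > 0, and return
 * strlen(src) so the caller can detect truncation.
 */
static size_t demo_strlcpy(char *dst, const char *src, size_t size)
{
	size_t srclen = strlen(src);

	assert(dst != NULL);
	if (size > 0) {
		size_t n = srclen < size - 1 ? srclen : size - 1;

		memcpy(dst, src, n);
		dst[n] = '\0';
	}
	return srclen;
}

/* Returns 1 if buf[0..size) contains a NUL terminator. */
static int demo_is_terminated(const char *buf, size_t size)
{
	return memchr(buf, '\0', size) != NULL;
}
```

A return value greater than or equal to the destination size means the copy was truncated, so a caller filling something like `sun_path` could treat that as an error instead of silently using a shortened path.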
* [PATCH iproute2 v2 8/9] misc: avoid snprintf warnings in ss and nstat
From: Stephen Hemminger @ 2018-03-20 20:29 UTC (permalink / raw)
To: netdev; +Cc: Stephen Hemminger
In-Reply-To: <20180320202909.22166-1-stephen@networkplumber.org>
GCC 8 checks that the target buffer is big enough.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
misc/nstat.c | 4 ++--
misc/ss.c | 2 +-
2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/misc/nstat.c b/misc/nstat.c
index a4dd405d43a9..433a1f483be3 100644
--- a/misc/nstat.c
+++ b/misc/nstat.c
@@ -178,12 +178,12 @@ static int count_spaces(const char *line)
static void load_ugly_table(FILE *fp)
{
- char buf[4096];
+ char buf[2048];
struct nstat_ent *db = NULL;
struct nstat_ent *n;
while (fgets(buf, sizeof(buf), fp) != NULL) {
- char idbuf[sizeof(buf)];
+ char idbuf[4096];
int off;
char *p;
int count1, count2, skip = 0;
diff --git a/misc/ss.c b/misc/ss.c
index e087bef739b0..a03fa4a7c174 100644
--- a/misc/ss.c
+++ b/misc/ss.c
@@ -4032,7 +4032,7 @@ static int netlink_show_one(struct filter *f,
if (!pid) {
done = 1;
- strncpy(procname, "kernel", 6);
+ strncpy(procname, "kernel", 7);
} else if (pid > 0) {
FILE *fp;
--
2.16.2
^ permalink raw reply related
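The ss.c change from 6 to 7 can be illustrated with a small sketch: strncpy() with a count equal to strlen(src) copies the characters but no terminator, while a count one larger also writes the NUL. `demo_strncpy_terminates()` is a hypothetical helper for this illustration. Whether the original call was an actual bug depends on how `procname` is initialized, which the hunk does not show.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Fill a scratch buffer with a non-NUL marker, strncpy() n bytes of
 * src into it, and report whether a terminator ended up in the buffer.
 */
static int demo_strncpy_terminates(const char *src, size_t n)
{
	char dst[64];

	assert(n <= sizeof(dst));
	memset(dst, '#', sizeof(dst));	/* simulate an uninitialized buffer */
	strncpy(dst, src, n);
	return memchr(dst, '\0', sizeof(dst)) != NULL;
}
```

With "kernel" as the source, a count of 6 leaves the marker bytes directly after the copied text, and a count of 7 produces a properly terminated string.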
* [PATCH iproute2 v2 7/9] ematch: fix possible snprintf overflow
From: Stephen Hemminger @ 2018-03-20 20:29 UTC (permalink / raw)
To: netdev; +Cc: Stephen Hemminger
In-Reply-To: <20180320202909.22166-1-stephen@networkplumber.org>
Fixes gcc 8 warning about possible snprintf overflow
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
tc/m_ematch.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tc/m_ematch.c b/tc/m_ematch.c
index d2bb5c380382..0d66dc682314 100644
--- a/tc/m_ematch.c
+++ b/tc/m_ematch.c
@@ -161,7 +161,7 @@ static struct ematch_util *get_ematch_kind(char *kind)
static struct ematch_util *get_ematch_kind_num(__u16 kind)
{
- char name[32];
+ char name[513];
if (lookup_map(kind, name, sizeof(name), EMATCH_MAP) < 0)
return NULL;
--
2.16.2
^ permalink raw reply related
* [PATCH iproute2 v2 6/9] tc_class: fix snprintf warning
From: Stephen Hemminger @ 2018-03-20 20:29 UTC (permalink / raw)
To: netdev; +Cc: Stephen Hemminger
In-Reply-To: <20180320202909.22166-1-stephen@networkplumber.org>
Size buffer big enough to avoid any possible overflow.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
tc/tc_class.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/tc/tc_class.c b/tc/tc_class.c
index 1b214b82c702..91802518bb27 100644
--- a/tc/tc_class.c
+++ b/tc/tc_class.c
@@ -219,7 +219,7 @@ static void graph_cls_show(FILE *fp, char *buf, struct hlist_head *root_list,
char cls_id_str[256] = {};
struct rtattr *tb[TCA_MAX + 1];
struct qdisc_util *q;
- char str[100] = {};
+ char str[300] = {};
hlist_for_each_safe(n, tmp_cls, root_list) {
struct hlist_node *c, *tmp_chld;
@@ -242,7 +242,8 @@ static void graph_cls_show(FILE *fp, char *buf, struct hlist_head *root_list,
graph_indent(buf, cls, 0, 0);
print_tc_classid(cls_id_str, sizeof(cls_id_str), cls->id);
- sprintf(str, "+---(%s)", cls_id_str);
+ snprintf(str, sizeof(str),
+ "+---(%s)", cls_id_str);
strcat(buf, str);
parse_rtattr(tb, TCA_MAX, (struct rtattr *)cls->data,
--
2.16.2
^ permalink raw reply related
* [PATCH iproute2 v2 5/9] namespace: fix warning snprintf buffer
From: Stephen Hemminger @ 2018-03-20 20:29 UTC (permalink / raw)
To: netdev; +Cc: Stephen Hemminger
In-Reply-To: <20180320202909.22166-1-stephen@networkplumber.org>
It is possible that a user could request a really long namespace
name and overrun the path buffer.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
lib/namespace.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/lib/namespace.c b/lib/namespace.c
index 6f3356d0fa08..682634028587 100644
--- a/lib/namespace.c
+++ b/lib/namespace.c
@@ -23,7 +23,8 @@ static void bind_etc(const char *name)
struct dirent *entry;
DIR *dir;
- snprintf(etc_netns_path, sizeof(etc_netns_path), "%s/%s", NETNS_ETC_DIR, name);
+ snprintf(etc_netns_path, sizeof(etc_netns_path), "%s/%s",
+ NETNS_ETC_DIR, name);
dir = opendir(etc_netns_path);
if (!dir)
return;
@@ -33,7 +34,8 @@ static void bind_etc(const char *name)
continue;
if (strcmp(entry->d_name, "..") == 0)
continue;
- snprintf(netns_name, sizeof(netns_name), "%s/%s", etc_netns_path, entry->d_name);
+ snprintf(netns_name, sizeof(netns_name),
+ "%s/%s", etc_netns_path, entry->d_name);
snprintf(etc_name, sizeof(etc_name), "/etc/%s", entry->d_name);
if (mount(netns_name, etc_name, "none", MS_BIND, NULL) < 0) {
fprintf(stderr, "Bind %s -> %s failed: %s\n",
--
2.16.2
^ permalink raw reply related
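For the long-name case described in the namespace patch, one common way to notice that a path did not fit is to check the return value of snprintf(), which reports the length the full string would have needed. The sketch below assumes a hypothetical helper, `demo_join_path()`, and is not taken from lib/namespace.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Join dir and name as "dir/name" into out.
 * Returns 0 on success, -1 if the result was truncated or on error.
 */
static int demo_join_path(char *out, size_t outsz,
			  const char *dir, const char *name)
{
	int n;

	assert(out != NULL && outsz > 0);
	n = snprintf(out, outsz, "%s/%s", dir, name);
	if (n < 0 || (size_t)n >= outsz)
		return -1;	/* output did not fit; out holds a truncated string */
	return 0;
}
```

On truncation the buffer is still NUL-terminated, so nothing overruns, but the caller gets a chance to reject the name instead of operating on a shortened path.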