* Re: [PATCH net-next 3/3] virtio-net: support XDP_REDIRECT
From: John Fastabend @ 2017-09-20 22:02 UTC (permalink / raw)
To: Jason Wang, mst, virtualization, netdev, linux-kernel
In-Reply-To: <1505814163-7315-3-git-send-email-jasowang@redhat.com>
On 09/19/2017 02:42 AM, Jason Wang wrote:
> This patch tries to add XDP_REDIRECT for virtio-net. The changes are
> not complex as we could use exist XDP_TX helpers for most of the
> work. The rest is passing the XDP_TX to NAPI handler for implementing
> batching.
>
> Cc: John Fastabend <john.fastabend@gmail.com>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
[...]
> @@ -678,12 +711,20 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> }
> break;
> case XDP_TX:
> - if (unlikely(!virtnet_xdp_xmit(vi, &xdp)))
> + if (unlikely(!__virtnet_xdp_xmit(vi, &xdp)))
> trace_xdp_exception(vi->dev, xdp_prog, act);
> + else
> + *xdp_xmit = true;
> if (unlikely(xdp_page != page))
> goto err_xdp;
> rcu_read_unlock();
> goto xdp_xmit;
> + case XDP_REDIRECT:
> + err = xdp_do_redirect(dev, &xdp, xdp_prog);
> + if (err)
> + *xdp_xmit = true;
Is this a typo? Should above be
if (!err)
*xdp_xmit = true;
this would match the pattern in receive_small and also make more sense.
> + rcu_read_unlock();
> + goto xdp_xmit;
> default:
> bpf_warn_invalid_xdp_action(act);
> case XDP_ABORTED:
> @@ -788,7 +829,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> }
>
[...]
Otherwise looks good to me thanks for doing this.
.John
^ permalink raw reply
* Re: Latest net-next from GIT panic
From: Wei Wang @ 2017-09-20 22:09 UTC (permalink / raw)
To: Paweł Staszewski
Cc: Cong Wang, Eric Dumazet, Linux Kernel Network Developers,
Eric Dumazet
In-Reply-To: <6c36fa00-13e6-8cd7-ff5c-df7739220736@itcare.pl>
>>> bisected again and same result:
>>> b838d5e1c5b6e57b10ec8af2268824041e3ea911 is the first bad commit
>>> commit b838d5e1c5b6e57b10ec8af2268824041e3ea911
>>> Author: Wei Wang <weiwan@google.com>
>>> Date: Sat Jun 17 10:42:32 2017 -0700
>>>
>>> ipv4: mark DST_NOGC and remove the operation of dst_free()
>>>
>>> With the previous preparation patches, we are ready to get rid of the
>>> dst gc operation in ipv4 code and release dst based on refcnt only.
>>> So this patch adds DST_NOGC flag for all IPv4 dst and remove the
>>> calls
>>> to dst_free().
>>> At this point, all dst created in ipv4 code do not use the dst gc
>>> anymore and will be destroyed at the point when refcnt drops to 0.
>>>
>>> Signed-off-by: Wei Wang <weiwan@google.com>
>>> Acked-by: Martin KaFai Lau <kafai@fb.com>
>>> Signed-off-by: David S. Miller <davem@davemloft.net>
>>>
>>> :040000 040000 9b7e7fb641de6531fc7887473ca47ef7cb6a11da
>>> 831a73b71d3df1755f3e24c0d3c86d7a93fd55e2 M net
>>>
>>> Will add now version 2 of patch from Eric and we will see
>>>
>>>
>> after adding patch
>> perf top catch
>> PerfTop: 77159 irqs/sec kernel:99.7% exact: 0.0% [4000Hz cycles],
>> (all, 40 CPUs)
>>
>> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>
>> 60.95% [kernel] [k] dev_put.part.6
>> 4.00% [kernel] [k] ixgbe_poll
>> 3.63% [kernel] [k] irq_entries_start
>> 1.22% [kernel] [k] fib_table_lookup
>> 1.15% [kernel] [k] do_raw_spin_lock
>> 1.05% [kernel] [k] ixgbe_xmit_frame_ring
>> 1.04% [kernel] [k] lookup
>> 0.87% [kernel] [k] eth_type_trans
>>
>>
>> no panic on console - rebooting to check logs
>>
>>
> Nothing logged
>
Thanks very much Pawel for the feedback.
I was looking into the code (specifically IPv4 part) and found that in
free_fib_info_rcu(), we call free_nh_exceptions() without holding the
fnhe_lock. I am wondering if that could cause some race condition on
fnhe->fnhe_rth_input/output so a double call on dst_dev_put() on the
same dst could be happening.
But as we call free_fib_info_rcu() only after the grace period, and
the lookup code which could potentially modify
fnhe->fnhe_rth_input/output all holds rcu_read_lock(), it seems
fine...
On Wed, Sep 20, 2017 at 2:25 PM, Paweł Staszewski <pstaszewski@itcare.pl> wrote:
>
>
> W dniu 2017-09-20 o 23:24, Paweł Staszewski pisze:
>
>>
>>
>> W dniu 2017-09-20 o 23:10, Paweł Staszewski pisze:
>>>
>>>
>>>
>>> W dniu 2017-09-20 o 21:23, Paweł Staszewski pisze:
>>>>
>>>>
>>>>
>>>> W dniu 2017-09-20 o 21:13, Paweł Staszewski pisze:
>>>>>
>>>>>
>>>>>
>>>>> W dniu 2017-09-20 o 20:36, Cong Wang pisze:
>>>>>>
>>>>>> On Wed, Sep 20, 2017 at 11:30 AM, Eric Dumazet
>>>>>> <eric.dumazet@gmail.com> wrote:
>>>>>>>
>>>>>>> On Wed, 2017-09-20 at 11:22 -0700, Cong Wang wrote:
>>>>>>>>
>>>>>>>> but dmesg at this time shows nothing about interfaces or flaps.
>>>>>>>>
>>>>>>>> This is very odd.
>>>>>>>>
>>>>>>>> We only free netdevice in free_netdev() and it is only called when
>>>>>>>> we unregister a netdevice. Otherwise pcpu_refcnt is impossible
>>>>>>>> to be NULL.
>>>>>>>
>>>>>>> If there is a missing dev_hold() or one dev_put() in excess,
>>>>>>> this would allow the netdev to be freed too soon.
>>>>>>>
>>>>>>> -> Use after free.
>>>>>>> memory holding netdev could be reallocated-cleared by some other
>>>>>>> kernel
>>>>>>> user.
>>>>>>>
>>>>>> Sure, but only unregister could trigger a free. If there is no
>>>>>> unregister,
>>>>>> like what Pawel claims, then there is no free, the refcnt just goes to
>>>>>> 0 but the memory is still there.
>>>>>>
>>>>> About possible mistake from my side with bisect - i can judge too early
>>>>> that some bisect was good
>>>>> the road was:
>>>>> git bisect start
>>>>> # bad: [ac7b75966c9c86426b55fe1c50ae148aa4571075] Merge tag
>>>>> 'pinctrl-v4.13-1' of
>>>>> git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
>>>>> git bisect bad ac7b75966c9c86426b55fe1c50ae148aa4571075
>>>>> # good: [e24dd9ee5399747b71c1d982a484fc7601795f31] Merge branch 'next'
>>>>> of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
>>>>> git bisect good e24dd9ee5399747b71c1d982a484fc7601795f31
>>>>> # bad: [9cc9a5cb176ccb4f2cda5ac34da5a659926f125f] datapath: Avoid using
>>>>> stack larger than 1024.
>>>>> git bisect bad 9cc9a5cb176ccb4f2cda5ac34da5a659926f125f
>>>>> # good: [073cf9e20c333ab29744717a23f9e43ec7512a20] Merge branch
>>>>> 'udp-reduce-cache-pressure'
>>>>> git bisect good 073cf9e20c333ab29744717a23f9e43ec7512a20
>>>>> # bad: [8abd5599a520e9f188a750f1bde9dde5fb856230] Merge branch
>>>>> 's390-net-updates-part-2'
>>>>> git bisect bad 8abd5599a520e9f188a750f1bde9dde5fb856230
>>>>> # good: [2fae5d0e647c6470d206e72b5fc24972bb900f70] Merge branch
>>>>> 'bpf-ctx-narrow'
>>>>> git bisect good 2fae5d0e647c6470d206e72b5fc24972bb900f70
>>>>> # good: [41500c3e2a19ffcf40a7158fce1774de08e26ba2] rds: tcp: remove
>>>>> cp_outgoing
>>>>> git bisect good 41500c3e2a19ffcf40a7158fce1774de08e26ba2
>>>>> # bad: [8917a777be3ba566377be05117f71b93a5fd909d] tcp: md5: add
>>>>> TCP_MD5SIG_EXT socket option to set a key address prefix
>>>>> git bisect bad 8917a777be3ba566377be05117f71b93a5fd909d
>>>>> # good: [4a6ce2b6f2ecabbddcfe47e7cf61dd0f00b10e36] net: introduce a new
>>>>> function dst_dev_put()
>>>>>
>>>>> And currently have this running for about 4 hours without problems.
>>>>>
>>>>>
>>>>>
>>>>> git bisect good 4a6ce2b6f2ecabbddcfe47e7cf61dd0f00b10e36
>>>>> # bad: [a4c2fd7f78915a0d7c5275e7612e7793157a01f2] net: remove
>>>>> DST_NOCACHE flag
>>>>>
>>>>> Here for sure - panic
>>>>>
>>>>> git bisect bad a4c2fd7f78915a0d7c5275e7612e7793157a01f2
>>>>> # bad: [ad65a2f05695aced349e308193c6e2a6b1d87112] ipv6: call
>>>>> dst_hold_safe() properly
>>>>> git bisect bad ad65a2f05695aced349e308193c6e2a6b1d87112
>>>>> # good: [9df16efadd2a8a82731dc76ff656c771e261827f] ipv4: call
>>>>> dst_hold_safe() properly
>>>>> git bisect good 9df16efadd2a8a82731dc76ff656c771e261827f
>>>>> # bad: [1cfb71eeb12047bcdbd3e6730ffed66e810a0855] ipv6: take
>>>>> dst->__refcnt for insertion into fib6 tree
>>>>>
>>>>> im not 100% sure tor last two
>>>>> Will test them again starting from
>>>>> [95c47f9cf5e028d1ae77dc6c767c1edc8a18025b] ipv4: call dst_dev_put()
>>>>> properly
>>>>>
>>>>>
>>>>> git bisect bad 1cfb71eeb12047bcdbd3e6730ffed66e810a0855
>>>>> # bad: [b838d5e1c5b6e57b10ec8af2268824041e3ea911] ipv4: mark DST_NOGC
>>>>> and remove the operation of dst_free()
>>>>>
>>>>>
>>>>>
>>>>> git bisect bad b838d5e1c5b6e57b10ec8af2268824041e3ea911
>>>>> # first bad commit: [b838d5e1c5b6e57b10ec8af2268824041e3ea911] ipv4:
>>>>> mark DST_NOGC and remove the operation of dst_free()
>>>>>
>>>>>
>>>>>
>>>> What i can say more
>>>> I can reproduce this on any server with similar configuration
>>>> the difference can be teamd instead of bonding
>>>> ixgbe or i40e and mlx5
>>>> Same problems
>>>>
>>>> vlans - more or less prefixes learned from bgp -> zebra -> netlink ->
>>>> kernel
>>>> But normally in lab when using only plain routing no bgpd and about 128
>>>> vlans - with 128 routes - cant reproduce this - this apperas only with bgp -
>>>> minimum where i can reproduce this was about 130k prefixes with about 286
>>>> nexthops
>>>>
>>>>
>>>>
>>>>
>>> bisected again and same result:
>>> b838d5e1c5b6e57b10ec8af2268824041e3ea911 is the first bad commit
>>> commit b838d5e1c5b6e57b10ec8af2268824041e3ea911
>>> Author: Wei Wang <weiwan@google.com>
>>> Date: Sat Jun 17 10:42:32 2017 -0700
>>>
>>> ipv4: mark DST_NOGC and remove the operation of dst_free()
>>>
>>> With the previous preparation patches, we are ready to get rid of the
>>> dst gc operation in ipv4 code and release dst based on refcnt only.
>>> So this patch adds DST_NOGC flag for all IPv4 dst and remove the
>>> calls
>>> to dst_free().
>>> At this point, all dst created in ipv4 code do not use the dst gc
>>> anymore and will be destroyed at the point when refcnt drops to 0.
>>>
>>> Signed-off-by: Wei Wang <weiwan@google.com>
>>> Acked-by: Martin KaFai Lau <kafai@fb.com>
>>> Signed-off-by: David S. Miller <davem@davemloft.net>
>>>
>>> :040000 040000 9b7e7fb641de6531fc7887473ca47ef7cb6a11da
>>> 831a73b71d3df1755f3e24c0d3c86d7a93fd55e2 M net
>>>
>>> Will add now version 2 of patch from Eric and we will see
>>>
>>>
>> after adding patch
>> perf top catch
>> PerfTop: 77159 irqs/sec kernel:99.7% exact: 0.0% [4000Hz cycles],
>> (all, 40 CPUs)
>>
>> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>
>> 60.95% [kernel] [k] dev_put.part.6
>> 4.00% [kernel] [k] ixgbe_poll
>> 3.63% [kernel] [k] irq_entries_start
>> 1.22% [kernel] [k] fib_table_lookup
>> 1.15% [kernel] [k] do_raw_spin_lock
>> 1.05% [kernel] [k] ixgbe_xmit_frame_ring
>> 1.04% [kernel] [k] lookup
>> 0.87% [kernel] [k] eth_type_trans
>>
>>
>> no panic on console - rebooting to check logs
>>
>>
> Nothing logged
>
^ permalink raw reply
* Re: [PATCH net-next v2] bridge: also trigger RTM_NEWLINK when interface is released from bridge
From: Vincent Bernat @ 2017-09-20 22:12 UTC (permalink / raw)
To: David Ahern; +Cc: David Miller, stephen, bridge, netdev
In-Reply-To: <16e5566a-909d-ba83-7637-1fb6c93126bc@gmail.com>
❦ 20 septembre 2017 15:57 -0600, David Ahern <dsahern@gmail.com> :
> The DELLINK is for AF_BRIDGE family (ifi_family). Adding family to
> print_linkinfo and running the monitor I get:
>
>
> [LINK]family 0: 35: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc
> noqueue master br0 state UNKNOWN group default
> link/ether d6:c3:73:86:3c:73 brd ff:ff:ff:ff:ff:ff
>
> [LINK]family 7: 35: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500
> master br0 state UNKNOWN
> link/ether d6:c3:73:86:3c:73
>
> [LINK]Deleted family 7: 35: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu
> 1500 master br0 state UNKNOWN
> link/ether d6:c3:73:86:3c:73
I didn't know about the family. We can drop the patch.
--
One of the most striking differences between a cat and a lie is that a cat has
only nine lives.
-- Mark Twain, "Pudd'nhead Wilson's Calendar"
^ permalink raw reply
* [PATCH 1/1] ip: ip_print: if no json obj is initialized default fp is stdout
From: Julien Fortin @ 2017-09-20 22:19 UTC (permalink / raw)
To: netdev; +Cc: roopa, nikolay, dsa, Julien Fortin
From: Julien Fortin <julien@cumulusnetworks.com>
The ip monitor didn't call `new_json_obj` (even for in non json context),
so the static FILE* _fp variable wasn't initialized, thus raising a
SIGSEGV in ipaddress.c. This patch should fix this issue for good, new
paths won't have to call `new_json_obj`.
$ ip -t mon label link
fmt=fmt@entry=0x45460d “%d: “, value=<optimized out>) at ip_print.c:132
#3 0x000000000040ccd2 in print_int (value=<optimized out>, fmt=0x45460d “%d: “, key=0x4545fc “ifindex”, t=PRINT_ANY) at ip_common.h:189
#4 print_linkinfo (who=<optimized out>, n=0x7fffffffa380, arg=0x7ffff77a82a0 <_IO_2_1_stdout_>) at ipaddress.c:1107
#5 0x0000000000422e13 in accept_msg (who=0x7fffffff8320, ctrl=0x7fffffff8310, n=0x7fffffffa380, arg=0x7ffff77a82a0 <_IO_2_1_stdout_>) at ipmonitor.c:89
#6 0x000000000044c58f in rtnl_listen (rtnl=0x672160 <rth>, handler=handler@entry=0x422c70 <accept_msg>, jarg=0x7ffff77a82a0 <_IO_2_1_stdout_>)
at libnetlink.c:761
#7 0x00000000004233db in do_ipmonitor (argc=<optimized out>, argv=0x7fffffffe5a0) at ipmonitor.c:310
#8 0x0000000000408f74 in do_cmd (argv0=0x7fffffffe7f5 “mon”, argc=3, argv=0x7fffffffe588) at ip.c:116
#9 0x0000000000408a94 in main (argc=4, argv=0x7fffffffe580) at ip.c:311
Traceback can be seen when running the following command in a new terminal.
$ ip link add dev br0 type bridge
Fixes: 6377572f ("ip: ip_print: add new API to print JSON or regular format output")
Reported-by: David Ahern <dsa@cumulusnetworks.com>
Reviewed-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>
---
ip/ip_print.c | 21 +++++++++++++--------
1 file changed, 13 insertions(+), 8 deletions(-)
diff --git a/ip/ip_print.c b/ip/ip_print.c
index 4cd6a0bc..8cce2011 100644
--- a/ip/ip_print.c
+++ b/ip/ip_print.c
@@ -23,6 +23,11 @@ static FILE *_fp;
#define _IS_JSON_CONTEXT(type) ((type & PRINT_JSON || type & PRINT_ANY) && _jw)
#define _IS_FP_CONTEXT(type) (!_jw && (type & PRINT_FP || type & PRINT_ANY))
+static inline FILE* _get_fp(void)
+{
+ return _fp ? _fp : stdout;
+};
+
void new_json_obj(int json, FILE *fp)
{
if (json) {
@@ -91,7 +96,7 @@ void open_json_array(enum output_type type, const char *str)
jsonw_name(_jw, str);
jsonw_start_array(_jw);
} else if (_IS_FP_CONTEXT(type)) {
- fprintf(_fp, "%s", str);
+ fprintf(_get_fp(), "%s", str);
}
}
@@ -105,7 +110,7 @@ void close_json_array(enum output_type type, const char *str)
jsonw_end_array(_jw);
jsonw_pretty(_jw, true);
} else if (_IS_FP_CONTEXT(type)) {
- fprintf(_fp, "%s", str);
+ fprintf(_get_fp(), "%s", str);
}
}
@@ -126,7 +131,7 @@ void close_json_array(enum output_type type, const char *str)
else \
jsonw_##type_name##_field(_jw, key, value); \
} else if (_IS_FP_CONTEXT(t)) { \
- color_fprintf(_fp, color, fmt, value); \
+ color_fprintf(_get_fp(), color, fmt, value); \
} \
}
_PRINT_FUNC(int, int);
@@ -149,7 +154,7 @@ void print_color_string(enum output_type type,
else
jsonw_string_field(_jw, key, value);
} else if (_IS_FP_CONTEXT(type)) {
- color_fprintf(_fp, color, fmt, value);
+ color_fprintf(_get_fp(), color, fmt, value);
}
}
@@ -170,7 +175,7 @@ void print_color_bool(enum output_type type,
else
jsonw_bool(_jw, value);
} else if (_IS_FP_CONTEXT(type)) {
- color_fprintf(_fp, color, fmt, value ? "true" : "false");
+ color_fprintf(_get_fp(), color, fmt, value ? "true" : "false");
}
}
@@ -189,7 +194,7 @@ void print_color_0xhex(enum output_type type,
snprintf(b1, sizeof(b1), "%#x", hex);
print_string(PRINT_JSON, key, NULL, b1);
} else if (_IS_FP_CONTEXT(type)) {
- color_fprintf(_fp, color, fmt, hex);
+ color_fprintf(_get_fp(), color, fmt, hex);
}
}
@@ -208,7 +213,7 @@ void print_color_hex(enum output_type type,
else
jsonw_string(_jw, b1);
} else if (_IS_FP_CONTEXT(type)) {
- color_fprintf(_fp, color, fmt, hex);
+ color_fprintf(_get_fp(), color, fmt, hex);
}
}
@@ -228,6 +233,6 @@ void print_color_null(enum output_type type,
else
jsonw_null(_jw);
} else if (_IS_FP_CONTEXT(type)) {
- color_fprintf(_fp, color, fmt, value);
+ color_fprintf(_get_fp(), color, fmt, value);
}
}
--
2.14.1
^ permalink raw reply related
* Re: [PATCH] net: compat: assert the size of cmsg copied in is as expected
From: David Miller @ 2017-09-20 22:36 UTC (permalink / raw)
To: mengxu.gatech; +Cc: netdev, linux-kernel, meng.xu, sanidhya, taesoo
In-Reply-To: <1505841553-39084-1-git-send-email-mengxu.gatech@gmail.com>
From: Meng Xu <mengxu.gatech@gmail.com>
Date: Tue, 19 Sep 2017 13:19:13 -0400
> The actual length of cmsg fetched in during the second loop
> (i.e., kcmsg - kcmsg_base) could be different from what we
> get from the first loop (i.e., kcmlen).
>
> The main reason is that the two get_user() calls in the two
> loops (i.e., get_user(ucmlen, &ucmsg->cmsg_len) and
> __get_user(ucmlen, &ucmsg->cmsg_len)) could cause ucmlen
> to have different values even they fetch from the same userspace
> address, as user can race to change the memory content in
> &ucmsg->cmsg_len across fetches.
>
> Although in the second loop, the sanity check
> if ((char *)kcmsg_base + kcmlen - (char *)kcmsg < CMSG_ALIGN(tmp))
> is inplace, it only ensures that the cmsg fetched in during the
> second loop does not exceed the length of kcmlen, but not
> necessarily equal to kcmlen. But indicated by the assignment
> kmsg->msg_controllen = kcmlen, we should enforce that.
>
> This patch adds this additional sanity check and ensures that
> what is recorded in kmsg->msg_controllen is the actual cmsg length.
>
> Signed-off-by: Meng Xu <mengxu.gatech@gmail.com>
Applied, thanks.
^ permalink raw reply
* Re: [PATCH net-next v2] bridge: also trigger RTM_NEWLINK when interface is released from bridge
From: David Miller @ 2017-09-20 22:41 UTC (permalink / raw)
To: vincent; +Cc: dsahern, stephen, bridge, netdev
In-Reply-To: <87bmm56mu2.fsf@luffy.cx>
From: Vincent Bernat <vincent@bernat.im>
Date: Thu, 21 Sep 2017 00:12:53 +0200
> ❦ 20 septembre 2017 15:57 -0600, David Ahern <dsahern@gmail.com> :
>
>> The DELLINK is for AF_BRIDGE family (ifi_family). Adding family to
>> print_linkinfo and running the monitor I get:
>>
>>
>> [LINK]family 0: 35: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc
>> noqueue master br0 state UNKNOWN group default
>> link/ether d6:c3:73:86:3c:73 brd ff:ff:ff:ff:ff:ff
>>
>> [LINK]family 7: 35: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500
>> master br0 state UNKNOWN
>> link/ether d6:c3:73:86:3c:73
>>
>> [LINK]Deleted family 7: 35: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu
>> 1500 master br0 state UNKNOWN
>> link/ether d6:c3:73:86:3c:73
>
> I didn't know about the family. We can drop the patch.
Ok, I've reverted.
Thanks.
^ permalink raw reply
* Re: [PATCH net] net: change skb->mac_header when Generic XDP calls adjust_head
From: David Miller @ 2017-09-20 22:45 UTC (permalink / raw)
To: ecree; +Cc: netdev
In-Reply-To: <394385d4-796a-9ada-8f01-53a52d73a418@solarflare.com>
From: Edward Cree <ecree@solarflare.com>
Date: Tue, 19 Sep 2017 18:45:56 +0100
> Since XDP's view of the packet includes the MAC header, moving the start-
> of-packet with bpf_xdp_adjust_head needs to also update the offset of the
> MAC header (which is relative to skb->head, not to the skb->data that was
> changed).
> Without this, tcpdump sees packets starting from the old MAC header rather
> than the new one, at least in my tests on the loopback device.
>
> Fixes: b5cdae3291f7 ("net: Generic XDP")
> Signed-off-by: Edward Cree <ecree@solarflare.com>
Applied and queued up for -stable, thanks Edward.
^ permalink raw reply
* Re: [PATCH v4 1/3] net: fec: only check queue 0 if RXF_0/TXF_0 interrupt is set
From: David Miller @ 2017-09-20 22:47 UTC (permalink / raw)
To: troy.kisky; +Cc: fugang.duan, netdev, fabio.estevam, lznuaa
In-Reply-To: <1505867589-11784-1-git-send-email-troy.kisky@boundarydevices.com>
From: Troy Kisky <troy.kisky@boundarydevices.com>
Date: Tue, 19 Sep 2017 17:33:07 -0700
> Before queue 0 was always checked if any queue caused an interrupt.
> It is better to just mark queue 0 if queue 0 has caused an interrupt.
>
> Signed-off-by: Troy Kisky <troy.kisky@boundarydevices.com>
> Acked-by: Fugang Duan <Fugang.duan@nxp.com>
Applied.
^ permalink raw reply
* Re: [PATCH v4 2/3] net: fec: remove unused interrupt FEC_ENET_TS_TIMER
From: David Miller @ 2017-09-20 22:47 UTC (permalink / raw)
To: troy.kisky; +Cc: fugang.duan, netdev, fabio.estevam, lznuaa
In-Reply-To: <1505867589-11784-2-git-send-email-troy.kisky@boundarydevices.com>
From: Troy Kisky <troy.kisky@boundarydevices.com>
Date: Tue, 19 Sep 2017 17:33:08 -0700
> FEC_ENET_TS_TIMER is not checked in the interrupt routine
> so there is no need to enable it.
>
> Signed-off-by: Troy Kisky <troy.kisky@boundarydevices.com>
> Acked-by: Fugang Duan <fugang.duan@nxp.com>
Applied.
^ permalink raw reply
* Re: [PATCH v4 3/3] net: fec: return IRQ_HANDLED if fec_ptp_check_pps_event handled it
From: David Miller @ 2017-09-20 22:47 UTC (permalink / raw)
To: troy.kisky; +Cc: fugang.duan, netdev, fabio.estevam, lznuaa
In-Reply-To: <1505867589-11784-3-git-send-email-troy.kisky@boundarydevices.com>
From: Troy Kisky <troy.kisky@boundarydevices.com>
Date: Tue, 19 Sep 2017 17:33:09 -0700
> fec_ptp_check_pps_event will return 1 if FEC_T_TF_MASK caused
> an interrupt. Don't return IRQ_NONE in this case.
>
> Signed-off-by: Troy Kisky <troy.kisky@boundarydevices.com>
> Acked-by: Fugang Duan <fugang.duan@nxp.com>
Applied.
^ permalink raw reply
* Re: [PATCH v2 0/2] enable hires timer to timeout datagram socket
From: Vallish Vaidyeshwara @ 2017-09-20 22:48 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Eric Dumazet, Eduardo Valentin, David Miller, dwmw2, shuah,
richardcochran, xiyou.wangcong, netdev, linux-kernel, anchalag,
dwmw
In-Reply-To: <alpine.DEB.2.20.1709161131200.2105@nanos>
On Sat, Sep 16, 2017 at 11:47:56AM +0200, Thomas Gleixner wrote:
> On Fri, 8 Sep 2017, Eric Dumazet wrote:
> > On Fri, 2017-09-08 at 11:55 -0700, Eduardo Valentin wrote:
> > > Hello,
> > >
> > > On Fri, Sep 08, 2017 at 10:26:45AM -0700, David Miller wrote:
> > > > From: David Woodhouse <dwmw2@infradead.org>
> > > > Date: Fri, 08 Sep 2017 18:23:22 +0100
> > > >
> > > > > I don't know that anyone's ever tried saying "show me the chapter
> > > and
> > > > > verse of the documentation"
> > > >
> > > > Do you know why I brought this up? Because the person I am replying
> > > > to told me that the syscall documentation should have suggested this
> > > > or that.
> > > >
> > > > That's why.
> > >
> > > :-) My intention was for sure not to upset anybody.
> > >
> > > Just to reiterate, the point of patch is simple, there was a change in
> > > behavior in the system call from one kernel version to the other. As I
> > > mentioned, I agree that the userspace could use other means to achieve
> > > the same, but still the system call behavior has changed.
> > >
> > > >
> > > > So let's concentrate on the other aspects of my reply, ok?
> > >
> > > I agree. I would prefer to understand here what is the technical
> > > reason not to accept these patches other than "use other system
> > > calls".
> >
> > So if we need to replace all 'legacy' timers to high resolution timer,
> > because some application was _relying_ on jiffies being kind of precise,
> > maybe it is better to revert the change done on legacy timers.
>
> Which would be a major step back in terms of timer performance and system
> disturbance caused by massive recascading operations.
>
> > Or continue the migration and make them use high res internally.
> >
> > select() and poll() are the standard way to have precise timeouts,
> > it is silly we have to maintain a timeout handling in the datagram fast
> > path.
>
> A few years ago we switched select/poll over to use hrtimers because the
> wheel timers were too inaccurate for some operations, so it feels
> consequent to switch the timeout in the datagram rcv path over as well. I
> agree that the whole timeout magic there feels silly, but unfortunately
> it's a documented property of sockets.
>
> Thanks,
>
> tglx
>
Hello Thomas,
Thanks for your comments. This patch has been NACK'ed by David Miller. Is
there any other approach to solve this problem with out application code
being recompiled?
Thanks.
-Vallish
^ permalink raw reply
* Re: [PATCH net-next] net: dsa: Utilize dsa_slave_dev_check()
From: David Miller @ 2017-09-20 22:49 UTC (permalink / raw)
To: f.fainelli; +Cc: netdev, andrew, vivien.didelot, linux-kernel
In-Reply-To: <20170920010038.12393-1-f.fainelli@gmail.com>
From: Florian Fainelli <f.fainelli@gmail.com>
Date: Tue, 19 Sep 2017 18:00:37 -0700
> Instead of open coding the check.
>
> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Applied, thanks.
^ permalink raw reply
* [PATCH net 0/2] Bring back transceiver type for PHYLIB
From: Florian Fainelli @ 2017-09-20 22:52 UTC (permalink / raw)
To: netdev; +Cc: davem, linville, decot, tremyfr, andrew, Florian Fainelli
Hi
With the introduction of the xLINKSETTINGS ethtool APIs, the transceiver type
was deprecated, but in that process we lost some useful information that PHYLIB
was consistently reporting about internal vs. external PHYs.
This brings back transceiver as a read-only field that is only consumed in the
legacy path where ETHTOOL_GET is called but the underlying drivers implement the
new style klink_settings API.
Florian Fainelli (2):
net: ethtool: Add back transceiver type
net: phy: Keep reporting transceiver type
drivers/net/phy/phy.c | 3 ++-
include/uapi/linux/ethtool.h | 6 +++++-
net/core/ethtool.c | 2 ++
3 files changed, 9 insertions(+), 2 deletions(-)
--
2.9.3
^ permalink raw reply
* [PATCH net 1/2] net: ethtool: Add back transceiver type
From: Florian Fainelli @ 2017-09-20 22:52 UTC (permalink / raw)
To: netdev; +Cc: davem, linville, decot, tremyfr, andrew, Florian Fainelli
In-Reply-To: <20170920225214.21974-1-f.fainelli@gmail.com>
Commit 3f1ac7a700d0 ("net: ethtool: add new ETHTOOL_xLINKSETTINGS API")
deprecated the ethtool_cmd::transceiver field, which was fine in
premise, except that the PHY library was actually using it to report the
type of transceiver: internal or external.
Use the first word of the reserved field to put this __u8 transceiver
field back in. It is made read-only, and we don't expect the
ETHTOOL_xLINKSETTINGS API to be doing anything with this anyway, so this
is mostly for the legacy path where we do:
ethtool_get_settings()
-> dev->ethtool_ops->get_link_ksettings()
-> convert_link_ksettings_to_legacy_settings()
to have no information loss compared to the legacy get_settings API.
Fixes: 3f1ac7a700d0 ("net: ethtool: add new ETHTOOL_xLINKSETTINGS API")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
include/uapi/linux/ethtool.h | 6 +++++-
net/core/ethtool.c | 2 ++
2 files changed, 7 insertions(+), 1 deletion(-)
diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h
index 9c041dae8e2c..5bd1b1de4ea0 100644
--- a/include/uapi/linux/ethtool.h
+++ b/include/uapi/linux/ethtool.h
@@ -1753,6 +1753,8 @@ enum ethtool_reset_flags {
* %ethtool_link_mode_bit_indices for the link modes, and other
* link features that the link partner advertised through
* autonegotiation; 0 if unknown or not applicable. Read-only.
+ * @transceiver: Used to distinguish different possible PHY types,
+ * reported consistently by PHYLIB. Read-only.
*
* If autonegotiation is disabled, the speed and @duplex represent the
* fixed link mode and are writable if the driver supports multiple
@@ -1804,7 +1806,9 @@ struct ethtool_link_settings {
__u8 eth_tp_mdix;
__u8 eth_tp_mdix_ctrl;
__s8 link_mode_masks_nwords;
- __u32 reserved[8];
+ __u8 transceiver;
+ __u8 reserved1[3];
+ __u32 reserved[7];
__u32 link_mode_masks[0];
/* layout of link_mode_masks fields:
* __u32 map_supported[link_mode_masks_nwords];
diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index 6a582ae4c5d9..3228411ada0f 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -525,6 +525,8 @@ convert_link_ksettings_to_legacy_settings(
= link_ksettings->base.eth_tp_mdix;
legacy_settings->eth_tp_mdix_ctrl
= link_ksettings->base.eth_tp_mdix_ctrl;
+ legacy_settings->transceiver
+ = link_ksettings->base.transceiver;
return retval;
}
--
2.9.3
^ permalink raw reply related
* [PATCH net 2/2] net: phy: Keep reporting transceiver type
From: Florian Fainelli @ 2017-09-20 22:52 UTC (permalink / raw)
To: netdev; +Cc: davem, linville, decot, tremyfr, andrew, Florian Fainelli
In-Reply-To: <20170920225214.21974-1-f.fainelli@gmail.com>
With commit 2d55173e71b0 ("phy: add generic function to support
ksetting support"), we lost the ability to report the transceiver type
like we used to. Now that we have added back the transceiver type to
ethtool_link_settings, we can report it back like we used to and have no
loss of information.
Fixes: 3f1ac7a700d0 ("net: ethtool: add new ETHTOOL_xLINKSETTINGS API")
Fixes: 2d55173e71b0 ("phy: add generic function to support ksetting support")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
drivers/net/phy/phy.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index e842d2cd1ee7..2b1e67bc1e73 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -373,7 +373,8 @@ void phy_ethtool_ksettings_get(struct phy_device *phydev,
cmd->base.port = PORT_BNC;
else
cmd->base.port = PORT_MII;
-
+ cmd->base.transceiver = phy_is_internal(phydev) ?
+ XCVR_INTERNAL : XCVR_EXTERNAL;
cmd->base.phy_address = phydev->mdio.addr;
cmd->base.autoneg = phydev->autoneg;
cmd->base.eth_tp_mdix_ctrl = phydev->mdix_ctrl;
--
2.9.3
^ permalink raw reply related
* Re: [PATCH net-next] bpf: Optimize lpm trie delete
From: Daniel Mack @ 2017-09-20 22:56 UTC (permalink / raw)
To: Craig Gallek
Cc: Alexei Starovoitov, Daniel Borkmann, David S . Miller, netdev
In-Reply-To: <CAEfhGiy_hC0tCd=5qoPZ1okyMCYsgMQbKAa0ms1Tb9cnYDaLBg@mail.gmail.com>
On 09/20/2017 08:51 PM, Craig Gallek wrote:
> On Wed, Sep 20, 2017 at 12:51 PM, Daniel Mack <daniel@zonque.org> wrote:
>> Hi Craig,
>>
>> Thanks, this looks much cleaner already :)
>>
>> On 09/20/2017 06:22 PM, Craig Gallek wrote:
>>> diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c
>>> index 9d58a576b2ae..b5a7d70ec8b5 100644
>>> --- a/kernel/bpf/lpm_trie.c
>>> +++ b/kernel/bpf/lpm_trie.c
>>> @@ -397,7 +397,7 @@ static int trie_delete_elem(struct bpf_map *map, void *_key)
>>> struct lpm_trie_node __rcu **trim;
>>> struct lpm_trie_node *node;
>>> unsigned long irq_flags;
>>> - unsigned int next_bit;
>>> + unsigned int next_bit = 0;
>>
>> This default assignment seems wrong, and I guess you only added it to
>> squelch a compiler warning?
> Yes, this variable is only initialized after the lookup iterations
> below (meaning it will never be initialized the the root-removal
> case).
Right, and once set, it's only updated in case we don't have an exact
match and try to drill down further.
>> [...]
>>
>>> + /* If the node has one child, we may be able to collapse the tree
>>> + * while removing this node if the node's child is in the same
>>> + * 'next bit' slot as this node was in its parent or if the node
>>> + * itself is the root.
>>> + */
>>> + if (trim == &trie->root) {
>>> + next_bit = node->child[0] ? 0 : 1;
>>> + rcu_assign_pointer(trie->root, node->child[next_bit]);
>>> + kfree_rcu(node, rcu);
>>
>> I don't think you should treat this 'root' case special.
>>
>> Instead, move the 'next_bit' assignment outside of the condition ...
> I'm not quite sure I follow. Are you saying do something like this:
>
> if (trim == &trie->root) {
> next_bit = node->child[0] ? 0 : 1;
> }
> if (rcu_access_pointer(node->child[next_bit])) {
> ...
>
> This would save a couple lines of code, but I think the as-is
> implementation is slightly easier to understand. I don't have a
> strong opinion either way, though.
Me neither :)
My idea was to set
next_bit = node->child[0] ? 0 : 1;
unconditionally, because it should result in the same in both cases.
It might be a bit of bike shedding, but I dislike this default
assignment, and I believe that not relying on next_bit to be set as a
side effect of the lookup loop makes the code a bit more readable.
WDYT?
Thanks,
Daniel
^ permalink raw reply
* Re: [PATCH 0/2] blackfin: Drop non-functional DSA code
From: David Miller @ 2017-09-20 22:57 UTC (permalink / raw)
To: f.fainelli; +Cc: netdev, andrew, vivien.didelot, realmz6, adi-buildroot-devel
In-Reply-To: <20170920010346.16871-1-f.fainelli@gmail.com>
From: Florian Fainelli <f.fainelli@gmail.com>
Date: Tue, 19 Sep 2017 18:03:44 -0700
> I sent those many months ago in the hope that the bfin-linux people
> would pick those patches but nobody seems to be responding, can you
> queue those via net-next since this affects DSA?
Ok, if they aren't responding what can we do. Probably should update
the MAINTAINERS file to reflect this...
Series applied to net-next, thanks.
^ permalink raw reply
* Re: [PATCH] isdn/i4l: fetch the ppp_write buffer in one shot
From: David Miller @ 2017-09-20 23:01 UTC (permalink / raw)
To: mengxu.gatech
Cc: isdn, johannes.berg, netdev, linux-kernel, meng.xu, sanidhya,
taesoo
In-Reply-To: <1505872195-46627-1-git-send-email-mengxu.gatech@gmail.com>
From: Meng Xu <mengxu.gatech@gmail.com>
Date: Tue, 19 Sep 2017 21:49:55 -0400
> In isdn_ppp_write(), the header (i.e., protobuf) of the buffer is
> fetched twice from userspace. The first fetch is used to peek at the
> protocol of the message and reset the huptimer if necessary; while the
> second fetch copies in the whole buffer. However, given that buf resides
> in userspace memory, a user process can race to change its memory content
> across fetches. By doing so, we can either avoid resetting the huptimer
> for any type of packets (by first setting proto to PPP_LCP and later
> change to the actual type) or force resetting the huptimer for LCP
> packets.
>
> This patch changes this double-fetch behavior into two single fetches
> decided by condition (lp->isdn_device < 0 || lp->isdn_channel <0).
> A more detailed discussion can be found at
> https://marc.info/?l=linux-kernel&m=150586376926123&w=2
>
> Signed-off-by: Meng Xu <mengxu.gatech@gmail.com>
Applied, thank you.
^ permalink raw reply
* Re: [PATCH net] net/ncsi: Don't assume last available channel exists
From: David Miller @ 2017-09-20 23:05 UTC (permalink / raw)
To: sam; +Cc: netdev, linux-kernel
In-Reply-To: <20170920041251.14635-1-sam@mendozajonas.com>
From: Samuel Mendoza-Jonas <sam@mendozajonas.com>
Date: Wed, 20 Sep 2017 14:12:51 +1000
> When handling new VLAN tags in NCSI we check the maximum allowed number
> of filters on the last active ("hot") channel. However if the 'add'
> callback is called before NCSI has configured a channel, this causes a
> NULL dereference.
>
> Check that we actually have a hot channel, and warn if it is missing.
>
> Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
Well, a few things.
We should not allow this driver method to be invoked in the first
place if the device is not in a state where the operation is legal
yet.
Second of all, if !ndp is true, you should not return 0 to indicate
success.
But more fundamentally, we should block this method from being
callable unless the device is in the proper state and can complete the
operation.
Seriously, these checks should be completely unnecessary if those
issues are handled properly.
^ permalink raw reply
* Re: [PATCH net-next] cxgb4: add new T5 pci device id's
From: David Miller @ 2017-09-20 23:06 UTC (permalink / raw)
To: ganeshgr; +Cc: netdev, nirranjan, indranil, venkatesh
In-Reply-To: <1505887327-8687-1-git-send-email-ganeshgr@chelsio.com>
From: Ganesh Goudar <ganeshgr@chelsio.com>
Date: Wed, 20 Sep 2017 11:32:07 +0530
> Add 0x50a5, 0x50a6, 0x50a7, 0x50a8 and 0x50a9 T5 device
> id's.
>
> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Applied, thank you.
^ permalink raw reply
* Re: [PATCH net-next v5 1/4] bpf: add helper bpf_perf_event_read_value for perf event array map
From: David Miller @ 2017-09-20 23:07 UTC (permalink / raw)
To: peterz; +Cc: yhs, rostedt, ast, daniel, netdev, kernel-team
In-Reply-To: <20170920172651.zx4nfmj2lpgxzubq@hirez.programming.kicks-ass.net>
From: Peter Zijlstra <peterz@infradead.org>
Date: Wed, 20 Sep 2017 19:26:51 +0200
> Dave, could we have this in a topic tree of sorts, because I have a
> pending series to rework all the timekeeping and it might be nice to not
> have sfr run into all sorts of conflicts.
If you want to merge it into your tree that's fine.
But it means that any further development done on top of this
work will need to go via you as well.
^ permalink raw reply
* Re: [PATCH 0/2] Netfilter fixes for net
From: David Miller @ 2017-09-20 23:08 UTC (permalink / raw)
To: pablo; +Cc: netfilter-devel, netdev
In-Reply-To: <1505904543-10004-1-git-send-email-pablo@netfilter.org>
From: Pablo Neira Ayuso <pablo@netfilter.org>
Date: Wed, 20 Sep 2017 12:49:01 +0200
> The following patchset contains two Netfilter fixes for your net tree,
> they are:
>
> 1) Fix NAt compilation with UP, from Geert Uytterhoeven.
>
> 2) Fix incorrect number of entries when dumping a set, from
> Vishwanath Pai.
Pulled, thanks Pablo.
^ permalink raw reply
* Re: [PATCH net 0/9] TM related bugfixes for the HNS3 Ethernet Driver
From: David Miller @ 2017-09-20 23:15 UTC (permalink / raw)
To: linyunsheng
Cc: huangdaode, xuwei5, liguozhu, Yisen.Zhuang, gabriele.paoloni,
john.garry, linuxarm, salil.mehta, lipeng321, netdev,
linux-kernel
In-Reply-To: <1505904778-53217-1-git-send-email-linyunsheng@huawei.com>
From: Yunsheng Lin <linyunsheng@huawei.com>
Date: Wed, 20 Sep 2017 18:52:49 +0800
> This patch set contains a few bugfixes related to hclge_tm module.
Series applied, thank you.
^ permalink raw reply
* Re: [PATCH net-next v2] bridge: also trigger RTM_NEWLINK when interface is released from bridge
From: Stephen Hemminger @ 2017-09-20 23:21 UTC (permalink / raw)
To: David Ahern; +Cc: David Miller, vincent, bridge, netdev
In-Reply-To: <16e5566a-909d-ba83-7637-1fb6c93126bc@gmail.com>
On Wed, 20 Sep 2017 15:57:16 -0600
David Ahern <dsahern@gmail.com> wrote:
> On 9/20/17 3:09 PM, David Miller wrote:
> > From: Vincent Bernat <vincent@bernat.im>
> > Date: Sat, 16 Sep 2017 16:18:33 +0200
> >
> > David, I am CC:'ing you because you've done work in this area over the
> > past year. I'm applying this patch, it's been sitting since the 16th
> > and likes entirely correct to me. But if you have objections just let
> > me know.
> >
> >> Currently, when an interface is released from a bridge via
> >> ioctl(), we get a RTM_DELLINK event through netlink:
> >>
> >> Deleted 2: dummy0: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 master bridge0 state UNKNOWN
> >> link/ether 6e:23:c2:54:3a:b3
> >>
> >> Userspace has to interpret that as a removal from the bridge, not as a
> >> complete removal of the interface. When an bridged interface is
> >> completely removed, we get two events:
> >>
> >> Deleted 2: dummy0: <BROADCAST,NOARP> mtu 1500 master bridge0 state DOWN
> >> link/ether 6e:23:c2:54:3a:b3
> >> Deleted 2: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default
> >> link/ether 6e:23:c2:54:3a:b3 brd ff:ff:ff:ff:ff:ff
> >>
> >> In constrast, when an interface is released from a bond, we get a
> >> RTM_NEWLINK with only the new characteristics (no master):
> >>
> >> 3: dummy1: <BROADCAST,NOARP,SLAVE,UP,LOWER_UP> mtu 1500 qdisc noqueue master bond0 state UNKNOWN group default
> >> link/ether ae:dc:7a:8c:9a:3c brd ff:ff:ff:ff:ff:ff
> >> 3: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default
> >> link/ether ae:dc:7a:8c:9a:3c brd ff:ff:ff:ff:ff:ff
> >> 4: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
> >> link/ether ae:dc:7a:8c:9a:3c brd ff:ff:ff:ff:ff:ff
> >> 3: dummy1: <BROADCAST,NOARP> mtu 1500 qdisc noqueue state DOWN group default
> >> link/ether ae:dc:7a:8c:9a:3c brd ff:ff:ff:ff:ff:ff
> >> 3: dummy1: <BROADCAST,NOARP> mtu 1500 qdisc noqueue state DOWN group default
> >> link/ether ca:c8:7b:66:f8:25 brd ff:ff:ff:ff:ff:ff
> >> 4: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
> >> link/ether ae:dc:7a:8c:9a:3c brd ff:ff:ff:ff:ff:ff
> >>
> >> Userland may be confused by the fact we say a link is deleted while
> >> its characteristics are only modified. A first solution would have
> >> been to turn the RTM_DELLINK event in del_nbp() into a RTM_NEWLINK
> >> event. However, maybe some piece of userland is relying on this
> >> RTM_DELLINK to detect when a bridged interface is released. Instead,
> >> we also emit a RTM_NEWLINK event once the interface is
> >> released (without master info).
> >>
> >> Deleted 2: dummy0: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 master bridge0 state UNKNOWN
> >> link/ether 8a:bb:e7:94:b1:f8
> >> 2: dummy0: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default
> >> link/ether 8a:bb:e7:94:b1:f8 brd ff:ff:ff:ff:ff:ff
> >>
> >> This is done only when using ioctl(). When using Netlink, such an
> >> event is already automatically emitted in do_setlink().
>
> The DELLINK is for AF_BRIDGE family (ifi_family). Adding family to
> print_linkinfo and running the monitor I get:
>
>
> [LINK]family 0: 35: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc
> noqueue master br0 state UNKNOWN group default
> link/ether d6:c3:73:86:3c:73 brd ff:ff:ff:ff:ff:ff
>
> [LINK]family 7: 35: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500
> master br0 state UNKNOWN
> link/ether d6:c3:73:86:3c:73
>
> [LINK]Deleted family 7: 35: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu
> 1500 master br0 state UNKNOWN
> link/ether d6:c3:73:86:3c:73
>
> [LINK]family 0: 35: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc
> noqueue state UNKNOWN group default
> link/ether d6:c3:73:86:3c:73 brd ff:ff:ff:ff:ff:ff
>
> And that seems correct. So I think the RTM_NEWLINK added by this patch
> should not be needed.
Agreed, thanks for tracing this.
The one concern is that ports added or removed through ioctl should
cause same events as doing the same thing via netlink. Some users use
brctl (ioctl) and others use newer bridge (netlink) API.
^ permalink raw reply
* Re: [PATCH v3 14/31] vxfs: Define usercopy region in vxfs_inode slab cache
From: Christoph Hellwig @ 2017-09-20 23:22 UTC (permalink / raw)
To: Kees Cook
Cc: Christoph Hellwig, LKML, David Windsor,
linux-fsdevel@vger.kernel.org, Network Development, Linux-MM,
kernel-hardening@lists.openwall.com
In-Reply-To: <CAGXu5j+hr0UwB5NsvPSKVVfM6NFHHhnNeUZbuwyTRppSOx9Ucw@mail.gmail.com>
On Wed, Sep 20, 2017 at 02:21:45PM -0700, Kees Cook wrote:
> This is why I included several other lists on the full CC (am I
> unlucky enough to have you not subscribed to any of them?). Adding a
> CC for everyone can result in a huge CC list, especially for the
> forth-coming 300-patch timer_list series. ;)
If you think the lists are enough to review changes include only
the lists, but don't add CCs for individual patches, that's what
I usually do for cleanups that touch a lot of drivers, but don't
really change actual logic in ever little driver touched.
> Do you want me to resend the full series to you, or would you prefer
> something else like a patchwork bundle? (I'll explicitly add you to CC
> for any future versions, though.)
I'm fine with not being Cced at all if there isn't anything requiring
my urgent personal attention. It's up to you whom you want to Cc,
but my preference is generally for rather less than more people, and
rather more than less mailing lists.
But the important bit is to Cc a person or mailinglist either on
all patches or on none, otherwise a good review isn't possible.
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox