* Re: [PATCH net-next v4] openvswitch: enable NSH support
From: Jiri Benc @ 2017-08-21 9:51 UTC (permalink / raw)
To: Jan Scheurich
Cc: Yang, Yi, netdev@vger.kernel.org, dev@openvswitch.org,
blp@ovn.org, e@erig.me
In-Reply-To: <CFF8EF42F1132E4CBE2BF0AB6C21C58D7274A55C@ESESSMB107.ericsson.se>
On Mon, 21 Aug 2017 09:42:27 +0000, Jan Scheurich wrote:
> I understand your concern. But not declaring md2 as array is wrong as
> well, as there might not be an MD2 TLV context header. An MD2 NSH
> header is perfectly valid without any TLV. So in any case the user of
> the struct needs to be aware of the NSH semantics.
Good point.
> NSH can be carried over Ethernet with a 14 byte header. In that case
> the total NSH header would typically be 16-bit aligned, so that all
> 32-bit members would be misaligned.
See NET_IP_ALIGN in include/linux/skbuff.h.
Jiri
^ permalink raw reply
* [PATCH net] ethernet: xircom: small clean up in setup_xirc2ps_cs()
From: Dan Carpenter @ 2017-08-21 9:47 UTC (permalink / raw)
To: Jarod Wilson
Cc: David S. Miller, netdev, kernel-janitors, Ilya Matveychikov,
Baoquan He, Andrew Morton, Ingo Molnar, linux-kernel
The get_options() function takes the whole ARRAY_SIZE(). It doesn't
matter here because we don't use more than 7 elements.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
diff --git a/drivers/net/ethernet/xircom/xirc2ps_cs.c b/drivers/net/ethernet/xircom/xirc2ps_cs.c
index f71883264cc0..fd5288ff53b5 100644
--- a/drivers/net/ethernet/xircom/xirc2ps_cs.c
+++ b/drivers/net/ethernet/xircom/xirc2ps_cs.c
@@ -1781,7 +1781,7 @@ static int __init setup_xirc2ps_cs(char *str)
*/
int ints[10] = { -1 };
- str = get_options(str, 9, ints);
+ str = get_options(str, ARRAY_SIZE(ints), ints);
#define MAYBE_SET(X,Y) if (ints[0] >= Y && ints[Y] != -1) { X = ints[Y]; }
MAYBE_SET(if_port, 3);
^ permalink raw reply related
* Re: [PATCH net-next v4] openvswitch: enable NSH support
From: Jiri Benc @ 2017-08-21 9:47 UTC (permalink / raw)
To: Yang, Yi
Cc: netdev@vger.kernel.org, dev@openvswitch.org, blp@ovn.org,
e@erig.me, jan.scheurich@ericsson.com
In-Reply-To: <20170821091541.GA75219@cran64.bj.intel.com>
On Mon, 21 Aug 2017 17:15:42 +0800, Yang, Yi wrote:
> The issue is it is used union in
>
> struct nsh_hdr {
> ovs_be16 ver_flags_ttl_len;
> uint8_t md_type;
> uint8_t next_proto;
> ovs_16aligned_be32 path_hdr;
> union {
> struct nsh_md1_ctx md1;
> struct nsh_md2_tlv md2;
> };
> };
This should work (modulo the non-kernel type names, of course). Did you
mean to put [] after md2?
> in Linux kernel build, it complained it, I changed it to
What was the error message?
> struct nsh_hdr {
> ovs_be16 ver_flags_ttl_len;
> uint8_t md_type;
> uint8_t next_proto;
> ovs_16aligned_be32 path_hdr;
> union {
> struct nsh_md1_ctx md1;
> struct nsh_md2_tlv md2[0];
> };
> };
I wouldn't use this. First, zero length array is a GCC extension. It
would indeed be better not to use that in uAPI. Second, I wouldn't even
use a flexible array member here, see my reply to Jan for the reasons.
Note that I commented on struct nsh_md2_tlv having __u8[] as the last
member which IMHO makes good sense. I'm not entirely sure what C99 says
about flexible array member being part of a struct inside union inside
a struct, though. GCC seems to cope with that just fine but AFAIK it
has some extension over the C standard wrt. flexible array members.
> I don't know how we can support this, is it a must-have thing?
What would happen if you get a GSO packet? Ports of an ovs bridge claim
GSO support, thus they may get a GSO packet. You have to handle it one
way or the other: either software segment the packet before pushing the
header, or implement proper GSO support for NSH.
> But struct nsh_hdr had different struct from struct ovs_key_nsh, we
> have no way to make them completely same, do you mean we should use the
> same name if they are same fields and represent the same thing?
Yes.
Thanks,
Jiri
^ permalink raw reply
* Re: [PATCH 2/2] net: phy: Don't use drv when it is NULL in phy_attached_print
From: Sergei Shtylyov @ 2017-08-21 9:45 UTC (permalink / raw)
To: Romain Perier, Giuseppe Cavallaro, Alexandre Torgue, Andrew Lunn,
Florian Fainelli
Cc: netdev, linux-kernel
In-Reply-To: <20170821075235.28473-3-romain.perier@collabora.com>
Hello!
On 8/21/2017 10:52 AM, Romain Perier wrote:
> Currently, if this logging function is used prior the phy driver is
> binded to the phy device (that is usually done from .ndo_open),
s/binded/bound/.
> 'phydev->drv' might be NULL, resulting in a kernel crash. That is
> typically the case in the stmmac driver, info about the phy is displayed
> during the registration of the MDIO bus, and then genphy driver is binded
Likewise.
> to this phydev when .ndo_open is called.
>
> This commit fixes the issue by using the right genphy driver, when
> phydev->drv is NULL.
>
> Fixes: commit fbca164776e4 ("net: stmmac: Use the right logging functi")
"Commit" not needed here.
> Signed-off-by: Romain Perier <romain.perier@collabora.com>
[...]
MBR, Sergei
^ permalink raw reply
* RE: Something hitting my total number of connections to the server
From: David Laight @ 2017-08-21 9:43 UTC (permalink / raw)
To: 'Akshat Kakkar', Eric Dumazet; +Cc: netdev
In-Reply-To: <CAA5aLPjs2S0cV7MoYLB6MvZVKXomUmwpi=EVHDXh=AwR_xBqsw@mail.gmail.com>
From: Akshat Kakkar
> Sent: 18 August 2017 10:14
> On Thu, Aug 17, 2017 at 5:06 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > On Thu, 2017-08-17 at 14:35 +0530, Akshat Kakkar wrote:
> >
> >> I upgraded to 4.4 but still experiencing same issue.
> >> Please help.
> >
> > Still too old kernel, shoot again ;)
> >
> >
>
>
> Sorry but that's the maximum I can try as of now as its the LT version.
You should be able to build a current kernel and run it with your
existing user space.
David
^ permalink raw reply
* RE: [PATCH net-next v4] openvswitch: enable NSH support
From: Jan Scheurich @ 2017-08-21 9:42 UTC (permalink / raw)
To: Jiri Benc
Cc: Yang, Yi, netdev@vger.kernel.org, dev@openvswitch.org,
blp@ovn.org, e@erig.me
In-Reply-To: <20170821113514.6f0ec15e@griffin>
> > struct nsh_hdr {
> > ovs_be16 ver_flags_ttl_len;
> > uint8_t mdtype;
> > uint8_t np;
> > ovs_16aligned_be32 path_hdr;
> > union {
> > struct nsh_md1_ctx md1;
> > struct nsh_md2_tlv md2[];
>
> I'm not that sure about this. With each member of md2 having a different
> size, you can't use md2 as an array. However, if it was declared as an
> array, it might encourage such (wrong) usage.
>
> In particular, nsh_hdr->md2[1] is always wrong.
>
> It seems better to not declare md2 as an array.
I understand your concern. But not declaring md2 as array is wrong as well, as there might not be an MD2 TLV context header. An MD2 NSH header is perfectly valid without any TLV. So in any case the user of the struct needs to be aware of the NSH semantics.
> >
> > I wonder about the possible 16-bit alignment of the 32-bit fields,
> > though. How is that handled in the kernel?
>
> get_unaligned_*
>
> > Also struct nsh_md1_ctx
> > has 32-bit members, which might not be 32-bit aligned in the packet.
>
> I don't see that happening, it seems the header before md1 is 8 bytes and
> sizeof(md1) is 32 bytes? And for md2, the standard mandates that the md2
> size is a multiply of 4 bytes, too.
NSH can be carried over Ethernet with a 14 byte header. In that case the total NSH header would typically be 16-bit aligned, so that all 32-bit members would be misaligned.
^ permalink raw reply
* Re: [PATCH net-next v4] openvswitch: enable NSH support
From: Jiri Benc @ 2017-08-21 9:35 UTC (permalink / raw)
To: Jan Scheurich
Cc: netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
dev-yBygre7rU0TnMu66kgdUjQ@public.gmane.org, e@erig.me
In-Reply-To: <CFF8EF42F1132E4CBE2BF0AB6C21C58D727494F3-hqolJogE5njKJFWPz4pdheaU1rCVNFv4@public.gmane.org>
On Mon, 21 Aug 2017 09:04:30 +0000, Jan Scheurich wrote:
> The second member of the union should be a variable length array []
> of struct nsh_md2_tlv
>
> struct nsh_hdr {
> ovs_be16 ver_flags_ttl_len;
> uint8_t mdtype;
> uint8_t np;
> ovs_16aligned_be32 path_hdr;
> union {
> struct nsh_md1_ctx md1;
> struct nsh_md2_tlv md2[];
I'm not that sure about this. With each member of md2 having
a different size, you can't use md2 as an array. However, if it was
declared as an array, it might encourage such (wrong) usage.
In particular, nsh_hdr->md2[1] is always wrong.
It seems better to not declare md2 as an array.
> };
> };
>
> That was the original design before Ben removed it due to missing
> support in Microsoft compiler. For the Kernel datapath we should go
> back to that.
>
> I wonder about the possible 16-bit alignment of the 32-bit fields,
> though. How is that handled in the kernel?
get_unaligned_*
> Also struct nsh_md1_ctx
> has 32-bit members, which might not be 32-bit aligned in the packet.
I don't see that happening, it seems the header before md1 is 8 bytes
and sizeof(md1) is 32 bytes? And for md2, the standard mandates that
the md2 size is a multiply of 4 bytes, too.
Jiri
^ permalink raw reply
* Re: [PATCH net-next v4] openvswitch: enable NSH support
From: Jan Scheurich @ 2017-08-21 9:31 UTC (permalink / raw)
To: 'Yang, Yi', 'Jiri Benc'
Cc: 'dev-yBygre7rU0TnMu66kgdUjQ@public.gmane.org',
'netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org',
'e@erig.me'
In-Reply-To: <CFF8EF42F1132E4CBE2BF0AB6C21C58D727494F3-hqolJogE5njKJFWPz4pdheaU1rCVNFv4@public.gmane.org>
>
> The second member of the union should be a variable length array [] of
> struct nsh_md2_tlv
>
> struct nsh_hdr {
> ovs_be16 ver_flags_ttl_len;
> uint8_t mdtype;
> uint8_t np;
> ovs_16aligned_be32 path_hdr;
> union {
> struct nsh_md1_ctx md1;
> struct nsh_md2_tlv md2[];
> };
> };
>
> That was the original design before Ben removed it due to missing support
> in Microsoft compiler.
> For the Kernel datapath we should go back to that.
>
> I wonder about the possible 16-bit alignment of the 32-bit fields, though.
> How is that handled in the kernel? Also struct nsh_md1_ctx has 32-bit
> members, which might not be 32-bit aligned in the packet.
>
One afterthought:
I think it would be good to add another member to the union to represent unstructured context headers as used, e.g., in the context of push_nsh.
struct nsh_hdr {
ovs_be16 ver_flags_ttl_len;
uint8_t mdtype;
uint8_t np;
ovs_16aligned_be32 path_hdr;
union {
uint8_t ctx_headers[];
struct nsh_md1_ctx md1;
struct nsh_md2_tlv md2[];
};
};
^ permalink raw reply
* [WARNING] stmmac: refcount_t: saturated; leaking memory
From: Corentin Labbe @ 2017-08-21 9:29 UTC (permalink / raw)
To: peppe.cavallaro, alexandre.torgue, davem, netdev; +Cc: linux-kernel
Hello
I got on all my stmmac boards the following warning:
[12605.062840] ------------[ cut here ]------------
[12605.062956] WARNING: CPU: 0 PID: 15637 at /linux-next/lib/refcount.c:77 refcount_add_not_zero+0xa8/0xb8
[12605.062972] refcount_t: saturated; leaking memory.
[12605.062987] Modules linked in: iptable_filter ip_tables x_tables nfnetlink_log nfnetlink sun8i_ce crypto_engine
[12605.063122] CPU: 0 PID: 15637 Comm: kworker/0:1 Not tainted 4.13.0-rc5-next-20170817+ #403
[12605.063136] Hardware name: Allwinner sun8i Family
[12605.063161] Workqueue: rpciod rpc_async_schedule
[12605.063186] Backtrace:
[12605.063222] [<c010cc30>] (dump_backtrace) from [<c010cf2c>] (show_stack+0x18/0x1c)
[12605.063241] r7:c0c51700 r6:00000000 r5:600d0013 r4:c0c51700
[12605.063262] [<c010cf14>] (show_stack) from [<c06ea23c>] (dump_stack+0xac/0xd8)
[12605.063283] [<c06ea190>] (dump_stack) from [<c0125114>] (__warn+0xec/0x104)
[12605.063304] r10:c064c3c8 r9:c0400ea8 r8:0000004d r7:00000009 r6:c0987468 r5:00000000
[12605.063319] r4:ed8cd7b0 r3:00040d00
[12605.063339] [<c0125028>] (__warn) from [<c012516c>] (warn_slowpath_fmt+0x40/0x48)
[12605.063359] r9:ee2c5bb8 r8:ebf8e6c0 r7:c064c3c8 r6:ebf8ee40 r5:0a028725 r4:c0987440
[12605.063379] [<c0125130>] (warn_slowpath_fmt) from [<c0400ea8>] (refcount_add_not_zero+0xa8/0xb8)
[12605.063395] r3:c0c5262c r2:c0987440
[12605.063408] r4:00000001
[12605.063427] [<c0400e00>] (refcount_add_not_zero) from [<c0400ec8>] (refcount_add+0x10/0x50)
[12605.063443] r5:0a028725 r4:ee5f4e80
[12605.063465] [<c0400eb8>] (refcount_add) from [<c065deec>] (tcp_gso_segment+0x448/0x47c)
[12605.063486] [<c065daa4>] (tcp_gso_segment) from [<c065df6c>] (tcp4_gso_segment+0x4c/0xac)
[12605.063506] r10:c08744bc r9:0000005e r8:00000000 r7:00000020 r6:00004833 r5:0000006c
[12605.063520] r4:ee2c5bb8
[12605.063541] [<c065df20>] (tcp4_gso_segment) from [<c067202c>] (inet_gso_segment+0x1a0/0x330)
[12605.063559] r7:00000000 r6:0000184e r5:0000006c r4:ee2c5bb8
[12605.063582] [<c0671e8c>] (inet_gso_segment) from [<c05e4fa0>] (skb_mac_gso_segment+0xe8/0x1f4)
[12605.063602] r10:ee842850 r9:ee9e8000 r8:c0671e8c r7:ee2c5bb8 r6:00000008 r5:00000020
[12605.063616] r4:00004833
[12605.063638] [<c05e4eb8>] (skb_mac_gso_segment) from [<c05e517c>] (__skb_gso_segment+0xd0/0x198)
[12605.063657] r8:00000000 r7:00000020 r6:00004833 r5:00000001 r4:ee2c5bb8
[12605.063679] [<c05e50ac>] (__skb_gso_segment) from [<c05e5700>] (validate_xmit_skb+0x124/0x2e4)
[12605.063698] r9:ee9e8000 r8:00000000 r7:00000020 r6:00004833 r5:00000000 r4:ee2c5bb8
[12605.063720] [<c05e55dc>] (validate_xmit_skb) from [<c05e58f8>] (validate_xmit_skb_list+0x38/0x68)
[12605.063740] r10:ee842850 r9:00000000 r8:00000000 r7:ee9e8000 r6:00000000 r5:ee2c5bb8
[12605.063754] r4:00000000
[12605.063777] [<c05e58c0>] (validate_xmit_skb_list) from [<c06127d4>] (sch_direct_xmit+0x154/0x19c)
[12605.063797] r9:00000000 r8:00000001 r7:ee9e8000 r6:ee91d800 r5:ee2c5bb8 r4:ee842800
[12605.063817] [<c0612680>] (sch_direct_xmit) from [<c05e6148>] (__dev_queue_xmit+0x614/0x7ec)
[12605.063836] r8:013ba711 r7:00000000 r6:00000bd4 r5:ee842800 r4:00000001
[12605.063858] [<c05e5b34>] (__dev_queue_xmit) from [<c05e6334>] (dev_queue_xmit+0x14/0x18)
[12605.063877] r10:00000000 r9:0000000e r8:ee2c5bb8 r7:ee84210c r6:00000000 r5:ee842144
[12605.063891] r4:ee842000
[12605.063913] [<c05e6320>] (dev_queue_xmit) from [<c06303c0>] (ip_finish_output2+0x2e8/0x758)
[12605.063933] [<c06300d8>] (ip_finish_output2) from [<c0632388>] (ip_finish_output+0x238/0x348)
[12605.063953] r10:ee9e8000 r9:00000001 r8:000005dc r7:00000000 r6:c0c4a580 r5:ee336580
[12605.063966] r4:ee2c5bb8
[12605.063985] [<c0632150>] (ip_finish_output) from [<c063382c>] (ip_output+0x10c/0x30c)
[12605.064005] r10:ee9e8000 r9:00000000 r8:00003eb6 r7:00000000 r6:00291fb3 r5:ee2c5bb8
[12605.064018] r4:ef7b1288
[12605.064037] [<c0633720>] (ip_output) from [<c06327b4>] (ip_local_out+0x48/0x84)
[12605.064057] r10:ee27f900 r9:00026900 r8:00026900 r7:00000000 r6:ee336580 r5:c0c4a580
[12605.064070] r4:ee2c5bb8
[12605.064089] [<c063276c>] (ip_local_out) from [<c0632bd0>] (ip_queue_xmit+0x1e8/0x634)
[12605.064107] r7:00000000 r6:ee3368b0 r5:ee336580 r4:ee2c5bb8
[12605.064129] [<c06329e8>] (ip_queue_xmit) from [<c064ddb4>] (tcp_transmit_skb+0x444/0x8e4)
[12605.064149] r10:00000000 r9:00026900 r8:00026900 r7:00000000 r6:00000000 r5:ee2c5bb8
[12605.064163] r4:ee336580
[12605.064184] [<c064d970>] (tcp_transmit_skb) from [<c064e470>] (tcp_write_xmit+0x21c/0xfe0)
[12605.064204] r9:000005a8 r8:0a02817d r7:00000000 r6:ee2c5b00 r5:00000b50 r4:ee336580
[12605.064225] [<c064e254>] (tcp_write_xmit) from [<c064f270>] (__tcp_push_pending_frames+0x3c/0xa4)
[12605.064245] r10:000005a8 r9:00000000 r8:00000f03 r7:0a02817d r6:00008000 r5:ee2c5b00
[12605.064259] r4:ee336580
[12605.064280] [<c064f234>] (__tcp_push_pending_frames) from [<c063c388>] (tcp_push+0xcc/0x138)
[12605.064294] r4:ee336580
[12605.064314] [<c063c2bc>] (tcp_push) from [<c063ffcc>] (do_tcp_sendpages+0x568/0x5b0)
[12605.064331] r7:0000c040 r6:00000f03 r5:ee2c5b00 r4:ee336580
[12605.064351] [<c063fa64>] (do_tcp_sendpages) from [<c06400c0>] (tcp_sendpage_locked+0xac/0xc8)
[12605.064371] r10:ed8cddf4 r9:00000f03 r8:00000f03 r7:00000000 r6:00000008 r5:eff47880
[12605.064385] r4:ee336580
[12605.064405] [<c0640014>] (tcp_sendpage_locked) from [<c0640120>] (tcp_sendpage+0x44/0x5c)
[12605.064423] r7:00000000 r6:eff47880 r5:0000c040 r4:ee336580
[12605.064443] [<c06400dc>] (tcp_sendpage) from [<c0671aec>] (inet_sendpage+0x6c/0x244)
[12605.064461] r8:00000000 r7:eff47880 r6:eec5b980 r5:c06400dc r4:ee336580
[12605.064482] [<c0671a80>] (inet_sendpage) from [<c06c3df0>] (xs_sendpages+0x1e4/0x230)
[12605.064502] r10:ed8cddf4 r9:00001000 r8:ee1893bc r7:eec5b980 r6:ee11d304 r5:00000000
[12605.064515] r4:00000f03
[12605.064535] [<c06c3c0c>] (xs_sendpages) from [<c06c40b0>] (xs_tcp_send_request+0xa0/0x198)
[12605.064555] r10:c09c77c4 r9:ee189200 r8:c0c52990 r7:ed807000 r6:ee11d304 r5:00000000
[12605.064568] r4:ee11d300
[12605.064588] [<c06c4010>] (xs_tcp_send_request) from [<c06c12dc>] (xprt_transmit+0x4c/0x22c)
[12605.064607] r10:00000001 r9:00000000 r8:ed8073a4 r7:ee11d374 r6:ee189200 r5:ee11d300
[12605.064621] r4:ed807000
[12605.064643] [<c06c1290>] (xprt_transmit) from [<c06be528>] (call_transmit+0x168/0x200)
[12605.064662] r9:00000000 r8:1462b931 r7:ee11d300 r6:00004001 r5:ee11d300 r4:ee189200
[12605.064683] [<c06be3c0>] (call_transmit) from [<c06c63e4>] (__rpc_execute+0x60/0x27c)
[12605.064702] r8:c06c5d28 r7:c06c5994 r6:00000001 r5:00000000 r4:ee189200
[12605.064722] [<c06c6384>] (__rpc_execute) from [<c06c6614>] (rpc_async_schedule+0x14/0x18)
[12605.064742] r10:00000001 r9:00000000 r8:00000000 r7:ef7b1700 r6:ef7ac0c0 r5:ee694700
[12605.064756] r4:ee189224
[12605.064777] [<c06c6600>] (rpc_async_schedule) from [<c014126c>] (process_one_work+0x25c/0x524)
[12605.064796] [<c0141010>] (process_one_work) from [<c01421e4>] (worker_thread+0x58/0x588)
[12605.064816] r10:ee694700 r9:ed8cc000 r8:ef7ac0f8 r7:c0c04d00 r6:00000008 r5:ee694718
[12605.064830] r4:ef7ac0c0
[12605.064850] [<c014218c>] (worker_thread) from [<c0148514>] (kthread+0x174/0x1b0)
[12605.064870] r10:ee694700 r9:ee627da8 r8:ee510480 r7:ed8cc000 r6:ef3e5480 r5:00000000
[12605.064884] r4:ee510400
[12605.064906] [<c01483a0>] (kthread) from [<c0108238>] (ret_from_fork+0x14/0x3c)
[12605.064925] r10:00000000 r9:00000000 r8:00000000 r7:00000000 r6:00000000 r5:c01483a0
[12605.064939] r4:ef3e5480
[12605.064954] ---[ end trace c8c0d6caaf57fa10 ]---
So basicly it means that stmmac leaks skb ?
Thanks
Regards
^ permalink raw reply
* [iproute PATCH v3 2/6] iplink_can: Prevent overstepping array bounds
From: Phil Sutter @ 2017-08-21 9:27 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: netdev
In-Reply-To: <20170821092704.21614-1-phil@nwl.cc>
can_state_names array contains at most CAN_STATE_MAX fields, so allowing
an index to it to be equal to that number is wrong. While here, also
make sure the array is indeed that big so nothing bad happens if
CAN_STATE_MAX ever increases.
Signed-off-by: Phil Sutter <phil@nwl.cc>
---
ip/iplink_can.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/ip/iplink_can.c b/ip/iplink_can.c
index 5df56b2bbbb3b..2954010fefa22 100644
--- a/ip/iplink_can.c
+++ b/ip/iplink_can.c
@@ -251,7 +251,7 @@ static int can_parse_opt(struct link_util *lu, int argc, char **argv,
return 0;
}
-static const char *can_state_names[] = {
+static const char *can_state_names[CAN_STATE_MAX] = {
[CAN_STATE_ERROR_ACTIVE] = "ERROR-ACTIVE",
[CAN_STATE_ERROR_WARNING] = "ERROR-WARNING",
[CAN_STATE_ERROR_PASSIVE] = "ERROR-PASSIVE",
@@ -275,7 +275,7 @@ static void can_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[])
if (tb[IFLA_CAN_STATE]) {
uint32_t state = rta_getattr_u32(tb[IFLA_CAN_STATE]);
- fprintf(f, "state %s ", state <= CAN_STATE_MAX ?
+ fprintf(f, "state %s ", state < CAN_STATE_MAX ?
can_state_names[state] : "UNKNOWN");
}
--
2.13.1
^ permalink raw reply related
* [iproute PATCH v3 5/6] netem/maketable: Check return value of fstat()
From: Phil Sutter @ 2017-08-21 9:27 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: netdev
In-Reply-To: <20170821092704.21614-1-phil@nwl.cc>
Otherwise info.st_size may contain garbage.
Signed-off-by: Phil Sutter <phil@nwl.cc>
---
netem/maketable.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/netem/maketable.c b/netem/maketable.c
index 6aff927be7040..ad660e7d457f0 100644
--- a/netem/maketable.c
+++ b/netem/maketable.c
@@ -24,8 +24,8 @@ readdoubles(FILE *fp, int *number)
int limit;
int n=0, i;
- fstat(fileno(fp), &info);
- if (info.st_size > 0) {
+ if (!fstat(fileno(fp), &info) &&
+ info.st_size > 0) {
limit = 2*info.st_size/sizeof(double); /* @@ approximate */
} else {
limit = 10000;
--
2.13.1
^ permalink raw reply related
* [iproute PATCH v3 6/6] tc/q_multiq: Don't pass garbage in TCA_OPTIONS
From: Phil Sutter @ 2017-08-21 9:27 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: netdev
In-Reply-To: <20170821092704.21614-1-phil@nwl.cc>
multiq_parse_opt() doesn't change 'opt' at all. So at least make sure
it doesn't fill TCA_OPTIONS attribute with garbage from stack.
Signed-off-by: Phil Sutter <phil@nwl.cc>
---
tc/q_multiq.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tc/q_multiq.c b/tc/q_multiq.c
index 7823931494563..9c09c9a7748f6 100644
--- a/tc/q_multiq.c
+++ b/tc/q_multiq.c
@@ -43,7 +43,7 @@ static void explain(void)
static int multiq_parse_opt(struct qdisc_util *qu, int argc, char **argv,
struct nlmsghdr *n)
{
- struct tc_multiq_qopt opt;
+ struct tc_multiq_qopt opt = {};
if (argc) {
if (strcmp(*argv, "help") == 0) {
--
2.13.1
^ permalink raw reply related
* [iproute PATCH v3 1/6] ipaddress: Avoid accessing uninitialized variable lcl
From: Phil Sutter @ 2017-08-21 9:26 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: netdev
In-Reply-To: <20170821092704.21614-1-phil@nwl.cc>
If no address was given, ipaddr_modify() accesses uninitialized data
when assigning to req.ifa.ifa_prefixlen.
Signed-off-by: Phil Sutter <phil@nwl.cc>
---
ip/ipaddress.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/ip/ipaddress.c b/ip/ipaddress.c
index 4d37c5e045071..c9312f06dbd4d 100644
--- a/ip/ipaddress.c
+++ b/ip/ipaddress.c
@@ -1888,7 +1888,7 @@ static int ipaddr_modify(int cmd, int flags, int argc, char **argv)
char *lcl_arg = NULL;
char *valid_lftp = NULL;
char *preferred_lftp = NULL;
- inet_prefix lcl;
+ inet_prefix lcl = {};
inet_prefix peer;
int local_len = 0;
int peer_len = 0;
--
2.13.1
^ permalink raw reply related
* [iproute PATCH v3 0/6] Covscan: Don't access garbage
From: Phil Sutter @ 2017-08-21 9:26 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: netdev
This series collects patches from v1 which resolve situations where
garbage might be read, either due to missing initialization of
variables or accessing data which went out of scope.
Changes since v2:
- Rebased onto current master branch.
- Dropped first patch since it is not a real issue.
Phil Sutter (6):
ipaddress: Avoid accessing uninitialized variable lcl
iplink_can: Prevent overstepping array bounds
ipmaddr: Avoid accessing uninitialized data
ss: Use C99 initializer in netlink_show_one()
netem/maketable: Check return value of fstat()
tc/q_multiq: Don't pass garbage in TCA_OPTIONS
ip/ipaddress.c | 2 +-
ip/iplink_can.c | 4 ++--
ip/ipmaddr.c | 2 +-
misc/ss.c | 13 +++++++------
netem/maketable.c | 4 ++--
tc/q_multiq.c | 2 +-
6 files changed, 14 insertions(+), 13 deletions(-)
--
2.13.1
^ permalink raw reply
* [iproute PATCH v3 3/6] ipmaddr: Avoid accessing uninitialized data
From: Phil Sutter @ 2017-08-21 9:27 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: netdev
In-Reply-To: <20170821092704.21614-1-phil@nwl.cc>
Looks like this can only happen if /proc/net/igmp is malformed, but
better be sure.
Signed-off-by: Phil Sutter <phil@nwl.cc>
---
ip/ipmaddr.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/ip/ipmaddr.c b/ip/ipmaddr.c
index 4f726fdd976f1..85a69e779563d 100644
--- a/ip/ipmaddr.c
+++ b/ip/ipmaddr.c
@@ -136,7 +136,7 @@ static void read_igmp(struct ma_info **result_p)
while (fgets(buf, sizeof(buf), fp)) {
struct ma_info *ma;
- size_t len;
+ size_t len = 0;
if (buf[0] != '\t') {
sscanf(buf, "%d%s", &m.index, m.name);
--
2.13.1
^ permalink raw reply related
* [iproute PATCH v3 4/6] ss: Use C99 initializer in netlink_show_one()
From: Phil Sutter @ 2017-08-21 9:27 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: netdev
In-Reply-To: <20170821092704.21614-1-phil@nwl.cc>
This has the additional benefit of initializing st.ino to zero which is
used later in is_sctp_assoc() function.
Signed-off-by: Phil Sutter <phil@nwl.cc>
---
misc/ss.c | 13 +++++++------
1 file changed, 7 insertions(+), 6 deletions(-)
diff --git a/misc/ss.c b/misc/ss.c
index 10360e5a04ff8..63d12871dd826 100644
--- a/misc/ss.c
+++ b/misc/ss.c
@@ -3480,17 +3480,18 @@ static int netlink_show_one(struct filter *f,
int rq, int wq,
unsigned long long sk, unsigned long long cb)
{
- struct sockstat st;
+ struct sockstat st = {
+ .state = SS_CLOSE,
+ .rq = rq,
+ .wq = wq,
+ .local.family = AF_NETLINK,
+ .remote.family = AF_NETLINK,
+ };
SPRINT_BUF(prot_buf) = {};
const char *prot_name;
char procname[64] = {};
- st.state = SS_CLOSE;
- st.rq = rq;
- st.wq = wq;
- st.local.family = st.remote.family = AF_NETLINK;
-
if (f->f) {
st.rport = -1;
st.lport = pid;
--
2.13.1
^ permalink raw reply related
* Re: [PATCH net-next v4] openvswitch: enable NSH support
From: Jiri Benc @ 2017-08-21 9:18 UTC (permalink / raw)
To: Yang, Yi
Cc: dev-yBygre7rU0TnMu66kgdUjQ@public.gmane.org,
netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, e@erig.me
In-Reply-To: <20170821083900.GA74649-re2EX8HDrk21gSHoDXDV2kEOCMrvLtNR@public.gmane.org>
On Mon, 21 Aug 2017 16:39:01 +0800, Yang, Yi wrote:
> Anyway, we need to keep the code in userspace consistent with the one in
> kernel as possible as, otherwise it will be a burden for developer, I
> know userspace has different coding standard from kernel, this will make
> developer painful if we have two sets of code although they have same
> functionality.
I'm sorry, I don't get this. What's wrong with having __u8[] as the
last member of the struct? That's C99. It's 18 years old standard.
We're using that throughout our uAPI. Why that should be a problem for
any user space program?
> > MPLS supports GSO and needs this for segmentation. I don't see anything
> > GSO related in this patch.
> >
> > How do you plan to address GSO, anyway?
>
> No plan to do that, I'm not an expert on this, we can remove it if
> you're very sure it is necessary.
Without GSO, I don't see any use for inner_protocol.
However, don't you need to software segment the packet if it's GSO
before pushing the NSH header?
And wouldn't it be better to implement GSO for NSH, anyway?
> To make sure we make agreement, please confirm if this one is ok?
>
> struct nsh_hdr {
> ovs_be16 ver_flags_ttl_len;
> uint8_t mdtype;
> uint8_t np;
> ovs_16aligned_be32 path_hdr;
> union {
> struct nsh_md1_ctx md1;
> struct nsh_md2_tlv md2;
> };
> };
>
> Or it will be better if you can provide your preferred version here.
I don't really care that much about the names if it's clear what they
mean. I was merely commenting on the inconsistency which looked weird.
Whether it's md_type or mdtype, I don't have a preference (does not
mean others won't, though :-)). Just pick one and stick to it, as far
as I'm concerned.
Jiri
^ permalink raw reply
* Re: [PATCH net-next v4] openvswitch: enable NSH support
From: Yang, Yi @ 2017-08-21 9:15 UTC (permalink / raw)
To: Jiri Benc
Cc: dev-yBygre7rU0TnMu66kgdUjQ@public.gmane.org,
netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, e@erig.me
In-Reply-To: <20170821111854.42dece9f@griffin>
On Mon, Aug 21, 2017 at 11:18:54AM +0200, Jiri Benc wrote:
> On Mon, 21 Aug 2017 16:39:01 +0800, Yang, Yi wrote:
> > Anyway, we need to keep the code in userspace consistent with the one in
> > kernel as possible as, otherwise it will be a burden for developer, I
> > know userspace has different coding standard from kernel, this will make
> > developer painful if we have two sets of code although they have same
> > functionality.
>
> I'm sorry, I don't get this. What's wrong with having __u8[] as the
> last member of the struct? That's C99. It's 18 years old standard.
> We're using that throughout our uAPI. Why that should be a problem for
> any user space program?
The issue is it is used union in
struct nsh_hdr {
ovs_be16 ver_flags_ttl_len;
uint8_t md_type;
uint8_t next_proto;
ovs_16aligned_be32 path_hdr;
union {
struct nsh_md1_ctx md1;
struct nsh_md2_tlv md2;
};
};
in Linux kernel build, it complained it, I changed it to
struct nsh_hdr {
ovs_be16 ver_flags_ttl_len;
uint8_t md_type;
uint8_t next_proto;
ovs_16aligned_be32 path_hdr;
union {
struct nsh_md1_ctx md1;
struct nsh_md2_tlv md2[0];
};
};
It is ok, but for Microsoft compiler, it isn't allowed there is struct
nsh_md2_tlv md2[0] in a union, that is Ben Pfaff's hack :-)
>
> > > MPLS supports GSO and needs this for segmentation. I don't see anything
> > > GSO related in this patch.
> > >
> > > How do you plan to address GSO, anyway?
> >
> > No plan to do that, I'm not an expert on this, we can remove it if
> > you're very sure it is necessary.
>
> Without GSO, I don't see any use for inner_protocol.
>
> However, don't you need to software segment the packet if it's GSO
> before pushing the NSH header?
>
> And wouldn't it be better to implement GSO for NSH, anyway?
I don't know how we can support this, is it a must-have thing?
>
> > To make sure we make agreement, please confirm if this one is ok?
> >
> > struct nsh_hdr {
> > ovs_be16 ver_flags_ttl_len;
> > uint8_t mdtype;
> > uint8_t np;
> > ovs_16aligned_be32 path_hdr;
> > union {
> > struct nsh_md1_ctx md1;
> > struct nsh_md2_tlv md2;
> > };
> > };
> >
> > Or it will be better if you can provide your preferred version here.
>
> I don't really care that much about the names if it's clear what they
> mean. I was merely commenting on the inconsistency which looked weird.
> Whether it's md_type or mdtype, I don't have a preference (does not
> mean others won't, though :-)). Just pick one and stick to it, as far
> as I'm concerned.
But struct nsh_hdr had different struct from struct ovs_key_nsh, we
have no way to make them completely same, do you mean we should use the
same name if they are same fields and represent the same thing?
>
> Jiri
^ permalink raw reply
* RE: [PATCH net-next v4] openvswitch: enable NSH support
From: Jan Scheurich @ 2017-08-21 9:04 UTC (permalink / raw)
To: Yang, Yi, Jiri Benc
Cc: netdev@vger.kernel.org, dev@openvswitch.org, blp@ovn.org,
e@erig.me
In-Reply-To: <20170821083900.GA74649@cran64.bj.intel.com>
> struct nsh_hdr {
> ovs_be16 ver_flags_ttl_len;
> uint8_t mdtype;
> uint8_t np;
> ovs_16aligned_be32 path_hdr;
> union {
> struct nsh_md1_ctx md1;
> struct nsh_md2_tlv md2;
> };
> };
>
The second member of the union should be a variable length array [] of struct nsh_md2_tlv
struct nsh_hdr {
ovs_be16 ver_flags_ttl_len;
uint8_t mdtype;
uint8_t np;
ovs_16aligned_be32 path_hdr;
union {
struct nsh_md1_ctx md1;
struct nsh_md2_tlv md2[];
};
};
That was the original design before Ben removed it due to missing support in Microsoft compiler.
For the Kernel datapath we should go back to that.
I wonder about the possible 16-bit alignment of the 32-bit fields, though. How is that handled in the kernel? Also struct nsh_md1_ctx has 32-bit members, which might not be 32-bit aligned in the packet.
BR, Jan
^ permalink raw reply
* Re: [PATCH v2 0/5] ARM: dts: rcar-gen2: Convert to new CPG/MSSR bindings
From: Simon Horman @ 2017-08-21 9:02 UTC (permalink / raw)
To: Geert Uytterhoeven
Cc: Magnus Damm, linux-renesas-soc, linux-clk, linux-arm-kernel,
devicetree, Andrew Lunn, Florian Fainelli, netdev
In-Reply-To: <1503047498-29078-1-git-send-email-geert+renesas@glider.be>
On Fri, Aug 18, 2017 at 11:11:33AM +0200, Geert Uytterhoeven wrote:
> Hi Simon, Magnus,
>
> Currently Renesas R-Car Gen2 SoCs use the common clk-rcar-gen2,
> clk-mstp, and clk-div6 drivers, which depend on most clocks being
> described in DT. Especially the module (MSTP) clocks are cumbersome and
> error prone, due to 3 arrays (clocks, clock-indices, and
> clock-output-names) to be kept in sync. In addition, the clk-mstp driver
> cannot be extended easily to also support module resets, which are
> provided by the same hardware module.
>
> Hence when developing support for R-Car Gen3 SoCs, another approach was
> chosen, which led to the CPG/MSSR driver core, and SoC-specific
> subdrivers (initially for R-Car Gen3, but later also for RZ/G1).
>
> This series converts the various R-Car Gen2 DTSes to migrate to the new
> CPG/MSSR drivers that were added in v4.13-rc1.
>
> Note that module reset descriptions will be added later.
>
> Changes compared to v1:
> - Rebased.
>
> Dependencies:
> - renesas-devel-20170818-v4.13-rc5.
>
> Known issues:
> - The CPG/MSSR driver is initialized later than the old clk-rcar-gen2
> driver, causing changes of initialization order for other drivers.
>
> Currently the PHY subsystem does not support probe deferral
>
> +irq: no irq domain found for /interrupt-controller@e61c0000 !
>
> -Micrel KSZ8041RNLI ee700000.ethernet-ffffffff:01: attached PHY driver [Micrel KSZ8041RNLI] (mii_bus:phy_addr=ee700000.ethernet-ffffffff:01, irq=182)
> +Micrel KSZ8041RNLI ee700000.ethernet-ffffffff:01: attached PHY driver [Micrel KSZ8041RNLI] (mii_bus:phy_addr=ee700000.ethernet-ffffffff:01, irq=-1)
>
> leading to the Ethernet PHY falling back to polling instead of using an
> interrupt. This can be remedied by "of_mdio: Fix broken PHY IRQ in
> case of probe deferral" (https://patchwork.kernel.org/patch/9734175/),
> which is still pending approval.
>
> Fortunately the impact of this is limited to a small delay in the
> detection of cable (un)plugging.
> Note that when using the PHY interrupt, cable unplugging is reported
> instantaneous, but plugging takes ca. 1.5-1.8 seconds. Hence when
> using polling with the standard interval of 1 second, the link comes up
> in about the same time.
>
> For your convenience, this series is also available in the
> topic/rcar2-cpg-mssr-dt-v2 branch of my renesas-drivers git repository at
> git://git.kernel.org/pub/scm/linux/kernel/git/geert/renesas-drivers.git.
>
> This has been tested on r8a7790/lager, r8a7791/koelsch, r8a7792/blanche,
> r8a7793/gose, and r8a7794/alt. /sys/kernel/debug/clk/clk_summary has
> been compared before and after the conversion.
>
> Thanks for applying!
Thanks, applied for v4.15.
^ permalink raw reply
* Re: [iproute PATCH v2 1/7] devlink: No need for this self-assignment
From: Jiri Pirko @ 2017-08-21 9:02 UTC (permalink / raw)
To: Phil Sutter, Stephen Hemminger, netdev
In-Reply-To: <20170818102024.GF10864@orbyte.nwl.cc>
Fri, Aug 18, 2017 at 12:20:24PM CEST, phil@nwl.cc wrote:
>On Thu, Aug 17, 2017 at 09:48:50PM +0200, Jiri Pirko wrote:
>> Thu, Aug 17, 2017 at 07:09:25PM CEST, phil@nwl.cc wrote:
>> >dl_argv_handle_both() will either assign to handle_bit or error out in
>> >which case the variable is not used by the caller.
>>
>> I'm pretty sure that I did this to silence the compiler. If the compiler
>> bug is fixed now, good.
>
>That might depend on the compiler you used, so maybe you just want to
>give it a try in your environment? If it still happens, we can keep this
>self-assignment of course since it shouldn't harm.
No warning with gcc 6.3.1
^ permalink raw reply
* [PATCH v3] net: sunrpc: svcsock: fix NULL-pointer exception
From: Vadim Lomovtsev @ 2017-08-21 8:56 UTC (permalink / raw)
To: trond.myklebust, anna.schumaker, bfields, jlayton, davem,
linux-nfs, netdev, linux-kernel, pabeni
Cc: Vadim Lomovtsev
In-Reply-To: <1503305085-5488-1-git-send-email-vlomovts@redhat.com>
While running nfs/connectathon tests kernel NULL-pointer exception
has been observed due to races in svcsock.c.
Race is appear when kernel accepts connection by kernel_accept
(which creates new socket) and start queuing ingress packets
to new socket. This happens in ksoftirq context which could run
concurrently on a different core while new socket setup is not done yet.
The fix is to re-order socket user data init sequence and add
write/read barrier calls to be sure that we got proper values
for callback pointers before actually calling them.
Test results: nfs/connectathon reports '0' failed tests for about 200+ iterations.
Crash log:
---<-snip->---
[ 6708.638984] Unable to handle kernel NULL pointer dereference at virtual address 00000000
[ 6708.647093] pgd = ffff0000094e0000
[ 6708.650497] [00000000] *pgd=0000010ffff90003, *pud=0000010ffff90003, *pmd=0000010ffff80003, *pte=0000000000000000
[ 6708.660761] Internal error: Oops: 86000005 [#1] SMP
[ 6708.665630] Modules linked in: nfsv3 nfnetlink_queue nfnetlink_log nfnetlink rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache overlay xt_CONNSECMARK xt_SECMARK xt_conntrack iptable_security ip_tables ah4 xfrm4_mode_transport sctp tun binfmt_misc ext4 jbd2 mbcache loop tcp_diag udp_diag inet_diag rpcrdma ib_isert iscsi_target_mod ib_iser rdma_cm iw_cm libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib ib_ucm ib_uverbs ib_umad ib_cm ib_core nls_koi8_u nls_cp932 ts_kmp nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack vfat fat ghash_ce sha2_ce sha1_ce cavium_rng_vf i2c_thunderx sg thunderx_edac i2c_smbus edac_core cavium_rng nfsd auth_rpcgss nfs_acl lockd grace sunrpc xfs libcrc32c nicvf nicpf ast i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt
fb_sys_fops
[ 6708.736446] ttm drm i2c_core thunder_bgx thunder_xcv mdio_thunder mdio_cavium dm_mirror dm_region_hash dm_log dm_mod [last unloaded: stap_3c300909c5b3f46dcacd49aab3334af_87021]
[ 6708.752275] CPU: 84 PID: 0 Comm: swapper/84 Tainted: G W OE 4.11.0-4.el7.aarch64 #1
[ 6708.760787] Hardware name: www.cavium.com CRB-2S/CRB-2S, BIOS 0.3 Mar 13 2017
[ 6708.767910] task: ffff810006842e80 task.stack: ffff81000689c000
[ 6708.773822] PC is at 0x0
[ 6708.776739] LR is at svc_data_ready+0x38/0x88 [sunrpc]
[ 6708.781866] pc : [<0000000000000000>] lr : [<ffff0000029d7378>] pstate: 60000145
[ 6708.789248] sp : ffff810ffbad3900
[ 6708.792551] x29: ffff810ffbad3900 x28: ffff000008c73d58
[ 6708.797853] x27: 0000000000000000 x26: ffff81000bbe1e00
[ 6708.803156] x25: 0000000000000020 x24: ffff800f7410bf28
[ 6708.808458] x23: ffff000008c63000 x22: ffff000008c63000
[ 6708.813760] x21: ffff800f7410bf28 x20: ffff81000bbe1e00
[ 6708.819063] x19: ffff810012412400 x18: 00000000d82a9df2
[ 6708.824365] x17: 0000000000000000 x16: 0000000000000000
[ 6708.829667] x15: 0000000000000000 x14: 0000000000000001
[ 6708.834969] x13: 0000000000000000 x12: 722e736f622e676e
[ 6708.840271] x11: 00000000f814dd99 x10: 0000000000000000
[ 6708.845573] x9 : 7374687225000000 x8 : 0000000000000000
[ 6708.850875] x7 : 0000000000000000 x6 : 0000000000000000
[ 6708.856177] x5 : 0000000000000028 x4 : 0000000000000000
[ 6708.861479] x3 : 0000000000000000 x2 : 00000000e5000000
[ 6708.866781] x1 : 0000000000000000 x0 : ffff81000bbe1e00
[ 6708.872084]
[ 6708.873565] Process swapper/84 (pid: 0, stack limit = 0xffff81000689c000)
[ 6708.880341] Stack: (0xffff810ffbad3900 to 0xffff8100068a0000)
[ 6708.886075] Call trace:
[ 6708.888513] Exception stack(0xffff810ffbad3710 to 0xffff810ffbad3840)
[ 6708.894942] 3700: ffff810012412400 0001000000000000
[ 6708.902759] 3720: ffff810ffbad3900 0000000000000000 0000000060000145 ffff800f79300000
[ 6708.910577] 3740: ffff000009274d00 00000000000003ea 0000000000000015 ffff000008c63000
[ 6708.918395] 3760: ffff810ffbad3830 ffff800f79300000 000000000000004d 0000000000000000
[ 6708.926212] 3780: ffff810ffbad3890 ffff0000080f88dc ffff800f79300000 000000000000004d
[ 6708.934030] 37a0: ffff800f7930093c ffff000008c63000 0000000000000000 0000000000000140
[ 6708.941848] 37c0: ffff000008c2c000 0000000000040b00 ffff81000bbe1e00 0000000000000000
[ 6708.949665] 37e0: 00000000e5000000 0000000000000000 0000000000000000 0000000000000028
[ 6708.957483] 3800: 0000000000000000 0000000000000000 0000000000000000 7374687225000000
[ 6708.965300] 3820: 0000000000000000 00000000f814dd99 722e736f622e676e 0000000000000000
[ 6708.973117] [< (null)>] (null)
[ 6708.977824] [<ffff0000086f9fa4>] tcp_data_queue+0x754/0xc5c
[ 6708.983386] [<ffff0000086fa64c>] tcp_rcv_established+0x1a0/0x67c
[ 6708.989384] [<ffff000008704120>] tcp_v4_do_rcv+0x15c/0x22c
[ 6708.994858] [<ffff000008707418>] tcp_v4_rcv+0xaf0/0xb58
[ 6709.000077] [<ffff0000086df784>] ip_local_deliver_finish+0x10c/0x254
[ 6709.006419] [<ffff0000086dfea4>] ip_local_deliver+0xf0/0xfc
[ 6709.011980] [<ffff0000086dfad4>] ip_rcv_finish+0x208/0x3a4
[ 6709.017454] [<ffff0000086e018c>] ip_rcv+0x2dc/0x3c8
[ 6709.022328] [<ffff000008692fc8>] __netif_receive_skb_core+0x2f8/0xa0c
[ 6709.028758] [<ffff000008696068>] __netif_receive_skb+0x38/0x84
[ 6709.034580] [<ffff00000869611c>] netif_receive_skb_internal+0x68/0xdc
[ 6709.041010] [<ffff000008696bc0>] napi_gro_receive+0xcc/0x1a8
[ 6709.046690] [<ffff0000014b0fc4>] nicvf_cq_intr_handler+0x59c/0x730 [nicvf]
[ 6709.053559] [<ffff0000014b1380>] nicvf_poll+0x38/0xb8 [nicvf]
[ 6709.059295] [<ffff000008697a6c>] net_rx_action+0x2f8/0x464
[ 6709.064771] [<ffff000008081824>] __do_softirq+0x11c/0x308
[ 6709.070164] [<ffff0000080d14e4>] irq_exit+0x12c/0x174
[ 6709.075206] [<ffff00000813101c>] __handle_domain_irq+0x78/0xc4
[ 6709.081027] [<ffff000008081608>] gic_handle_irq+0x94/0x190
[ 6709.086501] Exception stack(0xffff81000689fdf0 to 0xffff81000689ff20)
[ 6709.092929] fde0: 0000810ff2ec0000 ffff000008c10000
[ 6709.100747] fe00: ffff000008c70ef4 0000000000000001 0000000000000000 ffff810ffbad9b18
[ 6709.108565] fe20: ffff810ffbad9c70 ffff8100169d3800 ffff810006843ab0 ffff81000689fe80
[ 6709.116382] fe40: 0000000000000bd0 0000ffffdf979cd0 183f5913da192500 0000ffff8a254ce4
[ 6709.124200] fe60: 0000ffff8a254b78 0000aaab10339808 0000000000000000 0000ffff8a0c2a50
[ 6709.132018] fe80: 0000ffffdf979b10 ffff000008d6d450 ffff000008c10000 ffff000008d6d000
[ 6709.139836] fea0: 0000000000000054 ffff000008cd3dbc 0000000000000000 0000000000000000
[ 6709.147653] fec0: 0000000000000000 0000000000000000 0000000000000000 ffff81000689ff20
[ 6709.155471] fee0: ffff000008085240 ffff81000689ff20 ffff000008085244 0000000060000145
[ 6709.163289] ff00: ffff81000689ff10 ffff00000813f1e4 ffffffffffffffff ffff00000813f238
[ 6709.171107] [<ffff000008082eb4>] el1_irq+0xb4/0x140
[ 6709.175976] [<ffff000008085244>] arch_cpu_idle+0x44/0x11c
[ 6709.181368] [<ffff0000087bf3b8>] default_idle_call+0x20/0x30
[ 6709.187020] [<ffff000008116d50>] do_idle+0x158/0x1e4
[ 6709.191973] [<ffff000008116ff4>] cpu_startup_entry+0x2c/0x30
[ 6709.197624] [<ffff00000808e7cc>] secondary_start_kernel+0x13c/0x160
[ 6709.203878] [<0000000001bc71c4>] 0x1bc71c4
[ 6709.207967] Code: bad PC value
[ 6709.211061] SMP: stopping secondary CPUs
[ 6709.218830] Starting crashdump kernel...
[ 6709.222749] Bye!
---<-snip>---
Signed-off-by: Vadim Lomovtsev <vlomovts@redhat.com>
---
net/sunrpc/svcsock.c | 27 +++++++++++++++++++++++++--
1 file changed, 25 insertions(+), 2 deletions(-)
diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index 2b720fa..9ec16c5 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -421,6 +421,10 @@ static void svc_data_ready(struct sock *sk)
dprintk("svc: socket %p(inet %p), busy=%d\n",
svsk, sk,
test_bit(XPT_BUSY, &svsk->sk_xprt.xpt_flags));
+ /* this barrier is necessary to prevent kernel crash
+ (due to bad CPU-value) caused by races against
+ svc_setup_socket() while calling sk_odata() callback. */
+ rmb();
svsk->sk_odata(sk);
if (!test_and_set_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags))
svc_xprt_enqueue(&svsk->sk_xprt);
@@ -437,6 +441,10 @@ static void svc_write_space(struct sock *sk)
if (svsk) {
dprintk("svc: socket %p(inet %p), write_space busy=%d\n",
svsk, sk, test_bit(XPT_BUSY, &svsk->sk_xprt.xpt_flags));
+ /* this barrier is necessary to prevent kernel crash
+ (due to bad CPU-value) caused by races against
+ svc_setup_socket() while calling sk_owspace() callback. */
+ rmb();
svsk->sk_owspace(sk);
svc_xprt_enqueue(&svsk->sk_xprt);
}
@@ -760,8 +768,14 @@ static void svc_tcp_listen_data_ready(struct sock *sk)
dprintk("svc: socket %p TCP (listen) state change %d\n",
sk, sk->sk_state);
- if (svsk)
+ if (svsk) {
+ /* this barrier is necessary to prevent kernel crash
+ (due to bad CPU-value) caused by races against
+ svc_setup_socket() while calling sk_odata() callback.*/
+ rmb();
svsk->sk_odata(sk);
+ }
+
/*
* This callback may called twice when a new connection
* is established as a child socket inherits everything
@@ -794,7 +808,12 @@ static void svc_tcp_state_change(struct sock *sk)
if (!svsk)
printk("svc: socket %p: no user data\n", sk);
else {
+ /* this barrier is necessary to prevent kernel crash
+ (due to bad CPU-value) caused by races against
+ svc_setup_socket() while calling sk_ostate() callback. */
+ rmb();
svsk->sk_ostate(sk);
+
if (sk->sk_state != TCP_ESTABLISHED) {
set_bit(XPT_CLOSE, &svsk->sk_xprt.xpt_flags);
svc_xprt_enqueue(&svsk->sk_xprt);
@@ -1381,12 +1400,16 @@ static struct svc_sock *svc_setup_socket(struct svc_serv *serv,
return ERR_PTR(err);
}
- inet->sk_user_data = svsk;
svsk->sk_sock = sock;
svsk->sk_sk = inet;
svsk->sk_ostate = inet->sk_state_change;
svsk->sk_odata = inet->sk_data_ready;
svsk->sk_owspace = inet->sk_write_space;
+ /* this barrier is necessary in order to prevent race condition
+ with svc_data_ready(), svc_listen_data_ready()
+ and others when calling callbacks above */
+ wmb();
+ inet->sk_user_data = svsk;
/* Initialize the socket */
if (sock->type == SOCK_DGRAM)
--
1.8.3.1
^ permalink raw reply related
* Re: [PATCH net-next v5] openvswitch: enable NSH support
From: Yang, Yi @ 2017-08-21 8:41 UTC (permalink / raw)
To: Jiri Benc; +Cc: netdev, dev, blp, e, jan.scheurich
In-Reply-To: <20170821102509.076d7cc9@griffin>
On Mon, Aug 21, 2017 at 10:25:09AM +0200, Jiri Benc wrote:
> On Mon, 21 Aug 2017 13:54:23 +0800, Yi Yang wrote:
> > v4->v5
> > - Fix many comments by Jiri Benc and Eric Garver
> > for v4.
>
> NACK for v5, we haven't finished discussing v4. You're sending new
> versions too quickly.
>
> My comment about sending a new version was meant only for the rest of
> the patch that I have not reviewed - i.e. that once the comments I've
> made so far are addressed, I'm fine with continuing reviewing in a new
> version.
>
> It did not mean that I'm okay with sending a new version without my
> comments to the first part being addressed. Which they are not, see my
> reply to your mail at v4.
>
> Sorry if that was not clear but I won't review v5 until we have
> conclusion for the v4 comments.
Ok, let me fix comments for v4 first.
>
> Jiri
^ permalink raw reply
* Re: [PATCH v2] net: sunrpc: svcsock: fix NULL-pointer exception
From: Vadim Lomovtsev @ 2017-08-21 8:50 UTC (permalink / raw)
To: trond.myklebust, anna.schumaker, bfields, jlayton, davem,
linux-nfs, netdev, linux-kernel, pabeni
Cc: vlomovts
In-Reply-To: <1503305085-5488-1-git-send-email-vlomovts@redhat.com>
Sorry guys, please ignore this - being sent by mistake.
Have some typos at comments by copy-paste, will correct and re-send
Vadim
On Mon, Aug 21, 2017 at 04:44:45AM -0400, Vadim Lomovtsev wrote:
> While running nfs/connectathon tests kernel NULL-pointer exception
> has been observed due to races in svcsock.c.
>
> Race is appear when kernel accepts connection by kernel_accept
> (which creates new socket) and start queuing ingress packets
> to new socket. This happens in ksoftirq context which could run
> concurrently on a different core while new socket setup is not done yet.
>
> The fix is to re-order socket user data init sequence and add
> write/read barrier calls to be sure that we got proper values
> for callback pointers before actually calling them.
>
> Test results: nfs/connectathon reports '0' failed tests for about 200+ iterations.
>
> Crash log:
> ---<-snip->---
> [ 6708.638984] Unable to handle kernel NULL pointer dereference at virtual address 00000000
> [ 6708.647093] pgd = ffff0000094e0000
> [ 6708.650497] [00000000] *pgd=0000010ffff90003, *pud=0000010ffff90003, *pmd=0000010ffff80003, *pte=0000000000000000
> [ 6708.660761] Internal error: Oops: 86000005 [#1] SMP
> [ 6708.665630] Modules linked in: nfsv3 nfnetlink_queue nfnetlink_log nfnetlink rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache overlay xt_CONNSECMARK xt_SECMARK xt_conntrack iptable_security ip_tables ah4 xfrm4_mode_transport sctp tun binfmt_misc ext4 jbd2 mbcache loop tcp_diag udp_diag inet_diag rpcrdma ib_isert iscsi_target_mod ib_iser rdma_cm iw_cm libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib ib_ucm ib_uverbs ib_umad ib_cm ib_core nls_koi8_u nls_cp932 ts_kmp nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack vfat fat ghash_ce sha2_ce sha1_ce cavium_rng_vf i2c_thunderx sg thunderx_edac i2c_smbus edac_core cavium_rng nfsd auth_rpcgss nfs_acl lockd grace sunrpc xfs libcrc32c nicvf nicpf ast i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgb
lt fb_sys_fops
> [ 6708.736446] ttm drm i2c_core thunder_bgx thunder_xcv mdio_thunder mdio_cavium dm_mirror dm_region_hash dm_log dm_mod [last unloaded: stap_3c300909c5b3f46dcacd49aab3334af_87021]
> [ 6708.752275] CPU: 84 PID: 0 Comm: swapper/84 Tainted: G W OE 4.11.0-4.el7.aarch64 #1
> [ 6708.760787] Hardware name: www.cavium.com CRB-2S/CRB-2S, BIOS 0.3 Mar 13 2017
> [ 6708.767910] task: ffff810006842e80 task.stack: ffff81000689c000
> [ 6708.773822] PC is at 0x0
> [ 6708.776739] LR is at svc_data_ready+0x38/0x88 [sunrpc]
> [ 6708.781866] pc : [<0000000000000000>] lr : [<ffff0000029d7378>] pstate: 60000145
> [ 6708.789248] sp : ffff810ffbad3900
> [ 6708.792551] x29: ffff810ffbad3900 x28: ffff000008c73d58
> [ 6708.797853] x27: 0000000000000000 x26: ffff81000bbe1e00
> [ 6708.803156] x25: 0000000000000020 x24: ffff800f7410bf28
> [ 6708.808458] x23: ffff000008c63000 x22: ffff000008c63000
> [ 6708.813760] x21: ffff800f7410bf28 x20: ffff81000bbe1e00
> [ 6708.819063] x19: ffff810012412400 x18: 00000000d82a9df2
> [ 6708.824365] x17: 0000000000000000 x16: 0000000000000000
> [ 6708.829667] x15: 0000000000000000 x14: 0000000000000001
> [ 6708.834969] x13: 0000000000000000 x12: 722e736f622e676e
> [ 6708.840271] x11: 00000000f814dd99 x10: 0000000000000000
> [ 6708.845573] x9 : 7374687225000000 x8 : 0000000000000000
> [ 6708.850875] x7 : 0000000000000000 x6 : 0000000000000000
> [ 6708.856177] x5 : 0000000000000028 x4 : 0000000000000000
> [ 6708.861479] x3 : 0000000000000000 x2 : 00000000e5000000
> [ 6708.866781] x1 : 0000000000000000 x0 : ffff81000bbe1e00
> [ 6708.872084]
> [ 6708.873565] Process swapper/84 (pid: 0, stack limit = 0xffff81000689c000)
> [ 6708.880341] Stack: (0xffff810ffbad3900 to 0xffff8100068a0000)
> [ 6708.886075] Call trace:
> [ 6708.888513] Exception stack(0xffff810ffbad3710 to 0xffff810ffbad3840)
> [ 6708.894942] 3700: ffff810012412400 0001000000000000
> [ 6708.902759] 3720: ffff810ffbad3900 0000000000000000 0000000060000145 ffff800f79300000
> [ 6708.910577] 3740: ffff000009274d00 00000000000003ea 0000000000000015 ffff000008c63000
> [ 6708.918395] 3760: ffff810ffbad3830 ffff800f79300000 000000000000004d 0000000000000000
> [ 6708.926212] 3780: ffff810ffbad3890 ffff0000080f88dc ffff800f79300000 000000000000004d
> [ 6708.934030] 37a0: ffff800f7930093c ffff000008c63000 0000000000000000 0000000000000140
> [ 6708.941848] 37c0: ffff000008c2c000 0000000000040b00 ffff81000bbe1e00 0000000000000000
> [ 6708.949665] 37e0: 00000000e5000000 0000000000000000 0000000000000000 0000000000000028
> [ 6708.957483] 3800: 0000000000000000 0000000000000000 0000000000000000 7374687225000000
> [ 6708.965300] 3820: 0000000000000000 00000000f814dd99 722e736f622e676e 0000000000000000
> [ 6708.973117] [< (null)>] (null)
> [ 6708.977824] [<ffff0000086f9fa4>] tcp_data_queue+0x754/0xc5c
> [ 6708.983386] [<ffff0000086fa64c>] tcp_rcv_established+0x1a0/0x67c
> [ 6708.989384] [<ffff000008704120>] tcp_v4_do_rcv+0x15c/0x22c
> [ 6708.994858] [<ffff000008707418>] tcp_v4_rcv+0xaf0/0xb58
> [ 6709.000077] [<ffff0000086df784>] ip_local_deliver_finish+0x10c/0x254
> [ 6709.006419] [<ffff0000086dfea4>] ip_local_deliver+0xf0/0xfc
> [ 6709.011980] [<ffff0000086dfad4>] ip_rcv_finish+0x208/0x3a4
> [ 6709.017454] [<ffff0000086e018c>] ip_rcv+0x2dc/0x3c8
> [ 6709.022328] [<ffff000008692fc8>] __netif_receive_skb_core+0x2f8/0xa0c
> [ 6709.028758] [<ffff000008696068>] __netif_receive_skb+0x38/0x84
> [ 6709.034580] [<ffff00000869611c>] netif_receive_skb_internal+0x68/0xdc
> [ 6709.041010] [<ffff000008696bc0>] napi_gro_receive+0xcc/0x1a8
> [ 6709.046690] [<ffff0000014b0fc4>] nicvf_cq_intr_handler+0x59c/0x730 [nicvf]
> [ 6709.053559] [<ffff0000014b1380>] nicvf_poll+0x38/0xb8 [nicvf]
> [ 6709.059295] [<ffff000008697a6c>] net_rx_action+0x2f8/0x464
> [ 6709.064771] [<ffff000008081824>] __do_softirq+0x11c/0x308
> [ 6709.070164] [<ffff0000080d14e4>] irq_exit+0x12c/0x174
> [ 6709.075206] [<ffff00000813101c>] __handle_domain_irq+0x78/0xc4
> [ 6709.081027] [<ffff000008081608>] gic_handle_irq+0x94/0x190
> [ 6709.086501] Exception stack(0xffff81000689fdf0 to 0xffff81000689ff20)
> [ 6709.092929] fde0: 0000810ff2ec0000 ffff000008c10000
> [ 6709.100747] fe00: ffff000008c70ef4 0000000000000001 0000000000000000 ffff810ffbad9b18
> [ 6709.108565] fe20: ffff810ffbad9c70 ffff8100169d3800 ffff810006843ab0 ffff81000689fe80
> [ 6709.116382] fe40: 0000000000000bd0 0000ffffdf979cd0 183f5913da192500 0000ffff8a254ce4
> [ 6709.124200] fe60: 0000ffff8a254b78 0000aaab10339808 0000000000000000 0000ffff8a0c2a50
> [ 6709.132018] fe80: 0000ffffdf979b10 ffff000008d6d450 ffff000008c10000 ffff000008d6d000
> [ 6709.139836] fea0: 0000000000000054 ffff000008cd3dbc 0000000000000000 0000000000000000
> [ 6709.147653] fec0: 0000000000000000 0000000000000000 0000000000000000 ffff81000689ff20
> [ 6709.155471] fee0: ffff000008085240 ffff81000689ff20 ffff000008085244 0000000060000145
> [ 6709.163289] ff00: ffff81000689ff10 ffff00000813f1e4 ffffffffffffffff ffff00000813f238
> [ 6709.171107] [<ffff000008082eb4>] el1_irq+0xb4/0x140
> [ 6709.175976] [<ffff000008085244>] arch_cpu_idle+0x44/0x11c
> [ 6709.181368] [<ffff0000087bf3b8>] default_idle_call+0x20/0x30
> [ 6709.187020] [<ffff000008116d50>] do_idle+0x158/0x1e4
> [ 6709.191973] [<ffff000008116ff4>] cpu_startup_entry+0x2c/0x30
> [ 6709.197624] [<ffff00000808e7cc>] secondary_start_kernel+0x13c/0x160
> [ 6709.203878] [<0000000001bc71c4>] 0x1bc71c4
> [ 6709.207967] Code: bad PC value
> [ 6709.211061] SMP: stopping secondary CPUs
> [ 6709.218830] Starting crashdump kernel...
> [ 6709.222749] Bye!
> ---<-snip>---
>
> Signed-off-by: Vadim Lomovtsev <vlomovts@redhat.com>
> ---
> net/sunrpc/svcsock.c | 27 +++++++++++++++++++++++++--
> 1 file changed, 25 insertions(+), 2 deletions(-)
>
> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> index 2b720fa..2be967f 100644
> --- a/net/sunrpc/svcsock.c
> +++ b/net/sunrpc/svcsock.c
> @@ -421,6 +421,10 @@ static void svc_data_ready(struct sock *sk)
> dprintk("svc: socket %p(inet %p), busy=%d\n",
> svsk, sk,
> test_bit(XPT_BUSY, &svsk->sk_xprt.xpt_flags));
> + /* this barrier is necessary to prevent kernel crash
> + (due to bad CPU-value) caused by races aginst
> + svc_setup_socket() while calling sk_odata() callback. */
> + rmb();
> svsk->sk_odata(sk);
> if (!test_and_set_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags))
> svc_xprt_enqueue(&svsk->sk_xprt);
> @@ -437,6 +441,10 @@ static void svc_write_space(struct sock *sk)
> if (svsk) {
> dprintk("svc: socket %p(inet %p), write_space busy=%d\n",
> svsk, sk, test_bit(XPT_BUSY, &svsk->sk_xprt.xpt_flags));
> + /* this barrier is necessary to prevent kernel crash
> + (due to bad CPU-value) caused by races aginst
> + svc_setup_socket() while calling sk_owspace() callback. */
> + rmb();
> svsk->sk_owspace(sk);
> svc_xprt_enqueue(&svsk->sk_xprt);
> }
> @@ -760,8 +768,14 @@ static void svc_tcp_listen_data_ready(struct sock *sk)
> dprintk("svc: socket %p TCP (listen) state change %d\n",
> sk, sk->sk_state);
>
> - if (svsk)
> + if (svsk) {
> + /* this barrier is necessary to prevent kernel crash
> + (due to bad CPU-value) caused by races aginst
> + svc_setup_socket() while calling sk_odata() callback.*/
> + rmb();
> svsk->sk_odata(sk);
> + }
> +
> /*
> * This callback may called twice when a new connection
> * is established as a child socket inherits everything
> @@ -794,7 +808,12 @@ static void svc_tcp_state_change(struct sock *sk)
> if (!svsk)
> printk("svc: socket %p: no user data\n", sk);
> else {
> + /* this barrier is necessary to prevent kernel crash
> + (due to bad CPU-value) caused by races aginst
> + svc_setup_socket() while calling sk_odata() callback. */
> + rmb();
> svsk->sk_ostate(sk);
> +
> if (sk->sk_state != TCP_ESTABLISHED) {
> set_bit(XPT_CLOSE, &svsk->sk_xprt.xpt_flags);
> svc_xprt_enqueue(&svsk->sk_xprt);
> @@ -1381,12 +1400,16 @@ static struct svc_sock *svc_setup_socket(struct svc_serv *serv,
> return ERR_PTR(err);
> }
>
> - inet->sk_user_data = svsk;
> svsk->sk_sock = sock;
> svsk->sk_sk = inet;
> svsk->sk_ostate = inet->sk_state_change;
> svsk->sk_odata = inet->sk_data_ready;
> svsk->sk_owspace = inet->sk_write_space;
> + /* this barrier is necessary in order to prevent race condition
> + with svc_data_ready(), svc_listen_data_ready()
> + and others when calling callbacks above */
> + wmb();
> + inet->sk_user_data = svsk;
>
> /* Initialize the socket */
> if (sock->type == SOCK_DGRAM)
> --
> 1.8.3.1
>
^ permalink raw reply
* [PATCH v2] net: sunrpc: svcsock: fix NULL-pointer exception
From: Vadim Lomovtsev @ 2017-08-21 8:44 UTC (permalink / raw)
To: trond.myklebust-7I+n7zu2hftEKMMhf/gKZA,
anna.schumaker-HgOvQuBEEgTQT0dZR+AlfA,
bfields-uC3wQj2KruNg9hUCZPvPmw, jlayton-vpEMnDpepFuMZCB2o+C8xQ,
davem-fT/PcQaiUtIeIZ0/mPfg9Q, linux-nfs-u79uwXL29TY76Z2rM5mHXA,
netdev-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
pabeni-H+wXaHxf7aLQT0dZR+AlfA
Cc: Vadim Lomovtsev
In-Reply-To: <1503050447-13362-1-git-send-email-vlomovts-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
While running nfs/connectathon tests kernel NULL-pointer exception
has been observed due to races in svcsock.c.
Race is appear when kernel accepts connection by kernel_accept
(which creates new socket) and start queuing ingress packets
to new socket. This happens in ksoftirq context which could run
concurrently on a different core while new socket setup is not done yet.
The fix is to re-order socket user data init sequence and add
write/read barrier calls to be sure that we got proper values
for callback pointers before actually calling them.
Test results: nfs/connectathon reports '0' failed tests for about 200+ iterations.
Crash log:
---<-snip->---
[ 6708.638984] Unable to handle kernel NULL pointer dereference at virtual address 00000000
[ 6708.647093] pgd = ffff0000094e0000
[ 6708.650497] [00000000] *pgd=0000010ffff90003, *pud=0000010ffff90003, *pmd=0000010ffff80003, *pte=0000000000000000
[ 6708.660761] Internal error: Oops: 86000005 [#1] SMP
[ 6708.665630] Modules linked in: nfsv3 nfnetlink_queue nfnetlink_log nfnetlink rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache overlay xt_CONNSECMARK xt_SECMARK xt_conntrack iptable_security ip_tables ah4 xfrm4_mode_transport sctp tun binfmt_misc ext4 jbd2 mbcache loop tcp_diag udp_diag inet_diag rpcrdma ib_isert iscsi_target_mod ib_iser rdma_cm iw_cm libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib ib_ucm ib_uverbs ib_umad ib_cm ib_core nls_koi8_u nls_cp932 ts_kmp nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack vfat fat ghash_ce sha2_ce sha1_ce cavium_rng_vf i2c_thunderx sg thunderx_edac i2c_smbus edac_core cavium_rng nfsd auth_rpcgss nfs_acl lockd grace sunrpc xfs libcrc32c nicvf nicpf ast i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt
fb_sys_fops
[ 6708.736446] ttm drm i2c_core thunder_bgx thunder_xcv mdio_thunder mdio_cavium dm_mirror dm_region_hash dm_log dm_mod [last unloaded: stap_3c300909c5b3f46dcacd49aab3334af_87021]
[ 6708.752275] CPU: 84 PID: 0 Comm: swapper/84 Tainted: G W OE 4.11.0-4.el7.aarch64 #1
[ 6708.760787] Hardware name: www.cavium.com CRB-2S/CRB-2S, BIOS 0.3 Mar 13 2017
[ 6708.767910] task: ffff810006842e80 task.stack: ffff81000689c000
[ 6708.773822] PC is at 0x0
[ 6708.776739] LR is at svc_data_ready+0x38/0x88 [sunrpc]
[ 6708.781866] pc : [<0000000000000000>] lr : [<ffff0000029d7378>] pstate: 60000145
[ 6708.789248] sp : ffff810ffbad3900
[ 6708.792551] x29: ffff810ffbad3900 x28: ffff000008c73d58
[ 6708.797853] x27: 0000000000000000 x26: ffff81000bbe1e00
[ 6708.803156] x25: 0000000000000020 x24: ffff800f7410bf28
[ 6708.808458] x23: ffff000008c63000 x22: ffff000008c63000
[ 6708.813760] x21: ffff800f7410bf28 x20: ffff81000bbe1e00
[ 6708.819063] x19: ffff810012412400 x18: 00000000d82a9df2
[ 6708.824365] x17: 0000000000000000 x16: 0000000000000000
[ 6708.829667] x15: 0000000000000000 x14: 0000000000000001
[ 6708.834969] x13: 0000000000000000 x12: 722e736f622e676e
[ 6708.840271] x11: 00000000f814dd99 x10: 0000000000000000
[ 6708.845573] x9 : 7374687225000000 x8 : 0000000000000000
[ 6708.850875] x7 : 0000000000000000 x6 : 0000000000000000
[ 6708.856177] x5 : 0000000000000028 x4 : 0000000000000000
[ 6708.861479] x3 : 0000000000000000 x2 : 00000000e5000000
[ 6708.866781] x1 : 0000000000000000 x0 : ffff81000bbe1e00
[ 6708.872084]
[ 6708.873565] Process swapper/84 (pid: 0, stack limit = 0xffff81000689c000)
[ 6708.880341] Stack: (0xffff810ffbad3900 to 0xffff8100068a0000)
[ 6708.886075] Call trace:
[ 6708.888513] Exception stack(0xffff810ffbad3710 to 0xffff810ffbad3840)
[ 6708.894942] 3700: ffff810012412400 0001000000000000
[ 6708.902759] 3720: ffff810ffbad3900 0000000000000000 0000000060000145 ffff800f79300000
[ 6708.910577] 3740: ffff000009274d00 00000000000003ea 0000000000000015 ffff000008c63000
[ 6708.918395] 3760: ffff810ffbad3830 ffff800f79300000 000000000000004d 0000000000000000
[ 6708.926212] 3780: ffff810ffbad3890 ffff0000080f88dc ffff800f79300000 000000000000004d
[ 6708.934030] 37a0: ffff800f7930093c ffff000008c63000 0000000000000000 0000000000000140
[ 6708.941848] 37c0: ffff000008c2c000 0000000000040b00 ffff81000bbe1e00 0000000000000000
[ 6708.949665] 37e0: 00000000e5000000 0000000000000000 0000000000000000 0000000000000028
[ 6708.957483] 3800: 0000000000000000 0000000000000000 0000000000000000 7374687225000000
[ 6708.965300] 3820: 0000000000000000 00000000f814dd99 722e736f622e676e 0000000000000000
[ 6708.973117] [< (null)>] (null)
[ 6708.977824] [<ffff0000086f9fa4>] tcp_data_queue+0x754/0xc5c
[ 6708.983386] [<ffff0000086fa64c>] tcp_rcv_established+0x1a0/0x67c
[ 6708.989384] [<ffff000008704120>] tcp_v4_do_rcv+0x15c/0x22c
[ 6708.994858] [<ffff000008707418>] tcp_v4_rcv+0xaf0/0xb58
[ 6709.000077] [<ffff0000086df784>] ip_local_deliver_finish+0x10c/0x254
[ 6709.006419] [<ffff0000086dfea4>] ip_local_deliver+0xf0/0xfc
[ 6709.011980] [<ffff0000086dfad4>] ip_rcv_finish+0x208/0x3a4
[ 6709.017454] [<ffff0000086e018c>] ip_rcv+0x2dc/0x3c8
[ 6709.022328] [<ffff000008692fc8>] __netif_receive_skb_core+0x2f8/0xa0c
[ 6709.028758] [<ffff000008696068>] __netif_receive_skb+0x38/0x84
[ 6709.034580] [<ffff00000869611c>] netif_receive_skb_internal+0x68/0xdc
[ 6709.041010] [<ffff000008696bc0>] napi_gro_receive+0xcc/0x1a8
[ 6709.046690] [<ffff0000014b0fc4>] nicvf_cq_intr_handler+0x59c/0x730 [nicvf]
[ 6709.053559] [<ffff0000014b1380>] nicvf_poll+0x38/0xb8 [nicvf]
[ 6709.059295] [<ffff000008697a6c>] net_rx_action+0x2f8/0x464
[ 6709.064771] [<ffff000008081824>] __do_softirq+0x11c/0x308
[ 6709.070164] [<ffff0000080d14e4>] irq_exit+0x12c/0x174
[ 6709.075206] [<ffff00000813101c>] __handle_domain_irq+0x78/0xc4
[ 6709.081027] [<ffff000008081608>] gic_handle_irq+0x94/0x190
[ 6709.086501] Exception stack(0xffff81000689fdf0 to 0xffff81000689ff20)
[ 6709.092929] fde0: 0000810ff2ec0000 ffff000008c10000
[ 6709.100747] fe00: ffff000008c70ef4 0000000000000001 0000000000000000 ffff810ffbad9b18
[ 6709.108565] fe20: ffff810ffbad9c70 ffff8100169d3800 ffff810006843ab0 ffff81000689fe80
[ 6709.116382] fe40: 0000000000000bd0 0000ffffdf979cd0 183f5913da192500 0000ffff8a254ce4
[ 6709.124200] fe60: 0000ffff8a254b78 0000aaab10339808 0000000000000000 0000ffff8a0c2a50
[ 6709.132018] fe80: 0000ffffdf979b10 ffff000008d6d450 ffff000008c10000 ffff000008d6d000
[ 6709.139836] fea0: 0000000000000054 ffff000008cd3dbc 0000000000000000 0000000000000000
[ 6709.147653] fec0: 0000000000000000 0000000000000000 0000000000000000 ffff81000689ff20
[ 6709.155471] fee0: ffff000008085240 ffff81000689ff20 ffff000008085244 0000000060000145
[ 6709.163289] ff00: ffff81000689ff10 ffff00000813f1e4 ffffffffffffffff ffff00000813f238
[ 6709.171107] [<ffff000008082eb4>] el1_irq+0xb4/0x140
[ 6709.175976] [<ffff000008085244>] arch_cpu_idle+0x44/0x11c
[ 6709.181368] [<ffff0000087bf3b8>] default_idle_call+0x20/0x30
[ 6709.187020] [<ffff000008116d50>] do_idle+0x158/0x1e4
[ 6709.191973] [<ffff000008116ff4>] cpu_startup_entry+0x2c/0x30
[ 6709.197624] [<ffff00000808e7cc>] secondary_start_kernel+0x13c/0x160
[ 6709.203878] [<0000000001bc71c4>] 0x1bc71c4
[ 6709.207967] Code: bad PC value
[ 6709.211061] SMP: stopping secondary CPUs
[ 6709.218830] Starting crashdump kernel...
[ 6709.222749] Bye!
---<-snip>---
Signed-off-by: Vadim Lomovtsev <vlomovts-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
---
net/sunrpc/svcsock.c | 27 +++++++++++++++++++++++++--
1 file changed, 25 insertions(+), 2 deletions(-)
diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index 2b720fa..2be967f 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -421,6 +421,10 @@ static void svc_data_ready(struct sock *sk)
dprintk("svc: socket %p(inet %p), busy=%d\n",
svsk, sk,
test_bit(XPT_BUSY, &svsk->sk_xprt.xpt_flags));
+ /* this barrier is necessary to prevent kernel crash
+ (due to bad CPU-value) caused by races aginst
+ svc_setup_socket() while calling sk_odata() callback. */
+ rmb();
svsk->sk_odata(sk);
if (!test_and_set_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags))
svc_xprt_enqueue(&svsk->sk_xprt);
@@ -437,6 +441,10 @@ static void svc_write_space(struct sock *sk)
if (svsk) {
dprintk("svc: socket %p(inet %p), write_space busy=%d\n",
svsk, sk, test_bit(XPT_BUSY, &svsk->sk_xprt.xpt_flags));
+ /* this barrier is necessary to prevent kernel crash
+ (due to bad CPU-value) caused by races aginst
+ svc_setup_socket() while calling sk_owspace() callback. */
+ rmb();
svsk->sk_owspace(sk);
svc_xprt_enqueue(&svsk->sk_xprt);
}
@@ -760,8 +768,14 @@ static void svc_tcp_listen_data_ready(struct sock *sk)
dprintk("svc: socket %p TCP (listen) state change %d\n",
sk, sk->sk_state);
- if (svsk)
+ if (svsk) {
+ /* this barrier is necessary to prevent kernel crash
+ (due to bad CPU-value) caused by races aginst
+ svc_setup_socket() while calling sk_odata() callback.*/
+ rmb();
svsk->sk_odata(sk);
+ }
+
/*
* This callback may called twice when a new connection
* is established as a child socket inherits everything
@@ -794,7 +808,12 @@ static void svc_tcp_state_change(struct sock *sk)
if (!svsk)
printk("svc: socket %p: no user data\n", sk);
else {
+ /* this barrier is necessary to prevent kernel crash
+ (due to bad CPU-value) caused by races aginst
+ svc_setup_socket() while calling sk_odata() callback. */
+ rmb();
svsk->sk_ostate(sk);
+
if (sk->sk_state != TCP_ESTABLISHED) {
set_bit(XPT_CLOSE, &svsk->sk_xprt.xpt_flags);
svc_xprt_enqueue(&svsk->sk_xprt);
@@ -1381,12 +1400,16 @@ static struct svc_sock *svc_setup_socket(struct svc_serv *serv,
return ERR_PTR(err);
}
- inet->sk_user_data = svsk;
svsk->sk_sock = sock;
svsk->sk_sk = inet;
svsk->sk_ostate = inet->sk_state_change;
svsk->sk_odata = inet->sk_data_ready;
svsk->sk_owspace = inet->sk_write_space;
+ /* this barrier is necessary in order to prevent race condition
+ with svc_data_ready(), svc_listen_data_ready()
+ and others when calling callbacks above */
+ wmb();
+ inet->sk_user_data = svsk;
/* Initialize the socket */
if (sock->type == SOCK_DGRAM)
--
1.8.3.1
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox