Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH v4 net-next 06/10] net/ncsi: Ethtool operation to get NCSI hw statistics
From: David Miller @ 2017-05-04  0:16 UTC (permalink / raw)
  To: gwshan; +Cc: andrew, netdev, joe, kubakici, f.fainelli
In-Reply-To: <20170504000534.GA11287@gwshan>

From: Gavin Shan <gwshan@linux.vnet.ibm.com>
Date: Thu, 4 May 2017 10:05:34 +1000

> On Wed, May 03, 2017 at 09:18:23AM -0400, David Miller wrote:
>>From: Andrew Lunn <andrew@lunn.ch>
>>Date: Wed, 3 May 2017 14:47:22 +0200
>>
>>> On Wed, May 03, 2017 at 02:44:37PM +1000, Gavin Shan wrote:
>>>> This adds ethtool command (ETHTOOL_GNCSISTATS) to retrieve the
>>>> NCSI hardware statistics.
>>> 
>>> Hi Gavin
>>> 
>>> I've not been following this patchset, so maybe i'm about to ask a
>>> question which has already been asked and answered.
>>> 
>>> Why cannot use just use ethtool -S for this?
>>
>>Indeed, when I said to use ethtool for these NCSI hw stats I meant
>>that the "ethtool -S" be used, not some new ethtool command.
>>
> 
> Thanks for the comments. Feel free to ask any questions which would
> make the code clear and better. There are couple of factors I thought
> separate command is better than ETHTOOL_GSTATS: The statistic items from
> ETHTOOL_GSTATS are variable, meaning the kernel needs provide the layout
> of the statistic items via ETHTOOL_GSSET_INFO and ETHTOOL_GSTRINGS.
> NCSI HW statistics aren't following this and their layout is fixed.
> Besides, statistics for ETHTOOL_GSTATS are maintained in local MAC,
> but NCSI HW statistics are collected from NIC on far-end. So they're
> different from this point. Lastly, the NCSI software statistics needs
> separate command. It'd better to have separate command for HW statistics
> as well, to make things consistent.

ETHTOOL_GSTATS are for device specific statistics.  Whether they are
fixed in your implementation or not.

^ permalink raw reply

* Re: [PATCH v4 net-next 06/10] net/ncsi: Ethtool operation to get NCSI hw statistics
From: Andrew Lunn @ 2017-05-04  0:34 UTC (permalink / raw)
  To: Gavin Shan; +Cc: David Miller, netdev, joe, kubakici, f.fainelli
In-Reply-To: <20170504000534.GA11287@gwshan>

On Thu, May 04, 2017 at 10:05:34AM +1000, Gavin Shan wrote:
> On Wed, May 03, 2017 at 09:18:23AM -0400, David Miller wrote:
> >From: Andrew Lunn <andrew@lunn.ch>
> >Date: Wed, 3 May 2017 14:47:22 +0200
> >
> >> On Wed, May 03, 2017 at 02:44:37PM +1000, Gavin Shan wrote:
> >>> This adds ethtool command (ETHTOOL_GNCSISTATS) to retrieve the
> >>> NCSI hardware statistics.
> >> 
> >> Hi Gavin
> >> 
> >> I've not been following this patchset, so maybe i'm about to ask a
> >> question which has already been asked and answered.
> >> 
> >> Why cannot use just use ethtool -S for this?
> >
> >Indeed, when I said to use ethtool for these NCSI hw stats I meant
> >that the "ethtool -S" be used, not some new ethtool command.
> >
> 
> Thanks for the comments. Feel free to ask any questions which would
> make the code clear and better. There are couple of factors I thought
> separate command is better than ETHTOOL_GSTATS: The statistic items from
> ETHTOOL_GSTATS are variable, meaning the kernel needs provide the layout
> of the statistic items via ETHTOOL_GSSET_INFO and ETHTOOL_GSTRINGS.
> NCSI HW statistics aren't following this and their layout is fixed.

That does not stop you from using it for a fixed layout. And what
happens when a new version of the standard is released with more
statistics?

> Besides, statistics for ETHTOOL_GSTATS are maintained in local MAC,
> but NCSI HW statistics are collected from NIC on far-end. So they're
> different from this point. 

You might want to take a look at what is happening with Ethernet
switches. Again, we have two sets of statistics for each port on the
switch. We have the statistics from the hardware, and statistics from
the software. Frames that come in one interface and the hardware
forwards out another interface are not seen by the software. But
frames which originate/terminate in the host, or the switch does not
know what to do with and so are passed to the host, are seen by the
software. Mellanox can tell you more. Your local and remote are not
that different.

> Lastly, the NCSI software statistics needs separate command. It'd
> better to have separate command for HW statistics as well, to make
> things consistent.

Are software statistics defined in DSP0222? Since they are software, i
kind of expect you want the flexibility to add more later. The
ETHTOOL_GSSET_INFO and ETHTOOL_GSTRINGS gives you that flexibility,
without having to change ethtool.

	Andrew

^ permalink raw reply

* Re: [PATCH v4 net-next 06/10] net/ncsi: Ethtool operation to get NCSI hw statistics
From: Gavin Shan @ 2017-05-04  0:38 UTC (permalink / raw)
  To: David Miller; +Cc: gwshan, andrew, netdev, joe, kubakici, f.fainelli
In-Reply-To: <20170503.201643.358209122385716637.davem@davemloft.net>

On Wed, May 03, 2017 at 08:16:43PM -0400, David Miller wrote:
>From: Gavin Shan <gwshan@linux.vnet.ibm.com>
>Date: Thu, 4 May 2017 10:05:34 +1000
>> On Wed, May 03, 2017 at 09:18:23AM -0400, David Miller wrote:
>>>From: Andrew Lunn <andrew@lunn.ch>
>>>Date: Wed, 3 May 2017 14:47:22 +0200
>>>
>>>> On Wed, May 03, 2017 at 02:44:37PM +1000, Gavin Shan wrote:
>>>>> This adds ethtool command (ETHTOOL_GNCSISTATS) to retrieve the
>>>>> NCSI hardware statistics.
>>>> 
>>>> Hi Gavin
>>>> 
>>>> I've not been following this patchset, so maybe i'm about to ask a
>>>> question which has already been asked and answered.
>>>> 
>>>> Why cannot use just use ethtool -S for this?
>>>
>>>Indeed, when I said to use ethtool for these NCSI hw stats I meant
>>>that the "ethtool -S" be used, not some new ethtool command.
>>>
>> 
>> Thanks for the comments. Feel free to ask any questions which would
>> make the code clear and better. There are couple of factors I thought
>> separate command is better than ETHTOOL_GSTATS: The statistic items from
>> ETHTOOL_GSTATS are variable, meaning the kernel needs provide the layout
>> of the statistic items via ETHTOOL_GSSET_INFO and ETHTOOL_GSTRINGS.
>> NCSI HW statistics aren't following this and their layout is fixed.
>> Besides, statistics for ETHTOOL_GSTATS are maintained in local MAC,
>> but NCSI HW statistics are collected from NIC on far-end. So they're
>> different from this point. Lastly, the NCSI software statistics needs
>> separate command. It'd better to have separate command for HW statistics
>> as well, to make things consistent.
>
>ETHTOOL_GSTATS are for device specific statistics.  Whether they are
>fixed in your implementation or not.
>

For NCSI HW statistics, they're always fixed. It's assured by NCSI
specification. It seems my explanation isn't convincing enough. I'm
fine to convey NCSI HW statistics through ETHTOOL_GSTATS (ETHTOOL_GSSET_INFO
and ETHTOOL_GSTRINGS). Dave, please confirm it's what you're happy to
see?

Cheers,
Gavin

^ permalink raw reply

* Re: [PATCH v4 net-next 04/10] net/ncsi: Ethtool operation to get NCSI topology
From: Andrew Lunn @ 2017-05-04  0:49 UTC (permalink / raw)
  To: Gavin Shan; +Cc: netdev, joe, kubakici, f.fainelli, davem
In-Reply-To: <1493786681-27468-5-git-send-email-gwshan@linux.vnet.ibm.com>

> +void ncsi_ethtool_register_dev(struct net_device *dev)
> +{
> +	struct ethtool_ops *ops;
> +
> +	ops = (struct ethtool_ops *)(dev->ethtool_ops);

Why do you need the cast here?

Ah, is it because net_device has:

          const struct ethtool_ops *ethtool_ops;

i.e. you are casting off the const.

> +	if (!ops)
> +		return;
> +
> +	ops->get_ncsi_channels = ncsi_get_channels;

and here you modify it. Which is going to blow up, because it will be
in a read only segment and should throw an opps when you write to it?

You need to export ncsi_get_channels, and let the underlying driver
add it to its own ethtool_ops.

   Andrew

^ permalink raw reply

* Re: [PATCH v4 net-next 06/10] net/ncsi: Ethtool operation to get NCSI hw statistics
From: Gavin Shan @ 2017-05-04  0:55 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: Gavin Shan, David Miller, netdev, joe, kubakici, f.fainelli
In-Reply-To: <20170504003452.GO8029@lunn.ch>

On Thu, May 04, 2017 at 02:34:52AM +0200, Andrew Lunn wrote:
>On Thu, May 04, 2017 at 10:05:34AM +1000, Gavin Shan wrote:
>> On Wed, May 03, 2017 at 09:18:23AM -0400, David Miller wrote:
>> >From: Andrew Lunn <andrew@lunn.ch>
>> >Date: Wed, 3 May 2017 14:47:22 +0200
>> >
>> >> On Wed, May 03, 2017 at 02:44:37PM +1000, Gavin Shan wrote:
>> >>> This adds ethtool command (ETHTOOL_GNCSISTATS) to retrieve the
>> >>> NCSI hardware statistics.
>> >> 
>> >> Hi Gavin
>> >> 
>> >> I've not been following this patchset, so maybe i'm about to ask a
>> >> question which has already been asked and answered.
>> >> 
>> >> Why cannot use just use ethtool -S for this?
>> >
>> >Indeed, when I said to use ethtool for these NCSI hw stats I meant
>> >that the "ethtool -S" be used, not some new ethtool command.
>> >
>> 
>> Thanks for the comments. Feel free to ask any questions which would
>> make the code clear and better. There are couple of factors I thought
>> separate command is better than ETHTOOL_GSTATS: The statistic items from
>> ETHTOOL_GSTATS are variable, meaning the kernel needs provide the layout
>> of the statistic items via ETHTOOL_GSSET_INFO and ETHTOOL_GSTRINGS.
>> NCSI HW statistics aren't following this and their layout is fixed.
>
>That does not stop you from using it for a fixed layout. And what
>happens when a new version of the standard is released with more
>statistics?
>

Correct, I don't see the difference between NCSI spec 1.01 and 1.1.0.
So I assume it's pretty stable, but still have potential to change.

I agree to collect NCSI HW statistic with ETHTOOL_GSTATS as I said
in another thread.

>> Besides, statistics for ETHTOOL_GSTATS are maintained in local MAC,
>> but NCSI HW statistics are collected from NIC on far-end. So they're
>> different from this point. 
>
>You might want to take a look at what is happening with Ethernet
>switches. Again, we have two sets of statistics for each port on the
>switch. We have the statistics from the hardware, and statistics from
>the software. Frames that come in one interface and the hardware
>forwards out another interface are not seen by the software. But
>frames which originate/terminate in the host, or the switch does not
>know what to do with and so are passed to the host, are seen by the
>software. Mellanox can tell you more. Your local and remote are not
>that different.
>

Yeah, it makes sense. Thanks for making an example here.

>> Lastly, the NCSI software statistics needs separate command. It'd
>> better to have separate command for HW statistics as well, to make
>> things consistent.
>
>Are software statistics defined in DSP0222? Since they are software, i
>kind of expect you want the flexibility to add more later. The
>ETHTOOL_GSSET_INFO and ETHTOOL_GSTRINGS gives you that flexibility,
>without having to change ethtool.
>

No, they're purely software implementation, not defined in DSP0222.
These statistics count the NCSI packets seen by software, not hardware.
It's used for diagnosis.

It's a nice suggestion to convey SW statistic with ETHTOOL_GSTATS,
ETHTOOL_GSSET_INFO and ETHTOOL_GSTRINGS, for better flexibility. I'll
follow this in next respin.

Cheers,
Gavin

^ permalink raw reply

* Re: FEC on i.MX 7 transmit queue timeout
From: Stefan Agner @ 2017-05-04  1:21 UTC (permalink / raw)
  To: Andy Duan; +Cc: fugang.duan, festevam, netdev, netdev-owner
In-Reply-To: <ab1fc814-4a81-9eb0-b20b-2cd325158ba8@nxp.com>

Hi Andy,

On 2017-04-20 19:48, Andy Duan wrote:
> On 2017年04月20日 07:15, Stefan Agner wrote:
>> I tested again with imx6sx-fec compatible string. I could reproduce it
>> on a Colibri with i.MX 7Dual. But not always: It really depends whether
>> queue 2 is counting up or not. Just after boot, I check /proc/interrupts
>> twice, if queue 2 is counting it will happen!
>>
>> But if only queue 0 is mostly in use, then it seems to work just fine.
> If your case is only running best effort like tcp/udp, you can re-set 
> the "fsl,num-tx-queues" and "fsl,num-rx-queues" to 1 in board dts file.
> Other two queues are for AVB audio/video queues, they have high priority 
> than queue 0. If running iperf tcp test on the three queues, then
> the tcp segment may be out-of-order that cause net watchdog timeout.
>>
>> I also tried i.MX 7Dual SabreSD here, and the same thing. I had to
>> reboot 3 times, then queue 2 was counting:
>>   57:          8     GIC-0 150 Level     30be0000.ethernet
>>   58:      20137     GIC-0 151 Level     30be0000.ethernet
>>   59:       9269     GIC-0 152 Level     30be0000.ethernet
>>
>> It took me about 40 minutes on Sabre until it happened, and I had to
>> force it using iperf, but then I got the ring dumps:
> My board had ran more than 47 hours with nfs rootfs in 4.11.0-rc6, but 
> not running iperf.
> I am testing with iperf.

Any update on this issue?

When using iperf (server) on the board with Linux 4.11 the issue appears
within a few iperf iterations on a Sabre (TO 1.2, Board Rev C, if that
matters)...

root@colibri-imx7:~# iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
[  4] local 192.168.10.70 port 5001 connected with 192.168.10.1 port
60524
random: crng init done
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0-10.0 sec  1.06 GBytes   909 Mbits/sec
[  5] local 192.168.10.70 port 5001 connected with 192.168.10.1 port
60528
[  5]  0.0-10.0 sec  1.07 GBytes   919 Mbits/sec
[  4] local 192.168.10.70 port 5001 connected with 192.168.10.1 port
60562
------------[ cut here ]------------
WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:316
dev_watchdog+0x248/0x24c
NETDEV WATCHDOG: eth0 (fec): transmit queue 1 timed out
Modules linked in:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.11.0 #360
Hardware name: Freescale i.MX7 Dual (Device Tree)
[<c0226930>] (unwind_backtrace) from [<c0222ffc>] (show_stack+0x10/0x14)
[<c0222ffc>] (show_stack) from [<c04d4f78>] (dump_stack+0x78/0x8c)
[<c04d4f78>] (dump_stack) from [<c0236b38>] (__warn+0xe8/0x100)
[<c0236b38>] (__warn) from [<c0236b88>] (warn_slowpath_fmt+0x38/0x48)
[<c0236b88>] (warn_slowpath_fmt) from [<c0806154>]
(dev_watchdog+0x248/0x24c)
[<c0806154>] (dev_watchdog) from [<c028a9a0>] (call_timer_fn+0x28/0x98)
[<c028a9a0>] (call_timer_fn) from [<c028aab0>] (expire_timers+0xa0/0xac)
[<c028aab0>] (expire_timers) from [<c028ab58>]
(run_timer_softirq+0x9c/0x194)
[<c028ab58>] (run_timer_softirq) from [<c023b110>]
(__do_softirq+0x114/0x234)
[<c023b110>] (__do_softirq) from [<c023b4fc>] (irq_exit+0xcc/0x108)
[<c023b4fc>] (irq_exit) from [<c027a1a0>]
(__handle_domain_irq+0x80/0xec)
[<c027a1a0>] (__handle_domain_irq) from [<c0201544>]
(gic_handle_irq+0x48/0x8c)
[<c0201544>] (gic_handle_irq) from [<c0223ab8>] (__irq_svc+0x58/0x8c)
Exception stack(0xc1001f28 to 0xc1001f70)
1f20:                   00000001 00000000 00000000 c0230060 c1000000
c1003d80
1f40: c1003d34 c0e72f50 c0bd9a04 c1001f80 00000000 00000000 0000320a
c1001f78
1f60: c022070c c0220710 600e0013 ffffffff
[<c0223ab8>] (__irq_svc) from [<c0220710>] (arch_cpu_idle+0x38/0x3c)
[<c0220710>] (arch_cpu_idle) from [<c026f4e0>] (do_idle+0x170/0x204)
[<c026f4e0>] (do_idle) from [<c026f82c>] (cpu_startup_entry+0x18/0x1c)
[<c026f82c>] (cpu_startup_entry) from [<c0e00c88>]
(start_kernel+0x394/0x3a0)
---[ end trace 86a38600d1b9e2a5 ]---
fec 30be0000.ethernet eth0: TX ring dump
Nr     SC     addr       len  SKB
  0    0x1c00 0x00000000   42   (null)
  1  H 0x1c00 0x00000000   86   (null)

^ permalink raw reply

* Re: [4.9.13] use after free in ipv4_mtu
From: Daniel J Blueman @ 2017-05-04  1:30 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Netdev, David Miller, Stephen Hemminger
In-Reply-To: <1488807904.9415.375.camel@edumazet-glaptop3.roam.corp.google.com>

On 6 March 2017 at 21:45, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Mon, 2017-03-06 at 14:33 +0800, Daniel J Blueman wrote:
>> On 2 March 2017 at 21:28, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> > On Thu, 2017-03-02 at 05:08 -0800, Eric Dumazet wrote:
>> >
>> >> Thanks for the report !
>> >>
>> >> This patch should solve this precise issue, but we need more work.
>> >>
>> >> We need to audit all __sk_dst_get() and make sure they are inside an
>> >> rcu_read_lock()/rcu_read_unlock() section.
>> >>
>> >> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
>> >> index 22548b5f05cbe5a655e0c53df2d31c5cc2e8a702..517963e1cb6eb9d70fcd71f44262813c3378759f 100644
>> >> --- a/net/ipv4/tcp_output.c
>> >> +++ b/net/ipv4/tcp_output.c
>> >> @@ -1459,7 +1459,7 @@ EXPORT_SYMBOL(tcp_sync_mss);
>> >>  unsigned int tcp_current_mss(struct sock *sk)
>> >>  {
>> >>       const struct tcp_sock *tp = tcp_sk(sk);
>> >> -     const struct dst_entry *dst = __sk_dst_get(sk);
>> >> +     const struct dst_entry *dst;
>> >>       u32 mss_now;
>> >>       unsigned int header_len;
>> >>       struct tcp_out_options opts;
>> >> @@ -1467,11 +1467,14 @@ unsigned int tcp_current_mss(struct sock *sk)
>> >>
>> >>       mss_now = tp->mss_cache;
>> >>
>> >> +     rcu_read_lock();
>> >> +     dst = __sk_dst_get(sk);
>> >>       if (dst) {
>> >>               u32 mtu = dst_mtu(dst);
>> >>               if (mtu != inet_csk(sk)->icsk_pmtu_cookie)
>> >>                       mss_now = tcp_sync_mss(sk, mtu);
>> >>       }
>> >> +     rcu_read_unlock();
>> >>
>> >>       header_len = tcp_established_options(sk, NULL, &opts, &md5) +
>> >>                    sizeof(struct tcphdr);
>> >
>> > Normally TCP sockets sk_dst_cache can only be changed if the thread
>> > doing the change owns the socket.
>> >
>> > I have not yet understood which point was breaking the rule yet.
>>
>> Great work Eric! I have been unable to reproduce the KASAN warning
>> with this patch.
>>
>> Reported-by: Daniel J Blueman <daniel@quora.org>
>> Tested-by: Daniel J Blueman <daniel@quora.org>
>>
>> I do change the network queueing discipline and related at runtime [1]
>> which may be triggering this, though I did think I saw the KASAN
>> report only after resuming from suspend. rf(un)kill and other tweaking
>> may have been involved too.
>>
>> Thanks,
>>   Dan
>
> Thanks Daniel, but this bandaid patch should not be needed.
>
> Somehow another point in the stack is at fault and needs to be
> identified.
>
> Otherwise we'll keep adding works around.
>
> Since net-next is soon to be re-opened, I will submit patches adding
> more lockdep assisted checks.

I am still seeing the same KASAN issue on 4.9.25; let me know if any
debug or testing would help. Thanks Eric!

https://quora.org/linux/ipv4_mtu/vmlinux
https://quora.org/linux/ipv4_mtu/config

[19515.180104] BUG: KASAN: use-after-free in ipv4_mtu+0x25d/0x320 at
addr ffff88040d6fecc4
[19515.180108] Read of size 4 by task Socket Thread/3299
[19515.180113] CPU: 7 PID: 3299 Comm: Socket Thread Tainted: G    BU
       4.9.25-debug+ #5
[19515.180115] Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS
1.2.21 02/17/2017
[19515.180127]  ffff8803f55a77d0 ffffffff9ce404b1 ffff88042d803800
ffff88040d6fecc0
[19515.180133]  ffff8803f55a77f8 ffffffff9c7e2751 ffff8803f55a7890
ffff88040d6fecc0
[19515.180137]  ffff88042d803800 ffff8803f55a7880 ffffffff9c7e29ea
ffff8803f55a7828
[19515.180140] Call Trace:
[19515.180148]  [<ffffffff9ce404b1>] dump_stack+0x63/0x82
[19515.180154]  [<ffffffff9c7e2751>] kasan_object_err+0x21/0x70
[19515.180158]  [<ffffffff9c7e29ea>] kasan_report_error+0x1fa/0x500
[19515.180162]  [<ffffffff9dced48d>] ? schedule+0x8d/0x1a0
[19515.180168]  [<ffffffff9c4f561e>] ? get_futex_key+0x64e/0xbb0
[19515.180174]  [<ffffffff9c40a1ab>] ? update_cfs_rq_load_avg+0x89b/0x1520
[19515.180177]  [<ffffffff9c7e2e31>] __asan_report_load4_noabort+0x61/0x70
[19515.180180]  [<ffffffff9d9ace2d>] ? ipv4_mtu+0x25d/0x320
[19515.180182]  [<ffffffff9d9ace2d>] ipv4_mtu+0x25d/0x320
[19515.180187]  [<ffffffff9da44693>] tcp_current_mss+0x133/0x270
[19515.180190]  [<ffffffff9da44560>] ? tcp_mtu_to_mss+0x280/0x280
[19515.180193]  [<ffffffff9c40eeb9>] ? select_idle_sibling+0x29/0xaa0
[19515.180195]  [<ffffffff9c4f6284>] ? futex_wait_setup+0x1d4/0x300
[19515.180198]  [<ffffffff9c424f81>] ? enqueue_task_fair+0x261/0x2760
[19515.180201]  [<ffffffff9d9f99e4>] tcp_send_mss+0x24/0x2b0
[19515.180203]  [<ffffffff9da09809>] tcp_sendmsg+0x3d9/0x3530
[19515.180208]  [<ffffffff9c3f018d>] ? ttwu_do_wakeup+0x1d/0x2c0
[19515.180211]  [<ffffffff9c3f0539>] ? ttwu_do_activate+0x109/0x180
[19515.180213]  [<ffffffff9da09430>] ? tcp_sendpage+0x1cc0/0x1cc0
[19515.180216]  [<ffffffff9c3f3a27>] ? wake_up_q+0x87/0xe0
[19515.180219]  [<ffffffff9c4fb707>] ? do_futex+0xf87/0x1730
[19515.180222]  [<ffffffff9dac9d90>] ? inet_gso_segment+0x1340/0x1340
[19515.180224]  [<ffffffff9daca1b0>] ? inet_recvmsg+0x420/0x420
[19515.180226]  [<ffffffff9daca3c7>] inet_sendmsg+0x217/0x3c0
[19515.180228]  [<ffffffff9daca1b0>] ? inet_recvmsg+0x420/0x420
[19515.180232]  [<ffffffff9d87752a>] sock_sendmsg+0xba/0xf0
[19515.180234]  [<ffffffff9d8785ef>] SYSC_sendto+0x1df/0x330
[19515.180236]  [<ffffffff9d878410>] ? SYSC_connect+0x2d0/0x2d0
[19515.180243]  [<ffffffff9d872794>] ? sock_ioctl+0x284/0x3a0
[19515.180248]  [<ffffffff9c87e3ef>] ? do_vfs_ioctl+0x1af/0xf60
[19515.180250]  [<ffffffff9c87e240>] ? ioctl_preallocate+0x1e0/0x1e0
[19515.180253]  [<ffffffff9c4fbfa4>] ? SyS_futex+0xf4/0x2a0
[19515.180259]  [<ffffffff9c8436b4>] ? vfs_read+0x254/0x2f0
[19515.180261]  [<ffffffff9c4fbeb0>] ? do_futex+0x1730/0x1730
[19515.180265]  [<ffffffff9c84705c>] ? SyS_read+0x11c/0x1c0
[19515.180267]  [<ffffffff9d87a31e>] SyS_sendto+0xe/0x10
[19515.180271]  [<ffffffff9dcf897b>] entry_SYSCALL_64_fastpath+0x1e/0xad
[19515.180274] Object at ffff88040d6fecc0, in cache kmalloc-64 size: 64
[19515.180277] Allocated:
[19515.180279] PID = 2326
[19515.180287]  save_stack_trace+0x1b/0x20
[19515.180291]  save_stack+0x46/0xd0
[19515.180293]  kasan_kmalloc+0xad/0xe0
[19515.180295]  kmem_cache_alloc_trace+0xe8/0x1e0
[19515.180298]  fib_create_info+0x917/0x3fc0
[19515.180300]  fib_table_insert+0x1d4/0x1c90
[19515.180302]  inet_rtm_newroute+0xfb/0x180
[19515.180306]  rtnetlink_rcv_msg+0x249/0x6a0
[19515.180312]  netlink_rcv_skb+0x247/0x350
[19515.180314]  rtnetlink_rcv+0x28/0x30
[19515.180316]  netlink_unicast+0x419/0x5c0
[19515.180318]  netlink_sendmsg+0x8a7/0xb60
[19515.180319]  sock_sendmsg+0xba/0xf0
[19515.180321]  ___sys_sendmsg+0x697/0x860
[19515.180323]  __sys_sendmsg+0xd5/0x170
[19515.180324]  SyS_sendmsg+0x12/0x20
[19515.180327]  entry_SYSCALL_64_fastpath+0x1e/0xad
[19515.180327] Freed:
[19515.180329] PID = 0
[19515.180333]  save_stack_trace+0x1b/0x20
[19515.180340]  save_stack+0x46/0xd0
[19515.180345]  kasan_slab_free+0x71/0xb0
[19515.180347]  kfree+0x8c/0x1a0
[19515.180353]  free_fib_info_rcu+0x558/0x760
[19515.180358]  rcu_process_callbacks+0x968/0x1000
[19515.180361]  __do_softirq+0x1a9/0x538
[19515.180361] Memory state around the buggy address:
[19515.180367]  ffff88040d6feb80: fc fc fc fc 00 00 00 00 00 00 00 00
fc fc fc fc
[19515.180369]  ffff88040d6fec00: fb fb fb fb fb fb fb fb fc fc fc fc
fb fb fb fb
[19515.180371] >ffff88040d6fec80: fb fb fb fb fc fc fc fc fb fb fb fb
fb fb fb fb
[19515.180373]                                            ^
[19515.180375]  ffff88040d6fed00: fc fc fc fc fb fb fb fb fb fb fb fb
fc fc fc fc
[19515.180378]  ffff88040d6fed80: 00 00 00 00 00 00 00 00 fc fc fc fc
fb fb fb fb
-- 
Daniel J Blueman

^ permalink raw reply

* Re: [PATCH v4 net-next 04/10] net/ncsi: Ethtool operation to get NCSI topology
From: Gavin Shan @ 2017-05-04  1:36 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: Gavin Shan, netdev, joe, kubakici, f.fainelli, davem
In-Reply-To: <20170504004933.GP8029@lunn.ch>

On Thu, May 04, 2017 at 02:49:33AM +0200, Andrew Lunn wrote:
>> +void ncsi_ethtool_register_dev(struct net_device *dev)
>> +{
>> +	struct ethtool_ops *ops;
>> +
>> +	ops = (struct ethtool_ops *)(dev->ethtool_ops);
>
>Why do you need the cast here?
>
>Ah, is it because net_device has:
>
>          const struct ethtool_ops *ethtool_ops;
>
>i.e. you are casting off the const.
>
>> +	if (!ops)
>> +		return;
>> +
>> +	ops->get_ncsi_channels = ncsi_get_channels;
>
>and here you modify it. Which is going to blow up, because it will be
>in a read only segment and should throw an opps when you write to it?
>
>You need to export ncsi_get_channels, and let the underlying driver
>add it to its own ethtool_ops.
>

I didn't see oops because of this on the ARM board where I did the
testing. The intention was to make ethtool invisible to drivers, for
simplicity. However, we need to expose ethtool to driver when the HW
and SW statistics are conveyed by ETHTOOL_GSTATS as agree'd in another
thread.

Cheers,
Gavin

^ permalink raw reply

* Re: Maximum MPLS labels on Linux network stack
From: David Ahern @ 2017-05-04  2:22 UTC (permalink / raw)
  To: Joe Stringer,
	Алексей Болдырев
  Cc: netdev
In-Reply-To: <CAPWQB7FFwBFL9h7JoJMF4FuK7JXLrHfo5yg4aHJ4iczSdNU_bg@mail.gmail.com>

On 5/3/17 2:21 PM, Joe Stringer wrote:
>> • 8192 MPLS labels
>>
>> Especially interested in the figure 8192 MPLS Labels.

The 8k labels has to be 8k individual routes with a single label (or a
few labels in the stack for the route). In that case you can set
net.mpls.platforms_labels to 10001 and install routes with label values
up to 10000.

^ permalink raw reply

* Re: [PATCH 2/6] wl1251: Use request_firmware_prefer_user() for loading NVS calibration data
From: Luis R. Rodriguez @ 2017-05-04  2:28 UTC (permalink / raw)
  To: Arend Van Spriel
  Cc: Luis R. Rodriguez, Pavel Machek, Daniel Wagner, Tom Gundersen,
	Pali Rohár, Ming Lei, Greg Kroah-Hartman, Kalle Valo,
	David Gnedt, Michal Kazior, Tony Lindgren, Sebastian Reichel,
	Ivaylo Dimitrov, Aaro Koskinen, Takashi Iwai, David Woodhouse,
	Bjorn Andersson, Grazvydas Ignotas, linux-kernel, linux-wireless
In-Reply-To: <936bf348-58ac-882c-a433-83f209862deb@broadcom.com>

On Wed, May 03, 2017 at 09:02:20PM +0200, Arend Van Spriel wrote:
> On 3-1-2017 18:59, Luis R. Rodriguez wrote:
> > On Mon, Dec 26, 2016 at 05:35:59PM +0100, Pavel Machek wrote:
> >>
> >> Right question is "should we solve it without user-space help"?
> >>
> >> Answer is no, too. Way too complex. Yes, it would be nice if hardware
> >> was designed in such a way that getting calibration data from kernel
> >> is easy, and if you design hardware, please design it like that. But
> >> N900 is not designed like that and getting the calibration through
> >> userspace looks like only reasonable solution.
> > 
> > Arend seems to have a better alternative in mind possible for other
> > devices which *can* probably pull of doing this easily and nicely,
> > given the nasty history of the usermode helper crap we should not
> > in any way discourage such efforts.
> > 
> > Arend -- please look at the firmware cache, it not a hash but a hash
> > table for an O(1) lookups would be a welcomed change, then it could
> > be repurposed for what you describe, I think the only difference is
> > you'd perhaps want a custom driver hook to fetch the calibration data
> > so the driver does whatever it needs.
> 
> Hi Luis,
> 
> I let my idea catch dust on the shelf for a while. 

:) BTW did you get to check out Daniel Wagner and Tom Gundersen's firmware work
[0] ? This can provide a proper clear fallback mechanism which *also* helps
address the race between "critical mount points ready" problem we had discussed
long ago. IIRC it did this by having two modes of operation a best effort-mode
and a final-mode. The final-mode would only be used once all the real rootfs is
ready. Userspace decides when to kick / signal firmwared to switch to final-mode
as only userspace will know for sure when that is ready. The best-effort mode
would be used in initramfs. There is also no need for a "custom fallback", the
uevent fallback mechanism can be relied upon alone. Custom tools can just fork
off and do something similar to firmward then in terms of architecture. This is
a form of fallback mechanism I think I'd be happy to see enabled on the new
driver data API. But of course, I recall also liking what you had in mind as well
so would be happy to see more alternatives! The cleaner the better !

[0] https://github.com/teg/firmwared

> Actually had a couple
> of patches ready, but did get around testing them. So I wanted to rebase
> them on your linux-next tree. I bumped into the umh lock thing and was
> wondering why direct filesystem access was under that lock as well.

Indeed, it was an odd thing.

> In your tree I noticed a fix for that. 

Yup!

It took a lot of git archeology to reach a sound approach forward which makes
me feel comfortable without regressing the kernel, this should not regress
the kernel.

> The more reason to base my work on
> top of your firmware_class changes. Now my question is what is the best
> branch to choose, because you have a "few" in that repo to choose from ;-)

I have a series of topical changes, and I rebase onto the latest linux-next
regularly before posting patches, if 0-day finds issues, I dish out a next
try2 or tryX iteration until all issues are fixed. So in this case you'd
just go for the latest driver-data-$(next_date) and if there is a try
postfix use the latest tryX.

In this case 20170501-driver-data-try2, this is based on linux-next tag
next-20170501. If you have issues booting on that next tag though you
could instead try the v4.11-rc8-driver-data-try3 branch based on v4.11-rc8.
But naturally patches ultimately should be based on the latest linux-next
tag for actual submission.

PS. after my changes the fallback mechanism can easily be shoved into its
own file, not sure if that helps with how clean of a solution your work
will be.

  Luis

^ permalink raw reply

* new warning at net/wireless/util.c:1236
From: Linus Torvalds @ 2017-05-04  2:28 UTC (permalink / raw)
  To: David Miller, Johannes Berg
  Cc: Linux Wireless List, Network Development,
	Linux Kernel Mailing List

So my Dell XPS 13 seems to have grown a new warning as of the
networking merge yesterday.

Things still work, but when it starts warning, it generates a *lot* of
noise (I got 36 of these within about ten minutes).

I have no idea what triggered it, because when I rebooted (not because
of this issue, but just to reboot into a newer kernel) I don't see it
again.

This is all pretty regular wireless - it's intel 8260 wireless in a
fairly normal laptop.

Things still seem to *work* ok, so the only problem here is the overly
verbose and useless WARN_ON. It doesn't even print out *which* rate it
is warning about, it just does that stupid unconditional WARN_ON()
without ever shutting up about it..

The WARN_ON() seems to be old, but my logs don't seem to have any
mention of this until today, so there's something that has changed
that now triggers it.

Ideas?

                  Linus

---

WARNING: CPU: 3 PID: 1138 at net/wireless/util.c:1236
cfg80211_calculate_bitrate+0x139/0x170 [cfg80211]
Modules linked in: rfcomm fuse ccm ip6t_rpfilter ip6t_REJECT
nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_broute bridge stp
llc ebtable_nat ip6table_security ip6table_mangle ip6table_nat nf_con
 snd_hda_codec iwlmvm irqbypass snd_hwdep snd_hda_core intel_cstate
mac80211 snd_seq intel_rapl_perf snd_seq_device snd_pcm iwlwifi
rtsx_pci_ms snd_timer cfg80211 memstick snd soundcore i2c_i801 shpchp
 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt
fb_sys_fops drm i2c_hid video
CPU: 3 PID: 1138 Comm: NetworkManager Tainted: G        W
4.11.0-06543-g2f34c1231bfc #60
Hardware name: Dell Inc. XPS 13 9350/09JHRY, BIOS 1.4.13 12/28/2016
task: ffff9c1d1bfcbb80 task.stack: ffffbb95c337c000
RIP: 0010:cfg80211_calculate_bitrate+0x139/0x170 [cfg80211]
RSP: 0018:ffffbb95c337f5b8 EFLAGS: 00010293
RAX: 0000000000000000 RBX: ffff9c1cb080cc00 RCX: 0000000000000000
RDX: 0000000000000005 RSI: 0000000000000002 RDI: ffffbb95c337f76e
RBP: ffffbb95c337f5b8 R08: 0000000000000004 R09: ffff9c1cc36fe0c4
R10: 00000000f7c00472 R11: ffffffffc0682000 R12: ffff9c1cc36fe0c0
R13: ffffbb95c337f76e R14: 0000000000000000 R15: ffff9c1cc36fe030
FS:  00007f000015f980(0000) GS:ffff9c1d3ed80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffd1ae62748 CR3: 0000000469f4f000 CR4: 00000000003406e0
Call Trace:
 nl80211_put_sta_rate+0x56/0x210 [cfg80211]
 nl80211_send_station.isra.63+0x639/0xd60 [cfg80211]
 nl80211_get_station+0x1e4/0x250 [cfg80211]
 genl_family_rcv_msg+0x1fa/0x3e0
 genl_rcv_msg+0x4c/0x90
 netlink_rcv_skb+0xde/0x110
 genl_rcv+0x28/0x40
 netlink_unicast+0x189/0x220
 netlink_sendmsg+0x2ba/0x3b0
 sock_sendmsg+0x38/0x50
 ___sys_sendmsg+0x2b6/0x2d0
 __sys_sendmsg+0x54/0x90
 SyS_sendmsg+0x12/0x20
 entry_SYSCALL_64_fastpath+0x13/0x94
RIP: 0033:0x7efffebf82fd
RSP: 002b:00007fff97d4af30 EFLAGS: 00000293 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007efffebf82fd
RDX: 0000000000000000 RSI: 00007fff97d4afc0 RDI: 0000000000000010
RBP: 00007fff97d4b070 R08: 0000000000000000 R09: 00007efffcc7b168
R10: 000055eb69e6d110 R11: 0000000000000293 R12: 00007fff97d4b0e0
R13: 0000000000000001 R14: 0000000000000000 R15: 000055eb68a32760
Code: 89 d0 f7 e1 d1 ea 8d 14 92 01 d2 81 c2 50 c3 00 00 b9 c5 5a 7c
0a c1 ea 05 89 d0 f7 e1 5d 89 d0 c1 e8 07 c3 31 c0 80 f9 02 74 b7 <0f>
ff 31 c0 eb b1 0f ff 31 c0 5d c3 0f ff 31 c0 5d c3 8d 04 40

^ permalink raw reply

* Re: net/ipv6: GPF in rt6_device_match
From: David Ahern @ 2017-05-04  2:43 UTC (permalink / raw)
  To: Cong Wang
  Cc: Andrey Konovalov, David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, Dmitry Vyukov,
	Kostya Serebryany, Eric Dumazet, syzkaller
In-Reply-To: <CAM_iQpVbcOL3BD6SkVAOFLVCX5q_OpO0vfYnwG6XE4vdJkiw5Q@mail.gmail.com>

On 5/3/17 5:35 PM, Cong Wang wrote:
> Ah, we need:
> 
> @@ -4024,7 +4027,7 @@ static struct pernet_operations ip6_route_net_late_ops = {
> 
>  static struct notifier_block ip6_route_dev_notifier = {
>         .notifier_call = ip6_route_dev_notify,
> -       .priority = 0,
> +       .priority = -10, /* Must be called after addrconf_notify!! */
>  };
> 


It's not a notifier problem; the null_entry is created in ip6_route_init
which is an init function.

For network namespaces other than init_net, it is never initialized. See
ip6_route_net_init.

^ permalink raw reply

* RE: FEC on i.MX 7 transmit queue timeout
From: Andy Duan @ 2017-05-04  3:08 UTC (permalink / raw)
  To: Stefan Agner
  Cc: fugang.duan@freescale.com, festevam@gmail.com,
	netdev@vger.kernel.org, netdev-owner@vger.kernel.org
In-Reply-To: <e366479239b66ba80415b5594def2369@agner.ch>

From: Stefan Agner <stefan@agner.ch> Sent: Thursday, May 04, 2017 9:22 AM
>To: Andy Duan <fugang.duan@nxp.com>
>Cc: fugang.duan@freescale.com; festevam@gmail.com;
>netdev@vger.kernel.org; netdev-owner@vger.kernel.org
>Subject: Re: FEC on i.MX 7 transmit queue timeout
>
>Hi Andy,
>
>On 2017-04-20 19:48, Andy Duan wrote:
>> On 2017年04月20日 07:15, Stefan Agner wrote:
>>> I tested again with imx6sx-fec compatible string. I could reproduce
>>> it on a Colibri with i.MX 7Dual. But not always: It really depends
>>> whether queue 2 is counting up or not. Just after boot, I check
>>> /proc/interrupts twice, if queue 2 is counting it will happen!
>>>
>>> But if only queue 0 is mostly in use, then it seems to work just fine.
>> If your case is only running best effort like tcp/udp, you can re-set
>> the "fsl,num-tx-queues" and "fsl,num-rx-queues" to 1 in board dts file.
>> Other two queues are for AVB audio/video queues, they have high
>> priority than queue 0. If running iperf tcp test on the three queues,
>> then the tcp segment may be out-of-order that cause net watchdog
>timeout.
>>>
>>> I also tried i.MX 7Dual SabreSD here, and the same thing. I had to
>>> reboot 3 times, then queue 2 was counting:
>>>   57:          8     GIC-0 150 Level     30be0000.ethernet
>>>   58:      20137     GIC-0 151 Level     30be0000.ethernet
>>>   59:       9269     GIC-0 152 Level     30be0000.ethernet
>>>
>>> It took me about 40 minutes on Sabre until it happened, and I had to
>>> force it using iperf, but then I got the ring dumps:
>> My board had ran more than 47 hours with nfs rootfs in 4.11.0-rc6, but
>> not running iperf.
>> I am testing with iperf.
>
>Any update on this issue?
>
>When using iperf (server) on the board with Linux 4.11 the issue appears
>within a few iperf iterations on a Sabre (TO 1.2, Board Rev C, if that matters)...
>
I don’t know whether you received my last mail. (maybe failed due to I received some rejection mails)

If your case is only running best effort like tcp/udp, you can re-set the "fsl,num-tx-queues" and "fsl,num-rx-queues" to 1 in board dts file.
Other two queues are for AVB audio/video queues, they have high priority than queue 0. If running iperf tcp test on the three queues, then the tcp segment may be out-of-order that cause net watchdog timeout.
In fsl kernel tree, there have one patch that only select the queue0 for best effort like tcp/udp. Pls test again in your board, if no problem I will upstream the patch.

^ permalink raw reply

* Re: [PATCH net-next] selftests/bpf: get rid of -D__x86_64__
From: Alexei Starovoitov @ 2017-05-04  3:30 UTC (permalink / raw)
  To: David Miller; +Cc: daniel, netdev
In-Reply-To: <20170503.133512.310640909764585408.davem@davemloft.net>

On 5/3/17 10:35 AM, David Miller wrote:
> From: Alexei Starovoitov <ast@fb.com>
> Date: Wed, 3 May 2017 09:54:42 -0700
>
>> /usr/include/asm/types.h -> asm-generic/int-ll64.h
>> as far as I can see that should be the same on most archs.
>> Why doesn't it work for sparc?
>
> You can't assume anything about the kernel headers installed,
> on my debian Sparc box /usr/include/asm/types.h is below.
>
> They do things this way to facilitate multiarch building.  I think
> it's pretty reasonable.
>
> #ifndef _SPARC_TYPES_H
> #define _SPARC_TYPES_H
> /*
>  * This file is never included by application software unless
>  * explicitly requested (e.g., via linux/types.h) in which case the
>  * application is Linux specific so (user-) name space pollution is
>  * not a major issue.  However, for interoperability, libraries still
>  * need to be careful to avoid a name clashes.
>  */
>
> #if defined(__sparc__)
>
> #include <asm-generic/int-ll64.h>
>
> #ifndef __ASSEMBLY__
>
> typedef unsigned short umode_t;
>
> #endif /* __ASSEMBLY__ */
>
> #endif /* defined(__sparc__) */

if it was something like
#ifdef __sparc__
...
#else
#include_next <asm/types.h>

I would buy that debian folks indeed care about multi-arch, but
what above does is making #include <linux/types.h> to be a nop
for any cross-compiler on sparc that included it.
Which is probably quite painful to debug as we found out.
You're right that we cannot assume much about /usr/include craziness.
In that sense adding __native_arch__ macro is also wrong, since
it assumes sane /usr/include without inline asm or other things
that clang for bpf arch can consume.
In that sense the only way to be independent from arch dependent
things in /usr/include is to put all arch specific headers
into our own dir in tools/selftests/ (or may be tools/bpf/include)
and point clang to that. I think the list of .h in there will be
limited. Only things like linux/types.h and gnu/stubs.h,
so it will be manageable.
Thoughts?

^ permalink raw reply

* Re: net/ipv6: GPF in rt6_device_match
From: Cong Wang @ 2017-05-04  3:55 UTC (permalink / raw)
  To: David Ahern
  Cc: Andrey Konovalov, David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, Dmitry Vyukov,
	Kostya Serebryany, Eric Dumazet, syzkaller
In-Reply-To: <79157b41-2a13-6fda-ef31-c96f16850254@gmail.com>

On Wed, May 3, 2017 at 7:43 PM, David Ahern <dsahern@gmail.com> wrote:
> On 5/3/17 5:35 PM, Cong Wang wrote:
>> Ah, we need:
>>
>> @@ -4024,7 +4027,7 @@ static struct pernet_operations ip6_route_net_late_ops = {
>>
>>  static struct notifier_block ip6_route_dev_notifier = {
>>         .notifier_call = ip6_route_dev_notify,
>> -       .priority = 0,
>> +       .priority = -10, /* Must be called after addrconf_notify!! */
>>  };
>>
>
>
> It's not a notifier problem; the null_entry is created in ip6_route_init
> which is an init function.

Only init_net's null entry is created here.

>
> For network namespaces other than init_net, it is never initialized. See
> ip6_route_net_init.

I don't understand what you are talking about...

It is obviously initialized in 3 places:

1) The template itself, as we use memdup()
2) Code around dst_init_metrics() in ip6_route_net_init()
3) ip6_route_dev_notify(), which is supposed to initialize it after
loopback registers (the order needs to fix, as shown in my patch)

Why not add a printk and play with my patch to see the difference?

^ permalink raw reply

* Re: net/ipv6: GPF in rt6_device_match
From: David Ahern @ 2017-05-04  4:14 UTC (permalink / raw)
  To: Cong Wang
  Cc: Andrey Konovalov, David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, Dmitry Vyukov,
	Kostya Serebryany, Eric Dumazet, syzkaller
In-Reply-To: <CAM_iQpU8rBnJEk7NRdJ9bBan93BpjFPeVV7Yz9QSb0O-O+=BtQ@mail.gmail.com>

On 5/3/17 9:55 PM, Cong Wang wrote:
> Why not add a printk and play with my patch to see the difference?

I have other things to do. If you believe your patch fixes the problem,
send it and let Andrey verify.

^ permalink raw reply

* Re: new warning at net/wireless/util.c:1236
From: Kalle Valo @ 2017-05-04  4:35 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: David Miller, Johannes Berg, Linux Wireless List,
	Network Development, Linux Kernel Mailing List, Luca Coelho
In-Reply-To: <CA+55aFxyp5gpUZEYHXLwyhbK6DMWf4RoVykGxq9FPtO=j4yJsw@mail.gmail.com>

Linus Torvalds <torvalds@linux-foundation.org> writes:

> So my Dell XPS 13 seems to have grown a new warning as of the
> networking merge yesterday.
>
> Things still work, but when it starts warning, it generates a *lot* of
> noise (I got 36 of these within about ten minutes).
>
> I have no idea what triggered it, because when I rebooted (not because
> of this issue, but just to reboot into a newer kernel) I don't see it
> again.
>
> This is all pretty regular wireless - it's intel 8260 wireless in a
> fairly normal laptop.
>
> Things still seem to *work* ok, so the only problem here is the overly
> verbose and useless WARN_ON. It doesn't even print out *which* rate it
> is warning about, it just does that stupid unconditional WARN_ON()
> without ever shutting up about it..
>
> The WARN_ON() seems to be old, but my logs don't seem to have any
> mention of this until today, so there's something that has changed
> that now triggers it.
>
> Ideas?
>
>                   Linus
>
> ---
>
> WARNING: CPU: 3 PID: 1138 at net/wireless/util.c:1236
> cfg80211_calculate_bitrate+0x139/0x170 [cfg80211]

As this is with iwlwifi adding also Luca. There were some rate handling
changes in iwlwifi, like commit 77e409455f41, but don't know if that
could cause this.

-- 
Kalle Valo

^ permalink raw reply

* [Patch net] ipv6: initialize route null entry in addrconf_init()
From: Cong Wang @ 2017-05-04  5:07 UTC (permalink / raw)
  To: netdev; +Cc: andreyknvl, dsahern, Cong Wang

Andrey reported a crash on init_net.ipv6.ip6_null_entry->rt6i_idev
since it is always NULL.

This is clearly wrong, we have code to initialize it to loopback_dev,
unfortunately the order is still not correct.

loopback_dev is registered very early during boot, we lose a chance
to re-initialize it in notifier. addrconf_init() is called after
ip6_route_init(), which means we have no chance to correct it.

Fix it by moving this initialization explicitly after
ipv6_add_dev(init_net.loopback_dev) in addrconf_init().

Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
---
 include/net/ip6_route.h |  1 +
 net/ipv6/addrconf.c     |  2 ++
 net/ipv6/route.c        | 26 +++++++++++++++-----------
 3 files changed, 18 insertions(+), 11 deletions(-)

diff --git a/include/net/ip6_route.h b/include/net/ip6_route.h
index 9dc2c18..f5e625f 100644
--- a/include/net/ip6_route.h
+++ b/include/net/ip6_route.h
@@ -84,6 +84,7 @@ struct dst_entry *ip6_route_lookup(struct net *net, struct flowi6 *fl6,
 struct rt6_info *ip6_pol_route(struct net *net, struct fib6_table *table,
 			       int ifindex, struct flowi6 *fl6, int flags);
 
+void ip6_route_init_special_entries(void);
 int ip6_route_init(void);
 void ip6_route_cleanup(void);
 
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index a2a370b..77a4bd5 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -6573,6 +6573,8 @@ int __init addrconf_init(void)
 		goto errlo;
 	}
 
+	ip6_route_init_special_entries();
+
 	for (i = 0; i < IN6_ADDR_HSIZE; i++)
 		INIT_HLIST_HEAD(&inet6_addr_lst[i]);
 
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index a1bf426..2f11366 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -4027,6 +4027,21 @@ static struct notifier_block ip6_route_dev_notifier = {
 	.priority = 0,
 };
 
+void __init ip6_route_init_special_entries(void)
+{
+	/* Registering of the loopback is done before this portion of code,
+	 * the loopback reference in rt6_info will not be taken, do it
+	 * manually for init_net */
+	init_net.ipv6.ip6_null_entry->dst.dev = init_net.loopback_dev;
+	init_net.ipv6.ip6_null_entry->rt6i_idev = in6_dev_get(init_net.loopback_dev);
+  #ifdef CONFIG_IPV6_MULTIPLE_TABLES
+	init_net.ipv6.ip6_prohibit_entry->dst.dev = init_net.loopback_dev;
+	init_net.ipv6.ip6_prohibit_entry->rt6i_idev = in6_dev_get(init_net.loopback_dev);
+	init_net.ipv6.ip6_blk_hole_entry->dst.dev = init_net.loopback_dev;
+	init_net.ipv6.ip6_blk_hole_entry->rt6i_idev = in6_dev_get(init_net.loopback_dev);
+  #endif
+}
+
 int __init ip6_route_init(void)
 {
 	int ret;
@@ -4053,17 +4068,6 @@ int __init ip6_route_init(void)
 
 	ip6_dst_blackhole_ops.kmem_cachep = ip6_dst_ops_template.kmem_cachep;
 
-	/* Registering of the loopback is done before this portion of code,
-	 * the loopback reference in rt6_info will not be taken, do it
-	 * manually for init_net */
-	init_net.ipv6.ip6_null_entry->dst.dev = init_net.loopback_dev;
-	init_net.ipv6.ip6_null_entry->rt6i_idev = in6_dev_get(init_net.loopback_dev);
-  #ifdef CONFIG_IPV6_MULTIPLE_TABLES
-	init_net.ipv6.ip6_prohibit_entry->dst.dev = init_net.loopback_dev;
-	init_net.ipv6.ip6_prohibit_entry->rt6i_idev = in6_dev_get(init_net.loopback_dev);
-	init_net.ipv6.ip6_blk_hole_entry->dst.dev = init_net.loopback_dev;
-	init_net.ipv6.ip6_blk_hole_entry->rt6i_idev = in6_dev_get(init_net.loopback_dev);
-  #endif
 	ret = fib6_init();
 	if (ret)
 		goto out_register_subsys;
-- 
2.5.5

^ permalink raw reply related

* [Patch net] ipv6: reorder ip6_route_dev_notifier after ipv6_dev_notf
From: Cong Wang @ 2017-05-04  5:07 UTC (permalink / raw)
  To: netdev; +Cc: andreyknvl, dsahern, Cong Wang

For each netns (except init_net), we initialize its null entry
in 3 places:

1) The template itself, as we use kmemdup()
2) Code around dst_init_metrics() in ip6_route_net_init()
3) ip6_route_dev_notify(), which is supposed to initialize it after
loopback registers

Unfortunately the last one still happens in a wrong order because
we expect to initialize net->ipv6.ip6_null_entry->rt6i_idev to
net->loopback_dev's idev, so we have to do that after we add
idev to it. However, this notifier has priority == 0 same as
ipv6_dev_notf, and ipv6_dev_notf is registered after
ip6_route_dev_notifier so it is called actually after
ip6_route_dev_notifier.

Fix it by specifying a smaller priority for ip6_route_dev_notifier.

Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
---
 net/ipv6/route.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 2f11366..4dbf7e2 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -4024,7 +4024,7 @@ static struct pernet_operations ip6_route_net_late_ops = {

 static struct notifier_block ip6_route_dev_notifier = {
 	.notifier_call = ip6_route_dev_notify,
-	.priority = 0,
+	.priority = -10, /* Must be called after addrconf_notify!! */
 };

 void __init ip6_route_init_special_entries(void)
-- 
2.5.5

^ permalink raw reply related

* [net-next] net: remove duplicate add_device_randomness() call
From: Zhang Shengju @ 2017-05-04  3:40 UTC (permalink / raw)
  To: davem, netdev, edumazet

Since register_netdevice() already call add_device_randomness() and
dev_set_mac_address() will call it after mac address change.
It's not necessary to call at device UP.

Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com>
---
 net/core/dev.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 35a06ce..cb48e40 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1347,7 +1347,6 @@ static int __dev_open(struct net_device *dev)
 		dev->flags |= IFF_UP;
 		dev_set_rx_mode(dev);
 		dev_activate(dev);
-		add_device_randomness(dev->dev_addr, dev->addr_len);
 	}
 
 	return ret;
-- 
1.8.3.1

^ permalink raw reply related

* 58960 netdev
From: kelley @ 2017-05-04  5:11 UTC (permalink / raw)
  To: netdev

[-- Attachment #1: 374278179.zip --]
[-- Type: application/zip, Size: 3233 bytes --]

^ permalink raw reply

* Re: [PATCH v4 net-next 04/10] net/ncsi: Ethtool operation to get NCSI topology
From: Stephen Hemminger @ 2017-05-04  5:19 UTC (permalink / raw)
  To: Gavin Shan; +Cc: netdev, joe, kubakici, f.fainelli, davem
In-Reply-To: <1493786681-27468-5-git-send-email-gwshan@linux.vnet.ibm.com>

On Wed,  3 May 2017 14:44:35 +1000
Gavin Shan <gwshan@linux.vnet.ibm.com> wrote:

> +static int ethtool_get_ncsi_channels(struct net_device *dev,
> +				     void __user *useraddr)

Please don't use an opaque type for this. See how other ethtool
operations take a struct.

> +{
> +	struct ethtool_ncsi_channels *enc;
> +	short nr_channels;
Should be __u16 or unsigned not short.

> +	ssize_t size = 0;
> +	int ret;
> +
> +	if (!dev->ethtool_ops->get_ncsi_channels)
> +		return -EOPNOTSUPP;
> +
> +	if (copy_from_user(&nr_channels, useraddr + sizeof(enc->cmd),
> +			   sizeof(nr_channels)))
> +		return -EFAULT;
> +
> +	size = sizeof(*enc);
> +	if (nr_channels > 0)
> +		size += nr_channels * sizeof(enc->id[0]);

You have no upper bound on number of channels, and therefore an incorrectly
application could grab an excessive amount of kernel memory.

^ permalink raw reply

* Re: [PATCH v4 net-next 04/10] net/ncsi: Ethtool operation to get NCSI topology
From: Stephen Hemminger @ 2017-05-04  5:21 UTC (permalink / raw)
  To: Gavin Shan; +Cc: netdev, joe, kubakici, f.fainelli, davem
In-Reply-To: <1493786681-27468-5-git-send-email-gwshan@linux.vnet.ibm.com>

On Wed,  3 May 2017 14:44:35 +1000
Gavin Shan <gwshan@linux.vnet.ibm.com> wrote:

> +void ncsi_ethtool_register_dev(struct net_device *dev)
> +{
> +	struct ethtool_ops *ops;
> +
> +	ops = (struct ethtool_ops *)(dev->ethtool_ops);
> +	if (!ops)
> +		return;
> +
> +	ops->get_ncsi_channels = ncsi_get_channels;
> +}
> +

Instead of casting away const which opens up potential security
issues. Have two ethtool_ops structures.

^ permalink raw reply

* Re: new warning at net/wireless/util.c:1236
From: Coelho, Luciano @ 2017-05-04  6:06 UTC (permalink / raw)
  To: torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org,
	kvalo-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org
  Cc: linux-wireless-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org,
	netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	johannes-cdvu00un1VgdHxzADdlk8Q@public.gmane.org
In-Reply-To: <87o9v99r3g.fsf-5ukZ45wKbUHoml4zekdYB16hYfS7NtTn@public.gmane.org>

On Thu, 2017-05-04 at 07:35 +0300, Kalle Valo wrote:
> Linus Torvalds <torvalds@linux-foundation.org> writes:
> 
> > So my Dell XPS 13 seems to have grown a new warning as of the
> > networking merge yesterday.
> > 
> > Things still work, but when it starts warning, it generates a *lot* of
> > noise (I got 36 of these within about ten minutes).
> > 
> > I have no idea what triggered it, because when I rebooted (not because
> > of this issue, but just to reboot into a newer kernel) I don't see it
> > again.
> > 
> > This is all pretty regular wireless - it's intel 8260 wireless in a
> > fairly normal laptop.
> > 
> > Things still seem to *work* ok, so the only problem here is the overly
> > verbose and useless WARN_ON. It doesn't even print out *which* rate it
> > is warning about, it just does that stupid unconditional WARN_ON()
> > without ever shutting up about it..
> > 
> > The WARN_ON() seems to be old, but my logs don't seem to have any
> > mention of this until today, so there's something that has changed
> > that now triggers it.
> > 
> > Ideas?
> > 
> >                   Linus
> > 
> > ---
> > 
> > WARNING: CPU: 3 PID: 1138 at net/wireless/util.c:1236
> > cfg80211_calculate_bitrate+0x139/0x170 [cfg80211]
> 
> As this is with iwlwifi adding also Luca. There were some rate handling
> changes in iwlwifi, like commit 77e409455f41, but don't know if that
> could cause this.

Thanks Kalle.  I don't see anything in the iwlwifi driver that could be
causing this.  Johannes suspects some RX rate changes he made in
mac80211...

--
Luca.

^ permalink raw reply

* Re: [PATCH v4 net-next 04/10] net/ncsi: Ethtool operation to get NCSI topology
From: Gavin Shan @ 2017-05-04  6:15 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Gavin Shan, netdev, joe, kubakici, f.fainelli, davem
In-Reply-To: <20170503221944.1dd0d576@xeon-e3>

On Wed, May 03, 2017 at 10:19:44PM -0700, Stephen Hemminger wrote:
>On Wed,  3 May 2017 14:44:35 +1000
>Gavin Shan <gwshan@linux.vnet.ibm.com> wrote:
>
>> +static int ethtool_get_ncsi_channels(struct net_device *dev,
>> +				     void __user *useraddr)
>
>Please don't use an opaque type for this. See how other ethtool
>operations take a struct.
>

After checking output from below command, all other ethtool operations
uses "void __user *" or "char __user *".

   git grep static.*useraddr net/core/ethtool.c

>> +{
>> +	struct ethtool_ncsi_channels *enc;
>> +	short nr_channels;
>Should be __u16 or unsigned not short.
>

Nope, It's for signed number. User expects to get number of available
channels when negative number is passed in. When it's positive, it's
going to get the channels' information.

>> +	ssize_t size = 0;
>> +	int ret;
>> +
>> +	if (!dev->ethtool_ops->get_ncsi_channels)
>> +		return -EOPNOTSUPP;
>> +
>> +	if (copy_from_user(&nr_channels, useraddr + sizeof(enc->cmd),
>> +			   sizeof(nr_channels)))
>> +		return -EFAULT;
>> +
>> +	size = sizeof(*enc);
>> +	if (nr_channels > 0)
>> +		size += nr_channels * sizeof(enc->id[0]);
>
>You have no upper bound on number of channels, and therefore an incorrectly
>application could grab an excessive amount of kernel memory.
>

Yeah, I'll limit it to 256 in next respin. 256 is the maximal number
of channels for one particular net device.

Cheers,
Gavin

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox