Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: Latest net-next from GIT panic
From: Paweł Staszewski @ 2017-09-20 21:27 UTC (permalink / raw)
  To: Cong Wang, Eric Dumazet
  Cc: Wei Wang, Linux Kernel Network Developers, Eric Dumazet
In-Reply-To: <6c36fa00-13e6-8cd7-ff5c-df7739220736@itcare.pl>



W dniu 2017-09-20 o 23:25, Paweł Staszewski pisze:
>
>
> W dniu 2017-09-20 o 23:24, Paweł Staszewski pisze:
>>
>>
>> W dniu 2017-09-20 o 23:10, Paweł Staszewski pisze:
>>>
>>>
>>> W dniu 2017-09-20 o 21:23, Paweł Staszewski pisze:
>>>>
>>>>
>>>> W dniu 2017-09-20 o 21:13, Paweł Staszewski pisze:
>>>>>
>>>>>
>>>>> W dniu 2017-09-20 o 20:36, Cong Wang pisze:
>>>>>> On Wed, Sep 20, 2017 at 11:30 AM, Eric Dumazet 
>>>>>> <eric.dumazet@gmail.com> wrote:
>>>>>>> On Wed, 2017-09-20 at 11:22 -0700, Cong Wang wrote:
>>>>>>>> but dmesg at this time shows nothing about interfaces or flaps.
>>>>>>>>
>>>>>>>> This is very odd.
>>>>>>>>
>>>>>>>> We only free netdevice in free_netdev() and it is only called when
>>>>>>>> we unregister a netdevice. Otherwise pcpu_refcnt is impossible
>>>>>>>> to be NULL.
>>>>>>> If there is a missing dev_hold() or one dev_put() in excess,
>>>>>>> this would allow the netdev to be freed too soon.
>>>>>>>
>>>>>>> -> Use after free.
>>>>>>> memory holding netdev could be reallocated-cleared by some other 
>>>>>>> kernel
>>>>>>> user.
>>>>>>>
>>>>>> Sure, but only unregister could trigger a free. If there is no 
>>>>>> unregister,
>>>>>> like what Pawel claims, then there is no free, the refcnt just 
>>>>>> goes to
>>>>>> 0 but the memory is still there.
>>>>>>
>>>>> About possible mistake from my side with bisect - i can judge too 
>>>>> early that some bisect was good
>>>>> the road was:
>>>>> git bisect start
>>>>> # bad: [ac7b75966c9c86426b55fe1c50ae148aa4571075] Merge tag 
>>>>> 'pinctrl-v4.13-1' of 
>>>>> git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
>>>>> git bisect bad ac7b75966c9c86426b55fe1c50ae148aa4571075
>>>>> # good: [e24dd9ee5399747b71c1d982a484fc7601795f31] Merge branch 
>>>>> 'next' of 
>>>>> git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
>>>>> git bisect good e24dd9ee5399747b71c1d982a484fc7601795f31
>>>>> # bad: [9cc9a5cb176ccb4f2cda5ac34da5a659926f125f] datapath: Avoid 
>>>>> using stack larger than 1024.
>>>>> git bisect bad 9cc9a5cb176ccb4f2cda5ac34da5a659926f125f
>>>>> # good: [073cf9e20c333ab29744717a23f9e43ec7512a20] Merge branch 
>>>>> 'udp-reduce-cache-pressure'
>>>>> git bisect good 073cf9e20c333ab29744717a23f9e43ec7512a20
>>>>> # bad: [8abd5599a520e9f188a750f1bde9dde5fb856230] Merge branch 
>>>>> 's390-net-updates-part-2'
>>>>> git bisect bad 8abd5599a520e9f188a750f1bde9dde5fb856230
>>>>> # good: [2fae5d0e647c6470d206e72b5fc24972bb900f70] Merge branch 
>>>>> 'bpf-ctx-narrow'
>>>>> git bisect good 2fae5d0e647c6470d206e72b5fc24972bb900f70
>>>>> # good: [41500c3e2a19ffcf40a7158fce1774de08e26ba2] rds: tcp: 
>>>>> remove cp_outgoing
>>>>> git bisect good 41500c3e2a19ffcf40a7158fce1774de08e26ba2
>>>>> # bad: [8917a777be3ba566377be05117f71b93a5fd909d] tcp: md5: add 
>>>>> TCP_MD5SIG_EXT socket option to set a key address prefix
>>>>> git bisect bad 8917a777be3ba566377be05117f71b93a5fd909d
>>>>> # good: [4a6ce2b6f2ecabbddcfe47e7cf61dd0f00b10e36] net: introduce 
>>>>> a new function dst_dev_put()
>>>>>
>>>>> And currently have this running for about 4 hours without problems.
>>>>>
>>>>>
>>>>>
>>>>> git bisect good 4a6ce2b6f2ecabbddcfe47e7cf61dd0f00b10e36
>>>>> # bad: [a4c2fd7f78915a0d7c5275e7612e7793157a01f2] net: remove 
>>>>> DST_NOCACHE flag
>>>>>
>>>>> Here for sure - panic
>>>>>
>>>>> git bisect bad a4c2fd7f78915a0d7c5275e7612e7793157a01f2
>>>>> # bad: [ad65a2f05695aced349e308193c6e2a6b1d87112] ipv6: call 
>>>>> dst_hold_safe() properly
>>>>> git bisect bad ad65a2f05695aced349e308193c6e2a6b1d87112
>>>>> # good: [9df16efadd2a8a82731dc76ff656c771e261827f] ipv4: call 
>>>>> dst_hold_safe() properly
>>>>> git bisect good 9df16efadd2a8a82731dc76ff656c771e261827f
>>>>> # bad: [1cfb71eeb12047bcdbd3e6730ffed66e810a0855] ipv6: take 
>>>>> dst->__refcnt for insertion into fib6 tree
>>>>>
>>>>> im not 100% sure tor last two
>>>>> Will test them again starting from
>>>>> [95c47f9cf5e028d1ae77dc6c767c1edc8a18025b] ipv4: call 
>>>>> dst_dev_put() properly
>>>>>
>>>>>
>>>>> git bisect bad 1cfb71eeb12047bcdbd3e6730ffed66e810a0855
>>>>> # bad: [b838d5e1c5b6e57b10ec8af2268824041e3ea911] ipv4: mark 
>>>>> DST_NOGC and remove the operation of dst_free()
>>>>>
>>>>>
>>>>>
>>>>> git bisect bad b838d5e1c5b6e57b10ec8af2268824041e3ea911
>>>>> # first bad commit: [b838d5e1c5b6e57b10ec8af2268824041e3ea911] 
>>>>> ipv4: mark DST_NOGC and remove the operation of dst_free()
>>>>>
>>>>>
>>>>>
>>>> What i can say more
>>>> I can reproduce this on any server with similar configuration
>>>> the difference can be teamd instead of bonding
>>>> ixgbe or i40e and mlx5
>>>> Same problems
>>>>
>>>> vlans - more or less prefixes learned from bgp -> zebra -> netlink 
>>>> -> kernel
>>>> But normally in lab when using only plain routing no bgpd and about 
>>>> 128 vlans - with 128 routes - cant reproduce this - this apperas 
>>>> only with bgp - minimum where i can reproduce this was about 130k 
>>>> prefixes with about 286 nexthops
>>>>
>>>>
>>>>
>>>>
>>> bisected again and same result:
>>> b838d5e1c5b6e57b10ec8af2268824041e3ea911 is the first bad commit
>>> commit b838d5e1c5b6e57b10ec8af2268824041e3ea911
>>> Author: Wei Wang <weiwan@google.com>
>>> Date:   Sat Jun 17 10:42:32 2017 -0700
>>>
>>>     ipv4: mark DST_NOGC and remove the operation of dst_free()
>>>
>>>     With the previous preparation patches, we are ready to get rid 
>>> of the
>>>     dst gc operation in ipv4 code and release dst based on refcnt only.
>>>     So this patch adds DST_NOGC flag for all IPv4 dst and remove the 
>>> calls
>>>     to dst_free().
>>>     At this point, all dst created in ipv4 code do not use the dst gc
>>>     anymore and will be destroyed at the point when refcnt drops to 0.
>>>
>>>     Signed-off-by: Wei Wang <weiwan@google.com>
>>>     Acked-by: Martin KaFai Lau <kafai@fb.com>
>>>     Signed-off-by: David S. Miller <davem@davemloft.net>
>>>
>>> :040000 040000 9b7e7fb641de6531fc7887473ca47ef7cb6a11da 
>>> 831a73b71d3df1755f3e24c0d3c86d7a93fd55e2 M      net
>>>
>>> Will add now version 2 of patch from Eric and we will see
>>>
>>>
>> after adding patch
>> perf top catch
>>    PerfTop:   77159 irqs/sec  kernel:99.7%  exact:  0.0% [4000Hz 
>> cycles],  (all, 40 CPUs)
>> --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 
>>
>>
>>     60.95%  [kernel]            [k] dev_put.part.6
>>      4.00%  [kernel]            [k] ixgbe_poll
>>      3.63%  [kernel]            [k] irq_entries_start
>>      1.22%  [kernel]            [k] fib_table_lookup
>>      1.15%  [kernel]            [k] do_raw_spin_lock
>>      1.05%  [kernel]            [k] ixgbe_xmit_frame_ring
>>      1.04%  [kernel]            [k] lookup
>>      0.87%  [kernel]            [k] eth_type_trans
>>
>>
>> no panic on console - rebooting to check logs
>>
>>
> Nothing logged
>
>
after adding this patch

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index f535779d9dc1dfe36934c2abba4e43d053ac5d6f..220cd12456754876edf2d3ef13195e82d70d5c74 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3331,7 +3331,15 @@ void netdev_run_todo(void);
   */
  static inline void dev_put(struct net_device *dev)
  {
-	this_cpu_dec(*dev->pcpu_refcnt);
+	int __percpu *pref = READ_ONCE(dev->pcpu_refcnt);
+
+	if (!pref) {
+		pr_err("no pcpu_refcnt on dev %p(%s) state %d dismantle %d\n",
+		       dev, dev->name, dev->reg_state, dev->dismantle);
+		for (;;)
+			cpu_relax();
+	}
+	this_cpu_dec(*pref);
  }
  
  /**


Have just halted console - no output no reaction on kbd
nothing in any syslog/log and catched only something from perf top

^ permalink raw reply related

* Re: [PATCH] bgmac: Remove all offloading features, including GRO.
From: Florian Fainelli @ 2017-09-20 21:32 UTC (permalink / raw)
  To: rosenp, Eric Dumazet; +Cc: netdev
In-Reply-To: <1505942864.3451.3.camel@gmail.com>

On 09/20/2017 02:27 PM, rosenp@gmail.com wrote:
> Sorry for the noise. After more testing I've found out that the cause
> was that I had BBR enabled on my laptop. Switching back to CUBIC fixed
> the issue.
> 
> In other words, this patch is detrimental.

Quite unsurprisingly, thanks for coming back, and please don't top-post
on netdev.

> 
> ~67mbps - gro off
> ~87mbps - gro on
> 
> On Fri, 2017-09-15 at 23:04 -0700, Florian Fainelli wrote:
>> On September 15, 2017 5:38:42 PM PDT, rosenp@gmail.com wrote:
>>> I have not. Unfortunately I own no gigabit hardware to test this
>>> on.
>>> The MIPS CPU runs at 300MHz on my unit.
>>>
>>
>> bgmac is used on Gigabit capable hardware, like Northstar and
>> Northstar Plus, and others too, so unless you can get access to such
>> HW or get confirmation from someone that your patches changes
>> something, I would just drop this change and not bother. This is
>> already not 100mbits/sec linerate...
>>
>>> On Fri, 2017-09-15 at 17:34 -0700, Eric Dumazet wrote:
>>>> On Fri, 2017-09-15 at 17:23 -0700, Rosen Penev wrote:
>>>>> On a linksys E1200v1 (actually a crossflashed E1000v2), the
>>>>> offloading features give no measurable benefit to speed or
>>>>> latency.
>>>>> Furthermore, disabling GRO actually improves iperf performance
>>>>> by a
>>>>> whoppimg 3mbps. Results:
>>>>>
>>>>> Currently:
>>>>>
>>>>> v2: Changed napi_gro_receive to netif_receive_skb. Seems to
>>>>> have an
>>>>> identical result.
>>>>>
>>>>> Signed-off-by: Rosen Penev <rosenp@gmail.com>
>>>>> ---
>>>>>  drivers/net/ethernet/broadcom/bgmac.c | 2 +-
>>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/drivers/net/ethernet/broadcom/bgmac.c
>>>>> b/drivers/net/ethernet/broadcom/bgmac.c
>>>>> index 48d672b204a4..1fb0053aeee7 100644
>>>>> --- a/drivers/net/ethernet/broadcom/bgmac.c
>>>>> +++ b/drivers/net/ethernet/broadcom/bgmac.c
>>>>> @@ -483,7 +483,7 @@ static int bgmac_dma_rx_read(struct bgmac
>>>>> *bgmac, struct bgmac_dma_ring *ring,
>>>>>  			skb->protocol = eth_type_trans(skb,
>>>>> bgmac-
>>>>>> net_dev);
>>>>>
>>>>>  			bgmac->net_dev->stats.rx_bytes += len;
>>>>>  			bgmac->net_dev->stats.rx_packets++;
>>>>> -			napi_gro_receive(&bgmac->napi, skb);
>>>>> +			netif_receive_skb(skb);
>>>>>  			handled++;
>>>>>  		} while (0);
>>>>>  
>>>>
>>>> And have you tested 1Gbit link speed ?
>>>> ( Or 2.5 Gbit link speed )
>>>>
>>>> If you want to disable GRO on your host, fine : you can use
>>>> ethtool
>>>> -K
>>>>
>>>>
>>>>
>>
>> (please don't top-post)


-- 
Florian

^ permalink raw reply

* Re: TSN Scorecard, was Re: [RFC net-next 0/5] TSN: Add qdisc-based config interfaces for traffic shapers
From: Jesus Sanchez-Palencia @ 2017-09-20 21:29 UTC (permalink / raw)
  To: Richard Cochran, levipearson
  Cc: netdev, intel-wired-lan, vinicius.gomes, andre.guedes,
	ivan.briano, boon.leong.ong, jhs, xiyou.wangcong, jiri, henrik
In-Reply-To: <20170920054919.jtqckuybsu42tvpk@localhost>

Hi,

On 09/19/2017 10:49 PM, Richard Cochran wrote:
(...)

> 
> No, that is not what I meant.  We need some minimal additional kernel
> support in order to fully implement the TSN family of standards.  Of
> course, the bulk will have to be done in user space.  It would be a
> mistake to cram the stuff that belongs in userland into the kernel.
> 
> Looking at the table, and reading your descriptions of the state of
> OpenAVB, I remained convinced that the kernel needs only three
> additions:
> 
> 1. SO_TXTIME
> 2. CBS Qdisc
> 3. ALSA support for DAC clock control (but that is another story)

We'll be posting the CBS v1 series for review soon.

The current SO_TXTIME RFC for the purpose of Launchtime looks great, and we are
looking forward for the v1 + its companion qdisc so we can test / review and
provide feedback.

We are still under the impression that a config interface for HW offload of Qbv
/ Qbu config will be needed, but we'll be deferring the 'taprio' proposal until
there are NICs (end stations) that support these standards available. We can
revisit it if that ever happens, and if it's still needed, but then taking into
account SO_TXTIME (and its related qdisc).

Thanks everyone for all the feedback so far.

Regards,
Jesus

^ permalink raw reply

* Re: [RFC PATCH] can: m_can: Support higher speed CAN-FD bitrates
From: Mario Hüttel @ 2017-09-20 21:37 UTC (permalink / raw)
  To: Franklin S Cooper Jr, Yang, Wenyou, Sekhar Nori, wg, mkl,
	socketcan, quentin.schulz, edumazet, linux-can, netdev,
	linux-kernel
  Cc: Wenyou Yang, Dong Aisheng
In-Reply-To: <88d4ddc4-b786-0205-6852-56e93182a1c9@ti.com>


[-- Attachment #1.1: Type: text/plain, Size: 3666 bytes --]



On 09/20/2017 10:19 PM, Franklin S Cooper Jr wrote:
> Hi Wenyou,
>
> On 09/17/2017 10:47 PM, Yang, Wenyou wrote:
>>
>> On 2017/9/14 13:06, Sekhar Nori wrote:
>>> On Thursday 14 September 2017 03:28 AM, Franklin S Cooper Jr wrote:
>>>> On 08/18/2017 02:39 PM, Franklin S Cooper Jr wrote:
>>>>> During test transmitting using CAN-FD at high bitrates (4 Mbps) only
>>>>> resulted in errors. Scoping the signals I noticed that only a single
>>>>> bit
>>>>> was being transmitted and with a bit more investigation realized the
>>>>> actual
>>>>> MCAN IP would go back to initialization mode automatically.
>>>>>
>>>>> It appears this issue is due to the MCAN needing to use the Transmitter
>>>>> Delay Compensation Mode as defined in the MCAN User's Guide. When this
>>>>> mode is used the User's Guide indicates that the Transmitter Delay
>>>>> Compensation Offset register should be set. The document mentions
>>>>> that this
>>>>> register should be set to (1/dbitrate)/2*(Func Clk Freq).
>>>>>
>>>>> Additional CAN-CIA's "Bit Time Requirements for CAN FD" document
>>>>> indicates
>>>>> that this TDC mode is only needed for data bit rates above 2.5 Mbps.
>>>>> Therefore, only enable this mode and only set TDCO when the data bit
>>>>> rate
>>>>> is above 2.5 Mbps.
>>>>>
>>>>> Signed-off-by: Franklin S Cooper Jr <fcooper@ti.com>
>>>>> ---
>>>>> I'm pretty surprised that this hasn't been implemented already since
>>>>> the primary purpose of CAN-FD is to go beyond 1 Mbps and the MCAN IP
>>>>> supports up to 10 Mbps.
>>>>>
>>>>> So it will be nice to get comments from users of this driver to
>>>>> understand
>>>>> if they have been able to use CAN-FD beyond 2.5 Mbps without this
>>>>> patch.
>>>>> If they haven't what did they do to get around it if they needed higher
>>>>> speeds.
>>>>>
>>>>> Meanwhile I plan on testing this using a more "realistic" CAN bus to
>>>>> insure
>>>>> everything still works at 5 Mbps which is the max speed of my CAN
>>>>> transceiver.
>>>> ping. Anyone has any thoughts on this?
>>> I added Dong who authored the m_can driver and Wenyou who added the only
>>> in-kernel user of the driver for any help.
>> I tested it on SAMA5D2 Xplained board both with and without this patch, 
>> both work with the 4M bps data bit rate.
> Thank you for testing this out. Its interesting that you have been able
> to use higher speeds without this patch. What is the CAN transceiver
> being used on the SAMA5D2 Xplained board? I tried looking at the
> schematic but it seems the CAN signals are used on an extension board
> which I can't find the schematic for. Also do you mind sharing your test
> setup? Were you doing a short point to point test?
>
> Thank You,
> Franklin
Hello Franklin,

your patch definitely makes sense.

I forgot the TDC in my patches because it was not present in the
previous driver versions and because I didn't encounter any
problems when testing it myself.

The error is highly dependent on the hardware (transceiver) setup.
So it is definitely possible that some people don't encounter errors
without your patch.

Could you clarify what you meant with
> Scoping the signals I noticed that only a single bit was being transmitted 

Do you mean one data bit (high bit rate)  or did the core already fail
in the arbitration phase?

There is also another aspect that can lead to errors:

If the CAN clock 'cclk' is above the frequency of the interface/logic
clock 'hclk', the clock domain crossing of the CAN messages can't
work properly. However, I will throw this topic as an extra e-mail into
the round.

Mario

 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH net-next v2] bridge: also trigger RTM_NEWLINK when interface is released from bridge
From: David Ahern @ 2017-09-20 21:57 UTC (permalink / raw)
  To: David Miller, vincent; +Cc: stephen, bridge, netdev
In-Reply-To: <20170920.140956.781667692297467325.davem@davemloft.net>

On 9/20/17 3:09 PM, David Miller wrote:
> From: Vincent Bernat <vincent@bernat.im>
> Date: Sat, 16 Sep 2017 16:18:33 +0200
> 
> David, I am CC:'ing you because you've done work in this area over the
> past year.  I'm applying this patch, it's been sitting since the 16th
> and likes entirely correct to me.  But if you have objections just let
> me know.
> 
>> Currently, when an interface is released from a bridge via
>> ioctl(), we get a RTM_DELLINK event through netlink:
>>
>> Deleted 2: dummy0: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 master bridge0 state UNKNOWN
>>     link/ether 6e:23:c2:54:3a:b3
>>
>> Userspace has to interpret that as a removal from the bridge, not as a
>> complete removal of the interface. When an bridged interface is
>> completely removed, we get two events:
>>
>> Deleted 2: dummy0: <BROADCAST,NOARP> mtu 1500 master bridge0 state DOWN
>>     link/ether 6e:23:c2:54:3a:b3
>> Deleted 2: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default
>>     link/ether 6e:23:c2:54:3a:b3 brd ff:ff:ff:ff:ff:ff
>>
>> In constrast, when an interface is released from a bond, we get a
>> RTM_NEWLINK with only the new characteristics (no master):
>>
>> 3: dummy1: <BROADCAST,NOARP,SLAVE,UP,LOWER_UP> mtu 1500 qdisc noqueue master bond0 state UNKNOWN group default
>>     link/ether ae:dc:7a:8c:9a:3c brd ff:ff:ff:ff:ff:ff
>> 3: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default
>>     link/ether ae:dc:7a:8c:9a:3c brd ff:ff:ff:ff:ff:ff
>> 4: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
>>     link/ether ae:dc:7a:8c:9a:3c brd ff:ff:ff:ff:ff:ff
>> 3: dummy1: <BROADCAST,NOARP> mtu 1500 qdisc noqueue state DOWN group default
>>     link/ether ae:dc:7a:8c:9a:3c brd ff:ff:ff:ff:ff:ff
>> 3: dummy1: <BROADCAST,NOARP> mtu 1500 qdisc noqueue state DOWN group default
>>     link/ether ca:c8:7b:66:f8:25 brd ff:ff:ff:ff:ff:ff
>> 4: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
>>     link/ether ae:dc:7a:8c:9a:3c brd ff:ff:ff:ff:ff:ff
>>
>> Userland may be confused by the fact we say a link is deleted while
>> its characteristics are only modified. A first solution would have
>> been to turn the RTM_DELLINK event in del_nbp() into a RTM_NEWLINK
>> event. However, maybe some piece of userland is relying on this
>> RTM_DELLINK to detect when a bridged interface is released. Instead,
>> we also emit a RTM_NEWLINK event once the interface is
>> released (without master info).
>>
>> Deleted 2: dummy0: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 master bridge0 state UNKNOWN
>>     link/ether 8a:bb:e7:94:b1:f8
>> 2: dummy0: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default
>>     link/ether 8a:bb:e7:94:b1:f8 brd ff:ff:ff:ff:ff:ff
>>
>> This is done only when using ioctl(). When using Netlink, such an
>> event is already automatically emitted in do_setlink().

The DELLINK is for AF_BRIDGE family (ifi_family). Adding family to
print_linkinfo and running the monitor I get:


[LINK]family 0: 35: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc
noqueue master br0 state UNKNOWN group default
    link/ether d6:c3:73:86:3c:73 brd ff:ff:ff:ff:ff:ff

[LINK]family 7: 35: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500
master br0 state UNKNOWN
    link/ether d6:c3:73:86:3c:73

[LINK]Deleted family 7: 35: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu
1500 master br0 state UNKNOWN
    link/ether d6:c3:73:86:3c:73

[LINK]family 0: 35: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc
noqueue state UNKNOWN group default
    link/ether d6:c3:73:86:3c:73 brd ff:ff:ff:ff:ff:ff

And that seems correct. So I think the RTM_NEWLINK added by this patch
should not be needed.

^ permalink raw reply

* Re: [PATCH v2 1/3] can: m_can: Make hclk optional
From: Mario Hüttel @ 2017-09-20 22:00 UTC (permalink / raw)
  To: Franklin S Cooper Jr, linux-kernel, devicetree, netdev, linux-can,
	wg, mkl, robh+dt, quentin.schulz
In-Reply-To: <20170724225142.19975-2-fcooper@ti.com>

[-- Attachment #1.1: Type: text/plain, Size: 2347 bytes --]

> Hclk is the MCAN's interface clock. However, for OMAP based devices such as
> DRA7 SoC family the interface clock is handled by hwmod. Therefore, this
> interface clock is managed by hwmod driver via pm_runtime_get and
> pm_runtime_put calls. Therefore, this interface clock isn't defined in DT
> and thus the driver shouldn't fail if this clock isn't found.
I also agree in this point.
However, there's a problem I want to point out:

The M_CAN can only function correctly, if the condition
'hclk >= cclk' holds true.

The internal clock domain crossing can fail if this condition
is violated.

I thought about adding the condition to the driver to ensure
correct operation. But I had some problems:

1. Determine the clock rates:
    The devices you've mentioned above don't have an assigned
    hclk. Is there still a possibility to get the clock frequency?

2. What to do if 'hclk < cclk'?
    Is there a general way to process such an error? - I think not.
    Is a simple warning in case of an error enough?

Is there a way of ensuring that the condition is always met at all?

I think it is quite unlikely that the condition is violated, because
devices that actually run Linux usually have (bus) clock rates that
are high enough to ensure the correct operation.

Would it be safe to just ignore this possible fault?

Regards

Mario

>
> Signed-off-by: Franklin S Cooper Jr <fcooper@ti.com>
> ---
> Version 2 changes:
> Used NULL instead of 0 for unused hclk handle
>
>  drivers/net/can/m_can/m_can.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/can/m_can/m_can.c b/drivers/net/can/m_can/m_can.c
> index f4947a7..ea48e59 100644
> --- a/drivers/net/can/m_can/m_can.c
> +++ b/drivers/net/can/m_can/m_can.c
> @@ -1568,8 +1568,13 @@ static int m_can_plat_probe(struct platform_device *pdev)
>  	hclk = devm_clk_get(&pdev->dev, "hclk");
>  	cclk = devm_clk_get(&pdev->dev, "cclk");
>  
> -	if (IS_ERR(hclk) || IS_ERR(cclk)) {
> -		dev_err(&pdev->dev, "no clock found\n");
> +	if (IS_ERR(hclk)) {
> +		dev_warn(&pdev->dev, "hclk could not be found\n");
> +		hclk = NULL;
> +	}
> +
> +	if (IS_ERR(cclk)) {
> +		dev_err(&pdev->dev, "cclk could not be found\n");
>  		ret = -ENODEV;
>  		goto failed_ret;
>  	}

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH net-next 3/3] virtio-net: support XDP_REDIRECT
From: John Fastabend @ 2017-09-20 22:02 UTC (permalink / raw)
  To: Jason Wang, mst, virtualization, netdev, linux-kernel
In-Reply-To: <1505814163-7315-3-git-send-email-jasowang@redhat.com>

On 09/19/2017 02:42 AM, Jason Wang wrote:
> This patch tries to add XDP_REDIRECT for virtio-net. The changes are
> not complex as we could use exist XDP_TX helpers for most of the
> work. The rest is passing the XDP_TX to NAPI handler for implementing
> batching.
> 
> Cc: John Fastabend <john.fastabend@gmail.com>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---

[...]

> @@ -678,12 +711,20 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>  			}
>  			break;
>  		case XDP_TX:
> -			if (unlikely(!virtnet_xdp_xmit(vi, &xdp)))
> +			if (unlikely(!__virtnet_xdp_xmit(vi, &xdp)))
>  				trace_xdp_exception(vi->dev, xdp_prog, act);
> +			else
> +				*xdp_xmit = true;
>  			if (unlikely(xdp_page != page))
>  				goto err_xdp;
>  			rcu_read_unlock();
>  			goto xdp_xmit;
> +		case XDP_REDIRECT:
> +			err = xdp_do_redirect(dev, &xdp, xdp_prog);
> +			if (err)
> +				*xdp_xmit = true;

Is this a typo? Should above be

			if (!err)
				*xdp_xmit = true;

this would match the pattern in receive_small and also make more sense.

> +			rcu_read_unlock();
> +			goto xdp_xmit;
>  		default:
>  			bpf_warn_invalid_xdp_action(act);
>  		case XDP_ABORTED:
> @@ -788,7 +829,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>  }
>  

[...]

Otherwise looks good to me thanks for doing this.

.John

^ permalink raw reply

* Re: Latest net-next from GIT panic
From: Wei Wang @ 2017-09-20 22:09 UTC (permalink / raw)
  To: Paweł Staszewski
  Cc: Cong Wang, Eric Dumazet, Linux Kernel Network Developers,
	Eric Dumazet
In-Reply-To: <6c36fa00-13e6-8cd7-ff5c-df7739220736@itcare.pl>

>>> bisected again and same result:
>>> b838d5e1c5b6e57b10ec8af2268824041e3ea911 is the first bad commit
>>> commit b838d5e1c5b6e57b10ec8af2268824041e3ea911
>>> Author: Wei Wang <weiwan@google.com>
>>> Date:   Sat Jun 17 10:42:32 2017 -0700
>>>
>>>     ipv4: mark DST_NOGC and remove the operation of dst_free()
>>>
>>>     With the previous preparation patches, we are ready to get rid of the
>>>     dst gc operation in ipv4 code and release dst based on refcnt only.
>>>     So this patch adds DST_NOGC flag for all IPv4 dst and remove the
>>> calls
>>>     to dst_free().
>>>     At this point, all dst created in ipv4 code do not use the dst gc
>>>     anymore and will be destroyed at the point when refcnt drops to 0.
>>>
>>>     Signed-off-by: Wei Wang <weiwan@google.com>
>>>     Acked-by: Martin KaFai Lau <kafai@fb.com>
>>>     Signed-off-by: David S. Miller <davem@davemloft.net>
>>>
>>> :040000 040000 9b7e7fb641de6531fc7887473ca47ef7cb6a11da
>>> 831a73b71d3df1755f3e24c0d3c86d7a93fd55e2 M      net
>>>
>>> Will add now version 2 of patch from Eric and we will see
>>>
>>>
>> after adding patch
>> perf top catch
>>    PerfTop:   77159 irqs/sec  kernel:99.7%  exact:  0.0% [4000Hz cycles],
>> (all, 40 CPUs)
>>
>> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>
>>     60.95%  [kernel]            [k] dev_put.part.6
>>      4.00%  [kernel]            [k] ixgbe_poll
>>      3.63%  [kernel]            [k] irq_entries_start
>>      1.22%  [kernel]            [k] fib_table_lookup
>>      1.15%  [kernel]            [k] do_raw_spin_lock
>>      1.05%  [kernel]            [k] ixgbe_xmit_frame_ring
>>      1.04%  [kernel]            [k] lookup
>>      0.87%  [kernel]            [k] eth_type_trans
>>
>>
>> no panic on console - rebooting to check logs
>>
>>
> Nothing logged
>

Thanks very much Pawel for the feedback.

I was looking into the code (specifically IPv4 part) and found that in
free_fib_info_rcu(), we call free_nh_exceptions() without holding the
fnhe_lock. I am wondering if that could cause some race condition on
fnhe->fnhe_rth_input/output so a double call on dst_dev_put() on the
same dst could be happening.

But as we call free_fib_info_rcu() only after the grace period, and
the lookup code which could potentially modify
fnhe->fnhe_rth_input/output all holds rcu_read_lock(), it seems
fine...


On Wed, Sep 20, 2017 at 2:25 PM, Paweł Staszewski <pstaszewski@itcare.pl> wrote:
>
>
> W dniu 2017-09-20 o 23:24, Paweł Staszewski pisze:
>
>>
>>
>> W dniu 2017-09-20 o 23:10, Paweł Staszewski pisze:
>>>
>>>
>>>
>>> W dniu 2017-09-20 o 21:23, Paweł Staszewski pisze:
>>>>
>>>>
>>>>
>>>> W dniu 2017-09-20 o 21:13, Paweł Staszewski pisze:
>>>>>
>>>>>
>>>>>
>>>>> W dniu 2017-09-20 o 20:36, Cong Wang pisze:
>>>>>>
>>>>>> On Wed, Sep 20, 2017 at 11:30 AM, Eric Dumazet
>>>>>> <eric.dumazet@gmail.com> wrote:
>>>>>>>
>>>>>>> On Wed, 2017-09-20 at 11:22 -0700, Cong Wang wrote:
>>>>>>>>
>>>>>>>> but dmesg at this time shows nothing about interfaces or flaps.
>>>>>>>>
>>>>>>>> This is very odd.
>>>>>>>>
>>>>>>>> We only free netdevice in free_netdev() and it is only called when
>>>>>>>> we unregister a netdevice. Otherwise pcpu_refcnt is impossible
>>>>>>>> to be NULL.
>>>>>>>
>>>>>>> If there is a missing dev_hold() or one dev_put() in excess,
>>>>>>> this would allow the netdev to be freed too soon.
>>>>>>>
>>>>>>> -> Use after free.
>>>>>>> memory holding netdev could be reallocated-cleared by some other
>>>>>>> kernel
>>>>>>> user.
>>>>>>>
>>>>>> Sure, but only unregister could trigger a free. If there is no
>>>>>> unregister,
>>>>>> like what Pawel claims, then there is no free, the refcnt just goes to
>>>>>> 0 but the memory is still there.
>>>>>>
>>>>> About possible mistake from my side with bisect - i can judge too early
>>>>> that some bisect was good
>>>>> the road was:
>>>>> git bisect start
>>>>> # bad: [ac7b75966c9c86426b55fe1c50ae148aa4571075] Merge tag
>>>>> 'pinctrl-v4.13-1' of
>>>>> git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
>>>>> git bisect bad ac7b75966c9c86426b55fe1c50ae148aa4571075
>>>>> # good: [e24dd9ee5399747b71c1d982a484fc7601795f31] Merge branch 'next'
>>>>> of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
>>>>> git bisect good e24dd9ee5399747b71c1d982a484fc7601795f31
>>>>> # bad: [9cc9a5cb176ccb4f2cda5ac34da5a659926f125f] datapath: Avoid using
>>>>> stack larger than 1024.
>>>>> git bisect bad 9cc9a5cb176ccb4f2cda5ac34da5a659926f125f
>>>>> # good: [073cf9e20c333ab29744717a23f9e43ec7512a20] Merge branch
>>>>> 'udp-reduce-cache-pressure'
>>>>> git bisect good 073cf9e20c333ab29744717a23f9e43ec7512a20
>>>>> # bad: [8abd5599a520e9f188a750f1bde9dde5fb856230] Merge branch
>>>>> 's390-net-updates-part-2'
>>>>> git bisect bad 8abd5599a520e9f188a750f1bde9dde5fb856230
>>>>> # good: [2fae5d0e647c6470d206e72b5fc24972bb900f70] Merge branch
>>>>> 'bpf-ctx-narrow'
>>>>> git bisect good 2fae5d0e647c6470d206e72b5fc24972bb900f70
>>>>> # good: [41500c3e2a19ffcf40a7158fce1774de08e26ba2] rds: tcp: remove
>>>>> cp_outgoing
>>>>> git bisect good 41500c3e2a19ffcf40a7158fce1774de08e26ba2
>>>>> # bad: [8917a777be3ba566377be05117f71b93a5fd909d] tcp: md5: add
>>>>> TCP_MD5SIG_EXT socket option to set a key address prefix
>>>>> git bisect bad 8917a777be3ba566377be05117f71b93a5fd909d
>>>>> # good: [4a6ce2b6f2ecabbddcfe47e7cf61dd0f00b10e36] net: introduce a new
>>>>> function dst_dev_put()
>>>>>
>>>>> And currently have this running for about 4 hours without problems.
>>>>>
>>>>>
>>>>>
>>>>> git bisect good 4a6ce2b6f2ecabbddcfe47e7cf61dd0f00b10e36
>>>>> # bad: [a4c2fd7f78915a0d7c5275e7612e7793157a01f2] net: remove
>>>>> DST_NOCACHE flag
>>>>>
>>>>> Here for sure - panic
>>>>>
>>>>> git bisect bad a4c2fd7f78915a0d7c5275e7612e7793157a01f2
>>>>> # bad: [ad65a2f05695aced349e308193c6e2a6b1d87112] ipv6: call
>>>>> dst_hold_safe() properly
>>>>> git bisect bad ad65a2f05695aced349e308193c6e2a6b1d87112
>>>>> # good: [9df16efadd2a8a82731dc76ff656c771e261827f] ipv4: call
>>>>> dst_hold_safe() properly
>>>>> git bisect good 9df16efadd2a8a82731dc76ff656c771e261827f
>>>>> # bad: [1cfb71eeb12047bcdbd3e6730ffed66e810a0855] ipv6: take
>>>>> dst->__refcnt for insertion into fib6 tree
>>>>>
>>>>> im not 100% sure tor last two
>>>>> Will test them again starting from
>>>>> [95c47f9cf5e028d1ae77dc6c767c1edc8a18025b] ipv4: call dst_dev_put()
>>>>> properly
>>>>>
>>>>>
>>>>> git bisect bad 1cfb71eeb12047bcdbd3e6730ffed66e810a0855
>>>>> # bad: [b838d5e1c5b6e57b10ec8af2268824041e3ea911] ipv4: mark DST_NOGC
>>>>> and remove the operation of dst_free()
>>>>>
>>>>>
>>>>>
>>>>> git bisect bad b838d5e1c5b6e57b10ec8af2268824041e3ea911
>>>>> # first bad commit: [b838d5e1c5b6e57b10ec8af2268824041e3ea911] ipv4:
>>>>> mark DST_NOGC and remove the operation of dst_free()
>>>>>
>>>>>
>>>>>
>>>> What i can say more
>>>> I can reproduce this on any server with similar configuration
>>>> the difference can be teamd instead of bonding
>>>> ixgbe or i40e and mlx5
>>>> Same problems
>>>>
>>>> vlans - more or less prefixes learned from bgp -> zebra -> netlink ->
>>>> kernel
>>>> But normally in lab when using only plain routing no bgpd and about 128
>>>> vlans - with 128 routes - cant reproduce this - this apperas only with bgp -
>>>> minimum where i can reproduce this was about 130k prefixes with about 286
>>>> nexthops
>>>>
>>>>
>>>>
>>>>
>>> bisected again and same result:
>>> b838d5e1c5b6e57b10ec8af2268824041e3ea911 is the first bad commit
>>> commit b838d5e1c5b6e57b10ec8af2268824041e3ea911
>>> Author: Wei Wang <weiwan@google.com>
>>> Date:   Sat Jun 17 10:42:32 2017 -0700
>>>
>>>     ipv4: mark DST_NOGC and remove the operation of dst_free()
>>>
>>>     With the previous preparation patches, we are ready to get rid of the
>>>     dst gc operation in ipv4 code and release dst based on refcnt only.
>>>     So this patch adds DST_NOGC flag for all IPv4 dst and remove the
>>> calls
>>>     to dst_free().
>>>     At this point, all dst created in ipv4 code do not use the dst gc
>>>     anymore and will be destroyed at the point when refcnt drops to 0.
>>>
>>>     Signed-off-by: Wei Wang <weiwan@google.com>
>>>     Acked-by: Martin KaFai Lau <kafai@fb.com>
>>>     Signed-off-by: David S. Miller <davem@davemloft.net>
>>>
>>> :040000 040000 9b7e7fb641de6531fc7887473ca47ef7cb6a11da
>>> 831a73b71d3df1755f3e24c0d3c86d7a93fd55e2 M      net
>>>
>>> Will add now version 2 of patch from Eric and we will see
>>>
>>>
>> after adding patch
>> perf top catch
>>    PerfTop:   77159 irqs/sec  kernel:99.7%  exact:  0.0% [4000Hz cycles],
>> (all, 40 CPUs)
>>
>> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>
>>     60.95%  [kernel]            [k] dev_put.part.6
>>      4.00%  [kernel]            [k] ixgbe_poll
>>      3.63%  [kernel]            [k] irq_entries_start
>>      1.22%  [kernel]            [k] fib_table_lookup
>>      1.15%  [kernel]            [k] do_raw_spin_lock
>>      1.05%  [kernel]            [k] ixgbe_xmit_frame_ring
>>      1.04%  [kernel]            [k] lookup
>>      0.87%  [kernel]            [k] eth_type_trans
>>
>>
>> no panic on console - rebooting to check logs
>>
>>
> Nothing logged
>

^ permalink raw reply

* Re: [PATCH net-next v2] bridge: also trigger RTM_NEWLINK when interface is released from bridge
From: Vincent Bernat @ 2017-09-20 22:12 UTC (permalink / raw)
  To: David Ahern; +Cc: David Miller, stephen, bridge, netdev
In-Reply-To: <16e5566a-909d-ba83-7637-1fb6c93126bc@gmail.com>

 ❦ 20 septembre 2017 15:57 -0600, David Ahern <dsahern@gmail.com> :

> The DELLINK is for AF_BRIDGE family (ifi_family). Adding family to
> print_linkinfo and running the monitor I get:
>
>
> [LINK]family 0: 35: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc
> noqueue master br0 state UNKNOWN group default
>     link/ether d6:c3:73:86:3c:73 brd ff:ff:ff:ff:ff:ff
>
> [LINK]family 7: 35: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500
> master br0 state UNKNOWN
>     link/ether d6:c3:73:86:3c:73
>
> [LINK]Deleted family 7: 35: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu
> 1500 master br0 state UNKNOWN
>     link/ether d6:c3:73:86:3c:73

I didn't know about the family. We can drop the patch.
-- 
One of the most striking differences between a cat and a lie is that a cat has
only nine lives.
		-- Mark Twain, "Pudd'nhead Wilson's Calendar"

^ permalink raw reply

* [PATCH 1/1] ip: ip_print: if no json obj is initialized default fp is stdout
From: Julien Fortin @ 2017-09-20 22:19 UTC (permalink / raw)
  To: netdev; +Cc: roopa, nikolay, dsa, Julien Fortin

From: Julien Fortin <julien@cumulusnetworks.com>

The ip monitor didn't call `new_json_obj` (even for in non json context),
so the static FILE* _fp variable wasn't initialized, thus raising a
SIGSEGV in ipaddress.c. This patch should fix this issue for good, new
paths won't have to call `new_json_obj`.

$ ip -t mon label link
    fmt=fmt@entry=0x45460d “%d: “, value=<optimized out>) at ip_print.c:132
    #3  0x000000000040ccd2 in print_int (value=<optimized out>, fmt=0x45460d “%d: “, key=0x4545fc “ifindex”, t=PRINT_ANY) at ip_common.h:189
    #4  print_linkinfo (who=<optimized out>, n=0x7fffffffa380, arg=0x7ffff77a82a0 <_IO_2_1_stdout_>) at ipaddress.c:1107
    #5  0x0000000000422e13 in accept_msg (who=0x7fffffff8320, ctrl=0x7fffffff8310, n=0x7fffffffa380, arg=0x7ffff77a82a0 <_IO_2_1_stdout_>) at ipmonitor.c:89
    #6  0x000000000044c58f in rtnl_listen (rtnl=0x672160 <rth>, handler=handler@entry=0x422c70 <accept_msg>, jarg=0x7ffff77a82a0 <_IO_2_1_stdout_>)
        at libnetlink.c:761
	#7  0x00000000004233db in do_ipmonitor (argc=<optimized out>, argv=0x7fffffffe5a0) at ipmonitor.c:310
	#8  0x0000000000408f74 in do_cmd (argv0=0x7fffffffe7f5 “mon”, argc=3, argv=0x7fffffffe588) at ip.c:116
	#9  0x0000000000408a94 in main (argc=4, argv=0x7fffffffe580) at ip.c:311

Traceback can be seen when running the following command in a new terminal.
$ ip link add dev br0 type bridge

Fixes: 6377572f ("ip: ip_print: add new API to print JSON or regular format output")
Reported-by: David Ahern <dsa@cumulusnetworks.com>
Reviewed-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>
---
 ip/ip_print.c | 21 +++++++++++++--------
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/ip/ip_print.c b/ip/ip_print.c
index 4cd6a0bc..8cce2011 100644
--- a/ip/ip_print.c
+++ b/ip/ip_print.c
@@ -23,6 +23,11 @@ static FILE *_fp;
 #define _IS_JSON_CONTEXT(type) ((type & PRINT_JSON || type & PRINT_ANY) && _jw)
 #define _IS_FP_CONTEXT(type) (!_jw && (type & PRINT_FP || type & PRINT_ANY))
 
+static inline FILE* _get_fp(void)
+{
+	return _fp ? _fp : stdout;
+};
+
 void new_json_obj(int json, FILE *fp)
 {
 	if (json) {
@@ -91,7 +96,7 @@ void open_json_array(enum output_type type, const char *str)
 			jsonw_name(_jw, str);
 		jsonw_start_array(_jw);
 	} else if (_IS_FP_CONTEXT(type)) {
-		fprintf(_fp, "%s", str);
+		fprintf(_get_fp(), "%s", str);
 	}
 }
 
@@ -105,7 +110,7 @@ void close_json_array(enum output_type type, const char *str)
 		jsonw_end_array(_jw);
 		jsonw_pretty(_jw, true);
 	} else if (_IS_FP_CONTEXT(type)) {
-		fprintf(_fp, "%s", str);
+		fprintf(_get_fp(), "%s", str);
 	}
 }
 
@@ -126,7 +131,7 @@ void close_json_array(enum output_type type, const char *str)
 			else						\
 				jsonw_##type_name##_field(_jw, key, value); \
 		} else if (_IS_FP_CONTEXT(t)) {				\
-			color_fprintf(_fp, color, fmt, value);          \
+			color_fprintf(_get_fp(), color, fmt, value);	\
 		}							\
 	}
 _PRINT_FUNC(int, int);
@@ -149,7 +154,7 @@ void print_color_string(enum output_type type,
 		else
 			jsonw_string_field(_jw, key, value);
 	} else if (_IS_FP_CONTEXT(type)) {
-		color_fprintf(_fp, color, fmt, value);
+		color_fprintf(_get_fp(), color, fmt, value);
 	}
 }
 
@@ -170,7 +175,7 @@ void print_color_bool(enum output_type type,
 		else
 			jsonw_bool(_jw, value);
 	} else if (_IS_FP_CONTEXT(type)) {
-		color_fprintf(_fp, color, fmt, value ? "true" : "false");
+		color_fprintf(_get_fp(), color, fmt, value ? "true" : "false");
 	}
 }
 
@@ -189,7 +194,7 @@ void print_color_0xhex(enum output_type type,
 		snprintf(b1, sizeof(b1), "%#x", hex);
 		print_string(PRINT_JSON, key, NULL, b1);
 	} else if (_IS_FP_CONTEXT(type)) {
-		color_fprintf(_fp, color, fmt, hex);
+		color_fprintf(_get_fp(), color, fmt, hex);
 	}
 }
 
@@ -208,7 +213,7 @@ void print_color_hex(enum output_type type,
 		else
 			jsonw_string(_jw, b1);
 	} else if (_IS_FP_CONTEXT(type)) {
-		color_fprintf(_fp, color, fmt, hex);
+		color_fprintf(_get_fp(), color, fmt, hex);
 	}
 }
 
@@ -228,6 +233,6 @@ void print_color_null(enum output_type type,
 		else
 			jsonw_null(_jw);
 	} else if (_IS_FP_CONTEXT(type)) {
-		color_fprintf(_fp, color, fmt, value);
+		color_fprintf(_get_fp(), color, fmt, value);
 	}
 }
-- 
2.14.1

^ permalink raw reply related

* Re: [PATCH] net: compat: assert the size of cmsg copied in is as expected
From: David Miller @ 2017-09-20 22:36 UTC (permalink / raw)
  To: mengxu.gatech; +Cc: netdev, linux-kernel, meng.xu, sanidhya, taesoo
In-Reply-To: <1505841553-39084-1-git-send-email-mengxu.gatech@gmail.com>

From: Meng Xu <mengxu.gatech@gmail.com>
Date: Tue, 19 Sep 2017 13:19:13 -0400

> The actual length of cmsg fetched in during the second loop
> (i.e., kcmsg - kcmsg_base) could be different from what we
> get from the first loop (i.e., kcmlen).
> 
> The main reason is that the two get_user() calls in the two
> loops (i.e., get_user(ucmlen, &ucmsg->cmsg_len) and
> __get_user(ucmlen, &ucmsg->cmsg_len)) could cause ucmlen
> to have different values even they fetch from the same userspace
> address, as user can race to change the memory content in
> &ucmsg->cmsg_len across fetches.
> 
> Although in the second loop, the sanity check
> if ((char *)kcmsg_base + kcmlen - (char *)kcmsg < CMSG_ALIGN(tmp))
> is inplace, it only ensures that the cmsg fetched in during the
> second loop does not exceed the length of kcmlen, but not
> necessarily equal to kcmlen. But indicated by the assignment
> kmsg->msg_controllen = kcmlen, we should enforce that.
> 
> This patch adds this additional sanity check and ensures that
> what is recorded in kmsg->msg_controllen is the actual cmsg length.
> 
> Signed-off-by: Meng Xu <mengxu.gatech@gmail.com>

Applied, thanks.

^ permalink raw reply

* Re: [PATCH net-next v2] bridge: also trigger RTM_NEWLINK when interface is released from bridge
From: David Miller @ 2017-09-20 22:41 UTC (permalink / raw)
  To: vincent; +Cc: dsahern, stephen, bridge, netdev
In-Reply-To: <87bmm56mu2.fsf@luffy.cx>

From: Vincent Bernat <vincent@bernat.im>
Date: Thu, 21 Sep 2017 00:12:53 +0200

>  ❦ 20 septembre 2017 15:57 -0600, David Ahern <dsahern@gmail.com> :
> 
>> The DELLINK is for AF_BRIDGE family (ifi_family). Adding family to
>> print_linkinfo and running the monitor I get:
>>
>>
>> [LINK]family 0: 35: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc
>> noqueue master br0 state UNKNOWN group default
>>     link/ether d6:c3:73:86:3c:73 brd ff:ff:ff:ff:ff:ff
>>
>> [LINK]family 7: 35: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500
>> master br0 state UNKNOWN
>>     link/ether d6:c3:73:86:3c:73
>>
>> [LINK]Deleted family 7: 35: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu
>> 1500 master br0 state UNKNOWN
>>     link/ether d6:c3:73:86:3c:73
> 
> I didn't know about the family. We can drop the patch.

Ok, I've reverted.

Thanks.

^ permalink raw reply

* Re: [PATCH net] net: change skb->mac_header when Generic XDP calls adjust_head
From: David Miller @ 2017-09-20 22:45 UTC (permalink / raw)
  To: ecree; +Cc: netdev
In-Reply-To: <394385d4-796a-9ada-8f01-53a52d73a418@solarflare.com>

From: Edward Cree <ecree@solarflare.com>
Date: Tue, 19 Sep 2017 18:45:56 +0100

> Since XDP's view of the packet includes the MAC header, moving the start-
>  of-packet with bpf_xdp_adjust_head needs to also update the offset of the
>  MAC header (which is relative to skb->head, not to the skb->data that was
>  changed).
> Without this, tcpdump sees packets starting from the old MAC header rather
>  than the new one, at least in my tests on the loopback device.
> 
> Fixes: b5cdae3291f7 ("net: Generic XDP")
> Signed-off-by: Edward Cree <ecree@solarflare.com>

Applied and queued up for -stable, thanks Edward.

^ permalink raw reply

* Re: [PATCH v4 1/3] net: fec: only check queue 0 if RXF_0/TXF_0 interrupt is set
From: David Miller @ 2017-09-20 22:47 UTC (permalink / raw)
  To: troy.kisky; +Cc: fugang.duan, netdev, fabio.estevam, lznuaa
In-Reply-To: <1505867589-11784-1-git-send-email-troy.kisky@boundarydevices.com>

From: Troy Kisky <troy.kisky@boundarydevices.com>
Date: Tue, 19 Sep 2017 17:33:07 -0700

> Before queue 0 was always checked if any queue caused an interrupt.
> It is better to just mark queue 0 if queue 0 has caused an interrupt.
> 
> Signed-off-by: Troy Kisky <troy.kisky@boundarydevices.com>
> Acked-by: Fugang Duan <Fugang.duan@nxp.com>

Applied.

^ permalink raw reply

* Re: [PATCH v4 2/3] net: fec: remove unused interrupt FEC_ENET_TS_TIMER
From: David Miller @ 2017-09-20 22:47 UTC (permalink / raw)
  To: troy.kisky; +Cc: fugang.duan, netdev, fabio.estevam, lznuaa
In-Reply-To: <1505867589-11784-2-git-send-email-troy.kisky@boundarydevices.com>

From: Troy Kisky <troy.kisky@boundarydevices.com>
Date: Tue, 19 Sep 2017 17:33:08 -0700

> FEC_ENET_TS_TIMER is not checked in the interrupt routine
> so there is no need to enable it.
> 
> Signed-off-by: Troy Kisky <troy.kisky@boundarydevices.com>
> Acked-by: Fugang Duan <fugang.duan@nxp.com>

Applied.

^ permalink raw reply

* Re: [PATCH v4 3/3] net: fec: return IRQ_HANDLED if fec_ptp_check_pps_event handled it
From: David Miller @ 2017-09-20 22:47 UTC (permalink / raw)
  To: troy.kisky; +Cc: fugang.duan, netdev, fabio.estevam, lznuaa
In-Reply-To: <1505867589-11784-3-git-send-email-troy.kisky@boundarydevices.com>

From: Troy Kisky <troy.kisky@boundarydevices.com>
Date: Tue, 19 Sep 2017 17:33:09 -0700

> fec_ptp_check_pps_event will return 1 if FEC_T_TF_MASK caused
> an interrupt. Don't return IRQ_NONE in this case.
> 
> Signed-off-by: Troy Kisky <troy.kisky@boundarydevices.com>
> Acked-by: Fugang Duan <fugang.duan@nxp.com>

Applied.

^ permalink raw reply

* Re: [PATCH v2 0/2] enable hires timer to timeout datagram socket
From: Vallish Vaidyeshwara @ 2017-09-20 22:48 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Eric Dumazet, Eduardo Valentin, David Miller, dwmw2, shuah,
	richardcochran, xiyou.wangcong, netdev, linux-kernel, anchalag,
	dwmw
In-Reply-To: <alpine.DEB.2.20.1709161131200.2105@nanos>

On Sat, Sep 16, 2017 at 11:47:56AM +0200, Thomas Gleixner wrote:
> On Fri, 8 Sep 2017, Eric Dumazet wrote:
> > On Fri, 2017-09-08 at 11:55 -0700, Eduardo Valentin wrote:
> > > Hello,
> > > 
> > > On Fri, Sep 08, 2017 at 10:26:45AM -0700, David Miller wrote:
> > > > From: David Woodhouse <dwmw2@infradead.org>
> > > > Date: Fri, 08 Sep 2017 18:23:22 +0100
> > > > 
> > > > > I don't know that anyone's ever tried saying "show me the chapter
> > > and
> > > > > verse of the documentation"
> > > > 
> > > > Do you know why I brought this up?  Because the person I am replying
> > > > to told me that the syscall documentation should have suggested this
> > > > or that.
> > > > 
> > > > That's why.
> > > 
> > > :-) My intention was for sure not to upset anybody.
> > > 
> > > Just to reiterate, the point of patch is simple, there was a change in
> > > behavior in the system call from one kernel version to the other. As I
> > > mentioned, I agree that the userspace could use other means to achieve
> > > the same, but still the system call behavior has changed.
> > > 
> > > > 
> > > > So let's concentrate on the other aspects of my reply, ok?
> > > 
> > > I agree. I would prefer to understand here what is the technical
> > > reason not to accept these patches other than "use other system
> > > calls".
> > 
> > So if we need to replace all 'legacy' timers to high resolution timer,
> > because some application was _relying_ on jiffies being kind of precise,
> > maybe it is better to revert the change done on legacy timers.
> 
> Which would be a major step back in terms of timer performance and system
> disturbance caused by massive recascading operations.
> 
> > Or continue the migration and make them use high res internally.
> > 
> > select() and poll() are the standard way to have precise timeouts,
> > it is silly we have to maintain a timeout handling in the datagram fast
> > path.
> 
> A few years ago we switched select/poll over to use hrtimers because the
> wheel timers were too inaccurate for some operations, so it feels
> consequent to switch the timeout in the datagram rcv path over as well. I
> agree that the whole timeout magic there feels silly, but unfortunately
> it's a documented property of sockets.
> 
> Thanks,
> 
> 	tglx
> 

Hello Thomas,

Thanks for your comments. This patch has been NACK'ed by David Miller. Is
there any other approach to solve this problem with out application code
being recompiled?

Thanks.
-Vallish

^ permalink raw reply

* Re: [PATCH net-next] net: dsa: Utilize dsa_slave_dev_check()
From: David Miller @ 2017-09-20 22:49 UTC (permalink / raw)
  To: f.fainelli; +Cc: netdev, andrew, vivien.didelot, linux-kernel
In-Reply-To: <20170920010038.12393-1-f.fainelli@gmail.com>

From: Florian Fainelli <f.fainelli@gmail.com>
Date: Tue, 19 Sep 2017 18:00:37 -0700

> Instead of open coding the check.
> 
> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>

Applied, thanks.

^ permalink raw reply

* [PATCH net 0/2] Bring back transceiver type for PHYLIB
From: Florian Fainelli @ 2017-09-20 22:52 UTC (permalink / raw)
  To: netdev; +Cc: davem, linville, decot, tremyfr, andrew, Florian Fainelli

Hi

With the introduction of the xLINKSETTINGS ethtool APIs, the transceiver type
was deprecated, but in that process we lost some useful information that PHYLIB
was consistently reporting about internal vs. external PHYs.

This brings back transceiver as a read-only field that is only consumed in the
legacy path where ETHTOOL_GET is called but the underlying drivers implement the
new style klink_settings API.

Florian Fainelli (2):
  net: ethtool: Add back transceiver type
  net: phy: Keep reporting transceiver type

 drivers/net/phy/phy.c        | 3 ++-
 include/uapi/linux/ethtool.h | 6 +++++-
 net/core/ethtool.c           | 2 ++
 3 files changed, 9 insertions(+), 2 deletions(-)

-- 
2.9.3

^ permalink raw reply

* [PATCH net 1/2] net: ethtool: Add back transceiver type
From: Florian Fainelli @ 2017-09-20 22:52 UTC (permalink / raw)
  To: netdev; +Cc: davem, linville, decot, tremyfr, andrew, Florian Fainelli
In-Reply-To: <20170920225214.21974-1-f.fainelli@gmail.com>

Commit 3f1ac7a700d0 ("net: ethtool: add new ETHTOOL_xLINKSETTINGS API")
deprecated the ethtool_cmd::transceiver field, which was fine in
premise, except that the PHY library was actually using it to report the
type of transceiver: internal or external.

Use the first word of the reserved field to put this __u8 transceiver
field back in. It is made read-only, and we don't expect the
ETHTOOL_xLINKSETTINGS API to be doing anything with this anyway, so this
is mostly for the legacy path where we do:

ethtool_get_settings()
-> dev->ethtool_ops->get_link_ksettings()
   -> convert_link_ksettings_to_legacy_settings()

to have no information loss compared to the legacy get_settings API.

Fixes: 3f1ac7a700d0 ("net: ethtool: add new ETHTOOL_xLINKSETTINGS API")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 include/uapi/linux/ethtool.h | 6 +++++-
 net/core/ethtool.c           | 2 ++
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h
index 9c041dae8e2c..5bd1b1de4ea0 100644
--- a/include/uapi/linux/ethtool.h
+++ b/include/uapi/linux/ethtool.h
@@ -1753,6 +1753,8 @@ enum ethtool_reset_flags {
  *	%ethtool_link_mode_bit_indices for the link modes, and other
  *	link features that the link partner advertised through
  *	autonegotiation; 0 if unknown or not applicable.  Read-only.
+ * @transceiver: Used to distinguish different possible PHY types,
+ *	reported consistently by PHYLIB.  Read-only.
  *
  * If autonegotiation is disabled, the speed and @duplex represent the
  * fixed link mode and are writable if the driver supports multiple
@@ -1804,7 +1806,9 @@ struct ethtool_link_settings {
 	__u8	eth_tp_mdix;
 	__u8	eth_tp_mdix_ctrl;
 	__s8	link_mode_masks_nwords;
-	__u32	reserved[8];
+	__u8	transceiver;
+	__u8	reserved1[3];
+	__u32	reserved[7];
 	__u32	link_mode_masks[0];
 	/* layout of link_mode_masks fields:
 	 * __u32 map_supported[link_mode_masks_nwords];
diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index 6a582ae4c5d9..3228411ada0f 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -525,6 +525,8 @@ convert_link_ksettings_to_legacy_settings(
 		= link_ksettings->base.eth_tp_mdix;
 	legacy_settings->eth_tp_mdix_ctrl
 		= link_ksettings->base.eth_tp_mdix_ctrl;
+	legacy_settings->transceiver
+		= link_ksettings->base.transceiver;
 	return retval;
 }
 
-- 
2.9.3

^ permalink raw reply related

* [PATCH net 2/2] net: phy: Keep reporting transceiver type
From: Florian Fainelli @ 2017-09-20 22:52 UTC (permalink / raw)
  To: netdev; +Cc: davem, linville, decot, tremyfr, andrew, Florian Fainelli
In-Reply-To: <20170920225214.21974-1-f.fainelli@gmail.com>

With commit 2d55173e71b0 ("phy: add generic function to support
ksetting support"), we lost the ability to report the transceiver type
like we used to. Now that we have added back the transceiver type to
ethtool_link_settings, we can report it back like we used to and have no
loss of information.

Fixes: 3f1ac7a700d0 ("net: ethtool: add new ETHTOOL_xLINKSETTINGS API")
Fixes: 2d55173e71b0 ("phy: add generic function to support ksetting support")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 drivers/net/phy/phy.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index e842d2cd1ee7..2b1e67bc1e73 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -373,7 +373,8 @@ void phy_ethtool_ksettings_get(struct phy_device *phydev,
 		cmd->base.port = PORT_BNC;
 	else
 		cmd->base.port = PORT_MII;
-
+	cmd->base.transceiver = phy_is_internal(phydev) ?
+				XCVR_INTERNAL : XCVR_EXTERNAL;
 	cmd->base.phy_address = phydev->mdio.addr;
 	cmd->base.autoneg = phydev->autoneg;
 	cmd->base.eth_tp_mdix_ctrl = phydev->mdix_ctrl;
-- 
2.9.3

^ permalink raw reply related

* Re: [PATCH net-next] bpf: Optimize lpm trie delete
From: Daniel Mack @ 2017-09-20 22:56 UTC (permalink / raw)
  To: Craig Gallek
  Cc: Alexei Starovoitov, Daniel Borkmann, David S . Miller, netdev
In-Reply-To: <CAEfhGiy_hC0tCd=5qoPZ1okyMCYsgMQbKAa0ms1Tb9cnYDaLBg@mail.gmail.com>

On 09/20/2017 08:51 PM, Craig Gallek wrote:
> On Wed, Sep 20, 2017 at 12:51 PM, Daniel Mack <daniel@zonque.org> wrote:
>> Hi Craig,
>>
>> Thanks, this looks much cleaner already :)
>>
>> On 09/20/2017 06:22 PM, Craig Gallek wrote:
>>> diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c
>>> index 9d58a576b2ae..b5a7d70ec8b5 100644
>>> --- a/kernel/bpf/lpm_trie.c
>>> +++ b/kernel/bpf/lpm_trie.c
>>> @@ -397,7 +397,7 @@ static int trie_delete_elem(struct bpf_map *map, void *_key)
>>>       struct lpm_trie_node __rcu **trim;
>>>       struct lpm_trie_node *node;
>>>       unsigned long irq_flags;
>>> -     unsigned int next_bit;
>>> +     unsigned int next_bit = 0;
>>
>> This default assignment seems wrong, and I guess you only added it to
>> squelch a compiler warning?
> Yes, this variable is only initialized after the lookup iterations
> below (meaning it will never be initialized the the root-removal
> case).

Right, and once set, it's only updated in case we don't have an exact
match and try to drill down further.

>> [...]
>>
>>> +     /* If the node has one child, we may be able to collapse the tree
>>> +      * while removing this node if the node's child is in the same
>>> +      * 'next bit' slot as this node was in its parent or if the node
>>> +      * itself is the root.
>>> +      */
>>> +     if (trim == &trie->root) {
>>> +             next_bit = node->child[0] ? 0 : 1;
>>> +             rcu_assign_pointer(trie->root, node->child[next_bit]);
>>> +             kfree_rcu(node, rcu);
>>
>> I don't think you should treat this 'root' case special.
>>
>> Instead, move the 'next_bit' assignment outside of the condition ...
> I'm not quite sure I follow.  Are you saying do something like this:
> 
> if (trim == &trie->root) {
>   next_bit = node->child[0] ? 0 : 1;
> }
> if (rcu_access_pointer(node->child[next_bit])) {
> ...
> 
> This would save a couple lines of code, but I think the as-is
> implementation is slightly easier to understand.  I don't have a
> strong opinion either way, though.

Me neither :)

My idea was to set

  next_bit = node->child[0] ? 0 : 1;

unconditionally, because it should result in the same in both cases.

It might be a bit of bike shedding, but I dislike this default
assignment, and I believe that not relying on next_bit to be set as a
side effect of the lookup loop makes the code a bit more readable.

WDYT?


Thanks,
Daniel

^ permalink raw reply

* Re: [PATCH 0/2] blackfin: Drop non-functional DSA code
From: David Miller @ 2017-09-20 22:57 UTC (permalink / raw)
  To: f.fainelli; +Cc: netdev, andrew, vivien.didelot, realmz6, adi-buildroot-devel
In-Reply-To: <20170920010346.16871-1-f.fainelli@gmail.com>

From: Florian Fainelli <f.fainelli@gmail.com>
Date: Tue, 19 Sep 2017 18:03:44 -0700

> I sent those many months ago in the hope that the bfin-linux people
> would pick those patches but nobody seems to be responding, can you
> queue those via net-next since this affects DSA?

Ok, if they aren't responding what can we do.  Probably should update
the MAINTAINERS file to reflect this...

Series applied to net-next, thanks.

^ permalink raw reply

* Re: [PATCH] isdn/i4l: fetch the ppp_write buffer in one shot
From: David Miller @ 2017-09-20 23:01 UTC (permalink / raw)
  To: mengxu.gatech
  Cc: isdn, johannes.berg, netdev, linux-kernel, meng.xu, sanidhya,
	taesoo
In-Reply-To: <1505872195-46627-1-git-send-email-mengxu.gatech@gmail.com>

From: Meng Xu <mengxu.gatech@gmail.com>
Date: Tue, 19 Sep 2017 21:49:55 -0400

> In isdn_ppp_write(), the header (i.e., protobuf) of the buffer is
> fetched twice from userspace. The first fetch is used to peek at the
> protocol of the message and reset the huptimer if necessary; while the
> second fetch copies in the whole buffer. However, given that buf resides
> in userspace memory, a user process can race to change its memory content
> across fetches. By doing so, we can either avoid resetting the huptimer
> for any type of packets (by first setting proto to PPP_LCP and later
> change to the actual type) or force resetting the huptimer for LCP
> packets.
> 
> This patch changes this double-fetch behavior into two single fetches
> decided by condition (lp->isdn_device < 0 || lp->isdn_channel <0).
> A more detailed discussion can be found at
> https://marc.info/?l=linux-kernel&m=150586376926123&w=2
> 
> Signed-off-by: Meng Xu <mengxu.gatech@gmail.com>

Applied, thank you.

^ permalink raw reply

* Re: [PATCH net] net/ncsi: Don't assume last available channel exists
From: David Miller @ 2017-09-20 23:05 UTC (permalink / raw)
  To: sam; +Cc: netdev, linux-kernel
In-Reply-To: <20170920041251.14635-1-sam@mendozajonas.com>

From: Samuel Mendoza-Jonas <sam@mendozajonas.com>
Date: Wed, 20 Sep 2017 14:12:51 +1000

> When handling new VLAN tags in NCSI we check the maximum allowed number
> of filters on the last active ("hot") channel. However if the 'add'
> callback is called before NCSI has configured a channel, this causes a
> NULL dereference.
> 
> Check that we actually have a hot channel, and warn if it is missing.
> 
> Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>

Well, a few things.

We should not allow this driver method to be invoked in the first
place if the device is not in a state where the operation is legal
yet.

Second of all, if !ndp is true, you should not return 0 to indicate
success.

But more fundamentally, we should block this method from being
callable unless the device is in the proper state and can complete the
operation.

Seriously, these checks should be completely unnecessary if those
issues are handled properly.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox