* Re: [PATCH net-next 3/3] virtio-net: support XDP_REDIRECT
From: John Fastabend @ 2017-09-20 22:02 UTC (permalink / raw)
To: Jason Wang, mst, virtualization, netdev, linux-kernel
In-Reply-To: <1505814163-7315-3-git-send-email-jasowang@redhat.com>
On 09/19/2017 02:42 AM, Jason Wang wrote:
> This patch tries to add XDP_REDIRECT for virtio-net. The changes are
> not complex as we could use exist XDP_TX helpers for most of the
> work. The rest is passing the XDP_TX to NAPI handler for implementing
> batching.
>
> Cc: John Fastabend <john.fastabend@gmail.com>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
[...]
> @@ -678,12 +711,20 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> }
> break;
> case XDP_TX:
> - if (unlikely(!virtnet_xdp_xmit(vi, &xdp)))
> + if (unlikely(!__virtnet_xdp_xmit(vi, &xdp)))
> trace_xdp_exception(vi->dev, xdp_prog, act);
> + else
> + *xdp_xmit = true;
> if (unlikely(xdp_page != page))
> goto err_xdp;
> rcu_read_unlock();
> goto xdp_xmit;
> + case XDP_REDIRECT:
> + err = xdp_do_redirect(dev, &xdp, xdp_prog);
> + if (err)
> + *xdp_xmit = true;
Is this a typo? Should above be
if (!err)
*xdp_xmit = true;
this would match the pattern in receive_small and also make more sense.
> + rcu_read_unlock();
> + goto xdp_xmit;
> default:
> bpf_warn_invalid_xdp_action(act);
> case XDP_ABORTED:
> @@ -788,7 +829,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> }
>
[...]
Otherwise looks good to me thanks for doing this.
.John
^ permalink raw reply
* Re: [PATCH v2 1/3] can: m_can: Make hclk optional
From: Mario Hüttel @ 2017-09-20 22:00 UTC (permalink / raw)
To: Franklin S Cooper Jr, linux-kernel, devicetree, netdev, linux-can,
wg, mkl, robh+dt, quentin.schulz
In-Reply-To: <20170724225142.19975-2-fcooper@ti.com>
[-- Attachment #1.1: Type: text/plain, Size: 2347 bytes --]
> Hclk is the MCAN's interface clock. However, for OMAP based devices such as
> DRA7 SoC family the interface clock is handled by hwmod. Therefore, this
> interface clock is managed by hwmod driver via pm_runtime_get and
> pm_runtime_put calls. Therefore, this interface clock isn't defined in DT
> and thus the driver shouldn't fail if this clock isn't found.
I also agree in this point.
However, there's a problem I want to point out:
The M_CAN can only function correctly, if the condition
'hclk >= cclk' holds true.
The internal clock domain crossing can fail if this condition
is violated.
I thought about adding the condition to the driver to ensure
correct operation. But I had some problems:
1. Determine the clock rates:
The devices you've mentioned above don't have an assigned
hclk. Is there still a possibility to get the clock frequency?
2. What to do if 'hclk < cclk'?
Is there a general way to process such an error? - I think not.
Is a simple warning in case of an error enough?
Is there a way of ensuring that the condition is always met at all?
I think it is quite unlikely that the condition is violated, because
devices that actually run Linux usually have (bus) clock rates that
are high enough to ensure the correct operation.
Would it be safe to just ignore this possible fault?
Regards
Mario
>
> Signed-off-by: Franklin S Cooper Jr <fcooper@ti.com>
> ---
> Version 2 changes:
> Used NULL instead of 0 for unused hclk handle
>
> drivers/net/can/m_can/m_can.c | 9 +++++++--
> 1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/can/m_can/m_can.c b/drivers/net/can/m_can/m_can.c
> index f4947a7..ea48e59 100644
> --- a/drivers/net/can/m_can/m_can.c
> +++ b/drivers/net/can/m_can/m_can.c
> @@ -1568,8 +1568,13 @@ static int m_can_plat_probe(struct platform_device *pdev)
> hclk = devm_clk_get(&pdev->dev, "hclk");
> cclk = devm_clk_get(&pdev->dev, "cclk");
>
> - if (IS_ERR(hclk) || IS_ERR(cclk)) {
> - dev_err(&pdev->dev, "no clock found\n");
> + if (IS_ERR(hclk)) {
> + dev_warn(&pdev->dev, "hclk could not be found\n");
> + hclk = NULL;
> + }
> +
> + if (IS_ERR(cclk)) {
> + dev_err(&pdev->dev, "cclk could not be found\n");
> ret = -ENODEV;
> goto failed_ret;
> }
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply
* Re: [PATCH net-next v2] bridge: also trigger RTM_NEWLINK when interface is released from bridge
From: David Ahern @ 2017-09-20 21:57 UTC (permalink / raw)
To: David Miller, vincent; +Cc: stephen, bridge, netdev
In-Reply-To: <20170920.140956.781667692297467325.davem@davemloft.net>
On 9/20/17 3:09 PM, David Miller wrote:
> From: Vincent Bernat <vincent@bernat.im>
> Date: Sat, 16 Sep 2017 16:18:33 +0200
>
> David, I am CC:'ing you because you've done work in this area over the
> past year. I'm applying this patch, it's been sitting since the 16th
> and likes entirely correct to me. But if you have objections just let
> me know.
>
>> Currently, when an interface is released from a bridge via
>> ioctl(), we get a RTM_DELLINK event through netlink:
>>
>> Deleted 2: dummy0: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 master bridge0 state UNKNOWN
>> link/ether 6e:23:c2:54:3a:b3
>>
>> Userspace has to interpret that as a removal from the bridge, not as a
>> complete removal of the interface. When an bridged interface is
>> completely removed, we get two events:
>>
>> Deleted 2: dummy0: <BROADCAST,NOARP> mtu 1500 master bridge0 state DOWN
>> link/ether 6e:23:c2:54:3a:b3
>> Deleted 2: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default
>> link/ether 6e:23:c2:54:3a:b3 brd ff:ff:ff:ff:ff:ff
>>
>> In constrast, when an interface is released from a bond, we get a
>> RTM_NEWLINK with only the new characteristics (no master):
>>
>> 3: dummy1: <BROADCAST,NOARP,SLAVE,UP,LOWER_UP> mtu 1500 qdisc noqueue master bond0 state UNKNOWN group default
>> link/ether ae:dc:7a:8c:9a:3c brd ff:ff:ff:ff:ff:ff
>> 3: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default
>> link/ether ae:dc:7a:8c:9a:3c brd ff:ff:ff:ff:ff:ff
>> 4: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
>> link/ether ae:dc:7a:8c:9a:3c brd ff:ff:ff:ff:ff:ff
>> 3: dummy1: <BROADCAST,NOARP> mtu 1500 qdisc noqueue state DOWN group default
>> link/ether ae:dc:7a:8c:9a:3c brd ff:ff:ff:ff:ff:ff
>> 3: dummy1: <BROADCAST,NOARP> mtu 1500 qdisc noqueue state DOWN group default
>> link/ether ca:c8:7b:66:f8:25 brd ff:ff:ff:ff:ff:ff
>> 4: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
>> link/ether ae:dc:7a:8c:9a:3c brd ff:ff:ff:ff:ff:ff
>>
>> Userland may be confused by the fact we say a link is deleted while
>> its characteristics are only modified. A first solution would have
>> been to turn the RTM_DELLINK event in del_nbp() into a RTM_NEWLINK
>> event. However, maybe some piece of userland is relying on this
>> RTM_DELLINK to detect when a bridged interface is released. Instead,
>> we also emit a RTM_NEWLINK event once the interface is
>> released (without master info).
>>
>> Deleted 2: dummy0: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 master bridge0 state UNKNOWN
>> link/ether 8a:bb:e7:94:b1:f8
>> 2: dummy0: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default
>> link/ether 8a:bb:e7:94:b1:f8 brd ff:ff:ff:ff:ff:ff
>>
>> This is done only when using ioctl(). When using Netlink, such an
>> event is already automatically emitted in do_setlink().
The DELLINK is for AF_BRIDGE family (ifi_family). Adding family to
print_linkinfo and running the monitor I get:
[LINK]family 0: 35: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc
noqueue master br0 state UNKNOWN group default
link/ether d6:c3:73:86:3c:73 brd ff:ff:ff:ff:ff:ff
[LINK]family 7: 35: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500
master br0 state UNKNOWN
link/ether d6:c3:73:86:3c:73
[LINK]Deleted family 7: 35: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu
1500 master br0 state UNKNOWN
link/ether d6:c3:73:86:3c:73
[LINK]family 0: 35: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc
noqueue state UNKNOWN group default
link/ether d6:c3:73:86:3c:73 brd ff:ff:ff:ff:ff:ff
And that seems correct. So I think the RTM_NEWLINK added by this patch
should not be needed.
^ permalink raw reply
* Re: [RFC PATCH] can: m_can: Support higher speed CAN-FD bitrates
From: Mario Hüttel @ 2017-09-20 21:37 UTC (permalink / raw)
To: Franklin S Cooper Jr, Yang, Wenyou, Sekhar Nori, wg, mkl,
socketcan, quentin.schulz, edumazet, linux-can, netdev,
linux-kernel
Cc: Wenyou Yang, Dong Aisheng
In-Reply-To: <88d4ddc4-b786-0205-6852-56e93182a1c9@ti.com>
[-- Attachment #1.1: Type: text/plain, Size: 3666 bytes --]
On 09/20/2017 10:19 PM, Franklin S Cooper Jr wrote:
> Hi Wenyou,
>
> On 09/17/2017 10:47 PM, Yang, Wenyou wrote:
>>
>> On 2017/9/14 13:06, Sekhar Nori wrote:
>>> On Thursday 14 September 2017 03:28 AM, Franklin S Cooper Jr wrote:
>>>> On 08/18/2017 02:39 PM, Franklin S Cooper Jr wrote:
>>>>> During test transmitting using CAN-FD at high bitrates (4 Mbps) only
>>>>> resulted in errors. Scoping the signals I noticed that only a single
>>>>> bit
>>>>> was being transmitted and with a bit more investigation realized the
>>>>> actual
>>>>> MCAN IP would go back to initialization mode automatically.
>>>>>
>>>>> It appears this issue is due to the MCAN needing to use the Transmitter
>>>>> Delay Compensation Mode as defined in the MCAN User's Guide. When this
>>>>> mode is used the User's Guide indicates that the Transmitter Delay
>>>>> Compensation Offset register should be set. The document mentions
>>>>> that this
>>>>> register should be set to (1/dbitrate)/2*(Func Clk Freq).
>>>>>
>>>>> Additional CAN-CIA's "Bit Time Requirements for CAN FD" document
>>>>> indicates
>>>>> that this TDC mode is only needed for data bit rates above 2.5 Mbps.
>>>>> Therefore, only enable this mode and only set TDCO when the data bit
>>>>> rate
>>>>> is above 2.5 Mbps.
>>>>>
>>>>> Signed-off-by: Franklin S Cooper Jr <fcooper@ti.com>
>>>>> ---
>>>>> I'm pretty surprised that this hasn't been implemented already since
>>>>> the primary purpose of CAN-FD is to go beyond 1 Mbps and the MCAN IP
>>>>> supports up to 10 Mbps.
>>>>>
>>>>> So it will be nice to get comments from users of this driver to
>>>>> understand
>>>>> if they have been able to use CAN-FD beyond 2.5 Mbps without this
>>>>> patch.
>>>>> If they haven't what did they do to get around it if they needed higher
>>>>> speeds.
>>>>>
>>>>> Meanwhile I plan on testing this using a more "realistic" CAN bus to
>>>>> insure
>>>>> everything still works at 5 Mbps which is the max speed of my CAN
>>>>> transceiver.
>>>> ping. Anyone has any thoughts on this?
>>> I added Dong who authored the m_can driver and Wenyou who added the only
>>> in-kernel user of the driver for any help.
>> I tested it on SAMA5D2 Xplained board both with and without this patch,
>> both work with the 4M bps data bit rate.
> Thank you for testing this out. Its interesting that you have been able
> to use higher speeds without this patch. What is the CAN transceiver
> being used on the SAMA5D2 Xplained board? I tried looking at the
> schematic but it seems the CAN signals are used on an extension board
> which I can't find the schematic for. Also do you mind sharing your test
> setup? Were you doing a short point to point test?
>
> Thank You,
> Franklin
Hello Franklin,
your patch definitely makes sense.
I forgot the TDC in my patches because it was not present in the
previous driver versions and because I didn't encounter any
problems when testing it myself.
The error is highly dependent on the hardware (transceiver) setup.
So it is definitely possible that some people don't encounter errors
without your patch.
Could you clarify what you meant with
> Scoping the signals I noticed that only a single bit was being transmitted
Do you mean one data bit (high bit rate) or did the core already fail
in the arbitration phase?
There is also another aspect that can lead to errors:
If the CAN clock 'cclk' is above the frequency of the interface/logic
clock 'hclk', the clock domain crossing of the CAN messages can't
work properly. However, I will throw this topic as an extra e-mail into
the round.
Mario
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply
* Re: TSN Scorecard, was Re: [RFC net-next 0/5] TSN: Add qdisc-based config interfaces for traffic shapers
From: Jesus Sanchez-Palencia @ 2017-09-20 21:29 UTC (permalink / raw)
To: Richard Cochran, levipearson
Cc: netdev, intel-wired-lan, vinicius.gomes, andre.guedes,
ivan.briano, boon.leong.ong, jhs, xiyou.wangcong, jiri, henrik
In-Reply-To: <20170920054919.jtqckuybsu42tvpk@localhost>
Hi,
On 09/19/2017 10:49 PM, Richard Cochran wrote:
(...)
>
> No, that is not what I meant. We need some minimal additional kernel
> support in order to fully implement the TSN family of standards. Of
> course, the bulk will have to be done in user space. It would be a
> mistake to cram the stuff that belongs in userland into the kernel.
>
> Looking at the table, and reading your descriptions of the state of
> OpenAVB, I remained convinced that the kernel needs only three
> additions:
>
> 1. SO_TXTIME
> 2. CBS Qdisc
> 3. ALSA support for DAC clock control (but that is another story)
We'll be posting the CBS v1 series for review soon.
The current SO_TXTIME RFC for the purpose of Launchtime looks great, and we are
looking forward for the v1 + its companion qdisc so we can test / review and
provide feedback.
We are still under the impression that a config interface for HW offload of Qbv
/ Qbu config will be needed, but we'll be deferring the 'taprio' proposal until
there are NICs (end stations) that support these standards available. We can
revisit it if that ever happens, and if it's still needed, but then taking into
account SO_TXTIME (and its related qdisc).
Thanks everyone for all the feedback so far.
Regards,
Jesus
^ permalink raw reply
* Re: [PATCH] bgmac: Remove all offloading features, including GRO.
From: Florian Fainelli @ 2017-09-20 21:32 UTC (permalink / raw)
To: rosenp, Eric Dumazet; +Cc: netdev
In-Reply-To: <1505942864.3451.3.camel@gmail.com>
On 09/20/2017 02:27 PM, rosenp@gmail.com wrote:
> Sorry for the noise. After more testing I've found out that the cause
> was that I had BBR enabled on my laptop. Switching back to CUBIC fixed
> the issue.
>
> In other words, this patch is detrimental.
Quite unsurprisingly, thanks for coming back, and please don't top-post
on netdev.
>
> ~67mbps - gro off
> ~87mbps - gro on
>
> On Fri, 2017-09-15 at 23:04 -0700, Florian Fainelli wrote:
>> On September 15, 2017 5:38:42 PM PDT, rosenp@gmail.com wrote:
>>> I have not. Unfortunately I own no gigabit hardware to test this
>>> on.
>>> The MIPS CPU runs at 300MHz on my unit.
>>>
>>
>> bgmac is used on Gigabit capable hardware, like Northstar and
>> Northstar Plus, and others too, so unless you can get access to such
>> HW or get confirmation from someone that your patches changes
>> something, I would just drop this change and not bother. This is
>> already not 100mbits/sec linerate...
>>
>>> On Fri, 2017-09-15 at 17:34 -0700, Eric Dumazet wrote:
>>>> On Fri, 2017-09-15 at 17:23 -0700, Rosen Penev wrote:
>>>>> On a linksys E1200v1 (actually a crossflashed E1000v2), the
>>>>> offloading features give no measurable benefit to speed or
>>>>> latency.
>>>>> Furthermore, disabling GRO actually improves iperf performance
>>>>> by a
>>>>> whoppimg 3mbps. Results:
>>>>>
>>>>> Currently:
>>>>>
>>>>> v2: Changed napi_gro_receive to netif_receive_skb. Seems to
>>>>> have an
>>>>> identical result.
>>>>>
>>>>> Signed-off-by: Rosen Penev <rosenp@gmail.com>
>>>>> ---
>>>>> drivers/net/ethernet/broadcom/bgmac.c | 2 +-
>>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/drivers/net/ethernet/broadcom/bgmac.c
>>>>> b/drivers/net/ethernet/broadcom/bgmac.c
>>>>> index 48d672b204a4..1fb0053aeee7 100644
>>>>> --- a/drivers/net/ethernet/broadcom/bgmac.c
>>>>> +++ b/drivers/net/ethernet/broadcom/bgmac.c
>>>>> @@ -483,7 +483,7 @@ static int bgmac_dma_rx_read(struct bgmac
>>>>> *bgmac, struct bgmac_dma_ring *ring,
>>>>> skb->protocol = eth_type_trans(skb,
>>>>> bgmac-
>>>>>> net_dev);
>>>>>
>>>>> bgmac->net_dev->stats.rx_bytes += len;
>>>>> bgmac->net_dev->stats.rx_packets++;
>>>>> - napi_gro_receive(&bgmac->napi, skb);
>>>>> + netif_receive_skb(skb);
>>>>> handled++;
>>>>> } while (0);
>>>>>
>>>>
>>>> And have you tested 1Gbit link speed ?
>>>> ( Or 2.5 Gbit link speed )
>>>>
>>>> If you want to disable GRO on your host, fine : you can use
>>>> ethtool
>>>> -K
>>>>
>>>>
>>>>
>>
>> (please don't top-post)
--
Florian
^ permalink raw reply
* Re: Latest net-next from GIT panic
From: Paweł Staszewski @ 2017-09-20 21:27 UTC (permalink / raw)
To: Cong Wang, Eric Dumazet
Cc: Wei Wang, Linux Kernel Network Developers, Eric Dumazet
In-Reply-To: <6c36fa00-13e6-8cd7-ff5c-df7739220736@itcare.pl>
W dniu 2017-09-20 o 23:25, Paweł Staszewski pisze:
>
>
> W dniu 2017-09-20 o 23:24, Paweł Staszewski pisze:
>>
>>
>> W dniu 2017-09-20 o 23:10, Paweł Staszewski pisze:
>>>
>>>
>>> W dniu 2017-09-20 o 21:23, Paweł Staszewski pisze:
>>>>
>>>>
>>>> W dniu 2017-09-20 o 21:13, Paweł Staszewski pisze:
>>>>>
>>>>>
>>>>> W dniu 2017-09-20 o 20:36, Cong Wang pisze:
>>>>>> On Wed, Sep 20, 2017 at 11:30 AM, Eric Dumazet
>>>>>> <eric.dumazet@gmail.com> wrote:
>>>>>>> On Wed, 2017-09-20 at 11:22 -0700, Cong Wang wrote:
>>>>>>>> but dmesg at this time shows nothing about interfaces or flaps.
>>>>>>>>
>>>>>>>> This is very odd.
>>>>>>>>
>>>>>>>> We only free netdevice in free_netdev() and it is only called when
>>>>>>>> we unregister a netdevice. Otherwise pcpu_refcnt is impossible
>>>>>>>> to be NULL.
>>>>>>> If there is a missing dev_hold() or one dev_put() in excess,
>>>>>>> this would allow the netdev to be freed too soon.
>>>>>>>
>>>>>>> -> Use after free.
>>>>>>> memory holding netdev could be reallocated-cleared by some other
>>>>>>> kernel
>>>>>>> user.
>>>>>>>
>>>>>> Sure, but only unregister could trigger a free. If there is no
>>>>>> unregister,
>>>>>> like what Pawel claims, then there is no free, the refcnt just
>>>>>> goes to
>>>>>> 0 but the memory is still there.
>>>>>>
>>>>> About possible mistake from my side with bisect - i can judge too
>>>>> early that some bisect was good
>>>>> the road was:
>>>>> git bisect start
>>>>> # bad: [ac7b75966c9c86426b55fe1c50ae148aa4571075] Merge tag
>>>>> 'pinctrl-v4.13-1' of
>>>>> git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
>>>>> git bisect bad ac7b75966c9c86426b55fe1c50ae148aa4571075
>>>>> # good: [e24dd9ee5399747b71c1d982a484fc7601795f31] Merge branch
>>>>> 'next' of
>>>>> git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
>>>>> git bisect good e24dd9ee5399747b71c1d982a484fc7601795f31
>>>>> # bad: [9cc9a5cb176ccb4f2cda5ac34da5a659926f125f] datapath: Avoid
>>>>> using stack larger than 1024.
>>>>> git bisect bad 9cc9a5cb176ccb4f2cda5ac34da5a659926f125f
>>>>> # good: [073cf9e20c333ab29744717a23f9e43ec7512a20] Merge branch
>>>>> 'udp-reduce-cache-pressure'
>>>>> git bisect good 073cf9e20c333ab29744717a23f9e43ec7512a20
>>>>> # bad: [8abd5599a520e9f188a750f1bde9dde5fb856230] Merge branch
>>>>> 's390-net-updates-part-2'
>>>>> git bisect bad 8abd5599a520e9f188a750f1bde9dde5fb856230
>>>>> # good: [2fae5d0e647c6470d206e72b5fc24972bb900f70] Merge branch
>>>>> 'bpf-ctx-narrow'
>>>>> git bisect good 2fae5d0e647c6470d206e72b5fc24972bb900f70
>>>>> # good: [41500c3e2a19ffcf40a7158fce1774de08e26ba2] rds: tcp:
>>>>> remove cp_outgoing
>>>>> git bisect good 41500c3e2a19ffcf40a7158fce1774de08e26ba2
>>>>> # bad: [8917a777be3ba566377be05117f71b93a5fd909d] tcp: md5: add
>>>>> TCP_MD5SIG_EXT socket option to set a key address prefix
>>>>> git bisect bad 8917a777be3ba566377be05117f71b93a5fd909d
>>>>> # good: [4a6ce2b6f2ecabbddcfe47e7cf61dd0f00b10e36] net: introduce
>>>>> a new function dst_dev_put()
>>>>>
>>>>> And currently have this running for about 4 hours without problems.
>>>>>
>>>>>
>>>>>
>>>>> git bisect good 4a6ce2b6f2ecabbddcfe47e7cf61dd0f00b10e36
>>>>> # bad: [a4c2fd7f78915a0d7c5275e7612e7793157a01f2] net: remove
>>>>> DST_NOCACHE flag
>>>>>
>>>>> Here for sure - panic
>>>>>
>>>>> git bisect bad a4c2fd7f78915a0d7c5275e7612e7793157a01f2
>>>>> # bad: [ad65a2f05695aced349e308193c6e2a6b1d87112] ipv6: call
>>>>> dst_hold_safe() properly
>>>>> git bisect bad ad65a2f05695aced349e308193c6e2a6b1d87112
>>>>> # good: [9df16efadd2a8a82731dc76ff656c771e261827f] ipv4: call
>>>>> dst_hold_safe() properly
>>>>> git bisect good 9df16efadd2a8a82731dc76ff656c771e261827f
>>>>> # bad: [1cfb71eeb12047bcdbd3e6730ffed66e810a0855] ipv6: take
>>>>> dst->__refcnt for insertion into fib6 tree
>>>>>
>>>>> im not 100% sure tor last two
>>>>> Will test them again starting from
>>>>> [95c47f9cf5e028d1ae77dc6c767c1edc8a18025b] ipv4: call
>>>>> dst_dev_put() properly
>>>>>
>>>>>
>>>>> git bisect bad 1cfb71eeb12047bcdbd3e6730ffed66e810a0855
>>>>> # bad: [b838d5e1c5b6e57b10ec8af2268824041e3ea911] ipv4: mark
>>>>> DST_NOGC and remove the operation of dst_free()
>>>>>
>>>>>
>>>>>
>>>>> git bisect bad b838d5e1c5b6e57b10ec8af2268824041e3ea911
>>>>> # first bad commit: [b838d5e1c5b6e57b10ec8af2268824041e3ea911]
>>>>> ipv4: mark DST_NOGC and remove the operation of dst_free()
>>>>>
>>>>>
>>>>>
>>>> What i can say more
>>>> I can reproduce this on any server with similar configuration
>>>> the difference can be teamd instead of bonding
>>>> ixgbe or i40e and mlx5
>>>> Same problems
>>>>
>>>> vlans - more or less prefixes learned from bgp -> zebra -> netlink
>>>> -> kernel
>>>> But normally in lab when using only plain routing no bgpd and about
>>>> 128 vlans - with 128 routes - cant reproduce this - this apperas
>>>> only with bgp - minimum where i can reproduce this was about 130k
>>>> prefixes with about 286 nexthops
>>>>
>>>>
>>>>
>>>>
>>> bisected again and same result:
>>> b838d5e1c5b6e57b10ec8af2268824041e3ea911 is the first bad commit
>>> commit b838d5e1c5b6e57b10ec8af2268824041e3ea911
>>> Author: Wei Wang <weiwan@google.com>
>>> Date: Sat Jun 17 10:42:32 2017 -0700
>>>
>>> ipv4: mark DST_NOGC and remove the operation of dst_free()
>>>
>>> With the previous preparation patches, we are ready to get rid
>>> of the
>>> dst gc operation in ipv4 code and release dst based on refcnt only.
>>> So this patch adds DST_NOGC flag for all IPv4 dst and remove the
>>> calls
>>> to dst_free().
>>> At this point, all dst created in ipv4 code do not use the dst gc
>>> anymore and will be destroyed at the point when refcnt drops to 0.
>>>
>>> Signed-off-by: Wei Wang <weiwan@google.com>
>>> Acked-by: Martin KaFai Lau <kafai@fb.com>
>>> Signed-off-by: David S. Miller <davem@davemloft.net>
>>>
>>> :040000 040000 9b7e7fb641de6531fc7887473ca47ef7cb6a11da
>>> 831a73b71d3df1755f3e24c0d3c86d7a93fd55e2 M net
>>>
>>> Will add now version 2 of patch from Eric and we will see
>>>
>>>
>> after adding patch
>> perf top catch
>> PerfTop: 77159 irqs/sec kernel:99.7% exact: 0.0% [4000Hz
>> cycles], (all, 40 CPUs)
>> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>
>>
>> 60.95% [kernel] [k] dev_put.part.6
>> 4.00% [kernel] [k] ixgbe_poll
>> 3.63% [kernel] [k] irq_entries_start
>> 1.22% [kernel] [k] fib_table_lookup
>> 1.15% [kernel] [k] do_raw_spin_lock
>> 1.05% [kernel] [k] ixgbe_xmit_frame_ring
>> 1.04% [kernel] [k] lookup
>> 0.87% [kernel] [k] eth_type_trans
>>
>>
>> no panic on console - rebooting to check logs
>>
>>
> Nothing logged
>
>
after adding this patch
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index f535779d9dc1dfe36934c2abba4e43d053ac5d6f..220cd12456754876edf2d3ef13195e82d70d5c74 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3331,7 +3331,15 @@ void netdev_run_todo(void);
*/
static inline void dev_put(struct net_device *dev)
{
- this_cpu_dec(*dev->pcpu_refcnt);
+ int __percpu *pref = READ_ONCE(dev->pcpu_refcnt);
+
+ if (!pref) {
+ pr_err("no pcpu_refcnt on dev %p(%s) state %d dismantle %d\n",
+ dev, dev->name, dev->reg_state, dev->dismantle);
+ for (;;)
+ cpu_relax();
+ }
+ this_cpu_dec(*pref);
}
/**
Have just halted console - no output no reaction on kbd
nothing in any syslog/log and catched only something from perf top
^ permalink raw reply related
* Re: [PATCH net-next] udp: do rmem bulk free even if the rx sk queue is empty
From: David Miller @ 2017-09-20 21:29 UTC (permalink / raw)
To: pabeni; +Cc: netdev, edumazet
In-Reply-To: <f82856528a033394c407539d1ff4b92e243bad36.1505815819.git.pabeni@redhat.com>
From: Paolo Abeni <pabeni@redhat.com>
Date: Tue, 19 Sep 2017 12:11:43 +0200
> The commit 6b229cf77d68 ("udp: add batching to udp_rmem_release()")
> reduced greatly the cacheline contention between the BH and the US
> reader batching the rmem updates in most scenarios.
>
> Such optimization is explicitly avoided if the US reader is faster
> then BH processing.
>
> My fault, I initially suggested this kind of behavior due to concerns
> of possible regressions with small sk_rcvbuf values. Tests showed
> such concerns are misplaced, so this commit relaxes the condition
> for rmem bulk updates, obtaining small but measurable performance
> gain in the scenario described above.
>
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Applied, thanks Paolo.
^ permalink raw reply
* Re: [PATCH net-next 3/3] virtio-net: support XDP_REDIRECT
From: David Miller @ 2017-09-20 21:28 UTC (permalink / raw)
To: jasowang; +Cc: mst, virtualization, netdev, linux-kernel, john.fastabend
In-Reply-To: <1505814163-7315-3-git-send-email-jasowang@redhat.com>
From: Jason Wang <jasowang@redhat.com>
Date: Tue, 19 Sep 2017 17:42:43 +0800
> This patch tries to add XDP_REDIRECT for virtio-net. The changes are
> not complex as we could use exist XDP_TX helpers for most of the
> work. The rest is passing the XDP_TX to NAPI handler for implementing
> batching.
>
> Cc: John Fastabend <john.fastabend@gmail.com>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
Applied.
^ permalink raw reply
* Re: [PATCH net-next 2/3] virtio-net: add packet len average only when needed during XDP
From: David Miller @ 2017-09-20 21:28 UTC (permalink / raw)
To: jasowang; +Cc: netdev, virtualization, john.fastabend, linux-kernel, mst
In-Reply-To: <1505814163-7315-2-git-send-email-jasowang@redhat.com>
From: Jason Wang <jasowang@redhat.com>
Date: Tue, 19 Sep 2017 17:42:42 +0800
> There's no need to add packet len average in the case of XDP_PASS
> since it will be done soon after skb is created.
>
> Cc: John Fastabend <john.fastabend@gmail.com>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
Applied.
^ permalink raw reply
* Re: [PATCH net-next 1/3] virtio-net: remove unnecessary parameter of virtnet_xdp_xmit()
From: David Miller @ 2017-09-20 21:28 UTC (permalink / raw)
To: jasowang; +Cc: netdev, virtualization, john.fastabend, linux-kernel, mst
In-Reply-To: <1505814163-7315-1-git-send-email-jasowang@redhat.com>
From: Jason Wang <jasowang@redhat.com>
Date: Tue, 19 Sep 2017 17:42:41 +0800
> CC: John Fastabend <john.fastabend@gmail.com>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
Applied.
^ permalink raw reply
* Re: Latest net-next from GIT panic
From: Paweł Staszewski @ 2017-09-20 21:25 UTC (permalink / raw)
To: Cong Wang, Eric Dumazet
Cc: Wei Wang, Linux Kernel Network Developers, Eric Dumazet
In-Reply-To: <26e8b899-d8b1-caa4-436b-e23ffe384b0f@itcare.pl>
W dniu 2017-09-20 o 23:24, Paweł Staszewski pisze:
>
>
> W dniu 2017-09-20 o 23:10, Paweł Staszewski pisze:
>>
>>
>> W dniu 2017-09-20 o 21:23, Paweł Staszewski pisze:
>>>
>>>
>>> W dniu 2017-09-20 o 21:13, Paweł Staszewski pisze:
>>>>
>>>>
>>>> W dniu 2017-09-20 o 20:36, Cong Wang pisze:
>>>>> On Wed, Sep 20, 2017 at 11:30 AM, Eric Dumazet
>>>>> <eric.dumazet@gmail.com> wrote:
>>>>>> On Wed, 2017-09-20 at 11:22 -0700, Cong Wang wrote:
>>>>>>> but dmesg at this time shows nothing about interfaces or flaps.
>>>>>>>
>>>>>>> This is very odd.
>>>>>>>
>>>>>>> We only free netdevice in free_netdev() and it is only called when
>>>>>>> we unregister a netdevice. Otherwise pcpu_refcnt is impossible
>>>>>>> to be NULL.
>>>>>> If there is a missing dev_hold() or one dev_put() in excess,
>>>>>> this would allow the netdev to be freed too soon.
>>>>>>
>>>>>> -> Use after free.
>>>>>> memory holding netdev could be reallocated-cleared by some other
>>>>>> kernel
>>>>>> user.
>>>>>>
>>>>> Sure, but only unregister could trigger a free. If there is no
>>>>> unregister,
>>>>> like what Pawel claims, then there is no free, the refcnt just
>>>>> goes to
>>>>> 0 but the memory is still there.
>>>>>
>>>> About possible mistake from my side with bisect - i can judge too
>>>> early that some bisect was good
>>>> the road was:
>>>> git bisect start
>>>> # bad: [ac7b75966c9c86426b55fe1c50ae148aa4571075] Merge tag
>>>> 'pinctrl-v4.13-1' of
>>>> git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
>>>> git bisect bad ac7b75966c9c86426b55fe1c50ae148aa4571075
>>>> # good: [e24dd9ee5399747b71c1d982a484fc7601795f31] Merge branch
>>>> 'next' of
>>>> git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
>>>> git bisect good e24dd9ee5399747b71c1d982a484fc7601795f31
>>>> # bad: [9cc9a5cb176ccb4f2cda5ac34da5a659926f125f] datapath: Avoid
>>>> using stack larger than 1024.
>>>> git bisect bad 9cc9a5cb176ccb4f2cda5ac34da5a659926f125f
>>>> # good: [073cf9e20c333ab29744717a23f9e43ec7512a20] Merge branch
>>>> 'udp-reduce-cache-pressure'
>>>> git bisect good 073cf9e20c333ab29744717a23f9e43ec7512a20
>>>> # bad: [8abd5599a520e9f188a750f1bde9dde5fb856230] Merge branch
>>>> 's390-net-updates-part-2'
>>>> git bisect bad 8abd5599a520e9f188a750f1bde9dde5fb856230
>>>> # good: [2fae5d0e647c6470d206e72b5fc24972bb900f70] Merge branch
>>>> 'bpf-ctx-narrow'
>>>> git bisect good 2fae5d0e647c6470d206e72b5fc24972bb900f70
>>>> # good: [41500c3e2a19ffcf40a7158fce1774de08e26ba2] rds: tcp: remove
>>>> cp_outgoing
>>>> git bisect good 41500c3e2a19ffcf40a7158fce1774de08e26ba2
>>>> # bad: [8917a777be3ba566377be05117f71b93a5fd909d] tcp: md5: add
>>>> TCP_MD5SIG_EXT socket option to set a key address prefix
>>>> git bisect bad 8917a777be3ba566377be05117f71b93a5fd909d
>>>> # good: [4a6ce2b6f2ecabbddcfe47e7cf61dd0f00b10e36] net: introduce a
>>>> new function dst_dev_put()
>>>>
>>>> And currently have this running for about 4 hours without problems.
>>>>
>>>>
>>>>
>>>> git bisect good 4a6ce2b6f2ecabbddcfe47e7cf61dd0f00b10e36
>>>> # bad: [a4c2fd7f78915a0d7c5275e7612e7793157a01f2] net: remove
>>>> DST_NOCACHE flag
>>>>
>>>> Here for sure - panic
>>>>
>>>> git bisect bad a4c2fd7f78915a0d7c5275e7612e7793157a01f2
>>>> # bad: [ad65a2f05695aced349e308193c6e2a6b1d87112] ipv6: call
>>>> dst_hold_safe() properly
>>>> git bisect bad ad65a2f05695aced349e308193c6e2a6b1d87112
>>>> # good: [9df16efadd2a8a82731dc76ff656c771e261827f] ipv4: call
>>>> dst_hold_safe() properly
>>>> git bisect good 9df16efadd2a8a82731dc76ff656c771e261827f
>>>> # bad: [1cfb71eeb12047bcdbd3e6730ffed66e810a0855] ipv6: take
>>>> dst->__refcnt for insertion into fib6 tree
>>>>
>>>> im not 100% sure tor last two
>>>> Will test them again starting from
>>>> [95c47f9cf5e028d1ae77dc6c767c1edc8a18025b] ipv4: call dst_dev_put()
>>>> properly
>>>>
>>>>
>>>> git bisect bad 1cfb71eeb12047bcdbd3e6730ffed66e810a0855
>>>> # bad: [b838d5e1c5b6e57b10ec8af2268824041e3ea911] ipv4: mark
>>>> DST_NOGC and remove the operation of dst_free()
>>>>
>>>>
>>>>
>>>> git bisect bad b838d5e1c5b6e57b10ec8af2268824041e3ea911
>>>> # first bad commit: [b838d5e1c5b6e57b10ec8af2268824041e3ea911]
>>>> ipv4: mark DST_NOGC and remove the operation of dst_free()
>>>>
>>>>
>>>>
>>> What i can say more
>>> I can reproduce this on any server with similar configuration
>>> the difference can be teamd instead of bonding
>>> ixgbe or i40e and mlx5
>>> Same problems
>>>
>>> vlans - more or less prefixes learned from bgp -> zebra -> netlink
>>> -> kernel
>>> But normally in lab when using only plain routing no bgpd and about
>>> 128 vlans - with 128 routes - cant reproduce this - this apperas
>>> only with bgp - minimum where i can reproduce this was about 130k
>>> prefixes with about 286 nexthops
>>>
>>>
>>>
>>>
>> bisected again and same result:
>> b838d5e1c5b6e57b10ec8af2268824041e3ea911 is the first bad commit
>> commit b838d5e1c5b6e57b10ec8af2268824041e3ea911
>> Author: Wei Wang <weiwan@google.com>
>> Date: Sat Jun 17 10:42:32 2017 -0700
>>
>> ipv4: mark DST_NOGC and remove the operation of dst_free()
>>
>> With the previous preparation patches, we are ready to get rid of
>> the
>> dst gc operation in ipv4 code and release dst based on refcnt only.
>> So this patch adds DST_NOGC flag for all IPv4 dst and remove the
>> calls
>> to dst_free().
>> At this point, all dst created in ipv4 code do not use the dst gc
>> anymore and will be destroyed at the point when refcnt drops to 0.
>>
>> Signed-off-by: Wei Wang <weiwan@google.com>
>> Acked-by: Martin KaFai Lau <kafai@fb.com>
>> Signed-off-by: David S. Miller <davem@davemloft.net>
>>
>> :040000 040000 9b7e7fb641de6531fc7887473ca47ef7cb6a11da
>> 831a73b71d3df1755f3e24c0d3c86d7a93fd55e2 M net
>>
>> Will add now version 2 of patch from Eric and we will see
>>
>>
> after adding patch
> perf top catch
> PerfTop: 77159 irqs/sec kernel:99.7% exact: 0.0% [4000Hz
> cycles], (all, 40 CPUs)
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>
> 60.95% [kernel] [k] dev_put.part.6
> 4.00% [kernel] [k] ixgbe_poll
> 3.63% [kernel] [k] irq_entries_start
> 1.22% [kernel] [k] fib_table_lookup
> 1.15% [kernel] [k] do_raw_spin_lock
> 1.05% [kernel] [k] ixgbe_xmit_frame_ring
> 1.04% [kernel] [k] lookup
> 0.87% [kernel] [k] eth_type_trans
>
>
> no panic on console - rebooting to check logs
>
>
Nothing logged
^ permalink raw reply
* Re: [PATCH] bgmac: Remove all offloading features, including GRO.
From: rosenp @ 2017-09-20 21:27 UTC (permalink / raw)
To: Florian Fainelli, Eric Dumazet; +Cc: netdev
In-Reply-To: <0CEBE536-2C18-4ED2-90CB-3836CEF9A70E@gmail.com>
Sorry for the noise. After more testing I've found out that the cause
was that I had BBR enabled on my laptop. Switching back to CUBIC fixed
the issue.
In other words, this patch is detrimental.
~67mbps - gro off
~87mbps - gro on
On Fri, 2017-09-15 at 23:04 -0700, Florian Fainelli wrote:
> On September 15, 2017 5:38:42 PM PDT, rosenp@gmail.com wrote:
> > I have not. Unfortunately I own no gigabit hardware to test this
> > on.
> > The MIPS CPU runs at 300MHz on my unit.
> >
>
> bgmac is used on Gigabit capable hardware, like Northstar and
> Northstar Plus, and others too, so unless you can get access to such
> HW or get confirmation from someone that your patches changes
> something, I would just drop this change and not bother. This is
> already not 100mbits/sec linerate...
>
> > On Fri, 2017-09-15 at 17:34 -0700, Eric Dumazet wrote:
> > > On Fri, 2017-09-15 at 17:23 -0700, Rosen Penev wrote:
> > > > On a linksys E1200v1 (actually a crossflashed E1000v2), the
> > > > offloading features give no measurable benefit to speed or
> > > > latency.
> > > > Furthermore, disabling GRO actually improves iperf performance
> > > > by a
> > > > whoppimg 3mbps. Results:
> > > >
> > > > Currently:
> > > >
> > > > v2: Changed napi_gro_receive to netif_receive_skb. Seems to
> > > > have an
> > > > identical result.
> > > >
> > > > Signed-off-by: Rosen Penev <rosenp@gmail.com>
> > > > ---
> > > > drivers/net/ethernet/broadcom/bgmac.c | 2 +-
> > > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > > >
> > > > diff --git a/drivers/net/ethernet/broadcom/bgmac.c
> > > > b/drivers/net/ethernet/broadcom/bgmac.c
> > > > index 48d672b204a4..1fb0053aeee7 100644
> > > > --- a/drivers/net/ethernet/broadcom/bgmac.c
> > > > +++ b/drivers/net/ethernet/broadcom/bgmac.c
> > > > @@ -483,7 +483,7 @@ static int bgmac_dma_rx_read(struct bgmac
> > > > *bgmac, struct bgmac_dma_ring *ring,
> > > > skb->protocol = eth_type_trans(skb,
> > > > bgmac-
> > > > > net_dev);
> > > >
> > > > bgmac->net_dev->stats.rx_bytes += len;
> > > > bgmac->net_dev->stats.rx_packets++;
> > > > - napi_gro_receive(&bgmac->napi, skb);
> > > > + netif_receive_skb(skb);
> > > > handled++;
> > > > } while (0);
> > > >
> > >
> > > And have you tested 1Gbit link speed ?
> > > ( Or 2.5 Gbit link speed )
> > >
> > > If you want to disable GRO on your host, fine : you can use
> > > ethtool
> > > -K
> > >
> > >
> > >
>
> (please don't top-post)
^ permalink raw reply
* Re: Latest net-next from GIT panic
From: Paweł Staszewski @ 2017-09-20 21:24 UTC (permalink / raw)
To: Cong Wang, Eric Dumazet
Cc: Wei Wang, Linux Kernel Network Developers, Eric Dumazet
In-Reply-To: <7d8731ee-6c91-bbfd-6313-c95f52d7b49a@itcare.pl>
W dniu 2017-09-20 o 23:10, Paweł Staszewski pisze:
>
>
> W dniu 2017-09-20 o 21:23, Paweł Staszewski pisze:
>>
>>
>> W dniu 2017-09-20 o 21:13, Paweł Staszewski pisze:
>>>
>>>
>>> W dniu 2017-09-20 o 20:36, Cong Wang pisze:
>>>> On Wed, Sep 20, 2017 at 11:30 AM, Eric Dumazet
>>>> <eric.dumazet@gmail.com> wrote:
>>>>> On Wed, 2017-09-20 at 11:22 -0700, Cong Wang wrote:
>>>>>> but dmesg at this time shows nothing about interfaces or flaps.
>>>>>>
>>>>>> This is very odd.
>>>>>>
>>>>>> We only free netdevice in free_netdev() and it is only called when
>>>>>> we unregister a netdevice. Otherwise pcpu_refcnt is impossible
>>>>>> to be NULL.
>>>>> If there is a missing dev_hold() or one dev_put() in excess,
>>>>> this would allow the netdev to be freed too soon.
>>>>>
>>>>> -> Use after free.
>>>>> memory holding netdev could be reallocated-cleared by some other
>>>>> kernel
>>>>> user.
>>>>>
>>>> Sure, but only unregister could trigger a free. If there is no
>>>> unregister,
>>>> like what Pawel claims, then there is no free, the refcnt just goes to
>>>> 0 but the memory is still there.
>>>>
>>> About possible mistake from my side with bisect - i can judge too
>>> early that some bisect was good
>>> the road was:
>>> git bisect start
>>> # bad: [ac7b75966c9c86426b55fe1c50ae148aa4571075] Merge tag
>>> 'pinctrl-v4.13-1' of
>>> git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
>>> git bisect bad ac7b75966c9c86426b55fe1c50ae148aa4571075
>>> # good: [e24dd9ee5399747b71c1d982a484fc7601795f31] Merge branch
>>> 'next' of
>>> git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
>>> git bisect good e24dd9ee5399747b71c1d982a484fc7601795f31
>>> # bad: [9cc9a5cb176ccb4f2cda5ac34da5a659926f125f] datapath: Avoid
>>> using stack larger than 1024.
>>> git bisect bad 9cc9a5cb176ccb4f2cda5ac34da5a659926f125f
>>> # good: [073cf9e20c333ab29744717a23f9e43ec7512a20] Merge branch
>>> 'udp-reduce-cache-pressure'
>>> git bisect good 073cf9e20c333ab29744717a23f9e43ec7512a20
>>> # bad: [8abd5599a520e9f188a750f1bde9dde5fb856230] Merge branch
>>> 's390-net-updates-part-2'
>>> git bisect bad 8abd5599a520e9f188a750f1bde9dde5fb856230
>>> # good: [2fae5d0e647c6470d206e72b5fc24972bb900f70] Merge branch
>>> 'bpf-ctx-narrow'
>>> git bisect good 2fae5d0e647c6470d206e72b5fc24972bb900f70
>>> # good: [41500c3e2a19ffcf40a7158fce1774de08e26ba2] rds: tcp: remove
>>> cp_outgoing
>>> git bisect good 41500c3e2a19ffcf40a7158fce1774de08e26ba2
>>> # bad: [8917a777be3ba566377be05117f71b93a5fd909d] tcp: md5: add
>>> TCP_MD5SIG_EXT socket option to set a key address prefix
>>> git bisect bad 8917a777be3ba566377be05117f71b93a5fd909d
>>> # good: [4a6ce2b6f2ecabbddcfe47e7cf61dd0f00b10e36] net: introduce a
>>> new function dst_dev_put()
>>>
>>> And currently have this running for about 4 hours without problems.
>>>
>>>
>>>
>>> git bisect good 4a6ce2b6f2ecabbddcfe47e7cf61dd0f00b10e36
>>> # bad: [a4c2fd7f78915a0d7c5275e7612e7793157a01f2] net: remove
>>> DST_NOCACHE flag
>>>
>>> Here for sure - panic
>>>
>>> git bisect bad a4c2fd7f78915a0d7c5275e7612e7793157a01f2
>>> # bad: [ad65a2f05695aced349e308193c6e2a6b1d87112] ipv6: call
>>> dst_hold_safe() properly
>>> git bisect bad ad65a2f05695aced349e308193c6e2a6b1d87112
>>> # good: [9df16efadd2a8a82731dc76ff656c771e261827f] ipv4: call
>>> dst_hold_safe() properly
>>> git bisect good 9df16efadd2a8a82731dc76ff656c771e261827f
>>> # bad: [1cfb71eeb12047bcdbd3e6730ffed66e810a0855] ipv6: take
>>> dst->__refcnt for insertion into fib6 tree
>>>
>>> im not 100% sure tor last two
>>> Will test them again starting from
>>> [95c47f9cf5e028d1ae77dc6c767c1edc8a18025b] ipv4: call dst_dev_put()
>>> properly
>>>
>>>
>>> git bisect bad 1cfb71eeb12047bcdbd3e6730ffed66e810a0855
>>> # bad: [b838d5e1c5b6e57b10ec8af2268824041e3ea911] ipv4: mark
>>> DST_NOGC and remove the operation of dst_free()
>>>
>>>
>>>
>>> git bisect bad b838d5e1c5b6e57b10ec8af2268824041e3ea911
>>> # first bad commit: [b838d5e1c5b6e57b10ec8af2268824041e3ea911] ipv4:
>>> mark DST_NOGC and remove the operation of dst_free()
>>>
>>>
>>>
>> What i can say more
>> I can reproduce this on any server with similar configuration
>> the difference can be teamd instead of bonding
>> ixgbe or i40e and mlx5
>> Same problems
>>
>> vlans - more or less prefixes learned from bgp -> zebra -> netlink ->
>> kernel
>> But normally in lab when using only plain routing no bgpd and about
>> 128 vlans - with 128 routes - cant reproduce this - this apperas only
>> with bgp - minimum where i can reproduce this was about 130k prefixes
>> with about 286 nexthops
>>
>>
>>
>>
> bisected again and same result:
> b838d5e1c5b6e57b10ec8af2268824041e3ea911 is the first bad commit
> commit b838d5e1c5b6e57b10ec8af2268824041e3ea911
> Author: Wei Wang <weiwan@google.com>
> Date: Sat Jun 17 10:42:32 2017 -0700
>
> ipv4: mark DST_NOGC and remove the operation of dst_free()
>
> With the previous preparation patches, we are ready to get rid of the
> dst gc operation in ipv4 code and release dst based on refcnt only.
> So this patch adds DST_NOGC flag for all IPv4 dst and remove the
> calls
> to dst_free().
> At this point, all dst created in ipv4 code do not use the dst gc
> anymore and will be destroyed at the point when refcnt drops to 0.
>
> Signed-off-by: Wei Wang <weiwan@google.com>
> Acked-by: Martin KaFai Lau <kafai@fb.com>
> Signed-off-by: David S. Miller <davem@davemloft.net>
>
> :040000 040000 9b7e7fb641de6531fc7887473ca47ef7cb6a11da
> 831a73b71d3df1755f3e24c0d3c86d7a93fd55e2 M net
>
> Will add now version 2 of patch from Eric and we will see
>
>
after adding patch
perf top catch
PerfTop: 77159 irqs/sec kernel:99.7% exact: 0.0% [4000Hz
cycles], (all, 40 CPUs)
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
60.95% [kernel] [k] dev_put.part.6
4.00% [kernel] [k] ixgbe_poll
3.63% [kernel] [k] irq_entries_start
1.22% [kernel] [k] fib_table_lookup
1.15% [kernel] [k] do_raw_spin_lock
1.05% [kernel] [k] ixgbe_xmit_frame_ring
1.04% [kernel] [k] lookup
0.87% [kernel] [k] eth_type_trans
no panic on console - rebooting to check logs
^ permalink raw reply
* Re: [PATCH v4 4/4] samples/bpf: Add documentation on cross compilation
From: Daniel Borkmann @ 2017-09-20 21:25 UTC (permalink / raw)
To: Joel Fernandes, linux-kernel
Cc: netdev, alison, juri.lelli, fengc, davem, ast, kernel-team
In-Reply-To: <20170920161159.25747-4-joelaf@google.com>
On 09/20/2017 06:11 PM, Joel Fernandes wrote:
> Acked-by: Alexei Starovoitov <ast@kernel.org>
> Signed-off-by: Joel Fernandes <joelaf@google.com>
(Minor typo pointed out by Randy, but rest looks fine.)
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
^ permalink raw reply
* Re: [Patch v3 2/3] ipv4: Namespaceify tcp_fastopen_key knob
From: David Miller @ 2017-09-20 21:25 UTC (permalink / raw)
To: yanhaishuang; +Cc: kuznet, edumazet, weiwan, lucab, netdev, linux-kernel
In-Reply-To: <1505813896-12121-2-git-send-email-yanhaishuang@cmss.chinamobile.com>
From: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Date: Tue, 19 Sep 2017 17:38:15 +0800
> @@ -128,6 +130,8 @@ struct netns_ipv4 {
> struct inet_timewait_death_row tcp_death_row;
> int sysctl_max_syn_backlog;
> int sysctl_tcp_fastopen;
> + struct tcp_fastopen_context __rcu *tcp_fastopen_ctx;
> + spinlock_t tcp_fastopen_ctx_lock;
>
> #ifdef CONFIG_NET_L3_MASTER_DEV
> int sysctl_udp_l3mdev_accept;
Where are you releasing this context during netns teardown?
I think this is a leak.
^ permalink raw reply
* Re: [PATCH v4 3/4] samples/bpf: Fix pt_regs issues when cross-compiling
From: Daniel Borkmann @ 2017-09-20 21:25 UTC (permalink / raw)
To: Joel Fernandes, linux-kernel
Cc: netdev, alison, juri.lelli, fengc, davem, ast, kernel-team
In-Reply-To: <20170920161159.25747-3-joelaf@google.com>
On 09/20/2017 06:11 PM, Joel Fernandes wrote:
> BPF samples fail to build when cross-compiling for ARM64 because of incorrect
> pt_regs param selection. This is because clang defines __x86_64__ and
> bpf_headers thinks we're building for x86. Since clang is building for the BPF
> target, it shouldn't make assumptions about what target the BPF program is
> going to run on. To fix this, lets pass ARCH so the header knows which target
> the BPF program is being compiled for and can use the correct pt_regs code.
>
> Acked-by: Alexei Starovoitov <ast@kernel.org>
> Signed-off-by: Joel Fernandes <joelaf@google.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
^ permalink raw reply
* Re: [PATCH v4 2/4] samples/bpf: Enable cross compiler support
From: Daniel Borkmann @ 2017-09-20 21:24 UTC (permalink / raw)
To: Joel Fernandes, linux-kernel
Cc: netdev, alison, juri.lelli, fengc, davem, ast, kernel-team
In-Reply-To: <20170920161159.25747-2-joelaf@google.com>
On 09/20/2017 06:11 PM, Joel Fernandes wrote:
> When cross compiling, bpf samples use HOSTCC for compiling the non-BPF part of
> the sample, however what we really want is to use the cross compiler to build
> for the cross target since that is what will load and run the BPF sample.
> Detect this and compile samples correctly.
>
> Acked-by: Alexei Starovoitov <ast@kernel.org>
> Signed-off-by: Joel Fernandes <joelaf@google.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
^ permalink raw reply
* Re: [PATCH v4 1/4] samples/bpf: Use getppid instead of getpgrp for array map stress
From: Daniel Borkmann @ 2017-09-20 21:24 UTC (permalink / raw)
To: Joel Fernandes, linux-kernel
Cc: netdev, alison, juri.lelli, fengc, davem, ast, kernel-team
In-Reply-To: <20170920161159.25747-1-joelaf@google.com>
On 09/20/2017 06:11 PM, Joel Fernandes wrote:
> When cross-compiling the bpf sample map_perf_test for aarch64, I find that
> __NR_getpgrp is undefined. This causes build errors. This syscall is deprecated
> and requires defining __ARCH_WANT_SYSCALL_DEPRECATED. To avoid having to define
> that, just use a different syscall (getppid) for the array map stress test.
>
> Acked-by: Alexei Starovoitov <ast@kernel.org>
> Signed-off-by: Joel Fernandes <joelaf@google.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
^ permalink raw reply
* Re: [Patch v3 1/3] ipv4: Namespaceify tcp_fastopen knob
From: David Miller @ 2017-09-20 21:22 UTC (permalink / raw)
To: yanhaishuang; +Cc: kuznet, edumazet, weiwan, lucab, netdev, linux-kernel
In-Reply-To: <1505813896-12121-1-git-send-email-yanhaishuang@cmss.chinamobile.com>
From: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Date: Tue, 19 Sep 2017 17:38:14 +0800
> - if ((sysctl_tcp_fastopen & TFO_SERVER_WO_SOCKOPT1) &&
> - (sysctl_tcp_fastopen & TFO_SERVER_ENABLE) &&
> + tcp_fastopen = sock_net(sk)->ipv4.sysctl_tcp_fastopen;
^^
Please change that to one space.
And also please provide an appropriate "[PATCH vX 0/3] " header
posting when you respin this series.
> @@ -282,18 +280,19 @@ struct sock *tcp_try_fastopen(struct sock *sk, struct sk_buff *skb,
> struct tcp_fastopen_cookie valid_foc = { .len = -1 };
> bool syn_data = TCP_SKB_CB(skb)->end_seq != TCP_SKB_CB(skb)->seq + 1;
> struct sock *child;
> + int tcp_fastopen = sock_net(sk)->ipv4.sysctl_tcp_fastopen;
Please order local variables from longest to shortest line (aka. reverse
christmas tree format).
^ permalink raw reply
* Re: [PATCH v3 14/31] vxfs: Define usercopy region in vxfs_inode slab cache
From: Kees Cook @ 2017-09-20 21:21 UTC (permalink / raw)
To: Christoph Hellwig
Cc: LKML, David Windsor, linux-fsdevel@vger.kernel.org,
Network Development, Linux-MM,
kernel-hardening@lists.openwall.com
In-Reply-To: <20170920205642.GA20023@infradead.org>
On Wed, Sep 20, 2017 at 1:56 PM, Christoph Hellwig <hch@infradead.org> wrote:
> Hi Kees,
>
> I've only got this single email from you, which on it's own doesn't
> compile and seems to be part of a 31 patch series.
>
> So as-is NAK, doesn't work.
>
> Please make sure to always send every patch in a series to every
> developer you want to include.
This is why I included several other lists on the full CC (am I
unlucky enough to have you not subscribed to any of them?). Adding a
CC for everyone can result in a huge CC list, especially for the
forth-coming 300-patch timer_list series. ;)
Do you want me to resend the full series to you, or would you prefer
something else like a patchwork bundle? (I'll explicitly add you to CC
for any future versions, though.)
-Kees
--
Kees Cook
Pixel Security
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply
* Re: ipv4 ID calculation
From: Stephen Hemminger @ 2017-09-20 21:15 UTC (permalink / raw)
To: Harsha Chenji; +Cc: netdev
In-Reply-To: <CAMB9WxJpwdZ2gFygTdWZmpLoCG0A79ReCbts-phpT76JfTPHsQ@mail.gmail.com>
On Mon, 18 Sep 2017 20:43:05 -0400
Harsha Chenji <cjkernel@gmail.com> wrote:
> Hi all,
>
> Where is the ID field of the IPv4 header created when the DF flag is
> set? I am looking at ip_build_and_send_pkt. The code seems to have
> changed in 4.4-rc1:
>
> if (ip_dont_fragment(sk, &rt->dst)) {
> iph->frag_off = htons(IP_DF);
> iph->id = 0;
> } else {
> iph->frag_off = 0;
> __ip_select_ident(net, iph, 1);
> }
>
> old code (executed irrespective of DF or not):
>
> ip_select_ident(sock_net(sk), skb, sk);
>
> The code in Stevens is basically iph->id = htons(ip_ident++) and now
> it seems to be calculated based on a hash + lookup table.
>
> So where is the id of 0 overwritten when DF is set? Didn't find any
> info in the docs.
IP id doesn't matter if Dont Fragment bit is set. The IP id is used
by receiver (and firewalls) to coalesce fragments by the ID. If DF
is set no system in the path is supposed to fragment. The idea is
to reduce the number of possible cases of fragment collisions.
> P.S. - is this the right mailing list for these kind of questions?
Sort of, the list is more about technical discussions and patches.
You could find more info by using git blame to find the commit that
introduced the change, then read the log for that.
^ permalink raw reply
* Re: [PATCH net-next] net: dsa: lan9303: Add adjust_link() method
From: David Miller @ 2017-09-20 21:14 UTC (permalink / raw)
To: privat; +Cc: andrew, vivien.didelot, f.fainelli, netdev, linux-kernel
In-Reply-To: <20170919080924.11144-1-privat@egil-hjelmeland.no>
From: Egil Hjelmeland <privat@egil-hjelmeland.no>
Date: Tue, 19 Sep 2017 10:09:24 +0200
> Make the driver react to device tree "fixed-link" declaration on CPU port.
>
> - turn off autonegotiation
> - force speed 10 or 100 mb/s
> - force duplex mode
>
> Signed-off-by: Egil Hjelmeland <privat@egil-hjelmeland.no>
Applied, thank you.
^ permalink raw reply
* Re: Latest net-next from GIT panic
From: Paweł Staszewski @ 2017-09-20 21:10 UTC (permalink / raw)
To: Cong Wang, Eric Dumazet
Cc: Wei Wang, Linux Kernel Network Developers, Eric Dumazet
In-Reply-To: <548a893c-8a8f-3b07-00f7-d75de1d777ad@itcare.pl>
W dniu 2017-09-20 o 21:23, Paweł Staszewski pisze:
>
>
> W dniu 2017-09-20 o 21:13, Paweł Staszewski pisze:
>>
>>
>> W dniu 2017-09-20 o 20:36, Cong Wang pisze:
>>> On Wed, Sep 20, 2017 at 11:30 AM, Eric Dumazet
>>> <eric.dumazet@gmail.com> wrote:
>>>> On Wed, 2017-09-20 at 11:22 -0700, Cong Wang wrote:
>>>>> but dmesg at this time shows nothing about interfaces or flaps.
>>>>>
>>>>> This is very odd.
>>>>>
>>>>> We only free netdevice in free_netdev() and it is only called when
>>>>> we unregister a netdevice. Otherwise pcpu_refcnt is impossible
>>>>> to be NULL.
>>>> If there is a missing dev_hold() or one dev_put() in excess,
>>>> this would allow the netdev to be freed too soon.
>>>>
>>>> -> Use after free.
>>>> memory holding netdev could be reallocated-cleared by some other
>>>> kernel
>>>> user.
>>>>
>>> Sure, but only unregister could trigger a free. If there is no
>>> unregister,
>>> like what Pawel claims, then there is no free, the refcnt just goes to
>>> 0 but the memory is still there.
>>>
>> About possible mistake from my side with bisect - i can judge too
>> early that some bisect was good
>> the road was:
>> git bisect start
>> # bad: [ac7b75966c9c86426b55fe1c50ae148aa4571075] Merge tag
>> 'pinctrl-v4.13-1' of
>> git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
>> git bisect bad ac7b75966c9c86426b55fe1c50ae148aa4571075
>> # good: [e24dd9ee5399747b71c1d982a484fc7601795f31] Merge branch
>> 'next' of
>> git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
>> git bisect good e24dd9ee5399747b71c1d982a484fc7601795f31
>> # bad: [9cc9a5cb176ccb4f2cda5ac34da5a659926f125f] datapath: Avoid
>> using stack larger than 1024.
>> git bisect bad 9cc9a5cb176ccb4f2cda5ac34da5a659926f125f
>> # good: [073cf9e20c333ab29744717a23f9e43ec7512a20] Merge branch
>> 'udp-reduce-cache-pressure'
>> git bisect good 073cf9e20c333ab29744717a23f9e43ec7512a20
>> # bad: [8abd5599a520e9f188a750f1bde9dde5fb856230] Merge branch
>> 's390-net-updates-part-2'
>> git bisect bad 8abd5599a520e9f188a750f1bde9dde5fb856230
>> # good: [2fae5d0e647c6470d206e72b5fc24972bb900f70] Merge branch
>> 'bpf-ctx-narrow'
>> git bisect good 2fae5d0e647c6470d206e72b5fc24972bb900f70
>> # good: [41500c3e2a19ffcf40a7158fce1774de08e26ba2] rds: tcp: remove
>> cp_outgoing
>> git bisect good 41500c3e2a19ffcf40a7158fce1774de08e26ba2
>> # bad: [8917a777be3ba566377be05117f71b93a5fd909d] tcp: md5: add
>> TCP_MD5SIG_EXT socket option to set a key address prefix
>> git bisect bad 8917a777be3ba566377be05117f71b93a5fd909d
>> # good: [4a6ce2b6f2ecabbddcfe47e7cf61dd0f00b10e36] net: introduce a
>> new function dst_dev_put()
>>
>> And currently have this running for about 4 hours without problems.
>>
>>
>>
>> git bisect good 4a6ce2b6f2ecabbddcfe47e7cf61dd0f00b10e36
>> # bad: [a4c2fd7f78915a0d7c5275e7612e7793157a01f2] net: remove
>> DST_NOCACHE flag
>>
>> Here for sure - panic
>>
>> git bisect bad a4c2fd7f78915a0d7c5275e7612e7793157a01f2
>> # bad: [ad65a2f05695aced349e308193c6e2a6b1d87112] ipv6: call
>> dst_hold_safe() properly
>> git bisect bad ad65a2f05695aced349e308193c6e2a6b1d87112
>> # good: [9df16efadd2a8a82731dc76ff656c771e261827f] ipv4: call
>> dst_hold_safe() properly
>> git bisect good 9df16efadd2a8a82731dc76ff656c771e261827f
>> # bad: [1cfb71eeb12047bcdbd3e6730ffed66e810a0855] ipv6: take
>> dst->__refcnt for insertion into fib6 tree
>>
>> im not 100% sure tor last two
>> Will test them again starting from
>> [95c47f9cf5e028d1ae77dc6c767c1edc8a18025b] ipv4: call dst_dev_put()
>> properly
>>
>>
>> git bisect bad 1cfb71eeb12047bcdbd3e6730ffed66e810a0855
>> # bad: [b838d5e1c5b6e57b10ec8af2268824041e3ea911] ipv4: mark DST_NOGC
>> and remove the operation of dst_free()
>>
>>
>>
>> git bisect bad b838d5e1c5b6e57b10ec8af2268824041e3ea911
>> # first bad commit: [b838d5e1c5b6e57b10ec8af2268824041e3ea911] ipv4:
>> mark DST_NOGC and remove the operation of dst_free()
>>
>>
>>
> What i can say more
> I can reproduce this on any server with similar configuration
> the difference can be teamd instead of bonding
> ixgbe or i40e and mlx5
> Same problems
>
> vlans - more or less prefixes learned from bgp -> zebra -> netlink ->
> kernel
> But normally in lab when using only plain routing no bgpd and about
> 128 vlans - with 128 routes - cant reproduce this - this apperas only
> with bgp - minimum where i can reproduce this was about 130k prefixes
> with about 286 nexthops
>
>
>
>
bisected again and same result:
b838d5e1c5b6e57b10ec8af2268824041e3ea911 is the first bad commit
commit b838d5e1c5b6e57b10ec8af2268824041e3ea911
Author: Wei Wang <weiwan@google.com>
Date: Sat Jun 17 10:42:32 2017 -0700
ipv4: mark DST_NOGC and remove the operation of dst_free()
With the previous preparation patches, we are ready to get rid of the
dst gc operation in ipv4 code and release dst based on refcnt only.
So this patch adds DST_NOGC flag for all IPv4 dst and remove the calls
to dst_free().
At this point, all dst created in ipv4 code do not use the dst gc
anymore and will be destroyed at the point when refcnt drops to 0.
Signed-off-by: Wei Wang <weiwan@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
:040000 040000 9b7e7fb641de6531fc7887473ca47ef7cb6a11da
831a73b71d3df1755f3e24c0d3c86d7a93fd55e2 M net
Will add now version 2 of patch from Eric and we will see
^ permalink raw reply
* Re: [PATCH net] bpf: one perf event close won't free bpf program attached by another perf event
From: David Miller @ 2017-09-20 21:12 UTC (permalink / raw)
To: yhs; +Cc: peterz, rostedt, ast, daniel, netdev, kernel-team
In-Reply-To: <20170918233836.1817062-1-yhs@fb.com>
From: Yonghong Song <yhs@fb.com>
Date: Mon, 18 Sep 2017 16:38:36 -0700
> This patch fixes a bug exhibited by the following scenario:
> 1. fd1 = perf_event_open with attr.config = ID1
> 2. attach bpf program prog1 to fd1
> 3. fd2 = perf_event_open with attr.config = ID1
> <this will be successful>
> 4. user program closes fd2 and prog1 is detached from the tracepoint.
> 5. user program with fd1 does not work properly as tracepoint
> no output any more.
>
> The issue happens at step 4. Multiple perf_event_open can be called
> successfully, but only one bpf prog pointer in the tp_event. In the
> current logic, any fd release for the same tp_event will free
> the tp_event->prog.
>
> The fix is to free tp_event->prog only when the closing fd
> corresponds to the one which registered the program.
>
> Signed-off-by: Yonghong Song <yhs@fb.com>
I've applied this and queued it up for -stable as it looks good
to me and 2 days is enough time for waiting for any other reviews.
Thanks.
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox