Netdev List
* Re: [PATCH v3 07/29] x86: bpf_jit, use ENTRY+ENDPROC
From: David Miller @ 2017-04-24 14:41 UTC (permalink / raw)
  To: jslaby
  Cc: alexei.starovoitov, mingo, tglx, hpa, x86, jpoimboe, linux-kernel,
	netdev, daniel, edumazet
In-Reply-To: <697947f4-0a2c-1480-0995-9919556dc020@suse.cz>

From: Jiri Slaby <jslaby@suse.cz>
Date: Mon, 24 Apr 2017 08:45:11 +0200

> On 04/21/2017, 09:32 PM, Alexei Starovoitov wrote:
>> On Fri, Apr 21, 2017 at 04:12:43PM +0200, Jiri Slaby wrote:
>>> Do not use a custom macro FUNC for starts of the global functions, use
>>> ENTRY instead.
>>>
>>> And while at it, annotate also ends of the functions by ENDPROC.
>>>
>>> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
>>> Cc: "David S. Miller" <davem@davemloft.net>
>>> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
>>> Cc: James Morris <jmorris@namei.org>
>>> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
>>> Cc: Patrick McHardy <kaber@trash.net>
>>> Cc: Thomas Gleixner <tglx@linutronix.de>
>>> Cc: Ingo Molnar <mingo@redhat.com>
>>> Cc: "H. Peter Anvin" <hpa@zytor.com>
>>> Cc: x86@kernel.org
>>> Cc: netdev@vger.kernel.org
>>> ---
>>>  arch/x86/net/bpf_jit.S | 32 ++++++++++++++++++--------------
>>>  1 file changed, 18 insertions(+), 14 deletions(-)
>>>
>>> diff --git a/arch/x86/net/bpf_jit.S b/arch/x86/net/bpf_jit.S
>>> index f2a7faf4706e..762c29fb8832 100644
>>> --- a/arch/x86/net/bpf_jit.S
>>> +++ b/arch/x86/net/bpf_jit.S
>>> @@ -23,16 +23,12 @@
>>>  	32 /* space for rbx,r13,r14,r15 */ + \
>>>  	8 /* space for skb_copy_bits */)
>>>  
>>> -#define FUNC(name) \
>>> -	.globl name; \
>>> -	.type name, @function; \
>>> -	name:
>>> -
>>> -FUNC(sk_load_word)
>>> +ENTRY(sk_load_word)
>>>  	test	%esi,%esi
>>>  	js	bpf_slow_path_word_neg
>>> +ENDPROC(sk_load_word)
>> 
>> this doesn't look right.
>> It will add alignment nops in critical paths of these pseudo functions.
>> I'm also not sure whether it will still work afterwards.
>> Was it tested?
>> I'd prefer if this code were kept as-is.
> 
> It cannot stay as-is simply because we want to know where the functions
> end to inject debuginfo properly. The code above does not warrant
> any exception.

I totally and completely disagree.

> Executing a nop takes little time and having externally-callable functions
> aligned can actually help performance (no, I haven't measured nor tested
> the code). But sure, the tool is generic, so I can introduce
> local macros to avoid alignments in the functions:

Not for this case, it's a bunch of entry points all packed together
intentionally so that SKB accesses of different access sizes (which is
almost always the case) from BPF programs use the smallest amount of
I-cache possible.


* Re: [PATCH net-next v5 1/2] net sched actions: dump more than TCA_ACT_MAX_PRIO actions per batch
From: Jamal Hadi Salim @ 2017-04-24 14:42 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: Simon Horman, Jiri Pirko, davem, xiyou.wangcong, eric.dumazet,
	netdev, Tom Herbert
In-Reply-To: <20170424142058.GA26625@salvia>

On 17-04-24 10:20 AM, Pablo Neira Ayuso wrote:
> On Mon, Apr 24, 2017 at 08:49:00AM -0400, Jamal Hadi Salim wrote:

>>
>> I am fine with the counter-Postel view of having the kernel
>> validate that appropriate bits are set as long as we don't make
>> user space start learning how to play acrobatics.
>
> Jamal, what performance concern do you have in building this error
> message? TLVs are the most flexible way. And this is the error path, so we
> should build this message rarely, only if the user sends us something
> incorrect, why bother...

I have a feeling we are referring to two different things.
Which error message? Are you talking about extended ACK?
I have no problem with that.

Let me summarize the discussion for you.

My concern was the double request needed now
which was unneeded before.

Before: You send a msg and say the kernel didn't understand part of it.
The kernel ignores what it didn't understand and does the things
you asked it to, i.e. the part of the Postel principle which says
"Be liberal in what you accept from others".
But the new concern is user space not abiding by the other
half of the Postel principle, "Be conservative in what you send".
It may set some random flags which the kernel doesn't understand.

One idea is to have the kernel reject the message outright any time
it sees such flags. I am sure such a rejection could be
conveyed back to the user. Then the user sends the
correct one back.

The challenge I have is enforcing this trial-by-fire
approach on all user space apps. It is a large change.

My suggestion is for the user to set a flag to request the
old behavior of sending only one message.
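
The two behaviours discussed above can be sketched in a few lines of C.
This is only an illustration under stated assumptions: KNOWN_FLAGS,
FLAG_LENIENT and act_check_flags() are hypothetical names, not part of
the netlink or tc API, and the opt-in bit is just one possible reading
of the suggestion.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical flag layout, for illustration only. */
#define KNOWN_FLAGS   0x00000003u  /* bits this "kernel" understands */
#define FLAG_LENIENT  0x80000000u  /* sender asks for the old behaviour */

/*
 * Return the flags the kernel will act on, or -EINVAL when the request
 * carries unknown bits and the sender did not opt in to the old
 * "ignore what you don't understand" behaviour.
 */
static int act_check_flags(uint32_t flags)
{
	uint32_t unknown = flags & ~(KNOWN_FLAGS | FLAG_LENIENT);

	if (unknown && !(flags & FLAG_LENIENT))
		return -EINVAL;            /* strict: reject, user must resend */

	return (int)(flags & KNOWN_FLAGS); /* lenient: unknown bits are dropped */
}
```

In the strict case the sender needs a second request after the
rejection; with the opt-in bit set, a single message is enough.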

cheers,
jamal


* Re: [PATCH net-next 0/3] Misc BPF cleanup
From: Alexei Starovoitov @ 2017-04-24 14:45 UTC (permalink / raw)
  To: Alexander Alemayhu, netdev; +Cc: daniel
In-Reply-To: <20170424133108.31595-1-alexander@alemayhu.com>

On 4/24/17 6:31 AM, Alexander Alemayhu wrote:
> Hei,
>
> while looking into making the Makefile in samples/bpf better handle O= I saw
> several warnings when running `make clean && make samples/bpf/`. This series
> reduces those warnings.

Cleanup looks good to me.
Acked-by: Alexei Starovoitov <ast@kernel.org>


* Re: [PATCH net v2 1/3] net: hns: support deferred probe when can not obtain irq
From: Matthias Brugger @ 2017-04-24 14:47 UTC (permalink / raw)
  To: lipeng (Y), Yankejian, davem, salil.mehta, yisen.zhuang,
	huangdaode, zhouhuiru
  Cc: netdev, charles.chenxin, linuxarm
In-Reply-To: <a64b254a-d5b4-287e-1f76-ec6f8189609a@huawei.com>



On 24/04/17 13:43, lipeng (Y) wrote:
>
>
> On 2017/4/24 18:28, Matthias Brugger wrote:
>> On 21/04/17 09:44, Yankejian wrote:
>>> From: lipeng <lipeng321@huawei.com>
>>>
>>> In the hip06 and hip07 SoCs, the interrupt lines from the
>>> DSAF controllers are connected to mbigen hw module.
>>> The mbigen module is probed with module_init, and, as such,
>>> is not guaranteed to probe before the HNS driver. So we need
>>> to support deferred probe.
>>>
>>> We check for probe deferral in the hw layer probe, so that we do not
>>> probe into the main layer and memories, etc., only to learn later
>>> that we need to defer the probe.
>>>
>>
>> Why? This looks like a hack.
>> From what I see, we can handle EPROBE_DEFER easily inside hns_ppe_init
>> checking the return value of hns_rcb_get_cfg. Like you do in 2/3 of
>> this series.
>>
>> Regards,
>> Matthias
> Hi Matthias,
>
> mdio && phy is not a necessary condition; a port can work well as port
> + SFP (without mdio && phy).
>
> BUT the irq is a necessary condition: a port cannot work without an irq.
>
> So I check the IRQ first and do not probe dsaf if the irq cannot be obtained
> (1/3 of this series), and check mdio only when there is a phy (2/3 of this series).
>
> And thanks for your review.

I think I didn't explain myself well enough.
I was suggesting the following (not even compile tested):

diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c
index eba406bea52f..be38d47bc399 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c
@@ -510,7 +510,9 @@ int hns_ppe_init(struct dsaf_device *dsaf_dev)

                 hns_ppe_get_cfg(dsaf_dev->ppe_common[i]);

-               hns_rcb_get_cfg(dsaf_dev->rcb_common[i]);
+               ret = hns_rcb_get_cfg(dsaf_dev->rcb_common[i]);
+               if (ret < 0)
+                       goto get_cfg_fail;
         }

         for (i = 0; i < HNS_PPE_COM_NUM; i++)
diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c
index c20a0f4f8f02..c7e801d0c3b7 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c
@@ -492,7 +492,7 @@ static int hns_rcb_get_base_irq_idx(struct rcb_common_cb *rcb_common)
   *hns_rcb_get_cfg - get rcb config
   *@rcb_common: rcb common device
   */
-void hns_rcb_get_cfg(struct rcb_common_cb *rcb_common)
+int hns_rcb_get_cfg(struct rcb_common_cb *rcb_common)
  {
         struct ring_pair_cb *ring_pair_cb;
         u32 i;
@@ -517,10 +517,18 @@ void hns_rcb_get_cfg(struct rcb_common_cb *rcb_common)
                 ring_pair_cb->virq[HNS_RCB_IRQ_IDX_RX] =
                 is_ver1 ? platform_get_irq(pdev, base_irq_idx + i * 2 + 1) :
                           platform_get_irq(pdev, base_irq_idx + i * 3);
+
+               if ((ring_pair_cb->virq[HNS_RCB_IRQ_IDX_TX] == -EPROBE_DEFER) ||
+                   (ring_pair_cb->virq[HNS_RCB_IRQ_IDX_RX] == -EPROBE_DEFER)) {
+                       return -EPROBE_DEFER;
+               }
+
                 ring_pair_cb->q.phy_base =
                         RCB_COMM_BASE_TO_RING_BASE(rcb_common->phy_base, i);
                 hns_rcb_ring_pair_get_cfg(ring_pair_cb);
         }
+
+       return 0;
  }

  /**


Regards,
Matthias
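
The pattern suggested in the diff above (let the config helper return
an error and have the caller check it) can be sketched as stand-alone
C. This is a user-space illustration, not driver code: fake_get_irq(),
rcb_get_cfg(), ppe_init() and MAX_RINGS are hypothetical stand-ins, and
only the numeric value of EPROBE_DEFER is taken from the kernel.

```c
#include <assert.h>
#include <stddef.h>

#define EPROBE_DEFER 517   /* same value as in include/linux/errno.h */
#define MAX_RINGS    8     /* hypothetical bound for this sketch */

/* Hypothetical stand-in for platform_get_irq(): returns a canned result. */
static int fake_get_irq(const int *irqs, size_t idx)
{
	return irqs[idx];
}

/* Mirrors the idea of hns_rcb_get_cfg() returning int instead of void. */
static int rcb_get_cfg(const int *irqs, size_t nrings, int *virq)
{
	for (size_t i = 0; i < nrings; i++) {
		int tx = fake_get_irq(irqs, 2 * i);
		int rx = fake_get_irq(irqs, 2 * i + 1);

		/* The irq provider has not probed yet: ask to be retried. */
		if (tx == -EPROBE_DEFER || rx == -EPROBE_DEFER)
			return -EPROBE_DEFER;

		virq[2 * i] = tx;
		virq[2 * i + 1] = rx;
	}
	return 0;
}

/* Mirrors the caller: check the return value and bail out early. */
static int ppe_init(const int *irqs, size_t nrings)
{
	int virq[2 * MAX_RINGS];
	int ret;

	assert(nrings <= MAX_RINGS);
	ret = rcb_get_cfg(irqs, nrings, virq);
	if (ret < 0)
		return ret;
	return 0;
}
```

The point of the sketch is only that the deferral is reported where the
irq lookup happens and propagated upwards, rather than being probed for
separately at the top of the probe function.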

>
> lipeng
>
>>
>>> Signed-off-by: lipeng <lipeng321@huawei.com>
>>> Reviewed-by: Yisen Zhuang <yisen.zhuang@huawei.com>
>>> ---
>>>  drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c | 12 ++++++++++++
>>>  1 file changed, 12 insertions(+)
>>>
>>> diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c
>>> b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c
>>> index 403ea9d..2da5b42 100644
>>> --- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c
>>> +++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c
>>> @@ -2971,6 +2971,18 @@ static int hns_dsaf_probe(struct
>>> platform_device *pdev)
>>>      struct dsaf_device *dsaf_dev;
>>>      int ret;
>>>
>>> +    /*
>>> +     * Check if we should defer the probe before we probe the
>>> +     * dsaf, as it's hard to defer later on.
>>> +     */
>>> +    ret = platform_get_irq(pdev, 0);
>>> +    if (ret < 0) {
>>> +        if (ret != -EPROBE_DEFER)
>>> +            dev_err(&pdev->dev, "Cannot obtain irq\n");
>>> +
>>> +        return ret;
>>> +    }
>>> +
>>>      dsaf_dev = hns_dsaf_alloc_dev(&pdev->dev, sizeof(struct
>>> dsaf_drv_priv));
>>>      if (IS_ERR(dsaf_dev)) {
>>>          ret = PTR_ERR(dsaf_dev);
>>>
>>
>> .
>>
>


* Re: [PATCH v3 07/29] x86: bpf_jit, use ENTRY+ENDPROC
From: Jiri Slaby @ 2017-04-24 14:52 UTC (permalink / raw)
  To: David Miller
  Cc: alexei.starovoitov, mingo, tglx, hpa, x86, jpoimboe, linux-kernel,
	netdev, daniel, edumazet
In-Reply-To: <20170424.104132.950580313142367896.davem@davemloft.net>

On 04/24/2017, 04:41 PM, David Miller wrote:
>> It cannot stay as-is simply because we want to know where the functions
>> end to inject debuginfo properly. The code above does not warrant
>> any exception.
> 
> I totally and completely disagree.

You can disagree as you wish but there is really nothing special about the
bpf code with respect to annotations.

>> Executing a nop takes little time and having externally-callable functions
>> aligned can actually help performance (no, I haven't measured nor tested
>> the code). But sure, the tool is generic, so I can introduce
>> local macros to avoid alignments in the functions:
> 
> Not for this case, it's a bunch of entry points all packed together
> intentionally so that SKB accesses of different access sizes (which is
> almost always the case) from BPF programs use the smallest amount of
> I-cache possible.

And for that reason I suggested the special macros for the code (see the
macros in the e-mail you replied to again). So what problem do you
actually have with the suggested solution?

thanks,
-- 
js
suse labs


* Re: Re: [PATCH net-next 1/4] ixgbe: sparc: rename the ARCH_WANT_RELAX_ORDER to IXGBE_ALLOW_RELAXED_ORDER
From: Will Deacon @ 2017-04-24 14:53 UTC (permalink / raw)
  To: Gabriele Paoloni
  Cc: Amir Ancel, David Laight, davem@davemloft.net, Catalin Marinas,
	Mark Rutland, Robin Murphy, jeffrey.t.kirsher@intel.com,
	alexander.duyck@gmail.com, linux-arm-kernel@lists.infradead.org,
	netdev@vger.kernel.org, Dingtianhong, Linuxarm
In-Reply-To: <EE11001F9E5DDD47B7634E2F8A612F2E2053543C@FRAEML521-MBX.china.huawei.com>

On Wed, Apr 19, 2017 at 02:46:19PM +0000, Gabriele Paoloni wrote:
> > From: Amir Ancel [mailto:amira@mellanox.com]
> > Sent: 18 April 2017 21:18
> > To: David Laight; Gabriele Paoloni; davem@davemloft.net
> > Cc: Catalin Marinas; Will Deacon; Mark Rutland; Robin Murphy;
> > jeffrey.t.kirsher@intel.com; alexander.duyck@gmail.com; linux-arm-
> > kernel@lists.infradead.org; netdev@vger.kernel.org; Dingtianhong;
> > Linuxarm
> > Subject: Re: Re: [PATCH net-next 1/4] ixgbe: sparc: rename the
> > ARCH_WANT_RELAX_ORDER to IXGBE_ALLOW_RELAXED_ORDER
> > 
> > Hi,
> > mlx5 driver is planned to have RO support this year.
> > I believe drivers should be able to query whether the arch supports it
> 
> I guess that here when you say query you mean having a config symbol
> that is set accordingly to the host architecture, right?
> 
> As already said I have looked around a bit and other drivers do not seem
> to enable/disable RO for their EP on the basis of the host architecture.
> So why should mlx5 do it according to the host?
> 
> Also my understanding is that some architectures (like ARM64 for example)
> can have different PCI host controller implementations depending on the
> vendor...therefore maybe it is not appropriate there to have a Kconfig
> symbol selected by the architecture...  

Indeed. We're not able to determine whether or not RO is supported at
compile time, so we'd have to detect this dynamically if we want to support
it for arm64 with a single kernel Image. That means either passing something
through firmware, having the PCI host controller opt-in or something coarse
like a command-line option.

Will


* Re: [PATCH net] bridge: shutdown bridge device before removing it
From: Nikolay Aleksandrov @ 2017-04-24 14:53 UTC (permalink / raw)
  To: Xin Long; +Cc: network dev, bridge@lists.linux-foundation.org, David S. Miller
In-Reply-To: <CADvbK_ckVO19iPnhVLoSKXsRPLcb3kJ_H+fN-UEwaF4T=JY0dQ@mail.gmail.com>

On 24/04/17 17:41, Xin Long wrote:
> On Mon, Apr 24, 2017 at 8:07 PM, Nikolay Aleksandrov
> <nikolay@cumulusnetworks.com> wrote:
>> On 24/04/17 14:01, Nikolay Aleksandrov wrote:
>>> On 24/04/17 10:25, Xin Long wrote:
>>>> While removing a bridge device, if the bridge is still up, a new mdb entry
>>>> still can be added in br_multicast_add_group() after all mdb entries are
>>>> removed in br_multicast_dev_del(). Like the path:
>>>>
>>>>   mld_ifc_timer_expire ->
>>>>     mld_sendpack -> ...
>>>>       br_multicast_rcv ->
>>>>         br_multicast_add_group
>>>>
>>>> The new mp's timer will be set up. If the timer expires after the bridge
>>>> is freed, it may cause use-after-free panic in br_multicast_group_expired.
>>>> This can happen when ip link removes a bridge or when a netns with a
>>>> bridge device inside is destroyed.
>>>>
>>>> As we can see in br_del_bridge, brctl is also supposed to remove a bridge
>>>> device after it's shutdown.
>>>>
>>>> This patch is to call dev_close at the beginning of br_dev_delete so that
>>>> netif_running check in br_multicast_add_group can avoid this issue. But
>>>> to keep consistent with before, it will not remove the IFF_UP check in
>>>> br_del_bridge for brctl.
>>>>
>>>> Reported-by: Jianwen Ji <jiji@redhat.com>
>>>> Signed-off-by: Xin Long <lucien.xin@gmail.com>
>>>> ---
>>>>  net/bridge/br_if.c | 2 ++
>>>>  1 file changed, 2 insertions(+)
>>>>
>>>
>>> +CC bridge maintainers
>>>
>>> I can see how this could happen, could you also provide the traceback ?
>>>
>>> The patch looks good to me, actually I think it fixes another issue with
>>> mcast stats where the percpu pointer can be accessed after it's freed if
>>> an mcast packet can get sent via br->dev after the br_multicast_dev_del() call.
>>> This is definitely stable material, if I'm not mistaken the issue is there since
>>> the introduction of br_dev_delete:
>>> commit e10177abf842
>>> Author: Satish Ashok <sashok@cumulusnetworks.com>
>>> Date:   Wed Jul 15 07:16:51 2015 -0700
>>>
>>>     bridge: multicast: fix handling of temp and perm entries
>>>
>>>
>>>
>>> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
>>>
>>
>> Actually I have a better idea for a fix because dev_close() for a single device is rather heavy.
>> Why don't you move the mdb flush logic into the bridge's ndo_uninit() callback?
>> That should have the same effect and be much faster.
> Yes. But judging by brctl, it seems that all cleanup for the bridge has always
> been meant to happen after it's shut down. I'm not sure if there
> are still other problems caused by this. Maybe it is safer to use dev_close.
> I need to check more to confirm this.
> 

ndo_uninit() is after the device has been stopped, so it is the same as
your fix as I said.

> I also have another question about removing mp->timer.
> As we can see, the timer is now removed with del_timer instead of
> del_timer_sync. What if the timer is running when del_timer is called?
> How can we be sure that br_multicast_group_expired will be done
> before the bridge dev is freed? synchronize_net?
> 

Yeah, I've been thinking about that and the only race is that the timer
might have fired and be waiting for the lock while the mdb is being flushed,
so the del_timer() won't affect it; it will then enter and see
that !netif_running(br->dev). Unfortunately there's a bug because we
cannot guarantee that br->dev still exists at that point.
This is a different bug though.

>>
>> By the way I just noticed that there's also a memory leak - the mdb hash is reallocated
>> and not freed due to the mdb rehash, here's also kmemleak's object:
>>
> yeps, ;-)
> 
>> unreferenced object 0xffff8800540ba800 (size 2048):
>>   comm "softirq", pid 0, jiffies 4520588901 (age 5787.284s)
>>   hex dump (first 32 bytes):
>>     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>>     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>>   backtrace:
>>     [<ffffffff816e2287>] kmemleak_alloc+0x67/0xc0
>>     [<ffffffff81260bea>] __kmalloc+0x1ba/0x3e0
>>     [<ffffffffa05c60ee>] br_mdb_rehash+0x5e/0x340 [bridge]
>>     [<ffffffffa05c74af>] br_multicast_new_group+0x43f/0x6e0 [bridge]
>>     [<ffffffffa05c7aa3>] br_multicast_add_group+0x203/0x260 [bridge]
>>     [<ffffffffa05ca4b5>] br_multicast_rcv+0x945/0x11d0 [bridge]
>>     [<ffffffffa05b6b10>] br_dev_xmit+0x180/0x470 [bridge]
>>     [<ffffffff815c781b>] dev_hard_start_xmit+0xbb/0x3d0
>>     [<ffffffff815c8743>] __dev_queue_xmit+0xb13/0xc10
>>     [<ffffffff815c8850>] dev_queue_xmit+0x10/0x20
>>     [<ffffffffa02f8d7a>] ip6_finish_output2+0x5ca/0xac0 [ipv6]
>>     [<ffffffffa02fbfc6>] ip6_finish_output+0x126/0x2c0 [ipv6]
>>     [<ffffffffa02fc245>] ip6_output+0xe5/0x390 [ipv6]
>>     [<ffffffffa032b92c>] NF_HOOK.constprop.44+0x6c/0x240 [ipv6]
>>     [<ffffffffa032bd16>] mld_sendpack+0x216/0x3e0 [ipv6]
>>     [<ffffffffa032d5eb>] mld_ifc_timer_expire+0x18b/0x2b0 [ipv6]
>>
>>
>>


* Re: macvlan: Fix device ref leak when purging bc_queue
From: Joe.Ghalam @ 2017-04-24 15:01 UTC (permalink / raw)
  To: herbert; +Cc: davem, Clifford.Wichmann, netdev
In-Reply-To: <20170424075606.GA19926@gondor.apana.org.au>

> The only thing that can stop macvlan_process_broadcast from getting
> called is macvlan_port_destroy.  Nothing else can stop the work
> queue, unless of course the work queue mechanism itself is broken.

> So if you're sure macvlan_port_destroy is never even called in
> your case, then you'll need to start debugging the kernel work
> queue mechanism to see why macvlan_process_broadcast is not getting
> called.

I will get your changes reloaded and re-tested without any other debug tools. Hopefully, we'll see success. I will let you know if I see any issues. 
Btw, is your fix committed already? if not, do you know when and where it would be committed?

Thanks,
Joe
 


* net/ipv6: slab-out-of-bounds in ip6_tnl_xmit
From: Andrey Konovalov @ 2017-04-24 15:03 UTC (permalink / raw)
  To: David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML
  Cc: Eric Dumazet, Cong Wang, Dmitry Vyukov, Kostya Serebryany,
	syzkaller

Hi,

I've got the following error report while fuzzing the kernel with syzkaller.

On commit 5a7ad1146caa895ad718a534399e38bd2ba721b7 (4.11-rc8).

Unfortunately it's not reproducible.

The issue might be similar to this one:
https://groups.google.com/forum/#!topic/syzkaller/IDoQHFmrnRI

==================================================================
BUG: KASAN: slab-out-of-bounds in ip6_tnl_xmit+0x25dd/0x28f0
net/ipv6/ip6_tunnel.c:1078 at addr ffff88005dcc5f98
Read of size 16 by task syz-executor7/8076
CPU: 3 PID: 8076 Comm: syz-executor7 Not tainted 4.11.0-rc8+ #266
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:16 [inline]
 dump_stack+0x192/0x22d lib/dump_stack.c:52
 kasan_object_err+0x1c/0x70 mm/kasan/report.c:164
 print_address_description mm/kasan/report.c:202 [inline]
 kasan_report_error mm/kasan/report.c:291 [inline]
 kasan_report+0x252/0x510 mm/kasan/report.c:347
 __asan_report_load_n_noabort+0xf/0x20 mm/kasan/report.c:378
 ip6_tnl_xmit+0x25dd/0x28f0 net/ipv6/ip6_tunnel.c:1078
 ip4ip6_tnl_xmit net/ipv6/ip6_tunnel.c:1268 [inline]
 ip6_tnl_start_xmit+0xc1e/0x1890 net/ipv6/ip6_tunnel.c:1370
 __netdev_start_xmit include/linux/netdevice.h:3980 [inline]
 netdev_start_xmit include/linux/netdevice.h:3989 [inline]
 xmit_one net/core/dev.c:2908 [inline]
 dev_hard_start_xmit+0x213/0x800 net/core/dev.c:2924
 __dev_queue_xmit+0x1abc/0x2580 net/core/dev.c:3391
 dev_queue_xmit+0x17/0x20 net/core/dev.c:3424
 neigh_direct_output+0x15/0x20 net/core/neighbour.c:1349
 neigh_output include/net/neighbour.h:478 [inline]
 ip_finish_output2+0x7cd/0x1020 net/ipv4/ip_output.c:228
 ip_finish_output+0x83d/0xc30 net/ipv4/ip_output.c:316
 NF_HOOK_COND include/linux/netfilter.h:246 [inline]
 ip_output+0x1e7/0x5d0 net/ipv4/ip_output.c:404
 dst_output include/net/dst.h:486 [inline]
 ip_local_out+0x82/0xb0 net/ipv4/ip_output.c:124
 ip_send_skb+0x3c/0xc0 net/ipv4/ip_output.c:1492
 ip_push_pending_frames+0x64/0x80 net/ipv4/ip_output.c:1512
 ping_v4_push_pending_frames net/ipv4/ping.c:653 [inline]
 ping_v4_sendmsg+0x1b35/0x23e0 net/ipv4/ping.c:840
 inet_sendmsg+0x164/0x490 net/ipv4/af_inet.c:762
 sock_sendmsg_nosec net/socket.c:633 [inline]
 sock_sendmsg+0xca/0x110 net/socket.c:643
 SYSC_sendto+0x660/0x810 net/socket.c:1696
 SyS_sendto+0x40/0x50 net/socket.c:1664
 entry_SYSCALL_64_fastpath+0x1a/0xa9
RIP: 0033:0x4458d9
RSP: 002b:00007f853159db58 EFLAGS: 00000282 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 0000000000708000 RCX: 00000000004458d9
RDX: 0000000000000008 RSI: 00000000204f9fe1 RDI: 0000000000000017
RBP: 0000000000003410 R08: 0000000020235000 R09: 0000000000000010
R10: 0000000000000000 R11: 0000000000000282 R12: 00000000006e24d0
R13: 0000000020ef8000 R14: 0000000000001000 R15: 0000000000000003
Object at ffff88005dcc5e20, in cache kmalloc-512 size: 512
Allocated:
PID = 8076
 save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
 save_stack+0x43/0xd0 mm/kasan/kasan.c:513
 set_track mm/kasan/kasan.c:525 [inline]
 kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:616
 __kmalloc+0x7c/0x1c0 mm/slub.c:3745
 kmalloc include/linux/slab.h:495 [inline]
 kzalloc include/linux/slab.h:663 [inline]
 neigh_alloc net/core/neighbour.c:286 [inline]
 __neigh_create+0x386/0x1da0 net/core/neighbour.c:458
 neigh_create include/net/neighbour.h:313 [inline]
 ipv4_neigh_lookup+0x4bb/0x730 net/ipv4/route.c:463
 dst_neigh_lookup include/net/dst.h:447 [inline]
 ip6_tnl_xmit+0x1598/0x28f0 net/ipv6/ip6_tunnel.c:1067
 ip4ip6_tnl_xmit net/ipv6/ip6_tunnel.c:1268 [inline]
 ip6_tnl_start_xmit+0xc1e/0x1890 net/ipv6/ip6_tunnel.c:1370
 __netdev_start_xmit include/linux/netdevice.h:3980 [inline]
 netdev_start_xmit include/linux/netdevice.h:3989 [inline]
 xmit_one net/core/dev.c:2908 [inline]
 dev_hard_start_xmit+0x213/0x800 net/core/dev.c:2924
 __dev_queue_xmit+0x1abc/0x2580 net/core/dev.c:3391
 dev_queue_xmit+0x17/0x20 net/core/dev.c:3424
 neigh_direct_output+0x15/0x20 net/core/neighbour.c:1349
 neigh_output include/net/neighbour.h:478 [inline]
 ip_finish_output2+0x7cd/0x1020 net/ipv4/ip_output.c:228
 ip_finish_output+0x83d/0xc30 net/ipv4/ip_output.c:316
 NF_HOOK_COND include/linux/netfilter.h:246 [inline]
 ip_output+0x1e7/0x5d0 net/ipv4/ip_output.c:404
 dst_output include/net/dst.h:486 [inline]
 ip_local_out+0x82/0xb0 net/ipv4/ip_output.c:124
 ip_send_skb+0x3c/0xc0 net/ipv4/ip_output.c:1492
 ip_push_pending_frames+0x64/0x80 net/ipv4/ip_output.c:1512
 ping_v4_push_pending_frames net/ipv4/ping.c:653 [inline]
 ping_v4_sendmsg+0x1b35/0x23e0 net/ipv4/ping.c:840
 inet_sendmsg+0x164/0x490 net/ipv4/af_inet.c:762
 sock_sendmsg_nosec net/socket.c:633 [inline]
 sock_sendmsg+0xca/0x110 net/socket.c:643
 SYSC_sendto+0x660/0x810 net/socket.c:1696
 SyS_sendto+0x40/0x50 net/socket.c:1664
 entry_SYSCALL_64_fastpath+0x1a/0xa9
Freed:
PID = 7604
 save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
 save_stack+0x43/0xd0 mm/kasan/kasan.c:513
 set_track mm/kasan/kasan.c:525 [inline]
 kasan_slab_free+0x73/0xc0 mm/kasan/kasan.c:589
 slab_free_hook mm/slub.c:1357 [inline]
 slab_free_freelist_hook mm/slub.c:1379 [inline]
 slab_free mm/slub.c:2961 [inline]
 kfree+0x91/0x190 mm/slub.c:3882
 skb_free_head+0x74/0xb0 net/core/skbuff.c:579
 skb_release_data+0x37c/0x440 net/core/skbuff.c:610
 skb_release_all+0x4a/0x60 net/core/skbuff.c:669
 __kfree_skb net/core/skbuff.c:683 [inline]
 consume_skb+0x130/0x2f0 net/core/skbuff.c:756
 netlink_broadcast_filtered+0x5fa/0x1420 net/netlink/af_netlink.c:1473
 netlink_broadcast net/netlink/af_netlink.c:1495 [inline]
 nlmsg_multicast include/net/netlink.h:577 [inline]
 nlmsg_notify+0x9c/0x140 net/netlink/af_netlink.c:2382
 rtnl_notify+0xbb/0xe0 net/core/rtnetlink.c:674
 rtmsg_fib+0x3a7/0x4b0 net/ipv4/fib_semantics.c:422
 fib_table_delete+0x836/0x1140 net/ipv4/fib_trie.c:1659
 fib_magic.isra.14+0x4b3/0x890 net/ipv4/fib_frontend.c:840
 fib_del_ifaddr+0xb20/0xe10 net/ipv4/fib_frontend.c:1013
 fib_inetaddr_event+0xaf/0x200 net/ipv4/fib_frontend.c:1150
 notifier_call_chain+0x145/0x2f0 kernel/notifier.c:93
 __blocking_notifier_call_chain kernel/notifier.c:317 [inline]
 blocking_notifier_call_chain+0x109/0x1a0 kernel/notifier.c:328
 __inet_del_ifa+0x4b5/0xb00 net/ipv4/devinet.c:402
 inet_del_ifa net/ipv4/devinet.c:432 [inline]
 devinet_ioctl+0xa75/0x1a10 net/ipv4/devinet.c:1073
 inet_ioctl+0x117/0x1c0 net/ipv4/af_inet.c:900
 sock_do_ioctl+0x65/0xb0 net/socket.c:906
 sock_ioctl+0x27a/0x410 net/socket.c:1004
 vfs_ioctl fs/ioctl.c:45 [inline]
 do_vfs_ioctl+0x1cd/0x15a0 fs/ioctl.c:685
 SYSC_ioctl fs/ioctl.c:700 [inline]
 SyS_ioctl+0x8f/0xc0 fs/ioctl.c:691
 entry_SYSCALL_64_fastpath+0x1a/0xa9
Memory state around the buggy address:
 ffff88005dcc5e80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ffff88005dcc5f00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffff88005dcc5f80: 00 00 00 00 fc fc fc fc fc fc fc fc fc fc fc fc
                               ^
 ffff88005dcc6000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffff88005dcc6080: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
==================================================================
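
The numbers in the report can be cross-checked with a little
arithmetic. The sketch below assumes generic KASAN's usual encoding
(one shadow byte per 8 bytes of memory, 00 meaning fully accessible,
fc marking a slab redzone) and assumes the part of the object before
the rows shown is fully accessible. The helper names are made up for
this illustration; the addresses and shadow bytes are copied from the
report.

```c
#include <stddef.h>
#include <stdint.h>

#define SHADOW_SCALE 8u  /* assumed: one shadow byte covers 8 bytes */

/* Values copied from the report above. */
static const uint64_t OBJ_ADDR    = 0xffff88005dcc5e20ULL;
static const uint64_t ACCESS_ADDR = 0xffff88005dcc5f98ULL;
static const uint64_t ROW_BASE    = 0xffff88005dcc5f80ULL;
static const uint8_t row_shadow[16] = {
	0x00, 0x00, 0x00, 0x00, 0xfc, 0xfc, 0xfc, 0xfc,
	0xfc, 0xfc, 0xfc, 0xfc, 0xfc, 0xfc, 0xfc, 0xfc,
};

/*
 * Walk one row of shadow bytes and return the first address that is not
 * accessible. 0 means all 8 bytes are valid, 1..7 means only that many
 * leading bytes are valid, anything else (e.g. 0xfc) is poisoned.
 */
static uint64_t first_poisoned(uint64_t row_base, const uint8_t *shadow,
			       size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (shadow[i] == 0)
			continue;
		if (shadow[i] < SHADOW_SCALE)
			return row_base + i * SHADOW_SCALE + shadow[i];
		return row_base + i * SHADOW_SCALE;
	}
	return row_base + n * SHADOW_SCALE;
}

/* Number of bytes of [addr, addr + len) that lie at or past 'limit'. */
static uint64_t overrun(uint64_t addr, uint64_t len, uint64_t limit)
{
	uint64_t end = addr + len;

	if (end <= limit)
		return 0;
	return addr >= limit ? len : end - limit;
}
```

Under these assumptions the accessible part of the object ends at
ffff88005dcc5fa0, i.e. 384 bytes after its start; the 16-byte read
begins at offset 376, so its last 8 bytes land in the redzone.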


* Re: [PATCH v3 07/29] x86: bpf_jit, use ENTRY+ENDPROC
From: David Miller @ 2017-04-24 15:08 UTC (permalink / raw)
  To: jslaby
  Cc: alexei.starovoitov, mingo, tglx, hpa, x86, jpoimboe, linux-kernel,
	netdev, daniel, edumazet
In-Reply-To: <f7a94f4e-6d93-a447-a62f-3f290e66c647@suse.cz>

From: Jiri Slaby <jslaby@suse.cz>
Date: Mon, 24 Apr 2017 16:52:43 +0200

> On 04/24/2017, 04:41 PM, David Miller wrote:
>>> It cannot stay as-is simply because we want to know where the functions
>>> end to inject debuginfo properly. The code above does not warrant
>>> any exception.
>> 
>> I totally and completely disagree.
> 
> You can disagree as you wish but there is really nothing special about the
> bpf code with respect to annotations.
> 
>>> Executing a nop takes little time and having externally-callable functions
>>> aligned can actually help performance (no, I haven't measured nor tested
>>> the code). But sure, the tool is generic, so I can introduce
>>> local macros to avoid alignments in the functions:
>> 
>> Not for this case, it's a bunch of entry points all packed together
>> intentionally so that SKB accesses of different access sizes (which is
>> almost always the case) from BPF programs use the smallest amount of
>> I-cache possible.
> 
> And for that reason I suggested the special macros for the code (see the
> macros in the e-mail you replied to again). So what problem do you
> actually have with the suggested solution?

If you align the entry points, then the code sequence as a whole is
no longer densely packed.

Or do I misunderstand how your macros work?
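
The density argument comes down to simple arithmetic, sketched below.
The stub sizes used in the example are invented for illustration and
are not measured from bpf_jit.S; the 16-byte alignment and 64-byte
cache line are likewise assumptions, and span_bytes() and cache_lines()
are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>

/* Round 'off' up to the next multiple of 'align' (a power of two). */
static size_t align_up(size_t off, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (off + align - 1) & ~(align - 1);
}

/*
 * Bytes spanned by n code stubs laid out back to back when every stub
 * must start at a multiple of 'align'. align == 1 means fully packed.
 */
static size_t span_bytes(const size_t *sizes, size_t n, size_t align)
{
	size_t off = 0;

	for (size_t i = 0; i < n; i++) {
		off = align_up(off, align);  /* padding (nops) goes here */
		off += sizes[i];
	}
	return off;
}

/* Number of cache lines needed to hold 'bytes' starting at a line boundary. */
static size_t cache_lines(size_t bytes, size_t line)
{
	return (bytes + line - 1) / line;
}
```

With made-up stub sizes of 9, 21, 13 and 18 bytes, the packed layout
spans 61 bytes and fits in one 64-byte line, while 16-byte alignment
stretches it to 82 bytes and two lines.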


* [PATCH v3 net] net: ipv6: regenerate host route if moved to gc list
From: David Ahern @ 2017-04-24 15:09 UTC (permalink / raw)
  To: netdev; +Cc: dvyukov, andreyknvl, mmanning, kafai, David Ahern

Taking down the loopback device wreaks havoc on IPv6 routing. By
extension, taking down a VRF device wreaks havoc on its table.

Dmitry and Andrey both reported heap out-of-bounds accesses in the IPv6
FIB code while running the syzkaller fuzzer. The root cause is that a dead
dst on the garbage list gets reinserted into the IPv6 FIB. While on
the gc list (or perhaps when it gets added to the gc list) the dst->next is
set to an IPv4 dst. A subsequent walk of the ipv6 tables causes the
out-of-bounds access.

Andrey's reproducer was the key to getting to the bottom of this.

With IPv6, host routes for an address have the dst->dev set to the
loopback device. When the 'lo' device is taken down, rt6_ifdown initiates
a walk of the fib evicting routes with the 'lo' device which means all
host routes are removed. That process moves the dst which is attached to
an inet6_ifaddr to the gc list and marks it as dead.

The recent change to keep global IPv6 addresses added a new function,
fixup_permanent_addr, that is called on admin up. That function restarts
dad for an inet6_ifaddr and when it completes the host route attached
to it is inserted into the fib. Since the route was marked dead and
moved to the gc list, re-inserting the route causes the reported
out-of-bounds accesses. If the device with the address is taken down
or the address is removed, the WARN_ON in fib6_del is triggered.

All of those faults are fixed by regenerating the host route if the
existing one has been moved to the gc list, something that can be
determined by checking if the rt6i_ref counter is 0.

Fixes: f1705ec197e7 ("net: ipv6: Make address flushing on ifdown optional")
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
---
v3
- removed 'if (prev)' and just call ip6_rt_put; added comment about spinlock

v2
- change ifp->rt under spinlock vs cmpxchg
- add comment about rt6i_ref == 0

 net/ipv6/addrconf.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 80ce478c4851..93f81d9cd85f 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -3271,14 +3271,25 @@ static void addrconf_gre_config(struct net_device *dev)
 static int fixup_permanent_addr(struct inet6_dev *idev,
 				struct inet6_ifaddr *ifp)
 {
-	if (!ifp->rt) {
-		struct rt6_info *rt;
+	/* rt6i_ref == 0 means the host route was removed from the
+	 * FIB, for example, if 'lo' device is taken down. In that
+	 * case regenerate the host route.
+	 */
+	if (!ifp->rt || !atomic_read(&ifp->rt->rt6i_ref)) {
+		struct rt6_info *rt, *prev;
 
 		rt = addrconf_dst_alloc(idev, &ifp->addr, false);
 		if (unlikely(IS_ERR(rt)))
 			return PTR_ERR(rt);
 
+		prev = ifp->rt;
+
+		/* ifp->rt can be accessed outside of rtnl */
+		spin_lock(&ifp->lock);
 		ifp->rt = rt;
+		spin_unlock(&ifp->lock);
+
+		ip6_rt_put(prev);
 	}
 
 	if (!(ifp->flags & IFA_F_NOPREFIXROUTE)) {
-- 
2.1.4

^ permalink raw reply related

* Re: macvlan: Fix device ref leak when purging bc_queue
From: David Miller @ 2017-04-24 15:10 UTC (permalink / raw)
  To: Joe.Ghalam; +Cc: herbert, Clifford.Wichmann, netdev
In-Reply-To: <1493046084156.78287@Dell.com>

From: <Joe.Ghalam@dell.com>
Date: Mon, 24 Apr 2017 15:01:24 +0000

>> The only thing that can stop macvlan_process_broadcast from getting
>> called is macvlan_port_destroy.  Nothing else can stop the work
>> queue, unless of course the work queue mechanism itself is broken.
> 
>> So if you're sure macvlan_port_destroy is never even called in
>> your case, then you'll need to start debugging the kernel work
>> queue mechanism to see why macvlan_process_broadcast is not getting
>> called.
> 
> I will get your changes reloaded and re-tested without any other debug tools. Hopefully, we'll see success. I will let you know if I see any issues. 
> Btw, is your fix committed already? if not, do you know when and where it would be committed?

I'm waiting for this discussion to settle down before I apply the
patch.

^ permalink raw reply

* Re: [PATCH net-next V3 2/2] rtnl: Add support for netdev event attribute to link messages
From: Roopa Prabhu @ 2017-04-24 15:14 UTC (permalink / raw)
  To: David Ahern
  Cc: Vladislav Yasevich, netdev@vger.kernel.org, Vladislav Yasevich,
	Jiri Pirko
In-Reply-To: <877efb54-2aef-4d1e-c0b4-2ce6aa6562df@cumulusnetworks.com>

On Sun, Apr 23, 2017 at 6:07 PM, David Ahern <dsa@cumulusnetworks.com> wrote:
>
> On 4/21/17 11:31 AM, Vladislav Yasevich wrote:
> > @@ -1276,9 +1277,40 @@ static int rtnl_xdp_fill(struct sk_buff *skb, struct net_device *dev)
> >       return err;
> >  }
> >
> > +static int rtnl_fill_link_event(struct sk_buff *skb, unsigned long event)
> > +{
> > +     u32 rtnl_event;
> > +
> > +     switch (event) {
> > +     case NETDEV_REBOOT:
> > +             rtnl_event = IFLA_EVENT_REBOOT;
> > +             break;
> > +     case NETDEV_FEAT_CHANGE:
> > +             rtnl_event = IFLA_EVENT_FEAT_CHANGE;
> > +             break;
> > +     case NETDEV_BONDING_FAILOVER:
> > +             rtnl_event = IFLA_EVENT_BONDING_FAILOVER;
> > +             break;
> > +     case NETDEV_NOTIFY_PEERS:
> > +             rtnl_event = IFLA_EVENT_NOTIFY_PEERS;
> > +             break;
> > +     case NETDEV_RESEND_IGMP:
> > +             rtnl_event = IFLA_EVENT_RESEND_IGMP;
> > +             break;
> > +     case NETDEV_CHANGEINFODATA:
> > +             rtnl_event = IFLA_EVENT_CHANGE_INFO_DATA;
> > +             break;
> > +     default:
> > +             return 0;
> > +     }
> > +
> > +     return nla_put_u32(skb, IFLA_EVENT, rtnl_event);
> > +}
> > +
>
> I still have doubts about encoding kernel events into a uapi.

Agreed. I don't see why user space would need NETDEV_CHANGEINFODATA and
the others David listed.

My other concerns are, once we have this exposed to user-space and
user-space starts relying on it, it will need accurate information and
will expect to have this event information all the time.
IIUC, we cannot cover multiple events in a single notification and not
all link notifications will contain an IFLA_EVENT attribute. In other
words, we will be telling user-space to not expect that the kernel
will send IFLA_EVENT every time.
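
A minimal sketch of that concern, in plain user-space C. Every toy_* name
and value here is a hypothetical stand-in, not a real kernel or netlink
API; the fill function only mirrors the shape of the quoted
rtnl_fill_link_event(), where unmapped events add no attribute at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for kernel notifier events (not the real values). */
enum toy_netdev_event {
	TOY_NETDEV_UP,
	TOY_NETDEV_CHANGE,
	TOY_NETDEV_REBOOT,
	TOY_NETDEV_NOTIFY_PEERS,
};

/* Hypothetical stand-ins for the proposed IFLA_EVENT_* values. */
enum toy_ifla_event {
	TOY_IFLA_EVENT_REBOOT = 1,
	TOY_IFLA_EVENT_NOTIFY_PEERS,
};

/* A toy link message: the event attribute is optional. */
struct toy_link_msg {
	bool has_event;
	uint32_t event;
};

/* Unmapped events leave the attribute out, as in the quoted patch. */
static void toy_fill_link_event(struct toy_link_msg *msg,
				enum toy_netdev_event ev)
{
	assert(msg != NULL);
	msg->has_event = false;
	msg->event = 0;

	switch (ev) {
	case TOY_NETDEV_REBOOT:
		msg->event = TOY_IFLA_EVENT_REBOOT;
		break;
	case TOY_NETDEV_NOTIFY_PEERS:
		msg->event = TOY_IFLA_EVENT_NOTIFY_PEERS;
		break;
	default:
		return;	/* attribute omitted */
	}
	msg->has_event = true;
}

/* A receiver has to check for presence before reading the event. */
static bool toy_link_event(const struct toy_link_msg *msg, uint32_t *event)
{
	assert(msg != NULL && event != NULL);
	if (!msg->has_event)
		return false;
	*event = msg->event;
	return true;
}
```

A message filled for TOY_NETDEV_UP carries no event, so a receiver that
assumes the attribute is always present would read garbage or fail.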



>
> For example, NETDEV_CHANGEINFODATA is only for bonds though nothing
> about the name suggests it is a bonding notification. This one was added
> specifically to notify userspace (d4261e5650004), yet seems to happen
> only during a changelink and that already generates a RTM_NEWLINK
> message via do_setlink. Since the rtnetlink_event message does not
> contain anything "NETDEV_CHANGEINFODATA" related what purpose does it
> really serve besides duplicating netlink messages to userspace.
>
> The REBOOT, IGMP, FEAT_CHANGE and BONDING_FAILOVER seem to be unique
> messages (code analysis only) which I get for notifying userspace.
>
> NETDEV_NOTIFY_PEERS is not so clear in how often it duplicates other
> messages.

^ permalink raw reply

* Re: [RFC PATCH 3/7] net: add option to get information about timestamped packets
From: Willem de Bruijn @ 2017-04-24 15:18 UTC (permalink / raw)
  To: Miroslav Lichvar
  Cc: Network Development, Richard Cochran, Willem de Bruijn,
	Soheil Hassas Yeganeh, Keller, Jacob E, Denny Page, Jiri Benc
In-Reply-To: <20170424090043.GF8847@localhost>

On Mon, Apr 24, 2017 at 5:00 AM, Miroslav Lichvar <mlichvar@redhat.com> wrote:
> On Thu, Apr 13, 2017 at 12:16:09PM -0400, Willem de Bruijn wrote:
>> On Thu, Apr 13, 2017 at 11:18 AM, Miroslav Lichvar <mlichvar@redhat.com> wrote:
>> > On Thu, Apr 13, 2017 at 10:37:07AM -0400, Willem de Bruijn wrote:
>> >> Why is this L2 length needed?
>> >
>> > It's needed for incoming packets to allow converting of preamble
>> > timestamps to trailer timestamps.
>>
>> Receiving the mac length of a packet sounds like a feature independent
>> from timestamping.
>
> I agree, but so far nobody suggested another use for this information.
> Do you have any suggestions?
>
> The idea was that if it is useful only with HW timestamping, it would
> be better to save it only with the timestamp, so there is no
> performance impact in the more common case when HW timestamping is
> disabled. Am I overly cautious here?

The additional cost of a cmsg is zero for sockets that have no cmsg
enabled, due to

        if (inet->cmsg_flags)
                ip_cmsg_recv_offset(msg, sk, skb, sizeof(struct udphdr), off);

But you might be right that there are no uses outside the specific
timestamp requirement you have, so if you prefer to use a timestamp
option, I won't object further.

>> Either an ioctl similar to SIOCGIFMTU or, if it may
>> vary due to existence of vlan headers, a new independent cmsg at the
>> SOL_SOCKET layer.

The latter would require adding the SOL_SOCKET level cmsg processing
infra. It is simpler to just add it at the INET/INET6 levels.

> It's not just the VLAN headers. The length of the IP header may vary
> with IP options, so the offset of the UDP data in the packet cannot be
> assumed to be constant.

As well as tunnels.

> Now I'm wondering if it's actually necessary to save the original
> value of skb->mac_len + skb->len.

Computing it on recv if needed is definitely preferable to computing
on enqueue and storing in an intermediate variable.

> Would "skb->data - skb->head -
> skb->mac_header + skb->len" always work as the L2 length for received
> packets at the time when the cmsg is prepared?

(skb->data - skb->head) - skb->mac_header computes the length
of data before the mac, such as reserve? Do you mean skb->data -
skb->mac_header (or - skb_mac_offset(skb))?
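
To make the arithmetic in question concrete, here is a rough user-space
sketch. It assumes mac_header is stored as an offset from head (the way
skb_mac_header() computes head + mac_header); struct toy_skb and the
toy_* helpers are hypothetical and model only the four fields involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of the few sk_buff fields involved; not the kernel struct. */
struct toy_skb {
	unsigned char *head;	/* start of the buffer */
	unsigned char *data;	/* start of the current payload */
	uint16_t mac_header;	/* offset of the MAC header from head */
	unsigned int len;	/* bytes from data to the end of the packet */
};

/* Bytes between the MAC header and data, i.e. headers already pulled. */
static size_t toy_mac_to_data(const struct toy_skb *skb)
{
	assert(skb != NULL);
	assert(skb->data >= skb->head + skb->mac_header);
	return (size_t)(skb->data - (skb->head + skb->mac_header));
}

/* Candidate L2 length: pulled headers plus the remaining packet bytes. */
static size_t toy_l2_len(const struct toy_skb *skb)
{
	return toy_mac_to_data(skb) + skb->len;
}
```

With 64 bytes of headroom, a 14-byte Ethernet header already pulled and
len = 100, toy_l2_len() gives 114. Whether the real fields still have
these meanings when the cmsg is prepared is the open question here.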

> As for the original ifindex, it seems to me it does need to be saved
> to a new field since __netif_receive_skb_core() intentionally
> overwrites skb->skb_iif. What would be the best place for it, sk_buff
> or skb_shared_info?

Finding storage space on the receive path will not be easy.

One shortcut to avoid storing this information explicitly is to look up
the device from skb->napi_id.

> And would it really be acceptable to save it for all packets in
> __netif_receive_skb_core(), even when HW timestamping is disabled?
> Seeing how the code and the data structures were optimized over time,
> I have a feeling it would not be accepted.

Incurring this cost on all packets for such a rare edge case does sound
like a non-starter.

It can be called only if the netstamp_needed static key is enabled (false),
in __net_timestamp, though.

^ permalink raw reply

* Re: [PATCH net] bridge: shutdown bridge device before removing it
From: Xin Long @ 2017-04-24 15:21 UTC (permalink / raw)
  To: Nikolay Aleksandrov
  Cc: network dev, bridge@lists.linux-foundation.org, David S. Miller
In-Reply-To: <940af2e3-8742-46b0-0550-0bba5e0a3f71@cumulusnetworks.com>

On Mon, Apr 24, 2017 at 10:53 PM, Nikolay Aleksandrov
<nikolay@cumulusnetworks.com> wrote:
> On 24/04/17 17:41, Xin Long wrote:
>> On Mon, Apr 24, 2017 at 8:07 PM, Nikolay Aleksandrov
>> <nikolay@cumulusnetworks.com> wrote:
>>> On 24/04/17 14:01, Nikolay Aleksandrov wrote:
>>>> On 24/04/17 10:25, Xin Long wrote:
>>>>> During removing a bridge device, if the bridge is still up, a new mdb entry
>>>>> still can be added in br_multicast_add_group() after all mdb entries are
>>>>> removed in br_multicast_dev_del(). Like the path:
>>>>>
>>>>>   mld_ifc_timer_expire ->
>>>>>     mld_sendpack -> ...
>>>>>       br_multicast_rcv ->
>>>>>         br_multicast_add_group
>>>>>
>>>>> The new mp's timer will be set up. If the timer expires after the bridge
>>>>> is freed, it may cause use-after-free panic in br_multicast_group_expired.
>>>>> This can happen when ip link remove a bridge or destroy a netns with a
>>>>> bridge device inside.
>>>>>
>>>>> As we can see in br_del_bridge, brctl is also supposed to remove a bridge
>>>>> device after it's shutdown.
>>>>>
>>>>> This patch is to call dev_close at the beginning of br_dev_delete so that
>>>>> netif_running check in br_multicast_add_group can avoid this issue. But
>>>>> to keep consistent with before, it will not remove the IFF_UP check in
>>>>> br_del_bridge for brctl.
>>>>>
>>>>> Reported-by: Jianwen Ji <jiji@redhat.com>
>>>>> Signed-off-by: Xin Long <lucien.xin@gmail.com>
>>>>> ---
>>>>>  net/bridge/br_if.c | 2 ++
>>>>>  1 file changed, 2 insertions(+)
>>>>>
>>>>
>>>> +CC bridge maintainers
>>>>
>>>> I can see how this could happen, could you also provide the traceback ?
>>>>
>>>> The patch looks good to me, actually I think it fixes another issue with
>>>> mcast stats where the percpu pointer can be accessed after it's freed if
>>>> an mcast packet can get sent via br->dev after the br_multicast_dev_del() call.
>>>> This is definitely stable material, if I'm not mistaken the issue is there since
>>>> the introduction of br_dev_delete:
>>>> commit e10177abf842
>>>> Author: Satish Ashok <sashok@cumulusnetworks.com>
>>>> Date:   Wed Jul 15 07:16:51 2015 -0700
>>>>
>>>>     bridge: multicast: fix handling of temp and perm entries
>>>>
>>>>
>>>>
>>>> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
>>>>
>>>
>>> Actually I have a better idea for a fix because dev_close() for a single device is rather heavy.
>>> Why don't you move the mdb flush logic in the bridge's ndo_uninit() callback ?
>>> That should have the same effect and be much faster.
>> Yes. But it seems that all cleanups for bridge should be done after
>> it's shutdown since beginning according to brctl. I'm not sure if there
>> are still other problems caused by this. maybe safer to use dev_close.
>> I need to check more to confirm this.
>>
>
> ndo_uninit() is after the device has been stopped, so it is the same as
> your fix as I said.
I understand that your suggestion fixes this issue. What I'm afraid of is that
there may still be other problems of the same kind, like the "percpu pointer"
one you mentioned above, though that one is also fixed by ndo_uninit.
dev_close would avoid ALL issues of this kind, if any remain. :)

But if you are sure there are no more issues like this one, I'm all for it and
will improve this patch with your suggestion.


>
>> I also have another question about mp->timer removing.
>> As we can see, now it removes this timer with del_timer, instead of
>> del_timer_sync. What if the timer is running when del_timer ?
>> How can we be sure that br_multicast_group_expired will be done
>> before the bridge dev is freed. synchronize_net ?
>>
>
> Yeah, I've been thinking about that and the only race is that the timer
> might have fired and waiting for the lock while the mdb is being flushed
> thus the cancel_timer() won't affect it and then it will enter and see
> that !netif_running(br->dev), but unfortunately there's a bug because we
> cannot guarantee that br->dev still exists at that point.
> This is a different bug though.
Exactly. The bad thing is that it's pretty hard to reproduce even if this bug
exists, since the timer handler cannot be preempted. synchronize_net could
probably avoid it (not sure).

>
>>>
>>> By the way I just noticed that there's also a memory leak - the mdb hash is reallocated
>>> and not freed due to the mdb rehash, here's also kmemleak's object:
>>>
>> yeps, ;-)
>>
>>> unreferenced object 0xffff8800540ba800 (size 2048):
>>>   comm "softirq", pid 0, jiffies 4520588901 (age 5787.284s)
>>>   hex dump (first 32 bytes):
>>>     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>>>     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>>>   backtrace:
>>>     [<ffffffff816e2287>] kmemleak_alloc+0x67/0xc0
>>>     [<ffffffff81260bea>] __kmalloc+0x1ba/0x3e0
>>>     [<ffffffffa05c60ee>] br_mdb_rehash+0x5e/0x340 [bridge]
>>>     [<ffffffffa05c74af>] br_multicast_new_group+0x43f/0x6e0 [bridge]
>>>     [<ffffffffa05c7aa3>] br_multicast_add_group+0x203/0x260 [bridge]
>>>     [<ffffffffa05ca4b5>] br_multicast_rcv+0x945/0x11d0 [bridge]
>>>     [<ffffffffa05b6b10>] br_dev_xmit+0x180/0x470 [bridge]
>>>     [<ffffffff815c781b>] dev_hard_start_xmit+0xbb/0x3d0
>>>     [<ffffffff815c8743>] __dev_queue_xmit+0xb13/0xc10
>>>     [<ffffffff815c8850>] dev_queue_xmit+0x10/0x20
>>>     [<ffffffffa02f8d7a>] ip6_finish_output2+0x5ca/0xac0 [ipv6]
>>>     [<ffffffffa02fbfc6>] ip6_finish_output+0x126/0x2c0 [ipv6]
>>>     [<ffffffffa02fc245>] ip6_output+0xe5/0x390 [ipv6]
>>>     [<ffffffffa032b92c>] NF_HOOK.constprop.44+0x6c/0x240 [ipv6]
>>>     [<ffffffffa032bd16>] mld_sendpack+0x216/0x3e0 [ipv6]
>>>     [<ffffffffa032d5eb>] mld_ifc_timer_expire+0x18b/0x2b0 [ipv6]
>>>
>>>
>>>
>

^ permalink raw reply

* Re: macvlan: Fix device ref leak when purging bc_queue
From: Joe.Ghalam @ 2017-04-24 15:30 UTC (permalink / raw)
  To: davem; +Cc: herbert, Clifford.Wichmann, netdev
In-Reply-To: <20170424.111028.2157290275229080747.davem@davemloft.net>

> I'm waiting for this discussion to settle down before I apply the patch.

Thanks David. I will get some answers soon, and hopefully the change is a good one.

^ permalink raw reply

* [iproute PATCH] man: ip-rule.8: Further clarify how to interpret priority value
From: Phil Sutter @ 2017-04-24 15:35 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev

Despite the past changes, users seemed to get confused by the seemingly
contradictory relation of priority value and actual rule priority.

Signed-off-by: Phil Sutter <phil@nwl.cc>
---
 man/man8/ip-rule.8 | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/man/man8/ip-rule.8 b/man/man8/ip-rule.8
index 7de80f3e6db9f..a5c479811927f 100644
--- a/man/man8/ip-rule.8
+++ b/man/man8/ip-rule.8
@@ -95,7 +95,10 @@ Each policy routing rule consists of a
 .B selector
 and an
 .B action predicate.
-The RPDB is scanned in order of decreasing priority. The selector
+The RPDB is scanned in order of decreasing priority (note that lower number
+means higher priority, see the description of
+.I PREFERENCE
+below). The selector
 of each rule is applied to {source address, destination address, incoming
 interface, tos, fwmark} and, if the selector matches the packet,
 the action is performed. The action predicate may return with success.
@@ -225,7 +228,8 @@ value to match.
 .BI priority " PREFERENCE"
 the priority of this rule.
 .I PREFERENCE
-is an unsigned integer value, higher number means lower priority.  Each rule
+is an unsigned integer value, higher number means lower priority, and rules get
+processed in order of increasing number. Each rule
 should have an explicitly set
 .I unique
 priority value.
-- 
2.11.0

^ permalink raw reply related
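
The ordering described in the patch above can be sketched in a few lines
of C. This is only an illustration: struct toy_rule and the toy_* helpers
are hypothetical, and real rules carry selectors and actions as well.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical rule record; only the fields needed for ordering. */
struct toy_rule {
	uint32_t preference;	/* the "priority" number from ip rule */
	const char *table;
};

/* Order by increasing preference: the lower number is scanned first. */
static int toy_rule_cmp(const void *a, const void *b)
{
	const struct toy_rule *ra = a;
	const struct toy_rule *rb = b;

	return (ra->preference > rb->preference) -
	       (ra->preference < rb->preference);
}

/* Put the rules into the order in which they would be processed. */
static void toy_sort_rules(struct toy_rule *rules, size_t n)
{
	assert(rules != NULL);
	qsort(rules, n, sizeof(*rules), toy_rule_cmp);
}
```

Sorting rules with preferences 32766, 0 and 100 yields 0, 100, 32766:
the rule with the lowest number has the highest priority and is
consulted first.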

* Re: [PATCH v3 07/29] x86: bpf_jit, use ENTRY+ENDPROC
From: Jiri Slaby @ 2017-04-24 15:41 UTC (permalink / raw)
  To: David Miller
  Cc: alexei.starovoitov, mingo, tglx, hpa, x86, jpoimboe, linux-kernel,
	netdev, daniel, edumazet
In-Reply-To: <20170424.110844.1321374394090353753.davem@davemloft.net>

On 04/24/2017, 05:08 PM, David Miller wrote:
> If you align the entry points, then the code sequence as a whole is
> no longer densely packed.

Sure.

> Or do I misunderstand how your macros work?

Perhaps. So the suggested macros for the code are:
#define BPF_FUNC_START_LOCAL(name) \
		SYM_START(name, SYM_V_LOCAL, SYM_A_NONE)
#define BPF_FUNC_START(name) \
		SYM_START(name, SYM_V_GLOBAL, SYM_A_NONE)

and they differ from the standard ones:
#define SYM_FUNC_START_LOCAL(name)                      \
        SYM_START(name, SYM_V_LOCAL, SYM_A_ALIGN)
#define SYM_FUNC_START(name)                            \
        SYM_START(name, SYM_V_GLOBAL, SYM_A_ALIGN)


The difference is SYM_A_NONE vs. SYM_A_ALIGN, which means:
#define SYM_A_ALIGN                             ALIGN
#define SYM_A_NONE                              /* nothing */

Does it look OK now?

thanks,
-- 
js
suse labs

^ permalink raw reply

* Re: [PATCH v4] {net,IB}/{rxe,usnic}: Utilize generic mac to eui32 function
From: Yuval Shaia @ 2017-04-24 15:46 UTC (permalink / raw)
  To: Leon Romanovsky, dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: benve-FYB4Gu1CFyUAvxtiuMwx3w, dgoodell-FYB4Gu1CFyUAvxtiuMwx3w,
	dledford-H+wXaHxf7aLQT0dZR+AlfA,
	sean.hefty-ral2JQCrhuEAvxtiuMwx3w,
	hal.rosenstock-Re5JQEeQqe8AvxtiuMwx3w,
	monis-VPRAkNaXOzVWk0Htik3J/w, davem-fT/PcQaiUtIeIZ0/mPfg9Q,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20170314175843.GY2079-U/DQcQFIOTAAJjI8aNfphQ@public.gmane.org>

On Tue, Mar 14, 2017 at 07:58:43PM +0200, Leon Romanovsky wrote:
> On Tue, Mar 14, 2017 at 04:01:57PM +0200, Yuval Shaia wrote:
> > This logic seems to be duplicated in (at least) three separate files.
> > Move it to one place so the code can be reused.
> >
> > Signed-off-by: Yuval Shaia <yuval.shaia-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
> > ---
> > v0 -> v1:
> > 	* Add missing #include
> > 	* Rename to genaddrconf_ifid_eui48
> > v1 -> v2:
> > 	* Reset eui[0] to default if dev_id is used
> > v2 -> v3:
> > 	* Add helper function to avoid re-setting eui[0] to default if
> > 	  dev_id is used
> > v3 -> v4:
> > 	* Remove RXE wrappers
> > 	* Remove addrconf_addr_eui48_xor and do the eui[0] ^= 2 in the
> > 	  basic implementation
> > ---
> >  drivers/infiniband/hw/usnic/usnic_common_util.h | 11 +++-------
> >  drivers/infiniband/sw/rxe/rxe.c                 |  4 +++-
> >  drivers/infiniband/sw/rxe/rxe_loc.h             |  2 --
> >  drivers/infiniband/sw/rxe/rxe_net.c             | 28 -------------------------
> >  drivers/infiniband/sw/rxe/rxe_verbs.c           |  4 +++-
> >  include/net/addrconf.h                          | 22 +++++++++++++++----
> >  6 files changed, 27 insertions(+), 44 deletions(-)
> >
> 
> Thanks, Yuval.
> Reviewed-by: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Hi Doug,
If there are no more comments on this one, could you consider taking it?

Thanks,
Yuval

^ permalink raw reply

* Re: [PATCH v3 07/29] x86: bpf_jit, use ENTRY+ENDPROC
From: David Miller @ 2017-04-24 15:51 UTC (permalink / raw)
  To: jslaby
  Cc: alexei.starovoitov, mingo, tglx, hpa, x86, jpoimboe, linux-kernel,
	netdev, daniel, edumazet
In-Reply-To: <614ca52b-8a43-244e-8a3a-c39145ecc3e8@suse.cz>

From: Jiri Slaby <jslaby@suse.cz>
Date: Mon, 24 Apr 2017 17:41:06 +0200

> On 04/24/2017, 05:08 PM, David Miller wrote:
>> If you align the entry points, then the code sequence as a whole is
>> no longer densely packed.
> 
> Sure.
> 
>> Or do I misunderstand how your macros work?
> 
> Perhaps. So the suggested macros for the code are:
> #define BPF_FUNC_START_LOCAL(name) \
> 		SYM_START(name, SYM_V_LOCAL, SYM_A_NONE)
> #define BPF_FUNC_START(name) \
> 		SYM_START(name, SYM_V_GLOBAL, SYM_A_NONE)
> 
> and they differ from the standard ones:
> #define SYM_FUNC_START_LOCAL(name)                      \
>         SYM_START(name, SYM_V_LOCAL, SYM_A_ALIGN)
> #define SYM_FUNC_START(name)                            \
>         SYM_START(name, SYM_V_GLOBAL, SYM_A_ALIGN)
> 
> 
> The difference is SYM_A_NONE vs. SYM_A_ALIGN, which means:
> #define SYM_A_ALIGN                             ALIGN
> #define SYM_A_NONE                              /* nothing */
> 
> Does it look OK now?

I said I'm not OK with the alignment, so personally I am not OK
with how these macros work and what they will do to the code
generated for BPF packet accesses.

But I'll defer to Alexei on this because I don't have the time
nor the energy to fight this.

Thanks.

^ permalink raw reply

* Re: [PATCH v3 07/29] x86: bpf_jit, use ENTRY+ENDPROC
From: Jiri Slaby @ 2017-04-24 15:53 UTC (permalink / raw)
  To: David Miller
  Cc: alexei.starovoitov, mingo, tglx, hpa, x86, jpoimboe, linux-kernel,
	netdev, daniel, edumazet
In-Reply-To: <20170424.115118.1652158849030310645.davem@davemloft.net>

On 04/24/2017, 05:51 PM, David Miller wrote:
> I said I'm not OK with the alignment

So in short, the suggested macros add no alignment.
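
A small preprocessor sketch of that claim. It models only the alignment
argument of the macros quoted in this thread; the ALIGN value below is a
placeholder (the real one is arch-specific), the TOY_* helpers are
hypothetical, and nothing here says anything about the posted patch
itself, which used ENTRY.

```c
/* Placeholder alignment directive; the real ALIGN is arch-specific. */
#define ALIGN			.p2align 4, 0x90

#define SYM_A_ALIGN		ALIGN
#define SYM_A_NONE		/* nothing */

/* Expand the argument first, then turn the result into a string. */
#define TOY_STR(...)		#__VA_ARGS__
#define TOY_XSTR(...)		TOY_STR(__VA_ARGS__)

/* Alignment text the standard SYM_FUNC_START* macros would pass on. */
static const char *toy_align_standard(void)
{
	return TOY_XSTR(SYM_A_ALIGN);
}

/* Alignment text the suggested BPF_FUNC_START* macros would pass on. */
static const char *toy_align_bpf(void)
{
	return TOY_XSTR(SYM_A_NONE);
}
```

toy_align_bpf() returns an empty string, while toy_align_standard()
returns the placeholder directive, so under these assumptions the
SYM_A_NONE variants contribute no alignment directive of their own.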

-- 
js
suse labs

^ permalink raw reply

* Re: [net-next 0/7][pull request] 1GbE Intel Wired LAN Driver Updates 2017-04-20
From: David Miller @ 2017-04-24 15:54 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, nhorman, sassmann, jogreene
In-Reply-To: <20170420233335.34900-1-jeffrey.t.kirsher@intel.com>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Thu, 20 Apr 2017 16:33:28 -0700

> This series contains updates to e1000, e1000e, igb/vf and ixgb.

Pulled, thanks Jeff.

^ permalink raw reply

* Re: I find one aspect of the ip link show command confusing and I'd like you to fix it, please
From: Stephen Hemminger @ 2017-04-24 15:54 UTC (permalink / raw)
  To: Jeff Silverman; +Cc: netdev
In-Reply-To: <CAGu9dLLRVaiO-vHd_jzVuMM3O=sjLX_VOeu5n6-+eV6fEdhBug@mail.gmail.com>

On Sun, 23 Apr 2017 15:36:32 -0700
Jeff Silverman <jeffsilverm@gmail.com> wrote:

> People,
> 
> When my NIC is up, but not connected, I see:
> 
> root@jeff-desktop:~# ip link show enp3s0
> 2: enp3s0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state
> DOWN mode DEFAULT group default qlen 1000
>     link/ether 00:10:18:cc:9c:77 brd ff:ff:ff:ff:ff:ff
> root@jeff-desktop:~# ip link show enp3s0
> 
> NO-CARRIER makes sense to me - if the wire is unplugged, then the NIC
> isn't seeing the humm at the beginning of each packet.  That's clear.
> Note that even though my link isn't plugged in,  ip still notes that
> it is up.  That's great.
> 
> 
> But if I down my NIC, there is no indication that it is DOWN other
> than you can't see the UP flag.  If somebody was new to linux, they
> would not see what's not there.
> 
> 2: enp3s0: <BROADCAST,MULTICAST> mtu 1500 qdisc mq state DOWN mode
> DEFAULT group default qlen 1000
>     link/ether 00:10:18:cc:9c:77 brd ff:ff:ff:ff:ff:ff
> 
> 
> What I would like you to do is modify the ip command so that if the
> NIC has been downed by something, then it explicitly says DOWN.
> 
> What would be really nice would be if you enumerated all of the flags
> that an interface can have, and note if the flag is set or cleared.
> But that's more than what I want with this message.
> 
> 
> Many thanks,
> 
> 
> Jeff
> 

If you have a suggestion, send a patch. The utility has shown the same
output since the earliest versions, so the default output format
can't change: people do things like write scripts to parse it.
A more verbose output is possible but would have to be enabled by
a flag.

^ permalink raw reply

* Re: [PATCH v3 07/29] x86: bpf_jit, use ENTRY+ENDPROC
From: Ingo Molnar @ 2017-04-24 15:55 UTC (permalink / raw)
  To: Jiri Slaby
  Cc: David Miller, alexei.starovoitov, mingo, tglx, hpa, x86, jpoimboe,
	linux-kernel, netdev, daniel, edumazet
In-Reply-To: <614ca52b-8a43-244e-8a3a-c39145ecc3e8@suse.cz>


* Jiri Slaby <jslaby@suse.cz> wrote:

> On 04/24/2017, 05:08 PM, David Miller wrote:
> > If you align the entry points, then the code sequence as a whole is
> > no longer densely packed.
> 
> Sure.
> 
> > Or do I misunderstand how your macros work?
> 
> Perhaps. So the suggested macros for the code are:
> #define BPF_FUNC_START_LOCAL(name) \
> 		SYM_START(name, SYM_V_LOCAL, SYM_A_NONE)
> #define BPF_FUNC_START(name) \
> 		SYM_START(name, SYM_V_GLOBAL, SYM_A_NONE)
> 
> and they differ from the standard ones:
> #define SYM_FUNC_START_LOCAL(name)                      \
>         SYM_START(name, SYM_V_LOCAL, SYM_A_ALIGN)
> #define SYM_FUNC_START(name)                            \
>         SYM_START(name, SYM_V_GLOBAL, SYM_A_ALIGN)
> 
> 
> The difference is SYM_A_NONE vs. SYM_A_ALIGN, which means:
> #define SYM_A_ALIGN                             ALIGN
> #define SYM_A_NONE                              /* nothing */
> 
> Does it look OK now?

No, the patch changes alignment which is undesirable, it needs to preserve the 
existing (non-)alignment of the symbols!

Thanks,

	Ingo

^ permalink raw reply

* Re: [PATCH net] bridge: shutdown bridge device before removing it
From: Nikolay Aleksandrov @ 2017-04-24 15:55 UTC (permalink / raw)
  To: Xin Long
  Cc: network dev, bridge@lists.linux-foundation.org, David S. Miller,
	Herbert Xu
In-Reply-To: <CADvbK_fJAqpcspMO7SPhRcbia9KLBA=Q5E2ySTsV0A6+EfDzFg@mail.gmail.com>

On 24/04/17 18:21, Xin Long wrote:
> On Mon, Apr 24, 2017 at 10:53 PM, Nikolay Aleksandrov
> <nikolay@cumulusnetworks.com> wrote:
>> On 24/04/17 17:41, Xin Long wrote:
>>> On Mon, Apr 24, 2017 at 8:07 PM, Nikolay Aleksandrov
>>> <nikolay@cumulusnetworks.com> wrote:
>>>> On 24/04/17 14:01, Nikolay Aleksandrov wrote:
>>>>> On 24/04/17 10:25, Xin Long wrote:
>>>>>> During removing a bridge device, if the bridge is still up, a new mdb entry
>>>>>> still can be added in br_multicast_add_group() after all mdb entries are
>>>>>> removed in br_multicast_dev_del(). Like the path:
>>>>>>
>>>>>>   mld_ifc_timer_expire ->
>>>>>>     mld_sendpack -> ...
>>>>>>       br_multicast_rcv ->
>>>>>>         br_multicast_add_group
>>>>>>
>>>>>> The new mp's timer will be set up. If the timer expires after the bridge
>>>>>> is freed, it may cause use-after-free panic in br_multicast_group_expired.
>>>>>> This can happen when ip link remove a bridge or destroy a netns with a
>>>>>> bridge device inside.
>>>>>>
>>>>>> As we can see in br_del_bridge, brctl is also supposed to remove a bridge
>>>>>> device after it's shutdown.
>>>>>>
>>>>>> This patch is to call dev_close at the beginning of br_dev_delete so that
>>>>>> netif_running check in br_multicast_add_group can avoid this issue. But
>>>>>> to keep consistent with before, it will not remove the IFF_UP check in
>>>>>> br_del_bridge for brctl.
>>>>>>
>>>>>> Reported-by: Jianwen Ji <jiji@redhat.com>
>>>>>> Signed-off-by: Xin Long <lucien.xin@gmail.com>
>>>>>> ---
>>>>>>  net/bridge/br_if.c | 2 ++
>>>>>>  1 file changed, 2 insertions(+)
>>>>>>
>>>>>
>>>>> +CC bridge maintainers
>>>>>
>>>>> I can see how this could happen, could you also provide the traceback ?
>>>>>
>>>>> The patch looks good to me, actually I think it fixes another issue with
>>>>> mcast stats where the percpu pointer can be accessed after it's freed if
>>>>> an mcast packet can get sent via br->dev after the br_multicast_dev_del() call.
>>>>> This is definitely stable material, if I'm not mistaken the issue is there since
>>>>> the introduction of br_dev_delete:
>>>>> commit e10177abf842
>>>>> Author: Satish Ashok <sashok@cumulusnetworks.com>
>>>>> Date:   Wed Jul 15 07:16:51 2015 -0700
>>>>>
>>>>>     bridge: multicast: fix handling of temp and perm entries
>>>>>
>>>>>
>>>>>
>>>>> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
>>>>>
>>>>
>>>> Actually I have a better idea for a fix because dev_close() for a single device is rather heavy.
>>>> Why don't you move the mdb flush logic in the bridge's ndo_uninit() callback ?
>>>> That should have the same effect and be much faster.
>>> Yes. But it seems that all cleanups for bridge should be done after
>>> it's shutdown since beginning according to brctl. I'm not sure if there
>>> are still other problems caused by this. maybe safer to use dev_close.
>>> I need to check more to confirm this.
>>>
>>
>> ndo_uninit() is after the device has been stopped, so it is the same as
>> your fix as I said.
> got that your suggestion can fix this issue. what I'm afraid of is there
> are still other problems like this issue, like "the percpu pointer" one
> you just mentioned above, though it's already fixed by ndo_uninit.
> dev_close would just avoid ALL this kind of issues if there still are. :)
> 
> But if you can be sure no more issue like this one, I'm all for that,
> will improve this patch with your suggestion.
> 

Please fix it with ndo_uninit(); avoiding another synchronize_net() call
is worth the trouble.

> 
>>
>>> I also have another question about mp->timer removing.
>>> As we can see, now it removes this timer with del_timer, instead of
>>> del_timer_sync. What if the timer is running when del_timer ?
>>> How can we be sure that br_multicast_group_expired will be done
>>> before the bridge dev is freed. synchronize_net ?
>>>
>>
>> Yeah, I've been thinking about that and the only race is that the timer
>> might have fired and waiting for the lock while the mdb is being flushed
>> thus the cancel_timer() won't affect it and then it will enter and see
>> that !netif_running(br->dev), but unfortunately there's a bug because we
>> cannot guarantee that br->dev still exists at that point.
>> This is a different bug though.
> exactly, the bad thing is it's pretty hard to reproduce even if this bz exists,
> since the timer process can not be preemptable. synchronize_net probably
> could avoid it (not sure).

I think the _bh rcu barrier in br_multicast_dev_del() should wait for
all currently executing BHs to finish before executing the callbacks to
free the groups, so it should be fine if any timer is waiting for the
lock at the same time: it will get it, see br->dev as not running and exit.

This is the part I'm talking about (br_multicast.c, 2023 - 2025):
                spin_unlock_bh(&br->multicast_lock);
                rcu_barrier_bh();
                spin_lock_bh(&br->multicast_lock);

At this point either the timer has fired and has been waiting for the
lock or got deleted by the flush.

If anyone could check the logic above it'd be great, adding the original
bridge multicast author as well and I'll keep digging.

> 
>>
>>>>
>>>> By the way I just noticed that there's also a memory leak - the mdb hash is reallocated
>>>> and not freed due to the mdb rehash, here's also kmemleak's object:
>>>>
>>> yeps, ;-)
>>>
>>>> unreferenced object 0xffff8800540ba800 (size 2048):
>>>>   comm "softirq", pid 0, jiffies 4520588901 (age 5787.284s)
>>>>   hex dump (first 32 bytes):
>>>>     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>>>>     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>>>>   backtrace:
>>>>     [<ffffffff816e2287>] kmemleak_alloc+0x67/0xc0
>>>>     [<ffffffff81260bea>] __kmalloc+0x1ba/0x3e0
>>>>     [<ffffffffa05c60ee>] br_mdb_rehash+0x5e/0x340 [bridge]
>>>>     [<ffffffffa05c74af>] br_multicast_new_group+0x43f/0x6e0 [bridge]
>>>>     [<ffffffffa05c7aa3>] br_multicast_add_group+0x203/0x260 [bridge]
>>>>     [<ffffffffa05ca4b5>] br_multicast_rcv+0x945/0x11d0 [bridge]
>>>>     [<ffffffffa05b6b10>] br_dev_xmit+0x180/0x470 [bridge]
>>>>     [<ffffffff815c781b>] dev_hard_start_xmit+0xbb/0x3d0
>>>>     [<ffffffff815c8743>] __dev_queue_xmit+0xb13/0xc10
>>>>     [<ffffffff815c8850>] dev_queue_xmit+0x10/0x20
>>>>     [<ffffffffa02f8d7a>] ip6_finish_output2+0x5ca/0xac0 [ipv6]
>>>>     [<ffffffffa02fbfc6>] ip6_finish_output+0x126/0x2c0 [ipv6]
>>>>     [<ffffffffa02fc245>] ip6_output+0xe5/0x390 [ipv6]
>>>>     [<ffffffffa032b92c>] NF_HOOK.constprop.44+0x6c/0x240 [ipv6]
>>>>     [<ffffffffa032bd16>] mld_sendpack+0x216/0x3e0 [ipv6]
>>>>     [<ffffffffa032d5eb>] mld_ifc_timer_expire+0x18b/0x2b0 [ipv6]
>>>>
>>>>
>>>>
>>

^ permalink raw reply
