Netdev List


* Re: [PATCH net] net: core: orphan frags before queuing to slow qdisc
From: Jason Wang @ 2014-01-18  5:35 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: davem, netdev, linux-kernel, Michael S. Tsirkin
In-Reply-To: <1389968897.31367.489.camel@edumazet-glaptop2.roam.corp.google.com>

On 01/17/2014 10:28 PM, Eric Dumazet wrote:
> On Fri, 2014-01-17 at 17:42 +0800, Jason Wang wrote:
>> Many qdiscs can queue a packet for a long time, which causes an issue
>> with zerocopy skbs: the frags will not be orphaned within the expected
>> short time, which breaks the assumption that virtio-net will transmit
>> the packet in time.
>>
>> So if guest packets are queued through such a qdisc and hit the limit
>> on max pending packets for virtio/vhost, all packets from the guest
>> that go to another destination will also be blocked.
>>
>> A case for reproducing the issue:
>>
>> - Boot two VMs and connect them to the same bridge kvmbr.
>> - Set up tbf with a very low rate/burst on eth0, which is a port of kvmbr.
>> - Let VM1 send lots of packets through eth0.
>> - After a while, VM1 is unable to send any packets out since the number of
>>    pending packets (queued to tbf) exceeds the limit of vhost/virtio.
> So what's the problem? If the limit is low, you cannot send packets.

It was just an extreme case. The problem is that if zerocopy packets from
VM1 are throttled by the qdisc on eth0, probably all packets from VM1 are
throttled, even those that do not go through eth0.
> Solution: increase the limit, or tell the vm to lower its rate.
>
> Oh wait, are you bitten because you did some prior skb_orphan() to allow
> the vm to send an unlimited number of skbs?
>

The problem is that sndbuf defaults to INT_MAX to prevent a similar
issue for non-zerocopy packets. For zerocopy, vhost can notify virtio-net
of tx completion only after the frags have been orphaned. So an INT_MAX
sndbuf is not enough.
>> Solve this issue by orphaning the frags before queuing the skb to a slow
>> qdisc (one without TCQ_F_CAN_BYPASS).
> Why does orphaning only the frags solve the problem? An skb without
> zerocopy frags should also be blocked for a while.

It's OK for non-zerocopy packets to be blocked, since VM1 thinks the
packets have been sent rather than left pending in the virtqueue. So VM1
can still send packets to other destinations.
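
The blocking described above can be illustrated with a toy userspace model. This is only a sketch under stated assumptions, not vhost or qdisc code: MAX_PENDING, struct toy_vhost, toy_xmit() and other_destination_reachable() are hypothetical names, and the model assumes a single cap on zerocopy buffers whose completion has not yet been signalled.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_PENDING 4	/* hypothetical cap on outstanding zerocopy buffers */

struct toy_vhost {
	int pending;	/* zerocopy buffers whose completion is not yet signalled */
};

/*
 * Try to transmit one packet.
 * held_by_qdisc: the packet will sit in a slow qdisc for a long time.
 * copied: the frags were copied (orphaned) before queuing, so the guest
 *         buffer can be completed immediately.
 */
static bool toy_xmit(struct toy_vhost *v, bool held_by_qdisc, bool copied)
{
	if (v->pending >= MAX_PENDING)
		return false;	/* guest stalls, whatever the destination */
	if (held_by_qdisc && !copied)
		v->pending++;	/* guest pages stay referenced while queued */
	return true;
}

/* Flood the shaped port, then try one packet to an unshaped port. */
static bool other_destination_reachable(bool copy_before_queue)
{
	struct toy_vhost v = { 0 };

	for (int i = 0; i < 100; i++)
		toy_xmit(&v, true, copy_before_queue);
	assert(v.pending <= MAX_PENDING);
	return toy_xmit(&v, false, false);
}
```

In this model, other_destination_reachable(false) returns false because the queued zerocopy packets use up every pending slot, while other_destination_reachable(true) returns true because copied packets never hold a slot.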
> Seriously, let's admit this zero copy stuff is utterly broken.
>
>
> TCQ_F_CAN_BYPASS is not enough. Some NICs have separate queues with
> strict priorities.
>

Yes, but that looks less serious than traffic shaping.
> It seems to me that you are pushing to use FIFO (the only qdisc setting
> TCQ_F_CAN_BYPASS), by adding yet another test in fast path (I do not
> know how we can still call it a fast path), while we already have smart
> qdisc to avoid the inherent HOL and unfairness problems of FIFO.
>

It is just a workaround, like the sndbuf case, until we have a better
solution. So it looks like using sfq or fq in the guest can mitigate the issue?
>> Cc: Michael S. Tsirkin <mst@redhat.com>
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>> ---
>>   net/core/dev.c | 7 +++++++
>>   1 file changed, 7 insertions(+)
>>
>> diff --git a/net/core/dev.c b/net/core/dev.c
>> index 0ce469e..1209774 100644
>> --- a/net/core/dev.c
>> +++ b/net/core/dev.c
>> @@ -2700,6 +2700,12 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
>>   	contended = qdisc_is_running(q);
>>   	if (unlikely(contended))
>>   		spin_lock(&q->busylock);
>> +	if (!(q->flags & TCQ_F_CAN_BYPASS) &&
>> +	    unlikely(skb_orphan_frags(skb, GFP_ATOMIC))) {
>> +		kfree_skb(skb);
>> +		rc = NET_XMIT_DROP;
>> +		goto out;
>> +	}
> Are you aware that copying stuff takes time ?
>
> If yes, why is it done after taking the busylock spinlock ?
>

Yes and it should be done outside the spinlock.
>>
>>   	spin_lock(root_lock);
>>   	if (unlikely(test_bit(__QDISC_STATE_DEACTIVATED, &q->state))) {
>> @@ -2739,6 +2745,7 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
>>   		}
>>   	}
>>   	spin_unlock(root_lock);
>> +out:
>>   	if (unlikely(contended))
>>   		spin_unlock(&q->busylock);
>>   	return rc;
>
>


* Re: ipv4_dst_destroy panic regression after 3.10.15
From: Eric Dumazet @ 2014-01-18  6:49 UTC (permalink / raw)
  To: dormando; +Cc: netdev, linux-kernel
In-Reply-To: <alpine.DEB.2.10.1401171720560.15759@dinf>

On Fri, 2014-01-17 at 17:25 -0800, dormando wrote:
> Hi,
> 
> Upgraded a few kernels to the latest 3.10 stable tree while tracking down
> a rare kernel panic, seems to have introduced a much more frequent kernel
> panic. Takes anywhere from 4 hours to 2 days to trigger:
> 
> <4>[196727.311203] general protection fault: 0000 [#1] SMP
> <4>[196727.311224] Modules linked in: xt_TEE xt_dscp xt_DSCP macvlan bridge coretemp crc32_pclmul ghash_clmulni_intel gpio_ich microcode ipmi_watchdog ipmi_devintf sb_edac edac_core lpc_ich mfd_core tpm_tis tpm tpm_bios ipmi_si ipmi_msghandler isci igb libsas i2c_algo_bit ixgbe ptp pps_core mdio
> <4>[196727.311333] CPU: 17 PID: 0 Comm: swapper/17 Not tainted 3.10.26 #1
> <4>[196727.311344] Hardware name: Supermicro X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.0 07/05/2013
> <4>[196727.311364] task: ffff885e6f069700 ti: ffff885e6f072000 task.ti: ffff885e6f072000
> <4>[196727.311377] RIP: 0010:[<ffffffff815f8c7f>]  [<ffffffff815f8c7f>] ipv4_dst_destroy+0x4f/0x80
> <4>[196727.311399] RSP: 0018:ffff885effd23a70  EFLAGS: 00010282
> <4>[196727.311409] RAX: dead000000200200 RBX: ffff8854c398ecc0 RCX: 0000000000000040
> <4>[196727.311423] RDX: dead000000100100 RSI: dead000000100100 RDI: dead000000200200
> <4>[196727.311437] RBP: ffff885effd23a80 R08: ffffffff815fd9e0 R09: ffff885d5a590800
> <4>[196727.311451] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> <4>[196727.311464] R13: ffffffff81c8c280 R14: 0000000000000000 R15: ffff880e85ee16ce
> <4>[196727.311510] FS:  0000000000000000(0000) GS:ffff885effd20000(0000) knlGS:0000000000000000
> <4>[196727.311554] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> <4>[196727.311581] CR2: 00007a46751eb000 CR3: 0000005e65688000 CR4: 00000000000407e0
> <4>[196727.311625] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> <4>[196727.311669] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> <4>[196727.311713] Stack:
> <4>[196727.311733]  ffff8854c398ecc0 ffff8854c398ecc0 ffff885effd23ab0 ffffffff815b7f42
> <4>[196727.311784]  ffff88be6595bc00 ffff8854c398ecc0 0000000000000000 ffff8854c398ecc0
> <4>[196727.311834]  ffff885effd23ad0 ffffffff815b86c6 ffff885d5a590800 ffff8816827821c0
> <4>[196727.311885] Call Trace:
> <4>[196727.311907]  <IRQ>
> <4>[196727.311912]  [<ffffffff815b7f42>] dst_destroy+0x32/0xe0
> <4>[196727.311959]  [<ffffffff815b86c6>] dst_release+0x56/0x80
> <4>[196727.311986]  [<ffffffff81620bd5>] tcp_v4_do_rcv+0x2a5/0x4a0
> <4>[196727.312013]  [<ffffffff81622b5a>] tcp_v4_rcv+0x7da/0x820
> <4>[196727.312041]  [<ffffffff815fd9e0>] ? ip_rcv_finish+0x360/0x360
> <4>[196727.312070]  [<ffffffff815de02d>] ? nf_hook_slow+0x7d/0x150
> <4>[196727.312097]  [<ffffffff815fd9e0>] ? ip_rcv_finish+0x360/0x360
> <4>[196727.312125]  [<ffffffff815fda92>] ip_local_deliver_finish+0xb2/0x230
> <4>[196727.312154]  [<ffffffff815fdd9a>] ip_local_deliver+0x4a/0x90
> <4>[196727.312183]  [<ffffffff815fd799>] ip_rcv_finish+0x119/0x360
> <4>[196727.312212]  [<ffffffff815fe00b>] ip_rcv+0x22b/0x340
> <4>[196727.312242]  [<ffffffffa0339680>] ? macvlan_broadcast+0x160/0x160 [macvlan]
> <4>[196727.312275]  [<ffffffff815b0c62>] __netif_receive_skb_core+0x512/0x640
> <4>[196727.312308]  [<ffffffff811427fb>] ? kmem_cache_alloc+0x13b/0x150
> <4>[196727.312338]  [<ffffffff815b0db1>] __netif_receive_skb+0x21/0x70
> <4>[196727.312368]  [<ffffffff815b0fa1>] netif_receive_skb+0x31/0xa0
> <4>[196727.312397]  [<ffffffff815b1ae8>] napi_gro_receive+0xe8/0x140
> <4>[196727.312433]  [<ffffffffa00274f1>] ixgbe_poll+0x551/0x11f0 [ixgbe]
> <4>[196727.312463]  [<ffffffff815fe00b>] ? ip_rcv+0x22b/0x340
> <4>[196727.312491]  [<ffffffff815b1691>] net_rx_action+0x111/0x210
> <4>[196727.312521]  [<ffffffff815b0db1>] ? __netif_receive_skb+0x21/0x70
> <4>[196727.312552]  [<ffffffff810519d0>] __do_softirq+0xd0/0x270
> <4>[196727.312583]  [<ffffffff816cef3c>] call_softirq+0x1c/0x30
> <4>[196727.312613]  [<ffffffff81004205>] do_softirq+0x55/0x90
> <4>[196727.312640]  [<ffffffff81051c85>] irq_exit+0x55/0x60
> <4>[196727.312668]  [<ffffffff816cf5c3>] do_IRQ+0x63/0xe0
> <4>[196727.312696]  [<ffffffff816c5aaa>] common_interrupt+0x6a/0x6a
> <4>[196727.312722]  <EOI>
> <4>[196727.312727]  [<ffffffff8100a150>] ? default_idle+0x20/0xe0
> <4>[196727.312775]  [<ffffffff8100a8ff>] arch_cpu_idle+0xf/0x20
> <4>[196727.312803]  [<ffffffff8108d330>] cpu_startup_entry+0xc0/0x270
> <4>[196727.312833]  [<ffffffff816b276e>] start_secondary+0x1f9/0x200
> <4>[196727.312860] Code: 4a 9f e9 81 e8 13 cb 0c 00 48 8b 93 b0 00 00 00 48 bf 00 02 20 00 00 00 ad de 48 8b 83 b8 00 00 00 48 be 00 01 10 00 00 00 ad de <48> 89 42 08 48 89 10 48 89 bb b8 00 00 00 48 c7 c7 4a 9f e9 81
> <1>[196727.313071] RIP  [<ffffffff815f8c7f>] ipv4_dst_destroy+0x4f/0x80
> <4>[196727.313100]  RSP <ffff885effd23a70>
> <4>[196727.313377] ---[ end trace 64b3f14fae0f2e29 ]---
> <0>[196727.380908] Kernel panic - not syncing: Fatal exception in interrupt
> 
> 
> ... bisecting is going to be a pain... I tried eyeballing the diffs and
> am trying a revert or two.
> 
> We've hit it in .25, .26 so far. I have .27 running but not sure if it
> crashed, so the change exists between .15 and .25.

Please try the following fix, thanks for the report!

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 25071b48921c..78a50a22298a 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1333,7 +1333,7 @@ static void ipv4_dst_destroy(struct dst_entry *dst)
 
 	if (!list_empty(&rt->rt_uncached)) {
 		spin_lock_bh(&rt_uncached_lock);
-		list_del(&rt->rt_uncached);
+		list_del_init(&rt->rt_uncached);
 		spin_unlock_bh(&rt_uncached_lock);
 	}
 }
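
The register dump is consistent with an already-unlinked entry being unlinked again: RDX and RAX hold dead000000100100 and dead000000200200, the poison values that list_del() leaves in a removed entry's next and prev on x86_64 kernels of this era. Below is a minimal userspace sketch of why list_del_init() makes the list_empty() guard safe against a second pass. It is not kernel code: the list helpers are simplified re-implementations, second_unlink_would_run() is a hypothetical name, and a 64-bit build is assumed so the poison constants fit in a pointer.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

/* Poison values as seen in the register dump (assumes 64-bit pointers). */
#define LIST_POISON1 ((struct list_head *)(uintptr_t)0xdead000000100100ULL)
#define LIST_POISON2 ((struct list_head *)(uintptr_t)0xdead000000200200ULL)

static void INIT_LIST_HEAD(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static int list_empty(const struct list_head *h)
{
	return h->next == h;
}

static void list_add(struct list_head *n, struct list_head *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

static void __list_del_entry(struct list_head *e)
{
	e->next->prev = e->prev;
	e->prev->next = e->next;
}

/* Unlink and poison: a later list_empty(e) is false. */
static void list_del(struct list_head *e)
{
	__list_del_entry(e);
	e->next = LIST_POISON1;
	e->prev = LIST_POISON2;
}

/* Unlink and reinitialise: a later list_empty(e) is true. */
static void list_del_init(struct list_head *e)
{
	__list_del_entry(e);
	INIT_LIST_HEAD(e);
}

/* Returns 1 if an "if (!list_empty(entry))" guard would unlink again. */
static int second_unlink_would_run(int use_del_init)
{
	struct list_head uncached, entry;

	INIT_LIST_HEAD(&uncached);
	INIT_LIST_HEAD(&entry);
	list_add(&entry, &uncached);

	if (use_del_init)
		list_del_init(&entry);
	else
		list_del(&entry);

	assert(list_empty(&uncached));	/* the list itself is empty again */
	return !list_empty(&entry);
}
```

With list_del() the helper returns 1, meaning the guard would let a second unlink dereference the poisoned pointers; with list_del_init() it returns 0 and the second pass is skipped.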


* Re: ipv4_dst_destroy panic regression after 3.10.15
From: Eric Dumazet @ 2014-01-18  7:09 UTC (permalink / raw)
  To: dormando; +Cc: netdev, linux-kernel, Alexei Starovoitov
In-Reply-To: <1390027758.31367.505.camel@edumazet-glaptop2.roam.corp.google.com>

On Fri, 2014-01-17 at 22:49 -0800, Eric Dumazet wrote:
> On Fri, 2014-01-17 at 17:25 -0800, dormando wrote:
> > Hi,
> > 
> > Upgraded a few kernels to the latest 3.10 stable tree while tracking down
> > a rare kernel panic, seems to have introduced a much more frequent kernel
> > panic. Takes anywhere from 4 hours to 2 days to trigger:
> > 
> > <4>[196727.311203] general protection fault: 0000 [#1] SMP
> > <4>[196727.311224] Modules linked in: xt_TEE xt_dscp xt_DSCP macvlan bridge coretemp crc32_pclmul ghash_clmulni_intel gpio_ich microcode ipmi_watchdog ipmi_devintf sb_edac edac_core lpc_ich mfd_core tpm_tis tpm tpm_bios ipmi_si ipmi_msghandler isci igb libsas i2c_algo_bit ixgbe ptp pps_core mdio
> > <4>[196727.311333] CPU: 17 PID: 0 Comm: swapper/17 Not tainted 3.10.26 #1
> > <4>[196727.311344] Hardware name: Supermicro X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.0 07/05/2013
> > <4>[196727.311364] task: ffff885e6f069700 ti: ffff885e6f072000 task.ti: ffff885e6f072000
> > <4>[196727.311377] RIP: 0010:[<ffffffff815f8c7f>]  [<ffffffff815f8c7f>] ipv4_dst_destroy+0x4f/0x80
> > <4>[196727.311399] RSP: 0018:ffff885effd23a70  EFLAGS: 00010282
> > <4>[196727.311409] RAX: dead000000200200 RBX: ffff8854c398ecc0 RCX: 0000000000000040
> > <4>[196727.311423] RDX: dead000000100100 RSI: dead000000100100 RDI: dead000000200200
> > <4>[196727.311437] RBP: ffff885effd23a80 R08: ffffffff815fd9e0 R09: ffff885d5a590800
> > <4>[196727.311451] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> > <4>[196727.311464] R13: ffffffff81c8c280 R14: 0000000000000000 R15: ffff880e85ee16ce
> > <4>[196727.311510] FS:  0000000000000000(0000) GS:ffff885effd20000(0000) knlGS:0000000000000000
> > <4>[196727.311554] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > <4>[196727.311581] CR2: 00007a46751eb000 CR3: 0000005e65688000 CR4: 00000000000407e0
> > <4>[196727.311625] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > <4>[196727.311669] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> > <4>[196727.311713] Stack:
> > <4>[196727.311733]  ffff8854c398ecc0 ffff8854c398ecc0 ffff885effd23ab0 ffffffff815b7f42
> > <4>[196727.311784]  ffff88be6595bc00 ffff8854c398ecc0 0000000000000000 ffff8854c398ecc0
> > <4>[196727.311834]  ffff885effd23ad0 ffffffff815b86c6 ffff885d5a590800 ffff8816827821c0
> > <4>[196727.311885] Call Trace:
> > <4>[196727.311907]  <IRQ>
> > <4>[196727.311912]  [<ffffffff815b7f42>] dst_destroy+0x32/0xe0
> > <4>[196727.311959]  [<ffffffff815b86c6>] dst_release+0x56/0x80
> > <4>[196727.311986]  [<ffffffff81620bd5>] tcp_v4_do_rcv+0x2a5/0x4a0
> > <4>[196727.312013]  [<ffffffff81622b5a>] tcp_v4_rcv+0x7da/0x820
> > <4>[196727.312041]  [<ffffffff815fd9e0>] ? ip_rcv_finish+0x360/0x360
> > <4>[196727.312070]  [<ffffffff815de02d>] ? nf_hook_slow+0x7d/0x150
> > <4>[196727.312097]  [<ffffffff815fd9e0>] ? ip_rcv_finish+0x360/0x360
> > <4>[196727.312125]  [<ffffffff815fda92>] ip_local_deliver_finish+0xb2/0x230
> > <4>[196727.312154]  [<ffffffff815fdd9a>] ip_local_deliver+0x4a/0x90
> > <4>[196727.312183]  [<ffffffff815fd799>] ip_rcv_finish+0x119/0x360
> > <4>[196727.312212]  [<ffffffff815fe00b>] ip_rcv+0x22b/0x340
> > <4>[196727.312242]  [<ffffffffa0339680>] ? macvlan_broadcast+0x160/0x160 [macvlan]
> > <4>[196727.312275]  [<ffffffff815b0c62>] __netif_receive_skb_core+0x512/0x640
> > <4>[196727.312308]  [<ffffffff811427fb>] ? kmem_cache_alloc+0x13b/0x150
> > <4>[196727.312338]  [<ffffffff815b0db1>] __netif_receive_skb+0x21/0x70
> > <4>[196727.312368]  [<ffffffff815b0fa1>] netif_receive_skb+0x31/0xa0
> > <4>[196727.312397]  [<ffffffff815b1ae8>] napi_gro_receive+0xe8/0x140
> > <4>[196727.312433]  [<ffffffffa00274f1>] ixgbe_poll+0x551/0x11f0 [ixgbe]
> > <4>[196727.312463]  [<ffffffff815fe00b>] ? ip_rcv+0x22b/0x340
> > <4>[196727.312491]  [<ffffffff815b1691>] net_rx_action+0x111/0x210
> > <4>[196727.312521]  [<ffffffff815b0db1>] ? __netif_receive_skb+0x21/0x70
> > <4>[196727.312552]  [<ffffffff810519d0>] __do_softirq+0xd0/0x270
> > <4>[196727.312583]  [<ffffffff816cef3c>] call_softirq+0x1c/0x30
> > <4>[196727.312613]  [<ffffffff81004205>] do_softirq+0x55/0x90
> > <4>[196727.312640]  [<ffffffff81051c85>] irq_exit+0x55/0x60
> > <4>[196727.312668]  [<ffffffff816cf5c3>] do_IRQ+0x63/0xe0
> > <4>[196727.312696]  [<ffffffff816c5aaa>] common_interrupt+0x6a/0x6a
> > <4>[196727.312722]  <EOI>
> > <4>[196727.312727]  [<ffffffff8100a150>] ? default_idle+0x20/0xe0
> > <4>[196727.312775]  [<ffffffff8100a8ff>] arch_cpu_idle+0xf/0x20
> > <4>[196727.312803]  [<ffffffff8108d330>] cpu_startup_entry+0xc0/0x270
> > <4>[196727.312833]  [<ffffffff816b276e>] start_secondary+0x1f9/0x200
> > <4>[196727.312860] Code: 4a 9f e9 81 e8 13 cb 0c 00 48 8b 93 b0 00 00 00 48 bf 00 02 20 00 00 00 ad de 48 8b 83 b8 00 00 00 48 be 00 01 10 00 00 00 ad de <48> 89 42 08 48 89 10 48 89 bb b8 00 00 00 48 c7 c7 4a 9f e9 81
> > <1>[196727.313071] RIP  [<ffffffff815f8c7f>] ipv4_dst_destroy+0x4f/0x80
> > <4>[196727.313100]  RSP <ffff885effd23a70>
> > <4>[196727.313377] ---[ end trace 64b3f14fae0f2e29 ]---
> > <0>[196727.380908] Kernel panic - not syncing: Fatal exception in interrupt
> > 
> > 
> > ... bisecting is going to be a pain... I tried eyeballing the diffs and
> > am trying a revert or two.
> > 
> > We've hit it in .25, .26 so far. I have .27 running but not sure if it
> > crashed, so the change exists between .15 and .25.
> 
> Please try the following fix, thanks for the report!
> 
> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> index 25071b48921c..78a50a22298a 100644
> --- a/net/ipv4/route.c
> +++ b/net/ipv4/route.c
> @@ -1333,7 +1333,7 @@ static void ipv4_dst_destroy(struct dst_entry *dst)
>  
>  	if (!list_empty(&rt->rt_uncached)) {
>  		spin_lock_bh(&rt_uncached_lock);
> -		list_del(&rt->rt_uncached);
> +		list_del_init(&rt->rt_uncached);
>  		spin_unlock_bh(&rt_uncached_lock);
>  	}
>  }
> 

The problem could come from this commit, in Linux 3.10.23;
you could also try reverting it:
 
commit 62713c4b6bc10c2d082ee1540e11b01a2b2162ab
Author: Alexei Starovoitov <ast@plumgrid.com>
Date:   Tue Nov 19 19:12:34 2013 -0800

    ipv4: fix race in concurrent ip_route_input_slow()
    
    [ Upstream commit dcdfdf56b4a6c9437fc37dbc9cee94a788f9b0c4 ]
    
    CPUs can ask for local route via ip_route_input_noref() concurrently.
    if nh_rth_input is not cached yet, CPUs will proceed to allocate
    equivalent DSTs on 'lo' and then will try to cache them in nh_rth_input
    via rt_cache_route()
    Most of the time they succeed, but on occasion the following two lines:
        orig = *p;
        prev = cmpxchg(p, orig, rt);
    in rt_cache_route() do race and one of the cpus fails to complete cmpxchg.
    But ip_route_input_slow() doesn't check the return code of rt_cache_route(),
    so dst is leaking. dst_destroy() is never called and 'lo' device
    refcnt doesn't go to zero, which can be seen in the logs as:
        unregister_netdevice: waiting for lo to become free. Usage count = 1
    Adding mdelay() between above two lines makes it easily reproducible.
    Fix it similar to nh_pcpu_rth_output case.
    
    Fixes: d2d68ba9fe8b ("ipv4: Cache input routes in fib_info nexthops.")
    Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
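
The race described in the commit message can be sketched in userspace with C11 atomics. This is an illustration under assumptions, not the kernel's rt_cache_route(): struct route, cache_route(), install_or_track() and the uncached counter are hypothetical, and the two "CPUs" are simulated sequentially by taking both snapshots before either compare-and-exchange runs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct route {
	int id;
};

/* Hypothetical per-caller bookkeeping for routes that were not cached. */
struct caller_state {
	int uncached;
};

/*
 * Mirrors the "orig = *p; prev = cmpxchg(p, orig, rt);" pattern.
 * Returns true only if this caller's route was installed in the slot.
 */
static bool cache_route(_Atomic(struct route *) *slot,
			struct route *snapshot, struct route *rt)
{
	return atomic_compare_exchange_strong(slot, &snapshot, rt);
}

/* Check the result: a caller that loses the race keeps ownership of rt. */
static void install_or_track(_Atomic(struct route *) *slot,
			     struct route *snapshot, struct route *rt,
			     struct caller_state *st)
{
	if (!cache_route(slot, snapshot, rt))
		st->uncached++;	/* must be tracked and released by the caller */
}

/* Returns how many routes lost the race and needed separate tracking. */
static int simulate_race(void)
{
	_Atomic(struct route *) slot = NULL;
	struct route a = { 1 }, b = { 2 };
	struct caller_state st = { 0 };

	/* Both callers read the empty slot before either one publishes. */
	struct route *snap_a = atomic_load(&slot);
	struct route *snap_b = atomic_load(&slot);

	install_or_track(&slot, snap_a, &a, &st);
	install_or_track(&slot, snap_b, &b, &st);

	assert(atomic_load(&slot) == &a);	/* first publisher wins */
	return st.uncached;
}
```

Here simulate_race() returns 1: the second caller's compare-and-exchange fails, and ignoring that result is what would leave its route unreferenced by anything that could release it.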


* Re: ipv4_dst_destroy panic regression after 3.10.15
From: dormando @ 2014-01-18  7:16 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, linux-kernel, Alexei Starovoitov
In-Reply-To: <1390028976.31367.512.camel@edumazet-glaptop2.roam.corp.google.com>

> On Fri, 2014-01-17 at 22:49 -0800, Eric Dumazet wrote:
> > On Fri, 2014-01-17 at 17:25 -0800, dormando wrote:
> > > Hi,
> > >
> > > Upgraded a few kernels to the latest 3.10 stable tree while tracking down
> > > a rare kernel panic, seems to have introduced a much more frequent kernel
> > > panic. Takes anywhere from 4 hours to 2 days to trigger:
> > >
> > > <4>[196727.311203] general protection fault: 0000 [#1] SMP
> > > <4>[196727.311224] Modules linked in: xt_TEE xt_dscp xt_DSCP macvlan bridge coretemp crc32_pclmul ghash_clmulni_intel gpio_ich microcode ipmi_watchdog ipmi_devintf sb_edac edac_core lpc_ich mfd_core tpm_tis tpm tpm_bios ipmi_si ipmi_msghandler isci igb libsas i2c_algo_bit ixgbe ptp pps_core mdio
> > > <4>[196727.311333] CPU: 17 PID: 0 Comm: swapper/17 Not tainted 3.10.26 #1
> > > <4>[196727.311344] Hardware name: Supermicro X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.0 07/05/2013
> > > <4>[196727.311364] task: ffff885e6f069700 ti: ffff885e6f072000 task.ti: ffff885e6f072000
> > > <4>[196727.311377] RIP: 0010:[<ffffffff815f8c7f>]  [<ffffffff815f8c7f>] ipv4_dst_destroy+0x4f/0x80
> > > <4>[196727.311399] RSP: 0018:ffff885effd23a70  EFLAGS: 00010282
> > > <4>[196727.311409] RAX: dead000000200200 RBX: ffff8854c398ecc0 RCX: 0000000000000040
> > > <4>[196727.311423] RDX: dead000000100100 RSI: dead000000100100 RDI: dead000000200200
> > > <4>[196727.311437] RBP: ffff885effd23a80 R08: ffffffff815fd9e0 R09: ffff885d5a590800
> > > <4>[196727.311451] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> > > <4>[196727.311464] R13: ffffffff81c8c280 R14: 0000000000000000 R15: ffff880e85ee16ce
> > > <4>[196727.311510] FS:  0000000000000000(0000) GS:ffff885effd20000(0000) knlGS:0000000000000000
> > > <4>[196727.311554] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > <4>[196727.311581] CR2: 00007a46751eb000 CR3: 0000005e65688000 CR4: 00000000000407e0
> > > <4>[196727.311625] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > <4>[196727.311669] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> > > <4>[196727.311713] Stack:
> > > <4>[196727.311733]  ffff8854c398ecc0 ffff8854c398ecc0 ffff885effd23ab0 ffffffff815b7f42
> > > <4>[196727.311784]  ffff88be6595bc00 ffff8854c398ecc0 0000000000000000 ffff8854c398ecc0
> > > <4>[196727.311834]  ffff885effd23ad0 ffffffff815b86c6 ffff885d5a590800 ffff8816827821c0
> > > <4>[196727.311885] Call Trace:
> > > <4>[196727.311907]  <IRQ>
> > > <4>[196727.311912]  [<ffffffff815b7f42>] dst_destroy+0x32/0xe0
> > > <4>[196727.311959]  [<ffffffff815b86c6>] dst_release+0x56/0x80
> > > <4>[196727.311986]  [<ffffffff81620bd5>] tcp_v4_do_rcv+0x2a5/0x4a0
> > > <4>[196727.312013]  [<ffffffff81622b5a>] tcp_v4_rcv+0x7da/0x820
> > > <4>[196727.312041]  [<ffffffff815fd9e0>] ? ip_rcv_finish+0x360/0x360
> > > <4>[196727.312070]  [<ffffffff815de02d>] ? nf_hook_slow+0x7d/0x150
> > > <4>[196727.312097]  [<ffffffff815fd9e0>] ? ip_rcv_finish+0x360/0x360
> > > <4>[196727.312125]  [<ffffffff815fda92>] ip_local_deliver_finish+0xb2/0x230
> > > <4>[196727.312154]  [<ffffffff815fdd9a>] ip_local_deliver+0x4a/0x90
> > > <4>[196727.312183]  [<ffffffff815fd799>] ip_rcv_finish+0x119/0x360
> > > <4>[196727.312212]  [<ffffffff815fe00b>] ip_rcv+0x22b/0x340
> > > <4>[196727.312242]  [<ffffffffa0339680>] ? macvlan_broadcast+0x160/0x160 [macvlan]
> > > <4>[196727.312275]  [<ffffffff815b0c62>] __netif_receive_skb_core+0x512/0x640
> > > <4>[196727.312308]  [<ffffffff811427fb>] ? kmem_cache_alloc+0x13b/0x150
> > > <4>[196727.312338]  [<ffffffff815b0db1>] __netif_receive_skb+0x21/0x70
> > > <4>[196727.312368]  [<ffffffff815b0fa1>] netif_receive_skb+0x31/0xa0
> > > <4>[196727.312397]  [<ffffffff815b1ae8>] napi_gro_receive+0xe8/0x140
> > > <4>[196727.312433]  [<ffffffffa00274f1>] ixgbe_poll+0x551/0x11f0 [ixgbe]
> > > <4>[196727.312463]  [<ffffffff815fe00b>] ? ip_rcv+0x22b/0x340
> > > <4>[196727.312491]  [<ffffffff815b1691>] net_rx_action+0x111/0x210
> > > <4>[196727.312521]  [<ffffffff815b0db1>] ? __netif_receive_skb+0x21/0x70
> > > <4>[196727.312552]  [<ffffffff810519d0>] __do_softirq+0xd0/0x270
> > > <4>[196727.312583]  [<ffffffff816cef3c>] call_softirq+0x1c/0x30
> > > <4>[196727.312613]  [<ffffffff81004205>] do_softirq+0x55/0x90
> > > <4>[196727.312640]  [<ffffffff81051c85>] irq_exit+0x55/0x60
> > > <4>[196727.312668]  [<ffffffff816cf5c3>] do_IRQ+0x63/0xe0
> > > <4>[196727.312696]  [<ffffffff816c5aaa>] common_interrupt+0x6a/0x6a
> > > <4>[196727.312722]  <EOI>
> > > <4>[196727.312727]  [<ffffffff8100a150>] ? default_idle+0x20/0xe0
> > > <4>[196727.312775]  [<ffffffff8100a8ff>] arch_cpu_idle+0xf/0x20
> > > <4>[196727.312803]  [<ffffffff8108d330>] cpu_startup_entry+0xc0/0x270
> > > <4>[196727.312833]  [<ffffffff816b276e>] start_secondary+0x1f9/0x200
> > > <4>[196727.312860] Code: 4a 9f e9 81 e8 13 cb 0c 00 48 8b 93 b0 00 00 00 48 bf 00 02 20 00 00 00 ad de 48 8b 83 b8 00 00 00 48 be 00 01 10 00 00 00 ad de <48> 89 42 08 48 89 10 48 89 bb b8 00 00 00 48 c7 c7 4a 9f e9 81
> > > <1>[196727.313071] RIP  [<ffffffff815f8c7f>] ipv4_dst_destroy+0x4f/0x80
> > > <4>[196727.313100]  RSP <ffff885effd23a70>
> > > <4>[196727.313377] ---[ end trace 64b3f14fae0f2e29 ]---
> > > <0>[196727.380908] Kernel panic - not syncing: Fatal exception in interrupt
> > >
> > >
> > > ... bisecting is going to be a pain... I tried eyeballing the diffs and
> > > am trying a revert or two.
> > >
> > > We've hit it in .25, .26 so far. I have .27 running but not sure if it
> > > crashed, so the change exists between .15 and .25.
> >
> > Please try the following fix, thanks for the report!
> >
> > diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> > index 25071b48921c..78a50a22298a 100644
> > --- a/net/ipv4/route.c
> > +++ b/net/ipv4/route.c
> > @@ -1333,7 +1333,7 @@ static void ipv4_dst_destroy(struct dst_entry *dst)
> >
> >  	if (!list_empty(&rt->rt_uncached)) {
> >  		spin_lock_bh(&rt_uncached_lock);
> > -		list_del(&rt->rt_uncached);
> > +		list_del_init(&rt->rt_uncached);
> >  		spin_unlock_bh(&rt_uncached_lock);
> >  	}
> >  }
> >
>
> The problem could come from this commit, in Linux 3.10.23;
> you could also try reverting it:
>
> commit 62713c4b6bc10c2d082ee1540e11b01a2b2162ab
> Author: Alexei Starovoitov <ast@plumgrid.com>
> Date:   Tue Nov 19 19:12:34 2013 -0800
>
>     ipv4: fix race in concurrent ip_route_input_slow()
>
>     [ Upstream commit dcdfdf56b4a6c9437fc37dbc9cee94a788f9b0c4 ]
>
>     CPUs can ask for local route via ip_route_input_noref() concurrently.
>     if nh_rth_input is not cached yet, CPUs will proceed to allocate
>     equivalent DSTs on 'lo' and then will try to cache them in nh_rth_input
>     via rt_cache_route()
>     Most of the time they succeed, but on occasion the following two lines:
>         orig = *p;
>         prev = cmpxchg(p, orig, rt);
>     in rt_cache_route() do race and one of the cpus fails to complete cmpxchg.
>     But ip_route_input_slow() doesn't check the return code of rt_cache_route(),
>     so dst is leaking. dst_destroy() is never called and 'lo' device
>     refcnt doesn't go to zero, which can be seen in the logs as:
>         unregister_netdevice: waiting for lo to become free. Usage count = 1
>     Adding mdelay() between above two lines makes it easily reproducible.
>     Fix it similar to nh_pcpu_rth_output case.
>
>     Fixes: d2d68ba9fe8b ("ipv4: Cache input routes in fib_info nexthops.")
>     Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
>     Signed-off-by: David S. Miller <davem@davemloft.net>
>     Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>

Heh. I spent an hour squinting at the difflog from .15 to .25 and this was
my best guess. I have a kernel running in production with only this
reverted as of ~5 hours ago. Won't know if it helps for a day or two.

I'm building a kernel now with your route patch, but 62713c4 *not*
reverted, which I can throw on a different machine. Does this sound like a
good idea?

Thanks for your quick help as always!


* [PATCH net-next v3 0/2] bonding: fix primary problem for bonding
From: Ding Tianhong @ 2014-01-18  8:28 UTC (permalink / raw)
  To: Jay Vosburgh, Veaceslav Falico, Netdev, David S. Miller

If a slave's name changes and the bond's primary parameter is set,
the bond should handle the situation in one of two ways:

1) If the slave was the primary slave, clear the primary slave
   and reselect the active slave.
2) If the slave's new name matches the bond's primary, set the slave
   as the primary slave and reselect the active slave.

If the new primary does not match any slave in the bond, the bond should
record it in params, clear the primary slave and select a new active slave.

Update bonding.txt for the primary description.

v2.1->v1: Rewrote the logic for updating the primary slave because there
	  was too much indentation and needless verification.
	  Fixed typos in some comments.

v3->v2.1: Veaceslav disagreed with the first patch
	  (bonding: update the primary slave when changing slave's name),
	  modified its logic and resent it himself
	  (bonding: handle slave's name change with primary_slave logic),
	  so drop the first patch and send only the last two patches.

Ding Tianhong (2):
  bonding: clean the primary slave if there is no slave matching new
    primary
  bonding: update bonding.txt for primary description.

 Documentation/networking/bonding.txt | 3 ++-
 drivers/net/bonding/bond_options.c   | 6 ++++++
 2 files changed, 8 insertions(+), 1 deletion(-)

-- 
1.8.0


* [PATCH net-next v3 1/2] bonding: clean the primary slave if there is no slave matching new primary
From: Ding Tianhong @ 2014-01-18  8:28 UTC (permalink / raw)
  To: Jay Vosburgh, Veaceslav Falico, David S. Miller, Netdev

If the new primary does not match any slave in the bond, the bond should
record it in params, clear the primary slave and select a new active slave.

Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
---
 drivers/net/bonding/bond_options.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c
index 945a666..0ee0bfe 100644
--- a/drivers/net/bonding/bond_options.c
+++ b/drivers/net/bonding/bond_options.c
@@ -512,6 +512,12 @@ int bond_option_primary_set(struct bonding *bond, const char *primary)
 		}
 	}

+	if (bond->primary_slave) {
+		pr_info("%s: Setting primary slave to None.\n",
+			bond->dev->name);
+		bond->primary_slave = NULL;
+		bond_select_active_slave(bond);
+	}
 	strncpy(bond->params.primary, primary, IFNAMSIZ);
 	bond->params.primary[IFNAMSIZ - 1] = 0;

-- 
1.8.0
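
The behaviour this patch aims for can be sketched with a toy userspace model. It is a simplification under stated assumptions, not the bonding driver: struct toy_bond, toy_primary_set() and MAX_SLAVES are hypothetical, the slave lookup loop is assumed rather than taken from the hunk, and the reselection of the active slave is omitted. Only the strncpy() plus explicit termination at the end follows the context lines of the diff.

```c
#include <assert.h>
#include <string.h>

#define IFNAMSIZ 16	/* Linux interface name buffer size */
#define MAX_SLAVES 4	/* hypothetical limit for this toy model */

struct toy_bond {
	const char *slaves[MAX_SLAVES];	/* names of enslaved devices */
	int nslaves;
	int primary_slave;		/* index into slaves[], -1 for none */
	char primary[IFNAMSIZ];		/* stands in for bond->params.primary */
};

static void toy_primary_set(struct toy_bond *bond, const char *primary)
{
	int match = -1;

	for (int i = 0; i < bond->nslaves; i++) {
		if (strncmp(bond->slaves[i], primary, IFNAMSIZ) == 0) {
			match = i;
			break;
		}
	}

	/* No match leaves -1 here, which drops any stale primary slave. */
	bond->primary_slave = match;

	/* Record the requested name even when no slave carries it yet. */
	strncpy(bond->primary, primary, IFNAMSIZ);
	bond->primary[IFNAMSIZ - 1] = 0;
}
```

Setting the primary to a name that no slave has clears primary_slave but still stores the name, so a slave that is later renamed or enslaved under that name could be picked up.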


* [PATCH net-next v3 2/2] bonding: update bonding.txt for primary description
From: Ding Tianhong @ 2014-01-18  8:28 UTC (permalink / raw)
  To: Jay Vosburgh, Veaceslav Falico, David S. Miller, Netdev

Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
---
 Documentation/networking/bonding.txt | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt
index a4d925e..5cdb229 100644
--- a/Documentation/networking/bonding.txt
+++ b/Documentation/networking/bonding.txt
@@ -657,7 +657,8 @@ primary
 	one slave is preferred over another, e.g., when one slave has
 	higher throughput than another.
 
-	The primary option is only valid for active-backup mode.
+	The primary option is only valid for active-backup (1),
+	balance-tlb (5) and balance-alb (6) modes.
 
 primary_reselect
 
-- 
1.8.0


* [PATCH net-next] bonding: move the netdev_add_tso_features() to bonding module
From: Ding Tianhong @ 2014-01-18  8:31 UTC (permalink / raw)
  To: Jay Vosburgh, Veaceslav Falico, Eric Dumazet, David S. Miller, Netdev

The function netdev_add_tso_features() is only used by bonding, so there
is no need to expose it in netdevice.h; move it to the bonding module.

Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
---
 drivers/net/bonding/bond_main.c | 12 +++++++++++-
 include/linux/netdevice.h       | 10 ----------
 2 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index e06c445..4cfe14e 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1045,6 +1045,16 @@ static void bond_netpoll_cleanup(struct net_device *bond_dev)
 
 /*---------------------------------- IOCTL ----------------------------------*/
 
+/* Allow TSO being used on stacked device:
+ * Performing the GSO segmentation before last device
+ * is a performance improvement.
+ */
+static netdev_features_t bond_add_tso_features(netdev_features_t features,
+					       netdev_features_t mask)
+{
+	return netdev_increment_features(features, NETIF_F_ALL_TSO, mask);
+}
+
 static netdev_features_t bond_fix_features(struct net_device *dev,
 					   netdev_features_t features)
 {
@@ -1068,7 +1078,7 @@ static netdev_features_t bond_fix_features(struct net_device *dev,
 						     slave->dev->features,
 						     mask);
 	}
-	features = netdev_add_tso_features(features, mask);
+	features = bond_add_tso_features(features, mask);
 
 	return features;
 }
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index a2a70cc..1be74ea 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3010,16 +3010,6 @@ static inline netdev_features_t netdev_get_wanted_features(
 netdev_features_t netdev_increment_features(netdev_features_t all,
 	netdev_features_t one, netdev_features_t mask);
 
-/* Allow TSO being used on stacked device :
- * Performing the GSO segmentation before last device
- * is a performance improvement.
- */
-static inline netdev_features_t netdev_add_tso_features(netdev_features_t features,
-							netdev_features_t mask)
-{
-	return netdev_increment_features(features, NETIF_F_ALL_TSO, mask);
-}
-
 int __netdev_update_features(struct net_device *dev);
 void netdev_update_features(struct net_device *dev);
 void netdev_change_features(struct net_device *dev);
-- 
1.8.0

^ permalink raw reply related

* kmem_cache_alloc panic in 3.10+
From: dormando @ 2014-01-18  8:44 UTC (permalink / raw)
  To: netdev, linux-kernel

Hello again!

We've had a rare crash that has existed from 3.10.0 through at least
3.10.15 (trying newer stables now, but I can't tell if it was fixed, and
it takes weeks to reproduce).

Unfortunately I can only get 8k back from pstore. The panic is a bit
longer than what gets caught in the log, but the bottom part is almost
always the same trace as this one:

Panic#6 Part1
<4>[1197485.199166]  [<ffffffff81611e8c>] tcp_push+0x6c/0x90
<4>[1197485.199171]  [<ffffffff816160a9>] tcp_sendmsg+0x109/0xd40
<4>[1197485.199179]  [<ffffffff81114b65>] ? put_page+0x35/0x40
<4>[1197485.199185]  [<ffffffff8163bf75>] inet_sendmsg+0x45/0xb0
<4>[1197485.199191]  [<ffffffff8159da7e>] sock_aio_write+0x11e/0x130
<4>[1197485.199196]  [<ffffffff8163b83f>] ? inet_recvmsg+0x4f/0x80
<4>[1197485.199203]  [<ffffffff811558ad>] do_sync_readv_writev+0x6d/0xa0
<4>[1197485.199209]  [<ffffffff8115722b>] do_readv_writev+0xfb/0x2f0
<4>[1197485.199215]  [<ffffffff8110fda5>] ? __free_pages+0x35/0x40
<4>[1197485.199220]  [<ffffffff8110fe56>] ? free_pages+0x46/0x50
<4>[1197485.199226]  [<ffffffff8112f9e2>] ? SyS_mincore+0x152/0x690
<4>[1197485.199231]  [<ffffffff81157468>] vfs_writev+0x48/0x60
<4>[1197485.199236]  [<ffffffff811575af>] SyS_writev+0x5f/0xd0
<4>[1197485.199243]  [<ffffffff816cf942>] system_call_fastpath+0x16/0x1b
<4>[1197485.199247] Code: 65 4c 03 04 25 c8 cb 00 00 49 8b 50 08 4d 8b 28 49 8b 40 10 4d 85 ed 0f 84 84 00 00 00 48 85 c0 74 7f 49 63 44 24 20 49 8b 3c 24 <49> 8b 5c 05 00 48 8d 4a 01 4c 89 e8 65 48 0f c7 0f 0f 94 c0 3c
<1>[1197485.199290] RIP  [<ffffffff811476da>] kmem_cache_alloc+0x5a/0x130
<4>[1197485.199296]  RSP <ffff883171211868>
<4>[1197485.199299] CR2: 0000000100000000
<4>[1197485.199343] ---[ end trace 90fee06aa40b7304 ]---
<1>[1197485.263911] BUG: unable to handle kernel paging request at 0000000100000000
<1>[1197485.263923] IP: [<ffffffff811476da>] kmem_cache_alloc+0x5a/0x130
<4>[1197485.263932] PGD 3f43e5c067 PUD 0
<4>[1197485.263937] Oops: 0000 [#5] SMP
<4>[1197485.263941] Modules linked in: ntfs vfat msdos fat macvlan bridge coretemp crc32_pclmul ghash_clmulni_intel gpio_ich microcode sb_edac edac_core lpc_ich mfd_core ixgbe igb i2c_algo_bit mdio ptp pps_core
<4>[1197485.263966] CPU: 0 PID: 233846 Comm: cache-worker Tainted: G      D      3.10.15 #1
<4>[1197485.263972] Hardware name: Supermicro X9DR3-F/X9DR3-F, BIOS 2.0a 03/07/2013
<4>[1197485.263976] task: ffff883427f9dc00 ti: ffff8830d4312000 task.ti: ffff8830d4312000
<4>[1197485.263982] RIP: 0010:[<ffffffff811476da>]  [<ffffffff811476da>] kmem_cache_alloc+0x5a/0x130
<4>[1197485.263990] RSP: 0018:ffff881fffc038c8  EFLAGS: 00010286
<4>[1197485.263994] RAX: 0000000000000000 RBX: ffffffff81c8c740 RCX: 00000000ffffffff
<4>[1197485.263999] RDX: 0000000029273024 RSI: 0000000000000020 RDI: 0000000000015680
<4>[1197485.264004] RBP: ffff881fffc03908 R08: ffff881fffc15680 R09: ffffffff815bdd4b
<4>[1197485.264009] R10: ffff881c65d21800 R11: 0000000000000000 R12: ffff881fff803800
<4>[1197485.264014] R13: 0000000100000000 R14: 00000000ffffffff R15: 0000000000000000
<4>[1197485.264019] FS:  00007f8d855eb700(0000) GS:ffff881fffc00000(0000) knlGS:0000000000000000
<4>[1197485.264024] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[1197485.264028] CR2: 0000000100000000 CR3: 000000308f258000 CR4: 00000000000407f0
<4>[1197485.264032] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>[1197485.264037] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>[1197485.264041] Stack:
<4>[1197485.264044]  ffff881fffc03928 00000020815d0d95 ffff881fffc03938 ffffffff81c8c740
<4>[1197485.264050]  ffff881fce210000 0000000000000001 00000000ffffffff 0000000000000000
<4>[1197485.264056]  ffff881fffc03958 ffffffff815bdd4b ffff881fffc039a8 0000000000000000
<4>[1197485.264063] Call Trace:
<4>[1197485.264066]  <IRQ>
<4>[1197485.264069]  [<ffffffff815bdd4b>] dst_alloc+0x5b/0x190
<4>[1197485.264080]  [<ffffffff8160068c>] rt_dst_alloc+0x4c/0x50
<4>[1197485.264085]  [<ffffffff81602a30>] __ip_route_output_key+0x270/0x880
<4>[1197485.264092]  [<ffffffff8107ee7e>] ? try_to_wake_up+0x23e/0x2b0
<4>[1197485.264097]  [<ffffffff81603067>] ip_route_output_flow+0x27/0x60
<4>[1197485.264102]  [<ffffffff8160ab8a>] ip_queue_xmit+0x36a/0x390
<4>[1197485.264108]  [<ffffffff816207c5>] tcp_transmit_skb+0x485/0x890
<4>[1197485.264113]  [<ffffffff81621aa1>] tcp_send_ack+0xf1/0x130
<4>[1197485.264118]  [<ffffffff81618d7e>] __tcp_ack_snd_check+0x5e/0xa0
<4>[1197485.264123]  [<ffffffff8161f2c2>] tcp_rcv_state_process+0x8b2/0xb20
<4>[1197485.264128]  [<ffffffff81627e61>] tcp_v4_do_rcv+0x191/0x4f0
<4>[1197485.264133]  [<ffffffff8162984c>] tcp_v4_rcv+0x5fc/0x750
<4>[1197485.264138]  [<ffffffff81604c80>] ? ip_rcv+0x350/0x350
<4>[1197485.264143]  [<ffffffff815e45cd>] ? nf_hook_slow+0x7d/0x160
<4>[1197485.264147]  [<ffffffff81604c80>] ? ip_rcv+0x350/0x350
<4>[1197485.264152]  [<ffffffff81604d4e>] ip_local_deliver_finish+0xce/0x250
<4>[1197485.264156]  [<ffffffff81604f1c>] ip_local_deliver+0x4c/0x80
<4>[1197485.264161]  [<ffffffff816045a9>] ip_rcv_finish+0x119/0x360
<4>[1197485.264165]  [<ffffffff81604b60>] ip_rcv+0x230/0x350
<4>[1197485.264170]  [<ffffffff815b89f7>] __netif_receive_skb_core+0x477/0x600
<4>[1197485.264175]  [<ffffffff815b8ba7>] __netif_receive_skb+0x27/0x70
<4>[1197485.264180]  [<ffffffff815b8ce4>] process_backlog+0xf4/0x1e0
<4>[1197485.264184]  [<ffffffff815b94e5>] net_rx_action+0xf5/0x250
<4>[1197485.264190]  [<ffffffff81053b7f>] __do_softirq+0xef/0x270
<4>[1197485.264196]  [<ffffffff816d0b7c>] call_softirq+0x1c/0x30
<4>[1197485.264199]  <EOI>
<4>[1197485.264201]  [<ffffffff81004495>] do_softirq+0x55/0x90
<4>[1197485.264209]  [<ffffffff81053a84>] local_bh_enable+0x94/0xa0
<4>[1197485.264215]  [<ffffffff8165567a>] ipt_do_table+0x22a/0x680
<4>[1197485.264221]  [<ffffffff815d39c1>] ? skb_clone_tx_timestamp+0x31/0x110
<4>[1197485.264231]  [<ffffffffa00ae840>] ? ixgbe_xmit_frame_ring+0x4c0/0xd40 [ixgbe]
<4>[1197485.264239]  [<ffffffffa00af103>] ? ixgbe_xmit_frame+0x43/0x90 [ixgbe]
<4>[1197485.264245]  [<ffffffff81657a23>] iptable_raw_hook+0x33/0x70
<4>[1197485.264252]  [<ffffffff815e43a7>] nf_iterate+0x87/0xb0
<4>[1197485.264256]  [<ffffffff81607e20>] ? ip_options_echo+0x420/0x420
<4>[1197485.264261]  [<ffffffff815e45cd>] nf_hook_slow+0x7d/0x160
<4>[1197485.264266]  [<ffffffff81607e20>] ? ip_options_echo+0x420/0x420
<4>[1197485.264270]  [<ffffffff8160a430>] __ip_local_out+0xa0/0xb0
<4>[1197485.264275]  [<ffffffff8160a456>] ip_local_out+0x16/0x30
<4>[1197485.264280]  [<ffffffff8160a97a>] ip_queue_xmit+0x15a/0x390
<4>[1197485.264286]  [<ffffffff81625e73>] ? tcp_v4_md5_lookup+0x13/0x20
<4>[1197485.264290]  [<ffffffff816207c5>] tcp_transmit_skb+0x485/0x890
<4>[1197485.264295]  [<ffffffff81622e08>] tcp_write_xmit+0x1b8/0xa50
<4>[1197485.264300]  [<ffffffff815a7e28>] ? __alloc_skb+0xa8/0x1f0
<4>[1197485.264304]  [<ffffffff816236d0>] tcp_push_one+0x30/0x40
<4>[1197485.264309]  [<ffffffff81616b84>] tcp_sendmsg+0xbe4/0xd40
<4>[1197485.264315]  [<ffffffff81114b65>] ? put_page+0x35/0x40
<4>[1197485.264321]  [<ffffffff8163bf75>] inet_sendmsg+0x45/0xb0
<4>[1197485.264326]  [<ffffffff8159da7e>] sock_aio_write+0x11e/0x130
<4>[1197485.264331]  [<ffffffff8163b83f>] ? inet_recvmsg+0x4f/0x80
<4>[1197485.264337]  [<ffffffff811558ad>] do_sync_readv_writev+0x6d/0xa0
<4>[1197485.264343]  [<ffffffff8115722b>] do_readv_writev+0xfb/0x2f0
<4>[1197485.264347]  [<ffffffff8110fda5>] ? __free_pages+0x35/0x40
<4>[1197485.264352]  [<ffffffff8110fe56>] ? free_pages+0x46/0x50
<4>[1197485.264357]  [<ffffffff8112f9e2>] ? SyS_mincore+0x152/0x690
<4>[1197485.264363]  [<ffffffff81157468>] vfs_writev+0x48/0x60
<4>[1197485.264367]  [<ffffffff811575af>] SyS_writev+0x5f/0xd0
<4>[1197485.264373]  [<ffffffff816cf942>] system_call_fastpath+0x16/0x1b
<4>[1197485.264377] Code: 65 4c 03 04 25 c8 cb 00 00 49 8b 50 08 4d 8b 28 49 8b 40 10 4d 85 ed 0f 84 84 00 00 00 48 85 c0 74 7f 49 63 44 24 20 49 8b 3c 24 <49> 8b 5c 05 00 48 8d 4a 01 4c 89 e8 65 48 0f c7 0f 0f 94 c0 3c
<1>[1197485.264417] RIP  [<ffffffff811476da>] kmem_cache_alloc+0x5a/0x130
<4>[1197485.264424]  RSP <ffff881fffc038c8>
<4>[1197485.264427] CR2: 0000000100000000
<4>[1197485.264431] ---[ end trace 90fee06aa40b7305 ]---
<0>[1197485.325141] Kernel panic - not syncing: Fatal exception in interrupt

... way down in the tcp code.

Any help would be appreciated :) I'll do what I can to help, but iterating
on this particular crash is very hard due to the amount of time it takes
to reproduce. Since we have a large number of machines, they're always
crashing here and there, but once one does, it's not going to happen again
for a while.

Thanks!
-Dormando

^ permalink raw reply

* Re: [PATCH v2 net] bpf: do not use reciprocal divide
From: Heiko Carstens @ 2014-01-18 10:12 UTC (permalink / raw)
  To: David Miller
  Cc: eric.dumazet, schwidefsky, hannes, netdev, dborkman, darkjames-ws,
	mgherzan, rmk+kernel, matt
In-Reply-To: <20140117.185600.1405505573912550580.davem@davemloft.net>

On Fri, Jan 17, 2014 at 06:56:00PM -0800, David Miller wrote:
> From: Heiko Carstens <heiko.carstens@de.ibm.com>
> Date: Fri, 17 Jan 2014 09:59:16 +0100
> 
> > Could you please also apply the patch below to your tree? It would only
> > generate a merge conflict, that would need fixing, if it would sit in the
> > s390 tree.
> 
> Applied and I queued it up for -stable so I can combine it with
> Eric's original change when I submit it to -stable.

Great, thank you!

^ permalink raw reply

* [PATCH net-next] sch_netem: replace magic numbers with enumerate
From: Yang Yingliang @ 2014-01-18 10:13 UTC (permalink / raw)
  To: netdev; +Cc: stephen, davem

Replace the magic numbers that describe the states of the 4-state model
loss generator with an enum.

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
---
 net/sched/sch_netem.c | 47 ++++++++++++++++++++++++++++-------------------
 1 file changed, 28 insertions(+), 19 deletions(-)

diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index 3019c10..a2bfc37 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -110,6 +110,13 @@ struct netem_sched_data {
 		CLG_GILB_ELL,
 	} loss_model;
 
+	enum {
+		TX_IN_GAP_PERIOD = 1,
+		TX_IN_BURST_PERIOD,
+		LOST_IN_GAP_PERIOD,
+		LOST_IN_BURST_PERIOD,
+	} _4_state_model;
+
 	/* Correlated Loss Generation models */
 	struct clgstate {
 		/* state of the Markov chain */
@@ -205,43 +212,45 @@ static bool loss_4state(struct netem_sched_data *q)
 	 * probabilities outgoing from the current state, then decides the
 	 * next state and if the next packet has to be transmitted or lost.
 	 * The four states correspond to:
-	 *   1 => successfully transmitted packets within a gap period
-	 *   4 => isolated losses within a gap period
-	 *   3 => lost packets within a burst period
-	 *   2 => successfully transmitted packets within a burst period
+	 *   TX_IN_GAP_PERIOD => successfully transmitted packets within a gap period
+	 *   LOST_IN_BURST_PERIOD => isolated losses within a gap period
+	 *   LOST_IN_GAP_PERIOD => lost packets within a burst period
+	 *   TX_IN_BURST_PERIOD => successfully transmitted packets within a burst period
 	 */
 	switch (clg->state) {
-	case 1:
+	case TX_IN_GAP_PERIOD:
 		if (rnd < clg->a4) {
-			clg->state = 4;
+			clg->state = LOST_IN_BURST_PERIOD;
 			return true;
 		} else if (clg->a4 < rnd && rnd < clg->a1 + clg->a4) {
-			clg->state = 3;
+			clg->state = LOST_IN_GAP_PERIOD;
 			return true;
-		} else if (clg->a1 + clg->a4 < rnd)
-			clg->state = 1;
+		} else if (clg->a1 + clg->a4 < rnd) {
+			clg->state = TX_IN_GAP_PERIOD;
+		}
 
 		break;
-	case 2:
+	case TX_IN_BURST_PERIOD:
 		if (rnd < clg->a5) {
-			clg->state = 3;
+			clg->state = LOST_IN_GAP_PERIOD;
 			return true;
-		} else
-			clg->state = 2;
+		} else {
+			clg->state = TX_IN_BURST_PERIOD;
+		}
 
 		break;
-	case 3:
+	case LOST_IN_GAP_PERIOD:
 		if (rnd < clg->a3)
-			clg->state = 2;
+			clg->state = TX_IN_BURST_PERIOD;
 		else if (clg->a3 < rnd && rnd < clg->a2 + clg->a3) {
-			clg->state = 1;
+			clg->state = TX_IN_GAP_PERIOD;
 		} else if (clg->a2 + clg->a3 < rnd) {
-			clg->state = 3;
+			clg->state = LOST_IN_GAP_PERIOD;
 			return true;
 		}
 		break;
-	case 4:
-		clg->state = 1;
+	case LOST_IN_BURST_PERIOD:
+		clg->state = TX_IN_GAP_PERIOD;
 		break;
 	}
 
-- 
1.8.0
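
For readers who want to step through the transitions outside the kernel, here is a minimal userspace sketch of the state machine the patch touches. It reuses the enum names and values exactly as posted; `struct clg_sketch` and `loss_4state_sketch()` are hypothetical stand-ins for `struct clgstate` and `loss_4state()`, and the random value is passed in as a parameter (an assumption made only so the transitions are deterministic).

```c
#include <stdbool.h>
#include <stdint.h>

/* Same names and values as the enum added by the patch. */
enum clg_4state {
	TX_IN_GAP_PERIOD = 1,
	TX_IN_BURST_PERIOD,
	LOST_IN_GAP_PERIOD,
	LOST_IN_BURST_PERIOD,
};

/* Hypothetical stand-in for struct clgstate: only the fields used here. */
struct clg_sketch {
	enum clg_4state state;
	uint32_t a1, a2, a3, a4, a5;	/* transition thresholds */
};

/* Returns true when the packet is lost. rnd is supplied by the caller
 * so that each transition can be exercised deterministically.
 */
static bool loss_4state_sketch(struct clg_sketch *clg, uint32_t rnd)
{
	switch (clg->state) {
	case TX_IN_GAP_PERIOD:
		if (rnd < clg->a4) {
			clg->state = LOST_IN_BURST_PERIOD;
			return true;
		} else if (clg->a4 < rnd && rnd < clg->a1 + clg->a4) {
			clg->state = LOST_IN_GAP_PERIOD;
			return true;
		} else if (clg->a1 + clg->a4 < rnd) {
			clg->state = TX_IN_GAP_PERIOD;
		}
		break;
	case TX_IN_BURST_PERIOD:
		if (rnd < clg->a5) {
			clg->state = LOST_IN_GAP_PERIOD;
			return true;
		} else {
			clg->state = TX_IN_BURST_PERIOD;
		}
		break;
	case LOST_IN_GAP_PERIOD:
		if (rnd < clg->a3) {
			clg->state = TX_IN_BURST_PERIOD;
		} else if (clg->a3 < rnd && rnd < clg->a2 + clg->a3) {
			clg->state = TX_IN_GAP_PERIOD;
		} else if (clg->a2 + clg->a3 < rnd) {
			clg->state = LOST_IN_GAP_PERIOD;
			return true;
		}
		break;
	case LOST_IN_BURST_PERIOD:
		clg->state = TX_IN_GAP_PERIOD;
		break;
	}

	return false;
}
```

Because the enum starts at 1 and keeps the original ordering, the numeric state values stored in the structure are the same before and after the patch.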

^ permalink raw reply related

* Re: [PATCH 1/6] cgroup: make CONFIG_NET_CLS_CGROUP and CONFIG_NETPRIO_CGROUP bool instead of tristate
From: Daniel Borkmann @ 2014-01-18 11:25 UTC (permalink / raw)
  To: Li Zefan
  Cc: Neil Horman, netdev,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Thomas Graf, Tejun Heo,
	cgroups-u79uwXL29TY76Z2rM5mHXA, David S. Miller
In-Reply-To: <52D9D421.6040608-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>

On 01/18/2014 02:08 AM, Li Zefan wrote:
> Cc: Daniel Borkmann <dborkman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> 
> On 2014/1/18 2:11, Tejun Heo wrote:
>> net_cls and net_prio are the only cgroups which are allowed to be
>> built as modules.  The savings from allowing the two controllers to be
>> built as modules are tiny especially given that cgroup module support
>> itself adds quite a bit of complexity.
>>
>> The following are the sizes of vmlinux with both built as module and
>> both built as part of the kernel image with cgroup module support
>> removed.
>>
>> 	text		data		bss		dec
>> 	20292207	2411496		10784768	33488471
>> 	20293421	2412568		10784768	33490757
>>
>> The total difference is 2286 bytes.  Given that none of other
>> controllers has much chance of being made a module and that we're
>> unlikely to add new modular controllers, the added complexity is
>> simply not justifiable.
>>
>> As a first step to drop cgroup module support, this patch changes the
>> two config options to bool from tristate and drops module related code
>> from the two controllers.
>>
> 
> I suggested that Daniel do this for net_cls, and the change is already in
> net-next.
> 
> https://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/commit/?id=fe1217c4f3f7d7cbf8efdd8dd5fdc7204a1d65a8
> 
> I was planning to remove module support after that change goes into
> upstream. :)

I am fine with that, thanks Li.

>> Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
>> Cc: Neil Horman <nhorman-2XuSBdqkA4R54TAoqtyWWQ@public.gmane.org>
>> Cc: Thomas Graf <tgraf-G/eBtMaohhA@public.gmane.org>
>> Cc: "David S. Miller" <davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
>> ---
>>   net/Kconfig               |  2 +-
>>   net/core/netprio_cgroup.c | 32 ++------------------------------
>>   net/sched/Kconfig         |  2 +-
>>   net/sched/cls_cgroup.c    | 23 ++---------------------
>>   4 files changed, 6 insertions(+), 53 deletions(-)
>>
> 
> The modular version of task_netprioidx() in include/net/netprio_cgroup.h
> can be removed.
> 

^ permalink raw reply

* [PATCH linux-next] net: batman-adv: use "__packed __aligned(2)" for each structure instead of "__packed(2)" region
From: Chen Gang @ 2014-01-18 11:31 UTC (permalink / raw)
  To: mareklindner-rVWd3aGhH2z5bpWLKbzFeg,
	sw-2YrNx6rUIHYiY0qSoAWiAoQuADTiUCJX,
	antonio-x4xJYDvStAgysxA8WJXlww
  Cc: David Miller, b.a.t.m.a.n-ZwoEplunGu2X36UT3dwllkB+6BGkLq7r,
	netdev, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-metag-u79uwXL29TY76Z2rM5mHXA, James Hogan

Unfortunately, not all compilers (e.g. metag) assume that the structures
within a pack region also need to be packed, so we need to add the packing
explicitly to satisfy all compilers.

The related error (under metag with allmodconfig):

    MODPOST 2952 modules
  ERROR: "__compiletime_assert_431" [net/batman-adv/batman-adv.ko] undefined!
  ERROR: "__compiletime_assert_432" [net/batman-adv/batman-adv.ko] undefined!
  ERROR: "__compiletime_assert_429" [net/batman-adv/batman-adv.ko] undefined!
  ERROR: "__compiletime_assert_428" [net/batman-adv/batman-adv.ko] undefined!
  ERROR: "__compiletime_assert_423" [net/batman-adv/batman-adv.ko] undefined!


Signed-off-by: Chen Gang <gang.chen.5i5j-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
---
 net/batman-adv/packet.h | 16 ++++++----------
 1 file changed, 6 insertions(+), 10 deletions(-)

diff --git a/net/batman-adv/packet.h b/net/batman-adv/packet.h
index 0a381d1..9206b48 100644
--- a/net/batman-adv/packet.h
+++ b/net/batman-adv/packet.h
@@ -154,7 +154,6 @@ enum batadv_tvlv_type {
 	BATADV_TVLV_ROAM	= 0x05,
 };
 
-#pragma pack(2)
 /* the destination hardware field in the ARP frame is used to
  * transport the claim type and the group id
  */
@@ -162,8 +161,7 @@ struct batadv_bla_claim_dst {
 	uint8_t magic[3];	/* FF:43:05 */
 	uint8_t type;		/* bla_claimframe */
 	__be16 group;		/* group id */
-};
-#pragma pack()
+} __packed __aligned(2);
 
 /**
  * struct batadv_ogm_packet - ogm (routing protocol) packet
@@ -281,7 +279,6 @@ struct batadv_icmp_packet_rr {
  * misalignment of the payload after the ethernet header. It may also lead to
  * leakage of information when the padding it not initialized before sending.
  */
-#pragma pack(2)
 
 /**
  * struct batadv_unicast_packet - unicast packet for network payload
@@ -300,7 +297,7 @@ struct batadv_unicast_packet {
 	/* "4 bytes boundary + 2 bytes" long to make the payload after the
 	 * following ethernet header again 4 bytes boundary aligned
 	 */
-};
+}  __packed __aligned(2);
 
 /**
  * struct batadv_unicast_4addr_packet - extended unicast packet
@@ -316,7 +313,7 @@ struct batadv_unicast_4addr_packet {
 	/* "4 bytes boundary + 2 bytes" long to make the payload after the
 	 * following ethernet header again 4 bytes boundary aligned
 	 */
-};
+}  __packed __aligned(2);
 
 /**
  * struct batadv_frag_packet - fragmented packet
@@ -347,7 +344,7 @@ struct batadv_frag_packet {
 	uint8_t orig[ETH_ALEN];
 	__be16  seqno;
 	__be16  total_size;
-};
+}  __packed __aligned(2);
 
 /**
  * struct batadv_bcast_packet - broadcast packet for network payload
@@ -368,7 +365,7 @@ struct batadv_bcast_packet {
 	/* "4 bytes boundary + 2 bytes" long to make the payload after the
 	 * following ethernet header again 4 bytes boundary aligned
 	 */
-};
+}  __packed __aligned(2);
 
 /**
  * struct batadv_coded_packet - network coded packet
@@ -404,9 +401,8 @@ struct batadv_coded_packet {
 	uint8_t  second_orig_dest[ETH_ALEN];
 	__be32   second_crc;
 	__be16   coded_len;
-};
+}  __packed __aligned(2);
 
-#pragma pack()
 
 /**
  * struct batadv_unicast_tvlv - generic unicast packet with tvlv payload
-- 
1.7.11.7

^ permalink raw reply related

* Re: [PATCH 3/7] staging,spear_adc: Add dependency on HAS_IOMEM
From: Jonathan Cameron @ 2014-01-18 11:41 UTC (permalink / raw)
  To: Richard Weinberger, kishon-l0cyMroinI0,
	anton-9xeibp6oKSgdnm+yROfE0A, dwmw2-wEGCiKHe2LqWVfeAwA7xHQ,
	richardcochran-Re5JQEeQqe8AvxtiuMwx3w,
	lidza.louina-Re5JQEeQqe8AvxtiuMwx3w,
	gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q
  Cc: sebastian.hesselbarth-Re5JQEeQqe8AvxtiuMwx3w,
	florian-p3rKhJxN3npAfugRpC6u6w,
	thomas.petazzoni-wi1+55ScJUtKEb57/3fJTNBPR1lH4CV8,
	lars-Qo5EllUWu/uELgA04lAiVw, marex-ynQEQJNshbs,
	acourbot-DDmLM1+adcrQT0dZR+AlfA, netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	driverdev-devel-tBiZLqfeLfOHmIFyCCdPziST3g8Odh+X,
	devel-gWbeCf7V1WCQmaza687I9mD2FQJk+8+b,
	linux-iio-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1389714345-20165-3-git-send-email-richard-/L3Ra7n9ekc@public.gmane.org>

On 14/01/14 15:45, Richard Weinberger wrote:
> On archs like S390 or um this driver can neither build nor work.
> Make it depend on HAS_IOMEM to bypass build failures.
>
> drivers/staging/iio/adc/spear_adc.c: In function ‘spear_adc_probe’:
> drivers/staging/iio/adc/spear_adc.c:393:2: error: implicit declaration of function ‘iounmap’ [-Werror=implicit-function-declaration
>
> Signed-off-by: Richard Weinberger <richard-/L3Ra7n9ekc@public.gmane.org>
Applied to the fixes-togreg branch of iio.git

Thanks,
> ---
>   drivers/staging/iio/adc/Kconfig | 1 +
>   1 file changed, 1 insertion(+)
>
> diff --git a/drivers/staging/iio/adc/Kconfig b/drivers/staging/iio/adc/Kconfig
> index e3d6430..7d5d675 100644
> --- a/drivers/staging/iio/adc/Kconfig
> +++ b/drivers/staging/iio/adc/Kconfig
> @@ -128,6 +128,7 @@ config MXS_LRADC
>   config SPEAR_ADC
>   	tristate "ST SPEAr ADC"
>   	depends on PLAT_SPEAR || COMPILE_TEST
> +	depends on HAS_IOMEM
>   	help
>   	  Say yes here to build support for the integrated ADC inside the
>   	  ST SPEAr SoC. Provides direct access via sysfs.
>

^ permalink raw reply

* Re: [PATCH 7/7] staging,lpc32xx_adc: Add dependency on HAS_IOMEM
From: Jonathan Cameron @ 2014-01-18 11:50 UTC (permalink / raw)
  To: Richard Weinberger, kishon, anton, dwmw2, richardcochran,
	lidza.louina, gregkh, davem
  Cc: marex, lars, linux-iio, netdev, driverdev-devel, linux-kernel,
	acourbot, devel, florian, sebastian.hesselbarth
In-Reply-To: <1389714345-20165-7-git-send-email-richard@nod.at>



On 14/01/14 15:45, Richard Weinberger wrote:
> On archs like S390 or um this driver can neither build nor work.
> Make it depend on HAS_IOMEM to bypass build failures.
>
> drivers/built-in.o: In function `lpc32xx_adc_probe':
> drivers/staging/iio/adc/lpc32xx_adc.c:149: undefined reference to `devm_ioremap'
>
> Signed-off-by: Richard Weinberger <richard@nod.at>
applied to the fixes-togreg branch of iio.git

Thanks,
> ---
>   drivers/staging/iio/adc/Kconfig | 1 +
>   1 file changed, 1 insertion(+)
>
> diff --git a/drivers/staging/iio/adc/Kconfig b/drivers/staging/iio/adc/Kconfig
> index 7d5d675..3633298 100644
> --- a/drivers/staging/iio/adc/Kconfig
> +++ b/drivers/staging/iio/adc/Kconfig
> @@ -103,6 +103,7 @@ config AD7280
>   config LPC32XX_ADC
>   	tristate "NXP LPC32XX ADC"
>   	depends on ARCH_LPC32XX || COMPILE_TEST
> +	depends on HAS_IOMEM
>   	help
>   	  Say yes here to build support for the integrated ADC inside the
>   	  LPC32XX SoC. Note that this feature uses the same hardware as the
>

^ permalink raw reply

* Re: [PATCH net-next] bonding: move the netdev_add_tso_features() to bonding module
From: Veaceslav Falico @ 2014-01-18 11:48 UTC (permalink / raw)
  To: Ding Tianhong; +Cc: Jay Vosburgh, Eric Dumazet, David S. Miller, Netdev
In-Reply-To: <52DA3BE5.5020500@huawei.com>

On Sat, Jan 18, 2014 at 04:31:33PM +0800, Ding Tianhong wrote:
>The function netdev_add_tso_features() is only used by bonding,
>so there is no need to export it in netdevice.h; move it to the bonding module.

Eric added it for a reason - like, other drivers might use it. Do you know
if team, bridge, vlan etc. might use it?

Thanks.

>
>Cc: Eric Dumazet <edumazet@google.com>
>Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
>---
> drivers/net/bonding/bond_main.c | 12 +++++++++++-
> include/linux/netdevice.h       | 10 ----------
> 2 files changed, 11 insertions(+), 11 deletions(-)
>
>diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>index e06c445..4cfe14e 100644
>--- a/drivers/net/bonding/bond_main.c
>+++ b/drivers/net/bonding/bond_main.c
>@@ -1045,6 +1045,16 @@ static void bond_netpoll_cleanup(struct net_device *bond_dev)
>
> /*---------------------------------- IOCTL ----------------------------------*/
>
>+/* Allow TSO being used on stacked device:
>+ * Performing the GSO segmentation before last device
>+ * is a performance improvement.
>+ */
>+static netdev_features_t bond_add_tso_features(netdev_features_t features,
>+					       netdev_features_t mask)
>+{
>+	return netdev_increment_features(features, NETIF_F_ALL_TSO, mask);
>+}
>+
> static netdev_features_t bond_fix_features(struct net_device *dev,
> 					   netdev_features_t features)
> {
>@@ -1068,7 +1078,7 @@ static netdev_features_t bond_fix_features(struct net_device *dev,
> 						     slave->dev->features,
> 						     mask);
> 	}
>-	features = netdev_add_tso_features(features, mask);
>+	features = bond_add_tso_features(features, mask);
>
> 	return features;
> }
>diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>index a2a70cc..1be74ea 100644
>--- a/include/linux/netdevice.h
>+++ b/include/linux/netdevice.h
>@@ -3010,16 +3010,6 @@ static inline netdev_features_t netdev_get_wanted_features(
> netdev_features_t netdev_increment_features(netdev_features_t all,
> 	netdev_features_t one, netdev_features_t mask);
>
>-/* Allow TSO being used on stacked device :
>- * Performing the GSO segmentation before last device
>- * is a performance improvement.
>- */
>-static inline netdev_features_t netdev_add_tso_features(netdev_features_t features,
>-							netdev_features_t mask)
>-{
>-	return netdev_increment_features(features, NETIF_F_ALL_TSO, mask);
>-}
>-
> int __netdev_update_features(struct net_device *dev);
> void netdev_update_features(struct net_device *dev);
> void netdev_change_features(struct net_device *dev);
>-- 
>1.8.0
>
>

^ permalink raw reply

* Re: [PATCH net-next] net: add build-time checks for msg->msg_name size
From: Hannes Frederic Sowa @ 2014-01-18 12:05 UTC (permalink / raw)
  To: Steffen Hurrle; +Cc: netdev
In-Reply-To: <20140117215314.GC7562@noise.didjital.de>

On Fri, Jan 17, 2014 at 10:53:15PM +0100, Steffen Hurrle wrote:
> This is a follow-up patch to f3d3342602f8bc ("net: rework recvmsg
> handler msg_name and msg_namelen logic").
> 
> DECLARE_SOCKADDR validates that the structure we use for writing the
> name information to is not larger than the buffer which is reserved
> for msg->msg_name (which is 128 bytes). Also use DECLARE_SOCKADDR
> consistently in sendmsg code paths.
> 
> Signed-off-by: Steffen Hurrle <steffen@hurrle.net>
> Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>

Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>

Thanks!
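
As an illustration of the kind of build-time check described in the quoted text, here is a minimal userspace sketch. `DECLARE_SOCKADDR_SKETCH` and `port_from_name()` are hypothetical names, not the kernel macro, and `struct sockaddr_storage` is assumed here as a stand-in for the buffer reserved for msg->msg_name.

```c
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical analogue of DECLARE_SOCKADDR: refuse to compile when the
 * pointed-to address type would not fit in the reserved name buffer,
 * then declare the typed pointer.
 */
#define DECLARE_SOCKADDR_SKETCH(type, dst, src)				\
	_Static_assert(sizeof(*(type)0) <= sizeof(struct sockaddr_storage), \
		       "address type larger than the msg_name buffer");	\
	type dst = (type)(src)

/* Example user: view an opaque name buffer as an IPv4 socket address. */
static in_port_t port_from_name(void *msg_name)
{
	DECLARE_SOCKADDR_SKETCH(struct sockaddr_in *, sin, msg_name);

	return sin->sin_port;	/* still in network byte order */
}
```

The point of this shape is that an oversized address structure becomes a compile error at the declaration site rather than an overrun at run time.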

^ permalink raw reply

* Re: [PATCH net-next] ipcomp: Convert struct xt_ipcomp spis into 16bits
From: Pablo Neira Ayuso @ 2014-01-18 12:24 UTC (permalink / raw)
  To: Fan Du; +Cc: steffen.klassert, davem, netdev, netfilter-devel
In-Reply-To: <1390011374-21760-1-git-send-email-fan.du@windriver.com>

On Sat, Jan 18, 2014 at 10:16:14AM +0800, Fan Du wrote:
> sparse warnings: (new ones prefixed by >>)
> 
> >> >> net/netfilter/xt_ipcomp.c:63:26: sparse: restricted __be16 degrades to integer
> >> >> net/netfilter/xt_ipcomp.c:63:26: sparse: cast to restricted __be32
> 
> Fix this by using a 16-bit spi, as the IPComp CPI is only 16 bits wide.
> 
> Signed-off-by: Fan Du <fan.du@windriver.com>
> ---
>  include/uapi/linux/netfilter/xt_ipcomp.h |    2 +-
>  net/netfilter/xt_ipcomp.c                |    4 ++--
>  2 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/include/uapi/linux/netfilter/xt_ipcomp.h b/include/uapi/linux/netfilter/xt_ipcomp.h
> index 45c7e40..ca82ebb 100644
> --- a/include/uapi/linux/netfilter/xt_ipcomp.h
> +++ b/include/uapi/linux/netfilter/xt_ipcomp.h
> @@ -4,7 +4,7 @@
>  #include <linux/types.h>
>  
>  struct xt_ipcomp {
> -	__u32 spis[2];	/* Security Parameter Index */
> +	__u16 spis[2];	/* Security Parameter Index */

This changes the binary interface, so it breaks userspace (iptables
needs to be recompiled). We're still in time to make such a change, as
this is net-next stuff, but what I understand from the patch
description is that this aims to fix a sparse warning, which makes it
a rather intrusive change.

Didn't you find any way to fix this without changing the layout of
xt_ipcomp?
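
One possible shape for such a fix, sketched in userspace and assuming that only the byte-order conversion needs to change: convert the 16-bit CPI with ntohs() and let it widen to 32 bits, so the bounds can stay 32-bit as in the current layout. `spi_match_sketch()` and `cpi_in_range()` are hypothetical names for illustration, not the netfilter code.

```c
#include <arpa/inet.h>
#include <stdbool.h>
#include <stdint.h>

/* Range check with 32-bit bounds, matching the existing spis[] width. */
static bool spi_match_sketch(uint32_t min, uint32_t max, uint32_t spi,
			     bool invert)
{
	bool r = (min <= spi && spi <= max);

	return r ^ invert;
}

/* cpi_be is the 16-bit CPI as it appears on the wire (network order).
 * ntohs() yields the host-order value, which then widens to 32 bits.
 */
static bool cpi_in_range(uint16_t cpi_be, uint32_t min, uint32_t max,
			 bool invert)
{
	return spi_match_sketch(min, max, ntohs(cpi_be), invert);
}
```

Using the 16-bit conversion on the 16-bit field keeps the result independent of host endianness.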

>  	__u8 invflags;	/* Inverse flags */
>  	__u8 hdrres;	/* Test of the Reserved Filed */
>  };
> diff --git a/net/netfilter/xt_ipcomp.c b/net/netfilter/xt_ipcomp.c
> index a4c7561..5542cb2 100644
> --- a/net/netfilter/xt_ipcomp.c
> +++ b/net/netfilter/xt_ipcomp.c
> @@ -29,7 +29,7 @@ MODULE_DESCRIPTION("Xtables: IPv4/6 IPsec-IPComp SPI match");
>  
>  /* Returns 1 if the spi is matched by the range, 0 otherwise */
>  static inline bool
> -spi_match(u_int32_t min, u_int32_t max, u_int32_t spi, bool invert)
> +spi_match(u_int16_t min, u_int16_t max, u_int16_t spi, bool invert)
>  {
>  	bool r;
>  	pr_debug("spi_match:%c 0x%x <= 0x%x <= 0x%x\n",
> @@ -60,7 +60,7 @@ static bool comp_mt(const struct sk_buff *skb, struct xt_action_param *par)
>  	}
>  
>  	return spi_match(compinfo->spis[0], compinfo->spis[1],
> -			 ntohl(chdr->cpi << 16),
> +			 ntohl(chdr->cpi),
>  			 !!(compinfo->invflags & XT_IPCOMP_INV_SPI));
>  }
>  
> -- 
> 1.7.9.5
> 

^ permalink raw reply

* Re: [PATCH linux-next] net: batman-adv: use "__packed __aligned(2)" for each structure instead of "__packed(2)" region
From: Antonio Quartulli @ 2014-01-18 13:03 UTC (permalink / raw)
  To: Chen Gang, David Miller
  Cc: James Hogan, mareklindner-rVWd3aGhH2z5bpWLKbzFeg, netdev,
	b.a.t.m.a.n-ZwoEplunGu2X36UT3dwllkB+6BGkLq7r,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-metag-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <52DA65F4.5070501-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

On 18/01/14 12:31, Chen Gang wrote:
> Unfortunately, not all compilers (e.g. metag) assume that the structures
> within a pack region also need to be packed, so we need to add the packing
> explicitly to satisfy all compilers.
> 
> The related error (under metag with allmodconfig):
> 
>     MODPOST 2952 modules
>   ERROR: "__compiletime_assert_431" [net/batman-adv/batman-adv.ko] undefined!
>   ERROR: "__compiletime_assert_432" [net/batman-adv/batman-adv.ko] undefined!
>   ERROR: "__compiletime_assert_429" [net/batman-adv/batman-adv.ko] undefined!
>   ERROR: "__compiletime_assert_428" [net/batman-adv/batman-adv.ko] undefined!
>   ERROR: "__compiletime_assert_423" [net/batman-adv/batman-adv.ko] undefined!
> 
> 
> Signed-off-by: Chen Gang <gang.chen.5i5j-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

David, what do you think about this change?


Can "__packed __aligned(2)" generate a different structure padding than
"#pragma pack(2)" ?

I am not really sure about the difference between the two. But if we
have the possibility that the padding may change then this patch should
go into net, otherwise we will have a protocol compatibility problem
between 3.13 and 3.14.
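
A small compile-time experiment can make the question concrete. The sketch below assumes GCC or Clang on a common target; `claim_dst_*` copies the field layout of batadv_bla_claim_dst, while `odd_*` is a hypothetical struct (not from batman-adv) chosen to show a case where the two forms lay out fields differently.

```c
#include <stddef.h>
#include <stdint.h>

#pragma pack(2)
/* Same fields as batadv_bla_claim_dst, inside a pack(2) region. */
struct claim_dst_pragma {
	uint8_t  magic[3];
	uint8_t  type;
	uint16_t group;
};

/* Hypothetical: a 32-bit field directly after a single byte. */
struct odd_pragma {
	uint8_t  tag;
	uint32_t value;
};
#pragma pack()

/* The same two structs using the attribute form from the patch. */
struct claim_dst_attr {
	uint8_t  magic[3];
	uint8_t  type;
	uint16_t group;
} __attribute__((packed, aligned(2)));

struct odd_attr {
	uint8_t  tag;
	uint32_t value;
} __attribute__((packed, aligned(2)));

/* Fields already sit on 2-byte boundaries: both forms must agree. */
_Static_assert(sizeof(struct claim_dst_pragma) ==
	       sizeof(struct claim_dst_attr), "size differs");
_Static_assert(offsetof(struct claim_dst_pragma, group) ==
	       offsetof(struct claim_dst_attr, group), "offset differs");

/* Nonzero when the two forms place 'value' at different offsets. */
static int odd_layouts_differ(void)
{
	return offsetof(struct odd_pragma, value) !=
	       offsetof(struct odd_attr, value);
}
```

In this sketch pack(2) caps member alignment at 2, whereas the packed attribute drops it to 1 and aligned(2) only raises the alignment of the struct as a whole. The two forms therefore agree whenever every field already falls on a 2-byte boundary, and adding the same kind of assertion for each of the structures touched by the patch would show whether that holds for them.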


Cheers,

> ---
>  net/batman-adv/packet.h | 16 ++++++----------
>  1 file changed, 6 insertions(+), 10 deletions(-)
> 
> diff --git a/net/batman-adv/packet.h b/net/batman-adv/packet.h
> index 0a381d1..9206b48 100644
> --- a/net/batman-adv/packet.h
> +++ b/net/batman-adv/packet.h
> @@ -154,7 +154,6 @@ enum batadv_tvlv_type {
>  	BATADV_TVLV_ROAM	= 0x05,
>  };
>  
> -#pragma pack(2)
>  /* the destination hardware field in the ARP frame is used to
>   * transport the claim type and the group id
>   */
> @@ -162,8 +161,7 @@ struct batadv_bla_claim_dst {
>  	uint8_t magic[3];	/* FF:43:05 */
>  	uint8_t type;		/* bla_claimframe */
>  	__be16 group;		/* group id */
> -};
> -#pragma pack()
> +} __packed __aligned(2);
>  
>  /**
>   * struct batadv_ogm_packet - ogm (routing protocol) packet
> @@ -281,7 +279,6 @@ struct batadv_icmp_packet_rr {
>   * misalignment of the payload after the ethernet header. It may also lead to
>   * leakage of information when the padding it not initialized before sending.
>   */
> -#pragma pack(2)
>  
>  /**
>   * struct batadv_unicast_packet - unicast packet for network payload
> @@ -300,7 +297,7 @@ struct batadv_unicast_packet {
>  	/* "4 bytes boundary + 2 bytes" long to make the payload after the
>  	 * following ethernet header again 4 bytes boundary aligned
>  	 */
> -};
> +}  __packed __aligned(2);
>  
>  /**
>   * struct batadv_unicast_4addr_packet - extended unicast packet
> @@ -316,7 +313,7 @@ struct batadv_unicast_4addr_packet {
>  	/* "4 bytes boundary + 2 bytes" long to make the payload after the
>  	 * following ethernet header again 4 bytes boundary aligned
>  	 */
> -};
> +}  __packed __aligned(2);
>  
>  /**
>   * struct batadv_frag_packet - fragmented packet
> @@ -347,7 +344,7 @@ struct batadv_frag_packet {
>  	uint8_t orig[ETH_ALEN];
>  	__be16  seqno;
>  	__be16  total_size;
> -};
> +}  __packed __aligned(2);
>  
>  /**
>   * struct batadv_bcast_packet - broadcast packet for network payload
> @@ -368,7 +365,7 @@ struct batadv_bcast_packet {
>  	/* "4 bytes boundary + 2 bytes" long to make the payload after the
>  	 * following ethernet header again 4 bytes boundary aligned
>  	 */
> -};
> +}  __packed __aligned(2);
>  
>  /**
>   * struct batadv_coded_packet - network coded packet
> @@ -404,9 +401,8 @@ struct batadv_coded_packet {
>  	uint8_t  second_orig_dest[ETH_ALEN];
>  	__be32   second_crc;
>  	__be16   coded_len;
> -};
> +}  __packed __aligned(2);
>  
> -#pragma pack()
>  
>  /**
>   * struct batadv_unicast_tvlv - generic unicast packet with tvlv payload
> 


-- 
Antonio Quartulli




* Re: [RFC PATCH net-next 3/3] virtio-net: Add accelerated RFS support
From: Ben Hutchings @ 2014-01-18 14:19 UTC (permalink / raw)
  To: Tom Herbert
  Cc: Zhi Yong Wu, Stefan Hajnoczi, Linux Netdev List, Eric Dumazet,
	David S. Miller, Zhi Yong Wu
In-Reply-To: <CA+mtBx8eRuWpYkYoPbuCaO1h0Y+g96zJB96zP17ZixOwZ1_gmQ@mail.gmail.com>

On Fri, 2014-01-17 at 20:59 -0800, Tom Herbert wrote:
> Ben,
> 
> I've never quite understood why flow management in aRFS has to be done
> with separate messages, and if I recall this seems to mitigate
> performance gains to a large extent. It seems like we should be able
> to piggyback on a TX descriptor for a connection information about the
> RX side for that connection, namely the rxhash and queue mapping.
> State creation should be implicit by just seeing a new rxhash value,
> tear down might be accomplished with a separate flag on the final TX
> packet on the connection (this would need some additional logic in the
> stack). Is this method not feasible in either NICs or virtio-net?

Well that's roughly how Flow Director works, isn't it?  So it is
feasible on at least one NIC!  It might be possible to implement
something like that in firmware on the SFC9100 (with the filter based on
the following packet headers, not a hash), but I don't know.  As for
other vendors - I have no idea.

Inserting filters from the receive path seemed like a natural extension
of the software RFS implementation.  And it means that the hardware
filters are inserted a little earlier (no need to transmit another
packet), but maybe that doesn't matter much.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


* [PATCH] net: remove unnecessary initializations in net_dev_init
From: Sabrina Dubroca @ 2014-01-18 15:04 UTC (permalink / raw)
  To: davem; +Cc: netdev, Sabrina Dubroca

softnet_data is set to 0 by memset, no need to initialize specific
fields to 0 or NULL afterwards.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
---
 net/core/dev.c | 9 ---------
 1 file changed, 9 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 288df62..b57b44a2 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -7000,25 +7000,16 @@ static int __init net_dev_init(void)
 		memset(sd, 0, sizeof(*sd));
 		skb_queue_head_init(&sd->input_pkt_queue);
 		skb_queue_head_init(&sd->process_queue);
-		sd->completion_queue = NULL;
 		INIT_LIST_HEAD(&sd->poll_list);
-		sd->output_queue = NULL;
 		sd->output_queue_tailp = &sd->output_queue;
 #ifdef CONFIG_RPS
 		sd->csd.func = rps_trigger_softirq;
 		sd->csd.info = sd;
-		sd->csd.flags = 0;
 		sd->cpu = i;
 #endif
 
 		sd->backlog.poll = process_backlog;
 		sd->backlog.weight = weight_p;
-		sd->backlog.gro_list = NULL;
-		sd->backlog.gro_count = 0;
-
-#ifdef CONFIG_NET_FLOW_LIMIT
-		sd->flow_limit = NULL;
-#endif
 	}
 
 	dev_boot_phase = 0;
-- 
1.8.5.3
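
The reasoning in the changelog above can be sketched in userspace C. The struct below is a hypothetical stand-in whose field names only loosely echo softnet_data; it is not the kernel definition. The sketch relies on an all-zero bit pattern being a null pointer, which holds on the platforms Linux supports but is an assumption beyond what ISO C guarantees.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a per-CPU structure (not struct softnet_data). */
struct toy_softnet {
	void		*completion_queue;
	void		*output_queue;
	void		**output_queue_tailp;
	unsigned int	flags;
	int		weight;
};

/*
 * Zero the whole structure once, then assign only the fields whose
 * initial value is not zero/NULL.
 */
void toy_softnet_init(struct toy_softnet *sd, int weight)
{
	memset(sd, 0, sizeof(*sd));
	sd->output_queue_tailp = &sd->output_queue;
	sd->weight = weight;
}

/* Returns 1 if every field left alone after memset() reads as zero/NULL. */
int toy_softnet_is_clean(const struct toy_softnet *sd)
{
	return sd->completion_queue == NULL &&
	       sd->output_queue == NULL &&
	       sd->flags == 0;
}
```

Filling the structure with a non-zero pattern before calling toy_softnet_init() and then checking toy_softnet_is_clean() shows that the explicit "= NULL" and "= 0" assignments would add nothing.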


* Re: kmem_cache_alloc panic in 3.10+
From: Eric Dumazet @ 2014-01-18 16:29 UTC (permalink / raw)
  To: dormando; +Cc: netdev, linux-kernel, Alexei Starovoitov
In-Reply-To: <alpine.DEB.2.10.1401180036020.18419@dinf>

On Sat, 2014-01-18 at 00:44 -0800, dormando wrote:
> Hello again!
> 
> We've had a rare crash that's existed between 3.10.0 and 3.10.15 at least
> (trying newer stables now, but I can't tell if it was fixed, and it takes
> weeks to reproduce).
> 
> Unfortunately I can only get 8k back from pstore. The panic looks a bit
> longer than that is caught in the log, but the bottom part is almost
> always this same trace as this one:
> 
> Panic#6 Part1
> <4>[1197485.199166]  [<ffffffff81611e8c>] tcp_push+0x6c/0x90
> <4>[1197485.199171]  [<ffffffff816160a9>] tcp_sendmsg+0x109/0xd40
> <4>[1197485.199179]  [<ffffffff81114b65>] ? put_page+0x35/0x40
> <4>[1197485.199185]  [<ffffffff8163bf75>] inet_sendmsg+0x45/0xb0
> <4>[1197485.199191]  [<ffffffff8159da7e>] sock_aio_write+0x11e/0x130
> <4>[1197485.199196]  [<ffffffff8163b83f>] ? inet_recvmsg+0x4f/0x80
> <4>[1197485.199203]  [<ffffffff811558ad>] do_sync_readv_writev+0x6d/0xa0
> <4>[1197485.199209]  [<ffffffff8115722b>] do_readv_writev+0xfb/0x2f0
> <4>[1197485.199215]  [<ffffffff8110fda5>] ? __free_pages+0x35/0x40
> <4>[1197485.199220]  [<ffffffff8110fe56>] ? free_pages+0x46/0x50
> <4>[1197485.199226]  [<ffffffff8112f9e2>] ? SyS_mincore+0x152/0x690
> <4>[1197485.199231]  [<ffffffff81157468>] vfs_writev+0x48/0x60
> <4>[1197485.199236]  [<ffffffff811575af>] SyS_writev+0x5f/0xd0
> <4>[1197485.199243]  [<ffffffff816cf942>] system_call_fastpath+0x16/0x1b
> <4>[1197485.199247] Code: 65 4c 03 04 25 c8 cb 00 00 49 8b 50 08 4d 8b 28 49 8b 40 10 4d 85 ed 0f 84 84 00 00 00 48 85 c0 74 7f 49 63 44 24 20 49 8b 3c 24 <49> 8b 5c 05 00 48 8d 4a 01 4c 89 e8 65 48 0f c7 0f 0f 94 c0 3c
> <1>[1197485.199290] RIP  [<ffffffff811476da>] kmem_cache_alloc+0x5a/0x130
> <4>[1197485.199296]  RSP <ffff883171211868>
> <4>[1197485.199299] CR2: 0000000100000000
> <4>[1197485.199343] ---[ end trace 90fee06aa40b7304 ]---
> <1>[1197485.263911] BUG: unable to handle kernel paging request at 0000000100000000
> <1>[1197485.263923] IP: [<ffffffff811476da>] kmem_cache_alloc+0x5a/0x130
> <4>[1197485.263932] PGD 3f43e5c067 PUD 0
> <4>[1197485.263937] Oops: 0000 [#5] SMP
> <4>[1197485.263941] Modules linked in: ntfs vfat msdos fat macvlan bridge coretemp crc32_pclmul ghash_clmulni_intel gpio_ich microcode sb_edac edac_core lpc_ich mfd_core ixgbe igb i2c_algo_bit mdio ptp pps_core
> <4>[1197485.263966] CPU: 0 PID: 233846 Comm: cache-worker Tainted: G      D      3.10.15 #1
> <4>[1197485.263972] Hardware name: Supermicro X9DR3-F/X9DR3-F, BIOS 2.0a 03/07/2013
> <4>[1197485.263976] task: ffff883427f9dc00 ti: ffff8830d4312000 task.ti: ffff8830d4312000
> <4>[1197485.263982] RIP: 0010:[<ffffffff811476da>]  [<ffffffff811476da>] kmem_cache_alloc+0x5a/0x130
> <4>[1197485.263990] RSP: 0018:ffff881fffc038c8  EFLAGS: 00010286
> <4>[1197485.263994] RAX: 0000000000000000 RBX: ffffffff81c8c740 RCX: 00000000ffffffff
> <4>[1197485.263999] RDX: 0000000029273024 RSI: 0000000000000020 RDI: 0000000000015680
> <4>[1197485.264004] RBP: ffff881fffc03908 R08: ffff881fffc15680 R09: ffffffff815bdd4b
> <4>[1197485.264009] R10: ffff881c65d21800 R11: 0000000000000000 R12: ffff881fff803800
> <4>[1197485.264014] R13: 0000000100000000 R14: 00000000ffffffff R15: 0000000000000000
> <4>[1197485.264019] FS:  00007f8d855eb700(0000) GS:ffff881fffc00000(0000) knlGS:0000000000000000
> <4>[1197485.264024] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> <4>[1197485.264028] CR2: 0000000100000000 CR3: 000000308f258000 CR4: 00000000000407f0
> <4>[1197485.264032] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> <4>[1197485.264037] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> <4>[1197485.264041] Stack:
> <4>[1197485.264044]  ffff881fffc03928 00000020815d0d95 ffff881fffc03938 ffffffff81c8c740
> <4>[1197485.264050]  ffff881fce210000 0000000000000001 00000000ffffffff 0000000000000000
> <4>[1197485.264056]  ffff881fffc03958 ffffffff815bdd4b ffff881fffc039a8 0000000000000000
> <4>[1197485.264063] Call Trace:
> <4>[1197485.264066]  <IRQ>
> <4>[1197485.264069]  [<ffffffff815bdd4b>] dst_alloc+0x5b/0x190
> <4>[1197485.264080]  [<ffffffff8160068c>] rt_dst_alloc+0x4c/0x50
> <4>[1197485.264085]  [<ffffffff81602a30>] __ip_route_output_key+0x270/0x880
> <4>[1197485.264092]  [<ffffffff8107ee7e>] ? try_to_wake_up+0x23e/0x2b0
> <4>[1197485.264097]  [<ffffffff81603067>] ip_route_output_flow+0x27/0x60
> <4>[1197485.264102]  [<ffffffff8160ab8a>] ip_queue_xmit+0x36a/0x390
> <4>[1197485.264108]  [<ffffffff816207c5>] tcp_transmit_skb+0x485/0x890
> <4>[1197485.264113]  [<ffffffff81621aa1>] tcp_send_ack+0xf1/0x130
> <4>[1197485.264118]  [<ffffffff81618d7e>] __tcp_ack_snd_check+0x5e/0xa0
> <4>[1197485.264123]  [<ffffffff8161f2c2>] tcp_rcv_state_process+0x8b2/0xb20
> <4>[1197485.264128]  [<ffffffff81627e61>] tcp_v4_do_rcv+0x191/0x4f0
> <4>[1197485.264133]  [<ffffffff8162984c>] tcp_v4_rcv+0x5fc/0x750
> <4>[1197485.264138]  [<ffffffff81604c80>] ? ip_rcv+0x350/0x350
> <4>[1197485.264143]  [<ffffffff815e45cd>] ? nf_hook_slow+0x7d/0x160
> <4>[1197485.264147]  [<ffffffff81604c80>] ? ip_rcv+0x350/0x350
> <4>[1197485.264152]  [<ffffffff81604d4e>] ip_local_deliver_finish+0xce/0x250
> <4>[1197485.264156]  [<ffffffff81604f1c>] ip_local_deliver+0x4c/0x80
> <4>[1197485.264161]  [<ffffffff816045a9>] ip_rcv_finish+0x119/0x360
> <4>[1197485.264165]  [<ffffffff81604b60>] ip_rcv+0x230/0x350
> <4>[1197485.264170]  [<ffffffff815b89f7>] __netif_receive_skb_core+0x477/0x600
> <4>[1197485.264175]  [<ffffffff815b8ba7>] __netif_receive_skb+0x27/0x70
> <4>[1197485.264180]  [<ffffffff815b8ce4>] process_backlog+0xf4/0x1e0
> <4>[1197485.264184]  [<ffffffff815b94e5>] net_rx_action+0xf5/0x250
> <4>[1197485.264190]  [<ffffffff81053b7f>] __do_softirq+0xef/0x270
> <4>[1197485.264196]  [<ffffffff816d0b7c>] call_softirq+0x1c/0x30
> <4>[1197485.264199]  <EOI>
> <4>[1197485.264201]  [<ffffffff81004495>] do_softirq+0x55/0x90
> <4>[1197485.264209]  [<ffffffff81053a84>] local_bh_enable+0x94/0xa0
> <4>[1197485.264215]  [<ffffffff8165567a>] ipt_do_table+0x22a/0x680
> <4>[1197485.264221]  [<ffffffff815d39c1>] ? skb_clone_tx_timestamp+0x31/0x110
> <4>[1197485.264231]  [<ffffffffa00ae840>] ? ixgbe_xmit_frame_ring+0x4c0/0xd40 [ixgbe]
> <4>[1197485.264239]  [<ffffffffa00af103>] ? ixgbe_xmit_frame+0x43/0x90 [ixgbe]
> <4>[1197485.264245]  [<ffffffff81657a23>] iptable_raw_hook+0x33/0x70
> <4>[1197485.264252]  [<ffffffff815e43a7>] nf_iterate+0x87/0xb0
> <4>[1197485.264256]  [<ffffffff81607e20>] ? ip_options_echo+0x420/0x420
> <4>[1197485.264261]  [<ffffffff815e45cd>] nf_hook_slow+0x7d/0x160
> <4>[1197485.264266]  [<ffffffff81607e20>] ? ip_options_echo+0x420/0x420
> <4>[1197485.264270]  [<ffffffff8160a430>] __ip_local_out+0xa0/0xb0
> <4>[1197485.264275]  [<ffffffff8160a456>] ip_local_out+0x16/0x30
> <4>[1197485.264280]  [<ffffffff8160a97a>] ip_queue_xmit+0x15a/0x390
> <4>[1197485.264286]  [<ffffffff81625e73>] ? tcp_v4_md5_lookup+0x13/0x20
> <4>[1197485.264290]  [<ffffffff816207c5>] tcp_transmit_skb+0x485/0x890
> <4>[1197485.264295]  [<ffffffff81622e08>] tcp_write_xmit+0x1b8/0xa50
> <4>[1197485.264300]  [<ffffffff815a7e28>] ? __alloc_skb+0xa8/0x1f0
> <4>[1197485.264304]  [<ffffffff816236d0>] tcp_push_one+0x30/0x40
> <4>[1197485.264309]  [<ffffffff81616b84>] tcp_sendmsg+0xbe4/0xd40
> <4>[1197485.264315]  [<ffffffff81114b65>] ? put_page+0x35/0x40
> <4>[1197485.264321]  [<ffffffff8163bf75>] inet_sendmsg+0x45/0xb0
> <4>[1197485.264326]  [<ffffffff8159da7e>] sock_aio_write+0x11e/0x130
> <4>[1197485.264331]  [<ffffffff8163b83f>] ? inet_recvmsg+0x4f/0x80
> <4>[1197485.264337]  [<ffffffff811558ad>] do_sync_readv_writev+0x6d/0xa0
> <4>[1197485.264343]  [<ffffffff8115722b>] do_readv_writev+0xfb/0x2f0
> <4>[1197485.264347]  [<ffffffff8110fda5>] ? __free_pages+0x35/0x40
> <4>[1197485.264352]  [<ffffffff8110fe56>] ? free_pages+0x46/0x50
> <4>[1197485.264357]  [<ffffffff8112f9e2>] ? SyS_mincore+0x152/0x690
> <4>[1197485.264363]  [<ffffffff81157468>] vfs_writev+0x48/0x60
> <4>[1197485.264367]  [<ffffffff811575af>] SyS_writev+0x5f/0xd0
> <4>[1197485.264373]  [<ffffffff816cf942>] system_call_fastpath+0x16/0x1b
> <4>[1197485.264377] Code: 65 4c 03 04 25 c8 cb 00 00 49 8b 50 08 4d 8b 28 49 8b 40 10 4d 85 ed 0f 84 84 00 00 00 48 85 c0 74 7f 49 63 44 24 20 49 8b 3c 24 <49> 8b 5c 05 00 48 8d 4a 01 4c 89 e8 65 48 0f c7 0f 0f 94 c0 3c
> <1>[1197485.264417] RIP  [<ffffffff811476da>] kmem_cache_alloc+0x5a/0x130
> <4>[1197485.264424]  RSP <ffff881fffc038c8>
> <4>[1197485.264427] CR2: 0000000100000000
> <4>[1197485.264431] ---[ end trace 90fee06aa40b7305 ]---
> <0>[1197485.325141] Kernel panic - not syncing: Fatal exception in interrupt
> 
> ... way down in the tcp code.
> 
> Any help would be appreciated :) I'll do what I can to help, but iterating
> this particular crash is very hard due to the amount of time it takes to
> reproduce. Since we have a large number of machines they're always
> crashing here and there, but once they do it's not going to happen again
> for a while.
> 
> Thanks!
> -Dormando
> --

Hmm...

Some dst seems to be destroyed twice. This likely screws up the slab allocator.
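
As a rough illustration of why a double free is a plausible suspect for a crash inside an allocation path, here is a toy free-list allocator. It is a deliberately simplified model and not the kernel's slab code; every name in it is hypothetical. Freeing the same object twice makes the free list point back at itself, so two later allocations hand out the same memory.

```c
#include <assert.h>
#include <stddef.h>

/* A free object stores the link to the next free object in its own memory. */
union toy_slot {
	union toy_slot	*next;
	unsigned char	payload[32];
};

struct toy_cache {
	union toy_slot	*freelist;
};

/* Push the object onto the free list; no double-free detection. */
void toy_free(struct toy_cache *c, void *obj)
{
	union toy_slot *s = obj;

	s->next = c->freelist;
	c->freelist = s;
}

/* Pop the head of the free list, or return NULL when it is empty. */
void *toy_alloc(struct toy_cache *c)
{
	union toy_slot *s = c->freelist;

	if (s)
		c->freelist = s->next;
	return s;
}

/* Returns 1 if two allocations alias after one object is freed twice. */
int toy_double_free_aliases(void)
{
	static union toy_slot slots[2];
	struct toy_cache c = { NULL };
	void *obj, *a, *b;

	toy_free(&c, &slots[0]);
	toy_free(&c, &slots[1]);	/* list: slots[1] -> slots[0] */

	obj = toy_alloc(&c);		/* takes slots[1] */
	assert(obj != NULL);
	toy_free(&c, obj);
	toy_free(&c, obj);		/* bug: slots[1] now links to itself */

	a = toy_alloc(&c);
	b = toy_alloc(&c);
	return a == b;
}
```

In this model the second owner overwrites the link word with its own data, and a later allocation follows that data as if it were a pointer. Whether anything like this happened in the reported trace is not established by the sketch.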

Please try the following untested patch:
diff --git a/include/net/route.h b/include/net/route.h
index 9d1f423d5944..bb96e0873eb5 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -314,4 +314,9 @@ static inline int ip4_dst_hoplimit(const struct dst_entry *dst)
 	return hoplimit;
 }
 
+static inline void rt_free(struct rtable *rt)
+{
+	call_rcu(&rt->dst.rcu_head, dst_rcu_free);
+}
+
 #endif	/* _ROUTE_H */
diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index b53f0bf84dca..97b43b09e037 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -152,7 +152,7 @@ static void rt_fibinfo_free(struct rtable __rcu **rtp)
 	 * free_fib_info_rcu()
 	 */
 
-	dst_free(&rt->dst);
+	rt_free(rt);
 }
 
 static void free_nh_exceptions(struct fib_nh *nh)
@@ -192,7 +192,7 @@ static void rt_fibinfo_free_cpus(struct rtable __rcu * __percpu *rtp)
 
 		rt = rcu_dereference_protected(*per_cpu_ptr(rtp, cpu), 1);
 		if (rt)
-			dst_free(&rt->dst);
+			rt_free(rt);
 	}
 	free_percpu(rtp);
 }
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 25071b48921c..06f79225b7ac 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -556,11 +556,6 @@ static void ip_rt_build_flow_key(struct flowi4 *fl4, const struct sock *sk,
 		build_sk_flow_key(fl4, sk);
 }
 
-static inline void rt_free(struct rtable *rt)
-{
-	call_rcu(&rt->dst.rcu_head, dst_rcu_free);
-}
-
 static DEFINE_SPINLOCK(fnhe_lock);
 
 static void fnhe_flush_routes(struct fib_nh_exception *fnhe)


* Re: kmem_cache_alloc panic in 3.10+
From: Eric Dumazet @ 2014-01-18 16:57 UTC (permalink / raw)
  To: dormando; +Cc: netdev, linux-kernel, Alexei Starovoitov
In-Reply-To: <1390062576.31367.519.camel@edumazet-glaptop2.roam.corp.google.com>

On Sat, 2014-01-18 at 08:29 -0800, Eric Dumazet wrote:

> Hmm...
> 
> Some dst seems to be destroyed twice. This likely screws slab allocator.
> 
> Please try following untested patch :


Forget it, after some coffee it no longer makes sense ;)


* Re: [PATCH net-next] bonding: move the netdev_add_tso_features() to bonding module
From: Eric Dumazet @ 2014-01-18 17:08 UTC (permalink / raw)
  To: Veaceslav Falico
  Cc: Ding Tianhong, Jay Vosburgh, Eric Dumazet, David S. Miller,
	Netdev
In-Reply-To: <20140118114801.GA30549@redhat.com>

On Sat, 2014-01-18 at 12:48 +0100, Veaceslav Falico wrote:
> On Sat, Jan 18, 2014 at 04:31:33PM +0800, Ding Tianhong wrote:
> >The function netdev_add_tso_features() was only be used for bonding,
> >so no need to export it in netdevice.h, move it to bonding module.
> 
> Eric added it for a reason - like, other drivers might use it. Do you know
> if team, bridge, vlan etc. might use it?

A helper can be used once, this is fine. A car can have 4 seats, and can
even be used with no passenger.

I am quite bored by patches that break clean layering for wrong reasons.


static inline netdev_features_t netdev_add_tso_features(netdev_features_t features,
                                                      netdev_features_t mask)
{
      return netdev_increment_features(features, NETIF_F_ALL_TSO, mask);
}

There is _nothing_ in this helper that implies it should be private to bonding.


* Re: [PATCH net-next] net: vxlan: do not use vxlan_net before checking event type
From: Eric Dumazet @ 2014-01-18 17:18 UTC (permalink / raw)
  To: Cong Wang
  Cc: Daniel Borkmann, David Miller, Linux Kernel Network Developers,
	Eric W. Biederman, Jesse Brandeburg
In-Reply-To: <CAM_iQpUoQcHpQJn-nYp9mO+XXMmXSjFxy3ASwchAH-qECoz9OA@mail.gmail.com>

On Fri, 2014-01-17 at 19:50 -0800, Cong Wang wrote:
> On Fri, Jan 17, 2014 at 10:32 AM, Daniel Borkmann <dborkman@redhat.com> wrote:
> >
> >
> > If you want to do cleanups, whatever, I really don't care.
> > You had your chance to complain about that when you reviewed
> > the initial version ... it has nothing to do with the fix.
> 
> This is not for stable, as long as it doesn't harm the readability
> we are free to do any cleanup's.
> 
> If unsure, check Eric's patch for tunnel dst cache.
> 
> BTW, I am the original author of the patch, you just updated
> it *trivially* and set yourself as the author. :) I don't mind, but
> remember that this may be not appropriate for others. At
> very least I didn't and don't do this myself.


Hmm... Daniel mentioned in the changelog you wrote the initial patch,
and you are credited as the author of the patch, since he kept your
"Signed-off-by: ..." as the first one.

Quite frankly, keeping vxlan_handle_lowerdev_unregister() was the right
choice.

Stop thinking that a function needs to be used more than once to have
the right to exist. Splitting code into small parts eases readability and
code reuse/refactoring, this should be obvious to you.

