Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH v2] net: secure_seq: Fix warning when CONFIG_IPV6 and CONFIG_INET are not selected
From: David Miller @ 2013-10-07 19:59 UTC (permalink / raw)
  To: festevam; +Cc: edumazet, hannes, netdev, olof, fabio.estevam
In-Reply-To: <1381006619-17126-2-git-send-email-festevam@gmail.com>

From: Fabio Estevam <festevam@gmail.com>
Date: Sat,  5 Oct 2013 17:56:59 -0300

> From: Fabio Estevam <fabio.estevam@freescale.com>
> 
> net_secret() is only used when CONFIG_IPV6 or CONFIG_INET are selected.
> 
> Building a defconfig with both of these symbols unselected (Using the ARM
> at91sam9rl_defconfig, for example) leads to the following build warning:
 ...
> Fix this warning by protecting the definition of net_secret() with these
> symbols.
> 
> Reported-by: Olof Johansson <olof@lixom.net>
> Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
> ---
> Changes since v1:
> - Add #if IS_ENABLED(CONFIG_IPV6) || IS_ENABLED(CONFIG_INET)

This looks a lot better, applied, thanks.

^ permalink raw reply

* Re: IPv6 kernel warning
From: Yuchung Cheng @ 2013-10-07 20:00 UTC (permalink / raw)
  To: dormando
  Cc: Michele Baldessari, Russell King - ARM Linux, netdev,
	Neal Cardwell, Nandita Dukkipati
In-Reply-To: <alpine.DEB.2.02.1310071256190.17658@dtop>

On Mon, Oct 7, 2013 at 12:56 PM, dormando <dormando@rydia.net> wrote:
> On Mon, 7 Oct 2013, Yuchung Cheng wrote:
>
>> On Mon, Oct 7, 2013 at 11:13 AM, dormando <dormando@rydia.net> wrote:
>> >
>> > > >
>> > > > there's been multiple reports about this one:
>> > > > https://bugzilla.redhat.com/show_bug.cgi?id=989251
>> > > > http://bugzilla.kernel.org/show_bug.cgi?id=60779
>> > > >
>> > > > Could you try Yuchung's debug patch?
>> > > > http://www.spinics.net/lists/netdev/msg250193.html
>> > > Yes it looks like the same bug. Please try that patch to help identify
>> > > this elusive bug.
>> > >
>> >
>> > Hi!
>> >
>> > We get this one a few times a day in production. Here's a warning with
>> > your debug trace in the line immediately following:
>> > (I censored a few things)
>> >
>> >  [125311.721950] ------------[ cut here ]------------
>> >  [125311.721961] WARNING: at net/ipv4/tcp_input.c:2776 tcp_fastretrans_alert+0xb58/0xc80()
>> >  [125311.721962] Modules linked in: bridge ip_vs macvlan coretemp crc32_pclmul ghash_clmulni_intel gpio_ich ipmi_watchdog microcode ipmi_devintf sb_edac lpc_ich edac_core mfd_core ipmi_si ipmi_msghandler iptable_nat nf_nat_ipv4 nf_nat ixgbe igb mdio i2c_algo_bit ptp pps_core
>> >  [125311.721981] CPU: 11 PID: 0 Comm: swapper/11 Not tainted 3.10.13 #1
>> >  [125311.721982] Hardware name: Supermicro XXXXXXXXXXX, BIOS 1.1 10/03/2012
>> >  [125311.721984]  ffffffff81a82007 ffff88407fc63958 ffffffff816bb9cc ffff88407fc63998
>> >  [125311.721986]  ffffffff8104b940 00ff8840ad904f82 ffff883b8a165b00 0000000000004120
>> >  [125311.721989]  0000000000000001 0000000000000019 0000000000000000 ffff88407fc639a8
>> >  [125311.721991] Call Trace:
>> >  [125311.721992]  <IRQ>  [<ffffffff816bb9cc>] dump_stack+0x19/0x1d
>> >  [125311.722002]  [<ffffffff8104b940>] warn_slowpath_common+0x70/0xa0
>> >  [125311.722005]  [<ffffffff8104b98a>] warn_slowpath_null+0x1a/0x20
>> >  [125311.722007]  [<ffffffff81616db8>] tcp_fastretrans_alert+0xb58/0xc80
>> >  [125311.722011]  [<ffffffff8161891f>] tcp_ack+0x6df/0xe90
>> >  [125311.722016]  [<ffffffff8164e0ca>] ? ipt_do_table+0x22a/0x680
>> >  [125311.722018]  [<ffffffff816194b3>] ? tcp_validate_incoming+0x63/0x320
>> >  [125311.722021]  [<ffffffff8161a55c>] tcp_rcv_established+0x2cc/0x810
>> >  [125311.722023]  [<ffffffff81622c84>] tcp_v4_do_rcv+0x254/0x4f0
>> >  [125311.722025]  [<ffffffff816245ac>] tcp_v4_rcv+0x5fc/0x750
>> >  [125311.722027]  [<ffffffff815ffa00>] ? ip_rcv+0x350/0x350
>> >  [125311.722032]  [<ffffffff815df3ad>] ? nf_hook_slow+0x7d/0x160
>> >  [125311.722034]  [<ffffffff815ffa00>] ? ip_rcv+0x350/0x350
>> >  [125311.722036]  [<ffffffff815fface>] ip_local_deliver_finish+0xce/0x250
>> >  [125311.722037]  [<ffffffff815ffc9c>] ip_local_deliver+0x4c/0x80
>> >  [125311.722039]  [<ffffffff815ff329>] ip_rcv_finish+0x119/0x360
>> >  [125311.722040]  [<ffffffff815ff8e0>] ip_rcv+0x230/0x350
>> >  [125311.722046]  [<ffffffff815b4067>] __netif_receive_skb_core+0x477/0x600
>> >  [125311.722049]  [<ffffffff815b4217>] __netif_receive_skb+0x27/0x70
>> >  [125311.722051]  [<ffffffff815b4354>] process_backlog+0xf4/0x1e0
>> >  [125311.722053]  [<ffffffff815b4b45>] net_rx_action+0xf5/0x250
>> >  [125311.722056]  [<ffffffff81053a5f>] __do_softirq+0xef/0x270
>> >  [125311.722058]  [<ffffffff81053cb5>] irq_exit+0x95/0xa0
>> >  [125311.722062]  [<ffffffff816c8f26>] do_IRQ+0x66/0xe0
>> >  [125311.722065]  [<ffffffff816bf62a>] common_interrupt+0x6a/0x6a
>> >  [125311.722065]  <EOI>  [<ffffffff8100abf1>] ? default_idle+0x21/0xc0
>> >  [125311.722082]  [<ffffffff8100a54f>] arch_cpu_idle+0xf/0x20
>> >  [125311.722086]  [<ffffffff8108f353>] cpu_startup_entry+0xb3/0x230
>> >  [125311.722091]  [<ffffffff816b439e>] start_secondary+0x1dc/0x1e3
>> >  [125311.722093] ---[ end trace e77cd5ba583fcbe9 ]---
>> >  [125311.722096] 355.355.1.355:22496 F0x4120 S1 s7 IF25+17-1-24f0 ur57 rr3 rt0 um0 hs23120 nxt23120
>> >
>> > It's been happening with all 3.10 kernels, and the one above is .13 as
>> > stated in the trace.
>>
>> Thanks! could you post the output of `sysctl -a |grep tcp`?
>>
>> I suspect tcp_process_tlp_ack() should not revert state to Open
>> directly, but calling tcp_try_keep_open() instead, similar to all the
>> undo processing in the tcp_fastretrans_alert(): after
>> tcp_end_cwnd_reduction(), the process (E) falls back to check other
>> stats before moving to CA_Open.
>>
>>
>> index 9c62257..9012b42 100644
>> --- a/net/ipv4/tcp_input.c
>> +++ b/net/ipv4/tcp_input.c
>> @@ -3314,7 +3314,7 @@ static void tcp_process_tlp_ack(struct sock *sk, u32 ack,
>>                         tcp_init_cwnd_reduction(sk, true);
>>                         tcp_set_ca_state(sk, TCP_CA_CWR);
>>                         tcp_end_cwnd_reduction(sk);
>> -                       tcp_set_ca_state(sk, TCP_CA_Open);
>> +                       tcp_try_keep_open(sk);
>>                         NET_INC_STATS_BH(sock_net(sk),
>>                                          LINUX_MIB_TCPLOSSPROBERECOVERY);
>>                 }
>>
>
> Should I apply this and see if the warning stops?
I'd like to hear what the authors of TLP think. In the mean time could
you help us collect more evidence by disabling TLP with
sysctl net.ipv4.tcp_early_retrans=2
and see if the problem still occurs? (it should not).

thanks

>
> net.ipv4.tcp_abort_on_overflow = 0
> net.ipv4.tcp_adv_win_scale = 1
> net.ipv4.tcp_allowed_congestion_control = cubic reno
> net.ipv4.tcp_app_win = 31
> net.ipv4.tcp_available_congestion_control = cubic reno westwood
> net.ipv4.tcp_base_mss = 512
> net.ipv4.tcp_challenge_ack_limit = 100
> net.ipv4.tcp_congestion_control = cubic
> net.ipv4.tcp_dma_copybreak = 262144
> net.ipv4.tcp_dsack = 1
> net.ipv4.tcp_early_retrans = 3


> net.ipv4.tcp_ecn = 2
> net.ipv4.tcp_fack = 1
> net.ipv4.tcp_fastopen = 0
> net.ipv4.tcp_fastopen_key = 009dc92c-82e3e514-d440ed23-c49b1a89
> net.ipv4.tcp_fin_timeout = 5
> net.ipv4.tcp_frto = 0
> net.ipv4.tcp_keepalive_intvl = 75
> net.ipv4.tcp_keepalive_probes = 9
> net.ipv4.tcp_keepalive_time = 1800
> net.ipv4.tcp_limit_output_bytes = 131072
> net.ipv4.tcp_low_latency = 0
> net.ipv4.tcp_max_orphans = 2000000
> net.ipv4.tcp_max_ssthresh = 0
> net.ipv4.tcp_max_syn_backlog = 65536
> net.ipv4.tcp_max_tw_buckets = 2000000
> net.ipv4.tcp_mem = 6188001      8250670 12376002
> net.ipv4.tcp_moderate_rcvbuf = 1
> net.ipv4.tcp_mtu_probing = 0
> net.ipv4.tcp_no_metrics_save = 1
> net.ipv4.tcp_orphan_retries = 0
> net.ipv4.tcp_reordering = 3
> net.ipv4.tcp_retrans_collapse = 1
> net.ipv4.tcp_retries1 = 3
> net.ipv4.tcp_retries2 = 15
> net.ipv4.tcp_rfc1337 = 0
> net.ipv4.tcp_rmem = 4096        87380   16777216
> net.ipv4.tcp_sack = 1
> net.ipv4.tcp_slow_start_after_idle = 0
> net.ipv4.tcp_stdurg = 0
> net.ipv4.tcp_syn_retries = 6
> net.ipv4.tcp_synack_retries = 5
> net.ipv4.tcp_syncookies = 1
> net.ipv4.tcp_thin_dupack = 0
> net.ipv4.tcp_thin_linear_timeouts = 0
> net.ipv4.tcp_timestamps = 1
> net.ipv4.tcp_tso_win_divisor = 3
> net.ipv4.tcp_tw_recycle = 0
> net.ipv4.tcp_tw_reuse = 0
> net.ipv4.tcp_user_cwnd_max = 20
> net.ipv4.tcp_window_scaling = 1
> net.ipv4.tcp_wmem = 4096        65536   16777216
> net.ipv4.tcp_workaround_signed_windows = 0
> net.ipv4.vs.secure_tcp = 0
>

^ permalink raw reply

* Re: [PATCH] eisa: standardize on eisa_register_driver like similar bus registrations
From: David Miller @ 2013-10-07 20:02 UTC (permalink / raw)
  To: tedheadster; +Cc: netdev, linux-scsi
In-Reply-To: <1381026958-4459-1-git-send-email-tedheadster@gmail.com>

From: Matthew Whitehead <tedheadster@gmail.com>
Date: Sat,  5 Oct 2013 22:35:58 -0400

> The other buses (isa, pci, pnp, parport, usb, tty, etc) all use the convention
> of ${BUSNAME}_register_driver. Rewrite the little remaining code that uses EISA
> to follow this convention for easier readability.
> 
> This affects the EISA bus, SCSI, and networking subsystems so only one should
> ultimately merge the patch if it is accepted.
> 
> Signed-off-by: Matthew Whitehead <tedheadster@gmail.com>

I'm fine with someone else taking this, for networking parts:

Acked-by: David S. Miller <davem@davemloft.net>

^ permalink raw reply

* Re: [PATCH net-next] xen-netback: fix xenvif_count_skb_slots()
From: Ian Campbell @ 2013-10-07 20:03 UTC (permalink / raw)
  To: David Miller
  Cc: paul.durrant, xen-devel, netdev, xixiong, msw, annie.li, wei.liu2
In-Reply-To: <20131007.153622.1392772445100851507.davem@davemloft.net>

On Mon, 2013-10-07 at 15:36 -0400, David Miller wrote:
> From: Paul Durrant <paul.durrant@citrix.com>
> Date: Fri, 4 Oct 2013 17:26:23 +0100
> 
> > Commit 4f0581d25827d5e864bcf07b05d73d0d12a20a5c introduced an error into
> > xenvif_count_skb_slots() for skbs with a linear area spanning a page
> > boundary. The alignment of skb->data needs to be taken into account, not
> > just the head length. This patch fixes the issue by dry-running the code
> > from xenvif_gop_skb() (and adjusting the comment above the function to note
> > that).
> > 
> > Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> 
> There seems to be a lot of back and forth about what is the most
> desirable way forward wrt. this commit and another similar one.
> 
> Please advise.

Lets revert 4f0581d25827d5e864bcf07b05d73d0d12a20a5c and see about
making this stuff less fragile in the future.

Thanks,
Ian.

^ permalink raw reply

* Re: [PATCH RFC 00/77] Re-design MSI/MSI-X interrupts enablement pattern
From: Benjamin Herrenschmidt @ 2013-10-07 20:10 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Alexander Gordeev, Ben Hutchings, linux-kernel, Bjorn Helgaas,
	Ralf Baechle, Michael Ellerman, Martin Schwidefsky, Ingo Molnar,
	Dan Williams, Andy King, Jon Mason, Matt Porter, linux-pci,
	linux-mips, linuxppc-dev, linux390, linux-s390, x86, linux-ide,
	iss_storagedev, linux-nvme, linux-rdma, netdev, e1000-devel
In-Reply-To: <20131007180111.GC2481@htj.dyndns.org>

On Mon, 2013-10-07 at 14:01 -0400, Tejun Heo wrote:
> I don't think the same race condition would happen with the loop.  The
> problem case is where multiple msi(x) allocation fails completely
> because the global limit went down before inquiry and allocation.  In
> the loop based interface, it'd retry with the lower number.
> 
> As long as the number of drivers which need this sort of adaptive
> allocation isn't too high and the common cases can be made simple, I
> don't think the "complex" part of interface is all that important.
> Maybe we can have reserve / cancel type interface or just keep the
> loop with more explicit function names (ie. try_enable or something
> like that).

I'm thinking a better API overall might just have been to request
individual MSI-X one by one :-)

We want to be able to request an MSI-X at runtime anyway ... if I want
to dynamically add a queue to my network interface, I want it to be able
to pop a new arbitrary MSI-X.

And we don't want to lock drivers into contiguous MSI-X sets either.

And for the cleanup ... well that's what the "pcim" functions are for,
we can just make MSI-X variants.

Ben.

^ permalink raw reply

* Re: IPv6 kernel warning
From: dormando @ 2013-10-07 20:15 UTC (permalink / raw)
  To: Yuchung Cheng
  Cc: Michele Baldessari, Russell King - ARM Linux, netdev,
	Neal Cardwell, Nandita Dukkipati
In-Reply-To: <CAK6E8=d_O+HHTKb37zGKxpU8E0tTUL6m77g_o6a7ASEiJfXSkw@mail.gmail.com>



On Mon, 7 Oct 2013, Yuchung Cheng wrote:

> On Mon, Oct 7, 2013 at 12:56 PM, dormando <dormando@rydia.net> wrote:
> > On Mon, 7 Oct 2013, Yuchung Cheng wrote:
> >
> >> On Mon, Oct 7, 2013 at 11:13 AM, dormando <dormando@rydia.net> wrote:
> >> >
> >> > > >
> >> > > > there's been multiple reports about this one:
> >> > > > https://bugzilla.redhat.com/show_bug.cgi?id=989251
> >> > > > http://bugzilla.kernel.org/show_bug.cgi?id=60779
> >> > > >
> >> > > > Could you try Yuchung's debug patch?
> >> > > > http://www.spinics.net/lists/netdev/msg250193.html
> >> > > Yes it looks like the same bug. Please try that patch to help identify
> >> > > this elusive bug.
> >> > >
> >> >
> >> > Hi!
> >> >
> >> > We get this one a few times a day in production. Here's a warning with
> >> > your debug trace in the line immediately following:
> >> > (I censored a few things)
> >> >
> >> >  [125311.721950] ------------[ cut here ]------------
> >> >  [125311.721961] WARNING: at net/ipv4/tcp_input.c:2776 tcp_fastretrans_alert+0xb58/0xc80()
> >> >  [125311.721962] Modules linked in: bridge ip_vs macvlan coretemp crc32_pclmul ghash_clmulni_intel gpio_ich ipmi_watchdog microcode ipmi_devintf sb_edac lpc_ich edac_core mfd_core ipmi_si ipmi_msghandler iptable_nat nf_nat_ipv4 nf_nat ixgbe igb mdio i2c_algo_bit ptp pps_core
> >> >  [125311.721981] CPU: 11 PID: 0 Comm: swapper/11 Not tainted 3.10.13 #1
> >> >  [125311.721982] Hardware name: Supermicro XXXXXXXXXXX, BIOS 1.1 10/03/2012
> >> >  [125311.721984]  ffffffff81a82007 ffff88407fc63958 ffffffff816bb9cc ffff88407fc63998
> >> >  [125311.721986]  ffffffff8104b940 00ff8840ad904f82 ffff883b8a165b00 0000000000004120
> >> >  [125311.721989]  0000000000000001 0000000000000019 0000000000000000 ffff88407fc639a8
> >> >  [125311.721991] Call Trace:
> >> >  [125311.721992]  <IRQ>  [<ffffffff816bb9cc>] dump_stack+0x19/0x1d
> >> >  [125311.722002]  [<ffffffff8104b940>] warn_slowpath_common+0x70/0xa0
> >> >  [125311.722005]  [<ffffffff8104b98a>] warn_slowpath_null+0x1a/0x20
> >> >  [125311.722007]  [<ffffffff81616db8>] tcp_fastretrans_alert+0xb58/0xc80
> >> >  [125311.722011]  [<ffffffff8161891f>] tcp_ack+0x6df/0xe90
> >> >  [125311.722016]  [<ffffffff8164e0ca>] ? ipt_do_table+0x22a/0x680
> >> >  [125311.722018]  [<ffffffff816194b3>] ? tcp_validate_incoming+0x63/0x320
> >> >  [125311.722021]  [<ffffffff8161a55c>] tcp_rcv_established+0x2cc/0x810
> >> >  [125311.722023]  [<ffffffff81622c84>] tcp_v4_do_rcv+0x254/0x4f0
> >> >  [125311.722025]  [<ffffffff816245ac>] tcp_v4_rcv+0x5fc/0x750
> >> >  [125311.722027]  [<ffffffff815ffa00>] ? ip_rcv+0x350/0x350
> >> >  [125311.722032]  [<ffffffff815df3ad>] ? nf_hook_slow+0x7d/0x160
> >> >  [125311.722034]  [<ffffffff815ffa00>] ? ip_rcv+0x350/0x350
> >> >  [125311.722036]  [<ffffffff815fface>] ip_local_deliver_finish+0xce/0x250
> >> >  [125311.722037]  [<ffffffff815ffc9c>] ip_local_deliver+0x4c/0x80
> >> >  [125311.722039]  [<ffffffff815ff329>] ip_rcv_finish+0x119/0x360
> >> >  [125311.722040]  [<ffffffff815ff8e0>] ip_rcv+0x230/0x350
> >> >  [125311.722046]  [<ffffffff815b4067>] __netif_receive_skb_core+0x477/0x600
> >> >  [125311.722049]  [<ffffffff815b4217>] __netif_receive_skb+0x27/0x70
> >> >  [125311.722051]  [<ffffffff815b4354>] process_backlog+0xf4/0x1e0
> >> >  [125311.722053]  [<ffffffff815b4b45>] net_rx_action+0xf5/0x250
> >> >  [125311.722056]  [<ffffffff81053a5f>] __do_softirq+0xef/0x270
> >> >  [125311.722058]  [<ffffffff81053cb5>] irq_exit+0x95/0xa0
> >> >  [125311.722062]  [<ffffffff816c8f26>] do_IRQ+0x66/0xe0
> >> >  [125311.722065]  [<ffffffff816bf62a>] common_interrupt+0x6a/0x6a
> >> >  [125311.722065]  <EOI>  [<ffffffff8100abf1>] ? default_idle+0x21/0xc0
> >> >  [125311.722082]  [<ffffffff8100a54f>] arch_cpu_idle+0xf/0x20
> >> >  [125311.722086]  [<ffffffff8108f353>] cpu_startup_entry+0xb3/0x230
> >> >  [125311.722091]  [<ffffffff816b439e>] start_secondary+0x1dc/0x1e3
> >> >  [125311.722093] ---[ end trace e77cd5ba583fcbe9 ]---
> >> >  [125311.722096] 355.355.1.355:22496 F0x4120 S1 s7 IF25+17-1-24f0 ur57 rr3 rt0 um0 hs23120 nxt23120
> >> >
> >> > It's been happening with all 3.10 kernels, and the one above is .13 as
> >> > stated in the trace.
> >>
> >> Thanks! could you post the output of `sysctl -a |grep tcp`?
> >>
> >> I suspect tcp_process_tlp_ack() should not revert state to Open
> >> directly, but calling tcp_try_keep_open() instead, similar to all the
> >> undo processing in the tcp_fastretrans_alert(): after
> >> tcp_end_cwnd_reduction(), the process (E) falls back to check other
> >> stats before moving to CA_Open.
> >>
> >>
> >> index 9c62257..9012b42 100644
> >> --- a/net/ipv4/tcp_input.c
> >> +++ b/net/ipv4/tcp_input.c
> >> @@ -3314,7 +3314,7 @@ static void tcp_process_tlp_ack(struct sock *sk, u32 ack,
> >>                         tcp_init_cwnd_reduction(sk, true);
> >>                         tcp_set_ca_state(sk, TCP_CA_CWR);
> >>                         tcp_end_cwnd_reduction(sk);
> >> -                       tcp_set_ca_state(sk, TCP_CA_Open);
> >> +                       tcp_try_keep_open(sk);
> >>                         NET_INC_STATS_BH(sock_net(sk),
> >>                                          LINUX_MIB_TCPLOSSPROBERECOVERY);
> >>                 }
> >>
> >
> > Should I apply this and see if the warning stops?
> I'd like to hear what the authors of TLP think. In the mean time could
> you help us collect more evidence by disabling TLP with
> sysctl net.ipv4.tcp_early_retrans=2
> and see if the problem still occurs? (it should not).
>
> thanks

Changed on one machine. We tend to only see one per box every 12-24 hours,
so it'll take a while to confirm.

> >
> > net.ipv4.tcp_abort_on_overflow = 0
> > net.ipv4.tcp_adv_win_scale = 1
> > net.ipv4.tcp_allowed_congestion_control = cubic reno
> > net.ipv4.tcp_app_win = 31
> > net.ipv4.tcp_available_congestion_control = cubic reno westwood
> > net.ipv4.tcp_base_mss = 512
> > net.ipv4.tcp_challenge_ack_limit = 100
> > net.ipv4.tcp_congestion_control = cubic
> > net.ipv4.tcp_dma_copybreak = 262144
> > net.ipv4.tcp_dsack = 1
> > net.ipv4.tcp_early_retrans = 3
>
>
> > net.ipv4.tcp_ecn = 2
> > net.ipv4.tcp_fack = 1
> > net.ipv4.tcp_fastopen = 0
> > net.ipv4.tcp_fastopen_key = 009dc92c-82e3e514-d440ed23-c49b1a89
> > net.ipv4.tcp_fin_timeout = 5
> > net.ipv4.tcp_frto = 0
> > net.ipv4.tcp_keepalive_intvl = 75
> > net.ipv4.tcp_keepalive_probes = 9
> > net.ipv4.tcp_keepalive_time = 1800
> > net.ipv4.tcp_limit_output_bytes = 131072
> > net.ipv4.tcp_low_latency = 0
> > net.ipv4.tcp_max_orphans = 2000000
> > net.ipv4.tcp_max_ssthresh = 0
> > net.ipv4.tcp_max_syn_backlog = 65536
> > net.ipv4.tcp_max_tw_buckets = 2000000
> > net.ipv4.tcp_mem = 6188001      8250670 12376002
> > net.ipv4.tcp_moderate_rcvbuf = 1
> > net.ipv4.tcp_mtu_probing = 0
> > net.ipv4.tcp_no_metrics_save = 1
> > net.ipv4.tcp_orphan_retries = 0
> > net.ipv4.tcp_reordering = 3
> > net.ipv4.tcp_retrans_collapse = 1
> > net.ipv4.tcp_retries1 = 3
> > net.ipv4.tcp_retries2 = 15
> > net.ipv4.tcp_rfc1337 = 0
> > net.ipv4.tcp_rmem = 4096        87380   16777216
> > net.ipv4.tcp_sack = 1
> > net.ipv4.tcp_slow_start_after_idle = 0
> > net.ipv4.tcp_stdurg = 0
> > net.ipv4.tcp_syn_retries = 6
> > net.ipv4.tcp_synack_retries = 5
> > net.ipv4.tcp_syncookies = 1
> > net.ipv4.tcp_thin_dupack = 0
> > net.ipv4.tcp_thin_linear_timeouts = 0
> > net.ipv4.tcp_timestamps = 1
> > net.ipv4.tcp_tso_win_divisor = 3
> > net.ipv4.tcp_tw_recycle = 0
> > net.ipv4.tcp_tw_reuse = 0
> > net.ipv4.tcp_user_cwnd_max = 20
> > net.ipv4.tcp_window_scaling = 1
> > net.ipv4.tcp_wmem = 4096        65536   16777216
> > net.ipv4.tcp_workaround_signed_windows = 0
> > net.ipv4.vs.secure_tcp = 0
> >
>

^ permalink raw reply

* Re: bug in passing file descriptors
From: Steve Rago @ 2013-10-07 20:29 UTC (permalink / raw)
  To: David Miller; +Cc: luto, netdev, mtk.manpages, ebiederm
In-Reply-To: <20131007.154226.533738557474978526.davem@davemloft.net>

On 10/07/2013 03:42 PM, David Miller wrote:
> There is no compatability issue.
>
> 32-bit tasks will always see the 4-byte align/length.
> 64-bit tasks will always see the 8-byte align/length.
>

Really?  So when I compile my application on a 32-bit Linux box and then try to run it on a 64-bit Linux box, you're not 
going to overrun my buffer when CMSG_SPACE led me to allocate an insufficient amount of memory needed to account for 
padding on the 64-bit platform?

By the way, FreeBSD, Mac OS X, and Solaris all behave as I described, so if you're happy with Linux behaving 
differently, then I'll stop wasting bandwidth.

Steve

^ permalink raw reply

* Re: [PATCH RFC 54/77] ntb: Ensure number of MSIs on SNB is enough for the link interrupt
From: Jon Mason @ 2013-10-07 20:31 UTC (permalink / raw)
  To: Alexander Gordeev
  Cc: linux-kernel, Bjorn Helgaas, Ralf Baechle, Michael Ellerman,
	Benjamin Herrenschmidt, Martin Schwidefsky, Ingo Molnar,
	Tejun Heo, Dan Williams, Andy King, Matt Porter, stable,
	linux-pci, linux-mips, linuxppc-dev, linux390, linux-s390, x86,
	linux-ide, iss_storagedev, linux-nvme, linux-rdma, netdev,
	e1000-devel, linux-driver, Solarflare linux maintainers,
	VMware, Inc.
In-Reply-To: <20131007183845.GA1834@dhcp-26-207.brq.redhat.com>

On Mon, Oct 07, 2013 at 08:38:45PM +0200, Alexander Gordeev wrote:
> On Mon, Oct 07, 2013 at 09:50:57AM -0700, Jon Mason wrote:
> > On Sat, Oct 05, 2013 at 11:43:04PM +0200, Alexander Gordeev wrote:
> > > On Wed, Oct 02, 2013 at 05:48:05PM -0700, Jon Mason wrote:
> > > > On Wed, Oct 02, 2013 at 12:49:10PM +0200, Alexander Gordeev wrote:
> > > > > Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
> > > > > ---
> > > > >  drivers/ntb/ntb_hw.c |    2 +-
> > > > >  1 files changed, 1 insertions(+), 1 deletions(-)
> > > > > 
> > > > > diff --git a/drivers/ntb/ntb_hw.c b/drivers/ntb/ntb_hw.c
> > > > > index de2062c..eccd5e5 100644
> > > > > --- a/drivers/ntb/ntb_hw.c
> > > > > +++ b/drivers/ntb/ntb_hw.c
> > > > > @@ -1066,7 +1066,7 @@ static int ntb_setup_msix(struct ntb_device *ndev)
> > > > >  		/* On SNB, the link interrupt is always tied to 4th vector.  If
> > > > >  		 * we can't get all 4, then we can't use MSI-X.
> > > > >  		 */
> > > > > -		if (ndev->hw_type != BWD_HW) {
> > > > > +		if ((rc < SNB_MSIX_CNT) && (ndev->hw_type != BWD_HW)) {
> > > > 
> > > > Nack, this check is unnecessary.
> > > 
> > > If SNB can do more than SNB_MSIX_CNT MSI-Xs then this check is needed
> > > to enable less than maximum MSI-Xs in case the maximum was not allocated.
> > > Otherwise SNB will fallback to single MSI instead of multiple MSI-Xs.
> > 
> > Per the comment in the code snippet above, "If we can't get all 4,
> > then we can't use MSI-X".  There is already a check to see if more
> > than 4 were acquired.  So it's not possible to hit this.  Even if it
> > was, don't use SNB_MSIX_CNT here (limits.msix_cnt is the preferred
> > variable).  Also, the "()" are unnecessary.
> 
> The changelog is definitely bogus. I meant here an improvement to the
> existing scheme, not a conversion to the new one:
> 
> 	msix_entries = msix_table_size(val);
> 
> Getting i.e. 16 vectors here.
> 
> 	if (msix_entries > ndev->limits.msix_cnt) {

On SNB HW, limits.msix_cnt is set to SNB_MSIX_CNT (4)
http://lxr.free-electrons.com/source/drivers/ntb/ntb_hw.c#L558

> 		rc = -EINVAL;
> 		goto err;
> 	}
> 
> Upper limit check i.e. succeeds.
> 
> 	[...]
> 
> 	rc = pci_enable_msix(pdev, ndev->msix_entries, msix_entries);
> 
> pci_enable_msix() does not success and returns i.e. 8 here, should retry.

Per the above, since our upper bound is 4.  We will either have this
return 0 for all 4 or a number between 1 and 3 (or an error, but
that's not relevant to this discussion).

> 	if (rc < 0)
> 		goto err1;
> 	if (rc > 0) {
> 		/* On SNB, the link interrupt is always tied to 4th vector.  If
> 		 * we can't get all 4, then we can't use MSI-X.
> 		 */
> 		if (ndev->hw_type != BWD_HW) {
> 
> On SNB bail out here, although could have continue with 8 vectors.
> Can only use SNB_MSIX_CNT here, since limits.msix_cnt is the upper limit.

Since we can guarantee that rc is between 1 and 3 at this point (on
SNB HW), we should error out.

Thanks,
Jon


> 
> 			rc = -EIO;
> 			goto err1;
> 		}
> 
> 		[...]
> 	}
> 
> -- 
> Regards,
> Alexander Gordeev
> agordeev@redhat.com

^ permalink raw reply

* Re: [PATCH RFC 00/77] Re-design MSI/MSI-X interrupts enablement pattern
From: Ben Hutchings @ 2013-10-07 20:46 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Tejun Heo, Alexander Gordeev, linux-kernel, Bjorn Helgaas,
	Ralf Baechle, Michael Ellerman, Martin Schwidefsky, Ingo Molnar,
	Dan Williams, Andy King, Jon Mason, Matt Porter, linux-pci,
	linux-mips, linuxppc-dev, linux390, linux-s390, x86, linux-ide,
	iss_storagedev, linux-nvme, linux-rdma, netdev, e1000-devel,
	linux-driver, Solarflare linux maintainers
In-Reply-To: <1381176656.645.171.camel@pasglop>

On Tue, 2013-10-08 at 07:10 +1100, Benjamin Herrenschmidt wrote:
> On Mon, 2013-10-07 at 14:01 -0400, Tejun Heo wrote:
> > I don't think the same race condition would happen with the loop.  The
> > problem case is where multiple msi(x) allocation fails completely
> > because the global limit went down before inquiry and allocation.  In
> > the loop based interface, it'd retry with the lower number.
> > 
> > As long as the number of drivers which need this sort of adaptive
> > allocation isn't too high and the common cases can be made simple, I
> > don't think the "complex" part of interface is all that important.
> > Maybe we can have reserve / cancel type interface or just keep the
> > loop with more explicit function names (ie. try_enable or something
> > like that).
> 
> I'm thinking a better API overall might just have been to request
> individual MSI-X one by one :-)
> 
> We want to be able to request an MSI-X at runtime anyway ... if I want
> to dynamically add a queue to my network interface, I want it to be able
> to pop a new arbitrary MSI-X.

Yes, this would be very useful.

> And we don't want to lock drivers into contiguous MSI-X sets either.

I don't think there's any such limitation now.  The entries array passed
to pci_enable_msix() specifies which MSI-X vectors the driver wants to
enable.  It's usually filled with 0..nvec-1 in order, but not always.
And the IRQ numbers returned aren't usually contiguous either, on x86.

Ben.

> And for the cleanup ... well that's what the "pcim" functions are for,
> we can just make MSI-X variants.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* Re: [PATCH RFC 00/77] Re-design MSI/MSI-X interrupts enablement pattern
From: Ben Hutchings @ 2013-10-07 20:48 UTC (permalink / raw)
  To: Alexander Gordeev
  Cc: Benjamin Herrenschmidt, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	Bjorn Helgaas, Ralf Baechle, Michael Ellerman, Martin Schwidefsky,
	Ingo Molnar, Tejun Heo, Dan Williams, Andy King, Jon Mason,
	Matt Porter, linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-mips-6z/3iImG2C8G8FEW9MqTrA,
	linuxppc-dev-uLR06cmDAlY/bJ5BZ2RsiQ,
	linux390-tA70FqPdS9bQT0dZR+AlfA,
	linux-s390-u79uwXL29TY76Z2rM5mHXA, x86-DgEjT+Ai2ygdnm+yROfE0A,
	linux-ide-u79uwXL29TY76Z2rM5mHXA, iss_storagedev-VXdhtT5mjnY,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	e1000-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
	linux-driver-h88ZbnxC6KDQT0dZR+AlfA, Solarflare linux maintainers
In-Reply-To: <20131006071027.GA29143-hdGaXg0bp3uRXgp2RCiI5R/sF2h8X+2i0E9HWUfgJXw@public.gmane.org>

On Sun, 2013-10-06 at 09:10 +0200, Alexander Gordeev wrote:
> On Sun, Oct 06, 2013 at 05:19:46PM +1100, Benjamin Herrenschmidt wrote:
> > On Sun, 2013-10-06 at 08:02 +0200, Alexander Gordeev wrote:
> > > In fact, in the current design to address the quota race decently the
> > > drivers would have to protect the *loop* to prevent the quota change
> > > between a pci_enable_msix() returned a positive number and the the next
> > > call to pci_enable_msix() with that number. Is it doable?
> > 
> > I am not advocating for the current design, simply saying that your
> > proposal doesn't address this issue while Ben's does.
> 
> There is one major flaw in min-max approach - the generic MSI layer
> will have to take decisions on exact number of MSIs to request, not
> device drivers.
[...

No, the min-max functions should be implemented using the same loop that
drivers are expected to use now.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [PATCH] netfilter: ipvs: ip_vs_sync: Remove unused variable
From: Fabio Estevam @ 2013-10-07 21:09 UTC (permalink / raw)
  To: horms; +Cc: ja, netdev, Fabio Estevam

From: Fabio Estevam <fabio.estevam@freescale.com>

Variable 'ret' is not used anywhere and it causes the following build warning: 

net/netfilter/ipvs/ip_vs_sync.c: In function 'sync_thread_master':
net/netfilter/ipvs/ip_vs_sync.c:1640:8: warning: unused variable 'ret' [-Wunused-variable]

Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
---
 net/netfilter/ipvs/ip_vs_sync.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netfilter/ipvs/ip_vs_sync.c b/net/netfilter/ipvs/ip_vs_sync.c
index f63c238..d258125 100644
--- a/net/netfilter/ipvs/ip_vs_sync.c
+++ b/net/netfilter/ipvs/ip_vs_sync.c
@@ -1637,7 +1637,7 @@ static int sync_thread_master(void *data)
 			continue;
 		}
 		while (ip_vs_send_sync_msg(tinfo->sock, sb->mesg) < 0) {
-			int ret = __wait_event_interruptible(*sk_sleep(sk),
+			__wait_event_interruptible(*sk_sleep(sk),
 						   sock_writeable(sk) ||
 						   kthread_should_stop());
 			if (unlikely(kthread_should_stop()))
-- 
1.8.1.2

^ permalink raw reply related

* Re: [PATCH 1/2] net: Add layer 2 hardware acceleration operations for macvlan devices
From: Neil Horman @ 2013-10-07 21:20 UTC (permalink / raw)
  To: davem; +Cc: netdev
In-Reply-To: <20131007.155214.2232375975382665567.davem@davemloft.net>

Forgive the poor reply format, Dave, I deleted your email (to fast on the
trigger apparently), so I have to reconstruct it.

>> @@ -426,9 +426,12 @@ struct sk_buff {
>>  	char			cb[48] __aligned(8);
>>  
>>  	unsigned long		_skb_refdst;
>> -#ifdef CONFIG_XFRM
>> -	struct	sec_path	*sp;
>> -#endif
>> +
>> +	union {
>> +		struct	sec_path	*sp;
>> +		void 			*accel_priv;
>> +	};
>> +
>
>I'm not %100 sure these two things are really mutually exclusive.
>
>What if bridging ebtables does an input route lookup?  That can
>populate the security path.
>
You are mostly likely right, thats why this is an RFC, I haven't really thought
through that bit fully yet, to be perfectly honest.  I wanted a place for a
pointer to the accelerated data path data to live, and that looked like a
reasonably safe place at the time, but as you point out, its not.  I'll need to
find a better place for it.

>Also, why have you not added this to the usual netdev_ops and
>hw_features?

Thats me experimenting.  I was thinking that origionally this functionality
might be grouped separately, so that we could handle it independently of the
standard network device operations (you might have noticed in v1 of my patch I
had a size_t variable in there, so I thought the separation might be
organizationally nice).  It was also something I was tinkering with for
potential future work to support other data plane accelerators (like the FM6000
switch chip from intel) in a manner that didn't pollute the more typical host network
devices.  Like I said though, just experimenting at the moment....

Regards
Neil

^ permalink raw reply

* [PATCH] net: vlan: fix nlmsg size calculation in vlan_get_size()
From: Marc Kleine-Budde @ 2013-10-07 21:19 UTC (permalink / raw)
  To: netdev; +Cc: kernel, Marc Kleine-Budde, Patrick McHardy

This patch fixes the calculation of the nlmsg size, by adding the missing
nla_total_size().

Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
---
 net/8021q/vlan_netlink.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/8021q/vlan_netlink.c b/net/8021q/vlan_netlink.c
index 3091297..c7e634a 100644
--- a/net/8021q/vlan_netlink.c
+++ b/net/8021q/vlan_netlink.c
@@ -171,7 +171,7 @@ static size_t vlan_get_size(const struct net_device *dev)
 
 	return nla_total_size(2) +	/* IFLA_VLAN_PROTOCOL */
 	       nla_total_size(2) +	/* IFLA_VLAN_ID */
-	       sizeof(struct ifla_vlan_flags) + /* IFLA_VLAN_FLAGS */
+	       nla_total_size(sizeof(struct ifla_vlan_flags)) + /* IFLA_VLAN_FLAGS */
 	       vlan_qos_map_size(vlan->nr_ingress_mappings) +
 	       vlan_qos_map_size(vlan->nr_egress_mappings);
 }
-- 
1.8.4.rc3

^ permalink raw reply related

* Re: bug in passing file descriptors
From: David Miller @ 2013-10-07 21:32 UTC (permalink / raw)
  To: sar; +Cc: luto, netdev, mtk.manpages, ebiederm
In-Reply-To: <5253199B.3000109@nec-labs.com>

From: Steve Rago <sar@nec-labs.com>
Date: Mon, 7 Oct 2013 16:29:15 -0400

> On 10/07/2013 03:42 PM, David Miller wrote:
>> There is no compatability issue.
>>
>> 32-bit tasks will always see the 4-byte align/length.
>> 64-bit tasks will always see the 8-byte align/length.
>>
> 
> Really?  So when I compile my application on a 32-bit Linux box and
> then try to run it on a 64-bit Linux box, you're not going to overrun
> my buffer when CMSG_SPACE led me to allocate an insufficient amount of
> memory needed to account for padding on the 64-bit platform?

We have a compatability layer that gives 32-bit applications the
same behavior as if they had run on a 32-bit machine.

Search around for the MSG_MSG_COMPAT flag and how that is used in
net/socket.c

^ permalink raw reply

* Re: [PATCH 1/2] net: Add layer 2 hardware acceleration operations for macvlan devices
From: David Miller @ 2013-10-07 21:34 UTC (permalink / raw)
  To: nhorman; +Cc: netdev
In-Reply-To: <20131007212000.GA1596@hmsreliant.think-freely.org>

From: Neil Horman <nhorman@tuxdriver.com>
Date: Mon, 7 Oct 2013 17:20:00 -0400

> Thats me experimenting.  I was thinking that origionally this functionality
> might be grouped separately, so that we could handle it independently of the
> standard network device operations (you might have noticed in v1 of my patch I
> had a size_t variable in there, so I thought the separation might be
> organizationally nice).  It was also something I was tinkering with for
> potential future work to support other data plane accelerators (like the FM6000
> switch chip from intel) in a manner that didn't pollute the more typical host network
> devices.  Like I said though, just experimenting at the moment....

Can these dataplane devices still act like a normal networking port and
send and receive packets at the host level?

If yes, that would be an extremely strong argument for netdev_ops.

^ permalink raw reply

* Re: [PATCH v2.42 1/5] odp: Allow VLAN actions after MPLS actions
From: Ben Pfaff @ 2013-10-07 21:41 UTC (permalink / raw)
  To: Simon Horman
  Cc: dev-yBygre7rU0TnMu66kgdUjQ, Ravi K, netdev-u79uwXL29TY76Z2rM5mHXA,
	Isaku Yamahata
In-Reply-To: <20131007063447.GF19926-/R6kz+dDXgpPR4JQBCEnsQ@public.gmane.org>

On Mon, Oct 07, 2013 at 03:34:47PM +0900, Simon Horman wrote:
> What I have done is to make an incremental patch which:
> 
> 1. Moves the 'vlan_tci' member of strict xlate_in to
>    be the 'final_vlan_tci' member of struct xlate_ctx.
> 
> 2. Moves the 'vlan_tci' local variable of do_xlate_actions()
>    to be the 'next_vlan_tci' member of struct xlate_ctx.
> 
> 3. Restructures the comments surrounding the logic of the vlan_tci
>    code that this patch adds mostly as comments for the new
>    members of struct xlate_ctx. I hope things are (still?) clear.
> 
> For reference, the incremental patch I have so far is as follows.
> I will squash it into this patch before reposting this series.

Thanks a lot, I'll take another look at the series now.

^ permalink raw reply

* [PATCH] pkt_sched: fq: fix non TCP flows pacing
From: Eric Dumazet @ 2013-10-07 21:44 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Steinar H. Gunderson

From: Eric Dumazet <edumazet@google.com>

Steinar reported FQ pacing was not working for UDP flows.

It looks like the initial sk->sk_pacing_rate value of 0 was
a wrong choice. We should init it to ~0U like sk_max_pacing_rate

Then, TCA_FQ_FLOW_DEFAULT_RATE should be removed because it makes
no real sense. (The default rate is really : ~0U)

While debugging this issue, I realized sk_pacing_rate is shared between
transport and packet scheduler without locking / barriers :

We should use ACCESS_ONCE() to make sure compiler wont perform
multiple loads or stores.

Reported-by: Steinar H. Gunderson <sesse@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/core/sock.c      |    1 +
 net/ipv4/tcp_input.c |    7 ++++++-
 net/sched/sch_fq.c   |   20 +++++++++-----------
 3 files changed, 16 insertions(+), 12 deletions(-)

diff --git a/net/core/sock.c b/net/core/sock.c
index 2bd9b3f..fd6afa2 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2331,6 +2331,7 @@ void sock_init_data(struct socket *sock, struct sock *sk)
 #endif
 
 	sk->sk_max_pacing_rate = ~0U;
+	sk->sk_pacing_rate = ~0U;
 	/*
 	 * Before updating sk_refcnt, we must commit prior changes to memory
 	 * (Documentation/RCU/rculist_nulls.txt for details)
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index fa6cf1f..d95f875 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -755,7 +755,12 @@ static void tcp_update_pacing_rate(struct sock *sk)
 	if (tp->srtt > 8 + 2)
 		do_div(rate, tp->srtt);
 
-	sk->sk_pacing_rate = min_t(u64, rate, sk->sk_max_pacing_rate);
+	/* ACCESS_ONCE() is needed because FQ fetches sk_pacing_rate without
+	 * any lock. We want to make sure compiler wont use sk_pacing_rate
+	 * with intermediate values.
+	 */
+	ACCESS_ONCE(sk->sk_pacing_rate) = min_t(u64, rate,
+						sk->sk_max_pacing_rate);
 }
 
 /* Calculate rto without backoff.  This is the second half of Van Jacobson's
diff --git a/net/sched/sch_fq.c b/net/sched/sch_fq.c
index a2fef8b..46b2adb 100644
--- a/net/sched/sch_fq.c
+++ b/net/sched/sch_fq.c
@@ -472,20 +472,16 @@ begin:
 	if (f->credit > 0 || !q->rate_enable)
 		goto out;
 
-	if (skb->sk && skb->sk->sk_state != TCP_TIME_WAIT) {
-		rate = skb->sk->sk_pacing_rate ?: q->flow_default_rate;
+	rate = q->flow_max_rate;
+	if (skb->sk && skb->sk->sk_state != TCP_TIME_WAIT)
+		rate = min(ACCESS_ONCE(skb->sk->sk_pacing_rate), rate);
 
-		rate = min(rate, q->flow_max_rate);
-	} else {
-		rate = q->flow_max_rate;
-		if (rate == ~0U)
-			goto out;
-	}
-	if (rate) {
+	if (rate != ~0U) {
 		u32 plen = max(qdisc_pkt_len(skb), q->quantum);
 		u64 len = (u64)plen * NSEC_PER_SEC;
 
-		do_div(len, rate);
+		if (likely(rate))
+			do_div(len, rate);
 		/* Since socket rate can change later,
 		 * clamp the delay to 125 ms.
 		 * TODO: maybe segment the too big skb, as in commit
@@ -735,12 +731,14 @@ static int fq_dump(struct Qdisc *sch, struct sk_buff *skb)
 	if (opts == NULL)
 		goto nla_put_failure;
 
+	/* TCA_FQ_FLOW_DEFAULT_RATE is not used anymore,
+	 * do not bother giving its value
+	 */
 	if (nla_put_u32(skb, TCA_FQ_PLIMIT, sch->limit) ||
 	    nla_put_u32(skb, TCA_FQ_FLOW_PLIMIT, q->flow_plimit) ||
 	    nla_put_u32(skb, TCA_FQ_QUANTUM, q->quantum) ||
 	    nla_put_u32(skb, TCA_FQ_INITIAL_QUANTUM, q->initial_quantum) ||
 	    nla_put_u32(skb, TCA_FQ_RATE_ENABLE, q->rate_enable) ||
-	    nla_put_u32(skb, TCA_FQ_FLOW_DEFAULT_RATE, q->flow_default_rate) ||
 	    nla_put_u32(skb, TCA_FQ_FLOW_MAX_RATE, q->flow_max_rate) ||
 	    nla_put_u32(skb, TCA_FQ_BUCKETS_LOG, q->fq_trees_log))
 		goto nla_put_failure;

^ permalink raw reply related

* Re: [RFC PATCH 0/2 v2] net: alternate proposal for using macvlans with forwarding acceleration
From: John Fastabend @ 2013-10-07 22:09 UTC (permalink / raw)
  To: Neil Horman; +Cc: netdev, John Fastabend, Andy Gospodarek, David Miller
In-Reply-To: <1380917405-23801-1-git-send-email-nhorman@tuxdriver.com>

On 10/04/2013 01:10 PM, Neil Horman wrote:
> Hey all-
>       heres the next, updated version of the vsi/macvlan integration that we've
> been discussing.
>
> Some change notes:
>
> * Changes to the fowarding ops structure - Removed the priv_size field, and
> added a flags field.  Removal of the priv_size field was accomplished by just
> having the add method return a void * and using ERR_PTR and PTR_ERR checks,
> which also allows us to allocate memory for the acceleration path in the driver,
> which I like.  I'm not super happy still with how I'm using the flags (currenly
> only used to indicate support for feature sets), but at least we have the flags
> now, and they can be exposed to user space via iproute2 or ethtool if need be

For the flag why not use the existing feature flag namespace? Adding
NETIF_F_HW_L2FW_DOFFLOAD or something equivalent to netdev_features.h
and also doing the string updates would get the userspace support for
free.

The one downside of a global feature flag approach like this is if
the user wants to create some offloaded macvlan devices and some SW
only macvlans the control sequence is a bit clumsy. But unless we add
a new flag to the macvlan netlink message I'm not sure how to avoid
this. Further if you have multiple control applications
creating/deleting these macvlans the flag mechanism starts to break
down. One app sets the flag the other deletes, you get the idea.

It might be worth adding a netlink flag to change the state of
the HW offload. I know this goes against the 'it just gets offloaded'
line of thought, but if you make the default to offload then it should
be OK.

>
> * Changes to the Transmit path - Specifically I'm using dev_queue_xmit to send
> frames now, which I like as it makes the macvlan subject to the lowerdevs qdisc
> configuration.

This creates perhaps another oddity where on TX the packets will be
visible to the lowerdev. Think tcpdump and egress qdisc. But on ingress
the packets will be delivered directly to macvlan device. Also I was
thinking there might be good reasons to skip the lowerdev qdisc, for
performance reasons or QOS setups.

So it might be best the way you had it in the previous revision even if
it was not functionally equivalent to the SW path. If you skip the lower
dev and submit directly to the hardware then you also don't need a
sk_buff change. The driver can do a lookup via the skb->dev field and
additional hints from the stack are not needed.

If there is a perceived problem with not having the HW and SW completely
equivalent functionally we could always default the feature flag to off.
Then enabling the feature would be a single 'ethtool' command. This
seems like a nice compromise.

I like where this is going though!

Thanks,
John

-- 
John Fastabend         Intel Corporation

^ permalink raw reply

* Re: [E1000-devel] [PATCH RFC 00/77] Re-design MSI/MSI-X interrupts enablement pattern
From: Waskiewicz Jr, Peter P @ 2013-10-07 22:21 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Tejun Heo, linux-mips@linux-mips.org, VMware, Inc.,
	linux-pci@vger.kernel.org, linux-nvme@lists.infradead.org,
	linux-ide@vger.kernel.org, linux-s390@vger.kernel.org, King,
	linux-scsi@vger.kernel.org, linux-rdma@vger.kernel.org,
	x86@kernel.org, Ben Hutchings, Alexander Gordeev, Matt Porter,
	iss_storagedev@hp.com, Michael Ellerman, "lin
In-Reply-To: <1381176656.645.171.camel@pasglop>

On Tue, 2013-10-08 at 07:10 +1100, Benjamin Herrenschmidt wrote:
> On Mon, 2013-10-07 at 14:01 -0400, Tejun Heo wrote:
> > I don't think the same race condition would happen with the loop.  The
> > problem case is where multiple msi(x) allocation fails completely
> > because the global limit went down before inquiry and allocation.  In
> > the loop based interface, it'd retry with the lower number.
> > 
> > As long as the number of drivers which need this sort of adaptive
> > allocation isn't too high and the common cases can be made simple, I
> > don't think the "complex" part of interface is all that important.
> > Maybe we can have reserve / cancel type interface or just keep the
> > loop with more explicit function names (ie. try_enable or something
> > like that).
> 
> We want to be able to request an MSI-X at runtime anyway ... if I want
> to dynamically add a queue to my network interface, I want it to be able
> to pop a new arbitrary MSI-X.

If you want to dynamically allocate another queue, you'd either need to
have them all pre-allocated at alloc_etherdev_mqs(), or add a new API to
netdev that allows adding new queues on the fly.

How things are done today, the Tx queues are all tacked onto the end of
the netdev struct.  That would have to change to probably a linked list
of queues that could be grown or shrunk on the fly.
netif_alloc_netdev_queues() would need to change the kzalloc() to a list
allocation.

Cheers,
-PJ

^ permalink raw reply

* Re: [PATCH 1/2] net: Add layer 2 hardware acceleration operations for macvlan devices
From: John Fastabend @ 2013-10-07 22:39 UTC (permalink / raw)
  To: David Miller; +Cc: nhorman, netdev
In-Reply-To: <20131007.173433.163556658910279518.davem@davemloft.net>

On 10/07/2013 02:34 PM, David Miller wrote:
> From: Neil Horman <nhorman@tuxdriver.com>
> Date: Mon, 7 Oct 2013 17:20:00 -0400
>
>> Thats me experimenting.  I was thinking that origionally this functionality
>> might be grouped separately, so that we could handle it independently of the
>> standard network device operations (you might have noticed in v1 of my patch I
>> had a size_t variable in there, so I thought the separation might be
>> organizationally nice).  It was also something I was tinkering with for
>> potential future work to support other data plane accelerators (like the FM6000
>> switch chip from intel) in a manner that didn't pollute the more typical host network
>> devices.  Like I said though, just experimenting at the moment....
>

We can do something like the dcbnl ops and add another pointer off
the net device structure and then use the skb->dev field to find the
correct set of ops? This seems like the simplest option to me and
isolates the ops structure.

Is there some information loss from hanging it off the netdevice
structure vs the skb? I can't see any.

> Can these dataplane devices still act like a normal networking port and
> send and receive packets at the host level?
>

Yes they act like normal networking ports except for there is a
switching component in the hardware. These patches are not looking at
virtual or multiple physical functions at the moment.

> If yes, that would be an extremely strong argument for netdev_ops.

I agree.


-- 
John Fastabend         Intel Corporation

^ permalink raw reply

* Re: [PATCH] net: Separate the close_list and the unreg_list v2
From: Eric W. Biederman @ 2013-10-07 22:45 UTC (permalink / raw)
  To: David Miller; +Cc: fruggeri, netdev
In-Reply-To: <20131007.152238.779958484281422820.davem@davemloft.net>

David Miller <davem@davemloft.net> writes:

> From: ebiederm@xmission.com (Eric W. Biederman)
> Date: Sat, 05 Oct 2013 19:26:05 -0700
>
>> 
>> Separate the unreg_list and the close_list in dev_close_many preventing
>> dev_close_many from permuting the unreg_list.  The permutations of the
>> unreg_list have resulted in cases where the loopback device is accessed
>> it has been freed in code such as dst_ifdown.  Resulting in subtle memory
>> corruption.
>> 
>> This is the second bug from sharing the storage between the close_list
>> and the unreg_list.  The issues that crop up with sharing are
>> apparently too subtle to show up in normal testing or usage, so let's
>> forget about being clever and use two separate lists.
>> 
>> v2: Make all callers pass in a close_list to dev_close_many
>> 
>> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
>> ---
>> 
>> Sending the complete diff because this version is actually more
>> readable and more obviously correct.
>
> I'll apply this, thanks Eric.

Thanks.  It is good to see this getting sorted out.

Eric

^ permalink raw reply

* [PATCH 1/4] [RFC] net: Explicitly initialize u64_stats_sync structures for lockdep
From: John Stultz @ 2013-10-07 22:51 UTC (permalink / raw)
  To: LKML
  Cc: John Stultz, Eric Dumazet, Thomas Petazzoni, Mirko Lindner,
	Stephen Hemminger, Roger Luethi, Patrick McHardy, Rusty Russell,
	Michael S. Tsirkin, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Wensong Zhang, Simon Horman, Julian Anastasov,
	Jesse Gross, Mathieu Desnoyers, Steven Rostedt, Peter Zijlstra,
	Ingo Molnar, Thomas Gleixner, David S. Miller, netdev
In-Reply-To: <1381186321-4906-1-git-send-email-john.stultz@linaro.org>

In order to enable lockdep on seqcount/seqlock structures, we
must explicitly initialize any locks.

The u64_stats_sync structure, uses a seqcount, and thus we need
to introduce a u64_stats_init() function and use it to initialize
the structure.

This unfortunately adds a lot of fairly trivial initialization code
to a number of drivers. But the benefit of ensuring correctness makes
this worth while.

Because these changes are required for lockdep to be enabled, and the
changes are quite trivial, I've not yet split this patch out into 30-some
separate patches, as I figured it would be better to get the various
maintainers thoughts on how to best merge this change along with
the seqcount lockdep enablement.

Feedback would be appreciated!

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Cc: Mirko Lindner <mlindner@marvell.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Roger Luethi <rl@hellgate.ch>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: James Morris <jmorris@namei.org>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Wensong Zhang <wensong@linux-vs.org>
Cc: Simon Horman <horms@verge.net.au>
Cc: Julian Anastasov <ja@ssi.bg>
Cc: Jesse Gross <jesse@nicira.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Cc: netfilter-devel@vger.kernel.org
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 drivers/net/dummy.c                            |  6 ++++++
 drivers/net/ethernet/emulex/benet/be_main.c    |  4 ++++
 drivers/net/ethernet/intel/igb/igb_main.c      |  5 +++++
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c  |  4 ++++
 drivers/net/ethernet/marvell/mvneta.c          |  3 +++
 drivers/net/ethernet/marvell/sky2.c            |  3 +++
 drivers/net/ethernet/neterion/vxge/vxge-main.c |  4 ++++
 drivers/net/ethernet/nvidia/forcedeth.c        |  2 ++
 drivers/net/ethernet/realtek/8139too.c         |  3 +++
 drivers/net/ethernet/tile/tilepro.c            |  2 ++
 drivers/net/ethernet/via/via-rhine.c           |  3 +++
 drivers/net/ifb.c                              |  5 +++++
 drivers/net/loopback.c                         |  6 ++++++
 drivers/net/macvlan.c                          |  7 +++++++
 drivers/net/nlmon.c                            |  8 ++++++++
 drivers/net/team/team.c                        |  6 ++++++
 drivers/net/team/team_mode_loadbalance.c       |  9 ++++++++-
 drivers/net/veth.c                             |  8 ++++++++
 drivers/net/virtio_net.c                       |  8 ++++++++
 drivers/net/vxlan.c                            |  8 ++++++++
 drivers/net/xen-netfront.c                     |  6 ++++++
 include/linux/u64_stats_sync.h                 |  7 +++++++
 net/8021q/vlan_dev.c                           |  9 ++++++++-
 net/bridge/br_device.c                         |  7 +++++++
 net/ipv4/af_inet.c                             | 14 ++++++++++++++
 net/ipv4/ip_tunnel.c                           |  8 +++++++-
 net/ipv6/addrconf.c                            | 14 ++++++++++++++
 net/ipv6/af_inet6.c                            | 14 ++++++++++++++
 net/ipv6/ip6_gre.c                             | 15 +++++++++++++++
 net/ipv6/ip6_tunnel.c                          |  7 +++++++
 net/ipv6/sit.c                                 | 15 +++++++++++++++
 net/netfilter/ipvs/ip_vs_ctl.c                 | 25 ++++++++++++++++++++++---
 net/openvswitch/datapath.c                     |  6 ++++++
 net/openvswitch/vport.c                        |  8 ++++++++
 34 files changed, 253 insertions(+), 6 deletions(-)

diff --git a/drivers/net/dummy.c b/drivers/net/dummy.c
index b710c6b..bd8f84b 100644
--- a/drivers/net/dummy.c
+++ b/drivers/net/dummy.c
@@ -88,10 +88,16 @@ static netdev_tx_t dummy_xmit(struct sk_buff *skb, struct net_device *dev)
 
 static int dummy_dev_init(struct net_device *dev)
 {
+	int i;
 	dev->dstats = alloc_percpu(struct pcpu_dstats);
 	if (!dev->dstats)
 		return -ENOMEM;
 
+	for_each_possible_cpu(i) {
+		struct pcpu_dstats *dstats;
+		dstats = per_cpu_ptr(dev->dstats, i);
+		u64_stats_init(&dstats->syncp);
+	}
 	return 0;
 }
 
diff --git a/drivers/net/ethernet/emulex/benet/be_main.c b/drivers/net/ethernet/emulex/benet/be_main.c
index 100b528..d2dcf2e 100644
--- a/drivers/net/ethernet/emulex/benet/be_main.c
+++ b/drivers/net/ethernet/emulex/benet/be_main.c
@@ -2033,6 +2033,9 @@ static int be_tx_qs_create(struct be_adapter *adapter)
 		if (status)
 			return status;
 
+		u64_stats_init(&txo->stats.sync);
+		u64_stats_init(&txo->stats.sync_compl);
+
 		/* If num_evt_qs is less than num_tx_qs, then more than
 		 * one txq share an eq
 		 */
@@ -2094,6 +2097,7 @@ static int be_rx_cqs_create(struct be_adapter *adapter)
 		if (rc)
 			return rc;
 
+		u64_stats_init(&rxo->stats.sync);
 		eq = &adapter->eq_obj[i % adapter->num_evt_qs].q;
 		rc = be_cmd_cq_create(adapter, cq, eq, false, 3);
 		if (rc)
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 8cf44f2..b6edb93 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -1223,6 +1223,9 @@ static int igb_alloc_q_vector(struct igb_adapter *adapter,
 		ring->count = adapter->tx_ring_count;
 		ring->queue_index = txr_idx;
 
+		u64_stats_init(&ring->tx_syncp);
+		u64_stats_init(&ring->tx_syncp2);
+
 		/* assign ring to adapter */
 		adapter->tx_ring[txr_idx] = ring;
 
@@ -1256,6 +1259,8 @@ static int igb_alloc_q_vector(struct igb_adapter *adapter,
 		ring->count = adapter->rx_ring_count;
 		ring->queue_index = rxr_idx;
 
+		u64_stats_init(&ring->rx_syncp);
+
 		/* assign ring to adapter */
 		adapter->rx_ring[rxr_idx] = ring;
 	}
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 0ade0cd..c175036 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -4867,6 +4867,8 @@ int ixgbe_setup_tx_resources(struct ixgbe_ring *tx_ring)
 	if (!tx_ring->tx_buffer_info)
 		goto err;
 
+	u64_stats_init(&tx_ring->syncp);
+
 	/* round up to nearest 4K */
 	tx_ring->size = tx_ring->count * sizeof(union ixgbe_adv_tx_desc);
 	tx_ring->size = ALIGN(tx_ring->size, 4096);
@@ -4949,6 +4951,8 @@ int ixgbe_setup_rx_resources(struct ixgbe_ring *rx_ring)
 	if (!rx_ring->rx_buffer_info)
 		goto err;
 
+	u64_stats_init(&rx_ring->syncp);
+
 	/* Round up to nearest 4K */
 	rx_ring->size = rx_ring->count * sizeof(union ixgbe_adv_rx_desc);
 	rx_ring->size = ALIGN(rx_ring->size, 4096);
diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index e35bac7..cb4635c 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -2792,6 +2792,9 @@ static int mvneta_probe(struct platform_device *pdev)
 
 	pp = netdev_priv(dev);
 
+	u64_stats_init(&pp->tx_stats.syncp);
+	u64_stats_init(&pp->rx_stats.syncp);
+
 	pp->weight = MVNETA_RX_POLL_WEIGHT;
 	pp->phy_node = phy_node;
 	pp->phy_interface = phy_mode;
diff --git a/drivers/net/ethernet/marvell/sky2.c b/drivers/net/ethernet/marvell/sky2.c
index e09a8c6..339d841 100644
--- a/drivers/net/ethernet/marvell/sky2.c
+++ b/drivers/net/ethernet/marvell/sky2.c
@@ -4763,6 +4763,9 @@ static struct net_device *sky2_init_netdev(struct sky2_hw *hw, unsigned port,
 	sky2->hw = hw;
 	sky2->msg_enable = netif_msg_init(debug, default_msg);
 
+	u64_stats_init(&sky2->tx_stats.syncp);
+	u64_stats_init(&sky2->rx_stats.syncp);
+
 	/* Auto speed and flow control */
 	sky2->flags = SKY2_FLAG_AUTO_SPEED | SKY2_FLAG_AUTO_PAUSE;
 	if (hw->chip_id != CHIP_ID_YUKON_XL)
diff --git a/drivers/net/ethernet/neterion/vxge/vxge-main.c b/drivers/net/ethernet/neterion/vxge/vxge-main.c
index 5a20eaf..44626ec 100644
--- a/drivers/net/ethernet/neterion/vxge/vxge-main.c
+++ b/drivers/net/ethernet/neterion/vxge/vxge-main.c
@@ -2072,6 +2072,10 @@ static int vxge_open_vpaths(struct vxgedev *vdev)
 				vdev->config.tx_steering_type;
 			vpath->fifo.ndev = vdev->ndev;
 			vpath->fifo.pdev = vdev->pdev;
+
+			u64_stats_init(&vpath->fifo.stats.syncp);
+			u64_stats_init(&vpath->ring.stats.syncp);
+
 			if (vdev->config.tx_steering_type)
 				vpath->fifo.txq =
 					netdev_get_tx_queue(vdev->ndev, i);
diff --git a/drivers/net/ethernet/nvidia/forcedeth.c b/drivers/net/ethernet/nvidia/forcedeth.c
index 098b96d..2d045be 100644
--- a/drivers/net/ethernet/nvidia/forcedeth.c
+++ b/drivers/net/ethernet/nvidia/forcedeth.c
@@ -5619,6 +5619,8 @@ static int nv_probe(struct pci_dev *pci_dev, const struct pci_device_id *id)
 	spin_lock_init(&np->lock);
 	spin_lock_init(&np->hwstats_lock);
 	SET_NETDEV_DEV(dev, &pci_dev->dev);
+	u64_stats_init(&np->swstats_rx_syncp);
+	u64_stats_init(&np->swstats_tx_syncp);
 
 	init_timer(&np->oom_kick);
 	np->oom_kick.data = (unsigned long) dev;
diff --git a/drivers/net/ethernet/realtek/8139too.c b/drivers/net/ethernet/realtek/8139too.c
index 3ccedeb..c40e9848 100644
--- a/drivers/net/ethernet/realtek/8139too.c
+++ b/drivers/net/ethernet/realtek/8139too.c
@@ -791,6 +791,9 @@ static struct net_device *rtl8139_init_board(struct pci_dev *pdev)
 
 	pci_set_master (pdev);
 
+	u64_stats_init(&tp->rx_stats.syncp);
+	u64_stats_init(&tp->tx_stats.syncp);
+
 retry:
 	/* PIO bar register comes first. */
 	bar = !use_io;
diff --git a/drivers/net/ethernet/tile/tilepro.c b/drivers/net/ethernet/tile/tilepro.c
index 106be47..edb2e12 100644
--- a/drivers/net/ethernet/tile/tilepro.c
+++ b/drivers/net/ethernet/tile/tilepro.c
@@ -1008,6 +1008,8 @@ static void tile_net_register(void *dev_ptr)
 	info->egress_timer.data = (long)info;
 	info->egress_timer.function = tile_net_handle_egress_timer;
 
+	u64_stats_init(&info->stats.syncp);
+
 	priv->cpu[my_cpu] = info;
 
 	/*
diff --git a/drivers/net/ethernet/via/via-rhine.c b/drivers/net/ethernet/via/via-rhine.c
index c8f088a..13cade2 100644
--- a/drivers/net/ethernet/via/via-rhine.c
+++ b/drivers/net/ethernet/via/via-rhine.c
@@ -987,6 +987,9 @@ static int rhine_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 
 	rp->base = ioaddr;
 
+	u64_stats_init(&rp->tx_stats.syncp);
+	u64_stats_init(&rp->rx_stats.syncp);
+
 	/* Get chip registers into a sane state */
 	rhine_power_init(dev);
 	rhine_hw_init(dev, pioaddr);
diff --git a/drivers/net/ifb.c b/drivers/net/ifb.c
index a3bed28..c14d39b 100644
--- a/drivers/net/ifb.c
+++ b/drivers/net/ifb.c
@@ -265,6 +265,7 @@ MODULE_PARM_DESC(numifbs, "Number of ifb devices");
 static int __init ifb_init_one(int index)
 {
 	struct net_device *dev_ifb;
+	struct ifb_private *dp;
 	int err;
 
 	dev_ifb = alloc_netdev(sizeof(struct ifb_private),
@@ -273,6 +274,10 @@ static int __init ifb_init_one(int index)
 	if (!dev_ifb)
 		return -ENOMEM;
 
+	dp = netdev_priv(dev_ifb);
+	u64_stats_init(&dp->rsync);
+	u64_stats_init(&dp->tsync);
+
 	dev_ifb->rtnl_link_ops = &ifb_link_ops;
 	err = register_netdevice(dev_ifb);
 	if (err < 0)
diff --git a/drivers/net/loopback.c b/drivers/net/loopback.c
index a17d85a..ac24c27 100644
--- a/drivers/net/loopback.c
+++ b/drivers/net/loopback.c
@@ -137,10 +137,16 @@ static const struct ethtool_ops loopback_ethtool_ops = {
 
 static int loopback_dev_init(struct net_device *dev)
 {
+	int i;
 	dev->lstats = alloc_percpu(struct pcpu_lstats);
 	if (!dev->lstats)
 		return -ENOMEM;
 
+	for_each_possible_cpu(i) {
+		struct pcpu_lstats *lb_stats;
+		lb_stats = per_cpu_ptr(dev->lstats, i);
+		u64_stats_init(&lb_stats->syncp);
+	}
 	return 0;
 }
 
diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index 9bf46bd..0924e51b 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -501,6 +501,7 @@ static int macvlan_init(struct net_device *dev)
 {
 	struct macvlan_dev *vlan = netdev_priv(dev);
 	const struct net_device *lowerdev = vlan->lowerdev;
+	int i;
 
 	dev->state		= (dev->state & ~MACVLAN_STATE_MASK) |
 				  (lowerdev->state & MACVLAN_STATE_MASK);
@@ -516,6 +517,12 @@ static int macvlan_init(struct net_device *dev)
 	if (!vlan->pcpu_stats)
 		return -ENOMEM;
 
+	for_each_possible_cpu(i) {
+		struct macvlan_pcpu_stats *mvlstats;
+		mvlstats = per_cpu_ptr(vlan->pcpu_stats, i);
+		u64_stats_init(&mvlstats->syncp);
+	}
+
 	return 0;
 }
 
diff --git a/drivers/net/nlmon.c b/drivers/net/nlmon.c
index b57ce5f..d2bb12b 100644
--- a/drivers/net/nlmon.c
+++ b/drivers/net/nlmon.c
@@ -47,8 +47,16 @@ static int nlmon_change_mtu(struct net_device *dev, int new_mtu)
 
 static int nlmon_dev_init(struct net_device *dev)
 {
+	int i;
+
 	dev->lstats = alloc_percpu(struct pcpu_lstats);
 
+	for_each_possible_cpu(i) {
+		struct pcpu_lstats *nlmstats;
+		nlmstats = per_cpu_ptr(dev->lstats, i);
+		u64_stats_init(&nlmstats->syncp);
+	}
+
 	return dev->lstats == NULL ? -ENOMEM : 0;
 }
 
diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c
index 50e43e6..6574eb8 100644
--- a/drivers/net/team/team.c
+++ b/drivers/net/team/team.c
@@ -1540,6 +1540,12 @@ static int team_init(struct net_device *dev)
 	if (!team->pcpu_stats)
 		return -ENOMEM;
 
+	for_each_possible_cpu(i) {
+		struct team_pcpu_stats *team_stats;
+		team_stats = per_cpu_ptr(team->pcpu_stats, i);
+		u64_stats_init(&team_stats->syncp);
+	}
+
 	for (i = 0; i < TEAM_PORT_HASHENTRIES; i++)
 		INIT_HLIST_HEAD(&team->en_port_hlist[i]);
 	INIT_LIST_HEAD(&team->port_list);
diff --git a/drivers/net/team/team_mode_loadbalance.c b/drivers/net/team/team_mode_loadbalance.c
index 829a9cd..d671fc3 100644
--- a/drivers/net/team/team_mode_loadbalance.c
+++ b/drivers/net/team/team_mode_loadbalance.c
@@ -570,7 +570,7 @@ static int lb_init(struct team *team)
 {
 	struct lb_priv *lb_priv = get_lb_priv(team);
 	lb_select_tx_port_func_t *func;
-	int err;
+	int i, err;
 
 	/* set default tx port selector */
 	func = lb_select_tx_port_get_func("hash");
@@ -588,6 +588,13 @@ static int lb_init(struct team *team)
 		goto err_alloc_pcpu_stats;
 	}
 
+	for_each_possible_cpu(i) {
+		struct lb_pcpu_stats *team_lb_stats;
+		team_lb_stats = per_cpu_ptr(lb_priv->pcpu_stats, i);
+		u64_stats_init(&team_lb_stats->syncp);
+	}
+
+
 	INIT_DELAYED_WORK(&lb_priv->ex->stats.refresh_dw, lb_stats_refresh);
 
 	err = team_options_register(team, lb_options, ARRAY_SIZE(lb_options));
diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index eee1f19..46e83e3 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -230,10 +230,18 @@ static int veth_change_mtu(struct net_device *dev, int new_mtu)
 
 static int veth_dev_init(struct net_device *dev)
 {
+	int i;
+
 	dev->vstats = alloc_percpu(struct pcpu_vstats);
 	if (!dev->vstats)
 		return -ENOMEM;
 
+	for_each_possible_cpu(i) {
+		struct pcpu_vstats *veth_stats;
+		veth_stats = per_cpu_ptr(dev->vstats, i);
+		u64_stats_init(&veth_stats->syncp);
+	}
+
 	return 0;
 }
 
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index defec2b..bd12772 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1559,6 +1559,14 @@ static int virtnet_probe(struct virtio_device *vdev)
 	if (vi->stats == NULL)
 		goto free;
 
+	for_each_possible_cpu(i) {
+		struct virtnet_stats *virtnet_stats;
+		virtnet_stats = per_cpu_ptr(vi->stats, i);
+		u64_stats_init(&virtnet_stats->tx_syncp);
+		u64_stats_init(&virtnet_stats->rx_syncp);
+	}
+
+
 	vi->vq_index = alloc_percpu(int);
 	if (vi->vq_index == NULL)
 		goto free_stats;
diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index d1292fe..2e4cdc8 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -1886,11 +1886,19 @@ static int vxlan_init(struct net_device *dev)
 	struct vxlan_dev *vxlan = netdev_priv(dev);
 	struct vxlan_net *vn = net_generic(dev_net(dev), vxlan_net_id);
 	struct vxlan_sock *vs;
+	int i;
 
 	dev->tstats = alloc_percpu(struct pcpu_tstats);
 	if (!dev->tstats)
 		return -ENOMEM;
 
+	for_each_possible_cpu(i) {
+		struct pcpu_tstats *vxlan_stats;
+		vxlan_stats = per_cpu_ptr(dev->tstats, i);
+		u64_stats_init(&vxlan_stats->syncp);
+	}
+
+
 	spin_lock(&vn->sock_lock);
 	vs = vxlan_find_sock(dev_net(dev), vxlan->dst_port);
 	if (vs) {
diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index 36808bf..54223ac 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -1338,6 +1338,12 @@ static struct net_device *xennet_create_dev(struct xenbus_device *dev)
 	if (np->stats == NULL)
 		goto exit;
 
+	for_each_possible_cpu(i) {
+		struct netfront_stats *xen_nf_stats;
+		xen_nf_stats = per_cpu_ptr(np->stats, i);
+		u64_stats_init(&xen_nf_stats->syncp);
+	}
+
 	/* Initialise tx_skbs as a free chain containing every entry. */
 	np->tx_skb_freelist = 0;
 	for (i = 0; i < NET_TX_RING_SIZE; i++) {
diff --git a/include/linux/u64_stats_sync.h b/include/linux/u64_stats_sync.h
index 8da8c4e..7bfabd2 100644
--- a/include/linux/u64_stats_sync.h
+++ b/include/linux/u64_stats_sync.h
@@ -67,6 +67,13 @@ struct u64_stats_sync {
 #endif
 };
 
+
+#if BITS_PER_LONG == 32 && defined(CONFIG_SMP)
+# define u64_stats_init(syncp)	seqcount_init(syncp.seq)
+#else
+# define u64_stats_init(syncp)	do { } while (0)
+#endif
+
 static inline void u64_stats_update_begin(struct u64_stats_sync *syncp)
 {
 #if BITS_PER_LONG==32 && defined(CONFIG_SMP)
diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
index 09bf1c3..4deff3e 100644
--- a/net/8021q/vlan_dev.c
+++ b/net/8021q/vlan_dev.c
@@ -558,7 +558,7 @@ static const struct net_device_ops vlan_netdev_ops;
 static int vlan_dev_init(struct net_device *dev)
 {
 	struct net_device *real_dev = vlan_dev_priv(dev)->real_dev;
-	int subclass = 0;
+	int subclass = 0, i;
 
 	netif_carrier_off(dev);
 
@@ -612,6 +612,13 @@ static int vlan_dev_init(struct net_device *dev)
 	if (!vlan_dev_priv(dev)->vlan_pcpu_stats)
 		return -ENOMEM;
 
+	for_each_possible_cpu(i) {
+		struct vlan_pcpu_stats *vlan_stat;
+		vlan_stat = per_cpu_ptr(vlan_dev_priv(dev)->vlan_pcpu_stats, i);
+		u64_stats_init(&vlan_stat->syncp);
+	}
+
+
 	return 0;
 }
 
diff --git a/net/bridge/br_device.c b/net/bridge/br_device.c
index ca04163..7893d64 100644
--- a/net/bridge/br_device.c
+++ b/net/bridge/br_device.c
@@ -88,11 +88,18 @@ out:
 static int br_dev_init(struct net_device *dev)
 {
 	struct net_bridge *br = netdev_priv(dev);
+	int i;
 
 	br->stats = alloc_percpu(struct br_cpu_netstats);
 	if (!br->stats)
 		return -ENOMEM;
 
+	for_each_possible_cpu(i) {
+		struct br_cpu_netstats *br_dev_stats;
+		br_dev_stats = per_cpu_ptr(br->stats, i);
+		u64_stats_init(&br_dev_stats->syncp);
+	}
+
 	return 0;
 }
 
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 7a1874b..f40ce62 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1520,6 +1520,7 @@ int snmp_mib_init(void __percpu *ptr[2], size_t mibsize, size_t align)
 	ptr[0] = __alloc_percpu(mibsize, align);
 	if (!ptr[0])
 		return -ENOMEM;
+
 #if SNMP_ARRAY_SZ == 2
 	ptr[1] = __alloc_percpu(mibsize, align);
 	if (!ptr[1]) {
@@ -1563,6 +1564,8 @@ static const struct net_protocol icmp_protocol = {
 
 static __net_init int ipv4_mib_init_net(struct net *net)
 {
+	int i;
+
 	if (snmp_mib_init((void __percpu **)net->mib.tcp_statistics,
 			  sizeof(struct tcp_mib),
 			  __alignof__(struct tcp_mib)) < 0)
@@ -1571,6 +1574,17 @@ static __net_init int ipv4_mib_init_net(struct net *net)
 			  sizeof(struct ipstats_mib),
 			  __alignof__(struct ipstats_mib)) < 0)
 		goto err_ip_mib;
+
+	for_each_possible_cpu(i) {
+		struct ipstats_mib *af_inet_stats;
+		af_inet_stats = per_cpu_ptr(net->mib.ip_statistics[0], i);
+		u64_stats_init(&af_inet_stats->syncp);
+#if SNMP_ARRAY_SZ == 2
+		af_inet_stats = per_cpu_ptr(net->mib.ip_statistics[1], i);
+		u64_stats_init(&af_inet_stats->syncp);
+#endif
+	}
+
 	if (snmp_mib_init((void __percpu **)net->mib.net_statistics,
 			  sizeof(struct linux_mib),
 			  __alignof__(struct linux_mib)) < 0)
diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c
index ac9fabe..2b9c945 100644
--- a/net/ipv4/ip_tunnel.c
+++ b/net/ipv4/ip_tunnel.c
@@ -976,13 +976,19 @@ int ip_tunnel_init(struct net_device *dev)
 {
 	struct ip_tunnel *tunnel = netdev_priv(dev);
 	struct iphdr *iph = &tunnel->parms.iph;
-	int err;
+	int i, err;
 
 	dev->destructor	= ip_tunnel_dev_free;
 	dev->tstats = alloc_percpu(struct pcpu_tstats);
 	if (!dev->tstats)
 		return -ENOMEM;
 
+	for_each_possible_cpu(i) {
+		struct pcpu_tstats *ipt_stats;
+		ipt_stats = per_cpu_ptr(dev->tstats, i);
+		u64_stats_init(&ipt_stats->syncp);
+	}
+
 	err = gro_cells_init(&tunnel->gro_cells, dev);
 	if (err) {
 		free_percpu(dev->tstats);
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index d6ff126..390953c 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -281,10 +281,24 @@ static void addrconf_mod_dad_timer(struct inet6_ifaddr *ifp,
 
 static int snmp6_alloc_dev(struct inet6_dev *idev)
 {
+	int i;
+
 	if (snmp_mib_init((void __percpu **)idev->stats.ipv6,
 			  sizeof(struct ipstats_mib),
 			  __alignof__(struct ipstats_mib)) < 0)
 		goto err_ip;
+
+	for_each_possible_cpu(i) {
+		struct ipstats_mib *addrconf_stats;
+		addrconf_stats = per_cpu_ptr(idev->stats.ipv6[0], i);
+		u64_stats_init(&addrconf_stats->syncp);
+#if SNMP_ARRAY_SZ == 2
+		addrconf_stats = per_cpu_ptr(idev->stats.ipv6[1], i);
+		u64_stats_init(&addrconf_stats->syncp);
+#endif
+	}
+
+
 	idev->stats.icmpv6dev = kzalloc(sizeof(struct icmpv6_mib_device),
 					GFP_KERNEL);
 	if (!idev->stats.icmpv6dev)
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index 7c96100..a8f8559 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -719,6 +719,8 @@ static void ipv6_packet_cleanup(void)
 
 static int __net_init ipv6_init_mibs(struct net *net)
 {
+	int i;
+
 	if (snmp_mib_init((void __percpu **)net->mib.udp_stats_in6,
 			  sizeof(struct udp_mib),
 			  __alignof__(struct udp_mib)) < 0)
@@ -731,6 +733,18 @@ static int __net_init ipv6_init_mibs(struct net *net)
 			  sizeof(struct ipstats_mib),
 			  __alignof__(struct ipstats_mib)) < 0)
 		goto err_ip_mib;
+
+	for_each_possible_cpu(i) {
+		struct ipstats_mib *af_inet6_stats;
+		af_inet6_stats = per_cpu_ptr(net->mib.ipv6_statistics[0], i);
+		u64_stats_init(&af_inet6_stats->syncp);
+#if SNMP_ARRAY_SZ == 2
+		af_inet6_stats = per_cpu_ptr(net->mib.ipv6_statistics[1], i);
+		u64_stats_init(&af_inet6_stats->syncp);
+#endif
+	}
+
+
 	if (snmp_mib_init((void __percpu **)net->mib.icmpv6_statistics,
 			  sizeof(struct icmpv6_mib),
 			  __alignof__(struct icmpv6_mib)) < 0)
diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
index 6b26e9f..b355cb0 100644
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -1254,6 +1254,7 @@ static void ip6gre_tunnel_setup(struct net_device *dev)
 static int ip6gre_tunnel_init(struct net_device *dev)
 {
 	struct ip6_tnl *tunnel;
+	int i;
 
 	tunnel = netdev_priv(dev);
 
@@ -1271,6 +1272,13 @@ static int ip6gre_tunnel_init(struct net_device *dev)
 	if (!dev->tstats)
 		return -ENOMEM;
 
+	for_each_possible_cpu(i) {
+		struct pcpu_tstats *ip6gre_tunnel_stats;
+		ip6gre_tunnel_stats = per_cpu_ptr(dev->tstats, i);
+		u64_stats_init(&ip6gre_tunnel_stats->syncp);
+	}
+
+
 	return 0;
 }
 
@@ -1451,6 +1459,7 @@ static void ip6gre_netlink_parms(struct nlattr *data[],
 static int ip6gre_tap_init(struct net_device *dev)
 {
 	struct ip6_tnl *tunnel;
+	int i;
 
 	tunnel = netdev_priv(dev);
 
@@ -1464,6 +1473,12 @@ static int ip6gre_tap_init(struct net_device *dev)
 	if (!dev->tstats)
 		return -ENOMEM;
 
+	for_each_possible_cpu(i) {
+		struct pcpu_tstats *ip6gre_tap_stats;
+		ip6gre_tap_stats = per_cpu_ptr(dev->tstats, i);
+		u64_stats_init(&ip6gre_tap_stats->syncp);
+	}
+
 	return 0;
 }
 
diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index 2d8f482..b0e3aa1 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -1486,12 +1486,19 @@ static inline int
 ip6_tnl_dev_init_gen(struct net_device *dev)
 {
 	struct ip6_tnl *t = netdev_priv(dev);
+	int i;
 
 	t->dev = dev;
 	t->net = dev_net(dev);
 	dev->tstats = alloc_percpu(struct pcpu_tstats);
 	if (!dev->tstats)
 		return -ENOMEM;
+
+	for_each_possible_cpu(i) {
+		struct pcpu_tstats *ip6_tnl_stats;
+		ip6_tnl_stats = per_cpu_ptr(dev->tstats, i);
+		u64_stats_init(&ip6_tnl_stats->syncp);
+	}
 	return 0;
 }
 
diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
index 7ee5cb9..24889fc 100644
--- a/net/ipv6/sit.c
+++ b/net/ipv6/sit.c
@@ -1256,6 +1256,7 @@ static void ipip6_tunnel_setup(struct net_device *dev)
 static int ipip6_tunnel_init(struct net_device *dev)
 {
 	struct ip_tunnel *tunnel = netdev_priv(dev);
+	int i;
 
 	tunnel->dev = dev;
 	tunnel->net = dev_net(dev);
@@ -1268,6 +1269,12 @@ static int ipip6_tunnel_init(struct net_device *dev)
 	if (!dev->tstats)
 		return -ENOMEM;
 
+	for_each_possible_cpu(i) {
+		struct pcpu_tstats *ipip6_tunnel_stats;
+		ipip6_tunnel_stats = per_cpu_ptr(dev->tstats, i);
+		u64_stats_init(&ipip6_tunnel_stats->syncp);
+	}
+
 	return 0;
 }
 
@@ -1277,6 +1284,7 @@ static int __net_init ipip6_fb_tunnel_init(struct net_device *dev)
 	struct iphdr *iph = &tunnel->parms.iph;
 	struct net *net = dev_net(dev);
 	struct sit_net *sitn = net_generic(net, sit_net_id);
+	int i;
 
 	tunnel->dev = dev;
 	tunnel->net = dev_net(dev);
@@ -1290,6 +1298,13 @@ static int __net_init ipip6_fb_tunnel_init(struct net_device *dev)
 	dev->tstats = alloc_percpu(struct pcpu_tstats);
 	if (!dev->tstats)
 		return -ENOMEM;
+
+	for_each_possible_cpu(i) {
+		struct pcpu_tstats *ipip6_fb_stats;
+		ipip6_fb_stats = per_cpu_ptr(dev->tstats, i);
+		u64_stats_init(&ipip6_fb_stats->syncp);
+	}
+
 	dev_hold(dev);
 	rcu_assign_pointer(sitn->tunnels_wc[0], tunnel);
 	return 0;
diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
index c8148e4..5c54c23 100644
--- a/net/netfilter/ipvs/ip_vs_ctl.c
+++ b/net/netfilter/ipvs/ip_vs_ctl.c
@@ -836,7 +836,7 @@ ip_vs_new_dest(struct ip_vs_service *svc, struct ip_vs_dest_user_kern *udest,
 	       struct ip_vs_dest **dest_p)
 {
 	struct ip_vs_dest *dest;
-	unsigned int atype;
+	unsigned int atype, i;
 
 	EnterFunction(2);
 
@@ -863,6 +863,12 @@ ip_vs_new_dest(struct ip_vs_service *svc, struct ip_vs_dest_user_kern *udest,
 	if (!dest->stats.cpustats)
 		goto err_alloc;
 
+	for_each_possible_cpu(i) {
+		struct ip_vs_cpu_stats *ip_vs_dest_stats;
+		ip_vs_dest_stats = per_cpu_ptr(dest->stats.cpustats, i);
+		u64_stats_init(&ip_vs_dest_stats->syncp);
+	}
+
 	dest->af = svc->af;
 	dest->protocol = svc->protocol;
 	dest->vaddr = svc->addr;
@@ -1136,7 +1142,7 @@ static int
 ip_vs_add_service(struct net *net, struct ip_vs_service_user_kern *u,
 		  struct ip_vs_service **svc_p)
 {
-	int ret = 0;
+	int ret = 0, i;
 	struct ip_vs_scheduler *sched = NULL;
 	struct ip_vs_pe *pe = NULL;
 	struct ip_vs_service *svc = NULL;
@@ -1186,6 +1192,13 @@ ip_vs_add_service(struct net *net, struct ip_vs_service_user_kern *u,
 		goto out_err;
 	}
 
+	for_each_possible_cpu(i) {
+		struct ip_vs_cpu_stats *ip_vs_stats;
+		ip_vs_stats = per_cpu_ptr(svc->stats.cpustats, i);
+		u64_stats_init(&ip_vs_stats->syncp);
+	}
+
+
 	/* I'm the first user of the service */
 	atomic_set(&svc->refcnt, 0);
 
@@ -3796,7 +3809,7 @@ static struct notifier_block ip_vs_dst_notifier = {
 
 int __net_init ip_vs_control_net_init(struct net *net)
 {
-	int idx;
+	int i, idx;
 	struct netns_ipvs *ipvs = net_ipvs(net);
 
 	/* Initialize rs_table */
@@ -3815,6 +3828,12 @@ int __net_init ip_vs_control_net_init(struct net *net)
 	if (!ipvs->tot_stats.cpustats)
 		return -ENOMEM;
 
+	for_each_possible_cpu(i) {
+		struct ip_vs_cpu_stats *ipvs_tot_stats;
+		ipvs_tot_stats = per_cpu_ptr(ipvs->tot_stats.cpustats, i);
+		u64_stats_init(&ipvs_tot_stats->syncp);
+	}
+
 	spin_lock_init(&ipvs->tot_stats.lock);
 
 	proc_create("ip_vs", 0, net->proc_net, &ip_vs_info_fops);
diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 2aa13bd..b92553c 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -1698,6 +1698,12 @@ static int ovs_dp_cmd_new(struct sk_buff *skb, struct genl_info *info)
 		goto err_destroy_table;
 	}
 
+	for_each_possible_cpu(i) {
+		struct dp_stats_percpu *dpath_stats;
+		dpath_stats = per_cpu_ptr(dp->stats_percpu, i);
+		u64_stats_init(&dpath_stats->sync);
+	}
+
 	dp->ports = kmalloc(DP_VPORT_HASH_BUCKETS * sizeof(struct hlist_head),
 			GFP_KERNEL);
 	if (!dp->ports) {
diff --git a/net/openvswitch/vport.c b/net/openvswitch/vport.c
index 6f65dbe..d830a95f 100644
--- a/net/openvswitch/vport.c
+++ b/net/openvswitch/vport.c
@@ -118,6 +118,7 @@ struct vport *ovs_vport_alloc(int priv_size, const struct vport_ops *ops,
 {
 	struct vport *vport;
 	size_t alloc_size;
+	int i;
 
 	alloc_size = sizeof(struct vport);
 	if (priv_size) {
@@ -141,6 +142,13 @@ struct vport *ovs_vport_alloc(int priv_size, const struct vport_ops *ops,
 		return ERR_PTR(-ENOMEM);
 	}
 
+	for_each_possible_cpu(i) {
+		struct pcpu_tstats *vport_stats;
+		vport_stats = per_cpu_ptr(vport->percpu_stats, i);
+		u64_stats_init(&vport_stats->syncp);
+	}
+
+
 	spin_lock_init(&vport->stats_lock);
 
 	return vport;
-- 
1.8.1.2


^ permalink raw reply related

* Re: bug in passing file descriptors
From: Andi Kleen @ 2013-10-07 22:55 UTC (permalink / raw)
  To: David Miller; +Cc: sar, luto, netdev, mtk.manpages, ebiederm
In-Reply-To: <20131007.173237.1669132001431607341.davem@davemloft.net>

David Miller <davem@davemloft.net> writes:

> From: Steve Rago <sar@nec-labs.com>
> Date: Mon, 7 Oct 2013 16:29:15 -0400
>
>> On 10/07/2013 03:42 PM, David Miller wrote:
>>> There is no compatability issue.
>>>
>>> 32-bit tasks will always see the 4-byte align/length.
>>> 64-bit tasks will always see the 8-byte align/length.
>>>
>> 
>> Really?  So when I compile my application on a 32-bit Linux box and
>> then try to run it on a 64-bit Linux box, you're not going to overrun
>> my buffer when CMSG_SPACE led me to allocate an insufficient amount of
>> memory needed to account for padding on the 64-bit platform?
>
> We have a compatability layer that gives 32-bit applications the
> same behavior as if they had run on a 32-bit machine.
>
> Search around for the MSG_MSG_COMPAT flag and how that is used in
> net/socket.c

But it seems the compat layer doesn't handle this correctly,
otherwise Steve's original test case would work.

Must be a bug somewhere in the compat layer.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only

^ permalink raw reply

* [PATCH] net ipv4: Allow unprivileged users to use most of the per net systctls
From: Eric W. Biederman @ 2013-10-07 23:58 UTC (permalink / raw)
  To: David Miller; +Cc: netdev


Allow unprivileged users to use:
/proc/sys/net/ipv4/icmp_echo_ignore_all
/proc/sys/net/ipv4/icmp_echo_ignore_broadcasts
/proc/sys/net/ipv4/icmp_ignore_bogus_error_response
/proc/sys/net/ipv4/icmp_errors_use_inbound_ifaddr
/proc/sys/net/ipv4/icmp_ratelimit
/proc/sys/net/ipv4/icmp_ratemask
/proc/sys/net/ipv4/ping_group_range
/proc/sys/net/ipv4/tcp_ecn
/proc/sys/net/ipv4/ip_local_ports_range

These are occassionally handy and after a quick review I don't see
any problems with unprivileged users using them.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 net/ipv4/sysctl_net_ipv4.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index c08f096d46b5..470ea82fca51 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -898,9 +898,9 @@ static __net_init int ipv4_sysctl_init_net(struct net *net)
 		table[8].data =
 			&net->ipv4.sysctl_local_ports.range;
 
-		/* Don't export sysctls to unprivileged users */
+		/* Don't export dangerous sysctls to unprivileged users */
 		if (net->user_ns != &init_user_ns)
-			table[0].procname = NULL;
+			table[9].procname = NULL;
 	}
 
 	/*
-- 
1.7.5.4

^ permalink raw reply related

* [PATCH] mac80211: Use ERR_CAST inlined function instead of ERR_PTR(PTR_ERR(...))
From: djduanjiong-Re5JQEeQqe8AvxtiuMwx3w @ 2013-10-08  0:09 UTC (permalink / raw)
  To: Johannes Berg, John W. Linville, David S. Miller
  Cc: linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA, Duan Jiong

From: Duan Jiong <duanj.fnst-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>

Signed-off-by: Duan Jiong <duanj.fnst-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
---
 net/mac80211/key.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/mac80211/key.c b/net/mac80211/key.c
index 620677e..3e51dd7 100644
--- a/net/mac80211/key.c
+++ b/net/mac80211/key.c
@@ -879,7 +879,7 @@ ieee80211_gtk_rekey_add(struct ieee80211_vif *vif,
 				  keyconf->keylen, keyconf->key,
 				  0, NULL);
 	if (IS_ERR(key))
-		return ERR_PTR(PTR_ERR(key));
+		return ERR_CAST(key);
 
 	if (sdata->u.mgd.mfp != IEEE80211_MFP_DISABLED)
 		key->conf.flags |= IEEE80211_KEY_FLAG_RX_MGMT;
-- 
1.7.7.6

--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox