Netdev List
 help / color / mirror / Atom feed
* Re: [QA-TCP] How to send tcp small packages immediately?
From: Rick Jones @ 2014-10-24 15:19 UTC (permalink / raw)
  To: Zhangjie (HZ), kvm, Jason Wang, Michael S. Tsirkin, linux-kernel,
	netdev, liuyongan, qinchuanyu
In-Reply-To: <544A029D.3080508@huawei.com>

On 10/24/2014 12:41 AM, Zhangjie (HZ) wrote:
> Hi,
>
> I use netperf to test the performance of small tcp package, with TCP_NODELAY set :
>
> netperf -H 129.9.7.164 -l 100 -- -m 512 -D
>
> Among the packages I got by tcpdump, there is not only small packages, also lost of
> big ones (skb->len=65160).
>
> IP 129.9.7.186.60840 > 129.9.7.164.34607: tcp 65160
> IP 129.9.7.164.34607 > 129.9.7.186.60840: tcp 0
> IP 129.9.7.164.34607 > 129.9.7.186.60840: tcp 0
> IP 129.9.7.164.34607 > 129.9.7.186.60840: tcp 0
> IP 129.9.7.186.60840 > 129.9.7.164.34607: tcp 65160
> IP 129.9.7.164.34607 > 129.9.7.186.60840: tcp 0
> IP 129.9.7.164.34607 > 129.9.7.186.60840: tcp 0
> IP 129.9.7.164.34607 > 129.9.7.186.60840: tcp 0
> IP 129.9.7.186.60840 > 129.9.7.164.34607: tcp 80
> IP 129.9.7.186.60840 > 129.9.7.164.34607: tcp 512
> IP 129.9.7.186.60840 > 129.9.7.164.34607: tcp 512
>
> SO, how to test small tcp packages? Including TCP_NODELAY, What else should be set?

Well, I don't think there is anything else you can set.  Even with 
TCP_NODELAY set, segment size with TCP will still be controlled by 
factors such as congestion window.

I am ass-u-me-ing your packet trace is at the sender.  I suppose if your 
sender were fast enough compared to the path that might combine with 
congestion window to result in the very large segments.

Not to say there cannot be a bug somewhere with TSO overriding 
TCP_NODELAY, but in broad terms, even TCP_NODELAY does not guarantee 
small TCP segments.  That has been something of a bane on my attempts to 
use TCP for aggregate small-packet performance measurements via netperf 
for quite some time.

And since you seem to have included a virtualization mailing list I 
would also ass-u-me that virtualization is involved somehow.  Knuth only 
knows how that will affect the timing of events, which will be very much 
involved in matters of congestion window and such.  I suppose it is even 
possible that if the packet trace is on a VM receiver that some delays 
in getting the VM running could mean that GRO would end-up making large 
segments being pushed up the stack.

happy benchmarking,

rick jones

^ permalink raw reply

* Re: [PATCH net-next] tcp: Add TCP_FREEZE socket option
From: Yuchung Cheng @ 2014-10-24 14:58 UTC (permalink / raw)
  To: Kristian Evensen; +Cc: Hagen Paul Pfeifer, David Miller, Network Development
In-Reply-To: <CAKfDRXhJ9c+AqqrD38e7RcUSXn21pNMWskRs38H+PnSEwXac+Q@mail.gmail.com>

On Thu, Oct 23, 2014 at 4:33 AM, Kristian Evensen
<kristian.evensen@gmail.com> wrote:
> Hi,
>
> I am very sorry for not explaining the scenario/use-case properly.
> Freeze-TCP is mostly targeted at TCP connections established through
> mobile broadband networks. One example scenario is that of when a user
> moves outside of an area with LTE coverage. The mobile broadband
> connection will then be downgraded to 2G/3G and this process takes
> 10-15 seconds in the networks I have been able to measure. During this
> handover, the modem/device will in most cases report that it is still
> connected to LTE. So just looking at the state of the link is not good
> enough, as it will appear to be working fine (except for no data
> coming through it). The device does not change IP address, so TCP
> connections will resume normal operation as soon as the network
> connection is re-established and packet is retransmitted. However,
> because of the large "idle" period, this can take another 10-15
> seconds.
>
> On Wed, Oct 22, 2014 at 9:50 PM, Hagen Paul Pfeifer <hagen@jauu.net> wrote:
>> At least better. But what userspace daemon would configure this?
>> Likely NetworkManager and friends. But at what conditions?
>
> Yes, that would be my suggestion for tools too. The conditions would
> depend on the kind of network, available information and so on.
>
>> In a NATed scenario there is no gain because IP addreses change and
>> the connection is lost anyway. For the signal strength thing there
>> might be an advantage but it has costs:
>>
>> a) how long did you freeze the connection? What if NetworkManager
>> stops? The connection hang \infty
>> b) is it not better to inform the upper layer - the application - that
>> something happen with the link?
>>
>> I mean when the application experience disruptions, the application
>> can decide what it do: reconnect, reconnect and resend or inform the
>> user. This possibility is now lost/hidden. Maybe it is no problem -
>> maybe it is for some applications.
>
> This is the main reason why I went with a socket option. While I
> worked on this patch I wrote a small daemon for testing purposes. This
> daemon analyses data exported from a mobile broadband modem (QMI),
> looks at total interface throughput and then multicasts a netlink
> message when it determines that a handover might happen. This message
> is only a hint and then it is up to the application developer to
> decide what to do. Another solution would be a hybrid, the module will
> works as I described and the socket option will be used as an opt-in
> for Freeze-TCP.
>
>>
>> Do you have considered to bring this to the IETF (TCPM WG)?
>>
>
> Yes, I am currently considering it, or if I should look into different
> solutions before bringing it up for discussion. The ideal solution
> would be if there was a way to force a retransmit when the handover
> period is over, but that opens a whole net set of problems, potential
> security problems and changes TCP semantics a bit. An advantage of
Do packets actually get dropped during the handover period? if not
then the sender can detect spurious RTOs and undo the cwnd reductions
with timestamps or DSACKs (Eifel). Eifel did not exist when the
freeze-TCP was published at 2000. Even if the connection does not
support these options as major TCP stacks do, slow-start on a path
with new BDP isn't that bad of an idea.

> Freeze-TCP is that it works fine with what we have today.
>
> Thanks for your detailed comments!
>
> Kristian
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: localed stuck in recent 3.18 git in copy_net_ns?
From: Paul E. McKenney @ 2014-10-24 14:50 UTC (permalink / raw)
  To: Jay Vosburgh
  Cc: Yanko Kaneti, Josh Boyer, Eric W. Biederman, Cong Wang,
	Kevin Fenzi, netdev, Linux-Kernel@Vger. Kernel. Org
In-Reply-To: <31920.1414126114@famine>

On Thu, Oct 23, 2014 at 09:48:34PM -0700, Jay Vosburgh wrote:
> Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote:
> 
> >On Fri, Oct 24, 2014 at 12:45:40AM +0300, Yanko Kaneti wrote:
> >> 
> >> On Thu, 2014-10-23 at 13:05 -0700, Paul E. McKenney wrote:
> >> > On Thu, Oct 23, 2014 at 10:51:59PM +0300, Yanko Kaneti wrote:
> >> > > On Thu-10/23/14-2014 08:33, Paul E. McKenney wrote:
> >> > > > On Thu, Oct 23, 2014 at 05:27:50AM -0700, Paul E. McKenney wrote:
> >> > > > > On Thu, Oct 23, 2014 at 09:09:26AM +0300, Yanko Kaneti wrote:
> >> > > > > > On Wed, 2014-10-22 at 16:24 -0700, Paul E. McKenney wrote:
> >> > > > > > > On Thu, Oct 23, 2014 at 01:40:32AM +0300, Yanko Kaneti 
> >> > > > > > > wrote:
> >> > > > > > > > On Wed-10/22/14-2014 15:33, Josh Boyer wrote:
> >> > > > > > > > > On Wed, Oct 22, 2014 at 2:55 PM, Paul E. McKenney
> >> > > > > > > > > <paulmck@linux.vnet.ibm.com> wrote:
> >> > > > > > > 
> >> > > > > > > [ . . . ]
> >> > > > > > > 
> >> > > > > > > > > > Don't get me wrong -- the fact that this kthread 
> >> > > > > > > > > > appears to
> >> > > > > > > > > > have
> >> > > > > > > > > > blocked within rcu_barrier() for 120 seconds means 
> >> > > > > > > > > > that
> >> > > > > > > > > > something is
> >> > > > > > > > > > most definitely wrong here.  I am surprised that 
> >> > > > > > > > > > there are no
> >> > > > > > > > > > RCU CPU
> >> > > > > > > > > > stall warnings, but perhaps the blockage is in the 
> >> > > > > > > > > > callback
> >> > > > > > > > > > execution
> >> > > > > > > > > > rather than grace-period completion.  Or something is
> >> > > > > > > > > > preventing this
> >> > > > > > > > > > kthread from starting up after the wake-up callback 
> >> > > > > > > > > > executes.
> >> > > > > > > > > > Or...
> >> > > > > > > > > > 
> >> > > > > > > > > > Is this thing reproducible?
> >> > > > > > > > > 
> >> > > > > > > > > I've added Yanko on CC, who reported the backtrace 
> >> > > > > > > > > above and can
> >> > > > > > > > > recreate it reliably.  Apparently reverting the RCU 
> >> > > > > > > > > merge commit
> >> > > > > > > > > (d6dd50e) and rebuilding the latest after that does 
> >> > > > > > > > > not show the
> >> > > > > > > > > issue.  I'll let Yanko explain more and answer any 
> >> > > > > > > > > questions you
> >> > > > > > > > > have.
> >> > > > > > > > 
> >> > > > > > > > - It is reproducible
> >> > > > > > > > - I've done another build here to double check and its 
> >> > > > > > > > definitely
> >> > > > > > > > the rcu merge
> >> > > > > > > >   that's causing it.
> >> > > > > > > > 
> >> > > > > > > > Don't think I'll be able to dig deeper, but I can do 
> >> > > > > > > > testing if
> >> > > > > > > > needed.
> >> > > > > > > 
> >> > > > > > > Please!  Does the following patch help?
> >> > > > > > 
> >> > > > > > Nope, doesn't seem to make a difference to the modprobe 
> >> > > > > > ppp_generic
> >> > > > > > test
> >> > > > > 
> >> > > > > Well, I was hoping.  I will take a closer look at the RCU 
> >> > > > > merge commit
> >> > > > > and see what suggests itself.  I am likely to ask you to 
> >> > > > > revert specific
> >> > > > > commits, if that works for you.
> >> > > > 
> >> > > > Well, rather than reverting commits, could you please try 
> >> > > > testing the
> >> > > > following commits?
> >> > > > 
> >> > > > 11ed7f934cb8 (rcu: Make nocb leader kthreads process pending 
> >> > > > callbacks after spawning)
> >> > > > 
> >> > > > 73a860cd58a1 (rcu: Replace flush_signals() with 
> >> > > > WARN_ON(signal_pending()))
> >> > > > 
> >> > > > c847f14217d5 (rcu: Avoid misordering in nocb_leader_wait())
> >> > > > 
> >> > > >         For whatever it is worth, I am guessing this one.
> >> > > 
> >> > > Indeed, c847f14217d5 it is.
> >> > > 
> >> > > Much to my embarrasment I just noticed that in addition to the
> >> > > rcu merge, triggering the bug "requires" my specific Fedora 
> >> > > rawhide network
> >> > > setup. Booting in single mode and modprobe ppp_generic is fine. 
> >> > > The bug
> >> > > appears when starting with my regular fedora network setup, which 
> >> > > in my case
> >> > > includes 3 ethernet adapters and a libvirt birdge+nat setup.
> >> > > 
> >> > > Hope that helps.
> >> > > 
> >> > > I am attaching the config.
> >> > 
> >> > It does help a lot, thank you!!!
> >> > 
> >> > The following patch is a bit of a shot in the dark, and assumes that
> >> > commit 1772947bd012 (rcu: Handle NOCB callbacks from irq-disabled 
> >> > idle
> >> > code) introduced the problem.  Does this patch fix things up?
> >> 
> >> Unfortunately not, This is linus-tip + patch
> >
> >OK.  Can't have everything, I guess.
> >
> >> INFO: task kworker/u16:6:96 blocked for more than 120 seconds.
> >>       Not tainted 3.18.0-rc1+ #4
> >> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> >> kworker/u16:6   D ffff8800ca84cec0 11168    96      2 0x00000000
> >> Workqueue: netns cleanup_net
> >>  ffff8802218339e8 0000000000000096 ffff8800ca84cec0 00000000001d5f00
> >>  ffff880221833fd8 00000000001d5f00 ffff880223264ec0 ffff8800ca84cec0
> >>  ffffffff82c52040 7fffffffffffffff ffffffff81ee2658 ffffffff81ee2650
> >> Call Trace:
> >>  [<ffffffff8185b8e9>] schedule+0x29/0x70
> >>  [<ffffffff81860b0c>] schedule_timeout+0x26c/0x410
> >>  [<ffffffff81028bea>] ? native_sched_clock+0x2a/0xa0
> >>  [<ffffffff8110759c>] ? mark_held_locks+0x7c/0xb0
> >>  [<ffffffff81861b90>] ? _raw_spin_unlock_irq+0x30/0x50
> >>  [<ffffffff8110772d>] ? trace_hardirqs_on_caller+0x15d/0x200
> >>  [<ffffffff8185d31c>] wait_for_completion+0x10c/0x150
> >>  [<ffffffff810e4ed0>] ? wake_up_state+0x20/0x20
> >>  [<ffffffff8112a219>] _rcu_barrier+0x159/0x200
> >>  [<ffffffff8112a315>] rcu_barrier+0x15/0x20
> >>  [<ffffffff8171657f>] netdev_run_todo+0x6f/0x310
> >>  [<ffffffff8170b145>] ? rollback_registered_many+0x265/0x2e0
> >>  [<ffffffff817235ee>] rtnl_unlock+0xe/0x10
> >>  [<ffffffff8170cfa6>] default_device_exit_batch+0x156/0x180
> >>  [<ffffffff810fd390>] ? abort_exclusive_wait+0xb0/0xb0
> >>  [<ffffffff81705053>] ops_exit_list.isra.1+0x53/0x60
> >>  [<ffffffff81705c00>] cleanup_net+0x100/0x1f0
> >>  [<ffffffff810cca98>] process_one_work+0x218/0x850
> >>  [<ffffffff810cc9ff>] ? process_one_work+0x17f/0x850
> >>  [<ffffffff810cd1b7>] ? worker_thread+0xe7/0x4a0
> >>  [<ffffffff810cd13b>] worker_thread+0x6b/0x4a0
> >>  [<ffffffff810cd0d0>] ? process_one_work+0x850/0x850
> >>  [<ffffffff810d348b>] kthread+0x10b/0x130
> >>  [<ffffffff81028c69>] ? sched_clock+0x9/0x10
> >>  [<ffffffff810d3380>] ? kthread_create_on_node+0x250/0x250
> >>  [<ffffffff818628bc>] ret_from_fork+0x7c/0xb0
> >>  [<ffffffff810d3380>] ? kthread_create_on_node+0x250/0x250
> >> 4 locks held by kworker/u16:6/96:
> >>  #0:  ("%s""netns"){.+.+.+}, at: [<ffffffff810cc9ff>] process_one_work+0x17f/0x850
> >>  #1:  (net_cleanup_work){+.+.+.}, at: [<ffffffff810cc9ff>] process_one_work+0x17f/0x850
> >>  #2:  (net_mutex){+.+.+.}, at: [<ffffffff81705b8c>] cleanup_net+0x8c/0x1f0
> >>  #3:  (rcu_sched_state.barrier_mutex){+.+...}, at: [<ffffffff8112a0f5>] _rcu_barrier+0x35/0x200
> >> INFO: task modprobe:1045 blocked for more than 120 seconds.
> >>       Not tainted 3.18.0-rc1+ #4
> >> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> >> modprobe        D ffff880218343480 12920  1045   1044 0x00000080
> >>  ffff880218353bf8 0000000000000096 ffff880218343480 00000000001d5f00
> >>  ffff880218353fd8 00000000001d5f00 ffffffff81e1b580 ffff880218343480
> >>  ffff880218343480 ffffffff81f8f748 0000000000000246 ffff880218343480
> >> Call Trace:
> >>  [<ffffffff8185be91>] schedule_preempt_disabled+0x31/0x80
> >>  [<ffffffff8185d6e3>] mutex_lock_nested+0x183/0x440
> >>  [<ffffffff81705a1f>] ? register_pernet_subsys+0x1f/0x50
> >>  [<ffffffff81705a1f>] ? register_pernet_subsys+0x1f/0x50
> >>  [<ffffffffa0673000>] ? 0xffffffffa0673000
> >>  [<ffffffff81705a1f>] register_pernet_subsys+0x1f/0x50
> >>  [<ffffffffa0673048>] br_init+0x48/0xd3 [bridge]
> >>  [<ffffffff81002148>] do_one_initcall+0xd8/0x210
> >>  [<ffffffff81153052>] load_module+0x20c2/0x2870
> >>  [<ffffffff8114e030>] ? store_uevent+0x70/0x70
> >>  [<ffffffff81278717>] ? kernel_read+0x57/0x90
> >>  [<ffffffff811539e6>] SyS_finit_module+0xa6/0xe0
> >>  [<ffffffff81862969>] system_call_fastpath+0x12/0x17
> >> 1 lock held by modprobe/1045:
> >>  #0:  (net_mutex){+.+.+.}, at: [<ffffffff81705a1f>] register_pernet_subsys+0x1f/0x50
> >
> >Presumably the kworker/u16:6 completed, then modprobe hung?
> >
> >If not, I have some very hard questions about why net_mutex can be
> >held by two tasks concurrently, given that it does not appear to be a
> >reader-writer lock...
> >
> >Either way, my patch assumed that 39953dfd4007 (rcu: Avoid misordering in
> >__call_rcu_nocb_enqueue()) would work and that 1772947bd012 (rcu: Handle
> >NOCB callbacks from irq-disabled idle code) would fail.  Is that the case?
> >If not, could you please bisect the commits between 11ed7f934cb8 (rcu:
> >Make nocb leader kthreads process pending callbacks after spawning)
> >and c847f14217d5 (rcu: Avoid misordering in nocb_leader_wait())?
> 
> 	Just a note to add that I am also reliably inducing what appears
> to be this issue on a current -net tree, when configuring openvswitch
> via script.  I am available to test patches or bisect tomorrow (Friday)
> US time if needed.

Thank you, Jay!  Could you please check to see if reverting this commit
fixes things for you?

35ce7f29a44a rcu: Create rcuo kthreads only for onlined CPUs

Reverting is not a long-term fix, as this commit is itself a bug fix,
but would be good to check to see if you are seeing the same thing that
Yanko is.  ;-)

							Thanx, Paul

> 	The stack is as follows:
> 
> [ 1320.492020] INFO: task ovs-vswitchd:1303 blocked for more than 120 seconds.
> [ 1320.498965]       Not tainted 3.17.0-testola+ #1
> [ 1320.503570] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 1320.511374] ovs-vswitchd    D ffff88013fc14600     0  1303   1302 0x00000004
> [ 1320.511378]  ffff8801388d77d8 0000000000000002 ffff880031144b00 ffff8801388d7fd8
> [ 1320.511382]  0000000000014600 0000000000014600 ffff8800b092e400 ffff880031144b00
> [ 1320.511385]  ffff8800b1126000 ffffffff81c58ad0 ffffffff81c58ad8 7fffffffffffffff
> [ 1320.511389] Call Trace:
> [ 1320.511396]  [<ffffffff81739db9>] schedule+0x29/0x70
> [ 1320.511399]  [<ffffffff8173cd8c>] schedule_timeout+0x1dc/0x260
> [ 1320.511404]  [<ffffffff8109698d>] ? check_preempt_curr+0x8d/0xa0
> [ 1320.511407]  [<ffffffff810969bd>] ? ttwu_do_wakeup+0x1d/0xd0
> [ 1320.511410]  [<ffffffff8173aab6>] wait_for_completion+0xa6/0x160
> [ 1320.511413]  [<ffffffff81099980>] ? wake_up_state+0x20/0x20
> [ 1320.511417]  [<ffffffff810cdb57>] _rcu_barrier+0x157/0x200
> [ 1320.511419]  [<ffffffff810cdc55>] rcu_barrier+0x15/0x20
> [ 1320.511423]  [<ffffffff8163a780>] netdev_run_todo+0x60/0x300
> [ 1320.511427]  [<ffffffff8164515e>] rtnl_unlock+0xe/0x10
> [ 1320.511435]  [<ffffffffa01aecc5>] internal_dev_destroy+0x55/0x80 [openvswitch]
> [ 1320.511440]  [<ffffffffa01ae622>] ovs_vport_del+0x32/0x40 [openvswitch]
> [ 1320.511444]  [<ffffffffa01a7dd0>] ovs_dp_detach_port+0x30/0x40 [openvswitch]
> [ 1320.511448]  [<ffffffffa01a7ea5>] ovs_vport_cmd_del+0xc5/0x110 [openvswitch]
> [ 1320.511452]  [<ffffffff816675b5>] genl_family_rcv_msg+0x1a5/0x3c0
> [ 1320.511455]  [<ffffffff816677d0>] ? genl_family_rcv_msg+0x3c0/0x3c0
> [ 1320.511458]  [<ffffffff81667861>] genl_rcv_msg+0x91/0xd0
> [ 1320.511461]  [<ffffffff816658d1>] netlink_rcv_skb+0xc1/0xe0
> [ 1320.511463]  [<ffffffff81665dfc>] genl_rcv+0x2c/0x40
> [ 1320.511466]  [<ffffffff81664e66>] netlink_unicast+0xf6/0x200
> [ 1320.511468]  [<ffffffff8166528d>] netlink_sendmsg+0x31d/0x780
> [ 1320.511472]  [<ffffffff81662274>] ? netlink_rcv_wake+0x44/0x60
> [ 1320.511475]  [<ffffffff816632e3>] ? netlink_recvmsg+0x1d3/0x3e0
> [ 1320.511479]  [<ffffffff8161c463>] sock_sendmsg+0x93/0xd0
> [ 1320.511484]  [<ffffffff81332d00>] ? apparmor_file_alloc_security+0x20/0x40
> [ 1320.511487]  [<ffffffff8162a697>] ? verify_iovec+0x47/0xd0
> [ 1320.511491]  [<ffffffff8161cc79>] ___sys_sendmsg+0x399/0x3b0
> [ 1320.511495]  [<ffffffff81254e02>] ? kernfs_seq_stop_active+0x32/0x40
> [ 1320.511499]  [<ffffffff8101c385>] ? native_sched_clock+0x35/0x90
> [ 1320.511502]  [<ffffffff8101c385>] ? native_sched_clock+0x35/0x90
> [ 1320.511505]  [<ffffffff8101c3e9>] ? sched_clock+0x9/0x10
> [ 1320.511509]  [<ffffffff81122d5c>] ? acct_account_cputime+0x1c/0x20
> [ 1320.511512]  [<ffffffff8109ce6b>] ? account_user_time+0x8b/0xa0
> [ 1320.511516]  [<ffffffff811fc135>] ? __fget_light+0x25/0x70
> [ 1320.511519]  [<ffffffff8161d372>] __sys_sendmsg+0x42/0x80
> [ 1320.511521]  [<ffffffff8161d3c2>] SyS_sendmsg+0x12/0x20
> [ 1320.511525]  [<ffffffff8173e6a4>] tracesys_phase2+0xd8/0xdd
> 
> 	-J
> 
> ---
> 	-Jay Vosburgh, jay.vosburgh@canonical.com
> 

^ permalink raw reply

* Re: [PATCH RFC 1/4] virtio_net: pass vi around
From: Michael S. Tsirkin @ 2014-10-24 14:10 UTC (permalink / raw)
  To: David Laight
  Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	virtualization@lists.linux-foundation.org
In-Reply-To: <063D6719AE5E284EB5DD2968C1650D6D1C9DE6C2@AcuExch.aculab.com>

On Fri, Oct 24, 2014 at 10:02:15AM +0000, David Laight wrote:
> From: Michael S. Tsirkin
>  
> > Too many places poke at [rs]q->vq->vdev->priv just to get
> > the the vi structure.  Let's just pass the pointer around: seems
> > cleaner, and might even be faster.
> > 
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > ---
> >  drivers/net/virtio_net.c | 36 +++++++++++++++++++-----------------
> >  1 file changed, 19 insertions(+), 17 deletions(-)
> > 
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index 57cbc7d..36f3dfc 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> ...
> >  static struct sk_buff *receive_big(struct net_device *dev,
> > +				   struct virtnet_info *vi,
> 
> Do you need to pass 'dev' here?
> Looks like it is obtainable from vi->dev (as below).
> 
> 	David
> 
> >  				   struct receive_queue *rq,
> >  				   void *buf,
> >  				   unsigned int len)
> >  {
> >  	struct page *page = buf;
> > -	struct sk_buff *skb = page_to_skb(rq, page, 0, len, PAGE_SIZE);
> > +	struct sk_buff *skb = page_to_skb(vi, rq, page, 0, len, PAGE_SIZE);
> ...
> > -static void receive_buf(struct receive_queue *rq, void *buf, unsigned int len)
> > +static void receive_buf(struct virtnet_info *vi, struct receive_queue *rq,
> > +			void *buf, unsigned int len)
> >  {
> > -	struct virtnet_info *vi = rq->vq->vdev->priv;
> >  	struct net_device *dev = vi->dev;
> ...

It's a matter of style, isn't it?
We have dev to hand, it seems cleaner to just pass it around.

-- 
MST

^ permalink raw reply

* Re: [PATCH 06/14] net: dsa: Add support for hardware monitoring
From: Guenter Roeck @ 2014-10-24 13:53 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Florian Fainelli, netdev, David S. Miller,
	linux-kernel@vger.kernel.org
In-Reply-To: <20141023195526.GH25190@lunn.ch>

On 10/23/2014 12:55 PM, Andrew Lunn wrote:
[ ... ]
>
> Does hwmon offer a function to sanitise a string?
>
No, that wasn't necessary so far. The 'name' string is a constant string
provided by the driver.

> The switch index definitely should be used and i would probably
> combine that with a sanitised version of the ethernet device name and
> "dsa".
>

How is this ?

em1dsa0-isa-0000
Adapter: ISA adapter
temp1:        +38.0°C  (high = +100.0°C)

This is the sanitized network device name + 'dsa' + index.

Guenter

^ permalink raw reply

* [PATCH 1/1] drivers: net:cpsw: fix probe_dt when only slave 1 is pinned out
From: Mugunthan V N @ 2014-10-24 13:21 UTC (permalink / raw)
  To: netdev; +Cc: davem, Mugunthan V N

when slave 0 has no phy and slave 1 connected to phy, driver probe will
fail as there is no phy id present for slave 0 device tree, so continuing
even though no phy-id found, also moving mac-id read later to ensure
mac-id is read from device tree even when phy-id entry in not found.

Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com>
---
 drivers/net/ethernet/ti/cpsw.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index 952e1e4..d81b84b 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -2006,7 +2006,7 @@ static int cpsw_probe_dt(struct cpsw_platform_data *data,
 		parp = of_get_property(slave_node, "phy_id", &lenp);
 		if ((parp == NULL) || (lenp != (sizeof(void *) * 2))) {
 			dev_err(&pdev->dev, "Missing slave[%d] phy_id property\n", i);
-			return -EINVAL;
+			goto no_phy_slave;
 		}
 		mdio_node = of_find_node_by_phandle(be32_to_cpup(parp));
 		phyid = be32_to_cpup(parp+1);
@@ -2019,6 +2019,14 @@ static int cpsw_probe_dt(struct cpsw_platform_data *data,
 		snprintf(slave_data->phy_id, sizeof(slave_data->phy_id),
 			 PHY_ID_FMT, mdio->name, phyid);
 
+		slave_data->phy_if = of_get_phy_mode(slave_node);
+		if (slave_data->phy_if < 0) {
+			dev_err(&pdev->dev, "Missing or malformed slave[%d] phy-mode property\n",
+				i);
+			return slave_data->phy_if;
+		}
+
+no_phy_slave:
 		mac_addr = of_get_mac_address(slave_node);
 		if (mac_addr) {
 			memcpy(slave_data->mac_addr, mac_addr, ETH_ALEN);
@@ -2030,14 +2038,6 @@ static int cpsw_probe_dt(struct cpsw_platform_data *data,
 					return ret;
 			}
 		}
-
-		slave_data->phy_if = of_get_phy_mode(slave_node);
-		if (slave_data->phy_if < 0) {
-			dev_err(&pdev->dev, "Missing or malformed slave[%d] phy-mode property\n",
-				i);
-			return slave_data->phy_if;
-		}
-
 		if (data->dual_emac) {
 			if (of_property_read_u32(slave_node, "dual_emac_res_vlan",
 						 &prop)) {
-- 
2.1.2.484.g13da0fc

^ permalink raw reply related

* RE: [net-next 8/9] i40e: Moving variable declaration out of the loops
From: David Laight @ 2014-10-24 13:10 UTC (permalink / raw)
  To: 'Sergei Shtylyov', Jeff Kirsher, davem@davemloft.net
  Cc: Akeem G Abodunrin, netdev@vger.kernel.org, nhorman@redhat.com,
	sassmann@redhat.com, jogreene@redhat.com, Patrick Lu
In-Reply-To: <544A4403.9090108@cogentembedded.com>

From: Sergei Shtylyov
> Hello.
> 
> On 10/24/2014 8:10 AM, Jeff Kirsher wrote:
> 
> > From: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
> 
> > Move the three variables out of the loop, so it only declares once.
> 
>     It's only declared once in either case. I think gcc allocates maximum
> local variable space at the start of the function, so declaring variables in
> the loop body comes with no extra price.

It won't make much (if any) difference to the generated code.
However it is generally easier for the human reader to find definitions
if they are at the top of the function.

Personally the only time I declare variables in an inner scope is when
that scope is very small.

	David

^ permalink raw reply

* Payment
From: Finance Department @ 2014-10-24 12:04 UTC (permalink / raw)


Dear Recipient,

You have been awarded the sum of  8,000,000.00 (Eight Million Pounds sterling) with reference number 77100146 by office of the ministry of finance UK.Send us your personal details to deliver your funds.

Gloria Peter

^ permalink raw reply

* Re: [PATCH 06/14] net: dsa: Add support for hardware monitoring
From: Florian Fainelli @ 2014-10-24 12:52 UTC (permalink / raw)
  To: Guenter Roeck, David Miller; +Cc: netdev, andrew, linux-kernel
In-Reply-To: <5449E66B.6090902@roeck-us.net>

Le 23/10/2014 22:40, Guenter Roeck a écrit :
> On 10/23/2014 10:03 PM, David Miller wrote:
>> From: Guenter Roeck <linux@roeck-us.net>
>> Date: Wed, 22 Oct 2014 22:06:41 -0700
>>
>>> On 10/22/2014 09:37 PM, Florian Fainelli wrote:
>>>> 2014-10-22 21:03 GMT-07:00 Guenter Roeck <linux@roeck-us.net>:
>>>>> Some Marvell switches provide chip temperature data.
>>>>> Add support for reporting it to the dsa infrastructure.
>>>>>
>>>>> Signed-off-by: Guenter Roeck <linux@roeck-us.net>
>>>>> ---
>>>> [snip]
>>>>
>>>>> +/* hwmon support
>>>>> ************************************************************/
>>>>> +
>>>>> +#if defined(CONFIG_HWMON) || (defined(MODULE) &&
>>>>> defined(CONFIG_HWMON_MODULE))
>>>>
>>>> IS_ENABLED(CONFIG_HWMON)?
>>>>
>>>
>>> Hi Florian,
>>>
>>> unfortunately, that won't work; I had it initially and got a nice
>>> error message
>>> from Fengguang's build test bot.
>>
>> Then the Kconfig dependencies are broken.
>>
>> Fix Kconfig to only allow legal combinations.
>>
>
> I see two options for that:
>
> - Add
>      select HWMON
>    to the NET_DSA Kconfig entry.
>    Example is Broadcom TIGON3 driver.
>
> - Add a DSA_HWMON Kconfig entry to define the dependencies and
>    to let the user select if the functionality should be enabled.
>    Example is Intel IGB driver.
>
> Any preference from your side ? If no, I'll go with the latter.

I would prefer DSA_HWMON personaly, though no strong feelings. Since 
this is the most debated patch in this patch set, how about you drop it 
from your v2 and we sort this one out separately?
--
Florian

^ permalink raw reply

* Re: bridge: Do not compile options in br_parse_ip_options
From: Pablo Neira Ayuso @ 2014-10-24 12:28 UTC (permalink / raw)
  To: Florian Westphal
  Cc: Herbert Xu, netfilter-devel, bsd, stephen, netdev, eric.dumazet,
	davidn, David S. Miller
In-Reply-To: <20141024104149.GA7401@breakpoint.cc>

On Fri, Oct 24, 2014 at 12:41:49PM +0200, Florian Westphal wrote:
> Herbert Xu <herbert@gondor.apana.org.au> wrote:
> > bridge: Do not compile options in br_parse_ip_options
> > 
> > Commit 462fb2af9788a82a534f8184abfde31574e1cfa0
> > 
> > 	bridge : Sanitize skb before it enters the IP stack
> > 
> > broke when IP options are actually used because it mangles the
> > skb as if it entered the IP stack which is wrong because the
> > bridge is supposed to operate below the IP stack.
> > 
> > Since nobody has actually requested for parsing of IP options
> > this patch fixes it by simply reverting to the previous approach
> > of ignoring all IP options, i.e., zeroing the IPCB.
> > 
> > If and when somebody who uses IP options and actually needs them
> > to be parsed by the bridge complains then we can revisit this.
> > 
> > Reported-by: David Newall <davidn@davidnewall.com>
> > Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
> 
> Tested-by: Florian Westphal <fw@strlen.de>

Applied, thanks a lot for testing Florian.

^ permalink raw reply

* Re: [net-next 8/9] i40e: Moving variable declaration out of the loops
From: Sergei Shtylyov @ 2014-10-24 12:20 UTC (permalink / raw)
  To: Jeff Kirsher, davem
  Cc: Akeem G Abodunrin, netdev, nhorman, sassmann, jogreene,
	Patrick Lu
In-Reply-To: <1414123806-20049-9-git-send-email-jeffrey.t.kirsher@intel.com>

Hello.

On 10/24/2014 8:10 AM, Jeff Kirsher wrote:

> From: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>

> Move the three variables out of the loop, so it only declares once.

    It's only declared once in either case. I think gcc allocates maximum 
local variable space at the start of the function, so declaring variables in 
the loop body comes with no extra price.

> Change-ID: I436913777c7da3c16dc0031b59e3ffa61de74718
> Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
> Signed-off-by: Patrick Lu <patrick.lu@intel.com>
> Tested-by: Jim Young <jamesx.m.young@intel.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> ---
>   drivers/net/ethernet/intel/i40e/i40e_main.c | 7 +++----
>   1 file changed, 3 insertions(+), 4 deletions(-)

> diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
> index 9d36d10..b0c10e0 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_main.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
> @@ -813,7 +813,10 @@ static void i40e_update_vsi_stats(struct i40e_vsi *vsi)
>   	struct i40e_eth_stats *oes;
>   	struct i40e_eth_stats *es;     /* device's eth stats */
>   	u32 tx_restart, tx_busy;
> +	struct i40e_ring *p;
>   	u32 rx_page, rx_buf;
> +	u64 bytes, packets;
> +	unsigned int start;
>   	u64 rx_p, rx_b;
>   	u64 tx_p, tx_b;
>   	u16 q;
> @@ -837,10 +840,6 @@ static void i40e_update_vsi_stats(struct i40e_vsi *vsi)
>   	rx_buf = 0;
>   	rcu_read_lock();
>   	for (q = 0; q < vsi->num_queue_pairs; q++) {
> -		struct i40e_ring *p;
> -		u64 bytes, packets;
> -		unsigned int start;
> -

WBR, Sergei

^ permalink raw reply

* [PATCH 2/2] staging: lustre: lnet: lnet: trailing statement
From: Balavasu @ 2014-10-24 12:16 UTC (permalink / raw)
  To: netdev; +Cc: linux-kernel

This patch fixes the checkpatch.pl issue
Error: trailing statements should be on next line

Signed-off-by: Balavasu <kp.balavasu@gmail.com>
---
 drivers/staging/lustre/lnet/lnet/router.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/staging/lustre/lnet/lnet/router.c b/drivers/staging/lustre/lnet/lnet/router.c
index cdeb246..0f569a0 100644
--- a/drivers/staging/lustre/lnet/lnet/router.c
+++ b/drivers/staging/lustre/lnet/lnet/router.c
@@ -1670,13 +1670,16 @@ lnet_get_tunables (void)
 	char *s;
 
 	s = getenv("LNET_ROUTER_PING_TIMEOUT");
-	if (s != NULL) router_ping_timeout = atoi(s);
+	if (s != NULL)
+		router_ping_timeout = atoi(s);
 
 	s = getenv("LNET_LIVE_ROUTER_CHECK_INTERVAL");
-	if (s != NULL) live_router_check_interval = atoi(s);
+	if (s != NULL)
+		live_router_check_interval = atoi(s);
 
 	s = getenv("LNET_DEAD_ROUTER_CHECK_INTERVAL");
-	if (s != NULL) dead_router_check_interval = atoi(s);
+	if (s != NULL)
+		dead_router_check_interval = atoi(s);
 
 	/* This replaces old lnd_notify mechanism */
 	check_routers_before_use = 1;
-- 
1.9.1

^ permalink raw reply related

* [PATCH 1/2] staging: lustre: lnet: lnet: do not initialise 0
From: Balavasu @ 2014-10-24 12:15 UTC (permalink / raw)
  To: netdev; +Cc: linux-kernel

This patch fixes the checkpatch.pl issue
Error: do not initialise statics to 0 or NULL for time

Signed-off-by: Balavasu <kp.balavasu@gmail.com>
---
 drivers/staging/lustre/lnet/lnet/router.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/staging/lustre/lnet/lnet/router.c b/drivers/staging/lustre/lnet/lnet/router.c
index b5b8fb5..cdeb246 100644
--- a/drivers/staging/lustre/lnet/lnet/router.c
+++ b/drivers/staging/lustre/lnet/lnet/router.c
@@ -46,7 +46,7 @@ MODULE_PARM_DESC(small_router_buffers, "# of small (1 page) messages to buffer i
 static int large_router_buffers;
 module_param(large_router_buffers, int, 0444);
 MODULE_PARM_DESC(large_router_buffers, "# of large messages to buffer in the router");
-static int peer_buffer_credits = 0;
+static int peer_buffer_credits;
 module_param(peer_buffer_credits, int, 0444);
 MODULE_PARM_DESC(peer_buffer_credits, "# router buffer credits per peer");
 
@@ -80,7 +80,7 @@ lnet_peer_buffer_credits(lnet_ni_t *ni)
 
 #endif
 
-static int check_routers_before_use = 0;
+static int check_routers_before_use;
 module_param(check_routers_before_use, int, 0444);
 MODULE_PARM_DESC(check_routers_before_use, "Assume routers are down and ping them before use");
 
@@ -245,7 +245,7 @@ lnet_find_net_locked (__u32 net)
 
 static void lnet_shuffle_seed(void)
 {
-	static int seeded = 0;
+	static int seeded;
 	int lnd_type, seed[2];
 	struct timeval tv;
 	lnet_ni_t *ni;
@@ -1584,8 +1584,8 @@ lnet_notify (lnet_ni_t *ni, lnet_nid_t nid, int alive, unsigned long when)
 void
 lnet_router_checker (void)
 {
-	static time_t last = 0;
-	static int    running = 0;
+	static time_t last;
+	static int    running;
 
 	time_t	    now = get_seconds();
 	int	       interval = now - last;
-- 
1.9.1

^ permalink raw reply related

* Re: [PATCH] ucc_geth: invalid rx checksum error values
From: Jianhua Xie @ 2014-10-24 12:11 UTC (permalink / raw)
  To: Kokoris, Ioannis, linux-kernel@vger.kernel.org,
	linuxppc-dev@lists.ozlabs.org
  Cc: Zhao Qiang, netdev, Timur Tabi
In-Reply-To: <FDB831DFAAC88B42A50BA37D1BC5E13A29C63D1F@MCHP03MSX.global-ad.net>


[-- Attachment #1.1: Type: text/plain, Size: 1828 bytes --]

Hi Ioannis ,

Thank you very much for finding and reporting the issue.
I help to broadcast to netdev mailing-list.

Thanks & Best Regards,
Jianhua

在 2014年10月22日 21:07, Kokoris, Ioannis 写道:
> Hi,
>
> The value in QE UCC ethernet interfaces shows random values:
>
> # ethtool -S eth0
> NIC statistics:
>       ...
>       rx-ip-checksum-errors: 3933892214
>
> The problem is located in a mismatch between the rx_fw_stat_gstrings fields
> - used in ucc_geth_ethtool - and the ucc_geth_rx_firmware_statistics_pram
> fields - used in ucc_geth.
> Although the QE UCC Ethernet Controller includes the Rx checksum error
> counter in the 'Rx firmware counters', the related field is missing from the
> ucc_geth driver.
> After adding the RxChecksumError field in
> ucc_geth_rx_firmware_statistics_pram structure the counter works fine.
>
>
>
> Signed-off-by: Ioannis Kokkoris <ioannis.kokoris@unify.com>
>
>
> diff -Nru a/drivers/net/ethernet/freescale/ucc_geth.h
> b/drivers/net/ethernet/freescale/ucc_geth.h
> --- a/drivers/net/ethernet/freescale/ucc_geth.h 2014-10-22
> 15:19:16.000000000 +0300
> +++ b/drivers/net/ethernet/freescale/ucc_geth.h 2014-10-22
> 15:24:39.000000000 +0300
> @@ -541,6 +541,8 @@
>                                     replaced */
>          u32 insertvlan;         /* total frames that had their VLAN tag
>                                     inserted */
> +       u32 checksumerr;        /* total frames that have IP Checksum Error
> +                                */
>   } __packed;
>
>   struct ucc_geth_rx_interrupt_coalescing_entry {
>
>
>
> Best Regards,
> Ioannis
>
>
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev


[-- Attachment #1.2: Type: text/html, Size: 2563 bytes --]

[-- Attachment #2: Type: text/plain, Size: 150 bytes --]

_______________________________________________
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

^ permalink raw reply

* Re: bridge: Do not compile options in br_parse_ip_options
From: Florian Westphal @ 2014-10-24 10:41 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Florian Westphal, netfilter-devel, bsd, stephen, netdev,
	eric.dumazet, davidn, David S. Miller
In-Reply-To: <20141004141802.GA10878@gondor.apana.org.au>

Herbert Xu <herbert@gondor.apana.org.au> wrote:
> bridge: Do not compile options in br_parse_ip_options
> 
> Commit 462fb2af9788a82a534f8184abfde31574e1cfa0
> 
> 	bridge : Sanitize skb before it enters the IP stack
> 
> broke when IP options are actually used because it mangles the
> skb as if it entered the IP stack which is wrong because the
> bridge is supposed to operate below the IP stack.
> 
> Since nobody has actually requested for parsing of IP options
> this patch fixes it by simply reverting to the previous approach
> of ignoring all IP options, i.e., zeroing the IPCB.
> 
> If and when somebody who uses IP options and actually needs them
> to be parsed by the bridge complains then we can revisit this.
> 
> Reported-by: David Newall <davidn@davidnewall.com>
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Tested-by: Florian Westphal <fw@strlen.de>

Pablo, could you please apply this?

Thanks!

^ permalink raw reply

* RE: [PATCH RFC 1/4] virtio_net: pass vi around
From: David Laight @ 2014-10-24 10:02 UTC (permalink / raw)
  To: 'Michael S. Tsirkin', linux-kernel@vger.kernel.org
  Cc: netdev@vger.kernel.org, virtualization@lists.linux-foundation.org
In-Reply-To: <1414099656-28090-1-git-send-email-mst@redhat.com>

From: Michael S. Tsirkin
 
> Too many places poke at [rs]q->vq->vdev->priv just to get
> the the vi structure.  Let's just pass the pointer around: seems
> cleaner, and might even be faster.
> 
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> ---
>  drivers/net/virtio_net.c | 36 +++++++++++++++++++-----------------
>  1 file changed, 19 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 57cbc7d..36f3dfc 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
...
>  static struct sk_buff *receive_big(struct net_device *dev,
> +				   struct virtnet_info *vi,

Do you need to pass 'dev' here?
Looks like it is obtainable from vi->dev (as below).

	David

>  				   struct receive_queue *rq,
>  				   void *buf,
>  				   unsigned int len)
>  {
>  	struct page *page = buf;
> -	struct sk_buff *skb = page_to_skb(rq, page, 0, len, PAGE_SIZE);
> +	struct sk_buff *skb = page_to_skb(vi, rq, page, 0, len, PAGE_SIZE);
...
> -static void receive_buf(struct receive_queue *rq, void *buf, unsigned int len)
> +static void receive_buf(struct virtnet_info *vi, struct receive_queue *rq,
> +			void *buf, unsigned int len)
>  {
> -	struct virtnet_info *vi = rq->vq->vdev->priv;
>  	struct net_device *dev = vi->dev;
...

^ permalink raw reply

* Re: [RFC] tcp md5 use of alloc_percpu
From: Herbert Xu @ 2014-10-24  9:33 UTC (permalink / raw)
  To: Crestez Dan Leonard; +Cc: eric.dumazet, netdev, linux-crypto
In-Reply-To: <5448383A.4090908@gmail.com>

Crestez Dan Leonard <cdleonard@gmail.com> wrote:
>
>> Yep, but the sg stuff does not allow for stack variables. Because of
>> possible offloading and DMA, I dont know...
> A stack buffer is used in tcp_md5_hash_header to add a tcphdr to the 
> hash. A quick grep for sg_init_one find a couple of additional instances 
> of what looks like doing crypto on small stack buffers:

First of all crypto_hash_update is obsolete, don't use it in any
new code.  Thanks for reminding me to get rid of existing users.

You should either use crypto_shash_update for small data, e.g., headers
or crypto_ahash_update for large data such as whole packets.

If you use shash then you may allocate your buffer on the stack.  With
ahash stack memory is not allowed.

I hope this clears things up for you.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: ixgbe driver fails occasionally since ee98b577e7711d5890ded2c7b05578a29512bd39
From: Scott Harrison @ 2014-10-24  9:18 UTC (permalink / raw)
  To: Tantilov, Emil S; +Cc: netdev@vger.kernel.org
In-Reply-To: <87618083B2453E4A8714035B62D679925016872B@FMSMSX105.amr.corp.intel.com>

Emil,

Sorry, I should have looked harder it happens as part of ifup eth0, we have 
"/sbin/ethtool -s eth0 autoneg on" in the interfaces file.

HTH.

Scott.

On Thu, Oct 23, 2014 at 07:48:32PM +0000, Tantilov, Emil S wrote:
>>-----Original Message-----
>>From: netdev-owner@vger.kernel.org [mailto:netdev-
>>owner@vger.kernel.org] On Behalf Of Scott Harrison
>>Sent: Thursday, October 23, 2014 7:06 AM
>>To: netdev@vger.kernel.org
>>Subject: ixgbe driver fails occasionally since ee98b577e7711d5890ded2c7b05578a29512bd39
>>
>>Hi,
>>
>>I was asked to raise this issue here.
>>
>>https://bugzilla.kernel.org/show_bug.cgi?id=86591
>>
>>lspci ->
>>
>>03:00.0 Ethernet controller: Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connection (rev 01)
>>03:00.1 Ethernet controller: Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connection (rev 01)
>>
>>With the fibre 10Gbs SFP occasionally on reboot we get
>>
>>[   15.104726] ixgbe 0000:03:00.0 eth0: detected SFP+: 5
>>[   19.735155] ixgbe 0000:03:00.0 eth0: setup link failed with code -14
>
>The error message is from an ethtool command. Where you trying to set the speed to a certain value?
>
>Thanks,
>Emil
>

-- 
Software Engineer

Cisco.com - http://www.cisco.com

This email may contain confidential and privileged material for the sole use of 
the intended recipient. Any review, use, distribution or disclosure by others 
is strictly prohibited. If you are not the intended recipient (or authorised to 
receive for the recipient), please contact the sender by reply email and delete 
all copies of this message.

For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html

^ permalink raw reply

* Re: localed stuck in recent 3.18 git in copy_net_ns?
From: Yanko Kaneti @ 2014-10-24  9:08 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Josh Boyer, Eric W. Biederman, Cong Wang, Kevin Fenzi, netdev,
	Linux-Kernel@Vger. Kernel. Org
In-Reply-To: <20141023220406.GJ4977@linux.vnet.ibm.com>

On Thu-10/23/14-2014 15:04, Paul E. McKenney wrote:
> On Fri, Oct 24, 2014 at 12:45:40AM +0300, Yanko Kaneti wrote:
> > 
> > On Thu, 2014-10-23 at 13:05 -0700, Paul E. McKenney wrote:
> > > On Thu, Oct 23, 2014 at 10:51:59PM +0300, Yanko Kaneti wrote:
> > > > On Thu-10/23/14-2014 08:33, Paul E. McKenney wrote:
> > > > > On Thu, Oct 23, 2014 at 05:27:50AM -0700, Paul E. McKenney wrote:
> > > > > > On Thu, Oct 23, 2014 at 09:09:26AM +0300, Yanko Kaneti wrote:
> > > > > > > On Wed, 2014-10-22 at 16:24 -0700, Paul E. McKenney wrote:
> > > > > > > > On Thu, Oct 23, 2014 at 01:40:32AM +0300, Yanko Kaneti 
> > > > > > > > wrote:
> > > > > > > > > On Wed-10/22/14-2014 15:33, Josh Boyer wrote:
> > > > > > > > > > On Wed, Oct 22, 2014 at 2:55 PM, Paul E. McKenney
> > > > > > > > > > <paulmck@linux.vnet.ibm.com> wrote:
> > > > > > > > 
> > > > > > > > [ . . . ]
> > > > > > > > 
> > > > > > > > > > > Don't get me wrong -- the fact that this kthread 
> > > > > > > > > > > appears to
> > > > > > > > > > > have
> > > > > > > > > > > blocked within rcu_barrier() for 120 seconds means 
> > > > > > > > > > > that
> > > > > > > > > > > something is
> > > > > > > > > > > most definitely wrong here.  I am surprised that 
> > > > > > > > > > > there are no
> > > > > > > > > > > RCU CPU
> > > > > > > > > > > stall warnings, but perhaps the blockage is in the 
> > > > > > > > > > > callback
> > > > > > > > > > > execution
> > > > > > > > > > > rather than grace-period completion.  Or something is
> > > > > > > > > > > preventing this
> > > > > > > > > > > kthread from starting up after the wake-up callback 
> > > > > > > > > > > executes.
> > > > > > > > > > > Or...
> > > > > > > > > > > 
> > > > > > > > > > > Is this thing reproducible?
> > > > > > > > > > 
> > > > > > > > > > I've added Yanko on CC, who reported the backtrace 
> > > > > > > > > > above and can
> > > > > > > > > > recreate it reliably.  Apparently reverting the RCU 
> > > > > > > > > > merge commit
> > > > > > > > > > (d6dd50e) and rebuilding the latest after that does 
> > > > > > > > > > not show the
> > > > > > > > > > issue.  I'll let Yanko explain more and answer any 
> > > > > > > > > > questions you
> > > > > > > > > > have.
> > > > > > > > > 
> > > > > > > > > - It is reproducible
> > > > > > > > > - I've done another build here to double check and its 
> > > > > > > > > definitely
> > > > > > > > > the rcu merge
> > > > > > > > >   that's causing it.
> > > > > > > > > 
> > > > > > > > > Don't think I'll be able to dig deeper, but I can do 
> > > > > > > > > testing if
> > > > > > > > > needed.
> > > > > > > > 
> > > > > > > > Please!  Does the following patch help?
> > > > > > > 
> > > > > > > Nope, doesn't seem to make a difference to the modprobe 
> > > > > > > ppp_generic
> > > > > > > test
> > > > > > 
> > > > > > Well, I was hoping.  I will take a closer look at the RCU 
> > > > > > merge commit
> > > > > > and see what suggests itself.  I am likely to ask you to 
> > > > > > revert specific
> > > > > > commits, if that works for you.
> > > > > 
> > > > > Well, rather than reverting commits, could you please try 
> > > > > testing the
> > > > > following commits?
> > > > > 
> > > > > 11ed7f934cb8 (rcu: Make nocb leader kthreads process pending 
> > > > > callbacks after spawning)
> > > > > 
> > > > > 73a860cd58a1 (rcu: Replace flush_signals() with 
> > > > > WARN_ON(signal_pending()))
> > > > > 
> > > > > c847f14217d5 (rcu: Avoid misordering in nocb_leader_wait())
> > > > > 
> > > > >         For whatever it is worth, I am guessing this one.
> > > > 
> > > > Indeed, c847f14217d5 it is.
> > > > 
> > > > Much to my embarrasment I just noticed that in addition to the
> > > > rcu merge, triggering the bug "requires" my specific Fedora 
> > > > rawhide network
> > > > setup. Booting in single mode and modprobe ppp_generic is fine. 
> > > > The bug
> > > > appears when starting with my regular fedora network setup, which 
> > > > in my case
> > > > includes 3 ethernet adapters and a libvirt birdge+nat setup.
> > > > 
> > > > Hope that helps.
> > > > 
> > > > I am attaching the config.
> > > 
> > > It does help a lot, thank you!!!
> > > 
> > > The following patch is a bit of a shot in the dark, and assumes that
> > > commit 1772947bd012 (rcu: Handle NOCB callbacks from irq-disabled 
> > > idle
> > > code) introduced the problem.  Does this patch fix things up?
> > 
> > Unfortunately not, This is linus-tip + patch
> 
> OK.  Can't have everything, I guess.
> 
> > INFO: task kworker/u16:6:96 blocked for more than 120 seconds.
> >       Not tainted 3.18.0-rc1+ #4
> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > kworker/u16:6   D ffff8800ca84cec0 11168    96      2 0x00000000
> > Workqueue: netns cleanup_net
> >  ffff8802218339e8 0000000000000096 ffff8800ca84cec0 00000000001d5f00
> >  ffff880221833fd8 00000000001d5f00 ffff880223264ec0 ffff8800ca84cec0
> >  ffffffff82c52040 7fffffffffffffff ffffffff81ee2658 ffffffff81ee2650
> > Call Trace:
> >  [<ffffffff8185b8e9>] schedule+0x29/0x70
> >  [<ffffffff81860b0c>] schedule_timeout+0x26c/0x410
> >  [<ffffffff81028bea>] ? native_sched_clock+0x2a/0xa0
> >  [<ffffffff8110759c>] ? mark_held_locks+0x7c/0xb0
> >  [<ffffffff81861b90>] ? _raw_spin_unlock_irq+0x30/0x50
> >  [<ffffffff8110772d>] ? trace_hardirqs_on_caller+0x15d/0x200
> >  [<ffffffff8185d31c>] wait_for_completion+0x10c/0x150
> >  [<ffffffff810e4ed0>] ? wake_up_state+0x20/0x20
> >  [<ffffffff8112a219>] _rcu_barrier+0x159/0x200
> >  [<ffffffff8112a315>] rcu_barrier+0x15/0x20
> >  [<ffffffff8171657f>] netdev_run_todo+0x6f/0x310
> >  [<ffffffff8170b145>] ? rollback_registered_many+0x265/0x2e0
> >  [<ffffffff817235ee>] rtnl_unlock+0xe/0x10
> >  [<ffffffff8170cfa6>] default_device_exit_batch+0x156/0x180
> >  [<ffffffff810fd390>] ? abort_exclusive_wait+0xb0/0xb0
> >  [<ffffffff81705053>] ops_exit_list.isra.1+0x53/0x60
> >  [<ffffffff81705c00>] cleanup_net+0x100/0x1f0
> >  [<ffffffff810cca98>] process_one_work+0x218/0x850
> >  [<ffffffff810cc9ff>] ? process_one_work+0x17f/0x850
> >  [<ffffffff810cd1b7>] ? worker_thread+0xe7/0x4a0
> >  [<ffffffff810cd13b>] worker_thread+0x6b/0x4a0
> >  [<ffffffff810cd0d0>] ? process_one_work+0x850/0x850
> >  [<ffffffff810d348b>] kthread+0x10b/0x130
> >  [<ffffffff81028c69>] ? sched_clock+0x9/0x10
> >  [<ffffffff810d3380>] ? kthread_create_on_node+0x250/0x250
> >  [<ffffffff818628bc>] ret_from_fork+0x7c/0xb0
> >  [<ffffffff810d3380>] ? kthread_create_on_node+0x250/0x250
> > 4 locks held by kworker/u16:6/96:
> >  #0:  ("%s""netns"){.+.+.+}, at: [<ffffffff810cc9ff>] process_one_work+0x17f/0x850
> >  #1:  (net_cleanup_work){+.+.+.}, at: [<ffffffff810cc9ff>] process_one_work+0x17f/0x850
> >  #2:  (net_mutex){+.+.+.}, at: [<ffffffff81705b8c>] cleanup_net+0x8c/0x1f0
> >  #3:  (rcu_sched_state.barrier_mutex){+.+...}, at: [<ffffffff8112a0f5>] _rcu_barrier+0x35/0x200
> > INFO: task modprobe:1045 blocked for more than 120 seconds.
> >       Not tainted 3.18.0-rc1+ #4
> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > modprobe        D ffff880218343480 12920  1045   1044 0x00000080
> >  ffff880218353bf8 0000000000000096 ffff880218343480 00000000001d5f00
> >  ffff880218353fd8 00000000001d5f00 ffffffff81e1b580 ffff880218343480
> >  ffff880218343480 ffffffff81f8f748 0000000000000246 ffff880218343480
> > Call Trace:
> >  [<ffffffff8185be91>] schedule_preempt_disabled+0x31/0x80
> >  [<ffffffff8185d6e3>] mutex_lock_nested+0x183/0x440
> >  [<ffffffff81705a1f>] ? register_pernet_subsys+0x1f/0x50
> >  [<ffffffff81705a1f>] ? register_pernet_subsys+0x1f/0x50
> >  [<ffffffffa0673000>] ? 0xffffffffa0673000
> >  [<ffffffff81705a1f>] register_pernet_subsys+0x1f/0x50
> >  [<ffffffffa0673048>] br_init+0x48/0xd3 [bridge]
> >  [<ffffffff81002148>] do_one_initcall+0xd8/0x210
> >  [<ffffffff81153052>] load_module+0x20c2/0x2870
> >  [<ffffffff8114e030>] ? store_uevent+0x70/0x70
> >  [<ffffffff81278717>] ? kernel_read+0x57/0x90
> >  [<ffffffff811539e6>] SyS_finit_module+0xa6/0xe0
> >  [<ffffffff81862969>] system_call_fastpath+0x12/0x17
> > 1 lock held by modprobe/1045:
> >  #0:  (net_mutex){+.+.+.}, at: [<ffffffff81705a1f>] register_pernet_subsys+0x1f/0x50
> 
> Presumably the kworker/u16:6 completed, then modprobe hung?
> 
> If not, I have some very hard questions about why net_mutex can be
> held by two tasks concurrently, given that it does not appear to be a
> reader-writer lock...
> 
> Either way, my patch assumed that 39953dfd4007 (rcu: Avoid misordering in
> __call_rcu_nocb_enqueue()) would work and that 1772947bd012 (rcu: Handle
> NOCB callbacks from irq-disabled idle code) would fail.  Is that the case?
> If not, could you please bisect the commits between 11ed7f934cb8 (rcu:
> Make nocb leader kthreads process pending callbacks after spawning)
> and c847f14217d5 (rcu: Avoid misordering in nocb_leader_wait())?

Ok, unless I've messsed up something major, bisecting points to:

35ce7f29a44a rcu: Create rcuo kthreads only for onlined CPUs

Makes any sense ?


Another thing I noticed is that in failure mode the libvirtd bridge actually 
doesn't show up. So maybe ppp is just the first thing to try that bumps up
into whatever libvirtd is failing to do to setup those.

Truly hope this is not something with random timing dependency....

--Yanko

^ permalink raw reply

* Problem with 10Gbit Broadcom NetXtreme II 5771x/578xx 10/20-Gigabit Ethernet Driver
From: Stefan Bottelier | Ocius.nl @ 2014-10-24  8:42 UTC (permalink / raw)
  To: netdev

Hello,

We are using Dell Blade Centers, but we get a error on the 10Gbit 
Broadcom adapter bnx2x

bnx2x 0000:01:00.0: part number 394D4342-31383735-30315430-473030
WARNING: at drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c:9410 
bnx2x_init_one+0x1194/0x32b9
Hardware name: PowerEdge M620
  Modules linked in:
  Pid: 1, comm: swapper/0 Not tainted 3.2.63 #1
Call Trace:
warn_slowpath_common+0x78/0xb0
bnx2x_init_one+0x1194/0x32b9
bnx2x_init_one+0x1194/0x32b9
warn_slowpath_null+0x1b/0x20
bnx2x_init_one+0x1194/0x32b9
load_balance+0xb5/0x600
soft_cursor+0x19d/0x220
idr_get_empty_slot+0xf2/0x290
sysfs_link_sibling+0x6a/0xb0
__sysfs_add_one+0x5b/0x100
ida_get_new_above+0x49/0x1e0
kmem_cache_alloc+0xa9/0xb0
sysfs_add_one+0x1c/0xe0
sysfs_addrm_finish+0x12/0x90
sysfs_new_dirent+0x6d/0xf0
sysfs_do_create_link+0xbe/0x1f0
notifier_call_chain+0x44/0x60
pci_device_probe+0xf7/0x100
driver_probe_device+0x60/0x1f0
pci_match_device+0xf/0xa0
driver_probe_device+0x1f0/0x1f0
driver_attach+0x79/0x80
bus_for_each_dev+0x38/0x70
driver_attach+0x16/0x20
driver_probe_device+0x1f0/0x1f0
bus_add_driver+0x16f/0x250
pci_dev_put+0x20/0x20
driver_register+0x63/0x130
process_scheduled_works+0x30/0x30
cnic_init+0x7f/0x7f
__pci_register_driver+0x3c/0xa0
cnic_init+0x7f/0x7f
bnx2x_init+0x71/0x94
do_one_initcall+0x112/0x160
kernel_init+0x10f/0x1ab
do_early_param+0x77/0x77
start_kernel+0x327/0x327
kernel_thread_helper+0x6/0xd
bnx2x 0000:01:00.0: irq 132 for MSI/MSI-X
bnx2x 0000:01:00.0: irq 133 for MSI/MSI-X
bnx2x 0000:01:00.0: irq 134 for MSI/MSI-X
bnx2x 0000:01:00.0: irq 135 for MSI/MSI-X
bnx2x 0000:01:00.0: irq 136 for MSI/MSI-X
bnx2x 0000:01:00.0: irq 137 for MSI/MSI-X
bnx2x 0000:01:00.0: irq 138 for MSI/MSI-X
bnx2x 0000:01:00.0: irq 139 for MSI/MSI-X
bnx2x 0000:01:00.0: irq 140 for MSI/MSI-X
bnx2x 0000:01:00.0: irq 141 for MSI/MSI-X
bnx2x 0000:01:00.0: irq 142 for MSI/MSI-X
bnx2x 0000:01:00.0: irq 143 for MSI/MSI-X
bnx2x 0000:01:00.0: irq 144 for MSI/MSI-X
bnx2x 0000:01:00.0: irq 145 for MSI/MSI-X
bnx2x 0000:01:00.0: irq 146 for MSI/MSI-X
bnx2x 0000:01:00.0: irq 147 for MSI/MSI-X
bnx2x 0000:01:00.0: irq 148 for MSI/MSI-X
bnx2x 0000:01:00.0: eth0: Added CNIC device
bnx2x 0000:01:00.1: part number 394D4342-31383735-30315430-473030

-- 
Met vriendelijke groet,

Stefan Bottelier
Ocius Internet Services

E: Stefan.Bottelier@ocius.nl
T: +31 (0)20 716 39 09
W: http://www.ocius.nl

^ permalink raw reply

* [PATCH net] net/sched: Fix use of wild pointer in mq_destroy() when qdisc_alloc fail
From: wang.bo116 @ 2014-10-24  8:34 UTC (permalink / raw)
  To: davem, kaber; +Cc: netdev, cui.yunfeng


Hello:
	In mq_destroy() we should set pointer priv->qdiscs to null after free it.
	When attach_default_qdiscs -> qdisc_create_dflt -> mq_init -> qdisc_create_dflt fail -> qdisc_alloc fail,
mq_destroy() will called twice, the first time called in mq_init, and the second time called by qdisc_destroy -> mq_destroy,
if priv->qdiscs not set null after free, the second time to go into mq_destroy() will use wild pointer, becasuse if(!priv->qdiscs) not work.

The problem happend in my machine when ifconfig alloc memory failed:

ifconfig: page allocation failure. order:0, mode:0xd0, oom_adj:0
[<c0211a00>] (unwind_backtrace+0x0/0xd4) from [<c060dc14>] (dump_stack+0x18/0x1c)
[<c060dc14>] (dump_stack+0x18/0x1c) from [<c02a64f0>] (__alloc_pages_nodemask+0x910/0x9dc)
[<c02a64f0>] (__alloc_pages_nodemask+0x910/0x9dc) from [<c02cf0b4>] (cache_alloc_refill+0x364/0x788)
[<c02cf0b4>] (cache_alloc_refill+0x364/0x788) from [<c02cf7f4>] (__kmalloc+0x134/0x1e8)
[<c02cf7f4>] (__kmalloc+0x134/0x1e8) from [<c054b540>] (qdisc_alloc+0x24/0xbc)
[<c054b540>] (qdisc_alloc+0x24/0xbc) from [<c054b5f8>] (qdisc_create_dflt+0x20/0x60)
[<c054b5f8>] (qdisc_create_dflt+0x20/0x60) from [<c054c008>] (mq_init+0x8c/0xf4)
[<c054c008>] (mq_init+0x8c/0xf4) from [<c054b61c>] (qdisc_create_dflt+0x44/0x60)
[<c054b61c>] (qdisc_create_dflt+0x44/0x60) from [<c054b7b4>] (dev_activate+0xac/0x150)
[<c054b7b4>] (dev_activate+0xac/0x150) from [<c053a298>] (dev_open+0xf0/0x120)
[<c053a298>] (dev_open+0xf0/0x120) from [<c0539e08>] (dev_change_flags+0x94/0x164)
[<c0539e08>] (dev_change_flags+0x94/0x164) from [<c05804d8>] (devinet_ioctl+0x300/0x684)
[<c05804d8>] (devinet_ioctl+0x300/0x684) from [<c0581a4c>] (inet_ioctl+0xd0/0x104)
[<c0581a4c>] (inet_ioctl+0xd0/0x104) from [<c0526d0c>] (sock_ioctl+0x200/0x250)
[<c0526d0c>] (sock_ioctl+0x200/0x250) from [<c02e2010>] (vfs_ioctl+0x34/0xb4)
[<c02e2010>] (vfs_ioctl+0x34/0xb4) from [<c02e2b6c>] (do_vfs_ioctl+0x56c/0x5d8)
[<c02e2b6c>] (do_vfs_ioctl+0x56c/0x5d8) from [<c02e2c18>] (sys_ioctl+0x40/0x64)
[<c02e2c18>] (sys_ioctl+0x40/0x64) from [<c0209a60>] (ret_fast_syscall+0x0/0x38)

Unable to handle kernel paging request at virtual address 6b6b6b73
pgd = c1e70000
[6b6b6b73] *pgd=00000000
Internal error: Oops: 15 [#1] PREEMPT
last sysfs file:
Modules linked in:
CPU: 0    Tainted: G        W   (2.6.32.61-EMBSYS-CGEL-4.03.20.P3.F0.B5MAXCNF #2)
PC is at qdisc_destroy+0xc/0xb4
LR is at mq_destroy+0x34/0x60
pc : [<c054b084>]    lr : [<c054bf50>]    psr: 20000213
sp : c191bd80  ip : c191bd98  fp : c191bd94
r10: 00000000  r9 : c191be70  r8 : c1bff40c
r7 : c1c2e000  r6 : c1f3e140  r5 : 00000000  r4 : c1f3e0a0
r3 : f2266ea0  r2 : 00000000  r1 : c1f3e0cc  r0 : 6b6b6b6b
Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
Control: 12c5387d  Table: 01e70019  DAC: 55555555
Process ifconfig (pid: 391, stack limit = 0xc191a2e8)
Stack: (0xc191bd80 to 0xc191c000)
[<c054b084>] (qdisc_destroy+0xc/0xb4) from [<c054bf50>] (mq_destroy+0x34/0x60)
[<c054bf50>] (mq_destroy+0x34/0x60) from [<c054b0ec>] (qdisc_destroy+0x74/0xb4)
[<c054b0ec>] (qdisc_destroy+0x74/0xb4) from [<c054b62c>] (qdisc_create_dflt+0x54/0x60)
[<c054b62c>] (qdisc_create_dflt+0x54/0x60) from [<c054b7b4>] (dev_activate+0xac/0x150)
[<c054b7b4>] (dev_activate+0xac/0x150) from [<c053a298>] (dev_open+0xf0/0x120)
[<c053a298>] (dev_open+0xf0/0x120) from [<c0539e08>] (dev_change_flags+0x94/0x164)
[<c0539e08>] (dev_change_flags+0x94/0x164) from [<c05804d8>] (devinet_ioctl+0x300/0x684)
[<c05804d8>] (devinet_ioctl+0x300/0x684) from [<c0581a4c>] (inet_ioctl+0xd0/0x104)
[<c0581a4c>] (inet_ioctl+0xd0/0x104) from [<c0526d0c>] (sock_ioctl+0x200/0x250)
[<c0526d0c>] (sock_ioctl+0x200/0x250) from [<c02e2010>] (vfs_ioctl+0x34/0xb4)
[<c02e2010>] (vfs_ioctl+0x34/0xb4) from [<c02e2b6c>] (do_vfs_ioctl+0x56c/0x5d8)
[<c02e2b6c>] (do_vfs_ioctl+0x56c/0x5d8) from [<c02e2c18>] (sys_ioctl+0x40/0x64)
[<c02e2c18>] (sys_ioctl+0x40/0x64) from [<c0209a60>] (ret_fast_syscall+0x0/0x38)
Code: e89da8f0 e1a0c00d e92dd830 e24cb004 (e5903008)
---[ end trace 8e66b5118c0bea77 ]---
Kernel panic - not syncing: Fatal exception

--------------------------------------------------------------------------------

This patch  fix this problem, base on linux 3.18-rc-1:

Signed-off-by: Wang Bo <wang.bo116@zte.com.cn>
Tested-by: Ma Chenggong <ma.chenggong@zte.com.cn>
diff --git a/net/sched/sch_mq.c b/net/sched/sch_mq.c
index 42f72f1..a0c90e7 100755
--- a/net/sched/sch_mq.c
+++ b/net/sched/sch_mq.c
@@ -33,6 +33,7 @@ static void mq_destroy(struct Qdisc *sch)
 	for (ntx = 0; ntx < dev->num_tx_queues && priv->qdiscs[ntx]; ntx++)
 		qdisc_destroy(priv->qdiscs[ntx]);
 	kfree(priv->qdiscs);
+	priv->qdiscs = NULL;
 }

 static int mq_init(struct Qdisc *sch, struct nlattr *opt)

^ permalink raw reply related

* Re: [PATCH net] bpf: split eBPF out of NET
From: Daniel Borkmann @ 2014-10-24  8:37 UTC (permalink / raw)
  To: Josh Triplett
  Cc: Alexei Starovoitov, David S. Miller, Geert Uytterhoeven,
	Ingo Molnar, Steven Rostedt, Hannes Frederic Sowa, Eric Dumazet,
	Network Development, LKML, yann.morin.1998
In-Reply-To: <20141024081139.GA8861@thin>

On 10/24/2014 10:11 AM, Josh Triplett wrote:
> On Thu, Oct 23, 2014 at 10:32:50PM -0700, Alexei Starovoitov wrote:
>> On Thu, Oct 23, 2014 at 8:23 PM, Josh Triplett <josh@joshtriplett.org> wrote:
>>> On Thu, Oct 23, 2014 at 06:41:08PM -0700, Alexei Starovoitov wrote:
>>>> introduce two configs:
>>>> - hidden CONFIG_BPF to select eBPF interpreter that classic socket filters
>>>>    depend on
>>>> - visible CONFIG_BPF_SYSCALL (default off) that tracing and sockets can use
>>>>
>>>> that solves several problems:
>>>> - tracing and others that wish to use eBPF don't need to depend on NET.
>>>>    They can use BPF_SYSCALL to allow loading from userspace or select BPF
>>>>    to use it directly from kernel in NET-less configs.
>>>> - in 3.18 programs cannot be attached to events yet, so don't force it on
>>>> - when the rest of eBPF infra is there in 3.19+, it's still useful to
>>>>    switch it off to minimize kernel size
>>>>
>>>> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
>>>
>>> Thanks for working on this!  A few nits below, but otherwise this looks
>>> good to me.  Once this gets appropriate reviews from net and bpf folks,
>>> please let me know if you want this to go through the net tree, the tiny
>>> tree, or some other tree.
>>
>> Thanks :)
>> I've sent it to Dave and marked it as 'net', so it's for
>> his net tree. I don't mind if he decides to steer it into net-next
>> when it opens, since changing Kconfig is always tricky.
>> I just felt that this patch deserves to be in 'net' and in 3.18-rc
>
> Ah, nice; yes, getting it into 3.18-rc would be excellent if possible.

Fully agreed, BPF_SYSCALL defaulting to 'n' _for the time being_
would also give an option for reducing exposure until the API is
further stabilized and in a ready-to-use state.

>>>> bloat-o-meter on x64 shows:
>>>> add/remove: 0/60 grow/shrink: 0/2 up/down: 0/-15601 (-15601)
>>>
>>> Very nice!  Please do include the bloat-o-meter stats in the commit
>>> message.
>>
>> I don't think that's necessary. eBPF is in early stages of adoption.
>> More things to come, so bloat-o-meter stats will be obsolete
>> very quickly.
>
> I don't mean the full list of symbols, just the summary saying this
> saves 15k.

It might probably help to more easily identify from the log which
commits are related to a tinyfication perspective. Perhaps Dave can
still squash that into the commit log.

>>>> +# interpreter that classic socket filters depend on
>>>> +config BPF
>>>> +     boolean
>>>
>>> s/boolean/bool/
>>
>> Is there a difference? I thought it's an alias.
>
> It's an alias, but almost everything uses "bool":
>
> ~/src/linux$ git grep -w bool -- '*Kconfig*' | wc -l
> 7064
> ~/src/linux$ git grep -w boolean -- '*Kconfig*' | wc -l
> 94

Actually, shouldn't we get rid of the alias then? Same accounts
for def_bool and def_boolean ... it would help to avoid confusion
to just have a single term for each.

Anyway, the rest looks good to me, thanks.

I am totally fine of having it under EXPERT for now for the reasons
mentioned above. This can still be lifted later on.

Acked-by: Daniel Borkmann <dborkman@redhat.com>

^ permalink raw reply

* Re: [PATCH net] bpf: split eBPF out of NET
From: Geert Uytterhoeven @ 2014-10-24  8:19 UTC (permalink / raw)
  To: Josh Triplett
  Cc: Alexei Starovoitov, David S. Miller, Ingo Molnar, Steven Rostedt,
	Hannes Frederic Sowa, Eric Dumazet, Daniel Borkmann,
	Network Development, LKML
In-Reply-To: <20141024081139.GA8861@thin>

On Fri, Oct 24, 2014 at 10:11 AM, Josh Triplett <josh@joshtriplett.org> wrote:
>> >> +config BPF_SYSCALL
>> >> +     bool "Enable bpf() system call" if EXPERT
>> >> +     select ANON_INODES
>> >> +     select BPF
>> >> +     default n
>> >> +     help
>> >> +       Enable the bpf() system call that allows to manipulate eBPF
>> >> +       programs and maps via file descriptors.
>> >
>> > Not sure this one goes under EXPERT, especially since it currently has
>> > "default n".
>>
>> I followed the same style as EPOLL, EVENTFD and others
>> in the same category.
>
> I was thinking of CROSS_MEMORY_ATTACH and FHANDLE in the same file.

Those indeed look like better examples.
With if EXPERT and default n, you need to enable EXPERT before you can
enable the syscall, which is probably not what you want.

Thanks!

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply

* Re: [PATCH net] bpf: split eBPF out of NET
From: Josh Triplett @ 2014-10-24  8:11 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David S. Miller, Geert Uytterhoeven, Ingo Molnar, Steven Rostedt,
	Hannes Frederic Sowa, Eric Dumazet, Daniel Borkmann,
	Network Development, LKML
In-Reply-To: <CAMEtUuxsk=iLDsD4XXZ8EcurFXgFxD-9iePv=NbBZn+b3YOXJA@mail.gmail.com>

On Thu, Oct 23, 2014 at 10:32:50PM -0700, Alexei Starovoitov wrote:
> On Thu, Oct 23, 2014 at 8:23 PM, Josh Triplett <josh@joshtriplett.org> wrote:
> > On Thu, Oct 23, 2014 at 06:41:08PM -0700, Alexei Starovoitov wrote:
> >> introduce two configs:
> >> - hidden CONFIG_BPF to select eBPF interpreter that classic socket filters
> >>   depend on
> >> - visible CONFIG_BPF_SYSCALL (default off) that tracing and sockets can use
> >>
> >> that solves several problems:
> >> - tracing and others that wish to use eBPF don't need to depend on NET.
> >>   They can use BPF_SYSCALL to allow loading from userspace or select BPF
> >>   to use it directly from kernel in NET-less configs.
> >> - in 3.18 programs cannot be attached to events yet, so don't force it on
> >> - when the rest of eBPF infra is there in 3.19+, it's still useful to
> >>   switch it off to minimize kernel size
> >>
> >> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
> >
> > Thanks for working on this!  A few nits below, but otherwise this looks
> > good to me.  Once this gets appropriate reviews from net and bpf folks,
> > please let me know if you want this to go through the net tree, the tiny
> > tree, or some other tree.
> 
> Thanks :)
> I've sent it to Dave and marked it as 'net', so it's for
> his net tree. I don't mind if he decides to steer it into net-next
> when it opens, since changing Kconfig is always tricky.
> I just felt that this patch deserves to be in 'net' and in 3.18-rc

Ah, nice; yes, getting it into 3.18-rc would be excellent if possible.

> >> bloat-o-meter on x64 shows:
> >> add/remove: 0/60 grow/shrink: 0/2 up/down: 0/-15601 (-15601)
> >
> > Very nice!  Please do include the bloat-o-meter stats in the commit
> > message.
> 
> I don't think that's necessary. eBPF is in early stages of adoption.
> More things to come, so bloat-o-meter stats will be obsolete
> very quickly.

I don't mean the full list of symbols, just the summary saying this
saves 15k.

> >> +# interpreter that classic socket filters depend on
> >> +config BPF
> >> +     boolean
> >
> > s/boolean/bool/
> 
> Is there a difference? I thought it's an alias.

It's an alias, but almost everything uses "bool":

~/src/linux$ git grep -w bool -- '*Kconfig*' | wc -l
7064
~/src/linux$ git grep -w boolean -- '*Kconfig*' | wc -l
94

> >> +config BPF_SYSCALL
> >> +     bool "Enable bpf() system call" if EXPERT
> >> +     select ANON_INODES
> >> +     select BPF
> >> +     default n
> >> +     help
> >> +       Enable the bpf() system call that allows to manipulate eBPF
> >> +       programs and maps via file descriptors.
> >
> > Not sure this one goes under EXPERT, especially since it currently has
> > "default n".
> 
> I followed the same style as EPOLL, EVENTFD and others
> in the same category.

I was thinking of CROSS_MEMORY_ATTACH and FHANDLE in the same file.

> >> +/* To execute LD_ABS/LD_IND instructions __bpf_prog_run() may call
> >> + * skb_copy_bits(), so provide a weak definition of it for NET-less config.
> >> + */
> >> +int __weak skb_copy_bits(const struct sk_buff *skb, int offset, void *to,
> >> +                      int len)
> >> +{
> >> +     return -EFAULT;
> >> +}
> >
> > Please discuss this in the commit message.  What are the implications of
> > ending up with this implementation that always returns -EFAULT?
> 
> because that's what real skb_copy_bits() would return.
> In this case it's actually irrelevant, since non-socket programs
> are not allowed to have LD_ABS/LD_IND instructions and
> I'm only resolving linker error here.
> But returning negative error helps prevent bugs in cases
> where verifier or some in-kernel generated program uses
> LD_ABS by mistake.

Makes sense.

> I don't think these type of explanations are necessary in
> commit logs.
> 
> >> @@ -6,7 +6,7 @@ menuconfig NET
> >>       bool "Networking support"
> >>       select NLATTR
> >>       select GENERIC_NET_UTILS
> >> -     select ANON_INODES
> >> +     select BPF
> >
> > Why does this not need to select ANON_INODES anymore?  Did *only* BPF
> > use that, so it only needs to occur via BPF_SYSCALL?  If so, can you
> > document that in the commit message?
> 
> I hope that folks who were following this work on netdev
> remember commit 38b3629adb8c04 that added it.
> So here I'm actually removing this ANON_INODES dependency
> from NET and moving it into BPF_SYSCALL where it belongs.

Thanks for the clarification.

> btw, the goal of this patch is not tinification, but rather being
> good citizen and not forcing new syscall on everyone.

A critical part of the tinification effort is not having the kernel get
gratuitously bigger in other areas while we're trying to shrink it.  So,
I really appreciate your work. :)

- Josh Triplett

^ permalink raw reply

* [QA-TCP] How to send tcp small packages immediately?
From: Zhangjie (HZ) @ 2014-10-24  7:41 UTC (permalink / raw)
  To: kvm, Jason Wang, Michael S. Tsirkin, linux-kernel, netdev,
	liuyongan, qinchuanyu

Hi,

I use netperf to test the performance of small tcp package, with TCP_NODELAY set :

netperf -H 129.9.7.164 -l 100 -- -m 512 -D

Among the packages I got by tcpdump, there is not only small packages, also lost of
big ones (skb->len=65160).

IP 129.9.7.186.60840 > 129.9.7.164.34607: tcp 65160
IP 129.9.7.164.34607 > 129.9.7.186.60840: tcp 0
IP 129.9.7.164.34607 > 129.9.7.186.60840: tcp 0
IP 129.9.7.164.34607 > 129.9.7.186.60840: tcp 0
IP 129.9.7.186.60840 > 129.9.7.164.34607: tcp 65160
IP 129.9.7.164.34607 > 129.9.7.186.60840: tcp 0
IP 129.9.7.164.34607 > 129.9.7.186.60840: tcp 0
IP 129.9.7.164.34607 > 129.9.7.186.60840: tcp 0
IP 129.9.7.186.60840 > 129.9.7.164.34607: tcp 80
IP 129.9.7.186.60840 > 129.9.7.164.34607: tcp 512
IP 129.9.7.186.60840 > 129.9.7.164.34607: tcp 512

SO, how to test small tcp packages? Including TCP_NODELAY, What else should be set?

Thanks!
-- 
Best Wishes!
Zhang Jie


^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox