Netdev List
* Re: [PATCH RFC v5 net 0/3] ipv6: Reduce the number of fib6_lookup() calls from ip6_pol_route()
From: David Miller @ 2014-10-24  4:15 UTC (permalink / raw)
  To: kafai; +Cc: netdev
In-Reply-To: <1413837765-5446-1-git-send-email-kafai@fb.com>

From: Martin KaFai Lau <kafai@fb.com>
Date: Mon, 20 Oct 2014 13:42:42 -0700

> This patch set is trying to reduce the number of fib6_lookup()
> calls from ip6_pol_route().
> 
> I have adapted davem's udpflooda and kbench_mod test
> (https://git.kernel.org/pub/scm/linux/kernel/git/davem/net_test_tools.git) to
> support IPv6 and here is the result:

Series applied, thanks.

Can you cook up some clean patches against the net_test_tools repo so
that people can use it for both ipv4 and ipv6 route lookup measurements?

Thanks.

^ permalink raw reply

* Re: [PATCH net-next] Removed unused function sctp_addr_is_valid()
From: David Miller @ 2014-10-24  4:37 UTC (permalink / raw)
  To: sebastien.barre; +Cc: vyasevich, nhorman, linux-sctp, netdev
In-Reply-To: <1413897975-6066-1-git-send-email-sebastien.barre@uclouvain.be>

From: Sébastien Barré <sebastien.barre@uclouvain.be>
Date: Tue, 21 Oct 2014 15:26:15 +0200

> sctp_addr_is_valid() only appeared in its definition.
> 
> Acked-by: Neil Horman <nhorman@tuxdriver.com>
> Signed-off-by: Sébastien Barré <sebastien.barre@uclouvain.be>

Applied, thanks.

^ permalink raw reply

* Re: [PATCH] net: typhoon: Remove redundant casts
From: David Miller @ 2014-10-24  4:41 UTC (permalink / raw)
  To: linux; +Cc: dave, netdev, linux-kernel
In-Reply-To: <1413903103-3047-1-git-send-email-linux@rasmusvillemoes.dk>

From: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Date: Tue, 21 Oct 2014 16:51:43 +0200

> Both image_data and typhoon_fw->data are const u8*, so the cast to u8*
> is unnecessary and confusing.
> 
> Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>

Applied.

^ permalink raw reply

* Re: [PATCH] cirrus/mac89x0: Remove superfluous interrupt disable/restore
From: David Miller @ 2014-10-24  4:43 UTC (permalink / raw)
  To: geert; +Cc: netdev, linux-kernel
In-Reply-To: <1413913991-23634-1-git-send-email-geert@linux-m68k.org>

From: Geert Uytterhoeven <geert@linux-m68k.org>
Date: Tue, 21 Oct 2014 19:53:11 +0200

> As of commit e4dc601bf99ccd1c ("m68k: Disable/restore interrupts in
> hwreg_present()/hwreg_write()"), this is no longer needed.
> 
> Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>

Applied.

^ permalink raw reply

* Re: [PATCH] natsemi/macsonic: Remove superfluous interrupt disable/restore
From: David Miller @ 2014-10-24  4:43 UTC (permalink / raw)
  To: geert; +Cc: netdev, linux-kernel
In-Reply-To: <1413914037-23693-1-git-send-email-geert@linux-m68k.org>

From: Geert Uytterhoeven <geert@linux-m68k.org>
Date: Tue, 21 Oct 2014 19:53:57 +0200

> As of commit e4dc601bf99ccd1c ("m68k: Disable/restore interrupts in
> hwreg_present()/hwreg_write()"), this is no longer needed.
> 
> Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>

Applied.

^ permalink raw reply

* Re: localed stuck in recent 3.18 git in copy_net_ns?
From: Jay Vosburgh @ 2014-10-24  4:48 UTC (permalink / raw)
  To: paulmck
  Cc: Yanko Kaneti, Josh Boyer, Eric W. Biederman, Cong Wang,
	Kevin Fenzi, netdev, Linux-Kernel@Vger. Kernel. Org
In-Reply-To: <20141023220406.GJ4977@linux.vnet.ibm.com>

Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote:

>On Fri, Oct 24, 2014 at 12:45:40AM +0300, Yanko Kaneti wrote:
>> 
>> On Thu, 2014-10-23 at 13:05 -0700, Paul E. McKenney wrote:
>> > On Thu, Oct 23, 2014 at 10:51:59PM +0300, Yanko Kaneti wrote:
>> > > On Thu-10/23/14-2014 08:33, Paul E. McKenney wrote:
>> > > > On Thu, Oct 23, 2014 at 05:27:50AM -0700, Paul E. McKenney wrote:
>> > > > > On Thu, Oct 23, 2014 at 09:09:26AM +0300, Yanko Kaneti wrote:
>> > > > > > On Wed, 2014-10-22 at 16:24 -0700, Paul E. McKenney wrote:
>> > > > > > > On Thu, Oct 23, 2014 at 01:40:32AM +0300, Yanko Kaneti 
>> > > > > > > wrote:
>> > > > > > > > On Wed-10/22/14-2014 15:33, Josh Boyer wrote:
>> > > > > > > > > On Wed, Oct 22, 2014 at 2:55 PM, Paul E. McKenney
>> > > > > > > > > <paulmck@linux.vnet.ibm.com> wrote:
>> > > > > > > 
>> > > > > > > [ . . . ]
>> > > > > > > 
>> > > > > > > > > > Don't get me wrong -- the fact that this kthread 
>> > > > > > > > > > appears to
>> > > > > > > > > > have
>> > > > > > > > > > blocked within rcu_barrier() for 120 seconds means 
>> > > > > > > > > > that
>> > > > > > > > > > something is
>> > > > > > > > > > most definitely wrong here.  I am surprised that 
>> > > > > > > > > > there are no
>> > > > > > > > > > RCU CPU
>> > > > > > > > > > stall warnings, but perhaps the blockage is in the 
>> > > > > > > > > > callback
>> > > > > > > > > > execution
>> > > > > > > > > > rather than grace-period completion.  Or something is
>> > > > > > > > > > preventing this
>> > > > > > > > > > kthread from starting up after the wake-up callback 
>> > > > > > > > > > executes.
>> > > > > > > > > > Or...
>> > > > > > > > > > 
>> > > > > > > > > > Is this thing reproducible?
>> > > > > > > > > 
>> > > > > > > > > I've added Yanko on CC, who reported the backtrace 
>> > > > > > > > > above and can
>> > > > > > > > > recreate it reliably.  Apparently reverting the RCU 
>> > > > > > > > > merge commit
>> > > > > > > > > (d6dd50e) and rebuilding the latest after that does 
>> > > > > > > > > not show the
>> > > > > > > > > issue.  I'll let Yanko explain more and answer any 
>> > > > > > > > > questions you
>> > > > > > > > > have.
>> > > > > > > > 
>> > > > > > > > - It is reproducible
>> > > > > > > > - I've done another build here to double check and it's 
>> > > > > > > > definitely
>> > > > > > > > the rcu merge
>> > > > > > > >   that's causing it.
>> > > > > > > > 
>> > > > > > > > Don't think I'll be able to dig deeper, but I can do 
>> > > > > > > > testing if
>> > > > > > > > needed.
>> > > > > > > 
>> > > > > > > Please!  Does the following patch help?
>> > > > > > 
>> > > > > > Nope, doesn't seem to make a difference to the modprobe 
>> > > > > > ppp_generic
>> > > > > > test
>> > > > > 
>> > > > > Well, I was hoping.  I will take a closer look at the RCU 
>> > > > > merge commit
>> > > > > and see what suggests itself.  I am likely to ask you to 
>> > > > > revert specific
>> > > > > commits, if that works for you.
>> > > > 
>> > > > Well, rather than reverting commits, could you please try 
>> > > > testing the
>> > > > following commits?
>> > > > 
>> > > > 11ed7f934cb8 (rcu: Make nocb leader kthreads process pending 
>> > > > callbacks after spawning)
>> > > > 
>> > > > 73a860cd58a1 (rcu: Replace flush_signals() with 
>> > > > WARN_ON(signal_pending()))
>> > > > 
>> > > > c847f14217d5 (rcu: Avoid misordering in nocb_leader_wait())
>> > > > 
>> > > >         For whatever it is worth, I am guessing this one.
>> > > 
>> > > Indeed, c847f14217d5 it is.
>> > > 
>> > > Much to my embarrassment I just noticed that in addition to the
>> > > rcu merge, triggering the bug "requires" my specific Fedora 
>> > > rawhide network
>> > > setup. Booting in single mode and modprobe ppp_generic is fine. 
>> > > The bug
>> > > appears when starting with my regular fedora network setup, which 
>> > > in my case
>> > > includes 3 ethernet adapters and a libvirt bridge+nat setup.
>> > > 
>> > > Hope that helps.
>> > > 
>> > > I am attaching the config.
>> > 
>> > It does help a lot, thank you!!!
>> > 
>> > The following patch is a bit of a shot in the dark, and assumes that
>> > commit 1772947bd012 (rcu: Handle NOCB callbacks from irq-disabled 
>> > idle
>> > code) introduced the problem.  Does this patch fix things up?
>> 
>> Unfortunately not. This is linus-tip + patch
>
>OK.  Can't have everything, I guess.
>
>> INFO: task kworker/u16:6:96 blocked for more than 120 seconds.
>>       Not tainted 3.18.0-rc1+ #4
>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> kworker/u16:6   D ffff8800ca84cec0 11168    96      2 0x00000000
>> Workqueue: netns cleanup_net
>>  ffff8802218339e8 0000000000000096 ffff8800ca84cec0 00000000001d5f00
>>  ffff880221833fd8 00000000001d5f00 ffff880223264ec0 ffff8800ca84cec0
>>  ffffffff82c52040 7fffffffffffffff ffffffff81ee2658 ffffffff81ee2650
>> Call Trace:
>>  [<ffffffff8185b8e9>] schedule+0x29/0x70
>>  [<ffffffff81860b0c>] schedule_timeout+0x26c/0x410
>>  [<ffffffff81028bea>] ? native_sched_clock+0x2a/0xa0
>>  [<ffffffff8110759c>] ? mark_held_locks+0x7c/0xb0
>>  [<ffffffff81861b90>] ? _raw_spin_unlock_irq+0x30/0x50
>>  [<ffffffff8110772d>] ? trace_hardirqs_on_caller+0x15d/0x200
>>  [<ffffffff8185d31c>] wait_for_completion+0x10c/0x150
>>  [<ffffffff810e4ed0>] ? wake_up_state+0x20/0x20
>>  [<ffffffff8112a219>] _rcu_barrier+0x159/0x200
>>  [<ffffffff8112a315>] rcu_barrier+0x15/0x20
>>  [<ffffffff8171657f>] netdev_run_todo+0x6f/0x310
>>  [<ffffffff8170b145>] ? rollback_registered_many+0x265/0x2e0
>>  [<ffffffff817235ee>] rtnl_unlock+0xe/0x10
>>  [<ffffffff8170cfa6>] default_device_exit_batch+0x156/0x180
>>  [<ffffffff810fd390>] ? abort_exclusive_wait+0xb0/0xb0
>>  [<ffffffff81705053>] ops_exit_list.isra.1+0x53/0x60
>>  [<ffffffff81705c00>] cleanup_net+0x100/0x1f0
>>  [<ffffffff810cca98>] process_one_work+0x218/0x850
>>  [<ffffffff810cc9ff>] ? process_one_work+0x17f/0x850
>>  [<ffffffff810cd1b7>] ? worker_thread+0xe7/0x4a0
>>  [<ffffffff810cd13b>] worker_thread+0x6b/0x4a0
>>  [<ffffffff810cd0d0>] ? process_one_work+0x850/0x850
>>  [<ffffffff810d348b>] kthread+0x10b/0x130
>>  [<ffffffff81028c69>] ? sched_clock+0x9/0x10
>>  [<ffffffff810d3380>] ? kthread_create_on_node+0x250/0x250
>>  [<ffffffff818628bc>] ret_from_fork+0x7c/0xb0
>>  [<ffffffff810d3380>] ? kthread_create_on_node+0x250/0x250
>> 4 locks held by kworker/u16:6/96:
>>  #0:  ("%s""netns"){.+.+.+}, at: [<ffffffff810cc9ff>] process_one_work+0x17f/0x850
>>  #1:  (net_cleanup_work){+.+.+.}, at: [<ffffffff810cc9ff>] process_one_work+0x17f/0x850
>>  #2:  (net_mutex){+.+.+.}, at: [<ffffffff81705b8c>] cleanup_net+0x8c/0x1f0
>>  #3:  (rcu_sched_state.barrier_mutex){+.+...}, at: [<ffffffff8112a0f5>] _rcu_barrier+0x35/0x200
>> INFO: task modprobe:1045 blocked for more than 120 seconds.
>>       Not tainted 3.18.0-rc1+ #4
>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> modprobe        D ffff880218343480 12920  1045   1044 0x00000080
>>  ffff880218353bf8 0000000000000096 ffff880218343480 00000000001d5f00
>>  ffff880218353fd8 00000000001d5f00 ffffffff81e1b580 ffff880218343480
>>  ffff880218343480 ffffffff81f8f748 0000000000000246 ffff880218343480
>> Call Trace:
>>  [<ffffffff8185be91>] schedule_preempt_disabled+0x31/0x80
>>  [<ffffffff8185d6e3>] mutex_lock_nested+0x183/0x440
>>  [<ffffffff81705a1f>] ? register_pernet_subsys+0x1f/0x50
>>  [<ffffffff81705a1f>] ? register_pernet_subsys+0x1f/0x50
>>  [<ffffffffa0673000>] ? 0xffffffffa0673000
>>  [<ffffffff81705a1f>] register_pernet_subsys+0x1f/0x50
>>  [<ffffffffa0673048>] br_init+0x48/0xd3 [bridge]
>>  [<ffffffff81002148>] do_one_initcall+0xd8/0x210
>>  [<ffffffff81153052>] load_module+0x20c2/0x2870
>>  [<ffffffff8114e030>] ? store_uevent+0x70/0x70
>>  [<ffffffff81278717>] ? kernel_read+0x57/0x90
>>  [<ffffffff811539e6>] SyS_finit_module+0xa6/0xe0
>>  [<ffffffff81862969>] system_call_fastpath+0x12/0x17
>> 1 lock held by modprobe/1045:
>>  #0:  (net_mutex){+.+.+.}, at: [<ffffffff81705a1f>] register_pernet_subsys+0x1f/0x50
>
>Presumably the kworker/u16:6 completed, then modprobe hung?
>
>If not, I have some very hard questions about why net_mutex can be
>held by two tasks concurrently, given that it does not appear to be a
>reader-writer lock...
>
>Either way, my patch assumed that 39953dfd4007 (rcu: Avoid misordering in
>__call_rcu_nocb_enqueue()) would work and that 1772947bd012 (rcu: Handle
>NOCB callbacks from irq-disabled idle code) would fail.  Is that the case?
>If not, could you please bisect the commits between 11ed7f934cb8 (rcu:
>Make nocb leader kthreads process pending callbacks after spawning)
>and c847f14217d5 (rcu: Avoid misordering in nocb_leader_wait())?

	Just a note to add that I am also reliably inducing what appears
to be this issue on a current -net tree, when configuring openvswitch
via script.  I am available to test patches or bisect tomorrow (Friday)
US time if needed.

	The stack is as follows:

[ 1320.492020] INFO: task ovs-vswitchd:1303 blocked for more than 120 seconds.
[ 1320.498965]       Not tainted 3.17.0-testola+ #1
[ 1320.503570] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1320.511374] ovs-vswitchd    D ffff88013fc14600     0  1303   1302 0x00000004
[ 1320.511378]  ffff8801388d77d8 0000000000000002 ffff880031144b00 ffff8801388d7fd8
[ 1320.511382]  0000000000014600 0000000000014600 ffff8800b092e400 ffff880031144b00
[ 1320.511385]  ffff8800b1126000 ffffffff81c58ad0 ffffffff81c58ad8 7fffffffffffffff
[ 1320.511389] Call Trace:
[ 1320.511396]  [<ffffffff81739db9>] schedule+0x29/0x70
[ 1320.511399]  [<ffffffff8173cd8c>] schedule_timeout+0x1dc/0x260
[ 1320.511404]  [<ffffffff8109698d>] ? check_preempt_curr+0x8d/0xa0
[ 1320.511407]  [<ffffffff810969bd>] ? ttwu_do_wakeup+0x1d/0xd0
[ 1320.511410]  [<ffffffff8173aab6>] wait_for_completion+0xa6/0x160
[ 1320.511413]  [<ffffffff81099980>] ? wake_up_state+0x20/0x20
[ 1320.511417]  [<ffffffff810cdb57>] _rcu_barrier+0x157/0x200
[ 1320.511419]  [<ffffffff810cdc55>] rcu_barrier+0x15/0x20
[ 1320.511423]  [<ffffffff8163a780>] netdev_run_todo+0x60/0x300
[ 1320.511427]  [<ffffffff8164515e>] rtnl_unlock+0xe/0x10
[ 1320.511435]  [<ffffffffa01aecc5>] internal_dev_destroy+0x55/0x80 [openvswitch]
[ 1320.511440]  [<ffffffffa01ae622>] ovs_vport_del+0x32/0x40 [openvswitch]
[ 1320.511444]  [<ffffffffa01a7dd0>] ovs_dp_detach_port+0x30/0x40 [openvswitch]
[ 1320.511448]  [<ffffffffa01a7ea5>] ovs_vport_cmd_del+0xc5/0x110 [openvswitch]
[ 1320.511452]  [<ffffffff816675b5>] genl_family_rcv_msg+0x1a5/0x3c0
[ 1320.511455]  [<ffffffff816677d0>] ? genl_family_rcv_msg+0x3c0/0x3c0
[ 1320.511458]  [<ffffffff81667861>] genl_rcv_msg+0x91/0xd0
[ 1320.511461]  [<ffffffff816658d1>] netlink_rcv_skb+0xc1/0xe0
[ 1320.511463]  [<ffffffff81665dfc>] genl_rcv+0x2c/0x40
[ 1320.511466]  [<ffffffff81664e66>] netlink_unicast+0xf6/0x200
[ 1320.511468]  [<ffffffff8166528d>] netlink_sendmsg+0x31d/0x780
[ 1320.511472]  [<ffffffff81662274>] ? netlink_rcv_wake+0x44/0x60
[ 1320.511475]  [<ffffffff816632e3>] ? netlink_recvmsg+0x1d3/0x3e0
[ 1320.511479]  [<ffffffff8161c463>] sock_sendmsg+0x93/0xd0
[ 1320.511484]  [<ffffffff81332d00>] ? apparmor_file_alloc_security+0x20/0x40
[ 1320.511487]  [<ffffffff8162a697>] ? verify_iovec+0x47/0xd0
[ 1320.511491]  [<ffffffff8161cc79>] ___sys_sendmsg+0x399/0x3b0
[ 1320.511495]  [<ffffffff81254e02>] ? kernfs_seq_stop_active+0x32/0x40
[ 1320.511499]  [<ffffffff8101c385>] ? native_sched_clock+0x35/0x90
[ 1320.511502]  [<ffffffff8101c385>] ? native_sched_clock+0x35/0x90
[ 1320.511505]  [<ffffffff8101c3e9>] ? sched_clock+0x9/0x10
[ 1320.511509]  [<ffffffff81122d5c>] ? acct_account_cputime+0x1c/0x20
[ 1320.511512]  [<ffffffff8109ce6b>] ? account_user_time+0x8b/0xa0
[ 1320.511516]  [<ffffffff811fc135>] ? __fget_light+0x25/0x70
[ 1320.511519]  [<ffffffff8161d372>] __sys_sendmsg+0x42/0x80
[ 1320.511521]  [<ffffffff8161d3c2>] SyS_sendmsg+0x12/0x20
[ 1320.511525]  [<ffffffff8173e6a4>] tracesys_phase2+0xd8/0xdd

	-J

---
	-Jay Vosburgh, jay.vosburgh@canonical.com

^ permalink raw reply

* Re: [PATCHv2 net-next] xen-netfront: always keep the Rx ring full of requests
From: David Miller @ 2014-10-24  4:50 UTC (permalink / raw)
  To: david.vrabel; +Cc: netdev, xen-devel, konrad.wilk, boris.ostrovsky
In-Reply-To: <1413973026-6475-1-git-send-email-david.vrabel@citrix.com>

From: David Vrabel <david.vrabel@citrix.com>
Date: Wed, 22 Oct 2014 11:17:06 +0100

> A full Rx ring only requires 1 MiB of memory.  This is too little
> memory for it to be worth dynamically scaling the number of Rx
> requests in the ring based on traffic rates, because:
> 
> a) Even the full 1 MiB is a tiny fraction of a typical modern Linux
>    VM (for example, the AWS micro instance still has 1 GiB of memory).
> 
> b) Netfront would have used up to 1 MiB already even with moderate
>    data rates (there was no adjustment of target based on memory
>    pressure).
> 
> c) Small VMs are going to typically have one VCPU and hence only one
>    queue.
> 
> Keeping the ring full of Rx requests handles bursty traffic better
> than trying to converge on an optimal number of requests to keep
> filled.
> 
> On a 4 core host, an iperf -P 64 -t 60 run from dom0 to a 4 VCPU guest
> improved from 5.1 Gbit/s to 5.6 Gbit/s.  Gains with more bursty
> traffic are expected to be higher.
> 
> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
> ---
> Changes in v2:
> - Keep rxbuf_* sysfs files.

Applied.
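
For reference, the 1 MiB figure quoted above can be reproduced with a
small calculation. This is only a sketch: the ring size of 256 requests
and the 4 KiB page backing each request are assumptions (neither is
stated in the message), and all names are illustrative.

```c
#include <assert.h>

/* Assumption: 256 Rx requests fit in a single-page ring. */
#define RX_RING_SIZE_MODEL	256UL
/* Assumption: each Rx request is backed by one 4 KiB page. */
#define PAGE_SIZE_MODEL		4096UL

/* Memory pinned when every slot in the Rx ring has a request posted. */
unsigned long rx_ring_bytes_model(void)
{
	return RX_RING_SIZE_MODEL * PAGE_SIZE_MODEL;
}

/* The same amount as a divisor of total guest memory (both in bytes). */
unsigned long rx_ring_share_model(unsigned long guest_bytes)
{
	assert(guest_bytes >= rx_ring_bytes_model());
	return guest_bytes / rx_ring_bytes_model();
}
```

Under these assumptions the full ring is 1/1024 of a 1 GiB guest, which
is the "tiny fraction" point (a) refers to.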

^ permalink raw reply

* Re: [PATCH 06/14] net: dsa: Add support for hardware monitoring
From: David Miller @ 2014-10-24  5:03 UTC (permalink / raw)
  To: linux; +Cc: f.fainelli, netdev, andrew, linux-kernel
In-Reply-To: <54488CE1.2000106@roeck-us.net>

From: Guenter Roeck <linux@roeck-us.net>
Date: Wed, 22 Oct 2014 22:06:41 -0700

> On 10/22/2014 09:37 PM, Florian Fainelli wrote:
>> 2014-10-22 21:03 GMT-07:00 Guenter Roeck <linux@roeck-us.net>:
>>> Some Marvell switches provide chip temperature data.
>>> Add support for reporting it to the dsa infrastructure.
>>>
>>> Signed-off-by: Guenter Roeck <linux@roeck-us.net>
>>> ---
>> [snip]
>>
>>> +/* hwmon support
>>> ************************************************************/
>>> +
>>> +#if defined(CONFIG_HWMON) || (defined(MODULE) &&
>>> defined(CONFIG_HWMON_MODULE))
>>
>> IS_ENABLED(CONFIG_HWMON)?
>>
> 
> Hi Florian,
> 
> unfortunately, that won't work; I had it initially and got a nice
> error message
> from Fengguang's build test bot.

Then the Kconfig dependencies are broken.

Fix Kconfig to only allow legal combinations.
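
To illustrate why the two conditions discussed above differ, here is a
userspace sketch modelled on the IS_ENABLED() preprocessor trick from
include/linux/kconfig.h. The configuration chosen (hwmon built as a
module, the including code built in) is an assumption picked to show
the case that plausibly triggered the build error; the function names
are hypothetical, and one extra dummy macro argument has been added so
the model compiles cleanly outside the kernel.

```c
#include <assert.h>

/* Assumption: CONFIG_HWMON=m, so only the _MODULE symbol is defined. */
#define CONFIG_HWMON_MODULE 1
/* MODULE is deliberately left undefined: this file models built-in code. */

#define __ARG_PLACEHOLDER_1 0,
#define config_enabled(cfg) _config_enabled(cfg)
#define _config_enabled(value) __config_enabled(__ARG_PLACEHOLDER_##value)
/* The trailing 0 is a dummy so the variadic part is never empty. */
#define __config_enabled(arg1_or_junk) ___config_enabled(arg1_or_junk 1, 0, 0)
#define ___config_enabled(__ignored, val, ...) val

/* True when the option is either built in (=y) or modular (=m). */
#define IS_ENABLED(option) \
	(config_enabled(option) || config_enabled(option##_MODULE))

/* Returns 1 if hwmon is configured at all, as IS_ENABLED() sees it. */
int hwmon_enabled_model(void)
{
	return IS_ENABLED(CONFIG_HWMON);
}

/* Returns 1 only if built-in code could actually link against hwmon. */
int hwmon_reachable_model(void)
{
#if defined(CONFIG_HWMON) || (defined(MODULE) && defined(CONFIG_HWMON_MODULE))
	return 1;
#else
	return 0;
#endif
}
```

With these settings hwmon_enabled_model() returns 1 while
hwmon_reachable_model() returns 0. That mismatch is the combination a
Kconfig dependency would have to rule out before the simpler
IS_ENABLED() test is safe.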

^ permalink raw reply

* Re: [PATCH 0/6 resend] s390: network patches for net-next
From: David Miller @ 2014-10-24  5:05 UTC (permalink / raw)
  To: blaschka; +Cc: netdev, linux-s390
In-Reply-To: <1413973087-18740-1-git-send-email-blaschka@linux.vnet.ibm.com>

From: Frank Blaschka <blaschka@linux.vnet.ibm.com>
Date: Wed, 22 Oct 2014 12:18:01 +0200

> looks like there was a problem with my previous posting. Hope this time
> it will work. Sorry for any inconvenience. The patches are mostly
> cleanups and small enhancements for net-next

Series applied, thanks.

^ permalink raw reply

* [net v2] iptunnel: Fix iptunnel_xmit return code for stats maintenance
From: Andy Zhou @ 2014-10-24  5:10 UTC (permalink / raw)
  To: davem; +Cc: netdev, Andy Zhou

iptunnel_xmit() currently always returns >= 0 instead of a proper error
code, which callers need in order to maintain stats. For example, the
current return code conflicts with how iptunnel_xmit_stats() maintains stats.

Unfortunately, the return code can not be changed without readjusting
how SKB memory is managed through the call chain.  The following two
rules are adopted for this patch:

1) Proper error codes are always propagated back through the call chain
   so that the caller can maintain stats.

2) Tunnel xmit functions always free resources, e.g. skb and route
   entry.

Signed-off-by: Andy Zhou <azhou@nicira.com>

-----
V1->v2:  Address pravin's review comments:
	 * fix error path memory leak in gre_tnl_send()
	 * Keep error counting consistent between openvswitch vport
	   and iptunnel_xmit_stats()
	 Sending out for net, rather than net-next, as a bug fix.
---
 drivers/net/vxlan.c            |   21 +++++++++++++--------
 include/net/ip_tunnels.h       |    7 +++++++
 net/ipv4/geneve.c              |    8 ++++++--
 net/ipv4/ip_tunnel_core.c      |   14 +++++++++++---
 net/openvswitch/vport-geneve.c |    5 ++---
 net/openvswitch/vport-gre.c    |    7 +++++--
 net/openvswitch/vport-vxlan.c  |    6 +++---
 net/openvswitch/vport.c        |    1 -
 8 files changed, 47 insertions(+), 22 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index ca30982..93348cb 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -1626,8 +1626,10 @@ int vxlan_xmit_skb(struct vxlan_sock *vs,
 	bool udp_sum = !vs->sock->sk->sk_no_check_tx;
 
 	skb = udp_tunnel_handle_offloads(skb, udp_sum);
-	if (IS_ERR(skb))
-		return -EINVAL;
+	if (IS_ERR(skb)) {
+		err = -EINVAL;
+		goto error;
+	}
 
 	min_headroom = LL_RESERVED_SPACE(rt->dst.dev) + rt->dst.header_len
 			+ VXLAN_HLEN + sizeof(struct iphdr)
@@ -1636,13 +1638,15 @@ int vxlan_xmit_skb(struct vxlan_sock *vs,
 	/* Need space for new headers (invalidates iph ptr) */
 	err = skb_cow_head(skb, min_headroom);
 	if (unlikely(err))
-		return err;
+		goto error;
 
 	if (vlan_tx_tag_present(skb)) {
 		if (WARN_ON(!__vlan_put_tag(skb,
 					    skb->vlan_proto,
-					    vlan_tx_tag_get(skb))))
-			return -ENOMEM;
+					    vlan_tx_tag_get(skb)))) {
+			err = -ENOMEM;
+			goto error;
+		}
 
 		skb->vlan_tci = 0;
 	}
@@ -1655,6 +1659,10 @@ int vxlan_xmit_skb(struct vxlan_sock *vs,
 
 	return udp_tunnel_xmit_skb(vs->sock, rt, skb, src, dst, tos,
 				   ttl, df, src_port, dst_port, xnet);
+error:
+	kfree_skb(skb);
+	ip_rt_put(rt);
+	return err;
 }
 EXPORT_SYMBOL_GPL(vxlan_xmit_skb);
 
@@ -1786,9 +1794,6 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
 				     tos, ttl, df, src_port, dst_port,
 				     htonl(vni << 8),
 				     !net_eq(vxlan->net, dev_net(vxlan->dev)));
-
-		if (err < 0)
-			goto rt_tx_error;
 		iptunnel_xmit_stats(err, &dev->stats, dev->tstats);
 #if IS_ENABLED(CONFIG_IPV6)
 	} else {
diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h
index 5bc6ede..80bcf2e 100644
--- a/include/net/ip_tunnels.h
+++ b/include/net/ip_tunnels.h
@@ -174,6 +174,13 @@ static inline u8 ip_tunnel_ecn_encap(u8 tos, const struct iphdr *iph,
 }
 
 int iptunnel_pull_header(struct sk_buff *skb, int hdr_len, __be16 inner_proto);
+
+/* Transmit a packet over IP tunnel
+ * Returns:
+ *	0 Congestion notification received
+ *	>0  Number of bytes in the packet successfully sent
+ *	<0 packet dropped due to error
+ */
 int iptunnel_xmit(struct sock *sk, struct rtable *rt, struct sk_buff *skb,
 		  __be32 src, __be32 dst, __u8 proto,
 		  __u8 tos, __u8 ttl, __be16 df, bool xnet);
diff --git a/net/ipv4/geneve.c b/net/ipv4/geneve.c
index 065cd94..90fea48 100644
--- a/net/ipv4/geneve.c
+++ b/net/ipv4/geneve.c
@@ -129,14 +129,14 @@ int geneve_xmit_skb(struct geneve_sock *gs, struct rtable *rt,
 
 	err = skb_cow_head(skb, min_headroom);
 	if (unlikely(err))
-		return err;
+		goto error;
 
 	if (vlan_tx_tag_present(skb)) {
 		if (unlikely(!__vlan_put_tag(skb,
 					     skb->vlan_proto,
 					     vlan_tx_tag_get(skb)))) {
 			err = -ENOMEM;
-			return err;
+			goto error;
 		}
 		skb->vlan_tci = 0;
 	}
@@ -146,6 +146,10 @@ int geneve_xmit_skb(struct geneve_sock *gs, struct rtable *rt,
 
 	return udp_tunnel_xmit_skb(gs->sock, rt, skb, src, dst,
 				   tos, ttl, df, src_port, dst_port, xnet);
+error:
+	kfree_skb(skb);
+	ip_rt_put(rt);
+	return err;
 }
 EXPORT_SYMBOL_GPL(geneve_xmit_skb);
 
diff --git a/net/ipv4/ip_tunnel_core.c b/net/ipv4/ip_tunnel_core.c
index 88c386c..b3ba4a3 100644
--- a/net/ipv4/ip_tunnel_core.c
+++ b/net/ipv4/ip_tunnel_core.c
@@ -77,9 +77,17 @@ int iptunnel_xmit(struct sock *sk, struct rtable *rt, struct sk_buff *skb,
 	__ip_select_ident(iph, skb_shinfo(skb)->gso_segs ?: 1);
 
 	err = ip_local_out_sk(sk, skb);
-	if (unlikely(net_xmit_eval(err)))
-		pkt_len = 0;
-	return pkt_len;
+
+	/* Deal with positive error numbers. Filter out NET_XMIT_CN */
+	if (err > 0)
+		return net_xmit_errno(err);
+
+	/* Success, return number of bytes transmitted */
+	if (err == 0)
+		err = pkt_len;
+
+	/* Return pkt_len or an error code */
+	return err;
 }
 EXPORT_SYMBOL_GPL(iptunnel_xmit);
 
diff --git a/net/openvswitch/vport-geneve.c b/net/openvswitch/vport-geneve.c
index 106a9d8..34276fb 100644
--- a/net/openvswitch/vport-geneve.c
+++ b/net/openvswitch/vport-geneve.c
@@ -206,15 +206,14 @@ static int geneve_tnl_send(struct vport *vport, struct sk_buff *skb)
 	tunnel_id_to_vni(tun_key->tun_id, vni);
 	skb->ignore_df = 1;
 
-	err = geneve_xmit_skb(geneve_port->gs, rt, skb, fl.saddr,
+	return  geneve_xmit_skb(geneve_port->gs, rt, skb, fl.saddr,
 			      tun_key->ipv4_dst, tun_key->ipv4_tos,
 			      tun_key->ipv4_ttl, df, sport, dport,
 			      tun_key->tun_flags, vni,
 			      tun_info->options_len, (u8 *)tun_info->options,
 			      false);
-	if (err < 0)
-		ip_rt_put(rt);
 error:
+	kfree_skb(skb);
 	return err;
 }
 
diff --git a/net/openvswitch/vport-gre.c b/net/openvswitch/vport-gre.c
index 108b82d..c0ec43f 100644
--- a/net/openvswitch/vport-gre.c
+++ b/net/openvswitch/vport-gre.c
@@ -154,8 +154,10 @@ static int gre_tnl_send(struct vport *vport, struct sk_buff *skb)
 	fl.flowi4_proto = IPPROTO_GRE;
 
 	rt = ip_route_output_key(net, &fl);
-	if (IS_ERR(rt))
-		return PTR_ERR(rt);
+	if (IS_ERR(rt)) {
+		err = PTR_ERR(rt);
+		goto error;
+	}
 
 	tunnel_hlen = ip_gre_calc_hlen(tun_key->tun_flags);
 
@@ -200,6 +202,7 @@ static int gre_tnl_send(struct vport *vport, struct sk_buff *skb)
 err_free_rt:
 	ip_rt_put(rt);
 error:
+	kfree_skb(skb);
 	return err;
 }
 
diff --git a/net/openvswitch/vport-vxlan.c b/net/openvswitch/vport-vxlan.c
index 2735e01..ace849a 100644
--- a/net/openvswitch/vport-vxlan.c
+++ b/net/openvswitch/vport-vxlan.c
@@ -174,15 +174,15 @@ static int vxlan_tnl_send(struct vport *vport, struct sk_buff *skb)
 
 	src_port = udp_flow_src_port(net, skb, 0, 0, true);
 
-	err = vxlan_xmit_skb(vxlan_port->vs, rt, skb,
+	return vxlan_xmit_skb(vxlan_port->vs, rt, skb,
 			     fl.saddr, tun_key->ipv4_dst,
 			     tun_key->ipv4_tos, tun_key->ipv4_ttl, df,
 			     src_port, dst_port,
 			     htonl(be64_to_cpu(tun_key->tun_id) << 8),
 			     false);
-	if (err < 0)
-		ip_rt_put(rt);
+
 error:
+	kfree_skb(skb);
 	return err;
 }
 
diff --git a/net/openvswitch/vport.c b/net/openvswitch/vport.c
index 6015802..da24d32 100644
--- a/net/openvswitch/vport.c
+++ b/net/openvswitch/vport.c
@@ -482,7 +482,6 @@ int ovs_vport_send(struct vport *vport, struct sk_buff *skb)
 		u64_stats_update_end(&stats->syncp);
 	} else if (sent < 0) {
 		ovs_vport_record_error(vport, VPORT_E_TX_ERROR);
-		kfree_skb(skb);
 	} else
 		ovs_vport_record_error(vport, VPORT_E_TX_DROPPED);
 
-- 
1.7.9.5
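
The return convention documented in the ip_tunnels.h hunk above, and
the way a stats helper can consume it, can be modelled in userspace
roughly as follows. This is a simplified sketch, not the kernel code:
the struct and function names are hypothetical, the NET_XMIT_* values
and net_xmit_errno() are reproduced here as assumptions about the
kernel definitions, and per-CPU and locking details are omitted.

```c
#include <assert.h>
#include <errno.h>

/* Assumed to mirror the kernel's qdisc return codes. */
#define NET_XMIT_SUCCESS	0x00
#define NET_XMIT_DROP		0x01
#define NET_XMIT_CN		0x02
#define net_xmit_errno(e)	((e) != NET_XMIT_CN ? -ENOBUFS : 0)

/* Hypothetical flat counters standing in for the device stats. */
struct tx_stats_model {
	unsigned long tx_packets;
	unsigned long tx_bytes;
	unsigned long tx_errors;
	unsigned long tx_dropped;
};

/*
 * Map the result of the output path onto the patched convention:
 *   >0  bytes sent, 0  congestion notification, <0  error.
 */
int xmit_result_model(int out_err, int pkt_len)
{
	if (out_err > 0)		/* positive NET_XMIT_* code */
		return net_xmit_errno(out_err);
	if (out_err == 0)		/* success: report bytes sent */
		return pkt_len;
	return out_err;			/* already a negative errno */
}

/* Account one transmit attempt according to the same convention. */
void xmit_stats_model(int err, struct tx_stats_model *s)
{
	if (err > 0) {
		s->tx_bytes += err;
		s->tx_packets++;
	} else if (err < 0) {
		s->tx_errors++;
	} else {
		s->tx_dropped++;
	}
}
```

The point of the sketch is that a caller can only distinguish the three
outcomes if negative errors reach it unchanged, which the old "always
>= 0" return value did not allow.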

^ permalink raw reply related

* Re: [PATCH net] bpf: split eBPF out of NET
From: Alexei Starovoitov @ 2014-10-24  5:32 UTC (permalink / raw)
  To: Josh Triplett
  Cc: David S. Miller, Geert Uytterhoeven, Ingo Molnar, Steven Rostedt,
	Hannes Frederic Sowa, Eric Dumazet, Daniel Borkmann,
	Network Development, LKML
In-Reply-To: <20141024032355.GB7879@thin>

On Thu, Oct 23, 2014 at 8:23 PM, Josh Triplett <josh@joshtriplett.org> wrote:
> On Thu, Oct 23, 2014 at 06:41:08PM -0700, Alexei Starovoitov wrote:
>> introduce two configs:
>> - hidden CONFIG_BPF to select eBPF interpreter that classic socket filters
>>   depend on
>> - visible CONFIG_BPF_SYSCALL (default off) that tracing and sockets can use
>>
>> that solves several problems:
>> - tracing and others that wish to use eBPF don't need to depend on NET.
>>   They can use BPF_SYSCALL to allow loading from userspace or select BPF
>>   to use it directly from kernel in NET-less configs.
>> - in 3.18 programs cannot be attached to events yet, so don't force it on
>> - when the rest of eBPF infra is there in 3.19+, it's still useful to
>>   switch it off to minimize kernel size
>>
>> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
>
> Thanks for working on this!  A few nits below, but otherwise this looks
> good to me.  Once this gets appropriate reviews from net and bpf folks,
> please let me know if you want this to go through the net tree, the tiny
> tree, or some other tree.

Thanks :)
I've sent it to Dave and marked it as 'net', so it's for
his net tree. I don't mind if he decides to steer it into net-next
when it opens, since changing Kconfig is always tricky.
I just felt that this patch deserves to be in 'net' and in 3.18-rc

>> bloat-o-meter on x64 shows:
>> add/remove: 0/60 grow/shrink: 0/2 up/down: 0/-15601 (-15601)
>
> Very nice!  Please do include the bloat-o-meter stats in the commit
> message.

I don't think that's necessary. eBPF is in early stages of adoption.
More things to come, so bloat-o-meter stats will be obsolete
very quickly.

>> +# interpreter that classic socket filters depend on
>> +config BPF
>> +     boolean
>
> s/boolean/bool/

Is there a difference? I thought it's an alias.

>> +config BPF_SYSCALL
>> +     bool "Enable bpf() system call" if EXPERT
>> +     select ANON_INODES
>> +     select BPF
>> +     default n
>> +     help
>> +       Enable the bpf() system call that allows to manipulate eBPF
>> +       programs and maps via file descriptors.
>
> Not sure this one goes under EXPERT, especially since it currently has
> "default n".

I followed the same style as EPOLL, EVENTFD and others
in the same category.

>> +/* To execute LD_ABS/LD_IND instructions __bpf_prog_run() may call
>> + * skb_copy_bits(), so provide a weak definition of it for NET-less config.
>> + */
>> +int __weak skb_copy_bits(const struct sk_buff *skb, int offset, void *to,
>> +                      int len)
>> +{
>> +     return -EFAULT;
>> +}
>
> Please discuss this in the commit message.  What are the implications of
> ending up with this implementation that always returns -EFAULT?

because that's what real skb_copy_bits() would return.
In this case it's actually irrelevant, since non-socket programs
are not allowed to have LD_ABS/LD_IND instructions and
I'm only resolving linker error here.
But returning negative error helps prevent bugs in cases
where verifier or some in-kernel generated program uses
LD_ABS by mistake.
I don't think these types of explanations are necessary in
commit logs.
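
As an illustration of the weak-definition approach described above, here is a
minimal userspace sketch. It assumes GCC or Clang (for
`__attribute__((weak))`); the names `copy_bits_stub` and `load_byte` and the
simplified signature are hypothetical and are not the kernel's
skb_copy_bits(). If another object file supplies a strong definition of the
same symbol, the linker uses that one; otherwise this fallback resolves the
reference and always fails.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for a helper that is normally provided by another
 * subsystem. Marked weak so that a real (strong) definition, when linked in,
 * silently replaces it.
 */
__attribute__((weak)) int copy_bits_stub(const void *src, int offset,
					 void *to, int len)
{
	(void)src;
	(void)offset;
	(void)to;
	(void)len;
	/* No backing implementation in this build: always report a fault. */
	return -EFAULT;
}

/* Hypothetical caller: it links in both configurations and just sees an error. */
int load_byte(const void *src, int offset, unsigned char *out)
{
	return copy_bits_stub(src, offset, out, 1);
}
```

Returning a negative error rather than 0 means a caller that reaches the stub
by mistake fails visibly instead of consuming an uninitialized buffer.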

>> @@ -6,7 +6,7 @@ menuconfig NET
>>       bool "Networking support"
>>       select NLATTR
>>       select GENERIC_NET_UTILS
>> -     select ANON_INODES
>> +     select BPF
>
> Why does this not need to select ANON_INODES anymore?  Did *only* BPF
> use that, so it only needs to occur via BPF_SYSCALL?  If so, can you
> document that in the commit message?

I hope that folks who were following this work on netdev
remember commit 38b3629adb8c04 that added it.
So here I'm actually removing this ANON_INODES dependency
from NET and moving it into BPF_SYSCALL where it belongs.

btw, the goal of this patch is not tinification, but rather being
a good citizen and not forcing a new syscall on everyone.
It was tested with upcoming tracing patches that select
BPF instead of NET.
It will also help parallelize the development, since my old
predicate-tree into eBPF optimization for vanilla tracing filters:
http://lwn.net/Articles/598545/
can potentially go into tip tree a release earlier.
Back then full NET dependency was a show stopper.
This patch finally addresses it.

^ permalink raw reply

* Re: [PATCH 06/14] net: dsa: Add support for hardware monitoring
From: Guenter Roeck @ 2014-10-24  5:40 UTC (permalink / raw)
  To: David Miller; +Cc: f.fainelli, netdev, andrew, linux-kernel
In-Reply-To: <20141024.010306.1939269479587208896.davem@davemloft.net>

On 10/23/2014 10:03 PM, David Miller wrote:
> From: Guenter Roeck <linux@roeck-us.net>
> Date: Wed, 22 Oct 2014 22:06:41 -0700
>
>> On 10/22/2014 09:37 PM, Florian Fainelli wrote:
>>> 2014-10-22 21:03 GMT-07:00 Guenter Roeck <linux@roeck-us.net>:
>>>> Some Marvell switches provide chip temperature data.
>>>> Add support for reporting it to the dsa infrastructure.
>>>>
>>>> Signed-off-by: Guenter Roeck <linux@roeck-us.net>
>>>> ---
>>> [snip]
>>>
>>>> +/* hwmon support
>>>> ************************************************************/
>>>> +
>>>> +#if defined(CONFIG_HWMON) || (defined(MODULE) &&
>>>> defined(CONFIG_HWMON_MODULE))
>>>
>>> IS_ENABLED(CONFIG_HWMON)?
>>>
>>
>> Hi Florian,
>>
>> unfortunately, that won't work; I had it initially and got a nice
>> error message
>> from Fengguang's build test bot.
>
> Then the Kconfig dependencies are broken.
>
> Fix Kconfig to only allow legal combinations.
>

I see two options for that:

- Add
	select HWMON
   to the NET_DSA Kconfig entry.
   Example is Broadcom TIGON3 driver.

- Add a DSA_HWMON Kconfig entry to define the dependencies and
   to let the user select if the functionality should be enabled.
   Example is Intel IGB driver.

Any preference on your side? If not, I'll go with the latter.
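
For context, the difference between the two preprocessor conditions quoted
above can be sketched in plain C. This is a simulation under stated
assumptions, not kernel code: it assumes the problematic build has hwmon as a
module (CONFIG_HWMON_MODULE defined) while the calling code is built in
(MODULE not defined), and `HWMON_ENABLED` / `HWMON_CALLABLE` are hypothetical
names. IS_ENABLED(CONFIG_HWMON) is true for both =y and =m, which is what the
first macro mimics.

```c
/* Simulated build: hwmon is a module (=m), this code is built-in (=y). */
#define CONFIG_HWMON_MODULE 1
/* MODULE is deliberately left undefined: we are part of the kernel image. */

/* What IS_ENABLED(CONFIG_HWMON) reports: true for =y or =m. */
#if defined(CONFIG_HWMON) || defined(CONFIG_HWMON_MODULE)
#define HWMON_ENABLED 1
#else
#define HWMON_ENABLED 0
#endif

/* The open-coded test from the patch: also requires that we are a module. */
#if defined(CONFIG_HWMON) || (defined(MODULE) && defined(CONFIG_HWMON_MODULE))
#define HWMON_CALLABLE 1
#else
#define HWMON_CALLABLE 0
#endif

int hwmon_enabled(void)
{
	return HWMON_ENABLED;
}

int hwmon_callable(void)
{
	return HWMON_CALLABLE;
}
```

In this simulated configuration the first test says hwmon is enabled while the
second says built-in code cannot call into it, which is the combination a
dedicated Kconfig entry with proper dependencies would rule out.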

Thanks,
Guenter

^ permalink raw reply

* Re: [PATCH 06/14] net: dsa: Add support for hardware monitoring
From: Guenter Roeck @ 2014-10-24  6:09 UTC (permalink / raw)
  To: Florian Fainelli, David Miller; +Cc: netdev, andrew, linux-kernel
In-Reply-To: <544A4B8C.5040002@gmail.com>

On 10/24/2014 05:52 AM, Florian Fainelli wrote:
> On 23/10/2014 22:40, Guenter Roeck wrote:
>> On 10/23/2014 10:03 PM, David Miller wrote:
>>> From: Guenter Roeck <linux@roeck-us.net>
>>> Date: Wed, 22 Oct 2014 22:06:41 -0700
>>>
>>>> On 10/22/2014 09:37 PM, Florian Fainelli wrote:
>>>>> 2014-10-22 21:03 GMT-07:00 Guenter Roeck <linux@roeck-us.net>:
>>>>>> Some Marvell switches provide chip temperature data.
>>>>>> Add support for reporting it to the dsa infrastructure.
>>>>>>
>>>>>> Signed-off-by: Guenter Roeck <linux@roeck-us.net>
>>>>>> ---
>>>>> [snip]
>>>>>
>>>>>> +/* hwmon support
>>>>>> ************************************************************/
>>>>>> +
>>>>>> +#if defined(CONFIG_HWMON) || (defined(MODULE) &&
>>>>>> defined(CONFIG_HWMON_MODULE))
>>>>>
>>>>> IS_ENABLED(CONFIG_HWMON)?
>>>>>
>>>>
>>>> Hi Florian,
>>>>
>>>> unfortunately, that won't work; I had it initially and got a nice
>>>> error message
>>>> from Fengguang's build test bot.
>>>
>>> Then the Kconfig dependencies are broken.
>>>
>>> Fix Kconfig to only allow legal combinations.
>>>
>>
>> I see two options for that:
>>
>> - Add
>>      select HWMON
>>    to the NET_DSA Kconfig entry.
>>    Example is Broadcom TIGON3 driver.
>>
>> - Add a DSA_HWMON Kconfig entry to define the dependencies and
>>    to let the user select if the functionality should be enabled.
>>    Example is Intel IGB driver.
>>
>> Any preference from your side ? If no, I'll go with the latter.
>
> I would prefer DSA_HWMON personally, though no strong feelings.

That is what I ended up implementing. NET_DSA_HWMON, actually, for consistency.

Since this is the most debated patch in this patch set, how about you drop it from your v2 and we sort this one out separately?

We can do that if there are still issues in v2.

Thanks,
Guenter

^ permalink raw reply

* Re: [PATCH 06/14] net: dsa: Add support for hardware monitoring
From: David Miller @ 2014-10-24  6:10 UTC (permalink / raw)
  To: linux; +Cc: f.fainelli, netdev, andrew, linux-kernel
In-Reply-To: <5449E66B.6090902@roeck-us.net>

From: Guenter Roeck <linux@roeck-us.net>
Date: Thu, 23 Oct 2014 22:40:59 -0700

> I see two options for that:
> 
> - Add
> 	select HWMON
>   to the NET_DSA Kconfig entry.
>   Example is Broadcom TIGON3 driver.
> 
> - Add a DSA_HWMON Kconfig entry to define the dependencies and
>   to let the user select if the functionality should be enabled.
>   Example is Intel IGB driver.
> 
> Any preference from your side ? If no, I'll go with the latter.

Probably the latter is better, select can get you into trouble.

^ permalink raw reply

* [QA-TCP] How to send tcp small packages immediately?
From: Zhangjie (HZ) @ 2014-10-24  7:41 UTC (permalink / raw)
  To: kvm, Jason Wang, Michael S. Tsirkin, linux-kernel, netdev,
	liuyongan, qinchuanyu

Hi,

I use netperf to test the performance of small TCP packets, with TCP_NODELAY set:

netperf -H 129.9.7.164 -l 100 -- -m 512 -D

Among the packets captured by tcpdump, there are not only small packets but
also lots of big ones (skb->len=65160).

IP 129.9.7.186.60840 > 129.9.7.164.34607: tcp 65160
IP 129.9.7.164.34607 > 129.9.7.186.60840: tcp 0
IP 129.9.7.164.34607 > 129.9.7.186.60840: tcp 0
IP 129.9.7.164.34607 > 129.9.7.186.60840: tcp 0
IP 129.9.7.186.60840 > 129.9.7.164.34607: tcp 65160
IP 129.9.7.164.34607 > 129.9.7.186.60840: tcp 0
IP 129.9.7.164.34607 > 129.9.7.186.60840: tcp 0
IP 129.9.7.164.34607 > 129.9.7.186.60840: tcp 0
IP 129.9.7.186.60840 > 129.9.7.164.34607: tcp 80
IP 129.9.7.186.60840 > 129.9.7.164.34607: tcp 512
IP 129.9.7.186.60840 > 129.9.7.164.34607: tcp 512

So, how can I test small TCP packets? Besides TCP_NODELAY, what else should be set?
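
For reference, the option mentioned above is set on a socket roughly as in the
sketch below. The helper names `set_tcp_nodelay` and `get_tcp_nodelay` are
hypothetical; only the standard setsockopt()/getsockopt() calls are assumed.
Note that TCP_NODELAY only disables Nagle's algorithm on that socket; this
sketch makes no claim about what produces the 65160-byte segments in the
capture.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Disable Nagle's algorithm on a TCP socket. Returns 0 on success, -1 on error. */
int set_tcp_nodelay(int fd)
{
	int one = 1;

	return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one));
}

/* Read the option back: 1 if set, 0 if clear, -1 on error. */
int get_tcp_nodelay(int fd)
{
	int val = 0;
	socklen_t len = sizeof(val);

	if (getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &val, &len) < 0)
		return -1;
	return val != 0;
}
```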

Thanks!
-- 
Best Wishes!
Zhang Jie


^ permalink raw reply

* Re: [PATCH net] bpf: split eBPF out of NET
From: Josh Triplett @ 2014-10-24  8:11 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David S. Miller, Geert Uytterhoeven, Ingo Molnar, Steven Rostedt,
	Hannes Frederic Sowa, Eric Dumazet, Daniel Borkmann,
	Network Development, LKML
In-Reply-To: <CAMEtUuxsk=iLDsD4XXZ8EcurFXgFxD-9iePv=NbBZn+b3YOXJA@mail.gmail.com>

On Thu, Oct 23, 2014 at 10:32:50PM -0700, Alexei Starovoitov wrote:
> On Thu, Oct 23, 2014 at 8:23 PM, Josh Triplett <josh@joshtriplett.org> wrote:
> > On Thu, Oct 23, 2014 at 06:41:08PM -0700, Alexei Starovoitov wrote:
> >> introduce two configs:
> >> - hidden CONFIG_BPF to select eBPF interpreter that classic socket filters
> >>   depend on
> >> - visible CONFIG_BPF_SYSCALL (default off) that tracing and sockets can use
> >>
> >> that solves several problems:
> >> - tracing and others that wish to use eBPF don't need to depend on NET.
> >>   They can use BPF_SYSCALL to allow loading from userspace or select BPF
> >>   to use it directly from kernel in NET-less configs.
> >> - in 3.18 programs cannot be attached to events yet, so don't force it on
> >> - when the rest of eBPF infra is there in 3.19+, it's still useful to
> >>   switch it off to minimize kernel size
> >>
> >> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
> >
> > Thanks for working on this!  A few nits below, but otherwise this looks
> > good to me.  Once this gets appropriate reviews from net and bpf folks,
> > please let me know if you want this to go through the net tree, the tiny
> > tree, or some other tree.
> 
> Thanks :)
> I've sent it to Dave and marked it as 'net', so it's for
> his net tree. I don't mind if he decides to steer it into net-next
> when it opens, since changing Kconfig is always tricky.
> I just felt that this patch deserves to be in 'net' and in 3.18-rc

Ah, nice; yes, getting it into 3.18-rc would be excellent if possible.

> >> bloat-o-meter on x64 shows:
> >> add/remove: 0/60 grow/shrink: 0/2 up/down: 0/-15601 (-15601)
> >
> > Very nice!  Please do include the bloat-o-meter stats in the commit
> > message.
> 
> I don't think that's necessary. eBPF is in early stages of adoption.
> More things to come, so bloat-o-meter stats will be obsolete
> very quickly.

I don't mean the full list of symbols, just the summary saying this
saves 15k.

> >> +# interpreter that classic socket filters depend on
> >> +config BPF
> >> +     boolean
> >
> > s/boolean/bool/
> 
> Is there a difference? I thought it's an alias.

It's an alias, but almost everything uses "bool":

~/src/linux$ git grep -w bool -- '*Kconfig*' | wc -l
7064
~/src/linux$ git grep -w boolean -- '*Kconfig*' | wc -l
94

> >> +config BPF_SYSCALL
> >> +     bool "Enable bpf() system call" if EXPERT
> >> +     select ANON_INODES
> >> +     select BPF
> >> +     default n
> >> +     help
> >> +       Enable the bpf() system call that allows to manipulate eBPF
> >> +       programs and maps via file descriptors.
> >
> > Not sure this one goes under EXPERT, especially since it currently has
> > "default n".
> 
> I followed the same style as EPOLL, EVENTFD and others
> in the same category.

I was thinking of CROSS_MEMORY_ATTACH and FHANDLE in the same file.

> >> +/* To execute LD_ABS/LD_IND instructions __bpf_prog_run() may call
> >> + * skb_copy_bits(), so provide a weak definition of it for NET-less config.
> >> + */
> >> +int __weak skb_copy_bits(const struct sk_buff *skb, int offset, void *to,
> >> +                      int len)
> >> +{
> >> +     return -EFAULT;
> >> +}
> >
> > Please discuss this in the commit message.  What are the implications of
> > ending up with this implementation that always returns -EFAULT?
> 
> because that's what real skb_copy_bits() would return.
> In this case it's actually irrelevant, since non-socket programs
> are not allowed to have LD_ABS/LD_IND instructions and
> I'm only resolving linker error here.
> But returning negative error helps prevent bugs in cases
> where verifier or some in-kernel generated program uses
> LD_ABS by mistake.

Makes sense.

> I don't think these type of explanations are necessary in
> commit logs.
> 
> >> @@ -6,7 +6,7 @@ menuconfig NET
> >>       bool "Networking support"
> >>       select NLATTR
> >>       select GENERIC_NET_UTILS
> >> -     select ANON_INODES
> >> +     select BPF
> >
> > Why does this not need to select ANON_INODES anymore?  Did *only* BPF
> > use that, so it only needs to occur via BPF_SYSCALL?  If so, can you
> > document that in the commit message?
> 
> I hope that folks who were following this work on netdev
> remember commit 38b3629adb8c04 that added it.
> So here I'm actually removing this ANON_INODES dependency
> from NET and moving it into BPF_SYSCALL where it belongs.

Thanks for the clarification.

> btw, the goal of this patch is not tinification, but rather being
> good citizen and not forcing new syscall on everyone.

A critical part of the tinification effort is not having the kernel get
gratuitously bigger in other areas while we're trying to shrink it.  So,
I really appreciate your work. :)

- Josh Triplett

^ permalink raw reply

* Re: [PATCH net] bpf: split eBPF out of NET
From: Geert Uytterhoeven @ 2014-10-24  8:19 UTC (permalink / raw)
  To: Josh Triplett
  Cc: Alexei Starovoitov, David S. Miller, Ingo Molnar, Steven Rostedt,
	Hannes Frederic Sowa, Eric Dumazet, Daniel Borkmann,
	Network Development, LKML
In-Reply-To: <20141024081139.GA8861@thin>

On Fri, Oct 24, 2014 at 10:11 AM, Josh Triplett <josh@joshtriplett.org> wrote:
>> >> +config BPF_SYSCALL
>> >> +     bool "Enable bpf() system call" if EXPERT
>> >> +     select ANON_INODES
>> >> +     select BPF
>> >> +     default n
>> >> +     help
>> >> +       Enable the bpf() system call that allows to manipulate eBPF
>> >> +       programs and maps via file descriptors.
>> >
>> > Not sure this one goes under EXPERT, especially since it currently has
>> > "default n".
>>
>> I followed the same style as EPOLL, EVENTFD and others
>> in the same category.
>
> I was thinking of CROSS_MEMORY_ATTACH and FHANDLE in the same file.

Those indeed look like better examples.
With if EXPERT and default n, you need to enable EXPERT before you can
enable the syscall, which is probably not what you want.

Thanks!

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply

* Re: [PATCH net] bpf: split eBPF out of NET
From: Daniel Borkmann @ 2014-10-24  8:37 UTC (permalink / raw)
  To: Josh Triplett
  Cc: Alexei Starovoitov, David S. Miller, Geert Uytterhoeven,
	Ingo Molnar, Steven Rostedt, Hannes Frederic Sowa, Eric Dumazet,
	Network Development, LKML, yann.morin.1998
In-Reply-To: <20141024081139.GA8861@thin>

On 10/24/2014 10:11 AM, Josh Triplett wrote:
> On Thu, Oct 23, 2014 at 10:32:50PM -0700, Alexei Starovoitov wrote:
>> On Thu, Oct 23, 2014 at 8:23 PM, Josh Triplett <josh@joshtriplett.org> wrote:
>>> On Thu, Oct 23, 2014 at 06:41:08PM -0700, Alexei Starovoitov wrote:
>>>> introduce two configs:
>>>> - hidden CONFIG_BPF to select eBPF interpreter that classic socket filters
>>>>    depend on
>>>> - visible CONFIG_BPF_SYSCALL (default off) that tracing and sockets can use
>>>>
>>>> that solves several problems:
>>>> - tracing and others that wish to use eBPF don't need to depend on NET.
>>>>    They can use BPF_SYSCALL to allow loading from userspace or select BPF
>>>>    to use it directly from kernel in NET-less configs.
>>>> - in 3.18 programs cannot be attached to events yet, so don't force it on
>>>> - when the rest of eBPF infra is there in 3.19+, it's still useful to
>>>>    switch it off to minimize kernel size
>>>>
>>>> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
>>>
>>> Thanks for working on this!  A few nits below, but otherwise this looks
>>> good to me.  Once this gets appropriate reviews from net and bpf folks,
>>> please let me know if you want this to go through the net tree, the tiny
>>> tree, or some other tree.
>>
>> Thanks :)
>> I've sent it to Dave and marked it as 'net', so it's for
>> his net tree. I don't mind if he decides to steer it into net-next
>> when it opens, since changing Kconfig is always tricky.
>> I just felt that this patch deserves to be in 'net' and in 3.18-rc
>
> Ah, nice; yes, getting it into 3.18-rc would be excellent if possible.

Fully agreed, BPF_SYSCALL defaulting to 'n' _for the time being_
would also give an option for reducing exposure until the API is
further stabilized and in a ready-to-use state.

>>>> bloat-o-meter on x64 shows:
>>>> add/remove: 0/60 grow/shrink: 0/2 up/down: 0/-15601 (-15601)
>>>
>>> Very nice!  Please do include the bloat-o-meter stats in the commit
>>> message.
>>
>> I don't think that's necessary. eBPF is in early stages of adoption.
>> More things to come, so bloat-o-meter stats will be obsolete
>> very quickly.
>
> I don't mean the full list of symbols, just the summary saying this
> saves 15k.

It would probably make it easier to identify from the log which
commits are related to tinification. Perhaps Dave can
still squash that into the commit log.

>>>> +# interpreter that classic socket filters depend on
>>>> +config BPF
>>>> +     boolean
>>>
>>> s/boolean/bool/
>>
>> Is there a difference? I thought it's an alias.
>
> It's an alias, but almost everything uses "bool":
>
> ~/src/linux$ git grep -w bool -- '*Kconfig*' | wc -l
> 7064
> ~/src/linux$ git grep -w boolean -- '*Kconfig*' | wc -l
> 94

Actually, shouldn't we get rid of the alias then? The same goes
for def_bool and def_boolean ... it would help to avoid confusion
to just have a single term for each.

Anyway, the rest looks good to me, thanks.

I am totally fine with having it under EXPERT for now for the reasons
mentioned above. This can still be lifted later on.

Acked-by: Daniel Borkmann <dborkman@redhat.com>

^ permalink raw reply

* [PATCH net] net/sched: Fix use of wild pointer in mq_destroy() when qdisc_alloc fail
From: wang.bo116 @ 2014-10-24  8:34 UTC (permalink / raw)
  To: davem, kaber; +Cc: netdev, cui.yunfeng


Hello,
	In mq_destroy() we should set the pointer priv->qdiscs to NULL after freeing it.
	When attach_default_qdiscs -> qdisc_create_dflt -> mq_init -> qdisc_create_dflt fails because qdisc_alloc fails,
mq_destroy() is called twice: the first time from mq_init, and the second time via qdisc_destroy -> mq_destroy.
If priv->qdiscs is not set to NULL after it is freed, the second call to mq_destroy() uses a wild pointer, because the if (!priv->qdiscs) check does not catch it.
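
The double-destroy scenario can be modelled in userspace as a minimal sketch.
Everything here is hypothetical (`struct mq_priv_model`, `mq_init_model`,
`mq_destroy_model`, and the `fail_at` parameter that simulates an allocation
failure); it only illustrates why clearing the pointer after freeing makes a
second destroy call harmless.

```c
#include <stdlib.h>

/* Hypothetical model of the private data: an array of child objects. */
struct mq_priv_model {
	int **qdiscs;	/* NULL when nothing is allocated */
	size_t nqueues;
};

/* Safe to call more than once: the pointer is cleared after freeing. */
void mq_destroy_model(struct mq_priv_model *priv)
{
	size_t i;

	if (!priv->qdiscs)
		return;
	for (i = 0; i < priv->nqueues && priv->qdiscs[i]; i++)
		free(priv->qdiscs[i]);
	free(priv->qdiscs);
	priv->qdiscs = NULL;	/* without this, a second call frees garbage */
}

/* Allocates children; fails on purpose at index fail_at to mimic an OOM. */
int mq_init_model(struct mq_priv_model *priv, size_t nqueues, size_t fail_at)
{
	size_t i;

	priv->nqueues = nqueues;
	priv->qdiscs = calloc(nqueues, sizeof(*priv->qdiscs));
	if (!priv->qdiscs)
		return -1;
	for (i = 0; i < nqueues; i++) {
		if (i == fail_at)
			goto err;	/* simulated allocation failure */
		priv->qdiscs[i] = malloc(sizeof(int));
		if (!priv->qdiscs[i])
			goto err;
	}
	return 0;
err:
	mq_destroy_model(priv);	/* first destroy, from the init error path */
	return -1;
}
```

A caller that reacts to the init failure by destroying the object again then
hits the early return instead of walking freed memory.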

The problem happened on my machine when ifconfig failed to allocate memory:

ifconfig: page allocation failure. order:0, mode:0xd0, oom_adj:0
[<c0211a00>] (unwind_backtrace+0x0/0xd4) from [<c060dc14>] (dump_stack+0x18/0x1c)
[<c060dc14>] (dump_stack+0x18/0x1c) from [<c02a64f0>] (__alloc_pages_nodemask+0x910/0x9dc)
[<c02a64f0>] (__alloc_pages_nodemask+0x910/0x9dc) from [<c02cf0b4>] (cache_alloc_refill+0x364/0x788)
[<c02cf0b4>] (cache_alloc_refill+0x364/0x788) from [<c02cf7f4>] (__kmalloc+0x134/0x1e8)
[<c02cf7f4>] (__kmalloc+0x134/0x1e8) from [<c054b540>] (qdisc_alloc+0x24/0xbc)
[<c054b540>] (qdisc_alloc+0x24/0xbc) from [<c054b5f8>] (qdisc_create_dflt+0x20/0x60)
[<c054b5f8>] (qdisc_create_dflt+0x20/0x60) from [<c054c008>] (mq_init+0x8c/0xf4)
[<c054c008>] (mq_init+0x8c/0xf4) from [<c054b61c>] (qdisc_create_dflt+0x44/0x60)
[<c054b61c>] (qdisc_create_dflt+0x44/0x60) from [<c054b7b4>] (dev_activate+0xac/0x150)
[<c054b7b4>] (dev_activate+0xac/0x150) from [<c053a298>] (dev_open+0xf0/0x120)
[<c053a298>] (dev_open+0xf0/0x120) from [<c0539e08>] (dev_change_flags+0x94/0x164)
[<c0539e08>] (dev_change_flags+0x94/0x164) from [<c05804d8>] (devinet_ioctl+0x300/0x684)
[<c05804d8>] (devinet_ioctl+0x300/0x684) from [<c0581a4c>] (inet_ioctl+0xd0/0x104)
[<c0581a4c>] (inet_ioctl+0xd0/0x104) from [<c0526d0c>] (sock_ioctl+0x200/0x250)
[<c0526d0c>] (sock_ioctl+0x200/0x250) from [<c02e2010>] (vfs_ioctl+0x34/0xb4)
[<c02e2010>] (vfs_ioctl+0x34/0xb4) from [<c02e2b6c>] (do_vfs_ioctl+0x56c/0x5d8)
[<c02e2b6c>] (do_vfs_ioctl+0x56c/0x5d8) from [<c02e2c18>] (sys_ioctl+0x40/0x64)
[<c02e2c18>] (sys_ioctl+0x40/0x64) from [<c0209a60>] (ret_fast_syscall+0x0/0x38)

Unable to handle kernel paging request at virtual address 6b6b6b73
pgd = c1e70000
[6b6b6b73] *pgd=00000000
Internal error: Oops: 15 [#1] PREEMPT
last sysfs file:
Modules linked in:
CPU: 0    Tainted: G        W   (2.6.32.61-EMBSYS-CGEL-4.03.20.P3.F0.B5MAXCNF #2)
PC is at qdisc_destroy+0xc/0xb4
LR is at mq_destroy+0x34/0x60
pc : [<c054b084>]    lr : [<c054bf50>]    psr: 20000213
sp : c191bd80  ip : c191bd98  fp : c191bd94
r10: 00000000  r9 : c191be70  r8 : c1bff40c
r7 : c1c2e000  r6 : c1f3e140  r5 : 00000000  r4 : c1f3e0a0
r3 : f2266ea0  r2 : 00000000  r1 : c1f3e0cc  r0 : 6b6b6b6b
Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
Control: 12c5387d  Table: 01e70019  DAC: 55555555
Process ifconfig (pid: 391, stack limit = 0xc191a2e8)
Stack: (0xc191bd80 to 0xc191c000)
[<c054b084>] (qdisc_destroy+0xc/0xb4) from [<c054bf50>] (mq_destroy+0x34/0x60)
[<c054bf50>] (mq_destroy+0x34/0x60) from [<c054b0ec>] (qdisc_destroy+0x74/0xb4)
[<c054b0ec>] (qdisc_destroy+0x74/0xb4) from [<c054b62c>] (qdisc_create_dflt+0x54/0x60)
[<c054b62c>] (qdisc_create_dflt+0x54/0x60) from [<c054b7b4>] (dev_activate+0xac/0x150)
[<c054b7b4>] (dev_activate+0xac/0x150) from [<c053a298>] (dev_open+0xf0/0x120)
[<c053a298>] (dev_open+0xf0/0x120) from [<c0539e08>] (dev_change_flags+0x94/0x164)
[<c0539e08>] (dev_change_flags+0x94/0x164) from [<c05804d8>] (devinet_ioctl+0x300/0x684)
[<c05804d8>] (devinet_ioctl+0x300/0x684) from [<c0581a4c>] (inet_ioctl+0xd0/0x104)
[<c0581a4c>] (inet_ioctl+0xd0/0x104) from [<c0526d0c>] (sock_ioctl+0x200/0x250)
[<c0526d0c>] (sock_ioctl+0x200/0x250) from [<c02e2010>] (vfs_ioctl+0x34/0xb4)
[<c02e2010>] (vfs_ioctl+0x34/0xb4) from [<c02e2b6c>] (do_vfs_ioctl+0x56c/0x5d8)
[<c02e2b6c>] (do_vfs_ioctl+0x56c/0x5d8) from [<c02e2c18>] (sys_ioctl+0x40/0x64)
[<c02e2c18>] (sys_ioctl+0x40/0x64) from [<c0209a60>] (ret_fast_syscall+0x0/0x38)
Code: e89da8f0 e1a0c00d e92dd830 e24cb004 (e5903008)
---[ end trace 8e66b5118c0bea77 ]---
Kernel panic - not syncing: Fatal exception

--------------------------------------------------------------------------------

This patch fixes the problem; it is based on Linux 3.18-rc1:

Signed-off-by: Wang Bo <wang.bo116@zte.com.cn>
Tested-by: Ma Chenggong <ma.chenggong@zte.com.cn>
diff --git a/net/sched/sch_mq.c b/net/sched/sch_mq.c
index 42f72f1..a0c90e7 100755
--- a/net/sched/sch_mq.c
+++ b/net/sched/sch_mq.c
@@ -33,6 +33,7 @@ static void mq_destroy(struct Qdisc *sch)
 	for (ntx = 0; ntx < dev->num_tx_queues && priv->qdiscs[ntx]; ntx++)
 		qdisc_destroy(priv->qdiscs[ntx]);
 	kfree(priv->qdiscs);
+	priv->qdiscs = NULL;
 }

 static int mq_init(struct Qdisc *sch, struct nlattr *opt)

^ permalink raw reply related

* Problem with 10Gbit Broadcom NetXtreme II 5771x/578xx 10/20-Gigabit Ethernet Driver
From: Stefan Bottelier | Ocius.nl @ 2014-10-24  8:42 UTC (permalink / raw)
  To: netdev

Hello,

We are using Dell Blade Centers, but we get an error on the 10Gbit
Broadcom adapter (bnx2x):

bnx2x 0000:01:00.0: part number 394D4342-31383735-30315430-473030
WARNING: at drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c:9410 
bnx2x_init_one+0x1194/0x32b9
Hardware name: PowerEdge M620
  Modules linked in:
  Pid: 1, comm: swapper/0 Not tainted 3.2.63 #1
Call Trace:
warn_slowpath_common+0x78/0xb0
bnx2x_init_one+0x1194/0x32b9
bnx2x_init_one+0x1194/0x32b9
warn_slowpath_null+0x1b/0x20
bnx2x_init_one+0x1194/0x32b9
load_balance+0xb5/0x600
soft_cursor+0x19d/0x220
idr_get_empty_slot+0xf2/0x290
sysfs_link_sibling+0x6a/0xb0
__sysfs_add_one+0x5b/0x100
ida_get_new_above+0x49/0x1e0
kmem_cache_alloc+0xa9/0xb0
sysfs_add_one+0x1c/0xe0
sysfs_addrm_finish+0x12/0x90
sysfs_new_dirent+0x6d/0xf0
sysfs_do_create_link+0xbe/0x1f0
notifier_call_chain+0x44/0x60
pci_device_probe+0xf7/0x100
driver_probe_device+0x60/0x1f0
pci_match_device+0xf/0xa0
driver_probe_device+0x1f0/0x1f0
driver_attach+0x79/0x80
bus_for_each_dev+0x38/0x70
driver_attach+0x16/0x20
driver_probe_device+0x1f0/0x1f0
bus_add_driver+0x16f/0x250
pci_dev_put+0x20/0x20
driver_register+0x63/0x130
process_scheduled_works+0x30/0x30
cnic_init+0x7f/0x7f
__pci_register_driver+0x3c/0xa0
cnic_init+0x7f/0x7f
bnx2x_init+0x71/0x94
do_one_initcall+0x112/0x160
kernel_init+0x10f/0x1ab
do_early_param+0x77/0x77
start_kernel+0x327/0x327
kernel_thread_helper+0x6/0xd
bnx2x 0000:01:00.0: irq 132 for MSI/MSI-X
bnx2x 0000:01:00.0: irq 133 for MSI/MSI-X
bnx2x 0000:01:00.0: irq 134 for MSI/MSI-X
bnx2x 0000:01:00.0: irq 135 for MSI/MSI-X
bnx2x 0000:01:00.0: irq 136 for MSI/MSI-X
bnx2x 0000:01:00.0: irq 137 for MSI/MSI-X
bnx2x 0000:01:00.0: irq 138 for MSI/MSI-X
bnx2x 0000:01:00.0: irq 139 for MSI/MSI-X
bnx2x 0000:01:00.0: irq 140 for MSI/MSI-X
bnx2x 0000:01:00.0: irq 141 for MSI/MSI-X
bnx2x 0000:01:00.0: irq 142 for MSI/MSI-X
bnx2x 0000:01:00.0: irq 143 for MSI/MSI-X
bnx2x 0000:01:00.0: irq 144 for MSI/MSI-X
bnx2x 0000:01:00.0: irq 145 for MSI/MSI-X
bnx2x 0000:01:00.0: irq 146 for MSI/MSI-X
bnx2x 0000:01:00.0: irq 147 for MSI/MSI-X
bnx2x 0000:01:00.0: irq 148 for MSI/MSI-X
bnx2x 0000:01:00.0: eth0: Added CNIC device
bnx2x 0000:01:00.1: part number 394D4342-31383735-30315430-473030

-- 
Kind regards,

Stefan Bottelier
Ocius Internet Services

E: Stefan.Bottelier@ocius.nl
T: +31 (0)20 716 39 09
W: http://www.ocius.nl

^ permalink raw reply

* Re: localed stuck in recent 3.18 git in copy_net_ns?
From: Yanko Kaneti @ 2014-10-24  9:08 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Josh Boyer, Eric W. Biederman, Cong Wang, Kevin Fenzi, netdev,
	Linux-Kernel@Vger. Kernel. Org
In-Reply-To: <20141023220406.GJ4977@linux.vnet.ibm.com>

On Thu-10/23/14-2014 15:04, Paul E. McKenney wrote:
> On Fri, Oct 24, 2014 at 12:45:40AM +0300, Yanko Kaneti wrote:
> > 
> > On Thu, 2014-10-23 at 13:05 -0700, Paul E. McKenney wrote:
> > > On Thu, Oct 23, 2014 at 10:51:59PM +0300, Yanko Kaneti wrote:
> > > > On Thu-10/23/14-2014 08:33, Paul E. McKenney wrote:
> > > > > On Thu, Oct 23, 2014 at 05:27:50AM -0700, Paul E. McKenney wrote:
> > > > > > On Thu, Oct 23, 2014 at 09:09:26AM +0300, Yanko Kaneti wrote:
> > > > > > > On Wed, 2014-10-22 at 16:24 -0700, Paul E. McKenney wrote:
> > > > > > > > On Thu, Oct 23, 2014 at 01:40:32AM +0300, Yanko Kaneti 
> > > > > > > > wrote:
> > > > > > > > > On Wed-10/22/14-2014 15:33, Josh Boyer wrote:
> > > > > > > > > > On Wed, Oct 22, 2014 at 2:55 PM, Paul E. McKenney
> > > > > > > > > > <paulmck@linux.vnet.ibm.com> wrote:
> > > > > > > > 
> > > > > > > > [ . . . ]
> > > > > > > > 
> > > > > > > > > > > Don't get me wrong -- the fact that this kthread 
> > > > > > > > > > > appears to
> > > > > > > > > > > have
> > > > > > > > > > > blocked within rcu_barrier() for 120 seconds means 
> > > > > > > > > > > that
> > > > > > > > > > > something is
> > > > > > > > > > > most definitely wrong here.  I am surprised that 
> > > > > > > > > > > there are no
> > > > > > > > > > > RCU CPU
> > > > > > > > > > > stall warnings, but perhaps the blockage is in the 
> > > > > > > > > > > callback
> > > > > > > > > > > execution
> > > > > > > > > > > rather than grace-period completion.  Or something is
> > > > > > > > > > > preventing this
> > > > > > > > > > > kthread from starting up after the wake-up callback 
> > > > > > > > > > > executes.
> > > > > > > > > > > Or...
> > > > > > > > > > > 
> > > > > > > > > > > Is this thing reproducible?
> > > > > > > > > > 
> > > > > > > > > > I've added Yanko on CC, who reported the backtrace 
> > > > > > > > > > above and can
> > > > > > > > > > recreate it reliably.  Apparently reverting the RCU 
> > > > > > > > > > merge commit
> > > > > > > > > > (d6dd50e) and rebuilding the latest after that does 
> > > > > > > > > > not show the
> > > > > > > > > > issue.  I'll let Yanko explain more and answer any 
> > > > > > > > > > questions you
> > > > > > > > > > have.
> > > > > > > > > 
> > > > > > > > > - It is reproducible
> > > > > > > > > - I've done another build here to double check and its 
> > > > > > > > > definitely
> > > > > > > > > the rcu merge
> > > > > > > > >   that's causing it.
> > > > > > > > > 
> > > > > > > > > Don't think I'll be able to dig deeper, but I can do 
> > > > > > > > > testing if
> > > > > > > > > needed.
> > > > > > > > 
> > > > > > > > Please!  Does the following patch help?
> > > > > > > 
> > > > > > > Nope, doesn't seem to make a difference to the modprobe 
> > > > > > > ppp_generic
> > > > > > > test
> > > > > > 
> > > > > > Well, I was hoping.  I will take a closer look at the RCU 
> > > > > > merge commit
> > > > > > and see what suggests itself.  I am likely to ask you to 
> > > > > > revert specific
> > > > > > commits, if that works for you.
> > > > > 
> > > > > Well, rather than reverting commits, could you please try 
> > > > > testing the
> > > > > following commits?
> > > > > 
> > > > > 11ed7f934cb8 (rcu: Make nocb leader kthreads process pending 
> > > > > callbacks after spawning)
> > > > > 
> > > > > 73a860cd58a1 (rcu: Replace flush_signals() with 
> > > > > WARN_ON(signal_pending()))
> > > > > 
> > > > > c847f14217d5 (rcu: Avoid misordering in nocb_leader_wait())
> > > > > 
> > > > >         For whatever it is worth, I am guessing this one.
> > > > 
> > > > Indeed, c847f14217d5 it is.
> > > > 
> > > > Much to my embarrassment I just noticed that in addition to the
> > > > rcu merge, triggering the bug "requires" my specific Fedora 
> > > > rawhide network
> > > > setup. Booting in single mode and modprobe ppp_generic is fine. 
> > > > The bug
> > > > appears when starting with my regular fedora network setup, which 
> > > > in my case
> > > > includes 3 ethernet adapters and a libvirt bridge+nat setup.
> > > > 
> > > > Hope that helps.
> > > > 
> > > > I am attaching the config.
> > > 
> > > It does help a lot, thank you!!!
> > > 
> > > The following patch is a bit of a shot in the dark, and assumes that
> > > commit 1772947bd012 (rcu: Handle NOCB callbacks from irq-disabled 
> > > idle
> > > code) introduced the problem.  Does this patch fix things up?
> > 
> > Unfortunately not, This is linus-tip + patch
> 
> OK.  Can't have everything, I guess.
> 
> > INFO: task kworker/u16:6:96 blocked for more than 120 seconds.
> >       Not tainted 3.18.0-rc1+ #4
> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > kworker/u16:6   D ffff8800ca84cec0 11168    96      2 0x00000000
> > Workqueue: netns cleanup_net
> >  ffff8802218339e8 0000000000000096 ffff8800ca84cec0 00000000001d5f00
> >  ffff880221833fd8 00000000001d5f00 ffff880223264ec0 ffff8800ca84cec0
> >  ffffffff82c52040 7fffffffffffffff ffffffff81ee2658 ffffffff81ee2650
> > Call Trace:
> >  [<ffffffff8185b8e9>] schedule+0x29/0x70
> >  [<ffffffff81860b0c>] schedule_timeout+0x26c/0x410
> >  [<ffffffff81028bea>] ? native_sched_clock+0x2a/0xa0
> >  [<ffffffff8110759c>] ? mark_held_locks+0x7c/0xb0
> >  [<ffffffff81861b90>] ? _raw_spin_unlock_irq+0x30/0x50
> >  [<ffffffff8110772d>] ? trace_hardirqs_on_caller+0x15d/0x200
> >  [<ffffffff8185d31c>] wait_for_completion+0x10c/0x150
> >  [<ffffffff810e4ed0>] ? wake_up_state+0x20/0x20
> >  [<ffffffff8112a219>] _rcu_barrier+0x159/0x200
> >  [<ffffffff8112a315>] rcu_barrier+0x15/0x20
> >  [<ffffffff8171657f>] netdev_run_todo+0x6f/0x310
> >  [<ffffffff8170b145>] ? rollback_registered_many+0x265/0x2e0
> >  [<ffffffff817235ee>] rtnl_unlock+0xe/0x10
> >  [<ffffffff8170cfa6>] default_device_exit_batch+0x156/0x180
> >  [<ffffffff810fd390>] ? abort_exclusive_wait+0xb0/0xb0
> >  [<ffffffff81705053>] ops_exit_list.isra.1+0x53/0x60
> >  [<ffffffff81705c00>] cleanup_net+0x100/0x1f0
> >  [<ffffffff810cca98>] process_one_work+0x218/0x850
> >  [<ffffffff810cc9ff>] ? process_one_work+0x17f/0x850
> >  [<ffffffff810cd1b7>] ? worker_thread+0xe7/0x4a0
> >  [<ffffffff810cd13b>] worker_thread+0x6b/0x4a0
> >  [<ffffffff810cd0d0>] ? process_one_work+0x850/0x850
> >  [<ffffffff810d348b>] kthread+0x10b/0x130
> >  [<ffffffff81028c69>] ? sched_clock+0x9/0x10
> >  [<ffffffff810d3380>] ? kthread_create_on_node+0x250/0x250
> >  [<ffffffff818628bc>] ret_from_fork+0x7c/0xb0
> >  [<ffffffff810d3380>] ? kthread_create_on_node+0x250/0x250
> > 4 locks held by kworker/u16:6/96:
> >  #0:  ("%s""netns"){.+.+.+}, at: [<ffffffff810cc9ff>] process_one_work+0x17f/0x850
> >  #1:  (net_cleanup_work){+.+.+.}, at: [<ffffffff810cc9ff>] process_one_work+0x17f/0x850
> >  #2:  (net_mutex){+.+.+.}, at: [<ffffffff81705b8c>] cleanup_net+0x8c/0x1f0
> >  #3:  (rcu_sched_state.barrier_mutex){+.+...}, at: [<ffffffff8112a0f5>] _rcu_barrier+0x35/0x200
> > INFO: task modprobe:1045 blocked for more than 120 seconds.
> >       Not tainted 3.18.0-rc1+ #4
> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > modprobe        D ffff880218343480 12920  1045   1044 0x00000080
> >  ffff880218353bf8 0000000000000096 ffff880218343480 00000000001d5f00
> >  ffff880218353fd8 00000000001d5f00 ffffffff81e1b580 ffff880218343480
> >  ffff880218343480 ffffffff81f8f748 0000000000000246 ffff880218343480
> > Call Trace:
> >  [<ffffffff8185be91>] schedule_preempt_disabled+0x31/0x80
> >  [<ffffffff8185d6e3>] mutex_lock_nested+0x183/0x440
> >  [<ffffffff81705a1f>] ? register_pernet_subsys+0x1f/0x50
> >  [<ffffffff81705a1f>] ? register_pernet_subsys+0x1f/0x50
> >  [<ffffffffa0673000>] ? 0xffffffffa0673000
> >  [<ffffffff81705a1f>] register_pernet_subsys+0x1f/0x50
> >  [<ffffffffa0673048>] br_init+0x48/0xd3 [bridge]
> >  [<ffffffff81002148>] do_one_initcall+0xd8/0x210
> >  [<ffffffff81153052>] load_module+0x20c2/0x2870
> >  [<ffffffff8114e030>] ? store_uevent+0x70/0x70
> >  [<ffffffff81278717>] ? kernel_read+0x57/0x90
> >  [<ffffffff811539e6>] SyS_finit_module+0xa6/0xe0
> >  [<ffffffff81862969>] system_call_fastpath+0x12/0x17
> > 1 lock held by modprobe/1045:
> >  #0:  (net_mutex){+.+.+.}, at: [<ffffffff81705a1f>] register_pernet_subsys+0x1f/0x50
> 
> Presumably the kworker/u16:6 completed, then modprobe hung?
> 
> If not, I have some very hard questions about why net_mutex can be
> held by two tasks concurrently, given that it does not appear to be a
> reader-writer lock...
> 
> Either way, my patch assumed that 39953dfd4007 (rcu: Avoid misordering in
> __call_rcu_nocb_enqueue()) would work and that 1772947bd012 (rcu: Handle
> NOCB callbacks from irq-disabled idle code) would fail.  Is that the case?
> If not, could you please bisect the commits between 11ed7f934cb8 (rcu:
> Make nocb leader kthreads process pending callbacks after spawning)
> and c847f14217d5 (rcu: Avoid misordering in nocb_leader_wait())?

Ok, unless I've messed up something major, bisecting points to:

35ce7f29a44a rcu: Create rcuo kthreads only for onlined CPUs

Does that make any sense?


Another thing I noticed is that in failure mode the libvirtd bridge doesn't
actually show up. So maybe ppp is just the first thing that bumps into
whatever libvirtd is failing to do to set those up.

I truly hope this is not something with a random timing dependency...

--Yanko

^ permalink raw reply

* Re: ixgbe driver fails occasionally since ee98b577e7711d5890ded2c7b05578a29512bd39
From: Scott Harrison @ 2014-10-24  9:18 UTC (permalink / raw)
  To: Tantilov, Emil S; +Cc: netdev@vger.kernel.org
In-Reply-To: <87618083B2453E4A8714035B62D679925016872B@FMSMSX105.amr.corp.intel.com>

Emil,

Sorry, I should have looked harder. It happens as part of ifup eth0; we have
"/sbin/ethtool -s eth0 autoneg on" in the interfaces file.

HTH.

Scott.

On Thu, Oct 23, 2014 at 07:48:32PM +0000, Tantilov, Emil S wrote:
>>-----Original Message-----
>>From: netdev-owner@vger.kernel.org [mailto:netdev-
>>owner@vger.kernel.org] On Behalf Of Scott Harrison
>>Sent: Thursday, October 23, 2014 7:06 AM
>>To: netdev@vger.kernel.org
>>Subject: ixgbe driver fails occasionally since ee98b577e7711d5890ded2c7b05578a29512bd39
>>
>>Hi,
>>
>>I was asked to raise this issue here.
>>
>>https://bugzilla.kernel.org/show_bug.cgi?id=86591
>>
>>lspci ->
>>
>>03:00.0 Ethernet controller: Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connection (rev 01)
>>03:00.1 Ethernet controller: Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connection (rev 01)
>>
>>With the fibre 10Gb/s SFP, occasionally on reboot we get
>>
>>[   15.104726] ixgbe 0000:03:00.0 eth0: detected SFP+: 5
>>[   19.735155] ixgbe 0000:03:00.0 eth0: setup link failed with code -14
>
>The error message is from an ethtool command. Were you trying to set the speed to a certain value?
>
>Thanks,
>Emil
>

-- 
Software Engineer

Cisco.com - http://www.cisco.com

^ permalink raw reply

* Re: [RFC] tcp md5 use of alloc_percpu
From: Herbert Xu @ 2014-10-24  9:33 UTC (permalink / raw)
  To: Crestez Dan Leonard; +Cc: eric.dumazet, netdev, linux-crypto
In-Reply-To: <5448383A.4090908@gmail.com>

Crestez Dan Leonard <cdleonard@gmail.com> wrote:
>
>> Yep, but the sg stuff does not allow for stack variables. Because of
>> possible offloading and DMA, I don't know...
> A stack buffer is used in tcp_md5_hash_header to add a tcphdr to the
> hash. A quick grep for sg_init_one finds a couple of additional instances
> of what looks like doing crypto on small stack buffers:

First of all, crypto_hash_update is obsolete; don't use it in any
new code.  Thanks for reminding me to get rid of existing users.

You should use either crypto_shash_update for small data, e.g., headers,
or crypto_ahash_update for large data such as whole packets.

If you use shash then you may allocate your buffer on the stack.  With
ahash stack memory is not allowed.
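
The shash case can be sketched roughly as follows. This is only an
illustration, not code from any tree: the helper name example_hash_tcphdr
is hypothetical, the caller is assumed to have obtained tfm earlier via
crypto_alloc_shash(), and out is assumed to hold at least
crypto_shash_digestsize(tfm) bytes. It builds only inside a kernel tree.

```c
#include <crypto/hash.h>
#include <linux/tcp.h>

/* Hypothetical helper: feed a stack copy of a TCP header to an shash. */
static int example_hash_tcphdr(struct crypto_shash *tfm,
			       const struct tcphdr *th, u8 *out)
{
	SHASH_DESC_ON_STACK(desc, tfm);	/* descriptor itself lives on the stack */
	struct tcphdr hdr;		/* small stack buffer, acceptable with shash */
	int err;

	/* Copy the header and zero the checksum before hashing it. */
	hdr = *th;
	hdr.check = 0;

	desc->tfm = tfm;
	desc->flags = 0;	/* this field exists in kernels of this era */

	err = crypto_shash_init(desc);
	if (err)
		return err;

	/* shash takes a plain virtual address, so no scatterlist is needed. */
	err = crypto_shash_update(desc, (const u8 *)&hdr, sizeof(hdr));
	if (err)
		return err;

	return crypto_shash_final(desc, out);
}
```

Because shash works on plain pointers rather than scatterlists, the
restriction on stack memory that applies to ahash does not come into play
for a buffer like hdr above.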

I hope this clears things up for you.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

