* Re: Linux 3.0 release [not found] <CA+55aFxakA2U+oMJ1T7awTYa+p6xp9N0aCbfrUqgkF7BJ8gnQw@mail.gmail.com> @ 2011-07-22 19:11 ` David 2011-07-22 19:21 ` Linus Torvalds 0 siblings, 1 reply; 11+ messages in thread From: David @ 2011-07-22 19:11 UTC (permalink / raw) To: Linus Torvalds; +Cc: Linux Kernel Mailing List, netdev On 22/07/11 03:59, Linus Torvalds wrote: > really smooth. Which is not to say that there may not be bugs, but if > anything, there are hopefully fewer than usual, rather than the normal > ".0" problems. I'm getting the following warning at boot from 3.0, everything seems to be fine otherwise though. Jul 22 19:40:02 server kernel: [ 15.526629] ------------[ cut here ]------------ Jul 22 19:40:02 server kernel: [ 15.526635] WARNING: at kernel/timer.c:1011 del_timer_sync+0x4e/0x50() Jul 22 19:40:02 server kernel: [ 15.526637] Hardware name: System Product Name Jul 22 19:40:02 server kernel: [ 15.526638] Modules linked in: xt_owner ipt_REDIRECT ipt_MASQUERADE ts_kmp xt_string ipt_REJECT xt_recent xt_state xt_multiport xt_tcpudp xt_pkttype ipt_LOG xt_limit iptable_mangle iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 iptable_filter ip_tables ip6table_filter ip6_tables x_tables wctdm dahdi hisax nfsd isl6421 crc_ccitt b2c2_flexcop_pci dvb_pll b2c2_flexcop mt352 isdn cx24123 dvb_usb_digitv cx24113 s5h1420 dvb_usb e1000e snd_hda_codec_hdmi exportfs it87 dvb_core pl2303 hwmon_vid lp snd_hda_codec_via ppdev nfs k10temp i2c_piix4 snd_hda_intel snd_hda_codec r8169 mii lockd snd_pcm snd_timer snd shpchp soundcore parport_pc snd_page_alloc parport auth_rpcgss nfs_acl sunrpc Jul 22 19:40:02 server kernel: [ 15.526667] Pid: 0, comm: kworker/0:0 Not tainted 3.0.0 #1 Jul 22 19:40:02 server kernel: [ 15.526668] Call Trace: Jul 22 19:40:02 server kernel: [ 15.526670] <IRQ> [<ffffffff8104599a>] warn_slowpath_common+0x7a/0xb0 Jul 22 19:40:02 server kernel: [ 15.526675] [<ffffffff810459e5>] warn_slowpath_null+0x15/0x20 Jul 22 19:40:02 server kernel: [ 15.526677] [<ffffffff81054b4e>] del_timer_sync+0x4e/0x50 Jul 22 19:40:02 server kernel: [ 15.526679] [<ffffffff8145e224>] linkwatch_schedule_work+0x84/0xa0 Jul 22 19:40:02 server kernel: [ 15.526681] [<ffffffff8145e2bc>] linkwatch_fire_event+0x7c/0x100 Jul 22 19:40:02 server kernel: [ 15.526684] [<ffffffff8146b1ed>] netif_carrier_on+0x2d/0x40 Jul 22 19:40:02 server kernel: [ 15.526689] [<ffffffffa006b6fb>] __rtl8169_check_link_status+0x4b/0xc0 [r8169] Jul 22 19:40:02 server kernel: [ 15.526693] [<ffffffffa006c016>] rtl8169_interrupt+0x166/0x3a0 [r8169] Jul 22 19:40:02 server kernel: [ 15.526696] [<ffffffff810a4385>] handle_irq_event_percpu+0x55/0x1f0 Jul 22 19:40:02 server kernel: [ 15.526698] [<ffffffff810a4551>] handle_irq_event+0x31/0x50 Jul 22 19:40:02 server kernel: [ 15.526701] [<ffffffff8101a371>] ? ack_apic_edge+0x31/0x40 Jul 22 19:40:02 server kernel: [ 15.526703] [<ffffffff810a6eb5>] handle_edge_irq+0x65/0x120 Jul 22 19:40:02 server kernel: [ 15.526706] [<ffffffff81003f8d>] handle_irq+0x1d/0x30 Jul 22 19:40:02 server kernel: [ 15.526708] [<ffffffff81003918>] do_IRQ+0x58/0xe0 Jul 22 19:40:02 server kernel: [ 15.526711] [<ffffffff815588d3>] common_interrupt+0x13/0x13 Jul 22 19:40:02 server kernel: [ 15.526712] <EOI> [<ffffffff8106bf5d>] ? sched_clock_local+0x1d/0x90 Jul 22 19:40:02 server kernel: [ 15.526717] [<ffffffff81009e25>] ? default_idle+0x55/0x170 Jul 22 19:40:02 server kernel: [ 15.526719] [<ffffffff81009fc3>] amd_e400_idle+0x83/0x100 Jul 22 19:40:02 server kernel: [ 15.526721] [<ffffffff8155bb55>] ? atomic_notifier_call_chain+0x15/0x20 Jul 22 19:40:02 server kernel: [ 15.526723] [<ffffffff81001fd9>] cpu_idle+0x59/0xb0 Jul 22 19:40:02 server kernel: [ 15.526726] [<ffffffff81551552>] start_secondary+0x181/0x185 Jul 22 19:40:02 server kernel: [ 15.526727] ---[ end trace 5bac97729a402448 ]--- (dual core phenom 2, dmesg etc on request) Cheers David ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Linux 3.0 release 2011-07-22 19:11 ` Linux 3.0 release David @ 2011-07-22 19:21 ` Linus Torvalds 2011-07-22 19:44 ` Ben Greear 0 siblings, 1 reply; 11+ messages in thread From: Linus Torvalds @ 2011-07-22 19:21 UTC (permalink / raw) To: David; +Cc: Linux Kernel Mailing List, netdev On Fri, Jul 22, 2011 at 12:11 PM, David <david@unsolicited.net> wrote: > > I'm getting the following warning at boot from 3.0, everything seems to be fine otherwise though. > > Jul 22 19:40:02 server kernel: [ 15.526629] ------------[ cut here ]------------ > Jul 22 19:40:02 server kernel: [ 15.526635] WARNING: at kernel/timer.c:1011 del_timer_sync+0x4e/0x50() Hmm. That looks like a real bug: you shouldn't do a "del_timer_sync()" from an interrupt. It probably works, but it sounds like a really bad idea. > Jul 22 19:40:02 server kernel: [ 15.526677] [<ffffffff81054b4e>] del_timer_sync+0x4e/0x50 > Jul 22 19:40:02 server kernel: [ 15.526679] [<ffffffff8145e224>] linkwatch_schedule_work+0x84/0xa0 > Jul 22 19:40:02 server kernel: [ 15.526681] [<ffffffff8145e2bc>] linkwatch_fire_event+0x7c/0x100 > Jul 22 19:40:02 server kernel: [ 15.526684] [<ffffffff8146b1ed>] netif_carrier_on+0x2d/0x40 > Jul 22 19:40:02 server kernel: [ 15.526689] [<ffffffffa006b6fb>] __rtl8169_check_link_status+0x4b/0xc0 [r8169] > Jul 22 19:40:02 server kernel: [ 15.526693] [<ffffffffa006c016>] rtl8169_interrupt+0x166/0x3a0 [r8169] > Jul 22 19:40:02 server kernel: [ 15.526696] [<ffffffff810a4385>] handle_irq_event_percpu+0x55/0x1f0 > Jul 22 19:40:02 server kernel: [ 15.526698] [<ffffffff810a4551>] handle_irq_event+0x31/0x50 I'm not seeing a lot of changes in any of these areas, though. I wonder what made it start happen. Linus ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Linux 3.0 release 2011-07-22 19:21 ` Linus Torvalds @ 2011-07-22 19:44 ` Ben Greear 2011-07-22 20:32 ` Stephen Hemminger 0 siblings, 1 reply; 11+ messages in thread From: Ben Greear @ 2011-07-22 19:44 UTC (permalink / raw) To: Linus Torvalds; +Cc: David, Linux Kernel Mailing List, netdev On 07/22/2011 12:21 PM, Linus Torvalds wrote: > On Fri, Jul 22, 2011 at 12:11 PM, David<david@unsolicited.net> wrote: >> >> I'm getting the following warning at boot from 3.0, everything seems to be fine otherwise though. >> >> Jul 22 19:40:02 server kernel: [ 15.526629] ------------[ cut here ]------------ >> Jul 22 19:40:02 server kernel: [ 15.526635] WARNING: at kernel/timer.c:1011 del_timer_sync+0x4e/0x50() > > Hmm. That looks like a real bug: you shouldn't do a "del_timer_sync()" > from an interrupt. It probably works, but it sounds like a really bad > idea. > >> Jul 22 19:40:02 server kernel: [ 15.526677] [<ffffffff81054b4e>] del_timer_sync+0x4e/0x50 >> Jul 22 19:40:02 server kernel: [ 15.526679] [<ffffffff8145e224>] linkwatch_schedule_work+0x84/0xa0 >> Jul 22 19:40:02 server kernel: [ 15.526681] [<ffffffff8145e2bc>] linkwatch_fire_event+0x7c/0x100 >> Jul 22 19:40:02 server kernel: [ 15.526684] [<ffffffff8146b1ed>] netif_carrier_on+0x2d/0x40 >> Jul 22 19:40:02 server kernel: [ 15.526689] [<ffffffffa006b6fb>] __rtl8169_check_link_status+0x4b/0xc0 [r8169] >> Jul 22 19:40:02 server kernel: [ 15.526693] [<ffffffffa006c016>] rtl8169_interrupt+0x166/0x3a0 [r8169] >> Jul 22 19:40:02 server kernel: [ 15.526696] [<ffffffff810a4385>] handle_irq_event_percpu+0x55/0x1f0 >> Jul 22 19:40:02 server kernel: [ 15.526698] [<ffffffff810a4551>] handle_irq_event+0x31/0x50 > > I'm not seeing a lot of changes in any of these areas, though. I > wonder what made it start happen. This has been around since at least 2.6.38. I haven't tested recently on my rtl8169 system, but I don't recall seeing any attempts to fix it... http://permalink.gmane.org/gmane.linux.network/193565 http://lists.openwall.net/netdev/2011/05/04/183 Thanks, Ben -- Ben Greear <greearb@candelatech.com> Candela Technologies Inc http://www.candelatech.com ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Linux 3.0 release 2011-07-22 19:44 ` Ben Greear @ 2011-07-22 20:32 ` Stephen Hemminger 2011-07-22 20:35 ` Linus Torvalds 2011-07-22 21:26 ` Francois Romieu 0 siblings, 2 replies; 11+ messages in thread From: Stephen Hemminger @ 2011-07-22 20:32 UTC (permalink / raw) To: Ben Greear, Linus Torvalds, David, Tejun Heo Cc: Linux Kernel Mailing List, netdev This a regression which probably began with commit e22bee782b3b00bd4534ae9b1c5fb2e8e6573c5c Author: Tejun Heo <tj@kernel.org> Date: Tue Jun 29 10:07:14 2010 +0200 workqueue: implement concurrency managed dynamic worker pool Before that it was perfectly legal for link watch code to call schedule_delayed_work from IRQ. This should be allowable; the code to manage the worker pool should handle it. Network devices call netif_carrier_on/off from IRQ all the time. The problem is that the new pool code breaks this if link watch tries to schedule work before there enough worker threads. The workqueue code should have a fallback and not try and do anything if being called from IRQ. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Linux 3.0 release 2011-07-22 20:32 ` Stephen Hemminger @ 2011-07-22 20:35 ` Linus Torvalds 2011-07-23 2:27 ` Tejun Heo 2011-07-22 21:26 ` Francois Romieu 1 sibling, 1 reply; 11+ messages in thread From: Linus Torvalds @ 2011-07-22 20:35 UTC (permalink / raw) To: Stephen Hemminger Cc: Ben Greear, David, Tejun Heo, Linux Kernel Mailing List, netdev On Fri, Jul 22, 2011 at 1:32 PM, Stephen Hemminger <shemminger@vyatta.com> wrote: > > The workqueue code should have a fallback and not try and > do anything if being called from IRQ. Fair enough. Especially since one of the *points* of workqueues is indeed to schedule stuff from irqs and that cannot be done immediately. Tejun? Linus ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Linux 3.0 release 2011-07-22 20:35 ` Linus Torvalds @ 2011-07-23 2:27 ` Tejun Heo 2011-07-23 2:30 ` Tejun Heo 0 siblings, 1 reply; 11+ messages in thread From: Tejun Heo @ 2011-07-23 2:27 UTC (permalink / raw) To: Linus Torvalds Cc: Stephen Hemminger, Ben Greear, David, Linux Kernel Mailing List, netdev Hello, Stephen, Linus. On Fri, Jul 22, 2011 at 01:35:16PM -0700, Linus Torvalds wrote: > On Fri, Jul 22, 2011 at 1:32 PM, Stephen Hemminger > <shemminger@vyatta.com> wrote: > > > > The workqueue code should have a fallback and not try and > > do anything if being called from IRQ. > > Fair enough. Especially since one of the *points* of workqueues is > indeed to schedule stuff from irqs and that cannot be done > immediately. > > Tejun? It seems to have been already tracked down but, just to be clear. Nothing changed regarding synchronization requirements for all the queue, flush and cancel functions. If it worked before cmwq, it should work with cmwq. While on the topic, we do have some workqueue API problems. The delayed ones are a bit screwy. e.g. requeueing an already pending delayed work item should probably update the timer but it doesn't andp we have a bunch of users doing cancel/requeue or using separate timers for that. Also, the cancel/flush[_sync] variants are subtly different making using the correct one difficult, which has possibility of introducing bugs which are extremely difficult to reproduce. Again, most of these had accumulated well before cmwq came into the picture. I think we need to make workqueue simpler and easier to use. Thanks. -- tejun ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Linux 3.0 release 2011-07-23 2:27 ` Tejun Heo @ 2011-07-23 2:30 ` Tejun Heo 0 siblings, 0 replies; 11+ messages in thread From: Tejun Heo @ 2011-07-23 2:30 UTC (permalink / raw) To: Linus Torvalds Cc: Stephen Hemminger, Ben Greear, David, Linux Kernel Mailing List, netdev On Sat, Jul 23, 2011 at 04:27:15AM +0200, Tejun Heo wrote: > While on the topic, we do have some workqueue API problems. The > delayed ones are a bit screwy. e.g. requeueing an already pending > delayed work item should probably update the timer but it doesn't andp > we have a bunch of users doing cancel/requeue or using separate timers > for that. (after reading the other branch of the thread) Ooh, bingo, this actually was the issue which triggered the problem reported here. :) -- tejun ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Linux 3.0 release 2011-07-22 20:32 ` Stephen Hemminger 2011-07-22 20:35 ` Linus Torvalds @ 2011-07-22 21:26 ` Francois Romieu 2011-07-22 22:09 ` Stephen Hemminger 1 sibling, 1 reply; 11+ messages in thread From: Francois Romieu @ 2011-07-22 21:26 UTC (permalink / raw) To: Stephen Hemminger Cc: Ben Greear, Linus Torvalds, David, Tejun Heo, Linux Kernel Mailing List, netdev Stephen Hemminger <shemminger@vyatta.com> : > This a regression which probably began with > > commit e22bee782b3b00bd4534ae9b1c5fb2e8e6573c5c > Author: Tejun Heo <tj@kernel.org> > Date: Tue Jun 29 10:07:14 2010 +0200 > > workqueue: implement concurrency managed dynamic worker pool > > Before that it was perfectly legal for link watch code to > call schedule_delayed_work from IRQ. This should be allowable; > the code to manage the worker pool should handle it. I beg to differ: see Ben's first report (http://lists.openwall.net/netdev/2011/05/04/183). ^^ One of the code path in the netif_carrier code leads it to try and disable a late workqueue to reenable it immediately (mod_workqueue anyone ?): netif_carrier_on -> linkwatch_fire_event -> linkwatch_schedule_work -> cancel_delayed_work -> del_timer_sync The del_timer_sync has been here for ages. Afaiks it is not a new pool code nor a schedule_delayed_work only problem. -- Ueimor ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Linux 3.0 release 2011-07-22 21:26 ` Francois Romieu @ 2011-07-22 22:09 ` Stephen Hemminger 2011-07-22 22:53 ` [PATCH] net: allow netif_carrier to be called safely from IRQ Stephen Hemminger 0 siblings, 1 reply; 11+ messages in thread From: Stephen Hemminger @ 2011-07-22 22:09 UTC (permalink / raw) To: Francois Romieu Cc: Ben Greear, Linus Torvalds, David, Tejun Heo, Linux Kernel Mailing List, netdev On Fri, 22 Jul 2011 23:26:29 +0200 Francois Romieu <romieu@fr.zoreil.com> wrote: > Stephen Hemminger <shemminger@vyatta.com> : > > This a regression which probably began with > > > > commit e22bee782b3b00bd4534ae9b1c5fb2e8e6573c5c > > Author: Tejun Heo <tj@kernel.org> > > Date: Tue Jun 29 10:07:14 2010 +0200 > > > > workqueue: implement concurrency managed dynamic worker pool > > > > Before that it was perfectly legal for link watch code to > > call schedule_delayed_work from IRQ. This should be allowable; > > the code to manage the worker pool should handle it. > > I beg to differ: see Ben's first report > (http://lists.openwall.net/netdev/2011/05/04/183). > ^^ > > One of the code path in the netif_carrier code leads it to try and disable > a late workqueue to reenable it immediately (mod_workqueue anyone ?): > netif_carrier_on > -> linkwatch_fire_event > -> linkwatch_schedule_work > -> cancel_delayed_work > -> del_timer_sync > > The del_timer_sync has been here for ages. Afaiks it is not a new pool code > nor a schedule_delayed_work only problem. > That path can be fixed by calling _cancel_delayed_work() instead. ^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH] net: allow netif_carrier to be called safely from IRQ 2011-07-22 22:09 ` Stephen Hemminger @ 2011-07-22 22:53 ` Stephen Hemminger 2011-07-23 0:16 ` David Miller 0 siblings, 1 reply; 11+ messages in thread From: Stephen Hemminger @ 2011-07-22 22:53 UTC (permalink / raw) To: David Miller Cc: Francois Romieu, Ben Greear, Linus Torvalds, David, Tejun Heo, Linux Kernel Mailing List, netdev As reported by Ben Greer and Froncois Romieu. The code path in the netif_carrier code leads it to try and disable a late workqueue to reenable it immediately netif_carrier_on -> linkwatch_fire_event -> linkwatch_schedule_work -> cancel_delayed_work -> del_timer_sync If __cancel_delayed_work is used instead then there is no problem of waiting for running linkwatch_event. There is a race between linkwatch_event running re-scheduling but it is harmless to schedule an extra scan of the linkwatch queue. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> --- a/net/core/link_watch.c 2011-07-22 15:25:31.027533604 -0700 +++ b/net/core/link_watch.c 2011-07-22 15:31:27.531520028 -0700 @@ -126,7 +126,7 @@ static void linkwatch_schedule_work(int return; /* It's already running which is good enough. */ - if (!cancel_delayed_work(&linkwatch_work)) + if (!__cancel_delayed_work(&linkwatch_work)) return; /* Otherwise we reschedule it again for immediate execution. */ ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] net: allow netif_carrier to be called safely from IRQ 2011-07-22 22:53 ` [PATCH] net: allow netif_carrier to be called safely from IRQ Stephen Hemminger @ 2011-07-23 0:16 ` David Miller 0 siblings, 0 replies; 11+ messages in thread From: David Miller @ 2011-07-23 0:16 UTC (permalink / raw) To: shemminger; +Cc: romieu, greearb, torvalds, david, tj, linux-kernel, netdev From: Stephen Hemminger <shemminger@vyatta.com> Date: Fri, 22 Jul 2011 15:53:56 -0700 > As reported by Ben Greer and Froncois Romieu. The code path in > the netif_carrier code leads it to try and disable > a late workqueue to reenable it immediately > netif_carrier_on > -> linkwatch_fire_event > -> linkwatch_schedule_work > -> cancel_delayed_work > -> del_timer_sync > > If __cancel_delayed_work is used instead then there is no > problem of waiting for running linkwatch_event. > > There is a race between linkwatch_event running re-scheduling > but it is harmless to schedule an extra scan of the linkwatch queue. > > Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Applied. ^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2011-07-23 2:30 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <CA+55aFxakA2U+oMJ1T7awTYa+p6xp9N0aCbfrUqgkF7BJ8gnQw@mail.gmail.com>
2011-07-22 19:11 ` Linux 3.0 release David
2011-07-22 19:21 ` Linus Torvalds
2011-07-22 19:44 ` Ben Greear
2011-07-22 20:32 ` Stephen Hemminger
2011-07-22 20:35 ` Linus Torvalds
2011-07-23 2:27 ` Tejun Heo
2011-07-23 2:30 ` Tejun Heo
2011-07-22 21:26 ` Francois Romieu
2011-07-22 22:09 ` Stephen Hemminger
2011-07-22 22:53 ` [PATCH] net: allow netif_carrier to be called safely from IRQ Stephen Hemminger
2011-07-23 0:16 ` David Miller
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).