* Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree
       [not found] ` <1183651165.4069.26.camel@ranko-fc2.spidernet.net>
@ 2007-07-06  6:14 ` Jarek Poplawski
  2007-07-06  6:20   ` Fwd: " Jarek Poplawski
  2007-07-06  6:26   ` Jarek Poplawski
  0 siblings, 2 replies; 25+ messages in thread
From: Jarek Poplawski @ 2007-07-06  6:14 UTC (permalink / raw)
  To: Ranko Zivojnovic; +Cc: akpm, kaber, netdev

On Thu, Jul 05, 2007 at 06:59:25PM +0300, Ranko Zivojnovic wrote:
> On Thu, 2007-07-05 at 17:34 +0300, Ranko Zivojnovic wrote:
> > Anyhow - I am currently running 2.6.22-rc6-mm1 + sch_htb patch and
> > running a test script that always managed to reproduce the problem
> > within half an hour - so far it looks good, but I will leave it
> > hammering until tomorrow and will let you know.
> >
>
> Ahm...it managed to get stuck:
>
> Relevant piece of the process table ('tc' stays stuck):
>  9608 pts/0    S+     0:00 /bin/bash ./tc-crash.sh
>  9609 pts/0    D+     0:00 tc qdisc del dev lo root
>
> There was no soft lockup kernel warning, however sysrq 'w' & 'q' dump
> the following:
>
> ---cut---

I have to admit you are really good at this testing! Especially after
this first htb bug, I was almost stunned at how long such a repeatable
and not very well hidden bug could live in one of the most popular
networking modules... This new lockup bug you have just found will
need some time to figure out. BTW, I wonder if you had lockdep on
(CONFIG_PROVE_LOCKING or CONFIG_LOCK_ALLOC)?

Thanks & regards,
Jarek P.

PS: I hope you don't mind that I forwarded this to netdev.
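[For readers following along at home: the lockdep options Jarek is asking
about live under "Kernel hacking" in the kernel configuration. A minimal
.config fragment enabling them (the exact option names Ranko later reports
using appear further down the thread) would look like this:]

    CONFIG_PROVE_LOCKING=y
    CONFIG_DEBUG_LOCK_ALLOC=y
    CONFIG_DEBUG_SPINLOCK=y
    CONFIG_DETECT_SOFTLOCKUP=y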
* Fwd: Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree
  2007-07-06  6:14 ` + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree Jarek Poplawski
@ 2007-07-06  6:20 ` Jarek Poplawski
  2007-07-06  6:26 ` Jarek Poplawski
  1 sibling, 0 replies; 25+ messages in thread
From: Jarek Poplawski @ 2007-07-06  6:20 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: Ranko Zivojnovic, akpm, kaber, netdev

----- Forwarded message from Ranko Zivojnovic <ranko@spidernet.net> -----

> Date: Thu, 05 Jul 2007 18:59:25 +0300
> From: Ranko Zivojnovic <ranko@spidernet.net>
> Subject: Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added
> 	to -mm tree
> To: Jarek Poplawski <jarkao2@o2.pl>
> Cc: akpm@linux-foundation.org, kaber@trash.net
> X-Original-To: info
> Delivered-To: info@dom.local
> In-Reply-To: <1183646029.4069.11.camel@ranko-fc2.spidernet.net>
> Organization: SpiderNet Services Public Ltd.
> X-Mailer: Evolution 2.8.3 (2.8.3-2.fc6)
> X-o2-Trust: 2, 43
>
> On Thu, 2007-07-05 at 17:34 +0300, Ranko Zivojnovic wrote:
> > Anyhow - I am currently running 2.6.22-rc6-mm1 + sch_htb patch and
> > running a test script that always managed to reproduce the problem
> > within half an hour - so far it looks good, but I will leave it
> > hammering until tomorrow and will let you know.
> >
>
> Ahm...it managed to get stuck:
>
> Relevant piece of the process table ('tc' stays stuck):
>  9608 pts/0    S+     0:00 /bin/bash ./tc-crash.sh
>  9609 pts/0    D+     0:00 tc qdisc del dev lo root
>
> There was no soft lockup kernel warning, however sysrq 'w' & 'q' dump
> the following:
>
> ---cut---
> SysRq : Show Blocked State
>
>                          free                   sibling
>   task        PC         stack  pid father child younger older
> kjournald     D 00000531     0   337      2 (L-TLB)
>        c3083e9c 00000046 c4ef3383 00000531 c3083e84 00000000 c292f7fc c3082000
>        00000000 00000000 c2903388 c29031d0 c2903388 c150a000 00000000 c03b5d80
>        c29d9480 c2949080 c03ff080 c03ff080 c289e7fc 00000000 c03ff080 0054b8e6
> Call Trace:
>  [<c02e9e1b>] io_schedule+0x1e/0x28
>  [<c0184915>] sync_buffer+0x34/0x37
>  [<c02ea416>] __wait_on_bit+0x42/0x5e
>  [<c01848e1>] sync_buffer+0x0/0x37
>  [<c01848e1>] sync_buffer+0x0/0x37
>  [<c02ea495>] out_of_line_wait_on_bit+0x63/0x6b
>  [<c01320a2>] wake_bit_function+0x0/0x3c
>  [<c0184873>] __wait_on_buffer+0x24/0x29
>  [<e083392e>] journal_commit_transaction+0x63d/0x10df [jbd]
>  [<c02eb265>] _spin_unlock_irq+0x5/0x7
>  [<c02e9713>] schedule+0x323/0x867
>  [<c02eb277>] _spin_lock_irqsave+0x9/0xd
>  [<c0128dfa>] lock_timer_base+0x19/0x35
>  [<c0128e5d>] try_to_del_timer_sync+0x47/0x4f
>  [<e0837899>] kjournald+0xac/0x1f4 [jbd]
>  [<c013206b>] autoremove_wake_function+0x0/0x37
>  [<e08377ed>] kjournald+0x0/0x1f4 [jbd]
>  [<c0131e26>] kthread+0x34/0x56
>  [<c0131df2>] kthread+0x0/0x56
>  [<c0104023>] kernel_thread_helper+0x7/0x14
> =======================
> syslogd       D 00000531     0  1414      1 (NOTLB)
>        c3045e84 00200082 c4e83c2f 00000531 c1506a80 00000002 00000000 c3044000
>        1d5003ae 00000552 c29de888 c29de6d0 c29de888 c150a000 00000000 00000003
>        00200082 c2949080 c03ff080 c03ff080 c2ac46b0 00000000 c03ff080 0054b8e6
> Call Trace:
>  [<e0836bdb>] log_wait_commit+0xaf/0x116 [jbd]
>  [<c013206b>] autoremove_wake_function+0x0/0x37
>  [<e08316b1>] journal_stop+0x158/0x1f7 [jbd]
>  [<c0180046>] __writeback_single_inode+0x267/0x329
>  [<c014bcd7>] generic_writepages+0x20/0x29
>  [<c0180121>] sync_inode+0x19/0x2a
>  [<e086c4f4>] ext3_sync_file+0xbc/0xc8 [ext3]
>  [<c0182d89>] do_fsync+0x55/0x8a
>  [<c0182ddb>] __do_fsync+0x1d/0x2b
>  [<c01033d2>] sysenter_past_esp+0x5f/0x85
> =======================
> irqbalance    D 0000040E     0  1428      1 (NOTLB)
>        c304becc 00200082 f5b6412a 0000040e c304beb4 00000000 c34e0000 c304a000
>        00000001 c2a754c4 c22a2848 c22a2690 c22a2848 c1513000 00000001 c03b5d80
>        0000000a c286e480 c03ff080 c03ff080 c03631be c18667e4 c03ff080 003fd3f2
> Call Trace:
>  [<c02ea650>] __mutex_lock_slowpath+0x54/0x8d
>  [<c02ea50e>] mutex_lock+0x21/0x26
>  [<c027d0ef>] sock_ioctl+0x0/0x1c6
>  [<c027d0ef>] sock_ioctl+0x0/0x1c6
>  [<c02895a2>] dev_ioctl+0x20c/0x309
>  [<c018d4d1>] inotify_d_instantiate+0x1b/0x80
>  [<c0175858>] d_instantiate+0x56/0x8a
>  [<c027d12f>] sock_ioctl+0x40/0x1c6
>  [<c027d0ef>] sock_ioctl+0x0/0x1c6
>  [<c017046f>] do_ioctl+0x1f/0x6d
>  [<c027e39f>] sys_socket+0x29/0x44
>  [<c017050d>] vfs_ioctl+0x50/0x26b
>  [<c0170785>] sys_ioctl+0x5d/0x6c
>  [<c01033d2>] sysenter_past_esp+0x5f/0x85
> =======================
> tc            D 00000409     0  9609   9608 (NOTLB)
>        c37a7be8 00200086 ee308e07 00000409 c1506a80 00000002 00000000 c37a6000
>        3e416208 00000414 c29021e8 c2902030 c29021e8 c150a000 00000000 c321c4c4
>        c37a7bd0 c19d36c0 c03ff080 c03ff080 c0134d71 00200286 c03ff080 003fe1ed
> Call Trace:
>  [<c0134d71>] lock_hrtimer_base+0x15/0x2f
>  [<c02e9d49>] wait_for_completion+0x9e/0xb7
>  [<c011b805>] default_wake_function+0x0/0xc
>  [<c0130122>] synchronize_rcu+0x2a/0x2f
>  [<c012fddb>] wakeme_after_rcu+0x0/0x8
>  [<c0295ba5>] dev_deactivate+0x8a/0xa3
>  [<c0296f12>] dev_graft_qdisc+0x81/0xa4
>  [<c0296f5d>] qdisc_graft+0x28/0x88
>  [<c0297709>] tc_get_qdisc+0x15d/0x1e9
>  [<c02975ac>] tc_get_qdisc+0x0/0x1e9
>  [<c02904f9>] rtnetlink_rcv_msg+0x1c2/0x1f5
>  [<c029b0a0>] netlink_run_queue+0xa7/0x10e
>  [<c0290337>] rtnetlink_rcv_msg+0x0/0x1f5
>  [<c02902e9>] rtnetlink_rcv+0x26/0x42
>  [<c029b5c9>] netlink_data_ready+0x12/0x54
>  [<c029a244>] netlink_sendskb+0x1f/0x53
>  [<c029b4d8>] netlink_sendmsg+0x1f5/0x2d4
>  [<c027dae6>] sock_sendmsg+0xe2/0xfd
>  [<c013206b>] autoremove_wake_function+0x0/0x37
>  [<c013206b>] autoremove_wake_function+0x0/0x37
>  [<c027dae6>] sock_sendmsg+0xe2/0xfd
>  [<c01d9ec3>] copy_from_user+0x2d/0x59
>  [<c027dc2e>] sys_sendmsg+0x12d/0x243
>  [<c014805a>] filemap_fault+0x125/0x42d
>  [<c0151be7>] __do_fault+0x18f/0x37d
>  [<c014afb4>] __alloc_pages+0x4f/0x358
>  [<c0147f35>] filemap_fault+0x0/0x42d
>  [<c0153059>] __handle_mm_fault+0x114/0x662
>  [<c0155717>] acct_stack_growth+0x86/0x9a
>  [<c027ee6d>] sys_socketcall+0x24f/0x271
>  [<c011632c>] do_page_fault+0x0/0x6b4
>  [<c01033d2>] sysenter_past_esp+0x5f/0x85
> =======================
> Sched Debug Version: v0.03, cfs-v18, 2.6.22-rc6-mm1.SNET.Thors.htbpatch.2 #2
> now at 5852817390013 nsecs
>
> cpu#0, 2992.428 MHz
>   .nr_running                 : 0
>   .load                       : 0
>   .ls.delta_fair              : 0
>   .ls.delta_exec              : 0
>   .nr_switches                : 339295
>   .nr_load_updates            : 5726868
>   .nr_uninterruptible         : 4294967087
>   .jiffies                    : 5552844
>   .next_balance               : 5552847
>   .curr->pid                  : 0
>   .clock                      : 5711615883107
>   .prev_clock_raw             : 5850635770741
>   .clock_warps                : 1
>   .clock_overflows            : 12712
>   .clock_unstable_events      : 0
>   .clock_max_delta            : 1999660
>   .cfs.fair_clock             : 1406994636435
>   .cfs.exec_clock             : 1106293764887
>   .cfs.wait_runtime           : -700502353213
>   .cfs.wait_runtime_overruns  : 36267
>   .cfs.wait_runtime_underruns : 1329
>   .cpu_load[0]                : 0
>   .cpu_load[1]                : 0
>   .cpu_load[2]                : 0
>   .cpu_load[3]                : 0
>   .cpu_load[4]                : 0
>   .wait_runtime_rq_sum        : 0
>
> runnable tasks:
>             task   PID        tree-key         delta       waiting  switches  prio        sum-exec        sum-wait       sum-sleep    wait-overrun   wait-underrun
> ------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> cpu#1, 2992.428 MHz
>   .nr_running                 : 1
>   .load                       : 1024
>   .ls.delta_fair              : 536356056422
>   .ls.delta_exec              : 8760209027
>   .nr_switches                : 401913
>   .nr_load_updates            : 4460569
>   .nr_uninterruptible         : 213
>   .jiffies                    : 5552969
>   .next_balance               : 4183026
>   .curr->pid                  : 4536
>   .clock                      : 4465134326215
>   .prev_clock_raw             : 5850235012355
>   .clock_warps                : 0
>   .clock_overflows            : 6100
>   .clock_unstable_events      : 0
>   .clock_max_delta            : 1999533
>   .cfs.fair_clock             : 1265459395930
>   .cfs.exec_clock             : 1019469215716
>   .cfs.wait_runtime           : -866196108791
>   .cfs.wait_runtime_overruns  : 35706
>   .cfs.wait_runtime_underruns : 4327
>   .cpu_load[0]                : 2302
>   .cpu_load[1]                : 2514
>   .cpu_load[2]                : 1966
>   .cpu_load[3]                : 1532
>   .cpu_load[4]                : 1292
>   .wait_runtime_rq_sum        : 15731878
>
> runnable tasks:
>             task   PID        tree-key         delta       waiting  switches  prio        sum-exec        sum-wait       sum-sleep    wait-overrun   wait-underrun
> ------------------------------------------------------------------------------------------------------------------------------------------------------------------
> R           bash  4536   1265443664052     -15731878      15731878       192   120       144540956       410343770   2862976805398              19               0
>
>
> SysRq : Show Pending Timers
> Timer List Version: v0.3
> HRTIMER_MAX_CLOCK_BASES: 2
> now at 6550224915183 nsecs
>
> cpu: 0
>  clock 0:
>   .index:      0
>   .resolution: 1 nsecs
>   .get_time:   ktime_get_real
>   .offset:     1183655045226773151 nsecs
> active timers:
>  clock 1:
>   .index:      1
>   .resolution: 1 nsecs
>   .get_time:   ktime_get
>   .offset:     0 nsecs
> active timers:
>  #0: <3>BUG: sleeping function called from invalid context at kernel/mutex.c:86
> in_atomic():0, irqs_disabled():1
>  [<c02ea502>] mutex_lock+0x15/0x26
>  [<c021f810>] vt_console_print+0x0/0x287
>  [<c013ff72>] lookup_module_symbol_name+0x14/0xbb
>  [<c01379fc>] print_name_offset+0x1a/0x8e
>  [<c01214c4>] release_console_sem+0x1ae/0x1c9
>  [<c0117f03>] __wake_up_common+0x39/0x59
>  [<c0136787>] getnstimeofday+0x30/0xc9
>  [<c0135116>] ktime_get_ts+0x16/0x44
>  [<c0138389>] timer_list_show+0x65c/0xa5d
>  [<c0139a24>] tick_sched_timer+0x0/0x178
>  [<c0223e93>] __handle_sysrq+0x7a/0x115
>  [<c01a0078>] write_sysrq_trigger+0x0/0x2d
>  [<c01a00a3>] write_sysrq_trigger+0x2b/0x2d
>  [<c019a7f7>] proc_reg_write+0x58/0x79
>  [<c0165a1a>] vfs_write+0xa6/0x13e
>  [<c019a79f>] proc_reg_write+0x0/0x79
>  [<c016600a>] sys_write+0x41/0x6a
>  [<c01033d2>] sysenter_past_esp+0x5f/0x85
> =======================
> <c34dbef0>, tick_sched_timer, S:01
>  # expires at 6550223000000 nsecs [in 18446744073707636433 nsecs]
>  #1: <c34dbef0>, hrtimer_wakeup, S:01
>  # expires at 6560773230863 nsecs [in 10548315680 nsecs]
>  #2: <c34dbef0>, it_real_fn, S:01
>  # expires at 7238128506303 nsecs [in 687903591120 nsecs]
>  #3: <c34dbef0>, it_real_fn, S:01
>  # expires at 7238392105656 nsecs [in 688167190473 nsecs]
>   .expires_next   : 6550223000000 nsecs
>   .hres_active    : 1
>   .nr_events      : 6409367
>   .nohz_mode      : 0
>   .idle_tick      : 0 nsecs
>   .tick_stopped   : 0
>   .idle_jiffies   : 0
>   .idle_calls     : 0
>   .idle_sleeps    : 0
>   .idle_entrytime : 0 nsecs
>   .idle_sleeptime : 0 nsecs
>   .last_jiffies   : 0
>   .next_jiffies   : 0
>   .idle_expires   : 0 nsecs
> jiffies: 6250223
>
> cpu: 1
>  clock 0:
>   .index:      0
>   .resolution: 1 nsecs
>   .get_time:   ktime_get_real
>   .offset:     1183655045226773151 nsecs
> active timers:
>  clock 1:
>   .index:      1
>   .resolution: 1 nsecs
>   .get_time:   ktime_get
>   .offset:     0 nsecs
> active timers:
>  #0: <c34dbef0>, hrtimer_wakeup, S:01
>  # expires at 6550773229533 nsecs [in 548314350 nsecs]
>  #1: <c34dbef0>, tick_sched_timer, S:01
>  # expires at 8881205015625 nsecs [in 2330980100442 nsecs]
>  #2: <c34dbef0>, hrtimer_wakeup, S:01
>  # expires at 8894203449198 nsecs [in 2343978534015 nsecs]
>  #3: <c34dbef0>, it_real_fn, S:01
>  # expires at 8911201851655 nsecs [in 2360976936472 nsecs]
>  #4: <c34dbef0>, hrtimer_wakeup, S:01
>  # expires at 9181203252434 nsecs [in 2630978337251 nsecs]
>  #5: <c34dbef0>, hrtimer_wakeup, S:01
>  # expires at 9841202403256 nsecs [in 3290977488073 nsecs]
>   .expires_next   : 6550773229533 nsecs
>   .hres_active    : 1
>   .nr_events      : 4475068
>   .nohz_mode      : 0
>   .idle_tick      : 0 nsecs
>   .tick_stopped   : 0
>   .idle_jiffies   : 0
>   .idle_calls     : 0
>   .idle_sleeps    : 0
>   .idle_entrytime : 0 nsecs
>   .idle_sleeptime : 0 nsecs
>   .last_jiffies   : 0
>   .next_jiffies   : 0
>   .idle_expires   : 0 nsecs
> jiffies: 6250223
>
>
> Tick Device: mode:     1
> Clock Event Device: pit
>  max_delta_ns:   27461866
>  min_delta_ns:   12571
>  mult:           5124677
>  shift:          32
>  mode:           3
>  next_event:     9223372036854775807 nsecs
>  set_next_event: pit_next_event
>  set_mode:       init_pit_timer
>  event_handler:  tick_handle_oneshot_broadcast
> tick_broadcast_mask: 00000000
> tick_broadcast_oneshot_mask: 00000000
>
>
> Tick Device: mode:     1
> Clock Event Device: lapic
>  max_delta_ns:   2037624802
>  min_delta_ns:   3643
>  mult:           17681760
>  shift:          32
>  mode:           3
>  next_event:     6550223000000 nsecs
>  set_next_event: lapic_next_event
>  set_mode:       lapic_timer_setup
>  event_handler:  hrtimer_interrupt
>
> Tick Device: mode:     1
> Clock Event Device: lapic
>  max_delta_ns:   2037624802
>  min_delta_ns:   3643
>  mult:           17681760
>  shift:          32
>  mode:           3
>  next_event:     6550773229533 nsecs
>  set_next_event: lapic_next_event
>  set_mode:       lapic_timer_setup
>  event_handler:  hrtimer_interrupt
>
>
> ---cut---
>
> R.
>

----- End forwarded message -----
* Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree
  2007-07-06  6:14 ` + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree Jarek Poplawski
  2007-07-06  6:20 ` Fwd: " Jarek Poplawski
@ 2007-07-06  6:26 ` Jarek Poplawski
  2007-07-06  6:45   ` Jarek Poplawski
  1 sibling, 1 reply; 25+ messages in thread
From: Jarek Poplawski @ 2007-07-06  6:26 UTC (permalink / raw)
  To: Ranko Zivojnovic; +Cc: akpm, kaber, netdev

On Fri, Jul 06, 2007 at 08:14:20AM +0200, Jarek Poplawski wrote:
...
> This new lockup bug you have just found will need some time to
> figure out. BTW, I wonder if you had lockdep on
> (CONFIG_PROVE_LOCKING or CONFIG_LOCK_ALLOC)?

Should be: (CONFIG_PROVE_LOCKING or CONFIG_DEBUG_LOCK_ALLOC)?

Jarek P.
* Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree
  2007-07-06  6:26 ` Jarek Poplawski
@ 2007-07-06  6:45 ` Jarek Poplawski
  2007-07-06 12:47   ` Jarek Poplawski
  2007-07-06 13:14   ` Ranko Zivojnovic
  0 siblings, 2 replies; 25+ messages in thread
From: Jarek Poplawski @ 2007-07-06  6:45 UTC (permalink / raw)
  To: Ranko Zivojnovic; +Cc: akpm, kaber, netdev

On Fri, Jul 06, 2007 at 09:08:43AM +0300, Ranko Zivojnovic wrote:
> On Thu, 2007-07-05 at 18:59 +0300, Ranko Zivojnovic wrote:
> > On Thu, 2007-07-05 at 17:34 +0300, Ranko Zivojnovic wrote:
> > > Anyhow - I am currently running 2.6.22-rc6-mm1 + sch_htb patch and
> > > running a test script that always managed to reproduce the problem
> > > within half an hour - so far it looks good, but I will leave it
> > > hammering until tomorrow and will let you know.
> > >
> >
> > Ahm...it managed to get stuck:
> >
> > Relevant piece of the process table ('tc' stays stuck):
> >  9608 pts/0    S+     0:00 /bin/bash ./tc-crash.sh
> >  9609 pts/0    D+     0:00 tc qdisc del dev lo root
> >
>
> Something really weird happened here...
>
> I've left it the way it was (stuck) overnight, and this morning the
> process was terminated.
>
> It looks like some timer was out of whack.
>
> I'm not sure if it is possibly relevant - but the test I performed was
> done in VMware.

Yes, this could be relevant, so it's good to know.

>
> In order to get that parameter out of the way - I will run the same
> test on a real machine.

Fine! But, unless you have something to hide, try to reply to these
messages which went through netdev, or at least cc netdev, because
more people may be interested or could even help to solve this faster.

Cheers,
Jarek P.
* Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree
  2007-07-06  6:45 ` Jarek Poplawski
@ 2007-07-06 12:47 ` Jarek Poplawski
  2007-07-06 13:16   ` Ranko Zivojnovic
  1 sibling, 1 reply; 25+ messages in thread
From: Jarek Poplawski @ 2007-07-06 12:47 UTC (permalink / raw)
  To: Ranko Zivojnovic; +Cc: akpm, kaber, netdev

On Fri, Jul 06, 2007 at 08:45:23AM +0200, Jarek Poplawski wrote:
> On Fri, Jul 06, 2007 at 09:08:43AM +0300, Ranko Zivojnovic wrote:
...
> > In order to get that parameter out of the way - I will run the same
> > test on a real machine.

BTW, maybe it would be better to try with something more stable than -mm.
There is also one bug invoked by sysrq, but I hope you started using
sysrq only after the lockup began.

Jarek P.
* Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree
  2007-07-06 12:47 ` Jarek Poplawski
@ 2007-07-06 13:16 ` Ranko Zivojnovic
  2007-07-09  8:25   ` Jarek Poplawski
  0 siblings, 1 reply; 25+ messages in thread
From: Ranko Zivojnovic @ 2007-07-06 13:16 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: akpm, kaber, netdev

On Fri, 2007-07-06 at 14:47 +0200, Jarek Poplawski wrote:
> On Fri, Jul 06, 2007 at 08:45:23AM +0200, Jarek Poplawski wrote:
> > On Fri, Jul 06, 2007 at 09:08:43AM +0300, Ranko Zivojnovic wrote:
> ...
> > > In order to get that parameter out of the way - I will run the same
> > > test on a real machine.
>
> BTW, maybe it would be better to try with something more stable than -mm.

That would be the next thing I was going to try. Any suggestion?

> There is also one bug invoked by sysrq, but I hope you started using
> sysrq only after the lockup began.

Yes - it was after the lockup.

R.
* Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree
  2007-07-06 13:16 ` Ranko Zivojnovic
@ 2007-07-09  8:25 ` Jarek Poplawski
  0 siblings, 0 replies; 25+ messages in thread
From: Jarek Poplawski @ 2007-07-09  8:25 UTC (permalink / raw)
  To: Ranko Zivojnovic; +Cc: akpm, kaber, netdev

On Fri, Jul 06, 2007 at 04:16:18PM +0300, Ranko Zivojnovic wrote:
> On Fri, 2007-07-06 at 14:47 +0200, Jarek Poplawski wrote:
> > On Fri, Jul 06, 2007 at 08:45:23AM +0200, Jarek Poplawski wrote:
> > > On Fri, Jul 06, 2007 at 09:08:43AM +0300, Ranko Zivojnovic wrote:
> > ...
> > > > In order to get that parameter out of the way - I will run the same
> > > > test on a real machine.
> >
> > BTW, maybe it would be better to try with something more stable than -mm.
>
> That would be the next thing I was going to try. Any suggestion?

Of course, -mm needs testing too, and it's probably easier to find bugs
there, but it's harder to verify fixes: you never know whether a new bug
was caused by the fix or merely uncovered by it. So the latest stable or
-rc versions (at least with higher numbers) should make things faster.

I've looked a bit at this first log, and it's very strange, especially
the stats for cpu#0. Maybe it's really about VMware, or maybe there is
something more, but it doesn't seem this stats_lock deadlock alone could
have made all this mess. Anyway, it would be interesting to repeat such
a test with -mm after checking e.g. with 2.6.22.

Thanks,
Jarek P.

PS: Sorry for the late response.
* Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree
  2007-07-06  6:45 ` Jarek Poplawski
  2007-07-06 12:47   ` Jarek Poplawski
@ 2007-07-06 13:14 ` Ranko Zivojnovic
  2007-07-06 13:27   ` Patrick McHardy
  2007-07-06 14:21   ` Patrick McHardy
  1 sibling, 2 replies; 25+ messages in thread
From: Ranko Zivojnovic @ 2007-07-06 13:14 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: akpm, kaber, netdev

On Fri, 2007-07-06 at 08:45 +0200, Jarek Poplawski wrote:
> On Fri, Jul 06, 2007 at 09:08:43AM +0300, Ranko Zivojnovic wrote:
> > On Thu, 2007-07-05 at 18:59 +0300, Ranko Zivojnovic wrote:
> > > On Thu, 2007-07-05 at 17:34 +0300, Ranko Zivojnovic wrote:
> > > > Anyhow - I am currently running 2.6.22-rc6-mm1 + sch_htb patch and
> > > > running a test script that always managed to reproduce the problem
> > > > within half an hour - so far it looks good, but I will leave it
> > > > hammering until tomorrow and will let you know.
> > > >
> > >
> > > Ahm...it managed to get stuck:
> > >
> > > Relevant piece of the process table ('tc' stays stuck):
> > >  9608 pts/0    S+     0:00 /bin/bash ./tc-crash.sh
> > >  9609 pts/0    D+     0:00 tc qdisc del dev lo root
> > >
> >
> > Something really weird happened here...
> >
> > I've left it the way it was (stuck) overnight, and this morning the
> > process was terminated.
> >
> > It looks like some timer was out of whack.
> >
> > I'm not sure if it is possibly relevant - but the test I performed was
> > done in VMware.
>
> Yes, this could be relevant, so it's good to know.
>
> >
> > In order to get that parameter out of the way - I will run the same
> > test on a real machine.
>
> Fine! But, unless you have something to hide, try to reply to these
> messages which went through netdev, or at least cc netdev, because
> more people may be interested or could even help to solve this faster.
>

Nope - nothing to hide :). Managed to get stuck on a normal machine as
well (2.6.22-rc6-mm1 + sch_htb patch)... here's the log:

---CUT---
-- 0:loop-telnet.sh -- time-stamp -- Jul/06/07 14:46:17 --
u32 classifier
Performance counters on
OLD policer on
input device check on
-- 0:loop-telnet.sh -- time-stamp -- Jul/06/07 14:48:13 --
-- 0:loop-telnet.sh -- time-stamp -- Jul/06/07 15:38:26 --
BUG: spinlock lockup on CPU#0, swapper/0, c03eff80
 [<c01ed1fe>] _raw_spin_lock+0x108/0x13c
 [<c02a8468>] __qdisc_run+0x97/0x1b0
 [<c02a96f3>] qdisc_watchdog+0x19/0x58
 [<c02fe5e7>] __lock_text_start+0x37/0x43
 [<c02a9730>] qdisc_watchdog+0x56/0x58
 [<c02a96da>] qdisc_watchdog+0x0/0x58
 [<c0135d84>] run_hrtimer_softirq+0x58/0xb5
 [<c012626a>] __do_softirq+0x7e/0xf3
 [<c0126335>] do_softirq+0x56/0x58
 [<c0112574>] smp_apic_timer_interrupt+0x5a/0x85
 [<c0103eb1>] apic_timer_interrupt+0x29/0x38
 [<c0103ebb>] apic_timer_interrupt+0x33/0x38
 [<c010172f>] mwait_idle_with_hints+0x3b/0x3f
 [<c0101846>] cpu_idle+0x5f/0x74
 [<c0417d2a>] start_kernel+0x318/0x3ae
 [<c04174bf>] unknown_bootoption+0x0/0x258
=======================
BUG: spinlock lockup on CPU#1, swapper/0, c03eff80
 [<c01ed1fe>] _raw_spin_lock+0x108/0x13c
 [<c0298b9b>] est_timer+0x53/0x148
 [<c01294b3>] run_timer_softirq+0x30/0x184
 [<c01295a4>] run_timer_softirq+0x121/0x184
 [<c0126252>] __do_softirq+0x66/0xf3
 [<c0298b48>] est_timer+0x0/0x148
 [<c012626a>] __do_softirq+0x7e/0xf3
 [<c0126335>] do_softirq+0x56/0x58
 [<c0112574>] smp_apic_timer_interrupt+0x5a/0x85
 [<c0103eb1>] apic_timer_interrupt+0x29/0x38
 [<c0103ebb>] apic_timer_interrupt+0x33/0x38
 [<c010172f>] mwait_idle_with_hints+0x3b/0x3f
 [<c0101846>] cpu_idle+0x5f/0x74
=======================
telnet> send break
SysRq : HELP : loglevel0-8 reBoot show-all-locks(D) tErm Full kIll saK
 showMem Nice powerOff showPc show-all-timers(Q) unRaw Sync showTasks
 Unmount shoW-blocked-tasks
telnet> send break
SysRq : Show Regs

Pid: 0, comm:              swapper
EIP: 0060:[<c01ed199>] CPU: 1
EIP is at _raw_spin_lock+0xa3/0x13c
 EFLAGS: 00000246    Not tainted  (2.6.22-rc6-mm1.SNET.Thors.htbpatch.2.lockdebug #1)
EAX: 00000000 EBX: c03eff80 ECX: e4f7bf60 EDX: 98b928f8
ESI: 260e7970 EDI: 00000000 EBP: beb75188 DS: 007b ES: 007b FS: 00d8
CR0: 8005003b CR2: 080f7578 CR3: 03651000 CR4: 000006d0
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400
 [<c0298b9b>] est_timer+0x53/0x148
 [<c01294b3>] run_timer_softirq+0x30/0x184
 [<c01295a4>] run_timer_softirq+0x121/0x184
 [<c0126252>] __do_softirq+0x66/0xf3
 [<c0298b48>] est_timer+0x0/0x148
 [<c012626a>] __do_softirq+0x7e/0xf3
 [<c0126335>] do_softirq+0x56/0x58
 [<c0112574>] smp_apic_timer_interrupt+0x5a/0x85
 [<c0103eb1>] apic_timer_interrupt+0x29/0x38
 [<c0103ebb>] apic_timer_interrupt+0x33/0x38
 [<c010172f>] mwait_idle_with_hints+0x3b/0x3f
 [<c0101846>] cpu_idle+0x5f/0x74
=======================
telnet> send break
SysRq : Show Blocked State

                         free                   sibling
  task        PC         stack  pid father child younger older
kjournald     D 00000343     0   518      2 (L-TLB)
       c2361e8c 00000092 090e4f4b 00000343 c2361e74 00000000 c2baf920 c2360000
       00000002 c2baf920 c21343f0 c21345a8 c1d24080 00000002 c040ed80 c2361ec4
       c4289280 00000000 00000000 003252cc c02feb97 c2361ecc ffffffff 00000000
Call Trace:
 [<c02feb97>] _spin_unlock_irqrestore+0x34/0x39
 [<c02fc563>] io_schedule+0x1d/0x27
 [<c018b02c>] sync_buffer+0x30/0x35
 [<c02fcb5e>] __wait_on_bit+0x42/0x5e
 [<c018affc>] sync_buffer+0x0/0x35
 [<c018affc>] sync_buffer+0x0/0x35
 [<c02fcbdd>] out_of_line_wait_on_bit+0x63/0x76
 [<c0132bf2>] wake_bit_function+0x0/0x3c
 [<c018af91>] __wait_on_buffer+0x24/0x29
 [<c018bdb2>] sync_dirty_buffer+0x9c/0xe6
 [<f88fcefe>] journal_get_descriptor_buffer+0x88/0x9a [jbd]
 [<f88f922b>] journal_commit_transaction+0xf2a/0x111f [jbd]
 [<c02fea7a>] _spin_unlock_irq+0x20/0x23
 [<f88fc8bf>] kjournald+0xac/0x1f4 [jbd]
 [<c0132bbb>] autoremove_wake_function+0x0/0x37
 [<f88fc813>] kjournald+0x0/0x1f4 [jbd]
 [<c0132976>] kthread+0x34/0x56
 [<c0132942>] kthread+0x0/0x56
 [<c0104053>] kernel_thread_helper+0x7/0x14
=======================
syslogd       D 00000341     0  2173      1 (NOTLB)
       c4315e84 00000082 b9c1b1c3 00000341 c4315e6c 00000000 00000000 c4314000
       00000002 00000002 c2ea25f0 c2ea27a8 c1d24080 00000002 c040ed80 c2ea25f0
       c3652b80 c2aab4c8 c2aab414 00323cc0 00000002 00000046 ffffffff 00000000
Call Trace:
 [<f88fbbfb>] log_wait_commit+0xaf/0x116 [jbd]
 [<c0132bbb>] autoremove_wake_function+0x0/0x37
 [<f88f66b1>] journal_stop+0x158/0x1fa [jbd]
 [<c0186711>] __writeback_single_inode+0x26f/0x331
 [<c01867e3>] sync_inode+0x10/0x2a
 [<c02fe7cf>] _spin_lock+0x29/0x34
 [<c01867ec>] sync_inode+0x19/0x2a
 [<f88b3533>] ext3_sync_file+0xbf/0xcc [ext3]
 [<c02fd2f1>] __mutex_lock_slowpath+0x14b/0x293
 [<c018948b>] do_fsync+0x57/0x90
 [<c01894e1>] __do_fsync+0x1d/0x2b
 [<c0103366>] sysenter_past_esp+0x5f/0x99
=======================
crond         D 0000034B     0 10251   2740 (NOTLB)
       c5b6be18 00000086 44c73986 0000034b c5b6be00 00000000 c43048f0 c5b6a000
       00000002 c04ea564 c43048f0 c4304aa8 c1d24080 00000002 c040ed80 00000000
       c393e280 c02fea7a c5b6be2c 0032dd68 c013d19e c02fc418 ffffffff 00000000
Call Trace:
 [<c02fea7a>] _spin_unlock_irq+0x20/0x23
 [<c013d19e>] trace_hardirqs_on+0xab/0x147
 [<c02fc418>] wait_for_completion+0x24/0xb7
 [<c02fc492>] wait_for_completion+0x9e/0xb7
 [<c011bd50>] default_wake_function+0x0/0xc
 [<c012f537>] call_usermodehelper_exec+0xdb/0xe0
 [<c0167d3b>] cache_alloc_refill+0x163/0x557
 [<c012f69b>] request_module+0x9f/0xe2
 [<c02feb97>] _spin_unlock_irqrestore+0x34/0x39
 [<c013d19e>] trace_hardirqs_on+0xab/0x147
 [<c0118f2d>] __wake_up+0x32/0x43
 [<c02adfa2>] netlink_create+0xa6/0x1a2
 [<c029011b>] __sock_create+0xae/0x1db
 [<c029027d>] sock_create+0x18/0x1d
 [<c0290412>] sys_socket+0x1c/0x44
 [<c0290d06>] sys_socketcall+0x68/0x271
 [<c013d19e>] trace_hardirqs_on+0xab/0x147
 [<c0103438>] restore_nocheck+0x12/0x15
 [<c0103366>] sysenter_past_esp+0x5f/0x99
=======================
Sched Debug Version: v0.03, cfs-v18, 2.6.22-rc6-mm1.SNET.Thors.htbpatch.2.lockdebug #1
now at 3640122056027 nsecs

cpu#0, 3200.442 MHz
  .nr_running                 : 2
  .load                       : 4149
  .ls.delta_fair              : 0
  .ls.delta_exec              : 0
  .nr_switches                : 904853
  .nr_load_updates            : 3640019
  .nr_uninterruptible         : 425
  .jiffies                    : 3340159
  .next_balance               : 2936193
  .curr->pid                  : 0
  .clock                      : 3628742932463
  .prev_clock_raw             : 3628948350088
  .clock_warps                : 0
  .clock_overflows            : 57
  .clock_unstable_events      : 1
  .clock_max_delta            : 1999557
  .cfs.fair_clock             : 557156111815
  .cfs.exec_clock             : 213278676734
  .cfs.wait_runtime           : -93884907451
  .cfs.wait_runtime_overruns  : 1235
  .cfs.wait_runtime_underruns : 2874
  .cpu_load[0]                : 0
  .cpu_load[1]                : 0
  .cpu_load[2]                : 0
  .cpu_load[3]                : 0
  .cpu_load[4]                : 0
  .wait_runtime_rq_sum        : 47286887

runnable tasks:
            task   PID        tree-key         delta       waiting  switches  prio        sum-exec        sum-wait       sum-sleep    wait-overrun   wait-underrun
------------------------------------------------------------------------------------------------------------------------------------------------------------------
         khelper    19    557082869628     -73242187      24000000       955   115        12293227     53703642836   3621787091747               1               0
           eping  3114    557163819749       7707934      23286887    333993   120     83353964767       253906993   2733416723635              41             217
WARNING: at kernel/lockdep.c:2031 trace_hardirqs_on()
 [<c02fea9d>] _read_unlock_irq+0x20/0x23
 [<c0119814>] sched_debug_show+0x8b2/0xe04
 [<c011e2ab>] show_state_filter+0x161/0x184
 [<c02354c3>] __handle_sysrq+0x7a/0x115
 [<c024059d>] receive_chars+0x23b/0x2a4
 [<c024174d>] serial8250_interrupt+0x114/0x132
 [<c0148733>] handle_IRQ_event+0x1e/0x48
 [<c014982f>] handle_edge_irq+0x9b/0x112
 [<c0105afc>] do_IRQ+0x45/0x99
 [<c0103dd0>] common_interrupt+0x24/0x34
 [<c0103dda>] common_interrupt+0x2e/0x34
 [<c010172f>] mwait_idle_with_hints+0x3b/0x3f
 [<c0101846>] cpu_idle+0x5f/0x74
=======================

cpu#1, 3200.442 MHz
  .nr_running                 : 1
  .load                       : 1024
  .ls.delta_fair              : 0
  .ls.delta_exec              : 0
  .nr_switches                : 1825911
  .nr_load_updates            : 3638353
  .nr_uninterruptible         : 4294967022
  .jiffies                    : 3340380
  .next_balance               : 2937417
  .curr->pid                  : 0
  .clock                      : 3627052732372
  .prev_clock_raw             : 3629168676206
  .clock_warps                : 0
  .clock_overflows            : 45
  .clock_unstable_events      : 0
  .clock_max_delta            : 1968238
  .cfs.fair_clock             : 1032651577814
  .cfs.exec_clock             : 242507384061
  .cfs.wait_runtime           : -91108953414
  .cfs.wait_runtime_overruns  : 2143
  .cfs.wait_runtime_underruns : 1953
  .cpu_load[0]                : 0
  .cpu_load[1]                : 0
  .cpu_load[2]                : 0
  .cpu_load[3]                : 0
  .cpu_load[4]                : 0
  .wait_runtime_rq_sum        : 23979985

runnable tasks:
            task   PID        tree-key         delta       waiting  switches  prio        sum-exec        sum-wait       sum-sleep    wait-overrun   wait-underrun
------------------------------------------------------------------------------------------------------------------------------------------------------------------
 hald-addon-stor  2781   1032650759709       -818105      23979985     12308   120       491952575      2635261433   3092710060269              30               0

cpu#2, 3200.442 MHz
  .nr_running                 : 0
  .load                       : 0
  .ls.delta_fair              : 0
  .ls.delta_exec              : 0
  .nr_switches                : 1509999
  .nr_load_updates            : 3639454
  .nr_uninterruptible         : 789
  .jiffies                    : 3340529
  .next_balance               : 3339823
  .curr->pid                  : 0
  .clock                      : 3628130245442
  .prev_clock_raw             : 3629317315215
  .clock_warps                : 0
  .clock_overflows            : 73
  .clock_unstable_events      : 0
  .clock_max_delta            : 1965436
  .cfs.fair_clock             : 839764839660
  .cfs.exec_clock             : 208639226606
  .cfs.wait_runtime           : 17382220347
  .cfs.wait_runtime_overruns  : 2282
  .cfs.wait_runtime_underruns : 11415
  .cpu_load[0]                : 0
  .cpu_load[1]                : 0
  .cpu_load[2]                : 0
  .cpu_load[3]                : 0
  .cpu_load[4]                : 0
  .wait_runtime_rq_sum        : 0

runnable tasks:
            task   PID        tree-key         delta       waiting  switches  prio        sum-exec        sum-wait       sum-sleep    wait-overrun   wait-underrun
------------------------------------------------------------------------------------------------------------------------------------------------------------------

cpu#3, 3200.442 MHz
  .nr_running                 : 1
  .load                       : 1024
  .ls.delta_fair              : 0
  .ls.delta_exec              : 0
  .nr_switches                : 3120535
  .nr_load_updates            : 3640332
  .nr_uninterruptible         : 4294966359
  .jiffies                    : 3340663
  .next_balance               : 2936318
  .curr->pid                  : 10250
  .clock                      : 3629093644120
  .prev_clock_raw             : 3629450829376
  .clock_warps                : 0
  .clock_overflows            : 15
  .clock_unstable_events      : 0
  .clock_max_delta            : 1940676
  .cfs.fair_clock             : 2242476711934
  .cfs.exec_clock             : 782992850413
  .cfs.wait_runtime           : 621759666860
  .cfs.wait_runtime_overruns  : 4730
  .cfs.wait_runtime_underruns : 46039
  .cpu_load[0]                : 1019
  .cpu_load[1]                : 1020
  .cpu_load[2]                : 1018
  .cpu_load[3]                : 1017
  .cpu_load[4]                : 1009
  .wait_runtime_rq_sum        : -272987

runnable tasks:
            task   PID        tree-key         delta       waiting  switches  prio        sum-exec        sum-wait       sum-sleep    wait-overrun   wait-underrun
------------------------------------------------------------------------------------------------------------------------------------------------------------------
R             tc 10250   2242552751078        273022       -273022         1   120    403322303899         -273022             653               0               0

telnet> send break
SysRq : Show Pending Timers
Timer List Version: v0.3
HRTIMER_MAX_CLOCK_BASES: 2
now at 3730198060133 nsecs

cpu: 0
 clock 0:
  .index:      0
  .resolution: 1 nsecs
  .get_time:   ktime_get_real
  .offset:     1183721968505121290 nsecs
active timers:
 clock 1:
  .index:      1
  .resolution: 1 nsecs
  .get_time:   ktime_get
  .offset:     0 nsecs
active timers:
 #0: <3>BUG: sleeping function called from invalid context at kernel/mutex.c:86
in_atomic():1, irqs_disabled():1
INFO: lockdep is turned off.
irq event stamp: 38482572
hardirqs last  enabled at (38482572): [<c0103438>] restore_nocheck+0x12/0x15
hardirqs last disabled at (38482571): [<c0103eb1>] apic_timer_interrupt+0x29/0x38
softirqs last  enabled at (37677028): [<c0126335>] do_softirq+0x56/0x58
softirqs last disabled at (37677033): [<c0126335>] do_softirq+0x56/0x58
 [<c02fd44b>] mutex_lock+0x12/0x1a
 [<c0145e69>] lookup_module_symbol_name+0x14/0xbb
 [<c0138810>] print_name_offset+0x1a/0x8e
 [<c0121c3f>] release_console_sem+0x1d9/0x1f5
 [<c01391a1>] timer_list_show+0x660/0xa66
 [<c013a861>] tick_sched_timer+0x0/0x186
 [<c02354c3>] __handle_sysrq+0x7a/0x115
 [<c024059d>] receive_chars+0x23b/0x2a4
 [<c024174d>] serial8250_interrupt+0x114/0x132
 [<c0148733>] handle_IRQ_event+0x1e/0x48
 [<c014982f>] handle_edge_irq+0x9b/0x112
 [<c0105afc>] do_IRQ+0x45/0x99
 [<c0103dd0>] common_interrupt+0x24/0x34
 [<c0103dda>] common_interrupt+0x2e/0x34
 [<c01e007b>] vsnprintf+0x29a/0x5fe
 [<c01ed199>] _raw_spin_lock+0xa3/0x13c
 [<c0298b9b>] est_timer+0x53/0x148
 [<c01294b3>] run_timer_softirq+0x30/0x184
 [<c01295a4>] run_timer_softirq+0x121/0x184
 [<c0126252>] __do_softirq+0x66/0xf3
 [<c0298b48>] est_timer+0x0/0x148
 [<c012626a>] __do_softirq+0x7e/0xf3
 [<c0126335>] do_softirq+0x56/0x58
 [<c0112574>] smp_apic_timer_interrupt+0x5a/0x85
 [<c0103eb1>] apic_timer_interrupt+0x29/0x38
 [<c0103ebb>] apic_timer_interrupt+0x33/0x38
 [<c010172f>] mwait_idle_with_hints+0x3b/0x3f
 [<c0101846>] cpu_idle+0x5f/0x74
=======================
<c2047d6c>, tick_sched_timer, S:01
 # expires at 3730231000000 nsecs [in 32939867 nsecs]
 #1: <c2047d6c>, it_real_fn, S:01
 # expires at 3734130154519 nsecs [in 3932094386 nsecs]
 #2: <c2047d6c>, it_real_fn, S:01
 # expires at 3734197190457 nsecs [in 3999130324 nsecs]
 #3: <c2047d6c>, it_real_fn, S:01
 # expires at 31835132462428 nsecs [in 28104934402295 nsecs]
  .expires_next   : 3730415000000 nsecs
  .hres_active    : 1
  .nr_events      : 3736552
  .nohz_mode      : 0
  .idle_tick      : 0 nsecs
  .tick_stopped   : 0
  .idle_jiffies   : 0
  .idle_calls     : 0
  .idle_sleeps    : 0
  .idle_entrytime : 0 nsecs
  .idle_sleeptime : 0 nsecs
  .last_jiffies   : 0
  .next_jiffies   : 0
  .idle_expires   : 0 nsecs
jiffies: 3430448

cpu: 1
 clock 0:
  .index:      0
  .resolution: 1 nsecs
  .get_time:   ktime_get_real
  .offset:     1183721968505121290 nsecs
active timers:
 clock 1:
  .index:      1
  .resolution: 1 nsecs
  .get_time:   ktime_get
  .offset:     0 nsecs
active timers:
 #0: <c2047d6c>, tick_sched_timer, S:01
 # expires at 3730196015625 nsecs [in 18446744073707507108 nsecs]
  .expires_next   : 3730196015625 nsecs
  .hres_active    : 1
  .nr_events      : 3733988
  .nohz_mode      : 0
  .idle_tick      : 0 nsecs
  .tick_stopped   : 0
  .idle_jiffies   : 0
  .idle_calls     : 0
  .idle_sleeps    : 0
  .idle_entrytime : 0 nsecs
  .idle_sleeptime : 0 nsecs
  .last_jiffies   : 0
  .next_jiffies   : 0
  .idle_expires   : 0 nsecs
jiffies: 3430518

cpu: 2
 clock 0:
  .index:      0
  .resolution: 1 nsecs
  .get_time:   ktime_get_real
  .offset:     1183721968505121290 nsecs
active timers:
 clock 1:
  .index:      1
  .resolution: 1 nsecs
  .get_time:   ktime_get
  .offset:     0 nsecs
active timers:
 #0: <c2047d6c>, tick_sched_timer, S:01
 # expires at 3730546031250 nsecs [in 347971117 nsecs]
 #1: <c2047d6c>, hrtimer_wakeup, S:01
 # expires at 3730649159723 nsecs [in 451099590 nsecs]
 #2: <c2047d6c>, hrtimer_wakeup, S:01
 # expires at 3734223051944 nsecs [in 4024991811 nsecs]
 #3: <c2047d6c>, hrtimer_wakeup, S:01
 # expires at 3753480126556 nsecs [in 23282066423 nsecs]
  .expires_next   : 3730581031250 nsecs
  .hres_active    : 1
  .nr_events      : 3739031
  .nohz_mode      : 0
  .idle_tick      : 0 nsecs
  .tick_stopped   : 0
  .idle_jiffies   : 0
  .idle_calls     : 0
  .idle_sleeps    : 0
  .idle_entrytime : 0 nsecs
  .idle_sleeptime : 0 nsecs
  .last_jiffies   : 0
  .next_jiffies   : 0
  .idle_expires   : 0 nsecs
jiffies: 3430613

cpu: 3
 clock 0:
  .index:      0
  .resolution: 1 nsecs
  .get_time:   ktime_get_real
  .offset:     1183721968505121290 nsecs
active timers:
 clock 1:
  .index:      1
  .resolution: 1 nsecs
  .get_time:   ktime_get
  .offset:     0 nsecs
active timers:
 #0: <c2047d6c>, tick_sched_timer, S:01
 # expires at 3730641046875 nsecs [in 442986742 nsecs]
  .expires_next   : 3730650046875 nsecs
  .hres_active    : 1
  .nr_events      : 3736405
  .nohz_mode      : 0
  .idle_tick      : 0 nsecs
  .tick_stopped   : 0
  .idle_jiffies   : 0
  .idle_calls     : 0
  .idle_sleeps    : 0
  .idle_entrytime : 0 nsecs
  .idle_sleeptime : 0 nsecs
  .last_jiffies   : 0
  .next_jiffies   : 0
  .idle_expires   : 0 nsecs
jiffies: 3430685


Tick Device: mode:     1
Clock Event Device: pit
 max_delta_ns:   27461866
 min_delta_ns:   12571
 mult:           5124677
 shift:          32
 mode:           3
 next_event:     9223372036854775807 nsecs
 set_next_event: pit_next_event
 set_mode:       init_pit_timer
 event_handler:  tick_handle_oneshot_broadcast
tick_broadcast_mask: 00000000
tick_broadcast_oneshot_mask: 00000000


Tick Device: mode:     1
Clock Event Device: lapic
 max_delta_ns:   671064938
 min_delta_ns:   1199
 mult:           53688981
 shift:          32
 mode:           3
 next_event:     3730740000000 nsecs
 set_next_event: lapic_next_event
 set_mode:       lapic_timer_setup
 event_handler:  hrtimer_interrupt

Tick Device: mode:     1
Clock Event Device: lapic
 max_delta_ns:   671064938
 min_delta_ns:   1199
 mult:           53688981
 shift:          32
 mode:           3
 next_event:     3730196015625 nsecs
 set_next_event: lapic_next_event
 set_mode:       lapic_timer_setup
 event_handler:  hrtimer_interrupt

Tick Device: mode:     1
Clock Event Device: lapic
 max_delta_ns:   671064938
 min_delta_ns:   1199
 mult:           53688981
 shift:          32
 mode:           3
 next_event:     3730799031250 nsecs
 set_next_event: lapic_next_event
 set_mode:       lapic_timer_setup
 event_handler:  hrtimer_interrupt

Tick Device: mode:     1
Clock Event Device: lapic
 max_delta_ns:   671064938
 min_delta_ns:   1199
 mult:           53688981
 shift:          32
 mode:           3
 next_event:     3730828046875 nsecs
 set_next_event: lapic_next_event
 set_mode:       lapic_timer_setup
 event_handler:  hrtimer_interrupt
---CUT---
* Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree
  2007-07-06 13:14 ` Ranko Zivojnovic
@ 2007-07-06 13:27 ` Patrick McHardy
  2007-07-06 13:59   ` Ranko Zivojnovic
  1 sibling, 1 reply; 25+ messages in thread
From: Patrick McHardy @ 2007-07-06 13:27 UTC (permalink / raw)
  To: Ranko Zivojnovic; +Cc: Jarek Poplawski, akpm, netdev

Ranko Zivojnovic wrote:
> Managed to get stuck on a normal machine as well (2.6.22-rc6-mm1 +
> sch_htb patch)... here's the log:
>

Can you post the script you're using to reproduce this, please?
* Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree
  2007-07-06 13:27 ` Patrick McHardy
@ 2007-07-06 13:59 ` Ranko Zivojnovic
  0 siblings, 0 replies; 25+ messages in thread
From: Ranko Zivojnovic @ 2007-07-06 13:59 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: Jarek Poplawski, akpm, netdev

[-- Attachment #1: Type: text/plain, Size: 282 bytes --]

On Fri, 2007-07-06 at 15:27 +0200, Patrick McHardy wrote:
> Ranko Zivojnovic wrote:
> > Managed to get stuck on a normal machine as well (2.6.22-rc6-mm1 +
> > sch_htb patch)... here's the log:
> >
>
> Can you post the script you're using to reproduce this, please?

Attached.

R.

[-- Attachment #2: tc-crash.sh --]
[-- Type: application/x-shellscript, Size: 1597 bytes --]
* Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree
  2007-07-06 13:14 ` Ranko Zivojnovic
  2007-07-06 13:27   ` Patrick McHardy
@ 2007-07-06 14:21 ` Patrick McHardy
  2007-07-06 14:55   ` Ranko Zivojnovic
  1 sibling, 1 reply; 25+ messages in thread
From: Patrick McHardy @ 2007-07-06 14:21 UTC (permalink / raw)
  To: Ranko Zivojnovic; +Cc: Jarek Poplawski, akpm, netdev

Ranko Zivojnovic wrote:
> BUG: spinlock lockup on CPU#0, swapper/0, c03eff80
>  [<c01ed1fe>] _raw_spin_lock+0x108/0x13c
>  [<c02a8468>] __qdisc_run+0x97/0x1b0
>  [<c02a96f3>] qdisc_watchdog+0x19/0x58
>  [<c02fe5e7>] __lock_text_start+0x37/0x43
>  [<c02a9730>] qdisc_watchdog+0x56/0x58
>  [<c02a96da>] qdisc_watchdog+0x0/0x58
>  [<c0135d84>] run_hrtimer_softirq+0x58/0xb5
> [...]
>
> BUG: spinlock lockup on CPU#1, swapper/0, c03eff80
>  [<c01ed1fe>] _raw_spin_lock+0x108/0x13c
>  [<c0298b9b>] est_timer+0x53/0x148
>  [<c01294b3>] run_timer_softirq+0x30/0x184
>  [<c01295a4>] run_timer_softirq+0x121/0x184
>  [<c0126252>] __do_softirq+0x66/0xf3
>  [<c0298b48>] est_timer+0x0/0x148
> [...]


There is at least one ABBA deadlock here. est_timer does:

	read_lock(&est_lock)
	spin_lock(e->stats_lock) (which is dev->queue_lock)

and qdisc_destroy calls htb_destroy under dev->queue_lock, which
calls htb_destroy_class, then gen_kill_estimator, and this
write-locks est_lock.

I can't see the problem in the first trace above though; the
qdisc_run path only takes dev->queue_lock. Please enable lockdep
and post the output, if any.
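[The ABBA pattern Patrick describes is easiest to see in isolation. Below
is a minimal userspace analogue using pthread locks in place of est_lock
and dev->queue_lock; the function names mirror the kernel paths, but
everything here is an illustration, not the actual kernel code. The
program intentionally hangs - that is the demonstration:]

    /* abba_demo.c - build with: cc abba_demo.c -o abba_demo -lpthread */
    #include <pthread.h>
    #include <stdio.h>
    #include <unistd.h>

    static pthread_rwlock_t est_lock = PTHREAD_RWLOCK_INITIALIZER;
    static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER; /* "dev->queue_lock" */

    static void *est_timer_path(void *arg)     /* mimics est_timer() */
    {
        pthread_rwlock_rdlock(&est_lock);      /* lock A (shared) */
        puts("timer path: holds est_lock, wants queue_lock");
        sleep(1);                              /* widen the race window */
        pthread_mutex_lock(&queue_lock);       /* lock B: blocks forever */
        pthread_mutex_unlock(&queue_lock);
        pthread_rwlock_unlock(&est_lock);
        return NULL;
    }

    static void *destroy_path(void *arg)       /* mimics qdisc_destroy() -> gen_kill_estimator() */
    {
        pthread_mutex_lock(&queue_lock);       /* lock B */
        puts("destroy path: holds queue_lock, wants est_lock");
        sleep(1);
        pthread_rwlock_wrlock(&est_lock);      /* lock A (exclusive): blocks forever */
        pthread_rwlock_unlock(&est_lock);
        pthread_mutex_unlock(&queue_lock);
        return NULL;
    }

    int main(void)
    {
        pthread_t a, b;
        pthread_create(&a, NULL, est_timer_path, NULL);
        pthread_create(&b, NULL, destroy_path, NULL);
        pthread_join(a, NULL);                 /* never returns: classic ABBA deadlock */
        pthread_join(b, NULL);
        return 0;
    }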
* Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree
  2007-07-06 14:21 ` Patrick McHardy
@ 2007-07-06 14:55 ` Ranko Zivojnovic
  2007-07-07  7:55   ` Ranko Zivojnovic
  0 siblings, 1 reply; 25+ messages in thread
From: Ranko Zivojnovic @ 2007-07-06 14:55 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: Jarek Poplawski, akpm, netdev

On Fri, 2007-07-06 at 16:21 +0200, Patrick McHardy wrote:
> Ranko Zivojnovic wrote:
> > BUG: spinlock lockup on CPU#0, swapper/0, c03eff80
> >  [<c01ed1fe>] _raw_spin_lock+0x108/0x13c
> >  [<c02a8468>] __qdisc_run+0x97/0x1b0
> >  [<c02a96f3>] qdisc_watchdog+0x19/0x58
> >  [<c02fe5e7>] __lock_text_start+0x37/0x43
> >  [<c02a9730>] qdisc_watchdog+0x56/0x58
> >  [<c02a96da>] qdisc_watchdog+0x0/0x58
> >  [<c0135d84>] run_hrtimer_softirq+0x58/0xb5
> > [...]
> >
> > BUG: spinlock lockup on CPU#1, swapper/0, c03eff80
> >  [<c01ed1fe>] _raw_spin_lock+0x108/0x13c
> >  [<c0298b9b>] est_timer+0x53/0x148
> >  [<c01294b3>] run_timer_softirq+0x30/0x184
> >  [<c01295a4>] run_timer_softirq+0x121/0x184
> >  [<c0126252>] __do_softirq+0x66/0xf3
> >  [<c0298b48>] est_timer+0x0/0x148
> > [...]
>
> There is at least one ABBA deadlock here. est_timer does:
>
> 	read_lock(&est_lock)
> 	spin_lock(e->stats_lock) (which is dev->queue_lock)
>
> and qdisc_destroy calls htb_destroy under dev->queue_lock, which
> calls htb_destroy_class, then gen_kill_estimator, and this
> write-locks est_lock.
>
> I can't see the problem in the first trace above though; the
> qdisc_run path only takes dev->queue_lock. Please enable lockdep
> and post the output, if any.

That was with lockdep already enabled:

$ zgrep LOCKDEP /proc/config.gz
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_LOCKDEP=y
# CONFIG_DEBUG_LOCKDEP is not set

also with:

CONFIG_PROVE_LOCKING=y
CONFIG_DEBUG_LOCK_ALLOC=y

that Jarek suggested, and including:

CONFIG_DETECT_SOFTLOCKUP=y
CONFIG_DEBUG_SPINLOCK=y

I will give it another spin as it is, just to see if it shows a
different path - and then I will try 2.6.21.5 with these two patches.

R.
* Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree
  2007-07-06 14:55 ` Ranko Zivojnovic
@ 2007-07-07  7:55 ` Ranko Zivojnovic
  2007-07-07 15:10   ` Patrick McHardy
  0 siblings, 1 reply; 25+ messages in thread
From: Ranko Zivojnovic @ 2007-07-07  7:55 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: Jarek Poplawski, akpm, netdev

On Fri, 2007-07-06 at 17:55 +0300, Ranko Zivojnovic wrote:
> On Fri, 2007-07-06 at 16:21 +0200, Patrick McHardy wrote:
> > Ranko Zivojnovic wrote:
> > > BUG: spinlock lockup on CPU#0, swapper/0, c03eff80
> > >  [<c01ed1fe>] _raw_spin_lock+0x108/0x13c
> > >  [<c02a8468>] __qdisc_run+0x97/0x1b0
> > >  [<c02a96f3>] qdisc_watchdog+0x19/0x58
> > >  [<c02fe5e7>] __lock_text_start+0x37/0x43
> > >  [<c02a9730>] qdisc_watchdog+0x56/0x58
> > >  [<c02a96da>] qdisc_watchdog+0x0/0x58
> > >  [<c0135d84>] run_hrtimer_softirq+0x58/0xb5
> > > [...]
> > >
> > > BUG: spinlock lockup on CPU#1, swapper/0, c03eff80
> > >  [<c01ed1fe>] _raw_spin_lock+0x108/0x13c
> > >  [<c0298b9b>] est_timer+0x53/0x148
> > >  [<c01294b3>] run_timer_softirq+0x30/0x184
> > >  [<c01295a4>] run_timer_softirq+0x121/0x184
> > >  [<c0126252>] __do_softirq+0x66/0xf3
> > >  [<c0298b48>] est_timer+0x0/0x148
> > > [...]
> >
> > There is at least one ABBA deadlock here. est_timer does:
> >
> > 	read_lock(&est_lock)
> > 	spin_lock(e->stats_lock) (which is dev->queue_lock)
> >
> > and qdisc_destroy calls htb_destroy under dev->queue_lock, which
> > calls htb_destroy_class, then gen_kill_estimator, and this
> > write-locks est_lock.
> >
> > I can't see the problem in the first trace above though; the
> > qdisc_run path only takes dev->queue_lock. Please enable lockdep
> > and post the output, if any.
>

I've got both code paths this time. It shows exactly the ABBA deadlock
you describe above. The details are below.

Maybe the appropriate way to fix this would be to call
gen_kill_estimator, with the appropriate lock order, before the call
to qdisc_destroy, so that when dev->queue_lock is taken for
qdisc_destroy the structure is already off the list.

-------------LOG------------

BUG: spinlock lockup on CPU#2, ping/27868, c03eff80
 [<c01ed1fe>] _raw_spin_lock+0x108/0x13c
 [<c0298b9b>] est_timer+0x53/0x148
 [<c01295a4>] run_timer_softirq+0x121/0x184
 [<c0126252>] __do_softirq+0x66/0xf3
 [<c0298b48>] est_timer+0x0/0x148
 [<c012626a>] __do_softirq+0x7e/0xf3
 [<c0126335>] do_softirq+0x56/0x58
 [<c0112574>] smp_apic_timer_interrupt+0x5a/0x85
 [<c0103eb1>] apic_timer_interrupt+0x29/0x38
 [<c0103ebb>] apic_timer_interrupt+0x33/0x38
 [<c0126485>] local_bh_enable+0x94/0x13b
 [<c029c380>] dev_queue_xmit+0x95/0x2d5
 [<c02bb9a9>] ip_output+0x193/0x32a
 [<c02b9fd8>] ip_finish_output+0x0/0x29e
 [<c02b8aa6>] ip_push_pending_frames+0x27f/0x46b
 [<c02b8770>] dst_output+0x0/0x7
 [<c02d4fb9>] raw_sendmsg+0x70b/0x7f2
 [<c02dcbe0>] inet_sendmsg+0x2b/0x49
 [<c028fb66>] sock_sendmsg+0xe2/0xfd
 [<c0132bbb>] autoremove_wake_function+0x0/0x37
 [<c0132bbb>] autoremove_wake_function+0x0/0x37
 [<c011aacc>] enqueue_entity+0x139/0x4f8
 [<c01e0dc3>] copy_from_user+0x2d/0x59
 [<c028fcae>] sys_sendmsg+0x12d/0x243
 [<c013dec5>] __lock_acquire+0x825/0x1002
 [<c013dec5>] __lock_acquire+0x825/0x1002
 [<c011d8a2>] scheduler_tick+0x1a7/0x20e
 [<c02fea7a>] _spin_unlock_irq+0x20/0x23
 [<c013d166>] trace_hardirqs_on+0x73/0x147
 [<c01294b3>] run_timer_softirq+0x30/0x184
 [<c02fea7a>] _spin_unlock_irq+0x20/0x23
 [<c0290eed>] sys_socketcall+0x24f/0x271
 [<c013d19e>] trace_hardirqs_on+0xab/0x147
 [<c01e0fe6>] copy_to_user+0x2f/0x49
 [<c0103396>] sysenter_past_esp+0x8f/0x99
 [<c0103366>] sysenter_past_esp+0x5f/0x99
=======================

And here is the ABBA deadlock:

---cut---
SysRq : Show Locks Held
Showing all locks held in the system:
****snip****
3 locks held by ping/27868:
 #0:  (sk_lock-AF_INET){--..}, at: [<c02d4f24>] raw_sendmsg+0x676/0x7f2
 #1:  (est_lock#2){-.-+}, at: [<c0298b5d>] est_timer+0x15/0x148
 #2:  (&dev->queue_lock){-+..}, at: [<c0298b9b>] est_timer+0x53/0x148
****snip****
8 locks held by tc/2278:
 #0:  (rtnl_mutex){--..}, at: [<c02a26d7>] rtnetlink_rcv+0x18/0x42
 #1:  (&dev->queue_lock){-+..}, at: [<c02a7f27>] qdisc_lock_tree+0xe/0x1c
 #2:  (&dev->ingress_lock){-...}, at: [<c02a9ba8>] tc_get_qdisc+0x192/0x1e9
 #3:  (est_lock#2){-.-+}, at: [<c02989bc>] gen_kill_estimator+0x58/0x6f
 #4:  (&irq_lists[i].lock){++..}, at: [<c024164d>] serial8250_interrupt+0x14/0x132
 #5:  (&port_lock_key){++..}, at: [<c024169b>] serial8250_interrupt+0x62/0x132
 #6:  (sysrq_key_table_lock){+...}, at: [<c0235460>] __handle_sysrq+0x17/0x115
 #7:  (tasklist_lock){..-?}, at: [<c013be2b>] debug_show_all_locks+0x2e/0x15e
****snip****
---cut---

As well as the 'tc' stack:

---cut---
SysRq : Show Regs

Pid: 2278, comm:                   tc
EIP: 0060:[<c02fe39b>] CPU: 0
EIP is at __write_lock_failed+0xf/0x1c
 EFLAGS: 00000287    Not tainted  (2.6.22-rc6-mm1.SNET.Thors.htbpatch.2.lockdebug #1)
EAX: c03f5968 EBX: c03f5968 ECX: 00000000 EDX: 00000004
ESI: c9852840 EDI: c85eae24 EBP: c06aaa60 DS: 007b ES: 007b FS: 00d8
CR0: 8005003b CR2: 008ba828 CR3: 11841000 CR4: 000006d0
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400
 [<c01ed264>] _raw_write_lock+0x32/0x6e
 [<c02989bc>] gen_kill_estimator+0x58/0x6f
 [<f8bb55a6>] htb_destroy_class+0x27/0x12f [sch_htb]
 [<f8bb6037>] htb_destroy+0x34/0x70 [sch_htb]
 [<c02a8152>] qdisc_destroy+0x52/0x8d
 [<c013d166>] trace_hardirqs_on+0x73/0x147
 [<f8bb5651>] htb_destroy_class+0xd2/0x12f [sch_htb]
 [<f8bb6037>] htb_destroy+0x34/0x70 [sch_htb]
 [<c02a8152>] qdisc_destroy+0x52/0x8d
 [<c02a9bb1>] tc_get_qdisc+0x19b/0x1e9
 [<c02a9a16>] tc_get_qdisc+0x0/0x1e9
 [<c02a28f5>] rtnetlink_rcv_msg+0x1c2/0x1f5
 [<c02ad51f>] netlink_run_queue+0x96/0xfd
 [<c02a2733>] rtnetlink_rcv_msg+0x0/0x1f5
 [<c02a26e5>] rtnetlink_rcv+0x26/0x42
 [<c02ada49>] netlink_data_ready+0x12/0x54
 [<c02ac6d4>] netlink_sendskb+0x1f/0x53
 [<c02ad958>] netlink_sendmsg+0x1f5/0x2d4
 [<c028fb66>] sock_sendmsg+0xe2/0xfd
 [<c0132bbb>] autoremove_wake_function+0x0/0x37
 [<c013dec5>] __lock_acquire+0x825/0x1002
 [<c028fb66>] sock_sendmsg+0xe2/0xfd
 [<c01e0dc3>] copy_from_user+0x2d/0x59
 [<c028fcae>] sys_sendmsg+0x12d/0x243
 [<c0157d4c>] __do_fault+0x12b/0x38b
 [<c0157db9>] __do_fault+0x198/0x38b
 [<c013d7b8>] __lock_acquire+0x118/0x1002
 [<c014de85>] filemap_fault+0x0/0x42f
 [<c015922e>] __handle_mm_fault+0x11e/0x68d
 [<c0290eed>] sys_socketcall+0x24f/0x271
 [<c013d19e>] trace_hardirqs_on+0xab/0x147
 [<c0103438>] restore_nocheck+0x12/0x15
 [<c0103366>] sysenter_past_esp+0x5f/0x99
=======================
---cut---

Best regards,
Ranko
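[To make Ranko's reordering proposal concrete: it would amount to
unlinking the estimator before dev->queue_lock is taken, so that
destroying the qdisc under that lock never needs est_lock again. A
hypothetical call-site sketch follows - this illustrates only the lock
ordering being proposed and is not the fix that was eventually applied:]

    /* Hypothetical teardown order (sketch, not the merged fix):
     * drop the estimator first, with no dev->queue_lock held. */
    gen_kill_estimator(&sch->bstats, &sch->rate_est); /* takes est_lock only */

    spin_lock_bh(&dev->queue_lock);
    qdisc_destroy(sch);             /* no longer needs to write-lock est_lock */
    spin_unlock_bh(&dev->queue_lock);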
* Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree
  2007-07-07  7:55 ` Ranko Zivojnovic
@ 2007-07-07 15:10 ` Patrick McHardy
  2007-07-09  7:47   ` Jarek Poplawski
  2007-07-09 12:41   ` Ranko Zivojnovic
  0 siblings, 2 replies; 25+ messages in thread
From: Patrick McHardy @ 2007-07-07 15:10 UTC (permalink / raw)
  To: Ranko Zivojnovic; +Cc: Jarek Poplawski, akpm, netdev

On Sat, 7 Jul 2007, Ranko Zivojnovic wrote:

>> On Fri, 2007-07-06 at 16:21 +0200, Patrick McHardy wrote:
>>> There is at least one ABBA deadlock here. est_timer does:
>>>
>>> read_lock(&est_lock)
>>> spin_lock(e->stats_lock) (which is dev->queue_lock)
>>>
>>> and qdisc_destroy calls htb_destroy under dev->queue_lock, which
>>> calls htb_destroy_class, then gen_kill_estimator, and this
>>> write-locks est_lock.
>>>
>>> I can't see the problem in the first trace above though; the
>>> qdisc_run path only takes dev->queue_lock. Please enable lockdep
>>> and post the output, if any.
>>
>
> I've got both code paths this time. It shows exactly the ABBA deadlock
> you describe above. The details are below.


Thanks. I'm still wondering what's causing the other lockup
you're seeing.

> Maybe the appropriate way to fix this would be to call
> gen_kill_estimator, with the appropriate lock order, before the call
> to qdisc_destroy, so that when dev->queue_lock is taken for
> qdisc_destroy the structure is already off the list.


Probably easier to just kill est_lock and use RCU lists.
I'm currently travelling; I'll look into it tomorrow.
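[For readers unfamiliar with the pattern Patrick is proposing, this is
roughly what "use RCU lists" means here - a kernel-context sketch, not
compilable on its own and not the final patch (which follows below):
readers walk the list under rcu_read_lock() with no est_lock at all,
while the writer unlinks with list_del_rcu() and defers the kfree()
past a grace period:]

    static LIST_HEAD(est_list);   /* assumption: one flat list, for brevity */

    static void est_timer(unsigned long arg)
    {
            struct gen_estimator *e;

            rcu_read_lock();                      /* instead of read_lock(&est_lock) */
            list_for_each_entry_rcu(e, &est_list, list) {
                    spin_lock(e->stats_lock);     /* dev->queue_lock: no est_lock held */
                    /* ... recompute e->avbps / e->avpps ... */
                    spin_unlock(e->stats_lock);
            }
            rcu_read_unlock();
    }

    static void est_free_rcu(struct rcu_head *head)
    {
            kfree(container_of(head, struct gen_estimator, e_rcu));
    }

    static void est_unlink(struct gen_estimator *e)   /* writer side */
    {
            list_del_rcu(&e->list);               /* in-flight readers may still see e */
            call_rcu(&e->e_rcu, est_free_rcu);    /* free only after they are done */
    }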
* Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree
  2007-07-07 15:10 ` Patrick McHardy
@ 2007-07-09  7:47 ` Jarek Poplawski
  0 siblings, 0 replies; 25+ messages in thread
From: Jarek Poplawski @ 2007-07-09  7:47 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: Ranko Zivojnovic, akpm, netdev

On Sat, Jul 07, 2007 at 05:10:54PM +0200, Patrick McHardy wrote:
> On Sat, 7 Jul 2007, Ranko Zivojnovic wrote:
>
> >>On Fri, 2007-07-06 at 16:21 +0200, Patrick McHardy wrote:
> >>>There is at least one ABBA deadlock here. est_timer does:
> >>>
> >>>read_lock(&est_lock)
> >>>spin_lock(e->stats_lock) (which is dev->queue_lock)
> >>>
> >>>and qdisc_destroy calls htb_destroy under dev->queue_lock, which
> >>>calls htb_destroy_class, then gen_kill_estimator, and this
> >>>write-locks est_lock.
> >>>
> >>>I can't see the problem in the first trace above though; the
> >>>qdisc_run path only takes dev->queue_lock. Please enable lockdep
> >>>and post the output, if any.
> >>
> >
> >I've got both code paths this time. It shows exactly the ABBA deadlock
> >you describe above. The details are below.
>
> Thanks. I'm still wondering what's causing the other lockup
> you're seeing.
>
> >Maybe the appropriate way to fix this would be to call
> >gen_kill_estimator, with the appropriate lock order, before the call
> >to qdisc_destroy, so that when dev->queue_lock is taken for
> >qdisc_destroy the structure is already off the list.
>
> Probably easier to just kill est_lock and use RCU lists.
> I'm currently travelling; I'll look into it tomorrow.
>

Congratulations to both of you! Ranko, it looks like you
have some licence to kill! (I hope bugs only...)

Cheers,
Jarek P.
* Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree
  2007-07-07 15:10 ` Patrick McHardy
  2007-07-09  7:47   ` Jarek Poplawski
@ 2007-07-09 12:41 ` Ranko Zivojnovic
  2007-07-09 13:52   ` Patrick McHardy
  1 sibling, 1 reply; 25+ messages in thread
From: Ranko Zivojnovic @ 2007-07-09 12:41 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: Jarek Poplawski, akpm, netdev

[-- Attachment #1: Type: text/plain, Size: 819 bytes --]

On Sat, 2007-07-07 at 17:10 +0200, Patrick McHardy wrote:
> On Sat, 7 Jul 2007, Ranko Zivojnovic wrote:
> > Maybe the appropriate way to fix this would be to call
> > gen_kill_estimator, with the appropriate lock order, before the call
> > to qdisc_destroy, so that when dev->queue_lock is taken for
> > qdisc_destroy the structure is already off the list.
>
> Probably easier to just kill est_lock and use RCU lists.
> I'm currently travelling; I'll look into it tomorrow.

Patrick, I've taken the liberty of trying to implement this myself.
Attached is the whole new
gen_estimator-fix-locking-and-timer-related-bugs.patch, which is based
on RCU lists. Please be kind enough to review.

I've just compiled it against 2.6.22 together with the sch_htb patch
and I am currently testing it. I will let it run until tomorrow or
until it crashes.

R.

[-- Attachment #2: gen_estimator-fix-locking-and-timer-related-bugs.patch --]
[-- Type: text/x-patch, Size: 3816 bytes --]

--- a/net/core/gen_estimator.c	2007-06-25 02:21:48.000000000 +0300
+++ b/net/core/gen_estimator.c	2007-07-09 14:27:12.053336875 +0300
@@ -79,7 +79,7 @@
 
 struct gen_estimator
 {
-	struct gen_estimator	*next;
+	struct list_head	list;
 	struct gnet_stats_basic	*bstats;
 	struct gnet_stats_rate_est	*rate_est;
 	spinlock_t		*stats_lock;
@@ -89,26 +89,27 @@
 	u32			last_packets;
 	u32			avpps;
 	u32			avbps;
+	struct rcu_head		e_rcu;
 };
 
 struct gen_estimator_head
 {
 	struct timer_list	timer;
-	struct gen_estimator	*list;
+	struct list_head	list;
 };
 
 static struct gen_estimator_head elist[EST_MAX_INTERVAL+1];
 
 /* Estimator array lock */
-static DEFINE_RWLOCK(est_lock);
+static DEFINE_SPINLOCK(est_lock);
 
 static void est_timer(unsigned long arg)
 {
 	int idx = (int)arg;
 	struct gen_estimator *e;
 
-	read_lock(&est_lock);
-	for (e = elist[idx].list; e; e = e->next) {
+	rcu_read_lock();
+	list_for_each_entry_rcu(e, &elist[idx].list, list) {
 		u64 nbytes;
 		u32 npackets;
 		u32 rate;
@@ -127,9 +128,9 @@
 		e->rate_est->pps = (e->avpps+0x1FF)>>10;
 		spin_unlock(e->stats_lock);
 	}
-
-	mod_timer(&elist[idx].timer, jiffies + ((HZ<<idx)/4));
-	read_unlock(&est_lock);
+	if (!list_empty(&elist[idx].list))
+		mod_timer(&elist[idx].timer, jiffies + ((HZ<<idx)/4));
+	rcu_read_unlock();
 }
 
 /**
@@ -152,6 +153,7 @@
 {
 	struct gen_estimator *est;
 	struct gnet_estimator *parm = RTA_DATA(opt);
+	int idx;
 
 	if (RTA_PAYLOAD(opt) < sizeof(*parm))
 		return -EINVAL;
@@ -163,7 +165,8 @@
 	if (est == NULL)
 		return -ENOBUFS;
 
-	est->interval = parm->interval + 2;
+	INIT_LIST_HEAD(&est->list);
+	est->interval = idx = parm->interval + 2;
 	est->bstats = bstats;
 	est->rate_est = rate_est;
 	est->stats_lock = stats_lock;
@@ -173,20 +176,26 @@
 	est->last_packets = bstats->packets;
 	est->avpps = rate_est->pps<<10;
 
-	est->next = elist[est->interval].list;
-	if (est->next == NULL) {
-		init_timer(&elist[est->interval].timer);
-		elist[est->interval].timer.data = est->interval;
-		elist[est->interval].timer.expires = jiffies + ((HZ<<est->interval)/4);
-		elist[est->interval].timer.function = est_timer;
-		add_timer(&elist[est->interval].timer);
+	spin_lock_bh(&est_lock);
+	if (!elist[idx].timer.function) {
+		INIT_LIST_HEAD(&elist[idx].list);
+		setup_timer(&elist[idx].timer, est_timer, est->interval);
 	}
-	write_lock_bh(&est_lock);
-	elist[est->interval].list = est;
-	write_unlock_bh(&est_lock);
+
+	if (list_empty(&elist[est->interval].list))
+		mod_timer(&elist[idx].timer, jiffies + ((HZ<<idx)/4));
+
+	list_add_rcu(&est->list, &elist[idx].list);
+	spin_unlock_bh(&est_lock);
 
 	return 0;
 }
 
+static void __gen_kill_estimator(struct rcu_head *head)
+{
+	struct gen_estimator *e = container_of(head, struct gen_estimator, e_rcu);
+	kfree(e);
+}
+
 /**
  * gen_kill_estimator - remove a rate estimator
  * @bstats: basic statistics
@@ -199,26 +208,23 @@
 	struct gnet_stats_rate_est *rate_est)
 {
 	int idx;
-	struct gen_estimator *est, **pest;
+	struct gen_estimator *e, *n;
 
 	for (idx=0; idx <= EST_MAX_INTERVAL; idx++) {
-		int killed = 0;
-		pest = &elist[idx].list;
-		while ((est=*pest) != NULL) {
-			if (est->rate_est != rate_est || est->bstats != bstats) {
-				pest = &est->next;
-				continue;
-			}
-			write_lock_bh(&est_lock);
-			*pest = est->next;
-			write_unlock_bh(&est_lock);
+		/* Skip non initialized indexes */
+		if (!elist[idx].timer.function)
+			continue;
+
+		list_for_each_entry_safe(e, n, &elist[idx].list, list) {
+			if (e->rate_est != rate_est || e->bstats != bstats)
+				continue;
 
-			kfree(est);
-			killed++;
+			spin_lock_bh(&est_lock);
+			list_del_rcu(&e->list);
+			spin_unlock_bh(&est_lock);
+			call_rcu(&e->e_rcu, __gen_kill_estimator);
 		}
-		if (killed && elist[idx].list == NULL)
-			del_timer(&elist[idx].timer);
 	}
 }
* Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree 2007-07-09 12:41 ` Ranko Zivojnovic @ 2007-07-09 13:52 ` Patrick McHardy 2007-07-09 16:43 ` Ranko Zivojnovic 0 siblings, 1 reply; 25+ messages in thread From: Patrick McHardy @ 2007-07-09 13:52 UTC (permalink / raw) To: Ranko Zivojnovic; +Cc: Jarek Poplawski, akpm, netdev Ranko Zivojnovic wrote: > On Sat, 2007-07-07 at 17:10 +0200, Patrick McHardy wrote: > >>On Sat, 7 Jul 2007, Ranko Zivojnovic wrote: >> >>>Maybe the appropriate way to fix this would to call gen_kill_estimator, >>>with the appropriate lock order, before the call to qdisc_destroy, so >>>when dev->queue_lock is taken for qdisc_destroy - the structure is >>>already off the list. >> >>Probably easier to just kill est_lock and use rcu lists. >>I'm currently travelling, I'll look into it tomorrow. > > > Patrick, I've taken liberty to try and implement this myself. Attached > is the whole new gen_estimator-fix-locking-and-timer-related-bugs.patch > that is RCU lists based. Please be kind to review. Thanks for taking care of this, I'm a bit behind. See below for comments... > --- a/net/core/gen_estimator.c 2007-06-25 02:21:48.000000000 +0300 > +++ b/net/core/gen_estimator.c 2007-07-09 14:27:12.053336875 +0300 > @@ -79,7 +79,7 @@ > > struct gen_estimator > { > - struct gen_estimator *next; > + struct list_head list; > struct gnet_stats_basic *bstats; > struct gnet_stats_rate_est *rate_est; > spinlock_t *stats_lock; > @@ -89,26 +89,27 @@ > u32 last_packets; > u32 avpps; > u32 avbps; > + struct rcu_head e_rcu; You could also use synchronize_rcu(); estimator destruction is not particularly performance critical. > static struct gen_estimator_head elist[EST_MAX_INTERVAL+1]; > > /* Estimator array lock */ > -static DEFINE_RWLOCK(est_lock); > +static DEFINE_SPINLOCK(est_lock); The lock isn't needed anymore since we can rely on the rtnl for creation/destruction. > /** > @@ -152,6 +153,7 @@ > { > struct gen_estimator *est; > struct gnet_estimator *parm = RTA_DATA(opt); > + int idx; > > if (RTA_PAYLOAD(opt) < sizeof(*parm)) > return -EINVAL; > @@ -163,7 +165,8 @@ > if (est == NULL) > return -ENOBUFS; > > - est->interval = parm->interval + 2; > + INIT_LIST_HEAD(&est->list); Initializing the members' list_head isn't necessary. ^ permalink raw reply [flat|nested] 25+ messages in thread
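For reference, the two ways of retiring an RCU-protected entry that are being weighed here look roughly like this (a generic sketch reusing the struct gen_estimator fields from the patch above; the est_destroy_* helpers are illustrative, not code from the submission):

---cut---
/* Asynchronous: call_rcu() is safe in atomic context; the callback
 * runs only after a grace period, i.e. after all current readers
 * have left their rcu_read_lock() sections. */
static void __gen_kill_estimator(struct rcu_head *head)
{
	struct gen_estimator *e = container_of(head, struct gen_estimator,
					       e_rcu);
	kfree(e);
}

static void est_destroy_async(struct gen_estimator *e)
{
	list_del_rcu(&e->list);
	call_rcu(&e->e_rcu, __gen_kill_estimator);
}

/* Synchronous: simpler to read, but synchronize_rcu() blocks until
 * the grace period has elapsed, so it may sleep and must not be
 * called with spinlocks held or with bottom halves disabled. */
static void est_destroy_sync(struct gen_estimator *e)
{
	list_del_rcu(&e->list);
	synchronize_rcu();
	kfree(e);
}
---cut---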
* Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree 2007-07-09 13:52 ` Patrick McHardy @ 2007-07-09 16:43 ` Ranko Zivojnovic 2007-07-09 16:54 ` Patrick McHardy 2007-07-10 7:34 ` Jarek Poplawski 0 siblings, 2 replies; 25+ messages in thread From: Ranko Zivojnovic @ 2007-07-09 16:43 UTC (permalink / raw) To: Patrick McHardy; +Cc: Jarek Poplawski, akpm, netdev [-- Attachment #1: Type: text/plain, Size: 2712 bytes --] On Mon, 2007-07-09 at 15:52 +0200, Patrick McHardy wrote: > Ranko Zivojnovic wrote: > > Patrick, I've taken liberty to try and implement this myself. Attached > > is the whole new gen_estimator-fix-locking-and-timer-related-bugs.patch > > that is RCU lists based. Please be kind to review. > > Thanks for taking care of this, I'm a bit behind. Glad to be able to help. > See below for comments .. > > > --- a/net/core/gen_estimator.c 2007-06-25 02:21:48.000000000 +0300 > > +++ b/net/core/gen_estimator.c 2007-07-09 14:27:12.053336875 +0300 > > @@ -79,7 +79,7 @@ > > > > struct gen_estimator > > { > > - struct gen_estimator *next; > > + struct list_head list; > > struct gnet_stats_basic *bstats; > > struct gnet_stats_rate_est *rate_est; > > spinlock_t *stats_lock; > > @@ -89,26 +89,27 @@ > > u32 last_packets; > > u32 avpps; > > u32 avbps; > > + struct rcu_head e_rcu; > > > You could also use synchronize_rcu(), estimator destruction is > not particulary performance critical. I've tried synchronize_rcu(), but it went kaboom with the below, thus call_rcu() must stay: ---cut--- BUG: sleeping function called from invalid context at kernel/sched.c:3928 in_atomic():1, irqs_disabled():0 3 locks held by tc/2933: #0: (rtnl_mutex){--..}, at: [<c0296e0a>] rtnetlink_rcv+0x18/0x42 #1: (&dev->queue_lock){-+..}, at: [<c029c337>] qdisc_lock_tree+0xe/0x1c #2: (&dev->ingress_lock){-...}, at: [<c029dfb8>] tc_get_qdisc+0x192/0x1e9 [<c02eec9e>] wait_for_completion+0x1a/0xb7 [<c01e3cd4>] __spin_lock_init+0x29/0x4b [<c012e410>] synchronize_rcu+0x2a/0x2f [<c012e0bc>] wakeme_after_rcu+0x0/0x8 [<c028da00>] gen_kill_estimator+0x70/0x9b [<f8b7c5a6>] htb_destroy_class+0x27/0x12f [sch_htb] [<f8b7d037>] htb_destroy+0x34/0x70 [sch_htb] [<c029c554>] qdisc_destroy+0x44/0x69 . . . ---cut--- > > static struct gen_estimator_head elist[EST_MAX_INTERVAL+1]; > > > > /* Estimator array lock */ > > -static DEFINE_RWLOCK(est_lock); > > +static DEFINE_SPINLOCK(est_lock); > > > The lock isn't needed anymore since we can rely on the rtnl > for creation/destruction. The 'est_lock' is now completely removed. > > /** > > @@ -152,6 +153,7 @@ > > { > > struct gen_estimator *est; > > struct gnet_estimator *parm = RTA_DATA(opt); > > + int idx; > > > > if (RTA_PAYLOAD(opt) < sizeof(*parm)) > > return -EINVAL; > > @@ -163,7 +165,8 @@ > > if (est == NULL) > > return -ENOBUFS; > > > > - est->interval = parm->interval + 2; > > + INIT_LIST_HEAD(&est->list); > > > Initializing the members' list_head isn't necessary. > Removed. All the changes above are now included in the attached patch. I've compiled it cleanly against 2.6.22 and am now testing this version. R.
[-- Attachment #2: gen_estimator-fix-locking-and-timer-related-bugs.patch --] [-- Type: text/x-patch, Size: 3635 bytes --] --- a/net/core/gen_estimator.c 2007-06-25 02:21:48.000000000 +0300 +++ b/net/core/gen_estimator.c 2007-07-09 19:08:06.801544963 +0300 @@ -79,7 +79,7 @@ struct gen_estimator { - struct gen_estimator *next; + struct list_head list; struct gnet_stats_basic *bstats; struct gnet_stats_rate_est *rate_est; spinlock_t *stats_lock; @@ -89,26 +89,24 @@ u32 last_packets; u32 avpps; u32 avbps; + struct rcu_head e_rcu; }; struct gen_estimator_head { struct timer_list timer; - struct gen_estimator *list; + struct list_head list; }; static struct gen_estimator_head elist[EST_MAX_INTERVAL+1]; -/* Estimator array lock */ -static DEFINE_RWLOCK(est_lock); - static void est_timer(unsigned long arg) { int idx = (int)arg; struct gen_estimator *e; - read_lock(&est_lock); - for (e = elist[idx].list; e; e = e->next) { + rcu_read_lock(); + list_for_each_entry_rcu(e, &elist[idx].list, list) { u64 nbytes; u32 npackets; u32 rate; @@ -127,9 +125,9 @@ e->rate_est->pps = (e->avpps+0x1FF)>>10; spin_unlock(e->stats_lock); } - - mod_timer(&elist[idx].timer, jiffies + ((HZ<<idx)/4)); - read_unlock(&est_lock); + if (!list_empty(&elist[idx].list)) + mod_timer(&elist[idx].timer, jiffies + ((HZ<<idx)/4)); + rcu_read_unlock(); } /** @@ -152,6 +150,7 @@ { struct gen_estimator *est; struct gnet_estimator *parm = RTA_DATA(opt); + int idx; if (RTA_PAYLOAD(opt) < sizeof(*parm)) return -EINVAL; @@ -163,7 +162,7 @@ if (est == NULL) return -ENOBUFS; - est->interval = parm->interval + 2; + est->interval = idx = parm->interval + 2; est->bstats = bstats; est->rate_est = rate_est; est->stats_lock = stats_lock; @@ -173,20 +172,24 @@ est->last_packets = bstats->packets; est->avpps = rate_est->pps<<10; - est->next = elist[est->interval].list; - if (est->next == NULL) { - init_timer(&elist[est->interval].timer); - elist[est->interval].timer.data = est->interval; - elist[est->interval].timer.expires = jiffies + ((HZ<<est->interval)/4); - elist[est->interval].timer.function = est_timer; - add_timer(&elist[est->interval].timer); + if (!elist[idx].timer.function) { + INIT_LIST_HEAD(&elist[idx].list); + setup_timer(&elist[idx].timer, est_timer, est->interval); } - write_lock_bh(&est_lock); - elist[est->interval].list = est; - write_unlock_bh(&est_lock); + + if (list_empty(&elist[est->interval].list)) + mod_timer(&elist[idx].timer, jiffies + ((HZ<<idx)/4)); + + list_add_rcu(&est->list, &elist[idx].list); return 0; } +static void __gen_kill_estimator(struct rcu_head *head) +{ + struct gen_estimator *e = container_of(head, struct gen_estimator, e_rcu); + kfree(e); +} + /** * gen_kill_estimator - remove a rate estimator * @bstats: basic statistics @@ -199,26 +202,21 @@ struct gnet_stats_rate_est *rate_est) { int idx; - struct gen_estimator *est, **pest; + struct gen_estimator *e, *n; for (idx=0; idx <= EST_MAX_INTERVAL; idx++) { - int killed = 0; - pest = &elist[idx].list; - while ((est=*pest) != NULL) { - if (est->rate_est != rate_est || est->bstats != bstats) { - pest = &est->next; - continue; - } - write_lock_bh(&est_lock); - *pest = est->next; - write_unlock_bh(&est_lock); + /* Skip non initialized indexes */ + if (!elist[idx].timer.function) + continue; + + list_for_each_entry_safe(e, n, &elist[idx].list, list) { + if (e->rate_est != rate_est || e->bstats != bstats) + continue; - kfree(est); - killed++; + list_del_rcu(&e->list); + call_rcu(&e->e_rcu, __gen_kill_estimator); } - if (killed && elist[idx].list == NULL) - 
del_timer(&elist[idx].timer); } } ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree 2007-07-09 16:43 ` Ranko Zivojnovic @ 2007-07-09 16:54 ` Patrick McHardy 2007-07-10 7:34 ` Jarek Poplawski 1 sibling, 0 replies; 25+ messages in thread From: Patrick McHardy @ 2007-07-09 16:54 UTC (permalink / raw) To: Ranko Zivojnovic; +Cc: Jarek Poplawski, akpm, netdev Ranko Zivojnovic wrote: > On Mon, 2007-07-09 at 15:52 +0200, Patrick McHardy wrote: > >>You could also use synchronize_rcu(), estimator destruction is >>not particulary performance critical. > > > I've tried synchronize_rcu(), but it went kaboom with the below, thus > call_rcu() must stay: > > ---cut--- > BUG: sleeping function called from invalid context at kernel/sched.c:3928 > in_atomic():1, irqs_disabled():0 Right, I forgot that we're holding some locks. > [...] > All the changes above now included in the attached patch. I've clean > compiled it with 2.6.22 and now testing this version. Looks good. Please resend with a Signed-off-by: line once you're done testing so Dave can apply it (ideally with a new subject so it doesn't get missed). ^ permalink raw reply [flat|nested] 25+ messages in thread
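The locks Patrick forgot about are right there in the trace Ranko posted: tc's teardown path runs with dev->queue_lock (and dev->ingress_lock) held via qdisc_lock_tree(), so nothing beneath it may sleep. In outline (a sketch of the call chain from the trace, not literal code):

---cut---
/* rtnl_mutex held: sleeping is still allowed here */
spin_lock_bh(&dev->queue_lock);	/* qdisc_lock_tree(): atomic from now on */
qdisc_destroy(qdisc);
	/* -> htb_destroy() -> htb_destroy_class()
	 *    -> gen_kill_estimator()
	 *       -> synchronize_rcu()	sleeps -> "BUG: sleeping function
	 *					   called from invalid context" */
spin_unlock_bh(&dev->queue_lock);
---cut---

This is why the deferred call_rcu() variant has to stay: it only queues a callback and never blocks.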
* Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree 2007-07-09 16:43 ` Ranko Zivojnovic 2007-07-09 16:54 ` Patrick McHardy @ 2007-07-10 7:34 ` Jarek Poplawski 2007-07-10 10:09 ` Ranko Zivojnovic 1 sibling, 1 reply; 25+ messages in thread From: Jarek Poplawski @ 2007-07-10 7:34 UTC (permalink / raw) To: Ranko Zivojnovic; +Cc: Patrick McHardy, akpm, netdev On Mon, Jul 09, 2007 at 07:43:40PM +0300, Ranko Zivojnovic wrote: > On Mon, 2007-07-09 at 15:52 +0200, Patrick McHardy wrote: > > Ranko Zivojnovic wrote: > > > Patrick, I've taken liberty to try and implement this myself. Attached > > > is the whole new gen_estimator-fix-locking-and-timer-related-bugs.patch > > > that is RCU lists based. Please be kind to review. ... I have some doubts/suggestions too: > --- a/net/core/gen_estimator.c 2007-06-25 02:21:48.000000000 +0300 > +++ b/net/core/gen_estimator.c 2007-07-09 19:08:06.801544963 +0300 ... > @@ -173,20 +172,24 @@ > est->last_packets = bstats->packets; > est->avpps = rate_est->pps<<10; > > - est->next = elist[est->interval].list; > - if (est->next == NULL) { > - init_timer(&elist[est->interval].timer); > - elist[est->interval].timer.data = est->interval; > - elist[est->interval].timer.expires = jiffies + ((HZ<<est->interval)/4); > - elist[est->interval].timer.function = est_timer; > - add_timer(&elist[est->interval].timer); > + if (!elist[idx].timer.function) { > + INIT_LIST_HEAD(&elist[idx].list); > + setup_timer(&elist[idx].timer, est_timer, est->interval); s/est->interval/idx/ here and below. > } > - write_lock_bh(&est_lock); > - elist[est->interval].list = est; > - write_unlock_bh(&est_lock); > + > + if (list_empty(&elist[est->interval].list)) > + mod_timer(&elist[idx].timer, jiffies + ((HZ<<idx)/4)); > + > + list_add_rcu(&est->list, &elist[idx].list); > return 0; > } > > +static void __gen_kill_estimator(struct rcu_head *head) > +{ > + struct gen_estimator *e = container_of(head, struct gen_estimator, e_rcu); > + kfree(e); > +} > + > /** > * gen_kill_estimator - remove a rate estimator > * @bstats: basic statistics > @@ -199,26 +202,21 @@ > struct gnet_stats_rate_est *rate_est) > { ... > + list_for_each_entry_safe(e, n, &elist[idx].list, list) { IMHO, at least for readability, list_for_each_entry_rcu() is better here. > + if (e->rate_est != rate_est || e->bstats != bstats) > + continue; > > - kfree(est); > - killed++; > + list_del_rcu(&e->list); > + call_rcu(&e->e_rcu, __gen_kill_estimator); I think a race is possible here: e.g. a timer could still be running after return from this function, trying to use *bstats, *rate_est and maybe even stats_lock after their destruction. BTW, I think rcu_read_lock/unlock are recommended around e.g. RCU list traversals, even if the current implementation doesn't use them now. Regards, Jarek P. ^ permalink raw reply [flat|nested] 25+ messages in thread
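The race Jarek describes can be drawn as a timeline (a sketch; it assumes, as in HTB, that bstats/rate_est and the lock behind stats_lock live inside the class or qdisc being torn down):

---cut---
/*
 *  CPU 0: est_timer()                CPU 1: teardown under RTNL
 *  ------------------                --------------------------
 *  rcu_read_lock()
 *  e = entry in elist[idx]
 *                                    gen_kill_estimator()
 *                                      list_del_rcu(&e->list)
 *                                      call_rcu(...)  <- defers kfree(e) only
 *                                    htb_destroy_class() frees the class,
 *                                    and with it *e->bstats, *e->rate_est
 *                                    and the spinlock behind e->stats_lock
 *  spin_lock(e->stats_lock)          <- use after free
 *  e->rate_est->bps = ...            <- use after free
 *  rcu_read_unlock()
 *                                    (only now may the RCU callback free e)
 */
---cut---

The grace period protects the gen_estimator structure itself, but nothing here defers the freeing of the objects it points at.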
* Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree 2007-07-10 7:34 ` Jarek Poplawski @ 2007-07-10 10:09 ` Ranko Zivojnovic 2007-07-10 12:17 ` Jarek Poplawski 0 siblings, 1 reply; 25+ messages in thread From: Ranko Zivojnovic @ 2007-07-10 10:09 UTC (permalink / raw) To: Jarek Poplawski; +Cc: Patrick McHardy, akpm, netdev On Tue, 2007-07-10 at 09:34 +0200, Jarek Poplawski wrote: > On Mon, Jul 09, 2007 at 07:43:40PM +0300, Ranko Zivojnovic wrote: > > On Mon, 2007-07-09 at 15:52 +0200, Patrick McHardy wrote: > > > Ranko Zivojnovic wrote: > > > > Patrick, I've taken liberty to try and implement this myself. Attached > > > > is the whole new gen_estimator-fix-locking-and-timer-related-bugs.patch > > > > that is RCU lists based. Please be kind to review. > ... > > I've some doubts/suggestions too: > > > --- a/net/core/gen_estimator.c 2007-06-25 02:21:48.000000000 +0300 > > +++ b/net/core/gen_estimator.c 2007-07-09 19:08:06.801544963 +0300 > ... > > @@ -173,20 +172,24 @@ > > est->last_packets = bstats->packets; > > est->avpps = rate_est->pps<<10; > > > > - est->next = elist[est->interval].list; > > - if (est->next == NULL) { > > - init_timer(&elist[est->interval].timer); > > - elist[est->interval].timer.data = est->interval; > > - elist[est->interval].timer.expires = jiffies + ((HZ<<est->interval)/4); > > - elist[est->interval].timer.function = est_timer; > > - add_timer(&elist[est->interval].timer); > > + if (!elist[idx].timer.function) { > > + INIT_LIST_HEAD(&elist[idx].list); > > + setup_timer(&elist[idx].timer, est_timer, est->interval); > > s/est->interval/idx/ here and below. Agreed - will do in the final submission. > > > } > > - write_lock_bh(&est_lock); > > - elist[est->interval].list = est; > > - write_unlock_bh(&est_lock); > > + > > + if (list_empty(&elist[est->interval].list)) > > + mod_timer(&elist[idx].timer, jiffies + ((HZ<<idx)/4)); > > + > > + list_add_rcu(&est->list, &elist[idx].list); > > return 0; > > } > > > > +static void __gen_kill_estimator(struct rcu_head *head) > > +{ > > + struct gen_estimator *e = container_of(head, struct gen_estimator, e_rcu); > > + kfree(e); > > +} > > + > > /** > > * gen_kill_estimator - remove a rate estimator > > * @bstats: basic statistics > > @@ -199,26 +202,21 @@ > > struct gnet_stats_rate_est *rate_est) > > { > ... > > + list_for_each_entry_safe(e, n, &elist[idx].list, list) { > > IMHO, at least for readability list_for_each_entry_rcu() is better here. It should not hurt - however, functionality-wise it is not necessary, as the list is protected from being altered by rtnl_mutex - also, for correctness, it would have to be list_for_each_safe_rcu(), as there is a list_del_rcu() in the loop... However, I decided not to use _rcu-based iteration, nor rcu_read_lock(), after going through the RCU documentation and a bunch of examples in the kernel that iterate through such lists using non-_rcu macros and do list_del_rcu() just fine. For readability, the references to list_del_rcu() as well as call_rcu(), I believe, should be enough of an indication. Please do correct me if I am wrong here. > > > + if (e->rate_est != rate_est || e->bstats != bstats) > > + continue; > > > > - kfree(est); > > - killed++; > > + list_del_rcu(&e->list); > > + call_rcu(&e->e_rcu, __gen_kill_estimator); > > I think a race is possible here: e.g. a timer could be running > after return from this function yet, and trying to use *bstats, > *rate_est and maybe even stats_lock after their destruction.
> That will not happen, because est_timer protects the loop with rcu_read_lock() and the iteration is done using list_for_each_entry_rcu(). The destruction callback is deferred and will happen only after the outermost rcu_read_unlock() is called. Well - that is at least what the documentation says... > BTW, I think, rcu_read_lock/unlock are recommended around e.g. > rcu lists traversals, even if the current implementation doesn't > use them now. I am afraid I do not understand what it is that the current implementation is not using right now... The whole concept and implementation of RCU, as I understand it, depend on rcu_read_lock/unlock to protect the read-side critical sections... Please be so kind as to elaborate... R. ^ permalink raw reply [flat|nested] 25+ messages in thread
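Both halves of this exchange hold up: the deferral Ranko relies on does protect the gen_estimator itself, while Jarek is asking about the objects it merely points at. Schematically (generic RCU semantics, not code from the patch):

---cut---
rcu_read_lock();
list_for_each_entry_rcu(e, &elist[idx].list, list) {
	/* Guaranteed: e itself cannot be kfree()d while we are in this
	 * section - call_rcu() callbacks run only after every reader
	 * that might still see the entry has called rcu_read_unlock(). */
	spin_lock(e->stats_lock);
	/* Not guaranteed: that *e->bstats and *e->rate_est still exist -
	 * they are owned and freed elsewhere, outside this RCU scheme. */
	spin_unlock(e->stats_lock);
}
rcu_read_unlock();
---cut---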
* Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree 2007-07-10 10:09 ` Ranko Zivojnovic @ 2007-07-10 12:17 ` Jarek Poplawski 2007-07-10 12:20 ` Patrick McHardy 0 siblings, 1 reply; 25+ messages in thread From: Jarek Poplawski @ 2007-07-10 12:17 UTC (permalink / raw) To: Ranko Zivojnovic; +Cc: Patrick McHardy, akpm, netdev On Tue, Jul 10, 2007 at 01:09:07PM +0300, Ranko Zivojnovic wrote: > On Tue, 2007-07-10 at 09:34 +0200, Jarek Poplawski wrote: > > On Mon, Jul 09, 2007 at 07:43:40PM +0300, Ranko Zivojnovic wrote: > > > On Mon, 2007-07-09 at 15:52 +0200, Patrick McHardy wrote: > > > > Ranko Zivojnovic wrote: > > > > > Patrick, I've taken liberty to try and implement this myself. Attached > > > > > is the whole new gen_estimator-fix-locking-and-timer-related-bugs.patch > > > > > that is RCU lists based. Please be kind to review. > > ... > > > > I've some doubts/suggestions too: > > > > > --- a/net/core/gen_estimator.c 2007-06-25 02:21:48.000000000 +0300 > > > +++ b/net/core/gen_estimator.c 2007-07-09 19:08:06.801544963 +0300 ... > > > struct gnet_stats_rate_est *rate_est) > > > { > > ... > > > + list_for_each_entry_safe(e, n, &elist[idx].list, list) { > > > > IMHO, at least for readability list_for_each_entry_rcu() is better here. > > Should not hurt - however functionality wise not necessary as the list > is protected by rtnl_mutex from being altered - also, for correctness, > it should be list_for_each_safe_rcu() as there is a list_del_rcu in the > loop... > > However I decided not to use _rcu based iteration neither the > rcu_read_lock() after going through the RCU documentation and a bunch of > examples in kernel that iterate through the lists using non _rcu macros > and do list_del_rcu() just fine. > > For readability, the reference to list_del_rcu as well as call_rcu, I > believe, should be enough of the indication. Please do correct me if I > am wrong here. It's only my opinion, and it's probably not very popular, at least in net/ code, so it's more about general policy than this particular code. But: - if somebody is hunting some RCU-related problem, why can't he/she save some time by skipping lists without _rcu, rather than analyzing why/how such lists are used and locked? - if some day the RCU code should change and needs to know precisely where all the reads and writes take place (or some automatic tool needs this for checking), and it's no problem to show this now, why leave this for the future? > > > > > > + if (e->rate_est != rate_est || e->bstats != bstats) > > > + continue; > > > > > > - kfree(est); > > > - killed++; > > > + list_del_rcu(&e->list); > > > + call_rcu(&e->e_rcu, __gen_kill_estimator); > > > > I think a race is possible here: e.g. a timer could be running > > after return from this function yet, and trying to use *bstats, > > *rate_est and maybe even stats_lock after their destruction. > > > > That will not happen because est_timer protects the loop with > rcu_read_lock() as well as the iteration is done using > list_for_each_entry_rcu(). The destruction callback is deferred and will > happen only after the outermost rcu_read_unlock() is called. Well - that > is at least what the documentation says... I'm not very sure it's really possible, so I hope it'll be checked by somebody else as well. I don't mean the gen_estimator struct here, but the gnet_stats_basic and gnet_stats_rate_est structs, which are allocated elsewhere and could be destroyed just 'after' - if not before: that's the question... E.g.
when we are leaving this function, a timer on another CPU could still be traversing this list, and I'm not sure RCU has to make the change visible fast enough - before these other structs are freed. > > > BTW, I think, rcu_read_lock/unlock are recommended around e.g. > > rcu lists traversals, even if the current implementation doesn't > > use them now. > > I am afraid I do not understand what that the current implementation is > not using right now... The whole concept and the implementation of the > RCU, as I understand it, depends on rcu_read_lock/unlock to protect the > read-side critical sections... > > Please be kind to elaborate... Don't take this very seriously, there are many opinions... Cheers, Jarek P. PS: I think you could read more about this here: http://marc.info/?l=linux-kernel&m=118193431820959&w=2 Especially this: List: linux-kernel Subject: Re: Using RCU with rcu_read_lock()? From: Peter Zijlstra <a.p.zijlstra () chello ! nl> Date: 2007-06-15 19:04:19 Message-ID: 1181934259.11113.6.camel () lappy On Fri, 2007-06-15 at 15:00 -0400, Dmitry Torokhov wrote: > Hi, > > I have a piece of code that is always called under a spinlock with > interrupts disabled. Within that piece of code I iterate through a > list. I have another piece of code that wants to modify that list. I > have 2 options: > > 1) Have the other piece of code acquire the same spinlock > 2) Use RCU > > I don't want to do 1) because the other piece of code does not really > care about object owning the spinlock and so acquiring the spinlock is > "not nice". However it is guaranteed that the piece of code that > accesses lock runs atomically with interrupts disabled. So > rcu_read_lock() would be superfluous there. > > Is it possible to still use list_for_each_rcu() and friends to access > that list without rcu_read_lock()? Or is it better to use the complete RCU > interface and eat the cost of a couple of extra instructions? Yes, preemptible rcu requires that you use the full interface; also, it more clearly documents the code. Trying to find code that breaks these assumptions is very tedious work after the fact. Please do use the RCU interface in full. ^ permalink raw reply [flat|nested] 25+ messages in thread
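The rule of thumb from that linked thread, in code form (a generic sketch; mylock, myhead and handle() are placeholders): even in a section that is already atomic - spinlock held, interrupts off - the RCU traversal should still be bracketed explicitly, both because it documents the read side and because a preemptible RCU implementation makes rcu_read_lock() do real work:

---cut---
spin_lock_irqsave(&mylock, flags);	/* context is already atomic */
rcu_read_lock();			/* still required by the RCU API */
list_for_each_entry_rcu(p, &myhead, list)
	handle(p);			/* read-side access only */
rcu_read_unlock();
spin_unlock_irqrestore(&mylock, flags);
---cut---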
* Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree 2007-07-10 12:17 ` Jarek Poplawski @ 2007-07-10 12:20 ` Patrick McHardy 2007-07-10 13:10 ` Jarek Poplawski 0 siblings, 1 reply; 25+ messages in thread From: Patrick McHardy @ 2007-07-10 12:20 UTC (permalink / raw) To: Jarek Poplawski; +Cc: Ranko Zivojnovic, akpm, netdev Jarek Poplawski wrote: > On Tue, Jul 10, 2007 at 01:09:07PM +0300, Ranko Zivojnovic wrote: > >>However I decided not to use _rcu based iteration neither the >>rcu_read_lock() after going through the RCU documentation and a bunch of >>examples in kernel that iterate through the lists using non _rcu macros >>and do list_del_rcu() just fine. >> >>For readability, the reference to list_del_rcu as well as call_rcu, I >>believe, should be enough of the indication. Please do correct me if I >>am wrong here. > > > It's only my opinion, and it's probably not very popular at least > at net/ code, so it's more about general policy and not this > particular code. But: > - if somebody is looking after some rcu related problems, why can't > he/she spare some time by omitting lists without _rcu and not > analyzing why/how such lists are used and locked? RCU is used for the read-side; using it on the write-side just makes things *less* understandable IMO. It will look like the read-side but still do modifications. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree 2007-07-10 12:20 ` Patrick McHardy @ 2007-07-10 13:10 ` Jarek Poplawski 2007-07-10 13:51 ` Jarek Poplawski 0 siblings, 1 reply; 25+ messages in thread From: Jarek Poplawski @ 2007-07-10 13:10 UTC (permalink / raw) To: Patrick McHardy; +Cc: Ranko Zivojnovic, akpm, netdev On Tue, Jul 10, 2007 at 02:20:12PM +0200, Patrick McHardy wrote: > Jarek Poplawski wrote: > > On Tue, Jul 10, 2007 at 01:09:07PM +0300, Ranko Zivojnovic wrote: > > > >>However I decided not to use _rcu based iteration neither the > >>rcu_read_lock() after going through the RCU documentation and a bunch of > >>examples in kernel that iterate through the lists using non _rcu macros > >>and do list_del_rcu() just fine. > >> > >>For readability, the reference to list_del_rcu as well as call_rcu, I > >>believe, should be enough of the indication. Please do correct me if I > >>am wrong here. > > > > > > It's only my opinion, and it's probably not very popular at least > > at net/ code, so it's more about general policy and not this > > particular code. But: > > - if somebody is looking after some rcu related problems, why can't > > he/she spare some time by omitting lists without _rcu and not > > analyzing why/how such lists are used and locked? > > > RCU is used for the read-side, using it on the write-side just makes > things *less* understandable IMO. It will look like the read-side > but still do modifications. > From Documentation/RCU/checklist: "9. All RCU list-traversal primitives, which include list_for_each_rcu(), list_for_each_entry_rcu(), list_for_each_continue_rcu(), and list_for_each_safe_rcu(), must be within an RCU read-side critical section. RCU read-side critical sections are delimited by rcu_read_lock() and rcu_read_unlock(), or by similar primitives such as rcu_read_lock_bh() and rcu_read_unlock_bh(). Use of the _rcu() list-traversal primitives outside of an RCU read-side critical section causes no harm other than a slight performance degradation on Alpha CPUs. It can also be quite helpful in reducing code bloat when common code is shared between readers and updaters." So, it seems, it's a question of taste as to what counts as bloat here. I'd certainly agree with you if the write-side looked like the examples from the documentation: there are always a few lines of code with some spinlocks. But here we see no locks and no comments, so each time we have to guess which of the locks supposedly held here is needed for RCU, or maybe for the lists without "rcu". Sorry for messing here with my private opinions: I really shouldn't have started this theme. Jarek P. ^ permalink raw reply [flat|nested] 25+ messages in thread
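The write-side shape the checklist has in mind - where the writer's serialization is visible right next to the list update - looks like this (a generic sketch in the documentation's style; mylist_lock, myhead and search() are placeholders, not gen_estimator code):

---cut---
spin_lock(&mylist_lock);	/* writer lock, visible at the update site */
p = search(&myhead, key);
if (p != NULL)
	list_del_rcu(&p->list);
spin_unlock(&mylist_lock);
if (p != NULL) {
	synchronize_rcu();	/* or call_rcu() a callback that frees p */
	kfree(p);
}
---cut---

In gen_estimator the writer's lock is the RTNL mutex, taken far up the call chain, which is why nothing at the update site hints at how the list is serialized - Jarek's complaint - while decorating the writer with _rcu iteration could make it read like a reader - Patrick's counterpoint.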
* Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree 2007-07-10 13:10 ` Jarek Poplawski @ 2007-07-10 13:51 ` Jarek Poplawski 0 siblings, 0 replies; 25+ messages in thread From: Jarek Poplawski @ 2007-07-10 13:51 UTC (permalink / raw) To: Patrick McHardy; +Cc: Ranko Zivojnovic, akpm, netdev On Tue, Jul 10, 2007 at 03:10:34PM +0200, Jarek Poplawski wrote: > On Tue, Jul 10, 2007 at 02:20:12PM +0200, Patrick McHardy wrote: > > Jarek Poplawski wrote: > > > On Tue, Jul 10, 2007 at 01:09:07PM +0300, Ranko Zivojnovic wrote: > > > > > > >>However I decided not to use _rcu based iteration neither the > > >>rcu_read_lock() after going through the RCU documentation and a bunch of > > >>examples in kernel that iterate through the lists using non _rcu macros > > >>and do list_del_rcu() just fine. > > >> > > >>For readability, the reference to list_del_rcu as well as call_rcu, I > > >>believe, should be enough of the indication. Please do correct me if I > > >>am wrong here. > > > > > > > > > It's only my opinion, and it's probably not very popular at least > > > at net/ code, so it's more about general policy and not this > > > particular code. But: > > > - if somebody is looking after some rcu related problems, why can't > > > he/she spare some time by omitting lists without _rcu and not > > > analyzing why/how such lists are used and locked? > > > > > > RCU is used for the read-side, using it on the write-side just makes > > things *less* understandable IMO. It will look like the read-side > > but still do modifications. > > > > From Documentation/RCU/checklist: > > "9. All RCU list-traversal primitives, which include > list_for_each_rcu(), list_for_each_entry_rcu(), > list_for_each_continue_rcu(), and list_for_each_safe_rcu(), > must be within an RCU read-side critical section. RCU > read-side critical sections are delimited by rcu_read_lock() > and rcu_read_unlock(), or by similar primitives such as > rcu_read_lock_bh() and rcu_read_unlock_bh(). ...But, on the other hand, Ranko didn't use any of these primitives... So, first I said he should use them, and then I asked why there were no rcu_read_locks around... I am really starting to amaze myself with my consistency. I hope some day you'll forgive me (no hurry, I can wait). Cheers, Jarek P. ^ permalink raw reply [flat|nested] 25+ messages in thread
Thread overview: 25+ messages
[not found] <200706271921.l5RJLgCC003910@imap1.linux-foundation.org>
[not found] ` <1183642800.3789.11.camel@ranko-fc2.spidernet.net>
[not found] ` <20070705142135.GG4759@ff.dom.local>
[not found] ` <1183646029.4069.11.camel@ranko-fc2.spidernet.net>
[not found] ` <1183651165.4069.26.camel@ranko-fc2.spidernet.net>
2007-07-06 6:14 ` + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree Jarek Poplawski
2007-07-06 6:20 ` Fwd: " Jarek Poplawski
2007-07-06 6:26 ` Jarek Poplawski
2007-07-06 6:45 ` Jarek Poplawski
2007-07-06 12:47 ` Jarek Poplawski
2007-07-06 13:16 ` Ranko Zivojnovic
2007-07-09 8:25 ` Jarek Poplawski
2007-07-06 13:14 ` Ranko Zivojnovic
2007-07-06 13:27 ` Patrick McHardy
2007-07-06 13:59 ` Ranko Zivojnovic
2007-07-06 14:21 ` Patrick McHardy
2007-07-06 14:55 ` Ranko Zivojnovic
2007-07-07 7:55 ` Ranko Zivojnovic
2007-07-07 15:10 ` Patrick McHardy
2007-07-09 7:47 ` Jarek Poplawski
2007-07-09 12:41 ` Ranko Zivojnovic
2007-07-09 13:52 ` Patrick McHardy
2007-07-09 16:43 ` Ranko Zivojnovic
2007-07-09 16:54 ` Patrick McHardy
2007-07-10 7:34 ` Jarek Poplawski
2007-07-10 10:09 ` Ranko Zivojnovic
2007-07-10 12:17 ` Jarek Poplawski
2007-07-10 12:20 ` Patrick McHardy
2007-07-10 13:10 ` Jarek Poplawski
2007-07-10 13:51 ` Jarek Poplawski