* Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree
       [not found] ` <1183651165.4069.26.camel@ranko-fc2.spidernet.net>
@ 2007-07-06  6:14 ` Jarek Poplawski
  2007-07-06  6:20   ` Fwd: " Jarek Poplawski
  2007-07-06  6:26   ` Jarek Poplawski
  0 siblings, 2 replies; 25+ messages in thread
From: Jarek Poplawski @ 2007-07-06  6:14 UTC (permalink / raw)
  To: Ranko Zivojnovic; +Cc: akpm, kaber, netdev

On Thu, Jul 05, 2007 at 06:59:25PM +0300, Ranko Zivojnovic wrote:
> On Thu, 2007-07-05 at 17:34 +0300, Ranko Zivojnovic wrote:
> > Anyhow - I am currently running 2.6.22-rc6-mm1 + sch_htb patch and
> > running a test script that always managed to reproduce the problem
> > within half an hour - so far it looks good, but I will leave it
> > hammering until tomorrow and will let you know.
> >
>
> Ahm...it managed to get stuck:
>
> Relevant piece of the process table ('tc' stays stuck):
>  9608 pts/0    S+     0:00 /bin/bash ./tc-crash.sh
>  9609 pts/0    D+     0:00 tc qdisc del dev lo root
>
> There was no soft lockup kernel warning, however sysrq 'w' & 'q' dump
> the following:
>
> ---cut---

I have to admit you are really good at this testing! Especially after
this first htb bug, I was almost stunned at how long such a repeatable
and not very well hidden bug could live in one of the most popular
networking modules... This new lockup bug you have just found will
need some time to figure out. BTW, I wonder if you had lockdep on
(CONFIG_PROVE_LOCKING or CONFIG_LOCK_ALLOC)?

Thanks & regards,
Jarek P.

PS: I hope you don't mind that I forwarded this to netdev.
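[For readers following along at home: the lockdep options Jarek is asking
about live under "Kernel hacking" in the kernel configuration. A minimal
.config fragment enabling them (the exact option names Ranko later reports
using appear further down the thread) would look like this:]

    CONFIG_PROVE_LOCKING=y
    CONFIG_DEBUG_LOCK_ALLOC=y
    CONFIG_DEBUG_SPINLOCK=y
    CONFIG_DETECT_SOFTLOCKUP=y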
* Fwd: Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree
  2007-07-06  6:14 ` + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree Jarek Poplawski
@ 2007-07-06  6:20 ` Jarek Poplawski
  2007-07-06  6:26 ` Jarek Poplawski
  1 sibling, 0 replies; 25+ messages in thread
From: Jarek Poplawski @ 2007-07-06  6:20 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: Ranko Zivojnovic, akpm, kaber, netdev

----- Forwarded message from Ranko Zivojnovic <ranko@spidernet.net> -----

> Date: Thu, 05 Jul 2007 18:59:25 +0300
> From: Ranko Zivojnovic <ranko@spidernet.net>
> Subject: Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added
> 	to -mm tree
> To: Jarek Poplawski <jarkao2@o2.pl>
> Cc: akpm@linux-foundation.org, kaber@trash.net
> X-Original-To: info
> Delivered-To: info@dom.local
> In-Reply-To: <1183646029.4069.11.camel@ranko-fc2.spidernet.net>
> Organization: SpiderNet Services Public Ltd.
> X-Mailer: Evolution 2.8.3 (2.8.3-2.fc6)
> X-o2-Trust: 2, 43
>
> On Thu, 2007-07-05 at 17:34 +0300, Ranko Zivojnovic wrote:
> > Anyhow - I am currently running 2.6.22-rc6-mm1 + sch_htb patch and
> > running a test script that always managed to reproduce the problem
> > within half an hour - so far it looks good, but I will leave it
> > hammering until tomorrow and will let you know.
> >
>
> Ahm...it managed to get stuck:
>
> Relevant piece of the process table ('tc' stays stuck):
>  9608 pts/0    S+     0:00 /bin/bash ./tc-crash.sh
>  9609 pts/0    D+     0:00 tc qdisc del dev lo root
>
> There was no soft lockup kernel warning, however sysrq 'w' & 'q' dump
> the following:
>
> ---cut---
> SysRq : Show Blocked State
>
>                          free                   sibling
>   task        PC         stack  pid father child younger older
> kjournald     D 00000531     0   337      2 (L-TLB)
>        c3083e9c 00000046 c4ef3383 00000531 c3083e84 00000000 c292f7fc c3082000
>        00000000 00000000 c2903388 c29031d0 c2903388 c150a000 00000000 c03b5d80
>        c29d9480 c2949080 c03ff080 c03ff080 c289e7fc 00000000 c03ff080 0054b8e6
> Call Trace:
>  [<c02e9e1b>] io_schedule+0x1e/0x28
>  [<c0184915>] sync_buffer+0x34/0x37
>  [<c02ea416>] __wait_on_bit+0x42/0x5e
>  [<c01848e1>] sync_buffer+0x0/0x37
>  [<c01848e1>] sync_buffer+0x0/0x37
>  [<c02ea495>] out_of_line_wait_on_bit+0x63/0x6b
>  [<c01320a2>] wake_bit_function+0x0/0x3c
>  [<c0184873>] __wait_on_buffer+0x24/0x29
>  [<e083392e>] journal_commit_transaction+0x63d/0x10df [jbd]
>  [<c02eb265>] _spin_unlock_irq+0x5/0x7
>  [<c02e9713>] schedule+0x323/0x867
>  [<c02eb277>] _spin_lock_irqsave+0x9/0xd
>  [<c0128dfa>] lock_timer_base+0x19/0x35
>  [<c0128e5d>] try_to_del_timer_sync+0x47/0x4f
>  [<e0837899>] kjournald+0xac/0x1f4 [jbd]
>  [<c013206b>] autoremove_wake_function+0x0/0x37
>  [<e08377ed>] kjournald+0x0/0x1f4 [jbd]
>  [<c0131e26>] kthread+0x34/0x56
>  [<c0131df2>] kthread+0x0/0x56
>  [<c0104023>] kernel_thread_helper+0x7/0x14
> =======================
> syslogd       D 00000531     0  1414      1 (NOTLB)
>        c3045e84 00200082 c4e83c2f 00000531 c1506a80 00000002 00000000 c3044000
>        1d5003ae 00000552 c29de888 c29de6d0 c29de888 c150a000 00000000 00000003
>        00200082 c2949080 c03ff080 c03ff080 c2ac46b0 00000000 c03ff080 0054b8e6
> Call Trace:
>  [<e0836bdb>] log_wait_commit+0xaf/0x116 [jbd]
>  [<c013206b>] autoremove_wake_function+0x0/0x37
>  [<e08316b1>] journal_stop+0x158/0x1f7 [jbd]
>  [<c0180046>] __writeback_single_inode+0x267/0x329
>  [<c014bcd7>] generic_writepages+0x20/0x29
>  [<c0180121>] sync_inode+0x19/0x2a
>  [<e086c4f4>] ext3_sync_file+0xbc/0xc8 [ext3]
>  [<c0182d89>] do_fsync+0x55/0x8a
>  [<c0182ddb>] __do_fsync+0x1d/0x2b
>  [<c01033d2>] sysenter_past_esp+0x5f/0x85
> =======================
> irqbalance    D 0000040E     0  1428      1 (NOTLB)
>        c304becc 00200082 f5b6412a 0000040e c304beb4 00000000 c34e0000 c304a000
>        00000001 c2a754c4 c22a2848 c22a2690 c22a2848 c1513000 00000001 c03b5d80
>        0000000a c286e480 c03ff080 c03ff080 c03631be c18667e4 c03ff080 003fd3f2
> Call Trace:
>  [<c02ea650>] __mutex_lock_slowpath+0x54/0x8d
>  [<c02ea50e>] mutex_lock+0x21/0x26
>  [<c027d0ef>] sock_ioctl+0x0/0x1c6
>  [<c027d0ef>] sock_ioctl+0x0/0x1c6
>  [<c02895a2>] dev_ioctl+0x20c/0x309
>  [<c018d4d1>] inotify_d_instantiate+0x1b/0x80
>  [<c0175858>] d_instantiate+0x56/0x8a
>  [<c027d12f>] sock_ioctl+0x40/0x1c6
>  [<c027d0ef>] sock_ioctl+0x0/0x1c6
>  [<c017046f>] do_ioctl+0x1f/0x6d
>  [<c027e39f>] sys_socket+0x29/0x44
>  [<c017050d>] vfs_ioctl+0x50/0x26b
>  [<c0170785>] sys_ioctl+0x5d/0x6c
>  [<c01033d2>] sysenter_past_esp+0x5f/0x85
> =======================
> tc            D 00000409     0  9609   9608 (NOTLB)
>        c37a7be8 00200086 ee308e07 00000409 c1506a80 00000002 00000000 c37a6000
>        3e416208 00000414 c29021e8 c2902030 c29021e8 c150a000 00000000 c321c4c4
>        c37a7bd0 c19d36c0 c03ff080 c03ff080 c0134d71 00200286 c03ff080 003fe1ed
> Call Trace:
>  [<c0134d71>] lock_hrtimer_base+0x15/0x2f
>  [<c02e9d49>] wait_for_completion+0x9e/0xb7
>  [<c011b805>] default_wake_function+0x0/0xc
>  [<c0130122>] synchronize_rcu+0x2a/0x2f
>  [<c012fddb>] wakeme_after_rcu+0x0/0x8
>  [<c0295ba5>] dev_deactivate+0x8a/0xa3
>  [<c0296f12>] dev_graft_qdisc+0x81/0xa4
>  [<c0296f5d>] qdisc_graft+0x28/0x88
>  [<c0297709>] tc_get_qdisc+0x15d/0x1e9
>  [<c02975ac>] tc_get_qdisc+0x0/0x1e9
>  [<c02904f9>] rtnetlink_rcv_msg+0x1c2/0x1f5
>  [<c029b0a0>] netlink_run_queue+0xa7/0x10e
>  [<c0290337>] rtnetlink_rcv_msg+0x0/0x1f5
>  [<c02902e9>] rtnetlink_rcv+0x26/0x42
>  [<c029b5c9>] netlink_data_ready+0x12/0x54
>  [<c029a244>] netlink_sendskb+0x1f/0x53
>  [<c029b4d8>] netlink_sendmsg+0x1f5/0x2d4
>  [<c027dae6>] sock_sendmsg+0xe2/0xfd
>  [<c013206b>] autoremove_wake_function+0x0/0x37
>  [<c013206b>] autoremove_wake_function+0x0/0x37
>  [<c027dae6>] sock_sendmsg+0xe2/0xfd
>  [<c01d9ec3>] copy_from_user+0x2d/0x59
>  [<c027dc2e>] sys_sendmsg+0x12d/0x243
>  [<c014805a>] filemap_fault+0x125/0x42d
>  [<c0151be7>] __do_fault+0x18f/0x37d
>  [<c014afb4>] __alloc_pages+0x4f/0x358
>  [<c0147f35>] filemap_fault+0x0/0x42d
>  [<c0153059>] __handle_mm_fault+0x114/0x662
>  [<c0155717>] acct_stack_growth+0x86/0x9a
>  [<c027ee6d>] sys_socketcall+0x24f/0x271
>  [<c011632c>] do_page_fault+0x0/0x6b4
>  [<c01033d2>] sysenter_past_esp+0x5f/0x85
> =======================
> Sched Debug Version: v0.03, cfs-v18, 2.6.22-rc6-mm1.SNET.Thors.htbpatch.2 #2
> now at 5852817390013 nsecs
>
> cpu#0, 2992.428 MHz
>   .nr_running                 : 0
>   .load                       : 0
>   .ls.delta_fair              : 0
>   .ls.delta_exec              : 0
>   .nr_switches                : 339295
>   .nr_load_updates            : 5726868
>   .nr_uninterruptible         : 4294967087
>   .jiffies                    : 5552844
>   .next_balance               : 5552847
>   .curr->pid                  : 0
>   .clock                      : 5711615883107
>   .prev_clock_raw             : 5850635770741
>   .clock_warps                : 1
>   .clock_overflows            : 12712
>   .clock_unstable_events      : 0
>   .clock_max_delta            : 1999660
>   .cfs.fair_clock             : 1406994636435
>   .cfs.exec_clock             : 1106293764887
>   .cfs.wait_runtime           : -700502353213
>   .cfs.wait_runtime_overruns  : 36267
>   .cfs.wait_runtime_underruns : 1329
>   .cpu_load[0]                : 0
>   .cpu_load[1]                : 0
>   .cpu_load[2]                : 0
>   .cpu_load[3]                : 0
>   .cpu_load[4]                : 0
>   .wait_runtime_rq_sum        : 0
>
> runnable tasks:
>             task   PID        tree-key         delta       waiting  switches  prio        sum-exec        sum-wait       sum-sleep    wait-overrun   wait-underrun
> ------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> cpu#1, 2992.428 MHz
>   .nr_running                 : 1
>   .load                       : 1024
>   .ls.delta_fair              : 536356056422
>   .ls.delta_exec              : 8760209027
>   .nr_switches                : 401913
>   .nr_load_updates            : 4460569
>   .nr_uninterruptible         : 213
>   .jiffies                    : 5552969
>   .next_balance               : 4183026
>   .curr->pid                  : 4536
>   .clock                      : 4465134326215
>   .prev_clock_raw             : 5850235012355
>   .clock_warps                : 0
>   .clock_overflows            : 6100
>   .clock_unstable_events      : 0
>   .clock_max_delta            : 1999533
>   .cfs.fair_clock             : 1265459395930
>   .cfs.exec_clock             : 1019469215716
>   .cfs.wait_runtime           : -866196108791
>   .cfs.wait_runtime_overruns  : 35706
>   .cfs.wait_runtime_underruns : 4327
>   .cpu_load[0]                : 2302
>   .cpu_load[1]                : 2514
>   .cpu_load[2]                : 1966
>   .cpu_load[3]                : 1532
>   .cpu_load[4]                : 1292
>   .wait_runtime_rq_sum        : 15731878
>
> runnable tasks:
>             task   PID        tree-key         delta       waiting  switches  prio        sum-exec        sum-wait       sum-sleep    wait-overrun   wait-underrun
> ------------------------------------------------------------------------------------------------------------------------------------------------------------------
> R           bash  4536   1265443664052     -15731878      15731878       192   120       144540956       410343770   2862976805398              19               0
>
>
> SysRq : Show Pending Timers
> Timer List Version: v0.3
> HRTIMER_MAX_CLOCK_BASES: 2
> now at 6550224915183 nsecs
>
> cpu: 0
>  clock 0:
>   .index:      0
>   .resolution: 1 nsecs
>   .get_time:   ktime_get_real
>   .offset:     1183655045226773151 nsecs
> active timers:
>  clock 1:
>   .index:      1
>   .resolution: 1 nsecs
>   .get_time:   ktime_get
>   .offset:     0 nsecs
> active timers:
>  #0: <3>BUG: sleeping function called from invalid context at kernel/mutex.c:86
> in_atomic():0, irqs_disabled():1
>  [<c02ea502>] mutex_lock+0x15/0x26
>  [<c021f810>] vt_console_print+0x0/0x287
>  [<c013ff72>] lookup_module_symbol_name+0x14/0xbb
>  [<c01379fc>] print_name_offset+0x1a/0x8e
>  [<c01214c4>] release_console_sem+0x1ae/0x1c9
>  [<c0117f03>] __wake_up_common+0x39/0x59
>  [<c0136787>] getnstimeofday+0x30/0xc9
>  [<c0135116>] ktime_get_ts+0x16/0x44
>  [<c0138389>] timer_list_show+0x65c/0xa5d
>  [<c0139a24>] tick_sched_timer+0x0/0x178
>  [<c0223e93>] __handle_sysrq+0x7a/0x115
>  [<c01a0078>] write_sysrq_trigger+0x0/0x2d
>  [<c01a00a3>] write_sysrq_trigger+0x2b/0x2d
>  [<c019a7f7>] proc_reg_write+0x58/0x79
>  [<c0165a1a>] vfs_write+0xa6/0x13e
>  [<c019a79f>] proc_reg_write+0x0/0x79
>  [<c016600a>] sys_write+0x41/0x6a
>  [<c01033d2>] sysenter_past_esp+0x5f/0x85
> =======================
> <c34dbef0>, tick_sched_timer, S:01
>  # expires at 6550223000000 nsecs [in 18446744073707636433 nsecs]
>  #1: <c34dbef0>, hrtimer_wakeup, S:01
>  # expires at 6560773230863 nsecs [in 10548315680 nsecs]
>  #2: <c34dbef0>, it_real_fn, S:01
>  # expires at 7238128506303 nsecs [in 687903591120 nsecs]
>  #3: <c34dbef0>, it_real_fn, S:01
>  # expires at 7238392105656 nsecs [in 688167190473 nsecs]
>   .expires_next   : 6550223000000 nsecs
>   .hres_active    : 1
>   .nr_events      : 6409367
>   .nohz_mode      : 0
>   .idle_tick      : 0 nsecs
>   .tick_stopped   : 0
>   .idle_jiffies   : 0
>   .idle_calls     : 0
>   .idle_sleeps    : 0
>   .idle_entrytime : 0 nsecs
>   .idle_sleeptime : 0 nsecs
>   .last_jiffies   : 0
>   .next_jiffies   : 0
>   .idle_expires   : 0 nsecs
> jiffies: 6250223
>
> cpu: 1
>  clock 0:
>   .index:      0
>   .resolution: 1 nsecs
>   .get_time:   ktime_get_real
>   .offset:     1183655045226773151 nsecs
> active timers:
>  clock 1:
>   .index:      1
>   .resolution: 1 nsecs
>   .get_time:   ktime_get
>   .offset:     0 nsecs
> active timers:
>  #0: <c34dbef0>, hrtimer_wakeup, S:01
>  # expires at 6550773229533 nsecs [in 548314350 nsecs]
>  #1: <c34dbef0>, tick_sched_timer, S:01
>  # expires at 8881205015625 nsecs [in 2330980100442 nsecs]
>  #2: <c34dbef0>, hrtimer_wakeup, S:01
>  # expires at 8894203449198 nsecs [in 2343978534015 nsecs]
>  #3: <c34dbef0>, it_real_fn, S:01
>  # expires at 8911201851655 nsecs [in 2360976936472 nsecs]
>  #4: <c34dbef0>, hrtimer_wakeup, S:01
>  # expires at 9181203252434 nsecs [in 2630978337251 nsecs]
>  #5: <c34dbef0>, hrtimer_wakeup, S:01
>  # expires at 9841202403256 nsecs [in 3290977488073 nsecs]
>   .expires_next   : 6550773229533 nsecs
>   .hres_active    : 1
>   .nr_events      : 4475068
>   .nohz_mode      : 0
>   .idle_tick      : 0 nsecs
>   .tick_stopped   : 0
>   .idle_jiffies   : 0
>   .idle_calls     : 0
>   .idle_sleeps    : 0
>   .idle_entrytime : 0 nsecs
>   .idle_sleeptime : 0 nsecs
>   .last_jiffies   : 0
>   .next_jiffies   : 0
>   .idle_expires   : 0 nsecs
> jiffies: 6250223
>
>
> Tick Device: mode:     1
> Clock Event Device: pit
>  max_delta_ns:   27461866
>  min_delta_ns:   12571
>  mult:           5124677
>  shift:          32
>  mode:           3
>  next_event:     9223372036854775807 nsecs
>  set_next_event: pit_next_event
>  set_mode:       init_pit_timer
>  event_handler:  tick_handle_oneshot_broadcast
> tick_broadcast_mask: 00000000
> tick_broadcast_oneshot_mask: 00000000
>
>
> Tick Device: mode:     1
> Clock Event Device: lapic
>  max_delta_ns:   2037624802
>  min_delta_ns:   3643
>  mult:           17681760
>  shift:          32
>  mode:           3
>  next_event:     6550223000000 nsecs
>  set_next_event: lapic_next_event
>  set_mode:       lapic_timer_setup
>  event_handler:  hrtimer_interrupt
>
> Tick Device: mode:     1
> Clock Event Device: lapic
>  max_delta_ns:   2037624802
>  min_delta_ns:   3643
>  mult:           17681760
>  shift:          32
>  mode:           3
>  next_event:     6550773229533 nsecs
>  set_next_event: lapic_next_event
>  set_mode:       lapic_timer_setup
>  event_handler:  hrtimer_interrupt
>
>
> ---cut---
>
> R.
>

----- End forwarded message -----
* Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree
  2007-07-06  6:14 ` + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree Jarek Poplawski
  2007-07-06  6:20 ` Fwd: " Jarek Poplawski
@ 2007-07-06  6:26 ` Jarek Poplawski
  2007-07-06  6:45   ` Jarek Poplawski
  1 sibling, 1 reply; 25+ messages in thread
From: Jarek Poplawski @ 2007-07-06  6:26 UTC (permalink / raw)
  To: Ranko Zivojnovic; +Cc: akpm, kaber, netdev

On Fri, Jul 06, 2007 at 08:14:20AM +0200, Jarek Poplawski wrote:
...
> This new lockup bug you have just found will need some time to
> figure out. BTW, I wonder if you had lockdep on
> (CONFIG_PROVE_LOCKING or CONFIG_LOCK_ALLOC)?

Should be: (CONFIG_PROVE_LOCKING or CONFIG_DEBUG_LOCK_ALLOC)?

Jarek P.
* Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree
  2007-07-06  6:26 ` Jarek Poplawski
@ 2007-07-06  6:45 ` Jarek Poplawski
  2007-07-06 12:47   ` Jarek Poplawski
  2007-07-06 13:14   ` Ranko Zivojnovic
  0 siblings, 2 replies; 25+ messages in thread
From: Jarek Poplawski @ 2007-07-06  6:45 UTC (permalink / raw)
  To: Ranko Zivojnovic; +Cc: akpm, kaber, netdev

On Fri, Jul 06, 2007 at 09:08:43AM +0300, Ranko Zivojnovic wrote:
> On Thu, 2007-07-05 at 18:59 +0300, Ranko Zivojnovic wrote:
> > On Thu, 2007-07-05 at 17:34 +0300, Ranko Zivojnovic wrote:
> > > Anyhow - I am currently running 2.6.22-rc6-mm1 + sch_htb patch and
> > > running a test script that always managed to reproduce the problem
> > > within half an hour - so far it looks good, but I will leave it
> > > hammering until tomorrow and will let you know.
> > >
> >
> > Ahm...it managed to get stuck:
> >
> > Relevant piece of the process table ('tc' stays stuck):
> >  9608 pts/0    S+     0:00 /bin/bash ./tc-crash.sh
> >  9609 pts/0    D+     0:00 tc qdisc del dev lo root
> >
>
> Something really weird happened here...
>
> I've left it the way it was (stuck) overnight, and this morning the
> process was terminated.
>
> It looks like some timer was out of whack.
>
> I'm not sure if it is possibly relevant - but the test I performed was
> done in VMware.

Yes, this could be relevant, so it's good to know.

>
> In order to get that parameter out of the way - I will run the same
> test on a real machine.

Fine! But, unless you have something to hide, try to reply to these
messages which went through netdev, or at least cc netdev, because
more people may be interested or could even help to solve this faster.

Cheers,
Jarek P.
* Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree
  2007-07-06  6:45 ` Jarek Poplawski
@ 2007-07-06 12:47 ` Jarek Poplawski
  2007-07-06 13:16   ` Ranko Zivojnovic
  1 sibling, 1 reply; 25+ messages in thread
From: Jarek Poplawski @ 2007-07-06 12:47 UTC (permalink / raw)
  To: Ranko Zivojnovic; +Cc: akpm, kaber, netdev

On Fri, Jul 06, 2007 at 08:45:23AM +0200, Jarek Poplawski wrote:
> On Fri, Jul 06, 2007 at 09:08:43AM +0300, Ranko Zivojnovic wrote:
...
> > In order to get that parameter out of the way - I will run the same
> > test on a real machine.

BTW, maybe it would be better to try with something more stable than -mm.
There is also one bug invoked by sysrq, but I hope you started using
sysrq only after the lockup began.

Jarek P.
* Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree
  2007-07-06 12:47 ` Jarek Poplawski
@ 2007-07-06 13:16 ` Ranko Zivojnovic
  2007-07-09  8:25   ` Jarek Poplawski
  0 siblings, 1 reply; 25+ messages in thread
From: Ranko Zivojnovic @ 2007-07-06 13:16 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: akpm, kaber, netdev

On Fri, 2007-07-06 at 14:47 +0200, Jarek Poplawski wrote:
> On Fri, Jul 06, 2007 at 08:45:23AM +0200, Jarek Poplawski wrote:
> > On Fri, Jul 06, 2007 at 09:08:43AM +0300, Ranko Zivojnovic wrote:
> ...
> > > In order to get that parameter out of the way - I will run the same
> > > test on a real machine.
>
> BTW, maybe it would be better to try with something more stable than -mm.

That would be the next thing I was going to try. Any suggestion?

> There is also one bug invoked by sysrq, but I hope you started using
> sysrq only after the lockup began.

Yes - it was after the lockup.

R.
* Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree
  2007-07-06 13:16 ` Ranko Zivojnovic
@ 2007-07-09  8:25 ` Jarek Poplawski
  0 siblings, 0 replies; 25+ messages in thread
From: Jarek Poplawski @ 2007-07-09  8:25 UTC (permalink / raw)
  To: Ranko Zivojnovic; +Cc: akpm, kaber, netdev

On Fri, Jul 06, 2007 at 04:16:18PM +0300, Ranko Zivojnovic wrote:
> On Fri, 2007-07-06 at 14:47 +0200, Jarek Poplawski wrote:
> > On Fri, Jul 06, 2007 at 08:45:23AM +0200, Jarek Poplawski wrote:
> > > On Fri, Jul 06, 2007 at 09:08:43AM +0300, Ranko Zivojnovic wrote:
> > ...
> > > > In order to get that parameter out of the way - I will run the same
> > > > test on a real machine.
> >
> > BTW, maybe it would be better to try with something more stable than -mm.
>
> That would be the next thing I was going to try. Any suggestion?

Of course, -mm needs testing too, and it's probably easier to find bugs
there, but it's harder to verify fixes: you never know whether a new bug
was caused by the fix or merely uncovered by it. So the latest stable or
-rc versions (at least with higher numbers) should make things faster.

I've looked a bit at this first log, and it's very strange, especially
the stats for cpu#0. Maybe it's really about VMware, or maybe there is
something more, but it doesn't seem this stats_lock deadlock alone could
have made all this mess. Anyway, it would be interesting to repeat such
a test with -mm after checking e.g. with 2.6.22.

Thanks,
Jarek P.

PS: Sorry for the late response.
* Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree
  2007-07-06  6:45 ` Jarek Poplawski
  2007-07-06 12:47   ` Jarek Poplawski
@ 2007-07-06 13:14 ` Ranko Zivojnovic
  2007-07-06 13:27   ` Patrick McHardy
  2007-07-06 14:21   ` Patrick McHardy
  1 sibling, 2 replies; 25+ messages in thread
From: Ranko Zivojnovic @ 2007-07-06 13:14 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: akpm, kaber, netdev

On Fri, 2007-07-06 at 08:45 +0200, Jarek Poplawski wrote:
> On Fri, Jul 06, 2007 at 09:08:43AM +0300, Ranko Zivojnovic wrote:
> > On Thu, 2007-07-05 at 18:59 +0300, Ranko Zivojnovic wrote:
> > > On Thu, 2007-07-05 at 17:34 +0300, Ranko Zivojnovic wrote:
> > > > Anyhow - I am currently running 2.6.22-rc6-mm1 + sch_htb patch and
> > > > running a test script that always managed to reproduce the problem
> > > > within half an hour - so far it looks good, but I will leave it
> > > > hammering until tomorrow and will let you know.
> > > >
> > >
> > > Ahm...it managed to get stuck:
> > >
> > > Relevant piece of the process table ('tc' stays stuck):
> > >  9608 pts/0    S+     0:00 /bin/bash ./tc-crash.sh
> > >  9609 pts/0    D+     0:00 tc qdisc del dev lo root
> > >
> >
> > Something really weird happened here...
> >
> > I've left it the way it was (stuck) overnight, and this morning the
> > process was terminated.
> >
> > It looks like some timer was out of whack.
> >
> > I'm not sure if it is possibly relevant - but the test I performed was
> > done in VMware.
>
> Yes, this could be relevant, so it's good to know.
>
> >
> > In order to get that parameter out of the way - I will run the same
> > test on a real machine.
>
> Fine! But, unless you have something to hide, try to reply to these
> messages which went through netdev, or at least cc netdev, because
> more people may be interested or could even help to solve this faster.
>

Nope - nothing to hide :). Managed to get stuck on a normal machine as
well (2.6.22-rc6-mm1 + sch_htb patch)... here's the log:

---CUT---
-- 0:loop-telnet.sh -- time-stamp -- Jul/06/07 14:46:17 --
u32 classifier
Performance counters on
OLD policer on
input device check on
-- 0:loop-telnet.sh -- time-stamp -- Jul/06/07 14:48:13 --
-- 0:loop-telnet.sh -- time-stamp -- Jul/06/07 15:38:26 --
BUG: spinlock lockup on CPU#0, swapper/0, c03eff80
 [<c01ed1fe>] _raw_spin_lock+0x108/0x13c
 [<c02a8468>] __qdisc_run+0x97/0x1b0
 [<c02a96f3>] qdisc_watchdog+0x19/0x58
 [<c02fe5e7>] __lock_text_start+0x37/0x43
 [<c02a9730>] qdisc_watchdog+0x56/0x58
 [<c02a96da>] qdisc_watchdog+0x0/0x58
 [<c0135d84>] run_hrtimer_softirq+0x58/0xb5
 [<c012626a>] __do_softirq+0x7e/0xf3
 [<c0126335>] do_softirq+0x56/0x58
 [<c0112574>] smp_apic_timer_interrupt+0x5a/0x85
 [<c0103eb1>] apic_timer_interrupt+0x29/0x38
 [<c0103ebb>] apic_timer_interrupt+0x33/0x38
 [<c010172f>] mwait_idle_with_hints+0x3b/0x3f
 [<c0101846>] cpu_idle+0x5f/0x74
 [<c0417d2a>] start_kernel+0x318/0x3ae
 [<c04174bf>] unknown_bootoption+0x0/0x258
=======================
BUG: spinlock lockup on CPU#1, swapper/0, c03eff80
 [<c01ed1fe>] _raw_spin_lock+0x108/0x13c
 [<c0298b9b>] est_timer+0x53/0x148
 [<c01294b3>] run_timer_softirq+0x30/0x184
 [<c01295a4>] run_timer_softirq+0x121/0x184
 [<c0126252>] __do_softirq+0x66/0xf3
 [<c0298b48>] est_timer+0x0/0x148
 [<c012626a>] __do_softirq+0x7e/0xf3
 [<c0126335>] do_softirq+0x56/0x58
 [<c0112574>] smp_apic_timer_interrupt+0x5a/0x85
 [<c0103eb1>] apic_timer_interrupt+0x29/0x38
 [<c0103ebb>] apic_timer_interrupt+0x33/0x38
 [<c010172f>] mwait_idle_with_hints+0x3b/0x3f
 [<c0101846>] cpu_idle+0x5f/0x74
=======================
telnet> send break
SysRq : HELP : loglevel0-8 reBoot show-all-locks(D) tErm Full kIll saK
 showMem Nice powerOff showPc show-all-timers(Q) unRaw Sync showTasks
 Unmount shoW-blocked-tasks
telnet> send break
SysRq : Show Regs

Pid: 0, comm:              swapper
EIP: 0060:[<c01ed199>] CPU: 1
EIP is at _raw_spin_lock+0xa3/0x13c
 EFLAGS: 00000246    Not tainted  (2.6.22-rc6-mm1.SNET.Thors.htbpatch.2.lockdebug #1)
EAX: 00000000 EBX: c03eff80 ECX: e4f7bf60 EDX: 98b928f8
ESI: 260e7970 EDI: 00000000 EBP: beb75188 DS: 007b ES: 007b FS: 00d8
CR0: 8005003b CR2: 080f7578 CR3: 03651000 CR4: 000006d0
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400
 [<c0298b9b>] est_timer+0x53/0x148
 [<c01294b3>] run_timer_softirq+0x30/0x184
 [<c01295a4>] run_timer_softirq+0x121/0x184
 [<c0126252>] __do_softirq+0x66/0xf3
 [<c0298b48>] est_timer+0x0/0x148
 [<c012626a>] __do_softirq+0x7e/0xf3
 [<c0126335>] do_softirq+0x56/0x58
 [<c0112574>] smp_apic_timer_interrupt+0x5a/0x85
 [<c0103eb1>] apic_timer_interrupt+0x29/0x38
 [<c0103ebb>] apic_timer_interrupt+0x33/0x38
 [<c010172f>] mwait_idle_with_hints+0x3b/0x3f
 [<c0101846>] cpu_idle+0x5f/0x74
=======================
telnet> send break
SysRq : Show Blocked State

                         free                   sibling
  task        PC         stack  pid father child younger older
kjournald     D 00000343     0   518      2 (L-TLB)
       c2361e8c 00000092 090e4f4b 00000343 c2361e74 00000000 c2baf920 c2360000
       00000002 c2baf920 c21343f0 c21345a8 c1d24080 00000002 c040ed80 c2361ec4
       c4289280 00000000 00000000 003252cc c02feb97 c2361ecc ffffffff 00000000
Call Trace:
 [<c02feb97>] _spin_unlock_irqrestore+0x34/0x39
 [<c02fc563>] io_schedule+0x1d/0x27
 [<c018b02c>] sync_buffer+0x30/0x35
 [<c02fcb5e>] __wait_on_bit+0x42/0x5e
 [<c018affc>] sync_buffer+0x0/0x35
 [<c018affc>] sync_buffer+0x0/0x35
 [<c02fcbdd>] out_of_line_wait_on_bit+0x63/0x76
 [<c0132bf2>] wake_bit_function+0x0/0x3c
 [<c018af91>] __wait_on_buffer+0x24/0x29
 [<c018bdb2>] sync_dirty_buffer+0x9c/0xe6
 [<f88fcefe>] journal_get_descriptor_buffer+0x88/0x9a [jbd]
 [<f88f922b>] journal_commit_transaction+0xf2a/0x111f [jbd]
 [<c02fea7a>] _spin_unlock_irq+0x20/0x23
 [<f88fc8bf>] kjournald+0xac/0x1f4 [jbd]
 [<c0132bbb>] autoremove_wake_function+0x0/0x37
 [<f88fc813>] kjournald+0x0/0x1f4 [jbd]
 [<c0132976>] kthread+0x34/0x56
 [<c0132942>] kthread+0x0/0x56
 [<c0104053>] kernel_thread_helper+0x7/0x14
=======================
syslogd       D 00000341     0  2173      1 (NOTLB)
       c4315e84 00000082 b9c1b1c3 00000341 c4315e6c 00000000 00000000 c4314000
       00000002 00000002 c2ea25f0 c2ea27a8 c1d24080 00000002 c040ed80 c2ea25f0
       c3652b80 c2aab4c8 c2aab414 00323cc0 00000002 00000046 ffffffff 00000000
Call Trace:
 [<f88fbbfb>] log_wait_commit+0xaf/0x116 [jbd]
 [<c0132bbb>] autoremove_wake_function+0x0/0x37
 [<f88f66b1>] journal_stop+0x158/0x1fa [jbd]
 [<c0186711>] __writeback_single_inode+0x26f/0x331
 [<c01867e3>] sync_inode+0x10/0x2a
 [<c02fe7cf>] _spin_lock+0x29/0x34
 [<c01867ec>] sync_inode+0x19/0x2a
 [<f88b3533>] ext3_sync_file+0xbf/0xcc [ext3]
 [<c02fd2f1>] __mutex_lock_slowpath+0x14b/0x293
 [<c018948b>] do_fsync+0x57/0x90
 [<c01894e1>] __do_fsync+0x1d/0x2b
 [<c0103366>] sysenter_past_esp+0x5f/0x99
=======================
crond         D 0000034B     0 10251   2740 (NOTLB)
       c5b6be18 00000086 44c73986 0000034b c5b6be00 00000000 c43048f0 c5b6a000
       00000002 c04ea564 c43048f0 c4304aa8 c1d24080 00000002 c040ed80 00000000
       c393e280 c02fea7a c5b6be2c 0032dd68 c013d19e c02fc418 ffffffff 00000000
Call Trace:
 [<c02fea7a>] _spin_unlock_irq+0x20/0x23
 [<c013d19e>] trace_hardirqs_on+0xab/0x147
 [<c02fc418>] wait_for_completion+0x24/0xb7
 [<c02fc492>] wait_for_completion+0x9e/0xb7
 [<c011bd50>] default_wake_function+0x0/0xc
 [<c012f537>] call_usermodehelper_exec+0xdb/0xe0
 [<c0167d3b>] cache_alloc_refill+0x163/0x557
 [<c012f69b>] request_module+0x9f/0xe2
 [<c02feb97>] _spin_unlock_irqrestore+0x34/0x39
 [<c013d19e>] trace_hardirqs_on+0xab/0x147
 [<c0118f2d>] __wake_up+0x32/0x43
 [<c02adfa2>] netlink_create+0xa6/0x1a2
 [<c029011b>] __sock_create+0xae/0x1db
 [<c029027d>] sock_create+0x18/0x1d
 [<c0290412>] sys_socket+0x1c/0x44
 [<c0290d06>] sys_socketcall+0x68/0x271
 [<c013d19e>] trace_hardirqs_on+0xab/0x147
 [<c0103438>] restore_nocheck+0x12/0x15
 [<c0103366>] sysenter_past_esp+0x5f/0x99
=======================
Sched Debug Version: v0.03, cfs-v18, 2.6.22-rc6-mm1.SNET.Thors.htbpatch.2.lockdebug #1
now at 3640122056027 nsecs

cpu#0, 3200.442 MHz
  .nr_running                 : 2
  .load                       : 4149
  .ls.delta_fair              : 0
  .ls.delta_exec              : 0
  .nr_switches                : 904853
  .nr_load_updates            : 3640019
  .nr_uninterruptible         : 425
  .jiffies                    : 3340159
  .next_balance               : 2936193
  .curr->pid                  : 0
  .clock                      : 3628742932463
  .prev_clock_raw             : 3628948350088
  .clock_warps                : 0
  .clock_overflows            : 57
  .clock_unstable_events      : 1
  .clock_max_delta            : 1999557
  .cfs.fair_clock             : 557156111815
  .cfs.exec_clock             : 213278676734
  .cfs.wait_runtime           : -93884907451
  .cfs.wait_runtime_overruns  : 1235
  .cfs.wait_runtime_underruns : 2874
  .cpu_load[0]                : 0
  .cpu_load[1]                : 0
  .cpu_load[2]                : 0
  .cpu_load[3]                : 0
  .cpu_load[4]                : 0
  .wait_runtime_rq_sum        : 47286887

runnable tasks:
            task   PID        tree-key         delta       waiting  switches  prio        sum-exec        sum-wait       sum-sleep    wait-overrun   wait-underrun
------------------------------------------------------------------------------------------------------------------------------------------------------------------
         khelper    19    557082869628     -73242187      24000000       955   115        12293227     53703642836   3621787091747               1               0
           eping  3114    557163819749       7707934      23286887    333993   120     83353964767       253906993   2733416723635              41             217
WARNING: at kernel/lockdep.c:2031 trace_hardirqs_on()
 [<c02fea9d>] _read_unlock_irq+0x20/0x23
 [<c0119814>] sched_debug_show+0x8b2/0xe04
 [<c011e2ab>] show_state_filter+0x161/0x184
 [<c02354c3>] __handle_sysrq+0x7a/0x115
 [<c024059d>] receive_chars+0x23b/0x2a4
 [<c024174d>] serial8250_interrupt+0x114/0x132
 [<c0148733>] handle_IRQ_event+0x1e/0x48
 [<c014982f>] handle_edge_irq+0x9b/0x112
 [<c0105afc>] do_IRQ+0x45/0x99
 [<c0103dd0>] common_interrupt+0x24/0x34
 [<c0103dda>] common_interrupt+0x2e/0x34
 [<c010172f>] mwait_idle_with_hints+0x3b/0x3f
 [<c0101846>] cpu_idle+0x5f/0x74
=======================

cpu#1, 3200.442 MHz
  .nr_running                 : 1
  .load                       : 1024
  .ls.delta_fair              : 0
  .ls.delta_exec              : 0
  .nr_switches                : 1825911
  .nr_load_updates            : 3638353
  .nr_uninterruptible         : 4294967022
  .jiffies                    : 3340380
  .next_balance               : 2937417
  .curr->pid                  : 0
  .clock                      : 3627052732372
  .prev_clock_raw             : 3629168676206
  .clock_warps                : 0
  .clock_overflows            : 45
  .clock_unstable_events      : 0
  .clock_max_delta            : 1968238
  .cfs.fair_clock             : 1032651577814
  .cfs.exec_clock             : 242507384061
  .cfs.wait_runtime           : -91108953414
  .cfs.wait_runtime_overruns  : 2143
  .cfs.wait_runtime_underruns : 1953
  .cpu_load[0]                : 0
  .cpu_load[1]                : 0
  .cpu_load[2]                : 0
  .cpu_load[3]                : 0
  .cpu_load[4]                : 0
  .wait_runtime_rq_sum        : 23979985

runnable tasks:
            task   PID        tree-key         delta       waiting  switches  prio        sum-exec        sum-wait       sum-sleep    wait-overrun   wait-underrun
------------------------------------------------------------------------------------------------------------------------------------------------------------------
 hald-addon-stor  2781   1032650759709       -818105      23979985     12308   120       491952575      2635261433   3092710060269              30               0

cpu#2, 3200.442 MHz
  .nr_running                 : 0
  .load                       : 0
  .ls.delta_fair              : 0
  .ls.delta_exec              : 0
  .nr_switches                : 1509999
  .nr_load_updates            : 3639454
  .nr_uninterruptible         : 789
  .jiffies                    : 3340529
  .next_balance               : 3339823
  .curr->pid                  : 0
  .clock                      : 3628130245442
  .prev_clock_raw             : 3629317315215
  .clock_warps                : 0
  .clock_overflows            : 73
  .clock_unstable_events      : 0
  .clock_max_delta            : 1965436
  .cfs.fair_clock             : 839764839660
  .cfs.exec_clock             : 208639226606
  .cfs.wait_runtime           : 17382220347
  .cfs.wait_runtime_overruns  : 2282
  .cfs.wait_runtime_underruns : 11415
  .cpu_load[0]                : 0
  .cpu_load[1]                : 0
  .cpu_load[2]                : 0
  .cpu_load[3]                : 0
  .cpu_load[4]                : 0
  .wait_runtime_rq_sum        : 0

runnable tasks:
            task   PID        tree-key         delta       waiting  switches  prio        sum-exec        sum-wait       sum-sleep    wait-overrun   wait-underrun
------------------------------------------------------------------------------------------------------------------------------------------------------------------

cpu#3, 3200.442 MHz
  .nr_running                 : 1
  .load                       : 1024
  .ls.delta_fair              : 0
  .ls.delta_exec              : 0
  .nr_switches                : 3120535
  .nr_load_updates            : 3640332
  .nr_uninterruptible         : 4294966359
  .jiffies                    : 3340663
  .next_balance               : 2936318
  .curr->pid                  : 10250
  .clock                      : 3629093644120
  .prev_clock_raw             : 3629450829376
  .clock_warps                : 0
  .clock_overflows            : 15
  .clock_unstable_events      : 0
  .clock_max_delta            : 1940676
  .cfs.fair_clock             : 2242476711934
  .cfs.exec_clock             : 782992850413
  .cfs.wait_runtime           : 621759666860
  .cfs.wait_runtime_overruns  : 4730
  .cfs.wait_runtime_underruns : 46039
  .cpu_load[0]                : 1019
  .cpu_load[1]                : 1020
  .cpu_load[2]                : 1018
  .cpu_load[3]                : 1017
  .cpu_load[4]                : 1009
  .wait_runtime_rq_sum        : -272987

runnable tasks:
            task   PID        tree-key         delta       waiting  switches  prio        sum-exec        sum-wait       sum-sleep    wait-overrun   wait-underrun
------------------------------------------------------------------------------------------------------------------------------------------------------------------
R             tc 10250   2242552751078        273022       -273022         1   120    403322303899         -273022             653               0               0

telnet> send break
SysRq : Show Pending Timers
Timer List Version: v0.3
HRTIMER_MAX_CLOCK_BASES: 2
now at 3730198060133 nsecs

cpu: 0
 clock 0:
  .index:      0
  .resolution: 1 nsecs
  .get_time:   ktime_get_real
  .offset:     1183721968505121290 nsecs
active timers:
 clock 1:
  .index:      1
  .resolution: 1 nsecs
  .get_time:   ktime_get
  .offset:     0 nsecs
active timers:
 #0: <3>BUG: sleeping function called from invalid context at kernel/mutex.c:86
in_atomic():1, irqs_disabled():1
INFO: lockdep is turned off.
irq event stamp: 38482572
hardirqs last  enabled at (38482572): [<c0103438>] restore_nocheck+0x12/0x15
hardirqs last disabled at (38482571): [<c0103eb1>] apic_timer_interrupt+0x29/0x38
softirqs last  enabled at (37677028): [<c0126335>] do_softirq+0x56/0x58
softirqs last disabled at (37677033): [<c0126335>] do_softirq+0x56/0x58
 [<c02fd44b>] mutex_lock+0x12/0x1a
 [<c0145e69>] lookup_module_symbol_name+0x14/0xbb
 [<c0138810>] print_name_offset+0x1a/0x8e
 [<c0121c3f>] release_console_sem+0x1d9/0x1f5
 [<c01391a1>] timer_list_show+0x660/0xa66
 [<c013a861>] tick_sched_timer+0x0/0x186
 [<c02354c3>] __handle_sysrq+0x7a/0x115
 [<c024059d>] receive_chars+0x23b/0x2a4
 [<c024174d>] serial8250_interrupt+0x114/0x132
 [<c0148733>] handle_IRQ_event+0x1e/0x48
 [<c014982f>] handle_edge_irq+0x9b/0x112
 [<c0105afc>] do_IRQ+0x45/0x99
 [<c0103dd0>] common_interrupt+0x24/0x34
 [<c0103dda>] common_interrupt+0x2e/0x34
 [<c01e007b>] vsnprintf+0x29a/0x5fe
 [<c01ed199>] _raw_spin_lock+0xa3/0x13c
 [<c0298b9b>] est_timer+0x53/0x148
 [<c01294b3>] run_timer_softirq+0x30/0x184
 [<c01295a4>] run_timer_softirq+0x121/0x184
 [<c0126252>] __do_softirq+0x66/0xf3
 [<c0298b48>] est_timer+0x0/0x148
 [<c012626a>] __do_softirq+0x7e/0xf3
 [<c0126335>] do_softirq+0x56/0x58
 [<c0112574>] smp_apic_timer_interrupt+0x5a/0x85
 [<c0103eb1>] apic_timer_interrupt+0x29/0x38
 [<c0103ebb>] apic_timer_interrupt+0x33/0x38
 [<c010172f>] mwait_idle_with_hints+0x3b/0x3f
 [<c0101846>] cpu_idle+0x5f/0x74
=======================
<c2047d6c>, tick_sched_timer, S:01
 # expires at 3730231000000 nsecs [in 32939867 nsecs]
 #1: <c2047d6c>, it_real_fn, S:01
 # expires at 3734130154519 nsecs [in 3932094386 nsecs]
 #2: <c2047d6c>, it_real_fn, S:01
 # expires at 3734197190457 nsecs [in 3999130324 nsecs]
 #3: <c2047d6c>, it_real_fn, S:01
 # expires at 31835132462428 nsecs [in 28104934402295 nsecs]
  .expires_next   : 3730415000000 nsecs
  .hres_active    : 1
  .nr_events      : 3736552
  .nohz_mode      : 0
  .idle_tick      : 0 nsecs
  .tick_stopped   : 0
  .idle_jiffies   : 0
  .idle_calls     : 0
  .idle_sleeps    : 0
  .idle_entrytime : 0 nsecs
  .idle_sleeptime : 0 nsecs
  .last_jiffies   : 0
  .next_jiffies   : 0
  .idle_expires   : 0 nsecs
jiffies: 3430448

cpu: 1
 clock 0:
  .index:      0
  .resolution: 1 nsecs
  .get_time:   ktime_get_real
  .offset:     1183721968505121290 nsecs
active timers:
 clock 1:
  .index:      1
  .resolution: 1 nsecs
  .get_time:   ktime_get
  .offset:     0 nsecs
active timers:
 #0: <c2047d6c>, tick_sched_timer, S:01
 # expires at 3730196015625 nsecs [in 18446744073707507108 nsecs]
  .expires_next   : 3730196015625 nsecs
  .hres_active    : 1
  .nr_events      : 3733988
  .nohz_mode      : 0
  .idle_tick      : 0 nsecs
  .tick_stopped   : 0
  .idle_jiffies   : 0
  .idle_calls     : 0
  .idle_sleeps    : 0
  .idle_entrytime : 0 nsecs
  .idle_sleeptime : 0 nsecs
  .last_jiffies   : 0
  .next_jiffies   : 0
  .idle_expires   : 0 nsecs
jiffies: 3430518

cpu: 2
 clock 0:
  .index:      0
  .resolution: 1 nsecs
  .get_time:   ktime_get_real
  .offset:     1183721968505121290 nsecs
active timers:
 clock 1:
  .index:      1
  .resolution: 1 nsecs
  .get_time:   ktime_get
  .offset:     0 nsecs
active timers:
 #0: <c2047d6c>, tick_sched_timer, S:01
 # expires at 3730546031250 nsecs [in 347971117 nsecs]
 #1: <c2047d6c>, hrtimer_wakeup, S:01
 # expires at 3730649159723 nsecs [in 451099590 nsecs]
 #2: <c2047d6c>, hrtimer_wakeup, S:01
 # expires at 3734223051944 nsecs [in 4024991811 nsecs]
 #3: <c2047d6c>, hrtimer_wakeup, S:01
 # expires at 3753480126556 nsecs [in 23282066423 nsecs]
  .expires_next   : 3730581031250 nsecs
  .hres_active    : 1
  .nr_events      : 3739031
  .nohz_mode      : 0
  .idle_tick      : 0 nsecs
  .tick_stopped   : 0
  .idle_jiffies   : 0
  .idle_calls     : 0
  .idle_sleeps    : 0
  .idle_entrytime : 0 nsecs
  .idle_sleeptime : 0 nsecs
  .last_jiffies   : 0
  .next_jiffies   : 0
  .idle_expires   : 0 nsecs
jiffies: 3430613

cpu: 3
 clock 0:
  .index:      0
  .resolution: 1 nsecs
  .get_time:   ktime_get_real
  .offset:     1183721968505121290 nsecs
active timers:
 clock 1:
  .index:      1
  .resolution: 1 nsecs
  .get_time:   ktime_get
  .offset:     0 nsecs
active timers:
 #0: <c2047d6c>, tick_sched_timer, S:01
 # expires at 3730641046875 nsecs [in 442986742 nsecs]
  .expires_next   : 3730650046875 nsecs
  .hres_active    : 1
  .nr_events      : 3736405
  .nohz_mode      : 0
  .idle_tick      : 0 nsecs
  .tick_stopped   : 0
  .idle_jiffies   : 0
  .idle_calls     : 0
  .idle_sleeps    : 0
  .idle_entrytime : 0 nsecs
  .idle_sleeptime : 0 nsecs
  .last_jiffies   : 0
  .next_jiffies   : 0
  .idle_expires   : 0 nsecs
jiffies: 3430685


Tick Device: mode:     1
Clock Event Device: pit
 max_delta_ns:   27461866
 min_delta_ns:   12571
 mult:           5124677
 shift:          32
 mode:           3
 next_event:     9223372036854775807 nsecs
 set_next_event: pit_next_event
 set_mode:       init_pit_timer
 event_handler:  tick_handle_oneshot_broadcast
tick_broadcast_mask: 00000000
tick_broadcast_oneshot_mask: 00000000


Tick Device: mode:     1
Clock Event Device: lapic
 max_delta_ns:   671064938
 min_delta_ns:   1199
 mult:           53688981
 shift:          32
 mode:           3
 next_event:     3730740000000 nsecs
 set_next_event: lapic_next_event
 set_mode:       lapic_timer_setup
 event_handler:  hrtimer_interrupt

Tick Device: mode:     1
Clock Event Device: lapic
 max_delta_ns:   671064938
 min_delta_ns:   1199
 mult:           53688981
 shift:          32
 mode:           3
 next_event:     3730196015625 nsecs
 set_next_event: lapic_next_event
 set_mode:       lapic_timer_setup
 event_handler:  hrtimer_interrupt

Tick Device: mode:     1
Clock Event Device: lapic
 max_delta_ns:   671064938
 min_delta_ns:   1199
 mult:           53688981
 shift:          32
 mode:           3
 next_event:     3730799031250 nsecs
 set_next_event: lapic_next_event
 set_mode:       lapic_timer_setup
 event_handler:  hrtimer_interrupt

Tick Device: mode:     1
Clock Event Device: lapic
 max_delta_ns:   671064938
 min_delta_ns:   1199
 mult:           53688981
 shift:          32
 mode:           3
 next_event:     3730828046875 nsecs
 set_next_event: lapic_next_event
 set_mode:       lapic_timer_setup
 event_handler:  hrtimer_interrupt
---CUT---
* Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree
  2007-07-06 13:14 ` Ranko Zivojnovic
@ 2007-07-06 13:27 ` Patrick McHardy
  2007-07-06 13:59   ` Ranko Zivojnovic
  1 sibling, 1 reply; 25+ messages in thread
From: Patrick McHardy @ 2007-07-06 13:27 UTC (permalink / raw)
  To: Ranko Zivojnovic; +Cc: Jarek Poplawski, akpm, netdev

Ranko Zivojnovic wrote:
> Managed to get stuck on a normal machine as well (2.6.22-rc6-mm1 +
> sch_htb patch)... here's the log:
>

Can you post the script you're using to reproduce this, please?
* Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree
  2007-07-06 13:27 ` Patrick McHardy
@ 2007-07-06 13:59 ` Ranko Zivojnovic
  0 siblings, 0 replies; 25+ messages in thread
From: Ranko Zivojnovic @ 2007-07-06 13:59 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: Jarek Poplawski, akpm, netdev

[-- Attachment #1: Type: text/plain, Size: 282 bytes --]

On Fri, 2007-07-06 at 15:27 +0200, Patrick McHardy wrote:
> Ranko Zivojnovic wrote:
> > Managed to get stuck on a normal machine as well (2.6.22-rc6-mm1 +
> > sch_htb patch)... here's the log:
> >
>
> Can you post the script you're using to reproduce this, please?

Attached.

R.

[-- Attachment #2: tc-crash.sh --]
[-- Type: application/x-shellscript, Size: 1597 bytes --]
* Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree
  2007-07-06 13:14 ` Ranko Zivojnovic
  2007-07-06 13:27   ` Patrick McHardy
@ 2007-07-06 14:21 ` Patrick McHardy
  2007-07-06 14:55   ` Ranko Zivojnovic
  1 sibling, 1 reply; 25+ messages in thread
From: Patrick McHardy @ 2007-07-06 14:21 UTC (permalink / raw)
  To: Ranko Zivojnovic; +Cc: Jarek Poplawski, akpm, netdev

Ranko Zivojnovic wrote:
> BUG: spinlock lockup on CPU#0, swapper/0, c03eff80
>  [<c01ed1fe>] _raw_spin_lock+0x108/0x13c
>  [<c02a8468>] __qdisc_run+0x97/0x1b0
>  [<c02a96f3>] qdisc_watchdog+0x19/0x58
>  [<c02fe5e7>] __lock_text_start+0x37/0x43
>  [<c02a9730>] qdisc_watchdog+0x56/0x58
>  [<c02a96da>] qdisc_watchdog+0x0/0x58
>  [<c0135d84>] run_hrtimer_softirq+0x58/0xb5
> [...]
>
> BUG: spinlock lockup on CPU#1, swapper/0, c03eff80
>  [<c01ed1fe>] _raw_spin_lock+0x108/0x13c
>  [<c0298b9b>] est_timer+0x53/0x148
>  [<c01294b3>] run_timer_softirq+0x30/0x184
>  [<c01295a4>] run_timer_softirq+0x121/0x184
>  [<c0126252>] __do_softirq+0x66/0xf3
>  [<c0298b48>] est_timer+0x0/0x148
> [...]


There is at least one ABBA deadlock here. est_timer does:

	read_lock(&est_lock)
	spin_lock(e->stats_lock) (which is dev->queue_lock)

and qdisc_destroy calls htb_destroy under dev->queue_lock, which
calls htb_destroy_class, then gen_kill_estimator, and this
write-locks est_lock.

I can't see the problem in the first trace above though; the
qdisc_run path only takes dev->queue_lock. Please enable lockdep
and post the output, if any.
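[The ABBA pattern Patrick describes is easiest to see in isolation. Below
is a minimal userspace analogue using pthread locks in place of est_lock
and dev->queue_lock; the function names mirror the kernel paths, but
everything here is an illustration, not the actual kernel code. The
program intentionally hangs - that is the demonstration:]

    /* abba_demo.c - build with: cc abba_demo.c -o abba_demo -lpthread */
    #include <pthread.h>
    #include <stdio.h>
    #include <unistd.h>

    static pthread_rwlock_t est_lock = PTHREAD_RWLOCK_INITIALIZER;
    static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER; /* "dev->queue_lock" */

    static void *est_timer_path(void *arg)     /* mimics est_timer() */
    {
        pthread_rwlock_rdlock(&est_lock);      /* lock A (shared) */
        puts("timer path: holds est_lock, wants queue_lock");
        sleep(1);                              /* widen the race window */
        pthread_mutex_lock(&queue_lock);       /* lock B: blocks forever */
        pthread_mutex_unlock(&queue_lock);
        pthread_rwlock_unlock(&est_lock);
        return NULL;
    }

    static void *destroy_path(void *arg)       /* mimics qdisc_destroy() -> gen_kill_estimator() */
    {
        pthread_mutex_lock(&queue_lock);       /* lock B */
        puts("destroy path: holds queue_lock, wants est_lock");
        sleep(1);
        pthread_rwlock_wrlock(&est_lock);      /* lock A (exclusive): blocks forever */
        pthread_rwlock_unlock(&est_lock);
        pthread_mutex_unlock(&queue_lock);
        return NULL;
    }

    int main(void)
    {
        pthread_t a, b;
        pthread_create(&a, NULL, est_timer_path, NULL);
        pthread_create(&b, NULL, destroy_path, NULL);
        pthread_join(a, NULL);                 /* never returns: classic ABBA deadlock */
        pthread_join(b, NULL);
        return 0;
    }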
* Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree
  2007-07-06 14:21 ` Patrick McHardy
@ 2007-07-06 14:55 ` Ranko Zivojnovic
  2007-07-07  7:55   ` Ranko Zivojnovic
  0 siblings, 1 reply; 25+ messages in thread
From: Ranko Zivojnovic @ 2007-07-06 14:55 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: Jarek Poplawski, akpm, netdev

On Fri, 2007-07-06 at 16:21 +0200, Patrick McHardy wrote:
> Ranko Zivojnovic wrote:
> > BUG: spinlock lockup on CPU#0, swapper/0, c03eff80
> >  [<c01ed1fe>] _raw_spin_lock+0x108/0x13c
> >  [<c02a8468>] __qdisc_run+0x97/0x1b0
> >  [<c02a96f3>] qdisc_watchdog+0x19/0x58
> >  [<c02fe5e7>] __lock_text_start+0x37/0x43
> >  [<c02a9730>] qdisc_watchdog+0x56/0x58
> >  [<c02a96da>] qdisc_watchdog+0x0/0x58
> >  [<c0135d84>] run_hrtimer_softirq+0x58/0xb5
> > [...]
> >
> > BUG: spinlock lockup on CPU#1, swapper/0, c03eff80
> >  [<c01ed1fe>] _raw_spin_lock+0x108/0x13c
> >  [<c0298b9b>] est_timer+0x53/0x148
> >  [<c01294b3>] run_timer_softirq+0x30/0x184
> >  [<c01295a4>] run_timer_softirq+0x121/0x184
> >  [<c0126252>] __do_softirq+0x66/0xf3
> >  [<c0298b48>] est_timer+0x0/0x148
> > [...]
>
> There is at least one ABBA deadlock here. est_timer does:
>
> 	read_lock(&est_lock)
> 	spin_lock(e->stats_lock) (which is dev->queue_lock)
>
> and qdisc_destroy calls htb_destroy under dev->queue_lock, which
> calls htb_destroy_class, then gen_kill_estimator, and this
> write-locks est_lock.
>
> I can't see the problem in the first trace above though; the
> qdisc_run path only takes dev->queue_lock. Please enable lockdep
> and post the output, if any.

That was with lockdep already enabled:

$ zgrep LOCKDEP /proc/config.gz
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_LOCKDEP=y
# CONFIG_DEBUG_LOCKDEP is not set

also with:

CONFIG_PROVE_LOCKING=y
CONFIG_DEBUG_LOCK_ALLOC=y

that Jarek suggested, and including:

CONFIG_DETECT_SOFTLOCKUP=y
CONFIG_DEBUG_SPINLOCK=y

I will give it another spin as it is, just to see if it shows a
different path - and then I will try 2.6.21.5 with these two patches.

R.
* Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree
  2007-07-06 14:55 ` Ranko Zivojnovic
@ 2007-07-07  7:55 ` Ranko Zivojnovic
  2007-07-07 15:10   ` Patrick McHardy
  0 siblings, 1 reply; 25+ messages in thread
From: Ranko Zivojnovic @ 2007-07-07  7:55 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: Jarek Poplawski, akpm, netdev

On Fri, 2007-07-06 at 17:55 +0300, Ranko Zivojnovic wrote:
> On Fri, 2007-07-06 at 16:21 +0200, Patrick McHardy wrote:
> > Ranko Zivojnovic wrote:
> > > BUG: spinlock lockup on CPU#0, swapper/0, c03eff80
> > >  [<c01ed1fe>] _raw_spin_lock+0x108/0x13c
> > >  [<c02a8468>] __qdisc_run+0x97/0x1b0
> > >  [<c02a96f3>] qdisc_watchdog+0x19/0x58
> > >  [<c02fe5e7>] __lock_text_start+0x37/0x43
> > >  [<c02a9730>] qdisc_watchdog+0x56/0x58
> > >  [<c02a96da>] qdisc_watchdog+0x0/0x58
> > >  [<c0135d84>] run_hrtimer_softirq+0x58/0xb5
> > > [...]
> > >
> > > BUG: spinlock lockup on CPU#1, swapper/0, c03eff80
> > >  [<c01ed1fe>] _raw_spin_lock+0x108/0x13c
> > >  [<c0298b9b>] est_timer+0x53/0x148
> > >  [<c01294b3>] run_timer_softirq+0x30/0x184
> > >  [<c01295a4>] run_timer_softirq+0x121/0x184
> > >  [<c0126252>] __do_softirq+0x66/0xf3
> > >  [<c0298b48>] est_timer+0x0/0x148
> > > [...]
> >
> > There is at least one ABBA deadlock here. est_timer does:
> >
> > 	read_lock(&est_lock)
> > 	spin_lock(e->stats_lock) (which is dev->queue_lock)
> >
> > and qdisc_destroy calls htb_destroy under dev->queue_lock, which
> > calls htb_destroy_class, then gen_kill_estimator, and this
> > write-locks est_lock.
> >
> > I can't see the problem in the first trace above though; the
> > qdisc_run path only takes dev->queue_lock. Please enable lockdep
> > and post the output, if any.
>

I've got both code paths this time. It shows exactly the ABBA deadlock
you describe above. The details are below.

Maybe the appropriate way to fix this would be to call
gen_kill_estimator, with the appropriate lock order, before the call
to qdisc_destroy, so that when dev->queue_lock is taken for
qdisc_destroy the structure is already off the list.

-------------LOG------------

BUG: spinlock lockup on CPU#2, ping/27868, c03eff80
 [<c01ed1fe>] _raw_spin_lock+0x108/0x13c
 [<c0298b9b>] est_timer+0x53/0x148
 [<c01295a4>] run_timer_softirq+0x121/0x184
 [<c0126252>] __do_softirq+0x66/0xf3
 [<c0298b48>] est_timer+0x0/0x148
 [<c012626a>] __do_softirq+0x7e/0xf3
 [<c0126335>] do_softirq+0x56/0x58
 [<c0112574>] smp_apic_timer_interrupt+0x5a/0x85
 [<c0103eb1>] apic_timer_interrupt+0x29/0x38
 [<c0103ebb>] apic_timer_interrupt+0x33/0x38
 [<c0126485>] local_bh_enable+0x94/0x13b
 [<c029c380>] dev_queue_xmit+0x95/0x2d5
 [<c02bb9a9>] ip_output+0x193/0x32a
 [<c02b9fd8>] ip_finish_output+0x0/0x29e
 [<c02b8aa6>] ip_push_pending_frames+0x27f/0x46b
 [<c02b8770>] dst_output+0x0/0x7
 [<c02d4fb9>] raw_sendmsg+0x70b/0x7f2
 [<c02dcbe0>] inet_sendmsg+0x2b/0x49
 [<c028fb66>] sock_sendmsg+0xe2/0xfd
 [<c0132bbb>] autoremove_wake_function+0x0/0x37
 [<c0132bbb>] autoremove_wake_function+0x0/0x37
 [<c011aacc>] enqueue_entity+0x139/0x4f8
 [<c01e0dc3>] copy_from_user+0x2d/0x59
 [<c028fcae>] sys_sendmsg+0x12d/0x243
 [<c013dec5>] __lock_acquire+0x825/0x1002
 [<c013dec5>] __lock_acquire+0x825/0x1002
 [<c011d8a2>] scheduler_tick+0x1a7/0x20e
 [<c02fea7a>] _spin_unlock_irq+0x20/0x23
 [<c013d166>] trace_hardirqs_on+0x73/0x147
 [<c01294b3>] run_timer_softirq+0x30/0x184
 [<c02fea7a>] _spin_unlock_irq+0x20/0x23
 [<c0290eed>] sys_socketcall+0x24f/0x271
 [<c013d19e>] trace_hardirqs_on+0xab/0x147
 [<c01e0fe6>] copy_to_user+0x2f/0x49
 [<c0103396>] sysenter_past_esp+0x8f/0x99
 [<c0103366>] sysenter_past_esp+0x5f/0x99
=======================

And here is the ABBA deadlock:

---cut---
SysRq : Show Locks Held
Showing all locks held in the system:
****snip****
3 locks held by ping/27868:
 #0:  (sk_lock-AF_INET){--..}, at: [<c02d4f24>] raw_sendmsg+0x676/0x7f2
 #1:  (est_lock#2){-.-+}, at: [<c0298b5d>] est_timer+0x15/0x148
 #2:  (&dev->queue_lock){-+..}, at: [<c0298b9b>] est_timer+0x53/0x148
****snip****
8 locks held by tc/2278:
 #0:  (rtnl_mutex){--..}, at: [<c02a26d7>] rtnetlink_rcv+0x18/0x42
 #1:  (&dev->queue_lock){-+..}, at: [<c02a7f27>] qdisc_lock_tree+0xe/0x1c
 #2:  (&dev->ingress_lock){-...}, at: [<c02a9ba8>] tc_get_qdisc+0x192/0x1e9
 #3:  (est_lock#2){-.-+}, at: [<c02989bc>] gen_kill_estimator+0x58/0x6f
 #4:  (&irq_lists[i].lock){++..}, at: [<c024164d>] serial8250_interrupt+0x14/0x132
 #5:  (&port_lock_key){++..}, at: [<c024169b>] serial8250_interrupt+0x62/0x132
 #6:  (sysrq_key_table_lock){+...}, at: [<c0235460>] __handle_sysrq+0x17/0x115
 #7:  (tasklist_lock){..-?}, at: [<c013be2b>] debug_show_all_locks+0x2e/0x15e
****snip****
---cut---

As well as the 'tc' stack:

---cut---
SysRq : Show Regs

Pid: 2278, comm:                   tc
EIP: 0060:[<c02fe39b>] CPU: 0
EIP is at __write_lock_failed+0xf/0x1c
 EFLAGS: 00000287    Not tainted  (2.6.22-rc6-mm1.SNET.Thors.htbpatch.2.lockdebug #1)
EAX: c03f5968 EBX: c03f5968 ECX: 00000000 EDX: 00000004
ESI: c9852840 EDI: c85eae24 EBP: c06aaa60 DS: 007b ES: 007b FS: 00d8
CR0: 8005003b CR2: 008ba828 CR3: 11841000 CR4: 000006d0
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400
 [<c01ed264>] _raw_write_lock+0x32/0x6e
 [<c02989bc>] gen_kill_estimator+0x58/0x6f
 [<f8bb55a6>] htb_destroy_class+0x27/0x12f [sch_htb]
 [<f8bb6037>] htb_destroy+0x34/0x70 [sch_htb]
 [<c02a8152>] qdisc_destroy+0x52/0x8d
 [<c013d166>] trace_hardirqs_on+0x73/0x147
 [<f8bb5651>] htb_destroy_class+0xd2/0x12f [sch_htb]
 [<f8bb6037>] htb_destroy+0x34/0x70 [sch_htb]
 [<c02a8152>] qdisc_destroy+0x52/0x8d
 [<c02a9bb1>] tc_get_qdisc+0x19b/0x1e9
 [<c02a9a16>] tc_get_qdisc+0x0/0x1e9
 [<c02a28f5>] rtnetlink_rcv_msg+0x1c2/0x1f5
 [<c02ad51f>] netlink_run_queue+0x96/0xfd
 [<c02a2733>] rtnetlink_rcv_msg+0x0/0x1f5
 [<c02a26e5>] rtnetlink_rcv+0x26/0x42
 [<c02ada49>] netlink_data_ready+0x12/0x54
 [<c02ac6d4>] netlink_sendskb+0x1f/0x53
 [<c02ad958>] netlink_sendmsg+0x1f5/0x2d4
 [<c028fb66>] sock_sendmsg+0xe2/0xfd
 [<c0132bbb>] autoremove_wake_function+0x0/0x37
 [<c013dec5>] __lock_acquire+0x825/0x1002
 [<c028fb66>] sock_sendmsg+0xe2/0xfd
 [<c01e0dc3>] copy_from_user+0x2d/0x59
 [<c028fcae>] sys_sendmsg+0x12d/0x243
 [<c0157d4c>] __do_fault+0x12b/0x38b
 [<c0157db9>] __do_fault+0x198/0x38b
 [<c013d7b8>] __lock_acquire+0x118/0x1002
 [<c014de85>] filemap_fault+0x0/0x42f
 [<c015922e>] __handle_mm_fault+0x11e/0x68d
 [<c0290eed>] sys_socketcall+0x24f/0x271
 [<c013d19e>] trace_hardirqs_on+0xab/0x147
 [<c0103438>] restore_nocheck+0x12/0x15
 [<c0103366>] sysenter_past_esp+0x5f/0x99
=======================
---cut---

Best regards,
Ranko
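[To make Ranko's reordering proposal concrete: it would amount to
unlinking the estimator before dev->queue_lock is taken, so that
destroying the qdisc under that lock never needs est_lock again. A
hypothetical call-site sketch follows - this illustrates only the lock
ordering being proposed and is not the fix that was eventually applied:]

    /* Hypothetical teardown order (sketch, not the merged fix):
     * drop the estimator first, with no dev->queue_lock held. */
    gen_kill_estimator(&sch->bstats, &sch->rate_est); /* takes est_lock only */

    spin_lock_bh(&dev->queue_lock);
    qdisc_destroy(sch);             /* no longer needs to write-lock est_lock */
    spin_unlock_bh(&dev->queue_lock);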
* Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree
  2007-07-07  7:55 ` Ranko Zivojnovic
@ 2007-07-07 15:10 ` Patrick McHardy
  2007-07-09  7:47   ` Jarek Poplawski
  2007-07-09 12:41   ` Ranko Zivojnovic
  0 siblings, 2 replies; 25+ messages in thread
From: Patrick McHardy @ 2007-07-07 15:10 UTC (permalink / raw)
  To: Ranko Zivojnovic; +Cc: Jarek Poplawski, akpm, netdev

On Sat, 7 Jul 2007, Ranko Zivojnovic wrote:

>> On Fri, 2007-07-06 at 16:21 +0200, Patrick McHardy wrote:
>>> There is at least one ABBA deadlock here. est_timer does:
>>>
>>> read_lock(&est_lock)
>>> spin_lock(e->stats_lock) (which is dev->queue_lock)
>>>
>>> and qdisc_destroy calls htb_destroy under dev->queue_lock, which
>>> calls htb_destroy_class, then gen_kill_estimator, and this
>>> write-locks est_lock.
>>>
>>> I can't see the problem in the first trace above though; the
>>> qdisc_run path only takes dev->queue_lock. Please enable lockdep
>>> and post the output, if any.
>>
>
> I've got both code paths this time. It shows exactly the ABBA deadlock
> you describe above. The details are below.


Thanks. I'm still wondering what's causing the other lockup
you're seeing.

> Maybe the appropriate way to fix this would be to call
> gen_kill_estimator, with the appropriate lock order, before the call
> to qdisc_destroy, so that when dev->queue_lock is taken for
> qdisc_destroy the structure is already off the list.


Probably easier to just kill est_lock and use RCU lists.
I'm currently travelling; I'll look into it tomorrow.
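[For readers unfamiliar with the pattern Patrick is proposing, this is
roughly what "use RCU lists" means here - a kernel-context sketch, not
compilable on its own and not the final patch (which follows below):
readers walk the list under rcu_read_lock() with no est_lock at all,
while the writer unlinks with list_del_rcu() and defers the kfree()
past a grace period:]

    static LIST_HEAD(est_list);   /* assumption: one flat list, for brevity */

    static void est_timer(unsigned long arg)
    {
            struct gen_estimator *e;

            rcu_read_lock();                      /* instead of read_lock(&est_lock) */
            list_for_each_entry_rcu(e, &est_list, list) {
                    spin_lock(e->stats_lock);     /* dev->queue_lock: no est_lock held */
                    /* ... recompute e->avbps / e->avpps ... */
                    spin_unlock(e->stats_lock);
            }
            rcu_read_unlock();
    }

    static void est_free_rcu(struct rcu_head *head)
    {
            kfree(container_of(head, struct gen_estimator, e_rcu));
    }

    static void est_unlink(struct gen_estimator *e)   /* writer side */
    {
            list_del_rcu(&e->list);               /* in-flight readers may still see e */
            call_rcu(&e->e_rcu, est_free_rcu);    /* free only after they are done */
    }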
* Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree
  2007-07-07 15:10 ` Patrick McHardy
@ 2007-07-09  7:47 ` Jarek Poplawski
  0 siblings, 0 replies; 25+ messages in thread
From: Jarek Poplawski @ 2007-07-09  7:47 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: Ranko Zivojnovic, akpm, netdev

On Sat, Jul 07, 2007 at 05:10:54PM +0200, Patrick McHardy wrote:
> On Sat, 7 Jul 2007, Ranko Zivojnovic wrote:
>
> >>On Fri, 2007-07-06 at 16:21 +0200, Patrick McHardy wrote:
> >>>There is at least one ABBA deadlock here. est_timer does:
> >>>
> >>>read_lock(&est_lock)
> >>>spin_lock(e->stats_lock) (which is dev->queue_lock)
> >>>
> >>>and qdisc_destroy calls htb_destroy under dev->queue_lock, which
> >>>calls htb_destroy_class, then gen_kill_estimator, and this
> >>>write-locks est_lock.
> >>>
> >>>I can't see the problem in the first trace above though; the
> >>>qdisc_run path only takes dev->queue_lock. Please enable lockdep
> >>>and post the output, if any.
> >>
> >
> >I've got both code paths this time. It shows exactly the ABBA deadlock
> >you describe above. The details are below.
>
> Thanks. I'm still wondering what's causing the other lockup
> you're seeing.
>
> >Maybe the appropriate way to fix this would be to call
> >gen_kill_estimator, with the appropriate lock order, before the call
> >to qdisc_destroy, so that when dev->queue_lock is taken for
> >qdisc_destroy the structure is already off the list.
>
> Probably easier to just kill est_lock and use RCU lists.
> I'm currently travelling; I'll look into it tomorrow.
>

Congratulations to both of you! Ranko, it looks like you
have some licence to kill! (I hope bugs only...)

Cheers,
Jarek P.
* Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree
  2007-07-07 15:10 ` Patrick McHardy
  2007-07-09  7:47   ` Jarek Poplawski
@ 2007-07-09 12:41 ` Ranko Zivojnovic
  2007-07-09 13:52   ` Patrick McHardy
  1 sibling, 1 reply; 25+ messages in thread
From: Ranko Zivojnovic @ 2007-07-09 12:41 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: Jarek Poplawski, akpm, netdev

[-- Attachment #1: Type: text/plain, Size: 819 bytes --]

On Sat, 2007-07-07 at 17:10 +0200, Patrick McHardy wrote:
> On Sat, 7 Jul 2007, Ranko Zivojnovic wrote:
> > Maybe the appropriate way to fix this would be to call
> > gen_kill_estimator, with the appropriate lock order, before the call
> > to qdisc_destroy, so that when dev->queue_lock is taken for
> > qdisc_destroy the structure is already off the list.
>
> Probably easier to just kill est_lock and use RCU lists.
> I'm currently travelling; I'll look into it tomorrow.

Patrick, I've taken the liberty of trying to implement this myself.
Attached is the whole new
gen_estimator-fix-locking-and-timer-related-bugs.patch, which is based
on RCU lists. Please be kind enough to review.

I've just compiled it against 2.6.22 together with the sch_htb patch
and I am currently testing it. I will let it run until tomorrow or
until it crashes.

R.

[-- Attachment #2: gen_estimator-fix-locking-and-timer-related-bugs.patch --]
[-- Type: text/x-patch, Size: 3816 bytes --]

--- a/net/core/gen_estimator.c	2007-06-25 02:21:48.000000000 +0300
+++ b/net/core/gen_estimator.c	2007-07-09 14:27:12.053336875 +0300
@@ -79,7 +79,7 @@
 
 struct gen_estimator
 {
-	struct gen_estimator	*next;
+	struct list_head	list;
 	struct gnet_stats_basic	*bstats;
 	struct gnet_stats_rate_est	*rate_est;
 	spinlock_t		*stats_lock;
@@ -89,26 +89,27 @@
 	u32			last_packets;
 	u32			avpps;
 	u32			avbps;
+	struct rcu_head		e_rcu;
 };
 
 struct gen_estimator_head
 {
 	struct timer_list	timer;
-	struct gen_estimator	*list;
+	struct list_head	list;
 };
 
 static struct gen_estimator_head elist[EST_MAX_INTERVAL+1];
 
 /* Estimator array lock */
-static DEFINE_RWLOCK(est_lock);
+static DEFINE_SPINLOCK(est_lock);
 
 static void est_timer(unsigned long arg)
 {
 	int idx = (int)arg;
 	struct gen_estimator *e;
 
-	read_lock(&est_lock);
-	for (e = elist[idx].list; e; e = e->next) {
+	rcu_read_lock();
+	list_for_each_entry_rcu(e, &elist[idx].list, list) {
 		u64 nbytes;
 		u32 npackets;
 		u32 rate;
@@ -127,9 +128,9 @@
 		e->rate_est->pps = (e->avpps+0x1FF)>>10;
 		spin_unlock(e->stats_lock);
 	}
-
-	mod_timer(&elist[idx].timer, jiffies + ((HZ<<idx)/4));
-	read_unlock(&est_lock);
+	if (!list_empty(&elist[idx].list))
+		mod_timer(&elist[idx].timer, jiffies + ((HZ<<idx)/4));
+	rcu_read_unlock();
 }
 
 /**
@@ -152,6 +153,7 @@
 {
 	struct gen_estimator *est;
 	struct gnet_estimator *parm = RTA_DATA(opt);
+	int idx;
 
 	if (RTA_PAYLOAD(opt) < sizeof(*parm))
 		return -EINVAL;
@@ -163,7 +165,8 @@
 	if (est == NULL)
 		return -ENOBUFS;
 
-	est->interval = parm->interval + 2;
+	INIT_LIST_HEAD(&est->list);
+	est->interval = idx = parm->interval + 2;
 	est->bstats = bstats;
 	est->rate_est = rate_est;
 	est->stats_lock = stats_lock;
@@ -173,20 +176,26 @@
 	est->last_packets = bstats->packets;
 	est->avpps = rate_est->pps<<10;
 
-	est->next = elist[est->interval].list;
-	if (est->next == NULL) {
-		init_timer(&elist[est->interval].timer);
-		elist[est->interval].timer.data = est->interval;
-		elist[est->interval].timer.expires = jiffies + ((HZ<<est->interval)/4);
-		elist[est->interval].timer.function = est_timer;
-		add_timer(&elist[est->interval].timer);
+	spin_lock_bh(&est_lock);
+	if (!elist[idx].timer.function) {
+		INIT_LIST_HEAD(&elist[idx].list);
+		setup_timer(&elist[idx].timer, est_timer, est->interval);
 	}
-	write_lock_bh(&est_lock);
-	elist[est->interval].list = est;
-	write_unlock_bh(&est_lock);
+
+	if (list_empty(&elist[est->interval].list))
+		mod_timer(&elist[idx].timer, jiffies + ((HZ<<idx)/4));
+
+	list_add_rcu(&est->list, &elist[idx].list);
+	spin_unlock_bh(&est_lock);
 
 	return 0;
 }
 
+static void __gen_kill_estimator(struct rcu_head *head)
+{
+	struct gen_estimator *e = container_of(head, struct gen_estimator, e_rcu);
+	kfree(e);
+}
+
 /**
  * gen_kill_estimator - remove a rate estimator
  * @bstats: basic statistics
@@ -199,26 +208,23 @@
 	struct gnet_stats_rate_est *rate_est)
 {
 	int idx;
-	struct gen_estimator *est, **pest;
+	struct gen_estimator *e, *n;
 
 	for (idx=0; idx <= EST_MAX_INTERVAL; idx++) {
-		int killed = 0;
-		pest = &elist[idx].list;
-		while ((est=*pest) != NULL) {
-			if (est->rate_est != rate_est || est->bstats != bstats) {
-				pest = &est->next;
-				continue;
-			}
-			write_lock_bh(&est_lock);
-			*pest = est->next;
-			write_unlock_bh(&est_lock);
+		/* Skip non initialized indexes */
+		if (!elist[idx].timer.function)
+			continue;
+
+		list_for_each_entry_safe(e, n, &elist[idx].list, list) {
+			if (e->rate_est != rate_est || e->bstats != bstats)
+				continue;
 
-			kfree(est);
-			killed++;
+			spin_lock_bh(&est_lock);
+			list_del_rcu(&e->list);
+			spin_unlock_bh(&est_lock);
+			call_rcu(&e->e_rcu, __gen_kill_estimator);
 		}
-		if (killed && elist[idx].list == NULL)
-			del_timer(&elist[idx].timer);
 	}
 }
* Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree 2007-07-09 12:41 ` Ranko Zivojnovic @ 2007-07-09 13:52 ` Patrick McHardy 2007-07-09 16:43 ` Ranko Zivojnovic 0 siblings, 1 reply; 25+ messages in thread From: Patrick McHardy @ 2007-07-09 13:52 UTC (permalink / raw) To: Ranko Zivojnovic; +Cc: Jarek Poplawski, akpm, netdev Ranko Zivojnovic wrote: > On Sat, 2007-07-07 at 17:10 +0200, Patrick McHardy wrote: > >>On Sat, 7 Jul 2007, Ranko Zivojnovic wrote: >> >>>Maybe the appropriate way to fix this would to call gen_kill_estimator, >>>with the appropriate lock order, before the call to qdisc_destroy, so >>>when dev->queue_lock is taken for qdisc_destroy - the structure is >>>already off the list. >> >>Probably easier to just kill est_lock and use rcu lists. >>I'm currently travelling, I'll look into it tomorrow. > > > Patrick, I've taken liberty to try and implement this myself. Attached > is the whole new gen_estimator-fix-locking-and-timer-related-bugs.patch > that is RCU lists based. Please be kind to review. Thanks for taking care of this, I'm a bit behind. See below for comments... > --- a/net/core/gen_estimator.c 2007-06-25 02:21:48.000000000 +0300 > +++ b/net/core/gen_estimator.c 2007-07-09 14:27:12.053336875 +0300 > @@ -79,7 +79,7 @@ > > struct gen_estimator > { > - struct gen_estimator *next; > + struct list_head list; > struct gnet_stats_basic *bstats; > struct gnet_stats_rate_est *rate_est; > spinlock_t *stats_lock; > @@ -89,26 +89,27 @@ > u32 last_packets; > u32 avpps; > u32 avbps; > + struct rcu_head e_rcu; You could also use synchronize_rcu(); estimator destruction is not particularly performance critical. > static struct gen_estimator_head elist[EST_MAX_INTERVAL+1]; > > /* Estimator array lock */ > -static DEFINE_RWLOCK(est_lock); > +static DEFINE_SPINLOCK(est_lock); The lock isn't needed anymore since we can rely on the rtnl for creation/destruction. > /** > @@ -152,6 +153,7 @@ > { > struct gen_estimator *est; > struct gnet_estimator *parm = RTA_DATA(opt); > + int idx; > > if (RTA_PAYLOAD(opt) < sizeof(*parm)) > return -EINVAL; > @@ -163,7 +165,8 @@ > if (est == NULL) > return -ENOBUFS; > > - est->interval = parm->interval + 2; > + INIT_LIST_HEAD(&est->list); Initializing the members' list_head isn't necessary. ^ permalink raw reply [flat|nested] 25+ messages in thread
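For reference, the two ways of retiring an RCU-protected entry that are being weighed here look roughly like this (a generic sketch reusing the struct gen_estimator fields from the patch above; the est_destroy_* helpers are illustrative, not code from the submission):

---cut---
/* Asynchronous: call_rcu() is safe in atomic context; the callback
 * runs only after a grace period, i.e. after all current readers
 * have left their rcu_read_lock() sections. */
static void __gen_kill_estimator(struct rcu_head *head)
{
	struct gen_estimator *e = container_of(head, struct gen_estimator,
					       e_rcu);
	kfree(e);
}

static void est_destroy_async(struct gen_estimator *e)
{
	list_del_rcu(&e->list);
	call_rcu(&e->e_rcu, __gen_kill_estimator);
}

/* Synchronous: simpler to read, but synchronize_rcu() blocks until
 * the grace period has elapsed, so it may sleep and must not be
 * called with spinlocks held or with bottom halves disabled. */
static void est_destroy_sync(struct gen_estimator *e)
{
	list_del_rcu(&e->list);
	synchronize_rcu();
	kfree(e);
}
---cut---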
* Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree 2007-07-09 13:52 ` Patrick McHardy @ 2007-07-09 16:43 ` Ranko Zivojnovic 2007-07-09 16:54 ` Patrick McHardy 2007-07-10 7:34 ` Jarek Poplawski 0 siblings, 2 replies; 25+ messages in thread From: Ranko Zivojnovic @ 2007-07-09 16:43 UTC (permalink / raw) To: Patrick McHardy; +Cc: Jarek Poplawski, akpm, netdev [-- Attachment #1: Type: text/plain, Size: 2712 bytes --] On Mon, 2007-07-09 at 15:52 +0200, Patrick McHardy wrote: > Ranko Zivojnovic wrote: > > Patrick, I've taken liberty to try and implement this myself. Attached > > is the whole new gen_estimator-fix-locking-and-timer-related-bugs.patch > > that is RCU lists based. Please be kind to review. > > Thanks for taking care of this, I'm a bit behind. Glad to be able to help. > See below for comments .. > > > --- a/net/core/gen_estimator.c 2007-06-25 02:21:48.000000000 +0300 > > +++ b/net/core/gen_estimator.c 2007-07-09 14:27:12.053336875 +0300 > > @@ -79,7 +79,7 @@ > > > > struct gen_estimator > > { > > - struct gen_estimator *next; > > + struct list_head list; > > struct gnet_stats_basic *bstats; > > struct gnet_stats_rate_est *rate_est; > > spinlock_t *stats_lock; > > @@ -89,26 +89,27 @@ > > u32 last_packets; > > u32 avpps; > > u32 avbps; > > + struct rcu_head e_rcu; > > > You could also use synchronize_rcu(), estimator destruction is > not particulary performance critical. I've tried synchronize_rcu(), but it went kaboom with the below, thus call_rcu() must stay: ---cut--- BUG: sleeping function called from invalid context at kernel/sched.c:3928 in_atomic():1, irqs_disabled():0 3 locks held by tc/2933: #0: (rtnl_mutex){--..}, at: [<c0296e0a>] rtnetlink_rcv+0x18/0x42 #1: (&dev->queue_lock){-+..}, at: [<c029c337>] qdisc_lock_tree+0xe/0x1c #2: (&dev->ingress_lock){-...}, at: [<c029dfb8>] tc_get_qdisc+0x192/0x1e9 [<c02eec9e>] wait_for_completion+0x1a/0xb7 [<c01e3cd4>] __spin_lock_init+0x29/0x4b [<c012e410>] synchronize_rcu+0x2a/0x2f [<c012e0bc>] wakeme_after_rcu+0x0/0x8 [<c028da00>] gen_kill_estimator+0x70/0x9b [<f8b7c5a6>] htb_destroy_class+0x27/0x12f [sch_htb] [<f8b7d037>] htb_destroy+0x34/0x70 [sch_htb] [<c029c554>] qdisc_destroy+0x44/0x69 . . . ---cut--- > > static struct gen_estimator_head elist[EST_MAX_INTERVAL+1]; > > > > /* Estimator array lock */ > > -static DEFINE_RWLOCK(est_lock); > > +static DEFINE_SPINLOCK(est_lock); > > > The lock isn't needed anymore since we can rely on the rtnl > for creation/destruction. The 'est_lock' is now completely removed. > > /** > > @@ -152,6 +153,7 @@ > > { > > struct gen_estimator *est; > > struct gnet_estimator *parm = RTA_DATA(opt); > > + int idx; > > > > if (RTA_PAYLOAD(opt) < sizeof(*parm)) > > return -EINVAL; > > @@ -163,7 +165,8 @@ > > if (est == NULL) > > return -ENOBUFS; > > > > - est->interval = parm->interval + 2; > > + INIT_LIST_HEAD(&est->list); > > > Initializing the members' list_head isn't necessary. > Removed. All the changes above are now included in the attached patch. I've compiled it cleanly against 2.6.22 and am now testing this version. R.
[-- Attachment #2: gen_estimator-fix-locking-and-timer-related-bugs.patch --] [-- Type: text/x-patch, Size: 3635 bytes --] --- a/net/core/gen_estimator.c 2007-06-25 02:21:48.000000000 +0300 +++ b/net/core/gen_estimator.c 2007-07-09 19:08:06.801544963 +0300 @@ -79,7 +79,7 @@ struct gen_estimator { - struct gen_estimator *next; + struct list_head list; struct gnet_stats_basic *bstats; struct gnet_stats_rate_est *rate_est; spinlock_t *stats_lock; @@ -89,26 +89,24 @@ u32 last_packets; u32 avpps; u32 avbps; + struct rcu_head e_rcu; }; struct gen_estimator_head { struct timer_list timer; - struct gen_estimator *list; + struct list_head list; }; static struct gen_estimator_head elist[EST_MAX_INTERVAL+1]; -/* Estimator array lock */ -static DEFINE_RWLOCK(est_lock); - static void est_timer(unsigned long arg) { int idx = (int)arg; struct gen_estimator *e; - read_lock(&est_lock); - for (e = elist[idx].list; e; e = e->next) { + rcu_read_lock(); + list_for_each_entry_rcu(e, &elist[idx].list, list) { u64 nbytes; u32 npackets; u32 rate; @@ -127,9 +125,9 @@ e->rate_est->pps = (e->avpps+0x1FF)>>10; spin_unlock(e->stats_lock); } - - mod_timer(&elist[idx].timer, jiffies + ((HZ<<idx)/4)); - read_unlock(&est_lock); + if (!list_empty(&elist[idx].list)) + mod_timer(&elist[idx].timer, jiffies + ((HZ<<idx)/4)); + rcu_read_unlock(); } /** @@ -152,6 +150,7 @@ { struct gen_estimator *est; struct gnet_estimator *parm = RTA_DATA(opt); + int idx; if (RTA_PAYLOAD(opt) < sizeof(*parm)) return -EINVAL; @@ -163,7 +162,7 @@ if (est == NULL) return -ENOBUFS; - est->interval = parm->interval + 2; + est->interval = idx = parm->interval + 2; est->bstats = bstats; est->rate_est = rate_est; est->stats_lock = stats_lock; @@ -173,20 +172,24 @@ est->last_packets = bstats->packets; est->avpps = rate_est->pps<<10; - est->next = elist[est->interval].list; - if (est->next == NULL) { - init_timer(&elist[est->interval].timer); - elist[est->interval].timer.data = est->interval; - elist[est->interval].timer.expires = jiffies + ((HZ<<est->interval)/4); - elist[est->interval].timer.function = est_timer; - add_timer(&elist[est->interval].timer); + if (!elist[idx].timer.function) { + INIT_LIST_HEAD(&elist[idx].list); + setup_timer(&elist[idx].timer, est_timer, est->interval); } - write_lock_bh(&est_lock); - elist[est->interval].list = est; - write_unlock_bh(&est_lock); + + if (list_empty(&elist[est->interval].list)) + mod_timer(&elist[idx].timer, jiffies + ((HZ<<idx)/4)); + + list_add_rcu(&est->list, &elist[idx].list); return 0; } +static void __gen_kill_estimator(struct rcu_head *head) +{ + struct gen_estimator *e = container_of(head, struct gen_estimator, e_rcu); + kfree(e); +} + /** * gen_kill_estimator - remove a rate estimator * @bstats: basic statistics @@ -199,26 +202,21 @@ struct gnet_stats_rate_est *rate_est) { int idx; - struct gen_estimator *est, **pest; + struct gen_estimator *e, *n; for (idx=0; idx <= EST_MAX_INTERVAL; idx++) { - int killed = 0; - pest = &elist[idx].list; - while ((est=*pest) != NULL) { - if (est->rate_est != rate_est || est->bstats != bstats) { - pest = &est->next; - continue; - } - write_lock_bh(&est_lock); - *pest = est->next; - write_unlock_bh(&est_lock); + /* Skip non initialized indexes */ + if (!elist[idx].timer.function) + continue; + + list_for_each_entry_safe(e, n, &elist[idx].list, list) { + if (e->rate_est != rate_est || e->bstats != bstats) + continue; - kfree(est); - killed++; + list_del_rcu(&e->list); + call_rcu(&e->e_rcu, __gen_kill_estimator); } - if (killed && elist[idx].list == NULL) - 
del_timer(&elist[idx].timer); } } ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree 2007-07-09 16:43 ` Ranko Zivojnovic @ 2007-07-09 16:54 ` Patrick McHardy 2007-07-10 7:34 ` Jarek Poplawski 1 sibling, 0 replies; 25+ messages in thread From: Patrick McHardy @ 2007-07-09 16:54 UTC (permalink / raw) To: Ranko Zivojnovic; +Cc: Jarek Poplawski, akpm, netdev Ranko Zivojnovic wrote: > On Mon, 2007-07-09 at 15:52 +0200, Patrick McHardy wrote: > >>You could also use synchronize_rcu(), estimator destruction is >>not particulary performance critical. > > > I've tried synchronize_rcu(), but it went kaboom with the below, thus > call_rcu() must stay: > > ---cut--- > BUG: sleeping function called from invalid context at kernel/sched.c:3928 > in_atomic():1, irqs_disabled():0 Right, I forgot that we're holding some locks. > [...] > All the changes above now included in the attached patch. I've clean > compiled it with 2.6.22 and now testing this version. Looks good. Please resend with a Signed-off-by: line once you're done testing so Dave can apply it (ideally with a new subject so it doesn't get missed). ^ permalink raw reply [flat|nested] 25+ messages in thread
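The locks Patrick forgot about are right there in the trace Ranko posted: tc's teardown path runs with dev->queue_lock (and dev->ingress_lock) held via qdisc_lock_tree(), so nothing beneath it may sleep. In outline (a sketch of the call chain from the trace, not literal code):

---cut---
/* rtnl_mutex held: sleeping is still allowed here */
spin_lock_bh(&dev->queue_lock);	/* qdisc_lock_tree(): atomic from now on */
qdisc_destroy(qdisc);
	/* -> htb_destroy() -> htb_destroy_class()
	 *    -> gen_kill_estimator()
	 *       -> synchronize_rcu()	sleeps -> "BUG: sleeping function
	 *					   called from invalid context" */
spin_unlock_bh(&dev->queue_lock);
---cut---

This is why the deferred call_rcu() variant has to stay: it only queues a callback and never blocks.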
* Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree 2007-07-09 16:43 ` Ranko Zivojnovic 2007-07-09 16:54 ` Patrick McHardy @ 2007-07-10 7:34 ` Jarek Poplawski 2007-07-10 10:09 ` Ranko Zivojnovic 1 sibling, 1 reply; 25+ messages in thread From: Jarek Poplawski @ 2007-07-10 7:34 UTC (permalink / raw) To: Ranko Zivojnovic; +Cc: Patrick McHardy, akpm, netdev On Mon, Jul 09, 2007 at 07:43:40PM +0300, Ranko Zivojnovic wrote: > On Mon, 2007-07-09 at 15:52 +0200, Patrick McHardy wrote: > > Ranko Zivojnovic wrote: > > > Patrick, I've taken liberty to try and implement this myself. Attached > > > is the whole new gen_estimator-fix-locking-and-timer-related-bugs.patch > > > that is RCU lists based. Please be kind to review. ... I have some doubts/suggestions too: > --- a/net/core/gen_estimator.c 2007-06-25 02:21:48.000000000 +0300 > +++ b/net/core/gen_estimator.c 2007-07-09 19:08:06.801544963 +0300 ... > @@ -173,20 +172,24 @@ > est->last_packets = bstats->packets; > est->avpps = rate_est->pps<<10; > > - est->next = elist[est->interval].list; > - if (est->next == NULL) { > - init_timer(&elist[est->interval].timer); > - elist[est->interval].timer.data = est->interval; > - elist[est->interval].timer.expires = jiffies + ((HZ<<est->interval)/4); > - elist[est->interval].timer.function = est_timer; > - add_timer(&elist[est->interval].timer); > + if (!elist[idx].timer.function) { > + INIT_LIST_HEAD(&elist[idx].list); > + setup_timer(&elist[idx].timer, est_timer, est->interval); s/est->interval/idx/ here and below. > } > - write_lock_bh(&est_lock); > - elist[est->interval].list = est; > - write_unlock_bh(&est_lock); > + > + if (list_empty(&elist[est->interval].list)) > + mod_timer(&elist[idx].timer, jiffies + ((HZ<<idx)/4)); > + > + list_add_rcu(&est->list, &elist[idx].list); > return 0; > } > > +static void __gen_kill_estimator(struct rcu_head *head) > +{ > + struct gen_estimator *e = container_of(head, struct gen_estimator, e_rcu); > + kfree(e); > +} > + > /** > * gen_kill_estimator - remove a rate estimator > * @bstats: basic statistics > @@ -199,26 +202,21 @@ > struct gnet_stats_rate_est *rate_est) > { ... > + list_for_each_entry_safe(e, n, &elist[idx].list, list) { IMHO, at least for readability, list_for_each_entry_rcu() is better here. > + if (e->rate_est != rate_est || e->bstats != bstats) > + continue; > > - kfree(est); > - killed++; > + list_del_rcu(&e->list); > + call_rcu(&e->e_rcu, __gen_kill_estimator); I think a race is possible here: e.g. a timer could still be running after return from this function, trying to use *bstats, *rate_est and maybe even stats_lock after their destruction. BTW, I think rcu_read_lock/unlock are recommended around e.g. RCU list traversals, even if the current implementation doesn't use them now. Regards, Jarek P. ^ permalink raw reply [flat|nested] 25+ messages in thread
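The race Jarek describes can be drawn as a timeline (a sketch; it assumes, as in HTB, that bstats/rate_est and the lock behind stats_lock live inside the class or qdisc being torn down):

---cut---
/*
 *  CPU 0: est_timer()                CPU 1: teardown under RTNL
 *  ------------------                --------------------------
 *  rcu_read_lock()
 *  e = entry in elist[idx]
 *                                    gen_kill_estimator()
 *                                      list_del_rcu(&e->list)
 *                                      call_rcu(...)  <- defers kfree(e) only
 *                                    htb_destroy_class() frees the class,
 *                                    and with it *e->bstats, *e->rate_est
 *                                    and the spinlock behind e->stats_lock
 *  spin_lock(e->stats_lock)          <- use after free
 *  e->rate_est->bps = ...            <- use after free
 *  rcu_read_unlock()
 *                                    (only now may the RCU callback free e)
 */
---cut---

The grace period protects the gen_estimator structure itself, but nothing here defers the freeing of the objects it points at.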
* Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree 2007-07-10 7:34 ` Jarek Poplawski @ 2007-07-10 10:09 ` Ranko Zivojnovic 2007-07-10 12:17 ` Jarek Poplawski 0 siblings, 1 reply; 25+ messages in thread From: Ranko Zivojnovic @ 2007-07-10 10:09 UTC (permalink / raw) To: Jarek Poplawski; +Cc: Patrick McHardy, akpm, netdev On Tue, 2007-07-10 at 09:34 +0200, Jarek Poplawski wrote: > On Mon, Jul 09, 2007 at 07:43:40PM +0300, Ranko Zivojnovic wrote: > > On Mon, 2007-07-09 at 15:52 +0200, Patrick McHardy wrote: > > > Ranko Zivojnovic wrote: > > > > Patrick, I've taken liberty to try and implement this myself. Attached > > > > is the whole new gen_estimator-fix-locking-and-timer-related-bugs.patch > > > > that is RCU lists based. Please be kind to review. > ... > > I've some doubts/suggestions too: > > > --- a/net/core/gen_estimator.c 2007-06-25 02:21:48.000000000 +0300 > > +++ b/net/core/gen_estimator.c 2007-07-09 19:08:06.801544963 +0300 > ... > > @@ -173,20 +172,24 @@ > > est->last_packets = bstats->packets; > > est->avpps = rate_est->pps<<10; > > > > - est->next = elist[est->interval].list; > > - if (est->next == NULL) { > > - init_timer(&elist[est->interval].timer); > > - elist[est->interval].timer.data = est->interval; > > - elist[est->interval].timer.expires = jiffies + ((HZ<<est->interval)/4); > > - elist[est->interval].timer.function = est_timer; > > - add_timer(&elist[est->interval].timer); > > + if (!elist[idx].timer.function) { > > + INIT_LIST_HEAD(&elist[idx].list); > > + setup_timer(&elist[idx].timer, est_timer, est->interval); > > s/est->interval/idx/ here and below. Agreed - will do in the final submission. > > > } > > - write_lock_bh(&est_lock); > > - elist[est->interval].list = est; > > - write_unlock_bh(&est_lock); > > + > > + if (list_empty(&elist[est->interval].list)) > > + mod_timer(&elist[idx].timer, jiffies + ((HZ<<idx)/4)); > > + > > + list_add_rcu(&est->list, &elist[idx].list); > > return 0; > > } > > > > +static void __gen_kill_estimator(struct rcu_head *head) > > +{ > > + struct gen_estimator *e = container_of(head, struct gen_estimator, e_rcu); > > + kfree(e); > > +} > > + > > /** > > * gen_kill_estimator - remove a rate estimator > > * @bstats: basic statistics > > @@ -199,26 +202,21 @@ > > struct gnet_stats_rate_est *rate_est) > > { > ... > > + list_for_each_entry_safe(e, n, &elist[idx].list, list) { > > IMHO, at least for readability list_for_each_entry_rcu() is better here. It should not hurt - however, functionality-wise it is not necessary, as the list is protected from being altered by rtnl_mutex - also, for correctness, it would have to be list_for_each_safe_rcu(), as there is a list_del_rcu() in the loop... However, I decided not to use _rcu-based iteration, nor rcu_read_lock(), after going through the RCU documentation and a bunch of examples in the kernel that iterate through such lists using non-_rcu macros and do list_del_rcu() just fine. For readability, the references to list_del_rcu() as well as call_rcu(), I believe, should be enough of an indication. Please do correct me if I am wrong here. > > > + if (e->rate_est != rate_est || e->bstats != bstats) > > + continue; > > > > - kfree(est); > > - killed++; > > + list_del_rcu(&e->list); > > + call_rcu(&e->e_rcu, __gen_kill_estimator); > > I think a race is possible here: e.g. a timer could be running > after return from this function yet, and trying to use *bstats, > *rate_est and maybe even stats_lock after their destruction.
> That will not happen, because est_timer protects the loop with rcu_read_lock() and the iteration is done using list_for_each_entry_rcu(). The destruction callback is deferred and will happen only after the outermost rcu_read_unlock() is called. Well - that is at least what the documentation says... > BTW, I think, rcu_read_lock/unlock are recommended around e.g. > rcu lists traversals, even if the current implementation doesn't > use them now. I am afraid I do not understand what it is that the current implementation is not using right now... The whole concept and implementation of RCU, as I understand it, depend on rcu_read_lock/unlock to protect the read-side critical sections... Please be so kind as to elaborate... R. ^ permalink raw reply [flat|nested] 25+ messages in thread
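Both halves of this exchange hold up: the deferral Ranko relies on does protect the gen_estimator itself, while Jarek is asking about the objects it merely points at. Schematically (generic RCU semantics, not code from the patch):

---cut---
rcu_read_lock();
list_for_each_entry_rcu(e, &elist[idx].list, list) {
	/* Guaranteed: e itself cannot be kfree()d while we are in this
	 * section - call_rcu() callbacks run only after every reader
	 * that might still see the entry has called rcu_read_unlock(). */
	spin_lock(e->stats_lock);
	/* Not guaranteed: that *e->bstats and *e->rate_est still exist -
	 * they are owned and freed elsewhere, outside this RCU scheme. */
	spin_unlock(e->stats_lock);
}
rcu_read_unlock();
---cut---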
* Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree 2007-07-10 10:09 ` Ranko Zivojnovic @ 2007-07-10 12:17 ` Jarek Poplawski 2007-07-10 12:20 ` Patrick McHardy 0 siblings, 1 reply; 25+ messages in thread From: Jarek Poplawski @ 2007-07-10 12:17 UTC (permalink / raw) To: Ranko Zivojnovic; +Cc: Patrick McHardy, akpm, netdev On Tue, Jul 10, 2007 at 01:09:07PM +0300, Ranko Zivojnovic wrote: > On Tue, 2007-07-10 at 09:34 +0200, Jarek Poplawski wrote: > > On Mon, Jul 09, 2007 at 07:43:40PM +0300, Ranko Zivojnovic wrote: > > > On Mon, 2007-07-09 at 15:52 +0200, Patrick McHardy wrote: > > > > Ranko Zivojnovic wrote: > > > > > Patrick, I've taken liberty to try and implement this myself. Attached > > > > > is the whole new gen_estimator-fix-locking-and-timer-related-bugs.patch > > > > > that is RCU lists based. Please be kind to review. > > ... > > > > I've some doubts/suggestions too: > > > > > --- a/net/core/gen_estimator.c 2007-06-25 02:21:48.000000000 +0300 > > > +++ b/net/core/gen_estimator.c 2007-07-09 19:08:06.801544963 +0300 ... > > > struct gnet_stats_rate_est *rate_est) > > > { > > ... > > > + list_for_each_entry_safe(e, n, &elist[idx].list, list) { > > > > IMHO, at least for readability list_for_each_entry_rcu() is better here. > > Should not hurt - however functionality wise not necessary as the list > is protected by rtnl_mutex from being altered - also, for correctness, > it should be list_for_each_safe_rcu() as there is a list_del_rcu in the > loop... > > However I decided not to use _rcu based iteration neither the > rcu_read_lock() after going through the RCU documentation and a bunch of > examples in kernel that iterate through the lists using non _rcu macros > and do list_del_rcu() just fine. > > For readability, the reference to list_del_rcu as well as call_rcu, I > believe, should be enough of the indication. Please do correct me if I > am wrong here. It's only my opinion, and it's probably not very popular, at least in net/ code, so it's more about general policy than this particular code. But: - if somebody is hunting some RCU-related problem, why can't he/she save some time by skipping lists without _rcu, rather than analyzing why/how such lists are used and locked? - if some day the RCU code should change and needs to know precisely where all the reads and writes take place (or some automatic tool needs this for checking), and it's no problem to show this now, why leave this for the future? > > > > > > + if (e->rate_est != rate_est || e->bstats != bstats) > > > + continue; > > > > > > - kfree(est); > > > - killed++; > > > + list_del_rcu(&e->list); > > > + call_rcu(&e->e_rcu, __gen_kill_estimator); > > > > I think a race is possible here: e.g. a timer could be running > > after return from this function yet, and trying to use *bstats, > > *rate_est and maybe even stats_lock after their destruction. > > > > That will not happen because est_timer protects the loop with > rcu_read_lock() as well as the iteration is done using > list_for_each_entry_rcu(). The destruction callback is deferred and will > happen only after the outermost rcu_read_unlock() is called. Well - that > is at least what the documentation says... I'm not very sure it's really possible, so I hope it'll be checked by somebody else as well. I don't mean the gen_estimator struct here, but the gnet_stats_basic and gnet_stats_rate_est structs, which are allocated elsewhere and could be destroyed just 'after' - if not before: that's the question... E.g.
when we are leaving this function, a timer on another CPU could still be traversing this list, and I'm not sure RCU has to make the change visible fast enough - before these other structs are freed. > > > BTW, I think, rcu_read_lock/unlock are recommended around e.g. > > rcu lists traversals, even if the current implementation doesn't > > use them now. > > I am afraid I do not understand what that the current implementation is > not using right now... The whole concept and the implementation of the > RCU, as I understand it, depends on rcu_read_lock/unlock to protect the > read-side critical sections... > > Please be kind to elaborate... Don't take this very seriously, there are many opinions... Cheers, Jarek P. PS: I think you could read more about this here: http://marc.info/?l=linux-kernel&m=118193431820959&w=2 Especially this: List: linux-kernel Subject: Re: Using RCU with rcu_read_lock()? From: Peter Zijlstra <a.p.zijlstra () chello ! nl> Date: 2007-06-15 19:04:19 Message-ID: 1181934259.11113.6.camel () lappy On Fri, 2007-06-15 at 15:00 -0400, Dmitry Torokhov wrote: > Hi, > > I have a piece of code that is always called under a spinlock with > interrupts disabled. Within that piece of code I iterate through a > list. I have another piece of code that wants to modify that list. I > have 2 options: > > 1) Have the other piece of code acquire the same spinlock > 2) Use RCU > > I don't want to do 1) because the other piece of code does not really > care about object owning the spinlock and so acquiring the spinlock is > "not nice". However it is guaranteed that the piece of code that > accesses lock runs atomically with interrupts disabled. So > rcu_read_lock() would be superfluous there. > > Is it possible to still use list_for_each_rcu() and friends to access > that list without rcu_read_lock()? Or is it better to use the complete RCU > interface and eat the cost of a couple of extra instructions? Yes, preemptible rcu requires that you use the full interface; also, it more clearly documents the code. Trying to find code that breaks these assumptions is very tedious work after the fact. Please do use the RCU interface in full. ^ permalink raw reply [flat|nested] 25+ messages in thread
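The rule of thumb from that linked thread, in code form (a generic sketch; mylock, myhead and handle() are placeholders): even in a section that is already atomic - spinlock held, interrupts off - the RCU traversal should still be bracketed explicitly, both because it documents the read side and because a preemptible RCU implementation makes rcu_read_lock() do real work:

---cut---
spin_lock_irqsave(&mylock, flags);	/* context is already atomic */
rcu_read_lock();			/* still required by the RCU API */
list_for_each_entry_rcu(p, &myhead, list)
	handle(p);			/* read-side access only */
rcu_read_unlock();
spin_unlock_irqrestore(&mylock, flags);
---cut---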
* Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree 2007-07-10 12:17 ` Jarek Poplawski @ 2007-07-10 12:20 ` Patrick McHardy 2007-07-10 13:10 ` Jarek Poplawski 0 siblings, 1 reply; 25+ messages in thread From: Patrick McHardy @ 2007-07-10 12:20 UTC (permalink / raw) To: Jarek Poplawski; +Cc: Ranko Zivojnovic, akpm, netdev Jarek Poplawski wrote: > On Tue, Jul 10, 2007 at 01:09:07PM +0300, Ranko Zivojnovic wrote: > >>However I decided not to use _rcu based iteration neither the >>rcu_read_lock() after going through the RCU documentation and a bunch of >>examples in kernel that iterate through the lists using non _rcu macros >>and do list_del_rcu() just fine. >> >>For readability, the reference to list_del_rcu as well as call_rcu, I >>believe, should be enough of the indication. Please do correct me if I >>am wrong here. > > > It's only my opinion, and it's probably not very popular at least > at net/ code, so it's more about general policy and not this > particular code. But: > - if somebody is looking after some rcu related problems, why can't > he/she spare some time by omitting lists without _rcu and not > analyzing why/how such lists are used and locked? RCU is used for the read-side; using it on the write-side just makes things *less* understandable IMO. It will look like the read-side but still do modifications. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree 2007-07-10 12:20 ` Patrick McHardy @ 2007-07-10 13:10 ` Jarek Poplawski 2007-07-10 13:51 ` Jarek Poplawski 0 siblings, 1 reply; 25+ messages in thread From: Jarek Poplawski @ 2007-07-10 13:10 UTC (permalink / raw) To: Patrick McHardy; +Cc: Ranko Zivojnovic, akpm, netdev On Tue, Jul 10, 2007 at 02:20:12PM +0200, Patrick McHardy wrote: > Jarek Poplawski wrote: > > On Tue, Jul 10, 2007 at 01:09:07PM +0300, Ranko Zivojnovic wrote: > > > >>However I decided not to use _rcu based iteration neither the > >>rcu_read_lock() after going through the RCU documentation and a bunch of > >>examples in kernel that iterate through the lists using non _rcu macros > >>and do list_del_rcu() just fine. > >> > >>For readability, the reference to list_del_rcu as well as call_rcu, I > >>believe, should be enough of the indication. Please do correct me if I > >>am wrong here. > > > > > > It's only my opinion, and it's probably not very popular at least > > at net/ code, so it's more about general policy and not this > > particular code. But: > > - if somebody is looking after some rcu related problems, why can't > > he/she spare some time by omitting lists without _rcu and not > > analyzing why/how such lists are used and locked? > > > RCU is used for the read-side, using it on the write-side just makes > things *less* understandable IMO. It will look like the read-side > but still do modifications. > From Documentation/RCU/checklist: "9. All RCU list-traversal primitives, which include list_for_each_rcu(), list_for_each_entry_rcu(), list_for_each_continue_rcu(), and list_for_each_safe_rcu(), must be within an RCU read-side critical section. RCU read-side critical sections are delimited by rcu_read_lock() and rcu_read_unlock(), or by similar primitives such as rcu_read_lock_bh() and rcu_read_unlock_bh(). Use of the _rcu() list-traversal primitives outside of an RCU read-side critical section causes no harm other than a slight performance degradation on Alpha CPUs. It can also be quite helpful in reducing code bloat when common code is shared between readers and updaters." So, it seems, it's a question of taste as to what counts as bloat here. I'd certainly agree with you if the write-side looked like the examples from the documentation: there are always a few lines of code with some spinlocks. But here we see no locks and no comments, so each time we have to guess which of the locks supposedly held here is needed for RCU, or maybe for the lists without "rcu". Sorry for messing here with my private opinions: I really shouldn't have started this theme. Jarek P. ^ permalink raw reply [flat|nested] 25+ messages in thread
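The write-side shape the checklist has in mind - where the writer's serialization is visible right next to the list update - looks like this (a generic sketch in the documentation's style; mylist_lock, myhead and search() are placeholders, not gen_estimator code):

---cut---
spin_lock(&mylist_lock);	/* writer lock, visible at the update site */
p = search(&myhead, key);
if (p != NULL)
	list_del_rcu(&p->list);
spin_unlock(&mylist_lock);
if (p != NULL) {
	synchronize_rcu();	/* or call_rcu() a callback that frees p */
	kfree(p);
}
---cut---

In gen_estimator the writer's lock is the RTNL mutex, taken far up the call chain, which is why nothing at the update site hints at how the list is serialized - Jarek's complaint - while decorating the writer with _rcu iteration could make it read like a reader - Patrick's counterpoint.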
* Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree 2007-07-10 13:10 ` Jarek Poplawski @ 2007-07-10 13:51 ` Jarek Poplawski 0 siblings, 0 replies; 25+ messages in thread From: Jarek Poplawski @ 2007-07-10 13:51 UTC (permalink / raw) To: Patrick McHardy; +Cc: Ranko Zivojnovic, akpm, netdev On Tue, Jul 10, 2007 at 03:10:34PM +0200, Jarek Poplawski wrote: > On Tue, Jul 10, 2007 at 02:20:12PM +0200, Patrick McHardy wrote: > > Jarek Poplawski wrote: > > > On Tue, Jul 10, 2007 at 01:09:07PM +0300, Ranko Zivojnovic wrote: > > > > > > >>However I decided not to use _rcu based iteration neither the > > >>rcu_read_lock() after going through the RCU documentation and a bunch of > > >>examples in kernel that iterate through the lists using non _rcu macros > > >>and do list_del_rcu() just fine. > > >> > > >>For readability, the reference to list_del_rcu as well as call_rcu, I > > >>believe, should be enough of the indication. Please do correct me if I > > >>am wrong here. > > > > > > > > > It's only my opinion, and it's probably not very popular at least > > > at net/ code, so it's more about general policy and not this > > > particular code. But: > > > - if somebody is looking after some rcu related problems, why can't > > > he/she spare some time by omitting lists without _rcu and not > > > analyzing why/how such lists are used and locked? > > > > > > RCU is used for the read-side, using it on the write-side just makes > > things *less* understandable IMO. It will look like the read-side > > but still do modifications. > > > > From Documentation/RCU/checklist: > > "9. All RCU list-traversal primitives, which include > list_for_each_rcu(), list_for_each_entry_rcu(), > list_for_each_continue_rcu(), and list_for_each_safe_rcu(), > must be within an RCU read-side critical section. RCU > read-side critical sections are delimited by rcu_read_lock() > and rcu_read_unlock(), or by similar primitives such as > rcu_read_lock_bh() and rcu_read_unlock_bh(). ...But, on the other hand, Ranko didn't use any of these primitives... So, first I said he should use them, and then I asked why there were no rcu_read_locks around... I am really starting to amaze myself with my consistency. I hope some day you'll forgive me (no hurry, I can wait). Cheers, Jarek P. ^ permalink raw reply [flat|nested] 25+ messages in thread
Thread overview: 25+ messages
[not found] <200706271921.l5RJLgCC003910@imap1.linux-foundation.org>
[not found] ` <1183642800.3789.11.camel@ranko-fc2.spidernet.net>
[not found] ` <20070705142135.GG4759@ff.dom.local>
[not found] ` <1183646029.4069.11.camel@ranko-fc2.spidernet.net>
[not found] ` <1183651165.4069.26.camel@ranko-fc2.spidernet.net>
2007-07-06 6:14 ` + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree Jarek Poplawski
2007-07-06 6:20 ` Fwd: " Jarek Poplawski
2007-07-06 6:26 ` Jarek Poplawski
2007-07-06 6:45 ` Jarek Poplawski
2007-07-06 12:47 ` Jarek Poplawski
2007-07-06 13:16 ` Ranko Zivojnovic
2007-07-09 8:25 ` Jarek Poplawski
2007-07-06 13:14 ` Ranko Zivojnovic
2007-07-06 13:27 ` Patrick McHardy
2007-07-06 13:59 ` Ranko Zivojnovic
2007-07-06 14:21 ` Patrick McHardy
2007-07-06 14:55 ` Ranko Zivojnovic
2007-07-07 7:55 ` Ranko Zivojnovic
2007-07-07 15:10 ` Patrick McHardy
2007-07-09 7:47 ` Jarek Poplawski
2007-07-09 12:41 ` Ranko Zivojnovic
2007-07-09 13:52 ` Patrick McHardy
2007-07-09 16:43 ` Ranko Zivojnovic
2007-07-09 16:54 ` Patrick McHardy
2007-07-10 7:34 ` Jarek Poplawski
2007-07-10 10:09 ` Ranko Zivojnovic
2007-07-10 12:17 ` Jarek Poplawski
2007-07-10 12:20 ` Patrick McHardy
2007-07-10 13:10 ` Jarek Poplawski
2007-07-10 13:51 ` Jarek Poplawski