netdev.vger.kernel.org archive mirror
* [3.19-rc3] tg3: BUG: sleeping function called from invalid context
@ 2015-01-13  0:59 Peter Hurley
  2015-01-13  2:30 ` Prashant Sreedharan
  2015-01-13  6:49 ` Michael Chan
  0 siblings, 2 replies; 7+ messages in thread
From: Peter Hurley @ 2015-01-13  0:59 UTC
  To: Prashant Sreedharan, Michael Chan; +Cc: netdev, Linux kernel

On 3.19-rc3, I'm seeing this might_sleep() warning [1] from the tg3_open()
call stack. Let me know if I need to bisect this.

Regards,
Peter Hurley

[1]

[   17.203009] BUG: sleeping function called from invalid context at /home/peter/src/kernels/mainline/kernel/irq/manage.c:104
[   17.203067] in_atomic(): 1, irqs_disabled(): 0, pid: 1106, name: ip
[   17.203092] 2 locks held by ip/1106:
[   17.205255]  #0:  (rtnl_mutex){+.+.+.}, at: [<ffffffff816adf1f>] rtnetlink_rcv+0x1f/0x40
[   17.207445]  #1:  (&(&tp->lock)->rlock){+.....}, at: [<ffffffffa01073e6>] tg3_start+0xc06/0x11f0 [tg3]
[   17.209725] CPU: 2 PID: 1106 Comm: ip Not tainted 3.19.0-rc3+wip-xeon+lockdep #rc3+wip
[   17.211900] Hardware name: Dell Inc. Precision WorkStation T5400  /0RW203, BIOS A11 04/30/2012
[   17.214086]  0000000000000068 ffff8802ac823498 ffffffff817af7e8 0000000000000005
[   17.216265]  ffffffff81a9be78 ffff8802ac8234a8 ffffffff810998a5 ffff8802ac8234d8
[   17.218446]  ffffffff8109991a ffff8802ac8234c8 ffff8802af0aae00 ffffffffa00ed000
[   17.220636] Call Trace:
[   17.222743]  [<ffffffff817af7e8>] dump_stack+0x4f/0x7b
[   17.224808]  [<ffffffff810998a5>] ___might_sleep+0x105/0x140
[   17.226842]  [<ffffffff8109991a>] __might_sleep+0x3a/0xa0
[   17.228869]  [<ffffffffa00ed000>] ? 0xffffffffa00ed000
[   17.230939]  [<ffffffff810d7d78>] synchronize_irq+0x38/0xa0
[   17.232967]  [<ffffffffa00ed000>] ? 0xffffffffa00ed000
[   17.234991]  [<ffffffffa010105f>] tg3_chip_reset+0x13f/0x9c0 [tg3]
[   17.236988]  [<ffffffffa01020ae>] tg3_reset_hw+0x7e/0x2d20 [tg3]
[   17.238996]  [<ffffffff813bfaff>] ? __udelay+0x2f/0x40
[   17.241007]  [<ffffffffa00ef2f7>] ? _tw32_flush+0x47/0x80 [tg3]
[   17.243066]  [<ffffffffa0104dac>] tg3_init_hw+0x5c/0x70 [tg3]
[   17.245438]  [<ffffffffa010740b>] tg3_start+0xc2b/0x11f0 [tg3]
[   17.247444]  [<ffffffffa0107ad7>] ? tg3_open+0x107/0x2e0 [tg3]
[   17.249556]  [<ffffffff810c338d>] ? trace_hardirqs_on+0xd/0x10
[   17.251581]  [<ffffffff8107806f>] ? __local_bh_enable_ip+0x6f/0x100
[   17.253710]  [<ffffffffa0107af8>] tg3_open+0x128/0x2e0 [tg3]
[   17.255758]  [<ffffffff816ba3f5>] ? netpoll_poll_disable+0x5/0xa0
[   17.257932]  [<ffffffff816a14af>] __dev_open+0xbf/0x140
[   17.260091]  [<ffffffff816a17c1>] __dev_change_flags+0xa1/0x160
[   17.262222]  [<ffffffff816a18a9>] dev_change_flags+0x29/0x60
[   17.264360]  [<ffffffff816b0e02>] do_setlink+0x2f2/0xa30
[   17.266431]  [<ffffffff816b1b7f>] rtnl_newlink+0x51f/0x750
[   17.268485]  [<ffffffff816b1749>] ? rtnl_newlink+0xe9/0x750
[   17.270483]  [<ffffffff811869c2>] ? free_pages_prepare+0x1d2/0x270
[   17.272507]  [<ffffffff810c32bd>] ? trace_hardirqs_on_caller+0x11d/0x1e0
[   17.274531]  [<ffffffff813dd1b2>] ? nla_parse+0x32/0x120
[   17.276531]  [<ffffffff81021ab5>] ? native_sched_clock+0x35/0xa0
[   17.278514]  [<ffffffff816adfd5>] rtnetlink_rcv_msg+0x95/0x250
[   17.280485]  [<ffffffff8109f699>] ? preempt_count_sub+0x49/0x50
[   17.282448]  [<ffffffff817b4a02>] ? mutex_lock_nested+0x382/0x530
[   17.284402]  [<ffffffff816adf1f>] ? rtnetlink_rcv+0x1f/0x40
[   17.286290]  [<ffffffff816adf1f>] ? rtnetlink_rcv+0x1f/0x40
[   17.288142]  [<ffffffff816adf40>] ? rtnetlink_rcv+0x40/0x40
[   17.290031]  [<ffffffff816cedc1>] netlink_rcv_skb+0xc1/0xe0
[   17.291836]  [<ffffffff816adf2e>] rtnetlink_rcv+0x2e/0x40
[   17.293615]  [<ffffffff816ce473>] netlink_unicast+0xf3/0x1d0
[   17.295420]  [<ffffffff816ce863>] netlink_sendmsg+0x313/0x690
[   17.297132]  [<ffffffff811ada4f>] ? might_fault+0x5f/0xb0
[   17.298799]  [<ffffffff8168253c>] do_sock_sendmsg+0x8c/0x100
[   17.300493]  [<ffffffff81681e3e>] ? copy_msghdr_from_user+0x15e/0x1f0
[   17.302173]  [<ffffffff81682aeb>] ___sys_sendmsg+0x30b/0x320
[   17.303798]  [<ffffffff81021ab5>] ? native_sched_clock+0x35/0xa0
[   17.305431]  [<ffffffff810bdee0>] ? cpuacct_account_field+0x80/0xb0
[   17.307085]  [<ffffffff81021ab5>] ? native_sched_clock+0x35/0xa0
[   17.308744]  [<ffffffff810a4f35>] ? sched_clock_local+0x25/0x90
[   17.310375]  [<ffffffff810a5dc1>] ? vtime_account_user+0x91/0xa0
[   17.311948]  [<ffffffff810a5198>] ? sched_clock_cpu+0xb8/0xe0
[   17.313509]  [<ffffffff810bf8be>] ? put_lock_stats.isra.26+0xe/0x30
[   17.315069]  [<ffffffff810c007e>] ? lock_release_holdtime.part.27+0x12e/0x1b0
[   17.316618]  [<ffffffff810a5dc1>] ? vtime_account_user+0x91/0xa0
[   17.318162]  [<ffffffff8109f5d1>] ? get_parent_ip+0x11/0x50
[   17.319703]  [<ffffffff8109f699>] ? preempt_count_sub+0x49/0x50
[   17.321235]  [<ffffffff811807e5>] ? context_tracking_user_exit+0x55/0x130
[   17.322732]  [<ffffffff811807e5>] ? context_tracking_user_exit+0x55/0x130
[   17.324197]  [<ffffffff816834f2>] __sys_sendmsg+0x42/0x80
[   17.325634]  [<ffffffff81683542>] SyS_sendmsg+0x12/0x20
[   17.327048]  [<ffffffff817ba12d>] system_call_fastpath+0x16/0x1b

* Re: [3.19-rc3] tg3: BUG: sleeping function called from invalid context
  2015-01-13  0:59 [3.19-rc3] tg3: BUG: sleeping function called from invalid context Peter Hurley
@ 2015-01-13  2:30 ` Prashant Sreedharan
  2015-01-13 12:30   ` Peter Hurley
  2015-01-14 16:06   ` Peter Hurley
  2015-01-13  6:49 ` Michael Chan
  1 sibling, 2 replies; 7+ messages in thread
From: Prashant Sreedharan @ 2015-01-13  2:30 UTC
  To: Peter Hurley; +Cc: Michael Chan, netdev, Linux kernel

On Mon, 2015-01-12 at 19:59 -0500, Peter Hurley wrote:
> On 3.19-rc3, I'm seeing this might_sleep() warning [1] from the tg3_open()
> call stack. Let me know if I need to bisect this.
> 
> Regards,
> Peter Hurley
> 
> [1]
> 
> [   17.203009] BUG: sleeping function called from invalid context at /home/peter/src/kernels/mainline/kernel/irq/manage.c:104
> [   17.203067] in_atomic(): 1, irqs_disabled(): 0, pid: 1106, name: ip
> [   17.203092] 2 locks held by ip/1106:
> [   17.205255]  #0:  (rtnl_mutex){+.+.+.}, at: [<ffffffff816adf1f>] rtnetlink_rcv+0x1f/0x40
> [   17.207445]  #1:  (&(&tp->lock)->rlock){+.....}, at: [<ffffffffa01073e6>] tg3_start+0xc06/0x11f0 [tg3]
> [   17.209725] CPU: 2 PID: 1106 Comm: ip Not tainted 3.19.0-rc3+wip-xeon+lockdep #rc3+wip
> [   17.211900] Hardware name: Dell Inc. Precision WorkStation T5400  /0RW203, BIOS A11 04/30/2012
> [   17.214086]  0000000000000068 ffff8802ac823498 ffffffff817af7e8 0000000000000005
> [   17.216265]  ffffffff81a9be78 ffff8802ac8234a8 ffffffff810998a5 ffff8802ac8234d8
> [   17.218446]  ffffffff8109991a ffff8802ac8234c8 ffff8802af0aae00 ffffffffa00ed000
> [   17.220636] Call Trace:
> [   17.222743]  [<ffffffff817af7e8>] dump_stack+0x4f/0x7b
> [   17.224808]  [<ffffffff810998a5>] ___might_sleep+0x105/0x140
> [   17.226842]  [<ffffffff8109991a>] __might_sleep+0x3a/0xa0
> [   17.228869]  [<ffffffffa00ed000>] ? 0xffffffffa00ed000
> [   17.230939]  [<ffffffff810d7d78>] synchronize_irq+0x38/0xa0
> [   17.232967]  [<ffffffffa00ed000>] ? 0xffffffffa00ed000
> [   17.234991]  [<ffffffffa010105f>] tg3_chip_reset+0x13f/0x9c0 [tg3]
> [   17.236988]  [<ffffffffa01020ae>] tg3_reset_hw+0x7e/0x2d20 [tg3]
> [   17.238996]  [<ffffffff813bfaff>] ? __udelay+0x2f/0x40
> [   17.241007]  [<ffffffffa00ef2f7>] ? _tw32_flush+0x47/0x80 [tg3]
> [   17.243066]  [<ffffffffa0104dac>] tg3_init_hw+0x5c/0x70 [tg3]
> [   17.245438]  [<ffffffffa010740b>] tg3_start+0xc2b/0x11f0 [tg3]
> [   17.247444]  [<ffffffffa0107ad7>] ? tg3_open+0x107/0x2e0 [tg3]
> [   17.249556]  [<ffffffff810c338d>] ? trace_hardirqs_on+0xd/0x10
> [   17.251581]  [<ffffffff8107806f>] ? __local_bh_enable_ip+0x6f/0x100
> [   17.253710]  [<ffffffffa0107af8>] tg3_open+0x128/0x2e0 [tg3]
> [   17.255758]  [<ffffffff816ba3f5>] ? netpoll_poll_disable+0x5/0xa0
> [   17.257932]  [<ffffffff816a14af>] __dev_open+0xbf/0x140
> [   17.260091]  [<ffffffff816a17c1>] __dev_change_flags+0xa1/0x160
> [   17.262222]  [<ffffffff816a18a9>] dev_change_flags+0x29/0x60
> [   17.264360]  [<ffffffff816b0e02>] do_setlink+0x2f2/0xa30
> [   17.266431]  [<ffffffff816b1b7f>] rtnl_newlink+0x51f/0x750
> [   17.268485]  [<ffffffff816b1749>] ? rtnl_newlink+0xe9/0x750
> [   17.270483]  [<ffffffff811869c2>] ? free_pages_prepare+0x1d2/0x270
> [   17.272507]  [<ffffffff810c32bd>] ? trace_hardirqs_on_caller+0x11d/0x1e0
> [   17.274531]  [<ffffffff813dd1b2>] ? nla_parse+0x32/0x120
> [   17.276531]  [<ffffffff81021ab5>] ? native_sched_clock+0x35/0xa0
> [   17.278514]  [<ffffffff816adfd5>] rtnetlink_rcv_msg+0x95/0x250
> [   17.280485]  [<ffffffff8109f699>] ? preempt_count_sub+0x49/0x50
> [   17.282448]  [<ffffffff817b4a02>] ? mutex_lock_nested+0x382/0x530
> [   17.284402]  [<ffffffff816adf1f>] ? rtnetlink_rcv+0x1f/0x40
> [   17.286290]  [<ffffffff816adf1f>] ? rtnetlink_rcv+0x1f/0x40
> [   17.288142]  [<ffffffff816adf40>] ? rtnetlink_rcv+0x40/0x40
> [   17.290031]  [<ffffffff816cedc1>] netlink_rcv_skb+0xc1/0xe0
> [   17.291836]  [<ffffffff816adf2e>] rtnetlink_rcv+0x2e/0x40
> [   17.293615]  [<ffffffff816ce473>] netlink_unicast+0xf3/0x1d0
> [   17.295420]  [<ffffffff816ce863>] netlink_sendmsg+0x313/0x690
> [   17.297132]  [<ffffffff811ada4f>] ? might_fault+0x5f/0xb0
> [   17.298799]  [<ffffffff8168253c>] do_sock_sendmsg+0x8c/0x100
> [   17.300493]  [<ffffffff81681e3e>] ? copy_msghdr_from_user+0x15e/0x1f0
> [   17.302173]  [<ffffffff81682aeb>] ___sys_sendmsg+0x30b/0x320
> [   17.303798]  [<ffffffff81021ab5>] ? native_sched_clock+0x35/0xa0
> [   17.305431]  [<ffffffff810bdee0>] ? cpuacct_account_field+0x80/0xb0
> [   17.307085]  [<ffffffff81021ab5>] ? native_sched_clock+0x35/0xa0
> [   17.308744]  [<ffffffff810a4f35>] ? sched_clock_local+0x25/0x90
> [   17.310375]  [<ffffffff810a5dc1>] ? vtime_account_user+0x91/0xa0
> [   17.311948]  [<ffffffff810a5198>] ? sched_clock_cpu+0xb8/0xe0
> [   17.313509]  [<ffffffff810bf8be>] ? put_lock_stats.isra.26+0xe/0x30
> [   17.315069]  [<ffffffff810c007e>] ? lock_release_holdtime.part.27+0x12e/0x1b0
> [   17.316618]  [<ffffffff810a5dc1>] ? vtime_account_user+0x91/0xa0
> [   17.318162]  [<ffffffff8109f5d1>] ? get_parent_ip+0x11/0x50
> [   17.319703]  [<ffffffff8109f699>] ? preempt_count_sub+0x49/0x50
> [   17.321235]  [<ffffffff811807e5>] ? context_tracking_user_exit+0x55/0x130
> [   17.322732]  [<ffffffff811807e5>] ? context_tracking_user_exit+0x55/0x130
> [   17.324197]  [<ffffffff816834f2>] __sys_sendmsg+0x42/0x80
> [   17.325634]  [<ffffffff81683542>] SyS_sendmsg+0x12/0x20
> [   17.327048]  [<ffffffff817ba12d>] system_call_fastpath+0x16/0x1b

Please bisect; there haven't been any tg3 code changes in this path that
might cause this. It would help to know which commit is triggering the
problem. Also, could you provide the device details? In syslog, look for
"Tigon3 [partno(BCMxxxxx) rev xxxxxxx]". Thanks.

* Re: [3.19-rc3] tg3: BUG: sleeping function called from invalid context
  2015-01-13  0:59 [3.19-rc3] tg3: BUG: sleeping function called from invalid context Peter Hurley
  2015-01-13  2:30 ` Prashant Sreedharan
@ 2015-01-13  6:49 ` Michael Chan
  2015-01-13 12:47   ` Peter Hurley
  1 sibling, 1 reply; 7+ messages in thread
From: Michael Chan @ 2015-01-13  6:49 UTC
  To: Peter Hurley; +Cc: Prashant Sreedharan, netdev, Linux kernel

On Mon, 2015-01-12 at 19:59 -0500, Peter Hurley wrote: 
> [   17.203009] BUG: sleeping function called from invalid context at /home/peter/src/kernels/mainline/kernel/irq/manage.c:104
> [   17.203067] in_atomic(): 1, irqs_disabled(): 0, pid: 1106, name: ip
> [   17.203092] 2 locks held by ip/1106:
> [   17.205255]  #0:  (rtnl_mutex){+.+.+.}, at: [<ffffffff816adf1f>] rtnetlink_rcv+0x1f/0x40
> [   17.207445]  #1:  (&(&tp->lock)->rlock){+.....}, at: [<ffffffffa01073e6>] tg3_start+0xc06/0x11f0 [tg3]
> [   17.209725] CPU: 2 PID: 1106 Comm: ip Not tainted 3.19.0-rc3+wip-xeon+lockdep #rc3+wip
> [   17.211900] Hardware name: Dell Inc. Precision WorkStation T5400  /0RW203, BIOS A11 04/30/2012
> [   17.214086]  0000000000000068 ffff8802ac823498 ffffffff817af7e8 0000000000000005
> [   17.216265]  ffffffff81a9be78 ffff8802ac8234a8 ffffffff810998a5 ffff8802ac8234d8
> [   17.218446]  ffffffff8109991a ffff8802ac8234c8 ffff8802af0aae00 ffffffffa00ed000
> [   17.220636] Call Trace:
> [   17.222743]  [<ffffffff817af7e8>] dump_stack+0x4f/0x7b
> [   17.224808]  [<ffffffff810998a5>] ___might_sleep+0x105/0x140
> [   17.226842]  [<ffffffff8109991a>] __might_sleep+0x3a/0xa0
> [   17.228869]  [<ffffffffa00ed000>] ? 0xffffffffa00ed000
> [   17.230939]  [<ffffffff810d7d78>] synchronize_irq+0x38/0xa0
> [   17.232967]  [<ffffffffa00ed000>] ? 0xffffffffa00ed000
> [   17.234991]  [<ffffffffa010105f>] tg3_chip_reset+0x13f/0x9c0 [tg3]
> [   17.236988]  [<ffffffffa01020ae>] tg3_reset_hw+0x7e/0x2d20 [tg3] 

tp->lock is held in this code path.  If synchronize_irq() sleeps in
wait_event(desc->wait_for_threads, ...), we'll get the warning.

The synchronize_irq() call is to wait for any tg3 irq handler to finish
so that it is guaranteed that next time it will see the CHIP_RESETTING
flag and do nothing.

Not sure if we can drop the tp->lock before we call synchronize_irq()
and then take it again after synchronize_irq().
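
To make the failure mode concrete, here is a minimal userspace sketch of
the bookkeeping behind the warning. It is only a model: every model_*
name is hypothetical and none of it is tg3 or kernel code. A counter
stands in for the atomic state that taking a spinlock such as tp->lock
creates, and a stand-in for might_sleep() reports whenever a possibly
blocking call is made while that counter is nonzero.

```c
/* Userspace model only: none of the model_* names are kernel APIs. */
#include <stdio.h>

int model_preempt_count;  /* >0 means "atomic": sleeping is not allowed */
int model_warnings;       /* number of violations reported so far */

void model_spin_lock(void)   { model_preempt_count++; }
void model_spin_unlock(void) { model_preempt_count--; }

/* Stand-in for might_sleep(): report if called while atomic. */
void model_might_sleep(const char *where)
{
    if (model_preempt_count > 0) {
        model_warnings++;
        fprintf(stderr,
                "BUG: sleeping function called from invalid context at %s\n",
                where);
    }
}

/* Stand-in for a call that may block, such as synchronize_irq(). */
void model_synchronize_irq(void)
{
    model_might_sleep("model_synchronize_irq");
    /* a real implementation could block here waiting for handlers */
}

/* Mirrors the reported path: the lock is held across the blocking call. */
int model_reset_under_lock(void)
{
    int before = model_warnings;

    model_spin_lock();
    model_synchronize_irq();
    model_spin_unlock();
    return model_warnings - before;  /* warnings raised by this call */
}
```

In this model, model_reset_under_lock() raises exactly one report per
call, which matches the shape of the trace above: the lock is taken in
the caller and the check fires inside the function that may block.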

* Re: [3.19-rc3] tg3: BUG: sleeping function called from invalid context
  2015-01-13  2:30 ` Prashant Sreedharan
@ 2015-01-13 12:30   ` Peter Hurley
  2015-01-14 16:06   ` Peter Hurley
  1 sibling, 0 replies; 7+ messages in thread
From: Peter Hurley @ 2015-01-13 12:30 UTC
  To: Prashant Sreedharan; +Cc: Michael Chan, netdev, Linux kernel

On 01/12/2015 09:30 PM, Prashant Sreedharan wrote:
> On Mon, 2015-01-12 at 19:59 -0500, Peter Hurley wrote:
>> On 3.19-rc3, I'm seeing this might_sleep() warning [1] from the tg3_open()
>> call stack. Let me know if I need to bisect this.
>>
>> Regards,
>> Peter Hurley
>>
>> [1]
>>
>> [   17.203009] BUG: sleeping function called from invalid context at /home/peter/src/kernels/mainline/kernel/irq/manage.c:104
>> [   17.203067] in_atomic(): 1, irqs_disabled(): 0, pid: 1106, name: ip
>> [   17.203092] 2 locks held by ip/1106:
>> [   17.205255]  #0:  (rtnl_mutex){+.+.+.}, at: [<ffffffff816adf1f>] rtnetlink_rcv+0x1f/0x40
>> [   17.207445]  #1:  (&(&tp->lock)->rlock){+.....}, at: [<ffffffffa01073e6>] tg3_start+0xc06/0x11f0 [tg3]
>> [   17.209725] CPU: 2 PID: 1106 Comm: ip Not tainted 3.19.0-rc3+wip-xeon+lockdep #rc3+wip
>> [   17.211900] Hardware name: Dell Inc. Precision WorkStation T5400  /0RW203, BIOS A11 04/30/2012
>> [   17.214086]  0000000000000068 ffff8802ac823498 ffffffff817af7e8 0000000000000005
>> [   17.216265]  ffffffff81a9be78 ffff8802ac8234a8 ffffffff810998a5 ffff8802ac8234d8
>> [   17.218446]  ffffffff8109991a ffff8802ac8234c8 ffff8802af0aae00 ffffffffa00ed000
>> [   17.220636] Call Trace:
>> [   17.222743]  [<ffffffff817af7e8>] dump_stack+0x4f/0x7b
>> [   17.224808]  [<ffffffff810998a5>] ___might_sleep+0x105/0x140
>> [   17.226842]  [<ffffffff8109991a>] __might_sleep+0x3a/0xa0
>> [   17.228869]  [<ffffffffa00ed000>] ? 0xffffffffa00ed000
>> [   17.230939]  [<ffffffff810d7d78>] synchronize_irq+0x38/0xa0
>> [   17.232967]  [<ffffffffa00ed000>] ? 0xffffffffa00ed000
>> [   17.234991]  [<ffffffffa010105f>] tg3_chip_reset+0x13f/0x9c0 [tg3]
>> [   17.236988]  [<ffffffffa01020ae>] tg3_reset_hw+0x7e/0x2d20 [tg3]
>> [   17.238996]  [<ffffffff813bfaff>] ? __udelay+0x2f/0x40
>> [   17.241007]  [<ffffffffa00ef2f7>] ? _tw32_flush+0x47/0x80 [tg3]
>> [   17.243066]  [<ffffffffa0104dac>] tg3_init_hw+0x5c/0x70 [tg3]
>> [   17.245438]  [<ffffffffa010740b>] tg3_start+0xc2b/0x11f0 [tg3]
>> [   17.247444]  [<ffffffffa0107ad7>] ? tg3_open+0x107/0x2e0 [tg3]
>> [   17.249556]  [<ffffffff810c338d>] ? trace_hardirqs_on+0xd/0x10
>> [   17.251581]  [<ffffffff8107806f>] ? __local_bh_enable_ip+0x6f/0x100
>> [   17.253710]  [<ffffffffa0107af8>] tg3_open+0x128/0x2e0 [tg3]
>> [   17.255758]  [<ffffffff816ba3f5>] ? netpoll_poll_disable+0x5/0xa0
>> [   17.257932]  [<ffffffff816a14af>] __dev_open+0xbf/0x140
>> [   17.260091]  [<ffffffff816a17c1>] __dev_change_flags+0xa1/0x160
>> [   17.262222]  [<ffffffff816a18a9>] dev_change_flags+0x29/0x60
>> [   17.264360]  [<ffffffff816b0e02>] do_setlink+0x2f2/0xa30
>> [   17.266431]  [<ffffffff816b1b7f>] rtnl_newlink+0x51f/0x750
>> [   17.268485]  [<ffffffff816b1749>] ? rtnl_newlink+0xe9/0x750
>> [   17.270483]  [<ffffffff811869c2>] ? free_pages_prepare+0x1d2/0x270
>> [   17.272507]  [<ffffffff810c32bd>] ? trace_hardirqs_on_caller+0x11d/0x1e0
>> [   17.274531]  [<ffffffff813dd1b2>] ? nla_parse+0x32/0x120
>> [   17.276531]  [<ffffffff81021ab5>] ? native_sched_clock+0x35/0xa0
>> [   17.278514]  [<ffffffff816adfd5>] rtnetlink_rcv_msg+0x95/0x250
>> [   17.280485]  [<ffffffff8109f699>] ? preempt_count_sub+0x49/0x50
>> [   17.282448]  [<ffffffff817b4a02>] ? mutex_lock_nested+0x382/0x530
>> [   17.284402]  [<ffffffff816adf1f>] ? rtnetlink_rcv+0x1f/0x40
>> [   17.286290]  [<ffffffff816adf1f>] ? rtnetlink_rcv+0x1f/0x40
>> [   17.288142]  [<ffffffff816adf40>] ? rtnetlink_rcv+0x40/0x40
>> [   17.290031]  [<ffffffff816cedc1>] netlink_rcv_skb+0xc1/0xe0
>> [   17.291836]  [<ffffffff816adf2e>] rtnetlink_rcv+0x2e/0x40
>> [   17.293615]  [<ffffffff816ce473>] netlink_unicast+0xf3/0x1d0
>> [   17.295420]  [<ffffffff816ce863>] netlink_sendmsg+0x313/0x690
>> [   17.297132]  [<ffffffff811ada4f>] ? might_fault+0x5f/0xb0
>> [   17.298799]  [<ffffffff8168253c>] do_sock_sendmsg+0x8c/0x100
>> [   17.300493]  [<ffffffff81681e3e>] ? copy_msghdr_from_user+0x15e/0x1f0
>> [   17.302173]  [<ffffffff81682aeb>] ___sys_sendmsg+0x30b/0x320
>> [   17.303798]  [<ffffffff81021ab5>] ? native_sched_clock+0x35/0xa0
>> [   17.305431]  [<ffffffff810bdee0>] ? cpuacct_account_field+0x80/0xb0
>> [   17.307085]  [<ffffffff81021ab5>] ? native_sched_clock+0x35/0xa0
>> [   17.308744]  [<ffffffff810a4f35>] ? sched_clock_local+0x25/0x90
>> [   17.310375]  [<ffffffff810a5dc1>] ? vtime_account_user+0x91/0xa0
>> [   17.311948]  [<ffffffff810a5198>] ? sched_clock_cpu+0xb8/0xe0
>> [   17.313509]  [<ffffffff810bf8be>] ? put_lock_stats.isra.26+0xe/0x30
>> [   17.315069]  [<ffffffff810c007e>] ? lock_release_holdtime.part.27+0x12e/0x1b0
>> [   17.316618]  [<ffffffff810a5dc1>] ? vtime_account_user+0x91/0xa0
>> [   17.318162]  [<ffffffff8109f5d1>] ? get_parent_ip+0x11/0x50
>> [   17.319703]  [<ffffffff8109f699>] ? preempt_count_sub+0x49/0x50
>> [   17.321235]  [<ffffffff811807e5>] ? context_tracking_user_exit+0x55/0x130
>> [   17.322732]  [<ffffffff811807e5>] ? context_tracking_user_exit+0x55/0x130
>> [   17.324197]  [<ffffffff816834f2>] __sys_sendmsg+0x42/0x80
>> [   17.325634]  [<ffffffff81683542>] SyS_sendmsg+0x12/0x20
>> [   17.327048]  [<ffffffff817ba12d>] system_call_fastpath+0x16/0x1b
> 
> Please bisect; there haven't been any tg3 code changes in this path that
> might cause this. It would help to know which commit is triggering the
> problem.

Ok, will do.

> Also, could you provide the device details? In syslog, look for
> "Tigon3 [partno(BCMxxxxx) rev xxxxxxx]". Thanks.

[    1.430884] tg3 0000:08:00.0 eth0: Tigon3 [partno(BCM95754) rev b002] (PCI Express) MAC address xx:xx:xx:xx:xx:xx
[    1.431095] tg3 0000:08:00.0 eth0: attached PHY is 5787 (10/100/1000Base-T Ethernet) (WireSpeed[1], EEE[0])
[    1.431295] tg3 0000:08:00.0 eth0: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] TSOcap[1]
[    1.431488] tg3 0000:08:00.0 eth0: dma_rwctrl[76180000] dma_mask[64-bit]

Regards,
Peter Hurley

* Re: [3.19-rc3] tg3: BUG: sleeping function called from invalid context
  2015-01-13  6:49 ` Michael Chan
@ 2015-01-13 12:47   ` Peter Hurley
  2015-01-13 17:25     ` Michael Chan
  0 siblings, 1 reply; 7+ messages in thread
From: Peter Hurley @ 2015-01-13 12:47 UTC
  To: Michael Chan; +Cc: Prashant Sreedharan, netdev, Linux kernel

On 01/13/2015 01:49 AM, Michael Chan wrote:
> On Mon, 2015-01-12 at 19:59 -0500, Peter Hurley wrote: 
>> [   17.203009] BUG: sleeping function called from invalid context at /home/peter/src/kernels/mainline/kernel/irq/manage.c:104
>> [   17.203067] in_atomic(): 1, irqs_disabled(): 0, pid: 1106, name: ip
>> [   17.203092] 2 locks held by ip/1106:
>> [   17.205255]  #0:  (rtnl_mutex){+.+.+.}, at: [<ffffffff816adf1f>] rtnetlink_rcv+0x1f/0x40
>> [   17.207445]  #1:  (&(&tp->lock)->rlock){+.....}, at: [<ffffffffa01073e6>] tg3_start+0xc06/0x11f0 [tg3]
>> [   17.209725] CPU: 2 PID: 1106 Comm: ip Not tainted 3.19.0-rc3+wip-xeon+lockdep #rc3+wip
>> [   17.211900] Hardware name: Dell Inc. Precision WorkStation T5400  /0RW203, BIOS A11 04/30/2012
>> [   17.214086]  0000000000000068 ffff8802ac823498 ffffffff817af7e8 0000000000000005
>> [   17.216265]  ffffffff81a9be78 ffff8802ac8234a8 ffffffff810998a5 ffff8802ac8234d8
>> [   17.218446]  ffffffff8109991a ffff8802ac8234c8 ffff8802af0aae00 ffffffffa00ed000
>> [   17.220636] Call Trace:
>> [   17.222743]  [<ffffffff817af7e8>] dump_stack+0x4f/0x7b
>> [   17.224808]  [<ffffffff810998a5>] ___might_sleep+0x105/0x140
>> [   17.226842]  [<ffffffff8109991a>] __might_sleep+0x3a/0xa0
>> [   17.228869]  [<ffffffffa00ed000>] ? 0xffffffffa00ed000
>> [   17.230939]  [<ffffffff810d7d78>] synchronize_irq+0x38/0xa0
>> [   17.232967]  [<ffffffffa00ed000>] ? 0xffffffffa00ed000
>> [   17.234991]  [<ffffffffa010105f>] tg3_chip_reset+0x13f/0x9c0 [tg3]
>> [   17.236988]  [<ffffffffa01020ae>] tg3_reset_hw+0x7e/0x2d20 [tg3] 
> 
> tp->lock is held in this code path.  If synchronize_irq() sleeps in
> wait_event(desc->wait_for_threads, ...), we'll get the warning.
> 
> The synchronize_irq() call is to wait for any tg3 irq handler to finish
> so that it is guaranteed that next time it will see the CHIP_RESETTING
> flag and do nothing.
> 
> Not sure if we can drop the tp->lock before we call synchronize_irq()
> and then take it again after synchronize_irq().

Well, this device [1] is using MSI (INTx disabled), so if the synchronize_irq()
is _only_ for the CHIP_RESETTING logic, then it would seem OK to skip it.

Regards,
Peter Hurley


[1] lspci -vv

08:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5754 Gigabit Ethernet PCI Express (rev 02)
	Subsystem: Dell Precision T5400
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 31
	Region 0: Memory at d3ff0000 (64-bit, non-prefetchable) [size=64K]
	Expansion ROM at <ignored> [disabled]
	Capabilities: [48] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot+,D3cold+)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-
	Capabilities: [50] Vital Product Data
		Product Name: Broadcom NetLink Gigabit Ethernet Controller
		Read-only fields:
			[PN] Part number: BCM95754
			[EC] Engineering changes: 106679-15
			[SN] Serial number: 0123456789
			[MN] Manufacture ID: 31 34 65 34
			[RV] Reserved: checksum good, 30 byte(s) reserved
		Read/write fields:
			[YA] Asset tag: XYZ01234567
			[RW] Read-write area: 107 byte(s) free
		End
	Capabilities: [58] Vendor Specific Information: Len=78 <?>
	Capabilities: [e8] MSI: Enable+ Count=1/1 Maskable- 64bit+
		Address: 00000000fee0400c  Data: 41a2
	Capabilities: [d0] Express (v1) Endpoint, MSI 00
		DevCap:	MaxPayload 128 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited
			ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
		DevCtl:	Report errors: Correctable- Non-Fatal+ Fatal+ Unsupported-
			RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
			MaxPayload 128 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
		LnkCap:	Port #0, Speed 2.5GT/s, Width x1, ASPM L0s, Exit Latency L0s <4us, L1 <64us
			ClockPM- Surprise- LLActRep- BwNot-
		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
	Capabilities: [100 v1] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr+ BadTLP- BadDLLP+ Rollover- Timeout- NonFatalErr-
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
		AERCap:	First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
	Capabilities: [13c v1] Virtual Channel
		Caps:	LPEVC=0 RefClk=100ns PATEntryBits=1
		Arb:	Fixed- WRR32- WRR64- WRR128-
		Ctrl:	ArbSelect=Fixed
		Status:	InProgress-
		VC0:	Caps:	PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
			Arb:	Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
			Ctrl:	Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
			Status:	NegoPending- InProgress-
	Capabilities: [160 v1] Device Serial Number xx-xx-xx-xx-xx-xx-xx-xx
	Capabilities: [16c v1] Power Budgeting <?>
	Kernel driver in use: tg3

* Re: [3.19-rc3] tg3: BUG: sleeping function called from invalid context
  2015-01-13 12:47   ` Peter Hurley
@ 2015-01-13 17:25     ` Michael Chan
  0 siblings, 0 replies; 7+ messages in thread
From: Michael Chan @ 2015-01-13 17:25 UTC
  To: Peter Hurley; +Cc: Prashant Sreedharan, netdev, Linux kernel

On Tue, 2015-01-13 at 07:47 -0500, Peter Hurley wrote: 
> > tp->lock is held in this code path.  If synchronize_irq() sleeps in
> > wait_event(desc->wait_for_threads, ...), we'll get the warning.
> > 
> > The synchronize_irq() call is to wait for any tg3 irq handler to finish
> > so that it is guaranteed that next time it will see the CHIP_RESETTING
> > flag and do nothing.
> > 
> > Not sure if we can drop the tp->lock before we call synchronize_irq()
> > and then take it again after synchronize_irq().
> 
> Well, this device [1] is using MSI (INTx disabled), so if the synchronize_irq()
> is _only_ for the CHIP_RESETTING logic, then it would seem OK to skip it.

It is only for INTx.  But any device can operate in INTx mode if
MSI/MSIX is not available, so the fix needs to work in all cases.

Let me review the code some more.  If we can guarantee that another
reset, the timer code, etc, cannot come in even if we drop the tp->lock,
the simplest fix will be to drop it before calling synchronize_irq().

Thanks. 
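
The drop-and-retake idea can be sketched in userspace as follows. This
is a model under stated assumptions, not the tg3 fix: struct model_dev,
the resetting flag, and every model_* function are hypothetical names.
The assumption being illustrated is that a flag published under the lock
is what keeps competing paths (modeled here by a timer) away from the
device while the lock is dropped. Whether tg3 can give that guarantee is
exactly the open question above.

```c
/* Userspace model only: every name here is hypothetical, not tg3 code. */
#include <stdbool.h>

struct model_dev {
    int  lock_depth;     /* >0 while the modeled spinlock is held */
    bool resetting;      /* set under the lock before it is dropped */
    int  timer_runs;     /* times the timer path touched the device */
    int  atomic_sleeps;  /* blocking calls made with the lock held */
};

void model_lock(struct model_dev *d)   { d->lock_depth++; }
void model_unlock(struct model_dev *d) { d->lock_depth--; }

/* A competing path (e.g. a periodic timer) that must stay out during reset. */
void model_timer(struct model_dev *d)
{
    model_lock(d);
    if (!d->resetting)
        d->timer_runs++;
    model_unlock(d);
}

/* Stand-in for a call that may block; records misuse instead of warning. */
void model_blocking_wait(struct model_dev *d)
{
    if (d->lock_depth > 0)
        d->atomic_sleeps++;
    /* Simulate a competing path running while this one would be blocked. */
    if (d->lock_depth == 0)
        model_timer(d);
}

/* Entered with the lock held, returns with the lock held. */
void model_chip_reset(struct model_dev *d)
{
    d->resetting = true;      /* publish the flag while still locked */
    model_unlock(d);          /* leave atomic context */
    model_blocking_wait(d);   /* now allowed to block */
    model_lock(d);            /* retake before touching the device */
    /* ... device reset would happen here ... */
    d->resetting = false;
}
```

In the model, the timer does get to run inside the unlocked window, but
it sees the flag and leaves the device alone. Without the flag check,
the same window would let it interleave with the reset, which is the
risk that has to be ruled out before the lock can simply be dropped.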

* Re: [3.19-rc3] tg3: BUG: sleeping function called from invalid context
  2015-01-13  2:30 ` Prashant Sreedharan
  2015-01-13 12:30   ` Peter Hurley
@ 2015-01-14 16:06   ` Peter Hurley
  1 sibling, 0 replies; 7+ messages in thread
From: Peter Hurley @ 2015-01-14 16:06 UTC
  To: Prashant Sreedharan; +Cc: Michael Chan, netdev, Linux kernel

On 01/12/2015 09:30 PM, Prashant Sreedharan wrote:
> On Mon, 2015-01-12 at 19:59 -0500, Peter Hurley wrote:
>> On 3.19-rc3, I'm seeing this might_sleep() warning [1] from the tg3_open()
>> call stack. Let me know if I need to bisect this.
>>
>> Regards,
>> Peter Hurley
>>
>> [1]
>>
>> [   17.203009] BUG: sleeping function called from invalid context at /home/peter/src/kernels/mainline/kernel/irq/manage.c:104
>> [   17.203067] in_atomic(): 1, irqs_disabled(): 0, pid: 1106, name: ip
>> [   17.203092] 2 locks held by ip/1106:
>> [   17.205255]  #0:  (rtnl_mutex){+.+.+.}, at: [<ffffffff816adf1f>] rtnetlink_rcv+0x1f/0x40
>> [   17.207445]  #1:  (&(&tp->lock)->rlock){+.....}, at: [<ffffffffa01073e6>] tg3_start+0xc06/0x11f0 [tg3]
>> [   17.209725] CPU: 2 PID: 1106 Comm: ip Not tainted 3.19.0-rc3+wip-xeon+lockdep #rc3+wip
>> [   17.211900] Hardware name: Dell Inc. Precision WorkStation T5400  /0RW203, BIOS A11 04/30/2012
>> [   17.214086]  0000000000000068 ffff8802ac823498 ffffffff817af7e8 0000000000000005
>> [   17.216265]  ffffffff81a9be78 ffff8802ac8234a8 ffffffff810998a5 ffff8802ac8234d8
>> [   17.218446]  ffffffff8109991a ffff8802ac8234c8 ffff8802af0aae00 ffffffffa00ed000
>> [   17.220636] Call Trace:
>> [   17.222743]  [<ffffffff817af7e8>] dump_stack+0x4f/0x7b
>> [   17.224808]  [<ffffffff810998a5>] ___might_sleep+0x105/0x140
>> [   17.226842]  [<ffffffff8109991a>] __might_sleep+0x3a/0xa0
>> [   17.228869]  [<ffffffffa00ed000>] ? 0xffffffffa00ed000
>> [   17.230939]  [<ffffffff810d7d78>] synchronize_irq+0x38/0xa0
>> [   17.232967]  [<ffffffffa00ed000>] ? 0xffffffffa00ed000
>> [   17.234991]  [<ffffffffa010105f>] tg3_chip_reset+0x13f/0x9c0 [tg3]
>> [   17.236988]  [<ffffffffa01020ae>] tg3_reset_hw+0x7e/0x2d20 [tg3]
>> [   17.238996]  [<ffffffff813bfaff>] ? __udelay+0x2f/0x40
>> [   17.241007]  [<ffffffffa00ef2f7>] ? _tw32_flush+0x47/0x80 [tg3]
>> [   17.243066]  [<ffffffffa0104dac>] tg3_init_hw+0x5c/0x70 [tg3]
>> [   17.245438]  [<ffffffffa010740b>] tg3_start+0xc2b/0x11f0 [tg3]
>> [   17.247444]  [<ffffffffa0107ad7>] ? tg3_open+0x107/0x2e0 [tg3]
>> [   17.249556]  [<ffffffff810c338d>] ? trace_hardirqs_on+0xd/0x10
>> [   17.251581]  [<ffffffff8107806f>] ? __local_bh_enable_ip+0x6f/0x100
>> [   17.253710]  [<ffffffffa0107af8>] tg3_open+0x128/0x2e0 [tg3]
>> [   17.255758]  [<ffffffff816ba3f5>] ? netpoll_poll_disable+0x5/0xa0
>> [   17.257932]  [<ffffffff816a14af>] __dev_open+0xbf/0x140
>> [   17.260091]  [<ffffffff816a17c1>] __dev_change_flags+0xa1/0x160
>> [   17.262222]  [<ffffffff816a18a9>] dev_change_flags+0x29/0x60
>> [   17.264360]  [<ffffffff816b0e02>] do_setlink+0x2f2/0xa30
>> [   17.266431]  [<ffffffff816b1b7f>] rtnl_newlink+0x51f/0x750
>> [   17.268485]  [<ffffffff816b1749>] ? rtnl_newlink+0xe9/0x750
>> [   17.270483]  [<ffffffff811869c2>] ? free_pages_prepare+0x1d2/0x270
>> [   17.272507]  [<ffffffff810c32bd>] ? trace_hardirqs_on_caller+0x11d/0x1e0
>> [   17.274531]  [<ffffffff813dd1b2>] ? nla_parse+0x32/0x120
>> [   17.276531]  [<ffffffff81021ab5>] ? native_sched_clock+0x35/0xa0
>> [   17.278514]  [<ffffffff816adfd5>] rtnetlink_rcv_msg+0x95/0x250
>> [   17.280485]  [<ffffffff8109f699>] ? preempt_count_sub+0x49/0x50
>> [   17.282448]  [<ffffffff817b4a02>] ? mutex_lock_nested+0x382/0x530
>> [   17.284402]  [<ffffffff816adf1f>] ? rtnetlink_rcv+0x1f/0x40
>> [   17.286290]  [<ffffffff816adf1f>] ? rtnetlink_rcv+0x1f/0x40
>> [   17.288142]  [<ffffffff816adf40>] ? rtnetlink_rcv+0x40/0x40
>> [   17.290031]  [<ffffffff816cedc1>] netlink_rcv_skb+0xc1/0xe0
>> [   17.291836]  [<ffffffff816adf2e>] rtnetlink_rcv+0x2e/0x40
>> [   17.293615]  [<ffffffff816ce473>] netlink_unicast+0xf3/0x1d0
>> [   17.295420]  [<ffffffff816ce863>] netlink_sendmsg+0x313/0x690
>> [   17.297132]  [<ffffffff811ada4f>] ? might_fault+0x5f/0xb0
>> [   17.298799]  [<ffffffff8168253c>] do_sock_sendmsg+0x8c/0x100
>> [   17.300493]  [<ffffffff81681e3e>] ? copy_msghdr_from_user+0x15e/0x1f0
>> [   17.302173]  [<ffffffff81682aeb>] ___sys_sendmsg+0x30b/0x320
>> [   17.303798]  [<ffffffff81021ab5>] ? native_sched_clock+0x35/0xa0
>> [   17.305431]  [<ffffffff810bdee0>] ? cpuacct_account_field+0x80/0xb0
>> [   17.307085]  [<ffffffff81021ab5>] ? native_sched_clock+0x35/0xa0
>> [   17.308744]  [<ffffffff810a4f35>] ? sched_clock_local+0x25/0x90
>> [   17.310375]  [<ffffffff810a5dc1>] ? vtime_account_user+0x91/0xa0
>> [   17.311948]  [<ffffffff810a5198>] ? sched_clock_cpu+0xb8/0xe0
>> [   17.313509]  [<ffffffff810bf8be>] ? put_lock_stats.isra.26+0xe/0x30
>> [   17.315069]  [<ffffffff810c007e>] ? lock_release_holdtime.part.27+0x12e/0x1b0
>> [   17.316618]  [<ffffffff810a5dc1>] ? vtime_account_user+0x91/0xa0
>> [   17.318162]  [<ffffffff8109f5d1>] ? get_parent_ip+0x11/0x50
>> [   17.319703]  [<ffffffff8109f699>] ? preempt_count_sub+0x49/0x50
>> [   17.321235]  [<ffffffff811807e5>] ? context_tracking_user_exit+0x55/0x130
>> [   17.322732]  [<ffffffff811807e5>] ? context_tracking_user_exit+0x55/0x130
>> [   17.324197]  [<ffffffff816834f2>] __sys_sendmsg+0x42/0x80
>> [   17.325634]  [<ffffffff81683542>] SyS_sendmsg+0x12/0x20
>> [   17.327048]  [<ffffffff817ba12d>] system_call_fastpath+0x16/0x1b
> 
> Please bisect; there haven't been any tg3 code changes in this path that
> might cause this.

What triggers this is the new debugging code added to catch nested
sleeps; specifically e22b886 ("sched/wait: Add might_sleep() checks").

Regards,
Peter Hurley
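
A small userspace sketch may help show why a debugging check of this
kind can surface a problem in code that has not changed. It assumes the
check runs before the wait condition is tested; that ordering is an
assumption about how such a check is typically placed, not something
stated in this thread. All model_* names are hypothetical and are not
the kernel macros.

```c
/* Userspace model only: these are not the kernel macros. */
#include <stdbool.h>

int model_in_atomic;   /* nonzero while a modeled spinlock is held */
int model_warnings;    /* count of might_sleep-style reports */
int model_sleeps;      /* count of times the waiter would really block */

void model_might_sleep(void)
{
    if (model_in_atomic)
        model_warnings++;
}

/* Waits without any debugging check: silent when cond is already true. */
void model_wait_event_unchecked(bool cond)
{
    if (cond)
        return;
    model_sleeps++;
}

/* Same wait, with the check placed ahead of the condition test. */
void model_wait_event_checked(bool cond)
{
    model_might_sleep();
    if (cond)
        return;
    model_sleeps++;
}
```

With the condition already true, neither variant blocks, but only the
checked one reports the atomic caller. Under that assumption, a caller
that never actually slept in practice would start warning as soon as the
check was added, with no change on the driver side.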
