All of lore.kernel.org
 help / color / mirror / Atom feed
* lost interrupt after a signal?
@ 2008-05-22 14:57 Chuck Lever
  2008-05-22 20:39 ` Trond Myklebust
  2008-05-23  3:50 ` Matthew Wilcox
  0 siblings, 2 replies; 6+ messages in thread
From: Chuck Lever @ 2008-05-22 14:57 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Linux NFS Mailing List, Matthew Wilcox

We've been running some tests to understand how the 2.6.25 "intr/ 
nointr" behavior affects signal handling during I/O on NFS mounts.

While running an Oracle database workload, we signal the database  
(this is a normal way administrative tools control database  
activity).  Subsequently all of the I/O threads block on the inode  
mutex in nfs_invalidate_mapping() except this one:

INFO: task oracle:27214 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this  
message.
oracle        D f6d85e84  1592 27214      1
        c93d2920 00200086 00000001 f6d85e84 c04a0080 c04a0080 c04a0080  
c93d2b84
        c93d2b84 c4021f80 00000001 cc072000 f341c900 f6d85e7c 10a1a042  
f6d85e7c
        cc072ddc c4021f80 03b7e000 cc072ddc c40082b4 c036e21c cc072dd4  
00000001
Call Trace:
  [<c036e21c>] io_schedule+0x4c/0x90
  [<c015f63c>] sync_page+0x2c/0x40
  [<c036e3e5>] __wait_on_bit_lock+0x45/0x70
  [<c015f610>] sync_page+0x0/0x40
  [<c015f5f3>] __lock_page+0x73/0x80
  [<c013cad0>] wake_bit_function+0x0/0x80
  [<c0167f98>] invalidate_inode_pages2_range+0xb8/0x200
  [<f905d1a8>] nfs_writepages+0x68/0x90 [nfs]
  [<f905489f>] nfs_invalidate_mapping_nolock+0x1f/0xd0 [nfs]
  [<f9054ffa>] nfs_invalidate_mapping+0x5a/0x60 [nfs]
  [<f90538a5>] nfs_file_read+0x85/0x120 [nfs]
  [<c0182685>] do_sync_read+0xd5/0x120
  [<c016cf4a>] __do_fault+0x1ca/0x400
  [<c011c277>] __update_rq_clock+0x27/0x180
  [<c013ca80>] autoremove_wake_function+0x0/0x50
  [<c0136b25>] k_getrusage+0x1f5/0x200
  [<c01e525c>] security_file_permission+0xc/0x10
  [<c0182736>] rw_verify_area+0x66/0xd0
  [<c0136b52>] getrusage+0x22/0x40
  [<c0182f81>] vfs_read+0xa1/0x140
  [<c01825b0>] do_sync_read+0x0/0x120
  [<c01835da>] sys_pread64+0x6a/0x70
  [<c0103e62>] syscall_call+0x7/0xb

I haven't looked too closely at this, but maybe the signal caused a  
lost I/O interrupt?

What would be the next steps to troubleshoot this further?

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: lost interrupt after a signal?
  2008-05-22 14:57 lost interrupt after a signal? Chuck Lever
@ 2008-05-22 20:39 ` Trond Myklebust
  2008-05-23  3:50 ` Matthew Wilcox
  1 sibling, 0 replies; 6+ messages in thread
From: Trond Myklebust @ 2008-05-22 20:39 UTC (permalink / raw)
  To: Chuck Lever; +Cc: Linux NFS Mailing List, Matthew Wilcox

On Thu, 2008-05-22 at 10:57 -0400, Chuck Lever wrote:
> We've been running some tests to understand how the 2.6.25 "intr/ 
> nointr" behavior affects signal handling during I/O on NFS mounts.
> 
> While running an Oracle database workload, we signal the database  
> (this is a normal way administrative tools control database  
> activity).  Subsequently all of the I/O threads block on the inode  
> mutex in nfs_invalidate_mapping() except this one:
> 
> INFO: task oracle:27214 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this  
> message.
> oracle        D f6d85e84  1592 27214      1
>         c93d2920 00200086 00000001 f6d85e84 c04a0080 c04a0080 c04a0080  
> c93d2b84
>         c93d2b84 c4021f80 00000001 cc072000 f341c900 f6d85e7c 10a1a042  
> f6d85e7c
>         cc072ddc c4021f80 03b7e000 cc072ddc c40082b4 c036e21c cc072dd4  
> 00000001
> Call Trace:
>   [<c036e21c>] io_schedule+0x4c/0x90
>   [<c015f63c>] sync_page+0x2c/0x40
>   [<c036e3e5>] __wait_on_bit_lock+0x45/0x70
>   [<c015f610>] sync_page+0x0/0x40
>   [<c015f5f3>] __lock_page+0x73/0x80
>   [<c013cad0>] wake_bit_function+0x0/0x80
>   [<c0167f98>] invalidate_inode_pages2_range+0xb8/0x200
>   [<f905d1a8>] nfs_writepages+0x68/0x90 [nfs]
>   [<f905489f>] nfs_invalidate_mapping_nolock+0x1f/0xd0 [nfs]
>   [<f9054ffa>] nfs_invalidate_mapping+0x5a/0x60 [nfs]
>   [<f90538a5>] nfs_file_read+0x85/0x120 [nfs]
>   [<c0182685>] do_sync_read+0xd5/0x120
>   [<c016cf4a>] __do_fault+0x1ca/0x400
>   [<c011c277>] __update_rq_clock+0x27/0x180
>   [<c013ca80>] autoremove_wake_function+0x0/0x50
>   [<c0136b25>] k_getrusage+0x1f5/0x200
>   [<c01e525c>] security_file_permission+0xc/0x10
>   [<c0182736>] rw_verify_area+0x66/0xd0
>   [<c0136b52>] getrusage+0x22/0x40
>   [<c0182f81>] vfs_read+0xa1/0x140
>   [<c01825b0>] do_sync_read+0x0/0x120
>   [<c01835da>] sys_pread64+0x6a/0x70
>   [<c0103e62>] syscall_call+0x7/0xb
> 
> I haven't looked too closely at this, but maybe the signal caused a  
> lost I/O interrupt?
> 
> What would be the next steps to troubleshoot this further?

'cat /proc/1592/status' should tell you if there is a signal that is
being blocked.


-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: lost interrupt after a signal?
  2008-05-22 14:57 lost interrupt after a signal? Chuck Lever
  2008-05-22 20:39 ` Trond Myklebust
@ 2008-05-23  3:50 ` Matthew Wilcox
       [not found]   ` <20080523035004.GY2638-6jwH94ZQLHl74goWV3ctuw@public.gmane.org>
  1 sibling, 1 reply; 6+ messages in thread
From: Matthew Wilcox @ 2008-05-23  3:50 UTC (permalink / raw)
  To: Chuck Lever; +Cc: Trond Myklebust, Linux NFS Mailing List

On Thu, May 22, 2008 at 10:57:35AM -0400, Chuck Lever wrote:
> We've been running some tests to understand how the 2.6.25 "intr/ 
> nointr" behavior affects signal handling during I/O on NFS mounts.

I believe intr is now a no-op ... we still recognise the option, but it
does nothing.

> While running an Oracle database workload, we signal the database  
> (this is a normal way administrative tools control database  
> activity).  Subsequently all of the I/O threads block on the inode  
> mutex in nfs_invalidate_mapping() except this one:
> 
> INFO: task oracle:27214 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this  
> message.
> oracle        D f6d85e84  1592 27214      1
>        c93d2920 00200086 00000001 f6d85e84 c04a0080 c04a0080 c04a0080  
> c93d2b84
>        c93d2b84 c4021f80 00000001 cc072000 f341c900 f6d85e7c 10a1a042  
> f6d85e7c
>        cc072ddc c4021f80 03b7e000 cc072ddc c40082b4 c036e21c cc072dd4  
> 00000001
> Call Trace:
>  [<c036e21c>] io_schedule+0x4c/0x90
>  [<c015f63c>] sync_page+0x2c/0x40
>  [<c036e3e5>] __wait_on_bit_lock+0x45/0x70
>  [<c015f610>] sync_page+0x0/0x40
>  [<c015f5f3>] __lock_page+0x73/0x80
>  [<c013cad0>] wake_bit_function+0x0/0x80
>  [<c0167f98>] invalidate_inode_pages2_range+0xb8/0x200
>  [<f905d1a8>] nfs_writepages+0x68/0x90 [nfs]
[...]

This isn't jumping out screaming that it's my fault (obviously it
probably is, but ...).  invalidate_inode_pages2_range calls lock_page()
... which uses TASK_UNINTERRUPTIBLE.  If it were calling
lock_page_killable(), I'd understand.

Maybe this isn't the problem task though.  Maybe this is just the
canary that dropped dead, and we should stop trying to autopsy it and
start running.  [ok, I'll stop with the bad analogies now]

It might be informative to do something like:

 int __fatal_signal_pending(struct task_struct *tsk)
 {
-	return sigismember(&tsk->pending.signal, SIGKILL);
+	if (sigismember(&tsk->pending.signal, SIGKILL)) {
+		dump_stack();
+		return 1;
+	}
+	return 0;
 }

> I haven't looked too closely at this, but maybe the signal caused a  
> lost I/O interrupt?

I wonder if the signal caused something bad to happen to the page that
we're trying to sync?  Maybe dumping the page struct would give us some
useful information?

-- 
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: lost interrupt after a signal?
       [not found]   ` <20080523035004.GY2638-6jwH94ZQLHl74goWV3ctuw@public.gmane.org>
@ 2008-05-27 15:59     ` Chuck Lever
  2008-05-27 17:35       ` Matthew Wilcox
  0 siblings, 1 reply; 6+ messages in thread
From: Chuck Lever @ 2008-05-27 15:59 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: Trond Myklebust, Linux NFS Mailing List

On May 22, 2008, at 11:50 PM, Matthew Wilcox wrote:
> On Thu, May 22, 2008 at 10:57:35AM -0400, Chuck Lever wrote:
>> We've been running some tests to understand how the 2.6.25 "intr/
>> nointr" behavior affects signal handling during I/O on NFS mounts.
>
> I believe intr is now a no-op ... we still recognise the option, but  
> it
> does nothing.

Understood...

>> While running an Oracle database workload, we signal the database
>> (this is a normal way administrative tools control database
>> activity).  Subsequently all of the I/O threads block on the inode
>> mutex in nfs_invalidate_mapping() except this one:
>>
>> INFO: task oracle:27214 blocked for more than 120 seconds.
>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
>> message.
>> oracle        D f6d85e84  1592 27214      1
>>       c93d2920 00200086 00000001 f6d85e84 c04a0080 c04a0080 c04a0080
>> c93d2b84
>>       c93d2b84 c4021f80 00000001 cc072000 f341c900 f6d85e7c 10a1a042
>> f6d85e7c
>>       cc072ddc c4021f80 03b7e000 cc072ddc c40082b4 c036e21c cc072dd4
>> 00000001
>> Call Trace:
>> [<c036e21c>] io_schedule+0x4c/0x90
>> [<c015f63c>] sync_page+0x2c/0x40
>> [<c036e3e5>] __wait_on_bit_lock+0x45/0x70
>> [<c015f610>] sync_page+0x0/0x40
>> [<c015f5f3>] __lock_page+0x73/0x80
>> [<c013cad0>] wake_bit_function+0x0/0x80
>> [<c0167f98>] invalidate_inode_pages2_range+0xb8/0x200
>> [<f905d1a8>] nfs_writepages+0x68/0x90 [nfs]
> [...]
>
> This isn't jumping out screaming that it's my fault (obviously it
> probably is, but ...).  invalidate_inode_pages2_range calls  
> lock_page()
> ... which uses TASK_UNINTERRUPTIBLE.  If it were calling
> lock_page_killable(), I'd understand.

I don't think it's directly caused by your changes, but my concern is  
that you may have exposed a latent bug, or exposed an underlying  
design assumption in the NFS/RPC client stack that causes the hang in  
this situation.

> Maybe this isn't the problem task though.  Maybe this is just the
> canary that dropped dead, and we should stop trying to autopsy it and
> start running.  [ok, I'll stop with the bad analogies now]

This appears to be the only task that is in this state.  All the  
others in the dump are waiting for this inode's mutex.  I don't know  
if the dump is complete, though.

> It might be informative to do something like:
>
> int __fatal_signal_pending(struct task_struct *tsk)
> {
> -	return sigismember(&tsk->pending.signal, SIGKILL);
> +	if (sigismember(&tsk->pending.signal, SIGKILL)) {
> +		dump_stack();
> +		return 1;
> +	}
> +	return 0;
> }
>
>> I haven't looked too closely at this, but maybe the signal caused a
>> lost I/O interrupt?
>
> I wonder if the signal caused something bad to happen to the page that
> we're trying to sync?  Maybe dumping the page struct would give us  
> some
> useful information?

I've passed your suggestions along to our testers.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: lost interrupt after a signal?
  2008-05-27 15:59     ` Chuck Lever
@ 2008-05-27 17:35       ` Matthew Wilcox
       [not found]         ` <20080527173530.GM30894-6jwH94ZQLHl74goWV3ctuw@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: Matthew Wilcox @ 2008-05-27 17:35 UTC (permalink / raw)
  To: Chuck Lever; +Cc: Trond Myklebust, Linux NFS Mailing List

On Tue, May 27, 2008 at 11:59:00AM -0400, Chuck Lever wrote:
> >This isn't jumping out screaming that it's my fault (obviously it
> >probably is, but ...).  invalidate_inode_pages2_range calls  
> >lock_page()
> >... which uses TASK_UNINTERRUPTIBLE.  If it were calling
> >lock_page_killable(), I'd understand.
> 
> I don't think it's directly caused by your changes, but my concern is  
> that you may have exposed a latent bug, or exposed an underlying  
> design assumption in the NFS/RPC client stack that causes the hang in  
> this situation.

Certainly possible.

> >Maybe this isn't the problem task though.  Maybe this is just the
> >canary that dropped dead, and we should stop trying to autopsy it and
> >start running.  [ok, I'll stop with the bad analogies now]
> 
> This appears to be the only task that is in this state.  All the  
> others in the dump are waiting for this inode's mutex.  I don't know  
> if the dump is complete, though.

My thought is that the task which caused the problem has gone away and
left this page in a state where sync_page will never finish.

> I've passed your suggestions along to our testers.

Thanks!  I'm keen to get this fixed.

-- 
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: lost interrupt after a signal?
       [not found]         ` <20080527173530.GM30894-6jwH94ZQLHl74goWV3ctuw@public.gmane.org>
@ 2008-12-09 22:52           ` Chuck Lever
  0 siblings, 0 replies; 6+ messages in thread
From: Chuck Lever @ 2008-12-09 22:52 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: Trond Myklebust, Linux NFS Mailing List

On May 27, 2008, at May 27, 2008, 1:35 PM, Matthew Wilcox wrote:
> On Tue, May 27, 2008 at 11:59:00AM -0400, Chuck Lever wrote:
>>> This isn't jumping out screaming that it's my fault (obviously it
>>> probably is, but ...).  invalidate_inode_pages2_range calls
>>> lock_page()
>>> ... which uses TASK_UNINTERRUPTIBLE.  If it were calling
>>> lock_page_killable(), I'd understand.
>>
>> I don't think it's directly caused by your changes, but my concern is
>> that you may have exposed a latent bug, or exposed an underlying
>> design assumption in the NFS/RPC client stack that causes the hang in
>> this situation.
>
> Certainly possible.
>
>>> Maybe this isn't the problem task though.  Maybe this is just the
>>> canary that dropped dead, and we should stop trying to autopsy it  
>>> and
>>> start running.  [ok, I'll stop with the bad analogies now]
>>
>> This appears to be the only task that is in this state.  All the
>> others in the dump are waiting for this inode's mutex.  I don't know
>> if the dump is complete, though.
>
> My thought is that the task which caused the problem has gone away and
> left this page in a state where sync_page will never finish.

One thing to note: NFS doesn't have a sync_page() a_op.  So this  
shouldn't be the problem, right?

>> I've passed your suggestions along to our testers.
>
> Thanks!  I'm keen to get this fixed.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2008-12-09 22:52 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-05-22 14:57 lost interrupt after a signal? Chuck Lever
2008-05-22 20:39 ` Trond Myklebust
2008-05-23  3:50 ` Matthew Wilcox
     [not found]   ` <20080523035004.GY2638-6jwH94ZQLHl74goWV3ctuw@public.gmane.org>
2008-05-27 15:59     ` Chuck Lever
2008-05-27 17:35       ` Matthew Wilcox
     [not found]         ` <20080527173530.GM30894-6jwH94ZQLHl74goWV3ctuw@public.gmane.org>
2008-12-09 22:52           ` Chuck Lever

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.