All of lore.kernel.org
 help / color / mirror / Atom feed
* lost interrupt after a signal?
@ 2008-05-22 14:57 Chuck Lever
  2008-05-22 20:39 ` Trond Myklebust
  2008-05-23  3:50 ` Matthew Wilcox
  0 siblings, 2 replies; 6+ messages in thread
From: Chuck Lever @ 2008-05-22 14:57 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Linux NFS Mailing List, Matthew Wilcox

We've been running some tests to understand how the 2.6.25 "intr/ 
nointr" behavior affects signal handling during I/O on NFS mounts.

While running an Oracle database workload, we signal the database  
(this is a normal way administrative tools control database  
activity).  Subsequently all of the I/O threads block on the inode  
mutex in nfs_invalidate_mapping() except this one:

INFO: task oracle:27214 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this  
message.
oracle        D f6d85e84  1592 27214      1
        c93d2920 00200086 00000001 f6d85e84 c04a0080 c04a0080 c04a0080  
c93d2b84
        c93d2b84 c4021f80 00000001 cc072000 f341c900 f6d85e7c 10a1a042  
f6d85e7c
        cc072ddc c4021f80 03b7e000 cc072ddc c40082b4 c036e21c cc072dd4  
00000001
Call Trace:
  [<c036e21c>] io_schedule+0x4c/0x90
  [<c015f63c>] sync_page+0x2c/0x40
  [<c036e3e5>] __wait_on_bit_lock+0x45/0x70
  [<c015f610>] sync_page+0x0/0x40
  [<c015f5f3>] __lock_page+0x73/0x80
  [<c013cad0>] wake_bit_function+0x0/0x80
  [<c0167f98>] invalidate_inode_pages2_range+0xb8/0x200
  [<f905d1a8>] nfs_writepages+0x68/0x90 [nfs]
  [<f905489f>] nfs_invalidate_mapping_nolock+0x1f/0xd0 [nfs]
  [<f9054ffa>] nfs_invalidate_mapping+0x5a/0x60 [nfs]
  [<f90538a5>] nfs_file_read+0x85/0x120 [nfs]
  [<c0182685>] do_sync_read+0xd5/0x120
  [<c016cf4a>] __do_fault+0x1ca/0x400
  [<c011c277>] __update_rq_clock+0x27/0x180
  [<c013ca80>] autoremove_wake_function+0x0/0x50
  [<c0136b25>] k_getrusage+0x1f5/0x200
  [<c01e525c>] security_file_permission+0xc/0x10
  [<c0182736>] rw_verify_area+0x66/0xd0
  [<c0136b52>] getrusage+0x22/0x40
  [<c0182f81>] vfs_read+0xa1/0x140
  [<c01825b0>] do_sync_read+0x0/0x120
  [<c01835da>] sys_pread64+0x6a/0x70
  [<c0103e62>] syscall_call+0x7/0xb

I haven't looked too closely at this, but maybe the signal caused a  
lost I/O interrupt?

What would be the next steps to troubleshoot this further?

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2008-12-09 22:52 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-05-22 14:57 lost interrupt after a signal? Chuck Lever
2008-05-22 20:39 ` Trond Myklebust
2008-05-23  3:50 ` Matthew Wilcox
     [not found]   ` <20080523035004.GY2638-6jwH94ZQLHl74goWV3ctuw@public.gmane.org>
2008-05-27 15:59     ` Chuck Lever
2008-05-27 17:35       ` Matthew Wilcox
     [not found]         ` <20080527173530.GM30894-6jwH94ZQLHl74goWV3ctuw@public.gmane.org>
2008-12-09 22:52           ` Chuck Lever

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.