public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed
* open sleeps
@ 2008-06-19  5:27 Brian May
  2008-06-19  5:30 ` Brian May
  2008-06-19  6:21 ` Dave Chinner
  0 siblings, 2 replies; 9+ messages in thread
From: Brian May @ 2008-06-19  5:27 UTC (permalink / raw)
  To: xfs

Hello,

I am having (weird) issues with XFS, in that open(...) on certain files 
takes 45 seconds to return. After one file has been opened, opening the 
next file in the same directory also takes 45 seconds; a recently opened 
file returns immediately.
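A quick way to confirm the time is going into the open(2) call itself, rather than into the first read, is a small timing harness. This is just a sketch; the target path is a placeholder to be replaced with one of the slow files:

```python
# Sketch: time the open(2) call by itself, to separate it from read latency.
import os
import time

def time_open(path: str) -> float:
    """Return how long a single os.open() of `path` takes, in seconds."""
    t0 = time.monotonic()
    fd = os.open(path, os.O_RDONLY)
    elapsed = time.monotonic() - t0
    os.close(fd)
    return elapsed

target = "/dev/null"  # placeholder: substitute a file that shows the 45s delay
print(f"open() on {target} took {time_open(target):.2f}s")
```

Running it twice in a row against the same file should also show the fast second open described above.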

I thought this was a low-level I/O issue, so I copied the files in 
question to a completely independent RAID array (separate LVM, RAID, 
controllers, disks), but the problem remains.

More details in the thread starting at 
<http://lists.luv.asn.au/wws/arc/luv-main/2008-06/msg00143.html>.

Any ideas?

Please CC me responses.

Thanks.

Brian May

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: open sleeps
  2008-06-19  5:27 open sleeps Brian May
@ 2008-06-19  5:30 ` Brian May
  2008-06-19  6:21 ` Dave Chinner
  1 sibling, 0 replies; 9+ messages in thread
From: Brian May @ 2008-06-19  5:30 UTC (permalink / raw)
  To: Brian May; +Cc: xfs

Sorry for the repost with the weird Followup-To header (I was hoping 
Thunderbird would use Mail-Followup-To, but obviously it did not).

Brian May wrote:
> Hello,
>
> I am having (weird) issues with XFS, in that open(...) on certain 
> files takes 45 seconds to return. After the file has been opened, the 
> next file in the same directory takes 45 seconds. If the file was 
> recently opened it returns immediately.
>
> I thought this was a low level I/O issue, so copied the files in 
> question to a completely independent RAID array  (separate LVM, RAID, 
> controllers, disks), but the problem remains.
>
> More details at thread starting from 
> <http://lists.luv.asn.au/wws/arc/luv-main/2008-06/msg00143.html>.
>
> Any ideas?
>
> Please CC me responses.
>
> Thanks.
>
> Brian May


* Re: open sleeps
  2008-06-19  5:27 open sleeps Brian May
  2008-06-19  5:30 ` Brian May
@ 2008-06-19  6:21 ` Dave Chinner
  2008-06-19  6:40   ` Brian May
  1 sibling, 1 reply; 9+ messages in thread
From: Dave Chinner @ 2008-06-19  6:21 UTC (permalink / raw)
  To: Brian May; +Cc: xfs

On Thu, Jun 19, 2008 at 03:27:48PM +1000, Brian May wrote:
> Hello,
>
> I am having (weird) issues with XFS, in that open(...) on certain files  
> takes 45 seconds to return. After the file has been opened, the next  
> file in the same directory takes 45 seconds. If the file was recently  
> opened it returns immediately.
>
> I thought this was a low level I/O issue, so copied the files in  
> question to a completely independent RAID array  (separate LVM, RAID,  
> controllers, disks), but the problem remains.
>
> More details at thread starting from  
> <http://lists.luv.asn.au/wws/arc/luv-main/2008-06/msg00143.html>.
>
> Any ideas?

# echo t > /proc/sysrq-trigger

when it is hung to get a stack trace of the blocked open call.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: open sleeps
  2008-06-19  6:21 ` Dave Chinner
@ 2008-06-19  6:40   ` Brian May
  2008-06-19  8:43     ` Christoph Hellwig
  0 siblings, 1 reply; 9+ messages in thread
From: Brian May @ 2008-06-19  6:40 UTC (permalink / raw)
  To: Brian May, xfs

Dave Chinner wrote:
> On Thu, Jun 19, 2008 at 03:27:48PM +1000, Brian May wrote:
>   
>> Hello,
>>
>> I am having (weird) issues with XFS, in that open(...) on certain files  
>> takes 45 seconds to return. After the file has been opened, the next  
>> file in the same directory takes 45 seconds. If the file was recently  
>> opened it returns immediately.
>>
>> I thought this was a low level I/O issue, so copied the files in  
>> question to a completely independent RAID array  (separate LVM, RAID,  
>> controllers, disks), but the problem remains.
>>
>> More details at thread starting from  
>> <http://lists.luv.asn.au/wws/arc/luv-main/2008-06/msg00143.html>.
>>
>> Any ideas?
>>     
>
> # echo t > /proc/sysrq-trigger
>
> when it is hung to get a stack trace of the blocked open call.
>
> Cheers,
>
> Dave.
>   
Does the following help? I still have the logs of the other processes 
if required (just in case it is some weird interaction between multiple 
processes).

It seems to be pretty consistent with lock_timer_base every time I look 
(assuming I haven't read the stack trace upside down...).

Jun 19 16:33:30 hq kernel: grep          S 00000000     0 12793  12567                     (NOTLB)
Jun 19 16:33:30 hq kernel:        f0c23e7c 00200082 000a1089 00000000 00000010 00000008 cd0db550 dfa97550
Jun 19 16:33:30 hq kernel:        34f84262 00273db2 0008a1dc 00000001 cd0db660 c20140a0 dfe1cbe8 00200286
Jun 19 16:33:30 hq kernel:        c0125380 a4dbf26b dfa6a000 00200286 000000ff 00000000 00000000 a4dbf26b
Jun 19 16:33:30 hq kernel: Call Trace:
Jun 19 16:33:30 hq kernel:  [<c0125380>] lock_timer_base+0x15/0x2f
Jun 19 16:33:30 hq kernel:  [<c027f960>] schedule_timeout+0x71/0x8c
Jun 19 16:33:30 hq kernel:  [<c0124a81>] process_timeout+0x0/0x5
Jun 19 16:33:30 hq kernel:  [<c016c801>] __break_lease+0x2a8/0x2b9
Jun 19 16:33:30 hq kernel:  [<c0117778>] default_wake_function+0x0/0xc
Jun 19 16:33:30 hq kernel:  [<c0166450>] may_open+0x125/0x203
Jun 19 16:33:30 hq kernel:  [<c0168451>] open_namei+0x23d/0x5c8
Jun 19 16:33:30 hq kernel:  [<c0158cb8>] do_filp_open+0x1c/0x31
Jun 19 16:33:30 hq kernel:  [<c0161f38>] sys_stat64+0x1e/0x23
Jun 19 16:33:30 hq kernel:  [<c0158d0b>] do_sys_open+0x3e/0xb3
Jun 19 16:33:30 hq kernel:  [<c0158dad>] sys_open+0x16/0x18
Jun 19 16:33:30 hq kernel:  [<c0102c11>] sysenter_past_esp+0x56/0x79

Jun 19 16:33:50 hq kernel: grep          S 00000000     0 12793  12567                     (NOTLB)
Jun 19 16:33:50 hq kernel:        f0c23e7c 00200082 000a1089 00000000 00000010 00000008 cd0db550 dfa97550
Jun 19 16:33:50 hq kernel:        34f84262 00273db2 0008a1dc 00000001 cd0db660 c20140a0 dfe1cbe8 00200286
Jun 19 16:33:50 hq kernel:        c0125380 a4dbf26b dfa6a000 00200286 000000ff 00000000 00000000 a4dbf26b
Jun 19 16:33:50 hq kernel: Call Trace:
Jun 19 16:33:50 hq kernel:  [<c0125380>] lock_timer_base+0x15/0x2f
Jun 19 16:33:50 hq kernel:  [<c027f960>] schedule_timeout+0x71/0x8c
Jun 19 16:33:50 hq kernel:  [<c0124a81>] process_timeout+0x0/0x5
Jun 19 16:33:50 hq kernel:  [<c016c801>] __break_lease+0x2a8/0x2b9
Jun 19 16:33:50 hq kernel:  [<c0117778>] default_wake_function+0x0/0xc
Jun 19 16:33:50 hq kernel:  [<c0166450>] may_open+0x125/0x203
Jun 19 16:33:50 hq kernel:  [<c0168451>] open_namei+0x23d/0x5c8
Jun 19 16:33:50 hq kernel:  [<c0158cb8>] do_filp_open+0x1c/0x31
Jun 19 16:33:50 hq kernel:  [<c0161f38>] sys_stat64+0x1e/0x23
Jun 19 16:33:50 hq kernel:  [<c0158d0b>] do_sys_open+0x3e/0xb3
Jun 19 16:33:50 hq kernel:  [<c0158dad>] sys_open+0x16/0x18
Jun 19 16:33:50 hq kernel:  [<c0102c11>] sysenter_past_esp+0x56/0x79


Thanks.

Brian May


* Re: open sleeps
  2008-06-19  6:40   ` Brian May
@ 2008-06-19  8:43     ` Christoph Hellwig
  2008-06-19  9:48       ` Sebastian Brings
  2008-06-20  1:47       ` Brian May
  0 siblings, 2 replies; 9+ messages in thread
From: Christoph Hellwig @ 2008-06-19  8:43 UTC (permalink / raw)
  To: Brian May; +Cc: xfs

On Thu, Jun 19, 2008 at 04:40:00PM +1000, Brian May wrote:
> Does the following help? I still have the logs of the other processes, if 
> required (just in case it is some weird interaction between multiple 
> processes?)
>
> It seems to be pretty consistent with lock_timer_base, every time I look 
> (assuming I haven't read the stack trace upside down...).
>
> Jun 19 16:33:30 hq kernel: grep          S 00000000     0 12793  12567                     (NOTLB)
>
> Jun 19 16:33:30 hq kernel:        f0c23e7c 00200082 000a1089 00000000 
> 00000010 00000008 cd0db550 dfa97550 
> Jun 19 16:33:30 hq kernel:        34f84262 00273db2 0008a1dc 00000001 
> cd0db660 c20140a0 dfe1cbe8 00200286 
> Jun 19 16:33:30 hq kernel:        c0125380 a4dbf26b dfa6a000 00200286 
> 000000ff 00000000 00000000 a4dbf26b 
> Jun 19 16:33:30 hq kernel: Call Trace:
>
> Jun 19 16:33:30 hq kernel:  [<c0125380>] lock_timer_base+0x15/0x2f
>
> Jun 19 16:33:30 hq kernel:  [<c027f960>] schedule_timeout+0x71/0x8c
>
> Jun 19 16:33:30 hq kernel:  [<c0124a81>] process_timeout+0x0/0x5
>
> Jun 19 16:33:30 hq kernel:  [<c016c801>] __break_lease+0x2a8/0x2b9

That's the lease-breaking code in the VFS, long before we call
into XFS.  Looks like someone (samba?) has a lease on this file and
we're having trouble getting it broken.  Try sending a report about
this to linux-fsdevel@vger.kernel.org
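Worth noting: the 45-second delay matches the kernel's default lease-break timeout (/proc/sys/fs/lease-break-time defaults to 45), which is how long an open() sleeps in __break_lease before the kernel breaks the lease forcibly. The mechanism can be sketched in a few lines; this is an illustrative, Linux-only sketch (the hard-coded 1024 fallback for F_SETLEASE and the temp-file path are assumptions, and O_NONBLOCK is used so the second opener fails fast instead of sleeping):

```python
# Sketch of the VFS file-lease mechanism behind __break_lease (Linux-only).
# A holder takes a write lease; another opener would normally sleep for up
# to /proc/sys/fs/lease-break-time seconds (default 45) while the kernel
# signals the holder to release it. O_NONBLOCK turns that sleep into an
# immediate EWOULDBLOCK. F_SETLEASE may be absent from older fcntl
# modules, hence the hard-coded Linux value 1024.
import errno
import fcntl
import os
import signal
import tempfile
import time

F_SETLEASE = getattr(fcntl, "F_SETLEASE", 1024)

tmp_fd, path = tempfile.mkstemp()
os.close(tmp_fd)  # a write lease requires no other open descriptors

lease_broken = []
signal.signal(signal.SIGIO, lambda sig, frame: lease_broken.append(sig))

holder = os.open(path, os.O_WRONLY)
fcntl.fcntl(holder, F_SETLEASE, fcntl.F_WRLCK)  # take a write lease

pid = os.fork()
if pid == 0:
    # Second opener: O_NONBLOCK makes the lease break fail fast.
    try:
        os.open(path, os.O_RDONLY | os.O_NONBLOCK)
        os._exit(1)  # unexpectedly succeeded
    except OSError as e:
        os._exit(0 if e.errno == errno.EWOULDBLOCK else 2)

child_status = os.waitpid(pid, 0)[1]
for _ in range(100):  # give the async SIGIO a moment to arrive
    if lease_broken:
        break
    time.sleep(0.01)
fcntl.fcntl(holder, F_SETLEASE, fcntl.F_UNLCK)  # holder gives the lease up
os.close(holder)
os.unlink(path)
print("opener blocked:", os.WEXITSTATUS(child_status) == 0,
      "holder signalled:", bool(lease_broken))
```

Without O_NONBLOCK the child would sit in __break_lease for the full lease-break-time, which is exactly the stack trace above.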


* RE: open sleeps
  2008-06-19  8:43     ` Christoph Hellwig
@ 2008-06-19  9:48       ` Sebastian Brings
  2008-06-20  1:47       ` Brian May
  1 sibling, 0 replies; 9+ messages in thread
From: Sebastian Brings @ 2008-06-19  9:48 UTC (permalink / raw)
  To: xfs@oss.sgi.com



> -----Original Message-----
> From: xfs-bounce@oss.sgi.com [mailto:xfs-bounce@oss.sgi.com]
> On Behalf Of Christoph Hellwig
> Sent: Thursday, 19 June 2008 10:43
> To: Brian May
> Cc: xfs@oss.sgi.com
> Subject: Re: open sleeps
>
> On Thu, Jun 19, 2008 at 04:40:00PM +1000, Brian May wrote:
> > Does the following help? I still have the logs of the other processes,
> > if required (just in case it is some weird interaction between multiple
> > processes?)
> >
> > It seems to be pretty consistent with lock_timer_base, every time I
> > look (assuming I haven't read the stack trace upside down...).
> >
> > Jun 19 16:33:30 hq kernel: grep          S 00000000     0 12793  12567                     (NOTLB)
> > Jun 19 16:33:30 hq kernel:        f0c23e7c 00200082 000a1089 00000000 00000010 00000008 cd0db550 dfa97550
> > Jun 19 16:33:30 hq kernel:        34f84262 00273db2 0008a1dc 00000001 cd0db660 c20140a0 dfe1cbe8 00200286
> > Jun 19 16:33:30 hq kernel:        c0125380 a4dbf26b dfa6a000 00200286 000000ff 00000000 00000000 a4dbf26b
> > Jun 19 16:33:30 hq kernel: Call Trace:
> > Jun 19 16:33:30 hq kernel:  [<c0125380>] lock_timer_base+0x15/0x2f
> > Jun 19 16:33:30 hq kernel:  [<c027f960>] schedule_timeout+0x71/0x8c
> > Jun 19 16:33:30 hq kernel:  [<c0124a81>] process_timeout+0x0/0x5
> > Jun 19 16:33:30 hq kernel:  [<c016c801>] __break_lease+0x2a8/0x2b9
>
> That's the lease-breaking code in the VFS, long before we call
> into XFS.  Looks like someone (samba?) has a lease on this file and
> we're having trouble getting it broken.  Try sending a report about
> this to linux-fsdevel@vger.kernel.org

Out of curiosity: is the lease breaking you mention the breaking of a kernel oplock that Samba may hold?
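For context (hedged, since it depends on the Samba build and configuration): when Samba's kernel oplocks support is enabled, smbd maps SMB oplocks onto Linux file leases via fcntl(F_SETLEASE), so a local open() of an oplocked file waits in __break_lease until the SMB client acknowledges the oplock break. The relevant smb.conf knob is sketched below; the option name is as documented in the Samba 3.x manuals, but check smb.conf(5) for your version's default before relying on it:

```
# smb.conf sketch: with kernel oplocks enabled, smbd's oplocks become
# Linux F_SETLEASE leases, visible to (and blocking) local processes.
[global]
    kernel oplocks = yes
```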


* Re: open sleeps
  2008-06-19  8:43     ` Christoph Hellwig
  2008-06-19  9:48       ` Sebastian Brings
@ 2008-06-20  1:47       ` Brian May
  2008-06-20  4:11         ` Brian May
  1 sibling, 1 reply; 9+ messages in thread
From: Brian May @ 2008-06-20  1:47 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

Christoph Hellwig wrote:
> On Thu, Jun 19, 2008 at 04:40:00PM +1000, Brian May wrote:
>   
>> Does the following help? I still have the logs of the other processes, if 
>> required (just in case it is some weird interaction between multiple 
>> processes?)
>>
>> It seems to be pretty consistent with lock_timer_base, every time I look 
>> (assuming I haven't read the stack trace upside down...).
>>
>> Jun 19 16:33:30 hq kernel: grep          S 00000000     0 12793  12567                     (NOTLB)
>>
>> Jun 19 16:33:30 hq kernel:        f0c23e7c 00200082 000a1089 00000000 
>> 00000010 00000008 cd0db550 dfa97550 
>> Jun 19 16:33:30 hq kernel:        34f84262 00273db2 0008a1dc 00000001 
>> cd0db660 c20140a0 dfe1cbe8 00200286 
>> Jun 19 16:33:30 hq kernel:        c0125380 a4dbf26b dfa6a000 00200286 
>> 000000ff 00000000 00000000 a4dbf26b 
>> Jun 19 16:33:30 hq kernel: Call Trace:
>>
>> Jun 19 16:33:30 hq kernel:  [<c0125380>] lock_timer_base+0x15/0x2f
>>
>> Jun 19 16:33:30 hq kernel:  [<c027f960>] schedule_timeout+0x71/0x8c
>>
>> Jun 19 16:33:30 hq kernel:  [<c0124a81>] process_timeout+0x0/0x5
>>
>> Jun 19 16:33:30 hq kernel:  [<c016c801>] __break_lease+0x2a8/0x2b9
>>     
>
> That's the lease-breaking code in the VFS, long before we call
> into XFS.  Looks like someone (samba?) has a lease on this file and
> we're having trouble getting it broken.  Try sending a report about
> this to linux-fsdevel@vger.kernel.org
>   
I feel like I am going around in circles.

Anyway, I started the discussion at 
<http://www.archivum.info/linux-fsdevel@vger.kernel.org/2008-06/msg00337.html>.

In the last message (which isn't archived yet), I looked at the Samba 
process that is holding the lease. The following is the stack trace of 
that process. I don't understand why the XFS code would be calling e1000 
code; the filesystem isn't attached via the network. Perhaps this means 
the problem is in the network code?

Jun 20 10:54:37 hq kernel: smbd          S 00000000     0 13516  11112               13459 (NOTLB)
Jun 20 10:54:37 hq kernel:        ddd19b70 00000082 034cdfca 00000000 00000001 00000007 f7c2c550 dfa9caa0
Jun 20 10:54:37 hq kernel:        ae402975 002779a9 0000830f 00000003 f7c2c660 c20240a0 00000001 00000286
Jun 20 10:54:37 hq kernel:        c0125380 a5d7f11b c2116000 00000286 000000ff 00000000 00000000 a5d7f11b
Jun 20 10:54:37 hq kernel: Call Trace:
Jun 20 10:54:37 hq kernel:  [<c0125380>] lock_timer_base+0x15/0x2f
Jun 20 10:54:37 hq kernel:  [<c027f960>] schedule_timeout+0x71/0x8c
Jun 20 10:54:37 hq kernel:  [<c0124a81>] process_timeout+0x0/0x5
Jun 20 10:54:37 hq kernel:  [<c016a115>] do_select+0x37a/0x3d4
Jun 20 10:54:37 hq kernel:  [<c016a677>] __pollwait+0x0/0xb2
Jun 20 10:54:37 hq kernel:  [<c0117778>] default_wake_function+0x0/0xc
Jun 20 10:54:37 hq kernel:  [<c0117778>] default_wake_function+0x0/0xc
Jun 20 10:54:37 hq kernel:  [<f88e998f>] e1000_xmit_frame+0x928/0x958 [e1000]
Jun 20 10:54:37 hq kernel:  [<c0121c24>] tasklet_action+0x55/0xaf
Jun 20 10:54:37 hq kernel:  [<c022950a>] dev_hard_start_xmit+0x19a/0x1f0
Jun 20 10:54:37 hq kernel:  [<f8ae3e6d>] xfs_iext_bno_to_ext+0xd8/0x191 [xfs]
Jun 20 10:54:37 hq kernel:  [<f8ac7aec>] xfs_bmap_search_multi_extents+0xa8/0xc5 [xfs]
Jun 20 10:54:37 hq kernel:  [<f8ac7b52>] xfs_bmap_search_extents+0x49/0xbe [xfs]
Jun 20 10:54:37 hq kernel:  [<f8ac7e35>] xfs_bmapi+0x26e/0x20ce [xfs]
Jun 20 10:54:37 hq kernel:  [<f8ac7e35>] xfs_bmapi+0x26e/0x20ce [xfs]
Jun 20 10:54:37 hq kernel:  [<c02547e4>] tcp_transmit_skb+0x604/0x632
Jun 20 10:54:37 hq kernel:  [<c02560d3>] __tcp_push_pending_frames+0x6a2/0x758
Jun 20 10:54:37 hq kernel:  [<c016d84e>] __d_lookup+0x98/0xdb
Jun 20 10:54:37 hq kernel:  [<c016d84e>] __d_lookup+0x98/0xdb
Jun 20 10:54:37 hq kernel:  [<c0165370>] do_lookup+0x4f/0x135
Jun 20 10:54:37 hq kernel:  [<c016dbc4>] dput+0x1a/0x11b
Jun 20 10:54:37 hq kernel:  [<c0167312>] __link_path_walk+0xbe4/0xd1d
Jun 20 10:54:37 hq kernel:  [<c016a3fb>] core_sys_select+0x28c/0x2a9
Jun 20 10:54:37 hq kernel:  [<c01674fe>] link_path_walk+0xb3/0xbd
Jun 20 10:54:37 hq kernel:  [<f8afbea1>] xfs_inactive_free_eofblocks+0xdf/0x23f [xfs]
Jun 20 10:54:37 hq kernel:  [<c016785d>] do_path_lookup+0x20a/0x225
Jun 20 10:54:37 hq kernel:  [<f8b07de5>] xfs_vn_getattr+0x27/0x2f [xfs]
Jun 20 10:54:37 hq kernel:  [<c0161b28>] cp_new_stat64+0xfd/0x10f
Jun 20 10:54:37 hq kernel:  [<c016a9c1>] sys_select+0x9f/0x182
Jun 20 10:54:37 hq kernel:  [<c0102c11>] sysenter_past_esp+0x56/0x79

I guess I also need to make sure I get this same stack trace each time.

Thanks.
Brian May


* Re: open sleeps
  2008-06-20  1:47       ` Brian May
@ 2008-06-20  4:11         ` Brian May
  2008-06-26  6:29           ` Brian May
  0 siblings, 1 reply; 9+ messages in thread
From: Brian May @ 2008-06-20  4:11 UTC (permalink / raw)
  To: Brian May; +Cc: Christoph Hellwig, xfs

Brian May wrote:
> I guess I also need to make sure I get this same stack trace each time.
Yes. Looks consistent to me.

Brian May.


* Re: open sleeps
  2008-06-20  4:11         ` Brian May
@ 2008-06-26  6:29           ` Brian May
  0 siblings, 0 replies; 9+ messages in thread
From: Brian May @ 2008-06-26  6:29 UTC (permalink / raw)
  To: Brian May; +Cc: Christoph Hellwig, xfs

Hello,

Thanks for your help.

Just a follow-up: every indication is that the firewall in the virus 
scanner we use, F-Secure, prevents Windows from releasing oplocks 
properly. After turning off the firewall, the oplocks get released and 
there is no problem. It is possibly also related to the timing of wpkg 
being invoked before system logons.

It might be time to consider another virus scanner/firewall solution.

Brian May


end of thread, other threads:[~2008-06-26  6:28 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-06-19  5:27 open sleeps Brian May
2008-06-19  5:30 ` Brian May
2008-06-19  6:21 ` Dave Chinner
2008-06-19  6:40   ` Brian May
2008-06-19  8:43     ` Christoph Hellwig
2008-06-19  9:48       ` Sebastian Brings
2008-06-20  1:47       ` Brian May
2008-06-20  4:11         ` Brian May
2008-06-26  6:29           ` Brian May
