open sleeps

linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* open sleeps
@ 2008-06-19 23:14 Brian May
  2008-06-19 23:58 ` Stephen Rothwell
  0 siblings, 1 reply; 8+ messages in thread
From: Brian May @ 2008-06-19 23:14 UTC (permalink / raw)
  To: linux-fsdevel

Hello,

I am having (weird) issues with XFS, in that open(...) on certain files 
takes 45 seconds to return. After the file has been opened, the next 
file in the same directory takes 45 seconds. If the file was recently 
opened it returns immediately.

I thought this was a low level I/O issue, so copied the files in 
question to a completely independent RAID array (separate LVM, RAID, 
controllers, disks), but the problem remains.

I raised the issue on the XFS mailing list and was told the problem was 
the the lease breaking code in VFS[2], and that I should contact this 
mailing list.

This is with the kernel in Debian/Stable:

hq:/tmp# uname -a
Linux hq 2.6.18-6-686 #1 SMP Sun Feb 10 22:11:31 UTC 2008 i686 GNU/Linux

More details:
[1] <http://lists.luv.asn.au/wws/arc/luv-main/2008-06/msg00143.html>
[2] <http://oss.sgi.com/archives/xfs/2008-06/msg00222.html>

Any ideas?

Please CC me responses.

Thanks.

Brian May

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: open sleeps
  2008-06-19 23:14 open sleeps Brian May
@ 2008-06-19 23:58 ` Stephen Rothwell
  2008-06-20  0:07   ` Brian May
  0 siblings, 1 reply; 8+ messages in thread
From: Stephen Rothwell @ 2008-06-19 23:58 UTC (permalink / raw)
  To: Brian May; +Cc: linux-fsdevel

[-- Attachment #1: Type: text/plain, Size: 991 bytes --]

Hi Brian,

On Fri, 20 Jun 2008 09:14:46 +1000 Brian May <brian@vpac.org> wrote:
>
> I am having (weird) issues with XFS, in that open(...) on certain files 
> takes 45 seconds to return. After the file has been opened, the next 
> file in the same directory takes 45 seconds. If the file was recently 
> opened it returns immediately.
> 
> I thought this was a low level I/O issue, so copied the files in 
> question to a completely independent RAID array (separate LVM, RAID, 
> controllers, disks), but the problem remains.
> 
> I raised the issue on the XFS mailing list and was told the problem was 
> the the lease breaking code in VFS[2], and that I should contact this 
> mailing list.

So, it would seem some other process has taken write leases on your
files but does not release them in a timely manner.  You can see the
current leases in /proc/locks.

-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/

[-- Attachment #2: Type: application/pgp-signature, Size: 197 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: open sleeps
  2008-06-19 23:58 ` Stephen Rothwell
@ 2008-06-20  0:07   ` Brian May
  2008-06-20  0:20     ` Stephen Rothwell
  0 siblings, 1 reply; 8+ messages in thread
From: Brian May @ 2008-06-20  0:07 UTC (permalink / raw)
  To: Stephen Rothwell; +Cc: linux-fsdevel

Stephen Rothwell wrote:
> So, it would seem some other process has taken write leases on your
> files but does not release them in a timely manner.  You can see the
> current leases in /proc/locks.
>   
What is a lease? Is this a kernel level thing or user level thing?

What is the format of /proc/locks? I assume the 4th field is PID, what 
is the next field? How do I work out what is locked?

Thanks.

Brian May

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: open sleeps
  2008-06-20  0:07   ` Brian May
@ 2008-06-20  0:20     ` Stephen Rothwell
  2008-06-20  0:50       ` Brian May
  0 siblings, 1 reply; 8+ messages in thread
From: Stephen Rothwell @ 2008-06-20  0:20 UTC (permalink / raw)
  To: Brian May; +Cc: linux-fsdevel

[-- Attachment #1: Type: text/plain, Size: 1301 bytes --]

On Fri, 20 Jun 2008 10:07:26 +1000 Brian May <brian@vpac.org> wrote:
>
> Stephen Rothwell wrote:
> > So, it would seem some other process has taken write leases on your
> > files but does not release them in a timely manner.  You can see the
> > current leases in /proc/locks.
> >   
> What is a lease? Is this a kernel level thing or user level thing?

Its what samba calls an oplock.  Its like a kernel file lock but it
blocks opens and notifies the holder (who is supposed to clean up and
release the lease).  If the lease is not released in 45 seconds (by
default) then the kernel assumes that the holder is broken and breaks the
lease and allows the open to proceed.

> What is the format of /proc/locks? I assume the 4th field is PID, what 
> is the next field? How do I work out what is locked?

the format is:

index: type status subtype pid major:minor:inode start end

for leases, type is LEASE status is ACTIVE, BREAKING or BREAKER, subtype
is UNLCK, READ or WRITE.

If there is a "->" infromt of the type, then the process described by
this line is blocked by the first line with the same index.

You can see the inode number of a file using "ls -i".

-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/

[-- Attachment #2: Type: application/pgp-signature, Size: 197 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: open sleeps
  2008-06-20  0:20     ` Stephen Rothwell
@ 2008-06-20  0:50       ` Brian May
  2008-06-20  1:11         ` Brian May
  0 siblings, 1 reply; 8+ messages in thread
From: Brian May @ 2008-06-20  0:50 UTC (permalink / raw)
  To: Stephen Rothwell; +Cc: linux-fsdevel

Stephen Rothwell wrote:
> Its what samba calls an oplock.  Its like a kernel file lock but it
> blocks opens and notifies the holder (who is supposed to clean up and
> release the lease).  If the lease is not released in 45 seconds (by
> default) then the kernel assumes that the holder is broken and breaks the
> lease and allows the open to proceed.
>   
Ok, so I guess it is most likely Samba's fault then. Most likely this is 
the only application that uses leases/oplocks on this computer anyway. I 
will keep investigating...

Thanks.

Brian May



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: open sleeps
  2008-06-20  0:50       ` Brian May
@ 2008-06-20  1:11         ` Brian May
  2008-06-20  6:45           ` Dave Chinner
  0 siblings, 1 reply; 8+ messages in thread
From: Brian May @ 2008-06-20  1:11 UTC (permalink / raw)
  To: Brian May; +Cc: Stephen Rothwell, linux-fsdevel

Brian May wrote:
> Stephen Rothwell wrote:
>> Its what samba calls an oplock.  Its like a kernel file lock but it
>> blocks opens and notifies the holder (who is supposed to clean up and
>> release the lease).  If the lease is not released in 45 seconds (by
>> default) then the kernel assumes that the holder is broken and breaks 
>> the
>> lease and allows the open to proceed.
>>   
> Ok, so I guess it is most likely Samba's fault then. Most likely this 
> is the only application that uses leases/oplocks on this computer 
> anyway. I will keep investigating...
Maybe I spoke too soon when I ruled XFS out :-(.


hq:~# cat /proc/locks | grep :0a:
5: LEASE  ACTIVE    WRITE 13516 fd:0a:131 0 EOF

hq:~# ps  13516
  PID TTY      STAT   TIME COMMAND
13516 ?        S      0:00 /usr/sbin/smbd -D

hq:~# find /var/lib/wpkg -inum 131
/var/lib/wpkg/cp_status.cmd

hq:~# cat /var/lib/wpkg/cp_status.cmd
[Ctrl-C pushed]

hq:~# cat /proc/locks | grep :0a:
5: LEASE  BREAKING  READ  13516 fd:0a:131 0 EOF

hq:~# echo t > /proc/sysrq-trigger


Gives a stack trace of:

Jun 20 10:54:37 hq kernel: smbd          S 00000000     0 13516  11112               13459 (NOTLB)

Jun 20 10:54:37 hq kernel:        ddd19b70 00000082 034cdfca 00000000 00000001 00000007 f7c2c550 dfa9caa0 

Jun 20 10:54:37 hq kernel:        ae402975 002779a9 0000830f 00000003 f7c2c660 c20240a0 00000001 00000286 

Jun 20 10:54:37 hq kernel:        c0125380 a5d7f11b c2116000 00000286 000000ff 00000000 00000000 a5d7f11b 

Jun 20 10:54:37 hq kernel: Call Trace:

Jun 20 10:54:37 hq kernel:  [<c0125380>] lock_timer_base+0x15/0x2f

Jun 20 10:54:37 hq kernel:  [<c027f960>] schedule_timeout+0x71/0x8c

Jun 20 10:54:37 hq kernel:  [<c0124a81>] process_timeout+0x0/0x5

Jun 20 10:54:37 hq kernel:  [<c016a115>] do_select+0x37a/0x3d4

Jun 20 10:54:37 hq kernel:  [<c016a677>] __pollwait+0x0/0xb2

Jun 20 10:54:37 hq kernel:  [<c0117778>] default_wake_function+0x0/0xc

Jun 20 10:54:37 hq kernel:  [<c0117778>] default_wake_function+0x0/0xc

Jun 20 10:54:37 hq kernel:  [<f88e998f>] e1000_xmit_frame+0x928/0x958 [e1000]

Jun 20 10:54:37 hq kernel:  [<c0121c24>] tasklet_action+0x55/0xaf

Jun 20 10:54:37 hq kernel:  [<c022950a>] dev_hard_start_xmit+0x19a/0x1f0

Jun 20 10:54:37 hq kernel:  [<f8ae3e6d>] xfs_iext_bno_to_ext+0xd8/0x191 [xfs]

Jun 20 10:54:37 hq kernel:  [<f8ac7aec>] xfs_bmap_search_multi_extents+0xa8/0xc5 [xfs]

Jun 20 10:54:37 hq kernel:  [<f8ac7b52>] xfs_bmap_search_extents+0x49/0xbe [xfs]

Jun 20 10:54:37 hq kernel:  [<f8ac7e35>] xfs_bmapi+0x26e/0x20ce [xfs]

Jun 20 10:54:37 hq kernel:  [<f8ac7e35>] xfs_bmapi+0x26e/0x20ce [xfs]

Jun 20 10:54:37 hq kernel:  [<c02547e4>] tcp_transmit_skb+0x604/0x632

Jun 20 10:54:37 hq kernel:  [<c02560d3>] __tcp_push_pending_frames+0x6a2/0x758

Jun 20 10:54:37 hq kernel:  [<c016d84e>] __d_lookup+0x98/0xdb

Jun 20 10:54:37 hq kernel:  [<c016d84e>] __d_lookup+0x98/0xdb

Jun 20 10:54:37 hq kernel:  [<c0165370>] do_lookup+0x4f/0x135

Jun 20 10:54:37 hq kernel:  [<c016dbc4>] dput+0x1a/0x11b

Jun 20 10:54:37 hq kernel:  [<c0167312>] __link_path_walk+0xbe4/0xd1d

Jun 20 10:54:37 hq kernel:  [<c016a3fb>] core_sys_select+0x28c/0x2a9

Jun 20 10:54:37 hq kernel:  [<c01674fe>] link_path_walk+0xb3/0xbd

Jun 20 10:54:37 hq kernel:  [<f8afbea1>] xfs_inactive_free_eofblocks+0xdf/0x23f [xfs]

Jun 20 10:54:37 hq kernel:  [<c016785d>] do_path_lookup+0x20a/0x225

Jun 20 10:54:37 hq kernel:  [<f8b07de5>] xfs_vn_getattr+0x27/0x2f [xfs]

Jun 20 10:54:37 hq kernel:  [<c0161b28>] cp_new_stat64+0xfd/0x10f

Jun 20 10:54:37 hq kernel:  [<c016a9c1>] sys_select+0x9f/0x182

Jun 20 10:54:37 hq kernel:  [<c0102c11>] sysenter_past_esp+0x56/0x79


Why xfs routines are calling e1000 (isn't that a network driver???) is 
very puzzling. This was using a Fibre Channel card (QLogic Corp. QLA2312 
Fibre Channel Adapter); Would there be any benefit in repeating the same 
thing again with the SATA device?

Brian May

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: open sleeps
  2008-06-20  1:11         ` Brian May
@ 2008-06-20  6:45           ` Dave Chinner
  2008-06-26  6:30             ` Brian May
  0 siblings, 1 reply; 8+ messages in thread
From: Dave Chinner @ 2008-06-20  6:45 UTC (permalink / raw)
  To: Brian May; +Cc: Stephen Rothwell, linux-fsdevel

On Fri, Jun 20, 2008 at 11:11:54AM +1000, Brian May wrote:
> Brian May wrote:
>> Stephen Rothwell wrote:
>>> Its what samba calls an oplock.  Its like a kernel file lock but it
>>> blocks opens and notifies the holder (who is supposed to clean up and
>>> release the lease).  If the lease is not released in 45 seconds (by
>>> default) then the kernel assumes that the holder is broken and breaks 
>>> the
>>> lease and allows the open to proceed.
>>>   
>> Ok, so I guess it is most likely Samba's fault then. Most likely this  
>> is the only application that uses leases/oplocks on this computer  
>> anyway. I will keep investigating...
> Maybe I spoke too soon when I ruled XFS out :-(.
....
> hq:~# echo t > /proc/sysrq-trigger
> Gives a stack trace of:
>
> Jun 20 10:54:37 hq kernel: Call Trace:
>
> Jun 20 10:54:37 hq kernel:  [<c0125380>] lock_timer_base+0x15/0x2f
> Jun 20 10:54:37 hq kernel:  [<c027f960>] schedule_timeout+0x71/0x8c
> Jun 20 10:54:37 hq kernel:  [<c0124a81>] process_timeout+0x0/0x5
> Jun 20 10:54:37 hq kernel:  [<c016a115>] do_select+0x37a/0x3d4
> Jun 20 10:54:37 hq kernel:  [<c016a677>] __pollwait+0x0/0xb2
> Jun 20 10:54:37 hq kernel:  [<c0117778>] default_wake_function+0x0/0xc
> Jun 20 10:54:37 hq kernel:  [<c0117778>] default_wake_function+0x0/0xc
> Jun 20 10:54:37 hq kernel:  [<f88e998f>] e1000_xmit_frame+0x928/0x958 [e1000]
> Jun 20 10:54:37 hq kernel:  [<c0121c24>] tasklet_action+0x55/0xaf
> Jun 20 10:54:37 hq kernel:  [<c022950a>] dev_hard_start_xmit+0x19a/0x1f0
> Jun 20 10:54:37 hq kernel:  [<f8ae3e6d>] xfs_iext_bno_to_ext+0xd8/0x191 [xfs]
> Jun 20 10:54:37 hq kernel:  [<f8ac7aec>] xfs_bmap_search_multi_extents+0xa8/0xc5 [xfs]
> Jun 20 10:54:37 hq kernel:  [<f8ac7b52>] xfs_bmap_search_extents+0x49/0xbe [xfs]
> Jun 20 10:54:37 hq kernel:  [<f8ac7e35>] xfs_bmapi+0x26e/0x20ce [xfs]
> Jun 20 10:54:37 hq kernel:  [<f8ac7e35>] xfs_bmapi+0x26e/0x20ce [xfs]
> Jun 20 10:54:37 hq kernel:  [<c02547e4>] tcp_transmit_skb+0x604/0x632
> Jun 20 10:54:37 hq kernel:  [<c02560d3>] __tcp_push_pending_frames+0x6a2/0x758
> Jun 20 10:54:37 hq kernel:  [<c016d84e>] __d_lookup+0x98/0xdb
> Jun 20 10:54:37 hq kernel:  [<c016d84e>] __d_lookup+0x98/0xdb
> Jun 20 10:54:37 hq kernel:  [<c0165370>] do_lookup+0x4f/0x135
> Jun 20 10:54:37 hq kernel:  [<c016dbc4>] dput+0x1a/0x11b
> Jun 20 10:54:37 hq kernel:  [<c0167312>] __link_path_walk+0xbe4/0xd1d
> Jun 20 10:54:37 hq kernel:  [<c016a3fb>] core_sys_select+0x28c/0x2a9
> Jun 20 10:54:37 hq kernel:  [<c01674fe>] link_path_walk+0xb3/0xbd
> Jun 20 10:54:37 hq kernel:  [<f8afbea1>] xfs_inactive_free_eofblocks+0xdf/0x23f [xfs]
> Jun 20 10:54:37 hq kernel:  [<c016785d>] do_path_lookup+0x20a/0x225
> Jun 20 10:54:37 hq kernel:  [<f8b07de5>] xfs_vn_getattr+0x27/0x2f [xfs]
> Jun 20 10:54:37 hq kernel:  [<c0161b28>] cp_new_stat64+0xfd/0x10f
> Jun 20 10:54:37 hq kernel:  [<c016a9c1>] sys_select+0x9f/0x182
> Jun 20 10:54:37 hq kernel:  [<c0102c11>] sysenter_past_esp+0x56/0x79

This is just the stupid (broken) ia32 stack tracing - it outputs every
single object it sees in the stack that may be function call. Hence
you see some stuff that is from past stack traversals. I'd say this
is sleeping in a select() call, not doing anything related to
xfs or networking....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: open sleeps
  2008-06-20  6:45           ` Dave Chinner
@ 2008-06-26  6:30             ` Brian May
  0 siblings, 0 replies; 8+ messages in thread
From: Brian May @ 2008-06-26  6:30 UTC (permalink / raw)
  To: Brian May, Stephen Rothwell, linux-fsdevel

Dave Chinner wrote:
> This is just the stupid (broken) ia32 stack tracing - it outputs every
> single object it sees in the stack that may be function call. Hence
> you see some stuff that is from past stack traversals. I'd say this
> is sleeping in a select() call, not doing anything related to
> xfs or networking....
>   
Hello,

Thanks for your help.

Just a followup: Every indication seems to be that it is the firewall in 
the virus scanner we use, F-Secure, that prevents Windows from releasing 
oplocks properly. After turning off the firewall, the oplocks get 
released and there is no problem. Possibly related to the timing of wpkg 
being invoked before system logons too.

Might be time to consider another virus scanner/firewall solution.

Brian May

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2008-06-26  6:30 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-06-19 23:14 open sleeps Brian May
2008-06-19 23:58 ` Stephen Rothwell
2008-06-20  0:07   ` Brian May
2008-06-20  0:20     ` Stephen Rothwell
2008-06-20  0:50       ` Brian May
2008-06-20  1:11         ` Brian May
2008-06-20  6:45           ` Dave Chinner
2008-06-26  6:30             ` Brian May

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).