* open sleeps
@ 2008-06-19 23:14 Brian May
2008-06-19 23:58 ` Stephen Rothwell
0 siblings, 1 reply; 8+ messages in thread
From: Brian May @ 2008-06-19 23:14 UTC (permalink / raw)
To: linux-fsdevel
Hello,
I am having (weird) issues with XFS, in that open(...) on certain files
takes 45 seconds to return. After the file has been opened, the next
file in the same directory takes 45 seconds. If the file was recently
opened it returns immediately.
I thought this was a low level I/O issue, so copied the files in
question to a completely independent RAID array (separate LVM, RAID,
controllers, disks), but the problem remains.
I raised the issue on the XFS mailing list and was told the problem was
the the lease breaking code in VFS[2], and that I should contact this
mailing list.
This is with the kernel in Debian/Stable:
hq:/tmp# uname -a
Linux hq 2.6.18-6-686 #1 SMP Sun Feb 10 22:11:31 UTC 2008 i686 GNU/Linux
More details:
[1] <http://lists.luv.asn.au/wws/arc/luv-main/2008-06/msg00143.html>
[2] <http://oss.sgi.com/archives/xfs/2008-06/msg00222.html>
Any ideas?
Please CC me responses.
Thanks.
Brian May
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: open sleeps
2008-06-19 23:14 open sleeps Brian May
@ 2008-06-19 23:58 ` Stephen Rothwell
2008-06-20 0:07 ` Brian May
0 siblings, 1 reply; 8+ messages in thread
From: Stephen Rothwell @ 2008-06-19 23:58 UTC (permalink / raw)
To: Brian May; +Cc: linux-fsdevel
[-- Attachment #1: Type: text/plain, Size: 991 bytes --]
Hi Brian,
On Fri, 20 Jun 2008 09:14:46 +1000 Brian May <brian@vpac.org> wrote:
>
> I am having (weird) issues with XFS, in that open(...) on certain files
> takes 45 seconds to return. After the file has been opened, the next
> file in the same directory takes 45 seconds. If the file was recently
> opened it returns immediately.
>
> I thought this was a low level I/O issue, so copied the files in
> question to a completely independent RAID array (separate LVM, RAID,
> controllers, disks), but the problem remains.
>
> I raised the issue on the XFS mailing list and was told the problem was
> the the lease breaking code in VFS[2], and that I should contact this
> mailing list.
So, it would seem some other process has taken write leases on your
files but does not release them in a timely manner. You can see the
current leases in /proc/locks.
--
Cheers,
Stephen Rothwell sfr@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/
[-- Attachment #2: Type: application/pgp-signature, Size: 197 bytes --]
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: open sleeps
2008-06-19 23:58 ` Stephen Rothwell
@ 2008-06-20 0:07 ` Brian May
2008-06-20 0:20 ` Stephen Rothwell
0 siblings, 1 reply; 8+ messages in thread
From: Brian May @ 2008-06-20 0:07 UTC (permalink / raw)
To: Stephen Rothwell; +Cc: linux-fsdevel
Stephen Rothwell wrote:
> So, it would seem some other process has taken write leases on your
> files but does not release them in a timely manner. You can see the
> current leases in /proc/locks.
>
What is a lease? Is this a kernel level thing or user level thing?
What is the format of /proc/locks? I assume the 4th field is PID, what
is the next field? How do I work out what is locked?
Thanks.
Brian May
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: open sleeps
2008-06-20 0:07 ` Brian May
@ 2008-06-20 0:20 ` Stephen Rothwell
2008-06-20 0:50 ` Brian May
0 siblings, 1 reply; 8+ messages in thread
From: Stephen Rothwell @ 2008-06-20 0:20 UTC (permalink / raw)
To: Brian May; +Cc: linux-fsdevel
[-- Attachment #1: Type: text/plain, Size: 1301 bytes --]
On Fri, 20 Jun 2008 10:07:26 +1000 Brian May <brian@vpac.org> wrote:
>
> Stephen Rothwell wrote:
> > So, it would seem some other process has taken write leases on your
> > files but does not release them in a timely manner. You can see the
> > current leases in /proc/locks.
> >
> What is a lease? Is this a kernel level thing or user level thing?
Its what samba calls an oplock. Its like a kernel file lock but it
blocks opens and notifies the holder (who is supposed to clean up and
release the lease). If the lease is not released in 45 seconds (by
default) then the kernel assumes that the holder is broken and breaks the
lease and allows the open to proceed.
> What is the format of /proc/locks? I assume the 4th field is PID, what
> is the next field? How do I work out what is locked?
the format is:
index: type status subtype pid major:minor:inode start end
for leases, type is LEASE status is ACTIVE, BREAKING or BREAKER, subtype
is UNLCK, READ or WRITE.
If there is a "->" infromt of the type, then the process described by
this line is blocked by the first line with the same index.
You can see the inode number of a file using "ls -i".
--
Cheers,
Stephen Rothwell sfr@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/
[-- Attachment #2: Type: application/pgp-signature, Size: 197 bytes --]
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: open sleeps
2008-06-20 0:20 ` Stephen Rothwell
@ 2008-06-20 0:50 ` Brian May
2008-06-20 1:11 ` Brian May
0 siblings, 1 reply; 8+ messages in thread
From: Brian May @ 2008-06-20 0:50 UTC (permalink / raw)
To: Stephen Rothwell; +Cc: linux-fsdevel
Stephen Rothwell wrote:
> Its what samba calls an oplock. Its like a kernel file lock but it
> blocks opens and notifies the holder (who is supposed to clean up and
> release the lease). If the lease is not released in 45 seconds (by
> default) then the kernel assumes that the holder is broken and breaks the
> lease and allows the open to proceed.
>
Ok, so I guess it is most likely Samba's fault then. Most likely this is
the only application that uses leases/oplocks on this computer anyway. I
will keep investigating...
Thanks.
Brian May
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: open sleeps
2008-06-20 0:50 ` Brian May
@ 2008-06-20 1:11 ` Brian May
2008-06-20 6:45 ` Dave Chinner
0 siblings, 1 reply; 8+ messages in thread
From: Brian May @ 2008-06-20 1:11 UTC (permalink / raw)
To: Brian May; +Cc: Stephen Rothwell, linux-fsdevel
Brian May wrote:
> Stephen Rothwell wrote:
>> Its what samba calls an oplock. Its like a kernel file lock but it
>> blocks opens and notifies the holder (who is supposed to clean up and
>> release the lease). If the lease is not released in 45 seconds (by
>> default) then the kernel assumes that the holder is broken and breaks
>> the
>> lease and allows the open to proceed.
>>
> Ok, so I guess it is most likely Samba's fault then. Most likely this
> is the only application that uses leases/oplocks on this computer
> anyway. I will keep investigating...
Maybe I spoke too soon when I ruled XFS out :-(.
hq:~# cat /proc/locks | grep :0a:
5: LEASE ACTIVE WRITE 13516 fd:0a:131 0 EOF
hq:~# ps 13516
PID TTY STAT TIME COMMAND
13516 ? S 0:00 /usr/sbin/smbd -D
hq:~# find /var/lib/wpkg -inum 131
/var/lib/wpkg/cp_status.cmd
hq:~# cat /var/lib/wpkg/cp_status.cmd
[Ctrl-C pushed]
hq:~# cat /proc/locks | grep :0a:
5: LEASE BREAKING READ 13516 fd:0a:131 0 EOF
hq:~# echo t > /proc/sysrq-trigger
Gives a stack trace of:
Jun 20 10:54:37 hq kernel: smbd S 00000000 0 13516 11112 13459 (NOTLB)
Jun 20 10:54:37 hq kernel: ddd19b70 00000082 034cdfca 00000000 00000001 00000007 f7c2c550 dfa9caa0
Jun 20 10:54:37 hq kernel: ae402975 002779a9 0000830f 00000003 f7c2c660 c20240a0 00000001 00000286
Jun 20 10:54:37 hq kernel: c0125380 a5d7f11b c2116000 00000286 000000ff 00000000 00000000 a5d7f11b
Jun 20 10:54:37 hq kernel: Call Trace:
Jun 20 10:54:37 hq kernel: [<c0125380>] lock_timer_base+0x15/0x2f
Jun 20 10:54:37 hq kernel: [<c027f960>] schedule_timeout+0x71/0x8c
Jun 20 10:54:37 hq kernel: [<c0124a81>] process_timeout+0x0/0x5
Jun 20 10:54:37 hq kernel: [<c016a115>] do_select+0x37a/0x3d4
Jun 20 10:54:37 hq kernel: [<c016a677>] __pollwait+0x0/0xb2
Jun 20 10:54:37 hq kernel: [<c0117778>] default_wake_function+0x0/0xc
Jun 20 10:54:37 hq kernel: [<c0117778>] default_wake_function+0x0/0xc
Jun 20 10:54:37 hq kernel: [<f88e998f>] e1000_xmit_frame+0x928/0x958 [e1000]
Jun 20 10:54:37 hq kernel: [<c0121c24>] tasklet_action+0x55/0xaf
Jun 20 10:54:37 hq kernel: [<c022950a>] dev_hard_start_xmit+0x19a/0x1f0
Jun 20 10:54:37 hq kernel: [<f8ae3e6d>] xfs_iext_bno_to_ext+0xd8/0x191 [xfs]
Jun 20 10:54:37 hq kernel: [<f8ac7aec>] xfs_bmap_search_multi_extents+0xa8/0xc5 [xfs]
Jun 20 10:54:37 hq kernel: [<f8ac7b52>] xfs_bmap_search_extents+0x49/0xbe [xfs]
Jun 20 10:54:37 hq kernel: [<f8ac7e35>] xfs_bmapi+0x26e/0x20ce [xfs]
Jun 20 10:54:37 hq kernel: [<f8ac7e35>] xfs_bmapi+0x26e/0x20ce [xfs]
Jun 20 10:54:37 hq kernel: [<c02547e4>] tcp_transmit_skb+0x604/0x632
Jun 20 10:54:37 hq kernel: [<c02560d3>] __tcp_push_pending_frames+0x6a2/0x758
Jun 20 10:54:37 hq kernel: [<c016d84e>] __d_lookup+0x98/0xdb
Jun 20 10:54:37 hq kernel: [<c016d84e>] __d_lookup+0x98/0xdb
Jun 20 10:54:37 hq kernel: [<c0165370>] do_lookup+0x4f/0x135
Jun 20 10:54:37 hq kernel: [<c016dbc4>] dput+0x1a/0x11b
Jun 20 10:54:37 hq kernel: [<c0167312>] __link_path_walk+0xbe4/0xd1d
Jun 20 10:54:37 hq kernel: [<c016a3fb>] core_sys_select+0x28c/0x2a9
Jun 20 10:54:37 hq kernel: [<c01674fe>] link_path_walk+0xb3/0xbd
Jun 20 10:54:37 hq kernel: [<f8afbea1>] xfs_inactive_free_eofblocks+0xdf/0x23f [xfs]
Jun 20 10:54:37 hq kernel: [<c016785d>] do_path_lookup+0x20a/0x225
Jun 20 10:54:37 hq kernel: [<f8b07de5>] xfs_vn_getattr+0x27/0x2f [xfs]
Jun 20 10:54:37 hq kernel: [<c0161b28>] cp_new_stat64+0xfd/0x10f
Jun 20 10:54:37 hq kernel: [<c016a9c1>] sys_select+0x9f/0x182
Jun 20 10:54:37 hq kernel: [<c0102c11>] sysenter_past_esp+0x56/0x79
Why xfs routines are calling e1000 (isn't that a network driver???) is
very puzzling. This was using a Fibre Channel card (QLogic Corp. QLA2312
Fibre Channel Adapter); Would there be any benefit in repeating the same
thing again with the SATA device?
Brian May
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: open sleeps
2008-06-20 1:11 ` Brian May
@ 2008-06-20 6:45 ` Dave Chinner
2008-06-26 6:30 ` Brian May
0 siblings, 1 reply; 8+ messages in thread
From: Dave Chinner @ 2008-06-20 6:45 UTC (permalink / raw)
To: Brian May; +Cc: Stephen Rothwell, linux-fsdevel
On Fri, Jun 20, 2008 at 11:11:54AM +1000, Brian May wrote:
> Brian May wrote:
>> Stephen Rothwell wrote:
>>> Its what samba calls an oplock. Its like a kernel file lock but it
>>> blocks opens and notifies the holder (who is supposed to clean up and
>>> release the lease). If the lease is not released in 45 seconds (by
>>> default) then the kernel assumes that the holder is broken and breaks
>>> the
>>> lease and allows the open to proceed.
>>>
>> Ok, so I guess it is most likely Samba's fault then. Most likely this
>> is the only application that uses leases/oplocks on this computer
>> anyway. I will keep investigating...
> Maybe I spoke too soon when I ruled XFS out :-(.
....
> hq:~# echo t > /proc/sysrq-trigger
> Gives a stack trace of:
>
> Jun 20 10:54:37 hq kernel: Call Trace:
>
> Jun 20 10:54:37 hq kernel: [<c0125380>] lock_timer_base+0x15/0x2f
> Jun 20 10:54:37 hq kernel: [<c027f960>] schedule_timeout+0x71/0x8c
> Jun 20 10:54:37 hq kernel: [<c0124a81>] process_timeout+0x0/0x5
> Jun 20 10:54:37 hq kernel: [<c016a115>] do_select+0x37a/0x3d4
> Jun 20 10:54:37 hq kernel: [<c016a677>] __pollwait+0x0/0xb2
> Jun 20 10:54:37 hq kernel: [<c0117778>] default_wake_function+0x0/0xc
> Jun 20 10:54:37 hq kernel: [<c0117778>] default_wake_function+0x0/0xc
> Jun 20 10:54:37 hq kernel: [<f88e998f>] e1000_xmit_frame+0x928/0x958 [e1000]
> Jun 20 10:54:37 hq kernel: [<c0121c24>] tasklet_action+0x55/0xaf
> Jun 20 10:54:37 hq kernel: [<c022950a>] dev_hard_start_xmit+0x19a/0x1f0
> Jun 20 10:54:37 hq kernel: [<f8ae3e6d>] xfs_iext_bno_to_ext+0xd8/0x191 [xfs]
> Jun 20 10:54:37 hq kernel: [<f8ac7aec>] xfs_bmap_search_multi_extents+0xa8/0xc5 [xfs]
> Jun 20 10:54:37 hq kernel: [<f8ac7b52>] xfs_bmap_search_extents+0x49/0xbe [xfs]
> Jun 20 10:54:37 hq kernel: [<f8ac7e35>] xfs_bmapi+0x26e/0x20ce [xfs]
> Jun 20 10:54:37 hq kernel: [<f8ac7e35>] xfs_bmapi+0x26e/0x20ce [xfs]
> Jun 20 10:54:37 hq kernel: [<c02547e4>] tcp_transmit_skb+0x604/0x632
> Jun 20 10:54:37 hq kernel: [<c02560d3>] __tcp_push_pending_frames+0x6a2/0x758
> Jun 20 10:54:37 hq kernel: [<c016d84e>] __d_lookup+0x98/0xdb
> Jun 20 10:54:37 hq kernel: [<c016d84e>] __d_lookup+0x98/0xdb
> Jun 20 10:54:37 hq kernel: [<c0165370>] do_lookup+0x4f/0x135
> Jun 20 10:54:37 hq kernel: [<c016dbc4>] dput+0x1a/0x11b
> Jun 20 10:54:37 hq kernel: [<c0167312>] __link_path_walk+0xbe4/0xd1d
> Jun 20 10:54:37 hq kernel: [<c016a3fb>] core_sys_select+0x28c/0x2a9
> Jun 20 10:54:37 hq kernel: [<c01674fe>] link_path_walk+0xb3/0xbd
> Jun 20 10:54:37 hq kernel: [<f8afbea1>] xfs_inactive_free_eofblocks+0xdf/0x23f [xfs]
> Jun 20 10:54:37 hq kernel: [<c016785d>] do_path_lookup+0x20a/0x225
> Jun 20 10:54:37 hq kernel: [<f8b07de5>] xfs_vn_getattr+0x27/0x2f [xfs]
> Jun 20 10:54:37 hq kernel: [<c0161b28>] cp_new_stat64+0xfd/0x10f
> Jun 20 10:54:37 hq kernel: [<c016a9c1>] sys_select+0x9f/0x182
> Jun 20 10:54:37 hq kernel: [<c0102c11>] sysenter_past_esp+0x56/0x79
This is just the stupid (broken) ia32 stack tracing - it outputs every
single object it sees in the stack that may be function call. Hence
you see some stuff that is from past stack traversals. I'd say this
is sleeping in a select() call, not doing anything related to
xfs or networking....
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: open sleeps
2008-06-20 6:45 ` Dave Chinner
@ 2008-06-26 6:30 ` Brian May
0 siblings, 0 replies; 8+ messages in thread
From: Brian May @ 2008-06-26 6:30 UTC (permalink / raw)
To: Brian May, Stephen Rothwell, linux-fsdevel
Dave Chinner wrote:
> This is just the stupid (broken) ia32 stack tracing - it outputs every
> single object it sees in the stack that may be function call. Hence
> you see some stuff that is from past stack traversals. I'd say this
> is sleeping in a select() call, not doing anything related to
> xfs or networking....
>
Hello,
Thanks for your help.
Just a followup: Every indication seems to be that it is the firewall in
the virus scanner we use, F-Secure, that prevents Windows from releasing
oplocks properly. After turning off the firewall, the oplocks get
released and there is no problem. Possibly related to the timing of wpkg
being invoked before system logons too.
Might be time to consider another virus scanner/firewall solution.
Brian May
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2008-06-26 6:30 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-06-19 23:14 open sleeps Brian May
2008-06-19 23:58 ` Stephen Rothwell
2008-06-20 0:07 ` Brian May
2008-06-20 0:20 ` Stephen Rothwell
2008-06-20 0:50 ` Brian May
2008-06-20 1:11 ` Brian May
2008-06-20 6:45 ` Dave Chinner
2008-06-26 6:30 ` Brian May
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).