* Re: Bug#328135: kernel-image-2.6.11-1-686-smp: nfs reading process stuck in disk wait
From: Horms @ 2005-09-14 2:51 UTC
To: Marc Horowitz, 328135; +Cc: Trond Myklebust, linux-kernel
Hi Marc,
would it be possible to test linux-image-2.6.12-1-686-smp from
unstable to see if this problem persists? I am CCing the NFS
maintainer and LKML as this looks reasonably nasty and they
may be interested in looking into it.
--
Horms
On Tue, Sep 13, 2005 at 03:47:07PM -0400, Marc Horowitz wrote:
> Package: kernel-image-2.6.11-1-686-smp
> Version: 2.6.11-7
> Severity: important
>
> cvs D 00000008 0 6344 3830 6345 6162 (NOTLB)
> db6bdd18 00000086 db6bdd08 00000008 00000002 e203cba4 c02edb60 00000001
> 00000001 00000001 000001d2 c013fe43 c02f33a0 c02f2b80 c1805fa0 00000000
> 00000000 b3f24580 000f9fc6 00000000 e3f4f540 e3f4f694 c02f33a0 00000002
> Call Trace:
> [<c013fe43>] __alloc_pages+0x2e3/0x420
> [<c02ac668>] io_schedule+0x28/0x40
> [<c013a6c5>] sync_page+0x45/0x60
> [<c02ac9bf>] __wait_on_bit_lock+0x5f/0x70
> [<c013a680>] sync_page+0x0/0x60
> [<c0131f10>] wake_bit_function+0x0/0x60
> [<c013af51>] __lock_page+0x91/0xa0
> [<c0131f10>] wake_bit_function+0x0/0x60
> [<c014286d>] page_cache_readahead+0x24d/0x2d0
> [<c0131f10>] wake_bit_function+0x0/0x60
> [<c013af9d>] find_get_page+0x3d/0x50
> [<c013b897>] do_generic_mapping_read+0x517/0x630
> [<c013bcb2>] __generic_file_aio_read+0x212/0x250
> [<c013b9b0>] file_read_actor+0x0/0xf0
> [<c013bd4b>] generic_file_aio_read+0x5b/0x80
> [<f8cd1920>] nfs_file_read+0xa0/0xf0 [nfs]
> [<c015ada7>] do_sync_read+0xb7/0xf0
> [<c014d131>] vma_merge+0xd1/0x1d0
> [<c014d7e1>] do_mmap_pgoff+0x461/0x790
> [<c0131eb0>] autoremove_wake_function+0x0/0x60
> [<c015aec5>] vfs_read+0xe5/0x160
> [<c015b1e1>] sys_read+0x51/0x80
> [<c0103123>] syscall_call+0x7/0xb
>
> I was doing a "cvs add" on a working directory in NFS, and the process
> got stuck here. I don't know how to tell what file it was accessing.
>
> I have seen this happen twice with this kernel in the past month, but
> I don't know how to reliably reproduce it.
>
> -- System Information:
> Debian Release: 3.1
> APT prefers testing
> APT policy: (990, 'testing'), (500, 'unstable')
> Architecture: i386 (i686)
> Kernel: Linux 2.6.11-1-686-smp
> Locale: LANG=en_US.ISO8859-1, LC_CTYPE=en_US.ISO8859-1 (charmap=ISO-8859-1)
>
> Versions of packages kernel-image-2.6.11-1-686-smp depends on:
> ii coreutils [fileutils] 5.2.1-2 The GNU core utilities
> ii initrd-tools 0.1.77 tools to create initrd image for p
> ii module-init-tools 3.2-pre1-2 tools for managing Linux kernel mo
>
> -- no debconf information
>
>
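On the question in the report of which file the stuck process was accessing: on Linux the open descriptors of a process are exposed as symlinks under /proc/<pid>/fd, and they can usually still be read while the process sits in disk wait. A minimal Python sketch follows. The function name open_files and the proc_root parameter are illustrative and not part of the original report, and the result only shows which files are open, not which one the read is blocked on.

```python
import os

def open_files(pid, proc_root="/proc"):
    """Return {fd_number: target_path} for the open descriptors of `pid`.

    Reads the symlinks under <proc_root>/<pid>/fd. `proc_root` is a
    hypothetical parameter so the function can be tried on a fake tree.
    """
    fd_dir = os.path.join(proc_root, str(pid), "fd")
    result = {}
    for name in os.listdir(fd_dir):
        if not name.isdigit():
            continue
        try:
            result[int(name)] = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
    return result
```

For the task in the report this would be called as open_files(6344), taking 6344 to be the PID column of the "cvs D ..." line. Reading another user's descriptors normally requires root.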
* Re: Bug#328135: kernel-image-2.6.11-1-686-smp: nfs reading process stuck in disk wait
From: Trond Myklebust @ 2005-09-14 23:58 UTC
To: Horms; +Cc: Marc Horowitz, 328135, linux-kernel
On Wed, 14.09.2005 at 11:51 (+0900), Horms wrote:
> Hi Marc,
>
> would it be possible to test linux-image-2.6.12-1-686-smp from
> unstable to see if this problem persists? I am CCing the NFS
> maintainer and LKML as this looks reasonably nasty and they
> may be interested in looking into it.
>
I doubt this has anything to do with NFS. We should no longer have a
sync_page VFS method in the 2.6 kernels. What other filesystems is the
user running?
Cheers,
Trond
* Re: Bug#328135: kernel-image-2.6.11-1-686-smp: nfs reading process stuck in disk wait
From: Marc Horowitz @ 2005-09-15 1:10 UTC
To: Trond Myklebust; +Cc: Horms, 328135, linux-kernel
Trond Myklebust <trond.myklebust@fys.uio.no> writes:
>> On Wed, 14.09.2005 at 11:51 (+0900), Horms wrote:
>> > Hi Marc,
>> >
>> > would it be possible to test linux-image-2.6.12-1-686-smp from
>> > unstable to see if this problem persists? I am CCing the NFS
>> > maintainer and LKML as this looks reasonably nasty and they
>> > may be interested in looking into it.
>> >
>>
>> I doubt this has anything to do with NFS. We should no longer have a
>> sync_page VFS method in the 2.6 kernels. What other filesystems is the
>> user running?
In the stack trace I sent, from a running 2.6.11 kernel, vfs_read
appears to be the vfs method, not sync_page. sync_page is called much
deeper in the stack trace.
I haven't had a chance to try a 2.6.12 kernel, but I should be able to
this week.
Marc
* Re: Bug#328135: kernel-image-2.6.11-1-686-smp: nfs reading process stuck in disk wait
From: Trond Myklebust @ 2005-09-15 8:32 UTC
To: Marc Horowitz; +Cc: Horms, 328135, linux-kernel
On Wed, 14.09.2005 at 21:10 (-0400), Marc Horowitz wrote:
> Trond Myklebust <trond.myklebust@fys.uio.no> writes:
>
> >> On Wed, 14.09.2005 at 11:51 (+0900), Horms wrote:
> >> > Hi Marc,
> >> >
> >> > would it be possible to test linux-image-2.6.12-1-686-smp from
> >> > unstable to see if this problem persists? I am CCing the NFS
> >> > maintainer and LKML as this looks reasonably nasty and they
> >> > may be interested in looking into it.
> >> >
> >>
> >> I doubt this has anything to do with NFS. We should no longer have a
> >> sync_page VFS method in the 2.6 kernels. What other filesystems is the
> >> user running?
>
> In the stack trace I sent, from a running 2.6.11 kernel, vfs_read
> appears to be the vfs method, not sync_page. sync_page is called much
> deeper in the stack trace.
So? It is clearly the call to sync_page that is Oopsing.
The NFS call is just trying to lock a page that appears to be owned by
someone else. That triggers a call to that filesystem's sync_page, which
then goes on to do a page allocation, which again Oopses.
Cheers,
Trond
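To make the call chain under discussion easier to follow, the trace lines from the report can be split into symbol, offset, and module. A rough Python sketch follows. parse_trace and the regular expression are illustrative assumptions about the "[<addr>] symbol+0xoff/0xsize [module]" line format seen in the report, not a kernel-provided tool. Entries come out in the order printed (innermost first), and the sketch makes no attempt to judge whether an entry is a live frame or a leftover address on the stack.

```python
import re

# Matches lines such as "[<f8cd1920>] nfs_file_read+0xa0/0xf0 [nfs]",
# with or without a leading "> " quote prefix.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?P<symbol>[A-Za-z0-9_.]+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>[^\]]+)\])?"
)

def parse_trace(text):
    """Return one dict per trace line, in the order the lines appear."""
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.search(line)
        if m is None:
            continue  # not a trace line (header, register dump, prose)
        frames.append({
            "addr": int(m.group("addr"), 16),
            "symbol": m.group("symbol"),
            "offset": int(m.group("offset"), 16),
            "module": m.group("module"),  # None for built-in kernel code
        })
    return frames
```

Run over the trace in the report, the only entry carrying a module tag is nfs_file_read [nfs]; everything from generic_file_aio_read down to sync_page is built-in kernel code.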
* Re: Bug#328135: kernel-image-2.6.11-1-686-smp: nfs reading process stuck in disk wait
From: Horms @ 2005-09-15 9:22 UTC
To: Trond Myklebust; +Cc: Marc Horowitz, 328135, linux-kernel
On Thu, Sep 15, 2005 at 09:32:47AM +0100, Trond Myklebust wrote:
> On Wed, 14.09.2005 at 21:10 (-0400), Marc Horowitz wrote:
> > Trond Myklebust <trond.myklebust@fys.uio.no> writes:
> >
> > >> On Wed, 14.09.2005 at 11:51 (+0900), Horms wrote:
> > >> > Hi Marc,
> > >> >
> > >> > would it be possible to test linux-image-2.6.12-1-686-smp from
> > >> > unstable to see if this problem persists? I am CCing the NFS
> > >> > maintainer and LKML as this looks reasonably nasty and they
> > >> > may be interested in looking into it.
> > >> >
> > >>
> > >> I doubt this has anything to do with NFS. We should no longer have a
> > >> sync_page VFS method in the 2.6 kernels. What other filesystems is the
> > >> user running?
> >
> > In the stack trace I sent, from a running 2.6.11 kernel, vfs_read
> > appears to be the vfs method, not sync_page. sync_page is called much
> > deeper in the stack trace.
>
> So? It is clearly the call to sync_page that is Oopsing.
>
> The NFS call is just trying to lock a page that appears to be owned by
> someone else. That triggers a call to that filesystem's sync_page, which
> then goes on to do a page allocation, which again Oopses.
I take it from your initial remarks that the use of sync_page()
in the VFS has changed recently. In any case, it would
be worth testing 2.6.12 or 2.6.13 before investigating any further,
as in your opinion the problem is not NFS related, but related
to something that NFS coincidentally triggers (but could just as
easily be triggered by anything else).
--
Horms
* Re: Bug#328135: kernel-image-2.6.11-1-686-smp: nfs reading process stuck in disk wait
From: Trond Myklebust @ 2005-09-15 9:44 UTC
To: Horms; +Cc: Marc Horowitz, 328135, linux-kernel
On Thu, 15.09.2005 at 18:22 (+0900), Horms wrote:
> I take it from your initial remarks that the use of sync_page()
> in the VFS has changed recently. In any case, it would
> be worth testing 2.6.12 or 2.6.13 before investigating any further,
> as in your opinion the problem is not NFS related, but related
> to something that NFS coincidentally triggers (but could just as
> easily be triggered by anything else).
Right. What I'm saying is that NFS has no special hooks inside
lock_page(), so this is 100% generic VFS code.
Cheers,
Trond
* Re: Bug#328135: kernel-image-2.6.11-1-686-smp: nfs reading process stuck in disk wait
From: Marc Horowitz @ 2005-09-15 14:29 UTC
To: Trond Myklebust; +Cc: Horms, 328135, linux-kernel
Trond Myklebust <trond.myklebust@fys.uio.no> writes:
>> On Wed, 14.09.2005 at 21:10 (-0400), Marc Horowitz wrote:
>> > Trond Myklebust <trond.myklebust@fys.uio.no> writes:
>> >
>> > >> On Wed, 14.09.2005 at 11:51 (+0900), Horms wrote:
>> > >> I doubt this has anything to do with NFS. We should no longer have a
>> > >> sync_page VFS method in the 2.6 kernels. What other filesystems is the
>> > >> user running?
>> >
>> > In the stack trace I sent, from a running 2.6.11 kernel, vfs_read
>> > appears to be the vfs method, not sync_page. sync_page is called much
>> > deeper in the stack trace.
>>
>> So? It is clearly the call to sync_page that is Oopsing.
>>
>> The NFS call is just trying to lock a page that appears to be owned by
>> someone else. That triggers a call to that filesystem's sync_page, which
>> then goes on to do a page allocation, which again Oopses.
Ah, I understand now. I misinterpreted what you said to mean you
didn't expect to see a sync_page call at all.
That said, I'd like to clarify one thing: there is no oops in the
dmesg output. That stack trace comes from dmesg after I do
"echo t > /proc/sysrq-trigger".
I'll give the 2.6.12 kernel a try today or tomorrow, and see what
happens.
Marc
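Since the trace comes from a task dump rather than an oops, it can help to confirm which tasks are actually in uninterruptible sleep (state D) before asking for the dump. A minimal Python sketch follows, assuming a Linux /proc. parse_stat and d_state_tasks are hypothetical helper names, and proc_root is a parameter added only so the code can be tried on a fake tree.

```python
import os

def parse_stat(stat_line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split around the first '(' and the last ')'.
    lparen = stat_line.index("(")
    rparen = stat_line.rindex(")")
    pid = int(stat_line[:lparen])
    comm = stat_line[lparen + 1:rparen]
    state = stat_line[rparen + 1:].split()[0]
    return pid, comm, state

def d_state_tasks(proc_root="/proc"):
    """List (pid, comm) for every process whose state field is 'D'."""
    stuck = []
    for name in os.listdir(proc_root):
        if not name.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, name, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except OSError:
            continue  # the process exited while we were scanning
        if state == "D":
            stuck.append((pid, comm))
    return stuck
```

A task that stays in this list across several scans is a reasonable candidate for the dump, which is requested as root by writing "t" to /proc/sysrq-trigger.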
* Re: Bug#328135: kernel-image-2.6.11-1-686-smp: nfs reading process stuck in disk wait
From: Nigel Kukard @ 2005-09-15 18:46 UTC
To: linux-kernel
Interestingly, I have had a similar problem. I have a client with a
Samba server and about 500 GB of hard drive space.
The Samba server holds a large number of locks on files because
employees do not close the software they are using before going home.
It is backed up over NFS to external drives every evening.
About 50% of the backups hang in D state while trying to access files,
and these D states last indefinitely, even when the locks are dropped on
the Samba server.
-Nigel