* Deadlock in NFSv4 in all kernels
From: Lukas Hejtmanek @ 2010-05-07 15:39 UTC
To: linux-nfs, linux-kernel, linux-fsdevel
Cc: salvet

Hi,

I encountered the following problem. We use a short expiration time for
kerberos contexts created by rpc.gssd (some patches were included in
mainline nfs-utils). In particular, we use a 120-second expiration time.

Now, I run an application that eats 80% of available RAM. Then I run 10
parallel dd processes that write data into an NFSv4 volume mounted with
sec=krb5.

As soon as the kerberos context expires (i.e., within up to 120 secs), the
whole system gets stuck in do_page_fault and successive functions. This is
because there is no free memory in the kernel: all free memory is used as
page cache for NFSv4 (due to the dd traffic), the kernel asks NFS to write
back its pages, but NFS cannot do anything as it is missing a valid
context. NFS contacts rpc.gssd to provide a renewed context, but rpc.gssd
cannot provide it because it needs some memory to scan /tmp for a ticket.
I.e., it deadlocks.

A longer context expiration time is no real solution, as it only makes the
deadlock less frequent.

Any ideas what can be done here? (Please cc me.) We could preallocate some
memory in rpc.gssd and use mlockall, but I am not sure whether this also
protects kernel allocations made on behalf of rpc.gssd during context
creation (new file descriptors and so on).

This is seen in the 2.6.32 kernel, but most probably it affects all kernel
versions.

rpc.gssd and all dd processes are stuck in:

May 6 15:33:10 skirit20 kernel: [84087.788019] rpc.gssd    D 6758d881    0 26864     1 0x00000000
May 6 15:33:10 skirit20 kernel: [84087.788019]  c280594c 00000086 f94c3d6c 6758d881 00004c5a 0f3e1c27 c1d868c0 c1c068dc
May 6 15:33:10 skirit20 kernel: [84087.788019]  c07d8880 c07d8880 f6752130 f67523d4 c1c06880 00000000 c1c06880 00000000
May 6 15:33:10 skirit20 kernel: [84087.788019]  f6d7c8f0 f67523d4 f6752130 c1c06cd4 c1c06880 c2805960 c052101f 00000000
May 6 15:33:10 skirit20 kernel: [84087.788019] Call Trace:
May 6 15:33:10 skirit20 kernel: [84087.788019]  [<c052101f>] io_schedule+0x6f/0xc0
May 6 15:33:10 skirit20 kernel: [84087.788019]  [<f88d8af5>] nfs_wait_bit_uninterruptible+0x5/0x10 [nfs]
May 6 15:33:10 skirit20 kernel: [84087.788019]  [<c05215a7>] __wait_on_bit+0x47/0x70
May 6 15:33:10 skirit20 kernel: [84087.788019]  [<c0521683>] out_of_line_wait_on_bit+0xb3/0xd0
May 6 15:33:10 skirit20 kernel: [84087.788019]  [<f88d8ae3>] nfs_wait_on_request+0x23/0x30 [nfs]
May 6 15:33:10 skirit20 kernel: [84087.788019]  [<f88ddb4a>] nfs_sync_mapping_wait+0xea/0x200 [nfs]
May 6 15:33:10 skirit20 kernel: [84087.788019]  [<f88ddd0e>] nfs_wb_page_priority+0xae/0x170 [nfs]
May 6 15:33:10 skirit20 kernel: [84087.788019]  [<f88cdfec>] nfs_release_page+0x5c/0x70 [nfs]
May 6 15:33:10 skirit20 kernel: [84087.788019]  [<c029620b>] try_to_release_page+0x2b/0x40
May 6 15:33:10 skirit20 kernel: [84087.788019]  [<c02a25af>] shrink_page_list+0x37f/0x4b0
May 6 15:33:10 skirit20 kernel: [84087.788019]  [<c02a29ae>] shrink_inactive_list+0x2ce/0x6c0
May 6 15:33:10 skirit20 kernel: [84087.788019]  [<c02a34c8>] shrink_zone+0x1c8/0x260
May 6 15:33:10 skirit20 kernel: [84087.788019]  [<c02a35ae>] shrink_zones+0x4e/0xe0
May 6 15:33:10 skirit20 kernel: [84087.788019]  [<c02a43b5>] do_try_to_free_pages+0x75/0x2e0
May 6 15:33:10 skirit20 kernel: [84087.788019]  [<c02a4756>] try_to_free_pages+0x86/0xa0
May 6 15:33:10 skirit20 kernel: [84087.788019]  [<c029cda4>] __alloc_pages_slowpath+0x164/0x470
May 6 15:33:10 skirit20 kernel: [84087.788019]  [<c029d1c8>] __alloc_pages_nodemask+0x118/0x120
May 6 15:33:10 skirit20 kernel: [84087.788019]  [<c02acce0>] do_anonymous_page+0x100/0x240
May 6 15:33:10 skirit20 kernel: [84087.788019]  [<c02b072a>] handle_mm_fault+0x34a/0x3d0
May 6 15:33:10 skirit20 kernel: [84087.788019]  [<c0524c54>] do_page_fault+0x174/0x370
May 6 15:33:10 skirit20 kernel: [84087.788019]  [<c0522cb6>] error_code+0x66/0x70
May 6 15:33:10 skirit20 kernel: [84087.788019]  [<c0296ac2>] file_read_actor+0x32/0xf0
May 6 15:33:10 skirit20 kernel: [84087.788019]  [<c029815f>] do_generic_file_read+0x3af/0x4c0
May 6 15:33:10 skirit20 kernel: [84087.788019]  [<c0298a01>] generic_file_aio_read+0xb1/0x210
May 6 15:33:10 skirit20 kernel: [84087.788019]  [<c02cdc05>] do_sync_read+0xd5/0x120
May 6 15:33:10 skirit20 kernel: [84087.788019]  [<c02ce3bb>] vfs_read+0x9b/0x110
May 6 15:33:10 skirit20 kernel: [84087.788019]  [<c02ce501>] sys_read+0x41/0x80
May 6 15:33:10 skirit20 kernel: [84087.788019]  [<c0203150>] sysenter_do_call+0x12/0x22
May 6 15:33:10 skirit20 kernel: [84087.788019]  [<ffffe430>] 0xffffe430

--
Lukáš Hejtmánek
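The mlockall idea mentioned above could look roughly like the following.
This is only a sketch under stated assumptions: the helper name and the
4 MB reserve are invented, and this is not code from nfs-utils.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

#define RESERVE_BYTES (4 * 1024 * 1024)	/* illustrative reserve size */

static char *reserve;

/* Pin the daemon's address space and pre-touch a private reserve so that
 * later user-space allocations do not page-fault under memory pressure. */
static int lock_down_daemon(void)
{
	if (mlockall(MCL_CURRENT | MCL_FUTURE) < 0) {
		perror("mlockall");
		return -1;
	}
	reserve = malloc(RESERVE_BYTES);
	if (reserve == NULL)
		return -1;
	memset(reserve, 0, RESERVE_BYTES);	/* fault every page in now */
	return 0;
}

Even with this, kernel-side allocations made on the daemon's behalf (socket
buffers, new file descriptors) are not covered, which is exactly the doubt
raised above, and the MCL_FUTURE behaviour brings its own problem, as noted
later in the thread.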
* Re: Deadlock in NFSv4 in all kernels
From: Pavel Machek @ 2010-05-24 21:24 UTC
To: Lukas Hejtmanek
Cc: linux-nfs, linux-kernel, linux-fsdevel, salvet

Hi!

> I encountered the following problem. We use a short expiration time for
> kerberos contexts created by rpc.gssd (some patches were included in
> mainline nfs-utils). In particular, we use a 120-second expiration time.
>
> Now, I run an application that eats 80% of available RAM. Then I run 10
> parallel dd processes that write data into an NFSv4 volume mounted with
> sec=krb5.
>
> As soon as the kerberos context expires (i.e., within up to 120 secs), the
> whole system gets stuck in do_page_fault and successive functions. This is
> because there is no free memory in the kernel: all free memory is used as
> page cache for NFSv4 (due to the dd traffic), the kernel asks NFS to write
> back its pages, but NFS cannot do anything as it is missing a valid
> context. NFS contacts rpc.gssd to provide a renewed context, but rpc.gssd
> cannot provide it because it needs some memory to scan /tmp for a ticket.
> I.e., it deadlocks.
>
> A longer context expiration time is no real solution, as it only makes the
> deadlock less frequent.
>
> Any ideas what can be done here? (Please cc me.) We could preallocate some
> memory in rpc.gssd and use mlockall, but I am not sure whether this also
> protects kernel allocations made on behalf of rpc.gssd during context
> creation (new file descriptors and so on).
>
> This is seen in the 2.6.32 kernel, but most probably it affects all kernel
> versions.

Seems like a pretty fundamental problem in NFS :-(. Limiting the writeback
cache for NFS, so that the rest of the system still has enough memory to
perform the RPC calls, might do the trick, but...

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
* Re: Deadlock in NFSv4 in all kernels
From: Trond Myklebust @ 2010-05-25 12:28 UTC
To: Pavel Machek
Cc: Lukas Hejtmanek, linux-nfs, linux-kernel, linux-fsdevel, salvet

On Mon, 2010-05-24 at 23:24 +0200, Pavel Machek wrote:
> Hi!
>
> > I encountered the following problem. We use a short expiration time for
> > kerberos contexts created by rpc.gssd (some patches were included in
> > mainline nfs-utils). In particular, we use a 120-second expiration time.
> >
> > Now, I run an application that eats 80% of available RAM. Then I run 10
> > parallel dd processes that write data into an NFSv4 volume mounted with
> > sec=krb5.
> >
> > As soon as the kerberos context expires (i.e., within up to 120 secs),
> > the whole system gets stuck in do_page_fault and successive functions.
> > This is because there is no free memory in the kernel: all free memory
> > is used as page cache for NFSv4 (due to the dd traffic), the kernel asks
> > NFS to write back its pages, but NFS cannot do anything as it is missing
> > a valid context. NFS contacts rpc.gssd to provide a renewed context, but
> > rpc.gssd cannot provide it because it needs some memory to scan /tmp for
> > a ticket. I.e., it deadlocks.
> >
> > A longer context expiration time is no real solution, as it only makes
> > the deadlock less frequent.
> >
> > Any ideas what can be done here? (Please cc me.) We could preallocate
> > some memory in rpc.gssd and use mlockall, but I am not sure whether this
> > also protects kernel allocations made on behalf of rpc.gssd during
> > context creation (new file descriptors and so on).
> >
> > This is seen in the 2.6.32 kernel, but most probably it affects all
> > kernel versions.
>
> Seems like a pretty fundamental problem in NFS :-(. Limiting the writeback
> cache for NFS, so that the rest of the system still has enough memory to
> perform the RPC calls, might do the trick, but...

It's the same problem that you have for any file or storage system that has
initiators in userland. On the storage side, iSCSI in particular has the
same problem. On the filesystem side, CIFS, AFS, Coda, ... do too. The
clustered filesystems can deadlock if the node that is running the DLM runs
out of memory...

A few years ago there were several people proposing various solutions for
allowing these daemons to run in a protected memory environment to avoid
deadlocks, but those efforts have since petered out. Perhaps it is time to
review the problem?

Cheers
  Trond
* Re: Deadlock in NFSv4 in all kernels
From: Lukas Hejtmanek @ 2010-05-25 12:58 UTC
To: Trond Myklebust
Cc: Pavel Machek, linux-nfs, linux-kernel, linux-fsdevel, salvet

Hi,

On Tue, May 25, 2010 at 08:28:40AM -0400, Trond Myklebust wrote:
> > Seems like a pretty fundamental problem in NFS :-(. Limiting the
> > writeback cache for NFS, so that the rest of the system still has enough
> > memory to perform the RPC calls, might do the trick, but...
>
> It's the same problem that you have for any file or storage system that
> has initiators in userland. On the storage side, iSCSI in particular has
> the same problem. On the filesystem side, CIFS, AFS, Coda, ... do too. The
> clustered filesystems can deadlock if the node that is running the DLM
> runs out of memory...
>
> A few years ago there were several people proposing various solutions for
> allowing these daemons to run in a protected memory environment to avoid
> deadlocks, but those efforts have since petered out. Perhaps it is time to
> review the problem?

I saw some patches targeting 2.6.35 that should prevent some deadlocks.
They do not seem to be enough in some cases. The rpc.* daemons should be
mlocked for sure, but there is a problem with libkrb, which reads files
using fread(). fread() uses an anonymous mmap, and under
mlockall(MCL_FUTURE) this causes the anonymous mapping to be populated
immediately, and it deadlocks.

IBM GPFS also uses a userspace daemon, but it seems that the daemon is
mlocked and it does not open any files or create new connections.

My problem was quite easily reproducible. I started an application that
eats 80% of free memory. Then I started:

for i in `seq 1 10`; do
	dd if=/dev/zero of=/mnt/nfs4/file$i bs=1M count=2048 &
done

It deadlocks within 2 minutes unless this patch is applied:

commit 3d7b08945e54a3a5358d5890240619a013cb7388
Author: Trond Myklebust <Trond.Myklebust@netapp.com>
Date:   Thu Apr 22 15:35:55 2010 -0400

    SUNRPC: Fix a bug in rpcauth_prune_expired

    Don't want to evict a credential if cred->cr_expire == jiffies, since
    that means that it was just placed on the cred_unused list. We
    therefore need to use time_in_range() rather than time_in_range_open().

    Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

diff --git a/net/sunrpc/auth.c b/net/sunrpc/auth.c
index f394fc1..95afe79 100644
--- a/net/sunrpc/auth.c
+++ b/net/sunrpc/auth.c
@@ -237,7 +237,7 @@ rpcauth_prune_expired(struct list_head *free, int nr_to_scan)
 	list_for_each_entry_safe(cred, next, &cred_unused, cr_lru) {
 		/* Enforce a 60 second garbage collection moratorium */
-		if (time_in_range_open(cred->cr_expire, expired, jiffies) &&
+		if (time_in_range(cred->cr_expire, expired, jiffies) &&
 		    test_bit(RPCAUTH_CRED_HASHED, &cred->cr_flags) != 0)
 			continue;

but I believe this only hides the real problem.

--
Lukáš Hejtmánek
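For context on what that one-word change does: the two jiffies helpers
differ only in whether the upper bound of the range is inclusive, roughly
as defined in include/linux/jiffies.h of that era:

#define time_in_range(a, b, c)		\
	(time_after_eq(a, b) &&		\
	 time_before_eq(a, c))

#define time_in_range_open(a, b, c)	\
	(time_after_eq(a, b) &&		\
	 time_before(a, c))

With the open-ended variant, a credential whose cr_expire equals the
current jiffies value (i.e. one that was placed on cred_unused just now)
falls outside the range, so the 60-second garbage-collection moratorium is
not applied and the credential can be evicted immediately, presumably
forcing a fresh upcall to rpc.gssd at the worst possible moment. Making the
upper bound inclusive lets such credentials survive the moratorium.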
* Re: Deadlock in NFSv4 in all kernels
From: Trond Myklebust @ 2010-05-25 13:39 UTC
To: Lukas Hejtmanek
Cc: Pavel Machek, linux-nfs, linux-kernel, linux-fsdevel, salvet

On Tue, 2010-05-25 at 14:58 +0200, Lukas Hejtmanek wrote:
> Hi,
>
> On Tue, May 25, 2010 at 08:28:40AM -0400, Trond Myklebust wrote:
> > > Seems like a pretty fundamental problem in NFS :-(. Limiting the
> > > writeback cache for NFS, so that the rest of the system still has
> > > enough memory to perform the RPC calls, might do the trick, but...
> >
> > It's the same problem that you have for any file or storage system that
> > has initiators in userland. On the storage side, iSCSI in particular has
> > the same problem. On the filesystem side, CIFS, AFS, Coda, ... do too.
> > The clustered filesystems can deadlock if the node that is running the
> > DLM runs out of memory...
> >
> > A few years ago there were several people proposing various solutions
> > for allowing these daemons to run in a protected memory environment to
> > avoid deadlocks, but those efforts have since petered out. Perhaps it is
> > time to review the problem?
>
> I saw some patches targeting 2.6.35 that should prevent some deadlocks.
> They do not seem to be enough in some cases. The rpc.* daemons should be
> mlocked for sure, but there is a problem with libkrb, which reads files
> using fread(). fread() uses an anonymous mmap, and under
> mlockall(MCL_FUTURE) this causes the anonymous mapping to be populated
> immediately, and it deadlocks.
>
> IBM GPFS also uses a userspace daemon, but it seems that the daemon is
> mlocked and it does not open any files or create new connections.

Doesn't matter. Just writing to a socket or pipe may trigger a kernel
memory allocation, which can result in an attempt to reclaim memory.

Furthermore, there is the issue of what to do when you really are OOM, and
the kernel cannot allocate more memory for you without reclaiming it.

The schemes I'm talking about typically had special memory pools
preallocated for use by daemons, and would label the daemons using some
equivalent of the PF_MEMALLOC flag to prevent recursion into the
filesystem.
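A very rough sketch of the kind of labelling described here is below.
PF_MEMALLOC and current are real kernel symbols, but the two helpers and
the idea of wrapping an upcall with them are hypothetical; this is not code
from any of the old proposals:

#include <linux/sched.h>

/* Hypothetical: mark the daemon's task while it services a filesystem
 * upcall, so its allocations may dip into the emergency reserves and the
 * allocator does not enter direct reclaim (and thus cannot recurse back
 * into the filesystem). */
static unsigned int fs_upcall_begin(void)
{
	unsigned int old = current->flags & PF_MEMALLOC;

	current->flags |= PF_MEMALLOC;
	return old;
}

static void fs_upcall_end(unsigned int old)
{
	current->flags = (current->flags & ~PF_MEMALLOC) | old;
}

The companion piece would be the preallocated pool the daemon draws from,
so that it rarely needs the allocator's slow path at all while PF_MEMALLOC
is set.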
* Re: Deadlock in NFSv4 in all kernels
From: Zdenek Salvet @ 2010-05-25 14:07 UTC
To: Trond Myklebust
Cc: Lukas Hejtmanek, Pavel Machek, linux-nfs, linux-kernel, linux-fsdevel, salvet

On Tue, May 25, 2010 at 09:39:25AM -0400, Trond Myklebust wrote:
> The schemes I'm talking about typically had special memory pools
> preallocated for use by daemons, and would label the daemons using some
> equivalent of the PF_MEMALLOC flag to prevent recursion into the
> filesystem.

Yes. In my opinion, a proper solution has to be careful at three points:

- the daemons must be carefully written not to require much memory,
- the daemons should 'inherit' PF_MEMALLOC while processing upcalls,
- the FS should try to flush fast enough (what W. Adamson wrote) and delay
  new allocations when it cannot.

Regards,

Zdenek Salvet                                        salvet@ics.muni.cz
Institute of Computer Science of Masaryk University, Brno, Czech Republic
and CESNET, z.s.p.o., Prague, Czech Republic
Phone: ++420-549 49 6534                           Fax: ++420-541 212 747
----------------------------------------------------------------------------
      Teamwork is essential -- it allows you to blame someone else.
* Re: Deadlock in NFSv4 in all kernels
From: Sunil Mushran @ 2010-05-25 17:10 UTC
To: Trond Myklebust
Cc: Pavel Machek, Lukas Hejtmanek, linux-nfs, linux-kernel, linux-fsdevel, salvet

On 05/25/2010 05:28 AM, Trond Myklebust wrote:
>>> I encountered the following problem. We use a short expiration time for
>>> kerberos contexts created by rpc.gssd (some patches were included in
>>> mainline nfs-utils). In particular, we use a 120-second expiration time.
>>>
>>> Now, I run an application that eats 80% of available RAM. Then I run 10
>>> parallel dd processes that write data into an NFSv4 volume mounted with
>>> sec=krb5.
>>>
>>> As soon as the kerberos context expires (i.e., within up to 120 secs),
>>> the whole system gets stuck in do_page_fault and successive functions.
>>> This is because there is no free memory in the kernel: all free memory
>>> is used as page cache for NFSv4 (due to the dd traffic), the kernel asks
>>> NFS to write back its pages, but NFS cannot do anything as it is missing
>>> a valid context. NFS contacts rpc.gssd to provide a renewed context, but
>>> rpc.gssd cannot provide it because it needs some memory to scan /tmp for
>>> a ticket. I.e., it deadlocks.
>>>
>>> A longer context expiration time is no real solution, as it only makes
>>> the deadlock less frequent.
>>>
>>> Any ideas what can be done here? (Please cc me.) We could preallocate
>>> some memory in rpc.gssd and use mlockall, but I am not sure whether this
>>> also protects kernel allocations made on behalf of rpc.gssd during
>>> context creation (new file descriptors and so on).
>>>
>>> This is seen in the 2.6.32 kernel, but most probably it affects all
>>> kernel versions.
>>>
>> Seems like a pretty fundamental problem in NFS :-(. Limiting the
>> writeback cache for NFS, so that the rest of the system still has enough
>> memory to perform the RPC calls, might do the trick, but...
>>
> It's the same problem that you have for any file or storage system that
> has initiators in userland. On the storage side, iSCSI in particular has
> the same problem. On the filesystem side, CIFS, AFS, Coda, ... do too. The
> clustered filesystems can deadlock if the node that is running the DLM
> runs out of memory...

Not so trivially. In ocfs2, the DLM allocates small blocks with GFP_NOFS.
Furthermore, the time-sensitive recovery thread preallocates what buffers
it can at create time. That does not mean ocfs2 is not affected by memory
pressure. It is. But that shows up as slower response, not a deadlock.
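The GFP_NOFS point can be shown with a tiny sketch; the structure and
function names below are invented for illustration and are not taken from
ocfs2:

#include <linux/slab.h>

/* Illustrative only: a lock resource allocated from a DLM/filesystem
 * path.  GFP_NOFS lets the allocator reclaim memory but forbids it from
 * calling back into filesystem writeback to do so, so the allocating
 * thread cannot end up waiting on I/O that it is itself responsible for
 * issuing. */
struct example_lockres {
	char	name[32];
	int	state;
};

static struct example_lockres *example_alloc_lockres(void)
{
	return kzalloc(sizeof(struct example_lockres), GFP_NOFS);
}

As the message above notes, GFP_NOFS narrows the deadlock window rather
than removing the memory pressure; the cost shows up as slower progress
when memory is tight.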
* Re: Deadlock in NFSv4 in all kernels
From: William A. (Andy) Adamson @ 2010-05-25 13:45 UTC
To: Lukas Hejtmanek
Cc: linux-nfs, linux-kernel, linux-fsdevel, salvet

2010/5/7 Lukas Hejtmanek <xhejtman@ics.muni.cz>:
> Hi,
>
> I encountered the following problem. We use a short expiration time for
> kerberos contexts created by rpc.gssd (some patches were included in
> mainline nfs-utils). In particular, we use a 120-second expiration time.
>
> Now, I run an application that eats 80% of available RAM. Then I run 10
> parallel dd processes that write data into an NFSv4 volume mounted with
> sec=krb5.
>
> As soon as the kerberos context expires (i.e., within up to 120 secs), the
> whole system gets stuck in do_page_fault and successive functions. This is
> because there is no free memory in the kernel: all free memory is used as
> page cache for NFSv4 (due to the dd traffic), the kernel asks NFS to write
> back its pages, but NFS cannot do anything as it is missing a valid
> context. NFS contacts rpc.gssd to provide a renewed context, but rpc.gssd
> cannot provide it because it needs some memory to scan /tmp for a ticket.
> I.e., it deadlocks.
>
> A longer context expiration time is no real solution, as it only makes the
> deadlock less frequent.
>
> Any ideas what can be done here?

Don't get into the problem in the first place. This means:

1) determine a 'lead time' during which the NFS client declares a context
expired even though it really has 'lead time' left until it actually
expires;

2) flush all writes on any context that will expire within the lead time,
which needs to be long enough for the flushes to take place.

-->Andy

> (Please cc me.) We could preallocate some memory in rpc.gssd and use
> mlockall, but I am not sure whether this also protects kernel allocations
> made on behalf of rpc.gssd during context creation (new file descriptors
> and so on).
>
> This is seen in the 2.6.32 kernel, but most probably it affects all kernel
> versions.
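One way to read the 'lead time' proposal above in kernel terms is sketched
below. The lead-time constant, the helper names, and the per-inode flush
are all invented for illustration; only the jiffies arithmetic and
write_inode_now() are real kernel interfaces:

#include <linux/fs.h>
#include <linux/jiffies.h>

#define GSS_EXPIRE_LEAD_TIME	(30 * HZ)	/* illustrative lead time */

/* Treat a GSS context as expired GSS_EXPIRE_LEAD_TIME before it really is. */
static bool gss_ctx_nearly_expired(unsigned long gc_expiry)
{
	return time_after_eq(jiffies, gc_expiry - GSS_EXPIRE_LEAD_TIME);
}

/* While the context is still usable, push dirty pages to the server so
 * that memory reclaim never has to wait for an rpc.gssd upcall. */
static void gss_preexpiry_flush(struct inode *inode, unsigned long gc_expiry)
{
	if (gss_ctx_nearly_expired(gc_expiry))
		write_inode_now(inode, 1);	/* synchronous writeback */
}

As the follow-ups point out, this only covers the normal case: a context
that is invalidated unexpectedly (for example by a server reboot) still
needs a fallback.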
* Re: Deadlock in NFSv4 in all kernels
From: Lukas Hejtmanek @ 2010-05-25 14:02 UTC
To: William A. (Andy) Adamson
Cc: linux-nfs, linux-kernel, linux-fsdevel, salvet

On Tue, May 25, 2010 at 09:45:32AM -0400, William A. (Andy) Adamson wrote:
> Don't get into the problem in the first place. This means:
>
> 1) determine a 'lead time' during which the NFS client declares a context
> expired even though it really has 'lead time' left until it actually
> expires;
>
> 2) flush all writes on any context that will expire within the lead time,
> which needs to be long enough for the flushes to take place.

I think you cannot give any guarantee that the flush happens on time. There
can be server overload, network overload, anything, and you are out of
luck.

--
Lukáš Hejtmánek
* Re: Deadlock in NFSv4 in all kernels
From: William A. (Andy) Adamson @ 2010-05-25 14:10 UTC
To: Lukas Hejtmanek
Cc: linux-nfs, linux-kernel, linux-fsdevel, salvet

2010/5/25 Lukas Hejtmanek <xhejtman@ics.muni.cz>:
> On Tue, May 25, 2010 at 09:45:32AM -0400, William A. (Andy) Adamson wrote:
>> Don't get into the problem in the first place. This means:
>>
>> 1) determine a 'lead time' during which the NFS client declares a context
>> expired even though it really has 'lead time' left until it actually
>> expires;
>>
>> 2) flush all writes on any context that will expire within the lead time,
>> which needs to be long enough for the flushes to take place.
>
> I think you cannot give any guarantee that the flush happens on time.
> There can be server overload, network overload, anything, and you are out
> of luck.

True - but this will be the case no matter what scheme is in place. The
above is meant to handle the normal working situation. When it fails due to
network or server overload, a server reboot, i.e. a not-normal situation,
then use the machine credential.

-->Andy
* Re: Deadlock in NFSv4 in all kernels
From: Trond Myklebust @ 2010-05-25 14:29 UTC
To: William A. (Andy) Adamson
Cc: Lukas Hejtmanek, linux-nfs, linux-kernel, linux-fsdevel, salvet

On Tue, 2010-05-25 at 10:10 -0400, William A. (Andy) Adamson wrote:
> 2010/5/25 Lukas Hejtmanek <xhejtman@ics.muni.cz>:
> > On Tue, May 25, 2010 at 09:45:32AM -0400, William A. (Andy) Adamson wrote:
> >> Don't get into the problem in the first place. This means:
> >>
> >> 1) determine a 'lead time' during which the NFS client declares a
> >> context expired even though it really has 'lead time' left until it
> >> actually expires;
> >>
> >> 2) flush all writes on any context that will expire within the lead
> >> time, which needs to be long enough for the flushes to take place.
> >
> > I think you cannot give any guarantee that the flush happens on time.
> > There can be server overload, network overload, anything, and you are
> > out of luck.
>
> True - but this will be the case no matter what scheme is in place. The
> above is meant to handle the normal working situation. When it fails due
> to network or server overload, a server reboot, i.e. a not-normal
> situation, then use the machine credential.

Use of the machine credential also requires help from the rpc.gssd daemon.
It's not a solution to the deadlock Lukas is describing.

Trond
* Re: Deadlock in NFSv4 in all kernels
From: Trond Myklebust @ 2010-05-25 14:04 UTC
To: William A. (Andy) Adamson
Cc: Lukas Hejtmanek, linux-nfs, linux-kernel, linux-fsdevel, salvet

On Tue, 2010-05-25 at 09:45 -0400, William A. (Andy) Adamson wrote:
> 2010/5/7 Lukas Hejtmanek <xhejtman@ics.muni.cz>:
> > Hi,
> >
> > I encountered the following problem. We use a short expiration time for
> > kerberos contexts created by rpc.gssd (some patches were included in
> > mainline nfs-utils). In particular, we use a 120-second expiration time.
> >
> > Now, I run an application that eats 80% of available RAM. Then I run 10
> > parallel dd processes that write data into an NFSv4 volume mounted with
> > sec=krb5.
> >
> > As soon as the kerberos context expires (i.e., within up to 120 secs),
> > the whole system gets stuck in do_page_fault and successive functions.
> > This is because there is no free memory in the kernel: all free memory
> > is used as page cache for NFSv4 (due to the dd traffic), the kernel asks
> > NFS to write back its pages, but NFS cannot do anything as it is missing
> > a valid context. NFS contacts rpc.gssd to provide a renewed context, but
> > rpc.gssd cannot provide it because it needs some memory to scan /tmp for
> > a ticket. I.e., it deadlocks.
> >
> > A longer context expiration time is no real solution, as it only makes
> > the deadlock less frequent.
> >
> > Any ideas what can be done here?
>
> Don't get into the problem in the first place. This means:
>
> 1) determine a 'lead time' during which the NFS client declares a context
> expired even though it really has 'lead time' left until it actually
> expires;
>
> 2) flush all writes on any context that will expire within the lead time,
> which needs to be long enough for the flushes to take place.

That too is only a partial solution. The GSS context can expire early due
to totally unforeseeable circumstances, such as a server reboot, for
instance.
Thread overview: 12+ messages

2010-05-07 15:39 Deadlock in NFSv4 in all kernels  Lukas Hejtmanek
2010-05-24 21:24 ` Pavel Machek
2010-05-25 12:28   ` Trond Myklebust
2010-05-25 12:58     ` Lukas Hejtmanek
2010-05-25 13:39       ` Trond Myklebust
2010-05-25 14:07         ` Zdenek Salvet
2010-05-25 17:10     ` Sunil Mushran
2010-05-25 13:45 ` William A. (Andy) Adamson
2010-05-25 14:02   ` Lukas Hejtmanek
2010-05-25 14:10     ` William A. (Andy) Adamson
2010-05-25 14:29       ` Trond Myklebust
2010-05-25 14:04   ` Trond Myklebust