Re: Randomly inaccessible files through NFS

From: "Denis V. Nagorny" <dvnagorny@compcenter.org>
To: linux-nfs@vger.kernel.org
Subject: Re: Randomly inaccessible files through NFS
Date: Fri, 17 Aug 2012 16:28:53 +0400
Message-ID: <502E3905.2070709@compcenter.org>
In-Reply-To: <CAA4Z2ZQR4RVCEzct2AH4yBy=oGkw78M=Gc8pOWC5C0vGbSy-Xg@mail.gmail.com>

One more observation: it looks like the NFS4ERR_EXPIRED replies are
delivered for a process that is blocked in the kernel:

Aug 17 13:18:41 srvmpidev03 kernel: INFO: task bcast2:6338 blocked for more than 120 seconds.
Aug 17 13:18:41 srvmpidev03 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 17 13:18:41 srvmpidev03 kernel: bcast2        D 000000000000000e     0  6338      1 0x00000084
Aug 17 13:18:41 srvmpidev03 kernel: ffff880c238b76e8 0000000000000082 0000000000000000 ffffffffa03a9eed
Aug 17 13:18:41 srvmpidev03 kernel: ffff880621561080 ffff880603c79aa0 ffff880603c79bc0 00000001004eaf4f
Aug 17 13:18:41 srvmpidev03 kernel: ffff880c23f325f8 ffff880c238b7fd8 000000000000f598 ffff880c23f325f8
Aug 17 13:18:41 srvmpidev03 kernel: Call Trace:
Aug 17 13:18:41 srvmpidev03 kernel: [<ffffffffa03a9eed>] ? __put_nfs_open_context+0x4d/0xf0 [nfs]
Aug 17 13:18:41 srvmpidev03 kernel: [<ffffffff8110d310>] ? sync_page+0x0/0x50
Aug 17 13:18:41 srvmpidev03 kernel: [<ffffffff814daf13>] io_schedule+0x73/0xc0
Aug 17 13:18:41 srvmpidev03 kernel: [<ffffffff8110d34d>] sync_page+0x3d/0x50
Aug 17 13:18:41 srvmpidev03 kernel: [<ffffffff814db77f>] __wait_on_bit+0x5f/0x90
Aug 17 13:18:41 srvmpidev03 kernel: [<ffffffff8110d503>] wait_on_page_bit+0x73/0x80
Aug 17 13:18:41 srvmpidev03 kernel: [<ffffffff8108e140>] ? wake_bit_function+0x0/0x50
Aug 17 13:18:41 srvmpidev03 kernel: [<ffffffff8110d5aa>] __lock_page_or_retry+0x3a/0x60
Aug 17 13:18:41 srvmpidev03 kernel: [<ffffffff8110e73f>] filemap_fault+0x2df/0x500
Aug 17 13:18:41 srvmpidev03 kernel: [<ffffffff81136fe4>] __do_fault+0x54/0x510
Aug 17 13:18:41 srvmpidev03 kernel: [<ffffffff81137597>] handle_pte_fault+0xf7/0xb50
Aug 17 13:18:41 srvmpidev03 kernel: [<ffffffff811544ca>] ? alloc_pages_current+0xaa/0x110
Aug 17 13:18:41 srvmpidev03 kernel: [<ffffffff81045b77>] ? pte_alloc_one+0x37/0x50
Aug 17 13:18:41 srvmpidev03 kernel: [<ffffffff811381c8>] handle_mm_fault+0x1d8/0x2a0
Aug 17 13:18:41 srvmpidev03 kernel: [<ffffffff810414e9>] __do_page_fault+0x139/0x480
Aug 17 13:18:41 srvmpidev03 kernel: [<ffffffff8112dbc0>] ? vma_prio_tree_insert+0x30/0x50
Aug 17 13:18:41 srvmpidev03 kernel: [<ffffffff8113ac8c>] ? __vma_link_file+0x4c/0x80
Aug 17 13:18:41 srvmpidev03 kernel: [<ffffffff8113b45b>] ? vma_link+0x9b/0xf0
Aug 17 13:18:41 srvmpidev03 kernel: [<ffffffff8113d9e9>] ? mmap_region+0x269/0x590
Aug 17 13:18:41 srvmpidev03 kernel: [<ffffffff814e007e>] do_page_fault+0x3e/0xa0
Aug 17 13:18:41 srvmpidev03 kernel: [<ffffffff814dd425>] page_fault+0x25/0x30
Aug 17 13:18:41 srvmpidev03 kernel: [<ffffffff8126e3af>] ? __clear_user+0x3f/0x70
Aug 17 13:18:41 srvmpidev03 kernel: [<ffffffff8126e391>] ? __clear_user+0x21/0x70
Aug 17 13:18:41 srvmpidev03 kernel: [<ffffffff8126e418>] clear_user+0x38/0x40
Aug 17 13:18:41 srvmpidev03 kernel: [<ffffffff811c4e1d>] padzero+0x2d/0x40
Aug 17 13:18:41 srvmpidev03 kernel: [<ffffffff811c6e3e>] load_elf_binary+0x88e/0x1b10
Aug 17 13:18:41 srvmpidev03 kernel: [<ffffffff811330f1>] ? follow_page+0x321/0x460
Aug 17 13:18:41 srvmpidev03 kernel: [<ffffffff8113839f>] ? __get_user_pages+0x10f/0x420
Aug 17 13:18:41 srvmpidev03 kernel: [<ffffffff811c38dc>] ? load_misc_binary+0xac/0x3e0
Aug 17 13:18:41 srvmpidev03 kernel: [<ffffffff81179f2b>] search_binary_handler+0x10b/0x350
Aug 17 13:18:41 srvmpidev03 kernel: [<ffffffff8117b0b9>] do_execve+0x239/0x310
Aug 17 13:18:41 srvmpidev03 kernel: [<ffffffff8126e4ca>] ? strncpy_from_user+0x4a/0x90
Aug 17 13:18:41 srvmpidev03 kernel: [<ffffffff810095ca>] sys_execve+0x4a/0x80
Aug 17 13:18:41 srvmpidev03 kernel: [<ffffffff8100b5ca>] stub_execve+0x6a/0xc0
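
A trace like the one above is easier to read once the frames marked "?" (entries the kernel's stack walker could not verify) are separated from the reliable ones. The following is only a minimal sketch of how that could be done, not a tool used in this thread: the names `rejoin` and `frames` are hypothetical, and the regular expressions assume the syslog prefix and frame layout seen in this message. It first rejoins lines that a mail client may have wrapped, then extracts (function, reliable, module) for each frame.

```python
import re

# Assumed syslog prefix, e.g. "Aug 17 13:18:41 srvmpidev03 kernel: "
PREFIX = re.compile(r"^[A-Z][a-z]{2}\s+\d+\s+\d\d:\d\d:\d\d\s+\S+\s+kernel:\s?")
# Assumed frame layout, e.g. "[<ffffffff814daf13>] ? name+0x73/0xc0 [module]"
FRAME = re.compile(
    r"\[<([0-9a-f]+)>\]\s+(\?\s+)?(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(\w+)\])?"
)

def rejoin(lines):
    """Merge continuation lines (those without a syslog prefix) into the
    preceding record and strip the prefix from each record."""
    records = []
    for line in lines:
        line = line.rstrip()
        if not line:
            continue
        if PREFIX.match(line) or not records:
            records.append(PREFIX.sub("", line))
        else:
            records[-1] = records[-1].rstrip() + " " + line.strip()
    return records

def frames(records):
    """Return (function, reliable, module) for every record that is a frame.
    reliable is False when the frame carries the "?" marker."""
    out = []
    for rec in records:
        m = FRAME.search(rec)
        if m:
            out.append((m.group(3), m.group(2) is None, m.group(4)))
    return out

# Small excerpt in the wrapped form a mail client might produce.
sample = """\
Aug 17 13:18:41 srvmpidev03 kernel: Call Trace:
Aug 17 13:18:41 srvmpidev03 kernel: [<ffffffffa03a9eed>] ?
__put_nfs_open_context+0x4d/0xf0 [nfs]
Aug 17 13:18:41 srvmpidev03 kernel: [<ffffffff814daf13>]
io_schedule+0x73/0xc0
Aug 17 13:18:41 srvmpidev03 kernel: [<ffffffff8110d34d>] sync_page+0x3d/0x50
""".splitlines()

for name, reliable, module in frames(rejoin(sample)):
    if reliable:
        print(name, f"[{module}]" if module else "")
```

Applied to the full trace, the reliable frames run from stub_execve through load_elf_binary and page_fault down to io_schedule, which suggests the task is waiting for a page to be read in while an executable is being loaded.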


On 17.08.2012 14:50, Adrien Kunysz wrote:
> I would try to tcpdump all NFS traffic starting when the client is in
> the "stable" state (including the MOUNT call). Once it's in the
> "unstable" state, I would stop the capture, then try to figure out
> exactly at what point it switched from "stable" to "unstable" (maybe
> figure out exactly when the NFS4ERR_EXPIRED replies start to appear)
> and track it down to a specific NFS pattern.
>
> I don't know much about NFS really, so I cannot be more specific. Yes,
> this probably requires a lot of storage to capture all the traffic and
> a lot of time to analyse the captured data.
>
> On Fri, Aug 17, 2012 at 11:26 AM, Denis V. Nagorny
> <dvnagorny@compcenter.org> wrote:
>> On 15.08.2012 11:54, Denis V. Nagorny wrote:
>>
>>> Hello,
>>>
>>> We are using Scientific Linux 6.1 (I think it is equivalent to RHEL 6.1)
>>> and have run into a strange issue that has persisted for the last several
>>> months. After one or two days of normal operation, files on the NFS server
>>> start to become randomly inaccessible. I don't mean that the files become
>>> hidden or anything like that; I mean that attempts to open some random
>>> files may fail. Restarting the NFS server usually improves the situation,
>>> but only for several days. There are no error messages in the logs on
>>> either the server or the client machines. Can anybody point me to how I
>>> can at least start to understand what is happening? Sorry for my English.
>>>
>>> Denis.
>>
>> Hello again,
>>
>> I've run some additional experiments. It looks like NFS clients can be in
>> one of two states: "quite stable" and "quite unstable". Clients are usually
>> stable, but after a heavy job with a lot of I/O against the NFS server they
>> become "quite unstable" and fail even on single-file operations against the
>> NFS server. In this state I can't unmount the NFS shares, and so on. I
>> analysed the traffic with Wireshark and found that in the unstable state
>> there are a lot of NFS4ERR_EXPIRED replies from the NFS server. In one of
>> the experiments I replaced the NICs in both machines involved, with the
>> same result. So I'm still looking for ways to understand the problem.
>> Can anybody give me any advice?
>>
>> Denis
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
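
The suggestion quoted above, finding the moment the NFS4ERR_EXPIRED replies begin, could be approached roughly as follows. This is a sketch under stated assumptions rather than a recipe from the thread: it assumes the capture has already been exported to text with one packet per line, each line starting with an epoch timestamp followed by the packet summary, and the function name `expired_timeline` is hypothetical. It reports the earliest timestamp at which the status appears and counts occurrences per time bucket, so the switch from "stable" to "unstable" shows up as the first non-empty bucket.

```python
from collections import Counter

MARKER = "NFS4ERR_EXPIRED"

def expired_timeline(lines, bucket_seconds=60):
    """Scan exported packet summaries (assumed format: "<epoch> <summary>")
    and return (first_timestamp, Counter mapping bucket start -> count)
    for the lines that mention MARKER. Unparseable lines are skipped."""
    first = None
    buckets = Counter()
    for line in lines:
        if MARKER not in line:
            continue
        try:
            ts = float(line.split(None, 1)[0])
            bucket = int(ts // bucket_seconds) * bucket_seconds
        except (ValueError, IndexError):
            continue  # no leading timestamp on this line
        if first is None or ts < first:
            first = ts
        buckets[bucket] += 1
    return first, buckets

# Hypothetical export lines, for illustration only.
example = [
    "100.0 V4 Call OPEN",
    "125.5 V4 Reply OPEN Status: NFS4ERR_EXPIRED",
    "130.0 V4 Reply RENEW Status: NFS4ERR_EXPIRED",
]
first, buckets = expired_timeline(example)
print("first seen at", first)
for start in sorted(buckets):
    print(start, buckets[start])
```

With the first timestamp in hand, the packets just before it in the capture are the ones worth inspecting for the pattern that precedes the expiry.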
