From: "Mkrtchyan, Tigran" <tigran.mkrtchyan@desy.de>
To: Olga Kornievskaia <aglo@umich.edu>
Cc: Andy Adamson <William.Adamson@netapp.com>,
Linux NFS Mailing List <linux-nfs@vger.kernel.org>,
Trond Myklebust <trond.myklebust@primarydata.com>,
Steve Dickson <steved@redhat.com>
Subject: Re: Lost CLOSE with NFSv4.1 on RHEL7 ( and bejond?)
Date: Mon, 1 Aug 2016 13:08:25 +0200 (CEST)
Message-ID: <1548681850.3694730.1470049705230.JavaMail.zimbra@desy.de>
In-Reply-To: <CAN-5tyEKDpciUS81QEQwS_ioufndrjDKzPwtcYkQ8O8hvtWiQQ@mail.gmail.com>
Hi Olga,
we have installed kernel 4.7.0 on one of the nodes and don't see missing
CLOSEs from that node.
Nevertheless, I don't think that the commit you mentioned fixes that,
as it fixes OPEN_DOWNGRADE, but we have a sequence of OPEN->CLOSE->OPEN. The
OPEN_DOWNGRADE is not expected - the file is already closed when the second OPEN
is sent, and both requests use the same session slot.
Have you seen a similar issue on a vanilla or RHEL kernel?
Thanks a lot,
Tigran.
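
The OPEN->CLOSE->OPEN pattern described above can be checked against a decoded capture with a small script. This is only a minimal sketch: the `(operation, filehandle)` tuple format and the function name `find_orphan_opens` are hypothetical, the capture would first have to be decoded into that shape, and real open state is tracked per stateid and open owner rather than per file handle.

```python
def find_orphan_opens(events):
    """Return file handles whose most recent OPEN has no later CLOSE.

    `events` is an iterable of (op, filehandle) tuples in capture order.
    This input format is an assumption made for illustration only.
    """
    is_open = {}
    for op, fh in events:
        if op == "OPEN":
            is_open[fh] = True
        elif op == "CLOSE":
            is_open[fh] = False
        # Other operations (LAYOUTGET, READ, ...) do not change open state here.
    return sorted(fh for fh, still_open in is_open.items() if still_open)


# The sequence from the report: a complete OPEN/CLOSE cycle, then an OPEN
# that is never followed by a CLOSE.
trace = [
    ("OPEN", "fh1"), ("LAYOUTGET", "fh1"), ("CLOSE", "fh1"),
    ("OPEN", "fh1"), ("LAYOUTGET", "fh1"),
]
print(find_orphan_opens(trace))
```

A file handle reported here is only a candidate: the capture may simply have ended before the CLOSE was sent, so each hit still needs to be compared with the state held on the MDS.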
----- Original Message -----
> From: "Olga Kornievskaia" <aglo@umich.edu>
> To: "Mkrtchyan, Tigran" <tigran.mkrtchyan@desy.de>
> Cc: "Andy Adamson" <William.Adamson@netapp.com>, "Linux NFS Mailing List" <linux-nfs@vger.kernel.org>, "Trond Myklebust"
> <trond.myklebust@primarydata.com>, "Steve Dickson" <steved@redhat.com>
> Sent: Thursday, July 14, 2016 4:52:59 PM
> Subject: Re: Lost CLOSE with NFSv4.1 on RHEL7 ( and bejond?)
> Hi Tigran,
>
> On Wed, Jul 13, 2016 at 12:49 PM, Mkrtchyan, Tigran
> <tigran.mkrtchyan@desy.de> wrote:
>>
>>
>> Hi Andy,
>>
>> I will try to get upstream kernel on one of the nodes. It will take
>> some time as we need to add a new host into the cluster and get
>> some traffic go through it.
>>
>> In the meantime, with RHEL7 we can reproduce it easily - about 10
>> such cases per day. Is there any tool that will help us see where
>> it happens? Some tracepoints? A call trace from VFS close to NFS close?
>
> There are NFS tracepoints but I don't think there are VFS
> tracepoints. Unfortunately, there was a bug in the OPEN tracepoints
> that caused a kernel crash. I had a bugzilla out for RHEL7.2. It says
> it's fixed in the later kernel (.381) but it's currently not back
> ported to RHEL7.2z but hopefully will be soon (just chatted with Steve
> about getting the fix into zstream). I made no progress in figuring
> out what could be causing the lack of CLOSE and it was hard for me to
> reproduce.
>
> Just recently Trond fixed a problem where a CLOSE that was supposed to
> be sent as an OPEN_DOWNGRADE wasn't sent (commit 0979bc2a59). I
> wonder if that could be fixing this problem...
>
>> There is one comment in the kernel code which sounds similar:
>> (http://git.linux-nfs.org/?p=trondmy/linux-nfs.git;a=blob;f=fs/nfs/nfs4proc.c;h=519368b987622ea23bea210929bebfd0c327e14e;hb=refs/heads/linux-next#l2955)
>>
>> nfs4proc.c: 2954
>> ====
>>
>> /*
>> * It is possible for data to be read/written from a mem-mapped file
>> * after the sys_close call (which hits the vfs layer as a flush).
>> * This means that we can't safely call nfsv4 close on a file until
>> * the inode is cleared. This in turn means that we are not good
>> * NFSv4 citizens - we do not indicate to the server to update the file's
>> * share state even when we are done with one of the three share
>> * stateid's in the inode.
>> *
>> * NOTE: Caller must be holding the sp->so_owner semaphore!
>> */
>> int nfs4_do_close(struct nfs4_state *state, gfp_t gfp_mask, int wait)
>>
>
> I'm not sure if the comment means to say that there is a possibility
> that NFS won't send a CLOSE (or at least I hope not). I thought that
> we keep a reference count on the inode and send the CLOSE when
> it goes down to 0. Basically the last WRITE will trigger the NFS close,
> not the vfs_close.
>
>
>> ====
>>
>>
>> Tigran.
>>
>>
>> ----- Original Message -----
>>> From: "Andy Adamson" <William.Adamson@netapp.com>
>>> To: "Mkrtchyan, Tigran" <tigran.mkrtchyan@desy.de>
>>> Cc: "Linux NFS Mailing List" <linux-nfs@vger.kernel.org>, "Andy Adamson"
>>> <William.Adamson@netapp.com>, "Trond Myklebust"
>>> <trond.myklebust@primarydata.com>, "Steve Dickson" <steved@redhat.com>
>>> Sent: Tuesday, July 12, 2016 7:16:19 PM
>>> Subject: Re: Lost CLOSE with NFSv4.1 on RHEL7 ( and bejond?)
>>
>>> Hi Tigran
>>>
>>> Can you test with an upstream kernel? Olga has seen issues around no CLOSE being
>>> sent - it is really hard to reproduce….
>>>
>>> —>Andy
>>>
>>>
>>>> On Jul 7, 2016, at 6:49 AM, Mkrtchyan, Tigran <tigran.mkrtchyan@desy.de> wrote:
>>>>
>>>>
>>>>
>>>> Dear NFS folks,
>>>>
>>>> we observe orphan open-states on our deployment with NFSv4.1.
>>>> Our setup - two client nodes, running RHEL-7.2 with kernel
>>>> 3.10.0-327.22.2.el7.x86_64. Both nodes run ownCloud (like
>>>> Dropbox), which mounts dCache storage over NFSv4.1. Some clients
>>>> are connected to node1, others to node2.
>>>>
>>>> From time to time we see some 'active' transfers on our DS
>>>> which do nothing. There is a corresponding state on the MDS.
>>>>
>>>> I have traced one such case:
>>>>
>>>> - node1 uploads the file.
>>>> - node2 reads the file a couple of times, OPEN+LAYOUTGET+CLOSE
>>>> - node2 sends OPEN+LAYOUTGET
>>>> - there is no open file on node2 which points to it.
>>>> - CLOSE is never sent to the server.
>>>> - node1 eventually removes the file
>>>>
>>>> We have many other cases where the file is not removed, but this one I was
>>>> able to trace. The link to the capture files:
>>>>
>>>> https://desycloud.desy.de/index.php/s/YldowcRzTGJeLbN
>>>>
>>>> We had ~10^6 transfers in the last 2 days and 29 files in such a state (~0.0029%).
>>>>
>>>> Tigran.
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
Thread overview: 10+ messages
2016-07-07 10:49 Lost CLOSE with NFSv4.1 on RHEL7 ( and bejond?) Mkrtchyan, Tigran
2016-07-12 17:16 ` Adamson, Andy
2016-07-13 16:49 ` Mkrtchyan, Tigran
2016-07-14 14:52 ` Olga Kornievskaia
2016-08-01 11:08 ` Mkrtchyan, Tigran [this message]
2016-08-01 21:22 ` Olga Kornievskaia
2016-08-04 15:04 ` Mkrtchyan, Tigran
2016-08-04 19:00 ` Olga Kornievskaia
2016-08-04 21:20 ` Olga Kornievskaia
2016-08-09 10:57 ` Mkrtchyan, Tigran