From: "J. Bruce Fields" <bfields@fieldses.org>
To: "Patrick J. LoPresti" <lopresti@gmail.com>
Cc: linux-nfs@vger.kernel.org,
	Trond Myklebust <trond.myklebust@fys.uio.no>,
	linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH] nfs: lookupcache coherence bugs in WCC update path (revised)
Date: Fri, 13 Aug 2010 08:36:35 -0400
Message-ID: <20100813123635.GA8945@fieldses.org>
In-Reply-To: <AANLkTi=OEkPuL+0E_VhPOn-U6neR=jJWunz-2DT755aB@mail.gmail.com>

On Thu, Aug 12, 2010 at 10:16:57PM -0700, Patrick J. LoPresti wrote:
> OK, I found the problem.  Whether it is a bug depends on your point of
> view, I suppose.
> 
> Although I am using XFS on my file server, and XFS has
> nanosecond-granularity timestamps, the true granularity of ctime/mtime
> is ultimately determined by the resolution of current_fs_time() which
> calls current_kernel_time(); i.e. jiffies; i.e. 1/HZ.
> 
> On my system (SLES 11 SP1), HZ is 250.  In my failing application, 4
> ms is long enough for many filesystem operations, even over NFS. (My
> network is 10GigE with a 300 microsecond round trip time, and my
> systems are very new.)
> 
> Anyway, I instrumented the VFS code on the NFS server to catch it in
> the act; specifically, I saw the following sequence:
> 
> file A is created on server, updating directory mtime
> NFS client does LOOKUP on file B, gets nfserr_noent
> file B created on server, does not update directory's mtime
> 
> ...all within 4 milliseconds (which is why the creation of file B did
> not update the directory's mtime).
> 
> The result is that the lookup cache on the client is stale and stays
> stale until some other client (or the server) updates the directory.
> Even making changes from the client does not invalidate the cache,
> thanks to the clever WCC logic that Trond had to explain to me
> earlier.
> 
> This is not exactly an NFS specific question, but I will ask anyway...
>  If I were to propose modifying current_fs_time() to call
> getnstimeofday() instead of current_kernel_time(), would the VFS folks
> laugh me out the door?

Good question.... Cc'ing linux-fsdevel.

If you have an easy reproducer, it might also be worth experimenting with
NFSv4 exports of an ext4 filesystem mounted with the i_version option.

Actually: should the NFSv4 server always be using i_version as the
change attribute for directories?  (Does every exportable filesystem
update it on every directory modification?)

(And if so, maybe on directories we should factor the i_version into the
low bits of the mtime reported to NFSv3 clients?)
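
A rough sketch of the race described above and of the two ideas floated
here, written as a toy user-space model rather than kernel code. The
names Directory, mixed_mtime_ns and negative_entry_looks_valid are made
up for illustration; the only figures taken from the report are HZ=250
(a 4 ms tick) and the A / LOOKUP B / B ordering. It assumes the client
keeps a cached negative lookup for as long as the directory's validator
is unchanged.

```python
from dataclasses import dataclass, field

TICK_NS = 4_000_000  # 1/HZ with HZ=250, as in the report


@dataclass
class Directory:
    """Toy server-side directory: tick-granular mtime plus a change counter."""
    entries: set = field(default_factory=set)
    mtime_ns: int = 0
    i_version: int = 0

    def create(self, name: str, now_ns: int) -> None:
        self.entries.add(name)
        # Coarse clock: the timestamp only advances once per tick.
        self.mtime_ns = now_ns - now_ns % TICK_NS
        # Counter: bumped on every modification, whatever the clock says.
        self.i_version += 1

    def mixed_mtime_ns(self) -> int:
        # Hypothetical: fold the counter into the sub-tick bits of mtime.
        return self.mtime_ns + self.i_version % TICK_NS


def negative_entry_looks_valid(cached, current) -> bool:
    """A cached ENOENT is trusted while the directory validator is unchanged."""
    return cached == current


d = Directory()
d.create("A", now_ns=8_500_000)        # file A created, directory mtime updated

# Client LOOKUP of B returns ENOENT; it remembers the directory validators.
cached_mtime = d.mtime_ns
cached_version = d.i_version
cached_mixed = d.mixed_mtime_ns()

d.create("B", now_ns=10_000_000)       # file B created within the same 4 ms tick

stale_by_mtime = negative_entry_looks_valid(cached_mtime, d.mtime_ns)
stale_by_version = negative_entry_looks_valid(cached_version, d.i_version)
stale_by_mixed = negative_entry_looks_valid(cached_mixed, d.mixed_mtime_ns())
print(stale_by_mtime, stale_by_version, stale_by_mixed)  # True False False
```

With only the tick-granular mtime the client still believes B does not
exist; with a per-modification counter, or with that counter folded into
the otherwise-unused low bits of mtime, the change is visible. The
modulo in mixed_mtime_ns means the mixed value could in principle alias
after TICK_NS modifications, which this sketch ignores.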

--b.

> 
> 1/HZ granularity for file timestamps just seems so...  90s or
> something.  4ms really is a lot of time these days.
> 
>  - Pat


Thread overview: 12+ messages
2010-08-11 23:02 [PATCH] nfs: lookupcache coherence bugs in WCC update path Patrick J. LoPresti
2010-08-12  5:26 ` [PATCH] nfs: lookupcache coherence bugs in WCC update path (revised) Patrick J. LoPresti
2010-08-12 15:49   ` Trond Myklebust
2010-08-12 17:13     ` Patrick J. LoPresti
2010-08-12 17:26       ` Trond Myklebust
2010-08-12 17:48         ` Patrick J. LoPresti
2010-08-12 18:50         ` Patrick J. LoPresti
2010-08-13  5:16           ` Patrick J. LoPresti
2010-08-13 12:36             ` J. Bruce Fields [this message]
2010-08-13 12:36               ` J. Bruce Fields
2010-08-13 18:30               ` Patrick J. LoPresti
2010-08-13 18:30                 ` Patrick J. LoPresti
