Re: client caching and locks

From: "bfields@fieldses.org" <bfields@fieldses.org>
To: Trond Myklebust <trondmy@hammerspace.com>
Cc: "inoguchi.yuki@fujitsu.com" <inoguchi.yuki@fujitsu.com>,
	"linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>
Subject: Re: client caching and locks
Date: Thu, 18 Jun 2020 16:09:05 -0400
Message-ID: <20200618200905.GA10313@fieldses.org>
In-Reply-To: <22b841f7a8979f19009c96f31a7be88dd177a47a.camel@hammerspace.com>

On Thu, Jun 18, 2020 at 02:29:42PM +0000, Trond Myklebust wrote:
> On Thu, 2020-06-18 at 09:54 +0000, inoguchi.yuki@fujitsu.com wrote:
> > > What does the client do to its cache when it writes to a locked
> > > range?
> > > 
> > > The RFC:
> > > 
> > > 	https://tools.ietf.org/html/rfc7530#section-10.3.2
> > > 
> > > seems to imply that you should get something like local-filesystem
> > > semantics if you write-lock any range that you write to and
> > > read-lock any range that you read from.
> > > 
> > > But I see a report that when applications write to non-overlapping
> > > ranges (while taking locks over those ranges), they don't see each
> > > other's updates.
> > > 
> > > I think for simultaneous non-overlapping writes to work that way, the
> > > client would need to invalidate its cache on unlock (except for the
> > > locked range).  But I can't tell what the client's designed to do.
> > 
> > Simultaneous non-overlapping WRITEs are not taken into consideration
> > in RFC7530.
> > I personally think it is not necessary to modify the kernel to handle
> > this case, because the application on the client can be implemented
> > to avoid it.
> > 
> > Serializing the simultaneous operations is one way to do that.
> > Just before the write operation, each client locks and reads a range
> > that overlaps the other clients' data, instead of locking only its
> > own non-overlapping range.
> > That way each client picks up the updates from the other clients.
> > 
> > Yuki Inoguchi
> > 
> > > --b.
> 
> See the function 'fs/nfs/file.c:do_setlk()'. We flush dirty file data
> both before and after taking the byte range lock. After taking the
> lock, we force a revalidation of the data before returning control to
> the application (unless there is a delegation that allows us to cache
> more aggressively).
> 
> In addition, if you look at fs/nfs/file.c:do_unlk() you'll note that we
> force a flush of all dirty file data before releasing the lock.
> 
> Finally, note that we turn off assumptions of close-to-open caching
> semantics when we detect that the application is using locking, and we
> turn off optimisations such as assuming we can extend writes to page
> boundaries when the page is marked as being up to date.
> 
> IOW: if all the clients are running Linux, then the thread that took
> the lock should see 100% up to date data in the locked range. I believe
> most (if not all) non-Linux clients use similar semantics when
> taking/releasing byte range locks, so they too should be fine.
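
As a reading aid, the ordering described above can be sketched as a toy
model.  This is not kernel code: the class and method names
(LockOrderClient, flush_dirty_data, revalidate_cache) are hypothetical
and only record the sequence of steps as stated in the quoted text.

```python
class LockOrderClient:
    """Toy model of the step ordering described above; not kernel code."""

    def __init__(self, has_delegation=False):
        self.has_delegation = has_delegation
        self.trace = []  # ordered log of the steps taken

    def flush_dirty_data(self):
        # Stand-in for writing back all dirty file data.
        self.trace.append("flush")

    def revalidate_cache(self):
        # Stand-in for forcing a revalidation of cached data.
        self.trace.append("revalidate")

    def set_lock(self):
        # Dirty data is flushed both before and after taking the lock.
        self.flush_dirty_data()
        self.trace.append("lock")
        self.flush_dirty_data()
        # Revalidate before returning to the application, unless a
        # delegation allows more aggressive caching.
        if not self.has_delegation:
            self.revalidate_cache()

    def unlock(self):
        # All dirty data is flushed before the lock is released.
        self.flush_dirty_data()
        self.trace.append("unlock")


client = LockOrderClient()
client.set_lock()
client.unlock()
print(client.trace)
```

The model says nothing about how the revalidation step decides whether
cached data is still good, which is the part the question below is
about.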

I probably don't understand the algorithm (in particular, how it
revalidates caches after a write).

How does it avoid a race like this?

Start with a file whose data is all 0's and whose change attribute is x:

        client 0                        client 1
        --------                        --------
        take write lock on byte 0
                                        take write lock on byte 1
        write 1 to offset 0
          change attribute now x+1
                                        write 1 to offset 1
                                          change attribute now x+2
        getattr returns x+2
                                        getattr returns x+2
        unlock
                                        unlock

        take read lock on byte 1

At this point a getattr will return change attribute x+2, the same as
was returned after client 0's write.  Does that mean client 0 assumes
the file data is unchanged since its last write?
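
The timeline above can be replayed in a small simulation.  This is a
sketch under one assumption: a client that revalidates purely by
comparing the change attribute it saw after its own write with the one
the server reports now.  Whether the real client does this is exactly
the open question, and every name here (RaceServer, RaceClient,
post_write_getattr, revalidate) is hypothetical.  The change attribute
starts at 0 in place of x.

```python
class RaceServer:
    """Toy server: a 2-byte file plus a change attribute."""

    def __init__(self, size=2):
        self.data = bytearray(size)  # file data, all 0's
        self.change = 0              # stands in for x

    def write(self, offset, value):
        self.data[offset] = value
        self.change += 1             # every write bumps the attribute

    def getattr(self):
        return self.change

    def read(self):
        return bytes(self.data)


class RaceClient:
    """Toy client that trusts its cache while the change attribute matches."""

    def __init__(self, server):
        self.server = server
        self.cache = bytearray(server.read())
        self.cached_change = server.getattr()

    def write(self, offset, value):
        self.cache[offset] = value
        self.server.write(offset, value)

    def post_write_getattr(self):
        # ASSUMPTION: the client records this value as describing its cache.
        self.cached_change = self.server.getattr()

    def revalidate(self):
        # ASSUMPTION: refetch only if the change attribute moved.
        if self.server.getattr() != self.cached_change:
            self.cache = bytearray(self.server.read())


server = RaceServer()
c0, c1 = RaceClient(server), RaceClient(server)
c0.write(0, 1)            # change attribute now x+1
c1.write(1, 1)            # change attribute now x+2
c0.post_write_getattr()   # returns x+2
c1.post_write_getattr()   # returns x+2
# Both unlock; client 0 then takes a read lock on byte 1 and revalidates.
c0.revalidate()
print(bytes(c0.cache), server.read())
```

Under that assumption client 0 keeps a 0 in its cache at offset 1 while
the server holds a 1 there, because the change attribute it compares
against already includes client 1's write.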

--b.
