All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mi Jinlong <mijinlong@cn.fujitsu.com>
To: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>,
	Jeff Layton <jlayton@redhat.com>,
	NFSv3 list <linux-nfs@vger.kernel.org>,
	linux-fsdevel@vger.kernel.org, ebiederm@xmission.com,
	adobriyan@gmail.com, viro@ZenIV.linux.org.uk,
	jamie@shareable.org
Subject: Re: [PATCH] VFS: Unlink should revoke all outstanding leases on	file
Date: Tue, 25 May 2010 18:14:56 +0800	[thread overview]
Message-ID: <4BFBA320.8090102@cn.fujitsu.com> (raw)
In-Reply-To: <20100521210738.GK11675@fieldses.org>



J. Bruce Fields :
> On Thu, May 20, 2010 at 05:46:06PM +0800, Mi Jinlong wrote:
>> J. Bruce Fields :
>>> I don't know of any existing lock that does exactly what we want.
>>>
>>> Somebody at citi worked on a better lease implementation for a while,
>>> but I don't think we ever really got it right; the last version I can
>>> find is here:
>>>
>>> 	git://linux-nfs.org/~bfields/linux-topics.git leases
>>   When reading the code of the git, i found a patch which try to fix
>>   the lease's problem, but only for unlink.
> 
> It's 8 patches together:
> 
>>   commit id: d5a08e556116c66fb60a448f805a40bf54314634
>>         msg: "leases: break file leases on unlink."
>>
>>   In this patch, it seems only add break_lease() and some other functions
>>   but seems don't avoid the problem of race.
> 
> Look again: break_lease() is there, but there's also a break_lease_end()
> after the unlink.

  Thanks. That's important.

> 
>> Or there is some different 
>>   at break_lease() with the community's kernel.
>>
>>   Can you give me some message about the new lease? Thanks.
> 
> So the 8 patches at that branch are:
> 	leases: introduce per-inode lease enabling/disabling
> 	VFS: clean up extra dereferences in do_unlinkat()
> 	leases: break file leases on unlink.
> 	leases: break file leases on rename.
> 	leases: break leases on chown.
> 	VFS: refactor sys_fchmod and sys_fchmodat
> 	leases: break leases on chmod.
> 	leases: break leases on link.
> 
> Like I say, I don't think they're correct or I'd just copy them all to
> the list.  But maybe the comment on the first patch (appended) is
> useful.

  As reading the first patch ("leases: introduce ... "), it's really useful.
  But, as Jamie Lokier said "new lease semantics should use new userspace API.",
  is there some new lease under development ?

  And, IMO, this problem we discussing is serious that should be fix ASAP.
  Can we fix this problem refer to this solutions? 

thanks, 
Mi Jinlong

> 
> --b.
> 
> leases: introduce per-inode lease enabling/disabling
> 
> The current file lease implementation is inadequate (for the purposes of
> nfs, and, we believe, for the purposes of Samba), in at least two ways:
> 
> 	- Leases are broken only conflicting opens; but both nfsv4
> 	  delegations and (we're told) Windows op locks actually require
> 	  that leases be broken on any operation that changes file
> 	  metadata--including unlink, link, rename, chmod, and chown.
> 
> 	- The internal kernel api used for lease-breaking is inherently
> 	  racy, consisting as it does of a single break_lease() call.
> 	  (Consider this scenario: a file is not currently open and is
> 	  about to be unlinked.  During unlink processing, a lookup is
> 	  done, and break_lease() is called.  After the break_lease(),
> 	  but before the unlink completes, another user opens the file
> 	  and gets a read lease.  The unlink then completes, but the
> 	  other user thinks their read lease is still valid.  This
> 	  situation would be avoided if lease-granting for the inode
> 	  were disabled for the duration of the unlink.)
> 
> We're primarily interested in the case of read leases for now.  (Write
> leases, which also must be broken on *access* to a file, are more
> difficult to get completely right, and aren't used by the current nfs
> server.)
> 
> Fixing the second problem requires replacing break_lease() by a pair of
> calls, here called break_lease() and break_lease_end(), between which
> new leases are temporarily prohibited.
> 
> We want to implement that temporary prohibition in a simple way that has
> low impact in common (uncontended) cases.
> 
> This patch adds a field, i_leasecount, which provides mutual exclusion
> between inode-modifying operations and read leases in the same way the
> i_writecount provides mutual exclusion between write opens and execs:
> when i_leasecount is positive, it counts the number of leases on the
> given inode, and when it's negative it counts the number of operations
> which want leases temporarily disabled.  This allows selective
> enabling/disabling of leases on a per-inode basis.
> 
> To that end, the functions leases_get_access() and leases_put_access()
> are used when a lease is granted and returned, respectively.  The
> functions leases_deny_access() and leases_allow_access() are used to
> prevent races between breaking-with-FMODE_WRITE and write-lease-granting
> for the entire duration of a file operation.  Currently, leases are
> broken only when a file is opened or truncated; these functions will
> allow leases to be broken on things like unlink and rename as well.
> NFSv4 implements delegations using leases, and needs its delegations to
> be revoked on unlinks, renames, chowns, etc.
> 
> Note that this patch changes break_lease() and __break_lease(), such
> that when they are called with FMODE_WRITE and return successfully, they
> will leave leases disabled on the inode in question, and the caller must
> eventually call break_lease_end() to re-enable leasing.  As alluded to
> in the scenario above, this behavior isn't necessary when breaking
> without FMODE_WRITE: existing and new read-leases wouldn't need to be
> revoked or blocked; and a write-lease-granting setlease won't race the
> break_lease() because the latter is presumed to have been preceded by
> something like a dget() on the dentry in question (where d_count or
> i_count > 1 blocks write-lease-granting).
> 
> This patch also closes a small existing open/lease race: a lease-related
> race exists between the time that outstanding leases are broken (by
> may_open()) and the time that, e.g., O_RDWR or O_WRONLY are reflected in
> the inode's i_writecount variable (which will prevent subsequent
> lease-granting setlease calls).  Conceivably, a read lease could be
> granted in the interim.
> 
> To deal with this, may_open() is modified so that, on success and when
> called with FMODE_WRITE, it will return with lease-granting disabled for
> the inode in question.  do_filp_open() is modified so that leasing is
> re-enabled once everything is finished.  Analogous changes are made on
> truncation.


WARNING: multiple messages have this Message-ID (diff)
From: Mi Jinlong <mijinlong-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
To: "J. Bruce Fields" <bfields-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>
Cc: Trond Myklebust
	<trond.myklebust-41N18TsMXrtuMpJDpNschA@public.gmane.org>,
	Jeff Layton <jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>,
	NFSv3 list <linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org,
	adobriyan-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org,
	viro-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn@public.gmane.org,
	jamie-yetKDKU6eevNLxjTenLetw@public.gmane.org
Subject: Re: [PATCH] VFS: Unlink should revoke all outstanding leases on	file
Date: Tue, 25 May 2010 18:14:56 +0800	[thread overview]
Message-ID: <4BFBA320.8090102@cn.fujitsu.com> (raw)
In-Reply-To: <20100521210738.GK11675-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>



J. Bruce Fields :
> On Thu, May 20, 2010 at 05:46:06PM +0800, Mi Jinlong wrote:
>> J. Bruce Fields :
>>> I don't know of any existing lock that does exactly what we want.
>>>
>>> Somebody at citi worked on a better lease implementation for a while,
>>> but I don't think we ever really got it right; the last version I can
>>> find is here:
>>>
>>> 	git://linux-nfs.org/~bfields/linux-topics.git leases
>>   When reading the code of the git, i found a patch which try to fix
>>   the lease's problem, but only for unlink.
> 
> It's 8 patches together:
> 
>>   commit id: d5a08e556116c66fb60a448f805a40bf54314634
>>         msg: "leases: break file leases on unlink."
>>
>>   In this patch, it seems only add break_lease() and some other functions
>>   but seems don't avoid the problem of race.
> 
> Look again: break_lease() is there, but there's also a break_lease_end()
> after the unlink.

  Thanks. That's important.

> 
>> Or there is some different 
>>   at break_lease() with the community's kernel.
>>
>>   Can you give me some message about the new lease? Thanks.
> 
> So the 8 patches at that branch are:
> 	leases: introduce per-inode lease enabling/disabling
> 	VFS: clean up extra dereferences in do_unlinkat()
> 	leases: break file leases on unlink.
> 	leases: break file leases on rename.
> 	leases: break leases on chown.
> 	VFS: refactor sys_fchmod and sys_fchmodat
> 	leases: break leases on chmod.
> 	leases: break leases on link.
> 
> Like I say, I don't think they're correct or I'd just copy them all to
> the list.  But maybe the comment on the first patch (appended) is
> useful.

  As reading the first patch ("leases: introduce ... "), it's really useful.
  But, as Jamie Lokier said "new lease semantics should use new userspace API.",
  is there some new lease under development ?

  And, IMO, this problem we discussing is serious that should be fix ASAP.
  Can we fix this problem refer to this solutions? 

thanks, 
Mi Jinlong

> 
> --b.
> 
> leases: introduce per-inode lease enabling/disabling
> 
> The current file lease implementation is inadequate (for the purposes of
> nfs, and, we believe, for the purposes of Samba), in at least two ways:
> 
> 	- Leases are broken only conflicting opens; but both nfsv4
> 	  delegations and (we're told) Windows op locks actually require
> 	  that leases be broken on any operation that changes file
> 	  metadata--including unlink, link, rename, chmod, and chown.
> 
> 	- The internal kernel api used for lease-breaking is inherently
> 	  racy, consisting as it does of a single break_lease() call.
> 	  (Consider this scenario: a file is not currently open and is
> 	  about to be unlinked.  During unlink processing, a lookup is
> 	  done, and break_lease() is called.  After the break_lease(),
> 	  but before the unlink completes, another user opens the file
> 	  and gets a read lease.  The unlink then completes, but the
> 	  other user thinks their read lease is still valid.  This
> 	  situation would be avoided if lease-granting for the inode
> 	  were disabled for the duration of the unlink.)
> 
> We're primarily interested in the case of read leases for now.  (Write
> leases, which also must be broken on *access* to a file, are more
> difficult to get completely right, and aren't used by the current nfs
> server.)
> 
> Fixing the second problem requires replacing break_lease() by a pair of
> calls, here called break_lease() and break_lease_end(), between which
> new leases are temporarily prohibited.
> 
> We want to implement that temporary prohibition in a simple way that has
> low impact in common (uncontended) cases.
> 
> This patch adds a field, i_leasecount, which provides mutual exclusion
> between inode-modifying operations and read leases in the same way the
> i_writecount provides mutual exclusion between write opens and execs:
> when i_leasecount is positive, it counts the number of leases on the
> given inode, and when it's negative it counts the number of operations
> which want leases temporarily disabled.  This allows selective
> enabling/disabling of leases on a per-inode basis.
> 
> To that end, the functions leases_get_access() and leases_put_access()
> are used when a lease is granted and returned, respectively.  The
> functions leases_deny_access() and leases_allow_access() are used to
> prevent races between breaking-with-FMODE_WRITE and write-lease-granting
> for the entire duration of a file operation.  Currently, leases are
> broken only when a file is opened or truncated; these functions will
> allow leases to be broken on things like unlink and rename as well.
> NFSv4 implements delegations using leases, and needs its delegations to
> be revoked on unlinks, renames, chowns, etc.
> 
> Note that this patch changes break_lease() and __break_lease(), such
> that when they are called with FMODE_WRITE and return successfully, they
> will leave leases disabled on the inode in question, and the caller must
> eventually call break_lease_end() to re-enable leasing.  As alluded to
> in the scenario above, this behavior isn't necessary when breaking
> without FMODE_WRITE: existing and new read-leases wouldn't need to be
> revoked or blocked; and a write-lease-granting setlease won't race the
> break_lease() because the latter is presumed to have been preceded by
> something like a dget() on the dentry in question (where d_count or
> i_count > 1 blocks write-lease-granting).
> 
> This patch also closes a small existing open/lease race: a lease-related
> race exists between the time that outstanding leases are broken (by
> may_open()) and the time that, e.g., O_RDWR or O_WRONLY are reflected in
> the inode's i_writecount variable (which will prevent subsequent
> lease-granting setlease calls).  Conceivably, a read lease could be
> granted in the interim.
> 
> To deal with this, may_open() is modified so that, on success and when
> called with FMODE_WRITE, it will return with lease-granting disabled for
> the inode in question.  do_filp_open() is modified so that leasing is
> re-enabled once everything is finished.  Analogous changes are made on
> truncation.

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

  reply	other threads:[~2010-05-25 10:14 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-05-14  9:35 [PATCH] VFS: Unlink should revoke all outstanding leases on file Mi Jinlong
2010-05-14  9:35 ` Mi Jinlong
2010-05-14  9:58 ` Jeff Layton
2010-05-14  9:58   ` Jeff Layton
2010-05-14 17:17   ` Trond Myklebust
2010-05-14 17:38     ` Jeff Layton
2010-05-14 17:38       ` Jeff Layton
2010-05-14 17:46       ` Jamie Lokier
2010-05-14 17:46         ` Jamie Lokier
2010-05-14 18:16         ` Jeremy Allison
2010-05-19 14:06           ` J. Bruce Fields
2010-05-19 16:21             ` Jamie Lokier
2010-05-19 16:21               ` Jamie Lokier
2010-05-14 17:59       ` Trond Myklebust
2010-05-14 17:59         ` Trond Myklebust
2010-05-14 18:31         ` Trond Myklebust
2010-05-14 19:23           ` J. Bruce Fields
2010-05-14 19:23             ` J. Bruce Fields
2010-05-19  9:46             ` Mi Jinlong
2010-05-19  9:46               ` Mi Jinlong
2010-05-19 15:57               ` J. Bruce Fields
2010-05-20  9:46                 ` Mi Jinlong
2010-05-20  9:46                   ` Mi Jinlong
2010-05-21 21:07                   ` J. Bruce Fields
2010-05-21 21:07                     ` J. Bruce Fields
2010-05-25 10:14                     ` Mi Jinlong [this message]
2010-05-25 10:14                       ` Mi Jinlong
2010-05-19 16:14             ` Jamie Lokier
2010-05-20  2:21               ` J. Bruce Fields
2010-05-20  2:21                 ` J. Bruce Fields
     [not found]   ` <20100514055844.109d2fdc-9yPaYZwiELC+kQycOl6kW4xkIHaj4LzF@public.gmane.org>
2010-05-19  9:49     ` Mi Jinlong
2010-05-19  9:49       ` Mi Jinlong
2010-05-19 16:03       ` J. Bruce Fields
2010-05-20  9:23         ` Mi Jinlong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4BFBA320.8090102@cn.fujitsu.com \
    --to=mijinlong@cn.fujitsu.com \
    --cc=adobriyan@gmail.com \
    --cc=bfields@fieldses.org \
    --cc=ebiederm@xmission.com \
    --cc=jamie@shareable.org \
    --cc=jlayton@redhat.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-nfs@vger.kernel.org \
    --cc=trond.myklebust@fys.uio.no \
    --cc=viro@ZenIV.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.