* [PATCH] nfs: Always try and release an NFS file lock, even after receiving a SIGKILL [not found] <1136742914.14501494.1407251366067.JavaMail.zimbra@redhat.com> @ 2014-08-05 15:17 ` David Jeffery 2014-08-21 0:28 ` Trond Myklebust 0 siblings, 1 reply; 10+ messages in thread From: David Jeffery @ 2014-08-05 15:17 UTC (permalink / raw) To: linux-nfs If a task holding an NFS file lock is killed with SIGKILL, it can error out of do_unlk without ever trying to release the lock on the server. This triggers the WARN in locks_remove_file(), while also leaving the lock still claimed on the NFS server. The file's lock state is left out of sync between client and server, requiring a restart of one or the other in order to release this ghost lock on the file. do_unlk() should continue on and tell the server to release the lock, even if nfs_iocounter_wait() reports an error do to SIGKILL. Signed-off-by: David Jeffery <djeffery@redhat.com> --- file.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/fs/nfs/file.c b/fs/nfs/file.c index 4042ff5..1b09243 100644 --- a/fs/nfs/file.c +++ b/fs/nfs/file.c @@ -750,10 +750,8 @@ do_unlk(struct file *filp, int cmd, struct file_lock *fl, int is_local) l_ctx = nfs_get_lock_context(nfs_file_open_context(filp)); if (!IS_ERR(l_ctx)) { - status = nfs_iocounter_wait(&l_ctx->io_count); + nfs_iocounter_wait(&l_ctx->io_count); nfs_put_lock_context(l_ctx); - if (status < 0) - return status; } /* NOTE: special case ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH] nfs: Always try and release an NFS file lock, even after receiving a SIGKILL 2014-08-05 15:17 ` [PATCH] nfs: Always try and release an NFS file lock, even after receiving a SIGKILL David Jeffery @ 2014-08-21 0:28 ` Trond Myklebust 2014-08-21 15:40 ` David Jeffery 0 siblings, 1 reply; 10+ messages in thread From: Trond Myklebust @ 2014-08-21 0:28 UTC (permalink / raw) To: David Jeffery; +Cc: linux-nfs On Tue, 2014-08-05 at 11:17 -0400, David Jeffery wrote: > If a task holding an NFS file lock is killed with SIGKILL, it can > error out of do_unlk without ever trying to release the lock on > the server. This triggers the WARN in locks_remove_file(), while > also leaving the lock still claimed on the NFS server. The file's > lock state is left out of sync between client and server, requiring > a restart of one or the other in order to release this ghost lock > on the file. > > do_unlk() should continue on and tell the server to release the > lock, even if nfs_iocounter_wait() reports an error do to SIGKILL. > > Signed-off-by: David Jeffery <djeffery@redhat.com> > > --- > file.c | 4 +--- > 1 file changed, 1 insertion(+), 3 deletions(-) > > diff --git a/fs/nfs/file.c b/fs/nfs/file.c > index 4042ff5..1b09243 100644 > --- a/fs/nfs/file.c > +++ b/fs/nfs/file.c > @@ -750,10 +750,8 @@ do_unlk(struct file *filp, int cmd, struct file_lock *fl, int is_local) > > l_ctx = nfs_get_lock_context(nfs_file_open_context(filp)); > if (!IS_ERR(l_ctx)) { > - status = nfs_iocounter_wait(&l_ctx->io_count); > + nfs_iocounter_wait(&l_ctx->io_count); > nfs_put_lock_context(l_ctx); > - if (status < 0) > - return status; > } > > /* NOTE: special case What guarantees that this does not lead to silent corruption of the file if there are outstanding write requests? -- Trond Myklebust Linux NFS client maintainer, PrimaryData trond.myklebust@primarydata.com ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] nfs: Always try and release an NFS file lock, even after receiving a SIGKILL 2014-08-21 0:28 ` Trond Myklebust @ 2014-08-21 15:40 ` David Jeffery 2014-08-21 15:50 ` Trond Myklebust 0 siblings, 1 reply; 10+ messages in thread From: David Jeffery @ 2014-08-21 15:40 UTC (permalink / raw) To: Trond Myklebust; +Cc: linux-nfs On 08/20/2014 08:28 PM, Trond Myklebust wrote: > > What guarantees that this does not lead to silent corruption of the file > if there are outstanding write requests? > Do you have a particular scenario in mind you are concerned about? Right before the code the patch modifies, nfs_sync_mapping() is called. Any writes started before the unlock operation began have already been flushed, so we shouldn't have a corruption case of writes from before the unlock began being sent after the unlock is complete. Are you concerned about some other nfs4 writes being started while we initially waited on the counter? Such a write racing with the unlock going ahead instead of erroring out could initially fail from a wrong state ID, but should retry with the new state. Is there something I've overlooked? David Jeffery ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] nfs: Always try and release an NFS file lock, even after receiving a SIGKILL 2014-08-21 15:40 ` David Jeffery @ 2014-08-21 15:50 ` Trond Myklebust 2014-08-21 18:15 ` David Jeffery 0 siblings, 1 reply; 10+ messages in thread From: Trond Myklebust @ 2014-08-21 15:50 UTC (permalink / raw) To: David Jeffery; +Cc: Linux NFS Mailing List On Thu, Aug 21, 2014 at 11:40 AM, David Jeffery <djeffery@redhat.com> wrote: > On 08/20/2014 08:28 PM, Trond Myklebust wrote: >> >> What guarantees that this does not lead to silent corruption of the file >> if there are outstanding write requests? >> > > Do you have a particular scenario in mind you are concerned about? > > Right before the code the patch modifies, nfs_sync_mapping() is called. > Any writes started before the unlock operation began have already been > flushed, so we shouldn't have a corruption case of writes from before > the unlock began being sent after the unlock is complete. > > Are you concerned about some other nfs4 writes being started while we > initially waited on the counter? Such a write racing with the unlock No. I'm worried about the writes that have been started, but which are now completing in the background while the lock is being freed. > going ahead instead of erroring out could initially fail from a wrong > state ID, but should retry with the new state. Is there something I've > overlooked? Loss of lock atomicity due to the fact that the writes are completing after the lock was released. -- Trond Myklebust Linux NFS client maintainer, PrimaryData trond.myklebust@primarydata.com ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] nfs: Always try and release an NFS file lock, even after receiving a SIGKILL 2014-08-21 15:50 ` Trond Myklebust @ 2014-08-21 18:15 ` David Jeffery 2014-08-21 18:41 ` Trond Myklebust 0 siblings, 1 reply; 10+ messages in thread From: David Jeffery @ 2014-08-21 18:15 UTC (permalink / raw) To: Trond Myklebust; +Cc: Linux NFS Mailing List On 08/21/2014 11:50 AM, Trond Myklebust wrote: > On Thu, Aug 21, 2014 at 11:40 AM, David Jeffery <djeffery@redhat.com> wrote: >> On 08/20/2014 08:28 PM, Trond Myklebust wrote: >>> >>> What guarantees that this does not lead to silent corruption of the file >>> if there are outstanding write requests? >>> >> >> Do you have a particular scenario in mind you are concerned about? >> >> Right before the code the patch modifies, nfs_sync_mapping() is called. >> Any writes started before the unlock operation began have already been >> flushed, so we shouldn't have a corruption case of writes from before >> the unlock began being sent after the unlock is complete. >> >> Are you concerned about some other nfs4 writes being started while we >> initially waited on the counter? Such a write racing with the unlock > > No. I'm worried about the writes that have been started, but which are > now completing in the background while the lock is being freed. > >> going ahead instead of erroring out could initially fail from a wrong >> state ID, but should retry with the new state. Is there something I've >> overlooked? > > Loss of lock atomicity due to the fact that the writes are completing > after the lock was released. > I don't think my patches break lock atomicity, unless I've completely misunderstood nfs_sync_mapping()'s behavior. The background writes should have already been waited for by nfs_sync_mapping() at the beginning of do_unlk(). nfs_sync_mapping() would call nfs_wb_all() which calls sync_inode() with WB_SYNC_ALL, so it's going to push out any dirty data and wait for any writes to complete. Only after the writes are complete do we go on to call nfs_iocounter_wait(). If there is a write causing us to want to wait in nfs_iocounter_wait() and which my patches no longer wait for, the write was done after nfs_sync_mapping() started, which means the write occurred after the unlock-initiating call began. Such a write should have no atomicity guarantee with holding the lock, and may complete before or after the lock is released. The writes which require atomicity are already completed before getting to the call of nfs_iocounter_wait() and the code my patch changes. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] nfs: Always try and release an NFS file lock, even after receiving a SIGKILL 2014-08-21 18:15 ` David Jeffery @ 2014-08-21 18:41 ` Trond Myklebust 2014-08-29 15:34 ` Benjamin Coddington 0 siblings, 1 reply; 10+ messages in thread From: Trond Myklebust @ 2014-08-21 18:41 UTC (permalink / raw) To: David Jeffery; +Cc: Linux NFS Mailing List On Thu, Aug 21, 2014 at 2:15 PM, David Jeffery <djeffery@redhat.com> wrote: > On 08/21/2014 11:50 AM, Trond Myklebust wrote: >> On Thu, Aug 21, 2014 at 11:40 AM, David Jeffery <djeffery@redhat.com> wrote: >>> On 08/20/2014 08:28 PM, Trond Myklebust wrote: >>>> >>>> What guarantees that this does not lead to silent corruption of the file >>>> if there are outstanding write requests? >>>> >>> >>> Do you have a particular scenario in mind you are concerned about? >>> >>> Right before the code the patch modifies, nfs_sync_mapping() is called. >>> Any writes started before the unlock operation began have already been >>> flushed, so we shouldn't have a corruption case of writes from before >>> the unlock began being sent after the unlock is complete. >>> >>> Are you concerned about some other nfs4 writes being started while we >>> initially waited on the counter? Such a write racing with the unlock >> >> No. I'm worried about the writes that have been started, but which are >> now completing in the background while the lock is being freed. >> >>> going ahead instead of erroring out could initially fail from a wrong >>> state ID, but should retry with the new state. Is there something I've >>> overlooked? >> >> Loss of lock atomicity due to the fact that the writes are completing >> after the lock was released. >> > > I don't think my patches break lock atomicity, unless I've completely > misunderstood nfs_sync_mapping()'s behavior. > > The background writes should have already been waited for by > nfs_sync_mapping() at the beginning of do_unlk(). nfs_sync_mapping() > would call nfs_wb_all() which calls sync_inode() with WB_SYNC_ALL, so > it's going to push out any dirty data and wait for any writes to > complete. Only after the writes are complete do we go on to call > nfs_iocounter_wait(). What if nfs_wb_all() is interrupted? Currently, that doesn't matter precisely because of the call to nfs_iocounter_wait. > If there is a write causing us to want to wait in nfs_iocounter_wait() > and which my patches no longer wait for, the write was done after > nfs_sync_mapping() started, which means the write occurred after the > unlock-initiating call began. Such a write should have no atomicity > guarantee with holding the lock, and may complete before or after the > lock is released. The writes which require atomicity are already > completed before getting to the call of nfs_iocounter_wait() and the > code my patch changes. See above. -- Trond Myklebust Linux NFS client maintainer, PrimaryData trond.myklebust@primarydata.com ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] nfs: Always try and release an NFS file lock, even after receiving a SIGKILL 2014-08-21 18:41 ` Trond Myklebust @ 2014-08-29 15:34 ` Benjamin Coddington 2014-08-29 15:51 ` Trond Myklebust 0 siblings, 1 reply; 10+ messages in thread From: Benjamin Coddington @ 2014-08-29 15:34 UTC (permalink / raw) To: Trond Myklebust; +Cc: David Jeffery, Linux NFS Mailing List On Thu, 21 Aug 2014, Trond Myklebust wrote: > On Thu, Aug 21, 2014 at 2:15 PM, David Jeffery <djeffery@redhat.com> wrote: >> On 08/21/2014 11:50 AM, Trond Myklebust wrote: >>> On Thu, Aug 21, 2014 at 11:40 AM, David Jeffery <djeffery@redhat.com> wrote: >>>> On 08/20/2014 08:28 PM, Trond Myklebust wrote: >>>>> >>>>> What guarantees that this does not lead to silent corruption of the file >>>>> if there are outstanding write requests? >>>>> >>>> >>>> Do you have a particular scenario in mind you are concerned about? >>>> >>>> Right before the code the patch modifies, nfs_sync_mapping() is called. >>>> Any writes started before the unlock operation began have already been >>>> flushed, so we shouldn't have a corruption case of writes from before >>>> the unlock began being sent after the unlock is complete. >>>> >>>> Are you concerned about some other nfs4 writes being started while we >>>> initially waited on the counter? Such a write racing with the unlock >>> >>> No. I'm worried about the writes that have been started, but which are >>> now completing in the background while the lock is being freed. >>> >>>> going ahead instead of erroring out could initially fail from a wrong >>>> state ID, but should retry with the new state. Is there something I've >>>> overlooked? >>> >>> Loss of lock atomicity due to the fact that the writes are completing >>> after the lock was released. >>> >> >> I don't think my patches break lock atomicity, unless I've completely >> misunderstood nfs_sync_mapping()'s behavior. >> >> The background writes should have already been waited for by >> nfs_sync_mapping() at the beginning of do_unlk(). nfs_sync_mapping() >> would call nfs_wb_all() which calls sync_inode() with WB_SYNC_ALL, so >> it's going to push out any dirty data and wait for any writes to >> complete. Only after the writes are complete do we go on to call >> nfs_iocounter_wait(). > > What if nfs_wb_all() is interrupted? Currently, that doesn't matter > precisely because of the call to nfs_iocounter_wait. Hi Trond, it looks like the intent of: 577b423 NFS: Add functionality to allow waiting on all outstanding reads.. 7a8203d NFS: Ensure that NFS file unlock waits for readahead to complete.. was to wait for readahead, not to avoid releasing the lock if nfs_wb_all() is interrupted. What was the underlying reason for waiting for readahead to complete? I know there was one issue where the NFS server would release the lockowner before processing the reads - was that what these were trying to avoid? I ask because if we're only worried about outstanding writes we could just wait on the writes, not the reads. Would it be better to use TASK_UNINTERRUPTIBLE instead of TASK_KILLABLE in __nfs_iocounter_wait? Ben ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] nfs: Always try and release an NFS file lock, even after receiving a SIGKILL 2014-08-29 15:34 ` Benjamin Coddington @ 2014-08-29 15:51 ` Trond Myklebust 2014-08-29 16:31 ` Benjamin Coddington 0 siblings, 1 reply; 10+ messages in thread From: Trond Myklebust @ 2014-08-29 15:51 UTC (permalink / raw) To: Benjamin Coddington; +Cc: David Jeffery, Linux NFS Mailing List On Fri, Aug 29, 2014 at 11:34 AM, Benjamin Coddington <bcodding@redhat.com> wrote: > > On Thu, 21 Aug 2014, Trond Myklebust wrote: >> >> On Thu, Aug 21, 2014 at 2:15 PM, David Jeffery <djeffery@redhat.com> >> wrote: >>> >>> On 08/21/2014 11:50 AM, Trond Myklebust wrote: >>>> >>>> On Thu, Aug 21, 2014 at 11:40 AM, David Jeffery <djeffery@redhat.com> >>>> wrote: >>>>> >>>>> On 08/20/2014 08:28 PM, Trond Myklebust wrote: >>>>>> >>>>>> >>>>>> What guarantees that this does not lead to silent corruption of the >>>>>> file >>>>>> if there are outstanding write requests? >>>>>> >>>>> >>>>> Do you have a particular scenario in mind you are concerned about? >>>>> >>>>> Right before the code the patch modifies, nfs_sync_mapping() is called. >>>>> Any writes started before the unlock operation began have already been >>>>> flushed, so we shouldn't have a corruption case of writes from before >>>>> the unlock began being sent after the unlock is complete. >>>>> >>>>> Are you concerned about some other nfs4 writes being started while we >>>>> initially waited on the counter? Such a write racing with the unlock >>>> >>>> >>>> No. I'm worried about the writes that have been started, but which are >>>> now completing in the background while the lock is being freed. >>>> >>>>> going ahead instead of erroring out could initially fail from a wrong >>>>> state ID, but should retry with the new state. Is there something I've >>>>> overlooked? >>>> >>>> >>>> Loss of lock atomicity due to the fact that the writes are completing >>>> after the lock was released. >>>> >>> >>> I don't think my patches break lock atomicity, unless I've completely >>> misunderstood nfs_sync_mapping()'s behavior. >>> >>> The background writes should have already been waited for by >>> nfs_sync_mapping() at the beginning of do_unlk(). nfs_sync_mapping() >>> would call nfs_wb_all() which calls sync_inode() with WB_SYNC_ALL, so >>> it's going to push out any dirty data and wait for any writes to >>> complete. Only after the writes are complete do we go on to call >>> nfs_iocounter_wait(). >> >> >> What if nfs_wb_all() is interrupted? Currently, that doesn't matter >> precisely because of the call to nfs_iocounter_wait. > > > Hi Trond, it looks like the intent of: > > 577b423 NFS: Add functionality to allow waiting on all outstanding reads.. > 7a8203d NFS: Ensure that NFS file unlock waits for readahead to complete.. > > was to wait for readahead, not to avoid releasing the lock if nfs_wb_all() > is interrupted. What was the underlying reason for waiting for readahead to > complete? I know there was one issue where the NFS server would release the > lockowner before processing the reads - was that what these were trying to > avoid? I ask because if we're only worried about outstanding writes we > could just wait on the writes, not the reads. > > Would it be better to use TASK_UNINTERRUPTIBLE instead of TASK_KILLABLE in > __nfs_iocounter_wait? > > Ben Hi Ben, The main intent of the two patches was to ensure that all I/O that uses the lock stateid should have completed before we attempt to release the lock. The main reason is to avoid an unnecessary "recovery" situation where the server asks us to resend the I/O because the lock stateid is no longer valid. Concerning writes; if there are any outstanding then we must ensure those complete before we unlock the file. This rule should be unaffected by the above 2 patches (although it is just a bonus if they help to enforce it). Cheers Trond -- Trond Myklebust Linux NFS client maintainer, PrimaryData trond.myklebust@primarydata.com ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] nfs: Always try and release an NFS file lock, even after receiving a SIGKILL 2014-08-29 15:51 ` Trond Myklebust @ 2014-08-29 16:31 ` Benjamin Coddington 2014-08-29 19:02 ` Trond Myklebust 0 siblings, 1 reply; 10+ messages in thread From: Benjamin Coddington @ 2014-08-29 16:31 UTC (permalink / raw) To: Trond Myklebust; +Cc: David Jeffery, Linux NFS Mailing List On Fri, 29 Aug 2014, Trond Myklebust wrote: > On Fri, Aug 29, 2014 at 11:34 AM, Benjamin Coddington > <bcodding@redhat.com> wrote: >> >> On Thu, 21 Aug 2014, Trond Myklebust wrote: >>> >>> On Thu, Aug 21, 2014 at 2:15 PM, David Jeffery <djeffery@redhat.com> >>> wrote: >>>> >>>> On 08/21/2014 11:50 AM, Trond Myklebust wrote: >>>>> >>>>> On Thu, Aug 21, 2014 at 11:40 AM, David Jeffery <djeffery@redhat.com> >>>>> wrote: >>>>>> >>>>>> On 08/20/2014 08:28 PM, Trond Myklebust wrote: >>>>>>> >>>>>>> >>>>>>> What guarantees that this does not lead to silent corruption of the >>>>>>> file >>>>>>> if there are outstanding write requests? >>>>>>> >>>>>> >>>>>> Do you have a particular scenario in mind you are concerned about? >>>>>> >>>>>> Right before the code the patch modifies, nfs_sync_mapping() is called. >>>>>> Any writes started before the unlock operation began have already been >>>>>> flushed, so we shouldn't have a corruption case of writes from before >>>>>> the unlock began being sent after the unlock is complete. >>>>>> >>>>>> Are you concerned about some other nfs4 writes being started while we >>>>>> initially waited on the counter? Such a write racing with the unlock >>>>> >>>>> >>>>> No. I'm worried about the writes that have been started, but which are >>>>> now completing in the background while the lock is being freed. >>>>> >>>>>> going ahead instead of erroring out could initially fail from a wrong >>>>>> state ID, but should retry with the new state. Is there something I've >>>>>> overlooked? >>>>> >>>>> >>>>> Loss of lock atomicity due to the fact that the writes are completing >>>>> after the lock was released. >>>>> >>>> >>>> I don't think my patches break lock atomicity, unless I've completely >>>> misunderstood nfs_sync_mapping()'s behavior. >>>> >>>> The background writes should have already been waited for by >>>> nfs_sync_mapping() at the beginning of do_unlk(). nfs_sync_mapping() >>>> would call nfs_wb_all() which calls sync_inode() with WB_SYNC_ALL, so >>>> it's going to push out any dirty data and wait for any writes to >>>> complete. Only after the writes are complete do we go on to call >>>> nfs_iocounter_wait(). >>> >>> >>> What if nfs_wb_all() is interrupted? Currently, that doesn't matter >>> precisely because of the call to nfs_iocounter_wait. >> >> >> Hi Trond, it looks like the intent of: >> >> 577b423 NFS: Add functionality to allow waiting on all outstanding reads.. >> 7a8203d NFS: Ensure that NFS file unlock waits for readahead to complete.. >> >> was to wait for readahead, not to avoid releasing the lock if nfs_wb_all() >> is interrupted. What was the underlying reason for waiting for readahead to >> complete? I know there was one issue where the NFS server would release the >> lockowner before processing the reads - was that what these were trying to >> avoid? I ask because if we're only worried about outstanding writes we >> could just wait on the writes, not the reads. >> >> Would it be better to use TASK_UNINTERRUPTIBLE instead of TASK_KILLABLE in >> __nfs_iocounter_wait? >> >> Ben > > Hi Ben, > > The main intent of the two patches was to ensure that all I/O that > uses the lock stateid should have completed before we attempt to > release the lock. The main reason is to avoid an unnecessary > "recovery" situation where the server asks us to resend the I/O > because the lock stateid is no longer valid. > > Concerning writes; if there are any outstanding then we must ensure > those complete before we unlock the file. This rule should be > unaffected by the above 2 patches (although it is just a bonus if they > help to enforce it). If so, then David's patch shouldn't risk corruption any more than before 577b423 and 7a8203d were applied. He's trying to avoid the problem (I think) where very aggressive readahead behavior blocks the unlocking thread and a sysadmin decides to kill the process, which results in the abandoned lock. That can make a big mess. Instead, his change would result in the "recovery" scenario at worst when the impatient sysadmin goes on a rampage. Ben ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] nfs: Always try and release an NFS file lock, even after receiving a SIGKILL 2014-08-29 16:31 ` Benjamin Coddington @ 2014-08-29 19:02 ` Trond Myklebust 0 siblings, 0 replies; 10+ messages in thread From: Trond Myklebust @ 2014-08-29 19:02 UTC (permalink / raw) To: Benjamin Coddington; +Cc: David Jeffery, Linux NFS Mailing List On Fri, Aug 29, 2014 at 12:31 PM, Benjamin Coddington <bcodding@redhat.com> wrote: > On Fri, 29 Aug 2014, Trond Myklebust wrote: > >> On Fri, Aug 29, 2014 at 11:34 AM, Benjamin Coddington >> <bcodding@redhat.com> wrote: >>> >>> >>> On Thu, 21 Aug 2014, Trond Myklebust wrote: >>>> >>>> >>>> On Thu, Aug 21, 2014 at 2:15 PM, David Jeffery <djeffery@redhat.com> >>>> wrote: >>>>> >>>>> >>>>> On 08/21/2014 11:50 AM, Trond Myklebust wrote: >>>>>> >>>>>> >>>>>> On Thu, Aug 21, 2014 at 11:40 AM, David Jeffery <djeffery@redhat.com> >>>>>> wrote: >>>>>>> >>>>>>> >>>>>>> On 08/20/2014 08:28 PM, Trond Myklebust wrote: >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> What guarantees that this does not lead to silent corruption of the >>>>>>>> file >>>>>>>> if there are outstanding write requests? >>>>>>>> >>>>>>> >>>>>>> Do you have a particular scenario in mind you are concerned about? >>>>>>> >>>>>>> Right before the code the patch modifies, nfs_sync_mapping() is >>>>>>> called. >>>>>>> Any writes started before the unlock operation began have already >>>>>>> been >>>>>>> flushed, so we shouldn't have a corruption case of writes from before >>>>>>> the unlock began being sent after the unlock is complete. >>>>>>> >>>>>>> Are you concerned about some other nfs4 writes being started while we >>>>>>> initially waited on the counter? Such a write racing with the unlock >>>>>> >>>>>> >>>>>> >>>>>> No. I'm worried about the writes that have been started, but which are >>>>>> now completing in the background while the lock is being freed. >>>>>> >>>>>>> going ahead instead of erroring out could initially fail from a wrong >>>>>>> state ID, but should retry with the new state. Is there something >>>>>>> I've >>>>>>> overlooked? >>>>>> >>>>>> >>>>>> >>>>>> Loss of lock atomicity due to the fact that the writes are completing >>>>>> after the lock was released. >>>>>> >>>>> >>>>> I don't think my patches break lock atomicity, unless I've completely >>>>> misunderstood nfs_sync_mapping()'s behavior. >>>>> >>>>> The background writes should have already been waited for by >>>>> nfs_sync_mapping() at the beginning of do_unlk(). nfs_sync_mapping() >>>>> would call nfs_wb_all() which calls sync_inode() with WB_SYNC_ALL, so >>>>> it's going to push out any dirty data and wait for any writes to >>>>> complete. Only after the writes are complete do we go on to call >>>>> nfs_iocounter_wait(). >>>> >>>> >>>> >>>> What if nfs_wb_all() is interrupted? Currently, that doesn't matter >>>> precisely because of the call to nfs_iocounter_wait. >>> >>> >>> >>> Hi Trond, it looks like the intent of: >>> >>> 577b423 NFS: Add functionality to allow waiting on all outstanding >>> reads.. >>> 7a8203d NFS: Ensure that NFS file unlock waits for readahead to >>> complete.. >>> >>> was to wait for readahead, not to avoid releasing the lock if >>> nfs_wb_all() >>> is interrupted. What was the underlying reason for waiting for readahead >>> to >>> complete? I know there was one issue where the NFS server would release >>> the >>> lockowner before processing the reads - was that what these were trying >>> to >>> avoid? I ask because if we're only worried about outstanding writes we >>> could just wait on the writes, not the reads. >>> >>> Would it be better to use TASK_UNINTERRUPTIBLE instead of TASK_KILLABLE >>> in >>> __nfs_iocounter_wait? >>> >>> Ben >> >> >> Hi Ben, >> >> The main intent of the two patches was to ensure that all I/O that >> uses the lock stateid should have completed before we attempt to >> release the lock. The main reason is to avoid an unnecessary >> "recovery" situation where the server asks us to resend the I/O >> because the lock stateid is no longer valid. >> >> Concerning writes; if there are any outstanding then we must ensure >> those complete before we unlock the file. This rule should be >> unaffected by the above 2 patches (although it is just a bonus if they >> help to enforce it). > > > If so, then David's patch shouldn't risk corruption any more than before > 577b423 and 7a8203d were applied. > > He's trying to avoid the problem (I think) where very aggressive readahead > behavior blocks the unlocking thread and a sysadmin decides to kill the > process, which results in the abandoned lock. That can make a big mess. > Instead, his change would result in the "recovery" scenario at worst when > the impatient sysadmin goes on a rampage. > The patch is trying to force the lock to be cleared before the I/O is done. That's wrong, whether or not the code was broken before. I'm not applying. Cheers Trond -- Trond Myklebust Linux NFS client maintainer, PrimaryData trond.myklebust@primarydata.com ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2014-08-29 19:02 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <1136742914.14501494.1407251366067.JavaMail.zimbra@redhat.com>
2014-08-05 15:17 ` [PATCH] nfs: Always try and release an NFS file lock, even after receiving a SIGKILL David Jeffery
2014-08-21 0:28 ` Trond Myklebust
2014-08-21 15:40 ` David Jeffery
2014-08-21 15:50 ` Trond Myklebust
2014-08-21 18:15 ` David Jeffery
2014-08-21 18:41 ` Trond Myklebust
2014-08-29 15:34 ` Benjamin Coddington
2014-08-29 15:51 ` Trond Myklebust
2014-08-29 16:31 ` Benjamin Coddington
2014-08-29 19:02 ` Trond Myklebust
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.