linux-nfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] pnfs: call layoutcommit after flushing inode's data to disk.
@ 2010-05-20  3:28 Tao Guo
  2010-05-20  4:42 ` Tao Guo
  0 siblings, 1 reply; 6+ messages in thread
From: Tao Guo @ 2010-05-20  3:28 UTC (permalink / raw)
  To: linux-nfs; +Cc: Benny Halevy, J. Bruce Fields

Signed-off-by: Tao Guo <guotao-U4AKAne5IzAR5TUyvShJeg@public.gmane.org>
---
 fs/nfs/write.c |    8 +++++++-
 1 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index d3e1645..b4f48b2 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -1558,6 +1558,7 @@ int nfs_write_inode(struct inode *inode, struct writeback_control *wbc)
  */
 int nfs_wb_all(struct inode *inode)
 {
+	int ret;
 	struct writeback_control wbc = {
 		.sync_mode = WB_SYNC_ALL,
 		.nr_to_write = LONG_MAX,
@@ -1565,7 +1566,12 @@ int nfs_wb_all(struct inode *inode)
 		.range_end = LLONG_MAX,
 	};
 
-	return sync_inode(inode, &wbc);
+	ret = sync_inode(inode, &wbc);
+#ifdef CONFIG_NFS_V4_1
+	if (!ret && NFS_I(inode)->layoutcommit_ctx)
+		ret = pnfs_layoutcommit_inode(inode, 1);
+#endif
+	return ret;
 }
 
 int nfs_wb_page_cancel(struct inode *inode, struct page *page)
-- 
1.6.3.3


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] pnfs: call layoutcommit after flushing inode's data to disk.
  2010-05-20  3:28 Tao Guo
@ 2010-05-20  4:42 ` Tao Guo
       [not found]   ` <AANLkTikc6JXz1ubKJHxs0UJq-x80Mafsz4SKgkBPZgnf-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: Tao Guo @ 2010-05-20  4:42 UTC (permalink / raw)
  To: linux-nfs; +Cc: Benny Halevy, J. Bruce Fields

This is for a bug introduced to 2.6.34. In 2.6.32 and 2.6.33 we call
layoutcommit in
nfs_sync_mapping_wait(), but in 2.6.34 we use sync_inode() to sync
inode's data, so
the layoutcommit code is gone.

BTW: In current code, layoutcommit_ctx will increase refcount of
nfs_inode's ctx, so if
layoutcommit_ctx is not NULL, we could not reach nfs4_close_context
=2E.. --> __nfs_close().
So pnfs_layoutcommit_inode() in __nfs_close() will not be called in
whatever situation.
Why we have to use nfs_inode's ctx as layoutcommit_ctx, since we only
need its rpc_creds
actually?

On Thu, May 20, 2010 at 11:28 AM, Tao Guo <guotao-U4AKAne5IzAR5TUyvShJeg@public.gmane.org> wrote:
> Signed-off-by: Tao Guo <guotao-U4AKAne5IzAR5TUyvShJeg@public.gmane.org>
> ---
> =C2=A0fs/nfs/write.c | =C2=A0 =C2=A08 +++++++-
> =C2=A01 files changed, 7 insertions(+), 1 deletions(-)
>
> diff --git a/fs/nfs/write.c b/fs/nfs/write.c
> index d3e1645..b4f48b2 100644
> --- a/fs/nfs/write.c
> +++ b/fs/nfs/write.c
> @@ -1558,6 +1558,7 @@ int nfs_write_inode(struct inode *inode, struct=
 writeback_control *wbc)
> =C2=A0*/
> =C2=A0int nfs_wb_all(struct inode *inode)
> =C2=A0{
> + =C2=A0 =C2=A0 =C2=A0 int ret;
> =C2=A0 =C2=A0 =C2=A0 =C2=A0struct writeback_control wbc =3D {
> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0.sync_mode =3D=
 WB_SYNC_ALL,
> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0.nr_to_write =3D=
 LONG_MAX,
> @@ -1565,7 +1566,12 @@ int nfs_wb_all(struct inode *inode)
> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0.range_end =3D=
 LLONG_MAX,
> =C2=A0 =C2=A0 =C2=A0 =C2=A0};
>
> - =C2=A0 =C2=A0 =C2=A0 return sync_inode(inode, &wbc);
> + =C2=A0 =C2=A0 =C2=A0 ret =3D sync_inode(inode, &wbc);
> +#ifdef CONFIG_NFS_V4_1
> + =C2=A0 =C2=A0 =C2=A0 if (!ret && NFS_I(inode)->layoutcommit_ctx)
> + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 ret =3D pnfs_layou=
tcommit_inode(inode, 1);
> +#endif
> + =C2=A0 =C2=A0 =C2=A0 return ret;
> =C2=A0}
>
> =C2=A0int nfs_wb_page_cancel(struct inode *inode, struct page *page)
> --
> 1.6.3.3
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" =
in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at =C2=A0http://vger.kernel.org/majordomo-info.ht=
ml
>



--=20
tao.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] pnfs: call layoutcommit after flushing inode's data to  disk.
       [not found]   ` <AANLkTikc6JXz1ubKJHxs0UJq-x80Mafsz4SKgkBPZgnf-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2010-05-20 14:17     ` Andy Adamson
  0 siblings, 0 replies; 6+ messages in thread
From: Andy Adamson @ 2010-05-20 14:17 UTC (permalink / raw)
  To: Tao Guo; +Cc: linux-nfs, Benny Halevy, J. Bruce Fields


On May 20, 2010, at 12:42 AM, Tao Guo wrote:

> This is for a bug introduced to 2.6.34. In 2.6.32 and 2.6.33 we call
> layoutcommit in
> nfs_sync_mapping_wait(), but in 2.6.34 we use sync_inode() to sync
> inode's data, so
> the layoutcommit code is gone.
>
> BTW: In current code, layoutcommit_ctx will increase refcount of
> nfs_inode's ctx, so if
> layoutcommit_ctx is not NULL, we could not reach nfs4_close_context
> ... --> __nfs_close().
> So pnfs_layoutcommit_inode() in __nfs_close() will not be called in
> whatever situation.
> Why we have to use nfs_inode's ctx as layoutcommit_ctx, since we only
> need its rpc_creds
> actually?


I have a patch to be submitted today that replaces the  
nfs_open_context with an rpc_cred.

-->Andy

>
> On Thu, May 20, 2010 at 11:28 AM, Tao Guo <guotao-U4AKAne5IzAR5TUyvShJeg@public.gmane.org> wrote:
>> Signed-off-by: Tao Guo <guotao-U4AKAne5IzAR5TUyvShJeg@public.gmane.org>
>> ---
>>  fs/nfs/write.c |    8 +++++++-
>>  1 files changed, 7 insertions(+), 1 deletions(-)
>>
>> diff --git a/fs/nfs/write.c b/fs/nfs/write.c
>> index d3e1645..b4f48b2 100644
>> --- a/fs/nfs/write.c
>> +++ b/fs/nfs/write.c
>> @@ -1558,6 +1558,7 @@ int nfs_write_inode(struct inode *inode,  
>> struct writeback_control *wbc)
>>  */
>>  int nfs_wb_all(struct inode *inode)
>>  {
>> +       int ret;
>>        struct writeback_control wbc = {
>>                .sync_mode = WB_SYNC_ALL,
>>                .nr_to_write = LONG_MAX,
>> @@ -1565,7 +1566,12 @@ int nfs_wb_all(struct inode *inode)
>>                .range_end = LLONG_MAX,
>>        };
>>
>> -       return sync_inode(inode, &wbc);
>> +       ret = sync_inode(inode, &wbc);
>> +#ifdef CONFIG_NFS_V4_1
>> +       if (!ret && NFS_I(inode)->layoutcommit_ctx)
>> +               ret = pnfs_layoutcommit_inode(inode, 1);
>> +#endif
>> +       return ret;
>>  }
>>
>>  int nfs_wb_page_cancel(struct inode *inode, struct page *page)
>> --
>> 1.6.3.3
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux- 
>> nfs" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>
>
>
> -- 
> tao.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs"  
> in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH] pnfs: call layoutcommit after flushing inode's data to disk.
@ 2010-05-24 19:04 Boaz Harrosh
  2010-05-24 19:06 ` Boaz Harrosh
  0 siblings, 1 reply; 6+ messages in thread
From: Boaz Harrosh @ 2010-05-24 19:04 UTC (permalink / raw)
  To: Tao Guo, Andy Adamson, Benny Halevy, NFS list

From: Tao Guo <guotao@nrchpc.ac.cn>

This is for a bug introduced to 2.6.34. In 2.6.32 and 2.6.33 we call
layoutcommit in nfs_sync_mapping_wait(), but in 2.6.34 we use
sync_inode() to sync inode's data, so the layoutcommit code is gone.

BTW: In current code, layoutcommit_ctx will increase refcount of
nfs_inode's ctx, so if layoutcommit_ctx is not NULL, we could not
reach nfs4_close_context ... --> __nfs_close().
So pnfs_layoutcommit_inode() in __nfs_close() will not be called in
whatever situation.
Why we have to use nfs_inode's ctx as layoutcommit_ctx, since we
only need its rpc_creds actually?

Boaz:
  Without this patch none-files layouts which return NFS_FILE_SYNC
  from writes, will eventually hang on uncommitted inodes.

  This patch reveals a bad design pattern from the pnfs-client. On
  none-files layouts nfs_post_op_update_inode_force_wcc or friends
  are never called and i_size_write() is never preformed on client.
  What happens is that the layoutcommit eventually updates the server.
  In Linux the sever would update i_size_attr and mtime. It would then
  be detected by the client and i_size is fetched from server. This is
  not nice client behaviour, and is not done so for files and none-pnfs
  IO. Sigh! 
  I think also files-lo might suffer from this in some cases, which means
  if commit is not preformed (It's just an hint) then writeout will have
  i_size  corruption.

Signed-off-by: Tao Guo <guotao@nrchpc.ac.cn>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
---
 fs/nfs/write.c |    8 +++++++-
 1 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index dad8da3..9424203 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -1558,6 +1558,7 @@ int nfs_write_inode(struct inode *inode, struct writeback_control *wbc)
  */
 int nfs_wb_all(struct inode *inode)
 {
+	int ret;
 	struct writeback_control wbc = {
 		.sync_mode = WB_SYNC_ALL,
 		.nr_to_write = LONG_MAX,
@@ -1565,7 +1566,12 @@ int nfs_wb_all(struct inode *inode)
 		.range_end = LLONG_MAX,
 	};
 
-	return sync_inode(inode, &wbc);
+	ret = sync_inode(inode, &wbc);
+#ifdef CONFIG_NFS_V4_1
+	if (!ret && do_layoutcommit(NFS_I(inode)))
+		ret = pnfs_layoutcommit_inode(inode, 1);
+#endif
+	return ret;
 }
 
 int nfs_wb_page_cancel(struct inode *inode, struct page *page)
-- 
1.6.6.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] pnfs: call layoutcommit after flushing inode's data to disk.
  2010-05-24 19:04 [PATCH] pnfs: call layoutcommit after flushing inode's data to disk Boaz Harrosh
@ 2010-05-24 19:06 ` Boaz Harrosh
  2010-05-25  6:55   ` Benny Halevy
  0 siblings, 1 reply; 6+ messages in thread
From: Boaz Harrosh @ 2010-05-24 19:06 UTC (permalink / raw)
  To: Tao Guo, Andy Adamson, Benny Halevy, NFS list

On 05/24/2010 10:04 PM, Boaz Harrosh wrote:
> From: Tao Guo <guotao-U4AKAne5IzAR5TUyvShJeg@public.gmane.org>
> 
> This is for a bug introduced to 2.6.34. In 2.6.32 and 2.6.33 we call
> layoutcommit in nfs_sync_mapping_wait(), but in 2.6.34 we use
> sync_inode() to sync inode's data, so the layoutcommit code is gone.
> 
> BTW: In current code, layoutcommit_ctx will increase refcount of
> nfs_inode's ctx, so if layoutcommit_ctx is not NULL, we could not
> reach nfs4_close_context ... --> __nfs_close().
> So pnfs_layoutcommit_inode() in __nfs_close() will not be called in
> whatever situation.
> Why we have to use nfs_inode's ctx as layoutcommit_ctx, since we
> only need its rpc_creds actually?
> 
> Boaz:
>   Without this patch none-files layouts which return NFS_FILE_SYNC
>   from writes, will eventually hang on uncommitted inodes.
> 
>   This patch reveals a bad design pattern from the pnfs-client. On
>   none-files layouts nfs_post_op_update_inode_force_wcc or friends
>   are never called and i_size_write() is never preformed on client.
>   What happens is that the layoutcommit eventually updates the server.
>   In Linux the sever would update i_size_attr and mtime. It would then
>   be detected by the client and i_size is fetched from server. This is
>   not nice client behaviour, and is not done so for files and none-pnfs
>   IO. Sigh! 
>   I think also files-lo might suffer from this in some cases, which means
>   if commit is not preformed (It's just an hint) then writeout will have

commit => layout_commit, oops

>   i_size  corruption.
> 
> Signed-off-by: Tao Guo <guotao-U4AKAne5IzAR5TUyvShJeg@public.gmane.org>
> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
> ---
>  fs/nfs/write.c |    8 +++++++-
>  1 files changed, 7 insertions(+), 1 deletions(-)
> 
> diff --git a/fs/nfs/write.c b/fs/nfs/write.c
> index dad8da3..9424203 100644
> --- a/fs/nfs/write.c
> +++ b/fs/nfs/write.c
> @@ -1558,6 +1558,7 @@ int nfs_write_inode(struct inode *inode, struct writeback_control *wbc)
>   */
>  int nfs_wb_all(struct inode *inode)
>  {
> +	int ret;
>  	struct writeback_control wbc = {
>  		.sync_mode = WB_SYNC_ALL,
>  		.nr_to_write = LONG_MAX,
> @@ -1565,7 +1566,12 @@ int nfs_wb_all(struct inode *inode)
>  		.range_end = LLONG_MAX,
>  	};
>  
> -	return sync_inode(inode, &wbc);
> +	ret = sync_inode(inode, &wbc);
> +#ifdef CONFIG_NFS_V4_1
> +	if (!ret && do_layoutcommit(NFS_I(inode)))
> +		ret = pnfs_layoutcommit_inode(inode, 1);
> +#endif
> +	return ret;
>  }
>  
>  int nfs_wb_page_cancel(struct inode *inode, struct page *page)


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] pnfs: call layoutcommit after flushing inode's data to disk.
  2010-05-24 19:06 ` Boaz Harrosh
@ 2010-05-25  6:55   ` Benny Halevy
  0 siblings, 0 replies; 6+ messages in thread
From: Benny Halevy @ 2010-05-25  6:55 UTC (permalink / raw)
  To: Boaz Harrosh, Tao Guo; +Cc: Andy Adamson, NFS list

On May. 24, 2010, 22:06 +0300, Boaz Harrosh <bharrosh@panasas.com> wrote:
> On 05/24/2010 10:04 PM, Boaz Harrosh wrote:
>> From: Tao Guo <guotao-U4AKAne5IzAR5TUyvShJeg@public.gmane.org>
>>
>> This is for a bug introduced to 2.6.34. In 2.6.32 and 2.6.33 we call
>> layoutcommit in nfs_sync_mapping_wait(), but in 2.6.34 we use
>> sync_inode() to sync inode's data, so the layoutcommit code is gone.
>>
>> BTW: In current code, layoutcommit_ctx will increase refcount of
>> nfs_inode's ctx, so if layoutcommit_ctx is not NULL, we could not
>> reach nfs4_close_context ... --> __nfs_close().
>> So pnfs_layoutcommit_inode() in __nfs_close() will not be called in
>> whatever situation.
>> Why we have to use nfs_inode's ctx as layoutcommit_ctx, since we
>> only need its rpc_creds actually?
>>
>> Boaz:
>>   Without this patch none-files layouts which return NFS_FILE_SYNC
>>   from writes, will eventually hang on uncommitted inodes.
>>
>>   This patch reveals a bad design pattern from the pnfs-client. On
>>   none-files layouts nfs_post_op_update_inode_force_wcc or friends
>>   are never called and i_size_write() is never preformed on client.
>>   What happens is that the layoutcommit eventually updates the server.
>>   In Linux the sever would update i_size_attr and mtime. It would then
>>   be detected by the client and i_size is fetched from server. This is
>>   not nice client behaviour, and is not done so for files and none-pnfs
>>   IO. Sigh! 
>>   I think also files-lo might suffer from this in some cases, which means
>>   if commit is not preformed (It's just an hint) then writeout will have
> 
> commit => layout_commit, oops
> 
>>   i_size  corruption.
>>
>> Signed-off-by: Tao Guo <guotao-U4AKAne5IzAR5TUyvShJeg@public.gmane.org>
>> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>

Committed at pnfs-all-2.6.34-2010-05-25

Thanks!

Benny

>> ---
>>  fs/nfs/write.c |    8 +++++++-
>>  1 files changed, 7 insertions(+), 1 deletions(-)
>>
>> diff --git a/fs/nfs/write.c b/fs/nfs/write.c
>> index dad8da3..9424203 100644
>> --- a/fs/nfs/write.c
>> +++ b/fs/nfs/write.c
>> @@ -1558,6 +1558,7 @@ int nfs_write_inode(struct inode *inode, struct writeback_control *wbc)
>>   */
>>  int nfs_wb_all(struct inode *inode)
>>  {
>> +	int ret;
>>  	struct writeback_control wbc = {
>>  		.sync_mode = WB_SYNC_ALL,
>>  		.nr_to_write = LONG_MAX,
>> @@ -1565,7 +1566,12 @@ int nfs_wb_all(struct inode *inode)
>>  		.range_end = LLONG_MAX,
>>  	};
>>  
>> -	return sync_inode(inode, &wbc);
>> +	ret = sync_inode(inode, &wbc);
>> +#ifdef CONFIG_NFS_V4_1
>> +	if (!ret && do_layoutcommit(NFS_I(inode)))
>> +		ret = pnfs_layoutcommit_inode(inode, 1);
>> +#endif
>> +	return ret;
>>  }
>>  
>>  int nfs_wb_page_cancel(struct inode *inode, struct page *page)
> 

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2010-05-25  6:55 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-05-24 19:04 [PATCH] pnfs: call layoutcommit after flushing inode's data to disk Boaz Harrosh
2010-05-24 19:06 ` Boaz Harrosh
2010-05-25  6:55   ` Benny Halevy
  -- strict thread matches above, loose matches on Subject: below --
2010-05-20  3:28 Tao Guo
2010-05-20  4:42 ` Tao Guo
     [not found]   ` <AANLkTikc6JXz1ubKJHxs0UJq-x80Mafsz4SKgkBPZgnf-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2010-05-20 14:17     ` Andy Adamson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).