* exofs_file_fsync
@ 2010-05-31 10:09 Christoph Hellwig
2010-05-31 10:23 ` exofs_file_fsync Boaz Harrosh
2010-06-01 15:30 ` [PATCH] exofs: exofs_file_fsync correctness Boaz Harrosh
0 siblings, 2 replies; 20+ messages in thread
From: Christoph Hellwig @ 2010-05-31 10:09 UTC (permalink / raw)
To: bharrosh; +Cc: linux-fsdevel
Various odd things going on here:
- no checks for I_DIRTY and friends, so it will always write out data
- filemap_write_and_wait is superflous, as it's already done by
the caller
- write_inode_now is overkill as it also writes out data, better use
sync_inode in a similar way to generic_file_fsync
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: exofs_file_fsync
2010-05-31 10:09 exofs_file_fsync Christoph Hellwig
@ 2010-05-31 10:23 ` Boaz Harrosh
2010-05-31 10:27 ` exofs_file_fsync Christoph Hellwig
2010-06-01 15:30 ` [PATCH] exofs: exofs_file_fsync correctness Boaz Harrosh
1 sibling, 1 reply; 20+ messages in thread
From: Boaz Harrosh @ 2010-05-31 10:23 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: linux-fsdevel
On 05/31/2010 01:09 PM, Christoph Hellwig wrote:
> Various odd things going on here:
>
> - no checks for I_DIRTY and friends, so it will always write out data
> - filemap_write_and_wait is superflous, as it's already done by
> the caller
> - write_inode_now is overkill as it also writes out data, better use
> sync_inode in a similar way to generic_file_fsync
I'll look into it, but I think what I did here was to effectively
"data sync" because I wanted a data sync on close and that was the only
vector I already had that's called on close.
I want "data sync on close" because I'm kind of close-2-open
network semantics, cross a network. And this is my local
access checkpoint.
But sure I'll re think the all thing.
Thanks
Boaz
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: exofs_file_fsync
2010-05-31 10:23 ` exofs_file_fsync Boaz Harrosh
@ 2010-05-31 10:27 ` Christoph Hellwig
2010-05-31 10:31 ` exofs_file_fsync Boaz Harrosh
0 siblings, 1 reply; 20+ messages in thread
From: Christoph Hellwig @ 2010-05-31 10:27 UTC (permalink / raw)
To: Boaz Harrosh; +Cc: Christoph Hellwig, linux-fsdevel
On Mon, May 31, 2010 at 01:23:09PM +0300, Boaz Harrosh wrote:
> On 05/31/2010 01:09 PM, Christoph Hellwig wrote:
> > Various odd things going on here:
> >
> > - no checks for I_DIRTY and friends, so it will always write out data
> > - filemap_write_and_wait is superflous, as it's already done by
> > the caller
> > - write_inode_now is overkill as it also writes out data, better use
> > sync_inode in a similar way to generic_file_fsync
>
> I'll look into it, but I think what I did here was to effectively
> "data sync" because I wanted a data sync on close and that was the only
> vector I already had that's called on close.
fsync won't get called at close time. ->release is called on last close
and ->flush on every close.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: exofs_file_fsync
2010-05-31 10:27 ` exofs_file_fsync Christoph Hellwig
@ 2010-05-31 10:31 ` Boaz Harrosh
2010-05-31 10:33 ` exofs_file_fsync Christoph Hellwig
0 siblings, 1 reply; 20+ messages in thread
From: Boaz Harrosh @ 2010-05-31 10:31 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: linux-fsdevel
On 05/31/2010 01:27 PM, Christoph Hellwig wrote:
> On Mon, May 31, 2010 at 01:23:09PM +0300, Boaz Harrosh wrote:
>> On 05/31/2010 01:09 PM, Christoph Hellwig wrote:
>>> Various odd things going on here:
>>>
>>> - no checks for I_DIRTY and friends, so it will always write out data
>>> - filemap_write_and_wait is superflous, as it's already done by
>>> the caller
>>> - write_inode_now is overkill as it also writes out data, better use
>>> sync_inode in a similar way to generic_file_fsync
>>
>> I'll look into it, but I think what I did here was to effectively
>> "data sync" because I wanted a data sync on close and that was the only
>> vector I already had that's called on close.
>
> fsync won't get called at close time. ->release is called on last close
> and ->flush on every close.
>
OK, I was just looking at that. thanks you saved me some digging.
should I just open-code the generic_file_fsync minus the blocks
thing then?
I'm busy with the truncate stuff, but I'll do this next.
Do you need this ASAP?
Thanks
Boaz
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: exofs_file_fsync
2010-05-31 10:31 ` exofs_file_fsync Boaz Harrosh
@ 2010-05-31 10:33 ` Christoph Hellwig
2010-05-31 13:43 ` exofs_file_fsync Boaz Harrosh
0 siblings, 1 reply; 20+ messages in thread
From: Christoph Hellwig @ 2010-05-31 10:33 UTC (permalink / raw)
To: Boaz Harrosh; +Cc: Christoph Hellwig, linux-fsdevel
On Mon, May 31, 2010 at 01:31:01PM +0300, Boaz Harrosh wrote:
> OK, I was just looking at that. thanks you saved me some digging.
> should I just open-code the generic_file_fsync minus the blocks
> thing then?
sync_mapping_buffers is a no-op for you so you can keep it.
The other difference is that you sync out the superblock at the
end of your fsync implementation. That is rather unusual, but I don't
know enough about exofs if you really need to update data in the
superblock to commit file data to disk.
> I'm busy with the truncate stuff, but I'll do this next.
> Do you need this ASAP?
I just noticed it while walking through the fsync implementations.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: exofs_file_fsync
2010-05-31 10:33 ` exofs_file_fsync Christoph Hellwig
@ 2010-05-31 13:43 ` Boaz Harrosh
2010-05-31 13:46 ` exofs_file_fsync Boaz Harrosh
2010-06-01 10:04 ` exofs_file_fsync Christoph Hellwig
0 siblings, 2 replies; 20+ messages in thread
From: Boaz Harrosh @ 2010-05-31 13:43 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: linux-fsdevel
On 05/31/2010 01:33 PM, Christoph Hellwig wrote:
> On Mon, May 31, 2010 at 01:31:01PM +0300, Boaz Harrosh wrote:
>> OK, I was just looking at that. thanks you saved me some digging.
>> should I just open-code the generic_file_fsync minus the blocks
>> thing then?
>
> sync_mapping_buffers is a no-op for you so you can keep it.
> The other difference is that you sync out the superblock at the
> end of your fsync implementation. That is rather unusual, but I don't
> know enough about exofs if you really need to update data in the
> superblock to commit file data to disk.
>
>> I'm busy with the truncate stuff, but I'll do this next.
>> Do you need this ASAP?
>
> I just noticed it while walking through the fsync implementations.
>
OK Chritoff I would need your help Please.
It looks like what I need exactly is:
write_inode_now(inode, sync)
But write_inode_now() has one extra hunk over generic_file_fsync:
if (sync)
inode_sync_wait(inode);
Do you think I can get in trouble calling it from ->fsync
I don't like generic_file_fsync because it does not write my
data since I don't have buffer_heads.
OK, I'm totally lost what does ->fsync need to do? only write
the inode or the pages as well?
Boaz
---
git diff --stat -p -M fs/exofs/file.c
fs/exofs/file.c | 8 ++------
1 files changed, 2 insertions(+), 6 deletions(-)
diff --git a/fs/exofs/file.c b/fs/exofs/file.c
index f9bfe2b..9b3555e 100644
--- a/fs/exofs/file.c
+++ b/fs/exofs/file.c
@@ -47,18 +47,14 @@ static int exofs_file_fsync(struct file *filp, int datasync)
struct inode *inode = mapping->host;
struct super_block *sb;
- ret = filemap_write_and_wait(mapping);
- if (ret)
- return ret;
-
/* sync the inode attributes */
- ret = write_inode_now(inode, 1);
+ ret = write_inode_now(inode, datasync);
/* This is a good place to write the sb */
/* TODO: Sechedule an sb-sync on create */
sb = inode->i_sb;
if (sb->s_dirt)
- exofs_sync_fs(sb, 1);
+ exofs_sync_fs(sb, datasync);
return ret;
}
^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: exofs_file_fsync
2010-05-31 13:43 ` exofs_file_fsync Boaz Harrosh
@ 2010-05-31 13:46 ` Boaz Harrosh
2010-06-01 10:05 ` exofs_file_fsync Christoph Hellwig
2010-06-01 10:04 ` exofs_file_fsync Christoph Hellwig
1 sibling, 1 reply; 20+ messages in thread
From: Boaz Harrosh @ 2010-05-31 13:46 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: linux-fsdevel
On 05/31/2010 04:43 PM, Boaz Harrosh wrote:
> On 05/31/2010 01:33 PM, Christoph Hellwig wrote:
>> On Mon, May 31, 2010 at 01:31:01PM +0300, Boaz Harrosh wrote:
>>> OK, I was just looking at that. thanks you saved me some digging.
>>> should I just open-code the generic_file_fsync minus the blocks
>>> thing then?
>>
>> sync_mapping_buffers is a no-op for you so you can keep it.
>> The other difference is that you sync out the superblock at the
>> end of your fsync implementation. That is rather unusual, but I don't
>> know enough about exofs if you really need to update data in the
>> superblock to commit file data to disk.
>>
>>> I'm busy with the truncate stuff, but I'll do this next.
>>> Do you need this ASAP?
>>
>> I just noticed it while walking through the fsync implementations.
>>
>
> OK Chritoff I would need your help Please.
>
> It looks like what I need exactly is:
> write_inode_now(inode, sync)
>
> But write_inode_now() has one extra hunk over generic_file_fsync:
> if (sync)
> inode_sync_wait(inode);
>
> Do you think I can get in trouble calling it from ->fsync
>
> I don't like generic_file_fsync because it does not write my
> data since I don't have buffer_heads.
>
> OK, I'm totally lost what does ->fsync need to do? only write
> the inode or the pages as well?
>
> Boaz
> ---
> git diff --stat -p -M fs/exofs/file.c
> fs/exofs/file.c | 8 ++------
> 1 files changed, 2 insertions(+), 6 deletions(-)
>
> diff --git a/fs/exofs/file.c b/fs/exofs/file.c
> index f9bfe2b..9b3555e 100644
> --- a/fs/exofs/file.c
> +++ b/fs/exofs/file.c
> @@ -47,18 +47,14 @@ static int exofs_file_fsync(struct file *filp, int datasync)
> struct inode *inode = mapping->host;
> struct super_block *sb;
>
> - ret = filemap_write_and_wait(mapping);
> - if (ret)
> - return ret;
> -
You think I also need a:
if (!(inode->i_state & I_DIRTY))
return ret;
if (datasync && !(inode->i_state & I_DIRTY_DATASYNC))
return ret;
before the write_inode_now call ?
> /* sync the inode attributes */
> - ret = write_inode_now(inode, 1);
> + ret = write_inode_now(inode, datasync);
>
> /* This is a good place to write the sb */
> /* TODO: Sechedule an sb-sync on create */
> sb = inode->i_sb;
> if (sb->s_dirt)
> - exofs_sync_fs(sb, 1);
> + exofs_sync_fs(sb, datasync);
>
> return ret;
> }
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: exofs_file_fsync
2010-05-31 13:43 ` exofs_file_fsync Boaz Harrosh
2010-05-31 13:46 ` exofs_file_fsync Boaz Harrosh
@ 2010-06-01 10:04 ` Christoph Hellwig
2010-06-01 15:29 ` exofs_file_fsync Boaz Harrosh
1 sibling, 1 reply; 20+ messages in thread
From: Christoph Hellwig @ 2010-06-01 10:04 UTC (permalink / raw)
To: Boaz Harrosh; +Cc: Christoph Hellwig, linux-fsdevel
On Mon, May 31, 2010 at 04:43:48PM +0300, Boaz Harrosh wrote:
> OK Chritoff I would need your help Please.
>
> It looks like what I need exactly is:
> write_inode_now(inode, sync)
>
> But write_inode_now() has one extra hunk over generic_file_fsync:
> if (sync)
> inode_sync_wait(inode);
It waits for I_SYNC beeing cleared, which is entirely superflous
for ->fsync purposes. In fact I don't have the slightest idea
what it is useful for, given that we already clear I_SYNC inside
writeback_single_inode ourselves. So the only thing it causes
is to wait for another code racing to step into this code just
after us.
> Do you think I can get in trouble calling it from ->fsync
>
> I don't like generic_file_fsync because it does not write my
> data since I don't have buffer_heads.
Then copy it.
> OK, I'm totally lost what does ->fsync need to do? only write
> the inode or the pages as well?
The pages are written by vfs_fsync_range. You need to make sure
that the inode back, or if the datasync flag is set only changes
to the inode require to find that data are written back. That's
basically a messy wording for you can skip dirtiness of timestamp
updates if the datasync flag is set.
Note that your patch below is very wrong for this, as it does
a lot of asynchronous activity if the datasync flag is not set.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: exofs_file_fsync
2010-05-31 13:46 ` exofs_file_fsync Boaz Harrosh
@ 2010-06-01 10:05 ` Christoph Hellwig
0 siblings, 0 replies; 20+ messages in thread
From: Christoph Hellwig @ 2010-06-01 10:05 UTC (permalink / raw)
To: Boaz Harrosh; +Cc: Christoph Hellwig, linux-fsdevel
On Mon, May 31, 2010 at 04:46:33PM +0300, Boaz Harrosh wrote:
> You think I also need a:
> if (!(inode->i_state & I_DIRTY))
> return ret;
> if (datasync && !(inode->i_state & I_DIRTY_DATASYNC))
> return ret;
> before the write_inode_now call ?
You don't strictly need it, but without it you will perform a lot
of work for no gain if the inode is not actually dirty.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: exofs_file_fsync
2010-06-01 10:04 ` exofs_file_fsync Christoph Hellwig
@ 2010-06-01 15:29 ` Boaz Harrosh
0 siblings, 0 replies; 20+ messages in thread
From: Boaz Harrosh @ 2010-06-01 15:29 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: linux-fsdevel
On 06/01/2010 01:04 PM, Christoph Hellwig wrote:
>
> The pages are written by vfs_fsync_range. You need to make sure
> that the inode back, or if the datasync flag is set only changes
> to the inode require to find that data are written back. That's
> basically a messy wording for you can skip dirtiness of timestamp
> updates if the datasync flag is set.
>
> Note that your patch below is very wrong for this, as it does
> a lot of asynchronous activity if the datasync flag is not set.
>
Thank you Christoph. Me smack me on the head. Got two unrelated
flags totally confused.
I'll be posting a fix. For the best of my knowledge all I need is
a synchronous call to sync_inode() which will in turn call
exofs_write_inode. The later takes care of all metadata an exofs inode has.
(No allocation bitmaps, no data nodes lists, etc...)
In a next patch I'll also fix the super-block shit. Testing ...
Boaz
^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH] exofs: exofs_file_fsync correctness
2010-05-31 10:09 exofs_file_fsync Christoph Hellwig
2010-05-31 10:23 ` exofs_file_fsync Boaz Harrosh
@ 2010-06-01 15:30 ` Boaz Harrosh
2010-06-01 15:33 ` [osd-dev] " Boaz Harrosh
` (3 more replies)
1 sibling, 4 replies; 20+ messages in thread
From: Boaz Harrosh @ 2010-06-01 15:30 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: linux-fsdevel, open-osd
As per Christoph advise: no need to call filemap_write_and_wait().
In exofs all metadata is at the inode so just writing the pages
and inode is all is needed. fsync implies this must be done
synchronously.
FIXME: remove the sb_sync and fix that sb_update better.
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
---
fs/exofs/file.c | 21 ++++++++++++---------
1 files changed, 12 insertions(+), 9 deletions(-)
diff --git a/fs/exofs/file.c b/fs/exofs/file.c
index f9bfe2b..54bb17d 100644
--- a/fs/exofs/file.c
+++ b/fs/exofs/file.c
@@ -30,9 +30,6 @@
* along with exofs; if not, write to the Free Software
* Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
*/
-
-#include <linux/buffer_head.h>
-
#include "exofs.h"
static int exofs_release_file(struct inode *inode, struct file *filp)
@@ -40,19 +37,25 @@ static int exofs_release_file(struct inode *inode, struct file *filp)
return 0;
}
+/* exofs_file_fsync - flush the inode to disk
+ *
+ * @datasync is not used. All metadata is written in one place regardless.
+ * the writeout is synchronous
+ */
static int exofs_file_fsync(struct file *filp, int datasync)
{
int ret;
struct address_space *mapping = filp->f_mapping;
struct inode *inode = mapping->host;
struct super_block *sb;
+ struct writeback_control wbc = {
+ .sync_mode = WB_SYNC_ALL,
+ .nr_to_write = LONG_MAX,
+ .range_start = 0,
+ .range_end = LLONG_MAX,
+ };
- ret = filemap_write_and_wait(mapping);
- if (ret)
- return ret;
-
- /* sync the inode attributes */
- ret = write_inode_now(inode, 1);
+ ret = sync_inode(inode, &wbc);
/* This is a good place to write the sb */
/* TODO: Sechedule an sb-sync on create */
--
1.6.6.1
^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [osd-dev] [PATCH] exofs: exofs_file_fsync correctness
2010-06-01 15:30 ` [PATCH] exofs: exofs_file_fsync correctness Boaz Harrosh
@ 2010-06-01 15:33 ` Boaz Harrosh
2010-06-01 15:34 ` Christoph Hellwig
` (2 subsequent siblings)
3 siblings, 0 replies; 20+ messages in thread
From: Boaz Harrosh @ 2010-06-01 15:33 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: linux-fsdevel, open-osd
On 06/01/2010 06:30 PM, Boaz Harrosh wrote:
>
> As per Christoph advise: no need to call filemap_write_and_wait().
> In exofs all metadata is at the inode so just writing the pages
> and inode is all is needed. fsync implies this must be done
> synchronously.
>
> FIXME: remove the sb_sync and fix that sb_update better.
>
> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
> ---
> fs/exofs/file.c | 21 ++++++++++++---------
> 1 files changed, 12 insertions(+), 9 deletions(-)
>
> diff --git a/fs/exofs/file.c b/fs/exofs/file.c
> index f9bfe2b..54bb17d 100644
> --- a/fs/exofs/file.c
> +++ b/fs/exofs/file.c
> @@ -30,9 +30,6 @@
> * along with exofs; if not, write to the Free Software
> * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
> */
> -
> -#include <linux/buffer_head.h>
> -
> #include "exofs.h"
>
> static int exofs_release_file(struct inode *inode, struct file *filp)
> @@ -40,19 +37,25 @@ static int exofs_release_file(struct inode *inode, struct file *filp)
> return 0;
> }
>
> +/* exofs_file_fsync - flush the inode to disk
> + *
> + * @datasync is not used. All metadata is written in one place regardless.
> + * the writeout is synchronous
> + */
> static int exofs_file_fsync(struct file *filp, int datasync)
> {
> int ret;
> struct address_space *mapping = filp->f_mapping;
> struct inode *inode = mapping->host;
> struct super_block *sb;
> + struct writeback_control wbc = {
> + .sync_mode = WB_SYNC_ALL,
> + .nr_to_write = LONG_MAX,
> + .range_start = 0,
> + .range_end = LLONG_MAX,
Christoph what do you say should I do
+ .range_end = i_size_read(inode);
here to cap the sync up to current size?
Boaz
> + };
>
> - ret = filemap_write_and_wait(mapping);
> - if (ret)
> - return ret;
> -
> - /* sync the inode attributes */
> - ret = write_inode_now(inode, 1);
> + ret = sync_inode(inode, &wbc);
>
> /* This is a good place to write the sb */
> /* TODO: Sechedule an sb-sync on create */
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] exofs: exofs_file_fsync correctness
2010-06-01 15:30 ` [PATCH] exofs: exofs_file_fsync correctness Boaz Harrosh
2010-06-01 15:33 ` [osd-dev] " Boaz Harrosh
@ 2010-06-01 15:34 ` Christoph Hellwig
2010-06-01 15:40 ` Boaz Harrosh
2010-06-01 16:11 ` [PATCH ver2] exofs: exofs_file_fsync and exofs_file_flush correctness Boaz Harrosh
2010-06-01 17:03 ` [PATCH ver3] " Boaz Harrosh
3 siblings, 1 reply; 20+ messages in thread
From: Christoph Hellwig @ 2010-06-01 15:34 UTC (permalink / raw)
To: Boaz Harrosh; +Cc: Christoph Hellwig, linux-fsdevel, open-osd
On Tue, Jun 01, 2010 at 06:30:55PM +0300, Boaz Harrosh wrote:
> +/* exofs_file_fsync - flush the inode to disk
> + *
> + * @datasync is not used. All metadata is written in one place regardless.
> + * the writeout is synchronous
> + */
You still want to check it. It's really only need for checking the
various dirty flags in the inode, which you're still missing.
> + struct writeback_control wbc = {
> + .sync_mode = WB_SYNC_ALL,
> + .nr_to_write = LONG_MAX,
By setting a nr_to_write you still write out data (at least in theory).
I'd recommend just copying the code from generic_file_fsync..
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] exofs: exofs_file_fsync correctness
2010-06-01 15:34 ` Christoph Hellwig
@ 2010-06-01 15:40 ` Boaz Harrosh
2010-06-01 15:47 ` Christoph Hellwig
0 siblings, 1 reply; 20+ messages in thread
From: Boaz Harrosh @ 2010-06-01 15:40 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: linux-fsdevel, open-osd
On 06/01/2010 06:34 PM, Christoph Hellwig wrote:
> On Tue, Jun 01, 2010 at 06:30:55PM +0300, Boaz Harrosh wrote:
>> +/* exofs_file_fsync - flush the inode to disk
>> + *
>> + * @datasync is not used. All metadata is written in one place regardless.
>> + * the writeout is synchronous
>> + */
>
> You still want to check it. It's really only need for checking the
> various dirty flags in the inode, which you're still missing.
>
will do
>> + struct writeback_control wbc = {
>> + .sync_mode = WB_SYNC_ALL,
>> + .nr_to_write = LONG_MAX,
>
> By setting a nr_to_write you still write out data (at least in theory).
>
So when do I also sync the data? is that done for me at the VFS layer?
> I'd recommend just copying the code from generic_file_fsync..
>
I was actually mimicking the code from nfs/write.c which has similar
semantics as mine.
And if so then I'll need to not reuse the above in .flush
Boaz
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] exofs: exofs_file_fsync correctness
2010-06-01 15:40 ` Boaz Harrosh
@ 2010-06-01 15:47 ` Christoph Hellwig
2010-06-01 16:10 ` Boaz Harrosh
0 siblings, 1 reply; 20+ messages in thread
From: Christoph Hellwig @ 2010-06-01 15:47 UTC (permalink / raw)
To: Boaz Harrosh; +Cc: Christoph Hellwig, linux-fsdevel, open-osd
On Tue, Jun 01, 2010 at 06:40:46PM +0300, Boaz Harrosh wrote:
> >> + struct writeback_control wbc = {
> >> + .sync_mode = WB_SYNC_ALL,
> >> + .nr_to_write = LONG_MAX,
> >
> > By setting a nr_to_write you still write out data (at least in theory).
> >
>
> So when do I also sync the data? is that done for me at the VFS layer?
You never have to. vfs_fsync_range does the data writeout for you.
> > I'd recommend just copying the code from generic_file_fsync..
> >
>
> I was actually mimicking the code from nfs/write.c which has similar
> semantics as mine.
nfs code has no good reason to do that, at least when called from
->fsync.
> And if so then I'll need to not reuse the above in .flush
I would recommend keeping the ->flush code separate.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] exofs: exofs_file_fsync correctness
2010-06-01 15:47 ` Christoph Hellwig
@ 2010-06-01 16:10 ` Boaz Harrosh
0 siblings, 0 replies; 20+ messages in thread
From: Boaz Harrosh @ 2010-06-01 16:10 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: linux-fsdevel, open-osd
On 06/01/2010 06:47 PM, Christoph Hellwig wrote:
> On Tue, Jun 01, 2010 at 06:40:46PM +0300, Boaz Harrosh wrote:
>>>> + struct writeback_control wbc = {
>>>> + .sync_mode = WB_SYNC_ALL,
>>>> + .nr_to_write = LONG_MAX,
>>>
>>> By setting a nr_to_write you still write out data (at least in theory).
>>>
>>
>> So when do I also sync the data? is that done for me at the VFS layer?
>
> You never have to. vfs_fsync_range does the data writeout for you.
>
>>> I'd recommend just copying the code from generic_file_fsync..
>>>
>>
>> I was actually mimicking the code from nfs/write.c which has similar
>> semantics as mine.
>
> nfs code has no good reason to do that, at least when called from
> ->fsync.
>
>> And if so then I'll need to not reuse the above in .flush
>
> I would recommend keeping the ->flush code separate.
OK, ... I think I got it this time. (I hope)
And you are totally right. The ->flush code is called without any
locks and the ->fsync is called with i_mutex held. So the previous
code was totally racy for ->flush.
Boaz
^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH ver2] exofs: exofs_file_fsync and exofs_file_flush correctness
2010-06-01 15:30 ` [PATCH] exofs: exofs_file_fsync correctness Boaz Harrosh
2010-06-01 15:33 ` [osd-dev] " Boaz Harrosh
2010-06-01 15:34 ` Christoph Hellwig
@ 2010-06-01 16:11 ` Boaz Harrosh
2010-06-01 16:17 ` Christoph Hellwig
2010-06-01 17:03 ` [PATCH ver3] " Boaz Harrosh
3 siblings, 1 reply; 20+ messages in thread
From: Boaz Harrosh @ 2010-06-01 16:11 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: linux-fsdevel, open-osd
As per Christoph advise: no need to call filemap_write_and_wait().
In exofs all metadata is at the inode so just writing the inode is
all is needed. fsync implies this must be done synchronously.
But now exofs_file_fsync can not be used by exofs_file_flush.
vfs_fsync_range should do that job.
FIXME: remove the sb_sync and fix that sb_update better.
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
---
fs/exofs/file.c | 29 +++++++++++++++++------------
1 files changed, 17 insertions(+), 12 deletions(-)
diff --git a/fs/exofs/file.c b/fs/exofs/file.c
index f9bfe2b..e0eb219 100644
--- a/fs/exofs/file.c
+++ b/fs/exofs/file.c
@@ -30,9 +30,6 @@
* along with exofs; if not, write to the Free Software
* Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
*/
-
-#include <linux/buffer_head.h>
-
#include "exofs.h"
static int exofs_release_file(struct inode *inode, struct file *filp)
@@ -40,19 +37,27 @@ static int exofs_release_file(struct inode *inode, struct file *filp)
return 0;
}
+/* exofs_file_fsync - flush the inode to disk
+ *
+ * @datasync is not used. All metadata is written in one place regardless.
+ * the writeout is synchronous
+ */
static int exofs_file_fsync(struct file *filp, int datasync)
{
int ret;
- struct address_space *mapping = filp->f_mapping;
- struct inode *inode = mapping->host;
+ struct writeback_control wbc = {
+ .sync_mode = WB_SYNC_ALL,
+ .nr_to_write = 0, /* metadata-only; caller takes care of data */
+ };
+ struct inode *inode = filp->f_mapping->host;
struct super_block *sb;
- ret = filemap_write_and_wait(mapping);
- if (ret)
- return ret;
+ if (!(inode->i_state & I_DIRTY))
+ return 0;
+ if (datasync && !(inode->i_state & I_DIRTY_DATASYNC))
+ return 0;
- /* sync the inode attributes */
- ret = write_inode_now(inode, 1);
+ ret = sync_inode(inode, &wbc);
/* This is a good place to write the sb */
/* TODO: Sechedule an sb-sync on create */
@@ -65,9 +70,9 @@ static int exofs_file_fsync(struct file *filp, int datasync)
static int exofs_flush(struct file *file, fl_owner_t id)
{
- exofs_file_fsync(file, 1);
+ int ret = vfs_fsync_range(file, 0, LLONG_MAX, 0);
/* TODO: Flush the OSD target */
- return 0;
+ return ret;
}
const struct file_operations exofs_file_operations = {
--
1.6.6.1
^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH ver2] exofs: exofs_file_fsync and exofs_file_flush correctness
2010-06-01 16:11 ` [PATCH ver2] exofs: exofs_file_fsync and exofs_file_flush correctness Boaz Harrosh
@ 2010-06-01 16:17 ` Christoph Hellwig
2010-06-01 16:36 ` Boaz Harrosh
0 siblings, 1 reply; 20+ messages in thread
From: Christoph Hellwig @ 2010-06-01 16:17 UTC (permalink / raw)
To: Boaz Harrosh; +Cc: Christoph Hellwig, linux-fsdevel, open-osd
On Tue, Jun 01, 2010 at 07:11:06PM +0300, Boaz Harrosh wrote:
> +/* exofs_file_fsync - flush the inode to disk
> + *
> + * @datasync is not used. All metadata is written in one place regardless.
> + * the writeout is synchronous
> + */
It actually is used now :)
> static int exofs_flush(struct file *file, fl_owner_t id)
> {
> - exofs_file_fsync(file, 1);
> + int ret = vfs_fsync_range(file, 0, LLONG_MAX, 0);
> /* TODO: Flush the OSD target */
> - return 0;
> + return ret;
why not just:
return vfs_fsync(file, 0);
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH ver2] exofs: exofs_file_fsync and exofs_file_flush correctness
2010-06-01 16:17 ` Christoph Hellwig
@ 2010-06-01 16:36 ` Boaz Harrosh
0 siblings, 0 replies; 20+ messages in thread
From: Boaz Harrosh @ 2010-06-01 16:36 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: linux-fsdevel, open-osd
On 06/01/2010 07:17 PM, Christoph Hellwig wrote:
> On Tue, Jun 01, 2010 at 07:11:06PM +0300, Boaz Harrosh wrote:
>> +/* exofs_file_fsync - flush the inode to disk
>> + *
>> + * @datasync is not used. All metadata is written in one place regardless.
>> + * the writeout is synchronous
>> + */
>
> It actually is used now :)
>
Right ;-) fixing
>> static int exofs_flush(struct file *file, fl_owner_t id)
>> {
>> - exofs_file_fsync(file, 1);
>> + int ret = vfs_fsync_range(file, 0, LLONG_MAX, 0);
>> /* TODO: Flush the OSD target */
>> - return 0;
>> + return ret;
>
> why not just:
>
> return vfs_fsync(file, 0);
>
Why, because I don't look passed my nose that's why.
Thank you Christoph. This is an area I was feeling bad about from the
get go. You can't imagine how happy I'm to have you help me on this.
(Things You Always Wanted to Know But Were Afraid to Ask)
Will post
Boaz
^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH ver3] exofs: exofs_file_fsync and exofs_file_flush correctness
2010-06-01 15:30 ` [PATCH] exofs: exofs_file_fsync correctness Boaz Harrosh
` (2 preceding siblings ...)
2010-06-01 16:11 ` [PATCH ver2] exofs: exofs_file_fsync and exofs_file_flush correctness Boaz Harrosh
@ 2010-06-01 17:03 ` Boaz Harrosh
3 siblings, 0 replies; 20+ messages in thread
From: Boaz Harrosh @ 2010-06-01 17:03 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: linux-fsdevel, open-osd
As per Christoph advise: no need to call filemap_write_and_wait().
In exofs all metadata is at the inode so just writing the inode is
all is needed. ->fsync implies this must be done synchronously.
But now exofs_file_fsync can not be used by exofs_file_flush.
vfs_fsync() should do that job correctly.
FIXME: remove the sb_sync and fix that sb_update better.
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
---
fs/exofs/file.c | 29 +++++++++++++++++------------
1 files changed, 17 insertions(+), 12 deletions(-)
diff --git a/fs/exofs/file.c b/fs/exofs/file.c
index f9bfe2b..68cb23e 100644
--- a/fs/exofs/file.c
+++ b/fs/exofs/file.c
@@ -30,9 +30,6 @@
* along with exofs; if not, write to the Free Software
* Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
*/
-
-#include <linux/buffer_head.h>
-
#include "exofs.h"
static int exofs_release_file(struct inode *inode, struct file *filp)
@@ -40,19 +37,27 @@ static int exofs_release_file(struct inode *inode, struct file *filp)
return 0;
}
+/* exofs_file_fsync - flush the inode to disk
+ *
+ * Note, in exofs all metadata is written as part of inode, regardless.
+ * The writeout is synchronous
+ */
static int exofs_file_fsync(struct file *filp, int datasync)
{
int ret;
- struct address_space *mapping = filp->f_mapping;
- struct inode *inode = mapping->host;
+ struct inode *inode = filp->f_mapping->host;
+ struct writeback_control wbc = {
+ .sync_mode = WB_SYNC_ALL,
+ .nr_to_write = 0, /* metadata-only; caller takes care of data */
+ };
struct super_block *sb;
- ret = filemap_write_and_wait(mapping);
- if (ret)
- return ret;
+ if (!(inode->i_state & I_DIRTY))
+ return 0;
+ if (datasync && !(inode->i_state & I_DIRTY_DATASYNC))
+ return 0;
- /* sync the inode attributes */
- ret = write_inode_now(inode, 1);
+ ret = sync_inode(inode, &wbc);
/* This is a good place to write the sb */
/* TODO: Sechedule an sb-sync on create */
@@ -65,9 +70,9 @@ static int exofs_file_fsync(struct file *filp, int datasync)
static int exofs_flush(struct file *file, fl_owner_t id)
{
- exofs_file_fsync(file, 1);
+ int ret = vfs_fsync(file, 0);
/* TODO: Flush the OSD target */
- return 0;
+ return ret;
}
const struct file_operations exofs_file_operations = {
--
1.6.6.1
^ permalink raw reply related [flat|nested] 20+ messages in thread
end of thread, other threads:[~2010-06-01 17:03 UTC | newest]
Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-05-31 10:09 exofs_file_fsync Christoph Hellwig
2010-05-31 10:23 ` exofs_file_fsync Boaz Harrosh
2010-05-31 10:27 ` exofs_file_fsync Christoph Hellwig
2010-05-31 10:31 ` exofs_file_fsync Boaz Harrosh
2010-05-31 10:33 ` exofs_file_fsync Christoph Hellwig
2010-05-31 13:43 ` exofs_file_fsync Boaz Harrosh
2010-05-31 13:46 ` exofs_file_fsync Boaz Harrosh
2010-06-01 10:05 ` exofs_file_fsync Christoph Hellwig
2010-06-01 10:04 ` exofs_file_fsync Christoph Hellwig
2010-06-01 15:29 ` exofs_file_fsync Boaz Harrosh
2010-06-01 15:30 ` [PATCH] exofs: exofs_file_fsync correctness Boaz Harrosh
2010-06-01 15:33 ` [osd-dev] " Boaz Harrosh
2010-06-01 15:34 ` Christoph Hellwig
2010-06-01 15:40 ` Boaz Harrosh
2010-06-01 15:47 ` Christoph Hellwig
2010-06-01 16:10 ` Boaz Harrosh
2010-06-01 16:11 ` [PATCH ver2] exofs: exofs_file_fsync and exofs_file_flush correctness Boaz Harrosh
2010-06-01 16:17 ` Christoph Hellwig
2010-06-01 16:36 ` Boaz Harrosh
2010-06-01 17:03 ` [PATCH ver3] " Boaz Harrosh
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).