* [RFC] exofs: New truncate sequence
@ 2010-05-31 12:30 Boaz Harrosh
2010-05-31 13:44 ` Nick Piggin
` (2 more replies)
0 siblings, 3 replies; 20+ messages in thread
From: Boaz Harrosh @ 2010-05-31 12:30 UTC (permalink / raw)
To: Christoph Hellwig, Nick Piggin, linux-fsdevel, open-osd
These changes are crafted based on the similar
conversion done to ext2 by Nick Piggin.
Please check me out I bound to have miss-looked
something.
- Remove the deprecated ->truncate vector, call a new
exofs_setsize for on-disk size update, from exofs_setattr
- Call truncate_pagecache on the unused pages if
write_begin/end fails.
- Cleanup exofs_delete_inode that did stupid inode
writes and updates on an inode that will be
removed.
- And finally get rid of exofs_get_block. We never
had any blocks it was all for calling nobh_truncate_page.
nobh_truncate_page is not actually needed in exofs since
the last page is complete and gone just like all the other
pages. There is no partial blocks in exofs.
[OK do I might need a partial read here upto i_size ???]
I've tested with this patch, and there are no apparent
failures, so far.
These patches are based on Al's vfs for-next branch, which
contain Nick's last patches. But they do not contain,
(and will lightly conflict with) latest Christoph's
patches. Christoph do you have a tree I can rebase on? or
maybe you want to take this patch into your patchset?
CC: Christoph Hellwig <hch@lst.de>
CC: Nick Piggin <npiggin@suse.de>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
---
fs/exofs/exofs.h | 1 -
fs/exofs/file.c | 1 -
fs/exofs/inode.c | 115 +++++++++++++++++++++++++++---------------------------
3 files changed, 58 insertions(+), 59 deletions(-)
diff --git a/fs/exofs/exofs.h b/fs/exofs/exofs.h
index 22721b2..0706ce9 100644
--- a/fs/exofs/exofs.h
+++ b/fs/exofs/exofs.h
@@ -256,7 +256,6 @@ static inline int exofs_oi_read(struct exofs_i_info *oi,
}
/* inode.c */
-void exofs_truncate(struct inode *inode);
int exofs_setattr(struct dentry *, struct iattr *);
int exofs_write_begin(struct file *file, struct address_space *mapping,
loff_t pos, unsigned len, unsigned flags,
diff --git a/fs/exofs/file.c b/fs/exofs/file.c
index fef6899..f9bfe2b 100644
--- a/fs/exofs/file.c
+++ b/fs/exofs/file.c
@@ -86,6 +86,5 @@ const struct file_operations exofs_file_operations = {
};
const struct inode_operations exofs_file_inode_operations = {
- .truncate = exofs_truncate,
.setattr = exofs_setattr,
};
diff --git a/fs/exofs/inode.c b/fs/exofs/inode.c
index 4bb6ef8..1c2666c 100644
--- a/fs/exofs/inode.c
+++ b/fs/exofs/inode.c
@@ -710,7 +710,7 @@ int exofs_write_begin(struct file *file, struct address_space *mapping,
fsdata);
if (ret) {
EXOFS_DBGMSG("simple_write_begin faild\n");
- return ret;
+ goto out;
}
page = *pagep;
@@ -725,7 +725,17 @@ int exofs_write_begin(struct file *file, struct address_space *mapping,
EXOFS_DBGMSG("__readpage_filler faild\n");
}
}
+out:
+ if (unlikely(ret)) {
+ struct inode *inode = mapping->host;
+ loff_t to = pos + len;
+ if (to > inode->i_size)
+ truncate_pagecache(inode, to, inode->i_size);
+
+ mark_inode_dirty(inode); /* write the new size */
+ return ret;
+ }
return ret;
}
@@ -750,6 +760,10 @@ static int exofs_write_end(struct file *file, struct address_space *mapping,
int ret;
ret = simple_write_end(file, mapping,pos, len, copied, page, fsdata);
+ if (unlikely(ret && pos + len > inode->i_size))
+ truncate_pagecache(inode, pos + len, inode->i_size);
+
+ /* TODO: once simple_write_end marks inode dirty remove */
if (i_size != inode->i_size)
mark_inode_dirty(inode);
return ret;
@@ -808,35 +822,18 @@ static inline int exofs_inode_is_fast_symlink(struct inode *inode)
return S_ISLNK(inode->i_mode) && (oi->i_data[0] != 0);
}
-/*
- * get_block_t - Fill in a buffer_head
- * An OSD takes care of block allocation so we just fake an allocation by
- * putting in the inode's sector_t in the buffer_head.
- * TODO: What about the case of create==0 and @iblock does not exist in the
- * object?
- */
-static int exofs_get_block(struct inode *inode, sector_t iblock,
- struct buffer_head *bh_result, int create)
-{
- map_bh(bh_result, inode->i_sb, iblock);
- return 0;
-}
-
const struct osd_attr g_attr_logical_length = ATTR_DEF(
OSD_APAGE_OBJECT_INFORMATION, OSD_ATTR_OI_LOGICAL_LENGTH, 8);
-static int _do_truncate(struct inode *inode)
+static int _do_truncate(struct inode *inode, loff_t newsize)
{
struct exofs_i_info *oi = exofs_i(inode);
- loff_t isize = i_size_read(inode);
int ret;
inode->i_mtime = inode->i_ctime = CURRENT_TIME;
- nobh_truncate_page(inode->i_mapping, isize, exofs_get_block);
-
- ret = exofs_oi_truncate(oi, (u64)isize);
- EXOFS_DBGMSG("(0x%lx) size=0x%llx\n", inode->i_ino, isize);
+ ret = exofs_oi_truncate(oi, (u64)newsize);
+ EXOFS_DBGMSG("(0x%lx) size=0x%llx\n", inode->i_ino, newsize);
return ret;
}
@@ -844,35 +841,28 @@ static int _do_truncate(struct inode *inode)
* Truncate a file to the specified size - all we have to do is set the size
* attribute. We make sure the object exists first.
*/
-void exofs_truncate(struct inode *inode)
+static int exofs_setsize(struct inode *inode, loff_t newsize)
{
- struct exofs_i_info *oi = exofs_i(inode);
- int ret;
+ loff_t oldsize;
+ int error;
- if (!(S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode)
- || S_ISLNK(inode->i_mode)))
- return;
+ error = inode_newsize_ok(inode, newsize);
+ if (error)
+ return error;
+
+ if (!(S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode) ||
+ S_ISLNK(inode->i_mode)))
+ return -EINVAL;
if (exofs_inode_is_fast_symlink(inode))
- return;
+ return -EINVAL;
if (IS_APPEND(inode) || IS_IMMUTABLE(inode))
- return;
-
- /* if we are about to truncate an object, and it hasn't been
- * created yet, wait
- */
- if (unlikely(wait_obj_created(oi)))
- goto fail;
+ return -EPERM;
- ret = _do_truncate(inode);
- if (ret)
- goto fail;
+ oldsize = inode->i_size;
+ i_size_write(inode, newsize);
+ truncate_pagecache(inode, oldsize, newsize);
-out:
- mark_inode_dirty(inode);
- return;
-fail:
- make_bad_inode(inode);
- goto out;
+ return _do_truncate(inode, newsize);
}
/*
@@ -883,11 +873,25 @@ int exofs_setattr(struct dentry *dentry, struct iattr *iattr)
struct inode *inode = dentry->d_inode;
int error;
+ /* if we are about to modify an object, and it hasn't been
+ * created yet, wait
+ */
+ error = wait_obj_created(exofs_i(inode));
+ if (unlikely(error))
+ return error;
+
error = inode_change_ok(inode, iattr);
if (error)
return error;
- error = inode_setattr(inode, iattr);
+ if (iattr->ia_valid & ATTR_SIZE) {
+ error = exofs_setsize(inode, iattr->ia_size);
+ if (error)
+ return error;
+ }
+ generic_setattr(inode, iattr);
+ mark_inode_dirty(inode);
+
return error;
}
@@ -1335,28 +1339,25 @@ void exofs_delete_inode(struct inode *inode)
truncate_inode_pages(&inode->i_data, 0);
+ /* TODO: should do better here */
if (is_bad_inode(inode))
goto no_delete;
- mark_inode_dirty(inode);
- exofs_update_inode(inode, inode_needs_sync(inode));
-
inode->i_size = 0;
- if (inode->i_blocks)
- exofs_truncate(inode);
-
clear_inode(inode);
- ret = exofs_get_io_state(&sbi->layout, &ios);
- if (unlikely(ret)) {
- EXOFS_ERR("%s: exofs_get_io_state failed\n", __func__);
- return;
- }
-
/* if we are deleting an obj that hasn't been created yet, wait */
if (!obj_created(oi)) {
BUG_ON(!obj_2bcreated(oi));
wait_event(oi->i_wq, obj_created(oi));
+ /* ignore the error attempt a remove anyway */
+ }
+
+ /* Now Remove the OSD objects */
+ ret = exofs_get_io_state(&sbi->layout, &ios);
+ if (unlikely(ret)) {
+ EXOFS_ERR("%s: exofs_get_io_state failed\n", __func__);
+ return;
}
ios->obj.id = exofs_oi_objno(oi);
--
1.6.6.1
^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [RFC] exofs: New truncate sequence
2010-05-31 12:30 [RFC] exofs: New truncate sequence Boaz Harrosh
@ 2010-05-31 13:44 ` Nick Piggin
2010-05-31 14:13 ` Boaz Harrosh
2010-06-01 10:28 ` [PATCH ver2] " Boaz Harrosh
2010-06-01 11:31 ` [PATCH ver3] " Boaz Harrosh
2 siblings, 1 reply; 20+ messages in thread
From: Nick Piggin @ 2010-05-31 13:44 UTC (permalink / raw)
To: Boaz Harrosh; +Cc: Christoph Hellwig, linux-fsdevel, open-osd
On Mon, May 31, 2010 at 03:30:02PM +0300, Boaz Harrosh wrote:
>
> These changes are crafted based on the similar
> conversion done to ext2 by Nick Piggin.
>
> Please check me out I bound to have miss-looked
> something.
>
> - Remove the deprecated ->truncate vector, call a new
> exofs_setsize for on-disk size update, from exofs_setattr
> - Call truncate_pagecache on the unused pages if
> write_begin/end fails.
> - Cleanup exofs_delete_inode that did stupid inode
> writes and updates on an inode that will be
> removed.
> - And finally get rid of exofs_get_block. We never
> had any blocks it was all for calling nobh_truncate_page.
> nobh_truncate_page is not actually needed in exofs since
> the last page is complete and gone just like all the other
> pages. There is no partial blocks in exofs.
> [OK do I might need a partial read here upto i_size ???]
>
> I've tested with this patch, and there are no apparent
> failures, so far.
>
> These patches are based on Al's vfs for-next branch, which
> contain Nick's last patches. But they do not contain,
> (and will lightly conflict with) latest Christoph's
> patches. Christoph do you have a tree I can rebase on? or
> maybe you want to take this patch into your patchset?
>
> CC: Christoph Hellwig <hch@lst.de>
> CC: Nick Piggin <npiggin@suse.de>
> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
> ---
> fs/exofs/exofs.h | 1 -
> fs/exofs/file.c | 1 -
> fs/exofs/inode.c | 115 +++++++++++++++++++++++++++---------------------------
Can you rip out all the rest of the buffer_head stuff too?
> 3 files changed, 58 insertions(+), 59 deletions(-)
>
> diff --git a/fs/exofs/exofs.h b/fs/exofs/exofs.h
> index 22721b2..0706ce9 100644
> --- a/fs/exofs/exofs.h
> +++ b/fs/exofs/exofs.h
> @@ -256,7 +256,6 @@ static inline int exofs_oi_read(struct exofs_i_info *oi,
> }
>
> /* inode.c */
> -void exofs_truncate(struct inode *inode);
> int exofs_setattr(struct dentry *, struct iattr *);
> int exofs_write_begin(struct file *file, struct address_space *mapping,
> loff_t pos, unsigned len, unsigned flags,
> diff --git a/fs/exofs/file.c b/fs/exofs/file.c
> index fef6899..f9bfe2b 100644
> --- a/fs/exofs/file.c
> +++ b/fs/exofs/file.c
> @@ -86,6 +86,5 @@ const struct file_operations exofs_file_operations = {
> };
>
> const struct inode_operations exofs_file_inode_operations = {
> - .truncate = exofs_truncate,
> .setattr = exofs_setattr,
> };
> diff --git a/fs/exofs/inode.c b/fs/exofs/inode.c
> index 4bb6ef8..1c2666c 100644
> --- a/fs/exofs/inode.c
> +++ b/fs/exofs/inode.c
> @@ -710,7 +710,7 @@ int exofs_write_begin(struct file *file, struct address_space *mapping,
> fsdata);
> if (ret) {
> EXOFS_DBGMSG("simple_write_begin faild\n");
> - return ret;
> + goto out;
> }
>
> page = *pagep;
> @@ -725,7 +725,17 @@ int exofs_write_begin(struct file *file, struct address_space *mapping,
> EXOFS_DBGMSG("__readpage_filler faild\n");
> }
> }
> +out:
> + if (unlikely(ret)) {
> + struct inode *inode = mapping->host;
> + loff_t to = pos + len;
>
> + if (to > inode->i_size)
> + truncate_pagecache(inode, to, inode->i_size);
> +
> + mark_inode_dirty(inode); /* write the new size */
Is this required? I don't think the i_size should be changed here?
> + return ret;
> + }
> return ret;
> }
>
> @@ -750,6 +760,10 @@ static int exofs_write_end(struct file *file, struct address_space *mapping,
> int ret;
>
> ret = simple_write_end(file, mapping,pos, len, copied, page, fsdata);
> + if (unlikely(ret && pos + len > inode->i_size))
> + truncate_pagecache(inode, pos + len, inode->i_size);
> +
So there is no need to do any oi_truncate? Even if _readpage in
write_begin has set up some blocks?
> + /* TODO: once simple_write_end marks inode dirty remove */
> if (i_size != inode->i_size)
> mark_inode_dirty(inode);
> return ret;
Hmm, I suppose simple_write_end probably should mark the inode dirty?
> @@ -1335,28 +1339,25 @@ void exofs_delete_inode(struct inode *inode)
>
> truncate_inode_pages(&inode->i_data, 0);
>
> + /* TODO: should do better here */
> if (is_bad_inode(inode))
> goto no_delete;
>
> - mark_inode_dirty(inode);
> - exofs_update_inode(inode, inode_needs_sync(inode));
> -
> inode->i_size = 0;
> - if (inode->i_blocks)
> - exofs_truncate(inode);
This guy has gone missing -- I assume exofs_sbi_remove is a more
efficient way to do this anyway?
> -
> clear_inode(inode);
>
> - ret = exofs_get_io_state(&sbi->layout, &ios);
> - if (unlikely(ret)) {
> - EXOFS_ERR("%s: exofs_get_io_state failed\n", __func__);
> - return;
> - }
> -
> /* if we are deleting an obj that hasn't been created yet, wait */
> if (!obj_created(oi)) {
> BUG_ON(!obj_2bcreated(oi));
> wait_event(oi->i_wq, obj_created(oi));
> + /* ignore the error attempt a remove anyway */
> + }
> +
> + /* Now Remove the OSD objects */
> + ret = exofs_get_io_state(&sbi->layout, &ios);
> + if (unlikely(ret)) {
> + EXOFS_ERR("%s: exofs_get_io_state failed\n", __func__);
> + return;
> }
>
> ios->obj.id = exofs_oi_objno(oi);
> --
> 1.6.6.1
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [RFC] exofs: New truncate sequence
2010-05-31 13:44 ` Nick Piggin
@ 2010-05-31 14:13 ` Boaz Harrosh
2010-05-31 14:33 ` Nick Piggin
0 siblings, 1 reply; 20+ messages in thread
From: Boaz Harrosh @ 2010-05-31 14:13 UTC (permalink / raw)
To: Nick Piggin; +Cc: Christoph Hellwig, linux-fsdevel, open-osd
On 05/31/2010 04:44 PM, Nick Piggin wrote:
> On Mon, May 31, 2010 at 03:30:02PM +0300, Boaz Harrosh wrote:
>>
>> These changes are crafted based on the similar
>> conversion done to ext2 by Nick Piggin.
>>
>> Please check me out I bound to have miss-looked
>> something.
>>
>> - Remove the deprecated ->truncate vector, call a new
>> exofs_setsize for on-disk size update, from exofs_setattr
>> - Call truncate_pagecache on the unused pages if
>> write_begin/end fails.
>> - Cleanup exofs_delete_inode that did stupid inode
>> writes and updates on an inode that will be
>> removed.
>> - And finally get rid of exofs_get_block. We never
>> had any blocks it was all for calling nobh_truncate_page.
>> nobh_truncate_page is not actually needed in exofs since
>> the last page is complete and gone just like all the other
>> pages. There is no partial blocks in exofs.
>> [OK do I might need a partial read here upto i_size ???]
>>
>> I've tested with this patch, and there are no apparent
>> failures, so far.
>>
>> These patches are based on Al's vfs for-next branch, which
>> contain Nick's last patches. But they do not contain,
>> (and will lightly conflict with) latest Christoph's
>> patches. Christoph do you have a tree I can rebase on? or
>> maybe you want to take this patch into your patchset?
>>
>> CC: Christoph Hellwig <hch@lst.de>
>> CC: Nick Piggin <npiggin@suse.de>
>> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
>> ---
>> fs/exofs/exofs.h | 1 -
>> fs/exofs/file.c | 1 -
>> fs/exofs/inode.c | 115 +++++++++++++++++++++++++++---------------------------
>
> Can you rip out all the rest of the buffer_head stuff too?
>
I hope I don't have any left, that was the last, have I missed
something?
Thanks for looking. Please talk to me a bit more, the exofs
is a bit special around here and I would like another pair
of eyes on this.
>
>> 3 files changed, 58 insertions(+), 59 deletions(-)
>>
>> diff --git a/fs/exofs/exofs.h b/fs/exofs/exofs.h
>> index 22721b2..0706ce9 100644
>> --- a/fs/exofs/exofs.h
>> +++ b/fs/exofs/exofs.h
>> @@ -256,7 +256,6 @@ static inline int exofs_oi_read(struct exofs_i_info *oi,
>> }
>>
>> /* inode.c */
>> -void exofs_truncate(struct inode *inode);
>> int exofs_setattr(struct dentry *, struct iattr *);
>> int exofs_write_begin(struct file *file, struct address_space *mapping,
>> loff_t pos, unsigned len, unsigned flags,
>> diff --git a/fs/exofs/file.c b/fs/exofs/file.c
>> index fef6899..f9bfe2b 100644
>> --- a/fs/exofs/file.c
>> +++ b/fs/exofs/file.c
>> @@ -86,6 +86,5 @@ const struct file_operations exofs_file_operations = {
>> };
>>
>> const struct inode_operations exofs_file_inode_operations = {
>> - .truncate = exofs_truncate,
>> .setattr = exofs_setattr,
>> };
>> diff --git a/fs/exofs/inode.c b/fs/exofs/inode.c
>> index 4bb6ef8..1c2666c 100644
>> --- a/fs/exofs/inode.c
>> +++ b/fs/exofs/inode.c
>> @@ -710,7 +710,7 @@ int exofs_write_begin(struct file *file, struct address_space *mapping,
>> fsdata);
>> if (ret) {
>> EXOFS_DBGMSG("simple_write_begin faild\n");
>> - return ret;
>> + goto out;
>> }
>>
>> page = *pagep;
>> @@ -725,7 +725,17 @@ int exofs_write_begin(struct file *file, struct address_space *mapping,
>> EXOFS_DBGMSG("__readpage_filler faild\n");
>> }
>> }
>> +out:
>> + if (unlikely(ret)) {
>> + struct inode *inode = mapping->host;
>> + loff_t to = pos + len;
>>
>> + if (to > inode->i_size)
>> + truncate_pagecache(inode, to, inode->i_size);
>> +
>> + mark_inode_dirty(inode); /* write the new size */
>
> Is this required? I don't think the i_size should be changed here?
Right and write_begin has marked it dirty anyway
>
>> + return ret;
>> + }
>> return ret;
>> }
>>
>> @@ -750,6 +760,10 @@ static int exofs_write_end(struct file *file, struct address_space *mapping,
>> int ret;
>>
>> ret = simple_write_end(file, mapping,pos, len, copied, page, fsdata);
>> + if (unlikely(ret && pos + len > inode->i_size))
>> + truncate_pagecache(inode, pos + len, inode->i_size);
>> +
>
> So there is no need to do any oi_truncate? Even if _readpage in
> write_begin has set up some blocks?
>
No blocks are setup. Exofs does not have any blocks. If the write
failed then the object on OSD has the last written offset as objects
size. If error handling was done right residual is subtracted from IO
size and should be reflected here.
If IO was never attempted then object size did not grow and we revert
in memory i_size here. (size is keept in two places and is checked
for consistency in fsck)
TODO:
eject short writes and see if this works correctly.
>
>> + /* TODO: once simple_write_end marks inode dirty remove */
>> if (i_size != inode->i_size)
>> mark_inode_dirty(inode);
>> return ret;
>
> Hmm, I suppose simple_write_end probably should mark the inode dirty?
>
I think Christoph has a patch for that.
>
>> @@ -1335,28 +1339,25 @@ void exofs_delete_inode(struct inode *inode)
>>
>> truncate_inode_pages(&inode->i_data, 0);
>>
>> + /* TODO: should do better here */
>> if (is_bad_inode(inode))
>> goto no_delete;
>>
>> - mark_inode_dirty(inode);
>> - exofs_update_inode(inode, inode_needs_sync(inode));
>> -
>> inode->i_size = 0;
>> - if (inode->i_blocks)
>> - exofs_truncate(inode);
>
> This guy has gone missing -- I assume exofs_sbi_remove is a more
> efficient way to do this anyway?
>
You see this is where exofs is different a file is an object_no on
multiple OSD devices. The inode is kept as an attribute of the
object. (data as object's data) so a exofs_sbi_remove will just
obliterate any association to the object. It was historically
called because exofs_truncate used to do what truncate_inode_pages
does today. (And some other in memory book keeping.) But with
your help all this was cleaned up.
Do you see any operation I missed that might need cleaning from the
generic VFS inode, that might now leak. As far as storage is concerned
I'm covered.
[I ran git clone linux; rm -rf linux; 100 times in a loop and the OSD
storage stayed constant size. So I presume there is no storage leak.
OSD is good in this respect]
>
>> -
>> clear_inode(inode);
>>
>> - ret = exofs_get_io_state(&sbi->layout, &ios);
>> - if (unlikely(ret)) {
>> - EXOFS_ERR("%s: exofs_get_io_state failed\n", __func__);
>> - return;
>> - }
>> -
>> /* if we are deleting an obj that hasn't been created yet, wait */
>> if (!obj_created(oi)) {
>> BUG_ON(!obj_2bcreated(oi));
>> wait_event(oi->i_wq, obj_created(oi));
>> + /* ignore the error attempt a remove anyway */
>> + }
>> +
>> + /* Now Remove the OSD objects */
>> + ret = exofs_get_io_state(&sbi->layout, &ios);
>> + if (unlikely(ret)) {
>> + EXOFS_ERR("%s: exofs_get_io_state failed\n", __func__);
>> + return;
>> }
>>
>> ios->obj.id = exofs_oi_objno(oi);
>> --
>> 1.6.6.1
Thanks for lookin. And thanks for making this patch possible. I wanted
this cleaned, long ago, but it was only made easy and simple after your
changes.
Boaz
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [RFC] exofs: New truncate sequence
2010-05-31 14:13 ` Boaz Harrosh
@ 2010-05-31 14:33 ` Nick Piggin
2010-05-31 14:50 ` Boaz Harrosh
0 siblings, 1 reply; 20+ messages in thread
From: Nick Piggin @ 2010-05-31 14:33 UTC (permalink / raw)
To: Boaz Harrosh; +Cc: Christoph Hellwig, linux-fsdevel, open-osd
On Mon, May 31, 2010 at 05:13:34PM +0300, Boaz Harrosh wrote:
> On 05/31/2010 04:44 PM, Nick Piggin wrote:
> > On Mon, May 31, 2010 at 03:30:02PM +0300, Boaz Harrosh wrote:
> >> ---
> >> fs/exofs/exofs.h | 1 -
> >> fs/exofs/file.c | 1 -
> >> fs/exofs/inode.c | 115 +++++++++++++++++++++++++++---------------------------
> >
> > Can you rip out all the rest of the buffer_head stuff too?
> >
>
> I hope I don't have any left, that was the last, have I missed
> something?
exofs_invalidatepage, exofs_releasepage, includes of buffer_head.h.
No point to any of that if you never actually map the buffers or
use them for tracking state yourself.
> >> @@ -750,6 +760,10 @@ static int exofs_write_end(struct file *file, struct address_space *mapping,
> >> int ret;
> >>
> >> ret = simple_write_end(file, mapping,pos, len, copied, page, fsdata);
> >> + if (unlikely(ret && pos + len > inode->i_size))
> >> + truncate_pagecache(inode, pos + len, inode->i_size);
> >> +
> >
> > So there is no need to do any oi_truncate? Even if _readpage in
> > write_begin has set up some blocks?
> >
>
> No blocks are setup. Exofs does not have any blocks.
Sure, I was just using blocks as a placeholder for whatever you call
it.
> If the write
> failed then the object on OSD has the last written offset as objects
> size. If error handling was done right residual is subtracted from IO
> size and should be reflected here.
>
> If IO was never attempted then object size did not grow and we revert
> in memory i_size here. (size is keept in two places and is checked
> for consistency in fsck)
So long as the _readpage has not altered the object state at all,
then this looks fine to me.
> TODO:
> eject short writes and see if this works correctly.
>
> >
> >> + /* TODO: once simple_write_end marks inode dirty remove */
> >> if (i_size != inode->i_size)
> >> mark_inode_dirty(inode);
> >> return ret;
> >
> > Hmm, I suppose simple_write_end probably should mark the inode dirty?
> >
>
> I think Christoph has a patch for that.
>
> >
> >> @@ -1335,28 +1339,25 @@ void exofs_delete_inode(struct inode *inode)
> >>
> >> truncate_inode_pages(&inode->i_data, 0);
> >>
> >> + /* TODO: should do better here */
> >> if (is_bad_inode(inode))
> >> goto no_delete;
> >>
> >> - mark_inode_dirty(inode);
> >> - exofs_update_inode(inode, inode_needs_sync(inode));
> >> -
> >> inode->i_size = 0;
> >> - if (inode->i_blocks)
> >> - exofs_truncate(inode);
> >
> > This guy has gone missing -- I assume exofs_sbi_remove is a more
> > efficient way to do this anyway?
> >
>
> You see this is where exofs is different a file is an object_no on
> multiple OSD devices. The inode is kept as an attribute of the
> object. (data as object's data) so a exofs_sbi_remove will just
> obliterate any association to the object. It was historically
> called because exofs_truncate used to do what truncate_inode_pages
> does today. (And some other in memory book keeping.) But with
> your help all this was cleaned up.
OK, I was thinking the underlying object itself needs to be trimmed
to match i_size similarly to just a block based filesystem? Like
exofs_oi_truncate appears to.
> Do you see any operation I missed that might need cleaning from the
> generic VFS inode, that might now leak. As far as storage is concerned
> I'm covered.
>
> [I ran git clone linux; rm -rf linux; 100 times in a loop and the OSD
> storage stayed constant size. So I presume there is no storage leak.
> OSD is good in this respect]
I can't see anything off hand. Was just flagging points where
vmtruncate or truncate had been called and is not now. If you
have all those covered, then you should be OK.
> >> -
> >> clear_inode(inode);
> >>
> >> - ret = exofs_get_io_state(&sbi->layout, &ios);
> >> - if (unlikely(ret)) {
> >> - EXOFS_ERR("%s: exofs_get_io_state failed\n", __func__);
> >> - return;
> >> - }
> >> -
> >> /* if we are deleting an obj that hasn't been created yet, wait */
> >> if (!obj_created(oi)) {
> >> BUG_ON(!obj_2bcreated(oi));
> >> wait_event(oi->i_wq, obj_created(oi));
> >> + /* ignore the error attempt a remove anyway */
> >> + }
> >> +
> >> + /* Now Remove the OSD objects */
> >> + ret = exofs_get_io_state(&sbi->layout, &ios);
> >> + if (unlikely(ret)) {
> >> + EXOFS_ERR("%s: exofs_get_io_state failed\n", __func__);
> >> + return;
> >> }
> >>
> >> ios->obj.id = exofs_oi_objno(oi);
> >> --
> >> 1.6.6.1
>
> Thanks for lookin. And thanks for making this patch possible. I wanted
> this cleaned, long ago, but it was only made easy and simple after your
> changes.
That's OK, thanks for helping with it.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [RFC] exofs: New truncate sequence
2010-05-31 14:33 ` Nick Piggin
@ 2010-05-31 14:50 ` Boaz Harrosh
2010-05-31 15:09 ` Nick Piggin
0 siblings, 1 reply; 20+ messages in thread
From: Boaz Harrosh @ 2010-05-31 14:50 UTC (permalink / raw)
To: Nick Piggin; +Cc: Christoph Hellwig, linux-fsdevel, open-osd
On 05/31/2010 05:33 PM, Nick Piggin wrote:
> On Mon, May 31, 2010 at 05:13:34PM +0300, Boaz Harrosh wrote:
>> On 05/31/2010 04:44 PM, Nick Piggin wrote:
>>> On Mon, May 31, 2010 at 03:30:02PM +0300, Boaz Harrosh wrote:
>>>> ---
>>>> fs/exofs/exofs.h | 1 -
>>>> fs/exofs/file.c | 1 -
>>>> fs/exofs/inode.c | 115 +++++++++++++++++++++++++++---------------------------
>>>
>>> Can you rip out all the rest of the buffer_head stuff too?
>>>
>>
>> I hope I don't have any left, that was the last, have I missed
>> something?
>
> exofs_invalidatepage, exofs_releasepage, includes of buffer_head.h.
> No point to any of that if you never actually map the buffers or
> use them for tracking state yourself.
>
Rrr thanks. Yes I'll leave the WARN_ON and do nothing, and remove the
include. I'll submit for linux-next.
Thanks.
<snip>
>
>> You see this is where exofs is different a file is an object_no on
>> multiple OSD devices. The inode is kept as an attribute of the
>> object. (data as object's data) so a exofs_sbi_remove will just
>> obliterate any association to the object. It was historically
>> called because exofs_truncate used to do what truncate_inode_pages
>> does today. (And some other in memory book keeping.) But with
>> your help all this was cleaned up.
>
> OK, I was thinking the underlying object itself needs to be trimmed
> to match i_size similarly to just a block based filesystem? Like
> exofs_oi_truncate appears to.
>
Yes you are right, generally, but since I'm doing exofs_sbi_remove()
just after that then there is no point. the osd_remove will take care of
that anyway.
>
>> Do you see any operation I missed that might need cleaning from the
>> generic VFS inode, that might now leak. As far as storage is concerned
>> I'm covered.
>>
>> [I ran git clone linux; rm -rf linux; 100 times in a loop and the OSD
>> storage stayed constant size. So I presume there is no storage leak.
>> OSD is good in this respect]
>
> I can't see anything off hand. Was just flagging points where
> vmtruncate or truncate had been called and is not now. If you
> have all those covered, then you should be OK.
>
>
I'll be testing this particular branch for a while. I have a chicken
and egg problem. I think I'm kind of dependent on Al's vfs for-next
branch. Do you think the patch could also work with Linus-2.6.35-rc1?
I'm afraid that even if so It might conflict with Al's tree if I put
it independently in the osd/for-next tree.
I'll see what comes up
Boaz
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [RFC] exofs: New truncate sequence
2010-05-31 14:50 ` Boaz Harrosh
@ 2010-05-31 15:09 ` Nick Piggin
2010-05-31 15:19 ` Boaz Harrosh
0 siblings, 1 reply; 20+ messages in thread
From: Nick Piggin @ 2010-05-31 15:09 UTC (permalink / raw)
To: Boaz Harrosh; +Cc: Christoph Hellwig, linux-fsdevel, open-osd
On Mon, May 31, 2010 at 05:50:01PM +0300, Boaz Harrosh wrote:
> On 05/31/2010 05:33 PM, Nick Piggin wrote:
> > On Mon, May 31, 2010 at 05:13:34PM +0300, Boaz Harrosh wrote:
> >> On 05/31/2010 04:44 PM, Nick Piggin wrote:
> >>> On Mon, May 31, 2010 at 03:30:02PM +0300, Boaz Harrosh wrote:
> >>>> ---
> >>>> fs/exofs/exofs.h | 1 -
> >>>> fs/exofs/file.c | 1 -
> >>>> fs/exofs/inode.c | 115 +++++++++++++++++++++++++++---------------------------
> >>>
> >>> Can you rip out all the rest of the buffer_head stuff too?
> >>>
> >>
> >> I hope I don't have any left, that was the last, have I missed
> >> something?
> >
> > exofs_invalidatepage, exofs_releasepage, includes of buffer_head.h.
> > No point to any of that if you never actually map the buffers or
> > use them for tracking state yourself.
> >
>
> Rrr thanks. Yes I'll leave the WARN_ON and do nothing, and remove the
> include. I'll submit for linux-next.
OK. I'd get rid of it completely before it hits mainline.
> Thanks.
>
> <snip>
>
> >
> >> You see this is where exofs is different a file is an object_no on
> >> multiple OSD devices. The inode is kept as an attribute of the
> >> object. (data as object's data) so a exofs_sbi_remove will just
> >> obliterate any association to the object. It was historically
> >> called because exofs_truncate used to do what truncate_inode_pages
> >> does today. (And some other in memory book keeping.) But with
> >> your help all this was cleaned up.
> >
> > OK, I was thinking the underlying object itself needs to be trimmed
> > to match i_size similarly to just a block based filesystem? Like
> > exofs_oi_truncate appears to.
> >
>
> Yes you are right, generally, but since I'm doing exofs_sbi_remove()
> just after that then there is no point. the osd_remove will take care of
> that anyway.
I fugured it would, just wanted to check.
> >> Do you see any operation I missed that might need cleaning from the
> >> generic VFS inode, that might now leak. As far as storage is concerned
> >> I'm covered.
> >>
> >> [I ran git clone linux; rm -rf linux; 100 times in a loop and the OSD
> >> storage stayed constant size. So I presume there is no storage leak.
> >> OSD is good in this respect]
> >
> > I can't see anything off hand. Was just flagging points where
> > vmtruncate or truncate had been called and is not now. If you
> > have all those covered, then you should be OK.
> >
> >
>
> I'll be testing this particular branch for a while. I have a chicken
> and egg problem. I think I'm kind of dependent on Al's vfs for-next
> branch. Do you think the patch could also work with Linus-2.6.35-rc1?
> I'm afraid that even if so It might conflict with Al's tree if I put
> it independently in the osd/for-next tree.
I would keep them based on top of Christoph's patches (just cherry
pick the exofs hunks in the meantime).
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [RFC] exofs: New truncate sequence
2010-05-31 15:09 ` Nick Piggin
@ 2010-05-31 15:19 ` Boaz Harrosh
2010-06-01 10:08 ` Christoph Hellwig
0 siblings, 1 reply; 20+ messages in thread
From: Boaz Harrosh @ 2010-05-31 15:19 UTC (permalink / raw)
To: Nick Piggin; +Cc: Christoph Hellwig, linux-fsdevel, open-osd
On 05/31/2010 06:09 PM, Nick Piggin wrote:
> On Mon, May 31, 2010 at 05:50:01PM +0300, Boaz Harrosh wrote:
>> On 05/31/2010 05:33 PM, Nick Piggin wrote:
>>> On Mon, May 31, 2010 at 05:13:34PM +0300, Boaz Harrosh wrote:
>>>> On 05/31/2010 04:44 PM, Nick Piggin wrote:
>>>>> On Mon, May 31, 2010 at 03:30:02PM +0300, Boaz Harrosh wrote:
>>>>>> ---
>>>>>> fs/exofs/exofs.h | 1 -
>>>>>> fs/exofs/file.c | 1 -
>>>>>> fs/exofs/inode.c | 115 +++++++++++++++++++++++++++---------------------------
>>>>>
>>>>> Can you rip out all the rest of the buffer_head stuff too?
>>>>>
>>>>
>>>> I hope I don't have any left, that was the last, have I missed
>>>> something?
>>>
>>> exofs_invalidatepage, exofs_releasepage, includes of buffer_head.h.
>>> No point to any of that if you never actually map the buffers or
>>> use them for tracking state yourself.
>>>
>>
>> Rrr thanks. Yes I'll leave the WARN_ON and do nothing, and remove the
>> include. I'll submit for linux-next.
>
> OK. I'd get rid of it completely before it hits mainline.
>
>
No this one I've already queued for Linux next. On me ;-)
>
> I would keep them based on top of Christoph's patches (just cherry
> pick the exofs hunks in the meantime).
>
You mean pick them from the mailing-list. But then there is no chance
for them to go into linux-next and pick up some testing. That's why
I thought on rebasing on Linus-2.6.35-rc1.
Then when Christoph's patches get picked by Al I'll rebase. But the
base is just good right now and I'd like some exposure for this.
Christoph this is your call
Boaz
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [RFC] exofs: New truncate sequence
2010-05-31 15:19 ` Boaz Harrosh
@ 2010-06-01 10:08 ` Christoph Hellwig
2010-06-01 10:26 ` Boaz Harrosh
0 siblings, 1 reply; 20+ messages in thread
From: Christoph Hellwig @ 2010-06-01 10:08 UTC (permalink / raw)
To: Boaz Harrosh; +Cc: Nick Piggin, Christoph Hellwig, linux-fsdevel, open-osd
On Mon, May 31, 2010 at 06:19:53PM +0300, Boaz Harrosh wrote:
> You mean pick them from the mailing-list. But then there is no chance
> for them to go into linux-next and pick up some testing. That's why
> I thought on rebasing on Linus-2.6.35-rc1.
>
> Then when Christoph's patches get picked by Al I'll rebase. But the
> base is just good right now and I'd like some exposure for this.
>
> Christoph this is your call
In the end it really depends on Al. If you want to push your patches
out ASAP feel free to do it and I can rebase easily.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [RFC] exofs: New truncate sequence
2010-06-01 10:08 ` Christoph Hellwig
@ 2010-06-01 10:26 ` Boaz Harrosh
2010-06-01 10:44 ` Christoph Hellwig
0 siblings, 1 reply; 20+ messages in thread
From: Boaz Harrosh @ 2010-06-01 10:26 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Nick Piggin, linux-fsdevel, open-osd
On 06/01/2010 01:08 PM, Christoph Hellwig wrote:
> On Mon, May 31, 2010 at 06:19:53PM +0300, Boaz Harrosh wrote:
>> You mean pick them from the mailing-list. But then there is no chance
>> for them to go into linux-next and pick up some testing. That's why
>> I thought on rebasing on Linus-2.6.35-rc1.
>>
>> Then when Christoph's patches get picked by Al I'll rebase. But the
>> base is just good right now and I'd like some exposure for this.
>>
>> Christoph this is your call
>
> In the end it really depends on Al. If you want to push your patches
> out ASAP feel free to do it and I can rebase easily.
>
Imagine my surprise when this morning I see that all of Al's patches
are in mainline. (that was sarcastic, Stupid me they where there yesterday)
So I'm posting a final version of this patch. It runs over 2.6.35-rc1
and works really well and I like this patch a lot. I'm putting it
in osd/linux-next.
Please advise: Should I ask Linus to pull this in for current Kernel?
If not, then I'll ask Al to put it in his tree for 2.6.36 and you can
rebase over his tree.
Thanks
Boaz
^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH ver2] exofs: New truncate sequence
2010-05-31 12:30 [RFC] exofs: New truncate sequence Boaz Harrosh
2010-05-31 13:44 ` Nick Piggin
@ 2010-06-01 10:28 ` Boaz Harrosh
2010-06-01 10:43 ` Christoph Hellwig
2010-06-01 11:31 ` [PATCH ver3] " Boaz Harrosh
2 siblings, 1 reply; 20+ messages in thread
From: Boaz Harrosh @ 2010-06-01 10:28 UTC (permalink / raw)
To: Christoph Hellwig, Nick Piggin, linux-fsdevel, open-osd, Al Viro
These changes are crafted based on the similar
conversion done to ext2 by Nick Piggin.
- Remove the deprecated ->truncate vector, call a new
exofs_setsize for on-disk size update.
- Call truncate_pagecache on the unused pages if
write_begin/end fails.
- Cleanup exofs_delete_inode that did stupid inode
writes and updates on an inode that will be
removed.
- And finally get rid of exofs_get_block. We never
had any blocks it was all for calling nobh_truncate_page.
nobh_truncate_page is not actually needed in exofs since
the last page is complete and gone, just like all the other
pages. There is no partial blocks in exofs.
[OK do I need a partial read here upto i_size ???]
I've tested with this patch, and there are no apparent
failures, so far.
CC: Nick Piggin <npiggin@suse.de>
CC: Christoph Hellwig <hch@lst.de>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
---
fs/exofs/exofs.h | 1 -
fs/exofs/file.c | 1 -
fs/exofs/inode.c | 112 ++++++++++++++++++++++++++---------------------------
3 files changed, 55 insertions(+), 59 deletions(-)
diff --git a/fs/exofs/exofs.h b/fs/exofs/exofs.h
index 22721b2..0706ce9 100644
--- a/fs/exofs/exofs.h
+++ b/fs/exofs/exofs.h
@@ -256,7 +256,6 @@ static inline int exofs_oi_read(struct exofs_i_info *oi,
}
/* inode.c */
-void exofs_truncate(struct inode *inode);
int exofs_setattr(struct dentry *, struct iattr *);
int exofs_write_begin(struct file *file, struct address_space *mapping,
loff_t pos, unsigned len, unsigned flags,
diff --git a/fs/exofs/file.c b/fs/exofs/file.c
index fef6899..f9bfe2b 100644
--- a/fs/exofs/file.c
+++ b/fs/exofs/file.c
@@ -86,6 +86,5 @@ const struct file_operations exofs_file_operations = {
};
const struct inode_operations exofs_file_inode_operations = {
- .truncate = exofs_truncate,
.setattr = exofs_setattr,
};
diff --git a/fs/exofs/inode.c b/fs/exofs/inode.c
index 4bb6ef8..8e101ed 100644
--- a/fs/exofs/inode.c
+++ b/fs/exofs/inode.c
@@ -710,7 +710,7 @@ int exofs_write_begin(struct file *file, struct address_space *mapping,
fsdata);
if (ret) {
EXOFS_DBGMSG("simple_write_begin faild\n");
- return ret;
+ goto out;
}
page = *pagep;
@@ -725,7 +725,14 @@ int exofs_write_begin(struct file *file, struct address_space *mapping,
EXOFS_DBGMSG("__readpage_filler faild\n");
}
}
+out:
+ if (unlikely(ret)) {
+ struct inode *inode = mapping->host;
+ loff_t to = pos + len;
+ if (to > inode->i_size)
+ truncate_pagecache(inode, to, inode->i_size);
+ }
return ret;
}
@@ -750,6 +757,10 @@ static int exofs_write_end(struct file *file, struct address_space *mapping,
int ret;
ret = simple_write_end(file, mapping,pos, len, copied, page, fsdata);
+ if (unlikely(ret && pos + len > inode->i_size))
+ truncate_pagecache(inode, pos + len, inode->i_size);
+
+ /* TODO: once simple_write_end marks inode dirty remove */
if (i_size != inode->i_size)
mark_inode_dirty(inode);
return ret;
@@ -808,35 +819,18 @@ static inline int exofs_inode_is_fast_symlink(struct inode *inode)
return S_ISLNK(inode->i_mode) && (oi->i_data[0] != 0);
}
-/*
- * get_block_t - Fill in a buffer_head
- * An OSD takes care of block allocation so we just fake an allocation by
- * putting in the inode's sector_t in the buffer_head.
- * TODO: What about the case of create==0 and @iblock does not exist in the
- * object?
- */
-static int exofs_get_block(struct inode *inode, sector_t iblock,
- struct buffer_head *bh_result, int create)
-{
- map_bh(bh_result, inode->i_sb, iblock);
- return 0;
-}
-
const struct osd_attr g_attr_logical_length = ATTR_DEF(
OSD_APAGE_OBJECT_INFORMATION, OSD_ATTR_OI_LOGICAL_LENGTH, 8);
-static int _do_truncate(struct inode *inode)
+static int _do_truncate(struct inode *inode, loff_t newsize)
{
struct exofs_i_info *oi = exofs_i(inode);
- loff_t isize = i_size_read(inode);
int ret;
inode->i_mtime = inode->i_ctime = CURRENT_TIME;
- nobh_truncate_page(inode->i_mapping, isize, exofs_get_block);
-
- ret = exofs_oi_truncate(oi, (u64)isize);
- EXOFS_DBGMSG("(0x%lx) size=0x%llx\n", inode->i_ino, isize);
+ ret = exofs_oi_truncate(oi, (u64)newsize);
+ EXOFS_DBGMSG("(0x%lx) size=0x%llx\n", inode->i_ino, newsize);
return ret;
}
@@ -844,35 +838,28 @@ static int _do_truncate(struct inode *inode)
* Truncate a file to the specified size - all we have to do is set the size
* attribute. We make sure the object exists first.
*/
-void exofs_truncate(struct inode *inode)
+static int exofs_setsize(struct inode *inode, loff_t newsize)
{
- struct exofs_i_info *oi = exofs_i(inode);
- int ret;
+ loff_t oldsize;
+ int error;
- if (!(S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode)
- || S_ISLNK(inode->i_mode)))
- return;
+ error = inode_newsize_ok(inode, newsize);
+ if (error)
+ return error;
+
+ if (!(S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode) ||
+ S_ISLNK(inode->i_mode)))
+ return -EINVAL;
if (exofs_inode_is_fast_symlink(inode))
- return;
+ return -EINVAL;
if (IS_APPEND(inode) || IS_IMMUTABLE(inode))
- return;
-
- /* if we are about to truncate an object, and it hasn't been
- * created yet, wait
- */
- if (unlikely(wait_obj_created(oi)))
- goto fail;
+ return -EPERM;
- ret = _do_truncate(inode);
- if (ret)
- goto fail;
+ oldsize = inode->i_size;
+ i_size_write(inode, newsize);
+ truncate_pagecache(inode, oldsize, newsize);
-out:
- mark_inode_dirty(inode);
- return;
-fail:
- make_bad_inode(inode);
- goto out;
+ return _do_truncate(inode, newsize);
}
/*
@@ -883,11 +870,25 @@ int exofs_setattr(struct dentry *dentry, struct iattr *iattr)
struct inode *inode = dentry->d_inode;
int error;
+ /* if we are about to modify an object, and it hasn't been
+ * created yet, wait
+ */
+ error = wait_obj_created(exofs_i(inode));
+ if (unlikely(error))
+ return error;
+
error = inode_change_ok(inode, iattr);
if (error)
return error;
- error = inode_setattr(inode, iattr);
+ if (iattr->ia_valid & ATTR_SIZE) {
+ error = exofs_setsize(inode, iattr->ia_size);
+ if (error)
+ return error;
+ }
+ generic_setattr(inode, iattr);
+ mark_inode_dirty(inode);
+
return error;
}
@@ -1335,28 +1336,25 @@ void exofs_delete_inode(struct inode *inode)
truncate_inode_pages(&inode->i_data, 0);
+ /* TODO: should do better here */
if (is_bad_inode(inode))
goto no_delete;
- mark_inode_dirty(inode);
- exofs_update_inode(inode, inode_needs_sync(inode));
-
inode->i_size = 0;
- if (inode->i_blocks)
- exofs_truncate(inode);
-
clear_inode(inode);
- ret = exofs_get_io_state(&sbi->layout, &ios);
- if (unlikely(ret)) {
- EXOFS_ERR("%s: exofs_get_io_state failed\n", __func__);
- return;
- }
-
/* if we are deleting an obj that hasn't been created yet, wait */
if (!obj_created(oi)) {
BUG_ON(!obj_2bcreated(oi));
wait_event(oi->i_wq, obj_created(oi));
+ /* ignore the error attempt a remove anyway */
+ }
+
+ /* Now Remove the OSD objects */
+ ret = exofs_get_io_state(&sbi->layout, &ios);
+ if (unlikely(ret)) {
+ EXOFS_ERR("%s: exofs_get_io_state failed\n", __func__);
+ return;
}
ios->obj.id = exofs_oi_objno(oi);
--
1.6.6.1
^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH ver2] exofs: New truncate sequence
2010-06-01 10:28 ` [PATCH ver2] " Boaz Harrosh
@ 2010-06-01 10:43 ` Christoph Hellwig
2010-06-01 10:59 ` Boaz Harrosh
0 siblings, 1 reply; 20+ messages in thread
From: Christoph Hellwig @ 2010-06-01 10:43 UTC (permalink / raw)
To: Boaz Harrosh
Cc: Christoph Hellwig, Nick Piggin, linux-fsdevel, open-osd, Al Viro
On Tue, Jun 01, 2010 at 01:28:50PM +0300, Boaz Harrosh wrote:
> + if (!(S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode) ||
> + S_ISLNK(inode->i_mode)))
> + return -EINVAL;
> if (exofs_inode_is_fast_symlink(inode))
> - return;
> + return -EINVAL;
do_sys_truncate already makes sure ATTR_SIZE changes only happen on
ISREG files.
> if (IS_APPEND(inode) || IS_IMMUTABLE(inode))
> - return;
Same for these - IS_APPEND directly and IS_IMMUTABLE via
inode_permission().
> + oldsize = inode->i_size;
> + i_size_write(inode, newsize);
> + truncate_pagecache(inode, oldsize, newsize);
At this point you can probably just inline this in exofs_setattr.
Especially as the lines above will be replaced with a single
function call in my next batch.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [RFC] exofs: New truncate sequence
2010-06-01 10:26 ` Boaz Harrosh
@ 2010-06-01 10:44 ` Christoph Hellwig
2010-06-01 11:05 ` Boaz Harrosh
0 siblings, 1 reply; 20+ messages in thread
From: Christoph Hellwig @ 2010-06-01 10:44 UTC (permalink / raw)
To: Boaz Harrosh; +Cc: Christoph Hellwig, Nick Piggin, linux-fsdevel, open-osd
On Tue, Jun 01, 2010 at 01:26:30PM +0300, Boaz Harrosh wrote:
> Imagine my surprise when this morning I see that all of Al's patches
> are in mainline. (that was sarcastic, Stupid me they where there yesterday)
>
> So I'm posting a final version of this patch. It runs over 2.6.35-rc1
> and works really well and I like this patch a lot. I'm putting it
> in osd/linux-next.
>
> Please advise: Should I ask Linus to pull this in for current Kernel?
> If not, then I'll ask Al to put it in his tree for 2.6.36 and you can
> rebase over his tree.
With the other changes I have you'll be much better off rebasing
ontop of my series as it will make your life easier. But if you're
eager to push it out please do it ASAP.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH ver2] exofs: New truncate sequence
2010-06-01 10:43 ` Christoph Hellwig
@ 2010-06-01 10:59 ` Boaz Harrosh
2010-06-01 11:06 ` Christoph Hellwig
0 siblings, 1 reply; 20+ messages in thread
From: Boaz Harrosh @ 2010-06-01 10:59 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Nick Piggin, linux-fsdevel, open-osd, Al Viro
On 06/01/2010 01:43 PM, Christoph Hellwig wrote:
> On Tue, Jun 01, 2010 at 01:28:50PM +0300, Boaz Harrosh wrote:
>> + if (!(S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode) ||
>> + S_ISLNK(inode->i_mode)))
>> + return -EINVAL;
>> if (exofs_inode_is_fast_symlink(inode))
>> - return;
>> + return -EINVAL;
>
> do_sys_truncate already makes sure ATTR_SIZE changes only happen on
> ISREG files.
>
I copy/pasted this from ext2_setsize. (Would you remove it from
there as well?)
>> if (IS_APPEND(inode) || IS_IMMUTABLE(inode))
>> - return;
>
> Same for these - IS_APPEND directly and IS_IMMUTABLE via
> inode_permission().
>
>> + oldsize = inode->i_size;
>> + i_size_write(inode, newsize);
>> + truncate_pagecache(inode, oldsize, newsize);
>
> At this point you can probably just inline this in exofs_setattr.
> Especially as the lines above will be replaced with a single
> function call in my next batch.
>
OK, I'm re-spinning minus these and inlined. I'll want to
re-run a few tests just to make sure.
Thanks
Boaz
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [RFC] exofs: New truncate sequence
2010-06-01 10:44 ` Christoph Hellwig
@ 2010-06-01 11:05 ` Boaz Harrosh
2010-06-01 11:06 ` Christoph Hellwig
0 siblings, 1 reply; 20+ messages in thread
From: Boaz Harrosh @ 2010-06-01 11:05 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Nick Piggin, linux-fsdevel, open-osd
On 06/01/2010 01:44 PM, Christoph Hellwig wrote:
> On Tue, Jun 01, 2010 at 01:26:30PM +0300, Boaz Harrosh wrote:
>> Imagine my surprise when this morning I see that all of Al's patches
>> are in mainline. (that was sarcastic, Stupid me they where there yesterday)
>>
>> So I'm posting a final version of this patch. It runs over 2.6.35-rc1
>> and works really well and I like this patch a lot. I'm putting it
>> in osd/linux-next.
>>
>> Please advise: Should I ask Linus to pull this in for current Kernel?
>> If not, then I'll ask Al to put it in his tree for 2.6.36 and you can
>> rebase over his tree.
>
> With the other changes I have you'll be much better off rebasing
> ontop of my series as it will make your life easier. But if you're
> eager to push it out please do it ASAP.
>
I would prefer to push it, because it will be much easier for me
when juggling with the pnfs tree, Which is based on 2.6.35-rc1.
Thanks
Boaz
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH ver2] exofs: New truncate sequence
2010-06-01 10:59 ` Boaz Harrosh
@ 2010-06-01 11:06 ` Christoph Hellwig
0 siblings, 0 replies; 20+ messages in thread
From: Christoph Hellwig @ 2010-06-01 11:06 UTC (permalink / raw)
To: Boaz Harrosh
Cc: Christoph Hellwig, Nick Piggin, linux-fsdevel, open-osd, Al Viro
On Tue, Jun 01, 2010 at 01:59:53PM +0300, Boaz Harrosh wrote:
> I copy/pasted this from ext2_setsize. (Would you remove it from
> there as well?)
I'll throw in a patch to remove it there.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [RFC] exofs: New truncate sequence
2010-06-01 11:05 ` Boaz Harrosh
@ 2010-06-01 11:06 ` Christoph Hellwig
0 siblings, 0 replies; 20+ messages in thread
From: Christoph Hellwig @ 2010-06-01 11:06 UTC (permalink / raw)
To: Boaz Harrosh; +Cc: Christoph Hellwig, Nick Piggin, linux-fsdevel, open-osd
On Tue, Jun 01, 2010 at 02:05:35PM +0300, Boaz Harrosh wrote:
> I would prefer to push it, because it will be much easier for me
> when juggling with the pnfs tree, Which is based on 2.6.35-rc1.
Ok, then go for it.
^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH ver3] exofs: New truncate sequence
2010-05-31 12:30 [RFC] exofs: New truncate sequence Boaz Harrosh
2010-05-31 13:44 ` Nick Piggin
2010-06-01 10:28 ` [PATCH ver2] " Boaz Harrosh
@ 2010-06-01 11:31 ` Boaz Harrosh
2010-06-01 11:36 ` Christoph Hellwig
2010-06-01 15:09 ` Nick Piggin
2 siblings, 2 replies; 20+ messages in thread
From: Boaz Harrosh @ 2010-06-01 11:31 UTC (permalink / raw)
To: Christoph Hellwig, Nick Piggin, linux-fsdevel, open-osd
These changes are crafted based on the similar
conversion done to ext2 by Nick Piggin.
* Remove the deprecated ->truncate vector. Let exofs_setattr
take care of on-disk size updates.
* Call truncate_pagecache on the unused pages if
write_begin/end fails.
* Cleanup exofs_delete_inode that did stupid inode
writes and updates on an inode that will be
removed.
* And finally get rid of exofs_get_block. We never
had any blocks it was all for calling nobh_truncate_page.
nobh_truncate_page is not actually needed in exofs since
the last page is complete and gone, just like all the other
pages. There is no partial blocks in exofs.
[OK do I need a partial read here upto i_size ???]
I've tested with this patch, and there are no apparent
failures, so far.
CC: Nick Piggin <npiggin@suse.de>
CC: Christoph Hellwig <hch@lst.de>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
---
fs/exofs/exofs.h | 1 -
fs/exofs/file.c | 1 -
fs/exofs/inode.c | 123 +++++++++++++++++++++++------------------------------
3 files changed, 53 insertions(+), 72 deletions(-)
diff --git a/fs/exofs/exofs.h b/fs/exofs/exofs.h
index 22721b2..0706ce9 100644
--- a/fs/exofs/exofs.h
+++ b/fs/exofs/exofs.h
@@ -256,7 +256,6 @@ static inline int exofs_oi_read(struct exofs_i_info *oi,
}
/* inode.c */
-void exofs_truncate(struct inode *inode);
int exofs_setattr(struct dentry *, struct iattr *);
int exofs_write_begin(struct file *file, struct address_space *mapping,
loff_t pos, unsigned len, unsigned flags,
diff --git a/fs/exofs/file.c b/fs/exofs/file.c
index fef6899..f9bfe2b 100644
--- a/fs/exofs/file.c
+++ b/fs/exofs/file.c
@@ -86,6 +86,5 @@ const struct file_operations exofs_file_operations = {
};
const struct inode_operations exofs_file_inode_operations = {
- .truncate = exofs_truncate,
.setattr = exofs_setattr,
};
diff --git a/fs/exofs/inode.c b/fs/exofs/inode.c
index 4bb6ef8..474dc4b 100644
--- a/fs/exofs/inode.c
+++ b/fs/exofs/inode.c
@@ -710,7 +710,7 @@ int exofs_write_begin(struct file *file, struct address_space *mapping,
fsdata);
if (ret) {
EXOFS_DBGMSG("simple_write_begin faild\n");
- return ret;
+ goto out;
}
page = *pagep;
@@ -725,7 +725,14 @@ int exofs_write_begin(struct file *file, struct address_space *mapping,
EXOFS_DBGMSG("__readpage_filler faild\n");
}
}
+out:
+ if (unlikely(ret)) {
+ struct inode *inode = mapping->host;
+ loff_t to = pos + len;
+ if (to > inode->i_size)
+ truncate_pagecache(inode, to, inode->i_size);
+ }
return ret;
}
@@ -750,6 +757,10 @@ static int exofs_write_end(struct file *file, struct address_space *mapping,
int ret;
ret = simple_write_end(file, mapping,pos, len, copied, page, fsdata);
+ if (unlikely(ret && pos + len > inode->i_size))
+ truncate_pagecache(inode, pos + len, inode->i_size);
+
+ /* TODO: once simple_write_end marks inode dirty remove */
if (i_size != inode->i_size)
mark_inode_dirty(inode);
return ret;
@@ -808,86 +819,61 @@ static inline int exofs_inode_is_fast_symlink(struct inode *inode)
return S_ISLNK(inode->i_mode) && (oi->i_data[0] != 0);
}
-/*
- * get_block_t - Fill in a buffer_head
- * An OSD takes care of block allocation so we just fake an allocation by
- * putting in the inode's sector_t in the buffer_head.
- * TODO: What about the case of create==0 and @iblock does not exist in the
- * object?
- */
-static int exofs_get_block(struct inode *inode, sector_t iblock,
- struct buffer_head *bh_result, int create)
-{
- map_bh(bh_result, inode->i_sb, iblock);
- return 0;
-}
-
const struct osd_attr g_attr_logical_length = ATTR_DEF(
OSD_APAGE_OBJECT_INFORMATION, OSD_ATTR_OI_LOGICAL_LENGTH, 8);
-static int _do_truncate(struct inode *inode)
+static int _do_truncate(struct inode *inode, loff_t newsize)
{
struct exofs_i_info *oi = exofs_i(inode);
- loff_t isize = i_size_read(inode);
int ret;
inode->i_mtime = inode->i_ctime = CURRENT_TIME;
- nobh_truncate_page(inode->i_mapping, isize, exofs_get_block);
-
- ret = exofs_oi_truncate(oi, (u64)isize);
- EXOFS_DBGMSG("(0x%lx) size=0x%llx\n", inode->i_ino, isize);
+ ret = exofs_oi_truncate(oi, (u64)newsize);
+ EXOFS_DBGMSG("(0x%lx) size=0x%llx\n", inode->i_ino, newsize);
return ret;
}
/*
- * Truncate a file to the specified size - all we have to do is set the size
- * attribute. We make sure the object exists first.
- */
-void exofs_truncate(struct inode *inode)
-{
- struct exofs_i_info *oi = exofs_i(inode);
- int ret;
-
- if (!(S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode)
- || S_ISLNK(inode->i_mode)))
- return;
- if (exofs_inode_is_fast_symlink(inode))
- return;
- if (IS_APPEND(inode) || IS_IMMUTABLE(inode))
- return;
-
- /* if we are about to truncate an object, and it hasn't been
- * created yet, wait
- */
- if (unlikely(wait_obj_created(oi)))
- goto fail;
-
- ret = _do_truncate(inode);
- if (ret)
- goto fail;
-
-out:
- mark_inode_dirty(inode);
- return;
-fail:
- make_bad_inode(inode);
- goto out;
-}
-
-/*
- * Set inode attributes - just call generic functions.
+ * Set inode attributes - update size attribute on OSD if needed,
+ * otherwise just call generic functions.
*/
int exofs_setattr(struct dentry *dentry, struct iattr *iattr)
{
struct inode *inode = dentry->d_inode;
int error;
+ /* if we are about to modify an object, and it hasn't been
+ * created yet, wait
+ */
+ error = wait_obj_created(exofs_i(inode));
+ if (unlikely(error))
+ return error;
+
error = inode_change_ok(inode, iattr);
if (error)
return error;
- error = inode_setattr(inode, iattr);
+ if (iattr->ia_valid & ATTR_SIZE) {
+ loff_t newsize = iattr->ia_size;
+ loff_t oldsize;
+
+ error = inode_newsize_ok(inode, newsize);
+ if (error)
+ return error;
+
+ oldsize = inode->i_size;
+ i_size_write(inode, newsize);
+ truncate_pagecache(inode, oldsize, newsize);
+
+ error = _do_truncate(inode, newsize);
+ if (error)
+ return error;
+ }
+
+ generic_setattr(inode, iattr);
+ mark_inode_dirty(inode);
+
return error;
}
@@ -1335,28 +1321,25 @@ void exofs_delete_inode(struct inode *inode)
truncate_inode_pages(&inode->i_data, 0);
+ /* TODO: should do better here */
if (is_bad_inode(inode))
goto no_delete;
- mark_inode_dirty(inode);
- exofs_update_inode(inode, inode_needs_sync(inode));
-
inode->i_size = 0;
- if (inode->i_blocks)
- exofs_truncate(inode);
-
clear_inode(inode);
- ret = exofs_get_io_state(&sbi->layout, &ios);
- if (unlikely(ret)) {
- EXOFS_ERR("%s: exofs_get_io_state failed\n", __func__);
- return;
- }
-
/* if we are deleting an obj that hasn't been created yet, wait */
if (!obj_created(oi)) {
BUG_ON(!obj_2bcreated(oi));
wait_event(oi->i_wq, obj_created(oi));
+ /* ignore the error attempt a remove anyway */
+ }
+
+ /* Now Remove the OSD objects */
+ ret = exofs_get_io_state(&sbi->layout, &ios);
+ if (unlikely(ret)) {
+ EXOFS_ERR("%s: exofs_get_io_state failed\n", __func__);
+ return;
}
ios->obj.id = exofs_oi_objno(oi);
--
1.6.6.1
^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH ver3] exofs: New truncate sequence
2010-06-01 11:31 ` [PATCH ver3] " Boaz Harrosh
@ 2010-06-01 11:36 ` Christoph Hellwig
2010-06-01 11:52 ` Boaz Harrosh
2010-06-01 15:09 ` Nick Piggin
1 sibling, 1 reply; 20+ messages in thread
From: Christoph Hellwig @ 2010-06-01 11:36 UTC (permalink / raw)
To: Boaz Harrosh; +Cc: Christoph Hellwig, Nick Piggin, linux-fsdevel, open-osd
On Tue, Jun 01, 2010 at 02:31:30PM +0300, Boaz Harrosh wrote:
> +static int _do_truncate(struct inode *inode, loff_t newsize)
> {
> struct exofs_i_info *oi = exofs_i(inode);
> int ret;
>
> inode->i_mtime = inode->i_ctime = CURRENT_TIME;
>
> + ret = exofs_oi_truncate(oi, (u64)newsize);
> + EXOFS_DBGMSG("(0x%lx) size=0x%llx\n", inode->i_ino, newsize);
> return ret;
Maybe I'm beeing picky, but I really don't see the need for this
function either, especially given that is has a single caller.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH ver3] exofs: New truncate sequence
2010-06-01 11:36 ` Christoph Hellwig
@ 2010-06-01 11:52 ` Boaz Harrosh
0 siblings, 0 replies; 20+ messages in thread
From: Boaz Harrosh @ 2010-06-01 11:52 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Nick Piggin, linux-fsdevel, open-osd
On 06/01/2010 02:36 PM, Christoph Hellwig wrote:
> On Tue, Jun 01, 2010 at 02:31:30PM +0300, Boaz Harrosh wrote:
>> +static int _do_truncate(struct inode *inode, loff_t newsize)
>> {
>> struct exofs_i_info *oi = exofs_i(inode);
>> int ret;
>>
>> inode->i_mtime = inode->i_ctime = CURRENT_TIME;
>>
>> + ret = exofs_oi_truncate(oi, (u64)newsize);
>> + EXOFS_DBGMSG("(0x%lx) size=0x%llx\n", inode->i_ino, newsize);
>> return ret;
>
> Maybe I'm beeing picky, but I really don't see the need for this
> function either, especially given that is has a single caller.
>
It's fine.
It must stay for a very hidden reason. In the pnfs tree, this inside
chunk is preformed from a callback pointer within the pnfs_recall mechanism.
The pnfs export layer has a recall facility that receives a pointer to
a doer. It will block all layouts, recall the layout, preform the doer
let in layouts, and return. So this is split this way to minimize maintenance
efforts.
See an old patch here:
http://git.open-osd.org/gitweb.cgi?p=linux-open-osd.git;a=commitdiff;h=ec3c599368550b0cf792738ae6448c932244df51
specifically the small hunk at inode.c
Thanks
Boaz
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH ver3] exofs: New truncate sequence
2010-06-01 11:31 ` [PATCH ver3] " Boaz Harrosh
2010-06-01 11:36 ` Christoph Hellwig
@ 2010-06-01 15:09 ` Nick Piggin
1 sibling, 0 replies; 20+ messages in thread
From: Nick Piggin @ 2010-06-01 15:09 UTC (permalink / raw)
To: Boaz Harrosh; +Cc: Christoph Hellwig, linux-fsdevel, open-osd
On Tue, Jun 01, 2010 at 02:31:30PM +0300, Boaz Harrosh wrote:
>
> These changes are crafted based on the similar
> conversion done to ext2 by Nick Piggin.
>
> * Remove the deprecated ->truncate vector. Let exofs_setattr
> take care of on-disk size updates.
> * Call truncate_pagecache on the unused pages if
> write_begin/end fails.
> * Cleanup exofs_delete_inode that did stupid inode
> writes and updates on an inode that will be
> removed.
> * And finally get rid of exofs_get_block. We never
> had any blocks it was all for calling nobh_truncate_page.
> nobh_truncate_page is not actually needed in exofs since
> the last page is complete and gone, just like all the other
> pages. There is no partial blocks in exofs.
> [OK do I need a partial read here upto i_size ???]
You shouldn't need to. We do have some interesting code trying to deal
with mmap data past i_size in core/buffer code. I hope Jan's patches
can improve that.
But anyway if you are not doing byte based IO, you have to remember
that if you mark a page uptodate, it should not contain uninitialized
data past i_size. Also, in the writeback path, junk data can get
written past i_size too.
The latter case is difficult to avoid (due to concurrent mmap writes).
It is not technically a leak perhaps, but it's a bit nasty if you allow
data to be hidden past i_size. The best course of action IMO is to
zero this out at read-in and truncate-extend time.
Doesn't seem like you have to worry about any of that if you have
byte granularity , although you do have EXOFS_BLKSHIFT defined?
^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2010-06-01 15:09 UTC | newest]
Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-05-31 12:30 [RFC] exofs: New truncate sequence Boaz Harrosh
2010-05-31 13:44 ` Nick Piggin
2010-05-31 14:13 ` Boaz Harrosh
2010-05-31 14:33 ` Nick Piggin
2010-05-31 14:50 ` Boaz Harrosh
2010-05-31 15:09 ` Nick Piggin
2010-05-31 15:19 ` Boaz Harrosh
2010-06-01 10:08 ` Christoph Hellwig
2010-06-01 10:26 ` Boaz Harrosh
2010-06-01 10:44 ` Christoph Hellwig
2010-06-01 11:05 ` Boaz Harrosh
2010-06-01 11:06 ` Christoph Hellwig
2010-06-01 10:28 ` [PATCH ver2] " Boaz Harrosh
2010-06-01 10:43 ` Christoph Hellwig
2010-06-01 10:59 ` Boaz Harrosh
2010-06-01 11:06 ` Christoph Hellwig
2010-06-01 11:31 ` [PATCH ver3] " Boaz Harrosh
2010-06-01 11:36 ` Christoph Hellwig
2010-06-01 11:52 ` Boaz Harrosh
2010-06-01 15:09 ` Nick Piggin
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).