linux-nfs.vger.kernel.org archive mirror
* Re: [RFC] spnfs-block: restore i_op->fallocate
       [not found] <1301500460-16467-1-git-send-email-bhalevy@panasas.com>
@ 2011-03-30 15:58 ` Christoph Hellwig
  2011-03-30 17:11   ` Benny Halevy
  0 siblings, 1 reply; 6+ messages in thread
From: Christoph Hellwig @ 2011-03-30 15:58 UTC (permalink / raw)
  To: Benny Halevy; +Cc: Christoph Hellwig, Jim Rees, linux-nfs

On Wed, Mar 30, 2011 at 05:54:20PM +0200, Benny Halevy wrote:
> spnfsd-blocks needs the old inode_operations API for fallocate,
> as it does not have a struct file in hand.
> 
> As all file systems (except xfs) currently use the struct file argument
> just to get to the inode, move their implementations back into an inode
> operation. Introduce generic_file_fallocate, for use as the file_operations
> method, which just does that and calls i_op->fallocate.
> 
> Refactor the xfs implementation and introduce _xfs_vn_fallocate,
> which takes an additional attr_flags argument whose value depends on the
> struct file argument to xfs_file_fallocate.

NAK.  Not only is spnfsd-block not upstream, but it probably never will be,
given what a piece of junk it is.

Second, making fallocate a file operation was done on purpose, and all the
other filesystems need the same fix that xfs has: making the allocation
stable if done on an O_SYNC file descriptor.
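To make the O_SYNC argument concrete, here is a toy userspace model in C, not kernel code. The names model_inode, model_file, model_commit and model_file_fallocate are hypothetical stand-ins; the sketch only shows that a file-based fallocate can see the open flags and make the allocation stable immediately when the descriptor was opened with O_SYNC, which an inode-only operation cannot decide by itself.

```c
#include <assert.h>
#include <fcntl.h>     /* O_SYNC, O_RDWR */
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-ins for kernel objects. */
struct model_inode {
    long allocated;       /* bytes allocated so far */
    bool alloc_stable;    /* true once the allocation has been committed */
};

struct model_file {
    int f_flags;                  /* open(2) flags, e.g. O_SYNC */
    struct model_inode *inode;
};

/* Pretend "transaction commit" that makes the allocation stable. */
static void model_commit(struct model_inode *inode)
{
    inode->alloc_stable = true;
}

/* File-based fallocate: having the file lets us honour O_SYNC. */
static int model_file_fallocate(struct model_file *file, long offset, long len)
{
    struct model_inode *inode = file->inode;

    if (offset < 0 || len <= 0)
        return -1;
    if (offset + len > inode->allocated)
        inode->allocated = offset + len;
    inode->alloc_stable = false;      /* allocation exists only in memory */
    if (file->f_flags & O_SYNC)
        model_commit(inode);          /* O_SYNC: make it stable right away */
    return 0;
}
```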



* Re: [RFC] spnfs-block: restore i_op->fallocate
  2011-03-30 15:58 ` [RFC] spnfs-block: restore i_op->fallocate Christoph Hellwig
@ 2011-03-30 17:11   ` Benny Halevy
  2011-03-30 17:33     ` Christoph Hellwig
  0 siblings, 1 reply; 6+ messages in thread
From: Benny Halevy @ 2011-03-30 17:11 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Rees, linux-nfs

On 2011-03-30 17:58, Christoph Hellwig wrote:

> On Wed, Mar 30, 2011 at 05:54:20PM +0200, Benny Halevy wrote:
>> spnfsd-blocks needs the old inode_operations API for fallocate,
>> as it does not have a struct file in hand.
>>
>> As all file systems (except xfs) currently use the struct file argument
>> just to get to the inode, move their implementations back into an inode
>> operation. Introduce generic_file_fallocate, for use as the file_operations
>> method, which just does that and calls i_op->fallocate.
>>
>> Refactor the xfs implementation and introduce _xfs_vn_fallocate,
>> which takes an additional attr_flags argument whose value depends on the
>> struct file argument to xfs_file_fallocate.
> NAK.  Not only is spnfsd-block not upstream, but it probably never will be,
> given what a piece of junk it is.
>
> Second, making fallocate a file operation was done on purpose, and all the

I understand that from the API perspective, but note that other than O_SYNC
there's no use for the struct file * passed in.

> other filesystems need the same fix that xfs has: making the allocation
> stable if done on an O_SYNC file descriptor.

Makes sense. This could also be done by adding a corresponding flags argument
to fallocate and having a common wrapper function look at the file descriptor
and call the fs fallocate, which could then take the inode rather than the file.
In other words, why copy code rather than factor it out into a common
function?
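A minimal sketch of the factoring proposed above, again as a hypothetical userspace model rather than the real VFS: one wrapper looks at the file's O_SYNC flag and translates it into a flags argument for an inode-level operation, so a caller without a struct file could call the inode operation directly. MODEL_FALLOC_SYNC, model_generic_file_fallocate and examplefs_fallocate are invented names for illustration, not existing kernel symbols.

```c
#include <assert.h>
#include <fcntl.h>    /* O_SYNC */
#include <stddef.h>   /* NULL */

/* Hypothetical flag for the inode-level operation. */
#define MODEL_FALLOC_SYNC 0x1

struct model_inode;

/* Hypothetical inode operation table (like a restored i_op->fallocate). */
struct model_inode_ops {
    int (*fallocate)(struct model_inode *inode, int flags, long offset, long len);
};

struct model_inode {
    const struct model_inode_ops *i_op;
    long allocated;
    int last_flags;    /* what the fs was last asked to do, for the demo */
};

struct model_file {
    int f_flags;
    struct model_inode *inode;
};

/* Common wrapper: the only place that looks at the struct file. */
static int model_generic_file_fallocate(struct model_file *file, long offset, long len)
{
    struct model_inode *inode = file->inode;
    int flags = 0;

    if (inode->i_op == NULL || inode->i_op->fallocate == NULL)
        return -1;                    /* fs has no fallocate support */
    if (file->f_flags & O_SYNC)
        flags |= MODEL_FALLOC_SYNC;
    return inode->i_op->fallocate(inode, flags, offset, len);
}

/* Example filesystem implementation: works on the inode only. */
static int examplefs_fallocate(struct model_inode *inode, int flags, long offset, long len)
{
    if (offset < 0 || len <= 0)
        return -1;
    if (offset + len > inode->allocated)
        inode->allocated = offset + len;
    inode->last_flags = flags;
    return 0;
}

static const struct model_inode_ops examplefs_iops = { examplefs_fallocate };
```

The design choice is that the O_SYNC check lives in one place instead of being copied into each filesystem's file_operations method.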

Benny


* Re: [RFC] spnfs-block: restore i_op->fallocate
  2011-03-30 17:11   ` Benny Halevy
@ 2011-03-30 17:33     ` Christoph Hellwig
  2011-03-31  6:53       ` Benny Halevy
  0 siblings, 1 reply; 6+ messages in thread
From: Christoph Hellwig @ 2011-03-30 17:33 UTC (permalink / raw)
  To: Benny Halevy; +Cc: Rees, linux-nfs

On Wed, Mar 30, 2011 at 07:11:47PM +0200, Benny Halevy wrote:
> Makes sense. This could also be done by adding a corresponding flags argument
> to fallocate and having a common wrapper function look at the file descriptor
> and call the fs fallocate, which could then take the inode rather than the file.
> In other words, why copy code rather than factor it out into a common
> function?

We can discuss that _iff_ a valid use for a file-less fallocate appears
in mainline.  The pnfs-block one is not.  It's just a racy hack, which
opens gaping holes.  Take a look at what it does: it allocates blocks for
a client to write into directly, with absolutely zero guarantee that the
block allocation actually stays around until that point.

You'll need to have some outstanding token on extent map changes, as
done in CXFS or NEC's "gfs", which implemented something similar to pnfs
based on nfsv3.
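The "outstanding token" requirement can be sketched as a toy model, assuming a per-file generation counter and a count of tokens held by clients. This illustrates the idea only; it is not how CXFS, NEC's gfs or any pNFS server implements it, and every name is hypothetical. The server refuses to change the extent map while a token is outstanding (a real server would recall the tokens first), and a client writes directly only while its token is valid for the current generation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical server-side state for one file's extent map. */
struct extent_map {
    unsigned long generation;   /* bumped on every allocation change */
    int outstanding_tokens;     /* clients currently holding a token */
};

/* Hypothetical token a client holds while writing directly to blocks. */
struct layout_token {
    unsigned long generation;
    bool valid;
};

static struct layout_token grant_token(struct extent_map *map)
{
    struct layout_token t = { map->generation, true };
    map->outstanding_tokens++;
    return t;
}

static void return_token(struct extent_map *map, struct layout_token *t)
{
    if (t->valid) {
        t->valid = false;
        map->outstanding_tokens--;
    }
}

/* Changing the extent map (e.g. freeing blocks) is refused while any
 * token is outstanding; a real server would recall the tokens first. */
static bool change_extent_map(struct extent_map *map)
{
    if (map->outstanding_tokens > 0)
        return false;
    map->generation++;
    return true;
}

/* Client-side check before issuing a direct write to the device. */
static bool can_write_direct(const struct extent_map *map, const struct layout_token *t)
{
    return t->valid && t->generation == map->generation;
}
```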



* Re: [RFC] spnfs-block: restore i_op->fallocate
  2011-03-30 17:33     ` Christoph Hellwig
@ 2011-03-31  6:53       ` Benny Halevy
  2011-03-31 13:53         ` Christoph Hellwig
  0 siblings, 1 reply; 6+ messages in thread
From: Benny Halevy @ 2011-03-31  6:53 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Rees, linux-nfs

On 2011-03-30 19:33, Christoph Hellwig wrote:

> On Wed, Mar 30, 2011 at 07:11:47PM +0200, Benny Halevy wrote:
>> Makes sense. This could also be done by adding a corresponding flags argument
>> to fallocate and having a common wrapper function look at the file descriptor
>> and call the fs fallocate, which could then take the inode rather than the file.
>> In other words, why copy code rather than factor it out into a common
>> function?
> We can discuss that _iff_ a valid use for a file-less fallocate appears
> in mainline.  The pnfs-block one is not.  It's just a racy hack, which
> opens gaping holes.  Take a look at what it does: it allocates blocks for
> a client to write into directly, with absolutely zero guarantee that the
> block allocation actually stays around until that point.
>
> You'll need to have some outstanding token on extent map changes, as
> done in CXFS or NEC's "gfs", which implemented something similar to pnfs
> based on nfsv3.

Agreed.

Benny


* Re: [RFC] spnfs-block: restore i_op->fallocate
  2011-03-31  6:53       ` Benny Halevy
@ 2011-03-31 13:53         ` Christoph Hellwig
  2011-04-01  8:30           ` Benny Halevy
  0 siblings, 1 reply; 6+ messages in thread
From: Christoph Hellwig @ 2011-03-31 13:53 UTC (permalink / raw)
  To: Benny Halevy; +Cc: Rees, linux-nfs

Btw, how is the spnfs-block support supposed to work at all?

fallocate creates unwritten extents, and I can't actually
spot a place that would later convert them to regular extents.
And how does it work for filesystems without ->fallocate, like
ext3?

And how do we prevent clients from reading uninitialized
blocks in areas allocated on the server but not yet written
to?  Is there anything like unwritten extents in the
on-the-wire protocol?
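A toy model of the problem raised above, assuming a tiny hypothetical block size and a two-state extent. Reads of an unwritten extent return zeros instead of stale disk contents, which protects readers; but a client that writes directly to the device changes the bytes without converting the extent, so server-side reads keep returning zeros until some conversion step runs. All names here are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define BLK 8   /* tiny hypothetical block size for the demo */

enum ext_state { EXT_UNWRITTEN, EXT_WRITTEN };

struct model_extent {
    enum ext_state state;
    unsigned char disk[BLK];   /* raw on-disk bytes, possibly stale */
};

/* Reads of an unwritten extent return zeros, never the stale disk bytes. */
static void extent_read(const struct model_extent *e, unsigned char *buf)
{
    if (e->state == EXT_UNWRITTEN)
        memset(buf, 0, BLK);
    else
        memcpy(buf, e->disk, BLK);
}

/* A client writing straight to the device changes the bytes but not the
 * extent state; without a later conversion, reads still see zeros. */
static void client_direct_write(struct model_extent *e, const unsigned char *data)
{
    memcpy(e->disk, data, BLK);
}

/* Writing through the filesystem also converts the extent. */
static void extent_write_and_convert(struct model_extent *e, const unsigned char *data)
{
    memcpy(e->disk, data, BLK);
    e->state = EXT_WRITTEN;    /* the unwritten -> written conversion */
}
```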



* Re: [RFC] spnfs-block: restore i_op->fallocate
  2011-03-31 13:53         ` Christoph Hellwig
@ 2011-04-01  8:30           ` Benny Halevy
  0 siblings, 0 replies; 6+ messages in thread
From: Benny Halevy @ 2011-04-01  8:30 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Rees, linux-nfs

On 2011-03-31 09:53, Christoph Hellwig wrote:

> Btw, how is the spnfs-block support supposed to work at all?
>
> fallocate creates unwritten extents, and I can't actually
> spot a place that would later convert them to regular extents.

It's supposed to work by committing the extents on
layoutcommit. That is supposed to happen in spnfs-block,
but it doesn't. Currently, the generic layer calls write_inode_now
if the size changes and the fs is exported "sync", so my guess is that
it currently works only when the file is extended, not when writing
in place into holes.
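A minimal sketch of what LAYOUTCOMMIT handling is supposed to do according to the reply above, using hypothetical names and a tiny fixed-size file: convert the committed range from unwritten to written regardless of whether the file size changes, so in-place writes into preallocated blocks are covered, not only file extension.

```c
#include <assert.h>

#define NBLOCKS 8   /* tiny hypothetical file, measured in blocks */

/* Hypothetical per-block allocation state on the server. */
enum blk_state { BLK_HOLE, BLK_UNWRITTEN, BLK_WRITTEN };

struct model_file_state {
    enum blk_state blocks[NBLOCKS];
    long size_blocks;               /* file size, in blocks */
};

/* Convert the committed range [first, first + count) from unwritten to
 * written, whether or not the size changes; grow the size if needed.
 * Returns the number of blocks converted, or -1 on a bad range. */
static int layoutcommit_convert(struct model_file_state *f, long first,
                                long count, long new_size)
{
    int converted = 0;

    if (first < 0 || count < 0)
        return -1;
    for (long b = first; b < first + count && b < NBLOCKS; b++) {
        if (f->blocks[b] == BLK_UNWRITTEN) {
            f->blocks[b] = BLK_WRITTEN;
            converted++;
        }
    }
    if (new_size > f->size_blocks)
        f->size_blocks = new_size;
    return converted;
}
```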

> And how does it work for filesystems without ->fallocate, like
> ext3?

It doesn't.  spnfs-block requires fs support for fallocate and fiemap.

> And how do we prevent clients from reading uninitialized
> blocks in areas allocated on the server but not yet written
> to?  Is there anything like unwritten extents in the
> on-the-wire protocol?

Yes, there is, but spnfs-block does not implement it,
as it was written essentially as a reference/testing tool.

The protocol allows the server to provisionally allocate space
on LAYOUTGET that the client can write into privately.
The client's changes only become visible to other clients
when they are committed to the file on LAYOUTCOMMIT.
This also allows implementing copy-on-write, as the client
can be given separate extents in the layout describing the
readable copy of a block and the writable one, and the
client participates in the copy-on-write process by copying
the contents of the block before modifying it (or zeroing it out
if it's just invalid).  This is done at write_begin time on
the client side.
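The client-side copy-on-write step can be sketched as follows. The three extent states are loosely modeled on the extent states of the pNFS block layout (RFC 5663), but the enum names, the in-memory "device" array and write_begin itself are hypothetical simplifications. For a partial-block write, the client first copies the readable block into the writable one, or zero-fills it if the data is invalid, so the bytes it does not overwrite are correct.

```c
#include <assert.h>
#include <string.h>

#define BLK 8    /* tiny hypothetical block size */
#define NDEV 4   /* tiny hypothetical "device" of 4 blocks */

/* Hypothetical extent states, loosely modeled on the pNFS block layout. */
enum lo_state {
    LO_READ_WRITE,   /* client may read and write this block in place */
    LO_COW,          /* read from read_blk, write into a separate write_blk */
    LO_INVALID       /* allocated but never written: contents are garbage */
};

struct lo_extent {
    enum lo_state state;
    int read_blk;    /* where the current data lives */
    int write_blk;   /* where the client must write */
};

static unsigned char dev[NDEV][BLK];

/* Client-side write_begin for a write at [off, off + len) within one block:
 * prepare write_blk so the bytes we do not overwrite are correct.
 * Returns the block to write into, or -1 on a bad range. */
static int write_begin(const struct lo_extent *e, int off, int len)
{
    if (off < 0 || len < 0 || off + len > BLK)
        return -1;
    if (off == 0 && len == BLK)
        return e->write_blk;          /* full overwrite: nothing to prepare */
    switch (e->state) {
    case LO_READ_WRITE:
        break;                        /* write in place */
    case LO_COW:
        memcpy(dev[e->write_blk], dev[e->read_blk], BLK);  /* copy old data */
        break;
    case LO_INVALID:
        memset(dev[e->write_blk], 0, BLK);   /* never expose garbage */
        break;
    }
    return e->write_blk;
}
```

The readable copy in read_blk is left untouched, so other clients keep seeing the old data until the change is committed.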

Benny
