* [Ocfs2-devel] OCFS2 and direct-io writes
@ 2008-06-04  0:22 Eivind Sarto
  2008-06-05 23:10 ` Sunil Mushran
  0 siblings, 1 reply; 3+ messages in thread
From: Eivind Sarto @ 2008-06-04  0:22 UTC (permalink / raw)
  To: ocfs2-devel

I am looking at the possibility of using OCFS2 with an existing application that
requires very high throughput for read and write file access.
Files are created by a single writer (process) and can be read by multiple readers,
possibly while the file is being written.  100+ different files may be written
simultaneously, and can be read by 1000+ readers.

I am currently using XFS on a local filesystem, preallocating the unwritten extents with RESVSP,
and writing and reading the files with large direct-io requests.

OCFS2-1.3.9 appears to almost support the features I need.  Large direct-io requests can be passed straight
through to the storage device and allocation of unwritten extents is supported (even the same API as XFS).
However, direct-io writes are not supported if the file is being appended.  The direct-io request
is converted to buffered io and the write bandwidth is not very good.

I am not familiar with OCFS2 internals and my question is the following:
Would it be possible to modify OCFS2 to support direct-io when writing a file sequentially?
Would it be easier if the data blocks had already been allocated as unwritten extents (using RESVSP)?


I actually attempted to hack the OCFS2 code a bit to allow direct-io writes to happen when the extents
had previously been allocated with RESVSP.  It only took a couple of minor changes:
  file.c:ocfs2_prepare_inode_for_write()
      Don't disable direct_io if file is growing.
  file.c:ocfs2_check_range_for_holes()
      Don't treat unwritten extents as holes.
  aops.c:ocfs2_direct_IO_get_blocks()
      Map unwritten extents if they exist.
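
The effect of the three changes listed above can be sketched with a toy model. This is not OCFS2 code: the `State`, `Extent`, `range_state` and `can_direct_write` names are hypothetical, and the "stock" branch only encodes the behaviour described in this message (growing files and unwritten extents fall back to buffered io).

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    HOLE = "hole"            # no blocks allocated
    UNWRITTEN = "unwritten"  # blocks reserved (e.g. via RESVSP), never written
    WRITTEN = "written"      # blocks allocated and initialized


@dataclass
class Extent:
    start: int   # byte offset of the extent
    length: int  # extent length in bytes
    state: State


def range_state(extents, offset, length):
    """Return the set of states covering [offset, offset + length).

    Any byte not covered by an extent counts as a HOLE.
    Extents are assumed not to overlap each other.
    """
    end = offset + length
    states = set()
    pos = offset
    for ext in sorted(extents, key=lambda e: e.start):
        ext_end = ext.start + ext.length
        if ext_end <= pos or ext.start >= end:
            continue  # extent lies entirely outside the remaining range
        if ext.start > pos:
            states.add(State.HOLE)  # gap before this extent
        states.add(ext.state)
        pos = min(ext_end, end)
    if pos < end:
        states.add(State.HOLE)  # tail of the range has no extent
    return states


def can_direct_write(extents, i_size, offset, length, patched):
    """Decide whether a write may stay on the direct-io path (toy rule)."""
    states = range_state(extents, offset, length)
    if not patched:
        # Behaviour as described: a growing file disables direct-io,
        # and unwritten extents are treated like holes.
        if offset + length > i_size:
            return False
        return states == {State.WRITTEN}
    # Hacked behaviour: only true holes force the buffered fallback.
    return State.HOLE not in states
```

With one written 4 KiB extent followed by 8 KiB of preallocated space, an appending write into the preallocated range is rejected by the stock rule and accepted by the patched one, while a write past the preallocation still falls back in both cases.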

With these changes, a single/local OCFS2 filesystem allows me to write/create files using
large direct-io requests.  All the write requests go straight through to the storage, and the write performance
is very close to that of XFS.
But in a distributed environment the inode->i_size does not get synchronized with the other nodes in
the cluster.  The direct-io path does not synchronize the inode->i_size.

Would it be possible to safely update the i_size for all nodes in a cluster, without causing any
races or other problems?
If so, does anyone have any suggestions as to how and where in the code I could synchronize the i_size?

Any feedback would be appreciated.
Thanks,
-ivan 

* [Ocfs2-devel] OCFS2 and direct-io writes
  2008-06-04  0:22 [Ocfs2-devel] OCFS2 and direct-io writes Eivind Sarto
@ 2008-06-05 23:10 ` Sunil Mushran
  2008-06-06  0:04   ` Eivind Sarto
  0 siblings, 1 reply; 3+ messages in thread
From: Sunil Mushran @ 2008-06-05 23:10 UTC (permalink / raw)
  To: ocfs2-devel

Ivan,

Updating inode->i_size will require us to take the EX on the inode
cluster lock. (We take great pains to avoid taking that lock
in the directio path lest we serialize those ios across the
cluster.)

As far as treating unwritten extents as holes goes, we do that
simply to remember to initialize them, which is more efficient
in the buffered path. Skipping this will be a security hole.
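
One plausible reading of the security concern, as a toy sketch: if a write covers only part of a previously unwritten cluster and the extent is then marked written without initializing the rest, whatever was on disk before becomes readable. The `CLUSTER` size and the `write_into_unwritten` helper are hypothetical and stand in for no real OCFS2 function.

```python
CLUSTER = 8  # toy cluster size in bytes, chosen only to keep the output short


def write_into_unwritten(cluster, offset, data, initialize):
    """Write `data` at `offset` inside a cluster that was unwritten.

    `cluster` is a bytearray holding whatever was previously on disk.
    The return value is what a reader sees once the extent is marked
    written.
    """
    if initialize:
        # Zero the whole cluster before exposing it to readers.
        cluster[:] = bytes(len(cluster))
    cluster[offset:offset + len(data)] = data
    return bytes(cluster)


stale = b"SECRET!!"  # leftover data from some deleted file
safe = write_into_unwritten(bytearray(stale), 0, b"ab", initialize=True)
leaky = write_into_unwritten(bytearray(stale), 0, b"ab", initialize=False)
```

Here `safe` contains the two new bytes followed by zeros, while `leaky` still carries the tail of the stale data.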

Mark, Comments?

Also cc-ing Chris in case he can shed some light on XFS behavior.

Sunil



* [Ocfs2-devel] OCFS2 and direct-io writes
  2008-06-05 23:10 ` Sunil Mushran
@ 2008-06-06  0:04   ` Eivind Sarto
  0 siblings, 0 replies; 3+ messages in thread
From: Eivind Sarto @ 2008-06-06  0:04 UTC (permalink / raw)
  To: ocfs2-devel

Thanks for the reply.

I have spent a bit more time looking at the OCFS2 code and it will clearly
require an EX lock.  It will also be necessary to convert the unwritten extents and
synchronize the updated inode->i_size across all nodes.

The main problem is that this should probably be done from within ocfs2_dio_end_io().
But I think that routine is called from interrupt context, which prevents me
from calling anything that could block.
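
A common way around a non-blocking completion callback is to queue the work and let a thread in process context do the blocking part. The sketch below models that pattern only; it is not a proposal for the actual OCFS2 change. `Inode`, `dio_end_io`, `worker` and `run_demo` are hypothetical names, and a `threading.Lock` stands in for the EX cluster lock.

```python
import queue
import threading


class Inode:
    def __init__(self):
        self.i_size = 0
        self.ex_lock = threading.Lock()  # stand-in for the EX cluster lock


work = queue.Queue()  # unbounded, so put_nowait() never blocks or fails


def dio_end_io(inode, offset, nbytes):
    """Completion callback: must not block, so it only queues the update."""
    work.put_nowait((inode, offset + nbytes))


def worker():
    """Runs in process context, where taking a blocking lock is allowed."""
    while True:
        item = work.get()
        if item is None:  # sentinel: no more completions
            break
        inode, new_size = item
        with inode.ex_lock:
            # Only ever grow i_size; completions may arrive out of order.
            inode.i_size = max(inode.i_size, new_size)


def run_demo():
    inode = Inode()
    t = threading.Thread(target=worker)
    t.start()
    dio_end_io(inode, 4096, 4096)  # second half completes first
    dio_end_io(inode, 0, 4096)
    work.put(None)                 # tell the worker to exit
    t.join()
    return inode.i_size
```

The callback itself never takes the lock, so it stays safe to call from a context that cannot sleep; the cost is that i_size becomes visible slightly after the io completes.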

I am going to try some quick and dirty hacks that will allow me to get direct-io writes working so
I can test OCFS2 with our video server.
If the results are promising, I will think about the right way to make this work.

Thanks,
-ivan 


