* Zero-copy Block IO with XFS
@ 2007-12-11 11:38 Matthew Hodgson
  2007-12-11 16:39 ` Bhagi rathi
  2007-12-12 11:07 ` David Chinner
  0 siblings, 2 replies; 5+ messages in thread
From: Matthew Hodgson @ 2007-12-11 11:38 UTC
  To: xfs

Hi all,

I'm experimenting with using XFS with a network block device (DST), and 
have come up against the problem that when writing data to the network, 
it uses kernel_sendpage to hand the page presented at the BIO layer to 
the network stack.  It then completes the block IO request.

The problem arises when XFS proceeds to then reuse that page before the 
NIC actually sends it.  Particularly if TX checksumming or TCP 
segmentation is being offloaded to the NIC, it seems that the NIC will 
try to access the page after the BIO request has returned, and so operate 
on stale data.  I assume the same problem might happen in the case of 
TCP retransmits or similar.  The motivation for using sendpage rather 
than sendmsg (or using sendpage on a copy of the original page) is to 
try to ensure speed by a zero-copy path through the subsystem.

Is there any way at all in which XFS would be able to (theoretically) 
expose an API to allow an underlying block device to retain ownership of 
pages until it's done with them, so as to avoid a potentially needless 
copy?  Or is there another way of achieving this?

thanks in advance,

Matthew.


-- 
Matthew Hodgson <matthew@mxtelecom.com>
Media & Systems Project Manager
Tel: +44 (0) 845 666 7778
http://www.mxtelecom.com


* Re: Zero-copy Block IO with XFS
  2007-12-11 11:38 Zero-copy Block IO with XFS Matthew Hodgson
@ 2007-12-11 16:39 ` Bhagi rathi
  2007-12-12  1:52   ` Matthew Hodgson
  2007-12-12 11:07 ` David Chinner
  1 sibling, 1 reply; 5+ messages in thread
From: Bhagi rathi @ 2007-12-11 16:39 UTC
  To: Matthew Hodgson; +Cc: xfs

On Dec 11, 2007 5:08 PM, Matthew Hodgson <matthew@mxtelecom.com> wrote:

> Hi all,
>
> I'm experimenting with using XFS with a network block device (DST), and
> have come up against the problem that when writing data to the network,
> it uses kernel_sendpage to hand the page presented at the BIO layer to
> the network stack.  It then completes the block IO request.


Actually, you can pass a sendpage read actor which takes a reference on
the page, which ensures that a valid page stays with you. As long as you
hold a ref on the page and there is no truncate of the same file, you can
safely access the file. Once the NIC has sent the data over the wire, you
can do put_page. This should work. By the way, Linux sendfile does
something similar to the above. Look into the read actor used by the NFS
server's sendfile path.

-Cheers,
  Bhagi.





* Re: Zero-copy Block IO with XFS
  2007-12-11 16:39 ` Bhagi rathi
@ 2007-12-12  1:52   ` Matthew Hodgson
  0 siblings, 0 replies; 5+ messages in thread
From: Matthew Hodgson @ 2007-12-12  1:52 UTC
  To: xfs; +Cc: Bhagi rathi

On Tue, 11 Dec 2007, Bhagi rathi wrote:

> On Dec 11, 2007 5:08 PM, Matthew Hodgson <matthew@mxtelecom.com> wrote:
>
>> Hi all,
>>
>> I'm experimenting with using XFS with a network block device (DST), and
>> have come up against the problem that when writing data to the network,
>> it uses kernel_sendpage to hand the page presented at the BIO layer to
>> the network stack.  It then completes the block IO request.
>
> Actually, you can pass a sendpage read actor which takes a reference on 
> the page which ensures valid page exists with you.

Hmm, I'm a little confused as to how one would do that - I can see that 
sendfile can be passed a read actor for use in the underlying read, but I 
can't see anywhere where sendpage can be used with a read actor.  I see 
that nfsd/vfs.c:nfsd_read_actor() adjusts the page refcounting to stop 
them being freed before they are sent - but that only seems to be usable 
when sending with sendfile.

> As long as you have 
> ref on the page and no truncate to the same file, you can safely access 
> the file.  Once NIC sends the data over wire, you can do put_page. This 
> should work.

I'm not sure that it will help, though.  The problem seems to be that XFS 
itself overwrites the page with new data (rather than the page being freed 
and reused) whilst the page is waiting to be sent in the TCP stack.  Is 
there any way to prevent XFS from doing this - or have I misunderstood the 
problem?
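
To make the distinction concrete, here is a minimal user-space sketch (not 
kernel code) of why holding a reference may not be enough.  Every name in 
it (model_page, model_get_page, model_put_page, model_owner_overwrite) is 
hypothetical and only loosely mirrors get_page/put_page.  It assumes the 
owner rewrites the buffer in place without consulting the reference count:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_PAGE_SIZE 16

/* Hypothetical user-space stand-in for a reference-counted page. */
struct model_page {
	int refcount;
	char data[MODEL_PAGE_SIZE];
};

/* Allocate a zeroed page holding one reference and the given contents. */
static struct model_page *model_page_alloc(const char *contents)
{
	struct model_page *p = calloc(1, sizeof(*p));

	if (!p)
		return NULL;
	p->refcount = 1;
	strncpy(p->data, contents, MODEL_PAGE_SIZE - 1);
	return p;
}

/* Take an extra reference, as a sender might before queueing the page. */
static void model_get_page(struct model_page *p)
{
	p->refcount++;
}

/* Drop a reference; returns 1 if this was the last one and the page was freed. */
static int model_put_page(struct model_page *p)
{
	if (--p->refcount == 0) {
		free(p);
		return 1;
	}
	return 0;
}

/*
 * The owner rewrites the page in place.  The reference count is not
 * consulted: a reference keeps the memory alive, not its contents stable.
 */
static void model_owner_overwrite(struct model_page *p, const char *contents)
{
	memset(p->data, 0, MODEL_PAGE_SIZE);
	strncpy(p->data, contents, MODEL_PAGE_SIZE - 1);
}
```

In this model, a sender that calls model_get_page() before queueing the 
page still sees the new contents after model_owner_overwrite(), which is 
the stale-data case described above; the extra reference only prevents 
the free.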

Along similar lines, is there any way to stop XFS from passing slab pages 
to the block IO layer?  Attempts to pass slab pages over to the TCP stack 
fail too.

thanks,

Matthew.


* Re: Zero-copy Block IO with XFS
  2007-12-11 11:38 Zero-copy Block IO with XFS Matthew Hodgson
  2007-12-11 16:39 ` Bhagi rathi
@ 2007-12-12 11:07 ` David Chinner
  2007-12-12 11:20   ` Matthew Hodgson
  1 sibling, 1 reply; 5+ messages in thread
From: David Chinner @ 2007-12-12 11:07 UTC
  To: Matthew Hodgson; +Cc: xfs

On Tue, Dec 11, 2007 at 11:38:19AM +0000, Matthew Hodgson wrote:
> Hi all,
> 
> I'm experimenting with using XFS with a network block device (DST), and 
> have come up against the problem that when writing data to the network, 
> it uses kernel_sendpage to hand the page presented at the BIO layer to 
> the network stack.  It then completes the block IO request.
> 
> The problem arises when XFS proceeds to then reuse that page before the 
> NIC actually sends it.

Where does XFS overwrite a page while I/O is still in progress?
Stack trace please.

> Particularly if TX checksumming or TCP 
> segmentation is being offloaded to the NIC, it seems that the NIC will 
> try to access the page after the BIO request has returned, and so operate 
> on stale data. 

That sounds like you are completing the bio before the I/O has
really been completed. Basically, the bio can't be completed until
the data has been sent and that will prevent any use after free or
overwrite of the data while it is being sent...
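
As a rough illustration of that ordering, the following user-space sketch 
(not kernel code) completes a request only after every segment handed to 
the network has been reported as sent.  All names (model_bio, 
model_bio_submit, model_segment_sent, model_count_completion) are 
hypothetical, and what counts as "sent" (NIC TX completion, TCP ACK, ...) 
is left to the driver:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for a block IO request. */
struct model_bio {
	int pending;	/* segments handed to the network, not yet sent */
	int completed;	/* set once the request has been completed */
	void (*end_io)(struct model_bio *bio);
};

/* Number of completion callbacks that have run; for illustration only. */
static int model_completions;

/* Example completion callback: just counts how often it is invoked. */
static void model_count_completion(struct model_bio *bio)
{
	(void)bio;
	model_completions++;
}

/* Mark the request complete and notify the submitter, if it asked. */
static void model_bio_complete(struct model_bio *bio)
{
	bio->completed = 1;
	if (bio->end_io)
		bio->end_io(bio);
}

/* Queue nr_segments for transmission; complete at once if there are none. */
static void model_bio_submit(struct model_bio *bio, int nr_segments,
			     void (*end_io)(struct model_bio *bio))
{
	bio->pending = nr_segments;
	bio->completed = 0;
	bio->end_io = end_io;
	if (nr_segments == 0)
		model_bio_complete(bio);
}

/* Called once per segment when the network layer is done with its data. */
static void model_segment_sent(struct model_bio *bio)
{
	assert(bio->pending > 0);
	if (--bio->pending == 0)
		model_bio_complete(bio);
}
```

With two segments submitted, the first model_segment_sent() call leaves 
the request incomplete and only the second one runs the callback, so the 
submitter cannot reuse the data while any of it is still queued.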

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


* Re: Zero-copy Block IO with XFS
  2007-12-12 11:07 ` David Chinner
@ 2007-12-12 11:20   ` Matthew Hodgson
  0 siblings, 0 replies; 5+ messages in thread
From: Matthew Hodgson @ 2007-12-12 11:20 UTC
  To: David Chinner; +Cc: xfs

Hi Dave,

Thanks for the response :)

David Chinner wrote:
> On Tue, Dec 11, 2007 at 11:38:19AM +0000, Matthew Hodgson wrote:
>> I'm experimenting with using XFS with a network block device (DST), and 
>> have come up against the problem that when writing data to the network, 
>> it uses kernel_sendpage to hand the page presented at the BIO layer to 
>> the network stack.  It then completes the block IO request.
>>
>> The problem arises when XFS proceeds to then reuse that page before the 
>> NIC actually sends it.
> 
> Where does XFS overwrite a page while I/O is still in progress?
> Stack trace please.

It doesn't.  The problem is that after the block device has completed 
the IO request with bio_endio(), there's a risk that it may still need 
access to the page in order to retransmit it, perform offloaded 
checksumming, etc.

>> Particularly if TX checksumming or TCP 
>> segmentation is being offloaded to the NIC, it seems that the NIC will 
>> try to access the page after the BIO request has returned, and so operate 
>> on stale data. 
> 
> That sounds like you are completing the bio before the I/O has
> really been completed. Basically, the bio can't be completed until
> the data has been sent and that will prevent any use after free or
> overwrite of the data while it is being sent...

Agreed.  In general that will cause a fairly major performance hit, 
however (you'd have to at least wait for the ACK from the TCP peer 
before completing the BIO).  Or make a copy of the page.  Is there no 
scope (however theoretical - I guess this is starting to become an 
academic question) for providing XFS with hints that particular pages 
are in use elsewhere and should not be overwritten?  Could XFS mandate 
only overwriting pages in its cache with a ->count of 1?
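
To show what such a rule might look like, here is a purely hypothetical 
user-space sketch (not kernel code, and not a description of how XFS 
behaves).  A cache slot is overwritten in place only when the cache holds 
the sole reference; otherwise the new data goes into a fresh page and the 
old one is left untouched for whoever still holds it.  The names 
model_page, model_page_new, model_page_put and model_cache_write are all 
invented for illustration:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_PAGE_SIZE 16

/* Hypothetical user-space stand-in for a reference-counted cache page. */
struct model_page {
	int refcount;
	char data[MODEL_PAGE_SIZE];
};

/* Allocate a zeroed page holding one reference and the given contents. */
static struct model_page *model_page_new(const char *contents)
{
	struct model_page *p = calloc(1, sizeof(*p));

	if (!p)
		return NULL;
	p->refcount = 1;
	strncpy(p->data, contents, MODEL_PAGE_SIZE - 1);
	return p;
}

/* Drop one reference and free the page when none remain. */
static void model_page_put(struct model_page *p)
{
	if (--p->refcount == 0)
		free(p);
}

/*
 * Write new contents into a cache slot.  If the cache holds the only
 * reference, overwrite in place.  Otherwise install a fresh page in the
 * slot and drop the cache's reference on the old one, leaving its data
 * intact for the other holder.  Returns 0 on success, -1 if allocation
 * fails (the slot is then left unchanged).
 */
static int model_cache_write(struct model_page **slot, const char *contents)
{
	struct model_page *old = *slot;
	struct model_page *fresh;

	if (old->refcount == 1) {
		memset(old->data, 0, MODEL_PAGE_SIZE);
		strncpy(old->data, contents, MODEL_PAGE_SIZE - 1);
		return 0;
	}

	fresh = model_page_new(contents);
	if (!fresh)
		return -1;
	*slot = fresh;
	model_page_put(old);
	return 0;
}
```

The trade-off in this sketch is that the copy is avoided only in the 
uncontended case; a page that is still referenced costs an allocation on 
the next write instead of a copy at send time.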

In other news, does XFS still provide the block layer with 
slab-allocated pages for metadata operations?

thanks,

Matthew.

-- 
Matthew Hodgson <matthew@mxtelecom.com>
Media & Systems Project Manager
Tel: +44 (0) 845 666 7778
http://www.mxtelecom.com
