public inbox for linux-xfs@vger.kernel.org
* Zero-copy Block IO with XFS
@ 2007-12-11 11:38 Matthew Hodgson
  2007-12-11 16:39 ` Bhagi rathi
  2007-12-12 11:07 ` David Chinner
  0 siblings, 2 replies; 5+ messages in thread
From: Matthew Hodgson @ 2007-12-11 11:38 UTC
  To: xfs

Hi all,

I'm experimenting with using XFS with a network block device (DST), and 
have come up against the problem that when writing data to the network, 
it uses kernel_sendpage to hand the page presented at the BIO layer to 
the network stack.  It then completes the block IO request.

The problem arises when XFS then reuses that page before the 
NIC actually sends it.  Particularly if TX checksumming or TCP 
segmentation is being offloaded to the NIC, it seems that the NIC will 
try to access the page after the BIO request has returned, and so operate 
on stale data.  I assume the same problem might happen in the case of 
TCP retransmits or similar.  The motivation for using sendpage rather 
than sendmsg (or using sendpage on a copy of the original page) is to 
try to ensure speed by a zero-copy path through the subsystem.
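
As a rough illustration of the race, here is a toy userspace model in 
Python. It is not kernel code: the function names are invented for this 
example, a memoryview stands in for the page reference that sendpage 
queues, and bytes() stands in for the copy that sendmsg would make.

```python
# Toy model of the race described above; every name here is hypothetical.
PAGE_SIZE = 16  # tiny stand-in for a real page


def queue_zero_copy(page: bytearray) -> memoryview:
    """Queue a reference to the caller's page (the sendpage-like path)."""
    return memoryview(page)


def queue_with_copy(page: bytearray) -> bytes:
    """Snapshot the page before queuing (the sendmsg-like path)."""
    return bytes(page)


def reuse_page(page: bytearray, data: bytes) -> None:
    """The filesystem overwrites the page in place once the IO has completed."""
    memoryview(page)[:len(data)] = data


def nic_transmit(queued) -> bytes:
    """The 'NIC' reads the queued data only at transmit time."""
    return bytes(queued)


page = bytearray(b"block A".ljust(PAGE_SIZE, b"\0"))
zero_copy_tx = queue_zero_copy(page)
copied_tx = queue_with_copy(page)

# The block IO is reported complete at this point, so the page gets reused
# before the 'NIC' has read it.
reuse_page(page, b"block B")

wire_zero_copy = nic_transmit(zero_copy_tx).rstrip(b"\0")
wire_copy = nic_transmit(copied_tx).rstrip(b"\0")

print("zero-copy path sent:", wire_zero_copy)
print("copy path sent:     ", wire_copy)
```

In this model the zero-copy path puts the new contents on the wire, while 
the copy path sends what was originally written, at the cost of the extra 
copy.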

Is there any way at all in which XFS would be able to (theoretically) 
expose an API to allow an underlying block device to retain ownership of 
pages until it's done with them, so as to avoid a potentially needless 
copy?  Or is there another way of achieving this?
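
To make the question more concrete, the kind of ownership hand-off I have 
in mind might look like the sketch below. This is again a toy Python 
model under my own assumptions: Page, BlockRequest and release() are 
hypothetical names, not an existing XFS or block-layer interface.

```python
# Toy model of deferring IO completion until the page is no longer in use.
# All classes and methods are hypothetical illustrations.


class Page:
    """Stand-in for a page, with a count of outstanding holders."""

    def __init__(self, data: bytes):
        self.data = bytearray(data)
        self.holders = 0


class BlockRequest:
    """A request whose completion is withheld until the page is released."""

    def __init__(self, page: Page, on_complete):
        self.page = page
        self.on_complete = on_complete
        page.holders += 1  # the block device takes ownership of the page

    def release(self) -> None:
        """Called once the 'NIC' has finished with the page."""
        self.page.holders -= 1
        if self.page.holders == 0:
            self.on_complete(self.page)  # only now is the IO completed


events = []


def fs_on_complete(page: Page) -> None:
    """The filesystem reuses the page only after completion is signalled."""
    events.append("completed")
    page.data[:] = b"block B"


page = Page(b"block A")
request = BlockRequest(page, fs_on_complete)

# The 'NIC' reads the page while the block device still owns it.
wire = bytes(page.data)
events.append("transmitted")

# Ownership is handed back, which triggers completion and allows reuse.
request.release()

print(events, wire, bytes(page.data))
```

Here the filesystem cannot touch the page until release() runs, so the 
data on the wire is always what was submitted and no copy is needed.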

thanks in advance,

Matthew.


-- 
Matthew Hodgson <matthew@mxtelecom.com>
Media & Systems Project Manager
Tel: +44 (0) 845 666 7778
http://www.mxtelecom.com
