* Zero-copy Block IO with XFS
@ 2007-12-11 11:38 Matthew Hodgson
2007-12-11 16:39 ` Bhagi rathi
2007-12-12 11:07 ` David Chinner
0 siblings, 2 replies; 5+ messages in thread
From: Matthew Hodgson @ 2007-12-11 11:38 UTC
To: xfs
Hi all,
I'm experimenting with using XFS with a network block device (DST), and
have come up against the problem that when writing data to the network,
it uses kernel_sendpage to hand the page presented at the BIO layer to
the network stack. It then completes the block IO request.
The problem arises when XFS proceeds to then reuse that page before the
NIC actually sends it. Particularly if TX checksumming or TCP
segmentation is being offloaded to the NIC, it seems that the NIC will
try to access the page after the BIO request has returned, and so operate
on stale data. I assume the same problem might happen in the case of
TCP retransmits or similar. The motivation for using sendpage rather
than sendmsg (or using sendpage on a copy of the original page) is to
try to ensure speed by a zero-copy path through the subsystem.
Is there any way at all in which XFS would be able to (theoretically)
expose an API to allow an underlying block device to retain ownership of
pages until it's done with them, so as to avoid a potentially needless
copy? Or is there another way of achieving this?
thanks in advance,
Matthew.
--
Matthew Hodgson <matthew@mxtelecom.com>
Media & Systems Project Manager
Tel: +44 (0) 845 666 7778
http://www.mxtelecom.com
* Re: Zero-copy Block IO with XFS
From: Bhagi rathi @ 2007-12-11 16:39 UTC
To: Matthew Hodgson; +Cc: xfs
On Dec 11, 2007 5:08 PM, Matthew Hodgson <matthew@mxtelecom.com> wrote:
> Hi all,
>
> I'm experimenting with using XFS with a network block device (DST), and
> have come up against the problem that when writing data to the network,
> it uses kernel_sendpage to hand the page presented at the BIO layer to
> the network stack. It then completes the block IO request.
Actually, you can pass a sendpage read actor that takes a reference on
the page, which ensures a valid page stays with you. As long as you hold
a ref on the page and there is no truncate on the same file, you can
safely access the file. Once the NIC sends the data over the wire, you
can do put_page. This should work. By the way, Linux sendfile does
something similar. Look at the read actor in the NFS server's sendfile
usage.
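The get/put idea above can be modelled in plain user-space C. This is only a toy sketch: `struct toy_page`, `toy_get_page`, `toy_put_page`, `toy_queue_for_send` and `toy_send_done` are hypothetical names invented for illustration, not kernel APIs, and the "free" is just a flag. It shows that a page queued for sending stays alive after its original owner drops its reference, and is released only once the sender drops its own.

```c
#include <assert.h>

/* Toy user-space stand-in for a page; this is NOT the kernel's struct page. */
struct toy_page {
    int refcount;   /* number of current holders */
    int freed;      /* set once the last reference is dropped */
    char data[16];
};

/* Take an extra reference (hypothetical analogue of get_page). */
static void toy_get_page(struct toy_page *p)
{
    p->refcount++;
}

/* Drop a reference; "free" the page when the count reaches zero. */
static void toy_put_page(struct toy_page *p)
{
    assert(p->refcount > 0);
    if (--p->refcount == 0)
        p->freed = 1;   /* a real allocator would reclaim the page here */
}

/* The block driver hands the page to the sender, pinning it first. */
static void toy_queue_for_send(struct toy_page *p)
{
    toy_get_page(p);
}

/* Called once the (imaginary) NIC is completely done with the page. */
static void toy_send_done(struct toy_page *p)
{
    toy_put_page(p);
}

/* Returns 1 if the page survived until the send finished, then was freed. */
static int toy_refcount_demo(void)
{
    struct toy_page p = { .refcount = 1, .freed = 0, .data = "payload" };
    int alive_while_queued;

    toy_queue_for_send(&p);          /* refcount: 2 */
    toy_put_page(&p);                /* original owner lets go: 1 */
    alive_while_queued = !p.freed;   /* still pinned by the sender */
    toy_send_done(&p);               /* last reference dropped: 0 */

    return alive_while_queued && p.freed;
}
```

Calling `toy_refcount_demo()` exercises the whole sequence. Note that the reference only guards the page's lifetime; it says nothing about whether the contents may change while the page is pinned.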
-Cheers,
Bhagi.
>
>
> The problem arises when XFS proceeds to then reuse that page before the
> NIC actually sends it. Particularly if TX checksumming or TCP
> segmentation is being offloaded to the NIC, it seems that the NIC will
> try to access the page after the BIO request has returned, and so operate
> on stale data. I assume the same problem might happen in the case of
> TCP retransmits or similar. The motivation for using sendpage rather
> than sendmsg (or using sendpage on a copy of the original page) is to
> try to ensure speed by a zero-copy path through the subsystem.
>
> Is there any way at all in which XFS would be able to (theoretically)
> expose an API to allow an underlying block device to retain ownership of
> pages until it's done with them, so as to avoid a potentially needless
> copy? Or is there another way of achieving this?
>
> thanks in advance,
>
> Matthew.
>
>
> --
> Matthew Hodgson <matthew@mxtelecom.com>
> Media & Systems Project Manager
> Tel: +44 (0) 845 666 7778
> http://www.mxtelecom.com
>
>
>
* Re: Zero-copy Block IO with XFS
From: Matthew Hodgson @ 2007-12-12 1:52 UTC
To: xfs; +Cc: Bhagi rathi
On Tue, 11 Dec 2007, Bhagi rathi wrote:
> On Dec 11, 2007 5:08 PM, Matthew Hodgson <matthew@mxtelecom.com> wrote:
>
>> Hi all,
>>
>> I'm experimenting with using XFS with a network block device (DST), and
>> have come up against the problem that when writing data to the network,
>> it uses kernel_sendpage to hand the page presented at the BIO layer to
>> the network stack. It then completes the block IO request.
>
> Actually, you can pass a sendpage read actor which takes a reference on
> the page which ensures valid page exists with you.
Hmm, I'm a little confused as to how one would do that - I can see that
sendfile can be passed a read actor for use in the underlying read, but I
can't see anywhere where sendpage can be used with a read actor. I see
that nfsd/vfs.c:nfsd_read_actor() adjusts the page refcounting to stop
them being freed before they are sent - but that only seems to be usable
when sending with sendfile.
> As long as you have
> ref on the page and no truncate to the same file, you can safely access
> the file. Once NIC sends the data over wire, you can do put_page. This
> should work.
I'm not sure that it will help, though. The problem seems to be that XFS
itself overwrites the page with new data (rather than the page being freed
and reused) whilst the page is waiting to be sent in the TCP stack. Is
there any way to prevent XFS from doing this - or have I misunderstood the
problem?
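To make the distinction concrete, here is a toy user-space C sketch of why a held reference does not help against an in-place overwrite. All names (`struct toy_page`, `struct toy_zero_copy_send`, `struct toy_copied_send`, `toy_overwrite_demo`) are hypothetical and invented for illustration; nothing here is a kernel or XFS interface. A zero-copy send that only keeps a pointer sees whatever the owner writes later, while a send that snapshots the data does not.

```c
#include <assert.h>
#include <string.h>

#define TOY_PAGE_SIZE 8

/* Toy stand-in for a page; NOT the kernel's struct page. */
struct toy_page {
    int refcount;
    char data[TOY_PAGE_SIZE];
};

/* A queued zero-copy send: it only remembers where the data lives. */
struct toy_zero_copy_send {
    const struct toy_page *page;
};

/* A queued copying send: it keeps its own snapshot of the data. */
struct toy_copied_send {
    char data[TOY_PAGE_SIZE];
};

/*
 * Returns 1 if the zero-copy send ends up seeing the NEW contents while
 * the copied send still holds the OLD contents.
 */
static int toy_overwrite_demo(void)
{
    struct toy_page page = { .refcount = 1 };
    struct toy_zero_copy_send zc;
    struct toy_copied_send cp;
    int zc_sees_new, cp_sees_old;

    memcpy(page.data, "old-data", TOY_PAGE_SIZE);

    /* Queue both kinds of send before the imaginary NIC transmits. */
    zc.page = &page;
    page.refcount++;                            /* sender pins the page */
    memcpy(cp.data, page.data, TOY_PAGE_SIZE);  /* sender copies instead */

    /* The owner reuses the page in place; the extra ref does not stop it. */
    memcpy(page.data, "new-data", TOY_PAGE_SIZE);

    zc_sees_new = memcmp(zc.page->data, "new-data", TOY_PAGE_SIZE) == 0;
    cp_sees_old = memcmp(cp.data, "old-data", TOY_PAGE_SIZE) == 0;
    return zc_sees_new && cp_sees_old;
}
```

In this model the pinned page is never freed, yet the zero-copy sender would still transmit the new contents, which is the stale-data symptom described above. The copy avoids it at the cost of the memcpy.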
Along similar lines, is there any way to stop XFS from passing slab pages
to the block IO layer? Attempts to pass slab pages over to the TCP stack
fail too.
thanks,
Matthew.
* Re: Zero-copy Block IO with XFS
From: David Chinner @ 2007-12-12 11:07 UTC
To: Matthew Hodgson; +Cc: xfs
On Tue, Dec 11, 2007 at 11:38:19AM +0000, Matthew Hodgson wrote:
> Hi all,
>
> I'm experimenting with using XFS with a network block device (DST), and
> have come up against the problem that when writing data to the network,
> it uses kernel_sendpage to hand the page presented at the BIO layer to
> the network stack. It then completes the block IO request.
>
> The problem arises when XFS proceeds to then reuse that page before the
> NIC actually sends it.
Where does XFS overwrite a page while I/O is still in progress?
Stack trace please.
> Particularly if TX checksumming or TCP
> segmentation is being offloaded to the NIC, it seems that the NIC will
> try to access the page after the BIO request has returned, and so operate
> on stale data.
That sounds like you are completing the bio before the I/O has
really been completed. Basically, the bio can't be completed until
the data has been sent and that will prevent any use after free or
overwrite of the data while it is being sent...
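As a rough illustration of deferring completion, the following toy user-space C model completes a request only when the transport reports that the data has really gone out. `struct toy_bio`, `struct toy_send`, `toy_submit` and `toy_on_ack` are hypothetical names made up for this sketch; they are not the block layer's API, and how a real driver would learn that the data has been sent is left as an assumption.

```c
#include <assert.h>

/* Toy stand-in for a block IO request; NOT the kernel's struct bio. */
struct toy_bio {
    int completed;
    void (*end_io)(struct toy_bio *bio);
};

/* Completion callback: after this the owner may reuse the data pages. */
static void toy_end_io(struct toy_bio *bio)
{
    bio->completed = 1;
}

/* One in-flight network send, remembering which request it belongs to. */
struct toy_send {
    struct toy_bio *bio;
    int acked;
};

/* Submit: queue the data for sending, but do NOT complete the request. */
static void toy_submit(struct toy_send *s, struct toy_bio *bio)
{
    bio->completed = 0;
    bio->end_io = toy_end_io;
    s->bio = bio;
    s->acked = 0;
}

/* The transport reports that the data has really been sent. */
static void toy_on_ack(struct toy_send *s)
{
    s->acked = 1;
    s->bio->end_io(s->bio);   /* only now is the request completed */
}

/* Returns 1 if the request stayed pending until the ack arrived. */
static int toy_deferred_completion_demo(void)
{
    struct toy_bio bio;
    struct toy_send s;
    int pending_before_ack;

    toy_submit(&s, &bio);
    pending_before_ack = !bio.completed;
    toy_on_ack(&s);

    return pending_before_ack && bio.completed;
}
```

Because the owner never sees the request as complete while the send is in flight, it has no reason to reuse the page early. The cost in this model is that completion latency now includes the wait for the ack.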
Cheers,
Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
* Re: Zero-copy Block IO with XFS
From: Matthew Hodgson @ 2007-12-12 11:20 UTC
To: David Chinner; +Cc: xfs
Hi Dave,
Thanks for the response :)
David Chinner wrote:
> On Tue, Dec 11, 2007 at 11:38:19AM +0000, Matthew Hodgson wrote:
>> I'm experimenting with using XFS with a network block device (DST), and
>> have come up against the problem that when writing data to the network,
>> it uses kernel_sendpage to hand the page presented at the BIO layer to
>> the network stack. It then completes the block IO request.
>>
>> The problem arises when XFS proceeds to then reuse that page before the
>> NIC actually sends it.
>
> Where does XFS overwrite a page while I/O is still in progress?
> Stack trace please.
It doesn't. The problem is that after the block device has completed
the IO request with bio_endio(), there's a risk that it may still need
access to the page in order to retransmit it, perform offloaded
checksumming, etc.
>> Particularly if TX checksumming or TCP
>> segmentation is being offloaded to the NIC, it seems that the NIC will
>> try to access the page after the BIO request has returned, and so operate
>> on stale data.
>
> That sounds like you are completing the bio before the I/O has
> really been completed. Basically, the bio can't be completed until
> the data has been sent and that will prevent any use after free or
> overwrite of the data while it is being sent...
Agreed. In general that will cause a fairly major performance hit,
however (you'd have to at least wait for the ACK from the TCP peer
before completing the BIO). Or make a copy of the page. Is there no
scope (however theoretical - I guess this is starting to become an
academic question) for providing XFS with hints that particular pages
are in use elsewhere and should not be overwritten? Could XFS mandate
only overwriting pages in its cache with a ->count of 1?
In other news, does XFS still provide the block layer with
slab-allocated pages for metadata operations?
thanks,
Matthew.
--
Matthew Hodgson <matthew@mxtelecom.com>
Media & Systems Project Manager
Tel: +44 (0) 845 666 7778
http://www.mxtelecom.com