From: "Michael S. Tsirkin" <mst-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
To: Ian Campbell <Ian.Campbell-Sxgqhf6Nn4DQT0dZR+AlfA@public.gmane.org>
Cc: "netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
<netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
"David S. Miller" <davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>,
Neil Brown <neilb-l3A5Bk7waGM@public.gmane.org>,
"J. Bruce Fields"
<bfields-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>,
"linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
<linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
Subject: Re: [PATCH v3 6/6] sunrpc: use SKB fragment destructors to delay completion until page is released by network stack.
Date: Mon, 30 Jan 2012 20:06:53 +0200 [thread overview]
Message-ID: <20120130180652.GC9345@redhat.com> (raw)
In-Reply-To: <1327941813.26983.270.camel-o4Be2W7LfRlXesXXhkcM7miJhflN2719@public.gmane.org>
On Mon, Jan 30, 2012 at 04:43:33PM +0000, Ian Campbell wrote:
> On Mon, 2012-01-30 at 16:23 +0000, Michael S. Tsirkin wrote:
> > On Mon, Jan 30, 2012 at 03:51:53PM +0000, Ian Campbell wrote:
> > > On Thu, 2012-01-26 at 13:11 +0000, Michael S. Tsirkin wrote:
> > > > On Wed, Jan 25, 2012 at 12:27:14PM +0000, Ian Campbell wrote:
> > > > > This prevents an issue where an ACK is delayed, a retransmit is queued (either
> > > > > at the RPC or TCP level) and the ACK arrives before the retransmission hits the
> > > > > wire. If this happens to an NFS WRITE RPC then the write() system call
> > > > > completes and the userspace process can continue, potentially modifying data
> > > > > referenced by the retransmission before the retransmission occurs.
> > > > >
> > > > > Signed-off-by: Ian Campbell <ian.campbell-Sxgqhf6Nn4DQT0dZR+AlfA@public.gmane.org>
> > > > > Acked-by: Trond Myklebust <Trond.Myklebust-HgOvQuBEEgTQT0dZR+AlfA@public.gmane.org>
> > > > > Cc: "David S. Miller" <davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
> > > > > Cc: Neil Brown <neilb-l3A5Bk7waGM@public.gmane.org>
> > > > > Cc: "J. Bruce Fields" <bfields-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>
> > > > > Cc: linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> > > > > Cc: netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> > > >
> > > > This doesn't include either of the two options you proposed to address
> > > > the sender blocked forever by receiver issue with bridged septups and
> > > > endpoints such a tap device or a socket on the same box,
> > > > does it?
> > >
> > > There was never any response to Bruce's question:
> > > http://thread.gmane.org/gmane.linux.network/210873/focus=44849
> > >
> > > Stupid question: Is it a requirement that you be safe against DOS by a
> > > rogue process with a tap device? (And if so, does current code satisfy
> > > that requirement?)
> > >
> > > IMHO the answer to both questions is no, there are plenty of ways a
> > > rogue process with a tap device can wreak havoc.
> >
> > I thought the answer is an obviious yes :(
> > What are these ways tap can wreak havoc?
>
> Can't they spoof traffic
> and all sorts of things like that?
> Hrm. I
> suppose that the same as any peer on the network so we already handle
> that sort of thing. Maybe that's a red herring then.
Right. It typically does not include DOS on the sender :)
>
> > > > How about patching __skb_queue_tail to do a deep copy?
> > > > That would seem to handle both tap and socket cases -
> > > > any other ones left?
> > >
> > > Wouldn't that mean we were frequently (almost always) copying for lots
> > > of other cases too? That would rather defeat the purpose of being able
> > > to hand pages off to the network stack in a zero copy fashion.
> >
> > Yes but the case of an rpc connection to the same box
> > is very rare I think, not worth optimizing for.
>
> But changing __skb_queue_tail doesn't only impact rpc connections to the
> same box, does it? At least I can see plenty of callers of
> __skb_queue_tail which don't look like they would want a copy to occur
> -- plenty of drivers for one thing.
> Perhaps in combination with a per-queue flag or per-socket flag to
> enable it though it might work though?
Right. I missed that. I'm guessing drivers don't hang on to skbs
indefinitely. Still, copying is always safe - maybe the right thing to
do is to add an __skb_queue_tail variant that does not copy, and
gradually convert drivers that care to that API?
> > > > If we do this, I think it would be beneficial to pass a flag
> > > > to the destructor indicating that a deep copy was done:
> > > > this would allow senders to detect that and adapt.
> > >
> > > If you were doing a deep copy anyway you might as well create a
> > > completely new skb and release the old one, thereby causing the
> > > destructors to fire in the normal way for it SKB. The copy wouldn't have
> > > destructors because the pages would no longer be owned by the sender.
> > >
> > > Ian.
> >
> > What I mean is that page pin + deep copy might be more expensive
> > than directly copying. So the owner of the original skb
> > cares whether we did a deep copy or zero copy transmit worked.
>
> You mean so they can adaptively do a copy directly next time?
> I think that would add a fair bit more complexity to what, as you point
> out, is a fairly rare occurrence.
>
> Ian.
For sunrpc yes but I was thinking about utilizing this
mechanism for e.g. kvm in the future. It might be more common
there. I agree this might be a future extension.
--
MSt
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
prev parent reply other threads:[~2012-01-30 18:06 UTC|newest]
Thread overview: 22+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-01-25 12:26 [PATCH v3 0/6] skb paged fragment destructors Ian Campbell
2012-01-25 12:27 ` [PATCH v3 1/6] net: pad skb data and shinfo as a whole rather than individually Ian Campbell
2012-01-25 12:51 ` Eric Dumazet
2012-01-25 13:09 ` Ian Campbell
2012-01-25 13:12 ` Eric Dumazet
2012-01-31 14:35 ` Ian Campbell
2012-01-31 14:45 ` Eric Dumazet
2012-02-01 9:39 ` Ian Campbell
2012-02-01 10:02 ` Eric Dumazet
2012-02-01 10:09 ` Ian Campbell
2012-01-31 14:54 ` Francois Romieu
2012-02-01 9:38 ` Ian Campbell
2012-01-25 12:27 ` [PATCH v3 2/6] net: move destructor_arg to the front of sk_buff Ian Campbell
2012-01-25 12:27 ` [PATCH v3 3/6] net: add support for per-paged-fragment destructors Ian Campbell
2012-01-25 12:27 ` [PATCH v3 4/6] net: only allow paged fragments with the same destructor to be coalesced Ian Campbell
[not found] ` <1327494389.24561.316.camel-o4Be2W7LfRlXesXXhkcM7miJhflN2719@public.gmane.org>
2012-01-25 12:27 ` [PATCH v3 5/6] net: add paged frag destructor support to kernel_sendpage Ian Campbell
2012-01-25 12:27 ` [PATCH v3 6/6] sunrpc: use SKB fragment destructors to delay completion until page is released by network stack Ian Campbell
[not found] ` <1327494434-21566-6-git-send-email-ian.campbell-Sxgqhf6Nn4DQT0dZR+AlfA@public.gmane.org>
2012-01-26 13:11 ` Michael S. Tsirkin
[not found] ` <20120126131136.GA16400-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2012-01-30 15:51 ` Ian Campbell
2012-01-30 16:23 ` Michael S. Tsirkin
[not found] ` <20120130162306.GB9345-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2012-01-30 16:43 ` Ian Campbell
[not found] ` <1327941813.26983.270.camel-o4Be2W7LfRlXesXXhkcM7miJhflN2719@public.gmane.org>
2012-01-30 18:06 ` Michael S. Tsirkin [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20120130180652.GC9345@redhat.com \
--to=mst-h+wxahxf7alqt0dzr+alfa@public.gmane.org \
--cc=Ian.Campbell-Sxgqhf6Nn4DQT0dZR+AlfA@public.gmane.org \
--cc=bfields-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org \
--cc=davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org \
--cc=linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
--cc=neilb-l3A5Bk7waGM@public.gmane.org \
--cc=netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).