From: "Michael S. Tsirkin" <mst@redhat.com>
To: Ian Campbell <Ian.Campbell@eu.citrix.com>
Cc: "netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	Jeremy Fitzhardinge <jeremy@goop.org>,
	xen-devel <xen-devel@lists.xensource.com>,
	"mashirle@us.ibm.com" <mashirle@us.ibm.com>,
	Rusty Russell <rusty@rustcorp.com.au>
Subject: Re: SKB paged fragment lifecycle on receive
Date: Mon, 27 Jun 2011 13:21:29 +0300
Message-ID: <20110627102129.GB12978@redhat.com>
In-Reply-To: <1309167695.32717.181.camel@zakaz.uk.xensource.com>

On Mon, Jun 27, 2011 at 10:41:35AM +0100, Ian Campbell wrote:
> On Sun, 2011-06-26 at 11:25 +0100, Michael S. Tsirkin wrote:
> > On Fri, Jun 24, 2011 at 04:43:22PM +0100, Ian Campbell wrote:
> > > In this mode guest data pages ("foreign pages") were mapped into the
> > > backend domain (using Xen grant-table functionality) and placed into the
> > > skb's paged frag list (skb_shinfo(skb)->frags, I hope I am using the
> > > right term). Once the page is finished with, netback unmaps it in order
> > > to return it to the guest (we really want to avoid returning such pages
> > > to the general allocation pool!).
> >
> > Are the pages writeable by the source guest while netback processes
> > them? If yes, firewalling becomes unreliable as the packet can be
> > modified after it's checked, right?
>
> We only map the paged frags, the linear area is always copied (enough to
> cover maximally sized TCP/IP, including options), for this reason.

Hmm. That'll cover the most common scenarios
(such as port filtering) but not deep inspection.
Not sure how important that is.

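To make the concern concrete, here is a minimal user-space sketch, not
netback code. It assumes a toy packet whose first TOY_HDR_MAX bytes are
copied (the "linear area") while the rest is still read from memory the
sender can write to. All names (toy_pkt, toy_port_allowed,
toy_payload_contains) and the one-byte "port" are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumed bound standing in for "maximally sized TCP/IP, including options". */
#define TOY_HDR_MAX 128

/* Toy packet: headers are a private copy, payload is still shared. */
struct toy_pkt {
	unsigned char linear[TOY_HDR_MAX];	/* private copy of the headers */
	size_t linear_len;
	const unsigned char *frag;		/* points into sender-writable memory */
	size_t frag_len;
};

/* Copy the header bytes out of the sender's buffer; leave the rest shared. */
void toy_pkt_init(struct toy_pkt *p, const unsigned char *guest_buf, size_t len)
{
	size_t hdr = len < TOY_HDR_MAX ? len : TOY_HDR_MAX;

	memcpy(p->linear, guest_buf, hdr);
	p->linear_len = hdr;
	p->frag = guest_buf + hdr;
	p->frag_len = len - hdr;
}

/* Hypothetical port filter: looks only at the copied header. */
bool toy_port_allowed(const struct toy_pkt *p, unsigned char port)
{
	return p->linear_len > 0 && p->linear[0] == port;
}

/* Hypothetical deep inspection: has to read the shared payload. */
bool toy_payload_contains(const struct toy_pkt *p, unsigned char byte)
{
	return memchr(p->frag, byte, p->frag_len) != NULL;
}
```

If the sender rewrites its buffer after toy_pkt_init(), toy_port_allowed()
still gives the same answer, but toy_payload_contains() can change between
the check and the moment the data is actually used.
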
> > Also, for guest to guest communication, do you wait for
> > the destination to stop looking at the packet in order
> > to return it to the source? If yes, can source guest
> > networking be disrupted by a slow destination?
>
> There is a timeout which ultimately does a copy into dom0 memory and
> frees up the domain grant for return to the sending guest.

Interesting. How long's the timeout?

> > > Jeremy Fitzhardinge and I subsequently
> > > looked at the possibility of a no-clone skb flag (i.e. always forcing a
> > > copy instead of a clone)
> >
> > I think this is the approach that the patchset
> > 'macvtap/vhost TX zero-copy support' takes.
>
> That's TX from the guest's PoV, the same as I am looking at here,
> correct?

Right.

> I should definitely check this work out, thanks for the pointer. Is V7
> (http://marc.info/?l=linux-kernel&m=130661128431312&w=2) the most recent
> posting?

I think so.

> I suppose one difference with this is that it deals with data from
> "dom0" userspace buffers rather than (what looks like) kernel memory,
> although I don't know if that matters yet. Also it hangs off of struct
> sock which netback doesn't have. Anyway I'll check it out.

I think the most important detail is the copy-on-clone approach.
We can make it controlled by an skb flag if necessary.

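Roughly what I mean, as a user-space toy rather than kernel code. The
types and the copy_on_clone flag below are hypothetical; nothing here is
a real sk_buff field or an existing API. A clone normally shares the
fragment page and takes a reference; with the flag set, the clone gets a
private copy, so the original owner's page is never pinned by the clone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a page referenced from a paged fragment. */
struct toy_page {
	int refs;		/* number of toy_skbs pointing at this page */
	bool foreign;		/* true: belongs to a guest, must be handed back */
	size_t len;
	unsigned char data[64];
};

/* Toy stand-in for an skb with a single paged fragment. */
struct toy_skb {
	bool copy_on_clone;	/* hypothetical flag, not a real sk_buff field */
	struct toy_page *frag;
};

struct toy_page *toy_page_new(const void *src, size_t len, bool foreign)
{
	struct toy_page *p;

	if (len > sizeof(p->data))
		return NULL;
	p = calloc(1, sizeof(*p));
	if (!p)
		return NULL;
	p->refs = 1;
	p->foreign = foreign;
	p->len = len;
	memcpy(p->data, src, len);
	return p;
}

struct toy_skb *toy_skb_new(struct toy_page *frag, bool copy_on_clone)
{
	struct toy_skb *skb = malloc(sizeof(*skb));

	if (!skb)
		return NULL;
	skb->copy_on_clone = copy_on_clone;
	skb->frag = frag;
	return skb;
}

/* Drop one reference to the fragment page and free the toy skb. */
void toy_skb_free(struct toy_skb *skb)
{
	if (--skb->frag->refs == 0)
		free(skb->frag);
	free(skb);
}

/* Clone: share the page, unless the flag asks for a private local copy. */
struct toy_skb *toy_skb_clone(const struct toy_skb *skb)
{
	struct toy_page *frag;
	struct toy_skb *clone;

	if (skb->copy_on_clone) {
		/* The copy lives in local memory, so it is not foreign. */
		frag = toy_page_new(skb->frag->data, skb->frag->len, false);
		if (!frag)
			return NULL;
	} else {
		frag = skb->frag;
		frag->refs++;
	}

	clone = toy_skb_new(frag, false);
	if (!clone) {
		/* Undo the reference (or the copy) taken above. */
		if (--frag->refs == 0)
			free(frag);
		return NULL;
	}
	return clone;
}
```

With the flag set, the foreign page's reference count stays at 1 however
many clones exist, so it can go back to its owner as soon as the original
is freed.
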
> > > but IIRC honouring it universally turned into a
> > > very twisty maze with a number of nasty corner cases etc.
> >
> > Any examples? Are they covered by the patchset above?
>
> It was quite a while ago so I don't remember many of the specifics.
> Jeremy might remember better, but for example any broadcast traffic
> hitting a bridge (a very interesting case for Xen) seems like a likely
> case? pcap was another one which I do remember, but that's obviously
> less critical.

Last I looked, I thought these clone the skb, so if a copy happens on
clone, things will work correctly?

> I presume with the TX zero-copy support the "copying due to attempted
> clone" rate is low?

Yes. My understanding is that this version targets a non-bridged setup
(guest connected to a macvlan on a physical dev) as the first step.

> > > FWIW I proposed a session on the subject for LPC this year.
> > We also plan to discuss this at KVM Forum 2011
> > (co-located with LinuxCon 2011).
> > http://www.linux-kvm.org/page/KVM_Forum_2011
>
> I had already considered coming to LinuxCon for other reasons but
> unfortunately I have family commitments around then :-(
>
> Ian.

And I'm not coming to LPC this year :(

--
MST