Re: netback: Delayed copy alternative

xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed

From: Zoltan Kiss <zoltan.kiss@citrix.com>
To: Ian Campbell <Ian.Campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>,
	Jonathan Davies <Jonathan.Davies@eu.citrix.com>,
	Paul Durrant <Paul.Durrant@citrix.com>,
	David Vrabel <david.vrabel@citrix.com>,
	"xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>,
	Malcolm Crossley <malcolm.crossley@citrix.com>
Subject: Re: netback: Delayed copy alternative
Date: Wed, 20 Nov 2013 12:28:48 +0000	[thread overview]
Message-ID: <528CAB00.6080203@citrix.com> (raw)
In-Reply-To: <1384946191.28441.52.camel@kazak.uk.xensource.com>

On 20/11/13 11:16, Ian Campbell wrote:
> On Tue, 2013-11-19 at 16:42 +0000, Zoltan Kiss wrote:
>> After further discussions and investigations, it seems it is a viable
>> approach to drop the packets in the RX path of the another VIF after a
>> timeout,
>
> Since RX/TX in netback is a bit confusing (since it is inverted, but you
> don't seem to be using it that way): A diagram:
I meant "RX path" to be the Dom0->DomU path.

> domU (netfront) --> dom0 (netback) --> network stack --> bridge ,
>                                                                  |
> domU' (netfront) <- timeout & drop <- dom0 (netback) <- stack <-'
>
> You are proposing dropping at "timeout & drop". Since the dom0->domU'
> path is based on copying there should be no problem with an skb getting
> stuck with domU' holding on to it. In effect you will be dropping
> traffic from some internal queue before it hits the shared ring anyway.
> You will be making sure that either the full skb fits on the ring or it
> remains in the queue.
I propose to drop from the qdisc queue, as it happens in classic kernel. 
I actually reimplemented the wake_queue timer.
When the stack calls xenvif_start_xmit, it checks if it can fit the 
packet into the ring, and it drops it if not. If it can fit, we are 
good, as it will be copied shortly. If not, dropping is also good for 
us, because we get back the pages.
Next start_xmit checks if there is room in the ring for a max sized 
packet, and stops queueing. That's bad, because packets gather up in 
qdisc indefinitely, if the ring doesn't move. The wake_queue timer is 
set therefore, and when it fires, it wakes the queueing. Then qdisc 
start calling start_xmit again, and as the ring is full, it drops the 
packet. And because we don't stop the queueing when we drop, it will 
keep calling start_xmit and drops the packets, draining qdisc eventually.

> What about any queueing which occurs in "network stack" (either
> instance) or "bridge?" How can you cancel an skb out of those? Are you
> intending that by dropping packets a "timeout & drop" they would
> eventually make their way to the second netback and be droppable? How
> convinced are you that this is viable?
I don't know if the stack does too much queueing on that path apart from 
qdisc, I assume not. But even if it does, eventually those queues will 
advance when the qdisc one gets drained out.

>
>>   and don't care about the rest of the cases (packets get stucked
>> somewhere in the core stack, a driver, or in the queue of a Dom0
>> userspace socket. In the latter case, they get copied anyway, so it
>> shouldn't happen)
>
> I think that is OK iff you are copying for dom0 delivery. If you are not
> copying here then an dom0 process (including an anonymous one) which can
> open a socket and receive traffic could block things indefinitely.
>
> The more general case of an unprivileged or deprivileged (i.e. a process
> which has dropped its root privs somehow) being able to interfere with
> the traffic in a way which causes gridlock might need a little more
> thought though.
deliver_skb() will copy the skbs sent to Dom0 stack:

https://lkml.org/lkml/2012/7/20/363

>
>> Does anyone has a counterargument?
>>
>> Zoli
>>
>> On 13/11/13 20:29, Zoltan Kiss wrote:
>>> Hi,
>>>
>>> I'm trying to forward port delayed copy to my new grant mapping patches.
>>> One important problem I've faced is that classic used
>>> gnttab_copy_grant_page to replace the granted page with a local copy and
>>> unmap the grant. And this function has never been upstreamed as only
>>> netback used it. Unfortunately upstreaming it is not a very easy task,
>>> as the kernel's grant table infrastructure doesn't track at the moment
>>> whether the page is DMA mapped or not. It is required because we
>>> shouldn't proceed with the copy and replace if a device already mapped
>>> the page for DMA.
>>> David came up with an alternative idea: we do this delayed copy because
>>> we don't want the guest's page to get stucked in Dom0 indefinitely. The
>>> only realistic case for that would be if the egress interface would be
>>> an another guest's vif, where the guest (either due to a bug or as a
>>> malicious attempt) doesn't empty its ring. I think it's a safe
>>> assumption that Dom0 otherwise doesn't hold on to packets for too long.
>>> Or if it does, then that's a bug we should fix instead of doing a copy
>>> of the packet.
>>> If we accept that only other vif's can keep the skb indefinitely, then
>>> an easier solution would be to handle this problem on the RX side: the
>>> RX thread can also check whether this skb hanged around for too long and
>>> drop it. Actually, xenvif_start_xmit already checks if the guest
>>> provided enough slots for us to do the grant copy. If I understand it
>>> correctly. What do you think about such an approach?
>>
>
>

     prev parent reply	other threads:[~2013-11-20 12:28 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-11-13 20:29 netback: Delayed copy alternative Zoltan Kiss
2013-11-14  9:42 ` Paul Durrant
2013-11-14 11:04   ` David Vrabel
2013-11-14 19:27   ` Timout packets in device's TX queue Zoltan Kiss
2013-11-19 16:42 ` netback: Delayed copy alternative Zoltan Kiss
2013-11-20 11:16   ` Ian Campbell
2013-11-20 12:28     ` Zoltan Kiss [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=528CAB00.6080203@citrix.com \
    --to=zoltan.kiss@citrix.com \
    --cc=Ian.Campbell@citrix.com \
    --cc=Jonathan.Davies@eu.citrix.com \
    --cc=Paul.Durrant@citrix.com \
    --cc=david.vrabel@citrix.com \
    --cc=malcolm.crossley@citrix.com \
    --cc=wei.liu2@citrix.com \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).