netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Arnd Bergmann <arnd@arndb.de>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: netdev@vger.kernel.org, Patrick McHardy <kaber@trash.net>,
	Stephen Hemminger <shemminger@linux-foundation.org>,
	"David S. Miller" <davem@davemloft.net>,
	Herbert Xu <herbert@gondor.apana.org.au>,
	Or Gerlitz <ogerlitz@voltaire.com>,
	"Fischer, Anna" <anna.fischer@hp.com>,
	bridge@lists.linux-foundation.org, linux-kernel@vger.kernel.org,
	Edge Virtual Bridging <evb@yahoogroups.com>
Subject: Re: [PATCH] macvlan: add tap device backend
Date: Sun, 9 Aug 2009 20:42:24 +0000	[thread overview]
Message-ID: <200908092042.24686.arnd@arndb.de> (raw)
In-Reply-To: <20090809080215.GA17200@redhat.com>

On Sunday 09 August 2009 08:02:16 Michael S. Tsirkin wrote:
> On Thu, Aug 06, 2009 at 09:50:28PM +0000, Arnd Bergmann wrote:
> > * The same framework in macvlan can be used to add a third backend
> > into a future kernel based virtio-net implementation.
> 
> Could you split the patches up, to make this last easier?
> patch 1 - export framework
> patch 2 - code using it

Sure, will do.

> > +/* Get packet from user space buffer */
> > +static ssize_t macvtap_get_user(struct macvtap_dev *vtap,
> > +			       const struct iovec *iv, size_t count,
> > +			       int noblock)
> > +{
> > +	struct sk_buff *skb;
> > +	size_t len = count;
> > +
> > +	if (unlikely(len < ETH_HLEN))
> > +		return -EINVAL;
> > +
> > +	skb = alloc_skb(NET_IP_ALIGN + len, GFP_KERNEL);
> > +
> > +	if (!skb) {
> > +		vtap->m.dev->stats.rx_dropped++;
> > +		return -ENOMEM;
> > +	}
> > +
> > +	skb_reserve(skb, NET_IP_ALIGN);
> > +	skb_put(skb, count);
> > +
> > +	if (skb_copy_datagram_from_iovec(skb, 0, iv, 0, len)) {
> > +		vtap->m.dev->stats.rx_dropped++;
> > +		kfree_skb(skb);
> > +		return -EFAULT;
> > +	}
> > +
> > +	skb_set_network_header(skb, ETH_HLEN);
> > +	skb->dev = vtap->m.lowerdev;
> > +
> > +	macvlan_start_xmit(skb, vtap->m.dev);
> > +
> > +	return count;
> > +}
> 
> With tap, we discovered that not limiting the number of outstanding
> skbs hurts UDP performance. And the solution was to limit
> the number of outstanding packets - with hacks to work around
> the fact that userspace .

Something seems to be missing in your last sentence here.

My driver OTOH is also missing any sort of flow control in both
RX and TX direction ;) For RX, there should probably just be
a limit of frames that get buffered in the ring.

For TX, I guess there should be a way to let the packet
scheduler handle this and give us a chance to block and
unblock at the right time. I haven't found out yet how to
do that.

Would it be enough to check the dev_queue_xmit() return
code for NETDEV_TX_BUSY?

How would I get notified when it gets free again?

> > +	ret = skb_copy_datagram_iovec(skb, 0, iv, len);
> > +
> > +	vtap->m.dev->stats.rx_packets++;
> > +	vtap->m.dev->stats.rx_bytes += len;
> 
> where does atomicity guarantee for these counters come from?

AFAIK, we never do for any driver. They are statistics only and
need not be 100% correct, so the networking stack goes for
lower overhead and 99.9% correct.

> > +static ssize_t macvtap_aio_read(struct kiocb *iocb, const struct iovec *iv,
> > +			    unsigned long count, loff_t pos)
> > +{
> > +	struct file *file = iocb->ki_filp;
> > +	struct macvtap_dev *vtap = file->private_data;
> > +	DECLARE_WAITQUEUE(wait, current);
> > +	struct sk_buff *skb;
> > +	ssize_t len, ret = 0;
> > +
> > +	if (!vtap)
> > +		return -EBADFD;
> > +
> > +	len = iov_length(iv, count);
> > +	if (len < 0) {
> > +		ret = -EINVAL;
> > +		goto out;
> > +	}
> > +
> > +	add_wait_queue(&vtap->wait, &wait);
> > +	while (len) {
> > +		current->state = TASK_INTERRUPTIBLE;
> > +
> > +		/* Read frames from the queue */
> > +		if (!(skb=skb_dequeue(&vtap->readq))) {
> > +			if (file->f_flags & O_NONBLOCK) {
> > +				ret = -EAGAIN;
> > +				break;
> > +			}
> > +			if (signal_pending(current)) {
> > +				ret = -ERESTARTSYS;
> > +				break;
> > +			}
> > +			/* Nothing to read, let's sleep */
> > +			schedule();
> > +			continue;
> > +		}
> > +		ret = macvtap_put_user(vtap, skb, (struct iovec *) iv, len);
> 
> Don't cast away the constness. Instead, fix macvtap_put_user
> to used skb_copy_datagram_const_iovec which does not modify the iovec.

Ah, good catch. I had copied that from the tun driver before you
fixed it there and failed to fix it the right way when I adapted
it for the new interface.

Thanks for the review,

	Arnd <><

  reply	other threads:[~2009-08-09 20:42 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-08-06 21:50 [PATCH] macvlan: add tap device backend Arnd Bergmann
2009-08-07  3:20 ` David Miller
2009-08-07 17:35 ` [Bridge] " Daniel Robbins
2009-08-09  8:02 ` Michael S. Tsirkin
2009-08-09 20:42   ` Arnd Bergmann [this message]
2009-08-10  8:50     ` Michael S. Tsirkin
2009-08-10 13:29       ` Arnd Bergmann
2009-08-10 13:42         ` Michael S. Tsirkin
2009-08-10  6:47 ` Patrick McHardy
2009-08-10 18:43   ` Arnd Bergmann

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=200908092042.24686.arnd@arndb.de \
    --to=arnd@arndb.de \
    --cc=anna.fischer@hp.com \
    --cc=bridge@lists.linux-foundation.org \
    --cc=davem@davemloft.net \
    --cc=evb@yahoogroups.com \
    --cc=herbert@gondor.apana.org.au \
    --cc=kaber@trash.net \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mst@redhat.com \
    --cc=netdev@vger.kernel.org \
    --cc=ogerlitz@voltaire.com \
    --cc=shemminger@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).