Re: [Bridge] [PATCH] macvlan: add tap device backend

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Arnd Bergmann <arnd@arndb.de>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>,
	"Fischer, Anna" <anna.fischer@hp.com>,
	netdev@vger.kernel.org, bridge@lists.linux-foundation.org,
	linux-kernel@vger.kernel.org,
	"David S. Miller" <davem@davemloft.net>,
	Or Gerlitz <ogerlitz@voltaire.com>,
	Edge Virtual Bridging <evb@yahoogroups.com>
Subject: Re: [Bridge] [PATCH] macvlan: add tap device backend
Date: Sun, 9 Aug 2009 20:42:24 +0000	[thread overview]
Message-ID: <200908092042.24686.arnd@arndb.de> (raw)
In-Reply-To: <20090809080215.GA17200@redhat.com>

On Sunday 09 August 2009 08:02:16 Michael S. Tsirkin wrote:
> On Thu, Aug 06, 2009 at 09:50:28PM +0000, Arnd Bergmann wrote:
> > * The same framework in macvlan can be used to add a third backend
> > into a future kernel based virtio-net implementation.
> 
> Could you split the patches up, to make this last easier?
> patch 1 - export framework
> patch 2 - code using it

Sure, will do.

> > +/* Get packet from user space buffer */
> > +static ssize_t macvtap_get_user(struct macvtap_dev *vtap,
> > +			       const struct iovec *iv, size_t count,
> > +			       int noblock)
> > +{
> > +	struct sk_buff *skb;
> > +	size_t len = count;
> > +
> > +	if (unlikely(len < ETH_HLEN))
> > +		return -EINVAL;
> > +
> > +	skb = alloc_skb(NET_IP_ALIGN + len, GFP_KERNEL);
> > +
> > +	if (!skb) {
> > +		vtap->m.dev->stats.rx_dropped++;
> > +		return -ENOMEM;
> > +	}
> > +
> > +	skb_reserve(skb, NET_IP_ALIGN);
> > +	skb_put(skb, count);
> > +
> > +	if (skb_copy_datagram_from_iovec(skb, 0, iv, 0, len)) {
> > +		vtap->m.dev->stats.rx_dropped++;
> > +		kfree_skb(skb);
> > +		return -EFAULT;
> > +	}
> > +
> > +	skb_set_network_header(skb, ETH_HLEN);
> > +	skb->dev = vtap->m.lowerdev;
> > +
> > +	macvlan_start_xmit(skb, vtap->m.dev);
> > +
> > +	return count;
> > +}
> 
> With tap, we discovered that not limiting the number of outstanding
> skbs hurts UDP performance. And the solution was to limit
> the number of outstanding packets - with hacks to work around
> the fact that userspace .

Something seems to be missing in your last sentence here.

My driver OTOH is also missing any sort of flow control in both
RX and TX direction ;) For RX, there should probably just be
a limit of frames that get buffered in the ring.

For TX, I guess there should be a way to let the packet
scheduler handle this and give us a chance to block and
unblock at the right time. I haven't found out yet how to
do that.

Would it be enough to check the dev_queue_xmit() return
code for NETDEV_TX_BUSY?

How would I get notified when it gets free again?

> > +	ret = skb_copy_datagram_iovec(skb, 0, iv, len);
> > +
> > +	vtap->m.dev->stats.rx_packets++;
> > +	vtap->m.dev->stats.rx_bytes += len;
> 
> where does atomicity guarantee for these counters come from?

AFAIK, we never do for any driver. They are statistics only and
need not be 100% correct, so the networking stack goes for
lower overhead and 99.9% correct.

> > +static ssize_t macvtap_aio_read(struct kiocb *iocb, const struct iovec *iv,
> > +			    unsigned long count, loff_t pos)
> > +{
> > +	struct file *file = iocb->ki_filp;
> > +	struct macvtap_dev *vtap = file->private_data;
> > +	DECLARE_WAITQUEUE(wait, current);
> > +	struct sk_buff *skb;
> > +	ssize_t len, ret = 0;
> > +
> > +	if (!vtap)
> > +		return -EBADFD;
> > +
> > +	len = iov_length(iv, count);
> > +	if (len < 0) {
> > +		ret = -EINVAL;
> > +		goto out;
> > +	}
> > +
> > +	add_wait_queue(&vtap->wait, &wait);
> > +	while (len) {
> > +		current->state = TASK_INTERRUPTIBLE;
> > +
> > +		/* Read frames from the queue */
> > +		if (!(skb=skb_dequeue(&vtap->readq))) {
> > +			if (file->f_flags & O_NONBLOCK) {
> > +				ret = -EAGAIN;
> > +				break;
> > +			}
> > +			if (signal_pending(current)) {
> > +				ret = -ERESTARTSYS;
> > +				break;
> > +			}
> > +			/* Nothing to read, let's sleep */
> > +			schedule();
> > +			continue;
> > +		}
> > +		ret = macvtap_put_user(vtap, skb, (struct iovec *) iv, len);
> 
> Don't cast away the constness. Instead, fix macvtap_put_user
> to used skb_copy_datagram_const_iovec which does not modify the iovec.

Ah, good catch. I had copied that from the tun driver before you
fixed it there and failed to fix it the right way when I adapted
it for the new interface.

Thanks for the review,

	Arnd <><

WARNING: multiple messages have this Message-ID (diff)

From: Arnd Bergmann <arnd@arndb.de>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: netdev@vger.kernel.org, Patrick McHardy <kaber@trash.net>,
	Stephen Hemminger <shemminger@linux-foundation.org>,
	"David S. Miller" <davem@davemloft.net>,
	Herbert Xu <herbert@gondor.apana.org.au>,
	Or Gerlitz <ogerlitz@voltaire.com>,
	"Fischer, Anna" <anna.fischer@hp.com>,
	bridge@lists.linux-foundation.org, linux-kernel@vger.kernel.org,
	Edge Virtual Bridging <evb@yahoogroups.com>
Subject: Re: [PATCH] macvlan: add tap device backend
Date: Sun, 9 Aug 2009 20:42:24 +0000	[thread overview]
Message-ID: <200908092042.24686.arnd@arndb.de> (raw)
In-Reply-To: <20090809080215.GA17200@redhat.com>

On Sunday 09 August 2009 08:02:16 Michael S. Tsirkin wrote:
> On Thu, Aug 06, 2009 at 09:50:28PM +0000, Arnd Bergmann wrote:
> > * The same framework in macvlan can be used to add a third backend
> > into a future kernel based virtio-net implementation.
> 
> Could you split the patches up, to make this last easier?
> patch 1 - export framework
> patch 2 - code using it

Sure, will do.

> > +/* Get packet from user space buffer */
> > +static ssize_t macvtap_get_user(struct macvtap_dev *vtap,
> > +			       const struct iovec *iv, size_t count,
> > +			       int noblock)
> > +{
> > +	struct sk_buff *skb;
> > +	size_t len = count;
> > +
> > +	if (unlikely(len < ETH_HLEN))
> > +		return -EINVAL;
> > +
> > +	skb = alloc_skb(NET_IP_ALIGN + len, GFP_KERNEL);
> > +
> > +	if (!skb) {
> > +		vtap->m.dev->stats.rx_dropped++;
> > +		return -ENOMEM;
> > +	}
> > +
> > +	skb_reserve(skb, NET_IP_ALIGN);
> > +	skb_put(skb, count);
> > +
> > +	if (skb_copy_datagram_from_iovec(skb, 0, iv, 0, len)) {
> > +		vtap->m.dev->stats.rx_dropped++;
> > +		kfree_skb(skb);
> > +		return -EFAULT;
> > +	}
> > +
> > +	skb_set_network_header(skb, ETH_HLEN);
> > +	skb->dev = vtap->m.lowerdev;
> > +
> > +	macvlan_start_xmit(skb, vtap->m.dev);
> > +
> > +	return count;
> > +}
> 
> With tap, we discovered that not limiting the number of outstanding
> skbs hurts UDP performance. And the solution was to limit
> the number of outstanding packets - with hacks to work around
> the fact that userspace .

Something seems to be missing in your last sentence here.

My driver OTOH is also missing any sort of flow control in both
RX and TX direction ;) For RX, there should probably just be
a limit of frames that get buffered in the ring.

For TX, I guess there should be a way to let the packet
scheduler handle this and give us a chance to block and
unblock at the right time. I haven't found out yet how to
do that.

Would it be enough to check the dev_queue_xmit() return
code for NETDEV_TX_BUSY?

How would I get notified when it gets free again?

> > +	ret = skb_copy_datagram_iovec(skb, 0, iv, len);
> > +
> > +	vtap->m.dev->stats.rx_packets++;
> > +	vtap->m.dev->stats.rx_bytes += len;
> 
> where does atomicity guarantee for these counters come from?

AFAIK, we never do for any driver. They are statistics only and
need not be 100% correct, so the networking stack goes for
lower overhead and 99.9% correct.

> > +static ssize_t macvtap_aio_read(struct kiocb *iocb, const struct iovec *iv,
> > +			    unsigned long count, loff_t pos)
> > +{
> > +	struct file *file = iocb->ki_filp;
> > +	struct macvtap_dev *vtap = file->private_data;
> > +	DECLARE_WAITQUEUE(wait, current);
> > +	struct sk_buff *skb;
> > +	ssize_t len, ret = 0;
> > +
> > +	if (!vtap)
> > +		return -EBADFD;
> > +
> > +	len = iov_length(iv, count);
> > +	if (len < 0) {
> > +		ret = -EINVAL;
> > +		goto out;
> > +	}
> > +
> > +	add_wait_queue(&vtap->wait, &wait);
> > +	while (len) {
> > +		current->state = TASK_INTERRUPTIBLE;
> > +
> > +		/* Read frames from the queue */
> > +		if (!(skb=skb_dequeue(&vtap->readq))) {
> > +			if (file->f_flags & O_NONBLOCK) {
> > +				ret = -EAGAIN;
> > +				break;
> > +			}
> > +			if (signal_pending(current)) {
> > +				ret = -ERESTARTSYS;
> > +				break;
> > +			}
> > +			/* Nothing to read, let's sleep */
> > +			schedule();
> > +			continue;
> > +		}
> > +		ret = macvtap_put_user(vtap, skb, (struct iovec *) iv, len);
> 
> Don't cast away the constness. Instead, fix macvtap_put_user
> to used skb_copy_datagram_const_iovec which does not modify the iovec.

Ah, good catch. I had copied that from the tun driver before you
fixed it there and failed to fix it the right way when I adapted
it for the new interface.

Thanks for the review,

	Arnd <><

next prev parent reply	other threads:[~2009-08-09 20:42 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-08-06 21:50 [Bridge] [PATCH] macvlan: add tap device backend Arnd Bergmann
2009-08-06 21:50 ` Arnd Bergmann
2009-08-07  3:20 ` [Bridge] " David Miller
2009-08-07  3:20   ` David Miller
2009-08-07 17:35 ` [Bridge] " Daniel Robbins
2009-08-07 17:35   ` Daniel Robbins
2009-08-09  8:02 ` Michael S. Tsirkin
2009-08-09  8:02   ` Michael S. Tsirkin
2009-08-09 20:42   ` Arnd Bergmann [this message]
2009-08-09 20:42     ` Arnd Bergmann
2009-08-10  8:50     ` [Bridge] " Michael S. Tsirkin
2009-08-10  8:50       ` Michael S. Tsirkin
2009-08-10 13:29       ` [Bridge] " Arnd Bergmann
2009-08-10 13:29         ` Arnd Bergmann
2009-08-10 13:42         ` [Bridge] " Michael S. Tsirkin
2009-08-10 13:42           ` Michael S. Tsirkin
2009-08-10  6:47 ` [Bridge] " Patrick McHardy
2009-08-10  6:47   ` Patrick McHardy
2009-08-10 18:43   ` [Bridge] " Arnd Bergmann
2009-08-10 18:43     ` Arnd Bergmann
     [not found] <0199E0D51A61344794750DC57738F58E6D6A6CD7F6@GVW1118EXC.americas.hpqcorp.net>
2009-08-07 19:10 ` [Bridge] " Paul Congdon (UC Davis)
2009-08-07 19:10   ` Paul Congdon (UC Davis)
2009-08-07 19:10   ` Paul Congdon (UC Davis)
2009-08-07 19:35   ` Stephen Hemminger
2009-08-07 19:35     ` Stephen Hemminger
2009-08-07 19:44     ` Fischer, Anna
2009-08-07 19:44       ` Fischer, Anna
2009-08-07 20:17       ` david
2009-08-07 20:17         ` david
2009-08-07 19:47     ` Paul Congdon (UC Davis)
2009-08-07 19:47       ` Paul Congdon (UC Davis)
2009-08-07 21:38     ` Arnd Bergmann
2009-08-07 21:38       ` Arnd Bergmann
2009-08-07 22:05   ` Arnd Bergmann
2009-08-07 22:05     ` Arnd Bergmann
2009-08-10 12:40     ` Fischer, Anna
2009-08-10 12:40       ` Fischer, Anna
2009-08-10 12:40       ` Fischer, Anna
2009-08-10 19:04       ` Arnd Bergmann
2009-08-10 19:04         ` Arnd Bergmann
2009-08-10 19:32         ` Michael S. Tsirkin
2009-08-10 19:32           ` Michael S. Tsirkin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=200908092042.24686.arnd@arndb.de \
    --to=arnd@arndb.de \
    --cc=anna.fischer@hp.com \
    --cc=bridge@lists.linux-foundation.org \
    --cc=davem@davemloft.net \
    --cc=evb@yahoogroups.com \
    --cc=herbert@gondor.apana.org.au \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mst@redhat.com \
    --cc=netdev@vger.kernel.org \
    --cc=ogerlitz@voltaire.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.