Re: [Bridge] [PATCH] macvlan: add tap device backend

All of lore.kernel.org
 help / color / mirror / Atom feed

From: "Michael S. Tsirkin" <mst@redhat.com>
To: Arnd Bergmann <arnd@arndb.de>
Cc: Herbert Xu <herbert@gondor.apana.org.au>,
	"Fischer, Anna" <anna.fischer@hp.com>,
	netdev@vger.kernel.org, bridge@lists.linux-foundation.org,
	linux-kernel@vger.kernel.org,
	"David S. Miller" <davem@davemloft.net>,
	Or Gerlitz <ogerlitz@voltaire.com>,
	Edge Virtual Bridging <evb@yahoogroups.com>
Subject: Re: [Bridge] [PATCH] macvlan: add tap device backend
Date: Mon, 10 Aug 2009 11:50:03 +0300	[thread overview]
Message-ID: <20090810085003.GA3654@redhat.com> (raw)
In-Reply-To: <200908092042.24686.arnd@arndb.de>

On Sun, Aug 09, 2009 at 08:42:24PM +0000, Arnd Bergmann wrote:
> On Sunday 09 August 2009 08:02:16 Michael S. Tsirkin wrote:
> > On Thu, Aug 06, 2009 at 09:50:28PM +0000, Arnd Bergmann wrote:
> > > * The same framework in macvlan can be used to add a third backend
> > > into a future kernel based virtio-net implementation.
> > 
> > Could you split the patches up, to make this last easier?
> > patch 1 - export framework
> > patch 2 - code using it
> 
> Sure, will do.
> 
> > > +/* Get packet from user space buffer */
> > > +static ssize_t macvtap_get_user(struct macvtap_dev *vtap,
> > > +			       const struct iovec *iv, size_t count,
> > > +			       int noblock)
> > > +{
> > > +	struct sk_buff *skb;
> > > +	size_t len = count;
> > > +
> > > +	if (unlikely(len < ETH_HLEN))
> > > +		return -EINVAL;
> > > +
> > > +	skb = alloc_skb(NET_IP_ALIGN + len, GFP_KERNEL);
> > > +
> > > +	if (!skb) {
> > > +		vtap->m.dev->stats.rx_dropped++;
> > > +		return -ENOMEM;
> > > +	}
> > > +
> > > +	skb_reserve(skb, NET_IP_ALIGN);
> > > +	skb_put(skb, count);
> > > +
> > > +	if (skb_copy_datagram_from_iovec(skb, 0, iv, 0, len)) {
> > > +		vtap->m.dev->stats.rx_dropped++;
> > > +		kfree_skb(skb);
> > > +		return -EFAULT;
> > > +	}
> > > +
> > > +	skb_set_network_header(skb, ETH_HLEN);
> > > +	skb->dev = vtap->m.lowerdev;
> > > +
> > > +	macvlan_start_xmit(skb, vtap->m.dev);
> > > +
> > > +	return count;
> > > +}
> > 
> > With tap, we discovered that not limiting the number of outstanding
> > skbs hurts UDP performance. And the solution was to limit
> > the number of outstanding packets - with hacks to work around
> > the fact that userspace .
> 
> Something seems to be missing in your last sentence here.

Most userspace does not seem to implement software flow control for UDP,
even though it probably should.

> My driver OTOH is also missing any sort of flow control in both
> RX and TX direction ;) For RX, there should probably just be
> a limit of frames that get buffered in the ring.
> 
> For TX, I guess there should be a way to let the packet
> scheduler handle this and give us a chance to block and
> unblock at the right time. I haven't found out yet how to
> do that.
> 
> Would it be enough to check the dev_queue_xmit() return
> code for NETDEV_TX_BUSY?
> 
> How would I get notified when it gets free again?

You can do this by creating a socket. Look at how tun does
this now.

> > > +	ret = skb_copy_datagram_iovec(skb, 0, iv, len);
> > > +
> > > +	vtap->m.dev->stats.rx_packets++;
> > > +	vtap->m.dev->stats.rx_bytes += len;
> > 
> > where does atomicity guarantee for these counters come from?
> 
> AFAIK, we never do for any driver. They are statistics only and
> need not be 100% correct, so the networking stack goes for
> lower overhead and 99.9% correct.
> 
> > > +static ssize_t macvtap_aio_read(struct kiocb *iocb, const struct iovec *iv,
> > > +			    unsigned long count, loff_t pos)
> > > +{
> > > +	struct file *file = iocb->ki_filp;
> > > +	struct macvtap_dev *vtap = file->private_data;
> > > +	DECLARE_WAITQUEUE(wait, current);
> > > +	struct sk_buff *skb;
> > > +	ssize_t len, ret = 0;
> > > +
> > > +	if (!vtap)
> > > +		return -EBADFD;
> > > +
> > > +	len = iov_length(iv, count);
> > > +	if (len < 0) {
> > > +		ret = -EINVAL;
> > > +		goto out;
> > > +	}
> > > +
> > > +	add_wait_queue(&vtap->wait, &wait);
> > > +	while (len) {
> > > +		current->state = TASK_INTERRUPTIBLE;
> > > +
> > > +		/* Read frames from the queue */
> > > +		if (!(skb=skb_dequeue(&vtap->readq))) {
> > > +			if (file->f_flags & O_NONBLOCK) {
> > > +				ret = -EAGAIN;
> > > +				break;
> > > +			}
> > > +			if (signal_pending(current)) {
> > > +				ret = -ERESTARTSYS;
> > > +				break;
> > > +			}
> > > +			/* Nothing to read, let's sleep */
> > > +			schedule();
> > > +			continue;
> > > +		}
> > > +		ret = macvtap_put_user(vtap, skb, (struct iovec *) iv, len);
> > 
> > Don't cast away the constness. Instead, fix macvtap_put_user
> > to used skb_copy_datagram_const_iovec which does not modify the iovec.
> 
> Ah, good catch. I had copied that from the tun driver before you
> fixed it there and failed to fix it the right way when I adapted
> it for the new interface.
> 
> Thanks for the review,
> 
> 	Arnd <><

WARNING: multiple messages have this Message-ID (diff)

From: "Michael S. Tsirkin" <mst@redhat.com>
To: Arnd Bergmann <arnd@arndb.de>
Cc: netdev@vger.kernel.org, Patrick McHardy <kaber@trash.net>,
	Stephen Hemminger <shemminger@linux-foundation.org>,
	"David S. Miller" <davem@davemloft.net>,
	Herbert Xu <herbert@gondor.apana.org.au>,
	Or Gerlitz <ogerlitz@voltaire.com>,
	"Fischer, Anna" <anna.fischer@hp.com>,
	bridge@lists.linux-foundation.org, linux-kernel@vger.kernel.org,
	Edge Virtual Bridging <evb@yahoogroups.com>
Subject: Re: [PATCH] macvlan: add tap device backend
Date: Mon, 10 Aug 2009 11:50:03 +0300	[thread overview]
Message-ID: <20090810085003.GA3654@redhat.com> (raw)
In-Reply-To: <200908092042.24686.arnd@arndb.de>

On Sun, Aug 09, 2009 at 08:42:24PM +0000, Arnd Bergmann wrote:
> On Sunday 09 August 2009 08:02:16 Michael S. Tsirkin wrote:
> > On Thu, Aug 06, 2009 at 09:50:28PM +0000, Arnd Bergmann wrote:
> > > * The same framework in macvlan can be used to add a third backend
> > > into a future kernel based virtio-net implementation.
> > 
> > Could you split the patches up, to make this last easier?
> > patch 1 - export framework
> > patch 2 - code using it
> 
> Sure, will do.
> 
> > > +/* Get packet from user space buffer */
> > > +static ssize_t macvtap_get_user(struct macvtap_dev *vtap,
> > > +			       const struct iovec *iv, size_t count,
> > > +			       int noblock)
> > > +{
> > > +	struct sk_buff *skb;
> > > +	size_t len = count;
> > > +
> > > +	if (unlikely(len < ETH_HLEN))
> > > +		return -EINVAL;
> > > +
> > > +	skb = alloc_skb(NET_IP_ALIGN + len, GFP_KERNEL);
> > > +
> > > +	if (!skb) {
> > > +		vtap->m.dev->stats.rx_dropped++;
> > > +		return -ENOMEM;
> > > +	}
> > > +
> > > +	skb_reserve(skb, NET_IP_ALIGN);
> > > +	skb_put(skb, count);
> > > +
> > > +	if (skb_copy_datagram_from_iovec(skb, 0, iv, 0, len)) {
> > > +		vtap->m.dev->stats.rx_dropped++;
> > > +		kfree_skb(skb);
> > > +		return -EFAULT;
> > > +	}
> > > +
> > > +	skb_set_network_header(skb, ETH_HLEN);
> > > +	skb->dev = vtap->m.lowerdev;
> > > +
> > > +	macvlan_start_xmit(skb, vtap->m.dev);
> > > +
> > > +	return count;
> > > +}
> > 
> > With tap, we discovered that not limiting the number of outstanding
> > skbs hurts UDP performance. And the solution was to limit
> > the number of outstanding packets - with hacks to work around
> > the fact that userspace .
> 
> Something seems to be missing in your last sentence here.

Most userspace does not seem to implement software flow control for UDP,
even though it probably should.

> My driver OTOH is also missing any sort of flow control in both
> RX and TX direction ;) For RX, there should probably just be
> a limit of frames that get buffered in the ring.
> 
> For TX, I guess there should be a way to let the packet
> scheduler handle this and give us a chance to block and
> unblock at the right time. I haven't found out yet how to
> do that.
> 
> Would it be enough to check the dev_queue_xmit() return
> code for NETDEV_TX_BUSY?
> 
> How would I get notified when it gets free again?

You can do this by creating a socket. Look at how tun does
this now.

> > > +	ret = skb_copy_datagram_iovec(skb, 0, iv, len);
> > > +
> > > +	vtap->m.dev->stats.rx_packets++;
> > > +	vtap->m.dev->stats.rx_bytes += len;
> > 
> > where does atomicity guarantee for these counters come from?
> 
> AFAIK, we never do for any driver. They are statistics only and
> need not be 100% correct, so the networking stack goes for
> lower overhead and 99.9% correct.
> 
> > > +static ssize_t macvtap_aio_read(struct kiocb *iocb, const struct iovec *iv,
> > > +			    unsigned long count, loff_t pos)
> > > +{
> > > +	struct file *file = iocb->ki_filp;
> > > +	struct macvtap_dev *vtap = file->private_data;
> > > +	DECLARE_WAITQUEUE(wait, current);
> > > +	struct sk_buff *skb;
> > > +	ssize_t len, ret = 0;
> > > +
> > > +	if (!vtap)
> > > +		return -EBADFD;
> > > +
> > > +	len = iov_length(iv, count);
> > > +	if (len < 0) {
> > > +		ret = -EINVAL;
> > > +		goto out;
> > > +	}
> > > +
> > > +	add_wait_queue(&vtap->wait, &wait);
> > > +	while (len) {
> > > +		current->state = TASK_INTERRUPTIBLE;
> > > +
> > > +		/* Read frames from the queue */
> > > +		if (!(skb=skb_dequeue(&vtap->readq))) {
> > > +			if (file->f_flags & O_NONBLOCK) {
> > > +				ret = -EAGAIN;
> > > +				break;
> > > +			}
> > > +			if (signal_pending(current)) {
> > > +				ret = -ERESTARTSYS;
> > > +				break;
> > > +			}
> > > +			/* Nothing to read, let's sleep */
> > > +			schedule();
> > > +			continue;
> > > +		}
> > > +		ret = macvtap_put_user(vtap, skb, (struct iovec *) iv, len);
> > 
> > Don't cast away the constness. Instead, fix macvtap_put_user
> > to used skb_copy_datagram_const_iovec which does not modify the iovec.
> 
> Ah, good catch. I had copied that from the tun driver before you
> fixed it there and failed to fix it the right way when I adapted
> it for the new interface.
> 
> Thanks for the review,
> 
> 	Arnd <><

next prev parent reply	other threads:[~2009-08-10  8:50 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-08-06 21:50 [Bridge] [PATCH] macvlan: add tap device backend Arnd Bergmann
2009-08-06 21:50 ` Arnd Bergmann
2009-08-07  3:20 ` [Bridge] " David Miller
2009-08-07  3:20   ` David Miller
2009-08-07 17:35 ` [Bridge] " Daniel Robbins
2009-08-07 17:35   ` Daniel Robbins
2009-08-09  8:02 ` Michael S. Tsirkin
2009-08-09  8:02   ` Michael S. Tsirkin
2009-08-09 20:42   ` [Bridge] " Arnd Bergmann
2009-08-09 20:42     ` Arnd Bergmann
2009-08-10  8:50     ` Michael S. Tsirkin [this message]
2009-08-10  8:50       ` Michael S. Tsirkin
2009-08-10 13:29       ` [Bridge] " Arnd Bergmann
2009-08-10 13:29         ` Arnd Bergmann
2009-08-10 13:42         ` [Bridge] " Michael S. Tsirkin
2009-08-10 13:42           ` Michael S. Tsirkin
2009-08-10  6:47 ` [Bridge] " Patrick McHardy
2009-08-10  6:47   ` Patrick McHardy
2009-08-10 18:43   ` [Bridge] " Arnd Bergmann
2009-08-10 18:43     ` Arnd Bergmann
     [not found] <0199E0D51A61344794750DC57738F58E6D6A6CD7F6@GVW1118EXC.americas.hpqcorp.net>
2009-08-07 19:10 ` [Bridge] " Paul Congdon (UC Davis)
2009-08-07 19:10   ` Paul Congdon (UC Davis)
2009-08-07 19:10   ` Paul Congdon (UC Davis)
2009-08-07 19:35   ` Stephen Hemminger
2009-08-07 19:35     ` Stephen Hemminger
2009-08-07 19:44     ` Fischer, Anna
2009-08-07 19:44       ` Fischer, Anna
2009-08-07 20:17       ` david
2009-08-07 20:17         ` david
2009-08-07 19:47     ` Paul Congdon (UC Davis)
2009-08-07 19:47       ` Paul Congdon (UC Davis)
2009-08-07 21:38     ` Arnd Bergmann
2009-08-07 21:38       ` Arnd Bergmann
2009-08-07 22:05   ` Arnd Bergmann
2009-08-07 22:05     ` Arnd Bergmann
2009-08-10 12:40     ` Fischer, Anna
2009-08-10 12:40       ` Fischer, Anna
2009-08-10 12:40       ` Fischer, Anna
2009-08-10 19:04       ` Arnd Bergmann
2009-08-10 19:04         ` Arnd Bergmann
2009-08-10 19:32         ` Michael S. Tsirkin
2009-08-10 19:32           ` Michael S. Tsirkin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20090810085003.GA3654@redhat.com \
    --to=mst@redhat.com \
    --cc=anna.fischer@hp.com \
    --cc=arnd@arndb.de \
    --cc=bridge@lists.linux-foundation.org \
    --cc=davem@davemloft.net \
    --cc=evb@yahoogroups.com \
    --cc=herbert@gondor.apana.org.au \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=ogerlitz@voltaire.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.