Netdev List


* Re: linux-next: manual merge of the net tree with the net-current tree
From: David Miller @ 2011-02-09  1:06 UTC
  To: sfr
  Cc: netdev, linux-next, linux-kernel, jesse.brandeburg,
	jeffrey.t.kirsher, bruce.w.allan
In-Reply-To: <20110209115609.79932942.sfr@canb.auug.org.au>

From: Stephen Rothwell <sfr@canb.auug.org.au>
Date: Wed, 9 Feb 2011 11:56:09 +1100

> Today's linux-next merge of the net tree got a conflict in
> drivers/net/e1000e/netdev.c between commit
> 463342741222c79469303cdab8ce99c8bc2d80e8 ("e1000e: tx_timeout should not
> increment for non-hang events") from the net-current tree and commit
> 90da06692532541a38f9857972e1fd6b1cdfb45a ("e1000e: reduce scope of some
> variables, remove unnecessary ones") from the net tree.
> 
> I fixed it up (see below) and can carry the fix as necessary.

I'll do a merge to clear this up, thanks Stephen.

* linux-next: manual merge of the net tree with the net-current tree
From: Stephen Rothwell @ 2011-02-09  0:56 UTC
  To: David Miller, netdev
  Cc: linux-next, linux-kernel, Jesse Brandeburg, Jeff Kirsher,
	Bruce Allan

Hi all,

Today's linux-next merge of the net tree got a conflict in
drivers/net/e1000e/netdev.c between commit
463342741222c79469303cdab8ce99c8bc2d80e8 ("e1000e: tx_timeout should not
increment for non-hang events") from the net-current tree and commit
90da06692532541a38f9857972e1fd6b1cdfb45a ("e1000e: reduce scope of some
variables, remove unnecessary ones") from the net tree.

I fixed it up (see below) and can carry the fix as necessary.
-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au

diff --cc drivers/net/e1000e/netdev.c
index 3065870,5b916b0..0000000
--- a/drivers/net/e1000e/netdev.c
+++ b/drivers/net/e1000e/netdev.c
@@@ -4299,20 -4303,18 +4303,17 @@@ link_up
  
  	e1000e_update_adaptive(&adapter->hw);
  
- 	if (!netif_carrier_ok(netdev)) {
- 		tx_pending = (e1000_desc_unused(tx_ring) + 1 <
- 			       tx_ring->count);
- 		if (tx_pending) {
- 			/*
- 			 * We've lost link, so the controller stops DMA,
- 			 * but we've got queued Tx work that's never going
- 			 * to get done, so reset controller to flush Tx.
- 			 * (Do the reset outside of interrupt context).
- 			 */
- 			schedule_work(&adapter->reset_task);
- 			/* return immediately since reset is imminent */
- 			return;
- 		}
+ 	if (!netif_carrier_ok(netdev) &&
+ 	    (e1000_desc_unused(tx_ring) + 1 < tx_ring->count)) {
+ 		/*
+ 		 * We've lost link, so the controller stops DMA,
+ 		 * but we've got queued Tx work that's never going
+ 		 * to get done, so reset controller to flush Tx.
+ 		 * (Do the reset outside of interrupt context).
+ 		 */
 -		adapter->tx_timeout_count++;
+ 		schedule_work(&adapter->reset_task);
+ 		/* return immediately since reset is imminent */
+ 		return;
  	}
  
  	/* Simple mode for Interrupt Throttle Rate (ITR) */

* Re: Network performance with small packets
From: Michael S. Tsirkin @ 2011-02-09  0:53 UTC
  To: Rusty Russell
  Cc: Krishna Kumar2, David Miller, kvm, Shirley Ma, netdev, steved
In-Reply-To: <201102091107.20270.rusty@rustcorp.com.au>

On Wed, Feb 09, 2011 at 11:07:20AM +1030, Rusty Russell wrote:
> On Wed, 2 Feb 2011 03:12:22 pm Michael S. Tsirkin wrote:
> > On Wed, Feb 02, 2011 at 10:09:18AM +0530, Krishna Kumar2 wrote:
> > > > "Michael S. Tsirkin" <mst@redhat.com> 02/02/2011 03:11 AM
> > > >
> > > > On Tue, Feb 01, 2011 at 01:28:45PM -0800, Shirley Ma wrote:
> > > > > On Tue, 2011-02-01 at 23:21 +0200, Michael S. Tsirkin wrote:
> > > > > > Confused. We compare capacity to skb frags, no?
> > > > > > That's sg I think ...
> > > > >
> > > > > > The current guest kernel uses indirect buffers; num_free returns the
> > > > > > number of available descriptors, not skb frags. So it's wrong here.
> > > > >
> > > > > Shirley
> > > >
> > > > I see. Good point. In other words when we complete the buffer
> > > > it was indirect, but when we add a new one we
> > > > can not allocate indirect so we consume.
> > > > And then we start the queue and add will fail.
> > > > I guess we need some kind of API to figure out
> > > > whether the buf we complete was indirect?
> 
> I've finally read this thread... I think we need to get more serious
> with our stats gathering to diagnose these kinds of performance issues.
> 
> This is a start; it should tell us what is actually happening to the
> virtio ring(s) without significant performance impact...
> 
> Subject: virtio: CONFIG_VIRTIO_STATS
> 
> For performance problems we'd like to know exactly what the ring looks
> like.  This patch adds stats indexed by how-full-ring-is; we could extend
> it to also record them by how-used-ring-is if we need.
> 
> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

Not sure whether the intent is to merge this. If yes -
would it make sense to use tracing for this instead?
That's what kvm does.

> diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
> --- a/drivers/virtio/Kconfig
> +++ b/drivers/virtio/Kconfig
> @@ -7,6 +7,14 @@ config VIRTIO_RING
>  	tristate
>  	depends on VIRTIO
>  
> +config VIRTIO_STATS
> +	bool "Virtio debugging stats (EXPERIMENTAL)"
> +	depends on VIRTIO_RING
> +	select DEBUG_FS
> +	---help---
> +	  Virtio stats collected by how full the ring is at any time,
> +	  presented under debugfs/virtio-stats/<name>-<vq>/<num-used>/
> +
>  config VIRTIO_PCI
>  	tristate "PCI driver for virtio devices (EXPERIMENTAL)"
>  	depends on PCI && EXPERIMENTAL
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -21,6 +21,7 @@
>  #include <linux/virtio_config.h>
>  #include <linux/device.h>
>  #include <linux/slab.h>
> +#include <linux/debugfs.h>
>  
>  /* virtio guest is communicating with a virtual "device" that actually runs on
>   * a host processor.  Memory barriers are used to control SMP effects. */
> @@ -95,6 +96,11 @@ struct vring_virtqueue
>  	/* How to notify other side. FIXME: commonalize hcalls! */
>  	void (*notify)(struct virtqueue *vq);
>  
> +#ifdef CONFIG_VIRTIO_STATS
> +	struct vring_stat *stats;
> +	struct dentry *statdir;
> +#endif
> +
>  #ifdef DEBUG
>  	/* They're supposed to lock for us. */
>  	unsigned int in_use;
> @@ -106,6 +112,87 @@ struct vring_virtqueue
>  
>  #define to_vvq(_vq) container_of(_vq, struct vring_virtqueue, vq)
>  
> +#ifdef CONFIG_VIRTIO_STATS
> +/* We have an array of these, indexed by how full the ring is. */
> +struct vring_stat {
> +	/* How many interrupts? */
> +	size_t interrupt_nowork, interrupt_work;
> +	/* How many non-notify kicks, how many notify kicks, how many add notify? */
> +	size_t kick_no_notify, kick_notify, add_notify;
> +	/* How many adds? */
> +	size_t add_direct, add_indirect, add_fail;
> +	/* How many gets? */
> +	size_t get;
> +	/* How many disable callbacks? */
> +	size_t disable_cb;
> +	/* How many enables? */
> +	size_t enable_cb_retry, enable_cb_success;
> +};
> +
> +static struct dentry *virtio_stats;
> +
> +static void create_stat_files(struct vring_virtqueue *vq)
> +{
> +	char name[80];
> +	unsigned int i;
> +
> +	/* Racy in theory, but we don't care. */
> +	if (!virtio_stats)
> +		virtio_stats = debugfs_create_dir("virtio-stats", NULL);
> +
> +	sprintf(name, "%s-%s", dev_name(&vq->vq.vdev->dev), vq->vq.name);
> +	vq->statdir = debugfs_create_dir(name, virtio_stats);
> +
> +	for (i = 0; i < vq->vring.num; i++) {
> +		struct dentry *dir;
> +
> +		sprintf(name, "%i", i);
> +		dir = debugfs_create_dir(name, vq->statdir);
> +		debugfs_create_size_t("interrupt_nowork", 0400, dir,
> +				      &vq->stats[i].interrupt_nowork);
> +		debugfs_create_size_t("interrupt_work", 0400, dir,
> +				      &vq->stats[i].interrupt_work);
> +		debugfs_create_size_t("kick_no_notify", 0400, dir,
> +				      &vq->stats[i].kick_no_notify);
> +		debugfs_create_size_t("kick_notify", 0400, dir,
> +				      &vq->stats[i].kick_notify);
> +		debugfs_create_size_t("add_notify", 0400, dir,
> +				      &vq->stats[i].add_notify);
> +		debugfs_create_size_t("add_direct", 0400, dir,
> +				      &vq->stats[i].add_direct);
> +		debugfs_create_size_t("add_indirect", 0400, dir,
> +				      &vq->stats[i].add_indirect);
> +		debugfs_create_size_t("add_fail", 0400, dir,
> +				      &vq->stats[i].add_fail);
> +		debugfs_create_size_t("get", 0400, dir,
> +				      &vq->stats[i].get);
> +		debugfs_create_size_t("disable_cb", 0400, dir,
> +				      &vq->stats[i].disable_cb);
> +		debugfs_create_size_t("enable_cb_retry", 0400, dir,
> +				      &vq->stats[i].enable_cb_retry);
> +		debugfs_create_size_t("enable_cb_success", 0400, dir,
> +				      &vq->stats[i].enable_cb_success);
> +	}
> +}
> +
> +static void delete_stat_files(struct vring_virtqueue *vq)
> +{
> +	debugfs_remove_recursive(vq->statdir);
> +}
> +
> +#define add_stat(vq, name)						\
> +	do {								\
> +		struct vring_virtqueue *_vq = (vq);			\
> +		_vq->stats[_vq->vring.num - _vq->num_free].name++;	\
> +	} while (0)
> +
> +#else
> +#define add_stat(vq, name)
> +static void delete_stat_files(struct vring_virtqueue *vq)
> +{
> +}
> +#endif
> +
>  /* Set up an indirect table of descriptors and add it to the queue. */
>  static int vring_add_indirect(struct vring_virtqueue *vq,
>  			      struct scatterlist sg[],
> @@ -121,6 +208,8 @@ static int vring_add_indirect(struct vri
>  	if (!desc)
>  		return -ENOMEM;
>  
> +	add_stat(vq, add_indirect);
> +
>  	/* Transfer entries from the sg list into the indirect page */
>  	for (i = 0; i < out; i++) {
>  		desc[i].flags = VRING_DESC_F_NEXT;
> @@ -183,17 +272,22 @@ int virtqueue_add_buf_gfp(struct virtque
>  	BUG_ON(out + in == 0);
>  
>  	if (vq->num_free < out + in) {
> +		add_stat(vq, add_fail);
>  		pr_debug("Can't add buf len %i - avail = %i\n",
>  			 out + in, vq->num_free);
>  		/* FIXME: for historical reasons, we force a notify here if
>  		 * there are outgoing parts to the buffer.  Presumably the
>  		 * host should service the ring ASAP. */
> -		if (out)
> +		if (out) {
> +			add_stat(vq, add_notify);
>  			vq->notify(&vq->vq);
> +		}
>  		END_USE(vq);
>  		return -ENOSPC;
>  	}
>  
> +	add_stat(vq, add_direct);
> +
>  	/* We're about to use some buffers from the free list. */
>  	vq->num_free -= out + in;
>  
> @@ -248,9 +342,12 @@ void virtqueue_kick(struct virtqueue *_v
>  	/* Need to update avail index before checking if we should notify */
>  	virtio_mb();
>  
> -	if (!(vq->vring.used->flags & VRING_USED_F_NO_NOTIFY))
> +	if (!(vq->vring.used->flags & VRING_USED_F_NO_NOTIFY)) {
> +		add_stat(vq, kick_notify);
>  		/* Prod other side to tell it about changes. */
>  		vq->notify(&vq->vq);
> +	} else
> +		add_stat(vq, kick_no_notify);
>  
>  	END_USE(vq);
>  }
> @@ -294,6 +391,8 @@ void *virtqueue_get_buf(struct virtqueue
>  
>  	START_USE(vq);
>  
> +	add_stat(vq, get);
> +
>  	if (unlikely(vq->broken)) {
>  		END_USE(vq);
>  		return NULL;
> @@ -333,6 +432,7 @@ void virtqueue_disable_cb(struct virtque
>  {
>  	struct vring_virtqueue *vq = to_vvq(_vq);
>  
> +	add_stat(vq, disable_cb);
>  	vq->vring.avail->flags |= VRING_AVAIL_F_NO_INTERRUPT;
>  }
>  EXPORT_SYMBOL_GPL(virtqueue_disable_cb);
> @@ -348,10 +448,12 @@ bool virtqueue_enable_cb(struct virtqueu
>  	vq->vring.avail->flags &= ~VRING_AVAIL_F_NO_INTERRUPT;
>  	virtio_mb();
>  	if (unlikely(more_used(vq))) {
> +		add_stat(vq, enable_cb_retry);
>  		END_USE(vq);
>  		return false;
>  	}
>  
> +	add_stat(vq, enable_cb_success);
>  	END_USE(vq);
>  	return true;
>  }
> @@ -387,10 +489,12 @@ irqreturn_t vring_interrupt(int irq, voi
>  	struct vring_virtqueue *vq = to_vvq(_vq);
>  
>  	if (!more_used(vq)) {
> +		add_stat(vq, interrupt_nowork);
>  		pr_debug("virtqueue interrupt with no work for %p\n", vq);
>  		return IRQ_NONE;
>  	}
>  
> +	add_stat(vq, interrupt_work);
>  	if (unlikely(vq->broken))
>  		return IRQ_HANDLED;
>  
> @@ -451,6 +555,15 @@ struct virtqueue *vring_new_virtqueue(un
>  	}
>  	vq->data[i] = NULL;
>  
> +#ifdef CONFIG_VIRTIO_STATS
> +	vq->stats = kzalloc(sizeof(*vq->stats) * num, GFP_KERNEL);
> +	if (!vq->stats) {
> +		kfree(vq);
> +		return NULL;
> +	}
> +	create_stat_files(vq);
> +#endif
> +
>  	return &vq->vq;
>  }
>  EXPORT_SYMBOL_GPL(vring_new_virtqueue);
> @@ -458,6 +571,7 @@ EXPORT_SYMBOL_GPL(vring_new_virtqueue);
>  void vring_del_virtqueue(struct virtqueue *vq)
>  {
>  	list_del(&vq->list);
> +	delete_stat_files(to_vvq(vq));
>  	kfree(to_vvq(vq));
>  }
>  EXPORT_SYMBOL_GPL(vring_del_virtqueue);

* Re: [PATCH] pch_gbe: Fix the issue which a driver locks when rx offload is set by ethtool
From: David Miller @ 2011-02-09  0:40 UTC
  To: toshiharu-linux
  Cc: netdev, tomoya-linux, qi.wang, yong.y.wang, andrew.chih.howe.khor,
	joel.clark, kok.howg.ewe, linux-kernel
In-Reply-To: <4D512A27.2030406@dsn.okisemi.com>

From: Toshiharu Okada <toshiharu-linux@dsn.okisemi.com>
Date: Tue, 08 Feb 2011 20:33:59 +0900

> @@ -531,12 +533,8 @@ void pch_gbe_reinit_locked(struct pch_gbe_adapter *adapter)
>  {
>  	struct net_device *netdev = adapter->netdev;
>  
> -	rtnl_lock();
> -	if (netif_running(netdev)) {
> -		pch_gbe_down(adapter);
> -		pch_gbe_up(adapter);
> -	}
> -	rtnl_unlock();
> +	pch_gbe_down(adapter);
> +	pch_gbe_up(adapter);

Are you sure you can just blindly delete the netif_running() check here?

* Re: [PATCH 3/3] pch_can: fix module reload issue with MSI
From: David Miller @ 2011-02-09  0:37 UTC
  To: tomoya-linux-ECg8zkTtlr0C6LszWs/t0g
  Cc: andrew.chih.howe.khor-ral2JQCrhuEAvxtiuMwx3w,
	qi.wang-ral2JQCrhuEAvxtiuMwx3w, netdev-u79uwXL29TY76Z2rM5mHXA,
	yong.y.wang-ral2JQCrhuEAvxtiuMwx3w,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	socketcan-core-0fE9KPoRgkgATYTw5x5z8w,
	toshiharu-linux-ECg8zkTtlr0C6LszWs/t0g,
	kok.howg.ewe-ral2JQCrhuEAvxtiuMwx3w,
	joel.clark-ral2JQCrhuEAvxtiuMwx3w, wg-5Yr1BZd7O62+XT7JhA+gdA
In-Reply-To: <1297157343-3213-3-git-send-email-tomoya-linux-ECg8zkTtlr0C6LszWs/t0g@public.gmane.org>

From: Tomoya MORINAGA <tomoya-linux-ECg8zkTtlr0C6LszWs/t0g@public.gmane.org>
Date: Tue,  8 Feb 2011 18:29:03 +0900

> Currently, when pch_can is reloaded,
> it is not able to catch interrupts.
> 
> The cause is that bus mastering is not enabled in pch_can.
> Thus, enable bus mastering.
> 
> Signed-off-by: Tomoya MORINAGA <tomoya-linux-ECg8zkTtlr0C6LszWs/t0g@public.gmane.org>

Applied.

* Re: [PATCH 2/3] pch_can: fix rmmod issue
From: David Miller @ 2011-02-09  0:37 UTC
  To: tomoya-linux-ECg8zkTtlr0C6LszWs/t0g
  Cc: andrew.chih.howe.khor-ral2JQCrhuEAvxtiuMwx3w,
	qi.wang-ral2JQCrhuEAvxtiuMwx3w, netdev-u79uwXL29TY76Z2rM5mHXA,
	yong.y.wang-ral2JQCrhuEAvxtiuMwx3w,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	socketcan-core-0fE9KPoRgkgATYTw5x5z8w,
	toshiharu-linux-ECg8zkTtlr0C6LszWs/t0g,
	kok.howg.ewe-ral2JQCrhuEAvxtiuMwx3w,
	joel.clark-ral2JQCrhuEAvxtiuMwx3w, wg-5Yr1BZd7O62+XT7JhA+gdA
In-Reply-To: <1297157343-3213-2-git-send-email-tomoya-linux-ECg8zkTtlr0C6LszWs/t0g@public.gmane.org>

From: Tomoya MORINAGA <tomoya-linux-ECg8zkTtlr0C6LszWs/t0g@public.gmane.org>
Date: Tue,  8 Feb 2011 18:29:02 +0900

> Currently, a kernel failure occurs when pch_can is removed with rmmod.
> The cause is that pci_iounmap is executed before pch_can_reset.
> Thus, move pci_iounmap after pch_can_reset.
> 
> Signed-off-by: Tomoya MORINAGA <tomoya-linux-ECg8zkTtlr0C6LszWs/t0g@public.gmane.org>

Applied.

* Re: [PATCH 1/3] pch_can: fix 800k comms issue
From: David Miller @ 2011-02-09  0:37 UTC
  To: tomoya-linux-ECg8zkTtlr0C6LszWs/t0g
  Cc: andrew.chih.howe.khor-ral2JQCrhuEAvxtiuMwx3w,
	qi.wang-ral2JQCrhuEAvxtiuMwx3w, netdev-u79uwXL29TY76Z2rM5mHXA,
	yong.y.wang-ral2JQCrhuEAvxtiuMwx3w,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	socketcan-core-0fE9KPoRgkgATYTw5x5z8w,
	toshiharu-linux-ECg8zkTtlr0C6LszWs/t0g,
	kok.howg.ewe-ral2JQCrhuEAvxtiuMwx3w,
	joel.clark-ral2JQCrhuEAvxtiuMwx3w, wg-5Yr1BZd7O62+XT7JhA+gdA
In-Reply-To: <1297157343-3213-1-git-send-email-tomoya-linux-ECg8zkTtlr0C6LszWs/t0g@public.gmane.org>

From: Tomoya MORINAGA <tomoya-linux-ECg8zkTtlr0C6LszWs/t0g@public.gmane.org>
Date: Tue,  8 Feb 2011 18:29:01 +0900

> Currently, 800k comms fail because prop_seg is set to zero.
> (The EG20T PCH CAN prop_seg register must be set to more than 1.)
> To prevent prop_seg from being set to zero, change tseg2_min from 1 to 2.
> 
> Signed-off-by: Tomoya MORINAGA <tomoya-linux-ECg8zkTtlr0C6LszWs/t0g@public.gmane.org>

Applied.

* Re: Network performance with small packets
From: Rusty Russell @ 2011-02-09  0:37 UTC
  To: Michael S. Tsirkin
  Cc: Krishna Kumar2, David Miller, kvm, Shirley Ma, netdev, steved
In-Reply-To: <20110202044222.GC3818@redhat.com>

On Wed, 2 Feb 2011 03:12:22 pm Michael S. Tsirkin wrote:
> On Wed, Feb 02, 2011 at 10:09:18AM +0530, Krishna Kumar2 wrote:
> > > "Michael S. Tsirkin" <mst@redhat.com> 02/02/2011 03:11 AM
> > >
> > > On Tue, Feb 01, 2011 at 01:28:45PM -0800, Shirley Ma wrote:
> > > > On Tue, 2011-02-01 at 23:21 +0200, Michael S. Tsirkin wrote:
> > > > > Confused. We compare capacity to skb frags, no?
> > > > > That's sg I think ...
> > > >
> > > > The current guest kernel uses indirect buffers; num_free returns the
> > > > number of available descriptors, not skb frags. So it's wrong here.
> > > >
> > > > Shirley
> > >
> > > I see. Good point. In other words when we complete the buffer
> > > it was indirect, but when we add a new one we
> > > can not allocate indirect so we consume.
> > > And then we start the queue and add will fail.
> > > I guess we need some kind of API to figure out
> > > whether the buf we complete was indirect?

I've finally read this thread... I think we need to get more serious
with our stats gathering to diagnose these kinds of performance issues.

This is a start; it should tell us what is actually happening to the
virtio ring(s) without significant performance impact...

Subject: virtio: CONFIG_VIRTIO_STATS

For performance problems we'd like to know exactly what the ring looks
like.  This patch adds stats indexed by how-full-ring-is; we could extend
it to also record them by how-used-ring-is if we need.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
--- a/drivers/virtio/Kconfig
+++ b/drivers/virtio/Kconfig
@@ -7,6 +7,14 @@ config VIRTIO_RING
 	tristate
 	depends on VIRTIO
 
+config VIRTIO_STATS
+	bool "Virtio debugging stats (EXPERIMENTAL)"
+	depends on VIRTIO_RING
+	select DEBUG_FS
+	---help---
+	  Virtio stats collected by how full the ring is at any time,
+	  presented under debugfs/virtio-stats/<name>-<vq>/<num-used>/
+
 config VIRTIO_PCI
 	tristate "PCI driver for virtio devices (EXPERIMENTAL)"
 	depends on PCI && EXPERIMENTAL
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -21,6 +21,7 @@
 #include <linux/virtio_config.h>
 #include <linux/device.h>
 #include <linux/slab.h>
+#include <linux/debugfs.h>
 
 /* virtio guest is communicating with a virtual "device" that actually runs on
  * a host processor.  Memory barriers are used to control SMP effects. */
@@ -95,6 +96,11 @@ struct vring_virtqueue
 	/* How to notify other side. FIXME: commonalize hcalls! */
 	void (*notify)(struct virtqueue *vq);
 
+#ifdef CONFIG_VIRTIO_STATS
+	struct vring_stat *stats;
+	struct dentry *statdir;
+#endif
+
 #ifdef DEBUG
 	/* They're supposed to lock for us. */
 	unsigned int in_use;
@@ -106,6 +112,87 @@ struct vring_virtqueue
 
 #define to_vvq(_vq) container_of(_vq, struct vring_virtqueue, vq)
 
+#ifdef CONFIG_VIRTIO_STATS
+/* We have an array of these, indexed by how full the ring is. */
+struct vring_stat {
+	/* How many interrupts? */
+	size_t interrupt_nowork, interrupt_work;
+	/* How many non-notify kicks, how many notify kicks, how many add notify? */
+	size_t kick_no_notify, kick_notify, add_notify;
+	/* How many adds? */
+	size_t add_direct, add_indirect, add_fail;
+	/* How many gets? */
+	size_t get;
+	/* How many disable callbacks? */
+	size_t disable_cb;
+	/* How many enables? */
+	size_t enable_cb_retry, enable_cb_success;
+};
+
+static struct dentry *virtio_stats;
+
+static void create_stat_files(struct vring_virtqueue *vq)
+{
+	char name[80];
+	unsigned int i;
+
+	/* Racy in theory, but we don't care. */
+	if (!virtio_stats)
+		virtio_stats = debugfs_create_dir("virtio-stats", NULL);
+
+	sprintf(name, "%s-%s", dev_name(&vq->vq.vdev->dev), vq->vq.name);
+	vq->statdir = debugfs_create_dir(name, virtio_stats);
+
+	for (i = 0; i < vq->vring.num; i++) {
+		struct dentry *dir;
+
+		sprintf(name, "%i", i);
+		dir = debugfs_create_dir(name, vq->statdir);
+		debugfs_create_size_t("interrupt_nowork", 0400, dir,
+				      &vq->stats[i].interrupt_nowork);
+		debugfs_create_size_t("interrupt_work", 0400, dir,
+				      &vq->stats[i].interrupt_work);
+		debugfs_create_size_t("kick_no_notify", 0400, dir,
+				      &vq->stats[i].kick_no_notify);
+		debugfs_create_size_t("kick_notify", 0400, dir,
+				      &vq->stats[i].kick_notify);
+		debugfs_create_size_t("add_notify", 0400, dir,
+				      &vq->stats[i].add_notify);
+		debugfs_create_size_t("add_direct", 0400, dir,
+				      &vq->stats[i].add_direct);
+		debugfs_create_size_t("add_indirect", 0400, dir,
+				      &vq->stats[i].add_indirect);
+		debugfs_create_size_t("add_fail", 0400, dir,
+				      &vq->stats[i].add_fail);
+		debugfs_create_size_t("get", 0400, dir,
+				      &vq->stats[i].get);
+		debugfs_create_size_t("disable_cb", 0400, dir,
+				      &vq->stats[i].disable_cb);
+		debugfs_create_size_t("enable_cb_retry", 0400, dir,
+				      &vq->stats[i].enable_cb_retry);
+		debugfs_create_size_t("enable_cb_success", 0400, dir,
+				      &vq->stats[i].enable_cb_success);
+	}
+}
+
+static void delete_stat_files(struct vring_virtqueue *vq)
+{
+	debugfs_remove_recursive(vq->statdir);
+}
+
+#define add_stat(vq, name)						\
+	do {								\
+		struct vring_virtqueue *_vq = (vq);			\
+		_vq->stats[_vq->vring.num - _vq->num_free].name++;	\
+	} while (0)
+
+#else
+#define add_stat(vq, name)
+static void delete_stat_files(struct vring_virtqueue *vq)
+{
+}
+#endif
+
 /* Set up an indirect table of descriptors and add it to the queue. */
 static int vring_add_indirect(struct vring_virtqueue *vq,
 			      struct scatterlist sg[],
@@ -121,6 +208,8 @@ static int vring_add_indirect(struct vri
 	if (!desc)
 		return -ENOMEM;
 
+	add_stat(vq, add_indirect);
+
 	/* Transfer entries from the sg list into the indirect page */
 	for (i = 0; i < out; i++) {
 		desc[i].flags = VRING_DESC_F_NEXT;
@@ -183,17 +272,22 @@ int virtqueue_add_buf_gfp(struct virtque
 	BUG_ON(out + in == 0);
 
 	if (vq->num_free < out + in) {
+		add_stat(vq, add_fail);
 		pr_debug("Can't add buf len %i - avail = %i\n",
 			 out + in, vq->num_free);
 		/* FIXME: for historical reasons, we force a notify here if
 		 * there are outgoing parts to the buffer.  Presumably the
 		 * host should service the ring ASAP. */
-		if (out)
+		if (out) {
+			add_stat(vq, add_notify);
 			vq->notify(&vq->vq);
+		}
 		END_USE(vq);
 		return -ENOSPC;
 	}
 
+	add_stat(vq, add_direct);
+
 	/* We're about to use some buffers from the free list. */
 	vq->num_free -= out + in;
 
@@ -248,9 +342,12 @@ void virtqueue_kick(struct virtqueue *_v
 	/* Need to update avail index before checking if we should notify */
 	virtio_mb();
 
-	if (!(vq->vring.used->flags & VRING_USED_F_NO_NOTIFY))
+	if (!(vq->vring.used->flags & VRING_USED_F_NO_NOTIFY)) {
+		add_stat(vq, kick_notify);
 		/* Prod other side to tell it about changes. */
 		vq->notify(&vq->vq);
+	} else
+		add_stat(vq, kick_no_notify);
 
 	END_USE(vq);
 }
@@ -294,6 +391,8 @@ void *virtqueue_get_buf(struct virtqueue
 
 	START_USE(vq);
 
+	add_stat(vq, get);
+
 	if (unlikely(vq->broken)) {
 		END_USE(vq);
 		return NULL;
@@ -333,6 +432,7 @@ void virtqueue_disable_cb(struct virtque
 {
 	struct vring_virtqueue *vq = to_vvq(_vq);
 
+	add_stat(vq, disable_cb);
 	vq->vring.avail->flags |= VRING_AVAIL_F_NO_INTERRUPT;
 }
 EXPORT_SYMBOL_GPL(virtqueue_disable_cb);
@@ -348,10 +448,12 @@ bool virtqueue_enable_cb(struct virtqueu
 	vq->vring.avail->flags &= ~VRING_AVAIL_F_NO_INTERRUPT;
 	virtio_mb();
 	if (unlikely(more_used(vq))) {
+		add_stat(vq, enable_cb_retry);
 		END_USE(vq);
 		return false;
 	}
 
+	add_stat(vq, enable_cb_success);
 	END_USE(vq);
 	return true;
 }
@@ -387,10 +489,12 @@ irqreturn_t vring_interrupt(int irq, voi
 	struct vring_virtqueue *vq = to_vvq(_vq);
 
 	if (!more_used(vq)) {
+		add_stat(vq, interrupt_nowork);
 		pr_debug("virtqueue interrupt with no work for %p\n", vq);
 		return IRQ_NONE;
 	}
 
+	add_stat(vq, interrupt_work);
 	if (unlikely(vq->broken))
 		return IRQ_HANDLED;
 
@@ -451,6 +555,15 @@ struct virtqueue *vring_new_virtqueue(un
 	}
 	vq->data[i] = NULL;
 
+#ifdef CONFIG_VIRTIO_STATS
+	vq->stats = kzalloc(sizeof(*vq->stats) * num, GFP_KERNEL);
+	if (!vq->stats) {
+		kfree(vq);
+		return NULL;
+	}
+	create_stat_files(vq);
+#endif
+
 	return &vq->vq;
 }
 EXPORT_SYMBOL_GPL(vring_new_virtqueue);
@@ -458,6 +571,7 @@ EXPORT_SYMBOL_GPL(vring_new_virtqueue);
 void vring_del_virtqueue(struct virtqueue *vq)
 {
 	list_del(&vq->list);
+	delete_stat_files(to_vvq(vq));
 	kfree(to_vvq(vq));
 }
 EXPORT_SYMBOL_GPL(vring_del_virtqueue);
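
The per-fullness indexing described in the patch can be sketched in user
space as follows. This is only an illustration under stated assumptions,
not part of the patch: struct ring, ring_init(), ring_free(), fullness()
and count_add_fail() are hypothetical names. It assumes "how full the ring
is" means the number of descriptors in use (num - num_free). That value
ranges from 0 to num inclusive, so the sketch allocates num + 1 counters;
the reversed subtraction (num_free - num) would wrap around for unsigned
operands whenever any descriptor is in use.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical user-space stand-in for the ring state used by add_stat(). */
struct ring {
	unsigned int num;       /* total descriptors in the ring */
	unsigned int num_free;  /* descriptors currently free */
	size_t *add_fail;       /* one counter per fullness level */
};

/* Fullness index: number of descriptors in use, in the range 0..num. */
static unsigned int fullness(const struct ring *r)
{
	assert(r->num_free <= r->num);
	return r->num - r->num_free;
}

/* Allocate num + 1 counters so that a completely full ring has a slot too. */
static int ring_init(struct ring *r, unsigned int num)
{
	r->num = num;
	r->num_free = num;
	r->add_fail = calloc((size_t)num + 1, sizeof(*r->add_fail));
	return r->add_fail ? 0 : -1;
}

/* Release the counters allocated by ring_init(). */
static void ring_free(struct ring *r)
{
	free(r->add_fail);
	r->add_fail = NULL;
}

/* Record one failed add at the current fullness level. */
static void count_add_fail(struct ring *r)
{
	r->add_fail[fullness(r)]++;
}
```

A failed add is most likely to be recorded when the ring is full or nearly
full, which is the case where the extra slot matters.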

* Re: [PATCH] pch_gbe: Fix the issue that the receiving data is not normal.
From: David Miller @ 2011-02-09  0:35 UTC
  To: toshiharu-linux
  Cc: netdev, linux-kernel, qi.wang, yong.y.wang, andrew.chih.howe.khor,
	joel.clark, kok.howg.ewe, tomoya-linux
In-Reply-To: <4D50FDAA.1060506@dsn.okisemi.com>

From: Toshiharu Okada <toshiharu-linux@dsn.okisemi.com>
Date: Tue, 08 Feb 2011 17:24:10 +0900

> Hi David
> 
> I am resubmitting this patch with modifications.
> Please check it.

Your memcpy+memcpy sequences are equivalent to memmove(), please
use that instead.

I have to say this function is insanely complicated.  There seem
to be 16 different ways RX packets are processed.  I can't believe
that it needs to be like this.
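
The memmove() point can be illustrated with a small user-space sketch. It
assumes, purely for illustration, that the two memcpy() calls bounce the
data through a temporary buffer in order to shift it by a few bytes within
the same buffer; the pch_gbe code itself is not shown here, and BUF_LEN,
shift_with_two_memcpy() and shift_with_memmove() are hypothetical names.

```c
#include <assert.h>
#include <string.h>

#define BUF_LEN 64	/* arbitrary size for the illustration */

/* Two-step variant: copy out to a temporary, then back at the new offset. */
static void shift_with_two_memcpy(unsigned char *buf, size_t len, size_t offset)
{
	unsigned char tmp[BUF_LEN];

	assert(offset + len <= BUF_LEN);
	memcpy(tmp, buf, len);
	memcpy(buf + offset, tmp, len);
}

/* Single call: memmove() is defined for overlapping source and destination. */
static void shift_with_memmove(unsigned char *buf, size_t len, size_t offset)
{
	assert(offset + len <= BUF_LEN);
	memmove(buf + offset, buf, len);
}
```

Both helpers leave the buffer in the same state; the memmove() variant
needs no temporary buffer and makes one pass over the data.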

* [PATCH] ipvs: remove extra lookups for ICMP packets
From: Julian Anastasov @ 2011-02-09  0:26 UTC
  To: Simon Horman; +Cc: lvs-devel, netdev


 	Remove code that should not be called anymore.
Now that ip_vs_out handles replies for local clients at the
LOCAL_IN hook, we do not need to call conn_out_get and
handle_response_icmp from ip_vs_in_icmp*, because such
lookups were already performed for the ICMP packet and no
connection was found.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
---

 	Patch is against lvs-test-2.6 from 09-FEB-2011
but should apply cleanly to net-next.

--- lvs-test-2.6-7c9989a/net/netfilter/ipvs/ip_vs_core.c	2011-02-07 13:40:00.000000000 +0200
+++ linux/net/netfilter/ipvs/ip_vs_core.c	2011-02-09 02:02:53.135352388 +0200
@@ -729,7 +729,7 @@ void ip_vs_nat_icmp_v6(struct sk_buff *s
  #endif

  /* Handle relevant response ICMP messages - forward to the right
- * destination host. Used for NAT and local client.
+ * destination host.
   */
  static int handle_response_icmp(int af, struct sk_buff *skb,
  				union nf_inet_addr *snet,
@@ -979,7 +979,6 @@ static inline int is_tcp_reset(const str
  }

  /* Handle response packets: rewrite addresses and send away...
- * Used for NAT and local client.
   */
  static unsigned int
  handle_response(int af, struct sk_buff *skb, struct ip_vs_proto_data *pd,
@@ -1280,7 +1279,6 @@ ip_vs_in_icmp(struct sk_buff *skb, int *
  	struct ip_vs_protocol *pp;
  	struct ip_vs_proto_data *pd;
  	unsigned int offset, ihl, verdict;
-	union nf_inet_addr snet;

  	*related = 1;

@@ -1339,17 +1337,8 @@ ip_vs_in_icmp(struct sk_buff *skb, int *
  	ip_vs_fill_iphdr(AF_INET, cih, &ciph);
  	/* The embedded headers contain source and dest in reverse order */
  	cp = pp->conn_in_get(AF_INET, skb, &ciph, offset, 1);
-	if (!cp) {
-		/* The packet could also belong to a local client */
-		cp = pp->conn_out_get(AF_INET, skb, &ciph, offset, 1);
-		if (cp) {
-			snet.ip = iph->saddr;
-			return handle_response_icmp(AF_INET, skb, &snet,
-						    cih->protocol, cp, pp,
-						    offset, ihl);
-		}
+	if (!cp)
  		return NF_ACCEPT;
-	}

  	verdict = NF_DROP;

@@ -1395,7 +1384,6 @@ ip_vs_in_icmp_v6(struct sk_buff *skb, in
  	struct ip_vs_protocol *pp;
  	struct ip_vs_proto_data *pd;
  	unsigned int offset, verdict;
-	union nf_inet_addr snet;
  	struct rt6_info *rt;

  	*related = 1;
@@ -1455,18 +1443,8 @@ ip_vs_in_icmp_v6(struct sk_buff *skb, in
  	ip_vs_fill_iphdr(AF_INET6, cih, &ciph);
  	/* The embedded headers contain source and dest in reverse order */
  	cp = pp->conn_in_get(AF_INET6, skb, &ciph, offset, 1);
-	if (!cp) {
-		/* The packet could also belong to a local client */
-		cp = pp->conn_out_get(AF_INET6, skb, &ciph, offset, 1);
-		if (cp) {
-			ipv6_addr_copy(&snet.in6, &iph->saddr);
-			return handle_response_icmp(AF_INET6, skb, &snet,
-						    cih->nexthdr,
-						    cp, pp, offset,
-						    sizeof(struct ipv6hdr));
-		}
+	if (!cp)
  		return NF_ACCEPT;
-	}

  	verdict = NF_DROP;


* [PATCH] ipvs: fix timer in get_curr_sync_buff
From: Julian Anastasov @ 2011-02-09  0:21 UTC
  To: Simon Horman; +Cc: lvs-devel, netdev, Tinggong Wang


From: Tinggong Wang <wangtinggong@gmail.com>

 	Fix get_curr_sync_buff to keep the buffer for 2 seconds
as intended, not just for the current jiffy. This way
we will sync more connection structures with a single packet.

Signed-off-by: Tinggong Wang <wangtinggong@gmail.com>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
---

 	Patch is against lvs-test-2.6 from 09-FEB-2011
but should apply cleanly to net-next.

--- lvs-test-2.6-7c9989a/net/netfilter/ipvs/ip_vs_sync.c	2011-02-07 13:40:00.000000000 +0200
+++ linux/net/netfilter/ipvs/ip_vs_sync.c	2011-02-09 01:59:46.771351460 +0200
@@ -374,8 +374,8 @@ get_curr_sync_buff(struct netns_ipvs *ip
  	struct ip_vs_sync_buff *sb;

  	spin_lock_bh(&ipvs->sync_buff_lock);
-	if (ipvs->sync_buff && (time == 0 ||
-	    time_before(jiffies - ipvs->sync_buff->firstuse, time))) {
+	if (ipvs->sync_buff &&
+	    time_after_eq(jiffies - ipvs->sync_buff->firstuse, time)) {
  		sb = ipvs->sync_buff;
  		ipvs->sync_buff = NULL;
  	} else
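
The effect of the changed condition can be checked with a user-space
sketch. It is an illustration only: HZ is assumed to be 1000, the two
macros are simplified copies of the kernel's wrap-safe jiffies comparisons,
and take_buffer_old() and take_buffer_new() are hypothetical helpers that
mirror only the time test, not the locking or the NULL check. The sketch
assumes the caller passes an interval such as 2 * HZ, with 0 meaning "take
the buffer regardless of age".

```c
#include <assert.h>
#include <stdbool.h>

#define HZ 1000UL	/* assumed tick rate, for illustration only */

/* Simplified user-space copies of the wrap-safe jiffies comparisons. */
#define time_after_eq(a, b)	((long)((a) - (b)) >= 0)
#define time_before(a, b)	((long)((a) - (b)) < 0)

/* Old test: the buffer is taken while it is still younger than 'time'. */
static bool take_buffer_old(unsigned long now, unsigned long firstuse,
			    unsigned long time)
{
	return time == 0 || time_before(now - firstuse, time);
}

/* New test: the buffer is taken once at least 'time' jiffies have passed. */
static bool take_buffer_new(unsigned long now, unsigned long firstuse,
			    unsigned long time)
{
	return time_after_eq(now - firstuse, time);
}
```

With an interval of 2 * HZ, the old test hands over a buffer that is one
jiffy old and holds back one that is two seconds old; the new test does the
opposite. An interval of 0 still takes the buffer in both versions, which
is why the explicit time == 0 check can go.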

* [PATCH] net: Kill NETEVENT_PMTU_UPDATE.
From: David Miller @ 2011-02-09  0:18 UTC
  To: netdev


Nobody actually does anything in response to the event,
so just kill it off.

Signed-off-by: David S. Miller <davem@davemloft.net>
---

Committed to net-next-2.6

 drivers/net/cxgb3/cxgb3_offload.c |    2 --
 drivers/net/cxgb4/cxgb4_main.c    |    1 -
 include/net/netevent.h            |    1 -
 net/ipv4/route.c                  |    1 -
 net/ipv6/route.c                  |    1 -
 5 files changed, 0 insertions(+), 6 deletions(-)

diff --git a/drivers/net/cxgb3/cxgb3_offload.c b/drivers/net/cxgb3/cxgb3_offload.c
index ef02aa6..7ea94b5 100644
--- a/drivers/net/cxgb3/cxgb3_offload.c
+++ b/drivers/net/cxgb3/cxgb3_offload.c
@@ -967,8 +967,6 @@ static int nb_callback(struct notifier_block *self, unsigned long event,
 		cxgb_neigh_update((struct neighbour *)ctx);
 		break;
 	}
-	case (NETEVENT_PMTU_UPDATE):
-		break;
 	case (NETEVENT_REDIRECT):{
 		struct netevent_redirect *nr = ctx;
 		cxgb_redirect(nr->old, nr->new);
diff --git a/drivers/net/cxgb4/cxgb4_main.c b/drivers/net/cxgb4/cxgb4_main.c
index ec35d45..5352c8a 100644
--- a/drivers/net/cxgb4/cxgb4_main.c
+++ b/drivers/net/cxgb4/cxgb4_main.c
@@ -2471,7 +2471,6 @@ static int netevent_cb(struct notifier_block *nb, unsigned long event,
 	case NETEVENT_NEIGH_UPDATE:
 		check_neigh_update(data);
 		break;
-	case NETEVENT_PMTU_UPDATE:
 	case NETEVENT_REDIRECT:
 	default:
 		break;
diff --git a/include/net/netevent.h b/include/net/netevent.h
index e82b7ba..22b239c 100644
--- a/include/net/netevent.h
+++ b/include/net/netevent.h
@@ -21,7 +21,6 @@ struct netevent_redirect {
 
 enum netevent_notif_type {
 	NETEVENT_NEIGH_UPDATE = 1, /* arg is struct neighbour ptr */
-	NETEVENT_PMTU_UPDATE,	   /* arg is struct dst_entry ptr */
 	NETEVENT_REDIRECT,	   /* arg is struct netevent_redirect ptr */
 };
 
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 2e225da..0455af8 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1762,7 +1762,6 @@ static void ip_rt_update_pmtu(struct dst_entry *dst, u32 mtu)
 		}
 		dst_metric_set(dst, RTAX_MTU, mtu);
 		dst_set_expires(dst, ip_rt_mtu_expires);
-		call_netevent_notifiers(NETEVENT_PMTU_UPDATE, dst);
 	}
 }
 
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 0a63d44..12ec83d 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -965,7 +965,6 @@ static void ip6_rt_update_pmtu(struct dst_entry *dst, u32 mtu)
 			dst_metric_set(dst, RTAX_FEATURES, features);
 		}
 		dst_metric_set(dst, RTAX_MTU, mtu);
-		call_netevent_notifiers(NETEVENT_PMTU_UPDATE, dst);
 	}
 }
 
-- 
1.7.4


^ permalink raw reply related

* Re: [PATCH v4 0/5] net: Unified offload configuration
From: David Miller @ 2011-02-08 23:58 UTC (permalink / raw)
  To: mirq-linux; +Cc: netdev, bhutchings
In-Reply-To: <20110208235543.GA14251@rere.qmqm.pl>

From: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Date: Wed, 9 Feb 2011 00:55:43 +0100

> And I got no bounces. What were the errors?

The bounces only go to me.

You had malformed email addresses in your To or Cc line;
you forgot a comma between two of them.

^ permalink raw reply

* Re: [PATCH v4 0/5] net: Unified offload configuration
From: Michał Mirosław @ 2011-02-08 23:55 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, bhutchings
In-Reply-To: <20110208.114018.71120939.davem@davemloft.net>

On Tue, Feb 08, 2011 at 11:40:18AM -0800, David Miller wrote:
> BTW, none of your patch postings from today made it to the mailing
> lists because there were syntax errors in your email headers.
> 
> Therefore, you'll need to resend them.

Hmm, that's weird. My mailer said:

mail.info: Feb  8 15:23:59 postfix/smtp[6076]: 7138813909: to=<netdev@vger.kernel.org>, relay=vger.kernel.org[209.132.180.67]:25, delay=3, delays=0.09/0.08/2.2/0.7, dsn=2.7.0, status=sent (250 2.7.0 nothing apparently wrong in the message. BF:<H 0>; S1754489Ab1BHOX7)
mail.info: Feb  8 15:23:59 postfix/smtp[6110]: BA99413A5F: to=<netdev@vger.kernel.org>, relay=vger.kernel.org[209.132.180.67]:25, delay=3, delays=0.2/0.45/1.5/0.84, dsn=2.7.1, status=sent (250 2.7.1 Looks like Linux source DIFF email.. BF:<H 0>; S1754575Ab1BHOX7)
[same for other patches in series]

And I got no bounces. What were the errors?

Best Regards,
Michał Mirosław

^ permalink raw reply

* Re: [Bugme-new] [Bug 28282] New: forwarding turns autoconfiguration off
From: Francois Romieu @ 2011-02-08 23:49 UTC (permalink / raw)
  To: Hadmut Danisch; +Cc: David Miller, akpm, netdev, bugzilla-daemon, bugme-daemon
In-Reply-To: <4D51CAE8.6070706@msgid.danisch.de>

Hadmut Danisch <hadmut@danisch.de> :
[...]
> Linux machines can - in contrast to Windows desktops and Cisco
> routers - be a host and a router at the same time, even on the same
> interface (i.e. use an autoconf IPv6 address as a host and an fe80::
> address as a router address).
> 
> So I'd consider this in a different way. From my point of view the
> decision between host and router must be done per assigned IPv6 address
> (or address range) and not per IPv6 interface.

o^O

Afaik networking does not operate this way in the kernel. Really.

May I suggest that you get some sleep and then see how your
(DHCPv6-enabled?) DSL router can be convinced to cooperate with the
existing tools under Linux _without_ modifications?

-- 
Ueimor

^ permalink raw reply

* [PATCH] net: Remove bogus barrier() in dst_allfrag().
From: David Miller @ 2011-02-08 23:34 UTC (permalink / raw)
  To: netdev


I simply missed this one when modifying the other dst
metric interfaces earlier.

Signed-off-by: David S. Miller <davem@davemloft.net>
---

Committed to net-next-2.6

 include/net/dst.h |    2 --
 1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/include/net/dst.h b/include/net/dst.h
index e550195..e01855d 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -220,8 +220,6 @@ static inline u32
 dst_allfrag(const struct dst_entry *dst)
 {
 	int ret = dst_feature(dst,  RTAX_FEATURE_ALLFRAG);
-	/* Yes, _exactly_. This is paranoia. */
-	barrier();
 	return ret;
 }
 
-- 
1.7.4


^ permalink raw reply related

* Re: [Bugme-new] [Bug 28282] New: forwarding turns autoconfiguration off
From: Hadmut Danisch @ 2011-02-08 22:59 UTC (permalink / raw)
  To: Francois Romieu; +Cc: David Miller, akpm, netdev, bugzilla-daemon, bugme-daemon
In-Reply-To: <20110208224411.GA9674@electric-eye.fr.zoreil.com>

On 08.02.2011 23:44, Francois Romieu wrote:
>
> RFC 4862        IPv6 Stateless Address Autoconfiguration  September 2007
> [...]
>    The autoconfiguration process specified in this document applies only
>    to hosts and not routers.  Since host autoconfiguration uses
>    information advertised by routers, routers will need to be configured
>    by some other means.  However, it is expected that routers will
>    generate link-local addresses using the mechanism described in this
>    document.  In addition, routers are expected to successfully pass the
>    Duplicate Address Detection procedure described in this document on
>    all addresses prior to assigning them to an interface.

Thanks for the citation.


Linux machines can - in contrast to Windows desktops and Cisco
routers - be a host and a router at the same time, even on the same
interface (i.e. use an autoconf IPv6 address as a host and an fe80::
address as a router address).

So I'd consider this in a different way. From my point of view the
decision between host and router must be done per assigned IPv6 address
(or address range) and not per IPv6 interface.

(Maybe it would be a more correct implementation to assign a special IP
address pattern like xxxx::.../64 to tell the interface to accept
autoconfiguration for a particular network range, probably for 2000::/3
in most cases.)

regards
Hadmut



^ permalink raw reply

* Re: [Bugme-new] [Bug 28282] New: forwarding turns autoconfiguration off
From: Francois Romieu @ 2011-02-08 22:44 UTC (permalink / raw)
  To: David Miller; +Cc: hadmut, akpm, netdev, bugzilla-daemon, bugme-daemon
In-Reply-To: <20110208.143046.246534138.davem@davemloft.net>

David Miller <davem@davemloft.net> :
> From: Hadmut Danisch <hadmut@danisch.de>
> Date: Tue, 08 Feb 2011 23:12:30 +0100
> 
> > On 08.02.2011 22:44, David Miller wrote:
> >>
> >> This is a case where we're probably just following what the RFC documents
> >> state we should do, which means unless you can provide clear reference to
> >> a specification that states we should behave otherwise this isn't changing.
> > 
> > Could you cite where exactly this is stated in the RFC documents?
> 
> I'm working on other bugs at the moment, so I am personally unable to
> help you with this at this time.  Perhaps someone else can.

This one MAY^W may be relevant (see http://www.ietf.org/rfc/rfc4862.txt) :

Thomson, et al.             Standards Track                     [Page 3]

RFC 4862        IPv6 Stateless Address Autoconfiguration  September 2007
[...]
   The autoconfiguration process specified in this document applies only
   to hosts and not routers.  Since host autoconfiguration uses
   information advertised by routers, routers will need to be configured
   by some other means.  However, it is expected that routers will
   generate link-local addresses using the mechanism described in this
   document.  In addition, routers are expected to successfully pass the
   Duplicate Address Detection procedure described in this document on
   all addresses prior to assigning them to an interface.
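
The host/router split in the excerpt above corresponds to a
per-interface decision on Linux.  What follows is a minimal sketch of
that decision, assuming the semantics that the kernel's
Documentation/networking/ip-sysctl.txt gives for the accept_ra and
forwarding sysctls (in kernels that document accept_ra value 2).  The
function name ra_accepted() is hypothetical; this is a model for
discussion, not the kernel's implementation.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Model of the per-interface decision, following the documented
 * meaning of the accept_ra sysctl:
 *   0 - never accept Router Advertisements
 *   1 - accept them only while forwarding is disabled (host behaviour)
 *   2 - accept them even if forwarding is enabled
 * ra_accepted() is a hypothetical name used only for this sketch.
 */
static bool ra_accepted(int accept_ra, bool forwarding)
{
	switch (accept_ra) {
	case 2:
		return true;
	case 1:
		return !forwarding;
	default:
		return false;
	}
}
```

Under this model the setting is still per interface, not per address:
with accept_ra=1, enabling forwarding stops RA processing for every
prefix on that interface.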

--
Ueimor

^ permalink raw reply

* RE: Oops in tcp_output.c, kernel 2.6.38-rc3
From: Allan, Bruce W @ 2011-02-08 22:46 UTC (permalink / raw)
  To: Brandeburg, Jesse, David Miller
  Cc: cebbert@redhat.com, netdev@vger.kernel.org,
	ilpo.jarvinen@helsinki.fi
In-Reply-To: <alpine.WNT.2.00.1102081401480.5424@JBRANDEB-DESK2.amr.corp.intel.com>

>-----Original Message-----
>From: Brandeburg, Jesse
>Sent: Tuesday, February 08, 2011 2:07 PM
>To: David Miller
>Cc: cebbert@redhat.com; netdev@vger.kernel.org; ilpo.jarvinen@helsinki.fi;
>Allan, Bruce W
>Subject: Re: Oops in tcp_output.c, kernel 2.6.38-rc3
>
>glaring at the bug dump shows e1000e as the only network driver that i can
>see, hardware seems to be the ICH10 chipset.  Copied bruce in case he's
>seen or heard anything.
>
>appears to be this system:
>http://www.asus.com/Product.aspx?P_ID=qH6ZSEJ8EPY6HoNU&content=specifications

Sorry, no, this is new to me.  BTW, it's the ICH10R chipset which is
different from other ICH10* chipsets, just FYI.

^ permalink raw reply

* [PATCH] net/caif: Fix dangling list pointer in freed object on error.
From: David Miller @ 2011-02-08 22:33 UTC (permalink / raw)
  To: sjur.brandeland; +Cc: netdev

rtnl_link_ops->setup(), and the "setup" callback passed to alloc_netdev*(),
cannot make state changes which need to be undone on failure.  There is
no cleanup mechanism available at this point.

So we have to add the caif private instance to the global list once we
are sure that register_netdev() has succeeded in ->newlink().

Otherwise, if register_netdev() fails, the caller will invoke free_netdev()
and we will have a reference to freed up memory on the chnl_net_list.

Signed-off-by: David S. Miller <davem@davemloft.net>
---

Committed to net-2.6, I need this setup() invariant to be properly
followed tree-wide in order to fix another bug.
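
For illustration only, a minimal user-space sketch of the ordering
described above, not the kernel code: the object is linked into the
global list only after registration has succeeded, so a failed
registration followed by a free leaves no dangling list entry.  The
names struct chan, chan_setup(), fake_register() and new_link() are
hypothetical stand-ins for the caif private data, the setup callback,
register_netdevice() and ->newlink().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the caif private instance. */
struct chan {
	struct chan *next;
	int id;
};

/* Global list head, playing the role of chnl_net_list. */
static struct chan *chan_list;

/* Hypothetical stand-in for register_netdevice(): 0 on success. */
static int fake_register(struct chan *c, bool should_fail)
{
	(void)c;
	return should_fail ? -1 : 0;
}

/* setup-like step: initializes the object only, no global state. */
static void chan_setup(struct chan *c, int id)
{
	c->next = NULL;
	c->id = id;
}

/*
 * newlink-like step: publish the object on the global list only once
 * registration has succeeded.  On failure the caller frees the
 * object, and the list never saw it.
 */
static int new_link(struct chan *c, bool should_fail)
{
	int ret = fake_register(c, should_fail);

	if (ret == 0) {
		c->next = chan_list;
		chan_list = c;
	}
	return ret;
}

/* Count the entries currently published on the global list. */
static size_t chan_list_len(void)
{
	size_t n = 0;

	for (const struct chan *c = chan_list; c; c = c->next)
		n++;
	return n;
}
```

If new_link() fails, the caller can free the object immediately and
chan_list_len() is unchanged, which is the property the patch below
restores for chnl_net_list.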

 net/caif/chnl_net.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/caif/chnl_net.c b/net/caif/chnl_net.c
index fa9dab3..6008d6d 100644
--- a/net/caif/chnl_net.c
+++ b/net/caif/chnl_net.c
@@ -394,9 +394,7 @@ static void ipcaif_net_setup(struct net_device *dev)
 	priv->conn_req.sockaddr.u.dgm.connection_id = -1;
 	priv->flowenabled = false;

-	ASSERT_RTNL();
 	init_waitqueue_head(&priv->netmgmt_wq);
-	list_add(&priv->list_field, &chnl_net_list);
 }

@@ -453,6 +451,8 @@ static int ipcaif_newlink(struct net *src_net, struct net_device *dev,
 	ret = register_netdevice(dev);
 	if (ret)
 		pr_warn("device rtml registration failed\n");
+	else
+		list_add(&caifdev->list_field, &chnl_net_list);
 	return ret;
 }

-- 
1.7.4

^ permalink raw reply related

* Re: [Bugme-new] [Bug 28282] New: forwarding turns autoconfiguration off
From: David Miller @ 2011-02-08 22:30 UTC (permalink / raw)
  To: hadmut; +Cc: akpm, netdev, bugzilla-daemon, bugme-daemon
In-Reply-To: <4D51BFCE.2070405@msgid.danisch.de>

From: Hadmut Danisch <hadmut@danisch.de>
Date: Tue, 08 Feb 2011 23:12:30 +0100

> On 08.02.2011 22:44, David Miller wrote:
>>
>> This is a case where we're probably just following what the RFC documents
>> state we should do, which means unless you can provide clear reference to
>> a specification that states we should behave otherwise this isn't changing.
> 
> Could you cite where exactly this is stated in the RFC documents?

I'm working on other bugs at the moment, so I am personally unable to
help you with this at this time.  Perhaps someone else can.


^ permalink raw reply

* [PATCH] net: provide capability and group sets via SCM
From: Casey Schaufler @ 2011-02-08 22:28 UTC (permalink / raw)
  To: LKML, netdev
  Cc: Casey Schaufler, Sakkinen Jarkko.2 (EXT-Tieto/Tampere),
	Janne Karhunen, Reshetova Elena (Nokia-D/Helsinki)
In-Reply-To: <4D3466BD.10500@schaufler-ca.com>


Subject: [PATCH] net: provide group lists and capability set via CMSG

Provide the namespace-converted group list of the peer
process using the SCM mechanism. Also provide the capability
set of the peer process. The capability set is not
namespace-converted, on the assumption that no such
conversion is available or required.

Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
---
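
A sketch of how a receiver might validate an SCM_GROUPS payload,
assuming the layout proposed in this patch (a 32-bit count followed by
that many 32-bit group IDs).  The header struct is redeclared locally
and parse_scm_groups() is a hypothetical helper; neither is an
existing userspace API.  The length check matters because put_cmsg()
can truncate a control message that does not fit the receiver's
buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Local copy of the fixed part of the proposed payload layout.
 * Declared here because no released kernel header is assumed to
 * provide struct scm_groups.
 */
struct scm_groups_hdr {
	uint32_t count;
};

/*
 * Hypothetical helper: validate a received SCM_GROUPS payload of
 * `len` bytes and copy its group IDs into `out`, which has room for
 * `max` entries.  Returns the number of IDs copied, or -1 if the
 * payload is too short for its own count or does not fit in `out`.
 */
static int parse_scm_groups(const void *payload, size_t len,
			    uint32_t *out, size_t max)
{
	struct scm_groups_hdr hdr;
	size_t avail;

	if (len < sizeof(hdr))
		return -1;
	memcpy(&hdr, payload, sizeof(hdr));

	/* The count must match the bytes that actually arrived. */
	avail = (len - sizeof(hdr)) / sizeof(uint32_t);
	if (hdr.count > avail || hdr.count > max)
		return -1;

	memcpy(out, (const unsigned char *)payload + sizeof(hdr),
	       hdr.count * sizeof(uint32_t));
	return (int)hdr.count;
}
```

A caller would pass CMSG_DATA(cmsg) and the cmsg payload length after
matching cmsg_level == SOL_SOCKET and the proposed SCM_GROUPS type.
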
 include/asm-generic/socket.h |    4 ++
 include/linux/net.h          |    2 +
 include/linux/socket.h       |   18 +++++++++
 include/net/scm.h            |   33 +++++++++++++++++
 net/core/sock.c              |   82 ++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 139 insertions(+), 0 deletions(-)

diff --git a/include/asm-generic/socket.h b/include/asm-generic/socket.h
index 9a6115e..fc2d609 100644
--- a/include/asm-generic/socket.h
+++ b/include/asm-generic/socket.h
@@ -64,4 +64,8 @@
 #define SO_DOMAIN		39
 
 #define SO_RXQ_OVFL             40
+
+#define SO_PASSGROUPS		41
+#define SO_PASSCAPS		42
+
 #endif /* __ASM_GENERIC_SOCKET_H */
diff --git a/include/linux/net.h b/include/linux/net.h
index 16faa13..929e241 100644
--- a/include/linux/net.h
+++ b/include/linux/net.h
@@ -71,6 +71,8 @@ struct net;
 #define SOCK_NOSPACE		2
 #define SOCK_PASSCRED		3
 #define SOCK_PASSSEC		4
+#define SOCK_PASSGROUPS		5
+#define SOCK_PASSCAPS		6
 
 #ifndef ARCH_HAS_SOCKET_TYPES
 /**
diff --git a/include/linux/socket.h b/include/linux/socket.h
index edbb1d0..63f64f0 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -23,6 +23,7 @@ struct __kernel_sockaddr_storage {
 #include <linux/uio.h>			/* iovec support		*/
 #include <linux/types.h>		/* pid_t			*/
 #include <linux/compiler.h>		/* __user			*/
+#include <linux/capability.h>		/* _KERNEL_CAPABILITY_U32S	*/
 
 struct pid;
 struct cred;
@@ -145,6 +146,8 @@ static inline struct cmsghdr * cmsg_nxthdr (struct msghdr *__msg, struct cmsghdr
 #define	SCM_RIGHTS	0x01		/* rw: access rights (array of int) */
 #define SCM_CREDENTIALS 0x02		/* rw: struct ucred		*/
 #define SCM_SECURITY	0x03		/* rw: security label		*/
+#define SCM_GROUPS	0x04		/* rw: group list		*/
+#define SCM_CAPS	0x05		/* rw: capability set		*/
 
 struct ucred {
 	__u32	pid;
@@ -152,6 +155,16 @@ struct ucred {
 	__u32	gid;
 };
 
+struct scm_groups {
+	__u32	count;
+	__u32	gids[0];
+};
+
+struct scm_capabilities {
+	struct __user_cap_data_struct caps[_KERNEL_CAPABILITY_U32S];
+};
+
+
 /* Supported address families. */
 #define AF_UNSPEC	0
 #define AF_UNIX		1	/* Unix domain sockets 		*/
@@ -313,6 +326,11 @@ struct ucred {
 #define IPX_TYPE	1
 
 extern void cred_to_ucred(struct pid *pid, const struct cred *cred, struct ucred *ucred);
+extern void cred_to_scm_groups(const struct cred *cred,
+				struct scm_groups **sgp, int *size);
+extern void cred_to_scm_capabilities(const struct cred *cred,
+					struct scm_capabilities **scp,
+					int *size);
 
 extern int memcpy_fromiovec(unsigned char *kdata, struct iovec *iov, int len);
 extern int memcpy_fromiovecend(unsigned char *kdata, const struct iovec *iov,
diff --git a/include/net/scm.h b/include/net/scm.h
index 745460f..b263867 100644
--- a/include/net/scm.h
+++ b/include/net/scm.h
@@ -102,6 +102,36 @@ static inline void scm_passec(struct socket *sock, struct msghdr *msg, struct sc
 { }
 #endif /* CONFIG_SECURITY_NETWORK */
 
+static inline void scm_passgroups(struct socket *sock, struct msghdr *msg,
+					struct scm_cookie *scm)
+{
+	struct scm_groups *data;
+	int size = 0;
+
+	if (test_bit(SOCK_PASSGROUPS, &sock->flags)) {
+		cred_to_scm_groups(scm->cred, &data, &size);
+		if (size) {
+			put_cmsg(msg, SOL_SOCKET, SCM_GROUPS, size, data);
+			kfree(data);
+		}
+	}
+}
+
+static inline void scm_passcaps(struct socket *sock, struct msghdr *msg,
+				struct scm_cookie *scm)
+{
+	struct scm_capabilities *data;
+	int size = 0;
+
+	if (test_bit(SOCK_PASSCAPS, &sock->flags)) {
+		cred_to_scm_capabilities(scm->cred, &data, &size);
+		if (size) {
+			put_cmsg(msg, SOL_SOCKET, SCM_CAPS, size, data);
+			kfree(data);
+		}
+	}
+}
+
 static __inline__ void scm_recv(struct socket *sock, struct msghdr *msg,
 				struct scm_cookie *scm, int flags)
 {
@@ -115,6 +145,9 @@ static __inline__ void scm_recv(struct socket *sock, struct msghdr *msg,
 	if (test_bit(SOCK_PASSCRED, &sock->flags))
 		put_cmsg(msg, SOL_SOCKET, SCM_CREDENTIALS, sizeof(scm->creds), &scm->creds);
 
+	scm_passgroups(sock, msg, scm);
+	scm_passcaps(sock, msg, scm);
+
 	scm_destroy_cred(scm);
 
 	scm_passec(sock, msg, scm);
diff --git a/net/core/sock.c b/net/core/sock.c
index 7dfed79..80eb5d7 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -648,6 +648,20 @@ set_rcvbuf:
 			clear_bit(SOCK_PASSCRED, &sock->flags);
 		break;
 
+	case SO_PASSGROUPS:
+		if (valbool)
+			set_bit(SOCK_PASSGROUPS, &sock->flags);
+		else
+			clear_bit(SOCK_PASSGROUPS, &sock->flags);
+		break;
+
+	case SO_PASSCAPS:
+		if (valbool)
+			set_bit(SOCK_PASSCAPS, &sock->flags);
+		else
+			clear_bit(SOCK_PASSCAPS, &sock->flags);
+		break;
+
 	case SO_TIMESTAMP:
 	case SO_TIMESTAMPNS:
 		if (valbool)  {
@@ -764,6 +778,66 @@ void cred_to_ucred(struct pid *pid, const struct cred *cred,
 }
 EXPORT_SYMBOL_GPL(cred_to_ucred);
 
+void cred_to_scm_groups(const struct cred *cred,
+			struct scm_groups **scm_groups, int *size)
+{
+	struct user_namespace *current_ns;
+	struct scm_groups *sgp;
+	struct group_info *gip;
+	int bytes;
+	int i;
+
+	if (!cred || !scm_groups)
+		return;
+
+	gip = cred->group_info;
+	if (gip == NULL)
+		return;
+
+	bytes = sizeof(struct scm_groups) + gip->ngroups * sizeof(__u32);
+	sgp = kmalloc(bytes, GFP_KERNEL);
+	if (sgp == NULL) {
+		*scm_groups = NULL;
+		*size = 0;
+		return;
+	}
+
+	current_ns = current_user_ns();
+
+	for (i = 0 ; i < gip->ngroups; i++)
+		sgp->gids[i] = user_ns_map_gid(current_ns, cred,
+						GROUP_AT(gip, i));
+	sgp->count = gip->ngroups;
+	*scm_groups = sgp;
+	*size = bytes;
+}
+EXPORT_SYMBOL_GPL(cred_to_scm_groups);
+
+void cred_to_scm_capabilities(const struct cred *cred,
+				struct scm_capabilities **scm_caps, int *size)
+{
+	struct scm_capabilities *scp;
+	int i;
+
+	if (!cred || !scm_caps)
+		return;
+
+	scp = kmalloc(sizeof(struct scm_capabilities), GFP_KERNEL);
+	if (!scp) {
+		*scm_caps = NULL;
+		*size = 0;
+		return;
+	}
+	for (i = 0; i < _LINUX_CAPABILITY_U32S_3; i++) {
+		scp->caps[i].permitted = cred->cap_permitted.cap[i];
+		scp->caps[i].effective = cred->cap_effective.cap[i];
+		scp->caps[i].inheritable = cred->cap_inheritable.cap[i];
+	}
+	*scm_caps = scp;
+	*size = sizeof(struct scm_capabilities);
+}
+EXPORT_SYMBOL_GPL(cred_to_scm_capabilities);
+
 int sock_getsockopt(struct socket *sock, int level, int optname,
 		    char __user *optval, int __user *optlen)
 {
@@ -915,6 +989,14 @@ int sock_getsockopt(struct socket *sock, int level, int optname,
 		v.val = test_bit(SOCK_PASSCRED, &sock->flags) ? 1 : 0;
 		break;
 
+	case SO_PASSGROUPS:
+		v.val = test_bit(SOCK_PASSGROUPS, &sock->flags) ? 1 : 0;
+		break;
+
+	case SO_PASSCAPS:
+		v.val = test_bit(SOCK_PASSCAPS, &sock->flags) ? 1 : 0;
+		break;
+
 	case SO_PEERCRED:
 	{
 		struct ucred peercred;

^ permalink raw reply related

* Re: [Bugme-new] [Bug 28282] New: forwarding turns autoconfiguration off
From: Hadmut Danisch @ 2011-02-08 22:12 UTC (permalink / raw)
  To: David Miller; +Cc: akpm, netdev, bugzilla-daemon, bugme-daemon
In-Reply-To: <20110208.134421.39185637.davem@davemloft.net>

On 08.02.2011 22:44, David Miller wrote:
>
> This is a case where we're probably just following what the RFC documents
> state we should do, which means unless you can provide clear reference to
> a specification that states we should behave otherwise this isn't changing.

Could you cite where exactly this is stated in the RFC documents? (Would
save me the time to dig through all the RFCs to find that particular
statement and help avoid misunderstanding.)

It appears to me to be a contradiction in terms. IPv6 interfaces must be
able to have several IP addresses, and IPv6 does not have a default
route (or 0::0/0). IPv6 interfaces are designed to participate in
multiple independent logical networks (and several address ranges have
been reserved for future extensions and uses). It therefore does not
make sense if autoconfiguration for one network and routing for another
are mutually exclusive. I'd like to check this (and maybe file a bug in
the RFC).

regards
Hadmut

^ permalink raw reply

* Re: Oops in tcp_output.c, kernel 2.6.38-rc3
From: Brandeburg, Jesse @ 2011-02-08 22:07 UTC (permalink / raw)
  To: David Miller
  Cc: cebbert@redhat.com, netdev@vger.kernel.org,
	ilpo.jarvinen@helsinki.fi, bruce.w.allan
In-Reply-To: <20110207.130234.112590109.davem@davemloft.net>



On Mon, 7 Feb 2011, David Miller wrote:

> From: Chuck Ebbert <cebbert@redhat.com>
> Date: Fri, 4 Feb 2011 15:32:54 -0500
> 
> > Analysis is below. (From https://bugzilla.redhat.com/show_bug.cgi?id=674622)
> > 
> >  BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
> >  IP: [<ffffffff81407b16>] tcp_write_xmit+0x694/0x7af
> 
> I bet this is some kind of bug in the driver or similar, what device
> and also what kind of network config (netfilter, packet scheduler,
> interface, routes, etc.) is this?

Glaring at the bug dump shows e1000e as the only network driver that I
can see; the hardware seems to be the ICH10 chipset.  Copied Bruce in
case he's seen or heard anything.

appears to be this system:
http://www.asus.com/Product.aspx?P_ID=qH6ZSEJ8EPY6HoNU&content=specifications


^ permalink raw reply

* Re: [PATCH] ipsec: allow to align IPv4 AH on 32 bits
From: David Miller @ 2011-02-08 22:00 UTC (permalink / raw)
  To: nicolas.dichtel; +Cc: herbert, netdev, christophe.gouault
In-Reply-To: <4D49864E.8020002@6wind.com>

From: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Date: Wed, 02 Feb 2011 17:29:02 +0100

> On 28/01/2011 20:46, David Miller wrote:
>> From: Nicolas Dichtel<nicolas.dichtel@6wind.com>
>> Date: Fri, 28 Jan 2011 09:51:40 +0100
>>
>>> On 28/01/2011 05:51, Herbert Xu wrote:
>>>> So perhaps an SA configuration flag is needed?
>>> I agree. If David is ok, I will update the patch.
>>
>> Sounds good to me.
> 
> Here is the new patch.

I've applied this to net-next-2.6, thanks.

^ permalink raw reply
