Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH 1/2] Export firmware assigned labels of network devices to sysfs
From: Domsch, Matt @ 2010-07-07 18:35 UTC (permalink / raw)
  To: Greg KH
  Cc: K, Narendra, netdev@vger.kernel.org,
	linux-hotplug@vger.kernel.org, linux-pci@vger.kernel.org,
	Hargrave, Jordan, Rose, Charles, Nijhawan, Vijay
In-Reply-To: <20100707181134.GB4293@kroah.com>

On Wed, Jul 07, 2010 at 01:11:34PM -0500, Greg KH wrote:
> > > Why do you need it?  What is calling that function?  What am I missing
> > > here?
> > 
> > The function 'pci_create_smbiosname_file' below is calling the .test method.
> > For every pdev the function checks if it has a SMBIOS string associated
> > with it or not. If there is no string (and instance) associated, then the
> > attributes 'label' and 'instance' are not created for that pdev.
> > To check for the existance of the string, the .test method is needed and
> > it is not available in 'struct device_attribute'. It provides 
> > .show and .store. We need a .show and .test. So we defined 
> 
> {sigh}
> 
> So, you just reinvented the is_visible function in struct
> attribute_group?  Please use the infrastructure already available to do
> this, it saves on code and debugging and review time.

I'll take the blame for this.  I recommended Narendra use the .test
method, as this is what I did back in 2005 in drivers/firmware/edd.c
which was one of the earliest consumers of the new sysfs code.  James
added the is_visible field to attribute groups in 2008, which I
missed (only 3 drivers make use of it, so it was easy to miss).  Since
that's the "new" preferred way to do it, we can adjust this patch
accordingly.

Thanks,
Matt

-- 
Matt Domsch
Technology Strategist
Dell | Office of the CTO

^ permalink raw reply

* RE: [PATCH 2.6.35-rc1] net: vmxnet3 fixes [1/5] Spare skb to avoid starvation
From: Shreyas Bhatewara @ 2010-07-07 18:34 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev, pv-drivers, stolarchuk, linux-kernel


Stephen,

Thanks for taking a look.
Comments inline.

> -----Original Message-----
> From: Stephen Hemminger [mailto:shemminger@vyatta.com]
> Sent: Wednesday, July 07, 2010 10:47 AM
> To: Shreyas Bhatewara
> Cc: netdev@vger.kernel.org; pv-drivers@vmware.com
> Subject: Re: [PATCH 2.6.35-rc1] net: vmxnet3 fixes [1/5] Spare skb to
> avoid starvation
> 
> On Wed, 7 Jul 2010 02:21:55 -0700 (PDT)
> Shreyas Bhatewara <sbhatewara@vmware.com> wrote:
> 
> >
> >
> > From: Shreyas Bhatewara <sbhatewara@vmware.com>
> >
> > skb_alloc() failure can cause the recv ring to loose all packet
> reception.
> > Avoid this by introducing a spare buffer.
> >
> > Signed-off-by: Michael Stolarchuk <stolarchuk@vmware.com>
> > Signed-off-by: Shreyas Bhatewara <sbhatewara@vmware.com>
> >
> 
> I don't see how this fixes the problem, what happens when
> the spare buffer is used up? Better to design a flow control algorithm
> that holds off sender if allocation fails, and retry allocation later
> (for example with a work queue).
> 


The spare skb is only used when the ring is "empty" and an skb allocation 
failure would cause the ring to starve.
It is never handed up to netif_rx(), and instead, when the rx_interrupt 
occurs, the skb is shuffled back to reuse. 

A solution like the one below was considered,


vmxnet3_try_alloc_skb(adapter)
{
    if(!vmxnet3_rq_alloc_rx_buf(..., ringsize - 1, adapter))      //This 
will be scheduled only when ring is empty, so try allocating max
            compat_schedule_delayed_work(&alloc_work, delay)
}
 
probe()
{
....
COMPAT_INIT_DELAYED_WORK(&alloc_work, vmxnet3_try_alloc_skb, adapter);
 
...
}
 
vmxnet3_rq_rx_complete()
{
.....
if (unlikely(num_to_alloc > VMXNET3_RX_ALLOC_THRESHOLD(rq,
                                                        ring_idx, 
adapter))) {
                        if(!vmxnet3_rq_alloc_rx_buf(rq, ring_idx, 
num_to_alloc, adapter) && num_to_alloc == ring_size-1)
                                      
compat_schedule_delayed_work(&alloc_work, delay);
 
 ....
}


But the value of delay, while scheduling the deferred work is critical. A 
good value is the stdard deviation of packet inter-arrival timings (varies 
a lot from low to high throughput rate). Instead of maintaining this value 
explicitly, its much smarter to rely on packet receive interrupts as 
skb-alloc-poll events, and use them to drive the skb allocation.



> > diff --git a/drivers/net/vmxnet3/vmxnet3_drv.c
> b/drivers/net/vmxnet3/vmxnet3_drv.c
> > index 989b742..5a50d10 100644
> ...
> 
> >  /*
> > @@ -149,6 +149,13 @@ vmxnet3_cmd_ring_desc_avail(struct
> vmxnet3_cmd_ring *ring)
> >  		ring->next2comp - ring->next2fill - 1;
> >  }
> >
> > +static inline bool
> > +vmxnet3_cmd_ring_desc_empty(struct vmxnet3_cmd_ring *ring)
> > +{
> > +	return (ring->next2comp == ring->next2fill);
> > +}
> 
> const is good practice on functions like this.

Reattaching the patch with const in the function.

---

skb_alloc() failure can cause the ring to loose all packet reception. Avoid
this by introducing a spare buffer. The spare skb is only used when the ring is
"empty" and an skb allocation failure would cause the ring to starve. It is
never handed up to netif_rx(), and instead, when the rx_interrupt occurs, the 
skb is shuffled back to reuse. Further use the incoming receive packet 
interrupts as events to poll for skb allocation.

Signed-off-by: Michael Stolarchuk <stolarchuk@vmware.com>
Signed-off-by: Shreyas Bhatewara <sbhatewara@vmware.com>

---

diff --git a/drivers/net/vmxnet3/vmxnet3_drv.c b/drivers/net/vmxnet3/vmxnet3_drv.c
index 989b742..5a50d10 100644
--- a/drivers/net/vmxnet3/vmxnet3_drv.c
+++ b/drivers/net/vmxnet3/vmxnet3_drv.c
@@ -541,7 +541,12 @@ vmxnet3_rq_alloc_rx_buf(struct vmxnet3_rx_queue *rq, u32 ring_idx,
 							 NET_IP_ALIGN);
 				if (unlikely(rbi->skb == NULL)) {
 					rq->stats.rx_buf_alloc_failure++;
-					break;
+					/* starvation prevention */
+					if (vmxnet3_cmd_ring_desc_empty(
+							rq->rx_ring + ring_idx))
+						rbi->skb = rq->spare_skb;
+					else
+						break;
 				}
 				rbi->skb->dev = adapter->netdev;
 
@@ -611,6 +616,29 @@ vmxnet3_append_frag(struct sk_buff *skb, struct Vmxnet3_RxCompDesc *rcd,
 }
 
 
+/*
+ * Free any pages which were attached to the frags of the spare skb.  This can
+ * happen when the spare skb is attached to the rx ring to prevent starvation,
+ * but there was no issue with page allocation.
+ */
+
+static void
+vmxnet3_rx_spare_skb_free_frags(struct vmxnet3_adapter *adapter)
+{
+	struct sk_buff *skb = adapter->rx_queue.spare_skb;
+	int i;
+	for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
+		struct skb_frag_struct *frag = &skb_shinfo(skb)->frags[i];
+		BUG_ON(frag->page != 0);
+		put_page(frag->page);
+		frag->page = 0;
+		frag->size = 0;
+	}
+	skb_shinfo(skb)->nr_frags = 0;
+	skb->data_len = 0;
+}
+
+
 static void
 vmxnet3_map_pkt(struct sk_buff *skb, struct vmxnet3_tx_ctx *ctx,
 		struct vmxnet3_tx_queue *tq, struct pci_dev *pdev,
@@ -1060,8 +1088,12 @@ vmxnet3_rx_error(struct vmxnet3_rx_queue *rq, struct Vmxnet3_RxCompDesc *rcd,
 	 * ctx->skb may be NULL if this is the first and the only one
 	 * desc for the pkt
 	 */
-	if (ctx->skb)
-		dev_kfree_skb_irq(ctx->skb);
+	if (ctx->skb) {
+		if (ctx->skb == rq->spare_skb)
+			vmxnet3_rx_spare_skb_free_frags(adapter);
+		else
+			dev_kfree_skb_irq(ctx->skb);
+	}
 
 	ctx->skb = NULL;
 }
@@ -1159,6 +1191,12 @@ vmxnet3_rq_rx_complete(struct vmxnet3_rx_queue *rq,
 
 		skb = ctx->skb;
 		if (rcd->eop) {
+			if (skb == rq->spare_skb) {
+				rq->stats.drop_total++;
+				vmxnet3_rx_spare_skb_free_frags(adapter);
+				ctx->skb = NULL;
+				goto rcd_done;
+			}
 			skb->len += skb->data_len;
 			skb->truesize += skb->data_len;
 
@@ -1244,6 +1282,14 @@ vmxnet3_rq_cleanup(struct vmxnet3_rx_queue *rq,
 		rq->uncommitted[ring_idx] = 0;
 	}
 
+	/* free starvation prevention skb if allocated */
+	if (rq->spare_skb) {
+		vmxnet3_rx_spare_skb_free_frags(adapter);
+		dev_kfree_skb(rq->spare_skb);
+		rq->spare_skb = NULL;
+	}
+
+
 	rq->comp_ring.gen = VMXNET3_INIT_GEN;
 	rq->comp_ring.next2proc = 0;
 }
@@ -1325,6 +1371,15 @@ vmxnet3_rq_init(struct vmxnet3_rx_queue *rq,
 	}
 	vmxnet3_rq_alloc_rx_buf(rq, 1, rq->rx_ring[1].size - 1, adapter);
 
+	/* allocate ring starvation protection */
+	rq->spare_skb = dev_alloc_skb(PAGE_SIZE);
+	if (rq->spare_skb == NULL) {
+		vmxnet3_rq_cleanup(rq, adapter);
+		return -ENOMEM;
+	}
+
+
+
 	/* reset the comp ring */
 	rq->comp_ring.next2proc = 0;
 	memset(rq->comp_ring.base, 0, rq->comp_ring.size *
diff --git a/drivers/net/vmxnet3/vmxnet3_int.h b/drivers/net/vmxnet3/vmxnet3_int.h
index 34f392f..7985ba4 100644
--- a/drivers/net/vmxnet3/vmxnet3_int.h
+++ b/drivers/net/vmxnet3/vmxnet3_int.h
@@ -68,10 +68,10 @@
 /*
  * Version numbers
  */
-#define VMXNET3_DRIVER_VERSION_STRING   "1.0.5.0-k"
+#define VMXNET3_DRIVER_VERSION_STRING   "1.0.6.0-k"
 
 /* a 32-bit int, each byte encode a verion number in VMXNET3_DRIVER_VERSION */
-#define VMXNET3_DRIVER_VERSION_NUM      0x01000500
+#define VMXNET3_DRIVER_VERSION_NUM      0x01000600
 
 
 /*
@@ -149,6 +149,13 @@ vmxnet3_cmd_ring_desc_avail(struct vmxnet3_cmd_ring *ring)
 		ring->next2comp - ring->next2fill - 1;
 }
 
+static inline bool
+vmxnet3_cmd_ring_desc_empty(const struct vmxnet3_cmd_ring *ring)
+{
+	return (ring->next2comp == ring->next2fill);
+}
+
+
 struct vmxnet3_comp_ring {
 	union Vmxnet3_GenericDesc *base;
 	u32               size;
@@ -266,9 +273,10 @@ struct vmxnet3_rx_queue {
 	u32 qid2;           /* rqID in RCD for buffer from 2nd ring */
 	u32 uncommitted[2]; /* # of buffers allocated since last RXPROD
 				* update */
-	struct vmxnet3_rx_buf_info     *buf_info[2];
-	struct Vmxnet3_RxQueueCtrl            *shared;
+	struct vmxnet3_rx_buf_info	*buf_info[2];
+	struct Vmxnet3_RxQueueCtrl	*shared;
 	struct vmxnet3_rq_driver_stats  stats;
+	struct sk_buff			*spare_skb;      /* starvation skb */
 } __attribute__((__aligned__(SMP_CACHE_BYTES)));
 
 #define VMXNET3_LINUX_MAX_MSIX_VECT     1









^ permalink raw reply related

* Re: [PATCH 1/2] Export firmware assigned labels of network devices to sysfs
From: Greg KH @ 2010-07-07 18:11 UTC (permalink / raw)
  To: Narendra K
  Cc: netdev, linux-hotplug, linux-pci, matt_domsch, jordan_hargrave,
	charles_rose, vijay_nijhawan
In-Reply-To: <20100707174826.GA1046@auslistsprd01.us.dell.com>

On Wed, Jul 07, 2010 at 12:48:26PM -0500, Narendra K wrote:
> > -----Original Message-----
> > From: Greg KH [mailto:greg@kroah.com] 
> > Sent: Wednesday, July 07, 2010 4:52 AM
> > To: K, Narendra
> > Cc: netdev@vger.kernel.org; linux-hotplug@vger.kernel.org;
> > linux-pci@vger.kernel.org; Domsch, Matt; Hargrave, Jordan; Rose,
> > Charles; Nijhawan, Vijay
> > Subject: Re: [PATCH 1/2] Export firmware assigned labels of network
> > devices to sysfs
> > 
> > On Tue, Jul 06, 2010 at 01:52:18PM -0500, Narendra K wrote:
> > > 
> > > 'device_create_file' takes 'struct device_attribute *' as a param
> > which 
> > > we have not used here because  'struct device_attribute' does not have
> > .test
> > > member which we needed in this patch.
> > 
> > Why do you need it?  What is calling that function?  What am I missing
> > here?
> 
> The function 'pci_create_smbiosname_file' below is calling the .test method.
> For every pdev the function checks if it has a SMBIOS string associated
> with it or not. If there is no string (and instance) associated, then the
> attributes 'label' and 'instance' are not created for that pdev.
> To check for the existance of the string, the .test method is needed and
> it is not available in 'struct device_attribute'. It provides 
> .show and .store. We need a .show and .test. So we defined 

{sigh}

So, you just reinvented the is_visible function in struct
attribute_group?  Please use the infrastructure already available to do
this, it saves on code and debugging and review time.

thanks,

greg k-h

^ permalink raw reply

* Re: [PATCH RFC 0/4] ARM/KGDB: Some fixes for SMP machines
From: Anton Vorontsov @ 2010-07-07 17:54 UTC (permalink / raw)
  To: kgdb-bugreport
  Cc: netdev, Russell King, David S. Miller, linux-arm-kernel,
	Jason Wessel
In-Reply-To: <20100707171222.GA16448@oksana.dev.rtsoft.ru>

On Wed, Jul 07, 2010 at 09:12:22PM +0400, Anton Vorontsov wrote:
[...] 
> 2. The patches are against a heavily patched kernel, and so far
>    I didn't rebase them onto the 'debug core' rework as found
>    in the very latest mainline kernels. I'll rebase the patches
>    soon, so for now this is just an RFC.

BTW, I'm testing with another small fixup applied, I didn't send
it as a patch because this deadlock was already fixed in the
debug_core implementation (which KGDB is using nowadays). But
for the completeness, here it is:

(Don't deadlock if there's a wannabe-master CPUs, which entered
KGDB via exception, not NMI/IPI).

diff --git a/kernel/kgdb.c b/kernel/kgdb.c
index e7a2274..65bf75d 100644
--- a/kernel/kgdb.c
+++ b/kernel/kgdb.c
@@ -1522,7 +1522,9 @@ return_normal:
 		 * from the debugger.
 		 */
 		for_each_online_cpu(i) {
-			while (atomic_read(&cpu_in_kgdb[i]))
+			while (atomic_read(&cpu_in_kgdb[i]) &&
+					!(kgdb_info[i].exception_state &
+						DCPU_WANT_MASTER))
 				cpu_relax();
 		}
 	}

------------------------------------------------------------------------------
This SF.net email is sponsored by Sprint
What will you do first with EVO, the first 4G phone?
Visit sprint.com/first -- http://p.sf.net/sfu/sprint-com-first

^ permalink raw reply related

* Re: [PATCH 1/2] Export firmware assigned labels of network devices to sysfs
From: Narendra K @ 2010-07-07 17:48 UTC (permalink / raw)
  To: greg
  Cc: netdev, linux-hotplug, linux-pci, matt_domsch, jordan_hargrave,
	charles_rose, vijay_nijhawan
In-Reply-To: <EDA0A4495861324DA2618B4C45DCB3EE612B27@blrx3m08.blr.amer.dell.com>

> -----Original Message-----
> From: Greg KH [mailto:greg@kroah.com] 
> Sent: Wednesday, July 07, 2010 4:52 AM
> To: K, Narendra
> Cc: netdev@vger.kernel.org; linux-hotplug@vger.kernel.org;
> linux-pci@vger.kernel.org; Domsch, Matt; Hargrave, Jordan; Rose,
> Charles; Nijhawan, Vijay
> Subject: Re: [PATCH 1/2] Export firmware assigned labels of network
> devices to sysfs
> 
> On Tue, Jul 06, 2010 at 01:52:18PM -0500, Narendra K wrote:
> > 
> > 'device_create_file' takes 'struct device_attribute *' as a param
> which 
> > we have not used here because  'struct device_attribute' does not have
> .test
> > member which we needed in this patch.
> 
> Why do you need it?  What is calling that function?  What am I missing
> here?

The function 'pci_create_smbiosname_file' below is calling the .test method.
For every pdev the function checks if it has a SMBIOS string associated
with it or not. If there is no string (and instance) associated, then the
attributes 'label' and 'instance' are not created for that pdev.
To check for the existance of the string, the .test method is needed and
it is not available in 'struct device_attribute'. It provides 
.show and .store. We need a .show and .test. So we defined 

+struct smbios_attribute smbios_attr_label = {
+	.attr = {.name = "label", .mode = 0444, .owner = THIS_MODULE},
+	.show = smbiosname_show,
+	.test = smbios_instance_string_exist,
+};

'smbios_instance_string_exist' checks if the pdev has a 'string' and 'instance'.

+static int 
+pci_create_smbiosname_file(struct pci_dev *pdev)
+{
+	if (smbios_attr_label.test && smbios_attr_label.test(&pdev->dev, NULL, NULL)) {
+		if (sysfs_create_file(&pdev->dev.kobj, &smbios_attr_label.attr))
+			return -1;
+		if (sysfs_create_file(&pdev->dev.kobj, &smbios_attr_instance.attr))
+			return -1;
+		return 0;
+	}
+	return -1;	
+}


+int pci_create_firmware_label_files(struct pci_dev *pdev)
+{
+	if (!pci_create_smbiosname_file(pdev))

> 
> Please always run your patches through scripts/checkpatch.pl and fix up
> the issues it finds before sending it out and having everyone else point
> them out to you :)
> 
> Also, a new thread is nice at times for new versions of patches...

Thanks for the feedback. Sorry for missing this.I would address all the issues 
and post the patch in a new thread.

With regards,
Narendra K

^ permalink raw reply

* Re: [PATCH 2.6.35-rc1] net: vmxnet3 fixes [1/5] Spare skb to avoid starvation
From: Stephen Hemminger @ 2010-07-07 17:46 UTC (permalink / raw)
  To: Shreyas Bhatewara; +Cc: netdev, pv-drivers
In-Reply-To: <alpine.LRH.2.00.1007070214400.6105@localhost.localdomain>

On Wed, 7 Jul 2010 02:21:55 -0700 (PDT)
Shreyas Bhatewara <sbhatewara@vmware.com> wrote:

> 
> 
> From: Shreyas Bhatewara <sbhatewara@vmware.com>
> 
> skb_alloc() failure can cause the recv ring to loose all packet reception.
> Avoid this by introducing a spare buffer.
> 
> Signed-off-by: Michael Stolarchuk <stolarchuk@vmware.com>
> Signed-off-by: Shreyas Bhatewara <sbhatewara@vmware.com>
> 

I don't see how this fixes the problem, what happens when
the spare buffer is used up? Better to design a flow control algorithm
that holds off sender if allocation fails, and retry allocation later
(for example with a work queue).

> diff --git a/drivers/net/vmxnet3/vmxnet3_drv.c b/drivers/net/vmxnet3/vmxnet3_drv.c
> index 989b742..5a50d10 100644
...

>  /*
> @@ -149,6 +149,13 @@ vmxnet3_cmd_ring_desc_avail(struct vmxnet3_cmd_ring *ring)
>  		ring->next2comp - ring->next2fill - 1;
>  }
>  
> +static inline bool
> +vmxnet3_cmd_ring_desc_empty(struct vmxnet3_cmd_ring *ring)
> +{
> +	return (ring->next2comp == ring->next2fill);
> +}

const is good practice on functions like this.


^ permalink raw reply

* Re: [PATCH] vlan: allow TSO setting on vlan interfaces
From: Ben Hutchings @ 2010-07-07 17:25 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev, Patrick McHardy
In-Reply-To: <1278522874.5906.9.camel@edumazet-laptop>

On Wed, 2010-07-07 at 19:14 +0200, Eric Dumazet wrote:
> When we need to shape traffic with low speeds, we need to disable tso on
> network interface :
> 
> ethtool -K eth0.2240 tso off
> 
> It seems vlan interfaces miss the set_tso() ethtool method.
> Propagating tso changes from lower device is not always wanted, some
> vlan want TSO on, other want TSO off.

But it should not be possible to enable TSO if the underlying device
doesn't support it.

Ben.

> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> ---
> 
> diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
> index c6456cb..d827c2e 100644
> --- a/net/8021q/vlan_dev.c
> +++ b/net/8021q/vlan_dev.c
> @@ -844,6 +844,7 @@ static const struct ethtool_ops vlan_ethtool_ops = {
>  	.get_link		= ethtool_op_get_link,
>  	.get_rx_csum		= vlan_ethtool_get_rx_csum,
>  	.get_flags		= vlan_ethtool_get_flags,
> +	.set_tso		= ethtool_op_set_tso,
>  };
>  
>  static const struct net_device_ops vlan_netdev_ops = {
> 

-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply

* [PATCH] vlan: allow TSO setting on vlan interfaces
From: Eric Dumazet @ 2010-07-07 17:14 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Patrick McHardy

When we need to shape traffic with low speeds, we need to disable tso on
network interface :

ethtool -K eth0.2240 tso off

It seems vlan interfaces miss the set_tso() ethtool method.
Propagating tso changes from lower device is not always wanted, some
vlan want TSO on, other want TSO off.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---

diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
index c6456cb..d827c2e 100644
--- a/net/8021q/vlan_dev.c
+++ b/net/8021q/vlan_dev.c
@@ -844,6 +844,7 @@ static const struct ethtool_ops vlan_ethtool_ops = {
 	.get_link		= ethtool_op_get_link,
 	.get_rx_csum		= vlan_ethtool_get_rx_csum,
 	.get_flags		= vlan_ethtool_get_flags,
+	.set_tso		= ethtool_op_set_tso,
 };
 
 static const struct net_device_ops vlan_netdev_ops = {



^ permalink raw reply related

* [PATCH RFC 2/4] ARM: kgdb: Disable preemption before re-enabling interrupts
From: Anton Vorontsov @ 2010-07-07 17:13 UTC (permalink / raw)
  To: kgdb-bugreport
  Cc: Russell King, David S. Miller, Jason Wessel, netdev,
	linux-arm-kernel
In-Reply-To: <20100707171222.GA16448@oksana.dev.rtsoft.ru>

We have to disable preemption before enabling local IRQs because
local_irq_enable() makes it possible for the kernel to reschedule,
so the kernel might hit a breakpoint causing itself to re-enter
KGDB and die.

Signed-off-by: Anton Vorontsov <avorontsov@mvista.com>
---
 arch/arm/kernel/kgdb.c |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/arch/arm/kernel/kgdb.c b/arch/arm/kernel/kgdb.c
index 5c61100..6ece654 100644
--- a/arch/arm/kernel/kgdb.c
+++ b/arch/arm/kernel/kgdb.c
@@ -167,9 +167,17 @@ static void kgdb_call_nmi_hook(void *ignored)
 
 void kgdb_roundup_cpus(unsigned long flags)
 {
+	/*
+	 * We have to disable preemption before enabling local
+	 * IRQs because local_irq_enable() makes it possible for
+	 * the kernel to reschedule, so the kernel might hit a
+	 * breakpoint causing itself to re-enter KGDB and die.
+	 */
+	preempt_disable();
        local_irq_enable();
        smp_call_function(kgdb_call_nmi_hook, NULL, 0);
        local_irq_disable();
+	preempt_enable();
 }
 
 #ifdef CONFIG_SMP
-- 
1.7.0.5


^ permalink raw reply related

* [PATCH RFC 4/4] kgdb: Quiesce IO back-end before rounding up secondary CPUs
From: Anton Vorontsov @ 2010-07-07 17:13 UTC (permalink / raw)
  To: kgdb-bugreport
  Cc: Russell King, David S. Miller, Jason Wessel, netdev,
	linux-arm-kernel
In-Reply-To: <20100707171222.GA16448@oksana.dev.rtsoft.ru>

KGDB may try to roundup secondary CPUs while they're in the IO
back-end code (e.g. NAPI is polling an eth device), so the
secondary CPU might stop processing IO at a random place, which
may cause the IO back-end to become unusable for KGDB itself.

This patch implements try_quiesce and activate KGDB IO callbacks,
so that we quiesce the IO back-end before rounding up the CPUs.
So far it's only implemented for KGDBoE driver.

Note that we have to quiesce the devices via _try mechanism since
we have to poll for IPIs during the time we wait for the other
CPUs.

Signed-off-by: Anton Vorontsov <avorontsov@mvista.com>
---
 drivers/net/kgdboe.c |   21 +++++++++++++++++++++
 include/linux/kgdb.h |    2 ++
 kernel/kgdb.c        |    6 ++++++
 3 files changed, 29 insertions(+), 0 deletions(-)

diff --git a/drivers/net/kgdboe.c b/drivers/net/kgdboe.c
index 939797a..b46cc9a 100644
--- a/drivers/net/kgdboe.c
+++ b/drivers/net/kgdboe.c
@@ -270,8 +270,29 @@ static int param_set_kgdboe_var(const char *kmessage, struct kernel_param *kp)
 	return 0;
 }
 
+static int eth_try_quiesce(void)
+{
+	struct napi_struct *napi;
+
+	list_for_each_entry(napi, &np.dev->napi_list, dev_list) {
+		if (!napi_try_disable(napi))
+			return 0;
+	}
+	return 1;
+}
+
+static void eth_activate(void)
+{
+	struct napi_struct *napi;
+
+	list_for_each_entry(napi, &np.dev->napi_list, dev_list)
+		napi_enable(napi);
+}
+
 static struct kgdb_io local_kgdb_io_ops = {
 	.name = "kgdboe",
+	.try_quiesce = eth_try_quiesce,
+	.activate = eth_activate,
 	.read_char = eth_get_char,
 	.write_char = eth_put_char,
 	.flush = eth_flush_buf,
diff --git a/include/linux/kgdb.h b/include/linux/kgdb.h
index 4c80859..93cd305 100644
--- a/include/linux/kgdb.h
+++ b/include/linux/kgdb.h
@@ -250,6 +250,8 @@ struct kgdb_arch {
  */
 struct kgdb_io {
 	const char		*name;
+	int			(*try_quiesce) (void);
+	void			(*activate) (void);
 	int			(*read_char) (void);
 	void			(*write_char) (u8);
 	void			(*flush) (void);
diff --git a/kernel/kgdb.c b/kernel/kgdb.c
index 97edb05..e7a2274 100644
--- a/kernel/kgdb.c
+++ b/kernel/kgdb.c
@@ -1475,6 +1475,9 @@ return_normal:
 			atomic_inc(&passive_cpu_wait[i]);
 	}
 
+	while (kgdb_io_ops->try_quiesce && !kgdb_io_ops->try_quiesce())
+		kgdb_arch_poll_ipi(regs);
+
 #ifdef CONFIG_SMP
 	/* Signal the other CPUs to enter kgdb_wait() */
 	if ((!kgdb_single_step) && kgdb_do_roundup)
@@ -1489,6 +1492,9 @@ return_normal:
 			kgdb_arch_poll_ipi(regs);
 	}
 
+	if (kgdb_io_ops->activate)
+		kgdb_io_ops->activate();
+
 	/*
 	 * At this point the primary processor is completely
 	 * in the debugger and all secondary CPUs are quiescent
-- 
1.7.0.5

^ permalink raw reply related

* [PATCH RFC 3/4] net: Implement napi_try_disable()
From: Anton Vorontsov @ 2010-07-07 17:13 UTC (permalink / raw)
  To: kgdb-bugreport
  Cc: Russell King, David S. Miller, Jason Wessel, netdev,
	linux-arm-kernel
In-Reply-To: <20100707171222.GA16448@oksana.dev.rtsoft.ru>

For KGDBoE we'll need to disable NAPI, so that we don't randomly
interrupt secondary CPUs while they process networking stuff.

We need to wait for NAPI to become disabled from an atomic context,
thus we can't use napi_disable() as it calls msleep().

Plus, in KGDB we have to always poll for IPIs during busy-waiting,
which means that we must implement the function in a such way that
we can do some caller specific work during the time we wait for
NAPI disable, so just changing msleep() to mdelay() won't work.

Signed-off-by: Anton Vorontsov <avorontsov@mvista.com>
---
 include/linux/netdevice.h |   21 +++++++++++++++++++++
 1 files changed, 21 insertions(+), 0 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 08a1dd8..4fc24eb 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -414,6 +414,27 @@ extern void __napi_complete(struct napi_struct *n);
 extern void napi_complete(struct napi_struct *n);
 
 /**
+ *	napi_try_disable - try to prevent NAPI from scheduling
+ *	@n: napi context
+ *
+ * This call is similar to napi_disable(), except that it doesn't
+ * wait till any outstanding processing completes, but returns
+ * whether napi was successfuly disabled.
+ *
+ * Note that you still must wait till napi is actually disabled,
+ * so the only benefit from using this call is that you can do
+ * some useful work while you wait.
+ */
+static inline int napi_try_disable(struct napi_struct *n)
+{
+	set_bit(NAPI_STATE_DISABLE, &n->state);
+	if (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
+		return 0;
+	clear_bit(NAPI_STATE_DISABLE, &n->state);
+	return 1;
+}
+
+/**
  *	napi_disable - prevent NAPI from scheduling
  *	@n: napi context
  *
-- 
1.7.0.5


^ permalink raw reply related

* [PATCH RFC 1/4] ARM: kgdb: Must poll for IPIs during busy-waiting
From: Anton Vorontsov @ 2010-07-07 17:13 UTC (permalink / raw)
  To: kgdb-bugreport
  Cc: Russell King, David S. Miller, Jason Wessel, netdev,
	linux-arm-kernel
In-Reply-To: <20100707171222.GA16448@oksana.dev.rtsoft.ru>

On architectures w/o NMIs (e.g. ARM), ordinary (maskable) IRQs are used
for SMP IPI calls.

Various deadlocks are possible if we not poll for IPIs:

- The master CPU might hang in kgdb_roundup_cpus() because the slave CPU
  does not process IPIs;
- DMA cache coherency calls are implemented as IPIs, so the master CPU
  might hang in *dma_sync_*() calls that may be issued by the KGDB IO
  back-ends.

Signed-off-by: Anton Vorontsov <avorontsov@mvista.com>
---
 arch/arm/kernel/kgdb.c |    8 ++++++++
 kernel/kgdb.c          |   23 ++++++++++++++++++++++-
 2 files changed, 30 insertions(+), 1 deletions(-)

diff --git a/arch/arm/kernel/kgdb.c b/arch/arm/kernel/kgdb.c
index a5b846b..5c61100 100644
--- a/arch/arm/kernel/kgdb.c
+++ b/arch/arm/kernel/kgdb.c
@@ -11,6 +11,7 @@
  */
 #include <linux/irq.h>
 #include <linux/kgdb.h>
+#include <asm/smp.h>
 #include <asm/traps.h>
 
 /* Make a local copy of the registers passed into the handler (bletch) */
@@ -171,6 +172,13 @@ void kgdb_roundup_cpus(unsigned long flags)
        local_irq_disable();
 }
 
+#ifdef CONFIG_SMP
+void kgdb_arch_poll_ipi(struct pt_regs *regs)
+{
+	do_IPI(regs);
+}
+#endif
+
 /**
  *	kgdb_arch_init - Perform any architecture specific initalization.
  *
diff --git a/kernel/kgdb.c b/kernel/kgdb.c
index 9934aa0..97edb05 100644
--- a/kernel/kgdb.c
+++ b/kernel/kgdb.c
@@ -225,6 +225,25 @@ int __weak kgdb_arch_init(void)
 	return 0;
 }
 
+/*
+ * On architectures w/o NMIs (e.g. ARM), ordinary (maskable) IRQs are used
+ * for SMP IPI calls.
+ *
+ * Various deadlocks are possible if we not poll for IPIs:
+ *
+ * - The master CPU might hang in kgdb_roundup_cpus() because the slave CPU
+ *   does not process IPIs;
+ * - DMA cache coherency calls are implemented as IPIs, so the master CPU
+ *   might hang in *dma_sync_*() calls that may be issued by the KGDB IO
+ *   back-ends.
+ *
+ * The rule of thumb: always poll for IPIs in busy-waiting loops until
+ * all CPUs are in KGDB (i.e. all cpu_in_kgdb[] are set).
+ */
+void __weak kgdb_arch_poll_ipi(struct pt_regs *regs)
+{
+}
+
 int __weak kgdb_skipexception(int exception, struct pt_regs *regs)
 {
 	return 0;
@@ -1389,6 +1408,8 @@ acquirelock:
 	 * master cpu and acquire the kgdb_active lock:
 	 */
 	while (1) {
+		kgdb_arch_poll_ipi(regs);
+
 		if (kgdb_info[cpu].exception_state & DCPU_WANT_MASTER) {
 			if (atomic_cmpxchg(&kgdb_active, -1, cpu) == cpu)
 				break;
@@ -1465,7 +1486,7 @@ return_normal:
 	 */
 	for_each_online_cpu(i) {
 		while (!atomic_read(&cpu_in_kgdb[i]))
-			cpu_relax();
+			kgdb_arch_poll_ipi(regs);
 	}
 
 	/*
-- 
1.7.0.5


^ permalink raw reply related

* [PATCH RFC 0/4] ARM/KGDB: Some fixes for SMP machines
From: Anton Vorontsov @ 2010-07-07 17:12 UTC (permalink / raw)
  To: kgdb-bugreport
  Cc: Russell King, David S. Miller, Jason Wessel, netdev,
	linux-arm-kernel

Hi all,

It appears that KGDB can easily hang SMP ARM machines.

Here are few fixes that I come up with.

Before the patches, doing 'b schedule' and then 1-3 'continue'
or 'si' gdb commands would cause the board to lock-up completely.

After the patches, KGDB seems to be bullet proof. Well, at least
with the 'b schedule' testcase, there might be other bugs which
I didn't hit yet. ;-)

Notes:

1. I'm testing with I/D caches disabled. With enabled caches,
   ARMv6K still locks-up. I'll test ARMv7 soon, and will try
   to debug cache issues on ARMv6K.

2. The patches are against a heavily patched kernel, and so far
   I didn't rebase them onto the 'debug core' rework as found
   in the very latest mainline kernels. I'll rebase the patches
   soon, so for now this is just an RFC.

Thanks!

-- 
Anton Vorontsov
email: cbouatmailru@gmail.com
irc://irc.freenode.net/bd2

^ permalink raw reply

* [PATCH linux-2.6.35-rc3] ks8842 driver
From: Choi, David @ 2010-07-07 15:24 UTC (permalink / raw)
  To: netdev; +Cc: Choi, David, Li, Charles

To whom it may have concerned,

It seems that there are differences between Micrel ks8842 device and Timberdale FPGA based device with generic bus interface. In order to check the differences, I have sent several times but have not received any response. This patch is to support ks8841/ks8842 device from Micrel.

From: David J. Choi <david.choi@micrel.com>

Body of the explanation: 
   -support 16bit and 32bit bus width.
   -add device reset for ks8842/8841 Micrel device.
   -set 100Mbps as a default for Micrel device.
   -set MAC address in both MAC/Switch layer with different sequence for Micrel device, 
    as mentioned in data sheet.

Signed-off-by: David J. Choi <david.choi@micrel.com>

---
--- linux-2.6.35-rc3/drivers/net/ks8842.c.orig	2010-07-01 16:26:50.000000000 -0700
+++ linux-2.6.35-rc3/drivers/net/ks8842.c	2010-07-07 07:41:03.000000000 -0700
@@ -18,6 +18,7 @@
 
 /* Supports:
  * The Micrel KS8842 behind the timberdale FPGA
+ * The genuine Micrel KS8841/42 device with ISA 16/32bit bus interface
  */
 
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
@@ -191,6 +192,12 @@ static inline u32 ks8842_read32(struct k
 
 static void ks8842_reset(struct ks8842_adapter *adapter)
 {
+#ifdef CONFIG_MICREL_KS884X
+	ks8842_write16(adapter, 3, 1, REG_GRR);
+	msleep(10);
+	iowrite16(0, adapter->hw_addr + REG_GRR);
+
+#else
 	/* The KS8842 goes haywire when doing softare reset
 	 * a work around in the timberdale IP is implemented to
 	 * do a hardware reset instead
@@ -201,6 +208,7 @@ static void ks8842_reset(struct ks8842_a
 	iowrite16(32, adapter->hw_addr + REG_SELECT_BANK);
 	iowrite32(0x1, adapter->hw_addr + REG_TIMB_RST);
 	msleep(20);
+#endif
 }
 
 static void ks8842_update_link_status(struct net_device *netdev,
@@ -269,8 +277,11 @@ static void ks8842_reset_hw(struct ks884
 
 	/* restart port auto-negotiation */
 	ks8842_enable_bits(adapter, 49, 1 << 13, REG_P1CR4);
+
+#ifndef CONFIG_MICREL_KS884X
 	/* only advertise 10Mbps */
 	ks8842_clear_bits(adapter, 49, 3 << 2, REG_P1CR4);
+#endif
 
 	/* Enable the transmitter */
 	ks8842_enable_tx(adapter);
@@ -296,6 +307,20 @@ static void ks8842_read_mac_addr(struct 
 	for (i = 0; i < ETH_ALEN; i++)
 		dest[ETH_ALEN - i - 1] = ks8842_read8(adapter, 2, REG_MARL + i);
 
+#ifdef CONFIG_MICREL_KS884X
+	/*
+		the sequence of saving mac addr between MAC and Switch is
+		different.
+	*/
+
+	mac = ks8842_read16(adapter, 2, REG_MARL);
+	ks8842_write16(adapter, 39, mac, REG_MACAR3);
+	mac = ks8842_read16(adapter, 2, REG_MARM);
+	ks8842_write16(adapter, 39, mac, REG_MACAR2);
+	mac = ks8842_read16(adapter, 2, REG_MARH);
+	ks8842_write16(adapter, 39, mac, REG_MACAR1);
+#else
+
 	/* make sure the switch port uses the same MAC as the QMU */
 	mac = ks8842_read16(adapter, 2, REG_MARL);
 	ks8842_write16(adapter, 39, mac, REG_MACAR1);
@@ -303,6 +328,7 @@ static void ks8842_read_mac_addr(struct 
 	ks8842_write16(adapter, 39, mac, REG_MACAR2);
 	mac = ks8842_read16(adapter, 2, REG_MARH);
 	ks8842_write16(adapter, 39, mac, REG_MACAR3);
+#endif
 }
 
 static void ks8842_write_mac_addr(struct ks8842_adapter *adapter, u8 *mac)
@@ -313,8 +339,13 @@ static void ks8842_write_mac_addr(struct
 	spin_lock_irqsave(&adapter->lock, flags);
 	for (i = 0; i < ETH_ALEN; i++) {
 		ks8842_write8(adapter, 2, mac[ETH_ALEN - i - 1], REG_MARL + i);
+#ifdef CONFIG_MICREL_KS884X
+		ks8842_write8(adapter, 39, mac[ETH_ALEN - i - 1],
+			REG_MACAR3 + 1 - i);
+#else
 		ks8842_write8(adapter, 39, mac[ETH_ALEN - i - 1],
 			REG_MACAR1 + i);
+#endif
 	}
 	spin_unlock_irqrestore(&adapter->lock, flags);
 }
@@ -328,8 +359,12 @@ static int ks8842_tx_frame(struct sk_buf
 {
 	struct ks8842_adapter *adapter = netdev_priv(netdev);
 	int len = skb->len;
+#ifdef CONFIG_KS884X_16BIT
+	u16 *ptr16 = (u16 *)skb->data;
+#else
 	u32 *ptr = (u32 *)skb->data;
 	u32 ctrl;
+#endif
 
 	dev_dbg(&adapter->pdev->dev,
 		"%s: len %u head %p data %p tail %p end %p\n",
@@ -340,6 +375,18 @@ static int ks8842_tx_frame(struct sk_buf
 	if (ks8842_tx_fifo_space(adapter) < len + 8)
 		return NETDEV_TX_BUSY;
 
+#ifdef CONFIG_KS884X_16BIT
+	ks8842_write16(adapter, 17, 0x8000 | 0x100, REG_QMU_DATA_LO);
+	ks8842_write16(adapter, 17, (u16)len, REG_QMU_DATA_HI);
+	netdev->stats.tx_bytes += len;
+
+	/* copy buffer */
+	while (len > 0) {
+		iowrite16(*ptr16++, adapter->hw_addr + REG_QMU_DATA_LO);
+		iowrite16(*ptr16++, adapter->hw_addr + REG_QMU_DATA_HI);
+		len -= sizeof(u32);
+	}
+#else
 	/* the control word, enable IRQ, port 1 and the length */
 	ctrl = 0x8000 | 0x100 | (len << 16);
 	ks8842_write32(adapter, 17, ctrl, REG_QMU_DATA_LO);
@@ -352,6 +399,7 @@ static int ks8842_tx_frame(struct sk_buf
 		len -= sizeof(u32);
 		ptr++;
 	}
+#endif
 
 	/* enqueue packet */
 	ks8842_write16(adapter, 17, 1, REG_TXQCR);
@@ -364,10 +412,15 @@ static int ks8842_tx_frame(struct sk_buf
 static void ks8842_rx_frame(struct net_device *netdev,
 	struct ks8842_adapter *adapter)
 {
+#ifdef CONFIG_KS884X_16BIT
+	u16 status = ks8842_read16(adapter, 17, REG_QMU_DATA_LO);
+	int len  = (int)ks8842_read16(adapter, 17, REG_QMU_DATA_HI) & 0xffff;
+#else
 	u32 status = ks8842_read32(adapter, 17, REG_QMU_DATA_LO);
 	int len = (status >> 16) & 0x7ff;
 
 	status &= 0xffff;
+#endif
 
 	dev_dbg(&adapter->pdev->dev, "%s - rx_data: status: %x\n",
 		__func__, status);
@@ -379,13 +432,28 @@ static void ks8842_rx_frame(struct net_d
 		dev_dbg(&adapter->pdev->dev, "%s, got package, len: %d\n",
 			__func__, len);
 		if (skb) {
+#ifdef CONFIG_KS884X_16BIT
+			u16 *data16;
+#else
 			u32 *data;
+#endif
 
 			netdev->stats.rx_packets++;
 			netdev->stats.rx_bytes += len;
 			if (status & RXSR_MULTICAST)
 				netdev->stats.multicast++;
 
+#ifdef CONFIG_KS884X_16BIT
+			data16 = (u16 *)skb_put(skb, len);
+			ks8842_select_bank(adapter, 17);
+			while (len > 0) {
+				*data16++ = ioread16(adapter->hw_addr +
+					REG_QMU_DATA_LO);
+				*data16++ = ioread16(adapter->hw_addr +
+					REG_QMU_DATA_HI);
+				len -= sizeof(u32);
+			}
+#else
 			data = (u32 *)skb_put(skb, len);
 
 			ks8842_select_bank(adapter, 17);
@@ -397,6 +465,7 @@ static void ks8842_rx_frame(struct net_d
 
 			skb->protocol = eth_type_trans(skb, netdev);
 			netif_rx(skb);
+#endif
 		} else
 			netdev->stats.rx_dropped++;
 	} else {
--- linux-2.6.35-rc3/drivers/net/Kconfig.orig	2010-07-02 15:52:41.000000000 -0700
+++ linux-2.6.35-rc3/drivers/net/Kconfig	2010-07-07 07:45:47.000000000 -0700
@@ -1748,11 +1748,29 @@ config TLAN
 	  Please email feedback to <torben.mathiasen@compaq.com>.
 
 config KS8842
-	tristate "Micrel KSZ8842"
+	tristate "Micrel KSZ8841/42 with generic bus interface"
 	depends on HAS_IOMEM
 	help
-	  This platform driver is for Micrel KSZ8842 / KS8842
-	  2-port ethernet switch chip (managed, VLAN, QoS).
+	  This platform driver is for KSZ8841(1-port) / KS8842(2-port) 
+	  ethernet switch chip (managed, VLAN, QoS) from Micrel or
+	  Timberdale(FPGA).
+
+if KS8842
+config MICREL_KS884X
+	boolean "KSZ8841/42 device from Micrel"
+	default n
+	help
+	  Say Y to use Micrel device. Otherwise Timberdale(FPGA) device is 
+	  selected.
+
+config KS884X_16BIT
+	boolean "16bit bus width"
+	default y
+	help
+	  This option specifies 16bit or 32bit bus interface. Say Y to use 
+	  16bit bus. Otherwise 32bit bus is selected.
+
+endif # KS8842
 
 config KS8851
        tristate "Micrel KS8851 SPI"
---

^ permalink raw reply

* Re: Kernel bug 14839: Trying to use a TUN device for IPv6 traffic, cannot set destination address.
From: Rémi Denis-Courmont @ 2010-07-07 15:25 UTC (permalink / raw)
  To: Ian Shorter; +Cc: netdev
In-Reply-To: <846562.91258.qm@web113913.mail.gq1.yahoo.com>


On Wed, 7 Jul 2010 08:04:33 -0700 (PDT), Ian Shorter <mcisho@yahoo.com>
wrote:
> I reported this kernel bug at the end of 2009. In February 2010 Stephen
> Hemminger said that the problem was not unique to TUN, and that I should
> bring it up on the network maintainers mailing list, which I am finally
> doing. Do I need to provide the problem description again?

I really do not see the point of setting a destination address on a TUN
device. Since a TUN device has no link-layer, the notions of gateway or
destination is rather void of meaning.

-- 
Rémi Denis-Courmont
http://www.remlab.net
http://fi.linkedin.com/in/remidenis


^ permalink raw reply

* Kernel bug 14839: Trying to use a TUN device for IPv6 traffic, cannot set destination address.
From: Ian Shorter @ 2010-07-07 15:04 UTC (permalink / raw)
  To: netdev

Hello,

I reported this kernel bug at the end of 2009. In February 2010 Stephen Hemminger said that the problem was not unique to TUN, and that I should bring it up on the network maintainers mailing list, which I am finally doing. Do I need to provide the problem description again?

Regards
mcisho



      


^ permalink raw reply

* Re: QoS weirdness : HTB accuracy
From: Jussi Kivilinna @ 2010-07-07 15:07 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Andrew Beverley, Julien Vehent, Philip A. Prindeville, Netdev,
	netfilter, hawk
In-Reply-To: <Pine.LNX.4.64.1007071457020.6631@ask.diku.dk>

Quoting "Jesper Dangaard Brouer" <hawk@diku.dk>:

>
> On Sun, 4 Jul 2010, Andrew Beverley wrote:
>
>>>> I was, in fact, an error in my ruleset. I had put the 'linklayer atm' at
>>>> both the branch and leaf levels, so the overhead was computed twice,
>>>> creating those holes in the bandwidth.
>>>
>>> I am seeing similar behaviour with my setup. Am I making the same
>>> mistake? A subset of my rules is as follows:
>>>
>>> tc qdisc add dev ppp0 root handle 1: htb r2q 1
>>>
>>> tc class add dev ppp0 parent 1: classid 1:1 htb \
>>>    rate ${DOWNLINK}kbit ceil ${DOWNLINK}kbit \
>>>    overhead $overhead linklayer atm                   <------- Here
>>>
>>> tc class add dev ppp0 parent 1:1 classid 1:10 htb \
>>>    rate 612kbit ceil 612kbit prio 0 \
>>>    overhead $overhead linklayer atm                   <------- And here
>>>
>>> tc qdisc add dev ppp0 parent 1:10 handle 4210: \
>>>    sfq perturb 10 limit 50
>>>
>>> tc filter add dev ppp0 parent 1:0 protocol ip \
>>>    prio 10 handle 10 fw flowid 1:10
>>
>> I removed the overhead option on the first leaf, and the speeds change
>> to what I expect. However, the rules above are taken straight from the
>> ADSL Optimizer project, which was the source of the original overhead
>> patch for tc. So is the ADSL Optimizer project wrong?
>
> After looking at the HTB kernel code I believe that the ADSL  
> Optimizer project is NOT wrong.  You should/must set the linklayer  
> option on both the root class and leaf (else you would be charging  
> the root/parent node too little).
>

It's been while I looked at the linklayer/size-table code, but if I  
remember right overhead is calculated with first linklayer packet sees  
in qdisc. So when packet goes to leaf with 'linklayer atm', packet get  
packet size with overhead for ATM and root linklayer is not used for  
that packet. Because of this you can have leafs with different  
overheads (pppoe, ipv6-tunnel, etc, with ) and all get right  
overhead... ..

..

...Uh oh...

HTB still has linklayer/overhead of its own, I was talking about the  
generic linklayer code that can be used with all qdiscs. Never mind,  
sorry.

-Jussi



^ permalink raw reply

* Re: QoS weirdness : HTB accuracy
From: Jesper Dangaard Brouer @ 2010-07-07 13:07 UTC (permalink / raw)
  To: Andrew Beverley
  Cc: Julien Vehent, Philip A. Prindeville, Netdev, netfilter, hawk
In-Reply-To: <1278265807.1506.86.camel@andybev>


On Sun, 4 Jul 2010, Andrew Beverley wrote:

>>> I was, in fact, an error in my ruleset. I had put the 'linklayer atm' at
>>> both the branch and leaf levels, so the overhead was computed twice,
>>> creating those holes in the bandwidth.
>>
>> I am seeing similar behaviour with my setup. Am I making the same
>> mistake? A subset of my rules is as follows:
>>
>> tc qdisc add dev ppp0 root handle 1: htb r2q 1
>>
>> tc class add dev ppp0 parent 1: classid 1:1 htb \
>>     rate ${DOWNLINK}kbit ceil ${DOWNLINK}kbit \
>>     overhead $overhead linklayer atm                   <------- Here
>>
>> tc class add dev ppp0 parent 1:1 classid 1:10 htb \
>>     rate 612kbit ceil 612kbit prio 0 \
>>     overhead $overhead linklayer atm                   <------- And here
>>
>> tc qdisc add dev ppp0 parent 1:10 handle 4210: \
>>     sfq perturb 10 limit 50
>>
>> tc filter add dev ppp0 parent 1:0 protocol ip \
>>     prio 10 handle 10 fw flowid 1:10
>
> I removed the overhead option on the first leaf, and the speeds change
> to what I expect. However, the rules above are taken straight from the
> ADSL Optimizer project, which was the source of the original overhead
> patch for tc. So is the ADSL Optimizer project wrong?

After looking at the HTB kernel code I believe that the ADSL Optimizer 
project is NOT wrong.  You should/must set the linklayer option on both 
the root class and leaf (else you would be charging the root/parent node 
too little).

It is the expected behavior that small packets cause a significant
reduction in the available bandwidth on the ATM link.  Small packets will 
(almost) always cause 2 ATM packets (being send) using 106 bytes, thus
eg. sending a 40 bytes TCP ACK packet result in approx 62% overhead.

Cheers,
   Jesper Brouer

--
-------------------------------------------------------------------
MSc. Master of Computer Science
Dept. of Computer Science, University of Copenhagen
Author of http://www.adsl-optimizer.dk
-------------------------------------------------------------------

^ permalink raw reply

* [PATCH] NET: nl80211, fix lock imbalance and netdev referencing
From: Jiri Slaby @ 2010-07-07 13:02 UTC (permalink / raw)
  To: johannes
  Cc: jirislaby, linux-kernel, John W. Linville, David S. Miller,
	Jouni Malinen, Samuel Ortiz, linux-wireless, netdev

Stanse found that nl80211_set_wiphy imporperly handles a lock and netdev
reference and contains unreachable code. It is because there return statement
isntead of assignment to result variable. Fix that.

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: "John W. Linville" <linville@tuxdriver.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jouni Malinen <j@w1.fi>
Cc: Samuel Ortiz <sameo@linux.intel.com>
Cc: linux-wireless@vger.kernel.org
Cc: netdev@vger.kernel.org
---
 net/wireless/nl80211.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c
index 85285b4..fbfac58 100644
--- a/net/wireless/nl80211.c
+++ b/net/wireless/nl80211.c
@@ -877,7 +877,7 @@ static int nl80211_set_wiphy(struct sk_buff *skb, struct genl_info *info)
 		int idx, mbm = 0;
 
 		if (!rdev->ops->set_tx_power) {
-			return -EOPNOTSUPP;
+			result = -EOPNOTSUPP;
 			goto bad_res;
 		}
 
-- 
1.7.1

^ permalink raw reply related

* Re: QoS weirdness : HTB accuracy
From: Jesper Dangaard Brouer @ 2010-07-07 11:40 UTC (permalink / raw)
  To: netdev
In-Reply-To: <067c83163988908ef546d7ff7f560a17@localhost>

Julien Vehent <julien <at> linuxwall.info> writes:

> I observe unused bandwidth on my QoS policy that I cannot explain.
> Conditions are: I have a HTB tree with 8 classes and a total rate of
> 768kbits. I use the ATM option so I assume the real rate to be something
> close to 675kbits (about 88% of the ethernet rate).
> 
> The sum of my 8 rates is exactly 768kbits. Some have ceil values up to
> 768kbits.
> 
> When class 20 "tcp_acks" starts borrowing, TC reduces the total bandwidth
> down to 595kbits/S (minus 79kbits/s). And I can't explain why....

Fortunately, I can explain why.  This is actually the expected/correct
behavior.  The "tcp_acks" class only receives small packets, and small
packets have a significantly higher overhead than larger packets.

The simple explanation is that small packets will (almost) always
result in two ATM frames being transmittet, thus resulting in 106
bytes (2x 53 bytes) being used on the link.

 
> The attached graph "tc_htb_weirdness.png" shows the result: there are
> 'holes' in the sending rate. 

This is as explained above the behavior I expected to see.  As the
small packets eats/consumes more bandwidth on the physical link, and
your observations are based upon what happens on the Ethernet layer.

-- 
Best regards
  Jesper Brouer
  ComX Networks A/S
  Linux Network Kernel Developer
  Cand. Scient Datalog / MSc.CS
  Author of http://adsl-optimizer.dk
  LinkedIn: http://www.linkedin.com/in/brouer



^ permalink raw reply

* Re: IPVS: Incorrect helper use for SCTP [was: wensong@linux-vs.org, horms@verge.net.au]
From: Simon Horman @ 2010-07-07 12:21 UTC (permalink / raw)
  To: xiaoyu Du; +Cc: lvs-devel, linux-kernel, netdev
In-Reply-To: <AANLkTinEKn_UBpjf3k2DZn1euMM6z2UM_yeDPvdEQkav@mail.gmail.com>

[CCed netdev]

Thanks,

that looks correct to me. Have you tested this change?
If so could you provided a Signed-off-by line
as per section 12 of Documentation/SubmittingPatches?

On Wed, Jul 07, 2010 at 05:19:06PM +0800, xiaoyu Du wrote:
> Hi,all
> 
> After I compared the sctp with tcp and udp, I thinkt his a bug that
> sctp_dnat_handler Incorrectly invoked ip_vs_app_pkt_out.
> below is the patch.
> 
> ---
>  net/netfilter/ipvs/ip_vs_proto_sctp.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/net/netfilter/ipvs/ip_vs_proto_sctp.c
> b/net/netfilter/ipvs/ip_vs_proto_sctp.c
> index c9a3f7a..db55759 100644
> --- a/net/netfilter/ipvs/ip_vs_proto_sctp.c
> +++ b/net/netfilter/ipvs/ip_vs_proto_sctp.c
> @@ -173,7 +173,7 @@ sctp_dnat_handler(struct sk_buff *skb,
>                        return 0;
> 
>                /* Call application helper if needed */
> -               if (!ip_vs_app_pkt_out(cp, skb))
> +               if (!ip_vs_app_pkt_in(cp, skb))
>                        return 0;
>        }
> 
> --
> --
> To unsubscribe from this list: send the line "unsubscribe lvs-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [PATCH 2.6.35-rc1] net: vmxnet3 fixes [5/5] Respect the interrupt type in VM configuration
From: Shreyas Bhatewara @ 2010-07-07  9:34 UTC (permalink / raw)
  To: netdev; +Cc: linux-kernel, pv-drivers
In-Reply-To: <alpine.LRH.2.00.1007070202080.5939@localhost.localdomain>


Respect the interrupt type set in VM configuration.
   
When interrupt type is not auto, do not ignore the interrupt type set from
VM configuration.

Signed-off-by: Shreyas Bhatewara <sbhatewara@vmware.com>

---

diff --git a/drivers/net/vmxnet3/vmxnet3_drv.c b/drivers/net/vmxnet3/vmxnet3_drv.c
index 01a5bb7..c6d756f 100644
--- a/drivers/net/vmxnet3/vmxnet3_drv.c
+++ b/drivers/net/vmxnet3/vmxnet3_drv.c
@@ -2355,9 +2355,13 @@ vmxnet3_alloc_intr_resources(struct vmxnet3_adapter *adapter)
 	adapter->intr.mask_mode = (cfg >> 2) & 0x3;
 
 	if (adapter->intr.type == VMXNET3_IT_AUTO) {
-		int err;
+		adapter->intr.type = VMXNET3_IT_MSIX;
+	}
 
 #ifdef CONFIG_PCI_MSI
+	if (adapter->intr.type == VMXNET3_IT_MSIX) {
+		int err;
+
 		adapter->intr.msix_entries[0].entry = 0;
 		err = pci_enable_msix(adapter->pdev, adapter->intr.msix_entries,
 				      VMXNET3_LINUX_MAX_MSIX_VECT);
@@ -2366,15 +2370,18 @@ vmxnet3_alloc_intr_resources(struct vmxnet3_adapter *adapter)
 			adapter->intr.type = VMXNET3_IT_MSIX;
 			return;
 		}
-#endif
+		adapter->intr.type = VMXNET3_IT_MSI;
+	}
 
+	if (adapter->intr.type == VMXNET3_IT_MSI) {
+		int err;
 		err = pci_enable_msi(adapter->pdev);
 		if (!err) {
 			adapter->intr.num_intrs = 1;
-			adapter->intr.type = VMXNET3_IT_MSI;
 			return;
 		}
 	}
+#endif /* CONFIG_PCI_MSI */
 
 	adapter->intr.type = VMXNET3_IT_INTX;
 
diff --git a/drivers/net/vmxnet3/vmxnet3_int.h b/drivers/net/vmxnet3/vmxnet3_int.h
index 8e1f704..6bab6ff 100644
--- a/drivers/net/vmxnet3/vmxnet3_int.h
+++ b/drivers/net/vmxnet3/vmxnet3_int.h
@@ -68,10 +68,10 @@
 /*
  * Version numbers
  */
-#define VMXNET3_DRIVER_VERSION_STRING   "1.0.11.0-k"
+#define VMXNET3_DRIVER_VERSION_STRING   "1.0.12.0-k"
 
 /* a 32-bit int, each byte encode a verion number in VMXNET3_DRIVER_VERSION */
-#define VMXNET3_DRIVER_VERSION_NUM      0x01000B00
+#define VMXNET3_DRIVER_VERSION_NUM      0x01000C00
 
 
 /*

^ permalink raw reply related

* [PATCH 2.6.35-rc1] net: vmxnet3 fixes [4/5] Do not reset when the device is not opened
From: Shreyas Bhatewara @ 2010-07-07  9:31 UTC (permalink / raw)
  To: netdev; +Cc: linux-kernel, pv-drivers, ronghua, matthieu
In-Reply-To: <alpine.LRH.2.00.1007070202080.5939@localhost.localdomain>



commit bc8b3f0b3978d3c0a201926b2e2bd5c732e0352e
Author: Shreyas Bhatewara <sbhatewara@vmware.com>
Date:   Tue Jul 6 17:00:48 2010 -0700

No reset when the device is not opened
    
If a reset is scheduled, and the device goes thru close and open, it
may happen that reset and open may run in parallel.  
The reset code now bails out if the device is not opened.
 
Signed-off-by: Ronghua Zang <ronghua@vmware.com>
Signed-off-by: Matthieu Bucchianeri <matthieu@vmware.com>
Signed-off-by: Shreyas Bhatewara <sbhatewara@vmware.com>

---

diff --git a/drivers/net/vmxnet3/vmxnet3_drv.c b/drivers/net/vmxnet3/vmxnet3_drv.c
index 31a838f..01a5bb7 100644
--- a/drivers/net/vmxnet3/vmxnet3_drv.c
+++ b/drivers/net/vmxnet3/vmxnet3_drv.c
@@ -2417,8 +2417,9 @@ vmxnet3_reset_work(struct work_struct *data)
 	if (test_and_set_bit(VMXNET3_STATE_BIT_RESETTING, &adapter->state))
 		return;
 
-	/* if the device is closed, we must leave it alone */
-	if (netif_running(adapter->netdev)) {
+	/* if the device is closed or is being opened, we must leave it alone */
+	if (netif_running(adapter->netdev) &&
+	    (adapter->netdev->flags & IFF_UP)) {
 		printk(KERN_INFO "%s: resetting\n", adapter->netdev->name);
 		vmxnet3_quiesce_dev(adapter);
 		vmxnet3_reset_dev(adapter);
diff --git a/drivers/net/vmxnet3/vmxnet3_int.h b/drivers/net/vmxnet3/vmxnet3_int.h
index 5c94afa..8e1f704 100644
--- a/drivers/net/vmxnet3/vmxnet3_int.h
+++ b/drivers/net/vmxnet3/vmxnet3_int.h
@@ -68,10 +68,10 @@
 /*
  * Version numbers
  */
-#define VMXNET3_DRIVER_VERSION_STRING   "1.0.10.0-k"
+#define VMXNET3_DRIVER_VERSION_STRING   "1.0.11.0-k"
 
 /* a 32-bit int, each byte encode a verion number in VMXNET3_DRIVER_VERSION */
-#define VMXNET3_DRIVER_VERSION_NUM      0x01000A00
+#define VMXNET3_DRIVER_VERSION_NUM      0x01000B00
 
 
 /*

^ permalink raw reply related

* [PATCH 2.6.35-rc1] net: vmxnet3 fixes [3/5] Initialize link state at probe time
From: Shreyas Bhatewara @ 2010-07-07  9:28 UTC (permalink / raw)
  To: netdev; +Cc: linux-kernel, pv-drivers
In-Reply-To: <alpine.LRH.2.00.1007070125440.4652@localhost.localdomain>


Author: Shreyas Bhatewara <sbhatewara@vmware.com>

Initialize vmxnet3 link state at probe time
    
This change initializes the state of link at the time when driver is
loaded. The ethtool output for 'link detected' and 'link speed'
is thus valid even before the interface is brought up.
    
Signed-off-by: Shreyas Bhatewara <sbhatewara@vmware.com>

---

diff --git a/drivers/net/vmxnet3/vmxnet3_drv.c b/drivers/net/vmxnet3/vmxnet3_drv.c
index 29db294..31a838f 100644
--- a/drivers/net/vmxnet3/vmxnet3_drv.c
+++ b/drivers/net/vmxnet3/vmxnet3_drv.c
@@ -130,7 +130,7 @@ vmxnet3_tq_stop(struct vmxnet3_tx_queue *tq, struct vmxnet3_adapter *adapter)
  * Check the link state. This may start or stop the tx queue.
  */
 static void
-vmxnet3_check_link(struct vmxnet3_adapter *adapter)
+vmxnet3_check_link(struct vmxnet3_adapter *adapter, bool affectTxQueue)
 {
 	u32 ret;
 
@@ -143,14 +143,16 @@ vmxnet3_check_link(struct vmxnet3_adapter *adapter)
 		if (!netif_carrier_ok(adapter->netdev))
 			netif_carrier_on(adapter->netdev);
 
-		vmxnet3_tq_start(&adapter->tx_queue, adapter);
+		if (affectTxQueue)
+			vmxnet3_tq_start(&adapter->tx_queue, adapter);
 	} else {
 		printk(KERN_INFO "%s: NIC Link is Down\n",
 		       adapter->netdev->name);
 		if (netif_carrier_ok(adapter->netdev))
 			netif_carrier_off(adapter->netdev);
 
-		vmxnet3_tq_stop(&adapter->tx_queue, adapter);
+		if (affectTxQueue)
+			vmxnet3_tq_stop(&adapter->tx_queue, adapter);
 	}
 }
 
@@ -165,7 +167,7 @@ vmxnet3_process_events(struct vmxnet3_adapter *adapter)
 
 	/* Check if link state has changed */
 	if (events & VMXNET3_ECR_LINK)
-		vmxnet3_check_link(adapter);
+		vmxnet3_check_link(adapter, true);
 
 	/* Check if there is an error on xmit/recv queues */
 	if (events & (VMXNET3_ECR_TQERR | VMXNET3_ECR_RQERR)) {
@@ -1947,7 +1949,7 @@ vmxnet3_activate_dev(struct vmxnet3_adapter *adapter)
 	 * Check link state when first activating device. It will start the
 	 * tx queue if the link is up.
 	 */
-	vmxnet3_check_link(adapter);
+	vmxnet3_check_link(adapter, true);
 
 	napi_enable(&adapter->napi);
 	vmxnet3_enable_all_intrs(adapter);
@@ -2549,6 +2551,7 @@ vmxnet3_probe_device(struct pci_dev *pdev,
 	}
 
 	set_bit(VMXNET3_STATE_BIT_QUIESCED, &adapter->state);
+	vmxnet3_check_link(adapter, false);
 	atomic_inc(&devices_found);
 	return 0;
 
diff --git a/drivers/net/vmxnet3/vmxnet3_int.h b/drivers/net/vmxnet3/vmxnet3_int.h
index 0ddfe3c..5c94afa 100644
--- a/drivers/net/vmxnet3/vmxnet3_int.h
+++ b/drivers/net/vmxnet3/vmxnet3_int.h
@@ -68,10 +68,10 @@
 /*
  * Version numbers
  */
-#define VMXNET3_DRIVER_VERSION_STRING   "1.0.9.0-k"
+#define VMXNET3_DRIVER_VERSION_STRING   "1.0.10.0-k"
 
 /* a 32-bit int, each byte encode a verion number in VMXNET3_DRIVER_VERSION */
-#define VMXNET3_DRIVER_VERSION_NUM      0x01000900
+#define VMXNET3_DRIVER_VERSION_NUM      0x01000A00
 
 
 /*

^ permalink raw reply related

* [PATCH 2.6.35-rc1] net: vmxnet3 fixes [2/5] Interrupt control bitmap
From: Shreyas Bhatewara @ 2010-07-07  9:24 UTC (permalink / raw)
  To: netdev; +Cc: linux-kernel, pv-drivers, ronghua
In-Reply-To: <alpine.LRH.2.00.1007070125440.4652@localhost.localdomain>


A new bit map 'intrCtrl' is introduced in the DriverShared area. The driver 
should update VMXNET3_IC_DISABLE_ALL bit before writing IMR.

Signed-off-by: Ronghua Zang <ronghua@vmware.com>
Signed-off-by: Shreyas Bhatewara <sbhatewara@vmware.com>

---

diff --git a/drivers/net/vmxnet3/vmxnet3_defs.h b/drivers/net/vmxnet3/vmxnet3_defs.h
index b4889e6..ca7727b 100644
--- a/drivers/net/vmxnet3/vmxnet3_defs.h
+++ b/drivers/net/vmxnet3/vmxnet3_defs.h
@@ -464,6 +464,9 @@ enum vmxnet3_intr_type {
 /* addition 1 for events */
 #define VMXNET3_MAX_INTRS      25
 
+/* value of intrCtrl */
+#define VMXNET3_IC_DISABLE_ALL  0x1   /* bit 0 */
+
 
 struct Vmxnet3_IntrConf {
 	bool		autoMask;
@@ -471,7 +474,8 @@ struct Vmxnet3_IntrConf {
 	u8		eventIntrIdx;
 	u8		modLevels[VMXNET3_MAX_INTRS];	/* moderation level for
 							 * each intr */
-	__le32		reserved[3];
+	__le32		intrCtrl;
+	__le32		reserved[2];
 };
 
 /* one bit per VLAN ID, the size is in the units of u32	*/
diff --git a/drivers/net/vmxnet3/vmxnet3_drv.c b/drivers/net/vmxnet3/vmxnet3_drv.c
index 1b2d467..29db294 100644
--- a/drivers/net/vmxnet3/vmxnet3_drv.c
+++ b/drivers/net/vmxnet3/vmxnet3_drv.c
@@ -72,6 +72,7 @@ vmxnet3_enable_all_intrs(struct vmxnet3_adapter *adapter)
 
 	for (i = 0; i < adapter->intr.num_intrs; i++)
 		vmxnet3_enable_intr(adapter, i);
+	adapter->shared->devRead.intrConf.intrCtrl &= ~VMXNET3_IC_DISABLE_ALL;
 }
 
 
@@ -80,6 +81,7 @@ vmxnet3_disable_all_intrs(struct vmxnet3_adapter *adapter)
 {
 	int i;
 
+	adapter->shared->devRead.intrConf.intrCtrl |= VMXNET3_IC_DISABLE_ALL;
 	for (i = 0; i < adapter->intr.num_intrs; i++)
 		vmxnet3_disable_intr(adapter, i);
 }
@@ -1880,6 +1882,7 @@ vmxnet3_setup_driver_shared(struct vmxnet3_adapter *adapter)
 		devRead->intrConf.modLevels[i] = adapter->intr.mod_levels[i];
 
 	devRead->intrConf.eventIntrIdx = adapter->intr.event_intr_idx;
+	devRead->intrConf.intrCtrl |= VMXNET3_IC_DISABLE_ALL;
 
 	/* rx filter settings */
 	devRead->rxFilterConf.rxMode = 0;
diff --git a/drivers/net/vmxnet3/vmxnet3_int.h b/drivers/net/vmxnet3/vmxnet3_int.h
index 7985ba4..0ddfe3c 100644
--- a/drivers/net/vmxnet3/vmxnet3_int.h
+++ b/drivers/net/vmxnet3/vmxnet3_int.h
@@ -68,10 +68,10 @@
 /*
  * Version numbers
  */
-#define VMXNET3_DRIVER_VERSION_STRING   "1.0.6.0-k"
+#define VMXNET3_DRIVER_VERSION_STRING   "1.0.9.0-k"
 
 /* a 32-bit int, each byte encode a verion number in VMXNET3_DRIVER_VERSION */
-#define VMXNET3_DRIVER_VERSION_NUM      0x01000600
+#define VMXNET3_DRIVER_VERSION_NUM      0x01000900
 
 
 /*

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox