Netdev List


* Re: [PATCH 00/13] Swap-over-NBD without deadlocking
From: Peter Zijlstra @ 2011-04-26 14:23 UTC (permalink / raw)
  To: Mel Gorman; +Cc: Linux-MM, Linux-Netdev, LKML, David Miller, Neil Brown
In-Reply-To: <1303803414-5937-1-git-send-email-mgorman@suse.de>

On Tue, 2011-04-26 at 08:36 +0100, Mel Gorman wrote:
> Comments?

Last time I brought up the whole swap over network bits I was pointed
towards the generic skb recycling work:

  http://lwn.net/Articles/332037/

as a means to pre-allocate memory, and it was suggested to simply pin
the few route-cache entries required to route these packets and
disallow swap packets from being fragmented (these last two avoid lots of
funny allocation cases in the network stack).

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: [PATCH 12/13] mm: Throttle direct reclaimers if PF_MEMALLOC reserves are low and swap is backed by network storage
From: Mel Gorman @ 2011-04-26 14:26 UTC (permalink / raw)
  To: NeilBrown; +Cc: Linux-MM, Linux-Netdev, LKML, David Miller, Peter Zijlstra
In-Reply-To: <20110426223059.10f3edda@notabene.brown>

On Tue, Apr 26, 2011 at 10:30:59PM +1000, NeilBrown wrote:
> On Tue, 26 Apr 2011 08:36:53 +0100 Mel Gorman <mgorman@suse.de> wrote:
> 
> 
> > +/*
> > + * Throttle direct reclaimers if backing storage is backed by the network
> > + * and the PFMEMALLOC reserve for the preferred node is getting dangerously
> > + * depleted. kswapd will continue to make progress and wake the processes
> > + * when the low watermark is reached
> > + */
> > +static void throttle_direct_reclaim(gfp_t gfp_mask, struct zonelist *zonelist,
> > +					nodemask_t *nodemask)
> > +{
> > +	struct zone *zone;
> > +	int high_zoneidx = gfp_zone(gfp_mask);
> > +	DEFINE_WAIT(wait);
> > +
> > +	/* Check if the pfmemalloc reserves are ok */
> > +	first_zones_zonelist(zonelist, high_zoneidx, NULL, &zone);
> > +	prepare_to_wait(&zone->zone_pgdat->pfmemalloc_wait, &wait,
> > +							TASK_INTERRUPTIBLE);
> > +	if (pfmemalloc_watermark_ok(zone->zone_pgdat, high_zoneidx))
> > +		goto out;
> > +
> > +	/* Throttle */
> > +	do {
> > +		schedule();
> > +		finish_wait(&zone->zone_pgdat->pfmemalloc_wait, &wait);
> > +		prepare_to_wait(&zone->zone_pgdat->pfmemalloc_wait, &wait,
> > +							TASK_INTERRUPTIBLE);
> > +	} while (!pfmemalloc_watermark_ok(zone->zone_pgdat, high_zoneidx) &&
> > +			!fatal_signal_pending(current));
> > +
> > +out:
> > +	finish_wait(&zone->zone_pgdat->pfmemalloc_wait, &wait);
> > +}
> 
> You are doing an interruptible wait, but only checking for fatal signals.
> So if a non-fatal signal arrives, you will busy-wait.
> 
> So I suspect you want TASK_KILLABLE, so just use:
> 
>     wait_event_killable(zone->zone_pgdat->pfmemalloc_wait,
>                         pfmemalloc_watermark_ok(zone->zone_pgdat,
>                                                 high_zoneidx));
> 

Well, if a normal signal arrives, we do not necessarily want the
process to enter reclaim. For fatal signals, I allow it to continue
because it's not likely to be putting the system under more pressure
if it's exiting.

> (You also have an extraneous call to finish_wait)
> 

Which one? I'm not seeing a flow where finish_wait gets called twice
without a prepare_to_wait in between. 

-- 
Mel Gorman
SUSE Labs
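To make Neil's busy-wait point concrete, here is a small user-space model of one pass through the throttle loop. It is not kernel code: the MODEL_* states and both functions are hypothetical, and the model assumes the kernel's usual rule that schedule() will not put a task to sleep while a signal relevant to its wait state is pending (any signal for TASK_INTERRUPTIBLE, only a fatal one for TASK_KILLABLE).

```c
#include <stdbool.h>

/* Hypothetical model states; these are not the kernel's definitions. */
enum model_task_state {
	MODEL_TASK_INTERRUPTIBLE,	/* any pending signal keeps the task runnable */
	MODEL_TASK_KILLABLE,		/* only a fatal signal keeps it runnable */
};

/* Would schedule() actually put the task to sleep in this state? */
bool model_schedule_sleeps(enum model_task_state state,
			   bool signal_pending, bool fatal_pending)
{
	if (state == MODEL_TASK_INTERRUPTIBLE)
		return !signal_pending;
	return !fatal_pending;
}

/*
 * One pass of the throttle loop from the patch: it exits only when the
 * watermark is OK or a fatal signal is pending. Returns true when the
 * pass neither exits nor sleeps, i.e. the loop spins on the CPU.
 */
bool model_throttle_loop_spins(enum model_task_state state,
			       bool signal_pending, bool fatal_pending,
			       bool watermark_ok)
{
	bool loop_exits = watermark_ok || fatal_pending;

	return !loop_exits &&
	       !model_schedule_sleeps(state, signal_pending, fatal_pending);
}
```

With only a non-fatal signal pending, model_throttle_loop_spins(MODEL_TASK_INTERRUPTIBLE, true, false, false) is true, while the same call with MODEL_TASK_KILLABLE is false, which is the case for TASK_KILLABLE / wait_event_killable() that Neil makes.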

^ permalink raw reply

* Re: [PATCH 13/13] mm: Account for the number of times direct reclaimers get throttled
From: Mel Gorman @ 2011-04-26 14:26 UTC (permalink / raw)
  To: NeilBrown; +Cc: Linux-MM, Linux-Netdev, LKML, David Miller, Peter Zijlstra
In-Reply-To: <20110426223510.4c6ab3cc@notabene.brown>

On Tue, Apr 26, 2011 at 10:35:10PM +1000, NeilBrown wrote:
> On Tue, 26 Apr 2011 08:36:54 +0100 Mel Gorman <mgorman@suse.de> wrote:
> 
> > Under significant pressure when writing back to network-backed storage,
> > direct reclaimers may get throttled. This is expected to be a
> > short-lived event, and the processes are woken up again, but they do
> > get stalled. This patch counts how many times such stalling occurs. It's
> > up to the administrator whether to reduce these stalls by increasing
> > min_free_kbytes.
> > 
> > Signed-off-by: Mel Gorman <mgorman@suse.de>
> > ---
> >  include/linux/vm_event_item.h |    1 +
> >  mm/vmscan.c                   |    1 +
> >  mm/vmstat.c                   |    1 +
> >  3 files changed, 3 insertions(+), 0 deletions(-)
> > 
> > diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
> > index 03b90cdc..652e5f3 100644
> > --- a/include/linux/vm_event_item.h
> > +++ b/include/linux/vm_event_item.h
> > @@ -29,6 +29,7 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
> >  		FOR_ALL_ZONES(PGSTEAL),
> >  		FOR_ALL_ZONES(PGSCAN_KSWAPD),
> >  		FOR_ALL_ZONES(PGSCAN_DIRECT),
> > +		PGSCAN_DIRECT_THROTTLE,
> >  #ifdef CONFIG_NUMA
> >  		PGSCAN_ZONE_RECLAIM_FAILED,
> >  #endif
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index 8b6da2b..e88138b 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -2154,6 +2154,7 @@ static void throttle_direct_reclaim(gfp_t gfp_mask, struct zonelist *zonelist,
> >  		goto out;
> >  
> >  	/* Throttle */
> > +	count_vm_event(PGSCAN_DIRECT_THROTTLE);
> >  	do {
> >  		schedule();
> >  		finish_wait(&zone->zone_pgdat->pfmemalloc_wait, &wait);
> > diff --git a/mm/vmstat.c b/mm/vmstat.c
> > index a2b7344..5725387 100644
> > --- a/mm/vmstat.c
> > +++ b/mm/vmstat.c
> > @@ -911,6 +911,7 @@ const char * const vmstat_text[] = {
> >  	TEXTS_FOR_ZONES("pgsteal")
> >  	TEXTS_FOR_ZONES("pgscan_kswapd")
> >  	TEXTS_FOR_ZONES("pgscan_direct")
> > +	"pgscan_direct_throttle",
> >  
> >  #ifdef CONFIG_NUMA
> >  	"zone_reclaim_failed",
> 
> I like this approach.  Make the information available, but don't make a fuss
> about it.
> 
> Actually, I like the whole series - I'm really having to dig deep to find
> anything to complain about :-)
> 
> Feel free to put
>    Reviewed-by: NeilBrown <neilb@suse.de>
> against anything that I haven't commented on.
> 

Thanks very much!

-- 
Mel Gorman
SUSE Labs
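Per the vmstat.c hunk, the new counter is reported as pgscan_direct_throttle in /proc/vmstat, which holds one "name value" pair per line. Below is a minimal sketch of pulling it out of that text. vmstat_lookup() is a hypothetical helper, and reading the file itself (for example with fopen()/fread()) is left out so the parsing can be exercised on a string.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Look up one counter in text formatted like /proc/vmstat ("name value"
 * per line). Returns true and stores the value if the exact name is found.
 */
bool vmstat_lookup(const char *text, const char *key, unsigned long *value)
{
	size_t keylen = strlen(key);
	const char *line = text;

	while (line && *line) {
		/* Match the whole name: the key must be followed by a space. */
		if (strncmp(line, key, keylen) == 0 && line[keylen] == ' ')
			return sscanf(line + keylen, "%lu", value) == 1;
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return false;
}
```

A rising value means direct reclaimers are being throttled; per the changelog, raising min_free_kbytes is the knob an administrator could try.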


^ permalink raw reply

* Re: [PATCH 2/3] bql: Byte queue limits
From: Eric Dumazet @ 2011-04-26 14:33 UTC (permalink / raw)
  To: Tom Herbert; +Cc: davem, netdev
In-Reply-To: <BANLkTi=xTTjcG7CVfCk8sWojRSZhST+e_Q@mail.gmail.com>

On Tuesday 26 April 2011 at 07:13 -0700, Tom Herbert wrote:

> > SMP ???), and why kobj is now guarded by CONFIG_XPS instead of
> > CONFIG_RPS.
> >
> Because that kobj is for transmit side (currently only for xps_cpus)

Please provide a separate patch then; cleanups should be separated.




^ permalink raw reply

* Re: [PATCH 00/13] Swap-over-NBD without deadlocking
From: Mel Gorman @ 2011-04-26 14:46 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Linux-MM, Linux-Netdev, LKML, David Miller, Neil Brown
In-Reply-To: <1303827785.20212.266.camel@twins>

On Tue, Apr 26, 2011 at 04:23:05PM +0200, Peter Zijlstra wrote:
> On Tue, 2011-04-26 at 08:36 +0100, Mel Gorman wrote:
> > Comments?
> 
> Last time I brought up the whole swap over network bits I was pointed
> towards the generic skb recycling work:
> 
>   http://lwn.net/Articles/332037/
> 
> as a means to pre-allocate memory,

I'd taken note of this to take a much closer look if it turned
out reservations were necessary and to find out what happened with
these patches. So far, bigger reservations have *not* been required
but I agree recycling SKBs may be a better alternative than large
reservations or preallocations if they are necessary.

>  and it was suggested to simply pin
> the few route-cache entries required to route these packets and
> disallow swap packets from being fragmented (these last two avoid lots of
> funny allocation cases in the network stack).
> 

I did find that only a few route-cache entries should be required. In
the original patches I worked with, there was a reservation for the
maximum possible number of route-cache entries. I thought this was
overkill and instead reserved 1-per-active-swapfile-backed-by-NFS.

-- 
Mel Gorman
SUSE Labs


^ permalink raw reply

* Re: [PATCH 00/13] Swap-over-NBD without deadlocking
From: Peter Zijlstra @ 2011-04-26 14:50 UTC (permalink / raw)
  To: Mel Gorman; +Cc: Linux-MM, Linux-Netdev, LKML, David Miller, Neil Brown
In-Reply-To: <20110426144635.GK4658@suse.de>

On Tue, 2011-04-26 at 15:46 +0100, Mel Gorman wrote:
> 
> I did find that only a few route-cache entries should be required. In
> the original patches I worked with, there was a reservation for the
> maximum possible number of route-cache entries. I thought this was
> overkill and instead reserved 1-per-active-swapfile-backed-by-NFS.

Right, so the thing I was worried about was a route-cache poison attack
where someone would spam the machine such that it would create a lot of
route cache entries and might flush the one we needed just as we needed
it.

Pinning the one entry we need would solve that (if possible).


^ permalink raw reply

* Re: [PATCH] dsa/mv88e6131: fix unknown multicast/broadcast forwarding on mv88e6085
From: Lennert Buytenhek @ 2011-04-26 14:57 UTC (permalink / raw)
  To: Peter Korsgaard; +Cc: davem, netdev
In-Reply-To: <1303818341-13839-1-git-send-email-jacmet@sunsite.dk>

On Tue, Apr 26, 2011 at 01:45:41PM +0200, Peter Korsgaard wrote:

> The 88e6085 has a few differences from the other devices in the port
> control registers, causing unknown multicast/broadcast packets to get
> dropped when using the standard port setup.
> 
> At the same time update kconfig to clarify that the mv88e6085 is now
> supported.
> 
> Signed-off-by: Peter Korsgaard <jacmet@sunsite.dk>

Assuming that you've tested this.. :)

Acked-by: Lennert Buytenhek <buytenh@wantstofly.org>

^ permalink raw reply

* Re: [PATCH] igb: restore EEPROM 16kB access limit
From: Andy Gospodarek @ 2011-04-26 15:06 UTC (permalink / raw)
  To: Wyborny, Carolyn
  Cc: e1000-devel@lists.sourceforge.net, netdev@vger.kernel.org,
	Stefan Assmann, Ronciak, John
In-Reply-To: <EDC0E76513226749BFBC9C3FB031318F0137E85E47@orsmsx508.amr.corp.intel.com>

On Fri, Apr 08, 2011 at 01:10:30PM -0700, Wyborny, Carolyn wrote:
[...]
> 
> Yes, there's more code changed than just the removal of what you're trying to add back.  The snip is the replacement but those functions need to exist as well.  I believe that the commit referenced did not completely apply and you're missing some critical code.  The problem you are seeing should not occur with the full patch.
> 
> The version of e1000_82575.c in 2.6.39-rc2 has all the changes needed for this to work correctly.
> 

I'm still seeing failures with today's net-next-2.6 ('git describe'
shows v2.6.39-rc1-1283-g64cad2a), so it would be really nice to get this
fixed.  I would rather not have to carry a patch like the one Stefan
posted or one like this crazy one I hacked up to try all sizes until
valid NVRAM is found.

It applies cleanly to net-next-2.6, net-2.6, and linux-2.6, as all exhibit
the exact same problem.

diff --git a/drivers/net/igb/e1000_82575.c b/drivers/net/igb/e1000_82575.c
index 0cd41c4..f8677f2 100644
--- a/drivers/net/igb/e1000_82575.c
+++ b/drivers/net/igb/e1000_82575.c
@@ -243,7 +243,7 @@ static s32 igb_get_invariants_82575(struct e1000_hw *hw)
 	 * for setting word_size.
 	 */
 	size += NVM_WORD_SIZE_BASE_SHIFT;
-
+err_eeprom:
 	nvm->word_size = 1 << size;
 	if (nvm->word_size == (1 << 15))
 		nvm->page_size = 128;
@@ -271,6 +271,17 @@ static s32 igb_get_invariants_82575(struct e1000_hw *hw)
 	}
 	nvm->ops.write = igb_write_nvm_spi;
 
+	/* make sure the NVM is good */
+	if (hw->nvm.ops.validate(hw) < 0) {
+		if (size > 14) {
+			size--;
+			printk(KERN_ERR "igb: The NVM size is not valid, trying %d\n", 1<<size);
+			goto err_eeprom;
+		}
+		printk(KERN_ERR "The NVM Checksum Is Not Valid\n");
+		return -E1000_ERR_MAC_INIT;
+	}
+
 	/* if part supports SR-IOV then initialize mailbox parameters */
 	switch (mac->type) {
 	case e1000_82576:
diff --git a/drivers/net/igb/igb_main.c b/drivers/net/igb/igb_main.c
index cdfd572..8e23ca2 100644
--- a/drivers/net/igb/igb_main.c
+++ b/drivers/net/igb/igb_main.c
@@ -1940,13 +1940,6 @@ static int __devinit igb_probe(struct pci_dev *pdev,
 	 * known good starting state */
 	hw->mac.ops.reset_hw(hw);
 
-	/* make sure the NVM is good */
-	if (hw->nvm.ops.validate(hw) < 0) {
-		dev_err(&pdev->dev, "The NVM Checksum Is Not Valid\n");
-		err = -EIO;
-		goto err_eeprom;
-	}
-
 	/* copy the MAC address out of the NVM */
 	if (hw->mac.ops.read_mac_addr(hw))
 		dev_err(&pdev->dev, "NVM Read Error\n");
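The control flow of the hack above (retry validation with a smaller size until it passes, giving up once the attempt with a shift of 14 fails) can be modelled outside the driver as below. probe_nvm_size_shift(), the callback type and the stub validator are all hypothetical; in the patch the check is hw->nvm.ops.validate(hw) and the retry is a backwards goto to err_eeprom.

```c
#include <stdbool.h>

/* Hypothetical stand-in for hw->nvm.ops.validate(): true if the checksum
 * verifies when the NVM is assumed to be 1 << size_shift words. */
typedef bool (*nvm_validate_fn)(unsigned int size_shift, void *ctx);

/* Hypothetical stub for illustration: only one size validates. */
bool stub_validate_exact(unsigned int size_shift, void *ctx)
{
	return size_shift == *(const unsigned int *)ctx;
}

/*
 * Same control flow as the hack: try the reported size first and, while
 * validation fails, step down by one, giving up once a shift of 14 has
 * also failed. Returns the accepted shift, or -1 if none validated.
 */
int probe_nvm_size_shift(unsigned int reported_shift,
			 nvm_validate_fn validate, void *ctx)
{
	unsigned int shift = reported_shift;

	for (;;) {
		if (validate(shift, ctx))
			return (int)shift;
		if (shift <= 14)
			return -1;
		shift--;
	}
}
```

Unlike a fixed clamp, this relies on the checksum to find the size, so a part whose checksum is genuinely bad still fails, just after several extra validation passes.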



_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel&#174; Ethernet, visit http://communities.intel.com/community/wired

^ permalink raw reply related

* Re: [PATCH] igb: restore EEPROM 16kB access limit
From: Stefan Assmann @ 2011-04-26 15:07 UTC (permalink / raw)
  To: Wyborny, Carolyn
  Cc: netdev@vger.kernel.org, e1000-devel@lists.sourceforge.net,
	Kirsher, Jeffrey T, Pieper, Jeffrey E, Ronciak, John,
	Andy Gospodarek
In-Reply-To: <EDC0E76513226749BFBC9C3FB031318F0137E85E47@orsmsx508.amr.corp.intel.com>

New patch against net-next-2.6.

  Stefan

>From 67ce9e09e10f05054a26560306aa484ae3acc03f Mon Sep 17 00:00:00 2001
From: Stefan Assmann <sassmann@kpanic.de>
Date: Mon, 18 Apr 2011 15:22:19 +0200
Subject: [PATCH] igb: default to 32kB for EEPROMs reporting invalid size

The check that gracefully handled invalid EEPROM sizes was removed by
commit 4322e561a93ec7ee034b603a6c610e7be90d4e8a. Without this check the
EEPROM validation fails if the size is invalid and the NIC is not usable
by the OS. Observed with an 8086:10c9 NIC.

igb 0000:03:00.0: 0 vfs allocated
igb 0000:03:00.0: The NVM Checksum Is Not Valid
ACPI: PCI interrupt for device 0000:03:00.0 disabled
igb: probe of 0000:03:00.0 failed with error -5

Re-introducing the check with an additional dev_err() to report the problem.

Signed-off-by: Stefan Assmann <sassmann@kpanic.de>
---
 drivers/net/igb/e1000_82575.c |   10 ++++++++++
 1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/drivers/net/igb/e1000_82575.c b/drivers/net/igb/e1000_82575.c
index 0cd41c4..f3bdf29 100644
--- a/drivers/net/igb/e1000_82575.c
+++ b/drivers/net/igb/e1000_82575.c
@@ -31,9 +31,11 @@

 #include <linux/types.h>
 #include <linux/if_ether.h>
+#include <linux/pci.h>

 #include "e1000_mac.h"
 #include "e1000_82575.h"
+#include "igb.h"

 static s32  igb_get_invariants_82575(struct e1000_hw *);
 static s32  igb_acquire_phy_82575(struct e1000_hw *);
@@ -113,6 +115,7 @@ static s32 igb_get_invariants_82575(struct e1000_hw *hw)
 	struct e1000_nvm_info *nvm = &hw->nvm;
 	struct e1000_mac_info *mac = &hw->mac;
 	struct e1000_dev_spec_82575 * dev_spec = &hw->dev_spec._82575;
+	struct igb_adapter *adapter = hw->back;
 	u32 eecd;
 	s32 ret_val;
 	u16 size;
@@ -244,6 +247,13 @@ static s32 igb_get_invariants_82575(struct e1000_hw *hw)
 	 */
 	size += NVM_WORD_SIZE_BASE_SHIFT;

+	/* gracefully handle NICs reporting an invalid EEPROM size */
+	if (size > 15) {
+		dev_err(&adapter->pdev->dev,
+		        "NVM size is not valid, defaulting to 32kB\n");
+		size = 15;
+	}
+
 	nvm->word_size = 1 << size;
 	if (nvm->word_size == (1 << 15))
 		nvm->page_size = 128;
-- 
1.7.4
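Stefan's patch takes the simpler route of clamping the size shift instead of probing. A sketch of just that arithmetic follows; the function name is hypothetical and fprintf() stands in for dev_err(). As in the driver, the result corresponds to nvm->word_size, i.e. 1 << size.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Clamp an out-of-range size shift to 15 (reporting it, as the patch does
 * with dev_err()) and return the resulting word count, 1 << size.
 */
uint32_t nvm_word_size_from_shift(uint16_t size_shift)
{
	if (size_shift > 15) {
		fprintf(stderr, "NVM size is not valid, defaulting to 32kB\n");
		size_shift = 15;
	}
	return (uint32_t)1 << size_shift;
}
```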


^ permalink raw reply related

* RE: [PATCH] igb: restore EEPROM 16kB access limit
From: Wyborny, Carolyn @ 2011-04-26 15:10 UTC (permalink / raw)
  To: Stefan Assmann
  Cc: netdev@vger.kernel.org, e1000-devel@lists.sourceforge.net,
	Kirsher, Jeffrey T, Pieper, Jeffrey E, Ronciak, John,
	Andy Gospodarek
In-Reply-To: <4DB6DF9B.90706@kpanic.de>



>-----Original Message-----
>From: Stefan Assmann [mailto:sassmann@kpanic.de]
>Sent: Tuesday, April 26, 2011 8:07 AM
>To: Wyborny, Carolyn
>Cc: netdev@vger.kernel.org; e1000-devel@lists.sourceforge.net; Kirsher,
>Jeffrey T; Pieper, Jeffrey E; Ronciak, John; Andy Gospodarek
>Subject: Re: [PATCH] igb: restore EEPROM 16kB access limit
>
>New patch against net-next-2.6.
>
>  Stefan
>
>From 67ce9e09e10f05054a26560306aa484ae3acc03f Mon Sep 17 00:00:00 2001
>From: Stefan Assmann <sassmann@kpanic.de>
>Date: Mon, 18 Apr 2011 15:22:19 +0200
>Subject: [PATCH] igb: default to 32kB for EEPROMs reporting invalid size
>
>The check that gracefully handled invalid EEPROM sizes was removed by
>commit 4322e561a93ec7ee034b603a6c610e7be90d4e8a. Without this check the
>EEPROM validation fails if the size is invalid and the NIC is not usable
>by the OS. Observed with an 8086:10c9 NIC.
>
>igb 0000:03:00.0: 0 vfs allocated
>igb 0000:03:00.0: The NVM Checksum Is Not Valid
>ACPI: PCI interrupt for device 0000:03:00.0 disabled
>igb: probe of 0000:03:00.0 failed with error -5
>
>Re-introducing the check with an additional dev_err() to report the
>problem.
>
>Signed-off-by: Stefan Assmann <sassmann@kpanic.de>
>---
> drivers/net/igb/e1000_82575.c |   10 ++++++++++
> 1 files changed, 10 insertions(+), 0 deletions(-)
>
>diff --git a/drivers/net/igb/e1000_82575.c b/drivers/net/igb/e1000_82575.c
>index 0cd41c4..f3bdf29 100644
>--- a/drivers/net/igb/e1000_82575.c
>+++ b/drivers/net/igb/e1000_82575.c
>@@ -31,9 +31,11 @@
>
> #include <linux/types.h>
> #include <linux/if_ether.h>
>+#include <linux/pci.h>
>
> #include "e1000_mac.h"
> #include "e1000_82575.h"
>+#include "igb.h"
>
> static s32  igb_get_invariants_82575(struct e1000_hw *);
> static s32  igb_acquire_phy_82575(struct e1000_hw *);
>@@ -113,6 +115,7 @@ static s32 igb_get_invariants_82575(struct e1000_hw *hw)
> 	struct e1000_nvm_info *nvm = &hw->nvm;
> 	struct e1000_mac_info *mac = &hw->mac;
> 	struct e1000_dev_spec_82575 * dev_spec = &hw->dev_spec._82575;
>+	struct igb_adapter *adapter = hw->back;
> 	u32 eecd;
> 	s32 ret_val;
> 	u16 size;
>@@ -244,6 +247,13 @@ static s32 igb_get_invariants_82575(struct e1000_hw *hw)
> 	 */
> 	size += NVM_WORD_SIZE_BASE_SHIFT;
>
>+	/* gracefully handle NICs reporting an invalid EEPROM size */
>+	if (size > 15) {
>+		dev_err(&adapter->pdev->dev,
>+		        "NVM size is not valid, defaulting to 32kB\n");
>+		size = 15;
>+	}
>+
> 	nvm->word_size = 1 << size;
> 	if (nvm->word_size == (1 << 15))
> 		nvm->page_size = 128;
>--
>1.7.4

I have already submitted a patch to fix this in our internal process, but our maintainer has been ill.  It should be out shortly.  I thought you agreed with Alex to let us submit the patch, as this code is in our Shared Code and we need to retain the copyright.

I will get with Jeff to get it out as soon as possible.

Thanks,

Carolyn

Carolyn Wyborny
Linux Development
LAN Access Division
Intel Corporation




^ permalink raw reply

* RE: [PATCH] igb: restore EEPROM 16kB access limit
From: Wyborny, Carolyn @ 2011-04-26 15:12 UTC (permalink / raw)
  To: Andy Gospodarek
  Cc: Stefan Assmann, netdev@vger.kernel.org,
	e1000-devel@lists.sourceforge.net, Kirsher, Jeffrey T,
	Pieper, Jeffrey E, Ronciak, John
In-Reply-To: <20110426150659.GA21309@gospo.rdu.redhat.com>



>-----Original Message-----
>From: Andy Gospodarek [mailto:andy@greyhouse.net]
>Sent: Tuesday, April 26, 2011 8:07 AM
>To: Wyborny, Carolyn
>Cc: Stefan Assmann; netdev@vger.kernel.org;
>e1000-devel@lists.sourceforge.net; Kirsher, Jeffrey T; Pieper, Jeffrey E;
>Ronciak, John
>Subject: Re: [PATCH] igb: restore EEPROM 16kB access limit
>
>On Fri, Apr 08, 2011 at 01:10:30PM -0700, Wyborny, Carolyn wrote:
>[...]
>>
>> Yes, there's more code changed than just the removal of what you're
>> trying to add back.  The snip is the replacement but those functions need
>> to exist as well.  I believe that the commit referenced did not
>> completely apply and you're missing some critical code.  The problem you
>> are seeing should not occur with the full patch.
>>
>> The version of e1000_82575.c in 2.6.39-rc2 has all the changes needed
>> for this to work correctly.
>>
>
>I'm still seeing failures with today's net-next-2.6 ('git describe'
>shows v2.6.39-rc1-1283-g64cad2a), so it would be really nice to get this
>fixed.  I would rather not have to carry a patch like the one Stefan
>posted or one like this crazy one I hacked up to try all sizes until
>valid NVRAM is found.
>
>It applies cleanly to net-next-2.6, net-2.6, and linux-2.6, as all exhibit
>the exact same problem.
>
>diff --git a/drivers/net/igb/e1000_82575.c b/drivers/net/igb/e1000_82575.c
>index 0cd41c4..f8677f2 100644
>--- a/drivers/net/igb/e1000_82575.c
>+++ b/drivers/net/igb/e1000_82575.c
>@@ -243,7 +243,7 @@ static s32 igb_get_invariants_82575(struct e1000_hw *hw)
> 	 * for setting word_size.
> 	 */
> 	size += NVM_WORD_SIZE_BASE_SHIFT;
>-
>+err_eeprom:
> 	nvm->word_size = 1 << size;
> 	if (nvm->word_size == (1 << 15))
> 		nvm->page_size = 128;
>@@ -271,6 +271,17 @@ static s32 igb_get_invariants_82575(struct e1000_hw *hw)
> 	}
> 	nvm->ops.write = igb_write_nvm_spi;
>
>+        /* make sure the NVM is good */
>+        if (hw->nvm.ops.validate(hw) < 0) {
>+		if (size > 14)  {
>+			size--;
>+			printk(KERN_ERR "igb: The NVM size is not valid, trying %d\n", 1<<size);
>+			goto err_eeprom;
>+		}
>+		printk(KERN_ERR "The NVM Checksum Is Not Valid\n");
>+		return -E1000_ERR_MAC_INIT;
>+        }
>+
> 	/* if part supports SR-IOV then initialize mailbox parameters */
> 	switch (mac->type) {
> 	case e1000_82576:
>diff --git a/drivers/net/igb/igb_main.c b/drivers/net/igb/igb_main.c
>index cdfd572..8e23ca2 100644
>--- a/drivers/net/igb/igb_main.c
>+++ b/drivers/net/igb/igb_main.c
>@@ -1940,13 +1940,6 @@ static int __devinit igb_probe(struct pci_dev *pdev,
> 	 * known good starting state */
> 	hw->mac.ops.reset_hw(hw);
>
>-	/* make sure the NVM is good */
>-	if (hw->nvm.ops.validate(hw) < 0) {
>-		dev_err(&pdev->dev, "The NVM Checksum Is Not Valid\n");
>-		err = -EIO;
>-		goto err_eeprom;
>-	}
>-
> 	/* copy the MAC address out of the NVM */
> 	if (hw->mac.ops.read_mac_addr(hw))
> 		dev_err(&pdev->dev, "NVM Read Error\n");
>

Part of the problem you are seeing is an apparently widespread EEPROM problem where the size word in the EEPROM is invalid.  Since we didn't really check it before, it didn't cause a problem.  I have a patch coming that addresses this by messaging the user that the size is invalid, setting it to a default, and continuing.

Thanks,

Carolyn

Carolyn Wyborny
Linux Development
LAN Access Division
Intel Corporation




^ permalink raw reply

* Re: how to set vlan filter for intel 82599
From: Alexander Duyck @ 2011-04-26 15:22 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: zhou rui, netdev@vger.kernel.org,
	e1000-devel@lists.sourceforge.net
In-Reply-To: <1303824705.3032.359.camel@localhost>

On 4/26/2011 6:31 AM, Ben Hutchings wrote:
> On Tue, 2011-04-26 at 12:39 +0800, zhou rui wrote:
> [...]
>> I set the filter like below:
>>
>> For a vlanid=50, it always matches the last rule (action 7)
>>
>> ./ethtool -K eth5 ntuple off
>> ./ethtool -K eth5 ntuple on
>> ./ethtool -U eth5 flow-type tcp4 vlan 32 vlan-mask 0xF01F action 1
>> ./ethtool -U eth5 flow-type udp4 vlan 32 vlan-mask 0xF01F action 1
>> ./ethtool -U eth5 flow-type udp4 vlan 64 vlan-mask 0xF01F action 7
>> ./ethtool -U eth5 flow-type tcp4 vlan 64 vlan-mask 0xF01F action 7
>>
>> I tried the latest ixgbe driver 3.3.9, and it reports:
>>
>> Cannot add new RX n-tuple filter: Operation not permitted
>>
>> ./ethtool -V
>> ethtool version 2.6.36
>
> Check dmesg; there should be an error message there.  Of course the
> error code should be EINVAL and not EPERM.
>
> Ben.
>

The problem is likely the vlan-mask.  The only valid VLAN masks
supported are 0xFFFF, 0x0FFF, 0xF000, and 0x0000.  The hardware cannot
partially mask either the priority or the VLAN ID portion of the TCI.

Thanks,

Alex
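Alex's rule can be checked mechanically. The 802.1Q TCI carries the priority in bits 15-13, the CFI/DEI bit in bit 12 and the VLAN ID in bits 11-0, so the four accepted masks are exactly those that leave each of the 0xF000 and 0x0FFF fields either fully set or fully clear. vlan_mask_supported() is a hypothetical helper written for this thread, not ixgbe code.

```c
#include <stdbool.h>
#include <stdint.h>

/* 802.1Q TCI fields: priority + CFI/DEI nibble, and the 12-bit VLAN ID. */
#define TCI_PRIO_CFI_MASK 0xF000u
#define TCI_VID_MASK      0x0FFFu

/*
 * Accept only 0xFFFF, 0x0FFF, 0xF000 and 0x0000: each field must be
 * either fully covered by the mask or not covered at all.
 */
bool vlan_mask_supported(uint16_t mask)
{
	uint16_t prio = mask & TCI_PRIO_CFI_MASK;
	uint16_t vid = mask & TCI_VID_MASK;

	return (prio == 0 || prio == TCI_PRIO_CFI_MASK) &&
	       (vid == 0 || vid == TCI_VID_MASK);
}
```

The mask 0xF01F from the report fails because only five of the twelve VLAN ID bits are set, and grouping IDs 0-31, 32-63, ... onto queues would likewise need a partial VLAN ID mask (0x0FE0 or its complement). Since the set of four masks is closed under bitwise complement, the check gives the same answer whichever polarity (1 = compare or 1 = ignore) the interface uses.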

^ permalink raw reply

* Re: how to set vlan filter for intel 82599
From: zhou rui @ 2011-04-26 15:30 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Ben Hutchings, netdev@vger.kernel.org,
	e1000-devel@lists.sourceforge.net
In-Reply-To: <4DB6E34F.5000203@intel.com>

On Tuesday, April 26, 2011, Alexander Duyck <alexander.h.duyck@intel.com> wrote:
> On 4/26/2011 6:31 AM, Ben Hutchings wrote:
>> On Tue, 2011-04-26 at 12:39 +0800, zhou rui wrote:
>> [...]
>>> I set the filter like below:
>>>
>>> For a vlanid=50, it always matches the last rule (action 7)
>>>
>>> ./ethtool -K eth5 ntuple off
>>> ./ethtool -K eth5 ntuple on
>>> ./ethtool -U eth5 flow-type tcp4 vlan 32 vlan-mask 0xF01F action 1
>>> ./ethtool -U eth5 flow-type udp4 vlan 32 vlan-mask 0xF01F action 1
>>> ./ethtool -U eth5 flow-type udp4 vlan 64 vlan-mask 0xF01F action 7
>>> ./ethtool -U eth5 flow-type tcp4 vlan 64 vlan-mask 0xF01F action 7
>>>
>>> I tried the latest ixgbe driver 3.3.9, and it reports:
>>>
>>> Cannot add new RX n-tuple filter: Operation not permitted
>>>
>>> ./ethtool -V
>>> ethtool version 2.6.36
>>
>> Check dmesg; there should be an error message there.  Of course the
>> error code should be EINVAL and not EPERM.
>>
>> Ben.
>
> The problem is likely the vlan-mask.  The only valid VLAN masks
> supported are 0xFFFF, 0x0FFF, 0xF000, and 0x0000.  The hardware cannot
> partially mask either the priority or the VLAN ID portion of the TCI.
>
> Thanks,
>
> Alex
>
Yes, I just checked the log; it said a partial VLAN ID is not supported.
Does that mean I can't set this filter (VLAN IDs 0-31 go to Q0,
32-63 to Q1, ...)?

^ permalink raw reply

* Re: [PATCH] netfilter/IPv6: initialize TOS field in REJECT target module
From: Pablo Neira Ayuso @ 2011-04-26 15:34 UTC (permalink / raw)
  To: Fernando Luis Vazquez Cao
  Cc: David Miller, eric.dumazet, netfilter-devel, netdev, yoshfuji,
	jengelh, Patrick McHardy
In-Reply-To: <1303795522.6010.5.camel@nausicaa>

On 26/04/11 07:25, Fernando Luis Vazquez Cao wrote:
> On Mon, 2011-04-25 at 22:17 -0700, David Miller wrote:
>> From: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
>> Date: Tue, 26 Apr 2011 10:26:20 +0900
>>
>>> On Tue, 2011-04-26 at 03:13 +0200, Pablo Neira Ayuso wrote:
>>>> On 22/04/11 10:37, Eric Dumazet wrote:
>>>>> On Friday 22 April 2011 at 17:11 +0900, Fernando Luis Vazquez Cao
>>>>> wrote:
>>>>>
>>>>>> Thank you!
>>>>>>
>>>>>> Should we send these two patches to -stable too?
>>>>>
>>>>> David takes care of stable submissions for netdev stuff, thanks.
>>>>
>>>> If the patch follows the netfilter path, we'll take care of sending
>>>> stable submissions.
>>>
>>> David, will you take care of these two patches or should they go through
>>> the netfilter tree?
>>
>> Netfilter, as usual.
> 
> Thank you for the clarification. I really appreciate it.
> 
> Pablo, could you pull in the two patches below? They have already been
> acked by Eric. It would be great it we could get them merged for the
> next -rc and stable releases.
> 
> [PATCH] netfilter/IPv6: fix DSCP mangle code
> [PATCH] netfilter/IPv6: initialize TOS field in REJECT target module

Patrick is the primary link to take patches, I'm including him in this
CC. If he experiences any problem, I'll make sure that these hit -rc, so
never mind.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH] netfilter/IPv6: initialize TOS field in REJECT target module
From: Pablo Neira Ayuso @ 2011-04-26 15:35 UTC (permalink / raw)
  To: Fernando Luis Vazquez Cao
  Cc: David Miller, eric.dumazet, netfilter-devel, netdev, yoshfuji,
	jengelh, Patrick McHardy
In-Reply-To: <4DB6E607.6000303@netfilter.org>

On 26/04/11 17:34, Pablo Neira Ayuso wrote:
> On 26/04/11 07:25, Fernando Luis Vazquez Cao wrote:
>> On Mon, 2011-04-25 at 22:17 -0700, David Miller wrote:
>>> From: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
>>> Date: Tue, 26 Apr 2011 10:26:20 +0900
>>>
>>>> On Tue, 2011-04-26 at 03:13 +0200, Pablo Neira Ayuso wrote:
>>>>> On 22/04/11 10:37, Eric Dumazet wrote:
>>>>>> On Friday 22 April 2011 at 17:11 +0900, Fernando Luis Vazquez Cao
>>>>>> wrote:
>>>>>>
>>>>>>> Thank you!
>>>>>>>
>>>>>>> Should we send these two patches to -stable too?
>>>>>>
>>>>>> David takes care of stable submissions for netdev stuff, thanks.
>>>>>
>>>>> If the patch follows the netfilter path, we'll take care of sending
>>>>> stable submissions.
>>>>
>>>> David, will you take care of these two patches or should they go through
>>>> the netfilter tree?
>>>
>>> Netfilter, as usual.
>>
>> Thank you for the clarification. I really appreciate it.
>>
>> Pablo, could you pull in the two patches below? They have already been
>> acked by Eric. It would be great it we could get them merged for the
>> next -rc and stable releases.
>>
>> [PATCH] netfilter/IPv6: fix DSCP mangle code
>> [PATCH] netfilter/IPv6: initialize TOS field in REJECT target module
> 
> Patrick is the primary link to take patches, I'm including him in this
> CC. If he experiences any problem, I'll make sure that these hit -rc, so
> never mind.
  ^^^^^^^^^^

Sorry, I meant to say, "don't worry" :-)


^ permalink raw reply

* Re: CDC NCM missing Zero Length Packets
From: Alexey ORISHKO @ 2011-04-26 15:59 UTC (permalink / raw)
  To: Matthias Benesch; +Cc: linux-usb@vger.kernel.org, netdev@vger.kernel.org

On Fri, Apr 22, 2011 at 4:32 PM, Matthias Benesch <twoofseven@freenet.de> wrote:
> Hello,
>
> I am testing the USB-CDC NCM host driver. One issue, found by a
> USB device stack provider, is that the current implementation does not send
> a ZLP or short packet if the dwNtbOutMaxSize reported by the device is
> greater than CDC_NCM_NTB_MAX_SIZE_TX (=16383 bytes).
>

I'd like to point out that NCM is not Mass Storage, and increasing the max
buffer size in the device will ultimately lead to reduced throughput. By
increasing the max buffer size, you are also increasing overall latency, which
will hit streaming and other services hard.

The NCM specification was written on the assumption that the device is less
capable than the host. The issue you mentioned is not addressed properly in the
specification, but might be fixed in the next version of the NCM spec.

The 16K buffer size in the current driver was chosen, based on testing, as a
tradeoff between latency and throughput, and should be suitable for LTE
speeds. Devices using 21 Mbit HSPA radios could function happily with 8K buffers.

By using a bigger buffer in the device, your USB device stack provider forces
the host driver to constantly send short packets, which significantly reduces
the benefits provided by the DMA controller. Under maximum data traffic, your
device's interrupt load would be twice as high as when the device buffer
size is equal to or less than the host driver buffer size.

In conclusion:
- yes, the current driver version has a bug (no short packet sent) and I will
submit a patch for it after testing.
- The patch provided a few days ago is not a proper solution and can be considered
a workaround only.
- I also suggest you talk to your USB device stack vendor, because your product
might work much better if the issues mentioned above are taken into consideration.
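The missing-short-packet condition can be sketched as follows. A USB bulk transfer ends either when a packet shorter than wMaxPacketSize arrives or when the expected length has been received, so an NTB that is shorter than dwNtbOutMaxSize but an exact multiple of wMaxPacketSize leaves the device waiting for more data. This is a simplified model, not the cdc_ncm driver's code, and the helper name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: does the host need to terminate this bulk-OUT NTB
 * explicitly, with a zero-length packet or one byte of padding that forces
 * a short packet? */
bool ncm_needs_transfer_terminator(size_t ntb_len, size_t max_packet,
				   size_t ntb_out_max)
{
	assert(max_packet > 0);
	if (ntb_len >= ntb_out_max)
		return false;	/* full-size NTB: the device knows it is complete */
	/* Only full-size packets were sent, so the device keeps waiting. */
	return ntb_len % max_packet == 0;
}
```

For a high-speed bulk endpoint (wMaxPacketSize of 512) and a device advertising a dwNtbOutMaxSize larger than the host's 16K buffer, a 16384-byte NTB needs a terminator while a 16000-byte one does not.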

Regards,
Alexey

^ permalink raw reply

* RE: [RFC PATCH] netlink: Increase netlink dump skb message size
From: Rose, Gregory V @ 2011-04-26 16:02 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: netdev@vger.kernel.org, bhutchings@solarflare.com,
	davem@davemloft.net
In-Reply-To: <1303799597.2747.214.camel@edumazet-laptop>

> -----Original Message-----
> From: Eric Dumazet [mailto:eric.dumazet@gmail.com]
> Sent: Monday, April 25, 2011 11:33 PM
> To: Rose, Gregory V
> Cc: netdev@vger.kernel.org; bhutchings@solarflare.com; davem@davemloft.net
> Subject: Re: [RFC PATCH] netlink: Increase netlink dump skb message size
> 
> On Monday, April 25, 2011 at 15:01 -0700, Greg Rose wrote:
> > The message size allocated for rtnl info dumps was limited to a single page.
> > This is not enough for additional interface info available with devices
> > that support SR-IOV.  Check that the amount of data allocated is sufficient
> > for the amount of data requested.
> >
> > Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
> > ---
> >
> >  include/linux/rtnetlink.h |    1 +
> >  net/core/rtnetlink.c      |    6 ++++++
> >  net/netlink/af_netlink.c  |   37 +++++++++++++++++++++++++++++++------
> >  3 files changed, 38 insertions(+), 6 deletions(-)
> >
> 
> Hmm, that's a hack, because netlink_dump() is generic and you add
> something very specific.
> 
> I prefer something that allows one dump() to reallocate a bigger skb.
> 
> Maybe change the ->dump() prototype to take struct sk_buff **pskb instead
> of struct sk_buff *skb.

I'll have a look at that, although Dave's not too happy with this whole mess, and I very much agree with him.

- Greg
> 
> diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
> index c8f35b5..7fa6735 100644
> --- a/net/netlink/af_netlink.c
> +++ b/net/netlink/af_netlink.c
> @@ -1681,7 +1681,7 @@ static int netlink_dump(struct sock *sk)
>  		goto errout_skb;
>  	}
> 
> -	len = cb->dump(skb, cb);
> +	len = cb->dump(&skb, cb);
> 
>  	if (len > 0) {
>  		mutex_unlock(nlk->cb_mutex);
> 
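The reason for Eric's `struct sk_buff **pskb` suggestion can be shown with a userspace analogy: a callback that receives a pointer to the caller's buffer pointer can replace that buffer with a larger one, which a callback taking a plain pointer cannot. The `struct buf` type and `dump_record()` are hypothetical stand-ins; growing a real skb would not use realloc().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an skb: a growable byte buffer. */
struct buf {
	size_t len;		/* bytes used */
	size_t cap;		/* bytes allocated in data[] */
	unsigned char data[];
};

/* Hypothetical dump callback. Because it gets `struct buf **`, it can swap
 * in a bigger buffer when a record does not fit. Returns the bytes written,
 * or -1 on allocation failure (the old buffer stays valid in *pbuf). */
int dump_record(struct buf **pbuf, const void *rec, size_t rec_len)
{
	struct buf *b = *pbuf;

	assert(b != NULL);
	if (b->cap - b->len < rec_len) {
		size_t new_cap = b->len + rec_len;
		struct buf *nb = realloc(b, sizeof(*nb) + new_cap);

		if (!nb)
			return -1;
		nb->cap = new_cap;
		*pbuf = b = nb;	/* caller now sees the larger buffer */
	}
	memcpy(b->data + b->len, rec, rec_len);
	b->len += rec_len;
	return (int)rec_len;
}
```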


^ permalink raw reply

* Re: linux-next ftmac100 driver
From: Adam Jaremko @ 2011-04-26 16:08 UTC (permalink / raw)
  To: Ratbert Po-Yu Chuang(莊博宇); +Cc: ratbert.chuang, netdev
In-Reply-To: <D557AF68929979498EDCFCBB34FFD4E82C62BC@ftcpcs78.faraday.com.tw>

> You can try to use udelay() to replace usleep_range(). I will also check this problem when I am free later.
> If you solved this problem before me, please send a message (and keep the CC list).
> It will be even better if you could submit a patch. :-)

That is what I ended up doing and am able to continue with my testing.

^ permalink raw reply

* RE: [RFC PATCH] netlink: Increase netlink dump skb message size
From: Rose, Gregory V @ 2011-04-26 16:12 UTC (permalink / raw)
  To: David Miller, eric.dumazet@gmail.com
  Cc: netdev@vger.kernel.org, bhutchings@solarflare.com
In-Reply-To: <20110425.235600.104065244.davem@davemloft.net>

> -----Original Message-----
> From: David Miller [mailto:davem@davemloft.net]
> Sent: Monday, April 25, 2011 11:56 PM
> To: eric.dumazet@gmail.com
> Cc: Rose, Gregory V; netdev@vger.kernel.org; bhutchings@solarflare.com
> Subject: Re: [RFC PATCH] netlink: Increase netlink dump skb message size
> 
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Tue, 26 Apr 2011 08:33:17 +0200
> 
> > On Monday, April 25, 2011 at 15:01 -0700, Greg Rose wrote:
> >> The message size allocated for rtnl info dumps was limited to a single page.
> >> This is not enough for additional interface info available with devices
> >> that support SR-IOV.  Check that the amount of data allocated is sufficient
> >> for the amount of data requested.
> >>
> >> Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
> >> ---
> >>
> >>  include/linux/rtnetlink.h |    1 +
> >>  net/core/rtnetlink.c      |    6 ++++++
> >>  net/netlink/af_netlink.c  |   37 +++++++++++++++++++++++++++++++------
> >>  3 files changed, 38 insertions(+), 6 deletions(-)
> >>
> >
> > Hmm, that's a hack, because netlink_dump() is generic and you add
> > something very specific.
> 
> You also can't do this without breaking applications.
> 
> We've trained every single netlink library out there about this message size
> formula, so they know that if they allocate at least 8192 bytes for a recvmsg()
> call they can fully receive any single netlink message.

I checked the message sizes being generated and they were far less than 8K for 63 VFs and, by my calculation, would still be less than 8K for 127 VFs, as per Ben Hutchings's requirements.  I thought I had 16K to work with, as that is what the iproute2 ip application allocates, also as per Ben's prior email.  I checked and that is the case.  Either way, I think we're still fine with 8K.  I could put in a check to make sure alloc_size doesn't go over 8K.
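Dave's point is about what a generic netlink consumer relies on: it calls recvmsg() with a buffer of at least 8192 bytes, then walks the messages in it with the NLMSG_OK()/NLMSG_NEXT() macros from <linux/netlink.h>. A minimal sketch of the walking step, assuming a Linux system with the kernel UAPI headers installed; `count_nl_messages()` and `NL_RECV_BUF_SIZE` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <linux/netlink.h>

/* Hypothetical receive-buffer size following the convention above: one
 * recvmsg() into 8192 bytes can hold any single netlink message. */
#define NL_RECV_BUF_SIZE 8192

/* Count the complete messages in a buffer filled by one recvmsg() call.
 * Stops at NLMSG_DONE; returns -1 if an NLMSG_ERROR message is found. */
int count_nl_messages(const void *buf, size_t len)
{
	const struct nlmsghdr *nlh = buf;
	int remaining = (int)len;	/* NLMSG_OK/NLMSG_NEXT work on an int */
	int count = 0;

	assert(len <= NL_RECV_BUF_SIZE);
	for (; NLMSG_OK(nlh, remaining); nlh = NLMSG_NEXT(nlh, remaining)) {
		if (nlh->nlmsg_type == NLMSG_DONE)
			break;
		if (nlh->nlmsg_type == NLMSG_ERROR)
			return -1;
		count++;
	}
	return count;
}
```

If a single message could exceed the buffer, NLMSG_OK() would reject it as truncated, which is why the size convention matters to every such library.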

> 
> And they must be able to make assumptions like this; otherwise they
> cannot reliably receive a netlink message in its entirety in a
> generic fashion.
> 
> So one must not attack this problem from this angle.

This was just a bug fix.  If you recall the email thread from last week that brought up this whole thing, I was advocating for an entirely new angle myself.  If the bug fix causes more problems than it solves (and that's not rare by any means), then we can toss it and go back to the start.

> 
> It is absolutely necessary to find some way to report the VF
> information out of band, instead of trying to make the message
> larger.

Absolutely agreed!

> 
> Needing more than 8K to get a dump of a single device over netlink is
> absolutely ridiculous; this VF stuff was poorly designed.  It has to
> be fixed and the current stuff marked deprecated.

Again, absolutely no argument from me.

And I have an idea as to how to do this, but Ben asked me to come up with a bug fix first and later work on the extensions that would do just what you're saying here.  I'm fine with dropping the bug fix hack and getting back to my original ideas on the subject.  The problem there is that, due to my workload, it would be a month or more before I could finish it, and meanwhile the bug still exists.

I'm fine with however you folks want to approach this, just give me some direction.

- Greg

^ permalink raw reply

* Re: [PATCH 2/3] bql: Byte queue limits
From: Stephen Hemminger @ 2011-04-26 16:16 UTC (permalink / raw)
  To: Tom Herbert; +Cc: davem, netdev
In-Reply-To: <alpine.DEB.2.00.1104252128290.5895@pokey.mtv.corp.google.com>

On Mon, 25 Apr 2011 21:38:11 -0700 (PDT)
Tom Herbert <therbert@google.com> wrote:

> Networking stack support for byte queue limits, using the dynamic queue
> limits library.  Byte queue limits are maintained per transmit queue,
> and a bql structure has been added to netdev_queue structure for this
> purpose.
> 
> Configuration of bql is in the tx-<n> sysfs directory for the queue
> under the byte_queue_limits directory.  Configuration includes:
> limit_min, bql minimum limit
> limit_max, bql maximum limit
> hold_time, bql slack hold time
> 
> Also under the directory are:
> limit, current byte limit
> inflight, current number of bytes on the queue
> 
> Signed-off-by: Tom Herbert <therbert@google.com>
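The attributes listed above are plain sysfs files, so a tool can locate them by path. A minimal sketch, assuming the tx-<n> directories live under /sys/class/net/<dev>/queues/ as the existing per-queue directories do; the helper name is hypothetical and only the attribute names come from the patch description.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build the sysfs path of one BQL attribute (for
 * example "limit" or "inflight") for TX queue `txq` of device `dev`.
 * Returns 0 on success, -1 if the name is unsafe or the path is too long. */
int bql_attr_path(char *out, size_t out_len, const char *dev,
		  unsigned int txq, const char *attr)
{
	int n;

	assert(out != NULL && dev != NULL && attr != NULL);
	/* A device name containing '/' would escape the intended directory. */
	if (strchr(dev, '/') != NULL)
		return -1;
	n = snprintf(out, out_len,
		     "/sys/class/net/%s/queues/tx-%u/byte_queue_limits/%s",
		     dev, txq, attr);
	return (n < 0 || (size_t)n >= out_len) ? -1 : 0;
}
```

Reading `limit` and `inflight` through such paths while traffic is running is one way to watch the dynamic limit move.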

Although this is implemented as a device attribute, from a layering
point of view it feels like a queuing strategy. Having two competing
ways to do something is not always a good idea. Why is this not a
qdisc parameter or a new qdisc?

-- 

^ permalink raw reply

* RE: [RFC PATCH] netlink: Increase netlink dump skb message size
From: Eric Dumazet @ 2011-04-26 16:21 UTC (permalink / raw)
  To: Rose, Gregory V
  Cc: David Miller, netdev@vger.kernel.org, bhutchings@solarflare.com
In-Reply-To: <43F901BD926A4E43B106BF17856F0755018DC497D4@orsmsx508.amr.corp.intel.com>

On Tuesday, April 26, 2011 at 09:12 -0700, Rose, Gregory V wrote:

> I'm fine with however you folks want to approach this, just give me some direction.

I would just try the following patch:


diff --git a/include/linux/netlink.h b/include/linux/netlink.h
index 4c4ac3f..22cac81 100644
--- a/include/linux/netlink.h
+++ b/include/linux/netlink.h
@@ -210,6 +210,7 @@ int netlink_sendskb(struct sock *sk, struct sk_buff *skb);
 #else
 #define NLMSG_GOODSIZE	SKB_WITH_OVERHEAD(8192UL)
 #endif
+#define NLMSG_DUMPSIZE SKB_WITH_OVERHEAD(8192UL)
 
 #define NLMSG_DEFAULT_SIZE (NLMSG_GOODSIZE - NLMSG_HDRLEN)
 
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index c8f35b5..cb8d6ac 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -1669,7 +1669,7 @@ static int netlink_dump(struct sock *sk)
 	struct nlmsghdr *nlh;
 	int len, err = -ENOBUFS;
 
-	skb = sock_rmalloc(sk, NLMSG_GOODSIZE, 0, GFP_KERNEL);
+	skb = sock_rmalloc(sk, NLMSG_DUMPSIZE, 0, GFP_KERNEL);
 	if (!skb)
 		goto errout;
 

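On systems with 4 KiB pages, NLMSG_GOODSIZE comes from the PAGE_SIZE branch above the `#else` in the first hunk, so the proposed NLMSG_DUMPSIZE roughly doubles the dump skb there and leaves large-page systems unchanged. The arithmetic can be sketched as follows. The cache-line size and `sizeof(struct skb_shared_info)` vary by architecture and configuration, so the two constants are assumed values for illustration, and the macro and function names are hypothetical mirrors of the kernel ones.

```c
#include <assert.h>

/* Assumed values for illustration only. */
#define ASSUMED_SMP_CACHE_BYTES 64UL
#define ASSUMED_SHINFO_SIZE     320UL	/* stand-in for sizeof(struct skb_shared_info) */

/* Round up to a cache-line multiple, like SKB_DATA_ALIGN(). */
#define DATA_ALIGN(x) \
	(((x) + ASSUMED_SMP_CACHE_BYTES - 1) & ~(ASSUMED_SMP_CACHE_BYTES - 1))
/* Bytes left for data after the shared info, like SKB_WITH_OVERHEAD(). */
#define WITH_OVERHEAD(x) ((x) - DATA_ALIGN(ASSUMED_SHINFO_SIZE))

/* Mirrors the NLMSG_GOODSIZE choice: one page when pages are under 8K. */
unsigned long goodsize(unsigned long page_size)
{
	assert(page_size > 0);
	return page_size < 8192UL ? WITH_OVERHEAD(page_size)
				  : WITH_OVERHEAD(8192UL);
}

/* The proposed NLMSG_DUMPSIZE: always based on 8192 bytes. */
unsigned long dumpsize(void)
{
	return WITH_OVERHEAD(8192UL);
}
```

With these assumed values, a dump skb on a 4 KiB-page system grows from 3776 to 7872 bytes of usable space, still within the 8192-byte receive buffer that applications allocate.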


^ permalink raw reply related

* RE: [RFC PATCH] netlink: Increase netlink dump skb message size
From: Rose, Gregory V @ 2011-04-26 16:24 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, netdev@vger.kernel.org, bhutchings@solarflare.com
In-Reply-To: <1303834864.3358.58.camel@edumazet-laptop>

> -----Original Message-----
> From: Eric Dumazet [mailto:eric.dumazet@gmail.com]
> Sent: Tuesday, April 26, 2011 9:21 AM
> To: Rose, Gregory V
> Cc: David Miller; netdev@vger.kernel.org; bhutchings@solarflare.com
> Subject: RE: [RFC PATCH] netlink: Increase netlink dump skb message size
> 
> On Tuesday, April 26, 2011 at 09:12 -0700, Rose, Gregory V wrote:
> 
> > I'm fine with however you folks want to approach this, just give me some direction.
> 
> I would just try the following patch:

I'll test it out, it's certainly a lot simpler.  Perhaps I was getting a bit too fancy.

Ben would want to make sure it works for 127 VFs; my device does 63.

- Greg

> 
> 
> diff --git a/include/linux/netlink.h b/include/linux/netlink.h
> index 4c4ac3f..22cac81 100644
> --- a/include/linux/netlink.h
> +++ b/include/linux/netlink.h
> @@ -210,6 +210,7 @@ int netlink_sendskb(struct sock *sk, struct sk_buff *skb);
>  #else
>  #define NLMSG_GOODSIZE	SKB_WITH_OVERHEAD(8192UL)
>  #endif
> +#define NLMSG_DUMPSIZE SKB_WITH_OVERHEAD(8192UL)
> 
>  #define NLMSG_DEFAULT_SIZE (NLMSG_GOODSIZE - NLMSG_HDRLEN)
> 
> diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
> index c8f35b5..cb8d6ac 100644
> --- a/net/netlink/af_netlink.c
> +++ b/net/netlink/af_netlink.c
> @@ -1669,7 +1669,7 @@ static int netlink_dump(struct sock *sk)
>  	struct nlmsghdr *nlh;
>  	int len, err = -ENOBUFS;
> 
> -	skb = sock_rmalloc(sk, NLMSG_GOODSIZE, 0, GFP_KERNEL);
> +	skb = sock_rmalloc(sk, NLMSG_DUMPSIZE, 0, GFP_KERNEL);
>  	if (!skb)
>  		goto errout;
> 
> 


^ permalink raw reply

* [PATCH] net: ftmac100: fix scheduling while atomic during PHY link status change
From: Adam Jaremko @ 2011-04-26 16:32 UTC (permalink / raw)
  To: Ratbert Po-Yu Chuang(莊博宇); +Cc: ratbert.chuang, netdev

Signed-off-by: Adam Jaremko <adam.jaremko@gmail.com>
---
 drivers/net/ftmac100.c |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ftmac100.c b/drivers/net/ftmac100.c
index a316619..9bd7746 100644
--- a/drivers/net/ftmac100.c
+++ b/drivers/net/ftmac100.c
@@ -139,11 +139,11 @@ static int ftmac100_reset(struct ftmac100 *priv)
 			 * that hardware reset completed (what the f*ck).
 			 * We still need to wait for a while.
 			 */
-			usleep_range(500, 1000);
+			udelay(500);
 			return 0;
 		}

-		usleep_range(1000, 10000);
+		udelay(1000);
 	}

 	netdev_err(netdev, "software reset failed\n");
@@ -772,7 +772,7 @@ static int ftmac100_mdio_read(struct net_device *netdev, int phy_id, int reg)
 		if ((phycr & FTMAC100_PHYCR_MIIRD) == 0)
 			return phycr & FTMAC100_PHYCR_MIIRDATA;

-		usleep_range(100, 1000);
+		udelay(100);
 	}

 	netdev_err(netdev, "mdio read timed out\n");
@@ -801,7 +801,7 @@ static void ftmac100_mdio_write(struct net_device *netdev, int phy_id, int reg,
 		if ((phycr & FTMAC100_PHYCR_MIIWR) == 0)
 			return;

-		usleep_range(100, 1000);
+		udelay(100);
 	}

 	netdev_err(netdev, "mdio write timed out\n");
-- 
1.7.4.4
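The patch keeps the driver's bounded-retry polling loops and only changes how it waits between retries. usleep_range() sleeps, so it must not be called in atomic context. udelay() busy-waits instead. Below is a userspace sketch of the same pattern. The simulated register, its busy bit and all function names are hypothetical, and busy_wait_us() only approximates udelay() by spinning on clock_gettime().

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Spin for roughly `us` microseconds without sleeping, like udelay(). */
void busy_wait_us(unsigned int us)
{
	struct timespec start, now;
	long long elapsed_ns;

	clock_gettime(CLOCK_MONOTONIC, &start);
	do {
		clock_gettime(CLOCK_MONOTONIC, &now);
		elapsed_ns = (now.tv_sec - start.tv_sec) * 1000000000LL +
			     (now.tv_nsec - start.tv_nsec);
	} while (elapsed_ns < (long long)us * 1000);
}

/* Simulated register: reports busy for the first `busy_reads` reads. */
struct fake_reg {
	unsigned int busy_reads;
	uint32_t busy_mask;
	uint32_t value;
};

uint32_t fake_reg_read(void *ctx)
{
	struct fake_reg *r = ctx;

	if (r->busy_reads > 0) {
		r->busy_reads--;
		return r->value | r->busy_mask;
	}
	return r->value & ~r->busy_mask;
}

/* Bounded poll: read until `busy_mask` clears, waiting `delay_us` between
 * at most `tries` attempts. Returns the register value, or -1 on timeout. */
int64_t poll_until_clear(uint32_t (*read_reg)(void *), void *ctx,
			 uint32_t busy_mask, unsigned int tries,
			 unsigned int delay_us)
{
	assert(read_reg != NULL);
	for (unsigned int i = 0; i < tries; i++) {
		uint32_t val = read_reg(ctx);

		if ((val & busy_mask) == 0)
			return val;
		busy_wait_us(delay_us);	/* never sleeps, unlike usleep_range() */
	}
	return -1;
}
```

The cost of this choice is that the CPU spins for the whole wait, so the bound on retries also bounds how long the caller can hold the CPU.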

^ permalink raw reply related

* Re: [PATCH 2/3] bql: Byte queue limits
From: Eric Dumazet @ 2011-04-26 16:37 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Tom Herbert, davem, netdev
In-Reply-To: <20110426091620.7c576a98@nehalam>

On Tuesday, April 26, 2011 at 09:16 -0700, Stephen Hemminger wrote:

> Although this is implemented as a device attribute, from a layering
> point of view it feels like a queuing strategy. Having two competing
> ways to do something is not always a good idea. Why is this not a
> qdisc parameter or a new qdisc?
> 

The dequeue side is performed at TX completion (one call for all
dequeued skbs, to reduce overhead).

But it's right that the queueing test could be done generically (stop the
queue if the limit is reached).

BTW I did not check if both paths were SMP safe if run concurrently...
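The split Eric describes, a per-packet check on the send side and one batched update per TX completion pass, can be sketched with a fixed limit. This is a hypothetical, single-threaded model: the series adjusts the limit dynamically through the dql library, which is left out here, and the sketch does no locking, so it does not address the SMP question above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-queue byte accounting with a fixed limit. */
struct byteq {
	size_t limit;		/* byte limit for the hardware queue */
	size_t inflight;	/* bytes given to the NIC, not yet completed */
	bool stopped;		/* queue stopped because the limit was reached */
};

/* Send side: account for a posted packet and stop the queue at the limit. */
void byteq_sent(struct byteq *q, size_t bytes)
{
	q->inflight += bytes;
	if (q->inflight >= q->limit)
		q->stopped = true;
}

/* Completion side: called once per completion pass with the total bytes
 * freed, then restarts the queue if it dropped below the limit. */
void byteq_completed(struct byteq *q, size_t bytes)
{
	assert(bytes <= q->inflight);
	q->inflight -= bytes;
	if (q->stopped && q->inflight < q->limit)
		q->stopped = false;
}
```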

^ permalink raw reply

* Re: [PATCH 0/3] net: Byte queue limit patch series
From: Rick Jones @ 2011-04-26 16:57 UTC (permalink / raw)
  To: Tom Herbert; +Cc: davem, netdev
In-Reply-To: <alpine.DEB.2.00.1104252128001.5889@pokey.mtv.corp.google.com>

On Mon, 2011-04-25 at 21:38 -0700, Tom Herbert wrote:
> This patch series implements byte queue limits (bql) for NIC TX queues.
> 
> Byte queue limits are a mechanism to limit the size of the transmit
> hardware queue on a NIC by number of bytes. The goal of these byte
> limits is to reduce latency caused by excessive queuing in hardware
> without sacrificing throughput.
> 
> Hardware queuing limits are typically specified in terms of a number of
> hardware descriptors, each of which has a variable size. The variability
> of the size of individual queued items can have a very wide range. For
> instance with the e1000 NIC the size could range from 64 bytes to 4K
> (with TSO enabled). This variability makes it next to impossible to
> choose a single queue limit that prevents starvation and provides lowest
> possible latency.
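The variability described above can be quantified. For a descriptor-count limit, the number of bytes actually queued depends entirely on packet sizes. A minimal sketch using the 64-byte and 4K figures from the e1000 example. The 256-descriptor limit and the 1 Gbit/s rate in the note below are assumed example values, not from the post, and the names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Range of bytes that `ndesc` queued descriptors can represent. */
struct byte_range {
	uint64_t lo, hi;
};

struct byte_range desc_limit_bytes(uint32_t ndesc, uint32_t min_bytes,
				   uint32_t max_bytes)
{
	struct byte_range r = { (uint64_t)ndesc * min_bytes,
				(uint64_t)ndesc * max_bytes };

	assert(min_bytes <= max_bytes);
	return r;
}
```

With 256 descriptors the queue holds anywhere from 16384 to 1048576 bytes. At 1 Gbit/s, that is roughly 131 µs versus 8.4 ms of queued transmission behind the same descriptor limit.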
> 
> The objective of byte queue limits is to set the limit to be the
> minimum needed to prevent starvation between successive transmissions to
> the hardware. The latency between two transmissions can be variable in a
> system. It is dependent on interrupt frequency, NAPI polling latencies,
> scheduling of the queuing discipline, lock contention, etc. Therefore we
> propose that byte queue limits should be dynamic and change in
> accordance with the networking stack latencies a system encounters.
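Tom's starvation criterion suggests a back-of-envelope lower bound: the hardware queue must hold at least as many bytes as the link can send during the longest gap before the stack refills it. The series makes the limit dynamic rather than computing it from a formula. The sketch below only illustrates the relationship, assuming a constant link rate and a known worst-case gap, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes the link transmits during `refill_gap_us` microseconds at
 * `bits_per_sec`: the minimum queue depth that avoids starvation. */
uint64_t min_queue_bytes(uint64_t bits_per_sec, uint64_t refill_gap_us)
{
	assert(refill_gap_us == 0 || bits_per_sec <= UINT64_MAX / refill_gap_us);
	return bits_per_sec * refill_gap_us / 8u / 1000000u;
}
```

For example, a 100 µs gap needs 12500 bytes at 1 Gbit/s and 125000 bytes at 10 Gbit/s. Any fixed limit is therefore tied to one assumed gap, which is the motivation for adapting it at run time.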
> 
> Patches to implement this:
> Patch 1: Dynamic queue limits (dql) library.  This provides the general
> queuing algorithm.
> Patch 2: netdev changes that use dql to support byte queue limits.
> Patch 3: Support in forcedeth driver for byte queue limits.
> 
> The effects of BQL are demonstrated in the benchmark results below.
> These were made running 200 stream of netperf RR tests:
> 
> 140000 rr size
> BQL: 80-215K bytes in queue, 856 tps, 3.26% cpu
> No BQL: 2700-2930K bytes in queue, 854 tps, 3.71% cpu

That is, both the request and the response are set to 140000, yes?

> 14000 rr size
> BQL: 25-55K bytes in queue, 8500 tps
> No BQL: 1500-1622K bytes in queue,  8523 tps, 4.53% cpu
> 
> 1400 rr size
> BQL: 20-38K bytes in queue, 86582 tps,  7.38% cpu
> No BQL: 29-117K bytes in queue, 85738 tps, 7.67% cpu
> 
> 140 rr size
> BQL: 1-10K bytes in queue, 320540 tps, 34.6% cpu
> No BQL: 1-13K bytes in queue, 323158 tps, 37.16% cpu

What, no 14?-)

> 1 rr size
> BQL: 0-3K bytes in queue, 338811 tps, 41.41% cpu
> No BQL: 0-3K bytes in queue, 339947 tps, 42.36% cpu
> 
> The amount of queuing in the NIC is reduced by up to 90%, and I haven't
> yet seen a consistent negative impact in terms of throughput or
> CPU utilization.

Presumably this will also have a positive (imo) effect on the maximum
size to which a bulk transfer's window will grow under auto tuning, yes?

How about a "burst mode" TCP_RR test?

happy benchmarking,

rick jones


^ permalink raw reply
