Netdev List
 help / color / mirror / Atom feed
* RE: [PATCH] igb: restore EEPROM 16kB access limit
From: Wyborny, Carolyn @ 2011-04-26 15:10 UTC (permalink / raw)
  To: Stefan Assmann
  Cc: netdev@vger.kernel.org, e1000-devel@lists.sourceforge.net,
	Kirsher, Jeffrey T, Pieper, Jeffrey E, Ronciak, John,
	Andy Gospodarek
In-Reply-To: <4DB6DF9B.90706@kpanic.de>



>-----Original Message-----
>From: Stefan Assmann [mailto:sassmann@kpanic.de]
>Sent: Tuesday, April 26, 2011 8:07 AM
>To: Wyborny, Carolyn
>Cc: netdev@vger.kernel.org; e1000-devel@lists.sourceforge.net; Kirsher,
>Jeffrey T; Pieper, Jeffrey E; Ronciak, John; Andy Gospodarek
>Subject: Re: [PATCH] igb: restore EEPROM 16kB access limit
>
>New patch against net-next-2.6.
>
>  Stefan
>
>From 67ce9e09e10f05054a26560306aa484ae3acc03f Mon Sep 17 00:00:00 2001
>From: Stefan Assmann <sassmann@kpanic.de>
>Date: Mon, 18 Apr 2011 15:22:19 +0200
>Subject: [PATCH] igb: default to 32kB for EEPROMs reporting invalid size
>
>The check that gracefully handled invalid EEPROM sizes was removed by
>commit 4322e561a93ec7ee034b603a6c610e7be90d4e8a. Without this check the
>EEPROM validation fails if the size is invalid and the NIC is not usable
>by the OS. Observed with a 8086:10c9 NIC.
>
>igb 0000:03:00.0: 0 vfs allocated
>igb 0000:03:00.0: The NVM Checksum Is Not Valid
>ACPI: PCI interrupt for device 0000:03:00.0 disabled
>igb: probe of 0000:03:00.0 failed with error -5
>
>Re-introducing the check with an additional dev_err() to report the
>problem.
>
>Signed-off-by: Stefan Assmann <sassmann@kpanic.de>
>---
> drivers/net/igb/e1000_82575.c |   10 ++++++++++
> 1 files changed, 10 insertions(+), 0 deletions(-)
>
>diff --git a/drivers/net/igb/e1000_82575.c
>b/drivers/net/igb/e1000_82575.c
>index 0cd41c4..f3bdf29 100644
>--- a/drivers/net/igb/e1000_82575.c
>+++ b/drivers/net/igb/e1000_82575.c
>@@ -31,9 +31,11 @@
>
> #include <linux/types.h>
> #include <linux/if_ether.h>
>+#include <linux/pci.h>
>
> #include "e1000_mac.h"
> #include "e1000_82575.h"
>+#include "igb.h"
>
> static s32  igb_get_invariants_82575(struct e1000_hw *);
> static s32  igb_acquire_phy_82575(struct e1000_hw *);
>@@ -113,6 +115,7 @@ static s32 igb_get_invariants_82575(struct e1000_hw
>*hw)
> 	struct e1000_nvm_info *nvm = &hw->nvm;
> 	struct e1000_mac_info *mac = &hw->mac;
> 	struct e1000_dev_spec_82575 * dev_spec = &hw->dev_spec._82575;
>+	struct igb_adapter *adapter = hw->back;
> 	u32 eecd;
> 	s32 ret_val;
> 	u16 size;
>@@ -244,6 +247,13 @@ static s32 igb_get_invariants_82575(struct e1000_hw
>*hw)
> 	 */
> 	size += NVM_WORD_SIZE_BASE_SHIFT;
>
>+	/* gracefully handle NICs reporting an invalid EEPROM size */
>+	if (size > 15) {
>+		dev_err(&adapter->pdev->dev,
>+		        "NVM size is not valid, defaulting to 32kB\n");
>+		size = 15;
>+	}
>+
> 	nvm->word_size = 1 << size;
> 	if (nvm->word_size == (1 << 15))
> 		nvm->page_size = 128;
>--
>1.7.4

I have already submitted a patch to fix this in our internal process, but our maintainer has been ill.  It should be out shortly.  I thought you agreed with Alex to let us submit the patch as this code in our Shared Code and we need to retain the copyright.

I will get with Jeff to get it out as soon as possible.

Thanks,

Carolyn

Carolyn Wyborny
Linux Development
LAN Access Division
Intel Corporation




^ permalink raw reply

* Re: [PATCH] igb: restore EEPROM 16kB access limit
From: Stefan Assmann @ 2011-04-26 15:07 UTC (permalink / raw)
  To: Wyborny, Carolyn
  Cc: netdev@vger.kernel.org, e1000-devel@lists.sourceforge.net,
	Kirsher, Jeffrey T, Pieper, Jeffrey E, Ronciak, John,
	Andy Gospodarek
In-Reply-To: <EDC0E76513226749BFBC9C3FB031318F0137E85E47@orsmsx508.amr.corp.intel.com>

New patch against net-next-2.6.

  Stefan

>From 67ce9e09e10f05054a26560306aa484ae3acc03f Mon Sep 17 00:00:00 2001
From: Stefan Assmann <sassmann@kpanic.de>
Date: Mon, 18 Apr 2011 15:22:19 +0200
Subject: [PATCH] igb: default to 32kB for EEPROMs reporting invalid size

The check that gracefully handled invalid EEPROM sizes was removed by
commit 4322e561a93ec7ee034b603a6c610e7be90d4e8a. Without this check the
EEPROM validation fails if the size is invalid and the NIC is not usable
by the OS. Observed with a 8086:10c9 NIC.

igb 0000:03:00.0: 0 vfs allocated
igb 0000:03:00.0: The NVM Checksum Is Not Valid
ACPI: PCI interrupt for device 0000:03:00.0 disabled
igb: probe of 0000:03:00.0 failed with error -5

Re-introducing the check with an additional dev_err() to report the problem.

Signed-off-by: Stefan Assmann <sassmann@kpanic.de>
---
 drivers/net/igb/e1000_82575.c |   10 ++++++++++
 1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/drivers/net/igb/e1000_82575.c b/drivers/net/igb/e1000_82575.c
index 0cd41c4..f3bdf29 100644
--- a/drivers/net/igb/e1000_82575.c
+++ b/drivers/net/igb/e1000_82575.c
@@ -31,9 +31,11 @@

 #include <linux/types.h>
 #include <linux/if_ether.h>
+#include <linux/pci.h>

 #include "e1000_mac.h"
 #include "e1000_82575.h"
+#include "igb.h"

 static s32  igb_get_invariants_82575(struct e1000_hw *);
 static s32  igb_acquire_phy_82575(struct e1000_hw *);
@@ -113,6 +115,7 @@ static s32 igb_get_invariants_82575(struct e1000_hw *hw)
 	struct e1000_nvm_info *nvm = &hw->nvm;
 	struct e1000_mac_info *mac = &hw->mac;
 	struct e1000_dev_spec_82575 * dev_spec = &hw->dev_spec._82575;
+	struct igb_adapter *adapter = hw->back;
 	u32 eecd;
 	s32 ret_val;
 	u16 size;
@@ -244,6 +247,13 @@ static s32 igb_get_invariants_82575(struct e1000_hw *hw)
 	 */
 	size += NVM_WORD_SIZE_BASE_SHIFT;

+	/* gracefully handle NICs reporting an invalid EEPROM size */
+	if (size > 15) {
+		dev_err(&adapter->pdev->dev,
+		        "NVM size is not valid, defaulting to 32kB\n");
+		size = 15;
+	}
+
 	nvm->word_size = 1 << size;
 	if (nvm->word_size == (1 << 15))
 		nvm->page_size = 128;
-- 
1.7.4


^ permalink raw reply related

* Re: [PATCH] igb: restore EEPROM 16kB access limit
From: Andy Gospodarek @ 2011-04-26 15:06 UTC (permalink / raw)
  To: Wyborny, Carolyn
  Cc: e1000-devel@lists.sourceforge.net, netdev@vger.kernel.org,
	Stefan Assmann, Ronciak, John
In-Reply-To: <EDC0E76513226749BFBC9C3FB031318F0137E85E47@orsmsx508.amr.corp.intel.com>

On Fri, Apr 08, 2011 at 01:10:30PM -0700, Wyborny, Carolyn wrote:
[...]
> 
> Yes, there's more code changed than just the removal of what you're trying to add back.  The snip is the replacement but those function need to exist as well.  I believe that the commit referenced did not completely apply and you're missing some critical code.  The problem you are seeing should not occur with full patch.
> 
> The version of e1000_82575.c in 2.6.39-rc2 has all the changes needed for this to work correctly.
> 

I'm still seeing failures with today's net-next-2.6 ('git describe'
shows v2.6.39-rc1-1283-g64cad2a), so it would be really nice to get this
fixed.  I would rather not have to carry a patch like the one Stefan
posted or one like this crazy one I hacked up to try all sizes until
valid NVRAM is found.

It applies cleanly net-next-2.6, net-2.6, and linux-2.6 as all exhibit
the exact same problem.

diff --git a/drivers/net/igb/e1000_82575.c b/drivers/net/igb/e1000_82575.c
index 0cd41c4..f8677f2 100644
--- a/drivers/net/igb/e1000_82575.c
+++ b/drivers/net/igb/e1000_82575.c
@@ -243,7 +243,7 @@ static s32 igb_get_invariants_82575(struct e1000_hw *hw)
 	 * for setting word_size.
 	 */
 	size += NVM_WORD_SIZE_BASE_SHIFT;
-
+err_eeprom:
 	nvm->word_size = 1 << size;
 	if (nvm->word_size == (1 << 15))
 		nvm->page_size = 128;
@@ -271,6 +271,17 @@ static s32 igb_get_invariants_82575(struct e1000_hw *hw)
 	}
 	nvm->ops.write = igb_write_nvm_spi;
 
+        /* make sure the NVM is good */
+        if (hw->nvm.ops.validate(hw) < 0) {
+		if (size > 14)  {
+			size--;
+			printk(KERN_ERR "igb: The NVM size is not valid, trying %d\n", 1<<size);
+			goto err_eeprom;
+		}
+		printk(KERN_ERR "The NVM Checksum Is Not Valid\n");
+		return -E1000_ERR_MAC_INIT;
+        }
+
 	/* if part supports SR-IOV then initialize mailbox parameters */
 	switch (mac->type) {
 	case e1000_82576:
diff --git a/drivers/net/igb/igb_main.c b/drivers/net/igb/igb_main.c
index cdfd572..8e23ca2 100644
--- a/drivers/net/igb/igb_main.c
+++ b/drivers/net/igb/igb_main.c
@@ -1940,13 +1940,6 @@ static int __devinit igb_probe(struct pci_dev *pdev,
 	 * known good starting state */
 	hw->mac.ops.reset_hw(hw);
 
-	/* make sure the NVM is good */
-	if (hw->nvm.ops.validate(hw) < 0) {
-		dev_err(&pdev->dev, "The NVM Checksum Is Not Valid\n");
-		err = -EIO;
-		goto err_eeprom;
-	}
-
 	/* copy the MAC address out of the NVM */
 	if (hw->mac.ops.read_mac_addr(hw))
 		dev_err(&pdev->dev, "NVM Read Error\n");



------------------------------------------------------------------------------
WhatsUp Gold - Download Free Network Management Software
The most intuitive, comprehensive, and cost-effective network 
management toolset available today.  Delivers lowest initial 
acquisition cost and overall TCO of any competing solution.
http://p.sf.net/sfu/whatsupgold-sd
_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel&#174; Ethernet, visit http://communities.intel.com/community/wired

^ permalink raw reply related

* Re: [PATCH] dsa/mv88e6131: fix unknown multicast/broadcast forwarding on mv88e6085
From: Lennert Buytenhek @ 2011-04-26 14:57 UTC (permalink / raw)
  To: Peter Korsgaard; +Cc: davem, netdev
In-Reply-To: <1303818341-13839-1-git-send-email-jacmet@sunsite.dk>

On Tue, Apr 26, 2011 at 01:45:41PM +0200, Peter Korsgaard wrote:

> The 88e6085 has a few differences from the other devices in the port
> control registers, causing unknown multicast/broadcast packets to get
> dropped when using the standard port setup.
> 
> At the same time update kconfig to clarify that the mv88e6085 is now
> supported.
> 
> Signed-off-by: Peter Korsgaard <jacmet@sunsite.dk>

Assuming that you've tested this.. :)

Acked-by: Lennert Buytenhek <buytenh@wantstofly.org>

^ permalink raw reply

* Re: [PATCH 00/13] Swap-over-NBD without deadlocking
From: Peter Zijlstra @ 2011-04-26 14:50 UTC (permalink / raw)
  To: Mel Gorman; +Cc: Linux-MM, Linux-Netdev, LKML, David Miller, Neil Brown
In-Reply-To: <20110426144635.GK4658@suse.de>

On Tue, 2011-04-26 at 15:46 +0100, Mel Gorman wrote:
> 
> I did find that only a few route-cache entries should be required. In
> the original patches I worked with, there was a reservation for the
> maximum possible number of route-cache entries. I thought this was
> overkill and instead reserved 1-per-active-swapfile-backed-by-NFS.

Right, so the thing I was worried about was a route-cache poison attack
where someone would spam the machine such that it would create a lot of
route cache entries and might flush the one we needed just as we needed
it.

Pinning the one entry we need would solve that (if possible).

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: [PATCH 00/13] Swap-over-NBD without deadlocking
From: Mel Gorman @ 2011-04-26 14:46 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Linux-MM, Linux-Netdev, LKML, David Miller, Neil Brown
In-Reply-To: <1303827785.20212.266.camel@twins>

On Tue, Apr 26, 2011 at 04:23:05PM +0200, Peter Zijlstra wrote:
> On Tue, 2011-04-26 at 08:36 +0100, Mel Gorman wrote:
> > Comments?
> 
> Last time I brought up the whole swap over network bits I was pointed
> towards the generic skb recycling work:
> 
>   http://lwn.net/Articles/332037/
> 
> as a means to pre-allocate memory,

I'd taken note of this to take a much closer look if it turned
out reservations were necessary and to find out what happened with
these patches. So far, bigger reservations have *not* been required
but I agree recycling SKBs may be a better alternative than large
reservations or preallocations if they are necessary.

>  and it was suggested to simply pin
> the few route-cache entries required to route these packets and
> dis-allow swap packets to be fragmented (these last two avoid lots of
> funny allocation cases in the network stack).
> 

I did find that only a few route-cache entries should be required. In
the original patches I worked with, there was a reservation for the
maximum possible number of route-cache entries. I thought this was
overkill and instead reserved 1-per-active-swapfile-backed-by-NFS.

-- 
Mel Gorman
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: [PATCH 2/3] bql: Byte queue limits
From: Eric Dumazet @ 2011-04-26 14:33 UTC (permalink / raw)
  To: Tom Herbert; +Cc: davem, netdev
In-Reply-To: <BANLkTi=xTTjcG7CVfCk8sWojRSZhST+e_Q@mail.gmail.com>

Le mardi 26 avril 2011 à 07:13 -0700, Tom Herbert a écrit :

> > SMP ???), and why kobj is now guarded by CONFIG_XPS instead of
> > CONFIG_RPS.
> >
> Because that kobj is for transmit side (currently only for xps_cpus)

Please provide a separate patch then, cleanups should be seperated.




^ permalink raw reply

* Re: [PATCH 13/13] mm: Account for the number of times direct reclaimers get throttled
From: Mel Gorman @ 2011-04-26 14:26 UTC (permalink / raw)
  To: NeilBrown; +Cc: Linux-MM, Linux-Netdev, LKML, David Miller, Peter Zijlstra
In-Reply-To: <20110426223510.4c6ab3cc@notabene.brown>

On Tue, Apr 26, 2011 at 10:35:10PM +1000, NeilBrown wrote:
> On Tue, 26 Apr 2011 08:36:54 +0100 Mel Gorman <mgorman@suse.de> wrote:
> 
> > Under significant pressure when writing back to network-backed storage,
> > direct reclaimers may get throttled. This is expected to be a
> > short-lived event and the processes get woken up again but processes do
> > get stalled. This patch counts how many times such stalling occurs. It's
> > up to the administrator whether to reduce these stalls by increasing
> > min_free_kbytes.
> > 
> > Signed-off-by: Mel Gorman <mgorman@suse.de>
> > ---
> >  include/linux/vm_event_item.h |    1 +
> >  mm/vmscan.c                   |    1 +
> >  mm/vmstat.c                   |    1 +
> >  3 files changed, 3 insertions(+), 0 deletions(-)
> > 
> > diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
> > index 03b90cdc..652e5f3 100644
> > --- a/include/linux/vm_event_item.h
> > +++ b/include/linux/vm_event_item.h
> > @@ -29,6 +29,7 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
> >  		FOR_ALL_ZONES(PGSTEAL),
> >  		FOR_ALL_ZONES(PGSCAN_KSWAPD),
> >  		FOR_ALL_ZONES(PGSCAN_DIRECT),
> > +		PGSCAN_DIRECT_THROTTLE,
> >  #ifdef CONFIG_NUMA
> >  		PGSCAN_ZONE_RECLAIM_FAILED,
> >  #endif
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index 8b6da2b..e88138b 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -2154,6 +2154,7 @@ static void throttle_direct_reclaim(gfp_t gfp_mask, struct zonelist *zonelist,
> >  		goto out;
> >  
> >  	/* Throttle */
> > +	count_vm_event(PGSCAN_DIRECT_THROTTLE);
> >  	do {
> >  		schedule();
> >  		finish_wait(&zone->zone_pgdat->pfmemalloc_wait, &wait);
> > diff --git a/mm/vmstat.c b/mm/vmstat.c
> > index a2b7344..5725387 100644
> > --- a/mm/vmstat.c
> > +++ b/mm/vmstat.c
> > @@ -911,6 +911,7 @@ const char * const vmstat_text[] = {
> >  	TEXTS_FOR_ZONES("pgsteal")
> >  	TEXTS_FOR_ZONES("pgscan_kswapd")
> >  	TEXTS_FOR_ZONES("pgscan_direct")
> > +	"pgscan_direct_throttle",
> >  
> >  #ifdef CONFIG_NUMA
> >  	"zone_reclaim_failed",
> 
> I like this approach.  Make the information available, but don't make a fuss
> about it.
> 
> Actually, I like the whole series - I'm really having to dig deep to find
> anything to complain about :-)
> 
> Feel free to put
>    Reviewed-by: NeilBrown <neilb@suse.de>
> against anything that I haven't commented on.
> 

Thanks very much!

-- 
Mel Gorman
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: [PATCH 12/13] mm: Throttle direct reclaimers if PF_MEMALLOC reserves are low and swap is backed by network storage
From: Mel Gorman @ 2011-04-26 14:26 UTC (permalink / raw)
  To: NeilBrown; +Cc: Linux-MM, Linux-Netdev, LKML, David Miller, Peter Zijlstra
In-Reply-To: <20110426223059.10f3edda@notabene.brown>

On Tue, Apr 26, 2011 at 10:30:59PM +1000, NeilBrown wrote:
> On Tue, 26 Apr 2011 08:36:53 +0100 Mel Gorman <mgorman@suse.de> wrote:
> 
> 
> > +/*
> > + * Throttle direct reclaimers if backing storage is backed by the network
> > + * and the PFMEMALLOC reserve for the preferred node is getting dangerously
> > + * depleted. kswapd will continue to make progress and wake the processes
> > + * when the low watermark is reached
> > + */
> > +static void throttle_direct_reclaim(gfp_t gfp_mask, struct zonelist *zonelist,
> > +					nodemask_t *nodemask)
> > +{
> > +	struct zone *zone;
> > +	int high_zoneidx = gfp_zone(gfp_mask);
> > +	DEFINE_WAIT(wait);
> > +
> > +	/* Check if the pfmemalloc reserves are ok */
> > +	first_zones_zonelist(zonelist, high_zoneidx, NULL, &zone);
> > +	prepare_to_wait(&zone->zone_pgdat->pfmemalloc_wait, &wait,
> > +							TASK_INTERRUPTIBLE);
> > +	if (pfmemalloc_watermark_ok(zone->zone_pgdat, high_zoneidx))
> > +		goto out;
> > +
> > +	/* Throttle */
> > +	do {
> > +		schedule();
> > +		finish_wait(&zone->zone_pgdat->pfmemalloc_wait, &wait);
> > +		prepare_to_wait(&zone->zone_pgdat->pfmemalloc_wait, &wait,
> > +							TASK_INTERRUPTIBLE);
> > +	} while (!pfmemalloc_watermark_ok(zone->zone_pgdat, high_zoneidx) &&
> > +			!fatal_signal_pending(current));
> > +
> > +out:
> > +	finish_wait(&zone->zone_pgdat->pfmemalloc_wait, &wait);
> > +}
> 
> You are doing an interruptible wait, but only checking for fatal signals.
> So if a non-fatal signal arrives, you will busy-wait.
> 
> So I suspect you want TASK_KILLABLE, so just use:
> 
>     wait_event_killable(zone->zone_pgdat->pfmemalloc_wait,
>                         pgmemalloc_watermark_ok(zone->zone_pgdata,
>                                                 high_zoneidx));
> 

Well, if a normal signal arrives, we do not necessarily want the
process to enter reclaim. For fatal signals, I allow it to continue
because it's not likely to be putting the system under more pressure
if it's exiting.

> (You also have an extraneous call to finish_wait)
> 

Which one? I'm not seeing a flow where finish_wait gets called twice
without a prepare_to_wait in between. 

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply

* Re: [PATCH 00/13] Swap-over-NBD without deadlocking
From: Peter Zijlstra @ 2011-04-26 14:23 UTC (permalink / raw)
  To: Mel Gorman; +Cc: Linux-MM, Linux-Netdev, LKML, David Miller, Neil Brown
In-Reply-To: <1303803414-5937-1-git-send-email-mgorman@suse.de>

On Tue, 2011-04-26 at 08:36 +0100, Mel Gorman wrote:
> Comments?

Last time I brought up the whole swap over network bits I was pointed
towards the generic skb recycling work:

  http://lwn.net/Articles/332037/

as a means to pre-allocate memory, and it was suggested to simply pin
the few route-cache entries required to route these packets and
dis-allow swap packets to be fragmented (these last two avoid lots of
funny allocation cases in the network stack).


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: [PATCH 2/3] bql: Byte queue limits
From: Tom Herbert @ 2011-04-26 14:13 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: davem, netdev
In-Reply-To: <1303796313.2747.190.camel@edumazet-laptop>

> Wow... magical values and very limited advices how to tune them.
>
The intent is that the algorithm is sufficiently adaptive so that no
tuning should be required.

> Tom, this reminds me you were supposed to provide Documentation/files to
> describe RPS, RFS, XPS ...
>
We'll look into that.

> We receive many questions about these features...
>
>> Signed-off-by: Tom Herbert <therbert@google.com>
>> ---
>>  include/linux/netdevice.h |   46 +++++++++++++++-
>>  net/core/net-sysfs.c      |  137 +++++++++++++++++++++++++++++++++++++++++++--
>>  2 files changed, 177 insertions(+), 6 deletions(-)
>>
>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>> index cb8178a..0a76b88 100644
>> --- a/include/linux/netdevice.h
>> +++ b/include/linux/netdevice.h
>> @@ -44,6 +44,7 @@
>>  #include <linux/rculist.h>
>>  #include <linux/dmaengine.h>
>>  #include <linux/workqueue.h>
>> +#include <linux/dynamic_queue_limits.h>
>>
>>  #include <linux/ethtool.h>
>>  #include <net/net_namespace.h>
>> @@ -556,8 +557,10 @@ struct netdev_queue {
>>       struct Qdisc            *qdisc;
>>       unsigned long           state;
>>       struct Qdisc            *qdisc_sleeping;
>> -#ifdef CONFIG_RPS
>> +#ifdef CONFIG_XPS
>>       struct kobject          kobj;
>> +     bool                    do_bql;
>> +     struct dql              dql;
>>  #endif
>
> I have no idea why you use CONFIG_XPS for BQL (how BQL is it related to

I'll add CONFIG_BQL

> SMP ???), and why kobj is now guarded by CONFIG_XPS instead of
> CONFIG_RPS.
>
Because that kobj is for transmit side (currently only for xps_cpus)

>
>
>

^ permalink raw reply

* Re: [PATCH 09/13] netvm: Set PF_MEMALLOC as appropriate during SKB processing
From: Mel Gorman @ 2011-04-26 14:10 UTC (permalink / raw)
  To: NeilBrown; +Cc: Linux-MM, Linux-Netdev, LKML, David Miller, Peter Zijlstra
In-Reply-To: <20110426222157.33a461f8@notabene.brown>

On Tue, Apr 26, 2011 at 10:21:57PM +1000, NeilBrown wrote:
> On Tue, 26 Apr 2011 08:36:50 +0100 Mel Gorman <mgorman@suse.de> wrote:
> 
> > diff --git a/net/core/dev.c b/net/core/dev.c
> > index 3871bf6..2d79a20 100644
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -3095,6 +3095,27 @@ static void vlan_on_bond_hook(struct sk_buff *skb)
> >  	}
> >  }
> >  
> > +/*
> > + * Limit which protocols can use the PFMEMALLOC reserves to those that are
> > + * expected to be used for communication with swap.
> > + */
> > +static bool skb_pfmemalloc_protocol(struct sk_buff *skb)
> > +{
> > +	if (skb_pfmemalloc(skb))
> > +		switch (skb->protocol) {
> > +		case __constant_htons(ETH_P_ARP):
> > +		case __constant_htons(ETH_P_IP):
> > +		case __constant_htons(ETH_P_IPV6):
> > +		case __constant_htons(ETH_P_8021Q):
> > +			break;
> > +
> > +		default:
> > +			return false;
> > +		}
> > +
> > +	return true;
> > +}
> 
> This sort of thing really bugs me :-)
> Neither the comment nor the function name actually describe what the function
> is doing.  The function is checking *2* things.
>    is_pfmemalloc_skb_or_pfmemalloc_protocol()
> might be a more correct name, but is too verbose.
> 
> I would prefer the skb_pfmemalloc test were removed from here and ....
> 
> > +	if (!skb_pfmemalloc_protocol(skb))
> > +		goto drop;
> > +
> 
> ...added here so this becomes:
> 
>       if (!skb_pfmemalloc(skb) && !skb_pfmemalloc_protocol(skb))
>                 goto drop;
> 
> which actually makes sense.
> 

Moving the check is neater but that check should be

	if (skb_pfmemalloc(skb) && !skb_pfmemalloc_protocol(skb))

? It's only if the skb was allocated from emergency reserves that we
need to consider dropping it to make way for other packets to be
received.

-- 
Mel Gorman
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: [PATCH 03/13] mm: Introduce __GFP_MEMALLOC to allow access to emergency reserves
From: Mel Gorman @ 2011-04-26 14:00 UTC (permalink / raw)
  To: NeilBrown; +Cc: Linux-MM, Linux-Netdev, LKML, David Miller, Peter Zijlstra
In-Reply-To: <20110426205330.539a2766@notabene.brown>

On Tue, Apr 26, 2011 at 08:53:30PM +1000, NeilBrown wrote:
> On Tue, 26 Apr 2011 11:36:46 +0100 Mel Gorman <mgorman@suse.de> wrote:
> 
> > On Tue, Apr 26, 2011 at 07:49:47PM +1000, NeilBrown wrote:
> 
> > > Maybe a
> > >    WARN_ON((gfp_mask & __GFP_MEMALLOC) && (gfp_mask & __GFP_NOMEMALLOC));
> > > might be wise?
> > > 
> > 
> > Both MEMALLOC and NOMEMALLOC are related to PFMEMALLOC reserves so
> > it's reasonable for them to have similar names. This warning will
> > also trigger because it's a combination of flags that does happen.
> > 
> > Consider for example
> > 
> > any interrupt
> >   -> __netdev_alloc_skb		(mask == GFP_ATOMIC)
> >     -> __alloc_skb		(mask == GFP_ATOMIC)
> >        if (sk_memalloc_socks() && (flags & SKB_ALLOC_RX))
> >                gfp_mask |= __GFP_MEMALLOC;
> > 				(mask == GFP_ATOMIC|__GFP_NOMEMALLOC)
> >       -> __kmalloc_reserve
> > 		First attempt tries to avoid reserves so adds __GFP_MEMALLOC
> > 				(mask == GFP_ATOMIC|__GFP_NOMEMALLOC|__GFP_MEMALLOC)
> > 
> 
> You have the "NO"s mixed up a bit which confused me for a while :-)
> But I see your point - I guess the WARN_ON isn't really needed.
> 

Bah, that was unhelpful on my part. I'm glad you saw the point anyway.
Sorry about that.

-- 
Mel Gorman
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: [PATCH 02/13] mm: sl[au]b: Add knowledge of PFMEMALLOC reserve pages
From: Mel Gorman @ 2011-04-26 13:59 UTC (permalink / raw)
  To: NeilBrown; +Cc: Linux-MM, Linux-Netdev, LKML, David Miller, Peter Zijlstra
In-Reply-To: <20110426213758.450f6f49@notabene.brown>

On Tue, Apr 26, 2011 at 09:37:58PM +1000, NeilBrown wrote:
> On Tue, 26 Apr 2011 08:36:43 +0100 Mel Gorman <mgorman@suse.de> wrote:
> 
> > +		/*
> > +		 * If there are full empty slabs and we were not forced to
> > +		 * allocate a slab, mark this one !pfmemalloc
> > +		 */
> > +		l3 = cachep->nodelists[numa_mem_id()];
> > +		if (!list_empty(&l3->slabs_free) && force_refill) {
> > +			struct slab *slabp = virt_to_slab(objp);
> > +			slabp->pfmemalloc = false;
> > +			clear_obj_pfmemalloc(&objp);
> > +			check_ac_pfmemalloc(cachep, ac);
> > +			return objp;
> > +		}
> 
> The comment doesn't match the code.  I think you need to remove the words
> "full" and "not" assuming the code is correct which it probably is...
> 

I'll fix up the comment, you're right, it's confusing.

> But the code seems to be much more complex than Peter's original, and I don't
> see the gain.
> 

You're right, it is more complex.

> Peter's code had only one 'reserved' flag for each kmem_cache. 

The reserve was set in a per-cpu structure so there was a "lag" time
before that information was available to other CPUs. Fine on smaller
machines but a bit more of a problem today. 

> You seem to
> have one for every slab.  I don't see the point.
> It is true that yours is in some sense more fair - but I'm not sure the
> complexity is worth it.
> 

More fairness was one of the objects.

> Was there some particular reason you made the change?
> 

This version survives under considerably more stress than Peter's
original version did without requiring the additional complexity of
memory reserves.

-- 
Mel Gorman
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: how to set vlan filter for intel 82599
From: Ben Hutchings @ 2011-04-26 13:31 UTC (permalink / raw)
  To: zhou rui; +Cc: netdev, e1000-devel
In-Reply-To: <BANLkTi=eADSYV90EC9uJhtLZfBBJa5O1Bw@mail.gmail.com>

On Tue, 2011-04-26 at 12:39 +0800, zhou rui wrote:
[...]
> i set the filter like below:
> 
> for a vlanid=50, it always match the last rule (action 7)
> 
> ./ethtool -K eth5 ntuple off
> ./ethtool -K eth5 ntuple on
> ./ethtool -U eth5 flow-type tcp4 vlan 32 vlan-mask 0xF01F action 1
> ./ethtool -U eth5 flow-type udp4 vlan 32 vlan-mask 0xF01F action 1
> ./ethtool -U eth5 flow-type udp4 vlan 64 vlan-mask 0xF01F action 7
> ./ethtool -U eth5 flow-type tcp4 vlan 64 vlan-mask 0xF01F action 7
> 
> I tried the latest ixgbe driver 3.3.9, it reports:
> 
> Cannot add new RX n-tuple filter: Operation not permitted
> 
> ./ethtool -V
> ethtool version 2.6.36

Check dmesg; there should be an error message there.  Of course the
error code should be EINVAL and not EPERM.

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply

* Re: [PATCH 13/13] mm: Account for the number of times direct reclaimers get throttled
From: NeilBrown @ 2011-04-26 12:35 UTC (permalink / raw)
  To: Mel Gorman; +Cc: Linux-MM, Linux-Netdev, LKML, David Miller, Peter Zijlstra
In-Reply-To: <1303803414-5937-14-git-send-email-mgorman@suse.de>

On Tue, 26 Apr 2011 08:36:54 +0100 Mel Gorman <mgorman@suse.de> wrote:

> Under significant pressure when writing back to network-backed storage,
> direct reclaimers may get throttled. This is expected to be a
> short-lived event and the processes get woken up again but processes do
> get stalled. This patch counts how many times such stalling occurs. It's
> up to the administrator whether to reduce these stalls by increasing
> min_free_kbytes.
> 
> Signed-off-by: Mel Gorman <mgorman@suse.de>
> ---
>  include/linux/vm_event_item.h |    1 +
>  mm/vmscan.c                   |    1 +
>  mm/vmstat.c                   |    1 +
>  3 files changed, 3 insertions(+), 0 deletions(-)
> 
> diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
> index 03b90cdc..652e5f3 100644
> --- a/include/linux/vm_event_item.h
> +++ b/include/linux/vm_event_item.h
> @@ -29,6 +29,7 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
>  		FOR_ALL_ZONES(PGSTEAL),
>  		FOR_ALL_ZONES(PGSCAN_KSWAPD),
>  		FOR_ALL_ZONES(PGSCAN_DIRECT),
> +		PGSCAN_DIRECT_THROTTLE,
>  #ifdef CONFIG_NUMA
>  		PGSCAN_ZONE_RECLAIM_FAILED,
>  #endif
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 8b6da2b..e88138b 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2154,6 +2154,7 @@ static void throttle_direct_reclaim(gfp_t gfp_mask, struct zonelist *zonelist,
>  		goto out;
>  
>  	/* Throttle */
> +	count_vm_event(PGSCAN_DIRECT_THROTTLE);
>  	do {
>  		schedule();
>  		finish_wait(&zone->zone_pgdat->pfmemalloc_wait, &wait);
> diff --git a/mm/vmstat.c b/mm/vmstat.c
> index a2b7344..5725387 100644
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -911,6 +911,7 @@ const char * const vmstat_text[] = {
>  	TEXTS_FOR_ZONES("pgsteal")
>  	TEXTS_FOR_ZONES("pgscan_kswapd")
>  	TEXTS_FOR_ZONES("pgscan_direct")
> +	"pgscan_direct_throttle",
>  
>  #ifdef CONFIG_NUMA
>  	"zone_reclaim_failed",

I like this approach.  Make the information available, but don't make a fuss
about it.

Actually, I like the whole series - I'm really having to dig deep to find
anything to complain about :-)

Feel free to put
   Reviewed-by: NeilBrown <neilb@suse.de>
against anything that I haven't commented on.

Thanks,
NeilBrown

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: [PATCH 12/13] mm: Throttle direct reclaimers if PF_MEMALLOC reserves are low and swap is backed by network storage
From: NeilBrown @ 2011-04-26 12:30 UTC (permalink / raw)
  To: Mel Gorman; +Cc: Linux-MM, Linux-Netdev, LKML, David Miller, Peter Zijlstra
In-Reply-To: <1303803414-5937-13-git-send-email-mgorman@suse.de>

On Tue, 26 Apr 2011 08:36:53 +0100 Mel Gorman <mgorman@suse.de> wrote:


> +/*
> + * Throttle direct reclaimers if backing storage is backed by the network
> + * and the PFMEMALLOC reserve for the preferred node is getting dangerously
> + * depleted. kswapd will continue to make progress and wake the processes
> + * when the low watermark is reached
> + */
> +static void throttle_direct_reclaim(gfp_t gfp_mask, struct zonelist *zonelist,
> +					nodemask_t *nodemask)
> +{
> +	struct zone *zone;
> +	int high_zoneidx = gfp_zone(gfp_mask);
> +	DEFINE_WAIT(wait);
> +
> +	/* Check if the pfmemalloc reserves are ok */
> +	first_zones_zonelist(zonelist, high_zoneidx, NULL, &zone);
> +	prepare_to_wait(&zone->zone_pgdat->pfmemalloc_wait, &wait,
> +							TASK_INTERRUPTIBLE);
> +	if (pfmemalloc_watermark_ok(zone->zone_pgdat, high_zoneidx))
> +		goto out;
> +
> +	/* Throttle */
> +	do {
> +		schedule();
> +		finish_wait(&zone->zone_pgdat->pfmemalloc_wait, &wait);
> +		prepare_to_wait(&zone->zone_pgdat->pfmemalloc_wait, &wait,
> +							TASK_INTERRUPTIBLE);
> +	} while (!pfmemalloc_watermark_ok(zone->zone_pgdat, high_zoneidx) &&
> +			!fatal_signal_pending(current));
> +
> +out:
> +	finish_wait(&zone->zone_pgdat->pfmemalloc_wait, &wait);
> +}

You are doing an interruptible wait, but only checking for fatal signals.
So if a non-fatal signal arrives, you will busy-wait.

So I suspect you want TASK_KILLABLE, so just use:

    wait_event_killable(zone->zone_pgdat->pfmemalloc_wait,
                        pgmemalloc_watermark_ok(zone->zone_pgdata,
                                                high_zoneidx));

(You also have an extraneous call to finish_wait)

NeilBrown


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: [PATCH 09/13] netvm: Set PF_MEMALLOC as appropriate during SKB processing
From: NeilBrown @ 2011-04-26 12:21 UTC (permalink / raw)
  To: Mel Gorman; +Cc: Linux-MM, Linux-Netdev, LKML, David Miller, Peter Zijlstra
In-Reply-To: <1303803414-5937-10-git-send-email-mgorman@suse.de>

On Tue, 26 Apr 2011 08:36:50 +0100 Mel Gorman <mgorman@suse.de> wrote:

> diff --git a/net/core/dev.c b/net/core/dev.c
> index 3871bf6..2d79a20 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -3095,6 +3095,27 @@ static void vlan_on_bond_hook(struct sk_buff *skb)
>  	}
>  }
>  
> +/*
> + * Limit which protocols can use the PFMEMALLOC reserves to those that are
> + * expected to be used for communication with swap.
> + */
> +static bool skb_pfmemalloc_protocol(struct sk_buff *skb)
> +{
> +	if (skb_pfmemalloc(skb))
> +		switch (skb->protocol) {
> +		case __constant_htons(ETH_P_ARP):
> +		case __constant_htons(ETH_P_IP):
> +		case __constant_htons(ETH_P_IPV6):
> +		case __constant_htons(ETH_P_8021Q):
> +			break;
> +
> +		default:
> +			return false;
> +		}
> +
> +	return true;
> +}

This sort of thing really bugs me :-)
Neither the comment nor the function name actually describe what the function
is doing.  The function is checking *2* things.
   is_pfmemalloc_skb_or_pfmemalloc_protocol()
might be a more correct name, but is too verbose.

I would prefer the skb_pfmemalloc test were removed from here and ....

> +	if (!skb_pfmemalloc_protocol(skb))
> +		goto drop;
> +

...added here so this becomes:

      if (!skb_pfmemalloc(skb) && !skb_pfmemalloc_protocol(skb))
                goto drop;

which actually makes sense.

Thanks,
NeilBrown

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: [PATCH 02/13] mm: sl[au]b: Add knowledge of PFMEMALLOC reserve pages
From: NeilBrown @ 2011-04-26 12:05 UTC (permalink / raw)
  To: Mel Gorman; +Cc: Linux-MM, Linux-Netdev, LKML, David Miller, Peter Zijlstra
In-Reply-To: <1303817628.5593.6.camel@machina.109elm.lan>

On Tue, 26 Apr 2011 12:33:48 +0100 Mel Gorman <mgorman@suse.de> wrote:

> On Tue, 2011-04-26 at 21:15 +1000, NeilBrown wrote:
> > On Tue, 26 Apr 2011 08:36:43 +0100 Mel Gorman <mgorman@suse.de> wrote:
> > 
> > > +bool gfp_pfmemalloc_allowed(gfp_t gfp_mask)
> > > +{
> > > +	return gfp_to_alloc_flags(gfp_mask) & ALLOC_PFMEMALLOC;
> > > +}
> > > +
> > >  static inline struct page *
> > >  __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
> > >  	struct zonelist *zonelist, enum zone_type high_zoneidx,
> > > @@ -2202,8 +2211,16 @@ nopage:
> > >  got_pg:
> > >  	if (kmemcheck_enabled)
> > >  		kmemcheck_pagealloc_alloc(page, order, gfp_mask);
> > > -	return page;
> > >  
> > > +	/*
> > > +	 * page->pfmemalloc is set when the caller had PFMEMALLOC set or is
> > > +	 * been OOM killed. The expectation is that the caller is taking
> > > +	 * steps that will free more memory. The caller should avoid the
> > > +	 * page being used for !PFMEMALLOC purposes.
> > > +	 */
> > > +	page->pfmemalloc = (alloc_flags & ALLOC_PFMEMALLOC);
> > > +
> > > +	return page;
> > 
> > Linus doesn't seem to be a fan of this construct:
> >    https://lkml.org/lkml/2011/4/1/255
> > 
> 
> There is confusion around this topic. Andrew prefers bool for true/false
> values and it's self-documenting. I tend to prefer it myself for
> readability and there is a slow conversion in the VM from
> ints-used-as-bools to bools and my understanding of bool is that any
> non-zero value will be treated as true (just as it is for int).
> 
> > pfmemalloc is a bool, and the value on the right is either 0 or 0x1000.
> > 
> > If bool happens to be typedefed to 'char' or even 'short', pfmemalloc would
> > always be set to 0.
> 
> It is typedeffed as _Bool though which I thought was able to handle the
> cast appropriately or is that wrong?

Yes, I too believe that _Bool does the right thing, so this particular usage
does happen to be safe in the kernel.  But there is a long tradition of
'bool' being typedefed to an int type so the usage can look wrong.  So maybe
it is best avoided.

I have no strong feeling either way - I just thought I would highlight it.
In general, I like bool, but I don't like automatic casts (I often use
  if (pointer != NULL) ...
because I think it reads better).

Thanks,
NeilBrown



> 
> > Ditto for the gfp_pfmemalloc_allowed function.
> > 
> > Prefixing with '!!' would make it safe.
> > 
> 
> Will do that to avoid any oddities. Thanks
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* [PATCH] dsa/mv88e6131: fix unknown multicast/broadcast forwarding on mv88e6085
From: Peter Korsgaard @ 2011-04-26 11:45 UTC (permalink / raw)
  To: davem, buytenh, netdev; +Cc: Peter Korsgaard

The 88e6085 has a few differences from the other devices in the port
control registers, causing unknown multicast/broadcast packets to get
dropped when using the standard port setup.

At the same time update kconfig to clarify that the mv88e6085 is now
supported.

Signed-off-by: Peter Korsgaard <jacmet@sunsite.dk>
---
 net/dsa/Kconfig     |    4 ++--
 net/dsa/mv88e6131.c |   26 +++++++++++++++++++++-----
 2 files changed, 23 insertions(+), 7 deletions(-)

diff --git a/net/dsa/Kconfig b/net/dsa/Kconfig
index 87bb5f4..c53ded2 100644
--- a/net/dsa/Kconfig
+++ b/net/dsa/Kconfig
@@ -41,12 +41,12 @@ config NET_DSA_MV88E6XXX_NEED_PPU
 	default n
 
 config NET_DSA_MV88E6131
-	bool "Marvell 88E6095/6095F/6131 ethernet switch chip support"
+	bool "Marvell 88E6085/6095/6095F/6131 ethernet switch chip support"
 	select NET_DSA_MV88E6XXX
 	select NET_DSA_MV88E6XXX_NEED_PPU
 	select NET_DSA_TAG_DSA
 	---help---
-	  This enables support for the Marvell 88E6095/6095F/6131
+	  This enables support for the Marvell 88E6085/6095/6095F/6131
 	  ethernet switch chips.
 
 config NET_DSA_MV88E6123_61_65
diff --git a/net/dsa/mv88e6131.c b/net/dsa/mv88e6131.c
index 3da4188..45f7411 100644
--- a/net/dsa/mv88e6131.c
+++ b/net/dsa/mv88e6131.c
@@ -207,8 +207,15 @@ static int mv88e6131_setup_port(struct dsa_switch *ds, int p)
 	 * mode, but do not enable forwarding of unknown unicasts.
 	 */
 	val = 0x0433;
-	if (p == dsa_upstream_port(ds))
+	if (p == dsa_upstream_port(ds)) {
 		val |= 0x0104;
+		/*
+		 * On 6085, unknown multicast forward is controlled
+		 * here rather than in Port Control 2 register.
+		 */
+		if (ps->id == ID_6085)
+			val |= 0x0008;
+	}
 	if (ds->dsa_port_mask & (1 << p))
 		val |= 0x0100;
 	REG_WRITE(addr, 0x04, val);
@@ -251,10 +258,19 @@ static int mv88e6131_setup_port(struct dsa_switch *ds, int p)
 	 * If this is the upstream port for this switch, enable
 	 * forwarding of unknown multicast addresses.
 	 */
-	val = 0x0080 | dsa_upstream_port(ds);
-	if (p == dsa_upstream_port(ds))
-		val |= 0x0040;
-	REG_WRITE(addr, 0x08, val);
+	if (ps->id == ID_6085)
+		/*
+		 * on 6085, bits 3:0 are reserved, bit 6 control ARP
+		 * mirroring, and multicast forward is handled in
+		 * Port Control register.
+		 */
+		REG_WRITE(addr, 0x08, 0x0080);
+	else {
+		val = 0x0080 | dsa_upstream_port(ds);
+		if (p == dsa_upstream_port(ds))
+			val |= 0x0040;
+		REG_WRITE(addr, 0x08, val);
+	}
 
 	/*
 	 * Rate Control: disable ingress rate limiting.
-- 
1.7.2.3


^ permalink raw reply related

* Re: [PATCH 02/13] mm: sl[au]b: Add knowledge of PFMEMALLOC reserve pages
From: NeilBrown @ 2011-04-26 11:37 UTC (permalink / raw)
  To: Mel Gorman; +Cc: Linux-MM, Linux-Netdev, LKML, David Miller, Peter Zijlstra
In-Reply-To: <1303803414-5937-3-git-send-email-mgorman@suse.de>

On Tue, 26 Apr 2011 08:36:43 +0100 Mel Gorman <mgorman@suse.de> wrote:

> +		/*
> +		 * If there are full empty slabs and we were not forced to
> +		 * allocate a slab, mark this one !pfmemalloc
> +		 */
> +		l3 = cachep->nodelists[numa_mem_id()];
> +		if (!list_empty(&l3->slabs_free) && force_refill) {
> +			struct slab *slabp = virt_to_slab(objp);
> +			slabp->pfmemalloc = false;
> +			clear_obj_pfmemalloc(&objp);
> +			check_ac_pfmemalloc(cachep, ac);
> +			return objp;
> +		}

The comment doesn't match the code.  I think you need to remove the words
"full" and "not" assuming the code is correct which it probably is...

But the code seems to be much more complex than Peter's original, and I don't
see the gain.

Peter's code had only one 'reserved' flag for each kmem_cache.  You seem to
have one for every slab.  I don't see the point.
It is true that yours is in some sense more fair - but I'm not sure the
complexity is worth it.

Was there some particular reason you made the change?

NeilBrown

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: [PATCH 02/13] mm: sl[au]b: Add knowledge of PFMEMALLOC reserve pages
From: Mel Gorman @ 2011-04-26 11:33 UTC (permalink / raw)
  To: NeilBrown; +Cc: Linux-MM, Linux-Netdev, LKML, David Miller, Peter Zijlstra
In-Reply-To: <20110426211500.02d6a5a6@notabene.brown>

On Tue, 2011-04-26 at 21:15 +1000, NeilBrown wrote:
> On Tue, 26 Apr 2011 08:36:43 +0100 Mel Gorman <mgorman@suse.de> wrote:
> 
> > +bool gfp_pfmemalloc_allowed(gfp_t gfp_mask)
> > +{
> > +	return gfp_to_alloc_flags(gfp_mask) & ALLOC_PFMEMALLOC;
> > +}
> > +
> >  static inline struct page *
> >  __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
> >  	struct zonelist *zonelist, enum zone_type high_zoneidx,
> > @@ -2202,8 +2211,16 @@ nopage:
> >  got_pg:
> >  	if (kmemcheck_enabled)
> >  		kmemcheck_pagealloc_alloc(page, order, gfp_mask);
> > -	return page;
> >  
> > +	/*
> > +	 * page->pfmemalloc is set when the caller had PFMEMALLOC set or is
> > +	 * been OOM killed. The expectation is that the caller is taking
> > +	 * steps that will free more memory. The caller should avoid the
> > +	 * page being used for !PFMEMALLOC purposes.
> > +	 */
> > +	page->pfmemalloc = (alloc_flags & ALLOC_PFMEMALLOC);
> > +
> > +	return page;
> 
> Linus doesn't seem to be a fan of this construct:
>    https://lkml.org/lkml/2011/4/1/255
> 

There is confusion around this topic. Andrew prefers bool for true/false
values and it's self-documenting. I tend to prefer it myself for
readability and there is a slow conversion in the VM from
ints-used-as-bools to bools and my understanding of bool is that any
non-zero value will be treated as true (just as it is for int).

> pfmemalloc is a bool, and the value on the right is either 0 or 0x1000.
> 
> If bool happens to be typedefed to 'char' or even 'short', pfmemalloc would
> always be set to 0.

It is typedeffed as _Bool though which I thought was able to handle the
cast appropriately or is that wrong?

> Ditto for the gfp_pfmemalloc_allowed function.
> 
> Prefixing with '!!' would make it safe.
> 

Will do that to avoid any oddities. Thanks

-- 
Mel Gorman
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: [PATCH 02/13] mm: sl[au]b: Add knowledge of PFMEMALLOC reserve pages
From: NeilBrown @ 2011-04-26 11:15 UTC (permalink / raw)
  To: Mel Gorman; +Cc: Linux-MM, Linux-Netdev, LKML, David Miller, Peter Zijlstra
In-Reply-To: <1303803414-5937-3-git-send-email-mgorman@suse.de>

On Tue, 26 Apr 2011 08:36:43 +0100 Mel Gorman <mgorman@suse.de> wrote:

> +bool gfp_pfmemalloc_allowed(gfp_t gfp_mask)
> +{
> +	return gfp_to_alloc_flags(gfp_mask) & ALLOC_PFMEMALLOC;
> +}
> +
>  static inline struct page *
>  __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
>  	struct zonelist *zonelist, enum zone_type high_zoneidx,
> @@ -2202,8 +2211,16 @@ nopage:
>  got_pg:
>  	if (kmemcheck_enabled)
>  		kmemcheck_pagealloc_alloc(page, order, gfp_mask);
> -	return page;
>  
> +	/*
> +	 * page->pfmemalloc is set when the caller had PFMEMALLOC set or is
> +	 * been OOM killed. The expectation is that the caller is taking
> +	 * steps that will free more memory. The caller should avoid the
> +	 * page being used for !PFMEMALLOC purposes.
> +	 */
> +	page->pfmemalloc = (alloc_flags & ALLOC_PFMEMALLOC);
> +
> +	return page;

Linus doesn't seem to be a fan of this construct:
   https://lkml.org/lkml/2011/4/1/255

pfmemalloc is a bool, and the value on the right is either 0 or 0x1000.

If bool happens to be typedefed to 'char' or even 'short', pfmemalloc would
always be set to 0.
Ditto for the gfp_pfmemalloc_allowed function.

Prefixing with '!!' would make it safe.

NeilBrown

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: [PATCH net-2.6 4/4] xfrm: Fix integer underrun on zero sized replay windows
From: Steffen Klassert @ 2011-04-26 10:58 UTC (permalink / raw)
  To: David Miller, Herbert Xu; +Cc: netdev
In-Reply-To: <20110426054232.GI5495@secunet.com>

On Tue, Apr 26, 2011 at 07:42:32AM +0200, Steffen Klassert wrote:
> The check if the replay window is contained within one subspace or
> spans over two subspaces causes an unwanted integer underrun on
> zero sized replay windows when we subtract minus one. We fix this by
> changeing this check to avoid the subtraction.
> 

Don't apply this one, it does not fix the issue completely.
I'll send a better one, sorry.

^ permalink raw reply

* Re: [PATCH 03/13] mm: Introduce __GFP_MEMALLOC to allow access to emergency reserves
From: NeilBrown @ 2011-04-26 10:53 UTC (permalink / raw)
  To: Mel Gorman; +Cc: Linux-MM, Linux-Netdev, LKML, David Miller, Peter Zijlstra
In-Reply-To: <20110426103646.GD4658@suse.de>

On Tue, 26 Apr 2011 11:36:46 +0100 Mel Gorman <mgorman@suse.de> wrote:

> On Tue, Apr 26, 2011 at 07:49:47PM +1000, NeilBrown wrote:

> > Maybe a
> >    WARN_ON((gfp_mask & __GFP_MEMALLOC) && (gfp_mask & __GFP_NOMEMALLOC));
> > might be wise?
> > 
> 
> Both MEMALLOC and NOMEMALLOC are related to PFMEMALLOC reserves so
> it's reasonable for them to have similar names. This warning will
> also trigger because it's a combination of flags that does happen.
> 
> Consider for example
> 
> any interrupt
>   -> __netdev_alloc_skb		(mask == GFP_ATOMIC)
>     -> __alloc_skb		(mask == GFP_ATOMIC)
>        if (sk_memalloc_socks() && (flags & SKB_ALLOC_RX))
>                gfp_mask |= __GFP_MEMALLOC;
> 				(mask == GFP_ATOMIC|__GFP_NOMEMALLOC)
>       -> __kmalloc_reserve
> 		First attempt tries to avoid reserves so adds __GFP_MEMALLOC
> 				(mask == GFP_ATOMIC|__GFP_NOMEMALLOC|__GFP_MEMALLOC)
> 

You have the "NO"s mixed up a bit which confused me for a while :-)
But I see your point - I guess the WARN_ON isn't really needed.


> You're right in that __GFP_NOMEMALLOC overrides __GFP_MEMALLOC so that
> could do with a note.
> 

Thanks,

NeilBrown

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox