Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH 4/5] page allocator: Pre-emptively wake kswapd when high-order watermarks are hit
From: Mel Gorman @ 2009-10-23  9:13 UTC (permalink / raw)
  To: David Rientjes
  Cc: Frans Pop, Jiri Kosina, Sven Geggus, Karol Lewandowski,
	Tobias Oetiker, Rafael J. Wysocki, David Miller, Reinette Chatre,
	Kalle Valo, KOSAKI Motohiro, Mohamed Abbas, Jens Axboe,
	John W. Linville, Pekka Enberg, Bartlomiej Zolnierkiewicz,
	Greg Kroah-Hartman, Stephan von Krawczynski, Kernel Testers List,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org
In-Reply-To: <alpine.DEB.2.00.0910221227010.21601-X6Q0R45D7oAcqpCFd4KODRPsWskHk0ljAL8bYrjMMd8@public.gmane.org>

On Thu, Oct 22, 2009 at 12:41:42PM -0700, David Rientjes wrote:
> On Thu, 22 Oct 2009, Mel Gorman wrote:
> 
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 7f2aa3e..851df40 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -1596,6 +1596,17 @@ try_next_zone:
> >  	return page;
> >  }
> >  
> > +static inline
> > +void wake_all_kswapd(unsigned int order, struct zonelist *zonelist,
> > +						enum zone_type high_zoneidx)
> > +{
> > +	struct zoneref *z;
> > +	struct zone *zone;
> > +
> > +	for_each_zone_zonelist(zone, z, zonelist, high_zoneidx)
> > +		wakeup_kswapd(zone, order);
> > +}
> > +
> >  static inline int
> >  should_alloc_retry(gfp_t gfp_mask, unsigned int order,
> >  				unsigned long pages_reclaimed)
> > @@ -1730,18 +1741,18 @@ __alloc_pages_high_priority(gfp_t gfp_mask, unsigned int order,
> >  			congestion_wait(BLK_RW_ASYNC, HZ/50);
> >  	} while (!page && (gfp_mask & __GFP_NOFAIL));
> >  
> > -	return page;
> > -}
> > -
> > -static inline
> > -void wake_all_kswapd(unsigned int order, struct zonelist *zonelist,
> > -						enum zone_type high_zoneidx)
> > -{
> > -	struct zoneref *z;
> > -	struct zone *zone;
> > +	/*
> > +	 * If after a high-order allocation we are now below watermarks,
> > +	 * pre-emptively kick kswapd rather than having the next allocation
> > +	 * fail and have to wake up kswapd, potentially failing GFP_ATOMIC
> > +	 * allocations or entering direct reclaim
> > +	 */
> > +	if (unlikely(order) && page && !zone_watermark_ok(preferred_zone, order,
> > +				preferred_zone->watermark[ALLOC_WMARK_LOW],
> > +				zone_idx(preferred_zone), ALLOC_WMARK_LOW))
> > +		wake_all_kswapd(order, zonelist, high_zoneidx);
> >  
> > -	for_each_zone_zonelist(zone, z, zonelist, high_zoneidx)
> > -		wakeup_kswapd(zone, order);
> > +	return page;
> >  }
> >  
> >  static inline int
> 
> Hmm, is this really supposed to be added to __alloc_pages_high_priority()?  
> By the patch description I was expecting kswapd to be woken up 
> preemptively whenever the preferred zone is below ALLOC_WMARK_LOW and 
> we're known to have just allocated at a higher order, not just when 
> current was oom killed (when we should already be freeing a _lot_ of 
> memory soon) or is doing a higher order allocation during direct reclaim.
> 

It was a somewhat arbitrary choice to have it trigger in the event high
priority allocations were happening frequently.

> For the best coverage, it would have to be add the branch to the fastpath.  

Agreed - specifically at the end of __alloc_pages_nodemask()

> That seems fine for a debugging aid and to see if progress is being made 
> on the GFP_ATOMIC allocation issues, but doesn't seem like it should make 
> its way to mainline, the subsequent GFP_ATOMIC allocation could already be 
> happening and in the page allocator's slowpath at this point that this 
> wakeup becomes unnecessary.
> 
> If this is moved to the fastpath, why is this wake_all_kswapd() and not
> wakeup_kswapd(preferred_zone, order)?  Do we need to kick kswapd in all 
> zones even though they may be free just because preferred_zone is now 
> below the watermark?
> 

It probably makes no difference as zones are checked for their watermarks
before any real work happens. However, even if this patch makes a difference,
I don't want to see it merged.  At best, it is an extremely heavy-handed
hack which is why I asked for it to be tested in isolation. It shouldn't
be necessary at all because sort of pre-emptive waking of kswapd was never
necessary before.

> Wouldn't it be better to do this on page_zone(page) instead of 
> preferred_zone anyway?
> 

No. The preferred_zone is the zone we should be allocating from. If we
failed to allocate from it, it implies the watermarks are not being met
so we want to wake it.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply

* Re: [PATCH 4/5] page allocator: Pre-emptively wake kswapd when high-order watermarks are hit
From: David Rientjes @ 2009-10-23  9:36 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Frans Pop, Jiri Kosina, Sven Geggus, Karol Lewandowski,
	Tobias Oetiker, Rafael J. Wysocki, David Miller, Reinette Chatre,
	Kalle Valo, KOSAKI Motohiro, Mohamed Abbas, Jens Axboe,
	John W. Linville, Pekka Enberg, Bartlomiej Zolnierkiewicz,
	Greg Kroah-Hartman, Stephan von Krawczynski, Kernel Testers List,
	netdev, linux-kernel
In-Reply-To: <20091023091334.GV11778@csn.ul.ie>

On Fri, 23 Oct 2009, Mel Gorman wrote:

> > Hmm, is this really supposed to be added to __alloc_pages_high_priority()?  
> > By the patch description I was expecting kswapd to be woken up 
> > preemptively whenever the preferred zone is below ALLOC_WMARK_LOW and 
> > we're known to have just allocated at a higher order, not just when 
> > current was oom killed (when we should already be freeing a _lot_ of 
> > memory soon) or is doing a higher order allocation during direct reclaim.
> > 
> 
> It was a somewhat arbitrary choice to have it trigger in the event high
> priority allocations were happening frequently.
> 

I don't quite understand, users of PF_MEMALLOC shouldn't be doing these 
higher order allocations and if ALLOC_NO_WATERMARKS is by way of the oom 
killer, we should be freeing a substantial amount of memory imminently 
when it exits that waking up kswapd would be irrelevant.

> > If this is moved to the fastpath, why is this wake_all_kswapd() and not
> > wakeup_kswapd(preferred_zone, order)?  Do we need to kick kswapd in all 
> > zones even though they may be free just because preferred_zone is now 
> > below the watermark?
> > 
> 
> It probably makes no difference as zones are checked for their watermarks
> before any real work happens. However, even if this patch makes a difference,
> I don't want to see it merged.  At best, it is an extremely heavy-handed
> hack which is why I asked for it to be tested in isolation. It shouldn't
> be necessary at all because sort of pre-emptive waking of kswapd was never
> necessary before.
> 

Ahh, that makes a ton more sense: this particular patch is a debugging 
effort while the first two are candidates for 2.6.32 and -stable.  Gotcha.

> > Wouldn't it be better to do this on page_zone(page) instead of 
> > preferred_zone anyway?
> > 
> 
> No. The preferred_zone is the zone we should be allocating from. If we
> failed to allocate from it, it implies the watermarks are not being met
> so we want to wake it.
> 

Oops, I'm even more confused now :)  I thought the existing 
wake_all_kswapd() in the slowpath was doing that and that this patch was 
waking them prematurely because it speculates that a subsequent high 
order allocation will fail unless memory is reclaimed.  I thought we'd  
want to reclaim from the zone we just did a high order allocation from so 
that the fastpath could find the memory next time with ALLOC_WMARK_LOW.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: VLAN and ARP failure on tg3 drivers
From: Benny Amorsen @ 2009-10-23  9:12 UTC (permalink / raw)
  To: Gertjan Hofman; +Cc: netdev
In-Reply-To: <93821.71108.qm@web32607.mail.mud.yahoo.com>

Gertjan Hofman <gertjan_hofman@yahoo.com> writes:

> Dear Kernel developers,
>
> A couple of weeks ago we tried to migrate from a 2.6.24  kernel to a
> 2.6.29 kernel and noticed our VLAN application no longer works.  The
> problem is easy to replicate:
>
> 1. connect 2 PC's with a cross-over cable
> 2. set up a fixed IP address to both PC's  (say 192.168.0.[1,2])
> 3. create a vlan:  vconfig  add eth0 0.

VLAN 0 is a special case ("priority tag only"). If you're just trying to
use VLAN's, pick a different number. Also avoid 1001-1024 and 4095 if
you add switches to the mix, because some vendors have odd ideas.


/Benny


^ permalink raw reply

* [PATCH 23/10] Optimize the upload speed for PPP connection.
From: fangxiaozhi 00110321 @ 2009-10-23  9:47 UTC (permalink / raw)
  To: netdev; +Cc: kernel, zihan, greg, haegar

From: fangxiaozhi <huananhu@huawei.com>
1. This patch is based on the kernel of 2.6.32-rc4 
2. In this patch, we enlarge the out buffer size to optimize the upload speed for the ppp connection. Then it can support the upload of HSUPA data cards.
Signed-off-by: fangxiaozhi <huananhu@huawei.com>
-----------------------------------------------------------------------------------------
--- a/drivers/net/ppp_async.c	2009-10-12 05:43:56.000000000 +0800
+++ b/drivers/net/ppp_async.c	2009-10-15 16:29:56.000000000 +0800
@@ -36,7 +36,7 @@
 
 #define PPP_VERSION	"2.4.2"
 
-#define OBUFSIZE	256
+#define OBUFSIZE	2048
 
 /* Structure for storing local state. */
 struct asyncppp {

******************************************************************************************
 This email and its attachments contain confidential information from HUAWEI, which is intended only for the person or entity whose address is listed above. Any use of the information contained here in any way (including, but not limited to, total or partial disclosure, reproduction, or dissemination) by persons other than the intended recipient(s) is prohibited. If you receive this email in error, please notify the sender by phone or email
 immediately and delete it!
 *****************************************************************************************

^ permalink raw reply

* Re: bridging + load balancing bonding
From: Jasper Spaans @ 2009-10-23  9:51 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev@vger.kernel.org
In-Reply-To: <4AE16F83.7080400@gmail.com>

On Fri, Oct 23, 2009 at 10:55:31AM +0200, Eric Dumazet wrote:

> Dont you think special attention is needed for multicast/broadcast trafic
> (they should be sent to both IDS) ?

Not really -- AFAIK we're currently not using the information encapsulated
in broadcast traffic to judge the unicast packets. Besides, we're
aggregating the results from the IDSes, so it shouldn't matter on which node
a packet is processed.

But if the need arises, this could be added quite easily to the code.

Jasper
-- 
Fox-IT Experts in IT Security!
T: +31 (0) 15 284 79 99
KvK Haaglanden 27301624

^ permalink raw reply

* Re: bridging + load balancing bonding
From: Eric Dumazet @ 2009-10-23  9:54 UTC (permalink / raw)
  To: Jasper Spaans; +Cc: netdev@vger.kernel.org
In-Reply-To: <20091023095137.GA22424@spaans.fox.local>

Jasper Spaans a écrit :
> On Fri, Oct 23, 2009 at 10:55:31AM +0200, Eric Dumazet wrote:
> 
>> Dont you think special attention is needed for multicast/broadcast trafic
>> (they should be sent to both IDS) ?
> 
> Not really -- AFAIK we're currently not using the information encapsulated
> in broadcast traffic to judge the unicast packets. Besides, we're
> aggregating the results from the IDSes, so it shouldn't matter on which node
> a packet is processed.
> 
> But if the need arises, this could be added quite easily to the code.
> 

OK

I found following very interesting article that can give pretty good advices

http://lwn.net/Articles/145406/


^ permalink raw reply

* Re: [PATCH 2/5] page allocator: Do not allow interrupts to use ALLOC_HARDER
From: Stephan von Krawczynski @ 2009-10-23  9:57 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Frans Pop, Jiri Kosina, Sven Geggus, Karol Lewandowski,
	Tobias Oetiker, Rafael J. Wysocki, David Miller, Reinette Chatre,
	Kalle Valo, David Rientjes, KOSAKI Motohiro, Mohamed Abbas,
	Jens Axboe, John W. Linville, Pekka Enberg,
	Bartlomiej Zolnierkiewicz, Greg Kroah-Hartman,
	Kernel Testers List, netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org
In-Reply-To: <20091022163752.GU11778-wPRd99KPJ+uzQB+pC5nmwQ@public.gmane.org>

On Thu, 22 Oct 2009 17:37:52 +0100
Mel Gorman <mel-wPRd99KPJ+uzQB+pC5nmwQ@public.gmane.org> wrote:

> In this case, it's ok. It's just a harmless heads-up that the kernel
> looks slightly different than expected. I posted a 2.6.31.4 version of
> the two patches that cause real problems.

After around 12 hours of runtime with patch 1/5 and 2/5 on 2.6.31.4 I can see
no page allocation failure messages so far. I'll keep you informed.

-- 
Regards,
Stephan

^ permalink raw reply

* DM9000: Fix revision ID for DM9000B
From: Ben Dooks @ 2009-10-23 10:26 UTC (permalink / raw)
  To: netdev; +Cc: Simtec Linux Team

[-- Attachment #1: dm9000-fix-revb-id.patch --]
[-- Type: text/plain, Size: 841 bytes --]

The DM9000B revision ID is 0x1A, not 0x1B as set in the curernt dm9000.h
header.

Fix bug reported by Paolo Zebelloni.

Signed-off-by: Ben Dooks <ben@simtec.co.uk>
Signed-off-by: Simtec Linux Team <linux@simtec.co.uk>
---
 drivers/net/dm9000.h |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: b/drivers/net/dm9000.h
===================================================================
--- a/drivers/net/dm9000.h	2009-10-22 12:33:48.000000000 +0100
+++ b/drivers/net/dm9000.h	2009-10-22 12:33:53.000000000 +0100
@@ -50,7 +50,7 @@
 #define DM9000_RCSR	       0x32
 
 #define CHIPR_DM9000A	       0x19
-#define CHIPR_DM9000B	       0x1B
+#define CHIPR_DM9000B	       0x1A
 
 #define DM9000_MRCMDX          0xF0
 #define DM9000_MRCMD           0xF2

-- 
Ben (ben@fluff.org, http://www.fluff.org/)

  'a smiley only costs 4 bytes'

^ permalink raw reply

* [PATCH] e1000: the power down when running ifdown command
From: Naohiro Ooiwa @ 2009-10-23 10:29 UTC (permalink / raw)
  To: Jeff Kirsher, Jesse Brandeburg, Bruce Allan, PJ Waskiewicz, Joh
  Cc: netdev@vger.kernel.org, svaidy, e1000-devel, linux-kernel-owner,
	akpm

I resend the mail to maintainers of the e1000 driver.
and Andrew CC-ed.

-------- Original Message --------

Hi all

I'm trying to modify e1000 driver for power saving.

The e1000 driver doesn't let the power down when running ifdown command.
So, I set to the D3hot state of a PCI device at the end of e1000_close().

With this modification, e1000 driver reduces power by ifdown
to the same level as link-down case.

I spoke it in Collaboration Summit 2009.
For details and result of power measurement,
please refer the 36-38 page of following document.
http://events.linuxfoundation.org/archive/lfcs09_ooiwa.pdf

Could you please check the my patch ?

Should I consider WOL ?
Dosen't E1000_PCI_POWER_SAVE need ?


Thanks you.
Naohiro Ooiwa

Signed-off-by: Naohiro Ooiwa <nooiwa@miraclelinux.com>
---
 drivers/net/e1000/e1000_main.c |   32 ++++++++++++++++++++++++++++++++
 1 files changed, 32 insertions(+), 0 deletions(-)

diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c
index bcd192c..12e1a42 100644
--- a/drivers/net/e1000/e1000_main.c
+++ b/drivers/net/e1000/e1000_main.c
@@ -26,6 +26,11 @@

 *******************************************************************************/

+/*
+ * define this if you want pci save power while ifdown.
+ */
+#define E1000_PCI_POWER_SAVE
+
 #include "e1000.h"
 #include <net/ip6_checksum.h>

@@ -1248,6 +1253,23 @@ static int e1000_open(struct net_device *netdev)
 	struct e1000_hw *hw = &adapter->hw;
 	int err;

+#ifdef E1000_PCI_POWER_SAVE
+	struct pci_dev *pdev = adapter->pdev;
+
+	pci_set_power_state(pdev, PCI_D0);
+	pci_restore_state(pdev);
+
+	if (adapter->need_ioport)
+		err = pci_enable_device(pdev);
+	else
+		err = pci_enable_device_mem(pdev);
+	if (err) {
+		printk(KERN_ERR "e1000: Cannot enable PCI device from power-save\n");
+		return err;
+	}
+	pci_set_master(pdev);
+#endif
+
 	/* disallow open during test */
 	if (test_bit(__E1000_TESTING, &adapter->flags))
 		return -EBUSY;
@@ -1265,6 +1287,7 @@ static int e1000_open(struct net_device *netdev)
 		goto err_setup_rx;

 	e1000_power_up_phy(adapter);
+	e1000_reset(adapter);

 	adapter->mng_vlan_id = E1000_MNG_VLAN_NONE;
 	if ((hw->mng_cookie.status &
@@ -1341,6 +1364,15 @@ static int e1000_close(struct net_device *netdev)
 		e1000_vlan_rx_kill_vid(netdev, adapter->mng_vlan_id);
 	}

+#ifdef E1000_PCI_POWER_SAVE
+#ifdef CONFIG_PM
+	pci_save_state(adapter->pdev);
+#endif
+	pci_disable_device(adapter->pdev);
+	pci_wake_from_d3(adapter->pdev, true);
+	pci_set_power_state(adapter->pdev, PCI_D3hot);
+#endif
+
 	return 0;
 }

-- 
1.5.4.1

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


^ permalink raw reply related

* Re: [PATCH] net: Fix RPF to work with policy routing
From: jamal @ 2009-10-23 10:51 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, atis, eric.dumazet, zenczykowski
In-Reply-To: <20091022.214943.105371652.davem@davemloft.net>

On Thu, 2009-10-22 at 21:49 -0700, David Miller wrote:

> Such a change has a built-in assumption, I think, that
> marks are symmetric.

Only if the admin said they are symetric (by jumping
a few hoops).  In otherwise, I may intentionaly want
them to be symetric and use policy routing. I cant today.

> Just because we ended up with mark X on input doesn't mean
> that the reverse path route exists with mark X too.
>
> In fact I can't even see a valid way to specify a mark for
> the RPF lookup.

with the ipt or skbedit actions or via netfilter i could
set marks which could be as trivial as "set mark X if packet 
came in via eth0 or eth1 and mark Y if they came in via gre0"

> Maybe you can convince me otherwise :-)

Ok, let me try ;-> 

First let me say that it is _non-trivial_ for a packet to be 
policy-routing-RPF-dropped even with this patch. Youd have to do at
least 3 things to achieve that goal:
a) mark the packet on ingress 
b) have a  blackhole route in the policy routing table as the fall
through match and 
c) turn on rpf.

If someone goes that far to install policies, their intent could be
judged to mean they are serious about using RPF with policy routing.
If any of the above 3 conditions are not true then things continue to
work as is today.
IOW, if i set all those 3 conditions above then my expectation is the
system should not let in a packet if the policy routing table says no.
My intent is not to use the main table or some default route in the main
table; i really meant to use that policy routing table.

cheers,
jamal


^ permalink raw reply

* [RFC] [PATCH] cxgb3: Set the rxq
From: Krishna Kumar @ 2009-10-23 11:13 UTC (permalink / raw)
  To: davem; +Cc: netdev, divy, Krishna Kumar

From: Krishna Kumar <krkumar2@in.ibm.com>

Set the rxq# for LRO when processing the last fragment of a
frame. This helps in fast txq selection for routing workloads.

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
---

 drivers/net/cxgb3/sge.c |    1 +
 1 file changed, 1 insertion(+)

diff -ruNp org/drivers/net/cxgb3/sge.c new/drivers/net/cxgb3/sge.c
--- org/drivers/net/cxgb3/sge.c	2009-10-19 11:58:16.000000000 +0530
+++ new/drivers/net/cxgb3/sge.c	2009-10-20 09:40:40.000000000 +0530
@@ -2135,6 +2135,7 @@ static void lro_add_page(struct adapter 
 	if (!complete)
 		return;
 
+	skb_record_rx_queue(skb, qs - &adap->sge.qs[0]);
 	skb->ip_summed = CHECKSUM_UNNECESSARY;
 	cpl = qs->lro_va;
 

^ permalink raw reply

* [RFC] [PATCH] udp: Don't save dst in udpv6_sendmsg()
From: Krishna Kumar @ 2009-10-23 11:13 UTC (permalink / raw)
  To: davem; +Cc: netdev, Krishna Kumar

From: Krishna Kumar <krkumar2@in.ibm.com>

Since ip6_datagram_connect saves the dst entry, it is not
required to do the same in every iteration of udpv6_sendmsg.
It also breaks the txq# caching, which then gets updated
every time in dev_pick_tx only to get reset again here.
Update dst only if ip6_sk_dst_lookup changed the dst entry.

Performance: I ran netperf UDPv6 RR to use connected sockets.
Tested with a 70 min run, aggregate of 5 netperf runs for
each result.

------------------------  UDPv6 RR Test  ---------------------
#procs  Org TPS     New TPS (%)       Org SD    New SD (%)
--------------------------------------------------------------
1       119031      118793 (-0.19)      82         77 (-5.23)
2       217572      218607 (0.47)      315        300 (-4.65)
4       258463      258823 (0.13)     1334       1248 (-6.44)
8       318018      319425 (0.44)     5735       5500 (-4.09)
10      395531      401529 (1.51)     9901       9507 (-3.98)
12      453319      453492 (0.03)    15473       15131 (-2.21)
--------------------------------------------------------------

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
---
 net/ipv6/udp.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff -ruNp org/net/ipv6/udp.c new/net/ipv6/udp.c
--- org/net/ipv6/udp.c	2009-10-19 11:58:16.000000000 +0530
+++ new/net/ipv6/udp.c	2009-10-23 10:42:35.000000000 +0530
@@ -990,7 +990,8 @@ do_append_data:
 
 	if (dst) {
 		if (connected) {
-			ip6_dst_store(sk, dst,
+			if (__sk_dst_get(sk) != dst)
+				ip6_dst_store(sk, dst,
 				      ipv6_addr_equal(&fl.fl6_dst, &np->daddr) ?
 				      &np->daddr : NULL,
 #ifdef CONFIG_IPV6_SUBTREES

^ permalink raw reply

* Re: [PATCH 4/5] page allocator: Pre-emptively wake kswapd when high-order watermarks are hit
From: Mel Gorman @ 2009-10-23 11:25 UTC (permalink / raw)
  To: David Rientjes
  Cc: Frans Pop, Jiri Kosina, Sven Geggus, Karol Lewandowski,
	Tobias Oetiker, Rafael J. Wysocki, David Miller, Reinette Chatre,
	Kalle Valo, KOSAKI Motohiro, Mohamed Abbas, Jens Axboe,
	John W. Linville, Pekka Enberg, Bartlomiej Zolnierkiewicz,
	Greg Kroah-Hartman, Stephan von Krawczynski, Kernel Testers List,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org
In-Reply-To: <alpine.DEB.2.00.0910230229010.28109-X6Q0R45D7oAcqpCFd4KODRPsWskHk0ljAL8bYrjMMd8@public.gmane.org>

On Fri, Oct 23, 2009 at 02:36:53AM -0700, David Rientjes wrote:
> On Fri, 23 Oct 2009, Mel Gorman wrote:
> 
> > > Hmm, is this really supposed to be added to __alloc_pages_high_priority()?  
> > > By the patch description I was expecting kswapd to be woken up 
> > > preemptively whenever the preferred zone is below ALLOC_WMARK_LOW and 
> > > we're known to have just allocated at a higher order, not just when 
> > > current was oom killed (when we should already be freeing a _lot_ of 
> > > memory soon) or is doing a higher order allocation during direct reclaim.
> > > 
> > 
> > It was a somewhat arbitrary choice to have it trigger in the event high
> > priority allocations were happening frequently.
> > 
> 
> I don't quite understand, users of PF_MEMALLOC shouldn't be doing these 
> higher order allocations and if ALLOC_NO_WATERMARKS is by way of the oom 
> killer, we should be freeing a substantial amount of memory imminently 
> when it exits that waking up kswapd would be irrelevant.
> 

I agree. I think it's highly unlikely this patch will make any
difference but I wanted to eliminate it as a possibility. Patch 3 and 4
were previously one patch that were tested together.

> > > If this is moved to the fastpath, why is this wake_all_kswapd() and not
> > > wakeup_kswapd(preferred_zone, order)?  Do we need to kick kswapd in all 
> > > zones even though they may be free just because preferred_zone is now 
> > > below the watermark?
> > > 
> > 
> > It probably makes no difference as zones are checked for their watermarks
> > before any real work happens. However, even if this patch makes a difference,
> > I don't want to see it merged.  At best, it is an extremely heavy-handed
> > hack which is why I asked for it to be tested in isolation. It shouldn't
> > be necessary at all because sort of pre-emptive waking of kswapd was never
> > necessary before.
> > 
> 
> Ahh, that makes a ton more sense: this particular patch is a debugging 
> effort while the first two are candidates for 2.6.32 and -stable.  Gotcha.
> 

Yep.

> > > Wouldn't it be better to do this on page_zone(page) instead of 
> > > preferred_zone anyway?
> > > 
> > 
> > No. The preferred_zone is the zone we should be allocating from. If we
> > failed to allocate from it, it implies the watermarks are not being met
> > so we want to wake it.
> > 
> 
> Oops, I'm even more confused now :)  I thought the existing 
> wake_all_kswapd() in the slowpath was doing that and that this patch was 
> waking them prematurely because it speculates that a subsequent high 
> order allocation will fail unless memory is reclaimed. 


It should be doing that. This patch should be junk but because it was tested
previously, I needed to be sure it was actually junk.

> I thought we'd  
> want to reclaim from the zone we just did a high order allocation from so 
> that the fastpath could find the memory next time with ALLOC_WMARK_LOW.

The fastpath should be getting the pages it needs from the
preferred_zone. If it's not, we still want to get pages back in that
zone and the zone we actually ended up getting pages from.

It's probably best to ignore this patch except in the unlikely event Tobias
says it makes a difference to his testing. I'm hoping he's covered by patches
1+2 and maybe 3 and that patches 4 and 5 of this set get consigned to
the bit bucket.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply

* Re: [PATCH 4/5] page allocator: Pre-emptively wake kswapd when high-order watermarks are hit
From: Tobias Oetiker @ 2009-10-23 11:31 UTC (permalink / raw)
  To: Mel Gorman
  Cc: David Rientjes, Frans Pop, Jiri Kosina, Sven Geggus,
	Karol Lewandowski, Rafael J. Wysocki, David Miller,
	Reinette Chatre, Kalle Valo, KOSAKI Motohiro, Mohamed Abbas,
	Jens Axboe, John W. Linville, Pekka Enberg,
	Bartlomiej Zolnierkiewicz, Greg Kroah-Hartman,
	Stephan von Krawczynski, Kernel Testers List, netdev,
	linux-kernel, "linux-mm@kvack.org" <linu
In-Reply-To: <20091023112512.GW11778@csn.ul.ie>

Mel,

Today Mel Gorman wrote:

> On Fri, Oct 23, 2009 at 02:36:53AM -0700, David Rientjes wrote:
> > On Fri, 23 Oct 2009, Mel Gorman wrote:
> >
> > > > Hmm, is this really supposed to be added to __alloc_pages_high_priority()?
> > > > By the patch description I was expecting kswapd to be woken up
> > > > preemptively whenever the preferred zone is below ALLOC_WMARK_LOW and
> > > > we're known to have just allocated at a higher order, not just when
> > > > current was oom killed (when we should already be freeing a _lot_ of
> > > > memory soon) or is doing a higher order allocation during direct reclaim.
> > > >
> > >
> > > It was a somewhat arbitrary choice to have it trigger in the event high
> > > priority allocations were happening frequently.
> > >
> >
> > I don't quite understand, users of PF_MEMALLOC shouldn't be doing these
> > higher order allocations and if ALLOC_NO_WATERMARKS is by way of the oom
> > killer, we should be freeing a substantial amount of memory imminently
> > when it exits that waking up kswapd would be irrelevant.
> >
>
> I agree. I think it's highly unlikely this patch will make any
> difference but I wanted to eliminate it as a possibility. Patch 3 and 4
> were previously one patch that were tested together.

hi hi ... I have tested '3 only' this morning, and the allocation
problems started again ... so for me 3 alone does not work while
3+4 does.

cheers
tobi

-- 
Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland
http://it.oetiker.ch tobi@oetiker.ch ++41 62 775 9902 / sb: -9900

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: bridging + load balancing bonding
From: Jasper Spaans @ 2009-10-23 11:45 UTC (permalink / raw)
  To: Jay Vosburgh; +Cc: netdev
In-Reply-To: <347.1256232960@death.nxdomain.ibm.com>

On Thu, Oct 22, 2009 at 07:36:00PM +0200, Jay Vosburgh wrote:
> 	By "packets from one flow" do you really mean that packets from
> a given "flow" (TCP connection, UDP "stream", etc) are not always
> delivered to the same bonding port?  I.e., that two packets from the
> same "flow" will be delivered to different ports?  I'm not sure how
> that's possible unless the source MAC in the packets changes during the
> course of the flow.
> 
> 	Or is your problem really that the balance algorithm on the
> bonding send side doesn't match the algorithm used on the other side of
> the IDS machines coming the other direction (and, thus, packets for a
> given flow going in one direction end up at a different IDS than the
> packets going the other direction)?

It's the second problem: traffic in one direction ends up at another
node than traffic in the other direction, even if between the same pair of
machines.

> 	Locally generated packets do, but he's got a bridge in there, so
> the traffic they're balancing is presumably not locally generated (i.e.,
> is being forwarded by the bridge, in which case they'll still bear the
> source MAC of the originating node on the subnet).  If the packets were
> being routed instead of bridged, then, yah, they'd have the bond's
> source MAC.

And in case of routing, the destination MAC will also be modified.. so worst
case (all traffic goes to one node), no balancing will happen. This also
affects non-IDS use of load balancing, I guess.

> >So your solution might be the right fix...
> 
> 	Yes, I think he's found a legitimate bug, one that only will
> manifest when balancing bridged traffic.  I had to think for a minute if
> this change would break anything, and I'm coming up empty.  Locally
> generated or routed traffic won't see a change, and bridged traffic will
> be correctly balanced according to the "source MAC XOR destination MAC"
> forumla described in the documentation.

I'll post a patch shortly.

Jasper
-- 
Fox-IT Experts in IT Security!
T: +31 (0) 15 284 79 99
KvK Haaglanden 27301624

^ permalink raw reply

* Re: PATCH 23/10]Optimize the upload speed for PPP connection.
From: William Allen Simpson @ 2009-10-23 11:46 UTC (permalink / raw)
  To: fangxiaozhi 00110321; +Cc: netdev, linux-kernel, zihan, greg, haegar
In-Reply-To: <fae097f21b27.1b27fae097f2@huawei.com>

fangxiaozhi 00110321 wrote:
> From: fangxiaozhi <huananhu@huawei.com>
> 1. This patch is based on the kernel of 2.6.32-rc4 
> 2. In this patch, we enlarge the out buffer size to optimize the upload speed for the ppp connection. Then it can support the upload of HSUPA data cards.
> Signed-off-by: fangxiaozhi <huananhu@huawei.com>
> -----------------------------------------------------------------------------------------
> --- a/drivers/net/ppp_async.c	2009-10-12 05:43:56.000000000 +0800
> +++ b/drivers/net/ppp_async.c	2009-10-15 16:29:56.000000000 +0800
> @@ -36,7 +36,7 @@
>  
>  #define PPP_VERSION	"2.4.2"
>  
> -#define OBUFSIZE	256
> +#define OBUFSIZE	2048
>  
>  /* Structure for storing local state. */
>  struct asyncppp {
> 
Concur.  I'd go further than that, my code usually made room for at least
a full MTU (MRU) with HDLC escaping.  To minimize context switches, that
should be 3014 ((1500 MRU + 2 FCS + 4 header) * 2 escapes + 2 flags).

Even in the old days, when memory was tight, context switches and interrupt
time were more expensive, too.  PPP is supposed to scale to OC-192.

^ permalink raw reply

* Re: [PATCH 23/10] Optimize the upload speed for PPP connection.
From: William Allen Simpson @ 2009-10-23 11:51 UTC (permalink / raw)
  To: fangxiaozhi 00110321; +Cc: netdev, kernel, zihan, greg, haegar
In-Reply-To: <f98f8abb3f92.3f92f98f8abb@huawei.com>

fangxiaozhi 00110321 wrote:
> From: fangxiaozhi <huananhu@huawei.com>
> 1. This patch is based on the kernel of 2.6.32-rc4 
> 2. In this patch, we enlarge the out buffer size to optimize the upload speed for the ppp connection. Then it can support the upload of HSUPA data cards.
> Signed-off-by: fangxiaozhi <huananhu@huawei.com>
> -----------------------------------------------------------------------------------------
> --- a/drivers/net/ppp_async.c	2009-10-12 05:43:56.000000000 +0800
> +++ b/drivers/net/ppp_async.c	2009-10-15 16:29:56.000000000 +0800
> @@ -36,7 +36,7 @@
>  
>  #define PPP_VERSION	"2.4.2"
>  
> -#define OBUFSIZE	256
> +#define OBUFSIZE	2048
>  
>  /* Structure for storing local state. */
>  struct asyncppp {
> 
Repeat.  I'd go further than that, my code usually made room for at least
a full MTU (MRU) with HDLC escaping.  To minimize context switches, that
should be 3014 ((1500 MRU + 2 FCS + 4 header) * 2 escapes + 2 flags).

Even in the old days, when memory was tight, context switches and interrupt
time were more expensive, too.  PPP is supposed to scale to OC-192.

^ permalink raw reply

* [PATCH] Modify bonding hash transmit policies to use the packet's source MAC address
From: Jasper Spaans @ 2009-10-23 11:58 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev@vger.kernel.org
In-Reply-To: <20091023114511.GA537@spaans.fox.local>

Modify bonding hash transmit policies to use the psource MAC address of
the packet instead of the MAC address configured for the bonding device.

The old sitation conflicts with the documentation.
---
 drivers/net/bonding/bond_main.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 69c5b15..b140b52 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -3668,7 +3668,7 @@ static int bond_xmit_hash_policy_l23(struct sk_buff *skb,
 			(data->h_dest[5] ^ bond_dev->dev_addr[5])) % count;
 	}
 
-	return (data->h_dest[5] ^ bond_dev->dev_addr[5]) % count;
+	return (data->h_dest[5] ^ data->h_source[5]) % count;
 }
 
 /*
@@ -3695,7 +3695,7 @@ static int bond_xmit_hash_policy_l34(struct sk_buff *skb,
 
 	}
 
-	return (data->h_dest[5] ^ bond_dev->dev_addr[5]) % count;
+	return (data->h_dest[5] ^ data->h_source[5]) % count;
 }
 
 /*
@@ -3706,7 +3706,7 @@ static int bond_xmit_hash_policy_l2(struct sk_buff *skb,
 {
 	struct ethhdr *data = (struct ethhdr *)skb->data;
 
-	return (data->h_dest[5] ^ bond_dev->dev_addr[5]) % count;
+	return (data->h_dest[5] ^ data->h_source[5]) % count;
 }
 
 /*-------------------------- Device entry points ----------------------------*/
-- 
1.6.0.4


-- 
Fox-IT Experts in IT Security!
T: +31 (0) 15 284 79 99
KvK Haaglanden 27301624

^ permalink raw reply related

* Re: [PATCH] Modify bonding hash transmit policies to use the packet's source MAC address
From: Eric Dumazet @ 2009-10-23 12:37 UTC (permalink / raw)
  To: Jasper Spaans; +Cc: netdev@vger.kernel.org
In-Reply-To: <20091023115834.GA2216@spaans.fox.local>

Jasper Spaans a écrit :
> Modify bonding hash transmit policies to use the psource MAC address of
> the packet instead of the MAC address configured for the bonding device.
> 
> The old sitation conflicts with the documentation.
> ---
>  drivers/net/bonding/bond_main.c |    6 +++---
>  1 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index 69c5b15..b140b52 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -3668,7 +3668,7 @@ static int bond_xmit_hash_policy_l23(struct sk_buff *skb,
>  			(data->h_dest[5] ^ bond_dev->dev_addr[5])) % count;

You forgot one bond_dev->dev_addr[5] occurrence here


>  	}
>  
> -	return (data->h_dest[5] ^ bond_dev->dev_addr[5]) % count;
> +	return (data->h_dest[5] ^ data->h_source[5]) % count;
>  }

Could you check if bond->xmit_hash_policy(skb, bond_dev, bond->slave_cnt);
could be replaced by bond->xmit_hash_policy(skb, bond->slave_cnt),
now that various implementations dont need bond_dev anymore ?

static int bond_xmit_hash_policy_l23(struct sk_buff *skb, int count) ...

Dont forget your "Signed-off-by: Jasper Spaans <spaans@fox-it.com>",
copied to "David S. Miller" <davem@davemloft.net>  and "Jay Vosburgh" <fubar@us.ibm.com>
(respectively network and bonding maintainers)

Thanks

^ permalink raw reply

* Re: NOHZ: local_softirq_pending 08
From: Johannes Berg @ 2009-10-23 13:34 UTC (permalink / raw)
  To: Tilman Schmidt
  Cc: Jarek Poplawski, David Miller, hidave.darkstar, linux-kernel,
	tglx, linux-wireless, linux-ppp, netdev, paulus, isdn4linux,
	i4ldeveloper, Karsten Keil
In-Reply-To: <4AE0ECCE.2020407@imap.cc>

[-- Attachment #1: Type: text/plain, Size: 566 bytes --]

On Fri, 2009-10-23 at 01:37 +0200, Tilman Schmidt wrote:

> --- a/drivers/isdn/i4l/isdn_ppp.c
> +++ b/drivers/isdn/i4l/isdn_ppp.c
> @@ -1174,7 +1174,10 @@ isdn_ppp_push_higher(isdn_net_dev * net_dev, isdn_net_local * lp, struct sk_buff
>  #endif /* CONFIG_IPPP_FILTER */
>  	skb->dev = dev;
>  	skb_reset_mac_header(skb);
> -	netif_rx(skb);
> +	if (in_interrupt())
> +		netif_rx(skb);
> +	else
> +		netif_rx_ni(skb);

So you've verified that the entire i4l stack can cope with being called
twice on the same CPU, from different contexts?

johannes

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply

* Re: [PATCH 4/5] page allocator: Pre-emptively wake kswapd when high-order watermarks are hit
From: Mel Gorman @ 2009-10-23 13:39 UTC (permalink / raw)
  To: Tobias Oetiker
  Cc: David Rientjes, Frans Pop, Jiri Kosina, Sven Geggus,
	Karol Lewandowski, Rafael J. Wysocki, David Miller,
	Reinette Chatre, Kalle Valo, KOSAKI Motohiro, Mohamed Abbas,
	Jens Axboe, John W. Linville, Pekka Enberg,
	Bartlomiej Zolnierkiewicz, Greg Kroah-Hartman,
	Stephan von Krawczynski, Kernel Testers List,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org
In-Reply-To: <alpine.DEB.2.00.0910231329550.26462-EjsAmf5DE5zIvOfxy3zmAzgUDZmNtoG9@public.gmane.org>

On Fri, Oct 23, 2009 at 01:31:10PM +0200, Tobias Oetiker wrote:
> Mel,
> 
> Today Mel Gorman wrote:
> 
> > On Fri, Oct 23, 2009 at 02:36:53AM -0700, David Rientjes wrote:
> > > On Fri, 23 Oct 2009, Mel Gorman wrote:
> > >
> > > > > Hmm, is this really supposed to be added to __alloc_pages_high_priority()?
> > > > > By the patch description I was expecting kswapd to be woken up
> > > > > preemptively whenever the preferred zone is below ALLOC_WMARK_LOW and
> > > > > we're known to have just allocated at a higher order, not just when
> > > > > current was oom killed (when we should already be freeing a _lot_ of
> > > > > memory soon) or is doing a higher order allocation during direct reclaim.
> > > > >
> > > >
> > > > It was a somewhat arbitrary choice to have it trigger in the event high
> > > > priority allocations were happening frequently.
> > > >
> > >
> > > I don't quite understand, users of PF_MEMALLOC shouldn't be doing these
> > > higher order allocations and if ALLOC_NO_WATERMARKS is by way of the oom
> > > killer, we should be freeing a substantial amount of memory imminently
> > > when it exits that waking up kswapd would be irrelevant.
> > >
> >
> > I agree. I think it's highly unlikely this patch will make any
> > difference but I wanted to eliminate it as a possibility. Patch 3 and 4
> > were previously one patch that were tested together.
> 
> hi hi ... I have tested '3 only' this morning, and the allocation
> problems started again ... so for me 3 alone does not work while
> 3+4 does.
> 

Hi,

What was the outcome of 1+2?

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply

* [PATCH] Modify bonding hash transmit policies to use the packet's source MAC address
From: Jasper Spaans @ 2009-10-23 14:08 UTC (permalink / raw)
  To: Jay Vosburgh, Eric Dumazet; +Cc: netdev@vger.kernel.org
In-Reply-To: <4AE1A37F.8010203@gmail.com>

Modify bonding hash transmit policies to use the psource MAC address of
the packet instead of the MAC address configured for the bonding device.

The old sitation conflicts with the documentation.

Signed-off-by: Jasper Spaans <spaans@fox-it.com>
---
 drivers/net/bonding/bond_main.c |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 69c5b15..3f05267 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -3665,10 +3665,10 @@ static int bond_xmit_hash_policy_l23(struct sk_buff *skb,
 
 	if (skb->protocol == htons(ETH_P_IP)) {
 		return ((ntohl(iph->saddr ^ iph->daddr) & 0xffff) ^
-			(data->h_dest[5] ^ bond_dev->dev_addr[5])) % count;
+			(data->h_dest[5] ^ data->h_source[5])) % count;
 	}
 
-	return (data->h_dest[5] ^ bond_dev->dev_addr[5]) % count;
+	return (data->h_dest[5] ^ data->h_source[5]) % count;
 }
 
 /*
@@ -3695,7 +3695,7 @@ static int bond_xmit_hash_policy_l34(struct sk_buff *skb,
 
 	}
 
-	return (data->h_dest[5] ^ bond_dev->dev_addr[5]) % count;
+	return (data->h_dest[5] ^ data->h_source[5]) % count;
 }
 
 /*
@@ -3706,7 +3706,7 @@ static int bond_xmit_hash_policy_l2(struct sk_buff *skb,
 {
 	struct ethhdr *data = (struct ethhdr *)skb->data;
 
-	return (data->h_dest[5] ^ bond_dev->dev_addr[5]) % count;
+	return (data->h_dest[5] ^ data->h_source[5]) % count;
 }
 
 /*-------------------------- Device entry points ----------------------------*/
-- 
1.6.0.4


-- 
Fox-IT Experts in IT Security!
T: +31 (0) 15 284 79 99
KvK Haaglanden 27301624

^ permalink raw reply related

* [PATCH] Remove bond_dev from xmit_hash_policy call.
From: Jasper Spaans @ 2009-10-23 14:09 UTC (permalink / raw)
  To: Jay Vosburgh, Eric Dumazet; +Cc: netdev@vger.kernel.org
In-Reply-To: <4AE1A37F.8010203@gmail.com>

Now that the bonding device is no longer used in determining the device to
which to send packets, it can be dropped from the argument list of the various
xmit_hash_policy calls.

Signed-off-by: Jasper Spaans <spaans@fox-it.com>
---
 drivers/net/bonding/bond_3ad.c  |   11 +++++------
 drivers/net/bonding/bond_main.c |   11 ++++-------
 drivers/net/bonding/bonding.h   |    3 +--
 3 files changed, 10 insertions(+), 15 deletions(-)

diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
index c3fa31c..3cd8153 100644
--- a/drivers/net/bonding/bond_3ad.c
+++ b/drivers/net/bonding/bond_3ad.c
@@ -1956,7 +1956,7 @@ void bond_3ad_unbind_slave(struct slave *slave)
 	struct port *port, *prev_port, *temp_port;
 	struct aggregator *aggregator, *new_aggregator, *temp_aggregator;
 	int select_new_active_agg = 0;
-	
+
 	// find the aggregator related to this slave
 	aggregator = &(SLAVE_AD_INFO(slave).aggregator);
 
@@ -2024,7 +2024,7 @@ void bond_3ad_unbind_slave(struct slave *slave)
 
 				// clear the aggregator
 				ad_clear_agg(aggregator);
-				
+
 				if (select_new_active_agg) {
 					ad_agg_selection_logic(__get_first_agg(port));
 				}
@@ -2075,7 +2075,7 @@ void bond_3ad_unbind_slave(struct slave *slave)
 			}
 		}
 	}
-	port->slave=NULL;	
+	port->slave=NULL;
 }
 
 /**
@@ -2301,7 +2301,7 @@ void bond_3ad_handle_link_change(struct slave *slave, char link)
 }
 
 /*
- * set link state for bonding master: if we have an active 
+ * set link state for bonding master: if we have an active
  * aggregator, we're up, if not, we're down.  Presumes that we cannot
  * have an active aggregator if there are no slaves with link up.
  *
@@ -2395,7 +2395,7 @@ int bond_3ad_xmit_xor(struct sk_buff *skb, struct net_device *dev)
 		goto out;
 	}
 
-	slave_agg_no = bond->xmit_hash_policy(skb, dev, slaves_in_agg);
+	slave_agg_no = bond->xmit_hash_policy(skb, slaves_in_agg);
 
 	bond_for_each_slave(bond, slave, i) {
 		struct aggregator *agg = SLAVE_AD_INFO(slave).port.aggregator;
@@ -2468,4 +2468,3 @@ out:
 
 	return ret;
 }
-
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 3f05267..13058c5 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -3657,8 +3657,7 @@ void bond_unregister_arp(struct bonding *bond)
  * Hash for the output device based upon layer 2 and layer 3 data. If
  * the packet is not IP mimic bond_xmit_hash_policy_l2()
  */
-static int bond_xmit_hash_policy_l23(struct sk_buff *skb,
-				     struct net_device *bond_dev, int count)
+static int bond_xmit_hash_policy_l23(struct sk_buff *skb, int count)
 {
 	struct ethhdr *data = (struct ethhdr *)skb->data;
 	struct iphdr *iph = ip_hdr(skb);
@@ -3676,8 +3675,7 @@ static int bond_xmit_hash_policy_l23(struct sk_buff *skb,
  * the packet is a frag or not TCP or UDP, just use layer 3 data.  If it is
  * altogether not IP, mimic bond_xmit_hash_policy_l2()
  */
-static int bond_xmit_hash_policy_l34(struct sk_buff *skb,
-				    struct net_device *bond_dev, int count)
+static int bond_xmit_hash_policy_l34(struct sk_buff *skb, int count)
 {
 	struct ethhdr *data = (struct ethhdr *)skb->data;
 	struct iphdr *iph = ip_hdr(skb);
@@ -3701,8 +3699,7 @@ static int bond_xmit_hash_policy_l34(struct sk_buff *skb,
 /*
  * Hash for the output device based upon layer 2 data
  */
-static int bond_xmit_hash_policy_l2(struct sk_buff *skb,
-				   struct net_device *bond_dev, int count)
+static int bond_xmit_hash_policy_l2(struct sk_buff *skb, int count)
 {
 	struct ethhdr *data = (struct ethhdr *)skb->data;
 
@@ -4295,7 +4292,7 @@ static int bond_xmit_xor(struct sk_buff *skb, struct net_device *bond_dev)
 	if (!BOND_IS_OK(bond))
 		goto out;
 
-	slave_no = bond->xmit_hash_policy(skb, bond_dev, bond->slave_cnt);
+	slave_no = bond->xmit_hash_policy(skb, bond->slave_cnt);
 
 	bond_for_each_slave(bond, slave, i) {
 		slave_no--;
diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h
index 6824771..02e1f9e 100644
--- a/drivers/net/bonding/bonding.h
+++ b/drivers/net/bonding/bonding.h
@@ -204,7 +204,7 @@ struct bonding {
 #endif /* CONFIG_PROC_FS */
 	struct   list_head bond_list;
 	struct   dev_mc_list *mc_list;
-	int      (*xmit_hash_policy)(struct sk_buff *, struct net_device *, int);
+	int      (*xmit_hash_policy)(struct sk_buff *, int);
 	__be32   master_ip;
 	u16      flags;
 	u16      rr_tx_counter;
@@ -370,4 +370,3 @@ static inline void bond_unregister_ipv6_notifier(void)
 #endif
 
 #endif /* _LINUX_BONDING_H */
-
-- 
1.6.0.4


-- 
Fox-IT Experts in IT Security!
T: +31 (0) 15 284 79 99
KvK Haaglanden 27301624

^ permalink raw reply related

* Re: NOHZ: local_softirq_pending 08
From: Tilman Schmidt @ 2009-10-23 14:27 UTC (permalink / raw)
  To: Johannes Berg
  Cc: Jarek Poplawski, David Miller, hidave.darkstar, linux-kernel,
	tglx, linux-wireless, linux-ppp, netdev, paulus, isdn4linux,
	i4ldeveloper, Karsten Keil
In-Reply-To: <1256304869.12174.20.camel@johannes.local>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Johannes Berg schrieb:
> On Fri, 2009-10-23 at 01:37 +0200, Tilman Schmidt wrote:
> 
>> --- a/drivers/isdn/i4l/isdn_ppp.c
>> +++ b/drivers/isdn/i4l/isdn_ppp.c
>> @@ -1174,7 +1174,10 @@ isdn_ppp_push_higher(isdn_net_dev * net_dev, isdn_net_local * lp, struct sk_buff
>>  #endif /* CONFIG_IPPP_FILTER */
>>  	skb->dev = dev;
>>  	skb_reset_mac_header(skb);
>> -	netif_rx(skb);
>> +	if (in_interrupt())
>> +		netif_rx(skb);
>> +	else
>> +		netif_rx_ni(skb);
> 
> So you've verified that the entire i4l stack can cope with being called
> twice on the same CPU, from different contexts?

What makes you think so?
Better yet, what do you propose?

Thanks,
Tilman

- --
Tilman Schmidt                    E-Mail: tilman@imap.cc
Bonn, Germany
Diese Nachricht besteht zu 100% aus wiederverwerteten Bits.
Ungeöffnet mindestens haltbar bis: (siehe Rückseite)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.4 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFK4b09Q3+did9BuFsRAqBvAKCbRI0iXQEyK3ztxkGHcqpbcceqbACgkagX
JF7nYd152ihp2uemIs/cB54=
=YOin
-----END PGP SIGNATURE-----

^ permalink raw reply

* Re: NOHZ: local_softirq_pending 08
From: Johannes Berg @ 2009-10-23 14:31 UTC (permalink / raw)
  To: Tilman Schmidt
  Cc: Jarek Poplawski, David Miller, hidave.darkstar, linux-kernel,
	tglx, linux-wireless, linux-ppp, netdev, paulus, isdn4linux,
	i4ldeveloper, Karsten Keil
In-Reply-To: <4AE1BD3D.3020007@imap.cc>

[-- Attachment #1: Type: text/plain, Size: 1369 bytes --]

On Fri, 2009-10-23 at 16:27 +0200, Tilman Schmidt wrote:

> >> --- a/drivers/isdn/i4l/isdn_ppp.c
> >> +++ b/drivers/isdn/i4l/isdn_ppp.c
> >> @@ -1174,7 +1174,10 @@ isdn_ppp_push_higher(isdn_net_dev * net_dev, isdn_net_local * lp, struct sk_buff
> >>  #endif /* CONFIG_IPPP_FILTER */
> >>  	skb->dev = dev;
> >>  	skb_reset_mac_header(skb);
> >> -	netif_rx(skb);
> >> +	if (in_interrupt())
> >> +		netif_rx(skb);
> >> +	else
> >> +		netif_rx_ni(skb);
> > 
> > So you've verified that the entire i4l stack can cope with being called
> > twice on the same CPU, from different contexts?
> 
> What makes you think so?

I thought I'd explained this in my other email. *sigh*

You're squelching a warning, which comes from the fact that you're
calling something that calls into netif_rx() with softirqs enabled. That
would seem to imply that potentially a softirq could at the same time
call into that code too.

Basically, what happens now is this:

disable softirqs
call into i4l/ppp
i4l/ppp code
call netif_rx()
more code
enable softirqs


You're changing it to

call into i4l/ppp
i4l/ppp code
call netif_rx_ni()
more code

so the entire chunks "i4l/ppp code" and "more code" are now no longer
protected against being interrupted by a softirq that runs the same
code, maybe for a different device, on the same CPU.

johannes

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox