Netdev List

* Re: [PATCH] isdn: fix possible circular locking dependency
From: David Miller @ 2009-10-23  1:28 UTC (permalink / raw)
  To: xtfeng; +Cc: isdn, isdn4linux, tilman, netdev, linux-kernel
In-Reply-To: <1256202424-28314-1-git-send-email-xtfeng@gmail.com>

From: Xiaotian Feng <xtfeng@gmail.com>
Date: Thu, 22 Oct 2009 17:07:04 +0800

> There's a circular locking dependency:
 ...
>  We don't need to lock nd->queue->xmit_lock to protect single
> isdn_net_lp_busy(). This can fix above lockdep warnings.
> 
> Reported-and-tested-by: Tilman Schmidt <tilman@imap.cc>
> Signed-off-by: Xiaotian Feng <xtfeng@gmail.com>

Applied, thanks.

* Re: [PATCH 0/3] netxen: bug fixes
From: David Miller @ 2009-10-23  1:28 UTC (permalink / raw)
  To: dhananjay; +Cc: netdev
In-Reply-To: <1256189943-20477-1-git-send-email-dhananjay@netxen.com>

From: Dhananjay Phadke <dhananjay@netxen.com>
Date: Wed, 21 Oct 2009 22:39:00 -0700

> 3 bug fixes for 2.6.32. Please apply to net-2.6.

All applied, thank you.

* Re: [PATCH 2.6.32-rc5] r8169: fix Ethernet Hangup for RTL8110SC rev d
From: David Miller @ 2009-10-23  1:19 UTC (permalink / raw)
  To: simon.wunderlich; +Cc: netdev, romieu, bernhard.schmidt
In-Reply-To: <66ae97b70910212348o6c32da21s906da4770ebc6f80@mail.gmail.com>

This patch looks fine, but I can't apply it because your email
client corrupted the patch, changing tab characters into
spaces, breaking up long lines, etc.

Please fix this up and resubmit, thanks.  You can read
linux/Documentation/email-clients.txt for helpful tips.

* Re: [PATCH] vmxnet3: remove duplicated #include
From: Shreyas Bhatewara @ 2009-10-22 23:58 UTC (permalink / raw)
  To: netdev; +Cc: Huang Weiyi, Shreyas Bhatewara, pv-drivers
In-Reply-To: <alpine.LRH.2.00.0910221631580.23769@sbhatewara-dev1.eng.vmware.com>



Remove duplicate header file includes from vmxnet3_int.h

Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com>
Signed-off-by: Shreyas Bhatewara <sbhatewara@vmware.com>
Signed-off-by: Bhavesh Davda <davda@vmware.com>

---

diff --git a/drivers/net/vmxnet3/vmxnet3_int.h b/drivers/net/vmxnet3/vmxnet3_int.h
index 6bb9157..c4f8f04 100644
--- a/drivers/net/vmxnet3/vmxnet3_int.h
+++ b/drivers/net/vmxnet3/vmxnet3_int.h
@@ -27,15 +27,11 @@
 #ifndef _VMXNET3_INT_H
 #define _VMXNET3_INT_H
 
-#include <linux/types.h>
 #include <linux/ethtool.h>
 #include <linux/delay.h>
 #include <linux/netdevice.h>
 #include <linux/pci.h>
-#include <linux/ethtool.h>
 #include <linux/compiler.h>
-#include <linux/module.h>
-#include <linux/moduleparam.h>
 #include <linux/slab.h>
 #include <linux/spinlock.h>
 #include <linux/ioport.h>


* Re: NOHZ: local_softirq_pending 08
From: Tilman Schmidt @ 2009-10-22 23:37 UTC (permalink / raw)
  To: Jarek Poplawski
  Cc: David Miller, johannes, hidave.darkstar, linux-kernel, tglx,
	linux-wireless, linux-ppp, netdev, paulus, isdn4linux,
	i4ldeveloper, Karsten Keil
In-Reply-To: <4ADF5710.4030505@imap.cc>

On 21.10.2009 20:46, /me wrote:
>>>> I have encountered the message in the subject during a test of
>>>> the Gigaset CAPI driver, and would like to determine whether
>>>> it's a bug in the driver, a bug somewhere else, or no bug at
>>>> all. The test scenario was PPP over ISDN with pppd+capiplugin.
>>>> In an alternative scenario, also PPP over ISDN but with
>>>> smpppd+capidrv, the message did not occur.
> 
> I'm sorry, I had confused the two cases. The message occurs in
> the smpppd+capidrv scenario, not with pppd+capiplugin.
> 
>>>> Johannes' answer pointed me to the netif_rx() function.
>>>> The Gigaset driver itself doesn't call that function at all.
>>>> In the scenario where I saw the message, it was the SYNC_PPP
>>>> line discipline that did.
> 
> This analysis was therefore wrong. It would be the netif_rx()
> call towards the end of isdn_ppp_push_higher() in
> drivers/isdn/i4l/isdn_ppp.c L1177.

Having noticed that, I cooked up the following patch which fixed
the messages for me. Comments? (Adding i4l people to the already
impressive CC list.)

--- a/drivers/isdn/i4l/isdn_ppp.c
+++ b/drivers/isdn/i4l/isdn_ppp.c
@@ -1174,7 +1174,10 @@ isdn_ppp_push_higher(isdn_net_dev * net_dev, isdn_net_local * lp, struct sk_buff
 #endif /* CONFIG_IPPP_FILTER */
 	skb->dev = dev;
 	skb_reset_mac_header(skb);
-	netif_rx(skb);
+	if (in_interrupt())
+		netif_rx(skb);
+	else
+		netif_rx_ni(skb);
 	/* net_dev->local->stats.rx_packets++; done in isdn_net.c */
 	return;
 


-- 
Tilman Schmidt                    E-Mail: tilman@imap.cc
Bonn, Germany
This message consists of 100% recycled bits.
Unopened, best before: (see reverse side)


* Re: [PATCH net-2.6] sfc: 10Xpress: Report support for pause frames
From: Ben Hutchings @ 2009-10-22 22:30 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-net-drivers
In-Reply-To: <1256250314.2785.22.camel@achroite>

On Thu, 2009-10-22 at 23:25 +0100, Ben Hutchings wrote:
> Commits 27fbc7d 'mdio: Expose pause frame advertising flags to ethtool'
> and c634263 'sfc: 10Xpress: Initialise pause advertising flags'
> added to our reported advertising flags.
> 
> efx_mdio_set_settings() requires that all advertising flags are
> also present in the supported flags, so make sure that is true.
> 
> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
> ---
> This fixes a regression in 2.6.32: after resetting a 10Xpress PHY we
> fail to reconfigure it if pause frames are enabled.  Manually changing
> pause frame settings will also fail.

Sorry, I was mistaken - those earlier commits are only in net-next-2.6,
so this is not needed for 2.6.32.

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


* [PATCH net-2.6] sfc: 10Xpress: Report support for pause frames
From: Ben Hutchings @ 2009-10-22 22:25 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-net-drivers

Commits 27fbc7d 'mdio: Expose pause frame advertising flags to ethtool'
and c634263 'sfc: 10Xpress: Initialise pause advertising flags'
added to our reported advertising flags.

efx_mdio_set_settings() requires that all advertising flags are
also present in the supported flags, so make sure that is true.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
---
This fixes a regression in 2.6.32: after resetting a 10Xpress PHY we
fail to reconfigure it if pause frames are enabled.  Manually changing
pause frame settings will also fail.
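
The constraint described above (every advertised flag must also be
present in the supported flags) can be sketched in plain C as a subset
check on bit masks.  This is only an illustration: the DEMO_* flag
values and the helper name adv_is_subset() are hypothetical stand-ins,
not the kernel's SUPPORTED_*/ADVERTISED_* definitions and not the
actual efx_mdio_set_settings() code.

```c
#include <stdint.h>

/* Hypothetical flag bits for illustration only; the real SUPPORTED_* and
 * ADVERTISED_* values are defined in <linux/ethtool.h>. */
#define DEMO_10000baseT_Full	(UINT32_C(1) << 0)
#define DEMO_Pause		(UINT32_C(1) << 1)
#define DEMO_Asym_Pause		(UINT32_C(1) << 2)

/* Returns 1 when every advertised bit is also marked as supported,
 * 0 when at least one advertised bit is missing from 'supported'. */
static int adv_is_subset(uint32_t supported, uint32_t advertising)
{
	return (advertising & ~supported) == 0;
}
```

With advertising = DEMO_10000baseT_Full | DEMO_Pause, the check fails
while supported lacks the pause bits and passes once they are OR-ed in,
which is the effect the one-line change below is after.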

Ben.

 drivers/net/sfc/tenxpress.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/net/sfc/tenxpress.c b/drivers/net/sfc/tenxpress.c
index 1a3495c..352cc56 100644
--- a/drivers/net/sfc/tenxpress.c
+++ b/drivers/net/sfc/tenxpress.c
@@ -752,6 +752,7 @@ tenxpress_get_settings(struct efx_nic *efx, struct ethtool_cmd *ecmd)
 
 	mdio45_ethtool_gset_npage(&efx->mdio, ecmd, adv, lpa);
 
+	ecmd->supported |= SUPPORTED_Pause | SUPPORTED_Asym_Pause;
 	if (efx->phy_type != PHY_TYPE_SFX7101) {
 		ecmd->supported |= (SUPPORTED_100baseT_Full |
 				    SUPPORTED_1000baseT_Full);

-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


* Re: Irq architecture for multi-core network driver.
From: David Daney @ 2009-10-22 22:24 UTC (permalink / raw)
  To: Chris Friesen; +Cc: netdev, Linux Kernel Mailing List, linux-mips
In-Reply-To: <4AE0D72A.4090607@nortel.com>

Chris Friesen wrote:
> On 10/22/2009 03:40 PM, David Daney wrote:
> 
>> The main problem I have encountered is how to fit the interrupt
>> management into the kernel framework.  Currently the interrupt source
>> is connected to a single irq number.  I request_irq, and then manage
>> the masking and unmasking on a per cpu basis by directly manipulating
>> the interrupt controller's affinity/routing registers.  This goes
>> behind the back of all the kernel's standard interrupt management
>> routines.  I am looking for a better approach.
>>
>> One thing that comes to mind is that I could assign a different
>> interrupt number per cpu to the interrupt signal.  So instead of
>> having one irq I would have 32 of them.  The driver would then do
>> request_irq for all 32 irqs, and could call enable_irq and disable_irq
>> to enable and disable them.  The problem with this is that there isn't
>> really a single packets-ready signal, but instead 16 of them.  So if I
>> go this route I would have 16(lines) x 32(cpus) = 512 interrupt
>> numbers just for the networking hardware, which seems a bit excessive.
> 
> Does your hardware do flow-based queues?  In this model you have
> multiple rx queues and the hardware hashes incoming packets to a single
> queue based on the addresses, ports, etc. This ensures that all the
> packets of a single connection always get processed in the order they
> arrived at the net device.
> 

Indeed, this is exactly what we have.


> Typically in this model you have as many interrupts as queues
> (presumably 16 in your case).  Each queue is assigned an interrupt and
> that interrupt is affined to a single core.

Certainly this is one mode of operation that should be supported, but I 
would also like to be able to go for raw throughput and have as many 
cores as possible reading from a single queue (like I currently have).

> 
> The intel igb driver is an example of one that uses this sort of design.
> 

Thanks, I will look at that driver.

David Daney

* Re: Irq architecture for multi-core network driver.
From: Chris Friesen @ 2009-10-22 22:05 UTC (permalink / raw)
  To: David Daney; +Cc: netdev, Linux Kernel Mailing List, linux-mips
In-Reply-To: <4AE0D14B.1070307@caviumnetworks.com>

On 10/22/2009 03:40 PM, David Daney wrote:

> The main problem I have encountered is how to fit the interrupt
> management into the kernel framework.  Currently the interrupt source
> is connected to a single irq number.  I request_irq, and then manage
> the masking and unmasking on a per cpu basis by directly manipulating
> the interrupt controller's affinity/routing registers.  This goes
> behind the back of all the kernel's standard interrupt management
> routines.  I am looking for a better approach.
> 
> One thing that comes to mind is that I could assign a different
> interrupt number per cpu to the interrupt signal.  So instead of
> having one irq I would have 32 of them.  The driver would then do
> request_irq for all 32 irqs, and could call enable_irq and disable_irq
> to enable and disable them.  The problem with this is that there isn't
> really a single packets-ready signal, but instead 16 of them.  So if I
> go this route I would have 16(lines) x 32(cpus) = 512 interrupt
> numbers just for the networking hardware, which seems a bit excessive.

Does your hardware do flow-based queues?  In this model you have
multiple rx queues and the hardware hashes incoming packets to a single
queue based on the addresses, ports, etc. This ensures that all the
packets of a single connection always get processed in the order they
arrived at the net device.

Typically in this model you have as many interrupts as queues
(presumably 16 in your case).  Each queue is assigned an interrupt and
that interrupt is affined to a single core.
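
As a rough sketch of the flow-hashing idea, assuming 16 rx queues: the
struct flow_key layout, the function names and the choice of an FNV-1a
hash below are all hypothetical and only illustrate the principle.
Real hardware typically uses its own hash function; the point is only
that identical address/port tuples always map to the same queue.

```c
#include <stdint.h>

#define DEMO_NUM_QUEUES 16u	/* assumed: one rx queue per interrupt line */

/* Hypothetical flow identifier: addresses and ports of one connection. */
struct flow_key {
	uint32_t saddr;
	uint32_t daddr;
	uint16_t sport;
	uint16_t dport;
};

/* Fold one 32-bit word into an FNV-1a hash, one byte at a time. */
static uint32_t fnv1a_word(uint32_t hash, uint32_t word)
{
	for (int i = 0; i < 4; i++) {
		hash ^= (word >> (8 * i)) & 0xffu;
		hash *= UINT32_C(16777619);	/* FNV prime */
	}
	return hash;
}

/* Map a flow to a queue index in [0, DEMO_NUM_QUEUES).  The result
 * depends only on the key, so one connection always hits one queue. */
static unsigned int flow_to_queue(const struct flow_key *key)
{
	uint32_t hash = UINT32_C(2166136261);	/* FNV offset basis */

	hash = fnv1a_word(hash, key->saddr);
	hash = fnv1a_word(hash, key->daddr);
	hash = fnv1a_word(hash, ((uint32_t)key->sport << 16) | key->dport);
	return hash % DEMO_NUM_QUEUES;
}
```

Because each queue is then served by a single core, packets of one
connection are processed in arrival order without cross-cpu locking.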

The intel igb driver is an example of one that uses this sort of design.

Chris

* Re: [PATCH 5/5] ONLY-APPLY-IF-STILL-FAILING Revert 373c0a7e, 8aa7e847: Fix congestion_wait() sync/async vs read/write confusion
From: Jens Axboe @ 2009-10-22 21:49 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Frans Pop, Jiri Kosina, Sven Geggus, Karol Lewandowski,
	Tobias Oetiker, Rafael J. Wysocki, David Miller, Reinette Chatre,
	Kalle Valo, David Rientjes, KOSAKI Motohiro, Mohamed Abbas,
	John W. Linville, Pekka Enberg, Bartlomiej Zolnierkiewicz,
	Greg Kroah-Hartman, Stephan von Krawczynski, Kernel Testers List,
	netdev, linux-kernel, linux-mm@kvack.org
In-Reply-To: <1256221356-26049-6-git-send-email-mel@csn.ul.ie>

On Thu, Oct 22 2009, Mel Gorman wrote:
> Testing by Frans Pop indicates that, at least in the 2.6.30..2.6.31
> window, the commits 373c0a7e and 8aa7e847 dramatically increased the
> number of GFP_ATOMIC failures that were occurring within a wireless
> driver. It was never isolated which of the changes was the exact problem
> and it's possible it has been fixed since. If problems are still
> occurring with GFP_ATOMIC in 2.6.31-rc5, then this patch should be
> applied to determine if the congestion_wait() callers are still broken.

I still think this is a complete red herring.

-- 
Jens Axboe

* Re: [net-next-2.6 PATCH 2/3] ixgbe: Set MSI-X vectors to NOBALANCING and set affinity
From: Stephen Hemminger @ 2009-10-22 21:45 UTC (permalink / raw)
  To: David Miller; +Cc: peter.p.waskiewicz.jr, jeffrey.t.kirsher, gospo, netdev
In-Reply-To: <20091022.035601.193700201.davem@davemloft.net>

On Thu, 22 Oct 2009 03:56:01 -0700 (PDT)
David Miller <davem@davemloft.net> wrote:

> From: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
> Date: Thu, 22 Oct 2009 01:22:36 -0700
> 
> > The first thing any performance guide says is to disable irqbalance
> 
> Such guides are wrong, and that's the end of this discussion.
> 
> These kinds of guides also say to do all kinds of crazy things with
> the socket sysctl settings.  That's wrong too and we absolutely do not
> do things to accommodate nor support those guide suggestions.
> 
> And we won't do that here.
> 
> I'm especially not going to succumb in this case because Arjan has
> been more than responsive to making sure irqbalanced in userspace does
> the right thing for networking devices, even multiqueue ones.
> 
> So we can make it do the right thing when flow director is present.
> In fact, the thing you want for flow director makes sense in the
> general case too.

The irqbalance daemon already has IRQBALANCE_BANNED_INTERRUPTS
to work around this. It also has code to special-case devices; if you
think ixgbe needs special treatment, why not do it there?

* Re: [Patch] sctp: remove deprecated SCTP_GET_*_OLD stuffs
From: Sam Ravnborg @ 2009-10-22 21:44 UTC (permalink / raw)
  To: Vlad Yasevich; +Cc: Amerigo Wang, linux-kernel, netdev, akpm
In-Reply-To: <4AE0C64A.9080400@hp.com>

On Thu, Oct 22, 2009 at 04:53:30PM -0400, Vlad Yasevich wrote:
> > diff --git a/include/net/sctp/user.h b/include/net/sctp/user.h
> > index be2334a..0991f1b 100644
> > --- a/include/net/sctp/user.h
> > +++ b/include/net/sctp/user.h
> > @@ -131,14 +131,6 @@ enum sctp_optname {
> >  #define SCTP_SOCKOPT_BINDX_REM	SCTP_SOCKOPT_BINDX_REM
> >  	SCTP_SOCKOPT_PEELOFF, 	/* peel off association. */
> >  #define SCTP_SOCKOPT_PEELOFF	SCTP_SOCKOPT_PEELOFF
> > -	SCTP_GET_PEER_ADDRS_NUM_OLD, 	/* Get number of peer addresss. */
> > -#define SCTP_GET_PEER_ADDRS_NUM_OLD	SCTP_GET_PEER_ADDRS_NUM_OLD
> > -	SCTP_GET_PEER_ADDRS_OLD, 	/* Get all peer addresss. */
> > -#define SCTP_GET_PEER_ADDRS_OLD	SCTP_GET_PEER_ADDRS_OLD
> > -	SCTP_GET_LOCAL_ADDRS_NUM_OLD, 	/* Get number of local addresss. */
> > -#define SCTP_GET_LOCAL_ADDRS_NUM_OLD	SCTP_GET_LOCAL_ADDRS_NUM_OLD
> > -	SCTP_GET_LOCAL_ADDRS_OLD, 	/* Get all local addresss. */
> > -#define SCTP_GET_LOCAL_ADDRS_OLD	SCTP_GET_LOCAL_ADDRS_OLD
> >  	SCTP_SOCKOPT_CONNECTX_OLD, /* CONNECTX old requests. */
> 
> After running the regression suite against this patch I find that we can't
> remove the enum values.  Removing the enums changes the value for the remainder
> of the definitions and breaks binary compatibility for applications that use
> those trailing options.
> 
> You should be ok with removing the #defines and actual code that uses them,
> but not the enums.  You can even rename the enums, but we must preserve
> numeric ordering.

If we really depend on the actual value of an enum as in this case,
then we should assign them directly to better document this.
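
A minimal sketch of what explicit assignment could look like.  The
DEMO_* names and the numbers are hypothetical and are not taken from
include/net/sctp/user.h; they only show how pinned values make the ABI
visible in the source and leave an obvious gap for retired options.

```c
/* Hypothetical option names and numbers, for illustration only. */
enum demo_optname {
	DEMO_OPT_PEELOFF	 = 102,
	/* 103..106: retired *_OLD options, numbers kept reserved */
	DEMO_OPT_CONNECTX_OLD	 = 107,
	DEMO_OPT_GET_PEER_ADDRS	 = 108,
	DEMO_OPT_GET_LOCAL_ADDRS = 109,
};

/* Compile-time guard: reordering or deleting entries cannot silently
 * change a value that user space already depends on. */
_Static_assert(DEMO_OPT_CONNECTX_OLD == 107,
	       "DEMO_OPT_CONNECTX_OLD is part of the ABI");
```

With explicit values, removing an entry no longer renumbers the ones
that follow it.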

	Sam

* Irq architecture for multi-core network driver.
From: David Daney @ 2009-10-22 21:40 UTC (permalink / raw)
  To: netdev, Linux Kernel Mailing List; +Cc: linux-mips

My network controller is part of a multicore SOC family[1] with up to 32 
cpu cores.

The packets-ready signal from the network controller can trigger
an interrupt on any or all cpus and is configurable on a per cpu basis.

If more than one cpu has the interrupt enabled, they would all get the
interrupt, so if a single packet were to be ready, all cpus could be
interrupted and try to process it.  The kernel interrupt management
functions don't seem to give me a good way to manage the interrupts.
More on this later.

My current approach is to add a NAPI instance for each cpu.  I start
with the interrupt enabled on a single cpu.  When the interrupt
triggers, I mask the interrupt on that cpu and schedule the
napi_poll.  When the napi_poll function is entered, I look at the
packet backlog and if it is above a threshold, I enable the interrupt
on an additional cpu.  The process then iterates until the number of cpus
running the napi_poll function can maintain the backlog under the
threshold.  This all seems to work fairly well.
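
The scale-up step described above can be sketched in user-space C
roughly as follows.  Everything here is a hypothetical model: struct
demo_rx_state, demo_maybe_add_cpu() and the "lowest free cpu" policy
are assumptions made for illustration, and the real driver manipulates
interrupt controller registers rather than a plain bit mask.

```c
#include <stdint.h>

#define DEMO_NR_CPUS 32

/* Hypothetical state: bit n set means the rx interrupt is enabled on cpu n. */
struct demo_rx_state {
	uint32_t enabled_mask;
	unsigned int threshold;	/* backlog level that triggers scale-up */
};

/* Called from the (simulated) poll routine.  If the backlog is above the
 * threshold, enable the interrupt on the lowest-numbered cpu that does not
 * have it yet.  Returns that cpu, or -1 if nothing was enabled. */
static int demo_maybe_add_cpu(struct demo_rx_state *st, unsigned int backlog)
{
	if (backlog <= st->threshold)
		return -1;

	for (int cpu = 0; cpu < DEMO_NR_CPUS; cpu++) {
		uint32_t bit = UINT32_C(1) << cpu;

		if (!(st->enabled_mask & bit)) {
			st->enabled_mask |= bit;
			return cpu;
		}
	}
	return -1;	/* every cpu already has the interrupt enabled */
}
```

Each poll adds at most one cpu, so the set of active cpus grows only
as long as the backlog stays above the threshold.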

The main problem I have encountered is how to fit the interrupt
management into the kernel framework.  Currently the interrupt source
is connected to a single irq number.  I request_irq, and then manage
the masking and unmasking on a per cpu basis by directly manipulating
the interrupt controller's affinity/routing registers.  This goes
behind the back of all the kernel's standard interrupt management
routines.  I am looking for a better approach.

One thing that comes to mind is that I could assign a different
interrupt number per cpu to the interrupt signal.  So instead of
having one irq I would have 32 of them.  The driver would then do
request_irq for all 32 irqs, and could call enable_irq and disable_irq
to enable and disable them.  The problem with this is that there isn't
really a single packets-ready signal, but instead 16 of them.  So if I
go this route I would have 16(lines) x 32(cpus) = 512 interrupt
numbers just for the networking hardware, which seems a bit excessive.

A second possibility is to add something like:

int irq_add_affinity(unsigned int irq, cpumask_t cpumask);

int irq_remove_affinity(unsigned int irq, cpumask_t cpumask);

These would atomically add and remove cpus from an irq's affinity.
This is essentially what my current driver does, but it would be with
a new officially blessed kernel interface.
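
To make the intended semantics concrete, here is a user-space sketch
using C11 atomics on a 32-bit mask.  It is not the proposed kernel
interface: struct demo_irq and the demo_* functions are hypothetical,
they take a plain uint32_t instead of a cpumask_t, and they return the
previous mask rather than an error code.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for one irq's affinity: bit n set = cpu n targeted. */
struct demo_irq {
	_Atomic uint32_t affinity;
};

/* Atomically add cpus to the affinity; returns the mask in effect before. */
static uint32_t demo_irq_add_affinity(struct demo_irq *irq, uint32_t cpus)
{
	return atomic_fetch_or(&irq->affinity, cpus);
}

/* Atomically remove cpus from the affinity; returns the mask in effect
 * before the removal. */
static uint32_t demo_irq_remove_affinity(struct demo_irq *irq, uint32_t cpus)
{
	return atomic_fetch_and(&irq->affinity, ~cpus);
}
```

The read-modify-write happens in one atomic step, so two cpus adding or
removing themselves at the same time cannot lose each other's update.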

Any opinions about the best way forward are most welcome.

Thanks,
David Daney

[1]: See: arch/mips/cavium-octeon and drivers/staging/octeon.  Yes, the
staging driver is ugly; I am working to improve it.

* Re: [Patch] sctp: remove deprecated SCTP_GET_*_OLD stuffs
From: Vlad Yasevich @ 2009-10-22 20:53 UTC (permalink / raw)
  To: Amerigo Wang; +Cc: linux-kernel, netdev, akpm
In-Reply-To: <20091015082849.4605.48311.sendpatchset@localhost.localdomain>


Amerigo Wang wrote:
> SCTP_GET_*_OLD stuffs are scheduled to be removed.
> 
> Cc: Vlad Yasevich <vladislav.yasevich@hp.com>
> Signed-off-by: WANG Cong <amwang@redhat.com>
> 
> 
> ---
> diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt
> index 04e6c81..0b0eee5 100644
> --- a/Documentation/feature-removal-schedule.txt
> +++ b/Documentation/feature-removal-schedule.txt
> @@ -302,18 +302,6 @@ Who:	ocfs2-devel@oss.oracle.com
>  
>  ---------------------------
>  
> -What:	SCTP_GET_PEER_ADDRS_NUM_OLD, SCTP_GET_PEER_ADDRS_OLD,
> -	SCTP_GET_LOCAL_ADDRS_NUM_OLD, SCTP_GET_LOCAL_ADDRS_OLD
> -When: 	June 2009
> -Why:    A newer version of the options have been introduced in 2005 that
> -	removes the limitions of the old API.  The sctp library has been
> -        converted to use these new options at the same time.  Any user
> -	space app that directly uses the old options should convert to using
> -	the new options.
> -Who:	Vlad Yasevich <vladislav.yasevich@hp.com>
> -
> ----------------------------
> -
>  What:	Ability for non root users to shm_get hugetlb pages based on mlock
>  	resource limits
>  When:	2.6.31
> diff --git a/include/net/sctp/user.h b/include/net/sctp/user.h
> index be2334a..0991f1b 100644
> --- a/include/net/sctp/user.h
> +++ b/include/net/sctp/user.h
> @@ -131,14 +131,6 @@ enum sctp_optname {
>  #define SCTP_SOCKOPT_BINDX_REM	SCTP_SOCKOPT_BINDX_REM
>  	SCTP_SOCKOPT_PEELOFF, 	/* peel off association. */
>  #define SCTP_SOCKOPT_PEELOFF	SCTP_SOCKOPT_PEELOFF
> -	SCTP_GET_PEER_ADDRS_NUM_OLD, 	/* Get number of peer addresss. */
> -#define SCTP_GET_PEER_ADDRS_NUM_OLD	SCTP_GET_PEER_ADDRS_NUM_OLD
> -	SCTP_GET_PEER_ADDRS_OLD, 	/* Get all peer addresss. */
> -#define SCTP_GET_PEER_ADDRS_OLD	SCTP_GET_PEER_ADDRS_OLD
> -	SCTP_GET_LOCAL_ADDRS_NUM_OLD, 	/* Get number of local addresss. */
> -#define SCTP_GET_LOCAL_ADDRS_NUM_OLD	SCTP_GET_LOCAL_ADDRS_NUM_OLD
> -	SCTP_GET_LOCAL_ADDRS_OLD, 	/* Get all local addresss. */
> -#define SCTP_GET_LOCAL_ADDRS_OLD	SCTP_GET_LOCAL_ADDRS_OLD
>  	SCTP_SOCKOPT_CONNECTX_OLD, /* CONNECTX old requests. */

After running the regression suite against this patch I find that we can't
remove the enum values.  Removing the enums changes the value for the remainder
of the definitions and breaks binary compatibility for applications that use
those trailing options.

You should be ok with removing the #defines and actual code that uses them,
but not the enums.  You can even rename the enums, but we must preserve
numeric ordering.
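
The effect can be shown with a small stand-alone example.  The enum
names and the starting value 102 are hypothetical and not the real
sctp_optname numbers; only the implicit-numbering behaviour matters.

```c
/* Hypothetical names; implicit numbering as in the original header. */
enum demo_before {
	BEFORE_PEELOFF = 102,
	BEFORE_PEER_ADDRS_NUM_OLD,	/* 103 */
	BEFORE_PEER_ADDRS_OLD,		/* 104 */
	BEFORE_LOCAL_ADDRS_NUM_OLD,	/* 105 */
	BEFORE_LOCAL_ADDRS_OLD,		/* 106 */
	BEFORE_CONNECTX_OLD,		/* 107 */
};

/* Entries deleted: everything after them silently moves down by four. */
enum demo_removed {
	REMOVED_PEELOFF = 102,
	REMOVED_CONNECTX_OLD,		/* 103, no longer 107 */
};

/* Entries renamed to placeholders: later values are unchanged. */
enum demo_renamed {
	RENAMED_PEELOFF = 102,
	RENAMED_UNUSED_1,		/* 103 */
	RENAMED_UNUSED_2,		/* 104 */
	RENAMED_UNUSED_3,		/* 105 */
	RENAMED_UNUSED_4,		/* 106 */
	RENAMED_CONNECTX_OLD,		/* 107 */
};
```

An application built against the old header passes 107 for the
CONNECTX option; with the entries deleted the kernel would interpret
107 as a different option, while the renamed variant keeps 107 intact.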

Can you resubmit a corrected patch?

-vlad

>  #define SCTP_SOCKOPT_CONNECTX_OLD	SCTP_SOCKOPT_CONNECTX_OLD
>  	SCTP_GET_PEER_ADDRS, 	/* Get all peer addresss. */
> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
> index c8d0575..1732a70 100644
> --- a/net/sctp/socket.c
> +++ b/net/sctp/socket.c
> @@ -4339,90 +4339,6 @@ static int sctp_getsockopt_initmsg(struct sock *sk, int len, char __user *optval
>  	return 0;
>  }
>  
> -static int sctp_getsockopt_peer_addrs_num_old(struct sock *sk, int len,
> -					      char __user *optval,
> -					      int __user *optlen)
> -{
> -	sctp_assoc_t id;
> -	struct sctp_association *asoc;
> -	struct list_head *pos;
> -	int cnt = 0;
> -
> -	if (len < sizeof(sctp_assoc_t))
> -		return -EINVAL;
> -
> -	if (copy_from_user(&id, optval, sizeof(sctp_assoc_t)))
> -		return -EFAULT;
> -
> -	printk(KERN_WARNING "SCTP: Use of SCTP_GET_PEER_ADDRS_NUM_OLD "
> -			    "socket option deprecated\n");
> -	/* For UDP-style sockets, id specifies the association to query.  */
> -	asoc = sctp_id2assoc(sk, id);
> -	if (!asoc)
> -		return -EINVAL;
> -
> -	list_for_each(pos, &asoc->peer.transport_addr_list) {
> -		cnt ++;
> -	}
> -
> -	return cnt;
> -}
> -
> -/*
> - * Old API for getting list of peer addresses. Does not work for 32-bit
> - * programs running on a 64-bit kernel
> - */
> -static int sctp_getsockopt_peer_addrs_old(struct sock *sk, int len,
> -					  char __user *optval,
> -					  int __user *optlen)
> -{
> -	struct sctp_association *asoc;
> -	int cnt = 0;
> -	struct sctp_getaddrs_old getaddrs;
> -	struct sctp_transport *from;
> -	void __user *to;
> -	union sctp_addr temp;
> -	struct sctp_sock *sp = sctp_sk(sk);
> -	int addrlen;
> -
> -	if (len < sizeof(struct sctp_getaddrs_old))
> -		return -EINVAL;
> -
> -	len = sizeof(struct sctp_getaddrs_old);
> -
> -	if (copy_from_user(&getaddrs, optval, len))
> -		return -EFAULT;
> -
> -	if (getaddrs.addr_num <= 0) return -EINVAL;
> -
> -	printk(KERN_WARNING "SCTP: Use of SCTP_GET_PEER_ADDRS_OLD "
> -			    "socket option deprecated\n");
> -
> -	/* For UDP-style sockets, id specifies the association to query.  */
> -	asoc = sctp_id2assoc(sk, getaddrs.assoc_id);
> -	if (!asoc)
> -		return -EINVAL;
> -
> -	to = (void __user *)getaddrs.addrs;
> -	list_for_each_entry(from, &asoc->peer.transport_addr_list,
> -				transports) {
> -		memcpy(&temp, &from->ipaddr, sizeof(temp));
> -		sctp_get_pf_specific(sk->sk_family)->addr_v4map(sp, &temp);
> -		addrlen = sctp_get_af_specific(sk->sk_family)->sockaddr_len;
> -		if (copy_to_user(to, &temp, addrlen))
> -			return -EFAULT;
> -		to += addrlen ;
> -		cnt ++;
> -		if (cnt >= getaddrs.addr_num) break;
> -	}
> -	getaddrs.addr_num = cnt;
> -	if (put_user(len, optlen))
> -		return -EFAULT;
> -	if (copy_to_user(optval, &getaddrs, len))
> -		return -EFAULT;
> -
> -	return 0;
> -}
>  
>  static int sctp_getsockopt_peer_addrs(struct sock *sk, int len,
>  				      char __user *optval, int __user *optlen)
> @@ -4475,81 +4391,6 @@ static int sctp_getsockopt_peer_addrs(struct sock *sk, int len,
>  	return 0;
>  }
>  
> -static int sctp_getsockopt_local_addrs_num_old(struct sock *sk, int len,
> -					       char __user *optval,
> -					       int __user *optlen)
> -{
> -	sctp_assoc_t id;
> -	struct sctp_bind_addr *bp;
> -	struct sctp_association *asoc;
> -	struct sctp_sockaddr_entry *addr;
> -	int cnt = 0;
> -
> -	if (len < sizeof(sctp_assoc_t))
> -		return -EINVAL;
> -
> -	if (copy_from_user(&id, optval, sizeof(sctp_assoc_t)))
> -		return -EFAULT;
> -
> -	printk(KERN_WARNING "SCTP: Use of SCTP_GET_LOCAL_ADDRS_NUM_OLD "
> -			    "socket option deprecated\n");
> -
> -	/*
> -	 *  For UDP-style sockets, id specifies the association to query.
> -	 *  If the id field is set to the value '0' then the locally bound
> -	 *  addresses are returned without regard to any particular
> -	 *  association.
> -	 */
> -	if (0 == id) {
> -		bp = &sctp_sk(sk)->ep->base.bind_addr;
> -	} else {
> -		asoc = sctp_id2assoc(sk, id);
> -		if (!asoc)
> -			return -EINVAL;
> -		bp = &asoc->base.bind_addr;
> -	}
> -
> -	/* If the endpoint is bound to 0.0.0.0 or ::0, count the valid
> -	 * addresses from the global local address list.
> -	 */
> -	if (sctp_list_single_entry(&bp->address_list)) {
> -		addr = list_entry(bp->address_list.next,
> -				  struct sctp_sockaddr_entry, list);
> -		if (sctp_is_any(sk, &addr->a)) {
> -			rcu_read_lock();
> -			list_for_each_entry_rcu(addr,
> -						&sctp_local_addr_list, list) {
> -				if (!addr->valid)
> -					continue;
> -
> -				if ((PF_INET == sk->sk_family) &&
> -				    (AF_INET6 == addr->a.sa.sa_family))
> -					continue;
> -
> -				if ((PF_INET6 == sk->sk_family) &&
> -				    inet_v6_ipv6only(sk) &&
> -				    (AF_INET == addr->a.sa.sa_family))
> -					continue;
> -
> -				cnt++;
> -			}
> -			rcu_read_unlock();
> -		} else {
> -			cnt = 1;
> -		}
> -		goto done;
> -	}
> -
> -	/* Protection on the bound address list is not needed,
> -	 * since in the socket option context we hold the socket lock,
> -	 * so there is no way that the bound address list can change.
> -	 */
> -	list_for_each_entry(addr, &bp->address_list, list) {
> -		cnt ++;
> -	}
> -done:
> -	return cnt;
> -}
>  
>  /* Helper function that copies local addresses to user and returns the number
>   * of addresses copied.
> @@ -4637,112 +4478,6 @@ static int sctp_copy_laddrs(struct sock *sk, __u16 port, void *to,
>  	return cnt;
>  }
>  
> -/* Old API for getting list of local addresses. Does not work for 32-bit
> - * programs running on a 64-bit kernel
> - */
> -static int sctp_getsockopt_local_addrs_old(struct sock *sk, int len,
> -					   char __user *optval, int __user *optlen)
> -{
> -	struct sctp_bind_addr *bp;
> -	struct sctp_association *asoc;
> -	int cnt = 0;
> -	struct sctp_getaddrs_old getaddrs;
> -	struct sctp_sockaddr_entry *addr;
> -	void __user *to;
> -	union sctp_addr temp;
> -	struct sctp_sock *sp = sctp_sk(sk);
> -	int addrlen;
> -	int err = 0;
> -	void *addrs;
> -	void *buf;
> -	int bytes_copied = 0;
> -
> -	if (len < sizeof(struct sctp_getaddrs_old))
> -		return -EINVAL;
> -
> -	len = sizeof(struct sctp_getaddrs_old);
> -	if (copy_from_user(&getaddrs, optval, len))
> -		return -EFAULT;
> -
> -	if (getaddrs.addr_num <= 0 ||
> -	    getaddrs.addr_num >= (INT_MAX / sizeof(union sctp_addr)))
> -		return -EINVAL;
> -
> -	printk(KERN_WARNING "SCTP: Use of SCTP_GET_LOCAL_ADDRS_OLD "
> -			    "socket option deprecated\n");
> -
> -	/*
> -	 *  For UDP-style sockets, id specifies the association to query.
> -	 *  If the id field is set to the value '0' then the locally bound
> -	 *  addresses are returned without regard to any particular
> -	 *  association.
> -	 */
> -	if (0 == getaddrs.assoc_id) {
> -		bp = &sctp_sk(sk)->ep->base.bind_addr;
> -	} else {
> -		asoc = sctp_id2assoc(sk, getaddrs.assoc_id);
> -		if (!asoc)
> -			return -EINVAL;
> -		bp = &asoc->base.bind_addr;
> -	}
> -
> -	to = getaddrs.addrs;
> -
> -	/* Allocate space for a local instance of packed array to hold all
> -	 * the data.  We store addresses here first and then put write them
> -	 * to the user in one shot.
> -	 */
> -	addrs = kmalloc(sizeof(union sctp_addr) * getaddrs.addr_num,
> -			GFP_KERNEL);
> -	if (!addrs)
> -		return -ENOMEM;
> -
> -	/* If the endpoint is bound to 0.0.0.0 or ::0, get the valid
> -	 * addresses from the global local address list.
> -	 */
> -	if (sctp_list_single_entry(&bp->address_list)) {
> -		addr = list_entry(bp->address_list.next,
> -				  struct sctp_sockaddr_entry, list);
> -		if (sctp_is_any(sk, &addr->a)) {
> -			cnt = sctp_copy_laddrs_old(sk, bp->port,
> -						   getaddrs.addr_num,
> -						   addrs, &bytes_copied);
> -			goto copy_getaddrs;
> -		}
> -	}
> -
> -	buf = addrs;
> -	/* Protection on the bound address list is not needed since
> -	 * in the socket option context we hold a socket lock and
> -	 * thus the bound address list can't change.
> -	 */
> -	list_for_each_entry(addr, &bp->address_list, list) {
> -		memcpy(&temp, &addr->a, sizeof(temp));
> -		sctp_get_pf_specific(sk->sk_family)->addr_v4map(sp, &temp);
> -		addrlen = sctp_get_af_specific(temp.sa.sa_family)->sockaddr_len;
> -		memcpy(buf, &temp, addrlen);
> -		buf += addrlen;
> -		bytes_copied += addrlen;
> -		cnt ++;
> -		if (cnt >= getaddrs.addr_num) break;
> -	}
> -
> -copy_getaddrs:
> -	/* copy the entire address list into the user provided space */
> -	if (copy_to_user(to, addrs, bytes_copied)) {
> -		err = -EFAULT;
> -		goto error;
> -	}
> -
> -	/* copy the leading structure back to user */
> -	getaddrs.addr_num = cnt;
> -	if (copy_to_user(optval, &getaddrs, len))
> -		err = -EFAULT;
> -
> -error:
> -	kfree(addrs);
> -	return err;
> -}
>  
>  static int sctp_getsockopt_local_addrs(struct sock *sk, int len,
>  				       char __user *optval, int __user *optlen)
> @@ -5593,22 +5328,6 @@ SCTP_STATIC int sctp_getsockopt(struct sock *sk, int level, int optname,
>  	case SCTP_INITMSG:
>  		retval = sctp_getsockopt_initmsg(sk, len, optval, optlen);
>  		break;
> -	case SCTP_GET_PEER_ADDRS_NUM_OLD:
> -		retval = sctp_getsockopt_peer_addrs_num_old(sk, len, optval,
> -							    optlen);
> -		break;
> -	case SCTP_GET_LOCAL_ADDRS_NUM_OLD:
> -		retval = sctp_getsockopt_local_addrs_num_old(sk, len, optval,
> -							     optlen);
> -		break;
> -	case SCTP_GET_PEER_ADDRS_OLD:
> -		retval = sctp_getsockopt_peer_addrs_old(sk, len, optval,
> -							optlen);
> -		break;
> -	case SCTP_GET_LOCAL_ADDRS_OLD:
> -		retval = sctp_getsockopt_local_addrs_old(sk, len, optval,
> -							 optlen);
> -		break;
>  	case SCTP_GET_PEER_ADDRS:
>  		retval = sctp_getsockopt_peer_addrs(sk, len, optval,
>  						    optlen);
> 

* Re: [PATCH 4/5] page allocator: Pre-emptively wake kswapd when high-order watermarks are hit
From: David Rientjes @ 2009-10-22 19:41 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Frans Pop, Jiri Kosina, Sven Geggus, Karol Lewandowski,
	Tobias Oetiker, Rafael J. Wysocki, David Miller, Reinette Chatre,
	Kalle Valo, KOSAKI Motohiro, Mohamed Abbas, Jens Axboe,
	John W. Linville, Pekka Enberg, Bartlomiej Zolnierkiewicz,
	Greg Kroah-Hartman, Stephan von Krawczynski, Kernel Testers List,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org
In-Reply-To: <1256221356-26049-5-git-send-email-mel-wPRd99KPJ+uzQB+pC5nmwQ@public.gmane.org>

On Thu, 22 Oct 2009, Mel Gorman wrote:

> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 7f2aa3e..851df40 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1596,6 +1596,17 @@ try_next_zone:
>  	return page;
>  }
>  
> +static inline
> +void wake_all_kswapd(unsigned int order, struct zonelist *zonelist,
> +						enum zone_type high_zoneidx)
> +{
> +	struct zoneref *z;
> +	struct zone *zone;
> +
> +	for_each_zone_zonelist(zone, z, zonelist, high_zoneidx)
> +		wakeup_kswapd(zone, order);
> +}
> +
>  static inline int
>  should_alloc_retry(gfp_t gfp_mask, unsigned int order,
>  				unsigned long pages_reclaimed)
> @@ -1730,18 +1741,18 @@ __alloc_pages_high_priority(gfp_t gfp_mask, unsigned int order,
>  			congestion_wait(BLK_RW_ASYNC, HZ/50);
>  	} while (!page && (gfp_mask & __GFP_NOFAIL));
>  
> -	return page;
> -}
> -
> -static inline
> -void wake_all_kswapd(unsigned int order, struct zonelist *zonelist,
> -						enum zone_type high_zoneidx)
> -{
> -	struct zoneref *z;
> -	struct zone *zone;
> +	/*
> +	 * If after a high-order allocation we are now below watermarks,
> +	 * pre-emptively kick kswapd rather than having the next allocation
> +	 * fail and have to wake up kswapd, potentially failing GFP_ATOMIC
> +	 * allocations or entering direct reclaim
> +	 */
> +	if (unlikely(order) && page && !zone_watermark_ok(preferred_zone, order,
> +				preferred_zone->watermark[ALLOC_WMARK_LOW],
> +				zone_idx(preferred_zone), ALLOC_WMARK_LOW))
> +		wake_all_kswapd(order, zonelist, high_zoneidx);
>  
> -	for_each_zone_zonelist(zone, z, zonelist, high_zoneidx)
> -		wakeup_kswapd(zone, order);
> +	return page;
>  }
>  
>  static inline int

Hmm, is this really supposed to be added to __alloc_pages_high_priority()?  
By the patch description I was expecting kswapd to be woken up 
preemptively whenever the preferred zone is below ALLOC_WMARK_LOW and 
we're known to have just allocated at a higher order, not just when 
current was oom killed (when we should already be freeing a _lot_ of 
memory soon) or is doing a higher order allocation during direct reclaim.

For the best coverage, the branch would have to be added to the fastpath.
That seems fine as a debugging aid and to see if progress is being made
on the GFP_ATOMIC allocation issues, but it doesn't seem like it should make
its way to mainline: the subsequent GFP_ATOMIC allocation could already be
happening and be in the page allocator's slowpath at this point, so this
wakeup becomes unnecessary.

If this is moved to the fastpath, why is this wake_all_kswapd() and not
wakeup_kswapd(preferred_zone, order)?  Do we need to kick kswapd in all 
zones even though they may be free just because preferred_zone is now 
below the watermark?

Wouldn't it be better to do this on page_zone(page) instead of 
preferred_zone anyway?
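
As a rough illustration of the alternative being asked about (wake only the
zone the page actually came from, and only when a high-order allocation has
left it at or below the low watermark), here is a user-space toy model. It is
a sketch under loud assumptions: struct toy_zone, toy_watermark_ok() and
toy_alloc() are invented for this example, and the watermark test ignores the
per-order free lists and lowmem reserves that the real zone_watermark_ok()
takes into account.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a memory zone; this is NOT the kernel's struct zone. */
struct toy_zone {
	unsigned long free_pages;
	unsigned long low_wmark;
	bool kswapd_woken;
};

/* Simplified watermark test: OK only while free pages exceed the low mark. */
bool toy_watermark_ok(const struct toy_zone *z)
{
	return z->free_pages > z->low_wmark;
}

/*
 * Take 2^order pages from the zone.  If this was a high-order allocation
 * and the zone is now at or below its low watermark, wake kswapd for this
 * zone only (modelled here by setting a flag).
 */
bool toy_alloc(struct toy_zone *z, unsigned int order)
{
	unsigned long nr = 1UL << order;

	assert(order < 11);	/* keep the shift sane; 11 is an arbitrary cap */
	if (z->free_pages < nr)
		return false;

	z->free_pages -= nr;
	if (order > 0 && !toy_watermark_ok(z))
		z->kswapd_woken = true;
	return true;
}
```

In this model a zone with 20 free pages and a low watermark of 16 is left
alone by an order-0 allocation, but an order-2 allocation right after it drops
the zone to 15 pages and sets the flag. A second zone with plenty of free
memory is never touched, which is the point of the question above.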

^ permalink raw reply

* [PATCH] DRIVERS: NET: USB: DM9601 driver can drive a device not supported yet, add support for it
From: Janusz Krzysztofik @ 2009-10-22 18:25 UTC (permalink / raw)
  To: Peter Korsgaard; +Cc: netdev

Hi,

I found that the current version of drivers/net/usb/dm9601.c can be used to
successfully drive a low-power, low-cost network adapter with USB ID
0a46:9000, based on a DM9000E chipset. As no device with this ID is yet
present in the kernel, I have created a patch that adds support for the device
to the dm9601 driver.

Created and tested against linux-2.6.32-rc5.

Signed-off-by: Janusz Krzysztofik <jkrzyszt@tis.icnet.pl>

---
There seems to be plenty of those devices available on ebay recently, for example: 
http://cgi.ebay.pl/USB-TO-Fast-Ethernet-Network-RJ45-Adapter-Converter-NIC_W0QQitemZ250458092252QQcmdZViewItemQQptZUK_Computing_Networking_SM?hash=item3a507732dc

--- linux-2.6.32-rc5/drivers/net/usb/dm9601.c.orig	2009-10-22 20:14:00.000000000 +0200
+++ linux-2.6.32-rc5/drivers/net/usb/dm9601.c	2009-10-22 20:14:04.000000000 +0200
@@ -649,6 +649,10 @@ static const struct usb_device_id produc
 	USB_DEVICE(0x0fe6, 0x8101),	/* DM9601 USB to Fast Ethernet Adapter */
 	.driver_info = (unsigned long)&dm9601_info,
 	 },
+	{
+	 USB_DEVICE(0x0a46, 0x9000),	/* DM9000E */
+	 .driver_info = (unsigned long)&dm9601_info,
+	 },
 	{},			// END
 };
 

^ permalink raw reply

* Re: [PATCH] DRIVERS: NET: USB: DM9601 driver can drive a device not supported yet, add support for it
From: Peter Korsgaard @ 2009-10-22 18:54 UTC (permalink / raw)
  To: Janusz Krzysztofik; +Cc: netdev
In-Reply-To: <200910222025.50405.jkrzyszt@tis.icnet.pl>

>>>>> "Janusz" == Janusz Krzysztofik <jkrzyszt@tis.icnet.pl> writes:

 Janusz> Hi,
 Janusz> I found that the current version of drivers/net/usb/dm9601.c can be used to
 Janusz> successfully drive a low-power, low-cost network adapter with USB ID
 Janusz> 0a46:9000, based on a DM9000E chipset. As no device with this ID is yet
 Janusz> present in the kernel, I have created a patch that adds support for the device
 Janusz> to the dm9601 driver.

 Janusz> Created and tested against linux-2.6.32-rc5.

 Janusz> Signed-off-by: Janusz Krzysztofik <jkrzyszt@tis.icnet.pl>

Thanks.

Acked-by: Peter Korsgaard <jacmet@sunsite.dk>

-- 
Bye, Peter Korsgaard

^ permalink raw reply

* Re: bridging + load balancing bonding
From: Eric Dumazet @ 2009-10-22 17:53 UTC (permalink / raw)
  To: Jay Vosburgh; +Cc: Jasper Spaans, netdev
In-Reply-To: <347.1256232960@death.nxdomain.ibm.com>

Jay Vosburgh wrote:

> 	By "packets from one flow" do you really mean that packets from
> a given "flow" (TCP connection, UDP "stream", etc) are not always
> delivered to the same bonding port?  I.e., that two packets from the
> same "flow" will be delivered to different ports?  I'm not sure how
> that's possible unless the source MAC in the packets changes during the
> course of the flow.
> 
> 	Or is your problem really that the balance algorithm on the
> bonding send side doesn't match the algorithm used on the other side of
> the IDS machines coming the other direction (and, thus, packets for a
> given flow going in one direction end up at a different IDS than the
> packets going the other direction)?
> 

Yes, this is probably Jasper's problem: catching both directions on the same IDS target.

Say you have machine A with MAC address MAC_A
and machine B with MAC address MAC_B
(I suspect asymmetric routing on A or B is out of the question :) )

A TCP/UDP/whatever protocol flow is running between these two machines.

When machine A sends a frame to machine B, Jasper's machine
receives a copy of this frame, with eth->src = MAC_A and eth->dst = MAC_B

With the current xor algo, we perform a hash on (bond->dev_addr[5] ^ MAC_B[5])  -> IDS X

When machine B sends a frame to machine A, Jasper's machine
receives a copy of this frame, with eth->src = MAC_B and eth->dst = MAC_A

With the current xor algo, we perform a hash on (bond->dev_addr[5] ^ MAC_A[5])  -> possibly other IDS Y


With his fix, algo is a commutative hash (MAC_A[5] ^ MAC_B[5]) ==  (MAC_B[5] ^ MAC_A[5])
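
To make the direction dependence concrete, here is a small user-space sketch
of the two formulas. It is not the bonding driver's code: the function names
xor_dst_bond() and xor_dst_src() are made up for illustration, and only the
last MAC byte is used, as in the layer2 policy quoted in this thread.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Stand-in for the current formula: last byte of the destination MAC
 * XOR last byte of the bond device's own MAC, modulo the slave count.
 */
unsigned int xor_dst_bond(const uint8_t dst[6], const uint8_t bond[6],
			  unsigned int count)
{
	assert(count > 0);
	return (unsigned int)(dst[5] ^ bond[5]) % count;
}

/*
 * Stand-in for the proposed formula: last byte of the destination MAC
 * XOR last byte of the frame's source MAC.  XOR is commutative, so
 * swapping source and destination gives the same slave.
 */
unsigned int xor_dst_src(const uint8_t dst[6], const uint8_t src[6],
			 unsigned int count)
{
	assert(count > 0);
	return (unsigned int)(dst[5] ^ src[5]) % count;
}
```

With MAC_A ending in 0x10, MAC_B ending in 0x11, a bond MAC ending in 0x00
and two slaves, xor_dst_bond() picks slave 1 for A->B but slave 0 for B->A,
while xor_dst_src() picks slave 1 in both directions.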



I suspect multicast/broadcast traffic should be sent to both IDS machines, so bonding might be inappropriate anyway...

An iptables solution might be more powerful.

^ permalink raw reply

* Re: bridging + load balancing bonding
From: Jay Vosburgh @ 2009-10-22 17:36 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Jasper Spaans, netdev
In-Reply-To: <4AE07D3C.3040702@gmail.com>

Eric Dumazet <eric.dumazet@gmail.com> wrote:

>Jasper Spaans wrote:
>> Hi,
>> 
>> We're using the following setup for bonding and bridging, to be able to put
>> large amounts of data through multiple IDS analyzers:
>> 
>>                              +---[br0]----+     +--- eth1 ---(IDS machine 1)
>> (Span port from switch) -- eth0          bond0--+
>>                                                 +--- eth2 ---(IDS machine 2)
>> 
>> eth0 receives network traffic, which should be passed to machines which are
>> connected to eth1 and eth2. These machines run an IDS package, and there are
>> two of those for performance reasons.
>> 
>> bond0 is configured to load balance the packets using "balance-xor", in this
>> case combined with xmit_hash_policy layer2.
>> 
>> However, we're seeing problems: packets from one flow do not end up at the
>> same IDS machine.  This is because this selection is not based on the source
>> _and_ destination mac addresses of the original packet, but on the mac
>> address of the bonding device and the destination mac address of the
>> package.

	By "packets from one flow" do you really mean that packets from
a given "flow" (TCP connection, UDP "stream", etc) are not always
delivered to the same bonding port?  I.e., that two packets from the
same "flow" will be delivered to different ports?  I'm not sure how
that's possible unless the source MAC in the packets changes during the
course of the flow.

	Or is your problem really that the balance algorithm on the
bonding send side doesn't match the algorithm used on the other side of
the IDS machines coming the other direction (and, thus, packets for a
given flow going in one direction end up at a different IDS than the
packets going the other direction)?

>> This is also clear in the code:
>> For example, in bond_main.c, in bond_xmit_hash_policy_l2:
>> 	return (data->h_dest[5] ^ bond_dev->dev_addr[5]) % count;
>> 
>> Changing this to
>> 	return (data->h_dest[5] ^ data->h_source[5]) % count;
>> fixes our problems, but is this harmful for packets originating locally (or
>> being routed?)
>> 
>> If not, can this be applied? Or does anyone have other ideas?
>> 
>
>Hi Jasper
>
>Very nice setup, and nice finding.
>
>Don't locally generated (or routed) packets have h_source set to
>bond_dev->dev_addr anyway ?

	Locally generated packets do, but he's got a bridge in there, so
the traffic they're balancing is presumably not locally generated (i.e.,
is being forwarded by the bridge, in which case they'll still bear the
source MAC of the originating node on the subnet).  If the packets were
being routed instead of bridged, then, yah, they'd have the bond's
source MAC.

>So your solution might be the right fix...

	Yes, I think he's found a legitimate bug, one that only will
manifest when balancing bridged traffic.  I had to think for a minute if
this change would break anything, and I'm coming up empty.  Locally
generated or routed traffic won't see a change, and bridged traffic will
be correctly balanced according to the "source MAC XOR destination MAC"
formula described in the documentation.
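
A quick way to check the "won't see a change" claim is to compare the two
formulas on a locally generated frame and on a bridged one. The sketch below
is user-space C written for this illustration only: struct toy_ethhdr,
toy_hash_old() and toy_hash_new() are hypothetical names, with field names
borrowed from struct ethhdr.

```c
#include <assert.h>
#include <stdint.h>

/* Minimal stand-in for an Ethernet header; not the kernel's struct ethhdr. */
struct toy_ethhdr {
	uint8_t h_dest[6];
	uint8_t h_source[6];
};

/* Current formula: destination MAC XOR the bond device's own MAC. */
unsigned int toy_hash_old(const struct toy_ethhdr *eth,
			  const uint8_t dev_addr[6], unsigned int count)
{
	assert(count > 0);
	return (unsigned int)(eth->h_dest[5] ^ dev_addr[5]) % count;
}

/* Proposed formula: destination MAC XOR the frame's source MAC. */
unsigned int toy_hash_new(const struct toy_ethhdr *eth, unsigned int count)
{
	assert(count > 0);
	return (unsigned int)(eth->h_dest[5] ^ eth->h_source[5]) % count;
}
```

For a frame whose h_source equals the bond's dev_addr (locally generated or
routed), both functions necessarily return the same slave. For a bridged
frame, h_source is the originating node's MAC, so the two can differ, and
only the new formula depends purely on the frame's own addresses.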

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com

^ permalink raw reply

* Re: [PATCH 2/5] page allocator: Do not allow interrupts to use ALLOC_HARDER
From: Mel Gorman @ 2009-10-22 16:37 UTC (permalink / raw)
  To: Stephan von Krawczynski
  Cc: Frans Pop, Jiri Kosina, Sven Geggus, Karol Lewandowski,
	Tobias Oetiker, Rafael J. Wysocki, David Miller, Reinette Chatre,
	Kalle Valo, David Rientjes, KOSAKI Motohiro, Mohamed Abbas,
	Jens Axboe, John W. Linville, Pekka Enberg,
	Bartlomiej Zolnierkiewicz, Greg Kroah-Hartman,
	Kernel Testers List, netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org
In-Reply-To: <20091022183303.2448942d.skraw-DcQCyzbjH0jQT0dZR+AlfA@public.gmane.org>

On Thu, Oct 22, 2009 at 06:33:03PM +0200, Stephan von Krawczynski wrote:
> On Thu, 22 Oct 2009 15:22:33 +0100
> Mel Gorman <mel-wPRd99KPJ+uzQB+pC5nmwQ@public.gmane.org> wrote:
> 
> > Commit 341ce06f69abfafa31b9468410a13dbd60e2b237 altered watermark logic
> > slightly by allowing rt_tasks that are handling an interrupt to set
> > ALLOC_HARDER. This patch brings the watermark logic more in line with
> > 2.6.30.
> > 
> > [rientjes-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org: Spotted the problem]
> > Signed-off-by: Mel Gorman <mel-wPRd99KPJ+uzQB+pC5nmwQ@public.gmane.org>
> > Reviewed-by: Pekka Enberg <penberg-bbCR+/B0CizivPeTLB3BmA@public.gmane.org>
> > ---
> >  mm/page_alloc.c |    2 +-
> >  1 files changed, 1 insertions(+), 1 deletions(-)
> > 
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index dfa4362..7f2aa3e 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -1769,7 +1769,7 @@ gfp_to_alloc_flags(gfp_t gfp_mask)
> >  		 * See also cpuset_zone_allowed() comment in kernel/cpuset.c.
> >  		 */
> >  		alloc_flags &= ~ALLOC_CPUSET;
> > -	} else if (unlikely(rt_task(p)))
> > +	} else if (unlikely(rt_task(p)) && !in_interrupt())
> >  		alloc_flags |= ALLOC_HARDER;
> >  
> >  	if (likely(!(gfp_mask & __GFP_NOMEMALLOC))) {
> > -- 
> > 1.6.3.3
> > 
> 
> Is it correct that this one applies offset -54 lines in 2.6.31.4 ? 
> 

In this case, it's ok. It's just a harmless heads-up that the kernel
looks slightly different than expected. I posted a 2.6.31.4 version of
the two patches that cause real problems.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply

* Re: [PATCH 2/5] page allocator: Do not allow interrupts to use ALLOC_HARDER
From: Stephan von Krawczynski @ 2009-10-22 16:33 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Frans Pop, Jiri Kosina, Sven Geggus, Karol Lewandowski,
	Tobias Oetiker, Rafael J. Wysocki, David Miller, Reinette Chatre,
	Kalle Valo, David Rientjes, KOSAKI Motohiro, Mohamed Abbas,
	Jens Axboe, John W. Linville, Pekka Enberg,
	Bartlomiej Zolnierkiewicz, Greg Kroah-Hartman,
	Kernel Testers List, netdev, linux-kernel,
	linux-mm@kvack.org, Mel Gorman
In-Reply-To: <1256221356-26049-3-git-send-email-mel@csn.ul.ie>

On Thu, 22 Oct 2009 15:22:33 +0100
Mel Gorman <mel@csn.ul.ie> wrote:

> Commit 341ce06f69abfafa31b9468410a13dbd60e2b237 altered watermark logic
> slightly by allowing rt_tasks that are handling an interrupt to set
> ALLOC_HARDER. This patch brings the watermark logic more in line with
> 2.6.30.
> 
> [rientjes@google.com: Spotted the problem]
> Signed-off-by: Mel Gorman <mel@csn.ul.ie>
> Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi>
> ---
>  mm/page_alloc.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index dfa4362..7f2aa3e 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1769,7 +1769,7 @@ gfp_to_alloc_flags(gfp_t gfp_mask)
>  		 * See also cpuset_zone_allowed() comment in kernel/cpuset.c.
>  		 */
>  		alloc_flags &= ~ALLOC_CPUSET;
> -	} else if (unlikely(rt_task(p)))
> +	} else if (unlikely(rt_task(p)) && !in_interrupt())
>  		alloc_flags |= ALLOC_HARDER;
>  
>  	if (likely(!(gfp_mask & __GFP_NOMEMALLOC))) {
> -- 
> 1.6.3.3
> 

Is it correct that this one applies offset -54 lines in 2.6.31.4 ? 

-- 
Regards,
Stephan

^ permalink raw reply

* Re: [PATCH 0/5] Candidate fix for increased number of GFP_ATOMIC failures V2
From: Mel Gorman @ 2009-10-22 16:03 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Frans Pop, Jiri Kosina, Sven Geggus, Karol Lewandowski,
	Tobias Oetiker, Rafael J. Wysocki, David Miller, Reinette Chatre,
	Kalle Valo, David Rientjes, KOSAKI Motohiro, Mohamed Abbas,
	Jens Axboe, John W. Linville, Bartlomiej Zolnierkiewicz,
	Greg Kroah-Hartman, Stephan von Krawczynski, Kernel Testers List,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	cl-de/tnXTf+JLsfHDXvbKv3Sm6D+HspMUB
In-Reply-To: <84144f020910220747nba30d8bkc83c2569da79bd7c-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

On Thu, Oct 22, 2009 at 05:47:10PM +0300, Pekka Enberg wrote:
> On Thu, Oct 22, 2009 at 5:22 PM, Mel Gorman <mel-wPRd99KPJ+uzQB+pC5nmwQ@public.gmane.org> wrote:
> > Test 1: Verify your problem occurs on 2.6.32-rc5 if you can
> >
> > Test 2: Apply the following two patches and test again
> >
> >  1/5 page allocator: Always wake kswapd when restarting an allocation attempt after direct reclaim failed
> >  2/5 page allocator: Do not allow interrupts to use ALLOC_HARDER
> 
> These are pretty obvious bug fixes and should go to linux-next ASAP IMHO.
> 

Agreed, but I wanted to pin down where exactly we stand with this
problem before sending patches any direction for merging.

> > Test 5: If things are still screwed, apply the following
> >  5/5 Revert 373c0a7e, 8aa7e847: Fix congestion_wait() sync/async vs read/write confusion
> >
> >        Frans Pop reports that the bulk of his problems go away when this
> >        patch is reverted on 2.6.31. There has been some confusion on why
> >        exactly this patch was wrong but apparently the conversion was not
> >        complete and further work was required. It's unknown if all the
> >        necessary work exists in 2.6.31-rc5 or not. If there are still
> >        allocation failures and applying this patch fixes the problem,
> >        there are still snags that need to be ironed out.
> 
> As explained by Jens Axboe, this changes timing but is not the source
> of the OOMs so the revert is bogus even if it "helps" on some
> workloads. IIRC the person who reported the revert to help things did
> report that the OOMs did not go away, they were simply harder to
> trigger with the revert.
> 

IIRC, there were mixed reports as to how much the revert helped.  I'm hoping
that patches 1+2 cover the bases hence why I asked them to be tested on
their own. Patch 2 in particular might be responsible for watermarks being
impacted enough to cause timing problems. I left reverting with patch 5 as
a standalone test to see how much of a factor the timing changes introduced
are if there are still allocation problems.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply

* Re: [PATCH 1/5] page allocator: Always wake kswapd when restarting an allocation attempt after direct reclaim failed
From: Mel Gorman @ 2009-10-22 15:49 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Frans Pop, Jiri Kosina, Sven Geggus, Karol Lewandowski,
	Tobias Oetiker, Rafael J. Wysocki, David Miller, Reinette Chatre,
	Kalle Valo, David Rientjes, KOSAKI Motohiro, Mohamed Abbas,
	Jens Axboe, John W. Linville, Bartlomiej Zolnierkiewicz,
	Greg Kroah-Hartman, Stephan von Krawczynski, Kernel Testers List,
	netdev, linux-kernel, linux-mm@kvack.org
In-Reply-To: <84144f020910220741o51c7e3dajcfd7b78d6dbbc4eb@mail.gmail.com>

On Thu, Oct 22, 2009 at 05:41:53PM +0300, Pekka Enberg wrote:
> On Thu, Oct 22, 2009 at 5:22 PM, Mel Gorman <mel@csn.ul.ie> wrote:
> > If a direct reclaim makes no forward progress, it considers whether it
> > should go OOM or not. Whether OOM is triggered or not, it may retry the
> > allocation afterwards. In times past, this would always wake kswapd as well
> > but currently, kswapd is not woken up after direct reclaim fails. For order-0
> > allocations, this makes little difference but if there is a heavy mix of
> > higher-order allocations that direct reclaim is failing for, it might mean
> > that kswapd is not rewoken for higher orders as much as it did previously.
> >
> > This patch wakes up kswapd when an allocation is being retried after a direct
> > reclaim failure. It would be expected that kswapd is already awake, but
> > this has the effect of telling kswapd to reclaim at the higher order as well.
> >
> > Signed-off-by: Mel Gorman <mel@csn.ul.ie>
> 
> You seem to have dropped the Reviewed-by tags from me and Christoph
> for this patch.
> 

My apologies. I missed them when going through the old mails.

> >  mm/page_alloc.c |    2 +-
> >  1 files changed, 1 insertions(+), 1 deletions(-)
> >
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index bf72055..dfa4362 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -1817,9 +1817,9 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
> >        if (NUMA_BUILD && (gfp_mask & GFP_THISNODE) == GFP_THISNODE)
> >                goto nopage;
> >
> > +restart:
> >        wake_all_kswapd(order, zonelist, high_zoneidx);
> >
> > -restart:
> >        /*
> >         * OK, we're below the kswapd watermark and have kicked background
> >         * reclaim. Now things get more complex, so set up alloc_flags according
> > --
> > 1.6.3.3
> >
> > --
> > To unsubscribe, send a message with 'unsubscribe linux-mm' in
> > the body to majordomo@kvack.org.  For more info on Linux MM,
> > see: http://www.linux-mm.org/ .
> > Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
> >
> 
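
The effect of moving the label can be modelled in a few lines of user-space
C. This is only a toy: toy_count_wakeups() and its parameters are invented
for illustration, and nothing here comes from mm/page_alloc.c beyond the
relative position of the restart label and the kswapd wakeup.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Count how often kswapd would be woken while an allocation is attempted
 * 'attempts' times, every attempt but the last failing direct reclaim and
 * jumping back to the restart label.
 *
 * wake_on_retry == false models the old layout (wakeup before the label),
 * wake_on_retry == true models the patched layout (wakeup after the label).
 */
unsigned int toy_count_wakeups(unsigned int attempts, bool wake_on_retry)
{
	unsigned int wakeups = 0;
	unsigned int tries = 0;

	assert(attempts > 0);

	if (!wake_on_retry)
		wakeups++;	/* old layout: woken once, never again */
restart:
	if (wake_on_retry)
		wakeups++;	/* patched layout: woken on every pass */

	tries++;
	if (tries < attempts)
		goto restart;

	return wakeups;
}
```

For three attempts the old layout yields a single wakeup and the patched
layout yields three, which is the "rewoken for higher orders" behaviour the
changelog describes.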

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply

* Re: [PATCH 0/5] Candidate fix for increased number of GFP_ATOMIC failures V2
From: reinette chatre @ 2009-10-22 15:43 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Frans Pop, Jiri Kosina, Sven Geggus, Karol Lewandowski,
	Tobias Oetiker, Rafael J. Wysocki, David Miller, Kalle Valo,
	David Rientjes, KOSAKI Motohiro, Abbas, Mohamed, Jens Axboe,
	John W. Linville, Pekka Enberg, Bartlomiej Zolnierkiewicz,
	Greg Kroah-Hartman, Stephan von Krawczynski, Kernel Testers List,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org
In-Reply-To: <1256221356-26049-1-git-send-email-mel@csn.ul.ie>

On Thu, 2009-10-22 at 07:22 -0700, Mel Gorman wrote:
> [Bug #14141] order 2 page allocation failures in iwlagn
> 	Commit 4752c93c30441f98f7ed723001b1a5e3e5619829 introduced GFP_ATOMIC
> 	allocations within the wireless driver. This has caused large numbers
> 	of failure reports to occur as reported by Frans Pop. Fixing this
> 	requires changes to the driver if it wants to use GFP_ATOMIC which
> 	is in the hands of Mohamed Abbas and Reinette Chatre. However,
> 	it is very likely that it has been compounded by core mm changes
> 	that this series is aimed at.

Driver has been changed to allocate paged skb for its receive buffers.
This reduces amount of memory needed from order-2 to order-1. This work
is significant and will thus be in 2.6.33. 

Reinette



^ permalink raw reply

* Re: bridging + load balancing bonding
From: Eric Dumazet @ 2009-10-22 15:41 UTC (permalink / raw)
  To: Jasper Spaans; +Cc: netdev
In-Reply-To: <20091022122339.GA20148@spaans.fox.local>

Jasper Spaans wrote:
> Hi,
> 
> We're using the following setup for bonding and bridging, to be able to put
> large amounts of data through multiple IDS analyzers:
> 
>                              +---[br0]----+     +--- eth1 ---(IDS machine 1)
> (Span port from switch) -- eth0          bond0--+
>                                                 +--- eth2 ---(IDS machine 2)
> 
> eth0 receives network traffic, which should be passed to machines which are
> connected to eth1 and eth2. These machines run an IDS package, and there are
> two of those for performance reasons.
> 
> bond0 is configured to load balance the packets using "balance-xor", in this
> case combined with xmit_hash_policy layer2.
> 
> However, we're seeing problems: packets from one flow do not end up at the
> same IDS machine.  This is because this selection is not based on the source
> _and_ destination mac addresses of the original packet, but on the mac
> address of the bonding device and the destination mac address of the
> package.
> 
> This is also clear in the code:
> For example, in bond_main.c, in bond_xmit_hash_policy_l2:
> 	return (data->h_dest[5] ^ bond_dev->dev_addr[5]) % count;
> 
> Changing this to
> 	return (data->h_dest[5] ^ data->h_source[5]) % count;
> fixes our problems, but is this harmful for packets originating locally (or
> being routed?)
> 
> If not, can this be applied? Or does anyone have other ideas?
> 

Hi Jasper

Very nice setup, and nice finding.

Don't locally generated (or routed) packets have h_source set to bond_dev->dev_addr anyway?

So your solution might be the right fix...

About other ideas... I was thinking of the TEE target (not in mainline, unfortunately):

iptables -t mangle -A PREROUTING -i eth0 <some hash on mac addr>  -j TEE --gateway 192.168.99.1  # IDS1
iptables -t mangle -A PREROUTING -i eth0 !<some hash on mac addr>  -j TEE --gateway 192.168.99.2  # IDS2



^ permalink raw reply
