Netdev List
* Re: [PATCH] net_sched: accurate bytes/packets stats/rates
From: Eric Dumazet @ 2011-01-14 19:21 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: David Miller, netdev, Patrick McHardy, jamal, Jarek Poplawski
In-Reply-To: <20110114110342.4d95ad5b@nehalam>

On Friday 14 January 2011 at 11:03 -0800, Stephen Hemminger wrote:
> From Eric Dumazet <eric.dumazet@gmail.com>
> 
> In commit 44b8288308ac9d (net_sched: pfifo_head_drop problem), we fixed
> a problem with pfifo_head drops that incorrectly decreased
> sch->bstats.bytes and sch->bstats.packets
> 
> Several qdiscs (CHOKe, SFQ, pfifo_head, ...) are able to drop a
> previously enqueued packet, and bstats cannot be changed, so
> bstats/rates are not accurate (over estimated)
> 
> This patch changes the qdisc_bstats updates to be done at dequeue() time
> instead of enqueue() time. bstats counters no longer account for dropped
> frames, and rates are more correct, since enqueue() bursts don't
> affect the dequeue() rate.
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> Acked-by: Stephen Hemminger <shemminger@vyatta.com>
> 
> CC: Patrick McHardy <kaber@trash.net>
> CC: Jarek Poplawski <jarkao2@gmail.com>
> CC: jamal <hadi@cyberus.ca>
> ---
> sch_fifo now changed to use __qdisc_queue_drop_head which
> keeps correct statistics and is actually clearer.
> 
>  

Thanks for doing this Stephen, this version seems fine.
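
For readers of the archive, here is a minimal user-space sketch of the
accounting change described above. It is not the kernel code: the toy_*
names and the 4-packet queue limit are made up for illustration. It keeps
both sets of counters on a head-drop FIFO so the two schemes can be compared.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_QLEN 4	/* hypothetical queue limit */

struct toy_stats {
	unsigned long bytes;
	unsigned long packets;
	unsigned long drops;
};

struct toy_fifo {
	unsigned int len[TOY_QLEN];	/* packet lengths, ring buffer */
	size_t head;
	size_t count;
	struct toy_stats enq_stats;	/* old scheme: counted in enqueue() */
	struct toy_stats deq_stats;	/* new scheme: counted in dequeue() */
};

/* Head-drop enqueue: when full, discard the oldest packet to make room. */
static void toy_enqueue(struct toy_fifo *q, unsigned int pkt_len)
{
	assert(pkt_len > 0);	/* 0 is reserved for "queue empty" below */

	if (q->count == TOY_QLEN) {
		q->head = (q->head + 1) % TOY_QLEN;
		q->count--;
		q->enq_stats.drops++;
		q->deq_stats.drops++;
	}
	q->len[(q->head + q->count) % TOY_QLEN] = pkt_len;
	q->count++;

	/* Old scheme: the packet is counted even if it is dropped later. */
	q->enq_stats.bytes += pkt_len;
	q->enq_stats.packets++;
}

/* Returns the length of the dequeued packet, or 0 if the queue is empty. */
static unsigned int toy_dequeue(struct toy_fifo *q)
{
	unsigned int pkt_len;

	if (q->count == 0)
		return 0;
	pkt_len = q->len[q->head];
	q->head = (q->head + 1) % TOY_QLEN;
	q->count--;

	/* New scheme: only packets that really leave the queue are counted. */
	q->deq_stats.bytes += pkt_len;
	q->deq_stats.packets++;
	return pkt_len;
}

/* Enqueue a burst of n equal-sized packets, then drain the queue. */
static void toy_burst_then_drain(struct toy_fifo *q, unsigned int n,
				 unsigned int pkt_len)
{
	while (n--)
		toy_enqueue(q, pkt_len);
	while (toy_dequeue(q))
		;
}
```

With a zero-initialised queue, a burst of 6 packets of 100 bytes followed by
a drain leaves enq_stats at 6 packets / 600 bytes, while deq_stats shows the
4 packets / 400 bytes that were actually sent, plus 2 drops.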



^ permalink raw reply

* Re: [PATCH 1/2] genirq: Add IRQ affinity notifiers
From: Thomas Gleixner @ 2011-01-14 19:47 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: David Miller, Tom Herbert, linux-kernel, netdev,
	linux-net-drivers
In-Reply-To: <1294169919.3636.33.camel@bwh-desktop>

On Tue, 4 Jan 2011, Ben Hutchings wrote:
> +/**
> + * struct irq_affinity_notify - context for notification of IRQ affinity changes
> + * @irq:		Interrupt to which notification applies
> + * @kref:		Reference count, for internal use
> + * @work:		Work item, for internal use
> + * @notify:		Function to be called on change.  This will be
> + *			called in process context.
> + * @release:		Function to be called on release.  This will be
> + *			called in process context.  Once registered, the
> + *			structure must only be freed when this function is
> + *			called or later.
> + */
> +struct irq_affinity_notify {
> +        unsigned int irq;
> +        struct kref kref;
> +#if defined(CONFIG_SMP) && defined(CONFIG_GENERIC_HARDIRQS)

The whole affinity thing is SMP and GENERIC_HARDIRQS only anyway, so
what's the point of this ifdeffery ?

> +        struct work_struct work;
> +#endif
> +        void (*notify)(struct irq_affinity_notify *, const cpumask_t *mask);
> +        void (*release)(struct kref *ref);
> +};
> +

> +/**
> + *	irq_set_affinity_notifier - control notification of IRQ affinity changes
> + *	@irq:		Interrupt for which to enable/disable notification
> + *	@notify:	Context for notification, or %NULL to disable
> + *			notification.  Function pointers must be initialised;
> + *			the other fields will be initialised by this function.
> + *
> + *	Must be called in process context.  Notification may only be enabled
> + *	after the IRQ is allocated but before it is bound with request_irq()

Why? And if there is that restriction, then it needs to be
checked. But I don't see why this is necessary.

> + *	and must be disabled before the IRQ is freed using free_irq().
> + */

> +#ifdef CONFIG_SMP
> +	BUG_ON(desc->affinity_notify);

We should be nice here and just WARN and fixup the wreckage by
uninstalling it.
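
A user-space sketch of the warn-and-uninstall idea, for illustration only.
The toy_* structures are hypothetical stand-ins (a plain int replaces
struct kref, fprintf replaces WARN), and the real genirq code would also
need locking and the work item, which are left out here.

```c
#include <assert.h>
#include <stdio.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins for the kernel structures. */
struct toy_notify {
	unsigned int irq;
	int refcount;				/* stands in for struct kref */
	void (*release)(struct toy_notify *n);	/* called when refcount hits 0 */
};

struct toy_desc {
	struct toy_notify *affinity_notify;	/* NULL when none installed */
};

static int toy_release_calls;	/* counts release() invocations, for the demo */

static void toy_release(struct toy_notify *n)
{
	(void)n;
	toy_release_calls++;
}

static void toy_put(struct toy_notify *n)
{
	assert(n->refcount > 0);
	if (--n->refcount == 0)
		n->release(n);
}

/* Install a notifier (or NULL to remove); drops the ref of any old one. */
static void toy_set_notifier(struct toy_desc *desc, struct toy_notify *n)
{
	struct toy_notify *old = desc->affinity_notify;

	if (n)
		n->refcount = 1;
	desc->affinity_notify = n;
	if (old)
		toy_put(old);
}

/*
 * Free path: rather than crashing on a leftover notifier, warn and
 * uninstall it. Returns 1 if a fix-up was needed, 0 otherwise.
 */
static int toy_free_irq(struct toy_desc *desc)
{
	if (desc->affinity_notify) {
		fprintf(stderr, "warning: irq %u freed with notifier installed\n",
			desc->affinity_notify->irq);
		toy_set_notifier(desc, NULL);
		return 1;
	}
	return 0;
}
```

The point of the design is that a caller that forgot to disable
notification gets a warning and a released context instead of a dead
machine; a second toy_free_irq() on the same descriptor finds nothing to
fix up.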

Thanks,

	tglx

^ permalink raw reply

* Re: [PATCH 1/2] genirq: Add IRQ affinity notifiers
From: Ben Hutchings @ 2011-01-14 20:06 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: David Miller, Tom Herbert, linux-kernel, netdev,
	linux-net-drivers
In-Reply-To: <alpine.LFD.2.00.1101141928210.2678@localhost6.localdomain6>

On Fri, 2011-01-14 at 20:47 +0100, Thomas Gleixner wrote:
> On Tue, 4 Jan 2011, Ben Hutchings wrote:
> > +/**
> > + * struct irq_affinity_notify - context for notification of IRQ affinity changes
> > + * @irq:		Interrupt to which notification applies
> > + * @kref:		Reference count, for internal use
> > + * @work:		Work item, for internal use
> > + * @notify:		Function to be called on change.  This will be
> > + *			called in process context.
> > + * @release:		Function to be called on release.  This will be
> > + *			called in process context.  Once registered, the
> > + *			structure must only be freed when this function is
> > + *			called or later.
> > + */
> > +struct irq_affinity_notify {
> > +        unsigned int irq;
> > +        struct kref kref;
> > +#if defined(CONFIG_SMP) && defined(CONFIG_GENERIC_HARDIRQS)
> 
> The whole affinity thing is SMP and GENERIC_HARDIRQS only anyway, so
> what's the point of this ifdeffery ?

The intent is that code using this can be compiled even if those config
options are not set.  The work_struct is not needed in that case.  I
think this is probably pointless though.

> > +        struct work_struct work;
> > +#endif
> > +        void (*notify)(struct irq_affinity_notify *, const cpumask_t *mask);
> > +        void (*release)(struct kref *ref);
> > +};
> > +
> 
> > +/**
> > + *	irq_set_affinity_notifier - control notification of IRQ affinity changes
> > + *	@irq:		Interrupt for which to enable/disable notification
> > + *	@notify:	Context for notification, or %NULL to disable
> > + *			notification.  Function pointers must be initialised;
> > + *			the other fields will be initialised by this function.
> > + *
> > + *	Must be called in process context.  Notification may only be enabled
> > + *	after the IRQ is allocated but before it is bound with request_irq()
> 
> Why? And if there is that restriction, then it needs to be
> checked. But I don't see why this is necessary.

Which restriction?

> > + *	and must be disabled before the IRQ is freed using free_irq().
> > + */
> 
> > +#ifdef CONFIG_SMP
> > +	BUG_ON(desc->affinity_notify);
> 
> We should be nice here and just WARN and fixup the wreckage by
> uninstalling it.

OK.

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* Re: [PATCH] bonding: added 802.3ad round-robin hashing policy for single TCP session balancing
From: John Fastabend @ 2011-01-14 20:10 UTC (permalink / raw)
  To: Oleg V. Ukhno; +Cc: netdev@vger.kernel.org, Jay Vosburgh, David S. Miller
In-Reply-To: <20110114190714.GA11655@yandex-team.ru>

On 1/14/2011 11:07 AM, Oleg V. Ukhno wrote:
> Patch introduces a new hashing policy for 802.3ad bonding mode.
> This hashing policy can be used (and was tested) only for round-robin
> balancing of iSCSI traffic (a single TCP session is balanced per-packet
> over all slave interfaces).
> General requirements for using this hashing policy are:
> 1) the switch must be configured with the src-dst-mac or src-mac hashing policy
> 2) the number of bond slaves on the sending and receiving machines should be
> equal and preferably even; or simply even, otherwise you may get asymmetric
> load on the receiving machine
> 3) the hashing policy must not be used when the round trip time between source
> and destination machines for slaves in the same bond is expected to be
> significantly different (it works fine when all slaves are plugged into a
> single switch)
> 
> Signed-off-by: Oleg V. Ukhno <olegu@yandex-team.ru>
> ---

I think you want this patch against net-next not 2.6.37.

> 
>  Documentation/networking/bonding.txt |   27 +++++++++++++++++++++++++++
>  drivers/net/bonding/bond_3ad.c       |    6 ++++++
>  drivers/net/bonding/bond_main.c      |   18 +++++++++++++++++-
>  include/linux/if_bonding.h           |    1 +
>  4 files changed, 51 insertions(+), 1 deletion(-)
> 
> diff -uprN -X linux-2.6.37-vanilla/Documentation/dontdiff linux-2.6.37-vanilla/Documentation/networking/bonding.txt linux-2.6.37.my/Documentation/networking/bonding.txt
> --- linux-2.6.37-vanilla/Documentation/networking/bonding.txt	2011-01-05 03:50:19.000000000 +0300
> +++ linux-2.6.37.my/Documentation/networking/bonding.txt	2011-01-14 21:34:46.635268000 +0300
> @@ -759,6 +759,33 @@ xmit_hash_policy
>  		most UDP traffic is not involved in extended
>  		conversations.  Other implementations of 802.3ad may
>  		or may not tolerate this noncompliance.
> +
> +	simple-rr or 3
> +		This policy simply sends every next packet via "next"
> +		slave interface. When sending, it resets mac-address
> +		within packet to real mac-address of the slave interface.
> +
> +		When switch is configured properly, and receiving machine
> +		has even and equal number of interfaces, this guarantees
> +		quite precise rx/tx load balancing for any single TCP
> +		session. Typical use-case for this mode is ISCSI(and patch was
> +		developed for), because it ises single TCP session to
> +		transmit data.

Oleg, sorry but I don't follow. If this is simply sending every next packet
via "next" slave interface how are packets not going to get out of order? If
the links have different RTT this would seem problematic.

Have you considered using multipath at the block layer? This is how I generally
handle load balancing over iSCSI/FCoE and it works reasonably well.

see ./drivers/md/dm-mpath.c
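
To make the reordering concern concrete, here is a small user-space model,
not bonding code. It assumes packet i is sent at time i * gap_us on link
(i % nlinks) and that each link adds a fixed one-way delay; the function
name and all numbers are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Count packets that arrive before some earlier-sent packet, i.e. packets
 * that overtake their predecessors, under strict per-packet round robin.
 */
static unsigned int toy_count_reordered(const unsigned int *delay_us,
					size_t nlinks, unsigned int gap_us,
					unsigned int npackets)
{
	unsigned long latest = 0;	/* latest arrival time seen so far */
	unsigned int reordered = 0;
	unsigned int i;

	assert(nlinks > 0);
	for (i = 0; i < npackets; i++) {
		unsigned long arrival =
			(unsigned long)i * gap_us + delay_us[i % nlinks];

		if (i > 0 && arrival < latest)
			reordered++;	/* overtook an earlier packet */
		if (arrival > latest)
			latest = arrival;
	}
	return reordered;
}
```

In this model two links with equal delay never reorder, while delays of
50 and 10 microseconds with a 10 microsecond send gap make every packet on
the faster link overtake its predecessor. Reordering disappears again once
the delay difference is smaller than the send gap.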

> +
> +		It is important to remember, that all slaves should be
> +		plugged into single switch to avoid out-of-order packets
> +		It is recommended to have equal and even number of slave
> +		interfaces in sending and receviving machines bond's,
> +		otherwise you will get asymmetric load on receiving host.
> +		Another caveat is that hashing policy must not be used when
> +		round trip time between source and destination machines for
> +		slaves in same bond is expected to be significanly different
> +		(it works fine when all slaves are plugged into single switch)
> +
> +		For correct load baalncing on the receiving side you must
> +		configure switch for using src-dst-mac or src-mac hashing
> +		mode.
> +
>  
>  	The default value is layer2.  This option was added in bonding
>  	version 2.6.3.  In earlier versions of bonding, this parameter
> diff -uprN -X linux-2.6.37-vanilla/Documentation/dontdiff linux-2.6.37-vanilla/drivers/net/bonding/bond_3ad.c linux-2.6.37.my/drivers/net/bonding/bond_3ad.c
> --- linux-2.6.37-vanilla/drivers/net/bonding/bond_3ad.c	2011-01-14 19:39:05.575268000 +0300
> +++ linux-2.6.37.my/drivers/net/bonding/bond_3ad.c	2011-01-14 19:47:03.815268000 +0300
> @@ -2395,6 +2395,7 @@ int bond_3ad_xmit_xor(struct sk_buff *sk
>  	int i;
>  	struct ad_info ad_info;
>  	int res = 1;
> +	struct ethhdr *eth_data;
>  
>  	/* make sure that the slaves list will
>  	 * not change during tx
> @@ -2447,6 +2448,11 @@ int bond_3ad_xmit_xor(struct sk_buff *sk
>  			slave_agg_id = agg->aggregator_identifier;
>  
>  		if (SLAVE_IS_OK(slave) && agg && (slave_agg_id == agg_id)) {
> +			if (bond->params.xmit_policy == BOND_XMIT_POLICY_LAYERRR && ntohs(skb->protocol) == ETH_P_IP) {
> +				skb_reset_mac_header(skb);
> +				eth_data = eth_hdr(skb);
> +				memcpy(eth_data->h_source, slave->perm_hwaddr, ETH_ALEN);
> +			}
>  			res = bond_dev_queue_xmit(bond, skb, slave->dev);
>  			break;
>  		}
> diff -uprN -X linux-2.6.37-vanilla/Documentation/dontdiff linux-2.6.37-vanilla/drivers/net/bonding/bond_main.c linux-2.6.37.my/drivers/net/bonding/bond_main.c
> --- linux-2.6.37-vanilla/drivers/net/bonding/bond_main.c	2011-01-14 19:39:05.575268000 +0300
> +++ linux-2.6.37.my/drivers/net/bonding/bond_main.c	2011-01-14 19:47:55.835268001 +0300
> @@ -152,7 +152,9 @@ module_param(ad_select, charp, 0);
>  MODULE_PARM_DESC(ad_select, "803.ad aggregation selection logic: stable (0, default), bandwidth (1), count (2)");
>  module_param(xmit_hash_policy, charp, 0);
>  MODULE_PARM_DESC(xmit_hash_policy, "XOR hashing method: 0 for layer 2 (default)"
> -				   ", 1 for layer 3+4");
> +				   ", 1 for layer 3+4"
> +				   ", 2 for layer 2+3"
> +				   ", 3 for round-robin");
>  module_param(arp_interval, int, 0);
>  MODULE_PARM_DESC(arp_interval, "arp interval in milliseconds");
>  module_param_array(arp_ip_target, charp, NULL, 0);
> @@ -206,6 +208,7 @@ const struct bond_parm_tbl xmit_hashtype
>  {	"layer2",		BOND_XMIT_POLICY_LAYER2},
>  {	"layer3+4",		BOND_XMIT_POLICY_LAYER34},
>  {	"layer2+3",		BOND_XMIT_POLICY_LAYER23},
> +{	"simple-rr",		BOND_XMIT_POLICY_LAYERRR},
>  {	NULL,			-1},
>  };
>  
> @@ -3762,6 +3765,16 @@ static int bond_xmit_hash_policy_l2(stru
>  	return (data->h_dest[5] ^ data->h_source[5]) % count;
>  }
>  
> +/*
> + * simply round robin
> + */
> +static int bond_xmit_hash_policy_rr(struct sk_buff *skb,
> +				   struct net_device *bond_dev, int count)

Here's one reason why this won't work on net-next-2.6.

int      (*xmit_hash_policy)(struct sk_buff *, int);


Thanks,
John

^ permalink raw reply

* Re: [PATCH] bonding: added 802.3ad round-robin hashing policy for single TCP session balancing
From: Jay Vosburgh @ 2011-01-14 20:13 UTC (permalink / raw)
  To: Oleg V. Ukhno; +Cc: netdev, David S. Miller
In-Reply-To: <20110114190714.GA11655@yandex-team.ru>

Oleg V. Ukhno <olegu@yandex-team.ru> wrote:

>Patch introduces a new hashing policy for 802.3ad bonding mode.
>This hashing policy can be used (and was tested) only for round-robin
>balancing of iSCSI traffic (a single TCP session is balanced per-packet
>over all slave interfaces).

	This is a violation of the 802.3ad (now 802.1ax) standard, 5.2.1
(f), which requires that all frames of a given "conversation" are passed
to a single port.

	The existing layer3+4 hash has a similar problem (it may send
packets from a conversation to multiple ports), but there it is an
unlikely exception (only in the case of IP fragmentation), whereas here
it is the norm.  At a minimum, this must be clearly documented.
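
	A user-space sketch of the difference, for illustration only.
The two selectors mirror the expressions in the quoted patch (the layer2
XOR of the last MAC bytes, and the rr_tx_counter increment); the toy_*
names, the MAC addresses, and the helper that counts ports are made up.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_ETH_ALEN 6

/* Layer 2 policy: XOR of the last byte of each MAC address. */
static int toy_hash_l2(const unsigned char *h_dest,
		       const unsigned char *h_source, int count)
{
	return (h_dest[TOY_ETH_ALEN - 1] ^ h_source[TOY_ETH_ALEN - 1]) % count;
}

/* Round-robin policy: ignores the frame and advances a shared counter. */
static int toy_hash_rr(unsigned int *rr_tx_counter, int count)
{
	return (int)((*rr_tx_counter)++ % (unsigned int)count);
}

/* Number of distinct ports used when one conversation sends n frames. */
static int toy_ports_used(int use_rr, int count, int n)
{
	static const unsigned char dst[TOY_ETH_ALEN] =
		{0x00, 0x11, 0x22, 0x33, 0x44, 0x55};
	static const unsigned char src[TOY_ETH_ALEN] =
		{0x00, 0xaa, 0xbb, 0xcc, 0xdd, 0x0e};
	unsigned int counter = 0;
	unsigned int seen = 0;	/* bitmask of ports, so count must be <= 32 */
	int used = 0;
	int i;

	assert(count > 0 && count <= 32);
	for (i = 0; i < n; i++) {
		int port = use_rr ? toy_hash_rr(&counter, count)
				  : toy_hash_l2(dst, src, count);

		if (!(seen & (1u << port))) {
			seen |= 1u << port;
			used++;
		}
	}
	return used;
}
```

	With 4 ports and 100 frames between the same pair of MAC
addresses, the layer2 selector uses exactly one port, while the
round-robin selector spreads the conversation over all four.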

	Also, what does a round robin in 802.3ad provide that the
existing round robin does not?  My presumption is that you're looking to
get the aggregator autoconfiguration that 802.3ad provides, but you
don't say.

	I don't necessarily think this is a bad cheat (round robining on
802.3ad as an explicit non-standard extension), since everybody wants to
stripe their traffic across multiple slaves.  I've given some thought to
making round robin into just another hash mode, but this also does some
magic to the MAC addresses of the outgoing frames (more on that below).

>General requirements for using this hashing policy are:
>1) the switch must be configured with the src-dst-mac or src-mac hashing policy
>2) the number of bond slaves on the sending and receiving machines should be
>equal and preferably even; or simply even, otherwise you may get asymmetric
>load on the receiving machine
>3) the hashing policy must not be used when the round trip time between source
>and destination machines for slaves in the same bond is expected to be
>significantly different (it works fine when all slaves are plugged into a
>single switch)
>
>Signed-off-by: Oleg V. Ukhno <olegu@yandex-team.ru>
>---
>
> Documentation/networking/bonding.txt |   27 +++++++++++++++++++++++++++
> drivers/net/bonding/bond_3ad.c       |    6 ++++++
> drivers/net/bonding/bond_main.c      |   18 +++++++++++++++++-
> include/linux/if_bonding.h           |    1 +
> 4 files changed, 51 insertions(+), 1 deletion(-)
>
>diff -uprN -X linux-2.6.37-vanilla/Documentation/dontdiff linux-2.6.37-vanilla/Documentation/networking/bonding.txt linux-2.6.37.my/Documentation/networking/bonding.txt
>--- linux-2.6.37-vanilla/Documentation/networking/bonding.txt	2011-01-05 03:50:19.000000000 +0300
>+++ linux-2.6.37.my/Documentation/networking/bonding.txt	2011-01-14 21:34:46.635268000 +0300
>@@ -759,6 +759,33 @@ xmit_hash_policy
> 		most UDP traffic is not involved in extended
> 		conversations.  Other implementations of 802.3ad may
> 		or may not tolerate this noncompliance.
>+
>+	simple-rr or 3
>+		This policy simply sends every next packet via "next"
>+		slave interface. When sending, it resets mac-address
>+		within packet to real mac-address of the slave interface.

	Why is the MAC address reset done?  This is also a violation of
802.3ad, 5.2.1 (j).

>+		When switch is configured properly, and receiving machine
>+		has even and equal number of interfaces, this guarantees
>+		quite precise rx/tx load balancing for any single TCP
>+		session. Typical use-case for this mode is ISCSI(and patch was
>+		developed for), because it ises single TCP session to
>+		transmit data.
>+
>+		It is important to remember, that all slaves should be
>+		plugged into single switch to avoid out-of-order packets
>+		It is recommended to have equal and even number of slave
>+		interfaces in sending and receviving machines bond's,
>+		otherwise you will get asymmetric load on receiving host.
>+		Another caveat is that hashing policy must not be used when
>+		round trip time between source and destination machines for
>+		slaves in same bond is expected to be significanly different
>+		(it works fine when all slaves are plugged into single switch)
>+
>+		For correct load baalncing on the receiving side you must
>+		configure switch for using src-dst-mac or src-mac hashing
>+		mode.
>+
>
> 	The default value is layer2.  This option was added in bonding
> 	version 2.6.3.  In earlier versions of bonding, this parameter
>diff -uprN -X linux-2.6.37-vanilla/Documentation/dontdiff linux-2.6.37-vanilla/drivers/net/bonding/bond_3ad.c linux-2.6.37.my/drivers/net/bonding/bond_3ad.c
>--- linux-2.6.37-vanilla/drivers/net/bonding/bond_3ad.c	2011-01-14 19:39:05.575268000 +0300
>+++ linux-2.6.37.my/drivers/net/bonding/bond_3ad.c	2011-01-14 19:47:03.815268000 +0300
>@@ -2395,6 +2395,7 @@ int bond_3ad_xmit_xor(struct sk_buff *sk
> 	int i;
> 	struct ad_info ad_info;
> 	int res = 1;
>+	struct ethhdr *eth_data;
>
> 	/* make sure that the slaves list will
> 	 * not change during tx
>@@ -2447,6 +2448,11 @@ int bond_3ad_xmit_xor(struct sk_buff *sk
> 			slave_agg_id = agg->aggregator_identifier;
>
> 		if (SLAVE_IS_OK(slave) && agg && (slave_agg_id == agg_id)) {
>+			if (bond->params.xmit_policy == BOND_XMIT_POLICY_LAYERRR && ntohs(skb->protocol) == ETH_P_IP) {
>+				skb_reset_mac_header(skb);
>+				eth_data = eth_hdr(skb);
>+				memcpy(eth_data->h_source, slave->perm_hwaddr, ETH_ALEN);
>+			}

	This is the code that resets the MAC header as described above.
It doesn't quite match the documentation, since it only resets the MAC
for ETH_P_IP packets.

> 			res = bond_dev_queue_xmit(bond, skb, slave->dev);
> 			break;
> 		}
>diff -uprN -X linux-2.6.37-vanilla/Documentation/dontdiff linux-2.6.37-vanilla/drivers/net/bonding/bond_main.c linux-2.6.37.my/drivers/net/bonding/bond_main.c
>--- linux-2.6.37-vanilla/drivers/net/bonding/bond_main.c	2011-01-14 19:39:05.575268000 +0300
>+++ linux-2.6.37.my/drivers/net/bonding/bond_main.c	2011-01-14 19:47:55.835268001 +0300
>@@ -152,7 +152,9 @@ module_param(ad_select, charp, 0);
> MODULE_PARM_DESC(ad_select, "803.ad aggregation selection logic: stable (0, default), bandwidth (1), count (2)");
> module_param(xmit_hash_policy, charp, 0);
> MODULE_PARM_DESC(xmit_hash_policy, "XOR hashing method: 0 for layer 2 (default)"
>-				   ", 1 for layer 3+4");
>+				   ", 1 for layer 3+4"
>+				   ", 2 for layer 2+3"
>+				   ", 3 for round-robin");
> module_param(arp_interval, int, 0);
> MODULE_PARM_DESC(arp_interval, "arp interval in milliseconds");
> module_param_array(arp_ip_target, charp, NULL, 0);
>@@ -206,6 +208,7 @@ const struct bond_parm_tbl xmit_hashtype
> {	"layer2",		BOND_XMIT_POLICY_LAYER2},
> {	"layer3+4",		BOND_XMIT_POLICY_LAYER34},
> {	"layer2+3",		BOND_XMIT_POLICY_LAYER23},
>+{	"simple-rr",		BOND_XMIT_POLICY_LAYERRR},

	I'd just call it "round-robin" instead of "simple-rr".

> {	NULL,			-1},
> };
>
>@@ -3762,6 +3765,16 @@ static int bond_xmit_hash_policy_l2(stru
> 	return (data->h_dest[5] ^ data->h_source[5]) % count;
> }
>
>+/*
>+ * simply round robin
>+ */
>+static int bond_xmit_hash_policy_rr(struct sk_buff *skb,
>+				   struct net_device *bond_dev, int count)
>+{
>+	struct bonding *bond = netdev_priv(bond_dev);
>+	return bond->rr_tx_counter++ % count;
>+}
>+
> /*-------------------------- Device entry points ----------------------------*/
>
> static int bond_open(struct net_device *bond_dev)
>@@ -4482,6 +4495,9 @@ out:
> static void bond_set_xmit_hash_policy(struct bonding *bond)
> {
> 	switch (bond->params.xmit_policy) {
>+	case BOND_XMIT_POLICY_LAYERRR:
>+		bond->xmit_hash_policy = bond_xmit_hash_policy_rr;
>+		break;
> 	case BOND_XMIT_POLICY_LAYER23:
> 		bond->xmit_hash_policy = bond_xmit_hash_policy_l23;
> 		break;
>diff -uprN -X linux-2.6.37-vanilla/Documentation/dontdiff linux-2.6.37-vanilla/include/linux/if_bonding.h linux-2.6.37.my/include/linux/if_bonding.h
>--- linux-2.6.37-vanilla/include/linux/if_bonding.h	2011-01-05 03:50:19.000000000 +0300
>+++ linux-2.6.37.my/include/linux/if_bonding.h	2011-01-14 19:34:29.755268001 +0300
>@@ -91,6 +91,7 @@
> #define BOND_XMIT_POLICY_LAYER2		0 /* layer 2 (MAC only), default */
> #define BOND_XMIT_POLICY_LAYER34	1 /* layer 3+4 (IP ^ (TCP || UDP)) */
> #define BOND_XMIT_POLICY_LAYER23	2 /* layer 2+3 (IP ^ MAC) */
>+#define BOND_XMIT_POLICY_LAYERRR	3 /* round-robin */
>
> typedef struct ifbond {
> 	__s32 bond_mode;

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com

^ permalink raw reply

* Re: [PATCH 1/2] genirq: Add IRQ affinity notifiers
From: Thomas Gleixner @ 2011-01-14 20:40 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: David Miller, Tom Herbert, linux-kernel, netdev,
	linux-net-drivers
In-Reply-To: <1295035597.5386.8.camel@bwh-desktop>

On Fri, 14 Jan 2011, Ben Hutchings wrote:
> On Fri, 2011-01-14 at 20:47 +0100, Thomas Gleixner wrote:
> > > +#if defined(CONFIG_SMP) && defined(CONFIG_GENERIC_HARDIRQS)
> > 
> > The whole affinity thing is SMP and GENERIC_HARDIRQS only anyway, so
> > what's the point of this ifdeffery ?
> 
> The intent is that code using this can be compiled even if those config
> options are not set.  The work_struct is not needed in that case.  I
> think this is probably pointless though.

Yup, work_struct is defined for the !SMP and !GENERIC_HARDIRQS case as
well :)
 
> > > +        struct work_struct work;
> > > +#endif
> > > +        void (*notify)(struct irq_affinity_notify *, const cpumask_t *mask);
> > > +        void (*release)(struct kref *ref);
> > > +};
> > > +
> > 
> > > +/**
> > > + *	irq_set_affinity_notifier - control notification of IRQ affinity changes
> > > + *	@irq:		Interrupt for which to enable/disable notification
> > > + *	@notify:	Context for notification, or %NULL to disable
> > > + *			notification.  Function pointers must be initialised;
> > > + *			the other fields will be initialised by this function.
> > > + *
> > > + *	Must be called in process context.  Notification may only be enabled
> > > + *	after the IRQ is allocated but before it is bound with request_irq()
> > 
> > Why? And if there is that restriction, then it needs to be
> > checked. But I don't see why this is necessary.
> 
> Which restriction?

  Notification may only be enabled after the IRQ is allocated but
  before it is bound with request_irq()

That it must happen after the IRQ is allocated is obvious, but why does
it need to be done _before_ request_irq()?

Thanks,

	tglx

^ permalink raw reply

* Re: Kernel 2.6.37-git10 build failure: cassini.c
From: David Miller @ 2011-01-14 20:41 UTC (permalink / raw)
  To: anca.emanuel
  Cc: linux-kernel, netdev, grant.likely, eric.dumazet, joe, siccegge,
	jpirko
In-Reply-To: <AANLkTin61iqrD0Jbax+cvQ1wxD8b=c4KJv7pBSvU7kbv@mail.gmail.com>

From: Anca Emanuel <anca.emanuel@gmail.com>
Date: Fri, 14 Jan 2011 10:09:43 +0200

> drivers/net/cassini.c: In function ‘cas_get_vpd_info’:
> drivers/net/cassini.c:3358: error: implicit declaration of function
> ‘of_get_property’
> drivers/net/cassini.c:3358: warning: assignment makes pointer from
> integer without a cast
> drivers/net/cassini.c: In function ‘cas_init_one’:
> drivers/net/cassini.c:5035: error: implicit declaration of function
> ‘pci_device_to_OF_node’
> drivers/net/cassini.c:5035: warning: assignment makes pointer from
> integer without a cast
> make[3]: *** [drivers/net/cassini.o] Error 1
> make[2]: *** [drivers/net] Error 2

This is the fix I'll be using:

--------------------
cassini: Fix build bustage on x86.

Unfortunately, not all CONFIG_OF platforms provide
pci_device_to_OF_node().

Change the test to CONFIG_SPARC for now to deal with
the build regressions.

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 drivers/net/cassini.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/cassini.c b/drivers/net/cassini.c
index 7206ab2..3437613 100644
--- a/drivers/net/cassini.c
+++ b/drivers/net/cassini.c
@@ -3203,7 +3203,7 @@ static int cas_get_vpd_info(struct cas *cp, unsigned char *dev_addr,
 	int phy_type = CAS_PHY_MII_MDIO0; /* default phy type */
 	int mac_off  = 0;
 
-#if defined(CONFIG_OF)
+#if defined(CONFIG_SPARC)
 	const unsigned char *addr;
 #endif
 
@@ -3354,7 +3354,7 @@ use_random_mac_addr:
 	if (found & VPD_FOUND_MAC)
 		goto done;
 
-#if defined(CONFIG_OF)
+#if defined(CONFIG_SPARC)
 	addr = of_get_property(cp->of_node, "local-mac-address", NULL);
 	if (addr != NULL) {
 		memcpy(dev_addr, addr, 6);
@@ -5031,7 +5031,7 @@ static int __devinit cas_init_one(struct pci_dev *pdev,
 	cp->msg_enable = (cassini_debug < 0) ? CAS_DEF_MSG_ENABLE :
 	  cassini_debug;
 
-#if defined(CONFIG_OF)
+#if defined(CONFIG_SPARC)
 	cp->of_node = pci_device_to_OF_node(pdev);
 #endif
 
-- 
1.7.3.4


^ permalink raw reply related

* Re: [PATCH] bonding: added 802.3ad round-robin hashing policy for single TCP session balancing
From: Nicolas de Pesloüan @ 2011-01-14 20:41 UTC (permalink / raw)
  To: Oleg V. Ukhno; +Cc: netdev, Jay Vosburgh, David S. Miller
In-Reply-To: <20110114190714.GA11655@yandex-team.ru>

On 14/01/2011 20:07, Oleg V. Ukhno wrote:

> +
> +		For correct load baalncing on the receiving side you must
> +		configure switch for using src-dst-mac or src-mac hashing
> +		mode.

Typo in baalncing -> balancing.

	Nicolas.

^ permalink raw reply

* Re: [PULL] vhost-net: 2.6.38 fix
From: David Miller @ 2011-01-14 20:41 UTC (permalink / raw)
  To: mst; +Cc: kvm, virtualization, netdev, linux-kernel
In-Reply-To: <20110114093302.GA702@redhat.com>

From: "Michael S. Tsirkin" <mst@redhat.com>
Date: Fri, 14 Jan 2011 11:33:02 +0200

> Please pull the following for 2.6.38.
> Thanks!
> 
> The following changes since commit 0c21e3aaf6ae85bee804a325aa29c325209180fd:
> 
>   Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/hch/hfsplus (2011-01-07 17:16:27 -0800)
> 
> are available in the git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git vhost-net
> 
> Michael S. Tsirkin (1):
>       vhost: fix signed/unsigned comparison

Pulled, thanks.

^ permalink raw reply

* Re: pull request: sfc-2.6 2011-01-14
From: David Miller @ 2011-01-14 20:42 UTC (permalink / raw)
  To: bhutchings; +Cc: netdev, linux-net-drivers
In-Reply-To: <1295014889.5386.1.camel@bwh-desktop>

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Fri, 14 Jan 2011 14:21:29 +0000

> The following changes since commit 5b919f833d9d60588d026ad82d17f17e8872c7a9:
> 
>   net: ax25: fix information leak to userland harder (2011-01-12 00:34:49 -0800)
> 
> are available in the git repository at:
>   git://git.kernel.org/pub/scm/linux/kernel/git/bwh/sfc-2.6.git master
> 
> A minor optimisation and a regression fix.

Pulled, thanks Ben.

^ permalink raw reply

* Re: [net-2.6 0/3][pull-request] Intel Wired LAN Driver Updates
From: David Miller @ 2011-01-14 20:43 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, gospo, bphilips
In-Reply-To: <1295005350-28124-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Fri, 14 Jan 2011 03:42:27 -0800

> The following series contains a fix for e1000 and trivial fixes
> for e1000e.
> 
> The following are changes since commit 1949e084bfd143c76e22c0b37f370d6e7bf4bfdd:
>   Merge branch 'master' of git://1984.lsi.us.es/net-2.6
> 
> and are available in the git repository at:
>   master.kernel.org:/pub/scm/linux/kernel/git/jkirsher/net-2.6 master
> 
> Bruce Allan (2):
>   e1000e: update Copyright for 2011
>   e1000e: consistent use of Rx/Tx vs. RX/TX/rx/tx in comments/logs
> 
> Jesse Brandeburg (1):
>   e1000: Avoid unhandled IRQ

Pulled, thanks Jeff.

^ permalink raw reply

* Re: [PATCH 1/7 v2] GRETH: added raw AMBA vendor/device number to match against.
From: David Miller @ 2011-01-14 20:46 UTC (permalink / raw)
  To: daniel; +Cc: netdev, kristoffer
In-Reply-To: <1295010163-2585-1-git-send-email-daniel@gaisler.com>

From: Daniel Hellstrom <daniel@gaisler.com>
Date: Fri, 14 Jan 2011 14:02:37 +0100

> Signed-off-by: Daniel Hellstrom <daniel@gaisler.com>

Applied.

^ permalink raw reply

* Re: [PATCH 2/7 v2] GRETH: fix opening/closing
From: David Miller @ 2011-01-14 20:46 UTC (permalink / raw)
  To: daniel; +Cc: netdev, kristoffer
In-Reply-To: <1295010163-2585-2-git-send-email-daniel@gaisler.com>

From: Daniel Hellstrom <daniel@gaisler.com>
Date: Fri, 14 Jan 2011 14:02:38 +0100

> When NAPI is disabled there is no point in having IRQs enabled, TX/RX
> should be off before clearing the TX/RX descriptor rings.
> 
> Signed-off-by: Daniel Hellstrom <daniel@gaisler.com>

Applied.

^ permalink raw reply

* Re: [PATCH 3/7 v2] GRETH: GBit transmit descriptor handling optimization
From: David Miller @ 2011-01-14 20:46 UTC (permalink / raw)
  To: daniel; +Cc: netdev, kristoffer
In-Reply-To: <1295010163-2585-3-git-send-email-daniel@gaisler.com>

From: Daniel Hellstrom <daniel@gaisler.com>
Date: Fri, 14 Jan 2011 14:02:39 +0100

> It is safe to enable all fragments before enabling the first descriptor,
> this way all descriptors don't have to be processed twice, added extra
> memory barrier.
> 
> Signed-off-by: Daniel Hellstrom <daniel@gaisler.com>

Applied.

^ permalink raw reply

* Re: [PATCH 4/7 v2] GRETH: fixed skb buffer memory leak on frame errors
From: David Miller @ 2011-01-14 20:46 UTC (permalink / raw)
  To: daniel; +Cc: netdev, kristoffer
In-Reply-To: <1295010163-2585-4-git-send-email-daniel@gaisler.com>

From: Daniel Hellstrom <daniel@gaisler.com>
Date: Fri, 14 Jan 2011 14:02:40 +0100

> A new SKB buffer should not be allocated when the old SKB is reused.
> 
> Signed-off-by: Daniel Hellstrom <daniel@gaisler.com>

Applied.

^ permalink raw reply

* Re: [PATCH 5/7 v2] GRETH: avoid writing bad speed/duplex when setting transfer mode
From: David Miller @ 2011-01-14 20:46 UTC (permalink / raw)
  To: daniel; +Cc: netdev, kristoffer
In-Reply-To: <1295010163-2585-5-git-send-email-daniel@gaisler.com>

From: Daniel Hellstrom <daniel@gaisler.com>
Date: Fri, 14 Jan 2011 14:02:41 +0100

> Signed-off-by: Daniel Hellstrom <daniel@gaisler.com>

Applied.

^ permalink raw reply

* Re: [PATCH 6/7 v2] GRETH: handle frame error interrupts
From: David Miller @ 2011-01-14 20:46 UTC (permalink / raw)
  To: daniel; +Cc: netdev, kristoffer
In-Reply-To: <1295010163-2585-6-git-send-email-daniel@gaisler.com>

From: Daniel Hellstrom <daniel@gaisler.com>
Date: Fri, 14 Jan 2011 14:02:42 +0100

> Frame error interrupts must also be handled, since the RX flag only indicates
> successful reception. It is unlikely, but the old code may lead to deadlock
> if 128 error frames are received in a row.
> 
> Signed-off-by: Daniel Hellstrom <daniel@gaisler.com>

Applied.

^ permalink raw reply

* Re: [PATCH 7/7 v2] GRETH: resolve SMP issues and other problems
From: David Miller @ 2011-01-14 20:47 UTC (permalink / raw)
  To: daniel; +Cc: netdev, kristoffer
In-Reply-To: <1295010163-2585-7-git-send-email-daniel@gaisler.com>

From: Daniel Hellstrom <daniel@gaisler.com>
Date: Fri, 14 Jan 2011 14:02:43 +0100

> Fixes the following:
 ...
> Signed-off-by: Daniel Hellstrom <daniel@gaisler.com>

Applied.

^ permalink raw reply

* Re: Kernel 2.6.37-git10 build failure: cassini.c
From: Grant Likely @ 2011-01-14 21:01 UTC (permalink / raw)
  To: David Miller
  Cc: anca.emanuel, linux-kernel, netdev, eric.dumazet, joe, siccegge,
	jpirko
In-Reply-To: <20110114.124116.197952526.davem@davemloft.net>

2011/1/14 David Miller <davem@davemloft.net>:
> From: Anca Emanuel <anca.emanuel@gmail.com>
> Date: Fri, 14 Jan 2011 10:09:43 +0200
>
>> drivers/net/cassini.c: In function ‘cas_get_vpd_info’:
>> drivers/net/cassini.c:3358: error: implicit declaration of function
>> ‘of_get_property’
>> drivers/net/cassini.c:3358: warning: assignment makes pointer from
>> integer without a cast
>> drivers/net/cassini.c: In function ‘cas_init_one’:
>> drivers/net/cassini.c:5035: error: implicit declaration of function
>> ‘pci_device_to_OF_node’
>> drivers/net/cassini.c:5035: warning: assignment makes pointer from
>> integer without a cast
>> make[3]: *** [drivers/net/cassini.o] Error 1
>> make[2]: *** [drivers/net] Error 2
>
> This is the fix I'll be using:
>
> --------------------
> cassini: Fix build bustage on x86.
>
> Unfortunately, not all CONFIG_OF platforms provide
> pci_device_to_OF_node().
>
> Change the test to CONFIG_SPARC for now to deal with
> the build regressions.
>
> Signed-off-by: David S. Miller <davem@davemloft.net>

Acked-by: Grant Likely <grant.likely@secretlab.ca>

pci_device_to_OF_node() will probably become available for all
CONFIG_OF users in 2.6.39.  In the meantime, I agree with this
solution.

g.

> ---
>  drivers/net/cassini.c |    6 +++---
>  1 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/cassini.c b/drivers/net/cassini.c
> index 7206ab2..3437613 100644
> --- a/drivers/net/cassini.c
> +++ b/drivers/net/cassini.c
> @@ -3203,7 +3203,7 @@ static int cas_get_vpd_info(struct cas *cp, unsigned char *dev_addr,
>        int phy_type = CAS_PHY_MII_MDIO0; /* default phy type */
>        int mac_off  = 0;
>
> -#if defined(CONFIG_OF)
> +#if defined(CONFIG_SPARC)
>        const unsigned char *addr;
>  #endif
>
> @@ -3354,7 +3354,7 @@ use_random_mac_addr:
>        if (found & VPD_FOUND_MAC)
>                goto done;
>
> -#if defined(CONFIG_OF)
> +#if defined(CONFIG_SPARC)
>        addr = of_get_property(cp->of_node, "local-mac-address", NULL);
>        if (addr != NULL) {
>                memcpy(dev_addr, addr, 6);
> @@ -5031,7 +5031,7 @@ static int __devinit cas_init_one(struct pci_dev *pdev,
>        cp->msg_enable = (cassini_debug < 0) ? CAS_DEF_MSG_ENABLE :
>          cassini_debug;
>
> -#if defined(CONFIG_OF)
> +#if defined(CONFIG_SPARC)
>        cp->of_node = pci_device_to_OF_node(pdev);
>  #endif
>
> --
> 1.7.3.4
>
>



-- 
Grant Likely, B.Sc., P.Eng.
Secret Lab Technologies Ltd.

^ permalink raw reply

* [GIT] Networking
From: David Miller @ 2011-01-14 21:03 UTC (permalink / raw)
  To: torvalds; +Cc: akpm, netdev, linux-kernel


1) NAPI and SMP locking bug fixes in GRETH from Daniel Hellstrom.

2) Fix Cassini driver build on x86.

3) Fix unhandled IRQs in e1000, from Jesse Brandeburg.

4) SFC accidentally stopped adhering to the rss_cpus module parameter, from
   Ben Hutchings.

5) IPv6 forwarding path must check skb->pkt_type for PACKET_HOST,
   otherwise we get packet storms, fix from Alexey Kuznetsov.

6) rndis driver can deadlock in stats handling, part of the problem is
   the use of dev_txq_stats_fold() which makes this situation too easy
   to get into.  Kill the interface and convert the small number of
   existing users, thus fixing the rndis deadlocks.  From Eric Dumazet.

7) tproxy w/o conntrack build fix in netfilter, from KOVACS Krisztian.

8) ath9k wireless fixes from Sujith Manoharan.

9) Fix ctnetlink error signalling such that we don't loop forever
   in some situations, from Pablo Neira Ayuso.

10) Kernel doc fixups from Randy Dunlap.

11) Wireless stack kernel doc and other comment fixes from Johannes Berg.

Please pull, thanks a lot!

The following changes since commit 4162cf64973df51fc885825bc9ca4d055891c49f:

  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 (2011-01-11 16:32:41 -0800)

are available in the git repository at:

  master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6.git master

Alexey Kuznetsov (1):
      inet6: prevent network storms caused by linux IPv6 routers

Ben Hutchings (4):
      sfc: Make efx_get_tx_queue() an inline function
      sfc: Restore the effect of the rss_cpus module parameter
      ks8695net: Disable non-working ethtool operations
      ks8695net: Use default implementation of ethtool_ops::get_link

Bruce Allan (2):
      e1000e: update Copyright for 2011
      e1000e: consistent use of Rx/Tx vs. RX/TX/rx/tx in comments/logs

Christian Lamparter (1):
      p54: fix sequence no. accounting off-by-one error

Daniel Hellstrom (7):
      GRETH: added raw AMBA vendor/device number to match against.
      GRETH: fix opening/closing
      GRETH: GBit transmit descriptor handling optimization
      GRETH: fixed skb buffer memory leak on frame errors
      GRETH: avoid writing bad speed/duplex when setting transfer mode
      GRETH: handle frame error interrupts
      GRETH: resolve SMP issues and other problems

David S. Miller (7):
      Merge branch 'master' of git://git.kernel.org/.../linville/wireless-2.6
      Merge branch 'master' of git://1984.lsi.us.es/net-2.6
      Merge branch 'master' of git://1984.lsi.us.es/net-2.6
      cassini: Fix build bustage on x86.
      Merge branch 'vhost-net' of git://git.kernel.org/.../mst/vhost
      Merge branch 'master' of git://git.kernel.org/.../bwh/sfc-2.6
      Merge branch 'master' of master.kernel.org:/.../jkirsher/net-2.6

Eric Dumazet (1):
      net: remove dev_txq_stats_fold()

Indan Zupancic (1):
      ipw2200: Check for -1 INTA in tasklet too.

Jesper Juhl (2):
      vxge: Remember to release firmware after upgrading firmware
      USB CDC NCM: Don't deref NULL in cdc_ncm_rx_fixup() and don't use uninitialized variable.

Jesse Brandeburg (1):
      e1000: Avoid unhandled IRQ

Joe Perches (2):
      bna: Remove unnecessary memset(,0,)
      netdev: bfin_mac: Remove is_multicast_ether_addr use in netdev_for_each_mc_addr

Johannes Berg (5):
      mac80211: add remain-on-channel docs
      mac80211: add missing docs for off-chan TX flag
      cfg80211: add mesh join/leave callback docs
      nl80211: add/fix mesh docs
      mac80211: add doc short section on LED triggers

KOVACS Krisztian (1):
      netfilter: fix compilation when conntrack is disabled but tproxy is enabled

Kees Cook (1):
      net: ax25: fix information leak to userland harder

Michael Buesch (1):
      ssb: Ignore dangling ethernet cores on wireless devices

Michael S. Tsirkin (1):
      vhost: fix signed/unsigned comparison

Nicolas Dichtel (1):
      ipsec: update MAX_AH_AUTH_LEN to support sha512

Pablo Neira Ayuso (1):
      netfilter: ctnetlink: fix loop in ctnetlink_get_conntrack()

Randy Dunlap (1):
      eth: fix new kernel-doc warning

Stanislaw Gruszka (1):
      hostap_cs: fix sleeping function called from invalid context

Sujith Manoharan (5):
      ath9k_hw: Fix chip test
      ath9k_hw: Fix calibration for AR9287 devices
      ath9k_hw: Fix thermal issue with UB94
      ath9k_hw: Fix RX handling for USB devices
      ath9k_htc: Really fix packet injection

Tobias Klauser (4):
      netdev: ucc_geth: Use is_multicast_ether_addr helper
      netdev: bfin_mac: Use is_multicast_ether_addr helper
      etherdevice.h: Add is_unicast_ether_addr function
      netdev: tilepro: Use is_unicast_ether_addr helper

françois romieu (1):
      r8169: keep firmware in memory.

stephen hemminger (1):
      sched: remove unused backlog in RED stats

 Documentation/DocBook/80211.tmpl               |   21 ++-
 drivers/net/arm/ks8695net.c                    |  288 ++++++++----------------
 drivers/net/bfin_mac.c                         |    9 +-
 drivers/net/bna/bnad_ethtool.c                 |    1 -
 drivers/net/cassini.c                          |    6 +-
 drivers/net/e1000/e1000_main.c                 |   10 +-
 drivers/net/e1000e/82571.c                     |    4 +-
 drivers/net/e1000e/Makefile                    |    2 +-
 drivers/net/e1000e/defines.h                   |    2 +-
 drivers/net/e1000e/e1000.h                     |    2 +-
 drivers/net/e1000e/es2lan.c                    |    2 +-
 drivers/net/e1000e/ethtool.c                   |    2 +-
 drivers/net/e1000e/hw.h                        |    4 +-
 drivers/net/e1000e/ich8lan.c                   |    2 +-
 drivers/net/e1000e/lib.c                       |   20 +-
 drivers/net/e1000e/netdev.c                    |  223 +++++++++---------
 drivers/net/e1000e/param.c                     |    6 +-
 drivers/net/e1000e/phy.c                       |    4 +-
 drivers/net/gianfar.c                          |   10 +-
 drivers/net/gianfar.h                          |   10 +
 drivers/net/greth.c                            |  221 +++++++++++--------
 drivers/net/greth.h                            |    2 +
 drivers/net/ixgbe/ixgbe_main.c                 |   23 ++-
 drivers/net/macvtap.c                          |    2 +-
 drivers/net/r8169.c                            |   43 +++-
 drivers/net/sfc/efx.c                          |   18 +-
 drivers/net/sfc/net_driver.h                   |   10 +-
 drivers/net/tile/tilepro.c                     |   10 +-
 drivers/net/ucc_geth.c                         |    2 +-
 drivers/net/usb/cdc_ncm.c                      |    4 +-
 drivers/net/vxge/vxge-main.c                   |    1 +
 drivers/net/wireless/ath/ath9k/ar9002_calib.c  |    3 +
 drivers/net/wireless/ath/ath9k/eeprom_def.c    |    4 +
 drivers/net/wireless/ath/ath9k/htc.h           |    1 +
 drivers/net/wireless/ath/ath9k/htc_drv_main.c  |   37 +++-
 drivers/net/wireless/ath/ath9k/hw.c            |    5 +-
 drivers/net/wireless/hostap/hostap_cs.c        |   15 +-
 drivers/net/wireless/ipw2x00/ipw2200.c         |    7 +
 drivers/net/wireless/p54/txrx.c                |    2 +-
 drivers/ssb/scan.c                             |   10 +
 drivers/vhost/vhost.c                          |   18 +-
 include/linux/etherdevice.h                    |   11 +
 include/linux/netdevice.h                      |    5 -
 include/linux/nl80211.h                        |   20 ++-
 include/linux/skbuff.h                         |   15 ++
 include/net/ah.h                               |    2 +-
 include/net/cfg80211.h                         |    2 +
 include/net/mac80211.h                         |   14 ++
 include/net/netfilter/ipv6/nf_conntrack_ipv6.h |   10 -
 include/net/netfilter/ipv6/nf_defrag_ipv6.h    |   10 +
 include/net/red.h                              |    1 -
 net/ax25/af_ax25.c                             |    2 +-
 net/core/dev.c                                 |   29 ---
 net/core/skbuff.c                              |    2 +
 net/ethernet/eth.c                             |    2 +-
 net/ipv6/ip6_output.c                          |    3 +
 net/ipv6/netfilter/nf_defrag_ipv6_hooks.c      |    8 +-
 net/netfilter/nf_conntrack_netlink.c           |    3 +-
 net/sched/sch_teql.c                           |   26 ++-
 59 files changed, 655 insertions(+), 576 deletions(-)

^ permalink raw reply

* Re: sch_sfb
From: Jarek Poplawski @ 2011-01-14 21:06 UTC (permalink / raw)
  To: Juliusz Chroboczek; +Cc: Patrick McHardy, netdev, David Miller
In-Reply-To: <7ivd1rsj6n.fsf@lanthane.pps.jussieu.fr>

Juliusz Chroboczek wrote:
>> I just looked at it out of interest after already having started my
>> own version.
> 
>>>   http://thread.gmane.org/gmane.linux.network/90225
>>>   http://thread.gmane.org/gmane.linux.network/90375
> 
>>> It was reviewed in particular by one Patrick McHardy.
> 
>> There's no reason to be pissed
> 
> Yes, there is.
> 
> First you object to my patch by making a bunch of unreasonable requests
> (notably that I use the in-kernel classifiers, which are not usable with
> Bloom filters).  Then it turns out you're implementing your own version
> "from scratch".  And then you claim that you never saw my version in the
> first place?
> 
> Patrick, what you're doing is not merely rude, it's actually unethical.

Or Linux Classics ;-) Vide: Molnar vs Kolivas.

Jarek P.

^ permalink raw reply

* Re: [PATCH] ethtool : Add option -L | --set-common to set common flags.
From: Ben Hutchings @ 2011-01-14 21:19 UTC (permalink / raw)
  To: Mahesh Bandewar; +Cc: David Miller, Tom Herbert, Laurent Chavey, netdev
In-Reply-To: <1294963892-11997-1-git-send-email-maheshb@google.com>

On Thu, 2011-01-13 at 16:11 -0800, Mahesh Bandewar wrote:
> This patch adds -L | --set-common option to add / remove common flags which
> includes loopback flag. The -l | --show-common displays the current values
> for these common flags.
> 
> Signed-off-by: Mahesh Bandewar <maheshb@google.com>
> ---
>  ethtool-copy.h |    1 +
>  ethtool.8      |   16 ++++++++++
>  ethtool.c      |   90 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>  3 files changed, 105 insertions(+), 2 deletions(-)
> 
> diff --git a/ethtool-copy.h b/ethtool-copy.h
> index 75c3ae7..5fd18c7 100644
> --- a/ethtool-copy.h
> +++ b/ethtool-copy.h
> @@ -309,6 +309,7 @@ struct ethtool_perm_addr {
>   * flag differs from the read-only value.
>   */
>  enum ethtool_flags {
> +	ETH_FLAG_LOOPBACK	= (1 << 2),	/* Loopback enable / disable */
>  	ETH_FLAG_TXVLAN		= (1 << 7),	/* TX VLAN offload enabled */
>  	ETH_FLAG_RXVLAN		= (1 << 8),	/* RX VLAN offload enabled */
>  	ETH_FLAG_LRO		= (1 << 15),	/* LRO is enabled */
> diff --git a/ethtool.8 b/ethtool.8
> index 1760924..cf7128f 100644
> --- a/ethtool.8
> +++ b/ethtool.8
> @@ -174,6 +174,13 @@ ethtool \- Display or change ethernet card settings
>  .B2 txvlan on off
>  .B2 rxhash on off
>  
> +.B ethtool \-l|\-\-show\-common
> +.I ethX
> +
> +.B ethtool \-L|\-\-set\-common
> +.I ethX
> +.B2 loopback on off
> +
>  .B ethtool \-p|\-\-identify
>  .I ethX
>  .RI [ N ]
> @@ -406,6 +413,15 @@ Specifies whether TX VLAN acceleration should be enabled
>  .A2 rxhash on off
>  Specifies whether receive hashing offload should be enabled
>  .TP
> +.B \-l \-\-show\-common
> +Queries the specified ethernet device for common flag settings.
> +.TP
> +.B \-L \-\-set\-common
> +Changes the common parameters of the specified ethernet device.
> +.TP
> +.A2 loopback on off
> +Specifies whether loopback should be enabled.
> +.TP

I've just gone through the manual page and changed 'ethernet device' to
'network device' for all generic operations; please follow that.  The
source for the manual page was also renamed to ethtool.8.in as it now
goes through autoconf substitution.

>  .B \-p \-\-identify
>  Initiates adapter-specific action intended to enable an operator to
>  easily identify the adapter by sight.  Typically this involves
> diff --git a/ethtool.c b/ethtool.c
> index 63e0ead..1a0c10c 100644
> --- a/ethtool.c
> +++ b/ethtool.c
[...]
> @@ -1905,6 +1932,13 @@ static int dump_offload(int rx, int tx, int sg, int tso, int ufo, int gso,
>  	return 0;
>  }
>  
> +static int dump_common_flags(int loopback)
> +{
> +	fprintf(stdout, "loopback: %s\n", loopback ? "on" : "off");
> +
> +	return 0;
> +}
> +
>  static int dump_rxfhash(int fhash, u64 val)
>  {
>  	switch (fhash) {
[...]
> @@ -2219,6 +2257,53 @@ static int do_scoalesce(int fd, struct ifreq *ifr)
>  	return 0;
>  }
>  
> +static int do_gcommon(int fd, struct ifreq *ifr)
> +{
> +	struct ethtool_value eval;
> +	int loopback = 0;
> +
> +	fprintf(stdout, "Common flags for %s:\n", devname);
> +
> +	eval.cmd = ETHTOOL_GFLAGS;
> +	ifr->ifr_data = (caddr_t)&eval;
> +	if (ioctl(fd, SIOCETHTOOL, ifr)) {
> +		perror("Cannot get device flags");
> +	} else {
> +		loopback = (eval.data & ETH_FLAG_LOOPBACK) != 0;
> +	}
> +
> +	return dump_common_flags(loopback);

Breaking up a bitmask into a list of flag parameters is fairly
pointless.  I realise do_goffload() and dump_offload() do that but I am
just waiting for Michał Mirosław's changes to offload flags to be
settled before I fix them.
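
One table-driven way to print a flags word without unpacking it into
separate int parameters is sketched below. This is only an illustration
and not necessarily what is planned for ethtool: the EX_* constants, the
table, and ex_dump_flags() are hypothetical names, with bit positions
copied from the quoted enum.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical constants; bit positions mirror the quoted enum. */
#define EX_FLAG_LOOPBACK	(1u << 2)
#define EX_FLAG_LRO		(1u << 15)

struct ex_flag_name {
	uint32_t bit;
	const char *name;
};

/* One table entry per known flag; adding a flag is a one-line change. */
static const struct ex_flag_name ex_flag_names[] = {
	{ EX_FLAG_LOOPBACK, "loopback" },
	{ EX_FLAG_LRO, "large-receive-offload" },
};

/* Print every known flag as on/off; return how many are on. */
static inline int ex_dump_flags(FILE *out, uint32_t flags)
{
	size_t i;
	int on = 0;

	assert(out != NULL);
	for (i = 0; i < sizeof(ex_flag_names) / sizeof(ex_flag_names[0]); i++) {
		int set = (flags & ex_flag_names[i].bit) != 0;

		fprintf(out, "%s: %s\n", ex_flag_names[i].name,
			set ? "on" : "off");
		on += set;
	}
	return on;
}
```

The caller passes the raw value returned by the get-flags ioctl straight
through, so the dump function's signature does not change as flags are
added.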

> +}
> +
> +static int do_scommon(int fd, struct ifreq *ifr)
> +{
> +	struct ethtool_value eval;
> +
> +	if (common_flags_mask) {
> +		eval.cmd = ETHTOOL_GFLAGS;
> +		eval.data = 0;
> +		ifr->ifr_data = (caddr_t)&eval;
> +		if (ioctl(fd, SIOCETHTOOL, ifr)) {
> +			perror("Cannot get device common flags");
> +			return 1;
> +		}
> +
> +		eval.cmd = ETHTOOL_SFLAGS;
> +		eval.data =
> +		    ((eval.data & ~(common_flags_mask | off_flags_mask)) |
> +		     (common_flags_wanted | off_flags_wanted));

Why should this use off_flags_mask and off_flags_wanted?  They should
both be 0 if this function is called.
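
The read-modify-write on the flags word can be looked at in isolation.
The sketch below is not ethtool code: ex_apply_flags() and the EX_*
constants are hypothetical, and it assumes (as the quoted expression
implies) that "wanted" only ever contains bits that are also in "mask".

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constants; bit positions mirror the quoted enum. */
#define EX_FLAG_LOOPBACK	(1u << 2)
#define EX_FLAG_LRO		(1u << 15)

/*
 * Return the new flags word: bits selected by mask take their value
 * from wanted, all other bits of current are preserved.
 */
static inline uint32_t ex_apply_flags(uint32_t current, uint32_t mask,
				      uint32_t wanted)
{
	assert((wanted & ~mask) == 0);	/* wanted must be a subset of mask */
	return (current & ~mask) | wanted;
}
```

With mask == 0 and wanted == 0 the word comes back unchanged, so OR-ing
in a second mask/wanted pair that is known to be zero has no effect on
the result.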

> +		if (ioctl(fd, SIOCETHTOOL, ifr)) {
> +			perror("Cannot set device common flags");
> +			return 1;
> +		}
> +	} else {
> +		fprintf(stdout, "No common settings changed\n");
> +	}
> +
> +	return 0;
> +}
> +
>  static int do_goffload(int fd, struct ifreq *ifr)
>  {
>  	struct ethtool_value eval;
> @@ -2407,8 +2492,9 @@ static int do_soffload(int fd, struct ifreq *ifr)
>  		}
>  
>  		eval.cmd = ETHTOOL_SFLAGS;
> -		eval.data = ((eval.data & ~off_flags_mask) |
> -			     off_flags_wanted);
> +		eval.data =
> +		    ((eval.data & ~(off_flags_mask | common_flags_mask)) |
> +		     (off_flags_wanted | common_flags_wanted));

Similarly, why should this use common_flags_mask and
common_flags_wanted?

Ben.

>  
>  		err = ioctl(fd, SIOCETHTOOL, ifr);
>  		if (err) {

-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply

* Re: [PATCH] r8169: keep firmware in memory.
From: Rafael J. Wysocki @ 2011-01-14 21:44 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ben Hutchings, Michael Tokarev, David Woodhouse, Johannes Berg,
	Greg Kroah-Hartman, Francois Romieu, David Miller, netdev,
	Jarek Kamiński, Hayes
In-Reply-To: <AANLkTinumTQN3F=O_hC=jFTFLKVYyWmJNNvHx2A1jL2p@mail.gmail.com>

On Friday, January 14, 2011, Linus Torvalds wrote:
> On Fri, Jan 14, 2011 at 8:30 AM, Ben Hutchings <benh@debian.org> wrote:
> >
> > This is something I started to implement, but never got finished.  I
> > don't think it can be done without an API change, though, as we need
> > to know when to drop firmware from the cache.  But perhaps this could
> > be done with a hook in the device-driver binding code.
> 
> Or just associate the firmware with a module?
> 
> So if the firmware gets loaded, it stays in memory until the module is unloaded?
> 
> And this all would only be the case if CONFIG_PM is set, so you'd not
> waste memory unnecessarily.

Alternatively, a suspend/hibernate notifier can be used for that I think.

They are called before the freezing and after the thawing of user space, so
the PM_POST_SUSPEND or PM_POST_RESTORE notification can easily cause the
firmware to be dropped from memory.
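
The shape of such a callback can be mocked up in user space as below.
This is a sketch under stated assumptions, not kernel code: in the
kernel the callback would be registered with register_pm_notifier(),
whereas the enum, struct ex_fw_cache, and ex_fw_pm_notify() here are
hypothetical stand-ins that only show the "drop on post-suspend or
post-restore" decision.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel's PM notification events. */
enum ex_pm_event {
	EX_PM_SUSPEND_PREPARE,
	EX_PM_POST_SUSPEND,
	EX_PM_POST_RESTORE,
};

/* Hypothetical cache entry holding one firmware image. */
struct ex_fw_cache {
	void *data;
	size_t size;
};

/* Drop the cached image once resume has finished; return 1 if dropped. */
static inline int ex_fw_pm_notify(struct ex_fw_cache *cache,
				  enum ex_pm_event event)
{
	assert(cache != NULL);
	switch (event) {
	case EX_PM_POST_SUSPEND:
	case EX_PM_POST_RESTORE:
		free(cache->data);
		cache->data = NULL;
		cache->size = 0;
		return 1;
	default:
		/* Keep the image: the driver may still need it on resume. */
		return 0;
	}
}
```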

Thanks,
Rafael

^ permalink raw reply

* [PATCH] Make INET_LHTABLE_SIZE a compile-time tunable
From: Bill Sommerfeld @ 2011-01-14 21:48 UTC (permalink / raw)
  To: netdev; +Cc: David S. Miller, Tom Herbert, Bill Sommerfeld

INET_LHTABLE_SIZE has been fixed at 32 for a long time.  It should be
tunable as larger systems may be running many more than 32 listeners.
Since the existing code depends on the hash table size
being a power of two, use a shift value as the tunable and
compute the hash table size from the shift value.

Signed-off-by: Bill Sommerfeld <wsommerfeld@google.com>
---
Background: We've observed that many of our machines are now running
with well in excess of 32 TCP listener sockets open.  As the number of
cpu cores per system increases we expect this to grow further.

In general hash tables should be sized to keep hash chains short, and
32 is not enough for this on some of our machines.

The help text was inspired by the help text for LOG_BUF_SHIFT in
init/Kconfig
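
To illustrate why the size is kept a power of two, here is a minimal
user-space sketch. It assumes bucket selection is done by masking a hash
value with (size - 1); ex_bucket() and its shift parameter are
hypothetical stand-ins for the real lookup and for
CONFIG_INET_LHTABLE_SHIFT.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Map a hash value (here simply a port number) to a bucket in a table
 * of (1 << shift) entries.  Masking is only equivalent to a modulo
 * when the table size is a power of two.
 */
static inline uint32_t ex_bucket(uint32_t hash, unsigned int shift)
{
	assert(shift <= 12);	/* same upper bound as the Kconfig range */
	return hash & ((1U << shift) - 1);
}
```

Under that assumption, ports 80 and 8080 land in the same bucket with a
shift of 5 (32 buckets) and in different buckets with a shift of 8
(256 buckets).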

 include/net/inet_hashtables.h |    2 +-
 net/ipv4/Kconfig              |   14 ++++++++++++++
 2 files changed, 15 insertions(+), 1 deletions(-)

diff --git a/include/net/inet_hashtables.h b/include/net/inet_hashtables.h
index e9c2ed8..7253fce 100644
--- a/include/net/inet_hashtables.h
+++ b/include/net/inet_hashtables.h
@@ -113,7 +113,7 @@ struct inet_listen_hashbucket {
 };
 
 /* This is for listening sockets, thus all sockets which possess wildcards. */
-#define INET_LHTABLE_SIZE	32	/* Yes, really, this is all you need. */
+#define INET_LHTABLE_SIZE	(1U << (CONFIG_INET_LHTABLE_SHIFT))
 
 struct inet_hashinfo {
 	/* This is for sockets with full identity only.  Sockets here will
diff --git a/net/ipv4/Kconfig b/net/ipv4/Kconfig
index 9e95d7f..a92954e 100644
--- a/net/ipv4/Kconfig
+++ b/net/ipv4/Kconfig
@@ -440,6 +440,20 @@ config INET_TCP_DIAG
 	depends on INET_DIAG
 	def_tristate INET_DIAG
 
+config INET_LHTABLE_SHIFT
+	int "Number of TCP port listener table buckets (5 => 32, 8 => 256)"
+	default 5
+	range 0 12
+	help
+	  Select number of buckets in TCP listener hash as a power of 2
+	  32 is probably enough unless you run a lot of different servers
+	  Examples:
+		     4 => 16
+		     5 => 32 (default)
+		     6 => 64
+		     7 => 128
+		     8 => 256
+
 menuconfig TCP_CONG_ADVANCED
 	bool "TCP: advanced congestion control"
 	---help---
-- 
1.7.3.1


^ permalink raw reply related

* Re: [PATCH] bonding: added 802.3ad round-robin hashing policy for single TCP session balancing
From: Oleg V. Ukhno @ 2011-01-14 22:51 UTC (permalink / raw)
  To: Jay Vosburgh; +Cc: netdev, David S. Miller
In-Reply-To: <17405.1295036019@death>



Jay Vosburgh wrote:

> 	This is a violation of the 802.3ad (now 802.1ax) standard, 5.2.1
> (f), which requires that all frames of a given "conversation" are passed
> to a single port.
> 
> 	The existing layer3+4 hash has a similar problem (that it may
> send packets from a conversation to multiple ports), but for that case
> it's an unlikely exception (only in the case of IP fragmentation), but
> here it's the norm.  At a minimum, this must be clearly documented.
> 
> 	Also, what does a round robin in 802.3ad provide that the
> existing round robin does not?  My presumption is that you're looking to
> get the aggregator autoconfiguration that 802.3ad provides, but you
> don't say.
> 
> 	I don't necessarily think this is a bad cheat (round robining on
> 802.3ad as an explicit non-standard extension), since everybody wants to
> stripe their traffic across multiple slaves.  I've given some thought to
> making round robin into just another hash mode, but this also does some
> magic to the MAC addresses of the outgoing frames (more on that below).
Yes, I am resetting the MAC addresses when transmitting packets so that the
switch puts packets into different ports of the receiving etherchannel.
I am using this patch to provide full-mesh iSCSI connectivity between at
least 4 hosts (all hosts are, of course, in the same ethernet segment), and
every host is connected by an aggregate link with (usually) 4 slaves.
Using round-robin I get near-equal load striping when transmitting;
using the MAC address magic I force the switch to stripe packets over all
slave links in the destination port-channel (when the number of receiving
slaves is equal to the number of transmitting slaves and is even). So I am
able to utilize all slaves for tx and for rx up to maximum capacity; besides,
I am getting L2 link failure detection (and load rebalancing), which is (in
my opinion) much faster and more robust than what L3 or dm-multipath
provides.
That is the idea behind the patch.
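
The difference between per-conversation hashing and per-packet round
robin can be sketched as follows. This is an illustration only, not code
from the bonding driver or from the patch: struct ex_flow and both
helpers are hypothetical, and the hash is a simplified stand-in for a
layer3+4 style policy.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of one TCP conversation. */
struct ex_flow {
	uint32_t saddr, daddr;
	uint16_t sport, dport;
};

/* Per-conversation: the same flow always maps to the same slave. */
static inline unsigned int ex_hash_slave(const struct ex_flow *f,
					 unsigned int slaves)
{
	assert(slaves > 0);
	return ((f->sport ^ f->dport) ^
		((f->saddr ^ f->daddr) & 0xffff)) % slaves;
}

/* Per-packet: consecutive packets go to consecutive slaves. */
static inline unsigned int ex_rr_slave(unsigned int *counter,
				       unsigned int slaves)
{
	assert(slaves > 0);
	return (*counter)++ % slaves;
}
```

With a single iSCSI session the hash variant keeps every packet on one
slave, while the round-robin variant spreads the packets of that one
session across all slaves, at the cost of possible reordering.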
> 

> 
> 	This is the code that resets the MAC header as described above.
> It doesn't quite match the documentation, since it only resets the MAC
> for ETH_P_IP packets.
Yes, I really meant that my patch applies only to ETH_P_IP packets; I
omitted that from the documentation I wrote.
> 

> 
> ---
> 	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com
> 

-- 
Best regards,

Oleg Ukhno


^ permalink raw reply

