Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH] iptables: xt_recent: Add optional mask option for xt_recent
From: Denys Fedoryshchenko @ 2012-07-14 17:55 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter-devel, Linux netdev
In-Reply-To: <20120714150527.GA6271@1984>

Hi Pablo

On 2012-07-14 18:05, Pablo Neira Ayuso wrote:
> Hi Denys,
>
> On Thu, May 17, 2012 at 11:08:57PM +0300, Denys Fedoryshchenko wrote:
>> Use case for this feature:
>> 1)In some occasions if you need to allow,block,match specific 
>> subnet.
>> 2)I can use recent as a trigger when netfilter rule matches, with 
>> mask 0.0.0.0
>>
>> Tested for backward compatibility:
>> )old (userspace) iptables, new kernel
>> )old kernel, new iptables
>> )new kernel, new iptables
>
> I've cleaned up this patch:
>
> 
> http://git.netfilter.org/cgi-bin/gitweb.cgi?p=iptables.git;a=commit;h=73bf03981dfaee48ac1d6da380d46501a96cc83e
>
> It's not yet in master. Please, check that this is correct.
>
> BTW, the man page update is still missing.

Thank you, everything looks ok, i will try to diff(to learn, not repeat 
coding style mistakes in next submissions) and test. Since i'm in desert 
now, it will take few days.
I will prepare also man page patch for iptables.

---
Denys Fedoryshchenko, Network Engineer, Virtual ISP S.A.L.

^ permalink raw reply

* Re: [PATCH] sctp: fix sparse warning for sctp_init_cause_fixed
From: Vlad Yasevich @ 2012-07-14 18:01 UTC (permalink / raw)
  To: Ioan Orghici, sri, davem, netdev
In-Reply-To: <1342199797-11305-1-git-send-email-ioanorghici@gmail.com>

Ioan Orghici <ioanorghici@gmail.com> wrote:

>Fix the following sparse warning:
>	* symbol 'sctp_init_cause_fixed' was not declared. Should it be
>	  static?
>
>Signed-off-by: Ioan Orghici <ioanorghici@gmail.com>

Looks good.

Acked-by: Vlad Yasevich <vyasevich@gmail.com>


>---
> net/sctp/sm_make_chunk.c |    2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
>diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
>index b6de71e..479a70e 100644
>--- a/net/sctp/sm_make_chunk.c
>+++ b/net/sctp/sm_make_chunk.c
>@@ -132,7 +132,7 @@ void  sctp_init_cause(struct sctp_chunk *chunk,
>__be16 cause_code,
>  * abort chunk.  Differs from sctp_init_cause in that it won't oops
>  * if there isn't enough space in the op error chunk
>  */
>-int sctp_init_cause_fixed(struct sctp_chunk *chunk, __be16 cause_code,
>+static int sctp_init_cause_fixed(struct sctp_chunk *chunk, __be16
>cause_code,
> 		      size_t paylen)
> {
> 	sctp_errhdr_t err;
>-- 
>1.7.6.3


-- 
Sent from my Android phone with SkitMail. Please excuse my brevity.

^ permalink raw reply

* Re: [PATCH] sctp: Implement quick failover draft from tsvwg
From: Vlad Yasevich @ 2012-07-14 18:12 UTC (permalink / raw)
  To: Neil Horman, netdev; +Cc: Sridhar Samudrala, David S. Miller, linux-sctp
In-Reply-To: <1342203998-24037-1-git-send-email-nhorman@tuxdriver.com>

Neil Horman <nhorman@tuxdriver.com> wrote:

>I've seen several attempts recently made to do quick failover of sctp
>transports
>by reducing various retransmit timers and counters.  While its possible
>to
>implement a faster failover on multihomed sctp associations, its not
>particularly robust, in that it can lead to unneeded retransmits, as
>well as
>false connection failures due to intermittent latency on a network.
>
>Instead, lets implement the new ietf quick failover draft found here:
>http://tools.ietf.org/html/draft-nishida-tsvwg-sctp-failover-05
>
>This will let the sctp stack identify transports that have had a small
>number of
>errors, and avoid using them quickly until their reliability can be
>re-established.  I've tested this out on two virt guests connected via
>multiple
>isolated virt networks and believe its in compliance with the above
>draft and
>works well.
>
>Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
>CC: Vlad Yasevich <vyasevich@gmail.com>
>CC: Sridhar Samudrala <sri@us.ibm.com>
>CC: "David S. Miller" <davem@davemloft.net>
>CC: linux-sctp@vger.kernel.org
>---
> Documentation/networking/ip-sysctl.txt |   14 +++++++++++++
> include/net/sctp/constants.h           |    1 +
> include/net/sctp/structs.h             |    4 +++
> include/net/sctp/user.h                |    1 +
>net/sctp/associola.c                   |   33
>+++++++++++++++++++++++++------
> net/sctp/outqueue.c                    |    6 +++-
>net/sctp/sm_sideeffect.c               |   33
>++++++++++++++++++++++++++++---
> net/sctp/sysctl.c                      |    9 ++++++++
> net/sctp/transport.c                   |    3 +-
> 9 files changed, 90 insertions(+), 14 deletions(-)
>
>diff --git a/Documentation/networking/ip-sysctl.txt
>b/Documentation/networking/ip-sysctl.txt
>index 47b6c79..c636f9c 100644
>--- a/Documentation/networking/ip-sysctl.txt
>+++ b/Documentation/networking/ip-sysctl.txt
>@@ -1408,6 +1408,20 @@ path_max_retrans - INTEGER
> 
> 	Default: 5
> 
>+pf_retrans - INTEGER
>+	The number of retransmissions that will be attempted on a given path
>+	before traffic is redirected to an alternate transport (should one
>+	exist).  Note this is distinct from path_max_retrans, as a path that
>+	passes the pf_retrans threshold can still be used.  Its only
>+	deprioritized when a transmission path is selected by the stack. 
>This
>+	setting is primarily used to enable fast failover mechanisms without
>+	having to reduce path_max_retrans to a very low value.  See:
>+	http://www.ietf.org/id/draft-nishida-tsvwg-sctp-failover-05.txt
>+	for details.  Note also that a value of pf_retrans > path_max_retrans
>+	disables this feature
>+
>+	Default: 0
>+
> rto_initial - INTEGER
>	The initial round trip timeout value in milliseconds that will be used
> 	in calculating round trip times.  This is the initial time interval
>diff --git a/include/net/sctp/constants.h
>b/include/net/sctp/constants.h
>index 942b864..d053d2e 100644
>--- a/include/net/sctp/constants.h
>+++ b/include/net/sctp/constants.h
>@@ -334,6 +334,7 @@ typedef enum {
> typedef enum {
> 	SCTP_TRANSPORT_UP,
> 	SCTP_TRANSPORT_DOWN,
>+	SCTP_TRANSPORT_PF,
> } sctp_transport_cmd_t;
> 
> /* These are the address scopes defined mainly for IPv4 addresses
>diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
>index e4652fe..22825abe 100644
>--- a/include/net/sctp/structs.h
>+++ b/include/net/sctp/structs.h
>@@ -160,6 +160,7 @@ extern struct sctp_globals {
> 	int max_retrans_association;
> 	int max_retrans_path;
> 	int max_retrans_init;
>+	int pf_retrans;
> 
> 	/*
> 	 * Policy for preforming sctp/socket accounting
>@@ -258,6 +259,7 @@ extern struct sctp_globals {
> #define sctp_sndbuf_policy	 	(sctp_globals.sndbuf_policy)
> #define sctp_rcvbuf_policy	 	(sctp_globals.rcvbuf_policy)
> #define sctp_max_retrans_path		(sctp_globals.max_retrans_path)
>+#define sctp_pf_retrans			(sctp_globals.pf_retrans)
> #define sctp_max_retrans_init		(sctp_globals.max_retrans_init)
> #define sctp_sack_timeout		(sctp_globals.sack_timeout)
> #define sctp_hb_interval		(sctp_globals.hb_interval)
>@@ -1660,6 +1662,8 @@ struct sctp_association {
> 	 */
> 	int max_retrans;
> 
>+	int pf_retrans;
>+
> 	/* Maximum number of times the endpoint will retransmit INIT  */
> 	__u16 max_init_attempts;
> 
>diff --git a/include/net/sctp/user.h b/include/net/sctp/user.h
>index 0842ef0..cece1bf 100644
>--- a/include/net/sctp/user.h
>+++ b/include/net/sctp/user.h
>@@ -649,6 +649,7 @@ struct sctp_paddrinfo {
>  */
> enum sctp_spinfo_state {
> 	SCTP_INACTIVE,
>+	SCTP_PF,
> 	SCTP_ACTIVE,
> 	SCTP_UNCONFIRMED,
> 	SCTP_UNKNOWN = 0xffff  /* Value used for transport state unknown */
>diff --git a/net/sctp/associola.c b/net/sctp/associola.c
>index 5bc9ab1..f3ebc23 100644
>--- a/net/sctp/associola.c
>+++ b/net/sctp/associola.c
>@@ -124,6 +124,8 @@ static struct sctp_association
>*sctp_association_init(struct sctp_association *a
> 	 * socket values.
> 	 */
> 	asoc->max_retrans = sp->assocparams.sasoc_asocmaxrxt;
>+	asoc->pf_retrans  = sctp_pf_retrans;
>+
> 	asoc->rto_initial = msecs_to_jiffies(sp->rtoinfo.srto_initial);
> 	asoc->rto_max = msecs_to_jiffies(sp->rtoinfo.srto_max);
> 	asoc->rto_min = msecs_to_jiffies(sp->rtoinfo.srto_min);
>@@ -840,6 +842,7 @@ void sctp_assoc_control_transport(struct
>sctp_association *asoc,
> 	struct sctp_ulpevent *event;
> 	struct sockaddr_storage addr;
> 	int spc_state = 0;
>+	bool ulp_notify = true;
> 
> 	/* Record the transition on the transport.  */
> 	switch (command) {
>@@ -853,6 +856,14 @@ void sctp_assoc_control_transport(struct
>sctp_association *asoc,
> 			spc_state = SCTP_ADDR_CONFIRMED;
> 		else
> 			spc_state = SCTP_ADDR_AVAILABLE;
>+		/* Don't inform ULP about transition from PF to
>+		 * active state and set cwnd to 1, see SCTP
>+		 * Quick failover draft section 5.1, point 5
>+		 */
>+		if (transport->state == SCTP_PF) {
>+			ulp_notify = false;
>+			transport->cwnd = 1;
>+		}
> 		transport->state = SCTP_ACTIVE;
> 		break;
> 
>@@ -871,6 +882,10 @@ void sctp_assoc_control_transport(struct
>sctp_association *asoc,
> 		spc_state = SCTP_ADDR_UNREACHABLE;
> 		break;
> 
>+	case SCTP_TRANSPORT_PF:
>+		transport->state = SCTP_PF;
>+		ulp_notify = false;
>+		break;
> 	default:
> 		return;
> 	}
>@@ -878,12 +893,15 @@ void sctp_assoc_control_transport(struct
>sctp_association *asoc,
> 	/* Generate and send a SCTP_PEER_ADDR_CHANGE notification to the
> 	 * user.
> 	 */
>-	memset(&addr, 0, sizeof(struct sockaddr_storage));
>-	memcpy(&addr, &transport->ipaddr,
>transport->af_specific->sockaddr_len);
>-	event = sctp_ulpevent_make_peer_addr_change(asoc, &addr,
>-				0, spc_state, error, GFP_ATOMIC);
>-	if (event)
>-		sctp_ulpq_tail_event(&asoc->ulpq, event);
>+	if (ulp_notify) {
>+		memset(&addr, 0, sizeof(struct sockaddr_storage));
>+		memcpy(&addr, &transport->ipaddr,
>+		       transport->af_specific->sockaddr_len);
>+		event = sctp_ulpevent_make_peer_addr_change(asoc, &addr,
>+					0, spc_state, error, GFP_ATOMIC);
>+		if (event)
>+			sctp_ulpq_tail_event(&asoc->ulpq, event);
>+	}
> 
> 	/* Select new active and retran paths. */
> 
>@@ -899,7 +917,8 @@ void sctp_assoc_control_transport(struct
>sctp_association *asoc,
> 			transports) {
> 
> 		if ((t->state == SCTP_INACTIVE) ||
>-		    (t->state == SCTP_UNCONFIRMED))
>+		    (t->state == SCTP_UNCONFIRMED) ||
>+		    (t->state == SCTP_PF))
> 			continue;
> 		if (!first || t->last_time_heard > first->last_time_heard) {
> 			second = first;
>diff --git a/net/sctp/outqueue.c b/net/sctp/outqueue.c
>index a0fa19f..e7aa177c 100644
>--- a/net/sctp/outqueue.c
>+++ b/net/sctp/outqueue.c
>@@ -792,7 +792,8 @@ static int sctp_outq_flush(struct sctp_outq *q, int
>rtx_timeout)
> 			if (!new_transport)
> 				new_transport = asoc->peer.active_path;
> 		} else if ((new_transport->state == SCTP_INACTIVE) ||
>-			   (new_transport->state == SCTP_UNCONFIRMED)) {
>+			   (new_transport->state == SCTP_UNCONFIRMED) ||
>+			   (new_transport->state == SCTP_PF)) {
> 			/* If the chunk is Heartbeat or Heartbeat Ack,
> 			 * send it to chunk->transport, even if it's
> 			 * inactive.
>@@ -987,7 +988,8 @@ static int sctp_outq_flush(struct sctp_outq *q, int
>rtx_timeout)
> 			new_transport = chunk->transport;
> 			if (!new_transport ||
> 			    ((new_transport->state == SCTP_INACTIVE) ||
>-			     (new_transport->state == SCTP_UNCONFIRMED)))
>+			     (new_transport->state == SCTP_UNCONFIRMED) ||
>+			     (new_transport->state == SCTP_PF)))
> 				new_transport = asoc->peer.active_path;
> 			if (new_transport->state == SCTP_UNCONFIRMED)
> 				continue;
>diff --git a/net/sctp/sm_sideeffect.c b/net/sctp/sm_sideeffect.c
>index c96d1a8..285e26a 100644
>--- a/net/sctp/sm_sideeffect.c
>+++ b/net/sctp/sm_sideeffect.c
>@@ -76,6 +76,8 @@ static int sctp_side_effects(sctp_event_t event_type,
>sctp_subtype_t subtype,
> 			     sctp_cmd_seq_t *commands,
> 			     gfp_t gfp);
> 
>+static void sctp_cmd_hb_timer_update(sctp_cmd_seq_t *cmds,
>+				     struct sctp_transport *t);
> /********************************************************************
>  * Helper functions
>  ********************************************************************/
>@@ -470,7 +472,8 @@ sctp_timer_event_t
>*sctp_timer_events[SCTP_NUM_TIMEOUT_TYPES] = {
>  * notification SHOULD be sent to the upper layer.
>  *
>  */
>-static void sctp_do_8_2_transport_strike(struct sctp_association
>*asoc,
>+static void sctp_do_8_2_transport_strike(sctp_cmd_seq_t *commands,
>+					 struct sctp_association *asoc,
> 					 struct sctp_transport *transport,
> 					 int is_hb)
> {
>@@ -495,6 +498,23 @@ static void sctp_do_8_2_transport_strike(struct
>sctp_association *asoc,
> 			transport->error_count++;
> 	}
> 
>+	/* If the transport error count is greater than the pf_retrans
>+	 * threshold, and less than pathmaxrtx, then mark this transport
>+	 * as Partially Failed, ee SCTP Quick Failover Draft, secon 5.1,
>+	 * point 1
>+	 */
>+	if ((transport->state != SCTP_PF) &&
>+	   (asoc->pf_retrans < transport->pathmaxrxt) &&
>+	   (transport->error_count > asoc->pf_retrans)) {
>+
>+		sctp_assoc_control_transport(asoc, transport,
>+					     SCTP_TRANSPORT_PF,
>+					     0);
>+
>+		/* Update the hb timer to resend a heartbeat every rto */
>+		sctp_cmd_hb_timer_update(commands, transport);
>+	}
>+
> 	if (transport->state != SCTP_INACTIVE &&
> 	    (transport->error_count > transport->pathmaxrxt)) {
> 		SCTP_DEBUG_PRINTK_IPADDR("transport_strike:association %p",
>@@ -699,6 +719,10 @@ static void sctp_cmd_transport_on(sctp_cmd_seq_t
>*cmds,
> 					     SCTP_HEARTBEAT_SUCCESS);
> 	}
> 
>+	if (t->state == SCTP_PF)
>+		sctp_assoc_control_transport(asoc, t, SCTP_TRANSPORT_UP,
>+					     SCTP_HEARTBEAT_SUCCESS);
>+
> 	/* The receiver of the HEARTBEAT ACK should also perform an
> 	 * RTT measurement for that destination transport address
> 	 * using the time value carried in the HEARTBEAT ACK chunk.
>@@ -1565,8 +1589,8 @@ static int sctp_cmd_interpreter(sctp_event_t
>event_type,
> 
> 		case SCTP_CMD_STRIKE:
> 			/* Mark one strike against a transport.  */
>-			sctp_do_8_2_transport_strike(asoc, cmd->obj.transport,
>-						    0);
>+			sctp_do_8_2_transport_strike(commands, asoc,
>+						    cmd->obj.transport, 0);
> 			break;
> 
> 		case SCTP_CMD_TRANSPORT_IDLE:
>@@ -1576,7 +1600,8 @@ static int sctp_cmd_interpreter(sctp_event_t
>event_type,
> 
> 		case SCTP_CMD_TRANSPORT_HB_SENT:
> 			t = cmd->obj.transport;
>-			sctp_do_8_2_transport_strike(asoc, t, 1);
>+			sctp_do_8_2_transport_strike(commands, asoc,
>+						     t, 1);
> 			t->hb_sent = 1;
> 			break;
> 
>diff --git a/net/sctp/sysctl.c b/net/sctp/sysctl.c
>index e5fe639..2b2bfe9 100644
>--- a/net/sctp/sysctl.c
>+++ b/net/sctp/sysctl.c
>@@ -141,6 +141,15 @@ static ctl_table sctp_table[] = {
> 		.extra2		= &int_max
> 	},
> 	{
>+		.procname	= "pf_retrans",
>+		.data		= &sctp_pf_retrans,
>+		.maxlen		= sizeof(int),
>+		.mode		= 0644,
>+		.proc_handler	= proc_dointvec_minmax,
>+		.extra1		= &zero,
>+		.extra2		= &int_max
>+	},
>+	{
> 		.procname	= "max_init_retransmits",
> 		.data		= &sctp_max_retrans_init,
> 		.maxlen		= sizeof(int),
>diff --git a/net/sctp/transport.c b/net/sctp/transport.c
>index b026ba0..4639ba2 100644
>--- a/net/sctp/transport.c
>+++ b/net/sctp/transport.c
>@@ -585,7 +585,8 @@ unsigned long sctp_transport_timeout(struct
>sctp_transport *t)
> {
> 	unsigned long timeout;
> 	timeout = t->rto + sctp_jitter(t->rto);
>-	if (t->state != SCTP_UNCONFIRMED)
>+	if ((t->state != SCTP_UNCONFIRMED) &&
>+	    (t->state != SCTP_PF))
> 		timeout += t->hbinterval;
> 	timeout += jiffies;
> 	return timeout;
>-- 
>1.7.7.6

One thing that seems to be missing is the API.  As a result you don't carry the value per transport which we'll need.  That caused you to add assoc parameter to some functions.  That's really the only missing item.
-- 
Sent from my Android phone with SkitMail. Please excuse my brevity.

^ permalink raw reply

* gateway icmp redirect handling problem (3.0.36-3.0.23)
From: Simon Roscic @ 2012-07-14 18:57 UTC (permalink / raw)
  To: netdev

Hello,

I´m experiencing the following problem with kernel versions 3.0.36 
(down to 3.0.23):

on our network we all have one default gateway, it´s 10.1.1.254, but 
there are some networks for which we have another gateway and for this 
networks the default gateway sends an icmp redirect.

lets assume my test machine has ip 10.1.20.79 netmask is 255.255.0.0 
and my default gateway is 10.1.1.254, i now ping the following ip: 
10.109.98.11, my default gateway (10.1.1.254) now sends me an icmp 
redirect to another gateway (10.1.1.1) ... and now everything works as 
expected, i get the replies from 10.109.98.11 but not for long, after 
approx. 60 (or so) seconds i only get "ping: sendmsg: Network is down".
(exact same problem with all other tcp/udp protocols, but i used ping 
for the tests because it also prints the redirect messages to the 
console)

so let´s have a closer look:

not ok - kernel versions 3.0.36 down to 3.0.23:
-----------------------------------------------

test-simon:~ # ping 10.109.98.10
...
64 bytes from 10.109.98.11: icmp_seq=62 ttl=60 time=12.1 ms
64 bytes from 10.109.98.11: icmp_seq=63 ttl=60 time=11.6 ms
ping: sendmsg: Network is down
ping: sendmsg: Network is down

when looking at "ip neigh" the "ping: sendmsg: Network is down" message 
appears in the exact moment when the arp entry for the default gateway 
(10.1.1.254) gets removed from the arp cache:

ping "OK"
test-simon:~ # ip neigh
10.1.1.1 dev eth0 lladdr 00:00:0c:9f:f0:64 REACHABLE
10.1.1.254 dev eth0 lladdr 00:1a:64:8f:23:64 STALE

ping "dead"
test-simon:~ # ip neigh
10.1.1.1 dev eth0 lladdr 00:00:0c:9f:f0:64 REACHABLE

so it seems that when the default gateway is removed from the arp cache 
something goes wrong in the kernel route handling. i don´t know the 
internals of the linux route handling, now i need your help, any ideas 
what´s going wrong?

i did a lot of tests, the problem i described first happens with kernel 
version 3.0.23, i found in the changelog of 3.0.23 the following two 
commits:
(http://www.kernel.org/pub/linux/kernel/v3.0/ChangeLog-3.0.23)

commit 42ab5316ddcaa0de23e88e8a3d363c767b9ab0b3
Author: Eric Dumazet <eric.dumazet@gmail.com>
Date:   Fri Nov 18 15:24:32 2011 -0500
ipv4: fix redirect handling

commit bebee22bcbf0026f92141990972bd5863ef9b69c
Author: Flavio Leitner <fbl@redhat.com>
Date:   Mon Oct 24 02:56:38 2011 -0400
route: fix ICMP redirect validation

i then took the net/ipv4/route.c file from kernel 3.0.22 and replaced 
the version in 3.0.23 with it, this reverts the two mentioned patches 
above (if i havent overlooked something) after that the problem 
disappears.
so those two patches surely fixed some problem but for kernel versions 
3.0.23-3.0.36 they broke the gateway icmp redirect handling as described 
by me here.

i did some further tests with different kernel versions:
3.5-rc6: OK
3.4.4: OK
3.2.22: OK
3.0.1 - 3.0.22: OK
3.0.23 - 3.0.36: not OK
2.6.35.13: OK

now lets have a closer look at a kernel version which works:
------------------------------------------------------------

this is from 3.5-rc6, but 3.4.4, 3.2.2 and 2.6.35.13 also behave 
exactly this way, 3.0.1-3.0.22 behave slightly different, see note 
below.

test-simon:~ # ping 10.109.98.11
PING 10.109.98.10 (10.109.98.11) 56(84) bytes of data.
64 bytes from 10.109.98.11: icmp_seq=1 ttl=60 time=15.2 ms
 From 10.1.1.254: icmp_seq=2 Redirect Host(New nexthop: 10.1.1.1)
...

test-simon:~ # ip neigh
10.1.1.1 dev eth0 lladdr 00:00:0c:9f:f0:64 REACHABLE
10.1.1.254 dev eth0 lladdr 00:1a:64:8f:23:64 STALE

and after approx 60 or so seconds:

test-simon:~ # ip neigh
10.1.1.1 dev eth0 lladdr 00:00:0c:9f:f0:64 REACHABLE

and ping (and everything else) is as expected still working.

note:
-----

on 3.0.1-3.0.22:

i see lots of icmp redirects sent from the default gateway (10.1.1.254) 
to my test machine, while running tcpdump on the default gateway 
(10.1.1.254) i see every ping packet also arriving there and also some 
icmp redirect messages going out to my test machine.
but everything works so i think my test machine is correctly talking to 
the destination using the other gateway (10.1.1.1).
i also sniffed a windows 7 client pc, it looks the same there, so 
possibly no problem, but i mention this because kernel versions 3.5-rc6, 
3.4.4, 3.2.22 and 2.6.35.13 act differently (see below).

on 3.0.23-3.0.36:

i see lots of icmp redirects sent from the default gateway (10.1.1.254) 
to my test machine, while running tcpdump on the default gateway 
(10.1.1.254) i see up to 20 ping packets arriving there and also up to 
17 icmp redirect messages going out to my test machine, after the 20th 
ping packet i dont see further ping packets arriving at the default 
gateway. so my test machine is then only talking to the other gateway 
(10.1.1.1) i think.
...
17:48:41.643952 IP 10.1.1.254 > 10.1.20.79: ICMP redirect 10.109.98.11 
to host 10.1.1.1, length 92
...
17:48:44.649008 IP 10.1.20.79 > 10.109.98.11: ICMP echo request, id 
30733, seq 20, length 64
17:48:44.649018 IP 10.1.20.79 > 10.109.98.11: ICMP echo request, id 
30733, seq 20, length 64

on 3.5-rc6, 3.4.4, 3.2.22 and 2.6.35.13:

here it looks different, and for me this is the expected behavior, or 
at least the behavior i have seen from lots of linux machines on my 
network. i see 1-2 icmp redirects sent from the default gateway 
(10.1.1.254) to my test machine, while running tcpdump on the default 
gateway (10.1.1.254) i only see up to 2 ping packets arriving then 
nothing, so then my test machine seems to only talk to the other gateway 
(10.1.1.1).

17:50:58.995894 IP 10.1.20.79 > 10.109.98.11: ICMP echo request, id 
10766, seq 1, length 64
17:50:58.995914 IP 10.1.20.79 > 10.109.98.11: ICMP echo request, id 
10766, seq 1, length 64
17:50:59.997260 IP 10.1.20.79 > 10.109.98.11: ICMP echo request, id 
10766, seq 2, length 64
17:50:59.997277 IP 10.1.1.254 > 10.1.20.79: ICMP redirect 10.109.98.11 
to host 10.1.1.1, length 92
17:50:59.997287 IP 10.1.20.79 > 10.109.98.11: ICMP echo request, id 
10766, seq 2, length 64

...

(before someone asks why i "must" use kernel 3.0.x ... because this are 
SLES 11 SP2 VMs and they currently ship kernel 3.0.34)

i hope i described the problem in a way so that the kernel network 
stack maintainers can understand the problem, please conact me if you 
have further questions, and please CC me as i am not subscribed to 
linux-netdev. if you wish you can also CC this message or replies to 
linux-kernel.

kind regards,
Simon Roscic.

^ permalink raw reply

* Re: resurrecting tcphealth
From: David Miller @ 2012-07-14 19:29 UTC (permalink / raw)
  To: eric.dumazet; +Cc: a9702387, netdev, linux-kernel
In-Reply-To: <1342254423.3265.9028.camel@edumazet-glaptop>


Don't we report all of this shit in tcp_info already?

I really hate such cruddy patches like this.

^ permalink raw reply

* Re: 3.5rc6 sctp panic
From: David Miller @ 2012-07-14 20:02 UTC (permalink / raw)
  To: davej; +Cc: netdev, vyasevich, sri, nhorman
In-Reply-To: <20120711000831.GA10518@redhat.com>

From: Dave Jones <davej@redhat.com>
Date: Tue, 10 Jul 2012 20:08:32 -0400

> I just hit this while fuzz testing, and the box locked up immediately afterwards.
> The serial log was a little mangled, I did my best to clean it up..

Guys can we fix crashes like this one reported by Dave instead of
working on new features and cleanups?

Thanks.

> [22766.294255] general protection fault: 0000 [#1] PREEMPT SMP 
> [22766.295376] CPU 0 
> [22766.295384] Modules linked in:
> [22766.387137]  ffffffffa169f292 6b6b6b6b6b6b6b6b ffff880147c03a90 ffff880147c03a74
> [22766.387135] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 00000000000
> [22766.387136] Process trinity-watchdo (pid: 10896, threadinfo ffff88013e7d2000,
> [22766.387137] Stack:
> [22766.387140]  ffff880147c03a10
> [22766.387140]  ffffffffa169f2b6
> [22766.387140]  ffff88013ed95728
> [22766.387143]  0000000000000002
> [22766.387143]  0000000000000000
> [22766.387143]  ffff880003fad062
> [22766.387144]  ffff88013c120000
> [22766.387144] 
> [22766.387145] Call Trace:
> [22766.387145]  <IRQ> 
> [22766.387150]  [<ffffffffa169f292>] ? __sctp_lookup_association+0x62/0xd0 [sctp]
> [22766.387154]  [<ffffffffa169f2b6>] __sctp_lookup_association+0x86/0xd0 [sctp]
> [22766.387157]  [<ffffffffa169f597>] sctp_rcv+0x207/0xbb0 [sctp]
> [22766.387161]  [<ffffffff810d4da8>] ? trace_hardirqs_off_caller+0x28/0xd0
> [22766.387163]  [<ffffffff815827e3>] ? nf_hook_slow+0x133/0x210
> [22766.387166]  [<ffffffff815902fc>] ? ip_local_deliver_finish+0x4c/0x4c0
> [22766.387168]  [<ffffffff8159043d>] ip_local_deliver_finish+0x18d/0x4c0
> [22766.387169]  [<ffffffff815902fc>] ? ip_local_deliver_finish+0x4c/0x4c0
> [22766.387171]  [<ffffffff81590a07>] ip_local_deliver+0x47/0x80
> [22766.387172]  [<ffffffff8158fd80>] ip_rcv_finish+0x150/0x680
> [22766.387174]  [<ffffffff81590c54>] ip_rcv+0x214/0x320
> [22766.387176]  [<ffffffff81558c07>] __netif_receive_skb+0x7b7/0x910
> [22766.387178]  [<ffffffff8155856c>] ? __netif_receive_skb+0x11c/0x910
> [22766.387180]  [<ffffffff810d423e>] ? put_lock_stats.isra.25+0xe/0x40
> [22766.387182]  [<ffffffff81558f83>] netif_receive_skb+0x23/0x1f0
> [22766.387183]  [<ffffffff815596a9>] ? dev_gro_receive+0x139/0x440
> [22766.387185]  [<ffffffff81559280>] napi_skb_finish+0x70/0xa0
> [22766.387187]  [<ffffffff81559cb5>] napi_gro_receive+0xf5/0x130
> [22766.387218]  [<ffffffffa01c4679>] e1000_receive_skb+0x59/0x70 [e1000e]
> [22766.387242]  [<ffffffffa01c5aab>] e1000_clean_rx_irq+0x28b/0x460 [e1000e]
> [22766.387266]  [<ffffffffa01c9c18>] e1000e_poll+0x78/0x430 [e1000e]
> [22766.387268]  [<ffffffff81559fea>] net_rx_action+0x1aa/0x3d0
> [22766.387270]  [<ffffffff810a495f>] ? account_system_vtime+0x10f/0x130
> [22766.387273]  [<ffffffff810734d0>] __do_softirq+0xe0/0x420
> [22766.387275]  [<ffffffff8169826c>] call_softirq+0x1c/0x30
> [22766.387278]  [<ffffffff8101db15>] do_softirq+0xd5/0x110
> [22766.387279]  [<ffffffff81073bc5>] irq_exit+0xd5/0xe0
> [22766.387281]  [<ffffffff81698b03>] do_IRQ+0x63/0xd0
> [22766.387283]  [<ffffffff8168ee2f>] common_interrupt+0x6f/0x6f
> [22766.387283]  <EOI> 
> [22766.387284] 
> [22766.387285]  [<ffffffff8168eed9>] ? retint_swapgs+0x13/0x1b
> [22766.387285] Code: c0 90 5d c3 66 0f 1f 44 00 00 4c 89 c8 5d c3 0f 1f 00 55 48 89 e5 48 83 
> ec 20 48 89 5d e8 4c 89 65 f0 4c 89 6d f8 66 66 66 66 90 <0f> b7 87 98 00 00 00 48 89 fb 
> 49 89 f5 66 c1 c0 08 66 39 46 02 
> [22766.387307] 
> [22766.387307] RIP 
> [22766.387311]  [<ffffffffa168a2c9>] sctp_assoc_is_match+0x19/0x90 [sctp]
> [22766.387311]  RSP <ffff880147c039b0>
> [22766.387142]  ffffffffa16ab120
> [22766.599537] ---[ end trace 3f6dae82e37b17f5 ]---
> [22766.601221] Kernel panic - not syncing: Fatal exception in interrupt
> 
> 
> 
> Disassembly of the function shows that we oopsed here..
> 
> /* Is this the association we are looking for? */
> struct sctp_transport *sctp_assoc_is_match(struct sctp_association *asoc,
>                                            const union sctp_addr *laddr,
>                                            const union sctp_addr *paddr)
> {
>     1070:       55                      push   %rbp
>     1071:       48 89 e5                mov    %rsp,%rbp
>     1074:       48 83 ec 20             sub    $0x20,%rsp
>     1078:       48 89 5d e8             mov    %rbx,-0x18(%rbp)
>     107c:       4c 89 65 f0             mov    %r12,-0x10(%rbp)
>     1080:       4c 89 6d f8             mov    %r13,-0x8(%rbp)
>     1084:       e8 00 00 00 00          callq  1089 <sctp_assoc_is_match+0x19>
>         struct sctp_transport *transport;
> 
>         if ((htons(asoc->base.bind_addr.port) == laddr->v4.sin_port) &&
>     1089:       0f b7 87 98 00 00 00    movzwl 0x98(%rdi),%eax
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply

* Re: Crash in CIPSO_V4_TAG_LOCAL handling
From: David Miller @ 2012-07-14 20:08 UTC (permalink / raw)
  To: mlin; +Cc: alan, paul.moore, netdev
In-Reply-To: <1342286550.23395.12.camel@monkey32>

From: Lin Ming <mlin@ss.pku.edu.cn>
Date: Sun, 15 Jul 2012 01:22:30 +0800

> It's caused by below code added in commit 15c45f7b.
> 
>                 case CIPSO_V4_TAG_LOCAL:
>                         /* This is a non-standard tag that we only allow for
>                          * local connections, so if the incoming interface is
>                          * not the loopback device drop the packet. */
>                         if (!(skb->dev->flags & IFF_LOOPBACK)) {
>                                 err_offset = opt_iter;
>                                 goto validate_return_locked;
>                         }

Paul please fix this, as shown 'skb' can easily be NULL in this
code path.

^ permalink raw reply

* [PATCH] drivers: connector: fixed coding style issues
From: Valentin Ilie @ 2012-07-14 20:26 UTC (permalink / raw)
  To: zbr; +Cc: netdev, Valentin Ilie

Fixed checkpatch warnings.

Signed-off-by: Valentin Ilie <valentin.ilie@gmail.com>
---
 drivers/connector/cn_proc.c   |   30 +++++++++++++++---------------
 drivers/connector/cn_queue.c  |    6 +++---
 drivers/connector/connector.c |    2 +-
 3 files changed, 19 insertions(+), 19 deletions(-)

diff --git a/drivers/connector/cn_proc.c b/drivers/connector/cn_proc.c
index 77e1e6c..a76801c 100644
--- a/drivers/connector/cn_proc.c
+++ b/drivers/connector/cn_proc.c
@@ -46,7 +46,7 @@ static DEFINE_PER_CPU(__u32, proc_event_counts) = { 0 };
 static inline void get_seq(__u32 *ts, int *cpu)
 {
 	preempt_disable();
-	*ts = __this_cpu_inc_return(proc_event_counts) -1;
+	*ts = __this_cpu_inc_return(proc_event_counts) - 1;
 	*cpu = smp_processor_id();
 	preempt_enable();
 }
@@ -62,8 +62,8 @@ void proc_fork_connector(struct task_struct *task)
 	if (atomic_read(&proc_event_num_listeners) < 1)
 		return;
 
-	msg = (struct cn_msg*)buffer;
-	ev = (struct proc_event*)msg->data;
+	msg = (struct cn_msg *)buffer;
+	ev = (struct proc_event *)msg->data;
 	get_seq(&msg->seq, &ev->cpu);
 	ktime_get_ts(&ts); /* get high res monotonic timestamp */
 	put_unaligned(timespec_to_ns(&ts), (__u64 *)&ev->timestamp_ns);
@@ -93,8 +93,8 @@ void proc_exec_connector(struct task_struct *task)
 	if (atomic_read(&proc_event_num_listeners) < 1)
 		return;
 
-	msg = (struct cn_msg*)buffer;
-	ev = (struct proc_event*)msg->data;
+	msg = (struct cn_msg *)buffer;
+	ev = (struct proc_event *)msg->data;
 	get_seq(&msg->seq, &ev->cpu);
 	ktime_get_ts(&ts); /* get high res monotonic timestamp */
 	put_unaligned(timespec_to_ns(&ts), (__u64 *)&ev->timestamp_ns);
@@ -119,8 +119,8 @@ void proc_id_connector(struct task_struct *task, int which_id)
 	if (atomic_read(&proc_event_num_listeners) < 1)
 		return;
 
-	msg = (struct cn_msg*)buffer;
-	ev = (struct proc_event*)msg->data;
+	msg = (struct cn_msg *)buffer;
+	ev = (struct proc_event *)msg->data;
 	ev->what = which_id;
 	ev->event_data.id.process_pid = task->pid;
 	ev->event_data.id.process_tgid = task->tgid;
@@ -134,7 +134,7 @@ void proc_id_connector(struct task_struct *task, int which_id)
 		ev->event_data.id.e.egid = cred->egid;
 	} else {
 		rcu_read_unlock();
-	     	return;
+		return;
 	}
 	rcu_read_unlock();
 	get_seq(&msg->seq, &ev->cpu);
@@ -241,8 +241,8 @@ void proc_exit_connector(struct task_struct *task)
 	if (atomic_read(&proc_event_num_listeners) < 1)
 		return;
 
-	msg = (struct cn_msg*)buffer;
-	ev = (struct proc_event*)msg->data;
+	msg = (struct cn_msg *)buffer;
+	ev = (struct proc_event *)msg->data;
 	get_seq(&msg->seq, &ev->cpu);
 	ktime_get_ts(&ts); /* get high res monotonic timestamp */
 	put_unaligned(timespec_to_ns(&ts), (__u64 *)&ev->timestamp_ns);
@@ -276,8 +276,8 @@ static void cn_proc_ack(int err, int rcvd_seq, int rcvd_ack)
 	if (atomic_read(&proc_event_num_listeners) < 1)
 		return;
 
-	msg = (struct cn_msg*)buffer;
-	ev = (struct proc_event*)msg->data;
+	msg = (struct cn_msg *)buffer;
+	ev = (struct proc_event *)msg->data;
 	msg->seq = rcvd_seq;
 	ktime_get_ts(&ts); /* get high res monotonic timestamp */
 	put_unaligned(timespec_to_ns(&ts), (__u64 *)&ev->timestamp_ns);
@@ -303,7 +303,7 @@ static void cn_proc_mcast_ctl(struct cn_msg *msg,
 	if (msg->len != sizeof(*mc_op))
 		return;
 
-	mc_op = (enum proc_cn_mcast_op*)msg->data;
+	mc_op = (enum proc_cn_mcast_op *)msg->data;
 	switch (*mc_op) {
 	case PROC_CN_MCAST_LISTEN:
 		atomic_inc(&proc_event_num_listeners);
@@ -328,8 +328,8 @@ static int __init cn_proc_init(void)
 	int err;
 
 	if ((err = cn_add_callback(&cn_proc_event_id, "cn_proc",
-	 			   &cn_proc_mcast_ctl))) {
-		printk(KERN_WARNING "cn_proc failed to register\n");
+					&cn_proc_mcast_ctl))) {
+		pr_warn("cn_proc failed to register\n");
 		return err;
 	}
 	return 0;
diff --git a/drivers/connector/cn_queue.c b/drivers/connector/cn_queue.c
index c42c9d5..cd3bd7d 100644
--- a/drivers/connector/cn_queue.c
+++ b/drivers/connector/cn_queue.c
@@ -1,5 +1,5 @@
 /*
- * 	cn_queue.c
+ *	cn_queue.c
  *
  * 2004+ Copyright (c) Evgeniy Polyakov <zbr@ioremap.net>
  * All rights reserved.
@@ -40,7 +40,7 @@ cn_queue_alloc_callback_entry(struct cn_queue_dev *dev, const char *name,
 
 	cbq = kzalloc(sizeof(*cbq), GFP_KERNEL);
 	if (!cbq) {
-		printk(KERN_ERR "Failed to create new callback queue.\n");
+		pr_err("Failed to create new callback queue.\n");
 		return NULL;
 	}
 
@@ -149,7 +149,7 @@ void cn_queue_free_dev(struct cn_queue_dev *dev)
 	spin_unlock_bh(&dev->queue_lock);
 
 	while (atomic_read(&dev->refcnt)) {
-		printk(KERN_INFO "Waiting for %s to become free: refcnt=%d.\n",
+		pr_info("Waiting for %s to become free: refcnt=%d.\n",
 		       dev->name, atomic_read(&dev->refcnt));
 		msleep(1000);
 	}
diff --git a/drivers/connector/connector.c b/drivers/connector/connector.c
index dde6a0f..36e89ae 100644
--- a/drivers/connector/connector.c
+++ b/drivers/connector/connector.c
@@ -1,5 +1,5 @@
 /*
- * 	connector.c
+ *	connector.c
  *
  * 2004+ Copyright (c) Evgeniy Polyakov <zbr@ioremap.net>
  * All rights reserved.
-- 
1.7.10.4

^ permalink raw reply related

* Re: [PATCH] drivers: connector: fixed coding style issues
From: Joe Perches @ 2012-07-14 20:33 UTC (permalink / raw)
  To: Valentin Ilie; +Cc: zbr, netdev
In-Reply-To: <1342297608-6872-1-git-send-email-valentin.ilie@gmail.com>

On Sat, 2012-07-14 at 23:26 +0300, Valentin Ilie wrote:
> Fixed checkpatch warnings.
[]
> diff --git a/drivers/connector/cn_proc.c b/drivers/connector/cn_proc.c
[]
> @@ -328,8 +328,8 @@ static int __init cn_proc_init(void)
>  	int err;
>  
>  	if ((err = cn_add_callback(&cn_proc_event_id, "cn_proc",
> -	 			   &cn_proc_mcast_ctl))) {
> -		printk(KERN_WARNING "cn_proc failed to register\n");
> +					&cn_proc_mcast_ctl))) {
> +		pr_warn("cn_proc failed to register\n");

I don't believe the first part of this change is a checkpatch
message.  I think it would be better written by hoisting the
function outside the if like:

	err = cn_add_callback(&cn_proc_event_id, "cn_proc", &cn_proc_mcast_ctl)
	if (err) {
		pr_warn("cn_proc failed to register\n");
		return err;
	}

^ permalink raw reply

* Re: 3.5rc6 sctp panic
From: Neil Horman @ 2012-07-14 21:14 UTC (permalink / raw)
  To: David Miller; +Cc: davej, netdev, vyasevich, sri
In-Reply-To: <20120714.130201.1562818146578811700.davem@davemloft.net>

On Sat, Jul 14, 2012 at 01:02:01PM -0700, David Miller wrote:
> From: Dave Jones <davej@redhat.com>
> Date: Tue, 10 Jul 2012 20:08:32 -0400
> 
> > I just hit this while fuzz testing, and the box locked up immediately afterwards.
> > The serial log was a little mangled, I did my best to clean it up..
> 
> Guys can we fix crashes like this one reported by Dave instead of
> working on new features and cleanups?
> 
> Thanks.
> 
Yeah, I'll put it at the top of my to do list, and track it down monday AM.
Neil

> > [22766.294255] general protection fault: 0000 [#1] PREEMPT SMP 
> > [22766.295376] CPU 0 
> > [22766.295384] Modules linked in:
> > [22766.387137]  ffffffffa169f292 6b6b6b6b6b6b6b6b ffff880147c03a90 ffff880147c03a74
> > [22766.387135] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 00000000000
> > [22766.387136] Process trinity-watchdo (pid: 10896, threadinfo ffff88013e7d2000,
> > [22766.387137] Stack:
> > [22766.387140]  ffff880147c03a10
> > [22766.387140]  ffffffffa169f2b6
> > [22766.387140]  ffff88013ed95728
> > [22766.387143]  0000000000000002
> > [22766.387143]  0000000000000000
> > [22766.387143]  ffff880003fad062
> > [22766.387144]  ffff88013c120000
> > [22766.387144] 
> > [22766.387145] Call Trace:
> > [22766.387145]  <IRQ> 
> > [22766.387150]  [<ffffffffa169f292>] ? __sctp_lookup_association+0x62/0xd0 [sctp]
> > [22766.387154]  [<ffffffffa169f2b6>] __sctp_lookup_association+0x86/0xd0 [sctp]
> > [22766.387157]  [<ffffffffa169f597>] sctp_rcv+0x207/0xbb0 [sctp]
> > [22766.387161]  [<ffffffff810d4da8>] ? trace_hardirqs_off_caller+0x28/0xd0
> > [22766.387163]  [<ffffffff815827e3>] ? nf_hook_slow+0x133/0x210
> > [22766.387166]  [<ffffffff815902fc>] ? ip_local_deliver_finish+0x4c/0x4c0
> > [22766.387168]  [<ffffffff8159043d>] ip_local_deliver_finish+0x18d/0x4c0
> > [22766.387169]  [<ffffffff815902fc>] ? ip_local_deliver_finish+0x4c/0x4c0
> > [22766.387171]  [<ffffffff81590a07>] ip_local_deliver+0x47/0x80
> > [22766.387172]  [<ffffffff8158fd80>] ip_rcv_finish+0x150/0x680
> > [22766.387174]  [<ffffffff81590c54>] ip_rcv+0x214/0x320
> > [22766.387176]  [<ffffffff81558c07>] __netif_receive_skb+0x7b7/0x910
> > [22766.387178]  [<ffffffff8155856c>] ? __netif_receive_skb+0x11c/0x910
> > [22766.387180]  [<ffffffff810d423e>] ? put_lock_stats.isra.25+0xe/0x40
> > [22766.387182]  [<ffffffff81558f83>] netif_receive_skb+0x23/0x1f0
> > [22766.387183]  [<ffffffff815596a9>] ? dev_gro_receive+0x139/0x440
> > [22766.387185]  [<ffffffff81559280>] napi_skb_finish+0x70/0xa0
> > [22766.387187]  [<ffffffff81559cb5>] napi_gro_receive+0xf5/0x130
> > [22766.387218]  [<ffffffffa01c4679>] e1000_receive_skb+0x59/0x70 [e1000e]
> > [22766.387242]  [<ffffffffa01c5aab>] e1000_clean_rx_irq+0x28b/0x460 [e1000e]
> > [22766.387266]  [<ffffffffa01c9c18>] e1000e_poll+0x78/0x430 [e1000e]
> > [22766.387268]  [<ffffffff81559fea>] net_rx_action+0x1aa/0x3d0
> > [22766.387270]  [<ffffffff810a495f>] ? account_system_vtime+0x10f/0x130
> > [22766.387273]  [<ffffffff810734d0>] __do_softirq+0xe0/0x420
> > [22766.387275]  [<ffffffff8169826c>] call_softirq+0x1c/0x30
> > [22766.387278]  [<ffffffff8101db15>] do_softirq+0xd5/0x110
> > [22766.387279]  [<ffffffff81073bc5>] irq_exit+0xd5/0xe0
> > [22766.387281]  [<ffffffff81698b03>] do_IRQ+0x63/0xd0
> > [22766.387283]  [<ffffffff8168ee2f>] common_interrupt+0x6f/0x6f
> > [22766.387283]  <EOI> 
> > [22766.387284] 
> > [22766.387285]  [<ffffffff8168eed9>] ? retint_swapgs+0x13/0x1b
> > [22766.387285] Code: c0 90 5d c3 66 0f 1f 44 00 00 4c 89 c8 5d c3 0f 1f 00 55 48 89 e5 48 83 
> > ec 20 48 89 5d e8 4c 89 65 f0 4c 89 6d f8 66 66 66 66 90 <0f> b7 87 98 00 00 00 48 89 fb 
> > 49 89 f5 66 c1 c0 08 66 39 46 02 
> > [22766.387307] 
> > [22766.387307] RIP 
> > [22766.387311]  [<ffffffffa168a2c9>] sctp_assoc_is_match+0x19/0x90 [sctp]
> > [22766.387311]  RSP <ffff880147c039b0>
> > [22766.387142]  ffffffffa16ab120
> > [22766.599537] ---[ end trace 3f6dae82e37b17f5 ]---
> > [22766.601221] Kernel panic - not syncing: Fatal exception in interrupt
> > 
> > 
> > 
> > Disassembly of the function shows that we oopsed here..
> > 
> > /* Is this the association we are looking for? */
> > struct sctp_transport *sctp_assoc_is_match(struct sctp_association *asoc,
> >                                            const union sctp_addr *laddr,
> >                                            const union sctp_addr *paddr)
> > {
> >     1070:       55                      push   %rbp
> >     1071:       48 89 e5                mov    %rsp,%rbp
> >     1074:       48 83 ec 20             sub    $0x20,%rsp
> >     1078:       48 89 5d e8             mov    %rbx,-0x18(%rbp)
> >     107c:       4c 89 65 f0             mov    %r12,-0x10(%rbp)
> >     1080:       4c 89 6d f8             mov    %r13,-0x8(%rbp)
> >     1084:       e8 00 00 00 00          callq  1089 <sctp_assoc_is_match+0x19>
> >         struct sctp_transport *transport;
> > 
> >         if ((htons(asoc->base.bind_addr.port) == laddr->v4.sin_port) &&
> >     1089:       0f b7 87 98 00 00 00    movzwl 0x98(%rdi),%eax
> > 
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe netdev" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > 
> 

^ permalink raw reply

* Re: [PATCH] sctp: Implement quick failover draft from tsvwg
From: Neil Horman @ 2012-07-14 21:22 UTC (permalink / raw)
  To: Vlad Yasevich; +Cc: netdev, Sridhar Samudrala, David S. Miller, linux-sctp
In-Reply-To: <fd33324e-93b0-4ced-8f57-f3c6032baf9b@email.android.com>

On Sat, Jul 14, 2012 at 02:12:36PM -0400, Vlad Yasevich wrote:
> Neil Horman <nhorman@tuxdriver.com> wrote:
> 
> >I've seen several attempts recently made to do quick failover of sctp
> >transports
> >by reducing various retransmit timers and counters.  While its possible
> >to
> >implement a faster failover on multihomed sctp associations, its not
> >particularly robust, in that it can lead to unneeded retransmits, as
> >well as
> >false connection failures due to intermittent latency on a network.
> >
> >Instead, lets implement the new ietf quick failover draft found here:
> >http://tools.ietf.org/html/draft-nishida-tsvwg-sctp-failover-05
> >
> >This will let the sctp stack identify transports that have had a small
> >number of
> >errors, and avoid using them quickly until their reliability can be
> >re-established.  I've tested this out on two virt guests connected via
> >multiple
> >isolated virt networks and believe its in compliance with the above
> >draft and
> >works well.
> >
> >Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> >CC: Vlad Yasevich <vyasevich@gmail.com>
> >CC: Sridhar Samudrala <sri@us.ibm.com>
> >CC: "David S. Miller" <davem@davemloft.net>
> >CC: linux-sctp@vger.kernel.org
> >---
> > Documentation/networking/ip-sysctl.txt |   14 +++++++++++++
> > include/net/sctp/constants.h           |    1 +
> > include/net/sctp/structs.h             |    4 +++
> > include/net/sctp/user.h                |    1 +
> >net/sctp/associola.c                   |   33
> >+++++++++++++++++++++++++------
> > net/sctp/outqueue.c                    |    6 +++-
> >net/sctp/sm_sideeffect.c               |   33
> >++++++++++++++++++++++++++++---
> > net/sctp/sysctl.c                      |    9 ++++++++
> > net/sctp/transport.c                   |    3 +-
> > 9 files changed, 90 insertions(+), 14 deletions(-)
> >
> >diff --git a/Documentation/networking/ip-sysctl.txt
> >b/Documentation/networking/ip-sysctl.txt
> >index 47b6c79..c636f9c 100644
> >--- a/Documentation/networking/ip-sysctl.txt
> >+++ b/Documentation/networking/ip-sysctl.txt
> >@@ -1408,6 +1408,20 @@ path_max_retrans - INTEGER
> > 
> > 	Default: 5
> > 
> >+pf_retrans - INTEGER
> >+	The number of retransmissions that will be attempted on a given path
> >+	before traffic is redirected to an alternate transport (should one
> >+	exist).  Note this is distinct from path_max_retrans, as a path that
> >+	passes the pf_retrans threshold can still be used.  Its only
> >+	deprioritized when a transmission path is selected by the stack. 
> >This
> >+	setting is primarily used to enable fast failover mechanisms without
> >+	having to reduce path_max_retrans to a very low value.  See:
> >+	http://www.ietf.org/id/draft-nishida-tsvwg-sctp-failover-05.txt
> >+	for details.  Note also that a value of pf_retrans > path_max_retrans
> >+	disables this feature
> >+
> >+	Default: 0
> >+
> > rto_initial - INTEGER
> >	The initial round trip timeout value in milliseconds that will be used
> > 	in calculating round trip times.  This is the initial time interval
> >diff --git a/include/net/sctp/constants.h
> >b/include/net/sctp/constants.h
> >index 942b864..d053d2e 100644
> >--- a/include/net/sctp/constants.h
> >+++ b/include/net/sctp/constants.h
> >@@ -334,6 +334,7 @@ typedef enum {
> > typedef enum {
> > 	SCTP_TRANSPORT_UP,
> > 	SCTP_TRANSPORT_DOWN,
> >+	SCTP_TRANSPORT_PF,
> > } sctp_transport_cmd_t;
> > 
> > /* These are the address scopes defined mainly for IPv4 addresses
> >diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
> >index e4652fe..22825abe 100644
> >--- a/include/net/sctp/structs.h
> >+++ b/include/net/sctp/structs.h
> >@@ -160,6 +160,7 @@ extern struct sctp_globals {
> > 	int max_retrans_association;
> > 	int max_retrans_path;
> > 	int max_retrans_init;
> >+	int pf_retrans;
> > 
> > 	/*
> > 	 * Policy for preforming sctp/socket accounting
> >@@ -258,6 +259,7 @@ extern struct sctp_globals {
> > #define sctp_sndbuf_policy	 	(sctp_globals.sndbuf_policy)
> > #define sctp_rcvbuf_policy	 	(sctp_globals.rcvbuf_policy)
> > #define sctp_max_retrans_path		(sctp_globals.max_retrans_path)
> >+#define sctp_pf_retrans			(sctp_globals.pf_retrans)
> > #define sctp_max_retrans_init		(sctp_globals.max_retrans_init)
> > #define sctp_sack_timeout		(sctp_globals.sack_timeout)
> > #define sctp_hb_interval		(sctp_globals.hb_interval)
> >@@ -1660,6 +1662,8 @@ struct sctp_association {
> > 	 */
> > 	int max_retrans;
> > 
> >+	int pf_retrans;
> >+
> > 	/* Maximum number of times the endpoint will retransmit INIT  */
> > 	__u16 max_init_attempts;
> > 
> >diff --git a/include/net/sctp/user.h b/include/net/sctp/user.h
> >index 0842ef0..cece1bf 100644
> >--- a/include/net/sctp/user.h
> >+++ b/include/net/sctp/user.h
> >@@ -649,6 +649,7 @@ struct sctp_paddrinfo {
> >  */
> > enum sctp_spinfo_state {
> > 	SCTP_INACTIVE,
> >+	SCTP_PF,
> > 	SCTP_ACTIVE,
> > 	SCTP_UNCONFIRMED,
> > 	SCTP_UNKNOWN = 0xffff  /* Value used for transport state unknown */
> >diff --git a/net/sctp/associola.c b/net/sctp/associola.c
> >index 5bc9ab1..f3ebc23 100644
> >--- a/net/sctp/associola.c
> >+++ b/net/sctp/associola.c
> >@@ -124,6 +124,8 @@ static struct sctp_association
> >*sctp_association_init(struct sctp_association *a
> > 	 * socket values.
> > 	 */
> > 	asoc->max_retrans = sp->assocparams.sasoc_asocmaxrxt;
> >+	asoc->pf_retrans  = sctp_pf_retrans;
> >+
> > 	asoc->rto_initial = msecs_to_jiffies(sp->rtoinfo.srto_initial);
> > 	asoc->rto_max = msecs_to_jiffies(sp->rtoinfo.srto_max);
> > 	asoc->rto_min = msecs_to_jiffies(sp->rtoinfo.srto_min);
> >@@ -840,6 +842,7 @@ void sctp_assoc_control_transport(struct
> >sctp_association *asoc,
> > 	struct sctp_ulpevent *event;
> > 	struct sockaddr_storage addr;
> > 	int spc_state = 0;
> >+	bool ulp_notify = true;
> > 
> > 	/* Record the transition on the transport.  */
> > 	switch (command) {
> >@@ -853,6 +856,14 @@ void sctp_assoc_control_transport(struct
> >sctp_association *asoc,
> > 			spc_state = SCTP_ADDR_CONFIRMED;
> > 		else
> > 			spc_state = SCTP_ADDR_AVAILABLE;
> >+		/* Don't inform ULP about transition from PF to
> >+		 * active state and set cwnd to 1, see SCTP
> >+		 * Quick failover draft section 5.1, point 5
> >+		 */
> >+		if (transport->state == SCTP_PF) {
> >+			ulp_notify = false;
> >+			transport->cwnd = 1;
> >+		}
> > 		transport->state = SCTP_ACTIVE;
> > 		break;
> > 
> >@@ -871,6 +882,10 @@ void sctp_assoc_control_transport(struct
> >sctp_association *asoc,
> > 		spc_state = SCTP_ADDR_UNREACHABLE;
> > 		break;
> > 
> >+	case SCTP_TRANSPORT_PF:
> >+		transport->state = SCTP_PF;
> >+		ulp_notify = false;
> >+		break;
> > 	default:
> > 		return;
> > 	}
> >@@ -878,12 +893,15 @@ void sctp_assoc_control_transport(struct
> >sctp_association *asoc,
> > 	/* Generate and send a SCTP_PEER_ADDR_CHANGE notification to the
> > 	 * user.
> > 	 */
> >-	memset(&addr, 0, sizeof(struct sockaddr_storage));
> >-	memcpy(&addr, &transport->ipaddr,
> >transport->af_specific->sockaddr_len);
> >-	event = sctp_ulpevent_make_peer_addr_change(asoc, &addr,
> >-				0, spc_state, error, GFP_ATOMIC);
> >-	if (event)
> >-		sctp_ulpq_tail_event(&asoc->ulpq, event);
> >+	if (ulp_notify) {
> >+		memset(&addr, 0, sizeof(struct sockaddr_storage));
> >+		memcpy(&addr, &transport->ipaddr,
> >+		       transport->af_specific->sockaddr_len);
> >+		event = sctp_ulpevent_make_peer_addr_change(asoc, &addr,
> >+					0, spc_state, error, GFP_ATOMIC);
> >+		if (event)
> >+			sctp_ulpq_tail_event(&asoc->ulpq, event);
> >+	}
> > 
> > 	/* Select new active and retran paths. */
> > 
> >@@ -899,7 +917,8 @@ void sctp_assoc_control_transport(struct
> >sctp_association *asoc,
> > 			transports) {
> > 
> > 		if ((t->state == SCTP_INACTIVE) ||
> >-		    (t->state == SCTP_UNCONFIRMED))
> >+		    (t->state == SCTP_UNCONFIRMED) ||
> >+		    (t->state == SCTP_PF))
> > 			continue;
> > 		if (!first || t->last_time_heard > first->last_time_heard) {
> > 			second = first;
> >diff --git a/net/sctp/outqueue.c b/net/sctp/outqueue.c
> >index a0fa19f..e7aa177c 100644
> >--- a/net/sctp/outqueue.c
> >+++ b/net/sctp/outqueue.c
> >@@ -792,7 +792,8 @@ static int sctp_outq_flush(struct sctp_outq *q, int
> >rtx_timeout)
> > 			if (!new_transport)
> > 				new_transport = asoc->peer.active_path;
> > 		} else if ((new_transport->state == SCTP_INACTIVE) ||
> >-			   (new_transport->state == SCTP_UNCONFIRMED)) {
> >+			   (new_transport->state == SCTP_UNCONFIRMED) ||
> >+			   (new_transport->state == SCTP_PF)) {
> > 			/* If the chunk is Heartbeat or Heartbeat Ack,
> > 			 * send it to chunk->transport, even if it's
> > 			 * inactive.
> >@@ -987,7 +988,8 @@ static int sctp_outq_flush(struct sctp_outq *q, int
> >rtx_timeout)
> > 			new_transport = chunk->transport;
> > 			if (!new_transport ||
> > 			    ((new_transport->state == SCTP_INACTIVE) ||
> >-			     (new_transport->state == SCTP_UNCONFIRMED)))
> >+			     (new_transport->state == SCTP_UNCONFIRMED) ||
> >+			     (new_transport->state == SCTP_PF)))
> > 				new_transport = asoc->peer.active_path;
> > 			if (new_transport->state == SCTP_UNCONFIRMED)
> > 				continue;
> >diff --git a/net/sctp/sm_sideeffect.c b/net/sctp/sm_sideeffect.c
> >index c96d1a8..285e26a 100644
> >--- a/net/sctp/sm_sideeffect.c
> >+++ b/net/sctp/sm_sideeffect.c
> >@@ -76,6 +76,8 @@ static int sctp_side_effects(sctp_event_t event_type,
> >sctp_subtype_t subtype,
> > 			     sctp_cmd_seq_t *commands,
> > 			     gfp_t gfp);
> > 
> >+static void sctp_cmd_hb_timer_update(sctp_cmd_seq_t *cmds,
> >+				     struct sctp_transport *t);
> > /********************************************************************
> >  * Helper functions
> >  ********************************************************************/
> >@@ -470,7 +472,8 @@ sctp_timer_event_t
> >*sctp_timer_events[SCTP_NUM_TIMEOUT_TYPES] = {
> >  * notification SHOULD be sent to the upper layer.
> >  *
> >  */
> >-static void sctp_do_8_2_transport_strike(struct sctp_association
> >*asoc,
> >+static void sctp_do_8_2_transport_strike(sctp_cmd_seq_t *commands,
> >+					 struct sctp_association *asoc,
> > 					 struct sctp_transport *transport,
> > 					 int is_hb)
> > {
> >@@ -495,6 +498,23 @@ static void sctp_do_8_2_transport_strike(struct
> >sctp_association *asoc,
> > 			transport->error_count++;
> > 	}
> > 
> >+	/* If the transport error count is greater than the pf_retrans
> >+	 * threshold, and less than pathmaxrtx, then mark this transport
> >+	 * as Partially Failed, ee SCTP Quick Failover Draft, secon 5.1,
> >+	 * point 1
> >+	 */
> >+	if ((transport->state != SCTP_PF) &&
> >+	   (asoc->pf_retrans < transport->pathmaxrxt) &&
> >+	   (transport->error_count > asoc->pf_retrans)) {
> >+
> >+		sctp_assoc_control_transport(asoc, transport,
> >+					     SCTP_TRANSPORT_PF,
> >+					     0);
> >+
> >+		/* Update the hb timer to resend a heartbeat every rto */
> >+		sctp_cmd_hb_timer_update(commands, transport);
> >+	}
> >+
> > 	if (transport->state != SCTP_INACTIVE &&
> > 	    (transport->error_count > transport->pathmaxrxt)) {
> > 		SCTP_DEBUG_PRINTK_IPADDR("transport_strike:association %p",
> >@@ -699,6 +719,10 @@ static void sctp_cmd_transport_on(sctp_cmd_seq_t
> >*cmds,
> > 					     SCTP_HEARTBEAT_SUCCESS);
> > 	}
> > 
> >+	if (t->state == SCTP_PF)
> >+		sctp_assoc_control_transport(asoc, t, SCTP_TRANSPORT_UP,
> >+					     SCTP_HEARTBEAT_SUCCESS);
> >+
> > 	/* The receiver of the HEARTBEAT ACK should also perform an
> > 	 * RTT measurement for that destination transport address
> > 	 * using the time value carried in the HEARTBEAT ACK chunk.
> >@@ -1565,8 +1589,8 @@ static int sctp_cmd_interpreter(sctp_event_t
> >event_type,
> > 
> > 		case SCTP_CMD_STRIKE:
> > 			/* Mark one strike against a transport.  */
> >-			sctp_do_8_2_transport_strike(asoc, cmd->obj.transport,
> >-						    0);
> >+			sctp_do_8_2_transport_strike(commands, asoc,
> >+						    cmd->obj.transport, 0);
> > 			break;
> > 
> > 		case SCTP_CMD_TRANSPORT_IDLE:
> >@@ -1576,7 +1600,8 @@ static int sctp_cmd_interpreter(sctp_event_t
> >event_type,
> > 
> > 		case SCTP_CMD_TRANSPORT_HB_SENT:
> > 			t = cmd->obj.transport;
> >-			sctp_do_8_2_transport_strike(asoc, t, 1);
> >+			sctp_do_8_2_transport_strike(commands, asoc,
> >+						     t, 1);
> > 			t->hb_sent = 1;
> > 			break;
> > 
> >diff --git a/net/sctp/sysctl.c b/net/sctp/sysctl.c
> >index e5fe639..2b2bfe9 100644
> >--- a/net/sctp/sysctl.c
> >+++ b/net/sctp/sysctl.c
> >@@ -141,6 +141,15 @@ static ctl_table sctp_table[] = {
> > 		.extra2		= &int_max
> > 	},
> > 	{
> >+		.procname	= "pf_retrans",
> >+		.data		= &sctp_pf_retrans,
> >+		.maxlen		= sizeof(int),
> >+		.mode		= 0644,
> >+		.proc_handler	= proc_dointvec_minmax,
> >+		.extra1		= &zero,
> >+		.extra2		= &int_max
> >+	},
> >+	{
> > 		.procname	= "max_init_retransmits",
> > 		.data		= &sctp_max_retrans_init,
> > 		.maxlen		= sizeof(int),
> >diff --git a/net/sctp/transport.c b/net/sctp/transport.c
> >index b026ba0..4639ba2 100644
> >--- a/net/sctp/transport.c
> >+++ b/net/sctp/transport.c
> >@@ -585,7 +585,8 @@ unsigned long sctp_transport_timeout(struct
> >sctp_transport *t)
> > {
> > 	unsigned long timeout;
> > 	timeout = t->rto + sctp_jitter(t->rto);
> >-	if (t->state != SCTP_UNCONFIRMED)
> >+	if ((t->state != SCTP_UNCONFIRMED) &&
> >+	    (t->state != SCTP_PF))
> > 		timeout += t->hbinterval;
> > 	timeout += jiffies;
> > 	return timeout;
> >-- 
> >1.7.7.6
> 
> One thing that seems to be missing is the API.  As a result you don't carry the value per transport which we'll need.  That caused you to add assoc parameter to some functions.  That's really the only missing item.

I definately agree that a way to set the a per association pf threshold from
userspace (ostensibly from a socket option), but I'm not quite sure we're on the
same page about the semantics.  You say above that that I don't carry the value
per transport (I presume you mean the pf threshold).  According to the draft the
threshold is maintained per association, see point one of section 5.1:
1.  The sender maintains a new tunable parameter called Potentially-
       failed.Max.Retrans (PFMR).  The recommended value of PFMR = 0
       when quick failover is used.  When an association's PFMR >= PMR,
       quick failover is turned off.

So yes, we should have a way to change it programatically from the default via
an option, but I think the threshold is stored in the correct place (the
assocition struct).

Or am I misunderstanding what you're saying?

Regards
Neil

> -- 
> Sent from my Android phone with SkitMail. Please excuse my brevity.
> 

^ permalink raw reply

* Re: resurrecting tcphealth
From: Stephen Hemminger @ 2012-07-14 21:48 UTC (permalink / raw)
  To: Piotr Sawuk; +Cc: netdev, linux-kernel
In-Reply-To: <dee325378a2d3b7792da9640fc14905f.squirrel@webmail.univie.ac.at>

Maybe it would be better to use netlink info (for ss command)
rather than a /proc/net interface.
See how existing TCP values and MEMINFO are handled.

^ permalink raw reply

* Re: [PATCH] drivers: connector: fixed coding style issues
From: Valentin Ilie @ 2012-07-14 21:51 UTC (permalink / raw)
  To: Joe Perches; +Cc: zbr, netdev
In-Reply-To: <1342298026.8377.33.camel@joe2Laptop>

On 14 July 2012 23:33, Joe Perches <joe@perches.com> wrote:
> I think it would be better written by hoisting the
> function outside the if like [..]

OK.

Should I submit the whole patch again (with that change) or just
another small patch that does just that?

-- 
Valentin Ilie

^ permalink raw reply

* Re: [PATCH net-next] netem: refine early skb orphaning
From: Stephen Hemminger @ 2012-07-14 21:53 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, netdev, Hagen Paul Pfeifer, Mark Gordon,
	Andreas Terzis, Yuchung Cheng
In-Reply-To: <1342271787.3265.9632.camel@edumazet-glaptop>

On Sat, 14 Jul 2012 15:16:27 +0200
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> From: Eric Dumazet <edumazet@google.com>
> 
> netem does an early orphaning of skbs. Doing so breaks TCP Small Queue
> or any mechanism relying on socket sk_wmem_alloc feedback.
> 
> Ideally, we should perform this orphaning after the rate module and
> before the delay module, to mimic what happens on a real link :
> 
> skb orphaning is indeed normally done at TX completion, before the
> transit on the link.
> 
> +-------+   +--------+  +---------------+  +-----------------+
> + Qdisc +---> Device +--> TX completion +--> links / hops    +->
> +       +   +  xmit  +  + skb orphaning +  + propagation     +
> +-------+   +--------+  +---------------+  +-----------------+
>       < rate limiting >                  < delay, drops, reorders >
> 
> If netem is used without delay feature (drops, reorders, rate
> limiting), then we should avoid early skb orphaning, to keep pressure
> on sockets as long as packets are still in qdisc queue.
> 
> Ideally, netem should be refactored to implement delay module
> as the last stage. Current algorithm merges the two phases
> (rate limiting + delay) so its not correct.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Hagen Paul Pfeifer <hagen@jauu.net>
> Cc: Mark Gordon <msg@google.com>
> Cc: Andreas Terzis <aterzis@google.com>
> Cc: Yuchung Cheng <ycheng@google.com>
> ---
>  net/sched/sch_netem.c |    9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
> index c412ad0..298c0dd 100644
> --- a/net/sched/sch_netem.c
> +++ b/net/sched/sch_netem.c
> @@ -380,7 +380,14 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch)
>  		return NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
>  	}
>  
> -	skb_orphan(skb);
> +	/* If a delay is expected, orphan the skb. (orphaning usually takes
> +	 * place at TX completion time, so _before_ the link transit delay)
> +	 * Ideally, this orphaning should be done after the rate limiting
> +	 * module, because this breaks TCP Small Queue, and other mechanisms
> +	 * based on socket sk_wmem_alloc.
> +	 */
> +	if (q->latency || q->jitter)
> +		skb_orphan(skb);
>  
>  	/*
>  	 * If we need to duplicate packet, then re-insert at top of the
> 

Acked-by: Stephen Hemminger <shemminger@vyatta.com>

^ permalink raw reply

* Re: [PATCH] drivers: connector: fixed coding style issues
From: Joe Perches @ 2012-07-14 21:56 UTC (permalink / raw)
  To: Valentin Ilie; +Cc: zbr, netdev
In-Reply-To: <CAHuBVBYXAj7OUgo3UmLERbcmBw-sK8c66TuCq_7Fsp=CVzzYSw@mail.gmail.com>

On Sun, 2012-07-15 at 00:51 +0300, Valentin Ilie wrote:
> On 14 July 2012 23:33, Joe Perches <joe@perches.com> wrote:
> > I think it would be better written by hoisting the
> > function outside the if like [..]
> 
> OK.
> 
> Should I submit the whole patch again (with that change) or just
> another small patch that does just that?

It's generally better to reply to the original patch
with [PATCH V2] with the entire patch content updated.

btw: there's a missing a semicolon in what I posted.

Also add a changelog after the --- so the new patch
email looks like:

Subject: [PATCH V2] drivers: connectors: etc

Patch description...

Signed-off-by: etc
---
v2: Updated to do whatever

actual patch...

cheers, Joe

^ permalink raw reply

* [net-next 0/7][pull request] Intel Wired LAN Driver Updates
From: Jeff Kirsher @ 2012-07-14 23:06 UTC (permalink / raw)
  To: davem; +Cc: Jeff Kirsher, netdev, gospo, sassmann

This series contains updates to e1000e and ixgbe.

The following are changes since commit 141e369de698f2e17bf716b83fcc647ddcb2220c:
  xfrm: Initialize the struct xfrm_dst behind the dst_enty field
and are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next master

Alexander Duyck (5):
  ixgbe: Simplify logic for getting traffic class from user priority
  ixgbe: Cleanup unpacking code for DCB
  ixgbe: Populate the prio_tc_map in ixgbe_setup_tc
  ixgbe: Add function for obtaining FCoE TC based on FCoE user priority
  ixgbe: Merge FCoE set_num and cache_ring calls into RSS/DCB config

Matthew Vick (1):
  e1000e: Program the correct register for ITR when using MSI-X.

Tushar Dave (1):
  e1000e: Cleanup code logic in e1000_check_for_serdes_link_82571()

 drivers/net/ethernet/intel/e1000e/82571.c       |   14 +-
 drivers/net/ethernet/intel/e1000e/e1000.h       |    1 +
 drivers/net/ethernet/intel/e1000e/ethtool.c     |    5 +-
 drivers/net/ethernet/intel/e1000e/netdev.c      |   32 ++-
 drivers/net/ethernet/intel/ixgbe/ixgbe.h        |    1 +
 drivers/net/ethernet/intel/ixgbe/ixgbe_dcb.c    |   74 ++++---
 drivers/net/ethernet/intel/ixgbe/ixgbe_dcb.h    |    1 +
 drivers/net/ethernet/intel/ixgbe/ixgbe_dcb_nl.c |   26 +--
 drivers/net/ethernet/intel/ixgbe/ixgbe_fcoe.c   |   15 ++
 drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c    |  260 ++++++++++-------------
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c   |   61 ++++--
 11 files changed, 265 insertions(+), 225 deletions(-)

-- 
1.7.10.4

^ permalink raw reply

* [net-next 1/7] e1000e: Cleanup code logic in e1000_check_for_serdes_link_82571()
From: Jeff Kirsher @ 2012-07-14 23:06 UTC (permalink / raw)
  To: davem; +Cc: Tushar Dave, netdev, gospo, sassmann, Jeff Kirsher
In-Reply-To: <1342307225-28917-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Tushar Dave <tushar.n.dave@intel.com>

Cleanup code to make it more clean and readable.

Signed-off-by: Tushar Dave <tushar.n.dave@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/e1000e/82571.c |   14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000e/82571.c b/drivers/net/ethernet/intel/e1000e/82571.c
index 36db4df..19f4cb9 100644
--- a/drivers/net/ethernet/intel/e1000e/82571.c
+++ b/drivers/net/ethernet/intel/e1000e/82571.c
@@ -1677,16 +1677,18 @@ static s32 e1000_check_for_serdes_link_82571(struct e1000_hw *hw)
 			e_dbg("ANYSTATE  -> DOWN\n");
 		} else {
 			/*
-			 * Check several times, if Sync and Config
-			 * both are consistently 1 then simply ignore
-			 * the Invalid bit and restart Autoneg
+			 * Check several times, if SYNCH bit and CONFIG
+			 * bit both are consistently 1 then simply ignore
+			 * the IV bit and restart Autoneg
 			 */
 			for (i = 0; i < AN_RETRY_COUNT; i++) {
 				udelay(10);
 				rxcw = er32(RXCW);
-				if ((rxcw & E1000_RXCW_IV) &&
-				    !((rxcw & E1000_RXCW_SYNCH) &&
-				      (rxcw & E1000_RXCW_C))) {
+				if ((rxcw & E1000_RXCW_SYNCH) &&
+				    (rxcw & E1000_RXCW_C))
+					continue;
+
+				if (rxcw & E1000_RXCW_IV) {
 					mac->serdes_has_link = false;
 					mac->serdes_link_state =
 					    e1000_serdes_link_down;
-- 
1.7.10.4

^ permalink raw reply related

* [net-next 2/7] e1000e: Program the correct register for ITR when using MSI-X.
From: Jeff Kirsher @ 2012-07-14 23:07 UTC (permalink / raw)
  To: davem; +Cc: Matthew Vick, netdev, gospo, sassmann, Jeff Kirsher
In-Reply-To: <1342307225-28917-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Matthew Vick <matthew.vick@intel.com>

When configuring interrupt throttling on 82574 in MSI-X mode, we need to
be programming the EITR registers instead of the ITR register.

-rc2: Renamed e1000_write_itr() to e1000e_write_itr(), fixed whitespace
      issues, and removed unnecessary !! operation.
-rc3: Reduced the scope of the loop variable in e1000e_write_itr().

Signed-off-by: Matthew Vick <matthew.vick@intel.com>
Acked-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/e1000e/e1000.h   |    1 +
 drivers/net/ethernet/intel/e1000e/ethtool.c |    5 ++---
 drivers/net/ethernet/intel/e1000e/netdev.c  |   32 +++++++++++++++++++++++----
 3 files changed, 31 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000e/e1000.h b/drivers/net/ethernet/intel/e1000e/e1000.h
index 6e6fffb..cd15332 100644
--- a/drivers/net/ethernet/intel/e1000e/e1000.h
+++ b/drivers/net/ethernet/intel/e1000e/e1000.h
@@ -514,6 +514,7 @@ extern void e1000e_set_interrupt_capability(struct e1000_adapter *adapter);
 extern void e1000e_reset_interrupt_capability(struct e1000_adapter *adapter);
 extern void e1000e_get_hw_control(struct e1000_adapter *adapter);
 extern void e1000e_release_hw_control(struct e1000_adapter *adapter);
+extern void e1000e_write_itr(struct e1000_adapter *adapter, u32 itr);
 
 extern unsigned int copybreak;
 
diff --git a/drivers/net/ethernet/intel/e1000e/ethtool.c b/drivers/net/ethernet/intel/e1000e/ethtool.c
index 905e214..105d554 100644
--- a/drivers/net/ethernet/intel/e1000e/ethtool.c
+++ b/drivers/net/ethernet/intel/e1000e/ethtool.c
@@ -1897,7 +1897,6 @@ static int e1000_set_coalesce(struct net_device *netdev,
 			      struct ethtool_coalesce *ec)
 {
 	struct e1000_adapter *adapter = netdev_priv(netdev);
-	struct e1000_hw *hw = &adapter->hw;
 
 	if ((ec->rx_coalesce_usecs > E1000_MAX_ITR_USECS) ||
 	    ((ec->rx_coalesce_usecs > 4) &&
@@ -1916,9 +1915,9 @@ static int e1000_set_coalesce(struct net_device *netdev,
 	}
 
 	if (adapter->itr_setting != 0)
-		ew32(ITR, 1000000000 / (adapter->itr * 256));
+		e1000e_write_itr(adapter, adapter->itr);
 	else
-		ew32(ITR, 0);
+		e1000e_write_itr(adapter, 0);
 
 	return 0;
 }
diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
index ca477e8..95b2453 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -2474,6 +2474,30 @@ set_itr_now:
 }
 
 /**
+ * e1000e_write_itr - write the ITR value to the appropriate registers
+ * @adapter: address of board private structure
+ * @itr: new ITR value to program
+ *
+ * e1000e_write_itr determines if the adapter is in MSI-X mode
+ * and, if so, writes the EITR registers with the ITR value.
+ * Otherwise, it writes the ITR value into the ITR register.
+ **/
+void e1000e_write_itr(struct e1000_adapter *adapter, u32 itr)
+{
+	struct e1000_hw *hw = &adapter->hw;
+	u32 new_itr = itr ? 1000000000 / (itr * 256) : 0;
+
+	if (adapter->msix_entries) {
+		int vector;
+
+		for (vector = 0; vector < adapter->num_vectors; vector++)
+			writel(new_itr, hw->hw_addr + E1000_EITR_82574(vector));
+	} else {
+		ew32(ITR, new_itr);
+	}
+}
+
+/**
  * e1000_alloc_queues - Allocate memory for all rings
  * @adapter: board private structure to initialize
  **/
@@ -3059,7 +3083,7 @@ static void e1000_configure_rx(struct e1000_adapter *adapter)
 	/* irq moderation */
 	ew32(RADV, adapter->rx_abs_int_delay);
 	if ((adapter->itr_setting != 0) && (adapter->itr != 0))
-		ew32(ITR, 1000000000 / (adapter->itr * 256));
+		e1000e_write_itr(adapter, adapter->itr);
 
 	ctrl_ext = er32(CTRL_EXT);
 	/* Auto-Mask interrupts upon ICR access */
@@ -3486,14 +3510,14 @@ void e1000e_reset(struct e1000_adapter *adapter)
 				dev_info(&adapter->pdev->dev,
 					"Interrupt Throttle Rate turned off\n");
 				adapter->flags2 |= FLAG2_DISABLE_AIM;
-				ew32(ITR, 0);
+				e1000e_write_itr(adapter, 0);
 			}
 		} else if (adapter->flags2 & FLAG2_DISABLE_AIM) {
 			dev_info(&adapter->pdev->dev,
 				 "Interrupt Throttle Rate turned on\n");
 			adapter->flags2 &= ~FLAG2_DISABLE_AIM;
 			adapter->itr = 20000;
-			ew32(ITR, 1000000000 / (adapter->itr * 256));
+			e1000e_write_itr(adapter, adapter->itr);
 		}
 	}
 
@@ -4576,7 +4600,7 @@ link_up:
 			    adapter->gorc - adapter->gotc) / 10000;
 		u32 itr = goc > 0 ? (dif * 6000 / goc + 2000) : 8000;
 
-		ew32(ITR, 1000000000 / (itr * 256));
+		e1000e_write_itr(adapter, itr);
 	}
 
 	/* Cause software interrupt to ensure Rx ring is cleaned */
-- 
1.7.10.4

^ permalink raw reply related

* [net-next 3/7] ixgbe: Simplify logic for getting traffic class from user priority
From: Jeff Kirsher @ 2012-07-14 23:07 UTC (permalink / raw)
  To: davem; +Cc: Alexander Duyck, netdev, gospo, sassmann, Jeff Kirsher
In-Reply-To: <1342307225-28917-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Alexander Duyck <alexander.h.duyck@intel.com>

This patch is meant to help simplify the logic for getting traffic classes
from user priorities. To do this I am adding a function named
ixgbe_dcb_get_tc_from_up that will go through the traffic classes in
reverse order in order to determine which traffic class contains a bit for
a given user priority.

Adding a declaration for this new function to the header so that
we have a centralized means for sorting out traffic classes belonging to
features such as FCoE.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_dcb.c |   30 ++++++++++++++++++++------
 1 file changed, 23 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_dcb.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_dcb.c
index 8bfaaee..4fd5a0d 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_dcb.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_dcb.c
@@ -231,16 +231,32 @@ void ixgbe_dcb_unpack_prio(struct ixgbe_dcb_config *cfg, int direction,
 	}
 }
 
-void ixgbe_dcb_unpack_map(struct ixgbe_dcb_config *cfg, int direction, u8 *map)
+static u8 ixgbe_dcb_get_tc_from_up(struct ixgbe_dcb_config *cfg,
+				   int direction, u8 up)
 {
-	int i, up;
-	unsigned long bitmap;
+	struct tc_configuration *tc_config = &cfg->tc_config[0];
+	u8 prio_mask = 1 << up;
+	u8 tc;
 
-	for (i = 0; i < MAX_TRAFFIC_CLASS; i++) {
-		bitmap = cfg->tc_config[i].path[direction].up_to_tc_bitmap;
-		for_each_set_bit(up, &bitmap, MAX_USER_PRIORITY)
-			map[up] = i;
+	/*
+	 * Test for TCs 7 through 1 and report the first match we find.  If
+	 * we find no match we can assume that the TC is 0 since the TC must
+	 * be set for all user priorities
+	 */
+	for (tc = MAX_TRAFFIC_CLASS - 1; tc; tc--) {
+		if (prio_mask & tc_config[tc].path[direction].up_to_tc_bitmap)
+			break;
 	}
+
+	return tc;
+}
+
+void ixgbe_dcb_unpack_map(struct ixgbe_dcb_config *cfg, int direction, u8 *map)
+{
+	u8 up;
+
+	for (up = 0; up < MAX_USER_PRIORITY; up++)
+		map[up] = ixgbe_dcb_get_tc_from_up(cfg, direction, up);
 }
 
 /**
-- 
1.7.10.4

^ permalink raw reply related

* [net-next 4/7] ixgbe: Cleanup unpacking code for DCB
From: Jeff Kirsher @ 2012-07-14 23:07 UTC (permalink / raw)
  To: davem; +Cc: Alexander Duyck, netdev, gospo, sassmann, Jeff Kirsher
In-Reply-To: <1342307225-28917-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Alexander Duyck <alexander.h.duyck@intel.com>

This is meant to be a generic clean-up of the remaining functions for
unpacking data from the DCB structures. The only real changes are:
replaced the variable i with tc for functions that were looping through the
traffic classes, and added a pointer for tc_class instead of path since
that way we only need to pull the pointer once instead of once per loop.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_dcb.c |   47 ++++++++++++--------------
 1 file changed, 22 insertions(+), 25 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_dcb.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_dcb.c
index 4fd5a0d..39ac2fe 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_dcb.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_dcb.c
@@ -180,55 +180,52 @@ out:
 
 void ixgbe_dcb_unpack_pfc(struct ixgbe_dcb_config *cfg, u8 *pfc_en)
 {
-	int i;
+	struct tc_configuration *tc_config = &cfg->tc_config[0];
+	int tc;
 
-	*pfc_en = 0;
-	for (i = 0; i < MAX_TRAFFIC_CLASS; i++)
-		*pfc_en |= !!(cfg->tc_config[i].dcb_pfc & 0xF) << i;
+	for (*pfc_en = 0, tc = 0; tc < MAX_TRAFFIC_CLASS; tc++) {
+		if (tc_config[tc].dcb_pfc != pfc_disabled)
+			*pfc_en |= 1 << tc;
+	}
 }
 
 void ixgbe_dcb_unpack_refill(struct ixgbe_dcb_config *cfg, int direction,
 			     u16 *refill)
 {
-	struct tc_bw_alloc *p;
-	int i;
+	struct tc_configuration *tc_config = &cfg->tc_config[0];
+	int tc;
 
-	for (i = 0; i < MAX_TRAFFIC_CLASS; i++) {
-		p = &cfg->tc_config[i].path[direction];
-		refill[i] = p->data_credits_refill;
-	}
+	for (tc = 0; tc < MAX_TRAFFIC_CLASS; tc++)
+		refill[tc] = tc_config[tc].path[direction].data_credits_refill;
 }
 
 void ixgbe_dcb_unpack_max(struct ixgbe_dcb_config *cfg, u16 *max)
 {
-	int i;
+	struct tc_configuration *tc_config = &cfg->tc_config[0];
+	int tc;
 
-	for (i = 0; i < MAX_TRAFFIC_CLASS; i++)
-		max[i] = cfg->tc_config[i].desc_credits_max;
+	for (tc = 0; tc < MAX_TRAFFIC_CLASS; tc++)
+		max[tc] = tc_config[tc].desc_credits_max;
 }
 
 void ixgbe_dcb_unpack_bwgid(struct ixgbe_dcb_config *cfg, int direction,
 			    u8 *bwgid)
 {
-	struct tc_bw_alloc *p;
-	int i;
+	struct tc_configuration *tc_config = &cfg->tc_config[0];
+	int tc;
 
-	for (i = 0; i < MAX_TRAFFIC_CLASS; i++) {
-		p = &cfg->tc_config[i].path[direction];
-		bwgid[i] = p->bwg_id;
-	}
+	for (tc = 0; tc < MAX_TRAFFIC_CLASS; tc++)
+		bwgid[tc] = tc_config[tc].path[direction].bwg_id;
 }
 
 void ixgbe_dcb_unpack_prio(struct ixgbe_dcb_config *cfg, int direction,
 			    u8 *ptype)
 {
-	struct tc_bw_alloc *p;
-	int i;
+	struct tc_configuration *tc_config = &cfg->tc_config[0];
+	int tc;
 
-	for (i = 0; i < MAX_TRAFFIC_CLASS; i++) {
-		p = &cfg->tc_config[i].path[direction];
-		ptype[i] = p->prio_type;
-	}
+	for (tc = 0; tc < MAX_TRAFFIC_CLASS; tc++)
+		ptype[tc] = tc_config[tc].path[direction].prio_type;
 }
 
 static u8 ixgbe_dcb_get_tc_from_up(struct ixgbe_dcb_config *cfg,
-- 
1.7.10.4

^ permalink raw reply related

* [net-next 5/7] ixgbe: Populate the prio_tc_map in ixgbe_setup_tc
From: Jeff Kirsher @ 2012-07-14 23:07 UTC (permalink / raw)
  To: davem
  Cc: Alexander Duyck, netdev, gospo, sassmann, John Fastabend,
	Jeff Kirsher
In-Reply-To: <1342307225-28917-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Alexander Duyck <alexander.h.duyck@intel.com>

There were cases where the prio_tc_map was not populated when we were
calling open.  This will result in us incorrectly configuring the traffic
classes when DCB is enabled.  In order to correct this I have updated the
code so that we now populate the values prior to allocating the q_vectors
and calling ixgbe_open.

Cc: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_dcb.c    |    3 +--
 drivers/net/ethernet/intel/ixgbe/ixgbe_dcb.h    |    1 +
 drivers/net/ethernet/intel/ixgbe/ixgbe_dcb_nl.c |   26 ++++-----------------
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c   |   28 +++++++++++++++++++++++
 4 files changed, 35 insertions(+), 23 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_dcb.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_dcb.c
index 39ac2fe..5442b35 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_dcb.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_dcb.c
@@ -228,8 +228,7 @@ void ixgbe_dcb_unpack_prio(struct ixgbe_dcb_config *cfg, int direction,
 		ptype[tc] = tc_config[tc].path[direction].prio_type;
 }
 
-static u8 ixgbe_dcb_get_tc_from_up(struct ixgbe_dcb_config *cfg,
-				   int direction, u8 up)
+u8 ixgbe_dcb_get_tc_from_up(struct ixgbe_dcb_config *cfg, int direction, u8 up)
 {
 	struct tc_configuration *tc_config = &cfg->tc_config[0];
 	u8 prio_mask = 1 << up;
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_dcb.h b/drivers/net/ethernet/intel/ixgbe/ixgbe_dcb.h
index 24333b7..1f4108e 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_dcb.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_dcb.h
@@ -146,6 +146,7 @@ void ixgbe_dcb_unpack_max(struct ixgbe_dcb_config *, u16 *);
 void ixgbe_dcb_unpack_bwgid(struct ixgbe_dcb_config *, int, u8 *);
 void ixgbe_dcb_unpack_prio(struct ixgbe_dcb_config *, int, u8 *);
 void ixgbe_dcb_unpack_map(struct ixgbe_dcb_config *, int, u8 *);
+u8 ixgbe_dcb_get_tc_from_up(struct ixgbe_dcb_config *, int, u8);
 
 /* DCB credits calculation */
 s32 ixgbe_dcb_calculate_tc_credits(struct ixgbe_hw *,
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_dcb_nl.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_dcb_nl.c
index 5164a21..f1e002d 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_dcb_nl.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_dcb_nl.c
@@ -151,34 +151,21 @@ static u8 ixgbe_dcbnl_get_state(struct net_device *netdev)
 
 static u8 ixgbe_dcbnl_set_state(struct net_device *netdev, u8 state)
 {
-	int err = 0;
-	u8 prio_tc[MAX_USER_PRIORITY] = {0};
-	int i;
 	struct ixgbe_adapter *adapter = netdev_priv(netdev);
+	int err = 0;
 
 	/* Fail command if not in CEE mode */
 	if (!(adapter->dcbx_cap & DCB_CAP_DCBX_VER_CEE))
 		return 1;
 
 	/* verify there is something to do, if not then exit */
-	if (!!state != !(adapter->flags & IXGBE_FLAG_DCB_ENABLED))
-		goto out;
-
-	if (state > 0) {
-		err = ixgbe_setup_tc(netdev, adapter->dcb_cfg.num_tcs.pg_tcs);
-		ixgbe_dcb_unpack_map(&adapter->dcb_cfg, DCB_TX_CONFIG, prio_tc);
-	} else {
-		err = ixgbe_setup_tc(netdev, 0);
-	}
-
-	if (err)
+	if (!state == !(adapter->flags & IXGBE_FLAG_DCB_ENABLED))
 		goto out;
 
-	for (i = 0; i < IEEE_8021QAZ_MAX_TCS; i++)
-		netdev_set_prio_tc_map(netdev, i, prio_tc[i]);
-
+	err = ixgbe_setup_tc(netdev,
+			     state ? adapter->dcb_cfg.num_tcs.pg_tcs : 0);
 out:
-	return err ? 1 : 0;
+	return !!err;
 }
 
 static void ixgbe_dcbnl_get_perm_hw_addr(struct net_device *netdev,
@@ -584,9 +571,6 @@ static int ixgbe_dcbnl_ieee_setets(struct net_device *dev,
 	if (err)
 		goto err_out;
 
-	for (i = 0; i < IEEE_8021QAZ_MAX_TCS; i++)
-		netdev_set_prio_tc_map(dev, i, ets->prio_tc[i]);
-
 	err = ixgbe_dcb_hw_ets(&adapter->hw, ets, max_frame);
 err_out:
 	return err;
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index d3cf887..91bc60f 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -6596,6 +6596,31 @@ static void ixgbe_validate_rtr(struct ixgbe_adapter *adapter, u8 tc)
 }
 
 /**
+ * ixgbe_set_prio_tc_map - Configure netdev prio tc map
+ * @adapter: Pointer to adapter struct
+ *
+ * Populate the netdev user priority to tc map
+ */
+static void ixgbe_set_prio_tc_map(struct ixgbe_adapter *adapter)
+{
+	struct net_device *dev = adapter->netdev;
+	struct ixgbe_dcb_config *dcb_cfg = &adapter->dcb_cfg;
+	struct ieee_ets *ets = adapter->ixgbe_ieee_ets;
+	u8 prio;
+
+	for (prio = 0; prio < MAX_USER_PRIORITY; prio++) {
+		u8 tc = 0;
+
+		if (adapter->dcbx_cap & DCB_CAP_DCBX_VER_CEE)
+			tc = ixgbe_dcb_get_tc_from_up(dcb_cfg, 0, prio);
+		else if (ets)
+			tc = ets->prio_tc[prio];
+
+		netdev_set_prio_tc_map(dev, prio, tc);
+	}
+}
+
+/**
  * ixgbe_setup_tc - configure net_device for multiple traffic classes
  *
  * @netdev: net device to configure
@@ -6633,6 +6658,8 @@ int ixgbe_setup_tc(struct net_device *dev, u8 tc)
 
 	if (tc) {
 		netdev_set_num_tc(dev, tc);
+		ixgbe_set_prio_tc_map(adapter);
+
 		adapter->flags |= IXGBE_FLAG_DCB_ENABLED;
 		adapter->flags &= ~IXGBE_FLAG_FDIR_HASH_CAPABLE;
 
@@ -6642,6 +6669,7 @@ int ixgbe_setup_tc(struct net_device *dev, u8 tc)
 		}
 	} else {
 		netdev_reset_tc(dev);
+
 		if (adapter->hw.mac.type == ixgbe_mac_82598EB)
 			adapter->hw.fc.requested_mode = adapter->last_lfc_mode;
 
-- 
1.7.10.4

^ permalink raw reply related

* [net-next 6/7] ixgbe: Add function for obtaining FCoE TC based on FCoE user priority
From: Jeff Kirsher @ 2012-07-14 23:07 UTC (permalink / raw)
  To: davem
  Cc: Alexander Duyck, netdev, gospo, sassmann, John Fastabend,
	Jeff Kirsher
In-Reply-To: <1342307225-28917-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Alexander Duyck <alexander.h.duyck@intel.com>

In upcoming patches it will become increasingly common to need to determine
the FCoE traffic class in order to determine the correct queues for FCoE.
In order to make this easier I am adding a function for obtaining the FCoE
traffic class based on the user priority.

Cc: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe.h      |    1 +
 drivers/net/ethernet/intel/ixgbe/ixgbe_fcoe.c |   15 +++++++++++++++
 drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c  |    7 ++-----
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   14 ++++----------
 4 files changed, 22 insertions(+), 15 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
index 2ffdc8f..5a75a9c 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -707,6 +707,7 @@ extern u8 ixgbe_fcoe_setapp(struct ixgbe_adapter *adapter, u8 up);
 extern int ixgbe_fcoe_get_wwn(struct net_device *netdev, u64 *wwn, int type);
 extern int ixgbe_fcoe_get_hbainfo(struct net_device *netdev,
 				  struct netdev_fcoe_hbainfo *info);
+extern u8 ixgbe_fcoe_get_tc(struct ixgbe_adapter *adapter);
 #endif /* IXGBE_FCOE */
 
 static inline struct netdev_queue *txring_txq(const struct ixgbe_ring *ring)
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_fcoe.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_fcoe.c
index 0922ece..cc28c44 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_fcoe.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_fcoe.c
@@ -960,3 +960,18 @@ int ixgbe_fcoe_get_hbainfo(struct net_device *netdev,
 
 	return 0;
 }
+
+/**
+ * ixgbe_fcoe_get_tc - get the current TC that fcoe is mapped to
+ * @adapter - pointer to the device adapter structure
+ *
+ * Return : TC that FCoE is mapped to
+ */
+u8 ixgbe_fcoe_get_tc(struct ixgbe_adapter *adapter)
+{
+#ifdef CONFIG_IXGBE_DCB
+	return netdev_get_prio_tc_map(adapter->netdev, adapter->fcoe.up);
+#else
+	return 0;
+#endif
+}
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
index 83eadd0..c0ff31e 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
@@ -49,8 +49,8 @@ static inline bool ixgbe_cache_ring_rss(struct ixgbe_adapter *adapter)
 
 	return true;
 }
-#ifdef CONFIG_IXGBE_DCB
 
+#ifdef CONFIG_IXGBE_DCB
 /* ixgbe_get_first_reg_idx - Return first register index associated with ring */
 static void ixgbe_get_first_reg_idx(struct ixgbe_adapter *adapter, u8 tc,
 				    unsigned int *tx, unsigned int *rx)
@@ -343,13 +343,10 @@ static inline bool ixgbe_set_dcb_queues(struct ixgbe_adapter *adapter)
 	 * configuration later.
 	 */
 	if (adapter->flags & IXGBE_FLAG_FCOE_ENABLED) {
-		u8 prio_tc[MAX_USER_PRIORITY] = {0};
-		int tc;
+		u8 tc = ixgbe_fcoe_get_tc(adapter);
 		struct ixgbe_ring_feature *f =
 					&adapter->ring_feature[RING_F_FCOE];
 
-		ixgbe_dcb_unpack_map(&adapter->dcb_cfg, DCB_TX_CONFIG, prio_tc);
-		tc = prio_tc[adapter->fcoe.up];
 		f->indices = dev->tc_to_txq[tc].count;
 		f->offset = dev->tc_to_txq[tc].offset;
 	}
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 91bc60f..3d7ce7e 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -3646,18 +3646,12 @@ static int ixgbe_hpbthresh(struct ixgbe_adapter *adapter, int pb)
 
 #ifdef IXGBE_FCOE
 	/* FCoE traffic class uses FCOE jumbo frames */
-	if (dev->features & NETIF_F_FCOE_MTU) {
-		int fcoe_pb = 0;
-
-#ifdef CONFIG_IXGBE_DCB
-		fcoe_pb = netdev_get_prio_tc_map(dev, adapter->fcoe.up);
+	if ((dev->features & NETIF_F_FCOE_MTU) &&
+	    (tc < IXGBE_FCOE_JUMBO_FRAME_SIZE) &&
+	    (pb == ixgbe_fcoe_get_tc(adapter)))
+		tc = IXGBE_FCOE_JUMBO_FRAME_SIZE;
 
 #endif
-		if (fcoe_pb == pb && tc < IXGBE_FCOE_JUMBO_FRAME_SIZE)
-			tc = IXGBE_FCOE_JUMBO_FRAME_SIZE;
-	}
-#endif
-
 	/* Calculate delay value for device */
 	switch (hw->mac.type) {
 	case ixgbe_mac_X540:
-- 
1.7.10.4

^ permalink raw reply related

* [net-next 7/7] ixgbe: Merge FCoE set_num and cache_ring calls into RSS/DCB config
From: Jeff Kirsher @ 2012-07-14 23:07 UTC (permalink / raw)
  To: davem
  Cc: Alexander Duyck, netdev, gospo, sassmann, John Fastabend,
	Jeff Kirsher
In-Reply-To: <1342307225-28917-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Alexander Duyck <alexander.h.duyck@intel.com>

This change merges the ixgbe_cache_ring_fcoe and ixgbe_set_fcoe_queues
logic into the DCB and RSS initialization calls.

Cc: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c  |  257 +++++++++++--------------
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   19 +-
 2 files changed, 129 insertions(+), 147 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
index c0ff31e..d308e71 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
@@ -28,28 +28,6 @@
 #include "ixgbe.h"
 #include "ixgbe_sriov.h"
 
-/**
- * ixgbe_cache_ring_rss - Descriptor ring to register mapping for RSS
- * @adapter: board private structure to initialize
- *
- * Cache the descriptor ring offsets for RSS to the assigned rings.
- *
- **/
-static inline bool ixgbe_cache_ring_rss(struct ixgbe_adapter *adapter)
-{
-	int i;
-
-	if (!(adapter->flags & IXGBE_FLAG_RSS_ENABLED))
-		return false;
-
-	for (i = 0; i < adapter->num_rx_queues; i++)
-		adapter->rx_ring[i]->reg_idx = i;
-	for (i = 0; i < adapter->num_tx_queues; i++)
-		adapter->tx_ring[i]->reg_idx = i;
-
-	return true;
-}
-
 #ifdef CONFIG_IXGBE_DCB
 /* ixgbe_get_first_reg_idx - Return first register index associated with ring */
 static void ixgbe_get_first_reg_idx(struct ixgbe_adapter *adapter, u8 tc,
@@ -136,39 +114,8 @@ static inline bool ixgbe_cache_ring_dcb(struct ixgbe_adapter *adapter)
 
 	return true;
 }
-#endif
-
-#ifdef IXGBE_FCOE
-/**
- * ixgbe_cache_ring_fcoe - Descriptor ring to register mapping for the FCoE
- * @adapter: board private structure to initialize
- *
- * Cache the descriptor ring offsets for FCoE mode to the assigned rings.
- *
- */
-static inline bool ixgbe_cache_ring_fcoe(struct ixgbe_adapter *adapter)
-{
-	struct ixgbe_ring_feature *f = &adapter->ring_feature[RING_F_FCOE];
-	int i;
-	u8 fcoe_rx_i = 0, fcoe_tx_i = 0;
-
-	if (!(adapter->flags & IXGBE_FLAG_FCOE_ENABLED))
-		return false;
-
-	if (adapter->flags & IXGBE_FLAG_RSS_ENABLED) {
-		ixgbe_cache_ring_rss(adapter);
-
-		fcoe_rx_i = f->offset;
-		fcoe_tx_i = f->offset;
-	}
-	for (i = 0; i < f->indices; i++, fcoe_rx_i++, fcoe_tx_i++) {
-		adapter->rx_ring[f->offset + i]->reg_idx = fcoe_rx_i;
-		adapter->tx_ring[f->offset + i]->reg_idx = fcoe_tx_i;
-	}
-	return true;
-}
 
-#endif /* IXGBE_FCOE */
+#endif
 /**
  * ixgbe_cache_ring_sriov - Descriptor ring to register mapping for sriov
  * @adapter: board private structure to initialize
@@ -188,6 +135,28 @@ static inline bool ixgbe_cache_ring_sriov(struct ixgbe_adapter *adapter)
 }
 
 /**
+ * ixgbe_cache_ring_rss - Descriptor ring to register mapping for RSS
+ * @adapter: board private structure to initialize
+ *
+ * Cache the descriptor ring offsets for RSS to the assigned rings.
+ *
+ **/
+static bool ixgbe_cache_ring_rss(struct ixgbe_adapter *adapter)
+{
+	int i;
+
+	if (!(adapter->flags & IXGBE_FLAG_RSS_ENABLED))
+		return false;
+
+	for (i = 0; i < adapter->num_rx_queues; i++)
+		adapter->rx_ring[i]->reg_idx = i;
+	for (i = 0; i < adapter->num_tx_queues; i++)
+		adapter->tx_ring[i]->reg_idx = i;
+
+	return true;
+}
+
+/**
  * ixgbe_cache_ring_register - Descriptor ring to register mapping
  * @adapter: board private structure to initialize
  *
@@ -212,13 +181,7 @@ static void ixgbe_cache_ring_register(struct ixgbe_adapter *adapter)
 		return;
 #endif
 
-#ifdef IXGBE_FCOE
-	if (ixgbe_cache_ring_fcoe(adapter))
-		return;
-#endif /* IXGBE_FCOE */
-
-	if (ixgbe_cache_ring_rss(adapter))
-		return;
+	ixgbe_cache_ring_rss(adapter);
 }
 
 /**
@@ -234,6 +197,74 @@ static inline bool ixgbe_set_sriov_queues(struct ixgbe_adapter *adapter)
 	return false;
 }
 
+#define IXGBE_RSS_16Q_MASK	0xF
+#define IXGBE_RSS_8Q_MASK	0x7
+#define IXGBE_RSS_4Q_MASK	0x3
+#define IXGBE_RSS_2Q_MASK	0x1
+#define IXGBE_RSS_DISABLED_MASK	0x0
+
+#ifdef CONFIG_IXGBE_DCB
+static bool ixgbe_set_dcb_queues(struct ixgbe_adapter *adapter)
+{
+	struct net_device *dev = adapter->netdev;
+	struct ixgbe_ring_feature *f;
+	int rss_i, rss_m, i;
+	int tcs;
+
+	/* Map queue offset and counts onto allocated tx queues */
+	tcs = netdev_get_num_tc(dev);
+
+	/* verify we have DCB queueing enabled before proceeding */
+	if (tcs <= 1)
+		return false;
+
+	/* determine the upper limit for our current DCB mode */
+	rss_i = dev->num_tx_queues / tcs;
+	if (adapter->hw.mac.type == ixgbe_mac_82598EB) {
+		/* 8 TC w/ 4 queues per TC */
+		rss_i = min_t(u16, rss_i, 4);
+		rss_m = IXGBE_RSS_4Q_MASK;
+	} else if (tcs > 4) {
+		/* 8 TC w/ 8 queues per TC */
+		rss_i = min_t(u16, rss_i, 8);
+		rss_m = IXGBE_RSS_8Q_MASK;
+	} else {
+		/* 4 TC w/ 16 queues per TC */
+		rss_i = min_t(u16, rss_i, 16);
+		rss_m = IXGBE_RSS_16Q_MASK;
+	}
+
+	/* set RSS mask and indices */
+	f = &adapter->ring_feature[RING_F_RSS];
+	rss_i = min_t(int, rss_i, f->limit);
+	f->indices = rss_i;
+	f->mask = rss_m;
+
+#ifdef IXGBE_FCOE
+	/* FCoE enabled queues require special configuration indexed
+	 * by feature specific indices and offset. Here we map FCoE
+	 * indices onto the DCB queue pairs allowing FCoE to own
+	 * configuration later.
+	 */
+	if (adapter->flags & IXGBE_FLAG_FCOE_ENABLED) {
+		u8 tc = ixgbe_fcoe_get_tc(adapter);
+
+		f = &adapter->ring_feature[RING_F_FCOE];
+		f->indices = min_t(u16, rss_i, f->limit);
+		f->offset = rss_i * tc;
+	}
+
+#endif /* IXGBE_FCOE */
+	for (i = 0; i < tcs; i++)
+		netdev_set_tc_queue(dev, i, rss_i, rss_i * i);
+
+	adapter->num_tx_queues = rss_i * tcs;
+	adapter->num_rx_queues = rss_i * tcs;
+
+	return true;
+}
+
+#endif
 /**
  * ixgbe_set_rss_queues - Allocate queues for RSS
  * @adapter: board private structure to initialize
@@ -257,7 +288,7 @@ static bool ixgbe_set_rss_queues(struct ixgbe_adapter *adapter)
 	rss_i = f->limit;
 
 	f->indices = rss_i;
-	f->mask = 0xF;
+	f->mask = IXGBE_RSS_16Q_MASK;
 
 	/*
 	 * Use Flow Director in addition to RSS to ensure the best
@@ -271,90 +302,41 @@ static bool ixgbe_set_rss_queues(struct ixgbe_adapter *adapter)
 		rss_i = max_t(u16, rss_i, f->indices);
 	}
 
-	adapter->num_rx_queues = rss_i;
-	adapter->num_tx_queues = rss_i;
-
-	return true;
-}
-
 #ifdef IXGBE_FCOE
-/**
- * ixgbe_set_fcoe_queues - Allocate queues for Fiber Channel over Ethernet (FCoE)
- * @adapter: board private structure to initialize
- *
- * FCoE RX FCRETA can use up to 8 rx queues for up to 8 different exchanges.
- * Offset is used as the index of the first rx queue used by FCoE.
- **/
-static inline bool ixgbe_set_fcoe_queues(struct ixgbe_adapter *adapter)
-{
-	struct ixgbe_ring_feature *f = &adapter->ring_feature[RING_F_FCOE];
+	/*
+	 * FCoE can exist on the same rings as standard network traffic
+	 * however it is preferred to avoid that if possible.  In order
+	 * to get the best performance we allocate as many FCoE queues
+	 * as we can and we place them at the end of the ring array to
+	 * avoid sharing queues with standard RSS on systems with 24 or
+	 * more CPUs.
+	 */
+	if (adapter->flags & IXGBE_FLAG_FCOE_ENABLED) {
+		struct net_device *dev = adapter->netdev;
+		u16 fcoe_i;
 
-	if (!(adapter->flags & IXGBE_FLAG_FCOE_ENABLED))
-		return false;
+		f = &adapter->ring_feature[RING_F_FCOE];
 
-	f->indices = min_t(int, num_online_cpus(), f->limit);
+		/* merge FCoE queues with RSS queues */
+		fcoe_i = min_t(u16, f->limit + rss_i, num_online_cpus());
+		fcoe_i = min_t(u16, fcoe_i, dev->num_tx_queues);
 
-	adapter->num_rx_queues = 1;
-	adapter->num_tx_queues = 1;
+		/* limit indices to rss_i if MSI-X is disabled */
+		if (!(adapter->flags & IXGBE_FLAG_MSIX_ENABLED))
+			fcoe_i = rss_i;
 
-	if (adapter->flags & IXGBE_FLAG_RSS_ENABLED) {
-		e_info(probe, "FCoE enabled with RSS\n");
-		ixgbe_set_rss_queues(adapter);
+		/* attempt to reserve some queues for just FCoE */
+		f->indices = min_t(u16, fcoe_i, f->limit);
+		f->offset = fcoe_i - f->indices;
+		rss_i = max_t(u16, fcoe_i, rss_i);
 	}
 
-	/* adding FCoE rx rings to the end */
-	f->offset = adapter->num_rx_queues;
-	adapter->num_rx_queues += f->indices;
-	adapter->num_tx_queues += f->indices;
-
-	return true;
-}
 #endif /* IXGBE_FCOE */
-
-/* Artificial max queue cap per traffic class in DCB mode */
-#define DCB_QUEUE_CAP 8
-
-#ifdef CONFIG_IXGBE_DCB
-static inline bool ixgbe_set_dcb_queues(struct ixgbe_adapter *adapter)
-{
-	int per_tc_q, q, i, offset = 0;
-	struct net_device *dev = adapter->netdev;
-	int tcs = netdev_get_num_tc(dev);
-
-	if (!tcs)
-		return false;
-
-	/* Map queue offset and counts onto allocated tx queues */
-	per_tc_q = min_t(unsigned int, dev->num_tx_queues / tcs, DCB_QUEUE_CAP);
-	q = min_t(int, num_online_cpus(), per_tc_q);
-
-	for (i = 0; i < tcs; i++) {
-		netdev_set_tc_queue(dev, i, q, offset);
-		offset += q;
-	}
-
-	adapter->num_tx_queues = q * tcs;
-	adapter->num_rx_queues = q * tcs;
-
-#ifdef IXGBE_FCOE
-	/* FCoE enabled queues require special configuration indexed
-	 * by feature specific indices and offset. Here we map FCoE
-	 * indices onto the DCB queue pairs allowing FCoE to own
-	 * configuration later.
-	 */
-	if (adapter->flags & IXGBE_FLAG_FCOE_ENABLED) {
-		u8 tc = ixgbe_fcoe_get_tc(adapter);
-		struct ixgbe_ring_feature *f =
-					&adapter->ring_feature[RING_F_FCOE];
-
-		f->indices = dev->tc_to_txq[tc].count;
-		f->offset = dev->tc_to_txq[tc].offset;
-	}
-#endif
+	adapter->num_rx_queues = rss_i;
+	adapter->num_tx_queues = rss_i;
 
 	return true;
 }
-#endif
 
 /**
  * ixgbe_set_num_queues - Allocate queues for device, feature dependent
@@ -383,11 +365,6 @@ static int ixgbe_set_num_queues(struct ixgbe_adapter *adapter)
 		goto done;
 
 #endif
-#ifdef IXGBE_FCOE
-	if (ixgbe_set_fcoe_queues(adapter))
-		goto done;
-
-#endif /* IXGBE_FCOE */
 	if (ixgbe_set_rss_queues(adapter))
 		goto done;
 
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 3d7ce7e..ee230f5 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -3610,16 +3610,17 @@ static void ixgbe_configure_dcb(struct ixgbe_adapter *adapter)
 	if (hw->mac.type != ixgbe_mac_82598EB) {
 		int i;
 		u32 reg = 0;
+		u8 msb = 0;
+		u8 rss_i = adapter->netdev->tc_to_txq[0].count - 1;
 
-		for (i = 0; i < MAX_TRAFFIC_CLASS; i++) {
-			u8 msb = 0;
-			u8 cnt = adapter->netdev->tc_to_txq[i].count;
-
-			while (cnt >>= 1)
-				msb++;
+		while (rss_i) {
+			msb++;
+			rss_i >>= 1;
+		}
 
+		for (i = 0; i < MAX_TRAFFIC_CLASS; i++)
 			reg |= msb << IXGBE_RQTC_SHIFT_TC(i);
-		}
+
 		IXGBE_WRITE_REG(hw, IXGBE_RQTC, reg);
 	}
 }
@@ -7027,7 +7028,11 @@ static int __devinit ixgbe_probe(struct pci_dev *pdev,
 #endif
 
 	if (ii->mac == ixgbe_mac_82598EB)
+#ifdef CONFIG_IXGBE_DCB
+		indices = min_t(unsigned int, indices, MAX_TRAFFIC_CLASS * 4);
+#else
 		indices = min_t(unsigned int, indices, IXGBE_MAX_RSS_INDICES);
+#endif
 	else
 		indices = min_t(unsigned int, indices, IXGBE_MAX_FDIR_INDICES);
 
-- 
1.7.10.4

^ permalink raw reply related

* [PATCH V2] drivers: connector: fixed coding style issues
From: Valentin Ilie @ 2012-07-14 23:08 UTC (permalink / raw)
  To: zbr; +Cc: netdev, Valentin Ilie
In-Reply-To: <1342297608-6872-1-git-send-email-valentin.ilie@gmail.com>

V2: Replaced assignment in if statement.
Fixed coding style issues.

Signed-off-by: Valentin Ilie <valentin.ilie@gmail.com>
---
 drivers/connector/cn_proc.c   |   36 ++++++++++++++++++------------------
 drivers/connector/cn_queue.c  |   12 +++++++-----
 drivers/connector/connector.c |    5 +++--
 3 files changed, 28 insertions(+), 25 deletions(-)

diff --git a/drivers/connector/cn_proc.c b/drivers/connector/cn_proc.c
index 77e1e6c..3e92b7d 100644
--- a/drivers/connector/cn_proc.c
+++ b/drivers/connector/cn_proc.c
@@ -46,7 +46,7 @@ static DEFINE_PER_CPU(__u32, proc_event_counts) = { 0 };
 static inline void get_seq(__u32 *ts, int *cpu)
 {
 	preempt_disable();
-	*ts = __this_cpu_inc_return(proc_event_counts) -1;
+	*ts = __this_cpu_inc_return(proc_event_counts) - 1;
 	*cpu = smp_processor_id();
 	preempt_enable();
 }
@@ -62,8 +62,8 @@ void proc_fork_connector(struct task_struct *task)
 	if (atomic_read(&proc_event_num_listeners) < 1)
 		return;
 
-	msg = (struct cn_msg*)buffer;
-	ev = (struct proc_event*)msg->data;
+	msg = (struct cn_msg *)buffer;
+	ev = (struct proc_event *)msg->data;
 	get_seq(&msg->seq, &ev->cpu);
 	ktime_get_ts(&ts); /* get high res monotonic timestamp */
 	put_unaligned(timespec_to_ns(&ts), (__u64 *)&ev->timestamp_ns);
@@ -93,8 +93,8 @@ void proc_exec_connector(struct task_struct *task)
 	if (atomic_read(&proc_event_num_listeners) < 1)
 		return;
 
-	msg = (struct cn_msg*)buffer;
-	ev = (struct proc_event*)msg->data;
+	msg = (struct cn_msg *)buffer;
+	ev = (struct proc_event *)msg->data;
 	get_seq(&msg->seq, &ev->cpu);
 	ktime_get_ts(&ts); /* get high res monotonic timestamp */
 	put_unaligned(timespec_to_ns(&ts), (__u64 *)&ev->timestamp_ns);
@@ -119,8 +119,8 @@ void proc_id_connector(struct task_struct *task, int which_id)
 	if (atomic_read(&proc_event_num_listeners) < 1)
 		return;
 
-	msg = (struct cn_msg*)buffer;
-	ev = (struct proc_event*)msg->data;
+	msg = (struct cn_msg *)buffer;
+	ev = (struct proc_event *)msg->data;
 	ev->what = which_id;
 	ev->event_data.id.process_pid = task->pid;
 	ev->event_data.id.process_tgid = task->tgid;
@@ -134,7 +134,7 @@ void proc_id_connector(struct task_struct *task, int which_id)
 		ev->event_data.id.e.egid = cred->egid;
 	} else {
 		rcu_read_unlock();
-	     	return;
+		return;
 	}
 	rcu_read_unlock();
 	get_seq(&msg->seq, &ev->cpu);
@@ -241,8 +241,8 @@ void proc_exit_connector(struct task_struct *task)
 	if (atomic_read(&proc_event_num_listeners) < 1)
 		return;
 
-	msg = (struct cn_msg*)buffer;
-	ev = (struct proc_event*)msg->data;
+	msg = (struct cn_msg *)buffer;
+	ev = (struct proc_event *)msg->data;
 	get_seq(&msg->seq, &ev->cpu);
 	ktime_get_ts(&ts); /* get high res monotonic timestamp */
 	put_unaligned(timespec_to_ns(&ts), (__u64 *)&ev->timestamp_ns);
@@ -276,8 +276,8 @@ static void cn_proc_ack(int err, int rcvd_seq, int rcvd_ack)
 	if (atomic_read(&proc_event_num_listeners) < 1)
 		return;
 
-	msg = (struct cn_msg*)buffer;
-	ev = (struct proc_event*)msg->data;
+	msg = (struct cn_msg *)buffer;
+	ev = (struct proc_event *)msg->data;
 	msg->seq = rcvd_seq;
 	ktime_get_ts(&ts); /* get high res monotonic timestamp */
 	put_unaligned(timespec_to_ns(&ts), (__u64 *)&ev->timestamp_ns);
@@ -303,7 +303,7 @@ static void cn_proc_mcast_ctl(struct cn_msg *msg,
 	if (msg->len != sizeof(*mc_op))
 		return;
 
-	mc_op = (enum proc_cn_mcast_op*)msg->data;
+	mc_op = (enum proc_cn_mcast_op *)msg->data;
 	switch (*mc_op) {
 	case PROC_CN_MCAST_LISTEN:
 		atomic_inc(&proc_event_num_listeners);
@@ -325,11 +325,11 @@ static void cn_proc_mcast_ctl(struct cn_msg *msg,
  */
 static int __init cn_proc_init(void)
 {
-	int err;
-
-	if ((err = cn_add_callback(&cn_proc_event_id, "cn_proc",
-	 			   &cn_proc_mcast_ctl))) {
-		printk(KERN_WARNING "cn_proc failed to register\n");
+	int err = cn_add_callback(&cn_proc_event_id,
+				  "cn_proc",
+				  &cn_proc_mcast_ctl);
+	if (err) {
+		pr_warn("cn_proc failed to register\n");
 		return err;
 	}
 	return 0;
diff --git a/drivers/connector/cn_queue.c b/drivers/connector/cn_queue.c
index c42c9d5..1f8bf05 100644
--- a/drivers/connector/cn_queue.c
+++ b/drivers/connector/cn_queue.c
@@ -1,5 +1,5 @@
 /*
- * 	cn_queue.c
+ *	cn_queue.c
  *
  * 2004+ Copyright (c) Evgeniy Polyakov <zbr@ioremap.net>
  * All rights reserved.
@@ -34,13 +34,14 @@
 static struct cn_callback_entry *
 cn_queue_alloc_callback_entry(struct cn_queue_dev *dev, const char *name,
 			      struct cb_id *id,
-			      void (*callback)(struct cn_msg *, struct netlink_skb_parms *))
+			      void (*callback)(struct cn_msg *,
+					       struct netlink_skb_parms *))
 {
 	struct cn_callback_entry *cbq;
 
 	cbq = kzalloc(sizeof(*cbq), GFP_KERNEL);
 	if (!cbq) {
-		printk(KERN_ERR "Failed to create new callback queue.\n");
+		pr_err("Failed to create new callback queue.\n");
 		return NULL;
 	}
 
@@ -71,7 +72,8 @@ int cn_cb_equal(struct cb_id *i1, struct cb_id *i2)
 
 int cn_queue_add_callback(struct cn_queue_dev *dev, const char *name,
 			  struct cb_id *id,
-			  void (*callback)(struct cn_msg *, struct netlink_skb_parms *))
+			  void (*callback)(struct cn_msg *,
+					   struct netlink_skb_parms *))
 {
 	struct cn_callback_entry *cbq, *__cbq;
 	int found = 0;
@@ -149,7 +151,7 @@ void cn_queue_free_dev(struct cn_queue_dev *dev)
 	spin_unlock_bh(&dev->queue_lock);
 
 	while (atomic_read(&dev->refcnt)) {
-		printk(KERN_INFO "Waiting for %s to become free: refcnt=%d.\n",
+		pr_info("Waiting for %s to become free: refcnt=%d.\n",
 		       dev->name, atomic_read(&dev->refcnt));
 		msleep(1000);
 	}
diff --git a/drivers/connector/connector.c b/drivers/connector/connector.c
index dde6a0f..32174f0 100644
--- a/drivers/connector/connector.c
+++ b/drivers/connector/connector.c
@@ -1,5 +1,5 @@
 /*
- * 	connector.c
+ *	connector.c
  *
  * 2004+ Copyright (c) Evgeniy Polyakov <zbr@ioremap.net>
  * All rights reserved.
@@ -185,7 +185,8 @@ static void cn_rx_skb(struct sk_buff *__skb)
  * May sleep.
  */
 int cn_add_callback(struct cb_id *id, const char *name,
-		    void (*callback)(struct cn_msg *, struct netlink_skb_parms *))
+		    void (*callback)(struct cn_msg *,
+				     struct netlink_skb_parms *))
 {
 	int err;
 	struct cn_dev *dev = &cdev;
-- 
1.7.10.4

^ permalink raw reply related

* Re: resurrecting tcphealth
From: Piotr Sawuk @ 2012-07-14 23:43 UTC (permalink / raw)
  To: netdev; +Cc: linux-kernel
In-Reply-To: <20120714144800.7f8c97f6@nehalam.linuxnetplumber.net>

On Sa, 14.07.2012, 23:48, Stephen Hemminger wrote:
> Maybe it would be better to use netlink info (for ss command)
> rather than a /proc/net interface.
> See how existing TCP values and MEMINFO are handled.
>

I'm confused, what exactly do you mean?
of course a library-interface might be more interesting than proc/net.
the proc/net/tcphealth functionality probably could be done in userspace,
although I'm wondering how userspace could traverse all tcp_sock structures
for all clients -- but the gain would be a bit more security. and the rest
of the patch still would need to stay in the kernel since that info it
collects would otherwise be lost completely. so I somehow doubt this would
be worth the effort. better would be optional tcphealth disabled by default
for security-reasons. (i.e. if you're a server you don't want to enable it.)

@"David Miller":
sorry, I forgot an important info about this patch, an info that can be
found in the paper I quoted:

'We concentrate on client side metrics to make our results more applicable
to web clients. We chose this strategy to help answer the often asked
question: "Why is my Internet connection so slow?"
...
Since we only have access to the client node and not the server, we must
fi\fnd performance data using only the information coming downstream to us.
Savage showed that the TCP protocol maintains data that we can leverage to
our advantage and we use that idea to monitor per-connection TCP health at
the client. This task is made difficult because of the TCP philosophy of
"smart sender, dumb receiver". There is simply more information about the
connection on the server side than on the client.' [1]

tcp_info is such a server-side diagnosis, clients usually don't send enough
data to make the info therein meaningful for checking the "health" of tcp.
it's all about the senders, and the desktop-pc simply isn't that big of a
tcp-sender to slashdot and co.

@"Eric Dumazet":
the same goes for you too. netstat -s and whatever kind of pooling together
statistics on all internet-connections is useful for a server-admin only,
slowness of particular sites can only be investigated with ping and this
tcphealth patch when you're on client-side.

as for userspace-tools, apart from tcp-health display in each
downloading-progress-bar I can think of cron-jobs which wait for better
tcp-health before spidering some site, or an altered wget in mirror-mode.
the gkrellm-plugin that comes with this patch might count together all the
data of tcphealth, but when a tcp_sock isn't stored anymore that data is
forgotten and you get a concurrent info most likely for only one site, since
the client wont have much connections open and will be closing old
connections frequently.

oh, and again I recommend the really short although outdated thesis

[1] https://sacerdoti.org/tcphealth/tcphealth-paper.pdf

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox