Netdev List


* Please reply
From: Jose Calvache @ 2014-10-09 19:50 UTC (permalink / raw)


Dear Sir/Madam, Here is a pdf attachment of my proposal to you. Please
read and reply I would be grateful. Jose Calvache

* [PATCH net 0/3] SCTP fixes
From: Daniel Borkmann @ 2014-10-09 20:55 UTC (permalink / raw)
  To: davem; +Cc: linux-sctp, netdev

Here are some SCTP fixes.

[ Note, an immediate workaround would be to disable ASCONF (it
  is disabled by default via sysctl). It is actually only used
  together with chunk authentication. ]

Thanks!

Daniel Borkmann (3):
  net: sctp: fix skb_over_panic when receiving malformed ASCONF chunks
  net: sctp: fix panic on duplicate ASCONF chunks
  net: sctp: fix remote memory pressure from excessive queueing

 include/net/sctp/sctp.h  |  5 +++
 include/net/sctp/sm.h    |  6 +--
 net/sctp/associola.c     |  2 +
 net/sctp/inqueue.c       | 33 ++++------------
 net/sctp/sm_make_chunk.c | 99 +++++++++++++++++++++++++++---------------------
 net/sctp/sm_statefuns.c  | 21 +++-------
 6 files changed, 77 insertions(+), 89 deletions(-)

-- 
1.7.11.7

* [PATCH net 1/3] net: sctp: fix skb_over_panic when receiving malformed ASCONF chunks
From: Daniel Borkmann @ 2014-10-09 20:55 UTC (permalink / raw)
  To: davem; +Cc: linux-sctp, netdev, Vlad Yasevich
In-Reply-To: <1412888133-833-1-git-send-email-dborkman@redhat.com>

Commit 6f4c618ddb0 ("SCTP : Add paramters validity check for
ASCONF chunk") added basic verification of ASCONF chunks. However,
it is still possible to remotely crash a server by sending a
specially crafted ASCONF chunk (this goes back even to pre-2.6.12
kernels):

skb_over_panic: text:ffffffffa01ea1c3 len:31056 put:30768
 head:ffff88011bd81800 data:ffff88011bd81800 tail:0x7950
 end:0x440 dev:<NULL>
 ------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:129!
[...]
Call Trace:
 <IRQ>
 [<ffffffff8144fb1c>] skb_put+0x5c/0x70
 [<ffffffffa01ea1c3>] sctp_addto_chunk+0x63/0xd0 [sctp]
 [<ffffffffa01eadaf>] sctp_process_asconf+0x1af/0x540 [sctp]
 [<ffffffff8152d025>] ? _read_unlock_bh+0x15/0x20
 [<ffffffffa01e0038>] sctp_sf_do_asconf+0x168/0x240 [sctp]
 [<ffffffffa01e3751>] sctp_do_sm+0x71/0x1210 [sctp]
 [<ffffffff8147645d>] ? fib_rules_lookup+0xad/0xf0
 [<ffffffffa01e6b22>] ? sctp_cmp_addr_exact+0x32/0x40 [sctp]
 [<ffffffffa01e8393>] sctp_assoc_bh_rcv+0xd3/0x180 [sctp]
 [<ffffffffa01ee986>] sctp_inq_push+0x56/0x80 [sctp]
 [<ffffffffa01fcc42>] sctp_rcv+0x982/0xa10 [sctp]
 [<ffffffffa01d5123>] ? ipt_local_in_hook+0x23/0x28 [iptable_filter]
 [<ffffffff8148bdc9>] ? nf_iterate+0x69/0xb0
 [<ffffffff81496d10>] ? ip_local_deliver_finish+0x0/0x2d0
 [<ffffffff8148bf86>] ? nf_hook_slow+0x76/0x120
 [<ffffffff81496d10>] ? ip_local_deliver_finish+0x0/0x2d0
 [<ffffffff81496ded>] ip_local_deliver_finish+0xdd/0x2d0
 [<ffffffff81497078>] ip_local_deliver+0x98/0xa0
 [<ffffffff8149653d>] ip_rcv_finish+0x12d/0x440
 [<ffffffff81496ac5>] ip_rcv+0x275/0x350
 [<ffffffff8145c88b>] __netif_receive_skb+0x4ab/0x750
 [<ffffffff81460588>] netif_receive_skb+0x58/0x60

This can be triggered e.g. through a simple scripted nmap
connection scan that injects the chunk after the handshake ...

  -------------- INIT[ASCONF; ASCONF_ACK] ------------->
  <----------- INIT-ACK[ASCONF; ASCONF_ACK] ------------
  -------------------- COOKIE-ECHO -------------------->
  <-------------------- COOKIE-ACK ---------------------
  ------------------ ASCONF; UNKNOWN ------------------>

... where the ASCONF chunk of length 280 contains 2 parameters ...

  1) Add IP address parameter (param length: 16)
  2) Add/del IP address parameter (param length: 255)

... followed by an UNKNOWN chunk of e.g. 4 bytes. Note that the
Address Parameter is even missing from this ASCONF chunk.
This is just an example; similarly crafted ASCONF chunks
could be used just as well.

The ASCONF chunk passes through sctp_verify_asconf() because all
parameters pass the sanity checks and the walk ends exactly at
the chunk end boundary, so sctp_process_asconf() may be invoked.
Parameter walking is done with WORD_ROUND() to take padding into
account.

In sctp_process_asconf()'s TLV processing, we may fail in
sctp_process_asconf_param(), e.g. due to removal of the IP
address that is also the source address of the packet containing
the ASCONF chunk. We then need to add all TLVs after the
failure to our ASCONF response to the remote end via the helper
sctp_add_asconf_response(), which basically invokes
sctp_addto_chunk() to add the error parameters to the given
skb.

When walking to the next parameter this time, we proceed
with ...

  length = ntohs(asconf_param->param_hdr.length);
  asconf_param = (void *)asconf_param + length;

... instead of the WORD_ROUND()'ed length. Here, this results
in an off-by-one that leads to reading a garbage follow-up
parameter length of 12336, and thus to an skb_over_panic
for the reply on the next sctp_addto_chunk() call, which
implicitly calls skb_put() with that length.
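
The arithmetic behind the off-by-one can be reproduced in user space.
The following is only a sketch: word_round() and walk_end() are
hypothetical helper names chosen for illustration, not kernel
functions, and the 4-byte rounding is assumed to match what
WORD_ROUND() does.

```c
#include <stddef.h>

/* Round a TLV length up to the next multiple of 4 (assumed to match
 * the behaviour of WORD_ROUND()).
 */
static size_t word_round(size_t len)
{
	return (len + 3) & ~(size_t)3;
}

/* Offset reached after walking n parameters. A non-zero pad selects
 * the padded walk; zero selects the raw-length walk.
 */
static size_t walk_end(const size_t *lens, size_t n, int pad)
{
	size_t off = 0;
	size_t i;

	for (i = 0; i < n; i++)
		off += pad ? word_round(lens[i]) : lens[i];
	return off;
}
```

With the parameter lengths from the example (16 and 255), the padded
walk ends at offset 272, while the raw-length walk ends at offset 271,
i.e. one byte before the position where the next parameter header
would start.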

Fix it by using the sctp_walk_params() macro [ which is also
used in INIT parameter processing ] in the verification *and*
in ASCONF processing: it makes sure that we don't spill over
and that we walk parameters WORD_ROUND()'ed. Moreover, we are
more defensive and guard against unknown parameter types and
mis-sized addresses.
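
As a rough user-space illustration of this kind of bounded walk (not
the sctp_walk_params() macro itself), a walker can check every length
against the remaining buffer, advance by the padded length, and accept
the buffer only if the walk ends exactly at its end. The names
tlv_walk_count() and get_be16() are hypothetical, and the 4-byte
type/length header layout is an assumption of this sketch.

```c
#include <stddef.h>
#include <stdint.h>

#define TLV_HDR_LEN 4	/* 2 bytes type + 2 bytes length, network order */

/* Read a big-endian 16-bit value without assuming alignment. */
static uint16_t get_be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/* Walk the TLVs in buf[0..size) and return how many were seen, or -1
 * if a length is bogus or the walk does not end exactly at the end of
 * the buffer.
 */
static int tlv_walk_count(const uint8_t *buf, size_t size)
{
	size_t off = 0;
	int count = 0;

	while (off + TLV_HDR_LEN <= size) {
		size_t len = get_be16(buf + off + 2);

		/* Reject lengths shorter than a header or past the end. */
		if (len < TLV_HDR_LEN || len > size - off)
			return -1;

		count++;
		/* Always advance by the padded length. */
		off += (len + 3) & ~(size_t)3;
	}
	return off == size ? count : -1;
}
```

Using the same walker for verification and for processing is the
point of the design: both passes then agree on where each parameter
starts.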

Joint work with Vlad Yasevich.

Fixes: b896b82be4ae ("[SCTP] ADDIP: Support for processing incoming ASCONF_ACK chunks.")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Vlad Yasevich <vyasevich@gmail.com>
---
 include/net/sctp/sm.h    |  6 +--
 net/sctp/sm_make_chunk.c | 99 +++++++++++++++++++++++++++---------------------
 net/sctp/sm_statefuns.c  | 18 +--------
 3 files changed, 60 insertions(+), 63 deletions(-)

diff --git a/include/net/sctp/sm.h b/include/net/sctp/sm.h
index 7f4eeb3..72a31db 100644
--- a/include/net/sctp/sm.h
+++ b/include/net/sctp/sm.h
@@ -248,9 +248,9 @@ struct sctp_chunk *sctp_make_asconf_update_ip(struct sctp_association *,
 					      int, __be16);
 struct sctp_chunk *sctp_make_asconf_set_prim(struct sctp_association *asoc,
 					     union sctp_addr *addr);
-int sctp_verify_asconf(const struct sctp_association *asoc,
-		       struct sctp_paramhdr *param_hdr, void *chunk_end,
-		       struct sctp_paramhdr **errp);
+bool sctp_verify_asconf(const struct sctp_association *asoc,
+			struct sctp_chunk *chunk, bool addr_param_needed,
+			struct sctp_paramhdr **errp);
 struct sctp_chunk *sctp_process_asconf(struct sctp_association *asoc,
 				       struct sctp_chunk *asconf);
 int sctp_process_asconf_ack(struct sctp_association *asoc,
diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
index ae0e616..ab734be 100644
--- a/net/sctp/sm_make_chunk.c
+++ b/net/sctp/sm_make_chunk.c
@@ -3110,50 +3110,63 @@ static __be16 sctp_process_asconf_param(struct sctp_association *asoc,
 	return SCTP_ERROR_NO_ERROR;
 }
 
-/* Verify the ASCONF packet before we process it.  */
-int sctp_verify_asconf(const struct sctp_association *asoc,
-		       struct sctp_paramhdr *param_hdr, void *chunk_end,
-		       struct sctp_paramhdr **errp) {
-	sctp_addip_param_t *asconf_param;
+/* Verify the ASCONF packet before we process it. */
+bool sctp_verify_asconf(const struct sctp_association *asoc,
+			struct sctp_chunk *chunk, bool addr_param_needed,
+			struct sctp_paramhdr **errp)
+{
+	sctp_addip_chunk_t *addip = (sctp_addip_chunk_t *) chunk->chunk_hdr;
 	union sctp_params param;
-	int length, plen;
-
-	param.v = (sctp_paramhdr_t *) param_hdr;
-	while (param.v <= chunk_end - sizeof(sctp_paramhdr_t)) {
-		length = ntohs(param.p->length);
-		*errp = param.p;
+	bool addr_param_seen = false;
 
-		if (param.v > chunk_end - length ||
-		    length < sizeof(sctp_paramhdr_t))
-			return 0;
+	sctp_walk_params(param, addip, addip_hdr.params) {
+		size_t length = ntohs(param.p->length);
 
+		*errp = param.p;
 		switch (param.p->type) {
+		case SCTP_PARAM_ERR_CAUSE:
+			break;
+		case SCTP_PARAM_IPV4_ADDRESS:
+			if (length != sizeof(sctp_ipv4addr_param_t))
+				return false;
+			addr_param_seen = true;
+			break;
+		case SCTP_PARAM_IPV6_ADDRESS:
+			if (length != sizeof(sctp_ipv6addr_param_t))
+				return false;
+			addr_param_seen = true;
+			break;
 		case SCTP_PARAM_ADD_IP:
 		case SCTP_PARAM_DEL_IP:
 		case SCTP_PARAM_SET_PRIMARY:
-			asconf_param = (sctp_addip_param_t *)param.v;
-			plen = ntohs(asconf_param->param_hdr.length);
-			if (plen < sizeof(sctp_addip_param_t) +
-			    sizeof(sctp_paramhdr_t))
-				return 0;
+			/* In ASCONF chunks, these need to be first. */
+			if (addr_param_needed && !addr_param_seen)
+				return false;
+			length = ntohs(param.addip->param_hdr.length);
+			if (length < sizeof(sctp_addip_param_t) +
+				     sizeof(sctp_paramhdr_t))
+				return false;
 			break;
 		case SCTP_PARAM_SUCCESS_REPORT:
 		case SCTP_PARAM_ADAPTATION_LAYER_IND:
 			if (length != sizeof(sctp_addip_param_t))
-				return 0;
-
+				return false;
 			break;
 		default:
-			break;
+			/* This is unknown to us, reject! */
+			return false;
 		}
-
-		param.v += WORD_ROUND(length);
 	}
 
-	if (param.v != chunk_end)
-		return 0;
+	/* Remaining sanity checks. */
+	if (addr_param_needed && !addr_param_seen)
+		return false;
+	if (!addr_param_needed && addr_param_seen)
+		return false;
+	if (param.v != chunk->chunk_end)
+		return false;
 
-	return 1;
+	return true;
 }
 
 /* Process an incoming ASCONF chunk with the next expected serial no. and
@@ -3162,16 +3175,17 @@ int sctp_verify_asconf(const struct sctp_association *asoc,
 struct sctp_chunk *sctp_process_asconf(struct sctp_association *asoc,
 				       struct sctp_chunk *asconf)
 {
+	sctp_addip_chunk_t *addip = (sctp_addip_chunk_t *) asconf->chunk_hdr;
+	bool all_param_pass = true;
+	union sctp_params param;
 	sctp_addiphdr_t		*hdr;
 	union sctp_addr_param	*addr_param;
 	sctp_addip_param_t	*asconf_param;
 	struct sctp_chunk	*asconf_ack;
-
 	__be16	err_code;
 	int	length = 0;
 	int	chunk_len;
 	__u32	serial;
-	int	all_param_pass = 1;
 
 	chunk_len = ntohs(asconf->chunk_hdr->length) - sizeof(sctp_chunkhdr_t);
 	hdr = (sctp_addiphdr_t *)asconf->skb->data;
@@ -3199,9 +3213,14 @@ struct sctp_chunk *sctp_process_asconf(struct sctp_association *asoc,
 		goto done;
 
 	/* Process the TLVs contained within the ASCONF chunk. */
-	while (chunk_len > 0) {
+	sctp_walk_params(param, addip, addip_hdr.params) {
+		/* Skip preceding address parameters. */
+		if (param.p->type == SCTP_PARAM_IPV4_ADDRESS ||
+		    param.p->type == SCTP_PARAM_IPV6_ADDRESS)
+			continue;
+
 		err_code = sctp_process_asconf_param(asoc, asconf,
-						     asconf_param);
+						     param.addip);
 		/* ADDIP 4.1 A7)
 		 * If an error response is received for a TLV parameter,
 		 * all TLVs with no response before the failed TLV are
@@ -3209,28 +3228,20 @@ struct sctp_chunk *sctp_process_asconf(struct sctp_association *asoc,
 		 * the failed response are considered unsuccessful unless
 		 * a specific success indication is present for the parameter.
 		 */
-		if (SCTP_ERROR_NO_ERROR != err_code)
-			all_param_pass = 0;
-
+		if (err_code != SCTP_ERROR_NO_ERROR)
+			all_param_pass = false;
 		if (!all_param_pass)
-			sctp_add_asconf_response(asconf_ack,
-						 asconf_param->crr_id, err_code,
-						 asconf_param);
+			sctp_add_asconf_response(asconf_ack, param.addip->crr_id,
+						 err_code, param.addip);
 
 		/* ADDIP 4.3 D11) When an endpoint receiving an ASCONF to add
 		 * an IP address sends an 'Out of Resource' in its response, it
 		 * MUST also fail any subsequent add or delete requests bundled
 		 * in the ASCONF.
 		 */
-		if (SCTP_ERROR_RSRC_LOW == err_code)
+		if (err_code == SCTP_ERROR_RSRC_LOW)
 			goto done;
-
-		/* Move to the next ASCONF param. */
-		length = ntohs(asconf_param->param_hdr.length);
-		asconf_param = (void *)asconf_param + length;
-		chunk_len -= length;
 	}
-
 done:
 	asoc->peer.addip_serial++;
 
diff --git a/net/sctp/sm_statefuns.c b/net/sctp/sm_statefuns.c
index c8f6063..bdea3df 100644
--- a/net/sctp/sm_statefuns.c
+++ b/net/sctp/sm_statefuns.c
@@ -3591,9 +3591,7 @@ sctp_disposition_t sctp_sf_do_asconf(struct net *net,
 	struct sctp_chunk	*asconf_ack = NULL;
 	struct sctp_paramhdr	*err_param = NULL;
 	sctp_addiphdr_t		*hdr;
-	union sctp_addr_param	*addr_param;
 	__u32			serial;
-	int			length;
 
 	if (!sctp_vtag_verify(chunk, asoc)) {
 		sctp_add_cmd_sf(commands, SCTP_CMD_REPORT_BAD_TAG,
@@ -3618,17 +3616,8 @@ sctp_disposition_t sctp_sf_do_asconf(struct net *net,
 	hdr = (sctp_addiphdr_t *)chunk->skb->data;
 	serial = ntohl(hdr->serial);
 
-	addr_param = (union sctp_addr_param *)hdr->params;
-	length = ntohs(addr_param->p.length);
-	if (length < sizeof(sctp_paramhdr_t))
-		return sctp_sf_violation_paramlen(net, ep, asoc, type, arg,
-			   (void *)addr_param, commands);
-
 	/* Verify the ASCONF chunk before processing it. */
-	if (!sctp_verify_asconf(asoc,
-			    (sctp_paramhdr_t *)((void *)addr_param + length),
-			    (void *)chunk->chunk_end,
-			    &err_param))
+	if (!sctp_verify_asconf(asoc, chunk, true, &err_param))
 		return sctp_sf_violation_paramlen(net, ep, asoc, type, arg,
 						  (void *)err_param, commands);
 
@@ -3745,10 +3734,7 @@ sctp_disposition_t sctp_sf_do_asconf_ack(struct net *net,
 	rcvd_serial = ntohl(addip_hdr->serial);
 
 	/* Verify the ASCONF-ACK chunk before processing it. */
-	if (!sctp_verify_asconf(asoc,
-	    (sctp_paramhdr_t *)addip_hdr->params,
-	    (void *)asconf_ack->chunk_end,
-	    &err_param))
+	if (!sctp_verify_asconf(asoc, asconf_ack, false, &err_param))
 		return sctp_sf_violation_paramlen(net, ep, asoc, type, arg,
 			   (void *)err_param, commands);
 
-- 
1.7.11.7

* [PATCH net 2/3] net: sctp: fix panic on duplicate ASCONF chunks
From: Daniel Borkmann @ 2014-10-09 20:55 UTC (permalink / raw)
  To: davem; +Cc: linux-sctp, netdev, Vlad Yasevich
In-Reply-To: <1412888133-833-1-git-send-email-dborkman@redhat.com>

When receiving e.g. a semi-well-formed connection scan in the
form of ...

  -------------- INIT[ASCONF; ASCONF_ACK] ------------->
  <----------- INIT-ACK[ASCONF; ASCONF_ACK] ------------
  -------------------- COOKIE-ECHO -------------------->
  <-------------------- COOKIE-ACK ---------------------
  ---------------- ASCONF_a; ASCONF_b ----------------->

... where the ASCONF_a chunk equals the ASCONF_b chunk (at least
both serials need to be equal), we panic an SCTP server!

The problem is that the ASCONF_ACK chunks we reply with to
well-formed ASCONF chunks are cached per serial. Thus, when we
receive the same ASCONF chunk twice (e.g. because an ASCONF_ACK
was lost), we do not need to process it again on the server side
(that was the idea, also proposed in the RFC). Instead, we know
it was cached and just resend the cached chunk. So far, so good.

Where things get nasty is in SCTP's side effect interpreter, that
is, sctp_cmd_interpreter():

While the incoming ASCONF_a (chunk = event_arg) is marked
!end_of_packet and !singleton, and we have an association context,
we do not flush the outqueue the first time after processing the
ASCONF_ACK singleton chunk via SCTP_CMD_REPLY. Instead, we keep it
queued up, although we set local_cork to 1. Commit 2e3216cd54b1
changed the precedence, so that as long as we get bundled incoming
chunks, we try possible bundling on the outgoing queue as well.
Before this commit, we would just flush the output queue.

Now, while ASCONF_a's ASCONF_ACK sits in the corked outq, we
continue to process the identical ASCONF_b chunk from the packet.
As we have cached the previous ASCONF_ACK, we find it, grab it and
do another SCTP_CMD_REPLY command on it. So, effectively, we rip
the chunk->list pointers and requeue the same ASCONF_ACK chunk
another time. Since ASCONF_b is the last chunk, it is correctly
marked with end_of_packet, so we enforce an uncork and thus a
flush, which crashes the kernel.

Fix it by testing if the ASCONF_ACK is currently pending and if
that is the case, do not requeue it. When flushing the output
queue we may relink the chunk for preparing an outgoing packet,
but eventually unlink it when it's copied into the skb right
before transmission.
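
The idea of the pending test can be sketched in user space with a
minimal circular doubly linked list. This is an illustration under
assumptions, not the kernel's list API: struct node and all helper
names below are hypothetical, and a node that points at itself is
taken to mean "not queued".

```c
#include <stdbool.h>
#include <stddef.h>

struct node {
	struct node *next, *prev;
};

static void node_init(struct node *n)
{
	n->next = n;
	n->prev = n;
}

/* A node that points at itself is not linked into any queue. */
static bool node_pending(const struct node *n)
{
	return n->next != n;
}

static void queue_tail(struct node *head, struct node *n)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Queue n unless it is still pending; returns true if it was queued. */
static bool queue_once(struct node *head, struct node *n)
{
	if (node_pending(n))
		return false;
	queue_tail(head, n);
	return true;
}

/* Unlink n and reinitialise it so that node_pending() is false again. */
static void node_del_init(struct node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	node_init(n);
}

static size_t queue_len(const struct node *head)
{
	size_t len = 0;
	const struct node *p;

	for (p = head->next; p != head; p = p->next)
		len++;
	return len;
}
```

Calling queue_once() twice on the same node leaves the queue with a
single entry; only after node_del_init() can the node be queued again.
Without the pending test, the second queue_tail() would overwrite the
node's links while it is still on the queue.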

Joint work with Vlad Yasevich.

Fixes: 2e3216cd54b1 ("sctp: Follow security requirement of responding with 1 packet")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Vlad Yasevich <vyasevich@gmail.com>
---
 include/net/sctp/sctp.h | 5 +++++
 net/sctp/associola.c    | 2 ++
 2 files changed, 7 insertions(+)

diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h
index 9fbd856..856f01c 100644
--- a/include/net/sctp/sctp.h
+++ b/include/net/sctp/sctp.h
@@ -426,6 +426,11 @@ static inline void sctp_assoc_pending_pmtu(struct sock *sk, struct sctp_associat
 	asoc->pmtu_pending = 0;
 }

+static inline bool sctp_chunk_pending(const struct sctp_chunk *chunk)
+{
+	return !list_empty(&chunk->list);
+}
+
 /* Walk through a list of TLV parameters.  Don't trust the
  * individual parameter lengths and instead depend on
  * the chunk length to indicate when to stop.  Make sure
diff --git a/net/sctp/associola.c b/net/sctp/associola.c
index a88b852..f791edd 100644
--- a/net/sctp/associola.c
+++ b/net/sctp/associola.c
@@ -1668,6 +1668,8 @@ struct sctp_chunk *sctp_assoc_lookup_asconf_ack(
 	 * ack chunk whose serial number matches that of the request.
 	 */
 	list_for_each_entry(ack, &asoc->asconf_ack_list, transmitted_list) {
+		if (sctp_chunk_pending(ack))
+			continue;
 		if (ack->subh.addip_hdr->serial == serial) {
 			sctp_chunk_hold(ack);
 			return ack;
-- 
1.7.11.7

* [PATCH net 3/3] net: sctp: fix remote memory pressure from excessive queueing
From: Daniel Borkmann @ 2014-10-09 20:55 UTC (permalink / raw)
  To: davem; +Cc: linux-sctp, netdev, Vlad Yasevich
In-Reply-To: <1412888133-833-1-git-send-email-dborkman@redhat.com>

This scenario is not limited to ASCONF; ASCONF is just taken
as one example that triggers the issue. When receiving ASCONF
probes in the form of ...

  -------------- INIT[ASCONF; ASCONF_ACK] ------------->
  <----------- INIT-ACK[ASCONF; ASCONF_ACK] ------------
  -------------------- COOKIE-ECHO -------------------->
  <-------------------- COOKIE-ACK ---------------------
  ---- ASCONF_a; [ASCONF_b; ...; ASCONF_n;] JUNK ------>
  [...]
  ---- ASCONF_m; [ASCONF_o; ...; ASCONF_z;] JUNK ------>

... where ASCONF_a, ASCONF_b, ..., ASCONF_z are well-formed
ASCONFs and have increasing serial numbers, we process such
ASCONF chunk(s) marked with !end_of_packet and !singleton,
since we have not yet reached the SCTP packet end. SCTP only
does verification on a chunk-by-chunk basis, as an SCTP
packet is nothing more than a container for a stream of
chunks which it eats up one by one.

We could run into the case that we receive a packet with a
malformed tail, marked above as trailing JUNK. All previous
chunks are well-formed here, so the stack will eat up all
previous chunks up to this point. The association's output
queue will then be queued up excessively in any of these
cases:

  1) JUNK does not fit into a chunk header and there are no
     other chunks in the input queue.
  2) JUNK contains a garbage chunk header, but the encoded
     chunk length would exceed the skb tail.
  3) We came here from an entirely different scenario and the
     chunk has the pdiscard=1 mark (without having had a flush
     point).

(A correct final chunk may then turn it into a response flood
when flushing the queue ;)). I ran a simple script with
incremental ASCONF serial numbers and could see the server side
consuming an excessive amount of RAM [before/after: up to 2GB
and more].

The issue at heart is that the chunk train basically ends
with !end_of_packet and !singleton markers. Since commit
2e3216cd54b1 ("sctp: Follow security requirement of responding
with 1 packet"), this prevents an output queue flush point in
sctp_do_sm() -> sctp_cmd_interpreter() on the input chunk
(chunk = event_arg): local_cork is set, but its precedence
has changed with that commit. In the normal case, the last
chunk with end_of_packet=1 would trigger the queue flush to
accommodate possible outgoing bundling.

In the input queue, sctp_inq_pop() seems to do the right thing
in terms of discarding invalid chunks. So, the above JUNK will
not enter the state machine; instead it is released and we exit
the sctp_assoc_bh_rcv() chunk processing loop. What is missing
is simply the flush point at loop exit. Adding a try-flush
approach on the output queue might not work, as the underlying
infrastructure might be long gone at this point due to the
side-effect interpreter run.

One possibility, albeit a bit of a kludge, would be to defer
invalid chunk freeing into the state machine in order to
possibly trigger packet discards and thus indirectly a queue
flush on error. It would surely be better to discard chunks
as in the current, perhaps better controlled environment, but
going back and forth, it's simply architecturally not possible.
I tried various trailing JUNK attack cases and it seems to
look good now.
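
The resulting marking logic can be modelled in user space on plain
offsets. This is a sketch under assumptions: struct chunk_marks and
classify_chunk() are hypothetical names, offsets stand in for the skb
pointers, and the initial marker values (singleton set, the others
clear) are an assumption of this model rather than something stated
in this patch.

```c
#include <stdbool.h>
#include <stddef.h>

#define CHUNK_HDR_LEN 4	/* type, flags, 16-bit length */

struct chunk_marks {
	bool singleton;		/* no further chunk follows in the packet */
	bool end_of_packet;	/* flush point for the output queue */
	bool pdiscard;		/* to be discarded inside the state machine */
	size_t chunk_end;	/* possibly clamped to the packet tail */
};

/* Classify a chunk whose claimed end is at offset chunk_end within a
 * packet whose data ends at offset tail.
 */
static struct chunk_marks classify_chunk(size_t chunk_end, size_t tail)
{
	struct chunk_marks m = { true, false, false, chunk_end };

	if (chunk_end + CHUNK_HDR_LEN < tail) {
		/* More than a header's worth of data follows. */
		m.singleton = false;
	} else if (chunk_end > tail) {
		/* Claimed length overruns the packet: mark, do not free. */
		m.pdiscard = true;
		m.chunk_end = tail;
	} else {
		/* At most a header's worth remains: treat as packet end. */
		m.end_of_packet = true;
	}
	return m;
}
```

In this model, a chunk followed only by a few junk bytes is treated as
the end of the packet, and an overlong chunk is passed on with a
discard mark instead of being freed on the spot, so that the state
machine still gets a chance to reach a flush point.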

Joint work with Vlad Yasevich.

Fixes: 2e3216cd54b1 ("sctp: Follow security requirement of responding with 1 packet")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Vlad Yasevich <vyasevich@gmail.com>
---
 net/sctp/inqueue.c      | 33 +++++++--------------------------
 net/sctp/sm_statefuns.c |  3 +++
 2 files changed, 10 insertions(+), 26 deletions(-)

diff --git a/net/sctp/inqueue.c b/net/sctp/inqueue.c
index 4de12af..7e8a16c 100644
--- a/net/sctp/inqueue.c
+++ b/net/sctp/inqueue.c
@@ -140,18 +140,9 @@ struct sctp_chunk *sctp_inq_pop(struct sctp_inq *queue)
 		} else {
 			/* Nothing to do. Next chunk in the packet, please. */
 			ch = (sctp_chunkhdr_t *) chunk->chunk_end;
-
 			/* Force chunk->skb->data to chunk->chunk_end.  */
-			skb_pull(chunk->skb,
-				 chunk->chunk_end - chunk->skb->data);
-
-			/* Verify that we have at least chunk headers
-			 * worth of buffer left.
-			 */
-			if (skb_headlen(chunk->skb) < sizeof(sctp_chunkhdr_t)) {
-				sctp_chunk_free(chunk);
-				chunk = queue->in_progress = NULL;
-			}
+			skb_pull(chunk->skb, chunk->chunk_end - chunk->skb->data);
+			/* We are guaranteed to pull a SCTP header. */
 		}
 	}

@@ -187,24 +178,14 @@ struct sctp_chunk *sctp_inq_pop(struct sctp_inq *queue)
 	skb_pull(chunk->skb, sizeof(sctp_chunkhdr_t));
 	chunk->subh.v = NULL; /* Subheader is no longer valid.  */

-	if (chunk->chunk_end < skb_tail_pointer(chunk->skb)) {
+	if (chunk->chunk_end + sizeof(sctp_chunkhdr_t) <
+	    skb_tail_pointer(chunk->skb)) {
 		/* This is not a singleton */
 		chunk->singleton = 0;
 	} else if (chunk->chunk_end > skb_tail_pointer(chunk->skb)) {
-		/* RFC 2960, Section 6.10  Bundling
-		 *
-		 * Partial chunks MUST NOT be placed in an SCTP packet.
-		 * If the receiver detects a partial chunk, it MUST drop
-		 * the chunk.
-		 *
-		 * Since the end of the chunk is past the end of our buffer
-		 * (which contains the whole packet, we can freely discard
-		 * the whole packet.
-		 */
-		sctp_chunk_free(chunk);
-		chunk = queue->in_progress = NULL;
-
-		return NULL;
+		/* Discard inside state machine. */
+		chunk->pdiscard = 1;
+		chunk->chunk_end = skb_tail_pointer(chunk->skb);
 	} else {
 		/* We are at the end of the packet, so mark the chunk
 		 * in case we need to send a SACK.
diff --git a/net/sctp/sm_statefuns.c b/net/sctp/sm_statefuns.c
index bdea3df..3ee27b7 100644
--- a/net/sctp/sm_statefuns.c
+++ b/net/sctp/sm_statefuns.c
@@ -170,6 +170,9 @@ sctp_chunk_length_valid(struct sctp_chunk *chunk,
 {
 	__u16 chunk_length = ntohs(chunk->chunk_hdr->length);

+	/* Previously already marked? */
+	if (unlikely(chunk->pdiscard))
+		return 0;
 	if (unlikely(chunk_length < required_length))
 		return 0;

-- 
1.7.11.7

* [RFC] bridge: Add support for IEEE 802.11 Proxy ARP
From: Kyeyoon Park @ 2014-10-09 21:27 UTC (permalink / raw)
  To: davem; +Cc: kyeyoonp, jouni, netdev

This feature is defined in IEEE Std 802.11-2012, 10.23.13. It allows
the AP devices to keep track of the hardware-address-to-IP-address
mapping of the mobile devices within the WLAN network.

The AP will learn this mapping via observing DHCP, ARP, and NS/NA
frames. When a request for such information is made (i.e. ARP request,
Neighbor Solicitation), the AP will respond on behalf of the
associated mobile device. In the process of doing so, the AP will drop
the multicast request frame that was intended to go out to the wireless
medium.

It was recommended at the LKS workshop to do this implementation in
the bridge layer. vxlan.c is already doing something very similar.
The DHCP snooping code will be added to the userspace application
(hostapd) per the recommendation.

This RFC commit is only for IPv4. A similar approach in the bridge
layer will be taken for IPv6 as well.
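
To illustrate the IPv4 case, the following user-space sketch parses
an Ethernet/IPv4 ARP request and extracts the sender and target
fields that a proxy would need for its reply. It is not the bridge
code: struct arp_req, parse_arp_request() and the macro names are
hypothetical, and the fixed 6-byte hardware address length is an
assumption of this sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ARP_FIXED_LEN 8	/* hrd(2), pro(2), hln(1), pln(1), op(2) */
#define MAC_LEN 6
#define IP4_LEN 4

struct arp_req {
	uint8_t sha[MAC_LEN];	/* sender hardware address */
	uint8_t sip[IP4_LEN];	/* sender IPv4 address */
	uint8_t tip[IP4_LEN];	/* target IPv4 address */
};

/* Parse an Ethernet/IPv4 ARP request from buf. Returns false for
 * anything a proxy should not answer: short frames, other protocols,
 * unexpected address lengths or other opcodes.
 */
static bool parse_arp_request(const uint8_t *buf, size_t len,
			      struct arp_req *out)
{
	const uint8_t *p;

	if (len < ARP_FIXED_LEN + 2 * (MAC_LEN + IP4_LEN))
		return false;
	if (buf[2] != 0x08 || buf[3] != 0x00)	/* protocol: IPv4 */
		return false;
	if (buf[4] != MAC_LEN || buf[5] != IP4_LEN)
		return false;
	if (buf[6] != 0x00 || buf[7] != 0x01)	/* opcode: request */
		return false;

	p = buf + ARP_FIXED_LEN;
	memcpy(out->sha, p, MAC_LEN);
	p += MAC_LEN;
	memcpy(out->sip, p, IP4_LEN);
	p += IP4_LEN;
	p += MAC_LEN;		/* skip target hardware address */
	memcpy(out->tip, p, IP4_LEN);
	return true;
}
```

A proxy would then look up the target address in its mapping and, if
an entry is found, reply to the sender on behalf of the associated
station.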

Signed-off-by: Kyeyoon Park <kyeyoonp@codeaurora.org>
---
 include/uapi/linux/if_link.h |  1 +
 net/bridge/br_forward.c      |  5 ++++
 net/bridge/br_input.c        | 60 ++++++++++++++++++++++++++++++++++++++++++++
 net/bridge/br_netlink.c      |  5 +++-
 net/bridge/br_private.h      |  1 +
 net/bridge/br_sysfs_if.c     |  2 ++
 6 files changed, 73 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index ff95760..62a17e0 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -231,6 +231,7 @@ enum {
 	IFLA_BRPORT_FAST_LEAVE,	/* multicast fast leave    */
 	IFLA_BRPORT_LEARNING,	/* mac learning */
 	IFLA_BRPORT_UNICAST_FLOOD, /* flood unicast traffic */
+	IFLA_BRPORT_PROXYARP,	/* proxy ARP */
 	__IFLA_BRPORT_MAX
 };
 #define IFLA_BRPORT_MAX (__IFLA_BRPORT_MAX - 1)
diff --git a/net/bridge/br_forward.c b/net/bridge/br_forward.c
index 056b67b..61d9edf 100644
--- a/net/bridge/br_forward.c
+++ b/net/bridge/br_forward.c
@@ -181,6 +181,11 @@ static void br_flood(struct net_bridge *br, struct sk_buff *skb,
 		/* Do not flood unicast traffic to ports that turn it off */
 		if (unicast && !(p->flags & BR_FLOOD))
 			continue;
+
+		/* Do not flood to ports that enable proxy ARP */
+		if (p->flags & BR_PROXYARP)
+			continue;
+
 		prev = maybe_deliver(prev, p, skb, __packet_hook);
 		if (IS_ERR(prev))
 			goto out;
diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c
index 366c436..a548fbf 100644
--- a/net/bridge/br_input.c
+++ b/net/bridge/br_input.c
@@ -16,6 +16,8 @@
 #include <linux/netdevice.h>
 #include <linux/etherdevice.h>
 #include <linux/netfilter_bridge.h>
+#include <linux/neighbour.h>
+#include <net/arp.h>
 #include <linux/export.h>
 #include <linux/rculist.h>
 #include "br_private.h"
@@ -57,6 +59,60 @@ static int br_pass_frame_up(struct sk_buff *skb)
 		       netif_receive_skb);
 }
 
+static void br_do_proxy_arp(struct sk_buff *skb, struct net_bridge *br,
+			    u16 vid)
+{
+	struct net_device *dev = br->dev;
+	struct neighbour *n;
+	struct arphdr *parp;
+	u8 *arpptr, *sha;
+	__be32 sip, tip;
+
+	if (dev->flags & IFF_NOARP)
+		return;
+
+	if (!pskb_may_pull(skb, arp_hdr_len(dev))) {
+		dev->stats.tx_dropped++;
+		return;
+	}
+	parp = arp_hdr(skb);
+
+	if (parp->ar_pro != htons(ETH_P_IP) ||
+	    parp->ar_op != htons(ARPOP_REQUEST) ||
+	    parp->ar_hln != dev->addr_len ||
+	    parp->ar_pln != 4)
+		return;
+
+	arpptr = (u8 *)parp + sizeof(struct arphdr);
+	sha = arpptr;
+	arpptr += dev->addr_len;	/* sha */
+	memcpy(&sip, arpptr, sizeof(sip));
+	arpptr += sizeof(sip);
+	arpptr += dev->addr_len;	/* tha */
+	memcpy(&tip, arpptr, sizeof(tip));
+
+	if (ipv4_is_loopback(tip) ||
+	    ipv4_is_multicast(tip))
+		return;
+
+	n = neigh_lookup(&arp_tbl, &tip, dev);
+	if (n) {
+		struct net_bridge_fdb_entry *f;
+
+		if (!(n->nud_state & NUD_VALID)) {
+			neigh_release(n);
+			return;
+		}
+
+		f = __br_fdb_get(br, n->ha, vid);
+		if (f)
+			arp_send(ARPOP_REPLY, ETH_P_ARP, sip, skb->dev, tip,
+				 sha, n->ha, sha);
+
+		neigh_release(n);
+	}
+}
+
 /* note: already called with rcu_read_lock */
 int br_handle_frame_finish(struct sk_buff *skb)
 {
@@ -98,6 +154,10 @@ int br_handle_frame_finish(struct sk_buff *skb)
 	dst = NULL;
 
 	if (is_broadcast_ether_addr(dest)) {
+		if (p->flags & BR_PROXYARP &&
+		    skb->protocol == htons(ETH_P_ARP))
+			br_do_proxy_arp(skb, br, vid);
+
 		skb2 = skb;
 		unicast = false;
 	} else if (is_multicast_ether_addr(dest)) {
diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
index cb5fcf6..eb7bd93 100644
--- a/net/bridge/br_netlink.c
+++ b/net/bridge/br_netlink.c
@@ -60,7 +60,9 @@ static int br_port_fill_attrs(struct sk_buff *skb,
 	    nla_put_u8(skb, IFLA_BRPORT_PROTECT, !!(p->flags & BR_ROOT_BLOCK)) ||
 	    nla_put_u8(skb, IFLA_BRPORT_FAST_LEAVE, !!(p->flags & BR_MULTICAST_FAST_LEAVE)) ||
 	    nla_put_u8(skb, IFLA_BRPORT_LEARNING, !!(p->flags & BR_LEARNING)) ||
-	    nla_put_u8(skb, IFLA_BRPORT_UNICAST_FLOOD, !!(p->flags & BR_FLOOD)))
+	    nla_put_u8(skb, IFLA_BRPORT_UNICAST_FLOOD,
+		       !!(p->flags & BR_FLOOD)) ||
+	    nla_put_u8(skb, IFLA_BRPORT_PROXYARP, !!(p->flags & BR_PROXYARP)))
 		return -EMSGSIZE;
 
 	return 0;
@@ -335,6 +337,7 @@ static int br_setport(struct net_bridge_port *p, struct nlattr *tb[])
 	br_set_port_flag(p, tb, IFLA_BRPORT_PROTECT, BR_ROOT_BLOCK);
 	br_set_port_flag(p, tb, IFLA_BRPORT_LEARNING, BR_LEARNING);
 	br_set_port_flag(p, tb, IFLA_BRPORT_UNICAST_FLOOD, BR_FLOOD);
+	br_set_port_flag(p, tb, IFLA_BRPORT_PROXYARP, BR_PROXYARP);
 
 	if (tb[IFLA_BRPORT_COST]) {
 		err = br_stp_set_path_cost(p, nla_get_u32(tb[IFLA_BRPORT_COST]));
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index b6c04cb..666c6bc 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -172,6 +172,7 @@ struct net_bridge_port
 #define BR_FLOOD		0x00000040
 #define BR_AUTO_MASK (BR_FLOOD | BR_LEARNING)
 #define BR_PROMISC		0x00000080
+#define BR_PROXYARP		0x00000100
 
 #ifdef CONFIG_BRIDGE_IGMP_SNOOPING
 	struct bridge_mcast_own_query	ip4_own_query;
diff --git a/net/bridge/br_sysfs_if.c b/net/bridge/br_sysfs_if.c
index e561cd5..2de5d91 100644
--- a/net/bridge/br_sysfs_if.c
+++ b/net/bridge/br_sysfs_if.c
@@ -170,6 +170,7 @@ BRPORT_ATTR_FLAG(bpdu_guard, BR_BPDU_GUARD);
 BRPORT_ATTR_FLAG(root_block, BR_ROOT_BLOCK);
 BRPORT_ATTR_FLAG(learning, BR_LEARNING);
 BRPORT_ATTR_FLAG(unicast_flood, BR_FLOOD);
+BRPORT_ATTR_FLAG(proxyarp, BR_PROXYARP);
 
 #ifdef CONFIG_BRIDGE_IGMP_SNOOPING
 static ssize_t show_multicast_router(struct net_bridge_port *p, char *buf)
@@ -213,6 +214,7 @@ static const struct brport_attribute *brport_attrs[] = {
 	&brport_attr_multicast_router,
 	&brport_attr_multicast_fast_leave,
 #endif
+	&brport_attr_proxyarp,
 	NULL
 };
 
-- 
1.9.1

* Re: [PATCH] net sched: text ematch: zero out ts_state before using it
From: Cong Wang @ 2014-10-09 21:48 UTC (permalink / raw)
  To: Omar Sandoval
  Cc: Jamal Hadi Salim, David S. Miller, netdev,
	linux-kernel@vger.kernel.org
In-Reply-To: <1412870721-31061-1-git-send-email-osandov@osandov.com>

On Thu, Oct 9, 2014 at 9:05 AM, Omar Sandoval <osandov@osandov.com> wrote:
> textsearch_find zeroes out the offset, but the control buffer (which may or may
> not matter in this case) needs to be zeroed out as well.

Why? skb_prepare_seq_read() initializes the cb.

Also, the comment says:

 * @state: uninitialized textsearch state variable

* Re: [PATCH] net sched: text ematch: zero out ts_state before using it
From: Omar Sandoval @ 2014-10-09 21:54 UTC (permalink / raw)
  To: Cong Wang
  Cc: Jamal Hadi Salim, David S. Miller, netdev,
	linux-kernel@vger.kernel.org
In-Reply-To: <CAHA+R7Ot5u6F1zi4egLqOXPWbQrdLxdUQybaFEp5_aa463d4Bw@mail.gmail.com>

On Thu, Oct 09, 2014 at 02:48:54PM -0700, Cong Wang wrote:
> On Thu, Oct 9, 2014 at 9:05 AM, Omar Sandoval <osandov@osandov.com> wrote:
> > textsearch_find zeroes out the offset, but the control buffer (which may or may
> > not matter in this case) needs to be zeroed out as well.
> 
> Why? skb_prepare_seq_read() initializes the cb.
> 
> Also, the comment says:
> 
>  * @state: uninitialized textsearch state variable

Mm, thanks, I missed that. It looks like every other caller of skb_find_text is
doing an unnecessary memset in that case. Disregard this, I guess.
-- 
Omar

^ permalink raw reply

* Re: r8168 is needed to enter P-state: Package State 6 (pc6)onHaswell hardware: does the patch below against current kernel make a difference?
From: Francois Romieu @ 2014-10-09 22:14 UTC (permalink / raw)
  To: Ceriel Jacobs; +Cc: Hayes Wang, nic_swsd, netdev@vger.kernel.org
In-Reply-To: <54367965.7010301@crashplan.pro>

Ceriel Jacobs <linux-ide@crashplan.pro> :
> Francois Romieu schreef op 07-10-14 om 00:13:
> > Ceriel, does the patch below against current kernel make a difference?
[...]
> New r8169 "powertop" result (even without --auto-tune):
> C2 (pc2)    0.0%    |                     |
> C3 (pc3)    9.9%    | C3 (cc3)    0.7%    | C3-HSW      0.7%   16.4 ms
> C6 (pc6)   89.9%    | C6 (cc6)   99.2%    | C6-HSW     99.2%  223.2 ms
> ---

Fine (almost: I hope that ASPM was enabled from bios or during boot
behind your back).

Remember your "lspci -nnkvv -s 03:00.0" ?

03:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller [10ec:8168] (rev 11)
[...]
        Capabilities: [70] Express (v2) Endpoint, MSI 01
                DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
                        MaxPayload 128 bytes, MaxReadReq 4096 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
                LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s unlimited, L1 <64us
                   	ClockPM+ Surprise- LLActRep- BwNot-
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-

It should now look like:
[...]
                LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+

Let's temporarily disable it and see if powertop notices a difference.

<full disclosure>

"Capabilities: [70]" above gives you the offset of the relevant registers:
# lspci -xxx -s 03:00.0
[...]
70: .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. ..
^^ -> "[70]"

You are interested in the Link Control register, aka PCI_EXP_LNKCTL in
/usr/include/pci/header.h (devel part of pciutils) or kernel's
include/uapi/linux/pci_regs.h. It's 16 bytes further, thus
[...]
70: .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. ..
80: 42 .. .. .. .. .. .. .. .. .. .. .. .. .. .. ..
    ^^

0x42 matches "LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
ExtSynch-" built from above. There may be differences but the 2 lowest
order bits in 0x42 encode ASPM control (0=nada, 1=L0s, 2=L1, 3=both,
see PCI_EXP_LNKCTL_ASPxyz in include/uapi/linux/pci_regs.h). Mask these
out (0x42 & ~0x03) and feed the resulting value back into the Link
Control register:

# setpci -s 03:00.0 CAP_EXP+10.b=0x40

(CAP_EXP is pciutils's alias for the PCI Express capability block, see
PCI_CAP_ID_EXP in kernel's include/uapi/linux/pci_regs.h)

If you are not too sure about the 0x40 value, you can retrieve it with
lspci and an unpatched r8169 driver.

</full disclosure>
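
For reference, the masking step above can be sketched in C. This is only an
illustration under stated assumptions: the LNKCTL_* macros are local copies
of the values I believe PCI_EXP_LNKCTL_ASPMC and friends have in
include/uapi/linux/pci_regs.h, and the helper names are made up for this
sketch. It does not touch any hardware.

```c
#include <assert.h>
#include <stdint.h>

/* ASPM Control field of the PCIe Link Control register (bits 1:0).
 * Local copies for a standalone sketch; the kernel names are
 * PCI_EXP_LNKCTL_ASPMC, PCI_EXP_LNKCTL_ASPM_L0S and PCI_EXP_LNKCTL_ASPM_L1.
 */
#define LNKCTL_ASPMC    0x0003u
#define LNKCTL_ASPM_L0S 0x0001u
#define LNKCTL_ASPM_L1  0x0002u

/* Extract the ASPM control bits from a Link Control value. */
static uint16_t lnkctl_aspm_mode(uint16_t lnkctl)
{
	return (uint16_t)(lnkctl & LNKCTL_ASPMC);
}

/* Return lnkctl with its ASPM field replaced by mode; all other bits
 * (CommClk and so on) are preserved. mode must fit in the 2-bit field.
 */
static uint16_t lnkctl_set_aspm(uint16_t lnkctl, uint16_t mode)
{
	assert((mode & ~LNKCTL_ASPMC) == 0);
	return (uint16_t)((lnkctl & ~LNKCTL_ASPMC) | mode);
}
```

With the 0x42 read above, lnkctl_set_aspm(0x42, 0) gives 0x40, which is the
byte written by the setpci command.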

If I have understood Hayes correctly and he got my question right, lspci
should now tell that ASPM is disabled. C6 should not be reached anymore.

ASPM could thus be enabled unconditionally at the driver level, then
controlled through the PCI config registers. Kernel r8169 driver would
thus protect polar bears as Realtek's own r8168 driver already does.

I can't exclude that it will fail miserably in a firework of smelly
smoke though.

-- 
Ueimor

^ permalink raw reply

* [PATCH net] net: bpf: fix bpf syscall dependence on anon_inodes
From: Alexei Starovoitov @ 2014-10-09 22:16 UTC (permalink / raw)
  To: David S. Miller; +Cc: Michal Sojka, netdev, linux-kernel

minimal configurations where EPOLL, PERF_EVENTS, etc are disabled,
but NET is enabled, are failing to build with link error:
kernel/built-in.o: In function `bpf_prog_load':
syscall.c:(.text+0x3b728): undefined reference to `anon_inode_getfd'

fix it by selecting ANON_INODES when NET is enabled

Reported-by: Michal Sojka <sojkam1@fel.cvut.cz>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
---

I understand that 'select' is highly non-recommended for all good reasons,
but here 'depends on' is very user unfriendly, since ANON_INODES is
a hidden config that users cannot select directly.

 net/Kconfig |    1 +
 1 file changed, 1 insertion(+)

diff --git a/net/Kconfig b/net/Kconfig
index d6b138e..6272420 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -6,6 +6,7 @@ menuconfig NET
 	bool "Networking support"
 	select NLATTR
 	select GENERIC_NET_UTILS
+	select ANON_INODES
 	---help---
 	  Unless you really know what you are doing, you should say Y here.
 	  The reason is that some programs need kernel networking support even
-- 
1.7.9.5

^ permalink raw reply related

* Re: [PATCH net] ixgbe: check adapter->vfinfo before dereference
From: Jeff Kirsher @ 2014-10-09 22:35 UTC (permalink / raw)
  To: Thierry Herbelot; +Cc: netdev
In-Reply-To: <1412769913-22306-1-git-send-email-thierry.herbelot@6wind.com>


On Wed, 2014-10-08 at 14:05 +0200, Thierry Herbelot wrote:
> this protects against the following panic:
> (before a VF was actually created on p96p1 PF Ethernet port)
> ip link set p96p1 vf 0 spoofchk off
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000052
> IP: [<ffffffffa044a1c1>] ixgbe_ndo_set_vf_spoofchk+0x51/0x150 [ixgbe]
> 
> Signed-off-by: Thierry Herbelot <thierry.herbelot@6wind.com>
> ---
>  drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c |   73 +++++++++++++++++++++++-
>  1 file changed, 70 insertions(+), 3 deletions(-)

Dropping this patch because the driver generates compile warnings with
this patch applied.

n0324:[0]/usr/src/net-community-queue> make -j 18
SUBDIRS=drivers/net/ethernet/intel/ixgbe modules
  CC [M]  drivers/net/ethernet/intel/ixgbe/ixgbe_main.o
...
drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c: In function
‘ixgbe_ping_all_vfs’:
drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c:1102: warning: ‘return’
with a value, in function returning void
drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c: In function
‘ixgbe_ndo_set_vf_spoofchk’:
drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c:1326: warning: ‘return’
with no value, in function returning non-void
  LD [M]  drivers/net/ethernet/intel/ixgbe/ixgbe.o
  Building modules, stage 2.
  MODPOST 1 modules
  CC      drivers/net/ethernet/intel/ixgbe/ixgbe.mod.o
  LD [M]  drivers/net/ethernet/intel/ixgbe/ixgbe.ko

> 
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
> index c14d4d8..c6c9c0a 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
> @@ -314,7 +314,7 @@ static int ixgbe_set_vf_multicasts(struct ixgbe_adapter *adapter,
>  	int entries = (msgbuf[0] & IXGBE_VT_MSGINFO_MASK)
>  		       >> IXGBE_VT_MSGINFO_SHIFT;
>  	u16 *hash_list = (u16 *)&msgbuf[1];
> -	struct vf_data_storage *vfinfo = &adapter->vfinfo[vf];
> +	struct vf_data_storage *vfinfo;
>  	struct ixgbe_hw *hw = &adapter->hw;
>  	int i;
>  	u32 vector_bit;
> @@ -322,6 +322,11 @@ static int ixgbe_set_vf_multicasts(struct ixgbe_adapter *adapter,
>  	u32 mta_reg;
>  	u32 vmolr = IXGBE_READ_REG(hw, IXGBE_VMOLR(vf));
>  
> +	if (!adapter->vfinfo)
> +		return -1;
> +
> +	vfinfo = &adapter->vfinfo[vf];
> +
>  	/* only so many hash values supported */
>  	entries = min(entries, IXGBE_MAX_VF_MC_ENTRIES);
>  
> @@ -363,6 +368,9 @@ void ixgbe_restore_vf_multicasts(struct ixgbe_adapter *adapter)
>  	u32 vector_reg;
>  	u32 mta_reg;
>  
> +	if (!adapter->vfinfo)
> +		return;
> +
>  	for (i = 0; i < adapter->num_vfs; i++) {
>  		u32 vmolr = IXGBE_READ_REG(hw, IXGBE_VMOLR(i));
>  		vfinfo = &adapter->vfinfo[i];
> @@ -416,6 +424,9 @@ static s32 ixgbe_set_vf_lpe(struct ixgbe_adapter *adapter, u32 *msgbuf, u32 vf)
>  		u32 reg_offset, vf_shift, vfre;
>  		s32 err = 0;
>  
> +		if (!adapter->vfinfo)
> +			return -1;
> +
>  #ifdef CONFIG_FCOE
>  		if (dev->features & NETIF_F_FCOE_MTU)
>  			pf_max_frame = max_t(int, pf_max_frame,
> @@ -505,6 +516,9 @@ static inline void ixgbe_vf_reset_event(struct ixgbe_adapter *adapter, u32 vf)
>  	struct vf_data_storage *vfinfo = &adapter->vfinfo[vf];
>  	u8 num_tcs = netdev_get_num_tc(adapter->netdev);
>  
> +	if (!adapter->vfinfo)
> +		return;
> +
>  	/* add PF assigned VLAN or VLAN 0 */
>  	ixgbe_set_vf_vlan(adapter, true, vfinfo->pf_vlan, vf);
>  
> @@ -541,6 +555,8 @@ static inline void ixgbe_vf_reset_event(struct ixgbe_adapter *adapter, u32 vf)
>  static int ixgbe_set_vf_mac(struct ixgbe_adapter *adapter,
>  			    int vf, unsigned char *mac_addr)
>  {
> +	if (!adapter->vfinfo)
> +		return -1;
>  	ixgbe_del_mac_filter(adapter, adapter->vfinfo[vf].vf_mac_addresses, vf);
>  	memcpy(adapter->vfinfo[vf].vf_mac_addresses, mac_addr, ETH_ALEN);
>  	ixgbe_add_mac_filter(adapter, adapter->vfinfo[vf].vf_mac_addresses, vf);
> @@ -610,6 +626,9 @@ int ixgbe_vf_configuration(struct pci_dev *pdev, unsigned int event_mask)
>  
>  	bool enable = ((event_mask & 0x10000000U) != 0);
>  
> +	if (!adapter->vfinfo)
> +		return -1;
> +
>  	if (enable)
>  		eth_zero_addr(adapter->vfinfo[vfn].vf_mac_addresses);
>  
> @@ -620,13 +639,18 @@ static int ixgbe_vf_reset_msg(struct ixgbe_adapter *adapter, u32 vf)
>  {
>  	struct ixgbe_ring_feature *vmdq = &adapter->ring_feature[RING_F_VMDQ];
>  	struct ixgbe_hw *hw = &adapter->hw;
> -	unsigned char *vf_mac = adapter->vfinfo[vf].vf_mac_addresses;
> +	unsigned char *vf_mac;
>  	u32 reg, reg_offset, vf_shift;
>  	u32 msgbuf[4] = {0, 0, 0, 0};
>  	u8 *addr = (u8 *)(&msgbuf[1]);
>  	u32 q_per_pool = __ALIGN_MASK(1, ~vmdq->mask);
>  	int i;
>  
> +	if (!adapter->vfinfo)
> +		return -1;
> +
> +	vf_mac = adapter->vfinfo[vf].vf_mac_addresses;
> +
>  	e_info(probe, "VF Reset msg received from vf %d\n", vf);
>  
>  	/* reset the filters for the device */
> @@ -721,6 +745,9 @@ static int ixgbe_set_vf_mac_addr(struct ixgbe_adapter *adapter,
>  {
>  	u8 *new_mac = ((u8 *)(&msgbuf[1]));
>  
> +	if (!adapter->vfinfo)
> +		return -1;
> +
>  	if (!is_valid_ether_addr(new_mac)) {
>  		e_warn(drv, "VF %d attempted to set invalid mac\n", vf);
>  		return -1;
> @@ -773,6 +800,9 @@ static int ixgbe_set_vf_vlan_msg(struct ixgbe_adapter *adapter,
>  	u32 bits;
>  	u8 tcs = netdev_get_num_tc(adapter->netdev);
>  
> +	if (!adapter->vfinfo)
> +		return -1;
> +
>  	if (adapter->vfinfo[vf].pf_vlan || tcs) {
>  		e_warn(drv,
>  		       "VF %d attempted to override administratively set VLAN configuration\n"
> @@ -839,6 +869,9 @@ static int ixgbe_set_vf_macvlan_msg(struct ixgbe_adapter *adapter,
>  		    IXGBE_VT_MSGINFO_SHIFT;
>  	int err;
>  
> +	if (!adapter->vfinfo)
> +		return -1;
> +
>  	if (adapter->vfinfo[vf].pf_set_mac && index > 0) {
>  		e_warn(drv,
>  		       "VF %d requested MACVLAN filter but is administratively denied\n",
> @@ -875,6 +908,9 @@ static int ixgbe_negotiate_vf_api(struct ixgbe_adapter *adapter,
>  {
>  	int api = msgbuf[1];
>  
> +	if (!adapter->vfinfo)
> +		return -1;
> +
>  	switch (api) {
>  	case ixgbe_mbox_api_10:
>  	case ixgbe_mbox_api_11:
> @@ -897,6 +933,9 @@ static int ixgbe_get_vf_queues(struct ixgbe_adapter *adapter,
>  	unsigned int default_tc = 0;
>  	u8 num_tcs = netdev_get_num_tc(dev);
>  
> +	if (!adapter->vfinfo)
> +		return -1;
> +
>  	/* verify the PF is supporting the correct APIs */
>  	switch (adapter->vfinfo[vf].vf_api) {
>  	case ixgbe_mbox_api_20:
> @@ -935,6 +974,9 @@ static int ixgbe_rcv_msg_from_vf(struct ixgbe_adapter *adapter, u32 vf)
>  	struct ixgbe_hw *hw = &adapter->hw;
>  	s32 retval;
>  
> +	if (!adapter->vfinfo)
> +		return -1;
> +
>  	retval = ixgbe_read_mbx(hw, msgbuf, mbx_size, vf);
>  
>  	if (retval) {
> @@ -1008,6 +1050,9 @@ static void ixgbe_rcv_ack_from_vf(struct ixgbe_adapter *adapter, u32 vf)
>  	struct ixgbe_hw *hw = &adapter->hw;
>  	u32 msg = IXGBE_VT_MSGTYPE_NACK;
>  
> +	if (!adapter->vfinfo)
> +		return;
> +
>  	/* if device isn't clear to send it shouldn't be reading either */
>  	if (!adapter->vfinfo[vf].clear_to_send)
>  		ixgbe_write_mbx(hw, &msg, 1, vf);
> @@ -1051,6 +1096,9 @@ void ixgbe_ping_all_vfs(struct ixgbe_adapter *adapter)
>  	u32 ping;
>  	int i;
>  
> +	if (!adapter->vfinfo)
> +		return -1;
> +

This should be simply "return;" since this is a void function.

>  	for (i = 0 ; i < adapter->num_vfs; i++) {
>  		ping = IXGBE_PF_CONTROL_MSG;
>  		if (adapter->vfinfo[i].clear_to_send)
> @@ -1064,6 +1112,9 @@ int ixgbe_ndo_set_vf_mac(struct net_device *netdev, int vf, u8 *mac)
>  	struct ixgbe_adapter *adapter = netdev_priv(netdev);
>  	if (!is_valid_ether_addr(mac) || (vf >= adapter->num_vfs))
>  		return -EINVAL;
> +	if (!adapter->vfinfo)
> +		return -1;
> +
>  	adapter->vfinfo[vf].pf_set_mac = true;
>  	dev_info(&adapter->pdev->dev, "setting MAC %pM on VF %d\n", mac, vf);
>  	dev_info(&adapter->pdev->dev, "Reload the VF driver to make this"
> @@ -1083,6 +1134,9 @@ int ixgbe_ndo_set_vf_vlan(struct net_device *netdev, int vf, u16 vlan, u8 qos)
>  	struct ixgbe_adapter *adapter = netdev_priv(netdev);
>  	struct ixgbe_hw *hw = &adapter->hw;
>  
> +	if (!adapter->vfinfo)
> +		return -1;
> +
>  	if ((vf >= adapter->num_vfs) || (vlan > 4095) || (qos > 7))
>  		return -EINVAL;
>  	if (vlan || qos) {
> @@ -1147,8 +1201,12 @@ static void ixgbe_set_vf_rate_limit(struct ixgbe_adapter *adapter, int vf)
>  	struct ixgbe_hw *hw = &adapter->hw;
>  	u32 bcnrc_val = 0;
>  	u16 queue, queues_per_pool;
> -	u16 tx_rate = adapter->vfinfo[vf].tx_rate;
> +	u16 tx_rate;
>  
> +	if (!adapter->vfinfo)
> +		return;
> +
> +	tx_rate = adapter->vfinfo[vf].tx_rate;
>  	if (tx_rate) {
>  		/* start with base link speed value */
>  		bcnrc_val = adapter->vf_rate_link_speed;
> @@ -1197,6 +1255,9 @@ void ixgbe_check_vf_rate_limit(struct ixgbe_adapter *adapter)
>  {
>  	int i;
>  
> +	if (!adapter->vfinfo)
> +		return;
> +
>  	/* VF Tx rate limit was not set */
>  	if (!adapter->vf_rate_link_speed)
>  		return;
> @@ -1259,6 +1320,9 @@ int ixgbe_ndo_set_vf_spoofchk(struct net_device *netdev, int vf, bool setting)
>  	struct ixgbe_hw *hw = &adapter->hw;
>  	u32 regval;
>  
> +	if (!adapter->vfinfo)
> +		return;
> +

The return needs to return an integer.
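
To illustrate both comments, here is a minimal standalone sketch of the two
guard styles. The struct definitions and function names are hypothetical
stand-ins, not the ixgbe ones, and the choice of -EINVAL for the int case is
an assumption on my side (the patch itself uses -EINVAL in
ixgbe_ndo_get_vf_config and -1 elsewhere).

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver structures. */
struct vf_info {
	int clear_to_send;
	int spoofchk_enabled;
};

struct adapter {
	struct vf_info *vfinfo;	/* NULL until VFs have been created */
	int num_vfs;
};

/* void function: the NULL guard must use a bare "return;". */
static void ping_all_vfs(struct adapter *a, int *pinged)
{
	int i;

	*pinged = 0;
	if (!a->vfinfo)
		return;

	for (i = 0; i < a->num_vfs; i++)
		if (a->vfinfo[i].clear_to_send)
			(*pinged)++;
}

/* int function: the NULL guard must return a value (assumed -EINVAL). */
static int set_vf_spoofchk(struct adapter *a, int vf, int setting)
{
	if (!a->vfinfo || vf < 0 || vf >= a->num_vfs)
		return -EINVAL;

	a->vfinfo[vf].spoofchk_enabled = setting;
	return 0;
}
```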

>  	adapter->vfinfo[vf].spoofchk_enabled = setting;
>  
>  	regval = IXGBE_READ_REG(hw, IXGBE_PFVFSPOOF(vf_target_reg));
> @@ -1283,6 +1347,9 @@ int ixgbe_ndo_get_vf_config(struct net_device *netdev,
>  	struct ixgbe_adapter *adapter = netdev_priv(netdev);
>  	if (vf >= adapter->num_vfs)
>  		return -EINVAL;
> +	if (!adapter->vfinfo)
> +		return -EINVAL;
> +
>  	ivi->vf = vf;
>  	memcpy(&ivi->mac, adapter->vfinfo[vf].vf_mac_addresses, ETH_ALEN);
>  	ivi->max_tx_rate = adapter->vfinfo[vf].tx_rate;




^ permalink raw reply

* Re: [PATCH v3] Add support for GPIOs for SMSC LAN95xx chips.
From: David Miller @ 2014-10-09 22:45 UTC (permalink / raw)
  To: boger-hVk9LwgH4SrGCOCKMErq+g
  Cc: dforsi-Re5JQEeQqe8AvxtiuMwx3w,
	steve.glendinning-nksJyM/082jR7s880joybQ,
	netdev-u79uwXL29TY76Z2rM5mHXA, linux-usb-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1412806498-22556-1-git-send-email-boger-hVk9LwgH4SrGCOCKMErq+g@public.gmane.org>

From: Evgeny Boger <boger-hVk9LwgH4SrGCOCKMErq+g@public.gmane.org>
Date: Thu,  9 Oct 2014 02:14:58 +0400

> There might be 11 GPIOs in total.
> Last three GPIOs  (offsets 8-10, 0-based) are shared with FDX, LNKA, SPD
> LEDs respectively. The LEDs are driven by chip by default at startup time.
> Once the corresponding GPIO is requested, the chip LED drive logic is disabled.
> 
> The numbering scheme according to datasheets differs a bit between LAN9500
> and LAN951x.
> 
> For LAN951x:
> GPIOs with offsets 0-7 are named "GPIO3" - "GPIO7",
> offsets 8-10 are for "GPIO0" - "GPIO2" (these three are multiplexed with nFDX_LED,
> nLNKA_LED, nSPD_LED).
> 
> For LAN9500:
> The datasheet name is the same as the corresponding offset, i.e. offsets 0-10 are
> for "GPIO0"-"GPIO10".
> 
> Signed-off-by: Evgeny Boger <boger-hVk9LwgH4SrGCOCKMErq+g@public.gmane.org>

Please either "select GPIOLIB" from USB_NET_SMSC95XX, or add a new
config option "USB_NET_SMSC95XX_GPIO" which does it.  I would prefer
the former, and then you can get rid of all of the ifdefs in your
patch.

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH net-next 0/3] cxgb4/cxgb4vf: Misc fixes and 40G support for cxgb4vf
From: David Miller @ 2014-10-09 22:55 UTC (permalink / raw)
  To: hariprasad; +Cc: netdev, leedom, kumaras, nirranjan, santosh
In-Reply-To: <1412813927-24951-1-git-send-email-hariprasad@chelsio.com>

From: Hariprasad Shenai <hariprasad@chelsio.com>
Date: Thu,  9 Oct 2014 05:48:44 +0530

> This patch series adds 40G support for cxgb4vf driver. Update the LSO length for
> cxgb4vf, fix macro. Wait for device to get ready before reading PL_WHOAMI
> register.
> 
> The patches series is created against 'net-next' tree.
> And includes patches on cxgb4 and cxgb4vf driver.
> 
> We have included all the maintainers of respective drivers. Kindly review the
> change and let us know in case of any review comments.

Series applied, thanks.

^ permalink raw reply

* Re: [PATCH v3 0/6] Add 10GbE support to APM X-Gene SoC ethernet driver
From: David Miller @ 2014-10-09 22:56 UTC (permalink / raw)
  To: isubramanian; +Cc: netdev, devicetree, linux-arm-kernel, patches, kchudgar
In-Reply-To: <1412826861-32208-1-git-send-email-isubramanian@apm.com>

From: Iyappan Subramanian <isubramanian@apm.com>
Date: Wed,  8 Oct 2014 20:54:15 -0700

> Adding 10GbE support to APM X-Gene SoC ethernet driver.
> 
> v3: Address comments from v2
> * dtb: changed to use all-zeros for the mac address
> 
> v2: Address comments from v1
> * created preparatory patch to review before adding new functionality
> * dtb: updated to use tabs consistently
> 
> v1:
> * Initial version

This does not apply to my current tree, in particular the DT files
get rejects.

^ permalink raw reply

* Re: [PATCH] net: Missing @ before descriptions cause make xmldocs warning
From: David Miller @ 2014-10-09 22:57 UTC (permalink / raw)
  To: standby24x7; +Cc: linux-kernel, netdev, edumazet
In-Reply-To: <1412827088-30718-1-git-send-email-standby24x7@gmail.com>

From: Masanari Iida <standby24x7@gmail.com>
Date: Thu,  9 Oct 2014 12:58:08 +0900

> This patch fixes the following warnings.
> Warning(.//net/core/skbuff.c:4142): No description found for parameter 'header_len'
> Warning(.//net/core/skbuff.c:4142): No description found for parameter 'data_len'
> Warning(.//net/core/skbuff.c:4142): No description found for parameter 'max_page_order'
> Warning(.//net/core/skbuff.c:4142): No description found for parameter 'errcode'
> Warning(.//net/core/skbuff.c:4142): No description found for parameter 'gfp_mask'
> 
> Actually the descriptions exist, but are missing the "@" in front.
> 
> This problem started to happen when the following commit was merged
> into Linus's tree during the 3.18-rc1 merge period.
> commit 2e4e44107176d552f8bb1bb76053e850e3809841
> net: add alloc_skb_with_frags() helper
> 
> Signed-off-by: Masanari Iida <standby24x7@gmail.com>

Applied, thank you.

^ permalink raw reply

* Re: [PATCH net-next] r8152: use mutex for hw settings
From: David Miller @ 2014-10-09 23:05 UTC (permalink / raw)
  To: hayeswang-Rasf1IRRPZFBDgjK7y7TUQ
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, nic_swsd-Rasf1IRRPZFBDgjK7y7TUQ,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-usb-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <0835B3720019904CB8F7AA43166CEEB2EC397B-DsoXZbr0xhNADej8IQs+YlSdAeM+KJnwp3nQiEPZk/A@public.gmane.org>

From: Hayes Wang <hayeswang-Rasf1IRRPZFBDgjK7y7TUQ@public.gmane.org>
Date: Thu, 9 Oct 2014 07:59:35 +0000

> If I use the rtnl_lock(), I get a dead lock when enabling autosuspend.
> 
> Case 1:
>    autosuspend before calling open.
>    rtnl_lock()
>    call open
>    try to autoresume and rtl8152_resume is called.
>    dead lock occurs.
> 
> Case 2:
>    autosuspend occurs.
>    rtnl_lock()
>    call close
>    try to autoresume and rtl8152_resume is called.
>    dead lock occurs.

That's really unfortunate that we can variably get into the resume
handlers from contexts holding the RTNL mutex.


^ permalink raw reply

* Re: [PATCH net-next v2 0/3] r8152: use mutex for hw settings
From: David Miller @ 2014-10-09 23:06 UTC (permalink / raw)
  To: hayeswang; +Cc: netdev, nic_swsd, linux-kernel, linux-usb
In-Reply-To: <1394712342-15778-60-Taiwan-albertk@realtek.com>

From: Hayes Wang <hayeswang@realtek.com>
Date: Thu, 9 Oct 2014 18:00:23 +0800

> v2:
> Make sure autoresume cannot occur while the mutex is held, otherwise
> a deadlock would happen. To that end, adjust some code around
> autosuspend/autoresume.
> 
> v1:
> Use a mutex so that a series of hw settings cannot be interrupted
> by other settings. Although there is no problem now, it makes the
> driver safer.

Series applied, thanks.

^ permalink raw reply

* Re: [PATCH] net_sched: restore qdisc quota fairness limits after bulk dequeue
From: David Miller @ 2014-10-09 23:12 UTC (permalink / raw)
  To: brouer; +Cc: netdev, eric.dumazet, hannes, dborkman, dave.taht
In-Reply-To: <20141009101640.26713.80729.stgit@dragon>

From: Jesper Dangaard Brouer <brouer@redhat.com>
Date: Thu, 09 Oct 2014 12:18:10 +0200

> Restore the quota fairness between qdisc's, that we broke with commit
> 5772e9a346 ("qdisc: bulk dequeue support for qdiscs with TCQ_F_ONETXQUEUE").
> 
> Before that commit, the quota in __qdisc_run() was in packets, as
> dequeue_skb() would only dequeue a single packet; that assumption
> broke with bulk dequeue.
> 
> We choose not to account for the number of packets inside the TSO/GSO
> packets (accessible via "skb_gso_segs"), as the previous fairness
> also had this "defect". Thus, a GSO/TSO packet counts as a single
> packet.
> 
> Furthermore, we choose to slack on accuracy, by allowing a bulk
> dequeue try_bulk_dequeue_skb() to exceed the "packets" limit, only
> limited by the BQL bytelimit.  This is done because BQL prefers to get
> its full budget for appropriate feedback from TX completion.
> 
> In future, we might consider reworking this further and, if it allows,
> switch to a time-based model, as suggested by Eric. Right now, we only
> restore old semantics.
> 
> Joint work with Eric, Hannes, Daniel and Jesper.  Hannes wrote the
> first patch in cooperation with Daniel and Jesper.  Eric rewrote the
> patch.
> 
> Fixes: 5772e9a346 ("qdisc: bulk dequeue support for qdiscs with TCQ_F_ONETXQUEUE")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
> Signed-off-by: Daniel Borkmann <dborkman@redhat.com>

Looks fantastic, thanks everyone.

^ permalink raw reply

* Re: [PATCH net] niu: remove unnecessary atomic operation
From: Eric Dumazet @ 2014-10-09 23:40 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <1412875428.18196.3.camel@edumazet-glaptop2.roam.corp.google.com>

On Thu, 2014-10-09 at 10:23 -0700, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
> 
> We allocate single page, compound_head() is not needed.
> 
> We own the page, we can simply set page->_count to the
> needed value instead of doing a locked addition.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> ---
>  drivers/net/ethernet/sun/niu.c |    5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/ethernet/sun/niu.c b/drivers/net/ethernet/sun/niu.c
> index 904fd1ab5f6e..ec2cd8a6d5bc 100644
> --- a/drivers/net/ethernet/sun/niu.c
> +++ b/drivers/net/ethernet/sun/niu.c
> @@ -3340,9 +3340,8 @@ static int niu_rbr_add_page(struct niu *np, struct rx_ring_info *rp,
>  	}
>  
>  	niu_hash_page(rp, page, addr);
> -	if (rp->rbr_blocks_per_page > 1)
> -		atomic_add(rp->rbr_blocks_per_page - 1,
> -			   &compound_head(page)->_count);
> +
> +	atomic_set(&page->_count, rp->rbr_blocks_per_page);
>  
>  	for (i = 0; i < rp->rbr_blocks_per_page; i++) {
>  		__le32 *rbr = &rp->rbr[start_index + i];
> 

Please disregard this patch.

We need to fix all usages of atomic_set() on page->_count as they are
racy.

I'll send fixes.
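
To make the race concrete, here is a userspace sketch using C11 atomics
instead of the kernel's atomic_t. It models one possible interleaving, which
is an assumption on my part: some other context takes a transient reference
on the page after allocation and before the driver updates the count. The
function names are invented for the sketch.

```c
#include <stdatomic.h>

/* Driver overwrites the count: the transient reference is lost. */
static int refs_after_set(int blocks_per_page)
{
	atomic_int count = 1;		/* reference from the allocation */

	atomic_fetch_add(&count, 1);	/* assumed concurrent transient ref */
	atomic_store(&count, blocks_per_page);	/* overwrite */
	return atomic_load(&count);
}

/* Driver adds the missing references: the transient one is preserved. */
static int refs_after_add(int blocks_per_page)
{
	atomic_int count = 1;		/* reference from the allocation */

	atomic_fetch_add(&count, 1);	/* assumed concurrent transient ref */
	atomic_fetch_add(&count, blocks_per_page - 1);
	return atomic_load(&count);
}
```

With 4 blocks per page the overwrite leaves 4 references where 5 are held,
so the count would reach zero one put too early; the addition keeps all 5.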

^ permalink raw reply

* eth_get_headlen() and unaligned accesses...
From: David Miller @ 2014-10-10  0:12 UTC (permalink / raw)
  To: netdev; +Cc: alexander.duyck

So, we have a bit of a problem, this is on sparc64:

[423475.740836] Kernel unaligned access at TPC[81d330] __skb_flow_get_ports+0x70/0xe0
[423475.755756] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.17.0+ #2
[423475.767854] Call Trace:
[423475.772877]  [0000000000433288] kernel_unaligned_trap+0x368/0x5c0
[423475.785203]  [000000000042a824] sun4v_do_mna+0x84/0xa0
[423475.795624]  [0000000000406cd0] sun4v_mna+0x5c/0x68
[423475.805521]  [000000000081d330] __skb_flow_get_ports+0x70/0xe0
[423475.817323]  [000000000081d6ac] __skb_flow_dissect+0x1ac/0x460
[423475.829128]  [0000000000843c98] eth_get_headlen+0x38/0xa0
[423475.840083]  [0000000010064d54] igb_poll+0x8d4/0xf60 [igb]
[423475.851184]  [00000000008243c8] net_rx_action+0xa8/0x1c0

The chip DMA's to the beginning of a frag page and (unless timestamps
are enabled) that's where the ethernet header begins.

So any larger than 16-bit access to the IP and later headers will be
unaligned.

We have various ways we can deal with this based upon the capabilities
of the chips involved.  Can we configure the IGB to put 2 "don't care"
bytes at the beginning of the packet?
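
The arithmetic behind this, as a small standalone sketch: a 14-byte ethernet
header at offset 0 leaves the IP header only 2-byte aligned, while 2 pad
bytes in front move it to offset 16. The helper names are invented for the
illustration, and the memcpy() based load is just the generic portable way
to read a misaligned 32-bit value, not a proposal for the flow dissector.

```c
#include <stdint.h>
#include <string.h>

#define ETH_HEADER_LEN 14u	/* dst MAC + src MAC + EtherType */

/* Offset of the IP header when "pad" bytes precede the ethernet header. */
static unsigned int ip_header_offset(unsigned int pad)
{
	return pad + ETH_HEADER_LEN;
}

/* Non-zero when offset is a multiple of align. */
static int is_aligned(unsigned int offset, unsigned int align)
{
	return offset % align == 0;
}

/* Read a 32-bit value from any address without an unaligned load. */
static uint32_t load_u32_unaligned(const unsigned char *p)
{
	uint32_t v;

	memcpy(&v, p, sizeof(v));
	return v;
}
```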

^ permalink raw reply

* [PATCH net 0/3] net: bcmgenet & systemport fixes
From: Florian Fainelli @ 2014-10-10  1:06 UTC (permalink / raw)
  To: netdev; +Cc: davem, pgynther, jaedon.shin, Florian Fainelli

Hi David,

This patch series fixes an off-by-one error introduced during a previous
change, and the two other fixes fix a wake depth imbalance situation for
the Wake-on-LAN interrupt line.

Thanks!

Florian Fainelli (3):
  net: bcmgenet: fix off-by-one in incrementing read pointer
  net: bcmgenet: avoid unbalanced enable_irq_wake calls
  net: systemport: avoid unbalanced enable_irq_wake calls

 drivers/net/ethernet/broadcom/bcmsysport.c         | 3 ++-
 drivers/net/ethernet/broadcom/genet/bcmgenet.c     | 6 +++---
 drivers/net/ethernet/broadcom/genet/bcmgenet_wol.c | 4 +++-
 3 files changed, 8 insertions(+), 5 deletions(-)

-- 
1.9.1

^ permalink raw reply

* [PATCH net 1/3] net: bcmgenet: fix off-by-one in incrementing read pointer
From: Florian Fainelli @ 2014-10-10  1:06 UTC (permalink / raw)
  To: netdev; +Cc: davem, pgynther, jaedon.shin, Florian Fainelli
In-Reply-To: <1412903197-19193-1-git-send-email-f.fainelli@gmail.com>

Commit b629be5c8399d7c423b92135eb43a86c924d1cbc ("net: bcmgenet: check
harder for out of memory conditions") moved the increment of the local
read pointer *before* reading from the hardware descriptor using
dmadesc_get_length_status(), which creates an off-by-one situation.

Fix this by moving again the read_ptr increment after we have read the
hardware descriptor to get both the control block and the read pointer
back in sync.
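
As a simplified illustration of the ordering issue (not the driver code: the
ring layout, NUM_RX_BDS value and function names below are hypothetical),
consider a power-of-two ring where the read pointer wraps with a mask:

```c
#include <assert.h>

#define NUM_RX_BDS 4u	/* hypothetical ring size, must be a power of two */

struct rx_ring {
	unsigned int read_ptr;
	unsigned int desc[NUM_RX_BDS];	/* stands in for hw descriptors */
};

/* Correct order: read the descriptor at read_ptr, then advance. */
static unsigned int rx_consume(struct rx_ring *r)
{
	unsigned int val;

	assert((NUM_RX_BDS & (NUM_RX_BDS - 1)) == 0);
	val = r->desc[r->read_ptr];
	r->read_ptr++;
	r->read_ptr &= (NUM_RX_BDS - 1);
	return val;
}

/* Off-by-one order: advancing first reads the *next* descriptor. */
static unsigned int rx_consume_early_increment(struct rx_ring *r)
{
	r->read_ptr++;
	r->read_ptr &= (NUM_RX_BDS - 1);
	return r->desc[r->read_ptr];
}
```

Starting from read_ptr 0, the first variant returns desc[0] while the second
returns desc[1], which is the kind of mismatch described above.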

Fixes: b629be5c8399 ("net: bcmgenet: check harder for out of memory conditions")
Reported-by: Jaedon Shin <jaedon.shin@gmail.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 drivers/net/ethernet/broadcom/genet/bcmgenet.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
index fff2634b6f34..f1bcebcbba80 100644
--- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
+++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
@@ -1287,9 +1287,6 @@ static unsigned int bcmgenet_desc_rx(struct bcmgenet_priv *priv,
 
 		rxpktprocessed++;
 
-		priv->rx_read_ptr++;
-		priv->rx_read_ptr &= (priv->num_rx_bds - 1);
-
 		/* We do not have a backing SKB, so we do not have a
 		 * corresponding DMA mapping for this incoming packet since
 		 * bcmgenet_rx_refill always either has both skb and mapping or
@@ -1332,6 +1329,9 @@ static unsigned int bcmgenet_desc_rx(struct bcmgenet_priv *priv,
 			  __func__, p_index, priv->rx_c_index,
 			  priv->rx_read_ptr, dma_length_status);
 
+		priv->rx_read_ptr++;
+		priv->rx_read_ptr &= (priv->num_rx_bds - 1);
+
 		if (unlikely(!(dma_flag & DMA_EOP) || !(dma_flag & DMA_SOP))) {
 			netif_err(priv, rx_status, dev,
 				  "dropping fragmented packet!\n");
-- 
1.9.1

^ permalink raw reply related

* [PATCH net 2/3] net: bcmgenet: avoid unbalanced enable_irq_wake calls
From: Florian Fainelli @ 2014-10-10  1:06 UTC (permalink / raw)
  To: netdev; +Cc: davem, pgynther, jaedon.shin, Florian Fainelli
In-Reply-To: <1412903197-19193-1-git-send-email-f.fainelli@gmail.com>

Multiple enable_irq_wake() calls will keep increasing the IRQ
wake_depth, which ultimately leads to the following types of
situation:

1) enable Wake-on-LAN interrupt w/o password
2) enable Wake-on-LAN interrupt w/ password
3) enable Wake-on-LAN interrupt w/o password
4) disable Wake-on-LAN interrupt

After step 4), GENET would always wake-up the system no matter what
wake-up device we use, which is not what we want. Fix this by making
sure there are no unbalanced enable_irq_wake() calls.
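
The sequence above can be modelled in a few lines of standalone C. This is a
sketch only: wake_depth here is a plain counter standing in for the IRQ
core's accounting, the function names are made up, and the guarded disable
path is an assumption about the existing code in the else branch.

```c
#include <stdbool.h>

struct wol_state {
	int wake_depth;		/* models the IRQ wake depth counter */
	bool wol_irq_disabled;
};

/* Old behaviour: every enable request bumps the depth. */
static void set_wol_unguarded(struct wol_state *s, bool enable)
{
	if (enable) {
		s->wake_depth++;
		s->wol_irq_disabled = false;
	} else {
		if (!s->wol_irq_disabled)	/* assumed existing guard */
			s->wake_depth--;
		s->wol_irq_disabled = true;
	}
}

/* New behaviour: only bump the depth when wake was disabled before. */
static void set_wol_guarded(struct wol_state *s, bool enable)
{
	if (enable) {
		if (s->wol_irq_disabled)
			s->wake_depth++;
		s->wol_irq_disabled = false;
	} else {
		if (!s->wol_irq_disabled)	/* assumed existing guard */
			s->wake_depth--;
		s->wol_irq_disabled = true;
	}
}
```

Running steps 1) to 4) through the unguarded variant leaves the depth at 2,
so the IRQ stays wake-enabled; the guarded variant ends at 0.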

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 drivers/net/ethernet/broadcom/genet/bcmgenet_wol.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet_wol.c b/drivers/net/ethernet/broadcom/genet/bcmgenet_wol.c
index b82b7e4e06b2..149a0d70c108 100644
--- a/drivers/net/ethernet/broadcom/genet/bcmgenet_wol.c
+++ b/drivers/net/ethernet/broadcom/genet/bcmgenet_wol.c
@@ -86,7 +86,9 @@ int bcmgenet_set_wol(struct net_device *dev, struct ethtool_wolinfo *wol)
 	/* Flag the device and relevant IRQ as wakeup capable */
 	if (wol->wolopts) {
 		device_set_wakeup_enable(kdev, 1);
-		enable_irq_wake(priv->wol_irq);
+		/* Avoid unbalanced enable_irq_wake calls */
+		if (priv->wol_irq_disabled)
+			enable_irq_wake(priv->wol_irq);
 		priv->wol_irq_disabled = false;
 	} else {
 		device_set_wakeup_enable(kdev, 0);
-- 
1.9.1

^ permalink raw reply related

* [PATCH net 3/3] net: systemport: avoid unbalanced enable_irq_wake calls
From: Florian Fainelli @ 2014-10-10  1:06 UTC (permalink / raw)
  To: netdev; +Cc: davem, pgynther, jaedon.shin, Florian Fainelli
In-Reply-To: <1412903197-19193-1-git-send-email-f.fainelli@gmail.com>

Multiple enable_irq_wake() calls will keep increasing the IRQ
wake_depth, which ultimately leads to the following types of
situation:

1) enable Wake-on-LAN interrupt w/o password
2) enable Wake-on-LAN interrupt w/ password
3) enable Wake-on-LAN interrupt w/o password
4) disable Wake-on-LAN interrupt

After step 4), SYSTEMPORT would always wake-up the system no matter what
wake-up device we use, which is not what we want. Fix this by making
sure there are no unbalanced enable_irq_wake() calls.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 drivers/net/ethernet/broadcom/bcmsysport.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/broadcom/bcmsysport.c b/drivers/net/ethernet/broadcom/bcmsysport.c
index 075688188644..9ae36979bdee 100644
--- a/drivers/net/ethernet/broadcom/bcmsysport.c
+++ b/drivers/net/ethernet/broadcom/bcmsysport.c
@@ -436,7 +436,8 @@ static int bcm_sysport_set_wol(struct net_device *dev,
 	/* Flag the device and relevant IRQ as wakeup capable */
 	if (wol->wolopts) {
 		device_set_wakeup_enable(kdev, 1);
-		enable_irq_wake(priv->wol_irq);
+		if (priv->wol_irq_disabled)
+			enable_irq_wake(priv->wol_irq);
 		priv->wol_irq_disabled = 0;
 	} else {
 		device_set_wakeup_enable(kdev, 0);
-- 
1.9.1

^ permalink raw reply related
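
The wake_depth imbalance described in the two patches above can be modelled in userspace. The sketch below is not kernel code: `struct wol_model`, `set_wol_unbalanced()`, `set_wol_balanced()` and `final_wake_depth()` are hypothetical names, the counter only stands in for the IRQ core's per-IRQ wake_depth, and both the initial `wol_irq_disabled = true` and the guarded disable path are assumptions, since neither is visible in the quoted hunks. It replays the four steps from the commit message (three enables followed by one disable).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; not the driver's real data structure. */
struct wol_model {
	int wake_depth;        /* stands in for the IRQ core's wake_depth */
	bool wol_irq_disabled; /* mirrors priv->wol_irq_disabled */
};

/* Pre-patch behaviour: every enable request bumps the depth. */
static void set_wol_unbalanced(struct wol_model *m, bool wolopts)
{
	if (wolopts) {
		m->wake_depth++;                /* enable_irq_wake() */
		m->wol_irq_disabled = false;
	} else {
		/* Assumed: the disable path is already guarded by the flag. */
		if (!m->wol_irq_disabled)
			m->wake_depth--;        /* disable_irq_wake() */
		m->wol_irq_disabled = true;
	}
}

/* Post-patch behaviour: enable only when the wake IRQ is currently off. */
static void set_wol_balanced(struct wol_model *m, bool wolopts)
{
	if (wolopts) {
		if (m->wol_irq_disabled)
			m->wake_depth++;        /* enable_irq_wake() */
		m->wol_irq_disabled = false;
	} else {
		if (!m->wol_irq_disabled)
			m->wake_depth--;        /* disable_irq_wake() */
		m->wol_irq_disabled = true;
	}
}

/* Replay steps 1) to 4) from the commit message; return the final depth. */
int final_wake_depth(bool balanced)
{
	static const bool steps[] = { true, true, true, false };
	struct wol_model m = { .wake_depth = 0, .wol_irq_disabled = true };
	size_t i;

	for (i = 0; i < sizeof(steps) / sizeof(steps[0]); i++) {
		if (balanced)
			set_wol_balanced(&m, steps[i]);
		else
			set_wol_unbalanced(&m, steps[i]);
	}
	return m.wake_depth;
}
```

In this model, `final_wake_depth(false)` returns 2, so the IRQ would still count as a wake-up source after step 4), while `final_wake_depth(true)` returns 0.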

* Re: [PATCH net-next 0/2] sunvnet: Packet processing in non-interrupt context.
From: Raghuram Kothakota @ 2014-10-10  1:10 UTC (permalink / raw)
  To: Sowmini Varadhan; +Cc: David Miller, netdev
In-Reply-To: <20141006193111.GE24721@oracle.com>

I would like to share my experience: on sparc we need more parallelism to
get high performance; that's just the nature of CMT processors (at least today).
A lockless Tx and Rx implementation is very nice, but it requires the code path to be
single-threaded to achieve it. This may limit the performance to single-thread
performance, which may not be very high for standard MTU packets. Large MTU packets,
such as 64K MTU, may have some advantage due to less processing in both the stack
and the driver, but a single thread would still limit the maximum performance. I would
suggest exploring both the lockless and the high-parallelism methods to see which one
gives the best performance.

-Raghuram

On Oct 6, 2014, at 12:31 PM, Sowmini Varadhan <sowmini.varadhan@oracle.com> wrote:

> On (10/06/14 15:25), David Miller wrote:
>> 
>>> But we still need to hold the vio lock around the ldc_write 
>>> (and also around dring write) in vnet_start_xmit, right?
>> 
>> You might be able to avoid it, you're fully serialized by the TX queue
>> lock.
> 
> yes, I was just noticing that. The only place where I believe I need
> to hold the vio spin-lock is to sync with the dr->cons checks
> (the "should I send a start_cons LDC message?" check in vnet_start_xmit()
> vs the vnet_ack() updates).
> 
> But isn't it better in general to declare NETIF_F_LLTX and have finer lock
> granularity in the driver?
> 
> --Sowmini

^ permalink raw reply
