Netdev List
* [PATCH net-next 06/20] tipc: Optimizations & corrections to message rejection
From: Paul Gortmaker @ 2011-06-24 22:07 UTC
  To: davem; +Cc: netdev, Allan.Stephens, Allan Stephens, Paul Gortmaker
In-Reply-To: <1308953247-25266-1-git-send-email-paul.gortmaker@windriver.com>

From: Allan Stephens <allan.stephens@windriver.com>

Optimizes the creation of a returned payload message by duplicating
the original message and then updating the small number of fields
that need to be adjusted, rather than building the new message header
from scratch. In addition, certain operations that are not always
required are relocated so that they are only done if needed.

These optimizations also address two issues that were present
previously:

1) Fixes a bug that caused the socket send routines to return the
size of the returned message, rather than the size of the sent
message, when a returnable payload message was sent to a non-existent
destination port.

2) The message header of the returned message now matches that of
the original message more closely. The header is now always the same
size as the original header, and some message header fields that
weren't being initialized in the returned message header are now
populated correctly -- namely the "d" and "s" bits, and the upper
bound of a multicast name instance (where present).
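
The copy-then-adjust approach described above can be illustrated with a
minimal userspace sketch. Everything in it is a simplified stand-in rather
than the TIPC code: the header is a plain array of host-order words,
HDR_WORDS and the value of MAX_REJECT_SIZE are hypothetical, and the idea
that words 4/5 carry the origin/destination port and words 6/7 the
origin/destination node is an assumption inferred from which fields the
old code exchanged by hand.

```c
#include <stdint.h>
#include <string.h>

#define HDR_WORDS 10          /* hypothetical header length in 32-bit words */
#define MAX_REJECT_SIZE 1024  /* hypothetical cap on returned payload bytes */

/* Exchange two header words, e.g. an origin field and a destination field. */
static void swap_words(uint32_t *hdr, unsigned a, unsigned b)
{
	uint32_t tmp = hdr[a];

	hdr[a] = hdr[b];
	hdr[b] = tmp;
}

/*
 * Build a returned message in 'out' from the rejected message 'in'
 * (a header of HDR_WORDS words followed by data_sz payload bytes).
 * 'long_hdr' is nonzero when the header carries node address words.
 * Returns the size of the returned message in bytes.
 */
static size_t build_reject(uint8_t *out, const uint8_t *in, size_t data_sz,
			   int long_hdr)
{
	size_t hdr_sz = HDR_WORDS * sizeof(uint32_t);
	size_t keep = data_sz < MAX_REJECT_SIZE ? data_sz : MAX_REJECT_SIZE;
	size_t rmsg_sz = hdr_sz + keep;
	uint32_t hdr[HDR_WORDS];

	/* Duplicate the header and the (possibly truncated) payload. */
	memcpy(out, in, rmsg_sz);

	/* Adjust only the fields that differ in the returned message. */
	memcpy(hdr, out, hdr_sz);
	swap_words(hdr, 4, 5);
	if (long_hdr)
		swap_words(hdr, 6, 7);
	memcpy(out, hdr, hdr_sz);

	return rmsg_sz;
}
```

Because the header is copied wholesale, any field the sketch does not
touch is carried over unchanged, which is the property the second point
above relies on.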

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
---
 net/tipc/port.c |   49 ++++++++++++++++++++++++-------------------------
 1 files changed, 24 insertions(+), 25 deletions(-)

diff --git a/net/tipc/port.c b/net/tipc/port.c
index 70ecdfd..5311817 100644
--- a/net/tipc/port.c
+++ b/net/tipc/port.c
@@ -358,14 +358,10 @@ int tipc_reject_msg(struct sk_buff *buf, u32 err)
 	struct sk_buff *rbuf;
 	struct tipc_msg *rmsg;
 	int hdr_sz;
-	u32 imp = msg_importance(msg);
+	u32 imp;
 	u32 data_sz = msg_data_sz(msg);
 	u32 src_node;
-
-	if (data_sz > MAX_REJECT_SIZE)
-		data_sz = MAX_REJECT_SIZE;
-	if (msg_connected(msg) && (imp < TIPC_CRITICAL_IMPORTANCE))
-		imp++;
+	u32 rmsg_sz;
 
 	/* discard rejected message if it shouldn't be returned to sender */
 
@@ -377,30 +373,33 @@ int tipc_reject_msg(struct sk_buff *buf, u32 err)
 	if (msg_errcode(msg) || msg_dest_droppable(msg))
 		goto exit;
 
-	/* construct rejected message */
-	if (msg_mcast(msg))
-		hdr_sz = MCAST_H_SIZE;
-	else
-		hdr_sz = LONG_H_SIZE;
-	rbuf = tipc_buf_acquire(data_sz + hdr_sz);
+	/*
+	 * construct returned message by copying rejected message header and
+	 * data (or subset), then updating header fields that need adjusting
+	 */
+
+	hdr_sz = msg_hdr_sz(msg);
+	rmsg_sz = hdr_sz + min_t(u32, data_sz, MAX_REJECT_SIZE);
+
+	rbuf = tipc_buf_acquire(rmsg_sz);
 	if (rbuf == NULL)
 		goto exit;
 
 	rmsg = buf_msg(rbuf);
-	tipc_msg_init(rmsg, imp, msg_type(msg), hdr_sz, msg_orignode(msg));
-	msg_set_errcode(rmsg, err);
-	msg_set_destport(rmsg, msg_origport(msg));
-	msg_set_origport(rmsg, msg_destport(msg));
-	if (msg_short(msg)) {
-		msg_set_orignode(rmsg, tipc_own_addr);
-		/* leave name type & instance as zeroes */
-	} else {
-		msg_set_orignode(rmsg, msg_destnode(msg));
-		msg_set_nametype(rmsg, msg_nametype(msg));
-		msg_set_nameinst(rmsg, msg_nameinst(msg));
+	skb_copy_to_linear_data(rbuf, msg, rmsg_sz);
+
+	if (msg_connected(rmsg)) {
+		imp = msg_importance(rmsg);
+		if (imp < TIPC_CRITICAL_IMPORTANCE)
+			msg_set_importance(rmsg, ++imp);
 	}
-	msg_set_size(rmsg, data_sz + hdr_sz);
-	skb_copy_to_linear_data_offset(rbuf, hdr_sz, msg_data(msg), data_sz);
+	msg_set_non_seq(rmsg, 0);
+	msg_set_size(rmsg, rmsg_sz);
+	msg_set_errcode(rmsg, err);
+	msg_set_prevnode(rmsg, tipc_own_addr);
+	msg_swap_words(rmsg, 4, 5);
+	if (!msg_short(rmsg))
+		msg_swap_words(rmsg, 6, 7);
 
 	/* send self-abort message when rejecting on a connected port */
 	if (msg_connected(msg)) {
-- 
1.7.4.4



* [PATCH net-next 07/20] tipc: Eliminate message header routines for caching destination node
From: Paul Gortmaker @ 2011-06-24 22:07 UTC
  To: davem; +Cc: netdev, Allan.Stephens, Allan Stephens, Paul Gortmaker
In-Reply-To: <1308953247-25266-1-git-send-email-paul.gortmaker@windriver.com>

From: Allan Stephens <allan.stephens@windriver.com>

Gets rid of a pair of routines that provide support for temporarily
caching the destination node for a message in the associated message
buffer's application handle, since this capability is no longer used.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
---
 net/tipc/msg.h |   20 --------------------
 1 files changed, 0 insertions(+), 20 deletions(-)

diff --git a/net/tipc/msg.h b/net/tipc/msg.h
index 8452454..11b74dc 100644
--- a/net/tipc/msg.h
+++ b/net/tipc/msg.h
@@ -311,26 +311,6 @@ static inline void msg_set_seqno(struct tipc_msg *m, u32 n)
 }
 
 /*
- * TIPC may utilize the "link ack #" and "link seq #" fields of a short
- * message header to hold the destination node for the message, since the
- * normal "dest node" field isn't present.  This cache is only referenced
- * when required, so populating the cache of a longer message header is
- * harmless (as long as the header has the two link sequence fields present).
- *
- * Note: Host byte order is OK here, since the info never goes off-card.
- */
-
-static inline u32 msg_destnode_cache(struct tipc_msg *m)
-{
-	return m->hdr[2];
-}
-
-static inline void msg_set_destnode_cache(struct tipc_msg *m, u32 dnode)
-{
-	m->hdr[2] = dnode;
-}
-
-/*
  * Words 3-10
  */
 
-- 
1.7.4.4



* [PATCH net-next 08/20] tipc: Eliminate redundant masking in message header routines
From: Paul Gortmaker @ 2011-06-24 22:07 UTC
  To: davem; +Cc: netdev, Allan.Stephens, Allan Stephens, Paul Gortmaker
In-Reply-To: <1308953247-25266-1-git-send-email-paul.gortmaker@windriver.com>

From: Allan Stephens <allan.stephens@windriver.com>

Gets rid of unnecessary masking in two routines that set TIPC message
header fields. (The msg_set_bits() routine already takes care of
masking the new value to the correct size.)
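
To see why the caller-side "& 1" is redundant, consider a simplified
stand-in for a bit-field setter. This is only a sketch: set_bits() is a
hypothetical helper that works on host-order words, whereas the real
msg_set_bits() operates on the TIPC header and also has to deal with
byte order, which is left out here.

```c
#include <stdint.h>

/*
 * Store 'val' in the field of word 'w' that starts at bit 'pos' and is
 * as wide as 'mask'. The value is masked here, so callers do not need
 * to pre-mask it.
 */
static void set_bits(uint32_t *hdr, unsigned w, unsigned pos,
		     uint32_t mask, uint32_t val)
{
	val = (val & mask) << pos;
	hdr[w] &= ~(mask << pos);   /* clear the old field contents */
	hdr[w] |= val;              /* install the masked new value */
}
```

With a setter of this shape, passing 3 or 1 for a one-bit field stores
the same bit, and neighbouring bits in the word are left alone.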

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
---
 net/tipc/msg.h |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/tipc/msg.h b/net/tipc/msg.h
index 11b74dc..a58975c 100644
--- a/net/tipc/msg.h
+++ b/net/tipc/msg.h
@@ -615,7 +615,7 @@ static inline u32 msg_link_selector(struct tipc_msg *m)
 
 static inline void msg_set_link_selector(struct tipc_msg *m, u32 n)
 {
-	msg_set_bits(m, 4, 0, 1, (n & 1));
+	msg_set_bits(m, 4, 0, 1, n);
 }
 
 /*
@@ -639,7 +639,7 @@ static inline u32 msg_probe(struct tipc_msg *m)
 
 static inline void msg_set_probe(struct tipc_msg *m, u32 val)
 {
-	msg_set_bits(m, 5, 0, 1, (val & 1));
+	msg_set_bits(m, 5, 0, 1, val);
 }
 
 static inline char msg_net_plane(struct tipc_msg *m)
-- 
1.7.4.4



* [PATCH net-next 09/20] tipc: Partition name table instance array info into two parts
From: Paul Gortmaker @ 2011-06-24 22:07 UTC
  To: davem; +Cc: netdev, Allan.Stephens, Allan Stephens, Paul Gortmaker
In-Reply-To: <1308953247-25266-1-git-send-email-paul.gortmaker@windriver.com>

From: Allan Stephens <allan.stephens@windriver.com>

Modifies the name table array structure that contains the name
sequence instances for a given name type so that the publication
lists associated with a given instance are stored in a dynamically
allocated structure, rather than being embedded within the array
entry itself. This change is made for several reasons:

1) It reduces the amount of data that needs to be copied whenever
a given array is expanded or contracted to accommodate the first
publication of a new name sequence or the removal of the last
publication of an existing name sequence.

2) It reduces the amount of memory associated with array entries that
are currently unused.

3) It facilitates the upcoming conversion of the publication lists
from TIPC-specific circular lists to standard kernel lists. (Standard
lists cannot be used with the former array structure because the
relocation of array entries during array expansion and contraction
would corrupt the lists.)

Note that, aside from introducing a small amount of code to dynamically
allocate and free the structure that now holds publication list info,
this change is largely a simple renaming exercise that replaces
references to "sseq->LIST" with "sseq->info->LIST" (or "info->LIST").
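
The effect of the split can be sketched in userspace as follows. This is
an illustration under stated assumptions, not the name table code:
insert_subseq() and remove_subseq() are hypothetical helpers, struct
name_info is reduced to a single counter, and calloc()/free() stand in
for the kernel allocator. The point it shows is that shifting array
entries with memmove() moves only the small entry, while the separately
allocated info block keeps its address.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct name_info {            /* per-instance publication bookkeeping */
	uint32_t zone_list_size;
};

struct sub_seq {              /* small array entry: bounds plus a pointer */
	uint32_t lower;
	uint32_t upper;
	struct name_info *info;
};

/*
 * Insert a new entry at position 'pos' of an array currently holding
 * 'count' entries (capacity must be at least count + 1).
 * Returns 0 on success, -1 if the info block cannot be allocated.
 */
static int insert_subseq(struct sub_seq *arr, size_t count, size_t pos,
			 uint32_t lower, uint32_t upper)
{
	struct name_info *info = calloc(1, sizeof(*info));

	if (!info)
		return -1;

	/* Open a gap; only the small entries are copied. */
	memmove(&arr[pos + 1], &arr[pos], (count - pos) * sizeof(*arr));
	arr[pos].lower = lower;
	arr[pos].upper = upper;
	arr[pos].info = info;
	return 0;
}

/* Remove entry 'pos', freeing its info block and closing the gap. */
static void remove_subseq(struct sub_seq *arr, size_t count, size_t pos)
{
	free(arr[pos].info);
	memmove(&arr[pos], &arr[pos + 1], (count - pos - 1) * sizeof(*arr));
}
```

Anything that holds a pointer into the info block, such as a list node
linked to a list head stored there, remains valid after the array is
expanded or contracted.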

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
---
 net/tipc/name_table.c |  150 +++++++++++++++++++++++++++++--------------------
 1 files changed, 89 insertions(+), 61 deletions(-)

diff --git a/net/tipc/name_table.c b/net/tipc/name_table.c
index 205ed4a..9cd58f8 100644
--- a/net/tipc/name_table.c
+++ b/net/tipc/name_table.c
@@ -44,9 +44,7 @@
 static int tipc_nametbl_size = 1024;		/* must be a power of 2 */
 
 /**
- * struct sub_seq - container for all published instances of a name sequence
- * @lower: name sequence lower bound
- * @upper: name sequence upper bound
+ * struct name_info - name sequence publication info
  * @node_list: circular list of publications made by own node
  * @cluster_list: circular list of publications made by own cluster
  * @zone_list: circular list of publications made by own zone
@@ -59,9 +57,7 @@ static int tipc_nametbl_size = 1024;		/* must be a power of 2 */
  *       (The cluster and node lists may be empty.)
  */
 
-struct sub_seq {
-	u32 lower;
-	u32 upper;
+struct name_info {
 	struct publication *node_list;
 	struct publication *cluster_list;
 	struct publication *zone_list;
@@ -71,6 +67,19 @@ struct sub_seq {
 };
 
 /**
+ * struct sub_seq - container for all published instances of a name sequence
+ * @lower: name sequence lower bound
+ * @upper: name sequence upper bound
+ * @info: pointer to name sequence publication info
+ */
+
+struct sub_seq {
+	u32 lower;
+	u32 upper;
+	struct name_info *info;
+};
+
+/**
  * struct name_seq - container for all published instances of a name type
  * @type: 32 bit 'type' value for name sequence
  * @sseq: pointer to dynamically-sized array of sub-sequences of this 'type';
@@ -246,6 +255,7 @@ static struct publication *tipc_nameseq_insert_publ(struct name_seq *nseq,
 	struct subscription *st;
 	struct publication *publ;
 	struct sub_seq *sseq;
+	struct name_info *info;
 	int created_subseq = 0;
 
 	sseq = nameseq_find_subseq(nseq, lower);
@@ -258,6 +268,8 @@ static struct publication *tipc_nameseq_insert_publ(struct name_seq *nseq,
 			     type, lower, upper);
 			return NULL;
 		}
+
+		info = sseq->info;
 	} else {
 		u32 inspos;
 		struct sub_seq *freesseq;
@@ -292,6 +304,13 @@ static struct publication *tipc_nameseq_insert_publ(struct name_seq *nseq,
 			nseq->alloc *= 2;
 		}
 
+		info = kzalloc(sizeof(*info), GFP_ATOMIC);
+		if (!info) {
+			warn("Cannot publish {%u,%u,%u}, no memory\n",
+			     type, lower, upper);
+			return NULL;
+		}
+
 		/* Insert new sub-sequence */
 
 		sseq = &nseq->sseqs[inspos];
@@ -301,6 +320,7 @@ static struct publication *tipc_nameseq_insert_publ(struct name_seq *nseq,
 		nseq->first_free++;
 		sseq->lower = lower;
 		sseq->upper = upper;
+		sseq->info = info;
 		created_subseq = 1;
 	}
 
@@ -310,32 +330,32 @@ static struct publication *tipc_nameseq_insert_publ(struct name_seq *nseq,
 	if (!publ)
 		return NULL;
 
-	sseq->zone_list_size++;
-	if (!sseq->zone_list)
-		sseq->zone_list = publ->zone_list_next = publ;
+	info->zone_list_size++;
+	if (!info->zone_list)
+		info->zone_list = publ->zone_list_next = publ;
 	else {
-		publ->zone_list_next = sseq->zone_list->zone_list_next;
-		sseq->zone_list->zone_list_next = publ;
+		publ->zone_list_next = info->zone_list->zone_list_next;
+		info->zone_list->zone_list_next = publ;
 	}
 
 	if (in_own_cluster(node)) {
-		sseq->cluster_list_size++;
-		if (!sseq->cluster_list)
-			sseq->cluster_list = publ->cluster_list_next = publ;
+		info->cluster_list_size++;
+		if (!info->cluster_list)
+			info->cluster_list = publ->cluster_list_next = publ;
 		else {
 			publ->cluster_list_next =
-			sseq->cluster_list->cluster_list_next;
-			sseq->cluster_list->cluster_list_next = publ;
+			info->cluster_list->cluster_list_next;
+			info->cluster_list->cluster_list_next = publ;
 		}
 	}
 
 	if (node == tipc_own_addr) {
-		sseq->node_list_size++;
-		if (!sseq->node_list)
-			sseq->node_list = publ->node_list_next = publ;
+		info->node_list_size++;
+		if (!info->node_list)
+			info->node_list = publ->node_list_next = publ;
 		else {
-			publ->node_list_next = sseq->node_list->node_list_next;
-			sseq->node_list->node_list_next = publ;
+			publ->node_list_next = info->node_list->node_list_next;
+			info->node_list->node_list_next = publ;
 		}
 	}
 
@@ -373,6 +393,7 @@ static struct publication *tipc_nameseq_remove_publ(struct name_seq *nseq, u32 i
 	struct publication *curr;
 	struct publication *prev;
 	struct sub_seq *sseq = nameseq_find_subseq(nseq, inst);
+	struct name_info *info;
 	struct sub_seq *free;
 	struct subscription *s, *st;
 	int removed_subseq = 0;
@@ -380,40 +401,42 @@ static struct publication *tipc_nameseq_remove_publ(struct name_seq *nseq, u32 i
 	if (!sseq)
 		return NULL;
 
+	info = sseq->info;
+
 	/* Remove publication from zone scope list */
 
-	prev = sseq->zone_list;
-	publ = sseq->zone_list->zone_list_next;
+	prev = info->zone_list;
+	publ = info->zone_list->zone_list_next;
 	while ((publ->key != key) || (publ->ref != ref) ||
 	       (publ->node && (publ->node != node))) {
 		prev = publ;
 		publ = publ->zone_list_next;
-		if (prev == sseq->zone_list) {
+		if (prev == info->zone_list) {
 
 			/* Prevent endless loop if publication not found */
 
 			return NULL;
 		}
 	}
-	if (publ != sseq->zone_list)
+	if (publ != info->zone_list)
 		prev->zone_list_next = publ->zone_list_next;
 	else if (publ->zone_list_next != publ) {
 		prev->zone_list_next = publ->zone_list_next;
-		sseq->zone_list = publ->zone_list_next;
+		info->zone_list = publ->zone_list_next;
 	} else {
-		sseq->zone_list = NULL;
+		info->zone_list = NULL;
 	}
-	sseq->zone_list_size--;
+	info->zone_list_size--;
 
 	/* Remove publication from cluster scope list, if present */
 
 	if (in_own_cluster(node)) {
-		prev = sseq->cluster_list;
-		curr = sseq->cluster_list->cluster_list_next;
+		prev = info->cluster_list;
+		curr = info->cluster_list->cluster_list_next;
 		while (curr != publ) {
 			prev = curr;
 			curr = curr->cluster_list_next;
-			if (prev == sseq->cluster_list) {
+			if (prev == info->cluster_list) {
 
 				/* Prevent endless loop for malformed list */
 
@@ -424,27 +447,27 @@ static struct publication *tipc_nameseq_remove_publ(struct name_seq *nseq, u32 i
 				goto end_cluster;
 			}
 		}
-		if (publ != sseq->cluster_list)
+		if (publ != info->cluster_list)
 			prev->cluster_list_next = publ->cluster_list_next;
 		else if (publ->cluster_list_next != publ) {
 			prev->cluster_list_next = publ->cluster_list_next;
-			sseq->cluster_list = publ->cluster_list_next;
+			info->cluster_list = publ->cluster_list_next;
 		} else {
-			sseq->cluster_list = NULL;
+			info->cluster_list = NULL;
 		}
-		sseq->cluster_list_size--;
+		info->cluster_list_size--;
 	}
 end_cluster:
 
 	/* Remove publication from node scope list, if present */
 
 	if (node == tipc_own_addr) {
-		prev = sseq->node_list;
-		curr = sseq->node_list->node_list_next;
+		prev = info->node_list;
+		curr = info->node_list->node_list_next;
 		while (curr != publ) {
 			prev = curr;
 			curr = curr->node_list_next;
-			if (prev == sseq->node_list) {
+			if (prev == info->node_list) {
 
 				/* Prevent endless loop for malformed list */
 
@@ -455,21 +478,22 @@ end_cluster:
 				goto end_node;
 			}
 		}
-		if (publ != sseq->node_list)
+		if (publ != info->node_list)
 			prev->node_list_next = publ->node_list_next;
 		else if (publ->node_list_next != publ) {
 			prev->node_list_next = publ->node_list_next;
-			sseq->node_list = publ->node_list_next;
+			info->node_list = publ->node_list_next;
 		} else {
-			sseq->node_list = NULL;
+			info->node_list = NULL;
 		}
-		sseq->node_list_size--;
+		info->node_list_size--;
 	}
 end_node:
 
 	/* Contract subseq list if no more publications for that subseq */
 
-	if (!sseq->zone_list) {
+	if (!info->zone_list) {
+		kfree(info);
 		free = &nseq->sseqs[nseq->first_free--];
 		memmove(sseq, sseq + 1, (free - (sseq + 1)) * sizeof(*sseq));
 		removed_subseq = 1;
@@ -506,7 +530,7 @@ static void tipc_nameseq_subscribe(struct name_seq *nseq, struct subscription *s
 		return;
 
 	while (sseq != &nseq->sseqs[nseq->first_free]) {
-		struct publication *zl = sseq->zone_list;
+		struct publication *zl = sseq->info->zone_list;
 		if (zl && tipc_subscr_overlap(s, sseq->lower, sseq->upper)) {
 			struct publication *crs = zl;
 			int must_report = 1;
@@ -591,6 +615,7 @@ struct publication *tipc_nametbl_remove_publ(u32 type, u32 lower,
 u32 tipc_nametbl_translate(u32 type, u32 instance, u32 *destnode)
 {
 	struct sub_seq *sseq;
+	struct name_info *info;
 	struct publication *publ = NULL;
 	struct name_seq *seq;
 	u32 ref;
@@ -606,12 +631,13 @@ u32 tipc_nametbl_translate(u32 type, u32 instance, u32 *destnode)
 	if (unlikely(!sseq))
 		goto not_found;
 	spin_lock_bh(&seq->lock);
+	info = sseq->info;
 
 	/* Closest-First Algorithm: */
 	if (likely(!*destnode)) {
-		publ = sseq->node_list;
+		publ = info->node_list;
 		if (publ) {
-			sseq->node_list = publ->node_list_next;
+			info->node_list = publ->node_list_next;
 found:
 			ref = publ->ref;
 			*destnode = publ->node;
@@ -619,35 +645,35 @@ found:
 			read_unlock_bh(&tipc_nametbl_lock);
 			return ref;
 		}
-		publ = sseq->cluster_list;
+		publ = info->cluster_list;
 		if (publ) {
-			sseq->cluster_list = publ->cluster_list_next;
+			info->cluster_list = publ->cluster_list_next;
 			goto found;
 		}
-		publ = sseq->zone_list;
+		publ = info->zone_list;
 		if (publ) {
-			sseq->zone_list = publ->zone_list_next;
+			info->zone_list = publ->zone_list_next;
 			goto found;
 		}
 	}
 
 	/* Round-Robin Algorithm: */
 	else if (*destnode == tipc_own_addr) {
-		publ = sseq->node_list;
+		publ = info->node_list;
 		if (publ) {
-			sseq->node_list = publ->node_list_next;
+			info->node_list = publ->node_list_next;
 			goto found;
 		}
 	} else if (in_own_cluster(*destnode)) {
-		publ = sseq->cluster_list;
+		publ = info->cluster_list;
 		if (publ) {
-			sseq->cluster_list = publ->cluster_list_next;
+			info->cluster_list = publ->cluster_list_next;
 			goto found;
 		}
 	} else {
-		publ = sseq->zone_list;
+		publ = info->zone_list;
 		if (publ) {
-			sseq->zone_list = publ->zone_list_next;
+			info->zone_list = publ->zone_list_next;
 			goto found;
 		}
 	}
@@ -676,6 +702,7 @@ int tipc_nametbl_mc_translate(u32 type, u32 lower, u32 upper, u32 limit,
 	struct name_seq *seq;
 	struct sub_seq *sseq;
 	struct sub_seq *sseq_stop;
+	struct name_info *info;
 	int res = 0;
 
 	read_lock_bh(&tipc_nametbl_lock);
@@ -693,16 +720,17 @@ int tipc_nametbl_mc_translate(u32 type, u32 lower, u32 upper, u32 limit,
 		if (sseq->lower > upper)
 			break;
 
-		publ = sseq->node_list;
+		info = sseq->info;
+		publ = info->node_list;
 		if (publ) {
 			do {
 				if (publ->scope <= limit)
 					tipc_port_list_add(dports, publ->ref);
 				publ = publ->node_list_next;
-			} while (publ != sseq->node_list);
+			} while (publ != info->node_list);
 		}
 
-		if (sseq->cluster_list_size != sseq->node_list_size)
+		if (info->cluster_list_size != info->node_list_size)
 			res = 1;
 	}
 
@@ -840,7 +868,7 @@ static void subseq_list(struct sub_seq *sseq, struct print_buf *buf, u32 depth,
 {
 	char portIdStr[27];
 	const char *scope_str[] = {"", " zone", " cluster", " node"};
-	struct publication *publ = sseq->zone_list;
+	struct publication *publ = sseq->info->zone_list;
 
 	tipc_printf(buf, "%-10u %-10u ", sseq->lower, sseq->upper);
 
@@ -860,7 +888,7 @@ static void subseq_list(struct sub_seq *sseq, struct print_buf *buf, u32 depth,
 		}
 
 		publ = publ->zone_list_next;
-		if (publ == sseq->zone_list)
+		if (publ == sseq->info->zone_list)
 			break;
 
 		tipc_printf(buf, "\n%33s", " ");
-- 
1.7.4.4



* [PATCH net-next 11/20] tipc: Eliminate checks for empty zone list during name translation
From: Paul Gortmaker @ 2011-06-24 22:07 UTC
  To: davem; +Cc: netdev, Allan.Stephens, Allan Stephens, Paul Gortmaker
In-Reply-To: <1308953247-25266-1-git-send-email-paul.gortmaker@windriver.com>

From: Allan Stephens <allan.stephens@windriver.com>

Gets rid of a pair of checks to see if a name sequence entry in
TIPC's name table has an empty zone list. These checks are pointless
since the zone list can never be empty (i.e. as soon as the list
becomes empty the associated name sequence entry is deleted).

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
---
 net/tipc/name_table.c |    7 ++-----
 1 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/net/tipc/name_table.c b/net/tipc/name_table.c
index 7d85cc1..46e6b6c 100644
--- a/net/tipc/name_table.c
+++ b/net/tipc/name_table.c
@@ -574,14 +574,13 @@ u32 tipc_nametbl_translate(u32 type, u32 instance, u32 *destnode)
 						cluster_list);
 			list_move_tail(&publ->cluster_list,
 				       &info->cluster_list);
-		} else if (!list_empty(&info->zone_list)) {
+		} else {
 			publ = list_first_entry(&info->zone_list,
 						struct publication,
 						zone_list);
 			list_move_tail(&publ->zone_list,
 				       &info->zone_list);
-		} else
-			goto no_match;
+		}
 	}
 
 	/* Round-Robin Algorithm: */
@@ -598,8 +597,6 @@ u32 tipc_nametbl_translate(u32 type, u32 instance, u32 *destnode)
 					cluster_list);
 		list_move_tail(&publ->cluster_list, &info->cluster_list);
 	} else {
-		if (list_empty(&info->zone_list))
-			goto no_match;
 		publ = list_first_entry(&info->zone_list, struct publication,
 					zone_list);
 		list_move_tail(&publ->zone_list, &info->zone_list);
-- 
1.7.4.4



* [PATCH net-next 10/20] tipc: Convert name table publication lists to standard kernel lists
From: Paul Gortmaker @ 2011-06-24 22:07 UTC
  To: davem; +Cc: netdev, Allan.Stephens, Allan Stephens, Paul Gortmaker
In-Reply-To: <1308953247-25266-1-git-send-email-paul.gortmaker@windriver.com>

From: Allan Stephens <allan.stephens@windriver.com>

Modifies the main circular linked lists of publications used in TIPC's
name table to use the standard kernel linked list type. This change
simplifies the deletion of an existing publication by eliminating
the need to search up to three lists to locate the publication.
The use of standard list routines also helps improve the readability
of the name table code by making it clearer what each list operation
is actually doing.
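
As a rough illustration of the list operations involved, the sketch
below re-implements a small subset of the kernel's list helpers in
userspace (the real ones live in <linux/list.h>; these copies are
simplified stand-ins) and uses them the way the translation code does:
take the first publication and rotate it to the tail. struct publication
is cut down to two members and pick_round_robin() is a hypothetical
helper, not a TIPC function.

```c
#include <stddef.h>

struct list_head {
	struct list_head *next, *prev;
};

#define list_entry(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void INIT_LIST_HEAD(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static int list_empty(const struct list_head *h)
{
	return h->next == h;
}

/* Unlink an entry; no search is needed because it knows its neighbours. */
static void list_del(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
}

static void list_add_tail(struct list_head *e, struct list_head *h)
{
	e->prev = h->prev;
	e->next = h;
	h->prev->next = e;
	h->prev = e;
}

static void list_move_tail(struct list_head *e, struct list_head *h)
{
	list_del(e);
	list_add_tail(e, h);
}

struct publication {
	unsigned ref;
	struct list_head zone_list;
};

/* Pick the first publication and rotate it to the tail; 0 if none. */
static unsigned pick_round_robin(struct list_head *head)
{
	struct publication *publ;

	if (list_empty(head))
		return 0;
	publ = list_entry(head->next, struct publication, zone_list);
	list_move_tail(&publ->zone_list, head);
	return publ->ref;
}
```

Since each publication embeds its own list node, removing it from a list
is a constant-time unlink of that node, which is why the search through
each list can be dropped.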

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
---
 net/tipc/name_table.c |  240 +++++++++++++++++--------------------------------
 net/tipc/name_table.h |   14 ++--
 2 files changed, 90 insertions(+), 164 deletions(-)

diff --git a/net/tipc/name_table.c b/net/tipc/name_table.c
index 9cd58f8..7d85cc1 100644
--- a/net/tipc/name_table.c
+++ b/net/tipc/name_table.c
@@ -2,7 +2,7 @@
  * net/tipc/name_table.c: TIPC name table code
  *
  * Copyright (c) 2000-2006, Ericsson AB
- * Copyright (c) 2004-2008, Wind River Systems
+ * Copyright (c) 2004-2008, 2010-2011, Wind River Systems
  * All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
@@ -58,9 +58,9 @@ static int tipc_nametbl_size = 1024;		/* must be a power of 2 */
  */
 
 struct name_info {
-	struct publication *node_list;
-	struct publication *cluster_list;
-	struct publication *zone_list;
+	struct list_head node_list;
+	struct list_head cluster_list;
+	struct list_head zone_list;
 	u32 node_list_size;
 	u32 cluster_list_size;
 	u32 zone_list_size;
@@ -311,6 +311,10 @@ static struct publication *tipc_nameseq_insert_publ(struct name_seq *nseq,
 			return NULL;
 		}
 
+		INIT_LIST_HEAD(&info->node_list);
+		INIT_LIST_HEAD(&info->cluster_list);
+		INIT_LIST_HEAD(&info->zone_list);
+
 		/* Insert new sub-sequence */
 
 		sseq = &nseq->sseqs[inspos];
@@ -330,33 +334,17 @@ static struct publication *tipc_nameseq_insert_publ(struct name_seq *nseq,
 	if (!publ)
 		return NULL;
 
+	list_add(&publ->zone_list, &info->zone_list);
 	info->zone_list_size++;
-	if (!info->zone_list)
-		info->zone_list = publ->zone_list_next = publ;
-	else {
-		publ->zone_list_next = info->zone_list->zone_list_next;
-		info->zone_list->zone_list_next = publ;
-	}
 
 	if (in_own_cluster(node)) {
+		list_add(&publ->cluster_list, &info->cluster_list);
 		info->cluster_list_size++;
-		if (!info->cluster_list)
-			info->cluster_list = publ->cluster_list_next = publ;
-		else {
-			publ->cluster_list_next =
-			info->cluster_list->cluster_list_next;
-			info->cluster_list->cluster_list_next = publ;
-		}
 	}
 
 	if (node == tipc_own_addr) {
+		list_add(&publ->node_list, &info->node_list);
 		info->node_list_size++;
-		if (!info->node_list)
-			info->node_list = publ->node_list_next = publ;
-		else {
-			publ->node_list_next = info->node_list->node_list_next;
-			info->node_list->node_list_next = publ;
-		}
 	}
 
 	/*
@@ -390,8 +378,6 @@ static struct publication *tipc_nameseq_remove_publ(struct name_seq *nseq, u32 i
 						    u32 node, u32 ref, u32 key)
 {
 	struct publication *publ;
-	struct publication *curr;
-	struct publication *prev;
 	struct sub_seq *sseq = nameseq_find_subseq(nseq, inst);
 	struct name_info *info;
 	struct sub_seq *free;
@@ -403,96 +389,38 @@ static struct publication *tipc_nameseq_remove_publ(struct name_seq *nseq, u32 i
 
 	info = sseq->info;
 
-	/* Remove publication from zone scope list */
+	/* Locate publication, if it exists */
 
-	prev = info->zone_list;
-	publ = info->zone_list->zone_list_next;
-	while ((publ->key != key) || (publ->ref != ref) ||
-	       (publ->node && (publ->node != node))) {
-		prev = publ;
-		publ = publ->zone_list_next;
-		if (prev == info->zone_list) {
+	list_for_each_entry(publ, &info->zone_list, zone_list) {
+		if ((publ->key == key) && (publ->ref == ref) &&
+		    (!publ->node || (publ->node == node)))
+			goto found;
+	}
+	return NULL;
 
-			/* Prevent endless loop if publication not found */
+found:
+	/* Remove publication from zone scope list */
 
-			return NULL;
-		}
-	}
-	if (publ != info->zone_list)
-		prev->zone_list_next = publ->zone_list_next;
-	else if (publ->zone_list_next != publ) {
-		prev->zone_list_next = publ->zone_list_next;
-		info->zone_list = publ->zone_list_next;
-	} else {
-		info->zone_list = NULL;
-	}
+	list_del(&publ->zone_list);
 	info->zone_list_size--;
 
 	/* Remove publication from cluster scope list, if present */
 
 	if (in_own_cluster(node)) {
-		prev = info->cluster_list;
-		curr = info->cluster_list->cluster_list_next;
-		while (curr != publ) {
-			prev = curr;
-			curr = curr->cluster_list_next;
-			if (prev == info->cluster_list) {
-
-				/* Prevent endless loop for malformed list */
-
-				err("Unable to de-list cluster publication\n"
-				    "{%u%u}, node=0x%x, ref=%u, key=%u)\n",
-				    publ->type, publ->lower, publ->node,
-				    publ->ref, publ->key);
-				goto end_cluster;
-			}
-		}
-		if (publ != info->cluster_list)
-			prev->cluster_list_next = publ->cluster_list_next;
-		else if (publ->cluster_list_next != publ) {
-			prev->cluster_list_next = publ->cluster_list_next;
-			info->cluster_list = publ->cluster_list_next;
-		} else {
-			info->cluster_list = NULL;
-		}
+		list_del(&publ->cluster_list);
 		info->cluster_list_size--;
 	}
-end_cluster:
 
 	/* Remove publication from node scope list, if present */
 
 	if (node == tipc_own_addr) {
-		prev = info->node_list;
-		curr = info->node_list->node_list_next;
-		while (curr != publ) {
-			prev = curr;
-			curr = curr->node_list_next;
-			if (prev == info->node_list) {
-
-				/* Prevent endless loop for malformed list */
-
-				err("Unable to de-list node publication\n"
-				    "{%u%u}, node=0x%x, ref=%u, key=%u)\n",
-				    publ->type, publ->lower, publ->node,
-				    publ->ref, publ->key);
-				goto end_node;
-			}
-		}
-		if (publ != info->node_list)
-			prev->node_list_next = publ->node_list_next;
-		else if (publ->node_list_next != publ) {
-			prev->node_list_next = publ->node_list_next;
-			info->node_list = publ->node_list_next;
-		} else {
-			info->node_list = NULL;
-		}
+		list_del(&publ->node_list);
 		info->node_list_size--;
 	}
-end_node:
 
 	/* Contract subseq list if no more publications for that subseq */
 
-	if (!info->zone_list) {
+	if (list_empty(&info->zone_list)) {
 		kfree(info);
 		free = &nseq->sseqs[nseq->first_free--];
 		memmove(sseq, sseq + 1, (free - (sseq + 1)) * sizeof(*sseq));
@@ -530,12 +458,12 @@ static void tipc_nameseq_subscribe(struct name_seq *nseq, struct subscription *s
 		return;
 
 	while (sseq != &nseq->sseqs[nseq->first_free]) {
-		struct publication *zl = sseq->info->zone_list;
-		if (zl && tipc_subscr_overlap(s, sseq->lower, sseq->upper)) {
-			struct publication *crs = zl;
+		if (tipc_subscr_overlap(s, sseq->lower, sseq->upper)) {
+			struct publication *crs;
+			struct name_info *info = sseq->info;
 			int must_report = 1;
 
-			do {
+			list_for_each_entry(crs, &info->zone_list, zone_list) {
 				tipc_subscr_report_overlap(s,
 							   sseq->lower,
 							   sseq->upper,
@@ -544,8 +472,7 @@ static void tipc_nameseq_subscribe(struct name_seq *nseq, struct subscription *s
 							   crs->node,
 							   must_report);
 				must_report = 0;
-				crs = crs->zone_list_next;
-			} while (crs != zl);
+			}
 		}
 		sseq++;
 	}
@@ -616,9 +543,9 @@ u32 tipc_nametbl_translate(u32 type, u32 instance, u32 *destnode)
 {
 	struct sub_seq *sseq;
 	struct name_info *info;
-	struct publication *publ = NULL;
+	struct publication *publ;
 	struct name_seq *seq;
-	u32 ref;
+	u32 ref = 0;
 
 	if (!tipc_in_scope(*destnode, tipc_own_addr))
 		return 0;
@@ -635,52 +562,56 @@ u32 tipc_nametbl_translate(u32 type, u32 instance, u32 *destnode)
 
 	/* Closest-First Algorithm: */
 	if (likely(!*destnode)) {
-		publ = info->node_list;
-		if (publ) {
-			info->node_list = publ->node_list_next;
-found:
-			ref = publ->ref;
-			*destnode = publ->node;
-			spin_unlock_bh(&seq->lock);
-			read_unlock_bh(&tipc_nametbl_lock);
-			return ref;
-		}
-		publ = info->cluster_list;
-		if (publ) {
-			info->cluster_list = publ->cluster_list_next;
-			goto found;
-		}
-		publ = info->zone_list;
-		if (publ) {
-			info->zone_list = publ->zone_list_next;
-			goto found;
-		}
+		if (!list_empty(&info->node_list)) {
+			publ = list_first_entry(&info->node_list,
+						struct publication,
+						node_list);
+			list_move_tail(&publ->node_list,
+				       &info->node_list);
+		} else if (!list_empty(&info->cluster_list)) {
+			publ = list_first_entry(&info->cluster_list,
+						struct publication,
+						cluster_list);
+			list_move_tail(&publ->cluster_list,
+				       &info->cluster_list);
+		} else if (!list_empty(&info->zone_list)) {
+			publ = list_first_entry(&info->zone_list,
+						struct publication,
+						zone_list);
+			list_move_tail(&publ->zone_list,
+				       &info->zone_list);
+		} else
+			goto no_match;
 	}
 
 	/* Round-Robin Algorithm: */
 	else if (*destnode == tipc_own_addr) {
-		publ = info->node_list;
-		if (publ) {
-			info->node_list = publ->node_list_next;
-			goto found;
-		}
+		if (list_empty(&info->node_list))
+			goto no_match;
+		publ = list_first_entry(&info->node_list, struct publication,
+					node_list);
+		list_move_tail(&publ->node_list, &info->node_list);
 	} else if (in_own_cluster(*destnode)) {
-		publ = info->cluster_list;
-		if (publ) {
-			info->cluster_list = publ->cluster_list_next;
-			goto found;
-		}
+		if (list_empty(&info->cluster_list))
+			goto no_match;
+		publ = list_first_entry(&info->cluster_list, struct publication,
+					cluster_list);
+		list_move_tail(&publ->cluster_list, &info->cluster_list);
 	} else {
-		publ = info->zone_list;
-		if (publ) {
-			info->zone_list = publ->zone_list_next;
-			goto found;
-		}
+		if (list_empty(&info->zone_list))
+			goto no_match;
+		publ = list_first_entry(&info->zone_list, struct publication,
+					zone_list);
+		list_move_tail(&publ->zone_list, &info->zone_list);
 	}
+
+	ref = publ->ref;
+	*destnode = publ->node;
+no_match:
 	spin_unlock_bh(&seq->lock);
 not_found:
 	read_unlock_bh(&tipc_nametbl_lock);
-	return 0;
+	return ref;
 }
 
 /**
@@ -721,13 +652,9 @@ int tipc_nametbl_mc_translate(u32 type, u32 lower, u32 upper, u32 limit,
 			break;
 
 		info = sseq->info;
-		publ = info->node_list;
-		if (publ) {
-			do {
-				if (publ->scope <= limit)
-					tipc_port_list_add(dports, publ->ref);
-				publ = publ->node_list_next;
-			} while (publ != info->node_list);
+		list_for_each_entry(publ, &info->node_list, node_list) {
+			if (publ->scope <= limit)
+				tipc_port_list_add(dports, publ->ref);
 		}
 
 		if (info->cluster_list_size != info->node_list_size)
@@ -868,16 +795,19 @@ static void subseq_list(struct sub_seq *sseq, struct print_buf *buf, u32 depth,
 {
 	char portIdStr[27];
 	const char *scope_str[] = {"", " zone", " cluster", " node"};
-	struct publication *publ = sseq->info->zone_list;
+	struct publication *publ;
+	struct name_info *info;
 
 	tipc_printf(buf, "%-10u %-10u ", sseq->lower, sseq->upper);
 
-	if (depth == 2 || !publ) {
+	if (depth == 2) {
 		tipc_printf(buf, "\n");
 		return;
 	}
 
-	do {
+	info = sseq->info;
+
+	list_for_each_entry(publ, &info->zone_list, zone_list) {
 		sprintf(portIdStr, "<%u.%u.%u:%u>",
 			 tipc_zone(publ->node), tipc_cluster(publ->node),
 			 tipc_node(publ->node), publ->ref);
@@ -886,13 +816,9 @@ static void subseq_list(struct sub_seq *sseq, struct print_buf *buf, u32 depth,
 			tipc_printf(buf, "%-10u %s", publ->key,
 				    scope_str[publ->scope]);
 		}
-
-		publ = publ->zone_list_next;
-		if (publ == sseq->info->zone_list)
-			break;
-
-		tipc_printf(buf, "\n%33s", " ");
-	} while (1);
+		if (!list_is_last(&publ->zone_list, &info->zone_list))
+			tipc_printf(buf, "\n%33s", " ");
+	};
 
 	tipc_printf(buf, "\n");
 }
diff --git a/net/tipc/name_table.h b/net/tipc/name_table.h
index d228bd6..62d77e5 100644
--- a/net/tipc/name_table.h
+++ b/net/tipc/name_table.h
@@ -2,7 +2,7 @@
  * net/tipc/name_table.h: Include file for TIPC name table code
  *
  * Copyright (c) 2000-2006, Ericsson AB
- * Copyright (c) 2004-2005, Wind River Systems
+ * Copyright (c) 2004-2005, 2010-2011, Wind River Systems
  * All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
@@ -61,9 +61,9 @@ struct port_list;
  * @subscr: subscription to "node down" event (for off-node publications only)
  * @local_list: adjacent entries in list of publications made by this node
  * @pport_list: adjacent entries in list of publications made by this port
- * @node_list: next matching name seq publication with >= node scope
- * @cluster_list: next matching name seq publication with >= cluster scope
- * @zone_list: next matching name seq publication with >= zone scope
+ * @node_list: adjacent matching name seq publications with >= node scope
+ * @cluster_list: adjacent matching name seq publications with >= cluster scope
+ * @zone_list: adjacent matching name seq publications with >= zone scope
  *
  * Note that the node list, cluster list, and zone list are circular lists.
  */
@@ -79,9 +79,9 @@ struct publication {
 	struct tipc_node_subscr subscr;
 	struct list_head local_list;
 	struct list_head pport_list;
-	struct publication *node_list_next;
-	struct publication *cluster_list_next;
-	struct publication *zone_list_next;
+	struct list_head node_list;
+	struct list_head cluster_list;
+	struct list_head zone_list;
 };
 
 
-- 
1.7.4.4
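
The round-robin selection in tipc_nametbl_translate() above takes the first entry of a list and then moves it to the tail, so that the next lookup picks a different publication. A minimal user-space sketch of that idea follows. It is not the kernel code: struct list_node, struct pub, and pick_round_robin() are hypothetical stand-ins for the kernel's struct list_head, struct publication, and the list_first_entry()/list_move_tail() pair.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list, modelled on the kernel's list_head. */
struct list_node {
	struct list_node *next, *prev;
};

static void list_init(struct list_node *head)
{
	head->next = head;
	head->prev = head;
}

static int list_is_empty(const struct list_node *head)
{
	return head->next == head;
}

/* Link n in just before head, i.e. at the tail of the list. */
static void list_append(struct list_node *n, struct list_node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Unlink n, then re-add it at the tail (the effect of list_move_tail()). */
static void list_rotate_to_tail(struct list_node *n, struct list_node *head)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_append(n, head);
}

/* Hypothetical stand-in for struct publication. */
struct pub {
	unsigned int ref;
	struct list_node node_list;
};

/* Returns the ref of the chosen entry, or 0 when the list is empty. */
static unsigned int pick_round_robin(struct list_node *head)
{
	struct pub *p;

	assert(head != NULL);
	if (list_is_empty(head))
		return 0;
	/* Step back from the embedded node to its owning structure. */
	p = (struct pub *)((char *)head->next - offsetof(struct pub, node_list));
	list_rotate_to_tail(&p->node_list, head);
	return p->ref;
}
```

With three entries appended in order, successive calls return them in turn and then wrap around to the first one again; an empty list yields 0, matching the "ref = 0" default in the patch.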



* [PATCH net-next 12/20] tipc: Correct typo in link statistics output
From: Paul Gortmaker @ 2011-06-24 22:07 UTC (permalink / raw)
  To: davem; +Cc: netdev, Allan.Stephens, Allan Stephens, Paul Gortmaker
In-Reply-To: <1308953247-25266-1-git-send-email-paul.gortmaker@windriver.com>

From: Allan Stephens <allan.stephens@windriver.com>

Fixes a minor error in the title of one of the message size profiling
values printed as part of TIPC's link statistics.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
---
 net/tipc/link.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/tipc/link.c b/net/tipc/link.c
index 5ed4b4f..5bfe000 100644
--- a/net/tipc/link.c
+++ b/net/tipc/link.c
@@ -2882,7 +2882,7 @@ static int tipc_link_stats(const char *name, char *buf, const u32 buf_size)
 		profile_total = 1;
 	tipc_printf(&pb, "  TX profile sample:%u packets  average:%u octets\n"
 			 "  0-64:%u%% -256:%u%% -1024:%u%% -4096:%u%% "
-			 "-16354:%u%% -32768:%u%% -66000:%u%%\n",
+			 "-16384:%u%% -32768:%u%% -66000:%u%%\n",
 		    l_ptr->stats.msg_length_counts,
 		    l_ptr->stats.msg_lengths_total / profile_total,
 		    percent(l_ptr->stats.msg_length_profile[0], profile_total),
-- 
1.7.4.4



* [PATCH net-next 13/20] tipc: Eliminate unused field in bearer structure
From: Paul Gortmaker @ 2011-06-24 22:07 UTC (permalink / raw)
  To: davem; +Cc: netdev, Allan.Stephens, Allan Stephens, Paul Gortmaker
In-Reply-To: <1308953247-25266-1-git-send-email-paul.gortmaker@windriver.com>

From: Allan Stephens <allan.stephens@windriver.com>

Gets rid of counter that records the number of times a bearer has
resumed after congestion or blocking, since the value is never
referenced anywhere.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
---
 net/tipc/bearer.c |    1 -
 net/tipc/bearer.h |    2 --
 2 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/net/tipc/bearer.c b/net/tipc/bearer.c
index 85209ea..85eba9c 100644
--- a/net/tipc/bearer.c
+++ b/net/tipc/bearer.c
@@ -402,7 +402,6 @@ void tipc_bearer_lock_push(struct tipc_bearer *b_ptr)
 void tipc_continue(struct tipc_bearer *b_ptr)
 {
 	spin_lock_bh(&b_ptr->lock);
-	b_ptr->continue_count++;
 	if (!list_empty(&b_ptr->cong_links))
 		tipc_k_signal((Handler)tipc_bearer_lock_push, (unsigned long)b_ptr);
 	b_ptr->blocked = 0;
diff --git a/net/tipc/bearer.h b/net/tipc/bearer.h
index 31d6172..5ad70ef 100644
--- a/net/tipc/bearer.h
+++ b/net/tipc/bearer.h
@@ -107,7 +107,6 @@ struct media {
  * @link_req: ptr to (optional) structure making periodic link setup requests
  * @links: list of non-congested links associated with bearer
  * @cong_links: list of congested links associated with bearer
- * @continue_count: # of times bearer has resumed after congestion or blocking
  * @active: non-zero if bearer structure is represents a bearer
  * @net_plane: network plane ('A' through 'H') currently associated with bearer
  * @nodes: indicates which nodes in cluster can be reached through bearer
@@ -129,7 +128,6 @@ struct tipc_bearer {
 	struct link_req *link_req;
 	struct list_head links;
 	struct list_head cong_links;
-	u32 continue_count;
 	int active;
 	char net_plane;
 	struct tipc_node_map nodes;
-- 
1.7.4.4



* [PATCH net-next 14/20] tipc: Remove unnecessary includes in socket code
From: Paul Gortmaker @ 2011-06-24 22:07 UTC (permalink / raw)
  To: davem; +Cc: netdev, Allan.Stephens, Allan Stephens, Paul Gortmaker
In-Reply-To: <1308953247-25266-1-git-send-email-paul.gortmaker@windriver.com>

From: Allan Stephens <allan.stephens@windriver.com>

Eliminates a pair of #include statements for files that are brought in
automatically by including core.h.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
---
 net/tipc/socket.c |    3 ---
 1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 3388373..adb2eff 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -36,9 +36,6 @@
 
 #include <net/sock.h>
 
-#include <linux/tipc.h>
-#include <linux/tipc_config.h>
-
 #include "core.h"
 #include "port.h"
 
-- 
1.7.4.4



* [PATCH net-next 15/20] tipc: Eliminate useless check when creating internal message
From: Paul Gortmaker @ 2011-06-24 22:07 UTC (permalink / raw)
  To: davem; +Cc: netdev, Allan.Stephens, Allan Stephens, Paul Gortmaker
In-Reply-To: <1308953247-25266-1-git-send-email-paul.gortmaker@windriver.com>

From: Allan Stephens <allan.stephens@windriver.com>

Gets rid of code that allows tipc_msg_init() to create a short
payload message header. This optimization is possible because
there are no longer any callers who require this capability.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
---
 net/tipc/msg.c |    6 ++----
 1 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/net/tipc/msg.c b/net/tipc/msg.c
index 03e57bf..83d5096 100644
--- a/net/tipc/msg.c
+++ b/net/tipc/msg.c
@@ -61,10 +61,8 @@ void tipc_msg_init(struct tipc_msg *m, u32 user, u32 type,
 	msg_set_size(m, hsize);
 	msg_set_prevnode(m, tipc_own_addr);
 	msg_set_type(m, type);
-	if (!msg_short(m)) {
-		msg_set_orignode(m, tipc_own_addr);
-		msg_set_destnode(m, destnode);
-	}
+	msg_set_orignode(m, tipc_own_addr);
+	msg_set_destnode(m, destnode);
 }
 
 /**
-- 
1.7.4.4



* [PATCH net-next 16/20] tipc: Cleanup of message header size terminology
From: Paul Gortmaker @ 2011-06-24 22:07 UTC (permalink / raw)
  To: davem; +Cc: netdev, Allan.Stephens, Allan Stephens, Paul Gortmaker
In-Reply-To: <1308953247-25266-1-git-send-email-paul.gortmaker@windriver.com>

From: Allan Stephens <allan.stephens@windriver.com>

Performs cosmetic cleanup of the symbolic names used to specify TIPC
payload message header sizes. The revised names now more accurately
reflect the payload messages in which they can appear. In addition,
several places where these payload message symbol names were being used
to create non-payload messages have been updated to use the proper
internal message symbolic name.

No functional changes are introduced by this rework.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
---
 net/tipc/link.c       |    4 ++--
 net/tipc/msg.h        |   10 +++++-----
 net/tipc/name_distr.c |    6 +++---
 net/tipc/port.c       |   20 ++++++++++----------
 4 files changed, 20 insertions(+), 20 deletions(-)

diff --git a/net/tipc/link.c b/net/tipc/link.c
index 5bfe000..f89570c 100644
--- a/net/tipc/link.c
+++ b/net/tipc/link.c
@@ -1572,7 +1572,7 @@ static struct sk_buff *link_insert_deferred_queue(struct link *l_ptr,
 static int link_recv_buf_validate(struct sk_buff *buf)
 {
 	static u32 min_data_hdr_size[8] = {
-		SHORT_H_SIZE, MCAST_H_SIZE, LONG_H_SIZE, DIR_MSG_H_SIZE,
+		SHORT_H_SIZE, MCAST_H_SIZE, NAMED_H_SIZE, BASIC_H_SIZE,
 		MAX_H_SIZE, MAX_H_SIZE, MAX_H_SIZE, MAX_H_SIZE
 		};
 
@@ -2553,7 +2553,7 @@ int tipc_link_recv_fragment(struct sk_buff **pending, struct sk_buff **fb,
 		u32 msg_sz = msg_size(imsg);
 		u32 fragm_sz = msg_data_sz(fragm);
 		u32 exp_fragm_cnt = msg_sz/fragm_sz + !!(msg_sz % fragm_sz);
-		u32 max =  TIPC_MAX_USER_MSG_SIZE + LONG_H_SIZE;
+		u32 max =  TIPC_MAX_USER_MSG_SIZE + NAMED_H_SIZE;
 		if (msg_type(imsg) == TIPC_MCAST_MSG)
 			max = TIPC_MAX_USER_MSG_SIZE + MCAST_H_SIZE;
 		if (msg_size(imsg) > max) {
diff --git a/net/tipc/msg.h b/net/tipc/msg.h
index a58975c..d93178f 100644
--- a/net/tipc/msg.h
+++ b/net/tipc/msg.h
@@ -68,10 +68,10 @@
  * Message header sizes
  */
 
-#define SHORT_H_SIZE              24	/* Connected, in-cluster messages */
-#define DIR_MSG_H_SIZE            32	/* Directly addressed messages */
-#define LONG_H_SIZE               40	/* Named messages */
-#define MCAST_H_SIZE              44	/* Multicast messages */
+#define SHORT_H_SIZE              24	/* In-cluster basic payload message */
+#define BASIC_H_SIZE              32	/* Basic payload message */
+#define NAMED_H_SIZE              40	/* Named payload message */
+#define MCAST_H_SIZE              44	/* Multicast payload message */
 #define INT_H_SIZE                40	/* Internal messages */
 #define MIN_H_SIZE                24	/* Smallest legal TIPC header size */
 #define MAX_H_SIZE                60	/* Largest possible TIPC header size */
@@ -357,7 +357,7 @@ static inline void msg_set_mc_netid(struct tipc_msg *m, u32 p)
 
 static inline int msg_short(struct tipc_msg *m)
 {
-	return msg_hdr_sz(m) == 24;
+	return msg_hdr_sz(m) == SHORT_H_SIZE;
 }
 
 static inline u32 msg_orignode(struct tipc_msg *m)
diff --git a/net/tipc/name_distr.c b/net/tipc/name_distr.c
index 80025a1..cd356e5 100644
--- a/net/tipc/name_distr.c
+++ b/net/tipc/name_distr.c
@@ -94,13 +94,13 @@ static void publ_to_item(struct distr_item *i, struct publication *p)
 
 static struct sk_buff *named_prepare_buf(u32 type, u32 size, u32 dest)
 {
-	struct sk_buff *buf = tipc_buf_acquire(LONG_H_SIZE + size);
+	struct sk_buff *buf = tipc_buf_acquire(INT_H_SIZE + size);
 	struct tipc_msg *msg;
 
 	if (buf != NULL) {
 		msg = buf_msg(buf);
-		tipc_msg_init(msg, NAME_DISTRIBUTOR, type, LONG_H_SIZE, dest);
-		msg_set_size(msg, LONG_H_SIZE + size);
+		tipc_msg_init(msg, NAME_DISTRIBUTOR, type, INT_H_SIZE, dest);
+		msg_set_size(msg, INT_H_SIZE + size);
 	}
 	return buf;
 }
diff --git a/net/tipc/port.c b/net/tipc/port.c
index 5311817..70e9de5 100644
--- a/net/tipc/port.c
+++ b/net/tipc/port.c
@@ -222,7 +222,7 @@ struct tipc_port *tipc_createport_raw(void *usr_handle,
 	p_ptr->max_pkt = MAX_PKT_DEFAULT;
 	p_ptr->ref = ref;
 	msg = &p_ptr->phdr;
-	tipc_msg_init(msg, importance, TIPC_NAMED_MSG, LONG_H_SIZE, 0);
+	tipc_msg_init(msg, importance, TIPC_NAMED_MSG, NAMED_H_SIZE, 0);
 	msg_set_origport(msg, ref);
 	INIT_LIST_HEAD(&p_ptr->wait_list);
 	INIT_LIST_HEAD(&p_ptr->subscription.nodesub_list);
@@ -339,10 +339,10 @@ static struct sk_buff *port_build_proto_msg(u32 destport, u32 destnode,
 	struct sk_buff *buf;
 	struct tipc_msg *msg;
 
-	buf = tipc_buf_acquire(LONG_H_SIZE);
+	buf = tipc_buf_acquire(INT_H_SIZE);
 	if (buf) {
 		msg = buf_msg(buf);
-		tipc_msg_init(msg, usr, type, LONG_H_SIZE, destnode);
+		tipc_msg_init(msg, usr, type, INT_H_SIZE, destnode);
 		msg_set_errcode(msg, err);
 		msg_set_destport(msg, destport);
 		msg_set_origport(msg, origport);
@@ -1247,7 +1247,7 @@ int tipc_send2name(u32 ref, struct tipc_name const *name, unsigned int domain,
 	msg_set_type(msg, TIPC_NAMED_MSG);
 	msg_set_orignode(msg, tipc_own_addr);
 	msg_set_origport(msg, ref);
-	msg_set_hdr_sz(msg, LONG_H_SIZE);
+	msg_set_hdr_sz(msg, NAMED_H_SIZE);
 	msg_set_nametype(msg, name->type);
 	msg_set_nameinst(msg, name->instance);
 	msg_set_lookup_scope(msg, tipc_addr_scope(domain));
@@ -1300,7 +1300,7 @@ int tipc_send2port(u32 ref, struct tipc_portid const *dest,
 	msg_set_origport(msg, ref);
 	msg_set_destnode(msg, dest->node);
 	msg_set_destport(msg, dest->ref);
-	msg_set_hdr_sz(msg, DIR_MSG_H_SIZE);
+	msg_set_hdr_sz(msg, BASIC_H_SIZE);
 
 	if (dest->node == tipc_own_addr)
 		res =  tipc_port_recv_sections(p_ptr, num_sect, msg_sect,
@@ -1340,13 +1340,13 @@ int tipc_send_buf2port(u32 ref, struct tipc_portid const *dest,
 	msg_set_origport(msg, ref);
 	msg_set_destnode(msg, dest->node);
 	msg_set_destport(msg, dest->ref);
-	msg_set_hdr_sz(msg, DIR_MSG_H_SIZE);
-	msg_set_size(msg, DIR_MSG_H_SIZE + dsz);
-	if (skb_cow(buf, DIR_MSG_H_SIZE))
+	msg_set_hdr_sz(msg, BASIC_H_SIZE);
+	msg_set_size(msg, BASIC_H_SIZE + dsz);
+	if (skb_cow(buf, BASIC_H_SIZE))
 		return -ENOMEM;
 
-	skb_push(buf, DIR_MSG_H_SIZE);
-	skb_copy_to_linear_data(buf, msg, DIR_MSG_H_SIZE);
+	skb_push(buf, BASIC_H_SIZE);
+	skb_copy_to_linear_data(buf, msg, BASIC_H_SIZE);
 
 	if (dest->node == tipc_own_addr)
 		res = tipc_port_recv_msg(buf);
-- 
1.7.4.4
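
The renamed header-size constants above can be read as a per-message-type table of minimum header sizes, as in the min_data_hdr_size[] array touched by this patch. The sketch below illustrates that reading in user space. The size values and table order are copied from the diff; the enum values for the payload message types and the hdr_size_plausible() helper are assumptions made for illustration only and are not taken from the patch.

```c
#include <assert.h>

/* Header sizes as defined in net/tipc/msg.h after this patch. */
#define SHORT_H_SIZE 24	/* In-cluster basic payload message */
#define BASIC_H_SIZE 32	/* Basic payload message */
#define NAMED_H_SIZE 40	/* Named payload message */
#define MCAST_H_SIZE 44	/* Multicast payload message */
#define MAX_H_SIZE   60	/* Largest possible TIPC header size */

/* Assumed payload message type numbering (illustrative names). */
enum payload_type { PT_CONN = 0, PT_MCAST = 1, PT_NAMED = 2, PT_DIRECT = 3 };

/* Smallest header a message of the given type may carry. */
static unsigned int min_hdr_size(unsigned int type)
{
	static const unsigned int min_size[8] = {
		SHORT_H_SIZE, MCAST_H_SIZE, NAMED_H_SIZE, BASIC_H_SIZE,
		MAX_H_SIZE, MAX_H_SIZE, MAX_H_SIZE, MAX_H_SIZE
	};

	assert(type < 8);
	return min_size[type];
}

/* Hypothetical check: header size lies between the type's minimum and the maximum. */
static int hdr_size_plausible(unsigned int type, unsigned int hdr_sz)
{
	return hdr_sz >= min_hdr_size(type) && hdr_sz <= MAX_H_SIZE;
}
```

Under these assumptions a 40-byte header is acceptable for a named message but too short for a multicast one, which needs at least 44 bytes.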



* [PATCH net-next 18/20] tipc: Reject connection protocol message sent to unconnected port
From: Paul Gortmaker @ 2011-06-24 22:07 UTC (permalink / raw)
  To: davem; +Cc: netdev, Allan.Stephens, Allan Stephens, Paul Gortmaker
In-Reply-To: <1308953247-25266-1-git-send-email-paul.gortmaker@windriver.com>

From: Allan Stephens <allan.stephens@windriver.com>

Restructures the logic used in tipc_port_recv_proto_msg() to ensure
that incoming connection protocol messages are handled properly. The
routine now uses a two-stage process that first ensures the message
applies to an existing connection and then processes the request.
This corrects a loophole that allowed a connection probe request to
be processed if it was sent to an unconnected port that had no names
bound to it.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
---
 net/tipc/port.c |   79 ++++++++++++++++++++++++++++---------------------------
 1 files changed, 40 insertions(+), 39 deletions(-)

diff --git a/net/tipc/port.c b/net/tipc/port.c
index 8be68e0..1b20b96 100644
--- a/net/tipc/port.c
+++ b/net/tipc/port.c
@@ -526,62 +526,63 @@ static struct sk_buff *port_build_peer_abort_msg(struct tipc_port *p_ptr, u32 er
 void tipc_port_recv_proto_msg(struct sk_buff *buf)
 {
 	struct tipc_msg *msg = buf_msg(buf);
-	struct tipc_port *p_ptr = tipc_port_lock(msg_destport(msg));
-	u32 err = TIPC_OK;
+	struct tipc_port *p_ptr;
 	struct sk_buff *r_buf = NULL;
-	struct sk_buff *abort_buf = NULL;
-
-	if (!p_ptr) {
-		err = TIPC_ERR_NO_PORT;
-	} else if (p_ptr->connected) {
-		if ((port_peernode(p_ptr) != msg_orignode(msg)) ||
-		    (port_peerport(p_ptr) != msg_origport(msg))) {
-			err = TIPC_ERR_NO_PORT;
-		} else if (msg_type(msg) == CONN_ACK) {
-			int wakeup = tipc_port_congested(p_ptr) &&
-				     p_ptr->congested &&
-				     p_ptr->wakeup;
-			p_ptr->acked += msg_msgcnt(msg);
-			if (tipc_port_congested(p_ptr))
-				goto exit;
-			p_ptr->congested = 0;
-			if (!wakeup)
-				goto exit;
-			p_ptr->wakeup(p_ptr);
-			goto exit;
-		}
-	} else if (p_ptr->published) {
-		err = TIPC_ERR_NO_PORT;
-	}
-	if (err) {
-		r_buf = port_build_proto_msg(msg_origport(msg),
-					     msg_orignode(msg),
-					     msg_destport(msg),
+	u32 orignode = msg_orignode(msg);
+	u32 origport = msg_origport(msg);
+	u32 destport = msg_destport(msg);
+	int wakeable;
+
+	/* Validate connection */
+
+	p_ptr = tipc_port_lock(destport);
+	if (!p_ptr || !p_ptr->connected ||
+	    (port_peernode(p_ptr) != orignode) ||
+	    (port_peerport(p_ptr) != origport)) {
+		r_buf = port_build_proto_msg(origport,
+					     orignode,
+					     destport,
 					     tipc_own_addr,
 					     TIPC_HIGH_IMPORTANCE,
 					     TIPC_CONN_MSG,
-					     err,
+					     TIPC_ERR_NO_PORT,
 					     0);
+		if (p_ptr)
+			tipc_port_unlock(p_ptr);
 		goto exit;
 	}
 
-	/* All is fine */
-	if (msg_type(msg) == CONN_PROBE) {
-		r_buf = port_build_proto_msg(msg_origport(msg),
-					     msg_orignode(msg),
-					     msg_destport(msg),
+	/* Process protocol message sent by peer */
+
+	switch (msg_type(msg)) {
+	case CONN_ACK:
+		wakeable = tipc_port_congested(p_ptr) && p_ptr->congested &&
+			p_ptr->wakeup;
+		p_ptr->acked += msg_msgcnt(msg);
+		if (!tipc_port_congested(p_ptr)) {
+			p_ptr->congested = 0;
+			if (wakeable)
+				p_ptr->wakeup(p_ptr);
+		}
+		break;
+	case CONN_PROBE:
+		r_buf = port_build_proto_msg(origport,
+					     orignode,
+					     destport,
 					     tipc_own_addr,
 					     CONN_MANAGER,
 					     CONN_PROBE_REPLY,
 					     TIPC_OK,
 					     0);
+		break;
+	default:
+		/* CONN_PROBE_REPLY or unrecognized - no action required */
+		break;
 	}
 	p_ptr->probing_state = CONFIRMED;
+	tipc_port_unlock(p_ptr);
 exit:
-	if (p_ptr)
-		tipc_port_unlock(p_ptr);
 	tipc_net_route_msg(r_buf);
-	tipc_net_route_msg(abort_buf);
 	buf_discard(buf);
 }
 
-- 
1.7.4.4



* [PATCH net-next 17/20] tipc: Optimize creation of FIN messages
From: Paul Gortmaker @ 2011-06-24 22:07 UTC (permalink / raw)
  To: davem; +Cc: netdev, Allan.Stephens, Allan Stephens, Paul Gortmaker
In-Reply-To: <1308953247-25266-1-git-send-email-paul.gortmaker@windriver.com>

From: Allan Stephens <allan.stephens@windriver.com>

Speeds up the creation of the FIN message that terminates a TIPC
connection. The typical peer termination message is now created by
duplicating the terminating port's standard payload message header
and adjusting the message size, importance, and error code fields,
rather than building all fields of the message from scratch. A FIN
message that is directed to the port itself is created the same way,
but also requires swapping the origin and destination address fields.

In addition to reducing the work required to create FIN messages,
these changes eliminate several instances of duplicated code,

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
---
 net/tipc/port.c |   61 +++++++++++++++++++++---------------------------------
 1 files changed, 24 insertions(+), 37 deletions(-)

diff --git a/net/tipc/port.c b/net/tipc/port.c
index 70e9de5..8be68e0 100644
--- a/net/tipc/port.c
+++ b/net/tipc/port.c
@@ -489,39 +489,38 @@ static void port_handle_node_down(unsigned long ref)
 
 static struct sk_buff *port_build_self_abort_msg(struct tipc_port *p_ptr, u32 err)
 {
-	u32 imp = msg_importance(&p_ptr->phdr);
+	struct sk_buff *buf = port_build_peer_abort_msg(p_ptr, err);
 
-	if (!p_ptr->connected)
-		return NULL;
-	if (imp < TIPC_CRITICAL_IMPORTANCE)
-		imp++;
-	return port_build_proto_msg(p_ptr->ref,
-				    tipc_own_addr,
-				    port_peerport(p_ptr),
-				    port_peernode(p_ptr),
-				    imp,
-				    TIPC_CONN_MSG,
-				    err,
-				    0);
+	if (buf) {
+		struct tipc_msg *msg = buf_msg(buf);
+		msg_swap_words(msg, 4, 5);
+		msg_swap_words(msg, 6, 7);
+	}
+	return buf;
 }
 
 
 static struct sk_buff *port_build_peer_abort_msg(struct tipc_port *p_ptr, u32 err)
 {
-	u32 imp = msg_importance(&p_ptr->phdr);
+	struct sk_buff *buf;
+	struct tipc_msg *msg;
+	u32 imp;
 
 	if (!p_ptr->connected)
 		return NULL;
-	if (imp < TIPC_CRITICAL_IMPORTANCE)
-		imp++;
-	return port_build_proto_msg(port_peerport(p_ptr),
-				    port_peernode(p_ptr),
-				    p_ptr->ref,
-				    tipc_own_addr,
-				    imp,
-				    TIPC_CONN_MSG,
-				    err,
-				    0);
+
+	buf = tipc_buf_acquire(BASIC_H_SIZE);
+	if (buf) {
+		msg = buf_msg(buf);
+		memcpy(msg, &p_ptr->phdr, BASIC_H_SIZE);
+		msg_set_hdr_sz(msg, BASIC_H_SIZE);
+		msg_set_size(msg, BASIC_H_SIZE);
+		imp = msg_importance(msg);
+		if (imp < TIPC_CRITICAL_IMPORTANCE)
+			msg_set_importance(msg, ++imp);
+		msg_set_errcode(msg, err);
+	}
+	return buf;
 }
 
 void tipc_port_recv_proto_msg(struct sk_buff *buf)
@@ -1149,19 +1148,7 @@ int tipc_shutdown(u32 ref)
 	if (!p_ptr)
 		return -EINVAL;
 
-	if (p_ptr->connected) {
-		u32 imp = msg_importance(&p_ptr->phdr);
-		if (imp < TIPC_CRITICAL_IMPORTANCE)
-			imp++;
-		buf = port_build_proto_msg(port_peerport(p_ptr),
-					   port_peernode(p_ptr),
-					   ref,
-					   tipc_own_addr,
-					   imp,
-					   TIPC_CONN_MSG,
-					   TIPC_CONN_SHUTDOWN,
-					   0);
-	}
+	buf = port_build_peer_abort_msg(p_ptr, TIPC_CONN_SHUTDOWN);
 	tipc_port_unlock(p_ptr);
 	tipc_net_route_msg(buf);
 	return tipc_disconnect(ref);
-- 
1.7.4.4



* [PATCH net-next 19/20] tipc: Don't create payload message using connection protocol routine
From: Paul Gortmaker @ 2011-06-24 22:07 UTC (permalink / raw)
  To: davem; +Cc: netdev, Allan.Stephens, Allan Stephens, Paul Gortmaker
In-Reply-To: <1308953247-25266-1-git-send-email-paul.gortmaker@windriver.com>

From: Allan Stephens <allan.stephens@windriver.com>

Modifies the logic that creates a connection termination payload
message so that it no longer (mis)uses a routine that creates a
connection protocol message. The revised code is now more easily
understood, and avoids setting several fields that are either not
present in payload messages or were being set more than once.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
---
 net/tipc/port.c |   17 +++++++++--------
 1 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/net/tipc/port.c b/net/tipc/port.c
index 1b20b96..ab0a8e9 100644
--- a/net/tipc/port.c
+++ b/net/tipc/port.c
@@ -539,14 +539,15 @@ void tipc_port_recv_proto_msg(struct sk_buff *buf)
 	if (!p_ptr || !p_ptr->connected ||
 	    (port_peernode(p_ptr) != orignode) ||
 	    (port_peerport(p_ptr) != origport)) {
-		r_buf = port_build_proto_msg(origport,
-					     orignode,
-					     destport,
-					     tipc_own_addr,
-					     TIPC_HIGH_IMPORTANCE,
-					     TIPC_CONN_MSG,
-					     TIPC_ERR_NO_PORT,
-					     0);
+		r_buf = tipc_buf_acquire(BASIC_H_SIZE);
+		if (r_buf) {
+			msg = buf_msg(r_buf);
+			tipc_msg_init(msg, TIPC_HIGH_IMPORTANCE, TIPC_CONN_MSG,
+				      BASIC_H_SIZE, orignode);
+			msg_set_errcode(msg, TIPC_ERR_NO_PORT);
+			msg_set_origport(msg, destport);
+			msg_set_destport(msg, origport);
+		}
 		if (p_ptr)
 			tipc_port_unlock(p_ptr);
 		goto exit;
-- 
1.7.4.4



* [PATCH net-next 20/20] tipc: Optimize creation of connection protocol messages
From: Paul Gortmaker @ 2011-06-24 22:07 UTC (permalink / raw)
  To: davem; +Cc: netdev, Allan.Stephens, Allan Stephens, Paul Gortmaker
In-Reply-To: <1308953247-25266-1-git-send-email-paul.gortmaker@windriver.com>

From: Allan Stephens <allan.stephens@windriver.com>

Simplifies the creation of connection protocol messages by eliminating
the passing of information that is no longer required, is constant,
or is contained within the port structure that is issuing the message.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
---
 net/tipc/port.c |   48 ++++++++++++------------------------------------
 1 files changed, 12 insertions(+), 36 deletions(-)

diff --git a/net/tipc/port.c b/net/tipc/port.c
index ab0a8e9..54d812a 100644
--- a/net/tipc/port.c
+++ b/net/tipc/port.c
@@ -327,14 +327,12 @@ int tipc_set_portunreturnable(u32 ref, unsigned int isunrejectable)
 }
 
 /*
- * port_build_proto_msg(): build a port level protocol
- * or a connection abortion message. Called with
- * tipc_port lock on.
+ * port_build_proto_msg(): create connection protocol message for port
+ *
+ * On entry the port must be locked and connected.
  */
-static struct sk_buff *port_build_proto_msg(u32 destport, u32 destnode,
-					    u32 origport, u32 orignode,
-					    u32 usr, u32 type, u32 err,
-					    u32 ack)
+static struct sk_buff *port_build_proto_msg(struct tipc_port *p_ptr,
+					    u32 type, u32 ack)
 {
 	struct sk_buff *buf;
 	struct tipc_msg *msg;
@@ -342,11 +340,10 @@ static struct sk_buff *port_build_proto_msg(u32 destport, u32 destnode,
 	buf = tipc_buf_acquire(INT_H_SIZE);
 	if (buf) {
 		msg = buf_msg(buf);
-		tipc_msg_init(msg, usr, type, INT_H_SIZE, destnode);
-		msg_set_errcode(msg, err);
-		msg_set_destport(msg, destport);
-		msg_set_origport(msg, origport);
-		msg_set_orignode(msg, orignode);
+		tipc_msg_init(msg, CONN_MANAGER, type, INT_H_SIZE,
+			      port_peernode(p_ptr));
+		msg_set_destport(msg, port_peerport(p_ptr));
+		msg_set_origport(msg, p_ptr->ref);
 		msg_set_msgcnt(msg, ack);
 	}
 	return buf;
@@ -458,14 +455,7 @@ static void port_timeout(unsigned long ref)
 	if (p_ptr->probing_state == PROBING) {
 		buf = port_build_self_abort_msg(p_ptr, TIPC_ERR_NO_PORT);
 	} else {
-		buf = port_build_proto_msg(port_peerport(p_ptr),
-					   port_peernode(p_ptr),
-					   p_ptr->ref,
-					   tipc_own_addr,
-					   CONN_MANAGER,
-					   CONN_PROBE,
-					   TIPC_OK,
-					   0);
+		buf = port_build_proto_msg(p_ptr, CONN_PROBE, 0);
 		p_ptr->probing_state = PROBING;
 		k_start_timer(&p_ptr->timer, p_ptr->probing_interval);
 	}
@@ -567,14 +557,7 @@ void tipc_port_recv_proto_msg(struct sk_buff *buf)
 		}
 		break;
 	case CONN_PROBE:
-		r_buf = port_build_proto_msg(origport,
-					     orignode,
-					     destport,
-					     tipc_own_addr,
-					     CONN_MANAGER,
-					     CONN_PROBE_REPLY,
-					     TIPC_OK,
-					     0);
+		r_buf = port_build_proto_msg(p_ptr, CONN_PROBE_REPLY, 0);
 		break;
 	default:
 		/* CONN_PROBE_REPLY or unrecognized - no action required */
@@ -899,14 +882,7 @@ void tipc_acknowledge(u32 ref, u32 ack)
 		return;
 	if (p_ptr->connected) {
 		p_ptr->conn_unacked -= ack;
-		buf = port_build_proto_msg(port_peerport(p_ptr),
-					   port_peernode(p_ptr),
-					   ref,
-					   tipc_own_addr,
-					   CONN_MANAGER,
-					   CONN_ACK,
-					   TIPC_OK,
-					   ack);
+		buf = port_build_proto_msg(p_ptr, CONN_ACK, ack);
 	}
 	tipc_port_unlock(p_ptr);
 	tipc_net_route_msg(buf);
-- 
1.7.4.4



* Re: SKB paged fragment lifecycle on receive
From: Ian Campbell @ 2011-06-24 22:44 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: netdev@vger.kernel.org, xen-devel, Rusty Russell
In-Reply-To: <4E04C961.9010302@goop.org>

On Fri, 2011-06-24 at 18:29 +0100, Jeremy Fitzhardinge wrote:
> On 06/24/2011 08:43 AM, Ian Campbell wrote:
> > We've previously looked into solutions using the skb destructor callback
> > but that falls over if the skb is cloned since you also need to know
> > when the clone is destroyed. Jeremy Fitzhardinge and I subsequently
> > looked at the possibility of a no-clone skb flag (i.e. always forcing a
> > copy instead of a clone) but IIRC honouring it universally turned into a
> > very twisty maze with a number of nasty corner cases etc. It also seemed
> > that the proportion of SKBs which get cloned at least once appeared as
> > if it could be quite high which would presumably make the performance
> > impact unacceptable when using the flag. Another issue with using the
> > skb destructor is that functions such as __pskb_pull_tail will eat (and
> > free) pages from the start of the frag array such that by the time the
> > skb destructor is called they are no longer there.
> >
> > AIUI Rusty Russell had previously looked into a per-page destructor in
> > the shinfo but found that it couldn't be made to work (I don't remember
> > why, or if I even knew at the time). Could that be an approach worth
> > reinvestigating?
> >
> > I can't really think of any other solution which doesn't involve some
> > sort of driver callback at the time a page is free()d.
> 
> One simple approach would be to simply make sure that we retain a page
> reference on any granted pages so that the network stack's put pages
> will never result in them being released back to the kernel.  We can
> also install an skb destructor.  If it sees a page being released with a
> refcount of 1, then we know it's our own reference and can free the page
> immediately.  If the refcount is > 1 then we can add it to a queue of
> pending pages, which can be periodically polled to free pages whose
> other references have been dropped.

One problem with this is that some functions (__pskb_pull_tail) drop the
ref count and then remove the page from the skb's fraglist. So by the
time the destructor is called you have lost the page and cannot do the
refcount checking.

I suppose we could keep a queue of _all_ pages we ever put in an SKB
and poll it. We could still check for pages with count==1 in the
destructor. That is apart from the other issues with the destructor
(not being copied over on clone etc.), which I reckon would cause us to
fall back to polling the queue more often than not.
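
For illustration, the scheme discussed above (the driver holds its own reference, returns a page at once when the count is 1, and otherwise queues it for later polling) might look roughly like this in user space. Everything here is hypothetical: struct granted_page, page_released(), and poll_pending() are invented names, and real code would need locking and would operate on struct page reference counts.

```c
#include <assert.h>
#include <stddef.h>

struct granted_page {
	int refcount;			/* includes the driver's own reference */
	int returned;			/* set once handed back to its owner */
	struct granted_page *next;	/* link in the pending queue */
};

static struct granted_page *pending;	/* pages with outstanding references */

/* Drop the driver's (last) reference and hand the page back. */
static void give_back(struct granted_page *pg)
{
	assert(pg->refcount == 1);
	pg->refcount = 0;
	pg->returned = 1;
}

/* Called when the driver sees a page being released. */
static void page_released(struct granted_page *pg)
{
	if (pg->refcount == 1) {
		give_back(pg);		/* only our reference is left */
	} else {
		pg->next = pending;	/* someone else still holds it */
		pending = pg;
	}
}

/* Periodic poll: return queued pages whose other references are gone. */
static int poll_pending(void)
{
	struct granted_page **link = &pending;
	int reclaimed = 0;

	while (*link) {
		struct granted_page *pg = *link;

		if (pg->refcount == 1) {
			*link = pg->next;
			pg->next = NULL;
			give_back(pg);
			reclaimed++;
		} else {
			link = &pg->next;
		}
	}
	return reclaimed;
}
```

The sketch also shows the weakness raised in the thread: a page is only reclaimed promptly if page_released() is actually called for it, and anything that slips past the destructor is left to the poll.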

> That said, I think an event-based rather than polling-based mechanism
> would be much preferable.

Absolutely.

> > I expect that wrapping the uses of get/put_page in a network specific
> > wrapper (e.g. skb_{get,frag}_frag(skb, nr)) would be a useful first step
> > in any solution. That's a pretty big task/patch in itself but could be
> > done. Might it be worthwhile for its own sake?
> 
> Is there some way to do it so that you'd get compiler warnings/errors in
> missed cases?  I guess wrapping "struct page" in some other type would go
> some way to helping.

I was thinking it could be done by changing the field name (e.g. even
just to _frag), add the wrapper and fixup everything grep could find and
then run an allBLAHconfig build, fix the compile errors, repeat.

Once the transition is complete we would have the option of putting the
name back -- since it would only mean changing the wrapper. Although I
don't know if we would necessarily want that since otherwise new
open-coded users will likely creep in.
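
As a rough illustration of the rename-plus-wrapper approach, the following user-space sketch uses simplified, hypothetical stand-ins (`struct page_model`, `struct frag_model` and the `frag_model_*` helpers are not kernel names). Renaming the member to `_frag` means any code still writing `frag->page` fails to compile, which is what makes the missed cases show up in an all-config build.

```c
#include <assert.h>

/* Hypothetical, simplified stand-ins for struct page and skb_frag_t. */
struct page_model {
	int refcount;
};

struct frag_model {
	struct page_model *_frag;	/* renamed from "page": old users stop compiling */
	unsigned int offset;
	unsigned int size;
};

/* The only sanctioned ways to take or drop a reference on a fragment's page. */
static inline void frag_model_get(struct frag_model *f)
{
	f->_frag->refcount++;
}

static inline void frag_model_put(struct frag_model *f)
{
	assert(f->_frag->refcount > 0);
	f->_frag->refcount--;
}

/* Accessor, so callers never name the member directly. */
static inline struct page_model *frag_model_page(const struct frag_model *f)
{
	return f->_frag;
}
```

With every get/put funnelled through two helpers, a later change (for example a per-fragment destructor hook) would only need to touch the helpers.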

> > Does anyone have any ideas or advice for other approaches I could try
> > (either on the driver or stack side)?
> >
> > FWIW I proposed a session on the subject for LPC this year. The proposal
> > was for the virtualisation track although as I say I think the class of
> > problem reaches a bit wider than that. Whether the session will be a
> > discussion around ways of solving the issue or a presentation on the
> > solution remains to be seen ;-)
> >
> > Ian.
> >
> > [0] at least with a mainline kernel, in the older out-of-tree Xen stuff
> > we had a PageForeign page-flag and a destructor function in a spare
> > struct page field which was called from the mm free routines
> > (free_pages_prepare and free_hot_cold_page). I'm under no illusions
> > about the upstreamability of this approach...
> 
> When I last asked AKPM about this - a long time ago - the problem was
> that we'd simply run out of page flags (at least on 32-bit x86), so it
> simply wasn't implementable.  But since then the page flags have been
> rearranged and I think there's less pressure on them - but they're still
> a valuable resource, so the justification would need to be strong (ie,
> multiple convincing users).
> 
>     J




* Re: [Pv-drivers] [PATCH] vmxnet3: Enable GRO support.
From: Jesse Gross @ 2011-06-24 22:47 UTC (permalink / raw)
  To: Scott Goldman
  Cc: David Miller, Shreyas Bhatewara, VMware PV-Drivers,
	netdev@vger.kernel.org
In-Reply-To: <03E840D17E263A48A5766AD576E0423A0129262BB5@exch-mbx-111.vmware.com>

On Fri, Jun 24, 2011 at 3:02 PM, Scott Goldman <scottjg@vmware.com> wrote:
>> -                     netif_receive_skb(skb);
>> +                     napi_gro_receive(&rq->napi, skb);
>
> So... this doesn't discriminate between LRO being off or on.  The last time I tried using GRO on top of our hardware LRO, there was actually some minor performance penalty. Do you have any benchmarks showing that this is ok? If not, do you think it might make sense to just do GRO only if (unlikely(lro is off))?

I ran some benchmarks and do see a slight performance drop with GRO
when LRO is also on, so it seems reasonable to avoid it in that
situation.  I can resubmit with that change.
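
The agreed change amounts to a one-line selection of the receive path. A minimal model, with hypothetical names (`enum rx_path`, `choose_rx_path`) and a plain boolean standing in for however the driver learns that LRO is enabled:

```c
#include <assert.h>
#include <stdbool.h>

/* Which delivery call the driver would make for a received skb. */
enum rx_path {
	RX_PLAIN,	/* models netif_receive_skb(skb) */
	RX_GRO		/* models napi_gro_receive(&rq->napi, skb) */
};

/* Hand the skb to GRO only when the device is not already doing LRO. */
static inline enum rx_path choose_rx_path(bool lro_enabled)
{
	return lro_enabled ? RX_PLAIN : RX_GRO;
}
```

This is only a sketch of the decision, not the resubmitted patch.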

As an aside, in many cases the hypervisor actually has all of the
information that is necessary to keep LRO on but does not provide it
to the guest.  For example, in the VM-to-VM case the MSS is provided
by the sender as part of the TSO descriptor and if given to the
receiver we could generate a GSO frame and avoid the need to do GRO in
the first place.  Do you know if it is possible to do this?


* Re: SKB paged fragment lifecycle on receive
From: Jeremy Fitzhardinge @ 2011-06-24 22:48 UTC (permalink / raw)
  To: Ian Campbell; +Cc: netdev@vger.kernel.org, Rusty Russell, xen-devel
In-Reply-To: <1308955477.5807.8.camel@dagon.hellion.org.uk>

On 06/24/2011 03:44 PM, Ian Campbell wrote:
> One problem with this is that some functions (__pskb_pull_tail) drop the
> ref count and then remove the page from the skb's fraglist. So by the
> time the destructor is called you have lost the page and cannot do the
> refcount checking.
>
> I suppose we could keep a queue of _all_ pages we ever put in an SKB
> which we poll.

Right, that seems like the only way to make sure we don't lose anything.

>  We could still check for pages with count==1 in the
> destructor. That still leaves the other issues, such as the destructor not
> being copied over on clone, which I reckon would cause us to fall back to
> polling the queue more often than not.

Yeah, sounds like it.

>> That said, I think an event-based rather than polling based mechanism
>> would be much more preferable.
> Absolutely.

From what Eric and David were saying, it seems we don't really have any
other workable options.

    J


* RE: [Pv-drivers] [PATCH] vmxnet3: Enable GRO support.
From: Scott Goldman @ 2011-06-24 23:23 UTC (permalink / raw)
  To: Jesse Gross
  Cc: David Miller, Shreyas Bhatewara, VMware PV-Drivers,
	netdev@vger.kernel.org
In-Reply-To: <BANLkTi=OJSnGQ99cG-QaaxxMp1od3KGCKQ@mail.gmail.com>

> I ran some benchmarks and do see a slight performance drop with GRO
> when LRO is also on, so it seems reasonable to avoid it in that
> situation.  I can resubmit with that change.

Cool, thanks.

> As an aside, in many cases the hypervisor actually has all of the
> information that is necessary to keep LRO on but does not provide it
> to the guest.  For example, in the VM-to-VM case the MSS is provided
> by the sender as part of the TSO descriptor and if given to the
> receiver we could generate a GSO frame and avoid the need to do GRO in
> the first place.  Do you know if it is possible to do this?

I think that sounds like a pretty good idea, but if I understand correctly, that change needs to go in the hypervisor, not just the driver. The device emulation backend needs to populate that MSS somewhere in the receive descriptor. I will file an internal bug about it, but just to set expectations, ESX 5.0 is about to be released, so realistically at the earliest, this change may not be publicly available for another year.
	
-sjg

P.S. Ronghua, the vmxnet3 mastermind, works at Nicira now. If you see him, tell him I said hi.


* Re: [PATCH] bridge: Forward EAPOL Kconfig option BRIDGE_PAE_FORWARD
From: Nick Carter @ 2011-06-24 23:33 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev, davem
In-Reply-To: <BANLkTin4XOCmpFaETsjkYb2kk+psZUBKcA@mail.gmail.com>

Updated diffs addressing Stephen's comments.

diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
index d9d1e2b..a401ed4 100644
--- a/net/bridge/br_if.c
+++ b/net/bridge/br_if.c
@@ -214,6 +214,7 @@ static struct net_device *new_bridge_dev(struct net *net, const char *name)
 	br->topology_change = 0;
 	br->topology_change_detected = 0;
 	br->ageing_time = 300 * HZ;
+	br->pae_forward = false;

 	br_netfilter_rtable_init(br);

diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c
index 90e985b..79b03fa 100644
--- a/net/bridge/br_input.c
+++ b/net/bridge/br_input.c
@@ -98,6 +98,14 @@ int br_handle_frame_finish(struct sk_buff *skb)
 	}

 	if (skb) {
+		/* Prevent Crosstalk where a Supplicant on one Port attempts to
+		 * interfere with authentications occurring on another Port.
+		 * (IEEE Std 802.1X-2001 C.3.3)
+		 */
+		if (unlikely(!br->pae_forward &&
+		    skb->protocol == htons(ETH_P_PAE)))
+			goto drop;
+
 		if (dst)
 			br_forward(dst->dst, skb, skb2);
 		else
@@ -166,6 +174,10 @@ struct sk_buff *br_handle_frame(struct sk_buff *skb)
 		if (p->br->stp_enabled == BR_NO_STP && dest[5] == 0)
 			goto forward;

+		/* Check if PAE frame should be forwarded */
+		if (p->br->pae_forward && skb->protocol == htons(ETH_P_PAE))
+			goto forward;
+
 		if (NF_HOOK(NFPROTO_BRIDGE, NF_BR_LOCAL_IN, skb, skb->dev,
 			    NULL, br_handle_local_finish))
 			return NULL;	/* frame consumed by filter */
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index 4e1b620..8977d66 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -244,6 +244,8 @@ struct net_bridge
 	struct timer_list		multicast_query_timer;
 #endif

+	bool pae_forward;		/* 802.1x frames forwarded / dropped */
+
 	struct timer_list		hello_timer;
 	struct timer_list		tcn_timer;
 	struct timer_list		topology_change_timer;
diff --git a/net/bridge/br_sysfs_br.c b/net/bridge/br_sysfs_br.c
index 5c1e555..de3550f 100644
--- a/net/bridge/br_sysfs_br.c
+++ b/net/bridge/br_sysfs_br.c
@@ -679,6 +679,28 @@ static DEVICE_ATTR(nf_call_arptables, S_IRUGO | S_IWUSR,
 		   show_nf_call_arptables, store_nf_call_arptables);
 #endif

+static ssize_t show_pae_forward(struct device *d,
+				struct device_attribute *attr, char *buf)
+{
+	struct net_bridge *br = to_bridge(d);
+	return sprintf(buf, "%d\n", br->pae_forward);
+}
+
+static int set_pae_forward(struct net_bridge *br, unsigned long val)
+{
+	br->pae_forward = val ? true : false;
+	return 0;
+}
+
+static ssize_t store_pae_forward(struct device *d,
+				 struct device_attribute *attr, const char *buf,
+				 size_t len)
+{
+	return store_bridge_parm(d, buf, len, set_pae_forward);
+}
+static DEVICE_ATTR(pae_forward, S_IRUGO | S_IWUSR, show_pae_forward,
+		   store_pae_forward);
+
 static struct attribute *bridge_attrs[] = {
 	&dev_attr_forward_delay.attr,
 	&dev_attr_hello_time.attr,
@@ -698,6 +720,7 @@ static struct attribute *bridge_attrs[] = {
 	&dev_attr_gc_timer.attr,
 	&dev_attr_group_addr.attr,
 	&dev_attr_flush.attr,
+	&dev_attr_pae_forward.attr,
 #ifdef CONFIG_BRIDGE_IGMP_SNOOPING
 	&dev_attr_multicast_router.attr,
 	&dev_attr_multicast_snooping.attr,
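
The combined effect of the two br_input.c hunks can be summarised in a small user-space model. This is a simplification under stated assumptions: `enum pae_verdict` and `bridge_verdict` are hypothetical names, the STP and netfilter paths are left out, and `link_local` stands for a destination in the reserved 01:80:C2:00:00:0X group range handled in br_handle_frame.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ETH_P_PAE_MODEL 0x888E	/* EtherType of 802.1X (EAPOL), host byte order */

enum pae_verdict {
	PAE_FORWARD,	/* frame is forwarded across the bridge */
	PAE_LOCAL_ONLY,	/* frame is handed to the local stack only */
	PAE_DROP	/* frame is dropped */
};

static inline enum pae_verdict bridge_verdict(bool pae_forward, bool link_local,
					      uint16_t ethertype)
{
	bool is_pae = (ethertype == ETH_P_PAE_MODEL);

	/* br_handle_frame: link-local frames normally stay local. */
	if (link_local)
		return (is_pae && pae_forward) ? PAE_FORWARD : PAE_LOCAL_ONLY;

	/* br_handle_frame_finish: unicast EAPOL must not cross the bridge. */
	if (is_pae && !pae_forward)
		return PAE_DROP;

	return PAE_FORWARD;
}
```

With pae_forward left at its default of false, multicast EAPOL is not forwarded and unicast EAPOL is dropped; setting the attribute to 1 lets both through.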

On 24 June 2011 22:29, Nick Carter <ncarter100@gmail.com> wrote:
> On 24 June 2011 20:08, Stephen Hemminger
> <shemminger@linux-foundation.org> wrote:
>> On Fri, 24 Jun 2011 19:29:41 +0100
>> Nick Carter <ncarter100@gmail.com> wrote:
>>
>>> New diffs below with the Kconfig option removed as requested.
>>>
>>> Now all users and distros will get the correct 802.1x bridge
>>> behaviour by default.  That is EAPOL frames attempting to traverse the
>>> bridge will be dropped (IEEE Std 802.1X-2001 C.3.3).
>>>
>>> Users or distros who want the non-standard behaviour of forwarding
>>> EAPOL frames, can use a simple runtime configuration change to the
>>> sysfs bridge/pae_forward attribute.
>>
>> This is much better, thanks.
>> See the comments for how to make the code more compact and tighter.
>>
>>> diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
>>> index d9d1e2b..91c1b71 100644
>>> --- a/net/bridge/br_if.c
>>> +++ b/net/bridge/br_if.c
>>> @@ -214,6 +214,7 @@ static struct net_device *new_bridge_dev(struct net *net, const char *name)
>>>       br->topology_change = 0;
>>>       br->topology_change_detected = 0;
>>>       br->ageing_time = 300 * HZ;
>>> +     br->pae_forward = BR_PAE_DEFAULT;
>>
>> It is just a boolean, why the verbose enum values?
> In case we want BR_PAE_<foo> in the future, not that I can think of a
> 3rd option now.  So happy to change to a boolean.
>>
>>>       br_netfilter_rtable_init(br);
>>>
>>> diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c
>>> index 90e985b..edeb92d 100644
>>> --- a/net/bridge/br_input.c
>>> +++ b/net/bridge/br_input.c
>>> @@ -43,6 +43,16 @@ static int br_pass_frame_up(struct sk_buff *skb)
>>>                      netif_receive_skb);
>>>  }
>>>
>>> +static inline bool br_pae_forward(struct net_bridge *br, __be16 proto)
>>> +{
>>> +     return br->pae_forward == BR_PAE_FORWARD && proto == htons(ETH_P_PAE);
>>> +}
>>> +
>>> +static inline bool br_pae_drop(struct net_bridge *br, __be16 proto)
>>> +{
>>> +     return br->pae_forward == BR_PAE_DEFAULT && proto == htons(ETH_P_PAE);
>>> +}
>>
>> Since only used one place, the extra wrappers aren't helping.
> I thought they helped readability, but certainly for performance we
> should only be doing each check once in a single place.  Again happy
> to change.
>>
>>>  /* note: already called with rcu_read_lock */
>>>  int br_handle_frame_finish(struct sk_buff *skb)
>>>  {
>>> @@ -98,6 +108,10 @@ int br_handle_frame_finish(struct sk_buff *skb)
>>>       }
>>>
>>>       if (skb) {
>>> +             /* Prevent Crosstalk (IEEE Std 802.1X-2001 C.3.3) */
>>> +             if (unlikely(br_pae_drop(br, skb->protocol)))
>>> +                     goto drop;
>>> +
>>
>> Referencing standard is good, but perhaps explaining what that means.
> ok
>
>> Since these are multicast frames, will it ever reach this point?
>> This point is reached for unicast frames that are not local.
> yes, think of it as a bug fix rather than part of new functionality
>
>> And won't this change existing behavior, since before this 802.1x unicast
>> frames would be forwarded?
> Yes, that was my original motivation for making it a Kconfig setting,
> so there would be no chance of regressions.  But keep in mind that an
> 802.1x handshake must start with a multicast.  It's only if that
> multicast is delivered that the reply can be unicast.  So anyone
> relying on the existing behaviour of forwarding unicast 802.1x must be
> doing something very strange and non-standard.  I can't imagine what.
> If there is a valid use case then they now have the simple workaround
> of enabling pae forwarding.
>
>>>               if (dst)
>>>                       br_forward(dst->dst, skb, skb2);
>>>               else
>>> @@ -166,6 +180,10 @@ struct sk_buff *br_handle_frame(struct sk_buff *skb)
>>>               if (p->br->stp_enabled == BR_NO_STP && dest[5] == 0)
>>>                       goto forward;
>>>
>>> +             /* Check if PAE frame should be forwarded */
>>> +             if (br_pae_forward(p->br, skb->protocol))
>>> +                     goto forward;
>>> +
>>>               if (NF_HOOK(NFPROTO_BRIDGE, NF_BR_LOCAL_IN, skb, skb->dev,
>>>                           NULL, br_handle_local_finish))
>>>                       return NULL;    /* frame consumed by filter */
>>> diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
>>> index 4e1b620..683c057 100644
>>> --- a/net/bridge/br_private.h
>>> +++ b/net/bridge/br_private.h
>>> @@ -244,6 +244,11 @@ struct net_bridge
>>>       struct timer_list               multicast_query_timer;
>>>  #endif
>>>
>>> +     enum {
>>> +             BR_PAE_DEFAULT,         /* 802.1x frames consumed by bridge */
>>> +             BR_PAE_FORWARD,         /* 802.1x frames forwarded by bridge */
>>> +     } pae_forward;
>>> +
>>>       struct timer_list               hello_timer;
>>>       struct timer_list               tcn_timer;
>>>       struct timer_list               topology_change_timer;
>>> diff --git a/net/bridge/br_sysfs_br.c b/net/bridge/br_sysfs_br.c
>>> index 5c1e555..9bdbc84 100644
>>> --- a/net/bridge/br_sysfs_br.c
>>> +++ b/net/bridge/br_sysfs_br.c
>>> @@ -679,6 +679,31 @@ static DEVICE_ATTR(nf_call_arptables, S_IRUGO | S_IWUSR,
>>>                  show_nf_call_arptables, store_nf_call_arptables);
>>>  #endif
>>>
>>> +static ssize_t show_pae_forward(struct device *d,
>>> +                             struct device_attribute *attr, char *buf)
>>> +{
>>> +     struct net_bridge *br = to_bridge(d);
>>> +     return sprintf(buf, "%d\n", br->pae_forward);
>>> +}
>>> +
>>> +static int set_pae_forward(struct net_bridge *br, unsigned long val)
>>> +{
>>> +     if (val > BR_PAE_FORWARD)
>>> +             return -EINVAL;
>>> +
>>> +     br->pae_forward = val;
>>> +     return 0;
>>> +}
>>> +
>>> +static ssize_t store_pae_forward(struct device *d,
>>> +                              struct device_attribute *attr, const char *buf,
>>> +                              size_t len)
>>> +{
>>> +     return store_bridge_parm(d, buf, len, set_pae_forward);
>>> +}
>>> +static DEVICE_ATTR(pae_forward, S_IRUGO | S_IWUSR, show_pae_forward,
>>> +                store_pae_forward);
>>> +
>>>  static struct attribute *bridge_attrs[] = {
>>>       &dev_attr_forward_delay.attr,
>>>       &dev_attr_hello_time.attr,
>>> @@ -698,6 +723,7 @@ static struct attribute *bridge_attrs[] = {
>>>       &dev_attr_gc_timer.attr,
>>>       &dev_attr_group_addr.attr,
>>>       &dev_attr_flush.attr,
>>> +     &dev_attr_pae_forward.attr,
>>>  #ifdef CONFIG_BRIDGE_IGMP_SNOOPING
>>>       &dev_attr_multicast_router.attr,
>>>       &dev_attr_multicast_snooping.attr,
>>
>>
>


* Re: [RFC PATCH 0/1] BPF JIT for PPC64
From: Benjamin Herrenschmidt @ 2011-06-24 23:33 UTC (permalink / raw)
  To: Kumar Gala; +Cc: Matt Evans, netdev, linuxppc-dev
In-Reply-To: <750BF3E9-A788-4302-8AA1-4630614E20DC@kernel.crashing.org>

On Fri, 2011-06-24 at 04:16 -0500, Kumar Gala wrote:
> > Tested in-situ (tcpdump with varying complexity filters) and with a random BPF
> > generator; I haven't verified loads from the fall back skb_copy_bits path.  Bug
> > reports/testing would be very welcome.
> 
> Would be nice to get PPC32 support as well. 

Patches welcome :-)

Cheers,
Ben.




* [PATCH 0/2 v2] Fix ipv6 routing table entry limit.
From: David Miller @ 2011-06-24 23:47 UTC (permalink / raw)
  To: netdev; +Cc: sim


As discussed in other threads, the routing table lives in the same
data structure as cached routing entries.  This enforces a false
limit on the number of routing table entries one can install
on ipv6 and this is becoming a real problem for people.

Fix this by adding a DST_NOCOUNT flag and use it for ipv6 routing
table entries.

No special handling is necessary when cloning or copying since
the flags are never directly copied by the clone/copy code in
ipv6.
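
As a rough user-space model of the accounting change described here (the `*_model` names are hypothetical; only the 0x0020 bit value is taken from the patch), allocation and destruction both skip the per-ops entry counter when the flag is set:

```c
#include <assert.h>

#define DST_NOCOUNT_MODEL 0x0020	/* same bit the patch assigns to DST_NOCOUNT */

struct dst_ops_model {
	int entries;			/* what the GC limit is checked against */
};

struct dst_model {
	struct dst_ops_model *ops;
	unsigned int flags;
};

/* Models dst_alloc(): count the entry unless the caller opted out. */
static inline void dst_model_alloc(struct dst_model *dst,
				   struct dst_ops_model *ops,
				   unsigned int flags)
{
	dst->ops = ops;
	dst->flags = flags;
	if (!(flags & DST_NOCOUNT_MODEL))
		ops->entries++;
}

/* Models the destroy path: only uncount what was counted. */
static inline void dst_model_destroy(struct dst_model *dst)
{
	if (!(dst->flags & DST_NOCOUNT_MODEL)) {
		assert(dst->ops->entries > 0);
		dst->ops->entries--;
	}
}
```

In this model an administrator-installed route allocated with the flag never moves the counter, while clones created with flags of 0 are counted and uncounted as before.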

This is a 2 patch series now in order to sanitize the dst->flags
setting done in net/ipv6/route.c.
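
Why the flags cleanup has to come first can be shown with two tiny helpers. The names are hypothetical, and the DST_HOST bit value of 0x0001 is an assumption here; only 0x0020 for DST_NOCOUNT comes from the series itself.

```c
#include <assert.h>

#define DST_HOST_MODEL    0x0001	/* assumed value, for illustration only */
#define DST_NOCOUNT_MODEL 0x0020	/* value used by the series */

/* Models "rt->dst.flags = DST_HOST": everything set earlier is lost. */
static inline unsigned int mark_host_by_assignment(unsigned int flags)
{
	(void)flags;			/* previous flags are discarded */
	return DST_HOST_MODEL;
}

/* Models "rt->dst.flags |= DST_HOST": earlier flags survive. */
static inline unsigned int mark_host_by_or(unsigned int flags)
{
	return flags | DST_HOST_MODEL;
}
```

If a /128 route were allocated with DST_NOCOUNT and then marked as a host route by plain assignment, the flag would be gone by the time the entry is destroyed, and the counter would be decremented for an entry that was never counted.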

I've tested this by adding and removing ~16K ipv6 routes over
and over again, the routing cache limit never got hit.

Simon, you'll need to work out that crash you were seeing since
I can't reproduce it on any of my machines.


* [PATCH 1/2 v2] ipv6: Don't change dst->flags using assignments.
From: David Miller @ 2011-06-24 23:47 UTC (permalink / raw)
  To: netdev; +Cc: sim


This blows away any flags already set in the entry.

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/ipv6/route.c |   12 ++----------
 1 files changed, 2 insertions(+), 10 deletions(-)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index de2b1de..c2af4da 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1062,14 +1062,6 @@ struct dst_entry *icmp6_dst_alloc(struct net_device *dev,
 	dst_metric_set(&rt->dst, RTAX_HOPLIMIT, 255);
 	rt->dst.output  = ip6_output;
 
-#if 0	/* there's no chance to use these for ndisc */
-	rt->dst.flags   = ipv6_addr_type(addr) & IPV6_ADDR_UNICAST
-				? DST_HOST
-				: 0;
-	ipv6_addr_copy(&rt->rt6i_dst.addr, addr);
-	rt->rt6i_dst.plen = 128;
-#endif
-
 	spin_lock_bh(&icmp6_dst_lock);
 	rt->dst.next = icmp6_dst_gc_list;
 	icmp6_dst_gc_list = &rt->dst;
@@ -1244,7 +1236,7 @@ int ip6_route_add(struct fib6_config *cfg)
 	ipv6_addr_prefix(&rt->rt6i_dst.addr, &cfg->fc_dst, cfg->fc_dst_len);
 	rt->rt6i_dst.plen = cfg->fc_dst_len;
 	if (rt->rt6i_dst.plen == 128)
-	       rt->dst.flags = DST_HOST;
+	       rt->dst.flags |= DST_HOST;
 
 #ifdef CONFIG_IPV6_SUBTREES
 	ipv6_addr_prefix(&rt->rt6i_src.addr, &cfg->fc_src, cfg->fc_src_len);
@@ -2025,7 +2017,7 @@ struct rt6_info *addrconf_dst_alloc(struct inet6_dev *idev,
 
 	in6_dev_hold(idev);
 
-	rt->dst.flags = DST_HOST;
+	rt->dst.flags |= DST_HOST;
 	rt->dst.input = ip6_input;
 	rt->dst.output = ip6_output;
 	rt->rt6i_idev = idev;
-- 
1.7.5.4



* [PATCH 2/2 v2] ipv6: Don't put artificial limit on routing table size.
From: David Miller @ 2011-06-24 23:47 UTC (permalink / raw)
  To: netdev; +Cc: sim


IPV6, unlike IPV4, doesn't have a routing cache.

Routing table entries, as well as clones made in response
to route lookup requests, all live in the same table.  And
all of these things are together collected in the destination
cache table for ipv6.

This means that routing table entries count against the garbage
collection limits, even though such entries cannot ever be reclaimed
and are added explicitly by the administrator (rather than being
created in response to lookups).

Therefore it makes no sense to count ipv6 routing table entries
against the GC limits.

Add a DST_NOCOUNT destination cache entry flag, and skip the counting
if it is set.  Use this flag bit in ipv6 when adding routing table
entries.

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/dst.h |    1 +
 net/core/dst.c    |    6 ++++--
 net/ipv6/route.c  |   13 +++++++------
 3 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/include/net/dst.h b/include/net/dst.h
index 7d15d23..e12ddfb 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -77,6 +77,7 @@ struct dst_entry {
 #define DST_NOPOLICY		0x0004
 #define DST_NOHASH		0x0008
 #define DST_NOCACHE		0x0010
+#define DST_NOCOUNT		0x0020
 	union {
 		struct dst_entry	*next;
 		struct rtable __rcu	*rt_next;
diff --git a/net/core/dst.c b/net/core/dst.c
index 9ccca03..6135f36 100644
--- a/net/core/dst.c
+++ b/net/core/dst.c
@@ -190,7 +190,8 @@ void *dst_alloc(struct dst_ops *ops, struct net_device *dev,
 	dst->lastuse = jiffies;
 	dst->flags = flags;
 	dst->next = NULL;
-	dst_entries_add(ops, 1);
+	if (!(flags & DST_NOCOUNT))
+		dst_entries_add(ops, 1);
 	return dst;
 }
 EXPORT_SYMBOL(dst_alloc);
@@ -243,7 +244,8 @@ again:
 		neigh_release(neigh);
 	}
 
-	dst_entries_add(dst->ops, -1);
+	if (!(dst->flags & DST_NOCOUNT))
+		dst_entries_add(dst->ops, -1);
 
 	if (dst->ops->destroy)
 		dst->ops->destroy(dst);
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index c2af4da..0ef1f08 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -228,9 +228,10 @@ static struct rt6_info ip6_blk_hole_entry_template = {
 
 /* allocate dst with ip6_dst_ops */
 static inline struct rt6_info *ip6_dst_alloc(struct dst_ops *ops,
-					     struct net_device *dev)
+					     struct net_device *dev,
+					     int flags)
 {
-	struct rt6_info *rt = dst_alloc(ops, dev, 0, 0, 0);
+	struct rt6_info *rt = dst_alloc(ops, dev, 0, 0, flags);
 
 	memset(&rt->rt6i_table, 0, sizeof(*rt) - sizeof(struct dst_entry));
 
@@ -1042,7 +1043,7 @@ struct dst_entry *icmp6_dst_alloc(struct net_device *dev,
 	if (unlikely(idev == NULL))
 		return NULL;
 
-	rt = ip6_dst_alloc(&net->ipv6.ip6_dst_ops, dev);
+	rt = ip6_dst_alloc(&net->ipv6.ip6_dst_ops, dev, 0);
 	if (unlikely(rt == NULL)) {
 		in6_dev_put(idev);
 		goto out;
@@ -1206,7 +1207,7 @@ int ip6_route_add(struct fib6_config *cfg)
 		goto out;
 	}
 
-	rt = ip6_dst_alloc(&net->ipv6.ip6_dst_ops, NULL);
+	rt = ip6_dst_alloc(&net->ipv6.ip6_dst_ops, NULL, DST_NOCOUNT);
 
 	if (rt == NULL) {
 		err = -ENOMEM;
@@ -1726,7 +1727,7 @@ static struct rt6_info * ip6_rt_copy(struct rt6_info *ort)
 {
 	struct net *net = dev_net(ort->rt6i_dev);
 	struct rt6_info *rt = ip6_dst_alloc(&net->ipv6.ip6_dst_ops,
-					    ort->dst.dev);
+					    ort->dst.dev, 0);
 
 	if (rt) {
 		rt->dst.input = ort->dst.input;
@@ -2005,7 +2006,7 @@ struct rt6_info *addrconf_dst_alloc(struct inet6_dev *idev,
 {
 	struct net *net = dev_net(idev->dev);
 	struct rt6_info *rt = ip6_dst_alloc(&net->ipv6.ip6_dst_ops,
-					    net->loopback_dev);
+					    net->loopback_dev, 0);
 	struct neighbour *neigh;
 
 	if (rt == NULL) {
-- 
1.7.5.4



* Re: [PATCH net-next 00/20] misc tipc updates / enhancements
From: David Miller @ 2011-06-24 23:55 UTC (permalink / raw)
  To: paul.gortmaker; +Cc: netdev, Allan.Stephens
In-Reply-To: <1308953247-25266-1-git-send-email-paul.gortmaker@windriver.com>

From: Paul Gortmaker <paul.gortmaker@windriver.com>
Date: Fri, 24 Jun 2011 18:07:07 -0400

> A bit more dead code removal, some collapsing of functions with
> too many arguments, and some cosmetic stuff with no real impact.
> 
> But I think the best part in this lot is getting rid of the internal
> (to tipc) duplication of "almost" linked list like code, and just
> having it use the normal shared kernel code for the task. 
> The diffstat summary reflects the net gain here:
> 
>        12 files changed, 276 insertions(+), 381 deletions(-)
> 
> All credit to Al for the work to get here.  I'm just an intermediate
> reviewer -- and happy to see my value-add in that role becoming smaller
> and smaller each time as tipc-2 leaves the SF tipc-1.7.x further behind.
> 
> I've independently tested using the basic tipcTS/tipcTC tests between
> an x86-32 and an x86-64 host, in both directions.

Pulled, thanks a lot.
