Netdev List


* Re: Increased Latencies when upgrading kernel version
From: Eric Dumazet @ 2010-04-01 21:19 UTC (permalink / raw)
  To: Taylor Lewick; +Cc: netdev, linux-kernel
In-Reply-To: <o2vd585dc4f1004011212s65cc2beewed2f19321210a249@mail.gmail.com>

On Thursday, 1 April 2010 at 14:12 -0500, Taylor Lewick wrote:
> For some time now we've been running an older kernel, 2.6.16.60.  When
> we tried to upgrade, first going to 2.6.27.19 and then to 2.6.32.1 and
> 2.6.33.1 we noticed that latencies increased.  At first we noticed it
> by doing network tests via udpping, netperf, etc.  We made some
> tweaks, and were able to get network latency to within 1 to 2
> microseconds of where we were previously on 2.6.16.60.  Then we did
> some more testing, and noticed that system latency also seems higher.
> 
> We've done our tests on identical hardware servers, same NICs,
> connected through same network gear.  Basically, we've tried to keep
> everything identical except the kernel versions, and we are unable to
> achieve the same performance for system latency on the newer kernels,
> despite adjusting various kernel settings and recompiling.
> 
> The latency differences are about 15 microseconds per transaction.
> 
> At this point, I don't know what else to try.  I haven't played around
> with the /proc/sys/kernel/sched_* parameters under the newer kernels
> yet.  Have tried changing pre-emption modes with little effect, in
> fact, voluntary preemption seems to be performing the best for us.
> 
> At this time the realtime patch isn't really an option for us to
> consider, at least not yet.
> 
> Any suggestions?  Is this a known issue when upgrading to more recent
> kernel versions?
> 

Hi Taylor

Well, it is a bit difficult to give a generic answer to your generic
question. 15 us more latency per transaction seems pretty bad.

Some inputs would be nice, describing your workload and
software/hardware architecture.

lspci
cat /proc/cpuinfo
cat /proc/interrupts
dmesg
ethtool -S eth0
ethtool -c eth0





* Re: [PATCH] netlabel: Fix several rcu_dereference() calls used without RCU read locks
From: Eric Dumazet @ 2010-04-01 20:56 UTC (permalink / raw)
  To: Paul Moore; +Cc: netdev, Paul E. McKenney
In-Reply-To: <20100401204357.9795.80383.stgit@flek.lan>

On Thursday, 1 April 2010 at 16:43 -0400, Paul Moore wrote:
> The recent changes to add RCU lock verification to rcu_dereference() calls
> caught out a problem with netlbl_unlhsh_hash(), see below.
> 
>  ===================================================
>  [ INFO: suspicious rcu_dereference_check() usage. ]
>  ---------------------------------------------------
>  net/netlabel/netlabel_unlabeled.c:246 invoked rcu_dereference_check()
>  without protection!
> 
> This patch fixes this problem as well as others like it in the NetLabel
> code.  Also included in this patch is the identification of future work
> to eliminate the RCU read lock in netlbl_domhsh_add(), but in the interest
> of getting this patch out quickly that work will happen in another patch
> to be finished later.
> 
> Thanks to Eric Dumazet and Paul McKenney for their help in understanding
> the recent RCU changes.
> 
> Signed-off-by: Paul Moore <paul.moore@hp.com>
> Reported-by: David Howells <dhowells@redhat.com>
> CC: Eric Dumazet <eric.dumazet@gmail.com>
> CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> ---
>  net/netlabel/netlabel_domainhash.c |   28 ++++++++++-----
>  net/netlabel/netlabel_unlabeled.c  |   66 ++++++++++--------------------------
>  2 files changed, 37 insertions(+), 57 deletions(-)
> 

Acked-by: Eric Dumazet <eric.dumazet@gmail.com>

Thanks Paul
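
For readers following the thread, the checked-dereference pattern the
patch introduces can be sketched in userspace C.  This is only an
illustration under stated assumptions, not the kernel implementation:
reader_depth, table_lock_held, table_deref_checked() and the helper
functions are hypothetical stand-ins for rcu_read_lock_held(),
lockdep_is_held() and netlbl_domhsh_rcu_deref(), and a plain assert()
takes the place of the lockdep warning.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for the state that the kernel
 * tracks through lockdep. */
static int reader_depth;      /* > 0 while inside a read-side section */
static bool table_lock_held;  /* true while the update lock is held */

struct table {
	size_t size;              /* number of buckets, a power of two */
};

static struct table *table_ptr;

static void reader_enter(void) { reader_depth++; }
static void reader_exit(void)  { reader_depth--; }
static void writer_lock(void)   { table_lock_held = true; }
static void writer_unlock(void) { table_lock_held = false; }

/* Same shape as netlbl_domhsh_rcu_deref(): the pointer may be read
 * under either the read-side lock or the update lock, and the check
 * fires if neither is held. */
#define table_deref_checked(p) \
	(assert(reader_depth > 0 || table_lock_held), (p))

/* Bucket mask, analogous to the "->size - 1" in the hash functions. */
static size_t table_mask(void)
{
	return table_deref_checked(table_ptr)->size - 1;
}
```

Calling table_mask() between reader_enter()/reader_exit() or between
writer_lock()/writer_unlock() passes the check; calling it with
neither held trips the assert, which is roughly what the "suspicious
rcu_dereference_check() usage" report does in the kernel.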




* Re: [PATCH net-next-2.6] net: add support for htonb and ntohb
From: Eric Dumazet @ 2010-04-01 20:47 UTC (permalink / raw)
  To: Andy Gospodarek; +Cc: netdev
In-Reply-To: <20100401203209.GC28741@gospo.rdu.redhat.com>

On Thursday, 1 April 2010 at 16:32 -0400, Andy Gospodarek wrote:
> After my post to net-2.6 this week that accidentally applied htons to a
> u8, it is clear to me we _must_ add some infrastructure to make sure
> single bytes are in the correct network and host order on big and little
> endian systems.  Today seemed like the perfect day to post this.
> 
> This patch adds basic support for htonb and ntohb.  Patches to add this
> in the entire networking tree in _every_ case where a single byte is
> accessed will be posted next week -- I'm almost done with them!
> 
> Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
> ---

Seems fine, thanks a lot Andy.

Hmm, I notice this doesn't yet handle arches where a byte is 9 bits long.

Do you have any plans to finally support PDP-6/10?
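
For reference, a minimal sketch of what byte-order conversion does to
16-bit and 8-bit values, assuming 8-bit bytes.  swap16() and swap8()
are hypothetical helper names used only for this illustration; they
are not the kernel's __swab16() or the proposed __swab8().

```c
#include <assert.h>
#include <stdint.h>

/* Exchange the two bytes of a 16-bit value, which is what htons()
 * amounts to on a little-endian host. */
static uint16_t swap16(uint16_t v)
{
	return (uint16_t)((v << 8) | (v >> 8));
}

/* A single byte contains no other bytes to reorder, so the only
 * conversion that preserves the value is the identity, on
 * big-endian and little-endian hosts alike. */
static uint8_t swap8(uint8_t v)
{
	return v;
}
```

With these helpers swap16(0x1234) gives 0x3412, while swap8(0xab)
stays 0xab.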





* [PATCH] netlabel: Fix several rcu_dereference() calls used without RCU read locks
From: Paul Moore @ 2010-04-01 20:43 UTC (permalink / raw)
  To: netdev

The recent changes to add RCU lock verification to rcu_dereference() calls
caught out a problem with netlbl_unlhsh_hash(), see below.

 ===================================================
 [ INFO: suspicious rcu_dereference_check() usage. ]
 ---------------------------------------------------
 net/netlabel/netlabel_unlabeled.c:246 invoked rcu_dereference_check()
 without protection!

This patch fixes this problem as well as others like it in the NetLabel
code.  Also included in this patch is the identification of future work
to eliminate the RCU read lock in netlbl_domhsh_add(), but in the interest
of getting this patch out quickly that work will happen in another patch
to be finished later.

Thanks to Eric Dumazet and Paul McKenney for their help in understanding
the recent RCU changes.

Signed-off-by: Paul Moore <paul.moore@hp.com>
Reported-by: David Howells <dhowells@redhat.com>
CC: Eric Dumazet <eric.dumazet@gmail.com>
CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 net/netlabel/netlabel_domainhash.c |   28 ++++++++++-----
 net/netlabel/netlabel_unlabeled.c  |   66 ++++++++++--------------------------
 2 files changed, 37 insertions(+), 57 deletions(-)

diff --git a/net/netlabel/netlabel_domainhash.c b/net/netlabel/netlabel_domainhash.c
index 0bfeaab..06ab41b 100644
--- a/net/netlabel/netlabel_domainhash.c
+++ b/net/netlabel/netlabel_domainhash.c
@@ -50,9 +50,12 @@ struct netlbl_domhsh_tbl {
 };
 
 /* Domain hash table */
-/* XXX - updates should be so rare that having one spinlock for the entire
- * hash table should be okay */
+/* updates should be so rare that having one spinlock for the entire hash table
+ * should be okay */
 static DEFINE_SPINLOCK(netlbl_domhsh_lock);
+#define netlbl_domhsh_rcu_deref(p) \
+	rcu_dereference_check(p, rcu_read_lock_held() || \
+				 lockdep_is_held(&netlbl_domhsh_lock))
 static struct netlbl_domhsh_tbl *netlbl_domhsh = NULL;
 static struct netlbl_dom_map *netlbl_domhsh_def = NULL;
 
@@ -106,7 +109,8 @@ static void netlbl_domhsh_free_entry(struct rcu_head *entry)
  * Description:
  * This is the hashing function for the domain hash table, it returns the
  * correct bucket number for the domain.  The caller is responsibile for
- * calling the rcu_read_[un]lock() functions.
+ * ensuring that the hash table is protected with either a RCU read lock or the
+ * hash table lock.
  *
  */
 static u32 netlbl_domhsh_hash(const char *key)
@@ -120,7 +124,7 @@ static u32 netlbl_domhsh_hash(const char *key)
 
 	for (iter = 0, val = 0, len = strlen(key); iter < len; iter++)
 		val = (val << 4 | (val >> (8 * sizeof(u32) - 4))) ^ key[iter];
-	return val & (rcu_dereference(netlbl_domhsh)->size - 1);
+	return val & (netlbl_domhsh_rcu_deref(netlbl_domhsh)->size - 1);
 }
 
 /**
@@ -130,7 +134,8 @@ static u32 netlbl_domhsh_hash(const char *key)
  * Description:
  * Searches the domain hash table and returns a pointer to the hash table
  * entry if found, otherwise NULL is returned.  The caller is responsibile for
- * the rcu hash table locks (i.e. the caller much call rcu_read_[un]lock()).
+ * ensuring that the hash table is protected with either a RCU read lock or the
+ * hash table lock.
  *
  */
 static struct netlbl_dom_map *netlbl_domhsh_search(const char *domain)
@@ -141,7 +146,7 @@ static struct netlbl_dom_map *netlbl_domhsh_search(const char *domain)
 
 	if (domain != NULL) {
 		bkt = netlbl_domhsh_hash(domain);
-		bkt_list = &rcu_dereference(netlbl_domhsh)->tbl[bkt];
+		bkt_list = &netlbl_domhsh_rcu_deref(netlbl_domhsh)->tbl[bkt];
 		list_for_each_entry_rcu(iter, bkt_list, list)
 			if (iter->valid && strcmp(iter->domain, domain) == 0)
 				return iter;
@@ -159,8 +164,8 @@ static struct netlbl_dom_map *netlbl_domhsh_search(const char *domain)
  * Searches the domain hash table and returns a pointer to the hash table
  * entry if an exact match is found, if an exact match is not present in the
  * hash table then the default entry is returned if valid otherwise NULL is
- * returned.  The caller is responsibile for the rcu hash table locks
- * (i.e. the caller much call rcu_read_[un]lock()).
+ * returned.  The caller is responsible for ensuring that the hash table is
+ * protected with either a RCU read lock or the hash table lock.
  *
  */
 static struct netlbl_dom_map *netlbl_domhsh_search_def(const char *domain)
@@ -169,7 +174,7 @@ static struct netlbl_dom_map *netlbl_domhsh_search_def(const char *domain)
 
 	entry = netlbl_domhsh_search(domain);
 	if (entry == NULL) {
-		entry = rcu_dereference(netlbl_domhsh_def);
+		entry = netlbl_domhsh_rcu_deref(netlbl_domhsh_def);
 		if (entry != NULL && !entry->valid)
 			entry = NULL;
 	}
@@ -306,8 +311,11 @@ int netlbl_domhsh_add(struct netlbl_dom_map *entry,
 	struct netlbl_af6list *tmp6;
 #endif /* IPv6 */
 
+	/* XXX - we can remove this RCU read lock as the spinlock protects the
+	 *       entire function, but before we do we need to fixup the
+	 *       netlbl_af[4,6]list RCU functions to do "the right thing" with
+	 *       respect to rcu_dereference() when only a spinlock is held. */
 	rcu_read_lock();
-
 	spin_lock(&netlbl_domhsh_lock);
 	if (entry->domain != NULL)
 		entry_old = netlbl_domhsh_search(entry->domain);
diff --git a/net/netlabel/netlabel_unlabeled.c b/net/netlabel/netlabel_unlabeled.c
index 852d9d7..3b4fde7 100644
--- a/net/netlabel/netlabel_unlabeled.c
+++ b/net/netlabel/netlabel_unlabeled.c
@@ -114,6 +114,9 @@ struct netlbl_unlhsh_walk_arg {
 /* updates should be so rare that having one spinlock for the entire
  * hash table should be okay */
 static DEFINE_SPINLOCK(netlbl_unlhsh_lock);
+#define netlbl_unlhsh_rcu_deref(p) \
+	rcu_dereference_check(p, rcu_read_lock_held() || \
+				 lockdep_is_held(&netlbl_unlhsh_lock))
 static struct netlbl_unlhsh_tbl *netlbl_unlhsh = NULL;
 static struct netlbl_unlhsh_iface *netlbl_unlhsh_def = NULL;
 
@@ -235,15 +238,13 @@ static void netlbl_unlhsh_free_iface(struct rcu_head *entry)
  * Description:
  * This is the hashing function for the unlabeled hash table, it returns the
  * bucket number for the given device/interface.  The caller is responsible for
- * calling the rcu_read_[un]lock() functions.
+ * ensuring that the hash table is protected with either a RCU read lock or
+ * the hash table lock.
  *
  */
 static u32 netlbl_unlhsh_hash(int ifindex)
 {
-	/* this is taken _almost_ directly from
-	 * security/selinux/netif.c:sel_netif_hasfn() as they do pretty much
-	 * the same thing */
-	return ifindex & (rcu_dereference(netlbl_unlhsh)->size - 1);
+	return ifindex & (netlbl_unlhsh_rcu_deref(netlbl_unlhsh)->size - 1);
 }
 
 /**
@@ -253,7 +254,8 @@ static u32 netlbl_unlhsh_hash(int ifindex)
  * Description:
  * Searches the unlabeled connection hash table and returns a pointer to the
  * interface entry which matches @ifindex, otherwise NULL is returned.  The
- * caller is responsible for calling the rcu_read_[un]lock() functions.
+ * caller is responsible for ensuring that the hash table is protected with
+ * either a RCU read lock or the hash table lock.
  *
  */
 static struct netlbl_unlhsh_iface *netlbl_unlhsh_search_iface(int ifindex)
@@ -263,7 +265,7 @@ static struct netlbl_unlhsh_iface *netlbl_unlhsh_search_iface(int ifindex)
 	struct netlbl_unlhsh_iface *iter;
 
 	bkt = netlbl_unlhsh_hash(ifindex);
-	bkt_list = &rcu_dereference(netlbl_unlhsh)->tbl[bkt];
+	bkt_list = &netlbl_unlhsh_rcu_deref(netlbl_unlhsh)->tbl[bkt];
 	list_for_each_entry_rcu(iter, bkt_list, list)
 		if (iter->valid && iter->ifindex == ifindex)
 			return iter;
@@ -272,33 +274,6 @@ static struct netlbl_unlhsh_iface *netlbl_unlhsh_search_iface(int ifindex)
 }
 
 /**
- * netlbl_unlhsh_search_iface_def - Search for a matching interface entry
- * @ifindex: the network interface
- *
- * Description:
- * Searches the unlabeled connection hash table and returns a pointer to the
- * interface entry which matches @ifindex.  If an exact match can not be found
- * and there is a valid default entry, the default entry is returned, otherwise
- * NULL is returned.  The caller is responsible for calling the
- * rcu_read_[un]lock() functions.
- *
- */
-static struct netlbl_unlhsh_iface *netlbl_unlhsh_search_iface_def(int ifindex)
-{
-	struct netlbl_unlhsh_iface *entry;
-
-	entry = netlbl_unlhsh_search_iface(ifindex);
-	if (entry != NULL)
-		return entry;
-
-	entry = rcu_dereference(netlbl_unlhsh_def);
-	if (entry != NULL && entry->valid)
-		return entry;
-
-	return NULL;
-}
-
-/**
  * netlbl_unlhsh_add_addr4 - Add a new IPv4 address entry to the hash table
  * @iface: the associated interface entry
  * @addr: IPv4 address in network byte order
@@ -308,8 +283,7 @@ static struct netlbl_unlhsh_iface *netlbl_unlhsh_search_iface_def(int ifindex)
  * Description:
  * Add a new address entry into the unlabeled connection hash table using the
  * interface entry specified by @iface.  On success zero is returned, otherwise
- * a negative value is returned.  The caller is responsible for calling the
- * rcu_read_[un]lock() functions.
+ * a negative value is returned.
  *
  */
 static int netlbl_unlhsh_add_addr4(struct netlbl_unlhsh_iface *iface,
@@ -349,8 +323,7 @@ static int netlbl_unlhsh_add_addr4(struct netlbl_unlhsh_iface *iface,
  * Description:
  * Add a new address entry into the unlabeled connection hash table using the
  * interface entry specified by @iface.  On success zero is returned, otherwise
- * a negative value is returned.  The caller is responsible for calling the
- * rcu_read_[un]lock() functions.
+ * a negative value is returned.
  *
  */
 static int netlbl_unlhsh_add_addr6(struct netlbl_unlhsh_iface *iface,
@@ -391,8 +364,7 @@ static int netlbl_unlhsh_add_addr6(struct netlbl_unlhsh_iface *iface,
  * Description:
  * Add a new, empty, interface entry into the unlabeled connection hash table.
  * On success a pointer to the new interface entry is returned, on failure NULL
- * is returned.  The caller is responsible for calling the rcu_read_[un]lock()
- * functions.
+ * is returned.
  *
  */
 static struct netlbl_unlhsh_iface *netlbl_unlhsh_add_iface(int ifindex)
@@ -415,10 +387,10 @@ static struct netlbl_unlhsh_iface *netlbl_unlhsh_add_iface(int ifindex)
 		if (netlbl_unlhsh_search_iface(ifindex) != NULL)
 			goto add_iface_failure;
 		list_add_tail_rcu(&iface->list,
-				  &rcu_dereference(netlbl_unlhsh)->tbl[bkt]);
+			     &netlbl_unlhsh_rcu_deref(netlbl_unlhsh)->tbl[bkt]);
 	} else {
 		INIT_LIST_HEAD(&iface->list);
-		if (rcu_dereference(netlbl_unlhsh_def) != NULL)
+		if (netlbl_unlhsh_rcu_deref(netlbl_unlhsh_def) != NULL)
 			goto add_iface_failure;
 		rcu_assign_pointer(netlbl_unlhsh_def, iface);
 	}
@@ -548,8 +520,7 @@ unlhsh_add_return:
  *
  * Description:
  * Remove an IP address entry from the unlabeled connection hash table.
- * Returns zero on success, negative values on failure.  The caller is
- * responsible for calling the rcu_read_[un]lock() functions.
+ * Returns zero on success, negative values on failure.
  *
  */
 static int netlbl_unlhsh_remove_addr4(struct net *net,
@@ -611,8 +582,7 @@ static int netlbl_unlhsh_remove_addr4(struct net *net,
  *
  * Description:
  * Remove an IP address entry from the unlabeled connection hash table.
- * Returns zero on success, negative values on failure.  The caller is
- * responsible for calling the rcu_read_[un]lock() functions.
+ * Returns zero on success, negative values on failure.
  *
  */
 static int netlbl_unlhsh_remove_addr6(struct net *net,
@@ -1547,8 +1517,10 @@ int netlbl_unlabel_getattr(const struct sk_buff *skb,
 	struct netlbl_unlhsh_iface *iface;
 
 	rcu_read_lock();
-	iface = netlbl_unlhsh_search_iface_def(skb->skb_iif);
+	iface = netlbl_unlhsh_search_iface(skb->skb_iif);
 	if (iface == NULL)
+		iface = rcu_dereference(netlbl_unlhsh_def);
+	if (iface == NULL || !iface->valid)
 		goto unlabel_getattr_nolabel;
 	switch (family) {
 	case PF_INET: {



* [PATCH net-next-2.6] net: add support for htonb and ntohb
From: Andy Gospodarek @ 2010-04-01 20:32 UTC (permalink / raw)
  To: netdev


After my post to net-2.6 this week that accidentally applied htons to a
u8, it is clear to me we _must_ add some infrastructure to make sure
single bytes are in the correct network and host order on big and little
endian systems.  Today seemed like the perfect day to post this.

This patch adds basic support for htonb and ntohb.  Patches to add this
in the entire networking tree in _every_ case where a single byte is
accessed will be posted next week -- I'm almost done with them!

Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
---

 byteorder/big_endian.h    |   30 ++++++++++++++++++++++++++
 byteorder/generic.h       |    6 +++++
 byteorder/little_endian.h |   30 ++++++++++++++++++++++++++
 swab.h                    |   53 ++++++++++++++++++++++++++++++++++++++++++++--
 types.h                   |    2 +
 5 files changed, 119 insertions(+), 2 deletions(-)

diff --git a/include/linux/byteorder/big_endian.h b/include/linux/byteorder/big_endian.h
index 3c80fd7..87f1089 100644
--- a/include/linux/byteorder/big_endian.h
+++ b/include/linux/byteorder/big_endian.h
@@ -15,30 +15,40 @@
 #define __constant_ntohl(x) ((__force __u32)(__be32)(x))
 #define __constant_htons(x) ((__force __be16)(__u16)(x))
 #define __constant_ntohs(x) ((__force __u16)(__be16)(x))
+#define __constant_htonb(x) ((__force __be8)(__u8)(x))
+#define __constant_ntohb(x) ((__force __u8)(__be8)(x))
 #define __constant_cpu_to_le64(x) ((__force __le64)___constant_swab64((x)))
 #define __constant_le64_to_cpu(x) ___constant_swab64((__force __u64)(__le64)(x))
 #define __constant_cpu_to_le32(x) ((__force __le32)___constant_swab32((x)))
 #define __constant_le32_to_cpu(x) ___constant_swab32((__force __u32)(__le32)(x))
 #define __constant_cpu_to_le16(x) ((__force __le16)___constant_swab16((x)))
 #define __constant_le16_to_cpu(x) ___constant_swab16((__force __u16)(__le16)(x))
+#define __constant_cpu_to_le8(x) ((__force __le8)___constant_swab8((x)))
+#define __constant_le8_to_cpu(x) ___constant_swab8((__force __u8)(__le8)(x))
 #define __constant_cpu_to_be64(x) ((__force __be64)(__u64)(x))
 #define __constant_be64_to_cpu(x) ((__force __u64)(__be64)(x))
 #define __constant_cpu_to_be32(x) ((__force __be32)(__u32)(x))
 #define __constant_be32_to_cpu(x) ((__force __u32)(__be32)(x))
 #define __constant_cpu_to_be16(x) ((__force __be16)(__u16)(x))
 #define __constant_be16_to_cpu(x) ((__force __u16)(__be16)(x))
+#define __constant_cpu_to_be8(x) ((__force __be8)(__u8)(x))
+#define __constant_be8_to_cpu(x) ((__force __u8)(__be8)(x))
 #define __cpu_to_le64(x) ((__force __le64)__swab64((x)))
 #define __le64_to_cpu(x) __swab64((__force __u64)(__le64)(x))
 #define __cpu_to_le32(x) ((__force __le32)__swab32((x)))
 #define __le32_to_cpu(x) __swab32((__force __u32)(__le32)(x))
 #define __cpu_to_le16(x) ((__force __le16)__swab16((x)))
 #define __le16_to_cpu(x) __swab16((__force __u16)(__le16)(x))
+#define __cpu_to_le8(x) ((__force __le8)__swab8((x)))
+#define __le8_to_cpu(x) __swab8((__force __u8)(__le8)(x))
 #define __cpu_to_be64(x) ((__force __be64)(__u64)(x))
 #define __be64_to_cpu(x) ((__force __u64)(__be64)(x))
 #define __cpu_to_be32(x) ((__force __be32)(__u32)(x))
 #define __be32_to_cpu(x) ((__force __u32)(__be32)(x))
 #define __cpu_to_be16(x) ((__force __be16)(__u16)(x))
 #define __be16_to_cpu(x) ((__force __u16)(__be16)(x))
+#define __cpu_to_be8(x) ((__force __be8)(__u8)(x))
+#define __be8_to_cpu(x) ((__force __u8)(__be8)(x))
 
 static inline __le64 __cpu_to_le64p(const __u64 *p)
 {
@@ -64,6 +74,14 @@ static inline __u16 __le16_to_cpup(const __le16 *p)
 {
 	return __swab16p((__u16 *)p);
 }
+static inline __le8 __cpu_to_le8p(const __u8 *p)
+{
+	return (__force __le8)__swab8p(p);
+}
+static inline __u8 __le8_to_cpup(const __le8 *p)
+{
+	return __swab8p((__u8 *)p);
+}
 static inline __be64 __cpu_to_be64p(const __u64 *p)
 {
 	return (__force __be64)*p;
@@ -88,18 +106,30 @@ static inline __u16 __be16_to_cpup(const __be16 *p)
 {
 	return (__force __u16)*p;
 }
+static inline __be8 __cpu_to_be8p(const __u8 *p)
+{
+	return (__force __be8)*p;
+}
+static inline __u8 __be8_to_cpup(const __be8 *p)
+{
+	return (__force __u8)*p;
+}
 #define __cpu_to_le64s(x) __swab64s((x))
 #define __le64_to_cpus(x) __swab64s((x))
 #define __cpu_to_le32s(x) __swab32s((x))
 #define __le32_to_cpus(x) __swab32s((x))
 #define __cpu_to_le16s(x) __swab16s((x))
 #define __le16_to_cpus(x) __swab16s((x))
+#define __cpu_to_le8s(x) __swab8s((x))
+#define __le8_to_cpus(x) __swab8s((x))
 #define __cpu_to_be64s(x) do { (void)(x); } while (0)
 #define __be64_to_cpus(x) do { (void)(x); } while (0)
 #define __cpu_to_be32s(x) do { (void)(x); } while (0)
 #define __be32_to_cpus(x) do { (void)(x); } while (0)
 #define __cpu_to_be16s(x) do { (void)(x); } while (0)
 #define __be16_to_cpus(x) do { (void)(x); } while (0)
+#define __cpu_to_be8s(x) do { (void)(x); } while (0)
+#define __be8_to_cpus(x) do { (void)(x); } while (0)
 
 #ifdef __KERNEL__
 #include <linux/byteorder/generic.h>
diff --git a/include/linux/byteorder/generic.h b/include/linux/byteorder/generic.h
index 0846e6b..11c9f36 100644
--- a/include/linux/byteorder/generic.h
+++ b/include/linux/byteorder/generic.h
@@ -127,18 +127,24 @@
 
 #undef ntohl
 #undef ntohs
+#undef ntohb
 #undef htonl
 #undef htons
+#undef htonb
 
 #define ___htonl(x) __cpu_to_be32(x)
 #define ___htons(x) __cpu_to_be16(x)
+#define ___htonb(x) __cpu_to_be8(x)
 #define ___ntohl(x) __be32_to_cpu(x)
 #define ___ntohs(x) __be16_to_cpu(x)
+#define ___ntohb(x) __be8_to_cpu(x)
 
 #define htonl(x) ___htonl(x)
 #define ntohl(x) ___ntohl(x)
 #define htons(x) ___htons(x)
 #define ntohs(x) ___ntohs(x)
+#define htonb(x) ___htonb(x)
+#define ntohb(x) ___ntohb(x)
 
 static inline void le16_add_cpu(__le16 *var, u16 val)
 {
diff --git a/include/linux/byteorder/little_endian.h b/include/linux/byteorder/little_endian.h
index 83195fb..ef862db 100644
--- a/include/linux/byteorder/little_endian.h
+++ b/include/linux/byteorder/little_endian.h
@@ -15,30 +15,40 @@
 #define __constant_ntohl(x) ___constant_swab32((__force __be32)(x))
 #define __constant_htons(x) ((__force __be16)___constant_swab16((x)))
 #define __constant_ntohs(x) ___constant_swab16((__force __be16)(x))
+#define __constant_htonb(x) ((__force __be8)___constant_swab8((x)))
+#define __constant_ntohb(x) ___constant_swab8((__force __be8)(x))
 #define __constant_cpu_to_le64(x) ((__force __le64)(__u64)(x))
 #define __constant_le64_to_cpu(x) ((__force __u64)(__le64)(x))
 #define __constant_cpu_to_le32(x) ((__force __le32)(__u32)(x))
 #define __constant_le32_to_cpu(x) ((__force __u32)(__le32)(x))
 #define __constant_cpu_to_le16(x) ((__force __le16)(__u16)(x))
 #define __constant_le16_to_cpu(x) ((__force __u16)(__le16)(x))
+#define __constant_cpu_to_le8(x) ((__force __le8)(__u8)(x))
+#define __constant_le8_to_cpu(x) ((__force __u8)(__le8)(x))
 #define __constant_cpu_to_be64(x) ((__force __be64)___constant_swab64((x)))
 #define __constant_be64_to_cpu(x) ___constant_swab64((__force __u64)(__be64)(x))
 #define __constant_cpu_to_be32(x) ((__force __be32)___constant_swab32((x)))
 #define __constant_be32_to_cpu(x) ___constant_swab32((__force __u32)(__be32)(x))
 #define __constant_cpu_to_be16(x) ((__force __be16)___constant_swab16((x)))
 #define __constant_be16_to_cpu(x) ___constant_swab16((__force __u16)(__be16)(x))
+#define __constant_cpu_to_be8(x) ((__force __be8)___constant_swab8((x)))
+#define __constant_be8_to_cpu(x) ___constant_swab8((__force __u8)(__be8)(x))
 #define __cpu_to_le64(x) ((__force __le64)(__u64)(x))
 #define __le64_to_cpu(x) ((__force __u64)(__le64)(x))
 #define __cpu_to_le32(x) ((__force __le32)(__u32)(x))
 #define __le32_to_cpu(x) ((__force __u32)(__le32)(x))
 #define __cpu_to_le16(x) ((__force __le16)(__u16)(x))
 #define __le16_to_cpu(x) ((__force __u16)(__le16)(x))
+#define __cpu_to_le8(x) ((__force __le8)(__u8)(x))
+#define __le8_to_cpu(x) ((__force __u8)(__le8)(x))
 #define __cpu_to_be64(x) ((__force __be64)__swab64((x)))
 #define __be64_to_cpu(x) __swab64((__force __u64)(__be64)(x))
 #define __cpu_to_be32(x) ((__force __be32)__swab32((x)))
 #define __be32_to_cpu(x) __swab32((__force __u32)(__be32)(x))
 #define __cpu_to_be16(x) ((__force __be16)__swab16((x)))
 #define __be16_to_cpu(x) __swab16((__force __u16)(__be16)(x))
+#define __cpu_to_be8(x) ((__force __be8)__swab8((x)))
+#define __be8_to_cpu(x) __swab8((__force __u8)(__be8)(x))
 
 static inline __le64 __cpu_to_le64p(const __u64 *p)
 {
@@ -64,6 +74,14 @@ static inline __u16 __le16_to_cpup(const __le16 *p)
 {
 	return (__force __u16)*p;
 }
+static inline __le8 __cpu_to_le8p(const __u8 *p)
+{
+	return (__force __le8)*p;
+}
+static inline __u8 __le8_to_cpup(const __le8 *p)
+{
+	return (__force __u8)*p;
+}
 static inline __be64 __cpu_to_be64p(const __u64 *p)
 {
 	return (__force __be64)__swab64p(p);
@@ -88,18 +106,30 @@ static inline __u16 __be16_to_cpup(const __be16 *p)
 {
 	return __swab16p((__u16 *)p);
 }
+static inline __be8 __cpu_to_be8p(const __u8 *p)
+{
+	return (__force __be8)__swab8p(p);
+}
+static inline __u8 __be8_to_cpup(const __be8 *p)
+{
+	return __swab8p((__u8 *)p);
+}
 #define __cpu_to_le64s(x) do { (void)(x); } while (0)
 #define __le64_to_cpus(x) do { (void)(x); } while (0)
 #define __cpu_to_le32s(x) do { (void)(x); } while (0)
 #define __le32_to_cpus(x) do { (void)(x); } while (0)
 #define __cpu_to_le16s(x) do { (void)(x); } while (0)
 #define __le16_to_cpus(x) do { (void)(x); } while (0)
+#define __cpu_to_le8s(x) do { (void)(x); } while (0)
+#define __le8_to_cpus(x) do { (void)(x); } while (0)
 #define __cpu_to_be64s(x) __swab64s((x))
 #define __be64_to_cpus(x) __swab64s((x))
 #define __cpu_to_be32s(x) __swab32s((x))
 #define __be32_to_cpus(x) __swab32s((x))
 #define __cpu_to_be16s(x) __swab16s((x))
 #define __be16_to_cpus(x) __swab16s((x))
+#define __cpu_to_be8s(x) __swab8s((x))
+#define __be8_to_cpus(x) __swab8s((x))
 
 #ifdef __KERNEL__
 #include <linux/byteorder/generic.h>
diff --git a/include/linux/swab.h b/include/linux/swab.h
index ea0c02f..043d9a6 100644
--- a/include/linux/swab.h
+++ b/include/linux/swab.h
@@ -7,8 +7,11 @@
 
 /*
  * casts are necessary for constants, because we never know how for sure
- * how U/UL/ULL map to __u16, __u32, __u64. At least not in a portable way.
+ * how U/UL/ULL map to __u8, __u16, __u32, __u64. At least not in a portable way.
  */
+#define ___constant_swab8(x) ((__u8)(				\
+	(((__u8)(x) & (__u8)0xffU) << 8)))
+
 #define ___constant_swab16(x) ((__u16)(				\
 	(((__u16)(x) & (__u16)0x00ffU) << 8) |			\
 	(((__u16)(x) & (__u16)0xff00U) >> 8)))
@@ -40,9 +43,18 @@
 /*
  * Implement the following as inlines, but define the interface using
  * macros to allow constant folding when possible:
- * ___swab16, ___swab32, ___swab64, ___swahw32, ___swahb32
+ * ___swab8, ___swab16, ___swab32, ___swab64, ___swahw32, ___swahb32
  */
 
+static inline __attribute_const__ __u8 __fswab8(__u8 val)
+{
+#ifdef __arch_swab8
+	return __arch_swab8(val);
+#else
+	return ___constant_swab8(val);
+#endif
+}
+
 static inline __attribute_const__ __u16 __fswab16(__u16 val)
 {
 #ifdef __arch_swab16
@@ -93,6 +105,15 @@ static inline __attribute_const__ __u32 __fswahb32(__u32 val)
 }
 
 /**
+ * __swab8 - return an 8-bit value
+ * @x: value to not byteswap
+ */
+#define __swab8(x)				\
+	(__builtin_constant_p((__u8)(x)) ?	\
+	___constant_swab8(x) :			\
+	__fswab8(x))
+
+/**
  * __swab16 - return a byteswapped 16-bit value
  * @x: value to byteswap
  */
@@ -142,6 +163,19 @@ static inline __attribute_const__ __u32 __fswahb32(__u32 val)
 	__fswahb32(x))
 
 /**
+ * __swab8p - return an 8-bit value from a pointer
+ * @p: pointer to a naturally-aligned 8-bit value
+ */
+static inline __u8 __swab8p(const __u8 *p)
+{
+#ifdef __arch_swab8p
+	return __arch_swab8p(p);
+#else
+	return __swab8(*p);
+#endif
+}
+
+/**
  * __swab16p - return a byteswapped 16-bit value from a pointer
  * @p: pointer to a naturally-aligned 16-bit value
  */
@@ -211,6 +245,18 @@ static inline __u32 __swahb32p(const __u32 *p)
 }
 
 /**
+ * __swab8s - do not byteswap an 8-bit value in-place
+ * @p: pointer to a naturally-aligned 8-bit value
+ */
+static inline void __swab8s(__u8 *p)
+{
+#ifdef __arch_swab8s
+	__arch_swab8s(p);
+#else
+	*p = __swab8p(p);
+#endif
+}
+/**
  * __swab16s - byteswap a 16-bit value in-place
  * @p: pointer to a naturally-aligned 16-bit value
  */
@@ -279,16 +325,19 @@ static inline void __swahb32s(__u32 *p)
 }
 
 #ifdef __KERNEL__
+# define swab8 __swab8
 # define swab16 __swab16
 # define swab32 __swab32
 # define swab64 __swab64
 # define swahw32 __swahw32
 # define swahb32 __swahb32
+# define swab8p __swab8p
 # define swab16p __swab16p
 # define swab32p __swab32p
 # define swab64p __swab64p
 # define swahw32p __swahw32p
 # define swahb32p __swahb32p
+# define swab8s __swab8s
 # define swab16s __swab16s
 # define swab32s __swab32s
 # define swab64s __swab64s
diff --git a/include/linux/types.h b/include/linux/types.h
index c42724f..f8a0a0f 100644
--- a/include/linux/types.h
+++ b/include/linux/types.h
@@ -165,6 +165,8 @@ typedef unsigned long blkcnt_t;
 #define __bitwise
 #endif
 
+typedef __u8 __bitwise __le8;
+typedef __u8 __bitwise __be8;
 typedef __u16 __bitwise __le16;
 typedef __u16 __bitwise __be16;
 typedef __u32 __bitwise __le32;


* Re: [PATCH 2/3] can: add support for Janz VMOD-ICAN3 Intelligent CAN module
From: Andrew Morton @ 2010-04-01 20:03 UTC (permalink / raw)
  To: Ira W. Snyder; +Cc: linux-kernel, socketcan-core, netdev, sameo
In-Reply-To: <1269881932-3803-3-git-send-email-iws@ovro.caltech.edu>

On Mon, 29 Mar 2010 09:58:51 -0700
"Ira W. Snyder" <iws@ovro.caltech.edu> wrote:

> The Janz VMOD-ICAN3 is a MODULbus daughterboard which fits onto any
> MODULbus carrier board. It is an intelligent CAN controller with a
> microcontroller and associated firmware.
> 

A neat-looking driver.

> ...
>
> +	spin_lock_irqsave(&mod->lock, flags);
>
> ...

It does this rather a lot.  It seems to be doing quite a lot of work
under that lock, too - quite a lot of memcpy_toio() and other stuff.

Is there potential here to disable interrupts for too long?  Is it not
possible to use spin_lock_bh() here?


* Increased Latencies when upgrading kernel version
From: Taylor Lewick @ 2010-04-01 19:12 UTC (permalink / raw)
  To: netdev, linux-kernel

For some time now we've been running an older kernel, 2.6.16.60.  When
we tried to upgrade, first going to 2.6.27.19 and then to 2.6.32.1 and
2.6.33.1 we noticed that latencies increased.  At first we noticed it
by doing network tests via udpping, netperf, etc.  We made some
tweaks, and were able to get network latency to within 1 to 2
microseconds of where we were previously on 2.6.16.60.  Then we did
some more testing, and noticed that system latency also seems higher.

We've done our tests on identical hardware servers, same NICs,
connected through same network gear.  Basically, we've tried to keep
everything identical except the kernel versions, and we are unable to
achieve the same performance for system latency on the newer kernels,
despite adjusting various kernel settings and recompiling.

The latency differences are about 15 microseconds per transaction.

At this point, I don't know what else to try.  I haven't played around
with the /proc/sys/kernel/sched_* parameters under the newer kernels
yet.  I have tried changing preemption modes with little effect; in
fact, voluntary preemption seems to be performing the best for us.

At this time the realtime patch isn't really an option for us to
consider, at least not yet.

Any suggestions?  Is this a known issue when upgrading to more recent
kernel versions?

Thanks,
Taylor

^ permalink raw reply

* Re: [netfilter / iptables] question
From: Jan Engelhardt @ 2010-04-01 18:35 UTC (permalink / raw)
  To: thomas yang; +Cc: netdev, Patrick McHardy
In-Reply-To: <n2if4f837ab1004011011gf3d82d3fzfd1423a812f98520@mail.gmail.com>

On Thursday 2010-04-01 19:11, thomas yang wrote:

>2010/4/2 Jan Engelhardt <jengelh@medozas.de>:
>> On Thursday 2010-04-01 18:44, thomas yang wrote:
>>
>>>Hi,
>>>
>>>I want to copy/ clone a  sk_buff  *skb  to  *skb2  on my Linux router,
>>>  then transmit both  skb2   and  skb  .
>>>
>>>How to do this with   netfilter hook function  / iptables  ?
>>>
>> Wait for xt_TEE to be merged.
>>
>
>I want to route skb to 'path 1' , route the copied/cloned skb2 to
>'path 2'    simultaneously,
>if one of them lose , it will not be a problem.
>
>What's the function of xt_TEE ?

Follow the mailing list.
http://marc.info/?t=127003152600006&r=1&w=2

^ permalink raw reply

* Re: [PATCH v2] Add Mergeable RX buffer feature to vhost_net
From: David Stevens @ 2010-04-01 18:22 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: kvm, kvm-owner, netdev, rusty, virtualization
In-Reply-To: <20100401105415.GA3323@redhat.com>

kvm-owner@vger.kernel.org wrote on 04/01/2010 03:54:15 AM:

> On Wed, Mar 31, 2010 at 03:04:43PM -0700, David Stevens wrote:

> > 
> > > > +               head.iov_base = (void *)vhost_get_vq_desc(&net->dev, vq,
> > > > +                       vq->iov, ARRAY_SIZE(vq->iov), &out, &in, NULL, NULL);
> > > 
> > > I find this casting confusing.
> > > Is it really expensive to add an array of heads so that
> > > we do not need to cast?
> > 
> >         It needs the heads and the lengths, which looks a lot
> > like an iovec. I was trying to resist adding a new
> > struct XXX { unsigned head; unsigned len; } just for this,
> > but I could make these parallel arrays, one with head index and
> > the other with length.

        Michael, on this one, if I add vq->heads as an argument to
vhost_get_heads (aka vhost_get_desc_n), I'd need the length too.
Would you rather this 1) remain an iovec (and a single arg added) but
cast still there, 2) 2 arrays (head and length) and 2 args added, or
3) a new struct type of {unsigned,int} to carry for the heads+len
instead of iovec?
        My preference would be 1). I agree the casts are ugly, but
it is essentially an iovec the way we use it; it's just that the
base isn't a pointer but a descriptor index instead.
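
To make the trade-off between options 1) and 3) concrete, here is a
minimal userspace sketch. It is not vhost code: struct vq_head and the
helper names iov_set_head()/iov_get_head() are hypothetical and exist
only for illustration. The iovec variant hides the casts in two small
helpers, while the dedicated struct needs no cast at all.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/* Option 3 (hypothetical type): head index and length, no pointer. */
struct vq_head {
	unsigned int head;	/* descriptor index */
	int len;		/* buffer length for that descriptor */
};

/* Option 1: keep struct iovec, but store the index in iov_base. */
static void iov_set_head(struct iovec *iov, unsigned int head, size_t len)
{
	/* Go through uintptr_t so the integer-to-pointer cast is explicit. */
	iov->iov_base = (void *)(uintptr_t)head;
	iov->iov_len = len;
}

static unsigned int iov_get_head(const struct iovec *iov)
{
	uintptr_t raw = (uintptr_t)iov->iov_base;

	/* The stored value must be an index, not a real address. */
	assert(raw <= UINT_MAX);
	return (unsigned int)raw;
}
```

With the helpers, callers never see the cast, which may address part of
the readability concern while still passing a single array argument.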

> > 
> >         EAGAIN is not possible after the change, because we don't
> > even enter the loop unless we have an skb on the read queue; the
> > other cases bomb out, so I figured the comment for future work is
> > now done. :-)
> 
> Guest could be buggy so we'll get EFAULT.
> If skb is taken off the rx queue (as below), we might get EAGAIN.

        We break on any error. If we get EAGAIN because someone read
on the socket, this code would break the loop, but EAGAIN is a more
serious problem if it changed since we peeked (because it means
someone else is reading the socket).
        But I don't understand -- are you suggesting that the error
handling be different than that, or that the comment is still
relevant? My intention here is to do the "TODO" from the comment
so that it can be removed, by handling all error cases. I think
because of the peek, EAGAIN isn't something to be ignored anymore,
but the effect is the same whether we break out of the loop or
not, since we retry the packet next time around. Essentially, we
ignore every error since we will redo it with the same packet the
next time around. Maybe we should print something here, but since
we'll be retrying the packet that's still on the socket, a permanent
error would spew continuously. Maybe we should shut down entirely
if we get any negative return value here (including EAGAIN, since
that tells us someone messed with the socket when we don't want them
to).
        If you want the comment still there, ok, but I do think EAGAIN
isn't a special case per the comment anymore, and is handled as all
other errors are: by exiting the loop and retrying next time.

                                                                +-DLS


^ permalink raw reply

* Any ideas about a crash on reboot with igb and intel_iommu?
From: Roland Dreier @ 2010-04-01 18:13 UTC (permalink / raw)
  To: netdev, iommu; +Cc: David Woodhouse

Hi everyone,

I've been asked to help debug a strange crash, and I'm wondering if
anyone has seen something similar.  The setup is a bit awkward because
this is happening in manufacturing burn-in and we have not reproduced it
in the lab yet, so my ability to do specific experiments is still
limited.

Anyway, we have a fairly standard two-socket Xeon server product that
passes all tests with Nehalem CPUs.  However, when we use Westmere CPUs
(which also requires a new BIOS of course), some fraction of the systems
are crashing during burn-in, which basically runs a cycle where it runs
CPU and memory stress tests and then reboots the system for the next
round of tests.  The crash is happening on reboot, and unfortunately I
only have a bunch of pictures of the traceback output, but we've seen
multiple cases where the system is crashing with a traceback like:

  rb_erase
  __free_iova
  flush_unmaps
  intel_unmap_page
  igb_clean_rx_ring
  igb_down
  igb_close
  __igb_shutdown
  igb_shutdown
  pci_device_shutdown
  device_shutdown
  kernel_restart_prepare
  kernel_restart
  sys_reboot

The newest kernel they've been able to try is 2.6.30.9, but from looking
at the kernel changelogs for igb and intel_iommu at least, I don't see
anything particularly promising that was fixed since then.

One other data point is that enabling the BIOS option "maximize memory
under 4GB" (which apparently just allocates less space for PCI BARs
below 4GB) seems to make this crash go away again.

Anyway, does this tickle anyone's memory?  I'm trying to get a better
handle on things, but if this has been seen before, I'd sure love to
skip some of the pain of debugging this.

Thanks,
  Roland
-- 
Roland Dreier <rolandd@cisco.com> || For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html

^ permalink raw reply

* Re: [PATCH 1/2] phylib: Support phy module autoloading
From: Ben Hutchings @ 2010-04-01 18:05 UTC (permalink / raw)
  To: David Woodhouse; +Cc: davem, netdev, 553024
In-Reply-To: <1270141428.3101.399.camel@macbook.infradead.org>

On Thu, Apr 01, 2010 at 06:03:48PM +0100, David Woodhouse wrote:
> On Thu, 2010-04-01 at 05:34 +0100, Ben Hutchings wrote:
[...]
> > Since you've dealt with (a), and (b) is not really as important, I would
> > just like to suggest some minor changes to your patch 1 (see below).
> > Feel free to fold them in.  Your patch 2 would then need the
> > substitutions s/phy_device_id/mdio_device_id/; s/TABLE(phy/TABLE(mdio/.
> 
> I'll tolerate the silly __u32 crap if I must for consistency, but
> normally I prefer to write in C.
> 
> I did think about 'mdio:' for the module alias, but I decided that
> 'phy:' probably made more sense since these are PHY driver modules and
> the number is the phy_id.
[...]

Many multi-layered communication standards have distinct PHY devices,
and they presumably have their own ID spaces.  phylib deals only with
management of Ethernet PHYs over an MDIO bus, identified using MDIO
ID registers.
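
As a rough illustration of what "identified using MDIO ID registers"
means, the sketch below combines the two 16-bit PHY identifier
registers into one 32-bit ID and compares it against a table entry
under a mask. This is userspace code under stated assumptions: struct
mdio_id_entry and both function names are hypothetical, and the values
used with it are arbitrary examples, not taken from any real driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table entry: an ID plus the mask of bits that must match. */
struct mdio_id_entry {
	uint32_t phy_id;
	uint32_t phy_id_mask;
};

/* Build the 32-bit ID from the two 16-bit identifier registers. */
static uint32_t mdio_make_id(uint16_t id_hi, uint16_t id_lo)
{
	return ((uint32_t)id_hi << 16) | id_lo;
}

/* True if the ID read from the device matches the entry under its mask. */
static bool mdio_id_matches(const struct mdio_id_entry *e, uint32_t dev_id)
{
	assert(e != NULL);
	return (dev_id & e->phy_id_mask) == (e->phy_id & e->phy_id_mask);
}
```

Masking off the low bits lets one entry cover several revisions of the
same part, which is why a table entry carries both values.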

Ben.

-- 
Ben Hutchings
We get into the habit of living before acquiring the habit of thinking.
                                                              - Albert Camus

^ permalink raw reply

* Re: [PATCH] r8169: clean up my printk uglyness
From: Neil Horman @ 2010-04-01 17:30 UTC (permalink / raw)
  To: netdev; +Cc: davem, romieu, brandon
In-Reply-To: <20100401145356.GB14069@shamino.rdu.redhat.com>

On Thu, Apr 01, 2010 at 10:53:56AM -0400, Neil Horman wrote:
> Fix formatting on r8169 printk
> 
> Brandon Philips noted that I had a spacing issue in my printk for the last r8169
> patch that made it quite ugly.  Fix that up and add the PFX macro to it as well
> so it looks like the other r8169 printks
> 
> Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> 
> 
Grr, yes thanks, V2, fixing my idiocy :)

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>

 r8169.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
index 9674005..a119885 100644
--- a/drivers/net/r8169.c
+++ b/drivers/net/r8169.c
@@ -3227,8 +3227,8 @@ static void rtl8169_set_rxbufsize(struct rtl8169_private *tp,
 	unsigned int max_frame = mtu + VLAN_ETH_HLEN + ETH_FCS_LEN;
 
 	if (max_frame != 16383)
-		printk(KERN_WARNING "WARNING! Changing of MTU on this NIC"
-			"May lead to frame reception errors!\n");
+		printk(KERN_WARNING PFX "WARNING! Changing of MTU on this "
+			"NIC may lead to frame reception errors!\n");
 
 	tp->rx_buf_sz = (max_frame > RX_BUF_SIZE) ? max_frame : RX_BUF_SIZE;
 }


^ permalink raw reply related

* Re: [netfilter / iptables] question
From: thomas yang @ 2010-04-01 17:11 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: netdev, Patrick McHardy
In-Reply-To: <alpine.LSU.2.01.1004011857440.788@obet.zrqbmnf.qr>

2010/4/2 Jan Engelhardt <jengelh@medozas.de>:
> On Thursday 2010-04-01 18:44, thomas yang wrote:
>
>>Hi,
>>
>>I want to copy/ clone a  sk_buff  *skb  to  *skb2  on my Linux router,
>>  then transmit both  skb2   and  skb  .
>>
>>How to do this with   netfilter hook function  / iptables  ?
>>
> Wait for xt_TEE to be merged.
>

I want to route skb along 'path 1' and simultaneously route the
copied/cloned skb2 along 'path 2'.
If one of them is lost, it will not be a problem.

What's the function of xt_TEE ?


--
Tom

^ permalink raw reply

* [PATCH 1/26] rdma/cm: define native IB address
From: Sean Hefty @ 2010-04-01 17:08 UTC (permalink / raw)
  To: linux-rdma, 'Linux Netdev List'

Define AF_IB and sockaddr_ib to allow the rdma_cm to use native IB
addressing.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
---

 include/linux/socket.h |    2 +
 include/rdma/ib.h      |   89 ++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 91 insertions(+), 0 deletions(-)

diff --git a/include/linux/socket.h b/include/linux/socket.h
index 7b3aae2..966e268 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -181,6 +181,7 @@ struct ucred {
 #define AF_PPPOX	24	/* PPPoX sockets		*/
 #define AF_WANPIPE	25	/* Wanpipe API Sockets */
 #define AF_LLC		26	/* Linux LLC			*/
+#define AF_IB		27	/* Native InfiniBand address	*/
 #define AF_CAN		29	/* Controller Area Network      */
 #define AF_TIPC		30	/* TIPC sockets			*/
 #define AF_BLUETOOTH	31	/* Bluetooth sockets 		*/
@@ -221,6 +222,7 @@ struct ucred {
 #define PF_PPPOX	AF_PPPOX
 #define PF_WANPIPE	AF_WANPIPE
 #define PF_LLC		AF_LLC
+#define PF_IB		AF_IB
 #define PF_CAN		AF_CAN
 #define PF_TIPC		AF_TIPC
 #define PF_BLUETOOTH	AF_BLUETOOTH
diff --git a/include/rdma/ib.h b/include/rdma/ib.h
new file mode 100644
index 0000000..cf8f9e7
--- /dev/null
+++ b/include/rdma/ib.h
@@ -0,0 +1,89 @@
+/*
+ * Copyright (c) 2010 Intel Corporation.  All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#if !defined(_RDMA_IB_H)
+#define _RDMA_IB_H
+
+#include <linux/types.h>
+
+struct ib_addr {
+	union {
+		__u8		uib_addr8[16];
+		__be16		uib_addr16[8];
+		__be32		uib_addr32[4];
+		__be64		uib_addr64[2];
+	} ib_u;
+#define sib_addr8		ib_u.uib_addr8
+#define sib_addr16		ib_u.uib_addr16
+#define sib_addr32		ib_u.uib_addr32
+#define sib_addr64		ib_u.uib_addr64
+#define sib_raw			ib_u.uib_addr8
+#define sib_subnet_prefix	ib_u.uib_addr64[0]
+#define sib_interface_id	ib_u.uib_addr64[1]
+};
+
+static inline int ib_addr_any(const struct ib_addr *a)
+{
+	return ((a->sib_addr64[0] | a->sib_addr64[1]) == 0);
+}
+
+static inline int ib_addr_loopback(const struct ib_addr *a)
+{
+	return ((a->sib_addr32[0] | a->sib_addr32[1] |
+		 a->sib_addr32[2] | (a->sib_addr32[3] ^ htonl(1))) == 0);
+}
+
+static inline void ib_addr_set(struct ib_addr *addr,
+			       __be32 w1, __be32 w2, __be32 w3, __be32 w4)
+{
+	addr->sib_addr32[0] = w1;
+	addr->sib_addr32[1] = w2;
+	addr->sib_addr32[2] = w3;
+	addr->sib_addr32[3] = w4;
+}
+
+static inline int ib_addr_cmp(const struct ib_addr *a1, const struct ib_addr *a2)
+{
+	return memcmp(a1, a2, sizeof(struct ib_addr));
+}
+
+struct sockaddr_ib {
+	unsigned short int	sib_family;	/* AF_IB */
+	__be16			sib_pkey;
+	__be32			sib_flowinfo;
+	struct ib_addr		sib_addr;
+	__be64			sib_sid;
+	__be64			sib_sid_mask;
+	__u64			sib_scope_id;
+};
+
+#endif /* _RDMA_IB_H */




^ permalink raw reply related

* Re: [PATCH 1/2] phylib: Support phy module autoloading
From: David Woodhouse @ 2010-04-01 17:03 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: davem, netdev, 553024
In-Reply-To: <1270096441.12516.18.camel@localhost>

On Thu, 2010-04-01 at 05:34 +0100, Ben Hutchings wrote:
> On Wed, 2010-03-31 at 02:18 +0100, David Woodhouse wrote:
> > We don't use the normal hotplug mechanism because it doesn't work. It will
> > load the module some time after the device appears, but that's not good
> > enough for us -- we need the driver loaded _immediately_ because otherwise
> > the NIC driver may just abort and then the phy 'device' goes away.
> > 
> > Instead, we just issue a request_module() directly in phy_device_create().
> [...]
> 
> Thanks for doing this, David.  I had a stab at it earlier when this
> problem was reported in Debian <http://bugs.debian.org/553024>.  I
> didn't complete this because (a) I didn't understand all the details of
> adding new device table type, and (b) I tried to avoid duplicating
> information, which turns out to be rather difficult in modules with
> multiple drivers.

It shouldn't be _that_ hard.

You could contrive a macro which you use inside the driver definition
and which takes the phy_id and phy_id_mask as arguments, and has the
side-effect of setting up the MODULE_DEVICE_TABLE data.

> Since you've dealt with (a), and (b) is not really as important, I would
> just like to suggest some minor changes to your patch 1 (see below).
> Feel free to fold them in.  Your patch 2 would then need the
> substitutions s/phy_device_id/mdio_device_id/; s/TABLE(phy/TABLE(mdio/.

I'll tolerate the silly __u32 crap if I must for consistency, but
normally I prefer to write in C.

I did think about 'mdio:' for the module alias, but I decided that
'phy:' probably made more sense since these are PHY driver modules and
the number is the phy_id.

Kernel-doc is good though.

-- 
David Woodhouse                            Open Source Technology Centre
David.Woodhouse@intel.com                              Intel Corporation


^ permalink raw reply

* [PATCH net-next-2.6] igb: restrict WoL for 82576 ET2 Quad Port Server Adapter
From: Stefan Assmann @ 2010-04-01 17:00 UTC (permalink / raw)
  To: netdev; +Cc: Duyck, Alexander H, jeffrey.t.kirsher

From: Stefan Assmann <sassmann@redhat.com>

Restrict Wake-on-LAN to first port on 82576 ET2 quad port NICs, as it is
only supported there.

Signed-off-by: Stefan Assmann <sassmann@redhat.com>
---
 drivers/net/igb/igb_ethtool.c |    1 +
 drivers/net/igb/igb_main.c    |    1 +
 2 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/drivers/net/igb/igb_ethtool.c b/drivers/net/igb/igb_ethtool.c
index 1d4ee41..cdebfbf 100644
--- a/drivers/net/igb/igb_ethtool.c
+++ b/drivers/net/igb/igb_ethtool.c
@@ -1863,6 +1863,7 @@ static int igb_wol_exclusion(struct igb_adapter *adapter,
 		retval = 0;
 		break;
 	case E1000_DEV_ID_82576_QUAD_COPPER:
+	case E1000_DEV_ID_82576_QUAD_COPPER_ET2:
 		/* quad port adapters only support WoL on port A */
 		if (!(adapter->flags & IGB_FLAG_QUAD_PORT_A)) {
 			wol->supported = 0;
diff --git a/drivers/net/igb/igb_main.c b/drivers/net/igb/igb_main.c
index ea87570..5426f41 100644
--- a/drivers/net/igb/igb_main.c
+++ b/drivers/net/igb/igb_main.c
@@ -1597,6 +1597,7 @@ static int __devinit igb_probe(struct pci_dev *pdev,
 			adapter->eeprom_wol = 0;
 		break;
 	case E1000_DEV_ID_82576_QUAD_COPPER:
+	case E1000_DEV_ID_82576_QUAD_COPPER_ET2:
 		/* if quad port adapter, disable WoL on all but port A */
 		if (global_quad_port_a != 0)
 			adapter->eeprom_wol = 0;
-- 
1.6.6.1


^ permalink raw reply related

* Re: [netfilter / iptables] question
From: Jan Engelhardt @ 2010-04-01 16:57 UTC (permalink / raw)
  To: thomas yang; +Cc: Patrick McHardy, netdev
In-Reply-To: <v2if4f837ab1004010944yd08c37c5u23d6f2d07df7ccc6@mail.gmail.com>

On Thursday 2010-04-01 18:44, thomas yang wrote:

>Hi,
>
>I want to copy/ clone a  sk_buff  *skb  to  *skb2  on my Linux router,
>  then transmit both  skb2   and  skb  .
>
>How to do this with   netfilter hook function  / iptables  ?
>
Wait for xt_TEE to be merged.

^ permalink raw reply

* [netfilter / iptables] question
From: thomas yang @ 2010-04-01 16:44 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: netdev, jengelh

Hi,

I want to copy/clone an sk_buff *skb to *skb2 on my Linux router,
then transmit both skb2 and skb.

How can I do this with a netfilter hook function / iptables?

--
Tom

^ permalink raw reply

* Re: [net-next-2.6 PATCH] netfilter: ctnetlink: compute message size properly
From: Jiri Pirko @ 2010-04-01 10:43 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: Eric Dumazet, netdev, netfilter-devel, netfilter, davem,
	Krzysztof Oledzki
In-Reply-To: <4BB47817.4060902@trash.net>

Thu, Apr 01, 2010 at 12:40:23PM CEST, kaber@trash.net wrote:
>Eric Dumazet wrote:
>> Le mercredi 31 mars 2010 à 20:21 +0200, Jiri Pirko a écrit :
>>> Okay, I see your point. How about this:
>>>
>>> Subject: [net-next-2.6 PATCH] netfilter: ctnetlink: compute message size properly V2
>>>
>>> Message size should be dependent on net->ct.sysctl_acct, not on
>>> CONFIG_NF_CT_ACCT definition.
>>>
>> 
>> Then Changelog is not updated with the actual test :)
>
>I've fixed up the changelog and applied the patch, thanks.

Can I ask which tree?

thanks

Jirka

^ permalink raw reply

* CAIF device
From: Alan @ 2010-04-01 15:09 UTC (permalink / raw)
  To: netdev, sjur.brandeland

I was reading through the CAIF code and I noticed a couple of bugs

It doesn't check that there is a write method, so setting it on a
read-only device is not good news (doubly so as there seem to be no
permission checks?), and the following also looks unsafe:

        dev_close(ser->dev);
        unregister_netdevice(ser->dev);
        list_del(&ser->node);
        debugfs_deinit(ser);

Now ser is the netdev private data, so what stops it going away when
unregister_netdevice() is called?

Secondly tty devices are ref counted and this for some reason didn't get
fixed in the driver yet.

[Patches to follow for the write and kref bugs, the others need the
authors and someone who knows the netdev code these days to fix]

^ permalink raw reply

* [PATCH] r8169: clean up my printk uglyness
From: Neil Horman @ 2010-04-01 14:53 UTC (permalink / raw)
  To: netdev; +Cc: davem, romieu, nhorman, brandon

Fix formatting on r8169 printk

Brandon Philips noted that I had a spacing issue in my printk for the last r8169
patch that made it quite ugly.  Fix that up and add the PFX macro to it as well
so it looks like the other r8169 printks

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>


 r8169.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
index 9674005..a119885 100644
--- a/drivers/net/r8169.c
+++ b/drivers/net/r8169.c
@@ -3227,8 +3227,8 @@ static void rtl8169_set_rxbufsize(struct rtl8169_private *tp,
 	unsigned int max_frame = mtu + VLAN_ETH_HLEN + ETH_FCS_LEN;
 
 	if (max_frame != 16383)
-		printk(KERN_WARNING "WARNING! Changing of MTU on this NIC"
-			"May lead to frame reception errors!\n");
+		printk(KERN_WARNING PFX "WARNING! Changing of MTU on this "
+			"NIC May lead to frame reception errors!\n");
 
 	tp->rx_buf_sz = (max_frame > RX_BUF_SIZE) ? max_frame : RX_BUF_SIZE;
 }

^ permalink raw reply related

* Re: [PATCH 5/5] netfilter: xt_TEE: have cloned packet travel through Xtables too
From: Patrick McHardy @ 2010-04-01 14:03 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: netfilter-devel, netdev
In-Reply-To: <alpine.LSU.2.01.1004011557270.5368@obet.zrqbmnf.qr>

Jan Engelhardt wrote:
> On Thursday 2010-04-01 15:48, Patrick McHardy wrote:
>> Jan Engelhardt wrote:
>>> On Thursday 2010-04-01 15:22, Patrick McHardy wrote:
>>>>>>> Conntrack loops are prevented by using a dummy conntrack, just as 
>>>>>>> NOTRACK does.
>>>>>> [...]
>>>>>>>  - When the cloned packets gets XFRMed or tunneled, its status switches 
>>>>>>>    from "special" to "plain". Doing policy routing on them does not seem 
>>>>>>>    so far-fetched.
>>>>>> My question was about the case without conntrack.
>>>>> Hm. Do you have any suggestion in countering a case whereby a user
>>>>> does -I OUTPUT -j TEE without conntrack?
>>>>>
>>>>> Perhaps making nesting a feature that requires conntrack, such that the 
>>>>> non-CT case can't loop?
>>>> If we drop the reentrancy thing, what should work is to prevent
>>>> using loopback as output device and using something similar to
>>>> the recursion counters tunnel devices used to have.
>>> Nah. I'm going to pick a bit from struct skbuff to indicate the
>>> packet was teed so as to avoid that loop.
>> That's a bad idea, we shouldn't be adding new skb members for something
>> as peripheral as this module.
> 
> I would have done this, which does not add a member:
> 
> 	IP6CB(skb)->flags |= IPSKB_CLONED;

This doesn't work; the CB is not preserved across layers
(which, for instance, matters if you allow loopback destinations).
It's also not preserved for clones.

>> What's wrong with adding a reentrancy counter?
> 
> Sounds like a plan.
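
For readers unfamiliar with the idea, a reentrancy counter can be
sketched in userspace as below. This is only an illustration of the
technique, not the xt_TEE implementation: all names are hypothetical,
TEE_MAX_DEPTH is an assumed limit, and a thread-local variable stands
in for whatever per-context storage the kernel code would use. The
stand-in transmit path deliberately tees again, which would recurse
without bound if the guard were missing.

```c
#include <assert.h>
#include <stdbool.h>

#define TEE_MAX_DEPTH 1	/* assumed limit: never tee from inside a tee */

static _Thread_local unsigned int tee_depth;	/* current nesting level */
static unsigned int tee_sent;			/* duplicates transmitted */
static unsigned int tee_dropped;		/* duplicates refused by the guard */

static void tee_transmit(int *pkt);

/* Duplicate a packet unless we are already inside a tee on this thread. */
static bool tee_packet(int *pkt)
{
	if (tee_depth >= TEE_MAX_DEPTH) {
		tee_dropped++;
		return false;
	}
	tee_depth++;
	tee_transmit(pkt);
	assert(tee_depth > 0);
	tee_depth--;
	return true;
}

/* Stand-in for the output path: its rules hit the tee target again. */
static void tee_transmit(int *pkt)
{
	tee_sent++;
	tee_packet(pkt);	/* refused by the guard while nested */
}
```

Because the state lives outside the packet, it does not depend on the
control block surviving across layers or clones.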



^ permalink raw reply

* Re: [PATCH 5/5] netfilter: xt_TEE: have cloned packet travel through Xtables too
From: Jan Engelhardt @ 2010-04-01 13:59 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: netfilter-devel, netdev
In-Reply-To: <4BB4A41C.7030107@trash.net>


On Thursday 2010-04-01 15:48, Patrick McHardy wrote:
>Jan Engelhardt wrote:
>> On Thursday 2010-04-01 15:22, Patrick McHardy wrote:
>>>>>> Conntrack loops are prevented by using a dummy conntrack, just as 
>>>>>> NOTRACK does.
>>>>> [...]
>>>>>>  - When the cloned packets gets XFRMed or tunneled, its status switches 
>>>>>>    from "special" to "plain". Doing policy routing on them does not seem 
>>>>>>    so far-fetched.
>>>>> My question was about the case without conntrack.
>>>> Hm. Do you have any suggestion in countering a case whereby a user
>>>> does -I OUTPUT -j TEE without conntrack?
>>>>
>>>> Perhaps making nesting a feature that requires conntrack, such that the 
>>>> non-CT case can't loop?
>>> If we drop the reentrancy thing, what should work is to prevent
>>> using loopback as output device and using something similar to
>>> the recursion counters tunnel devices used to have.
>> 
>> Nah. I'm going to pick a bit from struct skbuff to indicate the
>> packet was teed so as to avoid that loop.
>
>That's a bad idea, we shouldn't be adding new skb members for something
>as peripheral as this module.

I would have done this, which does not add a member:

	IP6CB(skb)->flags |= IPSKB_CLONED;

>What's wrong with adding a reentrancy counter?

Sounds like a plan.

^ permalink raw reply

* Re: [PATCH 5/5] netfilter: xt_TEE: have cloned packet travel through Xtables too
From: Patrick McHardy @ 2010-04-01 13:48 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: netfilter-devel, netdev
In-Reply-To: <alpine.LSU.2.01.1004011543260.1174@obet.zrqbmnf.qr>

Jan Engelhardt wrote:
> On Thursday 2010-04-01 15:22, Patrick McHardy wrote:
>>>>> Conntrack loops are prevented by using a dummy conntrack, just as 
>>>>> NOTRACK does.
>>>> [...]
>>>>>  - When the cloned packets gets XFRMed or tunneled, its status switches 
>>>>>    from "special" to "plain". Doing policy routing on them does not seem 
>>>>>    so far-fetched.
>>>> My question was about the case without conntrack.
>>> Hm. Do you have any suggestion in countering a case whereby a user
>>> does -I OUTPUT -j TEE without conntrack?
>>>
>>> Perhaps making nesting a feature that requires conntrack, such that the 
>>> non-CT case can't loop?
>> If we drop the reentrancy thing, what should work is to prevent
>> using loopback as output device and using something similar to
>> the recursion counters tunnel devices used to have.
> 
> Nah. I'm going to pick a bit from struct skbuff to indicate the
> packet was teed so as to avoid that loop.

That's a bad idea, we shouldn't be adding new skb members for something
as peripheral as this module.

What's wrong with adding a reentrancy counter?

^ permalink raw reply

* Re: [PATCH 5/5] netfilter: xt_TEE: have cloned packet travel through Xtables too
From: Jan Engelhardt @ 2010-04-01 13:44 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: netfilter-devel, netdev
In-Reply-To: <4BB49E10.8080608@trash.net>


On Thursday 2010-04-01 15:22, Patrick McHardy wrote:
>>>> Conntrack loops are prevented by using a dummy conntrack, just as 
>>>> NOTRACK does.
>>> [...]
>>>>  - When the cloned packets gets XFRMed or tunneled, its status switches 
>>>>    from "special" to "plain". Doing policy routing on them does not seem 
>>>>    so far-fetched.
>>> My question was about the case without conntrack.
>> 
>> Hm. Do you have any suggestion in countering a case whereby a user
>> does -I OUTPUT -j TEE without conntrack?
>> 
>> Perhaps making nesting a feature that requires conntrack, such that the 
>> non-CT case can't loop?
>
>If we drop the reentrancy thing, what should work is to prevent
>using loopback as output device and using something similar to
>the recursion counters tunnel devices used to have.

Nah. I'm going to pick a bit from struct skbuff to indicate the
packet was teed so as to avoid that loop.


^ permalink raw reply
