Netdev List
* Re: [PATCH 4/4] drivers/atm/idt77252.c: Remove unnecessary error check
From: David Miller @ 2010-10-04  5:06 UTC
  To: julia
  Cc: wharms, chas, kernel-janitors, linux-atm-general, netdev,
	linux-kernel
In-Reply-To: <Pine.LNX.4.64.1010021636330.21879@ask.diku.dk>

From: Julia Lawall <julia@diku.dk>
Date: Sat, 2 Oct 2010 16:37:07 +0200 (CEST)

> This code does not call deinit_card(card); in an error case, as done in
> other error-handling code in the same function.  But actually, the called
> function init_sram can only return 0, so there is no need for the error
> check at all.
> 
> init_sram is also given a void return type, and its single return statement
> at the end of the function is dropped.
> 
> A simplified version of the semantic match that finds this problem is as
> follows: (http://coccinelle.lip6.fr/)
 ...
> Signed-off-by: Julia Lawall <julia@diku.dk>

Applied, thank you.

* Re: [PATCH net-next] cxgb4: remove a bogus PCI function number check
From: David Miller @ 2010-10-04  5:07 UTC
  To: dm; +Cc: netdev
In-Reply-To: <1285874232-1554-1-git-send-email-dm@chelsio.com>

From: Dimitris Michailidis <dm@chelsio.com>
Date: Thu, 30 Sep 2010 12:17:12 -0700

> Remove a bogus PCI function number check from the driver's .remove
> method that causes pci_release_regions not to be called for function 0
> if additional functions are attached and one of them is used as primary.
> 
> Signed-off-by: Dimitris Michailidis <dm@chelsio.com>

Applied, thanks.

* Re: [patch 1/1] [PATCH] qeth: tagging with VLAN-ID 0
From: David Miller @ 2010-10-04  5:08 UTC
  To: frank.blaschka; +Cc: netdev, linux-s390, ursula.braun
In-Reply-To: <20101001125142.037089467@de.ibm.com>

From: frank.blaschka@de.ibm.com
Date: Fri, 01 Oct 2010 14:51:13 +0200

> From: Ursula Braun <ursula.braun@de.ibm.com>
> 
> This patch adapts qeth to handle tagged frames with VLAN-ID 0 and
> with or without priority information in the tag. It enables qeth to
> receive priority-tagged frames on a base interface, for example from
> z/OS, without configuring an additional VLAN interface.
> 
> Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
> Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>

I'll apply this, thanks.

* Re: pull request: wireless-next-2.6 2010-10-01
From: David Miller @ 2010-10-04  5:11 UTC
  To: linville-2XuSBdqkA4R54TAoqtyWWQ
  Cc: linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20101001161819.GD3049-2XuSBdqkA4R54TAoqtyWWQ@public.gmane.org>

From: "John W. Linville" <linville-2XuSBdqkA4R54TAoqtyWWQ@public.gmane.org>
Date: Fri, 1 Oct 2010 12:18:19 -0400

> Here is the latest round of wireless LAN updates intended for 2.6.37.
> Included are some ath5k bits from Bruno Randolf, some carl9170 updates
> from Christian Lamparter, some mac80211 updates from Johannes Berg,
> some work for supporting multiple VIFs on one device from Ben Greear,
> and a smattering of other bits.
> 
> Please let me know if there are problems!

Pulled, thanks John.
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

* Re: [PATCH net-next-2.6] be2net: add multiple RX queue support
From: David Miller @ 2010-10-04  5:12 UTC
  To: sathya.perla; +Cc: netdev
In-Reply-To: <20101001104133.GA17206@emulex.com>

From: Sathya Perla <sathya.perla@emulex.com>
Date: Fri, 1 Oct 2010 16:11:33 +0530

> Dave, I incorporated your comment to discover the number of supported MSI-X
> vectors dynamically; thanks.
> 
> This patch adds multiple RX queue support to be2net. There are
> up to 4 extra RX queues per port into which TCP/UDP traffic can be hashed.
> Some of the ethtool stats are now displayed on a per-queue basis.
> 
> Signed-off-by: Sathya Perla <sathya.perla@emulex.com>

Looks good, applied, thanks.

* Re: [PATCH net-next] net: introduce DST_NOCACHE flag
From: David Miller @ 2010-10-04  5:18 UTC
  To: eric.dumazet; +Cc: netdev
In-Reply-To: <1285868687.2615.900.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 30 Sep 2010 19:44:47 +0200

 ...
> Introduce a new dst flag, DST_NOCACHE, that is set when dst was not
> inserted in route cache.
 ...
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Cute, and applied, but it shows that we've RCU'd so much of the
surrounding infrastructure that the neighbour cache is now pretty
high on the list of things to RCU.

* [PATCHv4 net-next-2.6 0/5] Removal of RH2/HAO from IPsec-protected MIPv6 traffic
From: Arnaud Ebalard @ 2010-10-04  6:24 UTC
  To: David S. Miller, Eric Dumazet, Herbert Xu, Hideaki YOSHIFUJI; +Cc: netdev

Hi,

This is version 4 of the patches. Compared to the previous version:

 - symbols which need not be exported are no longer exported
 - s/printk(KERN_INFO ...)/pr_info(...)/ in net/ipv6/mip6.c (both my
   code and the existing code)
 - I also refactored some initialization code for mip6 states
   (by introducing mip6_state_init_sanity_check())

After the discussion with Herbert, I decided to leave the input handlers
as they are for now, until I have some additional time to dig into
them. At least, it's safe.

Regarding the reject handler (used to warn userspace so that a Binding
Error can be sent), it still does nothing.

Herbert, if you read this, could you take a quick look at patches 1/5
and 4/5 and tell me whether the changes are acceptable for IPsec users?

Cheers,

a+

* [PATCHv4 net-next-2.6 1/5] XFRM,IPv6: Remove xfrm_spi_hash() dependency on destination address
From: Arnaud Ebalard @ 2010-10-04  6:25 UTC
  To: David S. Miller, Eric Dumazet, Herbert Xu, Hideaki YOSHIFUJI; +Cc: netdev
In-Reply-To: <cover.1286139128.git.arno@natisbad.org>

In the new IPsec architecture [RFC4301], "for an SA used to carry
unicast traffic, the Security Parameters Index (SPI) by itself
suffices to specify an SA".  Section 4.1 of [RFC4301] provides
additional guidance on the topic.

In the old IPsec architecture [RFC2401], an SA "is uniquely identified
by a triple consisting of a Security Parameter Index (SPI), an IP
Destination Address and a security protocol (AH or ESP) identifier".

If an IPsec stack only supports the behavior mandated by the old
IPsec architecture, SAD lookup on inbound packets requires both the
SPI and the destination address of the SA.

For inbound IPsec traffic, IRO (IPsec Route Optimization) remapping
rules may exist on the Mobile Node (MN) to remap the destination
address (the care-of address, CoA) into the home address (HoA). In
that case, by design, the address found in the destination address
field of the packet (CoA) does not match the one in the SA (HoA).

At the moment, the Linux XFRM stack includes the destination address
when computing the hash used to look up states by SPI. This patch
changes the XFRM state hash computation so that the destination
address is no longer used. This will later allow states to be found
for packets with mangled destination addresses.

Signed-off-by: Arnaud Ebalard <arno@natisbad.org>
---
 net/xfrm/xfrm_hash.h  |   21 +--------------------
 net/xfrm/xfrm_state.c |   20 ++++++++------------
 2 files changed, 9 insertions(+), 32 deletions(-)

diff --git a/net/xfrm/xfrm_hash.h b/net/xfrm/xfrm_hash.h
index 8e69533..19eeee7 100644
--- a/net/xfrm/xfrm_hash.h
+++ b/net/xfrm/xfrm_hash.h
@@ -4,16 +4,6 @@
 #include <linux/xfrm.h>
 #include <linux/socket.h>
 
-static inline unsigned int __xfrm4_addr_hash(xfrm_address_t *addr)
-{
-	return ntohl(addr->a4);
-}
-
-static inline unsigned int __xfrm6_addr_hash(xfrm_address_t *addr)
-{
-	return ntohl(addr->a6[2] ^ addr->a6[3]);
-}
-
 static inline unsigned int __xfrm4_daddr_saddr_hash(xfrm_address_t *daddr, xfrm_address_t *saddr)
 {
 	u32 sum = (__force u32)daddr->a4 + (__force u32)saddr->a4;
@@ -60,18 +50,9 @@ static inline unsigned __xfrm_src_hash(xfrm_address_t *daddr,
 }
 
 static inline unsigned int
-__xfrm_spi_hash(xfrm_address_t *daddr, __be32 spi, u8 proto, unsigned short family,
-		unsigned int hmask)
+__xfrm_spi_hash(__be32 spi, u8 proto, unsigned int hmask)
 {
 	unsigned int h = (__force u32)spi ^ proto;
-	switch (family) {
-	case AF_INET:
-		h ^= __xfrm4_addr_hash(daddr);
-		break;
-	case AF_INET6:
-		h ^= __xfrm6_addr_hash(daddr);
-		break;
-	}
 	return (h ^ (h >> 10) ^ (h >> 20)) & hmask;
 }
 
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index eb96ce5..b6a4d8d 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -30,7 +30,7 @@
 
 /* Each xfrm_state may be linked to two tables:
 
-   1. Hash table by (spi,daddr,ah/esp) to find SA by SPI. (input,ctl)
+   1. Hash table by (spi,ah/esp) to find SA by SPI. (input,ctl)
    2. Hash table by (daddr,family,reqid) to find what SAs exist for given
       destination/tunnel endpoint. (output)
  */
@@ -67,9 +67,9 @@ static inline unsigned int xfrm_src_hash(struct net *net,
 }
 
 static inline unsigned int
-xfrm_spi_hash(struct net *net, xfrm_address_t *daddr, __be32 spi, u8 proto, unsigned short family)
+xfrm_spi_hash(struct net *net, __be32 spi, u8 proto)
 {
-	return __xfrm_spi_hash(daddr, spi, proto, family, net->xfrm.state_hmask);
+	return __xfrm_spi_hash(spi, proto, net->xfrm.state_hmask);
 }
 
 static void xfrm_hash_transfer(struct hlist_head *list,
@@ -95,9 +95,7 @@ static void xfrm_hash_transfer(struct hlist_head *list,
 		hlist_add_head(&x->bysrc, nsrctable+h);
 
 		if (x->id.spi) {
-			h = __xfrm_spi_hash(&x->id.daddr, x->id.spi,
-					    x->id.proto, x->props.family,
-					    nhashmask);
+			h = __xfrm_spi_hash(x->id.spi, x->id.proto, nhashmask);
 			hlist_add_head(&x->byspi, nspitable+h);
 		}
 	}
@@ -679,7 +677,7 @@ xfrm_init_tempstate(struct xfrm_state *x, struct flowi *fl,
 
 static struct xfrm_state *__xfrm_state_lookup(struct net *net, u32 mark, xfrm_address_t *daddr, __be32 spi, u8 proto, unsigned short family)
 {
-	unsigned int h = xfrm_spi_hash(net, daddr, spi, proto, family);
+	unsigned int h = xfrm_spi_hash(net, spi, proto);
 	struct xfrm_state *x;
 	struct hlist_node *entry;
 
@@ -868,7 +866,7 @@ found:
 			h = xfrm_src_hash(net, daddr, saddr, encap_family);
 			hlist_add_head(&x->bysrc, net->xfrm.state_bysrc+h);
 			if (x->id.spi) {
-				h = xfrm_spi_hash(net, &x->id.daddr, x->id.spi, x->id.proto, encap_family);
+				h = xfrm_spi_hash(net, x->id.spi, x->id.proto);
 				hlist_add_head(&x->byspi, net->xfrm.state_byspi+h);
 			}
 			x->lft.hard_add_expires_seconds = net->xfrm.sysctl_acq_expires;
@@ -942,9 +940,7 @@ static void __xfrm_state_insert(struct xfrm_state *x)
 	hlist_add_head(&x->bysrc, net->xfrm.state_bysrc+h);
 
 	if (x->id.spi) {
-		h = xfrm_spi_hash(net, &x->id.daddr, x->id.spi, x->id.proto,
-				  x->props.family);
-
+		h = xfrm_spi_hash(net, x->id.spi, x->id.proto);
 		hlist_add_head(&x->byspi, net->xfrm.state_byspi+h);
 	}
 
@@ -1535,7 +1531,7 @@ int xfrm_alloc_spi(struct xfrm_state *x, u32 low, u32 high)
 	}
 	if (x->id.spi) {
 		spin_lock_bh(&xfrm_state_lock);
-		h = xfrm_spi_hash(net, &x->id.daddr, x->id.spi, x->id.proto, x->props.family);
+		h = xfrm_spi_hash(net, x->id.spi, x->id.proto);
 		hlist_add_head(&x->byspi, net->xfrm.state_byspi+h);
 		spin_unlock_bh(&xfrm_state_lock);
 
-- 
1.7.1



* [PATCHv4 net-next-2.6 2/5] XFRM,IPv6: Introduce receive sockopts to access IRO remapped src/dst addresses
From: Arnaud Ebalard @ 2010-10-04  6:25 UTC
  To: David S. Miller, Eric Dumazet, Herbert Xu, Hideaki YOSHIFUJI; +Cc: netdev
In-Reply-To: <cover.1286139128.git.arno@natisbad.org>

This patch introduces IRO receive socket options, allowing userland
processes (e.g. UMIP) to access the on-wire source or destination
addresses of incoming (IPsec-protected) packets as they were before
IRO remapping. The two socket options are IPV6_RECVIROSRC and
IPV6_RECVIRODST, respectively.

Basically, the two receive socket options serve a purpose similar to
their generic RH2/HAO counterparts defined in RFC 3542 (IPV6_RECVIROSRC
<-> IPV6_RECVDSTOPTS, IPV6_RECVIRODST <-> IPV6_RECVRTHDR). They differ
in the following respects:

 - IRO reporting sockopts only work on incoming IPsec-protected packets.
   Userspace will never get an IRO remapped address report for common
   (non-protected) packets.
 - The receiver gets the original (pre-remapping) source/destination
   address from its IPsec stack rather than from an extension header.
 - As IRO sockopts only deal with addresses, no specific structure is
   defined, i.e. struct in6_addr is used to pass info.

As we only interact with IPsec-protected packets, struct sec_path is
used to carry information (addresses) for incoming packets that have
undergone the remapping process.

Signed-off-by: Arnaud Ebalard <arno@natisbad.org>
---
 include/linux/in6.h      |    7 +++++++
 include/linux/ipv6.h     |    4 +++-
 include/net/xfrm.h       |    5 +++++
 net/ipv6/datagram.c      |   18 ++++++++++++++++++
 net/ipv6/ipv6_sockglue.c |   26 ++++++++++++++++++++++++++
 5 files changed, 59 insertions(+), 1 deletions(-)

diff --git a/include/linux/in6.h b/include/linux/in6.h
index c4bf46f..52a98ab 100644
--- a/include/linux/in6.h
+++ b/include/linux/in6.h
@@ -283,4 +283,11 @@ struct in6_flowlabel_req {
  * MRT6_PIM			208
  * (reserved)			209
  */
+
+/* IRO (IPsec Route Optimization) sockopts */
+#define IPV6_RECVIROSRC         74
+#define IPV6_IROSRC		75
+#define IPV6_RECVIRODST         76
+#define IPV6_IRODST		77
+
 #endif
diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
index e62683b..55289ee 100644
--- a/include/linux/ipv6.h
+++ b/include/linux/ipv6.h
@@ -341,7 +341,9 @@ struct ipv6_pinfo {
 				odstopts:1,
                                 rxflow:1,
 				rxtclass:1,
-				rxpmtu:1;
+				rxpmtu:1,
+				irosrc:1,
+				irodst:1;
 		} bits;
 		__u16		all;
 	} rxopt;
diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index 4f53532..e6a753c 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -909,6 +909,11 @@ struct sec_path {
 	atomic_t		refcnt;
 	int			len;
 	struct xfrm_state	*xvec[XFRM_MAX_DEPTH];
+
+#ifdef CONFIG_XFRM_SUB_POLICY
+	struct in6_addr         irosrc;
+	struct in6_addr         irodst;
+#endif
 };
 
 static inline struct sec_path *
diff --git a/net/ipv6/datagram.c b/net/ipv6/datagram.c
index ef371aa..2952c9e 100644
--- a/net/ipv6/datagram.c
+++ b/net/ipv6/datagram.c
@@ -29,6 +29,7 @@
 #include <net/transp_v6.h>
 #include <net/ip6_route.h>
 #include <net/tcp_states.h>
+#include <net/xfrm.h>
 
 #include <linux/errqueue.h>
 #include <asm/uaccess.h>
@@ -504,6 +505,23 @@ int datagram_recv_ctl(struct sock *sk, struct msghdr *msg, struct sk_buff *skb)
 		put_cmsg(msg, SOL_IPV6, IPV6_HOPOPTS, (ptr[1]+1)<<3, ptr);
 	}
 
+#ifdef CONFIG_XFRM_SUB_POLICY
+	/* If access to IRO-remapped source or destination address has been
+	 * requested and it has indeed been remapped, provide the on-wire
+	 * address to userland */
+	if (skb_sec_path(skb)) {
+		struct sec_path *sp = skb_sec_path(skb);
+
+		if (np->rxopt.bits.irosrc && !ipv6_addr_any(&sp->irosrc))
+			put_cmsg(msg, SOL_IPV6, IPV6_IROSRC,
+				 sizeof(sp->irosrc), &sp->irosrc);
+
+		if (np->rxopt.bits.irodst && !ipv6_addr_any(&sp->irodst))
+			put_cmsg(msg, SOL_IPV6, IPV6_IRODST,
+				 sizeof(sp->irodst), &sp->irodst);
+	}
+#endif
+
 	if (opt->lastopt &&
 	    (np->rxopt.bits.dstopts || np->rxopt.bits.srcrt)) {
 		/*
diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c
index a7f66bc..722a49f 100644
--- a/net/ipv6/ipv6_sockglue.c
+++ b/net/ipv6/ipv6_sockglue.c
@@ -302,6 +302,22 @@ static int do_ipv6_setsockopt(struct sock *sk, int level, int optname,
 		retv = 0;
 		break;
 
+#ifdef CONFIG_XFRM_SUB_POLICY
+	case IPV6_RECVIROSRC:
+		if (optlen < sizeof(int))
+			goto e_inval;
+		np->rxopt.bits.irosrc = valbool;
+		retv = 0;
+		break;
+
+	case IPV6_RECVIRODST:
+		if (optlen < sizeof(int))
+			goto e_inval;
+		np->rxopt.bits.irodst = valbool;
+		retv = 0;
+		break;
+#endif
+
 	case IPV6_2292DSTOPTS:
 		if (optlen < sizeof(int))
 			goto e_inval;
@@ -1056,6 +1072,16 @@ static int do_ipv6_getsockopt(struct sock *sk, int level, int optname,
 		val = np->rxopt.bits.dstopts;
 		break;
 
+#ifdef CONFIG_XFRM_SUB_POLICY
+	case IPV6_RECVIROSRC:
+		val = np->rxopt.bits.irosrc;
+		break;
+
+	case IPV6_RECVIRODST:
+		val = np->rxopt.bits.irodst;
+		break;
+#endif
+
 	case IPV6_2292DSTOPTS:
 		val = np->rxopt.bits.odstopts;
 		break;
-- 
1.7.1



* [PATCHv4 net-next-2.6 3/5] XFRM,IPv6: Add IRO src/dst address remapping XFRM types and i/o handlers
From: Arnaud Ebalard @ 2010-10-04  6:25 UTC
  To: David S. Miller, Eric Dumazet, Herbert Xu, Hideaki YOSHIFUJI; +Cc: netdev
In-Reply-To: <cover.1286139128.git.arno@natisbad.org>

Add IRO source and destination remapping XFRM types and associated
input/output handlers. This allows userland to install such states
in order to support remapping of the source or destination address
of packets. They basically work like the existing RH2 and HAO ones; the
main difference is that output handlers do not expand the packet by
adding an extension header: they simply change the source or
destination address in place. Input handlers behave almost the same
as their RH2/HAO counterparts, but they are triggered differently:
RH2 and HAO handlers are triggered by structures found in the
packet, whereas on input, IRO states (and associated handlers) are
looked up while processing an IPsec-protected packet, when there is
an address mismatch.

Signed-off-by: Arnaud Ebalard <arno@natisbad.org>
---
 include/net/xfrm.h       |    2 +
 net/ipv6/mip6.c          |  181 ++++++++++++++++++++++++++++++++++++++++------
 net/ipv6/xfrm6_mode_ro.c |   11 +++-
 net/xfrm/xfrm_user.c     |    4 +
 4 files changed, 176 insertions(+), 22 deletions(-)

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index e6a753c..05b2b1f 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -35,6 +35,8 @@
 #define XFRM_PROTO_IPV6		41
 #define XFRM_PROTO_ROUTING	IPPROTO_ROUTING
 #define XFRM_PROTO_DSTOPTS	IPPROTO_DSTOPTS
+#define XFRM_PROTO_IRO_SRC      127
+#define XFRM_PROTO_IRO_DST      128
 
 #define XFRM_ALIGN8(len)	(((len) + 7) & ~7)
 #define MODULE_ALIAS_XFRM_MODE(family, encap) \
diff --git a/net/ipv6/mip6.c b/net/ipv6/mip6.c
index d6e9599..9685599 100644
--- a/net/ipv6/mip6.c
+++ b/net/ipv6/mip6.c
@@ -302,18 +302,26 @@ static int mip6_destopt_offset(struct xfrm_state *x, struct sk_buff *skb,
 	return offset;
 }
 
-static int mip6_destopt_init_state(struct xfrm_state *x)
+/* Helper performing basic sanity checks on given mip6 state
+ * during state's initialization process */
+static int mip6_state_init_sanity_check(struct xfrm_state *x)
 {
 	if (x->id.spi) {
-		printk(KERN_INFO "%s: spi is not 0: %u\n", __func__,
-		       x->id.spi);
+		pr_info("%s: SPI is not 0 but %u\n", __func__, x->id.spi);
 		return -EINVAL;
 	}
 	if (x->props.mode != XFRM_MODE_ROUTEOPTIMIZATION) {
-		printk(KERN_INFO "%s: state's mode is not %u: %u\n",
-		       __func__, XFRM_MODE_ROUTEOPTIMIZATION, x->props.mode);
+		pr_info("%s: state's mode is not RO (%u) but %u\n", __func__,
+			XFRM_MODE_ROUTEOPTIMIZATION, x->props.mode);
 		return -EINVAL;
 	}
+	return 0;
+}
+
+static int mip6_destopt_init_state(struct xfrm_state *x)
+{
+	if (mip6_state_init_sanity_check(x))
+		return -EINVAL;
 
 	x->props.header_len = sizeof(struct ipv6_destopt_hdr) +
 		calc_padlen(sizeof(struct ipv6_destopt_hdr), 6) +
@@ -440,16 +448,8 @@ static int mip6_rthdr_offset(struct xfrm_state *x, struct sk_buff *skb,
 
 static int mip6_rthdr_init_state(struct xfrm_state *x)
 {
-	if (x->id.spi) {
-		printk(KERN_INFO "%s: spi is not 0: %u\n", __func__,
-		       x->id.spi);
-		return -EINVAL;
-	}
-	if (x->props.mode != XFRM_MODE_ROUTEOPTIMIZATION) {
-		printk(KERN_INFO "%s: state's mode is not %u: %u\n",
-		       __func__, XFRM_MODE_ROUTEOPTIMIZATION, x->props.mode);
+	if (mip6_state_init_sanity_check(x))
 		return -EINVAL;
-	}
 
 	x->props.header_len = sizeof(struct rt2_hdr);
 
@@ -477,20 +477,145 @@ static const struct xfrm_type mip6_rthdr_type =
 	.hdr_offset	= mip6_rthdr_offset,
 };
 
+#ifdef CONFIG_XFRM_SUB_POLICY
+/* IRO equivalent of mip6_destopt_input(): handles incoming packet with a
+ * source address different from the one expected in the SA: check that
+ * received source address is indeed the CoA we expected (or any address
+ * if the state references the unspecified address '::') */
+static int mip6_iro_src_input(struct xfrm_state *x, struct sk_buff *skb)
+{
+	struct ipv6hdr *iph = ipv6_hdr(skb);
+	int err = 1;
+
+	spin_lock(&x->lock);
+	if (!ipv6_addr_equal(&iph->saddr, (struct in6_addr *)x->coaddr) &&
+	    !ipv6_addr_any((struct in6_addr *)x->coaddr))
+		err = -ENOENT;
+	spin_unlock(&x->lock);
+
+	return err;
+}
+
+/* IRO equivalent of mip6_destopt_output(): replaces current source address
+ * of outgoing packet by state's CoA. */
+static int mip6_iro_src_output(struct xfrm_state *x, struct sk_buff *skb)
+{
+	struct ipv6hdr *iph = ipv6_hdr(skb);
+
+	spin_lock_bh(&x->lock);
+	memcpy(&iph->saddr, x->coaddr, sizeof(iph->saddr));
+	spin_unlock_bh(&x->lock);
+
+	return 0;
+}
+
+static int mip6_iro_src_reject(struct xfrm_state *x, struct sk_buff *skb, struct flowi *fl)
+{
+	/* XXX We may need some reject handler at some point but it is not
+	 * critical yet: see xfrm_secpath_reject() in net/xfrm/xfrm_policy.c
+	 * and also what mip6_destopt_reject() implements */
+
+	pr_debug("%s: not implemented yet.\n", __func__);
+
+	return 0;
+}
+
+/* This is the IRO equivalent of mip6_rthdr_input(): handles incoming packet
+ * with a destination address different from the one expected in the SA:
+ * check that received destination address is indeed the CoA we expected
+ * (or any address if the state references the unspecified address '::') */
+static int mip6_iro_dst_input(struct xfrm_state *x, struct sk_buff *skb)
+{
+	struct ipv6hdr *iph = ipv6_hdr(skb);
+	int err = 1;
+
+	spin_lock(&x->lock);
+	if (!ipv6_addr_equal(&iph->daddr, (struct in6_addr *)x->coaddr) &&
+	    !ipv6_addr_any((struct in6_addr *)x->coaddr))
+		err = -ENOENT;
+	spin_unlock(&x->lock);
+
+	return err;
+}
+
+/* IRO equivalent of mip6_rthdr_output(): replaces current destination
+ * address of outgoing packet with state's CoA */
+static int mip6_iro_dst_output(struct xfrm_state *x, struct sk_buff *skb)
+{
+	struct ipv6hdr *iph = ipv6_hdr(skb);
+
+	spin_lock_bh(&x->lock);
+	memcpy(&iph->daddr, x->coaddr, sizeof(iph->daddr));
+	spin_unlock_bh(&x->lock);
+
+	return 0;
+}
+
+/* Common to iro src and dst remapping states. */
+static int mip6_iro_init_state(struct xfrm_state *x)
+{
+	return mip6_state_init_sanity_check(x);
+}
+
+/* Unlike common IPsec protocols, nothing to do when destroying */
+static void mip6_iro_destroy(struct xfrm_state *x)
+{
+}
+
+static const struct xfrm_type mip6_iro_src_type =
+{
+	.description	= "MIP6_IRO_SRC",
+	.owner		= THIS_MODULE,
+	.proto	     	= XFRM_PROTO_IRO_SRC,
+	.flags		= XFRM_TYPE_NON_FRAGMENT | XFRM_TYPE_LOCAL_COADDR,
+	.init_state	= mip6_iro_init_state,
+	.destructor	= mip6_iro_destroy,
+	.input		= mip6_iro_src_input,
+	.output		= mip6_iro_src_output,
+	.reject         = mip6_iro_src_reject,
+};
+
+static const struct xfrm_type mip6_iro_dst_type =
+{
+	.description	= "MIP6_IRO_DST",
+	.owner		= THIS_MODULE,
+	.proto	     	= XFRM_PROTO_IRO_DST,
+	.flags		= XFRM_TYPE_NON_FRAGMENT | XFRM_TYPE_REMOTE_COADDR,
+	.init_state	= mip6_iro_init_state,
+	.destructor	= mip6_iro_destroy,
+	.input		= mip6_iro_dst_input,
+	.output		= mip6_iro_dst_output,
+};
+#endif /* CONFIG_XFRM_SUB_POLICY */
+
 static int __init mip6_init(void)
 {
-	printk(KERN_INFO "Mobile IPv6\n");
+	pr_info("Mobile IPv6\n");
 
 	if (xfrm_register_type(&mip6_destopt_type, AF_INET6) < 0) {
-		printk(KERN_INFO "%s: can't add xfrm type(destopt)\n", __func__);
+		pr_info("%s: can't add xfrm type(destopt)\n", __func__);
 		goto mip6_destopt_xfrm_fail;
 	}
 	if (xfrm_register_type(&mip6_rthdr_type, AF_INET6) < 0) {
-		printk(KERN_INFO "%s: can't add xfrm type(rthdr)\n", __func__);
+		pr_info("%s: can't add xfrm type(rthdr)\n", __func__);
 		goto mip6_rthdr_xfrm_fail;
 	}
+
+#ifdef CONFIG_XFRM_SUB_POLICY
+	if (xfrm_register_type(&mip6_iro_src_type, AF_INET6) < 0) {
+		pr_info("%s: can't add xfrm type(IRO src remap)\n",
+		       __func__);
+		goto mip6_iro_src_remap_xfrm_fail;
+	}
+	if (xfrm_register_type(&mip6_iro_dst_type, AF_INET6) < 0) {
+		pr_info("%s: can't add xfrm type(IRO dst remap)\n",
+		       __func__);
+		goto mip6_iro_dst_remap_xfrm_fail;
+	}
+#endif
+
 	if (rawv6_mh_filter_register(mip6_mh_filter) < 0) {
-		printk(KERN_INFO "%s: can't add rawv6 mh filter\n", __func__);
+		pr_info("%s: can't add rawv6 mh filter\n", __func__);
 		goto mip6_rawv6_mh_fail;
 	}
 
@@ -498,6 +623,12 @@ static int __init mip6_init(void)
 	return 0;
 
  mip6_rawv6_mh_fail:
+#ifdef CONFIG_XFRM_SUB_POLICY
+	xfrm_unregister_type(&mip6_iro_dst_type, AF_INET6);
+ mip6_iro_dst_remap_xfrm_fail:
+	xfrm_unregister_type(&mip6_iro_src_type, AF_INET6);
+ mip6_iro_src_remap_xfrm_fail:
+#endif
 	xfrm_unregister_type(&mip6_rthdr_type, AF_INET6);
  mip6_rthdr_xfrm_fail:
 	xfrm_unregister_type(&mip6_destopt_type, AF_INET6);
@@ -508,11 +639,19 @@ static int __init mip6_init(void)
 static void __exit mip6_fini(void)
 {
 	if (rawv6_mh_filter_unregister(mip6_mh_filter) < 0)
-		printk(KERN_INFO "%s: can't remove rawv6 mh filter\n", __func__);
+		pr_info("%s: can't remove rawv6 mh filter\n", __func__);
+#ifdef CONFIG_XFRM_SUB_POLICY
+	if (xfrm_unregister_type(&mip6_iro_dst_type, AF_INET6) < 0)
+		pr_info("%s: can't remove xfrm type(IRO dst remap)\n",
+		       __func__);
+	if (xfrm_unregister_type(&mip6_iro_src_type, AF_INET6) < 0)
+		pr_info("%s: can't remove xfrm type(IRO src remap)\n",
+		       __func__);
+#endif
 	if (xfrm_unregister_type(&mip6_rthdr_type, AF_INET6) < 0)
-		printk(KERN_INFO "%s: can't remove xfrm type(rthdr)\n", __func__);
+		pr_info("%s: can't remove xfrm type(rthdr)\n", __func__);
 	if (xfrm_unregister_type(&mip6_destopt_type, AF_INET6) < 0)
-		printk(KERN_INFO "%s: can't remove xfrm type(destopt)\n", __func__);
+		pr_info("%s: can't remove xfrm type(destopt)\n", __func__);
 }
 
 module_init(mip6_init);
diff --git a/net/ipv6/xfrm6_mode_ro.c b/net/ipv6/xfrm6_mode_ro.c
index 63d5d49..ea33178 100644
--- a/net/ipv6/xfrm6_mode_ro.c
+++ b/net/ipv6/xfrm6_mode_ro.c
@@ -45,6 +45,15 @@ static int xfrm6_ro_output(struct xfrm_state *x, struct sk_buff *skb)
 	u8 *prevhdr;
 	int hdr_len;
 
+	/* Unlike RH2 (IPPROTO_ROUTING) and HAO in DstOpt
+	 * (IPPROTO_DSTOPTS), IRO remapping states do not
+	 * add extension header to the packet. Source
+	 * and/or destination addresses are simply modified
+	 * in place. */
+	if (x->id.proto == XFRM_PROTO_IRO_SRC ||
+	    x->id.proto == XFRM_PROTO_IRO_DST)
+		goto out;
+
 	iph = ipv6_hdr(skb);
 
 	hdr_len = x->type->hdr_offset(x, skb, &prevhdr);
@@ -54,8 +63,8 @@ static int xfrm6_ro_output(struct xfrm_state *x, struct sk_buff *skb)
 	__skb_pull(skb, hdr_len);
 	memmove(ipv6_hdr(skb), iph, hdr_len);
 
+ out:
 	x->lastused = get_seconds();
-
 	return 0;
 }
 
diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index 8bae6b2..2aecd40 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -179,6 +179,10 @@ static int verify_newsa_info(struct xfrm_usersa_info *p,
 #if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
 	case IPPROTO_DSTOPTS:
 	case IPPROTO_ROUTING:
+#ifdef CONFIG_XFRM_SUB_POLICY
+	case XFRM_PROTO_IRO_SRC:
+	case XFRM_PROTO_IRO_DST:
+#endif
 		if (attrs[XFRMA_ALG_COMP]	||
 		    attrs[XFRMA_ALG_AUTH]	||
 		    attrs[XFRMA_ALG_AUTH_TRUNC]	||
-- 
1.7.1



* [PATCHv4 net-next-2.6 4/5] XFRM,IPv6: Add IRO remapping hook in xfrm_input()
From: Arnaud Ebalard @ 2010-10-04  6:25 UTC
  To: David S. Miller, Eric Dumazet, Herbert Xu, Hideaki YOSHIFUJI; +Cc: netdev
In-Reply-To: <cover.1286139128.git.arno@natisbad.org>

Add a hook in xfrm_input() to allow IRO remapping to occur when an
incoming packet matching an existing SA (based on SPI) is received
with an unexpected destination or source address.

Because IRO does not consume additional bits in a packet (that's
the point), there is no way to demultiplex based on something
like the next header or SPI. Instead, IRO input handlers (for source
and destination address remapping) are called upon address mismatch
during IPsec processing.

For that to work, we rely on the fact that SPI values generated
locally are no longer linked to the destination address (first patch
of the set), and we slightly postpone the expected address check in
xfrm_input() (currently done inside xfrm_state_lookup() against the
daddr parameter) by introducing a call to the input_addr_check()
handler from the struct xfrm_state_afinfo associated with the
address family.

Signed-off-by: Arnaud Ebalard <arno@natisbad.org>
---
 include/net/xfrm.h     |    5 +++
 net/ipv4/xfrm4_input.c |   11 +++++++
 net/ipv4/xfrm4_state.c |    1 +
 net/ipv6/xfrm6_input.c |   69 +++++++++++++++++++++++++++++++++++++++++++++++-
 net/ipv6/xfrm6_state.c |    1 +
 net/xfrm/xfrm_input.c  |    5 ++-
 net/xfrm/xfrm_state.c  |    2 +-
 7 files changed, 90 insertions(+), 4 deletions(-)

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index 05b2b1f..5b84c19 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -312,6 +312,8 @@ struct xfrm_state_afinfo {
 						  struct sk_buff *skb);
 	int			(*transport_finish)(struct sk_buff *skb,
 						    int async);
+	int			(*input_addr_check)(struct sk_buff *skb,
+						    struct xfrm_state *x);
 };
 
 extern int xfrm_state_register_afinfo(struct xfrm_state_afinfo *afinfo);
@@ -623,6 +625,7 @@ struct xfrm_spi_skb_cb {
 		struct inet6_skb_parm h6;
 	} header;
 
+	unsigned int saddroff;
 	unsigned int daddroff;
 	unsigned int family;
 };
@@ -1405,6 +1408,7 @@ extern int xfrm4_rcv_encap(struct sk_buff *skb, int nexthdr, __be32 spi,
 			   int encap_type);
 extern int xfrm4_transport_finish(struct sk_buff *skb, int async);
 extern int xfrm4_rcv(struct sk_buff *skb);
+extern int xfrm4_input_addr_check(struct sk_buff *skb, struct xfrm_state *x);
 
 static inline int xfrm4_rcv_spi(struct sk_buff *skb, int nexthdr, __be32 spi)
 {
@@ -1423,6 +1427,7 @@ extern int xfrm6_transport_finish(struct sk_buff *skb, int async);
 extern int xfrm6_rcv(struct sk_buff *skb);
 extern int xfrm6_input_addr(struct sk_buff *skb, xfrm_address_t *daddr,
 			    xfrm_address_t *saddr, u8 proto);
+extern int xfrm6_input_addr_check(struct sk_buff *skb, struct xfrm_state *x);
 extern int xfrm6_tunnel_register(struct xfrm6_tunnel *handler, unsigned short family);
 extern int xfrm6_tunnel_deregister(struct xfrm6_tunnel *handler, unsigned short family);
 extern __be32 xfrm6_tunnel_alloc_spi(struct net *net, xfrm_address_t *saddr);
diff --git a/net/ipv4/xfrm4_input.c b/net/ipv4/xfrm4_input.c
index 06814b6..8d414ca 100644
--- a/net/ipv4/xfrm4_input.c
+++ b/net/ipv4/xfrm4_input.c
@@ -41,6 +41,7 @@ int xfrm4_rcv_encap(struct sk_buff *skb, int nexthdr, __be32 spi,
 		    int encap_type)
 {
 	XFRM_SPI_SKB_CB(skb)->family = AF_INET;
+	XFRM_SPI_SKB_CB(skb)->saddroff = offsetof(struct iphdr, saddr);
 	XFRM_SPI_SKB_CB(skb)->daddroff = offsetof(struct iphdr, daddr);
 	return xfrm_input(skb, nexthdr, spi, encap_type);
 }
@@ -164,3 +165,13 @@ int xfrm4_rcv(struct sk_buff *skb)
 	return xfrm4_rcv_spi(skb, ip_hdr(skb)->protocol, 0);
 }
 EXPORT_SYMBOL(xfrm4_rcv);
+
+int xfrm4_input_addr_check(struct sk_buff *skb, struct xfrm_state *x)
+{
+	xfrm_address_t *daddr;
+
+	daddr = (xfrm_address_t *)(skb_network_header(skb) +
+				   XFRM_SPI_SKB_CB(skb)->daddroff);
+
+	return xfrm_addr_cmp(&x->id.daddr, daddr, AF_INET);
+}
diff --git a/net/ipv4/xfrm4_state.c b/net/ipv4/xfrm4_state.c
index 4794762..c6b038a 100644
--- a/net/ipv4/xfrm4_state.c
+++ b/net/ipv4/xfrm4_state.c
@@ -79,6 +79,7 @@ static struct xfrm_state_afinfo xfrm4_state_afinfo = {
 	.extract_input		= xfrm4_extract_input,
 	.extract_output		= xfrm4_extract_output,
 	.transport_finish	= xfrm4_transport_finish,
+	.input_addr_check	= xfrm4_input_addr_check,
 };
 
 void __init xfrm4_state_init(void)
diff --git a/net/ipv6/xfrm6_input.c b/net/ipv6/xfrm6_input.c
index f8c3cf8..aeb7fc6 100644
--- a/net/ipv6/xfrm6_input.c
+++ b/net/ipv6/xfrm6_input.c
@@ -15,6 +15,7 @@
 #include <linux/netfilter_ipv6.h>
 #include <net/ipv6.h>
 #include <net/xfrm.h>
+#include <net/ip6_route.h> /* XXX for ip6_route_input() */
 
 int xfrm6_extract_input(struct xfrm_state *x, struct sk_buff *skb)
 {
@@ -24,6 +25,7 @@ int xfrm6_extract_input(struct xfrm_state *x, struct sk_buff *skb)
 int xfrm6_rcv_spi(struct sk_buff *skb, int nexthdr, __be32 spi)
 {
 	XFRM_SPI_SKB_CB(skb)->family = AF_INET6;
+	XFRM_SPI_SKB_CB(skb)->saddroff = offsetof(struct ipv6hdr, saddr);
 	XFRM_SPI_SKB_CB(skb)->daddroff = offsetof(struct ipv6hdr, daddr);
 	return xfrm_input(skb, nexthdr, spi, 0);
 }
@@ -142,5 +144,70 @@ int xfrm6_input_addr(struct sk_buff *skb, xfrm_address_t *daddr,
 drop:
 	return -1;
 }
-
 EXPORT_SYMBOL(xfrm6_input_addr);
+
+#if defined(CONFIG_XFRM_SUB_POLICY)
+/* Perform check on source and destination addresses and possibly IRO
+ * address remapping upon mismatch and if matching IRO state exists. */
+int xfrm6_input_addr_check(struct sk_buff *skb, struct xfrm_state *x)
+{
+	xfrm_address_t *saddr, *exp_saddr, *daddr, *exp_daddr;
+
+	saddr = (xfrm_address_t *)(skb_network_header(skb) +
+				   XFRM_SPI_SKB_CB(skb)->saddroff);
+	daddr = (xfrm_address_t *)(skb_network_header(skb) +
+				   XFRM_SPI_SKB_CB(skb)->daddroff);
+
+	exp_daddr = &x->id.daddr;
+	if (xfrm_addr_cmp(exp_daddr, daddr, AF_INET6)) {
+		/* Destination address mismatch: check if we have an IRO
+		 * destination remapping state to explain that.
+		 *
+		 * Note: saddr is provided as a hint. If source address
+		 * is also a remapped one, xfrm6_input_addr() will manage
+		 * to find IRO destination remapping state */
+		if (xfrm6_input_addr(skb, exp_daddr, saddr,
+				     XFRM_PROTO_IRO_DST) < 0)
+			return -1;
+
+		/* Copy destination address to sec_path for sock opts and
+		 * replace packet destination address with expected HoA */
+		ipv6_addr_copy(&skb->sp->irodst, (struct in6_addr *)daddr);
+		ipv6_addr_copy((struct in6_addr *)daddr,
+			       (struct in6_addr *)exp_daddr);
+
+		skb_dst_drop(skb);
+		ip6_route_input(skb);
+		if (skb_dst(skb)->error)
+			return -1;
+	}
+
+	exp_saddr = &x->props.saddr;
+	if (xfrm_addr_cmp(exp_saddr, saddr, AF_INET6)) {
+		/* Source address mismatch: check if we have an IRO
+		 * source remapping state to explain that.
+		 *
+		 * Note: unlike for destination addresses above, a
+		 * source mismatch is not considered fatal */
+		if (xfrm6_input_addr(skb, daddr, exp_saddr,
+				     XFRM_PROTO_IRO_SRC) < 0)
+			return 0;
+
+		/* Copy source address to sec_path for sock opts and
+		 * then replace source address with expected peer's HoA */
+		ipv6_addr_copy(&skb->sp->irosrc, (struct in6_addr *)saddr);
+		ipv6_addr_copy((struct in6_addr *)saddr,
+			       (struct in6_addr *)exp_saddr);
+	}
+
+	return 0;
+}
+#else
+int xfrm6_input_addr_check(struct sk_buff *skb, struct xfrm_state *x)
+{
+	xfrm_address_t *daddr;
+	daddr = (xfrm_address_t *)(skb_network_header(skb) +
+				   XFRM_SPI_SKB_CB(skb)->daddroff);
+	return xfrm_addr_cmp(&x->id.daddr, daddr, AF_INET6);
+}
+#endif
diff --git a/net/ipv6/xfrm6_state.c b/net/ipv6/xfrm6_state.c
index a67575d..aeb4688 100644
--- a/net/ipv6/xfrm6_state.c
+++ b/net/ipv6/xfrm6_state.c
@@ -179,6 +179,7 @@ static struct xfrm_state_afinfo xfrm6_state_afinfo = {
 	.extract_input		= xfrm6_extract_input,
 	.extract_output		= xfrm6_extract_output,
 	.transport_finish	= xfrm6_transport_finish,
+	.input_addr_check	= xfrm6_input_addr_check,
 };
 
 int __init xfrm6_state_init(void)
diff --git a/net/xfrm/xfrm_input.c b/net/xfrm/xfrm_input.c
index 45f1c98..9ff65f6 100644
--- a/net/xfrm/xfrm_input.c
+++ b/net/xfrm/xfrm_input.c
@@ -152,8 +152,9 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type)
 			goto drop;
 		}
 
-		x = xfrm_state_lookup(net, skb->mark, daddr, spi, nexthdr, family);
-		if (x == NULL) {
+		x = xfrm_state_lookup(net, skb->mark, NULL, spi, nexthdr, family);
+		if (x == NULL ||
+		    x->outer_mode->afinfo->input_addr_check(skb, x)) {
 			XFRM_INC_STATS(net, LINUX_MIB_XFRMINNOSTATES);
 			xfrm_audit_state_notfound(skb, family, spi, seq);
 			goto drop;
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index b6a4d8d..b8f7c08 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -685,7 +685,7 @@ static struct xfrm_state *__xfrm_state_lookup(struct net *net, u32 mark, xfrm_ad
 		if (x->props.family != family ||
 		    x->id.spi       != spi ||
 		    x->id.proto     != proto ||
-		    xfrm_addr_cmp(&x->id.daddr, daddr, family))
+		    (daddr && xfrm_addr_cmp(&x->id.daddr, daddr, family)))
 			continue;
 
 		if ((mark & x->mark.m) != x->mark.v)
-- 
1.7.1



^ permalink raw reply related

* [PATCHv4 net-next-2.6 5/5] XFRM,IPv6: Add IRO remapping capability via socket ancillary data path
From: Arnaud Ebalard @ 2010-10-04  6:25 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Herbert Xu, Hideaki YOSHIFUJI; +Cc: netdev
In-Reply-To: <cover.1286139128.git.arno@natisbad.org>

This provides the ability to remap src/dst addresses using IRO
via ancillary data passed to sockets. This is the IRO equivalent
of what is done for RH2/HAO (i.e. IPV6_RTHDR/IPV6_DSTOPTS).
This is used by UMIP during BA emission when acting as a Home
Agent.
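For readers unfamiliar with the ancillary-data path, here is a minimal userspace sketch of how a sender might build one IRO destination-remapping cmsg. Assumptions: IPV6_IRODST is introduced by this series and is not in standard headers, so the value below is a hypothetical placeholder; the helper name build_irodst_cmsg() is also hypothetical. The cmsg level is IPPROTO_IPV6 (SOL_IPV6), which is what datagram_send_ctl() filters on.

```c
#include <assert.h>
#include <string.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Placeholder only: the real constant comes from the patched headers. */
#ifndef IPV6_IRODST
#define IPV6_IRODST 0x7f /* hypothetical value */
#endif

/* Fill @buf (which must be suitably aligned for struct cmsghdr) with a
 * single IPV6_IRODST cmsg carrying @remap_dst. Returns the number of
 * control bytes to pass as msg_controllen to sendmsg(). */
static size_t build_irodst_cmsg(void *buf, size_t buflen,
                                const struct in6_addr *remap_dst)
{
    struct msghdr msg;
    struct cmsghdr *cmsg;

    assert(buflen >= CMSG_SPACE(sizeof(*remap_dst)));
    memset(&msg, 0, sizeof(msg));
    memset(buf, 0, buflen);
    msg.msg_control = buf;
    msg.msg_controllen = CMSG_SPACE(sizeof(*remap_dst));

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = IPPROTO_IPV6;
    cmsg->cmsg_type = IPV6_IRODST;
    /* The patch rejects anything shorter than one struct in6_addr. */
    cmsg->cmsg_len = CMSG_LEN(sizeof(*remap_dst));
    memcpy(CMSG_DATA(cmsg), remap_dst, sizeof(*remap_dst));

    return msg.msg_controllen;
}
```

The buffer would then be attached to a struct msghdr passed to sendmsg() on an AF_INET6 socket. Per the patch, the kernel only accepts it with CONFIG_XFRM_SUB_POLICY enabled and the CAP_NET_RAW capability; IPV6_IROSRC would be built the same way.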

Signed-off-by: Arnaud Ebalard <arno@natisbad.org>
---
 include/net/ipv6.h       |    4 ++++
 net/ipv6/datagram.c      |   20 ++++++++++++++++++++
 net/ipv6/exthdrs.c       |   19 +++++++++++++------
 net/ipv6/ip6_flowlabel.c |    7 +++++++
 net/ipv6/ip6_output.c    |   22 ++++++++++++++++++++++
 net/ipv6/raw.c           |    3 ++-
 6 files changed, 68 insertions(+), 7 deletions(-)

diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index 4a3cd2c..2ba96d8 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -188,6 +188,10 @@ struct ipv6_txoptions {
 	struct ipv6_rt_hdr	*srcrt;	/* Routing Header */
 	struct ipv6_opt_hdr	*dst1opt;
 
+	/* XXX protect those via some ifdef e.g. CONFIG_XFRM_SUB_POLICY ? */
+	struct in6_addr         *iro_src;
+	struct in6_addr         *iro_dst;
+
 	/* Option buffer, as read by IPV6_PKTOPTIONS, starts here. */
 };
 
diff --git a/net/ipv6/datagram.c b/net/ipv6/datagram.c
index 2952c9e..0ac7adf 100644
--- a/net/ipv6/datagram.c
+++ b/net/ipv6/datagram.c
@@ -757,6 +757,26 @@ int datagram_send_ctl(struct net *net,
 			}
 			break;
 
+#ifdef CONFIG_XFRM_SUB_POLICY
+		case IPV6_IROSRC:
+		case IPV6_IRODST:
+			if (cmsg->cmsg_len < CMSG_LEN(sizeof(struct in6_addr))) {
+				err = -EINVAL;
+				goto exit_f;
+			}
+
+			if (!capable(CAP_NET_RAW)) {
+				err = -EPERM;
+				goto exit_f;
+			}
+
+			if (cmsg->cmsg_type == IPV6_IROSRC)
+				opt->iro_src = (struct in6_addr *)CMSG_DATA(cmsg);
+			else
+				opt->iro_dst = (struct in6_addr *)CMSG_DATA(cmsg);
+			break;
+#endif
+
 		case IPV6_2292RTHDR:
 		case IPV6_RTHDR:
 			if (cmsg->cmsg_len < CMSG_LEN(sizeof(struct ipv6_rt_hdr))) {
diff --git a/net/ipv6/exthdrs.c b/net/ipv6/exthdrs.c
index 262f105..e480b06 100644
--- a/net/ipv6/exthdrs.c
+++ b/net/ipv6/exthdrs.c
@@ -750,6 +750,10 @@ ipv6_dup_options(struct sock *sk, struct ipv6_txoptions *opt)
 			*((char**)&opt2->dst1opt) += dif;
 		if (opt2->srcrt)
 			*((char**)&opt2->srcrt) += dif;
+		if (opt2->iro_src)
+			*((char**)&opt2->iro_src) += dif;
+		if (opt2->iro_dst)
+			*((char**)&opt2->iro_dst) += dif;
 	}
 	return opt2;
 }
@@ -874,24 +878,27 @@ struct ipv6_txoptions *ipv6_fixup_options(struct ipv6_txoptions *opt_space,
 
 /**
  * fl6_update_dst - update flowi destination address with info given
- *                  by srcrt option, if any.
+ *                  by srcrt/iro_dst option, if any.
  *
  * @fl: flowi for which fl6_dst is to be updated
- * @opt: struct ipv6_txoptions in which to look for srcrt opt
+ * @opt: struct ipv6_txoptions in which to look for srcrt/iro_dst opt
  * @orig: copy of original fl6_dst address if modified
  *
- * Returns NULL if no txoptions or no srcrt, otherwise returns orig
- * and initial value of fl->fl6_dst set in orig
+ * Returns NULL if no txoptions or no options to change flowi destination
+ * (srcrt or IRO destination remapping rule), otherwise returns orig and
+ * initial value of fl->fl6_dst set in orig
  */
 struct in6_addr *fl6_update_dst(struct flowi *fl,
 				const struct ipv6_txoptions *opt,
 				struct in6_addr *orig)
 {
-	if (!opt || !opt->srcrt)
+	if (!opt || (!opt->srcrt && !opt->iro_dst))
 		return NULL;
 
 	ipv6_addr_copy(orig, &fl->fl6_dst);
-	ipv6_addr_copy(&fl->fl6_dst, ((struct rt0_hdr *)opt->srcrt)->addr);
+	ipv6_addr_copy(&fl->fl6_dst,
+		       opt->iro_dst ?: ((struct rt0_hdr *)opt->srcrt)->addr);
+
 	return orig;
 }
 
diff --git a/net/ipv6/ip6_flowlabel.c b/net/ipv6/ip6_flowlabel.c
index 1365468..dbf9c29 100644
--- a/net/ipv6/ip6_flowlabel.c
+++ b/net/ipv6/ip6_flowlabel.c
@@ -280,6 +280,9 @@ struct ipv6_txoptions *fl6_merge_options(struct ipv6_txoptions * opt_space,
 		opt_space->hopopt = fl_opt->hopopt;
 		opt_space->dst0opt = fl_opt->dst0opt;
 		opt_space->srcrt = fl_opt->srcrt;
+		/* XXX protect those via some ifdef - see net/ipv6.h */
+		opt_space->iro_src = fl_opt->iro_src;
+		opt_space->iro_dst = fl_opt->iro_dst;
 		opt_space->opt_nflen = fl_opt->opt_nflen;
 	} else {
 		if (fopt->opt_nflen == 0)
@@ -287,6 +290,9 @@ struct ipv6_txoptions *fl6_merge_options(struct ipv6_txoptions * opt_space,
 		opt_space->hopopt = NULL;
 		opt_space->dst0opt = NULL;
 		opt_space->srcrt = NULL;
+		/* XXX protect those via some ifdef - see net/ipv6.h */
+		opt_space->iro_src = NULL;
+		opt_space->iro_dst = NULL;
 		opt_space->opt_nflen = 0;
 	}
 	opt_space->dst1opt = fopt->dst1opt;
@@ -456,6 +462,7 @@ static int ipv6_opt_cmp(struct ipv6_txoptions *o1, struct ipv6_txoptions *o2)
 		return 1;
 	if (ipv6_hdr_cmp((struct ipv6_opt_hdr *)o1->srcrt, (struct ipv6_opt_hdr *)o2->srcrt))
 		return 1;
+
 	return 0;
 }
 
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 99157b4..210f269 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -222,6 +222,8 @@ int ip6_xmit(struct sock *sk, struct sk_buff *skb, struct flowi *fl,
 			ipv6_push_frag_opts(skb, opt, &proto);
 		if (opt->opt_nflen)
 			ipv6_push_nfrag_opts(skb, opt, &proto, &first_hop);
+		if (opt->iro_dst)
+			first_hop = opt->iro_dst;
 	}
 
 	skb_push(skb, sizeof(struct ipv6hdr));
@@ -1106,6 +1108,12 @@ static inline struct ipv6_rt_hdr *ip6_rthdr_dup(struct ipv6_rt_hdr *src,
 	return src ? kmemdup(src, (src->hdrlen + 1) * 8, gfp) : NULL;
 }
 
+static inline struct in6_addr *ip6_iro_addr_dup(struct in6_addr *addr,
+						gfp_t gfp)
+{
+	return addr ? kmemdup(addr, sizeof(struct in6_addr), gfp) : NULL;
+}
+
 int ip6_append_data(struct sock *sk, int getfrag(void *from, char *to,
 	int offset, int len, int odd, struct sk_buff *skb),
 	void *from, int length, int transhdrlen,
@@ -1162,6 +1170,16 @@ int ip6_append_data(struct sock *sk, int getfrag(void *from, char *to,
 			if (opt->srcrt && !np->cork.opt->srcrt)
 				return -ENOBUFS;
 
+			np->cork.opt->iro_src = ip6_iro_addr_dup(opt->iro_src,
+								 sk->sk_allocation);
+			if (opt->iro_src && !np->cork.opt->iro_src)
+				return -ENOBUFS;
+
+			np->cork.opt->iro_dst = ip6_iro_addr_dup(opt->iro_dst,
+								 sk->sk_allocation);
+			if (opt->iro_dst && !np->cork.opt->iro_dst)
+				return -ENOBUFS;
+
 			/* need source address above miyazawa*/
 		}
 		dst_hold(&rt->dst);
@@ -1440,6 +1458,8 @@ static void ip6_cork_release(struct inet_sock *inet, struct ipv6_pinfo *np)
 		kfree(np->cork.opt->dst1opt);
 		kfree(np->cork.opt->hopopt);
 		kfree(np->cork.opt->srcrt);
+		kfree(np->cork.opt->iro_src);
+		kfree(np->cork.opt->iro_dst);
 		kfree(np->cork.opt);
 		np->cork.opt = NULL;
 	}
@@ -1495,6 +1515,8 @@ int ip6_push_pending_frames(struct sock *sk)
 		ipv6_push_frag_opts(skb, opt, &proto);
 	if (opt && opt->opt_nflen)
 		ipv6_push_nfrag_opts(skb, opt, &proto, &final_dst);
+	if (opt && opt->iro_dst)
+		final_dst = opt->iro_dst;
 
 	skb_push(skb, sizeof(struct ipv6hdr));
 	skb_reset_network_header(skb);
diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
index 45e6efb..1a11bd5 100644
--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -828,7 +828,8 @@ static int rawv6_sendmsg(struct kiocb *iocb, struct sock *sk,
 			if (flowlabel == NULL)
 				return -EINVAL;
 		}
-		if (!(opt->opt_nflen|opt->opt_flen))
+		if (!(opt->opt_nflen|opt->opt_flen) &&
+		    (!opt->iro_src && !opt->iro_dst))
 			opt = NULL;
 	}
 	if (opt == NULL)
-- 
1.7.1


^ permalink raw reply related

* ipvs: Use frag walker helper in SCTP proto support.
From: David Miller @ 2010-10-04  6:46 UTC (permalink / raw)
  To: horms; +Cc: netdev, netfilter-devel


Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/netfilter/ipvs/ip_vs_proto_sctp.c |   19 ++++++++++---------
 1 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/net/netfilter/ipvs/ip_vs_proto_sctp.c b/net/netfilter/ipvs/ip_vs_proto_sctp.c
index 4c0855c..2f982a4 100644
--- a/net/netfilter/ipvs/ip_vs_proto_sctp.c
+++ b/net/netfilter/ipvs/ip_vs_proto_sctp.c
@@ -61,6 +61,7 @@ sctp_snat_handler(struct sk_buff *skb,
 {
 	sctp_sctphdr_t *sctph;
 	unsigned int sctphoff;
+	struct sk_buff *iter;
 	__be32 crc32;
 
 #ifdef CONFIG_IP_VS_IPV6
@@ -89,8 +90,8 @@ sctp_snat_handler(struct sk_buff *skb,
 
 	/* Calculate the checksum */
 	crc32 = sctp_start_cksum((u8 *) sctph, skb_headlen(skb) - sctphoff);
-	for (skb = skb_shinfo(skb)->frag_list; skb; skb = skb->next)
-		crc32 = sctp_update_cksum((u8 *) skb->data, skb_headlen(skb),
+	skb_walk_frags(skb, iter)
+		crc32 = sctp_update_cksum((u8 *) iter->data, skb_headlen(iter),
 				          crc32);
 	crc32 = sctp_end_cksum(crc32);
 	sctph->checksum = crc32;
@@ -102,9 +103,9 @@ static int
 sctp_dnat_handler(struct sk_buff *skb,
 		  struct ip_vs_protocol *pp, struct ip_vs_conn *cp)
 {
-
 	sctp_sctphdr_t *sctph;
 	unsigned int sctphoff;
+	struct sk_buff *iter;
 	__be32 crc32;
 
 #ifdef CONFIG_IP_VS_IPV6
@@ -133,8 +134,8 @@ sctp_dnat_handler(struct sk_buff *skb,
 
 	/* Calculate the checksum */
 	crc32 = sctp_start_cksum((u8 *) sctph, skb_headlen(skb) - sctphoff);
-	for (skb = skb_shinfo(skb)->frag_list; skb; skb = skb->next)
-		crc32 = sctp_update_cksum((u8 *) skb->data, skb_headlen(skb),
+	skb_walk_frags(skb, iter)
+		crc32 = sctp_update_cksum((u8 *) iter->data, skb_headlen(iter),
 					  crc32);
 	crc32 = sctp_end_cksum(crc32);
 	sctph->checksum = crc32;
@@ -145,9 +146,9 @@ sctp_dnat_handler(struct sk_buff *skb,
 static int
 sctp_csum_check(int af, struct sk_buff *skb, struct ip_vs_protocol *pp)
 {
-	struct sk_buff *list = skb_shinfo(skb)->frag_list;
 	unsigned int sctphoff;
 	struct sctphdr *sh, _sctph;
+	struct sk_buff *iter;
 	__le32 cmp;
 	__le32 val;
 	__u32 tmp;
@@ -166,9 +167,9 @@ sctp_csum_check(int af, struct sk_buff *skb, struct ip_vs_protocol *pp)
 	cmp = sh->checksum;
 
 	tmp = sctp_start_cksum((__u8 *) sh, skb_headlen(skb));
-	for (; list; list = list->next)
-		tmp = sctp_update_cksum((__u8 *) list->data,
-					skb_headlen(list), tmp);
+	skb_walk_frags(skb, iter)
+		tmp = sctp_update_cksum((__u8 *) iter->data,
+					skb_headlen(iter), tmp);
 
 	val = sctp_end_cksum(tmp);
 
-- 
1.7.3.1


^ permalink raw reply related

* ixgbe: normalize frag_list usage
From: David Miller @ 2010-10-04  6:54 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: jesse.brandeburg, bruce.w.allan, netdev


Signed-off-by: David S. Miller <davem@davemloft.net>
---

Basically I want to get the whole tree to consistently use
the convention that the frag list is built as packets arrive
by using "skb->prev" of the head skb to track the tail member
of the frag list.

This will allow me to do something like:

struct sk_buff {
	union {
		struct {
			struct sk_buff *next;
			struct sk_buff *prev;
		};
		struct {
			struct sk_buff *frag_next;
			struct sk_buff *frag_tail_tracker;
		};
	};
 ...
};

And have all frag_list code use these members consistently.
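A minimal userspace sketch of the convention described above, assuming a heavily simplified, hypothetical struct fake_skb in place of struct sk_buff (skb_shinfo(skb)->frag_list is modeled as a plain field). Appending is O(1) because the head's ->prev remembers the tail, and the walker macro is modeled on skb_walk_frags() so the head pointer is never clobbered during iteration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sk_buff with only the fields needed. */
struct fake_skb {
    struct fake_skb *next;      /* link inside a frag_list */
    struct fake_skb *prev;      /* on the head only: tail of its frag_list */
    struct fake_skb *frag_list; /* models skb_shinfo(skb)->frag_list */
    unsigned int len, data_len;
};

/* Append @frag to @head's frag_list, using head->prev as tail tracker. */
static void frag_list_append(struct fake_skb *head, struct fake_skb *frag)
{
    assert(head != NULL && frag != NULL);
    frag->next = NULL;
    if (!head->frag_list)
        head->frag_list = frag;
    else
        head->prev->next = frag;
    head->prev = frag;          /* remember the new tail */
    head->len += frag->len;
    head->data_len += frag->len;
}

/* Modeled on skb_walk_frags(): iterate with a separate cursor. */
#define fake_walk_frags(head, iter) \
    for ((iter) = (head)->frag_list; (iter); (iter) = (iter)->next)

/* Sum the payload carried in the frag_list. */
static unsigned int frag_bytes(struct fake_skb *head)
{
    struct fake_skb *iter;
    unsigned int total = 0;

    fake_walk_frags(head, iter)
        total += iter->len;
    return total;
}
```

When the final (EOP) buffer arrives, the head's ->prev would be cleared before the packet is handed up, as the ixgbe hunk below does with skb->prev = NULL.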

While doing this patch I noticed that there are some left-over bits in
the IXGBEVF driver that build these IXGBE-style (pre-patch) skb->prev
RSC chains.

But nothing other than the RX ring shutdown code processes them, so if
they are ever created they will likely leak or cause some other kind of
problem.

Please take a look, thanks.

diff --git a/drivers/net/ixgbe/ixgbe.h b/drivers/net/ixgbe/ixgbe.h
index 5cebc37..434c9fa 100644
--- a/drivers/net/ixgbe/ixgbe.h
+++ b/drivers/net/ixgbe/ixgbe.h
@@ -181,6 +181,7 @@ struct ixgbe_ring {
 	struct ixgbe_queue_stats stats;
 	unsigned long reinit_state;
 	int numa_node;
+	struct sk_buff *rsc_skb;	/* RSC packet being built */
 	u64 rsc_count;			/* stat for coalesced packets */
 	u64 rsc_flush;			/* stats for flushed packets */
 	u32 restart_queue;		/* track tx queue restarts */
diff --git a/drivers/net/ixgbe/ixgbe_main.c b/drivers/net/ixgbe/ixgbe_main.c
index c35e13c..43987a4 100644
--- a/drivers/net/ixgbe/ixgbe_main.c
+++ b/drivers/net/ixgbe/ixgbe_main.c
@@ -1130,36 +1130,6 @@ static inline u32 ixgbe_get_rsc_count(union ixgbe_adv_rx_desc *rx_desc)
 		IXGBE_RXDADV_RSCCNT_SHIFT;
 }
 
-/**
- * ixgbe_transform_rsc_queue - change rsc queue into a full packet
- * @skb: pointer to the last skb in the rsc queue
- * @count: pointer to number of packets coalesced in this context
- *
- * This function changes a queue full of hw rsc buffers into a completed
- * packet.  It uses the ->prev pointers to find the first packet and then
- * turns it into the frag list owner.
- **/
-static inline struct sk_buff *ixgbe_transform_rsc_queue(struct sk_buff *skb,
-							u64 *count)
-{
-	unsigned int frag_list_size = 0;
-
-	while (skb->prev) {
-		struct sk_buff *prev = skb->prev;
-		frag_list_size += skb->len;
-		skb->prev = NULL;
-		skb = prev;
-		*count += 1;
-	}
-
-	skb_shinfo(skb)->frag_list = skb->next;
-	skb->next = NULL;
-	skb->len += frag_list_size;
-	skb->data_len += frag_list_size;
-	skb->truesize += frag_list_size;
-	return skb;
-}
-
 struct ixgbe_rsc_cb {
 	dma_addr_t dma;
 	bool delay_unmap;
@@ -1216,10 +1186,13 @@ static bool ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 		prefetch(skb->data);
 		rx_buffer_info->skb = NULL;
 
+		if (!(staterr & IXGBE_RXD_STAT_EOP)) {
+			rx_ring->rsc_skb = skb;
+			rx_ring->rsc_count++;
+		}
 		if (rx_buffer_info->dma) {
 			if ((adapter->flags2 & IXGBE_FLAG2_RSC_ENABLED) &&
-			    (!(staterr & IXGBE_RXD_STAT_EOP)) &&
-				 (!(skb->prev))) {
+			    rx_ring->rsc_skb == skb) {
 				/*
 				 * When HWRSC is enabled, delay unmapping
 				 * of the first packet. It carries the
@@ -1279,9 +1252,12 @@ static bool ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 		}
 
 		if (staterr & IXGBE_RXD_STAT_EOP) {
-			if (skb->prev)
-				skb = ixgbe_transform_rsc_queue(skb,
-								&(rx_ring->rsc_count));
+			if (rx_ring->rsc_skb) {
+				skb = rx_ring->rsc_skb;
+				rx_ring->rsc_skb = NULL;
+
+				skb->prev = NULL;
+			}
 			if (adapter->flags2 & IXGBE_FLAG2_RSC_ENABLED) {
 				if (IXGBE_RSC_CB(skb)->delay_unmap) {
 					dma_unmap_single(&pdev->dev,
@@ -1307,8 +1283,22 @@ static bool ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 				next_buffer->skb = skb;
 				next_buffer->dma = 0;
 			} else {
-				skb->next = next_buffer->skb;
-				skb->next->prev = skb;
+				struct sk_buff *head = rx_ring->rsc_skb;
+				struct sk_buff *next = next_buffer->skb;
+
+				if (!skb_shinfo(head)->frag_list) {
+					skb_shinfo(head)->frag_list = next;
+				} else {
+					head->prev->next = next;
+				}
+
+				/* ->prev tracks the last skb in the frag_list */
+				head->prev = next;
+				rx_ring->rsc_count++;
+
+				head->len += next->len;
+				head->data_len += next->len;
+				head->truesize += next->len;
 			}
 			rx_ring->non_eop_descs++;
 			goto next_desc;

^ permalink raw reply related

* Re: [PATCH net-next] net: introduce DST_NOCACHE flag
From: Eric Dumazet @ 2010-10-04  7:19 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20101003.221846.245388012.davem@davemloft.net>

On Sunday 03 October 2010 at 22:18 -0700, David Miller wrote:

> Cute, and applied, but it shows that we've RCU'd so much of the
> surrounding infrastructure that the neighbour cache is now pretty
> high on the list of things to RCU.

Yes, this is the plan; I began the work Friday evening ;)




^ permalink raw reply

* powerpc, fs_enet: scanning PHY after Linux is up
From: Heiko Schocher @ 2010-10-04  7:32 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: devicetree-discuss, Holger Brunck, Detlev Zundel, netdev

Hello all,

On the mgcoge board (arch/powerpc/boot/dts/mgcoge.dts) we have 3 fs_enet
devices. The first one is accessible at boot, so it is probed correctly
and works fine. For the other two fs_enet devices, the PHYs are held in
reset at startup and are only taken out of reset later by user
applications. So at boot, these 2 fs_enet devices cannot detect their
PHYs in of_mdiobus_register() (drivers/of/of_mdio.c), and if we want to
use them later, we get, for example:

-bash-3.2# ifconfig eth2 172.31.31.33
net eth2: Could not attach to PHY
SIOCSIFFLAGS: No such device

So the problem is that we cannot rescan the PHYs once they become
accessible. We also cannot load the fs_enet driver as a module,
because the first port is permanently in use.

So the first question that comes to mind is:

Is of_mdiobus_register() in drivers/of/of_mdio.c the right place to
detect the PHY, or would it be better to do it when the port is
actually used?

But we found another way to solve this issue:

After the PHYs are out of reset, we just have to rescan them
with (for example, for the PHY at address 1)

phydev = mdiobus_scan(bus, 1);

and

np = of_find_node_by_path("/soc@f0000000/cpm@119c0/mdio@10d40/ethernet-phy@1");
of_node_get(np);
dev_archdata_set_node(&phydev->dev.archdata, np);

but that's just a hack ...

So the question is: is there a proper way to solve this problem?

If there is no standard option, what about adding a
"scan_phy" file in

/proc/device-tree/soc\@f0000000/cpm\@119c0/mdio\@10d40
(or some better location?)

with which we could rescan a PHY via
"echo addr > /proc/device-tree/soc\@f0000000/cpm\@119c0/mdio\@10d40/scan_phy"
(so there would be no need to use of_find_node_by_path(), as we would
 have the associated device node here and could step through the child
 nodes with "for_each_child_of_node(np, child)", checking whether reg == addr)

Or, at the very least, if the PHY couldn't be found when opening the
port, shouldn't a rescan be triggered in case it is now accessible?
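The child-node walk proposed for "scan_phy" can be sketched in userspace as follows. This is only an illustration under stated assumptions: struct fake_dt_node is a hypothetical simplification of struct device_node (in the kernel, the walk would use for_each_child_of_node() and read "reg" from the node's properties), and find_phy_child() is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified device-tree node. */
struct fake_dt_node {
    const char *name;
    unsigned int reg;               /* PHY address on the MDIO bus */
    struct fake_dt_node *child;     /* first child */
    struct fake_dt_node *sibling;   /* next sibling */
};

/* Return the child of the MDIO bus node whose reg equals @addr, or NULL,
 * mirroring a for_each_child_of_node(np, child) loop. */
static struct fake_dt_node *find_phy_child(struct fake_dt_node *mdio,
                                           unsigned int addr)
{
    struct fake_dt_node *child;

    assert(mdio != NULL);
    for (child = mdio->child; child; child = child->sibling)
        if (child->reg == addr)
            return child;
    return NULL;
}
```

With such a lookup, the node found for the written address could be attached to the phy_device returned by mdiobus_scan(), replacing the hard-coded of_find_node_by_path() call in the hack above.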

bye,
Heiko
-- 
DENX Software Engineering GmbH,     MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany

^ permalink raw reply

* Re: ipvs: Use frag walker helper in SCTP proto support.
From: Simon Horman @ 2010-10-04  7:59 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, netfilter-devel, lvs-devel, Patrick McHardy
In-Reply-To: <20101003.234601.232735559.davem@davemloft.net>

On Sun, Oct 03, 2010 at 11:46:01PM -0700, David Miller wrote:
> 
> Signed-off-by: David S. Miller <davem@davemloft.net>

Acked-by: Simon Horman <horms@verge.net.au>

Dave, I'm happy for this to go via your tree or Patrick's.
I don't believe it conflicts with any of the other changes
that are pending.

^ permalink raw reply

* Re: [PATCHv4 net-next-2.6 1/5] XFRM,IPv6: Remove xfrm_spi_hash() dependency on destination address
From: Herbert Xu @ 2010-10-04  8:33 UTC (permalink / raw)
  To: Arnaud Ebalard; +Cc: David S. Miller, Eric Dumazet, Hideaki YOSHIFUJI, netdev
In-Reply-To: <5a0e320544e253cc903cfd3292600b6bec044a5f.1286139129.git.arno@natisbad.org>

On Mon, Oct 04, 2010 at 08:25:07AM +0200, Arnaud Ebalard wrote:
>
> At the moment, the Linux XFRM stack includes the destination address
> when computing the hash to perform state lookup by SPI. This patch
> changes the XFRM state hash computation so that the destination
> address is no longer used. This will later allow finding states for
> packets with mangled destination addresses.

I'm fine with doing this for inbound SAs.  However, I can't see
how we can do this for outbound SAs where the SPI is chosen by
the remote end.

Incidentally, it appears that our hash could do with some
strengthening.
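To make the proposal concrete, a hash bucket keyed only on (SPI, protocol) might look like the sketch below. It is not the kernel's __xfrm_spi_hash(); spi_proto_hash() is a hypothetical name and the mixing step (Knuth's 32-bit multiplicative constant plus a shift-xor) is chosen purely for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bucket function that ignores the destination address.
 * @hmask must be 2^n - 1 for some n. */
static unsigned int spi_proto_hash(uint32_t spi, uint8_t proto,
                                   unsigned int hmask)
{
    uint32_t h = spi ^ ((uint32_t)proto << 24);

    assert(((hmask + 1) & hmask) == 0);
    h *= 2654435761u;   /* spread SPIs that differ only in a few bits */
    h ^= h >> 16;
    return h & hmask;
}
```

Dropping the address from the key is only safe where SPIs are unique per host, i.e. for inbound SAs whose SPI is chosen locally; outbound SAs, whose SPI is chosen by the peer, would still need the destination address to disambiguate, which is the concern raised above.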

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH] tms380tr: fix long delays in initialization
From: Meelis Roos @ 2010-10-04  8:39 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20101003.200109.226763463.davem@davemloft.net>

> > tms380tr driver tries to use udelay (meaning busy loop) for several half 
> > second delays during hardware initialization. Crazy overly long busy 
> > wait delays mean no delay at all so driver initialization fails without 
> > waiting. Fix it by using msleep() for long delays and leave it to 
> > udelay() for short delays.
> > 
> > Signed-off-by: Meelis Roos <mroos@linux.ee>
> 
> You can't use msleep() here because this code can be invoked
> from interrupts and thus cannot sleep.

I checked that the two functions containing the long delays I changed,
tms380tr_bringup_diags() and tms380tr_init_adapter(), are called only
from tms380tr_chipset_init(), which in turn is called only from
tms380tr_open(), so there are no call paths from interrupt context AFAICS.
Short delays in interrupt context are not changed in any way; they still
use udelay().
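The split described in the patch (busy-wait for short delays, sleep for long ones) can be illustrated with a userspace analogue. Assumptions: SLEEP_THRESHOLD_US is a hypothetical cut-over point, nanosleep() stands in for msleep() and a clock_gettime() spin loop stands in for udelay(). As in the kernel, only the spinning branch would be acceptable in a context that cannot sleep.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <errno.h>
#include <time.h>

#define SLEEP_THRESHOLD_US 1000UL   /* hypothetical cut-over point */

static long long now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Returns 1 if it slept, 0 if it busy-waited. */
static int delay_us(unsigned long us)
{
    if (us >= SLEEP_THRESHOLD_US) {
        struct timespec req = { (time_t)(us / 1000000UL),
                                (long)(us % 1000000UL) * 1000L };
        /* Like msleep(): give up the CPU; resume if a signal interrupts. */
        while (nanosleep(&req, &req) != 0 && errno == EINTR)
            ;
        return 1;
    } else {
        long long end = now_ns() + (long long)us * 1000LL;

        /* Like udelay(): spin without sleeping. */
        while (now_ns() < end)
            ;
        return 0;
    }
}
```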

-- 
Meelis Roos (mroos@linux.ee)

^ permalink raw reply

* Re: [PATCHv4 net-next-2.6 4/5] XFRM,IPv6: Add IRO remapping hook in xfrm_input()
From: Herbert Xu @ 2010-10-04  8:40 UTC (permalink / raw)
  To: Arnaud Ebalard; +Cc: David S. Miller, Eric Dumazet, Hideaki YOSHIFUJI, netdev
In-Reply-To: <db067c0a2ab679dfb16c84e8509e671fa6c5cb01.1286139129.git.arno@natisbad.org>

On Mon, Oct 04, 2010 at 08:25:23AM +0200, Arnaud Ebalard wrote:
> Add a hook in xfrm_input() to allow IRO remapping to occur when
> an incoming packet matching an existing SA (based on SPI) with
> an unexpected destination or source address is received.
> Because IRO does not consume additional bits in a packet (that's
> the point), there is no way to demultiplex based on something
> like nh or spi. Instead, IRO input handlers (for source and
> destination address remapping) are called upon address mismatch
> during IPsec processing.
> For that to work, we rely on the fact that SPI values generated
> locally are no longer linked to the destination address (first patch
> of the set) and we postpone a bit the expected address check in
> xfrm_input() (inside xfrm_state_lookup() against daddr param) by
> introducing a call to the input_addr_check() handler from the
> struct xfrm_state_afinfo associated with the address family.
> 
> Signed-off-by: Arnaud Ebalard <arno@natisbad.org>

I would prefer for this check to go into x->type->input since
it does not apply to IPsec.

Just because the SPI is unique for inbound SAs, it doesn't mean
that we should ignore the destination IP address in the packet for
IPsec.

I think another way of getting what you want is to simply add
inbound SAs with a zero destination address in your case which
can then be made to match any destination IP address.  You can
then follow that up with additional checks in x->type->input.
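The matching rule suggested here (an inbound SA with an all-zero destination address matches any destination, while other SAs still require an exact match) can be sketched with a toy table. struct toy_sa and toy_sa_lookup() are hypothetical and much simpler than struct xfrm_state and xfrm_state_lookup(); any further per-protocol checks would then live in the type's input handler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified inbound SA entry. */
struct toy_sa {
    uint32_t spi;
    uint8_t  proto;
    uint8_t  daddr[16];   /* all zeroes = wildcard destination */
};

static int addr_is_any(const uint8_t a[16])
{
    static const uint8_t zero[16];

    return memcmp(a, zero, 16) == 0;
}

/* Look up by SPI/proto, honouring the zero-address wildcard. */
static const struct toy_sa *toy_sa_lookup(const struct toy_sa *tab, size_t n,
                                          uint32_t spi, uint8_t proto,
                                          const uint8_t daddr[16])
{
    size_t i;

    assert(tab != NULL || n == 0);
    for (i = 0; i < n; i++) {
        const struct toy_sa *x = &tab[i];

        if (x->spi != spi || x->proto != proto)
            continue;
        if (!addr_is_any(x->daddr) && memcmp(x->daddr, daddr, 16) != 0)
            continue;   /* specific SA, destination mismatch */
        return x;
    }
    return NULL;
}
```

This keeps the destination check for ordinary IPsec SAs while letting only explicitly configured wildcard SAs accept packets with a remapped destination.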

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH net-next V3] net: dynamic ingress_queue allocation
From: Eric Dumazet @ 2010-10-04  8:42 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: hadi, David Miller, netdev
In-Reply-To: <20101003094221.GA2028@del.dom.local>

On Sunday 03 October 2010 at 11:42 +0200, Jarek Poplawski wrote:

> I'd consider rcu_dereference_rtnl(). Btw, technically qdisc_lookup()
> doesn't require rtnl, and there was time it was used without it
> (on xmit path).


Hmm, for me, rcu_dereference_rtnl() is a bit lazy.

Either we are a reader and should use rcu_dereference(), or a writer and
RTNL should be held.

Mixing two conditions in a "super macro" is a workaround we used to
quickly shut up some lockdep splats. The real fix is to use strict
lockdep conditions, because they better document the code and its
locking invariants.

BTW, rtnl_dereference() should be changed to use
rcu_dereference_protected() instead of rcu_dereference_check():
if RTNL is held, there is no need to force a barrier.
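The reader/writer distinction can be illustrated by a userspace analogy with C11 atomics. This is an analogy only, not kernel code: the mutex named rtnl stands in for RTNL, an acquire load is a conservative stand-in for rcu_dereference(), a relaxed load under the lock plays the role of rcu_dereference_protected(), and a release store plays the role of rcu_assign_pointer(). struct cfg and the helper names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

struct cfg { int value; };

static pthread_mutex_t rtnl = PTHREAD_MUTEX_INITIALIZER;
static struct cfg *_Atomic cur_cfg;

/* Lockless reader: needs an ordered load (rcu_dereference() analogue). */
static struct cfg *reader_get(void)
{
    return atomic_load_explicit(&cur_cfg, memory_order_acquire);
}

/* Caller holds rtnl, so no concurrent update is possible and no ordering
 * is needed (rcu_dereference_protected() analogue). */
static struct cfg *locked_get(void)
{
    return atomic_load_explicit(&cur_cfg, memory_order_relaxed);
}

/* Caller holds rtnl; publish with release semantics so lockless readers
 * see a fully initialised object (rcu_assign_pointer() analogue). */
static void locked_set(struct cfg *c)
{
    atomic_store_explicit(&cur_cfg, c, memory_order_release);
}
```

The point of the stricter split is documentation: each accessor states which side of the locking protocol its caller is on, rather than accepting either condition.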


> I think you should also add a comment here why this rcu is used, and
> that it changes only once in dev's lifetime.
> 

This comment was needed in the previous version of the patch, with the
smp_wmb() barrier. Now that I have switched to regular RCU usage, nothing
prevents us from changing dev->ingress_queue in flight, although there is
currently no reason to do so.




^ permalink raw reply

* Re: [PATCH] sysctl: fix min/max handling in __do_proc_doulongvec_minmax()
From: Robin Holt @ 2010-10-04  8:59 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Andrew Morton, linux-kernel, Robin Holt, Willy Tarreau,
	David S. Miller, netdev, James Morris, Hideaki YOSHIFUJI,
	Pekka Savola (ipv6), Patrick McHardy, Alexey Kuznetsov
In-Reply-To: <1286025469.2582.1806.camel@edumazet-laptop>

On Sat, Oct 02, 2010 at 03:17:49PM +0200, Eric Dumazet wrote:
> When proc_doulongvec_minmax() is used with an array of longs,
> and no min/max check requested (.extra1 or .extra2 being NULL), we
> dereference a NULL pointer for the second element of the array.
> 
> Noticed while doing some changes in network stack for the "16TB problem"
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> ---
>  kernel/sysctl.c |    3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index f88552c..4fba86d 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -2500,7 +2500,8 @@ static int __do_proc_doulongvec_minmax(void *data, struct ctl_table *table, int
>  				break;
>  			if (neg)
>  				continue;
> -			if ((min && val < *min) || (max && val > *max))
> +			if ((table->extra1 && val < *min) ||
> +			    (table->extra2 && val > *max))

How about changing:
        for (; left && vleft--; i++, min++, max++, first=0) {
into:
        for (; left && vleft--; i++, min = min ? min + 1 : NULL, max = max ? max + 1: NULL, first=0) {

That would make min and max correct and reduce the chances somebody in
the future overlooks the fact they are currently filled with garbage.
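The loop-increment change proposed here can be sketched as a standalone function. first_out_of_range() is a hypothetical helper, not the sysctl code itself; it checks each value against optional per-element bound arrays and advances a bound pointer only when it is non-NULL, so a missing .extra1 or .extra2 is never turned into a bogus non-NULL pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Return the index of the first out-of-range element, or -1. Either
 * bound array may be NULL, meaning "no check" for that side. */
static int first_out_of_range(const unsigned long *vals, size_t n,
                              const unsigned long *min,
                              const unsigned long *max)
{
    size_t i;

    assert(vals != NULL || n == 0);
    for (i = 0; i < n; i++, min = min ? min + 1 : NULL,
                            max = max ? max + 1 : NULL) {
        if ((min && vals[i] < *min) || (max && vals[i] > *max))
            return (int)i;
    }
    return -1;
}
```

Compared with testing table->extra1/extra2 inside the loop, this keeps min and max themselves meaningful at every iteration.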

Robin

^ permalink raw reply

* [PATCH net-next] net: relax rtnl_dereference()
From: Eric Dumazet @ 2010-10-04  9:00 UTC (permalink / raw)
  To: David Miller; +Cc: hadi, netdev, Jarek Poplawski
In-Reply-To: <1286181729.18293.8.camel@edumazet-laptop>

On Monday 04 October 2010 at 10:42 +0200, Eric Dumazet wrote:

> BTW, rtnl_dereference() should be changed to use
> rcu_dereference_protected() instead of rcu_dereference_check() :
> If RTNL is held, there is no need to force a barrier.
> 

[PATCH net-next] net: relax rtnl_dereference()

rtnl_dereference() is used in contexts where RTNL is held, to fetch an
RCU protected pointer.
 
Updates to this pointer are prevented by RTNL, so we don't need
smp_read_barrier_depends() and the ACCESS_ONCE() provided in
rcu_dereference_check().

rtnl_dereference() is mainly a macro to document the locking invariant.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 include/linux/rtnetlink.h |   10 ++++++----
 1 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
index 68c436b..d3c4efa 100644
--- a/include/linux/rtnetlink.h
+++ b/include/linux/rtnetlink.h
@@ -754,20 +754,22 @@ extern int lockdep_rtnl_is_held(void);
  * @p: The pointer to read, prior to dereferencing
  *
  * Do an rcu_dereference(p), but check caller either holds rcu_read_lock()
- * or RTNL
+ * or RTNL. Note : Please prefer rtnl_dereference() or rcu_dereference()
  */
 #define rcu_dereference_rtnl(p)					\
 	rcu_dereference_check(p, rcu_read_lock_held() ||	\
 				 lockdep_rtnl_is_held())
 
 /**
- * rtnl_dereference - rcu_dereference with debug checking
+ * rtnl_dereference - fetch RCU pointer when updates are prevented by RTNL
  * @p: The pointer to read, prior to dereferencing
  *
- * Do an rcu_dereference(p), but check caller holds RTNL
+ * Return the value of the specified RCU-protected pointer, but omit
+ * both the smp_read_barrier_depends() and the ACCESS_ONCE(), because
+ * caller holds RTNL.
  */
 #define rtnl_dereference(p)					\
-	rcu_dereference_check(p, lockdep_rtnl_is_held())
+	rcu_dereference_protected(p, lockdep_rtnl_is_held())
 
 extern void rtnetlink_init(void);
 extern void __rtnl_unlock(void);



^ permalink raw reply related

* [patch v4 00/12] IPVS: SIP Persistence Engine
From: Simon Horman @ 2010-10-04  9:03 UTC (permalink / raw)
  To: lvs-devel, netdev, netfilter, netfilter-devel
  Cc: Jan Engelhardt, Stephen Hemminger, Wensong Zhang,
	Julian Anastasov, Patrick McHardy

This patch series adds load-balancing of UDP SIP based on Call-ID to
IPVS as well as a framework for extending IPVS to handle alternate
persistence requirements.

REVISIONS

This is v4 of the patch series which addresses the following concerns
raised by Julian Anastasov:
 - initialize compat structures to avoid crash in
   ip_vs_add_service for pe_name
 - fix ip_vs_conn_out_get_proto to properly check the
   ip_vs_conn_fill_param_proto result

These changes and changes from previous revisions of this patch series
are noted on a per-patch basis.

I have taken the liberty of including Acked-by: Julian Anastasov <ja@ssi.bg>
although the ack was provided before the two problems above were discovered.

OVERVIEW

The approach that I have taken is what I call persistence engines.
The basic idea is that you can provide a module to LVS that alters
the way it handles connection templates, which are at the core
of persistence. In particular, an additional key can be added, and
any of the normal IP address, port and protocol information can either
be used or ignored.

In the case of the SIP persistence engine, currently the only persistence
engine, all the keys used by the default persistence behaviour are used and
the callid is added as an extra key. I originally intended to ignore the
client address (cip), but the same effect can be had by setting the
persistence mask (-M) to 0.0.0.0, while still allowing the flexibility of
other mask values.

It is envisaged that the SIP persistence engine will be used in conjunction
with one-packet scheduling. I'm interested to hear if that doesn't fit your
needs.


CONFIGURATION

A persistence engine is associated with a virtual service
(as are schedulers). I have added the --pe option to the
ipvsadm -A and -E commands to allow the persistence engine
of a virtual service to be added, changed, or deleted.

e.g. ipvsadm -A -u 10.4.3.192:5060 -p 60 -M 0.0.0.0 -o --pe sip

There are no other configuration parameters at this time.


RUNNING

When a connection template is created, if its virtual service
has a persistence engine, then the persistence engine can add
an extra key to the connection template. For the SIP module this
is the callid; more generically, it is known as "pe data". Both
the name of the persistence engine ("pe name") and the "pe data"
can be viewed in /proc/net/ip_vs_conn and by passing the
--persistent-conn option to ipvsadm -Lc.

e.g.
# ipvsadm -Lcn --persistent-conn
UDP 00:38  UDP         10.4.3.0:0         10.4.3.192:5060    127.0.0.1:5060 sip 193373839

Here we see a single persistence template (cport is 0), which has been
handled by the sip persistence engine. The pe data (callid) is 193373839.

In the case where the persistence engine can't match a packet for some
reason, the connection will fall back to the normal persistence handling.
This seems reasonable, as iptables could be used if the packet ought to
be dropped instead.

A limited amount of debugging information has been added which
can be enabled using a value of 9 or greater in
/proc/sys/net/ipv4/vs/debug_level

CODE AVAILABILITY

The kernel patches (12) are available in git as the pe-3 branch of
git://git.kernel.org/pub/scm/linux/kernel/git/horms/lvs-test-2.6.git

The ipvsadm patches (2) are available in git as the pe-3 branch of
git://github.com/horms/ipvsadm-test.git

There is no change to the ipvsadm patches since the v2 series
so I will not repost them.


^ permalink raw reply

* [patch v4 01/12] netfilter: nf_conntrack_sip: Allow ct_sip_get_header() to be called with a null ct argument
From: Simon Horman @ 2010-10-04  9:03 UTC (permalink / raw)
  To: lvs-devel, netdev, netfilter, netfilter-devel
  Cc: Jan Engelhardt, Stephen Hemminger, Wensong Zhang,
	Julian Anastasov, Patrick McHardy
In-Reply-To: <20101004090352.966809749@akiko.akashicho.tokyo.vergenet.net>

[-- Attachment #1: 0001-netfilter-nf_conntrack_sip-Allow-ct_sip_get_header-t.patch --]
[-- Type: text/plain, Size: 515 bytes --]

Signed-off-by: Simon Horman <horms@verge.net.au>
Acked-by: Julian Anastasov <ja@ssi.bg>

diff --git a/net/netfilter/nf_conntrack_sip.c b/net/netfilter/nf_conntrack_sip.c
index 53d8922..2fd1ea2 100644
--- a/net/netfilter/nf_conntrack_sip.c
+++ b/net/netfilter/nf_conntrack_sip.c
@@ -152,6 +152,9 @@ static int parse_addr(const struct nf_conn *ct, const char *cp,
 	const char *end;
 	int ret = 0;
 
+	if (!ct)
+		return 0;
+
 	memset(addr, 0, sizeof(*addr));
 	switch (nf_ct_l3num(ct)) {
 	case AF_INET:
-- 
1.7.1



^ permalink raw reply related

