Netdev List


* Re: [PATCH net-next-2.6] IB/{nes,ipoib}: Pass supported flags to ethtool_op_set_flags()
From: David Miller @ 2010-07-04 18:48 UTC
  To: rdreier
  Cc: bhutchings, rolandd, randy.dunlap, netdev, linux-net-drivers,
	sgruszka, amit.salecha, amwang, anirban.chakraborty, dm, scofeldm,
	vkolluri, roprabhu, e1000-devel, buytenh, gallatin, brice,
	shemminger, jgarzik, faisal.latif, chien.tin.tung, linux-rdma
In-Reply-To: <adask40nxaa.fsf@roland-alpha.cisco.com>

From: Roland Dreier <rdreier@cisco.com>
Date: Sat, 03 Jul 2010 13:08:29 -0700

>  > Following commit 1437ce3983bcbc0447a0dedcd644c14fe833d266 "ethtool:
>  > Change ethtool_op_set_flags to validate flags", ethtool_op_set_flags
>  > takes a third parameter and cannot be used directly as an
>  > implementation of ethtool_ops::set_flags.
> 
> Acked-by: Roland Dreier <rolandd@cisco.com>

Applied, thanks guys.
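Because ethtool_op_set_flags() now takes a third argument listing the flags a driver supports, a driver can no longer plug it straight into its set_flags hook and needs a small two-argument wrapper instead. The userspace sketch below shows that pattern. Every name in it (mock_net_device, mock_ethtool_op_set_flags, mock_driver_set_flags, and the MOCK_FLAG_* values) is a hypothetical stand-in for the kernel types, and the real helper's body may differ.

```c
#include <errno.h>
#include <stdint.h>

/* Mock flag bits; the values are for illustration only. */
#define MOCK_FLAG_LRO    (1u << 15)
#define MOCK_FLAG_RXHASH (1u << 28)

/* Hypothetical stand-in for struct net_device. */
struct mock_net_device {
	uint32_t features;
};

/*
 * Mock of the three-argument helper: reject any requested flag the
 * driver did not declare as supported, otherwise update those bits.
 */
static int mock_ethtool_op_set_flags(struct mock_net_device *dev,
				     uint32_t data, uint32_t supported)
{
	if (data & ~supported)
		return -EINVAL;
	dev->features = (dev->features & ~supported) | data;
	return 0;
}

/*
 * Driver-side wrapper with the two-argument shape a set_flags hook
 * expects; this hypothetical driver only lets LRO be toggled.
 */
static int mock_driver_set_flags(struct mock_net_device *dev, uint32_t data)
{
	return mock_ethtool_op_set_flags(dev, data, MOCK_FLAG_LRO);
}
```

Requesting an unsupported flag such as MOCK_FLAG_RXHASH then fails with -EINVAL and leaves the device flags untouched.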


* Re: [PATCHv2] xfrm: fix xfrm by MARK logic
From: David Miller @ 2010-07-04 18:46 UTC
  To: eric.dumazet; +Cc: p.kosyh, netdev
In-Reply-To: <1278139083.2474.42.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Sat, 03 Jul 2010 08:38:03 +0200

> On Friday 02 July 2010 at 21:47 +0400, Peter Kosyh wrote:
>> From: Peter Kosyh <p.kosyh@gmail.com>
>> 
>> While using the xfrm by MARK feature in 2.6.34 - 2.6.35 kernels, the
>> mark is always cleared in the flowi structure via memset in
>> _decode_session4 (net/ipv4/xfrm4_policy.c), so the policy lookup fails.
>> The IPv6 code is affected by this bug too.
>> 
>> Signed-off-by: Peter Kosyh <p.kosyh@gmail.com>
>> ---
>> 
> 
> Acked-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied, thanks everyone.
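The patch body is not quoted in this reply, so the following is only a sketch of the failure mode, assuming the fix copies the packet's mark into the flow key after the memset. The structures and functions are hypothetical, heavily reduced stand-ins for sk_buff, flowi and _decode_session4, and the value/mask comparison is an assumed model of how a mark-bound policy is matched.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, heavily reduced stand-ins for sk_buff and flowi. */
struct mock_skb {
	uint32_t mark;
	uint32_t saddr, daddr;
};

struct mock_flowi {
	uint32_t mark;
	uint32_t saddr, daddr;
};

/* Buggy shape: the memset wipes the whole key and the mark is never refilled. */
static void decode_session_buggy(const struct mock_skb *skb,
				 struct mock_flowi *fl)
{
	memset(fl, 0, sizeof(*fl));
	fl->saddr = skb->saddr;
	fl->daddr = skb->daddr;
}

/* Fixed shape: copy the packet mark back in after clearing the key. */
static void decode_session_fixed(const struct mock_skb *skb,
				 struct mock_flowi *fl)
{
	memset(fl, 0, sizeof(*fl));
	fl->mark = skb->mark;
	fl->saddr = skb->saddr;
	fl->daddr = skb->daddr;
}

/* Assumed policy check: a mark-bound policy matches if (mark & mask) == value. */
static int policy_matches(uint32_t value, uint32_t mask,
			  const struct mock_flowi *fl)
{
	return (fl->mark & mask) == value;
}
```

With the buggy decoder, a packet marked 7 produces a key with mark 0, so a policy bound to mark 7 is never found.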


* Re: [PATCH net-next 4/4] bnx2: Update version to 2.0.16.
From: David Miller @ 2010-07-04 18:44 UTC
  To: mchan; +Cc: netdev
In-Reply-To: <1278225738-7795-4-git-send-email-mchan@broadcom.com>

From: "Michael Chan" <mchan@broadcom.com>
Date: Sat, 3 Jul 2010 23:42:18 -0700

> Signed-off-by: Michael Chan <mchan@broadcom.com>

Applied.


* Re: [PATCH net-next 3/4] bnx2: Dump some config space registers during TX timeout.
From: David Miller @ 2010-07-04 18:44 UTC
  To: mchan; +Cc: netdev
In-Reply-To: <1278225738-7795-3-git-send-email-mchan@broadcom.com>

From: "Michael Chan" <mchan@broadcom.com>
Date: Sat, 3 Jul 2010 23:42:17 -0700

> These config register values will be useful when the memory registers
> return 0xffffffff, which has been reported.
> 
> Signed-off-by: Michael Chan <mchan@broadcom.com>

Applied.


* Re: [PATCH net-next 2/4] bnx2: Add support for skb->rxhash.
From: David Miller @ 2010-07-04 18:44 UTC
  To: mchan; +Cc: netdev
In-Reply-To: <1278225738-7795-2-git-send-email-mchan@broadcom.com>

From: "Michael Chan" <mchan@broadcom.com>
Date: Sat, 3 Jul 2010 23:42:16 -0700

> Add skb->rxhash support for TCP packets only because the bnx2 RSS hash
> does not hash UDP ports.
> 
> Signed-off-by: Michael Chan <mchan@broadcom.com>

Applied.


* Re: [PATCH net-next 1/4] bnx2: Always enable MSI-X on 5709.
From: David Miller @ 2010-07-04 18:44 UTC
  To: mchan; +Cc: netdev
In-Reply-To: <1278225738-7795-1-git-send-email-mchan@broadcom.com>

From: "Michael Chan" <mchan@broadcom.com>
Date: Sat, 3 Jul 2010 23:42:15 -0700

> Minor change to use MSI-X even if there is only one CPU.  This allows
> the CNIC driver to always have a dedicated MSI-X vector to handle
> iSCSI events, instead of sharing the MSI vector.
> 
> Signed-off-by: Michael Chan <mchan@broadcom.com>

Applied.


* Re: QoS weirdness : HTB accuracy
From: Andrew Beverley @ 2010-07-04 17:50 UTC
  To: Julien Vehent; +Cc: Philip A. Prindeville, Netdev, netfilter
In-Reply-To: <1276204948.1403.13.camel@andybev>

> > It was, in fact, an error in my ruleset. I had put the 'linklayer atm' at
> > both the branch and leaf levels, so the overhead was computed twice,
> > creating those holes in the bandwidth.
> 
> I am seeing similar behaviour with my setup. Am I making the same
> mistake? A subset of my rules is as follows:
> 
> 
> tc qdisc add dev ppp0 root handle 1: htb r2q 1
> 
> tc class add dev ppp0 parent 1: classid 1:1 htb \
>     rate ${DOWNLINK}kbit ceil ${DOWNLINK}kbit \
>     overhead $overhead linklayer atm                   <------- Here
> 
> tc class add dev ppp0 parent 1:1 classid 1:10 htb \
>     rate 612kbit ceil 612kbit prio 0 \
>     overhead $overhead linklayer atm                   <------- And here
> 
> tc qdisc add dev ppp0 parent 1:10 handle 4210: \
>     sfq perturb 10 limit 50
> 
> tc filter add dev ppp0 parent 1:0 protocol ip \
>     prio 10 handle 10 fw flowid 1:10

I removed the overhead option on the first leaf, and the speeds changed
to what I expected. However, the rules above are taken straight from the
ADSL Optimizer project, which was the source of the original overhead
patch for tc. So is the ADSL Optimizer project wrong?

Andy
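The `linklayer atm` and `overhead` options tell tc to charge each packet for the ATM cells it actually occupies on an ADSL line. The sketch below models that "cell tax" in isolation: it is a simplified model written for illustration, not tc's rate-table code, the overhead value used in the example is arbitrary, and it does not try to reproduce how HTB applies the tables of a parent and a leaf class.

```c
#include <stdint.h>

#define ATM_CELL_PAYLOAD 48u /* bytes of data carried per ATM cell */
#define ATM_CELL_SIZE    53u /* bytes per ATM cell on the wire */

/*
 * Bytes a packet really occupies on an ATM link: add the per-packet
 * encapsulation overhead, round up to whole cells, then charge the
 * full 53 bytes for every cell used.
 */
static uint32_t atm_wire_bytes(uint32_t len, uint32_t overhead)
{
	uint32_t cells = (len + overhead + ATM_CELL_PAYLOAD - 1) / ATM_CELL_PAYLOAD;

	return cells * ATM_CELL_SIZE;
}
```

With 10 bytes of overhead, a 1500-byte packet needs 32 cells, or 1696 bytes on the wire, while a 40-byte ACK needs 2 cells, or 106 bytes: more than two and a half times its size. Getting the overhead accounting wrong therefore skews small-packet traffic much more than bulk traffic.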




* Re: [PATCH 0/4] Introduce and use printk pointer extension %pV
From: David Miller @ 2010-07-04 17:40 UTC
  To: greg; +Cc: joe, akpm, linux-kernel, netdev
In-Reply-To: <20100703160857.GA29043@kroah.com>

From: Greg KH <greg@kroah.com>
Date: Sat, 3 Jul 2010 09:08:57 -0700

> On Fri, Jul 02, 2010 at 10:32:44PM -0700, David Miller wrote:
>> From: David Miller <davem@davemloft.net>
>> Date: Wed, 30 Jun 2010 13:07:09 -0700 (PDT)
>> 
>> > From: Joe Perches <joe@perches.com>
>> > Date: Sun, 27 Jun 2010 04:02:32 -0700
>> > 
>> >> Recursive printk can reduce the total image size of an x86 defconfig by about 1%
>> >> by reducing duplicated KERN_<level> strings and centralizing the functions
>> >> used by macros in new separate functions.
>> >> 
>> >> Joe Perches (4):
>> >>   vsprintf: Recursive vsnprintf: Add "%pV", struct va_format
>> >>   device.h drivers/base/core.c Convert dev_<level> logging macros to functions
>> >>   netdevice.h net/core/dev.c: Convert netdev_<level> logging macros to functions
>> >>   netdevice.h: Change netif_<level> macros to call netdev_<level> functions
>> > 
>> > I'm fine with this, thanks Joe.
>> > 
>> > Greg, could you ACK this and let me know if it's OK if it swings
>> > through my net-next-2.6 tree?
>> 
>> Greg, ping?
> 
> Sorry about the delay.
> 
> Yes, that's fine to take it through your tree, thanks for doing that:
> 	Acked-by: Greg Kroah-Hartman <gregkh@suse.de>

Thanks!  I've added this set to my tree.
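The series adds a "%pV" conversion backed by struct va_format, so helpers such as the new dev_<level> and netdev_<level> functions can forward a caller's format string and arguments into a single printk() call while emitting the KERN_<level> and device prefix in one place. glibc's printf has no %pV, so the userspace sketch below emulates the idea by expanding the nested format in place after the prefix. The names with a _sketch suffix and prefixed_format() are hypothetical; in the kernel, vsnprintf() itself expands %pV.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Same shape as the kernel's struct va_format: a format plus its arguments. */
struct va_format_sketch {
	const char *fmt;
	va_list *va;
};

/* Userspace stand-in for expanding "%pV": format the nested pair into buf. */
static int expand_vaf(char *buf, size_t n, const struct va_format_sketch *vaf)
{
	va_list copy;
	int r;

	va_copy(copy, *vaf->va);
	r = vsnprintf(buf, n, vaf->fmt, copy);
	va_end(copy);
	return r;
}

/* One central helper writes the shared prefix, then the caller's message. */
static size_t prefixed_format(char *out, size_t n, const char *prefix,
			      const char *fmt, ...)
{
	struct va_format_sketch vaf;
	va_list args;
	size_t used;

	if (n == 0)
		return 0;

	/* Emit the prefix once, instead of in every caller's format string. */
	snprintf(out, n, "%s", prefix);
	used = strlen(out);

	/* Package the caller's format and arguments, as dev_printk() does. */
	va_start(args, fmt);
	vaf.fmt = fmt;
	vaf.va = &args;
	expand_vaf(out + used, n - used, &vaf);
	va_end(args);

	return strlen(out);
}
```

For example, `prefixed_format(buf, sizeof(buf), "eth0: ", "link %s at %d Mbps", "up", 1000)` leaves "eth0: link up at 1000 Mbps" in buf.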


* RE: [PATCH] bnx2x: add support for receive hashing
From: Vladislav Zolotarov @ 2010-07-04 16:46 UTC
  To: Tom Herbert; +Cc: netdev@vger.kernel.org
In-Reply-To: <8628FE4E7912BF47A96AE7DD7BAC0AADDDE646FBFB@SJEXCHCCR02.corp.ad.broadcom.com>

Is there any reason not to fill skb->rxhash for LROed packets?

Thanks,
vlad





* RE: [PATCH] bnx2x: add support for receive hashing
From: Vladislav Zolotarov @ 2010-07-04 16:36 UTC
  To: Tom Herbert; +Cc: netdev@vger.kernel.org
In-Reply-To: <alpine.DEB.1.00.1004222249400.27016@pokey.mtv.corp.google.com>

Tom, could you please explain what you meant by taking the (RSS) flags configuration out of the RSS "if"? Recall that "if(is_multi(bp))" is true iff RSS is enabled.

Thanks,
vlad

> @@ -5750,10 +5757,10 @@ static void bnx2x_init_internal_func(struct bnx2x *bp)
>  	u32 offset;
>  	u16 max_agg_size;
> 
> -	if (is_multi(bp)) {
> -		tstorm_config.config_flags = MULTI_FLAGS(bp);
> +	tstorm_config.config_flags = RSS_FLAGS(bp);
> +
> +	if (is_multi(bp))
>  		tstorm_config.rss_result_mask = MULTI_MASK;
> -	}
> 
>  	/* Enable TPA if needed */
>  	if (bp->flags & TPA_ENABLE_FLAG)


> -----Original Message-----
> From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org] On
> Behalf Of Tom Herbert
> Sent: Friday, April 23, 2010 8:54 AM
> To: davem@davemloft.net; netdev@vger.kernel.org
> Subject: [PATCH] bnx2x: add support for receive hashing
> 
> Add support to bnx2x to extract Toeplitz hash out of the receive descriptor
> for use in skb->rxhash.
> 
> Signed-off-by: Tom Herbert <therbert@google.com>
> ---
> diff --git a/drivers/net/bnx2x.h b/drivers/net/bnx2x.h
> index 0819530..8bd2368 100644
> --- a/drivers/net/bnx2x.h
> +++ b/drivers/net/bnx2x.h
> @@ -1330,7 +1330,7 @@ static inline u32 reg_poll(struct bnx2x *bp, u32 reg, u32 expected, int ms,
>  		AEU_INPUTS_ATTN_BITS_MCP_LATCHED_UMP_TX_PARITY | \
>  		AEU_INPUTS_ATTN_BITS_MCP_LATCHED_SCPAD_PARITY)
> 
> -#define MULTI_FLAGS(bp) \
> +#define RSS_FLAGS(bp) \
>  		(TSTORM_ETH_FUNCTION_COMMON_CONFIG_RSS_IPV4_CAPABILITY | \
>  		 TSTORM_ETH_FUNCTION_COMMON_CONFIG_RSS_IPV4_TCP_CAPABILITY | \
>  		 TSTORM_ETH_FUNCTION_COMMON_CONFIG_RSS_IPV6_CAPABILITY | \
> diff --git a/drivers/net/bnx2x_main.c b/drivers/net/bnx2x_main.c
> index 0c6dba2..613f727 100644
> --- a/drivers/net/bnx2x_main.c
> +++ b/drivers/net/bnx2x_main.c
> @@ -1582,7 +1582,7 @@ static int bnx2x_rx_int(struct bnx2x_fastpath *fp, int budget)
>  		struct sw_rx_bd *rx_buf = NULL;
>  		struct sk_buff *skb;
>  		union eth_rx_cqe *cqe;
> -		u8 cqe_fp_flags;
> +		u8 cqe_fp_flags, cqe_fp_status_flags;
>  		u16 len, pad;
> 
>  		comp_ring_cons = RCQ_BD(sw_comp_cons);
> @@ -1598,6 +1598,7 @@ static int bnx2x_rx_int(struct bnx2x_fastpath *fp, int budget)
> 
>  		cqe = &fp->rx_comp_ring[comp_ring_cons];
>  		cqe_fp_flags = cqe->fast_path_cqe.type_error_flags;
> +		cqe_fp_status_flags = cqe->fast_path_cqe.status_flags;
> 
>  		DP(NETIF_MSG_RX_STATUS, "CQE type %x  err %x  status %x"
>  		   "  queue %x  vlan %x  len %u\n", CQE_TYPE(cqe_fp_flags),
> @@ -1727,6 +1728,12 @@ reuse_rx:
> 
>  			skb->protocol = eth_type_trans(skb, bp->dev);
> 
> +			if ((bp->dev->features & ETH_FLAG_RXHASH) &&
> +			    (cqe_fp_status_flags &
> +			     ETH_FAST_PATH_RX_CQE_RSS_HASH_FLG))
> +				skb->rxhash = le32_to_cpu(
> +				    cqe->fast_path_cqe.rss_hash_result);
> +
>  			skb->ip_summed = CHECKSUM_NONE;
>  			if (bp->rx_csum) {
>  				if (likely(BNX2X_RX_CSUM_OK(cqe)))
> @@ -5750,10 +5757,10 @@ static void bnx2x_init_internal_func(struct bnx2x *bp)
>  	u32 offset;
>  	u16 max_agg_size;
> 
> -	if (is_multi(bp)) {
> -		tstorm_config.config_flags = MULTI_FLAGS(bp);
> +	tstorm_config.config_flags = RSS_FLAGS(bp);
> +
> +	if (is_multi(bp))
>  		tstorm_config.rss_result_mask = MULTI_MASK;
> -	}
> 
>  	/* Enable TPA if needed */
>  	if (bp->flags & TPA_ENABLE_FLAG)
> @@ -6629,10 +6636,8 @@ static int bnx2x_init_common(struct bnx2x *bp)
>  	bnx2x_init_block(bp, PBF_BLOCK, COMMON_STAGE);
> 
>  	REG_WR(bp, SRC_REG_SOFT_RST, 1);
> -	for (i = SRC_REG_KEYRSS0_0; i <= SRC_REG_KEYRSS1_9; i += 4) {
> -		REG_WR(bp, i, 0xc0cac01a);
> -		/* TODO: replace with something meaningful */
> -	}
> +	for (i = SRC_REG_KEYRSS0_0; i <= SRC_REG_KEYRSS1_9; i += 4)
> +		REG_WR(bp, i, random32());
>  	bnx2x_init_block(bp, SRCH_BLOCK, COMMON_STAGE);
>  #ifdef BCM_CNIC
>  	REG_WR(bp, SRC_REG_KEYSEARCH_0, 0x63285672);
> @@ -11001,6 +11006,11 @@ static int bnx2x_set_flags(struct net_device *dev, u32 data)
>  		changed = 1;
>  	}
> 
> +	if (data & ETH_FLAG_RXHASH)
> +		dev->features |= NETIF_F_RXHASH;
> +	else
> +		dev->features &= ~NETIF_F_RXHASH;
> +
>  	if (changed && netif_running(dev)) {
>  		bnx2x_nic_unload(bp, UNLOAD_NORMAL);
>  		rc = bnx2x_nic_load(bp, LOAD_NORMAL);
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
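For reference, the value the patch copies into skb->rxhash is a Toeplitz hash that the NIC computes over the packet's addresses (and, for TCP, ports) using the RSS key held in the SRC_REG_KEYRSS* registers, which the patch now fills with random32() instead of a fixed constant. The software version below is a sketch of the standard Toeplitz algorithm, not bnx2x firmware code. The 40-byte key is the widely published Microsoft RSS verification key, used here only so the result can be checked against its known test vectors.

```c
#include <stddef.h>
#include <stdint.h>

/* Published RSS verification key; a real driver programs its own key. */
static const uint8_t rss_verify_key[40] = {
	0x6d, 0x5a, 0x56, 0xda, 0x25, 0x5b, 0x0e, 0xc2,
	0x41, 0x67, 0x25, 0x3d, 0x43, 0xa3, 0x8f, 0xb0,
	0xd0, 0xca, 0x2b, 0xcb, 0xae, 0x7b, 0x30, 0xb4,
	0x77, 0xcb, 0x2d, 0xa3, 0x80, 0x30, 0xf2, 0x0c,
	0x6a, 0x42, 0xb7, 0x3b, 0xbe, 0xac, 0x01, 0xfa,
};

/*
 * Toeplitz hash: for every input bit that is set (MSB first), XOR in the
 * current 32-bit window of the key, then slide the window left by one
 * key bit.  key_len should be at least len + 4 bytes.
 */
static uint32_t toeplitz_hash(const uint8_t *key, size_t key_len,
			      const uint8_t *data, size_t len)
{
	uint32_t result = 0;
	uint32_t window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
			  ((uint32_t)key[2] << 8) | key[3];
	size_t next = 4; /* key byte supplying the bits shifted in next */

	for (size_t i = 0; i < len; i++, next++) {
		for (int bit = 7; bit >= 0; bit--) {
			if (data[i] & (1u << bit))
				result ^= window;
			window <<= 1;
			if (next < key_len && (key[next] & (1u << bit)))
				window |= 1;
		}
	}
	return result;
}
```

The input is laid out as source address, destination address, then (for TCP) source and destination ports, all in network byte order. For 66.9.149.187:2794 to 161.142.100.80:1766, the published vectors give 0x323e8fc2 for the 8 address bytes and 0x51ccc178 with the ports included.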




* [patch v2.3 4/4] libxt_ipvs: user-space lib for netfilter matcher xt_ipvs
From: Simon Horman @ 2010-07-04 11:32 UTC
  To: lvs-devel, netdev, linux-kernel, netfilter, netfilter-devel
  Cc: Malcolm Turnbull, Wensong Zhang, Julius Volz, Patrick McHardy,
	David S. Miller, Hannes Eder
In-Reply-To: <20100704113246.562399500@vergenet.net>

[-- Attachment #1: libxt_ipvs-user-space-lib-for-netfilter-matcher-xt_ipvs.patch --]
[-- Type: text/plain, Size: 13930 bytes --]

From:	Hannes Eder <heder@google.com>

The user-space library for the netfilter matcher xt_ipvs.

[ trivial up-port by Simon Horman <horms@verge.net.au> ]
Signed-off-by: Hannes Eder <heder@google.com>
Acked-by: Simon Horman <horms@verge.net.au>

 configure.ac                      |   10 -
 extensions/libxt_ipvs.c           |  365 +++++++++++++++++++++++++++++++++++++
 extensions/libxt_ipvs.man         |   24 ++
 include/linux/netfilter/xt_ipvs.h |   25 +++
 4 files changed, 422 insertions(+), 2 deletions(-)
 create mode 100644 extensions/libxt_ipvs.c
 create mode 100644 extensions/libxt_ipvs.man
 create mode 100644 include/linux/netfilter/xt_ipvs.h

v2.1, v2.3
Trivial up-port

v2.2
No change

Index: iptables/configure.ac
===================================================================
--- iptables.orig/configure.ac	2010-07-04 20:21:07.000000000 +0900
+++ iptables/configure.ac	2010-07-04 20:23:30.000000000 +0900
@@ -52,12 +52,18 @@ AC_ARG_WITH([pkgconfigdir], AS_HELP_STRI
 	[Path to the pkgconfig directory [[LIBDIR/pkgconfig]]]),
 	[pkgconfigdir="$withval"], [pkgconfigdir='${libdir}/pkgconfig'])
 
-AC_CHECK_HEADER([linux/dccp.h])
-
 blacklist_modules="";
+
+AC_CHECK_HEADER([linux/dccp.h])
 if test "$ac_cv_header_linux_dccp_h" != "yes"; then
 	blacklist_modules="$blacklist_modules dccp";
 fi;
+
+AC_CHECK_HEADER([linux/ip_vs.h])
+if test "$ac_cv_header_linux_ip_vs_h" != "yes"; then
+	blacklist_modules="$blacklist_modules ipvs";
+fi;
+
 AC_SUBST([blacklist_modules])
 
 AM_CONDITIONAL([ENABLE_STATIC], [test "$enable_static" = "yes"])
Index: iptables/extensions/libxt_ipvs.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ iptables/extensions/libxt_ipvs.c	2010-07-04 20:24:01.000000000 +0900
@@ -0,0 +1,365 @@
+/*
+ * Shared library add-on to iptables to add IPVS matching.
+ *
+ * Detailed doc is in the kernel module source net/netfilter/xt_ipvs.c
+ *
+ * Author: Hannes Eder <heder@google.com>
+ */
+#include <sys/types.h>
+#include <assert.h>
+#include <ctype.h>
+#include <errno.h>
+#include <getopt.h>
+#include <netdb.h>
+#include <stdlib.h>
+#include <stdio.h>
+#include <string.h>
+#include <xtables.h>
+#include <linux/ip_vs.h>
+#include <linux/netfilter/xt_ipvs.h>
+
+static const struct option ipvs_mt_opts[] = {
+	{ .name = "ipvs",     .has_arg = false, .val = '0' },
+	{ .name = "vproto",   .has_arg = true,  .val = '1' },
+	{ .name = "vaddr",    .has_arg = true,  .val = '2' },
+	{ .name = "vport",    .has_arg = true,  .val = '3' },
+	{ .name = "vdir",     .has_arg = true,  .val = '4' },
+	{ .name = "vmethod",  .has_arg = true,  .val = '5' },
+	{ .name = "vportctl", .has_arg = true,  .val = '6' },
+	{ .name = NULL }
+};
+
+static void ipvs_mt_help(void)
+{
+	printf(
+"IPVS match options:\n"
+"[!] --ipvs                      packet belongs to an IPVS connection\n"
+"\n"
+"Any of the following options implies --ipvs (even negated)\n"
+"[!] --vproto protocol           VIP protocol to match; by number or name,\n"
+"                                e.g. \"tcp\"\n"
+"[!] --vaddr address[/mask]      VIP address to match\n"
+"[!] --vport port                VIP port to match; by number or name,\n"
+"                                e.g. \"http\"\n"
+"    --vdir {ORIGINAL|REPLY}     flow direction of packet\n"
+"[!] --vmethod {GATE|IPIP|MASQ}  IPVS forwarding method used\n"
+"[!] --vportctl port             VIP port of the controlling connection to\n"
+"                                match, e.g. 21 for FTP\n"
+		);
+}
+
+static void ipvs_mt_parse_addr_and_mask(const char *arg,
+					union nf_inet_addr *address,
+					union nf_inet_addr *mask,
+					unsigned int family)
+{
+	struct in_addr *addr = NULL;
+	struct in6_addr *addr6 = NULL;
+	unsigned int naddrs = 0;
+
+	if (family == NFPROTO_IPV4) {
+		xtables_ipparse_any(arg, &addr, &mask->in, &naddrs);
+		if (naddrs > 1)
+			xtables_error(PARAMETER_PROBLEM,
+				      "multiple IP addresses not allowed");
+		if (naddrs == 1)
+			memcpy(&address->in, addr, sizeof(*addr));
+	} else if (family == NFPROTO_IPV6) {
+		xtables_ip6parse_any(arg, &addr6, &mask->in6, &naddrs);
+		if (naddrs > 1)
+			xtables_error(PARAMETER_PROBLEM,
+				      "multiple IP addresses not allowed");
+		if (naddrs == 1)
+			memcpy(&address->in6, addr6, sizeof(*addr6));
+	} else {
+		/* Hu? */
+		assert(false);
+	}
+}
+
+/* Function which parses command options; returns true if it ate an option */
+static int ipvs_mt_parse(int c, char **argv, int invert, unsigned int *flags,
+			 const void *entry, struct xt_entry_match **match,
+			 unsigned int family)
+{
+	struct xt_ipvs_mtinfo *data = (void *)(*match)->data;
+	char *p = NULL;
+	u_int8_t op = 0;
+
+	if ('0' <= c && c <= '6') {
+		static const int ops[] = {
+			XT_IPVS_IPVS_PROPERTY,
+			XT_IPVS_PROTO,
+			XT_IPVS_VADDR,
+			XT_IPVS_VPORT,
+			XT_IPVS_DIR,
+			XT_IPVS_METHOD,
+			XT_IPVS_VPORTCTL
+		};
+		op = ops[c - '0'];
+	} else
+		return 0;
+
+	if (*flags & op & XT_IPVS_ONCE_MASK)
+		goto multiple_use;
+
+	switch (c) {
+	case '0': /* --ipvs */
+		/* Nothing to do here. */
+		break;
+
+	case '1': /* --vproto */
+		/* Canonicalize into lower case */
+		for (p = optarg; *p != '\0'; ++p)
+			*p = tolower(*p);
+
+		data->l4proto = xtables_parse_protocol(optarg);
+		break;
+
+	case '2': /* --vaddr */
+		ipvs_mt_parse_addr_and_mask(optarg, &data->vaddr,
+					    &data->vmask, family);
+		break;
+
+	case '3': /* --vport */
+		data->vport = htons(xtables_parse_port(optarg, "tcp"));
+		break;
+
+	case '4': /* --vdir */
+		xtables_param_act(XTF_NO_INVERT, "ipvs", "--vdir", invert);
+		if (strcasecmp(optarg, "ORIGINAL") == 0) {
+			data->bitmask |= XT_IPVS_DIR;
+			data->invert   &= ~XT_IPVS_DIR;
+		} else if (strcasecmp(optarg, "REPLY") == 0) {
+			data->bitmask |= XT_IPVS_DIR;
+			data->invert  |= XT_IPVS_DIR;
+		} else {
+			xtables_param_act(XTF_BAD_VALUE,
+					  "ipvs", "--vdir", optarg);
+		}
+		break;
+
+	case '5': /* --vmethod */
+		if (strcasecmp(optarg, "GATE") == 0)
+			data->fwd_method = IP_VS_CONN_F_DROUTE;
+		else if (strcasecmp(optarg, "IPIP") == 0)
+			data->fwd_method = IP_VS_CONN_F_TUNNEL;
+		else if (strcasecmp(optarg, "MASQ") == 0)
+			data->fwd_method = IP_VS_CONN_F_MASQ;
+		else
+			xtables_param_act(XTF_BAD_VALUE,
+					  "ipvs", "--vmethod", optarg);
+		break;
+
+	case '6': /* --vportctl */
+		data->vportctl = htons(xtables_parse_port(optarg, "tcp"));
+		break;
+
+	default:
+		/* Hu? How did we come here? */
+		assert(false);
+		return 0;
+	}
+
+	if (op & XT_IPVS_ONCE_MASK) {
+		if (data->invert & XT_IPVS_IPVS_PROPERTY)
+			xtables_error(PARAMETER_PROBLEM,
+				      "! --ipvs cannot be together with"
+				      " other options");
+		data->bitmask |= XT_IPVS_IPVS_PROPERTY;
+	}
+
+	data->bitmask |= op;
+	if (invert)
+		data->invert |= op;
+	*flags |= op;
+	return 1;
+
+multiple_use:
+	xtables_error(PARAMETER_PROBLEM,
+		      "multiple use of the same IPVS option is not allowed");
+}
+
+static int ipvs_mt4_parse(int c, char **argv, int invert, unsigned int *flags,
+			  const void *entry, struct xt_entry_match **match)
+{
+	return ipvs_mt_parse(c, argv, invert, flags, entry, match,
+			     NFPROTO_IPV4);
+}
+
+static int ipvs_mt6_parse(int c, char **argv, int invert, unsigned int *flags,
+			  const void *entry, struct xt_entry_match **match)
+{
+	return ipvs_mt_parse(c, argv, invert, flags, entry, match,
+			     NFPROTO_IPV6);
+}
+
+static void ipvs_mt_check(unsigned int flags)
+{
+	if (flags == 0)
+		xtables_error(PARAMETER_PROBLEM,
+			      "IPVS: At least one option is required");
+}
+
+/* Shamelessly copied from libxt_conntrack.c */
+static void ipvs_mt_dump_addr(const union nf_inet_addr *addr,
+			      const union nf_inet_addr *mask,
+			      unsigned int family, bool numeric)
+{
+	char buf[BUFSIZ];
+
+	if (family == NFPROTO_IPV4) {
+		if (!numeric && addr->ip == 0) {
+			printf("anywhere ");
+			return;
+		}
+		if (numeric)
+			strcpy(buf, xtables_ipaddr_to_numeric(&addr->in));
+		else
+			strcpy(buf, xtables_ipaddr_to_anyname(&addr->in));
+		strcat(buf, xtables_ipmask_to_numeric(&mask->in));
+		printf("%s ", buf);
+	} else if (family == NFPROTO_IPV6) {
+		if (!numeric && addr->ip6[0] == 0 && addr->ip6[1] == 0 &&
+		    addr->ip6[2] == 0 && addr->ip6[3] == 0) {
+			printf("anywhere ");
+			return;
+		}
+		if (numeric)
+			strcpy(buf, xtables_ip6addr_to_numeric(&addr->in6));
+		else
+			strcpy(buf, xtables_ip6addr_to_anyname(&addr->in6));
+		strcat(buf, xtables_ip6mask_to_numeric(&mask->in6));
+		printf("%s ", buf);
+	}
+}
+
+static void ipvs_mt_dump(const void *ip, const struct xt_ipvs_mtinfo *data,
+			 unsigned int family, bool numeric, const char *prefix)
+{
+	if (data->bitmask == XT_IPVS_IPVS_PROPERTY) {
+		if (data->invert & XT_IPVS_IPVS_PROPERTY)
+			printf("! ");
+		printf("%sipvs ", prefix);
+	}
+
+	if (data->bitmask & XT_IPVS_PROTO) {
+		if (data->invert & XT_IPVS_PROTO)
+			printf("! ");
+		printf("%svproto %u ", prefix, data->l4proto);
+	}
+
+	if (data->bitmask & XT_IPVS_VADDR) {
+		if (data->invert & XT_IPVS_VADDR)
+			printf("! ");
+
+		printf("%svaddr ", prefix);
+		ipvs_mt_dump_addr(&data->vaddr, &data->vmask, family, numeric);
+	}
+
+	if (data->bitmask & XT_IPVS_VPORT) {
+		if (data->invert & XT_IPVS_VPORT)
+			printf("! ");
+
+		printf("%svport %u ", prefix, ntohs(data->vport));
+	}
+
+	if (data->bitmask & XT_IPVS_DIR) {
+		if (data->invert & XT_IPVS_DIR)
+			printf("%svdir REPLY ", prefix);
+		else
+			printf("%svdir ORIGINAL ", prefix);
+	}
+
+	if (data->bitmask & XT_IPVS_METHOD) {
+		if (data->invert & XT_IPVS_METHOD)
+			printf("! ");
+
+		printf("%svmethod ", prefix);
+		switch (data->fwd_method) {
+		case IP_VS_CONN_F_DROUTE:
+			printf("GATE ");
+			break;
+		case IP_VS_CONN_F_TUNNEL:
+			printf("IPIP ");
+			break;
+		case IP_VS_CONN_F_MASQ:
+			printf("MASQ ");
+			break;
+		default:
+			/* Hu? */
+			printf("UNKNOWN ");
+			break;
+		}
+	}
+
+	if (data->bitmask & XT_IPVS_VPORTCTL) {
+		if (data->invert & XT_IPVS_VPORTCTL)
+			printf("! ");
+
+		printf("%svportctl %u ", prefix, ntohs(data->vportctl));
+	}
+}
+
+static void ipvs_mt4_print(const void *ip, const struct xt_entry_match *match,
+			   int numeric)
+{
+	const struct xt_ipvs_mtinfo *data = (const void *)match->data;
+	ipvs_mt_dump(ip, data, NFPROTO_IPV4, numeric, "");
+}
+
+static void ipvs_mt6_print(const void *ip, const struct xt_entry_match *match,
+			   int numeric)
+{
+	const struct xt_ipvs_mtinfo *data = (const void *)match->data;
+	ipvs_mt_dump(ip, data, NFPROTO_IPV6, numeric, "");
+}
+
+static void ipvs_mt4_save(const void *ip, const struct xt_entry_match *match)
+{
+	const struct xt_ipvs_mtinfo *data = (const void *)match->data;
+	ipvs_mt_dump(ip, data, NFPROTO_IPV4, true, "--");
+}
+
+static void ipvs_mt6_save(const void *ip, const struct xt_entry_match *match)
+{
+	const struct xt_ipvs_mtinfo *data = (const void *)match->data;
+	ipvs_mt_dump(ip, data, NFPROTO_IPV6, true, "--");
+}
+
+static struct xtables_match ipvs_matches_reg[] = {
+	{
+		.version       = XTABLES_VERSION,
+		.name          = "ipvs",
+		.revision      = 0,
+		.family        = NFPROTO_IPV4,
+		.size          = XT_ALIGN(sizeof(struct xt_ipvs_mtinfo)),
+		.userspacesize = XT_ALIGN(sizeof(struct xt_ipvs_mtinfo)),
+		.help          = ipvs_mt_help,
+		.parse         = ipvs_mt4_parse,
+		.final_check   = ipvs_mt_check,
+		.print         = ipvs_mt4_print,
+		.save          = ipvs_mt4_save,
+		.extra_opts    = ipvs_mt_opts,
+	},
+	{
+		.version       = XTABLES_VERSION,
+		.name          = "ipvs",
+		.revision      = 0,
+		.family        = NFPROTO_IPV6,
+		.size          = XT_ALIGN(sizeof(struct xt_ipvs_mtinfo)),
+		.userspacesize = XT_ALIGN(sizeof(struct xt_ipvs_mtinfo)),
+		.help          = ipvs_mt_help,
+		.parse         = ipvs_mt6_parse,
+		.final_check   = ipvs_mt_check,
+		.print         = ipvs_mt6_print,
+		.save          = ipvs_mt6_save,
+		.extra_opts    = ipvs_mt_opts,
+	},
+};
+
+void _init(void)
+{
+	xtables_register_matches(ipvs_matches_reg,
+				 ARRAY_SIZE(ipvs_matches_reg));
+}
Index: iptables/extensions/libxt_ipvs.man
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ iptables/extensions/libxt_ipvs.man	2010-07-04 20:23:30.000000000 +0900
@@ -0,0 +1,24 @@
+Match IPVS connection properties.
+.TP
+[\fB!\fR] \fB\-\-ipvs\fP
+packet belongs to an IPVS connection
+.TP
+Any of the following options implies \-\-ipvs (even negated)
+.TP
+[\fB!\fR] \fB\-\-vproto\fP \fIprotocol\fP
+VIP protocol to match; by number or name, e.g. "tcp"
+.TP
+[\fB!\fR] \fB\-\-vaddr\fP \fIaddress\fP[\fB/\fP\fImask\fP]
+VIP address to match
+.TP
+[\fB!\fR] \fB\-\-vport\fP \fIport\fP
+VIP port to match; by number or name, e.g. "http"
+.TP
+\fB\-\-vdir\fP {\fBORIGINAL\fP|\fBREPLY\fP}
+flow direction of packet
+.TP
+[\fB!\fR] \fB\-\-vmethod\fP {\fBGATE\fP|\fBIPIP\fP|\fBMASQ\fP}
+IPVS forwarding method used
+.TP
+[\fB!\fR] \fB\-\-vportctl\fP \fIport\fP
+VIP port of the controlling connection to match, e.g. 21 for FTP
Index: iptables/include/linux/netfilter/xt_ipvs.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ iptables/include/linux/netfilter/xt_ipvs.h	2010-07-04 20:23:30.000000000 +0900
@@ -0,0 +1,25 @@
+#ifndef _XT_IPVS_H
+#define _XT_IPVS_H 1
+
+#define XT_IPVS_IPVS_PROPERTY	(1 << 0) /* all other options imply this one */
+#define XT_IPVS_PROTO		(1 << 1)
+#define XT_IPVS_VADDR		(1 << 2)
+#define XT_IPVS_VPORT		(1 << 3)
+#define XT_IPVS_DIR		(1 << 4)
+#define XT_IPVS_METHOD		(1 << 5)
+#define XT_IPVS_VPORTCTL	(1 << 6)
+#define XT_IPVS_MASK		((1 << 7) - 1)
+#define XT_IPVS_ONCE_MASK	(XT_IPVS_MASK & ~XT_IPVS_IPVS_PROPERTY)
+
+struct xt_ipvs_mtinfo {
+	union nf_inet_addr	vaddr, vmask;
+	__be16			vport;
+	__u16			l4proto;
+	__u16			fwd_method;
+	__be16			vportctl;
+
+	__u8			invert;
+	__u8			bitmask;
+};
+
+#endif /* _XT_IPVS_H */
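The kernel side of the match lives in net/netfilter/xt_ipvs.c, which is not part of this patch. The sketch below is therefore only inferred from the help text ("any option implies --ipvs") and from how ipvs_mt_parse() fills bitmask and invert: a bit in bitmask means "test this property", and the same bit in invert flips that test's result. The structures and functions are hypothetical and reduced to two of the real options; only the bit values are copied from xt_ipvs.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit values copied from the xt_ipvs.h header in the patch above. */
#define XT_IPVS_IPVS_PROPERTY (1 << 0)
#define XT_IPVS_PROTO         (1 << 1)
#define XT_IPVS_VPORT         (1 << 3)

/* Hypothetical, reduced match info: only two of the real options. */
struct ipvs_match_sketch {
	uint16_t l4proto;
	uint16_t vport;   /* host byte order here, for simplicity */
	uint8_t invert;
	uint8_t bitmask;
};

/* Hypothetical view of what is known about a packet. */
struct pkt_sketch {
	bool is_ipvs;
	uint16_t l4proto;
	uint16_t vport;
};

/* An option that was not given always passes; otherwise XOR with invert. */
static bool ipvs_check_bit(const struct ipvs_match_sketch *m, uint8_t bit,
			   bool cond)
{
	if (!(m->bitmask & bit))
		return true;
	return cond != !!(m->invert & bit);
}

static bool ipvs_match_sketch(const struct ipvs_match_sketch *m,
			      const struct pkt_sketch *p)
{
	/* "[!] --ipvs" is evaluated first; the parser sets it for every option. */
	bool match = p->is_ipvs != !!(m->invert & XT_IPVS_IPVS_PROPERTY);

	if (m->bitmask == XT_IPVS_IPVS_PROPERTY || !match)
		return match;

	/* All remaining options must pass, each possibly negated. */
	if (!ipvs_check_bit(m, XT_IPVS_PROTO, p->l4proto == m->l4proto))
		return false;
	if (!ipvs_check_bit(m, XT_IPVS_VPORT, p->vport == m->vport))
		return false;
	return true;
}
```

Under this reading, "! --vproto tcp" still only matches IPVS packets, which is why the parser rejects "! --ipvs" combined with any other option.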



* [patch v2.3 3/4] IPVS: make FTP work with full NAT support
From: Simon Horman @ 2010-07-04 11:32 UTC
  To: lvs-devel, netdev, linux-kernel, netfilter, netfilter-devel
  Cc: Malcolm Turnbull, Wensong Zhang, Julius Volz, Patrick McHardy,
	David S. Miller, Hannes Eder
In-Reply-To: <20100704113246.562399500@vergenet.net>

[-- Attachment #1: IPVS-make-FTP-work-with-full-NAT-support.patch --]
[-- Type: text/plain, Size: 11854 bytes --]

From:	Hannes Eder <heder@google.com>

Use nf_conntrack/nf_nat code to do the packet mangling and the TCP
sequence adjusting.  The function 'ip_vs_skb_replace' is now dead
code, so it is removed.

To SNAT FTP, use something like:

% iptables -t nat -A POSTROUTING -m ipvs --vaddr 192.168.100.30/32 \
> --vport 21 -j SNAT --to-source 192.168.10.10

and for the data connections in passive mode:

% iptables -t nat -A POSTROUTING -m ipvs --vaddr 192.168.100.30/32 \
> --vportctl 21 -j SNAT --to-source 192.168.10.10

Using '-m state --state RELATED' would also work.

Make sure the kernel modules ip_vs_ftp, nf_conntrack_ftp, and
nf_nat_ftp are loaded.

[ up-port and minor fixes by Simon Horman <horms@verge.net.au> ]
Signed-off-by: Hannes Eder <heder@google.com>
Signed-off-by: Simon Horman <horms@verge.net.au>

--- 

 include/net/ip_vs.h             |    2 
 net/netfilter/ipvs/Kconfig      |    2 
 net/netfilter/ipvs/ip_vs_app.c  |   43 ----------
 net/netfilter/ipvs/ip_vs_core.c |    1 
 net/netfilter/ipvs/ip_vs_ftp.c  |  164 ++++++++++++++++++++++++++++++++++++---
 5 files changed, 153 insertions(+), 59 deletions(-)

v2.1
* Up-port

v2.2
* No change

v2.3
* Up-port
* Use %pI4 instead of NIPQUAD
* Drop the buf_len = snprintf() change; it's a separate, cosmetic fix

Index: nf-next-2.6/include/net/ip_vs.h
===================================================================
--- nf-next-2.6.orig/include/net/ip_vs.h	2010-07-04 20:30:19.000000000 +0900
+++ nf-next-2.6/include/net/ip_vs.h	2010-07-04 20:32:06.000000000 +0900
@@ -736,8 +736,6 @@ extern void ip_vs_app_inc_put(struct ip_
 
 extern int ip_vs_app_pkt_out(struct ip_vs_conn *, struct sk_buff *skb);
 extern int ip_vs_app_pkt_in(struct ip_vs_conn *, struct sk_buff *skb);
-extern int ip_vs_skb_replace(struct sk_buff *skb, gfp_t pri,
-			     char *o_buf, int o_len, char *n_buf, int n_len);
 extern int ip_vs_app_init(void);
 extern void ip_vs_app_cleanup(void);
 
Index: nf-next-2.6/net/netfilter/ipvs/Kconfig
===================================================================
--- nf-next-2.6.orig/net/netfilter/ipvs/Kconfig	2010-07-04 20:32:05.000000000 +0900
+++ nf-next-2.6/net/netfilter/ipvs/Kconfig	2010-07-04 20:32:06.000000000 +0900
@@ -238,7 +238,7 @@ comment 'IPVS application helper'
 
 config	IP_VS_FTP
   	tristate "FTP protocol helper"
-        depends on IP_VS_PROTO_TCP
+        depends on IP_VS_PROTO_TCP && NF_NAT
 	---help---
 	  FTP is a protocol that transfers IP address and/or port number in
 	  the payload. In the virtual server via Network Address Translation,
Index: nf-next-2.6/net/netfilter/ipvs/ip_vs_app.c
===================================================================
--- nf-next-2.6.orig/net/netfilter/ipvs/ip_vs_app.c	2010-07-04 20:30:19.000000000 +0900
+++ nf-next-2.6/net/netfilter/ipvs/ip_vs_app.c	2010-07-04 20:32:06.000000000 +0900
@@ -569,49 +569,6 @@ static const struct file_operations ip_v
 };
 #endif
 
-
-/*
- *	Replace a segment of data with a new segment
- */
-int ip_vs_skb_replace(struct sk_buff *skb, gfp_t pri,
-		      char *o_buf, int o_len, char *n_buf, int n_len)
-{
-	int diff;
-	int o_offset;
-	int o_left;
-
-	EnterFunction(9);
-
-	diff = n_len - o_len;
-	o_offset = o_buf - (char *)skb->data;
-	/* The length of left data after o_buf+o_len in the skb data */
-	o_left = skb->len - (o_offset + o_len);
-
-	if (diff <= 0) {
-		memmove(o_buf + n_len, o_buf + o_len, o_left);
-		memcpy(o_buf, n_buf, n_len);
-		skb_trim(skb, skb->len + diff);
-	} else if (diff <= skb_tailroom(skb)) {
-		skb_put(skb, diff);
-		memmove(o_buf + n_len, o_buf + o_len, o_left);
-		memcpy(o_buf, n_buf, n_len);
-	} else {
-		if (pskb_expand_head(skb, skb_headroom(skb), diff, pri))
-			return -ENOMEM;
-		skb_put(skb, diff);
-		memmove(skb->data + o_offset + n_len,
-			skb->data + o_offset + o_len, o_left);
-		skb_copy_to_linear_data_offset(skb, o_offset, n_buf, n_len);
-	}
-
-	/* must update the iph total length here */
-	ip_hdr(skb)->tot_len = htons(skb->len);
-
-	LeaveFunction(9);
-	return 0;
-}
-
-
 int __init ip_vs_app_init(void)
 {
 	/* we will replace it with proc_net_ipvs_create() soon */
Index: nf-next-2.6/net/netfilter/ipvs/ip_vs_core.c
===================================================================
--- nf-next-2.6.orig/net/netfilter/ipvs/ip_vs_core.c	2010-07-04 20:32:05.000000000 +0900
+++ nf-next-2.6/net/netfilter/ipvs/ip_vs_core.c	2010-07-04 20:32:06.000000000 +0900
@@ -54,7 +54,6 @@
 
 EXPORT_SYMBOL(register_ip_vs_scheduler);
 EXPORT_SYMBOL(unregister_ip_vs_scheduler);
-EXPORT_SYMBOL(ip_vs_skb_replace);
 EXPORT_SYMBOL(ip_vs_proto_name);
 EXPORT_SYMBOL(ip_vs_conn_new);
 EXPORT_SYMBOL(ip_vs_conn_in_get);
Index: nf-next-2.6/net/netfilter/ipvs/ip_vs_ftp.c
===================================================================
--- nf-next-2.6.orig/net/netfilter/ipvs/ip_vs_ftp.c	2010-07-04 20:30:19.000000000 +0900
+++ nf-next-2.6/net/netfilter/ipvs/ip_vs_ftp.c	2010-07-04 20:32:06.000000000 +0900
@@ -20,6 +20,17 @@
  *
  * Author:	Wouter Gadeyne
  *
+ *
+ * Code for ip_vs_expect_related and ip_vs_expect_callback is taken from
+ * http://www.ssi.bg/~ja/nfct/:
+ *
+ * ip_vs_nfct.c:	Netfilter connection tracking support for IPVS
+ *
+ * Portions Copyright (C) 2001-2002
+ * Antefacto Ltd, 181 Parnell St, Dublin 1, Ireland.
+ *
+ * Portions Copyright (C) 2003-2008
+ * Julian Anastasov
  */
 
 #define KMSG_COMPONENT "IPVS"
@@ -32,6 +43,9 @@
 #include <linux/in.h>
 #include <linux/ip.h>
 #include <linux/netfilter.h>
+#include <net/netfilter/nf_conntrack.h>
+#include <net/netfilter/nf_conntrack_expect.h>
+#include <net/netfilter/nf_nat_helper.h>
 #include <linux/gfp.h>
 #include <net/protocol.h>
 #include <net/tcp.h>
@@ -43,6 +57,16 @@
 #define SERVER_STRING "227 Entering Passive Mode ("
 #define CLIENT_STRING "PORT "
 
+#define FMT_TUPLE	"%pI4:%u->%pI4:%u/%u"
+#define ARG_TUPLE(T)	(T)->src.u3.ip, ntohs((T)->src.u.all), \
+			(T)->dst.u3.ip, ntohs((T)->dst.u.all), \
+			(T)->dst.protonum
+
+#define FMT_CONN	"%pI4:%u->%pI4:%u->%pI4:%u/%u:%u"
+#define ARG_CONN(C)	(C)->caddr, ntohs((C)->cport), \
+			(C)->vaddr, ntohs((C)->vport), \
+			(C)->daddr, ntohs((C)->dport), \
+			(C)->protocol, (C)->state
 
 /*
  * List of ports (up to IP_VS_APP_MAX_PORTS) to be handled by helper
@@ -123,6 +147,119 @@ static int ip_vs_ftp_get_addrport(char *
 	return 1;
 }
 
+/*
+ * Called from init_conntrack() as expectfn handler.
+ */
+static void
+ip_vs_expect_callback(struct nf_conn *ct,
+		      struct nf_conntrack_expect *exp)
+{
+	struct nf_conntrack_tuple *orig, new_reply;
+	struct ip_vs_conn *cp;
+
+	if (exp->tuple.src.l3num != PF_INET)
+		return;
+
+	/*
+	 * We assume that no NF locks are held before this callback.
+	 * ip_vs_conn_out_get and ip_vs_conn_in_get should match their
+	 * expectations even if they use wildcard values, now we provide the
+	 * actual values from the newly created original conntrack direction.
+	 * The conntrack is confirmed when packet reaches IPVS hooks.
+	 */
+
+	/* RS->CLIENT */
+	orig = &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple;
+	cp = ip_vs_conn_out_get(exp->tuple.src.l3num, orig->dst.protonum,
+				&orig->src.u3, orig->src.u.tcp.port,
+				&orig->dst.u3, orig->dst.u.tcp.port);
+	if (cp) {
+		/* Change reply CLIENT->RS to CLIENT->VS */
+		new_reply = ct->tuplehash[IP_CT_DIR_REPLY].tuple;
+		IP_VS_DBG(7, "%s(): ct=%p, status=0x%lX, tuples=" FMT_TUPLE ", "
+			  FMT_TUPLE ", found inout cp=" FMT_CONN "\n",
+			  __func__, ct, ct->status,
+			  ARG_TUPLE(orig), ARG_TUPLE(&new_reply),
+			  ARG_CONN(cp));
+		new_reply.dst.u3 = cp->vaddr;
+		new_reply.dst.u.tcp.port = cp->vport;
+		IP_VS_DBG(7, "%s(): ct=%p, new tuples=" FMT_TUPLE ", " FMT_TUPLE
+			  ", inout cp=" FMT_CONN "\n",
+			  __func__, ct,
+			  ARG_TUPLE(orig), ARG_TUPLE(&new_reply),
+			  ARG_CONN(cp));
+		goto alter;
+	}
+
+	/* CLIENT->VS */
+	cp = ip_vs_conn_in_get(exp->tuple.src.l3num, orig->dst.protonum,
+			       &orig->src.u3, orig->src.u.tcp.port,
+			       &orig->dst.u3, orig->dst.u.tcp.port);
+	if (cp) {
+		/* Change reply VS->CLIENT to RS->CLIENT */
+		new_reply = ct->tuplehash[IP_CT_DIR_REPLY].tuple;
+		IP_VS_DBG(7, "%s(): ct=%p, status=0x%lX, tuples=" FMT_TUPLE ", "
+			  FMT_TUPLE ", found outin cp=" FMT_CONN "\n",
+			  __func__, ct, ct->status,
+			  ARG_TUPLE(orig), ARG_TUPLE(&new_reply),
+			  ARG_CONN(cp));
+		new_reply.src.u3 = cp->daddr;
+		new_reply.src.u.tcp.port = cp->dport;
+		IP_VS_DBG(7, "%s(): ct=%p, new tuples=" FMT_TUPLE ", "
+			  FMT_TUPLE ", outin cp=" FMT_CONN "\n",
+			  __func__, ct,
+			  ARG_TUPLE(orig), ARG_TUPLE(&new_reply),
+			  ARG_CONN(cp));
+		goto alter;
+	}
+
+	IP_VS_DBG(7, "%s(): ct=%p, status=0x%lX, tuple=" FMT_TUPLE
+		  " - unknown expect\n",
+		  __func__, ct, ct->status, ARG_TUPLE(orig));
+	return;
+
+alter:
+	/* Never alter conntrack for non-NAT conns */
+	if (IP_VS_FWD_METHOD(cp) == IP_VS_CONN_F_MASQ)
+		nf_conntrack_alter_reply(ct, &new_reply);
+	ip_vs_conn_put(cp);
+	return;
+}
+
+/*
+ * Create NF conntrack expectation with wildcard (optional) source port.
+ * Then the default callback function will alter the reply and will confirm
+ * the conntrack entry when the first packet comes.
+ */
+static void
+ip_vs_expect_related(struct sk_buff *skb, struct nf_conn *ct,
+		     struct ip_vs_conn *cp, u_int8_t proto,
+		     const __be16 *port, int from_rs)
+{
+	struct nf_conntrack_expect *exp;
+
+	BUG_ON(!ct || ct == &nf_conntrack_untracked);
+
+	exp = nf_ct_expect_alloc(ct);
+	if (!exp)
+		return;
+
+	if (from_rs)
+		nf_ct_expect_init(exp, NF_CT_EXPECT_CLASS_DEFAULT,
+				  nf_ct_l3num(ct), &cp->daddr, &cp->caddr,
+				  proto, port, &cp->cport);
+	else
+		nf_ct_expect_init(exp, NF_CT_EXPECT_CLASS_DEFAULT,
+				  nf_ct_l3num(ct), &cp->caddr, &cp->vaddr,
+				  proto, port, &cp->vport);
+
+	exp->expectfn = ip_vs_expect_callback;
+
+	IP_VS_DBG(7, "%s(): ct=%p, expect tuple=" FMT_TUPLE "\n",
+		  __func__, ct, ARG_TUPLE(&exp->tuple));
+	nf_ct_expect_related(exp);
+	nf_ct_expect_put(exp);
+}
 
 /*
  * Look at outgoing ftp packets to catch the response to a PASV command
@@ -147,9 +284,11 @@ static int ip_vs_ftp_out(struct ip_vs_ap
 	union nf_inet_addr from;
 	__be16 port;
 	struct ip_vs_conn *n_cp;
-	char buf[24];		/* xxx.xxx.xxx.xxx,ppp,ppp\000 */
+	char buf[sizeof("xxx,xxx,xxx,xxx,ppp,ppp")];
 	unsigned buf_len;
 	int ret;
+	enum ip_conntrack_info ctinfo;
+	struct nf_conn *ct;
 
 #ifdef CONFIG_IP_VS_IPV6
 	/* This application helper doesn't work with IPv6 yet,
@@ -219,19 +358,23 @@ static int ip_vs_ftp_out(struct ip_vs_ap
 
 		buf_len = strlen(buf);
 
+		ct = nf_ct_get(skb, &ctinfo);
+		ret = nf_nat_mangle_tcp_packet(skb,
+					       ct,
+					       ctinfo,
+					       start-data,
+					       end-start,
+					       buf,
+					       buf_len);
+
+		if (ct && ct != &nf_conntrack_untracked)
+			ip_vs_expect_related(skb, ct, n_cp,
+					     IPPROTO_TCP, NULL, 0);
+
 		/*
-		 * Calculate required delta-offset to keep TCP happy
+		 * Not setting 'diff' is intentional, otherwise the sequence
+		 * would be adjusted twice.
 		 */
-		*diff = buf_len - (end-start);
-
-		if (*diff == 0) {
-			/* simply replace it with new passive address */
-			memcpy(start, buf, buf_len);
-			ret = 1;
-		} else {
-			ret = !ip_vs_skb_replace(skb, GFP_ATOMIC, start,
-					  end-start, buf, buf_len);
-		}
 
 		cp->app_data = NULL;
 		ip_vs_tcp_conn_listen(n_cp);
@@ -263,6 +406,7 @@ static int ip_vs_ftp_in(struct ip_vs_app
 	union nf_inet_addr to;
 	__be16 port;
 	struct ip_vs_conn *n_cp;
+	struct nf_conn *ct;
 
 #ifdef CONFIG_IP_VS_IPV6
 	/* This application helper doesn't work with IPv6 yet,
@@ -349,6 +493,11 @@ static int ip_vs_ftp_in(struct ip_vs_app
 		ip_vs_control_add(n_cp, cp);
 	}
 
+	ct = (struct nf_conn *)skb->nfct;
+	if (ct && ct != &nf_conntrack_untracked)
+		ip_vs_expect_related(skb, ct, n_cp,
+				     IPPROTO_TCP, &n_cp->dport, 1);
+
 	/*
 	 *	Move tunnel to listen state
 	 */


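The commit message above says nf_nat now does the payload mangling and TCP sequence adjustment that ip_vs_skb_replace() used to handle. The following rough user-space illustration is not kernel code, and format_pasv_tuple() and pasv_len_delta() are hypothetical names. It builds the "h1,h2,h3,h4,p1,p2" string of a "227 Entering Passive Mode" reply in a buffer sized like the new buf[] in ip_vs_ftp_out(). It then computes the length change that forces later TCP sequence numbers to shift. For simplicity, addresses are taken in host byte order.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Worst case "255,255,255,255,255,255" plus NUL, as sized in ip_vs_ftp_out(). */
#define PASV_BUF_LEN sizeof("xxx,xxx,xxx,xxx,ppp,ppp")

/*
 * Hypothetical helper: format an IPv4 address (host byte order) and a port
 * as the "h1,h2,h3,h4,p1,p2" tuple carried in a 227 reply.
 * Returns the string length, excluding the terminating NUL.
 */
static int format_pasv_tuple(char *buf, size_t len, uint32_t addr, uint16_t port)
{
	return snprintf(buf, len, "%u,%u,%u,%u,%u,%u",
			(unsigned)(addr >> 24) & 0xff, (unsigned)(addr >> 16) & 0xff,
			(unsigned)(addr >> 8) & 0xff, (unsigned)addr & 0xff,
			(unsigned)(port >> 8), (unsigned)(port & 0xff));
}

/*
 * Length change caused by replacing old_tuple with new_tuple. A nonzero
 * value shifts every later TCP sequence number in that direction; this
 * is the bookkeeping the patch hands to nf_nat_mangle_tcp_packet().
 */
static int pasv_len_delta(const char *old_tuple, const char *new_tuple)
{
	return (int)strlen(new_tuple) - (int)strlen(old_tuple);
}
```

For example, replacing the real server address 192.168.10.20 with the VIP 192.168.100.30 for port 2000 (7 * 256 + 208) lengthens the payload by one byte. The patch leaves *diff unset because nf_nat already records that adjustment; setting it as well would adjust the sequence numbers twice.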
^ permalink raw reply

* [patch v2.3 2/4] IPVS: make friends with nf_conntrack
From: Simon Horman @ 2010-07-04 11:32 UTC (permalink / raw)
  To: lvs-devel, netdev, linux-kernel, netfilter, netfilter-devel
  Cc: Malcolm Turnbull, Wensong Zhang, Julius Volz, Patrick McHardy,
	David S. Miller, Hannes Eder
In-Reply-To: <20100704113246.562399500@vergenet.net>

[-- Attachment #1: IPVS-make-friends-with-nf_conntrack.patch --]
[-- Type: text/plain, Size: 5504 bytes --]

From:	Hannes Eder <heder@google.com>

Update the nf_conntrack tuple in the reply direction, as we will see
traffic from the real server (RIP) to the client (CIP).  Once this is
done, we can use netfilter's SNAT in POSTROUTING, especially with
xt_ipvs, to do source NAT, e.g.:

% iptables -t nat -A POSTROUTING -m ipvs --vaddr 192.168.100.30/32 --vport 80 \
> -j SNAT --to-source 192.168.10.10

Signed-off-by: Hannes Eder <heder@google.com>
Signed-off-by: Simon Horman <horms@verge.net.au>

--- 

 net/netfilter/ipvs/Kconfig      |    2 +-
 net/netfilter/ipvs/ip_vs_core.c |   36 ------------------------------------
 net/netfilter/ipvs/ip_vs_xmit.c |   30 ++++++++++++++++++++++++++++++
 3 files changed, 31 insertions(+), 37 deletions(-)

v2.1, v2.2, v2.3
No change

Index: nf-next-2.6/net/netfilter/ipvs/Kconfig
===================================================================
--- nf-next-2.6.orig/net/netfilter/ipvs/Kconfig	2010-07-04 20:18:12.000000000 +0900
+++ nf-next-2.6/net/netfilter/ipvs/Kconfig	2010-07-04 20:18:33.000000000 +0900
@@ -3,7 +3,7 @@
 #
 menuconfig IP_VS
 	tristate "IP virtual server support"
-	depends on NET && INET && NETFILTER
+	depends on NET && INET && NETFILTER && NF_CONNTRACK
 	---help---
 	  IP Virtual Server support will let you build a high-performance
 	  virtual server based on cluster of two or more real servers. This
Index: nf-next-2.6/net/netfilter/ipvs/ip_vs_core.c
===================================================================
--- nf-next-2.6.orig/net/netfilter/ipvs/ip_vs_core.c	2010-07-04 20:18:12.000000000 +0900
+++ nf-next-2.6/net/netfilter/ipvs/ip_vs_core.c	2010-07-04 20:18:33.000000000 +0900
@@ -536,26 +536,6 @@ int ip_vs_leave(struct ip_vs_service *sv
 	return NF_DROP;
 }
 
-
-/*
- *      It is hooked before NF_IP_PRI_NAT_SRC at the NF_INET_POST_ROUTING
- *      chain, and is used for VS/NAT.
- *      It detects packets for VS/NAT connections and sends the packets
- *      immediately. This can avoid that iptable_nat mangles the packets
- *      for VS/NAT.
- */
-static unsigned int ip_vs_post_routing(unsigned int hooknum,
-				       struct sk_buff *skb,
-				       const struct net_device *in,
-				       const struct net_device *out,
-				       int (*okfn)(struct sk_buff *))
-{
-	if (!skb->ipvs_property)
-		return NF_ACCEPT;
-	/* The packet was sent from IPVS, exit this chain */
-	return NF_STOP;
-}
-
 __sum16 ip_vs_checksum_complete(struct sk_buff *skb, int offset)
 {
 	return csum_fold(skb_checksum(skb, offset, skb->len - offset, 0));
@@ -1499,14 +1479,6 @@ static struct nf_hook_ops ip_vs_ops[] __
 		.hooknum        = NF_INET_FORWARD,
 		.priority       = 99,
 	},
-	/* Before the netfilter connection tracking, exit from POST_ROUTING */
-	{
-		.hook		= ip_vs_post_routing,
-		.owner		= THIS_MODULE,
-		.pf		= PF_INET,
-		.hooknum        = NF_INET_POST_ROUTING,
-		.priority       = NF_IP_PRI_NAT_SRC-1,
-	},
 #ifdef CONFIG_IP_VS_IPV6
 	/* After packet filtering, forward packet through VS/DR, VS/TUN,
 	 * or VS/NAT(change destination), so that filtering rules can be
@@ -1535,14 +1507,6 @@ static struct nf_hook_ops ip_vs_ops[] __
 		.hooknum        = NF_INET_FORWARD,
 		.priority       = 99,
 	},
-	/* Before the netfilter connection tracking, exit from POST_ROUTING */
-	{
-		.hook		= ip_vs_post_routing,
-		.owner		= THIS_MODULE,
-		.pf		= PF_INET6,
-		.hooknum        = NF_INET_POST_ROUTING,
-		.priority       = NF_IP6_PRI_NAT_SRC-1,
-	},
 #endif
 };
 
Index: nf-next-2.6/net/netfilter/ipvs/ip_vs_xmit.c
===================================================================
--- nf-next-2.6.orig/net/netfilter/ipvs/ip_vs_xmit.c	2010-07-04 16:15:59.000000000 +0900
+++ nf-next-2.6/net/netfilter/ipvs/ip_vs_xmit.c	2010-07-04 20:18:33.000000000 +0900
@@ -28,6 +28,7 @@
 #include <net/ip6_route.h>
 #include <linux/icmpv6.h>
 #include <linux/netfilter.h>
+#include <net/netfilter/nf_conntrack.h>
 #include <linux/netfilter_ipv4.h>
 
 #include <net/ip_vs.h>
@@ -348,6 +349,31 @@ ip_vs_bypass_xmit_v6(struct sk_buff *skb
 }
 #endif
 
+static void
+ip_vs_update_conntrack(struct sk_buff *skb, struct ip_vs_conn *cp)
+{
+	struct nf_conn *ct = (struct nf_conn *)skb->nfct;
+	struct nf_conntrack_tuple new_tuple;
+
+	if (ct == NULL || ct == &nf_conntrack_untracked ||
+	    nf_ct_is_confirmed(ct))
+		return;
+
+	/*
+	 * The connection is not yet in the hashtable, so we update it.
+	 * CIP->VIP will remain the same, so leave the tuple in
+	 * IP_CT_DIR_ORIGINAL untouched.  When the reply comes back from the
+	 * real-server we will see RIP->DIP.
+	 */
+	new_tuple = ct->tuplehash[IP_CT_DIR_REPLY].tuple;
+	new_tuple.src.u3 = cp->daddr;
+	/*
+	 * This will also take care of UDP and other protocols.
+	 */
+	new_tuple.src.u.tcp.port = cp->dport;
+	nf_conntrack_alter_reply(ct, &new_tuple);
+}
+
 /*
  *      NAT transmitter (only for outside-to-inside nat forwarding)
  *      Not used for related ICMP
@@ -403,6 +429,8 @@ ip_vs_nat_xmit(struct sk_buff *skb, stru
 
 	IP_VS_DBG_PKT(10, pp, skb, 0, "After DNAT");
 
+	ip_vs_update_conntrack(skb, cp);
+
 	/* FIXME: when application helper enlarges the packet and the length
 	   is larger than the MTU of outgoing device, there will be still
 	   MTU problem. */
@@ -479,6 +507,8 @@ ip_vs_nat_xmit_v6(struct sk_buff *skb, s
 
 	IP_VS_DBG_PKT(10, pp, skb, 0, "After DNAT");
 
+	ip_vs_update_conntrack(skb, cp);
+
 	/* FIXME: when application helper enlarges the packet and the length
 	   is larger than the MTU of outgoing device, there will be still
 	   MTU problem. */


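This user-space sketch makes the reply-direction rewrite concrete. It uses simplified, hypothetical types: struct tuple4 and struct vs_conn4 stand in for struct nf_conntrack_tuple and struct ip_vs_conn, and ports and addresses are plain host-order integers. It shows the idea behind ip_vs_update_conntrack(). The original CIP->VIP tuple is kept, and the expected reply source becomes the real server's address and port, which is the tuple the patch passes to nf_conntrack_alter_reply().

```c
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct nf_conntrack_tuple (IPv4). */
struct tuple4 {
	uint32_t src_ip, dst_ip;
	uint16_t src_port, dst_port;
};

/* Hypothetical stand-in for the ip_vs_conn fields used by the patch. */
struct vs_conn4 {
	uint32_t caddr, vaddr, daddr;	/* client, virtual, real server IPs */
	uint16_t cport, vport, dport;
};

/* The reply tuple conntrack expects by default: the original, inverted. */
static struct tuple4 invert_tuple(struct tuple4 t)
{
	struct tuple4 r = {
		.src_ip = t.dst_ip, .dst_ip = t.src_ip,
		.src_port = t.dst_port, .dst_port = t.src_port,
	};
	return r;
}

/*
 * Sketch of the unconfirmed-conntrack update: replies are expected from
 * the real server (RIP:dport) rather than from the VIP.
 */
static struct tuple4 ipvs_reply_tuple(struct tuple4 orig,
				      const struct vs_conn4 *cp)
{
	struct tuple4 reply = invert_tuple(orig);

	reply.src_ip = cp->daddr;
	reply.src_port = cp->dport;
	return reply;
}
```

For a client at 192.168.0.1:40000 connecting to the VIP 192.168.100.30:80 with real server 192.168.10.20:80, the default reply tuple would be VIP->CIP. The altered tuple is RIP->CIP, which matches what the real server actually sends back, so SNAT in POSTROUTING can then act on it.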
^ permalink raw reply

* [patch v2.3 1/4] netfilter: xt_ipvs (netfilter matcher for IPVS)
From: Simon Horman @ 2010-07-04 11:32 UTC (permalink / raw)
  To: lvs-devel, netdev, linux-kernel, netfilter, netfilter-devel
  Cc: Malcolm Turnbull, Wensong Zhang, Julius Volz, Patrick McHardy,
	David S. Miller, Hannes Eder
In-Reply-To: <20100704113246.562399500@vergenet.net>

[-- Attachment #1: netfilter-xt_ipvs-netfilter-matcher-for-IPVS.patch --]
[-- Type: text/plain, Size: 8570 bytes --]

From:	Hannes Eder <heder@google.com>

This implements the kernel-space side of the netfilter matcher xt_ipvs.

[ minor fixes by Simon Horman <horms@verge.net.au> ]
Signed-off-by: Hannes Eder <heder@google.com>
Signed-off-by: Simon Horman <horms@verge.net.au>

--- 

 include/linux/netfilter/xt_ipvs.h |   25 ++++
 net/netfilter/Kconfig             |   10 +
 net/netfilter/Makefile            |    1 
 net/netfilter/ipvs/ip_vs_proto.c  |    1 
 net/netfilter/xt_ipvs.c           |  187 +++++++++++++++++++++++++++++++++++++
 5 files changed, 224 insertions(+), 0 deletions(-)
 create mode 100644 include/linux/netfilter/xt_ipvs.h
 create mode 100644 net/netfilter/xt_ipvs.c

v2.1, v2.2
No Change

v2.3
As per advice from Patrick McHardy
* Don't define a value for _XT_IPVS_H in xt_ipvs.h
* Depend on NF_CONNTRACK
* Update to new API
  - ipvs_mt_check() should return an int rather than a bool
  - Change type of ipvs_mt()'s par parameter from
    struct xt_match_param to struct xt_action_param
  - Make ipvs_mt()'s par parameter non-const
Index: nf-next-2.6/include/linux/netfilter/xt_ipvs.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ nf-next-2.6/include/linux/netfilter/xt_ipvs.h	2010-07-04 16:22:19.000000000 +0900
@@ -0,0 +1,25 @@
+#ifndef _XT_IPVS_H
+#define _XT_IPVS_H
+
+#define XT_IPVS_IPVS_PROPERTY	(1 << 0) /* all other options imply this one */
+#define XT_IPVS_PROTO		(1 << 1)
+#define XT_IPVS_VADDR		(1 << 2)
+#define XT_IPVS_VPORT		(1 << 3)
+#define XT_IPVS_DIR		(1 << 4)
+#define XT_IPVS_METHOD		(1 << 5)
+#define XT_IPVS_VPORTCTL	(1 << 6)
+#define XT_IPVS_MASK		((1 << 7) - 1)
+#define XT_IPVS_ONCE_MASK	(XT_IPVS_MASK & ~XT_IPVS_IPVS_PROPERTY)
+
+struct xt_ipvs_mtinfo {
+	union nf_inet_addr	vaddr, vmask;
+	__be16			vport;
+	__u16			l4proto;
+	__u16			fwd_method;
+	__be16			vportctl;
+
+	__u8			invert;
+	__u8			bitmask;
+};
+
+#endif /* _XT_IPVS_H */
Index: nf-next-2.6/net/netfilter/Kconfig
===================================================================
--- nf-next-2.6.orig/net/netfilter/Kconfig	2010-07-04 16:21:28.000000000 +0900
+++ nf-next-2.6/net/netfilter/Kconfig	2010-07-04 16:22:19.000000000 +0900
@@ -726,6 +726,16 @@ config NETFILTER_XT_MATCH_IPRANGE
 
 	If unsure, say M.
 
+config NETFILTER_XT_MATCH_IPVS
+	tristate '"ipvs" match support'
+	depends on IP_VS
+	depends on NETFILTER_ADVANCED
+	depends on NF_CONNTRACK
+	help
+	  This option allows you to match against IPVS properties of a packet.
+
+	  If unsure, say N.
+
 config NETFILTER_XT_MATCH_LENGTH
 	tristate '"length" match support'
 	depends on NETFILTER_ADVANCED
Index: nf-next-2.6/net/netfilter/Makefile
===================================================================
--- nf-next-2.6.orig/net/netfilter/Makefile	2010-07-04 16:21:28.000000000 +0900
+++ nf-next-2.6/net/netfilter/Makefile	2010-07-04 16:22:19.000000000 +0900
@@ -76,6 +76,7 @@ obj-$(CONFIG_NETFILTER_XT_MATCH_HASHLIMI
 obj-$(CONFIG_NETFILTER_XT_MATCH_HELPER) += xt_helper.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_HL) += xt_hl.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_IPRANGE) += xt_iprange.o
+obj-$(CONFIG_NETFILTER_XT_MATCH_IPVS) += xt_ipvs.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_LENGTH) += xt_length.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_LIMIT) += xt_limit.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_MAC) += xt_mac.o
Index: nf-next-2.6/net/netfilter/ipvs/ip_vs_proto.c
===================================================================
--- nf-next-2.6.orig/net/netfilter/ipvs/ip_vs_proto.c	2010-07-04 16:21:28.000000000 +0900
+++ nf-next-2.6/net/netfilter/ipvs/ip_vs_proto.c	2010-07-04 16:22:19.000000000 +0900
@@ -98,6 +98,7 @@ struct ip_vs_protocol * ip_vs_proto_get(
 
 	return NULL;
 }
+EXPORT_SYMBOL(ip_vs_proto_get);
 
 
 /*
Index: nf-next-2.6/net/netfilter/xt_ipvs.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ nf-next-2.6/net/netfilter/xt_ipvs.c	2010-07-04 16:43:17.000000000 +0900
@@ -0,0 +1,189 @@
+/*
+ *	xt_ipvs - kernel module to match IPVS connection properties
+ *
+ *	Author: Hannes Eder <heder@google.com>
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/module.h>
+#include <linux/moduleparam.h>
+#include <linux/spinlock.h>
+#include <linux/skbuff.h>
+#ifdef CONFIG_IP_VS_IPV6
+#include <net/ipv6.h>
+#endif
+#include <linux/ip_vs.h>
+#include <linux/types.h>
+#include <linux/netfilter/x_tables.h>
+#include <linux/netfilter/x_tables.h>
+#include <linux/netfilter/xt_ipvs.h>
+#include <net/netfilter/nf_conntrack.h>
+
+#include <net/ip_vs.h>
+
+MODULE_AUTHOR("Hannes Eder <heder@google.com>");
+MODULE_DESCRIPTION("Xtables: match IPVS connection properties");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS("ipt_ipvs");
+MODULE_ALIAS("ip6t_ipvs");
+
+/* borrowed from xt_conntrack */
+static bool ipvs_mt_addrcmp(const union nf_inet_addr *kaddr,
+			    const union nf_inet_addr *uaddr,
+			    const union nf_inet_addr *umask,
+			    unsigned int l3proto)
+{
+	if (l3proto == NFPROTO_IPV4)
+		return ((kaddr->ip ^ uaddr->ip) & umask->ip) == 0;
+#ifdef CONFIG_IP_VS_IPV6
+	else if (l3proto == NFPROTO_IPV6)
+		return ipv6_masked_addr_cmp(&kaddr->in6, &umask->in6,
+		       &uaddr->in6) == 0;
+#endif
+	else
+		return false;
+}
+
+static bool
+ipvs_mt(const struct sk_buff *skb, struct xt_action_param *par)
+{
+	const struct xt_ipvs_mtinfo *data = par->matchinfo;
+	/* ipvs_mt_check ensures that family is only NFPROTO_IPV[46]. */
+	const u_int8_t family = par->family;
+	struct ip_vs_iphdr iph;
+	struct ip_vs_protocol *pp;
+	struct ip_vs_conn *cp;
+	bool match = true;
+
+	if (data->bitmask == XT_IPVS_IPVS_PROPERTY) {
+		match = skb->ipvs_property ^
+			!!(data->invert & XT_IPVS_IPVS_PROPERTY);
+		goto out;
+	}
+
+	/* other flags than XT_IPVS_IPVS_PROPERTY are set */
+	if (!skb->ipvs_property) {
+		match = false;
+		goto out;
+	}
+
+	ip_vs_fill_iphdr(family, skb_network_header(skb), &iph);
+
+	if (data->bitmask & XT_IPVS_PROTO)
+		if ((iph.protocol == data->l4proto) ^
+		    !(data->invert & XT_IPVS_PROTO)) {
+			match = false;
+			goto out;
+		}
+
+	pp = ip_vs_proto_get(iph.protocol);
+	if (unlikely(!pp)) {
+		match = false;
+		goto out;
+	}
+
+	/*
+	 * Check if the packet belongs to an existing entry
+	 */
+	cp = pp->conn_out_get(family, skb, pp, &iph, iph.len, 1 /* inverse */);
+	if (unlikely(cp == NULL)) {
+		match = false;
+		goto out;
+	}
+
+	/*
+	 * We found a connection, i.e. cp != NULL; make sure to call
+	 * __ip_vs_conn_put before returning.  In our case jump to out_put_cp.
+	 */
+
+	if (data->bitmask & XT_IPVS_VPORT)
+		if ((cp->vport == data->vport) ^
+		    !(data->invert & XT_IPVS_VPORT)) {
+			match = false;
+			goto out_put_cp;
+		}
+
+	if (data->bitmask & XT_IPVS_VPORTCTL)
+		if ((cp->control != NULL &&
+		     cp->control->vport == data->vportctl) ^
+		    !(data->invert & XT_IPVS_VPORTCTL)) {
+			match = false;
+			goto out_put_cp;
+		}
+
+	if (data->bitmask & XT_IPVS_DIR) {
+		enum ip_conntrack_info ctinfo;
+		struct nf_conn *ct = nf_ct_get(skb, &ctinfo);
+
+		if (ct == NULL || ct == &nf_conntrack_untracked) {
+			match = false;
+			goto out_put_cp;
+		}
+
+		if ((ctinfo >= IP_CT_IS_REPLY) ^
+		    !!(data->invert & XT_IPVS_DIR)) {
+			match = false;
+			goto out_put_cp;
+		}
+	}
+
+	if (data->bitmask & XT_IPVS_METHOD)
+		if (((cp->flags & IP_VS_CONN_F_FWD_MASK) == data->fwd_method) ^
+		    !(data->invert & XT_IPVS_METHOD)) {
+			match = false;
+			goto out_put_cp;
+		}
+
+	if (data->bitmask & XT_IPVS_VADDR) {
+		if (ipvs_mt_addrcmp(&cp->vaddr, &data->vaddr,
+				    &data->vmask, family) ^
+		    !(data->invert & XT_IPVS_VADDR)) {
+			match = false;
+			goto out_put_cp;
+		}
+	}
+
+out_put_cp:
+	__ip_vs_conn_put(cp);
+out:
+	pr_debug("match=%d\n", match);
+	return match;
+}
+
+static int ipvs_mt_check(const struct xt_mtchk_param *par)
+{
+	if (par->family != NFPROTO_IPV4
+#ifdef CONFIG_IP_VS_IPV6
+	    && par->family != NFPROTO_IPV6
+#endif
+		) {
+		pr_info("protocol family %u not supported\n", par->family);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static struct xt_match xt_ipvs_mt_reg __read_mostly = {
+	.name       = "ipvs",
+	.revision   = 0,
+	.family     = NFPROTO_UNSPEC,
+	.match      = ipvs_mt,
+	.checkentry = ipvs_mt_check,
+	.matchsize  = XT_ALIGN(sizeof(struct xt_ipvs_mtinfo)),
+	.me         = THIS_MODULE,
+};
+
+static int __init ipvs_mt_init(void)
+{
+	return xt_register_match(&xt_ipvs_mt_reg);
+}
+
+static void __exit ipvs_mt_exit(void)
+{
+	xt_unregister_match(&xt_ipvs_mt_reg);
+}
+
+module_init(ipvs_mt_init);
+module_exit(ipvs_mt_exit);

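Every per-option test in ipvs_mt() has the same shape, `(condition) ^ !(data->invert & FLAG)`. It rejects the packet whenever the condition disagrees with what the rule asked for. The sketch below is a user-space reduction of the --vport and --vportctl checks. Here struct conn_view, opt_ok() and vport_rules_match() are hypothetical, and ports are host-order integers. The sketch includes the NULL test on cp->control, which keeps --vportctl from matching connections that have no controlling connection.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define XT_IPVS_VPORT		(1 << 3)
#define XT_IPVS_VPORTCTL	(1 << 6)

/* Hypothetical, reduced view of an IPVS connection. */
struct conn_view {
	uint16_t vport;
	const struct conn_view *control;	/* controlling connection, if any */
};

/*
 * ipvs_mt() fails when "(condition) ^ !(invert & flag)" is true, so an
 * option is satisfied exactly when condition XOR inverted holds.
 */
static bool opt_ok(bool condition, uint8_t invert, uint8_t flag)
{
	return !(condition ^ !(invert & flag));
}

static bool vport_rules_match(const struct conn_view *cp, uint8_t bitmask,
			      uint8_t invert, uint16_t vport, uint16_t vportctl)
{
	if ((bitmask & XT_IPVS_VPORT) &&
	    !opt_ok(cp->vport == vport, invert, XT_IPVS_VPORT))
		return false;

	/* A connection without a controller never satisfies --vportctl. */
	if ((bitmask & XT_IPVS_VPORTCTL) &&
	    !opt_ok(cp->control != NULL && cp->control->vport == vportctl,
		    invert, XT_IPVS_VPORTCTL))
		return false;

	return true;
}
```

For an FTP passive data connection whose controlling connection has VIP port 21, `--vportctl 21` matches and its inverted form does not. The control connection itself has no controller, so `--vportctl 21` does not match it.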
^ permalink raw reply

* [patch v2.3 0/4] IPVS full NAT support + netfilter 'ipvs' match support
From: Simon Horman @ 2010-07-04 11:32 UTC (permalink / raw)
  To: lvs-devel, netdev, linux-kernel, netfilter, netfilter-devel
  Cc: Malcolm Turnbull, Wensong Zhang, Julius Volz, Patrick McHardy,
	David S. Miller, Hannes Eder

This is a repost of a patch-series posted by Hannes Eder last September.
This is v2 of the patch series and I don't see any outstanding objections to
it in the mailing list archives.

After I posted v2.2 of this series in May, several concerns were raised
by Patrick McHardy. This series should address all of those concerns.

Malcolm Turnbull has offered to test this code so I'd like to get
a Reviewed-by from him before the code gets merged. In other words,
at this stage these patches are for review not merging.

The original cover-email from Hannes follows.
The diffstat output has been updated to reflect minor up-porting by me.

From:	Hannes Eder <heder@google.com>

The following series implements full NAT support for IPVS.  The
approach is a minimal change to IPVS (make friends with nf_conntrack)
plus a new netfilter matcher with kernel- and user-space parts,
i.e. xt_ipvs and libxt_ipvs.

Example usage:

% ipvsadm -A -t 192.168.100.30:80 -s rr
% ipvsadm -a -t 192.168.100.30:80 -r 192.168.10.20:80 -m
# ...

# Source NAT for VIP 192.168.100.30:80
% iptables -t nat -A POSTROUTING -m ipvs --vaddr 192.168.100.30/32 \
> --vport 80 -j SNAT --to-source 192.168.10.10

or SNAT-ing only a specific real server:

% iptables -t nat -A POSTROUTING --dst 192.168.11.20 \
> -m ipvs --vaddr 192.168.100.30/32 -j SNAT --to-source 192.168.10.10

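The --vaddr option takes an address and a mask. The IPv4 branch of ipvs_mt_addrcmp() in xt_ipvs.c compares them as ((kaddr ^ uaddr) & mask) == 0. A minimal sketch of that comparison follows. Here prefix_to_mask() is a hypothetical helper that shows how a /N prefix could become a mask; it is not the parsing code in libxt_ipvs. Both values are host-order integers. The kernel works in network byte order, which does not change the result as long as address and mask use the same order.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: netmask for a prefix length in 0..32, host byte order. */
static uint32_t prefix_to_mask(unsigned int prefix_len)
{
	/* Special-case 0: shifting a 32-bit value by 32 is undefined in C. */
	return prefix_len == 0 ? 0 : 0xffffffffu << (32 - prefix_len);
}

/* IPv4 branch of ipvs_mt_addrcmp(): equal on every bit the mask covers. */
static bool masked_addr_equal(uint32_t kaddr, uint32_t uaddr, uint32_t mask)
{
	return ((kaddr ^ uaddr) & mask) == 0;
}
```

So `--vaddr 192.168.100.30/32` matches only that VIP, while a /24 would match any VIP in 192.168.100.0/24.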

First of all, thanks for all the feedback.  This is the changelog for v2:

- Make ip_vs_ftp work again.  Setup nf_conntrack expectations for
  related data connections (based on Julian's patch, see
  http://www.ssi.bg/~ja/nfct/) and let nf_conntrack/nf_nat do the
  packet mangling and the TCP sequence adjusting.

  This change raises the question of how to deal with ip_vs_sync.  Does
  it work together with conntrackd?  Wild idea: what about getting rid of
  ip_vs_sync, piggybacking everything on nf_conntrack, and using conntrackd?

  Any comments on this?

- xt_ipvs: add new rule '--vportctl port' to match the VIP port of the
  controlling connection, e.g. port 21 for FTP.  Can be used to match
  a related data connection for FTP:

  # SNAT FTP control connection
  % iptables -t nat -A POSTROUTING -m ipvs --vaddr 192.168.100.30/32 \
  > --vport 21 -j SNAT --to-source 192.168.10.10
  
  # SNAT FTP passive data connection
  % iptables -t nat -A POSTROUTING -m ipvs --vaddr 192.168.100.30/32 \
  > --vportctl 21 -j SNAT --to-source 192.168.10.10

- xt_ipvs: use 'par->family' instead of 'skb->protocol'

- xt_ipvs: add ipvs_mt_check and restrict to NFPROTO_IPV4 and NFPROTO_IPV6

- Call nf_conntrack_alter_reply(), so helper lookup is performed based
  on the changed tuple.

Changes to the linux kernel
(nf-next-2.6, "bridge: add per bridge device controls for invoking iptables")

Hannes Eder (3):
      netfilter: xt_ipvs (netfilter matcher for IPVS)
      IPVS: make friends with nf_conntrack
      IPVS: make FTP work with full NAT support


 include/linux/netfilter/xt_ipvs.h |   25 +++++
 include/net/ip_vs.h              |    2 
 net/netfilter/Kconfig            |   10 ++
 net/netfilter/Makefile           |    1 
 net/netfilter/ipvs/Kconfig       |    4 
 net/netfilter/ipvs/ip_vs_app.c   |   43 ---------
 net/netfilter/ipvs/ip_vs_core.c  |   37 --------
 net/netfilter/ipvs/ip_vs_ftp.c   |  173 +++++++++++++++++++++++++++++++++++---
 net/netfilter/ipvs/ip_vs_proto.c |    1 
 net/netfilter/ipvs/ip_vs_xmit.c  |   30 ++++++
 net/netfilter/xt_ipvs.c           |  189 +++++++++++++++++++++++++++++++++++++
 11 files changed, 419 insertions(+), 96 deletions(-)
 create mode 100644 include/linux/netfilter/xt_ipvs.h
 create mode 100644 net/netfilter/xt_ipvs.c


Changes to iptables
(iptables.git, "xt_quota: also document negation")

Hannes Eder (1):
      libxt_ipvs: user-space lib for netfilter matcher xt_ipvs

 configure.ac                      |   10 1
 extensions/libxt_ipvs.c           |  365 +++++++++++++++++++++++++++++++++++++
 extensions/libxt_ipvs.man         |   24 ++
 include/linux/netfilter/xt_ipvs.h |   25 +++
 4 files changed, 422 insertions(+), 2 deletions(-)
 create mode 100644 extensions/libxt_ipvs.c
 create mode 100644 extensions/libxt_ipvs.man
 create mode 100644 include/linux/netfilter/xt_ipvs.h

^ permalink raw reply

* Re: [PATCH repost] sched: export sched_set/getaffinity to modules
From: Michael S. Tsirkin @ 2010-07-04  9:00 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Peter Zijlstra, Sridhar Samudrala, Tejun Heo, Ingo Molnar, netdev,
	lkml, kvm@vger.kernel.org, Andrew Morton, Dmitri Vorobiev,
	Jiri Kosina, Thomas Gleixner, Andi Kleen
In-Reply-To: <20100702210637.GA12433@redhat.com>

On Fri, Jul 02, 2010 at 11:06:37PM +0200, Oleg Nesterov wrote:
> On 07/02, Peter Zijlstra wrote:
> >
> > On Fri, 2010-07-02 at 11:01 -0700, Sridhar Samudrala wrote:
> > >
> > > Does it (Tejun's kthread_clone() patch) also inherit the
> > > cgroup of the caller?
> >
> > Of course, it's a simple do_fork() which inherits everything just as you
> > would expect from a similar sys_clone()/sys_fork() call.
> 
> Yes. And I'm afraid it can inherit more than we want. IIUC, this is called
> from ioctl(), right?
> 
> Then the new thread becomes the natural child of the caller, and it shares
> ->mm with the parent. And files, dup_fd() without CLONE_FS.
> 
> Signals. Say, if you send SIGKILL to this new thread, it can't sleep in
> TASK_INTERRUPTIBLE or KILLABLE after that. And this SIGKILL can be sent
> just because the parent gets SIGQUIT or another coredumpable signal.
> Or the new thread can receive SIGSTOP via ^Z.
> 
> Perhaps this is OK, I do not know. Just to remind that kernel_thread()
> is merely clone(CLONE_VM).
> 
> Oleg.


Right. Doing this might break things like flush.  The signal and exit
behaviour needs to be examined carefully. I am also unsure whether
using such threads might be more expensive than inheriting kthreadd.

-- 
MST
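Oleg's reminder that kernel_thread() is merely clone(CLONE_VM) can be illustrated from userspace. The following is a minimal sketch using the glibc clone() wrapper, not kernel code; `run_child()` and `child_fn()` are hypothetical helpers. A child created with CLONE_VM writes straight into the parent's memory, while a child created without it only changes its own copy-on-write view.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Written by the child; the parent sees the write only if memory is shared. */
static volatile int shared_value = 0;

static int child_fn(void *arg)
{
	shared_value = *(int *)arg;
	return 0;
}

/*
 * Hypothetical helper: spawn a child with the given clone flags (plus
 * SIGCHLD so waitpid() can reap it), wait for it, and return what the
 * parent now sees in shared_value. Returns -1 on error.
 */
int run_child(int flags, int value)
{
	const size_t stack_size = 64 * 1024;
	char *stack = malloc(stack_size);
	pid_t pid;
	int result = -1;

	if (!stack)
		return -1;
	shared_value = 0;
	/* The stack grows down on common architectures, so pass its top. */
	pid = clone(child_fn, stack + stack_size, flags | SIGCHLD, &value);
	if (pid >= 0 && waitpid(pid, NULL, 0) == pid)
		result = shared_value;
	free(stack);
	return result;
}
```

With these helpers, run_child(CLONE_VM, 42) should return 42 and run_child(0, 42) should return 0. Signal, file-table and cgroup inheritance, which is what the thread is actually worried about, are governed by other clone flags and are not modelled here.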

^ permalink raw reply

* How to detect ethernet cable plug in event when ethernet card is in D3hot power state?
From: LionSky @ 2010-07-04  8:50 UTC (permalink / raw)
  To: netdev

As the subject says.
The scenario is as follows:
When the cable is unplugged, the Ethernet card is placed into the D3hot
power state to save power.
When the cable is plugged in later, the card should be resumed back to
D0 so it can be used.

But I do not know whether the Ethernet driver or the OS can detect the
cable plug-in event while the card is in D3hot. If it can, the OS can
resume the card from D3hot to D0. Thanks
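Whether a given NIC can notice link-up while in D3hot, and raise a PME so that it gets resumed, is hardware- and driver-specific; the PCI Power Management spec only provides the signalling mechanism. As a sketch, this is how the power-state and PME bits of the PMCSR (Power Management Control/Status) register decode. The function and enum names are hypothetical; the bit positions are those of the PCI PM spec (Linux calls them PCI_PM_CTRL_STATE_MASK, PCI_PM_CTRL_PME_ENABLE and PCI_PM_CTRL_PME_STATUS).

```c
#include <stdbool.h>
#include <stdint.h>

/* PMCSR fields as defined by the PCI Power Management spec. */
#define PMCSR_STATE_MASK  0x0003u  /* bits 1:0, current power state */
#define PMCSR_PME_ENABLE  0x0100u  /* bit 8, PME_En */
#define PMCSR_PME_STATUS  0x8000u  /* bit 15, PME_Status (write 1 to clear) */

enum pm_dstate { PM_D0 = 0, PM_D1 = 1, PM_D2 = 2, PM_D3HOT = 3 };

/* Current power state encoded in bits 1:0. */
enum pm_dstate pmcsr_state(uint16_t pmcsr)
{
	return (enum pm_dstate)(pmcsr & PMCSR_STATE_MASK);
}

/* True if the function is in D3hot and armed to signal PME. */
bool pmcsr_can_wake_from_d3hot(uint16_t pmcsr)
{
	return pmcsr_state(pmcsr) == PM_D3HOT && (pmcsr & PMCSR_PME_ENABLE);
}

/* True if the device has asserted PME and software has not cleared it. */
bool pmcsr_pme_pending(uint16_t pmcsr)
{
	return (pmcsr & PMCSR_PME_STATUS) != 0;
}
```

For example, a PMCSR value of 0x0103 decodes as D3hot with PME_En set.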

^ permalink raw reply

* Re: [PATCH] ipvs: Kconfig cleanup
From: Simon Horman @ 2010-07-04  7:13 UTC (permalink / raw)
  To: Michal Marek
  Cc: lvs-devel, netdev, Julian Anastasov, Wensong Zhang, linux-kernel,
	Patrick McHardy
In-Reply-To: <20100704070516.GA9437@verge.net.au>

[ Added Patrick McHardy to CC ]

On Sun, Jul 04, 2010 at 04:05:16PM +0900, Simon Horman wrote:
> On Fri, Jul 02, 2010 at 10:32:08PM +0200, Michal Marek wrote:
> > IP_VS_PROTO_AH_ESP should be set iff either of IP_VS_PROTO_{AH,ESP} is
> > selected. Express this with standard kconfig syntax.
> > 
> > Signed-off-by: Michal Marek <mmarek@suse.cz>
> 
> Acked-by: Simon Horman <horms@verge.net.au>
> 
> > ---
> >  net/netfilter/ipvs/Kconfig |    5 +----
> >  1 files changed, 1 insertions(+), 4 deletions(-)
> > 
> > diff --git a/net/netfilter/ipvs/Kconfig b/net/netfilter/ipvs/Kconfig
> > index f2d7623..91e7373 100644
> > --- a/net/netfilter/ipvs/Kconfig
> > +++ b/net/netfilter/ipvs/Kconfig
> > @@ -83,19 +83,16 @@ config	IP_VS_PROTO_UDP
> >  	  protocol. Say Y if unsure.
> >  
> >  config	IP_VS_PROTO_AH_ESP
> > -	bool
> > -	depends on UNDEFINED
> > +	def_bool IP_VS_PROTO_ESP || IP_VS_PROTO_AH
> >  
> >  config	IP_VS_PROTO_ESP
> >  	bool "ESP load balancing support"
> > -	select IP_VS_PROTO_AH_ESP
> >  	---help---
> >  	  This option enables support for load balancing ESP (Encapsulation
> >  	  Security Payload) transport protocol. Say Y if unsure.
> >  
> >  config	IP_VS_PROTO_AH
> >  	bool "AH load balancing support"
> > -	select IP_VS_PROTO_AH_ESP
> >  	---help---
> >  	  This option enables support for load balancing AH (Authentication
> >  	  Header) transport protocol. Say Y if unsure.
> > -- 
> > 1.7.1
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe lvs-devel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH] ipvs: Kconfig cleanup
From: Simon Horman @ 2010-07-04  7:05 UTC (permalink / raw)
  To: Michal Marek
  Cc: lvs-devel, netdev, Julian Anastasov, Wensong Zhang, linux-kernel
In-Reply-To: <1278102728-20217-1-git-send-email-mmarek@suse.cz>

On Fri, Jul 02, 2010 at 10:32:08PM +0200, Michal Marek wrote:
> IP_VS_PROTO_AH_ESP should be set iff either of IP_VS_PROTO_{AH,ESP} is
> selected. Express this with standard kconfig syntax.
> 
> Signed-off-by: Michal Marek <mmarek@suse.cz>

Acked-by: Simon Horman <horms@verge.net.au>

> ---
>  net/netfilter/ipvs/Kconfig |    5 +----
>  1 files changed, 1 insertions(+), 4 deletions(-)
> 
> diff --git a/net/netfilter/ipvs/Kconfig b/net/netfilter/ipvs/Kconfig
> index f2d7623..91e7373 100644
> --- a/net/netfilter/ipvs/Kconfig
> +++ b/net/netfilter/ipvs/Kconfig
> @@ -83,19 +83,16 @@ config	IP_VS_PROTO_UDP
>  	  protocol. Say Y if unsure.
>  
>  config	IP_VS_PROTO_AH_ESP
> -	bool
> -	depends on UNDEFINED
> +	def_bool IP_VS_PROTO_ESP || IP_VS_PROTO_AH
>  
>  config	IP_VS_PROTO_ESP
>  	bool "ESP load balancing support"
> -	select IP_VS_PROTO_AH_ESP
>  	---help---
>  	  This option enables support for load balancing ESP (Encapsulation
>  	  Security Payload) transport protocol. Say Y if unsure.
>  
>  config	IP_VS_PROTO_AH
>  	bool "AH load balancing support"
> -	select IP_VS_PROTO_AH_ESP
>  	---help---
>  	  This option enables support for load balancing AH (Authentication
>  	  Header) transport protocol. Say Y if unsure.
> -- 
> 1.7.1
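The patch's claim that IP_VS_PROTO_AH_ESP should be set iff either IP_VS_PROTO_AH or IP_VS_PROTO_ESP is selected can be checked exhaustively. The sketch below is a simplified boolean model of the two Kconfig formulations (it ignores tristate values and the rest of the dependency chain); the function names are hypothetical.

```c
#include <stdbool.h>

/* Old form: no prompt and an unsatisfiable "depends on", so the symbol
 * is y only when IP_VS_PROTO_ESP or IP_VS_PROTO_AH selects it. */
bool ah_esp_old(bool proto_ah, bool proto_esp)
{
	bool selected = false;

	if (proto_esp)
		selected = true;	/* IP_VS_PROTO_ESP: select IP_VS_PROTO_AH_ESP */
	if (proto_ah)
		selected = true;	/* IP_VS_PROTO_AH: select IP_VS_PROTO_AH_ESP */
	return selected;
}

/* New form: def_bool IP_VS_PROTO_ESP || IP_VS_PROTO_AH */
bool ah_esp_new(bool proto_ah, bool proto_esp)
{
	return proto_esp || proto_ah;
}

/* True if both forms agree on all four AH/ESP combinations. */
bool ah_esp_forms_agree(void)
{
	for (int ah = 0; ah <= 1; ah++)
		for (int esp = 0; esp <= 1; esp++)
			if (ah_esp_old(ah, esp) != ah_esp_new(ah, esp))
				return false;
	return true;
}
```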

^ permalink raw reply

* [PATCH net-next 1/4] bnx2: Always enable MSI-X on 5709.
From: Michael Chan @ 2010-07-04  6:42 UTC (permalink / raw)
  To: davem; +Cc: netdev

Minor change to use MSI-X even if there is only one CPU.  This allows
the CNIC driver to always have a dedicated MSI-X vector to handle
iSCSI events, instead of sharing the MSI vector.

Signed-off-by: Michael Chan <mchan@broadcom.com>
---
 drivers/net/bnx2.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
index a5dd81f..0614ca0 100644
--- a/drivers/net/bnx2.c
+++ b/drivers/net/bnx2.c
@@ -6185,7 +6185,7 @@ bnx2_setup_int_mode(struct bnx2 *bp, int dis_msi)
 	bp->irq_nvecs = 1;
 	bp->irq_tbl[0].vector = bp->pdev->irq;
 
-	if ((bp->flags & BNX2_FLAG_MSIX_CAP) && !dis_msi && cpus > 1)
+	if ((bp->flags & BNX2_FLAG_MSIX_CAP) && !dis_msi)
 		bnx2_enable_msix(bp, msix_vecs);
 
 	if ((bp->flags & BNX2_FLAG_MSI_CAP) && !dis_msi &&
-- 
1.6.4.GIT
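The effect of dropping the `cpus > 1` test can be summarised as a truth table. This is a simplified model of bnx2_setup_int_mode(), not the driver code: it assumes bnx2_enable_msix() always succeeds and that MSI is tried only when MSI-X was not enabled (the rest of the MSI condition is cut off in the hunk above). The enum and parameter names are hypothetical.

```c
#include <stdbool.h>

enum int_mode { MODE_INTX, MODE_MSI, MODE_MSIX };

/* `patched` selects the behaviour after this patch (no CPU-count check). */
enum int_mode choose_int_mode(bool msix_cap, bool msi_cap, bool dis_msi,
			      int cpus, bool patched)
{
	/* Before the patch, MSI-X additionally required more than one CPU. */
	bool want_msix = msix_cap && !dis_msi && (patched || cpus > 1);

	if (want_msix)
		return MODE_MSIX;
	if (msi_cap && !dis_msi)
		return MODE_MSI;
	return MODE_INTX;	/* legacy INTx on pdev->irq */
}
```

On a single-CPU 5709 the old logic fell back to MSI, sharing one vector with CNIC; the patched logic picks MSI-X, which is what gives CNIC its dedicated vector.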



^ permalink raw reply related

* [PATCH net-next 2/4] bnx2: Add support for skb->rxhash.
From: Michael Chan @ 2010-07-04  6:42 UTC (permalink / raw)
  To: davem; +Cc: netdev
In-Reply-To: <1278225738-7795-1-git-send-email-mchan@broadcom.com>

Add skb->rxhash support for TCP packets only because the bnx2 RSS hash
does not hash UDP ports.

Signed-off-by: Michael Chan <mchan@broadcom.com>
---
 drivers/net/bnx2.c |   15 ++++++++++++++-
 drivers/net/bnx2.h |    3 +++
 2 files changed, 17 insertions(+), 1 deletions(-)

diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
index 0614ca0..1450c75 100644
--- a/drivers/net/bnx2.c
+++ b/drivers/net/bnx2.c
@@ -3219,6 +3219,10 @@ bnx2_rx_int(struct bnx2 *bp, struct bnx2_napi *bnapi, int budget)
 					      L2_FHDR_ERRORS_UDP_XSUM)) == 0))
 				skb->ip_summed = CHECKSUM_UNNECESSARY;
 		}
+		if ((bp->dev->features & NETIF_F_RXHASH) &&
+		    ((status & L2_FHDR_STATUS_USE_RXHASH) ==
+		     L2_FHDR_STATUS_USE_RXHASH))
+			skb->rxhash = rx_hdr->l2_fhdr_hash;
 
 		skb_record_rx_queue(skb, bnapi - &bp->bnx2_napi[0]);
 
@@ -7558,6 +7562,12 @@ bnx2_set_tx_csum(struct net_device *dev, u32 data)
 		return (ethtool_op_set_tx_csum(dev, data));
 }
 
+static int
+bnx2_set_flags(struct net_device *dev, u32 data)
+{
+	return ethtool_op_set_flags(dev, data, ETH_FLAG_RXHASH);
+}
+
 static const struct ethtool_ops bnx2_ethtool_ops = {
 	.get_settings		= bnx2_get_settings,
 	.set_settings		= bnx2_set_settings,
@@ -7587,6 +7597,8 @@ static const struct ethtool_ops bnx2_ethtool_ops = {
 	.phys_id		= bnx2_phys_id,
 	.get_ethtool_stats	= bnx2_get_ethtool_stats,
 	.get_sset_count		= bnx2_get_sset_count,
+	.set_flags		= bnx2_set_flags,
+	.get_flags		= ethtool_op_get_flags,
 };
 
 /* Called with rtnl_lock */
@@ -8333,7 +8345,8 @@ bnx2_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 	memcpy(dev->dev_addr, bp->mac_addr, 6);
 	memcpy(dev->perm_addr, bp->mac_addr, 6);
 
-	dev->features |= NETIF_F_IP_CSUM | NETIF_F_SG | NETIF_F_GRO;
+	dev->features |= NETIF_F_IP_CSUM | NETIF_F_SG | NETIF_F_GRO |
+			 NETIF_F_RXHASH;
 	vlan_features_add(dev, NETIF_F_IP_CSUM | NETIF_F_SG);
 	if (CHIP_NUM(bp) == CHIP_NUM_5709) {
 		dev->features |= NETIF_F_IPV6_CSUM;
diff --git a/drivers/net/bnx2.h b/drivers/net/bnx2.h
index ddaa3fc..b9af6bc 100644
--- a/drivers/net/bnx2.h
+++ b/drivers/net/bnx2.h
@@ -295,6 +295,9 @@ struct l2_fhdr {
 		#define L2_FHDR_ERRORS_TCP_XSUM		(1<<28)
 		#define L2_FHDR_ERRORS_UDP_XSUM		(1<<31)
 
+		#define L2_FHDR_STATUS_USE_RXHASH	\
+			(L2_FHDR_STATUS_TCP_SEGMENT | L2_FHDR_STATUS_RSS_HASH)
+
 	u32 l2_fhdr_hash;
 #if defined(__BIG_ENDIAN)
 	u16 l2_fhdr_pkt_len;
-- 
1.6.4.GIT
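L2_FHDR_STATUS_USE_RXHASH combines two status bits, and the `== L2_FHDR_STATUS_USE_RXHASH` comparison in bnx2_rx_int() requires both of them: the frame was a TCP segment and the chip supplied an RSS hash. That is how the patch restricts skb->rxhash to TCP. The sketch below contrasts it with an any-bit test; the bit positions are placeholders for illustration, not necessarily the values in bnx2.h, and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit positions for illustration only. */
#define L2_FHDR_STATUS_RSS_HASH     (1u << 8)
#define L2_FHDR_STATUS_TCP_SEGMENT  (1u << 14)
#define L2_FHDR_STATUS_USE_RXHASH \
	(L2_FHDR_STATUS_TCP_SEGMENT | L2_FHDR_STATUS_RSS_HASH)

/* Mirrors the patch: true only if every bit in the mask is set. */
bool use_rxhash(uint32_t status)
{
	return (status & L2_FHDR_STATUS_USE_RXHASH) == L2_FHDR_STATUS_USE_RXHASH;
}

/* Looser variant, shown for contrast: true if either bit is set, which
 * would also accept a hash computed for a non-TCP (e.g. UDP) frame. */
bool use_rxhash_any(uint32_t status)
{
	return (status & L2_FHDR_STATUS_USE_RXHASH) != 0;
}
```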



^ permalink raw reply related

* [PATCH net-next 4/4] bnx2: Update version to 2.0.16.
From: Michael Chan @ 2010-07-04  6:42 UTC (permalink / raw)
  To: davem; +Cc: netdev
In-Reply-To: <1278225738-7795-3-git-send-email-mchan@broadcom.com>

Signed-off-by: Michael Chan <mchan@broadcom.com>
---
 drivers/net/bnx2.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
index 1450c75..ae0a9af 100644
--- a/drivers/net/bnx2.c
+++ b/drivers/net/bnx2.c
@@ -58,8 +58,8 @@
 #include "bnx2_fw.h"
 
 #define DRV_MODULE_NAME		"bnx2"
-#define DRV_MODULE_VERSION	"2.0.15"
-#define DRV_MODULE_RELDATE	"May 4, 2010"
+#define DRV_MODULE_VERSION	"2.0.16"
+#define DRV_MODULE_RELDATE	"July 2, 2010"
 #define FW_MIPS_FILE_06		"bnx2/bnx2-mips-06-5.0.0.j6.fw"
 #define FW_RV2P_FILE_06		"bnx2/bnx2-rv2p-06-5.0.0.j3.fw"
 #define FW_MIPS_FILE_09		"bnx2/bnx2-mips-09-5.0.0.j15.fw"
-- 
1.6.4.GIT



^ permalink raw reply related

* [PATCH net-next 3/4] bnx2: Dump some config space registers during TX timeout.
From: Michael Chan @ 2010-07-04  6:42 UTC (permalink / raw)
  To: davem; +Cc: netdev
In-Reply-To: <1278225738-7795-2-git-send-email-mchan@broadcom.com>

These config register values will be useful when the memory registers
are returning 0xffffffff, which has been reported.

Signed-off-by: Michael Chan <mchan@broadcom.com>
---
 drivers/net/bnx2.c |   11 ++++++++---
 1 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
index ae0a9af..22fa1e9 100644
--- a/drivers/net/bnx2.c
+++ b/drivers/net/bnx2.c
@@ -6313,9 +6313,14 @@ static void
 bnx2_dump_state(struct bnx2 *bp)
 {
 	struct net_device *dev = bp->dev;
-	u32 mcp_p0, mcp_p1;
-
-	netdev_err(dev, "DEBUG: intr_sem[%x]\n", atomic_read(&bp->intr_sem));
+	u32 mcp_p0, mcp_p1, val1, val2;
+
+	pci_read_config_dword(bp->pdev, PCI_COMMAND, &val1);
+	netdev_err(dev, "DEBUG: intr_sem[%x] PCI_CMD[%08x]\n",
+		   atomic_read(&bp->intr_sem), val1);
+	pci_read_config_dword(bp->pdev, bp->pm_cap + PCI_PM_CTRL, &val1);
+	pci_read_config_dword(bp->pdev, BNX2_PCICFG_MISC_CONFIG, &val2);
+	netdev_err(dev, "DEBUG: PCI_PM[%08x] PCI_MISC_CFG[%08x]\n", val1, val2);
 	netdev_err(dev, "DEBUG: EMAC_TX_STATUS[%08x] EMAC_RX_STATUS[%08x]\n",
 		   REG_RD(bp, BNX2_EMAC_TX_STATUS),
 		   REG_RD(bp, BNX2_EMAC_RX_STATUS));
-- 
1.6.4.GIT
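One way to read the new PCI_CMD line: pci_read_config_dword() at PCI_COMMAND returns the 16-bit Command register in the low half and the Status register in the high half. The sketch below decodes the bits most relevant to all-ones MMIO reads. The struct and function names are hypothetical; the bit values follow the PCI spec (Linux: PCI_COMMAND_MEMORY, PCI_COMMAND_MASTER, PCI_COMMAND_INTX_DISABLE). The interpretation in the comments is an assumption, not something stated in the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Command register bits (PCI Local Bus spec). */
#define CMD_MEMORY_SPACE  0x0002u
#define CMD_BUS_MASTER    0x0004u
#define CMD_INTX_DISABLE  0x0400u

struct cmd_decode {
	bool device_gone;   /* the config read itself returned all ones */
	bool mem_enabled;   /* if clear, MMIO reads typically return all ones */
	bool bus_master;    /* device is allowed to issue DMA */
	bool intx_disabled;
	uint16_t status;    /* upper 16 bits: the Status register */
};

/* Decode a 32-bit config read at offset PCI_COMMAND (0x04). */
struct cmd_decode decode_pci_cmd(uint32_t dword)
{
	struct cmd_decode d = {0};
	uint16_t cmd = (uint16_t)(dword & 0xffffu);

	d.device_gone = (dword == 0xffffffffu);
	if (d.device_gone)
		return d;
	d.mem_enabled = (cmd & CMD_MEMORY_SPACE) != 0;
	d.bus_master = (cmd & CMD_BUS_MASTER) != 0;
	d.intx_disabled = (cmd & CMD_INTX_DISABLE) != 0;
	d.status = (uint16_t)(dword >> 16);
	return d;
}
```

For example, 0x00100406 decodes as memory decoding and bus mastering enabled, INTx disabled, and Status 0x0010.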



^ permalink raw reply related

* RE: [REGRESSION] e1000e stopped working
From: Maxim Levitsky @ 2010-07-04  0:41 UTC (permalink / raw)
  To: Tantilov, Emil S
  Cc: netdev@vger.kernel.org, Allan, Bruce W, Pieper, Jeffrey E
In-Reply-To: <1277938757.4138.3.camel@localhost.localdomain>

On Thu, 2010-07-01 at 01:59 +0300, Maxim Levitsky wrote:
> On Tue, 2010-06-29 at 12:37 -0600, Tantilov, Emil S wrote:
> > Maxim Levitsky wrote:
> > > On Mon, 2010-06-28 at 18:09 -0700, Allan, Bruce W wrote:
> > >> On Monday, June 28, 2010 10:14 AM, Maxim Levitsky wrote:
> > >>> On Mon, 2010-06-28 at 10:04 -0700, Allan, Bruce W wrote:
> > >>>> On Sunday, June 27, 2010 10:47 AM, Maxim Levitsky wrote:
> > >>>>> On Sun, 2010-06-27 at 20:43 +0300, Maxim Levitsky wrote:
> > >>>>>> On Sun, 2010-06-27 at 20:29 +0300, Maxim Levitsky wrote:
> > >>>>>>> On Sun, 2010-06-27 at 20:27 +0300, Maxim Levitsky wrote:
> > >>>>>>>> Just that,
> > >>>>>>>> 
> > >>>>>>>> It doesn't receive anything from my internet router during
> > >>>>>>>> DHCP. 
> > >>>>>>>> 
> > >>>>>>>> 
> > >>>>>>>> 00:19.0 Ethernet controller [0200]: Intel Corporation 82566DC Gigabit Network Connection [8086:104b] (rev 02)
> > >>>>>>>> 	Subsystem: Intel Corporation Device [8086:0001]
> > >>>>>>>> 	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
> > >>>>>>>> 	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> > >>>>>>>> 	Latency: 0
> > >>>>>>>> 	Interrupt: pin A routed to IRQ 47
> > >>>>>>>> 	Region 0: Memory at 50300000 (32-bit, non-prefetchable) [size=128K]
> > >>>>>>>> 	Region 1: Memory at 50324000 (32-bit, non-prefetchable) [size=4K]
> > >>>>>>>> 	Region 2: I/O ports at 30e0 [size=32]
> > >>>>>>>> 	Capabilities: [c8] Power Management version 2
> > >>>>>>>> 		Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
> > >>>>>>>> 		Status: D0 PME-Enable- DSel=0 DScale=1 PME-
> > >>>>>>>> 	Capabilities: [d0] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0 Enable+
> > >>>>>>>> 		Address: 00000000fee0100c  Data: 41c9
> > >>>>>>>> 	Kernel driver in use: e1000e
> > >>>>>>>> 	Kernel modules: e1000e
> > >>>>>>>> 
> > >>>>>>>> I use vanilla tree, commit
> > >>>>>>>> bf2937695fe2330bfd8933a2310e7bdd2581dc2e
> > >>>>>>>> 
> > >>>>>>>> 
> > >>>>>>>> Best regards,
> > >>>>>>>> 	Maxim Levitsky
> > >>>>>>>> 
> > >>>>>>> 
> > >>>>>>> It appears to work now after reboot.
> > >>>>>>> Will keep a look for this.
> > >>>>>>> 
> > >>>>>>> Disregard for now.
> > >>>>>> 
> > >>>>>> 
> > >>>>>> Just s2ram cycle, problem is back.
> > >>>>>> Did a full reboot (power off, then on): same thing, the card doesn't
> > >>>>>> work...
> > >>>>>> 
> > >>>>> 
> > >>>>> Yep, s2ram sometimes 'fixes', sometimes breaks the card.
> > >>>>> Something got broken in device initialization path.
> > >>>>> 
> > >>>>> Best regards,
> > >>>>>  	Maxim Levitsky
> > >>>> 
> > >>>> What distro are you using?  If RedHat, since you are using DHCP
> > >>>> will you please try putting a "LINKDELAY=10" in the
> > >>>> /etc/sysconfig/network-scripts/ifcfg-ethX config file.
> > >>>> 
> > >>> I use ubuntu 9.10
> > >>> 
> > >>>> Is there anything in the system log that might help narrow down the
> > >>>> issue?
> > >>> 
> > >>> Nothing, really nothing.
> > >>> It seems to detect link, dhcp client sends requests, but doesn't
> > >>> receive a thing (even tried promisc mode - doesn't help)
> > >>> 
> > >>> 
> > >>> 
> > >>> Best regards,
> > >>> 	Maxim Levitsky
> > >> 
> > >> Since you say this is a regression, when did this last work for you
> > >> without this problem, i.e. which distro, which kernel? 
> > > 
> > > I always compile kernel, and last kernel I compiled here was vanilla
> > > 2.6.33-rc4.
> > > It works just fine.
> > > 
> > > I mostly use my laptop, and therefore didn't update kernel on my
> > > desktop for long time.
> > > 
> > > If I find some free time, I'll try to bisect the problem.
> > 
> > Could you provide some additional info about your setup:
> > ethtool -e eth0
> > ethtool -d eth0
> > kernel config (if possible)
> > 
> > What is the model of your system/MB?
> 
> 
> Sure,
> 
> 
> My motherboard on this system is Intel DG965RY
> 
> The bug is about 90% reproducible.
> By doing several s2ram cycles, it's possible to catch a moment when the
> device starts working.
> 

Just tested 2.6.34, and it works, so this is a 2.6.35 regression.

Best regards,
	Maxim Levitsky


^ permalink raw reply

* Re: setsockopt(IP_TOS) being privileged or distinct capability?
From: Alexander Clouter @ 2010-07-03 23:48 UTC (permalink / raw)
  To: Philip Prindeville; +Cc: netdev
In-Reply-To: <4C2FC2C8.8080203@redfish-solutions.com>

Hi,

* Philip Prindeville <philipp_subx@redfish-solutions.com> [2010-07-03 17:07:52-0600]:
>
> On 7/3/10 12:55 PM, Alexander Clouter wrote:
>>    
>>> Does anyone else think that setsockopt(IP_TOS) should be a privileged
>>> operation, perhaps using CAP_NET_ADMIN, or maybe even adding separate
>>> granularity as CAP_NET_TOS?
>>>
>>>      
>> I really would prefer not having to run telnet and ssh *clients* as
>> root. :)
>
> Don't ping and traceroute -I currently run as root?
>
Indeed, but I have no idea what that has to do with ToS/DSCP flags?

ping and (old skool) traceroute use ICMP, for which you need to open a
privileged socket to send and receive ICMP packets.  Opening a UDP/TCP
socket is an unprivileged operation, and so is setsockopt(IP_TOS).
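This is easy to check on Linux. The minimal sketch below (the helper name set_and_read_tos() is hypothetical) opens an ordinary UDP socket as an unprivileged user, marks it IPTOS_LOWDELAY (0x10, the "interactive" marking SSH and telnet clients have traditionally used), and reads the value back with getsockopt(). Other operating systems may behave differently.

```c
#include <netinet/in.h>
#include <netinet/ip.h>
#include <sys/socket.h>
#include <unistd.h>

/* Set the IPv4 TOS byte on a fresh UDP socket and return the value the
 * kernel reports back, or -1 on any error. Needs no special privileges. */
int set_and_read_tos(int tos)
{
	int fd = socket(AF_INET, SOCK_DGRAM, 0);
	int got = -1;
	socklen_t len = sizeof(got);

	if (fd < 0)
		return -1;
	if (setsockopt(fd, IPPROTO_IP, IP_TOS, &tos, sizeof(tos)) < 0 ||
	    getsockopt(fd, IPPROTO_IP, IP_TOS, &got, &len) < 0)
		got = -1;
	close(fd);
	return got;
}
```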

I'm guessing (if you'll excuse me Google-stalking you) that this is all
linked to:

https://bugzilla.mindrot.org/show_bug.cgi?id=1733

You have to bear in mind that ToS is a marking userland can use to
request that the network provide it with a particular QoS; this does
not mean for an instant that the network has to honour it (I know my ISP
does not, and neither does the work network I sysadmin)...otherwise
nothing would stop me using:

iptables -t mangle -I POSTROUTING -j DSCP --set-dscp-class EF

QoS is meaningless unless you place boundaries on the policies; the 
ToS/DSCP marking should only be used as a *hint* for classification of 
traffic flows.

For example, 'interactive' and 'low latency' (in the case of SSH or 
telnet) should not exceed 10kB/s...unless you like to play 0verkill :)  
Anything marking its traffic as interactive but pushing traffic at 
500kB/s is obviously telling lies.  If you build your policing rules to 
blindly accept whatever is in the ToS/DSCP field, you are configuring a 
DoS vector on your network.
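A policing rule like "interactive traffic should not exceed 10kB/s" is commonly implemented as a token bucket. The sketch below is one such policer, not taken from any particular qdisc or tool; `struct policer` and its functions are hypothetical, and a 10kB/s rate with a 1500-byte burst is only an example configuration. Packets that arrive when the bucket is short of tokens are out of profile and could be dropped or re-marked as best effort.

```c
#include <stdbool.h>
#include <stdint.h>

/* Token bucket: traffic conforms while it stays within `rate` bytes per
 * second plus a burst allowance of `burst` bytes. */
struct policer {
	double rate;    /* bytes per second */
	double burst;   /* bucket depth, bytes */
	double tokens;  /* bytes currently available */
	double last;    /* time of last update, seconds */
};

void policer_init(struct policer *p, double rate, double burst, double now)
{
	p->rate = rate;
	p->burst = burst;
	p->tokens = burst;	/* start with a full bucket */
	p->last = now;
}

/* Returns true if a packet of `len` bytes arriving at `now` conforms. */
bool policer_conforms(struct policer *p, uint32_t len, double now)
{
	/* Refill for the elapsed time, capped at the bucket depth. */
	p->tokens += (now - p->last) * p->rate;
	if (p->tokens > p->burst)
		p->tokens = p->burst;
	p->last = now;

	if (p->tokens >= len) {
		p->tokens -= len;
		return true;
	}
	return false;	/* out of profile */
}
```

A flow honestly marked interactive stays well inside such a profile, while one sending at 500kB/s exhausts the bucket almost immediately, so most of its packets are classified as out of profile regardless of the ToS/DSCP marking.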

Cheers

-- 
Alexander Clouter
.sigmonster says: A rolling stone gathers momentum.

^ permalink raw reply
