Netdev List

* Server Rental services in Hong Kong
From: trtr678678 @ 2012-06-14 15:22 UTC (permalink / raw)


Dear All,

We have our own datacenter in Hong Kong and provide email/application/web rental services to clients. We are an APNIC member and provide clean IP addresses to clients.

Dell(TM) PowerEdge(TM) Enterprise Rack Mount Server
-Intel(R) Xeon(R) E3-1240 Processor (3.3GHz, 8M Cache, Turbo, 4C/8T, 80W)
-8GB RAM, 2x4GB, 1333MHz, DDR-3, Dual Ranked UDIMMs
-500GB, 3.5", 6Gbps SAS x 2
-RAID 1 Mirroring Protection
-Remote KVM (iDRAC6 Enterprise)

Dell(TM) PowerEdge(TM) R410 Rack Mount Server
-Intel(R) Quad Core E5606 Xeon(R) CPU, 2.13GHz, 4M Cache, 4.86 GT/s QPI
-4GB Memory (2x2GB), 1333MHz Dual Ranked RDIMMs Fully-Buffered
-500GB 7.2K RPM SATAII 3.5" Hard Drive x 2
-iDRAC6 Enterprise or Express (Remote KVM Management)

Every Dedicated Server Hosting Solution Also Includes:

Software Specification
- CentOS / Fedora / Debian / FreeBSD / Ubuntu / Red Hat Linux
- Full root-level access
- Data Center Facilities
- Shared Local & International Bandwidth
- 2 IP Addresses Allocation
- Un-interruptible Power Supply (UPS) backed up by private diesel generator
- FM200-based fire suppression system
- 24x7 CRAC Air Conditioning and Humidity Control
- 24x7 Security Control
- 24x7 Remote Hand Service

Please email us for further information. Thanks,

Ron
trtr678678@gmail.com

If you do not wish to receive further messages, email "trtr789789@gmail.com" to unsubscribe or to remove your address from the list.

^ permalink raw reply

* Re: Regression on TX throughput when using bonding
From: Jean-Michel Hautbois @ 2012-06-14 15:43 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev
In-Reply-To: <1339684157.22704.722.camel@edumazet-glaptop>

2012/6/14 Eric Dumazet <eric.dumazet@gmail.com>:
> On Thu, 2012-06-14 at 16:14 +0200, Jean-Michel Hautbois wrote:
>
>> ~# tc -s -d qdisc show dev eth1 > before_tc && sleep 10 && tc -s -d
>> qdisc show dev eth1 > after_tc && ./beforeafter before_tc after_tc
>> qdisc mq 0: root
>>  Sent 3185900568 bytes 788681 pkt (dropped 0, overlimits 0 requeues 620)
>>  backlog 0b 0p requeues 620
>>
>> As you can see, 2.5Gbps without any difficulties :).
>>
>> Thanks,
>> JM
>
> I have no idea why the throughput on the Ethernet link changes.
>
> There is another bug elsewhere. Use a thousand sockets instead of a
> few, and you'll hit the bug.
>
> Orphaning skbs should not lower the speed of the device; it only drops
> excess packets instead of blocking the application while it waits for
> the socket's wmem allocation to be freed by destructors.
>
> Are you playing with process priorities?
>
> If ksoftirqd cannot run, this could explain the problem.
>

As Eric suggested, here is a description that I hope is as precise as possible.
I send three raw video streams, 1920x1088 at 30 fps, over three UDP sockets to
the same NIC.
Each stream is sent from its own thread, so I will focus on the numbers for one thread.

This generates bursts of send() calls: every 1/30 s, each thread sends 3,133,440
bytes to the Ethernet interface.
In practice it looks like this:
while (n > 0)    /* n: bytes left in the frame (size_t) */
{
  size_t len = n < 4000 ? n : 4000;   /* the last datagram is shorter */
  /* sock, dst: the UDP socket and its destination address */
  if (sendto(sock, packet, len, 0, (struct sockaddr *)&dst, sizeof(dst)) < 0)
    break;                            /* error handling omitted */
  n -= len;
  packet += len;
}

My interface is a bond with a 10 Gbps slave interface and the MTU set to 4096.
This means each thread sends 784 packets on the interface every 1/30 s,
then waits for the next burst, and so on.
The three streams are not necessarily the same video, so the threads may or
may not send simultaneously.

My sockets are in blocking mode.
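The burst sizes above follow from simple arithmetic, recomputed in the sketch below. It assumes 12 bits per pixel (for example a 4:2:0 pixel format, which is what 3,133,440 bytes for 1920x1088 implies) and IPv4 without IP options; the helper names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes in one raw frame, assuming 12 bits per pixel (e.g. 4:2:0). */
uint64_t frame_bytes(uint32_t width, uint32_t height)
{
	return (uint64_t)width * height * 12 / 8;
}

/* Datagrams needed for `bytes` when each carries at most `payload` bytes. */
uint64_t datagrams_per_frame(uint64_t bytes, uint64_t payload)
{
	assert(payload > 0);
	return (bytes + payload - 1) / payload;  /* round up */
}

/* UDP payload bit rate of one stream, headers not included. */
uint64_t stream_bits_per_sec(uint64_t bytes, uint32_t fps)
{
	return bytes * 8 * fps;
}

/* Nonzero if one datagram fits in a single IPv4 packet:
 * 20-byte IPv4 header (no options) + 8-byte UDP header + payload. */
int fits_in_mtu(uint64_t payload, uint32_t mtu)
{
	return payload + 20 + 8 <= mtu;
}
```

With these, frame_bytes(1920, 1088) gives 3,133,440 and datagrams_per_frame(3133440, 4000) gives 784 (783 full datagrams plus a 1,440-byte tail). One stream carries 752,025,600 bit/s of UDP payload (about 2.26 Gbit/s for three streams), and a 4,000-byte payload becomes a 4,028-byte IP packet, so nothing is fragmented at MTU 4096.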

JM

^ permalink raw reply

* Re: [PATCH 0/8] dcbnl: Major simplifications
From: John Fastabend @ 2012-06-14 16:06 UTC (permalink / raw)
  To: tgraf, alexander.h.duyck; +Cc: David Miller, netdev, lucy.liu
In-Reply-To: <20120614075435.GA29185@canuck.infradead.org>

On 6/14/2012 12:54 AM, Thomas Graf wrote:
> On Wed, Jun 13, 2012 at 03:55:41PM -0700, David Miller wrote:
>> Lots of deleted code, I like it :-)
>>
>> Applied, but could you send a follow-on patch to use BUG_ON() instead
>> of that "if (!ptr) { /* ... */ BUG(); }" construct?
>
> Sure, I must have had a weak moment right there :)
>

Nice! I'm a bit late, but I ran this through my dcbnl netlink test kit
and everything looks good, so...

Tested-by: John Fastabend <john.r.fastabend@intel.com>
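For readers unfamiliar with the construct David asked to replace: in the kernel, BUG_ON(cond) is essentially a conditional BUG() (the generic definition also marks the condition unlikely()). The sketch below is a user-space stand-in that prints and calls abort() instead of the kernel's oops handling; the macros and deref_or_die() are hypothetical and only show the shape of the change.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* User-space stand-ins for the kernel macros (hypothetical). */
#define BUG() do {                                               \
		fprintf(stderr, "BUG at %s:%d\n", __FILE__, __LINE__);   \
		abort();                                                 \
	} while (0)
#define BUG_ON(cond) do { if (cond) BUG(); } while (0)

int deref_or_die(const int *ptr)
{
	/* Before:  if (!ptr) { ... BUG(); }
	 * After:   a single, greppable assertion. */
	BUG_ON(!ptr);
	return *ptr;
}
```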

^ permalink raw reply

* [PATCH] net: remove skb_orphan_try()
From: Eric Dumazet @ 2012-06-14 16:42 UTC (permalink / raw)
  To: David Miller; +Cc: jhautbois, netdev
In-Reply-To: <20120614.033153.258221733380821664.davem@davemloft.net>

From: Eric Dumazet <edumazet@google.com>

On Thu, 2012-06-14 at 03:31 -0700, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>

> > We should have a way to properly park packets in Qdiscs, and only do the
> > orphaning once skb given to real device for 'immediate or so'
> > transmission.
> 
> Ok.

On the other hand, all this stuff happens too late with BQL, since more
packets are parked in a Qdisc instead of being delivered with hot
caches.

Orphaning the packet only after it has been enqueued and then dequeued is
probably not worth yet another test in the fast path.
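The throttling argument in the changelog below can be illustrated with a toy user-space model (not kernel code). A blocking sender is charged pkt_bytes per queued packet against sndbuf and stops when the next packet would exceed it; the device drains drain_per_tick packets per tick from a qdisc holding at most qdisc_limit packets. With orphan_early set, the charge is dropped before the packet reaches that qdisc, as skb_orphan_try() effectively did for packets leaving the bonding device. All names and parameters are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only (hypothetical): not how the kernel is structured. */
struct sim_result {
	int sent;     /* packets accepted into the qdisc */
	int dropped;  /* packets dropped because the qdisc was full */
	int ticks;    /* device ticks until the qdisc is empty again */
};

struct sim_result simulate_burst(int burst, int pkt_bytes, int sndbuf,
				 int qdisc_limit, int drain_per_tick,
				 bool orphan_early)
{
	struct sim_result r = { 0, 0, 0 };
	int wmem = 0;      /* bytes charged to the socket (sk_wmem_alloc) */
	int qlen = 0;      /* packets waiting in the qdisc */
	int left = burst;  /* packets the application still wants to send */

	assert(sndbuf >= pkt_bytes && drain_per_tick > 0);
	while (left > 0 || qlen > 0) {
		/* Blocking sender: stops while the next charge would exceed sndbuf.
		 * Orphaned packets are never charged, so it never stops. */
		while (left > 0 && (orphan_early || wmem + pkt_bytes <= sndbuf)) {
			left--;
			if (qlen < qdisc_limit) {
				qlen++;
				r.sent++;
				if (!orphan_early)
					wmem += pkt_bytes;  /* held until TX completion */
			} else {
				r.dropped++;            /* full qdisc: packet dropped */
			}
		}
		/* Device: transmit up to drain_per_tick packets this tick. */
		int n = qlen < drain_per_tick ? qlen : drain_per_tick;

		qlen -= n;
		if (!orphan_early)
			wmem -= n * pkt_bytes;      /* destructor releases the charge */
		r.ticks++;
	}
	return r;
}
```

With a 784-packet burst, 4,000-byte packets, a 200,000-byte sndbuf (50 packets), a 100-packet qdisc and a drain of 10 packets per tick, the throttled sender gets all 784 packets through without a drop, while the orphaning sender pushes the whole burst at once and 684 packets are dropped at the full qdisc.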



[PATCH] net: remove skb_orphan_try()

Orphaning skb in dev_hard_start_xmit() makes bonding behavior
unfriendly for applications sending big UDP bursts: once packets
pass the bonding device and reach the real device, they might hit a full
qdisc and be dropped. Without orphaning, the sender is automatically
throttled because sk->sk_wmem_alloc reaches sk->sk_sndbuf (assuming
sk_sndbuf is not too big).

We could try to defer the orphaning by adding another test in
dev_hard_start_xmit(), but all this seems of little gain
now that BQL tends to make packets more likely to be parked
in Qdisc queues instead of the NIC TX ring, in the cases where
performance matters.

Reverts commits:
fc6055a5ba31 net: Introduce skb_orphan_try()
87fd308cfc6b net: skb_tx_hash() fix relative to skb_orphan_try()
and removes the SKBTX_DRV_NEEDS_SK_REF flag.

Reported-and-bisected-by: Jean-Michel Hautbois <jhautbois@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/linux/skbuff.h |    7 ++-----
 net/can/raw.c          |    3 ---
 net/core/dev.c         |   23 +----------------------
 net/iucv/af_iucv.c     |    1 -
 4 files changed, 3 insertions(+), 31 deletions(-)
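The __skb_tx_hash() hunk in the diff below ends by mapping a 32-bit hash onto the queue range with a multiply and a shift rather than a modulo. A minimal stand-alone sketch of just that step follows; the hash input is taken as given (jhash_1word() is not reimplemented), and pick_tx_queue() is a hypothetical name. With the patch applied, skbs keep their socket, so the hash is seeded from sk->sk_hash when it is set.

```c
#include <assert.h>
#include <stdint.h>

/* Scale a 32-bit hash into [qoffset, qoffset + qcount): since
 * hash * qcount < qcount * 2^32, the top 32 bits are always < qcount. */
uint16_t pick_tx_queue(uint32_t hash, uint16_t qcount, uint16_t qoffset)
{
	assert(qcount > 0);
	return (uint16_t)((((uint64_t)hash * qcount) >> 32) + qoffset);
}
```

For 8 queues starting at offset 4, a hash of 0x80000000 (half of the 32-bit range) selects queue 8, the midpoint of queues 4 to 11. The multiply-shift form avoids a division in the transmit path.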

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index b534a1b..642cb73 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -225,14 +225,11 @@ enum {
 	/* device driver is going to provide hardware time stamp */
 	SKBTX_IN_PROGRESS = 1 << 2,
 
-	/* ensure the originating sk reference is available on driver level */
-	SKBTX_DRV_NEEDS_SK_REF = 1 << 3,
-
 	/* device driver supports TX zero-copy buffers */
-	SKBTX_DEV_ZEROCOPY = 1 << 4,
+	SKBTX_DEV_ZEROCOPY = 1 << 3,
 
 	/* generate wifi status information (where possible) */
-	SKBTX_WIFI_STATUS = 1 << 5,
+	SKBTX_WIFI_STATUS = 1 << 4,
 };
 
 /*
diff --git a/net/can/raw.c b/net/can/raw.c
index cde1b4a..46cca3a 100644
--- a/net/can/raw.c
+++ b/net/can/raw.c
@@ -681,9 +681,6 @@ static int raw_sendmsg(struct kiocb *iocb, struct socket *sock,
 	if (err < 0)
 		goto free_skb;
 
-	/* to be able to check the received tx sock reference in raw_rcv() */
-	skb_shinfo(skb)->tx_flags |= SKBTX_DRV_NEEDS_SK_REF;
-
 	skb->dev = dev;
 	skb->sk  = sk;
 
diff --git a/net/core/dev.c b/net/core/dev.c
index cd09819..6df2140 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2089,25 +2089,6 @@ static int dev_gso_segment(struct sk_buff *skb, netdev_features_t features)
 	return 0;
 }
 
-/*
- * Try to orphan skb early, right before transmission by the device.
- * We cannot orphan skb if tx timestamp is requested or the sk-reference
- * is needed on driver level for other reasons, e.g. see net/can/raw.c
- */
-static inline void skb_orphan_try(struct sk_buff *skb)
-{
-	struct sock *sk = skb->sk;
-
-	if (sk && !skb_shinfo(skb)->tx_flags) {
-		/* skb_tx_hash() wont be able to get sk.
-		 * We copy sk_hash into skb->rxhash
-		 */
-		if (!skb->rxhash)
-			skb->rxhash = sk->sk_hash;
-		skb_orphan(skb);
-	}
-}
-
 static bool can_checksum_protocol(netdev_features_t features, __be16 protocol)
 {
 	return ((features & NETIF_F_GEN_CSUM) ||
@@ -2193,8 +2174,6 @@ int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
 		if (!list_empty(&ptype_all))
 			dev_queue_xmit_nit(skb, dev);
 
-		skb_orphan_try(skb);
-
 		features = netif_skb_features(skb);
 
 		if (vlan_tx_tag_present(skb) &&
@@ -2304,7 +2283,7 @@ u16 __skb_tx_hash(const struct net_device *dev, const struct sk_buff *skb,
 	if (skb->sk && skb->sk->sk_hash)
 		hash = skb->sk->sk_hash;
 	else
-		hash = (__force u16) skb->protocol ^ skb->rxhash;
+		hash = (__force u16) skb->protocol;
 	hash = jhash_1word(hash, hashrnd);
 
 	return (u16) (((u64) hash * qcount) >> 32) + qoffset;
diff --git a/net/iucv/af_iucv.c b/net/iucv/af_iucv.c
index 07d7d55..cd6f7a9 100644
--- a/net/iucv/af_iucv.c
+++ b/net/iucv/af_iucv.c
@@ -372,7 +372,6 @@ static int afiucv_hs_send(struct iucv_message *imsg, struct sock *sock,
 			skb_trim(skb, skb->dev->mtu);
 	}
 	skb->protocol = ETH_P_AF_IUCV;
-	skb_shinfo(skb)->tx_flags |= SKBTX_DRV_NEEDS_SK_REF;
 	nskb = skb_clone(skb, GFP_ATOMIC);
 	if (!nskb)
 		return -ENOMEM;

^ permalink raw reply related

* Re: Regression on TX throughput when using bonding
From: Rick Jones @ 2012-06-14 17:46 UTC (permalink / raw)
  To: Jean-Michel Hautbois; +Cc: Eric Dumazet, netdev
In-Reply-To: <CAL8zT=joBA5pgXB7QfDM5qhOizmdneghXsSnwN5G74-yoGzg_Q@mail.gmail.com>

On 06/14/2012 08:43 AM, Jean-Michel Hautbois wrote:
> As Eric suggested, here is a description that I hope is as precise as possible.
> I send three raw video streams, 1920x1088 at 30 fps, over three UDP sockets to
> the same NIC.
> Each stream is sent from its own thread, so I will focus on the numbers for one thread.
>
> This generates bursts of send() calls: every 1/30 s, each thread sends 3,133,440
> bytes to the Ethernet interface.
> In practice it looks like this:
> while (n > 0)    /* n: bytes left in the frame (size_t) */
> {
>   size_t len = n < 4000 ? n : 4000;   /* the last datagram is shorter */
>   /* sock, dst: the UDP socket and its destination address */
>   if (sendto(sock, packet, len, 0, (struct sockaddr *)&dst, sizeof(dst)) < 0)
>     break;                            /* error handling omitted */
>   n -= len;
>   packet += len;
> }
>
> My interface is a bond with a 10 Gbps slave interface and the MTU set to 4096.
> This means each thread sends 784 packets on the interface every 1/30 s,
> then waits for the next burst, and so on.
> The three streams are not necessarily the same video, so the threads may or
> may not send simultaneously.
>
> My sockets are in blocking mode.

If desired, here is how to simulate that with netperf:

./configure --enable-intervals
make

And an example over loopback:

raj@tardy:~/netperf2_trunk$ src/netperf -l 10 -t UDP_STREAM -H localhost -w 33 -b 783 -- -s 1M -S 1M -m 4000
MIGRATED UDP STREAM TEST from ::0 (::) port 0 AF_INET6 to tardy (::1) port 0 AF_INET6 : interval
Socket  Message  Elapsed      Messages
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

2097152    4000   9.99       260739      0     835.18
2097152           9.99       260442            834.23


Adjust the -s and/or -S options to match the socket buffer sizes that
Jean-Michel's application uses. Run two more instances simultaneously to
get the three streams. Adjust the run length with the -l option.
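If netperf is not at hand, the same paced-burst pattern (send one frame as a burst of datagrams, then wait for the next interval) can be sketched in plain POSIX C. This is not netperf's implementation of -w/-b; the send_fn hook, dry_run_send() and the other names are hypothetical, and a real test would pass a hook that calls sendto() on a blocking UDP socket.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <time.h>

/* Sends one datagram of len bytes; returns 0 on success (hypothetical hook). */
typedef int (*send_fn)(void *ctx, const char *buf, size_t len);

/* Advance an absolute deadline by ns nanoseconds, keeping tv_nsec normalized. */
void timespec_add_ns(struct timespec *t, long ns)
{
	t->tv_sec += ns / 1000000000L;
	t->tv_nsec += ns % 1000000000L;
	if (t->tv_nsec >= 1000000000L) {
		t->tv_sec++;
		t->tv_nsec -= 1000000000L;
	}
}

/* Send `frames` bursts: each splits frame_len bytes into datagrams of at
 * most `payload` bytes, then sleeps until the next absolute deadline. */
int paced_bursts(send_fn send_one, void *ctx, const char *frame,
		 size_t frame_len, size_t payload, int frames, long interval_ns)
{
	struct timespec next;

	assert(payload > 0);
	if (clock_gettime(CLOCK_MONOTONIC, &next) != 0)
		return -1;
	for (int f = 0; f < frames; f++) {
		for (size_t off = 0; off < frame_len; off += payload) {
			size_t len = frame_len - off < payload ? frame_len - off : payload;

			if (send_one(ctx, frame + off, len) != 0)
				return -1;
		}
		/* Absolute deadlines keep the time spent sending from adding drift. */
		timespec_add_ns(&next, interval_ns);
		int rc;
		while ((rc = clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME,
					     &next, NULL)) == EINTR)
			;
		if (rc != 0)
			return -1;
	}
	return 0;
}

/* Dry-run hook for testing: counts datagrams and bytes instead of sending. */
struct dry_run { size_t datagrams; size_t bytes; };

int dry_run_send(void *ctx, const char *buf, size_t len)
{
	struct dry_run *d = ctx;

	(void)buf;
	d->datagrams++;
	d->bytes += len;
	return 0;
}
```

Using absolute deadlines with clock_nanosleep() and TIMER_ABSTIME means a slow burst shortens the following sleep instead of pushing every later burst back.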

happy benchmarking,

rick jones

^ permalink raw reply

* Re: [PATCH 02/10] netfilter: add parameter proto for l4proto.init_net
From: Pablo Neira Ayuso @ 2012-06-14 17:59 UTC (permalink / raw)
  To: Gao feng; +Cc: netdev, netfilter-devel
In-Reply-To: <1339668445-23848-2-git-send-email-gaofeng@cn.fujitsu.com>

On Thu, Jun 14, 2012 at 06:07:17PM +0800, Gao feng wrote:
> There is redundant code in the l4protos' init_net functions.
> We can use one init_net function and the l3proto to implement
> the same thing.
> 
> So we should add l3proto as a parameter of the init_net function.
> 
> Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
> ---
>  include/net/netfilter/nf_conntrack_l4proto.h   |    2 +-
>  net/ipv4/netfilter/nf_conntrack_proto_icmp.c   |    2 +-
>  net/ipv6/netfilter/nf_conntrack_proto_icmpv6.c |    2 +-
>  net/netfilter/nf_conntrack_proto.c             |    6 ++++--
>  net/netfilter/nf_conntrack_proto_dccp.c        |    2 +-
>  net/netfilter/nf_conntrack_proto_generic.c     |    2 +-
>  net/netfilter/nf_conntrack_proto_gre.c         |    2 +-
>  net/netfilter/nf_conntrack_proto_sctp.c        |    4 ++--
>  net/netfilter/nf_conntrack_proto_tcp.c         |    4 ++--
>  net/netfilter/nf_conntrack_proto_udp.c         |    4 ++--
>  net/netfilter/nf_conntrack_proto_udplite.c     |    2 +-
>  11 files changed, 17 insertions(+), 15 deletions(-)
> 
> diff --git a/include/net/netfilter/nf_conntrack_l4proto.h b/include/net/netfilter/nf_conntrack_l4proto.h
> index 81c52b5..5dd60f2 100644
> --- a/include/net/netfilter/nf_conntrack_l4proto.h
> +++ b/include/net/netfilter/nf_conntrack_l4proto.h
> @@ -97,7 +97,7 @@ struct nf_conntrack_l4proto {
>  #endif
>  	int	*net_id;
>  	/* Init l4proto pernet data */
> -	int (*init_net)(struct net *net);
> +	int (*init_net)(struct net *net, u_int16_t proto);
>  
>  	/* Protocol name */
>  	const char *name;
> diff --git a/net/ipv4/netfilter/nf_conntrack_proto_icmp.c b/net/ipv4/netfilter/nf_conntrack_proto_icmp.c
> index 041923c..76f7a2f 100644
> --- a/net/ipv4/netfilter/nf_conntrack_proto_icmp.c
> +++ b/net/ipv4/netfilter/nf_conntrack_proto_icmp.c
> @@ -337,7 +337,7 @@ static struct ctl_table icmp_compat_sysctl_table[] = {
>  #endif /* CONFIG_NF_CONNTRACK_PROC_COMPAT */
>  #endif /* CONFIG_SYSCTL */
>  
> -static int icmp_init_net(struct net *net)
> +static int icmp_init_net(struct net *net, u_int16_t proto)
>  {
>  	struct nf_icmp_net *in = icmp_pernet(net);
>  	struct nf_proto_net *pn = (struct nf_proto_net *)in;
> diff --git a/net/ipv6/netfilter/nf_conntrack_proto_icmpv6.c b/net/ipv6/netfilter/nf_conntrack_proto_icmpv6.c
> index 63ed012..807ae09 100644
> --- a/net/ipv6/netfilter/nf_conntrack_proto_icmpv6.c
> +++ b/net/ipv6/netfilter/nf_conntrack_proto_icmpv6.c
> @@ -333,7 +333,7 @@ static struct ctl_table icmpv6_sysctl_table[] = {
>  };
>  #endif /* CONFIG_SYSCTL */
>  
> -static int icmpv6_init_net(struct net *net)
> +static int icmpv6_init_net(struct net *net, u_int16_t proto)
>  {
>  	struct nf_icmp_net *in = icmpv6_pernet(net);
>  	struct nf_proto_net *pn = (struct nf_proto_net *)in;
> diff --git a/net/netfilter/nf_conntrack_proto.c b/net/netfilter/nf_conntrack_proto.c
> index a434dd7..8fc0332 100644
> --- a/net/netfilter/nf_conntrack_proto.c
> +++ b/net/netfilter/nf_conntrack_proto.c
> @@ -193,6 +193,7 @@ static int nf_ct_l3proto_register_sysctl(struct net *net,
>  					    l3proto->ctl_table_path,
>  					    in->ctl_table,
>  					    NULL);
> +

This entire patchset contains many extra blank lines. If you want to
provide some cleanup, it should come in a follow-up patch.

>  		if (err < 0) {
>  			kfree(in->ctl_table);
>  			in->ctl_table = NULL;
> @@ -460,7 +461,7 @@ int nf_conntrack_l4proto_register(struct net *net,
>  {
>  	int ret = 0;
>  	if (l4proto->init_net) {
> -		ret = l4proto->init_net(net);
> +		ret = l4proto->init_net(net, l4proto->l3proto);
>  		if (ret < 0)
>  			return ret;
>  	}
> @@ -514,7 +515,8 @@ int nf_conntrack_proto_init(struct net *net)
>  {
>  	unsigned int i;
>  	int err;
> -	err = nf_conntrack_l4proto_generic.init_net(net);
> +	err = nf_conntrack_l4proto_generic.init_net(net,
> +						    nf_conntrack_l4proto_generic.l3proto);

You have to make sure that lines are broken at 80 columns.

Something like this should be fine:

        err = nf_conntrack_l4proto_generic.init_net(net,
                                        nf_conntrack_l4proto_generic.l3proto);


>  	if (err < 0)
>  		return err;
>  	err = nf_ct_l4proto_register_sysctl(net,
> diff --git a/net/netfilter/nf_conntrack_proto_dccp.c b/net/netfilter/nf_conntrack_proto_dccp.c
> index c33f76a..52da8f0 100644
> --- a/net/netfilter/nf_conntrack_proto_dccp.c
> +++ b/net/netfilter/nf_conntrack_proto_dccp.c
> @@ -815,7 +815,7 @@ static struct ctl_table dccp_sysctl_table[] = {
>  };
>  #endif /* CONFIG_SYSCTL */
>  
> -static int dccp_init_net(struct net *net)
> +static int dccp_init_net(struct net *net, u_int16_t proto)
>  {
>  	struct dccp_net *dn = dccp_pernet(net);
>  	struct nf_proto_net *pn = (struct nf_proto_net *)dn;
> diff --git a/net/netfilter/nf_conntrack_proto_generic.c b/net/netfilter/nf_conntrack_proto_generic.c
> index bb0e74f..d1ed7b4 100644
> --- a/net/netfilter/nf_conntrack_proto_generic.c
> +++ b/net/netfilter/nf_conntrack_proto_generic.c
> @@ -135,7 +135,7 @@ static struct ctl_table generic_compat_sysctl_table[] = {
>  #endif /* CONFIG_NF_CONNTRACK_PROC_COMPAT */
>  #endif /* CONFIG_SYSCTL */
>  
> -static int generic_init_net(struct net *net)
> +static int generic_init_net(struct net *net, u_int16_t proto)
>  {
>  	struct nf_generic_net *gn = generic_pernet(net);
>  	struct nf_proto_net *pn = (struct nf_proto_net *)gn;
> diff --git a/net/netfilter/nf_conntrack_proto_gre.c b/net/netfilter/nf_conntrack_proto_gre.c
> index 25ba5a2..851b93b 100644
> --- a/net/netfilter/nf_conntrack_proto_gre.c
> +++ b/net/netfilter/nf_conntrack_proto_gre.c
> @@ -348,7 +348,7 @@ gre_timeout_nla_policy[CTA_TIMEOUT_GRE_MAX+1] = {
>  };
>  #endif /* CONFIG_NF_CT_NETLINK_TIMEOUT */
>  
> -static int gre_init_net(struct net *net)
> +static int gre_init_net(struct net *net, u_int16_t proto)
>  {
>  	struct netns_proto_gre *net_gre = gre_pernet(net);
>  	int i;
> diff --git a/net/netfilter/nf_conntrack_proto_sctp.c b/net/netfilter/nf_conntrack_proto_sctp.c
> index 8fb0582..1e7836c 100644
> --- a/net/netfilter/nf_conntrack_proto_sctp.c
> +++ b/net/netfilter/nf_conntrack_proto_sctp.c
> @@ -767,7 +767,7 @@ static int sctp_kmemdup_compat_sysctl_table(struct nf_proto_net *pn)
>  	return 0;
>  }
>  
> -static int sctpv4_init_net(struct net *net)
> +static int sctpv4_init_net(struct net *net, u_int16_t proto)
>  {
>  	int ret;
>  	struct sctp_net *sn = sctp_pernet(net);
> @@ -793,7 +793,7 @@ static int sctpv4_init_net(struct net *net)
>  	return ret;
>  }
>  
> -static int sctpv6_init_net(struct net *net)
> +static int sctpv6_init_net(struct net *net, u_int16_t proto)
>  {
>  	struct sctp_net *sn = sctp_pernet(net);
>  	struct nf_proto_net *pn = (struct nf_proto_net *)sn;
> diff --git a/net/netfilter/nf_conntrack_proto_tcp.c b/net/netfilter/nf_conntrack_proto_tcp.c
> index 99caa13..6db9d3c 100644
> --- a/net/netfilter/nf_conntrack_proto_tcp.c
> +++ b/net/netfilter/nf_conntrack_proto_tcp.c
> @@ -1593,7 +1593,7 @@ static int tcp_kmemdup_compat_sysctl_table(struct nf_proto_net *pn)
>  	return 0;
>  }
>  
> -static int tcpv4_init_net(struct net *net)
> +static int tcpv4_init_net(struct net *net, u_int16_t proto)
>  {
>  	int i;
>  	int ret = 0;
> @@ -1631,7 +1631,7 @@ static int tcpv4_init_net(struct net *net)
>  	return ret;
>  }
>  
> -static int tcpv6_init_net(struct net *net)
> +static int tcpv6_init_net(struct net *net, u_int16_t proto)
>  {
>  	int i;
>  	struct nf_tcp_net *tn = tcp_pernet(net);
> diff --git a/net/netfilter/nf_conntrack_proto_udp.c b/net/netfilter/nf_conntrack_proto_udp.c
> index a83cf93..2b978e6 100644
> --- a/net/netfilter/nf_conntrack_proto_udp.c
> +++ b/net/netfilter/nf_conntrack_proto_udp.c
> @@ -283,7 +283,7 @@ static void udp_init_net_data(struct nf_udp_net *un)
>  	}
>  }
>  
> -static int udpv4_init_net(struct net *net)
> +static int udpv4_init_net(struct net *net, u_int16_t proto)
>  {
>  	int ret;
>  	struct nf_udp_net *un = udp_pernet(net);
> @@ -307,7 +307,7 @@ static int udpv4_init_net(struct net *net)
>  	return ret;
>  }
>  
> -static int udpv6_init_net(struct net *net)
> +static int udpv6_init_net(struct net *net, u_int16_t proto)
>  {
>  	struct nf_udp_net *un = udp_pernet(net);
>  	struct nf_proto_net *pn = (struct nf_proto_net *)un;
> diff --git a/net/netfilter/nf_conntrack_proto_udplite.c b/net/netfilter/nf_conntrack_proto_udplite.c
> index b32e700..d33e511 100644
> --- a/net/netfilter/nf_conntrack_proto_udplite.c
> +++ b/net/netfilter/nf_conntrack_proto_udplite.c
> @@ -234,7 +234,7 @@ static struct ctl_table udplite_sysctl_table[] = {
>  };
>  #endif /* CONFIG_SYSCTL */
>  
> -static int udplite_init_net(struct net *net)
> +static int udplite_init_net(struct net *net, u_int16_t proto)
>  {
>  	int i;
>  	struct udplite_net *un = udplite_pernet(net);
> -- 
> 1.7.7.6
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH 04/10] netfilter: regard users as refcount for l4proto's per-net data
From: Pablo Neira Ayuso @ 2012-06-14 18:03 UTC (permalink / raw)
  To: Gao feng; +Cc: netdev, netfilter-devel
In-Reply-To: <1339668445-23848-4-git-send-email-gaofeng@cn.fujitsu.com>

On Thu, Jun 14, 2012 at 06:07:19PM +0800, Gao feng wrote:
> Now, the meaning of nf_proto_net's users field is confusing.
> We should regard it as the refcount for the l4proto's per-net data,
> because two l4protos may use the same per-net data.
> 
> So increment pn->users when nf_conntrack_l4proto_register
> succeeds, and decrement it in the nf_conntrack_l4_unregister case.
> 
> Because nf_conntrack_l3proto_ipv[4|6] don't use the same per-net
> data, we don't need to add a refcnt for their per-net data.
> 
> Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
> ---
>  net/netfilter/nf_conntrack_proto.c |   70 ++++++++++++++++++++++-------------
>  1 files changed, 44 insertions(+), 26 deletions(-)
> 
> diff --git a/net/netfilter/nf_conntrack_proto.c b/net/netfilter/nf_conntrack_proto.c
> index c9df1b4..63f9430 100644
> --- a/net/netfilter/nf_conntrack_proto.c
> +++ b/net/netfilter/nf_conntrack_proto.c
> @@ -39,16 +39,13 @@ static int
>  nf_ct_register_sysctl(struct net *net,
>  		      struct ctl_table_header **header,
>  		      const char *path,
> -		      struct ctl_table *table,
> -		      unsigned int *users)
> +		      struct ctl_table *table)
>  {
>  	if (*header == NULL) {
>  		*header = register_net_sysctl(net, path, table);
>  		if (*header == NULL)
>  			return -ENOMEM;
>  	}
> -	if (users != NULL)
> -		(*users)++;
>  
>  	return 0;
>  }
> @@ -58,7 +55,7 @@ nf_ct_unregister_sysctl(struct ctl_table_header **header,
>  			struct ctl_table **table,
>  			unsigned int *users)
>  {
> -	if (users != NULL && --*users > 0)
> +	if (users != NULL && *users > 0)
>  		return;
>  
>  	unregister_net_sysctl_table(*header);
> @@ -191,8 +188,7 @@ static int nf_ct_l3proto_register_sysctl(struct net *net,
>  		err = nf_ct_register_sysctl(net,
>  					    &in->ctl_table_header,
>  					    l3proto->ctl_table_path,
> -					    in->ctl_table,
> -					    NULL);
> +					    in->ctl_table);
>  
>  		if (err < 0) {
>  			kfree(in->ctl_table);
> @@ -330,20 +326,17 @@ static struct nf_proto_net *nf_ct_l4proto_net(struct net *net,
>  
>  static
>  int nf_ct_l4proto_register_sysctl(struct net *net,
> +				  struct nf_proto_net *pn,
>  				  struct nf_conntrack_l4proto *l4proto)
>  {
>  	int err = 0;
> -	struct nf_proto_net *pn = nf_ct_l4proto_net(net, l4proto);
> -	if (pn == NULL)
> -		return 0;
>  
>  #ifdef CONFIG_SYSCTL
>  	if (pn->ctl_table != NULL) {
>  		err = nf_ct_register_sysctl(net,
>  					    &pn->ctl_table_header,
>  					    "net/netfilter",
> -					    pn->ctl_table,
> -					    &pn->users);
> +					    pn->ctl_table);
>  		if (err < 0) {
>  			if (!pn->users) {
>  				kfree(pn->ctl_table);
> @@ -357,8 +350,7 @@ int nf_ct_l4proto_register_sysctl(struct net *net,
>  		err = nf_ct_register_sysctl(net,
>  					    &pn->ctl_compat_header,
>  					    "net/ipv4/netfilter",
> -					    pn->ctl_compat_table,
> -					    NULL);
> +					    pn->ctl_compat_table);
>  		if (err == 0)
>  			goto out;
>  		nf_ct_kfree_compat_sysctl_table(pn);
> @@ -374,11 +366,9 @@ out:
>  
>  static
>  void nf_ct_l4proto_unregister_sysctl(struct net *net,
> +				     struct nf_proto_net *pn,
>  				     struct nf_conntrack_l4proto *l4proto)
>  {
> -	struct nf_proto_net *pn = nf_ct_l4proto_net(net, l4proto);
> -	if (pn == NULL)
> -		return;
>  #ifdef CONFIG_SYSCTL
>  	if (pn->ctl_table_header != NULL)
>  		nf_ct_unregister_sysctl(&pn->ctl_table_header,
> @@ -391,8 +381,6 @@ void nf_ct_l4proto_unregister_sysctl(struct net *net,
>  					&pn->ctl_compat_table,
>  					NULL);
>  #endif /* CONFIG_NF_CONNTRACK_PROC_COMPAT */
> -#else
> -	pn->users--;
>  #endif /* CONFIG_SYSCTL */
>  }
>  
> @@ -458,22 +446,33 @@ int nf_conntrack_l4proto_register(struct net *net,
>  				  struct nf_conntrack_l4proto *l4proto)
>  {
>  	int ret = 0;
> +
> +	struct nf_proto_net *pn = NULL;
> +
>  	if (l4proto->init_net) {
>  		ret = l4proto->init_net(net, l4proto->l3proto);
>  		if (ret < 0)
> -			return ret;
> +			goto out;
>  	}
>  
> -	ret = nf_ct_l4proto_register_sysctl(net, l4proto);
> +	pn = nf_ct_l4proto_net(net, l4proto);
> +	if (pn == NULL)
> +		goto out;
> +
> +	ret = nf_ct_l4proto_register_sysctl(net, pn, l4proto);
>  	if (ret < 0)
> -		return ret;
> +		goto out;
>  
>  	if (net == &init_net) {
>  		ret = nf_conntrack_l4proto_register_net(l4proto);
> -		if (ret < 0)
> -			nf_ct_l4proto_unregister_sysctl(net, l4proto);
> +		if (ret < 0) {
> +			nf_ct_l4proto_unregister_sysctl(net, pn, l4proto);
> +			goto out;
> +		}
>  	}
> -
> +	/* increase the nf_proto_net's refcnt */

This comment is superfluous; please remove it.
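The ordering in this hunk, where pn->users is only incremented after init_net(), sysctl registration and the global registration have all succeeded, with an unwind on failure, is the usual goto-based error path. A simplified user-space sketch of that pattern follows; step() stands in for the real calls, fail_at injects a failure, and it does not reproduce every branch of the patch (for example the pn == NULL case). All names are hypothetical.

```c
#include <assert.h>

/* Hypothetical state: fail_at injects a failure at step 1, 2 or 3. */
struct reg_state {
	int fail_at;            /* 0 = every step succeeds */
	int sysctl_registered;
	int users;              /* refcount on the shared per-net data */
};

/* Stand-in for init_net(), sysctl registration and global registration. */
static int step(const struct reg_state *s, int n)
{
	return s->fail_at == n ? -1 : 0;
}

int l4proto_register(struct reg_state *s)
{
	int ret;

	ret = step(s, 1);                 /* init_net() */
	if (ret < 0)
		goto out;

	ret = step(s, 2);                 /* register sysctl */
	if (ret < 0)
		goto out;
	s->sysctl_registered = 1;

	ret = step(s, 3);                 /* global registration (init_net only) */
	if (ret < 0)
		goto err_sysctl;

	s->users++;                       /* take the reference last */
	return 0;

err_sysctl:
	s->sysctl_registered = 0;         /* undo step 2 */
out:
	return ret;
}
```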

> +	pn->users++;
> +out:
>  	return ret;
>  }
>  EXPORT_SYMBOL_GPL(nf_conntrack_l4proto_register);
> @@ -498,10 +497,18 @@ nf_conntrack_l4proto_unregister_net(struct nf_conntrack_l4proto *l4proto)
>  void nf_conntrack_l4proto_unregister(struct net *net,
>  				     struct nf_conntrack_l4proto *l4proto)
>  {
> +	struct nf_proto_net *pn = NULL;
>  	if (net == &init_net)
>  		nf_conntrack_l4proto_unregister_net(l4proto);
>  
> -	nf_ct_l4proto_unregister_sysctl(net, l4proto);
> +	pn = nf_ct_l4proto_net(net, l4proto);
> +	if (pn == NULL)
> +		return;
> +
> +	/* decrease the nf_proto_net's refcnt */

Same thing here.

> +	pn->users--;
> +	nf_ct_l4proto_unregister_sysctl(net, pn, l4proto);
> +
>  	/* Remove all contrack entries for this protocol */
>  	rtnl_lock();
>  	nf_ct_iterate_cleanup(net, kill_l4proto, l4proto);
> @@ -513,11 +520,14 @@ int nf_conntrack_proto_init(struct net *net)
>  {
>  	unsigned int i;
>  	int err;
> +	struct nf_proto_net *pn = nf_ct_l4proto_net(net,
> +						    &nf_conntrack_l4proto_generic);

Please break lines at 80 columns.

>  	err = nf_conntrack_l4proto_generic.init_net(net,
>  						    nf_conntrack_l4proto_generic.l3proto);
>  	if (err < 0)
>  		return err;
>  	err = nf_ct_l4proto_register_sysctl(net,
> +					    pn,
>  					    &nf_conntrack_l4proto_generic);
>  	if (err < 0)
>  		return err;
> @@ -527,13 +537,21 @@ int nf_conntrack_proto_init(struct net *net)
>  			rcu_assign_pointer(nf_ct_l3protos[i],
>  					   &nf_conntrack_l3proto_generic);
>  	}
> +	/* increase generic proto's nf_proto_net refcnt */
> +	pn->users++;
> +
>  	return 0;
>  }
>  
>  void nf_conntrack_proto_fini(struct net *net)
>  {
>  	unsigned int i;
> +	struct nf_proto_net *pn = nf_ct_l4proto_net(net,
> +						    &nf_conntrack_l4proto_generic);
> +	/* decrease generic proto's nf_proto_net refcnt */
> +	pn->users--;
>  	nf_ct_l4proto_unregister_sysctl(net,
> +					pn,
>  					&nf_conntrack_l4proto_generic);
>  	if (net == &init_net) {
>  		/* free l3proto protocol tables */
> -- 
> 1.7.7.6
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH 03/10] netfilter: add nf_ct_kfree_compat_sysctl_table to make codes clear
From: Pablo Neira Ayuso @ 2012-06-14 18:06 UTC (permalink / raw)
  To: Gao feng; +Cc: netdev, netfilter-devel
In-Reply-To: <1339668445-23848-3-git-send-email-gaofeng@cn.fujitsu.com>

On Thu, Jun 14, 2012 at 06:07:18PM +0800, Gao feng wrote:
> Add function nf_ct_kfree_compat_sysctl_table to kfree the l4proto's
> compat sysctl table and set the sysctl table pointer to NULL.
> 
> Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
> ---
>  include/net/netfilter/nf_conntrack_l4proto.h |    8 ++++++++
>  net/netfilter/nf_conntrack_proto.c           |    4 +---
>  2 files changed, 9 insertions(+), 3 deletions(-)
> 
> diff --git a/include/net/netfilter/nf_conntrack_l4proto.h b/include/net/netfilter/nf_conntrack_l4proto.h
> index 5dd60f2..889b717 100644
> --- a/include/net/netfilter/nf_conntrack_l4proto.h
> +++ b/include/net/netfilter/nf_conntrack_l4proto.h
> @@ -132,6 +132,14 @@ extern int nf_ct_port_nlattr_to_tuple(struct nlattr *tb[],
>  extern int nf_ct_port_nlattr_tuple_size(void);
>  extern const struct nla_policy nf_ct_port_nla_policy[];
>  
> +static inline void nf_ct_kfree_compat_sysctl_table(struct nf_proto_net *pn)
> +{
> +#if defined(CONFIG_SYSCTL) && defined(CONFIG_NF_CONNTRACK_PROC_COMPAT)
> +	kfree(pn->ctl_compat_table);
> +	pn->ctl_compat_table = NULL;
> +#endif
> +}
> +
>  #ifdef CONFIG_SYSCTL
>  #ifdef DEBUG_INVALID_PACKETS
>  #define LOG_INVALID(net, proto)				\
> diff --git a/net/netfilter/nf_conntrack_proto.c b/net/netfilter/nf_conntrack_proto.c
> index 8fc0332..c9df1b4 100644
> --- a/net/netfilter/nf_conntrack_proto.c
> +++ b/net/netfilter/nf_conntrack_proto.c
> @@ -361,9 +361,7 @@ int nf_ct_l4proto_register_sysctl(struct net *net,
>  					    NULL);
>  		if (err == 0)
>  			goto out;
> -
> -		kfree(pn->ctl_compat_table);
> -		pn->ctl_compat_table = NULL;
> +		nf_ct_kfree_compat_sysctl_table(pn);

If this is the only client of this function, then make it static and
define it inside nf_conntrack_proto.c.

>  		nf_ct_unregister_sysctl(&pn->ctl_table_header,
>  					&pn->ctl_table,
>  					&pn->users);
> -- 
> 1.7.7.6
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH 06/10] merge udpv[4,6]_net_init into udp_net_init
From: Pablo Neira Ayuso @ 2012-06-14 18:13 UTC (permalink / raw)
  To: Gao feng; +Cc: netdev, netfilter-devel
In-Reply-To: <1339668445-23848-6-git-send-email-gaofeng@cn.fujitsu.com>

On Thu, Jun 14, 2012 at 06:07:21PM +0800, Gao feng wrote:
> Merge udpv4_net_init and udpv6_net_init into udp_net_init to
> reduce redundant code.
> 
> Also use nf_proto_net.users to identify whether it's the first time
> we use the nf_proto_net; when it is the first time, we
> initialize it.
> 
> Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
> ---
>  net/netfilter/nf_conntrack_proto_udp.c |   60 ++++++++++++--------------------
>  1 files changed, 22 insertions(+), 38 deletions(-)
> 
> diff --git a/net/netfilter/nf_conntrack_proto_udp.c b/net/netfilter/nf_conntrack_proto_udp.c
> index 2b978e6..480126e 100644
> --- a/net/netfilter/nf_conntrack_proto_udp.c
> +++ b/net/netfilter/nf_conntrack_proto_udp.c
> @@ -239,13 +239,16 @@ static int udp_kmemdup_sysctl_table(struct nf_proto_net *pn)
>  {
>  #ifdef CONFIG_SYSCTL
>  	struct nf_udp_net *un = (struct nf_udp_net *)pn;
> +
>  	if (pn->ctl_table)
>  		return 0;
> +
>  	pn->ctl_table = kmemdup(udp_sysctl_table,
>  				sizeof(udp_sysctl_table),
>  				GFP_KERNEL);
>  	if (!pn->ctl_table)
>  		return -ENOMEM;
> +

I like this cleanup. But you have to make it in a separate patch.

>  	pn->ctl_table[0].data = &un->timeouts[UDP_CT_UNREPLIED];
>  	pn->ctl_table[1].data = &un->timeouts[UDP_CT_REPLIED];
>  #endif
> @@ -257,6 +260,7 @@ static int udp_kmemdup_compat_sysctl_table(struct nf_proto_net *pn)
>  #ifdef CONFIG_SYSCTL
>  #ifdef CONFIG_NF_CONNTRACK_PROC_COMPAT
>  	struct nf_udp_net *un = (struct nf_udp_net *)pn;
> +
>  	pn->ctl_compat_table = kmemdup(udp_compat_sysctl_table,
>  				       sizeof(udp_compat_sysctl_table),
>  				       GFP_KERNEL);
> @@ -270,52 +274,32 @@ static int udp_kmemdup_compat_sysctl_table(struct nf_proto_net *pn)
>  	return 0;
>  }
>  
> -static void udp_init_net_data(struct nf_udp_net *un)
> +static int udp_init_net(struct net *net, u_int16_t proto)
>  {
> -	int i;
> -#ifdef CONFIG_SYSCTL
> -	if (!un->pn.ctl_table) {
> -#else
> -	if (!un->pn.users++) {
> -#endif
> +	int ret = 0;
> +	struct nf_udp_net *un = udp_pernet(net);
> +	struct nf_proto_net *pn = (struct nf_proto_net *)un;
> +
> +	if (pn->users) {
> +		int i = 0;

redundant initialization of that i variable.

>  		for (i = 0; i < UDP_CT_MAX; i++)
>  			un->timeouts[i] = udp_timeouts[i];
>  	}
> -}
>  
> -static int udpv4_init_net(struct net *net, u_int16_t proto)
> -{
> -	int ret;
> -	struct nf_udp_net *un = udp_pernet(net);
> -	struct nf_proto_net *pn = (struct nf_proto_net *)un;
> +	if (proto == AF_INET) {
> +		ret = udp_kmemdup_compat_sysctl_table(pn);
> +		if (ret < 0)
> +			return ret;
>  
> -	udp_init_net_data(un);
> +		ret = udp_kmemdup_sysctl_table(pn);
> +		if (ret < 0)
> +			nf_ct_kfree_compat_sysctl_table(pn);
> +	} else
> +		ret = udp_kmemdup_sysctl_table(pn);
>  
> -	ret = udp_kmemdup_compat_sysctl_table(pn);
> -	if (ret < 0)
> -		return ret;
> -
> -	ret = udp_kmemdup_sysctl_table(pn);
> -#ifdef CONFIG_SYSCTL
> -#ifdef CONFIG_NF_CONNTRACK_PROC_COMPAT
> -	if (ret < 0) {
> -		kfree(pn->ctl_compat_table);
> -		pn->ctl_compat_table = NULL;
> -	}
> -#endif
> -#endif
>  	return ret;
>  }
>  
> -static int udpv6_init_net(struct net *net, u_int16_t proto)
> -{
> -	struct nf_udp_net *un = udp_pernet(net);
> -	struct nf_proto_net *pn = (struct nf_proto_net *)un;
> -
> -	udp_init_net_data(un);
> -	return udp_kmemdup_sysctl_table(pn);
> -}
> -
>  struct nf_conntrack_l4proto nf_conntrack_l4proto_udp4 __read_mostly =
>  {
>  	.l3proto		= PF_INET,
> @@ -343,7 +327,7 @@ struct nf_conntrack_l4proto nf_conntrack_l4proto_udp4 __read_mostly =
>  		.nla_policy	= udp_timeout_nla_policy,
>  	},
>  #endif /* CONFIG_NF_CT_NETLINK_TIMEOUT */
> -	.init_net		= udpv4_init_net,
> +	.init_net		= udp_init_net,
>  };
>  EXPORT_SYMBOL_GPL(nf_conntrack_l4proto_udp4);
>  
> @@ -374,6 +358,6 @@ struct nf_conntrack_l4proto nf_conntrack_l4proto_udp6 __read_mostly =
>  		.nla_policy	= udp_timeout_nla_policy,
>  	},
>  #endif /* CONFIG_NF_CT_NETLINK_TIMEOUT */
> -	.init_net		= udpv6_init_net,
> +	.init_net		= udp_init_net,
>  };
>  EXPORT_SYMBOL_GPL(nf_conntrack_l4proto_udp6);
> -- 
> 1.7.7.6
> 

^ permalink raw reply

* Re: [PATCH 07/10] netfilter: nf_conntrack_l4proto_udplite[4,6] cleanup
From: Pablo Neira Ayuso @ 2012-06-14 18:17 UTC (permalink / raw)
  To: Gao feng; +Cc: netdev, netfilter-devel
In-Reply-To: <1339668445-23848-7-git-send-email-gaofeng@cn.fujitsu.com>

On Thu, Jun 14, 2012 at 06:07:22PM +0800, Gao feng wrote:
> Some cleanup for nf_conntrack_l4proto_udplite[4,6]:
> make the code clearer and prepare for moving the
> sysctl code to nf_conntrack_proto_*_sysctl.c to
> reduce the ifdef pollution.
> 
> Also use nf_proto_net.users to determine whether this is the first
> time nf_proto_net is used; if it is, initialize it.
> 
> Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
> ---
>  net/netfilter/nf_conntrack_proto_udplite.c |   42 +++++++++++++++++----------
>  1 files changed, 26 insertions(+), 16 deletions(-)
> 
> diff --git a/net/netfilter/nf_conntrack_proto_udplite.c b/net/netfilter/nf_conntrack_proto_udplite.c
> index d33e511..f6a789a 100644
> --- a/net/netfilter/nf_conntrack_proto_udplite.c
> +++ b/net/netfilter/nf_conntrack_proto_udplite.c
> @@ -234,29 +234,39 @@ static struct ctl_table udplite_sysctl_table[] = {
>  };
>  #endif /* CONFIG_SYSCTL */
>  
> +static int udplite_kmemdup_sysctl_table(struct nf_proto_net *pn)
> +{
> +#ifdef CONFIG_SYSCTL
> +	struct udplite_net *un = (struct udplite_net *)pn;
> +
> +	if (pn->ctl_table)
> +		return 0;
> +
> +	pn->ctl_table = kmemdup(udplite_sysctl_table,
> +				sizeof(udplite_sysctl_table),
> +				GFP_KERNEL);
> +	if (!pn->ctl_table)
> +		return -ENOMEM;
> +
> +	pn->ctl_table[0].data = &un->timeouts[UDPLITE_CT_UNREPLIED];
> +	pn->ctl_table[1].data = &un->timeouts[UDPLITE_CT_REPLIED];
> +#endif
> +	return 0;
> +}
> +
> +

Remove extra line.

>  static int udplite_init_net(struct net *net, u_int16_t proto)
>  {
> -	int i;
>  	struct udplite_net *un = udplite_pernet(net);
>  	struct nf_proto_net *pn = (struct nf_proto_net *)un;
> -#ifdef CONFIG_SYSCTL
> -	if (!pn->ctl_table) {
> -#else
> -	if (!pn->users++) {
> -#endif
> +
> +	if (!pn->users) {
> +		int i;
>  		for (i = 0 ; i < UDPLITE_CT_MAX; i++)
>  			un->timeouts[i] = udplite_timeouts[i];
> -#ifdef CONFIG_SYSCTL
> -		pn->ctl_table = kmemdup(udplite_sysctl_table,
> -					sizeof(udplite_sysctl_table),
> -					GFP_KERNEL);
> -		if (!pn->ctl_table)
> -			return -ENOMEM;
> -		pn->ctl_table[0].data = &un->timeouts[UDPLITE_CT_UNREPLIED];
> -		pn->ctl_table[1].data = &un->timeouts[UDPLITE_CT_REPLIED];
> -#endif
>  	}
> -	return 0;
> +
> +	return udplite_kmemdup_sysctl_table(pn);
>  }
>  
>  static struct nf_conntrack_l4proto nf_conntrack_l4proto_udplite4 __read_mostly =
> -- 
> 1.7.7.6
> 

^ permalink raw reply

* [patch] qlcnic: off by one in qlcnic_init_pci_info()
From: Dan Carpenter @ 2012-06-14 18:34 UTC (permalink / raw)
  To: Anirban Chakraborty; +Cc: Sony Chacko, linux-driver, netdev, kernel-janitors

The adapter->npars[] array has QLCNIC_MAX_PCI_FUNC elements.  We
allocate it that way a few lines earlier in the function.  So this test
is off by one.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>

diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
index 33c3e46..212c121 100644
--- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
+++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
@@ -479,7 +479,7 @@ qlcnic_init_pci_info(struct qlcnic_adapter *adapter)
 
 	for (i = 0; i < QLCNIC_MAX_PCI_FUNC; i++) {
 		pfn = pci_info[i].id;
-		if (pfn > QLCNIC_MAX_PCI_FUNC) {
+		if (pfn >= QLCNIC_MAX_PCI_FUNC) {
 			ret = QL_STATUS_INVALID_PARAM;
 			goto err_eswitch;
 		}

^ permalink raw reply related

* Re: [PATCH 3/3] be2net: Increase statistics structure size for skyhawk.
From: Ben Hutchings @ 2012-06-14 18:50 UTC (permalink / raw)
  To: sarveshwar.bandi; +Cc: davem, netdev, Vasundhara Volam
In-Reply-To: <57eadb3c-1687-40a3-b846-e1fcee3f5a58@exht1.ad.emulex.com>

On Thu, 2012-06-14 at 11:21 +0530, sarveshwar.bandi@emulex.com wrote:
> From: Vasundhara Volam <vasundhara.volam@emulex.com>
> 
> Increase the hardware statistics structure to accommodate statistics for skyhawk.
> 
> Signed-off-by: Vasundhara Volam <vasundhara.volam@emulex.com>
> ---
>  drivers/net/ethernet/emulex/benet/be_cmds.h |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/emulex/benet/be_cmds.h b/drivers/net/ethernet/emulex/benet/be_cmds.h
> index 2f6bb06..3c938f5 100644
> --- a/drivers/net/ethernet/emulex/benet/be_cmds.h
> +++ b/drivers/net/ethernet/emulex/benet/be_cmds.h
> @@ -1566,7 +1566,7 @@ struct be_hw_stats_v1 {
>  	u32 rsvd0[BE_TXP_SW_SZ];
>  	struct be_erx_stats_v1 erx;
>  	struct be_pmem_stats pmem;
> -	u32 rsvd1[3];
> +	u32 rsvd1[18];
>  };
>  
>  struct be_cmd_req_get_stats_v1 {

Doesn't this merit a 'struct be_hw_stats_v2'?

Does this mean that the driver wasn't currently allocating enough memory
for the structure, and could that result in a buffer overrun?  If so
then this fix would need to go into 3.5 and 3.4.y as well.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* Re: [PATCH] usbnet: sanitise overlong driver information strings
From: Ben Hutchings @ 2012-06-14 19:08 UTC (permalink / raw)
  To: Phil Sutter; +Cc: netdev, davem
In-Reply-To: <1339672722-22793-1-git-send-email-phil.sutter@viprinet.com>

On Thu, 2012-06-14 at 13:18 +0200, Phil Sutter wrote:
> As seen on smsc75xx, driver_info->description being longer than 32
> characters messes up 'ethtool -i' output.

I should make ethtool tolerate that, but yes these are supposed to be
null-terminated strings.

Ben.

> Signed-off-by: Phil Sutter <phil.sutter@viprinet.com>
> ---
>  drivers/net/usb/usbnet.c |    6 +++---
>  1 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
> index 9f58330..d4f7256 100644
> --- a/drivers/net/usb/usbnet.c
> +++ b/drivers/net/usb/usbnet.c
> @@ -876,9 +876,9 @@ void usbnet_get_drvinfo (struct net_device *net, struct ethtool_drvinfo *info)
>  {
>  	struct usbnet *dev = netdev_priv(net);
>  
> -	strncpy (info->driver, dev->driver_name, sizeof info->driver);
> -	strncpy (info->version, DRIVER_VERSION, sizeof info->version);
> -	strncpy (info->fw_version, dev->driver_info->description,
> +	strlcpy (info->driver, dev->driver_name, sizeof info->driver);
> +	strlcpy (info->version, DRIVER_VERSION, sizeof info->version);
> +	strlcpy (info->fw_version, dev->driver_info->description,
>  		sizeof info->fw_version);
>  	usb_make_path (dev->udev, info->bus_info, sizeof info->bus_info);
>  }

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* [PATCH ethtool] Don't trust drivers to null-terminate strings
From: Ben Hutchings @ 2012-06-14 19:52 UTC (permalink / raw)
  To: netdev; +Cc: Phil Sutter

I've committed the following change to ethtool.

Ben.
---
Some drivers have been seen to fill all bytes of
ethtool_drvinfo::fw_version without including a null terminator, which
effectively concatenates the following bytes to the string.  We've
already dealt with a similar problem in dump_stats() (commit
7764430a139e4a089127f5616b0d56f497be1036).  Try to cover all the
remaining string fields:

- In dump_drvinfo(), limit the length using printf() modifiers
- Add an option to get_stringset() to null-terminate all strings
- Change all callers except dump_stats() to set that option

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
---
 ethtool.c |   38 +++++++++++++++++++++++---------------
 1 files changed, 23 insertions(+), 15 deletions(-)

diff --git a/ethtool.c b/ethtool.c
index a5dce44..9e50640 100644
--- a/ethtool.c
+++ b/ethtool.c
@@ -615,19 +615,19 @@ static int dump_ecmd(struct ethtool_cmd *ep)
 static int dump_drvinfo(struct ethtool_drvinfo *info)
 {
 	fprintf(stdout,
-		"driver: %s\n"
-		"version: %s\n"
-		"firmware-version: %s\n"
-		"bus-info: %s\n"
+		"driver: %.*s\n"
+		"version: %.*s\n"
+		"firmware-version: %.*s\n"
+		"bus-info: %.*s\n"
 		"supports-statistics: %s\n"
 		"supports-test: %s\n"
 		"supports-eeprom-access: %s\n"
 		"supports-register-dump: %s\n"
 		"supports-priv-flags: %s\n",
-		info->driver,
-		info->version,
-		info->fw_version,
-		info->bus_info,
+		(int)sizeof(info->driver), info->driver,
+		(int)sizeof(info->version), info->version,
+		(int)sizeof(info->fw_version), info->fw_version,
+		(int)sizeof(info->bus_info), info->bus_info,
 		info->n_stats ? "yes" : "no",
 		info->testinfo_len ? "yes" : "no",
 		info->eedump_len ? "yes" : "no",
@@ -1317,14 +1317,14 @@ static int dump_tsinfo(const struct ethtool_ts_info *info)
 
 static struct ethtool_gstrings *
 get_stringset(struct cmd_context *ctx, enum ethtool_stringset set_id,
-	      ptrdiff_t drvinfo_offset)
+	      ptrdiff_t drvinfo_offset, int null_terminate)
 {
 	struct {
 		struct ethtool_sset_info hdr;
 		u32 buf[1];
 	} sset_info;
 	struct ethtool_drvinfo drvinfo;
-	u32 len;
+	u32 len, i;
 	struct ethtool_gstrings *strings;
 
 	sset_info.hdr.cmd = ETHTOOL_GSSET_INFO;
@@ -1354,6 +1354,10 @@ get_stringset(struct cmd_context *ctx, enum ethtool_stringset set_id,
 		return NULL;
 	}
 
+	if (null_terminate)
+		for (i = 0; i < len; i++)
+			strings->data[(i + 1) * ETH_GSTRING_LEN - 1] = 0;
+
 	return strings;
 }
 
@@ -1364,7 +1368,7 @@ static struct feature_defs *get_feature_defs(struct cmd_context *ctx)
 	u32 n_features;
 	int i, j;
 
-	names = get_stringset(ctx, ETH_SS_FEATURES, 0);
+	names = get_stringset(ctx, ETH_SS_FEATURES, 0, 1);
 	if (names) {
 		n_features = names->len;
 	} else if (errno == EOPNOTSUPP || errno == EINVAL) {
@@ -2640,7 +2644,8 @@ static int do_test(struct cmd_context *ctx)
 	}
 
 	strings = get_stringset(ctx, ETH_SS_TEST,
-				offsetof(struct ethtool_drvinfo, testinfo_len));
+				offsetof(struct ethtool_drvinfo, testinfo_len),
+				1);
 	if (!strings) {
 		perror("Cannot get strings");
 		return 74;
@@ -2709,7 +2714,8 @@ static int do_gstats(struct cmd_context *ctx)
 		exit_bad_args();
 
 	strings = get_stringset(ctx, ETH_SS_STATS,
-				offsetof(struct ethtool_drvinfo, n_stats));
+				offsetof(struct ethtool_drvinfo, n_stats),
+				0);
 	if (!strings) {
 		perror("Cannot get stats strings information");
 		return 96;
@@ -3274,7 +3280,8 @@ static int do_gprivflags(struct cmd_context *ctx)
 		exit_bad_args();
 
 	strings = get_stringset(ctx, ETH_SS_PRIV_FLAGS,
-				offsetof(struct ethtool_drvinfo, n_priv_flags));
+				offsetof(struct ethtool_drvinfo, n_priv_flags),
+				1);
 	if (!strings) {
 		perror("Cannot get private flag names");
 		return 1;
@@ -3314,7 +3321,8 @@ static int do_sprivflags(struct cmd_context *ctx)
 	unsigned int i;
 
 	strings = get_stringset(ctx, ETH_SS_PRIV_FLAGS,
-				offsetof(struct ethtool_drvinfo, n_priv_flags));
+				offsetof(struct ethtool_drvinfo, n_priv_flags),
+				1);
 	if (!strings) {
 		perror("Cannot get private flag names");
 		return 1;
-- 
1.7.7.6


-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply related

* Re: [net-next patch 8/12] bnx2x: Allow up to 63 RSS queues default 8 queues
From: David Miller @ 2012-06-14 20:36 UTC (permalink / raw)
  To: meravs; +Cc: eilong, eric.dumazet, netdev
In-Reply-To: <1339688046.20038.3.camel@lb-tlvb-meravs.il.broadcom.com>

From: "Merav Sicron" <meravs@broadcom.com>
Date: Thu, 14 Jun 2012 18:34:06 +0300

> That's why we think (and so does Eric Dumazet) that it is better
> to have a smaller default number which is good for most cases.
> Do you agree with that?

What I think is that the thing which is more important than the
default we choose, is that it is consistently followed by all
multiqueue drivers.

By blazing your own unique path here, that is nearly guaranteed not to
happen.

I'd much rather have a bad default that every driver adheres to.

^ permalink raw reply

* Re: [PATCH] e1000: save skb counts in TX to avoid cache misses
From: Greg KH @ 2012-06-14 22:30 UTC (permalink / raw)
  To: Roman Kagan
  Cc: e1000-devel@lists.sourceforge.net, dnelson@redhat.com,
	bruce.w.allan@intel.com, jesse.brandeburg@intel.com,
	stable@vger.kernel.org, linux-kernel@vger.kernel.org,
	john.ronciak@intel.com, netdev@vger.kernel.org, David Miller
In-Reply-To: <1339585937.17472.11.camel@rkaganb.sw.ru>

On Wed, Jun 13, 2012 at 03:12:17PM +0400, Roman Kagan wrote:
> On Fri, 2012-06-08 at 11:37 +0400, Roman Kagan wrote:
> > On Fri, 2012-06-08 at 06:15 +0400, Greg KH wrote:
> > > On Thu, Jun 07, 2012 at 02:43:58PM -0700, David Miller wrote:
> > > > From: Jeff Kirsher <tarbal@gmail.com>
> > > > Date: Thu, 07 Jun 2012 14:38:17 -0700
> > > > 
> > > > > Thanks! I have applied the patch to my queue
> > > > 
> > > > Why?
> > > > 
> > > > My impression is that this is a patch already in the tree, and it's
> > > > being submitted for -stable but such minor performance hacks are
> > > > absolutely not appropriate for -stable submission.
> > > 
> > > The patch description says it is fixing reported oopses,
> > 
> > Exactly.
> > 
> > > but the Subject: isn't all that helpful there.
> > 
> > Well I just preserved the original subject from the upstream commit.
> > Want me to resubmit with a more alarming one?
> > 
> > > So which is this?  Should I accept it for a stable release or not?
> > 
> > IMO yes ;)
> 
> What came out of this discussion?  Should I resubmit with a different
> subject, or the original one is good enough?
> 
> The patch resolves a real oops; we've seen it multiple times when
> running Ubuntu-11.10 in virtual machines.  Upstream and RHEL have had
> the fix for a long time.  Ubuntu is waiting for 3.0-stable to merge it
> (https://bugs.launchpad.net/bugs/1009545).

That's pretty funny that Ubuntu is letting me be the gatekeeper of fixes
to get to their customers, there's just so much wrong in that it's sad.

Anyway, I've queued it up for the next 3.0-stable release.

thanks,

greg k-h

^ permalink raw reply

* Re: linux-next: manual merge of the wireless-next tree with the net tree
From: Stephen Rothwell @ 2012-06-15  0:23 UTC (permalink / raw)
  To: John W. Linville
  Cc: linux-next, linux-kernel, Emmanuel Grumbach, Johannes Berg,
	David Miller, netdev
In-Reply-To: <20120612114129.68159847828093544c867306@canb.auug.org.au>

[-- Attachment #1: Type: text/plain, Size: 804 bytes --]

Hi John, Dave,

On Tue, 12 Jun 2012 11:41:29 +1000 Stephen Rothwell <sfr@canb.auug.org.au> wrote:
>
> Today's linux-next merge of the wireless-next tree got a conflict in
> drivers/net/wireless/iwlwifi/pcie/trans.c between commit d012d04e4d63
> ("iwlwifi: disable the buggy chain extension feature in HW") from the net
> tree and commit 4beaf6c2f8af ("iwlwifi: s/txq_setup/txq_enable") from the
> wireless-next tree.
> 
> I fixed it up (I think - see below) and can carry the fix as necessary.

So this conflict has now been fixed up in both the net-next (commit
43b03f1f6d68) and wireless-next (commit 627ae3ddd6f9) trees, but slightly
differently :-(  I used the version from the net-next tree.

Just a heads up.
-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply

* Re: [PATCH 03/10] netfilter: add nf_ct_kfree_compat_sysctl_table to make codes clear
From: Gao feng @ 2012-06-15  1:36 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netdev, netfilter-devel
In-Reply-To: <20120614180632.GD10633@1984>

Hi Pablo:

On 2012-06-15 02:06, Pablo Neira Ayuso wrote:
> On Thu, Jun 14, 2012 at 06:07:18PM +0800, Gao feng wrote:
>> Add function nf_ct_kfree_compat_sysctl_table to kfree the l4proto's
>> compat sysctl table and set the sysctl table pointer to NULL.
>>
>> Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
>> ---
>>  include/net/netfilter/nf_conntrack_l4proto.h |    8 ++++++++
>>  net/netfilter/nf_conntrack_proto.c           |    4 +---
>>  2 files changed, 9 insertions(+), 3 deletions(-)
>>
>> diff --git a/include/net/netfilter/nf_conntrack_l4proto.h b/include/net/netfilter/nf_conntrack_l4proto.h
>> index 5dd60f2..889b717 100644
>> --- a/include/net/netfilter/nf_conntrack_l4proto.h
>> +++ b/include/net/netfilter/nf_conntrack_l4proto.h
>> @@ -132,6 +132,14 @@ extern int nf_ct_port_nlattr_to_tuple(struct nlattr *tb[],
>>  extern int nf_ct_port_nlattr_tuple_size(void);
>>  extern const struct nla_policy nf_ct_port_nla_policy[];
>>  
>> +static inline void nf_ct_kfree_compat_sysctl_table(struct nf_proto_net *pn)
>> +{
>> +#if defined(CONFIG_SYSCTL) && defined(CONFIG_NF_CONNTRACK_PROC_COMPAT)
>> +	kfree(pn->ctl_compat_table);
>> +	pn->ctl_compat_table = NULL;
>> +#endif
>> +}
>> +
>>  #ifdef CONFIG_SYSCTL
>>  #ifdef DEBUG_INVALID_PACKETS
>>  #define LOG_INVALID(net, proto)				\
>> diff --git a/net/netfilter/nf_conntrack_proto.c b/net/netfilter/nf_conntrack_proto.c
>> index 8fc0332..c9df1b4 100644
>> --- a/net/netfilter/nf_conntrack_proto.c
>> +++ b/net/netfilter/nf_conntrack_proto.c
>> @@ -361,9 +361,7 @@ int nf_ct_l4proto_register_sysctl(struct net *net,
>>  					    NULL);
>>  		if (err == 0)
>>  			goto out;
>> -
>> -		kfree(pn->ctl_compat_table);
>> -		pn->ctl_compat_table = NULL;
>> +		nf_ct_kfree_compat_sysctl_table(pn);
> 
> if this is the only client of this function, then make it static and
> define it inside nf_conntrack_proto.c
> 

tcp_init_net and udp_init_net use this function too.

Thanks

^ permalink raw reply

* Re: [PATCH 02/10] netfilter: add parameter proto for l4proto.init_net
From: Gao feng @ 2012-06-15  1:38 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netdev, netfilter-devel
In-Reply-To: <20120614175903.GB10633@1984>

On 2012-06-15 01:59, Pablo Neira Ayuso wrote:
> On Thu, Jun 14, 2012 at 06:07:17PM +0800, Gao feng wrote:
>> There is redundant code in the l4protos' init_net functions.
>> We can use one init_net function and the l3proto to implement
>> the same thing.
>>
>> So add l3proto as a parameter to the init_net function.
>>
>> Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
>> ---
>>  include/net/netfilter/nf_conntrack_l4proto.h   |    2 +-
>>  net/ipv4/netfilter/nf_conntrack_proto_icmp.c   |    2 +-
>>  net/ipv6/netfilter/nf_conntrack_proto_icmpv6.c |    2 +-
>>  net/netfilter/nf_conntrack_proto.c             |    6 ++++--
>>  net/netfilter/nf_conntrack_proto_dccp.c        |    2 +-
>>  net/netfilter/nf_conntrack_proto_generic.c     |    2 +-
>>  net/netfilter/nf_conntrack_proto_gre.c         |    2 +-
>>  net/netfilter/nf_conntrack_proto_sctp.c        |    4 ++--
>>  net/netfilter/nf_conntrack_proto_tcp.c         |    4 ++--
>>  net/netfilter/nf_conntrack_proto_udp.c         |    4 ++--
>>  net/netfilter/nf_conntrack_proto_udplite.c     |    2 +-
>>  11 files changed, 17 insertions(+), 15 deletions(-)
>>
>> diff --git a/include/net/netfilter/nf_conntrack_l4proto.h b/include/net/netfilter/nf_conntrack_l4proto.h
>> index 81c52b5..5dd60f2 100644
>> --- a/include/net/netfilter/nf_conntrack_l4proto.h
>> +++ b/include/net/netfilter/nf_conntrack_l4proto.h
>> @@ -97,7 +97,7 @@ struct nf_conntrack_l4proto {
>>  #endif
>>  	int	*net_id;
>>  	/* Init l4proto pernet data */
>> -	int (*init_net)(struct net *net);
>> +	int (*init_net)(struct net *net, u_int16_t proto);
>>  
>>  	/* Protocol name */
>>  	const char *name;
>> diff --git a/net/ipv4/netfilter/nf_conntrack_proto_icmp.c b/net/ipv4/netfilter/nf_conntrack_proto_icmp.c
>> index 041923c..76f7a2f 100644
>> --- a/net/ipv4/netfilter/nf_conntrack_proto_icmp.c
>> +++ b/net/ipv4/netfilter/nf_conntrack_proto_icmp.c
>> @@ -337,7 +337,7 @@ static struct ctl_table icmp_compat_sysctl_table[] = {
>>  #endif /* CONFIG_NF_CONNTRACK_PROC_COMPAT */
>>  #endif /* CONFIG_SYSCTL */
>>  
>> -static int icmp_init_net(struct net *net)
>> +static int icmp_init_net(struct net *net, u_int16_t proto)
>>  {
>>  	struct nf_icmp_net *in = icmp_pernet(net);
>>  	struct nf_proto_net *pn = (struct nf_proto_net *)in;
>> diff --git a/net/ipv6/netfilter/nf_conntrack_proto_icmpv6.c b/net/ipv6/netfilter/nf_conntrack_proto_icmpv6.c
>> index 63ed012..807ae09 100644
>> --- a/net/ipv6/netfilter/nf_conntrack_proto_icmpv6.c
>> +++ b/net/ipv6/netfilter/nf_conntrack_proto_icmpv6.c
>> @@ -333,7 +333,7 @@ static struct ctl_table icmpv6_sysctl_table[] = {
>>  };
>>  #endif /* CONFIG_SYSCTL */
>>  
>> -static int icmpv6_init_net(struct net *net)
>> +static int icmpv6_init_net(struct net *net, u_int16_t proto)
>>  {
>>  	struct nf_icmp_net *in = icmpv6_pernet(net);
>>  	struct nf_proto_net *pn = (struct nf_proto_net *)in;
>> diff --git a/net/netfilter/nf_conntrack_proto.c b/net/netfilter/nf_conntrack_proto.c
>> index a434dd7..8fc0332 100644
>> --- a/net/netfilter/nf_conntrack_proto.c
>> +++ b/net/netfilter/nf_conntrack_proto.c
>> @@ -193,6 +193,7 @@ static int nf_ct_l3proto_register_sysctl(struct net *net,
>>  					    l3proto->ctl_table_path,
>>  					    in->ctl_table,
>>  					    NULL);
>> +
> 
> This entire patchset contains many extra new lines. If you want to
> provide some cleanup, it should come in some follow-up patch.

Ok, I will make a follow-up patch to do some cleanup.

> 
>>  		if (err < 0) {
>>  			kfree(in->ctl_table);
>>  			in->ctl_table = NULL;
>> @@ -460,7 +461,7 @@ int nf_conntrack_l4proto_register(struct net *net,
>>  {
>>  	int ret = 0;
>>  	if (l4proto->init_net) {
>> -		ret = l4proto->init_net(net);
>> +		ret = l4proto->init_net(net, l4proto->l3proto);
>>  		if (ret < 0)
>>  			return ret;
>>  	}
>> @@ -514,7 +515,8 @@ int nf_conntrack_proto_init(struct net *net)
>>  {
>>  	unsigned int i;
>>  	int err;
>> -	err = nf_conntrack_l4proto_generic.init_net(net);
>> +	err = nf_conntrack_l4proto_generic.init_net(net,
>> +						    nf_conntrack_l4proto_generic.l3proto);
> 
> You have to make sure that lines break at 80-chars per column.
> 
> Something like this should be fine:
> 
>         err = nf_conntrack_l4proto_generic.init_net(net,
>                                         nf_conntrack_l4proto_generic.l3proto);
> 
> 

Got it, thanks.

^ permalink raw reply

* Re: include/net/netlink.h:497:41: warning: ‘reply_nlh’ may be used uninitialized in this function [-Wuninitialized]
From: Fengguang Wu @ 2012-06-15  1:55 UTC (permalink / raw)
  To: netdev, David S. Miller
In-Reply-To: <20120614083057.GA29738@canuck.infradead.org>

On Thu, Jun 14, 2012 at 04:30:57AM -0400, Thomas Graf wrote:
> On Thu, Jun 14, 2012 at 10:07:08AM +0800, wfg@linux.intel.com wrote:
> > FYI: there are new compile warnings showing up in
> > 
> > tree:   git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master
> > head:   0450243096de90ff51c3a6c605410c5e28d79f8d
> > commit: 33a03aadb52fa05d28aba6d8f0c03c7b3b905897 [464/472] dcbnl: Prepare framework to shorten handling functions
> > config: x86_64-randconfig-net3 (attached as .config)
> > 
> > All related error/warning messages are:
> > 
> > net/dcb/dcbnl.c: In function ‘dcb_doit’:
> > include/net/netlink.h:497:41: warning: ‘reply_nlh’ may be used uninitialized in this function [-Wuninitialized]
> > net/dcb/dcbnl.c:1975:19: note: ‘reply_nlh’ was declared here
> 
> gcc 4.6.3 on my system doesn't complain about this. I'm sending a patch.

I'm running gcc 4.6.2 and will upgrade it.  Good to know gcc improved on this!

Thanks,
Fengguang

^ permalink raw reply

* Re: [PATCH 3/3] usbnet: handle remote wakeup asap
From: Ming Lei @ 2012-06-15  2:22 UTC (permalink / raw)
  To: David S. Miller, Greg Kroah-Hartman
  Cc: Oliver Neukum, netdev@vger.kernel.org,
	linux-usb@vger.kernel.org, Ming Lei,
	stable@vger.kernel.org
In-Reply-To: <1339561217-18151-4-git-send-email-ming.lei@canonical.com>

[-- Attachment #1: Type: text/plain, Size: 1587 bytes --]

On Wed, Jun 13, 2012 at 12:20 PM, Ming Lei <ming.lei@canonical.com> wrote:
> If usbnet is resumed by remote wakeup, there are usually
> packets waiting to be handled, so allocate and submit
> rx URBs in usbnet_resume to avoid the delay introduced by the tasklet.
> Otherwise, usbnet may have been runtime suspended before
> usbnet_bh is executed to schedule Rx URBs.
>
> Without the patch, usbnet can't receive any packets from the peer
> in runtime suspend state if runtime PM is enabled and
> autosuspend_delay is set to zero.
>
> Cc: stable@vger.kernel.org
> Signed-off-by: Ming Lei <ming.lei@canonical.com>
> ---
>  drivers/net/usb/usbnet.c |   42 ++++++++++++++++++++++++++----------------
>  1 file changed, 26 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
> index 9bfa775..4911efa 100644
> --- a/drivers/net/usb/usbnet.c
> +++ b/drivers/net/usb/usbnet.c
> @@ -1201,6 +1201,21 @@ deferred:
>  }
>  EXPORT_SYMBOL_GPL(usbnet_start_xmit);
>
> +static void rx_alloc_submit(struct usbnet *dev, gfp_t flags)
> +{
> +       struct urb      *urb;
> +       int             i;
> +
> +       /* don't refill the queue all at once */
> +       for (i = 0; i < 10 && dev->rxq.qlen < RX_QLEN(dev); i++) {
> +               urb = usb_alloc_urb(0, GFP_ATOMIC);

David, sorry, the 'GFP_ATOMIC' above should be 'flags'. Could you
take the fixed version from the attachment, or fix it up
yourself?

Thanks,
--
Ming Lei

[-- Attachment #2: 0003-usbnet-handle-remote-wakeup-asap.patch --]
[-- Type: application/octet-stream, Size: 2933 bytes --]

From 29155e82eacac23e8fe6c29f2a65d7e4170ba086 Mon Sep 17 00:00:00 2001
From: Ming Lei <ming.lei@canonical.com>
Date: Wed, 13 Jun 2012 10:40:17 +0800
Subject: [PATCH 3/3] usbnet: handle remote wakeup asap

If usbnet is resumed by remote wakeup, there are usually
packets waiting to be handled, so allocate and submit
rx URBs in usbnet_resume to avoid the delay introduced by the tasklet.
Otherwise, usbnet may have been runtime suspended before
usbnet_bh is executed to schedule Rx URBs.

Without the patch, usbnet can't receive any packets from the peer
in runtime suspend state if runtime PM is enabled and
autosuspend_delay is set to zero.

Cc: stable@vger.kernel.org
Signed-off-by: Ming Lei <ming.lei@canonical.com>
---
 drivers/net/usb/usbnet.c |   42 ++++++++++++++++++++++++++----------------
 1 file changed, 26 insertions(+), 16 deletions(-)

diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
index 9bfa775..4911efa 100644
--- a/drivers/net/usb/usbnet.c
+++ b/drivers/net/usb/usbnet.c
@@ -1201,6 +1201,21 @@ deferred:
 }
 EXPORT_SYMBOL_GPL(usbnet_start_xmit);
 
+static void rx_alloc_submit(struct usbnet *dev, gfp_t flags)
+{
+	struct urb	*urb;
+	int		i;
+
+	/* don't refill the queue all at once */
+	for (i = 0; i < 10 && dev->rxq.qlen < RX_QLEN(dev); i++) {
+		urb = usb_alloc_urb(0, flags);
+		if (urb != NULL) {
+			if (rx_submit(dev, urb, flags) == -ENOLINK)
+				return;
+		}
+	}
+}
+
 /*-------------------------------------------------------------------------*/
 
 // tasklet (work deferred from completions, in_irq) or timer
@@ -1240,26 +1255,14 @@ static void usbnet_bh (unsigned long param)
 		   !timer_pending (&dev->delay) &&
 		   !test_bit (EVENT_RX_HALT, &dev->flags)) {
 		int	temp = dev->rxq.qlen;
-		int	qlen = RX_QLEN (dev);
-
-		if (temp < qlen) {
-			struct urb	*urb;
-			int		i;
-
-			// don't refill the queue all at once
-			for (i = 0; i < 10 && dev->rxq.qlen < qlen; i++) {
-				urb = usb_alloc_urb (0, GFP_ATOMIC);
-				if (urb != NULL) {
-					if (rx_submit (dev, urb, GFP_ATOMIC) ==
-					    -ENOLINK)
-						return;
-				}
-			}
+
+		if (temp < RX_QLEN(dev)) {
+			rx_alloc_submit(dev, GFP_ATOMIC);
 			if (temp != dev->rxq.qlen)
 				netif_dbg(dev, link, dev->net,
 					  "rxqlen %d --> %d\n",
 					  temp, dev->rxq.qlen);
-			if (dev->rxq.qlen < qlen)
+			if (dev->rxq.qlen < RX_QLEN(dev))
 				tasklet_schedule (&dev->bh);
 		}
 		if (dev->txq.qlen < TX_QLEN (dev))
@@ -1565,6 +1568,13 @@ int usbnet_resume (struct usb_interface *intf)
 		spin_unlock_irq(&dev->txq.lock);
 
 		if (test_bit(EVENT_DEV_OPEN, &dev->flags)) {
+			/* handle remote wakeup ASAP */
+			if (!dev->wait &&
+				netif_device_present(dev->net) &&
+				!timer_pending(&dev->delay) &&
+				!test_bit(EVENT_RX_HALT, &dev->flags))
+					rx_alloc_submit(dev, GFP_KERNEL);
+
 			if (!(dev->txq.qlen >= TX_QLEN(dev)))
 				netif_tx_wake_all_queues(dev->net);
 			tasklet_schedule (&dev->bh);
-- 
1.7.9.5



* Re: [net-next PATCH 02/02] net/ipv4: VTI support new module for ip_vti.
From: Saurabh Mohan @ 2012-06-15  2:43 UTC
  To: Steffen Klassert; +Cc: netdev
In-Reply-To: <20120614091254.GT27795@secunet.com>

Steffen,
Thanks for your feedback. Responses are inline (@SM)


----- Original Message -----
From: "Steffen Klassert" <steffen.klassert@secunet.com>
To: "Saurabh" <saurabh.mohan@vyatta.com>
Cc: netdev@vger.kernel.org
Sent: Thursday, June 14, 2012 2:12:54 AM
Subject: Re: [net-next PATCH 02/02] net/ipv4: VTI support new module for ip_vti.

On Fri, Jun 08, 2012 at 10:32:53AM -0700, Saurabh wrote:
> 
> 
> New VTI tunnel kernel module, Kconfig and Makefile changes.

It is an interesting feature, do you plan for an IPv6
version too?
@SM: Yes, we do plan to do the same for IPv6, though the timeline is not yet certain.


I made some comments on the code below.

Thanks.

> 
> Signed-off-by: Saurabh Mohan <saurabh.mohan@vyatta.com>
> Reviewed-by: Stephen Hemminger <shemminger@vyatta.com>
> 
> ---
> diff --git a/include/linux/if_tunnel.h b/include/linux/if_tunnel.h
> index 16b92d0..4b4ce17 100644
> --- a/include/linux/if_tunnel.h
> +++ b/include/linux/if_tunnel.h
> @@ -80,4 +80,15 @@ enum {
>  
>  #define IFLA_GRE_MAX	(__IFLA_GRE_MAX - 1)
>  
> +enum {
> +	IFLA_VTI_UNSPEC,
> +	IFLA_VTI_LINK,
> +	IFLA_VTI_IKEY,
> +	IFLA_VTI_OKEY,
> +	IFLA_VTI_LOCAL,
> +	IFLA_VTI_REMOTE,
> +	__IFLA_VTI_MAX,
> +};
> +
> +#define IFLA_VTI_MAX	(__IFLA_VTI_MAX - 1)
>  #endif /* _IF_TUNNEL_H_ */
> diff --git a/net/ipv4/Kconfig b/net/ipv4/Kconfig
> index 20f1cb5..3a95308 100644
> --- a/net/ipv4/Kconfig
> +++ b/net/ipv4/Kconfig
> @@ -310,6 +310,20 @@ config SYN_COOKIES
>  
>  	  If unsure, say N.
>  
> +config NET_IPVTI
> +    tristate "Virtual (secure) IP: tunneling"
> +    select INET_TUNNEL

You added your register/deregister functions to net/ipv4/xfrm4_mode_tunnel.c,
so this should somehow select or depend on INET_XFRM_MODE_TUNNEL.

@SM: I'll create a dependency here.

> +    ---help---
> +      Tunneling means encapsulating data of one protocol type within
> +      another protocol and sending it over a channel that understands the
> +      encapsulating protocol. This particular tunneling driver implements
> +      encapsulation of IP within IP-ESP. This can be used with xfrm to give
> +      the notion of a secure tunnel and then run a routing protocol on top.
> +
> +      Saying Y to this option will produce one module ( = code which can
> +      be inserted in and removed from the running kernel whenever you
> +      want). Most people won't need this and can say N.
> +
>  config INET_AH
>  	tristate "IP: AH transformation"
>  	select XFRM_ALGO
> diff --git a/net/ipv4/Makefile b/net/ipv4/Makefile
> index ff75d3b..3999ce9 100644
> --- a/net/ipv4/Makefile
> +++ b/net/ipv4/Makefile
> @@ -20,6 +20,7 @@ obj-$(CONFIG_IP_MROUTE) += ipmr.o
>  obj-$(CONFIG_NET_IPIP) += ipip.o
>  obj-$(CONFIG_NET_IPGRE_DEMUX) += gre.o
>  obj-$(CONFIG_NET_IPGRE) += ip_gre.o
> +obj-$(CONFIG_NET_IPVTI) += ip_vti.o
>  obj-$(CONFIG_SYN_COOKIES) += syncookies.o
>  obj-$(CONFIG_INET_AH) += ah4.o
>  obj-$(CONFIG_INET_ESP) += esp4.o
> diff --git a/net/ipv4/ip_vti.c b/net/ipv4/ip_vti.c
> new file mode 100644
> index 0000000..3eaa47c
> --- /dev/null
> +++ b/net/ipv4/ip_vti.c
> @@ -0,0 +1,980 @@
> +/*
> + *	Linux NET3:	IP/IP protocol decoder modified to support virtual tunnel interface
> + *
> + *	Authors:
> + *		Sam Lantinga (slouken@cs.ucdavis.edu)  02/01/95
> + *		Saurabh Mohan (saurabh.mohan@vyatta.com) 05/07/2012
> + *
> + *	This program is free software; you can redistribute it and/or
> + *	modify it under the terms of the GNU General Public License
> + *	as published by the Free Software Foundation; either version
> + *	2 of the License, or (at your option) any later version.
> + *
> + */
> +
> +/*
> +   This version of net/ipv4/ip_vti.c is cloned from net/ipv4/ipip.c
> +
> +   For comments look at net/ipv4/ip_gre.c --ANK
> + */
> +
> +
> +#include <linux/capability.h>
> +#include <linux/module.h>
> +#include <linux/types.h>
> +#include <linux/kernel.h>
> +#include <linux/uaccess.h>
> +#include <linux/skbuff.h>
> +#include <linux/netdevice.h>
> +#include <linux/in.h>
> +#include <linux/tcp.h>
> +#include <linux/udp.h>
> +#include <linux/if_arp.h>
> +#include <linux/mroute.h>
> +#include <linux/init.h>
> +#include <linux/netfilter_ipv4.h>
> +#include <linux/if_ether.h>
> +
> +#include <net/sock.h>
> +#include <net/ip.h>
> +#include <net/icmp.h>
> +#include <net/ipip.h>
> +#include <net/inet_ecn.h>
> +#include <net/xfrm.h>
> +#include <net/net_namespace.h>
> +#include <net/netns/generic.h>
> +
> +#define HASH_SIZE  16
> +#define HASH(addr) (((__force u32)addr^((__force u32)addr>>4))&0xF)
> +
> +static struct rtnl_link_ops vti_link_ops __read_mostly;
> +
> +static int vti_net_id __read_mostly;
> +struct vti_net {
> +	struct ip_tunnel __rcu *tunnels_r_l[HASH_SIZE];
> +	struct ip_tunnel __rcu *tunnels_r[HASH_SIZE];
> +	struct ip_tunnel __rcu *tunnels_l[HASH_SIZE];
> +	struct ip_tunnel __rcu *tunnels_wc[1];
> +	struct ip_tunnel **tunnels[4];
> +
> +	struct net_device *fb_tunnel_dev;
> +};
> +
> +static int vti_fb_tunnel_init(struct net_device *dev);
> +static int vti_tunnel_init(struct net_device *dev);
> +static void vti_tunnel_setup(struct net_device *dev);
> +static void vti_dev_free(struct net_device *dev);
> +static int vti_tunnel_bind_dev(struct net_device *dev);
> +
> +/*
> + * Locking : hash tables are protected by RCU and RTNL
> + */
> +
> +#define for_each_ip_tunnel_rcu(start) \
> +	for (t = rcu_dereference(start); t; t = rcu_dereference(t->next))
> +
> +/* often modified stats are per cpu, others are shared (netdev->stats) */
> +struct pcpu_tstats {
> +	u64	rx_packets;
> +	u64	rx_bytes;
> +	u64	tx_packets;
> +	u64	tx_bytes;
> +	struct	u64_stats_sync	syncp;
> +};
> +
> +#define VTI_XMIT(stats1, stats2) do {				\
> +	int err;						\
> +	int pkt_len = skb->len;					\
> +	err = dst_output(skb);					\
> +	if (net_xmit_eval(err) == 0) {				\
> +		(stats1)->tx_bytes += pkt_len;			\
> +		(stats1)->tx_packets++;				\
> +	} else {						\
> +		(stats2)->tx_errors++;				\
> +		(stats2)->tx_aborted_errors++;			\
> +	}							\
> +} while (0)
> +
> +
> +static struct rtnl_link_stats64 *vti_get_stats64(struct net_device *dev,
> +					       struct rtnl_link_stats64 *tot)
> +{
> +	int i;
> +
> +	for_each_possible_cpu(i) {
> +		const struct pcpu_tstats *tstats = per_cpu_ptr(dev->tstats, i);
> +		u64 rx_packets, rx_bytes, tx_packets, tx_bytes;
> +		unsigned int start;
> +
> +		do {
> +			start = u64_stats_fetch_begin_bh(&tstats->syncp);
> +			rx_packets = tstats->rx_packets;
> +			tx_packets = tstats->tx_packets;
> +			rx_bytes = tstats->rx_bytes;
> +			tx_bytes = tstats->tx_bytes;
> +		} while (u64_stats_fetch_retry_bh(&tstats->syncp, start));
> +
> +		tot->rx_packets += rx_packets;
> +		tot->tx_packets += tx_packets;
> +		tot->rx_bytes   += rx_bytes;
> +		tot->tx_bytes   += tx_bytes;
> +	}
> +
> +	tot->multicast = dev->stats.multicast;
> +	tot->rx_crc_errors = dev->stats.rx_crc_errors;
> +	tot->rx_fifo_errors = dev->stats.rx_fifo_errors;
> +	tot->rx_length_errors = dev->stats.rx_length_errors;
> +	tot->rx_errors = dev->stats.rx_errors;
> +	tot->tx_fifo_errors = dev->stats.tx_fifo_errors;
> +	tot->tx_carrier_errors = dev->stats.tx_carrier_errors;
> +	tot->tx_dropped = dev->stats.tx_dropped;
> +	tot->tx_aborted_errors = dev->stats.tx_aborted_errors;
> +	tot->tx_errors = dev->stats.tx_errors;
> +
> +	return tot;
> +}
> +
> +static struct ip_tunnel *vti_tunnel_lookup(struct net *net,
> +					 __be32 remote, __be32 local)
> +{
> +	unsigned h0 = HASH(remote);
> +	unsigned h1 = HASH(local);
> +	struct ip_tunnel *t;
> +	struct vti_net *ipn = net_generic(net, vti_net_id);
> +
> +	for_each_ip_tunnel_rcu(ipn->tunnels_r_l[h0 ^ h1])
> +		if (local == t->parms.iph.saddr &&
> +		    remote == t->parms.iph.daddr && (t->dev->flags&IFF_UP))
> +			return t;
> +	for_each_ip_tunnel_rcu(ipn->tunnels_r[h0])
> +		if (remote == t->parms.iph.daddr && (t->dev->flags&IFF_UP))
> +			return t;
> +
> +	for_each_ip_tunnel_rcu(ipn->tunnels_l[h1])
> +		if (local == t->parms.iph.saddr && (t->dev->flags&IFF_UP))
> +			return t;
> +
> +	for_each_ip_tunnel_rcu(ipn->tunnels_wc[0])
> +		if (t && (t->dev->flags&IFF_UP))
> +			return t;
> +	return NULL;
> +}
> +
> +static struct ip_tunnel **__vti_bucket(struct vti_net *ipn,
> +				     struct ip_tunnel_parm *parms)
> +{
> +	__be32 remote = parms->iph.daddr;
> +	__be32 local = parms->iph.saddr;
> +	unsigned h = 0;
> +	int prio = 0;
> +
> +	if (remote) {
> +		prio |= 2;
> +		h ^= HASH(remote);
> +	}
> +	if (local) {
> +		prio |= 1;
> +		h ^= HASH(local);
> +	}
> +	return &ipn->tunnels[prio][h];
> +}
> +
> +static inline struct ip_tunnel **vti_bucket(struct vti_net *ipn,
> +					  struct ip_tunnel *t)
> +{
> +	return __vti_bucket(ipn, &t->parms);
> +}
> +
> +static void vti_tunnel_unlink(struct vti_net *ipn, struct ip_tunnel *t)
> +{
> +	struct ip_tunnel __rcu **tp;
> +	struct ip_tunnel *iter;
> +
> +	for (tp = vti_bucket(ipn, t);
> +	     (iter = rtnl_dereference(*tp)) != NULL;
> +	     tp = &iter->next) {
> +		if (t == iter) {
> +			rcu_assign_pointer(*tp, t->next);
> +			break;
> +		}
> +	}
> +}
> +
> +static void vti_tunnel_link(struct vti_net *ipn, struct ip_tunnel *t)
> +{
> +	struct ip_tunnel __rcu **tp = vti_bucket(ipn, t);
> +
> +	rcu_assign_pointer(t->next, rtnl_dereference(*tp));
> +	rcu_assign_pointer(*tp, t);
> +}
> +
> +static struct ip_tunnel *vti_tunnel_locate(struct net *net,
> +					 struct ip_tunnel_parm *parms,
> +					 int create)
> +{
> +	__be32 remote = parms->iph.daddr;
> +	__be32 local = parms->iph.saddr;
> +	struct ip_tunnel *t, *nt;
> +	struct ip_tunnel __rcu **tp;
> +	struct net_device *dev;
> +	char name[IFNAMSIZ];
> +	struct vti_net *ipn = net_generic(net, vti_net_id);
> +
> +	for (tp = __vti_bucket(ipn, parms);
> +	     (t = rtnl_dereference(*tp)) != NULL;
> +	     tp = &t->next) {
> +		if (local == t->parms.iph.saddr && remote == t->parms.iph.daddr)
> +			return t;
> +	}
> +	if (!create)
> +		return NULL;
> +
> +	if (parms->name[0])
> +		strlcpy(name, parms->name, IFNAMSIZ);
> +	else
> +		strcpy(name, "vti%d");
> +
> +	dev = alloc_netdev(sizeof(*t), name, vti_tunnel_setup);
> +	if (dev == NULL)
> +		return NULL;
> +
> +	dev_net_set(dev, net);
> +
> +	nt = netdev_priv(dev);
> +	nt->parms = *parms;
> +	dev->rtnl_link_ops = &vti_link_ops;
> +
> +	vti_tunnel_bind_dev(dev);
> +
> +	if (register_netdevice(dev) < 0)
> +		goto failed_free;
> +
> +	dev_hold(dev);
> +	vti_tunnel_link(ipn, nt);
> +	return nt;
> +
> + failed_free:
> +	free_netdev(dev);
> +	return NULL;
> +}
> +
> +static void vti_tunnel_uninit(struct net_device *dev)
> +{
> +	struct net *net = dev_net(dev);
> +	struct vti_net *ipn = net_generic(net, vti_net_id);
> +
> +	if (dev == ipn->fb_tunnel_dev)
> +		RCU_INIT_POINTER(ipn->tunnels_wc[0], NULL);
> +	else
> +		vti_tunnel_unlink(ipn, netdev_priv(dev));
> +	dev_put(dev);
> +}
> +
> +static int vti_err(struct sk_buff *skb, u32 info)
> +{
> +
> +	/* All the routers (except for Linux) return only
> +	 * 8 bytes of packet payload. It means, that precise relaying of
> +	 * ICMP in the real Internet is absolutely infeasible.
> +	 */
> +	struct iphdr *iph = (struct iphdr *)skb->data;
> +	const int type = icmp_hdr(skb)->type;
> +	const int code = icmp_hdr(skb)->code;
> +	struct ip_tunnel *t;
> +	int err;
> +
> +	switch (type) {
> +	default:
> +	case ICMP_PARAMETERPROB:
> +		return 0;
> +
> +	case ICMP_DEST_UNREACH:
> +		switch (code) {
> +		case ICMP_SR_FAILED:
> +		case ICMP_PORT_UNREACH:
> +			/* Impossible event. */
> +			return 0;
> +		case ICMP_FRAG_NEEDED:
> +			/* Soft state for pmtu is maintained by IP core. */
> +			return 0;
> +		default:
> +			/* All others are translated to HOST_UNREACH. */
> +			break;
> +		}
> +		break;
> +	case ICMP_TIME_EXCEEDED:
> +		if (code != ICMP_EXC_TTL)
> +			return 0;
> +		break;
> +	}
> +
> +	err = -ENOENT;
> +
> +	rcu_read_lock();
> +	t = vti_tunnel_lookup(dev_net(skb->dev), iph->daddr, iph->saddr);
> +	if (t == NULL || t->parms.iph.daddr == 0)
> +		goto out;
> +
> +	err = 0;
> +	if (t->parms.iph.ttl == 0 && type == ICMP_TIME_EXCEEDED)
> +		goto out;
> +
> +	if (time_before(jiffies, t->err_time + IPTUNNEL_ERR_TIMEO))
> +		t->err_count++;
> +	else
> +		t->err_count = 1;
> +	t->err_time = jiffies;
> +out:
> +	rcu_read_unlock();
> +	return err;
> +}
> +
> +static inline void vti_ecn_decapsulate(const struct iphdr *outer_iph,
> +					struct sk_buff *skb)
> +{
> +	struct iphdr *inner_iph = ip_hdr(skb);
> +
> +	if (INET_ECN_is_ce(outer_iph->tos))
> +		IP_ECN_set_ce(inner_iph);
> +}

vti_ecn_decapsulate is unused.
@SM: Thx!
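
For reference, the unused helper propagated a Congestion Experienced mark from the outer header to the inner one. A minimal userspace sketch of that logic on the TOS byte alone, assuming local ECN constants (RFC 3168 codepoints) and a hypothetical function name; it roughly follows INET_ECN_is_ce()/IP_ECN_set_ce() by leaving Not-ECT inner packets unmarked, and it omits the IPv4 checksum update the kernel also performs.

```c
#include <stdint.h>

/* ECN codepoints in the low two bits of the IPv4 TOS byte (RFC 3168). */
#define ECN_MASK    0x03u
#define ECN_NOT_ECT 0x00u
#define ECN_ECT_1   0x01u
#define ECN_ECT_0   0x02u
#define ECN_CE      0x03u

/* Returns the inner TOS after decapsulation: if the outer header is CE
 * and the inner packet is ECN-capable, the inner packet becomes CE. */
static uint8_t ecn_decapsulate_tos(uint8_t outer_tos, uint8_t inner_tos)
{
	if ((outer_tos & ECN_MASK) != ECN_CE)
		return inner_tos;          /* no congestion mark to copy */
	if ((inner_tos & ECN_MASK) == ECN_NOT_ECT)
		return inner_tos;          /* cannot mark a Not-ECT packet */
	return (uint8_t)(inner_tos | ECN_CE);
}
```
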

> +
> +/*
> + * We don't digest the packet, therefore let it pass.
> + */
> +static int vti_rcv(struct sk_buff *skb)
> +{
> +	struct ip_tunnel *tunnel;
> +	const struct iphdr *iph = ip_hdr(skb);
> +
> +	rcu_read_lock();
> +	tunnel = vti_tunnel_lookup(dev_net(skb->dev), iph->saddr, iph->daddr);
> +	if (tunnel != NULL) {
> +		struct pcpu_tstats *tstats;
> +
> +		tstats = this_cpu_ptr(tunnel->dev->tstats);
> +		tstats->rx_packets++;
> +		tstats->rx_bytes += skb->len;
> +
> +		skb->dev = tunnel->dev;
> +		skb_dst_drop(skb);

Do we really need to drop the refcount here? xfrm_input() already does
that before it reinjects the packet into layer 2.

> +		nf_reset(skb);

Same here.

@SM: Incorporated both.

> +		rcu_read_unlock();
> +		/* We do not eat the packet here therefore return 1 */
> +		return 1;
> +	}
> +	rcu_read_unlock();
> +
> +	return -1;
> +}
> +
> +/*
> + *	This function assumes it is being called from dev_queue_xmit()
> + *	and that skb is filled properly by that function.
> + */
> +
> +static netdev_tx_t vti_tunnel_xmit(struct sk_buff *skb, struct net_device *dev)
> +{
> +	struct ip_tunnel *tunnel = netdev_priv(dev);
> +	struct pcpu_tstats *tstats;
> +	struct net_device_stats *stats = &tunnel->dev->stats;
> +	struct iphdr  *tiph = &tunnel->parms.iph;
> +	u8     tos = tunnel->parms.iph.tos;
> +	struct rtable *rt;		/* Route to the other host */
> +	struct net_device *tdev;	/* Device to other host */
> +	struct iphdr  *old_iph = ip_hdr(skb);
> +	__be32 dst = tiph->daddr;
> +	struct flowi4 fl4;
> +
> +	if (skb->protocol != htons(ETH_P_IP))
> +		goto tx_error;
> +
> +	if (tos&1)
> +		tos = old_iph->tos;
> +
> +	if (!dst) {
> +		/* NBMA tunnel */
> +		rt = skb_rtable(skb);
> +		if (rt == NULL) {
> +			stats->tx_fifo_errors++;
> +			goto tx_error;
> +		}
> +		dst = rt->rt_gateway;
> +		if (dst == 0)
> +			goto tx_error_icmp;
> +	}
> +
> +	memset(&fl4, 0, sizeof(fl4));
> +	flowi4_init_output(&fl4, tunnel->parms.link,
> +		htonl(tunnel->parms.i_key), RT_TOS(tos), RT_SCOPE_UNIVERSE,
> +		IPPROTO_IPIP, 0,
> +		dst, tiph->saddr, 0, 0);
> +	rt = ip_route_output_key(dev_net(dev), &fl4);
> +	if (IS_ERR(rt)) {
> +		dev->stats.tx_carrier_errors++;
> +		goto tx_error_icmp;
> +	}
> +#ifdef CONFIG_XFRM
> +		/* if there is no transform then this tunnel is not functional. */
> +		if (!rt->dst.xfrm) {
> +			stats->tx_carrier_errors++;
> +			goto tx_error_icmp;
> +		}
> +#endif
> +	tdev = rt->dst.dev;
> +
> +	if (tdev == dev) {
> +		ip_rt_put(rt);
> +		stats->collisions++;
> +		goto tx_error;
> +
> +	}
> +
> +
> +	if (tunnel->err_count > 0) {
> +		if (time_before(jiffies,
> +				tunnel->err_time + IPTUNNEL_ERR_TIMEO)) {
> +			tunnel->err_count--;
> +			dst_link_failure(skb);
> +		} else
> +			tunnel->err_count = 0;
> +	}
> +
> +
> +	IPCB(skb)->flags &= ~(IPSKB_XFRM_TUNNEL_SIZE | IPSKB_XFRM_TRANSFORMED |
> +			      IPSKB_REROUTED);
> +	skb_dst_drop(skb);
> +	skb_dst_set(skb, &rt->dst);
> +	nf_reset(skb);
> +	skb->dev = skb_dst(skb)->dev;
> +
> +	tstats = this_cpu_ptr(dev->tstats);
> +	VTI_XMIT(tstats, &dev->stats);
> +	return NETDEV_TX_OK;
> +
> +tx_error_icmp:
> +	dst_link_failure(skb);
> +tx_error:
> +	stats->tx_errors++;
> +	dev_kfree_skb(skb);
> +	return NETDEV_TX_OK;
> +}
> +
> +static int vti_tunnel_bind_dev(struct net_device *dev)
> +{
> +	struct net_device *tdev = NULL;
> +	struct ip_tunnel *tunnel;
> +	struct iphdr *iph;
> +
> +	tunnel = netdev_priv(dev);
> +	iph = &tunnel->parms.iph;
> +
> +	if (iph->daddr) {
> +		struct rtable *rt;
> +		struct flowi4 fl4;
> +		memset(&fl4, 0, sizeof(fl4));
> +		flowi4_init_output(&fl4, tunnel->parms.link,
> +				htonl(tunnel->parms.i_key), RT_TOS(iph->tos), RT_SCOPE_UNIVERSE,
> +				IPPROTO_IPIP, 0,
> +				iph->daddr, iph->saddr, 0, 0);
> +		rt = ip_route_output_key(dev_net(dev), &fl4);
> +		if (!IS_ERR(rt)) {
> +			tdev = rt->dst.dev;
> +			ip_rt_put(rt);
> +		}
> +		dev->flags |= IFF_POINTOPOINT;
> +	}
> +
> +	if (!tdev && tunnel->parms.link)
> +		tdev = __dev_get_by_index(dev_net(dev), tunnel->parms.link);
> +
> +	if (tdev) {
> +		dev->hard_header_len = tdev->hard_header_len + sizeof(struct iphdr);
> +		dev->mtu = tdev->mtu;
> +	}
> +	dev->iflink = tunnel->parms.link;
> +	return dev->mtu;
> +}
> +
> +static int
> +vti_tunnel_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
> +{
> +	int err = 0;
> +	struct ip_tunnel_parm p;
> +	struct ip_tunnel *t;
> +	struct net *net = dev_net(dev);
> +	struct vti_net *ipn = net_generic(net, vti_net_id);
> +
> +	switch (cmd) {
> +	case SIOCGETTUNNEL:
> +		t = NULL;
> +		if (dev == ipn->fb_tunnel_dev) {
> +			if (copy_from_user(&p, ifr->ifr_ifru.ifru_data, sizeof(p))) {
> +				err = -EFAULT;
> +				break;
> +			}
> +			t = vti_tunnel_locate(net, &p, 0);
> +		}
> +		if (t == NULL)
> +			t = netdev_priv(dev);
> +		memcpy(&p, &t->parms, sizeof(p));
> +		p.i_flags |= GRE_KEY;
> +		p.o_flags |= GRE_KEY;
> +		if (copy_to_user(ifr->ifr_ifru.ifru_data, &p, sizeof(p)))
> +			err = -EFAULT;
> +		break;
> +
> +	case SIOCADDTUNNEL:
> +	case SIOCCHGTUNNEL:
> +		err = -EPERM;
> +		if (!capable(CAP_NET_ADMIN))
> +			goto done;
> +
> +		err = -EFAULT;
> +		if (copy_from_user(&p, ifr->ifr_ifru.ifru_data, sizeof(p)))
> +			goto done;
> +
> +		err = -EINVAL;
> +		if (p.iph.version != 4 || p.iph.protocol != IPPROTO_ESP ||
> +		    p.iph.ihl != 5 || (p.iph.frag_off&htons(~IP_DF)))
> +			goto done;
> +		if (p.iph.ttl)
> +			p.iph.frag_off |= htons(IP_DF);
> +
> +		t = vti_tunnel_locate(net, &p, cmd == SIOCADDTUNNEL);
> +
> +		if (dev != ipn->fb_tunnel_dev && cmd == SIOCCHGTUNNEL) {
> +			if (t != NULL) {
> +				if (t->dev != dev) {
> +					err = -EEXIST;
> +					break;
> +				}
> +			} else {
> +				if (((dev->flags&IFF_POINTOPOINT) && !p.iph.daddr) ||
> +				    (!(dev->flags&IFF_POINTOPOINT) && p.iph.daddr)) {
> +					err = -EINVAL;
> +					break;
> +				}
> +				t = netdev_priv(dev);
> +				vti_tunnel_unlink(ipn, t);
> +				synchronize_net();
> +				t->parms.iph.saddr = p.iph.saddr;
> +				t->parms.iph.daddr = p.iph.daddr;
> +				t->parms.i_key = p.i_key;
> +				t->parms.o_key = p.o_key;
> +				t->parms.iph.protocol = IPPROTO_ESP;
> +				memcpy(dev->dev_addr, &p.iph.saddr, 4);
> +				memcpy(dev->broadcast, &p.iph.daddr, 4);
> +				vti_tunnel_link(ipn, t);
> +				netdev_state_change(dev);
> +			}
> +		}
> +
> +		if (t) {
> +			err = 0;
> +			if (cmd == SIOCCHGTUNNEL) {
> +				t->parms.iph.ttl = p.iph.ttl;
> +				t->parms.iph.tos = p.iph.tos;
> +				t->parms.iph.frag_off = p.iph.frag_off;
> +				t->parms.i_key = p.i_key;
> +				t->parms.o_key = p.o_key;
> +				if (t->parms.link != p.link) {
> +					t->parms.link = p.link;
> +					vti_tunnel_bind_dev(dev);
> +					netdev_state_change(dev);
> +				}
> +			}
> +			if (copy_to_user(ifr->ifr_ifru.ifru_data, &t->parms, sizeof(p)))
> +				err = -EFAULT;
> +		} else
> +			err = (cmd == SIOCADDTUNNEL ? -ENOBUFS : -ENOENT);
> +		break;
> +
> +	case SIOCDELTUNNEL:
> +		err = -EPERM;
> +		if (!capable(CAP_NET_ADMIN))
> +			goto done;
> +
> +		if (dev == ipn->fb_tunnel_dev) {
> +			err = -EFAULT;
> +			if (copy_from_user(&p, ifr->ifr_ifru.ifru_data, sizeof(p)))
> +				goto done;
> +			err = -ENOENT;
> +
> +			t = vti_tunnel_locate(net, &p, 0);
> +			if (t == NULL)
> +				goto done;
> +			err = -EPERM;
> +			if (t->dev == ipn->fb_tunnel_dev)
> +				goto done;
> +			dev = t->dev;
> +		}
> +		unregister_netdevice(dev);
> +		err = 0;
> +		break;
> +
> +	default:
> +		err = -EINVAL;
> +	}
> +
> +done:
> +	return err;
> +}
> +
> +static int vti_tunnel_change_mtu(struct net_device *dev, int new_mtu)
> +{
> +	if (new_mtu < 68 || new_mtu > 0xFFF8)
> +		return -EINVAL;
> +	dev->mtu = new_mtu;
> +	return 0;
> +}
> +
> +static const struct net_device_ops vti_netdev_ops = {
> +	.ndo_init	= vti_tunnel_init,
> +	.ndo_uninit	= vti_tunnel_uninit,
> +	.ndo_start_xmit	= vti_tunnel_xmit,
> +	.ndo_do_ioctl	= vti_tunnel_ioctl,
> +	.ndo_change_mtu	= vti_tunnel_change_mtu,
> +	.ndo_get_stats64  = vti_get_stats64,
> +};
> +
> +static void vti_dev_free(struct net_device *dev)
> +{
> +	free_percpu(dev->tstats);
> +	free_netdev(dev);
> +}
> +
> +static void vti_tunnel_setup(struct net_device *dev)
> +{
> +	dev->netdev_ops		= &vti_netdev_ops;
> +	dev->destructor		= vti_dev_free;
> +
> +	dev->type		= ARPHRD_TUNNEL;
> +	dev->hard_header_len	= LL_MAX_HEADER + sizeof(struct iphdr);
> +	dev->mtu		= ETH_DATA_LEN;
> +	dev->flags		= IFF_NOARP;
> +	dev->iflink		= 0;
> +	dev->addr_len		= 4;
> +	dev->features		|= NETIF_F_NETNS_LOCAL;
> +	dev->features		|= NETIF_F_LLTX;
> +	dev->priv_flags		&= ~IFF_XMIT_DST_RELEASE;
> +}
> +
> +static int vti_tunnel_init(struct net_device *dev)
> +{
> +	struct ip_tunnel *tunnel = netdev_priv(dev);
> +
> +	tunnel->dev = dev;
> +	strcpy(tunnel->parms.name, dev->name);
> +
> +	memcpy(dev->dev_addr, &tunnel->parms.iph.saddr, 4);
> +	memcpy(dev->broadcast, &tunnel->parms.iph.daddr, 4);
> +
> +	dev->tstats = alloc_percpu(struct pcpu_tstats);
> +	if (!dev->tstats)
> +		return -ENOMEM;
> +
> +	return 0;
> +}
> +
> +static int __net_init vti_fb_tunnel_init(struct net_device *dev)
> +{
> +	struct ip_tunnel *tunnel = netdev_priv(dev);
> +	struct iphdr *iph = &tunnel->parms.iph;
> +	struct vti_net *ipn = net_generic(dev_net(dev), vti_net_id);
> +
> +	tunnel->dev = dev;
> +	strcpy(tunnel->parms.name, dev->name);
> +
> +	iph->version		= 4;
> +	iph->protocol		= IPPROTO_ESP;

Why IPPROTO_ESP? What's with the other IPsec protocols?
Shouldn't this be IPPROTO_IPIP?

@SM: VTI will work only with ESP, not with AH (at least I have never heard of anyone using it with AH). Plus, I wanted to keep this module separate from IPIP (IP-in-IP tunnels).
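
The ESP-only restriction is enforced by the SIOCADDTUNNEL/SIOCCHGTUNNEL check in the quoted vti_tunnel_ioctl(). A minimal userspace sketch of that check, assuming a hypothetical vti_iph struct in place of the iphdr inside struct ip_tunnel_parm and local constants for the ESP protocol number and the DF bit:

```c
#include <arpa/inet.h>
#include <stdbool.h>
#include <stdint.h>

#define VTI_PROTO_ESP 50      /* IANA protocol number for ESP */
#define VTI_IP_DF     0x4000u /* "don't fragment" bit in frag_off */

/* Hypothetical subset of the IPv4 header fields in the tunnel
 * parameters; frag_off is kept in network byte order. */
struct vti_iph {
	uint8_t  version;
	uint8_t  ihl;
	uint8_t  protocol;
	uint8_t  ttl;
	uint16_t frag_off;
};

/* IPv4, protocol ESP, a 20-byte header (ihl 5), and no frag_off
 * bits other than DF. */
static bool vti_parms_valid(const struct vti_iph *iph)
{
	return iph->version == 4 &&
	       iph->protocol == VTI_PROTO_ESP &&
	       iph->ihl == 5 &&
	       (iph->frag_off & htons((uint16_t)~VTI_IP_DF)) == 0;
}

/* As in the patch, a fixed TTL forces DF on. */
static void vti_parms_fixup(struct vti_iph *iph)
{
	if (iph->ttl)
		iph->frag_off |= htons(VTI_IP_DF);
}
```

Under this check an IPPROTO_IPIP (4) header is rejected with -EINVAL, which is why the fallback device is initialised with IPPROTO_ESP rather than IPPROTO_IPIP.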

I'll resubmit the patch with the changes tested.
-Saurabh.
> +	iph->ihl		= 5;
> +
> +	dev->tstats = alloc_percpu(struct pcpu_tstats);
> +	if (!dev->tstats)
> +		return -ENOMEM;
> +
> +	dev_hold(dev);
> +	rcu_assign_pointer(ipn->tunnels_wc[0], tunnel);
> +	return 0;
> +}
> +
> +static struct xfrm_tunnel vti_handler __read_mostly = {
> +	.handler	=	vti_rcv,
> +	.err_handler	=	vti_err,
> +	.priority	=	1,
> +};
> +
> +static void vti_destroy_tunnels(struct vti_net *ipn, struct list_head *head)
> +{
> +	int prio;
> +
> +	for (prio = 1; prio < 4; prio++) {
> +		int h;
> +		for (h = 0; h < HASH_SIZE; h++) {
> +			struct ip_tunnel *t;
> +
> +			t = rtnl_dereference(ipn->tunnels[prio][h]);
> +			while (t != NULL) {
> +				unregister_netdevice_queue(t->dev, head);
> +				t = rtnl_dereference(t->next);
> +			}
> +		}
> +	}
> +}
> +
> +static int __net_init vti_init_net(struct net *net)
> +{
> +	int err;
> +	struct vti_net *ipn = net_generic(net, vti_net_id);
> +
> +	ipn->tunnels[0] = ipn->tunnels_wc;
> +	ipn->tunnels[1] = ipn->tunnels_l;
> +	ipn->tunnels[2] = ipn->tunnels_r;
> +	ipn->tunnels[3] = ipn->tunnels_r_l;
> +
> +	ipn->fb_tunnel_dev = alloc_netdev(sizeof(struct ip_tunnel),
> +					   "ip_vti0",
> +					   vti_tunnel_setup);
> +	if (!ipn->fb_tunnel_dev) {
> +		err = -ENOMEM;
> +		goto err_alloc_dev;
> +	}
> +	dev_net_set(ipn->fb_tunnel_dev, net);
> +
> +	err = vti_fb_tunnel_init(ipn->fb_tunnel_dev);
> +	if (err)
> +		goto err_reg_dev;
> +	ipn->fb_tunnel_dev->rtnl_link_ops = &vti_link_ops;
> +
> +	err = register_netdev(ipn->fb_tunnel_dev);
> +	if (err)
> +		goto err_reg_dev;
> +	return 0;
> +
> +err_reg_dev:
> +	vti_dev_free(ipn->fb_tunnel_dev);
> +err_alloc_dev:
> +	/* nothing */
> +	return err;
> +}
> +
> +static void __net_exit vti_exit_net(struct net *net)
> +{
> +	struct vti_net *ipn = net_generic(net, vti_net_id);
> +	LIST_HEAD(list);
> +
> +	rtnl_lock();
> +	vti_destroy_tunnels(ipn, &list);
> +	unregister_netdevice_many(&list);
> +	rtnl_unlock();
> +}
> +
> +static struct pernet_operations vti_net_ops = {
> +	.init = vti_init_net,
> +	.exit = vti_exit_net,
> +	.id   = &vti_net_id,
> +	.size = sizeof(struct vti_net),
> +};
> +
> +static int vti_tunnel_validate(struct nlattr *tb[], struct nlattr *data[])
> +{
> +	return 0;
> +}
> +
> +static void vti_netlink_parms(struct nlattr *data[],
> +				struct ip_tunnel_parm *parms)
> +{
> +	memset(parms, 0, sizeof(*parms));
> +
> +	parms->iph.protocol = IPPROTO_ESP;
> +
> +	if (!data)
> +		return;
> +
> +	if (data[IFLA_VTI_LINK])
> +		parms->link = nla_get_u32(data[IFLA_VTI_LINK]);
> +
> +	if (data[IFLA_VTI_IKEY])
> +		parms->i_key = nla_get_be32(data[IFLA_VTI_IKEY]);
> +
> +	if (data[IFLA_VTI_OKEY])
> +		parms->o_key = nla_get_be32(data[IFLA_VTI_OKEY]);
> +
> +	if (data[IFLA_VTI_LOCAL])
> +		parms->iph.saddr = nla_get_be32(data[IFLA_VTI_LOCAL]);
> +
> +	if (data[IFLA_VTI_REMOTE])
> +		parms->iph.daddr = nla_get_be32(data[IFLA_VTI_REMOTE]);
> +
> +}
> +
> +static int vti_newlink(struct net *src_net, struct net_device *dev, struct nlattr *tb[],
> +			 struct nlattr *data[])
> +{
> +	struct ip_tunnel *nt;
> +	struct net *net = dev_net(dev);
> +	struct vti_net *ipn = net_generic(net, vti_net_id);
> +	int mtu;
> +	int err;
> +
> +	nt = netdev_priv(dev);
> +	vti_netlink_parms(data, &nt->parms);
> +
> +	if (vti_tunnel_locate(net, &nt->parms, 0))
> +		return -EEXIST;
> +
> +	mtu = vti_tunnel_bind_dev(dev);
> +	if (!tb[IFLA_MTU])
> +		dev->mtu = mtu;
> +
> +	err = register_netdevice(dev);
> +	if (err)
> +		goto out;
> +
> +	dev_hold(dev);
> +	vti_tunnel_link(ipn, nt);
> +
> +out:
> +	return err;
> +	return 0;
> +}
> +
> +static int vti_changelink(struct net_device *dev, struct nlattr *tb[],
> +			    struct nlattr *data[])
> +{
> +	struct ip_tunnel *t, *nt;
> +	struct net *net = dev_net(dev);
> +	struct vti_net *ipn = net_generic(net, vti_net_id);
> +	struct ip_tunnel_parm p;
> +	int mtu;
> +
> +	if (dev == ipn->fb_tunnel_dev)
> +		return -EINVAL;
> +
> +	nt = netdev_priv(dev);
> +	vti_netlink_parms(data, &p);
> +
> +	t = vti_tunnel_locate(net, &p, 0);
> +
> +	if (t) {
> +		if (t->dev != dev)
> +			return -EEXIST;
> +	} else {
> +		t = nt;
> +
> +		vti_tunnel_unlink(ipn, t);
> +		t->parms.iph.saddr = p.iph.saddr;
> +		t->parms.iph.daddr = p.iph.daddr;
> +		t->parms.i_key = p.i_key;
> +		t->parms.o_key = p.o_key;
> +		if (dev->type != ARPHRD_ETHER) {
> +			memcpy(dev->dev_addr, &p.iph.saddr, 4);
> +			memcpy(dev->broadcast, &p.iph.daddr, 4);
> +		}
> +		vti_tunnel_link(ipn, t);
> +		netdev_state_change(dev);
> +	}
> +
> +	if (t->parms.link != p.link) {
> +		t->parms.link = p.link;
> +		mtu = vti_tunnel_bind_dev(dev);
> +		if (!tb[IFLA_MTU])
> +			dev->mtu = mtu;
> +		netdev_state_change(dev);
> +	}
> +
> +	return 0;
> +}
> +
> +static size_t vti_get_size(const struct net_device *dev)
> +{
> +	return
> +		/* IFLA_VTI_LINK */
> +		nla_total_size(4) +
> +		/* IFLA_VTI_IKEY */
> +		nla_total_size(4) +
> +		/* IFLA_VTI_OKEY */
> +		nla_total_size(4) +
> +		/* IFLA_VTI_LOCAL */
> +		nla_total_size(4) +
> +		/* IFLA_VTI_REMOTE */
> +		nla_total_size(4) +
> +		0;
> +}
> +
> +static int vti_fill_info(struct sk_buff *skb, const struct net_device *dev)
> +{
> +	struct ip_tunnel *t = netdev_priv(dev);
> +	struct ip_tunnel_parm *p = &t->parms;
> +
> +	nla_put_u32(skb, IFLA_VTI_LINK, p->link);
> +	nla_put_be32(skb, IFLA_VTI_IKEY, p->i_key);
> +	nla_put_be32(skb, IFLA_VTI_OKEY, p->o_key);
> +	nla_put_be32(skb, IFLA_VTI_LOCAL, p->iph.saddr);
> +	nla_put_be32(skb, IFLA_VTI_REMOTE, p->iph.daddr);
> +
> +	return 0;
> +}
> +
> +static const struct nla_policy vti_policy[IFLA_VTI_MAX + 1] = {
> +	[IFLA_VTI_LINK]		= { .type = NLA_U32 },
> +	[IFLA_VTI_IKEY]		= { .type = NLA_U32 },
> +	[IFLA_VTI_OKEY]		= { .type = NLA_U32 },
> +	[IFLA_VTI_LOCAL]	= { .len = FIELD_SIZEOF(struct iphdr, saddr) },
> +	[IFLA_VTI_REMOTE]	= { .len = FIELD_SIZEOF(struct iphdr, daddr) },
> +};
> +
> +static struct rtnl_link_ops vti_link_ops __read_mostly = {
> +	.kind		= "vti",
> +	.maxtype	= IFLA_VTI_MAX,
> +	.policy		= vti_policy,
> +	.priv_size	= sizeof(struct ip_tunnel),
> +	.setup		= vti_tunnel_setup,
> +	.validate	= vti_tunnel_validate,
> +	.newlink	= vti_newlink,
> +	.changelink	= vti_changelink,
> +	.get_size	= vti_get_size,
> +	.fill_info	= vti_fill_info,
> +};
> +
> +static int __init vti_init(void)
> +{
> +	int err;
> +
> +	pr_info("IPv4 over ESP tunneling driver v4\n");
> +
> +	err = register_pernet_device(&vti_net_ops);
> +	if (err < 0)
> +		return err;
> +	err = xfrm4_mode_tunnel_input_register(&vti_handler);
> +	if (err < 0) {
> +		unregister_pernet_device(&vti_net_ops);
> +		pr_info(KERN_INFO "vti init: can't register tunnel\n");
> +	}
> +
> +	err = rtnl_link_register(&vti_link_ops);
> +	if (err < 0)
> +		goto rtnl_link_failed;
> +
> +	return err;
> +
> +rtnl_link_failed:
> +	xfrm4_mode_tunnel_input_deregister(&vti_handler);
> +	unregister_pernet_device(&vti_net_ops);
> +	return err;
> +}
> +
> +static void __exit vti_fini(void)
> +{
> +	rtnl_link_unregister(&vti_link_ops);
> +	if (xfrm4_mode_tunnel_input_deregister(&vti_handler))
> +		pr_info("vti close: can't deregister tunnel\n");
> +
> +	unregister_pernet_device(&vti_net_ops);
> +}
> +
> +module_init(vti_init);
> +module_exit(vti_fini);
> +MODULE_LICENSE("GPL");
> +MODULE_ALIAS_RTNL_LINK("vti");
> +MODULE_ALIAS_NETDEV("ip_vti0");


* Re: [net-next PATCH 01/02] net/ipv4: VTI support rx-path hook in xfrm4_mode_tunnel.
From: Saurabh Mohan @ 2012-06-15  2:44 UTC
  To: Steffen Klassert; +Cc: netdev
In-Reply-To: <20120614092559.GU27795@secunet.com>

Steffen, see inline (@SM).

-Saurabh.

----- Original Message -----
From: "Steffen Klassert" <steffen.klassert@secunet.com>
To: "Saurabh" <saurabh.mohan@vyatta.com>
Cc: netdev@vger.kernel.org
Sent: Thursday, June 14, 2012 2:25:59 AM
Subject: Re: [net-next PATCH 01/02] net/ipv4: VTI support rx-path hook in xfrm4_mode_tunnel.

On Fri, Jun 08, 2012 at 10:32:46AM -0700, Saurabh wrote:
> 
> 
> Add an rx-path hook in xfrm4_mode_tunnel for the VTI tunnel module.
> 
> Signed-off-by: Saurabh Mohan <saurabh.mohan@vyatta.com>
> Reviewed-by: Stephen Hemminger <shemminger@vyatta.com>
> 
> ---
> diff --git a/include/net/xfrm.h b/include/net/xfrm.h
> index e0a55df..04214c0 100644
> --- a/include/net/xfrm.h
> +++ b/include/net/xfrm.h
> @@ -1475,6 +1475,8 @@ extern int xfrm4_output(struct sk_buff *skb);
>  extern int xfrm4_output_finish(struct sk_buff *skb);
>  extern int xfrm4_tunnel_register(struct xfrm_tunnel *handler, unsigned short family);
>  extern int xfrm4_tunnel_deregister(struct xfrm_tunnel *handler, unsigned short family);
> +extern int xfrm4_mode_tunnel_input_register(struct xfrm_tunnel *handler);
> +extern int xfrm4_mode_tunnel_input_deregister(struct xfrm_tunnel *handler);
>  extern int xfrm6_extract_header(struct sk_buff *skb);
>  extern int xfrm6_extract_input(struct xfrm_state *x, struct sk_buff *skb);
>  extern int xfrm6_rcv_spi(struct sk_buff *skb, int nexthdr, __be32 spi);
> diff --git a/net/ipv4/xfrm4_mode_tunnel.c b/net/ipv4/xfrm4_mode_tunnel.c
> index ed4bf11..4fc2944 100644
> --- a/net/ipv4/xfrm4_mode_tunnel.c
> +++ b/net/ipv4/xfrm4_mode_tunnel.c
> @@ -15,6 +15,68 @@
>  #include <net/ip.h>
>  #include <net/xfrm.h>
>  
> +/*
> + * Informational hook. The decap is still done here.
> + */
> +static struct xfrm_tunnel __rcu *rcv_notify_handlers __read_mostly;
> +static DEFINE_MUTEX(xfrm4_mode_tunnel_input_mutex);
> +
> +int xfrm4_mode_tunnel_input_register(struct xfrm_tunnel *handler)
> +{
> +	struct xfrm_tunnel __rcu **pprev;
> +	struct xfrm_tunnel *t;
> +
> +	int ret = -EEXIST;
> +	int priority = handler->priority;
> +
> +	mutex_lock(&xfrm4_mode_tunnel_input_mutex);
> +
> +	for (pprev = &rcv_notify_handlers;
> +		(t = rcu_dereference_protected(*pprev,
> +		lockdep_is_held(&xfrm4_mode_tunnel_input_mutex))) != NULL;
> +		pprev = &t->next) {
> +		if (t->priority > priority)
> +			break;
> +		if (t->priority == priority)
> +			goto err;
> +
> +	}
> +
> +	handler->next = *pprev;
> +	rcu_assign_pointer(*pprev, handler);
> +
> +	ret = 0;
> +
> +err:
> +	mutex_unlock(&xfrm4_mode_tunnel_input_mutex);
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(xfrm4_mode_tunnel_input_register);
> +
> +int xfrm4_mode_tunnel_input_deregister(struct xfrm_tunnel *handler)
> +{
> +	struct xfrm_tunnel __rcu **pprev;
> +	struct xfrm_tunnel *t;
> +	int ret = -ENOENT;
> +
> +	mutex_lock(&xfrm4_mode_tunnel_input_mutex);
> +	for (pprev = &rcv_notify_handlers;
> +		(t = rcu_dereference_protected(*pprev,
> +		lockdep_is_held(&xfrm4_mode_tunnel_input_mutex))) != NULL;
> +		pprev = &t->next) {
> +		if (t == handler) {
> +			*pprev = handler->next;
> +			ret = 0;
> +			break;
> +		}
> +	}
> +	mutex_unlock(&xfrm4_mode_tunnel_input_mutex);
> +	synchronize_net();
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(xfrm4_mode_tunnel_input_deregister);
> +
>  static inline void ipip_ecn_decapsulate(struct sk_buff *skb)
>  {
>  	struct iphdr *inner_iph = ipip_hdr(skb);
> @@ -64,8 +126,14 @@ static int xfrm4_mode_tunnel_output(struct xfrm_state *x, struct sk_buff *skb)
>  	return 0;
>  }
>  
> +#define for_each_input_rcu(head, handler)	\
> +	for (handler = rcu_dereference(head);	\
> +		handler != NULL;		\
> +		handler = rcu_dereference(handler->next))
> +
>  static int xfrm4_mode_tunnel_input(struct xfrm_state *x, struct sk_buff *skb)
>  {
> +	struct xfrm_tunnel *handler;
>  	int err = -EINVAL;
>  
>  	if (XFRM_MODE_SKB_CB(skb)->protocol != IPPROTO_IPIP)
> @@ -74,6 +142,10 @@ static int xfrm4_mode_tunnel_input(struct xfrm_state *x, struct sk_buff *skb)
>  	if (!pskb_may_pull(skb, sizeof(struct iphdr)))
>  		goto out;
>  
> +	/* The handlers do not consume the skb. */
> +	for_each_input_rcu(rcv_notify_handlers, handler)
> +		handler->handler(skb);

I'm not sure if this is the right place to add your handler.
My understanding is that an IPsec tunnel device would
receive the packet first and then do the IPsec processing.
Here it happens the other way around.

@SM: The intention is to get the packet after decryption and ESP decapsulation. Just as with a GRE tunnel, the rx counters account for the payload and NOT the GRE header.

Why didn't you register a tunnel handler and call the
xfrm tunnel handler from that?

@SM: I did not want to repeat the work xfrm4_mode_tunnel already does: it performs the decryption and the decapsulation, so why repeat it?
All the vti tunnel module wants to do on the receive path is bump the counters and change the input device to the vti device.
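The receive-side work described here can be sketched as a small notify handler. This is a user-space model only: struct sketch_dev, struct sketch_skb and sketch_vti_rcv_notify() are hypothetical, the tunnel lookup a real handler would have to perform is replaced by passing the tunnel in directly, and only the two effects described above are modelled (bumping the rx counters and switching the input device). Because the hook runs after decryption and decapsulation, the length counted is that of the inner packet.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for a net_device and an
 * sk_buff; only the fields this sketch touches are modelled. */
struct sketch_dev {
	const char *name;
	unsigned long rx_packets;
	unsigned long rx_bytes;
};

struct sketch_skb {
	struct sketch_dev *dev;	/* current input device */
	unsigned int len;	/* length of the decapsulated inner packet */
};

/* Runs after ESP decryption and tunnel decapsulation, so skb->len
 * covers only the inner payload, not the outer IP/ESP headers.
 * The handler does not consume or modify the packet data: it only
 * bumps the tunnel's rx counters and re-homes the skb on the tunnel
 * device. Returns -1 and leaves the skb untouched if no tunnel matched. */
static int sketch_vti_rcv_notify(struct sketch_skb *skb, struct sketch_dev *tunnel)
{
	if (tunnel == NULL)
		return -1;
	tunnel->rx_packets++;
	tunnel->rx_bytes += skb->len;
	skb->dev = tunnel;
	return 0;
}
```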

Regards,
-Saurabh.
> +
>  	if (skb_cloned(skb) &&
>  	    (err = pskb_expand_head(skb, 0, 0, GFP_ATOMIC)))
>  		goto out;
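The xfrm4_mode_tunnel_input_register()/deregister() pair in the quoted patch keeps the notify handlers in a singly linked list sorted by priority, and the input path walks that list with for_each_input_rcu. A single-threaded user-space sketch of the same list logic follows, assuming no concurrency: the mutex, the __rcu annotations and the RCU publish/wait steps are deliberately omitted (comments mark where the patch uses rcu_assign_pointer() and synchronize_net()), and struct sketch_handler, sketch_count_handler() and the other sketch_* names are hypothetical.

```c
#include <errno.h>
#include <stddef.h>

/* Simplified model of struct xfrm_tunnel: only the fields the list
 * code uses. As in the patch, handlers must not consume the packet. */
struct sketch_handler {
	int (*handler)(void *pkt);
	struct sketch_handler *next;
	int priority;
};

static struct sketch_handler *sketch_head;

/* Insert keeping the list sorted by ascending priority; a duplicate
 * priority is rejected with -EEXIST. Walking a pointer-to-pointer lets
 * head and interior insertions share one code path. */
static int sketch_register(struct sketch_handler *h)
{
	struct sketch_handler **pprev;
	struct sketch_handler *t;

	for (pprev = &sketch_head; (t = *pprev) != NULL; pprev = &t->next) {
		if (t->priority > h->priority)
			break;
		if (t->priority == h->priority)
			return -EEXIST;
	}
	h->next = *pprev;
	*pprev = h;	/* the patch publishes this with rcu_assign_pointer() */
	return 0;
}

/* Unlink h; returns -ENOENT if it was never registered. */
static int sketch_deregister(struct sketch_handler *h)
{
	struct sketch_handler **pprev;

	for (pprev = &sketch_head; *pprev != NULL; pprev = &(*pprev)->next) {
		if (*pprev == h) {
			*pprev = h->next;	/* the patch then waits in synchronize_net() */
			return 0;
		}
	}
	return -ENOENT;
}

/* Call every handler in priority order; returns how many ran. */
static int sketch_notify_all(void *pkt)
{
	struct sketch_handler *h;
	int n = 0;

	for (h = sketch_head; h != NULL; h = h->next) {
		h->handler(pkt);
		n++;
	}
	return n;
}

/* Example handler: only counts how often it was called. */
static int sketch_calls;

static int sketch_count_handler(void *pkt)
{
	(void)pkt;
	sketch_calls++;
	return 0;
}
```

In the kernel version the wait in deregister matters: readers may still be traversing the old list under rcu_read_lock(), so the caller must not free or reuse the handler until synchronize_net() has returned.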

^ permalink raw reply
