Netdev List

Netdev List
 help / color / mirror / Atom feed

* [PATCH] hso: add support for new products
From: f.aben @ 2010-05-25 12:33 UTC (permalink / raw)
  To: davem, gregkh; +Cc: linux-usb, netdev, j.dumon


This patch adds a few new product id's for the hso driver.

Signed-off-by: Filip Aben <f.aben@option.com>
---
diff --git a/drivers/net/usb/hso.c b/drivers/net/usb/hso.c
index be0cc99..8355f0e 100644
--- a/drivers/net/usb/hso.c
+++ b/drivers/net/usb/hso.c
@@ -475,6 +475,9 @@ static const struct usb_device_id hso_ids[] = {
 	{USB_DEVICE(0x0af0, 0x8302)},
 	{USB_DEVICE(0x0af0, 0x8304)},
 	{USB_DEVICE(0x0af0, 0x8400)},
+	{USB_DEVICE(0x0af0, 0x8600)},
+	{USB_DEVICE(0x0af0, 0x8800)},
+	{USB_DEVICE(0x0af0, 0x8900)},
 	{USB_DEVICE(0x0af0, 0xd035)},
 	{USB_DEVICE(0x0af0, 0xd055)},
 	{USB_DEVICE(0x0af0, 0xd155)},

^ permalink raw reply related

* Re: Question about an assignment in handle_ing()
From: jamal @ 2010-05-25 12:20 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Jiri Pirko, netdev, davem, kaber
In-Reply-To: <20100525121202.GA12712@gondor.apana.org.au>

On Tue, 2010-05-25 at 22:12 +1000, Herbert Xu wrote:

> It is guaranteed, it is illegal to hold onto the skb contents
> without the reference that comes from cloning an skb.

As i recall this was an issue. And the example i described earlier
would have tcpdump show edited contents coming out of eth0.

> The DHCP client may keep the AF_PACKET open even if there is no
> ongoing dialogue.

Ok ;-> Not running that dhcp client without looking at how it behaves.

> For ingress, the packet may have looped back from a virtual
> interface.  It may have been cloned on its way out of the virtual
> interface.

and i think this is where you and i ended last time we talked.

> Looking at ptype instead of checking whether the skb is cloned
> simply doesn't work in this case.

I will test in a few days - or if Jiri has the bandwidth i think i can
describe what to test (I have the relevant tests logged somewhere).

I apologize there may be some large latency of about a day before my
next response.

cheers,
jamal

^ permalink raw reply

* Re: [PATCH RFC] netfilter: iptables target SYNPROXY
From: Jozsef Kadlecsik @ 2010-05-25 12:17 UTC (permalink / raw)
  To: Changli Gao
  Cc: Patrick McHardy, David S. Miller, Alexey Kuznetsov, James Morris,
	netfilter-devel, netdev
In-Reply-To: <1274771218-3204-1-git-send-email-xiaosuo@gmail.com>

Hi,

On Tue, 25 May 2010, Changli Gao wrote:

> iptables target SYNPROXY.
> 
> This patch implements an iptables target SYNPROXY, which works in the raw table
> of the PREROUTING chain, before conntracking system. Syncookies is used, so no
> new state is introduced into the conntracking system. In fact, until the first
> connection is established, conntracking system doesn't see any packets. So when
> there is a SYN-flood attack, conntracking system won't be busy on finding and
> deleting the un-assured ct.

My main problem with your target is that by using it, important and useful 
TCP options are lost: timestamp and SACK. That pushes back TCP by almost 
twenty years.

Here you reason for the target that it protects conntrack itself, but in 
the Kconfig text you write that it protects the servers behind the 
firewall. Both can be true, but if the real goal is to defend the servers 
then your target could simply send a faked ACK to complete the three way 
handshake and that way TCP would not be crippled (conntrack timeout 
should still be adjusted).

Best regards,
Jozsef
-
E-mail  : kadlec@blackhole.kfki.hu, kadlec@mail.kfki.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : KFKI Research Institute for Particle and Nuclear Physics
          H-1525 Budapest 114, POB. 49, Hungary

^ permalink raw reply

* Re: Question about an assignment in handle_ing()
From: Herbert Xu @ 2010-05-25 12:12 UTC (permalink / raw)
  To: jamal; +Cc: Jiri Pirko, netdev, davem, kaber
In-Reply-To: <1274789024.3878.919.camel@bigi>

On Tue, May 25, 2010 at 08:03:44AM -0400, jamal wrote:
> On Tue, 2010-05-25 at 20:26 +1000, Herbert Xu wrote:
> 
> > In that case you should be checking whether the skb is cloned.
> 
> That is the general rule used (and what i specify to do in the docs)..
> but i recall there were issues if the packet path emanated from ingress
> and included multiple netdevices (earlier ex with mirror applies). There
> may have been bugs then, eg I could not assume that it i had any ptype
> at all that the ptype will clone the packet. Does tcpdump guarantee
> skb->clone being set? I will try to test some scenarios when i am back
> +settled. 

It is guaranteed, it is illegal to hold onto the skb contents
without the reference that comes from cloning an skb.

> "Most" for people running serious firewalls or routers is not to run
> DHCP servers;-> They may client, but thats a short-lived session.

The DHCP client may keep the AF_PACKET open even if there is no
ongoing dialogue.

> > Also
> > the skb may still be cloned even if there is no AF_PACKET listener.
> > In that case your optimisation may be incorrect.
> 
> Did you mean that as long as there are other ptypes - which may or 
> not be doing af packet? 

For ingress, the packet may have looped back from a virtual
interface.  It may have been cloned on its way out of the virtual
interface.

Looking at ptype instead of checking whether the skb is cloned
simply doesn't work in this case.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: Question about an assignment in handle_ing()
From: jamal @ 2010-05-25 12:03 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Jiri Pirko, netdev, davem, kaber
In-Reply-To: <20100525102603.GA11494@gondor.apana.org.au>

On Tue, 2010-05-25 at 20:26 +1000, Herbert Xu wrote:

> In that case you should be checking whether the skb is cloned.

That is the general rule used (and what i specify to do in the docs)..
but i recall there were issues if the packet path emanated from ingress
and included multiple netdevices (earlier ex with mirror applies). There
may have been bugs then, eg I could not assume that it i had any ptype
at all that the ptype will clone the packet. Does tcpdump guarantee
skb->clone being set? I will try to test some scenarios when i am back
+settled. 

> After all, tcpdump might have simply filtered the packet out.

True - but i think thats an acceptable compromise.

> BTW, this is the case whenever you run a DHCP client/server.  So
> on most boxes your optimisation will never kick in as is.  

"Most" for people running serious firewalls or routers is not to run
DHCP servers;-> They may client, but thats a short-lived session.

> Also
> the skb may still be cloned even if there is no AF_PACKET listener.
> In that case your optimisation may be incorrect.

Did you mean that as long as there are other ptypes - which may or 
not be doing af packet? 

cheers,
jamal


^ permalink raw reply

* Warning in net/ipv4/af_inet.c:154
From: Anton Blanchard @ 2010-05-25 11:58 UTC (permalink / raw)
  To: netdev


Hi,

A build from last night (2.6.34-git9) is hitting the following WARN_ON on
a ppc64 box:

Badness at net/ipv4/af_inet.c:154

NIP [c00000000062b034] .inet_sock_destruct+0x1b0/0x1f0
LR [c0000000005b376c] .__sk_free+0x58/0x154
Call Trace:
[c000000000088c0c] .__local_bh_disable+0xdc/0xfc (unreliable)
[c0000000005b376c] .__sk_free+0x58/0x154
[c0000000005b3978] .sk_free+0x4c/0x60
[c0000000005b3ab4] .sk_common_release+0x128/0x148
[c00000000061da44] .udp_lib_close+0x28/0x40
[c00000000062a990] .inet_release+0xd0/0xf8
[c0000000005adbf8] .sock_release+0x60/0xec
[c0000000005adcd0] .sock_close+0x4c/0x6c
[c000000000186554] .__fput+0x184/0x270
[c00000000018668c] .fput+0x4c/0x60
[c000000000182248] .filp_close+0xc8/0xf4
[c0000000000826d4] .put_files_struct+0xc8/0x158
[c0000000000827d4] .exit_files+0x70/0x8c
[c0000000000848cc] .do_exit+0x2c8/0x82c
[c000000000084efc] .do_group_exit+0xcc/0x100
[c000000000084f5c] .SyS_exit_group+0x2c/0x48
[c000000000008554] syscall_exit+0x0/0x40

Which is:

        WARN_ON(sk->sk_forward_alloc);

Anton

^ permalink raw reply

* Re: [PATCH RFC] netfilter: iptables target SYNPROXY
From: Jan Engelhardt @ 2010-05-25 11:50 UTC (permalink / raw)
  To: Changli Gao
  Cc: Patrick McHardy, David S. Miller, Alexey Kuznetsov, James Morris,
	netfilter-devel, netdev
In-Reply-To: <AANLkTim2DrXuBPUBK0TseqVJ5lxPRQjK0P_QJVsbtYRC@mail.gmail.com>


On Tuesday 2010-05-25 13:26, Changli Gao wrote:
>> On Tuesday 2010-05-25 09:06, Changli Gao wrote:
>>> net/ipv4/netfilter/Kconfig                  |   12
>>> net/ipv4/netfilter/Makefile                 |    1
>>> net/ipv4/netfilter/ipt_SYNPROXY.c           |  658 ++++++++++++++++++++++++++++
>>> net/ipv4/syncookies.c                       |   21
>>> net/ipv4/tcp_ipv4.c                         |    5
>>> net/netfilter/nf_conntrack_core.c           |   44 +
>>
>> Please make this an Xtables extension.
>> There is excellent documentation ("Writing Netfilter Modules" to google)
>> on the details if needed.
>
>Hmm. I don't know IPv6 well. So I leave it as an iptables extension,
>and hope sb. comes on with IPv6 support after it gets merged.

This should still be xt even if it does not do ipv6.

>>>+      th = skb_header_pointer(skb, iph->ihl * 4, sizeof(_th), &_th);
>>>+      BUG_ON(th == NULL);
>>
>> I wouldn't call that a bug. I think that can happen on an evil TCP tinygram
>> (with proper IPV4 header).
>
>I copied the code from file nf_conntrack_proto_tcp.c. In fact, BUG_ON
>isn't useful at all, as tcp_error is called before this function
>is called.

Ah I see it hard-depends on nf_conntrack_ipv4 (in Kconfig).
That too could be done instead by using nf_ct_get_l3proto at runtime.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH RFC] netfilter: iptables target SYNPROXY
From: Changli Gao @ 2010-05-25 11:26 UTC (permalink / raw)
  To: Jan Engelhardt
  Cc: Patrick McHardy, David S. Miller, Alexey Kuznetsov, James Morris,
	netfilter-devel, netdev
In-Reply-To: <alpine.LSU.2.01.1005251207450.1725@obet.zrqbmnf.qr>

On Tue, May 25, 2010 at 6:36 PM, Jan Engelhardt <jengelh@medozas.de> wrote:
>
> On Tuesday 2010-05-25 09:06, Changli Gao wrote:
>> net/ipv4/netfilter/Kconfig                  |   12
>> net/ipv4/netfilter/Makefile                 |    1
>> net/ipv4/netfilter/ipt_SYNPROXY.c           |  658 ++++++++++++++++++++++++++++
>> net/ipv4/syncookies.c                       |   21
>> net/ipv4/tcp_ipv4.c                         |    5
>> net/netfilter/nf_conntrack_core.c           |   44 +
>
> Please make this an Xtables extension.
> There is excellent documentation ("Writing Netfilter Modules" to google)
> on the details if needed.

Hmm. I don't know IPv6 well. So I leave it as an iptables extension,
and hope sb. comes on with IPv6 support after it gets merged.

>
>>--- a/include/net/netfilter/nf_conntrack_core.h
>>+++ b/include/net/netfilter/nf_conntrack_core.h
>>@@ -63,8 +63,21 @@ static inline int nf_conntrack_confirm(struct sk_buff *skb)
>>       if (ct && ct != &nf_conntrack_untracked) {
>>               if (!nf_ct_is_confirmed(ct) && !nf_ct_is_dying(ct))
>>                       ret = __nf_conntrack_confirm(skb);
>>-              if (likely(ret == NF_ACCEPT))
>>+              if (likely(ret == NF_ACCEPT)) {
>>+#if defined(CONFIG_IP_NF_TARGET_SYNPROXY) || \
>>+    defined(CONFIG_IP_NF_TARGET_SYNPROXY_MODULE)
>>+                      int (*syn_proxy)(struct sk_buff *, struct nf_conn *,
>>+                                       enum ip_conntrack_info);
>>+#endif
>>+
>>                       nf_ct_deliver_cached_events(ct);
>>+#if defined(CONFIG_IP_NF_TARGET_SYNPROXY) || \
>>+    defined(CONFIG_IP_NF_TARGET_SYNPROXY_MODULE)
>>+                      syn_proxy = rcu_dereference(syn_proxy_post_hook);
>>+                      if (syn_proxy)
>>+                              ret = syn_proxy(skb, ct, skb->nfctinfo);
>>+#endif
>>+              }
>>       }
>>       return ret;
>> }
>
> I'm sure this is possible with lesser macro intrusion - use a separate
> function. Same in nf_conntrack_core.c

I'm thinking about adding two helper functions into structure
nf_conntrack_l4proto. Then the l4 protocol sepcific code won't be in
nf_conntrack_core.c but in nf_conntrack_proto_tcp.c.

>
>>--- a/include/net/tcp.h
>>+++ b/include/net/tcp.h
>>@@ -460,8 +460,18 @@ extern int                        tcp_disconnect(struct sock *sk, int flags);
>> extern __u32 syncookie_secret[2][16-4+SHA_DIGEST_WORDS];
>> extern struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb,
>>                                   struct ip_options *opt);
>>-extern __u32 cookie_v4_init_sequence(struct sock *sk, struct sk_buff *skb,
>>-                                   __u16 *mss);
>>+extern __u32 __cookie_v4_init_sequence(__be32 saddr, __be32 daddr,
>>+                                     __be16 sport, __be16 dport, __u32 seq,
>>+                                     __u16 *mssp);
>>+static inline __u32 cookie_v4_init_sequence(const struct iphdr *iph,
>>+                                          const struct tcphdr *th,
>>+                                          __u16 *mssp)
>>+{
>>+      return __cookie_v4_init_sequence(iph->saddr, iph->daddr, th->source,
>>+                                       th->dest, ntohl(th->seq), mssp);
>>+}
>
> Since there is only one cookie_v4_init_sequence user AFAICS,
> moving the function there seems to make best sense.

OK. I wanted to keep cookie_v4_init_sequence() and
cookie_v4_check_sequence() symmetric.

>
>>+static int get_mss(u8 *data, int len)
>
> unsigned

I am afraid that I can't do that. As get_mss() may return a negative
value, if there is some invalid TCP option.

>
>>+      /* only support IPv4 now */
>
> Please fix :)

The same reasion as above.

>
>>+      th = skb_header_pointer(skb, iph->ihl * 4, sizeof(_th), &_th);
>>+      BUG_ON(th == NULL);
>
> I wouldn't call that a bug. I think that can happen on an evil TCP tinygram
> (with proper IPV4 header).

I copied the code from file nf_conntrack_proto_tcp.c. In fact, BUG_ON
isn't useful at all, as tcp_error is called before this function
is called.

>
>>+static int syn_proxy_mangle_pkt(struct sk_buff *skb, struct iphdr *iph,
>>+                              struct tcphdr *th, u32 seq_diff)
>>+{
>>+      __be32 new;
>>+      int olen;
>>+
>>+      if (skb->len < (iph->ihl + th->doff) * 4)
>>+              return NF_DROP;
>
> Verdicts are unsigned too. (Also in other functions.)

OK. Thanks.

>
>>+static struct xt_target synproxy_tg_reg __read_mostly = {
>>+      .name           = "SYNPROXY",
>>+      .family         = NFPROTO_IPV4,
>>+      .target         = synproxy_tg,
>>+      .table          = "raw",
>>+      .hooks          = (1 << NF_INET_PRE_ROUTING),
>
> No need for (),

Yea. Thanks.

>
>>+      .proto          = IPPROTO_TCP,
>>+      .me             = THIS_MODULE,
>>+};
>>+
>>+static int __init synproxy_tg_init(void)
>>+{
>>+      int err, cpu;
>>+
>>+      for_each_possible_cpu(cpu)
>>+              per_cpu(syn_proxy_state, cpu).seq_inited = 0;
>
> Static data starts out zeroed. That also applies to percpu data, doesn't it?
>

Yea, Eric has mentioned that.

-- 
Regards，
Changli Gao(xiaosuo@gmail.com)
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH 22/27] net/dccp: Use memdup_user
From: Gerrit Renker @ 2010-05-25 11:23 UTC (permalink / raw)
  To: Julia Lawall
  Cc: Arnaldo Carvalho de Melo, David S. Miller, dccp, netdev,
	linux-kernel, kernel-janitors
In-Reply-To: <Pine.LNX.4.64.1005221025040.13021@ask.diku.dk>

| Signed-off-by: Julia Lawall <julia@diku.dk>
Acked-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>

| 
| ---
|  net/dccp/proto.c |   11 +++--------
|  1 file changed, 3 insertions(+), 8 deletions(-)
| 
| diff --git a/net/dccp/proto.c b/net/dccp/proto.c
| index b03ecf6..f79bcef 100644
| --- a/net/dccp/proto.c
| +++ b/net/dccp/proto.c
| @@ -473,14 +473,9 @@ static int dccp_setsockopt_ccid(struct sock *sk, int type,
|  	if (optlen < 1 || optlen > DCCP_FEAT_MAX_SP_VALS)
|  		return -EINVAL;
|  
| -	val = kmalloc(optlen, GFP_KERNEL);
| -	if (val == NULL)
| -		return -ENOMEM;
| -
| -	if (copy_from_user(val, optval, optlen)) {
| -		kfree(val);
| -		return -EFAULT;
| -	}
| +	val = memdup_user(optval, optlen);
| +	if (IS_ERR(val))
| +		return PTR_ERR(val);
|  
|  	lock_sock(sk);
|  	if (type == DCCP_SOCKOPT_TX_CCID || type == DCCP_SOCKOPT_CCID)

^ permalink raw reply

* Re: [PATCH RFC] netfilter: iptables target SYNPROXY
From: Jan Engelhardt @ 2010-05-25 10:36 UTC (permalink / raw)
  To: Changli Gao
  Cc: Patrick McHardy, David S. Miller, Alexey Kuznetsov, James Morris,
	netfilter-devel, netdev
In-Reply-To: <1274771218-3204-1-git-send-email-xiaosuo@gmail.com>


On Tuesday 2010-05-25 09:06, Changli Gao wrote:
> net/ipv4/netfilter/Kconfig                  |   12 
> net/ipv4/netfilter/Makefile                 |    1 
> net/ipv4/netfilter/ipt_SYNPROXY.c           |  658 ++++++++++++++++++++++++++++
> net/ipv4/syncookies.c                       |   21 
> net/ipv4/tcp_ipv4.c                         |    5 
> net/netfilter/nf_conntrack_core.c           |   44 +

Please make this an Xtables extension.
There is excellent documentation ("Writing Netfilter Modules" to google)
on the details if needed.

>--- a/include/net/netfilter/nf_conntrack_core.h
>+++ b/include/net/netfilter/nf_conntrack_core.h
>@@ -63,8 +63,21 @@ static inline int nf_conntrack_confirm(struct sk_buff *skb)
> 	if (ct && ct != &nf_conntrack_untracked) {
> 		if (!nf_ct_is_confirmed(ct) && !nf_ct_is_dying(ct))
> 			ret = __nf_conntrack_confirm(skb);
>-		if (likely(ret == NF_ACCEPT))
>+		if (likely(ret == NF_ACCEPT)) {
>+#if defined(CONFIG_IP_NF_TARGET_SYNPROXY) || \
>+    defined(CONFIG_IP_NF_TARGET_SYNPROXY_MODULE)
>+			int (*syn_proxy)(struct sk_buff *, struct nf_conn *,
>+					 enum ip_conntrack_info);
>+#endif
>+
> 			nf_ct_deliver_cached_events(ct);
>+#if defined(CONFIG_IP_NF_TARGET_SYNPROXY) || \
>+    defined(CONFIG_IP_NF_TARGET_SYNPROXY_MODULE)
>+			syn_proxy = rcu_dereference(syn_proxy_post_hook);
>+			if (syn_proxy)
>+				ret = syn_proxy(skb, ct, skb->nfctinfo);
>+#endif
>+		}
> 	}
> 	return ret;
> }

I'm sure this is possible with lesser macro intrusion - use a separate
function. Same in nf_conntrack_core.c

>--- a/include/net/tcp.h
>+++ b/include/net/tcp.h
>@@ -460,8 +460,18 @@ extern int			tcp_disconnect(struct sock *sk, int flags);
> extern __u32 syncookie_secret[2][16-4+SHA_DIGEST_WORDS];
> extern struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb, 
> 				    struct ip_options *opt);
>-extern __u32 cookie_v4_init_sequence(struct sock *sk, struct sk_buff *skb, 
>-				     __u16 *mss);
>+extern __u32 __cookie_v4_init_sequence(__be32 saddr, __be32 daddr,
>+				       __be16 sport, __be16 dport, __u32 seq,
>+				       __u16 *mssp);
>+static inline __u32 cookie_v4_init_sequence(const struct iphdr *iph,
>+					    const struct tcphdr *th,
>+					    __u16 *mssp)
>+{
>+	return __cookie_v4_init_sequence(iph->saddr, iph->daddr, th->source,
>+					 th->dest, ntohl(th->seq), mssp);
>+}

Since there is only one cookie_v4_init_sequence user AFAICS,
moving the function there seems to make best sense.

>+static int get_mss(u8 *data, int len)

unsigned

>+	/* only support IPv4 now */

Please fix :)

>+	th = skb_header_pointer(skb, iph->ihl * 4, sizeof(_th), &_th);
>+	BUG_ON(th == NULL);

I wouldn't call that a bug. I think that can happen on an evil TCP tinygram
(with proper IPV4 header).

>+static int syn_proxy_mangle_pkt(struct sk_buff *skb, struct iphdr *iph,
>+				struct tcphdr *th, u32 seq_diff)
>+{
>+	__be32 new;
>+	int olen;
>+
>+	if (skb->len < (iph->ihl + th->doff) * 4)
>+		return NF_DROP;

Verdicts are unsigned too. (Also in other functions.)

>+static struct xt_target synproxy_tg_reg __read_mostly = {
>+	.name		= "SYNPROXY",
>+	.family		= NFPROTO_IPV4,
>+	.target		= synproxy_tg,
>+	.table		= "raw",
>+	.hooks		= (1 << NF_INET_PRE_ROUTING),

No need for (),

>+	.proto		= IPPROTO_TCP,
>+	.me		= THIS_MODULE,
>+};
>+
>+static int __init synproxy_tg_init(void)
>+{
>+	int err, cpu;
>+
>+	for_each_possible_cpu(cpu)
>+		per_cpu(syn_proxy_state, cpu).seq_inited = 0;

Static data starts out zeroed. That also applies to percpu data, doesn't it?


^ permalink raw reply

* Re: Question about an assignment in handle_ing()
From: Herbert Xu @ 2010-05-25 10:26 UTC (permalink / raw)
  To: jamal; +Cc: Jiri Pirko, netdev, davem, kaber
In-Reply-To: <1274781067.3878.872.camel@bigi>

On Tue, May 25, 2010 at 05:51:07AM -0400, jamal wrote:
> The code is correct. 
> 
> Main reason for the else condn is driven by optimization:
> If you are running tcpdump or other af packet type code, the "if" condn
> is hit and matching actions are not allowed trample on that same packet
> data. They have to make a private copy; otherwise, the "else" is hit and
> (for optimization reason) you give ok to the actions that follow to
> munge the packet. Essentially, you dont want actions to alloc/copy every
> single time when you are not running tcpdump for example; reason is that
> most of the time you run tcpdump it is for debugging.
> [I had seen very observable differences on some old mips board back in
> the day on whether you avoided copy every time vs when debugging by
> running tcpdump and copied every packet.]

In that case you should be checking whether the skb is cloned.
After all, tcpdump might have simply filtered the packet out.

BTW, this is the case whenever you run a DHCP client/server.  So
on most boxes your optimisation will never kick in as is.  Also
the skb may still be cloned even if there is no AF_PACKET listener.
In that case your optimisation may be incorrect.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: Namespaces and devices
From: Daniel Lezcano @ 2010-05-25 10:14 UTC (permalink / raw)
  To: Martín Ferrari; +Cc: netdev, Mathieu Lacage
In-Reply-To: <AANLkTimp16eLIO7F65t7-hFgXGbVnoEIW3jGhzWTJlYa@mail.gmail.com>

On 05/25/2010 11:56 AM, Martín Ferrari wrote:
> Hi Daniel,
>
> On Sun, May 23, 2010 at 16:04, Daniel Lezcano<daniel.lezcano@free.fr>  wrote:
>
>    
>> The documentation on this website is a bit out dated. That was the initial
>> behavior but was changed as the following.
>>
>> All the virtual devices are destroyed with the network namespace. The
>> destroyable virtual devices are identified when they have the dellink ops
>> defined. If you can do the 'ip link del' command on this device, then this
>> device type will be destroyed by a netns.
>>      
> Fair enough, thanks for the explanation.
>    

You are welcome ;)

>> About the oops,  was the it "kernel panic when using
>> netns+bridges+tc(netem)" ?
>>      
> No, this is a different issue. Probably it got lost in the noise of
> the thread, but it is here:
> http://thread.gmane.org/gmane.linux.debian.devel.bugs.general/694830/focus=158583
> (it is my second mail in that thread)

Ok, I was able to reproduce it easily. If I have spare time this week, I 
will look at the oops.

>>> Also, I have read somewhere (now I cannot find it) that supposedly, I
>>> should be able to move real devices to a netns, but I always get
>>> Invalid argument errors.
>>>        
>> Yes, that was previously the case with the proof of concept, because sysfs
>> per namespace was enabled. But this feature is not merged upstream yet (but
>> is on the way), so physical devices are not movable across namespaces.
>>      
> OK. The plan is for them to return to netns 1 automatically at destroy?
>    

Yes, this is the plan.

Thanks
   -- Daniel

^ permalink raw reply

* Re: Namespaces and devices
From: Martín Ferrari @ 2010-05-25  9:56 UTC (permalink / raw)
  To: Daniel Lezcano; +Cc: netdev, Mathieu Lacage
In-Reply-To: <4BF935F6.2050907@free.fr>

Hi Daniel,

On Sun, May 23, 2010 at 16:04, Daniel Lezcano <daniel.lezcano@free.fr> wrote:

> The documentation on this website is a bit out dated. That was the initial
> behavior but was changed as the following.
>
> All the virtual devices are destroyed with the network namespace. The
> destroyable virtual devices are identified when they have the dellink ops
> defined. If you can do the 'ip link del' command on this device, then this
> device type will be destroyed by a netns.

Fair enough, thanks for the explanation.

> About the oops,  was the it "kernel panic when using
> netns+bridges+tc(netem)" ?

No, this is a different issue. Probably it got lost in the noise of
the thread, but it is here:
http://thread.gmane.org/gmane.linux.debian.devel.bugs.general/694830/focus=158583
(it is my second mail in that thread)

>> Also, I have read somewhere (now I cannot find it) that supposedly, I
>> should be able to move real devices to a netns, but I always get
>> Invalid argument errors.
>
> Yes, that was previously the case with the proof of concept, because sysfs
> per namespace was enabled. But this feature is not merged upstream yet (but
> is on the way), so physical devices are not movable across namespaces.

OK. The plan is for them to return to netns 1 automatically at destroy?

Thanks!
-- 
Martín Ferrari

^ permalink raw reply

* Re: host panic on kernel 2.6.34
From: Avi Kivity @ 2010-05-25  9:53 UTC (permalink / raw)
  To: Hao, Xudong
  Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, netdev, bridge
In-Reply-To: <BC00F5384FCFC9499AF06F92E8B78A9E04DCCCE242@shsmsx502.ccr.corp.intel.com>

Copying netdev, bridge mailing lists.

On 05/24/2010 11:23 AM, Hao, Xudong wrote:
> Hi all
> I build latest kvm 37dec075a7854f0f550540bf3b9bbeef37c11e2a, based on kernel 2.6.34, after kvm and kvm_intel module loaded, then /etc/init.d/kvm start, a few minutes later, the system will panic.
>
> kernel: 2.6.34
> kvm: 37dec075a7854f0f550540bf3b9bbeef37c11e2a
> qemu-kvm: 69dd59a66aaf56d1e8e4c96d0a0923c9cf8f79a0
>
> BUG: unable to handle kernel NULL pointer dereference at 00000018
> IP: [<f914c05b>] br_mdb_ip_get+0x2e/0x1aa [bridge]
> *pdpt = 0000000035fbb001 *pde = 0000000000000000
> Oops: 0000 [#1] SMP
> last sysfs file: /sys/devices/system/cpu/cpu7/cache/index2/shared_cpu_map
> Modules linked in: bridge stp autofs4 hidp rfcomm l2cap crc16 bluetooth rfkill ]
>
> Pid: 0, comm: swapper Not tainted 2.6.34 #1 X7DWA/X7DWA
> EIP: 0060:[<f914c05b>] EFLAGS: 00010246 CPU: 0
> EIP is at br_mdb_ip_get+0x2e/0x1aa [bridge]
> EAX: c5801d40 EBX: c5801d40 ECX: faffffef EDX: 00000000
> ESI: f67e03c0 EDI: f5249200 EBP: c5801c94 ESP: c5801c80
>   DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> Process swapper (pid: 0, ti=c5801000 task=c07f2fe0 task.ti=c07de000)
> Stack:
>   c5801d40 00000000 c5801d40 f67e03c0 f5249200 c5801cb0 f914c6fd fff90006
> <0>  f67e0940 f6326740 f627e064 f67e03c0 c5801d78 f914dd0c f76af140 f6326740
> <0>  f5249200 f67e03c0 00000014 f6326758 c5801d54 c08eb440 c5801cf4 c5801d00
> Call Trace:
>   [<f914c6fd>] ? br_multicast_leave_group+0x52/0x128 [bridge]
>   [<f914dd0c>] ? br_multicast_rcv+0x6dc/0xe90 [bridge]
>   [<c0650420>] ? fib_lookup+0x2c/0x3a
>   [<c064cd15>] ? fib_validate_source+0x29d/0x2b4
>   [<c0621175>] ? nf_hook_slow+0x3b/0x92
>   [<f9147b39>] ? br_handle_frame_finish+0x53/0x17e [bridge]
>   [<f914b880>] ? br_nf_pre_routing_finish+0x264/0x27c [bridge]
>   [<c0621175>] ? nf_hook_slow+0x3b/0x92
>   [<f914b61c>] ? br_nf_pre_routing_finish+0x0/0x27c [bridge]
>   [<f914bf6f>] ? br_nf_pre_routing+0x553/0x570 [bridge]
>   [<c0621107>] ? nf_iterate+0x2f/0x62
>   [<f9147ae6>] ? br_handle_frame_finish+0x0/0x17e [bridge]
>   [<c0621175>] ? nf_hook_slow+0x3b/0x92
>   [<f9147ae6>] ? br_handle_frame_finish+0x0/0x17e [bridge]
>   [<f9147dda>] ? br_handle_frame+0x176/0x198 [bridge]
>   [<f9147ae6>] ? br_handle_frame_finish+0x0/0x17e [bridge]
>   [<c060643b>] ? __netif_receive_skb+0x29a/0x37e
>   [<c0607023>] ? dev_gro_receive+0xfd/0x1d2
>   [<c0606e03>] ? netif_receive_skb+0x61/0x67
>   [<c0607199>] ? __napi_gro_receive+0xa1/0xba
>   [<c0606e7e>] ? napi_skb_finish+0x1e/0x33
>   [<c0607201>] ? napi_gro_receive+0x20/0x24
>   [<f8867cfc>] ? igb_poll+0x706/0xa39 [igb]
>   [<c06093b2>] ? net_rx_action+0x97/0x13b
>   [<c0430641>] ? __do_softirq+0x80/0xf4
>   [<c04305c1>] ? __do_softirq+0x0/0xf4
>   <IRQ>
>   [<c04305bf>] ? irq_exit+0x29/0x2b
>   [<c040373e>] ? do_IRQ+0x85/0x9b
>   [<c0402ca9>] ? common_interrupt+0x29/0x30
>   [<c0407c4f>] ? mwait_idle+0x4c/0x52
>   [<c0401a08>] ? cpu_idle+0x3a/0x4e
>   [<c066cf16>] ? rest_init+0x62/0x64
>   [<c08248dd>] ? start_kernel+0x2c2/0x2c7
>   [<c08240b3>] ? i386_start_kernel+0xb3/0xb8
> Code: 57 56 53 83 ec 08 89 45 f0 89 55 ec 8b 42 10 66 83 f8 08 74 0e 31 db 66 3
> EIP: [<f914c05b>] br_mdb_ip_get+0x2e/0x1aa [bridge] SS:ESP 0068:c5801c80
> CR2: 0000000000000018
> ---[ end trace 907f878ab4cd8031 ]---
> Kernel panic - not syncing: Fatal exception in interrupt
> Pid: 0, comm: swapper Tainted: G      D     2.6.34 #1
> Call Trace:
>   [<c042c31b>] panic+0x3e/0xaa
>   [<c0681caa>] oops_end+0x8c/0x9b
>   [<c041e710>] no_context+0x153/0x15d
>   [<c041e8a2>] __bad_area_nosemaphore+0xe5/0xed
>   [<c041e90e>] bad_area_nosemaphore+0xd/0x13
>   [<c06838b0>] do_page_fault+0x375/0x37d
>   [<c0650420>] ? fib_lookup+0x2c/0x3a
>   [<c0624431>] ? ip_route_input_common+0x695/0xf2f
>   [<c068353b>] ? do_page_fault+0x0/0x37d
>   [<c06813d6>] error_code+0x66/0x6c
>   [<c068353b>] ? do_page_fault+0x0/0x37d
>   [<f914c05b>] ? br_mdb_ip_get+0x2e/0x1aa [bridge]
>   [<f914c6fd>] br_multicast_leave_group+0x52/0x128 [bridge]
>   [<f914dd0c>] br_multicast_rcv+0x6dc/0xe90 [bridge]
>   [<c0650420>] ? fib_lookup+0x2c/0x3a
>   [<c064cd15>] ? fib_validate_source+0x29d/0x2b4
>   [<c0621175>] ? nf_hook_slow+0x3b/0x92
>   [<f9147b39>] br_handle_frame_finish+0x53/0x17e [bridge]
>   [<f914b880>] br_nf_pre_routing_finish+0x264/0x27c [bridge]
>   [<c0621175>] ? nf_hook_slow+0x3b/0x92
>   [<f914b61c>] ? br_nf_pre_routing_finish+0x0/0x27c [bridge]
>   [<f914bf6f>] br_nf_pre_routing+0x553/0x570 [bridge]
>   [<c0621107>] nf_iterate+0x2f/0x62
>   [<f9147ae6>] ? br_handle_frame_finish+0x0/0x17e [bridge]
>   [<c0621175>] nf_hook_slow+0x3b/0x92
>   [<f9147ae6>] ? br_handle_frame_finish+0x0/0x17e [bridge]
>   [<f9147dda>] br_handle_frame+0x176/0x198 [bridge]
>   [<f9147ae6>] ? br_handle_frame_finish+0x0/0x17e [bridge]
>   [<c060643b>] __netif_receive_skb+0x29a/0x37e
>   [<c0607023>] ? dev_gro_receive+0xfd/0x1d2
>   [<c0606e03>] netif_receive_skb+0x61/0x67
>   [<c0607199>] ? __napi_gro_receive+0xa1/0xba
>   [<c0606e7e>] napi_skb_finish+0x1e/0x33
>   [<c0607201>] napi_gro_receive+0x20/0x24
>   [<f8867cfc>] igb_poll+0x706/0xa39 [igb]
>   [<c06093b2>] net_rx_action+0x97/0x13b
>   [<c0430641>] __do_softirq+0x80/0xf4
>   [<c04305c1>] ? __do_softirq+0x0/0xf4
>   <IRQ>   [<c04305bf>] ? irq_exit+0x29/0x2b
>   [<c040373e>] ? do_IRQ+0x85/0x9b
>   [<c0402ca9>] ? common_interrupt+0x29/0x30
>   [<c0407c4f>] ? mwait_idle+0x4c/0x52
>   [<c0401a08>] ? cpu_idle+0x3a/0x4e
>   [<c066cf16>] ? rest_init+0x62/0x64
>   [<c08248dd>] ? start_kernel+0x2c2/0x2c7
>   [<c08240b3>] ? i386_start_kernel+0xb3/0xb8
>
> Best Regards,
> Xudong Hao
>    


-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply

* Re: [Patch] fix packet loss and massive ping spikes with PPP multi-link
From: Richard Hartmann @ 2010-05-25  9:52 UTC (permalink / raw)
  To: linux-kernel, netdev, linux-ppp, erik_list
In-Reply-To: <2d460de71003260850x7f90d04cy79ac853464108182@mail.gmail.com>

Hi all,

unfortunately, I lack the thread [1] in which John and Erik so I am
replying in my initial thread.

Unfortunately, our own patch is still in its initial state even though
we tried to clean it up to the Kernel's standard.

As you can see in [1], there is a real problem with multilink ppp. Thus,
I wanted to take the opportunity to ask for someone who is more familiar
with both C and the Linux Kernel than we are to have a go at getting
this code cleaned up and into mainline.

Any and all feedback appreciated,
Richard

[1] http://marc.info/?l=linux-ppp&m=127429724125841&w=2

^ permalink raw reply

* Re: Question about an assignment in handle_ing()
From: jamal @ 2010-05-25  9:51 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: netdev, davem, herbert, kaber
In-Reply-To: <20100524112236.GF2810@psychotron.lab.eng.brq.redhat.com>

Hi Jiri,

On Mon, 2010-05-24 at 13:22 +0200, Jiri Pirko wrote:

> Question is if this code is correct here. Maybe I'm missing something but
> why is this dependent on a ptype was found previously?

The code is correct. 

Main reason for the else condn is driven by optimization:
If you are running tcpdump or other af packet type code, the "if" condn
is hit and matching actions are not allowed trample on that same packet
data. They have to make a private copy; otherwise, the "else" is hit and
(for optimization reason) you give ok to the actions that follow to
munge the packet. Essentially, you dont want actions to alloc/copy every
single time when you are not running tcpdump for example; reason is that
most of the time you run tcpdump it is for debugging.
[I had seen very observable differences on some old mips board back in
the day on whether you avoided copy every time vs when debugging by
running tcpdump and copied every packet.]

There is a secondary reason for the else stmnt: you want tcpdump to see
the naked packet as it came on say eth0. I am in travel mode at the
moment, so i cant validate this for you via a testcase (which moves the
else outside), but i know this was a problem back then...
Example. If i had a packet sequence path as follows: 
--> eth0-->tcpdump0-->filter match --> action: edit -->action: redirect
to dummy0-->tcpdump1-->dummy0(drop and count)
Then you want the packet on tcpdump0 to be whatever was seen by eth0.
You want to see on tcpdump1 whatever was seen by dummy0 - which is an
edited version of whatever was seen by eth0. 

If this description makes sense to you (and since this has come up more
than once before), would you be kind enough to submit a patch that fixes
the current comment and add my ACK to it?

cheers,
jamal

^ permalink raw reply

* RE: ixgbe: macvlan on PF/VF when SRIOV is enabled
From: Shirley Ma @ 2010-05-25  9:37 UTC (permalink / raw)
  To: Rose, Gregory V
  Cc: Kirsher, Jeffrey T, davem@davemloft.net, kvm@vger.kernel.org,
	netdev@vger.kernel.org, e1000-devel@lists.sourceforge.net
In-Reply-To: <43F901BD926A4E43B106BF17856F0755A79C0E05@orsmsx508.amr.corp.intel.com>

On Mon, 2010-05-24 at 10:54 -0700, Rose, Gregory V wrote:
> We look forward to it and will be happy to provide feedback.
I have submitted the patch to make macvlan on PF works when SRIOV is
enabled.

> One thing you can do is allocate VFs and then load the VF driver in
> your host domain and then assign each of them a macvlan filter.  You'd
> get a similar effect.

That's I am trying to make it work for macvlan on VFs in host domain. I
need to add VF secondary addresses in address filter, right?

Do you have any aggregation performance comparison between multiple
macvlans on PF and single macvlan per VF in host domain? I will run some
test to figure it out. If you have some data to share that would be
great.

Thanks
Shirley

^ permalink raw reply

* Re: [PATCH 0/2] fixes to arp_notify for virtual machine migration use case
From: Ian Campbell @ 2010-05-25  9:20 UTC (permalink / raw)
  To: David Miller
  Cc: netdev@vger.kernel.org, shemminger@linux-foundation.org,
	Jeremy Fitzhardinge, stable@kernel.org
In-Reply-To: <20100523.233722.232536798.davem@davemloft.net>

On Mon, 2010-05-24 at 07:37 +0100, David Miller wrote:
> From: Ian Campbell <Ian.Campbell@citrix.com>
> Date: Wed, 12 May 2010 14:39:14 +0100
> 
> > Ian Campbell (2):
> >       arp_notify: generate broadcast ARP reply not request.
> >       arp_notify: generate arp_notify event on NETDEV_CHANGE too
> 
> I don't agree with these changes.
> 
> For the first one, I think the documentation is just wrong and the
> code is what expresses the intent.  The idea is not to spam the
> world with a broadcast, only interested parties.

OK. So what is currently sent is a gratuitous ARP request which I hadn't
realised was a valid/normal thing to send but a bit of research shows
that it is.

In terms of MAC learning tables (which is what is interesting in the VM
migration case) I think the gratuitous ARP request and reply are
equivalent.

I guess the difference is that an ARP request doesn't cause everyone to
add an entry to their ARP cache, whereas a bcast ARP reply would cause
an ARP cache entry to appear even on hosts which may never talk to the
VM. Is that what you meant by spamming the world?

Anyway, I'll patch the docs instead in my next submission.

> Patch #2 I have major issues with, carriers flapping occaisionally is
> very common.  I have several interfaces which do this even on lightly
> loaded networks.  Iff we decide to do something like this (big "if")
> it would need to be rate limited so that it doesn't trigger due to
> normal flapping.

FWIW arp_notify is an option which is disabled by default.

> If you want your VM networking devices to trigger this event maybe
> the best thing to do is to create a special notification which
> allows us to prevent from doing this ARP notify for spurious physical
> network device carrier flaps.

I actually started down this path before I figured out how to make the
existing notifications work for my use cases.

Anyway, assuming the fact that arp_notify is disabled by default hasn't
changed your mind, would adding NETDEV_NOTIFY_PEERS triggered by
netif_notify_peers() be appropriate or would it be preferable to simply
add netif_notify_peers() which generates the existing NETDEV_CHANGEADDR?

I don't think there's any need for a new sysctl so I'll gate the new
option on the existing arp_notify one.

Ian.

^ permalink raw reply

* Re: [PATCH] vhost: Fix host panic if ioctl called with wrong index
From: Michael S. Tsirkin @ 2010-05-25  8:13 UTC (permalink / raw)
  To: Krishna Kumar; +Cc: netdev, kvm
In-Reply-To: <20100525054036.2022.66692.sendpatchset@krkumar2.in.ibm.com>

On Tue, May 25, 2010 at 11:10:36AM +0530, Krishna Kumar wrote:
> From: Krishna Kumar <krkumar2@in.ibm.com>
> 
> Missed a boundary value check in vhost_set_vring. The host panics if
> idx == nvqs is used in ioctl commands in vhost_virtqueue_init.
> 
> Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>

Thanks, applied.

> ---
>  drivers/vhost/vhost.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff -ruNp org/drivers/vhost/vhost.c new/drivers/vhost/vhost.c
> --- org/drivers/vhost/vhost.c	2010-05-24 09:25:57.000000000 +0530
> +++ new/drivers/vhost/vhost.c	2010-05-24 09:26:53.000000000 +0530
> @@ -374,7 +374,7 @@ static long vhost_set_vring(struct vhost
>  	r = get_user(idx, idxp);
>  	if (r < 0)
>  		return r;
> -	if (idx > d->nvqs)
> +	if (idx >= d->nvqs)
>  		return -ENOBUFS;
>  
>  	vq = d->vqs + idx;

^ permalink raw reply

* [PATCH net-2.6] be2net: Bug fix to avoid disabling bottom half during firmware upgrade.
From: Sarveshwar Bandi @ 2010-05-25  8:15 UTC (permalink / raw)
  To: netdev; +Cc: davem

Certain firmware commands/operations to upgrade firmware could take several
seconds to complete. The code presently disables bottom half during these
operations which could lead to unpredictable behaviour in certain cases. This
patch now does all firmware upgrade operations asynchronously using a
completion variable.

Signed-off-by: Sarveshwar Bandi <sarveshwarb@serverengines.com>
---
 drivers/net/benet/be.h      |    2 ++
 drivers/net/benet/be_cmds.c |   19 +++++++++++++++++--
 drivers/net/benet/be_main.c |    1 +
 3 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/drivers/net/benet/be.h b/drivers/net/benet/be.h
index 373c1a5..b46be49 100644
--- a/drivers/net/benet/be.h
+++ b/drivers/net/benet/be.h
@@ -283,6 +283,8 @@ struct be_adapter {
 	u8 port_type;
 	u8 transceiver;
 	u8 generation;		/* BladeEngine ASIC generation */
+	u32 flash_status;
+	struct completion flash_compl;
 
 	bool sriov_enabled;
 	u32 vf_if_handle[BE_MAX_VF];
diff --git a/drivers/net/benet/be_cmds.c b/drivers/net/benet/be_cmds.c
index e79bf8b..c911bfb 100644
--- a/drivers/net/benet/be_cmds.c
+++ b/drivers/net/benet/be_cmds.c
@@ -59,6 +59,13 @@ static int be_mcc_compl_process(struct b
 
 	compl_status = (compl->status >> CQE_STATUS_COMPL_SHIFT) &
 				CQE_STATUS_COMPL_MASK;
+
+	if ((compl->tag0 == OPCODE_COMMON_WRITE_FLASHROM) &&
+		(compl->tag1 == CMD_SUBSYSTEM_COMMON)) {
+		adapter->flash_status = compl_status;
+		complete(&adapter->flash_compl);
+	}
+
 	if (compl_status == MCC_STATUS_SUCCESS) {
 		if (compl->tag0 == OPCODE_ETH_GET_STATISTICS) {
 			struct be_cmd_resp_get_stats *resp =
@@ -1417,6 +1424,7 @@ int be_cmd_write_flashrom(struct be_adap
 	int status;
 
 	spin_lock_bh(&adapter->mcc_lock);
+	adapter->flash_status = 0;
 
 	wrb = wrb_from_mccq(adapter);
 	if (!wrb) {
@@ -1428,6 +1436,7 @@ int be_cmd_write_flashrom(struct be_adap
 
 	be_wrb_hdr_prepare(wrb, cmd->size, false, 1,
 			OPCODE_COMMON_WRITE_FLASHROM);
+	wrb->tag1 = CMD_SUBSYSTEM_COMMON;
 
 	be_cmd_hdr_prepare(&req->hdr, CMD_SUBSYSTEM_COMMON,
 		OPCODE_COMMON_WRITE_FLASHROM, cmd->size);
@@ -1439,10 +1448,16 @@ int be_cmd_write_flashrom(struct be_adap
 	req->params.op_code = cpu_to_le32(flash_opcode);
 	req->params.data_buf_size = cpu_to_le32(buf_size);
 
-	status = be_mcc_notify_wait(adapter);
+	be_mcc_notify(adapter);
+	spin_unlock_bh(&adapter->mcc_lock);
+
+	if (!wait_for_completion_timeout(&adapter->flash_compl,
+			msecs_to_jiffies(12000)))
+		status = -1;
+	else
+		status = adapter->flash_status;
 
 err:
-	spin_unlock_bh(&adapter->mcc_lock);
 	return status;
 }
 
diff --git a/drivers/net/benet/be_main.c b/drivers/net/benet/be_main.c
index 1c79c20..aa065c7 100644
--- a/drivers/net/benet/be_main.c
+++ b/drivers/net/benet/be_main.c
@@ -2319,6 +2319,7 @@ static int be_ctrl_init(struct be_adapte
 	spin_lock_init(&adapter->mcc_lock);
 	spin_lock_init(&adapter->mcc_cq_lock);
 
+	init_completion(&adapter->flash_compl);
 	pci_save_state(adapter->pdev);
 	return 0;
 
-- 
1.4.0


^ permalink raw reply related

* Re: [stable-2.6.32 PATCH] ixgbe: backport bug fix for tx panic
From: Jeff Kirsher @ 2010-05-25  7:15 UTC (permalink / raw)
  To: stable, greg; +Cc: netdev, linux-kernel, davem, Brandon, Jesse Brandeburg
In-Reply-To: <20100511004655.30590.74584.stgit@localhost.localdomain>

On Mon, May 10, 2010 at 17:46, Jeff Kirsher <jeffrey.t.kirsher@intel.com> wrote:
> From: Jesse Brandeburg <jesse.brandeburg@intel.com>
>
> backporting this commit:
>
> commit fdd3d631cddad20ad9d3e1eb7dbf26825a8a121f
> Author: Krishna Kumar <krkumar2@in.ibm.com>
> Date:   Wed Feb 3 13:13:10 2010 +0000
>
>    ixgbe: Fix return of invalid txq
>
>    a developer had complained of getting lots of warnings:
>
>    "eth16 selects TX queue 98, but real number of TX queues is 64"
>
>    http://www.mail-archive.com/e1000-devel@lists.sourceforge.net/msg02200.html
>
>    As there was no follow up on that bug, I am submitting this
>    patch assuming that the other return points will not return
>    invalid txq's, and also that this fixes the bug (not tested).
>
>    Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
>    Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
>    Acked-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
>    Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
>    Signed-off-by: David S. Miller <davem@davemloft.net>
>
> CC: Brandon <brandon@ifup.org>
> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
> ---
>
>  drivers/net/ixgbe/ixgbe_main.c |    8 ++++++--
>  1 files changed, 6 insertions(+), 2 deletions(-)
>

Greg - status?  Did you queue this patch for the stable release and I missed it?

-- 
Cheers,
Jeff

^ permalink raw reply

* [PATCH RFC] netfilter: iptables target SYNPROXY
From: Changli Gao @ 2010-05-25  7:06 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: David S. Miller, Alexey Kuznetsov, James Morris, netfilter-devel,
	netdev, Changli Gao

iptables target SYNPROXY.

This patch implements an iptables target SYNPROXY, which works in the raw table
of the PREROUTING chain, before conntracking system. Syncookies is used, so no
new state is introduced into the conntracking system. In fact, until the first
connection is established, conntracking system doesn't see any packets. So when
there is a SYN-flood attack, conntracking system won't be busy on finding and
deleting the un-assured ct.

As the SYN-packet of the second connection request is sent locally, the DNAT
rules which are in the PREROUTING chain should be moved to the OUTPUT chain.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
----
 include/net/netfilter/nf_conntrack.h        |    8 
 include/net/netfilter/nf_conntrack_core.h   |   15 
 include/net/netfilter/nf_conntrack_extend.h |    2 
 include/net/tcp.h                           |   14 
 net/ipv4/netfilter/Kconfig                  |   12 
 net/ipv4/netfilter/Makefile                 |    1 
 net/ipv4/netfilter/ipt_SYNPROXY.c           |  658 ++++++++++++++++++++++++++++
 net/ipv4/syncookies.c                       |   21 
 net/ipv4/tcp_ipv4.c                         |    5 
 net/netfilter/nf_conntrack_core.c           |   44 +
 10 files changed, 764 insertions(+), 16 deletions(-)
diff --git a/include/net/netfilter/nf_conntrack.h b/include/net/netfilter/nf_conntrack.h
index bde095f..001e6ee 100644
--- a/include/net/netfilter/nf_conntrack.h
+++ b/include/net/netfilter/nf_conntrack.h
@@ -310,5 +310,13 @@ do {							\
 #define MODULE_ALIAS_NFCT_HELPER(helper) \
         MODULE_ALIAS("nfct-helper-" helper)
 
+#if defined(CONFIG_IP_NF_TARGET_SYNPROXY) || \
+    defined(CONFIG_IP_NF_TARGET_SYNPROXY_MODULE)
+extern int (*syn_proxy_pre_hook)(struct sk_buff *skb, struct nf_conn *ct,
+				 enum ip_conntrack_info ctinfo);
+
+extern int (*syn_proxy_post_hook)(struct sk_buff *skb, struct nf_conn *ct,
+				  enum ip_conntrack_info ctinfo);
+#endif
 #endif /* __KERNEL__ */
 #endif /* _NF_CONNTRACK_H */
diff --git a/include/net/netfilter/nf_conntrack_core.h b/include/net/netfilter/nf_conntrack_core.h
index dffde8e..ae7d4be 100644
--- a/include/net/netfilter/nf_conntrack_core.h
+++ b/include/net/netfilter/nf_conntrack_core.h
@@ -63,8 +63,21 @@ static inline int nf_conntrack_confirm(struct sk_buff *skb)
 	if (ct && ct != &nf_conntrack_untracked) {
 		if (!nf_ct_is_confirmed(ct) && !nf_ct_is_dying(ct))
 			ret = __nf_conntrack_confirm(skb);
-		if (likely(ret == NF_ACCEPT))
+		if (likely(ret == NF_ACCEPT)) {
+#if defined(CONFIG_IP_NF_TARGET_SYNPROXY) || \
+    defined(CONFIG_IP_NF_TARGET_SYNPROXY_MODULE)
+			int (*syn_proxy)(struct sk_buff *, struct nf_conn *,
+					 enum ip_conntrack_info);
+#endif
+
 			nf_ct_deliver_cached_events(ct);
+#if defined(CONFIG_IP_NF_TARGET_SYNPROXY) || \
+    defined(CONFIG_IP_NF_TARGET_SYNPROXY_MODULE)
+			syn_proxy = rcu_dereference(syn_proxy_post_hook);
+			if (syn_proxy)
+				ret = syn_proxy(skb, ct, skb->nfctinfo);
+#endif
+		}
 	}
 	return ret;
 }
diff --git a/include/net/netfilter/nf_conntrack_extend.h b/include/net/netfilter/nf_conntrack_extend.h
index 32d15bd..b2ae7e9 100644
--- a/include/net/netfilter/nf_conntrack_extend.h
+++ b/include/net/netfilter/nf_conntrack_extend.h
@@ -11,6 +11,7 @@ enum nf_ct_ext_id {
 	NF_CT_EXT_ACCT,
 	NF_CT_EXT_ECACHE,
 	NF_CT_EXT_ZONE,
+	NF_CT_EXT_SYNPROXY,
 	NF_CT_EXT_NUM,
 };
 
@@ -19,6 +20,7 @@ enum nf_ct_ext_id {
 #define NF_CT_EXT_ACCT_TYPE struct nf_conn_counter
 #define NF_CT_EXT_ECACHE_TYPE struct nf_conntrack_ecache
 #define NF_CT_EXT_ZONE_TYPE struct nf_conntrack_zone
+#define NF_CT_EXT_SYNPROXY_TYPE struct syn_proxy_state
 
 /* Extensions: optional stuff which isn't permanently in struct. */
 struct nf_ct_ext {
diff --git a/include/net/tcp.h b/include/net/tcp.h
index a144914..b1d59c2 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -460,8 +460,18 @@ extern int			tcp_disconnect(struct sock *sk, int flags);
 extern __u32 syncookie_secret[2][16-4+SHA_DIGEST_WORDS];
 extern struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb, 
 				    struct ip_options *opt);
-extern __u32 cookie_v4_init_sequence(struct sock *sk, struct sk_buff *skb, 
-				     __u16 *mss);
+extern __u32 __cookie_v4_init_sequence(__be32 saddr, __be32 daddr,
+				       __be16 sport, __be16 dport, __u32 seq,
+				       __u16 *mssp);
+static inline __u32 cookie_v4_init_sequence(const struct iphdr *iph,
+					    const struct tcphdr *th,
+					    __u16 *mssp)
+{
+	return __cookie_v4_init_sequence(iph->saddr, iph->daddr, th->source,
+					 th->dest, ntohl(th->seq), mssp);
+}
+extern int cookie_v4_check_sequence(const struct iphdr *iph,
+				    const struct tcphdr *th, __u32 cookie);
 
 extern __u32 cookie_init_timestamp(struct request_sock *req);
 extern void cookie_check_timestamp(struct tcp_options_received *tcp_opt);
diff --git a/net/ipv4/netfilter/Kconfig b/net/ipv4/netfilter/Kconfig
index 1833bdb..4c0b5df 100644
--- a/net/ipv4/netfilter/Kconfig
+++ b/net/ipv4/netfilter/Kconfig
@@ -343,6 +343,18 @@ config IP_NF_RAW
 	  If you want to compile it as a module, say M here and read
 	  <file:Documentation/kbuild/modules.txt>.  If unsure, say `N'.
 
+config IP_NF_TARGET_SYNPROXY
+	tristate "SYNPROXY target support (EXPERIMENTAL)"
+	depends on IP_NF_RAW && EXPERIMENTAL
+	depends on NF_CONNTRACK_IPV4
+	depends on SYN_COOKIES
+	help
+	  The SYNPROXY target allows a raw rule to specify that some TCP
+	  connections are relayed to protect the TCP servers from the SYN-flood
+	  DoS attacks.
+
+	  To compile it as a module, choose M here.  If unsure, say N.
+
 # security table for MAC policy
 config IP_NF_SECURITY
 	tristate "Security table"
diff --git a/net/ipv4/netfilter/Makefile b/net/ipv4/netfilter/Makefile
index 4811159..ae6a688 100644
--- a/net/ipv4/netfilter/Makefile
+++ b/net/ipv4/netfilter/Makefile
@@ -61,6 +61,7 @@ obj-$(CONFIG_IP_NF_TARGET_NETMAP) += ipt_NETMAP.o
 obj-$(CONFIG_IP_NF_TARGET_REDIRECT) += ipt_REDIRECT.o
 obj-$(CONFIG_IP_NF_TARGET_REJECT) += ipt_REJECT.o
 obj-$(CONFIG_IP_NF_TARGET_ULOG) += ipt_ULOG.o
+obj-$(CONFIG_IP_NF_TARGET_SYNPROXY) += ipt_SYNPROXY.o
 
 # generic ARP tables
 obj-$(CONFIG_IP_NF_ARPTABLES) += arp_tables.o
diff --git a/net/ipv4/netfilter/ipt_SYNPROXY.c b/net/ipv4/netfilter/ipt_SYNPROXY.c
new file mode 100644
index 0000000..2794aac
--- /dev/null
+++ b/net/ipv4/netfilter/ipt_SYNPROXY.c
@@ -0,0 +1,658 @@
+/* (C) 2010- Changli Gao <xiaosuo@gmail.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * It bases on ipt_REJECT.c
+ */
+#define pr_fmt(fmt) "SYNPROXY: " fmt
+#include <linux/module.h>
+#include <linux/skbuff.h>
+#include <linux/slab.h>
+#include <linux/ip.h>
+#include <linux/udp.h>
+#include <linux/icmp.h>
+#include <linux/unaligned/access_ok.h>
+#include <net/icmp.h>
+#include <net/ip.h>
+#include <net/tcp.h>
+#include <net/route.h>
+#include <net/dst.h>
+#include <net/netfilter/nf_conntrack.h>
+#include <net/netfilter/nf_conntrack_extend.h>
+#include <linux/netfilter/x_tables.h>
+#include <linux/netfilter_ipv4/ip_tables.h>
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Changli Gao <xiaosuo@gmail.com>");
+MODULE_DESCRIPTION("Xtables: \"SYNPROXY\" target for IPv4");
+
+enum {
+	TCP_SEND_FLAG_NOTRACE	= 0x1,
+	TCP_SEND_FLAG_SYNCOOKIE	= 0x2,
+	TCP_SEND_FLAG_ACK2SYN	= 0x4,
+};
+
+struct syn_proxy_state {
+	u16	seq_inited;
+	__be16	window;
+	u32	seq_diff;
+};
+
+static int get_mtu(const struct dst_entry *dst)
+{
+	int mtu;
+
+	mtu = dst_mtu(dst);
+	if (mtu)
+		return mtu;
+
+	return dst->dev ? dst->dev->mtu : 0;
+}
+
+static int get_advmss(const struct dst_entry *dst)
+{
+	int advmss;
+
+	advmss = dst_metric(dst, RTAX_ADVMSS);
+	if (advmss)
+		return advmss;
+	advmss = get_mtu(dst);
+	if (advmss)
+		return advmss - (sizeof(struct iphdr) + sizeof(struct tcphdr));
+
+	return TCP_MSS_DEFAULT;
+}
+
+static int syn_proxy_route(struct sk_buff *skb, struct net *net, u16 *pmss)
+{
+	const struct iphdr *iph = ip_hdr(skb);
+	struct rtable *rt;
+	struct flowi fl = {};
+	unsigned int type;
+	int flags = 0;
+	int err;
+	u16 mss;
+
+	type = inet_addr_type(net, iph->saddr);
+	if (type != RTN_LOCAL) {
+		type = inet_addr_type(net, iph->daddr);
+		if (type == RTN_LOCAL)
+			flags |= FLOWI_FLAG_ANYSRC;
+	}
+
+	if (type == RTN_LOCAL) {
+		fl.nl_u.ip4_u.daddr = iph->daddr;
+		fl.nl_u.ip4_u.saddr = iph->saddr;
+		fl.nl_u.ip4_u.tos = RT_TOS(iph->tos);
+		fl.flags = flags;
+		err = ip_route_output_key(net, &rt, &fl);
+		if (err)
+			goto out;
+
+		skb_dst_set(skb, &rt->u.dst);
+	} else {
+		/* non-local src, find valid iif to satisfy
+		 * rp-filter when calling ip_route_input. */
+		fl.nl_u.ip4_u.daddr = iph->saddr;
+		err = ip_route_output_key(net, &rt, &fl);
+		if (err)
+			goto out;
+
+		err = ip_route_input(skb, iph->daddr, iph->saddr,
+				     RT_TOS(iph->tos), rt->u.dst.dev);
+		if (err) {
+			dst_release(&rt->u.dst);
+			goto out;
+		}
+		if (pmss) {
+			mss = get_advmss(&rt->u.dst);
+			if (*pmss > mss)
+				*pmss = mss;
+		}
+		dst_release(&rt->u.dst);
+	}
+
+	err = skb_dst(skb)->error;
+	if (!err && pmss) {
+		mss = get_advmss(skb_dst(skb));
+		if (*pmss > mss)
+			*pmss = mss;
+	}
+
+out:
+	return err;
+}
+
+static int tcp_send(__be32 src, __be32 dst, __be16 sport, __be16 dport,
+		    u32 seq, u32 ack_seq, __be16 window, u16 mss,
+		    __be32 tcp_flags, u8 tos, struct net_device *dev, int flags,
+		    struct sk_buff *oskb)
+{
+	struct sk_buff *skb;
+	struct iphdr *iph;
+	struct tcphdr *th;
+	int err, len;
+
+	len = sizeof(*th);
+	if (mss)
+		len += TCPOLEN_MSS;
+
+	skb = NULL;
+	/* caller must give me a large enough oskb */
+	if (oskb) {
+		unsigned char *odata = oskb->data;
+
+		if (skb_recycle_check(oskb, 0)) {
+			oskb->data = odata;
+			skb_reset_tail_pointer(oskb);
+			skb = oskb;
+			pr_debug("recycle skb\n");
+		}
+	}
+	if (!skb) {
+		skb = alloc_skb(LL_MAX_HEADER + sizeof(*iph) + len, GFP_ATOMIC);
+		if (!skb) {
+			err = -ENOMEM;
+			goto out;
+		}
+		skb_reserve(skb, LL_MAX_HEADER);
+	}
+
+	skb_reset_network_header(skb);
+	if (!(flags & TCP_SEND_FLAG_ACK2SYN) || skb != oskb) {
+		iph = (struct iphdr *)skb_put(skb, sizeof(*iph));
+		iph->version	= 4;
+		iph->ihl	= sizeof(*iph) / 4;
+		iph->tos	= tos;
+		/* tot_len is set in ip_local_out() */
+		iph->id		= 0;
+		iph->frag_off	= htons(IP_DF);
+		iph->protocol	= IPPROTO_TCP;
+		iph->saddr	= src;
+		iph->daddr	= dst;
+		th = (struct tcphdr *)skb_put(skb, len);
+		th->source	= sport;
+		th->dest	= dport;
+	} else {
+		iph = (struct iphdr *)skb->data;
+		iph->id		= 0;
+		iph->frag_off	= htons(IP_DF);
+		skb_put(skb, iph->ihl * 4 + len);
+		th = (struct tcphdr *)(skb->data + iph->ihl * 4);
+	}
+
+	th->seq		= htonl(seq);
+	th->ack_seq	= htonl(ack_seq);
+	tcp_flag_word(th) = tcp_flags;
+	th->doff	= len / 4;
+	th->window	= window;
+	th->urg_ptr	= 0;
+
+	if ((flags & TCP_SEND_FLAG_SYNCOOKIE) && mss)
+		err = syn_proxy_route(skb, dev_net(dev), &mss);
+	else
+		err = syn_proxy_route(skb, dev_net(dev), NULL);
+	if (err)
+		goto err_out;
+
+	if ((flags & TCP_SEND_FLAG_SYNCOOKIE)) {
+		if (mss) {
+			th->seq = htonl(__cookie_v4_init_sequence(dst, src,
+								  dport, sport,
+								  ack_seq - 1,
+								  &mss));
+		} else {
+			mss = TCP_MSS_DEFAULT;
+			th->seq = htonl(__cookie_v4_init_sequence(dst, src,
+								  dport, sport,
+								  ack_seq - 1,
+								  &mss));
+			mss = 0;
+		}
+	}
+
+	if (mss)
+		* (__force __be32 *)(th + 1) = htonl((TCPOPT_MSS << 24) |
+						     (TCPOLEN_MSS << 16) |
+						     mss);
+	skb->ip_summed = CHECKSUM_PARTIAL;
+	th->check = ~tcp_v4_check(len, src, dst, 0);
+	skb->csum_start = (unsigned char *)th - skb->head;
+	skb->csum_offset = offsetof(struct tcphdr, check);
+
+	if (!(flags & TCP_SEND_FLAG_ACK2SYN) || skb != oskb)
+		iph->ttl	= dst_metric(skb_dst(skb), RTAX_HOPLIMIT);
+
+	if (skb->len > get_mtu(skb_dst(skb))) {
+		if (printk_ratelimit())
+			pr_warning("%s has smaller mtu: %d\n",
+				   skb_dst(skb)->dev->name,
+				   get_mtu(skb_dst(skb)));
+		err = -EINVAL;
+		goto err_out;
+	}
+
+	if ((flags & TCP_SEND_FLAG_NOTRACE)) {
+		skb->nfct = &nf_conntrack_untracked.ct_general;
+		skb->nfctinfo = IP_CT_NEW;
+		nf_conntrack_get(skb->nfct);
+	}
+
+	pr_debug("ip_local_out: %pI4n:%hu -> %pI4n:%hu (seq=%u, "
+		 "ack_seq=%u mss=%hu flags=%x)\n", &src, ntohs(th->source),
+		 &dst, ntohs(th->dest), ntohl(th->seq), ack_seq, mss,
+		 ntohl(tcp_flags));
+
+	err = ip_local_out(skb);
+	if (err > 0)
+		err = net_xmit_errno(err);
+
+	pr_debug("ip_local_out: return with %d\n", err);
+out:
+	if (oskb && oskb != skb)
+		kfree_skb(oskb);
+
+	return err;
+
+err_out:
+	kfree_skb(skb);
+	goto out;
+}
+
+static int get_mss(u8 *data, int len)
+{
+	u8 olen;
+
+	while (len >= TCPOLEN_MSS) {
+		switch (data[0]) {
+		case TCPOPT_EOL:
+			return 0;
+		case TCPOPT_NOP:
+			data++;
+			len--;
+			break;
+		case TCPOPT_MSS:
+			if (data[1] != TCPOLEN_MSS)
+				return -EINVAL;
+			return get_unaligned_be16(data + 2);
+		default:
+			olen = data[1];
+			if (olen < 2 || olen > len)
+				return -EINVAL;
+			data += olen;
+			len -= olen;
+			break;
+		}
+	}
+
+	return 0;
+}
+
+static DEFINE_PER_CPU(struct syn_proxy_state, syn_proxy_state);
+
+/* syn_proxy_pre isn't under the protection of nf_conntrack_proto_tcp.c */
+static int syn_proxy_pre(struct sk_buff *skb, struct nf_conn *ct,
+			 enum ip_conntrack_info ctinfo)
+{
+	struct syn_proxy_state *state;
+	struct iphdr *iph;
+	struct tcphdr *th, _th;
+
+	/* only support IPv4 now */
+	iph = ip_hdr(skb);
+	if (iph->version != 4)
+		return NF_ACCEPT;
+
+	th = skb_header_pointer(skb, iph->ihl * 4, sizeof(_th), &_th);
+	BUG_ON(th == NULL);
+
+	if (!ct || !nf_ct_is_confirmed(ct)) {
+		int ret;
+
+		if (!th->syn && th->ack) {
+			u16 mss;
+			struct sk_buff *rec_skb;
+
+			mss = cookie_v4_check_sequence(iph, th,
+						       ntohl(th->ack_seq) - 1);
+			if (!mss)
+				return NF_ACCEPT;
+
+			pr_debug("%pI4n:%hu -> %pI4n:%hu(mss=%hu)\n",
+				 &iph->saddr, ntohs(th->source),
+				 &iph->daddr, ntohs(th->dest), mss);
+
+			if (skb_tailroom(skb) < TCPOLEN_MSS &&
+			    skb->len < iph->ihl * 4 + sizeof(*th) + TCPOLEN_MSS)
+				rec_skb = NULL;
+			else
+				rec_skb = skb;
+
+			local_bh_disable();
+			state = &__get_cpu_var(syn_proxy_state);
+			state->seq_inited = 1;
+			state->window = th->window;
+			state->seq_diff = ntohl(th->ack_seq) - 1;
+			if (rec_skb)
+				tcp_send(iph->saddr, iph->daddr, 0, 0,
+					 ntohl(th->seq) - 1, 0, th->window,
+					 mss, TCP_FLAG_SYN, 0, skb->dev,
+					 TCP_SEND_FLAG_ACK2SYN, rec_skb);
+			else
+				tcp_send(iph->saddr, iph->daddr, th->source,
+					 th->dest, ntohl(th->seq) - 1, 0,
+					 th->window, mss, TCP_FLAG_SYN,
+					 iph->tos, skb->dev, 0, NULL);
+			state->seq_inited = 0;
+			local_bh_enable();
+
+			if (!rec_skb)
+				kfree_skb(skb);
+
+			return NF_STOLEN;
+		}
+
+		if (!ct || !th->syn || th->ack)
+			return NF_ACCEPT;
+
+		ret = NF_ACCEPT;
+		local_bh_disable();
+		state = &__get_cpu_var(syn_proxy_state);
+		if (state->seq_inited) {
+			struct syn_proxy_state *nstate;
+
+			nstate = nf_ct_ext_add(ct, NF_CT_EXT_SYNPROXY,
+					       GFP_ATOMIC);
+			if (nstate != NULL) {
+				nstate->seq_inited = 0;
+				nstate->window = state->window;
+				nstate->seq_diff = state->seq_diff;
+				pr_debug("seq_diff: %u\n", nstate->seq_diff);
+			} else {
+				ret = NF_DROP;
+			}
+		}
+		local_bh_enable();
+
+		return ret;
+	}
+
+	state = nf_ct_ext_find(ct, NF_CT_EXT_SYNPROXY);
+	if (!state)
+		return NF_ACCEPT;
+
+	if (CTINFO2DIR(ctinfo) == IP_CT_DIR_ORIGINAL) {
+		__be32 newack;
+
+		/* don't need to mangle duplicate SYN packets */
+		if (th->syn && !th->ack)
+			return NF_ACCEPT;
+		if (!skb_make_writable(skb, ip_hdrlen(skb) + sizeof(*th)))
+			return NF_DROP;
+		th = (struct tcphdr *)(skb->data + ip_hdrlen(skb));
+		newack = htonl(ntohl(th->ack_seq) - state->seq_diff);
+		inet_proto_csum_replace4(&th->check, skb, th->ack_seq, newack,
+					 0);
+		pr_debug("alter ack seq: %u -> %u\n",
+			 ntohl(th->ack_seq), ntohl(newack));
+		th->ack_seq = newack;
+	} else {
+		/* Simultaneous open ? Oh, no. The connection between
+		 * client and us is established. */
+		if (th->syn && !th->ack)
+			return NF_DROP;
+	}
+
+	return NF_ACCEPT;
+}
+
+static int syn_proxy_mangle_pkt(struct sk_buff *skb, struct iphdr *iph,
+				struct tcphdr *th, u32 seq_diff)
+{
+	__be32 new;
+	int olen;
+
+	if (skb->len < (iph->ihl + th->doff) * 4)
+		return NF_DROP;
+	if (!skb_make_writable(skb, (iph->ihl + th->doff) * 4))
+		return NF_DROP;
+	iph = (struct iphdr *)(skb->data);
+	th = (struct tcphdr *)(skb->data + iph->ihl * 4);
+
+	new = tcp_flag_word(th) & (~TCP_FLAG_SYN);
+	inet_proto_csum_replace4(&th->check, skb, tcp_flag_word(th), new, 0);
+	tcp_flag_word(th) = new;
+
+	new = htonl(ntohl(th->seq) + seq_diff);
+	inet_proto_csum_replace4(&th->check, skb, th->seq, new, 0);
+	pr_debug("alter seq: %u -> %u\n", ntohl(th->seq), ntohl(new));
+	th->seq = new;
+
+	olen = th->doff - sizeof(*th) / 4;
+	if (olen) {
+		__be32 *opt;
+
+		opt = (__force __be32 *)(th + 1);
+#define TCPOPT_EOL_WORD ((TCPOPT_EOL << 24) + (TCPOPT_EOL << 16) + \
+			 (TCPOPT_EOL << 8) + TCPOPT_EOL)
+		inet_proto_csum_replace4(&th->check, skb, *opt, TCPOPT_EOL_WORD,
+					 0);
+		*opt = TCPOPT_EOL_WORD;
+	}
+
+	return NF_ACCEPT;
+}
+
+static int syn_proxy_post(struct sk_buff *skb, struct nf_conn *ct,
+			  enum ip_conntrack_info ctinfo)
+{
+	struct syn_proxy_state *state;
+	struct iphdr *iph;
+	struct tcphdr *th;
+
+	/* untraced packets don't have NF_CT_EXT_SYNPROXY ext, as they don't
+	 * enter syn_proxy_pre() */
+	state = nf_ct_ext_find(ct, NF_CT_EXT_SYNPROXY);
+	if (state == NULL)
+		return NF_ACCEPT;
+
+	iph = ip_hdr(skb);
+	if (!skb_make_writable(skb, iph->ihl * 4 + sizeof(*th)))
+		return NF_DROP;
+	th = (struct tcphdr *)(skb->data + iph->ihl * 4);
+	if (!state->seq_inited) {
+		if (th->syn) {
+			/* It must be from original direction, as the ones
+			 * from the other side are dropped in function
+			 * syn_proxy_pre() */
+			if (!th->ack)
+				return NF_ACCEPT;
+
+			pr_debug("SYN-ACK %pI4n:%hu -> %pI4n:%hu "
+				 "(seq=%u ack_seq=%u)\n",
+				 &iph->saddr, ntohs(th->source), &iph->daddr,
+				 ntohs(th->dest), ntohl(th->seq),
+				 ntohl(th->ack_seq));
+
+			/* SYN-ACK from reply direction with the protection
+			 * of conntrack */
+			spin_lock_bh(&ct->lock);
+			if (!state->seq_inited) {
+				state->seq_inited = 1;
+				pr_debug("update seq_diff %u -> %u\n",
+					 state->seq_diff,
+					 state->seq_diff - ntohl(th->seq));
+				state->seq_diff -= ntohl(th->seq);
+			}
+			spin_unlock_bh(&ct->lock);
+			tcp_send(iph->daddr, iph->saddr, th->dest, th->source,
+				 ntohl(th->ack_seq),
+				 ntohl(th->seq) + 1 + state->seq_diff,
+				 state->window, 0, TCP_FLAG_ACK, iph->tos,
+				 skb->dev, 0, NULL);
+
+			return syn_proxy_mangle_pkt(skb, iph, th,
+						    state->seq_diff + 1);
+		} else {
+			__be32 newseq;
+
+			if (!th->rst)
+				return NF_ACCEPT;
+			newseq = htonl(state->seq_diff + 1);
+			inet_proto_csum_replace4(&th->check, skb, th->seq,
+						 newseq, 0);
+			pr_debug("alter RST seq: %u -> %u\n",
+				 ntohl(th->seq), ntohl(newseq));
+			th->seq = newseq;
+
+			return NF_ACCEPT;
+		}
+	}
+
+	/* ct should be in ESTABLISHED state, but if the ack packets from
+	 * us are lost. */
+	if (th->syn) {
+		if (!th->ack)
+			return NF_ACCEPT;
+
+		tcp_send(iph->daddr, iph->saddr, th->dest, th->source,
+			 ntohl(th->ack_seq),
+			 ntohl(th->seq) + 1 + state->seq_diff,
+			 state->window, 0, TCP_FLAG_ACK, iph->tos,
+			 skb->dev, 0, NULL);
+
+		return syn_proxy_mangle_pkt(skb, iph, th, state->seq_diff + 1);
+	}
+
+	if (CTINFO2DIR(ctinfo) == IP_CT_DIR_REPLY) {
+		__be32 newseq;
+
+		newseq = htonl(ntohl(th->seq) + state->seq_diff);
+		inet_proto_csum_replace4(&th->check, skb, th->seq, newseq, 0);
+		pr_debug("alter seq: %u -> %u\n", ntohl(th->seq),
+			 ntohl(newseq));
+		th->seq = newseq;
+	}
+
+	return NF_ACCEPT;
+}
+
+static int tcp_process(struct sk_buff *skb)
+{
+	const struct iphdr *iph;
+	const struct tcphdr *th;
+	int err;
+	u16 mss;
+
+	iph = ip_hdr(skb);
+	if (iph->frag_off & htons(IP_OFFSET))
+		goto out;
+	if (!pskb_may_pull(skb, iph->ihl * 4 + sizeof(*th)))
+		goto out;
+	th = (const struct tcphdr *)(skb->data + iph->ihl * 4);
+	if (th->fin || th->rst || th->ack || !th->syn)
+		goto out;
+
+	if (nf_ip_checksum(skb, NF_INET_PRE_ROUTING, iph->ihl * 4, IPPROTO_TCP))
+		goto out;
+	mss = 0;
+	if (th->doff > sizeof(*th) / 4) {
+		if (!pskb_may_pull(skb, (iph->ihl + th->doff) * 4))
+			goto out;
+		err = get_mss((u8 *)(th + 1), th->doff * 4 - sizeof(*th));
+		if (err < 0)
+			goto out;
+		if (err != 0)
+			mss = err;
+	} else if (th->doff != sizeof(*th) / 4)
+		goto out;
+
+	tcp_send(iph->daddr, iph->saddr, th->dest, th->source, 0,
+		 ntohl(th->seq) + 1, 0, mss, TCP_FLAG_SYN | TCP_FLAG_ACK,
+		 iph->tos, skb->dev,
+		 TCP_SEND_FLAG_NOTRACE | TCP_SEND_FLAG_SYNCOOKIE, skb);
+
+	return NF_STOLEN;
+
+out:
+	return NF_DROP;
+}
+
+static unsigned int synproxy_tg(struct sk_buff *skb,
+				const struct xt_action_param *par)
+{
+	struct nf_conn *ct;
+	enum ip_conntrack_info ctinfo;
+	int ret;
+
+	/* received from lo */
+	ct = nf_ct_get(skb, &ctinfo);
+	if (ct)
+		return IPT_CONTINUE;
+
+	local_bh_disable();
+	if (!__get_cpu_var(syn_proxy_state).seq_inited)
+		ret = tcp_process(skb);
+	else
+		ret = IPT_CONTINUE;
+	local_bh_enable();
+
+	return ret;
+}
+
+static struct xt_target synproxy_tg_reg __read_mostly = {
+	.name		= "SYNPROXY",
+	.family		= NFPROTO_IPV4,
+	.target		= synproxy_tg,
+	.table		= "raw",
+	.hooks		= (1 << NF_INET_PRE_ROUTING),
+	.proto		= IPPROTO_TCP,
+	.me		= THIS_MODULE,
+};
+
+static struct nf_ct_ext_type syn_proxy_state_ext __read_mostly = {
+	.len	= sizeof(struct syn_proxy_state),
+	.align	= __alignof__(struct syn_proxy_state),
+	.id	= NF_CT_EXT_SYNPROXY,
+};
+
+static int __init synproxy_tg_init(void)
+{
+	int err, cpu;
+
+	for_each_possible_cpu(cpu)
+		per_cpu(syn_proxy_state, cpu).seq_inited = 0;
+	rcu_assign_pointer(syn_proxy_pre_hook, syn_proxy_pre);
+	rcu_assign_pointer(syn_proxy_post_hook, syn_proxy_post);
+	err = nf_ct_extend_register(&syn_proxy_state_ext);
+	if (err)
+		goto err_out;
+	err = xt_register_target(&synproxy_tg_reg);
+	if (err)
+		goto err_out2;
+
+	return err;
+
+err_out2:
+	nf_ct_extend_unregister(&syn_proxy_state_ext);
+err_out:
+	rcu_assign_pointer(syn_proxy_post_hook, NULL);
+	rcu_assign_pointer(syn_proxy_pre_hook, NULL);
+	rcu_barrier();
+
+	return err;
+}
+
+static void __exit synproxy_tg_exit(void)
+{
+	xt_unregister_target(&synproxy_tg_reg);
+	nf_ct_extend_unregister(&syn_proxy_state_ext);
+	rcu_assign_pointer(syn_proxy_post_hook, NULL);
+	rcu_assign_pointer(syn_proxy_pre_hook, NULL);
+	rcu_barrier();
+}
+
+module_init(synproxy_tg_init);
+module_exit(synproxy_tg_exit);
diff --git a/net/ipv4/syncookies.c b/net/ipv4/syncookies.c
index 5c24db4..d61d374 100644
--- a/net/ipv4/syncookies.c
+++ b/net/ipv4/syncookies.c
@@ -160,26 +160,22 @@ static __u16 const msstab[] = {
  * Generate a syncookie.  mssp points to the mss, which is returned
  * rounded down to the value encoded in the cookie.
  */
-__u32 cookie_v4_init_sequence(struct sock *sk, struct sk_buff *skb, __u16 *mssp)
+__u32 __cookie_v4_init_sequence(__be32 saddr, __be32 daddr, __be16 sport,
+				__be16 dport, __u32 seq, __u16 *mssp)
 {
-	const struct iphdr *iph = ip_hdr(skb);
-	const struct tcphdr *th = tcp_hdr(skb);
 	int mssind;
 	const __u16 mss = *mssp;
 
-	tcp_synq_overflow(sk);
-
 	/* XXX sort msstab[] by probability?  Binary search? */
 	for (mssind = 0; mss > msstab[mssind + 1]; mssind++)
 		;
 	*mssp = msstab[mssind] + 1;
 
-	NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_SYNCOOKIESSENT);
 
-	return secure_tcp_syn_cookie(iph->saddr, iph->daddr,
-				     th->source, th->dest, ntohl(th->seq),
+	return secure_tcp_syn_cookie(saddr, daddr, sport, dport, seq,
 				     jiffies / (HZ * 60), mssind);
 }
+EXPORT_SYMBOL(__cookie_v4_init_sequence);
 
 /*
  * This (misnamed) value is the age of syncookie which is permitted.
@@ -192,10 +188,9 @@ __u32 cookie_v4_init_sequence(struct sock *sk, struct sk_buff *skb, __u16 *mssp)
  * Check if a ack sequence number is a valid syncookie.
  * Return the decoded mss if it is, or 0 if not.
  */
-static inline int cookie_check(struct sk_buff *skb, __u32 cookie)
+int cookie_v4_check_sequence(const struct iphdr *iph, const struct tcphdr *th,
+			     __u32 cookie)
 {
-	const struct iphdr *iph = ip_hdr(skb);
-	const struct tcphdr *th = tcp_hdr(skb);
 	__u32 seq = ntohl(th->seq) - 1;
 	__u32 mssind = check_tcp_syn_cookie(cookie, iph->saddr, iph->daddr,
 					    th->source, th->dest, seq,
@@ -204,6 +199,7 @@ static inline int cookie_check(struct sk_buff *skb, __u32 cookie)
 
 	return mssind < NUM_MSS ? msstab[mssind] + 1 : 0;
 }
+EXPORT_SYMBOL(cookie_v4_check_sequence);
 
 static inline struct sock *get_cookie_sock(struct sock *sk, struct sk_buff *skb,
 					   struct request_sock *req,
@@ -270,7 +266,8 @@ struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb,
 		goto out;
 
 	if (tcp_synq_no_recent_overflow(sk) ||
-	    (mss = cookie_check(skb, cookie)) == 0) {
+	    (mss = cookie_v4_check_sequence(ip_hdr(skb), tcp_hdr(skb),
+					    cookie)) == 0) {
 		NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_SYNCOOKIESFAILED);
 		goto out;
 	}
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 202cf09..9879c3b 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1330,8 +1330,11 @@ int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
 #ifdef CONFIG_SYN_COOKIES
 		syn_flood_warning(skb);
 		req->cookie_ts = tmp_opt.tstamp_ok;
+		tcp_synq_overflow(sk);
+		NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_SYNCOOKIESSENT);
+		isn = cookie_v4_init_sequence(ip_hdr(skb), tcp_hdr(skb),
+					      &req->mss);
 #endif
-		isn = cookie_v4_init_sequence(sk, skb, &req->mss);
 	} else if (!isn) {
 		struct inet_peer *peer = NULL;
 
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index b83c530..2a2ef7b 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -847,8 +847,24 @@ nf_conntrack_in(struct net *net, u_int8_t pf, unsigned int hooknum,
 			       l3proto, l4proto, &set_reply, &ctinfo);
 	if (!ct) {
 		/* Not valid part of a connection */
+#if defined(CONFIG_IP_NF_TARGET_SYNPROXY) || \
+    defined(CONFIG_IP_NF_TARGET_SYNPROXY_MODULE)
+		int (*syn_proxy)(struct sk_buff *, struct nf_conn *,
+				 enum ip_conntrack_info);
+
+		syn_proxy = rcu_dereference(syn_proxy_pre_hook);
+		if (protonum == IPPROTO_TCP && syn_proxy) {
+			ret = syn_proxy(skb, NULL, ctinfo);
+			if (ret == NF_ACCEPT)
+				NF_CT_STAT_INC_ATOMIC(net, invalid);
+		} else {
+#endif
 		NF_CT_STAT_INC_ATOMIC(net, invalid);
 		ret = NF_ACCEPT;
+#if defined(CONFIG_IP_NF_TARGET_SYNPROXY) || \
+    defined(CONFIG_IP_NF_TARGET_SYNPROXY_MODULE)
+		}
+#endif
 		goto out;
 	}
 
@@ -861,6 +877,20 @@ nf_conntrack_in(struct net *net, u_int8_t pf, unsigned int hooknum,
 
 	NF_CT_ASSERT(skb->nfct);
 
+#if defined(CONFIG_IP_NF_TARGET_SYNPROXY) || \
+    defined(CONFIG_IP_NF_TARGET_SYNPROXY_MODULE)
+	{
+		int (*syn_proxy)(struct sk_buff *, struct nf_conn *,
+				 enum ip_conntrack_info);
+
+		syn_proxy = rcu_dereference(syn_proxy_pre_hook);
+		if (protonum == IPPROTO_TCP && syn_proxy) {
+			ret = syn_proxy(skb, ct, ctinfo);
+			if (ret != NF_ACCEPT)
+				goto out;
+		}
+	}
+#endif
 	ret = l4proto->packet(ct, skb, dataoff, ctinfo, pf, hooknum);
 	if (ret <= 0) {
 		/* Invalid: inverse of the return code tells
@@ -1448,6 +1478,17 @@ s16 (*nf_ct_nat_offset)(const struct nf_conn *ct,
 			u32 seq);
 EXPORT_SYMBOL_GPL(nf_ct_nat_offset);
 
+#if defined(CONFIG_IP_NF_TARGET_SYNPROXY) || \
+    defined(CONFIG_IP_NF_TARGET_SYNPROXY_MODULE)
+int (*syn_proxy_pre_hook)(struct sk_buff *skb, struct nf_conn *ct,
+			  enum ip_conntrack_info ctinfo);
+EXPORT_SYMBOL(syn_proxy_pre_hook);
+
+int (*syn_proxy_post_hook)(struct sk_buff *skb, struct nf_conn *ct,
+			   enum ip_conntrack_info ctinfo);
+EXPORT_SYMBOL(syn_proxy_post_hook);
+#endif
+
 int nf_conntrack_init(struct net *net)
 {
 	int ret;
@@ -1468,6 +1509,9 @@ int nf_conntrack_init(struct net *net)
 
 		/* Howto get NAT offsets */
 		rcu_assign_pointer(nf_ct_nat_offset, NULL);
+
+		rcu_assign_pointer(syn_proxy_pre_hook, NULL);
+		rcu_assign_pointer(syn_proxy_post_hook, NULL);
 	}
 	return 0;
 

^ permalink raw reply related

* Re: [PATCH] sock.h: fix kernel-doc warning
From: David Miller @ 2010-05-25  6:56 UTC (permalink / raw)
  To: randy.dunlap; +Cc: netdev, akpm
In-Reply-To: <20100524222225.0fc298eb.randy.dunlap@oracle.com>

From: Randy Dunlap <randy.dunlap@oracle.com>
Date: Mon, 24 May 2010 22:22:25 -0700

> From: Randy Dunlap <randy.dunlap@oracle.com>
> 
> Fix sock.h kernel-doc warning:
> Warning(include/net/sock.h:1438): No description found for parameter 'wq'
> 
> Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>

Applied, thanks Randy.

^ permalink raw reply

* Re: linux-next: build failure after merge of the final tree
From: David Miller @ 2010-05-25  6:56 UTC (permalink / raw)
  To: herbert; +Cc: sfr, netdev, linux-next, linux-kernel
In-Reply-To: <20100525051442.GA8480@gondor.apana.org.au>

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Tue, 25 May 2010 15:14:42 +1000

> On Mon, May 24, 2010 at 09:58:51PM -0700, David Miller wrote:
>> 
>> Herbert, can you take a look?
> 
> Sorry, I hadn't tested the built-in case.
> 
> cls_cgroup: Fix build error when built-in
> 
> There is a typo in cgroup_cls_state when cls_cgroup is built-in.
> 
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Applied, thanks Herbert.

^ permalink raw reply

* Re: [Nios2-dev] [PATCH] ethoc: fix null dereference in ethoc_probe
From: Thomas Chou @ 2010-05-25  6:51 UTC (permalink / raw)
  To: nios2-dev; +Cc: netdev, thierry.reding, linux-kernel, Dan Carpenter
In-Reply-To: <4BFB6B4B.3020107@zhaw.ch>

Tobias Klauser wrote:
>> @@ -944,6 +946,7 @@ static int ethoc_probe(struct platform_device *pdev)
>>  	priv = netdev_priv(netdev);
>>  	priv->netdev = netdev;
>>  	priv->dma_alloc = 0;
>> +	priv->io_region_size = mmio->end - mmio->start + 1;
> 
> Better use the resource_size inline here.

Hi Tobias,

I just see your patch of resource_size from the git log. Sorry, I missed 
this one. I will send update on this. Thank you.

Cheers,
Thomas

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox