Netdev List

Netdev List
 help / color / mirror / Atom feed

* [PATCH net] Phonet: fix potential use-after-free in pep_sock_close()
From: Rémi Denis-Courmont @ 2010-05-25 14:11 UTC (permalink / raw)
  To: netdev

From: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>

sk_common_release() might destroy our last reference to the socket.
So an extra temporary reference is needed during cleanup.

Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
---
 net/phonet/pep.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/net/phonet/pep.c b/net/phonet/pep.c
index af4d38b..7b048a3 100644
--- a/net/phonet/pep.c
+++ b/net/phonet/pep.c
@@ -626,6 +626,7 @@ static void pep_sock_close(struct sock *sk, long timeout)
 	struct pep_sock *pn = pep_sk(sk);
 	int ifindex = 0;
 
+	sock_hold(sk); /* keep a reference after sk_common_release() */
 	sk_common_release(sk);
 
 	lock_sock(sk);
@@ -644,6 +645,7 @@ static void pep_sock_close(struct sock *sk, long timeout)
 
 	if (ifindex)
 		gprs_detach(sk);
+	sock_put(sk);
 }
 
 static int pep_wait_connreq(struct sock *sk, int noblock)
-- 
1.7.0.4


^ permalink raw reply related

* [PATCH] mlx4_en: show device's port used
From: Eli Cohen @ 2010-05-25 13:55 UTC (permalink / raw)
  To: davem-fT/PcQaiUtIeIZ0/mPfg9Q
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, Linux RDMA list, Roland Dreier,
	yevgenyp-VPRAkNaXOzVS1MOuV/RT9w

Add a sysfs file under /sys/class/net/<ethx> to show the port number within the
device that this network interface is using. This is needed as ConnectX devices
have two ports and it is useful to know which port the ethernet devices uses.

Signed-off-by: Eli Cohen <eli-VPRAkNaXOzVS1MOuV/RT9w@public.gmane.org>
---
 drivers/net/mlx4/en_netdev.c |   16 ++++++++++++++++
 1 files changed, 16 insertions(+), 0 deletions(-)

diff --git a/drivers/net/mlx4/en_netdev.c b/drivers/net/mlx4/en_netdev.c
index 73c3d20..5d6f811 100644
--- a/drivers/net/mlx4/en_netdev.c
+++ b/drivers/net/mlx4/en_netdev.c
@@ -871,6 +871,16 @@ err:
 	return -ENOMEM;
 }
 
+static ssize_t show_port(struct device *d, struct device_attribute *attr,
+			 char *buf)
+{
+	struct mlx4_en_priv *priv = netdev_priv(to_net_dev(d));
+
+	return sprintf(buf, "%d\n", priv->port);
+	return 0;
+}
+
+static DEVICE_ATTR(port, S_IRUGO, show_port, NULL);
 
 void mlx4_en_destroy_netdev(struct net_device *dev)
 {
@@ -1067,6 +1077,12 @@ int mlx4_en_init_netdev(struct mlx4_en_dev *mdev, int port,
 	en_warn(priv, "Using %d TX rings\n", prof->tx_ring_num);
 	en_warn(priv, "Using %d RX rings\n", prof->rx_ring_num);
 
+	err = device_create_file(&dev->dev, &dev_attr_port);
+	if (err) {
+		mlx4_err(mdev, "failed to create sysfs file\n");
+		goto out;
+	}
+
 	priv->registered = 1;
 	queue_delayed_work(mdev->workqueue, &priv->stats_task, STATS_DELAY);
 	return 0;
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* Re: Question about an assignment in handle_ing()
From: jamal @ 2010-05-25 13:13 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Jiri Pirko, netdev, davem, kaber
In-Reply-To: <20100525124636.GA13161@gondor.apana.org.au>

On Tue, 2010-05-25 at 22:46 +1000, Herbert Xu wrote:

> That's not very surprising as you're not checking whether the
> skb is cloned in act_pedit.c:

I meant the test "if (skb_cloned(skb))" failed in such cases;-> So you
couldnt reliably use it. 
If it turns out it is unnecessary, what you describe is what i had in
mind as well.

> BTW, this error handling also looks a bit suss.  Shouldn't it
> drop the packet instead of continuing as if we have successfully
> modified it?

It is not the responsibility of the action to drop packets in a pipeline
rather the responsibility is that of the caller (ref: rule #3 in
Documentation/networking/tc-actions-env-rules.txt). What to do on a
failure such as above is programmable by the action user/admin.

Anyways, now i am really gone. Dont make me look in again Herbert ;->

cheers,
jamal

^ permalink raw reply

* Re: Question about an assignment in handle_ing()
From: Herbert Xu @ 2010-05-25 12:46 UTC (permalink / raw)
  To: jamal; +Cc: Jiri Pirko, netdev, davem, kaber
In-Reply-To: <1274790038.3878.926.camel@bigi>

On Tue, May 25, 2010 at 08:20:38AM -0400, jamal wrote:
> On Tue, 2010-05-25 at 22:12 +1000, Herbert Xu wrote:
> 
> > It is guaranteed, it is illegal to hold onto the skb contents
> > without the reference that comes from cloning an skb.
> 
> As i recall this was an issue. And the example i described earlier
> would have tcpdump show edited contents coming out of eth0.

That's not very surprising as you're not checking whether the
skb is cloned in act_pedit.c:

	if (!(skb->tc_verd & TC_OK2MUNGE)) {
		/* should we set skb->cloned? */
		if (pskb_expand_head(skb, 0, 0, GFP_ATOMIC)) {
			return p->tcf_action;
		}
	}

This code needs to be changed to something like

	if (skb_cloned(skb)) {
		if (pskb_expand_head(skb, 0, 0, GFP_ATOMIC)) {
			return p->tcf_action;
		}
	}

BTW, this error handling also looks a bit suss.  Shouldn't it
drop the packet instead of continuing as if we have successfully
modified it?

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* [PATCH] hso: add support for new products
From: f.aben @ 2010-05-25 12:33 UTC (permalink / raw)
  To: davem, gregkh; +Cc: linux-usb, netdev, j.dumon


This patch adds a few new product id's for the hso driver.

Signed-off-by: Filip Aben <f.aben@option.com>
---
diff --git a/drivers/net/usb/hso.c b/drivers/net/usb/hso.c
index be0cc99..8355f0e 100644
--- a/drivers/net/usb/hso.c
+++ b/drivers/net/usb/hso.c
@@ -475,6 +475,9 @@ static const struct usb_device_id hso_ids[] = {
 	{USB_DEVICE(0x0af0, 0x8302)},
 	{USB_DEVICE(0x0af0, 0x8304)},
 	{USB_DEVICE(0x0af0, 0x8400)},
+	{USB_DEVICE(0x0af0, 0x8600)},
+	{USB_DEVICE(0x0af0, 0x8800)},
+	{USB_DEVICE(0x0af0, 0x8900)},
 	{USB_DEVICE(0x0af0, 0xd035)},
 	{USB_DEVICE(0x0af0, 0xd055)},
 	{USB_DEVICE(0x0af0, 0xd155)},

^ permalink raw reply related

* Re: Question about an assignment in handle_ing()
From: jamal @ 2010-05-25 12:20 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Jiri Pirko, netdev, davem, kaber
In-Reply-To: <20100525121202.GA12712@gondor.apana.org.au>

On Tue, 2010-05-25 at 22:12 +1000, Herbert Xu wrote:

> It is guaranteed, it is illegal to hold onto the skb contents
> without the reference that comes from cloning an skb.

As i recall this was an issue. And the example i described earlier
would have tcpdump show edited contents coming out of eth0.

> The DHCP client may keep the AF_PACKET open even if there is no
> ongoing dialogue.

Ok ;-> Not running that dhcp client without looking at how it behaves.

> For ingress, the packet may have looped back from a virtual
> interface.  It may have been cloned on its way out of the virtual
> interface.

and i think this is where you and i ended last time we talked.

> Looking at ptype instead of checking whether the skb is cloned
> simply doesn't work in this case.

I will test in a few days - or if Jiri has the bandwidth i think i can
describe what to test (I have the relevant tests logged somewhere).

I apologize there may be some large latency of about a day before my
next response.

cheers,
jamal

^ permalink raw reply

* Re: [PATCH RFC] netfilter: iptables target SYNPROXY
From: Jozsef Kadlecsik @ 2010-05-25 12:17 UTC (permalink / raw)
  To: Changli Gao
  Cc: Patrick McHardy, David S. Miller, Alexey Kuznetsov, James Morris,
	netfilter-devel, netdev
In-Reply-To: <1274771218-3204-1-git-send-email-xiaosuo@gmail.com>

Hi,

On Tue, 25 May 2010, Changli Gao wrote:

> iptables target SYNPROXY.
> 
> This patch implements an iptables target SYNPROXY, which works in the raw table
> of the PREROUTING chain, before conntracking system. Syncookies is used, so no
> new state is introduced into the conntracking system. In fact, until the first
> connection is established, conntracking system doesn't see any packets. So when
> there is a SYN-flood attack, conntracking system won't be busy on finding and
> deleting the un-assured ct.

My main problem with your target is that by using it, important and useful 
TCP options are lost: timestamp and SACK. That pushes back TCP by almost 
twenty years.

Here you reason for the target that it protects conntrack itself, but in 
the Kconfig text you write that it protects the servers behind the 
firewall. Both can be true, but if the real goal is to defend the servers 
then your target could simply send a faked ACK to complete the three way 
handshake and that way TCP would not be crippled (conntrack timeout 
should still be adjusted).

Best regards,
Jozsef
-
E-mail  : kadlec@blackhole.kfki.hu, kadlec@mail.kfki.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : KFKI Research Institute for Particle and Nuclear Physics
          H-1525 Budapest 114, POB. 49, Hungary

^ permalink raw reply

* Re: Question about an assignment in handle_ing()
From: Herbert Xu @ 2010-05-25 12:12 UTC (permalink / raw)
  To: jamal; +Cc: Jiri Pirko, netdev, davem, kaber
In-Reply-To: <1274789024.3878.919.camel@bigi>

On Tue, May 25, 2010 at 08:03:44AM -0400, jamal wrote:
> On Tue, 2010-05-25 at 20:26 +1000, Herbert Xu wrote:
> 
> > In that case you should be checking whether the skb is cloned.
> 
> That is the general rule used (and what i specify to do in the docs)..
> but i recall there were issues if the packet path emanated from ingress
> and included multiple netdevices (earlier ex with mirror applies). There
> may have been bugs then, eg I could not assume that it i had any ptype
> at all that the ptype will clone the packet. Does tcpdump guarantee
> skb->clone being set? I will try to test some scenarios when i am back
> +settled. 

It is guaranteed, it is illegal to hold onto the skb contents
without the reference that comes from cloning an skb.

> "Most" for people running serious firewalls or routers is not to run
> DHCP servers;-> They may client, but thats a short-lived session.

The DHCP client may keep the AF_PACKET open even if there is no
ongoing dialogue.

> > Also
> > the skb may still be cloned even if there is no AF_PACKET listener.
> > In that case your optimisation may be incorrect.
> 
> Did you mean that as long as there are other ptypes - which may or 
> not be doing af packet? 

For ingress, the packet may have looped back from a virtual
interface.  It may have been cloned on its way out of the virtual
interface.

Looking at ptype instead of checking whether the skb is cloned
simply doesn't work in this case.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: Question about an assignment in handle_ing()
From: jamal @ 2010-05-25 12:03 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Jiri Pirko, netdev, davem, kaber
In-Reply-To: <20100525102603.GA11494@gondor.apana.org.au>

On Tue, 2010-05-25 at 20:26 +1000, Herbert Xu wrote:

> In that case you should be checking whether the skb is cloned.

That is the general rule used (and what i specify to do in the docs)..
but i recall there were issues if the packet path emanated from ingress
and included multiple netdevices (earlier ex with mirror applies). There
may have been bugs then, eg I could not assume that it i had any ptype
at all that the ptype will clone the packet. Does tcpdump guarantee
skb->clone being set? I will try to test some scenarios when i am back
+settled. 

> After all, tcpdump might have simply filtered the packet out.

True - but i think thats an acceptable compromise.

> BTW, this is the case whenever you run a DHCP client/server.  So
> on most boxes your optimisation will never kick in as is.  

"Most" for people running serious firewalls or routers is not to run
DHCP servers;-> They may client, but thats a short-lived session.

> Also
> the skb may still be cloned even if there is no AF_PACKET listener.
> In that case your optimisation may be incorrect.

Did you mean that as long as there are other ptypes - which may or 
not be doing af packet? 

cheers,
jamal


^ permalink raw reply

* Warning in net/ipv4/af_inet.c:154
From: Anton Blanchard @ 2010-05-25 11:58 UTC (permalink / raw)
  To: netdev


Hi,

A build from last night (2.6.34-git9) is hitting the following WARN_ON on
a ppc64 box:

Badness at net/ipv4/af_inet.c:154

NIP [c00000000062b034] .inet_sock_destruct+0x1b0/0x1f0
LR [c0000000005b376c] .__sk_free+0x58/0x154
Call Trace:
[c000000000088c0c] .__local_bh_disable+0xdc/0xfc (unreliable)
[c0000000005b376c] .__sk_free+0x58/0x154
[c0000000005b3978] .sk_free+0x4c/0x60
[c0000000005b3ab4] .sk_common_release+0x128/0x148
[c00000000061da44] .udp_lib_close+0x28/0x40
[c00000000062a990] .inet_release+0xd0/0xf8
[c0000000005adbf8] .sock_release+0x60/0xec
[c0000000005adcd0] .sock_close+0x4c/0x6c
[c000000000186554] .__fput+0x184/0x270
[c00000000018668c] .fput+0x4c/0x60
[c000000000182248] .filp_close+0xc8/0xf4
[c0000000000826d4] .put_files_struct+0xc8/0x158
[c0000000000827d4] .exit_files+0x70/0x8c
[c0000000000848cc] .do_exit+0x2c8/0x82c
[c000000000084efc] .do_group_exit+0xcc/0x100
[c000000000084f5c] .SyS_exit_group+0x2c/0x48
[c000000000008554] syscall_exit+0x0/0x40

Which is:

        WARN_ON(sk->sk_forward_alloc);

Anton

^ permalink raw reply

* Re: [PATCH RFC] netfilter: iptables target SYNPROXY
From: Jan Engelhardt @ 2010-05-25 11:50 UTC (permalink / raw)
  To: Changli Gao
  Cc: Patrick McHardy, David S. Miller, Alexey Kuznetsov, James Morris,
	netfilter-devel, netdev
In-Reply-To: <AANLkTim2DrXuBPUBK0TseqVJ5lxPRQjK0P_QJVsbtYRC@mail.gmail.com>


On Tuesday 2010-05-25 13:26, Changli Gao wrote:
>> On Tuesday 2010-05-25 09:06, Changli Gao wrote:
>>> net/ipv4/netfilter/Kconfig                  |   12
>>> net/ipv4/netfilter/Makefile                 |    1
>>> net/ipv4/netfilter/ipt_SYNPROXY.c           |  658 ++++++++++++++++++++++++++++
>>> net/ipv4/syncookies.c                       |   21
>>> net/ipv4/tcp_ipv4.c                         |    5
>>> net/netfilter/nf_conntrack_core.c           |   44 +
>>
>> Please make this an Xtables extension.
>> There is excellent documentation ("Writing Netfilter Modules" to google)
>> on the details if needed.
>
>Hmm. I don't know IPv6 well. So I leave it as an iptables extension,
>and hope sb. comes on with IPv6 support after it gets merged.

This should still be xt even if it does not do ipv6.

>>>+      th = skb_header_pointer(skb, iph->ihl * 4, sizeof(_th), &_th);
>>>+      BUG_ON(th == NULL);
>>
>> I wouldn't call that a bug. I think that can happen on an evil TCP tinygram
>> (with proper IPV4 header).
>
>I copied the code from file nf_conntrack_proto_tcp.c. In fact, BUG_ON
>isn't useful at all, as tcp_error is called before this function
>is called.

Ah I see it hard-depends on nf_conntrack_ipv4 (in Kconfig).
That too could be done instead by using nf_ct_get_l3proto at runtime.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH RFC] netfilter: iptables target SYNPROXY
From: Changli Gao @ 2010-05-25 11:26 UTC (permalink / raw)
  To: Jan Engelhardt
  Cc: Patrick McHardy, David S. Miller, Alexey Kuznetsov, James Morris,
	netfilter-devel, netdev
In-Reply-To: <alpine.LSU.2.01.1005251207450.1725@obet.zrqbmnf.qr>

On Tue, May 25, 2010 at 6:36 PM, Jan Engelhardt <jengelh@medozas.de> wrote:
>
> On Tuesday 2010-05-25 09:06, Changli Gao wrote:
>> net/ipv4/netfilter/Kconfig                  |   12
>> net/ipv4/netfilter/Makefile                 |    1
>> net/ipv4/netfilter/ipt_SYNPROXY.c           |  658 ++++++++++++++++++++++++++++
>> net/ipv4/syncookies.c                       |   21
>> net/ipv4/tcp_ipv4.c                         |    5
>> net/netfilter/nf_conntrack_core.c           |   44 +
>
> Please make this an Xtables extension.
> There is excellent documentation ("Writing Netfilter Modules" to google)
> on the details if needed.

Hmm. I don't know IPv6 well. So I leave it as an iptables extension,
and hope sb. comes on with IPv6 support after it gets merged.

>
>>--- a/include/net/netfilter/nf_conntrack_core.h
>>+++ b/include/net/netfilter/nf_conntrack_core.h
>>@@ -63,8 +63,21 @@ static inline int nf_conntrack_confirm(struct sk_buff *skb)
>>       if (ct && ct != &nf_conntrack_untracked) {
>>               if (!nf_ct_is_confirmed(ct) && !nf_ct_is_dying(ct))
>>                       ret = __nf_conntrack_confirm(skb);
>>-              if (likely(ret == NF_ACCEPT))
>>+              if (likely(ret == NF_ACCEPT)) {
>>+#if defined(CONFIG_IP_NF_TARGET_SYNPROXY) || \
>>+    defined(CONFIG_IP_NF_TARGET_SYNPROXY_MODULE)
>>+                      int (*syn_proxy)(struct sk_buff *, struct nf_conn *,
>>+                                       enum ip_conntrack_info);
>>+#endif
>>+
>>                       nf_ct_deliver_cached_events(ct);
>>+#if defined(CONFIG_IP_NF_TARGET_SYNPROXY) || \
>>+    defined(CONFIG_IP_NF_TARGET_SYNPROXY_MODULE)
>>+                      syn_proxy = rcu_dereference(syn_proxy_post_hook);
>>+                      if (syn_proxy)
>>+                              ret = syn_proxy(skb, ct, skb->nfctinfo);
>>+#endif
>>+              }
>>       }
>>       return ret;
>> }
>
> I'm sure this is possible with lesser macro intrusion - use a separate
> function. Same in nf_conntrack_core.c

I'm thinking about adding two helper functions into structure
nf_conntrack_l4proto. Then the l4 protocol sepcific code won't be in
nf_conntrack_core.c but in nf_conntrack_proto_tcp.c.

>
>>--- a/include/net/tcp.h
>>+++ b/include/net/tcp.h
>>@@ -460,8 +460,18 @@ extern int                        tcp_disconnect(struct sock *sk, int flags);
>> extern __u32 syncookie_secret[2][16-4+SHA_DIGEST_WORDS];
>> extern struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb,
>>                                   struct ip_options *opt);
>>-extern __u32 cookie_v4_init_sequence(struct sock *sk, struct sk_buff *skb,
>>-                                   __u16 *mss);
>>+extern __u32 __cookie_v4_init_sequence(__be32 saddr, __be32 daddr,
>>+                                     __be16 sport, __be16 dport, __u32 seq,
>>+                                     __u16 *mssp);
>>+static inline __u32 cookie_v4_init_sequence(const struct iphdr *iph,
>>+                                          const struct tcphdr *th,
>>+                                          __u16 *mssp)
>>+{
>>+      return __cookie_v4_init_sequence(iph->saddr, iph->daddr, th->source,
>>+                                       th->dest, ntohl(th->seq), mssp);
>>+}
>
> Since there is only one cookie_v4_init_sequence user AFAICS,
> moving the function there seems to make best sense.

OK. I wanted to keep cookie_v4_init_sequence() and
cookie_v4_check_sequence() symmetric.

>
>>+static int get_mss(u8 *data, int len)
>
> unsigned

I am afraid that I can't do that. As get_mss() may return a negative
value, if there is some invalid TCP option.

>
>>+      /* only support IPv4 now */
>
> Please fix :)

The same reasion as above.

>
>>+      th = skb_header_pointer(skb, iph->ihl * 4, sizeof(_th), &_th);
>>+      BUG_ON(th == NULL);
>
> I wouldn't call that a bug. I think that can happen on an evil TCP tinygram
> (with proper IPV4 header).

I copied the code from file nf_conntrack_proto_tcp.c. In fact, BUG_ON
isn't useful at all, as tcp_error is called before this function
is called.

>
>>+static int syn_proxy_mangle_pkt(struct sk_buff *skb, struct iphdr *iph,
>>+                              struct tcphdr *th, u32 seq_diff)
>>+{
>>+      __be32 new;
>>+      int olen;
>>+
>>+      if (skb->len < (iph->ihl + th->doff) * 4)
>>+              return NF_DROP;
>
> Verdicts are unsigned too. (Also in other functions.)

OK. Thanks.

>
>>+static struct xt_target synproxy_tg_reg __read_mostly = {
>>+      .name           = "SYNPROXY",
>>+      .family         = NFPROTO_IPV4,
>>+      .target         = synproxy_tg,
>>+      .table          = "raw",
>>+      .hooks          = (1 << NF_INET_PRE_ROUTING),
>
> No need for (),

Yea. Thanks.

>
>>+      .proto          = IPPROTO_TCP,
>>+      .me             = THIS_MODULE,
>>+};
>>+
>>+static int __init synproxy_tg_init(void)
>>+{
>>+      int err, cpu;
>>+
>>+      for_each_possible_cpu(cpu)
>>+              per_cpu(syn_proxy_state, cpu).seq_inited = 0;
>
> Static data starts out zeroed. That also applies to percpu data, doesn't it?
>

Yea, Eric has mentioned that.

-- 
Regards，
Changli Gao(xiaosuo@gmail.com)
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH 22/27] net/dccp: Use memdup_user
From: Gerrit Renker @ 2010-05-25 11:23 UTC (permalink / raw)
  To: Julia Lawall
  Cc: Arnaldo Carvalho de Melo, David S. Miller, dccp, netdev,
	linux-kernel, kernel-janitors
In-Reply-To: <Pine.LNX.4.64.1005221025040.13021@ask.diku.dk>

| Signed-off-by: Julia Lawall <julia@diku.dk>
Acked-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>

| 
| ---
|  net/dccp/proto.c |   11 +++--------
|  1 file changed, 3 insertions(+), 8 deletions(-)
| 
| diff --git a/net/dccp/proto.c b/net/dccp/proto.c
| index b03ecf6..f79bcef 100644
| --- a/net/dccp/proto.c
| +++ b/net/dccp/proto.c
| @@ -473,14 +473,9 @@ static int dccp_setsockopt_ccid(struct sock *sk, int type,
|  	if (optlen < 1 || optlen > DCCP_FEAT_MAX_SP_VALS)
|  		return -EINVAL;
|  
| -	val = kmalloc(optlen, GFP_KERNEL);
| -	if (val == NULL)
| -		return -ENOMEM;
| -
| -	if (copy_from_user(val, optval, optlen)) {
| -		kfree(val);
| -		return -EFAULT;
| -	}
| +	val = memdup_user(optval, optlen);
| +	if (IS_ERR(val))
| +		return PTR_ERR(val);
|  
|  	lock_sock(sk);
|  	if (type == DCCP_SOCKOPT_TX_CCID || type == DCCP_SOCKOPT_CCID)

^ permalink raw reply

* Re: [PATCH RFC] netfilter: iptables target SYNPROXY
From: Jan Engelhardt @ 2010-05-25 10:36 UTC (permalink / raw)
  To: Changli Gao
  Cc: Patrick McHardy, David S. Miller, Alexey Kuznetsov, James Morris,
	netfilter-devel, netdev
In-Reply-To: <1274771218-3204-1-git-send-email-xiaosuo@gmail.com>


On Tuesday 2010-05-25 09:06, Changli Gao wrote:
> net/ipv4/netfilter/Kconfig                  |   12 
> net/ipv4/netfilter/Makefile                 |    1 
> net/ipv4/netfilter/ipt_SYNPROXY.c           |  658 ++++++++++++++++++++++++++++
> net/ipv4/syncookies.c                       |   21 
> net/ipv4/tcp_ipv4.c                         |    5 
> net/netfilter/nf_conntrack_core.c           |   44 +

Please make this an Xtables extension.
There is excellent documentation ("Writing Netfilter Modules" to google)
on the details if needed.

>--- a/include/net/netfilter/nf_conntrack_core.h
>+++ b/include/net/netfilter/nf_conntrack_core.h
>@@ -63,8 +63,21 @@ static inline int nf_conntrack_confirm(struct sk_buff *skb)
> 	if (ct && ct != &nf_conntrack_untracked) {
> 		if (!nf_ct_is_confirmed(ct) && !nf_ct_is_dying(ct))
> 			ret = __nf_conntrack_confirm(skb);
>-		if (likely(ret == NF_ACCEPT))
>+		if (likely(ret == NF_ACCEPT)) {
>+#if defined(CONFIG_IP_NF_TARGET_SYNPROXY) || \
>+    defined(CONFIG_IP_NF_TARGET_SYNPROXY_MODULE)
>+			int (*syn_proxy)(struct sk_buff *, struct nf_conn *,
>+					 enum ip_conntrack_info);
>+#endif
>+
> 			nf_ct_deliver_cached_events(ct);
>+#if defined(CONFIG_IP_NF_TARGET_SYNPROXY) || \
>+    defined(CONFIG_IP_NF_TARGET_SYNPROXY_MODULE)
>+			syn_proxy = rcu_dereference(syn_proxy_post_hook);
>+			if (syn_proxy)
>+				ret = syn_proxy(skb, ct, skb->nfctinfo);
>+#endif
>+		}
> 	}
> 	return ret;
> }

I'm sure this is possible with lesser macro intrusion - use a separate
function. Same in nf_conntrack_core.c

>--- a/include/net/tcp.h
>+++ b/include/net/tcp.h
>@@ -460,8 +460,18 @@ extern int			tcp_disconnect(struct sock *sk, int flags);
> extern __u32 syncookie_secret[2][16-4+SHA_DIGEST_WORDS];
> extern struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb, 
> 				    struct ip_options *opt);
>-extern __u32 cookie_v4_init_sequence(struct sock *sk, struct sk_buff *skb, 
>-				     __u16 *mss);
>+extern __u32 __cookie_v4_init_sequence(__be32 saddr, __be32 daddr,
>+				       __be16 sport, __be16 dport, __u32 seq,
>+				       __u16 *mssp);
>+static inline __u32 cookie_v4_init_sequence(const struct iphdr *iph,
>+					    const struct tcphdr *th,
>+					    __u16 *mssp)
>+{
>+	return __cookie_v4_init_sequence(iph->saddr, iph->daddr, th->source,
>+					 th->dest, ntohl(th->seq), mssp);
>+}

Since there is only one cookie_v4_init_sequence user AFAICS,
moving the function there seems to make best sense.

>+static int get_mss(u8 *data, int len)

unsigned

>+	/* only support IPv4 now */

Please fix :)

>+	th = skb_header_pointer(skb, iph->ihl * 4, sizeof(_th), &_th);
>+	BUG_ON(th == NULL);

I wouldn't call that a bug. I think that can happen on an evil TCP tinygram
(with proper IPV4 header).

>+static int syn_proxy_mangle_pkt(struct sk_buff *skb, struct iphdr *iph,
>+				struct tcphdr *th, u32 seq_diff)
>+{
>+	__be32 new;
>+	int olen;
>+
>+	if (skb->len < (iph->ihl + th->doff) * 4)
>+		return NF_DROP;

Verdicts are unsigned too. (Also in other functions.)

>+static struct xt_target synproxy_tg_reg __read_mostly = {
>+	.name		= "SYNPROXY",
>+	.family		= NFPROTO_IPV4,
>+	.target		= synproxy_tg,
>+	.table		= "raw",
>+	.hooks		= (1 << NF_INET_PRE_ROUTING),

No need for (),

>+	.proto		= IPPROTO_TCP,
>+	.me		= THIS_MODULE,
>+};
>+
>+static int __init synproxy_tg_init(void)
>+{
>+	int err, cpu;
>+
>+	for_each_possible_cpu(cpu)
>+		per_cpu(syn_proxy_state, cpu).seq_inited = 0;

Static data starts out zeroed. That also applies to percpu data, doesn't it?


^ permalink raw reply

* Re: Question about an assignment in handle_ing()
From: Herbert Xu @ 2010-05-25 10:26 UTC (permalink / raw)
  To: jamal; +Cc: Jiri Pirko, netdev, davem, kaber
In-Reply-To: <1274781067.3878.872.camel@bigi>

On Tue, May 25, 2010 at 05:51:07AM -0400, jamal wrote:
> The code is correct. 
> 
> Main reason for the else condn is driven by optimization:
> If you are running tcpdump or other af packet type code, the "if" condn
> is hit and matching actions are not allowed trample on that same packet
> data. They have to make a private copy; otherwise, the "else" is hit and
> (for optimization reason) you give ok to the actions that follow to
> munge the packet. Essentially, you dont want actions to alloc/copy every
> single time when you are not running tcpdump for example; reason is that
> most of the time you run tcpdump it is for debugging.
> [I had seen very observable differences on some old mips board back in
> the day on whether you avoided copy every time vs when debugging by
> running tcpdump and copied every packet.]

In that case you should be checking whether the skb is cloned.
After all, tcpdump might have simply filtered the packet out.

BTW, this is the case whenever you run a DHCP client/server.  So
on most boxes your optimisation will never kick in as is.  Also
the skb may still be cloned even if there is no AF_PACKET listener.
In that case your optimisation may be incorrect.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: Namespaces and devices
From: Daniel Lezcano @ 2010-05-25 10:14 UTC (permalink / raw)
  To: Martín Ferrari; +Cc: netdev, Mathieu Lacage
In-Reply-To: <AANLkTimp16eLIO7F65t7-hFgXGbVnoEIW3jGhzWTJlYa@mail.gmail.com>

On 05/25/2010 11:56 AM, Martín Ferrari wrote:
> Hi Daniel,
>
> On Sun, May 23, 2010 at 16:04, Daniel Lezcano<daniel.lezcano@free.fr>  wrote:
>
>    
>> The documentation on this website is a bit out dated. That was the initial
>> behavior but was changed as the following.
>>
>> All the virtual devices are destroyed with the network namespace. The
>> destroyable virtual devices are identified when they have the dellink ops
>> defined. If you can do the 'ip link del' command on this device, then this
>> device type will be destroyed by a netns.
>>      
> Fair enough, thanks for the explanation.
>    

You are welcome ;)

>> About the oops,  was the it "kernel panic when using
>> netns+bridges+tc(netem)" ?
>>      
> No, this is a different issue. Probably it got lost in the noise of
> the thread, but it is here:
> http://thread.gmane.org/gmane.linux.debian.devel.bugs.general/694830/focus=158583
> (it is my second mail in that thread)

Ok, I was able to reproduce it easily. If I have spare time this week, I 
will look at the oops.

>>> Also, I have read somewhere (now I cannot find it) that supposedly, I
>>> should be able to move real devices to a netns, but I always get
>>> Invalid argument errors.
>>>        
>> Yes, that was previously the case with the proof of concept, because sysfs
>> per namespace was enabled. But this feature is not merged upstream yet (but
>> is on the way), so physical devices are not movable across namespaces.
>>      
> OK. The plan is for them to return to netns 1 automatically at destroy?
>    

Yes, this is the plan.

Thanks
   -- Daniel

^ permalink raw reply

* Re: Namespaces and devices
From: Martín Ferrari @ 2010-05-25  9:56 UTC (permalink / raw)
  To: Daniel Lezcano; +Cc: netdev, Mathieu Lacage
In-Reply-To: <4BF935F6.2050907@free.fr>

Hi Daniel,

On Sun, May 23, 2010 at 16:04, Daniel Lezcano <daniel.lezcano@free.fr> wrote:

> The documentation on this website is a bit out dated. That was the initial
> behavior but was changed as the following.
>
> All the virtual devices are destroyed with the network namespace. The
> destroyable virtual devices are identified when they have the dellink ops
> defined. If you can do the 'ip link del' command on this device, then this
> device type will be destroyed by a netns.

Fair enough, thanks for the explanation.

> About the oops,  was the it "kernel panic when using
> netns+bridges+tc(netem)" ?

No, this is a different issue. Probably it got lost in the noise of
the thread, but it is here:
http://thread.gmane.org/gmane.linux.debian.devel.bugs.general/694830/focus=158583
(it is my second mail in that thread)

>> Also, I have read somewhere (now I cannot find it) that supposedly, I
>> should be able to move real devices to a netns, but I always get
>> Invalid argument errors.
>
> Yes, that was previously the case with the proof of concept, because sysfs
> per namespace was enabled. But this feature is not merged upstream yet (but
> is on the way), so physical devices are not movable across namespaces.

OK. The plan is for them to return to netns 1 automatically at destroy?

Thanks!
-- 
Martín Ferrari

^ permalink raw reply

* Re: host panic on kernel 2.6.34
From: Avi Kivity @ 2010-05-25  9:53 UTC (permalink / raw)
  To: Hao, Xudong
  Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, netdev, bridge
In-Reply-To: <BC00F5384FCFC9499AF06F92E8B78A9E04DCCCE242@shsmsx502.ccr.corp.intel.com>

Copying netdev, bridge mailing lists.

On 05/24/2010 11:23 AM, Hao, Xudong wrote:
> Hi all
> I build latest kvm 37dec075a7854f0f550540bf3b9bbeef37c11e2a, based on kernel 2.6.34, after kvm and kvm_intel module loaded, then /etc/init.d/kvm start, a few minutes later, the system will panic.
>
> kernel: 2.6.34
> kvm: 37dec075a7854f0f550540bf3b9bbeef37c11e2a
> qemu-kvm: 69dd59a66aaf56d1e8e4c96d0a0923c9cf8f79a0
>
> BUG: unable to handle kernel NULL pointer dereference at 00000018
> IP: [<f914c05b>] br_mdb_ip_get+0x2e/0x1aa [bridge]
> *pdpt = 0000000035fbb001 *pde = 0000000000000000
> Oops: 0000 [#1] SMP
> last sysfs file: /sys/devices/system/cpu/cpu7/cache/index2/shared_cpu_map
> Modules linked in: bridge stp autofs4 hidp rfcomm l2cap crc16 bluetooth rfkill ]
>
> Pid: 0, comm: swapper Not tainted 2.6.34 #1 X7DWA/X7DWA
> EIP: 0060:[<f914c05b>] EFLAGS: 00010246 CPU: 0
> EIP is at br_mdb_ip_get+0x2e/0x1aa [bridge]
> EAX: c5801d40 EBX: c5801d40 ECX: faffffef EDX: 00000000
> ESI: f67e03c0 EDI: f5249200 EBP: c5801c94 ESP: c5801c80
>   DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> Process swapper (pid: 0, ti=c5801000 task=c07f2fe0 task.ti=c07de000)
> Stack:
>   c5801d40 00000000 c5801d40 f67e03c0 f5249200 c5801cb0 f914c6fd fff90006
> <0>  f67e0940 f6326740 f627e064 f67e03c0 c5801d78 f914dd0c f76af140 f6326740
> <0>  f5249200 f67e03c0 00000014 f6326758 c5801d54 c08eb440 c5801cf4 c5801d00
> Call Trace:
>   [<f914c6fd>] ? br_multicast_leave_group+0x52/0x128 [bridge]
>   [<f914dd0c>] ? br_multicast_rcv+0x6dc/0xe90 [bridge]
>   [<c0650420>] ? fib_lookup+0x2c/0x3a
>   [<c064cd15>] ? fib_validate_source+0x29d/0x2b4
>   [<c0621175>] ? nf_hook_slow+0x3b/0x92
>   [<f9147b39>] ? br_handle_frame_finish+0x53/0x17e [bridge]
>   [<f914b880>] ? br_nf_pre_routing_finish+0x264/0x27c [bridge]
>   [<c0621175>] ? nf_hook_slow+0x3b/0x92
>   [<f914b61c>] ? br_nf_pre_routing_finish+0x0/0x27c [bridge]
>   [<f914bf6f>] ? br_nf_pre_routing+0x553/0x570 [bridge]
>   [<c0621107>] ? nf_iterate+0x2f/0x62
>   [<f9147ae6>] ? br_handle_frame_finish+0x0/0x17e [bridge]
>   [<c0621175>] ? nf_hook_slow+0x3b/0x92
>   [<f9147ae6>] ? br_handle_frame_finish+0x0/0x17e [bridge]
>   [<f9147dda>] ? br_handle_frame+0x176/0x198 [bridge]
>   [<f9147ae6>] ? br_handle_frame_finish+0x0/0x17e [bridge]
>   [<c060643b>] ? __netif_receive_skb+0x29a/0x37e
>   [<c0607023>] ? dev_gro_receive+0xfd/0x1d2
>   [<c0606e03>] ? netif_receive_skb+0x61/0x67
>   [<c0607199>] ? __napi_gro_receive+0xa1/0xba
>   [<c0606e7e>] ? napi_skb_finish+0x1e/0x33
>   [<c0607201>] ? napi_gro_receive+0x20/0x24
>   [<f8867cfc>] ? igb_poll+0x706/0xa39 [igb]
>   [<c06093b2>] ? net_rx_action+0x97/0x13b
>   [<c0430641>] ? __do_softirq+0x80/0xf4
>   [<c04305c1>] ? __do_softirq+0x0/0xf4
>   <IRQ>
>   [<c04305bf>] ? irq_exit+0x29/0x2b
>   [<c040373e>] ? do_IRQ+0x85/0x9b
>   [<c0402ca9>] ? common_interrupt+0x29/0x30
>   [<c0407c4f>] ? mwait_idle+0x4c/0x52
>   [<c0401a08>] ? cpu_idle+0x3a/0x4e
>   [<c066cf16>] ? rest_init+0x62/0x64
>   [<c08248dd>] ? start_kernel+0x2c2/0x2c7
>   [<c08240b3>] ? i386_start_kernel+0xb3/0xb8
> Code: 57 56 53 83 ec 08 89 45 f0 89 55 ec 8b 42 10 66 83 f8 08 74 0e 31 db 66 3
> EIP: [<f914c05b>] br_mdb_ip_get+0x2e/0x1aa [bridge] SS:ESP 0068:c5801c80
> CR2: 0000000000000018
> ---[ end trace 907f878ab4cd8031 ]---
> Kernel panic - not syncing: Fatal exception in interrupt
> Pid: 0, comm: swapper Tainted: G      D     2.6.34 #1
> Call Trace:
>   [<c042c31b>] panic+0x3e/0xaa
>   [<c0681caa>] oops_end+0x8c/0x9b
>   [<c041e710>] no_context+0x153/0x15d
>   [<c041e8a2>] __bad_area_nosemaphore+0xe5/0xed
>   [<c041e90e>] bad_area_nosemaphore+0xd/0x13
>   [<c06838b0>] do_page_fault+0x375/0x37d
>   [<c0650420>] ? fib_lookup+0x2c/0x3a
>   [<c0624431>] ? ip_route_input_common+0x695/0xf2f
>   [<c068353b>] ? do_page_fault+0x0/0x37d
>   [<c06813d6>] error_code+0x66/0x6c
>   [<c068353b>] ? do_page_fault+0x0/0x37d
>   [<f914c05b>] ? br_mdb_ip_get+0x2e/0x1aa [bridge]
>   [<f914c6fd>] br_multicast_leave_group+0x52/0x128 [bridge]
>   [<f914dd0c>] br_multicast_rcv+0x6dc/0xe90 [bridge]
>   [<c0650420>] ? fib_lookup+0x2c/0x3a
>   [<c064cd15>] ? fib_validate_source+0x29d/0x2b4
>   [<c0621175>] ? nf_hook_slow+0x3b/0x92
>   [<f9147b39>] br_handle_frame_finish+0x53/0x17e [bridge]
>   [<f914b880>] br_nf_pre_routing_finish+0x264/0x27c [bridge]
>   [<c0621175>] ? nf_hook_slow+0x3b/0x92
>   [<f914b61c>] ? br_nf_pre_routing_finish+0x0/0x27c [bridge]
>   [<f914bf6f>] br_nf_pre_routing+0x553/0x570 [bridge]
>   [<c0621107>] nf_iterate+0x2f/0x62
>   [<f9147ae6>] ? br_handle_frame_finish+0x0/0x17e [bridge]
>   [<c0621175>] nf_hook_slow+0x3b/0x92
>   [<f9147ae6>] ? br_handle_frame_finish+0x0/0x17e [bridge]
>   [<f9147dda>] br_handle_frame+0x176/0x198 [bridge]
>   [<f9147ae6>] ? br_handle_frame_finish+0x0/0x17e [bridge]
>   [<c060643b>] __netif_receive_skb+0x29a/0x37e
>   [<c0607023>] ? dev_gro_receive+0xfd/0x1d2
>   [<c0606e03>] netif_receive_skb+0x61/0x67
>   [<c0607199>] ? __napi_gro_receive+0xa1/0xba
>   [<c0606e7e>] napi_skb_finish+0x1e/0x33
>   [<c0607201>] napi_gro_receive+0x20/0x24
>   [<f8867cfc>] igb_poll+0x706/0xa39 [igb]
>   [<c06093b2>] net_rx_action+0x97/0x13b
>   [<c0430641>] __do_softirq+0x80/0xf4
>   [<c04305c1>] ? __do_softirq+0x0/0xf4
>   <IRQ>   [<c04305bf>] ? irq_exit+0x29/0x2b
>   [<c040373e>] ? do_IRQ+0x85/0x9b
>   [<c0402ca9>] ? common_interrupt+0x29/0x30
>   [<c0407c4f>] ? mwait_idle+0x4c/0x52
>   [<c0401a08>] ? cpu_idle+0x3a/0x4e
>   [<c066cf16>] ? rest_init+0x62/0x64
>   [<c08248dd>] ? start_kernel+0x2c2/0x2c7
>   [<c08240b3>] ? i386_start_kernel+0xb3/0xb8
>
> Best Regards,
> Xudong Hao
>    


-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply

* Re: [Patch] fix packet loss and massive ping spikes with PPP multi-link
From: Richard Hartmann @ 2010-05-25  9:52 UTC (permalink / raw)
  To: linux-kernel, netdev, linux-ppp, erik_list
In-Reply-To: <2d460de71003260850x7f90d04cy79ac853464108182@mail.gmail.com>

Hi all,

unfortunately, I lack the thread [1] in which John and Erik so I am
replying in my initial thread.

Unfortunately, our own patch is still in its initial state even though
we tried to clean it up to the Kernel's standard.

As you can see in [1], there is a real problem with multilink ppp. Thus,
I wanted to take the opportunity to ask for someone who is more familiar
with both C and the Linux Kernel than we are to have a go at getting
this code cleaned up and into mainline.

Any and all feedback appreciated,
Richard

[1] http://marc.info/?l=linux-ppp&m=127429724125841&w=2

^ permalink raw reply

* Re: Question about an assignment in handle_ing()
From: jamal @ 2010-05-25  9:51 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: netdev, davem, herbert, kaber
In-Reply-To: <20100524112236.GF2810@psychotron.lab.eng.brq.redhat.com>

Hi Jiri,

On Mon, 2010-05-24 at 13:22 +0200, Jiri Pirko wrote:

> Question is if this code is correct here. Maybe I'm missing something but
> why is this dependent on a ptype was found previously?

The code is correct. 

Main reason for the else condn is driven by optimization:
If you are running tcpdump or other af packet type code, the "if" condn
is hit and matching actions are not allowed trample on that same packet
data. They have to make a private copy; otherwise, the "else" is hit and
(for optimization reason) you give ok to the actions that follow to
munge the packet. Essentially, you dont want actions to alloc/copy every
single time when you are not running tcpdump for example; reason is that
most of the time you run tcpdump it is for debugging.
[I had seen very observable differences on some old mips board back in
the day on whether you avoided copy every time vs when debugging by
running tcpdump and copied every packet.]

There is a secondary reason for the else stmnt: you want tcpdump to see
the naked packet as it came on say eth0. I am in travel mode at the
moment, so i cant validate this for you via a testcase (which moves the
else outside), but i know this was a problem back then...
Example. If i had a packet sequence path as follows: 
--> eth0-->tcpdump0-->filter match --> action: edit -->action: redirect
to dummy0-->tcpdump1-->dummy0(drop and count)
Then you want the packet on tcpdump0 to be whatever was seen by eth0.
You want to see on tcpdump1 whatever was seen by dummy0 - which is an
edited version of whatever was seen by eth0. 

If this description makes sense to you (and since this has come up more
than once before), would you be kind enough to submit a patch that fixes
the current comment and add my ACK to it?

cheers,
jamal

^ permalink raw reply

* RE: ixgbe: macvlan on PF/VF when SRIOV is enabled
From: Shirley Ma @ 2010-05-25  9:37 UTC (permalink / raw)
  To: Rose, Gregory V
  Cc: Kirsher, Jeffrey T, davem@davemloft.net, kvm@vger.kernel.org,
	netdev@vger.kernel.org, e1000-devel@lists.sourceforge.net
In-Reply-To: <43F901BD926A4E43B106BF17856F0755A79C0E05@orsmsx508.amr.corp.intel.com>

On Mon, 2010-05-24 at 10:54 -0700, Rose, Gregory V wrote:
> We look forward to it and will be happy to provide feedback.
I have submitted the patch to make macvlan on PF works when SRIOV is
enabled.

> One thing you can do is allocate VFs and then load the VF driver in
> your host domain and then assign each of them a macvlan filter.  You'd
> get a similar effect.

That's I am trying to make it work for macvlan on VFs in host domain. I
need to add VF secondary addresses in address filter, right?

Do you have any aggregation performance comparison between multiple
macvlans on PF and single macvlan per VF in host domain? I will run some
test to figure it out. If you have some data to share that would be
great.

Thanks
Shirley

^ permalink raw reply

* Re: [PATCH 0/2] fixes to arp_notify for virtual machine migration use case
From: Ian Campbell @ 2010-05-25  9:20 UTC (permalink / raw)
  To: David Miller
  Cc: netdev@vger.kernel.org, shemminger@linux-foundation.org,
	Jeremy Fitzhardinge, stable@kernel.org
In-Reply-To: <20100523.233722.232536798.davem@davemloft.net>

On Mon, 2010-05-24 at 07:37 +0100, David Miller wrote:
> From: Ian Campbell <Ian.Campbell@citrix.com>
> Date: Wed, 12 May 2010 14:39:14 +0100
> 
> > Ian Campbell (2):
> >       arp_notify: generate broadcast ARP reply not request.
> >       arp_notify: generate arp_notify event on NETDEV_CHANGE too
> 
> I don't agree with these changes.
> 
> For the first one, I think the documentation is just wrong and the
> code is what expresses the intent.  The idea is not to spam the
> world with a broadcast, only interested parties.

OK. So what is currently sent is a gratuitous ARP request which I hadn't
realised was a valid/normal thing to send but a bit of research shows
that it is.

In terms of MAC learning tables (which is what is interesting in the VM
migration case) I think the gratuitous ARP request and reply are
equivalent.

I guess the difference is that an ARP request doesn't cause everyone to
add an entry to their ARP cache, whereas a bcast ARP reply would cause
an ARP cache entry to appear even on hosts which may never talk to the
VM. Is that what you meant by spamming the world?

Anyway, I'll patch the docs instead in my next submission.

> Patch #2 I have major issues with, carriers flapping occaisionally is
> very common.  I have several interfaces which do this even on lightly
> loaded networks.  Iff we decide to do something like this (big "if")
> it would need to be rate limited so that it doesn't trigger due to
> normal flapping.

FWIW arp_notify is an option which is disabled by default.

> If you want your VM networking devices to trigger this event maybe
> the best thing to do is to create a special notification which
> allows us to prevent from doing this ARP notify for spurious physical
> network device carrier flaps.

I actually started down this path before I figured out how to make the
existing notifications work for my use cases.

Anyway, assuming the fact that arp_notify is disabled by default hasn't
changed your mind, would adding NETDEV_NOTIFY_PEERS triggered by
netif_notify_peers() be appropriate or would it be preferable to simply
add netif_notify_peers() which generates the existing NETDEV_CHANGEADDR?

I don't think there's any need for a new sysctl so I'll gate the new
option on the existing arp_notify one.

Ian.

^ permalink raw reply

* Re: [PATCH] vhost: Fix host panic if ioctl called with wrong index
From: Michael S. Tsirkin @ 2010-05-25  8:13 UTC (permalink / raw)
  To: Krishna Kumar; +Cc: netdev, kvm
In-Reply-To: <20100525054036.2022.66692.sendpatchset@krkumar2.in.ibm.com>

On Tue, May 25, 2010 at 11:10:36AM +0530, Krishna Kumar wrote:
> From: Krishna Kumar <krkumar2@in.ibm.com>
> 
> Missed a boundary value check in vhost_set_vring. The host panics if
> idx == nvqs is used in ioctl commands in vhost_virtqueue_init.
> 
> Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>

Thanks, applied.

> ---
>  drivers/vhost/vhost.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff -ruNp org/drivers/vhost/vhost.c new/drivers/vhost/vhost.c
> --- org/drivers/vhost/vhost.c	2010-05-24 09:25:57.000000000 +0530
> +++ new/drivers/vhost/vhost.c	2010-05-24 09:26:53.000000000 +0530
> @@ -374,7 +374,7 @@ static long vhost_set_vring(struct vhost
>  	r = get_user(idx, idxp);
>  	if (r < 0)
>  		return r;
> -	if (idx > d->nvqs)
> +	if (idx >= d->nvqs)
>  		return -ENOBUFS;
>  
>  	vq = d->vqs + idx;

^ permalink raw reply

* [PATCH net-2.6] be2net: Bug fix to avoid disabling bottom half during firmware upgrade.
From: Sarveshwar Bandi @ 2010-05-25  8:15 UTC (permalink / raw)
  To: netdev; +Cc: davem

Certain firmware commands/operations to upgrade firmware could take several
seconds to complete. The code presently disables bottom half during these
operations which could lead to unpredictable behaviour in certain cases. This
patch now does all firmware upgrade operations asynchronously using a
completion variable.

Signed-off-by: Sarveshwar Bandi <sarveshwarb@serverengines.com>
---
 drivers/net/benet/be.h      |    2 ++
 drivers/net/benet/be_cmds.c |   19 +++++++++++++++++--
 drivers/net/benet/be_main.c |    1 +
 3 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/drivers/net/benet/be.h b/drivers/net/benet/be.h
index 373c1a5..b46be49 100644
--- a/drivers/net/benet/be.h
+++ b/drivers/net/benet/be.h
@@ -283,6 +283,8 @@ struct be_adapter {
 	u8 port_type;
 	u8 transceiver;
 	u8 generation;		/* BladeEngine ASIC generation */
+	u32 flash_status;
+	struct completion flash_compl;
 
 	bool sriov_enabled;
 	u32 vf_if_handle[BE_MAX_VF];
diff --git a/drivers/net/benet/be_cmds.c b/drivers/net/benet/be_cmds.c
index e79bf8b..c911bfb 100644
--- a/drivers/net/benet/be_cmds.c
+++ b/drivers/net/benet/be_cmds.c
@@ -59,6 +59,13 @@ static int be_mcc_compl_process(struct b
 
 	compl_status = (compl->status >> CQE_STATUS_COMPL_SHIFT) &
 				CQE_STATUS_COMPL_MASK;
+
+	if ((compl->tag0 == OPCODE_COMMON_WRITE_FLASHROM) &&
+		(compl->tag1 == CMD_SUBSYSTEM_COMMON)) {
+		adapter->flash_status = compl_status;
+		complete(&adapter->flash_compl);
+	}
+
 	if (compl_status == MCC_STATUS_SUCCESS) {
 		if (compl->tag0 == OPCODE_ETH_GET_STATISTICS) {
 			struct be_cmd_resp_get_stats *resp =
@@ -1417,6 +1424,7 @@ int be_cmd_write_flashrom(struct be_adap
 	int status;
 
 	spin_lock_bh(&adapter->mcc_lock);
+	adapter->flash_status = 0;
 
 	wrb = wrb_from_mccq(adapter);
 	if (!wrb) {
@@ -1428,6 +1436,7 @@ int be_cmd_write_flashrom(struct be_adap
 
 	be_wrb_hdr_prepare(wrb, cmd->size, false, 1,
 			OPCODE_COMMON_WRITE_FLASHROM);
+	wrb->tag1 = CMD_SUBSYSTEM_COMMON;
 
 	be_cmd_hdr_prepare(&req->hdr, CMD_SUBSYSTEM_COMMON,
 		OPCODE_COMMON_WRITE_FLASHROM, cmd->size);
@@ -1439,10 +1448,16 @@ int be_cmd_write_flashrom(struct be_adap
 	req->params.op_code = cpu_to_le32(flash_opcode);
 	req->params.data_buf_size = cpu_to_le32(buf_size);
 
-	status = be_mcc_notify_wait(adapter);
+	be_mcc_notify(adapter);
+	spin_unlock_bh(&adapter->mcc_lock);
+
+	if (!wait_for_completion_timeout(&adapter->flash_compl,
+			msecs_to_jiffies(12000)))
+		status = -1;
+	else
+		status = adapter->flash_status;
 
 err:
-	spin_unlock_bh(&adapter->mcc_lock);
 	return status;
 }
 
diff --git a/drivers/net/benet/be_main.c b/drivers/net/benet/be_main.c
index 1c79c20..aa065c7 100644
--- a/drivers/net/benet/be_main.c
+++ b/drivers/net/benet/be_main.c
@@ -2319,6 +2319,7 @@ static int be_ctrl_init(struct be_adapte
 	spin_lock_init(&adapter->mcc_lock);
 	spin_lock_init(&adapter->mcc_cq_lock);
 
+	init_completion(&adapter->flash_compl);
 	pci_save_state(adapter->pdev);
 	return 0;
 
-- 
1.4.0


^ permalink raw reply related

* Re: [stable-2.6.32 PATCH] ixgbe: backport bug fix for tx panic
From: Jeff Kirsher @ 2010-05-25  7:15 UTC (permalink / raw)
  To: stable, greg; +Cc: netdev, linux-kernel, davem, Brandon, Jesse Brandeburg
In-Reply-To: <20100511004655.30590.74584.stgit@localhost.localdomain>

On Mon, May 10, 2010 at 17:46, Jeff Kirsher <jeffrey.t.kirsher@intel.com> wrote:
> From: Jesse Brandeburg <jesse.brandeburg@intel.com>
>
> backporting this commit:
>
> commit fdd3d631cddad20ad9d3e1eb7dbf26825a8a121f
> Author: Krishna Kumar <krkumar2@in.ibm.com>
> Date:   Wed Feb 3 13:13:10 2010 +0000
>
>    ixgbe: Fix return of invalid txq
>
>    a developer had complained of getting lots of warnings:
>
>    "eth16 selects TX queue 98, but real number of TX queues is 64"
>
>    http://www.mail-archive.com/e1000-devel@lists.sourceforge.net/msg02200.html
>
>    As there was no follow up on that bug, I am submitting this
>    patch assuming that the other return points will not return
>    invalid txq's, and also that this fixes the bug (not tested).
>
>    Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
>    Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
>    Acked-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
>    Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
>    Signed-off-by: David S. Miller <davem@davemloft.net>
>
> CC: Brandon <brandon@ifup.org>
> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
> ---
>
>  drivers/net/ixgbe/ixgbe_main.c |    8 ++++++--
>  1 files changed, 6 insertions(+), 2 deletions(-)
>

Greg - status?  Did you queue this patch for the stable release and I missed it?

-- 
Cheers,
Jeff

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox