Netdev List

* [PATCH 2/2] drivers/net: Use of_match_ptr() macro in smsc911x.c
From: Sachin Kamat @ 2012-12-19 11:17 UTC (permalink / raw)
  To: netdev; +Cc: steve.glendinning, davem, sachin.kamat, patches, nico
In-Reply-To: <1355915830-29481-1-git-send-email-sachin.kamat@linaro.org>

Add CONFIG_OF guard and use of_match_ptr macro.

Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
---
Compile tested on linux-next.
---
 drivers/net/ethernet/smsc/smsc911x.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ethernet/smsc/smsc911x.c b/drivers/net/ethernet/smsc/smsc911x.c
index 4616bf2..e112877 100644
--- a/drivers/net/ethernet/smsc/smsc911x.c
+++ b/drivers/net/ethernet/smsc/smsc911x.c
@@ -2575,11 +2575,13 @@ static const struct dev_pm_ops smsc911x_pm_ops = {
 #define SMSC911X_PM_OPS NULL
 #endif
 
+#ifdef CONFIG_OF
 static const struct of_device_id smsc911x_dt_ids[] = {
 	{ .compatible = "smsc,lan9115", },
 	{ /* sentinel */ }
 };
 MODULE_DEVICE_TABLE(of, smsc911x_dt_ids);
+#endif
 
 static struct platform_driver smsc911x_driver = {
 	.probe = smsc911x_drv_probe,
@@ -2588,7 +2590,7 @@ static struct platform_driver smsc911x_driver = {
 		.name	= SMSC_CHIPNAME,
 		.owner	= THIS_MODULE,
 		.pm	= SMSC911X_PM_OPS,
-		.of_match_table = smsc911x_dt_ids,
+		.of_match_table = of_match_ptr(smsc911x_dt_ids),
 	},
 };
 
-- 
1.7.4.1

^ permalink raw reply related
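
For readers unfamiliar with the macro used in the patch above, here is a minimal userspace sketch of what of_match_ptr() does. The macro body follows the definition in include/linux/of.h. Everything else is an assumption made for illustration and is not kernel code: the simplified struct of_device_id, struct fake_driver, and the helper driver_matches().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Comment this out to emulate a kernel built without device tree support. */
#define CONFIG_OF 1

/* Same shape as the kernel macro: the table pointer with OF, NULL without. */
#ifdef CONFIG_OF
#define of_match_ptr(_ptr) (_ptr)
#else
#define of_match_ptr(_ptr) NULL
#endif

/* Simplified stand-in for struct of_device_id (illustration only). */
struct of_device_id {
	char compatible[128];
};

/* The table only exists when OF is enabled, as in the patch. */
#ifdef CONFIG_OF
static const struct of_device_id smsc911x_dt_ids[] = {
	{ .compatible = "smsc,lan9115", },
	{ /* sentinel */ }
};
#endif

/* Simplified stand-in for the .driver part of struct platform_driver. */
struct fake_driver {
	const char *name;
	const struct of_device_id *of_match_table;
};

/* Without CONFIG_OF the macro drops the identifier, so this still builds. */
static const struct fake_driver smsc911x_driver = {
	.name = "smsc911x",
	.of_match_table = of_match_ptr(smsc911x_dt_ids),
};

/* Hypothetical helper: returns 1 if the driver lists the compatible string. */
static int driver_matches(const struct fake_driver *drv, const char *compat)
{
	const struct of_device_id *id;

	assert(drv != NULL && compat != NULL);
	id = drv->of_match_table;
	if (id == NULL)
		return 0;	/* built without OF: no table to search */
	for (; id->compatible[0] != '\0'; id++)
		if (strcmp(id->compatible, compat) == 0)
			return 1;
	return 0;
}
```

Because the macro discards its argument when CONFIG_OF is unset, the table can sit inside an #ifdef CONFIG_OF block without a matching "#define ... NULL" fallback, which is the pattern both patches in this series rely on.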

* Re: [PATCH] xen/netfront: improve truesize tracking
From: Sander Eikelenboom @ 2012-12-19 11:34 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Eric Dumazet, netdev@vger.kernel.org, Konrad Rzeszutek Wilk,
	annie li, xen-devel@lists.xensource.com
In-Reply-To: <1355844398.14620.254.camel@zakaz.uk.xensource.com>


Tuesday, December 18, 2012, 4:26:38 PM, you wrote:

> On Tue, 2012-12-18 at 15:12 +0000, Eric Dumazet wrote:
>> On Tue, 2012-12-18 at 13:51 +0000, Ian Campbell wrote:
>> > Using RX_COPY_THRESHOLD is incorrect if the SKB is actually smaller
>> > than that. We have already accounted for this in
>> > NETFRONT_SKB_CB(skb)->pull_to so use that instead.
>> > 
>> > Fixes WARN_ON from skb_try_coalesce.
>> > 
>> > Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
>> > Cc: Sander Eikelenboom <linux@eikelenboom.it>
>> > Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>> > Cc: annie li <annie.li@oracle.com>
>> > Cc: xen-devel@lists.xensource.com
>> > Cc: netdev@vger.kernel.org
>> > Cc: stable@kernel.org # 3.7.x only
>> > ---
>> >  drivers/net/xen-netfront.c |   15 +++++----------
>> >  1 files changed, 5 insertions(+), 10 deletions(-)
>> > 
>> > diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
>> > index caa0110..b06ef81 100644
>> > --- a/drivers/net/xen-netfront.c
>> > +++ b/drivers/net/xen-netfront.c
>> > @@ -971,17 +971,12 @@ err:
>> >              * overheads. Here, we add the size of the data pulled
>> >              * in xennet_fill_frags().
>> >              *
>> > -            * We also adjust for any unused space in the main
>> > -            * data area by subtracting (RX_COPY_THRESHOLD -
>> > -            * len). This is especially important with drivers
>> > -            * which split incoming packets into header and data,
>> > -            * using only 66 bytes of the main data area (see the
>> > -            * e1000 driver for example.)  On such systems,
>> > -            * without this last adjustement, our achievable
>> > -            * receive throughout using the standard receive
>> > -            * buffer size was cut by 25%(!!!).
>> > +            * We also adjust for the __pskb_pull_tail done in
>> > +            * handle_incoming_queue which pulls data from the
>> > +            * frags into the head area, which is already
>> > +            * accounted in RX_COPY_THRESHOLD.
>> >              */
>> > -           skb->truesize += skb->data_len - RX_COPY_THRESHOLD;
>> > +           skb->truesize += skb->data_len - NETFRONT_SKB_CB(skb)->pull_to;
>> >             skb->len += skb->data_len;
>> >  
>> >             if (rx->flags & XEN_NETRXF_csum_blank)
>> 
>> 
>> But skb truesize is not what you think.

> Indeed, it seems I was completely backwards about what it means!

>> You must account the exact memory used by this skb, not only the used
>> part of it.
>> 
>> At the very minimum, it should be
>> 
>> skb->truesize += skb->data_len;
>> 
>> But it really should be the allocated size of the fragment.
>> 
>> If its a page, then its a page, even if you use one single byte in it.

> So actually we want += PAGE_SIZE * skb_shinfo(skb)->nr_frags ?

> Sander, can you try that change?

Hi Ian,

It ran overnight and I haven't seen the warn_once trigger.
(But I also didn't with the previous patch.)

--
Sander

> Ian.

^ permalink raw reply
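
The difference between the two accounting schemes discussed in this thread can be sketched with plain arithmetic. This is a userspace illustration only: the function names and the 4096-byte page size are assumptions, and the sketch assumes each fragment occupies exactly one page.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size, for illustration */

/* Accounting by bytes used: what Eric calls the very minimum. */
static size_t truesize_by_data_len(size_t base_truesize, size_t data_len)
{
	return base_truesize + data_len;
}

/* Accounting by memory allocated: one full page per fragment. */
static size_t truesize_by_pages(size_t base_truesize, unsigned int nr_frags)
{
	return base_truesize + (size_t)SKETCH_PAGE_SIZE * nr_frags;
}

/* Bytes that the data_len accounting leaves unreported for the same skb. */
static size_t truesize_shortfall(size_t data_len, unsigned int nr_frags)
{
	size_t allocated = (size_t)SKETCH_PAGE_SIZE * nr_frags;

	assert(data_len <= allocated);	/* frag data cannot exceed its pages */
	return allocated - data_len;
}
```

With a single fragment holding one byte, the page-based figure exceeds the data_len-based one by 4095 bytes, which is Eric's "if it's a page, then it's a page" point.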

* Re: RFC  [PATCH] iproute2:  temporary solution to fix xt breakage
From: Jamal Hadi Salim @ 2012-12-19 11:36 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Hasan Chowdhury, Jan Engelhardt, Yury Stankevich,
	netdev@vger.kernel.org, pablo, netfilter-devel
In-Reply-To: <20121217081203.3dc324c8@nehalam.linuxnetplumber.net>

On 12-12-17 11:12 AM, Stephen Hemminger wrote:

>
> Maybe xtables should have stable API/ABI and use shim routines there?

That's the general direction being taken now with these last changes...

cheers,
jamal



^ permalink raw reply

* Re: tc ipt action
From: Jamal Hadi Salim @ 2012-12-19 11:43 UTC (permalink / raw)
  To: Jan Engelhardt
  Cc: Pablo Neira Ayuso, Yury Stankevich, shemonc,
	netdev@vger.kernel.org, netfilter-devel
In-Reply-To: <alpine.LNX.2.01.1212181426590.24597@nerf07.vanv.qr>

On 12-12-18 08:58 AM, Jan Engelhardt wrote:
>

> Chains can store multiple targets, so no loss.

Nice.

> 1. table
>
> First, I think some targets need to relax their restrictions, such as
> with xt_DSCP.

Saw your other patch to get rid of mangle hardcoding.

> Then, only a handful of extensions remain: CT, <all NATs>,
> TPROXY and REJECT. Would anyone want to call these from act_ipt?
> I doubt it. :)
>

Tempted to say tproxy.

> 2. hooks
>
> Extensions with hook limit: <NAT>, TPROXY, REJECT, CLASSIFY.
> Again, I don't quite see the value of attempting to NAT from act_ipt.
> CLASSIFY {c|sh?}ould be relaxed, unless I am missing something.
>


I could live with that. It would be an improvement over what's there
today. However, I would prefer this to be an improvement on top of the
act_xt.c I posted, as opposed to having even more interfaces for xt.
We've suffered enough already ;-> i.e. add your patches on top.

cheers,
jamal


^ permalink raw reply

* Re: RFC [PATCH] iproute2: temporary solution to fix xt breakage
From: Jamal Hadi Salim @ 2012-12-19 11:44 UTC (permalink / raw)
  To: Hasan Chowdhury
  Cc: Stephen Hemminger, Jan Engelhardt, Yury Stankevich,
	netdev@vger.kernel.org, pablo, netfilter-devel
In-Reply-To: <CAASe=fR6Hm2dxp=1wDchtrzqnaH6qacHpg2wrsqLfmGpPbQ9Fg@mail.gmail.com>

On 12-12-18 09:45 AM, Hasan Chowdhury wrote:
> Hi Jamal,
>
> Thanks for all the help and the information. I will stay tuned so
> that when the proper patch from the kernel side shows up, I can
> integrate it into my system to test it.
>

Yikes. I guess I never posted that? Will do it shortly.

cheers,
jamal


^ permalink raw reply

* Re: [GIT PULL net-next 04/17] ndisc: Introduce ndisc_fill_redirect_hdr_option().
From: Bjørn Mork @ 2012-12-19 11:47 UTC (permalink / raw)
  To: YOSHIFUJI Hideaki; +Cc: davem, netdev
In-Reply-To: <50D04B4B.7060002@linux-ipv6.org>

YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> writes:

> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
> ---
>  net/ipv6/ndisc.c |   21 +++++++++++++++------
>  1 file changed, 15 insertions(+), 6 deletions(-)
>
> diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
> index a181113..0a4f3a9 100644
> --- a/net/ipv6/ndisc.c
> +++ b/net/ipv6/ndisc.c
> @@ -1332,6 +1332,19 @@ static void ndisc_redirect_rcv(struct sk_buff *skb)
>  	icmpv6_notify(skb, NDISC_REDIRECT, 0, 0);
>  }
>  
> +static u8 *ndisc_fill_redirect_hdr_option(u8 *opt, struct sk_buff *orig_skb,
> +					  int rd_len)
> +{
> +	memset(opt, 0, 8);
> +	*(opt++) = ND_OPT_REDIRECT_HDR;
> +	*(opt++) = (rd_len >> 3);
> +	opt += 6;
> +
> +	memcpy(opt, ipv6_hdr(orig_skb), rd_len - 8);
> +
> +	return opt;
> +}
> +

I realize that it doesn't currently matter, but the above modification
of "opt" looks like a bug-waiting-to-happen to me.

>  void ndisc_send_redirect(struct sk_buff *skb, const struct in6_addr *target)
>  {
>  	struct net_device *dev = skb->dev;
> @@ -1461,12 +1474,8 @@ void ndisc_send_redirect(struct sk_buff *skb, const struct in6_addr *target)
>  	 *	build redirect option and copy skb over to the new packet.
>  	 */
>  
> -	memset(opt, 0, 8);
> -	*(opt++) = ND_OPT_REDIRECT_HDR;
> -	*(opt++) = (rd_len >> 3);
> -	opt += 6;
> -
> -	memcpy(opt, ipv6_hdr(skb), rd_len - 8);
> +	if (rd_len)
> +		opt = ndisc_fill_redirect_hdr_option(opt, skb, rd_len);


I understand that opt isn't currently used after this, but if it ever is
then it is going to come as a big surprise that this implies opt += 8;

This was previously quite clear when the code was inline, but it becomes
problematic when it is factored out.


Bjørn

^ permalink raw reply
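
The concern raised above can be made concrete with a userspace sketch. The first function copies the logic of the posted helper, taking a raw byte pointer instead of an sk_buff. The second is a purely hypothetical alternative, not something proposed in the thread, that returns a pointer past the whole option so the caller can keep appending. The function names are invented for illustration; ND_OPT_REDIRECT_HDR is the Redirected Header option type from RFC 4861.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ND_OPT_REDIRECT_HDR 4	/* RFC 4861 Redirected Header option type */

/* Same logic as the posted helper: the result is opt + 8, i.e. it points
 * at the copied packet data, not at the end of the option.
 */
static uint8_t *fill_redirect_hdr_as_posted(uint8_t *opt, const uint8_t *orig,
					    int rd_len)
{
	assert(rd_len >= 8 && (rd_len % 8) == 0);
	memset(opt, 0, 8);
	*(opt++) = ND_OPT_REDIRECT_HDR;
	*(opt++) = (uint8_t)(rd_len >> 3);	/* length in units of 8 octets */
	opt += 6;

	memcpy(opt, orig, (size_t)rd_len - 8);

	return opt;
}

/* Hypothetical variant: leave opt untouched and return the first byte
 * after the complete option (header plus copied data).
 */
static uint8_t *fill_redirect_hdr_past_end(uint8_t *opt, const uint8_t *orig,
					   int rd_len)
{
	assert(rd_len >= 8 && (rd_len % 8) == 0);
	memset(opt, 0, 8);
	opt[0] = ND_OPT_REDIRECT_HDR;
	opt[1] = (uint8_t)(rd_len >> 3);

	memcpy(opt + 8, orig, (size_t)rd_len - 8);

	return opt + rd_len;
}
```

A caller that appends another option at the pointer returned by the first function would overwrite the redirected packet data, which is the bug-waiting-to-happen described in the message.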

* [PATCH 1/2] drivers/net: Use of_match_ptr() macro in smc91x.c
From: Sachin Kamat @ 2012-12-19 11:17 UTC (permalink / raw)
  To: netdev; +Cc: steve.glendinning, davem, sachin.kamat, patches, nico

This eliminates having an #ifdef returning NULL for the case
when OF is disabled.

Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
---
Compile tested on linux-next.
---
 drivers/net/ethernet/smsc/smc91x.c |    4 +---
 1 files changed, 1 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/smsc/smc91x.c b/drivers/net/ethernet/smsc/smc91x.c
index 022b45b..a670d23 100644
--- a/drivers/net/ethernet/smsc/smc91x.c
+++ b/drivers/net/ethernet/smsc/smc91x.c
@@ -2386,8 +2386,6 @@ static const struct of_device_id smc91x_match[] = {
 	{},
 };
 MODULE_DEVICE_TABLE(of, smc91x_match);
-#else
-#define smc91x_match NULL
 #endif
 
 static struct dev_pm_ops smc_drv_pm_ops = {
@@ -2402,7 +2400,7 @@ static struct platform_driver smc_driver = {
 		.name	= CARDNAME,
 		.owner	= THIS_MODULE,
 		.pm	= &smc_drv_pm_ops,
-		.of_match_table = smc91x_match,
+		.of_match_table = of_match_ptr(smc91x_match),
 	},
 };
 
-- 
1.7.4.1

^ permalink raw reply related

* [PATCH] pkt_sched: act_xt support new Xtables interface
From: Jamal Hadi Salim @ 2012-12-19 11:56 UTC (permalink / raw)
  To: Hasan Chowdhury
  Cc: Stephen Hemminger, Jan Engelhardt, Yury Stankevich,
	netdev@vger.kernel.org, pablo, netfilter-devel
In-Reply-To: <50D1A8A7.1090002@mojatatu.com>

[-- Attachment #1: Type: text/plain, Size: 485 bytes --]


To be applied pending more testing.

Attached. Sorry, I thought I had sent this out over the weekend.
I have done basic testing with a single mark and sending pings to
update stats which can then be displayed for the mark.

Hasan/Yury, if you test this please use the latest iproute2 with only
the first patch I posted (originally from Hasan). Hasan, please use that
patch, not your version - if there's anything wrong we can find out sooner,
before the patch becomes final.

cheers,
jamal

[-- Attachment #2: xt-p1 --]
[-- Type: text/plain, Size: 10173 bytes --]

commit 82330cc874429c63bd0e476e413a79ebab3da350
Author: Jamal Hadi Salim <hadi@mojatatu.com>
Date:   Wed Dec 19 06:23:28 2012 -0500

    Fix iptables/xtables ABI changes. We will eventually replace
    act_ipt with act_xt since only very few targets still support the
    old xtables interface
    
    Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>

diff --git a/net/sched/Kconfig b/net/sched/Kconfig
index 235e01a..1693973 100644
--- a/net/sched/Kconfig
+++ b/net/sched/Kconfig
@@ -578,12 +578,25 @@ config NET_ACT_MIRRED
 config NET_ACT_IPT
         tristate "IPtables targets"
         depends on NET_CLS_ACT && NETFILTER && IP_NF_IPTABLES
+	select NET_ACT_XT
         ---help---
 	  Say Y here to be able to invoke iptables targets after successful
-	  classification.
+	  classification. Better yet choose NET_ACT_XT since this version
+	  will eventually be obsoleted.
 
 	  To compile this code as a module, choose M here: the
 	  module will be called act_ipt.
+config NET_ACT_XT
+        tristate "New IPtables targets"
+        depends on NET_CLS_ACT && NETFILTER && IP_NF_IPTABLES
+        ---help---
+	  Say Y here to be able to invoke iptables targets after successful
+	  classification using the new xtables mechanism. This mechanism
+	  will eventually replace NET_ACT_IPT
+
+	  To compile this code as a module, choose M here: the
+	  module will be called act_xt.
+
 
 config NET_ACT_NAT
         tristate "Stateless NAT"
diff --git a/net/sched/Makefile b/net/sched/Makefile
index 978cbf0..10a1136 100644
--- a/net/sched/Makefile
+++ b/net/sched/Makefile
@@ -11,6 +11,7 @@ obj-$(CONFIG_NET_ACT_POLICE)	+= act_police.o
 obj-$(CONFIG_NET_ACT_GACT)	+= act_gact.o
 obj-$(CONFIG_NET_ACT_MIRRED)	+= act_mirred.o
 obj-$(CONFIG_NET_ACT_IPT)	+= act_ipt.o
+obj-$(CONFIG_NET_ACT_XT)	+= act_xt.o
 obj-$(CONFIG_NET_ACT_NAT)	+= act_nat.o
 obj-$(CONFIG_NET_ACT_PEDIT)	+= act_pedit.o
 obj-$(CONFIG_NET_ACT_SIMP)	+= act_simple.o
diff --git a/net/sched/act_xt.c b/net/sched/act_xt.c
new file mode 100644
index 0000000..589cfe6
--- /dev/null
+++ b/net/sched/act_xt.c
@@ -0,0 +1,324 @@
+/*
+ * net/sched/act_xt.c     iptables target interface
+ *
+ *TODO: Add other tables. For now we only support the ipv4 table targets
+ *
+ *		This program is free software; you can redistribute it and/or
+ *		modify it under the terms of the GNU General Public License
+ *		as published by the Free Software Foundation; either version
+ *		2 of the License, or (at your option) any later version.
+ *
+ * Copyright:	Jamal Hadi Salim (2002-12)
+ */
+
+#include <linux/types.h>
+#include <linux/kernel.h>
+#include <linux/string.h>
+#include <linux/errno.h>
+#include <linux/skbuff.h>
+#include <linux/rtnetlink.h>
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/slab.h>
+#include <net/netlink.h>
+#include <net/pkt_sched.h>
+#include <linux/tc_act/tc_ipt.h>
+#include <net/tc_act/tc_ipt.h>
+
+#include <linux/netfilter_ipv4/ip_tables.h>
+
+#define IPT_TAB_MASK     15
+static struct tcf_common *tcf_ipt_ht[IPT_TAB_MASK + 1];
+static u32 ipt_idx_gen;
+static DEFINE_RWLOCK(ipt_lock);
+
+static struct tcf_hashinfo ipt_hash_info = {
+	.htab = tcf_ipt_ht,
+	.hmask = IPT_TAB_MASK,
+	.lock = &ipt_lock,
+};
+
+static int ipt_init_target(struct xt_entry_target *t, char *table,
+			   unsigned int hook)
+{
+	struct xt_tgchk_param par;
+	struct xt_target *target;
+	int ret = 0;
+
+	target = xt_request_find_target(AF_INET, t->u.user.name,
+					t->u.user.revision);
+	if (IS_ERR(target))
+		return PTR_ERR(target);
+
+	t->u.kernel.target = target;
+	par.table = table;
+	par.entryinfo = NULL;
+	par.target = target;
+	par.targinfo = t->data;
+	par.hook_mask = hook;
+	par.family = NFPROTO_IPV4;
+
+	ret = xt_check_target(&par, t->u.target_size - sizeof(*t), 0, false);
+	if (ret < 0) {
+		module_put(t->u.kernel.target->me);
+		return ret;
+	}
+	return 0;
+}
+
+static void ipt_destroy_target(struct xt_entry_target *t)
+{
+	struct xt_tgdtor_param par = {
+		.target = t->u.kernel.target,
+		.targinfo = t->data,
+	};
+	if (par.target->destroy != NULL)
+		par.target->destroy(&par);
+	module_put(par.target->me);
+}
+
+static int tcf_ipt_release(struct tcf_ipt *ipt, int bind)
+{
+	int ret = 0;
+	if (ipt) {
+		if (bind)
+			ipt->tcf_bindcnt--;
+		ipt->tcf_refcnt--;
+		if (ipt->tcf_bindcnt <= 0 && ipt->tcf_refcnt <= 0) {
+			ipt_destroy_target(ipt->tcfi_t);
+			kfree(ipt->tcfi_tname);
+			kfree(ipt->tcfi_t);
+			tcf_hash_destroy(&ipt->common, &ipt_hash_info);
+			ret = ACT_P_DELETED;
+		}
+	}
+	return ret;
+}
+
+static const struct nla_policy ipt_policy[TCA_IPT_MAX + 1] = {
+	[TCA_IPT_TABLE] = {.type = NLA_STRING,.len = IFNAMSIZ},
+	[TCA_IPT_HOOK] = {.type = NLA_U32},
+	[TCA_IPT_INDEX] = {.type = NLA_U32},
+	[TCA_IPT_TARG] = {.len = sizeof(struct xt_entry_target)},
+};
+
+static int tcf_ipt_init(struct nlattr *nla, struct nlattr *est,
+			struct tc_action *a, int ovr, int bind)
+{
+	struct nlattr *tb[TCA_IPT_MAX + 1];
+	struct tcf_ipt *ipt;
+	struct tcf_common *pc;
+	struct xt_entry_target *td, *t;
+	char *tname;
+	int ret = 0, err;
+	u32 hook = 0;
+	u32 index = 0;
+
+	if (nla == NULL)
+		return -EINVAL;
+
+	err = nla_parse_nested(tb, TCA_IPT_MAX, nla, ipt_policy);
+	if (err < 0)
+		return err;
+
+	if (tb[TCA_IPT_HOOK] == NULL)
+		return -EINVAL;
+	if (tb[TCA_IPT_TARG] == NULL)
+		return -EINVAL;
+
+	td = (struct xt_entry_target *)nla_data(tb[TCA_IPT_TARG]);
+	if (nla_len(tb[TCA_IPT_TARG]) < td->u.target_size)
+		return -EINVAL;
+
+	if (tb[TCA_IPT_INDEX] != NULL)
+		index = nla_get_u32(tb[TCA_IPT_INDEX]);
+
+	pc = tcf_hash_check(index, a, bind, &ipt_hash_info);
+	if (!pc) {
+		pc = tcf_hash_create(index, est, a, sizeof(*ipt), bind,
+				     &ipt_idx_gen, &ipt_hash_info);
+		if (IS_ERR(pc))
+			return PTR_ERR(pc);
+		ret = ACT_P_CREATED;
+	} else {
+		if (!ovr) {
+			tcf_ipt_release(to_ipt(pc), bind);
+			return -EEXIST;
+		}
+	}
+	ipt = to_ipt(pc);
+
+	hook = nla_get_u32(tb[TCA_IPT_HOOK]);
+
+	err = -ENOMEM;
+	tname = kmalloc(IFNAMSIZ, GFP_KERNEL);
+	if (unlikely(!tname))
+		goto err1;
+	if (tb[TCA_IPT_TABLE] == NULL ||
+	    nla_strlcpy(tname, tb[TCA_IPT_TABLE], IFNAMSIZ) >= IFNAMSIZ)
+		strcpy(tname, "mangle");
+
+	t = kmemdup(td, td->u.target_size, GFP_KERNEL);
+	if (unlikely(!t))
+		goto err2;
+
+	err = ipt_init_target(t, tname, hook);
+	if (err < 0)
+		goto err3;
+
+	spin_lock_bh(&ipt->tcf_lock);
+	if (ret != ACT_P_CREATED) {
+		ipt_destroy_target(ipt->tcfi_t);
+		kfree(ipt->tcfi_tname);
+		kfree(ipt->tcfi_t);
+	}
+	ipt->tcfi_tname = tname;
+	ipt->tcfi_t = t;
+	ipt->tcfi_hook = hook;
+	spin_unlock_bh(&ipt->tcf_lock);
+	if (ret == ACT_P_CREATED)
+		tcf_hash_insert(pc, &ipt_hash_info);
+	return ret;
+
+err3:
+	kfree(t);
+err2:
+	kfree(tname);
+err1:
+	if (ret == ACT_P_CREATED) {
+		if (est)
+			gen_kill_estimator(&pc->tcfc_bstats,
+					   &pc->tcfc_rate_est);
+		kfree_rcu(pc, tcfc_rcu);
+	}
+	return err;
+}
+
+static int tcf_ipt_cleanup(struct tc_action *a, int bind)
+{
+	struct tcf_ipt *ipt = a->priv;
+	return tcf_ipt_release(ipt, bind);
+}
+
+static int tcf_ipt(struct sk_buff *skb, const struct tc_action *a,
+		   struct tcf_result *res)
+{
+	int ret = 0, result = 0;
+	struct tcf_ipt *ipt = a->priv;
+	struct xt_action_param par;
+
+	if (skb_cloned(skb)) {
+		if (pskb_expand_head(skb, 0, 0, GFP_ATOMIC))
+			return TC_ACT_UNSPEC;
+	}
+
+	spin_lock(&ipt->tcf_lock);
+
+	ipt->tcf_tm.lastuse = jiffies;
+	bstats_update(&ipt->tcf_bstats, skb);
+
+	/* yes, we have to worry about both in and out dev
+	 * worry later - danger - this API seems to have changed
+	 * from earlier kernels
+	 */
+	par.in = skb->dev;
+	par.out = NULL;
+	par.hooknum = ipt->tcfi_hook;
+	par.target = ipt->tcfi_t->u.kernel.target;
+	par.targinfo = ipt->tcfi_t->data;
+	ret = par.target->target(skb, &par);
+
+	switch (ret) {
+	case NF_ACCEPT:
+		result = TC_ACT_OK;
+		break;
+	case NF_DROP:
+		result = TC_ACT_SHOT;
+		ipt->tcf_qstats.drops++;
+		break;
+	case XT_CONTINUE:
+		result = TC_ACT_PIPE;
+		break;
+	default:
+		net_notice_ratelimited
+		    ("tc filter: Bogus netfilter code %d assume ACCEPT\n", ret);
+		result = TC_POLICE_OK;
+		break;
+	}
+	spin_unlock(&ipt->tcf_lock);
+	return result;
+
+}
+
+static int tcf_ipt_dump(struct sk_buff *skb, struct tc_action *a, int bind,
+			int ref)
+{
+	unsigned char *b = skb_tail_pointer(skb);
+	struct tcf_ipt *ipt = a->priv;
+	struct xt_entry_target *t;
+	struct tcf_t tm;
+	struct tc_cnt c;
+
+	/* for simple targets kernel size == user size
+	 * user name = target name
+	 * for foolproof you need to not assume this
+	 */
+
+	t = kmemdup(ipt->tcfi_t, ipt->tcfi_t->u.user.target_size, GFP_ATOMIC);
+	if (unlikely(!t))
+		goto nla_put_failure;
+
+	c.bindcnt = ipt->tcf_bindcnt - bind;
+	c.refcnt = ipt->tcf_refcnt - ref;
+	strcpy(t->u.user.name, ipt->tcfi_t->u.kernel.target->name);
+
+	if (nla_put(skb, TCA_IPT_TARG, ipt->tcfi_t->u.user.target_size, t) ||
+	    nla_put_u32(skb, TCA_IPT_INDEX, ipt->tcf_index) ||
+	    nla_put_u32(skb, TCA_IPT_HOOK, ipt->tcfi_hook) ||
+	    nla_put(skb, TCA_IPT_CNT, sizeof(struct tc_cnt), &c) ||
+	    nla_put_string(skb, TCA_IPT_TABLE, ipt->tcfi_tname))
+		goto nla_put_failure;
+	tm.install = jiffies_to_clock_t(jiffies - ipt->tcf_tm.install);
+	tm.lastuse = jiffies_to_clock_t(jiffies - ipt->tcf_tm.lastuse);
+	tm.expires = jiffies_to_clock_t(ipt->tcf_tm.expires);
+	if (nla_put(skb, TCA_IPT_TM, sizeof(tm), &tm))
+		goto nla_put_failure;
+	kfree(t);
+	return skb->len;
+
+nla_put_failure:
+	nlmsg_trim(skb, b);
+	kfree(t);
+	return -1;
+}
+
+static struct tc_action_ops act_ipt_ops = {
+	.kind = "xt",
+	.hinfo = &ipt_hash_info,
+	.type = TCA_ACT_IPT,
+	.capab = TCA_CAP_NONE,
+	.owner = THIS_MODULE,
+	.act = tcf_ipt,
+	.dump = tcf_ipt_dump,
+	.cleanup = tcf_ipt_cleanup,
+	.lookup = tcf_hash_search,
+	.init = tcf_ipt_init,
+	.walk = tcf_generic_walker
+};
+
+MODULE_AUTHOR("Jamal Hadi Salim(2002-12)");
+MODULE_DESCRIPTION("New Iptables target actions");
+MODULE_LICENSE("GPL");
+
+static int __init ipt_init_module(void)
+{
+	return tcf_register_action(&act_ipt_ops);
+}
+
+static void __exit ipt_cleanup_module(void)
+{
+	tcf_unregister_action(&act_ipt_ops);
+}
+
+module_init(ipt_init_module);
+module_exit(ipt_cleanup_module);

^ permalink raw reply related
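
The verdict translation at the end of tcf_ipt() in the patch above can be isolated as a small pure function. The sketch below is a userspace illustration: the constant values are copied from the kernel uapi headers so that it builds on its own, and the helper name xt_verdict_to_tc() and its drops parameter are assumptions that stand in for the tcf_qstats.drops counter.

```c
#include <assert.h>
#include <stddef.h>

/* Values as defined in the kernel uapi headers, repeated for userspace. */
#define NF_DROP		0u
#define NF_ACCEPT	1u
#define XT_CONTINUE	0xFFFFFFFFu
#define TC_ACT_OK	0
#define TC_ACT_SHOT	2
#define TC_ACT_PIPE	3

/* Hypothetical helper: map an xtables target verdict to a tc action code.
 * *drops is incremented on NF_DROP, like ipt->tcf_qstats.drops in the patch.
 */
static int xt_verdict_to_tc(unsigned int verdict, unsigned long *drops)
{
	assert(drops != NULL);

	switch (verdict) {
	case NF_ACCEPT:
		return TC_ACT_OK;
	case NF_DROP:
		(*drops)++;
		return TC_ACT_SHOT;
	case XT_CONTINUE:
		return TC_ACT_PIPE;
	default:
		/* Unknown verdict: assume ACCEPT. The patch returns
		 * TC_POLICE_OK here, which has the same value as TC_ACT_OK.
		 */
		return TC_ACT_OK;
	}
}
```

Note that XT_CONTINUE maps to TC_ACT_PIPE, so a target that only modifies the packet (such as a mark) lets the next action in the tc pipeline run.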

* PMTU discovery is broken on kernel 3.7.1 for UDP sockets
From: Yurij M. Plotnikov @ 2012-12-19 13:10 UTC (permalink / raw)
  To: netdev, Ben Hutchings, Alexandra N. Kossovsky

On kernel 3.7.1 I get strange behaviour of IP_MTU_DISCOVER socket 
option. The behaviour in case of IP_PMTUDISC_DO and IP_PMTUDISC_WANT 
values of IP_MTU_DISCOVER socket option on SOCK_DGRAM socket are the 
same and packet is always sent with "Don't Fragment" bit in case of 
IP_PMTUDISC_WANT. Also, the value of IP_MTU socket option is not updated.

This can be reproduced with a three-host configuration: host_A, host_B
and host_C. host_A is connected via its interface eth1 to eth1 on
host_B, and host_C is connected via its interface eth1 to eth2 on
host_B. Address 10.0.1.1/24 is assigned to eth1 on host_A; 10.0.1.2/24
is assigned to eth1 on host_B; 10.0.2.1/24 is assigned to eth2 on
host_B; 10.0.2.2/24 is assigned to eth1 on host_C. There are two
routes: "10.0.2.2 via 10.0.1.2 dev eth1" on host_A and "10.0.1.1 via
10.0.2.1 dev eth1" on host_C. Forwarding is enabled on host_B. So we
have the following picture:

host_A-eth1(10.0.1.1)<-->(10.0.1.2)eth1-host_B-eth2(10.0.2.1)<-->(10.0.2.2)eth1-host_C

MTU is equal to 1500 on all involved interfaces. Then we perform the
following steps:

on host_A:
1. socket(SOCK_DGRAM) -> 6
2. bind(6, 10.0.1.1:25630) -> 0
on host_C:
3. socket(SOCK_DGRAM) -> 5
4. bind(5, 10.0.2.2:25631) -> 0
on host_A:
5. connect(6, 10.0.2.2:25631) -> 0
on host_C:
6. connect(5, 10.0.1.1:25630) -> 0
on host_A:
7. getsockopt(6,IP_MTU) -> 0 // Returns that MTU is 1500
8. getsockopt(6,IP_MTU_DISCOVER) -> 0 // Returns that default value is 
IP_PMTUDISC_WANT

On eth2 on host_B and on eth1 on host_C change MTU from 1500 to 750. 
Wait for a while.

9. send(6, length=1400) -> 1400 // the packet is sent with "Don't 
Fragment" bit, tcpdump on eth1 on host_B shows it
10. sleep(5);
11. send(6, length=1400) -> -1 with EMSGSIZE
12. sleep(5);
13. getsockopt(6,IP_MTU) -> 0 // Returns that MTU is 1500 once again. So 
value is not updated.
14. send(6, length=1400) -> 1400 // the packet is once again sent with 
"Don't Fragment" bit, tcpdump on eth1 on host_B shows it

So the "Don't Fragment" bit is always set on the packets when the value
of IP_MTU_DISCOVER is IP_PMTUDISC_WANT. If at step 8 we change the
IP_MTU_DISCOVER value from IP_PMTUDISC_WANT to IP_PMTUDISC_DO, we see
the same picture. The value of the IP_MTU socket option is still 1500 at
step 13 in this case.

^ permalink raw reply
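
The getsockopt() calls in steps 7, 8 and 13 of the report above could look roughly like the following on Linux. This is a sketch, not the reporter's test program: the helper names read_pmtu_state() and pmtudisc_name() are invented for illustration, and the socket is assumed to be a connected IPv4 SOCK_DGRAM socket, since IP_MTU is only valid on connected sockets.

```c
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>

/* Hypothetical helper: printable name for an IP_MTU_DISCOVER value. */
static const char *pmtudisc_name(int val)
{
	switch (val) {
	case IP_PMTUDISC_DONT:
		return "IP_PMTUDISC_DONT";
	case IP_PMTUDISC_WANT:
		return "IP_PMTUDISC_WANT";
	case IP_PMTUDISC_DO:
		return "IP_PMTUDISC_DO";
	case IP_PMTUDISC_PROBE:
		return "IP_PMTUDISC_PROBE";
	default:
		return "unknown";
	}
}

/* Hypothetical helper: read the cached path MTU and the discovery mode
 * from a connected IPv4 socket. Returns 0 on success, or -1 with errno
 * set by getsockopt() on failure.
 */
static int read_pmtu_state(int fd, int *mtu, int *disc)
{
	socklen_t len;

	assert(mtu != NULL && disc != NULL);

	len = sizeof(*mtu);
	if (getsockopt(fd, IPPROTO_IP, IP_MTU, mtu, &len) < 0)
		return -1;

	len = sizeof(*disc);
	if (getsockopt(fd, IPPROTO_IP, IP_MTU_DISCOVER, disc, &len) < 0)
		return -1;

	return 0;
}
```

Calling read_pmtu_state() right after the send() that fails with EMSGSIZE, and again after the sleep, would show whether the cached value is ever updated.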

* Re: PMTU discovery is broken on kernel 3.7.1 for UDP sockets
From: Ben Hutchings @ 2012-12-19 13:35 UTC (permalink / raw)
  To: Yurij M. Plotnikov; +Cc: netdev, Alexandra N. Kossovsky
In-Reply-To: <50D1BCC0.2000208@oktetlabs.ru>

On Wed, 2012-12-19 at 17:10 +0400, Yurij M. Plotnikov wrote:
> On kernel 3.7.1 I get strange behaviour of IP_MTU_DISCOVER socket 
> option. The behaviour in case of IP_PMTUDISC_DO and IP_PMTUDISC_WANT 
> values of IP_MTU_DISCOVER socket option on SOCK_DGRAM socket are the 
> same and packet is always sent with "Don't Fragment" bit in case of 
> IP_PMTUDISC_WANT. Also, the value of IP_MTU socket option is not updated.

You could try reverting:

commit ee9a8f7ab2edf801b8b514c310455c94acc232f6
Author: Steffen Klassert <steffen.klassert@secunet.com>
Date:   Mon Oct 8 00:56:54 2012 +0000

    ipv4: Don't report stale pmtu values to userspace
    
    We report cached pmtu values even if they are already expired.
    Change this to not report these values after they are expired
    and fix a race in the expire time calculation, as suggested by
    Eric Dumazet.

Still, PMTU information is not supposed to expire for 10 minutes...

[...]
> On eth2 on host_B and on eth1 on host_C change MTU from 1500 to 750. 
> Wait for a while.
> 
> 9. send(6, length=1400) -> 1400 // the packet is sent with "Don't 
> Fragment" bit, tcpdump on eth1 on host_B shows it
> 10. sleep(5);
> 11. send(6, length=1400) -> -1 with EMSGSIZE
> 12. sleep(5);
> 13. getsockopt(6,IP_MTU) -> 0 // Returns that MTU is 1500 once again. So 
> value is not updated.
[...]

What if you read this option immediately before the sleep(5)?

Ben.

^ permalink raw reply

* Re: [PATCH V2 00/12] Add basic VLAN support to bridges
From: Vlad Yasevich @ 2012-12-19 14:13 UTC (permalink / raw)
  To: Shmulik Ladkani; +Cc: netdev, shemminger, davem, or.gerlitz, jhs, mst
In-Reply-To: <20121219101006.7086faef@pixies.home.jungo.com>

On 12/19/2012 03:10 AM, Shmulik Ladkani wrote:
> Thanks Vlad,
>
> On Tue, 18 Dec 2012 14:00:51 -0500 Vlad Yasevich <vyasevic@redhat.com> wrote:
>> A single vlan may also be designated as untagged.  Any untagged traffic
>> received by the port will be assigned to this vlan.
>
> Why the "untagged vlan" is per-bridge global?
> Usually, 802.1q switches define the PVID (port's VID) which controls
> the value of VID, in case ingress frame is either untagged or
> priority-tagged (per port configuration).
> This gives greater flexibility.

It's not.  There is a per-port untagged pointer where you can designate
which VLAN is untagged/native on a port.  The bridge interface itself
can also function as a port, so it gets its own untagged pointer so
it can behave similarly to a port.

>
>> Any traffic exiting
>> the port with a VID matching the untagged vlan will exit untagged (the
>> bridge will strip the vlan header).  This is similar to "Native Vlan" support
>> available in most switches.
>
> 802.1q switches usually allow configuring per-vlan, per-port
> tagged/untagged egress policy: each vid has its port membership map and
> an accompanying port egress-policy map.
> This gives great flexibility defining all sorts of configurations.

Right, and that's what's provided here.
  * Each VLAN has a port membership map (net_bridge_vlan.portgroup).
  * Each port has a list of vlans configured as well
    (net_port_vlan.vlan_list).
  * Each port also has a single vlan that can be untagged
    (net_bridge_port.untagged).
  * The bridge also has a single untagged vlan (net_bridge.untagged).

The limitation (in switches as well) is that only a single VLAN
may be untagged on any one port.  If you have more than one, you don't
know which VLAN the untagged traffic belongs to.

>
> Personally, I'd prefer a fully flexible vlan bridge allowing all sorts
> of configurations (as available in 802.1q switches).
>
> What's the reason limiting such configurations?

So, what do you see that's missing?

-vlad

>
> Regards,
> Shmulik

^ permalink raw reply
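
The per-port model described in the reply above (a VLAN membership set plus a single untagged VLAN per port) can be sketched in userspace as follows. All names here are hypothetical and do not come from the patch series, and the choice to drop frames whose VLAN the port is not a member of is an assumption of this sketch rather than something stated in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VID_NONE 0xFFFFu	/* hypothetical marker: no VLAN */

/* Hypothetical, simplified port: a membership bitmap and one untagged VLAN. */
struct sketch_port {
	uint8_t  member[4096 / 8];	/* bit n set: port is in VLAN n */
	uint16_t untagged;		/* VLAN for untagged traffic, or VID_NONE */
};

static bool port_is_member(const struct sketch_port *p, uint16_t vid)
{
	return vid < 4096 && (p->member[vid / 8] & (1u << (vid % 8))) != 0;
}

static void port_add_vlan(struct sketch_port *p, uint16_t vid, bool untagged)
{
	assert(vid < 4096);
	p->member[vid / 8] |= (uint8_t)(1u << (vid % 8));
	if (untagged)
		p->untagged = vid;	/* replaces any previous untagged VLAN */
}

/* Ingress: choose the VLAN for a frame. frame_vid is VID_NONE for an
 * untagged frame. Returns VID_NONE if the frame should be dropped.
 */
static uint16_t ingress_classify(const struct sketch_port *p, uint16_t frame_vid)
{
	uint16_t vid = (frame_vid == VID_NONE) ? p->untagged : frame_vid;

	return port_is_member(p, vid) ? vid : VID_NONE;
}

/* Egress: true when the VLAN header must be stripped before sending. */
static bool egress_strip_tag(const struct sketch_port *p, uint16_t vid)
{
	return p->untagged != VID_NONE && p->untagged == vid;
}
```

Because the untagged VLAN is a single field per port, two ports on the same bridge can map untagged traffic to different VLANs, while one port can never have two.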

* [PATCH net] net: qmi_wwan: add ZTE MF880
From: Bjørn Mork @ 2012-12-19 14:15 UTC (permalink / raw)
  To: netdev; +Cc: linux-usb, Bjørn Mork

The driver description files gives these names to the vendor specific
functions on this modem:

 diag: VID_19D2&PID_0284&MI_00
 nmea: VID_19D2&PID_0284&MI_01
 at:   VID_19D2&PID_0284&MI_02
 mdm:  VID_19D2&PID_0284&MI_03
 net:  VID_19D2&PID_0284&MI_04

Signed-off-by: Bjørn Mork <bjorn@mork.no>
---
 drivers/net/usb/qmi_wwan.c |    1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c
index 9b950f5..91d7cb9 100644
--- a/drivers/net/usb/qmi_wwan.c
+++ b/drivers/net/usb/qmi_wwan.c
@@ -433,6 +433,7 @@ static const struct usb_device_id products[] = {
 	{QMI_FIXED_INTF(0x19d2, 0x0199, 1)},	/* ZTE MF820S */
 	{QMI_FIXED_INTF(0x19d2, 0x0200, 1)},
 	{QMI_FIXED_INTF(0x19d2, 0x0257, 3)},	/* ZTE MF821 */
+	{QMI_FIXED_INTF(0x19d2, 0x0284, 4)},	/* ZTE MF880 */
 	{QMI_FIXED_INTF(0x19d2, 0x0326, 4)},	/* ZTE MF821D */
 	{QMI_FIXED_INTF(0x19d2, 0x1008, 4)},	/* ZTE (Vodafone) K3570-Z */
 	{QMI_FIXED_INTF(0x19d2, 0x1010, 4)},	/* ZTE (Vodafone) K3571-Z */
-- 
1.7.10.4

^ permalink raw reply related

* Re: PMTU discovery is broken on kernel 3.7.1 for UDP sockets
From: Yurij M. Plotnikov @ 2012-12-19 14:27 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: netdev, Alexandra N. Kossovsky
In-Reply-To: <1355924119.2676.6.camel@bwh-desktop.uk.solarflarecom.com>

On 12/19/12 17:35, Ben Hutchings wrote:
> On Wed, 2012-12-19 at 17:10 +0400, Yurij M. Plotnikov wrote:
>    
>> On kernel 3.7.1 I get strange behaviour of IP_MTU_DISCOVER socket
>> option. The behaviour in case of IP_PMTUDISC_DO and IP_PMTUDISC_WANT
>> values of IP_MTU_DISCOVER socket option on SOCK_DGRAM socket are the
>> same and packet is always sent with "Don't Fragment" bit in case of
>> IP_PMTUDISC_WANT. Also, the value of IP_MTU socket option is not updated.
>>      
> You could try reverting:
>
> commit ee9a8f7ab2edf801b8b514c310455c94acc232f6
> Author: Steffen Klassert<steffen.klassert@secunet.com>
> Date:   Mon Oct 8 00:56:54 2012 +0000
>
>      ipv4: Don't report stale pmtu values to userspace
>
>      We report cached pmtu values even if they are already expired.
>      Change this to not report these values after they are expired
>      and fix a race in the expire time calculation, as suggested by
>      Eric Dumazet.
>
> Still, PMTU information is not supposed to expire for 10 minutes...
>
>    
With that commit reverted there is no such problem on 3.7.1: IP_MTU is
updated and DF is set only for the first packet in the
IP_PMTUDISC_WANT case.
> [...]
>    
>> On eth2 on host_B and on eth1 on host_C change MTU from 1500 to 750.
>> Wait for a while.
>>
>> 9. send(6, length=1400) ->  1400 // the packet is sent with "Don't
>> Fragment" bit, tcpdump on eth1 on host_B shows it
>> 10. sleep(5);
>> 11. send(6, length=1400) ->  -1 with EMSGSIZE
>> 12. sleep(5);
>> 13. getsockopt(6,IP_MTU) ->  0 // Returns that MTU is 1500 once again. So
>> value is not updated.
>>      
> [...]
>
> What if you read this option immediately before the sleep(5)?
>    
It still returns that MTU is 1500.

Yurij.

^ permalink raw reply

* Re: [PATCH V2 09/12] bridge: Add the ability to configure untagged vlans
From: Vlad Yasevich @ 2012-12-19 14:50 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: netdev, shemminger, davem, or.gerlitz, jhs
In-Reply-To: <20121218231049.GD1135@redhat.com>

On 12/18/2012 06:10 PM, Michael S. Tsirkin wrote:
> On Tue, Dec 18, 2012 at 06:03:25PM -0500, Vlad Yasevich wrote:
>> On 12/18/2012 06:01 PM, Michael S. Tsirkin wrote:
>>> On Tue, Dec 18, 2012 at 02:01:00PM -0500, Vlad Yasevich wrote:
>>>> A user may designate a certain vlan as untagged.  This means that
>>>> any ingress frame is assigned to this vlan and any forwarding decisions
>>>> are made with this vlan in mind.  On egress, any frames tagged/labeled
>>>> with untagged vlan have the vlan tag removed and are sent as regular
>>>> ethernet frames.
>>>>
>>>> Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
>>>> ---
>>>>   include/uapi/linux/if_bridge.h |    3 +
>>>>   net/bridge/br_if.c             |  146 +++++++++++++++++++++++++++++++++++++---
>>>>   net/bridge/br_netlink.c        |    6 +-
>>>>   net/bridge/br_private.h        |    2 +
>>>>   4 files changed, 144 insertions(+), 13 deletions(-)
>>>>
>>>> diff --git a/include/uapi/linux/if_bridge.h b/include/uapi/linux/if_bridge.h
>>>> index d0b4f5c..988d858 100644
>>>> --- a/include/uapi/linux/if_bridge.h
>>>> +++ b/include/uapi/linux/if_bridge.h
>>>> @@ -127,6 +127,9 @@ enum {
>>>>   	BR_VLAN_DEL,
>>>>   };
>>>>
>>>> +#define BRIDGE_VLAN_INFO_MASTER		1
>>>> +#define BRIDGE_VLAN_INFO_UNTAGGED	2
>>>> +
>>>>   struct bridge_vlan_info {
>>>>   	u16 op_code;
>>>>   	u16 flags;
>>>> diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
>>>> index 57bbb35..14563fb 100644
>>>> --- a/net/bridge/br_if.c
>>>> +++ b/net/bridge/br_if.c
>>>> @@ -108,6 +108,34 @@ static void br_vlan_put(struct net_bridge_vlan *vlan)
>>>>   		br_vlan_destroy(vlan);
>>>>   }
>>>>
>>>> +/* Must be protected by RTNL */
>>>> +static void br_vlan_add_untagged(struct net_bridge *br,
>>>> +				 struct net_bridge_vlan *vlan)
>>>> +{
>>>> +	ASSERT_RTNL();
>>>> +	if (br->untagged == vlan)
>>>> +		return;
>>>> +	else if (br->untagged) {
>>>> +		/* Untagged vlan is already set on the master,
>>>> +		 * so drop the ref since we'll be replacing it.
>>>> +		 */
>>>> +		br_vlan_put(br->untagged);
>>>> +	}
>>>> +	br_vlan_hold(vlan);
>>>> +	rcu_assign_pointer(br->untagged, vlan);
>>>
>>> Is there a reason for rcu here but not elsewhere? If all users are under
>>> rtnl you can just assign in a simple way.
>>> If not then rcu_dereference_protected would be more appropriate.
>>
>> Everywhere that the pointer changes rcu_assign_pointer is used.
>>
>> Now, if we hold an RTNL, we can technically read the pointer with
>> rcu since it's guaranteed not to change since it only changes under
>> RTNL.
>> I'll check that this is consistent.
>
> Check what rcu_dereference_protected does. It's really just
> an explicit way to say "this is accessed without rcu because I have
> this lock".

Looks like the helper rtnl_dereference() already does what I need.  I'll 
use that.

Thanks
-vlad

>
>> If I access the pointer without rtnl, it's always inside rcu
>> critical section and with rcu_dereference().
>>
>> I thought those were the basic rules of rcu.  Did that change?
>>
>> -vlad
>
>
>
>>>
>>>> +}
>>>> +
>>>> +/* Must be protected by RTNL */
>>>> +static void br_vlan_del_untagged(struct net_bridge *br,
>>>> +				 struct net_bridge_vlan *vlan)
>>>> +{
>>>> +	ASSERT_RTNL();
>>>> +	if (br->untagged == vlan) {
>>>> +		br_vlan_put(vlan);
>>>> +		rcu_assign_pointer(br->untagged, NULL);
>>>> +	}
>>>> +}
>>>> +
>>>>   struct net_bridge_vlan *br_vlan_find(struct net_bridge *br, u16 vid)
>>>>   {
>>>>   	struct net_bridge_vlan *vlan;
>>>> @@ -132,7 +160,7 @@ struct net_bridge_vlan *br_vlan_add(struct net_bridge *br, u16 vid,
>>>>
>>>>   	vlan = br_vlan_find(br, vid);
>>>>   	if (vlan)
>>>> -		return vlan;
>>>> +		goto untagged;
>>>>
>>>>   	vlan = kzalloc(sizeof(struct net_bridge_vlan), GFP_KERNEL);
>>>>   	if (!vlan)
>>>> @@ -141,7 +169,7 @@ struct net_bridge_vlan *br_vlan_add(struct net_bridge *br, u16 vid,
>>>>   	vlan->vid = vid;
>>>>   	atomic_set(&vlan->refcnt, 1);
>>>>
>>>> -	if (flags & BRIDGE_FLAGS_SELF) {
>>>> +	if (flags & BRIDGE_VLAN_INFO_MASTER) {
>>>>   		/* Set bit 0 that is associated with the bridge master
>>>>   		 * device.  Port numbers start with 1.
>>>>   		 */
>>>> @@ -149,15 +177,24 @@ struct net_bridge_vlan *br_vlan_add(struct net_bridge *br, u16 vid,
>>>>   	}
>>>>
>>>>   	hlist_add_head_rcu(&vlan->hlist, &br->vlan_hlist[br_vlan_hash(vid)]);
>>>> +
>>>> +untagged:
>>>> +	if (flags & BRIDGE_VLAN_INFO_UNTAGGED)
>>>> +		br_vlan_add_untagged(br, vlan);
>>>> +
>>>>   	return vlan;
>>>>   }
>>>>
>>>>   /* Must be protected by RTNL */
>>>> -static void br_vlan_del(struct net_bridge_vlan *vlan, u16 flags)
>>>> +static void br_vlan_del(struct net_bridge *br, struct net_bridge_vlan *vlan,
>>>> +			u16 flags)
>>>>   {
>>>>   	ASSERT_RTNL();
>>>>
>>>> -	if (flags & BRIDGE_FLAGS_SELF) {
>>>> +	if (flags & BRIDGE_VLAN_INFO_UNTAGGED)
>>>> +		br_vlan_del_untagged(br, vlan);
>>>> +
>>>> +	if (flags & BRIDGE_VLAN_INFO_MASTER) {
>>>>   		/* Clear bit 0 that is associated with the bridge master
>>>>   		 * device.
>>>>   		 */
>>>> @@ -172,6 +209,14 @@ static void br_vlan_del(struct net_bridge_vlan *vlan, u16 flags)
>>>>
>>>>   	vlan->vid = BR_INVALID_VID;
>>>>
>>>> +	/* If, for whatever reason, bridge still has a ref on this vlan
>>>> +	 * through the @untagged pointer, drop that ref and clear untagged.
>>>> +	 */
>>>> +	if (br->untagged == vlan) {
>>>> +		br_vlan_put(vlan);
>>>> +		rcu_assign_pointer(br->untagged, NULL);
>>>> +	}
>>>> +
>>>>   	/* Drop the self-ref to trigger descrution. */
>>>>   	br_vlan_put(vlan);
>>>>   }
>>>> @@ -187,7 +232,7 @@ int br_vlan_delete(struct net_bridge *br, u16 vid, u16 flags)
>>>>   	if (!vlan)
>>>>   		return -ENOENT;
>>>>
>>>> -	br_vlan_del(vlan, flags);
>>>> +	br_vlan_del(br, vlan, flags);
>>>>   	return 0;
>>>>   }
>>>>
>>>> @@ -204,7 +249,9 @@ static void br_vlan_flush(struct net_bridge *br)
>>>>   	for (i = 0; i < BR_VID_HASH_SIZE; i++) {
>>>>   		hlist_for_each_entry_safe(vlan, node, tmp,
>>>>   					  &br->vlan_hlist[i], hlist) {
>>>> -			br_vlan_del(vlan, BRIDGE_FLAGS_SELF);
>>>> +			br_vlan_del(br, vlan,
>>>> +				    (BRIDGE_VLAN_INFO_MASTER |
>>>> +				     BRIDGE_VLAN_INFO_UNTAGGED));
>>>>   		}
>>>>   	}
>>>>   }
>>>> @@ -224,10 +271,70 @@ struct net_port_vlan *nbp_vlan_find(const struct net_bridge_port *p, u16 vid)
>>>>   	return NULL;
>>>>   }
>>>>
>>>> +static int nbp_vlan_add_untagged(struct net_bridge_port *p,
>>>> +			  struct net_bridge_vlan *vlan,
>>>> +			  u16 flags)
>>>> +{
>>>> +	struct net_device *dev = p->dev;
>>>> +
>>>> +	if (p->untagged) {
>>>> +		/* Port already has untagged vlan set.  Drop the ref
>>>> +		 * to the old one since we'll be replacing it.
>>>> +		 */
>>>> +		br_vlan_put(p->untagged);
>>>> +	} else {
>>>> +		int err;
>>>> +
>>>> +		/* Add vid 0 to filter if filter is available. */
>>>> +		if ((dev->features & NETIF_F_HW_VLAN_FILTER) &&
>>>> +		    dev->netdev_ops->ndo_vlan_rx_add_vid &&
>>>> +		    dev->netdev_ops->ndo_vlan_rx_kill_vid) {
>>>> +			err = dev->netdev_ops->ndo_vlan_rx_add_vid(dev, 0);
>>>> +			if (err)
>>>> +				return err;
>>>> +		}
>>>> +	}
>>>> +
>>>> +	/* This VLAN is handled as untagged/native. Save an
>>>> +	 * additional ref.
>>>> +	 */
>>>> +	br_vlan_hold(vlan);
>>>> +	rcu_assign_pointer(p->untagged, vlan);
>>>> +
>>>> +	return 0;
>>>> +}
>>>> +
>>>> +static void nbp_vlan_delete_untagged(struct net_bridge_port *p,
>>>> +				     struct net_bridge_vlan *vlan)
>>>> +{
>>>> +	if (p->untagged != vlan)
>>>> +		return;
>>>> +
>>>> +	/* Remove VLAN from the device filter if it is supported. */
>>>> +	if ((p->dev->features & NETIF_F_HW_VLAN_FILTER) &&
>>>> +	    p->dev->netdev_ops->ndo_vlan_rx_kill_vid) {
>>>> +		int err;
>>>> +
>>>> +		err = p->dev->netdev_ops->ndo_vlan_rx_kill_vid(p->dev, 0);
>>>> +		if (err) {
>>>> +			pr_warn("failed to kill vid %d for device %s\n",
>>>> +				vlan->vid, p->dev->name);
>>>> +		}
>>>> +	}
>>>> +
>>>> +	/* If this VLAN is currently functioning as untagged, clear it.
>>>> +	 * It's safe to drop the refcount, since the vlan is still held
>>>> +	 * by the port.
>>>> +	 */
>>>> +	br_vlan_put(vlan);
>>>> +	rcu_assign_pointer(p->untagged, NULL);
>>>> +
>>>> +}
>>>> +
>>>>   /* Must be protected by RTNL */
>>>>   int nbp_vlan_add(struct net_bridge_port *p, u16 vid, u16 flags)
>>>>   {
>>>> -	struct net_port_vlan *pve;
>>>> +	struct net_port_vlan *pve = NULL;
>>>>   	struct net_bridge_vlan *vlan;
>>>>   	struct net_device *dev = p->dev;
>>>>   	int err;
>>>> @@ -275,11 +382,21 @@ int nbp_vlan_add(struct net_bridge_port *p, u16 vid, u16 flags)
>>>>   	set_bit(p->port_no, vlan->port_bitmap);
>>>>
>>>>   	list_add_tail_rcu(&pve->list, &p->vlan_list);
>>>> +
>>>> +	if (flags & BRIDGE_VLAN_INFO_UNTAGGED) {
>>>> +		err = nbp_vlan_add_untagged(p, vlan, flags);
>>>> +		if (err)
>>>> +			goto del_vlan;
>>>> +	}
>>>> +
>>>>   	return 0;
>>>>
>>>>   clean_up:
>>>>   	kfree(pve);
>>>> -	br_vlan_del(vlan, flags);
>>>> +	br_vlan_del(p->br, vlan, flags);
>>>> +	return err;
>>>> +del_vlan:
>>>> +	nbp_vlan_delete(p, vid, flags);
>>>>   	return err;
>>>>   }
>>>>
>>>> @@ -296,6 +413,9 @@ int nbp_vlan_delete(struct net_bridge_port *p, u16 vid, u16 flags)
>>>>   	if (!pve)
>>>>   		return -ENOENT;
>>>>
>>>> +	if (flags & BRIDGE_VLAN_INFO_UNTAGGED)
>>>> +		nbp_vlan_delete_untagged(p, pve->vlan);
>>>> +
>>>>   	/* Remove VLAN from the device filter if it is supported. */
>>>>   	if ((dev->features & NETIF_F_HW_VLAN_FILTER) &&
>>>>   	    dev->netdev_ops->ndo_vlan_rx_kill_vid) {
>>>> @@ -306,6 +426,7 @@ int nbp_vlan_delete(struct net_bridge_port *p, u16 vid, u16 flags)
>>>>   			pr_warn("failed to kill vid %d for device %s\n",
>>>>   				vid, dev->name);
>>>>   	}
>>>> +
>>>>   	pve->vid = BR_INVALID_VID;
>>>>
>>>>   	vlan = pve->vlan;
>>>> @@ -316,7 +437,7 @@ int nbp_vlan_delete(struct net_bridge_port *p, u16 vid, u16 flags)
>>>>   	list_del_rcu(&pve->list);
>>>>   	kfree_rcu(pve, rcu);
>>>>
>>>> -	br_vlan_del(vlan, flags);
>>>> +	br_vlan_del(p->br, vlan, flags);
>>>>
>>>>   	return 0;
>>>>   }
>>>> @@ -328,8 +449,11 @@ static void nbp_vlan_flush(struct net_bridge_port *p)
>>>>
>>>>   	ASSERT_RTNL();
>>>>
>>>> -	list_for_each_entry_safe(pve, tmp, &p->vlan_list, list)
>>>> -		nbp_vlan_delete(p, pve->vid, BRIDGE_FLAGS_SELF);
>>>> +	list_for_each_entry_safe(pve, tmp, &p->vlan_list, list)  {
>>>> +		nbp_vlan_delete(p, pve->vid,
>>>> +				(BRIDGE_VLAN_INFO_MASTER |
>>>> +				 BRIDGE_VLAN_INFO_UNTAGGED));
>>>> +	}
>>>>   }
>>>>
>>>>   static void release_nbp(struct kobject *kobj)
>>>> diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
>>>> index 9cf2879..1b302ce 100644
>>>> --- a/net/bridge/br_netlink.c
>>>> +++ b/net/bridge/br_netlink.c
>>>> @@ -199,7 +199,8 @@ static int br_afspec(struct net_bridge *br, struct net_bridge_port *p,
>>>>   			if (p)
>>>>   				err = nbp_vlan_add(p, vinfo->vid, vinfo->flags);
>>>>   			else {
>>>> -				u16 flags = vinfo->flags | BRIDGE_FLAGS_SELF;
>>>> +				u16 flags = vinfo->flags |
>>>> +					    BRIDGE_VLAN_INFO_MASTER;
>>>>   				if (!br_vlan_add(br, vinfo->vid, flags))
>>>>   					err = -ENOMEM;
>>>>   			}
>>>> @@ -210,7 +211,8 @@ static int br_afspec(struct net_bridge *br, struct net_bridge_port *p,
>>>>   				err = nbp_vlan_delete(p, vinfo->vid,
>>>>   						      vinfo->flags);
>>>>   			else {
>>>> -				u16 flags = vinfo->flags | BRIDGE_FLAGS_SELF;
>>>> +				u16 flags = vinfo->flags |
>>>> +					    BRIDGE_VLAN_INFO_MASTER;
>>>>   				err = br_vlan_delete(br, vinfo->vid, flags);
>>>>   			}
>>>>   			break;
>>>> diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
>>>> index cc75212..9328463 100644
>>>> --- a/net/bridge/br_private.h
>>>> +++ b/net/bridge/br_private.h
>>>> @@ -179,6 +179,7 @@ struct net_bridge_port
>>>>   	struct netpoll			*np;
>>>>   #endif
>>>>   	struct list_head		vlan_list;
>>>> +	struct net_bridge_vlan __rcu	*untagged;
>>>>   };
>>>>
>>>>   #define br_port_exists(dev) (dev->priv_flags & IFF_BRIDGE_PORT)
>>>> @@ -298,6 +299,7 @@ struct net_bridge
>>>>   	struct timer_list		gc_timer;
>>>>   	struct kobject			*ifobj;
>>>>   	struct hlist_head		vlan_hlist[BR_VID_HASH_SIZE];
>>>> +	struct net_bridge_vlan __rcu	*untagged;
>>>>   };
>>>>
>>>>   struct br_input_skb_cb {
>>>> --
>>>> 1.7.7.6

^ permalink raw reply

* Re: [PATCH] pkt_sched: act_xt support new Xtables interface
From: Jan Engelhardt @ 2012-12-19 15:52 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: Hasan Chowdhury, Stephen Hemminger, Yury Stankevich,
	netdev@vger.kernel.org, pablo, netfilter-devel
In-Reply-To: <50D1AB7E.5060000@mojatatu.com>


On Wednesday 2012-12-19 12:56, Jamal Hadi Salim wrote:
>
> To be applied pending more testing.
>
> Attached. Sorry, I thought I had sent this out over the weekend.
> I have done basic testing with a single mark and sending pings to
> update stats which can then be displayed for the mark.
>
> diffstat xt-p1
> Kconfig  |   15 ++
> Makefile |    1 
> act_xt.c |  324 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> 3 files changed, 339 insertions(+), 1 deletion(-)

Humm... that's a huge patch for what seems to be equal to act_ipt.c
Let's do a cross-diff:

--- act_ipt.c	2012-10-25 19:49:25.372191795 +0200
+++ act_xt.c	2012-12-19 16:48:22.052419730 +0100
@@ -2 +2 @@
- * net/sched/ipt.c     iptables target interface
+ * net/sched/act_xt.c     iptables target interface
@@ -11 +11 @@
- * Copyright:	Jamal Hadi Salim (2002-4)
+ * Copyright:	Jamal Hadi Salim (2002-12)
@@ -30 +29,0 @@
-
@@ -42 +41,2 @@ static struct tcf_hashinfo ipt_hash_info
-static int ipt_init_target(struct xt_entry_target *t, char *table, unsigned int hook)
+static int ipt_init_target(struct xt_entry_target *t, char *table,
+			   unsigned int hook)
@@ -243,2 +243,2 @@ static int tcf_ipt(struct sk_buff *skb,
-		net_notice_ratelimited("tc filter: Bogus netfilter code %d assume ACCEPT\n",
-				       ret);
+		net_notice_ratelimited
+		    ("tc filter: Bogus netfilter code %d assume ACCEPT\n", ret);
@@ -253 +253,2 @@ static int tcf_ipt(struct sk_buff *skb,
-static int tcf_ipt_dump(struct sk_buff *skb, struct tc_action *a, int bind, int ref)
+static int tcf_ipt_dump(struct sk_buff *skb, struct tc_action *a, int bind,
+			int ref)
@@ -295 +296 @@ static struct tc_action_ops act_ipt_ops
-	.kind		=	"ipt",
+	.kind = "xt",
@@ -308,2 +309,2 @@ static struct tc_action_ops act_ipt_ops
-MODULE_AUTHOR("Jamal Hadi Salim(2002-4)");
-MODULE_DESCRIPTION("Iptables target actions");
+MODULE_AUTHOR("Jamal Hadi Salim(2002-12)");
+MODULE_DESCRIPTION("New Iptables target actions");


Is that [the set of hunks] all? Then I would instead suggest
something like:


diff --git a/net/sched/act_ipt.c b/net/sched/act_ipt.c
index 58fb3c7..f92a007 100644
--- a/net/sched/act_ipt.c
+++ b/net/sched/act_ipt.c
@@ -305,18 +305,43 @@ static struct tc_action_ops act_ipt_ops = {
 	.walk		=	tcf_generic_walker
 };
 
+static struct tc_action_ops act_xt_ops = {
+	.kind		=	"xt",
+	.hinfo		=	&ipt_hash_info,
+	.type		=	TCA_ACT_IPT,
+	.capab		=	TCA_CAP_NONE,
+	.owner		=	THIS_MODULE,
+	.act		=	tcf_ipt,
+	.dump		=	tcf_ipt_dump,
+	.cleanup	=	tcf_ipt_cleanup,
+	.lookup		=	tcf_hash_search,
+	.init		=	tcf_ipt_init,
+	.walk		=	tcf_generic_walker
+};
+
 MODULE_AUTHOR("Jamal Hadi Salim(2002-4)");
 MODULE_DESCRIPTION("Iptables target actions");
 MODULE_LICENSE("GPL");
+MODULE_ALIAS("act_xt");
 
 static int __init ipt_init_module(void)
 {
-	return tcf_register_action(&act_ipt_ops);
+	int ret;
+	ret = tcf_register_action(&act_ipt_ops);
+	if (ret < 0)
+		return ret;
+	ret = tcf_register_action(&act_xt_ops);
+	if (ret < 0) {
+		tcf_unregister_action(&act_ipt_ops);
+		return ret;
+	}
+	return 0;
 }
 
 static void __exit ipt_cleanup_module(void)
 {
 	tcf_unregister_action(&act_ipt_ops);
+	tcf_unregister_action(&act_xt_ops);
 }
 
 module_init(ipt_init_module);

^ permalink raw reply related

* Re: [PATCH 4/4] FEC: Add time stamping code and a PTP hardware clock
From: Ben Hutchings @ 2012-12-19 15:53 UTC (permalink / raw)
  To: Frank Li, Sascha Hauer
  Cc: Richard Cochran, Shawn Guo, Frank Li, lznua, linux-arm-kernel,
	netdev, davem
In-Reply-To: <20121218070420.GA2946@netboy.at.omicron.at>

On Tue, Dec 18, 2012 at 08:04:20AM +0100, Richard Cochran wrote:
> On Mon, Dec 17, 2012 at 09:02:32PM +0100, Sascha Hauer wrote:
> > This leaves an option in the tree which can be used to break FEC on
> > i.MX3/5.
> > 
> > 	depends on !SOC_IMX31 && !SOC_IMX35 && !SOC_IMX5
> > 
> > might be an option, but given that this patch seems to have bypassed any
> > review I feel more like reverting it.
> 
> Instead of reverting, I suggest finding a solution (Frank) to let the
> code work when it can work and to prevent it when it cannot. This
> could be kconfig, DT, or run time probing of silicon revisions, but I
> don't have access to this hardware, and so I can't really say how to
> fix it.
[...]

Please implement run-time probing.  A different configuration for
each SoC is just not sustainable for distributions.

Ben.

-- 
Ben Hutchings
We get into the habit of living before acquiring the habit of thinking.
                                                              - Albert Camus

^ permalink raw reply

* Re: [RFC PATCH] fix IP_ECN_set_ce
From: Eric Dumazet @ 2012-12-19 16:14 UTC (permalink / raw)
  To: roy.qing.li; +Cc: netdev
In-Reply-To: <1355898095-7444-1-git-send-email-roy.qing.li@gmail.com>

On Wed, 2012-12-19 at 14:21 +0800, roy.qing.li@gmail.com wrote:
> From: Li RongQing <roy.qing.li@gmail.com>
> 
> 1. ECN uses the two least significant (right-most) bits of the DiffServ
> field in the IPv4, so it should be in iph->tos, not in (iph->tos+1)
> 
> 2. When setting CE, we should check whether the transport is ECN capable;
> both 10 and 01 mean ECN Capable Transport, so checking only 10 is not enough
>     00: Non ECN-Capable Transport — Non-ECT
>     10: ECN Capable Transport — ECT(0)
>     01: ECN Capable Transport — ECT(1)
>     11: Congestion Encountered — CE
> 
> 3. Remove the misleading comment
> 
> 4. Fix the checksum computation
> 
> Signed-off-by: Li RongQing <roy.qing.li@gmail.com>

This is total crap.

It's perfectly clear to me, and the compiler generates fast code.

If you don't understand this code, please don't touch it.
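
For readers trying to follow the disagreement, the add-one trick can be
checked outside the kernel. The sketch below is not the kernel function and
skips the checksum update entirely; it only compares a direct test for
ECT(0)/ECT(1) against an add-one-and-mask test, using the codepoint values
listed in the quoted patch description. All names are invented for this
illustration.

```c
#include <stdint.h>

#define SKETCH_ECN_MASK 0x3
#define SKETCH_ECN_CE   0x3

/* Direct test: the low two bits are 01 (ECT(1)) or 10 (ECT(0)). */
static int sketch_is_ect_direct(uint8_t tos)
{
	uint8_t ecn = tos & SKETCH_ECN_MASK;

	return ecn == 1 || ecn == 2;
}

/* Add-one test.  After adding 1 and masking we get:
 *   Not-ECT 00 -> 01, ECT(1) 01 -> 10, ECT(0) 10 -> 11, CE 11 -> 00
 * so bit 1 is set exactly for the two ECT codepoints.
 */
static int sketch_is_ect_plus_one(uint8_t tos)
{
	uint8_t ecn = (tos + 1) & SKETCH_ECN_MASK;

	return (ecn & 2) != 0;
}

/* Mark CE if the packet is ECN capable.  Returns 1 if the TOS byte
 * carries CE afterwards, 0 if the packet is Not-ECT and was left alone.
 */
static int sketch_set_ce(uint8_t *tos)
{
	if (!sketch_is_ect_plus_one(*tos))
		return (*tos & SKETCH_ECN_MASK) == SKETCH_ECN_CE;
	*tos |= SKETCH_ECN_CE;
	return 1;
}
```

Both tests agree for every possible TOS value, which is the property the
add-one form relies on: a single mask covers ECT(0) and ECT(1) at once.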

^ permalink raw reply

* Re: [PATCH] xen/netfront: improve truesize tracking
From: Eric Dumazet @ 2012-12-19 16:17 UTC (permalink / raw)
  To: Sander Eikelenboom
  Cc: Ian Campbell, netdev@vger.kernel.org, Konrad Rzeszutek Wilk,
	annie li, xen-devel@lists.xensource.com
In-Reply-To: <55633610.20121219123427@eikelenboom.it>

On Wed, 2012-12-19 at 12:34 +0100, Sander Eikelenboom wrote:

> Hi Ian,
> 
> It ran overnight and i haven't seen the warn_once trigger.
> (but i also didn't with the previous patch)
> 

As I said, the minimum value needed to avoid triggering the warning was
what Ian's patch was doing, but it was still not an accurate estimation.

Doing the real accounting might trigger slow transfers, or dropped
packets because of socket limits (SNDBUF / RCVBUF) being hit sooner.

So the real question was: when accounting for full pages, do your
applications run as smoothly as before, with no huge performance
regression?

^ permalink raw reply

* Re: [PATCH V2 00/12] Add basic VLAN support to bridges
From: Vlad Yasevich @ 2012-12-19 16:25 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: netdev, shemminger, davem, or.gerlitz, jhs, mst
In-Reply-To: <20121219082727.GB1637@minipsycho.orion>

On 12/19/2012 03:27 AM, Jiri Pirko wrote:
> Tue, Dec 18, 2012 at 11:46:21PM CET, vyasevic@redhat.com wrote:
>> On 12/18/2012 05:32 PM, Jiri Pirko wrote:
>>>
>>>
>>> I see that this patchset replicates a lot of code which is already
>>> present in net/8021q/ or include/linux/if_vlan.h. I think it would
>>> be nice to move this code into some "common" place, wouldn't it?
>>>
>>
>> The only replication that I am aware of is in br_vlan_untag().  I
>> thought about pulling that piece out, but I think there is a reason
>> why it's not available when 8021q support isn't turned on.  I noted that
>> openvswitch implemented its own vlan header manipulation functions as well.
>
> openvswitch should use the "common" code as well.
>
>>
>> What else are you seeing that's duplicate?
>
> For example I spotted check of ndo_vlan_rx_[add/kill]_vid and
> NETIF_F_HW_VLAN_FILTER and ndo_vlan_rx_[add/kill]_vid call

Ahh yes....  I can make that generic.  Thanks

-vlad

>
>
>>
>> -vlad
>>
>>> Jiri
>>>
>>> Tue, Dec 18, 2012 at 08:00:51PM CET, vyasevic@redhat.com wrote:
>>>> This series of patches provides an ability to add VLANs to the bridge
>>>> ports.  This is similar to what can be found in most switches.  The bridge
>>>> port may have any number of VLANs added to it including vlan 0 priority tagged
>>>> traffic.  When vlans are added to the port, only traffic tagged with a particular
>>>> vlan will be forwarded over this port.  Additionally, vlan ids are added to FDB
>>>> entries and become part of the lookup.  This way we correctly identify the FDB
>>>> entry.
>>>>
>>>> A single vlan may also be designated as untagged.  Any untagged traffic
>>>> received by the port will be assigned to this vlan.  Any traffic exiting
>>>> the port with a VID matching the untagged vlan will exit untagged (the
>>>> bridge will strip the vlan header).  This is similar to "Native Vlan" support
>>>> available in most switches.
>>>>
>>>> The default behavior of the bridge is unchanged if no vlans have been
>>>> configured.
>>>>
>>>> Changes since v1:
>>>> - Fixed some forwarding bugs.
>>>> - Add vlan to local fdb entries.  New local entries are created per vlan
>>>>    to facilitate correct forwarding to bridge interface.
>>>> - Allow configuration of vlans directly on the bridge master device
>>>>    in addition to ports.
>>>>
>>>> Changes since rfc v2:
>>>> - Per-port vlan bitmap is gone and is replaced with a vlan list.
>>>> - Added bridge vlan list, which is referenced by each port.  Entries in
>>>>    the bridge vlan list have a port bitmap that shows which ports are part
>>>>    of which vlan.
>>>> - Netlink API changes.
>>>> - Dropped sysfs support for now.  If people think this is really useful,
>>>>    I can add it back.
>>>> - Support for native/untagged vlans.
>>>>
>>>> Changes since rfc v1:
>>>> - Comments addressed regarding formatting and RCU usage
>>>> - ioctls have been removed and replaced by the netlink interface.
>>>> - Added support of user added ndb entries.
>>>> - changed sysfs interface to export a bitmap.  Also added a write interface.
>>>>    I am not sure how much I like it, but it made my testing easier/faster.  I
>>>>    might change the write interface to take text instead of binary.
>>>>
>>>>
>>>> Vlad Yasevich (12):
>>>>   bridge: Add vlan filtering infrastructure
>>>>   bridge: Validate that vlan is permitted on ingress
>>>>   bridge: Verify that a vlan is allowed to egress on give port
>>>>   bridge: Cache vlan in the cb for faster egress lookup.
>>>>   bridge: Add vlan to unicast fdb entries
>>>>   bridge: Add vlan id to multicast groups
>>>>   bridge: Add netlink interface to configure vlans on bridge ports
>>>>   bridge: Add vlan support to static neighbors
>>>>   bridge: Add the ability to configure untagged vlans
>>>>   bridge: Implement untagged vlan handling
>>>>   bridge: Dump vlan information from a bridge port
>>>>   bridge: Add vlan support for local fdb entries
>>>>
>>>> drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |    5 +-
>>>> drivers/net/macvlan.c                         |    2 +-
>>>> drivers/net/vxlan.c                           |    3 +-
>>>> include/linux/netdevice.h                     |    4 +-
>>>> include/uapi/linux/if_bridge.h                |   23 ++-
>>>> include/uapi/linux/neighbour.h                |    1 +
>>>> include/uapi/linux/rtnetlink.h                |    1 +
>>>> net/bridge/br_device.c                        |   34 ++-
>>>> net/bridge/br_fdb.c                           |  253 ++++++++++++---
>>>> net/bridge/br_forward.c                       |  160 ++++++++++
>>>> net/bridge/br_if.c                            |  404 ++++++++++++++++++++++++-
>>>> net/bridge/br_input.c                         |   65 ++++-
>>>> net/bridge/br_multicast.c                     |   71 +++--
>>>> net/bridge/br_netlink.c                       |  178 ++++++++++--
>>>> net/bridge/br_private.h                       |   71 ++++-
>>>> net/core/rtnetlink.c                          |   40 ++-
>>>> 16 files changed, 1190 insertions(+), 125 deletions(-)
>>>>
>>>> --
>>>> 1.7.7.6
>>>>
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>>>> the body of a message to majordomo@vger.kernel.org
>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>
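
The egress half of the "untagged vlan" behaviour described in the cover
letter can be pictured with a userspace sketch. This is not the code from
the series (br_vlan_untag() is not quoted in this thread); the function
name, the constants and the flat-buffer frame layout are assumptions made
purely for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_ETH_ALEN    6       /* bytes in a MAC address */
#define SKETCH_VLAN_HLEN   4       /* TPID (2 bytes) + TCI (2 bytes) */
#define SKETCH_ETH_P_8021Q 0x8100  /* 802.1Q tag protocol identifier */
#define SKETCH_VID_MASK    0x0fff  /* low 12 bits of the TCI */

/* If the frame carries an 802.1Q tag whose VID equals untagged_vid,
 * strip the tag in place and return the new frame length.  Otherwise
 * leave the frame alone and return the original length.
 */
static size_t sketch_egress_untag(uint8_t *frame, size_t len,
				  uint16_t untagged_vid)
{
	const size_t tag_off = 2 * SKETCH_ETH_ALEN;
	uint16_t tpid, tci;

	/* Need both addresses, the tag and the inner ethertype. */
	if (len < tag_off + SKETCH_VLAN_HLEN + 2)
		return len;

	tpid = (uint16_t)((frame[tag_off] << 8) | frame[tag_off + 1]);
	if (tpid != SKETCH_ETH_P_8021Q)
		return len;

	tci = (uint16_t)((frame[tag_off + 2] << 8) | frame[tag_off + 3]);
	if ((tci & SKETCH_VID_MASK) != untagged_vid)
		return len;

	/* Slide the inner ethertype and payload over the 4-byte tag. */
	memmove(frame + tag_off, frame + tag_off + SKETCH_VLAN_HLEN,
		len - tag_off - SKETCH_VLAN_HLEN);
	return len - SKETCH_VLAN_HLEN;
}
```

A frame tagged with any other VID passes through unchanged, which matches
the description that only the designated untagged vlan exits without a
vlan header.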

^ permalink raw reply

* Re: [GIT PULL net-next 04/17] ndisc: Introduce ndisc_fill_redirect_hdr_option().
From: YOSHIFUJI Hideaki @ 2012-12-19 16:25 UTC (permalink / raw)
  To: Bjørn Mork; +Cc: davem, netdev, YOSHIFUJI Hideaki
In-Reply-To: <87txrib6wa.fsf@nemi.mork.no>

Bjørn Mork wrote:
> YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> writes:
> 
>> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
>> ---
>>  net/ipv6/ndisc.c |   21 +++++++++++++++------
>>  1 file changed, 15 insertions(+), 6 deletions(-)
>>
>> diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
>> index a181113..0a4f3a9 100644
>> --- a/net/ipv6/ndisc.c
>> +++ b/net/ipv6/ndisc.c
>> @@ -1332,6 +1332,19 @@ static void ndisc_redirect_rcv(struct sk_buff *skb)
>>  	icmpv6_notify(skb, NDISC_REDIRECT, 0, 0);
>>  }
>>  
>> +static u8 *ndisc_fill_redirect_hdr_option(u8 *opt, struct sk_buff *orig_skb,
>> +					  int rd_len)
>> +{
>> +	memset(opt, 0, 8);
>> +	*(opt++) = ND_OPT_REDIRECT_HDR;
>> +	*(opt++) = (rd_len >> 3);
>> +	opt += 6;
>> +
>> +	memcpy(opt, ipv6_hdr(orig_skb), rd_len - 8);
>> +
>> +	return opt;
>> +}
>> +
> 
> I realize that it doesn't currently matter, but the above modification
> of "opt" looks like a bug-waiting-to-happen to me.
> 
>>  void ndisc_send_redirect(struct sk_buff *skb, const struct in6_addr *target)
>>  {
>>  	struct net_device *dev = skb->dev;
>> @@ -1461,12 +1474,8 @@ void ndisc_send_redirect(struct sk_buff *skb, const struct in6_addr *target)
>>  	 *	build redirect option and copy skb over to the new packet.
>>  	 */
>>  
>> -	memset(opt, 0, 8);
>> -	*(opt++) = ND_OPT_REDIRECT_HDR;
>> -	*(opt++) = (rd_len >> 3);
>> -	opt += 6;
>> -
>> -	memcpy(opt, ipv6_hdr(skb), rd_len - 8);
>> +	if (rd_len)
>> +		opt = ndisc_fill_redirect_hdr_option(opt, skb, rd_len);
> 
> 
> I understand that opt isn't currently used after this, but if it ever is
> then it is going to come as a big surprise that this implies opt += 8;
> 
> This was previously quite clear when the code was inline, but it becomes
> problematic when it is factored out.

I understand your concern.  opt will disappear in a following
changeset (12 of 17).

--yoshfuji
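
One way to picture the concern raised above, sketched in userspace: a
helper that returns a pointer just past the whole option (the 8-byte option
header plus the copied data), so the caller's cursor is unambiguous. This
is not the code from the series. The function name is hypothetical, the
option type value 4 is the Redirected Header type from RFC 4861, and the
sketch assumes rd_len is a multiple of 8 and at least 8.

```c
#include <stdint.h>
#include <string.h>

#define SKETCH_ND_OPT_REDIRECT_HDR 4  /* Redirected Header option type */

/* Fill a Redirected Header option at 'opt' and return a pointer to the
 * first byte after the complete option.  'rd_len' is the total option
 * length in bytes, including the 8-byte option header.
 */
static uint8_t *sketch_fill_redirect_hdr_option(uint8_t *opt,
						const uint8_t *orig,
						int rd_len)
{
	memset(opt, 0, 8);
	opt[0] = SKETCH_ND_OPT_REDIRECT_HDR;
	opt[1] = (uint8_t)(rd_len >> 3);	/* length in units of 8 octets */

	/* The original packet data follows the 8-byte option header. */
	memcpy(opt + 8, orig, (size_t)rd_len - 8);

	return opt + rd_len;
}
```

With this shape the returned pointer never lands in the middle of the
option, whatever a later caller does with it.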

^ permalink raw reply

* Re: [RFC PATCH v3 2/2] tun: fix LSM/SELinux labeling of tun/tap devices
From: Paul Moore @ 2012-12-19 16:58 UTC (permalink / raw)
  To: Jason Wang; +Cc: Michael S. Tsirkin, netdev, linux-security-module, selinux
In-Reply-To: <50D154B1.4010909@redhat.com>

On Wednesday, December 19, 2012 01:46:25 PM Jason Wang wrote:
> On 12/19/2012 07:08 AM, Michael S. Tsirkin wrote:
> > On Tue, Dec 18, 2012 at 05:53:52PM -0500, Paul Moore wrote:
> >> This patch corrects some problems with LSM/SELinux that were introduced
> >> with the multiqueue patchset.  The problem stems from the fact that the
> >> multiqueue work changed the relationship between the tun device and its
> >> associated socket; before the socket persisted for the life of the
> >> device, however after the multiqueue changes the socket only persisted
> >> for the life of the userspace connection (fd open).  For non-persistent
> >> devices this is not an issue, but for persistent devices this can cause
> >> the tun device to lose its SELinux label.
> >> 
> >> We correct this problem by adding an opaque LSM security blob to the
> >> tun device struct which allows us to have the LSM security state, e.g.
> >> SELinux labeling information, persist for the lifetime of the tun
> >> device.  In the process we tweak the LSM hooks to work with this new
> >> approach to TUN device/socket labeling and introduce a new LSM hook,
> >> security_tun_dev_attach_queue(), to approve requests to attach to a
> >> TUN queue via TUNSETQUEUE.
> >> 
> >> The SELinux code has been adjusted to match the new LSM hooks, the
> >> other LSMs do not make use of the LSM TUN controls.  This patch makes
> >> use of the recently added "tun_socket:attach_queue" permission to
> >> restrict access to the TUNSETQUEUE operation.  On older SELinux
> >> policies which do not define the "tun_socket:attach_queue" permission
> >> the access control decision for TUNSETQUEUE will be handled according
> >> to the SELinux policy's unknown permission setting.
> >> 
> >> Signed-off-by: Paul Moore <pmoore@redhat.com>
> > 
> > Looks good to me. A comment not directly related to this patch, below.
> 
> > Looks good to me too; I will do some testing on this.

Great.  I'll do some more testing and make sure the LSM and SELinux crowd are 
okay with the changes.

-- 
paul moore
security and virtualization @ redhat


^ permalink raw reply

* Re: [RFC PATCH v3 0/2] Fix some multiqueue TUN problems
From: Paul Moore @ 2012-12-19 16:59 UTC (permalink / raw)
  To: linux-security-module, selinux, eparis; +Cc: netdev, jasowang, mst
In-Reply-To: <20121218225001.16104.34454.stgit@localhost>

On Tuesday, December 18, 2012 05:53:37 PM Paul Moore wrote:
> A refresh/respin of the LSM/SELinux fixes to work on top of Jason's
> latest API tweak (now living in DaveM's net tree).  In general, I
> believe the hooks and thinking behind the v2 patchset still make sense
> so no changes there, although I did change the SELinux permission from
> "create_queue" to "attach_queue" to match the API changes.
> 
> Comments are welcome and encouraged; we need to get this fixed before
> 3.8 is released.

SELinux (I'm looking at you Eric) and LSM folks - any comments/objections to 
these changes?

> ---
> 
> Paul Moore (2):
>       selinux: add the "attach_queue" permission to the "tun_socket" class
>       tun: fix LSM/SELinux labeling of tun/tap devices

-- 
paul moore
security and virtualization @ redhat


^ permalink raw reply

* Re: [PATCH V2 00/12] Add basic VLAN support to bridges
From: Thomas Graf @ 2012-12-19 17:04 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: Vlad Yasevich, netdev, shemminger, davem, or.gerlitz, jhs, mst
In-Reply-To: <20121219082727.GB1637@minipsycho.orion>

On 12/19/12 at 09:27am, Jiri Pirko wrote:
> Tue, Dec 18, 2012 at 11:46:21PM CET, vyasevic@redhat.com wrote:
> >On 12/18/2012 05:32 PM, Jiri Pirko wrote:
> >>
> >>
> >>I see that this patchset replicates a lot of code which is already
> >>present in net/8021q/ or include/linux/if_vlan.h. I think it would
> >>be nice to move this code into some "common" place, wouldn't it?
> >>
> >
> >The only replication that I am aware of is in br_vlan_untag().  I
> >thought about pulling that piece out, but I think there is a reason
> >why it's not available when 8021q support isn't turned on.  I noted that
> >openvswitch implemented its own vlan header manipulation functions as well.
> 
> openvswitch should use the "common" code as well.

I was just about to mention this. This overlaps with openvswitch
in functionality, which I have absolutely no objection to,
but code reuse should come into focus in order to avoid having to
fix bugs twice.

^ permalink raw reply

* Re: [PATCH v2] netlink: align attributes on 64-bits
From: Thomas Graf @ 2012-12-19 17:09 UTC (permalink / raw)
  To: Nicolas Dichtel; +Cc: bhutchings, netdev, davem, David.Laight
In-Reply-To: <50D1A37C.8090705@6wind.com>

On 12/19/12 at 12:22pm, Nicolas Dichtel wrote:
> Here padlen will be 4, which is wrong: padlen + NLA_HDRLEN = 8, so the
> alignment is the same as before. Here is a proposed fix:
> 
> diff --git a/lib/nlattr.c b/lib/nlattr.c
> index e4f0329..1556313 100644
> --- a/lib/nlattr.c
> +++ b/lib/nlattr.c
> @@ -338,7 +338,10 @@ struct nlattr *__nla_reserve(struct sk_buff *skb, int attrtype, int attrlen)
>  		struct nlattr *pad;
>  		size_t padlen;
> 
> -		padlen = nla_total_size(offset) - offset -  NLA_HDRLEN;
> +		/* We need to remove NLA_HDRLEN two times: one time for the
> +		 * attribute hdr and one time for the pad attribute hdr.
> +		 */
> +		padlen = nla_total_size(offset) - offset -  2 * NLA_HDRLEN;
>  		pad = (struct nlattr *) skb_put(skb, nla_attr_size(padlen));
>  		pad->nla_type = 0;
>  		pad->nla_len = nla_attr_size(padlen);
> 
> With this patch, it seems good: attributes are always aligned on 8 bytes.
> Also, I did not notice any problem with the size calculation (I tried some
> ip link, ip xfrm, ip [m]route).
> 
> Do you want to run more tests? Or will you repost the full patch?
> I can do it if you don't have time.

Thanks.

I would like to do some testing as well. I do expect some fallout from
this: there is likely some interface abuse that this change will now
expose.

We'll have to wait for the next merge window to open anyway. I'd
consider this a new feature and not a bugfix based on the possible
regression impact it could have.

I'll post a new version of the patch integrating your fix above so
others (especially subsystem maintainers depending on netlink) can run
the patch as well.
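
To make the arithmetic in the quoted fix easier to follow, here is a
minimal userspace sketch, not the kernel code. It assumes that the
patched nla_total_size() rounds up to 8 bytes and that "offset" is an
8-byte-aligned write position; total_size8(), padlen_old(),
padlen_fixed() and payload_start() are hypothetical names used only
for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define HDRLEN 4u	/* size of struct nlattr, NLA_HDRLEN in the kernel */

/* Hypothetical stand-in: ASSUMES the patched nla_total_size() rounds
 * header + payload up to a multiple of 8.
 */
static size_t total_size8(size_t payload)
{
	return (HDRLEN + payload + 7u) & ~(size_t)7u;
}

/* Pad payload length as computed before the fix (one header removed). */
static size_t padlen_old(size_t offset)
{
	assert(offset % 8 == 0);	/* only the 8-byte-aligned case is modelled */
	return total_size8(offset) - offset - HDRLEN;
}

/* Pad payload length with the fix (pad header and attribute header removed). */
static size_t padlen_fixed(size_t offset)
{
	assert(offset % 8 == 0);	/* only the 8-byte-aligned case is modelled */
	return total_size8(offset) - offset - 2 * HDRLEN;
}

/* Position of the real attribute's payload once a pad attribute
 * (header + padlen bytes) and the real attribute's header are written.
 */
static size_t payload_start(size_t offset, size_t padlen)
{
	return offset + (HDRLEN + padlen) + HDRLEN;
}
```

Under these assumptions, at offset 16 the old formula gives a pad
payload of 4, so the pad attribute occupies 8 bytes and the real
payload still starts 4 bytes past an 8-byte boundary. The fixed
formula gives a pad payload of 0, the pad attribute occupies 4 bytes,
and the real payload starts on an 8-byte boundary.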

* Re: [PATCH V2 00/12] Add basic VLAN support to bridges
From: Vlad Yasevich @ 2012-12-19 17:11 UTC (permalink / raw)
  To: Thomas Graf; +Cc: Jiri Pirko, netdev, shemminger, davem, or.gerlitz, jhs, mst
In-Reply-To: <20121219170431.GA6975@casper.infradead.org>

On 12/19/2012 12:04 PM, Thomas Graf wrote:
> On 12/19/12 at 09:27am, Jiri Pirko wrote:
>> Tue, Dec 18, 2012 at 11:46:21PM CET, vyasevic@redhat.com wrote:
>>> On 12/18/2012 05:32 PM, Jiri Pirko wrote:
>>>>
>>>>
>>>> I see that this patchset replicates a lot of code which is already
>>>> present in net/8021q/ or include/linux/if_vlan.h. I think it would
>>>> be nice to move this code into some "common" place, wouldn't it?
>>>>
>>>
>>> The only replication that I am aware of is in br_vlan_untag().  I
>>> thought about pulling that piece out, but I think there is a reason
>>> why it's not available when 802.1q support isn't turned on.  I noted that
>>> openvswitch implemented its own vlan header manipulation functions as well.
>>
>> openvswitch should use the "common" code as well.
>
> I was just about to mention this. This overlaps with openvswitch
> in functionality, which I have absolutely no objection to,
> but code reuse should be a focus so that we avoid having to
> fix bugs twice.
>

Could we consolidate the code after this is accepted and all the parties
can agree on the consolidation?  I'd really like to keep this series
as minimally invasive as possible.

Thanks
-vlad
