Netdev List
* RE: [PATCH v3 01/17] hashtable: introduce a small and naive hashtable
From: David Laight @ 2012-09-06 14:36 UTC
  To: Sasha Levin, Mathieu Desnoyers
  Cc: Pedro Alves, Steven Rostedt, Tejun Heo, torvalds, akpm,
	linux-kernel, linux-mm, paul.gortmaker, davem, mingo, ebiederm,
	aarcange, ericvh, netdev, josh, eric.dumazet, axboe, agk,
	dm-devel, neilb, ccaulfie, teigland, Trond.Myklebust, bfields,
	fweisbec, jesse, venkat.x.venkatsubra, ejt, snitzer, edumazet,
	linux-nfs, dev, rds-devel, lw
In-Reply-To: <5048AAF6.5090101@gmail.com>

> My solution to making 'break' work in the iterator is:
> 
> 	for (bkt = 0, node = NULL; bkt < HASH_SIZE(name) && node == NULL; bkt++)
> 		hlist_for_each_entry(obj, node, &name[bkt], member)

I'd take a look at the generated code.
Might come out a bit better if the condition is changed to:
	node == NULL && bkt < HASH_SIZE(name)
you might find the compiler always optimises out the
node == NULL comparison.
(It might anyway, but switching the order gives it a better
chance.)

	David



--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


* Re: [PATCH v2 09/10] net/macb: ethtool interface: add register dump feature
From: Ben Hutchings @ 2012-09-06 14:34 UTC
  To: Nicolas Ferre
  Cc: netdev, davem, linux-arm-kernel, havard, plagnioj,
	patrice.vilchez, linux-kernel
In-Reply-To: <1346941256-15676-1-git-send-email-nicolas.ferre@atmel.com>

On Thu, 2012-09-06 at 16:20 +0200, Nicolas Ferre wrote:
> Add macb_get_regs() ethtool function and its helper function:
> macb_get_regs_len().
> The version field is deduced from the IP revision which gives the
> "MACB or GEM" information. An additional version field is reserved.
> 
> Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Reviewed-by: Ben Hutchings <bhutchings@solarflare.com>
> ---
> v2: - modify MACB_GREGS_NBR name and adapt to number of registers
>       actually displayed.
>     - change version format to reflect register layout and
>       add a version number to be future proof.
> 
>  drivers/net/ethernet/cadence/macb.c |   40 +++++++++++++++++++++++++++++++++++
>  drivers/net/ethernet/cadence/macb.h |    3 +++
>  2 files changed, 43 insertions(+)
> 
> diff --git a/drivers/net/ethernet/cadence/macb.c b/drivers/net/ethernet/cadence/macb.c
> index dc34ff1..cab42e7 100644
> --- a/drivers/net/ethernet/cadence/macb.c
> +++ b/drivers/net/ethernet/cadence/macb.c
> @@ -1223,9 +1223,49 @@ static int macb_set_settings(struct net_device *dev, struct ethtool_cmd *cmd)
>  	return phy_ethtool_sset(phydev, cmd);
>  }
>  
> +static int macb_get_regs_len(struct net_device *netdev)
> +{
> +	return MACB_GREGS_NBR * sizeof(u32);
> +}
> +
> +static void macb_get_regs(struct net_device *dev, struct ethtool_regs *regs,
> +			  void *p)
> +{
> +	struct macb *bp = netdev_priv(dev);
> +	unsigned int tail, head;
> +	u32 *regs_buff = p;
> +
> +	regs->version = (macb_readl(bp, MID) & ((1 << MACB_REV_SIZE) - 1))
> +			| MACB_GREGS_VERSION;
> +
> +	tail = macb_tx_ring_wrap(bp->tx_tail);
> +	head = macb_tx_ring_wrap(bp->tx_head);
> +
> +	regs_buff[0]  = macb_readl(bp, NCR);
> +	regs_buff[1]  = macb_or_gem_readl(bp, NCFGR);
> +	regs_buff[2]  = macb_readl(bp, NSR);
> +	regs_buff[3]  = macb_readl(bp, TSR);
> +	regs_buff[4]  = macb_readl(bp, RBQP);
> +	regs_buff[5]  = macb_readl(bp, TBQP);
> +	regs_buff[6]  = macb_readl(bp, RSR);
> +	regs_buff[7]  = macb_readl(bp, IMR);
> +
> +	regs_buff[8]  = tail;
> +	regs_buff[9]  = head;
> +	regs_buff[10] = macb_tx_dma(bp, tail);
> +	regs_buff[11] = macb_tx_dma(bp, head);
> +
> +	if (macb_is_gem(bp)) {
> +		regs_buff[12] = gem_readl(bp, USRIO);
> +		regs_buff[13] = gem_readl(bp, DMACFG);
> +	}
> +}
> +
>  static const struct ethtool_ops macb_ethtool_ops = {
>  	.get_settings		= macb_get_settings,
>  	.set_settings		= macb_set_settings,
> +	.get_regs_len		= macb_get_regs_len,
> +	.get_regs		= macb_get_regs,
>  	.get_link		= ethtool_op_get_link,
>  };
>  
> diff --git a/drivers/net/ethernet/cadence/macb.h b/drivers/net/ethernet/cadence/macb.h
> index f69ceef..bcadc3c 100644
> --- a/drivers/net/ethernet/cadence/macb.h
> +++ b/drivers/net/ethernet/cadence/macb.h
> @@ -10,6 +10,9 @@
>  #ifndef _MACB_H
>  #define _MACB_H
>  
> +#define MACB_GREGS_NBR 16
> +#define MACB_GREGS_VERSION 1
> +
>  /* MACB register offsets */
>  #define MACB_NCR				0x0000
>  #define MACB_NCFGR				0x0004

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


* Re: [PATCH v3 01/17] hashtable: introduce a small and naive hashtable
From: Mathieu Desnoyers @ 2012-09-06 14:33 UTC
  To: Sasha Levin
  Cc: snitzer-H+wXaHxf7aLQT0dZR+AlfA, neilb-l3A5Bk7waGM,
	fweisbec-Re5JQEeQqe8AvxtiuMwx3w,
	Trond.Myklebust-HgOvQuBEEgTQT0dZR+AlfA,
	bfields-uC3wQj2KruNg9hUCZPvPmw,
	paul.gortmaker-CWA4WttNNZF54TAoqtyWWQ,
	dm-devel-H+wXaHxf7aLQT0dZR+AlfA, agk-H+wXaHxf7aLQT0dZR+AlfA,
	aarcange-H+wXaHxf7aLQT0dZR+AlfA, rds-devel-N0ozoZBvEnrZJqsBc5GL+g,
	eric.dumazet-Re5JQEeQqe8AvxtiuMwx3w,
	venkat.x.venkatsubra-QHcLZuEGTsvQT0dZR+AlfA,
	ccaulfie-H+wXaHxf7aLQT0dZR+AlfA, mingo-X9Un+BFzKDI,
	dev-yBygre7rU0TnMu66kgdUjQ, ericvh-Re5JQEeQqe8AvxtiuMwx3w,
	josh-iaAMLnmF4UmaiuxdJuQwMA, Steven Rostedt,
	lw-BthXqXjhjHXQFUHtdCDX3A, teigland-H+wXaHxf7aLQT0dZR+AlfA,
	axboe-tSWWG44O7X1aa/9Udqfwiw, linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	edumazet-hpIqsD4AKlfQT0dZR+AlfA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
	Pedro Alves, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	ejt-H+wXaHxf7aLQT0dZR+AlfA, ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	netdev-u79uwXL29TY76Z2rM5mHXA, Tejun Heo,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q
In-Reply-To: <5048AAF6.5090101-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

* Sasha Levin (levinsasha928-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org) wrote:
> On 09/04/2012 07:01 PM, Mathieu Desnoyers wrote:
> >> #define do_for_each_ftrace_rec(pg, rec)                                          \
> >> >         for (pg = ftrace_pages_start, rec = &pg->records[pg->index];             \
> >> >              pg && rec == &pg->records[pg->index];                               \
> >> >              pg = pg->next)                                                      \
> >> >           for (rec = pg->records; rec < &pg->records[pg->index]; rec++)
> > Maybe in some cases there might be ways to combine the two loops into
> > one ? I'm not seeing exactly how to do it for this one, but it should
> > not be impossible. If the inner loop condition can be moved to the outer
> > loop, and if we use (blah ? loop1_cond : loop2_cond) to test for
> > different conditions depending on the context, and do the same for the
> > 3rd argument of the for() loop. The details elude me for now though, so
> > maybe it's complete non-sense ;)
> > 
> > It might not be that useful for do_for_each_ftrace_rec, but if we can do
> > it for the hash table iterator, it might be worth it.
> 
> So I think that for the hash iterator it might actually be simpler.
> 
> My solution to making 'break' work in the iterator is:
> 
> 	for (bkt = 0, node = NULL; bkt < HASH_SIZE(name) && node == NULL; bkt++)
> 		hlist_for_each_entry(obj, node, &name[bkt], member)
> 
> We initialize our node loop cursor with NULL in the external loop, and the
> external loop will have a new condition to loop while that cursor is NULL.
> 
> My logic is that we can only 'break' when we are iterating over an object in the
> internal loop. If we're iterating over an object in that loop then 'node != NULL'.
> 
> This way, if we broke from within the internal loop, the external loop will see
> node as not NULL, and so it will stop looping itself. On the other hand, if the
> internal loop has actually ended, then node will be NULL, and the outer loop
> will keep running.
> 
> Is there anything I've missed?

This sounds good. Unless I'm missing something too.

Thanks!

Mathieu

> 
> 
> Thanks,
> Sasha

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com


* Re: NULL pointer dereference in xt_register_target()
From: Cong Wang @ 2012-09-06 14:27 UTC
  To: Pablo Neira Ayuso
  Cc: Eric Dumazet, netfilter-devel, Linux Kernel Network Developers
In-Reply-To: <20120905164831.GA21836@1984>

On Thu, Sep 6, 2012 at 12:48 AM, Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> On Wed, Sep 05, 2012 at 05:55:06PM +0200, Eric Dumazet wrote:
>> On Wed, 2012-09-05 at 23:43 +0800, Cong Wang wrote:
>> > Hi, folks,
>> >
>> > The latest net-next tree can't boot due to a NULL ptr deref
>> > bug in the kernel, the full backtrace is:
>> >
>> > http://img1.douban.com/view/photo/photo/public/p1697139550.jpg
>> >
>> > the kernel .config file is:
>> >
>> > http://pastebin.com/9YTnkqKN
>> >
>> > I don't have time to look into the issue. If you need other info,
>> > please let me know.
>>
>> It seems xt_nat_init() is called before xt_init(), so xt array is not
>> yet setup.
>
> I have enqueued the following patch to fix this:
>
> http://1984.lsi.us.es/git/nf-next/commit/?id=00545bec9412d130c77f72a08d6c8b6ad21d4a1e
> commit 00545bec9412d130c77f72a08d6c8b6ad21d4a1e
> Author: Pablo Neira Ayuso <pablo@netfilter.org>
> Date:   Wed Sep 5 18:24:55 2012 +0200
>
>     netfilter: fix crash during boot if NAT has been compiled built-in
>

Yeah, this indeed fixes the bug.

Could you please push it to net-next as soon as possible?

Thanks!


* [PATCH v2 09/10] net/macb: ethtool interface: add register dump feature
From: Nicolas Ferre @ 2012-09-06 14:20 UTC
  To: netdev, bhutchings, davem
  Cc: linux-arm-kernel, havard, plagnioj, patrice.vilchez, linux-kernel,
	Nicolas Ferre
In-Reply-To: <1346888174.5325.55.camel@bwh-desktop.uk.solarflarecom.com>

Add macb_get_regs() ethtool function and its helper function:
macb_get_regs_len().
The version field is deduced from the IP revision which gives the
"MACB or GEM" information. An additional version field is reserved.

Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
---
v2: - modify MACB_GREGS_NBR name and adapt to number of registers
      actually displayed.
    - change version format to reflect register layout and
      add a version number to be future proof.

 drivers/net/ethernet/cadence/macb.c |   40 +++++++++++++++++++++++++++++++++++
 drivers/net/ethernet/cadence/macb.h |    3 +++
 2 files changed, 43 insertions(+)

diff --git a/drivers/net/ethernet/cadence/macb.c b/drivers/net/ethernet/cadence/macb.c
index dc34ff1..cab42e7 100644
--- a/drivers/net/ethernet/cadence/macb.c
+++ b/drivers/net/ethernet/cadence/macb.c
@@ -1223,9 +1223,49 @@ static int macb_set_settings(struct net_device *dev, struct ethtool_cmd *cmd)
 	return phy_ethtool_sset(phydev, cmd);
 }
 
+static int macb_get_regs_len(struct net_device *netdev)
+{
+	return MACB_GREGS_NBR * sizeof(u32);
+}
+
+static void macb_get_regs(struct net_device *dev, struct ethtool_regs *regs,
+			  void *p)
+{
+	struct macb *bp = netdev_priv(dev);
+	unsigned int tail, head;
+	u32 *regs_buff = p;
+
+	regs->version = (macb_readl(bp, MID) & ((1 << MACB_REV_SIZE) - 1))
+			| MACB_GREGS_VERSION;
+
+	tail = macb_tx_ring_wrap(bp->tx_tail);
+	head = macb_tx_ring_wrap(bp->tx_head);
+
+	regs_buff[0]  = macb_readl(bp, NCR);
+	regs_buff[1]  = macb_or_gem_readl(bp, NCFGR);
+	regs_buff[2]  = macb_readl(bp, NSR);
+	regs_buff[3]  = macb_readl(bp, TSR);
+	regs_buff[4]  = macb_readl(bp, RBQP);
+	regs_buff[5]  = macb_readl(bp, TBQP);
+	regs_buff[6]  = macb_readl(bp, RSR);
+	regs_buff[7]  = macb_readl(bp, IMR);
+
+	regs_buff[8]  = tail;
+	regs_buff[9]  = head;
+	regs_buff[10] = macb_tx_dma(bp, tail);
+	regs_buff[11] = macb_tx_dma(bp, head);
+
+	if (macb_is_gem(bp)) {
+		regs_buff[12] = gem_readl(bp, USRIO);
+		regs_buff[13] = gem_readl(bp, DMACFG);
+	}
+}
+
 static const struct ethtool_ops macb_ethtool_ops = {
 	.get_settings		= macb_get_settings,
 	.set_settings		= macb_set_settings,
+	.get_regs_len		= macb_get_regs_len,
+	.get_regs		= macb_get_regs,
 	.get_link		= ethtool_op_get_link,
 };
 
diff --git a/drivers/net/ethernet/cadence/macb.h b/drivers/net/ethernet/cadence/macb.h
index f69ceef..bcadc3c 100644
--- a/drivers/net/ethernet/cadence/macb.h
+++ b/drivers/net/ethernet/cadence/macb.h
@@ -10,6 +10,9 @@
 #ifndef _MACB_H
 #define _MACB_H
 
+#define MACB_GREGS_NBR 16
+#define MACB_GREGS_VERSION 1
+
 /* MACB register offsets */
 #define MACB_NCR				0x0000
 #define MACB_NCFGR				0x0004
-- 
1.7.10


* Re: [PATCH v3 01/17] hashtable: introduce a small and naive hashtable
From: Pedro Alves @ 2012-09-06 14:19 UTC
  To: Sasha Levin
  Cc: Mathieu Desnoyers, Steven Rostedt, Tejun Heo, torvalds, akpm,
	linux-kernel, linux-mm, paul.gortmaker, davem, mingo, ebiederm,
	aarcange, ericvh, netdev, josh, eric.dumazet, axboe, agk,
	dm-devel, neilb, ccaulfie, teigland, Trond.Myklebust, bfields,
	fweisbec, jesse, venkat.x.venkatsubra, ejt, snitzer, edumazet,
	linux-nfs, dev, rds-devel, lw
In-Reply-To: <5048AAF6.5090101@gmail.com>

On 09/06/2012 02:53 PM, Sasha Levin wrote:

> So I think that for the hash iterator it might actually be simpler.
> 
> My solution to making 'break' work in the iterator is:
> 
> 	for (bkt = 0, node = NULL; bkt < HASH_SIZE(name) && node == NULL; bkt++)
> 		hlist_for_each_entry(obj, node, &name[bkt], member)
> 
> We initialize our node loop cursor with NULL in the external loop, and the
> external loop will have a new condition to loop while that cursor is NULL.
> 
> My logic is that we can only 'break' when we are iterating over an object in the
> internal loop. If we're iterating over an object in that loop then 'node != NULL'.
> 
> This way, if we broke from within the internal loop, the external loop will see
> node as not NULL, and so it will stop looping itself. On the other hand, if the
> internal loop has actually ended, then node will be NULL, and the outer loop
> will keep running.
> 
> Is there anything I've missed?

Looks right to me, from a cursory look at hlist_for_each_entry.  That's exactly
what I meant with this most often being trivial when the inner loop's iterator
is a pointer that goes NULL at the end.

-- 
Pedro Alves


* Re: [PATCH 08/10] net/macb: macb_get_drvinfo: add GEM/MACB suffix to differentiate revision
From: Nicolas Ferre @ 2012-09-06 14:01 UTC
  To: Ben Hutchings
  Cc: netdev, linux-arm-kernel, davem, havard, plagnioj, jamie,
	linux-kernel, patrice.vilchez
In-Reply-To: <1346887671.5325.47.camel@bwh-desktop.uk.solarflarecom.com>

On 09/06/2012 01:27 AM, Ben Hutchings wrote:
> On Wed, 2012-09-05 at 11:00 +0200, Nicolas Ferre wrote:
>> Add an indication about which revision of the hardware we are running in
>> info->driver string.
>>
>> Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
>> ---
>>  drivers/net/ethernet/cadence/macb.c |    4 ++++
>>  1 file changed, 4 insertions(+)
>>
>> diff --git a/drivers/net/ethernet/cadence/macb.c b/drivers/net/ethernet/cadence/macb.c
>> index bd331fd..c7c39f1 100644
>> --- a/drivers/net/ethernet/cadence/macb.c
>> +++ b/drivers/net/ethernet/cadence/macb.c
>> @@ -1313,6 +1313,10 @@ static void macb_get_drvinfo(struct net_device *dev,
>>  	struct macb *bp = netdev_priv(dev);
>>  
>>  	strcpy(info->driver, bp->pdev->dev.driver->name);
>> +	if (macb_is_gem(bp))
>> +		strcat(info->driver, " GEM");
>> +	else
>> +		strcat(info->driver, " MACB");
>>  	strcpy(info->version, "$Revision: 1.14 $");
> 
> Related to hardware revisions (which don't belong here, as David said),
> I rather doubt this CVS ID is very useful as a driver version.
> 
> If the driver doesn't have a meaningful version (aside from the kernel
> version) then you can remove this function and let the ethtool core fill
> in the other two fields automatically.

Absolutely, I will do this.

Thanks for the tip.

Best regards,
-- 
Nicolas Ferre


* Re: [PATCH net-next] ipv6: fix handling of throw routes
From: Eric Dumazet @ 2012-09-06 13:58 UTC
  To: Nicolas Dichtel; +Cc: davem, netdev, markus.stenberg
In-Reply-To: <1346946815-3094-1-git-send-email-nicolas.dichtel@6wind.com>

On Thu, 2012-09-06 at 11:53 -0400, Nicolas Dichtel wrote:
> This is the same problem as the one addressed by the previous fix for
> blackhole and prohibit routes.
> 
> When adding a throw route, it was handled like a classic route.
> Moreover, it was only possible to add this kind of route by specifying
> an interface.
> 
> Before the patch:
>   $ ip route add throw 2001::2/128
>   RTNETLINK answers: No such device
>   $ ip route add throw 2001::2/128 dev eth0
>   $ ip -6 route | grep 2001::2
>   2001::2 dev eth0  metric 1024
> 
> After:
>   $ ip route add throw 2001::2/128
>   $ ip -6 route | grep 2001::2
>   throw 2001::2 dev lo  metric 1024  error -11
> 
> Reported-by: Markus Stenberg <markus.stenberg@iki.fi>
> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
> ---
>  net/ipv6/route.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)

Acked-by: Eric Dumazet <edumazet@google.com>

Thanks Nicolas
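
For readers following the patch, here is a minimal user-space sketch of the
two-way mapping it extends: route type to dst error on insertion, and dst
error back to route type when dumping. The enum and function names are
hypothetical stand-ins, not the kernel's RTN_* constants or functions, and
only the cases visible in the quoted hunks are modelled.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical local route types; the kernel uses RTN_* values instead. */
enum route_type { RT_UNREACHABLE, RT_PROHIBIT, RT_THROW };

/* Insertion side: pick the error stored in the route for a reject type. */
static int route_type_to_error(enum route_type type)
{
	switch (type) {
	case RT_PROHIBIT:
		return -EACCES;
	case RT_THROW:
		return -EAGAIN;
	default:
		return -ENETUNREACH;
	}
}

/* Dump side: recover the route type from the stored error. */
static enum route_type error_to_route_type(int error)
{
	switch (error) {
	case -EACCES:
		return RT_PROHIBIT;
	case -EAGAIN:
		return RT_THROW;
	default:
		return RT_UNREACHABLE;
	}
}

/* A type survives the round trip only if both switches know about it. */
static void check_round_trip(void)
{
	assert(error_to_route_type(route_type_to_error(RT_THROW)) == RT_THROW);
	assert(error_to_route_type(route_type_to_error(RT_PROHIBIT)) == RT_PROHIBIT);
}
```

Both switches need the new case: with only one of them, a throw route would
either store the default error or be dumped back as unreachable.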


* Re: [PATCH] mac80211: use list_move instead of list_del/list_add
From: Wei Yongjun @ 2012-09-06 13:57 UTC
  To: johannes; +Cc: linville, yongjun_wei, linux-wireless, netdev

On 09/06/2012 05:56 PM, Johannes Berg wrote:

Hi Johannes,

> On Thu, 2012-09-06 at 13:20 +0800, Wei Yongjun wrote:
>> From: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
>>
>> Using list_move() instead of list_del() + list_add().
>>
>> spatch with a semantic match was used to find this problem.
>> (http://coccinelle.lip6.fr/)
>>
>> Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
> Applied. FWIW, I don't think it's really a "problem" rather than a
> simplification or something like that, but anyway.

That is right. I will change the patch description if I send other
cleanup patches like this one.
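
For reference, a minimal user-space sketch of why the two forms are
equivalent. The helpers below are simplified re-implementations modelled on
the kernel's list_head API, not the kernel code itself, and move_ok() is a
hypothetical helper added only for the comparison.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified circular doubly linked list, modelled on list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Insert 'entry' right after 'head'. */
static void list_add(struct list_head *entry, struct list_head *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

/* Unlink 'entry'; this sketch clears the pointers afterwards. */
static void list_del(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	entry->next = NULL;
	entry->prev = NULL;
}

/* Unlink 'entry' and re-insert it after 'head' in one call. */
static void list_move(struct list_head *entry, struct list_head *head)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	list_add(entry, head);
}

/* Returns 1 if the item ends up alone on list b and list a is empty. */
static int move_ok(int use_list_move)
{
	struct list_head a, b, item;

	init_list_head(&a);
	init_list_head(&b);
	list_add(&item, &a);

	if (use_list_move) {
		list_move(&item, &b);
	} else {
		list_del(&item);
		list_add(&item, &b);
	}

	return a.next == &a && a.prev == &a &&
	       b.next == &item && b.prev == &item &&
	       item.next == &b && item.prev == &b;
}

static void check_equivalence(void)
{
	assert(move_ok(0) == move_ok(1));
}
```

Both paths leave the lists in the same state, so the change is a
simplification rather than a behavioural fix.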

Thanks,
Yongjun Wei

>
> johannes
>
>
>


* Re: CBQ(but probably u32 filter bug), kernel "freeze", at least from 3.2.0
From: Eric Dumazet @ 2012-09-06 13:56 UTC
  To: Denys Fedoryshchenko; +Cc: netdev
In-Reply-To: <3012b13d6556b629a84089b68bce9a6b@visp.net.lb>

On Thu, 2012-09-06 at 16:47 +0300, Denys Fedoryshchenko wrote:

> Dear Eric
> 
> Very sorry for the delay; I have been in the desert most of the time,
> without decent internet. I will try to test today or tomorrow.

No problem, I reproduced the bug on my dev machine, but it's always
better to have the bug reporter add their own 'Tested-by:' tag ;)

Thanks


* Re: CBQ(but probably u32 filter bug), kernel "freeze", at least from 3.2.0
From: Denys Fedoryshchenko @ 2012-09-06 13:47 UTC
  To: Eric Dumazet; +Cc: netdev
In-Reply-To: <1346920806.13121.180.camel@edumazet-glaptop>

On 2012-09-06 11:40, Eric Dumazet wrote:
> On Tue, 2012-08-28 at 03:59 -0700, Eric Dumazet wrote:
>> On Tue, 2012-08-28 at 07:50 +0300, Denys Fedoryshchenko wrote:
>> > Hi
>> >
>> > Got information from a friend, and confirmed that it crashed at least
>> > two of my boxes also :)
>> > 3.0.5-rc1 is working fine; 3.4.1 and 3.2.0 from Ubuntu are crashing.
>> > No watchdog fired, and I haven't got any significant debugging
>> > information yet.
>> >
>> > Very easy to reproduce:
>> > 1)run the script
>> > 2)ping 192.168.3.234
>> >
>> > script:
>> > DEV_OUT=eth0
>> > ICMP="match ip protocol 1 0xff"
>> > U32="protocol ip u32"
>> > DST="match ip dst"
>> > tc qdisc add dev $DEV_OUT root handle 1: cbq avpkt 1000 bandwidth 100mbit
>> > tc class add dev $DEV_OUT parent 1: classid 1:1 cbq rate 512kbit allot 1500 prio 5 bounded isolated
>> > tc filter add dev $DEV_OUT parent 1: prio 3 $U32 $ICMP $DST 192.168.3.234 flowid 1:
>> > tc qdisc add dev $DEV_OUT parent 1:1 sfq perturb 10
>>
>> Not sure what your friend expected from this buggy configuration.
>>
>> It probably never worked at all.
>>
>> CBQ needs at least one child class and one leaf class.
>>
>> This script creates a loop inside CBQ, so the cpu is probably looping in
>> cbq_enqueue() (or more exactly cbq_classify()), as instructed by the
>> sysadmin ;)
>>
>> u32 (or sfq) seems ok.
>>
>> Could you try the following patch ?
>>
>> Thanks !
>>
>> diff --git a/net/sched/sch_cbq.c b/net/sched/sch_cbq.c
>> index 6aabd77..564b9fc 100644
>> --- a/net/sched/sch_cbq.c
>> +++ b/net/sched/sch_cbq.c
>> @@ -250,10 +250,11 @@ cbq_classify(struct sk_buff *skb, struct Qdisc *sch, int *qerr)
>>  			else if ((cl = defmap[res.classid & TC_PRIO_MAX]) == NULL)
>>  				cl = defmap[TC_PRIO_BESTEFFORT];
>>
>> -			if (cl == NULL || cl->level >= head->level)
>> +			if (cl == NULL)
>>  				goto fallback;
>>  		}
>> -
>> +		if (cl->level >= head->level)
>> +			goto fallback;
>>  #ifdef CONFIG_NET_CLS_ACT
>>  		switch (result) {
>>  		case TC_ACT_QUEUED:
>>
>
> Hi Denys
>
> Any feedback on the suggested patch ?
>
> Thanks !
Dear Eric

Very sorry for the delay; I have been in the desert most of the time,
without decent internet. I will try to test today or tomorrow.

---
Denys Fedoryshchenko, Network Engineer, Virtual ISP S.A.L.


* Re: [PATCH v3 01/17] hashtable: introduce a small and naive hashtable
From: Sasha Levin @ 2012-09-06 13:53 UTC
  To: Mathieu Desnoyers
  Cc: Pedro Alves, Steven Rostedt, Tejun Heo, torvalds, akpm,
	linux-kernel, linux-mm, paul.gortmaker, davem, mingo, ebiederm,
	aarcange, ericvh, netdev, josh, eric.dumazet, axboe, agk,
	dm-devel, neilb, ccaulfie, teigland, Trond.Myklebust, bfields,
	fweisbec, jesse, venkat.x.venkatsubra, ejt, snitzer, edumazet,
	linux-nfs, dev, rds-devel, lw
In-Reply-To: <20120904170138.GB31934@Krystal>

On 09/04/2012 07:01 PM, Mathieu Desnoyers wrote:
>> #define do_for_each_ftrace_rec(pg, rec)                                          \
>> >         for (pg = ftrace_pages_start, rec = &pg->records[pg->index];             \
>> >              pg && rec == &pg->records[pg->index];                               \
>> >              pg = pg->next)                                                      \
>> >           for (rec = pg->records; rec < &pg->records[pg->index]; rec++)
> Maybe in some cases there might be ways to combine the two loops into
> one ? I'm not seeing exactly how to do it for this one, but it should
> not be impossible. If the inner loop condition can be moved to the outer
> loop, and if we use (blah ? loop1_cond : loop2_cond) to test for
> different conditions depending on the context, and do the same for the
> 3rd argument of the for() loop. The details elude me for now though, so
> maybe it's complete non-sense ;)
> 
> It might not be that useful for do_for_each_ftrace_rec, but if we can do
> it for the hash table iterator, it might be worth it.

So I think that for the hash iterator it might actually be simpler.

My solution to making 'break' work in the iterator is:

	for (bkt = 0, node = NULL; bkt < HASH_SIZE(name) && node == NULL; bkt++)
		hlist_for_each_entry(obj, node, &name[bkt], member)

We initialize our node loop cursor with NULL in the external loop, and the
external loop will have a new condition to loop while that cursor is NULL.

My logic is that we can only 'break' when we are iterating over an object in the
internal loop. If we're iterating over an object in that loop then 'node != NULL'.

This way, if we broke from within the internal loop, the external loop will see
node as not NULL, and so it will stop looping itself. On the other hand, if the
internal loop has actually ended, then node will be NULL, and the outer loop
will keep running.

Is there anything I've missed?
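
The reasoning above can be checked with a small user-space sketch. It
assumes a plain singly linked chain per bucket instead of the kernel's
hlist, and every name in it (struct node, table_for_each, visited_until,
demo_table) is hypothetical and used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an hlist entry; not the kernel API. */
struct node {
	struct node *next;
	int key;
};

#define NBUCKETS 4

/*
 * The inner cursor is NULL only when a chain has been walked to its end,
 * so "pos == NULL" in the outer condition tells a finished chain apart
 * from a 'break' out of the inner loop.
 */
#define table_for_each(table, bkt, pos)					\
	for ((bkt) = 0, (pos) = NULL;					\
	     (bkt) < NBUCKETS && (pos) == NULL; (bkt)++)		\
		for ((pos) = (table)[bkt]; (pos); (pos) = (pos)->next)

/* Bucket 0 holds keys 1 and 2, bucket 2 holds key 3, the rest are empty. */
static struct node n3 = { NULL, 3 };
static struct node n2 = { NULL, 2 };
static struct node n1 = { &n2, 1 };
static struct node *demo_table[NBUCKETS] = { &n1, NULL, &n3, NULL };

/* Count the entries visited before 'stop_key' is found. */
static int visited_until(struct node **table, int stop_key)
{
	struct node *pos;
	int bkt, visited = 0;

	table_for_each(table, bkt, pos) {
		if (pos->key == stop_key)
			break;	/* pos is non-NULL here, so the outer loop stops too */
		visited++;
	}
	return visited;
}

static void check_break_semantics(void)
{
	/* Breaking in bucket 0 must not continue into bucket 2. */
	assert(visited_until(demo_table, 2) == 1);
	/* Without a match, empty buckets are skipped and all 3 are visited. */
	assert(visited_until(demo_table, 99) == 3);
}
```

One side effect worth noting: after a 'break', the outer increment still
runs once before the condition is tested, so the bucket index ends up one
past the bucket where the break happened.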


Thanks,
Sasha


* [PATCH net-next] ipv6: fix handling of throw routes
From: Nicolas Dichtel @ 2012-09-06 15:53 UTC
  To: davem; +Cc: netdev, markus.stenberg, eric.dumazet, Nicolas Dichtel
In-Reply-To: <5048A374.60005@6wind.com>

This is the same problem as the one addressed by the previous fix for
blackhole and prohibit routes.

When adding a throw route, it was handled like a classic route.
Moreover, it was only possible to add this kind of route by specifying
an interface.

Before the patch:
  $ ip route add throw 2001::2/128
  RTNETLINK answers: No such device
  $ ip route add throw 2001::2/128 dev eth0
  $ ip -6 route | grep 2001::2
  2001::2 dev eth0  metric 1024

After:
  $ ip route add throw 2001::2/128
  $ ip -6 route | grep 2001::2
  throw 2001::2 dev lo  metric 1024  error -11

Reported-by: Markus Stenberg <markus.stenberg@iki.fi>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
 net/ipv6/route.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index fa26444..339d921 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1471,6 +1471,9 @@ int ip6_route_add(struct fib6_config *cfg)
 		case RTN_PROHIBIT:
 			rt->dst.error = -EACCES;
 			break;
+		case RTN_THROW:
+			rt->dst.error = -EAGAIN;
+			break;
 		default:
 			rt->dst.error = -ENETUNREACH;
 			break;
@@ -2275,7 +2278,8 @@ static int rtm_to_fib6_config(struct sk_buff *skb, struct nlmsghdr *nlh,
 
 	if (rtm->rtm_type == RTN_UNREACHABLE ||
 	    rtm->rtm_type == RTN_BLACKHOLE ||
-	    rtm->rtm_type == RTN_PROHIBIT)
+	    rtm->rtm_type == RTN_PROHIBIT ||
+	    rtm->rtm_type == RTN_THROW)
 		cfg->fc_flags |= RTF_REJECT;
 
 	if (rtm->rtm_type == RTN_LOCAL)
@@ -2412,6 +2416,9 @@ static int rt6_fill_node(struct net *net,
 		case -EACCES:
 			rtm->rtm_type = RTN_PROHIBIT;
 			break;
+		case -EAGAIN:
+			rtm->rtm_type = RTN_THROW;
+			break;
 		default:
 			rtm->rtm_type = RTN_UNREACHABLE;
 			break;
-- 
1.7.12


* Re: Increased multicast packet drops in 3.4
From: Eric Dumazet @ 2012-09-06 13:31 UTC
  To: Shawn Bohrer; +Cc: netdev
In-Reply-To: <1346937667.2484.33.camel@edumazet-glaptop>

On Thu, 2012-09-06 at 15:21 +0200, Eric Dumazet wrote:

> 
> Are you receiving fragmented UDP frames ?
> 
> I ask this because with latest kernels (linux-3.5), we should no longer
> build a list of skb, but a single skb with page fragments.
> 
> commit 3cc4949269e01f39443d0fcfffb5bc6b47878d45
> Author: Eric Dumazet <edumazet@google.com>
> Date:   Sat May 19 03:02:20 2012 +0000
> 
>     ipv4: use skb coalescing in defragmentation
>     
>     ip_frag_reasm() can use skb_try_coalesce() to build optimized skb,
>     reducing memory used by them (truesize), and reducing number of cache
>     line misses and overhead for the consumer.
>     
>     Signed-off-by: Eric Dumazet <edumazet@google.com>
>     Cc: Alexander Duyck <alexander.h.duyck@intel.com>
>     Signed-off-by: David S. Miller <davem@davemloft.net>
> 
> 

Unfortunately mlx4 pulls too many bytes from the frame to skb->head, so
it defeats coalescing completely.

Try the following patch (if you also try linux-3.5)

diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
index 9d27e42..700e70e 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
@@ -150,7 +150,7 @@ enum {
 #define ETH_LLC_SNAP_SIZE	8
 
 #define SMALL_PACKET_SIZE      (256 - NET_IP_ALIGN)
-#define HEADER_COPY_SIZE       (128 - NET_IP_ALIGN)
+#define HEADER_COPY_SIZE       ETH_HLEN
 #define MLX4_LOOPBACK_TEST_PAYLOAD (HEADER_COPY_SIZE - ETH_HLEN)
 
 #define MLX4_EN_MIN_MTU		46


* Re: IPv6 routing type - not at par with IPv4 one?
From: Nicolas Dichtel @ 2012-09-06 13:21 UTC
  To: Markus; +Cc: Eric Dumazet, Markus Stenberg, netdev
In-Reply-To: <7F0FC925-D0A4-4E27-AE8A-F2E0A619FB36@iki.fi>

On 06/09/2012 12:54, Markus wrote:
> On 6.9.2012, at 13.35, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> Well, it  seems you missed this :
>>
>> http://git.kernel.org/?p=linux/kernel/git/davem/net-next.git;a=commitdiff;h=ef2c7d7b59708d54213c7556a82d14de9a7e4475
>>
>> At least the blackhole is now supported on IPv6, so you probably have to
>> add the 'throw' bit, if it makes any sense.
>
>
> Ah, cool, teaches me to refresh my git trees before posting ;-)
> Nicolas, want to update for that too or will I?
Ok, I will send another patch for this.


Thank you,
Nicolas


* Re: Increased multicast packet drops in 3.4
From: Eric Dumazet @ 2012-09-06 13:21 UTC
  To: Shawn Bohrer; +Cc: netdev
In-Reply-To: <20120906130316.GA2310@BohrerMBP.gateway.2wire.net>

On Thu, 2012-09-06 at 08:03 -0500, Shawn Bohrer wrote:
> On Thu, Sep 06, 2012 at 08:22:40AM +0200, Eric Dumazet wrote:
> > On Wed, 2012-09-05 at 19:11 -0500, Shawn Bohrer wrote:
> > > I've been testing the 3.4 kernel compared to the 3.1 kernel and
> > > noticed my application is experiencing a noticeable increase in packet
> > > drops compared to 3.1.  In this case I have 8 processes all listening
> > > on the same multicast group and occasionally 1 or more of the
> > > processes will report drops based on gaps in the sequence numbers on
> > > the packets.  One thing I find interesting is that some of the time 2
> > > or 3 of the 8 processes will report that they missed the exact same
> > > 50+ packets.  Since the other processes receive the packets I know
> > > that they are making it to the machine and past the driver.
> > > 
> > > So far I have not been able to _see_ any OS counters increase when the
> > > drops occur but perhaps there is a location that I have not yet
> > > looked.  I've been looking for drops in /proc/net/udp /proc/net/snmp
> > > and /proc/net/dev.
> > > 
> > > I've tried using dropwatch/drop_monitor but it is awfully noisy even
> > > after back porting many of the patches Eric Dumazet has contributed to
> > > silence the false positives.  Similarly I setup trace-cmd/ftrace to
> > > record skb:kfree_skb calls with a stacktrace and had my application
> > > stop the trace when a drop was reported.  From these traces I see a
> > > number of the following:
> > > 
> > >     md_connector-12791 [014]  7952.982818: kfree_skb:            skbaddr=0xffff880583bd7500 protocol=2048 location=0xffffffff813c930b
> > >     md_connector-12791 [014]  7952.982821: kernel_stack:         <stack trace>
> > > => skb_release_data (ffffffff813c930b)
> > > => __kfree_skb (ffffffff813c934e)
> > > => skb_free_datagram_locked (ffffffff813ccca8)
> > > => udp_recvmsg (ffffffff8143335c)
> > > => inet_recvmsg (ffffffff8143cbfb)
> > > => sock_recvmsg_nosec (ffffffff813be80f)
> > > => __sys_recvmsg (ffffffff813bfe70)
> > > => __sys_recvmmsg (ffffffff813c2392)
> > > => sys_recvmmsg (ffffffff813c25b0)
> > > => system_call_fastpath (ffffffff8148cfd2)
> > > 
> > > Looking at the code it does look like these could be the drops, since
> > > I do not see any counters incremented in this code path.  However I'm
> > > not very familiar with this code so it could also be a false positive.
> > > It does look like the above stack only gets called if
> > > skb_has_frag_list(skb) is true. Does this imply the packet was over
> > > one MTU (1500)?
> > > 
> > > I'd appreciate any input on possible causes/solutions for these drops.
> > > Or ways that I can further debug this issue to find the root cause of
> > > the increase in drops on 3.4.
> > > 
> > > Thanks,
> > > Shawn
> > > 
> > 
> > What NIC driver are you using ?
>  
> $ sudo ethtool -i eth4
> driver: mlx4_en
> version: 2.0 (Dec 2011)
> firmware-version: 2.10.700
> bus-info: 0000:05:00.0
> supports-statistics: yes
> supports-test: yes
> supports-eeprom-access: no
> supports-register-dump: no
> 
> This is the in tree driver from 3.4.9
> 
> [ sbohrer@berbox12:/home/sbohrer ]
> $ /sbin/lspci | grep -i mell
> 05:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]
> 
> > Could you trace if skb_copy_and_csum_datagram_iovec() or
> > skb_copy_datagram_iovec() returns an error (it could be EFAULT by
> > example) ?
> > 
> > If so, you could add some debugging to these functions to track what
> > exact error it is
> > 
> > It seems following patch is needed anyway :
> > 
> > diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> > index 6f6d1ac..2c965c9 100644
> > --- a/net/ipv4/udp.c
> > +++ b/net/ipv4/udp.c
> > @@ -1226,6 +1226,8 @@ try_again:
> >  
> >  	if (unlikely(err)) {
> >  		trace_kfree_skb(skb, udp_recvmsg);
> > +		if (!peeked)
> > +			UDP_INC_STATS_USER(sock_net(sk), UDP_MIB_INERRORS, is_udplite);
> >  		goto out_free;
> >  	}
> 
> Sorry, I should have mentioned that it doesn't appear I'm hitting that
> tracepoint.  That tracepoint would have a location=udp_recvmsg and I
> believe the stack trace would also start at udp_recvmsg.  I didn't see
> any of these in the traces I captured.
> 
> I think the one I'm hitting is the following with some of my own extra
> annotation:
> 
> => kfree_skb()
> => skb_drop_list()
> => skb_drop_fraglist()
> > > => skb_release_data (ffffffff813c930b)
> => skb_release_all()
> > > => __kfree_skb (ffffffff813c934e)
> > > => skb_free_datagram_locked (ffffffff813ccca8)
> > > => udp_recvmsg (ffffffff8143335c)
> > > => inet_recvmsg (ffffffff8143cbfb)
> > > => sock_recvmsg_nosec (ffffffff813be80f)
> > > => __sys_recvmsg (ffffffff813bfe70)
> > > => __sys_recvmmsg (ffffffff813c2392)
> > > => sys_recvmmsg (ffffffff813c25b0)
> > > => system_call_fastpath (ffffffff8148cfd2)
> 
> kfree_skb() has the trace_kfree_skb() call on net/core/skbuff.c:3283
> 
> I can of course still try your patch and double check that I'm not
> hitting that one.


kfree_skb() can free a list of skbs, and we use a generic function to do
so, without forwarding the drop/not-drop status. So it's unfortunate, but
adding extra parameters just for the sake of drop_monitor is not worth
it.  skb_drop_fraglist() doesn't know if the parent skb is dropped or
only freed, so it always calls kfree_skb() instead of choosing between
consume_skb() and kfree_skb().
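
To illustrate why the tracepoint fires even for a normal receive, here is
a toy user-space model. It is only a sketch, not kernel code: struct buf,
buf_consume(), buf_free_traced(), buf_release() and drop_events are all
hypothetical stand-ins for the skb helpers named above.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for an skb: a buffer with an optional chain of fragments. */
struct buf {
	struct buf *next;	/* next fragment in the chain */
	struct buf *frag_list;	/* head of this buffer's fragment chain */
};

static int drop_events;		/* simulated kfree_skb tracepoint hits */

static void buf_release(struct buf *b);

/* "Drop" path: always reports an event, like a traced kfree_skb(). */
static void buf_free_traced(struct buf *b)
{
	drop_events++;
	buf_release(b);
}

/* "Consume" path: normal end of life, reports nothing for the parent. */
static void buf_consume(struct buf *b)
{
	buf_release(b);
}

/*
 * Generic teardown. It is not told whether the parent was dropped or
 * consumed, so every fragment goes through the traced path.
 */
static void buf_release(struct buf *b)
{
	struct buf *f;

	assert(b != NULL);
	f = b->frag_list;
	b->frag_list = NULL;
	while (f) {
		struct buf *next = f->next;

		f->next = NULL;
		buf_free_traced(f);
		f = next;
	}
}
```

In this model, consuming a parent that carries three fragments still
records three "drop" events, which is the kind of false positive a
drop monitor would then report.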

Are you receiving fragmented UDP frames ?

I ask this because with latest kernels (linux-3.5), we should no longer
build a list of skb, but a single skb with page fragments.

commit 3cc4949269e01f39443d0fcfffb5bc6b47878d45
Author: Eric Dumazet <edumazet@google.com>
Date:   Sat May 19 03:02:20 2012 +0000

    ipv4: use skb coalescing in defragmentation
    
    ip_frag_reasm() can use skb_try_coalesce() to build optimized skb,
    reducing memory used by them (truesize), and reducing number of cache
    line misses and overhead for the consumer.
    
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Cc: Alexander Duyck <alexander.h.duyck@intel.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

^ permalink raw reply

* Re: usbnet: fix oops in usbnet_start_xmit
From: Oliver Neukum @ 2012-09-06 13:10 UTC (permalink / raw)
  To: Dan Carpenter
  Cc: khlebnikov-GEFAQzZX7r8dnm+yROfE0A, netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-usb-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20120906125230.GI19410@mwanda>

On Thursday 06 September 2012 05:52:30 Dan Carpenter wrote:
> I sent this email a year ago when the patch was committed but I
> never received a response.

I probably should have checked then.

> regards,
> dan carpenter
> 
> On Wed, Nov 09, 2011 at 10:34:59AM +0300, Dan Carpenter wrote:
> > Hello Konstantin Khlebnikov,
> > 
> > This is a semi-automatic email about new static checker warnings.
> > 
> > The patch 23ba07991dad: "usbnet: fix oops in usbnet_start_xmit" from 
> > Nov 7, 2011, leads to the following Smatch complaint:
> > 
> > drivers/net/usb/usbnet.c +1077 usbnet_start_xmit()
> >        error: we previously assumed 'skb' could be null (see line 1060)
> > 
> > drivers/net/usb/usbnet.c
> >   1059        
> >   1060                if (skb)
> >                     ^^^
> > check introduced here.
> > 
> >   1061                        skb_tx_timestamp(skb);
> >   1062        
> >   1063                // some devices want funky USB-level framing, for
> >   1064                // win32 driver (usually) and/or hardware quirks
> >   1065                if (info->tx_fixup) {
> >   1066                        skb = info->tx_fixup (dev, skb, GFP_ATOMIC);

It turns out that skb == NULL implies info->tx_fixup != NULL
and skb will be reassigned.
This is very dirty.
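
One way to make that implicit invariant explicit is to check the pointer
after the fixup step instead of relying on the caller's contract. The
following is a minimal user-space sketch under that assumption; struct pkt,
struct dev_info, xmit_len() and the two example fixups are hypothetical and
are not the usbnet code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct sk_buff and struct driver_info. */
struct pkt {
	size_t len;
};

struct dev_info {
	struct pkt *(*tx_fixup)(struct pkt *p);	/* may be NULL */
};

/* Example fixup that swallows the packet, like a driver collecting data. */
static struct pkt *fixup_collect(struct pkt *p)
{
	(void)p;
	return NULL;
}

/* Example fixup that emits a pending packet when called with NULL. */
static struct pkt *fixup_flush(struct pkt *p)
{
	static struct pkt pending = { 1500 };

	return p ? p : &pending;
}

/* Returns the length to transmit, or -1 when there is nothing to send. */
static long xmit_len(const struct dev_info *info, struct pkt *p)
{
	assert(info != NULL);

	if (info->tx_fixup)
		p = info->tx_fixup(p);

	/* Explicit guard: covers both "no fixup, NULL input" and
	 * "fixup returned NULL". */
	if (!p)
		return -1;

	return (long)p->len;
}
```

With the guard in place the dereference no longer depends on the rule that
a NULL packet is only ever passed when a fixup hook exists.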

	Regards
		Oliver


--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [PATCH] iproute2: Fix various manpage formatting nits
From: Andreas Schwab @ 2012-09-06 13:09 UTC (permalink / raw)
  To: netdev

Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
---
 man/man8/ip-route.8.in | 10 +++++-----
 man/man8/ip.8          |  4 ++--
 man/man8/ss.8          |  2 +-
 man/man8/tc-choke.8    |  8 ++++----
 man/man8/tc-drr.8      |  1 +
 man/man8/tc-ematch.8   | 25 +++++++++++--------------
 6 files changed, 24 insertions(+), 26 deletions(-)

diff --git a/man/man8/ip-route.8.in b/man/man8/ip-route.8.in
index 0ca6107..f06fcba 100644
--- a/man/man8/ip-route.8.in
+++ b/man/man8/ip-route.8.in
@@ -202,11 +202,10 @@ error.
 are considered to be dummy (or external) addresses which require translation
 to real (or internal) ones before forwarding.  The addresses to translate to
 are selected with the attribute
+.BR "via" .
 .B Warning:
 Route NAT is no longer supported in Linux 2.6.
 
-
-.BR "via" .
 .sp
 .B anycast
 .RI "- " "not implemented"
@@ -306,7 +305,7 @@ If this parameter is omitted,
 assumes the
 .B main
 table, with the exception of
-.BR local " , " broadcast " and " nat
+.BR local ", " broadcast " and " nat
 routes, which are put into the
 .B local
 table by default.
@@ -560,13 +559,14 @@ i.e. it lists the entire table.
 
 .TP
 .BI tos " TOS"
+.TP
 .BI dsfield " TOS"
 only select routes with the given TOS.
 
 .TP
 .BI table " TABLEID"
-show the routes from this table(s).  The default setting is to show
-.BR table main "."
+show the routes from this table(s).  The default setting is to show table
+.BR main "."
 .I TABLEID
 may either be the ID of a real table or one of the special values:
 .sp
diff --git a/man/man8/ip.8 b/man/man8/ip.8
index 9ba3621..ac78c29 100644
--- a/man/man8/ip.8
+++ b/man/man8/ip.8
@@ -85,11 +85,11 @@ shortcut for
 .BR "\-o" , " \-oneline"
 output each record on a single line, replacing line feeds
 with the
-.B '\e\'
+.B '\e'
 character. This is convenient when you want to count records
 with
 .BR wc (1)
- or to
+or to
 .BR grep (1)
 the output.
 
diff --git a/man/man8/ss.8 b/man/man8/ss.8
index 0b9a8c4..f03c6d8 100644
--- a/man/man8/ss.8
+++ b/man/man8/ss.8
@@ -14,7 +14,7 @@ It can display more TCP and state informations than other tools.
 .SH OPTIONS
 When no option is used ss displays a list of 
 open non-listening TCP sockets that have established connection.
-.TP
+.P
 These programs follow the usual GNU command line syntax, with long
 options starting with two dashes (`-').
 A summary of options is included below.
diff --git a/man/man8/tc-choke.8 b/man/man8/tc-choke.8
index 620c7f6..9d1081f 100644
--- a/man/man8/tc-choke.8
+++ b/man/man8/tc-choke.8
@@ -30,11 +30,11 @@ queue.  If both the to-be-queued and the drawn packet belong to the same flow,
 both packets are dropped.  Otherwise, if the queue length is still below the maximum length,
 the new packet has a configurable chance of being marked (which may mean dropped).
 If the queue length exceeds
-.B max
-, the new packet will always be marked (or dropped).
+.BR max ,
+the new packet will always be marked (or dropped).
 If the queue length exceeds
-.B limit
-, the new packet is always dropped.
+.BR limit ,
+the new packet is always dropped.
 
 The marking probability computation is the same as used by the RED qdisc.
 
diff --git a/man/man8/tc-drr.8 b/man/man8/tc-drr.8
index e25d6dd..29daed8 100644
--- a/man/man8/tc-drr.8
+++ b/man/man8/tc-drr.8
@@ -45,6 +45,7 @@ To attach to device eth0, using the interface MTU as its quantum:
 Adding two classes:
 .P
 # tc class add dev eth0 parent 1: classid 1:1 drr
+.br
 # tc class add dev eth0 parent 1: classid 1:2 drr
 .P
 You also need to add at least one filter to classify packets.
diff --git a/man/man8/tc-ematch.8 b/man/man8/tc-ematch.8
index 53ae161..2eafc29 100644
--- a/man/man8/tc-ematch.8
+++ b/man/man8/tc-ematch.8
@@ -21,7 +21,7 @@ ematch \- extended matches for use with "basic" or "flow" filters
 ]
 
 .ti -8
-.IR TERM " := [ " not " ] { " MATCH " | '(' " EXPR " ')' } "
+.IR TERM " := [ " \fBnot " ] { " MATCH " | '(' " EXPR " ')' } "
 
 .ti -8
 .IR MATCH " := " module " '(' " ARGS " ')' "
@@ -34,30 +34,27 @@ ematch \- extended matches for use with "basic" or "flow" filters
 .SS cmp
 Simple comparison ematch: arithmetic compare of packet data to a given value.
 .ti
-.IR cmp "( " ALIGN " at " OFFSET " [ " ATTRS " ]  { " eq " | " lt " | " gt "  } " VALUE " )
+.IR cmp "( " ALIGN " at " OFFSET " [ " ATTRS " ] { " eq " | " lt " | " gt " } " VALUE " )
 
 .ti
 .IR ALIGN " := { " u8 " | " u16 " | " u32 " } "
 
 .ti
-.IR ATTRS " := [  layer " LAYER " ] [ mask " MASK " ] [ " trans " ] "
+.IR ATTRS " := [ layer " LAYER " ] [ mask " MASK " ] [ trans ]
 
 .ti
-.IR ALIGN " := { " u8 " | " u16 " | " u32 } "
-
-.ti
-.IR LAYER " := { " link " | " network " | " transport " | " 0..%d " }
+.IR LAYER " := { " link " | " network " | " transport " | " 0..2 " }
 
 .SS meta
 Metadata ematch
 .ti
-.IR meta "( " OBJECT " { " eq " | " lt "  |" gt " } " OBJECT " )
+.IR meta "( " OBJECT " { " eq " | " lt " |" gt " } " OBJECT " )
 
 .ti
 .IR OBJECT " := { " META_ID " |  " VALUE " }
 
 .ti
-.IR META_ID " := id " [ shift " SHIFT " ] [ mask " MASK " ]
+.IR META_ID " := " id " [ shift " SHIFT " ] [ mask " MASK " ]
 
 .TP
 meta attributes:
@@ -91,26 +88,26 @@ match packet data byte sequence
 .IR OFFSET  " := " int
 
 .ti
-.IR LAYER " := { " link " | " network " | " transport " | " 0..%d " }
+.IR LAYER " := { " link " | " network " | " transport " | " 0..2 " }
 
 .SS u32
 u32 ematch
 .ti
-.IR u32 "( " ALIGN VALUE MASK " at " [ nexthdr+ ] " OFFSET " )
+.IR u32 "( " ALIGN " " VALUE " " MASK " at [ nexthdr+ ] " OFFSET " )
 
 .ti
-.IR ALIGN " := " { " u8 " | " u16 " | " u32 " }
+.IR ALIGN " := { " u8 " | " u16 " | " u32 " }
 
 .SS ipset
 test packet agains ipset membership
 .ti
-.IR ipset "( " SETNAME FLAGS )
+.IR ipset "( " SETNAME " " FLAGS " )
 
 .ti
 .IR SETNAME " := " string
 
 .ti
-.IR FLAGS " := " { " FLAG " [, " FLAGS "] }
+.IR FLAGS " := { " FLAG " [, " FLAGS "] }
 
 The flag options are the same as those used by the iptables "set" match.
 
-- 
1.7.12


-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."

^ permalink raw reply related

* Re: Increased multicast packet drops in 3.4
From: Shawn Bohrer @ 2012-09-06 13:03 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev
In-Reply-To: <1346912560.13121.175.camel@edumazet-glaptop>

On Thu, Sep 06, 2012 at 08:22:40AM +0200, Eric Dumazet wrote:
> On Wed, 2012-09-05 at 19:11 -0500, Shawn Bohrer wrote:
> > I've been testing the 3.4 kernel compared to the 3.1 kernel and
> > noticed my application is experiencing a noticeable increase in packet
> > drops compared to 3.1.  In this case I have 8 processes all listening
> > on the same multicast group and occasionally 1 or more of the
> > processes will report drops based on gaps in the sequence numbers on
> > the packets.  One thing I find interesting is that some of the time 2
> > or 3 of the 8 processes will report that they missed the exact same
> > 50+ packets.  Since the other processes receive the packets I know
> > that they are making it to the machine and past the driver.
> > 
> > So far I have not been able to _see_ any OS counters increase when the
> > drops occur but perhaps there is a location that I have not yet
> > looked.  I've been looking for drops in /proc/net/udp /proc/net/snmp
> > and /proc/net/dev.
> > 
> > I've tried using dropwatch/drop_monitor but it is awfully noisy even
> > after back porting many of the patches Eric Dumazet has contributed to
> > silence the false positives.  Similarly I setup trace-cmd/ftrace to
> > record skb:kfree_skb calls with a stacktrace and had my application
> > stop the trace when a drop was reported.  From these traces I see a
> > number of the following:
> > 
> >     md_connector-12791 [014]  7952.982818: kfree_skb:            skbaddr=0xffff880583bd7500 protocol=2048 location=0xffffffff813c930b
> >     md_connector-12791 [014]  7952.982821: kernel_stack:         <stack trace>
> > => skb_release_data (ffffffff813c930b)
> > => __kfree_skb (ffffffff813c934e)
> > => skb_free_datagram_locked (ffffffff813ccca8)
> > => udp_recvmsg (ffffffff8143335c)
> > => inet_recvmsg (ffffffff8143cbfb)
> > => sock_recvmsg_nosec (ffffffff813be80f)
> > => __sys_recvmsg (ffffffff813bfe70)
> > => __sys_recvmmsg (ffffffff813c2392)
> > => sys_recvmmsg (ffffffff813c25b0)
> > => system_call_fastpath (ffffffff8148cfd2)
> > 
> > Looking at the code it does look like these could be the drops, since
> > I do not see any counters incremented in this code path.  However I'm
> > not very familiar with this code so it could also be a false positive.
> > It does look like the above stack only gets called if
> > skb_has_frag_list(skb) does this imply the packet was over one MTU
> > (1500)?
> > 
> > I'd appreciate any input on possible causes/solutions for these drops.
> > Or ways that I can further debug this issue to find the root cause of
> > the increase in drops on 3.4.
> > 
> > Thanks,
> > Shawn
> > 
> 
> What NIC driver are you using ?
 
$ sudo ethtool -i eth4
driver: mlx4_en
version: 2.0 (Dec 2011)
firmware-version: 2.10.700
bus-info: 0000:05:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: no

This is the in-tree driver from 3.4.9.

[ sbohrer@berbox12:/home/sbohrer ]
$ /sbin/lspci | grep -i mell
05:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]

> Could you trace if skb_copy_and_csum_datagram_iovec() or
> skb_copy_datagram_iovec() returns an error (it could be EFAULT by
> example) ?
> 
> If so, you could add some debugging to these functions to track what
> exact error it is
> 
> It seems following patch is needed anyway :
> 
> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> index 6f6d1ac..2c965c9 100644
> --- a/net/ipv4/udp.c
> +++ b/net/ipv4/udp.c
> @@ -1226,6 +1226,8 @@ try_again:
>  
>  	if (unlikely(err)) {
>  		trace_kfree_skb(skb, udp_recvmsg);
> +		if (!peeked)
> +			UDP_INC_STATS_USER(sock_net(sk), UDP_MIB_INERRORS, is_udplite);
>  		goto out_free;
>  	}

Sorry, I should have mentioned that it doesn't appear I'm hitting that
tracepoint.  That tracepoint would have a location=udp_recvmsg and I
believe the stack trace would also start at udp_recvmsg.  I didn't see
any of these in the traces I captured.

I think the one I'm hitting is the following with some of my own extra
annotation:

=> kfree_skb()
=> skb_drop_list()
=> skb_drop_fraglist()
> > => skb_release_data (ffffffff813c930b)
=> skb_release_all()
> > => __kfree_skb (ffffffff813c934e)
> > => skb_free_datagram_locked (ffffffff813ccca8)
> > => udp_recvmsg (ffffffff8143335c)
> > => inet_recvmsg (ffffffff8143cbfb)
> > => sock_recvmsg_nosec (ffffffff813be80f)
> > => __sys_recvmsg (ffffffff813bfe70)
> > => __sys_recvmmsg (ffffffff813c2392)
> > => sys_recvmmsg (ffffffff813c25b0)
> > => system_call_fastpath (ffffffff8148cfd2)

kfree_skb() has the trace_kfree_skb() call on net/core/skbuff.c:3283

I can of course still try your patch and double check that I'm not
hitting that one.
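
With the patch applied, that error path should show up as the Udp InErrors
counter in /proc/net/snmp, so comparing the counter before and after a
reported drop is one way to check. Below is a rough sketch of pulling a
single field out of a header/value line pair from that file. The helper
names next_tok() and snmp_field() are made up for illustration, and the
sketch assumes the usual layout of two consecutive "Udp:" lines.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Return the start of the next space-separated token, or NULL at the end. */
static const char *next_tok(const char *s, size_t *len)
{
	s += strspn(s, " \n");
	*len = strcspn(s, " \n");
	return *len ? s : NULL;
}

/*
 * Look up one named counter in a header/value line pair such as
 *   "Udp: InDatagrams NoPorts InErrors ..."
 *   "Udp: 1000 2 7 ..."
 * Returns 0 and stores the value on success, -1 if the name is absent.
 */
static int snmp_field(const char *hdr, const char *val, const char *name,
		      unsigned long long *out)
{
	const char *h = hdr, *v = val;
	size_t hl, vl, nl;

	assert(hdr && val && name && out);
	nl = strlen(name);

	for (;;) {
		h = next_tok(h, &hl);
		v = next_tok(v, &vl);
		if (!h || !v)
			return -1;
		if (hl == nl && strncmp(h, name, hl) == 0) {
			*out = strtoull(v, NULL, 10);
			return 0;
		}
		h += hl;
		v += vl;
	}
}
```

A caller would read /proc/net/snmp with fgets(), keep the two lines that
start with "Udp:", and pass them in along with "InErrors" or
"RcvbufErrors".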

Thanks,
Shawn


^ permalink raw reply

* Re: usbnet: fix oops in usbnet_start_xmit
From: Dan Carpenter @ 2012-09-06 12:52 UTC (permalink / raw)
  To: khlebnikov-GEFAQzZX7r8dnm+yROfE0A
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, linux-usb-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20111109073459.GA14409-mgFCXtclrQlZLf2FXnZxJA@public.gmane.org>

I sent this email a year ago when the patch was committed but I
never received a response.

regards,
dan carpenter

On Wed, Nov 09, 2011 at 10:34:59AM +0300, Dan Carpenter wrote:
> Hello Konstantin Khlebnikov,
> 
> This is a semi-automatic email about new static checker warnings.
> 
> The patch 23ba07991dad: "usbnet: fix oops in usbnet_start_xmit" from 
> Nov 7, 2011, leads to the following Smatch complaint:
> 
> drivers/net/usb/usbnet.c +1077 usbnet_start_xmit()
> 	 error: we previously assumed 'skb' could be null (see line 1060)
> 
> drivers/net/usb/usbnet.c
>   1059	
>   1060		if (skb)
>                     ^^^
> check introduced here.
> 
>   1061			skb_tx_timestamp(skb);
>   1062	
>   1063		// some devices want funky USB-level framing, for
>   1064		// win32 driver (usually) and/or hardware quirks
>   1065		if (info->tx_fixup) {
>   1066			skb = info->tx_fixup (dev, skb, GFP_ATOMIC);
>   1067			if (!skb) {
>   1068				if (netif_msg_tx_err(dev)) {
>   1069					netif_dbg(dev, tx_err, dev->net, "can't tx_fixup skb\n");
>   1070					goto drop;
>   1071				} else {
>   1072					/* cdc_ncm collected packet; waits for more */
>   1073					goto not_drop;
>   1074				}
>   1075			}
>   1076		}
>   1077		length = skb->len;
>                          ^^^^^^^^
> dereference without checking.
> 
>   1078	
>   1079		if (!(urb = usb_alloc_urb (0, GFP_ATOMIC))) {
> 
> regards,
> dan carpenter
> 

^ permalink raw reply

* Re: [PATCH] netfilter: take care of timewait sockets
From: Pablo Neira Ayuso @ 2012-09-06 12:34 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Patrick McHardy, netfilter-devel, netdev
In-Reply-To: <1346867542.13121.160.camel@edumazet-glaptop>

On Wed, Sep 05, 2012 at 07:52:22PM +0200, Eric Dumazet wrote:
> On Wed, 2012-09-05 at 19:10 +0200, Pablo Neira Ayuso wrote:
> > On Tue, Sep 04, 2012 at 07:49:03PM +0200, Eric Dumazet wrote:
> > > From: Eric Dumazet <edumazet@google.com>
> > > 
> > > Sami Farin reported crashes in xt_LOG because it assumes skb->sk is a
> > > full blown socket.
> > > 
> > > But with TCP early demux, we can have skb->sk pointing to a timewait
> > > socket.
> > 
> > TCP early demux is there since 3.6-rc.
> > 
> > I'll add that to the changelog if you don't mind, to help tracking
> > things for -stable.
> 
> Sure, its not a stable candidate.

Done, applied. Thanks a lot, Eric!

^ permalink raw reply

* [PATCH 2/2] netlink: remove module parameter from netlink_kernel_create
From: pablo @ 2012-09-06 12:31 UTC (permalink / raw)
  To: netdev; +Cc: davem
In-Reply-To: <1346934712-3056-1-git-send-email-pablo@netfilter.org>

From: Pablo Neira Ayuso <pablo@netfilter.org>

This patch defines netlink_kernel_create as a wrapper function of
__netlink_kernel_create to hide the struct module *me parameter
(which seems to be THIS_MODULE in all existing netlink subsystems).

Suggested by David S. Miller.
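
The wrapper has to be an inline function in the header so that the
owner is resolved in the caller's compilation unit rather than in
af_netlink.c. A user-space sketch of the same pattern follows; it is an
illustration only, and struct owner, OWNER_NAME, THIS_OWNER,
endpoint_create_owned() and endpoint_create() are hypothetical names, not
part of this patch.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for struct module. */
struct owner {
	const char *name;
};

/*
 * Hypothetical stand-in for THIS_MODULE: it is resolved in whichever
 * file includes this header-like block, not in the file that implements
 * endpoint_create_owned().
 */
#ifndef OWNER_NAME
#define OWNER_NAME "demo_module"
#endif
static const struct owner this_owner = { OWNER_NAME };
#define THIS_OWNER (&this_owner)

struct endpoint {
	int unit;
	const struct owner *owner;
};

/* Full-argument constructor, the analogue of __netlink_kernel_create(). */
static struct endpoint endpoint_create_owned(int unit,
					     const struct owner *owner)
{
	struct endpoint ep;

	assert(owner != NULL);
	ep.unit = unit;
	ep.owner = owner;
	return ep;
}

/* Inline wrapper, the analogue of the new netlink_kernel_create(). */
static inline struct endpoint endpoint_create(int unit)
{
	return endpoint_create_owned(unit, THIS_OWNER);
}
```

Callers then pass only the arguments that actually vary, and the owner
cannot be filled in incorrectly.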

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 crypto/crypto_user.c                |    3 +--
 drivers/connector/connector.c       |    3 +--
 drivers/scsi/scsi_netlink.c         |    2 +-
 drivers/scsi/scsi_transport_iscsi.c |    3 +--
 drivers/staging/gdm72xx/netlink_k.c |    2 +-
 include/linux/netlink.h             |   13 ++++++++++---
 kernel/audit.c                      |    3 +--
 lib/kobject_uevent.c                |    3 +--
 net/bridge/netfilter/ebt_ulog.c     |    3 +--
 net/core/rtnetlink.c                |    2 +-
 net/core/sock_diag.c                |    3 +--
 net/decnet/netfilter/dn_rtmsg.c     |    3 +--
 net/ipv4/fib_frontend.c             |    2 +-
 net/ipv4/netfilter/ipt_ULOG.c       |    3 +--
 net/netfilter/nfnetlink.c           |    2 +-
 net/netlink/af_netlink.c            |    8 +++-----
 net/netlink/genetlink.c             |    3 +--
 net/xfrm/xfrm_user.c                |    2 +-
 security/selinux/netlink.c          |    3 +--
 19 files changed, 30 insertions(+), 36 deletions(-)

diff --git a/crypto/crypto_user.c b/crypto/crypto_user.c
index ba2c611..165914e 100644
--- a/crypto/crypto_user.c
+++ b/crypto/crypto_user.c
@@ -500,8 +500,7 @@ static int __init crypto_user_init(void)
 		.input	= crypto_netlink_rcv,
 	};
 
-	crypto_nlsk = netlink_kernel_create(&init_net, NETLINK_CRYPTO,
-					    THIS_MODULE, &cfg);
+	crypto_nlsk = netlink_kernel_create(&init_net, NETLINK_CRYPTO, &cfg);
 	if (!crypto_nlsk)
 		return -ENOMEM;
 
diff --git a/drivers/connector/connector.c b/drivers/connector/connector.c
index 82fa4f0..965b781 100644
--- a/drivers/connector/connector.c
+++ b/drivers/connector/connector.c
@@ -264,8 +264,7 @@ static int __devinit cn_init(void)
 		.input	= dev->input,
 	};
 
-	dev->nls = netlink_kernel_create(&init_net, NETLINK_CONNECTOR,
-					 THIS_MODULE, &cfg);
+	dev->nls = netlink_kernel_create(&init_net, NETLINK_CONNECTOR, &cfg);
 	if (!dev->nls)
 		return -EIO;
 
diff --git a/drivers/scsi/scsi_netlink.c b/drivers/scsi/scsi_netlink.c
index 8818dd6..3252bc9 100644
--- a/drivers/scsi/scsi_netlink.c
+++ b/drivers/scsi/scsi_netlink.c
@@ -501,7 +501,7 @@ scsi_netlink_init(void)
 	}
 
 	scsi_nl_sock = netlink_kernel_create(&init_net, NETLINK_SCSITRANSPORT,
-					     THIS_MODULE, &cfg);
+					     &cfg);
 	if (!scsi_nl_sock) {
 		printk(KERN_ERR "%s: register of receive handler failed\n",
 				__func__);
diff --git a/drivers/scsi/scsi_transport_iscsi.c b/drivers/scsi/scsi_transport_iscsi.c
index fa1dfaa..519bd53 100644
--- a/drivers/scsi/scsi_transport_iscsi.c
+++ b/drivers/scsi/scsi_transport_iscsi.c
@@ -2969,8 +2969,7 @@ static __init int iscsi_transport_init(void)
 	if (err)
 		goto unregister_conn_class;
 
-	nls = netlink_kernel_create(&init_net, NETLINK_ISCSI,
-				    THIS_MODULE, &cfg);
+	nls = netlink_kernel_create(&init_net, NETLINK_ISCSI, &cfg);
 	if (!nls) {
 		err = -ENOBUFS;
 		goto unregister_session_class;
diff --git a/drivers/staging/gdm72xx/netlink_k.c b/drivers/staging/gdm72xx/netlink_k.c
index 3abb31d..2109cab 100644
--- a/drivers/staging/gdm72xx/netlink_k.c
+++ b/drivers/staging/gdm72xx/netlink_k.c
@@ -95,7 +95,7 @@ struct sock *netlink_init(int unit, void (*cb)(struct net_device *dev, u16 type,
 	init_MUTEX(&netlink_mutex);
 #endif
 
-	sock = netlink_kernel_create(&init_net, unit, THIS_MODULE, &cfg);
+	sock = netlink_kernel_create(&init_net, unit, &cfg);
 
 	if (sock)
 		rcv_cb = cb;
diff --git a/include/linux/netlink.h b/include/linux/netlink.h
index d30ee743..628e799 100644
--- a/include/linux/netlink.h
+++ b/include/linux/netlink.h
@@ -153,6 +153,7 @@ struct nlattr {
 
 #include <linux/capability.h>
 #include <linux/skbuff.h>
+#include <linux/module.h>
 
 struct net;
 
@@ -187,9 +188,15 @@ struct netlink_kernel_cfg {
 	unsigned int	flags;
 };
 
-extern struct sock *netlink_kernel_create(struct net *net, int unit,
-					  struct module *module,
-					  struct netlink_kernel_cfg *cfg);
+extern struct sock *__netlink_kernel_create(struct net *net, int unit,
+					    struct module *module,
+					    struct netlink_kernel_cfg *cfg);
+static inline struct sock *
+netlink_kernel_create(struct net *net, int unit, struct netlink_kernel_cfg *cfg)
+{
+	return __netlink_kernel_create(net, unit, THIS_MODULE, cfg);
+}
+
 extern void netlink_kernel_release(struct sock *sk);
 extern int __netlink_change_ngroups(struct sock *sk, unsigned int groups);
 extern int netlink_change_ngroups(struct sock *sk, unsigned int groups);
diff --git a/kernel/audit.c b/kernel/audit.c
index ea3b7b6..a24aafa 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -971,8 +971,7 @@ static int __init audit_init(void)
 
 	printk(KERN_INFO "audit: initializing netlink socket (%s)\n",
 	       audit_default ? "enabled" : "disabled");
-	audit_sock = netlink_kernel_create(&init_net, NETLINK_AUDIT,
-					   THIS_MODULE, &cfg);
+	audit_sock = netlink_kernel_create(&init_net, NETLINK_AUDIT, &cfg);
 	if (!audit_sock)
 		audit_panic("cannot initialize netlink socket");
 	else
diff --git a/lib/kobject_uevent.c b/lib/kobject_uevent.c
index c2e9778..52e5abb 100644
--- a/lib/kobject_uevent.c
+++ b/lib/kobject_uevent.c
@@ -382,8 +382,7 @@ static int uevent_net_init(struct net *net)
 	if (!ue_sk)
 		return -ENOMEM;
 
-	ue_sk->sk = netlink_kernel_create(net, NETLINK_KOBJECT_UEVENT,
-					  THIS_MODULE, &cfg);
+	ue_sk->sk = netlink_kernel_create(net, NETLINK_KOBJECT_UEVENT, &cfg);
 	if (!ue_sk->sk) {
 		printk(KERN_ERR
 		       "kobject_uevent: unable to create netlink socket!\n");
diff --git a/net/bridge/netfilter/ebt_ulog.c b/net/bridge/netfilter/ebt_ulog.c
index 1906347..3476ec4 100644
--- a/net/bridge/netfilter/ebt_ulog.c
+++ b/net/bridge/netfilter/ebt_ulog.c
@@ -298,8 +298,7 @@ static int __init ebt_ulog_init(void)
 		spin_lock_init(&ulog_buffers[i].lock);
 	}
 
-	ebtulognl = netlink_kernel_create(&init_net, NETLINK_NFLOG,
-					  THIS_MODULE, &cfg);
+	ebtulognl = netlink_kernel_create(&init_net, NETLINK_NFLOG, &cfg);
 	if (!ebtulognl)
 		ret = -ENOMEM;
 	else if ((ret = xt_register_target(&ebt_ulog_tg_reg)) != 0)
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index a71806e..508c5df 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -2384,7 +2384,7 @@ static int __net_init rtnetlink_net_init(struct net *net)
 		.flags		= NL_CFG_F_NONROOT_RECV,
 	};
 
-	sk = netlink_kernel_create(net, NETLINK_ROUTE, THIS_MODULE, &cfg);
+	sk = netlink_kernel_create(net, NETLINK_ROUTE, &cfg);
 	if (!sk)
 		return -ENOMEM;
 	net->rtnl = sk;
diff --git a/net/core/sock_diag.c b/net/core/sock_diag.c
index 9d8755e..602cd63 100644
--- a/net/core/sock_diag.c
+++ b/net/core/sock_diag.c
@@ -172,8 +172,7 @@ static int __net_init diag_net_init(struct net *net)
 		.input	= sock_diag_rcv,
 	};
 
-	net->diag_nlsk = netlink_kernel_create(net, NETLINK_SOCK_DIAG,
-					       THIS_MODULE, &cfg);
+	net->diag_nlsk = netlink_kernel_create(net, NETLINK_SOCK_DIAG, &cfg);
 	return net->diag_nlsk == NULL ? -ENOMEM : 0;
 }
 
diff --git a/net/decnet/netfilter/dn_rtmsg.c b/net/decnet/netfilter/dn_rtmsg.c
index 11db0ec..dfe4201 100644
--- a/net/decnet/netfilter/dn_rtmsg.c
+++ b/net/decnet/netfilter/dn_rtmsg.c
@@ -130,8 +130,7 @@ static int __init dn_rtmsg_init(void)
 		.input	= dnrmg_receive_user_skb,
 	};
 
-	dnrmg = netlink_kernel_create(&init_net,
-				      NETLINK_DNRTMSG, THIS_MODULE, &cfg);
+	dnrmg = netlink_kernel_create(&init_net, NETLINK_DNRTMSG, &cfg);
 	if (dnrmg == NULL) {
 		printk(KERN_ERR "dn_rtmsg: Cannot create netlink socket");
 		return -ENOMEM;
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index acdee32..21bf521 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -986,7 +986,7 @@ static int __net_init nl_fib_lookup_init(struct net *net)
 		.input	= nl_fib_input,
 	};
 
-	sk = netlink_kernel_create(net, NETLINK_FIB_LOOKUP, THIS_MODULE, &cfg);
+	sk = netlink_kernel_create(net, NETLINK_FIB_LOOKUP, &cfg);
 	if (sk == NULL)
 		return -EAFNOSUPPORT;
 	net->ipv4.fibnl = sk;
diff --git a/net/ipv4/netfilter/ipt_ULOG.c b/net/ipv4/netfilter/ipt_ULOG.c
index 1109f7f..b5ef3cb 100644
--- a/net/ipv4/netfilter/ipt_ULOG.c
+++ b/net/ipv4/netfilter/ipt_ULOG.c
@@ -396,8 +396,7 @@ static int __init ulog_tg_init(void)
 	for (i = 0; i < ULOG_MAXNLGROUPS; i++)
 		setup_timer(&ulog_buffers[i].timer, ulog_timer, i);
 
-	nflognl = netlink_kernel_create(&init_net, NETLINK_NFLOG,
-					THIS_MODULE, &cfg);
+	nflognl = netlink_kernel_create(&init_net, NETLINK_NFLOG, &cfg);
 	if (!nflognl)
 		return -ENOMEM;
 
diff --git a/net/netfilter/nfnetlink.c b/net/netfilter/nfnetlink.c
index a265033..ffb92c0 100644
--- a/net/netfilter/nfnetlink.c
+++ b/net/netfilter/nfnetlink.c
@@ -241,7 +241,7 @@ static int __net_init nfnetlink_net_init(struct net *net)
 #endif
 	};
 
-	nfnl = netlink_kernel_create(net, NETLINK_NETFILTER, THIS_MODULE, &cfg);
+	nfnl = netlink_kernel_create(net, NETLINK_NETFILTER, &cfg);
 	if (!nfnl)
 		return -ENOMEM;
 	net->nfnl_stash = nfnl;
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 1543a66..93768db 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -1526,9 +1526,8 @@ static void netlink_data_ready(struct sock *sk, int len)
  */
 
 struct sock *
-netlink_kernel_create(struct net *net, int unit,
-		      struct module *module,
-		      struct netlink_kernel_cfg *cfg)
+__netlink_kernel_create(struct net *net, int unit, struct module *module,
+			struct netlink_kernel_cfg *cfg)
 {
 	struct socket *sock;
 	struct sock *sk;
@@ -1603,8 +1602,7 @@ out_sock_release_nosk:
 	sock_release(sock);
 	return NULL;
 }
-EXPORT_SYMBOL(netlink_kernel_create);
-
+EXPORT_SYMBOL(__netlink_kernel_create);
 
 void
 netlink_kernel_release(struct sock *sk)
diff --git a/net/netlink/genetlink.c b/net/netlink/genetlink.c
index c1b71ae..19288b7 100644
--- a/net/netlink/genetlink.c
+++ b/net/netlink/genetlink.c
@@ -922,8 +922,7 @@ static int __net_init genl_pernet_init(struct net *net)
 	};
 
 	/* we'll bump the group number right afterwards */
-	net->genl_sock = netlink_kernel_create(net, NETLINK_GENERIC,
-					       THIS_MODULE, &cfg);
+	net->genl_sock = netlink_kernel_create(net, NETLINK_GENERIC, &cfg);
 
 	if (!net->genl_sock && net_eq(net, &init_net))
 		panic("GENL: Cannot initialize generic netlink\n");
diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index ab58034..354070a 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -2963,7 +2963,7 @@ static int __net_init xfrm_user_net_init(struct net *net)
 		.input	= xfrm_netlink_rcv,
 	};
 
-	nlsk = netlink_kernel_create(net, NETLINK_XFRM, THIS_MODULE, &cfg);
+	nlsk = netlink_kernel_create(net, NETLINK_XFRM, &cfg);
 	if (nlsk == NULL)
 		return -ENOMEM;
 	net->xfrm.nlsk_stash = nlsk; /* Don't set to NULL */
diff --git a/security/selinux/netlink.c b/security/selinux/netlink.c
index 0d2cd11..14d810e 100644
--- a/security/selinux/netlink.c
+++ b/security/selinux/netlink.c
@@ -116,8 +116,7 @@ static int __init selnl_init(void)
 		.flags	= NL_CFG_F_NONROOT_RECV,
 	};
 
-	selnl = netlink_kernel_create(&init_net, NETLINK_SELINUX,
-				      THIS_MODULE, &cfg);
+	selnl = netlink_kernel_create(&init_net, NETLINK_SELINUX, &cfg);
 	if (selnl == NULL)
 		panic("SELinux:  Cannot create netlink socket.");
 	return 0;
-- 
1.7.10.4

^ permalink raw reply related

* [PATCH 1/2] netlink: kill netlink_set_nonroot
From: pablo @ 2012-09-06 12:31 UTC (permalink / raw)
  To: netdev; +Cc: davem

From: Pablo Neira Ayuso <pablo@netfilter.org>

Replace netlink_set_nonroot by one new field `flags' in
struct netlink_kernel_cfg that is passed to netlink_kernel_create.

This patch also renames NL_NONROOT_* to NL_CFG_F_NONROOT_* since
now the flags field in nl_table is generic (so we can add more
flags if needed in the future).

Also adjust all callers in the net-next tree to use these flags
instead of netlink_set_nonroot.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 include/linux/netlink.h    |    9 ++++-----
 lib/kobject_uevent.c       |    2 +-
 net/core/rtnetlink.c       |    2 +-
 net/netlink/af_netlink.c   |   28 +++++++++++++---------------
 net/netlink/genetlink.c    |    3 +--
 security/selinux/netlink.c |    2 +-
 6 files changed, 21 insertions(+), 25 deletions(-)

diff --git a/include/linux/netlink.h b/include/linux/netlink.h
index c9fdde2..d30ee743 100644
--- a/include/linux/netlink.h
+++ b/include/linux/netlink.h
@@ -175,12 +175,16 @@ struct netlink_skb_parms {
 extern void netlink_table_grab(void);
 extern void netlink_table_ungrab(void);
 
+#define NL_CFG_F_NONROOT_RECV	(1 << 0)
+#define NL_CFG_F_NONROOT_SEND	(1 << 1)
+
 /* optional Netlink kernel configuration parameters */
 struct netlink_kernel_cfg {
 	unsigned int	groups;
 	void		(*input)(struct sk_buff *skb);
 	struct mutex	*cb_mutex;
 	void		(*bind)(int group);
+	unsigned int	flags;
 };
 
 extern struct sock *netlink_kernel_create(struct net *net, int unit,
@@ -259,11 +263,6 @@ extern int netlink_dump_start(struct sock *ssk, struct sk_buff *skb,
 			      const struct nlmsghdr *nlh,
 			      struct netlink_dump_control *control);
 
-
-#define NL_NONROOT_RECV 0x1
-#define NL_NONROOT_SEND 0x2
-extern void netlink_set_nonroot(int protocol, unsigned flag);
-
 #endif /* __KERNEL__ */
 
 #endif	/* __LINUX_NETLINK_H */
diff --git a/lib/kobject_uevent.c b/lib/kobject_uevent.c
index 0401d29..c2e9778 100644
--- a/lib/kobject_uevent.c
+++ b/lib/kobject_uevent.c
@@ -375,6 +375,7 @@ static int uevent_net_init(struct net *net)
 	struct uevent_sock *ue_sk;
 	struct netlink_kernel_cfg cfg = {
 		.groups	= 1,
+		.flags	= NL_CFG_F_NONROOT_RECV,
 	};
 
 	ue_sk = kzalloc(sizeof(*ue_sk), GFP_KERNEL);
@@ -422,7 +423,6 @@ static struct pernet_operations uevent_net_ops = {
 
 static int __init kobject_uevent_init(void)
 {
-	netlink_set_nonroot(NETLINK_KOBJECT_UEVENT, NL_NONROOT_RECV);
 	return register_pernet_subsys(&uevent_net_ops);
 }
 
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index c64efcf..a71806e 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -2381,6 +2381,7 @@ static int __net_init rtnetlink_net_init(struct net *net)
 		.groups		= RTNLGRP_MAX,
 		.input		= rtnetlink_rcv,
 		.cb_mutex	= &rtnl_mutex,
+		.flags		= NL_CFG_F_NONROOT_RECV,
 	};
 
 	sk = netlink_kernel_create(net, NETLINK_ROUTE, THIS_MODULE, &cfg);
@@ -2416,7 +2417,6 @@ void __init rtnetlink_init(void)
 	if (register_pernet_subsys(&rtnetlink_net_ops))
 		panic("rtnetlink_init: cannot initialize rtnetlink\n");
 
-	netlink_set_nonroot(NETLINK_ROUTE, NL_NONROOT_RECV);
 	register_netdevice_notifier(&rtnetlink_dev_notifier);
 
 	rtnl_register(PF_UNSPEC, RTM_GETLINK, rtnl_getlink,
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 3821199..1543a66 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -121,7 +121,7 @@ struct netlink_table {
 	struct nl_pid_hash	hash;
 	struct hlist_head	mc_list;
 	struct listeners __rcu	*listeners;
-	unsigned int		nl_nonroot;
+	unsigned int		flags;
 	unsigned int		groups;
 	struct mutex		*cb_mutex;
 	struct module		*module;
@@ -536,6 +536,8 @@ static int netlink_release(struct socket *sock)
 		if (--nl_table[sk->sk_protocol].registered == 0) {
 			kfree(nl_table[sk->sk_protocol].listeners);
 			nl_table[sk->sk_protocol].module = NULL;
+			nl_table[sk->sk_protocol].bind = NULL;
+			nl_table[sk->sk_protocol].flags = 0;
 			nl_table[sk->sk_protocol].registered = 0;
 		}
 	} else if (nlk->subscriptions) {
@@ -596,7 +598,7 @@ retry:
 
 static inline int netlink_capable(const struct socket *sock, unsigned int flag)
 {
-	return (nl_table[sock->sk->sk_protocol].nl_nonroot & flag) ||
+	return (nl_table[sock->sk->sk_protocol].flags & flag) ||
 	       capable(CAP_NET_ADMIN);
 }
 
@@ -659,7 +661,7 @@ static int netlink_bind(struct socket *sock, struct sockaddr *addr,
 
 	/* Only superuser is allowed to listen multicasts */
 	if (nladdr->nl_groups) {
-		if (!netlink_capable(sock, NL_NONROOT_RECV))
+		if (!netlink_capable(sock, NL_CFG_F_NONROOT_RECV))
 			return -EPERM;
 		err = netlink_realloc_groups(sk);
 		if (err)
@@ -721,7 +723,7 @@ static int netlink_connect(struct socket *sock, struct sockaddr *addr,
 		return -EINVAL;
 
 	/* Only superuser is allowed to send multicasts */
-	if (nladdr->nl_groups && !netlink_capable(sock, NL_NONROOT_SEND))
+	if (nladdr->nl_groups && !netlink_capable(sock, NL_CFG_F_NONROOT_SEND))
 		return -EPERM;
 
 	if (!nlk->pid)
@@ -1244,7 +1246,7 @@ static int netlink_setsockopt(struct socket *sock, int level, int optname,
 		break;
 	case NETLINK_ADD_MEMBERSHIP:
 	case NETLINK_DROP_MEMBERSHIP: {
-		if (!netlink_capable(sock, NL_NONROOT_RECV))
+		if (!netlink_capable(sock, NL_CFG_F_NONROOT_RECV))
 			return -EPERM;
 		err = netlink_realloc_groups(sk);
 		if (err)
@@ -1376,7 +1378,7 @@ static int netlink_sendmsg(struct kiocb *kiocb, struct socket *sock,
 		dst_group = ffs(addr->nl_groups);
 		err =  -EPERM;
 		if ((dst_group || dst_pid) &&
-		    !netlink_capable(sock, NL_NONROOT_SEND))
+		    !netlink_capable(sock, NL_CFG_F_NONROOT_SEND))
 			goto out;
 	} else {
 		dst_pid = nlk->dst_pid;
@@ -1580,7 +1582,10 @@ netlink_kernel_create(struct net *net, int unit,
 		rcu_assign_pointer(nl_table[unit].listeners, listeners);
 		nl_table[unit].cb_mutex = cb_mutex;
 		nl_table[unit].module = module;
-		nl_table[unit].bind = cfg ? cfg->bind : NULL;
+		if (cfg) {
+			nl_table[unit].bind = cfg->bind;
+			nl_table[unit].flags = cfg->flags;
+		}
 		nl_table[unit].registered = 1;
 	} else {
 		kfree(listeners);
@@ -1679,13 +1684,6 @@ void netlink_clear_multicast_users(struct sock *ksk, unsigned int group)
 	netlink_table_ungrab();
 }
 
-void netlink_set_nonroot(int protocol, unsigned int flags)
-{
-	if ((unsigned int)protocol < MAX_LINKS)
-		nl_table[protocol].nl_nonroot = flags;
-}
-EXPORT_SYMBOL(netlink_set_nonroot);
-
 struct nlmsghdr *
 __nlmsg_put(struct sk_buff *skb, u32 pid, u32 seq, int type, int len, int flags)
 {
@@ -2150,7 +2148,7 @@ static void __init netlink_add_usersock_entry(void)
 	rcu_assign_pointer(nl_table[NETLINK_USERSOCK].listeners, listeners);
 	nl_table[NETLINK_USERSOCK].module = THIS_MODULE;
 	nl_table[NETLINK_USERSOCK].registered = 1;
-	nl_table[NETLINK_USERSOCK].nl_nonroot = NL_NONROOT_SEND;
+	nl_table[NETLINK_USERSOCK].flags = NL_CFG_F_NONROOT_SEND;
 
 	netlink_table_ungrab();
 }
diff --git a/net/netlink/genetlink.c b/net/netlink/genetlink.c
index fda4974..c1b71ae 100644
--- a/net/netlink/genetlink.c
+++ b/net/netlink/genetlink.c
@@ -918,6 +918,7 @@ static int __net_init genl_pernet_init(struct net *net)
 	struct netlink_kernel_cfg cfg = {
 		.input		= genl_rcv,
 		.cb_mutex	= &genl_mutex,
+		.flags		= NL_CFG_F_NONROOT_RECV,
 	};
 
 	/* we'll bump the group number right afterwards */
@@ -955,8 +956,6 @@ static int __init genl_init(void)
 	if (err < 0)
 		goto problem;
 
-	netlink_set_nonroot(NETLINK_GENERIC, NL_NONROOT_RECV);
-
 	err = register_pernet_subsys(&genl_pernet_ops);
 	if (err)
 		goto problem;
diff --git a/security/selinux/netlink.c b/security/selinux/netlink.c
index 8a77725..0d2cd11 100644
--- a/security/selinux/netlink.c
+++ b/security/selinux/netlink.c
@@ -113,13 +113,13 @@ static int __init selnl_init(void)
 {
 	struct netlink_kernel_cfg cfg = {
 		.groups	= SELNLGRP_MAX,
+		.flags	= NL_CFG_F_NONROOT_RECV,
 	};
 
 	selnl = netlink_kernel_create(&init_net, NETLINK_SELINUX,
 				      THIS_MODULE, &cfg);
 	if (selnl == NULL)
 		panic("SELinux:  Cannot create netlink socket.");
-	netlink_set_nonroot(NETLINK_SELINUX, NL_NONROOT_RECV);
 	return 0;
 }
 
-- 
1.7.10.4


* [PATCHv3] virtio-spec: virtio network device multiqueue support
From: Michael S. Tsirkin @ 2012-09-06 12:08 UTC (permalink / raw)
  To: Jason Wang; +Cc: netdev, kvm, virtualization

Add multiqueue support to virtio network device.
Add a new feature flag VIRTIO_NET_F_MULTIQUEUE for this feature, a new
configuration field max_virtqueue_pairs to detect supported number of
virtqueues as well as a new command VIRTIO_NET_CTRL_STEERING to program
packet steering.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

--

Changes from v2:
Address Jason's comments on v2:
- Changed STEERING_HOST to STEERING_RX_FOLLOWS_TX:
  this is both clearer and easier to support.
  It does not look like we need a separate steering command
  since host can just watch tx packets as they go.
- Moved RX and TX steering sections near each other.
- Add motivation for other changes in v2
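
A rough sketch of how a driver might fill in the steering command
follows. The struct layout is taken from the patch below; the value 1 for
VIRTIO_NET_CTRL_STEERING_RX_FOLLOWS_TX and the demo_steering_cmd helper
are assumptions made for illustration, and byte order on the wire is
ignored here.

```c
#include <assert.h>
#include <stdint.h>

#define VIRTIO_NET_CTRL_STEERING_SINGLE		0
#define VIRTIO_NET_CTRL_STEERING_RX_FOLLOWS_TX	1	/* assumed value */

/* Layout as proposed in the spec patch. */
struct virtio_net_ctrl_steering {
	uint8_t  current_steering_rule;
	uint8_t  reserved;
	uint16_t current_steering_param;
};

/*
 * Hypothetical helper: build a command asking for nr_pairs virtqueue
 * pairs. The spec text steers to the first (param + 1) multiqueue
 * virtqueues, so param is nr_pairs - 1.
 */
static struct virtio_net_ctrl_steering demo_steering_cmd(uint16_t nr_pairs)
{
	struct virtio_net_ctrl_steering cmd = { 0 };

	assert(nr_pairs >= 1);
	cmd.current_steering_rule = VIRTIO_NET_CTRL_STEERING_RX_FOLLOWS_TX;
	cmd.current_steering_param = (uint16_t)(nr_pairs - 1);
	return cmd;
}
```

A driver would still need to check the ack value of the command and fall
back to VIRTIO_NET_CTRL_STEERING_SINGLE if the device rejects the rule.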

Changes from Jason's rfc:
- reserved vq 3: this makes all rx vqs even and tx vqs odd, which
  looks nicer to me.
- documented packet steering, added a generalized steering programming
  command. Current modes are single queue and host driven multiqueue,
  but I envision support for guest driven multiqueue in the future.
- make default vqs unused when in mq mode - this wastes some memory
  but makes it more efficient to switch between modes as
  we can avoid this causing packet reordering.
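
The resulting virtqueue numbering (2N+2 for receiveqN, 2N+3 for
transmitqN, with N starting at 1) can be sketched as below. The demo_*
helper names are hypothetical and exist only for this example.

```c
#include <assert.h>

/* Fixed indices: default receiveq, default transmitq, controlq, reserved. */
enum {
	DEMO_VQ_RX_DEFAULT = 0,
	DEMO_VQ_TX_DEFAULT = 1,
	DEMO_VQ_CTRL       = 2,
	DEMO_VQ_RESERVED   = 3,
};

/* Index of receiveqN for N >= 1: always even. */
static unsigned int demo_rx_vq_index(unsigned int n)
{
	assert(n >= 1);
	return 2 * n + 2;
}

/* Index of transmitqN for N >= 1: always odd. */
static unsigned int demo_tx_vq_index(unsigned int n)
{
	assert(n >= 1);
	return 2 * n + 3;
}
```

Because vq 3 is reserved, the first multiqueue pair starts at index 4 and
the even/odd split holds for the default queues as well.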

Rusty, could you please take a look and comment?
If this looks OK to everyone, we can proceed with finalizing the
implementation.  This patch is against
eb9fc84d0d3c46438aaab190e2401a9e5409a052 in virtio-spec git tree.

diff --git a/virtio-spec.lyx b/virtio-spec.lyx
index 7a073f4..a713807 100644
--- a/virtio-spec.lyx
+++ b/virtio-spec.lyx
@@ -58,6 +58,7 @@
 \html_be_strict false
 \author -608949062 "Rusty Russell,,," 
 \author 1531152142 "Paolo Bonzini,,," 
+\author 1986246365 "Michael S. Tsirkin" 
 \end_header
 
 \begin_body
@@ -3896,6 +3897,37 @@ Only if VIRTIO_NET_F_CTRL_VQ set
 \end_inset
 
 
+\change_inserted 1986246365 1346663522
+ 3: reserved
+\end_layout
+
+\begin_layout Description
+
+\change_inserted 1986246365 1346663550
+4: receiveq1.
+ 5: transmitq1.
+ 6: receiveq2.
+ 7.
+ transmitq2.
+ ...
+ 2N+2:receiveqN, 2N+3:transmitqN
+\begin_inset Foot
+status open
+
+\begin_layout Plain Layout
+
+\change_inserted 1986246365 1346663558
+Only if VIRTIO_NET_F_CTRL_VQ set.
+ N is indicated by max_virtqueue_pairs field.
+\change_unchanged
+
+\end_layout
+
+\end_inset
+
+
+\change_unchanged
+
 \end_layout
 
 \begin_layout Description
@@ -4056,6 +4088,17 @@ VIRTIO_NET_F_CTRL_VLAN
 
 \begin_layout Description
 VIRTIO_NET_F_GUEST_ANNOUNCE(21) Guest can send gratuitous packets.
+\change_inserted 1986246365 1346617842
+
+\end_layout
+
+\begin_layout Description
+
+\change_inserted 1986246365 1346618103
+VIRTIO_NET_F_MULTIQUEUE(22) Device has multiple receive and transmission
+ queues.
+\change_unchanged
+
 \end_layout
 
 \end_deeper
@@ -4068,11 +4111,45 @@ configuration
 \begin_inset space ~
 \end_inset
 
-layout Two configuration fields are currently defined.
+layout 
+\change_deleted 1986246365 1346671560
+Two
+\change_inserted 1986246365 1346671647
+Six
+\change_unchanged
+ configuration fields are currently defined.
  The mac address field always exists (though is only valid if VIRTIO_NET_F_MAC
  is set), and the status field only exists if VIRTIO_NET_F_STATUS is set.
  Two read-only bits are currently defined for the status field: VIRTIO_NET_S_LIN
 K_UP and VIRTIO_NET_S_ANNOUNCE.
+
+\change_inserted 1986246365 1346930950
+ The following four read-only fields only exist if VIRTIO_NET_F_MULTIQUEUE
+ is set.
+ The max_virtqueue_pairs field specifies the maximum number of each of transmit
+ and receive virtqueues that can be used for multiqueue operation.
+ The following read-only fields: 
+\emph on
+current_steering_rule
+\emph default
+, 
+\emph on
+reserved
+\emph default
+ and 
+\emph on
+current_steering_param
+\emph default
+ store the last successful VIRTIO_NET_CTRL_STEERING
+\begin_inset CommandInset ref
+LatexCommand ref
+reference "sub:Transmit-Packet-Steering"
+
+\end_inset
+
+ command executed by driver, for debugging purposes.
+
+\change_unchanged
  
 \begin_inset listings
 inline false
@@ -4105,6 +4182,40 @@ struct virtio_net_config {
 \begin_layout Plain Layout
 
     u16 status;
+\change_inserted 1986246365 1346671221
+
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1986246365 1346671532
+
+    u16 max_virtqueue_pairs;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1986246365 1346671531
+
+    u8 current_steering_rule;
+\change_unchanged
+
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1986246365 1346671499
+
+    u8 reserved;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1986246365 1346671530
+
+    u16 current_steering_param;
+\change_unchanged
+
 \end_layout
 
 \begin_layout Plain Layout
@@ -4151,6 +4262,18 @@ physical
 \begin_layout Enumerate
 If the VIRTIO_NET_F_CTRL_VQ feature bit is negotiated, identify the control
  virtqueue.
+\change_inserted 1986246365 1346618052
+
+\end_layout
+
+\begin_layout Enumerate
+
+\change_inserted 1986246365 1346618175
+If VIRTIO_NET_F_MULTIQUEUE feature bit is negotiated, identify the receive
+ and transmission queues that are going to be used in multiqueue mode.
+ Only queues that are going to be used need to be initialized.
+\change_unchanged
+
 \end_layout
 
 \begin_layout Enumerate
@@ -4168,7 +4291,11 @@ status
 \end_layout
 
 \begin_layout Enumerate
-The receive virtqueue should be filled with receive buffers.
+The receive virtqueue
+\change_inserted 1986246365 1346618180
+(s)
+\change_unchanged
+ should be filled with receive buffers.
  This is described in detail below in 
 \begin_inset Quotes eld
 \end_inset
@@ -4513,6 +4640,8 @@ Note that the header will be two bytes longer for the VIRTIO_NET_F_MRG_RXBUF
 \end_inset
 
 
+\change_deleted 1986246365 1346932640
+
 \end_layout
 
 \begin_layout Subsection*
@@ -4988,8 +5117,24 @@ status open
 The Guest needs to check VIRTIO_NET_S_ANNOUNCE bit in status field when
  it notices the changes of device configuration.
  The command VIRTIO_NET_CTRL_ANNOUNCE_ACK is used to indicate that driver
- has recevied the notification and device would clear the VIRTIO_NET_S_ANNOUNCE
- bit in the status filed after it received this command.
+ has rece
+\change_inserted 1986246365 1346663932
+i
+\change_unchanged
+v
+\change_deleted 1986246365 1346663934
+i
+\change_unchanged
+ed the notification and device would clear the VIRTIO_NET_S_ANNOUNCE bit
+ in the status fi
+\change_inserted 1986246365 1346663942
+e
+\change_unchanged
+l
+\change_deleted 1986246365 1346663943
+e
+\change_unchanged
+d after it received this command.
 \end_layout
 
 \begin_layout Standard
@@ -5004,10 +5149,298 @@ Sending the gratuitous packets or marking there are pending gratuitous packets
 \begin_layout Enumerate
 Sending VIRTIO_NET_CTRL_ANNOUNCE_ACK command through control vq.
  
+\change_deleted 1986246365 1346662247
+
 \end_layout
 
-\begin_layout Enumerate
+\begin_layout Subsection*
+
+\change_inserted 1986246365 1346932658
+\begin_inset CommandInset label
+LatexCommand label
+name "sub:Transmit-Packet-Steering"
+
+\end_inset
+
+Transmit Packet Steering
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1986246365 1346932658
+When VIRTIO_NET_F_MULTIQUEUE feature bit is negotiated, guest can use any
+ of multiple configured transmit queues to transmit a given packet.
+ To avoid packet reordering by device (which generally leads to performance
+ degradation) driver should attempt to utilize the same transmit virtqueue
+ for all packets of a given transmit flow.
+ For bi-directional protocols (in practice, TCP), a given network connection
+ can utilize both transmit and receive queues.
+ For best performance, packets from a single connection should utilize the
+ paired transmit and receive queues from the same virtqueue pair; for example
+ both transmitqN and receiveqN.
+ This rule makes it possible to optimize processing on the device side,
+ but this is not a hard requirement: devices should function correctly even
+ when this rule is not followed.
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1986246365 1346932658
+Driver selects an active steering rule using VIRTIO_NET_CTRL_STEERING command
+ (this controls both which virtqueue is selected for a given packet for
+ receive and notifies the device which virtqueues are about to be used for
+ transmit).
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1986246365 1346932658
+This command accepts a single out argument in the following format:
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1986246365 1346932658
+\begin_inset listings
+inline false
+status open
+
+\begin_layout Plain Layout
+
+\change_inserted 1986246365 1346932658
+
+#define VIRTIO_NET_CTRL_STEERING       4
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1986246365 1346932658
+
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1986246365 1346932658
+
+struct virtio_net_ctrl_steering {
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1986246365 1346932658
+
+	u8 current_steering_rule;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1986246365 1346932658
+
+    u8 reserved;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1986246365 1346932658
+
+	u16 current_steering_param;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1986246365 1346932658
+
+};
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1986246365 1346932658
+
+#define VIRTIO_NET_CTRL_STEERING_SINGLE       0
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1986246365 1346932658
+
+#define VIRTIO_NET_CTRL_STEERING_RX_FOLLOWS_TX  1
+\end_layout
+
+\end_inset
+
+
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1986246365 1346932658
+The field 
+\emph on
+rule
+\emph default
+ specifies the function used to select transmit virtqueue for a given packet;
+ the field 
+\emph on
+param
+\emph default
+ makes it possible to pass an extra parameter if appropriate.
+ When 
+\emph on
+rule
+\emph default
+ is set to VIRTIO_NET_CTRL_STEERING_SINGLE (this is the default) all packets
+ are steered to the default virtqueue transmitq (1); param is unused; this
+ is the default.
+ When 
+\emph on
+rule
+\emph default
+ is set to VIRTIO_NET_CTRL_STEERING_RX_FOLLOWS_TX packets are steered by
+ driver to the first (
+\emph on
+param
+\emph default
++1) multiqueue virtqueues transmitq1...transmitqN; the default transmitq is
+ unused.
+ Driver must have configured all these (
+\emph on
+param
+\emph default
++1) virtqueues beforehand.
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1986246365 1346932658
+Supported steering rules can be added and removed in the future.
+ Driver should check that the request to change the steering rule was successful
+ by checking ack values of the command.
+ As selecting a specific steering is an optimization feature, drivers should
+ avoid hard failure and fall back on using a supported steering rule if
+ this command fails.
+ The default steering rule is VIRTIO_NET_CTRL_STEERING_SINGLE.
+ It will not be removed.
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1986246365 1346932658
+When the steering rule is modified, some packets can still be outstanding
+ in one or more of the transmit virtqueues.
+ Since drivers might choose to modify the current steering rule at a high
+ rate (e.g.
+ adaptively in response to changes in the workload) to avoid reordering
+ packets, device is recommended to complete processing of the transmit queue(s)
+ utilized by the original steering before processing any packets delivered
+ by the modified steering rule.
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1986246365 1346932658
+For debugging, the current steering rule can also be read from the configuration
+ space.
+\end_layout
+
+\begin_layout Subsection*
+
+\change_inserted 1986246365 1346670357
+\begin_inset CommandInset label
+LatexCommand label
+name "sub:Receive-Packet-Steering"
+
+\end_inset
+
+Receive Packet Steering
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1986246365 1346671046
+When VIRTIO_NET_F_MULTIQUEUE feature bit is negotiated, device can use any
+ of multiple configured receive queues to pass a given packet to driver.
+ Driver controls which virtqueue is selected in practice by configuring
+ packet steering rule using VIRTIO_NET_CTRL_STEERING command, as described
+ above
+\begin_inset CommandInset ref
+LatexCommand ref
+reference "sub:Transmit-Packet-Steering"
+
+\end_inset
+
+.
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1986246365 1346931532
+The field 
+\emph on
+rule
+\emph default
+ specifies the function used to select receive virtqueue for a given packet;
+ the field 
+\emph on
+param
+\emph default
+ makes it possible to pass an extra parameter if appropriate.
+ When 
+\emph on
+rule
+\emph default
+ is set to VIRTIO_NET_CTRL_STEERING_SINGLE all packets are steered to the
+ default virtqueue receiveq (0); param is unused; this is the default.
+ When 
+\emph on
+rule
+\emph default
+ is set to VIRTIO_NET_CTRL_STEERING_RX_FOLLOWS_TX packets are steered by
+ host to the first (
+\emph on
+param
+\emph default
++1) multiqueue virtqueues receiveq1...receiveqN; the default receiveq is unused.
+ Driver must have configured all these (
+\emph on
+param
+\emph default
++1) virtqueues beforehand.
+ For best performance for bi-directional flows (such as TCP) device should
+ detect the flow to virtqueue pair mapping on transmit and select the receive
+ virtqueue from the same virtqueue pair.
+ For uni-directional flows, or when this mapping information is missing,
+ a device-specific steering function is used.
+\change_unchanged
+
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1986246365 1346669564
+Supported steering rules can be added and removed in the future.
+ Driver should probe for supported rules by checking ack values of the command.
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1986246365 1346932135
+When the steering rule is modified, some packets can still be outstanding
+ in one or more of the virtqueues.
+ Device is not required to wait for these packets to be consumed before
+ delivering packets using the new steering rule.
+ Drivers modifying the steering rule at a high rate (e.g.
+ adaptively in response to changes in the workload) are recommended to complete
+ processing of the receive queue(s) utilized by the original steering before
+ processing any packets delivered by the modified steering rule.
+\end_layout
+
+\begin_layout Standard
+
+\change_deleted 1986246365 1346664095
 .
+
+\change_unchanged
  
 \end_layout


* Re: IPv6 routing type - not at par with IPv4 one?
From: Markus @ 2012-09-06 10:54 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Markus Stenberg, netdev, Nicolas Dichtel
In-Reply-To: <1346927705.2484.22.camel@edumazet-glaptop>

On 6.9.2012, at 13.35, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Well, it  seems you missed this : 
> 
> http://git.kernel.org/?p=linux/kernel/git/davem/net-next.git;a=commitdiff;h=ef2c7d7b59708d54213c7556a82d14de9a7e4475
> 
> At least the blackhole is now supported on IPv6, so you probably have to
> add the 'throw' bit, if it makes any sense.


Ah, cool, teaches me to refresh my git trees before posting ;-)
Nicolas, want to update for that too or will I? 

Throw is useful in various cases (it is RTN_THROW / -EAGAIN). 

Cheers,

-Markus


