Netdev List
* [PATCH] gianfar: Do not call device_set_wakeup_enable() under a spinlock
From: Rafael J. Wysocki @ 2010-11-09 21:54 UTC (permalink / raw)
  To: Daniel J Blueman, David S. Miller; +Cc: Francois Romieu, Linux Kernel, netdev
In-Reply-To: <201011090030.42693.rjw@sisk.pl>

On Tuesday, November 09, 2010, Rafael J. Wysocki wrote:
> On Tuesday, November 02, 2010, Daniel J Blueman wrote:
> > Since device_set_wakeup_enable now sleeps, it should not be called
> > from a critical section. Since wol_en is not updated elsewhere, we can
> > omit the locking entirely.
> > 
> > Signed-off-by: Daniel J Blueman <daniel.blueman@gmail.com>
> 
> Acked-by: Rafael J. Wysocki <rjw@sisk.pl>

Having reconsidered that, I think it may be better to do something like the
patch below.

This is a regression fix, so please apply if there are no objections.

Thanks,
Rafael

---
From: Rafael J. Wysocki <rjw@sisk.pl>
Subject: gianfar: Do not call device_set_wakeup_enable() under a spinlock

The gianfar driver calls device_set_wakeup_enable() under a spinlock,
which is a bug since the recent core power management changes,
because this function can now sleep.  Fix this by moving the
device_set_wakeup_enable() call out of the spinlock-protected area.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 drivers/net/gianfar_ethtool.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Index: linux-2.6/drivers/net/gianfar_ethtool.c
===================================================================
--- linux-2.6.orig/drivers/net/gianfar_ethtool.c
+++ linux-2.6/drivers/net/gianfar_ethtool.c
@@ -635,9 +635,10 @@ static int gfar_set_wol(struct net_devic
 	if (wol->wolopts & ~WAKE_MAGIC)
 		return -EINVAL;
 
+	device_set_wakeup_enable(&dev->dev, wol->wolopts & WAKE_MAGIC);
+
 	spin_lock_irqsave(&priv->bflock, flags);
-	priv->wol_en = wol->wolopts & WAKE_MAGIC ? 1 : 0;
-	device_set_wakeup_enable(&dev->dev, priv->wol_en);
+	priv->wol_en =  !!device_may_wakeup(&dev->dev);
 	spin_unlock_irqrestore(&priv->bflock, flags);
 
 	return 0;
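
The ordering in the patch above can be illustrated with a small user-space
sketch.  Everything here is a mock: the lock is only a depth counter,
mock_set_wakeup_enable() stands in for a call that may sleep, and
MOCK_WAKE_MAGIC is a hypothetical value, not the real WAKE_MAGIC.  It is not
the driver code, only the "sleep first, then lock" pattern.

```c
#include <stdbool.h>

#define MOCK_WAKE_MAGIC 0x20u	/* hypothetical stand-in for WAKE_MAGIC */

static int lock_depth;		/* > 0 while the mock lock is held */
static int sleep_violations;	/* sleeping calls made with the lock held */
static bool wakeup_enabled;	/* mock device wakeup state */

struct mock_priv { int wol_en; };

static void mock_lock(void)   { lock_depth++; }
static void mock_unlock(void) { lock_depth--; }

/* Stand-in for a call that may sleep; records misuse under the lock. */
static void mock_set_wakeup_enable(bool enable)
{
	if (lock_depth > 0)
		sleep_violations++;
	wakeup_enabled = enable;
}

static int mock_set_wol(struct mock_priv *priv, unsigned int wolopts)
{
	if (wolopts & ~MOCK_WAKE_MAGIC)
		return -1;

	/* May sleep, so call it before taking the lock. */
	mock_set_wakeup_enable((wolopts & MOCK_WAKE_MAGIC) != 0);

	mock_lock();
	priv->wol_en = wakeup_enabled ? 1 : 0;	/* only the flag update is locked */
	mock_unlock();
	return 0;
}
```

With this ordering sleep_violations stays at zero, because the mock sleeping
call never runs while lock_depth is non-zero.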

^ permalink raw reply

* Re: Netlink limitations
From: Thomas Graf @ 2010-11-09 21:40 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: Patrick McHardy, David S. Miller, pablo, netdev
In-Reply-To: <alpine.LNX.2.01.1011092113420.10710@obet.zrqbmnf.qr>

On Tue, Nov 09, 2010 at 09:20:09PM +0100, Jan Engelhardt wrote:
> What's more, there is no way to specify a remote host in sockaddr_nl
> right now, so all communication is necessarily being local - that is,
> unless you add a hidden forwarder in kernel space that transparently
> tunnels it into IPv6 or something.

That's fine. I don't expect the kernel to send netlink messages to
other machines directly but rather have a userspace daemon handle
this. 

> I do not believe that encoding the attribute type into the protocol
> itself is going to be such a big win. You still need a local
> authoritative database (struct nla_policy[] or some representation of
> it, nevermind I'm thinking "XML-DTD"-like) to do the verification
> against because some NL messages may be purposely forged. If you have
> an nlattr that says it is a string, how do you know that it is in
> fact a string rather than a blob that happens to have a trailing \0.

True, we will never be able to verify the contents of attributes, but
what we can do is give the sender the ability to specify what type of
attribute he meant to send. This can be a big advantage as it
limits the possibility of misinterpreting messages which may have been
corrupted, because we can match the expected attribute type against
the attribute type supplied by the sender. Of course this doesn't
protect against forged messages at all; we will never be able to do
that.

The addition won't be a revolution, but the increased header size,
8 vs. 12 bytes, isn't a big deal and gives us some additional room to
work with in the future.

struct nlattr_ext {
	u16	oldlen;		/* 0 */
	u16	kind;		/* TCA_* */
	u8	type;		/* NLA_U32 */
	u8	flags;		/* NLA_F_* */
	u16	reserved;
	u32	length;
};
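
As a rough user-space sketch of the layout above (fixed-width types replace
u8/u16/u32, and the SK_NLA_* codes are hypothetical, not the kernel's NLA_*
values), the 12-byte size and the "expected type vs. supplied type" check
could look like this:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type codes; the real NLA_* values are not reproduced here. */
enum { SK_NLA_U8 = 1, SK_NLA_U16, SK_NLA_U32, SK_NLA_STRING };

struct nlattr_ext_sketch {
	uint16_t oldlen;	/* always 0 in the proposal */
	uint16_t kind;		/* protocol-specific attribute id */
	uint8_t  type;		/* sender-declared data type */
	uint8_t  flags;
	uint16_t reserved;
	uint32_t length;
};

/* 2 + 2 + 1 + 1 + 2 + 4 bytes, no padding needed with natural alignment. */
_Static_assert(sizeof(struct nlattr_ext_sketch) == 12,
	       "extended header should be 12 bytes");

/* Returns 1 if the sender-declared type matches what the receiver expects. */
static int attr_type_matches(const struct nlattr_ext_sketch *a, uint8_t expected)
{
	return a->oldlen == 0 && a->type == expected;
}
```

This only catches a mismatch between declared and expected type; as said
above, it does nothing against a forged but well-formed attribute.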

There has been more than one debate whether to share nla_policy between
kernel and userspace. There is nothing which prevents people from doing
so. But typically the semantics between kernel->userspace and vice versa
are slightly different and require a different policy to be applied.

^ permalink raw reply

* Re: Loopback performance from kernel 2.6.12 to 2.6.37
From: Xose Vazquez Perez @ 2010-11-09 21:35 UTC (permalink / raw)
  To: netdev, jdb

Jesper Dangaard Brouer wrote:

> To fix this I added "-q 0" to netcat.  Thus my working commands are:
> 
>  netcat -l -p 9999 >/dev/null &
>  time dd if=/dev/zero bs=1M count=10000 | netcat -q0 127.0.0.1 9999
> 
> Running this on my "big" 10G testlab system, Dual Xeon 5550 2.67GHz,
> kernel version 2.6.32-5-amd64 (which I usually don't use)
> The results are 7.487 sec:

Which netcat flavor?

http://nc110.sourceforge.net/
http://nmap.org/ncat/
http://www.dest-unreach.org/socat/
http://cryptcat.sourceforge.net/
http://netcat.sourceforge.net/
http://www.openbsd.org/cgi-bin/cvsweb/src/usr.bin/nc/

-- 
"Let blind kings wage fierce war over there for one more span of land;
for here I hold as mine all that the wild sea encompasses, on which no one
has imposed laws. And there is no shore, whichever it may be, nor flag of
splendor, that does not feel my right and pay tribute to my valor."

^ permalink raw reply

* Re: [PATCH] Enhance AF_PACKET implementation to not require high order contiguous memory allocation (v2)
From: Neil Horman @ 2010-11-09 21:20 UTC (permalink / raw)
  To: Maciej Żenczykowski; +Cc: netdev, davem, eric.dumazet
In-Reply-To: <AANLkTi=s9zr+yF-nr8MoHY6W6m=usR2fH=dPJx6voC34@mail.gmail.com>

On Tue, Nov 09, 2010 at 01:07:32PM -0800, Maciej Żenczykowski wrote:
> > +#define PGV_FROM_VMALLOC 1
> 
> Why don't we always just use vmalloc, what's the benefit of get_user_pages?
> 
Because of how vmalloc works.  It maps discontiguous pages into contiguous
address space.  But we only have 128MB of that address space to work with by
default, so it's quite possible that we won't be able to alloc all the memory.

> > +       /*
> > +        * vmalloc failed, lets dig into swap here
> > +        */
> > +       *flags = 0;
> 
> probably better to *flags &= ~PGV_FROM_VMALLOC;
> (since some flags could have been set before this function was called)
> 
Well, if any other users of this field existed, I'd agree, but since we're the
only one, I think it's OK, at least for now.
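
For reference, the difference between the two forms discussed above only
shows up once a second flag exists.  A tiny sketch, where PGV_OTHER_FLAG is
hypothetical and does not exist in the patch:

```c
#define PGV_FROM_VMALLOC 0x1u
#define PGV_OTHER_FLAG   0x2u	/* hypothetical second user of the field */

/* "*flags = 0": drops every flag, including ones set by other code. */
static unsigned int clear_all(unsigned int flags)
{
	(void)flags;
	return 0;
}

/* "*flags &= ~PGV_FROM_VMALLOC": drops only the vmalloc bit. */
static unsigned int clear_vmalloc_only(unsigned int flags)
{
	return flags & ~PGV_FROM_VMALLOC;
}
```

With PGV_FROM_VMALLOC as the only flag, both forms give the same result.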

> > +       gfp_flags &= ~__GFP_NORETRY;
> > +       buffer = (char *)__get_free_pages(gfp_flags, order);
> 
> wouldn't this still cause problems because you're now requiring linear
> memory again?
Yes, it's a last-ditch effort after the other two options have been tried.  It's
all that's left to do.

> Would it be better to just fail at this point?
Why?  If we can dig into swap and get the memory, we may as well try.  It would
be better if we didn't have to, but if the choice is between failing and making
the system slow down....

Neil

> 

^ permalink raw reply

* Re: Networking hangs when too many parallel requests are made at once
From: Luke Hutchison @ 2010-11-09 21:17 UTC (permalink / raw)
  To: Ben Greear; +Cc: netdev
In-Reply-To: <4CD9B09B.8040906@candelatech.com>

On Tue, Nov 9, 2010 at 3:35 PM, Ben Greear <greearb@candelatech.com> wrote:
> On 11/09/2010 12:27 PM, Luke Hutchison wrote:
>> No, I haven't been able to reproduce on any other machine.  But it
>> happens on both my wifi NIC and my ethernet NIC in this machine.
>
> Well, let us know what those are, at least.

From my first email:

> I have a Toshiba Satellite Pro S300M-S2142
> laptop with a Core 2 Duo P8600 CPU, Intel GM45 gfx,
> Intel 82567V Gigabit Ethernet and Intel 5100 Wifi,
> running kernel kernel-2.6.36-1.1.fc15.x86_64 on top
> of Fedora 14.

On Tue, Nov 9, 2010 at 3:35 PM, Ben Greear <greearb@candelatech.com> wrote:
> And, a network capture of your system going into this state might
> be useful.  I'd try to disable your wireless NIC entirely and focus
> on debugging the wired NIC as that is usually easier to debug.

Sure -- a wireshark trace is here: http://web.mit.edu/~luke_h/www/trace.bz2

In this particular trace, I opened about 20 browser tabs at once.
They all locked up after about 5 seconds.  A few of them loaded some
more content after a minute or two.  A minute or two later, I killed
them all.  In this particular example, pinging to a specific domain
name continued to work (it doesn't always), although I couldn't get
content from the domains in question: e.g. I could ping google.com,
but opening a new tab and trying to visit google.com caused the new
tab to hang too.

Thanks,
Luke

^ permalink raw reply

* Re: [PATCH] Enhance AF_PACKET implementation to not require high order contiguous memory allocation (v2)
From: Maciej Żenczykowski @ 2010-11-09 21:07 UTC (permalink / raw)
  To: nhorman; +Cc: netdev, davem, eric.dumazet
In-Reply-To: <1289324799-2256-1-git-send-email-nhorman@tuxdriver.com>

> +#define PGV_FROM_VMALLOC 1

Why don't we always just use vmalloc, what's the benefit of get_user_pages?

> +       /*
> +        * vmalloc failed, lets dig into swap here
> +        */
> +       *flags = 0;

probably better to *flags &= ~PGV_FROM_VMALLOC;
(since some flags could have been set before this function was called)

> +       gfp_flags &= ~__GFP_NORETRY;
> +       buffer = (char *)__get_free_pages(gfp_flags, order);

wouldn't this still cause problems because you're now requiring linear
memory again?
Would it be better to just fail at this point?

^ permalink raw reply

* Re: [PATCH] Enhance AF_PACKET implementation to not require high order contiguous memory allocation (v2)
From: Neil Horman @ 2010-11-09 20:57 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, davem, zenczykowski
In-Reply-To: <1289330438.2774.25.camel@edumazet-laptop>

On Tue, Nov 09, 2010 at 08:20:38PM +0100, Eric Dumazet wrote:
> On Tuesday, November 09, 2010 at 13:38 -0500, Neil Horman wrote:
> > On Tue, Nov 09, 2010 at 07:02:32PM +0100, Eric Dumazet wrote:
> > > On Tuesday, November 09, 2010 at 12:46 -0500, nhorman@tuxdriver.com wrote:
> > > > -static char **alloc_pg_vec(struct tpacket_req *req, int order)
> > > > +static struct pgv *alloc_pg_vec(struct tpacket_req *req, int order)
> > > >  {
> > > >  	unsigned int block_nr = req->tp_block_nr;
> > > > -	char **pg_vec;
> > > > +	struct pgv *pg_vec;
> > > >  	int i;
> > > >  
> > > > -	pg_vec = kzalloc(block_nr * sizeof(char *), GFP_KERNEL);
> > > > +	pg_vec = kzalloc(block_nr * sizeof(struct pgv), GFP_KERNEL);
> > > 
> > > While we are at it, we could check block_nr being a sane value here ;)
> > > 
> > This is true.  What do you think a reasonable sane value is?  libpcap seems to
> > limit itself to 32 order 5 entries in the ring, but that seems a bit arbitrary.
> > Perhaps we could check and limit allocations to being no more than order 8
> > (1Mb), and a total allocation of no more than perhaps max(32Mb, 1% of all ram)?
> > Just throwing it out there, open to any suggestions here
> 
> I was referring to a malicious/buggy program giving a big tp_block_nr so
> that (block_nr * sizeof(struct pgv)) overflows the u32
> 
> One way to deal with that is to use
> 
> 	kcalloc(block_nr, sizeof(struct pgv), GFP_KERNEL);
> 
> I am not sure consistency checks done in packet_set_ring() are enough to
> properly detect such errors.
Ah, I get you, ok.  Yeah, I'll respin this with that taken into account.
Thanks!
Neil


^ permalink raw reply

* [PATCH 2/2] net: Simplify RX queue allocation
From: Tom Herbert @ 2010-11-09 20:47 UTC (permalink / raw)
  To: davem, netdev

This patch moves RX queue allocation to alloc_netdev_mq and freeing of
the queues to free_netdev (symmetric to TX queue allocation).  Each
kobject RX queue takes a reference to the queue's device so that the
device can't be freed before all the kobjects have been released-- this
obviates the need for reference counts specific to RX queues.

Signed-off-by: Tom Herbert <therbert@google.com>
---
 include/linux/netdevice.h |    3 +--
 net/core/dev.c            |   15 ++++++++++-----
 net/core/net-sysfs.c      |    7 ++-----
 3 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index d8fd2c2..da59595 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -592,8 +592,7 @@ struct netdev_rx_queue {
 	struct rps_map __rcu		*rps_map;
 	struct rps_dev_flow_table __rcu	*rps_flow_table;
 	struct kobject			kobj;
-	struct netdev_rx_queue		*first;
-	atomic_t			count;
+	struct net_device		*dev;
 } ____cacheline_aligned_in_smp;
 #endif /* CONFIG_RPS */
 
diff --git a/net/core/dev.c b/net/core/dev.c
index 8f9c76e..87d89ba 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5034,7 +5034,7 @@ static int netif_alloc_rx_queues(struct net_device *dev)
 	 * reference count.
 	 */
 	for (i = 0; i < count; i++)
-		rx[i].first = rx;
+		rx[i].dev = dev;
 #endif
 	return 0;
 }
@@ -5110,10 +5110,6 @@ int register_netdevice(struct net_device *dev)
 
 	dev->iflink = -1;
 
-	ret = netif_alloc_rx_queues(dev);
-	if (ret)
-		goto out;
-
 	netdev_init_queues(dev);
 
 	/* Init, if this function is available */
@@ -5579,6 +5575,8 @@ struct net_device *alloc_netdev_mq(int sizeof_priv, const char *name,
 #ifdef CONFIG_RPS
 	dev->num_rx_queues = queue_count;
 	dev->real_num_rx_queues = queue_count;
+	if (netif_alloc_rx_queues(dev))
+		goto free_pcpu;
 #endif
 
 	dev->gso_max_size = GSO_MAX_SIZE;
@@ -5596,6 +5594,10 @@ struct net_device *alloc_netdev_mq(int sizeof_priv, const char *name,
 free_pcpu:
 	free_percpu(dev->pcpu_refcnt);
 	kfree(dev->_tx);
+#ifdef CONFIG_RPS
+	kfree(dev->_rx);
+#endif
+
 free_p:
 	kfree(p);
 	return NULL;
@@ -5617,6 +5619,9 @@ void free_netdev(struct net_device *dev)
 	release_net(dev_net(dev));
 
 	kfree(dev->_tx);
+#ifdef CONFIG_RPS
+	kfree(dev->_rx);
+#endif
 
 	kfree(rcu_dereference_raw(dev->ingress_queue));
 
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index a5ff5a8..3ba526b 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -706,7 +706,6 @@ static struct attribute *rx_queue_default_attrs[] = {
 static void rx_queue_release(struct kobject *kobj)
 {
 	struct netdev_rx_queue *queue = to_rx_queue(kobj);
-	struct netdev_rx_queue *first = queue->first;
 	struct rps_map *map;
 	struct rps_dev_flow_table *flow_table;
 
@@ -719,8 +718,7 @@ static void rx_queue_release(struct kobject *kobj)
 	if (flow_table)
 		call_rcu(&flow_table->rcu, rps_dev_flow_table_release);
 
-	if (atomic_dec_and_test(&first->count))
-		kfree(first);
+	dev_put(queue->dev);
 }
 
 static struct kobj_type rx_queue_ktype = {
@@ -732,7 +730,6 @@ static struct kobj_type rx_queue_ktype = {
 static int rx_queue_add_kobject(struct net_device *net, int index)
 {
 	struct netdev_rx_queue *queue = net->_rx + index;
-	struct netdev_rx_queue *first = queue->first;
 	struct kobject *kobj = &queue->kobj;
 	int error = 0;
 
@@ -745,7 +742,7 @@ static int rx_queue_add_kobject(struct net_device *net, int index)
 	}
 
 	kobject_uevent(kobj, KOBJ_ADD);
-	atomic_inc(&first->count);
+	dev_hold(queue->dev);
 
 	return error;
 }
-- 
1.7.3.1
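
A simplified user-space sketch of the reference scheme described in the
changelog above: each queue object pins its device, so the device outlives
all of its queues.  The mock_* names are made up for illustration, and the
real dev_put() does not free the device when the count hits zero; the
"freed" flag here is only a stand-in for "no references remain".

```c
struct mock_dev {
	int refcnt;	/* device reference count */
	int freed;	/* set once the last reference is dropped */
};

struct mock_rxq {
	struct mock_dev *dev;	/* back pointer, as in netdev_rx_queue.dev */
};

static void mock_dev_hold(struct mock_dev *d)
{
	d->refcnt++;
}

static void mock_dev_put(struct mock_dev *d)
{
	if (--d->refcnt == 0)
		d->freed = 1;
}

/* Analogous to rx_queue_add_kobject(): the queue pins its device. */
static void mock_rxq_add(struct mock_rxq *q, struct mock_dev *d)
{
	q->dev = d;
	mock_dev_hold(d);
}

/* Analogous to rx_queue_release(): the queue drops its pin. */
static void mock_rxq_release(struct mock_rxq *q)
{
	mock_dev_put(q->dev);
}
```

No per-queue counter is needed: the owner's reference plus one reference per
queue is enough to keep the device alive until the last release.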


^ permalink raw reply related

* [PATCH 1/2] net: Move TX queue allocation to alloc_netdev_mq
From: Tom Herbert @ 2010-11-09 20:47 UTC (permalink / raw)
  To: davem, netdev

TX queues are now allocated in alloc_netdev_mq and freed in
free_netdev.

Signed-off-by: Tom Herbert <therbert@google.com>
---
 net/core/dev.c |    7 +++----
 1 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 0dd54a6..8f9c76e 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5114,10 +5114,6 @@ int register_netdevice(struct net_device *dev)
 	if (ret)
 		goto out;
 
-	ret = netif_alloc_netdev_queues(dev);
-	if (ret)
-		goto out;
-
 	netdev_init_queues(dev);
 
 	/* Init, if this function is available */
@@ -5577,6 +5573,8 @@ struct net_device *alloc_netdev_mq(int sizeof_priv, const char *name,
 
 	dev->num_tx_queues = queue_count;
 	dev->real_num_tx_queues = queue_count;
+	if (netif_alloc_netdev_queues(dev))
+		goto free_pcpu;
 
 #ifdef CONFIG_RPS
 	dev->num_rx_queues = queue_count;
@@ -5597,6 +5595,7 @@ struct net_device *alloc_netdev_mq(int sizeof_priv, const char *name,
 
 free_pcpu:
 	free_percpu(dev->pcpu_refcnt);
+	kfree(dev->_tx);
 free_p:
 	kfree(p);
 	return NULL;
-- 
1.7.3.1


^ permalink raw reply related

* [PATCH 0/2] net: Changes in queue allocation and freeing
From: Tom Herbert @ 2010-11-09 20:47 UTC (permalink / raw)
  To: davem, netdev

Changes to both RX and TX queue allocation.  In both cases allocate
in alloc_netdev_mq and free in free_netdev.  For RX the reference
counting has also changed: the device reference count can now be used.

^ permalink raw reply

* Re: Networking hangs when too many parallel requests are made at once
From: Ben Greear @ 2010-11-09 20:35 UTC (permalink / raw)
  To: Luke Hutchison; +Cc: netdev
In-Reply-To: <AANLkTinz6C4s3wX7t6CsE2bzZ2adCoi0RAcqvpEFE2GV@mail.gmail.com>

On 11/09/2010 12:27 PM, Luke Hutchison wrote:
> On Tue, Nov 9, 2010 at 2:16 PM, Ben Greear<greearb@candelatech.com>  wrote:
>> On 11/09/2010 11:04 AM, Luke Hutchison wrote:
>>>
>>> On Tue, Nov 9, 2010 at 1:30 PM, Luke Hutchison<luke.hutch@gmail.com>
>>>   wrote:
>>>>
>>>> Since around Linux kernel 2.6.33 or so (but maybe as early as
>>>> 2.6.31, not sure exactly what version), when restoring a crashed or
>>>> closed browser session of either Firefox or Chrome where lots of tabs
>>>> (say 10-40) open simultaneously, the networking stack is brought to
>>>> its knees -- most or all the tabs eventually time out without data, or
>>>> a few tabs might get some data and then display a partial web page.
>>
>> Have you been able to reproduce this on any other machine?  I suspect
>> it might be an issue with your specific NIC or other hardware.
>>
>> At the least, it's not a general problem with opening lots
>> of TCP connections, as we routinely test with thousands...
>>
>> Thanks,
>> Ben
>
> No, I haven't been able to reproduce on any other machine.  But it
> happens on both my wifi NIC and my ethernet NIC in this machine.

Well, let us know what those are, at least.

And, a network capture of your system going into this state might
be useful.  I'd try to disable your wireless NIC entirely and focus
on debugging the wired NIC as that is usually easier to debug.

Thanks,
Ben

>
> Thanks,
> Luke


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


^ permalink raw reply

* Re: [PATCH 3/3] net: tipc: fix information leak to userland
From: Vasiliy Kulikov @ 2010-11-09 20:33 UTC (permalink / raw)
  To: David Miller
  Cc: kernel-janitors, jon.maloy, allan.stephens, tipc-discussion,
	netdev, linux-kernel
In-Reply-To: <20101109.092630.260076036.davem@davemloft.net>

On Tue, Nov 09, 2010 at 09:26 -0800, David Miller wrote:
> From: Vasiliy Kulikov <segooon@gmail.com>
> Date: Sun, 31 Oct 2010 20:10:32 +0300
> 
> > Structure sockaddr_tipc is copied to userland with padding bytes after
> > "id" field in union field "name" uninitialized.  It leads to leaking of
> > contents of kernel stack memory.  We have to initialize them to zero.
> > 
> > Signed-off-by: Vasiliy Kulikov <segooon@gmail.com>
> 
> Applied.
> 
> Patches #1 and #2 were given feedback which I need you to integrate
> and submit new patches based upon, thanks.

About #2:

I still think that this:

    if (dev)
        strncpy(uaddr->sa_data, dev->name, 14);
    else
        memset(uaddr->sa_data, 0, 14);

is better than this:

    memset(uaddr->sa_data, 0, 14);
    dev = dev_get_by_index_rcu(sock_net(sk), pkt_sk(sk)->ifindex);
    if (dev)
        strlcpy(uaddr->sa_data, dev->name, 15);

Isn't it?  Explicitly filling with zero on the same "if" level is
slightly easier to read and understand.
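
A small user-space check of the first variant, relying on the standard C
guarantee that strncpy() pads the destination with '\0' up to the given
length.  The 14-byte buffer mirrors sa_data; the 0xAA fill is only a way to
simulate stale stack contents, and the helper name is made up.

```c
#include <string.h>

#define SA_DATA_LEN 14

/* Returns 1 if every byte after the copied name is zero. */
static int tail_is_zeroed(const char *name)
{
	char sa_data[SA_DATA_LEN];
	size_t i, n = strlen(name);

	memset(sa_data, 0xAA, sizeof(sa_data));	/* simulate stale stack bytes */
	strncpy(sa_data, name, sizeof(sa_data));	/* pads with '\0' up to 14 bytes */

	for (i = n; i < sizeof(sa_data); i++)
		if (sa_data[i] != 0)
			return 0;
	return 1;
}
```

So for short names no stale bytes survive in either variant.  The caveat
with strncpy() is a name of 14 characters or more, which fills the buffer
without a terminating '\0'.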

-- 
Vasiliy

^ permalink raw reply

* Re: Networking hangs when too many parallel requests are made at once
From: Luke Hutchison @ 2010-11-09 20:27 UTC (permalink / raw)
  To: Ben Greear; +Cc: netdev
In-Reply-To: <4CD99E20.4060307@candelatech.com>

On Tue, Nov 9, 2010 at 2:16 PM, Ben Greear <greearb@candelatech.com> wrote:
> On 11/09/2010 11:04 AM, Luke Hutchison wrote:
>>
>> On Tue, Nov 9, 2010 at 1:30 PM, Luke Hutchison<luke.hutch@gmail.com>
>>  wrote:
>>>
>>> Since around Linux kernel 2.6.33 or so (but maybe as early as
>>> 2.6.31, not sure exactly what version), when restoring a crashed or
>>> closed browser session of either Firefox or Chrome where lots of tabs
>>> (say 10-40) open simultaneously, the networking stack is brought to
>>> its knees -- most or all the tabs eventually time out without data, or
>>> a few tabs might get some data and then display a partial web page.
>
> Have you been able to reproduce this on any other machine?  I suspect
> it might be an issue with your specific NIC or other hardware.
>
> At the least, it's not a general problem with opening lots
> of TCP connections, as we routinely test with thousands...
>
> Thanks,
> Ben

No, I haven't been able to reproduce on any other machine.  But it
happens on both my wifi NIC and my ethernet NIC in this machine.

Thanks,
Luke

^ permalink raw reply

* Re: Netlink limitations
From: Jan Engelhardt @ 2010-11-09 20:20 UTC (permalink / raw)
  To: Thomas Graf; +Cc: Patrick McHardy, David S. Miller, pablo, netdev
In-Reply-To: <20101109144941.GA4018@canuck.infradead.org>


On Tuesday 2010-11-09 15:49, Thomas Graf wrote:
>
>We have tried to come up with ways of forwarding netlink messages to
>other machines several times. It always failed due to the fact that
>protocols encode attributes/data differently without having the
>ability to specify the encoding.

What's more, there is no way to specify a remote host in sockaddr_nl
right now, so all communication is necessarily being local - that is,
unless you add a hidden forwarder in kernel space that transparently
tunnels it into IPv6 or something.

>I haven't given up on the idea of self describing netlink protocols
>yet. For example we could encode the attribute type
>(i8|u16|u32|u16|string) in addition to the existing nested attribute
>flag.

I do not believe that encoding the attribute type into the protocol
itself is going to be such a big win. You still need a local
authoritative database (struct nla_policy[] or some representation of
it, nevermind I'm thinking "XML-DTD"-like) to do the verification
against because some NL messages may be purposely forged. If you have
an nlattr that says it is a string, how do you know that it is in
fact a string rather than a blob that happens to have a trailing \0.

^ permalink raw reply

* Re: [PATCH] net/dst: dst_dev_event() called after other notifiers
From: Ben Greear @ 2010-11-09 20:11 UTC (permalink / raw)
  To: David Miller; +Cc: eric.dumazet, netdev
In-Reply-To: <20101109.114853.193732360.davem@davemloft.net>

On 11/09/2010 11:48 AM, David Miller wrote:
> From: Eric Dumazet<eric.dumazet@gmail.com>
> Date: Tue, 09 Nov 2010 20:37:55 +0100
>
>> [PATCH] net/dst: dst_dev_event() called after other notifiers
>
> Nice, applied.
>
> However, I had to apply this by hand:
>
>>   static struct notifier_block dst_dev_notifier = {
>>   	.notifier_call  = dst_dev_event,
>> +	.priority = -10, /* must be called after other network notifiers */
>>   };
>
> The character after ".notifier_call" in my tree is a TAB character but
> in your patch it is a sequence of spaces.  This isn't looking like the
> usual email corruption, because the leading TAB characters on these
> lines are properly there.
>
> Please figure out why this happened so that it doesn't repeat in
> future patches :-)

I manually applied this as well and can confirm that interface deletion
with a global IPv6 address on it is now comparable to any other device
delete (about 30ms).

Tested-by:  Ben Greear <greearb@candelatech.com>

I'd love to test patches that made all interface deletes faster,
btw :)

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


^ permalink raw reply

* Re: [PATCH] net/dst: dst_dev_event() called after other notifiers
From: David Miller @ 2010-11-09 19:48 UTC (permalink / raw)
  To: eric.dumazet; +Cc: greearb, netdev
In-Reply-To: <1289331475.2774.41.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 09 Nov 2010 20:37:55 +0100

> [PATCH] net/dst: dst_dev_event() called after other notifiers

Nice, applied.

However, I had to apply this by hand:

>  static struct notifier_block dst_dev_notifier = {
>  	.notifier_call  = dst_dev_event,
> +	.priority = -10, /* must be called after other network notifiers */
>  };

The character after ".notifier_call" in my tree is a TAB character but
in your patch it is a sequence of spaces.  This isn't looking like the
usual email corruption, because the leading TAB characters on these
lines are properly there.

Please figure out why this happened so that it doesn't repeat in
future patches :-)

Thanks!

^ permalink raw reply

* [PATCH] net/dst: dst_dev_event() called after other notifiers
From: Eric Dumazet @ 2010-11-09 19:37 UTC (permalink / raw)
  To: Ben Greear, David Miller; +Cc: NetDev
In-Reply-To: <4CD893C6.2030803@candelatech.com>

On Monday, November 08, 2010 at 16:20 -0800, Ben Greear wrote:
> This is on an otherwise lightly loaded 2.6.36 + hacks system, 12 physical interfaces,
> and two VETH interfaces.
> 
> It's much faster to delete an interface when it has no IPv6 address:
> 
> [root@ct503-60 lanforge]# time ip link add link eth5 up name eth5#0 address 00:00:00:00:00:01 type macvlan
> 
> real	0m0.005s
> user	0m0.001s
> sys	0m0.004s
> [root@ct503-60 lanforge]# time ip link delete eth5#0
> 
> real	0m0.033s
> user	0m0.001s
> sys	0m0.005s
> [root@ct503-60 lanforge]# ip link add link eth5 up name eth5#0 address 00:00:00:00:00:01 type macvlan
> 
> [root@ct503-60 lanforge]# ip -6 addr add 2002::1/64 dev eth5#0
> [root@ct503-60 lanforge]# time ip link delete eth5#0
> 
> real	0m1.030s
> user	0m0.000s
> sys	0m0.013s
> 
> 
> Funny enough, if you explicitly remove the IPv6 addr first it seems
> to run at normal speed (adding both operation's times together)
> 
> [root@ct503-60 lanforge]# ip link add link eth5 up name eth5#0 address 00:00:00:00:00:01 type macvlan
> [root@ct503-60 lanforge]# ip -6 addr add 2002::1/64 dev eth5#0
> [root@ct503-60 lanforge]# time ip -6 addr delete 2002::1/64 dev eth5#0
> 
> real	0m0.001s
> user	0m0.000s
> sys	0m0.001s
> [root@ct503-60 lanforge]# time ip link delete eth5#0
> 
> real	0m0.028s
> user	0m0.001s
> sys	0m0.005s
> 

OK, I confirm I already knew how to correct the problem.

http://www.spinics.net/lists/netdev/msg140729.html

quote :

I also believe the order of netdevice notifiers is wrong (we don't set
priority), and that we should call fib_netdev_event() _before_
dst_dev_event(). This needs another patch.


Thanks

[PATCH] net/dst: dst_dev_event() called after other notifiers

Followup of commit ef885afbf8a37689 (net: use rcu_barrier() in
rollback_registered_many)

dst_dev_event() scans a garbage dst list that might be fed by various
network notifiers at device dismantle time.

It's important to call dst_dev_event() after other notifiers, or we might
enter the infamous msleep(250) in netdev_wait_allrefs(), and wait one
second before calling call_netdevice_notifiers(NETDEV_UNREGISTER, dev)
again to properly remove the last device references.

Use priority -10 to let dst_dev_notifier be called after other network
notifiers (they have the default 0 priority)

Reported-by: Ben Greear <greearb@candelatech.com>
Reported-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Reported-by: Octavian Purdila <opurdila@ixiacom.com>
Reported-by: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 net/core/dst.c |    1 +
 1 files changed, 1 insertion(+)

diff --git a/net/core/dst.c b/net/core/dst.c
index 8abe628..e234bf1 100644
--- a/net/core/dst.c
+++ b/net/core/dst.c
@@ -370,6 +370,7 @@ static int dst_dev_event(struct notifier_block *this, unsigned long event,
 
 static struct notifier_block dst_dev_notifier = {
 	.notifier_call  = dst_dev_event,
+	.priority = -10, /* must be called after other network notifiers */
 };
 
 void __init dst_init(void)
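
For readers unfamiliar with notifier priorities, here is a user-space sketch
of a chain kept sorted by descending priority, which is why -10 ends up
after the default-0 entries regardless of registration order.  The mock_*
names and ids are illustrative only; this is not the kernel's notifier code.

```c
#include <stddef.h>

struct mock_notifier {
	int priority;	/* higher values are called earlier */
	int id;		/* illustrative tag to observe call order */
	struct mock_notifier *next;
};

/* Insert n after all entries whose priority is >= its own. */
static void mock_register(struct mock_notifier **head, struct mock_notifier *n)
{
	while (*head && (*head)->priority >= n->priority)
		head = &(*head)->next;
	n->next = *head;
	*head = n;
}

/* Walk the chain in call order, recording ids; returns the number visited. */
static int mock_call_chain(const struct mock_notifier *head, int *out, int max)
{
	int n = 0;

	for (; head && n < max; head = head->next)
		out[n++] = head->id;
	return n;
}
```

Registering a priority -10 entry first and two priority 0 entries afterwards
still yields the two default entries first and the -10 entry last.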



^ permalink raw reply related

* Re: ping -I eth1 ....
From: Joakim Tjernlund @ 2010-11-09 19:33 UTC (permalink / raw)
  Cc: Thomas Graf, Eric Dumazet, netdev
In-Reply-To: <OF921B3329.67FE598A-ONC12577D3.0033387A-C12577D3.00355AC4@LocalDomain>

Joakim Tjernlund/Transmode wrote on 2010/11/06 10:42:46:
> Thomas Graf <tgr@infradead.org> wrote on 2010/11/05 21:31:50:
> >
> > On Fri, Nov 05, 2010 at 04:54:18PM +0100, Joakim Tjernlund wrote:
> > > Eric Dumazet <eric.dumazet@gmail.com> wrote on 2010/11/05 16:06:54:
> > > >
> > > > > Hopefully most of that is legacy or just plain wrong? Unless
> > > > > someone can say why only test IFF_UP one should consider changing them.
> > > > >
> > > >
> > > > Most of the places are hot path.
> > > >
> > > > You dont want to replace one test by four tests.
> > > >
> > > > _This_ would be wrong :)
> > >
> > > Wrong is wrong, even if it is in the hot path :)
> > > > Perhaps it is time to define an internal IFF_OPERATIONAL flag
> > > > which is the sum of IFF_UP, IFF_RUNNING etc.? That
> > > way you still get one test in the hot path and can abstract
> > > what defines an operational link.
> >
> > You definitely don't want to have your send() call fail simply because
> > the carrier was off for a few msec or the routing daemon has put a link
> > > down temporarily. Also, the outgoing interface looked up at routing
> > > decision is not necessarily the interface used for sending in the end.
> > The packet may get mangled and rerouted by netfilter or tc on the way.
>
> But do you handle the case when the link is non operational for a long time?
>
> >
> > Personally I'm even ok with the current behaviour of sendto() while the
> > socket is bound to an interface but if we choose to return an error
> > if the interface is down we might as well do so based on the operational
> > status.

> Perhaps there is a better way. This all started when pppd hung because
> of ping -I <ppp interface>, then someone pulled the cable on the link.
>
> This is a strace where we have two ping -I,
> ping -I p1-2-1-2-2 .. and ping -I p1-2-3-2-4 ..
> Notice how pppd hangs for a long time in PPPIOCDETACH
> As far as I can tell this is due to ping -I having claimed the ppp interfaces
> and not noticing that the link is down. Ideally ping should receive
> an ENODEV as soon as pppd calls PPPIOCDETACH.
>
>    0.000908 write(0, "Connection terminated.\n", 23) = 23
>      0.000481 gettimeofday({1288952770, 566048}, NULL) = 0
>      0.001553 ioctl(7, PPPIOCDETACH
> Message from syslogd@Brazil at Fri Nov  5 11:26:20 2010 ...
> Brazil kernel: unregister_netdevice: waiting for p1-2-1-2-2 to become free. Usage count = 3
> Message from syslogd@Brazil at Fri Nov  5 11:26:20 2010 ...
> Brazil kernel: unregister_netdevice: waiting for p1-2-3-2-4 to become free. Usage count = 3
> Message from syslogd@Brazil at Fri Nov  5 11:26:51 2010 ...
> Brazil last message repeated 3 times
> , 0xbfbc3398) = 0
>     66.559216 connect(9, {sa_family=AF_PPPOX, sa_data="\0\0\0\0\0\0\0\252\273\314\335\356hd"}, 30) = 0
>      0.000693 close(10)                 = 0
>      0.000449 close(7)                  = 0
>      0.009801 close(9)                  = 0

Any comment on this last strace? Is it expected that ping -I should
hold pppd hostage?
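
On the IFF_OPERATIONAL idea quoted earlier in this mail, a sketch of what
"one test in the hot path" could mean: a combined mask compared in a single
expression.  The MOCK_* names and the choice of bits are assumptions for
illustration; no such kernel flag exists in this thread.

```c
#define MOCK_IFF_UP		0x1u
#define MOCK_IFF_RUNNING	0x40u
#define MOCK_IFF_OPERATIONAL	(MOCK_IFF_UP | MOCK_IFF_RUNNING)

/* One mask-and-compare, no matter how many bits define "operational". */
static int mock_link_operational(unsigned int flags)
{
	return (flags & MOCK_IFF_OPERATIONAL) == MOCK_IFF_OPERATIONAL;
}
```

Changing what counts as operational then only touches the mask definition,
not the call sites.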

 Jocke


^ permalink raw reply

* Re: [PATCH] Enhance AF_PACKET implementation to not require high order contiguous memory allocation (v2)
From: Eric Dumazet @ 2010-11-09 19:20 UTC (permalink / raw)
  To: Neil Horman; +Cc: netdev, davem, zenczykowski
In-Reply-To: <20101109183820.GA8069@hmsreliant.think-freely.org>

Le mardi 09 novembre 2010 à 13:38 -0500, Neil Horman a écrit :
> On Tue, Nov 09, 2010 at 07:02:32PM +0100, Eric Dumazet wrote:
> > On Tuesday, 09 November 2010 at 12:46 -0500, nhorman@tuxdriver.com wrote:
> > > -static char **alloc_pg_vec(struct tpacket_req *req, int order)
> > > +static struct pgv *alloc_pg_vec(struct tpacket_req *req, int order)
> > >  {
> > >  	unsigned int block_nr = req->tp_block_nr;
> > > -	char **pg_vec;
> > > +	struct pgv *pg_vec;
> > >  	int i;
> > >  
> > > -	pg_vec = kzalloc(block_nr * sizeof(char *), GFP_KERNEL);
> > > +	pg_vec = kzalloc(block_nr * sizeof(struct pgv), GFP_KERNEL);
> > 
> > While we are at it, we could check block_nr being a sane value here ;)
> > 
> This is true.  What do you think a reasonable sane value is?  libpcap seems to
> limit itself to 32 order 5 entries in the ring, but that seems a bit arbitrary.
> Perhaps we could check and limit allocations to being no more than order 8
> (1 MB), and a total allocation of no more than perhaps max(32 MB, 1% of all RAM)?
> Just throwing it out there, open to any suggestions here.

I was referring to a malicious/buggy program giving a big tp_block_nr so
that (block_nr * sizeof(struct pgv)) overflows the u32.

One way to deal with that is to use

	kcalloc(block_nr, sizeof(struct pgv), GFP_KERNEL);

I am not sure consistency checks done in packet_set_ring() are enough to
properly detect such errors.
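
A minimal userspace sketch of the overflow concern described above, not kernel code. It shows how an unchecked 32-bit multiply can wrap to a tiny size, and how a checked allocation refuses such a request, which is comparable to the check kcalloc() performs before allocating. The names struct pgv_example, wrapped_size_u32(), mul_overflows(), checked_calloc() and overflow_demo() are hypothetical and exist only for illustration; the 16-byte element size used in the demo is an assumption about a 64-bit layout.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the patch's struct pgv (layout assumed). */
struct pgv_example {
	char *buffer;
	unsigned char flags;
};

/* What an unchecked 32-bit multiply yields: the product modulo 2^32. */
uint32_t wrapped_size_u32(uint32_t nmemb, uint32_t size)
{
	return (uint32_t)((uint64_t)nmemb * size);
}

/* Returns 1 if nmemb * size cannot be represented in a size_t. */
int mul_overflows(size_t nmemb, size_t size)
{
	return size != 0 && nmemb > SIZE_MAX / size;
}

/* Userspace analogue of a checked, zeroing array allocation. */
void *checked_calloc(size_t nmemb, size_t size)
{
	if (mul_overflows(nmemb, size))
		return NULL;	/* refuse rather than allocate a short buffer */
	return calloc(nmemb, size);
}

/* Illustration: a huge block count wraps to a tiny allocation size. */
void overflow_demo(void)
{
	/* 0x10000001 * 16 = 0x100000010, which truncates to 0x10 in 32 bits */
	assert(wrapped_size_u32(0x10000001u, 16) == 16);
	assert(checked_calloc(SIZE_MAX, sizeof(struct pgv_example)) == NULL);
}
```

With the unchecked form, a caller asking for 0x10000001 blocks would get a 16-byte vector and then index far beyond it; the checked form fails the request up front.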






* Re: Networking hangs when too many parallel requests are made at once
From: Ben Greear @ 2010-11-09 19:16 UTC (permalink / raw)
  To: Luke Hutchison; +Cc: netdev
In-Reply-To: <AANLkTinRo_ozWFCYYqanOh0YrTJ6+KadutG2j7T726dY@mail.gmail.com>

On 11/09/2010 11:04 AM, Luke Hutchison wrote:
> On Tue, Nov 9, 2010 at 1:30 PM, Luke Hutchison<luke.hutch@gmail.com>  wrote:
>> Since around Linux kernel 2.6.33 or so (but maybe as early as
>> 2.6.31, not sure exactly what version), when restoring a crashed or
>> closed browser session of either Firefox or Chrome where lots of tabs
>> (say 10-40) open simultaneously, the networking stack is brought to
>> its knees -- most or all the tabs eventually time out without data, or
>> a few tabs might get some data and then display a partial web page.

Have you been able to reproduce this on any other machine?  I suspect
it might be an issue with your specific NIC or other hardware.

At the least, it's not a general problem with opening lots
of TCP connections, as we routinely test with thousands...

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com



* Re: Networking hangs when too many parallel requests are made at once
From: Luke Hutchison @ 2010-11-09 19:04 UTC (permalink / raw)
  To: netdev
In-Reply-To: <AANLkTikuCj=saH34dyO6xnU-y8rWv63hJaqowZ0ACLzk@mail.gmail.com>

On Tue, Nov 9, 2010 at 1:30 PM, Luke Hutchison <luke.hutch@gmail.com> wrote:
> Since around Linux kernel 2.6.33 or so (but maybe as early as
> 2.6.31, not sure exactly what version), when restoring a crashed or
> closed browser session of either Firefox or Chrome where lots of tabs
> (say 10-40) open simultaneously, the networking stack is brought to
> its knees -- most or all the tabs eventually time out without data, or
> a few tabs might get some data and then display a partial web page.

I forgot to mention, I have glibc-2.12.90-18.x86_64.

Also the following screenshot may be useful

http://web.mit.edu/~luke_h/www/dns-hang-problem.png

Basically in the usage depicted by the screenshot, I had Chrome open
with probably 30-50 tabs across several windows, I then started an scp
transfer of a large file and waited for it to stabilize, then closed
the browser and re-opened it, restoring the tabs.  Within a second or
two (after the first few lucky browser tabs got some content), DNS
hung, and pinging a domain name from the command line no longer worked
(ruling out a bug in the browser itself).  However the scp transfer
continued at the same rate, and pinging an IP address directly
continued to work fine (in this case; at other times network
connections to already-resolved IP addresses can seem flaky I think,
but I haven't been able to reproduce these problems as easily as with
DNS, which has 100% reproducibility).  You can see that CPU usage
dropped from 100% to something like 50% when the browser tabs all
started blocking (but actually I'm surprised that CPU usage didn't
drop to zero).  In this instance, as soon as I shut down the browser,
pinging a domain name worked immediately again (although, as I
mentioned previously, sometimes it can take a minute or more after
killing the browser for name resolution to jump back into working
mode).  "ifdown eth0 ; ifup eth0" *usually* fixes the problem by
canceling all pending requests.

From one of the RH engineers:

> It's not a driver issue, since it occurs
> with two different devices... it's not a configuration issue since it
> occurs on a LiveCD... For the same reason it's unlikely to be a
> userspace issue... It's unlikely to be a local network issue since you
> say it happens in multiple locations...
>
> Absolutely bizarre. :/

Any help greatly appreciated.

Thanks,
Luke


* Re: alloc_netdev_mq() and multiqueues
From: David Miller @ 2010-11-09 18:42 UTC (permalink / raw)
  To: wkevils; +Cc: netdev
In-Reply-To: <AANLkTi=8PrpAB+ARt-H3niNjnmQgHF1Ass345e95_pdq@mail.gmail.com>

From: Kevin Wilson <wkevils@gmail.com>
Date: Tue, 9 Nov 2010 20:33:53 +0200

> I have a short question about multiqueues  and I will appreciate if
> somebody can answer shortly in 2-3 sentences.

When you want help from volunteers, and then you dictate exactly how
people should give you help, you usually receive no help at all.

Just FYI..


* Re: [PATCH] Enhance AF_PACKET implementation to not require high order contiguous memory allocation (v2)
From: Neil Horman @ 2010-11-09 18:38 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, davem, zenczykowski
In-Reply-To: <1289325753.2774.20.camel@edumazet-laptop>

On Tue, Nov 09, 2010 at 07:02:32PM +0100, Eric Dumazet wrote:
> On Tuesday, 09 November 2010 at 12:46 -0500, nhorman@tuxdriver.com wrote:
> > From: Neil Horman <nhorman@tuxdriver.com>
> > 
> > Version 2 of this patch.  Sorry it's been a while, but I've had other issues come
> > up.  I've rewritten this patch taking into account the change notes from the
> > last round.
> > 
> > Change notes:
> > 1) Rewrote the patch to not do all my previous stupid vmaps on single page
> > arrays.  Instead we just use vmalloc now.
> > 
> > 2) Checked into the behavior of tcpdump on high order allocations in the failing
> > case.  Tcpdump (more specifically I think libpcap) does attempt to minimize the
> > allocation order on the ring buffer, unfortunately the lowest it will push the
> > order to, given a sufficiently large buffer, is order 5, so it's still a very
> > large contiguous allocation, and that says nothing of other malicious
> > applications that might try to do more.
> > 
> > Summary:
> > It was shown to me recently that systems under high load were driven very deep
> > into swap when tcpdump was run.  The reason this happened was because the
> > AF_PACKET protocol has a SET_RINGBUFFER socket option that allows the user space
> > application to specify how many entries an AF_PACKET socket will have and how
> > large each entry will be.  It seems the default setting for tcpdump is to set
> > the ring buffer to 32 entries of 64 Kb each, which implies 32 order 5
> > allocations.  That's difficult under good circumstances, and horrid under memory
> > pressure.
> > 
> > I thought it would be good to make that a bit more usable.  I was going to do a
> > simple conversion of the ring buffer from contiguous pages to iovecs, but
> > unfortunately, the metadata which AF_PACKET places in these buffers can easily
> > span a page boundary, and given that these buffers get mapped into user space,
> > and the data layout doesn't easily allow for a change to padding between frames
> > to avoid that, a simple iovec change is just going to break user space ABI
> > consistency.
> > 
> > So I've done this, I've added a three tiered mechanism to the af_packet set_ring
> > socket option.  It attempts to allocate memory in the following order:
> > 
> > 1) Using __get_free_pages with GFP_NORETRY set, so as to fail quickly without
> > digging into swap
> > 
> > 2) Using vmalloc
> > 
> > 3) Using __get_free_pages with GFP_NORETRY clear, causing us to try as hard as
> > needed to get the memory
> > 
> > The effect is that we don't disturb the system as much when we're under load,
> > while still being able to conduct tcpdumps effectively.
> > 
> > Tested successfully by me.
> > 
> > Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> > ---
> >  net/packet/af_packet.c |   84 ++++++++++++++++++++++++++++++++++++++---------
> >  1 files changed, 68 insertions(+), 16 deletions(-)
> > 
> > diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
> > index 3616f27..d1148ac 100644
> > --- a/net/packet/af_packet.c
> > +++ b/net/packet/af_packet.c
> > @@ -163,8 +163,14 @@ struct packet_mreq_max {
> >  static int packet_set_ring(struct sock *sk, struct tpacket_req *req,
> >  		int closing, int tx_ring);
> >  
> > +#define PGV_FROM_VMALLOC 1
> > +struct pgv {
> > +	char *buffer;
> > +	unsigned char flags;
> > +};
> > +
> >  struct packet_ring_buffer {
> > -	char			**pg_vec;
> > +	struct pgv		*pg_vec;
> >  	unsigned int		head;
> >  	unsigned int		frames_per_block;
> >  	unsigned int		frame_size;
> > @@ -283,7 +289,8 @@ static void *packet_lookup_frame(struct packet_sock *po,
> >  	pg_vec_pos = position / rb->frames_per_block;
> >  	frame_offset = position % rb->frames_per_block;
> >  
> > -	h.raw = rb->pg_vec[pg_vec_pos] + (frame_offset * rb->frame_size);
> > +	h.raw = rb->pg_vec[pg_vec_pos].buffer +
> > +		(frame_offset * rb->frame_size);
> >  
> >  	if (status != __packet_get_status(po, h.raw))
> >  		return NULL;
> > @@ -2322,37 +2329,74 @@ static const struct vm_operations_struct packet_mmap_ops = {
> >  	.close	=	packet_mm_close,
> >  };
> >  
> > -static void free_pg_vec(char **pg_vec, unsigned int order, unsigned int len)
> > +static void free_pg_vec(struct pgv *pg_vec, unsigned int order,
> > +			unsigned int len)
> >  {
> >  	int i;
> >  
> >  	for (i = 0; i < len; i++) {
> > -		if (likely(pg_vec[i]))
> > -			free_pages((unsigned long) pg_vec[i], order);
> > +		if (likely(pg_vec[i].buffer)) {
> > +			if (pg_vec[i].flags & PGV_FROM_VMALLOC)
> > +				vfree(pg_vec[i].buffer);
> > +			else
> > +				free_pages((unsigned long)pg_vec[i].buffer,
> > +					   order);
> > +			pg_vec[i].buffer = NULL;
> > +		}
> >  	}
> >  	kfree(pg_vec);
> >  }
> >  
> > -static inline char *alloc_one_pg_vec_page(unsigned long order)
> > +static inline char *alloc_one_pg_vec_page(unsigned long order,
> > +					  unsigned char *flags)
> >  {
> > -	gfp_t gfp_flags = GFP_KERNEL | __GFP_COMP | __GFP_ZERO | __GFP_NOWARN;
> > +	char *buffer = NULL;
> > +	gfp_t gfp_flags = GFP_KERNEL | __GFP_COMP |
> > +			  __GFP_ZERO | __GFP_NOWARN | __GFP_NORETRY;
> > +
> > +	buffer = (char *) __get_free_pages(gfp_flags, order);
> > +
> > +	if (buffer)
> > +		return buffer;
> > +
> > +	/*
> > +	 * __get_free_pages failed, fall back to vmalloc
> > +	 */
> > +	*flags |= PGV_FROM_VMALLOC;
> > +	buffer = vmalloc((1 << order) * PAGE_SIZE);
> >  
> > -	return (char *) __get_free_pages(gfp_flags, order);
> > +	if (buffer)
> > +		return buffer;
> > +
> > +	/*
> > +	 * vmalloc failed, lets dig into swap here
> > +	 */
> > +	*flags = 0;
> > +	gfp_flags &= ~__GFP_NORETRY;
> > +	buffer = (char *)__get_free_pages(gfp_flags, order);
> > +	if (buffer)
> > +		return buffer;
> > +
> > +	/*
> > +	 * complete and utter failure
> > +	 */
> > +	return NULL;
> >  }
> >  
> > -static char **alloc_pg_vec(struct tpacket_req *req, int order)
> > +static struct pgv *alloc_pg_vec(struct tpacket_req *req, int order)
> >  {
> >  	unsigned int block_nr = req->tp_block_nr;
> > -	char **pg_vec;
> > +	struct pgv *pg_vec;
> >  	int i;
> >  
> > -	pg_vec = kzalloc(block_nr * sizeof(char *), GFP_KERNEL);
> > +	pg_vec = kzalloc(block_nr * sizeof(struct pgv), GFP_KERNEL);
> 
> While we are at it, we could check block_nr being a sane value here ;)
> 
This is true.  What do you think a reasonable sane value is?  libpcap seems to
limit itself to 32 order 5 entries in the ring, but that seems a bit arbitrary.
Perhaps we could check and limit allocations to being no more than order 8
(1 MB), and a total allocation of no more than perhaps max(32 MB, 1% of all RAM)?
Just throwing it out there, open to any suggestions here.
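
A rough userspace sketch of the limits floated above, offered only as an illustration and not as anything the patch implements. It assumes 4 KiB pages (EXAMPLE_PAGE_SIZE), and every name in it (example_get_order(), example_total_limit(), example_ring_request_ok(), limits_demo()) is hypothetical. Under that page-size assumption, order 8 is 1 MiB, order 5 is 128 KiB, and a 64 KiB block works out to order 4.

```c
#include <assert.h>

#define EXAMPLE_PAGE_SIZE 4096UL	/* assumption: 4 KiB pages */
#define EXAMPLE_MAX_ORDER 8		/* per-block cap suggested above */

/* Smallest order such that (page size << order) >= size. */
unsigned int example_get_order(unsigned long size)
{
	unsigned long pages;
	unsigned int order = 0;

	if (size == 0)
		return 0;
	/* Count the bits needed to represent the last page index. */
	for (pages = (size - 1) / EXAMPLE_PAGE_SIZE; pages; pages >>= 1)
		order++;
	return order;
}

/* Total ring budget: max(32 MiB, 1% of RAM). */
unsigned long long example_total_limit(unsigned long long ram_bytes)
{
	unsigned long long floor_bytes = 32ULL << 20;
	unsigned long long one_percent = ram_bytes / 100;

	return one_percent > floor_bytes ? one_percent : floor_bytes;
}

/* Returns 1 if the request respects both the order cap and the budget. */
int example_ring_request_ok(unsigned long block_size, unsigned int block_nr,
			    unsigned long long ram_bytes)
{
	if (block_size == 0 || block_nr == 0)
		return 0;
	if (example_get_order(block_size) > EXAMPLE_MAX_ORDER)
		return 0;
	/* Divide instead of multiplying so the check itself cannot overflow. */
	return block_nr <= example_total_limit(ram_bytes) / block_size;
}

/* Illustration on a machine assumed to have 1 GiB of RAM. */
void limits_demo(void)
{
	assert(example_get_order(64UL * 1024) == 4);
	assert(example_ring_request_ok(64UL * 1024, 32, 1ULL << 30) == 1);
}
```

The budget comparison divides the limit by the block size instead of multiplying block size by block count, so a hostile block count cannot wrap the check.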

> Nice stuff Neil !
> 
Thanks!
Neil

> 
> 


* Re: alloc_netdev_mq() and multiqueues
From: Ben Hutchings @ 2010-11-09 18:40 UTC (permalink / raw)
  To: Kevin Wilson; +Cc: netdev
In-Reply-To: <AANLkTi=8PrpAB+ARt-H3niNjnmQgHF1Ass345e95_pdq@mail.gmail.com>

On Tue, 2010-11-09 at 20:33 +0200, Kevin Wilson wrote:
> Hello,
> I have a short question about multiqueues  and I will appreciate if
> somebody can answer shortly in 2-3 sentences.
> When talking about multiqueues  I refer for example, to
> http://nfws.edenwall.com/nfws_userday/David-Miller_Linux-Multiqueue-Networking.pdf,
> alloc_netdev_mq() and friends.
> 
> 1) Can ordinary network driver code be adjusted to use
> multiqueues, or do we need some
> special hardware feature?

This feature is only useful if the hardware has multiple transmit
queues.

> 2) How can I know if a certain device supports multiqueues?

Read the hardware specs.

> 3) Can anybody name some network cards which support  multiqueues?

'git grep -l dev_mq drivers/net' will show you which drivers do.  I have
no knowledge beyond this of which hardware has multiple queues.

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.



* alloc_netdev_mq() and multiqueues
From: Kevin Wilson @ 2010-11-09 18:33 UTC (permalink / raw)
  To: netdev

Hello,
I have a short question about multiqueues  and I will appreciate if
somebody can answer shortly in 2-3 sentences.
When talking about multiqueues  I refer for example, to
http://nfws.edenwall.com/nfws_userday/David-Miller_Linux-Multiqueue-Networking.pdf,
alloc_netdev_mq() and friends.

1) Can ordinary network driver code be adjusted to use
multiqueues, or do we need some
special hardware feature?
2) How can I know if a certain device supports multiqueues?

3) Can anybody name some network cards which support  multiqueues?

Regards,
Kevin
