Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH net-next V1 00/14] net/mlx4_en: Optimizations to TX flow
From: Eric Dumazet @ 2014-10-06 12:51 UTC (permalink / raw)
  To: Amir Vadai
  Cc: David S. Miller, Eric Dumazet, netdev, Yevgeny Petrilin,
	Or Gerlitz, Ido Shamay
In-Reply-To: <1412576163-7224-1-git-send-email-amirv@mellanox.com>

On Mon, 2014-10-06 at 09:15 +0300, Amir Vadai wrote:

> Changes from V0:
> - Patch 14/14 ("Use the new tx_copybreak to set inline threshold"):
>   - Use same coding convention as currently is in en_ethtool.c
> - Patch 1/14 ("Code cleanups in tx path") and Patch 9/14 ("Use local var in
>   tx flow for skb_shinfo(skb)"):
>   - local var shinfo was used by mistake in Patch 1/14 while declared at 9/14.
>     Fixed it for the sake of future bisections

Hmm... patches were already merged. I am afraid you need to provide one
patch on top of current net-next, if still needed. Too bad for the
bisection.

Thanks !

^ permalink raw reply

* Re: [PATCH net-next V1 00/14] net/mlx4_en: Optimizations to TX flow
From: Amir Vadai @ 2014-10-06 12:54 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S. Miller, Eric Dumazet, netdev, Yevgeny Petrilin,
	Or Gerlitz, Ido Shamay
In-Reply-To: <1412599904.11091.66.camel@edumazet-glaptop2.roam.corp.google.com>

On 10/6/2014 3:51 PM, Eric Dumazet wrote:
> On Mon, 2014-10-06 at 09:15 +0300, Amir Vadai wrote:
> 
>> Changes from V0:
>> - Patch 14/14 ("Use the new tx_copybreak to set inline threshold"):
>>   - Use same coding convention as currently is in en_ethtool.c
>> - Patch 1/14 ("Code cleanups in tx path") and Patch 9/14 ("Use local var in
>>   tx flow for skb_shinfo(skb)"):
>>   - local var shinfo was used by mistake in Patch 1/14 while declared at 9/14.
>>     Fixed it for the sake of future bisections
> 
> Hmm... patches were already merged. I am afraid you need to provide one
> patch on top of current net-next, if still needed. Too bad for the
> bisection.
> 
> Thanks !
> 
> 

Yeh, I missed it somehow. Sorry for the spam.

Amir

^ permalink raw reply

* Re: macvlan: optimizing the receive path?
From: Vlad Yasevich @ 2014-10-06 13:04 UTC (permalink / raw)
  To: David Miller, jbaron; +Cc: netdev, kaber
In-Reply-To: <20141004.204203.2211720828886085354.davem@davemloft.net>

On 10/04/2014 08:42 PM, David Miller wrote:
> From: Jason Baron <jbaron@akamai.com>
> Date: Thu, 02 Oct 2014 16:28:13 -0400
> 
>> --- a/drivers/net/macvlan.c
>> +++ b/drivers/net/macvlan.c
>> @@ -321,8 +321,8 @@ static rx_handler_result_t macvlan_handle_frame(struct sk_buff **pskb)
>>         skb->dev = dev;
>>         skb->pkt_type = PACKET_HOST;
>>  
>> -       ret = netif_rx(skb);
>> -
>> +      macvlan_count_rx(vlan, len, true, 0);
>> +      return RX_HANDLER_ANOTHER;
>>  out:
>>         macvlan_count_rx(vlan, len, ret == NET_RX_SUCCESS, 0);
>>         return RX_HANDLER_CONSUMED;
> 
> That last argument to macvlan_count_rx() is a bool and thus should be
> specified as "false".  Yes I know other areas of this file get it
> wrong too.
> 
> Also, what about GRO?  Won't we get GRO processing if we do this via
> netif_rx() but not via the RX_HANDLER_ANOTHER route?  Just curious...

Wouldn't GRO already happen at the lower level?  For macvlan-to-macvlan,
you'd typically have large packets so no need for GRO.

-vlad

^ permalink raw reply

* Re: [PATCH] drivers/net/phy/Kconfig: Let MDIO_BCM_UNIMAC depend on HAS_IOMEM
From: Chen Gang @ 2014-10-06 13:05 UTC (permalink / raw)
  To: David Miller; +Cc: f.fainelli, netdev, linux-kernel, richard
In-Reply-To: <20141006.004646.685997864382345892.davem@davemloft.net>

On 10/6/14 12:46, David Miller wrote:
> From: Chen Gang <gang.chen.5i5j@gmail.com>
> Date: Sat, 04 Oct 2014 17:54:33 +0800
> 
>> MDIO_BCM_UNIMAC needs HAS_IOMEM, so depend on it, the related error (
>> with allmodconfig under um):
>>
>>     MODPOST 1205 modules
>>   ERROR: "devm_ioremap" [drivers/net/phy/mdio-bcm-unimac.ko] undefined!
>>
>> Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>
> 
> Applied, thanks.
> 

Thank you for your work.

-- 
Chen Gang

Open, share, and attitude like air, water, and life which God blessed

^ permalink raw reply

* Re: [PATCH] sctp: handle association restarts when the socket is closed.
From: Vlad Yasevich @ 2014-10-06 13:08 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20141006.002229.1370132762577292096.davem@davemloft.net>

On 10/06/2014 12:22 AM, David Miller wrote:
> From: Vladislav Yasevich <vyasevich@gmail.com>
> Date: Fri,  3 Oct 2014 18:16:20 -0400
> 
>> From: Vlad Yasevich <vyasevich@gmail.com>
>>
>> Currently association restarts do not take into consideration the
>> state of the socket.  When a restart happens, the current assocation
>> simply transitions into established state.  This creates a condition
>> where a remote system, through a the restart procedure, may create a
>> local association that is no way reachable by user.  The conditions
>> to trigger this are as follows:
>>   1) Remote does not acknoledge some data causing data to remain
>>      outstanding.
>>   2) Local application calls close() on the socket.  Since data
>>      is still outstanding, the association is placed in SHUTDOWN_PENDING
>>      state.  However, the socket is closed.
>>   3) The remote tries to create a new association, triggering a restart
>>      on the local system.  The association moves from SHUTDOWN_PENDING
>>      to ESTABLISHED.  At this point, it is no longer reachable by
>>      any socket on the local system.
>>
>> This patch addresses the above situation by moving the newly ESTABLISHED
>> association into SHUTDOWN-SENT state and bundling a SHUTDOWN after
>> the COOKIE-ACK chunk.  This way, the restarted associate immidiately
>> enters the shutdown procedure and forces the termination of the
>> unreachable association.
>>
>> Reported-by: David Laight <David.Laight@aculab.com>
>> Signed-off-by: Vlad Yasevich <vyasevich@gmail.com>
> 
> Applied, thanks.
> 
> Candidate for -stable?
> 

I say yes.  If this problem is triggered it is total pain to get the memory
clean-up.

-vlad

^ permalink raw reply

* RE: [PATCH v2 net-next 15/15] tipc: remove old ASCII netlink API
From: Jon Maloy @ 2014-10-06 13:37 UTC (permalink / raw)
  To: David Miller, Richard Alpe
  Cc: netdev@vger.kernel.org, tipc-discussion@lists.sourceforge.net
In-Reply-To: <20141003.165038.2121547250200790300.davem@davemloft.net>



> -----Original Message-----
> From: netdev-owner@vger.kernel.org [mailto:netdev-
> owner@vger.kernel.org] On Behalf Of David Miller
> Sent: October-03-14 7:51 PM
> To: Richard Alpe
> Cc: netdev@vger.kernel.org; tipc-discussion@lists.sourceforge.net
> Subject: Re: [PATCH v2 net-next 15/15] tipc: remove old ASCII netlink API
> 
> From: <richard.alpe@ericsson.com>
> Date: Thu, 2 Oct 2014 16:58:41 +0200
> 
> > From: Richard Alpe <richard.alpe@ericsson.com>
> >
> > The API has been deprecated along with its user-space tool
> > "tipc-config". Users shall use the new kernel netlink API already in
> > place along with the new user space tool "tipc" that's part of the
> > tipc-utils package.
> >
> > Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
> > Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
> > Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
> > Acked-by: Ying Xue <ying.xue@windriver.com>
> 
> Sorry, no matter what your circumstances, you cannot just break binaries
> that might be out there.

I disagree.  We did of course consider this before we posted the patch.  
There is absolutely no reason to believe there are binaries "out there" 
using this API. It is intended for and used only by our own tool "tipc-config", 
and the half-baked documentation that exists about it is not even
correct. This is an TIPC-internal API/ABI, which has changed over the years,
and keeps changing.  To our best judgement there is no risk in removing the
old API.

However, we took great care to not break any scripts that might be using 
tipc-config. Therefore, we made this wrapper that provides the old 
tipc-config command API to existing users.

Regards
///jon
> 
> The rest of this patch series is fine, but I'm really not going to even entertain
> applying this one, sorry.
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in the body
> of a message to majordomo@vger.kernel.org More majordomo info at
> http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH net-next] qdisc: validate skb without holding lock
From: Jesper Dangaard Brouer @ 2014-10-06 14:12 UTC (permalink / raw)
  To: David Miller
  Cc: eric.dumazet, netdev, therbert, hannes, fw, dborkman, jhs,
	alexander.duyck, john.r.fastabend, dave.taht, toke, brouer
In-Reply-To: <20141003.153645.72976986956341944.davem@davemloft.net>

On Fri, 03 Oct 2014 15:36:45 -0700 (PDT)
David Miller <davem@davemloft.net> wrote:

> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Fri, 03 Oct 2014 15:31:07 -0700
> 
> > From: Eric Dumazet <edumazet@google.com>
> > 
> > Validation of skb can be pretty expensive :
> > 
> > GSO segmentation and/or checksum computations.
> > 
> > We can do this without holding qdisc lock, so that other cpus
> > can queue additional packets.
> > 
[...]
> > 
> > Turning TSO on or off had no effect on throughput, only few more cpu
> > cycles. Lock contention on qdisc lock disappeared.

This is good work! Lock contention significantly reduced!

My 10G tests just 2x 10G netperf TCP_STREAM shows:

 With GSO=off TSO=off, _raw_spin_lock is now only at perf top#13 with
1.44% (80% from qdisc calls, 60% is from __dev_queue_xmit and 20% from
sch_direct_xmit) (before with qdisc bulking is was 2.66%).

 The "show off" case is GSO=on TSO=off, where raw_spin_lock is now only
at perf top#26 with 0.85% and only 54.74% comes from qdisc calls
(52.07% sch_direct_xmit and 2.67% __dev_queue_xmit).


This is some significant improvements to the kernels xmit layer,
me very happy!!! :-))) Thanks everyone!

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Sr. Network Kernel Developer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply

* Re: [PATCH 16/16] virtio_net: fix use after free on allocation failure
From: Cornelia Huck @ 2014-10-06 14:17 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: netdev, linux-kernel, virtualization
In-Reply-To: <1412525038-15871-17-git-send-email-mst@redhat.com>

On Sun, 5 Oct 2014 19:07:38 +0300
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> In the extremely unlikely event that driver initialization fails after
> RX buffers are added, virtio net frees RX buffers while VQs are
> still active, potentially causing device to use a freed buffer.
> 
> To fix, reset device first - same as we do on device removal.
> 
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> ---
>  drivers/net/virtio_net.c | 2 ++
>  1 file changed, 2 insertions(+)

Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>

^ permalink raw reply

* [PATCH linux v2 0/1] Optimize network interfaces creation
From: Nicolas Dichtel @ 2014-10-06 14:30 UTC (permalink / raw)
  To: netdev, linux-kernel
  Cc: davem, ebiederm, akpm, adobriyan, rui.xiang, viro, oleg, gorcunov,
	kirill.shutemov, grant.likely, tytso
In-Reply-To: <542EA009.4060009@6wind.com>

When a lot of netdevices are created, one of the bottleneck is the creation
of proc entries. This serie aims to accelerate this part.

I'm not sure against which tree this patch should be done. I've done it against
linux.git.

RFCv1 -> v2
 - use a red-black tree instead of a hash list

 fs/proc/generic.c  | 164 ++++++++++++++++++++++++++++++++++-------------------
 fs/proc/internal.h |  11 ++--
 fs/proc/proc_net.c |   1 +
 fs/proc/root.c     |   1 +
 4 files changed, 113 insertions(+), 64 deletions(-)

Comments are welcome.

Regards,
Nicolas

^ permalink raw reply

* [PATCH linux v2 1/1] fs/proc: use a rb tree for the directory entries
From: Nicolas Dichtel @ 2014-10-06 14:30 UTC (permalink / raw)
  To: netdev, linux-kernel
  Cc: davem, ebiederm, akpm, adobriyan, rui.xiang, viro, oleg, gorcunov,
	kirill.shutemov, grant.likely, tytso, Nicolas Dichtel
In-Reply-To: <1412605834-4417-1-git-send-email-nicolas.dichtel@6wind.com>

The current implementation for the directories in /proc is using a single
linked list. This is slow when handling directories with large numbers of
entries (eg netdevice-related entries when lots of tunnels are opened).

This patch replaces this linked list by a red-black tree.

Here are some numbers:

dummy30000.batch contains 30 000 times 'link add type dummy'.

Before the patch:
$ time ip -b dummy30000.batch
real	2m31.950s
user	0m0.440s
sys	2m21.440s
$ time rmmod dummy
real	1m35.764s
user	0m0.000s
sys	1m24.088s

After the patch:
$ time ip -b dummy30000.batch
real	2m0.874s
user	0m0.448s
sys	1m49.720s
$ time rmmod dummy
real	1m13.988s
user	0m0.000s
sys	1m1.008s

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
 fs/proc/generic.c  | 164 ++++++++++++++++++++++++++++++++++-------------------
 fs/proc/internal.h |  11 ++--
 fs/proc/proc_net.c |   1 +
 fs/proc/root.c     |   1 +
 4 files changed, 113 insertions(+), 64 deletions(-)

diff --git a/fs/proc/generic.c b/fs/proc/generic.c
index 317b72641ebf..9f8fa1e5e8aa 100644
--- a/fs/proc/generic.c
+++ b/fs/proc/generic.c
@@ -31,9 +31,81 @@ static DEFINE_SPINLOCK(proc_subdir_lock);
 
 static int proc_match(unsigned int len, const char *name, struct proc_dir_entry *de)
 {
-	if (de->namelen != len)
-		return 0;
-	return !memcmp(name, de->name, len);
+	if (len < de->namelen)
+		return -1;
+	if (len > de->namelen)
+		return 1;
+
+	return memcmp(name, de->name, len);
+}
+
+static struct proc_dir_entry *pde_subdir_first(struct proc_dir_entry *dir)
+{
+	struct rb_node *node = rb_first(&dir->subdir);
+
+	if (node == NULL)
+		return NULL;
+
+	return rb_entry(node, struct proc_dir_entry, subdir_node);
+}
+
+static struct proc_dir_entry *pde_subdir_next(struct proc_dir_entry *dir)
+{
+	struct rb_node *node = rb_next(&dir->subdir_node);
+
+	if (node == NULL)
+		return NULL;
+
+	return rb_entry(node, struct proc_dir_entry, subdir_node);
+}
+
+static struct proc_dir_entry *pde_subdir_find(struct proc_dir_entry *dir,
+					      const char *name,
+					      unsigned int len)
+{
+	struct rb_node *node = dir->subdir.rb_node;
+
+	while (node) {
+		struct proc_dir_entry *de = container_of(node,
+							 struct proc_dir_entry,
+							 subdir_node);
+		int result = proc_match(len, name, de);
+
+		if (result < 0)
+			node = node->rb_left;
+		else if (result > 0)
+			node = node->rb_right;
+		else
+			return de;
+	}
+	return NULL;
+}
+
+static bool pde_subdir_insert(struct proc_dir_entry *dir,
+			      struct proc_dir_entry *de)
+{
+	struct rb_root *root = &dir->subdir;
+	struct rb_node **new = &root->rb_node, *parent = NULL;
+
+	/* Figure out where to put new node */
+	while (*new) {
+		struct proc_dir_entry *this =
+			container_of(*new, struct proc_dir_entry, subdir_node);
+		int result = proc_match(de->namelen, de->name, this);
+
+		parent = *new;
+		if (result < 0)
+			new = &(*new)->rb_left;
+		else if (result > 0)
+			new = &(*new)->rb_right;
+		else
+			return false;
+	}
+
+	/* Add new node and rebalance tree. */
+	rb_link_node(&de->subdir_node, parent, new);
+	rb_insert_color(&de->subdir_node, root);
+	return true;
 }
 
 static int proc_notify_change(struct dentry *dentry, struct iattr *iattr)
@@ -92,10 +164,7 @@ static int __xlate_proc_name(const char *name, struct proc_dir_entry **ret,
 			break;
 
 		len = next - cp;
-		for (de = de->subdir; de ; de = de->next) {
-			if (proc_match(len, cp, de))
-				break;
-		}
+		de = pde_subdir_find(de, cp, len);
 		if (!de) {
 			WARN(1, "name '%s'\n", name);
 			return -ENOENT;
@@ -183,19 +252,16 @@ struct dentry *proc_lookup_de(struct proc_dir_entry *de, struct inode *dir,
 	struct inode *inode;
 
 	spin_lock(&proc_subdir_lock);
-	for (de = de->subdir; de ; de = de->next) {
-		if (de->namelen != dentry->d_name.len)
-			continue;
-		if (!memcmp(dentry->d_name.name, de->name, de->namelen)) {
-			pde_get(de);
-			spin_unlock(&proc_subdir_lock);
-			inode = proc_get_inode(dir->i_sb, de);
-			if (!inode)
-				return ERR_PTR(-ENOMEM);
-			d_set_d_op(dentry, &simple_dentry_operations);
-			d_add(dentry, inode);
-			return NULL;
-		}
+	de = pde_subdir_find(de, dentry->d_name.name, dentry->d_name.len);
+	if (de) {
+		pde_get(de);
+		spin_unlock(&proc_subdir_lock);
+		inode = proc_get_inode(dir->i_sb, de);
+		if (!inode)
+			return ERR_PTR(-ENOMEM);
+		d_set_d_op(dentry, &simple_dentry_operations);
+		d_add(dentry, inode);
+		return NULL;
 	}
 	spin_unlock(&proc_subdir_lock);
 	return ERR_PTR(-ENOENT);
@@ -225,7 +291,7 @@ int proc_readdir_de(struct proc_dir_entry *de, struct file *file,
 		return 0;
 
 	spin_lock(&proc_subdir_lock);
-	de = de->subdir;
+	de = pde_subdir_first(de);
 	i = ctx->pos - 2;
 	for (;;) {
 		if (!de) {
@@ -234,7 +300,7 @@ int proc_readdir_de(struct proc_dir_entry *de, struct file *file,
 		}
 		if (!i)
 			break;
-		de = de->next;
+		de = pde_subdir_next(de);
 		i--;
 	}
 
@@ -249,7 +315,7 @@ int proc_readdir_de(struct proc_dir_entry *de, struct file *file,
 		}
 		spin_lock(&proc_subdir_lock);
 		ctx->pos++;
-		next = de->next;
+		next = pde_subdir_next(de);
 		pde_put(de);
 		de = next;
 	} while (de);
@@ -286,9 +352,8 @@ static const struct inode_operations proc_dir_inode_operations = {
 
 static int proc_register(struct proc_dir_entry * dir, struct proc_dir_entry * dp)
 {
-	struct proc_dir_entry *tmp;
 	int ret;
-	
+
 	ret = proc_alloc_inum(&dp->low_ino);
 	if (ret)
 		return ret;
@@ -308,17 +373,10 @@ static int proc_register(struct proc_dir_entry * dir, struct proc_dir_entry * dp
 	}
 
 	spin_lock(&proc_subdir_lock);
-
-	for (tmp = dir->subdir; tmp; tmp = tmp->next)
-		if (strcmp(tmp->name, dp->name) == 0) {
-			WARN(1, "proc_dir_entry '%s/%s' already registered\n",
-				dir->name, dp->name);
-			break;
-		}
-
-	dp->next = dir->subdir;
 	dp->parent = dir;
-	dir->subdir = dp;
+	if (pde_subdir_insert(dir, dp) == false)
+		WARN(1, "proc_dir_entry '%s/%s' already registered\n",
+		     dir->name, dp->name);
 	spin_unlock(&proc_subdir_lock);
 
 	return 0;
@@ -354,6 +412,7 @@ static struct proc_dir_entry *__proc_create(struct proc_dir_entry **parent,
 	ent->namelen = qstr.len;
 	ent->mode = mode;
 	ent->nlink = nlink;
+	ent->subdir = RB_ROOT;
 	atomic_set(&ent->count, 1);
 	spin_lock_init(&ent->pde_unload_lock);
 	INIT_LIST_HEAD(&ent->pde_openers);
@@ -485,7 +544,6 @@ void pde_put(struct proc_dir_entry *pde)
  */
 void remove_proc_entry(const char *name, struct proc_dir_entry *parent)
 {
-	struct proc_dir_entry **p;
 	struct proc_dir_entry *de = NULL;
 	const char *fn = name;
 	unsigned int len;
@@ -497,14 +555,9 @@ void remove_proc_entry(const char *name, struct proc_dir_entry *parent)
 	}
 	len = strlen(fn);
 
-	for (p = &parent->subdir; *p; p=&(*p)->next ) {
-		if (proc_match(len, fn, *p)) {
-			de = *p;
-			*p = de->next;
-			de->next = NULL;
-			break;
-		}
-	}
+	de = pde_subdir_find(parent, fn, len);
+	if (de)
+		rb_erase(&de->subdir_node, &parent->subdir);
 	spin_unlock(&proc_subdir_lock);
 	if (!de) {
 		WARN(1, "name '%s'\n", name);
@@ -516,16 +569,15 @@ void remove_proc_entry(const char *name, struct proc_dir_entry *parent)
 	if (S_ISDIR(de->mode))
 		parent->nlink--;
 	de->nlink = 0;
-	WARN(de->subdir, "%s: removing non-empty directory "
-			 "'%s/%s', leaking at least '%s'\n", __func__,
-			 de->parent->name, de->name, de->subdir->name);
+	WARN(pde_subdir_first(de),
+	     "%s: removing non-empty directory '%s/%s', leaking at least '%s'\n",
+	     __func__, de->parent->name, de->name, pde_subdir_first(de)->name);
 	pde_put(de);
 }
 EXPORT_SYMBOL(remove_proc_entry);
 
 int remove_proc_subtree(const char *name, struct proc_dir_entry *parent)
 {
-	struct proc_dir_entry **p;
 	struct proc_dir_entry *root = NULL, *de, *next;
 	const char *fn = name;
 	unsigned int len;
@@ -537,24 +589,18 @@ int remove_proc_subtree(const char *name, struct proc_dir_entry *parent)
 	}
 	len = strlen(fn);
 
-	for (p = &parent->subdir; *p; p=&(*p)->next ) {
-		if (proc_match(len, fn, *p)) {
-			root = *p;
-			*p = root->next;
-			root->next = NULL;
-			break;
-		}
-	}
+	root = pde_subdir_find(parent, fn, len);
 	if (!root) {
 		spin_unlock(&proc_subdir_lock);
 		return -ENOENT;
 	}
+	rb_erase(&root->subdir_node, &parent->subdir);
+
 	de = root;
 	while (1) {
-		next = de->subdir;
+		next = pde_subdir_first(de);
 		if (next) {
-			de->subdir = next->next;
-			next->next = NULL;
+			rb_erase(&next->subdir_node, &de->subdir);
 			de = next;
 			continue;
 		}
diff --git a/fs/proc/internal.h b/fs/proc/internal.h
index 7da13e49128a..433557634c1b 100644
--- a/fs/proc/internal.h
+++ b/fs/proc/internal.h
@@ -24,10 +24,9 @@ struct mempolicy;
  * tree) of these proc_dir_entries, so that we can dynamically
  * add new files to /proc.
  *
- * The "next" pointer creates a linked list of one /proc directory,
- * while parent/subdir create the directory structure (every
- * /proc file has a parent, but "subdir" is NULL for all
- * non-directory entries).
+ * parent/subdir are used for the directory structure (every /proc file has a
+ * parent, but "subdir" is empty for all non-directory entries).
+ * subdir_node is used to build the rb tree "subdir" of the parent.
  */
 struct proc_dir_entry {
 	unsigned int low_ino;
@@ -38,7 +37,9 @@ struct proc_dir_entry {
 	loff_t size;
 	const struct inode_operations *proc_iops;
 	const struct file_operations *proc_fops;
-	struct proc_dir_entry *next, *parent, *subdir;
+	struct proc_dir_entry *parent;
+	struct rb_root subdir;
+	struct rb_node subdir_node;
 	void *data;
 	atomic_t count;		/* use count */
 	atomic_t in_use;	/* number of callers into module in progress; */
diff --git a/fs/proc/proc_net.c b/fs/proc/proc_net.c
index a63af3e0a612..1bde894bc624 100644
--- a/fs/proc/proc_net.c
+++ b/fs/proc/proc_net.c
@@ -192,6 +192,7 @@ static __net_init int proc_net_ns_init(struct net *net)
 	if (!netd)
 		goto out;
 
+	netd->subdir = RB_ROOT;
 	netd->data = net;
 	netd->nlink = 2;
 	netd->namelen = 3;
diff --git a/fs/proc/root.c b/fs/proc/root.c
index 094e44d4a6be..4eae849baedd 100644
--- a/fs/proc/root.c
+++ b/fs/proc/root.c
@@ -166,6 +166,7 @@ void __init proc_root_init(void)
 {
 	int err;
 
+	proc_root.subdir = RB_ROOT;
 	proc_init_inodecache();
 	err = register_filesystem(&proc_fs_type);
 	if (err)
-- 
2.1.0

^ permalink raw reply related

* Re: CPSW bug with AM437x SK
From: Felipe Balbi @ 2014-10-06 14:33 UTC (permalink / raw)
  To: Mugunthan V N
  Cc: balbi, davem, netdev, Linux OMAP Mailing List,
	Linux ARM Kernel Mailing List, Linux Kernel Mailing List
In-Reply-To: <543266F3.4090200@ti.com>

[-- Attachment #1: Type: text/plain, Size: 1382 bytes --]

Hi,

On Mon, Oct 06, 2014 at 03:24:59PM +0530, Mugunthan V N wrote:
> On Friday 03 October 2014 06:34 AM, Felipe Balbi wrote:
> > [  261.177168] [<c0648d48>] (skb_panic) from [<c0565edc>] (skb_put+0x5c/0x60)
> > [  261.184415] [<c0565edc>] (skb_put) from [<c0605aac>] (unix_stream_sendmsg+0x164/0x390)
> > [  261.192712] [<c0605aac>] (unix_stream_sendmsg) from [<c055b364>] (sock_aio_write+0xdc/0xfc)
> > [  261.201475] [<c055b364>] (sock_aio_write) from [<c014c42c>] (do_sync_write+0x8c/0xb4)
> > [  261.209697] [<c014c42c>] (do_sync_write) from [<c014cf70>] (vfs_write+0x118/0x1c0)
> > [  261.217652] [<c014cf70>] (vfs_write) from [<c014d564>] (SyS_write+0x4c/0xa0)
> > [  261.225054] [<c014d564>] (SyS_write) from [<c000ed40>] (ret_fast_syscall+0x0/0x48)
> > [  261.232988] Code: e58d4008 e58de00c e59f0008 ebfff48e (e7f001f2) 
> > [  261.239378] ---[ end trace d64258d586f40104 ]---
> 
> The BT shows that the warn came from a unix socket interface, so this
> cannot be a CPSW bug, its a bug in unix socket.
> 
> Are you not seeing this issue with file system in any other media?

Have not tried with other media, but since it comes from vfs_write() and
my rootfs sits in NFS I figured "that might be the cause". Could not
reproduce this with BeagleBone Black, btw.

> I will try to reproduce this locally with my AM437x EVMsk.

alright.

-- 
balbi

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply

* Re: pull request: wireless-next 2014-10-03
From: John W. Linville @ 2014-10-06 15:00 UTC (permalink / raw)
  To: David Miller; +Cc: linux-wireless, netdev, linux-kernel, Larry Finger
In-Reply-To: <20141005.213853.488066614144870837.davem@davemloft.net>

On Sun, Oct 05, 2014 at 09:38:53PM -0400, David Miller wrote:
> From: David Miller <davem@davemloft.net>
> Date: Sun, 05 Oct 2014 21:35:11 -0400 (EDT)
> 
> > From: "John W. Linville" <linville@tuxdriver.com>
> > Date: Fri, 3 Oct 2014 14:01:52 -0400
> > 
> >> Please pull tihs batch of updates intended for the 3.18 stream!
> > 
> > Pulled, thanks for the stellar pull request text, as always.
> 
> John, what's the deal with the following?  Will it be resolved by the
> driver being removed from the staging tree?
> 
> WARNING: drivers/staging/rtl8192ee/r8192ee: 'rtl_evm_dbm_jaguar' exported twice. Previous export was in drivers/net/wireless/rtlwifi/rtlwifi.ko

Yes, exactly.

-- 
John W. Linville		Someday the world will need a hero, and you
linville@tuxdriver.com			might be all we have.  Be ready.

^ permalink raw reply

* Re: [net-next PATCH v1 1/3] net: sched: af_packet support for direct ring access
From: John Fastabend @ 2014-10-06 15:01 UTC (permalink / raw)
  To: Daniel Borkmann, John Fastabend, Jesper Dangaard Brouer,
	John W. Linville, Neil Horman
  Cc: Florian Westphal, gerlitz.or, hannes, netdev, john.ronciak, amirv,
	eric.dumazet, danny.zhou
In-Reply-To: <543265A5.8000606@redhat.com>

On 10/06/2014 02:49 AM, Daniel Borkmann wrote:
> Hi John,
> 
> On 10/06/2014 03:12 AM, John Fastabend wrote:
>> On 10/05/2014 05:29 PM, Florian Westphal wrote:
>>> John Fastabend <john.fastabend@gmail.com> wrote:
>>>> There is one critical difference when running with these interfaces
>>>> vs running without them. In the normal case the af_packet module
>>>> uses a standard descriptor format exported by the af_packet user
>>>> space headers. In this model because we are working directly with
>>>> driver queues the descriptor format maps to the descriptor format
>>>> used by the device. User space applications can learn device
>>>> information from the socket option PACKET_DEV_DESC_INFO which
>>>> should provide enough details to extrapulate the descriptor formats.
>>>> Although this adds some complexity to user space it removes the
>>>> requirement to copy descriptor fields around.
>>>
>>> I find it very disappointing that we seem to have to expose such
>>> hardware specific details to userspace via hw-independent interface.
>>
>> Well it was only for convenience if it doesn't fit as a socket
>> option we can remove it. We can look up the device using the netdev
>> name from the bind call. I see your point though so if there is
>> consensus that this is not needed that is fine.
>>
>>> How big of a cost are we talking about when you say that it 'removes
>>> the requirement to copy descriptor fields'?
>>
>> This was likely a poor description. If you want to let user space
>> poll on the ring (without using system calls or interrupts) then
>> I don't see how you can _not_ expose the ring directly complete with
>> the vendor descriptor formats.
> 
> But how big is the concrete performance degradation you're seeing if you
> use an e.g. `netmap-alike` Linux-own variant as a hw-neutral interface
> that does *not* directly expose hw descriptor formats to user space?

If we don't directly expose the hardware descriptor formats then we
need to somehow kick the driver when we want it to do the copy from
the driver descriptor format to the common descriptor format.

This requires a system call as far as I can tell. Which has unwanted
overhead. I can micro-benchmark this if its helpful. But if we dredge
up Jesper's slides here we are really counting cycles so even small
numbers count if we want to hit line rate in a user space application
with 40Gpbs hardware.

> 
> With 1 core netmap does 10G line-rate on 64b; I don't know their numbers
> on 40G when run on decent hardware though.
> 
> It would really be great if we have something vendor neutral exposed as
> a stable ABI and could leverage emerging infrastructure we already have
> in the kernel such as eBPF and recent qdisc batching for raw sockets
> instead of reinventing the wheels. (Don't get me wrong, I would love to
> see AF_PACKET improved ...)

I don't think the interface is vendor specific. It does require some
knowledge of the hardware descriptor layout though. It is though vendor
neutral from my point of view. I provided the ixgbe patch simple because
I'm most familiar with it and have a NIC here. If someone wants to send me
a Mellanox NIC I can give it a try although I was hoping to recruit Or or
Amir? The only hardware feature required is flow classification to queues
which seems to be common across 10Gbps and 40/100Gbps devices. So most
of the drivers should be able to support this.

If your worried driver writers will implement the interface but not make
their descriptor formats easily available I considered putting the layout
in a header file in the uapi somewhere. Then we could just reject any
implementation that doesn't include the header file needed to use it
from user space.

With regards to leveraging eBPF and qdisc batching I don't see how this
works with direct DMA and polling. Needed to give the lowest overhead
between kernel and user space. In this case we want to use the hardware
to do the filtering that would normally be done for eBPF and for many
use cases the hardware flow classifiers is sufficient. 

We already added a qdisc bypass option I see this as taking this path
further. I believe there is room for a continuum here. For basic cases
use af_packet v1,v2 for mmap rings but using common descriptors use
af_packet v3 and set QOS_BYASS. For absolute lowest overhead and
specific applications that don't need QOS, eBPF use this interface.

Thanks.

> 
> Thanks,
> Daniel
> -- 
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [PATCH v2 07/15] virtio_net: drop config_enable
From: Michael S. Tsirkin @ 2014-10-06 15:11 UTC (permalink / raw)
  To: linux-kernel; +Cc: kvm, Rusty Russell, virtualization, netdev
In-Reply-To: <1412608214-31944-1-git-send-email-mst@redhat.com>

Now that virtio core ensures config changes don't arrive during probing,
drop config_enable flag in virtio net.
On removal, flush is now sufficient to guarantee that no change work is
queued.

This help simplify the driver, and will allow setting DRIVER_OK earlier
without losing config change notifications.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 drivers/net/virtio_net.c | 27 ++++-----------------------
 1 file changed, 4 insertions(+), 23 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 59caa06..743fb04 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -123,9 +123,6 @@ struct virtnet_info {
 	/* Host can handle any s/g split between our header and packet data */
 	bool any_header_sg;
 
-	/* enable config space updates */
-	bool config_enable;
-
 	/* Active statistics */
 	struct virtnet_stats __percpu *stats;
 
@@ -1408,9 +1405,6 @@ static void virtnet_config_changed_work(struct work_struct *work)
 	u16 v;
 
 	mutex_lock(&vi->config_lock);
-	if (!vi->config_enable)
-		goto done;
-
 	if (virtio_cread_feature(vi->vdev, VIRTIO_NET_F_STATUS,
 				 struct virtio_net_config, status, &v) < 0)
 		goto done;
@@ -1758,7 +1752,6 @@ static int virtnet_probe(struct virtio_device *vdev)
 	}
 
 	mutex_init(&vi->config_lock);
-	vi->config_enable = true;
 	INIT_WORK(&vi->config_work, virtnet_config_changed_work);
 
 	/* If we can receive ANY GSO packets, we must allocate large ones. */
@@ -1875,17 +1868,13 @@ static void virtnet_remove(struct virtio_device *vdev)
 
 	unregister_hotcpu_notifier(&vi->nb);
 
-	/* Prevent config work handler from accessing the device. */
-	mutex_lock(&vi->config_lock);
-	vi->config_enable = false;
-	mutex_unlock(&vi->config_lock);
+	/* Make sure no work handler is accessing the device. */
+	flush_work(&vi->config_work);
 
 	unregister_netdev(vi->dev);
 
 	remove_vq_common(vi);
 
-	flush_work(&vi->config_work);
-
 	free_percpu(vi->stats);
 	free_netdev(vi->dev);
 }
@@ -1898,10 +1887,8 @@ static int virtnet_freeze(struct virtio_device *vdev)
 
 	unregister_hotcpu_notifier(&vi->nb);
 
-	/* Prevent config work handler from accessing the device */
-	mutex_lock(&vi->config_lock);
-	vi->config_enable = false;
-	mutex_unlock(&vi->config_lock);
+	/* Make sure no work handler is accessing the device */
+	flush_work(&vi->config_work);
 
 	netif_device_detach(vi->dev);
 	cancel_delayed_work_sync(&vi->refill);
@@ -1916,8 +1903,6 @@ static int virtnet_freeze(struct virtio_device *vdev)
 
 	remove_vq_common(vi);
 
-	flush_work(&vi->config_work);
-
 	return 0;
 }
 
@@ -1941,10 +1926,6 @@ static int virtnet_restore(struct virtio_device *vdev)
 
 	netif_device_attach(vi->dev);
 
-	mutex_lock(&vi->config_lock);
-	vi->config_enable = true;
-	mutex_unlock(&vi->config_lock);
-
 	rtnl_lock();
 	virtnet_set_queues(vi, vi->curr_queue_pairs);
 	rtnl_unlock();
-- 
MST

^ permalink raw reply related

* [PATCH v2 08/15] virtio-net: drop config_mutex
From: Michael S. Tsirkin @ 2014-10-06 15:11 UTC (permalink / raw)
  To: linux-kernel; +Cc: kvm, Rusty Russell, virtualization, Cornelia Huck, netdev
In-Reply-To: <1412608214-31944-1-git-send-email-mst@redhat.com>

config_mutex served two purposes: prevent multiple concurrent config
change handlers, and synchronize access to config_enable flag.

Since commit dbf2576e37da0fcc7aacbfbb9fd5d3de7888a3c1
    workqueue: make all workqueues non-reentrant
all workqueues are non-reentrant, and config_enable
is now gone.

Get rid of the unnecessary lock.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
---
 drivers/net/virtio_net.c | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 743fb04..23e4a69 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -132,9 +132,6 @@ struct virtnet_info {
 	/* Work struct for config space updates */
 	struct work_struct config_work;
 
-	/* Lock for config space updates */
-	struct mutex config_lock;
-
 	/* Does the affinity hint is set for virtqueues? */
 	bool affinity_hint_set;
 
@@ -1404,7 +1401,6 @@ static void virtnet_config_changed_work(struct work_struct *work)
 		container_of(work, struct virtnet_info, config_work);
 	u16 v;
 
-	mutex_lock(&vi->config_lock);
 	if (virtio_cread_feature(vi->vdev, VIRTIO_NET_F_STATUS,
 				 struct virtio_net_config, status, &v) < 0)
 		goto done;
@@ -1430,7 +1426,7 @@ static void virtnet_config_changed_work(struct work_struct *work)
 		netif_tx_stop_all_queues(vi->dev);
 	}
 done:
-	mutex_unlock(&vi->config_lock);
+	return;
 }
 
 static void virtnet_config_changed(struct virtio_device *vdev)
@@ -1751,7 +1747,6 @@ static int virtnet_probe(struct virtio_device *vdev)
 		u64_stats_init(&virtnet_stats->rx_syncp);
 	}
 
-	mutex_init(&vi->config_lock);
 	INIT_WORK(&vi->config_work, virtnet_config_changed_work);
 
 	/* If we can receive ANY GSO packets, we must allocate large ones. */
-- 
MST

^ permalink raw reply related

* [PATCH v2 09/15] virtio_net: minor cleanup
From: Michael S. Tsirkin @ 2014-10-06 15:11 UTC (permalink / raw)
  To: linux-kernel; +Cc: netdev, kvm, virtualization
In-Reply-To: <1412608214-31944-1-git-send-email-mst@redhat.com>

	goto done;
done:
	return;
is ugly, it was put there to make diff review easier.
replace by open-coded return.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
---
 drivers/net/virtio_net.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 23e4a69..ef04d23 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1403,7 +1403,7 @@ static void virtnet_config_changed_work(struct work_struct *work)
 
 	if (virtio_cread_feature(vi->vdev, VIRTIO_NET_F_STATUS,
 				 struct virtio_net_config, status, &v) < 0)
-		goto done;
+		return;
 
 	if (v & VIRTIO_NET_S_ANNOUNCE) {
 		netdev_notify_peers(vi->dev);
@@ -1414,7 +1414,7 @@ static void virtnet_config_changed_work(struct work_struct *work)
 	v &= VIRTIO_NET_S_LINK_UP;
 
 	if (vi->status == v)
-		goto done;
+		return;
 
 	vi->status = v;
 
@@ -1425,8 +1425,6 @@ static void virtnet_config_changed_work(struct work_struct *work)
 		netif_carrier_off(vi->dev);
 		netif_tx_stop_all_queues(vi->dev);
 	}
-done:
-	return;
 }
 
 static void virtnet_config_changed(struct virtio_device *vdev)
-- 
MST

^ permalink raw reply related

* [PATCH v2 11/15] virtio_net: enable VQs early
From: Michael S. Tsirkin @ 2014-10-06 15:11 UTC (permalink / raw)
  To: linux-kernel; +Cc: kvm, Rusty Russell, virtualization, netdev
In-Reply-To: <1412608214-31944-1-git-send-email-mst@redhat.com>

virtio spec requires drivers to set DRIVER_OK before using VQs.
This is set automatically after probe returns, virtio net violated this
rule by using receive VQs within probe.

To fix, call virtio_enable_vqs_early before using VQs.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 drivers/net/virtio_net.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index ef04d23..430f3ae 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1792,6 +1792,8 @@ static int virtnet_probe(struct virtio_device *vdev)
 		goto free_vqs;
 	}
 
+	virtio_enable_vqs_early(vdev);
+
 	/* Last of all, set up some receive buffers. */
 	for (i = 0; i < vi->curr_queue_pairs; i++) {
 		try_fill_recv(&vi->rq[i], GFP_KERNEL);
-- 
MST

^ permalink raw reply related

* [PATCH v2 14/15] 9p/trans_virtio: enable VQs early
From: Michael S. Tsirkin @ 2014-10-06 15:11 UTC (permalink / raw)
  To: linux-kernel
  Cc: kvm, Eric Van Hensbergen, netdev, virtualization, v9fs-developer,
	Ron Minnich, David S. Miller
In-Reply-To: <1412608214-31944-1-git-send-email-mst@redhat.com>

virtio spec requires drivers to set DRIVER_OK before using VQs.
This is set automatically after probe returns, but virtio 9p device
adds self to channel list within probe, at which point VQ can be
used in violation of the spec.

To fix, call virtio_enable_vqs_early before using VQs.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 net/9p/trans_virtio.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/9p/trans_virtio.c b/net/9p/trans_virtio.c
index 6940d8f..766ba48 100644
--- a/net/9p/trans_virtio.c
+++ b/net/9p/trans_virtio.c
@@ -575,6 +575,8 @@ static int p9_virtio_probe(struct virtio_device *vdev)
 	/* Ceiling limit to avoid denial of service attacks */
 	chan->p9_max_pages = nr_free_buffer_pages()/4;
 
+	virtio_enable_vqs_early(vdev);
+
 	mutex_lock(&virtio_9p_lock);
 	list_add_tail(&chan->chan_list, &virtio_chan_list);
 	mutex_unlock(&virtio_9p_lock);
-- 
MST

^ permalink raw reply related

* [PATCH v2 15/15] virtio_net: fix use after free on allocation failure
From: Michael S. Tsirkin @ 2014-10-06 15:11 UTC (permalink / raw)
  To: linux-kernel; +Cc: kvm, Rusty Russell, virtualization, Cornelia Huck, netdev
In-Reply-To: <1412608214-31944-1-git-send-email-mst@redhat.com>

In the extremely unlikely event that driver initialization fails after
RX buffers are added, virtio net frees RX buffers while VQs are
still active, potentially causing device to use a freed buffer.

To fix, reset device first - same as we do on device removal.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
---
 drivers/net/virtio_net.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 430f3ae..3551417 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1830,6 +1830,8 @@ static int virtnet_probe(struct virtio_device *vdev)
 	return 0;
 
 free_recv_bufs:
+	vi->vdev->config->reset(vdev);
+
 	free_receive_bufs(vi);
 	unregister_netdev(dev);
 free_vqs:
-- 
MST

^ permalink raw reply related

* Re: [PATCH v2 07/15] virtio_net: drop config_enable
From: Cornelia Huck @ 2014-10-06 15:28 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: netdev, linux-kernel, kvm, virtualization
In-Reply-To: <1412608214-31944-8-git-send-email-mst@redhat.com>

On Mon, 6 Oct 2014 18:11:05 +0300
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> Now that virtio core ensures config changes don't arrive during probing,
> drop config_enable flag in virtio net.
> On removal, flush is now sufficient to guarantee that no change work is
> queued.
> 
> This help simplify the driver, and will allow setting DRIVER_OK earlier
> without losing config change notifications.
> 
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> ---
>  drivers/net/virtio_net.c | 27 ++++-----------------------
>  1 file changed, 4 insertions(+), 23 deletions(-)

Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>

^ permalink raw reply

* Re: [Xen-devel] [PATCHv1] xen-netfront: always keep the Rx ring full of requests
From: annie li @ 2014-10-06 15:35 UTC (permalink / raw)
  To: David Vrabel; +Cc: netdev, xen-devel, Boris Ostrovsky
In-Reply-To: <1412256826-18874-1-git-send-email-david.vrabel@citrix.com>


On 2014/10/2 9:33, David Vrabel wrote:
> A full Rx ring only requires 1 MiB of memory.  This is not enough
> memory that it is useful to dynamically scale the number of Rx
> requests in the ring based on traffic rates.
>
> Keeping the ring full of Rx requests handles bursty traffic better
> than trying to converges on an optimal number of requests to keep
> filled.
>
> On a 4 core host, an iperf -P 64 -t 60 run from dom0 to a 4 VCPU guest
> improved from 5.1 Gbit/s to 5.6 Gbit/s.  Gains with more bursty
> traffic are expected to be higher.

Although removing sysfs is connected with the code change for full Rx 
ring utilization, I assume it is better to split this patch into two to 
make it simpler?

  ...snip...
>   
> +	queue->rx.req_prod_pvt = req_prod;
> +
> +	/* Not enough requests? Try again later. */
> +	if (req_prod - queue->rx.rsp_cons < NET_RX_SLOTS_MIN) {
> +		mod_timer(&queue->rx_refill_timer, jiffies + (HZ/10));
> +		return;

If the previous for loop breaks because of failure of 
xennet_alloc_one_rx_buffer, then notify_remote_via_irq is missed here if 
the code returns directly.

Thanks
Annie
> +	}
> +

^ permalink raw reply

* Re: [PATCH v2 11/15] virtio_net: enable VQs early
From: Cornelia Huck @ 2014-10-06 15:39 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: netdev, linux-kernel, kvm, virtualization
In-Reply-To: <1412608214-31944-12-git-send-email-mst@redhat.com>

On Mon, 6 Oct 2014 18:11:19 +0300
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> virtio spec requires drivers to set DRIVER_OK before using VQs.
> This is set automatically after probe returns, virtio net violated this
> rule by using receive VQs within probe.
> 
> To fix, call virtio_enable_vqs_early before using VQs.
> 
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> ---
>  drivers/net/virtio_net.c | 2 ++
>  1 file changed, 2 insertions(+)

Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>

^ permalink raw reply

* [PATCH net-next] ipvs: Avoid null-pointer deref in debug code
From: Alex Gartrell @ 2014-10-06 15:46 UTC (permalink / raw)
  To: horms; +Cc: ja, dan.carpenter, lvs-devel, netdev, kernel-team, Alex Gartrell
In-Reply-To: <20141006073232.GA10073@verge.net.au>

Use daddr instead of reaching into dest.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Alex Gartrell <agartrell@fb.com>
---
 net/netfilter/ipvs/ip_vs_xmit.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/netfilter/ipvs/ip_vs_xmit.c b/net/netfilter/ipvs/ip_vs_xmit.c
index 91f17c1..437a366 100644
--- a/net/netfilter/ipvs/ip_vs_xmit.c
+++ b/net/netfilter/ipvs/ip_vs_xmit.c
@@ -316,7 +316,7 @@ __ip_vs_get_out_rt(int skb_af, struct sk_buff *skb, struct ip_vs_dest *dest,
 	if (unlikely(crosses_local_route_boundary(skb_af, skb, rt_mode,
 						  local))) {
 		IP_VS_DBG_RL("We are crossing local and non-local addresses"
-			     " daddr=%pI4\n", &dest->addr.ip);
+			     " daddr=%pI4\n", &daddr);
 		goto err_put;
 	}
 
@@ -458,7 +458,7 @@ __ip_vs_get_out_rt_v6(int skb_af, struct sk_buff *skb, struct ip_vs_dest *dest,
 	if (unlikely(crosses_local_route_boundary(skb_af, skb, rt_mode,
 						  local))) {
 		IP_VS_DBG_RL("We are crossing local and non-local addresses"
-			     " daddr=%pI6\n", &dest->addr.in6);
+			     " daddr=%pI6\n", daddr);
 		goto err_put;
 	}
 
-- 
Alex Gartrell <agartrell@fb.com>


^ permalink raw reply related

* Re: [PATCH net-next] ipvs: Avoid null-pointer deref in debug code
From: Alex Gartrell @ 2014-10-06 15:56 UTC (permalink / raw)
  To: Julian Anastasov; +Cc: horms, dan.carpenter, lvs-devel, netdev, kernel-team
In-Reply-To: <alpine.LFD.2.11.1410060940470.1738@ja.home.ssi.bg>

Hey Julian,

On 10/05/2014 11:49 PM, Julian Anastasov wrote:
>
> 	You have to print the "daddr" variable as
> it was done before your patchset in the
> "Stopping traffic to %s address, dest: %p..." message
> because dest is not present in all cases, for example,
> for *bypass_xmit. Other places provide cp->daddr but
> for backup server some conns can live without cp->dest.

I've sent an updated patch that does this but I have some questions 
about other stuff that I find mildly confusing.  Specifically I didn't 
realize until looking at the call sites that !dest || daddr = 
dest->addr.ip (but maybe I'm wrong?)

If that's the case, why do we have the following line in __ip_vs_get_out_rt?

                 daddr = dest->addr.ip;

If that's /always/ true then we should add the following line or a 
comment to the same effect to clarify

	BUG_ON(dest && dest->addr.ip != daddr);

If that's not intended to always be true, then should the patch be the 
following?

	...%pI4", dest ? &dest->addr.ip : &daddr);

Thanks,
-- 
Alex Gartrell <agartrell@fb.com>

^ permalink raw reply

* Re: [Xen-devel] [PATCHv1] xen-netfront: always keep the Rx ring full of requests
From: David Vrabel @ 2014-10-06 16:00 UTC (permalink / raw)
  To: annie li; +Cc: netdev, xen-devel, Boris Ostrovsky
In-Reply-To: <5432B6D2.9030503@oracle.com>

On 06/10/14 16:35, annie li wrote:
> 
> On 2014/10/2 9:33, David Vrabel wrote:
>> A full Rx ring only requires 1 MiB of memory.  This is not enough
>> memory that it is useful to dynamically scale the number of Rx
>> requests in the ring based on traffic rates.
>>
>> Keeping the ring full of Rx requests handles bursty traffic better
>> than trying to converges on an optimal number of requests to keep
>> filled.
>>
>> On a 4 core host, an iperf -P 64 -t 60 run from dom0 to a 4 VCPU guest
>> improved from 5.1 Gbit/s to 5.6 Gbit/s.  Gains with more bursty
>> traffic are expected to be higher.
> 
> Although removing sysfs is connected with the code change for full Rx
> ring utilization, I assume it is better to split this patch into two to
> make it simpler?

I don't see how splitting the patch would be an improvement.

>>   +    queue->rx.req_prod_pvt = req_prod;
>> +
>> +    /* Not enough requests? Try again later. */
>> +    if (req_prod - queue->rx.rsp_cons < NET_RX_SLOTS_MIN) {
>> +        mod_timer(&queue->rx_refill_timer, jiffies + (HZ/10));
>> +        return;
> 
> If the previous for loop breaks because of failure of
> xennet_alloc_one_rx_buffer, then notify_remote_via_irq is missed here if
> the code returns directly.

This is deliberate -- there's no point notifying the backend if there
aren't enough requests for the next packet.  Since we don't know what
the next packet might be we assume it's the largest possible.

David

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox