Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH] Multiqueue macvtap driver
From: Krishna Kumar2 @ 2010-08-02 12:37 UTC (permalink / raw)
  To: David Miller; +Cc: arnd, bhutchings, mst, netdev
In-Reply-To: <20100801.003406.02275545.davem@davemloft.net>

Thanks Ben & Dave. A question though - which of the following
is preferable for the macvtap driver:

1. Calculate flow and use that to find a queue; or
2. First check if skb_rx_queue_recorded is true and if so use
    that; otherwise calculate the flow as in #1.

I guess #1 is better, since packets for a single flow will go to the
same queue even if they arrive on different rxqs of a mq driver.
But I want to make sure.

Thanks,

- KK

David Miller <davem@davemloft.net> wrote on 08/01/2010 01:04:06 PM:

> David Miller <davem@davemloft.net>
> 08/01/2010 01:04 PM
>
> To
>
> bhutchings@solarflare.com
>
> cc
>
> Krishna Kumar2/India/IBM@IBMIN, arnd@arndb.de,
> netdev@vger.kernel.org, mst@redhat.com
>
> Subject
>
> Re: [PATCH] Multiqueue macvtap driver
>
> From: Ben Hutchings <bhutchings@solarflare.com>
> Date: Sat, 31 Jul 2010 20:18:27 +0100
>
> > On Sat, 2010-07-31 at 19:27 +0530, Krishna Kumar wrote:
> > [...]
> >> @@ -136,39 +158,68 @@ static void macvtap_put_queue(struct mac
> >>  }
> >>
> >>  /*
> >> - * Since we only support one queue, just dereference the pointer.
> >> + * Select a queue based on the rxq of the device on which this packet
> >> + * arrived. If the incoming device is not mq, then use our cpu number
> >> + * to select a queue. vlan->numvtaps is cached in case it changes
> >> + * during the execution of this function.
> >>   */
> > [...]
> >
> > This can result in reordering if a single-queue device's RX interrupt's
> > CPU affinity is changed.  We generally try to avoid that.  You should
> > really use or generate a flow hash.  There is code for this in
> > net/core/dev.c:get_rps_cpu() which could be factored out into a
separate
> > function.
>
> Agreed.


^ permalink raw reply

* Fwd: a Great Idea - include Kademlia networking protocol in kernel
From: hp fk @ 2010-08-02 13:16 UTC (permalink / raw)
  To: netdev
In-Reply-To: <AANLkTik7powjUaJ51bA_bgPVb03U-RVZay3NffHBYLzx@mail.gmail.com>

---------- Forwarded message ----------
From: hp fk <fkhp101@gmail.com>
Date: 2010/8/2
Subject: a Great Idea - include Kademlia networking protocol in kernel
To: linux-kernel@vger.kernel.org


Kademlia protocol is widely used by many p2p applications, if kernel
support the Kademlia protocol and its p2p network infrastracture. it
will be easier to bring us into a new stage of network with much more
freedom and equality.

1.systems could get public identity on network much more easier, and
need not depend on DNS system, which usual controled by buraucracy
agents.
2. system presence on network is easier to resolved in a dynamic ip
environment that many ISPs use to effectively use IP address. and
communication could be much easier.
3.many p2p applications could be releaved from building the p2p
network infrastracture repeatedly and focus on application features.
4. some restrictions imposed by buraucracy censorship will be breaked
and give us much more communication freedom
5. all linux systems could be equal citizens of a unified greate linux
p2p network and promote the development of p2p network.

^ permalink raw reply

* Re: [PATCH] Multiqueue macvtap driver
From: Ben Hutchings @ 2010-08-02 13:29 UTC (permalink / raw)
  To: Krishna Kumar2; +Cc: David Miller, arnd, mst, netdev
In-Reply-To: <OF8519C301.F0E5FA07-ON65257772.004983CC-65257773.00453C8B@in.ibm.com>

On Mon, 2010-08-02 at 18:07 +0530, Krishna Kumar2 wrote:
> Thanks Ben & Dave. A question though - which of the following
> is preferable for the macvtap driver:
> 
> 1. Calculate flow and use that to find a queue; or
> 2. First check if skb_rx_queue_recorded is true and if so use
>     that; otherwise calculate the flow as in #1.
> 
> I guess #1 is better, since packets for a single flow will go to the
> same queue even if they arrive on different rxqs of a mq driver.
> But I want to make sure.
[...]

#2 is right.  Just as macvtap should provide a stable flow to RX-queue
mapping, so should the drivers it interfaces with.

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply

* Re: [patch] usbnet: fix 100% CPU use on suspended device
From: Elly Jones @ 2010-08-02 13:31 UTC (permalink / raw)
  To: Oliver Neukum
  Cc: Alan Stern, David Miller, netdev-u79uwXL29TY76Z2rM5mHXA, USB list
In-Reply-To: <201007261821.15020.oliver-GvhC2dPhHPQdnm+yROfE0A@public.gmane.org>

On Mon, Jul 26, 2010 at 12:21 PM, Oliver Neukum <oliver-GvhC2dPhHPQdnm+yROfE0A@public.gmane.org> wrote:
> Am Montag, 26. Juli 2010, 17:13:23 schrieb Alan Stern:
>> On Mon, 26 Jul 2010, Elly Jones wrote:
>>
>> > > This isn't right.  The problem should be fixed some other way.  Under
>> > > what circumstances are URBs submitted incorrectly?
>> >
>> > When the device is autosuspended. What is the proper thing for a
>> > device to do here?
>>
>> From looking at the code, it appears that the EVENT_DEV_ASLEEP flag
>> should be tested in usbnet_bh() the way it is in rx_submit().  But I'm
>> not an expert on usbnet; we should ask someone who is, like Oliver.
>
> Sorry, I didn't notice this thread.
>
> The correct way to check for autosuspend in usbnet is to look
> at EVENT_DEV_ASLEEP under txq.lock. That being said, usbnet_bh()
> uses rx_submit() which does the correct check. The bug seems to be
> a lack of error handling in usbnet_bh() regarding the return of rx_submit()

If rx_submit() fails, should usbnet_bh() just not tasklet_schedule() itself?

>        Regards
>                Oliver

-- Elly
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: Fwd: a Great Idea - include Kademlia networking protocol in kernel
From: Paulius Zaleckas @ 2010-08-02 13:51 UTC (permalink / raw)
  To: hp fk; +Cc: netdev
In-Reply-To: <AANLkTik5qnmHALi7=AMA+Lai6a+9STgub9BnQAV6EVkh@mail.gmail.com>

On 08/02/2010 04:16 PM, hp fk wrote:
> ---------- Forwarded message ----------
> From: hp fk<fkhp101@gmail.com>
> Date: 2010/8/2
> Subject: a Great Idea - include Kademlia networking protocol in kernel
> To: linux-kernel@vger.kernel.org
>
>
> Kademlia protocol is widely used by many p2p applications, if kernel
> support the Kademlia protocol and its p2p network infrastracture. it
> will be easier to bring us into a new stage of network with much more
> freedom and equality.
>
> 1.systems could get public identity on network much more easier, and
> need not depend on DNS system, which usual controled by buraucracy
> agents.
> 2. system presence on network is easier to resolved in a dynamic ip
> environment that many ISPs use to effectively use IP address. and
> communication could be much easier.
> 3.many p2p applications could be releaved from building the p2p
> network infrastracture repeatedly and focus on application features.
> 4. some restrictions imposed by buraucracy censorship will be breaked
> and give us much more communication freedom
> 5. all linux systems could be equal citizens of a unified greate linux
> p2p network and promote the development of p2p network.

I don't see any valid point why it should be done in kernel and not
in userspace. Actually doing this in userspace is better since it
would allow you to support other OS like BSD, Windows and etc.

^ permalink raw reply

* [PATCH v2 1/2] core: Factor out flow calculation from get_rps_cpu
From: Krishna Kumar @ 2010-08-02 14:33 UTC (permalink / raw)
  To: davem, arnd; +Cc: bhutchings, netdev, therbert, Krishna Kumar, mst

From: Krishna Kumar <krkumar2@in.ibm.com>

Factor out flow calculation code from get_rps_cpu, since macvtap
driver can use the same code.

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
---
 net/core/dev.c |   94 +++++++++++++++++++++++++++++------------------
 1 file changed, 58 insertions(+), 36 deletions(-)

diff -ruNp org/net/core/dev.c new/net/core/dev.c
--- org/net/core/dev.c	2010-08-02 10:06:59.000000000 +0530
+++ new/net/core/dev.c	2010-08-02 19:29:34.000000000 +0530
@@ -2263,51 +2263,24 @@ static inline void ____napi_schedule(str
 	__raise_softirq_irqoff(NET_RX_SOFTIRQ);
 }
 
-#ifdef CONFIG_RPS
-
-/* One global table that all flow-based protocols share. */
-struct rps_sock_flow_table *rps_sock_flow_table __read_mostly;
-EXPORT_SYMBOL(rps_sock_flow_table);
-
 /*
- * get_rps_cpu is called from netif_receive_skb and returns the target
- * CPU from the RPS map of the receiving queue for a given skb.
- * rcu_read_lock must be held on entry.
+ * skb_calculate_flow: calculate a flow hash based on src/dst addresses
+ * and src/dst port numbers. On success, returns a hash number (> 0),
+ * otherwise -1.
  */
-static int get_rps_cpu(struct net_device *dev, struct sk_buff *skb,
-		       struct rps_dev_flow **rflowp)
+int skb_calculate_flow(struct net_device *dev, struct sk_buff *skb)
 {
+	int hash = skb->rxhash;
 	struct ipv6hdr *ip6;
 	struct iphdr *ip;
-	struct netdev_rx_queue *rxqueue;
-	struct rps_map *map;
-	struct rps_dev_flow_table *flow_table;
-	struct rps_sock_flow_table *sock_flow_table;
-	int cpu = -1;
 	u8 ip_proto;
-	u16 tcpu;
 	u32 addr1, addr2, ihl;
 	union {
 		u32 v32;
 		u16 v16[2];
 	} ports;
 
-	if (skb_rx_queue_recorded(skb)) {
-		u16 index = skb_get_rx_queue(skb);
-		if (unlikely(index >= dev->num_rx_queues)) {
-			WARN_ONCE(dev->num_rx_queues > 1, "%s received packet "
-				"on queue %u, but number of RX queues is %u\n",
-				dev->name, index, dev->num_rx_queues);
-			goto done;
-		}
-		rxqueue = dev->_rx + index;
-	} else
-		rxqueue = dev->_rx;
-
-	if (!rxqueue->rps_map && !rxqueue->rps_flow_table)
-		goto done;
-
-	if (skb->rxhash)
+	if (hash)
 		goto got_hash; /* Skip hash computation on packet header */
 
 	switch (skb->protocol) {
@@ -2334,6 +2307,7 @@ static int get_rps_cpu(struct net_device
 	default:
 		goto done;
 	}
+
 	switch (ip_proto) {
 	case IPPROTO_TCP:
 	case IPPROTO_UDP:
@@ -2356,11 +2330,59 @@ static int get_rps_cpu(struct net_device
 	/* get a consistent hash (same value on both flow directions) */
 	if (addr2 < addr1)
 		swap(addr1, addr2);
-	skb->rxhash = jhash_3words(addr1, addr2, ports.v32, hashrnd);
-	if (!skb->rxhash)
-		skb->rxhash = 1;
+
+	hash = jhash_3words(addr1, addr2, ports.v32, hashrnd);
+	if (!hash)
+		hash = 1;
 
 got_hash:
+	return hash;
+
+done:
+	return -1;
+}
+EXPORT_SYMBOL(skb_calculate_flow);
+
+#ifdef CONFIG_RPS
+
+/* One global table that all flow-based protocols share. */
+struct rps_sock_flow_table *rps_sock_flow_table __read_mostly;
+EXPORT_SYMBOL(rps_sock_flow_table);
+
+/*
+ * get_rps_cpu is called from netif_receive_skb and returns the target
+ * CPU from the RPS map of the receiving queue for a given skb.
+ * rcu_read_lock must be held on entry.
+ */
+static int get_rps_cpu(struct net_device *dev, struct sk_buff *skb,
+		       struct rps_dev_flow **rflowp)
+{
+	struct netdev_rx_queue *rxqueue;
+	struct rps_map *map;
+	struct rps_dev_flow_table *flow_table;
+	struct rps_sock_flow_table *sock_flow_table;
+	int cpu = -1;
+	u16 tcpu;
+
+	if (skb_rx_queue_recorded(skb)) {
+		u16 index = skb_get_rx_queue(skb);
+		if (unlikely(index >= dev->num_rx_queues)) {
+			WARN_ONCE(dev->num_rx_queues > 1, "%s received packet "
+				"on queue %u, but number of RX queues is %u\n",
+				dev->name, index, dev->num_rx_queues);
+			goto done;
+		}
+		rxqueue = dev->_rx + index;
+	} else
+		rxqueue = dev->_rx;
+
+	if (!rxqueue->rps_map && !rxqueue->rps_flow_table)
+		goto done;
+
+	skb->rxhash = skb_calculate_flow(dev, skb);
+	if (skb->rxhash < 0)
+		goto done;
+
 	flow_table = rcu_dereference(rxqueue->rps_flow_table);
 	sock_flow_table = rcu_dereference(rps_sock_flow_table);
 	if (flow_table && sock_flow_table) {

^ permalink raw reply

* [PATCH v2 2/2] macvtap: Implement multiqueue macvtap driver
From: Krishna Kumar @ 2010-08-02 14:33 UTC (permalink / raw)
  To: davem, arnd; +Cc: bhutchings, netdev, mst, Krishna Kumar, therbert
In-Reply-To: <20100802143304.1517.42494.sendpatchset@krkumar2.in.ibm.com>

From: Krishna Kumar <krkumar2@in.ibm.com>

Implement multiqueue facility for macvtap driver. The idea is that
a macvtap device can be opened multiple times and the fd's can be
used to register eg, as backend for vhost.

Please review.

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
---
 drivers/net/macvtap.c      |   89 ++++++++++++++++++++++++++++-------
 include/linux/if_macvlan.h |   11 +++-
 include/linux/netdevice.h  |    1 
 3 files changed, 83 insertions(+), 18 deletions(-)

diff -ruNp org/include/linux/netdevice.h new/include/linux/netdevice.h
--- org/include/linux/netdevice.h	2010-07-25 16:57:07.000000000 +0530
+++ new/include/linux/netdevice.h	2010-08-02 16:05:57.000000000 +0530
@@ -2253,6 +2253,7 @@ static inline const char *netdev_name(co
 	return dev->name;
 }
 
+extern int skb_calculate_flow(struct net_device *dev, struct sk_buff *skb);
 extern int netdev_printk(const char *level, const struct net_device *dev,
 			 const char *format, ...)
 	__attribute__ ((format (printf, 3, 4)));
diff -ruNp org/include/linux/if_macvlan.h new/include/linux/if_macvlan.h
--- org/include/linux/if_macvlan.h	2010-08-02 15:32:33.000000000 +0530
+++ new/include/linux/if_macvlan.h	2010-08-02 15:32:33.000000000 +0530
@@ -40,6 +40,14 @@ struct macvlan_rx_stats {
 	unsigned long		rx_errors;
 };
 
+#define MIN(x, y)		(((x) < (y)) ? (x) : (y))
+
+/*
+ * Maximum times a macvtap device can be opened. This can be used to
+ * configure the number of receive queue, e.g. for multiqueue virtio.
+ */
+#define MAX_MACVTAP_QUEUES	MIN(16, NR_CPUS)
+
 struct macvlan_dev {
 	struct net_device	*dev;
 	struct list_head	list;
@@ -50,7 +58,8 @@ struct macvlan_dev {
 	enum macvlan_mode	mode;
 	int (*receive)(struct sk_buff *skb);
 	int (*forward)(struct net_device *dev, struct sk_buff *skb);
-	struct macvtap_queue	*tap;
+	struct macvtap_queue	*taps[MAX_MACVTAP_QUEUES];
+	int			numvtaps;
 };
 
 static inline void macvlan_count_rx(const struct macvlan_dev *vlan,
diff -ruNp org/drivers/net/macvtap.c new/drivers/net/macvtap.c
--- org/drivers/net/macvtap.c	2010-07-28 15:10:10.000000000 +0530
+++ new/drivers/net/macvtap.c	2010-08-02 17:48:38.000000000 +0530
@@ -84,26 +84,45 @@ static const struct proto_ops macvtap_so
 static DEFINE_SPINLOCK(macvtap_lock);
 
 /*
- * Choose the next free queue, for now there is only one
+ * get_slot: return a [unused/occupied] slot in vlan->taps[]:
+ *	- if 'q' is NULL, return the first empty slot;
+ *	- otherwise, return the slot this pointer occupies.
  */
+static int get_slot(struct macvlan_dev *vlan, struct macvtap_queue *q)
+{
+	int i;
+
+	for (i = 0; i < MAX_MACVTAP_QUEUES; i++) {
+		if (rcu_dereference(vlan->taps[i]) == q)
+			return i;
+	}
+
+	/* Should never happen */
+	BUG_ON(1);
+}
+
 static int macvtap_set_queue(struct net_device *dev, struct file *file,
 				struct macvtap_queue *q)
 {
 	struct macvlan_dev *vlan = netdev_priv(dev);
+	int index;
 	int err = -EBUSY;
 
 	spin_lock(&macvtap_lock);
-	if (rcu_dereference(vlan->tap))
+	if (vlan->numvtaps == MAX_MACVTAP_QUEUES)
 		goto out;
 
 	err = 0;
+	index = get_slot(vlan, NULL);
 	rcu_assign_pointer(q->vlan, vlan);
-	rcu_assign_pointer(vlan->tap, q);
+	rcu_assign_pointer(vlan->taps[index], q);
 	sock_hold(&q->sk);
 
 	q->file = file;
 	file->private_data = q;
 
+	vlan->numvtaps++;
+
 out:
 	spin_unlock(&macvtap_lock);
 	return err;
@@ -124,9 +143,12 @@ static void macvtap_put_queue(struct mac
 	spin_lock(&macvtap_lock);
 	vlan = rcu_dereference(q->vlan);
 	if (vlan) {
-		rcu_assign_pointer(vlan->tap, NULL);
+		int index = get_slot(vlan, q);
+
+		rcu_assign_pointer(vlan->taps[index], NULL);
 		rcu_assign_pointer(q->vlan, NULL);
 		sock_put(&q->sk);
+		--vlan->numvtaps;
 	}
 
 	spin_unlock(&macvtap_lock);
@@ -136,39 +158,72 @@ static void macvtap_put_queue(struct mac
 }
 
 /*
- * Since we only support one queue, just dereference the pointer.
+ * Select a queue based on the rxq of the device on which this packet
+ * arrived. If the incoming device is not mq, calculate a flow hash to
+ * select a queue. vlan->numvtaps is cached in case it reduces during
+ * the execution of this function.
  */
 static struct macvtap_queue *macvtap_get_queue(struct net_device *dev,
 					       struct sk_buff *skb)
 {
 	struct macvlan_dev *vlan = netdev_priv(dev);
+	struct macvtap_queue *tap = NULL;
+	int numvtaps = vlan->numvtaps;
+	u16 rxq;
+
+	if (!numvtaps)
+		goto out;
+
+	if (skb_rx_queue_recorded(skb)) {
+		rxq = skb_get_rx_queue(skb);
+
+		while (unlikely(rxq >= numvtaps))
+			rxq -= numvtaps;
 
-	return rcu_dereference(vlan->tap);
+		tap = rcu_dereference(vlan->taps[rxq]);
+		if (tap)
+			goto out;
+	}
+
+	rxq = skb_calculate_flow(dev, skb);
+	if (rxq < 0)
+		rxq = smp_processor_id();
+
+	tap = rcu_dereference(vlan->taps[rxq & (numvtaps - 1)]);
+
+out:
+	return tap;
 }
 
 /*
  * The net_device is going away, give up the reference
- * that it holds on the queue (all the queues one day)
- * and safely set the pointer from the queues to NULL.
+ * that it holds on all queues and safely set the pointer
+ * from the queues to NULL.
  */
 static void macvtap_del_queues(struct net_device *dev)
 {
 	struct macvlan_dev *vlan = netdev_priv(dev);
-	struct macvtap_queue *q;
+	struct macvtap_queue *q, *qlist[MAX_MACVTAP_QUEUES];
+	int i, j = 0;
 
+	/* macvtap_put_queue can free some slots, so go through all slots */
 	spin_lock(&macvtap_lock);
-	q = rcu_dereference(vlan->tap);
-	if (!q) {
-		spin_unlock(&macvtap_lock);
-		return;
+	for (i = 0; i < MAX_MACVTAP_QUEUES && vlan->numvtaps; i++) {
+		q = rcu_dereference(vlan->taps[i]);
+		if (q) {
+			qlist[j++] = q;
+			rcu_assign_pointer(vlan->taps[i], NULL);
+			rcu_assign_pointer(q->vlan, NULL);
+			vlan->numvtaps--;
+		}
 	}
-
-	rcu_assign_pointer(vlan->tap, NULL);
-	rcu_assign_pointer(q->vlan, NULL);
+	BUG_ON(vlan->numvtaps != 0);
 	spin_unlock(&macvtap_lock);
 
 	synchronize_rcu();
-	sock_put(&q->sk);
+
+	for (--j; j >= 0; j--)
+		sock_put(&qlist[j]->sk);
 }
 
 /*

^ permalink raw reply

* Re: [PATCH nf-next-2.6] netfilter: {ip,ip6,arp}_tables: dont block bottom half more than necessary
From: Patrick McHardy @ 2010-08-02 14:54 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Netfilter Development Mailinglist, netdev
In-Reply-To: <1279898595.2481.69.camel@edumazet-laptop>

On 23.07.2010 17:23, Eric Dumazet wrote:
> We currently disable BH for the whole duration of get_counters()
> 
> On machines with a lot of cpus and large tables, this might be too long.
> 
> We can disable preemption during the whole function, and disable BH only
> while fetching counters for the current cpu.
> 

Applied, thanks Eric.


^ permalink raw reply

* Re: [PATCH repost] sched: export sched_set/getaffinity to modules
From: Oleg Nesterov @ 2010-08-02 15:02 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Sridhar Samudrala, Peter Zijlstra, Tejun Heo, Ingo Molnar, netdev,
	lkml, kvm@vger.kernel.org, Andrew Morton, Dmitri Vorobiev,
	Jiri Kosina, Thomas Gleixner, Andi Kleen
In-Reply-To: <20100801085022.GD16158@redhat.com>

On 08/01, Michael S. Tsirkin wrote:
>
> Oleg, I mean Ack the exporting of get/set affinity.

Ah, I misunderstood.

Yes, I believe the exporting is the lesser evil. Please feel free
to add my ack.

Oleg.

^ permalink raw reply

* Re: [PATCH] nf_conntrack_extend: introduce __nf_ct_ext_exist()
From: Patrick McHardy @ 2010-08-02 15:07 UTC (permalink / raw)
  To: Changli Gao; +Cc: David S. Miller, netfilter-devel, netdev
In-Reply-To: <1280068734-5332-1-git-send-email-xiaosuo@gmail.com>

On 25.07.2010 16:38, Changli Gao wrote:
> some users of nf_ct_ext_exist() know ct->ext isn't NULL. For these users, the
> check for ct->ext isn't necessary, the function __nf_ct_ext_exist() can be
> used instead.
> 
> the type of the return value of nf_ct_ext_exist() is changed to bool.

Applied, thanks.

^ permalink raw reply

* Re: [patch] ipvs: remove EXPERIMENTAL tag
From: Patrick McHardy @ 2010-08-02 15:08 UTC (permalink / raw)
  To: Simon Horman; +Cc: lvs-devel, netdev, Wensong Zhang, Julian Anastasov
In-Reply-To: <20100726075609.GA11755@verge.net.au>

On 26.07.2010 09:56, Simon Horman wrote:
> IPVS was merged into the kernel quite a long time ago and
> has been seeing wide-spread production use for even longer.
> 
> It seems appropriate for it to be no longer tagged as EXPERIMENTAL

Applied, thanks.

^ permalink raw reply

* Re: [patch] ipvs: provide default ip_vs_conn_{in,out}_get_proto
From: Patrick McHardy @ 2010-08-02 15:13 UTC (permalink / raw)
  To: Simon Horman; +Cc: lvs-devel, netdev, Wensong Zhang, Julian Anastasov
In-Reply-To: <20100726075632.GB11755@verge.net.au>

On 26.07.2010 09:56, Simon Horman wrote:
> This removes duplicate code by providing a default implementation
> which is used by 3 of the 4 modules that provide these call.

Applied, thanks.

^ permalink raw reply

* Re: [PATCH] nf_nat: use local variable hdrlen
From: Patrick McHardy @ 2010-08-02 15:16 UTC (permalink / raw)
  To: Changli Gao; +Cc: David S. Miller, netfilter-devel, netdev
In-Reply-To: <1280476725-7637-1-git-send-email-xiaosuo@gmail.com>

On 30.07.2010 09:58, Changli Gao wrote:
> use local variable hdrlen instead of ip_hdrlen(skb).

Applied. But again, please prefix your subject lines with
"netfilter: " and use capital letters for the beginning of
a sentence as I asked you to.

^ permalink raw reply

* Re: [PATCH v2 1/2] nf_nat: make unique_tuple return void
From: Patrick McHardy @ 2010-08-02 15:21 UTC (permalink / raw)
  To: Changli Gao; +Cc: David S. Miller, netfilter-devel, netdev
In-Reply-To: <1280560117-30747-1-git-send-email-xiaosuo@gmail.com>

On 31.07.2010 09:08, Changli Gao wrote:
> the only user of unique_tuple() get_unique_tuple() doesn't care about the
> return value of unique_tuple(), so make unique_tuple() return void (nothing).

Applied, thanks.

^ permalink raw reply

* Re: [PATCH v2 2/2] nf_nat: don't check if the tuple is unique when there isn't any other choice
From: Patrick McHardy @ 2010-08-02 15:36 UTC (permalink / raw)
  To: Changli Gao; +Cc: David S. Miller, netfilter-devel, netdev
In-Reply-To: <1280560164-30783-1-git-send-email-xiaosuo@gmail.com>

On 31.07.2010 09:09, Changli Gao wrote:
> the tuple got from unique_tuple() doesn't need to be really unique, so the
> check for the unique tuple isn't necessary, when there isn't any other
> choice. Eliminating the unnecessary nf_nat_used_tuple() can save some CPU
> cycles too.

Applied.

^ permalink raw reply

* Re: [PATCH v2] nf_nat: no IP_NAT_RANGE_MAP_IPS flags when alloc_null_binding()
From: Patrick McHardy @ 2010-08-02 15:41 UTC (permalink / raw)
  To: Changli Gao; +Cc: David S. Miller, netfilter-devel, netdev
In-Reply-To: <1280673907-12619-1-git-send-email-xiaosuo@gmail.com>

On 01.08.2010 16:45, Changli Gao wrote:
> when alloc_null_binding(), no IP_NAT_RNAGE_MAP_IPS in flags means no IP address
> translation is needed. It isn't necessary to specify the address explicitly.

When sending updated patches, you have to tell people what has changed.

^ permalink raw reply

* Re: [PATCH] nf: make skb_make_writable() return bool
From: Patrick McHardy @ 2010-08-02 15:45 UTC (permalink / raw)
  To: Changli Gao; +Cc: David S. Miller, netfilter-devel, netdev
In-Reply-To: <1280559384-25197-1-git-send-email-xiaosuo@gmail.com>

On 31.07.2010 08:56, Changli Gao wrote:
> Signed-off-by: Changli Gao <xiaosuo@gmail.com>
> ----

> -int skb_make_writable(struct sk_buff *skb, unsigned int writable_len)
> +bool skb_make_writable(struct sk_buff *skb, unsigned int writable_len)

You can change things like this when making actual changes to the code,
but this is just useless noise in the changelogs and makes it harder
to locate real changes in the history.

^ permalink raw reply

* Re: [PATCH] ip_fragment: fix subtracting PPPOE_SES_HLEN from mtu twice
From: Patrick McHardy @ 2010-08-02 15:50 UTC (permalink / raw)
  To: Changli Gao; +Cc: David S. Miller, Bart De Schuymer, netdev
In-Reply-To: <1280618708-9241-1-git-send-email-xiaosuo@gmail.com>

On 01.08.2010 01:25, Changli Gao wrote:
> 6c79bf0f2440fd250c8fce8d9b82fcf03d4e8350 subtracts PPPOE_SES_HLEN from mtu at
> the front of ip_fragment(). So the later subtraction should be removed. The
> MTU of 802.1q is also 1500, so MTU should not be changed.

Bart, please review, thanks.

> Signed-off-by: Changli Gao <xiaosuo@gmail.com>
> ----
>  net/ipv4/ip_output.c |    6 ++----
>  1 file changed, 2 insertions(+), 4 deletions(-)
> diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
> index 6652bd9..04b6989 100644
> --- a/net/ipv4/ip_output.c
> +++ b/net/ipv4/ip_output.c
> @@ -446,7 +446,7 @@ int ip_fragment(struct sk_buff *skb, int (*output)(struct sk_buff *))
>  	int ptr;
>  	struct net_device *dev;
>  	struct sk_buff *skb2;
> -	unsigned int mtu, hlen, left, len, ll_rs, pad;
> +	unsigned int mtu, hlen, left, len, ll_rs;
>  	int offset;
>  	__be16 not_last_frag;
>  	struct rtable *rt = skb_rtable(skb);
> @@ -585,9 +585,7 @@ slow_path:
>  	/* for bridged IP traffic encapsulated inside f.e. a vlan header,
>  	 * we need to make room for the encapsulating header
>  	 */
> -	pad = nf_bridge_pad(skb);
> -	ll_rs = LL_RESERVED_SPACE_EXTRA(rt->dst.dev, pad);
> -	mtu -= pad;
> +	ll_rs = LL_RESERVED_SPACE_EXTRA(rt->dst.dev, nf_bridge_pad(skb));
>  
>  	/*
>  	 *	Fragment the datagram.
> 


^ permalink raw reply

* Re: [PATCH v2 2/2] macvtap: Implement multiqueue macvtap driver
From: Arnd Bergmann @ 2010-08-02 15:52 UTC (permalink / raw)
  To: Krishna Kumar; +Cc: davem, bhutchings, netdev, mst, therbert
In-Reply-To: <20100802143310.1517.55824.sendpatchset@krkumar2.in.ibm.com>

On Monday 02 August 2010, Krishna Kumar wrote:
> Implement multiqueue facility for macvtap driver. The idea is that
> a macvtap device can be opened multiple times and the fd's can be
> used to register eg, as backend for vhost.
>
> Please review.

Only two very minor points from my side:

> diff -ruNp org/include/linux/netdevice.h new/include/linux/netdevice.h
> --- org/include/linux/netdevice.h	2010-07-25 16:57:07.000000000 +0530
> +++ new/include/linux/netdevice.h	2010-08-02 16:05:57.000000000 +0530
> @@ -2253,6 +2253,7 @@ static inline const char *netdev_name(co
>  	return dev->name;
>  }
>  
> +extern int skb_calculate_flow(struct net_device *dev, struct sk_buff *skb);
>  extern int netdev_printk(const char *level, const struct net_device *dev,
>  			 const char *format, ...)
>  	__attribute__ ((format (printf, 3, 4)));

This logically belongs into the first patch.

> diff -ruNp org/include/linux/if_macvlan.h new/include/linux/if_macvlan.h
> --- org/include/linux/if_macvlan.h	2010-08-02 15:32:33.000000000 +0530
> +++ new/include/linux/if_macvlan.h	2010-08-02 15:32:33.000000000 +0530
> @@ -40,6 +40,14 @@ struct macvlan_rx_stats {
>  	unsigned long		rx_errors;
>  };
>  
> +#define MIN(x, y)		(((x) < (y)) ? (x) : (y))
> +
> +/*
> + * Maximum times a macvtap device can be opened. This can be used to
> + * configure the number of receive queue, e.g. for multiqueue virtio.
> + */
> +#define MAX_MACVTAP_QUEUES	MIN(16, NR_CPUS)
> +

Please use the existing min() or min_t() macro instead of providing your own.

	Arnd

^ permalink raw reply

* Re: [PATCH] nf_conntrack_acct: use skb->len for accounting
From: Patrick McHardy @ 2010-08-02 15:56 UTC (permalink / raw)
  To: Changli Gao; +Cc: David S. Miller, netfilter-devel, netdev
In-Reply-To: <1279856399-9058-1-git-send-email-xiaosuo@gmail.com>

On 23.07.2010 05:39, Changli Gao wrote:
> use skb->len for accounting as xt_quota does. Since nf_conntrack works at
> the network layer, skb_network_offset should always returns ZERO.

Applied.

^ permalink raw reply

* Re: [RFC PATCH v8 00/16] Provide a zero-copy method on KVM virtio-net.
From: Shirley Ma @ 2010-08-02 16:01 UTC (permalink / raw)
  To: Avi Kivity
  Cc: xiaohui.xin, netdev, kvm, linux-kernel, mst, mingo, davem,
	herbert, jdike
In-Reply-To: <4C552DC4.5000600@redhat.com>

Hello Avi,

On Sun, 2010-08-01 at 11:18 +0300, Avi Kivity wrote:
> I don't understand.  Under what conditions do you use
> get_user_pages() 
> instead of get_user_pages_fast()?  Why?

The code always calls get_user_pages_fast, however, the page will be
unpinned in skb_free if the same page is not used again for a new
buffer. The reason for unpin the page is we don't want to pin all of the
guest kernel memory(memory over commit). So get_user_pages_fast will
call slow path get_user_pages. 

Your previous comment is suggesting to keep the page pinned for
get_user_pages_fast fast path?

Thanks
Shirley


^ permalink raw reply

* Re: [RFC PATCH v8 00/16] Provide a zero-copy method on KVM virtio-net.
From: Shirley Ma @ 2010-08-02 16:04 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: xiaohui.xin, netdev, kvm, linux-kernel, mingo, davem, herbert,
	jdike
In-Reply-To: <20100801083113.GB16158@redhat.com>

Hello Michael,

On Sun, 2010-08-01 at 11:31 +0300, Michael S. Tsirkin wrote:
> I think we should explore the idea for the driver to fall back on data
> copy
> for small message sizes.
> The benefit of zero copy would then be CPU utilization on large
> messages.

Yes, we used to have 128 bytes for small copy in other driver. I saw
Xiaohui's patch here is using 64 bytes. I think we need to compare the
performance on different platform to decide what's the best for small
message size.

Thanks
Shirley


^ permalink raw reply

* Re: [RFC PATCH v8 00/16] Provide a zero-copy method on KVM virtio-net.
From: Shirley Ma @ 2010-08-02 16:10 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: xiaohui.xin, netdev, kvm, linux-kernel, mingo, davem, herbert,
	jdike
In-Reply-To: <20100801083113.GB16158@redhat.com>

Hello Michael,

On Sun, 2010-08-01 at 11:31 +0300, Michael S. Tsirkin wrote:
> Could you provide an example of a good setup?
> Specifically, is it a good idea for the vhost thread
> to inherit CPU affinities from qemu? 

I need to retest my set up with multi-threads vhost. My previous set up
applies to single thread vhost. The single stream netperf/netserver set
up, for example, if we have two quad-cores sockets to get the consistent
9.4Gb/s BW:

socket 1:
cpu0: netperf/netserver
cpu1: ixgbe 10GbE NIC IRQ 
cpu2: I/O thread
cpu3: vhost thread

socket 2:
cpu0: QEMU VCPU0
cpu1: QEMU VCPU1
cpu2:
cpu3:

Thanks
Shirley


^ permalink raw reply

* Re: Re: why do we need printk on sending syn flood cookie?
From: Franchoze Eric @ 2010-08-02 16:11 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netdev
In-Reply-To: <20100802081716.GA8374@Chamillionaire.breakpoint.cc>



02.08.10, 12:17, "Florian Westphal" <fw@strlen.de>:

> Franchoze Eric  wrote:
>  >  Just sirious why do we need printk each 1 second (60*HZ) about possible syn-flood? It really floods dmesg. Is there something dengerous? I have suggestion to turn off printk about sending tcp cookie each 1 second.
>  
>  It is handled exactly like other printks in the networking path,
>  e.g. receipt of tcp wscale == 15.
>  
>  Why does this need special treatment?
>  

For now I see "possible SYN flooding on port %d. Sending cookies.\n" message each second on my server. I know that there are a lot of SYNs and I know that kernel sends cookie. Why do I need so mach printk?
So I suggested add new value to /proc/sys/net/ipv4/tcp_syncookies, which will enable cookie but this printk will be turned off.

^ permalink raw reply

* Re: [RFC PATCH v8 00/16] Provide a zero-copy method on KVM virtio-net.
From: Avi Kivity @ 2010-08-02 16:11 UTC (permalink / raw)
  To: Shirley Ma
  Cc: xiaohui.xin, netdev, kvm, linux-kernel, mst, mingo, davem,
	herbert, jdike
In-Reply-To: <1280764918.22830.7.camel@localhost.localdomain>

  On 08/02/2010 07:01 PM, Shirley Ma wrote:
> Hello Avi,
>
> On Sun, 2010-08-01 at 11:18 +0300, Avi Kivity wrote:
>> I don't understand.  Under what conditions do you use
>> get_user_pages()
>> instead of get_user_pages_fast()?  Why?
> The code always calls get_user_pages_fast, however, the page will be
> unpinned in skb_free if the same page is not used again for a new
> buffer. The reason for unpin the page is we don't want to pin all of the
> guest kernel memory(memory over commit).

That is fine.

> So get_user_pages_fast will
> call slow path get_user_pages.

I don't understand this. gup_fast() only calls gup() if the page is 
swapped out or read-only.

> Your previous comment is suggesting to keep the page pinned for
> get_user_pages_fast fast path?
>

Right now I'm not sure I understand what's happening.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox