netdev.vger.kernel.org archive mirror
* [PATCH 0/3] Kernel interfaces for multiqueue aware socket
@ 2010-12-15 20:02 Fenghua Yu
  2010-12-15 20:02 ` [PATCH 1/3] " Fenghua Yu
                   ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Fenghua Yu @ 2010-12-15 20:02 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, John Fastabend, Xinan Tang,
	"Junchang Wang"
  Cc: netdev, linux-kernel, Fenghua Yu

From: Fenghua Yu <fenghua.yu@intel.com>

This patch set implements kernel ioctl interfaces for multiqueue aware sockets.
We hope network applications can use these interfaces to get rid of serialized
packet-assembly processing in the kernel and take advantage of the multiqueue
feature to handle packets in parallel.

Fenghua Yu (3):
  Kernel interfaces for multiqueue aware socket
  net/packet/af_packet.c: implement multiqueue aware socket in
    af_packet
  drivers/net/ixgbe/ixgbe_main.c: get tx queue mapping specified in
    socket

 drivers/net/ixgbe/ixgbe_main.c |    9 +++-
 include/linux/sockios.h        |    7 +++
 include/net/sock.h             |   18 +++++++
 net/core/sock.c                |    4 +-
 net/packet/af_packet.c         |  109 ++++++++++++++++++++++++++++++++++++++++
 5 files changed, 145 insertions(+), 2 deletions(-)

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 1/3] Kernel interfaces for multiqueue aware socket
  2010-12-15 20:02 [PATCH 0/3] Kernel interfaces for multiqueue aware socket Fenghua Yu
@ 2010-12-15 20:02 ` Fenghua Yu
  2010-12-15 20:48   ` Eric Dumazet
  2010-12-15 20:52   ` John Fastabend
  2010-12-15 20:02 ` [PATCH 2/3] net/packet/af_packet.c: implement multiqueue aware socket in af_packet Fenghua Yu
  2010-12-15 20:02 ` [PATCH 3/3] drivers/net/ixgbe/ixgbe_main.c: get tx queue mapping specified in socket Fenghua Yu
  2 siblings, 2 replies; 19+ messages in thread
From: Fenghua Yu @ 2010-12-15 20:02 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, John Fastabend, Xinan Tang,
	"Junchang Wang"
  Cc: netdev, linux-kernel, Fenghua Yu, Junchang Wang, Xinan Tang

From: Fenghua Yu <fenghua.yu@intel.com>

Multiqueue NICs and multicore CPUs provide a methodology for parallel packet
processing. The current kernel and network drivers place one queue on each
core, but the higher-level socket layer is unaware of multiqueue: a socket can
currently only receive or send packets through a single serialized path per
network interface. In some cases, e.g. multi-BPF-filter tcpdump and snort, a
lot of contention comes from socket operations such as the ring buffer. Even
if the application itself is fully parallelized, runs on a multi-core system,
and the NIC handles tx/rx on multiple queues in parallel, the network layer
and the NIC device driver still assemble packets into a single, serialized
queue. Thus the application cannot actually run in parallel at high speed.

One way to break this serialized packet-assembly bottleneck in the kernel is
to let sockets know about the multiple queues associated with a NIC interface,
so that each socket can handle tx/rx on one queue in parallel.

The kernel provides several interfaces by which sockets can be bound to rx/tx
queues. A user application can open several sockets, each bound to a single
queue, and thereby get data from the kernel in parallel; the contention
mentioned above is then removed.
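
For illustration, a hedged user-space sketch of that pattern (the SIO* ioctls
exist only with this patch set applied; the interface name eth0, the 64-queue
cap, and the minimal error handling are illustrative): one AF_PACKET socket is
opened per rx queue and each is drained by its own thread.

#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <net/if.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <linux/if_ether.h>
#include <linux/if_packet.h>
#include <linux/sockios.h>

#define MAX_QUEUES 64

/* open a raw AF_PACKET socket and bind it to the given interface */
static int open_bound_socket(const char *ifname)
{
	struct sockaddr_ll ll;
	int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));

	if (fd < 0)
		return -1;
	memset(&ll, 0, sizeof(ll));
	ll.sll_family = AF_PACKET;
	ll.sll_protocol = htons(ETH_P_ALL);
	ll.sll_ifindex = if_nametoindex(ifname);
	if (bind(fd, (struct sockaddr *)&ll, sizeof(ll)) < 0)
		return -1;
	return fd;
}

/* each thread drains the single rx queue its socket is bound to */
static void *rx_loop(void *arg)
{
	int fd = *(int *)arg;
	char buf[2048];

	for (;;)
		recv(fd, buf, sizeof(buf), 0);
	return NULL;
}

int main(void)
{
	static int fds[MAX_QUEUES];
	pthread_t tid[MAX_QUEUES];
	int probe, nrxq, q;

	/* one probe socket just to query the rx queue count of the NIC */
	probe = open_bound_socket("eth0");
	if (probe < 0 || ioctl(probe, SIOGNUMRXQUEUE, &nrxq) < 0)
		exit(1);
	if (nrxq > MAX_QUEUES)
		nrxq = MAX_QUEUES;

	for (q = 0; q < nrxq; q++) {
		fds[q] = open_bound_socket("eth0");
		/* the patch reads a __u16 from user space here; passing an
		 * int matches the documented prototype and works on
		 * little-endian machines */
		if (fds[q] < 0 || ioctl(fds[q], SIOSRXQUEUEMAPPING, &q) < 0)
			exit(1);
		pthread_create(&tid[q], NULL, rx_loop, &fds[q]);
	}
	pause();	/* rx threads run until the process is killed */
	return 0;
}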

With this patch, the user-space receive rate on an Intel SR1690 server with a
single L5640 6-core processor and a single ixgbe-based NIC goes from 0.73 Mpps
to 4.20 Mpps, nearly a linear speedup. An Intel SR1625 server with two E5530
4-core processors and a single ixgbe-based NIC goes from 0.80 Mpps to
4.6 Mpps. We noticed that the remaining performance penalty comes from NUMA
memory allocation.

This patch set provides kernel ioctl interfaces for user space. User space can
either call the interfaces directly, or libpcap interfaces can be provided on
top of the kernel ioctl interfaces.

The order of tx/rx packets is up to the user application. In some cases, e.g.
network monitors, ordering is not a big problem because they care more about
receiving and analyzing packets in parallel at the highest possible rate.

This patch set only implements the multiqueue interfaces for AF_PACKET and the
Intel ixgbe NIC. Other protocols and NICs can be handled on top of this patch
set.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Junchang Wang <junchangwang@gmail.com>
Signed-off-by: Xinan Tang <xinan.tang@intel.com>
---
 include/linux/sockios.h |    7 +++++++
 include/net/sock.h      |   18 ++++++++++++++++++
 net/core/sock.c         |    4 +++-
 3 files changed, 28 insertions(+), 1 deletions(-)

diff --git a/include/linux/sockios.h b/include/linux/sockios.h
index 241f179..b121d9a 100644
--- a/include/linux/sockios.h
+++ b/include/linux/sockios.h
@@ -66,6 +66,13 @@
 #define	SIOCSIFHWBROADCAST	0x8937	/* set hardware broadcast addr	*/
 #define SIOCGIFCOUNT	0x8938		/* get number of devices */
 
+#define SIOGNUMRXQUEUE	0x8939	/* Get number of rx queues. */
+#define SIOGNUMTXQUEUE	0x893A	/* Get number of tx queues. */
+#define SIOSRXQUEUEMAPPING	0x893B	/* Set rx queue mapping. */
+#define SIOSTXQUEUEMAPPING	0x893C	/* Set tx queue mapping. */
+#define SIOGRXQUEUEMAPPING	0x893D	/* Get rx queue mapping. */
+#define SIOGTXQUEUEMAPPING	0x893E	/* Get tx queue mapping. */
+
 #define SIOCGIFBR	0x8940		/* Bridging support		*/
 #define SIOCSIFBR	0x8941		/* Set bridging options 	*/
 
diff --git a/include/net/sock.h b/include/net/sock.h
index 659d968..d677bba 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -109,6 +109,7 @@ struct net;
  *	@skc_nulls_node: main hash linkage for TCP/UDP/UDP-Lite protocol
  *	@skc_refcnt: reference count
  *	@skc_tx_queue_mapping: tx queue number for this connection
+ *	@skc_rx_queue_mapping: rx queue number for this connection
  *	@skc_hash: hash value used with various protocol lookup tables
  *	@skc_u16hashes: two u16 hash values used by UDP lookup tables
  *	@skc_family: network address family
@@ -133,6 +134,7 @@ struct sock_common {
 	};
 	atomic_t		skc_refcnt;
 	int			skc_tx_queue_mapping;
+	int			skc_rx_queue_mapping;
 
 	union  {
 		unsigned int	skc_hash;
@@ -231,6 +233,7 @@ struct sock {
 #define sk_nulls_node		__sk_common.skc_nulls_node
 #define sk_refcnt		__sk_common.skc_refcnt
 #define sk_tx_queue_mapping	__sk_common.skc_tx_queue_mapping
+#define sk_rx_queue_mapping	__sk_common.skc_rx_queue_mapping
 
 #define sk_copy_start		__sk_common.skc_hash
 #define sk_hash			__sk_common.skc_hash
@@ -1234,6 +1237,21 @@ static inline int sk_tx_queue_get(const struct sock *sk)
 	return sk ? sk->sk_tx_queue_mapping : -1;
 }
 
+static inline void sk_rx_queue_set(struct sock *sk, int rx_queue)
+{
+	sk->sk_rx_queue_mapping = rx_queue;
+}
+
+static inline int sk_rx_queue_get(const struct sock *sk)
+{
+	return sk ? sk->sk_rx_queue_mapping : -1;
+}
+
+static inline void sk_rx_queue_clear(struct sock *sk)
+{
+	sk->sk_rx_queue_mapping = -1;
+}
+
 static inline void sk_set_socket(struct sock *sk, struct socket *sock)
 {
 	sk_tx_queue_clear(sk);
diff --git a/net/core/sock.c b/net/core/sock.c
index fb60801..9ad92cb 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1000,7 +1000,8 @@ static void sock_copy(struct sock *nsk, const struct sock *osk)
 #endif
 	BUILD_BUG_ON(offsetof(struct sock, sk_copy_start) !=
 		     sizeof(osk->sk_node) + sizeof(osk->sk_refcnt) +
-		     sizeof(osk->sk_tx_queue_mapping));
+		     sizeof(osk->sk_tx_queue_mapping) +
+		     sizeof(osk->sk_rx_queue_mapping));
 	memcpy(&nsk->sk_copy_start, &osk->sk_copy_start,
 	       osk->sk_prot->obj_size - offsetof(struct sock, sk_copy_start));
 #ifdef CONFIG_SECURITY_NETWORK
@@ -1045,6 +1046,7 @@ static struct sock *sk_prot_alloc(struct proto *prot, gfp_t priority,
 		if (!try_module_get(prot->owner))
 			goto out_free_sec;
 		sk_tx_queue_clear(sk);
+		sk_rx_queue_clear(sk);
 	}
 
 	return sk;
-- 
1.6.0.3


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 2/3] net/packet/af_packet.c: implement multiqueue aware socket in af_packet
  2010-12-15 20:02 [PATCH 0/3] Kernel interfaces for multiqueue aware socket Fenghua Yu
  2010-12-15 20:02 ` [PATCH 1/3] " Fenghua Yu
@ 2010-12-15 20:02 ` Fenghua Yu
  2010-12-15 20:02 ` [PATCH 3/3] drivers/net/ixgbe/ixgbe_main.c: get tx queue mapping specified in socket Fenghua Yu
  2 siblings, 0 replies; 19+ messages in thread
From: Fenghua Yu @ 2010-12-15 20:02 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, John Fastabend, Xinan Tang,
	"Junchang Wang"
  Cc: netdev, linux-kernel, Fenghua Yu, Junchang Wang, Xinan Tang

From: Fenghua Yu <fenghua.yu@intel.com>

This patch implements multiqueue aware socket interfaces in af_packet.

The interfaces are:

1. ioctl(int sockfd, SIOSTXQUEUEMAPPING, int *tx_queue);
Set the tx queue mapping for sockfd.

2. ioctl(int sockfd, SIOGTXQUEUEMAPPING, int *tx_queue);
Get the tx queue mapping for sockfd. If no queue mapping is set, an error is
returned.

3. ioctl(int sockfd, SIOSRXQUEUEMAPPING, int *rx_queue);
Set the rx queue mapping for sockfd.

4. ioctl(int sockfd, SIOGRXQUEUEMAPPING, int *rx_queue);
Get the rx queue mapping for sockfd. If no queue mapping is set, an error is
returned.

5. ioctl(int sockfd, SIOGNUMTXQUEUE, int *num_tx_queue);
Get the number of tx queues configured on the NIC interface bound to sockfd.

6. ioctl(int sockfd, SIOGNUMRXQUEUE, int *num_rx_queue);
Get the number of rx queues configured on the NIC interface bound to sockfd.
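
A minimal user-space round trip over these ioctls might look like the sketch
below (hedged: these ioctls exist only with this patch set applied, sockfd is
assumed to be an AF_PACKET socket already bound to an interface — otherwise
the queue-count ioctls fail with EPERM — and the helper name is illustrative):

#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/sockios.h>

/* sockfd must already be an AF_PACKET socket bound to an interface;
 * the queue-count ioctls fail with EPERM otherwise */
static int bind_queues(int sockfd, int rxq, int txq)
{
	int num_rx, num_tx;

	if (ioctl(sockfd, SIOGNUMRXQUEUE, &num_rx) < 0 ||
	    ioctl(sockfd, SIOGNUMTXQUEUE, &num_tx) < 0)
		return -1;
	if (rxq >= num_rx || txq >= num_tx)	/* kernel would say -EINVAL */
		return -1;
	if (ioctl(sockfd, SIOSRXQUEUEMAPPING, &rxq) < 0 ||
	    ioctl(sockfd, SIOSTXQUEUEMAPPING, &txq) < 0)
		return -1;
	/* read the mappings back to confirm the binding took effect */
	if (ioctl(sockfd, SIOGRXQUEUEMAPPING, &rxq) < 0 ||
	    ioctl(sockfd, SIOGTXQUEUEMAPPING, &txq) < 0)
		return -1;
	printf("bound to rx queue %d, tx queue %d\n", rxq, txq);
	return 0;
}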

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Junchang Wang <junchangwang@gmail.com>
Signed-off-by: Xinan Tang <xinan.tang@intel.com>
---
 net/packet/af_packet.c |  109 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 109 insertions(+), 0 deletions(-)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 8298e67..022900d 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -659,6 +659,7 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev,
 	struct timeval tv;
 	struct timespec ts;
 	struct skb_shared_hwtstamps *shhwtstamps = skb_hwtstamps(skb);
+	int rx_queue_mapping;
 
 	if (skb->pkt_type == PACKET_LOOPBACK)
 		goto drop;
@@ -666,6 +667,11 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev,
 	sk = pt->af_packet_priv;
 	po = pkt_sk(sk);
 
+	rx_queue_mapping = sk_rx_queue_get(sk);
+	if (rx_queue_mapping >= 0)
+		if (skb_get_queue_mapping(skb) != rx_queue_mapping + 1)
+			goto drop;
+
 	if (!net_eq(dev_net(dev), sock_net(sk)))
 		goto drop;
 
@@ -2219,6 +2225,83 @@ static int packet_notifier(struct notifier_block *this, unsigned long msg, void
 	return NOTIFY_DONE;
 }
 
+static void set_queue_mapping(struct sock *sk, unsigned int cmd, __u16 queue)
+{
+	if (cmd == SIOSRXQUEUEMAPPING)
+		sk_rx_queue_set(sk, queue);
+	else
+		sk_tx_queue_set(sk, queue);
+}
+
+static int get_num_queues(struct socket *sock, unsigned int cmd,
+			 unsigned int *p)
+{
+	struct net_device *dev;
+	struct sock *sk = sock->sk;
+
+	if (!pkt_sk(sk)->ifindex)
+		return -EPERM;
+
+	dev = dev_get_by_index(sock_net(sk), pkt_sk(sk)->ifindex);
+	if (dev == NULL)
+		return -ENODEV;
+
+	switch (cmd) {
+	case SIOGNUMRXQUEUE:
+		*p = dev->real_num_rx_queues;
+		break;
+	case SIOGNUMTXQUEUE:
+		*p = dev->real_num_tx_queues;
+		break;
+	default:
+		return -EFAULT;
+	}
+	return 0;
+}
+
+static int set_sock_queue(struct socket *sock, unsigned int cmd,
+			  char __user *uarg)
+{
+	__u16 queue;
+	struct sock *sk = sock->sk;
+	struct net_device *dev;
+	int num_queues;
+
+	if (copy_from_user(&queue, uarg, sizeof(queue)))
+		return -EFAULT;
+
+	if (!pkt_sk(sk)->ifindex)
+		return -EPERM;
+
+	dev = dev_get_by_index(sock_net(sk), pkt_sk(sk)->ifindex);
+	if (dev == NULL)
+		return -ENODEV;
+
+	num_queues = cmd == SIOSRXQUEUEMAPPING ? dev->real_num_rx_queues :
+						dev->real_num_tx_queues;
+	if (queue >= num_queues)
+		return -EINVAL;
+
+	set_queue_mapping(sk, cmd, queue);
+	return 0;
+}
+
+static int get_sock_queue(struct socket *sock, unsigned int cmd, int *p)
+{
+	struct sock *sk = sock->sk;
+
+	switch (cmd) {
+	case SIOGTXQUEUEMAPPING:
+		*p = sk_tx_queue_get(sk);
+		break;
+	case SIOGRXQUEUEMAPPING:
+		*p = sk_rx_queue_get(sk);
+		break;
+	default:
+		return -EFAULT;
+	}
+	return 0;
+}
 
 static int packet_ioctl(struct socket *sock, unsigned int cmd,
 			unsigned long arg)
@@ -2267,6 +2350,32 @@ static int packet_ioctl(struct socket *sock, unsigned int cmd,
 		return inet_dgram_ops.ioctl(sock, cmd, arg);
 #endif
 
+	case SIOGNUMRXQUEUE:
+	case SIOGNUMTXQUEUE:
+	{
+		int err;
+		unsigned int num_queues;
+		err = get_num_queues(sock, cmd, &num_queues);
+		if (!err)
+			return put_user(num_queues, (int __user *)arg);
+		else
+			return err;
+	}
+	case SIOSRXQUEUEMAPPING:
+	case SIOSTXQUEUEMAPPING:
+		return set_sock_queue(sock, cmd, (char __user *)arg);
+
+	case SIOGRXQUEUEMAPPING:
+	case SIOGTXQUEUEMAPPING:
+	{
+		int err;
+		int queue_mapping;
+		err = get_sock_queue(sock, cmd, &queue_mapping);
+		if (!err)
+			return put_user(queue_mapping, (int __user *)arg);
+		else
+			return err;
+	}
 	default:
 		return -ENOIOCTLCMD;
 	}
-- 
1.6.0.3

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 3/3] drivers/net/ixgbe/ixgbe_main.c: get tx queue mapping specified in socket
  2010-12-15 20:02 [PATCH 0/3] Kernel interfaces for multiqueue aware socket Fenghua Yu
  2010-12-15 20:02 ` [PATCH 1/3] " Fenghua Yu
  2010-12-15 20:02 ` [PATCH 2/3] net/packet/af_packet.c: implement multiqueue aware socket in af_packet Fenghua Yu
@ 2010-12-15 20:02 ` Fenghua Yu
  2010-12-15 20:54   ` John Fastabend
  2 siblings, 1 reply; 19+ messages in thread
From: Fenghua Yu @ 2010-12-15 20:02 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, John Fastabend, Xinan Tang,
	"Junchang Wang"
  Cc: netdev, linux-kernel, Fenghua Yu, Junchang Wang, Xinan Tang

From: Fenghua Yu <fenghua.yu@intel.com>

Instead of using the calculated tx queue mapping, this patch selects the tx
queue mapping specified in the socket.

By doing this, the tx queue mapping can go beyond the number of cores and
fully exercise multiqueue TSS. Alternatively, an application can specify a
subset of cores/queues for sending packets and implement flexible
load-balancing policies.
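
For illustration, a hedged sketch of such a policy (assumes the SIO* ioctls
from this series and AF_PACKET sender sockets already bound to the ixgbe
interface; the helper name is illustrative):

#include <sys/ioctl.h>
#include <linux/sockios.h>

/* spread nsenders sender sockets round-robin across all tx queues,
 * independent of the core count; ixgbe_select_queue() in the diff below
 * then returns the bound queue instead of smp_processor_id() */
static void spread_senders(const int *sendfd, int nsenders, int num_txq)
{
	int i;

	for (i = 0; i < nsenders; i++) {
		int txq = i % num_txq;

		ioctl(sendfd[i], SIOSTXQUEUEMAPPING, &txq);
	}
}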

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Junchang Wang <junchangwang@gmail.com>
Signed-off-by: Xinan Tang <xinan.tang@intel.com>
---
 drivers/net/ixgbe/ixgbe_main.c |    9 ++++++++-
 1 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe_main.c b/drivers/net/ixgbe/ixgbe_main.c
index eee0b29..4d98928 100644
--- a/drivers/net/ixgbe/ixgbe_main.c
+++ b/drivers/net/ixgbe/ixgbe_main.c
@@ -6255,7 +6255,14 @@ static int ixgbe_maybe_stop_tx(struct net_device *netdev,
 static u16 ixgbe_select_queue(struct net_device *dev, struct sk_buff *skb)
 {
 	struct ixgbe_adapter *adapter = netdev_priv(dev);
-	int txq = smp_processor_id();
+	int txq;
+
+	txq = sk_tx_queue_get(skb->sk);
+
+	if (txq >= 0 && txq < dev->real_num_tx_queues)
+		return txq;
+
+	txq = smp_processor_id();
 #ifdef IXGBE_FCOE
 	__be16 protocol;
 
-- 
1.6.0.3

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/3] Kernel interfaces for multiqueue aware socket
  2010-12-15 20:02 ` [PATCH 1/3] " Fenghua Yu
@ 2010-12-15 20:48   ` Eric Dumazet
  2010-12-15 20:56     ` Eric Dumazet
                       ` (2 more replies)
  2010-12-15 20:52   ` John Fastabend
  1 sibling, 3 replies; 19+ messages in thread
From: Eric Dumazet @ 2010-12-15 20:48 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: David S. Miller, John Fastabend, Xinan Tang, Junchang Wang,
	netdev, linux-kernel

On Wednesday 15 December 2010 at 12:02 -0800, Fenghua Yu wrote:
> From: Fenghua Yu <fenghua.yu@intel.com>
> 
> Multiqueue NICs and multicore CPUs provide a methodology for parallel packet
> processing. The current kernel and network drivers place one queue on each
> core, but the higher-level socket layer is unaware of multiqueue: a socket
> can currently only receive or send packets through a single serialized path
> per network interface. In some cases, e.g. multi-BPF-filter tcpdump and
> snort, a lot of contention comes from socket operations such as the ring
> buffer. Even if the application itself is fully parallelized, runs on a
> multi-core system, and the NIC handles tx/rx on multiple queues in parallel,
> the network layer and the NIC device driver still assemble packets into a
> single, serialized queue. Thus the application cannot actually run in
> parallel at high speed.
> 
> One way to break this serialized packet-assembly bottleneck in the kernel is
> to let sockets know about the multiple queues associated with a NIC
> interface, so that each socket can handle tx/rx on one queue in parallel.
> 
> The kernel provides several interfaces by which sockets can be bound to
> rx/tx queues. A user application can open several sockets, each bound to a
> single queue, and thereby get data from the kernel in parallel; the
> contention mentioned above is then removed.
> 
> With this patch, the user-space receive rate on an Intel SR1690 server with
> a single L5640 6-core processor and a single ixgbe-based NIC goes from
> 0.73 Mpps to 4.20 Mpps, nearly a linear speedup. An Intel SR1625 server with
> two E5530 4-core processors and a single ixgbe-based NIC goes from 0.80 Mpps
> to 4.6 Mpps. We noticed that the remaining performance penalty comes from
> NUMA memory allocation.
> 

??? please elaborate on these NUMA memory allocations. This should be OK
after commit 564824b0c52c34692d (net: allocate skbs on local node)

> This patch set provides kernel ioctl interfaces for user space. User space
> can either call the interfaces directly, or libpcap interfaces can be
> provided on top of the kernel ioctl interfaces.

So, say we have 8 queues: you want libpcap to open 8 sockets and bind one to
each queue, adding a bpf filter to each of them. This is not a generic
approach, because it won't work for a UDP socket, for example.
And you can already do this using SKF_AD_QUEUE (added in commit
d19742fb)
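
For illustration, a minimal sketch of that SKF_AD_QUEUE approach
(SKF_AD_QUEUE and SO_ATTACH_FILTER are existing mainline interfaces; only the
helper name is illustrative): a classic BPF program that accepts only packets
whose recorded rx queue matches, attached to the socket.

#include <linux/filter.h>
#include <sys/socket.h>

/* accept only packets whose recorded rx queue equals "queue"; drop the rest */
static int attach_queue_filter(int fd, unsigned int queue)
{
	struct sock_filter insns[] = {
		/* A = skb->queue_mapping (BPF ancillary load) */
		{ BPF_LD | BPF_W | BPF_ABS, 0, 0, SKF_AD_OFF + SKF_AD_QUEUE },
		/* if (A == queue) accept else drop */
		{ BPF_JMP | BPF_JEQ | BPF_K, 0, 1, queue },
		{ BPF_RET | BPF_K, 0, 0, 0xffffffff },	/* accept */
		{ BPF_RET | BPF_K, 0, 0, 0 },		/* drop */
	};
	struct sock_fprog prog = {
		.len	= sizeof(insns) / sizeof(insns[0]),
		.filter	= insns,
	};

	return setsockopt(fd, SOL_SOCKET, SO_ATTACH_FILTER, &prog,
			  sizeof(prog));
}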

Also, your AF_PACKET patch only addresses mmapped sockets.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/3] Kernel interfaces for multiqueue aware socket
  2010-12-15 20:02 ` [PATCH 1/3] " Fenghua Yu
  2010-12-15 20:48   ` Eric Dumazet
@ 2010-12-15 20:52   ` John Fastabend
  1 sibling, 0 replies; 19+ messages in thread
From: John Fastabend @ 2010-12-15 20:52 UTC (permalink / raw)
  To: Yu, Fenghua
  Cc: David S. Miller, Eric Dumazet, Tang, Xinan, Junchang Wang, netdev,
	linux-kernel

On 12/15/2010 12:02 PM, Yu, Fenghua wrote:
> From: Fenghua Yu <fenghua.yu@intel.com>
> 
> Multiqueue NICs and multicore CPUs provide a methodology for parallel packet
> processing. The current kernel and network drivers place one queue on each
> core, but the higher-level socket layer is unaware of multiqueue: a socket
> can currently only receive or send packets through a single serialized path
> per network interface. In some cases, e.g. multi-BPF-filter tcpdump and
> snort, a lot of contention comes from socket operations such as the ring
> buffer. Even if the application itself is fully parallelized, runs on a
> multi-core system, and the NIC handles tx/rx on multiple queues in parallel,
> the network layer and the NIC device driver still assemble packets into a
> single, serialized queue. Thus the application cannot actually run in
> parallel at high speed.
> 
> One way to break this serialized packet-assembly bottleneck in the kernel is
> to let sockets know about the multiple queues associated with a NIC
> interface, so that each socket can handle tx/rx on one queue in parallel.
> 
> The kernel provides several interfaces by which sockets can be bound to
> rx/tx queues. A user application can open several sockets, each bound to a
> single queue, and thereby get data from the kernel in parallel; the
> contention mentioned above is then removed.
> 
> With this patch, the user-space receive rate on an Intel SR1690 server with
> a single L5640 6-core processor and a single ixgbe-based NIC goes from
> 0.73 Mpps to 4.20 Mpps, nearly a linear speedup. An Intel SR1625 server with
> two E5530 4-core processors and a single ixgbe-based NIC goes from 0.80 Mpps
> to 4.6 Mpps. We noticed that the remaining performance penalty comes from
> NUMA memory allocation.
> 
> This patch set provides kernel ioctl interfaces for user space. User space
> can either call the interfaces directly, or libpcap interfaces can be
> provided on top of the kernel ioctl interfaces.
> 
> The order of tx/rx packets is up to the user application. In some cases,
> e.g. network monitors, ordering is not a big problem because they care more
> about receiving and analyzing packets in parallel at the highest possible
> rate.
> 
> This patch set only implements the multiqueue interfaces for AF_PACKET and
> the Intel ixgbe NIC. Other protocols and NICs can be handled on top of this
> patch set.
> 
> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
> Signed-off-by: Junchang Wang <junchangwang@gmail.com>
> Signed-off-by: Xinan Tang <xinan.tang@intel.com>
> ---

I think it would be easier to manipulate the sk_hash to accomplish this. Allowing this from user space doesn't seem so great to me: you don't really want to pick the tx/rx bindings for sockets. I think what you actually want is to optimize the hashing for this case to avoid the bottleneck you observe.

I'm not too familiar with the af_packet code, but could you do this with a single flag that indicates the sk_hash should be set in {t}packet_snd()? Maybe I missed your point, or there is a reason this wouldn't work. But then you wouldn't need to do funny stuff in select_queue, and it would work with rps/xps as well.

--John.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/3] drivers/net/ixgbe/ixgbe_main.c: get tx queue mapping specified in socket
  2010-12-15 20:02 ` [PATCH 3/3] drivers/net/ixgbe/ixgbe_main.c: get tx queue mapping specified in socket Fenghua Yu
@ 2010-12-15 20:54   ` John Fastabend
  0 siblings, 0 replies; 19+ messages in thread
From: John Fastabend @ 2010-12-15 20:54 UTC (permalink / raw)
  To: Yu, Fenghua
  Cc: David S. Miller, Eric Dumazet, Tang, Xinan, Junchang Wang, netdev,
	linux-kernel

On 12/15/2010 12:02 PM, Yu, Fenghua wrote:
> From: Fenghua Yu <fenghua.yu@intel.com>
> 
> Instead of using the calculated tx queue mapping, this patch selects the tx
> queue mapping specified in the socket.
> 
> By doing this, the tx queue mapping can go beyond the number of cores and
> fully exercise multiqueue TSS. Alternatively, an application can specify a
> subset of cores/queues for sending packets and implement flexible
> load-balancing policies.
> 
> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
> Signed-off-by: Junchang Wang <junchangwang@gmail.com>
> Signed-off-by: Xinan Tang <xinan.tang@intel.com>
> ---
>  drivers/net/ixgbe/ixgbe_main.c |    9 ++++++++-
>  1 files changed, 8 insertions(+), 1 deletions(-)
> 
> diff --git a/drivers/net/ixgbe/ixgbe_main.c b/drivers/net/ixgbe/ixgbe_main.c
> index eee0b29..4d98928 100644
> --- a/drivers/net/ixgbe/ixgbe_main.c
> +++ b/drivers/net/ixgbe/ixgbe_main.c
> @@ -6255,7 +6255,14 @@ static int ixgbe_maybe_stop_tx(struct net_device *netdev,
>  static u16 ixgbe_select_queue(struct net_device *dev, struct sk_buff *skb)
>  {
>  	struct ixgbe_adapter *adapter = netdev_priv(dev);
> -	int txq = smp_processor_id();
> +	int txq;
> +
> +	txq = sk_tx_queue_get(skb->sk);
> +
> +	if (txq >= 0 && txq < dev->real_num_tx_queues)
> +		return txq;
> +
> +	txq = smp_processor_id();
>  #ifdef IXGBE_FCOE
>  	__be16 protocol;
>  

We are trying to remove stuff from select_queue, not add to it. However you solve this problem, I believe the solution should be generic and not specific to ixgbe.

Thanks,
John.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/3] Kernel interfaces for multiqueue aware socket
  2010-12-15 20:48   ` Eric Dumazet
@ 2010-12-15 20:56     ` Eric Dumazet
  2010-12-16  1:14     ` Fenghua Yu
  2010-12-16  1:52     ` Junchang Wang
  2 siblings, 0 replies; 19+ messages in thread
From: Eric Dumazet @ 2010-12-15 20:56 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: David S. Miller, John Fastabend, Xinan Tang, Junchang Wang,
	netdev, linux-kernel

On Wednesday 15 December 2010 at 21:48 +0100, Eric Dumazet wrote:
> On Wednesday 15 December 2010 at 12:02 -0800, Fenghua Yu wrote:
> > From: Fenghua Yu <fenghua.yu@intel.com>
> > 
> > Multiqueue NICs and multicore CPUs provide a methodology for parallel
> > packet processing. The current kernel and network drivers place one queue
> > on each core, but the higher-level socket layer is unaware of multiqueue:
> > a socket can currently only receive or send packets through a single
> > serialized path per network interface. In some cases, e.g.
> > multi-BPF-filter tcpdump and snort, a lot of contention comes from socket
> > operations such as the ring buffer. Even if the application itself is
> > fully parallelized, runs on a multi-core system, and the NIC handles tx/rx
> > on multiple queues in parallel, the network layer and the NIC device
> > driver still assemble packets into a single, serialized queue. Thus the
> > application cannot actually run in parallel at high speed.

I forgot to say that your patches are not against net-next-2.6, and do not
apply anyway.

Always use David's trees for networking patches...

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/3] Kernel interfaces for multiqueue aware socket
  2010-12-15 20:48   ` Eric Dumazet
  2010-12-15 20:56     ` Eric Dumazet
@ 2010-12-16  1:14     ` Fenghua Yu
  2010-12-16  1:23       ` Stephen Hemminger
                         ` (2 more replies)
  2010-12-16  1:52     ` Junchang Wang
  2 siblings, 3 replies; 19+ messages in thread
From: Fenghua Yu @ 2010-12-16  1:14 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Yu, Fenghua, David S. Miller, Fastabend, John R, Tang, Xinan,
	Junchang Wang, netdev, linux-kernel

On Wed, Dec 15, 2010 at 12:48:38PM -0800, Eric Dumazet wrote:
> On Wednesday 15 December 2010 at 12:02 -0800, Fenghua Yu wrote:
> > From: Fenghua Yu <fenghua.yu@intel.com>
> > 
> > Multiqueue NICs and multicore CPUs provide a methodology for parallel
> > packet processing. The current kernel and network drivers place one queue
> > on each core, but the higher-level socket layer is unaware of multiqueue:
> > a socket can currently only receive or send packets through a single
> > serialized path per network interface. In some cases, e.g.
> > multi-BPF-filter tcpdump and snort, a lot of contention comes from socket
> > operations such as the ring buffer. Even if the application itself is
> > fully parallelized, runs on a multi-core system, and the NIC handles tx/rx
> > on multiple queues in parallel, the network layer and the NIC device
> > driver still assemble packets into a single, serialized queue. Thus the
> > application cannot actually run in parallel at high speed.
> > 
> > One way to break this serialized packet-assembly bottleneck in the kernel
> > is to let sockets know about the multiple queues associated with a NIC
> > interface, so that each socket can handle tx/rx on one queue in parallel.
> > 
> > The kernel provides several interfaces by which sockets can be bound to
> > rx/tx queues. A user application can open several sockets, each bound to a
> > single queue, and thereby get data from the kernel in parallel; the
> > contention mentioned above is then removed.
> > 
> > With this patch, the user-space receive rate on an Intel SR1690 server
> > with a single L5640 6-core processor and a single ixgbe-based NIC goes
> > from 0.73 Mpps to 4.20 Mpps, nearly a linear speedup. An Intel SR1625
> > server with two E5530 4-core processors and a single ixgbe-based NIC goes
> > from 0.80 Mpps to 4.6 Mpps. We noticed that the remaining performance
> > penalty comes from NUMA memory allocation.
> > 
> 
> ??? please elaborate on these NUMA memory allocations. This should be OK
> after commit 564824b0c52c34692d (net: allocate skbs on local node)
> 
> > This patch set provides kernel ioctl interfaces for user space. User space
> > can either call the interfaces directly, or libpcap interfaces can be
> > provided on top of the kernel ioctl interfaces.
> 
> So, say we have 8 queues: you want libpcap to open 8 sockets and bind one to
> each queue, adding a bpf filter to each of them. This is not a generic
> approach, because it won't work for a UDP socket, for example.

This only works for AF_PACKET, as this patch set shows.

> And you can already do this using SKF_AD_QUEUE (added in commit
> d19742fb)

SKF_AD_QUEUE doesn't expose the number of rx queues, so a user application
can't specify the right SKF_AD_QUEUE value.

SKF_AD_QUEUE only works for rx; there is no queue-binding interface for tx.

I can change the patch set to use SKF_AD_QUEUE by removing the set-rx-queue
interface and still keep the following interfaces:
#define SIOGNUMRXQUEUE 0x8939  /* Get number of rx queues. */
#define SIOGNUMTXQUEUE 0x893A  /* Get number of tx queues. */
#define SIOSTXQUEUEMAPPING     0x893C  /* Set tx queue mapping. */
#define SIOGRXQUEUEMAPPING     0x893D  /* Get rx queue mapping. */
#define SIOGTXQUEUEMAPPING     0x893E  /* Get tx queue mapping. */

> 
> Also, your AF_PACKET patch only addresses mmapped sockets.
> 
The new patch set will use SKF_AD_QUEUE for rx, so it won't be limited to
mmapped sockets.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/3] Kernel interfaces for multiqueue aware socket
  2010-12-16  1:14     ` Fenghua Yu
@ 2010-12-16  1:23       ` Stephen Hemminger
  2010-12-16  1:28       ` Changli Gao
  2010-12-16  4:44       ` Eric Dumazet
  2 siblings, 0 replies; 19+ messages in thread
From: Stephen Hemminger @ 2010-12-16  1:23 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Eric Dumazet, David S. Miller, Fastabend, John R, Tang, Xinan,
	Junchang Wang, netdev, linux-kernel

On Wed, 15 Dec 2010 17:14:25 -0800
Fenghua Yu <fenghua.yu@intel.com> wrote:

> On Wed, Dec 15, 2010 at 12:48:38PM -0800, Eric Dumazet wrote:
> > On Wednesday 15 December 2010 at 12:02 -0800, Fenghua Yu wrote:
> > > From: Fenghua Yu <fenghua.yu@intel.com>
> > > 
> > > Multiqueue NICs and multicore CPUs provide a methodology for parallel
> > > packet processing. The current kernel and network drivers place one
> > > queue on each core, but the higher-level socket layer is unaware of
> > > multiqueue: a socket can currently only receive or send packets through
> > > a single serialized path per network interface. In some cases, e.g.
> > > multi-BPF-filter tcpdump and snort, a lot of contention comes from
> > > socket operations such as the ring buffer. Even if the application
> > > itself is fully parallelized, runs on a multi-core system, and the NIC
> > > handles tx/rx on multiple queues in parallel, the network layer and the
> > > NIC device driver still assemble packets into a single, serialized
> > > queue. Thus the application cannot actually run in parallel at high
> > > speed.
> > > 
> > > One way to break this serialized packet-assembly bottleneck in the
> > > kernel is to let sockets know about the multiple queues associated with
> > > a NIC interface, so that each socket can handle tx/rx on one queue in
> > > parallel.
> > > 
> > > The kernel provides several interfaces by which sockets can be bound to
> > > rx/tx queues. A user application can open several sockets, each bound to
> > > a single queue, and thereby get data from the kernel in parallel; the
> > > contention mentioned above is then removed.
> > > 
> > > With this patch, the user-space receive rate on an Intel SR1690 server
> > > with a single L5640 6-core processor and a single ixgbe-based NIC goes
> > > from 0.73 Mpps to 4.20 Mpps, nearly a linear speedup. An Intel SR1625
> > > server with two E5530 4-core processors and a single ixgbe-based NIC
> > > goes from 0.80 Mpps to 4.6 Mpps. We noticed that the remaining
> > > performance penalty comes from NUMA memory allocation.
> > > 
> > 
> > ??? please elaborate on these NUMA memory allocations. This should be OK
> > after commit 564824b0c52c34692d (net: allocate skbs on local node)
> > 
> > > This patch set provides kernel ioctl interfaces for user space. User
> > > space can either call the interfaces directly, or libpcap interfaces can
> > > be provided on top of the kernel ioctl interfaces.
> > 
> > So, say we have 8 queues: you want libpcap to open 8 sockets and bind one
> > to each queue, adding a bpf filter to each of them. This is not a generic
> > approach, because it won't work for a UDP socket, for example.
> 
> This only works for AF_PACKET, as this patch set shows.
> 
> > And you can already do this using SKF_AD_QUEUE (added in commit
> > d19742fb)
> 
> SKF_AD_QUEUE doesn't expose the number of rx queues, so a user application
> can't specify the right SKF_AD_QUEUE value.
> 
> SKF_AD_QUEUE only works for rx; there is no queue-binding interface for tx.
> 
> I can change the patch set to use SKF_AD_QUEUE by removing the set-rx-queue
> interface and still keep the following interfaces:
> #define SIOGNUMRXQUEUE 0x8939  /* Get number of rx queues. */
> #define SIOGNUMTXQUEUE 0x893A  /* Get number of tx queues. */
> #define SIOSTXQUEUEMAPPING     0x893C  /* Set tx queue mapping. */
> #define SIOGRXQUEUEMAPPING     0x893D  /* Get rx queue mapping. */
> #define SIOGTXQUEUEMAPPING     0x893E  /* Get tx queue mapping. */
> 
> > 
> > Also, your AF_PACKET patch only addresses mmapped sockets.
> > 
> The new patch set will use SKF_AD_QUEUE for rx, so it won't be limited to
> mmapped sockets.

Do we really want to expose this kind of internals to userspace?
The problem is that once exposed, it becomes a kernel ABI and can never
change.
-- 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/3] Kernel interfaces for multiqueue aware socket
  2010-12-16  1:14     ` Fenghua Yu
  2010-12-16  1:23       ` Stephen Hemminger
@ 2010-12-16  1:28       ` Changli Gao
  2010-12-16  2:43         ` Dimitris Michailidis
  2010-12-17  6:22         ` Junchang Wang
  2010-12-16  4:44       ` Eric Dumazet
  2 siblings, 2 replies; 19+ messages in thread
From: Changli Gao @ 2010-12-16  1:28 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Eric Dumazet, David S. Miller, Fastabend, John R, Tang, Xinan,
	Junchang Wang, netdev, linux-kernel

On Thu, Dec 16, 2010 at 9:14 AM, Fenghua Yu <fenghua.yu@intel.com> wrote:
>
> SKF_AD_QUEUE doesn't expose the number of rx queues, so a user application
> can't specify the right SKF_AD_QUEUE value.

That is wrong. AFAIK, you can get the queue count through
/sys/class/net/eth*/queues/ or /proc/interrupts.
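
For illustration, a sketch of that sysfs approach, counting the rx-* entries
under the queues/ directory (the interface name is an assumption):

#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* count the rx-* entries under /sys/class/net/<ifname>/queues/ */
static int count_rx_queues(const char *ifname)
{
	char path[256];
	struct dirent *de;
	DIR *dir;
	int n = 0;

	snprintf(path, sizeof(path), "/sys/class/net/%s/queues", ifname);
	dir = opendir(path);
	if (!dir)
		return -1;
	while ((de = readdir(dir)) != NULL)
		if (strncmp(de->d_name, "rx-", 3) == 0)
			n++;
	closedir(dir);
	return n;
}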


>
> SKF_AD_QUEUE only works for rx; there is no queue-binding interface for tx.

Do you really need the queue number? The packets are already spread among the
CPUs; I think you mean the current CPU number. Please see SKF_AD_CPU, added
by Eric.

>
> I can change the patch set to use SKF_AD_QUEUE by removing the set-rx-queue
> interface and still keep the following interfaces:
> #define SIOGNUMRXQUEUE 0x8939  /* Get number of rx queues. */
> #define SIOGNUMTXQUEUE 0x893A  /* Get number of tx queues. */
> #define SIOSTXQUEUEMAPPING     0x893C  /* Set tx queue mapping. */
> #define SIOGRXQUEUEMAPPING     0x893D  /* Get rx queue mapping. */
> #define SIOGTXQUEUEMAPPING     0x893E  /* Get tx queue mapping. */
>
>>
>> Also, your AF_PACKET patch only addresses mmapped sockets.
>>
> The new patch set will use SKF_AD_QUEUE for rx, so it won't be limited to
> mmapped sockets.
>

If you turn to SKF_AD_QUEUE, I think no kernel patch is needed.


-- 
Regards,
Changli Gao(xiaosuo@gmail.com)

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/3] Kernel interfaces for multiqueue aware socket
  2010-12-15 20:48   ` Eric Dumazet
  2010-12-15 20:56     ` Eric Dumazet
  2010-12-16  1:14     ` Fenghua Yu
@ 2010-12-16  1:52     ` Junchang Wang
  2010-12-16  5:00       ` Eric Dumazet
  2 siblings, 1 reply; 19+ messages in thread
From: Junchang Wang @ 2010-12-16  1:52 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Fenghua Yu, David S. Miller, John Fastabend, Xinan Tang, netdev,
	linux-kernel

On Thu, Dec 16, 2010 at 4:48 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> With this patch, the user-space receive rate on an Intel SR1690 server with
>> a single L5640 6-core processor and a single ixgbe-based NIC goes from
>> 0.73 Mpps to 4.20 Mpps, nearly a linear speedup. An Intel SR1625 server with
>> two E5530 4-core processors and a single ixgbe-based NIC goes from 0.80 Mpps
>> to 4.6 Mpps. We noticed that the remaining performance penalty comes from
>> NUMA memory allocation.
>>
>
> ??? please elaborate on these NUMA memory allocations. This should be OK
> after commit 564824b0c52c34692d (net: allocate skbs on local node)
>
Hi Eric,
Commit 564824b0c52c34692d was used in the experiments, but the problem
remained unsolved.

SLUB was used, and both servers were equipped with 8 GB of physical memory.
Is there any additional information I can provide?

Thanks.
-- 
--Junchang

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/3] Kernel interfaces for multiqueue aware socket
  2010-12-16  1:28       ` Changli Gao
@ 2010-12-16  2:43         ` Dimitris Michailidis
  2010-12-17  6:22         ` Junchang Wang
  1 sibling, 0 replies; 19+ messages in thread
From: Dimitris Michailidis @ 2010-12-16  2:43 UTC (permalink / raw)
  To: Changli Gao
  Cc: Fenghua Yu, Eric Dumazet, David S. Miller, Fastabend, John R,
	Tang, Xinan, Junchang Wang, netdev, linux-kernel

Changli Gao wrote:
> On Thu, Dec 16, 2010 at 9:14 AM, Fenghua Yu <fenghua.yu@intel.com> wrote:
>> SKF_AD_QUEUE doesn't expose the number of rx queues, so a user application
>> can't specify the right SKF_AD_QUEUE value.
> 
> That is wrong. AFAIK, you can get the queue count through
> /sys/class/net/eth*/queues/ or /proc/interrupts.

The number of Rx queues is also available through ETHTOOL_GRXRINGS, though
few drivers support it currently.
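
For illustration, a sketch of that ethtool query (ETHTOOL_GRXRINGS and
SIOCETHTOOL are existing mainline interfaces; fd can be any socket, e.g. an
AF_INET datagram socket, and the helper name is illustrative):

#include <string.h>
#include <net/if.h>
#include <sys/ioctl.h>
#include <linux/ethtool.h>
#include <linux/sockios.h>

/* return the rx ring count reported by the driver, or -1 on error */
static long long get_rx_rings(int fd, const char *ifname)
{
	struct ethtool_rxnfc cmd;
	struct ifreq ifr;

	memset(&cmd, 0, sizeof(cmd));
	memset(&ifr, 0, sizeof(ifr));
	cmd.cmd = ETHTOOL_GRXRINGS;
	strncpy(ifr.ifr_name, ifname, sizeof(ifr.ifr_name) - 1);
	ifr.ifr_data = (void *)&cmd;
	if (ioctl(fd, SIOCETHTOOL, &ifr) < 0)
		return -1;
	return cmd.data;	/* driver fills in the ring count */
}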

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/3] Kernel interfaces for multiqueue aware socket
  2010-12-16  1:14     ` Fenghua Yu
  2010-12-16  1:23       ` Stephen Hemminger
  2010-12-16  1:28       ` Changli Gao
@ 2010-12-16  4:44       ` Eric Dumazet
  2010-12-17  6:12         ` Junchang Wang
  2 siblings, 1 reply; 19+ messages in thread
From: Eric Dumazet @ 2010-12-16  4:44 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: David S. Miller, Fastabend, John R, Tang, Xinan, Junchang Wang,
	netdev, linux-kernel

On Wednesday 15 December 2010 at 17:14 -0800, Fenghua Yu wrote:
> On Wed, Dec 15, 2010 at 12:48:38PM -0800, Eric Dumazet wrote:
> > On Wednesday 15 December 2010 at 12:02 -0800, Fenghua Yu wrote:
> > > From: Fenghua Yu <fenghua.yu@intel.com>
> > > 
> > > Multiqueue and multicore provide packet parallel processing methodology.
> > > Current kernel and network drivers place one queue on one core. But the higher
> > > level socket doesn't know multiqueue. Current socket only can receive or send
> > > packets through one network interfaces. In some cases e.g. multi bpf filter
> > > tcpdump and snort, a lot of contentions come from socket operations like ring
> > > buffer. Even if the application itself has been fully parallelized and run on
> > > multi-core systems and NIC handlex tx/rx in multiqueue in parallel, network layer
> > > and NIC device driver assemble packets to a single, serialized queue. Thus the
> > > application cannot actually run in parallel in high speed.
> > > 
> > > To break the serialized packets assembling bottleneck in kernel, one way is to
> > > allow socket to know multiqueue associated with a NIC interface. So each socket
> > > can handle tx/rx in one queue in parallel.
> > > 
> > > Kernel provides several interfaces by which sockets can be bound to rx/tx queues.
> > > User applications can configure socket by providing several sockets that each
> > > bound to a single queue, applications can get data from kernel in parallel. After
> > > that, competitions mentioned above can be removed.
> > > 
> > > With this patch, the user-space receiving speed on a Intel SR1690 server with
> > > a single L5640 6-core processor and a single ixgbe-based NIC goes from 0.73Mpps
> > > to 4.20Mpps, nearly a linear speedup. A Intel SR1625 server two E5530 4-core
> > > processors and a single ixgbe-based NIC goes from 0.80Mpps to 4.6Mpps. We noticed
> > > the performance penalty comes from NUMA memory allocation.
> > > 
> > 
> > ??? please elaborate on these NUMA memory allocations. This should be OK
> > after commit 564824b0c52c34692d (net: allocate skbs on local node)
> > 

No data for this NUMA problem?
We had to convince Andrew Morton to get this patch in.

> > > This patch set provides kernel ioctl interfaces for user space. User
> > > space can either call the interfaces directly, or libpcap interfaces can
> > > be provided on top of the kernel ioctl interfaces.
> > 
> > So, say we have 8 queues: you want libpcap to open 8 sockets and bind one
> > to each queue, adding a bpf filter to each of them. This is not a generic
> > approach, because it won't work for a UDP socket, for example.
> 
> This only works for AF_PACKET, as this patch set shows.
> 

Yes, we should also address other sockets, with generic mechanisms.

> > And you can already do this using SKF_AD_QUEUE (added in commit
> > d19742fb)
> 
> SKF_AD_QUEUE doesn't expose the number of rx queues, so a user application
> can't specify the right SKF_AD_QUEUE value.
> 
> SKF_AD_QUEUE only works for rx; there is no queue-binding interface for tx.
> 
> I can change the patch set to use SKF_AD_QUEUE by removing the set-rx-queue
> interface and still keep the following interfaces:
> #define SIOGNUMRXQUEUE 0x8939  /* Get number of rx queues. */
> #define SIOGNUMTXQUEUE 0x893A  /* Get number of tx queues. */
> #define SIOSTXQUEUEMAPPING     0x893C  /* Set tx queue mapping. */
> #define SIOGRXQUEUEMAPPING     0x893D  /* Get rx queue mapping. */
> #define SIOGTXQUEUEMAPPING     0x893E  /* Get tx queue mapping. */
> 
> > 
> > Also, your AF_PACKET patch only addresses mmapped sockets.
> > 
> The new patch set will use SKF_AD_QUEUE for rx, so it won't be limited to
> mmapped sockets.
> 

We really need to be smarter than that, not add a raw API.

Tom Herbert added RPS, RFS, and XPS in a way that applications don't have to
use a special API; they just run normal code.

Please understand that using 8 AF_PACKET sockets bound to a given device is a
total waste, because of the way we loop over ptype_all before entering the
AF_PACKET code: in 12% of the cases we deliver the packet into a queue, and
in 77.5% of the cases we reject the packet.

This is absolutely not scalable to, say... 64 queues.

I do believe we can handle that using one AF_PACKET socket for the RX side,
in order not to slow down the loop we have in __netif_receive_skb():

list_for_each_entry_rcu(ptype, &ptype_all, list) {
	...
	deliver_skb(skb, pt_prev, orig_dev);
}

(Same problem with dev_queue_xmit_nit(), by the way; even worse, since we
skb_clone() the packet _before_ entering the af_packet code.)

And we can change af_packet to split the load across N skb queues or N ring
buffers, N not necessarily being the number of NIC queues, but the number
needed to handle the expected load.

There is nothing preventing us from changing af_packet/udp/tcp_listener into
something more scalable in itself, using a set of receive queues and
NUMA-friendly data sets. We made net_device multiqueue like this, rather than
adding N pseudo devices as we could have done.




^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/3] Kernel interfaces for multiqueue aware socket
  2010-12-16  1:52     ` Junchang Wang
@ 2010-12-16  5:00       ` Eric Dumazet
  2010-12-17  6:15         ` Junchang Wang
  0 siblings, 1 reply; 19+ messages in thread
From: Eric Dumazet @ 2010-12-16  5:00 UTC (permalink / raw)
  To: Junchang Wang
  Cc: Fenghua Yu, David S. Miller, John Fastabend, Xinan Tang, netdev,
	linux-kernel

On Thursday 16 December 2010 at 09:52 +0800, Junchang Wang wrote:
> On Thu, Dec 16, 2010 at 4:48 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >> With this patch, the user-space receive rate on an Intel SR1690 server
> >> with a single L5640 6-core processor and a single ixgbe-based NIC goes
> >> from 0.73 Mpps to 4.20 Mpps, nearly a linear speedup. An Intel SR1625
> >> server with two E5530 4-core processors and a single ixgbe-based NIC goes
> >> from 0.80 Mpps to 4.6 Mpps. We noticed that the remaining performance
> >> penalty comes from NUMA memory allocation.
> >>
> >
> > ??? please elaborate on these NUMA memory allocations. This should be OK
> > after commit 564824b0c52c34692d (net: allocate skbs on local node)
> >
> Hi Eric,
> Commit 564824b0c52c34692d was used in the experiments, but the problem
> remained unsolved.
> 
> SLUB was used, and both servers were equipped with 8 GB of physical memory.
> Is there any additional information I can provide?
> 

Yes, sure: you could provide a description of the benchmark you used and the
data you gathered to conclude that NUMA was a problem.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/3] Kernel interfaces for multiqueue aware socket
  2010-12-16  4:44       ` Eric Dumazet
@ 2010-12-17  6:12         ` Junchang Wang
  0 siblings, 0 replies; 19+ messages in thread
From: Junchang Wang @ 2010-12-17  6:12 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Fenghua Yu, David S. Miller, Fastabend, John R, Tang, Xinan,
	netdev, linux-kernel

On Thu, Dec 16, 2010 at 12:44 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> We really need to be smarter than that, not add a raw API.
>
> Tom Herbert added RPS, RFS, and XPS in a way that applications don't have to
> use a special API; they just run normal code.
>
> Please understand that using 8 AF_PACKET sockets bound to a given device is
> a total waste, because of the way we loop over ptype_all before entering the
> AF_PACKET code: in 12% of the cases we deliver the packet into a queue, and
> in 77.5% of the cases we reject the packet.
>
> This is absolutely not scalable to, say... 64 queues.
>
> I do believe we can handle that using one AF_PACKET socket for the RX side,
> in order not to slow down the loop we have in __netif_receive_skb():
>
> list_for_each_entry_rcu(ptype, &ptype_all, list) {
>        ...
>        deliver_skb(skb, pt_prev, orig_dev);
> }
>
> (Same problem with dev_queue_xmit_nit(), by the way; even worse, since we
> skb_clone() the packet _before_ entering the af_packet code.)
>
> And we can change af_packet to split the load across N skb queues or N ring
> buffers, N not necessarily being the number of NIC queues, but the number
> needed to handle the expected load.
>
> There is nothing preventing us from changing af_packet/udp/tcp_listener into
> something more scalable in itself, using a set of receive queues and
> NUMA-friendly data sets. We made net_device multiqueue like this, rather
> than adding N pseudo devices as we could have done.
>
Valuable comments. Thank you very much.

We'll cook a new version and resubmit it.


-- 
--Junchang

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/3] Kernel interfaces for multiqueue aware socket
  2010-12-16  5:00       ` Eric Dumazet
@ 2010-12-17  6:15         ` Junchang Wang
  0 siblings, 0 replies; 19+ messages in thread
From: Junchang Wang @ 2010-12-17  6:15 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Fenghua Yu, David S. Miller, John Fastabend, Xinan Tang, netdev,
	linux-kernel

On Thu, Dec 16, 2010 at 1:00 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Thursday 16 December 2010 at 09:52 +0800, Junchang Wang wrote:
>> Commit 564824b0c52c34692d was used in the experiments, but the problem
>> remained unsolved.
>>
>> SLUB was used, and both servers were equipped with 8 GB of physical memory.
>> Is there any additional information I can provide?
>>
>
> Yes, sure: you could provide a description of the benchmark you used and the
> data you gathered to conclude that NUMA was a problem.
>
Under the current circumstances (1 Mpps), we can hardly see side effects from
the memory allocator. At higher speeds (say, 5 Mpps with this patch set), the
problem emerges.

I'll continue this work after the patch set is done.


Thanks.
-- 
--Junchang

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/3] Kernel interfaces for multiqueue aware socket
  2010-12-16  1:28       ` Changli Gao
  2010-12-16  2:43         ` Dimitris Michailidis
@ 2010-12-17  6:22         ` Junchang Wang
  2010-12-17  6:50           ` Eric Dumazet
  1 sibling, 1 reply; 19+ messages in thread
From: Junchang Wang @ 2010-12-17  6:22 UTC (permalink / raw)
  To: Changli Gao
  Cc: Fenghua Yu, Eric Dumazet, David S. Miller, Fastabend, John R,
	Tang, Xinan, netdev, linux-kernel

On Thu, Dec 16, 2010 at 9:28 AM, Changli Gao <xiaosuo@gmail.com> wrote:
> On Thu, Dec 16, 2010 at 9:14 AM, Fenghua Yu <fenghua.yu@intel.com> wrote:
>>
>> SKF_AD_QUEUE doesn't expose the number of rx queues, so a user application
>> can't specify the right SKF_AD_QUEUE value.
>
> That is wrong. AFAIK, you can get the queue count through
> /sys/class/net/eth*/queues/ or /proc/interrupts.
>

Valuable comment. Thanks.

>
> If you turn to SKF_AD_QUEUE, I think no kernel patch is needed.
>
This patch set is about parallelizing the socket interfaces to gain a
performance boost (say, from 1 Mpps to around 5 Mpps), rather than simply
binding a socket to a CPU/queue. Therefore, it is worth having.


Thanks.
-- 
--Junchang

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/3] Kernel interfaces for multiqueue aware socket
  2010-12-17  6:22         ` Junchang Wang
@ 2010-12-17  6:50           ` Eric Dumazet
  0 siblings, 0 replies; 19+ messages in thread
From: Eric Dumazet @ 2010-12-17  6:50 UTC (permalink / raw)
  To: Junchang Wang
  Cc: Changli Gao, Fenghua Yu, David S. Miller, Fastabend, John R,
	Tang, Xinan, netdev, linux-kernel

On Friday 17 December 2010 at 14:22 +0800, Junchang Wang wrote:
> On Thu, Dec 16, 2010 at 9:28 AM, Changli Gao <xiaosuo@gmail.com> wrote:
> > On Thu, Dec 16, 2010 at 9:14 AM, Fenghua Yu <fenghua.yu@intel.com> wrote:
> >>
> >> SKF_AD_QUEUE doesn't expose the number of rx queues, so a user
> >> application can't specify the right SKF_AD_QUEUE value.
> >
> > That is wrong. AFAIK, you can get the queue count through
> > /sys/class/net/eth*/queues/ or /proc/interrupts.
> >
> 
> Valuable comment. Thanks.
> 
> >
> > If you turn to SKF_AD_QUEUE, I think no kernel patch is needed.
> >
> This patch set is about parallelizing the socket interfaces to gain a
> performance boost (say, from 1 Mpps to around 5 Mpps), rather than simply
> binding a socket to a CPU/queue. Therefore, it is worth having.
> 

Definitely, but this needs to be designed so that even dumb applications can
use it :)




^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread

Thread overview: 19+ messages
2010-12-15 20:02 [PATCH 0/3] Kernel interfaces for multiqueue aware socket Fenghua Yu
2010-12-15 20:02 ` [PATCH 1/3] " Fenghua Yu
2010-12-15 20:48   ` Eric Dumazet
2010-12-15 20:56     ` Eric Dumazet
2010-12-16  1:14     ` Fenghua Yu
2010-12-16  1:23       ` Stephen Hemminger
2010-12-16  1:28       ` Changli Gao
2010-12-16  2:43         ` Dimitris Michailidis
2010-12-17  6:22         ` Junchang Wang
2010-12-17  6:50           ` Eric Dumazet
2010-12-16  4:44       ` Eric Dumazet
2010-12-17  6:12         ` Junchang Wang
2010-12-16  1:52     ` Junchang Wang
2010-12-16  5:00       ` Eric Dumazet
2010-12-17  6:15         ` Junchang Wang
2010-12-15 20:52   ` John Fastabend
2010-12-15 20:02 ` [PATCH 2/3] net/packet/af_packet.c: implement multiqueue aware socket in af_packet Fenghua Yu
2010-12-15 20:02 ` [PATCH 3/3] drivers/net/ixgbe/ixgbe_main.c: get tx queue mapping specified in socket Fenghua Yu
2010-12-15 20:54   ` John Fastabend
