* [PATCH] pktgen node allocation
@ 2010-03-19 8:44 Robert Olsson
2010-03-19 9:28 ` Eric Dumazet
0 siblings, 1 reply; 8+ messages in thread
From: Robert Olsson @ 2010-03-19 8:44 UTC (permalink / raw)
To: David Miller; +Cc: netdev, robert
Hi,
Here is a patch to control packet node allocation and, implicitly,
how packets are DMA'd etc.
The NODE_ALLOC flag enables the feature, defaulting to numa_node_id();
when enabled, the node can also be set explicitly via a new
"node" parameter.
Tested this with 10 Intel 82599 ports w. TYAN S7025 E5520 CPU's.
Was able to TX/DMA ~80 Gbit/s to Ethernet wires.
Cheers
--ro
Signed-off-by: Robert Olsson <robert.olsson@its.uu.se>
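For reference, the new settings can be driven the same way as the existing
ones, using the usual pgset helper from the pktgen sample scripts. A sketch
(the device name eth0@0 is an example; the demo writes are skipped when the
proc file is absent):

```shell
#!/bin/sh
# pgset: write one pktgen command to the currently selected proc file
# and surface any error pktgen reports back through that file.
pgset() {
    echo "$1" > "$PGDEV"
    if ! grep -q "Result: OK" "$PGDEV"; then
        grep "Result:" "$PGDEV"
    fi
}

PGDEV=/proc/net/pktgen/eth0@0   # example device; adjust to your setup

# Only touch the proc interface if it is actually there.
if [ -w "$PGDEV" ]; then
    pgset "node 1"              # allocate skbs on NUMA node 1
    pgset "flag NODE_ALLOC"     # enable node-aware allocation
fi
```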
diff --git a/net/core/pktgen.c b/net/core/pktgen.c
index 4392381..c195fd0 100644
--- a/net/core/pktgen.c
+++ b/net/core/pktgen.c
@@ -169,7 +169,7 @@
#include <asm/dma.h>
#include <asm/div64.h> /* do_div */
-#define VERSION "2.72"
+#define VERSION "2.73"
#define IP_NAME_SZ 32
#define MAX_MPLS_LABELS 16 /* This is the max label stack depth */
#define MPLS_STACK_BOTTOM htonl(0x00000100)
@@ -190,6 +190,7 @@
#define F_IPSEC_ON (1<<12) /* ipsec on for flows */
#define F_QUEUE_MAP_RND (1<<13) /* queue map Random */
#define F_QUEUE_MAP_CPU (1<<14) /* queue map mirrors smp_processor_id() */
+#define F_NODE (1<<15) /* Node memory alloc */
/* Thread control flag bits */
#define T_STOP (1<<0) /* Stop run */
@@ -372,6 +373,7 @@ struct pktgen_dev {
u16 queue_map_min;
u16 queue_map_max;
+ int node; /* Memory node */
#ifdef CONFIG_XFRM
__u8 ipsmode; /* IPSEC mode (config) */
@@ -607,6 +609,9 @@ static int pktgen_if_show(struct seq_file *seq, void *v)
if (pkt_dev->traffic_class)
seq_printf(seq, " traffic_class: 0x%02x\n", pkt_dev->traffic_class);
+ if (pkt_dev->node >= 0)
+ seq_printf(seq, " node: %d\n", pkt_dev->node);
+
seq_printf(seq, " Flags: ");
if (pkt_dev->flags & F_IPV6)
@@ -660,6 +665,9 @@ static int pktgen_if_show(struct seq_file *seq, void *v)
if (pkt_dev->flags & F_SVID_RND)
seq_printf(seq, "SVID_RND ");
+ if (pkt_dev->flags & F_NODE)
+ seq_printf(seq, "NODE_ALLOC ");
+
seq_puts(seq, "\n");
/* not really stopped, more like last-running-at */
@@ -1074,6 +1082,21 @@ static ssize_t pktgen_if_write(struct file *file,
pkt_dev->dst_mac_count);
return count;
}
+ if (!strcmp(name, "node")) {
+ len = num_arg(&user_buffer[i], 10, &value);
+ if (len < 0)
+ return len;
+
+ i += len;
+
+ if (node_possible(value)) {
+ pkt_dev->node = value;
+ sprintf(pg_result, "OK: node=%d", pkt_dev->node);
+ }
+ else
+ sprintf(pg_result, "ERROR: node not possible");
+ return count;
+ }
if (!strcmp(name, "flag")) {
char f[32];
memset(f, 0, 32);
@@ -1166,12 +1189,18 @@ static ssize_t pktgen_if_write(struct file *file,
else if (strcmp(f, "!IPV6") == 0)
pkt_dev->flags &= ~F_IPV6;
+ else if (strcmp(f, "NODE_ALLOC") == 0)
+ pkt_dev->flags |= F_NODE;
+
+ else if (strcmp(f, "!NODE_ALLOC") == 0)
+ pkt_dev->flags &= ~F_NODE;
+
else {
sprintf(pg_result,
"Flag -:%s:- unknown\nAvailable flags, (prepend ! to un-set flag):\n%s",
f,
"IPSRC_RND, IPDST_RND, UDPSRC_RND, UDPDST_RND, "
- "MACSRC_RND, MACDST_RND, TXSIZE_RND, IPV6, MPLS_RND, VID_RND, SVID_RND, FLOW_SEQ, IPSEC\n");
+ "MACSRC_RND, MACDST_RND, TXSIZE_RND, IPV6, MPLS_RND, VID_RND, SVID_RND, FLOW_SEQ, IPSEC, NODE_ALLOC\n");
return count;
}
sprintf(pg_result, "OK: flags=0x%x", pkt_dev->flags);
@@ -2572,9 +2601,27 @@ static struct sk_buff *fill_packet_ipv4(struct net_device *odev,
mod_cur_headers(pkt_dev);
datalen = (odev->hard_header_len + 16) & ~0xf;
- skb = __netdev_alloc_skb(odev,
- pkt_dev->cur_pkt_size + 64
- + datalen + pkt_dev->pkt_overhead, GFP_NOWAIT);
+
+ if (pkt_dev->flags & F_NODE) {
+ int node;
+
+ if (pkt_dev->node >= 0)
+ node = pkt_dev->node;
+ else
+ node = numa_node_id();
+
+ skb = __alloc_skb(NET_SKB_PAD + pkt_dev->cur_pkt_size + 64
+ + datalen + pkt_dev->pkt_overhead, GFP_NOWAIT, 0, node);
+ if (likely(skb)) {
+ skb_reserve(skb, NET_SKB_PAD);
+ skb->dev = odev;
+ }
+ }
+ else
+ skb = __netdev_alloc_skb(odev,
+ pkt_dev->cur_pkt_size + 64
+ + datalen + pkt_dev->pkt_overhead, GFP_NOWAIT);
+
if (!skb) {
sprintf(pkt_dev->result, "No memory");
return NULL;
@@ -3674,6 +3721,7 @@ static int pktgen_add_device(struct pktgen_thread *t, const char *ifname)
pkt_dev->svlan_p = 0;
pkt_dev->svlan_cfi = 0;
pkt_dev->svlan_id = 0xffff;
+ pkt_dev->node = -1;
err = pktgen_setup_dev(pkt_dev, ifname);
if (err)
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH] pktgen node allocation
2010-03-19 8:44 [PATCH] pktgen node allocation Robert Olsson
@ 2010-03-19 9:28 ` Eric Dumazet
2010-03-19 13:35 ` robert
0 siblings, 1 reply; 8+ messages in thread
From: Eric Dumazet @ 2010-03-19 9:28 UTC (permalink / raw)
To: Robert Olsson; +Cc: David Miller, netdev
On Friday 19 March 2010 at 09:44 +0100, Robert Olsson wrote:
>
> Hi,
> Here is a patch to control packet node allocation and, implicitly,
> how packets are DMA'd etc.
>
> The NODE_ALLOC flag enables the feature, defaulting to numa_node_id();
> when enabled, the node can also be set explicitly via a new
> "node" parameter.
>
> Tested this with 10 Intel 82599 ports w. TYAN S7025 E5520 CPU's.
> Was able to TX/DMA ~80 Gbit/s to Ethernet wires.
>
> Cheers
> --ro
>
I cannot understand how this can help.
__netdev_alloc_skb() is supposed to already take into account NUMA
properties :
int node = dev->dev.parent ? dev_to_node(dev->dev.parent) : -1;
If this doesn't work, we should correct the core stack, not only pktgen :)
Are you allocating memory in the node where pktgen CPU is running or the
node close to the NIC ?
Thanks
* Re: [PATCH] pktgen node allocation
2010-03-19 9:28 ` Eric Dumazet
@ 2010-03-19 13:35 ` robert
2010-03-19 13:47 ` Eric Dumazet
2010-03-22 3:37 ` David Miller
0 siblings, 2 replies; 8+ messages in thread
From: robert @ 2010-03-19 13:35 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Robert Olsson, David Miller, netdev
Eric Dumazet writes:
> On Friday 19 March 2010 at 09:44 +0100, Robert Olsson wrote:
>
> I cannot understand how this can help.
>
> __netdev_alloc_skb() is supposed to already take into account NUMA
> properties :
>
> int node = dev->dev.parent ? dev_to_node(dev->dev.parent) : -1;
>
> If this doesn't work, we should correct the core stack, not only pktgen :)
>
> Are you allocating memory in the node where pktgen CPU is running or the
> node close to the NIC ?
I didn't say it should help; the idea was to give some hooks to
experiment with and see the effects of different node memory allocations.
There are many degrees of freedom wrt buses (device)/CPU/memory.
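One of those degrees of freedom is visible from userspace: the kernel
exposes the node a NIC's PCI device hangs off via sysfs, which is handy when
choosing the node parameter. A sketch (numa_node_of is a hypothetical helper
name; interface names are examples, and the sysfs file itself may read -1 on
non-NUMA hardware):

```shell
#!/bin/sh
# numa_node_of <ifname>: print the NUMA node of the NIC's PCI device,
# or -1 when the kernel exposes none (non-NUMA box, virtual device).
numa_node_of() {
    f="/sys/class/net/$1/device/numa_node"
    if [ -r "$f" ]; then
        cat "$f"
    else
        echo -1
    fi
}

# e.g.: n=$(numa_node_of eth0)   # node to feed to the "node" parameter
```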
Cheers
--ro
* Re: [PATCH] pktgen node allocation
2010-03-19 13:35 ` robert
@ 2010-03-19 13:47 ` Eric Dumazet
2010-03-22 6:24 ` Robert Olsson
2010-03-22 3:37 ` David Miller
1 sibling, 1 reply; 8+ messages in thread
From: Eric Dumazet @ 2010-03-19 13:47 UTC (permalink / raw)
To: robert; +Cc: David Miller, netdev
On Friday 19 March 2010 at 14:35 +0100, robert@herjulf.net wrote:
> Eric Dumazet writes:
> > On Friday 19 March 2010 at 09:44 +0100, Robert Olsson wrote:
> >
> > I cannot understand how this can help.
> >
> > __netdev_alloc_skb() is supposed to already take into account NUMA
> > properties :
> >
> > int node = dev->dev.parent ? dev_to_node(dev->dev.parent) : -1;
> >
> > If this doesn't work, we should correct the core stack, not only pktgen :)
> >
> > Are you allocating memory in the node where pktgen CPU is running or the
> > node close to the NIC ?
>
> I didn't say it should help; the idea was to give some hooks to
> experiment with and see the effects of different node memory allocations.
> There are many degrees of freedom wrt buses (device)/CPU/memory.
>
Well, you said "Tested this with 10 Intel 82599 ports w. TYAN S7025
E5520 CPU's. Was able to TX/DMA ~80 Gbit/s to Ethernet wires."
I am interested to know what particular setup you did to maximize
throughput then, or are you saying you managed to reduce it? :)
* Re: [PATCH] pktgen node allocation
2010-03-19 13:47 ` Eric Dumazet
@ 2010-03-22 6:24 ` Robert Olsson
2010-03-22 7:43 ` Eric Dumazet
0 siblings, 1 reply; 8+ messages in thread
From: Robert Olsson @ 2010-03-22 6:24 UTC (permalink / raw)
To: Eric Dumazet; +Cc: robert, David Miller, netdev, olofh
Eric Dumazet writes:
> Well, you said "Tested this with 10 Intel 82599 ports w. TYAN S7025
> E5520 CPU's. Was able to TX/DMA ~80 Gbit/s to Ethernet wires."
>
> I am interested to know what particular setup you did to maximize
> throughput then, or are you saing you managed to reduce it ? :)
Some notes from the experiment. It's getting
complex and hairy. Anyway, results from the first
tests to give you an idea... My colleague Olof
might have some comments/details.
pktgen sending on 10 * 10g interfaces.
[From pktgen script]
fn()
{
i=$1 #ifname
c=$2 #queue / cpu core
n=$3 # numa node
PGDEV=/proc/net/pktgen/kpktgend_$c
pgset "add_device eth$i@$c "
PGDEV=/proc/net/pktgen/eth$i@$c
pgset "node $n"
pgset "$COUNT"
pgset "flag NODE_ALLOC"
pgset "$CLONE_SKB"
pgset "$PKT_SIZE"
pgset "$DELAY"
pgset "dst 10.0.0.0"
}
remove_all
# Setup
# TYAN S7025 with two nodes.
# Each node has its own bus with its own Tylersburg bridge,
# so eth0-eth3 are closest to node0, which in turn "owns"
# CPU cores 0-3 in this HW setup. So we set up
# pktgen accordingly. clone_skb=1000000.
# Used slots are PCIe-x16 except when PCIe-x8 is indicated.
# eth0 queue=0(CPU) node=0
fn 0 0 0
fn 1 1 0
fn 2 2 0
fn 3 3 0
fn 4 4 1
fn 5 5 1
fn 6 6 1
fn 7 7 1
fn 8 12 1
fn 9 13 1
Result "manually" tuned.
eth0 9617.7 M bit/s 822 k pps
eth1 9619.1 M bit/s 823 k pps
eth2 9619.1 M bit/s 823 k pps
eth3 9619.2 M bit/s 823 k pps
eth4 5995.2 M bit/s 512 k pps <- PCIe-x8
eth5 5995.3 M bit/s 512 k pps <- PCIe-x8
eth6 9619.2 M bit/s 823 k pps
eth7 9619.2 M bit/s 823 k pps
eth8 9619.1 M bit/s 823 k pps
eth9 9619.0 M bit/s 823 k pps
> 90 Gbit/s
Result "manually" mistuned by swapping nodes 0 and 1.
eth0 9613.6 M bit/s 822 k pps
eth1 9614.9 M bit/s 822 k pps
eth2 9615.0 M bit/s 822 k pps
eth3 9615.1 M bit/s 822 k pps
eth4 2918.5 M bit/s 249 k pps <- PCIe-x8
eth5 2918.4 M bit/s 249 k pps <- PCIe-x8
eth6 8597.0 M bit/s 735 k pps
eth7 8597.0 M bit/s 735 k pps
eth8 8568.3 M bit/s 733 k pps
eth9 8568.3 M bit/s 733 k pps
A lot of things remain to be investigated...
Cheers
--ro
* Re: [PATCH] pktgen node allocation
2010-03-22 6:24 ` Robert Olsson
@ 2010-03-22 7:43 ` Eric Dumazet
2010-03-22 18:05 ` Robert Olsson
0 siblings, 1 reply; 8+ messages in thread
From: Eric Dumazet @ 2010-03-22 7:43 UTC (permalink / raw)
To: Robert Olsson; +Cc: David Miller, netdev, olofh
On Monday 22 March 2010 at 07:24 +0100, Robert Olsson wrote:
> Eric Dumazet writes:
>
> > Well, you said "Tested this with 10 Intel 82599 ports w. TYAN S7025
> > E5520 CPU's. Was able to TX/DMA ~80 Gbit/s to Ethernet wires."
> >
> > I am interested to know what particular setup you did to maximize
> > throughput then, or are you saying you managed to reduce it? :)
>
>
> Some notes from the experiment. It's getting
> complex and hairy. Anyway, results from the first
> tests to give you an idea... My colleague Olof
> might have some comments/details.
>
> pktgen sending on 10 * 10g interfaces.
>
> [From pktgen script]
> fn()
> {
> i=$1 #ifname
> c=$2 #queue / cpu core
> n=$3 # numa node
> PGDEV=/proc/net/pktgen/kpktgend_$c
> pgset "add_device eth$i@$c "
> PGDEV=/proc/net/pktgen/eth$i@$c
> pgset "node $n"
> pgset "$COUNT"
> pgset "flag NODE_ALLOC"
> pgset "$CLONE_SKB"
> pgset "$PKT_SIZE"
> pgset "$DELAY"
> pgset "dst 10.0.0.0"
> }
>
> remove_all
> # Setup
>
> # TYAN S7025 with two nodes.
> # Each node has its own bus with its own Tylersburg bridge,
> # so eth0-eth3 are closest to node0, which in turn "owns"
> # CPU cores 0-3 in this HW setup. So we set up
> # pktgen accordingly. clone_skb=1000000.
> # Used slots are PCIe-x16 except when PCIe-x8 is indicated.
>
> # eth0 queue=0(CPU) node=0
> fn 0 0 0
> fn 1 1 0
> fn 2 2 0
> fn 3 3 0
> fn 4 4 1
> fn 5 5 1
> fn 6 6 1
> fn 7 7 1
> fn 8 12 1
> fn 9 13 1
>
> Result "manually" tuned.
>
> eth0 9617.7 M bit/s 822 k pps
> eth1 9619.1 M bit/s 823 k pps
> eth2 9619.1 M bit/s 823 k pps
> eth3 9619.2 M bit/s 823 k pps
> eth4 5995.2 M bit/s 512 k pps <- PCIe-x8
> eth5 5995.3 M bit/s 512 k pps <- PCIe-x8
> eth6 9619.2 M bit/s 823 k pps
> eth7 9619.2 M bit/s 823 k pps
> eth8 9619.1 M bit/s 823 k pps
> eth9 9619.0 M bit/s 823 k pps
>
> > 90 Gbit/s
>
> Result "manually" mistuned by swapping nodes 0 and 1.
>
> eth0 9613.6 M bit/s 822 k pps
> eth1 9614.9 M bit/s 822 k pps
> eth2 9615.0 M bit/s 822 k pps
> eth3 9615.1 M bit/s 822 k pps
> eth4 2918.5 M bit/s 249 k pps <- PCIe-x8
> eth5 2918.4 M bit/s 249 k pps <- PCIe-x8
> eth6 8597.0 M bit/s 735 k pps
> eth7 8597.0 M bit/s 735 k pps
> eth8 8568.3 M bit/s 733 k pps
> eth9 8568.3 M bit/s 733 k pps
>
> A lot of things remain to be investigated...
Sure :)
I wonder why eth0-eth3 results are unchanged after a node flip.
Thanks for sharing
* Re: [PATCH] pktgen node allocation
2010-03-22 7:43 ` Eric Dumazet
@ 2010-03-22 18:05 ` Robert Olsson
0 siblings, 0 replies; 8+ messages in thread
From: Robert Olsson @ 2010-03-22 18:05 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Robert Olsson, David Miller, netdev, olofh
Eric Dumazet writes:
> > Result "manually" tuned.
> >
> > eth0 9617.7 M bit/s 822 k pps
> > eth1 9619.1 M bit/s 823 k pps
> > eth2 9619.1 M bit/s 823 k pps
> > eth3 9619.2 M bit/s 823 k pps
> > eth4 5995.2 M bit/s 512 k pps <- PCIe-x8
> > eth5 5995.3 M bit/s 512 k pps <- PCIe-x8
> > eth6 9619.2 M bit/s 823 k pps
> > eth7 9619.2 M bit/s 823 k pps
> > eth8 9619.1 M bit/s 823 k pps
> > eth9 9619.0 M bit/s 823 k pps
> >
> > > 90 Gbit/s
The DMA potential of this box is about four 10g ports.
> > Result "manually" mistuned by switching node 0 and 1.
> >
> > eth0 9613.6 M bit/s 822 k pps
> > eth1 9614.9 M bit/s 822 k pps
> > eth2 9615.0 M bit/s 822 k pps
> > eth3 9615.1 M bit/s 822 k pps
> > eth4 2918.5 M bit/s 249 k pps <- PCIe-x8
> > eth5 2918.4 M bit/s 249 k pps <- PCIe-x8
> > eth6 8597.0 M bit/s 735 k pps
> > eth7 8597.0 M bit/s 735 k pps
> > eth8 8568.3 M bit/s 733 k pps
> > eth9 8568.3 M bit/s 733 k pps
> >
> I wonder why eth0-eth3 results are unchanged after a node flip.
Yes, it's strange.
With clone_skb=1 we could see differences with just one GigE interface
using 64-byte pkts, so it might be very different on 10g. We're
unfortunately getting closer to hardware...
Cheers
--ro
* Re: [PATCH] pktgen node allocation
2010-03-19 13:35 ` robert
2010-03-19 13:47 ` Eric Dumazet
@ 2010-03-22 3:37 ` David Miller
1 sibling, 0 replies; 8+ messages in thread
From: David Miller @ 2010-03-22 3:37 UTC (permalink / raw)
To: robert; +Cc: eric.dumazet, netdev
From: robert@herjulf.net
Date: Fri, 19 Mar 2010 14:35:22 +0100
>
> Eric Dumazet writes:
> > On Friday 19 March 2010 at 09:44 +0100, Robert Olsson wrote:
> >
> > I cannot understand how this can help.
> >
> > __netdev_alloc_skb() is supposed to already take into account NUMA
> > properties :
> >
> > int node = dev->dev.parent ? dev_to_node(dev->dev.parent) : -1;
> >
> > If this doesn't work, we should correct the core stack, not only pktgen :)
> >
> > Are you allocating memory in the node where pktgen CPU is running or the
> > node close to the NIC ?
>
> I didn't say it should help; the idea was to give some hooks to
> experiment with and see the effects of different node memory allocations.
> There are many degrees of freedom wrt buses (device)/CPU/memory.
I think it's a useful feature and by default the netdev alloc
is still used, so... applied to net-next-2.6