From: jamal <hadi@cyberus.ca>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Changli Gao <xiaosuo@gmail.com>,
David Miller <davem@davemloft.net>,
therbert@google.com, shemminger@vyatta.com,
netdev@vger.kernel.org, Eilon Greenstein <eilong@broadcom.com>,
Brian Bloniarz <bmb@athenacr.com>
Subject: Re: [PATCH net-next-2.6] net: speedup udp receive path
Date: Fri, 30 Apr 2010 15:30:14 -0400
Message-ID: <1272655814.3879.8.camel@bigi>
In-Reply-To: <1272573383.3969.8.camel@bigi>
[-- Attachment #1: Type: text/plain, Size: 1322 bytes --]
Eric!

I managed to mod your program to look conceptually similar to mine,
and I reproduced the results with the same test kernel from yesterday.
So it is likely the issue is in using epoll vs not using any async I/O,
as in your case.
Results attached, as well as the modified program.
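(To be clear about what I mean by "async": as I read it, your receivers
conceptually just sit in a blocking recvfrom() per socket, one thread
each. A rough sketch, not your exact code:

	/* blocking per-socket receive loop -- conceptual sketch only */
	#include <sys/socket.h>

	static void *blocking_worker(void *arg)
	{
		int fd = *(int *)arg;
		char buf[4096];

		for (;;) {
			/* sleep in the kernel until a datagram arrives */
			if (recvfrom(fd, buf, sizeof(buf), 0, NULL, NULL) < 0)
				break;
			/* bump per-worker packet/byte counters here */
		}
		return NULL;
	}

whereas my version only calls recvfrom() after libevent reports the fd
readable -- see the attached udpsnkfrk.c.)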
Note the key things to remember: rps with this program gets worse over
time and across the different net-next kernels since Apr 14 (look at the
graph I supplied earlier). Sorry, I am really too busied out to dig any
further.

cheers,
jamal
On Thu, 2010-04-29 at 16:36 -0400, jamal wrote:
> On Thu, 2010-04-29 at 09:56 -0400, jamal wrote:
>
> >
> > I will try your program instead so we can reduce the variables
>
> Results attached.
> With your app, rps does a hell of a lot better and non-rps worse ;->
> With my proggie, non-rps does much better than yours and rps does
> a lot worse for the same setup. I see the scheduler kicking in quite a
> bit in non-rps for you...
>
> The main differences between us, as I see it, are:
> a) i use epoll - actually linked to libevent (1.0.something)
> b) I fork processes and you use pthreads.
>
> I don't have time to chase it today, but 1) I am either going to change
> yours to use libevent or make mine get rid of it, then 2) move towards
> pthreads or have yours fork...
> then observe whether that makes any difference.
>
>
> cheers,
> jamal
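To illustrate (b) above: my original tool forks one worker process per
port instead of calling pthread_create(). A rough sketch (made-up names,
not the actual code):

	/* fork-per-port model (sketch): each child owns its socket and
	 * counters in its own address space, so nothing is shared */
	#include <sys/types.h>
	#include <unistd.h>

	static void start_workers(int nbports, void (*worker)(int portidx))
	{
		int i;

		for (i = 0; i < nbports; i++) {
			pid_t pid = fork();

			if (pid == 0) {
				worker(i);	/* child runs the receive loop */
				_exit(0);
			}
			/* parent continues and forks the next worker */
		}
	}

The attached udpsnkfrk.c is your program reworked to use libevent the
way mine does, but still with pthreads.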
[-- Attachment #2: apr30-ericmod --]
[-- Type: text/plain, Size: 8919 bytes --]
First, a few runs with Eric's code modified to use epoll/libevent:
-------------------------------------------------------------------------------
PerfTop: 4009 irqs/sec kernel:83.4% [1000Hz cycles], (all, 8 CPUs)
-------------------------------------------------------------------------------
samples pcnt function DSO
_______ _____ ___________________________ ____________________
2097.00 8.6% sky2_poll [sky2]
1742.00 7.2% _raw_spin_lock_irqsave [kernel]
831.00 3.4% system_call [kernel]
654.00 2.7% copy_user_generic_string [kernel]
654.00 2.7% datagram_poll [kernel]
647.00 2.7% fget [kernel]
623.00 2.6% _raw_spin_unlock_irqrestore [kernel]
547.00 2.3% _raw_spin_lock_bh [kernel]
506.00 2.1% sys_epoll_ctl [kernel]
475.00 2.0% kmem_cache_free [kernel]
466.00 1.9% schedule [kernel]
436.00 1.8% vread_tsc [kernel].vsyscall_fn
417.00 1.7% fput [kernel]
415.00 1.7% sys_epoll_wait [kernel]
402.00 1.7% _raw_spin_lock [kernel]
-------------------------------------------------------------------------------
PerfTop: 616 irqs/sec kernel:98.7% [1000Hz cycles], (all, cpu: 0)
-------------------------------------------------------------------------------
samples pcnt function DSO
_______ _____ ______________________ ________
2534.00 28.6% sky2_poll [sky2]
503.00 5.7% ip_route_input [kernel]
438.00 4.9% _raw_spin_lock_irqsave [kernel]
418.00 4.7% __udp4_lib_lookup [kernel]
378.00 4.3% __alloc_skb [kernel]
364.00 4.1% ip_rcv [kernel]
323.00 3.6% _raw_spin_lock [kernel]
315.00 3.5% sock_queue_rcv_skb [kernel]
284.00 3.2% __netif_receive_skb [kernel]
281.00 3.2% __udp4_lib_rcv [kernel]
266.00 3.0% __wake_up_common [kernel]
238.00 2.7% sock_def_readable [kernel]
181.00 2.0% __kmalloc [kernel]
163.00 1.8% kmem_cache_alloc [kernel]
150.00 1.7% ep_poll_callback [kernel]
-------------------------------------------------------------------------------
PerfTop: 854 irqs/sec kernel:80.2% [1000Hz cycles], (all, cpu: 2)
-------------------------------------------------------------------------------
samples pcnt function DSO
_______ _____ ___________________________ ____________________
341.00 8.0% _raw_spin_lock_irqsave [kernel]
235.00 5.5% system_call [kernel]
174.00 4.1% datagram_poll [kernel]
174.00 4.1% fget [kernel]
173.00 4.1% copy_user_generic_string [kernel]
135.00 3.2% _raw_spin_unlock_irqrestore [kernel]
125.00 2.9% _raw_spin_lock_bh [kernel]
122.00 2.9% schedule [kernel]
113.00 2.6% sys_epoll_ctl [kernel]
113.00 2.6% kmem_cache_free [kernel]
108.00 2.5% vread_tsc [kernel].vsyscall_fn
105.00 2.5% sys_epoll_wait [kernel]
102.00 2.4% udp_recvmsg [kernel]
95.00 2.2% mutex_lock [kernel]
Average: 97.55% of 10M packets received at 750Kpps.
Now turn on rps (mask ee, i.e. cpus 1-3 and 5-7) with the NIC irq affined to cpu0:
-------------------------------------------------------------------------------
PerfTop: 3885 irqs/sec kernel:83.6% [1000Hz cycles], (all, 8 CPUs)
-------------------------------------------------------------------------------
samples pcnt function DSO
_______ _____ ______________________________ ________
2945.00 16.7% sky2_poll [sky2]
653.00 3.7% _raw_spin_lock_irqsave [kernel]
460.00 2.6% system_call [kernel]
420.00 2.4% _raw_spin_unlock_irqrestore [kernel]
414.00 2.3% sky2_intr [sky2]
392.00 2.2% fget [kernel]
360.00 2.0% ip_rcv [kernel]
324.00 1.8% sys_epoll_ctl [kernel]
323.00 1.8% __netif_receive_skb [kernel]
310.00 1.8% schedule [kernel]
292.00 1.7% ip_route_input [kernel]
292.00 1.7% _raw_spin_lock [kernel]
291.00 1.7% copy_user_generic_string [kernel]
284.00 1.6% kmem_cache_free [kernel]
262.00 1.5% call_function_single_interrupt [kernel]
-------------------------------------------------------------------------------
PerfTop: 1000 irqs/sec kernel:98.1% [1000Hz cycles], (all, cpu: 0)
-------------------------------------------------------------------------------
samples pcnt function DSO
_______ _____ ___________________________________ ________
4170.00 61.9% sky2_poll [sky2]
723.00 10.7% sky2_intr [sky2]
159.00 2.4% __alloc_skb [kernel]
140.00 2.1% get_rps_cpu [kernel]
106.00 1.6% __kmalloc [kernel]
95.00 1.4% enqueue_to_backlog [kernel]
86.00 1.3% kmem_cache_alloc [kernel]
85.00 1.3% irq_entries_start [kernel]
85.00 1.3% _raw_spin_lock_irqsave [kernel]
82.00 1.2% _raw_spin_lock [kernel]
66.00 1.0% swiotlb_sync_single [kernel]
58.00 0.9% sky2_remove [sky2]
49.00 0.7% default_send_IPI_mask_sequence_phys [kernel]
47.00 0.7% sky2_rx_submit [sky2]
36.00 0.5% _raw_spin_unlock_irqrestore [kernel]
-------------------------------------------------------------------------------
PerfTop: 344 irqs/sec kernel:84.3% [1000Hz cycles], (all, cpu: 2)
-------------------------------------------------------------------------------
samples pcnt function DSO
_______ _____ ______________________________ ____________________
114.00 5.2% _raw_spin_lock_irqsave [kernel]
79.00 3.6% fget [kernel]
78.00 3.6% ip_rcv [kernel]
78.00 3.6% system_call [kernel]
75.00 3.4% _raw_spin_unlock_irqrestore [kernel]
67.00 3.1% sys_epoll_ctl [kernel]
65.00 3.0% schedule [kernel]
61.00 2.8% ip_route_input [kernel]
48.00 2.2% vread_tsc [kernel].vsyscall_fn
48.00 2.2% call_function_single_interrupt [kernel]
46.00 2.1% kmem_cache_free [kernel]
45.00 2.1% __netif_receive_skb [kernel]
41.00 1.9% process_recv snkudp
40.00 1.8% kfree [kernel]
39.00 1.8% _raw_spin_lock [kernel]
92.97% of 10M packets received at 750Kpps.
OK, so this is exactly what I saw with my app: non-rps is better.
To summarize: it used to be the opposite on net-next before around
Apr 14; rps has gotten worse.
[-- Attachment #3: udpsnkfrk.c --]
[-- Type: text/x-csrc, Size: 3650 bytes --]
/*
 * Usage: udpsink [ -p baseport] nbports
 *
 * Build (assumes libevent 1.x): cc -O2 -o udpsnkfrk udpsnkfrk.c -levent -lpthread
 */
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <string.h>
#include <stdio.h>
#include <errno.h>
#include <unistd.h>
#include <stdlib.h>
#include <fcntl.h>
#include <pthread.h>
#include <event.h>

struct worker_data {
	struct event *snk_ev;
	struct event_base *base;
	struct timeval t;
	unsigned long pack_count;
	unsigned long bytes_count;
	unsigned long tout;
	int fd;		/* move to avoid hole on 64-bit */
	int pad1;	/* 64B - let Eric figure the math ;->
			 * (idea: keep each worker's hot fields on its own cache line) */
	//unsigned long _padd[16 - 3]; /* alignment */
};
void usage(int code)
{
	fprintf(stderr, "Usage: udpsink [-p baseport] nbports\n");
	exit(code);
}

void process_recv(int fd, short ev, void *arg)
{
	char buffer[4096];
	struct sockaddr_in addr;
	socklen_t len = sizeof(addr);
	struct worker_data *wdata = (struct worker_data *)arg;
	int lu = 0;

	/* EV_READ without EV_PERSIST: re-arm the event for the next datagram */
	if ((event_add(wdata->snk_ev, &wdata->t)) < 0) {
		perror("cb event_add");
		return;
	}
	if (ev == EV_TIMEOUT) {
		wdata->tout++;
	} else {
		lu = recvfrom(wdata->fd, buffer, sizeof(buffer), 0,
			      (struct sockaddr *)&addr, &len);
		if (lu > 0) {
			wdata->pack_count++;
			wdata->bytes_count += lu;
		}
	}
}
int prep_thread(struct worker_data *wdata)
{
	wdata->t.tv_sec = 1;
	/* randomize the timeout so the workers don't all wake at once */
	wdata->t.tv_usec = random() % 50000L;
	wdata->base = event_init();
	event_set(wdata->snk_ev, wdata->fd, EV_READ, process_recv, wdata);
	event_base_set(wdata->base, wdata->snk_ev);
	if ((event_add(wdata->snk_ev, &wdata->t)) < 0) {
		perror("event_add");
		return -1;
	}
	return 0;
}

void *worker_func(void *arg)
{
	struct worker_data *wdata = (struct worker_data *)arg;

	return (void *)event_base_loop(wdata->base, 0);
}
int main(int argc, char *argv[])
{
	int c;
	int baseport = 4000;
	int nbthreads;
	struct worker_data *wdata;
	unsigned long ototal = 0;
	int concurrent = 0;
	int verbose = 0;
	int i;

	while ((c = getopt(argc, argv, "cvp:")) != -1) {
		if (c == 'p')
			baseport = atoi(optarg);
		else if (c == 'c')
			concurrent = 1;
		else if (c == 'v')
			verbose++;
		else
			usage(1);
	}
	if (optind == argc)
		usage(1);
	nbthreads = atoi(argv[optind]);
	wdata = calloc(nbthreads, sizeof(struct worker_data));
	if (!wdata) {
		perror("calloc");
		return 1;
	}

	for (i = 0; i < nbthreads; i++) {
		struct sockaddr_in addr;
		pthread_t tid;

		/* every worker needs its own struct event, even when it
		 * shares wdata[0]'s socket in concurrent (-c) mode */
		wdata[i].snk_ev = malloc(sizeof(struct event));
		if (!wdata[i].snk_ev)
			return 1;
		memset(wdata[i].snk_ev, 0, sizeof(struct event));

		if (i && concurrent) {
			wdata[i].fd = wdata[0].fd;
		} else {
			wdata[i].fd = socket(PF_INET, SOCK_DGRAM, 0);
			if (wdata[i].fd == -1) {
				free(wdata[i].snk_ev);
				perror("socket");
				return 1;
			}
			memset(&addr, 0, sizeof(addr));
			addr.sin_family = AF_INET;
			// addr.sin_addr.s_addr = inet_addr(argv[optind]);
			addr.sin_port = htons(baseport + i);
			if (bind(wdata[i].fd, (struct sockaddr *)&addr,
				 sizeof(addr)) < 0) {
				free(wdata[i].snk_ev);
				perror("bind");
				return 1;
			}
			// fcntl(wdata[i].fd, F_SETFL, O_NDELAY);
		}
		if (prep_thread(wdata + i)) {
			printf("failed to allocate thread %d, exit\n", i);
			exit(0);
		}
		pthread_create(&tid, NULL, worker_func, wdata + i);
	}

	/* main thread: report aggregate packets per second once a second */
	for (;;) {
		unsigned long total;
		long delta;

		sleep(1);
		total = 0;
		for (i = 0; i < nbthreads; i++) {
			total += wdata[i].pack_count;
		}
		delta = total - ototal;
		if (delta) {
			printf("%lu pps (%lu", delta, total);
			if (verbose) {
				for (i = 0; i < nbthreads; i++) {
					if (wdata[i].pack_count)
						printf(" %d:%lu", i,
						       wdata[i].pack_count);
				}
			}
			printf(")\n");
		}
		ototal = total;
	}
}