netdev.vger.kernel.org archive mirror
From: jamal <hadi@cyberus.ca>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Changli Gao <xiaosuo@gmail.com>,
	David Miller <davem@davemloft.net>,
	therbert@google.com, shemminger@vyatta.com,
	netdev@vger.kernel.org, Eilon Greenstein <eilong@broadcom.com>,
	Brian Bloniarz <bmb@athenacr.com>
Subject: Re: [PATCH net-next-2.6] net: speedup udp receive path
Date: Fri, 30 Apr 2010 15:30:14 -0400
Message-ID: <1272655814.3879.8.camel@bigi>
In-Reply-To: <1272573383.3969.8.camel@bigi>

[-- Attachment #1: Type: text/plain, Size: 1322 bytes --]

Eric!

I managed to modify your program to look conceptually similar to mine,
and I reproduced my results with the same test kernel as yesterday.
So the issue is likely in using epoll versus not using any async I/O,
as in your case.
Results are attached, as well as the modified program.

Note, the key thing to remember:
with this program, rps has been getting worse over time, across
different net-next kernels since Apr 14 (look at the graph I supplied).
Sorry, I am really too busy to dig any further.

cheers,
jamal



On Thu, 2010-04-29 at 16:36 -0400, jamal wrote:
> On Thu, 2010-04-29 at 09:56 -0400, jamal wrote:
> 
> > 
> > I will try your program instead so we can reduce the variables
> 
> Results attached.
> With your app, rps does a hell of a lot better and non-rps worse ;->
> With my proggie, non-rps does much better than with yours, and rps does
> a lot worse for the same setup. I see the scheduler kicking in quite a
> bit in non-rps for you...
> 
> The main difference between us, as I see it, is:
> a) I use epoll (actually linked to libevent 1.0.something)
> b) I fork processes and you use pthreads.
> 
> I don't have time to chase it today, but I am going to 1) either change
> yours to use libevent or make mine get rid of it, then 2) either move
> mine towards pthreads or have yours fork,
> then observe whether that makes any difference.
> 
> 
> cheers,
> jamal
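To isolate difference b), here is a minimal sketch of the fork-per-worker model, assuming POSIX fork()/waitpid(). fork_workers() and reap_workers() are hypothetical names. In a real sink, each child would bind baseport + idx and loop on recvfrom().

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork n workers. Returns the worker index (0..n-1) in each child and
 * -1 in the parent, which gets the child pids in pids[]. */
static int fork_workers(int n, pid_t *pids)
{
	int i;

	assert(pids != NULL);
	for (i = 0; i < n; i++) {
		pid_t pid = fork();

		if (pid == 0)
			return i;	/* child: caller runs worker i */
		if (pid < 0) {
			perror("fork");
			exit(1);
		}
		pids[i] = pid;
	}
	return -1;		/* parent */
}

/* Parent: wait for every child and return the sum of their exit codes. */
static int reap_workers(int n, const pid_t *pids)
{
	int i, status, sum = 0;

	for (i = 0; i < n; i++)
		if (waitpid(pids[i], &status, 0) == pids[i] && WIFEXITED(status))
			sum += WEXITSTATUS(status);
	return sum;
}
```

Unlike threads, forked children do not share memory. The once-per-second pps loop in udpsnkfrk.c would therefore not see their pack_count values, so each child would have to report its own counts (or use shared memory).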

[-- Attachment #2: apr30-ericmod --]
[-- Type: text/plain, Size: 8919 bytes --]


First, a few runs with Eric's code + epoll/libevent:

-------------------------------------------------------------------------------
   PerfTop:    4009 irqs/sec  kernel:83.4% [1000Hz cycles],  (all, 8 CPUs)
-------------------------------------------------------------------------------

             samples  pcnt function                    DSO
             _______ _____ ___________________________ ____________________

             2097.00  8.6% sky2_poll                   [sky2]              
             1742.00  7.2% _raw_spin_lock_irqsave      [kernel]            
              831.00  3.4% system_call                 [kernel]            
              654.00  2.7% copy_user_generic_string    [kernel]            
              654.00  2.7% datagram_poll               [kernel]            
              647.00  2.7% fget                        [kernel]            
              623.00  2.6% _raw_spin_unlock_irqrestore [kernel]            
              547.00  2.3% _raw_spin_lock_bh           [kernel]            
              506.00  2.1% sys_epoll_ctl               [kernel]            
              475.00  2.0% kmem_cache_free             [kernel]            
              466.00  1.9% schedule                    [kernel]            
              436.00  1.8% vread_tsc                   [kernel].vsyscall_fn
              417.00  1.7% fput                        [kernel]            
              415.00  1.7% sys_epoll_wait              [kernel]            
              402.00  1.7% _raw_spin_lock              [kernel]            


-------------------------------------------------------------------------------
   PerfTop:     616 irqs/sec  kernel:98.7% [1000Hz cycles],  (all, cpu: 0)
-------------------------------------------------------------------------------

             samples  pcnt function               DSO
             _______ _____ ______________________ ________

             2534.00 28.6% sky2_poll              [sky2]  
              503.00  5.7% ip_route_input         [kernel]
              438.00  4.9% _raw_spin_lock_irqsave [kernel]
              418.00  4.7% __udp4_lib_lookup      [kernel]
              378.00  4.3% __alloc_skb            [kernel]
              364.00  4.1% ip_rcv                 [kernel]
              323.00  3.6% _raw_spin_lock         [kernel]
              315.00  3.5% sock_queue_rcv_skb     [kernel]
              284.00  3.2% __netif_receive_skb    [kernel]
              281.00  3.2% __udp4_lib_rcv         [kernel]
              266.00  3.0% __wake_up_common       [kernel]
              238.00  2.7% sock_def_readable      [kernel]
              181.00  2.0% __kmalloc              [kernel]
              163.00  1.8% kmem_cache_alloc       [kernel]
              150.00  1.7% ep_poll_callback       [kernel]


-------------------------------------------------------------------------------
   PerfTop:     854 irqs/sec  kernel:80.2% [1000Hz cycles],  (all, cpu: 2)
-------------------------------------------------------------------------------

             samples  pcnt function                    DSO
             _______ _____ ___________________________ ____________________

              341.00  8.0% _raw_spin_lock_irqsave      [kernel]            
              235.00  5.5% system_call                 [kernel]            
              174.00  4.1% datagram_poll               [kernel]            
              174.00  4.1% fget                        [kernel]            
              173.00  4.1% copy_user_generic_string    [kernel]            
              135.00  3.2% _raw_spin_unlock_irqrestore [kernel]            
              125.00  2.9% _raw_spin_lock_bh           [kernel]            
              122.00  2.9% schedule                    [kernel]            
              113.00  2.6% sys_epoll_ctl               [kernel]            
              113.00  2.6% kmem_cache_free             [kernel]            
              108.00  2.5% vread_tsc                   [kernel].vsyscall_fn
              105.00  2.5% sys_epoll_wait              [kernel]            
              102.00  2.4% udp_recvmsg                 [kernel]            
               95.00  2.2% mutex_lock                  [kernel]            

Average: 97.55% of 10M packets received at 750Kpps

Turn on rps with mask ee and set irq affinity to cpu0
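For readers reproducing this setup: mask ee is hexadecimal 0xee, i.e. CPUs 1, 2, 3, 5, 6 and 7. RPS work is therefore steered away from cpu0 (which takes the sky2 interrupt) and from cpu4. Such a mask is normally written to /sys/class/net/<dev>/queues/rx-<n>/rps_cpus, and the IRQ affinity to /proc/irq/<irq>/smp_affinity. The device and IRQ number used here are not stated. The hypothetical decode_cpu_mask() below only illustrates the decoding and handles a single hex word (no comma-separated groups).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Decode a single-word hex CPU mask such as "ee" into CPU numbers.
 * Returns how many CPUs were stored in cpus[] (at most max). */
static size_t decode_cpu_mask(const char *hex, int *cpus, size_t max)
{
	unsigned long mask;
	size_t n = 0;
	unsigned int bit;

	assert(hex != NULL && cpus != NULL);
	mask = strtoul(hex, NULL, 16);
	for (bit = 0; bit < 8 * sizeof(mask) && n < max; bit++)
		if (mask & (1UL << bit))
			cpus[n++] = (int)bit;
	return n;
}
```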

-------------------------------------------------------------------------------
   PerfTop:    3885 irqs/sec  kernel:83.6% [1000Hz cycles],  (all, 8 CPUs)
-------------------------------------------------------------------------------

             samples  pcnt function                       DSO
             _______ _____ ______________________________ ________

             2945.00 16.7% sky2_poll                      [sky2]  
              653.00  3.7% _raw_spin_lock_irqsave         [kernel]
              460.00  2.6% system_call                    [kernel]
              420.00  2.4% _raw_spin_unlock_irqrestore    [kernel]
              414.00  2.3% sky2_intr                      [sky2]  
              392.00  2.2% fget                           [kernel]
              360.00  2.0% ip_rcv                         [kernel]
              324.00  1.8% sys_epoll_ctl                  [kernel]
              323.00  1.8% __netif_receive_skb            [kernel]
              310.00  1.8% schedule                       [kernel]
              292.00  1.7% ip_route_input                 [kernel]
              292.00  1.7% _raw_spin_lock                 [kernel]
              291.00  1.7% copy_user_generic_string       [kernel]
              284.00  1.6% kmem_cache_free                [kernel]
              262.00  1.5% call_function_single_interrupt [kernel]

-------------------------------------------------------------------------------
   PerfTop:    1000 irqs/sec  kernel:98.1% [1000Hz cycles],  (all, cpu: 0)
-------------------------------------------------------------------------------

             samples  pcnt function                            DSO
             _______ _____ ___________________________________ ________

             4170.00 61.9% sky2_poll                           [sky2]  
              723.00 10.7% sky2_intr                           [sky2]  
              159.00  2.4% __alloc_skb                         [kernel]
              140.00  2.1% get_rps_cpu                         [kernel]
              106.00  1.6% __kmalloc                           [kernel]
               95.00  1.4% enqueue_to_backlog                  [kernel]
               86.00  1.3% kmem_cache_alloc                    [kernel]
               85.00  1.3% irq_entries_start                   [kernel]
               85.00  1.3% _raw_spin_lock_irqsave              [kernel]
               82.00  1.2% _raw_spin_lock                      [kernel]
               66.00  1.0% swiotlb_sync_single                 [kernel]
               58.00  0.9% sky2_remove                         [sky2]  
               49.00  0.7% default_send_IPI_mask_sequence_phys [kernel]
               47.00  0.7% sky2_rx_submit                      [sky2]  
               36.00  0.5% _raw_spin_unlock_irqrestore         [kernel]

-------------------------------------------------------------------------------
   PerfTop:     344 irqs/sec  kernel:84.3% [1000Hz cycles],  (all, cpu: 2)
-------------------------------------------------------------------------------

             samples  pcnt function                       DSO
             _______ _____ ______________________________ ____________________

              114.00  5.2% _raw_spin_lock_irqsave         [kernel]            
               79.00  3.6% fget                           [kernel]            
               78.00  3.6% ip_rcv                         [kernel]            
               78.00  3.6% system_call                    [kernel]            
               75.00  3.4% _raw_spin_unlock_irqrestore    [kernel]            
               67.00  3.1% sys_epoll_ctl                  [kernel]            
               65.00  3.0% schedule                       [kernel]            
               61.00  2.8% ip_route_input                 [kernel]            
               48.00  2.2% vread_tsc                      [kernel].vsyscall_fn
               48.00  2.2% call_function_single_interrupt [kernel]            
               46.00  2.1% kmem_cache_free                [kernel]            
               45.00  2.1% __netif_receive_skb            [kernel]            
               41.00  1.9% process_recv                   snkudp              
               40.00  1.8% kfree                          [kernel]            
               39.00  1.8% _raw_spin_lock                 [kernel]            

92.97% of 10M packets received at 750Kpps


Ok, so this is exactly what I saw with my app: non-rps is better.
To summarize: it used to be the opposite on net-next before around
Apr 14. rps has gotten worse.
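Put in absolute terms (10M packets at 750Kpps is roughly 13.3 seconds of traffic), 97.55% means about 245,000 packets lost without rps. At 92.97% with rps, about 703,000 are lost, close to three times as many. Below is a trivial helper for that arithmetic. It uses integer basis points (97.55% = 9755) to avoid floating-point rounding, and the name dropped_pkts() is hypothetical.

```c
#include <assert.h>

/* Packets lost in a run of 'sent' packets when recv_bp basis points
 * (hundredths of a percent) were received: 97.55% -> 9755. */
static unsigned long long dropped_pkts(unsigned long long sent,
				       unsigned int recv_bp)
{
	assert(recv_bp <= 10000);
	return sent * (10000 - recv_bp) / 10000;
}
```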

[-- Attachment #3: udpsnkfrk.c --]
[-- Type: text/x-csrc, Size: 3650 bytes --]

/*
 *  Usage: udpsink [-c] [-v] [-p baseport] nbports
 *    -c  all workers share one socket (bound to baseport)
 *    -v  print per-worker packet counts
 */
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <string.h>
#include <stdio.h>
#include <errno.h>
#include <unistd.h>
#include <stdlib.h>
#include <fcntl.h>
#include <event.h>
#include <pthread.h>

struct worker_data {
	struct event *snk_ev;
	struct event_base *base;
	struct timeval t;
	unsigned long pack_count;
	unsigned long bytes_count;
	unsigned long tout;
	int fd;			/* move to avoid hole on 64-bit */
	int pad1;		/*64B - let Eric figure the math;-> */
	//unsigned long _padd[16 - 3]; /* alignment */ 
};

void usage(int code)
{
	fprintf(stderr, "Usage: udpsink [-c] [-v] [-p baseport] nbports\n");
	exit(code);
}

void process_recv(int fd, short ev, void *arg)
{
	char buffer[4096];
	struct sockaddr_in addr;
	socklen_t len = sizeof(addr);
	struct worker_data *wdata = (struct worker_data *)arg;
	int lu = 0;

	if ((event_add(wdata->snk_ev, &wdata->t)) < 0) {
		perror("cb event_add");
		return;
	}

	if (ev == EV_TIMEOUT) {
		wdata->tout++;
	} else {
		lu = recvfrom(wdata->fd, buffer, sizeof(buffer), 0,
			      (struct sockaddr *)&addr, &len);
		if (lu > 0) {
			wdata->pack_count++;
			wdata->bytes_count += lu;
		}
	}
}

int prep_thread(struct worker_data *wdata)
{
	wdata->t.tv_sec = 1;
	wdata->t.tv_usec = random() % 50000L;

	wdata->base = event_init();
	event_set(wdata->snk_ev, wdata->fd, EV_READ, process_recv, wdata);
	event_base_set(wdata->base, wdata->snk_ev);
	if ((event_add(wdata->snk_ev, &wdata->t)) < 0) {
		perror("event_add");
		return -1;
	}
	return 0;
}

void *worker_func(void *arg)
{
	struct worker_data *wdata = (struct worker_data *)arg;

	/* widen via long to avoid an int-to-pointer size warning */
	return (void *)(long)event_base_loop(wdata->base, 0);
}

int main(int argc, char *argv[])
{
	int c;
	int baseport = 4000;
	int nbthreads;
	struct worker_data *wdata;
	unsigned long ototal = 0;
	int concurrent = 0;
	int verbose = 0;
	int i;
	while ((c = getopt(argc, argv, "cvp:")) != -1) {
		if (c == 'p')
			baseport = atoi(optarg);
		else if (c == 'c')
			concurrent = 1;
		else if (c == 'v')
			verbose++;
		else
			usage(1);
	}
	if (optind == argc)
		usage(1);
	nbthreads = atoi(argv[optind]);
	if (nbthreads <= 0)
		usage(1);
	wdata = calloc(nbthreads, sizeof(struct worker_data));
	if (!wdata) {
		perror("calloc");
		return 1;
	}

	for (i = 0; i < nbthreads; i++) {
		struct sockaddr_in addr;
		pthread_t tid;

		/* every worker needs its own event, even when sharing a socket */
		wdata[i].snk_ev = calloc(1, sizeof(struct event));
		if (!wdata[i].snk_ev)
			return 1;

		if (i && concurrent) {
			wdata[i].fd = wdata[0].fd;
		} else {

			wdata[i].fd = socket(PF_INET, SOCK_DGRAM, 0);
			if (wdata[i].fd == -1) {
				free(wdata[i].snk_ev);
				perror("socket");
				return 1;
			}
			memset(&addr, 0, sizeof(addr));
			addr.sin_family = AF_INET;
//                      addr.sin_addr.s_addr = inet_addr(argv[optind]);
			addr.sin_port = htons(baseport + i);
			if (bind
			    (wdata[i].fd, (struct sockaddr *)&addr,
			     sizeof(addr)) < 0) {
				free(wdata[i].snk_ev);
				perror("bind");
				return 1;
			}
//                      fcntl(wdata[i].fd, F_SETFL, O_NDELAY);
		}
		if (prep_thread(wdata + i)) {
			printf("failed to allocate thread %d, exit\n", i);
			exit(1);
		}
		if (pthread_create(&tid, NULL, worker_func, wdata + i)) {
			fprintf(stderr, "pthread_create failed for thread %d\n", i);
			return 1;
		}
	}

	for (;;) {
		unsigned long total;
		long delta;

		sleep(1);
		total = 0;
		for (i = 0; i < nbthreads; i++) {
			total += wdata[i].pack_count;
		}
		delta = total - ototal;
		if (delta) {
			printf("%ld pps (%lu", delta, total);
			if (verbose) {
				for (i = 0; i < nbthreads; i++) {
					if (wdata[i].pack_count)
						printf(" %d:%lu", i,
						       wdata[i].pack_count);
				}
			}
			printf(")\n");
		}
		ototal = total;
	}
}
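A note on the padding in struct worker_data: on LP64 it comes to 8+8+16+8+8+8+4+4 = 64 bytes. However, calloc() only guarantees alignment for fundamental types (16 bytes with glibc on x86-64), so an element can still straddle two cache lines and share one with its neighbour. Below is a minimal sketch of a cache-aligned, zeroed allocation via posix_memalign(), assuming 64-byte cache lines. calloc_aligned() is a hypothetical helper, not part of the original program.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define CACHELINE 64	/* assumption: 64-byte cache lines */

/* Hypothetical calloc() replacement: n zeroed elements of size sz,
 * starting on a cache-line boundary. Release with free(). */
static void *calloc_aligned(size_t n, size_t sz)
{
	void *p = NULL;

	if (sz && n > SIZE_MAX / sz)
		return NULL;	/* n * sz would overflow */
	if (posix_memalign(&p, CACHELINE, n * sz) != 0)
		return NULL;
	memset(p, 0, n * sz);
	return p;
}
```

In main() this would replace the calloc() call, e.g. wdata = calloc_aligned(nbthreads, sizeof(struct worker_data)).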
