netdev.vger.kernel.org archive mirror
From: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
To: "David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org, caitlinb@broadcom.com, kelly@au1.ibm.com,
	rusty@rustcorp.com.au
Subject: Re: Initial benchmarks of some VJ ideas [mmap memcpy vs copy_to_user].
Date: Thu, 11 May 2006 20:18:15 +0400	[thread overview]
Message-ID: <20060511161815.GA623@2ka.mipt.ru> (raw)
In-Reply-To: <20060511083031.GA12712@2ka.mipt.ru>

On Thu, May 11, 2006 at 12:30:32PM +0400, Evgeniy Polyakov (johnpol@2ka.mipt.ru) wrote:
> On Thu, May 11, 2006 at 12:07:21AM -0700, David S. Miller (davem@davemloft.net) wrote:
> > You can test with single stream, but then you are only testing
> > in-cache case.  Try several thousand sockets and real load from many
> > unique source systems, it becomes interesting then.
>
> I can test system with large number of streams, but unfortunately only
> from small number of different src/dst ip addresses, so I can not
> benchmark route lookup performance in layered design.

I've run it with 200 UDP sockets in the receive path. There were two load
generator machines with 100 clients each.
There are no copies of skb->data in recvmsg().
Since I only have a 1Gb link I cannot provide each client with high
bandwidth, so they send 4k chunks.
Throughput dropped by half, down to 55 MB/sec, and CPU usage increased
noticeably (a slow drift between 8 and 12%, compared to 2% with one
socket). I believe this is not a cache effect, but is due to the much
higher number of syscalls per second.

Here is the profile result:
1463625  78.0003  poll_idle
19171     1.0217  _spin_lock_irqsave
15887     0.8467  _read_lock
14712     0.7840  kfree
13370     0.7125  ip_frag_queue
11896     0.6340  delay_pmtmr
11811     0.6294  _spin_lock
11723     0.6247  csum_partial
11399     0.6075  ip_frag_destroy
11063     0.5896  serial_in
10533     0.5613  skb_release_data
10524     0.5609  ip_route_input
10319     0.5499  __alloc_skb
9903      0.5278  ip_defrag
9889      0.5270  _read_unlock
9536      0.5082  _write_unlock
8639      0.4604  _write_lock
7557      0.4027  netif_receive_skb
6748      0.3596  ip_frag_intern
6534      0.3482  preempt_schedule
6220      0.3315  __kmalloc
6005      0.3200  schedule
5924      0.3157  irq_entries_start
5823      0.3103  _spin_unlock_irqrestore
5678      0.3026  ip_rcv
5410      0.2883  __kfree_skb
5056      0.2694  kmem_cache_alloc
5014      0.2672  kfree_skb
4900      0.2611  eth_type_trans
4067      0.2167  kmem_cache_free
3532      0.1882  udp_recvmsg
3531      0.1882  ip_frag_reasm
3331      0.1775  _read_lock_irqsave
3327      0.1773  ipq_kill
3304      0.1761  udp_v4_lookup_longway

I'm going to resurrect the zero-copy sniffer project [1] and create a
special socket option that would allow inserting the pages containing
skb->data into the process VMA using VM remapping tricks. Unfortunately
this requires TLB flushing, and there will probably be no significant
performance/CPU gain, if any, but I think it is the only way to provide
zero-copy receive access with hardware that does not support header split.

The other idea, which I will try if I understood you correctly, is to
create a unified cache. I think some interesting results can be obtained
from the following approach: in the softirq we do not process skb->data
at all, but only fetch the src/dst/sport/dport/protocol numbers (this
should touch at most two cache lines; otherwise it is not a fast-path
packet, but something like IPsec, and can be processed as usual). Based
on that data we create an "initial" cache entry, queue the skb into it,
and let recvmsg() process the entry later in process context.

Back to the drawing board...
Thanks for discussion.

1. zero-copy sniffer
http://tservice.net.ru/~s0mbre/old/?section=projects&item=af_tlb

-- 
	Evgeniy Polyakov


Thread overview: 11+ messages
2006-05-08 12:24 Initial benchmarks of some VJ ideas [mmap memcpy vs copy_to_user] Evgeniy Polyakov
2006-05-08 19:51 ` Evgeniy Polyakov
2006-05-08 20:15   ` David S. Miller
2006-05-10 19:58 ` David S. Miller
2006-05-11  6:40   ` Evgeniy Polyakov
2006-05-11  7:07     ` David S. Miller
2006-05-11  8:30       ` Evgeniy Polyakov
2006-05-11 16:18         ` Evgeniy Polyakov [this message]
2006-05-11 18:54           ` David S. Miller
2006-05-11 19:30             ` Rick Jones
2006-05-12  7:54             ` Evgeniy Polyakov
