From: Hans Schillstrom <hans@schillstrom.com>
To: Julian Anastasov <ja@ssi.bg>
Cc: Simon Horman <horms@verge.net.au>,
lvs-devel@vger.kernel.org, netdev@vger.kernel.org
Subject: Re: [PATCH net-next 11/12] ipvs: reorder keys in connection structure
Date: Thu, 07 Mar 2013 08:49:52 +0100
Message-ID: <1362642592.17102.140.camel@hawk.mlab.se>
In-Reply-To: <alpine.LFD.2.00.1303062246480.1899@ja.ssi.bg>
Hi Julian
On Wed, 2013-03-06 at 23:01 +0200, Julian Anastasov wrote:
> Hello,
>
> On Wed, 6 Mar 2013, Hans Schillstrom wrote:
>
> > Hi Julian
> > Great job you have done!
> > I'll test it immediately...
>
> Thanks, it would be good to catch the problems
> in an early phase...
>
> > On Wed, 2013-03-06 at 10:42 +0200, Julian Anastasov wrote:
> > > __ip_vs_conn_in_get and ip_vs_conn_out_get are
> > > hot places. Optimize them so that ports are matched first.
> > > By moving net and fwmark below, on a 32-bit arch we can fit
> > > caddr in a 32-byte cache line and all addresses in a 64-byte
> > > cache line.
> >
> > Earlier I made some rearrangements like the one you have made.
> > My conclusion at that time was that the best gain was to have
> > fwmark and net within the first 64 bytes, and move daddr to the next
> > cache line.
>
> But fwmark is used only for lookups in the backup
> server. The net field is checked first only in
> ip_vs_ct_in_get (on scheduling); it can be optimized too.
> Modern CPUs have a 64-byte cache line, and maybe the
> placement of these fields does not matter much because checking
> the two ports is enough to differentiate most of the
> connections. The addresses matter when the ports do not
> differ, i.e. mostly for persistent connections. So, on
> a 64-byte cache line it would be more difficult to
> see any difference.
I made some tests on a weaker machine (i7-3930K) with moderate background
load; there is absolutely no measurable difference between having daddr
in the first cache line and having it in the second.
So based on that I prefer your solution, since it keeps the data together.
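For reference, here is a minimal user-space sketch (not the kernel
definition; it just mocks the head of the reordered layout, assuming a
32-bit arch with an 8-byte hlist_node and 16-byte nf_inet_addr unions)
that prints where the addresses land:

/* Minimal sketch, NOT the kernel struct: mock of the reordered
 * ip_vs_conn head on a 32-bit arch, to show that caddr stays within
 * the first 32 bytes and daddr still ends at byte 64.
 */
#include <stdio.h>
#include <stddef.h>

union mock_addr { unsigned char bytes[16]; };	/* stand-in for nf_inet_addr */

struct mock_conn {
	unsigned char c_list[8];	/* hlist_node: two 32-bit pointers */
	unsigned short cport, dport, vport, af;
	union mock_addr caddr, vaddr, daddr;
};

int main(void)
{
	printf("caddr: %zu..%zu\n",
	       offsetof(struct mock_conn, caddr),
	       offsetof(struct mock_conn, caddr) + sizeof(union mock_addr));
	printf("daddr ends at: %zu\n",
	       offsetof(struct mock_conn, daddr) + sizeof(union mock_addr));
	return 0;
}

With fwmark and net kept in front of the addresses, as before, daddr
spills past the 64-byte boundary in the same model.
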
> > I used UDP at ~7 Gbit/s with 256k source addresses into an x86_64 machine,
> > and a 50/50 mix of fwmarks and ports in those tests.
> >
> > I guess that you have made a similar test, and even taken
> > ip_vs_conn_out_get() into your calculations?
>
> No, I have only virtual boxes for tests...
>
> > Regards
> > Hans
> >
> > >
> > > Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
> > > ---
> > > include/net/ip_vs.h | 12 ++++++------
> > > net/netfilter/ipvs/ip_vs_conn.c | 8 ++++----
> > > 2 files changed, 10 insertions(+), 10 deletions(-)
> > >
> > > diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h
> > > index 9059360..2bc30e6 100644
> > > --- a/include/net/ip_vs.h
> > > +++ b/include/net/ip_vs.h
> > > @@ -566,20 +566,19 @@ struct ip_vs_conn_param {
> > > */
> > > struct ip_vs_conn {
> > > struct hlist_node c_list; /* hashed list heads */
> > > -#ifdef CONFIG_NET_NS
> > > - struct net *net; /* Name space */
> > > -#endif
> > > /* Protocol, addresses and port numbers */
> > > - u16 af; /* address family */
> > > __be16 cport;
> > > - __be16 vport;
> > > __be16 dport;
> > > - __u32 fwmark; /* Fire wall mark from skb */
> > > + __be16 vport;
> > > + u16 af; /* address family */
> > > union nf_inet_addr caddr; /* client address */
> > > union nf_inet_addr vaddr; /* virtual address */
> > > union nf_inet_addr daddr; /* destination address */
> > > volatile __u32 flags; /* status flags */
> > > __u16 protocol; /* Which protocol (TCP/UDP) */
> > > +#ifdef CONFIG_NET_NS
> > > + struct net *net; /* Name space */
> > > +#endif
> > >
> > > /* counter and timer */
> > > atomic_t refcnt; /* reference count */
> > > @@ -593,6 +592,7 @@ struct ip_vs_conn {
> > > * state transition triggerd
> > > * synchronization
> > > */
> > > + __u32 fwmark; /* Fire wall mark from skb */
> > > unsigned long sync_endtime; /* jiffies + sent_retries */
> > >
> > > /* Control members */
> > > diff --git a/net/netfilter/ipvs/ip_vs_conn.c b/net/netfilter/ipvs/ip_vs_conn.c
> > > index b0cd2be..a4d8ec5 100644
> > > --- a/net/netfilter/ipvs/ip_vs_conn.c
> > > +++ b/net/netfilter/ipvs/ip_vs_conn.c
> > > @@ -265,8 +265,8 @@ __ip_vs_conn_in_get(const struct ip_vs_conn_param *p)
> > > rcu_read_lock();
> > >
> > > hlist_for_each_entry_rcu(cp, &ip_vs_conn_tab[hash], c_list) {
> > > - if (cp->af == p->af &&
> > > - p->cport == cp->cport && p->vport == cp->vport &&
> > > + if (p->cport == cp->cport && p->vport == cp->vport &&
> > > + cp->af == p->af &&
> > > ip_vs_addr_equal(p->af, p->caddr, &cp->caddr) &&
> > > ip_vs_addr_equal(p->af, p->vaddr, &cp->vaddr) &&
> > > ((!p->cport) ^ (!(cp->flags & IP_VS_CONN_F_NO_CPORT))) &&
> > > @@ -404,8 +404,8 @@ struct ip_vs_conn *ip_vs_conn_out_get(const struct ip_vs_conn_param *p)
> > > rcu_read_lock();
> > >
> > > hlist_for_each_entry_rcu(cp, &ip_vs_conn_tab[hash], c_list) {
> > > - if (cp->af == p->af &&
> > > - p->vport == cp->cport && p->cport == cp->dport &&
> > > + if (p->vport == cp->cport && p->cport == cp->dport &&
> > > + cp->af == p->af &&
> > > ip_vs_addr_equal(p->af, p->vaddr, &cp->caddr) &&
> > > ip_vs_addr_equal(p->af, p->caddr, &cp->daddr) &&
> > > p->protocol == cp->protocol &&
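If someone wants to verify the layout on a real build (assuming the
kernel is built with CONFIG_DEBUG_INFO so the object carries DWARF
data), pahole from the dwarves package prints the member offsets and
marks the cache-line boundaries:

	pahole -C ip_vs_conn net/netfilter/ipvs/ip_vs_conn.o
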
>
> Regards
>
> --
> Julian Anastasov <ja@ssi.bg>