From: Simon Kirby <sim@hostway.ca>
To: Julian Anastasov <ja@ssi.bg>
Cc: lvs-devel@vger.kernel.org, Changli Gao <xiaosuo@gmail.com>
Subject: Re: activeconns * weight overflowing 32-bit int
Date: Mon, 5 Aug 2013 19:41:50 -0700 [thread overview]
Message-ID: <20130806024150.GA2393@hostway.ca> (raw)
In-Reply-To: <alpine.LFD.2.00.1305241013130.1636@ja.ssi.bg>
Hello!
On Fri, May 24, 2013 at 11:11:35AM +0300, Julian Anastasov wrote:
> On Thu, 23 May 2013, Simon Kirby wrote:
>
> > > Last time I also checked the assembler output from x86-32
> > > and it looked good. I used 'make net/netfilter/ipvs/ip_vs_wlc.s'
> > > to generate asm output, note the 's' extension.
> > >
> > > May be problem in your ipvs.c file comes from the
> > > fact that __u64 is unsigned long long for 32-bit platforms.
> > > I changed the code to use 'unsigned long' as follows:
> > >
> > > #if 0
> > > typedef unsigned long long __u64;
> > > #else
> > > typedef unsigned long __u64;
> > > #endif
> > >
> > > and the x86-32 platform works correctly.
> > > x86-64 works for both cases: 'unsigned long long' and
> > > 'unsigned long' but x86-32 generates many mul operations
> > > for the 'unsigned long long' case which is not used in
> > > the kernel. That is why I didn't noticed such problem.
> > >
> > > So, I think we should be safe by adding
> > > (__u64) without any (__u32), next days I'll check again
> > > the asm output. May be (__u32) before weight ensures
> > > 32x32 multiply but it should not be needed.
> >
> > Hmm, I was comparing atomic_t being s32 versus u32, not u64 being u64. :)
> > Anyway, the .s results are much easier to read, and (closer to) reality!
> > I did a comparison with (__u64)loh * atomic_read(dest->weight) versus
> > (__u64)loh * (__u32)atomic_read(dest->weight) on both arches and uploaded
> > them to http://0x.ca/sim/ref/3.9-ipvs/. It's not a huge difference, but I
> > prefer the shorter/faster version. ;)
>
> I now see why your patch shows difference compared
> to my tests month ago. This change is the culprit:
>
> - int loh, doh;
> + unsigned int loh, doh;
>
> It effectively changes the operation from:
>
> (__u64/__s64) int * int
>
> into
>
> (__u64) unsigned int * int
>
> that is why you fix it by using __u32:
>
> (__u64) unsigned int * unsigned int
>
> so that both operands are from same 4-byte signedness.
>
> I think, we should keep loh and doh to be int, may be
> the following both solutions should generate 32x32 multiply:
>
> 1. same as my first email:
>
> int loh, doh;
>
> (__u64/__s64) loh * atomic_read(&dest->weight)
>
> In this case I see only one difference between
> __u64 and __s64:
>
> - jb .L41 #,
> - ja .L79 #,
> + jl .L41 #,
> + jg .L79 #,
>
> 2. Your patch:
>
> unsigned int loh, doh;
>
> (__u64) loh * (__u32) atomic_read(&dest->weight)
> or
> (__s64) loh * (__u32) atomic_read(&dest->weight)
Did you mean here "(__s64) loh * (__s32) atomic_read(&dest->sweight)"?
If not, the results for me on GCC 4.7.2 were what I posted at
http://0x.ca/sim/ref/3.9-ipvs/.
> Both solutions generate code that differs only
> in imul vs. mul. In internet I see that imul is
> preferred/faster than mul. That is why I prefer solution 1,
> it has less casts.
>
> So, I think you can change your patch as follows:
>
> 1. Use int for loh, doh. Note that some schedulers
> use 'unsigned int' and should be patched for this
> definition: NQ, SED, WLC
>
> 2. Use (__u64) prefix only, no (__u32) before atomic_read:
> LBLC, LBLCR, NQ, SED, WLC
>
> (__u64) loh * atomic_read(&dest->weight) ...
> (__u64) doh * ...
>
> 3. Explain in commit message that we find the
> result64=int32*int32 faster than result64=uint32*uint32
> and far better than using 64*64 multiply which is
> a bit slower on older CPUs.
I found that u64*u32 was faster than u64*s32, but I didn't check s64*s32.
I checked now and found that u64*u32 and s64*s32 have the same number of
output instructions on i386, with just the different signedness tests.
Both actually use imul since it's the comparison after which it cares
about for signedness.
But what actually makes sense? Do negative weights ever make sense?
Simon-
next prev parent reply other threads:[~2013-08-06 2:41 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-04-13 6:43 activeconns * weight overflowing 32-bit int Simon Kirby
2013-04-13 15:10 ` Julian Anastasov
2013-05-22 6:18 ` Julian Anastasov
2013-05-23 16:58 ` Simon Kirby
2013-05-23 20:44 ` Julian Anastasov
2013-05-24 0:43 ` Simon Kirby
2013-05-24 8:11 ` Julian Anastasov
2013-08-05 6:10 ` Julian Anastasov
2013-08-06 2:41 ` Simon Kirby [this message]
2013-08-06 6:45 ` Julian Anastasov
2013-08-08 23:54 ` [PATCH] ipvs: Use 64-bit comparisons (connections * weight) to avoid overflow Simon Kirby
2013-08-09 9:02 ` Julian Anastasov
2013-08-10 8:26 ` [PATCH v2] ipvs: fix overflow on dest weight multiply Simon Kirby
2013-08-10 12:31 ` Julian Anastasov
2013-08-13 2:23 ` Simon Horman
2013-08-13 4:45 ` Simon Horman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20130806024150.GA2393@hostway.ca \
--to=sim@hostway.ca \
--cc=ja@ssi.bg \
--cc=lvs-devel@vger.kernel.org \
--cc=xiaosuo@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.