linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Andrea Arcangeli <aarcange@redhat.com>
To: Michael Ellerman <mpe@ellerman.id.au>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, Pavel Emelyanov <xemul@parallels.com>,
	zhang.zhanghailiang@huawei.com,
	Dave Hansen <dave.hansen@intel.com>,
	Rik van Riel <riel@redhat.com>,
	"Dr. David Alan Gilbert" <dgilbert@redhat.com>,
	"Huangpeng (Peter)" <peter.huangpeng@huawei.com>,
	Bamvor Zhang Jian <bamvor.zhangjian@linaro.org>,
	Bharata B Rao <bharata@linux.vnet.ibm.com>,
	Geert Uytterhoeven <geert@linux-m68k.org>
Subject: Re: [PATCH 06/12] userfaultfd: selftest: avoid my_bcmp false positives with powerpc
Date: Wed, 9 Sep 2015 19:02:19 +0200	[thread overview]
Message-ID: <20150909170219.GD10639@redhat.com> (raw)
In-Reply-To: <1441767026.7854.12.camel@ellerman.id.au>

On Wed, Sep 09, 2015 at 12:50:26PM +1000, Michael Ellerman wrote:
> On Tue, 2015-09-08 at 22:43 +0200, Andrea Arcangeli wrote:
> > Keep a non-zero placeholder after the count, for the my_bcmp
> > comparison of the page against the zeropage. The lockless increment
> > between 255 to 256 against a lockless my_bcmp could otherwise return
> > false positives on ppc32le.
> > 
> > Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
> > ---
> >  tools/testing/selftests/vm/userfaultfd.c | 12 ++++++++++--
> >  1 file changed, 10 insertions(+), 2 deletions(-)
> 
> Without groking what the code is doing, this fix makes the test pass on my
> ppc64le box.

On ppc byte reads are "more" out of order, so you can read byte 1
before byte 0. This doesn't seem to happen on x86 (even though
smp_rmb() is not a noop on x86). Now the code didn't even have an
actual compiler barrier but gcc is unlikely to unroll the loop in a
unordered way so that wasn't a problem, but this patch takes care of
gcc too.

One side does a inc++ on a long long. The other side read byte 1
before byte 0, and check if they're both zero. When the inc long long
var transitions from 255 to 256 if you read it in reverse with little
endian, you'll see 0 0 and so my_bcmp would think the page is full of
zeros.

my_bcmp checks that we didn't map a zeropage in there by mistake (like
it would happen if userfaultfd wasn't registered on the anonymous
holes). So instead of relaying on the counter to be always > 0, I
added a further word after the counter that is never changing and not
zero, so we don't have to use any memory barrier for those out of
order checks.

On a side note (feel free to skip this part as it's userland): this is
also why I couldn't use bcmp because bcmp and memcmp return false
positive zeroes if the memory changes under it. In the sse4 unrolled
loop if it finds a difference, it can't tell if it should return > or
< 0 because it's a ptest and not a cmp insn, so then it has to repeat
the memory comparison (re-reading from memory a potentially different
copy of the memory that could have become equal). Problem is it's not
restarting the unrolled loop from where it stopped it, if this final
comparison returns zero and it's not the last byte to compare. In
short glibc memcmp/bcmp can very well return 0 before comparing all
memory that it is told to compare, if the memory is changing under
memcmp/bcmp.

It'd be enough to restart the unrolled loop if the "length" isn't zero
and the last final comparison returned zero, to fix memcmp/bcmp in
glibc. It's not getting fixed because it's not a bug by the C
standard, but it makes the SIMD accellerated bcmp/memcmp in glibc
unusable for lockless fast-path comparisons as it would lead to false
positives that would degrade performance. For example if glibc memcmp
was used to build the stable tree in KSM, it would lead to superfluous
write protections. KSM is in kernel of course so it's not affected by
the memcmp glibc implementation, but similar things can happen in
userland (like I found out the hard way in the userfaultfd program).

Thanks,
Andrea

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2015-09-09 17:02 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-09-08 20:43 [PATCH 00/12] userfaultfd non-x86 and selftest updates for 4.2.0+ Andrea Arcangeli
2015-09-08 20:43 ` [PATCH 01/12] userfaultfd: selftest: update userfaultfd x86 32bit syscall number Andrea Arcangeli
2015-09-08 20:43 ` [PATCH 02/12] userfaultfd: Revert "userfaultfd: waitqueue: add nr wake parameter to __wake_up_locked_key" Andrea Arcangeli
2015-09-08 20:43 ` [PATCH 03/12] userfaultfd: selftests: vm: pick up sanitized kernel headers Andrea Arcangeli
2015-09-09  2:48   ` Michael Ellerman
2015-09-08 20:43 ` [PATCH 04/12] userfaultfd: selftest: headers fixup Andrea Arcangeli
2015-09-08 20:43 ` [PATCH 05/12] userfaultfd: selftest: only warn if __NR_userfaultfd is undefined Andrea Arcangeli
2015-09-08 20:43 ` [PATCH 06/12] userfaultfd: selftest: avoid my_bcmp false positives with powerpc Andrea Arcangeli
2015-09-09  2:50   ` Michael Ellerman
2015-09-09 17:02     ` Andrea Arcangeli [this message]
2015-09-08 20:43 ` [PATCH 07/12] userfaultfd: selftest: Fix compiler warnings on 32-bit Andrea Arcangeli
2015-09-08 20:43 ` [PATCH 08/12] userfaultfd: selftest: return an error if BOUNCE_VERIFY fails Andrea Arcangeli
2015-09-08 20:43 ` [PATCH 09/12] userfaultfd: selftest: don't error out if pthread_mutex_t isn't identical Andrea Arcangeli
2015-09-08 20:43 ` [PATCH 10/12] userfaultfd: powerpc: Bump up __NR_syscalls to account for __NR_userfaultfd Andrea Arcangeli
2015-09-09  2:48   ` Michael Ellerman
2015-09-08 20:43 ` [PATCH 11/12] userfaultfd: powerpc: implement syscall Andrea Arcangeli
2015-09-08 20:43 ` [PATCH 12/12] userfaultfd: register uapi generic syscall (aarch64) Andrea Arcangeli
2015-09-15 20:02   ` Andrew Morton
2015-09-15 20:20     ` Mathieu Desnoyers
2015-09-15 20:47     ` Andrea Arcangeli
2015-09-30 21:56 ` [PATCH 00/12] userfaultfd non-x86 and selftest updates for 4.2.0+ Mike Kravetz
2015-10-01  0:06   ` Andrea Arcangeli
2015-10-01  0:42     ` Mike Kravetz
2015-10-01 16:04       ` Andrea Arcangeli
2015-10-01 16:45         ` Mike Kravetz

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150909170219.GD10639@redhat.com \
    --to=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=bamvor.zhangjian@linaro.org \
    --cc=bharata@linux.vnet.ibm.com \
    --cc=dave.hansen@intel.com \
    --cc=dgilbert@redhat.com \
    --cc=geert@linux-m68k.org \
    --cc=linux-mm@kvack.org \
    --cc=mpe@ellerman.id.au \
    --cc=peter.huangpeng@huawei.com \
    --cc=riel@redhat.com \
    --cc=xemul@parallels.com \
    --cc=zhang.zhanghailiang@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).