qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: "Philippe Mathieu-Daudé" <philmd@redhat.com>
To: Paolo Bonzini <pbonzini@redhat.com>, qemu-devel@nongnu.org
Cc: "Alex Bennée" <alex.bennee@linaro.org>,
	"Joseph Myers" <joseph@codesourcery.com>
Subject: Re: [PULL 082/116] target/i386: correct fix for pcmpxstrx substring search
Date: Mon, 15 Jun 2020 12:58:50 +0200	[thread overview]
Message-ID: <bdc40c92-6518-5c9f-646e-817af94d548e@redhat.com> (raw)
In-Reply-To: <530b3231-0e47-80af-4bb1-17e50e231efa@redhat.com>

On 6/15/20 12:18 PM, Philippe Mathieu-Daudé wrote:
> On 6/12/20 6:07 PM, Paolo Bonzini wrote:
>> From: Joseph Myers <joseph@codesourcery.com>
>>
>> This corrects a bug introduced in my previous fix for SSE4.2 pcmpestri
>> / pcmpestrm / pcmpistri / pcmpistrm substring search, commit
>> ae35eea7e4a9f21dd147406dfbcd0c4c6aaf2a60.
>>
>> That commit fixed a bug that showed up in four GCC tests with one libc
>> implementation.  The tests in question generate random inputs to the
>> intrinsics and compare results to a C implementation, but they only
>> test 1024 possible random inputs, and when the tests use the cases of
>> those instructions that work with word rather than byte inputs, it's
>> easy to have problematic cases that show up much less frequently than
>> that.  Thus, testing with a different libc implementation, and so a
>> different random number generator, showed up a problem with the
>> previous patch.
>>
>> When investigating the previous test failures, I found the description
>> of these instructions in the Intel manuals (starting from computing a
>> 16x16 or 8x8 set of comparison results) confusing and hard to match up
>> with the more optimized implementation in QEMU, and referred to AMD
>> manuals which described the instructions in a different way.  Those
>> AMD descriptions are very explicit that the whole of the string being
>> searched for must be found in the other operand, not running off the
>> end of that operand; they say "If the prototype and the SUT are equal
>> in length, the two strings must be identical for the comparison to be
>> TRUE.".  However, that statement is incorrect.
>>
>> In my previous commit message, I noted:
>>
>>   The operation in this case is a search for a string (argument d to
>>   the helper) in another string (argument s to the helper); if a copy
>>   of d at a particular position would run off the end of s, the
>>   resulting output bit should be 0 whether or not the strings match in
>>   the region where they overlap, but the QEMU implementation was
>>   wrongly comparing only up to the point where s ends and counting it
>>   as a match if an initial segment of d matched a terminal segment of
>>   s.  Here, "run off the end of s" means that some byte of d would
>>   overlap some byte outside of s; thus, if d has zero length, it is
>>   considered to match everywhere, including after the end of s.
>>
>> The description "some byte of d would overlap some byte outside of s"
>> is accurate only when understood to refer to overlapping some byte
>> *within the 16-byte operand* but at or after the zero terminator; it
>> is valid to run over the end of s if the end of s is the end of the
>> 16-byte operand.  So the fix in the previous patch for the case of d
>> being empty was correct, but the other part of that patch was not
>> correct (as it never allowed partial matches even at the end of the
>> 16-byte operand).  Nor was the code before the previous patch correct
>> for the case of d nonempty, as it would always have allowed partial
>> matches at the end of s.
>>
>> Fix with a partial revert of my previous change, combined with
>> inserting a check for the special case of s having maximum length to
>> determine where it is necessary to check for matches.
>>
>> In the added test, test 1 is for the case of empty strings, which
>> failed before my 2017 patch, test 2 is for the bug introduced by my
>> 2017 patch and test 3 deals with the case where a match of an initial
>> segment at the end of the string is not valid when the string ends
>> before the end of the 16-byte operand (that is, the case that would be
>> broken by a simple revert of the non-empty-string part of my 2017
>> patch).
>>
>> Signed-off-by: Joseph Myers <joseph@codesourcery.com>
>> Message-Id: <alpine.DEB.2.21.2006121344290.9881@digraph.polyomino.org.uk>
>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>> ---
>>  target/i386/ops_sse.h                |  4 ++--
>>  tests/tcg/i386/Makefile.target       |  3 +++
>>  tests/tcg/i386/test-i386-pcmpistri.c | 33 ++++++++++++++++++++++++++++
>>  3 files changed, 38 insertions(+), 2 deletions(-)
>>  create mode 100644 tests/tcg/i386/test-i386-pcmpistri.c
>>
>> diff --git a/target/i386/ops_sse.h b/target/i386/ops_sse.h
>> index 01d6017412..14f2b16abd 100644
>> --- a/target/i386/ops_sse.h
>> +++ b/target/i386/ops_sse.h
>> @@ -2089,10 +2089,10 @@ static inline unsigned pcmpxstrx(CPUX86State *env, Reg *d, Reg *s,
>>              res = (2 << upper) - 1;
>>              break;
>>          }
>> -        for (j = valids - validd; j >= 0; j--) {
>> +        for (j = valids == upper ? valids : valids - validd; j >= 0; j--) {
>>              res <<= 1;
>>              v = 1;
>> -            for (i = validd; i >= 0; i--) {
>> +            for (i = MIN(valids - j, validd); i >= 0; i--) {
>>                  v &= (pcmp_val(s, ctrl, i + j) == pcmp_val(d, ctrl, i));
>>              }
>>              res |= v;
>> diff --git a/tests/tcg/i386/Makefile.target b/tests/tcg/i386/Makefile.target
>> index 43ee2e181e..53efec0668 100644
>> --- a/tests/tcg/i386/Makefile.target
>> +++ b/tests/tcg/i386/Makefile.target
>> @@ -10,6 +10,9 @@ ALL_X86_TESTS=$(I386_SRCS:.c=)
>>  SKIP_I386_TESTS=test-i386-ssse3
>>  X86_64_TESTS:=$(filter test-i386-ssse3, $(ALL_X86_TESTS))
>>  
>> +test-i386-pcmpistri: CFLAGS += -msse4.2
>> +run-test-i386-pcmpistri: QEMU_OPTS += -cpu max
> 
> This test fails on our CI:
> https://travis-ci.org/github/qemu/qemu/jobs/698006621#L4246

FYI Paolo's analysis from 'make V=1' output
https://api.travis-ci.org/v3/job/698459904/log.txt:

timeout 60
/home/travis/build/philmd/qemu/build/i386-linux-user/qemu-i386 -cpu max
test-i386-pcmpistri >  test-i386-pcmpistri.out

timeout 60
/home/travis/build/philmd/qemu/build/i386-linux-user/qemu-i386  -plugin
../../plugin/libbb.so -d plugin -D
test-i386-pcmpistri-with-libbb.so.pout test-i386-pcmpistri >
run-plugin-test-i386-pcmpistri-with-libbb.so.out

"incorrect qemu invocation, missing -cpu max in the second".

> 
>> +
>>  #
>>  # hello-i386 is a barebones app
>>  #
>> diff --git a/tests/tcg/i386/test-i386-pcmpistri.c b/tests/tcg/i386/test-i386-pcmpistri.c
>> new file mode 100644
>> index 0000000000..1e81ae611a
>> --- /dev/null
>> +++ b/tests/tcg/i386/test-i386-pcmpistri.c
>> @@ -0,0 +1,33 @@
>> +/* Test pcmpistri instruction.  */
>> +
>> +#include <nmmintrin.h>
>> +#include <stdio.h>
>> +
>> +union u {
>> +    __m128i x;
>> +    unsigned char uc[16];
>> +};
>> +
>> +union u s0 = { .uc = { 0 } };
>> +union u s1 = { .uc = "abcdefghijklmnop" };
>> +union u s2 = { .uc = "bcdefghijklmnopa" };
>> +union u s3 = { .uc = "bcdefghijklmnab" };
>> +
>> +int
>> +main(void)
>> +{
>> +    int ret = 0;
>> +    if (_mm_cmpistri(s0.x, s0.x, 0x4c) != 15) {
>> +        printf("FAIL: pcmpistri test 1\n");
>> +        ret = 1;
>> +    }
>> +    if (_mm_cmpistri(s1.x, s2.x, 0x4c) != 15) {
>> +        printf("FAIL: pcmpistri test 2\n");
>> +        ret = 1;
>> +    }
>> +    if (_mm_cmpistri(s1.x, s3.x, 0x4c) != 16) {
>> +        printf("FAIL: pcmpistri test 3\n");
>> +        ret = 1;
>> +    }
>> +    return ret;
>> +}
>>
> 



  reply	other threads:[~2020-06-15 11:00 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-12 16:07 [PULL v2 000/116] Huge miscellaneous pull request for 2020-06-11 Paolo Bonzini
2020-06-12 16:07 ` [PULL 082/116] target/i386: correct fix for pcmpxstrx substring search Paolo Bonzini
2020-06-15 10:18   ` Philippe Mathieu-Daudé
2020-06-15 10:58     ` Philippe Mathieu-Daudé [this message]
2020-06-15 13:43       ` Alex Bennée
2020-06-12 16:07 ` [PULL 088/116] i386: hvf: Drop useless declarations in sysemu Paolo Bonzini
2020-06-12 16:07 ` [PULL 097/116] i386: hvf: Move lazy_flags into CPUX86State Paolo Bonzini
2020-06-12 16:07 ` [PULL 098/116] i386: hvf: Move mmio_buf " Paolo Bonzini
2020-06-13 16:17 ` [PULL v2 000/116] Huge miscellaneous pull request for 2020-06-11 Peter Maydell

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=bdc40c92-6518-5c9f-646e-817af94d548e@redhat.com \
    --to=philmd@redhat.com \
    --cc=alex.bennee@linaro.org \
    --cc=joseph@codesourcery.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).