linux-crypto.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Daniel Borkmann <dborkman@redhat.com>
To: James Yonan <james@openvpn.net>
Cc: Florian Weimer <fw@deneb.enyo.de>,
	Marcelo Cerri <mhcerri@linux.vnet.ibm.com>,
	linux-crypto@vger.kernel.org, herbert@gondor.hengli.com.au
Subject: Re: [PATCH] crypto_mem_not_equal: add constant-time equality testing of memory regions
Date: Tue, 17 Sep 2013 21:07:12 +0200	[thread overview]
Message-ID: <5238A860.2010404@redhat.com> (raw)
In-Reply-To: <52373BA3.9090709@openvpn.net>

On 09/16/2013 07:10 PM, James Yonan wrote:
> On 16/09/2013 01:56, Daniel Borkmann wrote:
>> On 09/15/2013 06:59 PM, James Yonan wrote:
>>> On 15/09/2013 09:45, Florian Weimer wrote:
>>>> * James Yonan:
>>>>
>>>>> + * Constant-time equality testing of memory regions.
>>>>> + * Returns 0 when data is equal, non-zero otherwise.
>>>>> + * Fast path if size == 16.
>>>>> + */
>>>>> +noinline unsigned long crypto_mem_not_equal(const void *a, const
>>>>> void *b, size_t size)
>>>>
>>>> I think this should really return unsigned or int, to reduce the risk
>>>> that the upper bytes are truncated because the caller uses an
>>>> inappropriate type, resulting in a bogus zero result.  Reducing the
>>>> value to 0/1 probably doesn't hurt performance too much.  It also
>>>> doesn't encode any information about the location of the difference in
>>>> the result value, which helps if that ever leaks.
>>>
>>> The problem with returning 0/1 within the function body of
>>> crypto_mem_not_equal is that it makes it easier for the compiler to
>>> introduce a short-circuit optimization.
>>>
>>> It might be better to move the test where the result is compared
>>> against 0 into an inline function:
>>>
>>> noinline unsigned long __crypto_mem_not_equal(const void *a, const
>>> void *b, size_t size);
>>>
>>> static inline int crypto_mem_not_equal(const void *a, const void *b,
>>> size_t size) {
>>>      return __crypto_mem_not_equal(a, b, size) != 0UL ? 1 : 0;
>>> }
>>>
>>> This hides the fact that we are only interested in a boolean result
>>> from the compiler when it's compiling crypto_mem_not_equal.c, but also
>>> ensures type safety when users test the return value.  It's also
>>> likely to have little or no performance impact.
>>
>> Well, the code snippet I've provided from NaCl [1] is not really
>> "fast-path"
>> as you say, but rather to prevent the compiler from doing such
>> optimizations
>> by having a transformation of the "accumulated" bits into 0 and 1 as an end
>> result (likely to prevent a short circuit), plus it has static size, so no
>> loops applied here that could screw up.
>>
>> Variable size could be done under arch/ in asm, and if not available, that
>> just falls back to normal memcmp that is being transformed into a same
>> return
>> value. By that, all other archs could easily migrate afterwards. What do
>> you
>> think?
>>
>>   [1] http://www.spinics.net/lists/linux-crypto/msg09558.html
>
> I'm not sure that the differentbits -> 0/-1 transform in NaCl really buys us anything because
 > we don't care very much about making the final test of differentbits != 0 constant-time.  An
 > attacker already knows whether the test succeeded or failed -- we care more about making the
 > failure cases constant-time.
>
> To do this, we need to make sure that the compiler doesn't insert one or more early instructions
 > to compare differentbits with 0xFF and then bypass the rest of the F(n) lines because it knows
 > then that the value of differentbits cannot be changed by subsequent F(n) lines.  It seems that
 > all of the approaches that use |= to build up the differentbits value suffer from this problem.
>
> My proposed solution is rather than trying to outsmart the compiler with code that resists
 > optimization, why not just turn optimization off directly with #pragma GCC optimize.  Or better
 > yet, use an optimization setting that provides reasonable speed without introducing potential
 > short-circuit optimizations.
>
> By optimizing for size ("Os"), the compiler will need to turn off optimizations that add code
 > for a possible speed benefit, and the kind of short-circuit optimizations that we are trying to
 > avoid fall precisely into this class -- they add an instruction to check if the OR accumulator has
 > all of its bits set, then if so, do an early exit.  So by using Os, we still benefit from
 > optimizations that increase speed but don't increase code size.

While I was looking a bit further into this, I thought using an attribute like this might be a
more clean variant ...

diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index 92669cd..2505b1b 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -351,6 +351,11 @@ void ftrace_likely_update(struct ftrace_branch_data *f, int val, int expect);
   */
  #define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))

+/* Tell GCC to turn off optimization for a particular function. */
+#ifndef __do_not_optimize
+#define __do_not_optimize      __attribute__((optimize("O0")))
+#endif
+
  /* Ignore/forbid kprobes attach on very low level functions marked by this attribute: */
  #ifdef CONFIG_KPROBES
  # define __kprobes     __attribute__((__section__(".kprobes.text")))

... however, then I found on the GCC developer mailing list [1]: That said, I consider the
optimize attribute code seriously broken and unmaintained (but sometimes useful for debugging -
and only that). [...] Does "#pragma Gcc optimize" work more reliably? No, it uses the same
mechanism internally. [...] And if these options are so broken, should they be marked as such
in the manual? [...] Probably yes.

Therefore, I still need to ask ... what about the option of an arch/*/crypto/... asm
implementation? This still seems safer for something that crucial, imho.

> Once we eliminate the possibility of short-circuit optimizations, then it's much more straightforward
 > to make the function fast (e.g. by comparing 64-bits at a time) and flexible (by handling arbitrary
 > size).  My current implementation of crypto_mem_not_equal does both.

  [1] http://gcc.gnu.org/ml/gcc/2012-07/msg00211.html

  reply	other threads:[~2013-09-17 19:07 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-09-10 18:38 [PATCH] crypto_memcmp: add constant-time memcmp James Yonan
2013-09-10 18:57 ` Daniel Borkmann
2013-09-11 12:19   ` Marcelo Cerri
2013-09-11 17:20     ` James Yonan
2013-09-13  8:33       ` Daniel Borkmann
2013-09-15 15:32         ` [PATCH] crypto_mem_not_equal: add constant-time equality testing of memory regions James Yonan
2013-09-15 15:45           ` Florian Weimer
2013-09-15 16:59             ` James Yonan
2013-09-16  7:56               ` Daniel Borkmann
2013-09-16 17:10                 ` James Yonan
2013-09-17 19:07                   ` Daniel Borkmann [this message]
2013-09-19  0:13                     ` James Yonan
2013-09-19  8:37                       ` Daniel Borkmann
2013-09-16 17:25               ` Florian Weimer
2013-09-15 15:38         ` [PATCH] crypto_memcmp: add constant-time memcmp James Yonan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5238A860.2010404@redhat.com \
    --to=dborkman@redhat.com \
    --cc=fw@deneb.enyo.de \
    --cc=herbert@gondor.hengli.com.au \
    --cc=james@openvpn.net \
    --cc=linux-crypto@vger.kernel.org \
    --cc=mhcerri@linux.vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).