linux-arch.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: ebiederm@xmission.com (Eric W. Biederman)
To: Dan Williams <dan.j.williams@intel.com>
Cc: "torvalds@linux-foundation.org" <torvalds@linux-foundation.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"peterz@infradead.org" <peterz@infradead.org>,
	"tglx@linutronix.de" <tglx@linutronix.de>,
	"alan@linux.intel.com" <alan@linux.intel.com>,
	"Reshetova, Elena" <elena.reshetova@intel.com>,
	"mark.rutland@arm.com" <mark.rutland@arm.com>,
	"gnomes@lxorguk.ukuu.org.uk" <gnomes@lxorguk.ukuu.org.uk>,
	"gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org>,
	"jikos@kernel.org" <jikos@kernel.org>,
	"linux-arch@vger.kernel.org" <linux-arch@vger.kernel.org>
Subject: Re: [RFC PATCH] asm/generic: introduce if_nospec and nospec_barrier
Date: Thu, 04 Jan 2018 08:54:11 -0600	[thread overview]
Message-ID: <87wp0xu12k.fsf@xmission.com> (raw)
In-Reply-To: <CAPcyv4hOtk3QsCWOhECs7=UCh-iO+TKSJvRmqVq+Xhjx9OTiew@mail.gmail.com> (Dan Williams's message of "Wed, 3 Jan 2018 22:32:08 -0800")

Dan Williams <dan.j.williams@intel.com> writes:

> On Wed, Jan 3, 2018 at 9:01 PM, Eric W. Biederman <ebiederm@xmission.com> wrote:
>> "Williams, Dan J" <dan.j.williams@intel.com> writes:
>>
>>
>>
>>> Note that these are "a human looked at static analysis reports and
>>> could not rationalize that these are false positives". Specific domain
>>> knowledge about these paths may find that some of them are indeed false
>>> positives.
>>>
>>> The change to m_start in kernel/user_namespace.c is interesting because
>>> that's an example where the nospec_load() approach by itself would need
>>> to barrier speculation twice whereas if_nospec can do it once for the
>>> whole block.
>>
>>
>> This user_namespace.c change is very convoluted for what it is trying to
>> do.
>
> Sorry this was my rebase on top of commit d5e7b3c5f51f "userns: Don't
> read extents twice in m_start" the original change from Elena was
> simpler. Part of the complexity arises from converting the common
> kernel pattern of
>
> if (<invalid condition>)
>    return NULL;
> do_stuff;
>
> ...to:
>
> if (<valid conidtion>) {
>    barrier();
>    do_stuff;
> }
>
>> It simplifies to a one liner that just adds osb() after pos >=
>> extents. AKA:
>>
>>         if (pos >= extents)
>>                 return NULL;
>> +       osb();
>>
>> Is the intent to hide which branch branch we take based on extents,
>> after the pos check?
>
> The intent is to prevent speculative execution from triggering any
> reads when 'pos' is invalid.

If that is the intent I think the patch you posted is woefully
inadequate.  We have many many more seq files in proc than just
/proc/<pid>/uid_map.

>> I suspect this implies that using a user namespace and a crafted uid
>> map you can hit this in stat, on the fast path.
>>
>> At which point I suspect we will be better off extending struct
>> user_namespace by a few pointers, so there is no union and remove the
>> need for blocking speculation entirely.
>
> How does this help prevent a speculative read with an invalid 'pos'
> reading arbitrary kernel addresses?

I though the concern was extents.

I am now convinced that collectively we need a much better description
of the problem than currently exists.

Either the patch you presented missed a whole lot like 90%+ of the
user/kernel interface or there is some mitigating factor that I am not
seeing.  Either way until reasonable people can read the code and
agree on the potential exploitability of it, I will be nacking these
patches.

>>> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
>>> index 246d4d4ce5c7..aa0be8cef2d4 100644
>>> --- a/kernel/user_namespace.c
>>> +++ b/kernel/user_namespace.c
>>> @@ -648,15 +648,18 @@ static void *m_start(struct seq_file *seq, loff_t *ppos,
>>>  {
>>>       loff_t pos = *ppos;
>>>       unsigned extents = map->nr_extents;
>>> -     smp_rmb();
>>>
>>> -     if (pos >= extents)
>>> -             return NULL;
>>> +     /* paired with smp_wmb in map_write */
>>> +     smp_rmb();
>>>
>>> -     if (extents <= UID_GID_MAP_MAX_BASE_EXTENTS)
>>> -             return &map->extent[pos];
>>> +     if (pos < extents) {
>>> +             osb();
>>> +             if (extents <= UID_GID_MAP_MAX_BASE_EXTENTS)
>>> +                     return &map->extent[pos];
>>> +             return &map->forward[pos];
>>> +     }
>>>
>>> -     return &map->forward[pos];
>>> +     return NULL;
>>>  }
>>>
>>>  static void *uid_m_start(struct seq_file *seq, loff_t *ppos)
>>
>>
>>
>>> diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
>>> index 8ca9915befc8..7f83abdea255 100644
>>> --- a/net/mpls/af_mpls.c
>>> +++ b/net/mpls/af_mpls.c
>>> @@ -81,6 +81,8 @@ static struct mpls_route *mpls_route_input_rcu(struct net *net, unsigned index)
>>>       if (index < net->mpls.platform_labels) {
>>>               struct mpls_route __rcu **platform_label =
>>>                       rcu_dereference(net->mpls.platform_label);
>>> +
>>> +             osb();
>>>               rt = rcu_dereference(platform_label[index]);
>>>       }
>>>       return rt;
>>
>> Ouch!  This adds a barrier in the middle of an rcu lookup, on the
>> fast path for routing mpls packets.  Which if memory serves will
>> noticably slow down software processing of mpls packets.
>>
>> Why does osb() fall after the branch for validity?  So that we allow
>> speculation up until then?
>
> It falls there so that the cpu only issues reads with known good 'index' values.
>
>> I suspect it would be better to have those barriers in the tun/tap
>> interfaces where userspace can inject packets and thus time them.  Then
>> the code could still speculate and go fast for remote packets.
>>
>> Or does the speculation stomping have to be immediately at the place
>> where we use data from userspace to perform a table lookup?
>
> The speculation stomping barrier has to be between where we validate
> the input and when we may speculate on invalid input.

So a serializing instruction at the kernel/user boundary (like say
loading cr3) is not enough?  That would seem to break any chance of a
controlled timing.

> So, yes, moving
> the user controllable input validation earlier and out of the fast
> path would be preferred. Think of this patch purely as a static
> analysis warning that something might need to be done to resolve the
> report.

That isn't what I was suggesting.  I was just suggesting a serialization
instruction earlier in the pipeline.

Given what I have seen in other parts of the thread I think an and
instruction that just limits the index to a sane range is generally
applicable, and should be cheap enough to not care about.  Further
it seems to apply to the pattern the static checkers were catching,
so I suspect that is the pattern we want to stress for limiting
speculation.  Assuming of course the compiler won't just optimize the
and of the index out.

Eric

  reply	other threads:[~2018-01-04 14:54 UTC|newest]

Thread overview: 82+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-01-03 22:38 [RFC PATCH 0/4] API for inhibiting speculative arbitrary read primitives Mark Rutland
2018-01-03 22:38 ` [RFC PATCH 1/4] asm-generic/barrier: add generic nospec helpers Mark Rutland
2018-01-03 22:38   ` Mark Rutland
2018-01-04 12:00   ` Mark Rutland
2018-01-05  4:21     ` Dan Williams
2018-01-05  9:15       ` Mark Rutland
2018-01-03 22:38 ` [RFC PATCH 2/4] Documentation: document " Mark Rutland
2018-01-03 22:38   ` Mark Rutland
2018-01-03 22:38 ` [RFC PATCH 3/4] arm64: implement nospec_{load,ptr}() Mark Rutland
2018-01-03 22:38   ` Mark Rutland
2018-01-03 22:38 ` [RFC PATCH 4/4] bpf: inhibit speculated out-of-bounds pointers Mark Rutland
2018-01-03 22:38   ` Mark Rutland
2018-01-03 23:45   ` Peter Zijlstra
2018-01-03 23:45     ` Peter Zijlstra
2018-01-04 10:59     ` Mark Rutland
2018-01-04  0:15 ` [RFC PATCH] asm/generic: introduce if_nospec and nospec_barrier Dan Williams
2018-01-04  0:15   ` Dan Williams
2018-01-04  0:39   ` Linus Torvalds
2018-01-04  1:07     ` Alan Cox
2018-01-04  1:13       ` Dan Williams
2018-01-04  1:13         ` Dan Williams
2018-01-04  6:28         ` Julia Lawall
2018-01-04 17:58           ` Dan Williams
2018-01-04 19:26             ` Pavel Machek
2018-01-04 19:26               ` Pavel Machek
2018-01-04 21:43               ` Dan Williams
2018-01-04 22:20                 ` Linus Torvalds
2018-01-04 22:23                   ` Linus Torvalds
2018-01-04 22:55                   ` Alan Cox
2018-01-04 23:06                     ` Linus Torvalds
2018-01-04 23:11                       ` Alan Cox
2018-01-04 23:11                         ` Alan Cox
2018-01-05  0:24                       ` Dan Williams
2018-01-04 22:44                 ` Pavel Machek
2018-01-04 23:12                   ` Dan Williams
2018-01-04 23:12                     ` Dan Williams
2018-01-04 23:21                     ` Alan Cox
2018-01-04 23:33                     ` Pavel Machek
2018-01-05  8:11                       ` Julia Lawall
2018-01-04  1:27       ` Jiri Kosina
2018-01-04  1:27         ` Jiri Kosina
2018-01-04  1:41         ` Alan Cox
2018-01-04  1:47           ` Jiri Kosina
2018-01-04  1:47             ` Jiri Kosina
2018-01-04 19:39             ` Pavel Machek
2018-01-04 20:32               ` Alan Cox
2018-01-04 20:32                 ` Alan Cox
2018-01-04 20:39                 ` Jiri Kosina
2018-01-04 21:23                   ` Alan Cox
2018-01-04 21:23                     ` Alan Cox
2018-01-04 21:48                     ` Pavel Machek
2018-01-04  1:51         ` Dan Williams
2018-01-04  1:51           ` Dan Williams
2018-01-04  1:54           ` Linus Torvalds
2018-01-04  1:54             ` Linus Torvalds
2018-01-04  3:10             ` Williams, Dan J
2018-01-04  4:44               ` Al Viro
2018-01-04  5:44                 ` Dan Williams
2018-01-04  5:49                   ` Dave Hansen
2018-01-04  5:49                     ` Dave Hansen
2018-01-04  5:50                   ` Al Viro
2018-01-04  5:55                     ` Al Viro
2018-01-04  6:42                       ` Dan Williams
2018-01-04  5:01               ` Eric W. Biederman
2018-01-04  6:32                 ` Dan Williams
2018-01-04 14:54                   ` Eric W. Biederman [this message]
2018-01-04 16:39                     ` Mark Rutland
2018-01-04 20:56                     ` Pavel Machek
2018-01-04 20:56                       ` Pavel Machek
2018-01-04 11:47               ` Mark Rutland
2018-01-04 11:47                 ` Mark Rutland
2018-01-04 22:09                 ` Dan Williams
2018-01-05 14:40                   ` Mark Rutland
2018-01-05 16:44                     ` Dan Williams
2018-01-05 18:05                       ` Dan Williams
2018-01-04  1:59           ` Jiri Kosina
2018-01-04  1:59             ` Jiri Kosina
2018-01-04  2:15             ` Alan Cox
2018-01-04  3:12               ` Alexei Starovoitov
2018-01-04  9:16                 ` Reshetova, Elena
2018-01-04  9:16                   ` Reshetova, Elena
2018-01-04 20:40             ` Pavel Machek

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87wp0xu12k.fsf@xmission.com \
    --to=ebiederm@xmission.com \
    --cc=alan@linux.intel.com \
    --cc=dan.j.williams@intel.com \
    --cc=elena.reshetova@intel.com \
    --cc=gnomes@lxorguk.ukuu.org.uk \
    --cc=gregkh@linuxfoundation.org \
    --cc=jikos@kernel.org \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).