From: "Michael S. Tsirkin" <mst@redhat.com>
To: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>,
David Hildenbrand <dahi@linux.vnet.ibm.com>,
linuxppc-dev@lists.ozlabs.org, linux-arch@vger.kernel.org,
linux-kernel@vger.kernel.org, benh@kernel.crashing.org,
paulus@samba.org, akpm@linux-foundation.org,
schwidefsky@de.ibm.com, mingo@kernel.org
Subject: Re: [RFC 0/2] Reenable might_sleep() checks for might_fault() when atomic
Date: Thu, 27 Nov 2014 09:40:11 +0200
Message-ID: <20141127074011.GB8644@redhat.com>
In-Reply-To: <20141127070919.GA4390@osiris>

On Thu, Nov 27, 2014 at 08:09:19AM +0100, Heiko Carstens wrote:
> On Wed, Nov 26, 2014 at 07:04:47PM +0200, Michael S. Tsirkin wrote:
> > On Wed, Nov 26, 2014 at 05:51:08PM +0100, Christian Borntraeger wrote:
> > > > But this one was giving users in field false positives.
> > >
> > > So let's try to fix those, ok? If we can't, then tough luck.
> >
> > Sure.
> > I think the simplest way might be to make spinlock disable
> > preemption when CONFIG_DEBUG_ATOMIC_SLEEP is enabled.
> >
> > As a result, userspace access will fail and caller will
> > get a nice error.
>
> Yes, _userspace_ now sees unpredictable behaviour, instead of the
> kernel emitting a big loud warning to the console.

So I don't object to adding more debugging at all; it would be nice.
But the fix is not an unconditional might_sleep() within might_fault(),
since that would trigger false positives. Rather, detect that you took
a spinlock without disabling preemption.

> Please consider this simple example:
>
> int bar(char __user *ptr)
> {
> 	...
> 	if (copy_to_user(ptr, ...))
> 		return -EFAULT;
> 	...
> }
>
> SYSCALL_DEFINE1(foo, char __user *, ptr)
> {
> 	int rc;
>
> 	...
> 	rc = bar(ptr);
> 	if (rc)
> 		goto out;
> 	...
> out:
> 	return rc;
> }
>
> The above simple system call just works fine, with and without your change,
> however if somebody (incorrectly) changes sys_foo() to the code below:
>
> 	spin_lock(&lock);
> 	rc = bar(ptr);
> 	if (rc)
> 		goto out;
> out:
> 	spin_unlock(&lock);
> 	return rc;
>
> Broken code like above used to generate warnings. With your change we won't
> see any warnings anymore. Instead we get random and bad behaviour:
>
> For !CONFIG_PREEMPT if the page at ptr is not mapped, the kernel will see
> a fault, potentially schedule and potentially deadlock on &lock.
> Without _any_ warning anymore.
>
> For CONFIG_PREEMPT if the page at ptr is mapped, everything works. However if
> the page is not mapped, userspace now all of a sudden will see an invalid(!)
> -EFAULT return code, instead of the kernel resolving the page fault.
> Yes, the kernel can't resolve the fault since we hold a spinlock. But the
> above bogus code did give warnings to give you an idea that something probably
> is not correct.
>
> Who on earth is supposed to debug crap like this???
>
> What we really want is:
>
> Code like
> 	spin_lock(&lock);
> 	if (copy_to_user(...))
> 		rc = ...
> 	spin_unlock(&lock);
> really *should* generate warnings like it did before.
>
> And *only* code like
> 	spin_lock(&lock);
> 	pagefault_disable();
> 	if (copy_to_user(...))
> 		rc = ...
> 	pagefault_enable();
> 	spin_unlock(&lock);
> should not generate warnings, since the author hopefully knew what he did.
>
> We could achieve that by e.g. adding a couple of pagefault disabled bits
> within current_thread_info()->preempt_count, which would allow
> pagefault_disable() and pagefault_enable() to modify a different part of
> preempt_count than it does now, so there is a way to tell if pagefaults have
> been explicitly disabled or are just a side effect of preemption being
> disabled.
> This would allow might_fault() to restore its old sane behaviour for the
> !pagefault_disabled() case.

Exactly. I agree, that would be a useful debugging tool.

In fact this comment in mm/memory.c hints at this:
	 * it would be nicer only to annotate paths which are not under
	 * pagefault_disable,
it further says
	 * however that requires a larger audit and
	 * providing helpers like get_user_atomic.
but I think that what you outline is a better way to do this.
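
To make the outlined idea concrete, here is a rough userspace sketch of it.
Everything in it is an assumption for illustration only: the bit layout, the
sim_ names, and the premise that taking a spinlock bumps the preempt count
are hypothetical and not taken from any kernel tree. It only shows how
giving pagefault_disable() its own bits lets might_fault() tell "pagefaults
explicitly disabled" apart from "merely atomic".

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical layout (assumption): the low 8 bits count preemption-disable
 * nesting, the next 4 bits count explicit pagefault-disable nesting.
 */
#define SIM_PREEMPT_MASK    0x000000ffu
#define SIM_PAGEFAULT_SHIFT 8
#define SIM_PAGEFAULT_UNIT  (1u << SIM_PAGEFAULT_SHIFT)
#define SIM_PAGEFAULT_MASK  0x00000f00u

static unsigned int sim_preempt_count;	/* stand-in for the per-thread counter */
static unsigned int sim_warnings;	/* number of warnings emitted so far */

/* Assumption: a spinlock disables preemption. Calls must be balanced. */
static void sim_spin_lock(void)   { sim_preempt_count += 1; }
static void sim_spin_unlock(void) { sim_preempt_count -= 1; }

/* Explicit pagefault disabling touches only its own bit field. */
static void sim_pagefault_disable(void) { sim_preempt_count += SIM_PAGEFAULT_UNIT; }
static void sim_pagefault_enable(void)  { sim_preempt_count -= SIM_PAGEFAULT_UNIT; }

static bool sim_pagefault_disabled(void)
{
	return (sim_preempt_count & SIM_PAGEFAULT_MASK) != 0;
}

static bool sim_preempt_disabled(void)
{
	return (sim_preempt_count & SIM_PREEMPT_MASK) != 0;
}

/*
 * Stay silent if the caller explicitly disabled pagefaults; otherwise warn
 * when preemption is disabled, e.g. because a spinlock is held.
 */
static void sim_might_fault(void)
{
	if (sim_pagefault_disabled())
		return;
	if (sim_preempt_disabled()) {
		sim_warnings++;
		fprintf(stderr, "sim: might_fault() called in atomic context\n");
	}
}
```

With this, sim_spin_lock() followed by sim_might_fault() counts one warning,
while the same call wrapped in sim_pagefault_disable()/sim_pagefault_enable()
under the lock stays silent.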
--
MST