Re: [PATCH] x86 spinlock: Fix memory corruption on completing completions

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Oleg Nesterov <oleg@redhat.com>
To: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: the arch/x86 maintainers <x86@kernel.org>,
	KVM list <kvm@vger.kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	virtualization <virtualization@lists.linux-foundation.org>,
	Paul Gortmaker <paul.gortmaker@windriver.com>,
	Peter Anvin <hpa@zytor.com>, Davidlohr Bueso <dave@stgolabs.net>,
	Andrey Ryabinin <a.ryabinin@samsung.com>,
	Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>,
	Christian Borntraeger <borntraeger@de.ibm.com>,
	Ingo Molnar <mingo@redhat.com>,
	Sasha Levin <sasha.levin@oracle.com>,
	Paul McKenney <paulmck@linux.vnet.ibm.com>,
	Rik van Riel <riel@redhat.com>,
	Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
	Andi Kleen <ak@linux.intel.com>,
	xen-devel@lists.xenproject.org, Dave Jones <davej@redhat.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Waiman Long <waiman.long@hp.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Paolo Bonzini <pbonzini@redhat.>
Subject: Re: [PATCH] x86 spinlock: Fix memory corruption on completing completions
Date: Thu, 12 Feb 2015 15:18:19 +0100	[thread overview]
Message-ID: <20150212141819.GA11633@redhat.com> (raw)
In-Reply-To: <54DBE27C.8050105@goop.org>

On 02/11, Jeremy Fitzhardinge wrote:
>
> On 02/11/2015 09:24 AM, Oleg Nesterov wrote:
> > I agree, and I have to admit I am not sure I fully understand why
> > unlock uses the locked add. Except we need a barrier to avoid the race
> > with the enter_slowpath() users, of course. Perhaps this is the only
> > reason?
>
> Right now it needs to be a locked operation to prevent read-reordering.
> x86 memory ordering rules state that all writes are seen in a globally
> consistent order, and are globally ordered wrt reads *on the same
> addresses*, but reads to different addresses can be reordered wrt to writes.
>
> So, if the unlocking add were not a locked operation:
>
>         __add(&lock->tickets.head, TICKET_LOCK_INC);		/* not locked */
>
>         if (unlikely(lock->tickets.tail & TICKET_SLOWPATH_FLAG))
>             __ticket_unlock_slowpath(lock, prev);
>
> Then the read of lock->tickets.tail can be reordered before the unlock,
> which introduces a race:

Yes, yes, thanks, but this is what I meant. We need a barrier. Even if
"Every store is a release" as Linus mentioned.

> This *might* be OK, but I think it's on dubious ground:
>
>         __add(&lock->tickets.head, TICKET_LOCK_INC);		/* not locked */
>
> 	/* read overlaps write, and so is ordered */
>         if (unlikely(lock->head_tail & (TICKET_SLOWPATH_FLAG << TICKET_SHIFT))
>             __ticket_unlock_slowpath(lock, prev);
>
> because I think Intel and AMD differed in interpretation about how
> overlapping but different-sized reads & writes are ordered (or it simply
> isn't architecturally defined).

can't comment, I simply so not know how the hardware works.

> If the slowpath flag is moved to head, then it would always have to be
> locked anyway, because it needs to be atomic against other CPU's RMW
> operations setting the flag.

Yes, this is true.

But again, if we want to avoid the read-after-unlock, we need to update
this lock and read SLOWPATH atomically, it seems that we can't avoid the
locked insn.

Oleg.

WARNING: multiple messages have this Message-ID (diff)

From: Oleg Nesterov <oleg@redhat.com>
To: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: the arch/x86 maintainers <x86@kernel.org>,
	KVM list <kvm@vger.kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	virtualization <virtualization@lists.linux-foundation.org>,
	Paul Gortmaker <paul.gortmaker@windriver.com>,
	Peter Anvin <hpa@zytor.com>, Davidlohr Bueso <dave@stgolabs.net>,
	Andrey Ryabinin <a.ryabinin@samsung.com>,
	Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>,
	Christian Borntraeger <borntraeger@de.ibm.com>,
	Ingo Molnar <mingo@redhat.com>,
	Sasha Levin <sasha.levin@oracle.com>,
	Paul McKenney <paulmck@linux.vnet.ibm.com>,
	Rik van Riel <riel@redhat.com>,
	Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
	Andi Kleen <ak@linux.intel.com>,
	xen-devel@lists.xenproject.org, Dave Jones <davej@redhat.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Waiman Long <waiman.long@hp.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Paolo Bonzini <pbonzini@redhat.
Subject: Re: [PATCH] x86 spinlock: Fix memory corruption on completing completions
Date: Thu, 12 Feb 2015 15:18:19 +0100	[thread overview]
Message-ID: <20150212141819.GA11633@redhat.com> (raw)
In-Reply-To: <54DBE27C.8050105@goop.org>

On 02/11, Jeremy Fitzhardinge wrote:
>
> On 02/11/2015 09:24 AM, Oleg Nesterov wrote:
> > I agree, and I have to admit I am not sure I fully understand why
> > unlock uses the locked add. Except we need a barrier to avoid the race
> > with the enter_slowpath() users, of course. Perhaps this is the only
> > reason?
>
> Right now it needs to be a locked operation to prevent read-reordering.
> x86 memory ordering rules state that all writes are seen in a globally
> consistent order, and are globally ordered wrt reads *on the same
> addresses*, but reads to different addresses can be reordered wrt to writes.
>
> So, if the unlocking add were not a locked operation:
>
>         __add(&lock->tickets.head, TICKET_LOCK_INC);		/* not locked */
>
>         if (unlikely(lock->tickets.tail & TICKET_SLOWPATH_FLAG))
>             __ticket_unlock_slowpath(lock, prev);
>
> Then the read of lock->tickets.tail can be reordered before the unlock,
> which introduces a race:

Yes, yes, thanks, but this is what I meant. We need a barrier. Even if
"Every store is a release" as Linus mentioned.

> This *might* be OK, but I think it's on dubious ground:
>
>         __add(&lock->tickets.head, TICKET_LOCK_INC);		/* not locked */
>
> 	/* read overlaps write, and so is ordered */
>         if (unlikely(lock->head_tail & (TICKET_SLOWPATH_FLAG << TICKET_SHIFT))
>             __ticket_unlock_slowpath(lock, prev);
>
> because I think Intel and AMD differed in interpretation about how
> overlapping but different-sized reads & writes are ordered (or it simply
> isn't architecturally defined).

can't comment, I simply so not know how the hardware works.

> If the slowpath flag is moved to head, then it would always have to be
> locked anyway, because it needs to be atomic against other CPU's RMW
> operations setting the flag.

Yes, this is true.

But again, if we want to avoid the read-after-unlock, we need to update
this lock and read SLOWPATH atomically, it seems that we can't avoid the
locked insn.

Oleg.

WARNING: multiple messages have this Message-ID (diff)

From: Oleg Nesterov <oleg@redhat.com>
To: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Sasha Levin <sasha.levin@oracle.com>,
	Davidlohr Bueso <dave@stgolabs.net>,
	Peter Zijlstra <peterz@infradead.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Peter Anvin <hpa@zytor.com>,
	Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Paul McKenney <paulmck@linux.vnet.ibm.com>,
	Waiman Long <waiman.long@hp.com>, Dave Jones <davej@redhat.com>,
	the arch/x86 maintainers <x86@kernel.org>,
	Paul Gortmaker <paul.gortmaker@windriver.com>,
	Andi Kleen <ak@linux.intel.com>, Jason Wang <jasowang@redhat.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	KVM list <kvm@vger.kernel.org>,
	virtualization <virtualization@lists.linux-foundation.org>,
	xen-devel@lists.xenproject.org, Rik van Riel <riel@redhat.com>,
	Christian Borntraeger <borntraeger@de.ibm.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Andrey Ryabinin <a.ryabinin@samsung.com>
Subject: Re: [PATCH] x86 spinlock: Fix memory corruption on completing completions
Date: Thu, 12 Feb 2015 15:18:19 +0100	[thread overview]
Message-ID: <20150212141819.GA11633@redhat.com> (raw)
In-Reply-To: <54DBE27C.8050105@goop.org>

On 02/11, Jeremy Fitzhardinge wrote:
>
> On 02/11/2015 09:24 AM, Oleg Nesterov wrote:
> > I agree, and I have to admit I am not sure I fully understand why
> > unlock uses the locked add. Except we need a barrier to avoid the race
> > with the enter_slowpath() users, of course. Perhaps this is the only
> > reason?
>
> Right now it needs to be a locked operation to prevent read-reordering.
> x86 memory ordering rules state that all writes are seen in a globally
> consistent order, and are globally ordered wrt reads *on the same
> addresses*, but reads to different addresses can be reordered wrt to writes.
>
> So, if the unlocking add were not a locked operation:
>
>         __add(&lock->tickets.head, TICKET_LOCK_INC);		/* not locked */
>
>         if (unlikely(lock->tickets.tail & TICKET_SLOWPATH_FLAG))
>             __ticket_unlock_slowpath(lock, prev);
>
> Then the read of lock->tickets.tail can be reordered before the unlock,
> which introduces a race:

Yes, yes, thanks, but this is what I meant. We need a barrier. Even if
"Every store is a release" as Linus mentioned.

> This *might* be OK, but I think it's on dubious ground:
>
>         __add(&lock->tickets.head, TICKET_LOCK_INC);		/* not locked */
>
> 	/* read overlaps write, and so is ordered */
>         if (unlikely(lock->head_tail & (TICKET_SLOWPATH_FLAG << TICKET_SHIFT))
>             __ticket_unlock_slowpath(lock, prev);
>
> because I think Intel and AMD differed in interpretation about how
> overlapping but different-sized reads & writes are ordered (or it simply
> isn't architecturally defined).

can't comment, I simply so not know how the hardware works.

> If the slowpath flag is moved to head, then it would always have to be
> locked anyway, because it needs to be atomic against other CPU's RMW
> operations setting the flag.

Yes, this is true.

But again, if we want to avoid the read-after-unlock, we need to update
this lock and read SLOWPATH atomically, it seems that we can't avoid the
locked insn.

Oleg.

next prev parent reply	other threads:[~2015-02-12 14:18 UTC|newest]

Thread overview: 111+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-02-06 14:49 [PATCH] x86 spinlock: Fix memory corruption on completing completions Raghavendra K T
2015-02-06 15:20 ` Sasha Levin
2015-02-06 15:20 ` Sasha Levin
2015-02-06 15:20   ` Sasha Levin
2015-02-06 16:15   ` Linus Torvalds
2015-02-06 16:15   ` Linus Torvalds
2015-02-06 16:15     ` Linus Torvalds
2015-02-06 16:15     ` Linus Torvalds
2015-02-06 17:03     ` Andrey Ryabinin
2015-02-06 17:03     ` Andrey Ryabinin
2015-02-06 17:03       ` Andrey Ryabinin
2015-02-06 17:03     ` Andrey Ryabinin
2015-02-08 17:14   ` Oleg Nesterov
2015-02-08 17:14   ` Oleg Nesterov
2015-02-08 17:14     ` Oleg Nesterov
2015-02-06 16:25 ` Linus Torvalds
2015-02-06 16:25 ` Linus Torvalds
2015-02-06 16:25   ` Linus Torvalds
2015-02-06 16:25   ` Linus Torvalds
2015-02-06 19:42   ` Davidlohr Bueso
2015-02-06 19:42     ` Davidlohr Bueso
2015-02-06 21:15     ` Sasha Levin
2015-02-06 21:15       ` Sasha Levin
2015-02-06 21:15       ` Sasha Levin
2015-02-06 23:24       ` Davidlohr Bueso
2015-02-06 23:24         ` Davidlohr Bueso
2015-02-06 23:24         ` Davidlohr Bueso
2015-02-06 23:24       ` Davidlohr Bueso
2015-02-06 21:15     ` Sasha Levin
2015-02-06 19:42   ` Davidlohr Bueso
2015-02-06 19:42   ` Davidlohr Bueso
2015-02-08 17:49   ` Raghavendra K T
2015-02-08 17:49     ` Raghavendra K T
2015-02-08 17:49     ` Raghavendra K T
2015-02-08 17:49   ` Raghavendra K T
2015-02-06 18:57 ` Sasha Levin
2015-02-06 18:57 ` Sasha Levin
2015-02-06 18:57   ` Sasha Levin
2015-02-08 17:57   ` Raghavendra K T
2015-02-08 17:57   ` Raghavendra K T
2015-02-08 17:57   ` Raghavendra K T
2015-02-08 21:14 ` Jeremy Fitzhardinge
2015-02-08 21:14   ` Jeremy Fitzhardinge
2015-02-09  9:34   ` Raghavendra K T
2015-02-09  9:34   ` Raghavendra K T
2015-02-09  9:34   ` Raghavendra K T
2015-02-09 12:02     ` Peter Zijlstra
2015-02-09 12:02     ` Peter Zijlstra
2015-02-09 12:02       ` Peter Zijlstra
2015-02-09 12:52       ` Raghavendra K T
2015-02-09 12:52       ` Raghavendra K T
2015-02-09 12:52         ` Raghavendra K T
2015-02-10  0:53       ` Linus Torvalds
2015-02-10  0:53       ` Linus Torvalds
2015-02-10  0:53         ` Linus Torvalds
2015-02-10  0:53         ` Linus Torvalds
2015-02-10  9:30         ` Raghavendra K T
2015-02-10  9:30           ` Raghavendra K T
2015-02-10 13:18           ` Denys Vlasenko
2015-02-10 13:18             ` Denys Vlasenko
2015-02-10 13:20             ` Denys Vlasenko
2015-02-10 13:20             ` Denys Vlasenko
2015-02-10 13:20               ` Denys Vlasenko
2015-02-10 13:20               ` Denys Vlasenko
2015-02-10 14:24             ` Oleg Nesterov
2015-02-10 14:24               ` Oleg Nesterov
2015-02-10 14:24               ` Oleg Nesterov
2015-02-10 14:24             ` Oleg Nesterov
2015-02-10 13:18           ` Denys Vlasenko
2015-02-10 13:18           ` Denys Vlasenko
2015-02-10 13:23           ` Sasha Levin
2015-02-10 13:23           ` Sasha Levin
2015-02-10 13:23             ` Sasha Levin
2015-02-10 13:26           ` Oleg Nesterov
2015-02-10 13:26             ` Oleg Nesterov
2015-02-11  1:18             ` Jeremy Fitzhardinge
2015-02-11  1:18             ` Jeremy Fitzhardinge
2015-02-11  1:18               ` Jeremy Fitzhardinge
2015-02-11  1:18               ` Jeremy Fitzhardinge
2015-02-11 17:24               ` Oleg Nesterov
2015-02-11 17:24                 ` Oleg Nesterov
2015-02-11 17:24                 ` Oleg Nesterov
2015-02-11 23:15                 ` Jeremy Fitzhardinge
2015-02-11 23:15                 ` Jeremy Fitzhardinge
2015-02-11 23:15                 ` Jeremy Fitzhardinge
2015-02-11 23:15                   ` Jeremy Fitzhardinge
2015-02-11 23:28                   ` Linus Torvalds
2015-02-11 23:28                   ` Linus Torvalds
2015-02-12  7:08                     ` Jeremy Fitzhardinge
2015-02-12  7:08                     ` Jeremy Fitzhardinge
2015-02-12 14:18                   ` Oleg Nesterov
2015-02-12 14:18                   ` Oleg Nesterov [this message]
2015-02-12 14:18                     ` Oleg Nesterov
2015-02-12 14:18                     ` Oleg Nesterov
2015-02-11 17:24               ` Oleg Nesterov
2015-02-11 11:08             ` Raghavendra K T
2015-02-11 11:08             ` Raghavendra K T
2015-02-11 11:08             ` Raghavendra K T
2015-02-11 11:08               ` Raghavendra K T
2015-02-11 17:38               ` Oleg Nesterov
2015-02-11 17:38                 ` Oleg Nesterov
2015-02-11 18:38                 ` Raghavendra K T
2015-02-11 18:38                   ` Raghavendra K T
2015-02-11 18:38                 ` Raghavendra K T
2015-02-11 18:38                 ` Raghavendra K T
2015-02-11 17:38               ` Oleg Nesterov
2015-02-10 13:26           ` Oleg Nesterov
2015-02-10  9:30         ` Raghavendra K T
2015-02-08 21:14 ` Jeremy Fitzhardinge
  -- strict thread matches above, loose matches on Subject: below --
2015-02-06 14:49 Raghavendra K T
2015-02-06 14:49 Raghavendra K T

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150212141819.GA11633@redhat.com \
    --to=oleg@redhat.com \
    --cc=a.ryabinin@samsung.com \
    --cc=ak@linux.intel.com \
    --cc=borntraeger@de.ibm.com \
    --cc=dave@stgolabs.net \
    --cc=davej@redhat.com \
    --cc=hpa@zytor.com \
    --cc=jeremy@goop.org \
    --cc=konrad.wilk@oracle.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=paul.gortmaker@windriver.com \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=pbonzini@redhat. \
    --cc=peterz@infradead.org \
    --cc=raghavendra.kt@linux.vnet.ibm.com \
    --cc=riel@redhat.com \
    --cc=sasha.levin@oracle.com \
    --cc=tglx@linutronix.de \
    --cc=virtualization@lists.linux-foundation.org \
    --cc=waiman.long@hp.com \
    --cc=x86@kernel.org \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.