From: Petr Tesarik <ptesarik@suse.cz>
To: linux-ia64@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, Tony Luck <tony.luck@intel.com>,
Hedi Berriche <hedi@sgi.com>
Subject: Serious problem with ticket spinlocks on ia64
Date: Fri, 27 Aug 2010 13:37:33 +0000
Message-ID: <201008271537.35709.ptesarik@suse.cz>
Hi everybody,
SGI has recently experienced failures with the new ticket spinlock
implementation. Hedi Berriche sent me a simple test case that can
trigger the failure on the siglock. To debug the issue, I wrote a
small module that watches writes to current->sighand->siglock and
records the values.
I observed that the __ticket_spin_lock() primitive fails when the
tail wraps around to zero. I reconstructed the following:
CPU 7 holds the spinlock
CPU 5 wants to acquire the spinlock
Spinlock value is 0xfffcffff
(now serving 0x7ffe, next ticket 0x7fff)
CPU 7 executes st2.rel to release the spinlock.
At the same time CPU 5 executes a fetchadd4.acq.
The resulting lock value is 0xfffe0000 (correct), and CPU 5 has
recorded its ticket number (0x7fff).
Consequently, the first spinlock loop iteration succeeds, and CPU 5
now holds the spinlock.
Next, CPU 5 releases the spinlock with st2.rel, changing the lock
value to 0x0 (correct).
SO FAR SO GOOD.
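The arithmetic of this sequence can be replayed with a small user-space model. This is only a sketch: it assumes the ia64 lock word layout (now_serving in bits 31..17, padding in bits 16..15, next_ticket in bits 14..0) and one possible serialization of the two "simultaneous" operations (fetchadd first, then the release store), chosen because it yields the reported value 0xfffe0000. The helper names are invented for illustration and this is not the kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: now_serving in bits 31..17, padding in bits 16..15,
 * next_ticket in bits 14..0. */
#define TICKET_SHIFT 17
#define TICKET_BITS  15
#define TICKET_MASK  ((1u << TICKET_BITS) - 1)

static uint32_t now_serving(uint32_t v) { return (v >> TICKET_SHIFT) & TICKET_MASK; }
static uint32_t next_ticket(uint32_t v) { return v & TICKET_MASK; }

/* Hypothetical stand-in for fetchadd4.acq: add 1 to the whole 32-bit word
 * and return the old value. */
static uint32_t model_fetchadd(uint32_t *lock)
{
    uint32_t old = *lock;
    *lock = old + 1;
    return old;
}

/* Hypothetical stand-in for the 2-byte release store: only the upper half
 * of the word is rewritten, which advances now_serving by one and clears
 * padding bit 16. */
static void model_unlock(uint32_t *lock)
{
    uint32_t hi = *lock >> 16;
    hi = (hi + 2) & 0xfffeu;
    *lock = (hi << 16) | (*lock & 0xffffu);
}

/* Replays the reported sequence and returns the final lock word. */
static uint32_t replay_wraparound(void)
{
    uint32_t lock = 0xfffcffffu;
    assert(now_serving(lock) == 0x7ffe && next_ticket(lock) == 0x7fff);

    uint32_t ticket5 = model_fetchadd(&lock) & TICKET_MASK; /* CPU 5 takes a ticket */
    model_unlock(&lock);                                    /* CPU 7 releases */
    assert(lock == 0xfffe0000u);
    assert(now_serving(lock) == ticket5);                   /* CPU 5 may enter */

    model_unlock(&lock);                                    /* CPU 5 releases */
    return lock;
}
```

In this model the carry out of next_ticket lands in the padding bits and is cleared by the next release, so the word ends at 0x0 as reported. The model is sequential and says nothing about what the hardware does when the two operations really overlap.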
Now, CPU 4, CPU 5 and CPU 7 all want to acquire the lock again.
Interestingly, CPU 5 and CPU 7 are both granted the same ticket,
and the spinlock value (as seen from the debug fault handler) is
0x0 after single-stepping over the fetchadd4.acq, in both cases.
CPU 4 correctly sets the spinlock value to 0x1.
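For comparison, an atomic fetch-and-add must hand every caller a different old value. The sketch below uses C11 atomic_fetch_add_explicit as a user-space stand-in for fetchadd4.acq; that substitution and the function names are assumptions made for illustration. Starting from a lock word of 0x0, three acquirers should receive three distinct tickets and leave the word at 0x3, which is not what was observed here.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define TICKET_BITS 15
#define TICKET_MASK ((1u << TICKET_BITS) - 1)

/* Hypothetical stand-in for fetchadd4.acq: atomically add 1 to the lock
 * word and return the ticket held in the low 15 bits of the old value. */
static uint32_t take_ticket(atomic_uint *lock)
{
    unsigned int old = atomic_fetch_add_explicit(lock, 1u, memory_order_acquire);
    return old & TICKET_MASK;
}

/* Takes three tickets from a lock word of 0 (named after CPUs 4, 5 and 7)
 * and reports whether all three differ and the word ends at 3. */
static bool three_acquirers_get_distinct_tickets(void)
{
    atomic_uint lock;
    atomic_init(&lock, 0u);

    uint32_t t4 = take_ticket(&lock);
    uint32_t t5 = take_ticket(&lock);
    uint32_t t7 = take_ticket(&lock);

    return t4 != t5 && t5 != t7 && t4 != t7 && atomic_load(&lock) == 3u;
}
```

Two CPUs both seeing 0x0 after their own fetchadd would mean that at least one increment was lost, which this model can never produce.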
I don't know if the simultaneous acquire attempt and release are
necessary to trigger the bug, but I noted it here.
I've only seen this happen when the spinlock wraps around to zero,
but I can't rule out that it also happens in other cases.
In any case, there seems to be a serious problem with memory
ordering, and I'm not enough of an expert to say exactly what it is.
Any ideas?
Petr Tesarik
L3 International
Novell, Inc.