* gcc-3.2 breaks rmap on s390x
@ 2003-04-03 18:10 Pete Zaitcev
2003-04-03 18:44 ` Rik van Riel
` (2 more replies)
0 siblings, 3 replies; 7+ messages in thread
From: Pete Zaitcev @ 2003-04-03 18:10 UTC (permalink / raw)
To: riel; +Cc: zaitcev, linux-kernel, linux390
Hi, Rik:
the following patch seems to fix my rmap problems on s390x.
--- linux-2.4.20-2.1.24.z1/include/linux/mm.h 2003-03-27 21:30:09.000000000 -0500
+++ linux-2.4.20-2.1.24.z2/include/linux/mm.h 2003-04-02 20:26:11.000000000 -0500
@@ -376,8 +376,10 @@
*/
#ifdef CONFIG_SMP
while (test_and_set_bit(PG_chainlock, &page->flags)) {
- while (test_bit(PG_chainlock, &page->flags))
+ while (test_bit(PG_chainlock, &page->flags)) {
cpu_relax();
+ barrier();
+ }
}
#endif
}
The failure happens in page_remove_rmap. It runs like so:
if (!page_mapped(page)) // (page->pte.direct!=0)
return; /* remap_page_range() from a driver? */
pte_chain_lock(page);
if (PageDirect(page)) {
if (page->pte.direct == pte_paddr) {
The pte_chain_lock loop is in the patch above.
The compiler loads page->pte.direct into a register when
page_mapped() is tested, and uses it in if statement
across pte_chain_lock().
Andrew Morton observed that bitops on s390(x) do not clobber
memory. I saw it too, but I thought it was legal. A bitop
changes a word in a specific location. I think it just does
not sound right to trash an optimization because of that.
As an example, sparc(32) has a mix of bitops implemented in C
and bitops implemented in assembly. C bitops do not clobber
memory wholesale. Of asm bitops, only test_and_set_bit
clobbers memory (I have to ask Dave why).
Cheers,
-- Pete
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: gcc-3.2 breaks rmap on s390x
2003-04-03 18:10 gcc-3.2 breaks rmap on s390x Pete Zaitcev
@ 2003-04-03 18:44 ` Rik van Riel
2003-04-03 19:00 ` Davide Libenzi
2003-04-03 19:01 ` Hugh Dickins
2003-04-03 19:17 ` Anton Blanchard
2 siblings, 1 reply; 7+ messages in thread
From: Rik van Riel @ 2003-04-03 18:44 UTC (permalink / raw)
To: Pete Zaitcev; +Cc: linux-kernel, linux390
On Thu, 3 Apr 2003, Pete Zaitcev wrote:
> cpu_relax();
> + barrier();
Gah, now I look over the source I see that cpu_relax() is always
used together with barrier() ...
I guess the best long-term thing (2.5) would be to build a barrier
into cpu_relax(), but for 2.4-rmap I'll just add your patch.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: gcc-3.2 breaks rmap on s390x
2003-04-03 18:44 ` Rik van Riel
@ 2003-04-03 19:00 ` Davide Libenzi
0 siblings, 0 replies; 7+ messages in thread
From: Davide Libenzi @ 2003-04-03 19:00 UTC (permalink / raw)
To: Rik van Riel; +Cc: Pete Zaitcev, Linux Kernel Mailing List, linux390
On Thu, 3 Apr 2003, Rik van Riel wrote:
> On Thu, 3 Apr 2003, Pete Zaitcev wrote:
>
> > cpu_relax();
> > + barrier();
>
> Gah, now I look over the source I see that cpu_relax() is always
> used together with barrier() ...
>
> I guess the best long-term thing (2.5) would be to build a barrier
> into cpu_relax(), but for 2.4-rmap I'll just add your patch.
cpu_relax() is a barrier on Intel ( pause ). It's just a coincidence IMHO.
- Davide
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: gcc-3.2 breaks rmap on s390x
2003-04-03 18:10 gcc-3.2 breaks rmap on s390x Pete Zaitcev
2003-04-03 18:44 ` Rik van Riel
@ 2003-04-03 19:01 ` Hugh Dickins
2003-04-03 19:30 ` Pete Zaitcev
2003-04-03 19:17 ` Anton Blanchard
2 siblings, 1 reply; 7+ messages in thread
From: Hugh Dickins @ 2003-04-03 19:01 UTC (permalink / raw)
To: Pete Zaitcev; +Cc: riel, linux-kernel, linux390
On Thu, 3 Apr 2003, Pete Zaitcev wrote:
>
> the following patch seems to fix my rmap problems on s390x.
>
> --- linux-2.4.20-2.1.24.z1/include/linux/mm.h 2003-03-27 21:30:09.000000000 -0500
> +++ linux-2.4.20-2.1.24.z2/include/linux/mm.h 2003-04-02 20:26:11.000000000 -0500
> @@ -376,8 +376,10 @@
> */
> #ifdef CONFIG_SMP
> while (test_and_set_bit(PG_chainlock, &page->flags)) {
> - while (test_bit(PG_chainlock, &page->flags))
> + while (test_bit(PG_chainlock, &page->flags)) {
> cpu_relax();
> + barrier();
> + }
> }
> #endif
> }
Isn't it rather odd that it should fix the problem you describe?
because the barrier you're adding comes only in the exceptional path,
when the lock was found already held. I suppose the compiler is free
to make the barrier more general than you've asked for, but it seems
unsafe to rely on that.
Hugh
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: gcc-3.2 breaks rmap on s390x
2003-04-03 18:10 gcc-3.2 breaks rmap on s390x Pete Zaitcev
2003-04-03 18:44 ` Rik van Riel
2003-04-03 19:01 ` Hugh Dickins
@ 2003-04-03 19:17 ` Anton Blanchard
2 siblings, 0 replies; 7+ messages in thread
From: Anton Blanchard @ 2003-04-03 19:17 UTC (permalink / raw)
To: Pete Zaitcev; +Cc: riel, linux-kernel, linux390
> the following patch seems to fix my rmap problems on s390x.
>
> --- linux-2.4.20-2.1.24.z1/include/linux/mm.h 2003-03-27 21:30:09.000000000 -0500
> +++ linux-2.4.20-2.1.24.z2/include/linux/mm.h 2003-04-02 20:26:11.000000000 -0500
> @@ -376,8 +376,10 @@
> */
> #ifdef CONFIG_SMP
> while (test_and_set_bit(PG_chainlock, &page->flags)) {
> - while (test_bit(PG_chainlock, &page->flags))
> + while (test_bit(PG_chainlock, &page->flags)) {
> cpu_relax();
> + barrier();
> + }
> }
> #endif
> }
cpu_relax() should be a gcc barrier. Its this way on all architectures in
2.5.
Anton
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: gcc-3.2 breaks rmap on s390x
2003-04-03 19:01 ` Hugh Dickins
@ 2003-04-03 19:30 ` Pete Zaitcev
0 siblings, 0 replies; 7+ messages in thread
From: Pete Zaitcev @ 2003-04-03 19:30 UTC (permalink / raw)
To: Hugh Dickins; +Cc: Pete Zaitcev, riel, linux-kernel, linux390
> Date: Thu, 3 Apr 2003 20:01:10 +0100 (BST)
> From: Hugh Dickins <hugh@veritas.com>
> > while (test_and_set_bit(PG_chainlock, &page->flags)) {
> > - while (test_bit(PG_chainlock, &page->flags))
> > + while (test_bit(PG_chainlock, &page->flags)) {
> > cpu_relax();
> > + barrier();
> > + }
> > }
> Isn't it rather odd that it should fix the problem you describe?
> because the barrier you're adding comes only in the exceptional path,
> when the lock was found already held.
The actual code is not drastically rearranged and fragments
are not replicated. Thus, the code below the while() loop
cannot possibly know if the lock was contested or not
(if the loop was executed at least once or not),
and it has to load the value from memory.
But suppose your worst fears came true and complier decided
to push the beyond the limits of reasonable into the shady
realm of legal. The worst the compiler can do is:
register word r10;
r10 = page->pte.direct;
if (r10!=0) something();
if (test_and_set_bit(PG_chainlock, &page->flags) == 0) {
/* No barrier here, mwuahahaha!!! */
} else {
do {
while (test_bit(PG_chainlock, &page->flags)) {
// This loop can be inverted too. No change to the reasoning.
cpu_relax();
barrier();
}
} while (test_and_set_bit(PG_chainlock, &page->flags));
r10 = page->pte.direct;
}
foo_using(r10);
Which continues to work, because the value loaded before uncontested
was taken is still valid later. The problem only happens if the
lock was contested.
BTW, Bill Irwin noted that, strictly speaking, the first if statement
with page_mapped should be done under the lock for safety. I think
he got a point. A 2.5 preempt in the wrong moment (between the
first load and test_and_set_bit in the above example) makes
the code invalid after all. Not that it mattered as long as
compiler does not do such exercises as I outlined above.
-- Pete
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: gcc-3.2 breaks rmap on s390x
@ 2003-04-03 19:42 Ulrich Weigand
0 siblings, 0 replies; 7+ messages in thread
From: Ulrich Weigand @ 2003-04-03 19:42 UTC (permalink / raw)
To: hugh, zaitcev, riel; +Cc: linux-kernel, Martin Schwidefsky
Hugh Dickins wrote:
>Isn't it rather odd that it should fix the problem you describe?
>because the barrier you're adding comes only in the exceptional path,
>when the lock was found already held. I suppose the compiler is free
>to make the barrier more general than you've asked for, but it seems
>unsafe to rely on that.
No, it's the other way around: the compiler is free to ignore the
barrier only if can prove the path containing it is not taken
(which it cannot in this case as the outer while depends on a volatile
test_and_set_bit operation).
However, this code still contains a (theoretical) problem:
> while (test_and_set_bit(PG_chainlock, &page->flags)) {
> - while (test_bit(PG_chainlock, &page->flags))
> + while (test_bit(PG_chainlock, &page->flags)) {
> cpu_relax();
> + barrier();
> + }
nothing in the inner loop contains a (hardware) memory barrier
('serialization' in s390-speak), so nothing forces the processor
to ever refresh its cached value of page->flags from memory.
On s390, the architecture definition would allow a processor
to loop endlessly (that is, until the next interrupt, which
serves as serialization operation). However, none of the currently
existing processors do that, so the problem is only theoretical ...
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
--
Dr. Ulrich Weigand
Linux for S/390 Design & Development
IBM Deutschland Entwicklung GmbH, Schoenaicher Str. 220, 71032 Boeblingen
Phone: +49-7031/16-3727 --- Email: Ulrich.Weigand@de.ibm.com
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2003-04-03 19:37 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2003-04-03 18:10 gcc-3.2 breaks rmap on s390x Pete Zaitcev
2003-04-03 18:44 ` Rik van Riel
2003-04-03 19:00 ` Davide Libenzi
2003-04-03 19:01 ` Hugh Dickins
2003-04-03 19:30 ` Pete Zaitcev
2003-04-03 19:17 ` Anton Blanchard
-- strict thread matches above, loose matches on Subject: below --
2003-04-03 19:42 Ulrich Weigand
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox