* Re: [parisc-linux] Does it lakes some cloberred r1 in
2006-04-20 17:09 [parisc-linux] Does it lakes some cloberred r1 in __put_kernel_asm() 64bit? Carlos O'Donell
@ 2006-04-20 17:28 ` John David Anglin
2006-04-20 17:36 ` Michael S. Zick
2006-04-20 20:04 ` Carlos O'Donell
0 siblings, 2 replies; 28+ messages in thread
From: John David Anglin @ 2006-04-20 17:28 UTC (permalink / raw)
To: Carlos O'Donell; +Cc: soete.joel, parisc-linux
> > (here) insn, an interruption occures which in turn launch the
> > fixup_put_user_skip_1()
>
> A couple more questions for research :)
>
> Q: Does the process of interruption clobber registers?
r8 and r9?
Dave
--
J. David Anglin dave.anglin@nrc-cnrc.gc.ca
National Research Council of Canada (613) 990-0752 (FAX: 952-6602)
_______________________________________________
parisc-linux mailing list
parisc-linux@lists.parisc-linux.org
http://lists.parisc-linux.org/mailman/listinfo/parisc-linux
* Re: [parisc-linux] Does it lakes some cloberred r1 in
2006-04-20 17:28 ` [parisc-linux] Does it lakes some cloberred r1 in John David Anglin
@ 2006-04-20 17:36 ` Michael S. Zick
2006-04-20 19:32 ` John David Anglin
2006-04-20 20:04 ` Carlos O'Donell
1 sibling, 1 reply; 28+ messages in thread
From: Michael S. Zick @ 2006-04-20 17:36 UTC (permalink / raw)
To: parisc-linux
On Thu April 20 2006 12:28, John David Anglin wrote:
> > > (here) insn, an interruption occures which in turn launch the
> > > fixup_put_user_skip_1()
> >
> > A couple more questions for research :)
> >
> > Q: Does the process of interruption clobber registers?
>
Joel,
The machine should switch a small sub-set of the general
registers to an alternate set, called the 'shadow registers'
for the interruption routine to use.
I do not recall the exact list of which register numbers
those are. Search for 'shadow registers' in the acd.pdf
I do not know if r1 is shadowed or not.
If the interrupt routine needs to use other registers,
then it must save/restore them in that routine.
Mike
> r8 and r9?
>
> Dave
* Re: [parisc-linux] Does it lakes some cloberred r1 in
2006-04-20 17:36 ` Michael S. Zick
@ 2006-04-20 19:32 ` John David Anglin
2006-04-20 20:21 ` Michael S. Zick
0 siblings, 1 reply; 28+ messages in thread
From: John David Anglin @ 2006-04-20 19:32 UTC (permalink / raw)
To: Michael S. Zick; +Cc: parisc-linux
> > > > (here) insn, an interruption occures which in turn launch the
> > > > fixup_put_user_skip_1()
> > >
> > > A couple more questions for research :)
> > >
> > > Q: Does the process of interruption clobber registers?
> >
> Joel,
>
> The machine should switch a small sub-set of the general
> registers to an alternate set, called the 'shadow registers'
> for the interruption routine to use.
fixup_put_user_skip_1() runs after the interruption, so the
shadow registers don't matter. Look at the registers used
by fixup_put_user_skip_1() and where it returns. Note that
GCC is only concerned about registers used in the current
function (i.e., if an asm changes sections and generates code
using registers in that section, this is all black magic to
GCC and it doesn't need to know about it).
It's clear that the __get* and __put* macros need brief comments
about the register use of the fixup routines. Also, the fixup
routines need corresponding comments.
> I do not know if r1 is shadowed or not.
Yes.
Dave
--
J. David Anglin dave.anglin@nrc-cnrc.gc.ca
National Research Council of Canada (613) 990-0752 (FAX: 952-6602)
* Re: [parisc-linux] Does it lakes some cloberred r1 in
2006-04-20 17:28 ` [parisc-linux] Does it lakes some cloberred r1 in John David Anglin
2006-04-20 17:36 ` Michael S. Zick
@ 2006-04-20 20:04 ` Carlos O'Donell
2006-04-20 21:29 ` John David Anglin
2006-04-21 18:52 ` Michael S. Zick
1 sibling, 2 replies; 28+ messages in thread
From: Carlos O'Donell @ 2006-04-20 20:04 UTC (permalink / raw)
To: John David Anglin, Parisc List, randolph
parisc,
Cat's out of the bag, even Dave seems interested. Randolph, tell me
when I'm wrong. I did a review of this code for Randolph when he
implemented the first iteration.
> > Q: Does the process of interruption clobber registers?
> r8 and r9?
Normally no, these are shadowed registers. However see below.
Routines used reside in:
~~~~~~~~~~~~~~~~~~
kernel/entry.S
kernel/traps.c
mm/fault.c
Exception Stages:
~~~~~~~~~~~~~~~
Exception happens.
Trap handler executes.
We don't handle the exception in assembly?
Shadow registers are not enough to execute C code.
All registers saved.
Load or Store has an exception table entry "fixup" in the kernel.
iaoq[0] and iaoq[1] are set to the address of the fixup.
gr[0] has the B-bit zeroed.
When the interrupt returns you execute the fixup function.
A: The process of interruption does not clobber any registers.
Corollary: The process of fixup uses r8/r9 as outputs, with r1 clobbered.
A: The fixups run in the original context of the failed load/store.
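
The stages above can be sketched in portable C. This is a toy model under
stated assumptions, not the kernel's code: struct ex_entry, struct fake_regs,
find_fixup(), handle_fault() and the FAKE_PSW_B value are hypothetical names
invented for illustration. It only shows the idea that the trap handler looks
up the faulting instruction address in a table and, when an entry exists,
rewrites the instruction address queue so that the return from interruption
lands in the fixup.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy model: one exception table entry maps a faulting insn to its fixup. */
struct ex_entry {
    uint32_t insn;   /* address of the load/store that may fault */
    uint32_t fixup;  /* address to resume at if it does */
};

/* Toy model of the saved interruption state (only the fields used here). */
struct fake_regs {
    uint32_t iaoq[2];  /* front and back of the instruction address queue */
    uint32_t gr0;      /* slot holding the saved PSW in this model */
};

/* Placeholder bit for the B-bit; NOT the real PSW layout. */
#define FAKE_PSW_B 0x00000004u

/* Binary search a table sorted by insn address; NULL if no entry. */
const struct ex_entry *find_fixup(const struct ex_entry *tbl,
                                  size_t n, uint32_t addr)
{
    size_t lo = 0, hi = n;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (tbl[mid].insn == addr)
            return &tbl[mid];
        if (tbl[mid].insn < addr)
            lo = mid + 1;
        else
            hi = mid;
    }
    return NULL;
}

/* Returns 1 if the fault was redirected to a fixup, 0 otherwise. */
int handle_fault(struct fake_regs *regs,
                 const struct ex_entry *tbl, size_t n)
{
    const struct ex_entry *e = find_fixup(tbl, n, regs->iaoq[0]);

    if (!e)
        return 0;
    regs->iaoq[0] = e->fixup;      /* resume in the fixup ... */
    regs->iaoq[1] = e->fixup + 4;  /* ... back of queue follows (model assumption) */
    regs->gr0 &= ~FAKE_PSW_B;      /* clear the B-bit */
    return 1;
}
```

Nothing in this model touches general registers other than the saved PSW
slot, which matches the answer above: the interruption itself clobbers
nothing, and any register damage comes from the fixup code that runs
afterwards.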
Generic Fixup Functions
~~~~~~~~~~~~~~~~~~~
There are 4 generic fixup functions:
The first 2 correspond to failed loads.
The second 2 correspond to failed stores.
There is also a FIXUP_BRANCH in the emulation routines.
arch/parisc/lib/fixup.S
fixup_get_user_skip_1 (Skip 1 word, used in 32-bit mode)
fixup_get_user_skip_2 (Skip 2 words, used in 64-bit mode)
fixup_put_user_skip_1 (Skip 1 word, used in 32-bit mode)
fixup_put_user_skip_2 (Skip 2 words, used in 64-bit mode)
The macro "get_fault_ip" is used by each of the 4 generic
fixups and clobbers both inputs and r1.
The registers used for each of the following functions:
r8 - Stores return value / Used as temp in get_fault_ip
r9 - Returned value (We don't want to clobber ret0)
r1 - Used as temp to load fault ip + 4/8, Used as temp in get_fault_ip.
All of the put/get asm routines should list "r1" as a clobber.
All of the put/get asm routines already list r8 and r9 as outputs.
Joel is correct in saying that there is a __put_kernel_asm for 64-bit
mode which is missing an r1 clobber. However, you need to know why
you are correct and understand the repercussions of the change.
All of the routines using FIXUP_BRANCH must specify "r1" clobbered.
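
Why the missing clobber matters can be shown with a toy register file in
portable C. This is only a model of the register use described above (r1 as
a temporary for fault ip + 4/8, r8 as the return value); struct toy_cpu,
toy_fixup_put() and toy_value_survives() are hypothetical names, and the
fault address is made up.

```c
#include <errno.h>

enum { NGR = 32 };

/* Toy register file; index i stands for general register r<i>. */
struct toy_cpu {
    long gr[NGR];
};

/*
 * Model of what a put-style fixup does to registers according to the
 * description above: r1 holds the resume address (fault ip plus 4 or 8
 * bytes) and r8 receives the error return value.
 * Returns the address execution resumes at.
 */
long toy_fixup_put(struct toy_cpu *cpu, long fault_ip, int skip_bytes)
{
    cpu->gr[1] = fault_ip + skip_bytes;
    cpu->gr[8] = -EFAULT;
    return cpu->gr[1];
}

/*
 * Returns 1 if a value the compiler parked in register `reg` before the
 * faulting store is still intact after the fixup has run, 0 if not.
 */
int toy_value_survives(int reg)
{
    struct toy_cpu cpu = { {0} };
    const long live = 0x1234;

    cpu.gr[reg] = live;
    toy_fixup_put(&cpu, 0x1000, 4);
    return cpu.gr[reg] == live;
}
```

In this model toy_value_survives(1) is 0 while toy_value_survives(3) is 1.
GCC already expects r8 to change because it is listed as an output; without
an "r1" clobber it has no reason to expect r1 to change.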
Caveats:
- The fixup routines read the exception tables but don't list that
memory as an input. Hence mixing put and get user/kernel
calls and modifications to the exception tables is dangerous?
Summary:
- Add the missing "r1" clobber to the 64-bit put kernel macro.
Run a boot test. Post a patch including the result of the boot test.
- Add the missing "r1" clobbers to any asm using FIXUP_BRANCH.
Run a boot test. Post a patch including the result of the boot test.
Thanks!
Cheers,
Carlos.
* Re: [parisc-linux] Does it lakes some cloberred r1 in
2006-04-20 19:32 ` John David Anglin
@ 2006-04-20 20:21 ` Michael S. Zick
0 siblings, 0 replies; 28+ messages in thread
From: Michael S. Zick @ 2006-04-20 20:21 UTC (permalink / raw)
To: parisc-linux
On Thu April 20 2006 14:32, John David Anglin wrote:
> > > > > (here) insn, an interruption occures which in turn launch the
> > > > > fixup_put_user_skip_1()
> > > >
> > > > A couple more questions for research :)
> > > >
> > > > Q: Does the process of interruption clobber registers?
> > >
> > Joel,
> >
> > The machine should switch a small sub-set of the general
> > registers to an alternate set, called the 'shadow registers'
> > for the interruption routine to use.
>
> fixup_put_user_skip_1() runs after the interruption, so the
> shadow registers don't matter.
>
I was unclear, I was answering the second question, my bad.
> Look at the registers used
> by fixup_put_user_skip_1() and where it returns.
>
Ah, the first question...
> Note that
> GCC is only concerned about registers used in the current
> function (i.e., if an asm changes sections and generates code
> using registers in that section, this is all black magic to
> GCC and it doesn't need to know about it).
>
Giving that a little thought, it should be obvious...
GCC only compiles a single flow of execution at a time,
the ::: information fields only apply to what the compiler
is doing at the moment.
If the author of the code writes instructions inside of the
current black box (__asm block) that are on a different path
of execution, then it is up to the author to ensure that
the _other_ path (section in this case) observes proper
register usage.
It is the _other_ path of execution that needs its register
usage audited.
Mike
> It's clear that the __get* and __put* macros need brief comments
> about the register use of the fixup routines. Also, the fixup
> routines need corresponding comments.
>
> > I do not know if r1 is shadowed or not.
>
> Yes.
>
> Dave
* Re: [parisc-linux] Does it lakes some cloberred r1 in
2006-04-20 20:04 ` Carlos O'Donell
@ 2006-04-20 21:29 ` John David Anglin
2006-04-21 18:52 ` Michael S. Zick
1 sibling, 0 replies; 28+ messages in thread
From: John David Anglin @ 2006-04-20 21:29 UTC (permalink / raw)
To: Carlos O'Donell; +Cc: parisc-linux
> - Add the missing "r1" clobbers to any asm using FIXUP_BRANCH.
> Run a boot test. Post a patch including the result of the boot test.
With luck, this will fix the handling of unaligned fldd instructions
(failure of libjava negzero).
Dave
--
J. David Anglin dave.anglin@nrc-cnrc.gc.ca
National Research Council of Canada (613) 990-0752 (FAX: 952-6602)
* Re: [parisc-linux] Does it lakes some cloberred r1 in
2006-04-20 20:04 ` Carlos O'Donell
2006-04-20 21:29 ` John David Anglin
@ 2006-04-21 18:52 ` Michael S. Zick
1 sibling, 0 replies; 28+ messages in thread
From: Michael S. Zick @ 2006-04-21 18:52 UTC (permalink / raw)
To: parisc-linux
On Thu April 20 2006 15:04, Carlos O'Donell wrote:
> parisc,
>
Some 'deep background' since the thread Joel and I are discussing
is nearly a month old now.
http://lists.parisc-linux.org/pipermail/parisc-linux/2006-March/028603.html
> Cat's out of the bag, even Dave seems interested. Randolph, tell me
> when I'm wrong. I did a review of this code for Randolph when he
> implemented the first iteration.
>
> > > Q: Does the process of interruption clobber registers?
> > r8 and r9?
>
> Normally no, these are shadowed registers. However see below.
>
The relevant part of the dump:
IASQ: 0000000000000000 0000000000000000 IAOQ: 000000001010e728 000000001010e718
IIR: 0f48109c ISR: 0000000000000080 IOR: 0000000000000002
CPU: 0 CR30: 000000008b1bc000 CR31: 000000001053c000
ORIG_R28: 000000001013f77c
IAOQ[0]: _read_lock+0x18/0x30
IAOQ[1]: _read_lock+0x8/0x30
RP(r2): send_group_sig_info+0x3c/0xb0
Note the values of IAOQ...
I told Joel that the only way I knew of for them to look like that
was a branch in the delay slot of a branch.
That is a neat way to execute one instruction out of linear, sequential
order, but I did not believe the compiler used that trick.
I.E:
It was either hand coded that way or a sign of something wrong.
[I might have told Joel wrong - if so, sorry Joel]
I did not think of the possibility of the queue being diddled with, using
the trashed value of a register.
I mention this at this time, so eyes can be watching for this 'back to
the future' execution order after this set of fixes goes in.
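
A minimal sketch of the check being described, in portable C. It assumes
the low two bits of each queue entry (the privilege level) are zero, as they
are in the dump above; iaoq_is_sequential() and iaoq_delta() are hypothetical
helper names.

```c
#include <stdint.h>

/*
 * Returns 1 when the back of the queue is the next sequential
 * instruction (front + 4), 0 when the queue is non-sequential.
 */
int iaoq_is_sequential(uint64_t front, uint64_t back)
{
    return back == front + 4;
}

/* Signed distance in bytes from the front entry to the back entry. */
int64_t iaoq_delta(uint64_t front, uint64_t back)
{
    return (int64_t)(back - front);
}
```

Fed the values from the dump (front 0x1010e728, back 0x1010e718), the queue
is not sequential and the back entry sits 16 bytes behind the front, which
is the _read_lock+0x18 / _read_lock+0x8 pair shown above.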
Mike
> Routines used reside in:
> ~~~~~~~~~~~~~~~~~~
> kernel/entry.S
> kernel/traps.c
> mm/fault.c
>
> Exception Stages:
> ~~~~~~~~~~~~~~~
> Exception happens.
> Trap handler executes.
> We don't handle the exception in assembly?
> Shadow registers are not enough to execute C code.
> All registers saved.
> Load or Store has an exception table entry "fixup" in the kernel.
> iaoq[0] and iaoq[1] are set to the address of the fixup.
> gr[0] has the B-bit zeroed.
> When the interrupt returns you execute the fixup function.
>
> A: The process of interruption does not clobber any registers.
> Corollary: The process of fixup uses r8/r9 as outputs, with r1 clobbered.
>
> A: The fixups run in the original context of the failed load/store.
>
> Generic Fixup Functions
> ~~~~~~~~~~~~~~~~~~~
> There are 4 generic fixup functions:
> The first 2 correspond to failed loads.
> The second 2 correspond to failed stores.
> There is also a FIXUP_BRANCH in the emulation routines.
>
> arch/parisc/lib/fixup.S
> fixup_get_user_skip1 (Skip 1 word, used in 32-bit mode)
> fixup_get_user_skip2 (Skip 2 words, used in 64-bit mode)
> fixup_put_user_skip1 ( " )
> fixup_put_user_skip2 ( " )
>
> The macro "get_fault_ip" is used by each of the 4 generic
> fixups and clobbers both inputs and r1.
>
> The registers used for each of the following functions:
> r8 - Stores return value / Used as temp in get_fault_ip
> r9 - Returned value (We don't want to clobber ret0)
> r1 - Used as temp to load fault ip + 4/8, Used as temp in get_fault_ip.
>
> All of the put get asm routines should list "r1" as a clobber.
> All of the put get asm routines already list r8 and r9 as outputs.
>
> Joel is correct in saying that there is a __put_kernel_asm for 64-bit
> mode which is missing an r1 clobber. However, you need to know why
> you are correct and understand the repercussions of the change.
>
> All of the routines using FIXUP_BRANCH must specify "r1" clobbered.
>
> Caveats:
> - The fixup routines read the exception tables but don't list that
> memory as an input. Hence mixing put and get user/kernel
> calls and modifications to the exception tables is dangerous?
>
> Summary:
> - Add the missing "r1" clobber to the 64-bit put kernel macro.
> Run a boot test. Post a patch including the result of the boot test.
>
> - Add the missing "r1" clobbers to any asm using FIXUP_BRANCH.
> Run a boot test. Post a patch including the result of the boot test.
>
> Thanks!
>
> Cheers,
> Carlos.
* Re: [parisc-linux] Does it lakes some cloberred r1 in
[not found] <200604212013.k3LKDAbx003500@hiauly1.hia.nrc.ca>
@ 2006-04-21 20:30 ` John David Anglin
0 siblings, 0 replies; 28+ messages in thread
From: John David Anglin @ 2006-04-21 20:30 UTC (permalink / raw)
To: John David Anglin; +Cc: parisc-linux
> 101112c4: e8 1f 1f df b,l,n 101112b8 <_read_lock+0x8>,r0
> 101112c8: 0f 48 10 9c ldw 4(r26),ret0 <== exception here
>
> r26 contains 0x105d62a8 according to Joel's message. There isn't
> a branch in the delay slot. The exception just occurs in the delay
> slot of a branch. It's not obvious to me what caused the exception.
Correction, r26 contained 0x000000001051d040. I also just noticed
that the branch is nullified (N=1 in PSW). Just guessing, but I think
this is a timer interruption (soft lockup).
Dave
--
J. David Anglin dave.anglin@nrc-cnrc.gc.ca
National Research Council of Canada (613) 990-0752 (FAX: 952-6602)
* Re: [parisc-linux] Does it lakes some cloberred r1 in
[not found] <20060422154641.GC10514@quicksilver.road.mcmartin.ca>
@ 2006-04-22 16:48 ` John David Anglin
2006-04-23 16:18 ` Michael S. Zick
0 siblings, 1 reply; 28+ messages in thread
From: John David Anglin @ 2006-04-22 16:48 UTC (permalink / raw)
To: Kyle McMartin; +Cc: parisc-linux
> On Fri, Apr 21, 2006 at 04:30:08PM -0400, John David Anglin wrote:
> > > 101112c4: e8 1f 1f df b,l,n 101112b8 <_read_lock+0x8>,r0
> > > 101112c8: 0f 48 10 9c ldw 4(r26),ret0 <== exception here
> > >
> > > r26 contains 0x105d62a8 according to Joel's message. There isn't
> > > a branch in the delay slot. The exception just occurs in the delay
> > > slot of a branch. It's not obvious to me what caused the exception.
> >
> > Correction, r26 contained 0x000000001051d040. I also just noticed
> > that the branch is nullified (N=1 in PSW). Just guessing, but I think
> > this is a timer interruption (soft lockup).
> >
>
> This crashed palinux last night... Same IIR/IASQ/IAOQ...
Actually, it's probably just a nice place to crash for the night ;)
Any info on the interruption and what was going on? In Joel's case,
the call was from here:
int
send_group_sig_info(int sig, struct siginfo *info, struct task_struct *p)
{
int ret;
read_lock(&tasklist_lock);
ret = group_send_sig_info(sig, info, p);
read_unlock(&tasklist_lock);
return ret;
}
tasklist_lock is used in a lot of places. send_group_sig_info only
seems to be called in one place from kernel/itimer.c. It would seem
like there's some situation where the lock isn't getting unlocked,
or it's very highly contended.
Dave
--
J. David Anglin dave.anglin@nrc-cnrc.gc.ca
National Research Council of Canada (613) 990-0752 (FAX: 952-6602)
* Re: [parisc-linux] Does it lakes some cloberred r1 in
2006-04-22 16:48 ` [parisc-linux] Does it lakes some cloberred r1 in John David Anglin
@ 2006-04-23 16:18 ` Michael S. Zick
2006-04-23 17:06 ` Michael S. Zick
0 siblings, 1 reply; 28+ messages in thread
From: Michael S. Zick @ 2006-04-23 16:18 UTC (permalink / raw)
To: parisc-linux
On Sat April 22 2006 11:48, John David Anglin wrote:
> > On Fri, Apr 21, 2006 at 04:30:08PM -0400, John David Anglin wrote:
> > > > 101112c4: e8 1f 1f df b,l,n 101112b8 <_read_lock+0x8>,r0
> > > > 101112c8: 0f 48 10 9c ldw 4(r26),ret0 <== exception here
> > > >
> > > > r26 contains 0x105d62a8 according to Joel's message. There isn't
> > > > a branch in the delay slot. The exception just occurs in the delay
> > > > slot of a branch. It's not obvious to me what caused the exception.
> > >
> > > Correction, r26 contained 0x000000001051d040. I also just noticed
> > > that the branch is nullified (N=1 in PSW). Just guessing, but I think
> > > this is a timer interruption (soft lockup).
> > >
> >
> > This crashed palinux last night... Same IIR/IASQ/IAOQ...
>
> Actually, it's probably just a nice place to crash for the night ;)
>
> Any info on the interruption and what was going on? In Joel's case,
> the call was from here:
>
> int
> send_group_sig_info(int sig, struct siginfo *info, struct task_struct *p)
> {
> int ret;
> read_lock(&tasklist_lock);
> ret = group_send_sig_info(sig, info, p);
> read_unlock(&tasklist_lock);
> return ret;
> }
>
> tasklist_lock is used in a lot of places. send_group_sig_info only
> seems to be called in one place from kernel/itimer.c. It would seem
> like there's some situation where the lock isn't getting unlocked,
> or it's very highly contended.
>
Or the routine is not coded to make the lock instruction observable
by all processors:
> This the read lock I see in vmlinux-2.6.15-rc2-pa1:
>
> 00000000101112b0 <_read_lock>:
>
> 101112b0: 0f 40 15 dc ldcw,co 0(r26),ret0
<format 5 instruction: 000011 11010 00000 00 0 1 01 0111 0 11100>
a==0,s==0,b==26,t==28
cc==01 Coherent Operation, not a hint but required[1];
Processor may operate on line in cache rather than update memory[2];
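
The field split can be checked mechanically. The sketch below is portable C
that pulls the fields out of the 32-bit word with shifts counted from the
least significant bit. Only a, s, b, t and cc are names taken from the
breakdown above; the other field labels (op, im5, fmt, ext4, m) and the
names struct ldcw_fields and decode_ldcw() are assumptions made for this
example.

```c
#include <stdint.h>

struct ldcw_fields {
    unsigned op;    /* top 6 bits: major opcode */
    unsigned b;     /* base register */
    unsigned im5;   /* 5-bit field following b */
    unsigned s;     /* space register selector */
    unsigned a;
    unsigned fmt;   /* single bit following a */
    unsigned cc;    /* cache control */
    unsigned ext4;  /* 4-bit opcode extension */
    unsigned m;
    unsigned t;     /* target register */
};

/* Split an instruction word using the 6/5/5/2/1/1/2/4/1/5 layout above. */
struct ldcw_fields decode_ldcw(uint32_t w)
{
    struct ldcw_fields f;

    f.op   = (w >> 26) & 0x3f;
    f.b    = (w >> 21) & 0x1f;
    f.im5  = (w >> 16) & 0x1f;
    f.s    = (w >> 14) & 0x3;
    f.a    = (w >> 13) & 0x1;
    f.fmt  = (w >> 12) & 0x1;
    f.cc   = (w >> 10) & 0x3;
    f.ext4 = (w >> 6) & 0xf;
    f.m    = (w >> 5) & 0x1;
    f.t    = w & 0x1f;
    return f;
}
```

For the word 0x0f4015dc from the dump this gives b == 26, s == 0, a == 0,
cc == 1 and t == 28, the same values as listed above.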
<quote [3]>
If a cache control hint is specified, the semaphore operation may be handled
as if a cache control hint had not been specified, or, preferably, the addressed
word is zero extended and copied into GR t and then the addressed word is set to
zero in the cache. The cleared word need not be flushed to memory.
</quote>;
<quote [4]>
if (cache line is present and dirty || coherent_system || cc != 0) {
GR[t] <- mem_load(space,offset,0,63,NO_HINT);
mem_store(space,offset,0,63,NO_HINT,0);
} else {
Dcache_flush(space,offset);
GR[t] <- mem_load(space,offset,0,63,NO_HINT);
store_in_memory(space,offset,0,63,NO_HINT,0);
}
</quote>;
Note that the first block is selected by ((cc == 01) != 00)
The first block describes the operation required for semaphores used
by multiple execution paths on a single processor.
The second block (after the else) describes the operation required
for execution paths on multiple processors.
[1]: http://h21007.www2.hp.com/dspp/tech/tech_TechDocumentDetailPage_IDX/1,1701,5310,00.html
[2]: http://h21007.www2.hp.com/dspp/files/unprotected/parisc20/PA_6_inst_overview.pdf; page 13 and table 6-9
[3]: http://h21007.www2.hp.com/dspp/files/unprotected/parisc20/PA_7_inst_descriptions.pdf; page 74 and 75
[4]: ibid. 'indivisible', page 75
Next question.
Mike
> Dave
* Re: [parisc-linux] Does it lakes some cloberred r1 in
2006-04-23 16:18 ` Michael S. Zick
@ 2006-04-23 17:06 ` Michael S. Zick
2006-04-24 15:35 ` John David Anglin
0 siblings, 1 reply; 28+ messages in thread
From: Michael S. Zick @ 2006-04-23 17:06 UTC (permalink / raw)
To: parisc-linux
On Sun April 23 2006 11:18, Michael S. Zick wrote:
>
> Next question.
>
My bad, that was just plain rude.
The question and the answer:
Multiple processor pa-risc systems with per-processor
caches use a 'cache coherency' trigger.
To trip the trigger (I.E: make the changes observable)
ldcw,co target_address
Where target_address includes the magic byte[0] of
the cache line.
Translation:
Spin on the ldcw,co not the ldw here.
On the systems with 128 byte long cache lines,
ensure these spinlocks are 128 byte aligned not
64 byte aligned as in this dump.
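
A minimal sketch of "spin on the load-and-clear", written with C11 atomics
so it runs anywhere. It does not use ldcw or the ,co completer and says
nothing about cache behaviour; atomic_exchange() merely stands in for the
"load the word, leave zero behind" semantics, with 1 meaning unlocked and
0 meaning locked. All toy_* names are hypothetical.

```c
#include <stdatomic.h>

/* Convention modelled here: 1 means free, 0 means held. */
typedef struct {
    atomic_uint word;
} toy_lock;

void toy_lock_init(toy_lock *l)
{
    atomic_init(&l->word, 1u);
}

/* Stand-in for ldcw: atomically fetch the word and leave zero behind. */
unsigned toy_ldcw(toy_lock *l)
{
    return atomic_exchange_explicit(&l->word, 0u, memory_order_acquire);
}

/* One attempt: a non-zero result from the load-and-clear means we got it. */
int toy_trylock(toy_lock *l)
{
    return toy_ldcw(l) != 0;
}

/* Spin directly on the load-and-clear rather than on a plain load. */
void toy_lock_acquire(toy_lock *l)
{
    while (!toy_trylock(l))
        ;  /* busy-wait */
}

/* Release by writing the "free" value back. */
void toy_lock_release(toy_lock *l)
{
    atomic_store_explicit(&l->word, 1u, memory_order_release);
}
```

A second toy_trylock() on a held lock returns 0 and leaves the word at 0,
so a failed attempt never disturbs the holder.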
Mike
* Re: [parisc-linux] Does it lakes some cloberred r1 in
2006-04-23 17:06 ` Michael S. Zick
@ 2006-04-24 15:35 ` John David Anglin
2006-04-24 16:25 ` Grant Grundler
` (2 more replies)
0 siblings, 3 replies; 28+ messages in thread
From: John David Anglin @ 2006-04-24 15:35 UTC (permalink / raw)
To: Michael S. Zick; +Cc: parisc-linux
> ldcw,co target_address
>
> Where target_address includes the magic byte[0] of
> the cache line.
Where is this documented?
> Translation:
>
> Spin on the ldcw,co not the ldw here.
I believe this makes sense as the errata specifies that the ldcw,co
operation has to be performed in cache on PA 2.0 machines:
http://h21007.www2.hp.com/dspp/tech/tech_TechDocumentDetailPage_IDX/1,1701,5310,00.html
> On the systems with 128 byte long cache lines,
> ensure these spinlocks are 128 byte aligned not
> 64 byte aligned as in this dump.
As a practical note, this is very difficult to achieve for dynamically
allocated spinlocks.
The intent of the errata seems to be to relax the alignment requirement
for ldc[dw],co in cases where the spinlock is not being shared with a
non-coherent I/O device. If spinlocks have to be aligned to the start
of a cacheline, there doesn't seem to be any point to the errata.
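
To make the cost of line alignment concrete, here is a user-space C11
sketch, assuming a 128-byte line as in the quoted message. It is not how the
kernel allocates spinlocks; struct toy_aligned_lock, toy_lock_alloc() and
toy_is_line_aligned() are hypothetical names.

```c
#include <stdint.h>
#include <stdlib.h>

#define TOY_LINE 128  /* cache line size assumed from the quoted message */

/* A lock word forced to start a 128-byte line; the struct pads to 128. */
struct toy_aligned_lock {
    _Alignas(TOY_LINE) volatile unsigned word;
};

/* Heap allocation that preserves the alignment; release with free(). */
struct toy_aligned_lock *toy_lock_alloc(void)
{
    struct toy_aligned_lock *l =
        aligned_alloc(TOY_LINE, sizeof(struct toy_aligned_lock));

    if (l)
        l->word = 1;  /* 1 means unlocked */
    return l;
}

/* Returns 1 if p sits on a TOY_LINE boundary. */
int toy_is_line_aligned(const void *p)
{
    return ((uintptr_t)p % TOY_LINE) == 0;
}
```

Each lock now occupies a full 128 bytes for a 4-byte word, and a lock
embedded in a larger dynamically allocated structure would force the same
alignment on that whole structure.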
Dave
--
J. David Anglin dave.anglin@nrc-cnrc.gc.ca
National Research Council of Canada (613) 990-0752 (FAX: 952-6602)
* Re: [parisc-linux] Does it lakes some cloberred r1 in
2006-04-24 15:35 ` John David Anglin
@ 2006-04-24 16:25 ` Grant Grundler
2006-04-24 16:50 ` John David Anglin
2006-04-24 16:35 ` Michael S. Zick
2006-04-25 15:17 ` Michael S. Zick
2 siblings, 1 reply; 28+ messages in thread
From: Grant Grundler @ 2006-04-24 16:25 UTC (permalink / raw)
To: John David Anglin; +Cc: parisc-linux
On Mon, Apr 24, 2006 at 11:35:48AM -0400, John David Anglin wrote:
> The intent of the errata seems to be to relax the alignment requirement
> for ldc[dw],co in cases where the spinlock is not being shared with a
> non-coherent I/O device.
AFAIK, only one non-coherent PA-RISC box has PA2.0 and parisc-linux
doesn't support it: T600.
(I'm assuming ",co" completer is PA2.0 only - please correct me if
that's wrong.)
grant
* Re: [parisc-linux] Does it lakes some cloberred r1 in
2006-04-24 15:35 ` John David Anglin
2006-04-24 16:25 ` Grant Grundler
@ 2006-04-24 16:35 ` Michael S. Zick
2006-04-24 18:00 ` Michael S. Zick
2006-04-24 18:46 ` John David Anglin
2006-04-25 15:17 ` Michael S. Zick
2 siblings, 2 replies; 28+ messages in thread
From: Michael S. Zick @ 2006-04-24 16:35 UTC (permalink / raw)
To: John David Anglin; +Cc: parisc-linux
On Mon April 24 2006 10:35, you wrote:
> > ldcw,co target_address
> >
> > Where target_address includes the magic byte[0] of
> > the cache line.
>
> Where is this documented?
>
Well, they didn't put it in the instruction RTL where
someone could find it. It is a footnote or mentioned
in passing somewhere.
I will look for it, I found it about 5 years ago when
Matt and I discussed this on the list, I can find it again.
But it is reasonable - you don't want to waste the cache
coherency bandwidth with every ldcw,co in the cache line.
> > Translation:
> >
> > Spin on the ldcw,co not the ldw here.
>
> I believe this makes sense as the errata specifies that the ldcw,co
> operation has to be performed in cache on PA 2.0 machines:
> http://h21007.www2.hp.com/dspp/tech/tech_TechDocumentDetailPage_IDX/1,1701,5310,00.html
>
Now there is a footnote that applies here...
Those instruction descriptions do not always mention side-effects,
even less often do they mention the exceptions to the side-effects.
An exception (footnoted somewhere):
ldcw,co does not set the dirty bit on the dcache line.
Which makes sense, if you recall that we are in the first clause
of that indivisible RTL block - the one that avoids the Dcache flush
and corresponding memory cycles.
If the instruction had the usual side-effect of setting the dirty
bit on the cache line, then we would not be avoiding the Dcache flush
and the memory cycle (sooner or later).
So spinning on the ldcw,co is evidently what the hardware people had
in mind. It will not generate a bunch of Dcache flushes.
Since there is no way to release the lock with ldcw,co, there must be
another magic completer that needs to be used on the instruction that
resets the word to '1' (unlocked) when the lock is released.
Something that triggers the cache coherency system so the change is
immediately observable by all cpus.
But this is also reasonable - since the lifetime of the release period
could well be longer than the lifetime of the cache line. Common memory
is the only place for long term storage of the released lock.
I have not found that reference yet either. It will be one of the cache
'hints' (actually a command in this case).
> > On the systems with 128 byte long cache lines,
> > ensure these spinlocks are 128 byte aligned not
> > 64 byte aligned as in this dump.
>
> As a practical note, this is very difficult to achieve for dynamically
> allocated spinlocks.
>
Yea, I understand, will have to refer that to the software engineering
department. This is the non-HP Hardware Engineering (retired) Department.
Just imagine that the non-HP Hardware Engineering (retired) Department is
temporarily out of hyper-link ink.
> The intent of the errata seems to be to relax the alignment requirement
> for ldc[dw],co in cases where the spinlock is not being shared with a
> non-coherent I/O device. If spinlocks have to be aligned to the start
> of a cacheline, there doesn't seem to be any point to the errata.
>
This really applies more to the second clause and/or when you have
multiple semaphores per cache line. Also, for I/O, there are normally
no cache coherency signals between the I/O devices and the cpu(s).
Which is why non-HP systems do cache snooping.
Mike
> Dave
* Re: [parisc-linux] Does it lakes some cloberred r1 in
2006-04-24 16:25 ` Grant Grundler
@ 2006-04-24 16:50 ` John David Anglin
2006-04-24 18:55 ` John David Anglin
` (2 more replies)
0 siblings, 3 replies; 28+ messages in thread
From: John David Anglin @ 2006-04-24 16:50 UTC (permalink / raw)
To: Grant Grundler; +Cc: parisc-linux
> On Mon, Apr 24, 2006 at 11:35:48AM -0400, John David Anglin wrote:
> > The intent of the errata seems to be to relax the alignment requirement
> > for ldc[dw],co in cases where the spinlock is not being shared with a
> > non-coherent I/O device.
>
> AFAIK, only one non-coherent PA-RISC box has PA2.0 and parisc-linux
> doesn't support it: T600.
It probably has coherent I & D caches as well as this seems to be a
requirement for both 1.1 and 2.0.
> (I'm assuming ",co" completer is PA2.0 only - please correct me if
> that's wrong.)
It's not in the PA 1.1 arch...
Dave
--
J. David Anglin dave.anglin@nrc-cnrc.gc.ca
National Research Council of Canada (613) 990-0752 (FAX: 952-6602)
* Re: [parisc-linux] Does it lakes some cloberred r1 in
2006-04-24 16:35 ` Michael S. Zick
@ 2006-04-24 18:00 ` Michael S. Zick
2006-04-24 19:15 ` John David Anglin
2006-04-24 21:57 ` Michael S. Zick
2006-04-24 18:46 ` John David Anglin
1 sibling, 2 replies; 28+ messages in thread
From: Michael S. Zick @ 2006-04-24 18:00 UTC (permalink / raw)
To: John David Anglin; +Cc: parisc-linux
On Mon April 24 2006 11:35, you wrote:
> On Mon April 24 2006 10:35, you wrote:
> > > ldcw,co target_address
> > >
> > > Where target_address includes the magic byte[0] of
> > > the cache line.
> >
> > Where is this documented?
> >
> Well, they didn't put it in the instruction RTL where
> someone could find it. It is a footnote or mentioned
> in passing somewhere.
>
> I will look for it, I found it about 5 years ago when
> Matt and I discussed this on the list, I can find it again.
>
Still looking...
> But it is reasonable - you don't want to waste the cache
> coherency bandwidth with every ldcw,co in the cache line.
>
> > > Translation:
> > >
> > > Spin on the ldcw,co not the ldw here.
> >
> > I believe this makes sense as the errata specifies that the ldcw,co
> > operation has to be performed in cache on PA 2.0 machines:
> > http://h21007.www2.hp.com/dspp/tech/tech_TechDocumentDetailPage_IDX/1,1701,5310,00.html
> >
>
Right, notice that it was at the request of HP-UX group for non-I/O
devices.
Think of it this way:
ldcw, co Cache_Line[0]
The hardware, system wide, exclusive lock for this cache line.
A cache line is a big place...
ldcw, co Cache_Line[4 .. max-4]
This cpu now owns the cache line, so the other cpus do not need
to be updated, nor the cache coherency bandwidth burned up...
The non-zero cache line offset does this trick.
These are the per-cpu (cache line owner) semaphores (and/or data) for
the multiple threads of execution that are servicing whatever caused
the cache line to be grabbed on a machine wide, exclusive lock.
But if our cpu is going to be a polite neighbor to the other cpus
in the machine, it will have to use an instruction (+completer)
that is immediately observable by the other cpus when it is done
with the hardware wide lock.
Yo - Software Engineering, are you home?
Mike
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [parisc-linux] Does it lakes some cloberred r1 in
2006-04-24 16:35 ` Michael S. Zick
2006-04-24 18:00 ` Michael S. Zick
@ 2006-04-24 18:46 ` John David Anglin
2006-04-24 19:12 ` Michael S. Zick
1 sibling, 1 reply; 28+ messages in thread
From: John David Anglin @ 2006-04-24 18:46 UTC (permalink / raw)
To: Michael S. Zick; +Cc: parisc-linux
> Those instruction descriptions do not always mention side-effects,
> even less often do they mention the exceptions to the side-effects.
>
> An exception (footnoted somewhere):
> ldcw,co does not set the dirty bit on the dcache line.
Don't see this. It uses mem_store just like stw. The only exception
is when gr0 is the target and it behaves like a prefetch. See also
discussion of D bit trap. If the machine has multiple D-caches, I
don't see how the overhead present in the coherency communication
can be avoided.
In the case where store_in_memory is used, the line is first flushed
and then the data is written to memory. It doesn't make sense to set
the dirty bit in this case. So, if you do a tight ldcw loop without
the co completer on a CPU that is not fully coherent, you will always
be in the slow store_in_memory case.
> Since there is no way to clear the lock with ldcw,co then when the
> lock is cleared, then there must be another magic completer that needs
> to be used on the instruction that resets the condition to '1' (unlocked).
>
> Something that triggers the cache coherency system so the change is
> immediately observable by all cpus.
The lock can be reset with a store. You are probably thinking of
the 'O' completer. However, I think that all PA-RISC CPUs have strongly
ordered loads and stores. However, I think the discussion on pages
435-438 in http://ftp.parisc-linux.org/docs/arch/parisc2.0.pdf is
relevant to the SMP case.
Dave
--
J. David Anglin dave.anglin@nrc-cnrc.gc.ca
National Research Council of Canada (613) 990-0752 (FAX: 952-6602)
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [parisc-linux] Does it lakes some cloberred r1 in
2006-04-24 16:50 ` John David Anglin
@ 2006-04-24 18:55 ` John David Anglin
2006-04-25 0:38 ` Grant Grundler
2006-04-26 16:42 ` Michael S. Zick
2 siblings, 0 replies; 28+ messages in thread
From: John David Anglin @ 2006-04-24 18:55 UTC (permalink / raw)
To: John David Anglin; +Cc: parisc-linux
> > (I'm assuming ",co" completer is PA2.0 only - please correct me if
> > that's wrong.)
>
> It's not in the PA 1.1 arch...
Rechecked. I should have remembered given that I updated gas to improve
the handling of cache-control completers for PA 1.1 ;( So, we should be
using the 'co' completer except in situations (if any) where the spinlock
is shared with a non-coherent I/O processor. However, I believe that
PA 1.1 still has the 16-byte alignment requirement.
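One common way to satisfy the 16-byte alignment requirement is to reserve four words for the lock and hand ldcw whichever word happens to be 16-byte aligned. The following is a minimal sketch of that idea only; `padded_lock_t` and `ldcw_word` are hypothetical names, and it assumes a 4-byte `unsigned int`.

```c
#include <assert.h>
#include <stdint.h>

#define LDCW_ALIGNMENT 16u   /* alignment requirement named in this thread */

/* The sketch relies on 4-byte words: 4 words == one 16-byte window. */
static_assert(sizeof(unsigned int) == 4, "sketch assumes 4-byte unsigned int");

/* Hypothetical lock type: four words, so one of them is always 16-byte aligned. */
typedef struct {
    volatile unsigned int slot[4];
} padded_lock_t;

/* Return the one word inside the lock whose address is a multiple of 16. */
static volatile unsigned int *ldcw_word(padded_lock_t *l)
{
    uintptr_t a = (uintptr_t)&l->slot[0];

    /* Round up to the next 16-byte boundary; since the struct is at least
     * 4-byte aligned, the result lands on slot[0], [1], [2] or [3]. */
    a = (a + LDCW_ALIGNMENT - 1) & ~(uintptr_t)(LDCW_ALIGNMENT - 1);
    return (volatile unsigned int *)a;
}
```

The cost is 12 wasted bytes per lock; the benefit is that the lock can be embedded in any structure without forcing the whole structure to 16-byte alignment.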
Need to look at the uses in GCC.
Dave
--
J. David Anglin dave.anglin@nrc-cnrc.gc.ca
National Research Council of Canada (613) 990-0752 (FAX: 952-6602)
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [parisc-linux] Does it lakes some cloberred r1 in
2006-04-24 18:46 ` John David Anglin
@ 2006-04-24 19:12 ` Michael S. Zick
2006-04-24 21:07 ` John David Anglin
0 siblings, 1 reply; 28+ messages in thread
From: Michael S. Zick @ 2006-04-24 19:12 UTC (permalink / raw)
To: John David Anglin; +Cc: parisc-linux
On Mon April 24 2006 13:46, John David Anglin wrote:
> > Those instruction descriptions do not always mention side-effects,
> > even less often do they mention the exceptions to the side-effects.
> >
> > An exception (footnoted somewhere):
> > ldcw,co does not set the dirty bit on the dcache line.
>
> Don't see this.
>
I agree, it is not clear
Start with:
http://h21007.www2.hp.com/dspp/files/unprotected/parisc20/PA_7_inst_descriptions.pdf
Find section 7-74, physical page 76 of the above.
Sub-section: "If the cache control hint is not specified ..."
First bullet, last sentence:
"If the line is retained in cache, it must not be marked dirty."
PA2.0 only does this on lines in cache with the co completer;
therefore, it must be 'retained in cache'.
Now:
Sub-section: "If the cache control hint is specified ..."
"... the semaphore operation _may_ be handled as if the cache control
hint had not been specified ..."
Now add in the errata to this flow ...
Then jump forward a page to the first clause of the indivisible RTL
statement:
Note that all the operations are qualified by "NO_HINT"
Duh...
And this is the easy one to find, still searching for the magic byte[0]
reference.
Mike
> It uses mem_store just like stw. The only exception
> is when gr0 is the target and it behaves like a prefetch. See also
> discussion of D bit trap. If the machine has multiple D-caches, I
> don't see how the overhead present in the coherency communication
> can be avoided.
>
> In the case where store_in_memory is used, the line is first flushed
> and then the data is written to memory. It doesn't make sense to set
> the dirty bit in this case. So, if you do a tight ldcw loop without
> the co completer on a CPU that is not fully coherent, you will always
> be in the slow store_in_memory case.
>
> > Since there is no way to clear the lock with ldcw,co then when the
> > lock is cleared, then there must be another magic completer that needs
> > to be used on the instruction that resets the condition to '1' (unlocked).
> >
> > Something that triggers the cache coherency system so the change is
> > immediately observable by all cpus.
>
> The lock can be reset with a store. You are probably thinking of
> the 'O' completer. However, I think that all PA-RISC CPUs have strongly
> ordered loads and stores. However, I think the discussion on pages
> 435-438 in http://ftp.parisc-linux.org/docs/arch/parisc2.0.pdf is
> relevant to the SMP case.
>
> Dave
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [parisc-linux] Does it lakes some cloberred r1 in
2006-04-24 18:00 ` Michael S. Zick
@ 2006-04-24 19:15 ` John David Anglin
2006-04-24 21:57 ` Michael S. Zick
1 sibling, 0 replies; 28+ messages in thread
From: John David Anglin @ 2006-04-24 19:15 UTC (permalink / raw)
To: Michael S. Zick; +Cc: parisc-linux
> Right, notice that it was at the request of HP-UX group for non-I/O
> devices.
>
> Think of it this way:
>
> ldcw, co Cache_Line[0]
>
> The hardware, system wide, exclusive lock for this cache line.
>
> A cache line is a big place...
>
> ldcw, co Cache_Line[4 .. max-4]
>
> This cpu now owns the cache line, so the other cpus do not need
> to be updated, nor the cache coherency bandwidth burned up...
> The non-zero cache line offset does this trick.
Ok, I understand. I don't think that there is anything special
regarding Cache_Line[0]. You just need 4 bytes for the system-wide
lock. Then, you can use the rest for semaphores on the cpu that
grabbed the system-wide lock.
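The layout just described can be sketched as a structure that fills exactly one cache line: one word for the system-wide lock and the remainder for semaphores used by the cpu that holds it. This is an illustration only; the 64-byte line size, the 4-byte word size, and the names `line_lock`, `global_lock` and `local_sem` are all assumptions, not anything taken from the architecture documents.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_LINE_BYTES 64   /* assumed; the real line size depends on the CPU */
#define LOCAL_SEMS ((CACHE_LINE_BYTES / sizeof(unsigned int)) - 1)

/* One cache line: a system-wide lock word plus per-owner semaphores. */
struct line_lock {
    /* Cache_Line[0]: taken with ldcw,co to claim the whole line. */
    _Alignas(CACHE_LINE_BYTES) volatile unsigned int global_lock;

    /* Cache_Line[4 .. max-4]: used only by the cpu holding global_lock. */
    volatile unsigned int local_sem[LOCAL_SEMS];
};

/* The structure must not spill into a second line. */
static_assert(sizeof(struct line_lock) == CACHE_LINE_BYTES,
              "line_lock must occupy exactly one cache line");
```

Keeping everything inside one aligned line is the whole point: once a cpu owns the line, operations on `local_sem` need not generate further coherency traffic, if the hardware behaves as described above.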
Don't know if this trick would be useful for linux or not.
Dave
--
J. David Anglin dave.anglin@nrc-cnrc.gc.ca
National Research Council of Canada (613) 990-0752 (FAX: 952-6602)
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [parisc-linux] Does it lakes some cloberred r1 in
2006-04-24 19:12 ` Michael S. Zick
@ 2006-04-24 21:07 ` John David Anglin
0 siblings, 0 replies; 28+ messages in thread
From: John David Anglin @ 2006-04-24 21:07 UTC (permalink / raw)
To: Michael S. Zick; +Cc: parisc-linux
> > > An exception (footnoted somewhere):
> > > ldcw,co does not set the dirty bit on the dcache line.
> >
> > Don't see this.
> >
> I agree, it is not clear
>
> Start with:
> http://h21007.www2.hp.com/dspp/files/unprotected/parisc20/PA_7_inst_descriptions.pdf
>
> Find section 7-74, physical page 76 of the above.
>
> Sub-section: "If the cache control hint is not specified ..."
>
> First bullet, last sentence:
> "If the line is retained in cache, it must not be marked dirty."
>
> PA2.0 only does this on lines in cache with the co completer;
> therefore, it must be 'retained in cache'.
>
> Now:
> Sub-section: "If the cache control hint is specified ..."
>
> "... the semaphore operation _may_ be handled as if the cache control
> hint had not been specified ..."
>
> Now add in the errata to this flow ...
I believe that the PA 2.0 errata requires support for the cache control
hint and that the operation must be performed in cache when it is specified.
The first bullet and second bullets only apply when the hint isn't specified.
I would argue that the errata requires PA 2.0 machines to effectively
use bullet 2 (see indivisible on page 7-75). In this case, I believe
that the line has to be marked dirty. The difference being that the line
hasn't been flushed and zero written to memory.
The PA 1.1 situation is messy in that the arch gave the hardware designers
an out since the hint can be ignored. So, on a machine that's not coherent,
it seems like an efficient implementation would try to load the line and
make it dirty before doing the ldcw. However, this is only going to work
if the machine does the operation in cache. It's allowed to use the
technique in bullet 1. Thus, probably trying ldcw once and then sampling
with ldw is as optimal as it gets without checking the capabilities
of each CPU. PA7100LC ERS says the cache control hint is supported and
will operate in cache if the line is present in the cache. PA7300LC ERS
states that ldcw hints are supported at all privilege levels. So,
I would say add the hint but be aware that it might be ignored on some
cpus.
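The "try ldcw once, then sample with ldw" loop can be sketched in portable C11. This is not PA-RISC code: `atomic_exchange` stands in for ldcw (load the word and leave zero behind), a relaxed load stands in for ldw, and the cache-control hint has no C equivalent, so it is not modelled. The names `sketch_lock_t`, `load_and_clear` and friends are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

#define LOCK_FREE 1u   /* convention used in this thread: 1 = unlocked */
#define LOCK_HELD 0u   /* ldcw leaves zero behind */

typedef struct { atomic_uint word; } sketch_lock_t;

static void sketch_lock_init(sketch_lock_t *l)
{
    atomic_init(&l->word, LOCK_FREE);
}

/* Stand-in for ldcw: atomically read the word and clear it. */
static unsigned int load_and_clear(sketch_lock_t *l)
{
    return atomic_exchange_explicit(&l->word, LOCK_HELD, memory_order_acquire);
}

/* One attempt: nonzero result means we saw LOCK_FREE and now hold the lock. */
static int sketch_trylock(sketch_lock_t *l)
{
    return load_and_clear(l) != LOCK_HELD;
}

static void sketch_lock(sketch_lock_t *l)
{
    while (load_and_clear(l) == LOCK_HELD) {
        /* Lock is held: sample with a plain load (the ldw step) so the
         * expensive read-modify-write is only retried when it can succeed. */
        while (atomic_load_explicit(&l->word, memory_order_relaxed) == LOCK_HELD)
            ;
    }
}

/* The lock can be reset with an ordinary store. */
static void sketch_unlock(sketch_lock_t *l)
{
    atomic_store_explicit(&l->word, LOCK_FREE, memory_order_release);
}
```

The inner loop is what avoids the slow path on a cpu that ignores the hint: only the first attempt, and the retry after the word reads as free, pay for the semaphore operation.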
> Then jump forward a page to the first clause of the indivisible RTL
> statement:
>
> Note that all the operations are qualified by "NO_HINT"
I think the hints used in mem_store are those in table 6-8 which is
why ldcw uses "NO_HINT".
Dave
--
J. David Anglin dave.anglin@nrc-cnrc.gc.ca
National Research Council of Canada (613) 990-0752 (FAX: 952-6602)
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [parisc-linux] Does it lakes some cloberred r1 in
2006-04-24 18:00 ` Michael S. Zick
2006-04-24 19:15 ` John David Anglin
@ 2006-04-24 21:57 ` Michael S. Zick
2006-04-24 22:40 ` John David Anglin
1 sibling, 1 reply; 28+ messages in thread
From: Michael S. Zick @ 2006-04-24 21:57 UTC (permalink / raw)
To: parisc-linux; +Cc: John David Anglin
On Mon April 24 2006 13:00, Michael S. Zick wrote:
> On Mon April 24 2006 11:35, you wrote:
> > On Mon April 24 2006 10:35, you wrote:
> > > > ldcw,co target_address
> > > >
> > > > Where target_address includes the magic byte[0] of
> > > > the cache line.
> > >
> > > Where is this documented?
> > >
> > Well, they didn't put it in the instruction RTL where
> > someone could find it. It is a footnote or mentioned
> > in passing somewhere.
> >
> > I will look for it, I found it about 5 years ago when
> > Matt and I discussed this on the list, I can find it again.
> >
>
I give up - I can not find it now.
Which does not mean it is not there, somewhere.
Ever need to be put to sleep?
Read this:
http://h21007.www2.hp.com/dspp/files/unprotected/parisc20/PA_G_memory_ordering.pdf
In particular, that the semaphore instructions are described as
a load followed (sic: indivisibly) by a store.
Now branch thee here:
http://h21007.www2.hp.com/dspp/files/unprotected/parisc20/PA_6_inst_overview.pdf
Pick section 6-10, physically pages 11 and 12.
Now overlay tables 6-7, 6-8, and 6-9 and note the relationship of cc=01 in all
three tables.
ldw cc=01 <reserved>
stw cc=01 <stw,bc >
ldcw cc=01 <ldcw,co >
This is not an accident, in the days this cpu was designed, silicon did not
grow on trees.
Read thee the paragraphs between table 6-8, and 6-9 - note the special
significance of cache line byte[0] for stw,bc.
But finding that specifically for ldcw,co - even in en_HP - is beyond my
abilities.
Nor do I have a machine where I could simply change the *&^*& spinlock
macro to see if it makes a difference.
I think I will go fishing with Joel,
I am too old to stand for my orals now.
Mike
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [parisc-linux] Does it lakes some cloberred r1 in
2006-04-24 21:57 ` Michael S. Zick
@ 2006-04-24 22:40 ` John David Anglin
0 siblings, 0 replies; 28+ messages in thread
From: John David Anglin @ 2006-04-24 22:40 UTC (permalink / raw)
To: Michael S. Zick; +Cc: parisc-linux
> Now overlay tables 6-7, 6-8, and 6-9 and note the relationship of cc=01 in all
> three tables.
>
> ldw cc=01 <reserved>
> stw cc=01 <stw,bc >
> ldcw cc=01 <ldcw,co >
>
> This is not an accident, in the days this cpu was designed, silicon did not
> grow on trees.
>
> Read thee the paragraphs between table 6-8, and 6-9 - note the special
> significance of cache line byte[0] for stw,bc.
See.
> But finding that specifically for ldcw,co - even in en_HP - is beyond my
> abilities.
Possibly, you saw this in an ERS.
> I think I will go fishing with Joel,
> I am too old to stand for my orals now.
I thought I was the old guy here!
Dave
--
J. David Anglin dave.anglin@nrc-cnrc.gc.ca
National Research Council of Canada (613) 990-0752 (FAX: 952-6602)
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [parisc-linux] Does it lakes some cloberred r1 in
2006-04-24 16:50 ` John David Anglin
2006-04-24 18:55 ` John David Anglin
@ 2006-04-25 0:38 ` Grant Grundler
2006-04-26 16:42 ` Michael S. Zick
2 siblings, 0 replies; 28+ messages in thread
From: Grant Grundler @ 2006-04-25 0:38 UTC (permalink / raw)
To: John David Anglin; +Cc: parisc-linux
On Mon, Apr 24, 2006 at 12:50:48PM -0400, John David Anglin wrote:
> > AFAIK, only one non-coherent PA-RISC box has PA2.0 and parisc-linux
> > doesn't support it: T600.
>
> It probably has coherent I & D caches as well as this seems to be a
> requirement for both 1.1 and 2.0.
Sorry - you are correct.
I was thinking IO. T600 is not DMA/cache coherent.
apologies for spacing out,
grant
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [parisc-linux] Does it lakes some cloberred r1 in
2006-04-24 15:35 ` John David Anglin
2006-04-24 16:25 ` Grant Grundler
2006-04-24 16:35 ` Michael S. Zick
@ 2006-04-25 15:17 ` Michael S. Zick
2006-04-25 18:52 ` Michael S. Zick
2 siblings, 1 reply; 28+ messages in thread
From: Michael S. Zick @ 2006-04-25 15:17 UTC (permalink / raw)
To: John David Anglin; +Cc: parisc-linux
On Mon April 24 2006 10:35, John David Anglin wrote:
> > ldcw,co target_address
> >
> > Where target_address includes the magic byte[0] of
> > the cache line.
>
> Where is this documented?
>
Close, not quite there yet:
HP patent number: 4,713,755
The page to retrieve this by number:
http://patft1.uspto.gov/netahtml/PTO/srchnum.htm
Now it should be a 'simple' matter to just read
every patent that references this one.
Where in the _published HP documents_ that describe
the implementation - Ah, that is another question.
So far the closest approximation is the stw,bc
behavior when including magic byte[0].
Mike
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [parisc-linux] Does it lakes some cloberred r1 in
2006-04-25 15:17 ` Michael S. Zick
@ 2006-04-25 18:52 ` Michael S. Zick
2006-04-25 21:42 ` John David Anglin
0 siblings, 1 reply; 28+ messages in thread
From: Michael S. Zick @ 2006-04-25 18:52 UTC (permalink / raw)
To: parisc-linux; +Cc: John David Anglin
On Tue April 25 2006 10:17, Michael S. Zick wrote:
> On Mon April 24 2006 10:35, John David Anglin wrote:
> > > ldcw,co target_address
> > >
> > > Where target_address includes the magic byte[0] of
> > > the cache line.
> >
> > Where is this documented?
> >
> Close, not quite there yet:
>
> HP patent number: 4,713,755
>
> The page to retrieve this by number:
> http://patft1.uspto.gov/netahtml/PTO/srchnum.htm
>
> Now it should be a 'simple' matter to just read
> every patent that references this one.
>
One more conjecture confirmed,
The cache lines are a master/slave arrangement,
only one processor (or device) can be the master
(owner) of the cache line.
This is accomplished by keeping the cpu id (address)
of the master in the virtual tag.
Ref: HP patent number: 5,197,146
That is how the logical 'makePrivate' of the formal
memory model happens.
Translation:
Thou shall not allow load balancing to migrate a
task that holds a spinlock - you leave the master
of that cache line on the prior processor.
Mike
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [parisc-linux] Does it lakes some cloberred r1 in
2006-04-25 18:52 ` Michael S. Zick
@ 2006-04-25 21:42 ` John David Anglin
0 siblings, 0 replies; 28+ messages in thread
From: John David Anglin @ 2006-04-25 21:42 UTC (permalink / raw)
To: Michael S. Zick; +Cc: parisc-linux
> On Tue April 25 2006 10:17, Michael S. Zick wrote:
> > On Mon April 24 2006 10:35, John David Anglin wrote:
> > > > ldcw,co target_address
> > > >
> > > > Where target_address includes the magic byte[0] of
> > > > the cache line.
> > >
> > > Where is this documented?
> > >
> > Close, not quite there yet:
> >
> > HP patent number: 4,713,755
> >
> > The page to retrieve this by number:
> > http://patft1.uspto.gov/netahtml/PTO/srchnum.htm
> >
> > Now it should be a 'simple' matter to just read
> > every patent that references this one.
> >
>
> One more conjecture confirmed,
>
> The cache lines are a master/slave arrangement,
> only one processor (or device) can be the master
> (owner) of the cache line.
>
> This is accomplished by keeping the cpu id (address)
> of the master in the virtual tag.
>
> Ref: HP patent number: 5,197,146
>
> That is how the logical 'makePrivate' of the formal
> memory model happens.
That's pretty much what I said this morning.
Correct me if I'm wrong but wouldn't an ldcw,co on a different
cpu cause a transfer of ownership? However, it's not clear to
me that a cpu that doesn't own the line can unlock the semaphore
owned by a different cpu. As far as I can tell, the only
instructions that appear to perform a coherent store are ldcw,co
and ldcd,co, and these can't do an unlock. Hmmm, maybe doing a
flush and sync (see G-3 and G-4) before the store would do the trick.
> Translation:
>
> Thou shall not allow load balancing to migrate a
> task that holds a spinlock - you leave the master
> of that cache line on the prior processor.
Page G-3 says that a SYNC is necessary when switching tasks. Possibly,
this together with the flush allows spinlocks in user code.
I see the PA-8200 has a CINCD instruction (coherent 64-bit increment).
Dave
--
J. David Anglin dave.anglin@nrc-cnrc.gc.ca
National Research Council of Canada (613) 990-0752 (FAX: 952-6602)
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [parisc-linux] Does it lakes some cloberred r1 in
2006-04-24 16:50 ` John David Anglin
2006-04-24 18:55 ` John David Anglin
2006-04-25 0:38 ` Grant Grundler
@ 2006-04-26 16:42 ` Michael S. Zick
2 siblings, 0 replies; 28+ messages in thread
From: Michael S. Zick @ 2006-04-26 16:42 UTC (permalink / raw)
To: parisc-linux; +Cc: John David Anglin
On Mon April 24 2006 11:50, John David Anglin wrote:
> > On Mon, Apr 24, 2006 at 11:35:48AM -0400, John David Anglin wrote:
> > > The intent of the errata seems to be to relax the alignment requirement
> > > for ldc[dw],co in cases where the spinlock is not being shared with a
> > > non-coherent I/O device.
> >
> > AFAIK, only one non-coherent PA-RISC box has PA2.0 and parisc-linux
> > doesn't support it: T600.
>
> It probably has coherent I & D caches as well as this seems to be a
> requirement for both 1.1 and 2.0.
>
> > (I'm assuming ",co" completer is PA2.0 only - please correct me if
> > that's wrong.)
>
> It's not in the PA 1.1 arch...
>
Dave, Grant, Group;
We are faced with 10 year old documentation on a 20 year old design
and a question on a 21st century kernel.
To find a direct quote in answer to a bullet proof spinlock in this
context has eluded me.
But, I think I can explain the requirements.
To keep this mail from becoming pathologically long, I will paraphrase
a lot.
You want on hand the:
PA7200_design.pdf (HP website);
Ms. Ruby B. Lee's article "Precision Architecture" from IEEE Computer,
Volume 22, No. 1, January 1989, pp 79-91 (IEEE Society will sell you
a reprint);
The tables and descriptions of load/store completer codes from the
PA Instruction Overview (website);
The instruction descriptions for ldw/stw/ldcw from the PA Instruction
descriptions (website);
Optionally, HP Patent No. 6,079,012 - get the 'image' version with
the nice flow charts. (ignore the issue date)
- - - -
Background:
Ms. Lee redesigned the data cache system for the PA7200; she also
gave us the instructions that we need to deal with our problem.
Ms. Lee describes this as a Level 1 cache without a Level 2 cache.
Without quibbling over terminology, consider it a split, single
level, cache system. That makes its layout clearer.
Physically, it is split into two parts. A small cache, on-chip,
which she terms an 'assist cache' and the larger capacity, off-chip
cache. Both run at cpu clock speed.
The cpu operates only on/to/from the data in the assist cache.
A not present address is fetched directly to the assist cache, if
it is not present in the off-chip cache.
What happens when the little cache runs out of room?
Without programmer intervention, it is written to the off-chip cache,
and it is the off-chip cache that deals with memory when required.
How to make changes (such as lock release) immediately observable?
Ms. Lee gave us a couple new instructions. The tech writer who wrote
the instruction descriptions calls these 'Hints'.
Ms. Lee describes them as cache control commands.
Enter the 'sl' completer for ldw/stw.
Specifying the 'sl' disables the write-back to the off-chip cache
and causes the assist cache to write to main memory. In effect, a
strongly ordered store of a Dcache line flush.
This design also employs 'cache snooping' and allows approximately
ten snoops to be outstanding at the same time.
The snoops have a higher priority than the data/instruction transactions.
They _should not_ (but might) require a 'sync' to give us the
immediately observable lock release that we require.
It is unclear to me if the ownership (master/slave) relationship
of the cache lines still holds in this new design. Both the on-chip
and off-chip caches are under the control of the same on-chip CAM.
(Which is probably why she calls this a single level cache system.)
- - - -
Entering the 21st century, Linux world...
Consider:
Load balancing can 'pull' a task off of a cpu and migrate it
to another cpu...
Kernel code is executed in the kernel context of the user task...
Tasks can be preempted...
It seems (without a code path audit) that a task could be migrated
to another cpu while holding an 'in system' spinlock.
Now that would be a rare occurrence, but the current spinlocks only
fail once every two or three days under heavy load.
- - - -
The bullet proof spinlock...
The elegant solution lies somewhere between this and what is
currently implemented.
/* Optional, only if available and proven needed */
- - Save current flag and set tcb 'do not migrate'
- - Save current flag and set tcb 'do not preempt'
/* If another processor already holds the lock, the cache
line has been replicated in our cache. Pretend we are
first to use the lock. */
ldw,sl the_lock, a_register /* This will be assist cache only */
ldcw,co the_lock, a_register /* Try to grab lock */
- - If successful, continue
- - If fail, spin on either ldcw,co or a ldw,sl - not clear which
/* Now release the lock with the implied assist cache line flush
to memory. */
stw,sl UNLOCKED, the_lock
/* If we diddled the scheduler tcb flags, restore them here */
- - Restore the tcb 'do not preempt'
- - Restore the tcb 'do not migrate'
- - - -
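The ordering of the steps above (save and set the flags, take the lock, release it, restore the flags in reverse) can be sketched in portable C11. Everything here is an assumption for illustration: `task_flags`, `saved_flags`, `guarded_lock` and `guarded_unlock` are hypothetical names, the flags are plain booleans rather than real scheduler state, and the 'sl' and 'co' completers have no C equivalent, so an atomic exchange and a release store stand in for ldcw,co and stw,sl.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-task scheduler flags (the "tcb" flags in the outline). */
struct task_flags {
    bool no_migrate;
    bool no_preempt;
};

/* Copy of the flags as they were before the lock was taken. */
struct saved_flags {
    bool no_migrate;
    bool no_preempt;
};

static atomic_uint the_lock = 1;   /* 1 = unlocked, 0 = held */

static struct saved_flags guarded_lock(struct task_flags *t)
{
    /* Save the current flags, then forbid migration and preemption. */
    struct saved_flags old = { t->no_migrate, t->no_preempt };
    t->no_migrate = true;
    t->no_preempt = true;

    /* Stand-in for ldcw,co: read the word and leave zero behind;
     * reading zero means someone else holds the lock, so spin. */
    while (atomic_exchange_explicit(&the_lock, 0u, memory_order_acquire) == 0u)
        ;
    return old;
}

static void guarded_unlock(struct task_flags *t, struct saved_flags old)
{
    /* Stand-in for stw,sl: publish UNLOCKED before touching the flags. */
    atomic_store_explicit(&the_lock, 1u, memory_order_release);

    /* Restore in the reverse order of acquisition. */
    t->no_preempt = old.no_preempt;
    t->no_migrate = old.no_migrate;
}
```

Returning the saved flags to the caller, rather than clearing them unconditionally, keeps nested users honest: a task that was already pinned before taking the lock stays pinned after releasing it.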
At which point, I must defer to the software engineering department.
Mike
> Dave
^ permalink raw reply [flat|nested] 28+ messages in thread
end of thread, other threads:[~2006-04-26 16:42 UTC | newest]
Thread overview: 28+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20060422154641.GC10514@quicksilver.road.mcmartin.ca>
2006-04-22 16:48 ` [parisc-linux] Does it lakes some cloberred r1 in John David Anglin
2006-04-23 16:18 ` Michael S. Zick
2006-04-23 17:06 ` Michael S. Zick
2006-04-24 15:35 ` John David Anglin
2006-04-24 16:25 ` Grant Grundler
2006-04-24 16:50 ` John David Anglin
2006-04-24 18:55 ` John David Anglin
2006-04-25 0:38 ` Grant Grundler
2006-04-26 16:42 ` Michael S. Zick
2006-04-24 16:35 ` Michael S. Zick
2006-04-24 18:00 ` Michael S. Zick
2006-04-24 19:15 ` John David Anglin
2006-04-24 21:57 ` Michael S. Zick
2006-04-24 22:40 ` John David Anglin
2006-04-24 18:46 ` John David Anglin
2006-04-24 19:12 ` Michael S. Zick
2006-04-24 21:07 ` John David Anglin
2006-04-25 15:17 ` Michael S. Zick
2006-04-25 18:52 ` Michael S. Zick
2006-04-25 21:42 ` John David Anglin
[not found] <200604212013.k3LKDAbx003500@hiauly1.hia.nrc.ca>
2006-04-21 20:30 ` John David Anglin
2006-04-20 17:09 [parisc-linux] Does it lakes some cloberred r1 in __put_kernel_asm() 64bit? Carlos O'Donell
2006-04-20 17:28 ` [parisc-linux] Does it lakes some cloberred r1 in John David Anglin
2006-04-20 17:36 ` Michael S. Zick
2006-04-20 19:32 ` John David Anglin
2006-04-20 20:21 ` Michael S. Zick
2006-04-20 20:04 ` Carlos O'Donell
2006-04-20 21:29 ` John David Anglin
2006-04-21 18:52 ` Michael S. Zick