* Questions on the stack for IA64
@ 2004-04-13 5:17 Rahul Chaturvedi
2004-04-13 16:25 ` Luck, Tony
` (6 more replies)
0 siblings, 7 replies; 8+ messages in thread
From: Rahul Chaturvedi @ 2004-04-13 5:17 UTC
To: linux-ia64
(***I would appreciate it if you'd cc your reply to
this mail address since I am not a member of this
list***)
I have been trying to figure out exactly how the stack
mechanisms available on IA-64 work internally for a
while now. I've made a few observations and have a
few questions about them:
a.) The stack pointer seems to be r12, which is
pointing to a location along the lines of,
0x60000fffffffxxxx
Shouldn't this address be in the 0x8xxxxxxxxxxxxxxx
range? Isn't the 0x6xxxxxxxxxxxxxxx range reserved for
the data segment?
I also checked the address of a global variable, and
it was stored in the 0x6xxxxxxxxxxxxxxx range.
Wouldn't the stack and the data segment conflict?
b.) There is something called $GP and $TP that the
compiler defines as r1 and r13 respectively. Anyone
got any idea what they are?
c.) How exactly is the data stored in the bspstore
locations?
d.) The pfs register seems to point to a location in
0xcxxxxxxxxxxxxxxx. Shouldn't this point to some
accessible memory location (from usermode)?
e.) For some reason, compiled code, even with full
optimizations, seems to do a "st8" on any value I put
in a local variable. For example, I have a function
with about 40 variables, and the Intel compiler that I am
using does something like:
mov r47, value
st8 [r48], r47
GCC also does something similar.
Shouldn't the compiler simply keep the value in r47
instead of st8'ing it to memory when using full
optimizations (-O2)?
Please forgive me if my questions seem a bit naive; my
experience with IA64 is very limited.
I'd also like to remind you to please CC this address
on your reply :)
Thanks
__________________________________
Do you Yahoo!?
Yahoo! Small Business $15K Web Design Giveaway
http://promotions.yahoo.com/design_giveaway/
* RE: Questions on the stack for IA64
2004-04-13 5:17 Questions on the stack for IA64 Rahul Chaturvedi
@ 2004-04-13 16:25 ` Luck, Tony
2004-04-13 17:24 ` Chen, Kenneth W
` (5 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: Luck, Tony @ 2004-04-13 16:25 UTC
To: linux-ia64
>-----Original Message-----
>From: linux-ia64-owner@vger.kernel.org
>[mailto:linux-ia64-owner@vger.kernel.org] On Behalf Of Rahul Chaturvedi
>Sent: Monday, April 12, 2004 10:17 PM
>a.) The stack pointer seems to be r12, which is
>pointing to a location along the lines of,
>
>0x60000fffffffxxxx
>
>Shouldn't this address be in the 0x8xxxxxxxxxxxxxxx
>range? Isn't the 0x6xxxxxxxxxxxxxxx range reserved for
>the data segment?
The user stack was moved from region 4 (0x8....) to
region 3 (0x6....) to free up region 4 for use by the
hugetlbfs code (which uses large page sizes, such as 256MB,
for very large objects to reduce TLB overhead).
>I also checked the address of a global variable, and
>it was stored at in the 0x6xxxxxxxxxxxxxxx range.
>Wouldn't the stack and the data segment conflict?
The 64-bit address space is very large, so the data and
stack are still very far apart, even though both have
addresses in region 3.
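To make the region arithmetic concrete, here is a small C sketch (not from the thread). On IA-64 the virtual region number is held in the top three bits (63:61) of an address, so 0x6... is region 3, 0x8... is region 4 and 0xc... is region 6. The helper names are hypothetical, and the global-variable address 0x6000000000004000 in the usage note is a made-up example; only the stack address pattern comes from the original question.

```c
/* Illustrative sketch; helper names are hypothetical. */
#include <assert.h>
#include <stdint.h>

/* IA-64 divides the 64-bit virtual address space into 8 regions;
   the virtual region number (VRN) is address bits 63:61. */
static unsigned ia64_region(uint64_t vaddr)
{
    return (unsigned)(vaddr >> 61);
}

/* Offset of an address within its region (bits 60:0). */
static uint64_t ia64_region_offset(uint64_t vaddr)
{
    return vaddr & ((UINT64_C(1) << 61) - 1);
}

/* Distance in bytes between two addresses that must share a region. */
static uint64_t same_region_distance(uint64_t a, uint64_t b)
{
    assert(ia64_region(a) == ia64_region(b));
    return a > b ? a - b : b - a;
}
```

With a stack address of 0x60000fffffff0000 and the hypothetical global at 0x6000000000004000, same_region_distance() returns 0xffffffec000 bytes, roughly 16 TiB, even though both addresses are in region 3.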
>b.) There is something called $GP and $TP that the
>compiler defines as r1 and r13 respectively. Anyone
>got any idea what they are?
r1 (the global pointer) is used for simple access to some of the data; without
it you would need to use "movl reg=<64-bit value>" to get
the address of a global. Using it, many data objects can
be accessed using addressing relative to $GP, a.k.a. the
__gp address. $TP is used in the same way for thread-local
storage.
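A minimal C sketch of the GP-relative idea, assuming plain integer offsets. The "addl" instruction takes a signed 22-bit immediate (its base register must be one of r0-r3, which r1 satisfies), so a single addl can reach any address within about ±2 MiB of __gp. The disassembly Rahul posts later in this thread appears to use exactly this pattern: an addl relative to r1 followed by an ld8 to fetch the printf format-string address. The helper names and the gp values in the test are hypothetical; anything outside that window would need "movl" or a load of a stored address instead.

```c
/* Illustrative sketch; helper names and gp values are hypothetical. */
#include <assert.h>
#include <stdint.h>

/* addl's immediate is a signed 22-bit field, so an address formed as
   "addl rX = off, r1" must have off in [-2^21, 2^21). */
static int fits_addl_imm22(int64_t off)
{
    return off >= -(INT64_C(1) << 21) && off < (INT64_C(1) << 21);
}

/* Address computed by "addl rX = off, r1" when r1 holds gp (__gp). */
static uint64_t gp_relative(uint64_t gp, int64_t off)
{
    assert(fits_addl_imm22(off));
    return gp + (uint64_t)off;   /* unsigned wraparound handles negative off */
}
```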
>c.) How exactly is the data stored in the bspstore
>locations?
The processor writes out stacked registers to the bspstore
area "in the background" while other execution is happening.
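The backing-store layout itself follows a simple rule, as we understand the architecture manual: each stacked register goes into the next 8-byte slot, except that every slot whose address bits 8:3 are all ones (slot 63 of each 512-byte group) holds an RNAT collection of NaT bits instead of a register. The C sketch below models that rule with plain integer addresses. The function names are hypothetical; the Linux kernel has comparable helpers in include/asm-ia64/rse.h, but this loop-based version is not a copy of them.

```c
/* Illustrative sketch; function names are hypothetical. */
#include <assert.h>
#include <stdint.h>

/* Slot index (0..63) of an 8-byte backing-store word: address bits 8:3. */
static unsigned rse_slot_num(uint64_t addr)
{
    return (unsigned)((addr >> 3) & 0x3f);
}

/* Slot 63 of every 64-slot group holds an RNAT (NaT bit) collection. */
static int rse_is_rnat_slot(uint64_t addr)
{
    return rse_slot_num(addr) == 0x3f;
}

/* Address where the register n positions after the one at addr is spilled,
   skipping RNAT collection slots on the way. addr must be a register slot. */
static uint64_t rse_skip_regs(uint64_t addr, unsigned n)
{
    assert((addr & 7) == 0 && !rse_is_rnat_slot(addr));
    while (n > 0) {
        addr += 8;
        if (!rse_is_rnat_slot(addr))
            n--;
    }
    return addr;
}

/* Number of registers (not RNAT words) in the range [bspstore, bsp). */
static uint64_t rse_num_regs(uint64_t bspstore, uint64_t bsp)
{
    uint64_t regs = 0;
    assert(bspstore <= bsp);
    for (uint64_t a = bspstore; a < bsp; a += 8)
        if (!rse_is_rnat_slot(a))
            regs++;
    return regs;
}
```

For example, starting from a slot-0 address such as 0x1000, the 63rd register after it lands at 0x1200 rather than 0x11f8, because 0x11f8 is an RNAT slot.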
>d.) The pfs register seems to point to a location in
>0xcxxxxxxxxxxxxxxx. Shouldn't this point to some
>accessible memory location (from usermode)?
someone else can take this one?
>e.) For some reason, compiled code, even with full
>optimizations seems to do a "st8" on any value I put
>in a local variable. For example, I have a function
>with about 40 variables, the intel compiler that I am
>using does something like,
>
>mov r47, value
>st8 [r48], r47
>
>GCC also does something similar.
>
>Shouldn't the compiler simply store the value in r47
>instead of st8'ing it to memory when using full
>optimizations (-O2)?
Can you provide a sample of C-code that results in this
excessive storing to memory?
-Tony
* RE: Questions on the stack for IA64
2004-04-13 5:17 Questions on the stack for IA64 Rahul Chaturvedi
2004-04-13 16:25 ` Luck, Tony
@ 2004-04-13 17:24 ` Chen, Kenneth W
2004-04-14 5:05 ` Rahul Chaturvedi
` (4 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: Chen, Kenneth W @ 2004-04-13 17:24 UTC
To: linux-ia64
>>>>> Luck, Tony wrote on Tuesday, April 13, 2004 9:26 AM
> >Rahul Chaturvedi wrote on Monday, April 12, 2004 10:17 PM
> >d.) The pfs register seems to point to a location in
> >0xcxxxxxxxxxxxxxxx. Shouldn't this point to some
> >accessible memory location (from usermode)?
>
> someone else can take this one?
ar.pfs is not a memory address pointer. See section 3.1.8.10 of
the IA-64 Software Developer's Manual, Volume 1, for the proper way to
decode that register (FYI, bits [63:62] hold the previous privilege
level, which is 3 in the above case).
- Ken
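Following Ken's pointer, here is a small C sketch of decoding ar.pfs. The field positions are our reading of the register layout (the previous frame marker pfm in bits 37:0, laid out like CFM; the previous epilog count in bits 57:52; the previous privilege level in bits 63:62) and should be checked against the manual section Ken cites. The function names are hypothetical, and the test value 0xc00000000000018b is a made-up example with sof=11, sol=3 and ppl=3.

```c
/* Illustrative sketch; names are hypothetical, layout per our reading of the SDM. */
#include <assert.h>
#include <stdint.h>

/* Fields of ar.pfs. pfm (bits 37:0) is a copy of the caller's CFM. */
struct pfs_fields {
    unsigned sof;    /* pfm 6:0   size of frame                     */
    unsigned sol;    /* pfm 13:7  size of locals (inputs + locals)  */
    unsigned sor;    /* pfm 17:14 size of rotating region, in 8s    */
    unsigned rrb_gr; /* pfm 24:18 register rename base, GRs         */
    unsigned rrb_fr; /* pfm 31:25 register rename base, FRs         */
    unsigned rrb_pr; /* pfm 37:32 register rename base, predicates  */
    unsigned pec;    /* 57:52     previous epilog count             */
    unsigned ppl;    /* 63:62     previous privilege level          */
};

/* Extract bits hi..lo (inclusive) of v; all fields here are narrow. */
static unsigned field(uint64_t v, unsigned hi, unsigned lo)
{
    assert(hi >= lo && hi < 64 && hi - lo < 32);
    return (unsigned)((v >> lo) & ((UINT64_C(1) << (hi - lo + 1)) - 1));
}

static struct pfs_fields decode_pfs(uint64_t pfs)
{
    struct pfs_fields f;
    f.sof    = field(pfs, 6, 0);
    f.sol    = field(pfs, 13, 7);
    f.sor    = field(pfs, 17, 14);
    f.rrb_gr = field(pfs, 24, 18);
    f.rrb_fr = field(pfs, 31, 25);
    f.rrb_pr = field(pfs, 37, 32);
    f.pec    = field(pfs, 57, 52);
    f.ppl    = field(pfs, 63, 62);
    return f;
}
```

A value beginning with 0xc has bits 63 and 62 set, which decodes as ppl=3 (user level); that is why a saved ar.pfs can look like a region-6 address without being one.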
* RE: Questions on the stack for IA64
2004-04-13 5:17 Questions on the stack for IA64 Rahul Chaturvedi
2004-04-13 16:25 ` Luck, Tony
2004-04-13 17:24 ` Chen, Kenneth W
@ 2004-04-14 5:05 ` Rahul Chaturvedi
2004-04-14 5:19 ` David Mosberger
` (3 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: Rahul Chaturvedi @ 2004-04-14 5:05 UTC
To: linux-ia64
> >c.) How exactly is the data stored in the bspstore
> >locations?
>
> The processor writes out stacked registers to the bspstore
> area "in the background" while other execution is happening.
Does the compiler ever touch this backing store?
I believe it is used during a stack unwind?
> >b.) There is something called $GP and $TP that the
> >compiler defines as r1 and r13 respectively. Anyone
> >got any idea what they are?
>
> r1 is used for simple access to some of the data, without
> it you would need to use "movl reg=<64-bit value>" to get
> the address of a global. Using it, many data objects can
> be accessed using relative addressing from the $GP, a.k.a.
> __gp address. $TP is used in the same way for thread local
> storage.
Thanks, the code makes a lot more sense now :)
> Can you provide a sample of C-code that results in this
> excessive storing to memory?
(I hope this e-mail is still readable after all this
code pasted in here =\)
+++------------------------------+++
#include <stdio.h>

int temp(int x, int y);   /* defined elsewhere in the program */

int func(char* pszTest, int iTest, long lTest)
{
    int a, b, c, d, e, f, g, h, i, j, k, l, m, n,
        o, p, q, r, s, t, u, v, w, x, y, z;

    a = 1; b = 2; c = 3; d = 4; e = 5; f = 6; g = 7; h = 8; i = 9; j = 10; k = 11; l = 12; m = 13;
    n = 14; o = 15; p = 16; q = 17; r = 18; s = 19; t = 20; u = 21; v = 22; w = 23; x = 24; y = 25; z = 26;

    /* 26 ints plus the format string: 27 argument slots in total */
    printf("", a, b, c, d, e, f, g, h, i, j, k, l,
           m, n, o, p, q, r, s, t, u, v, w, x, y, z);
    c = temp(a, b);
    return 0;
}
+++------------------------------+++
When compiled with gcc -g -O2, this code gives me the
following assembly:
+++------------------------------+++
(From objdump.) The function starts with:
int func(char* pszTest, int iTest, long lTest)
{
4000000000000680: 00 08 2d 06 80 05 [MII] alloc r33=ar.pfs,11,3,0
Then, corresponding to these lines:
int a, b, c, d, e, f, g, h, i, j, k, l, m, n,
o, p, q, r, s, t, u, v, w, x, y, z;
a = 1; b = 2; c = 3; d = 4; e = 5; f = 6; g = 7; h = 8; i = 9; j = 10; k = 11; l = 12; m = 13;
n = 14; o = 15; p = 16; q = 17; r = 18; s = 19; t = 20; u = 21; v = 22; w = 23; x = 24; y = 25; z = 26;
printf("", a, b, c, d, e, f, g, h, i, j, k, l,
m, n, o, p, q, r, s, t, u, v, w, x, y, z);
This is the assembly code:
4000000000000686: c0 00 33 7c 46 c0       adds r12=-160,r12
400000000000068c: 81 00 00 90             mov r14=8
4000000000000690: 09 28 09 00 00 24 [MMI] mov r37=2
4000000000000696: 60 1a 00 00 48 e0       mov r38=3
400000000000069c: 44 00 00 90             mov r39=4;;
40000000000006a0: 00 78 40 18 00 21 [MII] adds r15=16,r12
40000000000006a6: 00 c1 30 00 42 20       adds r16=24,r12
40000000000006ac: 02 62 00 84             adds r17=32,r12
40000000000006b0: 00 90 a0 18 00 21 [MII] adds r18=40,r12
40000000000006b6: 30 81 31 00 42 80       adds r19=48,r12
40000000000006bc: 82 63 00 84             adds r20=56,r12
40000000000006c0: 09 a8 00 19 00 21 [MMI] adds r21=64,r12
40000000000006c6: 60 41 32 00 42 00       adds r22=72,r12
40000000000006cc: 00 00 04 00             nop.i 0x0;;
40000000000006d0: 00 00 38 1e 90 11 [MII] st4 [r15]=r14
40000000000006d6: e0 48 00 00 48 e0       mov r14=9
40000000000006dc: 02 65 00 84             adds r23=80,r12
40000000000006e0: 0a c0 60 19 00 21 [MMI] adds r24=88,r12;;
40000000000006e6: 00 70 40 20 23 c0       st4 [r16]=r14
40000000000006ec: a1 00 00 90             mov r14=10
40000000000006f0: 02 c8 80 19 00 21 [MII] adds r25=96,r12
40000000000006f6: a0 41 33 00 42 60       adds r26=104,r12;;
40000000000006fc: 03 67 00 84             adds r27=112,r12
4000000000000700: 09 00 38 22 90 11 [MMI] st4 [r17]=r14
4000000000000706: e0 58 00 00 48 80       mov r14=11
400000000000070c: 83 67 00 84             adds r28=120,r12;;
4000000000000710: 00 00 38 24 90 11 [MII] st4 [r18]=r14
4000000000000716: e0 60 00 00 48 a0       mov r14=12
400000000000071c: 03 60 04 84             adds r29=128,r12
4000000000000720: 0a f0 20 18 01 21 [MMI] adds r30=136,r12;;
4000000000000726: 00 70 4c 20 23 c0       st4 [r19]=r14
400000000000072c: d1 00 00 90             mov r14=13
4000000000000730: 02 f8 40 18 01 21 [MII] adds r31=144,r12
4000000000000736: 80 c0 30 02 42 20       adds r8=152,r12;;
400000000000073c: 01 62 04 84             adds r9=160,r12
4000000000000740: 09 00 38 28 90 11 [MMI] st4 [r20]=r14
4000000000000746: e0 70 00 00 48 60       mov r14=14
400000000000074c: 04 0d 00 90             addl r35=80,r1;;
4000000000000750: 00 00 38 2a 90 11 [MII] st4 [r21]=r14
4000000000000756: e0 78 00 00 48 40       mov r14=15
400000000000075c: 04 08 00 84             mov r34=r1
4000000000000760: 0a 40 15 00 00 24 [MMI] mov r40=5;;
4000000000000766: 00 70 58 20 23 c0       st4 [r22]=r14
400000000000076c: 01 01 00 90             mov r14=16
4000000000000770: 02 48 19 00 00 24 [MII] mov r41=6
4000000000000776: a0 3a 00 00 48 00       mov r42=7;;
400000000000077c: 04 00 c4 00             mov r32=b0
4000000000000780: 09 00 38 2e 90 11 [MMI] st4 [r23]=r14
4000000000000786: e0 88 00 00 48 80       mov r14=17
400000000000078c: 14 00 00 90             mov r36=1;;
4000000000000790: 00 00 38 30 90 11 [MII] st4 [r24]=r14
4000000000000796: e0 90 00 00 48 00       mov r14=18
400000000000079c: 00 00 04 00             nop.i 0x0
40000000000007a0: 0b 18 01 46 18 10 [MMI] ld8 r35=[r35];;
40000000000007a6: 00 70 64 20 23 c0       st4 [r25]=r14
40000000000007ac: 31 01 00 90             mov r14=19;;
40000000000007b0: 02 00 38 34 90 11 [MII] st4 [r26]=r14
40000000000007b6: e0 a0 00 00 48 00       mov r14=20;;
40000000000007bc: 00 00 04 00             nop.i 0x0
40000000000007c0: 02 00 38 36 90 11 [MII] st4 [r27]=r14
40000000000007c6: e0 a8 00 00 48 00       mov r14=21;;
40000000000007cc: 00 00 04 00             nop.i 0x0
40000000000007d0: 02 00 38 38 90 11 [MII] st4 [r28]=r14
40000000000007d6: e0 b0 00 00 48 00       mov r14=22;;
40000000000007dc: 00 00 04 00             nop.i 0x0
40000000000007e0: 02 00 38 3a 90 11 [MII] st4 [r29]=r14
40000000000007e6: e0 b8 00 00 48 00       mov r14=23;;
40000000000007ec: 00 00 04 00             nop.i 0x0
40000000000007f0: 02 00 38 3c 90 11 [MII] st4 [r30]=r14
40000000000007f6: e0 c0 00 00 48 00       mov r14=24;;
40000000000007fc: 00 00 04 00             nop.i 0x0
4000000000000800: 02 00 38 3e 90 11 [MII] st4 [r31]=r14
4000000000000806: e0 c8 00 00 48 00       mov r14=25;;
400000000000080c: 00 00 04 00             nop.i 0x0
4000000000000810: 02 00 38 10 90 11 [MII] st4 [r8]=r14
4000000000000816: e0 d0 00 00 48 00       mov r14=26;;
400000000000081c: 00 00 04 00             nop.i 0x0
4000000000000820: 11 00 38 12 90 11 [MIB] st4 [r9]=r14
4000000000000826: 00 00 00 02 00 00       nop.i 0x0
400000000000082c: 28 fc ff 58             br.call.sptk.many b0=4000000000000440 <_init+0x150>;;
4000000000000830: 00 08 00 44 00 21 [MII] mov r1=r34
+++------------------------------+++
As you can see, almost all the parameters are being
"st4'd" to memory. From the architecture
specification, shouldn't all of these be moved into
registers? I have far fewer than 96 parameters,
after all.
Code with fewer parameters (around 2-3) seems to use
registers fine, though.
In the same code, for
c = temp(a, b);
the corresponding assembly is:
4000000000000836: 30 0a 00 00 48 80       mov r35=1
400000000000083c: 24 00 00 90             mov r36=2
4000000000000840: 1d 00 00 00 01 00 [MFB] nop.m 0x0
4000000000000846: 00 00 00 02 00 00       nop.f 0x0
400000000000084c: e8 fd ff 58             br.call.sptk.many b0=4000000000000620 <temp>;;
The register stack has more than enough registers to
handle 26 inputs, 26 outputs and 26 local variables,
so why doesn't the compiler put all the locals into registers
and pass them on to printf?
> >>>>> Luck, Tony wrote on Tuesday, April 13, 2004 9:26 AM
> > >Rahul Chaturvedi wrote on Monday, April 12, 2004 10:17 PM
> > >d.) The pfs register seems to point to a location in
> > >0xcxxxxxxxxxxxxxxx. Shouldn't this point to some
> > >accessible memory location (from usermode)?
> >
> > someone else can take this one?
>
> ar.pfs is not a memory address pointer. See section 3.1.8.10 of
> ia64 software developer's manual, volume 1 for proper way to
> decode that register (FYI, bit [63-62] is previous privilege
> level, which is 3 in the above case).
>
> - Ken
Alright, okay, that also makes more sense now :)
I didn't look up the format of ar.pfs; I assumed it
was just a memory address.
* RE: Questions on the stack for IA64
2004-04-13 5:17 Questions on the stack for IA64 Rahul Chaturvedi
` (2 preceding siblings ...)
2004-04-14 5:05 ` Rahul Chaturvedi
@ 2004-04-14 5:19 ` David Mosberger
2004-04-14 5:29 ` Ian Wienand
` (2 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: David Mosberger @ 2004-04-14 5:19 UTC
To: linux-ia64
>>>>> On Tue, 13 Apr 2004 22:05:56 -0700 (PDT), Rahul Chaturvedi <justanotheraliasforrahul@yahoo.com> said:
Rahul> Does the compiler ever touch this backing store?
Not directly.
Rahul> I believe it is used during a stack unwind?
Yes (as is the memory stack and the registers).
Rahul> printf("", a, b, c, d, e, f, g, h, i, j, k, l, m, n,
Rahul> o, p, q, r, s, t, u, v, w, x, y, z, aa, bb, cc, dd, ee, ff,
Rahul> gg);
Rahul> As you can see, almost all the parameters are being "st4'd"
Rahul> to memory. From the architecture specification, shouldn't all
Rahul> these be moved into registers? I have a total of much lesser
Rahul> than 96 parameters?
You need to look at the software conventions & runtime architecture guide:
http://www.intel.com/design/itanium/downloads/245358.htm
It specifies that up to 8 registers are used for argument passing.
There are also some fine books that might help you get started. ;-)
--david
--
Interested in learning more about IA-64 Linux? Try http://www.lia64.org/book/
* Re: Questions on the stack for IA64
2004-04-13 5:17 Questions on the stack for IA64 Rahul Chaturvedi
` (3 preceding siblings ...)
2004-04-14 5:19 ` David Mosberger
@ 2004-04-14 5:29 ` Ian Wienand
2004-04-14 5:34 ` Rahul Chaturvedi
2004-04-14 16:06 ` Luck, Tony
6 siblings, 0 replies; 8+ messages in thread
From: Ian Wienand @ 2004-04-14 5:29 UTC
To: linux-ia64
On Tue, Apr 13, 2004 at 10:05:56PM -0700, Rahul Chaturvedi wrote:
> The register stack has more than enough registers to handle 26
> input's, 26 outputs and 26 local variables, so why would it not put
> all the locals into registers and pass them onto printf?
See
"Itanium Software Conventions and Runtime Architecture Guide"
http://www.intel.com/design/itanium/downloads/24535803.pdf
on page 43, where it says
"The contents of the first eight parameter slots are always passed in
registers, while the remaining parameters are always passed on the
memory stack"
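The rule quoted above can be sketched in C, assuming integer (non-floating-point) arguments only: slots 0-7 go in the caller's output registers, which start at r(32 + sol), and later slots go to the memory stack. The 16-byte starting offset is taken from Rahul's disassembly (the first stack argument, h, is stored at r12+16) rather than quoted from the guide, and the struct and function names are hypothetical.

```c
/* Illustrative sketch; names are hypothetical, integer arguments only. */
#include <assert.h>

/* Location of one integer argument slot at a call site. */
struct arg_loc {
    int in_register; /* 1 if passed in an output register               */
    int reg;         /* physical general register number, else -1       */
    int sp_offset;   /* byte offset from sp (r12) if in memory, else -1 */
};

/* slot: 0-based argument slot. first_out_reg: first output register of
   the caller's frame, i.e. 32 + sol (r35 in func, where sol = 3). */
static struct arg_loc arg_location(int slot, int first_out_reg)
{
    struct arg_loc loc = { 0, -1, -1 };
    assert(slot >= 0 && first_out_reg >= 32);
    if (slot < 8) {              /* first eight slots: out0..out7 */
        loc.in_register = 1;
        loc.reg = first_out_reg + slot;
    } else {                     /* remaining slots: memory stack above a 16-byte area */
        loc.sp_offset = 16 + 8 * (slot - 8);
    }
    return loc;
}
```

With first_out_reg = 35, slot 0 (the format string) maps to r35, slot 7 (g) to r42, slot 8 (h) to sp+16 and slot 26 (z) to sp+160, which lines up with the register moves and st4 stores in the posted disassembly.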
In general, your best references are probably everything on
http://www.intel.com/design/itanium/arch_spec.htm
and
http://www.lia64.org/book/
-i
ianw@gelato.unsw.edu.au
http://www.gelato.unsw.edu.au
* RE: Questions on the stack for IA64
2004-04-13 5:17 Questions on the stack for IA64 Rahul Chaturvedi
` (4 preceding siblings ...)
2004-04-14 5:29 ` Ian Wienand
@ 2004-04-14 5:34 ` Rahul Chaturvedi
2004-04-14 16:06 ` Luck, Tony
6 siblings, 0 replies; 8+ messages in thread
From: Rahul Chaturvedi @ 2004-04-14 5:34 UTC
To: linux-ia64
> Rahul> Does the compiler ever touch this backing store?
>
> Not directly.
Another question on this, who allocates the memory for
the backing store? What if the RSE runs out of memory
there?
>
> Rahul> I believe it is used during a stack unwind?
>
> Yes (as is the memory stack and the registers).
>
> Rahul> printf("", a, b, c, d, e, f, g, h, i, j, k, l, m, n,
> Rahul> o, p, q, r, s, t, u, v, w, x, y, z, aa, bb, cc, dd, ee, ff,
> Rahul> gg);
>
> Rahul> As you can see, almost all the parameters are being "st4'd"
> Rahul> to memory. From the architecture specification, shouldn't all
> Rahul> these be moved into registers? I have a total of much lesser
> Rahul> than 96 parameters?
>
> You need to look at the software conventions & runtime architecture guide:
>
> http://www.intel.com/design/itanium/downloads/245358.htm
>
> It specifies that up to 8 registers are used for argument passing.
>
Oh okay. I'll read the rest of the guide before I ask
more stupid questions on this :)
> There are also some fine books that might help you get started. ;-)
Books are expensive (especially in India) :( Processor
manuals and mailing lists are free :)
* RE: Questions on the stack for IA64
2004-04-13 5:17 Questions on the stack for IA64 Rahul Chaturvedi
` (5 preceding siblings ...)
2004-04-14 5:34 ` Rahul Chaturvedi
@ 2004-04-14 16:06 ` Luck, Tony
6 siblings, 0 replies; 8+ messages in thread
From: Luck, Tony @ 2004-04-14 16:06 UTC
To: linux-ia64
>Another question on this, who allocates the memory for
>the backing store? What if the RSE runs out of memory
>there?
For user processes, the kernel allocates space for the RSE backing store. If it
runs out, the kernel grows the vma (just as it does when you run out of regular
stack, except that the RSE grows up to higher addresses instead of down to lower
addresses).
In the kernel, the RSE backing store starts just above the task structure and
grows up towards the memory stack, which grows down from the pt_regs that are
allocated at the top of the pages allocated for the task. Running out of kernel
RSE (or stack) space results in clobbering the switch_stack, which lies in
between. There is a pretty ASCII picture in include/asm-ia64/ptrace.h.
-Tony
Thread overview: 8+ messages
2004-04-13 5:17 Questions on the stack for IA64 Rahul Chaturvedi
2004-04-13 16:25 ` Luck, Tony
2004-04-13 17:24 ` Chen, Kenneth W
2004-04-14 5:05 ` Rahul Chaturvedi
2004-04-14 5:19 ` David Mosberger
2004-04-14 5:29 ` Ian Wienand
2004-04-14 5:34 ` Rahul Chaturvedi
2004-04-14 16:06 ` Luck, Tony