* Questions on the stack for IA64
@ 2004-04-13 5:17 Rahul Chaturvedi
2004-04-13 16:25 ` Luck, Tony
` (6 more replies)
0 siblings, 7 replies; 8+ messages in thread
From: Rahul Chaturvedi @ 2004-04-13 5:17 UTC (permalink / raw)
To: linux-ia64
(***I would appreciate it if you'd cc your reply to
this mail address since I am not a member of this
list***)
I have been trying to figure out exactly how the stack
mechanisms available on IA-64 work internally for a
while now. I've made a few observations which I had a
few questions on,
a.) The stack pointer seems to be r12, which is
pointing to a location along the lines of,
0x60000fffffffxxxx
Shouldn't this address be in the 0x8xxxxxxxxxxxxxxx
range? Isn't the 0x6xxxxxxxxxxxxxxx range reserved for
the data segment?
I also checked the address of a global variable, and
it was stored in the 0x6xxxxxxxxxxxxxxx range.
Wouldn't the stack and the data segment conflict?
b.) There is something called $GP and $TP that the
compiler defines as r1 and r13 respectively. Anyone
got any idea what they are?
c.) How exactly is the data stored in the bspstore
locations?
d.) The pfs register seems to point to a location in
0xcxxxxxxxxxxxxxxx. Shouldn't this point to some
accessible memory location (from usermode)?
e.) For some reason, compiled code, even with full
optimizations seems to do a "st8" on any value I put
in a local variable. For example, I have a function
with about 40 variables, the intel compiler that I am
using does something like,
mov r47, value
st8 [r48], r47
GCC also does something similar.
Shouldn't the compiler simply store the value in r47
instead of st8'ing it to memory when using full
optimizations (-O2)?
Please forgive me if my questions seem a bit naive, my
experience with IA64 is very limited.
I'd also like to remind you to please CC this address
on your reply :)
Thanks
* RE: Questions on the stack for IA64
2004-04-13 5:17 Questions on the stack for IA64 Rahul Chaturvedi
@ 2004-04-13 16:25 ` Luck, Tony
2004-04-13 17:24 ` Chen, Kenneth W
` (5 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: Luck, Tony @ 2004-04-13 16:25 UTC (permalink / raw)
To: linux-ia64
>-----Original Message-----
>From: linux-ia64-owner@vger.kernel.org
>[mailto:linux-ia64-owner@vger.kernel.org] On Behalf Of Rahul Chaturvedi
>Sent: Monday, April 12, 2004 10:17 PM
>a.) The stack pointer seems to be r12, which is
>pointing to a location along the lines of,
>
>0x60000fffffffxxxx
>
>Shouldn't this address be in the 0x8xxxxxxxxxxxxxxx
>range? Isn't the 0x6xxxxxxxxxxxxxxx range reserved for
>the data segment?
The user stack was moved from region 4 (0x8....) to region 3
(0x6....) to free up region 4 for use by the hugetlbfs code
(using large page size like 256M for very large objects to
reduce TLB overhead).
>I also checked the address of a global variable, and
>it was stored in the 0x6xxxxxxxxxxxxxxx range.
>Wouldn't the stack and the data segment conflict?
64-bit address space is very large ... so the data and stack
are still very far apart, even though they have addresses in
region 3.
>b.) There is something called $GP and $TP that the
>compiler defines as r1 and r13 respectively. Anyone
>got any idea what they are?
r1 is used for simple access to some of the data, without
it you would need to use "movl reg=<64-bit value>" to get
the address of a global. Using it, many data objects can
be accessed using relative addressing from the $GP, a.k.a.
__gp address. $TP is used in the same way for thread local
storage.
>c.) How exactly is the data stored in the bspstore
>locations?
The processor writes out stacked registers to the bspstore
area "in the background" while other execution is happening.
>d.) The pfs register seems to point to a location in
>0xcxxxxxxxxxxxxxxx. Shouldn't this point to some
>accessible memory location (from usermode)?
someone else can take this one?
>e.) For some reason, compiled code, even with full
>optimizations seems to do a "st8" on any value I put
>in a local variable. For example, I have a function
>with about 40 variables, the intel compiler that I am
>using does something like,
>
>mov r47, value
>st8 [r48], r47
>
>GCC also does something similar.
>
>Shouldn't the compiler simply store the value in r47
>instead of st8'ing it to memory when using full
>optimizations (-O2)?
Can you provide a sample of C-code that results in this
excessive storing to memory?
-Tony
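A side note on the region numbers mentioned in this reply: on IA-64 the top three bits of a 64-bit virtual address (bits 63:61) select one of eight regions, which is why 0x6... falls in region 3 and 0x8... in region 4. A minimal sketch in C, assuming nothing beyond that bit layout; the helper name ia64_region is made up for illustration and is not a kernel API:

```c
#include <stdint.h>

/* Hypothetical helper (not a kernel function): the region number of an
 * IA-64 virtual address is held in bits 63:61. */
static unsigned ia64_region(uint64_t vaddr)
{
    return (unsigned)(vaddr >> 61);
}
```

With this helper, any address of the form 0x6... maps to region 3 and any address of the form 0x8... maps to region 4, matching the numbers quoted above.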
* RE: Questions on the stack for IA64
2004-04-13 5:17 Questions on the stack for IA64 Rahul Chaturvedi
2004-04-13 16:25 ` Luck, Tony
@ 2004-04-13 17:24 ` Chen, Kenneth W
2004-04-14 5:05 ` Rahul Chaturvedi
` (4 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: Chen, Kenneth W @ 2004-04-13 17:24 UTC (permalink / raw)
To: linux-ia64
>>>>> Luck, Tony wrote on Tuesday, April 13, 2004 9:26 AM
> >Rahul Chaturvedi wrote on Monday, April 12, 2004 10:17 PM
> >d.) The pfs register seems to point to a location in
> >0xcxxxxxxxxxxxxxxx. Shouldn't this point to some
> >accessible memory location (from usermode)?
>
> someone else can take this one?
ar.pfs is not a memory address pointer. See section 3.1.8.10 of
ia64 software developer's manual, volume 1 for proper way to
decode that register (FYI, bit [63-62] is previous privilege level,
which is 3 in the above case).
- Ken
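To make the decoding concrete, here is a rough sketch in C of pulling a few fields out of an ar.pfs value. It assumes the layout given in the architecture manual: the previous frame marker in bits 37:0 (with sof in bits 6:0, sol in bits 13:7 and sor in bits 17:14), the previous epilog count in bits 57:52 and the previous privilege level in bits 63:62. The struct and function names are invented for illustration.

```c
#include <stdint.h>

/* Illustrative container for a few ar.pfs fields (names are made up). */
struct pfs_fields {
    unsigned sof; /* size of previous frame,          bits 6:0   */
    unsigned sol; /* size of locals in that frame,    bits 13:7  */
    unsigned sor; /* size of rotating portion / 8,    bits 17:14 */
    unsigned pec; /* previous epilog count,           bits 57:52 */
    unsigned ppl; /* previous privilege level,        bits 63:62 */
};

/* Hypothetical decoder: extract each field with a shift and a mask. */
static struct pfs_fields decode_pfs(uint64_t pfs)
{
    struct pfs_fields f;
    f.sof = (unsigned)(pfs & 0x7f);
    f.sol = (unsigned)((pfs >> 7) & 0x7f);
    f.sor = (unsigned)((pfs >> 14) & 0xf);
    f.pec = (unsigned)((pfs >> 52) & 0x3f);
    f.ppl = (unsigned)((pfs >> 62) & 0x3);
    return f;
}
```

A value whose top hex digit is 0xc has both of bits 63:62 set, so decode_pfs reports ppl = 3 (user level) for it, which is why such a value can look like a 0xc... address without being one.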
* RE: Questions on the stack for IA64
2004-04-13 5:17 Questions on the stack for IA64 Rahul Chaturvedi
2004-04-13 16:25 ` Luck, Tony
2004-04-13 17:24 ` Chen, Kenneth W
@ 2004-04-14 5:05 ` Rahul Chaturvedi
2004-04-14 5:19 ` David Mosberger
` (3 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: Rahul Chaturvedi @ 2004-04-14 5:05 UTC (permalink / raw)
To: linux-ia64
> >c.) How exactly is the data stored in the bspstore
> >locations?
>
> The processor writes out stacked registers to the bspstore
> area "in the background" while other execution is happening.
Does the compiler ever touch this backing store? I
believe it is used during a stack unwind?
> >b.) There is something called $GP and $TP that the
> >compiler defines as r1 and r13 respectively. Anyone
> >got any idea what they are?
>
> r1 is used for simple access to some of the data, without
> it you would need to use "movl reg=<64-bit value>" to get
> the address of a global. Using it, many data objects can
> be accessed using relative addressing from the $GP, a.k.a.
> __gp address. $TP is used in the same way for thread local
> storage.
Thanks, the code makes a lot more sense now :)
> Can you provide a sample of C-code that results in this
> excessive storing to memory?
(I hope this e-mail is still readable after all this
code pasted in here =\)
+++------------------------------+++
int func(char* pszTest, int iTest, long lTest)
{
    int a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q,
        r, s, t, u, v, w, x, y, z;
    a = 1; b = 2; c = 3; d = 4; e = 5; f = 6; g = 7;
    h = 8; i = 9; j = 10; k = 11; l = 12; m = 13; n = 14;
    o = 15; p = 16; q = 17; r = 18; s = 19; t = 20;
    u = 21; v = 22; w = 23; x = 24; y = 25; z = 26;
    printf("", a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p,
           q, r, s, t, u, v, w, x, y, z, aa, bb, cc, dd, ee, ff, gg);
    c = temp(a, b);
    return 0;
}
+++------------------------------+++
This code when compiled with gcc -g -O2 gives me this
assembly,
+++------------------------------+++
(From objdump)
The function starts with,
int func(char* pszTest, int iTest, long lTest)
{
4000000000000680: 00 08 2d 06 80 05 [MII] alloc r33=ar.pfs,11,3,0
Then corresponding to these lines,
    int a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q,
        r, s, t, u, v, w, x, y, z;
    a = 1; b = 2; c = 3; d = 4; e = 5; f = 6; g = 7;
    h = 8; i = 9; j = 10; k = 11; l = 12; m = 13; n = 14;
    o = 15; p = 16; q = 17; r = 18; s = 19; t = 20;
    u = 21; v = 22; w = 23; x = 24; y = 25; z = 26;
    printf("", a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p,
           q, r, s, t, u, v, w, x, y, z, aa, bb, cc, dd, ee, ff, gg);
This is the assembly code,
4000000000000686: c0 00 33 7c 46 c0       adds r12=-160,r12
400000000000068c: 81 00 00 90             mov r14=8
4000000000000690: 09 28 09 00 00 24 [MMI] mov r37=2
4000000000000696: 60 1a 00 00 48 e0       mov r38=3
400000000000069c: 44 00 00 90             mov r39=4;;
40000000000006a0: 00 78 40 18 00 21 [MII] adds r15=16,r12
40000000000006a6: 00 c1 30 00 42 20       adds r16=24,r12
40000000000006ac: 02 62 00 84             adds r17=32,r12
40000000000006b0: 00 90 a0 18 00 21 [MII] adds r18=40,r12
40000000000006b6: 30 81 31 00 42 80       adds r19=48,r12
40000000000006bc: 82 63 00 84             adds r20=56,r12
40000000000006c0: 09 a8 00 19 00 21 [MMI] adds r21=64,r12
40000000000006c6: 60 41 32 00 42 00       adds r22=72,r12
40000000000006cc: 00 00 04 00             nop.i 0x0;;
40000000000006d0: 00 00 38 1e 90 11 [MII] st4 [r15]=r14
40000000000006d6: e0 48 00 00 48 e0       mov r14=9
40000000000006dc: 02 65 00 84             adds r23=80,r12
40000000000006e0: 0a c0 60 19 00 21 [MMI] adds r24=88,r12;;
40000000000006e6: 00 70 40 20 23 c0       st4 [r16]=r14
40000000000006ec: a1 00 00 90             mov r14=10
40000000000006f0: 02 c8 80 19 00 21 [MII] adds r25=96,r12
40000000000006f6: a0 41 33 00 42 60       adds r26=104,r12;;
40000000000006fc: 03 67 00 84             adds r27=112,r12
4000000000000700: 09 00 38 22 90 11 [MMI] st4 [r17]=r14
4000000000000706: e0 58 00 00 48 80       mov r14=11
400000000000070c: 83 67 00 84             adds r28=120,r12;;
4000000000000710: 00 00 38 24 90 11 [MII] st4 [r18]=r14
4000000000000716: e0 60 00 00 48 a0       mov r14=12
400000000000071c: 03 60 04 84             adds r29=128,r12
4000000000000720: 0a f0 20 18 01 21 [MMI] adds r30=136,r12;;
4000000000000726: 00 70 4c 20 23 c0       st4 [r19]=r14
400000000000072c: d1 00 00 90             mov r14=13
4000000000000730: 02 f8 40 18 01 21 [MII] adds r31=144,r12
4000000000000736: 80 c0 30 02 42 20       adds r8=152,r12;;
400000000000073c: 01 62 04 84             adds r9=160,r12
4000000000000740: 09 00 38 28 90 11 [MMI] st4 [r20]=r14
4000000000000746: e0 70 00 00 48 60       mov r14=14
400000000000074c: 04 0d 00 90             addl r35=80,r1;;
4000000000000750: 00 00 38 2a 90 11 [MII] st4 [r21]=r14
4000000000000756: e0 78 00 00 48 40       mov r14=15
400000000000075c: 04 08 00 84             mov r34=r1
4000000000000760: 0a 40 15 00 00 24 [MMI] mov r40=5;;
4000000000000766: 00 70 58 20 23 c0       st4 [r22]=r14
400000000000076c: 01 01 00 90             mov r14=16
4000000000000770: 02 48 19 00 00 24 [MII] mov r41=6
4000000000000776: a0 3a 00 00 48 00       mov r42=7;;
400000000000077c: 04 00 c4 00             mov r32=b0
4000000000000780: 09 00 38 2e 90 11 [MMI] st4 [r23]=r14
4000000000000786: e0 88 00 00 48 80       mov r14=17
400000000000078c: 14 00 00 90             mov r36=1;;
4000000000000790: 00 00 38 30 90 11 [MII] st4 [r24]=r14
4000000000000796: e0 90 00 00 48 00       mov r14=18
400000000000079c: 00 00 04 00             nop.i 0x0
40000000000007a0: 0b 18 01 46 18 10 [MMI] ld8 r35=[r35];;
40000000000007a6: 00 70 64 20 23 c0       st4 [r25]=r14
40000000000007ac: 31 01 00 90             mov r14=19;;
40000000000007b0: 02 00 38 34 90 11 [MII] st4 [r26]=r14
40000000000007b6: e0 a0 00 00 48 00       mov r14=20;;
40000000000007bc: 00 00 04 00             nop.i 0x0
40000000000007c0: 02 00 38 36 90 11 [MII] st4 [r27]=r14
40000000000007c6: e0 a8 00 00 48 00       mov r14=21;;
40000000000007cc: 00 00 04 00             nop.i 0x0
40000000000007d0: 02 00 38 38 90 11 [MII] st4 [r28]=r14
40000000000007d6: e0 b0 00 00 48 00       mov r14=22;;
40000000000007dc: 00 00 04 00             nop.i 0x0
40000000000007e0: 02 00 38 3a 90 11 [MII] st4 [r29]=r14
40000000000007e6: e0 b8 00 00 48 00       mov r14=23;;
40000000000007ec: 00 00 04 00             nop.i 0x0
40000000000007f0: 02 00 38 3c 90 11 [MII] st4 [r30]=r14
40000000000007f6: e0 c0 00 00 48 00       mov r14=24;;
40000000000007fc: 00 00 04 00             nop.i 0x0
4000000000000800: 02 00 38 3e 90 11 [MII] st4 [r31]=r14
4000000000000806: e0 c8 00 00 48 00       mov r14=25;;
400000000000080c: 00 00 04 00             nop.i 0x0
4000000000000810: 02 00 38 10 90 11 [MII] st4 [r8]=r14
4000000000000816: e0 d0 00 00 48 00       mov r14=26;;
400000000000081c: 00 00 04 00             nop.i 0x0
4000000000000820: 11 00 38 12 90 11 [MIB] st4 [r9]=r14
4000000000000826: 00 00 00 02 00 00       nop.i 0x0
400000000000082c: 28 fc ff 58             br.call.sptk.many b0=4000000000000440 <_init+0x150>;;
4000000000000830: 00 08 00 44 00 21 [MII] mov r1=r34
+++------------------------------+++
As you can see, almost all the parameters are being
"st4'd" to memory. From the architecture specification,
shouldn't all these be moved into registers? I have a
total of much lesser than 96 parameters?
Code with lesser parameters (around 2-3) seems to use
registers fine though.
In the same code, corresponding to,
    c = temp(a, b);
The corresponding assembly is,
4000000000000836: 30 0a 00 00 48 80       mov r35=1
400000000000083c: 24 00 00 90             mov r36=2
4000000000000840: 1d 00 00 00 01 00 [MFB] nop.m 0x0
4000000000000846: 00 00 00 02 00 00       nop.f 0x0
400000000000084c: e8 fd ff 58             br.call.sptk.many b0=4000000000000620 <temp>;;
The register stack has more than enough registers to
handle 26 input's, 26 outputs and 26 local variables, so
why would it not put all the locals into registers and
pass them onto printf?
>>>>> Luck, Tony wrote on Tuesday, April 13, 2004 9:26 AM
> >Rahul Chaturvedi wrote on Monday, April 12, 2004 10:17 PM
> >d.) The pfs register seems to point to a location in
> >0xcxxxxxxxxxxxxxxx. Shouldn't this point to some
> >accessible memory location (from usermode)?
>
> someone else can take this one?
ar.pfs is not a memory address pointer. See section 3.1.8.10 of
ia64 software developer's manual, volume 1 for proper way to
decode that register (FYI, bit [63-62] is previous privilege level,
which is 3 in the above case).
- Ken
Alright, okay, that also makes more sense now :) I
didn't look for the format for the pfs, I assumed it
was just a memory address.
* RE: Questions on the stack for IA64
2004-04-13 5:17 Questions on the stack for IA64 Rahul Chaturvedi
` (2 preceding siblings ...)
2004-04-14 5:05 ` Rahul Chaturvedi
@ 2004-04-14 5:19 ` David Mosberger
2004-04-14 5:29 ` Ian Wienand
` (2 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: David Mosberger @ 2004-04-14 5:19 UTC (permalink / raw)
To: linux-ia64
>>>>> On Tue, 13 Apr 2004 22:05:56 -0700 (PDT), Rahul Chaturvedi <justanotheraliasforrahul@yahoo.com> said:
Rahul> Does the compiler ever touch this backing store?
Not directly.
Rahul> I believe it is used during a stack unwind?
Yes (as is the memory stack and the registers).
Rahul> printf("", a, b, c, d, e, f, g, h, i, j, k, l, m, n,
Rahul> o, p, q, r, s, t, u, v, w, x, y, z, aa, bb, cc, dd, ee, ff,
Rahul> gg);
Rahul> As you can see, almost all the parameters are being "st4'd"
Rahul> to memory. From the architecture specification, shouldn't all
Rahul> these be moved into registers? I have a total of much lesser
Rahul> than 96 parameters?
You need to look at the software conventions & runtime architecture
guide:
http://www.intel.com/design/itanium/downloads/245358.htm
It specifies that up to 8 registers are used for argument passing.
There are also some fine books that might help you get started. ;-)
--david
--
Interested in learning more about IA-64 Linux? Try http://www.lia64.org/book/
* Re: Questions on the stack for IA64
2004-04-13 5:17 Questions on the stack for IA64 Rahul Chaturvedi
` (3 preceding siblings ...)
2004-04-14 5:19 ` David Mosberger
@ 2004-04-14 5:29 ` Ian Wienand
2004-04-14 5:34 ` Rahul Chaturvedi
2004-04-14 16:06 ` Luck, Tony
6 siblings, 0 replies; 8+ messages in thread
From: Ian Wienand @ 2004-04-14 5:29 UTC (permalink / raw)
To: linux-ia64
On Tue, Apr 13, 2004 at 10:05:56PM -0700, Rahul Chaturvedi wrote:
> The register stack has more than enough registers to handle 26
> input's, 26 outputs and 26 local variables, so why would it not put
> all the locals into registers and pass them onto printf?
See "Itanium Software Conventions and Runtime Architecture Guide"
http://www.intel.com/design/itanium/downloads/24535803.pdf
on page 43, where it says
"The contents of the first eight parameter slots are always passed in
registers, while the remaining parameters are always passed on the
memory stack"
In general, your best references are probably everything on
http://www.intel.com/design/itanium/arch_spec.htm
and
http://www.lia64.org/book/
-i
ianw@gelato.unsw.edu.au
http://www.gelato.unsw.edu.au
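The rule quoted here can be sketched as a small C helper that says, for a given parameter slot, whether it travels in a register or at which offset from the stack pointer it is stored. This is only a rough model under stated assumptions: every slot is 8 bytes, the first memory slot sits just above a 16-byte scratch area at sp, and floating-point and aggregate arguments are ignored. The function and constant names are made up for illustration.

```c
/* Assumed constants for this sketch (simplified model of the convention). */
enum {
    REG_SLOTS     = 8,  /* first eight parameter slots go in registers */
    SCRATCH_BYTES = 16, /* scratch area at sp, below the memory slots  */
    SLOT_BYTES    = 8   /* each parameter slot is 8 bytes wide         */
};

/* Hypothetical helper: returns -1 when the 0-based slot is passed in a
 * register, otherwise the byte offset from sp at the point of the call. */
static long arg_stack_offset(unsigned slot)
{
    if (slot < REG_SLOTS)
        return -1;
    return SCRATCH_BYTES + (long)(slot - REG_SLOTS) * SLOT_BYTES;
}
```

For the printf call discussed in this thread, the format string occupies slot 0 and the values 1 to 7 occupy slots 1 to 7, so the values 8 to 26 land in slots 8 to 26. Under this model those map to sp+16 up to sp+160, which appears consistent with the store addresses in the posted disassembly.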
* RE: Questions on the stack for IA64
2004-04-13 5:17 Questions on the stack for IA64 Rahul Chaturvedi
` (4 preceding siblings ...)
2004-04-14 5:29 ` Ian Wienand
@ 2004-04-14 5:34 ` Rahul Chaturvedi
2004-04-14 16:06 ` Luck, Tony
6 siblings, 0 replies; 8+ messages in thread
From: Rahul Chaturvedi @ 2004-04-14 5:34 UTC (permalink / raw)
To: linux-ia64
> Rahul> Does the compiler ever touch this backing store?
>
> Not directly.
Another question on this, who allocates the memory for
the backing store? What if the RSE runs out of memory
there?
>
> Rahul> I believe it is used during a stack unwind?
>
> Yes (as is the memory stack and the registers).
>
> Rahul> printf("", a, b, c, d, e, f, g, h, i, j, k, l, m, n,
> Rahul> o, p, q, r, s, t, u, v, w, x, y, z, aa, bb, cc, dd, ee, ff,
> Rahul> gg);
>
> Rahul> As you can see, almost all the parameters are being "st4'd"
> Rahul> to memory. From the architecture specification, shouldn't all
> Rahul> these be moved into registers? I have a total of much lesser
> Rahul> than 96 parameters?
>
> You need to look at the software conventions &
> runtime architecture guide:
>
> http://www.intel.com/design/itanium/downloads/245358.htm
>
> It specifies that up to 8 registers are used for
> argument passing.
>
Oh okay. I'll read the rest of the guide before I ask
more stupid questions on this :)
> There are also some fine books that might help you
> get started. ;-)
Books are expensive (especially in India) :( Processor
manuals and mailing lists are free :)
* RE: Questions on the stack for IA64
2004-04-13 5:17 Questions on the stack for IA64 Rahul Chaturvedi
` (5 preceding siblings ...)
2004-04-14 5:34 ` Rahul Chaturvedi
@ 2004-04-14 16:06 ` Luck, Tony
6 siblings, 0 replies; 8+ messages in thread
From: Luck, Tony @ 2004-04-14 16:06 UTC (permalink / raw)
To: linux-ia64
>Another question on this, who allocates the memory for
>the backing store? What if the RSE runs out of memory
>there?
For the user ... the kernel allocates space for RSE. If it
runs out, then the kernel will grow the vma (just like it
does when you run out of regular stack ... except the RSE
grows up to higher addresses instead of down to lower
addresses).
In the kernel the RSE starts just above the task structure
and grows up towards the stack, which is growing down from
the pt_regs that are allocated at the top of the pages
allocated for the task. Running out of kernel RSE (or stack)
results in clobbering the switch_stack which lies in between.
There is a pretty ascii picture in include/asm-ia64/ptrace.h
-Tony