From mboxrd@z Thu Jan 1 00:00:00 1970
From: Zack Weinberg
Date: Mon, 08 Mar 2004 18:08:09 +0000
Subject: Re: Possible race condition with deferred binding on IPF
Message-Id: <87wu5v1aba.fsf@egil.codesourcery.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: linux-ia64@vger.kernel.org

Cary Coutant writes:

>> Converting the ld8 to a ld8.acq is a simple matter of changing the
>> second line of this array to
>>
>>   0x00, 0x41, 0x3c, 0x70, 0x29, 0xc0,  /* ld8.acq r16=[r15],8 */
>
> Yes, this is the same bit pattern Steve Ellcey and I came up with.

Ok.  I'll see about testing this and submitting a proper patch.

> This code does not need to be patched. The two words loaded here point
> to the dynamic loader's BOR routine. The dynamic loader must provide
> the proper values in the linkage table before the program can run;
> these values will not change, so the ordering isn't important. Adding
> an ld.acq here would unnecessarily slow the code down.

Ok, thanks for the clarification.

> I don't see anything wrong with your reasoning, but changing this
> will have a binary compatibility impact, as the copy of gp to r14 is
> now part of the ABI, and will be present in inlined import stubs in
> existing .o files. I don't think gcc generates inlined import stubs at
> the moment, but I think Intel's compiler does.
>
> Too bad. It leaves me wondering why we didn't design it this way in
> the first place.

Understood.  I can still squeeze PLT0 down to two bundles by moving
the r2=r14 move into PLT1a, but I suspect that it only fits because I
didn't put in all the necessary stop bits.  Also it relies on being
able to express a relocation to PLT_RESERVE+8, which may not be
possible.  And I'm not sure whether this actually executes any faster.

(The idea is, since the ordering doesn't matter, to fetch the branch
target address first, and then the move to b6 can fit into that
bundle - but only if I don't need a stop bit between the load and the
move to b6.)

.PLT0:
        addl    r2 = @gprel(plt_reserve+8), r2
        ;;
        ld8     r17 = [r2], 8
        mov     b6 = r17
        ld8     r1 = [r2], -16
        ld8     r16 = [r2]
        br      b6
        ;;
.PLT1:
        addl    r15 = @pltoff(name1), r1
        ;;
        ld8.acq r16 = [r15], 8
        mov     r14 = r1
        ;;
        ld8     r1 = [r15]
        mov     b6 = r16
        br      b6
        ;;
.PLT1a:
        mov     r2 = r14
        mov     r15 = @iplt(name1)
        br      .PLT0
        ;;

zw
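
For anyone following the ld8.acq part of this thread, the ordering
that .PLT1 depends on can be sketched in C11 atomics. This is only a
rough model, not code from glibc or binutils: the struct and function
names are made up for illustration, and it assumes the dynamic loader
stores the new gp before publishing the new entry address with release
semantics. The acquire load of the entry word stands in for ld8.acq,
and the relaxed load of gp stands in for the plain ld8 that follows it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical model of one two-word linkage-table entry. */
struct fdesc {
    _Atomic uintptr_t entry;  /* branch target; the stub reads it first */
    _Atomic uintptr_t gp;     /* callee's gp; the stub reads it second  */
};

/* Loader side (assumed): write gp first, then publish the entry
   address with release ordering so the gp store cannot pass it. */
static void bind_entry(struct fdesc *d, uintptr_t entry, uintptr_t gp)
{
    atomic_store_explicit(&d->gp, gp, memory_order_relaxed);
    atomic_store_explicit(&d->entry, entry, memory_order_release);
}

/* Stub side: acquire-load the entry address (the ld8.acq), then load
   gp with an ordinary load.  If the new entry is observed, the
   acquire guarantees the matching gp is observed as well. */
static void load_entry(struct fdesc *d, uintptr_t *entry, uintptr_t *gp)
{
    *entry = atomic_load_explicit(&d->entry, memory_order_acquire);
    *gp = atomic_load_explicit(&d->gp, memory_order_relaxed);
}
```

With both loads plain, a reader could in principle see the new entry
address paired with the old gp, which is the race under discussion.
The PLT0 loads are different: as Cary notes above, those words are set
before the program runs and never change, so no ordering is needed
there.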