From mboxrd@z Thu Jan 1 00:00:00 1970
From: jamie@shareable.org (Jamie Lokier)
Date: Thu, 22 Apr 2010 01:14:17 +0100
Subject: udelay() broken for SMP cores?
In-Reply-To: <20100421205745.GI26616@n2100.arm.linux.org.uk>
References: <4BCE9E8B.2070103@codeaurora.org>
 <20100421072243.GA913@n2100.arm.linux.org.uk>
 <20100421095036.GA13971@n2100.arm.linux.org.uk>
 <20100421100008.GE13114@shareable.org>
 <20100421192911.GA26616@n2100.arm.linux.org.uk>
 <20100421195225.GS27575@shareable.org>
 <20100421202115.GH26616@n2100.arm.linux.org.uk>
 <20100421204718.GY27575@shareable.org>
 <20100421205745.GI26616@n2100.arm.linux.org.uk>
Message-ID: <20100422001417.GF27575@shareable.org>
To: linux-arm-kernel@lists.infradead.org
List-Id: linux-arm-kernel.lists.infradead.org

Russell King - ARM Linux wrote:
> > > Consider system performance where you're driving a bus using udelay()
> > > to provide 1us timings, but udelay ends up taking 10us instead every
> > > time because of the calculation for number of loops for a 1us timing.
> >
> > Hence nested loop.  You don't multiply.  No calculation.
>
> Ok, since you seem to have a clear idea how to convert this into a double
> nested loop, try converting it:
>
>         @ 0 <= r0 <= 0x7fffff06
>         ldr     r2, .LC0 (loops_per_jiffy)
>         ldr     r2, [r2]                @ max = 0x01ffffff
>         mov     r0, r0, lsr #14         @ max = 0x0001ffff
>         mov     r2, r2, lsr #10         @ max = 0x00007fff
>         mul     r0, r2, r0              @ max = 2^32-1
>         movs    r0, r0, lsr #6
>         moveq   pc, lr
> 1:      subs    r0, r0, #1
>         bhi     1b
>         mov     pc, lr
>
> into two loops without losing the precision - note that the multiply
> is part of a 'dividing by multiply+shift' technique.

        ldr     r2, loops_per_jiffy
        ldr     r3, microseconds_per_jiffy
        mov     r4, r2
1:      subs    r4, r4, r3
        bhi     1b
        subs    r0, r0, #1
        add     r4, r4, r2
        bhi     1b
        mov     pc, lr

Goodnight :)
- Jamie

(Admission: I wasn't thinking of high precision when I glibly said two
loops; your challenge prompted me to work it out, and I was pleasantly
surprised to see it come out so neatly.)
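
To see why the two-loop version loses no precision, here is a minimal C sketch that models its register arithmetic and counts how many times the inner `subs`/`bhi` pair would execute. It is only a model under stated assumptions: the function name `nested_delay_iterations` is made up for illustration, r0 is assumed to hold a plain microsecond count, and loops_per_jiffy is assumed to be at least microseconds_per_jiffy (at least one loop per microsecond), which is what keeps the running remainder in r4 positive at the top of each inner loop.

```c
#include <stdint.h>

/*
 * Hypothetical model of the two-loop delay routine: returns the number of
 * inner-loop iterations executed for a given delay.
 *
 *   usecs        models r0 (assumed to be a microsecond count)
 *   lpj          models r2 (loops_per_jiffy)
 *   us_per_jiffy models r3 (microseconds_per_jiffy)
 *
 * Assumption: lpj >= us_per_jiffy, otherwise the unsigned comparison in
 * the inner loop would see a wrapped (huge) accumulator.
 */
static uint64_t nested_delay_iterations(uint32_t usecs, uint32_t lpj,
                                        uint32_t us_per_jiffy)
{
    uint64_t iterations = 0;
    uint32_t acc = lpj;                     /* mov  r4, r2 */

    for (;;) {
        uint32_t old_acc;

        /* Inner loop: burn this microsecond's share of loops. */
        do {
            old_acc = acc;
            acc -= us_per_jiffy;            /* subs r4, r4, r3 */
            iterations++;
        } while (old_acc > us_per_jiffy);   /* bhi  1b (unsigned higher) */

        uint32_t old_usecs = usecs;
        usecs -= 1;                         /* subs r0, r0, #1 */
        acc += lpj;                         /* add  r4, r4, r2 (flags kept) */
        if (!(old_usecs > 1))               /* bhi  1b tests the r0 subs */
            break;
    }
    return iterations;
}
```

For example, `nested_delay_iterations(4, 25000, 10000)` returns 10, which is 4 * 25000 / 10000 exactly: the four microseconds get 3, 2, 3 and 2 inner iterations, because the remainder left in the accumulator is carried into the next microsecond instead of being discarded. In general the model yields the ceiling of usecs * lpj / us_per_jiffy without ever performing a multiply or divide. It also shows that a request of 0 still runs one microsecond's worth of loops, since the inner loop is entered before r0 is tested.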