From mboxrd@z Thu Jan 1 00:00:00 1970 From: Stephen Warren Date: Fri, 08 May 2015 10:06:41 -0600 Subject: [U-Boot] [PATCH 4/4] ARM: bcm283x: Switch to generic timer In-Reply-To: References: <1430772877-7301-1-git-send-email-marex@denx.de> <201505060137.45642.marex@denx.de> <554A38C5.9070206@wwwdotorg.org> <201505062013.57612.marex@denx.de> Message-ID: <554CDF11.9060106@wwwdotorg.org> List-Id: MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: u-boot@lists.denx.de On 05/06/2015 01:51 PM, Tyler Baker wrote: > On 6 May 2015 at 11:13, Marek Vasut wrote: >> On Wednesday, May 06, 2015 at 05:52:37 PM, Stephen Warren wrote: >> [...] >>>>>> So, if now is close to 0x7fffffff (which it can), then if endtime is >>>>>> big-ish, diff will become negative and this udelay() will not perform >>>>>> the correct delay, right ? >>>>> >>>>> I don't believe so, no. >>>>> >>>>> endtime and now are both unsigned. My (admittedly intuitive rather than >>>>> well-researched) understanding of C math promotion rules means that >>>>> "endtime - now" will be calculated as an unsigned value, then converted >>>>> into a signed value to be stored in the signed diff. As such, I would >>>>> expect the value of diff to be a small value in this case. I wrote a >>>>> test program to validate this; endtime = 0x80000002, now = 0x7ffffffe, >>>>> yields diff=4 as expected. >>>>> >>>>> Perhaps you meant a much larger endtime value than 0x80000002; perhaps >>>>> 0xffffffff? This doesn't cause issues either. All that's relevant is the >>>>> difference between endtime and now, not their absolute values, and not >>>>> whether endtime has wrapped but now has or hasn't. For example, endtime >>>>> = 0x00000002, now = 0xfffffff0 yields diff=18 as expected. >>>> >>>> So what if the difference is bigger than 1 << 31 ? >>> >>> As I said, I don't believe that case is relevant; it can only happen if >>> passing ridiculously large delay values into __udelay() (i.e. greater >>> than the 1<<31value you mention), and I don't believe there's any need >>> to support that. >> >> So what you say is that it's OK to have a function which is buggy in >> corner cases ? >> >>> The implementation in lib/time.c probably has exactly the same problem, >>> except that since it uses 64-bit math rather than 32-bit math, so the >>> issue happens at 1<<63 rather than 1<<31. It's probably equally >>> problematic for delay values as large as 1<<63:-) In practice, given >>> 1<<31 us is so large, I don't think there's any practical difference. >> >> The implementation in lib/time.c uses 32bit usec argument though, so >> it's not prone to this overflow. Please correct me if I'm wrong. >> >>>>>>> Besides, what's passing a value >~36 minutes to udelay()? >>>>>> >>>>>> Nothing, but that doesn't mean we can have a possibly broken >>>>>> implementation, right ? >>>>> >>>>> True. However, I'd expect that any specification for udelay would >>>>> disallow such large parameter values, and hence its behaviour wouldn't >>>>> be relevant if such values were passed. >>>> >>>> Do you think you can pick this patch and drop the "fixes overflow" part >>>> or do you need resubmission ? >>> >>> Tom Rini (or in the past Albert Aribaud) actually apply the patches. >>> >>> Re: the patch description: I'd certainly be happy if it was re-written >>> to say something more like "replace bcm2835-specific timer logic with >>> common code to reduce the number of different implementations for the >>> same thing". >> >> Tom, do you want a repost ? >> >>> I think you'd mentioned on IRC that this change fixed something >>> USB-related for you, and I still don't understand how that could be >>> possible. Perhaps there's some intermittent problem, and it just >>> happened not to show up when you tested after this patch? >> >> I think Tyler can elaborate on that, but in his test case, he still >> triggers the USB issue. > > I'll provide some context on the issue I'm fighting... > > I recently bought a RPi B+ Model, flashed the latest raspbian image[0] > to an sd card, built the master branch (v2015.05+) u-boot and > overwrote kernel.img with u-boot.bin. U-Boot came right up but I was > unable to obtain a DHCP lease after using 'usb start; dhcp'. I ran > tcpdump and saw the DHCP requests being made, but shortly after the > board was seemly ignoring responses from the DHCP server. Now > sometimes if the response from the DHCP server was quick enough, it > could get a lease, but the tftp transfer would stall. I decided it > would be best to set 'ipaddr, gateway, netmask' to see if this was a > DHCP issue. It turned out that I had no network connectivity even when > I configured the ip address statically. > > Could not obtain a lease: > > starting USB... > USB0: Core Release: 2.80a > scanning bus 0 for devices... 3 USB Device(s) found > scanning usb for storage devices... 0 Storage Device(s) found > scanning usb for ethernet devices... 1 Ethernet Device(s) found > Waiting for Ethernet connection... done. > BOOTP broadcast 1 > BOOTP broadcast 2 > BOOTP broadcast 3 > BOOTP broadcast 4 > BOOTP broadcast 5 > BOOTP broadcast 6 > BOOTP broadcast 7 > BOOTP broadcast 8 > Retry time exceeded; starting again > Bad Linux ARM zImage magic! > > Obtains a lease, but stalls on transfer: > > starting USB... > USB0: Core Release: 2.80a > scanning bus 0 for devices... 3 USB Device(s) found > scanning usb for storage devices... 0 Storage Device(s) found > scanning usb for ethernet devices... 1 Ethernet Device(s) found > Waiting for Ethernet connection... done. > BOOTP broadcast 1 > DHCP client bound to address 192.168.2.55 (1328 ms) > Waiting for Ethernet connection... done. > Using sms0 device > TFTP from server 192.168.2.2; our IP address is 192.168.2.55 > Filename 'tmp2rkX_N/.zImage'. > Load address: 0x1000000 > Loading: * T T T T T T T T T T > Retry count exceeded; starting again > Bad Linux ARM zImage magic! > > At this point I reached out for help on IRC and that is when Marek and > I starting chatting about this. Hacking around, I found that using > v2015.01 I was able to obtain a lease and transfer files 100% of the > time (although it seems very slow). Thinking that we have a good and > bad commit, this would be easy to bisect. Wrong, we both tried and got > different results. It seems that as you move from v2015.01 -> HEAD > master this issue becomes very intermittent and thus hard to pin down. > > So my test case for this issue has become... > > * Obtain lease > * Transfer kernel, dtb, ramdisk without stalling/timing out > * Do this 10 times in a row with a power cycle in between > > Hope this help clarify the situation in some way, OK, but if you apply Marek's change to replace the timer implementation with the one in lib/time.c, does that reliably fix the issue when tested over a large number of runs with/without that change? With the current explanation, I can't see how it possibly could. Equally, I can't see why the move from v2015.01 to HEAD would affect the issue if it was caused by the timer implementation. My suspicion is that there's something else entirely going on.