From mboxrd@z Thu Jan 1 00:00:00 1970
From: Stephen Warren
Date: Fri, 06 Feb 2015 11:06:55 -0700
Subject: [U-Boot] netconsole: USB Ethernet connection dropping with ping or tftpboot
In-Reply-To: <1423184798.1232.63.camel@posteo.de>
References: <1422999858.2688.14.camel@posteo.de> <1423135265.26058.11.camel@posteo.de> <54D38D2E.50102@wwwdotorg.org> <1423174204.1232.25.camel@posteo.de> <54D3ED62.7030208@wwwdotorg.org> <1423184798.1232.63.camel@posteo.de>
Message-ID: <54D502BF.7070401@wwwdotorg.org>
List-Id:
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: u-boot@lists.denx.de

On 02/05/2015 06:06 PM, Jörg Krause wrote:
> On Do, 2015-02-05 at 15:23 -0700, Stephen Warren wrote:
>>
>> b) In ci_bounce(), the bounce buffer is only allocated if the
>> user-buffer is already aligned, and if a large-enough bounce buffer
>> wasn't previously allocated. If ci_req->b_buf was uninitialized it could
>> be non-zero (thus preventing the expected aligned allocation) yet not
>> actually aligned enough.
>
> I can reproduce this issue now. After some "timeout sending packets to
> usb ethernet" messages, the bounce buffer somehow gets corrupted.
> ci_bounce() is called with an unaligned input buffer length
> 'req->length=66', but the bounce buffer length
> 'ci_req->b_len=1140305940' or in hex 'ci_req->b_len=0x43f7b014'. This
> bounce buffer length is obviously an address, as the following
> misaligned error message shows: "CACHE: Misaligned operation at range
> [43f7b010, 43f7b070]".

Ah, I hadn't realized that was [start, length] rather than [start, end].

The question is: How is ci_req->b_len getting corrupted? Is it simply
never initialized, or does something trash that value later?

ci_ep_alloc_request() appears to calloc() the whole struct ci_req, so I
imagine an initialization/allocating error isn't happening.
The only issue there might be some code somehow creating its own struct
usb_request instead of calling into the controller's ->alloc_request()
function. I vaguely recall fixing some of those, but might have missed
some in protocols that I didn't test (i.e. anything other than USB Mass
Storage or DFU, although I might have very briefly tested netconsole
once?).

I would suggest adding a whole ton of printfs() to catch where ci_reqs
are being allocated, and where ci_req->b_len is getting written in which
ci_req objects, and then mapping that back to the ci_req that the cache
alignment error message complains about. Sorry, this will be a bit
painful. If the ci_req is always at the same address on different boots
of the code, that will make it easier, especially if you have a debugger
with a data watchpoint, or can write some code to use any data
watchpoint self-hosted debug capability in your CPU.
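
To make the printf suggestion a bit more concrete, here is a rough,
standalone sketch of the kind of tracing I mean. It is not the ci_udc
code: struct demo_req, the demo_req_*() helpers and the 64 KiB
plausibility limit are all made up for illustration, and only the two
fields discussed in this thread are modelled. The idea is to log every
allocation and every b_len write together with the request's address,
so the log can be matched against the request the cache code complains
about.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the two struct ci_req fields discussed here. */
struct demo_req {
	uint8_t *b_buf;		/* bounce buffer */
	uint32_t b_len;		/* bounce buffer length */
};

/* Assumed upper bound for a sane bounce buffer length (illustration only). */
#define DEMO_MAX_PLAUSIBLE_LEN	(64u * 1024u)

/* Zero-initialising allocation, logged with the caller's name. */
static struct demo_req *demo_req_alloc(const char *where)
{
	struct demo_req *req = calloc(1, sizeof(*req));

	printf("%s: alloc req=%p\n", where, (void *)req);
	return req;
}

/* Route every b_len write through one place so each write gets logged. */
static void demo_req_set_b_len(struct demo_req *req, uint32_t len,
			       const char *where)
{
	assert(req != NULL);
	printf("%s: req=%p b_len %#x -> %#x\n", where, (void *)req,
	       (unsigned int)req->b_len, (unsigned int)len);
	req->b_len = len;
}

/* Non-zero if b_len is too large to be a real length (e.g. an address). */
static int demo_req_b_len_suspicious(const struct demo_req *req)
{
	return req->b_len > DEMO_MAX_PLAUSIBLE_LEN;
}
```

With the values from the report above, a length of 66 passes the check,
while 0x43f7b014 is flagged. A check like this at the top of the bounce
path would only tell you that the value is already bad; the logged
writes are what should show who wrote it. If no logged write matches,
that points at something scribbling over the struct from outside.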