From mboxrd@z Thu Jan 1 00:00:00 1970
From: robherring2@gmail.com (Rob Herring)
Date: Thu, 11 Oct 2012 10:23:30 -0500
Subject: alignment faults in 3.6
In-Reply-To: <1349963227.21172.9188.camel@edumazet-glaptop>
References: <20121005082439.GF4625@n2100.arm.linux.org.uk> <20121011103257.GO4625@n2100.arm.linux.org.uk> <1349952574.21172.8604.camel@edumazet-glaptop> <201210111228.25995.arnd@arndb.de> <1349959248.21172.8970.camel@edumazet-glaptop> <5076C78E.1020408@gmail.com> <1349963227.21172.9188.camel@edumazet-glaptop>
Message-ID: <5076E472.8030703@gmail.com>
To: linux-arm-kernel@lists.infradead.org
List-Id: linux-arm-kernel.lists.infradead.org

On 10/11/2012 08:47 AM, Eric Dumazet wrote:
> On Thu, 2012-10-11 at 08:20 -0500, Rob Herring wrote:
>> On 10/11/2012 07:40 AM, Eric Dumazet wrote:
>>> On Thu, 2012-10-11 at 12:28 +0000, Arnd Bergmann wrote:
>>>
>>>>
>>>> Rob Herring as the original reporter has dropped off the Cc list, adding
>>>> him back.
>>>>
>>>> I assume that the calxeda xgmac driver is the culprit then. It uses
>>>> netdev_alloc_skb() rather than netdev_alloc_skb_ip_align() in
>>>> xgmac_rx_refill but it is not clear whether it does so intentionally
>>>> or by accident.
>>
>> This in fact does work and eliminates the unaligned traps. However, not
>> all h/w can do IP aligned DMA (i.MX FEC for example), so I still think
>> this is a questionable optimization by the compiler. We're saving 1 load
>> instruction here for data that is likely already in the cache. It may be
>> legal per the ABI, but the downside of this optimization is much greater
>> than the upside.
>
> Compiler is asked to perform a 32bit load, it does it.

Not exactly. It is asked to perform 2 32-bit loads which are combined
into a single ldm (load multiple) which cannot handle unaligned accesses.
Here's a simple example that does the same thing:

void test(char * buf)
{
	printf("%d, %d\n", *((unsigned int *)&buf[0]), *((unsigned int *)&buf[4]));
}

So I guess the only ABI legal unaligned access is in a packed struct.

> There is no questionable optimization here. Really.
> Please stop pretending this, this makes no sense.

I'm not the one calling the networking stack bad code. I can fix my h/w,
so I'll stop caring about this. Others can all get bitten by this new
behavior in gcc 4.7.

Rob

> As I said, if some h/w cannot do IP aligned DMA, driver can use a
> workaround, or a plain memmove() (some drivers seems to do this to work
> around this h/w limitation, just grep for memmove() in drivers/net)
>
>>
>>>
>>> Thanks Arnd
>>>
>>> It seems an accident, since driver doesnt check skb->data alignment at
>>> all (this can change with SLAB debug on/off)
>>>
>>> It also incorrectly adds 64 bytes to bfsize, there is no need for this.
>>
>> I'm pretty sure this was needed as the h/w writes out full bursts of
>> data, but I'll go back and check.
>
> Maybe the ALIGN() was needed then. But the 64 + NE_IP_ALIGN sounds like
> the head room that we allocate/reserve in netdev_alloc_skb_ip_align()
>
> So you allocate this extra room twice.
>
> Thanks
>
>
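
For comparison, a minimal sketch of how the two reads in the test()
example above could be written without promising the compiler any
alignment. The names load32() and read_pair() are made up for
illustration and are not from the kernel or the xgmac driver. Because
the bytes go through memcpy(), the compiler has no aligned
unsigned int * to base an ldm on, so it should pick loads that are
safe for the target:

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: read a 32-bit value in host byte order from an
 * address that may not be 4-byte aligned. memcpy() carries no
 * alignment assumption, unlike a cast to unsigned int *.
 */
static uint32_t load32(const void *p)
{
	uint32_t v;

	memcpy(&v, p, sizeof(v));
	return v;
}

/*
 * Hypothetical counterpart of test(): the same two reads at offsets
 * 0 and 4, without dereferencing buf as unsigned int *.
 */
static void read_pair(const char *buf, uint32_t *a, uint32_t *b)
{
	*a = load32(&buf[0]);
	*b = load32(&buf[4]);
}
```

In-kernel code would normally reach for get_unaligned() rather than an
open-coded memcpy(); the sketch only shows the idea in plain C.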