* ARM unaligned MMIO access with attribute((packed)) @ 2011-02-02 16:00 Arnd Bergmann 2011-02-02 16:37 ` Russell King - ARM Linux ` (2 more replies) 0 siblings, 3 replies; 20+ messages in thread From: Arnd Bergmann @ 2011-02-02 16:00 UTC (permalink / raw) To: linux-arm-kernel As noticed by Peter Maydell, the EHCI device driver in Linux gets miscompiled by some versions of arm-gcc (still need to find out which) due to a combination of problems: 1. In include/linux/usb/ehci_def.h, struct ehci_caps is defined with __attribute__((packed)), for no good reason. This is clearly a bug and needs to get fixed, but other drivers have the same bug and it used to work. The attribute forces byte access on all members accessed through pointer dereference, which is not allowed on MMIO accesses in general. The specific code triggering the problem in Peter's case is in ehci-omap.c: omap->ehci->regs = hcd->regs + HC_LENGTH(readl(&omap->ehci->caps->hc_capbase)); 2. The ARM version of the readl() function is implemented as a macro doing a direct pointer dereference with a typecast: #define __raw_readl(a) (__chk_io_ptr(a), *(volatile unsigned int __force *)(a)) #define readl_relaxed(c) ({ u32 __v = le32_to_cpu((__force __le32) \ __raw_readl(__mem_pci(c))); __v; }) #define readl(c) ({ u32 __v = readl_relaxed(c); __iormb(); __v; }) On other architectures, readl() is implemented using an inline assembly specifically to prevent gcc from issuing anything but a single 32-bit load instruction. readl() only makes sense on aligned memory, so in case of a misaligned pointer argument, it should cause a trap anyway. 3. gcc does not seem to clearly define what happens during a cast between aligned an packed pointers. In this case, the original pointer is packed (byte aligned), while the access is done through a 32-bit aligned volatile unsigned int pointer. In gcc-4.4, casting from "unsigned int __attribute__((packed))" to "volatile unsigned int" resulted in a 32-bit aligned access, while casting to "unsigned int" (without volatile) resulted in four byte accesses. gcc-4.5 seems to have changed this to always do a byte access in both cases, but still does not document the behavior. (need to confirm this). I would suggest fixing this by: 1. auditing all uses of __attribute__((packed)) in the Linux USB code and other drivers, removing the ones that are potentially harmful. 2. Changing the ARM MMIO functions to use inline assembly instead of direct pointer dereference. 3. Documenting the gcc behavior as undefined. Other suggestions? Arnd ^ permalink raw reply [flat|nested] 20+ messages in thread
* ARM unaligned MMIO access with attribute((packed)) 2011-02-02 16:00 ARM unaligned MMIO access with attribute((packed)) Arnd Bergmann @ 2011-02-02 16:37 ` Russell King - ARM Linux 2011-02-02 17:39 ` Arnd Bergmann ` (2 more replies) 2011-02-02 16:51 ` Richard Guenther 2011-04-26 15:00 ` Rabin Vincent 2 siblings, 3 replies; 20+ messages in thread From: Russell King - ARM Linux @ 2011-02-02 16:37 UTC (permalink / raw) To: linux-arm-kernel On Wed, Feb 02, 2011 at 05:00:20PM +0100, Arnd Bergmann wrote: > I would suggest fixing this by: > > 1. auditing all uses of __attribute__((packed)) in the Linux USB code > and other drivers, removing the ones that are potentially harmful. > > 2. Changing the ARM MMIO functions to use inline assembly instead of > direct pointer dereference. > > 3. Documenting the gcc behavior as undefined. We used to use inline assembly at one point, but that got chucked out. The problem is that using asm() for this causes GCC to generate horrid code. 1. there's no way to tell GCC that the inline assembly is a load instruction and therefore it needs to schedule the following instructions appropriately. 2. GCC will needlessly reload pointers from structures and other such behaviour because it can't be told clearly what the inline assembly is doing, so the inline asm needs to have a "memory" clobber. 3. It seems to misses out using the pre-index addressing, prefering to create add/sub instructions prior to each inline assembly load/store. 4. There are no (documented) constraints in GCC to allow you to represent the offset format for the half-word instructions. Overall, it means greater register pressure, more instructions, larger functions, greater instruction cache pressure, etc. ^ permalink raw reply [flat|nested] 20+ messages in thread
* ARM unaligned MMIO access with attribute((packed)) 2011-02-02 16:37 ` Russell King - ARM Linux @ 2011-02-02 17:39 ` Arnd Bergmann 2011-02-02 19:14 ` Ian Lance Taylor 2011-02-02 21:38 ` David Miller 2011-02-03 15:03 ` Arnd Bergmann 2 siblings, 1 reply; 20+ messages in thread From: Arnd Bergmann @ 2011-02-02 17:39 UTC (permalink / raw) To: linux-arm-kernel On Wednesday 02 February 2011 17:37:02 Russell King - ARM Linux wrote: > We used to use inline assembly at one point, but that got chucked out. > The problem is that using asm() for this causes GCC to generate horrid > code. > > 1. there's no way to tell GCC that the inline assembly is a load > instruction and therefore it needs to schedule the following > instructions appropriately. > > 2. GCC will needlessly reload pointers from structures and other such > behaviour because it can't be told clearly what the inline assembly > is doing, so the inline asm needs to have a "memory" clobber. > > 3. It seems to misses out using the pre-index addressing, prefering to > create add/sub instructions prior to each inline assembly load/store. > > 4. There are no (documented) constraints in GCC to allow you to represent > the offset format for the half-word instructions. > > Overall, it means greater register pressure, more instructions, larger > functions, greater instruction cache pressure, etc. Another solution would be to declare the readl function extern and define it out of line, but I assume that this would be at least as bad as an inline assembly for all the points you brought up, right? Would it be possible to add the proper constraints for defining readl in an efficient way to a future version of gcc? That wouldn't help us in the near future, but we could at some points use those in a number of places. Arnd ^ permalink raw reply [flat|nested] 20+ messages in thread
* ARM unaligned MMIO access with attribute((packed)) 2011-02-02 17:39 ` Arnd Bergmann @ 2011-02-02 19:14 ` Ian Lance Taylor 0 siblings, 0 replies; 20+ messages in thread From: Ian Lance Taylor @ 2011-02-02 19:14 UTC (permalink / raw) To: linux-arm-kernel Arnd Bergmann <arnd@arndb.de> writes: > On Wednesday 02 February 2011 17:37:02 Russell King - ARM Linux wrote: >> We used to use inline assembly at one point, but that got chucked out. >> The problem is that using asm() for this causes GCC to generate horrid >> code. >> >> 1. there's no way to tell GCC that the inline assembly is a load >> instruction and therefore it needs to schedule the following >> instructions appropriately. >> >> 2. GCC will needlessly reload pointers from structures and other such >> behaviour because it can't be told clearly what the inline assembly >> is doing, so the inline asm needs to have a "memory" clobber. >> >> 3. It seems to misses out using the pre-index addressing, prefering to >> create add/sub instructions prior to each inline assembly load/store. >> >> 4. There are no (documented) constraints in GCC to allow you to represent >> the offset format for the half-word instructions. >> >> Overall, it means greater register pressure, more instructions, larger >> functions, greater instruction cache pressure, etc. > > Another solution would be to declare the readl function extern and define > it out of line, but I assume that this would be at least as bad as an > inline assembly for all the points you brought up, right? > > Would it be possible to add the proper constraints for defining readl > in an efficient way to a future version of gcc? That wouldn't help us > in the near future, but we could at some points use those in a number > of places. I think it would be quite difficult to implement item 1 above in a way that was actually usable. It would require some way to describe the scheduling requirements of an asm. But the details of scheduling are backend specific. Internally there are define_insn_reservation structures which have names, but the names are processor specific which is not what you want in source code (by processor specific I mean specific to particular CPUs within a family). There are define_cpu_unit structures which also have names, but are again processor specific. What you want here is some non-processor-specific way to describe the characteristics of an instruction. gcc does not have that today. Even if somebody implemented all that, most inline asms are not a single instructions and thus would find it difficult to take advantage of it. I don't see this as paying off in the long run. A more likely payoff would be to develop builtin functions for whatever functionality is required that can not expressed in source code. Item 2 above can be done. It is possible to describe precisely which areas of memory are clobbered. Item 3 above seems impossible to me. There is no way to combine compiler generated instructions with user written inline asm such that pre-index addressing can be used. Perhaps I misunderstand. Item 4 can be implemented; please consider opening a feature request in bugzilla. Ian ^ permalink raw reply [flat|nested] 20+ messages in thread
* ARM unaligned MMIO access with attribute((packed)) 2011-02-02 16:37 ` Russell King - ARM Linux 2011-02-02 17:39 ` Arnd Bergmann @ 2011-02-02 21:38 ` David Miller 2011-02-02 21:45 ` Russell King - ARM Linux [not found] ` <yw1xei7qm3vy.fsf@unicorn.mansr.com> 2011-02-03 15:03 ` Arnd Bergmann 2 siblings, 2 replies; 20+ messages in thread From: David Miller @ 2011-02-02 21:38 UTC (permalink / raw) To: linux-arm-kernel From: Russell King - ARM Linux <linux@arm.linux.org.uk> Date: Wed, 2 Feb 2011 16:37:02 +0000 > 1. there's no way to tell GCC that the inline assembly is a load > instruction and therefore it needs to schedule the following > instructions appropriately. Just add a dummy '"m" (pointer)' asm input argument to the inline asm statement. Just make sure "typeof(pointer)" has a size matching the size of the load your are performing. > 2. GCC will needlessly reload pointers from structures and other such > behaviour because it can't be told clearly what the inline assembly > is doing, so the inline asm needs to have a "memory" clobber. This behavior is correct, and in fact needed. Writing to chip registers can trigger changes to arbitrary main memory locations. > 3. It seems to misses out using the pre-index addressing, prefering to > create add/sub instructions prior to each inline assembly load/store. Yes, this is indeed a problem. But you really need that memory clobber there whether you like it or not, see above. ^ permalink raw reply [flat|nested] 20+ messages in thread
* ARM unaligned MMIO access with attribute((packed)) 2011-02-02 21:38 ` David Miller @ 2011-02-02 21:45 ` Russell King - ARM Linux 2011-02-02 21:59 ` David Miller [not found] ` <yw1xei7qm3vy.fsf@unicorn.mansr.com> 1 sibling, 1 reply; 20+ messages in thread From: Russell King - ARM Linux @ 2011-02-02 21:45 UTC (permalink / raw) To: linux-arm-kernel On Wed, Feb 02, 2011 at 01:38:31PM -0800, David Miller wrote: > From: Russell King - ARM Linux <linux@arm.linux.org.uk> > Date: Wed, 2 Feb 2011 16:37:02 +0000 > > > 1. there's no way to tell GCC that the inline assembly is a load > > instruction and therefore it needs to schedule the following > > instructions appropriately. > > Just add a dummy '"m" (pointer)' asm input argument to the inline asm > statement. Just make sure "typeof(pointer)" has a size matching the > size of the load your are performing. That involves this problematical cast from a packed struct pointer to an unsigned long pointer, which according to the C standard and GCC folk is undefined. > > 2. GCC will needlessly reload pointers from structures and other such > > behaviour because it can't be told clearly what the inline assembly > > is doing, so the inline asm needs to have a "memory" clobber. > > This behavior is correct, and in fact needed. Writing to chip registers > can trigger changes to arbitrary main memory locations. That is really not an argument which stands up to analysis. When does main memory locations change as a result of a write to a chip register? The answer is: when DMA is performed - which could be many microseconds or even milliseconds after you've written the register, which would be long after you've exited the function doing the writing. Not only that, but we have the DMA API to deal with the implications of that. On ARM, that's a function call, and GCC can't make any assumptions about memory contents across function calls where it doesn't know what the function does. Practice over the last 15 years on ARM has also shown that this is not necessary. ^ permalink raw reply [flat|nested] 20+ messages in thread
* ARM unaligned MMIO access with attribute((packed)) 2011-02-02 21:45 ` Russell King - ARM Linux @ 2011-02-02 21:59 ` David Miller 0 siblings, 0 replies; 20+ messages in thread From: David Miller @ 2011-02-02 21:59 UTC (permalink / raw) To: linux-arm-kernel From: Russell King - ARM Linux <linux@arm.linux.org.uk> Date: Wed, 2 Feb 2011 21:45:22 +0000 > On Wed, Feb 02, 2011 at 01:38:31PM -0800, David Miller wrote: >> From: Russell King - ARM Linux <linux@arm.linux.org.uk> >> Date: Wed, 2 Feb 2011 16:37:02 +0000 >> >> > 1. there's no way to tell GCC that the inline assembly is a load >> > instruction and therefore it needs to schedule the following >> > instructions appropriately. >> >> Just add a dummy '"m" (pointer)' asm input argument to the inline asm >> statement. Just make sure "typeof(pointer)" has a size matching the >> size of the load your are performing. > > That involves this problematical cast from a packed struct pointer to > an unsigned long pointer, which according to the C standard and GCC > folk is undefined. It's alignment may be undefined, but it's size definitely is well defined and that's what matters here. > Practice over the last 15 years on ARM has also shown that this is not > necessary. Sorry oh big super man, little ole' me is only a kernel newbie. ^ permalink raw reply [flat|nested] 20+ messages in thread
[parent not found: <yw1xei7qm3vy.fsf@unicorn.mansr.com>]
* ARM unaligned MMIO access with attribute((packed)) [not found] ` <yw1xei7qm3vy.fsf@unicorn.mansr.com> @ 2011-02-02 23:23 ` David Miller 2011-02-03 9:26 ` Arnd Bergmann 1 sibling, 0 replies; 20+ messages in thread From: David Miller @ 2011-02-02 23:23 UTC (permalink / raw) To: linux-arm-kernel From: M?ns Rullg?rd <mans@mansr.com> Date: Wed, 02 Feb 2011 23:08:01 +0000 > David Miller <davem@davemloft.net> writes: > >> From: Russell King - ARM Linux <linux@arm.linux.org.uk> >> Date: Wed, 2 Feb 2011 16:37:02 +0000 >> >>> 1. there's no way to tell GCC that the inline assembly is a load >>> instruction and therefore it needs to schedule the following >>> instructions appropriately. >> >> Just add a dummy '"m" (pointer)' asm input argument to the inline asm >> statement. Just make sure "typeof(pointer)" has a size matching the >> size of the load your are performing. > > That should be "m"(*pointer). Right, thanks for the correction. >> But you really need that memory clobber there whether you like it or >> not, see above. > > I don't know of any device where the side-effects are not explicitly > indicated by other means in the code triggering them, so it probably > is safe without the clobber as Russel says. You're probably right. ^ permalink raw reply [flat|nested] 20+ messages in thread
* ARM unaligned MMIO access with attribute((packed)) [not found] ` <yw1xei7qm3vy.fsf@unicorn.mansr.com> 2011-02-02 23:23 ` David Miller @ 2011-02-03 9:26 ` Arnd Bergmann 1 sibling, 0 replies; 20+ messages in thread From: Arnd Bergmann @ 2011-02-03 9:26 UTC (permalink / raw) To: linux-arm-kernel On Thursday 03 February 2011 00:08:01 M?ns Rullg?rd wrote: > > But you really need that memory clobber there whether you like it or > > not, see above. > > I don't know of any device where the side-effects are not explicitly > indicated by other means in the code triggering them, so it probably > is safe without the clobber as Russel says. On configurations that have CONFIG_ARM_DMA_MEM_BUFFERABLE set, this is definitely true, since they use the rmb() and wmb() that include both an IO memory barrier instruction where required and a compiler barrier (i.e. __asm__ __volatile__ ("" : : : "memory")): 8<-------------- from arch/arm/include/asm/io.h #ifdef CONFIG_ARM_DMA_MEM_BUFFERABLE #define __iormb() rmb() #define __iowmb() wmb() #else #define __iormb() do { } while (0) #define __iowmb() do { } while (0) #endif #define readb(c) ({ u8 __v = readb_relaxed(c); __iormb(); __v; }) #define readw(c) ({ u16 __v = readw_relaxed(c); __iormb(); __v; }) #define readl(c) ({ u32 __v = readl_relaxed(c); __iormb(); __v; }) #define writeb(v,c) ({ __iowmb(); writeb_relaxed(v,c); }) #define writew(v,c) ({ __iowmb(); writew_relaxed(v,c); }) #define writel(v,c) ({ __iowmb(); writel_relaxed(v,c); }) 8<--------------- Also, as Russell mentioned, anything using the streaming DMA mapping API is fine because of the barriers included in the function calls there. However, I would think that this fictional piece of code would be valid for a possible PCI device driver (though inefficient) and not require any additional synchronizing operations: void foo_perform_operation(struct foo_dev *d, u32 in, u32 *out) { dma_addr_t dma_addr; u32 *cpu_addr; /* * get memory from the consistent DMA mapping API, typically * uncached memory on ARM, but could be anywhere if the DMA * is coherent. */ cpu_addr = dma_alloc_coherent(&d->dev, sizeof (*cpu_addr), &dma_addr, GFP_KERNEL); /* lock the device, not required for the example, but normally * needed in practice for SMP operation. */ spin_lock(&d->lock); /* initialize the DMA data */ *cpu_addr = in; /* * send a posted 32 bit write to the device, triggering the * start of the DMA read from *cpu_addr, which is followed by * a DMA write to *cpu_addr. writel includes a barrier that * makes sure that the previous store to *cpu_addr is visible * to the DMA, but does not block until the completion like * outl() would. */ writel(dma_addr, d->mmio_addr); /* * synchronize the outbound posted write, wait for the device * to complete the DMA and synchronize the DMA data on its * inbound path. */ (void)readl(d->mmio_addr); /* * *cpu_addr contains data just written to by the device, and * the readl includes all the necessary barriers to ensure * it's really there when we access it. */ *out = *cpu_addr; /* unlock the device */ spin_unlock(&d->lock); /* free the DMA memory */ dma_free_coherent(&d->dev, sizeof (*cpu_addr), cpu_addr, dma_addr); } However, when readl contains no explicit or implicit synchronization, the load from *cpu_addr might get pulled in front of the load from mmio_addr, resulting in invalid output data. If this is the case, it is be a problem on almost all architectures (not x86, powerpc or sparc64). Am I missing something here? Arnd ^ permalink raw reply [flat|nested] 20+ messages in thread
* ARM unaligned MMIO access with attribute((packed)) 2011-02-02 16:37 ` Russell King - ARM Linux 2011-02-02 17:39 ` Arnd Bergmann 2011-02-02 21:38 ` David Miller @ 2011-02-03 15:03 ` Arnd Bergmann 2 siblings, 0 replies; 20+ messages in thread From: Arnd Bergmann @ 2011-02-03 15:03 UTC (permalink / raw) To: linux-arm-kernel On Wednesday 02 February 2011, Russell King - ARM Linux wrote: > On Wed, Feb 02, 2011 at 05:00:20PM +0100, Arnd Bergmann wrote: > > I would suggest fixing this by: > > > > 2. Changing the ARM MMIO functions to use inline assembly instead of > > direct pointer dereference. > > We used to use inline assembly at one point, but that got chucked out. > The problem is that using asm() for this causes GCC to generate horrid > code. Here is an alternative suggestion, would that work? 8<----------- arm: avoid unaligned arguments to MMIO functions The readw/writew/readl/writel functions require aligned naturally arguments which are accessed only using atomic load and store instructions, never using byte or read-modify-write write accesses. At least one driver (ehci-hcd) annotates its MMIO registers as __attribute__((packed)), which causes some versions of gcc to generate byte accesses due to an undefined behavior when casting a packed u32 to an aligned u32 variable. There does not seem to be an efficient way to force gcc to do word accesses, but we can detect the problem and refuse to build broken code: This adds a check in these functions to ensure that their arguments are either naturally aligned or they are void pointers. Signed-off-by: Arnd Bergmann <arnd@arndb.de> --- diff --git a/arch/arm/include/asm/io.h b/arch/arm/include/asm/io.h index 20e0f7c..b98f0bc 100644 --- a/arch/arm/include/asm/io.h +++ b/arch/arm/include/asm/io.h @@ -180,17 +180,36 @@ extern void _memset_io(volatile void __iomem *, int, size_t); * IO port primitives for more information. */ #ifdef __mem_pci + +/* + * Ensure natural alignment of arguments: + * It is not allowed to pass a pointer to a packed variable into + * the readl/writel family of functions, because gcc may decide + * to create byte accesses that are illegal on multi-byte MMIO + * registers. + * A lot of code uses void pointers, which is fine. + */ +#define __checkalign(p) BUILD_BUG_ON( \ + !__builtin_types_compatible_p(__typeof__(p), void *) && \ + (__alignof__ (*(p)) < sizeof (*(p)))) + #define readb_relaxed(c) ({ u8 __v = __raw_readb(__mem_pci(c)); __v; }) -#define readw_relaxed(c) ({ u16 __v = le16_to_cpu((__force __le16) \ - __raw_readw(__mem_pci(c))); __v; }) -#define readl_relaxed(c) ({ u32 __v = le32_to_cpu((__force __le32) \ - __raw_readl(__mem_pci(c))); __v; }) +#define readw_relaxed(c) ({ u16 __v = le16_to_cpu((__force __le16) \ + __raw_readw(__mem_pci(c))); \ + __checkalign(c); __v; }) +#define readl_relaxed(c) ({ u32 __v = le32_to_cpu((__force __le32) \ + __raw_readl(__mem_pci(c))); \ + __checkalign(c); __v; }) #define writeb_relaxed(v,c) ((void)__raw_writeb(v,__mem_pci(c))) -#define writew_relaxed(v,c) ((void)__raw_writew((__force u16) \ - cpu_to_le16(v),__mem_pci(c))) -#define writel_relaxed(v,c) ((void)__raw_writel((__force u32) \ - cpu_to_le32(v),__mem_pci(c))) +#define writew_relaxed(v,c) do { (void)__raw_writew((__force u16) \ + cpu_to_le16(v),__mem_pci(c)); \ + __checkalign(c); \ + } while (0); +#define writel_relaxed(v,c) do { (void)__raw_writel((__force u32) \ + cpu_to_le32(v),__mem_pci(c)); \ + __checkalign(c); \ + } while (0); #ifdef CONFIG_ARM_DMA_MEM_BUFFERABLE #define __iormb() rmb() ^ permalink raw reply related [flat|nested] 20+ messages in thread
* ARM unaligned MMIO access with attribute((packed)) 2011-02-02 16:00 ARM unaligned MMIO access with attribute((packed)) Arnd Bergmann 2011-02-02 16:37 ` Russell King - ARM Linux @ 2011-02-02 16:51 ` Richard Guenther 2011-02-02 17:09 ` Russell King - ARM Linux 2011-02-02 17:39 ` Joseph S. Myers 2011-04-26 15:00 ` Rabin Vincent 2 siblings, 2 replies; 20+ messages in thread From: Richard Guenther @ 2011-02-02 16:51 UTC (permalink / raw) To: linux-arm-kernel On Wed, Feb 2, 2011 at 5:00 PM, Arnd Bergmann <arnd@arndb.de> wrote: > As noticed by Peter Maydell, the EHCI device driver in Linux gets > miscompiled by some versions of arm-gcc (still need to find out which) > due to a combination of problems: > > 1. In include/linux/usb/ehci_def.h, struct ehci_caps is defined > with __attribute__((packed)), for no good reason. This is clearly > a bug and needs to get fixed, but other drivers have the same bug > and it used to work. The attribute forces byte access on all members > accessed through pointer dereference, which is not allowed on > MMIO accesses in general. The specific code triggering the problem > in Peter's case is in ehci-omap.c: > ? ? ? ?omap->ehci->regs = hcd->regs > ? ? ? ? ? ? ? ?+ HC_LENGTH(readl(&omap->ehci->caps->hc_capbase)); > > > 2. The ARM version of the readl() function is implemented as a macro > doing a direct pointer dereference with a typecast: > > #define __raw_readl(a) ? ? ? ? ?(__chk_io_ptr(a), *(volatile unsigned int __force ? *)(a)) > #define readl_relaxed(c) ({ u32 __v = le32_to_cpu((__force __le32) \ > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?__raw_readl(__mem_pci(c))); __v; }) > #define readl(c) ? ? ? ? ? ? ? ?({ u32 __v = readl_relaxed(c); __iormb(); __v; }) > > On other architectures, readl() is implemented using an inline assembly > specifically to prevent gcc from issuing anything but a single 32-bit > load instruction. readl() only makes sense on aligned memory, so in case > of a misaligned pointer argument, it should cause a trap anyway. > > 3. gcc does not seem to clearly define what happens during a cast between > aligned an packed pointers. In this case, the original pointer is packed > (byte aligned), while the access is done through a 32-bit aligned > volatile unsigned int pointer. In gcc-4.4, casting from "unsigned int > __attribute__((packed))" to "volatile unsigned int" resulted in a 32-bit > aligned access, while casting to "unsigned int" (without volatile) resulted > in four byte accesses. gcc-4.5 seems to have changed this to always do > a byte access in both cases, but still does not document the behavior. > (need to confirm this). > > I would suggest fixing this by: > > 1. auditing all uses of __attribute__((packed)) in the Linux USB code > and other drivers, removing the ones that are potentially harmful. > > 2. Changing the ARM MMIO functions to use inline assembly instead of > direct pointer dereference. > > 3. Documenting the gcc behavior as undefined. The pointer conversions already invoke undefined behavior as specified by the C standard (6.3.2.3/7). Richard. ^ permalink raw reply [flat|nested] 20+ messages in thread
* ARM unaligned MMIO access with attribute((packed)) 2011-02-02 16:51 ` Richard Guenther @ 2011-02-02 17:09 ` Russell King - ARM Linux 2011-02-02 17:39 ` Joseph S. Myers 1 sibling, 0 replies; 20+ messages in thread From: Russell King - ARM Linux @ 2011-02-02 17:09 UTC (permalink / raw) To: linux-arm-kernel On Wed, Feb 02, 2011 at 05:51:27PM +0100, Richard Guenther wrote: > > I would suggest fixing this by: > > > > 1. auditing all uses of __attribute__((packed)) in the Linux USB code > > and other drivers, removing the ones that are potentially harmful. > > > > 2. Changing the ARM MMIO functions to use inline assembly instead of > > direct pointer dereference. > > > > 3. Documenting the gcc behavior as undefined. > > The pointer conversions already invoke undefined behavior as specified by the > C standard (6.3.2.3/7). Just to be clear: you are not saying that the ARM implementation is undefined. What you're saying is that converting from a pointer with less strict alignment requirements to a pointer with more strict alignment requirements is undefined. IOW: unsigned long *blah(unsigned char *c) { return (unsigned long *)c; } would be undefined, but: unsigned char *blah(unsigned long *c) { return (unsigned char *)c; } would not be. If you're saying something else, please explain with reference to the point in the C standard you quote above. ^ permalink raw reply [flat|nested] 20+ messages in thread
* ARM unaligned MMIO access with attribute((packed)) 2011-02-02 16:51 ` Richard Guenther 2011-02-02 17:09 ` Russell King - ARM Linux @ 2011-02-02 17:39 ` Joseph S. Myers 1 sibling, 0 replies; 20+ messages in thread From: Joseph S. Myers @ 2011-02-02 17:39 UTC (permalink / raw) To: linux-arm-kernel On Wed, 2 Feb 2011, Richard Guenther wrote: > The pointer conversions already invoke undefined behavior as specified by the > C standard (6.3.2.3/7). I would say: the conversions are undefined if the pointer is insufficiently aligned for any of the pointer types involved (source, destination or intermediate), where the appropriate alignment for a packed type is 1. Thus, the conversion from packed to non-packed is OK iff the pointer target is sufficiently aligned for the non-packed type. In general from a sequence of casts the compiler is permitted to deduce that the pointer is sufficiently aligned for whatever type in the sequence has the greatest alignment requirement (the middle-end may not have that information at present, but the front end could insert some form of alignment assertion if useful for optimization). *But* that is what is permitted in standards terms; it is not necessarily safe in practice. In particular, on non-strict-alignment targets such as x86 people do in practice assume that unaligned accesses are OK at the C level, not just the assembly level (glibc does so, for example), so it might be a bad idea to assume alignment in a way that would cause that to break. -- Joseph S. Myers joseph at codesourcery.com ^ permalink raw reply [flat|nested] 20+ messages in thread
* ARM unaligned MMIO access with attribute((packed)) 2011-02-02 16:00 ARM unaligned MMIO access with attribute((packed)) Arnd Bergmann 2011-02-02 16:37 ` Russell King - ARM Linux 2011-02-02 16:51 ` Richard Guenther @ 2011-04-26 15:00 ` Rabin Vincent 2011-04-26 18:51 ` Alan Stern 2 siblings, 1 reply; 20+ messages in thread From: Rabin Vincent @ 2011-04-26 15:00 UTC (permalink / raw) To: linux-arm-kernel On Wed, Feb 2, 2011 at 21:30, Arnd Bergmann <arnd@arndb.de> wrote: > As noticed by Peter Maydell, the EHCI device driver in Linux gets > miscompiled by some versions of arm-gcc (still need to find out which) > due to a combination of problems: > > 1. In include/linux/usb/ehci_def.h, struct ehci_caps is defined > with __attribute__((packed)), for no good reason. This is clearly > a bug and needs to get fixed, but other drivers have the same bug Was a patch submitted for this? I couldn't find it in the archives. U-Boot seems to be fixing this by adding an "aligned(4)" instead of removing the packed: http://www.mail-archive.com/u-boot at lists.denx.de/msg51418.html > and it used to work. The attribute forces byte access on all members > accessed through pointer dereference, which is not allowed on > MMIO accesses in general. The specific code triggering the problem > in Peter's case is in ehci-omap.c: > ? ? ? ?omap->ehci->regs = hcd->regs > ? ? ? ? ? ? ? ?+ HC_LENGTH(readl(&omap->ehci->caps->hc_capbase)); In my case it's this writel() in ehci-hub.c that gets chopped into strbs: /* force reset to complete */ ehci_writel(ehci, temp & ~(PORT_RWC_BITS | PORT_RESET), status_reg); ^ permalink raw reply [flat|nested] 20+ messages in thread
* ARM unaligned MMIO access with attribute((packed)) 2011-04-26 15:00 ` Rabin Vincent @ 2011-04-26 18:51 ` Alan Stern 2011-04-27 14:06 ` Rabin Vincent 0 siblings, 1 reply; 20+ messages in thread From: Alan Stern @ 2011-04-26 18:51 UTC (permalink / raw) To: linux-arm-kernel On Tue, 26 Apr 2011, Rabin Vincent wrote: > On Wed, Feb 2, 2011 at 21:30, Arnd Bergmann <arnd@arndb.de> wrote: > > As noticed by Peter Maydell, the EHCI device driver in Linux gets > > miscompiled by some versions of arm-gcc (still need to find out which) > > due to a combination of problems: > > > > 1. In include/linux/usb/ehci_def.h, struct ehci_caps is defined > > with __attribute__((packed)), for no good reason. This is clearly > > a bug and needs to get fixed, but other drivers have the same bug > > Was a patch submitted for this? I couldn't find it in the archives. > U-Boot seems to be fixing this by adding an "aligned(4)" instead > of removing the packed: > > http://www.mail-archive.com/u-boot at lists.denx.de/msg51418.html ISTR a patch was submitted, but apparently it never got picked up. > > and it used to work. The attribute forces byte access on all members > > accessed through pointer dereference, which is not allowed on > > MMIO accesses in general. The specific code triggering the problem > > in Peter's case is in ehci-omap.c: > > ? ? ? ?omap->ehci->regs = hcd->regs > > ? ? ? ? ? ? ? ?+ HC_LENGTH(readl(&omap->ehci->caps->hc_capbase)); > > In my case it's this writel() in ehci-hub.c that gets chopped into > strbs: > > /* force reset to complete */ > ehci_writel(ehci, temp & ~(PORT_RWC_BITS | PORT_RESET), > status_reg); Why would that get messed up? The status_reg variable doesn't have any __atribute__((packed)) associated with it. Alan Stern ^ permalink raw reply [flat|nested] 20+ messages in thread
* ARM unaligned MMIO access with attribute((packed)) 2011-04-26 18:51 ` Alan Stern @ 2011-04-27 14:06 ` Rabin Vincent 2011-04-27 16:25 ` Alan Stern 0 siblings, 1 reply; 20+ messages in thread From: Rabin Vincent @ 2011-04-27 14:06 UTC (permalink / raw) To: linux-arm-kernel On Wed, Apr 27, 2011 at 00:21, Alan Stern <stern@rowland.harvard.edu> wrote: > On Tue, 26 Apr 2011, Rabin Vincent wrote: >> In my case it's this writel() in ehci-hub.c that gets chopped into >> strbs: >> >> ? ? ? /* force reset to complete */ >> ? ? ? ehci_writel(ehci, temp & ~(PORT_RWC_BITS | PORT_RESET), >> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? status_reg); > > Why would that get messed up? ?The status_reg variable doesn't have any > __atribute__((packed)) associated with it. The initialization of status_reg is: u32 __iomem *status_reg = &ehci->regs->port_status[(wIndex & 0xff) - 1]; where ehci->regs is a pointer to the packed struct ehci_regs. So, this is the same problem of casting pointers to stricter alignment. ^ permalink raw reply [flat|nested] 20+ messages in thread
* ARM unaligned MMIO access with attribute((packed)) 2011-04-27 14:06 ` Rabin Vincent @ 2011-04-27 16:25 ` Alan Stern 2011-04-27 16:37 ` Arnd Bergmann 0 siblings, 1 reply; 20+ messages in thread From: Alan Stern @ 2011-04-27 16:25 UTC (permalink / raw) To: linux-arm-kernel On Wed, 27 Apr 2011, Rabin Vincent wrote: > On Wed, Apr 27, 2011 at 00:21, Alan Stern <stern@rowland.harvard.edu> wrote: > > On Tue, 26 Apr 2011, Rabin Vincent wrote: > >> In my case it's this writel() in ehci-hub.c that gets chopped into > >> strbs: > >> > >> ? ? ? /* force reset to complete */ > >> ? ? ? ehci_writel(ehci, temp & ~(PORT_RWC_BITS | PORT_RESET), > >> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? status_reg); > > > > Why would that get messed up? ?The status_reg variable doesn't have any > > __atribute__((packed)) associated with it. > > The initialization of status_reg is: > > u32 __iomem *status_reg > = &ehci->regs->port_status[(wIndex & 0xff) - 1]; > > where ehci->regs is a pointer to the packed struct ehci_regs. So, this > is the same problem of casting pointers to stricter alignment. Right. I can understand the compiler complaining about the cast to stricter alignment during the initialization. But I don't understand why that would affect the code generated for the writel function. Alan Stern ^ permalink raw reply [flat|nested] 20+ messages in thread
* ARM unaligned MMIO access with attribute((packed)) 2011-04-27 16:25 ` Alan Stern @ 2011-04-27 16:37 ` Arnd Bergmann 2011-04-28 13:35 ` Alan Stern 0 siblings, 1 reply; 20+ messages in thread From: Arnd Bergmann @ 2011-04-27 16:37 UTC (permalink / raw) To: linux-arm-kernel On Wednesday 27 April 2011 18:25:40 Alan Stern wrote: > On Wed, 27 Apr 2011, Rabin Vincent wrote: > > > On Wed, Apr 27, 2011 at 00:21, Alan Stern <stern@rowland.harvard.edu> wrote: > > > On Tue, 26 Apr 2011, Rabin Vincent wrote: > > >> In my case it's this writel() in ehci-hub.c that gets chopped into > > >> strbs: > > >> > > >> ? ? ? /* force reset to complete */ > > >> ? ? ? ehci_writel(ehci, temp & ~(PORT_RWC_BITS | PORT_RESET), > > >> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? status_reg); > > > > > > Why would that get messed up? ?The status_reg variable doesn't have any > > > __atribute__((packed)) associated with it. > > > > The initialization of status_reg is: > > > > u32 __iomem *status_reg > > = &ehci->regs->port_status[(wIndex & 0xff) - 1]; > > > > where ehci->regs is a pointer to the packed struct ehci_regs. So, this > > is the same problem of casting pointers to stricter alignment. > > Right. I can understand the compiler complaining about the cast to > stricter alignment during the initialization. But I don't understand > why that would affect the code generated for the writel function. The compiler does not complain, it just silently assumes that it needs to do byte accesses. There is no way to tell the compiler to ignore what it knows about the alignment, other than using inline assembly for the actual pointer dereference. Most architectures today do that, but on ARM it comes down to "*(u32 *)status_reg = temp". Arnd ^ permalink raw reply [flat|nested] 20+ messages in thread
* ARM unaligned MMIO access with attribute((packed)) 2011-04-27 16:37 ` Arnd Bergmann @ 2011-04-28 13:35 ` Alan Stern 2011-04-28 14:16 ` Arnd Bergmann 0 siblings, 1 reply; 20+ messages in thread From: Alan Stern @ 2011-04-28 13:35 UTC (permalink / raw) To: linux-arm-kernel On Wed, 27 Apr 2011, Arnd Bergmann wrote: > On Wednesday 27 April 2011 18:25:40 Alan Stern wrote: > > On Wed, 27 Apr 2011, Rabin Vincent wrote: > > > > > On Wed, Apr 27, 2011 at 00:21, Alan Stern <stern@rowland.harvard.edu> wrote: > > > > On Tue, 26 Apr 2011, Rabin Vincent wrote: > > > >> In my case it's this writel() in ehci-hub.c that gets chopped into > > > >> strbs: > > > >> > > > >> ? ? ? /* force reset to complete */ > > > >> ? ? ? ehci_writel(ehci, temp & ~(PORT_RWC_BITS | PORT_RESET), > > > >> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? status_reg); > > > > > > > > Why would that get messed up? ?The status_reg variable doesn't have any > > > > __atribute__((packed)) associated with it. > > > > > > The initialization of status_reg is: > > > > > > u32 __iomem *status_reg > > > = &ehci->regs->port_status[(wIndex & 0xff) - 1]; > > > > > > where ehci->regs is a pointer to the packed struct ehci_regs. So, this > > > is the same problem of casting pointers to stricter alignment. > > > > Right. I can understand the compiler complaining about the cast to > > stricter alignment during the initialization. But I don't understand > > why that would affect the code generated for the writel function. > > The compiler does not complain, it just silently assumes that it needs > to do byte accesses. There is no way to tell the compiler to ignore > what it knows about the alignment, other than using inline assembly > for the actual pointer dereference. Most architectures today do that, > but on ARM it comes down to "*(u32 *)status_reg = temp". Ah -- so the compiler associates the alignment attribute with the data value and not with the variable's type? I didn't know that. Alan Stern ^ permalink raw reply [flat|nested] 20+ messages in thread
* ARM unaligned MMIO access with attribute((packed)) 2011-04-28 13:35 ` Alan Stern @ 2011-04-28 14:16 ` Arnd Bergmann 0 siblings, 0 replies; 20+ messages in thread From: Arnd Bergmann @ 2011-04-28 14:16 UTC (permalink / raw) To: linux-arm-kernel On Thursday 28 April 2011, Alan Stern wrote: > > The compiler does not complain, it just silently assumes that it needs > > to do byte accesses. There is no way to tell the compiler to ignore > > what it knows about the alignment, other than using inline assembly > > for the actual pointer dereference. Most architectures today do that, > > but on ARM it comes down to "*(u32 *)status_reg = temp". > > Ah -- so the compiler associates the alignment attribute with the data > value and not with the variable's type? I didn't know that. The behavior here is unspecified because the underlying typecase is not valid. Gcc apparently uses some heuristics trying to do the right thing, and in recent versions that heuristic seems to have changed. Arnd ^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2011-04-28 14:16 UTC | newest] Thread overview: 20+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2011-02-02 16:00 ARM unaligned MMIO access with attribute((packed)) Arnd Bergmann 2011-02-02 16:37 ` Russell King - ARM Linux 2011-02-02 17:39 ` Arnd Bergmann 2011-02-02 19:14 ` Ian Lance Taylor 2011-02-02 21:38 ` David Miller 2011-02-02 21:45 ` Russell King - ARM Linux 2011-02-02 21:59 ` David Miller [not found] ` <yw1xei7qm3vy.fsf@unicorn.mansr.com> 2011-02-02 23:23 ` David Miller 2011-02-03 9:26 ` Arnd Bergmann 2011-02-03 15:03 ` Arnd Bergmann 2011-02-02 16:51 ` Richard Guenther 2011-02-02 17:09 ` Russell King - ARM Linux 2011-02-02 17:39 ` Joseph S. Myers 2011-04-26 15:00 ` Rabin Vincent 2011-04-26 18:51 ` Alan Stern 2011-04-27 14:06 ` Rabin Vincent 2011-04-27 16:25 ` Alan Stern 2011-04-27 16:37 ` Arnd Bergmann 2011-04-28 13:35 ` Alan Stern 2011-04-28 14:16 ` Arnd Bergmann
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).